Calculus 1 New Lecture Notes
“And what are these Fluxions? The Velocities of evanescent Increments? And
what are these same evanescent Increments? They are neither finite Quantities nor
Quantities infinitely small, nor yet nothing. May we not call them the ghosts of
departed quantities?”
1 Introduction 7
1.1 Historic introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Course information and organisation . . . . . . . . . . . . . . . . . . . . . . . . 11
3 The Limit 63
3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2 The Existence of the Limit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.2.1 Continuous Functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.2.2 Limits Involving Infinity. . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.3 Working with Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.3.1 Rules for Limits of Composite Expressions. . . . . . . . . . . . . . . . . . 77
3.3.2 Multiple Limits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4 The Derivative 95
4.1 Differentiation from first principles . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.2 Differentiable Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.3 Properties of the derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.3.1 The Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.3.2 The Sum Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.3.3 The Product Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.3.4 The Quotient Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.4 Derivatives of Implicit Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
4.4.1 Inverse Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
4.4.2 Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.4.3 Parametric Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.5 The Mean Value Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5 Integration 125
5.0.1 Motion at constant speed: the area under a straight line . . . . . . . . . 126
5.1 The Riemann Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.2 The Fundamental Theorem of Calculus . . . . . . . . . . . . . . . . . . . . . . . 139
5.2.1 Indefinite and Definite Integrals. . . . . . . . . . . . . . . . . . . . . . . . 144
5.3 Properties of the Integral and some Techniques for Integration . . . . . . . . . . 145
5.3.1 Solving Integrals by Substitution . . . . . . . . . . . . . . . . . . . . . . 147
5.3.2 Integration by Parts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
5.3.3 Partial Fractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.3.4 Recursion Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
5.4 Some Applications: Length, Area and Volume. . . . . . . . . . . . . . . . . . . . 163
5.4.1 Areas of Circles and Ellipses . . . . . . . . . . . . . . . . . . . . . . . . . 163
5.4.2 Volumes of Revolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
5.4.3 The Length of a Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
Appendix 195
6.A The Ratio Test Implies the Root Test. . . . . . . . . . . . . . . . . . . . . . . . 195
Acknowledgements
These lecture notes were written in August-September 2015, with a major revision written in
August-September 2018, and built upon the previous excellent set of notes written by Professor
Ton Coolen. Professor Coolen's hand can be sensed throughout these notes and I am grateful
to him for introducing me to this module in 2013. Each overhaul of a standard set of lecture
notes builds its foundations on the previous courses, and the work of Professor Peter Sollich and
Dr Gérard Watts has had a profound impact on this iteration of the course. All typographical
errors, and errors of any other type, are of course my fault, and I will be grateful for comments
pointing out mistakes of any kind. Thanks are due to Jane Bennett-Rees, Robert Evans, Asuka
Kumon, (Anthony) Peter Young, Michael Yiasemides, Haodong Sun, Shuo Huang, Pablo de
Castro, Rishi Mouland, Alessio Sarti, Senan Sekhon, Keith Glennon, Veno Mramor, Véronique
Fischer and many others for pointing out typos and making other comments on the text.
In which we introduce and motivate the study of calculus; describe the course in outline; explain
the organisation of the course and how it will be examined.
two, forming two new continuous lines. Now if each continuous line contains N atoms then the
pair of lines contain 2N atoms - we appear to have created indivisible atoms (and if we had
joined lines together we would be apparently destroying indivisible atoms), which undermines
the assumption of a finite number of geometric atoms making up any continuous line. There
is another possibility: perhaps each continuous line contains an infinite number of indivisible
parts, even lines of finite length - which gives a consistent picture but raises further questions.
In particular there is a fundamental problem in constructing the calculus: how can an infinite
number of infinitesimal quantities give a finite quantity? The most famous poser of such paradoxical
problems was Zeno of Elea, and perhaps his most famous paradox is that of Achilles and the
tortoise which is succinctly described by Aristotle:
“In a race, the quickest runner can never overtake the slowest, since the pursuer
must first reach the point whence the pursued started, so that the slower must
always hold a lead.”
It is really the same problem as that of indivisibles vs infinitesimals. The resolution is calculus
which gives a rigorous way to deal with the algebra of infinitesimal quantities. Parts of the
ancient world did make use of infinitesimal mathematics, although such methods were not
trusted, for example Archimedes was able to find the volume of a sphere, a cone and a cylinder
using his “method of indivisibles”, but he verified his results using geometrical derivations as
well. Zu Chongzhi and his son Zu Gengzhi were also able to make similar computations.
The question of indivisibles versus infinitesimals was not properly taken up again for over a
thousand years, until Evangelista Torricelli's and John Wallis's work on infinitesimals in the 17th century.
It is interesting to note how philosophically disturbing it was in that era to contemplate the
idea that the mathematical world was underpinned by infinitesimal quantities rather than
indivisible atoms. The solidity and surety of atoms of geometry sat easily with the idea of a
stable universe, in which stable kingdoms are governed by unquestionable systems of law and
religion. The idea of infinitesimals which were not mathematically well-understood was not
welcome1 and there are famous examples of respectable and historic figures who rejected the
idea of an infinitesimal quantity, notably René Descartes (who first embraced and then turned
away from infinitesimals) and George Berkeley, who said that infinitesimals were "the ghosts of
departed quantities" (the quotation that opens these notes).
The period that commenced with Newton and Leibniz saw the continual development of calculus
and the branch of mathematics called analysis right up until 1910 when consensus was reached.
1. There is an excellent book called Infinitesimal by Amir Alexander which discusses just how dangerous an
idea the infinitesimal calculus was at the time.
The consensus was not concerned with whether a set of statements was true or false but focussed
on the validity of the techniques used to make the arguments and their range of valid application.
This was required in order to rigorously prove statements of calculus. It does seem peculiar that
calculus could be founded in the 17th century while discussions of its validity could not be
settled until the 20th century. The reason is that calculus (and analysis) is concerned with the
interplay between the very large (the infinities) and the very small (infinitesimal quantities) and
as you will know infinities and zeroes can make even the most simple algebraic manipulation
invalid or misleading. Consider the simple slicing up of a disc into similar segments, each of which
subtends an angle θ at the centre of the circle:
The straight lines (in red above) can be summed to approximate the length of the circumference of
the circle. As the angle θ is decreased the approximation of the circumference improves, and in
the limit where θ → 0 the approximation can be expected to become exact.
However, in that limit, the length of the red lines goes to zero and we see we are arguing that
an infinite sum of infinitesimal lengths is neither zero nor infinity but is finite. The central
question is to understand under which circumstances this is the case.
Another similar example also highlights the need to resolve the question of how one works
with infinitesimal quantities. Consider the square ABCD whose edges are all of 1 metre long.
Imagine an ant travels the path along the edges AB and then BC, denote the path ABC.
The distance travelled by the ant is 2 metres. Now consider a new path within the square
AB1 B2 B3 C whose edges all measure 0.5 metres as shown by the middle square in figure 1.1,
i.e. |AB1 | = |B1 B2 | = |B2 B3 | = |B3 C| = 0.5m. The ant travels the path AB1 B2 B3 C which has
length 4 × 1/2 = 2 metres. Now repeat the process: halve the length of each edge and form the
staircase path whose edges are all of length 1/n metres. There are now 2n edges of length 1/n, giving
a total path length of, again, 2 metres. No matter how small the edge length 1/n is made, the
total path length remains 2 metres. However we recognise that as n → ∞ the path travelled
by the ant approaches the diagonal of the unit square, which has length √2 metres. So there is
Figure 1.1: An ant travels along a staircase shaped path, the path always has length two metres,
regardless of the number of steps.
a fundamental difference between the finite n case and the case where n → ∞, and the process
of taking the limit to infinity needs to be carefully constructed.
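This tension can also be seen numerically. The short Python sketch below is an illustrative aside (not part of the original notes): it computes the length of the staircase path for several values of n and compares it with the length of the diagonal.

import math

# Length of the staircase path with 2n edges, each of length 1/n metres.
for n in (2, 4, 8, 1024):
    edge_length = 1.0 / n
    path_length = 2 * n * edge_length
    print(n, path_length)        # always exactly 2.0, however large n is

print(math.sqrt(2))              # the diagonal the path appears to approach: 1.414...

The finite-n path length never moves towards √2, which is exactly why the limit n → ∞ has to be treated as a carefully defined object in its own right rather than as "a very large n".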
Consider a final example that was the subject of significant debate in the 18th century, the
series:
1 − 1 + 1 − 1 + 1 − 1 + 1 − 1 + ....
The modern view is that this series is divergent as it does not tend to a limit, however you
would be in the good company of Euler if you argued that it is an example of a geometric series
of the form
1 + x + x² + x³ + · · ·

(with x = −1) which sums to 1/(1 − x) = 1/2. You would also be forgiven if, without a system of
analysis, you argued that you could sum the series to zero or one.
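As a small numerical aside (not in the original notes), the snippet below computes the partial sums of the series: they oscillate between 1 and 0 and never settle down, while Euler's value 1/2 appears as the limit of the averages of the partial sums (what is now called Cesàro summation).

partial_sums = []
s = 0
for k in range(40):
    s += (-1) ** k                    # add +1, -1, +1, -1, ...
    partial_sums.append(s)

print(partial_sums[:8])               # [1, 0, 1, 0, 1, 0, 1, 0]

# Averages of the first m partial sums tend towards 1/2.
averages = [sum(partial_sums[:m]) / m for m in range(1, len(partial_sums) + 1)]
print(averages[-1])                   # close to 0.5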
A final confusion that we would like to emphasise here at the outset of our course was
apparent already within the foundations of calculus and occurs when we consider two functions
f (x) and g(x) which, under a transformation x → x + dx, vary as f → f + df and g → g + dg.
The variation of a product of these functions is derived from the variations of f and g as

d(f g) = (f + df )(g + dg) − f g = f dg + g df + df dg.

However Leibniz² argued that the infinitesimal variation of a product of the two functions is

d(f g) = f dg + g df

that is, df dg is treated as zero whereas df or dg is infinitesimally small but finite and not
treated as zero - why is it valid to treat df dg as zero? The answer to this question requires
us to consider what we mean by a function and to develop a mathematical method that will
2. This is Leibniz's famous product rule, albeit written in a slightly unfamiliar notation.
allow us to work with finite but very small quantities rather than to attempt an algebra of
zeros. The development of these techniques will pave the way for calculus. It will turn out
that there are many mysteries which will need to be vanquished and even the most beguilingly
simple expressions will acquire a clear definition in this quest. For example if you paused when
reading the expression f g and wondered just what is a product of functions - you would be
asking a good question, one that we will answer as the course develops.
Course tutorials
The tutorials will start in the second week of the teaching term.
Your tutorial group will be shown on your personal timetable, where you will also find the
classroom number for your group. Attendance at tutorials is registered; if you repeatedly fail to attend
and/or fail to hand in solutions to tutorial exercises you may not be allowed to sit
the January examination. Of course, more importantly, by not trying the problems you will
not get to enjoy many of the intricacies of analysis.
(ii) There will be a final (2 hour) examination in early January, counting for 80% of the
overall mark. Make sure you are able to sit the exam in January (i.e. don't book travel
tickets before the exam dates are announced - usually in November).
There will be no resits for class tests. See the KEATS page on class tests and course website for
more details. Class test results will be announced on the class test KEATS page and solutions
will also be published there.
Reading week
Reading week occurs in the middle of the first teaching semester - please check the current
term dates to be certain which week is reading week. It is a week with no scheduled teaching and
is intended as a time for catching up on the material in all your courses: there are no lectures,
tutorial classes or office hours in this week.
Tutorials
• Your tutorial group will normally consist of about 20 students. Tutorials are not like mini-
lectures, but more like small workshops; they involve and demand active participation
from the group.
• The idea for the tutorials is that at the start of the first tutorial you form small groups
of (say) 3-4 people, and work together in these small groups for the rest of the semester.
Each week you work during the tutorial hour on the problems set for that particular week.
You will find these problems at the end of the course lecture notes.
• The teaching assistant will go around the room and make herself/himself available to
anyone who gets stuck. Their role, however, is not to give complete ‘model solutions’;
you learn more from twenty minutes struggling with a problem than from two minutes of
being told by someone else how it is done. Discussing problems is also one of the
most efficient ways of putting new material into your long-term memory. The assistant
will only go through model solutions on the board if it turns out that there are particular
problems everyone got wrong (in her/his judgement).
• You will be expected to each present some solution or partial solution to a problem on
the board to your classmates at some point in the tutorials. This is to give you some
experience of presenting mathematics to a group, a skill which is in high demand in the
modern world (both academic and otherwise).
• You will usually not be able to do all exercises during the tutorial (they tend to be too
hard and/or too many for that), but you will submit your solutions to those questions
marked with ? (written and explained properly) to your tutor at the end of the tutorial.
Normally you will need to commence writing your solutions in advance of the tutorial.
• After you have handed in your work for a particular week, the solutions will be made
available for you to download from the Calculus I page on KEATS. The teaching assistant
will inspect your solutions and return the work to you at your next tutorial.
• Experience shows that if you have understood the tutorial exercises (perhaps not in the
week you first tried to do them, but upon further study), you will also be successful in
the Calculus I class tests and in the January examination.
• You will know your tutorial group allocation from your personal timetable. But if you
are uncertain which group you should be in please write to [email protected].
Check the Calculus I KEATS page regularly to stay updated on any changes.
The book by Jenny Olive (which is available online from the library via the “Resource List”
on the right-hand-side of the Calculus I KEATS page) is written in a very informal style, and
intended for ‘science’ students, rather than mathematics students; in parts it is rather basic, as
well as missing out some of the more advanced topics, but it has well written sections on most
of the material covered in this module and provides many examples - it also has hamsters. The
book by Michael Spivak has yellow pigs. The book by James Stewart is extremely popular.
A fourth book which might also prove useful is
Again this is not intended for mathematics students, but covers almost all the topics in the
module in a very straightforward and clear way with many worked examples, and might prove
useful if you find the other books too technical or need to see more examples.
There are a number of other very good books introducing more advanced topics which
also cover in their introductory chapters some of the material in the Calculus 1 course. In
particular complex numbers are not usually covered as part of a first course in calculus and
you may not find complex numbers discussed in the books above. There are many excellent
books on complex analysis which tend to begin with a clear introduction to complex numbers.
I recommend
Topics
In which we introduce the most important character in the course: the function. We will define
functions of one variable as a map and we will build up a catalogue of functions and study their
properties. In our thinking we will come across the infinite sum which will foreshadow the idea
of an analytic function which we will chase until we meet it formally at the culmination of the
course.
1. Practically the set A indicates the possible inputs to a function, while the set B specifies
where the output resides.
2. The definition above defines the qualities of any function, but to define a particular func-
tion one must state how it acts on each element of A. Mathematically we write the action
of a function which associates an element x ∈ A to an element y ∈ B as f : x ↦ y. N.B.
The symbol ↦ is used when a map is acting on a single element in a set, while the symbol
→ is used to indicate a map between sets of elements.
[Figures: arrow diagrams of maps from a set A = {x1 , x2 , . . . } to a set B = {y1 , y2 , . . . },
illustrating which assignments of elements do and do not define functions; in one of the
diagrams there is no element in A which is mapped to y5 in B.]
5. The definition of a function f : A → B requires that the output of f is a unique element
of B for each input: a map which sends each element of A to exactly one element of B
is a well-defined function, whereas a map which sends some element of A to two different
elements of B is not.

[Figures: two arrow diagrams from A to B, one defining a function and one sending a single
element of A to two different elements of B.]
2. It is natural to define the range of f in an abstract way, as f (x) for all elements x ∈ A.
Hence we might define a function as f : A → B by f : x ↦ f (x) for all x ∈ A, where
y = f (x) ∈ B.
3. Where the domain and range are known, we might swiftly refer to a function as f (x)
where x lies in the domain. This is a useful algebraic notation that will enable us to
define functions whose domain and ranges are infinite sets, such as the number sets N,
Z, Q, R or C.
4. The domain and range of a function might be unions of sets, for example
f : R → (−∞, −3] ∪ [7, ∞),    f (x) = { x² + 7   if x > 0
                                        x − 3    if x ≤ 0 }.
5. The range B of a function f : A → B needs only to contain the elements f (x) for each
element x ∈ A. In particular, as we saw earlier, the range B may contain elements which
are not mapped to by f . For example,
f : R → R,    f (x) = { x² + 7   if x > 0
                        x − 3    if x ≤ 0 }
1. Note that conventions differ in the definition of the range; we have identified the range and co-domain in
this course, however another common convention is to identify the range with the image. It is very common
for fundamental definitions to differ in mathematics; we take the view of Shakespeare's Juliet when she said
‘what’s in a name? That which we call a rose by any other name would smell as sweet.’
is well-defined.
When defining a function it is necessary to specify the domain and range. The same op-
eration applied over different ranges and domains gives a different function. As we will see in
some of the following examples the same functional operation defined over different domains
and ranges may not even give a well-defined function.
f : R → R, f (x) = x + 2.
The function defined above is very simple: it translates the real number line by +2. It is
because we have this picture in mind that we are immediately content that the output is unique,
so this is a well-defined function. The specification of the domain tells us that x ∈ R, and
the definition of the range that f (x) ∈ R. Evidently various parts of the definition of the
function are independent; it is up to us as mathematicians to check that the domain and range
are suitable for any explicitly defined function f (x). There are many ways we could pick the
domain and range so that this function is not well-defined, e.g. if we had changed the range so
that f : R → Z for f (x) = x + 2, the range would simply be incorrect, since x + 2 is not an
integer for most real x.
Here we have a complicated definition for a function, which is perfectly valid (as you should
check). Notice that the interval of all non-negative numbers is written [0, ∞): it is open in the
positive direction and never reaches ∞, since ∞ is not a real number² .
Example 2.3. Is

f : R → R,    f (x) = ±√x

a well-defined function?
2. It is possible to extend the real number line to include ±∞. This is denoted R ∪ {−∞, +∞} or sometimes
R̄ and is called the affinely extended real number line.
It is not well-defined for two reasons. First, f (x) gives two outputs for every input. So let
us correct this and redefine the function as

f : R → R,    f (x) = √x .

We emphasise that the notation √x denotes the principal square root of a number, i.e. its
positive square root. The function is still not well-defined, for if x < 0 then f (x) ∉ R. However
we may again amend the definition to

f : R+ → R,    f (x) = √x .

This is well-defined, even though there is some redundancy in the range; we could have tightened
the definition to

f : R+ → R+ ,    f (x) = √x .

We have restricted our examples to simple algebraic functions but we could certainly have
constructed well-defined functions of a more abstract and curious nature, e.g.

f : R → {0, 1},    f (x) = { 1   if x ∈ Q
                             0   if x ∉ Q }.
Figure 2.1: A log plot of the number of google hits of the integers from 0 to 10¹² .
For example the following defines a surjective function as every element of B is mapped to
from A:
[Figure: an arrow diagram of a map f : A → B in which every element of B is the image of at
least one element of A.]
We can write the definition of a surjective function in notation as follows. If, for f : A → B,
∀y ∈ B ∃x ∈ A such that y = f (x) then f is a surjection.
For example the following defines an injective function, as each element of A is associated
with a different element of B:

[Figure: an arrow diagram of a map f : A → B in which no two elements of A are sent to the
same element of B.]
We can write the definition of an injection in mathematical notation as follows. If, for f : A → B,
∀ x1 , x2 ∈ A, f (x1 ) = f (x2 ) =⇒ x1 = x2 , then f is an injective function.
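For maps between finite sets the two definitions can be checked mechanically by inspecting the list of outputs. The following Python sketch is an illustrative aside (the sets and the map are made-up examples, not taken from the notes).

def is_injective(f):
    # Injective: no two inputs share the same output.
    outputs = list(f.values())
    return len(outputs) == len(set(outputs))

def is_surjective(f, B):
    # Surjective: every element of B is the output of some input.
    return set(f.values()) == set(B)

A = ["x1", "x2", "x3"]
B = ["y1", "y2", "y3"]
f = {"x1": "y2", "x2": "y1", "x3": "y2"}   # a made-up map from A to B

print(is_injective(f))       # False: x1 and x3 are both sent to y2
print(is_surjective(f, B))   # False: y3 is never reached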
Example 2.4. State which, if any, of the following functions are injective, surjective or both?
The functions considered are f1 : R → R, f1 (x) = x² ; f2 : R → R, f2 (x) = x³ ; and
f3 : R+ → R+ , f3 (x) = √x .

1. Consider first f1 (x) = x² , whose graph is sketched below.

[Sketch: the parabola y = x² for −4 ≤ x ≤ 4.]
The graph shows the function as a map from a point on the x-axis to a point on the y-axis
and these points are paired together to give the coordinates of the curve we have sketched.
If you wanted to draw an arrow from a real number on the x-axis to find its image in the
real numbers on the y-axis you would draw a vertical line from the number on the x-axis
until it meets the curve at a point and then draw a horizontal line from that point until
it meets the y-axis. Now we can ask if f1 is surjective: is every number on the y-axis in
the image of f1 ? We can see from the graph, which is always positive, that none of the
negative numbers on the y-axis are in the image of the function. To prove the function is
not surjective we need only show that one element in R (which is the specified range of f1 )
is not in the image of f1 . E.g. suppose that −1 were in the image of f1 ; then there would
exist some fixed value x0 ∈ R such that −1 = x0² . As every real number squares to a non-negative
real number, we see that x0 ∉ R. This contradicts the assumption that x0 ∈ R and is an
example of a proof by contradiction: we have made an assumption, then developed an
expression logically using that assumption, and shown that it leads to a contradiction - the
only possibility is that our assumption was not correct. Hence there is no element in R
that is mapped to −1 by f1 , hence f1 does not map onto all of R, so f1 is not surjective.
From the graph we can also quickly deduce that f1 is not injective either, as both x and
−x are mapped to the same point on the y-axis as the curve is symmetric about the line
x = 0. As two different numbers are mapped to the same number on the y-axis we see
that the function is not one-to-one. If we had wanted to prove that f1 is not injective
algebraically we would need only to give one example where the map is not one-to-one
(e.g. f1 (2) = 4 = f1 (−2)).
2. Consider next f2 : R → R given by f2 (x) = x³ . Its graph is sketched below,

[Sketch: the curve y = x³ for −4 ≤ x ≤ 4,]
which indicates that f2 is both surjective and injective. To prove that f2 is surjective
then we need to show that for any value y ∈ R we can identify a value x ∈ R for which
f2 (x) = y, i.e. we must solve x3 = y for x in terms of y. We readily obtain that x = y 1/3
which exists in R. To prove injectivity, we may use a proof by contradiction. Suppose
there exists x1 ≠ x2 , both in R, such that f2 (x1 ) = f2 (x2 ); then this implies that x1³ = x2³
and by cube-rooting both sides⁴ we have x1 = x2 , contradicting our assumption, hence f2
must be an injective function.
4. Taking the cube-root, or raising to the power of one-third, is a one-to-one map.
3. Finally, consider f3 : R+ → R+ given by f3 (x) = √x , whose graph is sketched below.

[Sketch: the curve y = √x for 0 ≤ x ≤ 5.]
The square-root function is both injective and surjective (defined as a map from R+ →
R+ ).
In the examples above both f2 and f3 are bijections. Bijections are very useful functions
because by being both injective and surjective between two sets A and B it means that every
element in A is uniquely paired with an element in B and vice versa. This implies that a
bijective function can be inverted.
Now if we are given a bijection how do we find the inverse function? We can do it in three
steps:

(i) Write y = f (x).

(ii) Rearrange this expression to express x in terms of y, i.e. find a function g such that x = g(y).

(iii) Then, as y = f (x) = f (g(y)), we have g(y) = f −1 (y), or, changing the variable label, g(x) = f −1 (x).
This procedure is straightforward, for example, if we were trying to find the inverse function
to f (x) = x + 4 where x ∈ R, we would write
y = x + 4 =⇒ x = y − 4 ≡ g(y) = f −1 (y)
therefore f −1 (x) = x − 4 as expected. However it is worth practising further for yourself so you
are happy that all the steps are sensible.
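A quick numerical sanity check (an illustrative aside, not part of the notes): the function g obtained by the three steps really does undo f, in the sense that g(f(x)) = x and f(g(x)) = x.

def f(x):
    return x + 4          # the original bijection f : R -> R

def g(x):
    return x - 4          # the candidate inverse found by rearranging y = x + 4

for x in (-2.5, 0.0, 7.1):
    assert g(f(x)) == x and f(g(x)) == x
print("g undoes f at the sample points")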
Many common functions are not bijections, but we can restrict the domain and range of
their definition so as to construct a bijection which can then be inverted on the restricted
domain and range. We will consider a simple example in this section and then in later sections
we will look at more involved inverse functions.
Let us consider the quadratic function f : R → R given by f : x ↦ x² , whose graph is

[Sketch: the parabola y = x² for −4 ≤ x ≤ 4.]
The claim that we cannot invert this function is equivalent to stating that f −1 : R → R given
by f −1 (x) = √x is not a well-defined function. We can see why: the (real) square-root is not
defined on the negative real numbers. The function f fails to be a bijection because it is not
injective nor is it surjective onto R. We can see both these observations from the graph. The
image of f is R+ , so that if we were to change the definition of f and restrict its range as
f : R → R+ by f : x ↦ x² , then the new function would be surjective. Furthermore we can
see from the graph that it is generally a two-to-one function (apart from at zero): if we drew
a horizontal line across the graph for any non-zero, positive value on the y-axis, the line would
meet the function at two points. However if we restricted the domain of the function’s definition
to, for example, just R+ , it would become an injective function. In other words f : R+ → R+
given by f : x ↦ x² is both an injective and a surjective function, so it is a bijection and its
inverse function exists and is given by f −1 : R+ → R+ such that f −1 (f (x)) = x for all x ∈ R+ .
This is a function we have a short-hand for, namely it is given by f −1 (x) = √x - and this is
the way the square-root function is constructed.
Before moving on, let's consider the graph of the bijective function f : R+ → R+ given by
f : x ↦ x² :

[Sketch: the curve y = x² for 0 ≤ x ≤ 5.]
This is the restriction of the parabola f (x) = x² constructed over R, and compared to the
graph above we see that by restricting the domain we have effectively cut the function at its
turning point and by stopping it turning we have ensured that what remains is a one-to-one
function. This idea of restricting a function’s domain to lie between its turning points so as to
make it injective on the new domain is how we will set about constructing more complicated
inverse functions. We will make one final remark about inverse functions here, as f : A → B
and f −1 : B → A the graph of the inverse function is given by the graph of the original function
but with the x-axis and y-axis interchanged. Hence if we were to plot both f and f −1 on the same
set of axes we would find that f −1 is the reflection of f in the line y = x. For the example we
have considered here we plot both f and f −1 below so we can see that one is a reflection of the
other.

[Sketch: y = x² and y = √x plotted on the same axes for 0 ≤ x ≤ 5; each is the reflection of the
other in the line y = x.]
Comment(s). The notation x^n , to denote the n'th power of x, was first introduced by Descartes
in his book Géométrie, published in 1637 = (1637)^1 (1637 is a prime number so its prime fac-
torisation, written in exponent notation, is 1637^1 ).
It is clear from the definition of x^n that n must be a positive integer: what does it mean to
multiply a number by itself a negative number of times? Or a fractional number of times? Or
even zero times?
The definition allows us to develop some rules for multiplying powers of numbers, informed
by our understanding of multiplication of the real numbers, but written in the short exponent
notation.
x^n × x^m = (x × x × · · · × x) × (x × x × · · · × x) = x × x × · · · × x = x^{n+m} ,

where the first bracket is the n-fold product of x's, the second the m-fold product, and the result
the (n + m)-fold product of x's, and where n, m ∈ Z+ .
We may also use the exponent notation to divide powers of x:

x^n / x^m = (x × x × · · · × x) / (x × x × · · · × x) = x × x × · · · × x = x^{n−m}    if (n − m) ∈ Z+ ,

where the numerator is the n-fold product of x's, the denominator the m-fold product, and the
result the (n − m)-fold product.
This last condition is a little restrictive and comes from our definition of xn which is only
defined when n is a positive integer. Using the rule for multiplication of powers we realise that
we can give meaning to x^{−m} (m ∈ Z+ ) as

x^{n−m} = x^n × x^{−m} = x^n / x^m .

Hence we have constructed a logical definition for the meaning of x^n when n ∈ Z− , namely:

x^{−n} ≡ 1 / (x × x × · · · × x)    (with the n-fold product of x's in the denominator).
To recap, we have extended the definition of xn from n being a positive integer, to n being a
positive or a negative integer. From this the definition of x^0 also follows quickly as

x^0 ≡ x^{n−n} = x^n × x^{−n} = (x × x × · · · × x) / (x × x × · · · × x) = 1,

where both the numerator and the denominator are the n-fold product of x's.
Consequently we have extended our definition of xn to any integer power, i.e. when n ∈ Z.
Turning again to our knowledge and experience of multiplication of the real numbers we
also understand that
(x^m)^n = x^m × x^m × · · · × x^m = (x × · · · × x) × (x × · · · × x) × · · · × (x × · · · × x) = x^{m×n} ,

where there are n factors of x^m , each of which is an m-fold product of x's.
Comment(s). It is customary to drop the multiplication symbol and write m × n = mn when,
as here, the meaning is obvious. Hence one writes (x^m)^n = x^{mn} = x^{nm} .
We can now extend the notion of the exponent to any rational power (Q) by noting that

(x^{p/q})^q = x^{q·p/q} = x^p .

Therefore x^{p/q} is the q'th root of x^p ; e.g. if p = 1 and q = 2 then x^{1/2} = √x is the
square root, if p = 1 and q = 3 then x^{1/3} = ∛x is the cube-root, and if p = 2 and q = 5 then
x^{2/5} = (x^2)^{1/5} is the fifth root of x^2 . With this interpretation we have extended the range
of our definition of x^n to n ∈ Q.
Exercise 2.2. Check that the laws for the manipulation of powers (i.e. (x^n)(x^m) = x^{n+m} and
(x^m)^n = x^{nm} ) are valid for n, m ∈ Q.
It is a little more challenging to go further and generalise to real powers of x, i.e. when
n ∈ R. We can use the definition of the exponent when n ∈ Q to identify a rational number that
is arbitrarily close to any real number. Specifically given a degree of accuracy we can always
identify a rational number p/q that is within that degree of accuracy of any irrational real
number⁵ . For example, suppose we wish to express π as a fraction to two decimal places; then
3.14 = 314/100 = 157/50, and we can now give meaning to x^{157/50} , which is x^π (up to 2 d.p.). If we wanted to
find a better approximation of x^π we can always find a more accurate rational approximation of
π, e.g. to eight decimal places 3.14159265 = 314159265/10^8 . As the rational approximations improve
we approach the meaning of x^π , although it seems we never meet it. This may worry you, but
such a definition involving a limiting procedure can be put on a definite footing and the subject
of limits will be a major topic later in this course.
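The limiting procedure can be imitated numerically. The Python sketch below is an illustrative aside (not from the notes): it uses better and better rational approximations p/q of π to evaluate 2^{p/q}, and compares the results with the library value of 2^π.

from fractions import Fraction
import math

x = 2.0
for max_denominator in (10, 100, 10**8):
    r = Fraction(math.pi).limit_denominator(max_denominator)   # rational approximation p/q of pi
    p, q = r.numerator, r.denominator
    # x**(p/q) is the q'th root of x**p, which the rules for rational exponents already define.
    print(p, "/", q, "->", x ** (p / q))

print("x**pi =", x ** math.pi)   # the value the rational powers approach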
1. In other words, ‘loga (y) gives the power to which I must raise a to get y’ i.e. loga (ax ) = x.
2. The graph of y = log2 x is shown in figure 2.2. This is the inverse function of 2x . As
f : A → B and f −1 : B → A, then you would expect that the inverse function is the
mirror image of the original function in the line y = x as indicated in figure 2.2.
3. The natural logarithm is defined as the logarithmic function when the base is e, and is
denoted ln, i.e. ln(x) ≡ loge (x). The graph of y = ln(x) is shown in figure 2.3. Notice
5. It is only when n is an irrational real number that we are lacking a definition of the exponent x^n .
Figure 2.2: A sketch of y = 2^x , y = x and y = log₂ x. The curve y = log₂ x is the reflection in
the line y = x of y = 2^x .
that broadly there is little difference in the sketch of the two logarithm graphs y = ln x
and y = log2 x.
(i) We will prove that log_a x · log_b a = log_b x, by raising b to the power of each side of this
equation. For the right-hand-side, by definition, we have

b^{log_b x} = x,

while for the left-hand-side

b^{log_a x · log_b a} = (b^{log_b a})^{log_a x} = a^{log_a x} = x.

Hence, as the exponentiation of b gives a unique output for each input (it is a one-to-one
function), we have shown the identity.
6. Note that proving identities is always a matter of showing that the two sides of the given equation are equal
- there is no need to derive the identity.
Figure 2.3: A sketch of y = ln(x). Notice that it is qualitatively similar to the graph of
y = log2 (x) shown in figure 2.2.
(ii) We will raise both sides of the identity log_a xy = log_a x + log_a y as a power of a and show
the result is identical. From the left-hand-side we have

a^{log_a xy} = xy,

while from the right-hand-side

a^{log_a x + log_a y} = a^{log_a x} × a^{log_a y} = xy.

As required.
(iii) We will use the same device again and raise both sides of the identity log_a x^y = y log_a x
as a power of a. From the left-hand-side we have

a^{log_a x^y} = x^y ,

while from the right-hand-side

a^{y log_a x} = (a^{log_a x})^y = x^y .
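A numerical spot-check of the three identities (an illustrative aside, not from the notes), using Python's math.log(x, base); the particular values of a, b, x and y are arbitrary.

import math

a, b, x, y = 3.0, 7.0, 5.0, 11.0
print(math.log(x, a) * math.log(a, b), math.log(x, b))       # log_a(x) log_b(a) = log_b(x)
print(math.log(x * y, a), math.log(x, a) + math.log(y, a))   # log_a(xy) = log_a(x) + log_a(y)
print(math.log(x ** y, a), y * math.log(x, a))               # log_a(x^y) = y log_a(x)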
4. We have not mentioned the domain and range of the polynomial functions as these func-
tions might be real polynomial functions (mapping from R to R) or complex polynomial
functions (mapping from C to C), or more generally from an abstract number field F to
F. The context of use of the polynomial functions will normally define clearly the range
and domain of a particular polynomial function. In this course we will consider real and
complex polynomial functions.
We have not formally introduced the derivative yet in these lecture notes, but it will be useful
to assume a simple result which we will prove later on, namely that the derivative of a monomial
term ax^n is anx^{n−1} , i.e. d/dx (ax^n) = anx^{n−1} . Armed with this result (and with some experience
of differentiation) we can see that a polynomial function of degree n which is defined by n + 1
coefficients {a0 , a1 , a2 , . . . an } could be defined in an alternative way. Rather than defining the
function globally and explicitly, we could reconstruct the function from knowledge of all its
derivatives at a point - this is a very local way to define a function, and what is surprising is
that it will be sufficient to reconstruct the function globally. Given the polynomial function

f (x) = a0 + a1 x + a2 x² + · · · + an x^n

we can compute the derivatives at the point x = 0 (any other point would do, but the expressions
would become very complicated), as shown in the table.

    m      d^m f /dx^m evaluated at x = 0
    0      a0
    1      a1
    2      2 a2
    3      6 a3
    ...    ...
    n      n! an

We see that from the values of the
multiple derivatives at a point we have extracted all the data about the coefficients from which
we could reconstruct the polynomial function. This illustrates an alternative way to define a
polynomial function: if we know the value of a function at a point and we know all its derivatives
at that point we can define the polynomial function.
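This local way of defining a polynomial can be made concrete with a few lines of code. The sketch below is an illustrative aside (not from the notes; the coefficients are a made-up example): it repeatedly differentiates a polynomial, reads off the value of each derivative at x = 0, and recovers the coefficients via a_m = f^(m)(0)/m!.

import math

coeffs = [2.0, -1.0, 0.5, 3.0]        # f(x) = 2 - x + 0.5 x^2 + 3 x^3

def derivative(cs):
    # Coefficients of the derivative of the polynomial with coefficients cs,
    # using d/dx (a x^k) = a k x^(k-1).
    return [k * cs[k] for k in range(1, len(cs))]

recovered = []
current = list(coeffs)
for m in range(len(coeffs)):
    recovered.append(current[0] / math.factorial(m))   # m'th derivative at 0 equals m! * a_m
    current = derivative(current)

print(recovered)    # [2.0, -1.0, 0.5, 3.0]: the original coefficients are recovered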
The polynomial functions are ubiquitous in mathematics, and we will meet many examples
in this course, but their definition immediately suggests a way to generalise them: could we
consider polynomial functions of infinite degree? This would mean the function would be an
infinite sum of terms and each term dependent on x, and we will meet such functions, known as
power series, later on in the course. However we raise the idea here because we will introduce
such a function now and it will be defined most naturally in terms of its derivatives at a point.
f (x) = e^x

where e is Euler's number and e^x is defined as being the function which is its own derivative
and equals 1 when x = 0.
1. Euler's number is named after the prolific Swiss mathematician Leonhard Euler, and is
most commonly simply called ‘e’. The value of the exponential function at x = 1 gives a
value for e.
2. There are alternative and equivalent definitions of the exponential function, such as

e^x ≡ lim_{n→∞} (1 + x/n)^n .

These definitions make use of the limit and the integral, concepts which we will define later
in this course.
Figure 2.4: The exponential function e^x (or exp(x)) and its inverse the natural logarithm ln(x).
5. For x > 0, e^x grows very quickly (‘exponential growth’) and is unbounded; correspondingly,
for x < 0, e^x decreases as x decreases and approaches zero as x approaches −∞. See the
sketch of e^x in figure 2.4.
We motivated the exponential function as a polynomial function of infinite degree; we will
introduce such power series later in the course, but let us develop an expression for e^x using
the definition of the exponential function above. Recall that we commented earlier that two
functions are identical if all their derivatives (including the zeroth derivative) at a point are
equal. We will presume the following action for the derivative of a power of x: d/dx (ax^n) = anx^{n−1} ;
we will prove this later, but it will benefit us to make use of this fact here to find another
expression for the exponential function. We will start by testing the assumption that e^x can be
written as a polynomial in x of finite degree n (and we will see how this fails), so let us write

e^x = Σ_{k=0}^{n} a_k x^k .
Now by the definition of the exponential function this derivative must equal our proposed
polynomial expression for e^x , i.e.

Σ_{k=0}^{n} a_k x^k = Σ_{k=0}^{n−1} a_{k+1} (k + 1) x^k .
Notice that the left-hand-side is a polynomial of degree n while the right-hand-side is a polyno-
mial of degree n − 1. Of course this makes sense as the derivative has decreased the maximum
power of x by one. We have hinted already how we can resolve this: we will allow the maximum
power of x to tend to infinity, and the sum becomes infinite, i.e. our candidate exponential
function becomes

e^x = Σ_{k=0}^{∞} a_k x^k

and the condition that the first derivative equals e^x becomes

Σ_{k=0}^{∞} a_k x^k = Σ_{k=0}^{∞} a_{k+1} (k + 1) x^k .
Can we pick coefficients a_k so that the left-hand-side equals the right-hand-side? If the expres-
sions are equal then the coefficient of each power of x must be equal on both sides, i.e. we
require a_k = (k + 1) a_{k+1} , or a_{k+1} = a_k / (k + 1), for all k. We can repeatedly use this to find:

a_{k+1} = a_k / (k + 1) = a_{k−1} / ((k + 1)k) = · · · = a_0 / (k + 1)!
where (k + 1)! = (k + 1) × k × (k − 1) × (k − 2) × . . . 2 × 1 is called the factorial of (k + 1) and
is defined as indicated for all k + 1 ∈ Z+ , but note that 0! ≡ 1 by definition. Putting together
our findings we now have

e^x = Σ_{k=0}^{∞} (a_0 / k!) x^k .
Now as e^0 = 1 we see that a_0 = 1, hence we have the series form of the exponential function

e^x = Σ_{k=0}^{∞} x^k / k! = 1 + x + x²/2! + x³/3! + x⁴/4! + · · ·     (2.1)
e = 2.71828182845904523536028747135266249775724709369995 . . . .
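A quick numerical check of the series (2.1) (an illustrative aside, not part of the notes): the partial sums at x = 1 approach the value of e quoted above.

import math

def exp_partial_sum(x, number_of_terms):
    # Sum of x^k / k! for k = 0, ..., number_of_terms - 1.
    return sum(x ** k / math.factorial(k) for k in range(number_of_terms))

for n in (2, 5, 10, 20):
    print(n, exp_partial_sum(1.0, n))   # approaches e as more terms are taken

print(math.e)                           # 2.718281828459045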
Definition 2.3.5. A function which can be locally written as a convergent power series is called
an analytic function.
The word ‘locally’ in the above definition means that the power series representation need only
be valid in the vicinity of a certain point. We will understand the reason for including this in the
definition when we study Taylor series. Polynomial functions, the exponential function, the trigono-
metric functions, power series and the logarithm function are all examples of analytic functions.
Definition 2.3.6. The unit circle is the set of points (x, y) satisfying x2 + y 2 = 1.
Definition 2.3.7. One radian is the angle subtended at the centre of a circle by an arc that is
equal in length to the radius of the circle.
In other words, the arc length on a circle of radius R is equal to Rθ where θ is the angle
(in radians) swept through by the radial vector for points along the arc. The radian is the
standard unit of angular measurement, and the circumference of a circle of radius R is 2πR.
Hence 2π is the length of the circumference of the unit circle. If one wishes to convert degrees
to radians then one uses the conversion, for the unit circle, 360° = 2π radians, or

1 radian = 360°/(2π) = 57.295779513 . . . ° .
Now we can define the trigonometric functions cos θ, sin θ and tan θ.
Definition 2.3.8. The points (x, y) = (cos θ, sin θ) are the Cartesian coordinates of points on
the unit circle whose radial vector (i.e. the straight line from the point (0, 0) to (x, y)) is at an
angle θ to the x-axis.
7. A transcendental number is any number which is not the solution of any non-constant polynomial equation
with rational coefficients.
Definition 2.3.9. The tangent function is given by tan θ ≡ sin θ / cos θ .
1. cos² θ + sin² θ = 1.

3. tan θ is not defined for θ = π/2 + nπ where n ∈ Z: for these values of θ, cos θ = 0.
4. cos θ and sin θ are both periodic, with period 2π, in other words cos (θ + 2π) = cos θ =
cos (θ + n2π) where n ∈ Z, and similarly for sin θ.
5. tan θ is periodic, but its period is π as sin(θ + π) = − sin θ and cos(θ + π) = −cos(θ) (see
below for proof of these statements).
Sketches of the trigonometric functions can be derived quickly from the definitions and are
shown in figure 2.5. In the comments above we made use of some fundamental properties of sine
and cosine that originate from their definition as defining the coordinate (x, y) = (cos θ, sin θ)
on the unit circle. Since any point on the unit circle can be written in terms of sine and cosine,
a transformation which generates a symmetry of the circle maps one point on the circle (x, y) to
another point (x′ , y′ ) also on the circle, and so also expressible in terms of sine and cosine. That
is, the symmetry transformations of the circle give rise to interesting transformations of sine
and cosine. We consider the effect of some simple symmetry transformations:
Figure 2.5: Sketches of the trigonometric functions sin x, cos x and tan x over the domain
[−5π/2, 5π/2].
In summary:
cos(−θ) = cos θ
sin(−θ) = − sin θ.
In summary:
cos(π − θ) = − cos θ
sin(π − θ) = sin θ.
In summary:
cos(θ + π) = − cos θ
sin(θ + π) = − sin θ.
x = cos θ → x′ = cos(π/2 − θ) ≡ y = sin θ
y = sin θ → y′ = sin(π/2 − θ) ≡ x = cos θ

In summary:

cos(π/2 − θ) = sin θ
sin(π/2 − θ) = cos θ.
The identities above agree with the compound-angle (addition) formulae for sine and cosine which are often
taught without proof in pre-university calculus. We will return to the trigonometric functions
and derive the addition and double-angle formulae in a later chapter.
We will now give a second, equivalent, algebraic definition of the basic trigonometric func-
tions in terms of a nice property of their derivatives.
The solutions f (x) to the differential equation

d²f /dx² = −f

are

f = cos x    if f (0) = 1 and df /dx (0) = 0,
f = sin x    if f (0) = 0 and df /dx (0) = 1.
We have asserted, in passing, that cos x and sin x are analytic functions. While we are not
in a position to check convergence yet, we can certainly identify the correct power series.
Example 2.5. Use the definition of cos x and sin x as solutions to d²f /dx² = −f with the appro-
priate boundary conditions to find their power series.
Let f (x) ≡ Σ_{n=0}^{∞} a_n x^n ; then

d²f /dx² = Σ_{n=0}^{∞} a_n n(n − 1) x^{n−2} = Σ_{n=0}^{∞} a_{n+2} (n + 2)(n + 1) x^n = − Σ_{n=0}^{∞} a_n x^n ,

so that a_{n+2} = − a_n / ((n + 2)(n + 1)), and hence we have

f (x) = a_0 Σ_{n even} (−1)^{n/2} x^n / n! + a_1 Σ_{n odd} (−1)^{(n−1)/2} x^n / n! .
Applying the boundary conditions for sin(x), where we have f (0) = 0 and f ′(0) = 1, gives a_0 = 0
and a_1 = 1, therefore

sin(x) = Σ_{n odd} (−1)^{(n−1)/2} x^n / n! = x − x³/3! + x⁵/5! − x⁷/7! + · · · .    (2.2)
We can take advantage of mathematical software to plot graphs of some of the partial sums of
the first few terms of sin(x) and compare the result against the graph of sin(x). Bearing in mind
that we expect the full sum to reproduce the sine function exactly, we may anticipate that the
sum of the terms up to and including order n in x, given by
x − x³/3! + x⁵/5! − x⁷/7! + · · · + (−1)^{(n−1)/2} x^n / n! (for odd n),
gives a better and better approximation to the sine function as n increases. Sketches
of the partial sums, together with the sine function, are shown in figure 2.6.
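The behaviour shown in figure 2.6 can also be reproduced numerically; the snippet below (an illustrative aside, not from the notes) evaluates the partial sums of (2.2) at the sample point x = 2 and compares them with sin(2).

import math

def sin_partial_sum(x, n_max):
    # Sum of (-1)^((k-1)/2) x^k / k! over odd k with k <= n_max.
    return sum((-1) ** ((k - 1) // 2) * x ** k / math.factorial(k)
               for k in range(1, n_max + 1, 2))

x = 2.0
for n_max in (1, 3, 5, 7, 9, 11):
    print(n_max, sin_partial_sum(x, n_max))   # approaches sin(2.0)

print(math.sin(x))                            # 0.9092974268...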
At this stage you should correctly feel concerned about the infinite sums and infinite prod-
ucts. How do we know that they are finite? Addressing this concern is one of the major topics
in analysis, and one which we will work on in this course.
In addition to the basic trigonometric functions there are a set of related and commonly
used functions - they appear so frequently in mathematics that we will define them now.
Figure 2.6: Building sin(x) as a power series, by taking more and more terms in the summation.
Dashed: sin(x). Solid: f (x) = Σ_{k odd, k ≤ n} (−1)^{(k−1)/2} x^k / k! for different choices of n (the
values of n shown are 1, 3, 5, 7, 9 and 11).
1. It is not defined for θ = π/2 + nπ where n ∈ Z. These are the same points where tan θ is
not defined, so you might suspect there is a connection between tan θ and sec θ and you
would be correct. . . .
Figure 2.7: A segment of the unit circle, subtending an angle of θ. The scale factor to map the
small right-angled triangle up to the large one is sec θ.
2. cot(θ + π) = cot θ.
But where do all these unusual names come from? Well, Latin is the short answer... the
sine function is derived from ‘sinus’ meaning ‘a bend’, while cosine is short for ‘complementary
sine’, the tangent originates from ‘tangere’ meaning ‘touching’ (see the sketch in figure 2.7) and
secant from ‘secare’ meaning ‘to cut’ (again see the sketch in figure 2.7 to get the sense behind
these names).
Definition 2.3.14. The inverse trigonometric functions are known by particular names. The
inverse sine function is denoted arcsin or sin−1 , the inverse cosine function is denoted arccos
or cos−1 and the inverse tangent function is denoted arctan or tan−1 .
Figure 2.8: A sketch of sin x highlighting its many-to-one nature. The horizontal line indicates
the line y = 1/2 and its multiple intersections with sin x are shown over the domain [−5π/2, 5π/2].
Comment(s). (On the inverse trigonometric functions...) The trigonometric functions are
all periodic, sine and cosine have period 2π while the tangent has period π, see the sketch in
figure 2.8. Consequently for a given θ there are infinitely many values θ′ , such that θ ≠ θ′
and sin(θ′ ) = sin(θ), cos(θ′ ) = cos(θ) and tan(θ′ ) = tan(θ). This means that the trigonometric
functions are not bijective functions, so they are not invertible functions on their full domain.
However by restricting the domain, one can then construct well-defined inverse functions.
From the graph in figure 2.8, and our knowledge of the continued periodicity of the trigono-
metric functions, we see that, for example, sin θ = 1/2 has an infinite number of solutions for θ,
i.e. θ = π/6 + 2nπ and θ = 5π/6 + 2mπ for all n, m ∈ Z. That is, there exist θ1 ≠ θ2 such that
sin θ1 = sin θ2 = . . ., so that on the domain of the real numbers R the sine function is not an
injective function and so the inverse sine function is not defined. The same is also true for the
inverse cosine function and the inverse tangent function. However if we limit the domain and
range we can find inverse trigonometric functions.
Our aim is to reduce the domain and range of the trigonometric function such that each
input gives a different unique output. Graphically this corresponds to restricting the sine and
cosine graphs to a domain between consecutive turning points. There are many ways to do this
and we give the standard definitions here.
Definition 2.3.15. The function arcsin (also denoted sin−1 ) is defined as the inverse of sin
restricted to the domain [−π/2, π/2], i.e.

arcsin : [−1, 1] → [−π/2, π/2],    arcsin(sin θ) = θ for all θ ∈ [−π/2, π/2].

The function arcsin x when sketched as a graph is a line segment and is a reflection in the
line y = x of sin x over the domain [−π/2, π/2]; this is shown in figure 2.9.
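The effect of the restricted range can be seen with the library inverse sine. In the sketch below (an illustrative aside, not part of the notes), math.asin always returns a value in [−π/2, π/2], so arcsin(sin θ) only recovers θ when θ already lies in that interval.

import math

for theta in (0.3, math.pi / 4, 2.5):          # 2.5 lies outside [-pi/2, pi/2]
    print(theta, math.asin(math.sin(theta)))   # returns 0.3, pi/4, and pi - 2.5 respectively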
Figure 2.9: A sketch of arcsin x and sin x over the domains [−1, 1] and [−π/2, π/2] respectively.
Notice how they are mirrored in the line y = x.
Figure 2.10: A sketch of arccos x and cos x over the domains [−1, 1] and [0, π] respectively.
Sketches of cos(x) and arccos(x) are shown on the same graph in figure 2.10.
Figure 2.11: A sketch of arctan x and tan x over the domains [−π, π] and (−π/2, π/2) respec-
tively.
Definition 2.3.18. The hyperbolic functions x = cosh(a) and y = sinh(a) are the coordinates
of points on the right-hand branch of the hyperbola x2 − y 2 = 1, as depicted in figure 2.12.
1. The notation cosh and sinh are shorthand for ‘hyperbolic cosine’ and ‘hyperbolic sine’.
2. There are multiple ways of pronouncing sinh (pronounced as either “shine” or “sinch”)
while cosh is pronounced as written (“kosh”).
4. The parameter a used to define the point (cosh(a), sinh(a)) is not an angle but has the
magnitude of an area, and may be positive or negative (negative if the point (x, y) is below the x-
axis). Its magnitude is twice the area shaded in figure 2.12, i.e. up to a sign it is the
area⁸ enclosed between the x-axis, the right-hand branch of the hyperbola and the straight
8. The simplest way to prove this is by integration - try and show this once you have met the integral later in
the course.
Figure 2.12: The hyperbolic functions x = cosh(a) and y = sinh(a) shown as points on the
right-hand branch of the hyperbola x2 − y 2 = 1.
line from the origin to the point x = cosh(a) and y = sinh(a). It is worth comparing this
definition of a with that of the argument of the standard trigonometric functions, where the
area of a sector of the unit circle subtending an angle θ at the origin is (1/2) θ r² = θ/2.
sinh : R → R
and
cosh : R → [1, ∞).
Definition 2.3.19. The hyperbolic tangent function is denoted tanh and defined by

tanh(a) = sinh(a) / cosh(a) .
2. tanh(a) is the gradient of the straight line through the origin and the point with coordinates
(cosh(a), sinh(a)) on the hyperbola.
Figure 2.13: Sketches of the hyperbolic functions as the curves y = cosh(x), y = sinh(x) and
y = tanh(x).
Sketches of the hyperbolic trigonometric functions are shown in figure 2.13. Related hyper-
bolic functions are given by
sech (x) ≡ 1 / cosh(x) ,
cosech (x) ≡ 1 / sinh(x) ,
coth(x) ≡ 1 / tanh(x) .
One can derive some properties of the hyperbolic functions from the symmetries of the
hyperbola:
9. The equation defining the hyperbola can be rewritten as the pair of curves y = √(x² − 1) and y = −√(x² − 1),
and for large x these equations approximate closely the lines y = x and y = −x.
In summary:
cosh(−a) = cosh(a)
sinh(−a) = − sinh(a).
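A numerical spot-check of these symmetry relations, together with the defining relation of the hyperbola, cosh²(a) − sinh²(a) = 1 (an illustrative aside, not part of the notes; the sample values of a are arbitrary).

import math

for a in (-1.5, 0.0, 2.0):
    print(math.isclose(math.cosh(-a), math.cosh(a)),
          math.isclose(math.sinh(-a), -math.sinh(a)))
    print(math.cosh(a) ** 2 - math.sinh(a) ** 2)   # equals 1 up to rounding error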
The circle has an infinite number of symmetries, but the hyperbola only obviously has two:
reflection in the x-axis and reflection in the y-axis. Why does reflection of the hyperbola in the
y-axis not generate an identity of the hyperbolic functions?
One can also define the hyperbolic functions algebraically as solutions to differential equa-
tions.
The solutions f (x) to the differential equation

d²f /dx² = f

are

f = cosh x    if f (0) = 1 and df /dx (0) = 0,
f = sinh x    if f (0) = 0 and df /dx (0) = 1.
This definition, coming prior to the definition of the derivative, is premature! However we
can make use of it (again assuming only that d/dx (ax^n) = anx^{n−1} ) to find power series.
Example 2.6. Use the definition of cosh(x) and sinh(x) as solutions to d²f /dx² = f with the ap-
propriate boundary conditions to find their power series.
Let f (x) ≡ Σ_{n=0}^{∞} a_n x^n ; then

d²f /dx² = Σ_{n=0}^{∞} a_n n(n − 1) x^{n−2} = Σ_{n=0}^{∞} a_{n+2} (n + 2)(n + 1) x^n = Σ_{n=0}^{∞} a_n x^n .
Definition 2.3.21. The inverse hyperbolic functions are called arcsinh or sinh−1 , arccosh or
cosh−1 and arctanh or tanh−1 and are the inverse sinh, cosh and tanh functions respectively.
Recall the graphs of the hyperbolic functions which are shown in figure 2.13. These functions
are not periodic, and only cosh “doubles back on itself” and is the only one of the three which
is not injective and whose inverse function will have a restricted domain. There are some
further pleasant qualities of the hyperbolic functions which will allow us to find simple analytic
expressions for their inverse functions. Our construction will rest upon the power series for the
exponential function10 which we recall is:
e^x = Σ_{n=0}^{∞} x^n / n! = 1 + x + x²/2! + x³/3! + x⁴/4! + x⁵/5! + · · ·
10. We should note that we are building up a lot of debt here: we have assumed that the derivative of ax^n is
anx^{n−1} and that the infinite sum for e^x converges - we will pay off our debts in due course.
Replacing x by −x gives

e^{−x} = Σ_{n=0}^{∞} (−1)^n x^n / n! = 1 − x + x²/2! − x³/3! + x⁴/4! − x⁵/5! + · · · .
While, earlier, in a similar analysis in equations (2.4) and (2.5) we found that

sinh(x) = Σ_{n odd} x^n / n! = x + x³/3! + x⁵/5! + x⁷/7! + · · ·    and

cosh(x) = Σ_{n even} x^n / n! = 1 + x²/2! + x⁴/4! + x⁶/6! + · · · .
Hence we notice that our power series for the two hyperbolic functions can be rewritten in
terms of the exponential function, that is,
sinh(x) = (e^x − e^{−x}) / 2    and    (2.6)

cosh(x) = (e^x + e^{−x}) / 2 .    (2.7)
Using these expressions we can find simple analytic expressions for arcsinh , arccosh and
arctanh .
Example 2.7. Find an analytic expression for the inverse sinh function.
Let us write y ≡ sinh x = (1/2)(e^x − e^(−x)) and rearrange this expression to find x as a function
of y, i.e. x(y). Putting all terms on the same side of the equation we find
e^x − e^(−x) − 2y = 0.
Multiplying through by e^x gives a quadratic in e^x,
(e^x)² − 2y e^x − 1 = 0,   so   e^x = y ± √(y² + 1),
and since e^x > 0 we must take the positive root. Therefore
x = ln(y + √(1 + y²)).
That is
arcsinh(y) = ln(y + √(1 + y²))   ∀ y ∈ R.
A sketch of the graph of arcsinh is given in figure 2.14; you should take some time to
confirm the plot agrees with your expectations of the analytic function we have derived in the
previous example.
Figure 2.14: Sketches of the curves y = sinh(x) and y = arcsinh(x).
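As a quick numerical sanity check (a sketch only, not part of the derivation), one can compare the expression we derived with the standard-library inverse sinh, math.asinh, which is used here purely for comparison:

import math

# Compare ln(y + sqrt(1 + y^2)) with the library inverse sinh at a few points.
for y in [-3.0, -0.5, 0.0, 1.0, 10.0]:
    ours = math.log(y + math.sqrt(1.0 + y * y))
    print(y, ours, math.asinh(y), abs(ours - math.asinh(y)))
# The differences are at the level of floating-point rounding.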
Euler’s Formula
The complex numbers C introduce the imaginary number i = √−1 to the real numbers and
consist of all numbers of the form x + iy where x, y ∈ R. The complex numbers C are an
extension of R, and we may wonder whether we can extend the range of validity of the exponent
a^n to include the case where n ∈ C. For example just what could an expression such as a^i
mean? What is the sense in saying it means we multiply a by itself i times? Not much, but
this is just the kind of loose end we might have expected when investigating the complex
numbers, and we must hope that we can give such expressions some logical meaning.
Recall that we can write e^x, cosh x and sinh x as infinite sums, and even relate the formulae
to find exponential expressions for the hyperbolic functions. Now that we have introduced the
complex number i we can also re-express our power series for sine in equation (2.2) and for
cosine in equation (2.3) in terms of the exponential in a similar way. Recalling our power series
for e^x given in equation (2.1) we can now add in the imaginary number to find:
e^(ix) = Σ_{k=0}^∞ (ix)^k/k! = 1 + ix − x²/2! − i x³/3! + x⁴/4! + i x⁵/5! − x⁶/6! − i x⁷/7! + ...
and
e^(−ix) = Σ_{k=0}^∞ (−ix)^k/k! = 1 − ix − x²/2! + i x³/3! + x⁴/4! − i x⁵/5! − x⁶/6! + i x⁷/7! + ...
where we have used our definition of i, that i² = −1, to expand out and simplify the powers of
i. By comparing with the power series proposed for sin(x) and cos(x) in equations (2.2) and
(2.3), we deduce that
sin x = (1/2i)(e^(ix) − e^(−ix))   (2.9)
cos x = (1/2)(e^(ix) + e^(−ix)).   (2.10)
We may now invert these equations to find an expression for eix in terms of cos x and sin x.
This gives us Euler’s formula.
Definition 2.3.22. Euler’s formula is
e^(iθ) = cos θ + i sin θ
where θ ∈ R.
Leonhard Euler (1707-1783) was a Swiss mathematician who played
a pivotal role in the development of analysis. Euler, pronounced “Oiler”
rather than “U-ler”12 , is widely celebrated as one of the greatest mathe-
maticians to have ever lived. While measures of the greatness of mathe-
maticians are subjective, what is not subjective is how prolific Euler was,
he had 13 children13 and still managed to write over 800 papers! It was
said of Euler that “he calculated just as men breathe, as eagles sustain
themselves in the air”.
Our development of Euler’s formula has been based on a number of assumptions which we have
yet to prove, so let us at this stage use some of our prior knowledge of calculus to check that
the formula is, at least, sensible. So forgoing the idea that we have yet to discover the derivative
we note that:
d/dθ (e^(iθ)) = i e^(iθ)   while   d/dθ (cos θ + i sin θ) = − sin θ + i cos θ = i(i sin θ + cos θ) = i e^(iθ).
Figure 2.15: Leonhard Euler.11
Thus this remarkable formula of Euler’s passes a first test - why did we do this? We have no
good handle on an imaginary power, nothing in our experience with which to check the logic
of the formula and we have developed our power series on the back of a set of ideas which we
have yet to convince ourselves of. So it is reassuring to check that this formula agrees with our
pre-knowledge of calculus. We will make much use of it in the following.
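For those who also like a numerical check, here is a small sketch (it uses Python's standard cmath module purely as a convenience; nothing here is part of the development in the notes) comparing the two sides of Euler's formula at a few angles:

import cmath, math

# Compare e^{i*theta} with cos(theta) + i*sin(theta) for a few real angles.
for theta in [0.0, 0.5, math.pi / 3, math.pi, 2.5]:
    lhs = cmath.exp(1j * theta)
    rhs = complex(math.cos(theta), math.sin(theta))
    print(theta, abs(lhs - rhs))   # differences are at floating-point rounding level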
11
This portrait is by Jakob Handmann. Euler is winking at you. No he isn’t, in later life he had problems
with his eye, although it is understood he developed a cataract in his left eye, so his closed right eye in this
portrait is puzzling.
12
See the 2014 film “The Imitation Game” for a good example of how not to pronounce Euler.
13
This is not what I mean by prolific!
3. When θ = π we have eiπ = −1. This remarkable statement is known as Euler’s identity
and is frequently celebrated as it relates e, i, π and −1. See figure 2.16 for many rapturous
comments on this identity.
4. eiθ has a period of 2π inherited from the trigonometric functions, and indeed ei2π = 1.
This will be of concern in defining its inverse function, the natural logarithm, over the
complex numbers.
Double-Angle Formulae
It will prove very useful to us to develop the double-angle formulae for sine and cosine (i.e. to
rewrite the expressions cos(θ + φ) and sin(θ + φ) in terms of cos θ, cos φ, sin θ and sin φ). Due
to Euler’s formula we now have a very simple way to quickly write these formulae down.
Let us multiply two complex numbers, both of modulus 1:
e^(iθ) e^(iφ) = e^(i(θ+φ)).
There appears to be nothing to this, but notice that if we use Euler’s formula on the left we will
have trigonometric functions of single angles, while on the right it will give us trigonometric
functions of the sum of the angles:
(cos θ + i sin θ)(cos φ + i sin φ) = cos(θ + φ) + i sin(θ + φ).
If we equate the real part on each side and the imaginary part on each side of the above we
arrive at the double-angle formulae for cosine and sine:
cos(θ + φ) = cos θ cos φ − sin θ sin φ,
sin(θ + φ) = sin θ cos φ + cos θ sin φ.
Exercise 2.5. Use the double angle formulae for cosine and sine to show that
tan(θ + φ) = (tan θ + tan φ)/(1 − tan θ tan φ).
Comment(s). (On the double angle formulae...)
1. When φ = θ we have
cos(2θ) = cos²θ − sin²θ   and   sin(2θ) = 2 sin θ cos θ.
3. The difference formulae are quick to derive (using sin(−x) = − sin(x) and cos(−x) = cos(x)):
cos(θ − φ) = cos θ cos φ + sin θ sin φ,
sin(θ − φ) = sin θ cos φ − cos θ sin φ.
4. One can convert sums of trigonometric functions into products and vice-versa:
(1/2)(cos(θ + φ) + cos(θ − φ)) = cos θ cos φ
(1/2)(cos(θ − φ) − cos(θ + φ)) = sin θ sin φ
(1/2)(sin(θ + φ) + sin(θ − φ)) = sin θ cos φ
(1/2)(sin(θ + φ) − sin(θ − φ)) = sin φ cos θ

cos(α) + cos(β) = 2 cos((α + β)/2) cos((α − β)/2)
sin(α) + sin(β) = 2 sin((α + β)/2) cos((α − β)/2)
cos(α) − cos(β) = −2 sin((α + β)/2) sin((α − β)/2)
sin(α) − sin(β) = 2 sin((α − β)/2) cos((α + β)/2)
Exercise 2.6. Write cos θ, sin θ and tan θ in terms of t = tan(θ/2). (Motivation: these formulae
are very useful for solving otherwise difficult integrals.)
There are some simple relations between the trigonometric functions and the hyperbolic func-
tions which are evident in the similarity of the analytic expressions. By allowing the arguments
of cos, sin, cosh and sinh to be extended to the complex numbers we can use the analytic
expressions to find direct relations between the trigonometric functions with argument θ and
the hyperbolic functions with argument iθ:
cos θ = (1/2)(e^(iθ) + e^(−iθ)) = cosh(iθ)
sin θ = (1/2i)(e^(iθ) − e^(−iθ)) = −i sinh(iθ)
tan θ = −i sinh(iθ)/cosh(iθ) = −i tanh(iθ).
Notice that we can use our relations above to convert the fundamental trigonometric identity
cos²θ + sin²θ = 1
into
cosh²(iθ) − sinh²(iθ) = 1.
Similarly we can also rewrite hyperbolic functions with argument θ as trigonometric functions
of iθ with complex coefficients:
cosh θ = cos(iθ),   sinh θ = −i sin(iθ),   tanh θ = −i tan(iθ).
We are already familiar with simple point-wise ways of combining two functions f(x) and g(x), for example their sum and their product,
f(x) + g(x),   f(x)g(x).
This gives us a clue to consider a more abstract way to combine functions: functions of functions.
It is evidently simple for us to combine functions abstractly as
f (g(x)).
Definition 2.3.23. Function composition is the application of one function f to the output of
another function g which is defined point-wise for x ∈ R (for each value in the domain at a
time) and is denoted
(f ◦ g)(x) ≡ f (g(x)) ∀ x ∈ R.
Note the use of the common mathematical shorthand ∀ which means “for all”. By composing
two functions we link them together in the sense that the output of one function (g(x) above)
becomes the input of another function (f (x) above).
We note that function composition is very common. Consider classical mechanics, which was
a major motivation for the development of the calculus, where one might expect a speed v to
be defined as a function of a position x, which in turn may be defined as a function of time t,
i.e. v(x(t)). This function v is naturally written in terms of x, when it is called an implicit
function of t but an explicit function of x. Of course if we know x(t) then we can write the
composition v(x(t)) as an explicit function of t denoted v(t). For example if v = ax + b and
x = t² then we can also define v(t) = at² + b.
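The same idea is easy to express in code. The following is a minimal sketch of composing the two functions just mentioned; the particular constants a and b and the helper compose are illustrative choices, not anything defined in the notes:

# v(x) = a*x + b with x(t) = t**2, so the composite v(x(t)) = a*t**2 + b.
a, b = 2.0, 1.0              # illustrative constants

def x(t):
    return t ** 2            # position as a function of time

def v(x_value):
    return a * x_value + b   # speed as a function of position

def compose(f, g):
    return lambda t: f(g(t))   # (f o g)(t) = f(g(t))

v_of_t = compose(v, x)
print(v_of_t(3.0), a * 3.0 ** 2 + b)   # both print 19.0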
derivatives:
f(0) = b,   g(0) = d,
f′(0) = a,   g′(0) = c.
So if b = d and a = c then the two linear functions are the same. Since we have not introduced
the derivative yet we will leave this idea on the back-burner, but note that if we cannot find
the derivative of a function (if there are functions for which it does not exist) then we will not
be able to invoke this check of whether two functions are the same.
One is not often tempted to compare all values of two functions but, instead, to manipulate
one algebraic expression for a function f(x) into another g(x). There are some traps with
this, however, which we should be aware of.
Example 2.8. Is f(x) ≡ x²/x the same function as g(x) = x?
We might be tempted to simplify
f(x) = x²/x = x = g(x)   (2.13)
and argue that the two functions are identical. But what is f(0)? It is 0/0, and such expressions are called
undefined, whereas g(0) = 0. So the two functions are not identical. The message is that one
must be careful when simplifying algebraic expressions where one might be implicitly dividing by
zero.
3. The Limit
In which we encounter the most important idea in the course: the limit. We develop its formal
definition, gain experience in computing limits and use it to define continuous functions.
3.1 Motivation
A major problem in mathematics occurs whenever we encounter an expression of the form 0/0 -
this expression is not a number. We know this because we can see the evident (and ill-defined)
division by zero, however we are nevertheless tempted to try to draw some conclusions about
it. We might take any of the following stances in opposition to the idea that it is not a number:
(a.) it is zero, as any number multiplied by zero gives zero, or
(b.) it is one, as any number divided by itself gives one, or
(c.) it tends to either ±∞, as any number divided by a small, positive number blows up
towards plus infinity, while division by a small, negative number blows up towards negative
infinity - but zero is equally close to a small positive and a small negative number, so we
are even confused over which number 1/ε approaches as ε → 0: is it plus or minus infinity?
By way of example consider the function
f(x) = (x² − 9)/(x − 3)
and resist the strong inclination to algebraically simplify the function to (x + 3) using the
difference of two squares as follows
(x² − 9)/(x − 3) = (x + 3)(x − 3)/(x − 3) =! x + 3 ≡ g(x).
Why should we not do this? The function f (x) is almost but not quite the same function as
g(x) = x + 3, the functions are the same apart from at the point x = 3. To rewrite the line
above without the exclamation mark we should add a condition so that it now reads
f(x) = (x + 3)(x − 3)/(x − 3) = x + 3 = g(x)   for x ≠ 3
otherwise we would be implicitly dividing by zero. The function f(x) is identical to g(x) except
at the point x = 3 where it involves a division by zero and so becomes undefined. If one plotted
all the points of f (x) for x ∈ R it would be identical to the straight line g(x) = x + 3 apart
from missing a single point at x = 3. If we plugged in some numbers to see what happens to
f (x) as x gets close to 3 we could build up a table of data to help us understand what happens
before and after f (3) and compare it with g(x).
x      f(x) = (x² − 9)/(x − 3)      g(x) = x + 3
In the table above we have evaluated f(x) and g(x) at and near x = 3. We could have
picked a value for x infinitesimally close to 3, but not quite 3, e.g. x = 3 + ε such that ε ≠ 0,
and we would have found that f(3 + ε) = 6 + ε = g(3 + ε). As ε → 0, f(3 + ε) → 6; in words,
as x approaches 3, f(x) approaches 6.
As mathematicians we are interested in fixing problems and a resolution to the problem for
f (x) at x = 3 that we are tempted towards is to discard the point x = 3 and to treat f (x) as
we did algebraically. We would be able to work with the function f (x) to an arbitrary degree
of accuracy as close to x = 3 as we would care to choose. So long as we do not have to do
any computations at exactly x = 3 then we could replace f (x) in all our calculations with the
better behaved function g(x) = x + 3.
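To make such a table concrete one can generate the values numerically. The following short sketch (not part of the notes) evaluates both functions at points approaching x = 3 from either side:

# Compare f(x) = (x**2 - 9)/(x - 3) with g(x) = x + 3 near x = 3.
def f(x):
    return (x ** 2 - 9) / (x - 3)   # undefined at x = 3 (division by zero)

def g(x):
    return x + 3

for x in [2.9, 2.99, 2.999, 3.001, 3.01, 3.1]:
    print(x, f(x), g(x))
# f(x) approaches 6 as x approaches 3, matching g(3) = 6,
# even though f(3) itself is not defined.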
This reasoning is very disturbing. Of course it would be nice to replace a function which is
an ill-defined function on R with one which is well-defined everywhere by adding in a point but
what would we gain? And would it be sufficiently interesting to justify the seemingly illogical
manoeuvre of replacing f(x) with g(x)? Now even supposing that we could convince ourselves
that we would learn enough about f(x) to justify the transition to work with g(x), what about
other functions which have ill-defined, singular points1? We would like to know how to go about
finding the well-behaved function that we could work with.
Let us think about this last problem. If we had been presented with f (x) how would we
have known that we could add in a point and it would become g(x)? In the case we considered
the algebra was simple to do and we could immediately find a way to get rid of the singularity in
f (x) and find g(x), but what about more complicated examples. Suppose instead we considered
the function
sin x
f (x) ≡
x
whose graph looks like an oscillating sin x function but whose amplitude varies as x1 (see figure
3.1). With the aid of the computer-generated graph, we can guess that approaching the singular
sin x
Figure 3.1: The graph of f (x) = x
.
sin x
point (as x → 0) then x
→ 1. With this example we guess that the well-behaved function is
1
Points which become ill-defined are called singularities or singular points.
This is all very well, but we have been entirely loose with our method and our mathematical
language. Let’s try and rectify this2 .
The value that f(x) approaches as its argument approaches arbitrarily close to x0 is called the limit of f(x) as x approaches x0, and is denoted
lim_{x→x0} [f(x)].
We will develop this terminology to give its full definition shortly. However before doing so
let’s address any doubts we have about whether what we are proposing has any mathematical
value. We may think it is sufficient to be able to sketch a graph and read off its tendencies as it
approaches any singular points but can we trust our intuition about the output data from our
computers and calculators to work out the value of a limit? Let us look at a slightly different
example of an ill-defined function which will trouble us on this point. Consider now
f(x) ≡ sin(π/x)
whose argument π/x is singular at x = 0. Near the singularity at x = 0 we may compute the
value of the function to try to deduce the limit of the function as x approaches 0. We show
some data in the table below:
2 We will return later to sin(x)/x to see if our intuition from the graph was correct, but first we need to develop the necessary tools.
x        f(x) = sin(π/x)
0.1 0.0000000000020665
0.01 0.0000000000206930
0.001 0.0000000002065886
0.0001 0.0000000020658863
0 Not defined.
−0.0001 −0.0000000020658863
−0.001 −0.0000000002065886
−0.01 −0.0000000000206930
−0.1 −0.0000000000020665
For our computations shown in the table we have approximated π ≈ 3.14159265359 and
we realise that if our calculator had only shown eight decimal places we would have thought
this function to be zero in the domain above. This would have led us to conclude that as x
approaches zero f (x) also approaches zero. But we’d have been wrong. With a little thought
(and without a calculator) we realise that the values of x we chose to evaluate f(x) are special
points for the sine function, as for x = ±1/10^n for n = 1, 2, 3, 4 we see that we were evaluating
sin(π/x) = sin(±π/10^(−n)) = sin(±10^n π) = 0.
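The same pitfall is easy to reproduce on a computer; here is a short sketch (using only the standard math module, with the usual double-precision value of π, so the exact tiny values differ from the table above, which used a truncated π):

import math

# Mathematically sin(10**n * pi) = 0, but with pi stored to finite precision
# the computed values of sin(pi/x) at x = 10**(-n) are tiny non-zero numbers.
for n in range(1, 5):
    x = 10.0 ** (-n)
    print(x, math.sin(math.pi / x))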
If we had chosen different values of x we would have found wildly different answers as we
approached x = 0:
x        f(x) = sin(π/x)
0.7 −0.9749279121818890
0.07 0.7818314824698670
0.007 0.4338837390909680
0.0007 0.9749279121161520
0 Not defined.
−0.0007 −0.9749279121161520
−0.007 −0.4338837390909680
−0.07 −0.7818314824698670
−0.7 0.9749279121818890
Notice in the table how the output of the function now varies wildly between −1 and 1. This
behaviour of the function is (slightly) better appreciated by considering the graph of the function
which is shown in figure 3.2.
Figure 3.2: The graph of f(x) = sin(π/x).
Around x = 0 the graph becomes a solid blob - we might think this is a consequence of the
thick line of the graph, but if we zoom in, the dense lines of the graph only persist (see figure
3.3). We are left to conclude that neither our calculator nor our graphing software is helping
us to find the limit of sin(π/x) as x approaches zero.
Figure 3.3: Zooming in on the graph of f(x) = sin(π/x) around x = 0: the blob persists!
In fact in this function sin(π/x) our scheme of discovering a limit has hit a big problem: this
function does not have a limit as x approaches zero! Recall our rough definition of lim_{x→x0}[f(x)]
as the value f(x) approaches as its argument approaches arbitrarily close to x0. In making this
definition we imagined that functions do always have such limit points - we were picturing a
graph given by a continuous curve with one point missing. How can we tell if no such limit
exists? We must bear in mind that we will not be able to learn much from the graph - but we
could refine our definition of a limit further by saying that as x approaches closer to x0 so f (x)
must approach closer to limx→x0 [f (x)] ≡ L. In other words as we restrict the argument x to
only a small neighbourhood of x0 , if f (x) is converging towards a value then that number is the
limit, L. Mathematically we are saying that a limit exists if, whenever |x − x0| < δ, we have |f(x) − L| < ε,
for all values of ε ∈ R+ and where δ ∈ R+ (in figure 3.4 we illustrate these ranges for the graph
of a simple function).
Figure 3.4: For any given ε > 0, if f(x) lies in the range (L − ε, L + ε) when x lies in the
range (x0 − δ, x0 + δ) then we say that lim_{x→x0}(f(x)) = L.
Consider the limit of two simple functions at x0 = 0: y = 2 and x = 0. In the first case the
function is a horizontal line with coordinates (x, 2) for all x ∈ R - by visualising the function we
can immediately see the limit exists: the function can always be trapped between 2 + ε and 2 − ε
for values of x near x0 = 0. More algebraically, as x → 0 the limit is 2: |f(x) − 2| = 0 < ε
for x ∈ (0 − δ, 0 + δ) for all δ > 0. The second function x = 0 is a vertical line consisting of
the points (0, y) for all y ∈ R, one cannot find a bound on the range of the function when x
approaches x = 0, i.e. the output of the function at x = 0 is R and so we cannot find a limit
that the function approaches in this second example.
Returning our thoughts to the function f(x) ≡ sin(π/x) we realise that we can always find
any output value in the range [−1, 1] no matter how close x is to zero, i.e. this function does
not have a limit. Let’s convince ourselves of this by using our mathematical definition of the
limit. If there is no limit then we claim that
| sin(π/x) − L| < ε   ∀ ε ∈ R+
is violated for a restricted range of x around zero, i.e. |x| < δ. To see this pick, for example,
ε = 1/2; then we must check
| sin(π/x) − L| < 1/2
for all x close to zero. Before we can check this we need to specify the limit L we think the
function approaches. For simplicity in presenting the ideas we will limit our proof to show only
that the function does not limit to L = 0. We wish to check whether
| sin(π/x) − 0| = | sin(π/x)| < 1/2
for all x near x = 0. If we choose
π/x = π/2 + 2nπ
then x = (1/2 + 2n)^(−1) = 2/(1 + 4n), and so for large n, x can be made as close to x = 0 as we like, and
| sin(π/x)| = 1 > 1/2 = ε   for x = 2/(1 + 4n), n ∈ Z.
The above argument gives a specific proof that lim_{x→0}[sin(π/x)] ≠ 0 - we do not have to repeat
the proof for any other values of ε as the inequality has already been violated for ε = 1/2, and
if the limit exists it must be true for all ε > 0. What about other limits besides L = 0? We
leave it as an exercise for the curious reader to pursue for we have some other intricate points
to develop about the limit.
Thus far we have obscured an important point. In the notation limx→x0 (f (x)) we have
presumed that the meaning of x → x0 is unambiguous, but of course it is not. We could
approach x = x0 from either above x0 or below and, if they exist, the limits found in approaching
from two different directions may be different.
Definition 3.2.1. The left limit (or limit from below) of f(x) is denoted lim_{x→x0−}[f(x)] or
lim_{x↑x0}[f(x)]; if it exists it is the value that f(x) approaches as x approaches x0 from below, i.e.
values of x < x0. Formally the limit lim_{x→x0−}[f(x)] = L exists if for all ε > 0 there exists δ > 0
such that |f(x) − L| < ε for x ∈ (x0 − δ, x0).
Definition 3.2.2. The right limit (or limit from above) of f(x) is denoted lim_{x→x0+}[f(x)] or
lim_{x↓x0}[f(x)]; if it exists it is the value that f(x) approaches as x approaches x0 from above, i.e.
values of x > x0. Formally the limit lim_{x→x0+}[f(x)] = L exists if for all ε > 0 there exists δ > 0
such that |f(x) − L| < ε for x ∈ (x0, x0 + δ).
Example 3.1. Show that the limits from above and below as x approaches zero for the function
f (x) = x1 are different.
Now 1 divided by any small number is large, however if the small number is negative then
its reciprocal will be a large negative number, but if the small number is positive it will be a
large positive number, i.e.
lim_{x→0−}(1/x) = −∞ ≠ lim_{x→0+}(1/x) = +∞.
You might feel uncomfortable about setting a limit equal to ∞, after all ∞ is not a number.
But this is one of the advantages of the limit, it allows us to consider values which were not in
the range of the original function. Writing the limit as lim_{x→x0}(f(x)) = ∞ means only that the
function’s output approaches ∞ as x → x0 - we need never get to the point where we claim that
the function’s output is ∞.
Example 3.2. Show that the limits from above and below as x approaches 4 for the function
g(x) = { x² + 4   if x > 4;   −x² + 4   if x ≤ 4 }
are different.
(Sketch: the graph of g(x), which jumps at x = 4.)
Notice the jump that occurs at x = 4. Consequently the limit will differ if we approach x = 4
from above or below. Specifically we have
lim_{x→4+}[g(x)] = 4² + 4 = 20   while   lim_{x→4−}[g(x)] = −(4²) + 4 = −12.
This is an example of a function which is not continuous and we will soon use the limit to give
a rigorous definition to the continuity of a function.
When the limit from above is equal to the limit from below then only a single notation is
needed - this is the notation we had adopted initially. Let us now give the formal definition of
the limit.
Definition 3.2.3. The limit of a function f(x) exists and is equal to L as x → x0, denoted
lim_{x→x0} [f(x)] = L,
if and only if for all ε > 0 there exists δ > 0 such that
|f(x) − L| < ε
whenever 0 < |x − x0| < δ.
1. The definition of the limit as stated above implicitly means that the limit from above is
equal to the limit from below, i.e.
lim_{x→x0+}[f(x)] = lim_{x→x0−}[f(x)] = L.
2. This so-called ε − δ definition of the limit was first given by Bolzano in 1817, but is most
closely associated with Augustin-Louis Cauchy3 (1789-1857), after whom it is claimed more
mathematical concepts are named than any other mathematician.
We will now take a short digression from our topic of limits to discuss one important use of
the limit: to define continuous functions.
Definition 3.2.4. A function f(x) is continuous at a point x0 in its domain if lim_{x→x0}[f(x)] = f(x0).
Definition 3.2.5. A function f(x) is continuous if it is continuous at all points in its domain.
1. Continuity implies that a small change in the argument of a function gives a small change
in its output, so for example, the function g(x) used in example 3.2 is not continuous.
3
Even though it was only expressed in this modern form by Karl Weierstrass.
2. Colloquially it is often said that continuous functions are those for which the graph can
be drawn without lifting the pen from the paper (however be wary of functions such as
sin(π/x) which you might draw without lifting your pen by forming a blob around x = 0).
3. Some functions which we might think of as very well-behaved are not continuous, for
example f(x) = 1/x on R.
4. Some functions which we might not think of as being so well-behaved are continuous, for
example f (x) = |x|. See the following example.
(Sketch: the graph of y = |x|.)
It should be obvious that it is continuous when x ≠ 0, so the point x = 0 is our main
concern. If we take the limit from above we have
lim_{x→0+}[|x|] = lim_{x→0+}[x] = 0,   and from below,   lim_{x→0−}[|x|] = lim_{x→0−}[−x] = 0.
Hence the limits from above and from below are both equal to zero and hence the limit limx→0 [|x|]
exists and is also equal to zero. Since this is also the value of the function at zero (i.e. |0| = 0)
then we have
lim_{x→0} [|x|] = |0|.
Hence the modulus function is a continuous function, despite its nasty-looking right-angle at
x = 0.
Knowing that a function is continuous has some interesting consequences. One of the seemingly
straightforward consequences is called the intermediate value theorem and it can be used to
make some fascinating observations. Let’s state the theorem here.
Theorem 3.1. (The Intermediate Value Theorem) Let f(x) be a continuous function on [a, b]
and suppose f(a) < y < f(b); then there exists an x = c with c ∈ (a, b) such that f(c) = y.
One can use the intermediate value theorem to find approximate expressions for irrational
numbers. For example one might consider the continuous function f(x) = x² − 2 to evaluate
√2.4 Now we know that when f(x) = 0 then x = √2, but why can we be sure that f(x) does
equal zero for some value of x? The intermediate value theorem tells us that such a value exists
if we can identify a and b such that f(a) < 0 < f(b). Of course we can do this, for example we
can check that f(1) = −1 and f(2) = 2, i.e. a = 1 and b = 2 tells us that there exists 1 < c < 2
such that f(c) = 0. Hence we have observed that √2 lies in the interval (1, 2). For a better
approximation one can reduce the range (a, b). In principle this method gives us some certainty
that roots, where the function is zero, exist for other polynomial functions too.
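Repeatedly halving the interval in this way is the bisection method. The following is a minimal computational sketch of the idea for f(x) = x² − 2 (the choice of 30 halvings is arbitrary):

# Use the intermediate value theorem repeatedly (bisection) to trap sqrt(2).
def f(x):
    return x ** 2 - 2.0

a, b = 1.0, 2.0          # f(a) < 0 < f(b), so a root lies in (a, b)
for _ in range(30):
    c = 0.5 * (a + b)
    if f(c) < 0:
        a = c            # the root lies in (c, b)
    else:
        b = c            # the root lies in (a, c)
print(a, b)              # both close to 1.41421356...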
A neat observation follows from a particular version of the intermediate value theorem.
Assume we have a continuous function f (x) on the interval [0, 1] such that 0 ≤ f (x) ≤ 1,
then there exists c ∈ [0, 1] such that f (c) = c, i.e. at one point the function behaves like the
identity function! Now f : [0, 1] → [0, 1] so you can picture f as a set of reassignments of all
the numbers in the interval [0, 1], and we are arguing that one number c is mapped to itself -
hence it is called a fixed point. Let’s use the intermediate value theorem to prove our claim.
To do so, consider the function g(x) ≡ f (x) − x which is continuous (as f (x) is continuous by
assumption and the identity function f(x) = x is continuous too). Noting that g(1) ≤ 0 ≤ g(0), as
g(0) = f(0) − 0 = f(0) ≥ 0 and g(1) = f(1) − 1 ≤ 0, then by the intermediate value theorem we
know that there exists some value x = c such that g(c) = 0 where c ∈ [0, 1]. Therefore we have
that g(c) = f (c) − c = 0 hence f (c) = c, as claimed. For example consider the sine function
or the cosine function on the interval [0, 1] - this theorem gives us the insight that there are
values sin(c) = c and cos(ĉ) = ĉ. Can you find a way to estimate these fixed points of sine and
cosine?
4
See the University of Utah video by Jim Fowler on Coursera course “Calculus 1” to see this done in detail.
The intermediate value theorem is seemingly simple but its applications are manifold. In
particular it is the intermediate value theorem that lies behind Brouwer’s fixed point theorem
which has applications in the investigation of differential equations, differential geometry and
even game theory. From small acorns, mighty oak trees grow.
Definition 3.2.6.
lim_{x→∞} [f(x)] = L ⇐⇒ ∀ ε > 0, ∃X > 0 such that |f(x) − L| < ε for x > X.
1. Compare this definition with the definition for approaching a limit from below, i.e. contrast
x ∈ (x0 − δ, x0 ) with x ∈ (X, ∞) used here. Of course ∞ can only be approached from
below which explains the change of form for the definition above.
2. Similarly one can take the limit x → −∞, which takes the form of a limit from above:
lim_{x→−∞} [f(x)] = L ⇐⇒ ∀ ε > 0, ∃X < 0 such that |f(x) − L| < ε for x < X.
f(α) ≡ tanh(α).
We have to show that for all ε > 0 there exists an X > 0 such that |f(α) − 1| < ε when
α > X. We will first give a sketch proof using the graph of tanh α shown in figure 2.13. Our
aim is to show that tanh α approaches 1 to a specified degree of accuracy encoded in ε, so for
example if we started with ε = 1/100 then we aim to show that we can identify a range of α > X
such that | tanh α − 1| < 1/100. From the graph we understand this is possible immediately as
tanh α asymptotes to y = 1 at infinity, hence it is only a small matter of computation to identify
when | tanh X − 1| = 1/100, i.e. we aim to solve tanh X = 99/100 which gives, roughly, X = 2.65
radians. This is not a proof, but by this stage we should be convinced that we could repeat the
computation of X for any ε.
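The computation of X for a given ε is easy to mechanise. Here is a small sketch using the standard inverse hyperbolic tangent, math.atanh (a library convenience; the inverse hyperbolic functions are not developed until later in the notes):

import math

# For a given accuracy eps, solve tanh(X) = 1 - eps for X.
for eps in [1e-2, 1e-4, 1e-6]:
    X = math.atanh(1.0 - eps)
    print(eps, X, abs(math.tanh(X) - 1.0))
# eps = 1/100 gives X ~ 2.65, as quoted above; smaller eps requires larger X.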
Let us consider a more challenging version of the same question, let us try to show that
the limit is 1 directly. Recalling the definition of the hyperbolic functions as points on the right
hand branch of the hyperbola x2 − y 2 = 1, where x = cosh α and y = sinh α then we have
tanh α = sinh α / cosh α = y/x = y/√(1 + y²) = 1/√(1/y² + 1),
and as α → ∞ we have y = sinh α → ∞, so tanh α → 1/√(0 + 1) = 1,
as lim_{y→∞}[1/y²] = 0. In the above line we have made a number of standard manipulations of
limits and while it should certainly be possible to read and understand the line of working above,
it should also make us a little worried as we have not yet discussed the fundamentals of working
with limits. The working above is correct, but what we will discuss next is why these lines of
working are valid.
Now if it happens that the functions, rather than being well-defined at x0, are slightly weaker but
do still have well-defined limits which lie in R, then we can quickly say that the limit of h(x) is
equal to the sum of the limits of f(x) and g(x): “the limit of the sum is the sum of the limits”, so
long as the limits both lie in R. With the same constraint that the limits must be well-defined
numbers we can convince ourselves that the limit of a product is the product of the limits.
But what about division? Here we must take care, as although zero is a well-defined number
the operation of division by zero is ill-defined. Let’s list the rules.
Writing lim_{x→x0}[f(x)] = a and lim_{x→x0}[g(x)] = b:
1. lim_{x→x0} [f(x) + g(x)] = lim_{x→x0} [f(x)] + lim_{x→x0} [g(x)] = a + b
2. lim_{x→x0} [f(x)g(x)] = lim_{x→x0} [f(x)] · lim_{x→x0} [g(x)] = ab
3. lim_{x→x0} [f(x)/g(x)] = lim_{x→x0} [f(x)] / lim_{x→x0} [g(x)] = a/b   only if lim_{x→x0} [g(x)] ≠ 0.
We have laboured the point that these rules for limits are not generally valid if limx→x0 [f (x)] =
∞. Why? As infinity is not a real number we do not, for example, have a way to de-
fine its addition and subtraction in R. Let’s convince ourselves that this is a problem. Let
limx→x0 [f (x)] = ∞ and limx→x0 [g(x)] = 1, then if we could apply the rules above for the sum
of limits we would find
lim_{x→x0} [f(x) + g(x)] =! ∞ + 1 = ∞   (3.1)
lim_{x→x0} [f(x) − f(x)] =! ∞ − ∞ = lim_{x→x0} [0] = 0   (3.2)
In fact the first of the above equations presents no problems, but it is the second one where
we have subtracted one infinite limit from another infinite limit that will cause all manner of
illogical deductions; for example, if this second rule were correct we could write
lim_{x→x0} [f(x) + g(x) − f(x)] =! ∞ + 1 − ∞ =! 0,
i.e. that 1 = 0, which is a problem. Hence we must take more care when manipulating sums
of infinite limits and so these general rules above are not generally valid when the limit is infinite.
lim_{x→3} [(x² − 9)/(x − 3)] = lim_{x→3} [(x − 3)(x + 3)/(x − 3)] = lim_{x→3} (x + 3) = lim_{x→3} x + lim_{x→3} 3 = 3 + 3 = 6.

lim_{x→−1} [(x² − 4x − 5)/(x(x + 1))] = lim_{x→−1} [(x + 1)(x − 5)/(x(x + 1))] = lim_{x→−1} [(x − 5)/x] = lim_{x→−1}(x − 5) / lim_{x→−1} x = −6/−1 = 6.
lim_{x→∞} [(x² + 7x − 3)/(3x² − 18x + 24)]
= lim_{x→∞} [(x² + 7x − 3)/(3x² − 18x + 24) × (1/x²)/(1/x²)]
= lim_{x→∞} [(1 + 7/x − 3/x²)/(3 − 18/x + 24/x²)]
= lim_{x→∞} [1 + 7/x − 3/x²] / lim_{x→∞} [3 − 18/x + 24/x²]
= (lim_{x→∞} 1 + lim_{x→∞} 7/x − lim_{x→∞} 3/x²) / (lim_{x→∞} 3 − lim_{x→∞} 18/x + lim_{x→∞} 24/x²)
= (1 + 0 − 0)/(3 − 0 + 0)
= 1/3.
Notice in the first line that we have multiplied the function inside the limit by 1 written in
the form (1/x²)/(1/x²) - doing this to a function would normally result in changing the function so it
is no longer defined at x = 0, but doing so inside the limit where x → ∞ means we are not
considering the function near x = 0, so the non-trivial change to the function does not affect
the value of the limit.
We will introduce a final rule that can be very useful in evaluating limits, this is a rule for
the limit of a composite function.
If limx→x0 [g(x)] = b, limx→b [f (x)] = a and f (x) is a continuous function then for x0 ∈ R we
have
4. lim_{x→x0} [f(g(x))] = f( lim_{x→x0} [g(x)] ) = a.
You should convince yourself that this rule is numerically plausible for well-behaved func-
tions and for such functions it is a consequence of the rules for the limits of sums, products and
quotients for all functions which can be written as a sum, product or quotient.
1. The conditions surrounding this limit are more constraining than for the other rules, note
in particular that as x0 ∈ R the asymptotic limit x0 → ∞ is not necessarily covered by
the rule. However when facing such a limit one may always make a change of variables
x ≡ y1 such that the limit becomes y → 0 and is then covered by the rule. We will discuss
changing the limit variable in more detail in the upcoming section on the evaluation of
limits.
2. Notice that functions f (x) which are not continuous are not covered by this rule. Why is
this? Consider the following functions:
f(x) = { x² + 1   if x ≠ 0;   7   if x = 0 }   and   g(x) = 0.
As lim_{x→0}[f(x)] = 1 ≠ f(0) = 7, f(x) is not a continuous function. Now let us
note that the composite function f(g(x)) = f(0) = 7 is a constant, hence continuous, function with a
well-defined limit, i.e. we know that lim_{x→0}[f(g(x))] = lim_{x→0}[7] = 7. However if we had
attempted (recklessly) to use the rule above we would have needed lim_{x→0}[g(x)] = 0 and
lim_{x→0}[f(x)] = 1, and so if we could use the rule we would find
lim_{x→0} [f(g(x))] =! f( lim_{x→0} [g(x)] ) =! 1.
Note in the line above the rule for limits of composite functions has been applied (in
order to check whether it is valid for a function with a discontinuity - we see that it is not).
We draw attention to the fact that the constraint on using the rule for taking limits of composite
functions is only that the function f(x) be continuous; there is no constraint that g(x)
be a continuous function.
lim_{x→π/2} [e^(cos²x)] = e^(lim_{x→π/2}[cos²x]) = e^0 = 1.
If we wished to compare this with the abstract formulation of the limit rule we would define
g(x) = cos²x and f(x) = e^x, so that b = lim_{x→π/2}[g(x)] = 0 and a = e^0 = 1.
To further convince ourselves that this is correct we show the graph in figure 3.8
Figure 3.8: The graph of f(x) = sin(π(x² − 1)/(4(x − 1))) in the vicinity of x = 1.
A very common trick that can be used to manipulate expressions is to replace a function
with the exponential function acting on the natural logarithm, e.g.
lim_{x→x0} [f(x)^g(x)] = lim_{x→x0} [e^(ln(f(x)^g(x)))] = lim_{x→x0} [e^(g(x) ln(f(x)))] = e^(lim_{x→x0}[g(x) ln(f(x))]).
It’s not clear at this stage that we have much call for such a rearrangement, but as the functions
we consider become more complicated it will be useful for us to try such operations in order
to split complicated limits into simpler ones. The logarithm allows powers to be re-cast as
coefficients which can prove very useful, for example a limit that you will evaluate in the
tutorial exercises can be rearranged as
lim_{x→0} [(1 + x)^(1/x)] = e^(lim_{x→0}[(1/x) ln(1+x)]).
Since we do not know how to evaluate lim_{x→0}[(1/x) ln(1 + x)] yet we pause here and leave the
development of this limit for later, after we have shown that lim_{x→0}[(1/x) ln(1 + x)] = 1.
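As a numerical preview (a sketch only; it proves nothing), one can watch (1 + x)^(1/x) settle towards e as x shrinks:

import math

# Numerical evidence that (1 + x)**(1/x) approaches e as x -> 0.
for x in [1e-1, 1e-3, 1e-5, 1e-7]:
    print(x, (1.0 + x) ** (1.0 / x), math.e)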
Example 3.10. Show that changing the order the limits are taken in the following double-limit
changes its value:
lim_{x→∞} lim_{y→−∞} (1 + tanh(x + y)).
Taking the limits in the order written we find
lim_{x→∞} lim_{y→−∞} (1 + tanh(x + y)) = lim_{x→∞} (1 − 1) = 0,
while
lim_{y→−∞} lim_{x→∞} (1 + tanh(x + y)) = lim_{y→−∞} (1 + 1) = 2.
We include a sketch of (1 + tanh(x + y)) in figure 3.9 to help us visualise the limits.
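A rough numerical probe of the two orderings (a sketch; large finite numbers stand in for the limits, which is only suggestive):

import math

# Inner limit y -> -infinity first (at large fixed x), then x -> +infinity:
x = 50.0
print(1.0 + math.tanh(x + (-1e6)))   # approximately 0
# Inner limit x -> +infinity first (at very negative fixed y), then y -> -infinity:
y = -50.0
print(1.0 + math.tanh(1e6 + y))      # approximately 2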
However there are many functions for which the order of the limits does not change the
evaluation.
Example 3.11. Show that changing the order the limits are taken in the following double-limit
does not change its value:
lim lim (x2 y − e−x−y ).
x→0 y→1
Figure 3.9: The surface z = 1 + tanh(x + y).
lim_{x→0} lim_{y→1} (x²y − e^(−x−y)) = lim_{x→0} (x² − e^(−x−1)) = −1/e
and
lim_{y→1} lim_{x→0} (x²y − e^(−x−y)) = lim_{y→1} (−e^(−y)) = −1/e.
We include a sketch of x²y − e^(−x−y) in figure 3.10 to help us visualise the limits.
We do not intend to draw any conclusions about the class of functions and double-limits for
which it is possible to change the order of the limits. However you can begin to see that the
order of the limits specifies the path one takes in approaching a limit on a surface, or on a more
general multi-variable function. Consequently the condition for a multi-variable function to
be continuous (i.e. the limit is independent of the path) is constraining. This will be a subject
of study in some of the following courses in your degree.
Figure 3.10: The surface z = x²y − e^(−x−y).
Suppose we are faced with the limit of a ratio f(x)/g(x) where lim_{x→x0}[f(x)] ∈ R, lim_{x→x0}[g(x)] = 0, and g(x) is a factor of f(x); then if we cancel the common factor to form h(x) ≡ f(x)/g(x)
inside the limit and lim_{x→x0}[h(x)] exists we can evaluate the limit. We saw this method used
in examples 3.5 and 3.6. When the limit is taken as x → ±∞ on ratios of polynomial functions
of the same degree we can evaluate the limit by looking at the terms of highest order, i.e.
lim_{x→∞} [ (Σ_{j=0}^n a_j x^j) / (Σ_{k=0}^n b_k x^k) ] = lim_{x→∞} [ a_n x^n / (b_n x^n) ] = lim_{x→∞} [ a_n / b_n ] = a_n / b_n.
We saw this method used in example 3.7.
Inspired by the graph shown in figure 3.1 we claimed earlier that lim_{x→0}[sin(x)/x] = 1. Now
we could at this stage invoke the infinite power series for sin x and convince ourselves that our
guess was correct, i.e.
lim_{x→0} [sin x / x] = lim_{x→0} [ (Σ_{n odd} (−1)^((n−1)/2) x^n/n!) / x ]
= lim_{x→0} [ (x − x³/3! + x⁵/5! − O(x⁷)) / x ]
= lim_{x→0} [ 1 − x²/3! + x⁴/5! − O(x⁶) ]
= lim_{x→0} 1 − lim_{x→0} x²/3! + lim_{x→0} x⁴/5! − lim_{x→0} O(x⁶)
= 1
Notice the use of the O(x⁷) notation above - in this context it simply means that the size of
the next largest terms in the infinite series, after those we have written out, is proportional to
x⁷. This notation is a shorthand and allows us to write only the leading terms in an infinite
power series.
However our formula for the sine function as an infinite power series rests on some con-
jectures we have yet to prove. Fortunately there is another common method used to evaluate
limits, which we can use to confirm this function limits to one as x approaches zero.
This theorem is sometimes called the Squeeze Theorem or the Pinching Theorem. It states that if
f(x) ≤ g(x) ≤ h(x)
for x such that |x − x0| < δ, and if lim_{x→x0}[f(x)] = lim_{x→x0}[h(x)] = L where L ∈ R, then
lim_{x→x0}[g(x)] = L.
That is, if we can identify two functions f(x) and h(x) that sandwich the function g(x) whose
limit we are interested in, in the vicinity of the limit point, and if they are equal in the
limit, then as g(x) lies between them it too must have the same limit. This theorem is
very helpful if we can find two sandwiching functions whose limits are simpler to evaluate than
that of the function in the middle of the sandwich. Let us look at some examples.
The trickiest part in applying the sandwich theorem is identifying the bounding functions
which lie above and below the function we are interested in, in the vicinity of the limit. To
find some useful inequalities we will look again at the two right-angled triangles (with sides
of length {cos θ, sin θ, 1} and the second with edges of length {1, tan θ, sec θ}) which bound the
segment of the unit circle subtending an angle θ at the centre of the circle - this is hard to
comprehend without a diagram, fortunately we have drawn the central idea in an earlier image
- see figure 2.7. The length of the arc shown in the figure in radians is θ, so from the diagram
we immediately have the following trigonometric inequality when 0 ≤ θ < π/2 (note θ is used as
a positive length in constructing the inequality):
sin θ ≤ θ ≤ tan θ   for 0 ≤ θ < π/2.
We may usefully split this inequality in two and make use of both
sin θ ≤ θ   and   θ ≤ tan θ   for 0 ≤ θ < π/2.
From the first inequality we may divide through by θ (recall θ > 0 here) to obtain
sin θ / θ ≤ 1   for 0 < θ < π/2,
which will give the upper bound we will use when we invoke the sandwich theorem. While from
θ ≤ tan θ = sin θ / cos θ
we have
cos θ ≤ sin θ / θ   for 0 < θ < π/2.
Altogether we have
cos θ ≤ sin θ / θ ≤ 1   for 0 < θ < π/2,
which we may extend to negative θ as cos(−θ) = cos(θ) and sin(−θ)/(−θ) = sin(θ)/θ. So we have
cos θ ≤ sin θ / θ ≤ 1   for −π/2 < θ < π/2, θ ≠ 0,
which we may use in the sandwich theorem: by taking the limit to zero we find
1 = lim_{θ→0}(cos θ) ≤ lim_{θ→0}(sin θ / θ) ≤ lim_{θ→0}(1) = 1,
hence as the upper bound and the lower bound both limit to one as θ approaches zero, by the
sandwich theorem we have that
lim_{θ→0}(sin θ / θ) = 1
as required.
Example 3.13. Show that lim_{x→0}[x sin(1/x)] = 0.
In this example we are at an advantage as it features a trigonometric function and our first
thought might be to try to use
−1 ≤ sin(1/x) ≤ 1.
To turn this into something close to our expression, without losing the order of the inequality,
we must multiply by a positive number. Hence let’s multiply by |x| rather than x to obtain:
−|x| ≤ |x| sin(1/x) ≤ |x|.
Figure 3.11: In the left-hand graph we show −|x| ≤ |x| sin(1/x) ≤ |x|, while on the right we
show −|x| ≤ x sin(1/x) ≤ |x|
In the graph on the left of figure 3.11 we sketch the graphs of these functions to see the
inequality in terms of the curves. How do we arrive at an inequality useful for the problem at
hand? Now we notice that −|x| ≤ x ≤ |x|, hence we now can deduce
−|x| ≤ x sin(1/x) ≤ |x|
and this inequality is shown graphically on the right in figure 3.11. Now the function sandwiched
in the middle of the inequality is the one we are interested in, and furthermore the inequalities
are of the form of the sandwich theorem as in the limit we have limx→0 (|x|) = limx→0 (−|x|) = 0.
Hence if we take the limit on the inequality we find:
0 = lim_{x→0}(−|x|) ≤ lim_{x→0}(x sin(1/x)) ≤ lim_{x→0}(|x|) = 0,
and so by the sandwich theorem lim_{x→0}[x sin(1/x)] = 0.
We did a lot of work to find these limits, can we learn anything else from these results? The
two limits in the example do look rather similar and we can modify them to show that they
tell us about two different limits of the same function.
Let us commence with the result of the second limit
lim_{x→0}(x sin(1/x)) = 0.
If we change the variable used in the limit to y ≡ 1/x then the function in the limit becomes
(1/y) sin(y), the same function, albeit written in terms of a different variable, as in the first example.
We also have to change the limit x → 0 into a limit in terms of y; using y = 1/x, as x approaches
a very small number so y approaches a very large number, but there is a small concern for as
x → 0+ then y → ∞, while as x → 0− , y → −∞. So in translating the limit from one in terms
of x we in fact generate two limits in terms of y:
lim_{y→−∞}(sin(y)/y) = 0   and   lim_{y→∞}(sin(y)/y) = 0.
Given that these are limits to infinity it is in fact necessary that the limits are one-sided (i.e.
from above and from below) so what seemed like a complexity is rather satisfying and we can
be pleased with this two-for-one result5 . We may also use the same change of variables to show
that the limit limx→0 ( sinx x ) = 1 can be rewritten as the two asymptotic limits:
lim_{y→−∞}(y sin(1/y)) = 1   and   lim_{y→∞}(y sin(1/y)) = 1.
Furthermore we may even consider changing to imaginary variables if the limit remains
well-defined, for example,
1 = lim_{x→0} [sin x / x] = lim_{x→0} [ (1/2i)(e^(ix) − e^(−ix)) / x ] = lim_{y→0} [ (e^y − e^(−y)) / (2y) ] = lim_{y→0} [ sinh y / y ]
where we have substituted y ≡ ix, in order to find another useful limit.
Exercise 3.2. Prove that lim_{x→0}[tan(x)/x] = 1. Hint: you may assume, without further proof, that
lim_{x→0}[sin(x)/x] = 1 in your answer.
The result in example 3.13 above may seem rather exciting to us, as previously we argued
that sin(π/x) had no limit as x → 0. Now if we rescale the variable by defining y = x/π then as
x → 0 we have y → 0, and so we claim that lim_{y→0}[sin(1/y)] does not exist, whereas from the
results above we see that just by multiplying the function by y we discover a well-defined limit,
i.e. lim_{y→0}[y sin(1/y)] = 0. Let us emphasise that we cannot use the fact that the limit of a
product is the product of the limits here, i.e.
lim_{y→0} [y sin(1/y)] ≠ lim_{y→0}(y) · lim_{y→0}(sin(1/y))
5
Given the symmetries of the sine function in the numerator and the linear function in the denominator then
even if we had only one of these results we could have determined the second by substituting z = −y.
Figure 3.12: The graphs of f(x) = ln(x) (blue), g(x) = x (red) and h(x) = e^x (green),
near zero on the left and for a slightly larger domain on the right. Notice that for positive x
we have f(x) < x < e^x and that exponential growth is really very fast indeed!
as the limit lim_{y→0}(sin(1/y)) is undefined and so the rule for limits of products is not valid here.
What has happened is that the linear function y has gone to zero “faster” than sin(1/y) has
oscillated about zero as y → 0. While we cannot split the product of limits up in this case it
is useful to adopt this kind of thinking and ask yourself how parts of functions grow or shrink
as the limit is taken and which types of function dominate. This thinking does not replace a
detailed analysis of a function but it can prove a helpful guide. We will now compare some
other common types of function in the limit to get a sense of which dominate in specified limits.
The answer to the question in the title is given away by these terms’ use in modern language,
where it is common to speak of exponential growth (or decay) to refer to something which
grows incredibly fast (or shrinks very quickly). You are also probably familiar with logarithmic
graph paper which is used to plot functions whose outputs vary very quickly - the logarithm
is applied to slow down the variation of the function. So as ln(x) = ln(ln(e^x)) we may guess
that ln(x) grows more slowly as x increases than x (an example of a polynomial function), and
in turn as x = ln(e^x) that polynomial functions grow more slowly than exponential functions
as x grows. In each case it is common to speak of logarithmic growth, polynomial growth and
exponential growth. Of course the linguistic meaning of the terms is confirmed by plotting the
graphs of f(x) = ln(x), g(x) = x and h(x) = e^x. The result shown in figure 3.12 is what we
should expect: f (x) is the mirror image of h(x) (where it is defined) in g(x) as h(x) is the
inverse function of f (x).
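A few sample values (a quick numerical sketch, nothing more) make the separation of the three growth rates vivid:

import math

# Compare logarithmic, polynomial and exponential growth at a few points.
for x in [2.0, 5.0, 10.0, 20.0, 50.0]:
    print(x, math.log(x), x, math.exp(x))
# log(x) grows far more slowly than x, which in turn grows far more slowly than exp(x).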
Let us now investigate how these types of functions behave in the limit by way of the fol-
lowing examples.
Let us reflect and notice that we are comparing the speed with which ln(1 + x) approaches
zero with the speed at which x does, i.e. we are interested in lim_{x→0}[(1/x) ln(1 + x)]; here
ln(1 + x) is a shifted version of the logarithm function sketched in figure 3.12. Hence as x → 0,
ln(1 + x) → ln(1) = 0, and so, all in all, the limit in question is approaching the dreaded 0/0 and we
cannot obviously split the function as a product. Instead let us try making a substitution to
change the variable; we will substitute 1 + x = e^y so that the logarithm is simplified, and we end up
with
lim_{y→0} [ y / (e^y − 1) ] = lim_{y→0} [ y / (e^(y/2)(e^(y/2) − e^(−y/2))) ] = lim_{y→0} [ y e^(−y/2) / (2 sinh(y/2)) ] = lim_{y→0} [ (y/2) / sinh(y/2) ] · lim_{y→0} [ e^(−y/2) ] = 1.
There is an artistry to analysis and which route you prefer to take will depend upon your tastes.
Example. Show that lim_{x→0+}[x ln(x)] = 0.
This limit is telling us that the polynomial function x, which goes to zero, dominates the
logarithmic function ln(x), which approaches −∞ as x → 0. We give the sketch of the graph
near zero in figure 3.13. Notice that the limit is taken from above as ln(x) is undefined for
x ≤ 0.
Figure 3.13: The graph of f(x) = x ln(x) near x = 0.
This will be our most intricate proof of a limit and we will set it out in three steps below.
However our first instinct might be to change variables to remove the logarithm, so let us
substitute x = e^(−y) so that the limit becomes6
lim_{y→∞} [−y e^(−y)] = 0.
Again we note that if this is true then we learn that the decay of the exponential e^(−y) suppresses
the growth of y, as expected. Our aim is to use the sandwich theorem and so we must
identify some useful inequalities. At root we are trying to compare e^y with y, which will
motivate the first of our three steps:
(i) We claim that e^w ≥ w for all w ≥ 0. Let’s prove this algebraically. To do so we will
consider the difference function f(w) = e^w − w, and we will have proved our claim if we
can show that f(w) ≥ 0 for all w ≥ 0. First we note that f(0) = 1, so the claim is true
at w = 0. Now using the definition of the exponential function (that d(e^x)/dx = e^x) we can
show that
df/dw = e^w − 1 ≥ 0   for all w ≥ 0.
Hence the difference function f(w) has a non-negative, ever increasing gradient as w
increases from zero towards ∞. As it is positive at w = 0 it must (due to its non-negative
gradient) therefore remain positive for all w > 0 too. Hence we have argued that f(w) =
e^w − w ≥ 0 for all w ≥ 0, as claimed.
6 There is no reason at this stage to substitute x = e^(−y) rather than x = e^y but it will prove very useful
later to make this choice; the other choice will not work with the following method. Why not try it, after you
have worked through the three steps, and find out where the substitution x = e^y runs into difficulty with this
method?
(ii) Now let w ≡ z/2 in the above inequality. It is perhaps surprising that such a substitution
will prove valuable but it is only the first prong of this two-pronged manoeuvre. From step
(i) we have
e^(z/2) ≥ z/2   for all z ≥ 0.
The second part here is to square the above inequality (as both sides are positive this does
not present any ambiguity) and we find
e^z ≥ z²/4   for all z ≥ 0,
hence after rearranging (and taking care not to divide by zero implicitly) we have
z e^(−z) ≤ 4/z   for all z > 0.
Note that lim_{z→∞}[4/z] = 0.
(iii) We are interested in the limit lim_{y→∞}[−y e^(−y)]. From step (ii) we have found one useful
inequality whose bounding function tends to 0 in the limit; if we can identify another
function g(y) such that g(y) ≤ y e^(−y) for large y and satisfying lim_{y→∞}[g(y)] = 0, we will be
in a position to use the sandwich theorem. The simplest function to try is the constant
function g(y) = 0; now for y > 0 we have y e^(−y) > 0, hence we may use the constant function
and we have the inequalities:
0 ≤ y e^(−y) ≤ 4/y   ∀ y > 0.
You might be concerned about the sign of the function in the above compared to our
function; in truth if the sandwiching is correct it does not matter, but for the fastidious
we may change the sign and exchange the inequality signs:
0 ≥ −y e^(−y) ≥ −4/y   ∀ y > 0.
Now taking the limit we have
0 = lim_{y→∞}[0] ≥ lim_{y→∞}[−y e^(−y)] ≥ lim_{y→∞}[−4/y] = 0,
hence by the sandwiching theorem
lim_{y→∞}[−y e^(−y)] = 0
as required. This was, perhaps, a tedious proof, but its method was interesting and you
can be assured that there are simpler methods for finding this limit which we will meet
in this course. For example, if we had been willing to make use of the power series
e^y = Σ_{n=0}^∞ y^n/n!, then we have the straightforward inequality that e^y ≥ y^m/m! for any y > 1 and
m ∈ Z+; hence rearranging gives m!/y^m ≥ e^(−y), from which we obtain y e^(−y) ≤ m!/y^(m−1), which
gives an upper bound to use in the sandwich theorem.
From this last example we also readily obtain another useful limit by substitution (x ≡ 1/y):
0 = lim_{x→0+}[x ln(x)] = lim_{y→∞} [ln(y^(−1))/y] = lim_{y→∞} [−ln(y)/y].
Finding intelligent ways to discover limits of functions is an art-form and requires practice: do
try out all the limit questions you can find. We now turn our attention to a fundamental
object in calculus, the derivative, in whose definition the limit will play a central role.
4. The Derivative
In which we will meet the derivative function, the function which gives the slope of the tangent
to a function at any point. We will see that the limit will play a central role in the
definition of the derivative and we will derive the derivatives of many of the most common
functions from first principles.
The material in this chapter will be covered in weeks 7 and 8 of the course.
Limits are clearly very interesting in the detailed study of a function, but are they of practical
importance? Is it ever important to know the value a function is approaching rather than the
actual value of the function? One of the motivating factors in developing the calculus was
the desire to understand physical properties in the natural world. It is intrinsically interesting
to enquire what is the speed of an object, but it can be surprisingly difficult to define. Let
us explain why such a simple concept proved challenging to pin down until the calculus was
developed.
The average speed of an object is defined as
Average speed ≡ (Change in the object’s position) / (Change in time).
If an object moves from ~r(t1) to ~r(t2) over the time interval [t1, t2] then we can use this definition
to find that its average speed is
|~r(t2) − ~r(t1)| / (t2 − t1).
Obviously this presents some problems for objects whose speed changes rapidly, the average
speed loses a lot of information. For example consider an idealised bouncing ball moving in the
x-direction with constant speed 1 :
1 In case you were wondering, for a simple illustration we have neglected resistance in the ball’s motion, and
presumed gravity is 10 ms−2 acting vertically downwards, started the ball at a height of 1 m with zero initial
(Sketch: the height y(t) of the idealised bouncing ball plotted against its horizontal position x(t).)
The horizontal speed is constant by design but the vertical speed is varying from zero at
the top of the bounce and its maximum speed when it hits the floor. Let’s look at the plot of
its vertical position y against time t for its first bounces.
(Sketch: the height y(t) plotted against time t for the first few bounces.)
As x(t) = √5 t this graph looks like a squashed version of the last sketch.
Let us make a computation of the ball’s average speed for the time interval [0, 1/√5], for which
from the graph (or better from the equation y(t) = −5t² + 1) we can read off y(0) = 1 and
y(1/√5) = 0. In figure 4.1 we have annotated the graph of the first bounce (in a plot of y(t)
against t), to help compute the average speed. Reading off from the graph for t1 = 0 and t2 = 1/√5
(so that t2 − t1 = 1/√5), we have y(t2) − y(t1) = −1. The average vertical speed (in ms−1) is
|y(t2) − y(t1)| / (t2 − t1) = |−1| / (1/√5) = √5.
vertical speed and a horizontal speed of √5 ms−1. The equations of motion give us y = −x² + 1, before any
bounce occurs. In addition we have assumed a perfect bounce takes place, i.e. an instantaneous reflection of
the speed of the ball in the floor-line at the moment of impact. The situation is not very realistic but is good
enough for our purpose.
Figure 4.1: The computation of the average speed from t = 0 to the first bounce at t = 1/√5.
From the graph we can see this is the speed a ball which moved with constant speed from
y = −1 to y = 0 would move at. Our ball actually has this speed for only one moment
during its first descent: we can surmise this only because we are actually experts in this kind
of motion and we know that the vertical speed starts at 0ms−1 and continually increases due
to the gravitational acceleration to its maximum speed. In other words, the straight line (in
red on figure 4.1) is tangent to the path y(t) at just one point.
Of course if we were to compute the average speed for smaller and smaller time intervals
we would be able to find an average velocity for each time interval, i.e. we could build up
a set of average speeds v̂1 , v̂2 , v̂3 , . . . v̂n for the time intervals [t1 , t2 ], [t2 , t3 ], [t3 , t4 ], . . . [tn , tn+1 ]
respectively. We attempt to illustrate the improved accuracy of this procedure in figure 4.2, where
one can see that for sufficiently short time intervals the sequence of straight lines whose slopes
are the average speed for the interval approaches and becomes almost indistinguishable from
the curve y(t). If we consider the limit in which a time interval [tn, tn+1] becomes vanishingly
Figure 4.2: Approaching a speed function by finding the average speed for smaller and smaller
time intervals.
small (i.e. tn+1 → tn ) then the average speed for that interval approached the instantaneous
speed: it is the slope of the tangent to the curve as tn+1 → tn . This gives us a definition of
the instantaneous speed at time tn , as the slope of the tangent to the curve y(t) at time t = tn
plotted against t. Since this curve is relatively well-behaved (apart from at the bounces) this
gives a good practical definition for constructing the speed function, namely the speed in the
y-direction at time t = tn is given by evaluating
|y(tn+1) − y(tn)| / (tn+1 − tn)
when tn+1 = tn. But of course when tn+1 = tn the denominator becomes zero and the function is not defined. Earlier we saw the example of the function x²/x, which was not defined at x = 0, and the limit was introduced as a method to make sense of the algebraic simplification of this function as lim_{x→0}[x²/x] = lim_{x→0}[x] = 0, which is valid arbitrarily close to x = 0. In exactly the
same manner we may avoid contemplating division by zero in the speed function for y(t) by
extending the definition to include a limit as:
ẏ(tn) ≡ lim_{tn+1→tn} [y(tn+1) − y(tn)] / (tn+1 − tn).
The dot on the y is Newton’s notation for a time-derivative, hence ẏ is the standard notation
for the velocity in the y-direction2 .
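As a quick numerical illustration (a Python sketch, not part of the development; the height function y(t) = −5t² + 1 of the bouncing ball and the point tn = 0.2 are chosen purely for illustration), the difference quotient approaches the instantaneous value ẏ(tn) = −10 tn as the interval shrinks:

    # Difference quotient (y(tn + h) - y(tn)) / h for shrinking intervals:
    # it approaches the instantaneous speed -10 * tn.
    def y(t):
        return -5 * t**2 + 1  # height of the ball during its first descent

    tn = 0.2
    for h in [0.1, 0.01, 0.001, 0.0001]:
        print(h, (y(tn + h) - y(tn)) / h)  # tends to -2.0 = -10 * tn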
1. There are different notations in common use for the derivative of a function f(x). The most common notation is due to Leibniz and is df/dx, but one may also commonly write f'(x) for the derivative of f(x) with respect to x. This second 'primed' notation is due to Lagrange and means that the derivative is taken with respect to the argument of the function, hence f'(y) = df/dy. When the argument is a complicated function this notation is very useful, e.g. the derivative of f(x − ct), a travelling wave function, with respect to (x − ct) is written f'(x − ct), and frequently the argument is dropped in the notation so one may simply write f' for this same derivative; use of this notation obviously requires the function and its argument to be clearly defined. As mentioned earlier, time derivatives have a special "dotted" notation reserved for them, which was first used by Newton, who was studying dynamics, so that ẋ = dx/dt.
able to construct the straight line through two points arbitrarily close to each other (but
not identical) in this way we may have a well-defined definition of the tangent line to a
function: a line apparently given meaning by only a single point on f (x).
This gives a value for the slope of the tangent to f(x) at x = x0; it is not a function of x. To construct the derivative function, which we have defined above, we should evaluate the derivative at all points x0 in the domain of the function; the resulting set of values can then be used to define the derivative as given above. One understands why we have taken the
short cut we have in our definition, but we must wonder what happens if the derivative at
a point x = x0 does not exist. Such a function is said to not be differentiable at x0 and
we will return to this idea in the following section.
5. Note that the limit in the derivative will only exist on an open interval (a, b) ⊂ R when the limit from above is equal to the limit from below.
Finding the derivative by evaluating the limiting value of the slope is often referred to as
‘differentiation from first principles’ as it does not rely upon knowing the derivative of any other
function. We will now find from first principles some standard derivatives.
Example 4.1. Find the derivative of the linear function f (x) = ax + b with respect to x where
a, b ∈ R at the point x = x0 .
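For completeness, a short first-principles computation gives
df/dx |_{x=x0} = lim_{h→0} [f(x0 + h) − f(x0)] / h = lim_{h→0} [a(x0 + h) + b − (a x0 + b)] / h = lim_{h→0} [ah/h] = a,
so the derivative of the linear function is its slope a at every point x0.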
f(x) = ax² + bx + c

df/dx = 2ax + b
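In full, a sketch of the first-principles computation behind this result runs:
df/dx = lim_{h→0} [a(x + h)² + b(x + h) + c − (ax² + bx + c)] / h
      = lim_{h→0} [2axh + ah² + bh] / h
      = lim_{h→0} [2ax + ah + b]
      = 2ax + b.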
f (x) = axn
Note that this derivative was assumed in earlier chapters when defining the exponential
function, the trigonometric functions and the hyperbolic functions as infinite series. This was one key pillar upon which we built much of the course so far; the other was that the infinite sum for e^x converges. From first principles we have:
df/dx = lim_{h→0} [f(x + h) − f(x)] / h
      = lim_{h→0} [a(x + h)^n − ax^n] / h
      = lim_{h→0} (a/h) [ Σ_{k=0}^{n} (n choose k) x^{n−k} h^k − x^n ]
      = lim_{h→0} (a/h) Σ_{k=1}^{n} (n choose k) x^{n−k} h^k
      = lim_{h→0} a Σ_{k=1}^{n} (n choose k) x^{n−k} h^{k−1}
      = lim_{h→0} a [ (n choose 1) x^{n−1} + (n choose 2) x^{n−2} h + O(h²) ]
      = a n x^{n−1}
Or, alternatively, without using the binomial expansion (so that the following becomes a proof
for all n ∈ R):
df/dx = lim_{h→0} [f(x + h) − f(x)] / h
      = lim_{h→0} [a(x + h)^n − ax^n] / h
      = a x^n lim_{h→0} [ (1 + h/x)^n − 1 ] / h
      = a x^n lim_{h→0} [ e^{ln[(1 + h/x)^n]} − 1 ] / h
      = a x^n lim_{h→0} [ e^{n ln(1 + h/x)} − 1 ] / h.
Here we now write g ≡ h/x and choose to insert ln(1 + g)/ln(1 + g) = 1, so that we may make use of the standard limits (evaluated in the previous chapter): lim_{x→0}[ln(1 + x)/x] = 1 and lim_{x→0}[(e^x − 1)/x] = 1 (see example 3.14 for the proof of both limits). So, upon inserting ln(1 + g)/ln(1 + g) we have
df/dx = a x^{n−1} lim_{g→0} [ (e^{n ln(1+g)} − 1)/ln(1 + g) · ln(1 + g)/g ]
      = a x^{n−1} lim_{g→0} [ (e^{n ln(1+g)} − 1)/ln(1 + g) ] · lim_{g→0} [ ln(1 + g)/g ]
      = a x^{n−1} lim_{k→0} [ (e^k − 1)/(k/n) ]
      = a n x^{n−1}
where we substituted k ≡ n ln(1+g). This second method is much longer than using the binomial
expansion, but it used two nice and common “tricks” to manipulate the limit into the form of
some standard limits.
f(x) = a^x
df/dx = lim_{h→0} [f(x + h) − f(x)] / h
      = lim_{h→0} [a^{x+h} − a^x] / h
      = lim_{h→0} a^x (a^h − 1)/h
      = lim_{h→0} [a^x] · lim_{h→0} [(a^h − 1)/h]
      = a^x lim_{h→0} [(e^{ln(a^h)} − 1)/h]
      = a^x lim_{k→0} [(e^k − 1)/(k/ln(a))]
      = a^x ln(a) lim_{k→0} [((1 + k + k²/2! + O(k³)) − 1)/k]
      = a^x ln(a) lim_{k→0} [1 + O(k)]
      = a^x ln(a)
Where we changed the limit variable using k = ln(a^h) = h ln(a). Note that had we chosen a = e, Euler's number, then we would have found df/dx = e^x ln(e) = e^x, which was part of our defining relation for the exponential in earlier chapters.
f (x) = ln(x)
df/dx = lim_{h→0} [f(x + h) − f(x)] / h
      = lim_{h→0} [ln(x + h) − ln(x)] / h
      = lim_{h→0} [ln(x(1 + h/x)) − ln(x)] / h
      = lim_{h→0} [ln(1 + h/x) + ln(x) − ln(x)] / h
      = lim_{h→0} [ln(1 + h/x)] / h
      = lim_{g→0} [ln(1 + g)] / (gx)
      = (1/x) lim_{g→0} [ln(1 + g)/g]
      = 1/x
f (x) = sin(x)
df/dx = lim_{h→0} [f(x + h) − f(x)] / h
      = lim_{h→0} [sin(x + h) − sin(x)] / h
      = lim_{h→0} [sin(x) cos(h) + cos(x) sin(h) − sin(x)] / h
      = lim_{h→0} [sin(x)(cos(h) − 1) + cos(x) sin(h)] / h
      = sin(x) lim_{h→0} [(cos(h) − 1)/h] + cos(x) lim_{h→0} [sin(h)/h]
Let us comment at this stage that we do have recourse to using the series expansion for cos(h) =
1 − O(h²) to rapidly show that
lim_{h→0} [(cos(h) − 1)/h] = lim_{h→0} [O(h²)/h] = lim_{h→0} O(h) = 0
and thence, after substituting the limit lim_{h→0} [sin(h)/h] = 1, we would find df/dx = cos(x). However
since this is an analysis course we are going to think of another proof just to practise using the
sandwich theorem again, and of course it will take slightly longer to get to the same answer but
will be interesting. As −1 ≤ cos(h) ≤ 1 then we may deduce that (cos(h) − 1) ≤ 0; this gives
the upper bound for use in the sandwich theorem. For the lower bound we turn to geometry and
the sector shown in figure 2.5. The base of the large right-angled-triangle has length 1, and we
have embedded a similar triangle within it whose base is cos(θ). Hence the remaining part of
the base line (whose length isn’t indicated on the diagram) has length 1 − cos(θ). Now consider
the right-angled-triangle with base 1 − cos(θ) and height sin θ; its hypotenuse has length-squared:
(1 − cos θ)² + sin²(θ) = 1 − 2cos θ + cos²θ + sin²θ = 2(1 − cos θ).
As this triangle would be embedded within the sector of the unit circle shown in figure 2.5, its
hypotenuse must have length less than or equal to the arc length shown, which is θ, i.e. we have
2(1 − cos θ) ≤ θ²
or
(cos θ − 1) ≥ −θ²/2
which gives our lower bound. Returning to our limit, we are now able to sandwich the limit as
follows
0 = lim_{h→0} [−h/2] = lim_{h→0} [−h²/(2h)] ≤ lim_{h→0} [(cos(h) − 1)/h] ≤ 0.
Hence by the sandwich theorem we have
lim_{h→0} [(cos(h) − 1)/h] = 0
which we may use to show⁴ that d/dx (sin(x)) = cos(x).
If either of these cases occurs, the derivative at x = x0 does not exist and the function is said
to be ‘not differentiable’ at x = x0 .
Definition 4.2.1. A function f(x) is differentiable at the point x = x0 if the derivative df/dx exists at x = x0. A function for which the derivative exists for all points in its domain is called
a differentiable function.
Proof: Suppose that f(x) is a differentiable function. Now recall the definition for f(x) to be continuous at the point x = x0, namely that
lim_{x→x0} [f(x)] = f(x0).
This is what we are aiming to show; we will nevertheless start with this statement and see how it relates to the limit which defines the derivative of f(x). We begin by rearranging the statement of the continuity of f(x) at x = x0:
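In outline (a sketch of the rearrangement, multiplying and dividing by x − x0):
lim_{x→x0}[f(x)] − f(x0) = lim_{x→x0}[ f(x) − f(x0) ]
                        = lim_{x→x0}[ (f(x) − f(x0))/(x − x0) · (x − x0) ]
                        = lim_{h→0}[ (f(x0 + h) − f(x0))/h ] · lim_{h→0}[h]
                        = (df/dx)(x0) · 0 = 0,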
where we redefined the limit variable h ≡ x − x0 and in the penultimate line we were able to
split the limit of the product into the product of the limits as we know both limits exist. We only
know that both limits exist because we assumed that the function is differentiable, and hence
it is differentiable at the point x = x0 . Hence by assuming that f (x) is differentiable we have
shown that it is also continuous.
At this stage you might be wondering if there are any continuous functions which are not
differentiable. Indeed there are, and this point tackles our second problem with the derivative:
when the limit of the slope does not exist. The most famous example used to show that a
continuous function is not necessarily differentiable is the modulus function (which we defined
in definition 2.5.3) and is sketched in figure 4.3.
Figure 4.3: The modulus function is a continuous function, but is not a differentiable function.
One can immediately guess where the problem point will be from the graph: the graph of
the function although continuous has a right-angle in it at x = 0. Our instincts will prove to
be correct but let us check it carefully. First we ought to confirm that the modulus function is
continuous, namely we check that
lim_{x→x0} [|x|] = |x0|
this function is simple enough that we may be confident enough to assert the above from the
sketch of the graph: it is a continuous function. But is it differentiable? The derivative can be
readily checked to give (for f (x) = |x|)
df/dx |_{x=x0} = +1 if x0 > 0, and −1 if x0 < 0.
Hence now we see that at x0 = 0 the limit from above in the definition of the derivative does
not equal the limit from below. Above zero the limit in the derivative is +1, while from below
it is −1, hence
lim_{h→0} [f(0 + h) − f(0)] / h
does not exist and so the modulus function is an example of a continuous function which is not
differentiable at x = 0, and hence is not a differentiable function.
These thoughts above about the conditions necessary for a function to have a well-defined
derivative, highlight a very simple idea behind the derivative, namely that if one zooms in
sufficiently close to the graph of a differentiable function, its graph approaches a straight line.
The straight line it approaches is part of the tangent to the function at a point, and the slope
of the tangent is the derivative of the function at the point. No matter how much one zoomed
into the microscopic detail of the modulus function at x = 0, the V shape of the curve would
not smooth out into a straight line - so it is not differentiable. Of course when we face questions
of differentiability we will rarely be in the position of being able to quickly sketch the function
and understand what happens to the curve as we zoom in on a point.
At this point it is useful to emphasise the difference between the function v(x(t)) which is the
velocity written as an explicit function of x(t) (i.e. when one writes v(x(t)) it is written as a
function of x, it is the same as writing v(x) where the notation indicating that x is a function
of t has been suppressed) and the function v(t) which is the velocity function written as an
explicit function of t. We can obtain v(t) from v(x(t)) by substituting the expression for x(t)
into v(x(t)) so we obtain a function of t rather than x. No matter which variable we use to
express v the velocity is still the same for any particular value of t = t0 or x = x0 = x(t0 ).
Hence while it is a simple matter to compute from first principles the derivative d(v(t))/dt, we know that it must also be possible to compute d(v(x(t)))/dt - the only change is that we use the variable x to express v before taking the derivative. This is the aim of the chain rule: to give a method to compute dv/dt starting from v(x), without having to substitute x(t) immediately. It is worth commenting that if we begin with v(x) and x(t) and we wish to compute derivatives there are two simple derivatives we can compute without much thought, namely dv/dx and dx/dt. The chain rule is a simple rule for combining these two derivatives to obtain dv/dt.
Let us now state and prove the chain rule formula in terms of a pair of differentiable functions
f (x) and g(x). The chain rule is:
d(f(g))/dx = (df/dg)(dg/dx)
Proof:
d(f(g))/dx = lim_{h→0} [f(g(x + h)) − f(g(x))] / h
           = lim_{h→0} [ (f(g(x + h)) − f(g(x)))/(g(x + h) − g(x)) · (g(x + h) − g(x))/h ]
           = lim_{h→0} [ (f(g(x + h)) − f(g(x)))/(g(x + h) − g(x)) ] · lim_{h→0} [ (g(x + h) − g(x))/h ]
           = lim_{ε→0} [ (f(g + ε) − f(g))/ε ] · lim_{h→0} [ (g(x + h) − g(x))/h ]
           = (df/dg)(dg/dx)
where we have made a change of the limit variable so that ε ≡ g(x + h) − g(x), hence g(x + h) = g(x) + ε and
lim_{h→0} (ε) = lim_{h→0} (g(x + h) − g(x)) = 0.
Example 4.7. Use the chain rule to find the derivative of f (ax) with respect to x where a is
a constant.
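A one-line sketch of the solution: taking g(x) = ax, so that dg/dx = a, the chain rule gives
d/dx f(ax) = (df/dg)(dg/dx) = a f'(ax).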
Example 4.8. Find the derivative with respect to x of the function e^{x^n} as a function of x.
Using the chain rule (with f(x) = e^x and g(x) = x^n, so that f(g(x)) = e^{x^n}) we have
d/dx e^{x^n} = d(f(g))/dx = (df/dg)(dg/dx) = n x^{n−1} e^{x^n}.
Example 4.9. Find the derivative with respect to x of the function ln(cosh(x)) as a function
of x.
Using the chain rule (with f(x) = ln(x) and g(x) = cosh(x), so that df/dx = 1/x and dg/dx = sinh(x)) we have
d/dx ln(cosh(x)) = d(f(g))/dx = (df/dg)(dg/dx) = (1/g) sinh(x) = sinh(x)/cosh(x) = tanh(x).
Example 4.10. Find the derivative with respect to x of the function f (g(h(x))) as a function
of x.
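A sketch of the solution, applying the chain rule twice (first to f with inner function g(h(x)), then to g with inner function h(x)):
d/dx f(g(h(x))) = f'(g(h(x))) · g'(h(x)) · h'(x),
or in Leibniz notation (df/dg)(dg/dh)(dh/dx).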
Exercise 4.4. Use the product rule to find the derivatives of f(x) = sin²(x) and g(x) = cos²(x).
Make sure your results agree with the calculation of the derivatives from first principles found
in answering the previous exercise.
Example 4.11. Find the derivative with respect to x of the function x² sin x as a function of x.
Using the product rule (on the product of functions x² and sin x) we have
d/dx (x² sin x) = d(x²)/dx · sin x + x² · d(sin x)/dx = 2x sin x + x² cos x.
Example 4.12. Find the derivative with respect to x of the function x^x as a function of x.
We will insert a natural logarithm and an exponential to manipulate x^x into a form upon which we may employ the chain rule:
d/dx x^x = d/dx e^{ln(x^x)} = d/dx e^{x ln(x)}.
Now we may use the chain rule (and the product rule) with f(x) = e^x and g(x) = x ln(x), so that f(g(x)) = e^{x ln x}. Hence we have:
d/dx e^{x ln(x)} = e^{x ln x} d/dx (x ln x) = e^{x ln x} (ln x + 1) = x^x (1 + ln x).
This is not so simple a formula to remember, so it is good news that it can be derived from the
product rule and the chain rule together.
Exercise 4.5. Check the validity of the quotient rule by first computing the derivative of tan(x) from first principles and comparing your result with the derivative d/dx [sin x / cos x] computed using the quotient rule.
Now, since both f and f −1 are differentiable functions we may use the chain rule to obtain:
d/dx f(f^{−1}(x)) = (df/df^{−1}) · d(f^{−1}(x))/dx = 1.
Hence,
d(f^{−1})/dx = 1 / (df/df^{−1})
or using the primed notation to denote the derivative with respect to the argument we have
(f^{−1})'(x) = 1 / f'(f^{−1}(x))
which is the statement in the last part of the inverse function theorem. The meaning of the
above is best seen through some examples.
The natural logarithm is the inverse function of the exponential function f (x) = ex and is
defined for x > 0. The defining relation is:
e^{ln(x)} = x.
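Differentiating both sides of this defining relation with respect to x (a short sketch of the step left implicit here) gives
d/dx e^{ln(x)} = e^{ln(x)} · d/dx ln(x) = 1,
hence d/dx ln(x) = 1/e^{ln(x)} = 1/x, in agreement with the first-principles computation earlier.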
arcsin(x) is the inverse function for f(x) = sin(x) restricted to [−π/2, π/2], and is defined for x ∈ [−1, 1]. The
defining relation is:
sin (arcsin(x)) = x.
Hence,
d/dx arcsin(x) = 1/cos(arcsin(x)) = 1/√(1 − sin²(arcsin(x))) = 1/√(1 − x²).
Where we have made use of the identity cos²(x) + sin²(x) = 1.
arctan(x) is the inverse function for f(x) = tan(x) restricted to (−π/2, π/2), and is defined for all x ∈ R. The
defining relation is:
tan (arctan(x)) = x.
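Differentiating both sides of this defining relation with respect to x (the same pattern as for arcsin, sketched here) gives
(1 + tan²(arctan(x))) · d/dx arctan(x) = 1, and 1 + tan²(arctan(x)) = 1 + x².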
Hence,
d/dx arctan(x) = 1/(1 + x²).
4.4.2 Curves
A curve, for the purpose of this course, means the set of solutions to equations defined in terms
of x and y which are related by
R(x, y) = 0.
Typically the sets of points which satisfy the above relation form lines which may be curved
(as the name suggests) or straight lines (as the name does not suggest). There is a difference
between a relation being used to implicitly define a function as above and the idea that one can
find an explicit function f (x) to define the curve as y = f (x). Usually one can only locally find
a function y = f (x) which will relate the y-coordinate of a point on a curve to its x-coordinate,
but one can cover the curve with these local functions. Consider the example of the unit circle
which is defined by the relation:
x2 + y 2 = 1.
One can rearrange this equation to find two functions which cover the curve for different values
of the coordinates, namely
y = +√(1 − x²) for y ≥ 0, and y = −√(1 − x²) for y < 0.
For more interesting curves the principle remains the same: one can locally find functions of
the form y = f (x) that are the same shape as the curve (locally). These local functions are the
functions which are defined implicitly by R(x, y) = 0.
To find the gradient of a tangent to a curve, one can employ the chain rule and the defining
relation, together with the notion that y = f (x):
d/dx R(x, y) = d/dx R(x, f(x)) = 0.
Example 4.16. Find the derivative dy/dx for the function y(x) defined implicitly by the relation:
R(x, y) ≡ y + sin y − x = 0.
We treat y as a function of x, i.e. y = y(x) and use the chain rule when taking the derivative
of the relation with respect to x:
d/dx R(x, y) = d/dx [y + sin y − x] = dy/dx + cos y · dy/dx − 1 = 0,
which we may rearrange to find dy/dx = 1/(1 + cos y). Given an explicit point (x0, y0) which lies on R(x, y) = 0, i.e. a solution of
y0 + sin y0 − x0 = 0,
we can then compute the gradient of the tangent line to the curve at that point, i.e.
dy/dx (x0) = 1/(1 + cos y0).
For example at x = 0 we have y = −sin(y) which is satisfied when y = 0, so the point (0, 0)
lies on the curve and at that point the gradient of the tangent line is dy/dx(0) = 1/(1 + cos 0) = 1/2. As
y increases we see that the gradient oscillates between being undefined (i.e. the curve becomes
vertical) and 12 . So we expect this curve to wiggle its way through the point (0, 0) and a plot
of the graph can confirm this. The sketch of the graph shown here in figure 4.4 was found by
solving the differential equation for the gradient (and setting the constant to zero) to obtain
y(x).
Figure 4.4: The graph of the curve y(x) defined as the solution set of y + sin y − x = 0.
Example 4.17. Find the derivative dy/dx for the function y(x) defined implicitly by the relation:
R(x, y) ≡ y − tanh(xy) = 0.
We have,
d/dx R(x, y) = d/dx [y − tanh(xy)] = dy/dx − (1/cosh²(xy)) d(xy)/dx = dy/dx − (1/cosh²(xy))(y + x dy/dx) = 0
which we may rearrange to find
dy/dx = y / (cosh²(xy) − x).
This is a challenging graph to sketch! First we notice that when y = 0 we have dy/dx(y = 0) = 0/(1 − x), which equals zero apart from at the point x = 1 where it becomes undefined. For which values of x does y(x) = 0? We must solve 0 = tanh(x · 0), which is true for all x. Hence we find one set of solutions is given by the x-axis, whose gradient is zero. Let us turn our thoughts
to the point x = 1 where the derivative was undefined: evidently y = tanh(y) is solved when
y = 0, but this we already knew. So we may (correctly) wonder why the derivative is undefined
there. Now as y = tanh(xy) then y ∈ (−1, +1). Let us see if there are any points on the curve where y(x) = 1; this corresponds to tanh(x) = 1, which is only true in the limit, i.e. lim_{x→∞}(tanh(x)) = 1, where we find
lim_{x→∞, y→1} dy/dx = lim_{x→∞, y→1} [ y / (cosh²(xy) − x) ] = lim_{x→∞} [ 1 / (cosh²(x) − x) ] = 0.
So the curve is tangential to y = 1 as x → +∞. We can make a similar argument for y = −1
when x → ∞, so y = −1 is also a tangent to the curve as x → ∞. We have now three tangent
lines to the curve as x → ∞, given by y = {−1, 0, 1}, we can conclude that the curve has at
least three parts to it as x grows. We may wonder if there are more branches to this curve?
Consider y = δ where 0 < δ < 1, then we are aiming to solve the equation δ = tanh(xδ) which
has a unique solution x = 1δ arctanh(δ). Hence we now have the picture that there are three
parts to the curve: a curve when y > 0 and x > 1, the line y = 0 and a curve when y < 0 and
x > 0 (for the last curve consider y = δ′ where −1 < δ′ < 0). Finally consider the gradient for the part of the curve where y > 0 and x > 1: at x → ∞ the gradient approaches zero. We may consider the points near x = 1 by substituting x = 1 + ε where ε ≥ 0 to find:
lim_{y→0⁺, x→1⁺} dy/dx = lim_{y→0⁺, ε→0⁺} y / [cosh²((1 + ε)y) − (1 + ε)]
                       = lim_{y→0⁺, ε→0⁺} y / [ (1 + (1 + ε)²y²/2! + O(y⁴))² − (1 + ε) ]
                       = lim_{y→0⁺} y / (y² + O(y⁴))
                       = ∞.
Hence the upper curve becomes vertical as it approaches the point (x = 1, y = 0). A similar
argument can be made for the curve when y < 0. Eventually these lengthy observations can
allow us to sketch the curve and we show the graph in figure 4.5.
Figure 4.5: The graph of the curve y(x) defined as the solution set of y − tanh (yx) = 0.
(t, mt + c). The choice of parameterisation is not unique: one can always shift t by a constant t₀ so as to move the point corresponding to t = 0 along the curve: if t′ ≡ t + t₀ then t = t′ − t₀ and the parameterisation of the straight line becomes (t, mt + c) = (t′ − t₀, mt′ − mt₀ + c), and if we picked the value t₀ = c/m then the line becomes the points (t′ − c/m, mt′) now parameterised by t′: one can move the origin in t (t = 0) to wherever one wishes on a parametric curve in this way, which can be very useful.
Now, given (x(t), y(t)) we may wonder if y(x) exists. Suppose that x = f(t); if f^{−1} exists and is well-defined we have t = f^{−1}(x), hence we would then have y(t) = y(f^{−1}(x)) ≡ y(x(t)). So if y(x(t)) exists then by the chain rule we have:
dy/dt = (dy/dx)(dx/dt)
and hence
dy/dx = (dy/dt) / (dx/dt).
Now as x(t) and y(t) are differentiable functions we may compute directly dx/dt and dy/dt, and combine to obtain dy/dx, so long as dx/dt ≠ 0.
Example 4.18. Find the derivative function dy/dx for the curve defined parametrically by
x(t) = t + cos(t)
y(t) = ln(cosh(sin(t)))
where t ∈ R.
dx/dt = 1 − sin(t)
dy/dt = (1/cosh(sin(t))) d/dt cosh(sin(t)) = (sinh(sin(t))/cosh(sin(t))) d/dt sin(t) = tanh(sin(t)) cos(t)
Hence we see that the derivative is not defined when t = π/2 + 2nπ for n ∈ Z, but is defined elsewhere. We have:
dy/dx = (dy/dt) / (dx/dt) = tanh(sin(t)) cos(t) / (1 − sin(t)).
A sketch of the parametric curve is shown in figure 4.6 where it can be observed that the derivative is not well-defined at x = π/2 + 2nπ.
Example 4.19. Find the derivative function dy/dx for the curve defined parametrically by
x(t) = e^t
y(t) = tan(t)
where t ∈ R.
Figure 4.6: The graph of the curve y(x) defined parametrically as the points (t + cos(t), ln(cosh(sin(t)))) for t ∈ R.
Our preliminary observation is that x(t) > 0 so the curve is defined only for x ∈ R+ . We
may compute the following derivatives immediately:
dx/dt = e^t = x
dy/dt = 1 + tan²(t) = 1 + y²
Hence we see that the derivative is not defined when x = 0 but is defined elsewhere. We have:
dy/dx = (dy/dt) / (dx/dt) = (1 + y²) / x.
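As a consistency check (an observation not made in the example itself): for x > 0 we have t = ln(x), so y = tan(ln(x)), and differentiating directly gives dy/dx = (1 + tan²(ln(x))) · (1/x) = (1 + y²)/x, in agreement with the parametric computation.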
Figure 4.7: The graph of the curve y(x) defined parametrically as the points (e^t, tan(t)) for t ∈ [−8, 8].
speed limit at any point between the two measurement points. All this rests upon the mean
value theorem.
Theorem 4.3. (The Mean Value Theorem) Let f be a continuous function on [a, b] and a
differentiable function on (a, b), then there exists a point c ∈ (a, b) where
f'(c) = (f(b) − f(a)) / (b − a).
The mean value theorem says that at some point c the slope (f(b) − f(a))/(b − a) of the straight line
connecting f (a) to f (b) is actually equal to the derivative of f at c. It is helpful to think what
this means when the function is the position function and the derivative is taken with respect
to time. In this case the mean value theorem tells us that for a journey from f (a) to f (b) then
your average speed (f(b) − f(a))/(b − a) is equal to your actual instantaneous speed for at least one moment,
time c. Let us try some numbers and stick with the setting of the distance, time and speed.
Suppose that it takes you 1 hour to travel 60 miles, then your average speed for the journey
is 60 miles per hour (mph). The mean value theorem tells you that for at least one moment in the journey you were travelling at 60 mph. Of course you might have travelled the route in
a number of different ways, e.g. you may have travelled at 60 mph for the entire journey, or
you may have travelled part of the journey at a speed greater than 60 mph and another at a
speed less than 60 mph (so that the average speed is still 60 mph), but in that case your speed
would have to pass through 60 mph at some point. This is obviously an unpleasant realisation
for an apprentice analyst wishing to break the speed limit. Such a mathematician might have
thought to argue in court against the evidence of a speed trap by imagining that there existed
a position function whose derivative nowhere exceeded the speed limit, but the mean value
theorem tells her or him that no such function exists.
Let us return to abstract functions and illustrate the mean value theorem on a graph. In
figure 4.8 we see a curve y = f(x) and construct the straight line which passes through the points f(a) and f(b).
Figure 4.8: The mean value theorem illustrated on the curve y = f(x).
The slope of this straight line is (f(b) − f(a))/(b − a) and the mean value theorem
tells us that a line parallel to this straight line is tangent to the curve y = f(x) at at least one point x = c where c ∈ (a, b); in figure 4.8 we illustrate this by translating the line passing through f(a) and f(b) until it is tangent to y = f(x). For the curve we have sketched there is at least one other
point on y = f (x) whose tangent has the same slope.
A very useful example of the mean value theorem occurs when f (a) = f (b) as in this case
we learn that there exists some c such that
f'(c) = (f(b) − f(a))/(b − a) = (f(a) − f(a))/(b − a) = 0
i.e. there exists a stationary point at x = c. This example of the mean value theorem is
important enough to have its own name, it is called Rolle’s theorem after Michel Rolle who
first proved this in 1691.
We commenced this section by wondering whether we can construct a function from knowl-
edge of its derivatives. The mean value theorem tells us about at least one derivative on the
open interval (a, b), so how might we use this to construct the function on the open interval?
There is a very useful theorem that allows us to reconstruct certain functions: the constant
functions.
Theorem. Let f be a differentiable function on an open interval such that f'(x) = 0 for every x in the interval; then f is a constant function on that interval.
Proof: We may consider two points a and b defined on the open interval and then we have,
by the mean value theorem, that
(f(b) − f(a))/(b − a) = f'(c)
where c ∈ (a, b). However we know (from the statement of the theorem) that the function’s
derivatives are all zero on the open interval, which includes (a, b) hence f 0 (c) = 0. Therefore
we have
f (b) − f (a) = 0
for all a and b, hence f (a) = f (b) for any choice of a and b in the open interval where the
derivative vanishes, hence it is a constant function.
In this chapter we developed a definition of the derivative function and studied its properties.
The derivative is the instantaneous slope of a function and is defined using the limit. We
developed some properties of the derivative, most importantly the chain rule and the product
rule. It was proved that every differentiable function is also continuous, and we saw how we can use the derivative on functions defined implicitly, via an inverse, or on curves defined parametrically, and we had some practice sketching functions with the aid of the derivative function. The
final section of this chapter presented the mean value theorem and we saw how the vanishing
derivative on an open interval implied that the function must be constant on the open interval.
We began our final comments by speculating whether we could commence with knowledge of
a derivative function and use it to reconstruct the function. Such an operation would be the
inverse of the derivative, it would be the process of finding the antiderivative. In the following
chapter we will see how the antiderivative is related to the integral which gives a method for
finding the area under a curve.
5. Integration
In which we meet the integral: a way of turning a sum of infinitely many infinitesimally thin
parallelograms into a finite number which gives the area under a curve. Astonishingly this will
be related by the fundamental theorem of calculus to the antiderivative of a function!
“Newton had made his discoveries first, and he had discovered more, but Leibniz had done what Newton had not: published his work for the world to use and to judge.”

Figure 5.1: Isaac Newton (top) in 1689 and Gottfried Leibniz (below).

It is widely accepted that both men discovered the calculus independently but Newton did his reputation no favours during the dispute. When a committee of the Royal Society gave its report on the argument it said, in the words of Gleick, that:
“It judged Newton’s method to be not only the first - “by many years” - but also more elegant, more natural, more geometrical, more useful and more certain.”
The President of the Royal Society at the time the report was made and also the secret author
of the report was... Isaac Newton.
Figure 5.2: The function x(t) where x′(t) is constant. The area under the graph as the position
changes from xS to xF is shaded, badly, but not particularly dynamically interesting.
motion is a constant function and we sketch it below. The area under the velocity function is
Figure 5.3: The function v(t) is constant. The area under the graph as the position changes
from xS to xF is shaded.
dynamically interesting: as the speed for this example is constant, we can use the formula for the average speed as the particle moves from xS at time tS to xF at time tF, which gives
(dx/dt) × (tF − tS) = xF − xS.
For this simple motion we see that from the graph of the velocity, dx/dt, we can obtain information related to the graph of the position function x(t) rather than just v(t).
We have considered a very simple straight line function, but for infinitesimal domains any
continuous function is approximately a straight line, so we may expect this feature to continue,
that integration (the process of computing the area under a graph) is roughly the inverse
procedure to taking the derivative, that is,
x(t) → v(t) via the derivative d/dt, and v(t) → x(t) via integration.
That integration is the inverse of differentiation is remarkable and is the fundamental theorem
of calculus - we will return to this later in this chapter. First we must formalise and develop
integration.
Figure 5.5: A staircase function: let the vertical sides of the rectangles sit at x1 < x2 < x3 <
. . . < xn < xn+1 and let the height of the k'th rectangle be fk.
where fk is a sequence of real numbers (not specified here, although it is possible to read off
some values of fk from the graph). Now the area of the k’th rectangle is its width times its
height or
(xk+1 − xk) fk
and notice that the sign convention agrees with that of the integral: if fk < 0 then the “area” is
counted negatively. This unusual function is chosen solely because we can compute its integral
exactly:
∫_{x1}^{xn+1} f(x)dx = Σ_{k=1}^{n} (xk+1 − xk) fk.
On the left-hand-side we have the notation denoting the integral, while on the right-hand-side
we have been able to write an expression for the sum of the areas of the n rectangles under the
staircase function between x = x1 and x = xn+1 . This exact integral is the foundation upon
which all of our Riemann integrals will be constructed.
In the previous example we considered a staircase function and showed that we could com-
pute its integral exactly. This was because the staircase function’s special shape allowed it to
be partitioned into rectangles whose areas could be rapidly evaluated. We would like to be
able to find the integral of an arbitrary curve, not only staircase functions, and what we will do
is to try to come up with a method of approximating curves to staircase functions. What we
will achieve ultimately will be a way of sandwiching the integral of any function between two
staircase functions, such that in a limit we can compute the integral. We will worry what this
means soon enough, first let us try to see how we might construct staircase functions which
closely approximate an arbitrary continuous function.
So how would we approximate a continuous function to a staircase function? What we
would like is a staircase whose steps are the same height as the function. But as the function
is continuous it cannot have any jumps in values as occurs for the staircase function when one
moves between steps. But what if we made the width of the steps infinitesimally thin? If we
could do this then we could approximate a continuous function by a staircase function whose
steps have infinitesimal width and whose heights are equal to the function. This will be our
goal, but first let us think a little about some staircase functions which begin to approximate
an arbitrary continuous function.
Imagine writing an algorithm to associate a staircase function to a curve: it would be difficult
to know where to begin, indeed there are many possible beginnings. There are a number of
decisions to make: how wide should the steps be? Should the step width vary or be constant?
How do we decide the height of each step so that it is related to the function? For example should
the step height be the value of the function at the mid-point of the step width (as depicted
in figure 5.6), or it could be given by the top left-hand-corner of each step, or the right-hand
corner or indeed any point on the step could be put equal to the function at the same point.
How will we find a unique way to associate a staircase function to a continuous curve? Well,
the answer is that for steps of finite width we will not find a unique staircase function, but once
we take the limit so that the steps go to infinitesimal width then all staircase functions will
approach the continuous function. The algorithm we will follow is:
(i) consider all possible staircase functions where each step touches the curve we are interested
in (typically this will be an infinite set of staircase functions),
(ii) find the area under each staircase graph in the limit where the width of all steps goes to
zero and
(iii) check whether all these areas for the limits of the variety of staircase functions give the
same area.
Figure 5.6: An example of a staircase function (split into rectangular strips) for which the
mid-point of the step is equal to the function, note the varying width of the steps.
The method proposed is tedious and, potentially, will take an infinite time to actually carry
out but since such a method may exist it gives us courage that, for a continuous function
f(x) defined on the interval [a, b], the integral ∫_a^b f(x) dx exists. This encapsulates the formal
definition of the Riemann integral:
Definition 5.1.2. The (Riemann) integral of a function f (x) on the interval [a, b], if it exists,
is equal to the sum of the areas of the steps of any staircase function S(x) in the limit that the
step width, w, approaches zero so long as limw→0 (S(x)) = f (x).
Comment(s). (On the Riemann integral...) Notice that we have generalised the discussion and
the Riemann integral is defined for any function, not just continuous functions in this definition
and we have introduced the idea that the integral might not exist for general functions. As we
have seen earlier in the course the limit does not always exist and it is therefore possible for the
integral of a function to not be defined. For example, think of integrating 1/x on the interval
[0, b] where b > 0, from the graph we know that the integral will be infinite and so not exist.
The short story (but not the full story) is that if a function is continuous on the interval then
the Riemann integral exists and the function is said to be integrable on the interval.
To construct the Riemann integral we will first notice that out of the infinite set of staircase
5.1. THE RIEMANN INTEGRAL 131
functions which we may associate with a function we are trying to integrate, there are two
classes of staircase function which stand out:
• S − (x) those which have the minimum area (for a given, not necessarily constant, set of
step-widths) and
• S + (x) those which have the maximum area (for a given, not necessarily constant, set of
step-widths).
Before defining these functions let us consider the graphs in figure 5.7 showing examples of
S⁻(x) and S⁺(x) for an arbitrary continuous function f(x).
Figure 5.7: Staircase functions constructed for the function f(x) (shown in the centre and repeated in the diagrams on the left and on the right). The area of the staircase function S⁻(x) underestimates the integral of f(x), while the area of S⁺(x) overestimates the same integral.
Notice that, when f(x) > 0, the
area of each rectangular step in the staircase function S − (x) is less than the area under the
graph evaluated on the interval covered by the step’s width. When f (x) < 0 the integral of
each step in S − (x) is more negative (less than) that of f (x) on the same interval. Now consider
S + (x) and notice that the integral of each step in the staircase function is always greater than
the integral of f (x). In other words the staircase functions satisfy the relation
side of the n steps. We will consider the integral over the interval x ∈ [a, b] so x1 = a and
xn+1 = b. Hence we can define S⁻(x) and S⁺(x) for a given function f(x) as follows: the k'th step of S⁻(x) has height S⁻_k = min[f(x)] for x ∈ [xk, xk+1), and the k'th step of S⁺(x) has height S⁺_k = max[f(x)] for x ∈ [xk, xk+1). In words, each step of the staircase function has constant height over the range x ∈ [xk, xk+1).
We pick the height of each step of S − (x) to just touch f (x) at its minimum point, so it will
always have an integral less than that of f (x), while S + (x) has steps which just touch f (x) at
its maximum point in the width of the step, so the integral of S + (x) is always greater than the
integral f (x). Notice that the width of each step has not been fixed in any way, the steps have
arbitrary width, so there still remain an infinite set of choices for S − (x) and S + (x). However
we are now sure that:
A⁻ ≡ Σ_{k=1}^{n} S⁻_k (xk+1 − xk) = ∫_a^b S⁻(x)dx ≤ ∫_a^b f(x)dx ≤ ∫_a^b S⁺(x)dx = Σ_{k=1}^{n} S⁺_k (xk+1 − xk) ≡ A⁺.
We have “sandwiched” the integral of f (x) between the integrals for the two staircase functions
which we are able to evaluate as summations. The idea now is to use the sandwich theorem.
If we can take a limit so that S + (x) and S − (x) both approach f (x) then if the limits of their
integrals exist and are equal then that limiting value is the value of the integral of f (x). The
real question is what is the limit that we should take? By now we should expect to take a limit
such that the width of the steps in S ± (x) become infinitesimal. But of course if we simply make
the step widths thinner without increasing the number of steps then the staircase functions will
no longer cover the domain of the function. So we must do two things at once when we take
the limit:
• the width of each step must become infinitesimal, i.e. (xk+1 − xk) → 0, and
• the number of steps n must grow infinitely large, i.e. n → ∞.
How do we know that the number of steps must grow infinitely large? Well we are interested
in the integral over x ∈ [a, b], hence the width of the staircase functions in total must remain
b − a as we take the limit, i.e. we need b − a = Σ_{k=1}^{n} (xk+1 − xk), and so as (xk+1 − xk) shrinks,
so the number of steps n must increase as b − a is constant.
You might think of this limit as moving through the space of staircase functions S ± (x)
to arrive at the pair of staircase functions whose steps are infinitesimally thin. For example
pictorially for S − (x) the process of taking this limit and moving through the staircase functions
is shown in figure 5.8.
Figure 5.8: The width of each step in the staircase function S⁻(x) is reduced and the number of steps is increased, so that the integral of S⁻(x) approaches from below that of f(x).
If we now take this (abstract) limit on the inequality which sandwiches ∫_a^b f(x)dx we have:
lim_{(xk+1−xk)→0, n→∞} [A⁻] ≤ ∫_a^b f(x)dx ≤ lim_{(xk+1−xk)→0, n→∞} [A⁺].
For x ∈ [0, b], the cosine function decreases monotonically (i.e. if x, x′ ∈ [0, b] and x′ > x then cos(x′) < cos(x)). Now we may construct the staircase functions S±(x) via S⁻_k = cos(xk+1) and S⁺_k = cos(xk) for x ∈ [xk, xk+1).
Now, for convenience, we may choose steps of equal width, w, with x1 = 0 and xn+1 = b so that
as b = xn+1 = nw, then w = b/n.
We can also find expressions for each of the x-coordinates of the sides of the steps, xk ,
xk = x1 + (k − 1)w = (k − 1)w = b(k − 1)/n.
The limit we wish to take involves simultaneously taking n → ∞ and w → 0, which will be
problematic unless we can rewrite the summation, in order to move n from the summation and
into an expression where we can take the limit.
[Hint: First prove that Σ_{k=0}^{n} z^k = (1 − z^{n+1})/(1 − z), then substitute z = e^{iθ} = cos(θ) + i sin(θ) into this result.]
Hence we have,
A⁻ = w Σ_{k=1}^{n} cos(kw)
   = w ( Σ_{k=0}^{n} cos(kw) − 1 )
   = w [ (1 − cos((n + 1)w) − cos(w) + cos(nw)) / (2 − 2cos(w)) − 1 ]
   = w (−1 − cos((n + 1)w) + cos(w) + cos(nw)) / (2 − 2cos(w))
   = w (−1 − cos(b + w) + cos(w) + cos(b)) / (2 − 2cos(w))
   = w (−1 − cos(b)cos(w) + sin(b)sin(w) + cos(w) + cos(b)) / (2 − 2cos(w))
where we have used b = nw to eliminate n from the expression. Hence we may now attempt to take the limit w → 0: using cos(w) = 1 − w²/2 + O(w⁴) and sin(w) = w + O(w³), the numerator behaves as w(w sin(b) + O(w²)) while the denominator 2 − 2cos(w) behaves as w² + O(w⁴), so that lim_{w→0}[A⁻] = sin(b).
For A+ we have
A⁺ = Σ_{k=1}^{n} S⁺_k (xk+1 − xk) = w Σ_{k=1}^{n} cos(xk) = w Σ_{k=1}^{n} cos((k − 1)w).
Hence
A⁺ = w Σ_{k=1}^{n} cos((k − 1)w)
   = w Σ_{ℓ=0}^{n−1} cos(ℓw)
   = w (1 + Σ_{ℓ=1}^{n} cos(ℓw) − cos(nw))
   = w (1 − cos(b)) + A⁻
Therefore as we have
sin(b) = lim_{w→0} [A⁻] ≤ ∫_0^b cos(x)dx ≤ lim_{w→0} [A⁺] = sin(b)
then
∫_0^b cos(x)dx = sin(b).
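As a numerical cross-check (a small Python sketch; the value b = 1 is chosen arbitrarily), the lower and upper staircase sums A⁻ and A⁺ with equal step widths squeeze the value sin(b) as the number of steps grows:

    import math

    # Lower and upper staircase sums for cos(x) on [0, b], 0 < b <= pi, using
    # n equal steps of width w = b / n.  Since cos is decreasing on [0, pi],
    # the lower sum samples the right end of each step, the upper sum the left end.
    def staircase_sums(b, n):
        w = b / n
        lower = w * sum(math.cos((k + 1) * w) for k in range(n))  # A^-
        upper = w * sum(math.cos(k * w) for k in range(n))        # A^+
        return lower, upper

    b = 1.0
    for n in (10, 100, 1000):
        print(n, *staircase_sums(b, n), math.sin(b))  # both sums -> sin(1) = 0.841...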
Exercise 5.2. The construction of the staircase functions in the example above rested on the
observation that cos(x) is monotonically decreasing for x ∈ [0, b] where 0 ≤ b ≤ π. Outline the
procedure for showing that
∫_0^b cos(x)dx = sin(b)
for b > π.
Exercise 5.3. Construct the staircase functions one would use to evaluate
∫_0^b sin(x)dx
Example 5.2. Use staircase functions and the sandwich theorem to evaluate
A = ∫_a^b x^m dx
In this example we will follow a seemingly more flamboyant path to evaluate the integral, namely
we will consider steps of varying size which grow as a geometric progression in w. Given that
x1 = a and xn+1 = b then we define the step sides to lie at xk = a w^{k−1}, for k = 1, . . . , n + 1.
Therefore we have
w = (b/a)^{1/n}  ⟹  xk = a (b/a)^{(k−1)/n}.
Let us make a few mathematical comments about this choice and a few consequences that will
prove useful.
(iv) the largest step size is the final one, which has width xn+1 − xn = a w^{n−1}(w − 1) = b(1 − 1/w), and
and
(v) the limit in which the largest step’s width is reduced to zero is the limit when w → 1+ (the
limit is taken from above as by point (i) w > 1).
A⁺ = a^{m+1} w^m (w − 1) ((b/a)^{m+1} − 1) / (w^{m+1} − 1).
Turning to the lower bound and noting that
S⁻_k = x_k^m = (a w^{k−1})^m = (a w^k)^m w^{−m} = S⁺_k w^{−m}
We have,
A⁻ = w^{−m} A⁺ ≤ ∫_a^b x^m dx ≤ A⁺ = a^{m+1} w^m (w − 1) ((b/a)^{m+1} − 1) / (w^{m+1} − 1).
This looks simultaneously promising and horrendous: the good news is that in the limit w → 1 we have lim_{w→1⁺}[A⁻] = lim_{w→1⁺}[w^{−m} A⁺] = lim_{w→1⁺}[A⁺], so that A will be sandwiched by the limit if it exists; the bad news is that we will need to evaluate lim_{w→1⁺}[A⁺]. So,
lim_{w→1⁺}[A⁺] = lim_{w→1⁺} [ a^{m+1} w^m (w − 1) ((b/a)^{m+1} − 1) / (w^{m+1} − 1) ]
             = lim_{w→1⁺} [ (b^{m+1} − a^{m+1}) w^m (w − 1) / (w^{m+1} − 1) ]
             = (b^{m+1} − a^{m+1}) lim_{w→1⁺} [w^m] · lim_{w→1⁺} [ (w − 1)/(w^{m+1} − 1) ]
             = (a^{m+1} − b^{m+1}) lim_{w→1⁺} [ (1 − w)/(w^{m+1} − 1) ]
             = −(a^{m+1} − b^{m+1}) ( lim_{w→1⁺} [ (1 − w^{m+1})/(1 − w) ] )^{−1}
             = −(a^{m+1} − b^{m+1}) ( lim_{w→1⁺} [ Σ_{j=0}^{m} w^j ] )^{−1}
             = (b^{m+1} − a^{m+1}) / (m + 1),
and lim_{w→1⁺}[A⁻] = lim_{w→1⁺}[w^{−m}] lim_{w→1⁺}[A⁺] gives the same value, so by the sandwich theorem
∫_a^b x^m dx = (b^{m+1} − a^{m+1}) / (m + 1).
It is worth noting a small point about generalising this result. If we had considered x < 0 then
the defining relations of Sk− and Sk+ would have depended crucially on whether n is odd or even,
e.g. for x < 0 and odd n then
S⁻_k = min[x^n] for x ∈ [xk, xk+1)  ⟹  S⁻_k = x_k^n
S⁺_k = max[x^n] for x ∈ [xk, xk+1)  ⟹  S⁺_k = x_{k+1}^n.
But notice that in the long run this fact would not cause us much concern as the change of n
from even to odd amounts to interchanging the definition of Sk+ and Sk− . Hence if the integral
exists and the pair of staircase functions limit to the same sum, then it will not change the
result if the bounding functions are swapped depending upon even or odd n.
Of course it is very satisfying that meaning can be given to an integral and that we can
construct the Riemann integral from first principles in this way. However it is a tedious and
lengthy computation and it will come as a relief to learn that we do not need to repeat this
process for each and every function we have met in the course so far. Instead we can rely on
the fundamental theorem of calculus to give us a quick way to integrate functions.
Theorem 5.1. (The first fundamental theorem of calculus.) Let f : [a, b] → R be a continuous
function and let F : [a, b] → R be defined by
F(x) = ∫_a^x f(t)dt.
Then, F is a continuous function on [a, b], differentiable on the open interval (a, b), and
dF/dx = f(x).
Theorem 5.2. (The second fundamental theorem of calculus.) Let f : [a, b] → R be a contin-
uous function and let F (x) be defined by
dF/dx = f(x) ∀ x ∈ [a, b].
Then, if f(x) is integrable on [a, b],
∫_a^b f(x)dx = F(b) − F(a).
2. The first fundamental theorem states that the integral of a function (from a to x) is the
antiderivative F (x). At this stage it should be clear that
∫_a^x f(t)dt
is a function of x, but it should not yet be clear that this function will be the antiderivative
and the appearance of a as a limit of the integral is not yet explained.
3. The second fundamental theorem tells us that if we know the antiderivative of a function,
then we can use this to rapidly evaluate the integral (rather than resort to the time-
consuming construction of staircase functions and the use of the sandwich theorem, as we
have seen in the previous section).
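For instance, combining this with the first-principles results above: since d/dx sin(x) = cos(x), an antiderivative of cos(x) is F(x) = sin(x), and the second fundamental theorem immediately gives ∫_0^b cos(x)dx = sin(b) − sin(0) = sin(b), in agreement with the lengthy staircase computation above.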
Proof: (the first fundamental theorem of calculus) In this proof we will limit ourselves to
the case where f (x) is a differentiable function. Now,
dF/dx = lim_{h→0} [F(x + h) − F(x)] / h
      = lim_{h→0} (1/h) [ ∫_a^{x+h} f(t)dt − ∫_a^x f(t)dt ]
      = lim_{h→0} (1/h) ∫_x^{x+h} f(t)dt.
Next we will introduce an upper and lower bound on f(t) where t ∈ [x, x + h] and make use of the sandwich theorem in order to take the limit above. We will make use of
C ≡ max_{t∈[x,x+h]} |df/dt|;
this is the maximum of the modulus of the derivative in the range [x, x + h]. Now we don't need to evaluate this number; it will prove very useful in evaluating the limit, but will vanish
when we take the limit. As C is the maximum rate of increase (or decrease) of the function
f(t) on the interval [x, x + h] we know that
f(x) − Ch ≤ f(t) ≤ f(x) + Ch   for all t ∈ [x, x + h].
Where did this inequality come from? The sketch in figure 5.9 will help us to understand it.
In figure 5.9 we have marked in the straight lines with gradient ±C which pass through f (x).
As, by the definition of C, df/dt ≤ C and df/dt ≥ −C for all t ∈ [x, x + h], the straight lines with the gradients ±C are always greater/less than or equal to f(t) for t ∈ [x, x + h]. The maximum/minimum points on these lines lie at t = x + h and hence we have the inequality (which we repeat here):
f(x) − Ch ≤ f(t) ≤ f(x) + Ch.
Figure 5.9: An arbitrary differentiable function f (t) and we are interested in its values when
t ∈ [x, x + h]. C is the absolute value of the maximum gradient of f (x) and the straight lines
marked in are those with gradient ±C passing through f (x).
Integrating the upper bound and taking the limit gives
lim_{h→0} (1/h) ∫_x^{x+h} (f(x) + Ch) dt = lim_{h→0} (1/h) (f(x) + Ch) h = lim_{h→0} [f(x) + Ch] = f(x),
as f(x) + Ch are constants in the integral with respect to t and ∫_x^{x+h} dt = h (we can evaluate the integral of the constant function: in this case it evaluates the area of a rectangle of width h and height 1). We can also evaluate the lower bound in the limit to find:
lim_{h→0} (1/h) ∫_x^{x+h} (f(x) − Ch) dt = lim_{h→0} (1/h) (f(x) − Ch) h = lim_{h→0} [f(x) − Ch] = f(x).
Hence, by the sandwich theorem,
dF/dx = f(x)
as required.
Proof: (the second fundamental theorem of calculus) If f(x) = dF/dx is integrable on [a, b]
then we can construct the Riemann integral as the limit of a sum of n rectangular areas of width wi ≡ xi+1 − xi such that x1 = a and xn+1 = b. That is we know that,
∫_a^b f(x)dx = lim_{wi→0, n→∞} Σ_{i=1}^{n} f(x)|_{x∈[xi,xi+1]} (xi+1 − xi)
            = lim_{wi→0, n→∞} Σ_{i=1}^{n} (dF/dx)|_{x∈[xi,xi+1]} (xi+1 − xi)
Now (dF/dx)|_{x∈[xi,xi+1]} is some value of the function dF/dx evaluated on the interval [xi, xi+1], and by
the mean value theorem we know one special value of dF/dx on the interval, namely,
(F(xi+1) − F(xi)) / (xi+1 − xi) = dF/dx (ci)   for some ci ∈ [xi, xi+1]
and we may use this value for the derivative of F (x) in the interval [xi , xi+1 ] in the Riemann
sum above. We have:
∫_a^b f(x)dx = lim_{wi→0, n→∞} Σ_{i=1}^{n} [ (F(xi+1) − F(xi)) / (xi+1 − xi) ] (xi+1 − xi)
            = lim_{wi→0, n→∞} Σ_{i=1}^{n} (F(xi+1) − F(xi))
            = lim_{wi→0, n→∞} [ (F(x2) − F(x1)) + (F(x3) − F(x2)) + . . . + (F(xn+1) − F(xn)) ]
            = lim_{wi→0, n→∞} [ F(xn+1) − F(x1) ]
            = lim_{wi→0, n→∞} [ F(b) − F(a) ]
            = F(b) − F(a)
as required.
Comment(s). (On another proof of the second fundamental theorem of calculus...) If we knew
that f (x) had an antiderivative (and not just that it is integrable2 ) we could have proved the
second fundamental theorem by using the first fundamental theorem. As f (x) is a continuous
function on [a, b] then the integral ∫_a^x f(t)dt exists for x ∈ [a, b]. Let us give this useful function a name
G(x) ≡ ∫_a^x f(t)dt
and by the first fundamental theorem we have
dG/dx = f(x) ∀ x ∈ [a, b].
Notice that G(a) = 0 and G(b) = ∫_a^b f(t)dt, i.e. G(b) is the integral of interest in the statement
of the second fundamental theorem of calculus. Now consider a second antiderivative of f (x),
denoted F (x), such that
dF/dx = f(x) ∀ x ∈ [a, b].
² There exist functions which are integrable but which do not have an antiderivative, e.g. Thomae's function (f(x) = 0 if x ∈ R \ Q and f(x) = 1/q if x = p/q ∈ Q and p and q have no common factors), and there also exist functions which have an antiderivative but which are not Riemann integrable, e.g. Volterra's function, see
https://en.wikipedia.org/wiki/Volterra%27s_function.
In this proof we aim to show that G(x) = F (x) − F (a), so let us introduce a third function, the
difference between F(x) and G(x): H(x) ≡ F(x) − G(x); then
dH/dx = f(x) − f(x) = 0 ∀ x ∈ [a, b].
So we have learnt that H(x) has a gradient of zero for x ∈ [a, b], therefore H(x) = H(a) which
implies that
F(x) − G(x) = F(a) − G(a) = F(a)   ∀ x ∈ [a, b],
since G(a) = 0. Hence G(x) = F(x) − F(a) and, in particular, ∫_a^b f(t)dt = G(b) = F(b) − F(a), as required.
where dF/dx = f(x) and A is a number.
2. So far we have only defined ∫_a^b f(x)dx for a ≤ b, but we can generalise this to include b ≤ a via the (second) fundamental theorem of calculus:
∫_a^b f(x)dx = F(b) − F(a) = −(F(a) − F(b)) = −∫_b^a f(x)dx.
Definition 5.2.2. The indefinite integral ∫ f(x)dx is a function: it has no specified limits.
Comment(s). (On indefinite integrals...) The indefinite integral at first sight is an abuse of
notation: the integral is defined (up to a sign) as an area via its limits, but the indefinite integral
has no limits - so what does it mean? It is another notation for the antiderivatives or primitives
of f (x). Recall that by the (first) fundamental theorem of calculus that the antiderivative is
∫_a^x f(t)dt = ∫_a^x (dF/dt) dt = F(x) − F(a)
so that d/dx (F(x) − F(a)) = dF/dx = f(x); in other words there are multiple antiderivatives (an-
tiderivatives are specified only up to a constant). Now it is useful to think of the class of
antiderivative functions for f (x) and while one could write F (x) − F (a) to denote the an-
tiderivatives it is useful to have a briefer notation, hence the notation for the indefinite integral,
instead of writing ∫_a^x f(t)dt we write ∫ f(x)dx to denote the class of antiderivative functions
and emphasise that it is a function of x, hence the indefinite integral is
∫ f(x)dx ≡ F(x) + K,   K ∈ R
by identifying the antiderivative F(x), i.e. by solving dF/dx = f(x).
(i) Constants can be taken outside the integral: ∫ a f(x)dx = a ∫ f(x)dx, where a is a constant.
Proof:
d/dx (aF(x)) = a dF/dx = a f(x).
Hence,
∫ d/dx (aF(x)) dx = aF(x) + K = a ∫ f(x)dx.
(ii) Integrals of sums of functions are sums of integrals:
∫ (f(x) + g(x))dx = F(x) + G(x).
Proof:
d/dx (F(x) + G(x)) = dF/dx + dG/dx = f(x) + g(x).
By points (i) and (ii), integration is a linear operation.
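For example (a small illustration using antiderivatives met earlier): ∫ (3x² + 2cos(x)) dx = x³ + 2 sin(x) + K.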
(iii) Transforming an integral with respect to x into an integral with respect to t by substituting
x = x(t) to find:
∫_{x1}^{x2} f(x)dx = ∫_{t1}^{t2} f(x(t)) (dx/dt) dt
where x(t1 ) = x1 and x(t2 ) = x2 .
where we have used the chain rule dF(x(t))/dt = (dF/dx)(dx/dt).
Proof:
d/dx (f(x)g(x)) = (df/dx) g + f (dg/dx),
hence,
∫ d/dx (f(x)g(x)) dx = f(x)g(x) + K = ∫ (df/dx) g(x) dx + ∫ f(x) (dg/dx) dx
as required.
We will now practise these techniques to evaluate definite integrals or identify the antiderivative
for an indefinite integral with multiple examples.
There are many options, but we are guided to simplify the terms in the integral and so choose
the substitution
x² = t
hence 2xdx = dt. Therefore we have upon substituting into the integral
∫ e^{x²} x dx = (1/2) ∫ e^t dt
             = (1/2) e^t + K
             = (1/2) e^{x²} + K.
2
Let
x = sin θ
hence dx = cos θ dθ. Our motivation for choosing this substitution is the trigonometric identity 1 − sin²θ = cos²θ, i.e. (1 − x²)^{−1/2} = 1/cos θ, hence we have
∫ (1 − x²)^{−1/2} dx = ∫ (1/cos θ)(cos θ dθ)
                     = ∫ dθ
                     = θ + K
                     = arcsin(x) + K.
Of course we may have recognised immediately the antiderivative by recalling that d/dx (arcsin(x)) = 1/√(1 − x²).
Let
x = sinh θ
hence dx = cosh θdθ. Hence we have
∫ (1 + x²)^{−1/2} dx = ∫ (1 + sinh²θ)^{−1/2} (cosh θ) dθ
                     = ∫ dθ
                     = θ + K
                     = arcsinh(x) + K.
Let
x = tan θ
hence dx = (1 + tan²θ)dθ. Hence we have
∫ (1 + x²)^{−1} dx = ∫ (1 + tan²θ)^{−1} (1 + tan²θ) dθ
                   = ∫ dθ
                   = θ + K
                   = arctan(x) + K.
Let
x = tanh θ
hence dx = (1 − tanh²θ)dθ, so that
∫ (1 − x²)^{−1} dx = ∫ (1 − tanh²θ)^{−1} (1 − tanh²θ) dθ = ∫ dθ
= θ + K
= arctanh(x) + K.
x + 1 = cosh θ
= arccosh(4) − arccosh(3)
x + 1 = sin θ
= arcsin(1/2) − arcsin(−1/2)
= π/6 − (−π/6)
= π/3.
∫ [sin(x)/cos(x)] dx = ∫ (−dt)/t   (substituting t = cos(x), so that dt = −sin(x)dx)
                     = ∫ d/dt (−ln(|t|)) dt
                     = −ln(|t|) + K
                     = −ln(|cos(x)|) + K
There are many correct ways to approach this integral. We will first try to simplify the
integral with the substitution
1/sin(x) = t
so that dt = −(cos(x)/sin²(x)) dx = −√(1 − 1/t²) / (1/t²) dx = −t² √(1 − 1/t²) dx = −t √(t² − 1) dx. Upon substitution we find
find
Z
−dt
Z
dx
= t √
sin(x) t t2 − 1
Z
dt
=− √ .
t2 − 1
t = cosh(u)
Z Z
dt sinh(u)du
− √ =− q
t2 − 1 cosh2 (u) − 1
Z
=− du
= −u + K
= −arccosh(t) + K
1
= −arccosh( ) + K.
sin(x)
This answer can be written in many different forms using the various identities we have seen
for the trigonometric and hyperbolic functions. Similarly if we had pursued a different path to
find the antiderivative, by making different substitutions for example then we would have ended
up with the answer expressed in a different form. The solution here is frequently written in
terms of tan(x/2) and the logarithm, let us convert our answer into this common format:
−arccosh(1/sin(x)) = −ln( √(1/sin²x − 1) + 1/sin x )
                   = −ln( √((1 − sin²x)/sin²x) + 1/sin x )
                   = −ln( (cos x + 1)/sin x )
                   = ln( sin x/(cos x + 1) )
                   = ln( 2 sin(x/2) cos(x/2) / (cos²(x/2) − sin²(x/2) + 1) )
                   = ln( 2 sin(x/2) cos(x/2) / (2 cos²(x/2)) )
                   = ln( sin(x/2)/cos(x/2) )
                   = ln(|tan(x/2)|).
The technique allows us to move the derivative from g(x) to its coefficient f (x) (and introduce
a minus sign plus a boundary term). This will be very useful if df/dx is a simple function, e.g. a
constant. Let us practise using the method by looking at some examples.
Example 5.14. Find, up to a constant, the function of x given by the indefinite integral
Z
arcsin(x)dx.
First we will make a substitution aimed at making the integral more palatable, we substitute
x = sin θ, so that dx = cos θdθ hence
Z Z Z
d
arcsin(x)dx = θ cos θdθ = θ (sin θ)dθ.
dθ
In the last line we have rewritten the integrand so that a derivative is applied to one term under the integral, in order to emphasise that we are now in a situation where using integration by parts is a good idea. Why is it a good idea? Because integration by parts gives us a way to move the derivative from sin θ onto θ, and dθ/dθ = 1 is simple. Let us be careful and identify the functions f and g given in our abstract derivation with the functions in our present example³: we take f(θ) = θ, g(θ) = sin θ, so that
\[ \int f(\theta)\,\frac{dg}{d\theta}\,d\theta = -\int \frac{df}{d\theta}\,g(\theta)\,d\theta + f(\theta)g(\theta), \]
hence
\[
\begin{aligned}
\int \theta\,\frac{d}{d\theta}(\sin\theta)\,d\theta &= -\int \frac{d\theta}{d\theta}\sin\theta\,d\theta + \theta\sin\theta \\
&= -\int \sin\theta\,d\theta + \theta\sin\theta \\
&= -\int \frac{d}{d\theta}(-\cos\theta)\,d\theta + \theta\sin\theta \\
&= \cos\theta + \theta\sin\theta + K \\
&= \cos(\arcsin(x)) + x\arcsin x + K \\
&= \sqrt{1 - x^2} + x\arcsin x + K.
\end{aligned}
\]
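A one-line Mathematica check of this result (a verification only, using the antiderivative just found) is to differentiate it and confirm that arcsin(x) is returned.
(* Differentiating the claimed antiderivative should give back ArcSin[x]. *)
FullSimplify[D[Sqrt[1 - x^2] + x*ArcSin[x], x]]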
Example 5.15. Find, up to a constant, the function of x given by the indefinite integral
\[ \int x^2\cos(2x)\,dx. \]
Now we integrate by parts. It is a good idea to practise integrating by parts without needing to identify f(x) and g(x) explicitly before carrying out the procedure. Specifically, notice that integration by parts tells you to add a minus sign and shift the derivative from one term onto its coefficient in the integral, and of course not to forget the boundary term; this is something one should practise.
³Note that in the abstract definition the variable was x but in our example it is now θ.
\[
\begin{aligned}
\int x^2\cos(2x)\,dx = \int x^2\,\frac{d}{dx}\Big(\frac{1}{2}\sin(2x)\Big)dx &= -\int \frac{d}{dx}(x^2)\Big(\frac{1}{2}\sin(2x)\Big)dx + \frac{x^2}{2}\sin(2x) \\
&= -\int x\sin(2x)\,dx + \frac{x^2}{2}\sin(2x) \\
&= \int x\,\frac{d}{dx}\Big(\frac{1}{2}\cos(2x)\Big)dx + \frac{x^2}{2}\sin(2x) \\
&= -\int \frac{dx}{dx}\Big(\frac{1}{2}\cos(2x)\Big)dx + \frac{x}{2}\cos(2x) + \frac{x^2}{2}\sin(2x) \\
&= -\int \frac{1}{2}\cos(2x)\,dx + \frac{x}{2}\cos(2x) + \frac{x^2}{2}\sin(2x) \\
&= -\int \frac{1}{2}\,\frac{d}{dx}\Big(\frac{1}{2}\sin(2x)\Big)dx + \frac{x}{2}\cos(2x) + \frac{x^2}{2}\sin(2x) \\
&= -\frac{1}{4}\sin(2x) + \frac{x}{2}\cos(2x) + \frac{x^2}{2}\sin(2x) + K.
\end{aligned}
\]
Example 5.16. Find, up to a constant, the function of x given by the indefinite integral
\[ \int x e^{-x}\,dx. \]
\[
\begin{aligned}
\int x e^{-x}\,dx &= \int x\,\frac{d}{dx}(-e^{-x})\,dx \\
&= -\int x\,\frac{d}{dx}(e^{-x})\,dx \\
&= \int e^{-x}\,dx - x e^{-x} \\
&= -e^{-x} - x e^{-x} + K.
\end{aligned}
\]
Example 5.17. Find, up to a constant, the function of x given by the indefinite integral
\[ \int \ln(x)\,dx. \]
This example involves a good trick: to use integration by parts we need a derivative under the integral. Since dx/dx = 1 we can always insert this trivial derivative without changing the integral, so that
\[
\begin{aligned}
\int \ln(x)\,dx = \int \ln(x)\,\frac{dx}{dx}\,dx &= -\int \frac{d}{dx}(\ln(x))\,x\,dx + x\ln(x) \\
&= -\int \frac{1}{x}\,x\,dx + x\ln(x) \\
&= -\int dx + x\ln(x) \\
&= -x + x\ln(x) + K.
\end{aligned}
\]
as a function of x, up to a constant. [Hint: insert dx/dx = 1 as in the last example.]
As mentioned, our aim is to split the fraction into a sum of fractions. Now we know that
\[ \frac{A}{f(x)} + \frac{B}{g(x)} = \frac{Ag(x) + Bf(x)}{f(x)g(x)} \]
and our aim is to reverse this process. Therefore it will aid us to identify the factorisation of the denominator in the problem. We have:
\[ \int \frac{3x + 1}{x^2 - 3x + 2}\,dx = \int \frac{3x + 1}{(x - 1)(x - 2)}\,dx \equiv \int \Big(\frac{A}{x - 1} + \frac{B}{x - 2}\Big)dx. \]
Our aim now is to find A and B. We know (from summing the fractions) that
\[ A(x - 2) + B(x - 1) = 3x + 1, \]
hence we may identify A and B by equating the coefficients of x and the constants on each side of the above and solving the simultaneous equations in A and B that doing so gives us. We find:
\[ A + B = 3, \qquad -2A - B = 1. \]
Adding these equations gives A = -4 and hence B = 7, so that
\[ \int \frac{3x + 1}{x^2 - 3x + 2}\,dx = -4\ln(|x - 1|) + 7\ln(|x - 2|) + K. \]
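Mathematica can perform the partial-fraction split directly with its Apart[] command; the following sketch simply checks the decomposition just found.
(* Split the integrand into partial fractions; the output is equivalent to -4/(x-1) + 7/(x-2). *)
Apart[(3*x + 1)/(x^2 - 3*x + 2)]
(* Integrate[] then returns an antiderivative equivalent to ours up to a constant. *)
Integrate[(3*x + 1)/(x^2 - 3*x + 2), x]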
Most integrals which can be quickly solved using partial fractions are similar to the above
example: one attempts to factorise the denominator, to find the partial fractions and then
integrate these simpler fractions. The simplest complication emerges when the denominator
has a repeated factor as in the following example.
If we commence by trying to split the fraction into partial fractions with the standard procedure we find
\[ \frac{2x + 3}{(x - 1)^2} = \frac{A}{x - 1} + \frac{B}{x - 1} = \frac{(A + B)(x - 1)}{(x - 1)^2} \]
and so we have the contradiction that A + B = 2 and A + B = -3. Something has gone astray here, and the source of the problem is the repeated factor in the denominator. Notice that if it had been possible to find the partial fractions as above it would have meant that a factor of (x - 1) could be removed from both the numerator and the denominator. From the form of the fraction that we start with we see that this is not possible (otherwise we would have simplified the fraction). Hence our first move was amiss: instead of splitting the fraction into partial fractions whose denominators are identical we seek A and B such that
\[ \frac{2x + 3}{(x - 1)^2} = \frac{A}{x - 1} + \frac{B}{(x - 1)^2} = \frac{Ax - A + B}{(x - 1)^2}. \]
Now we find that A = 2 and B = 5, hence
\[ \int \frac{2x + 3}{(x - 1)^2}\,dx = \int \frac{2}{x - 1}\,dx + \int \frac{5}{(x - 1)^2}\,dx = 2\ln(|x - 1|) - \frac{5}{x - 1} + K. \]
One further common complication concerns what happens when the factor in the denomi-
nator is a polynomial of order greater than one. Consider the following example.
If we commence naïvely by trying to split the fraction with constant numerators we find
\[ \frac{2x^2 - 5x + 7}{x(x^2 + 3)} = \frac{A}{x} + \frac{B}{x^2 + 3} = \frac{Ax^2 + 3A + Bx}{x(x^2 + 3)} \]
and we see that the simultaneous equations for A and B are contradictory: A = 2, B = -5 and 3A = 7; our choice for the partition was too tightly constraining. Notice also that in every other example we have chosen partial fractions such that the numerator was of order one less than the denominator, hence we are motivated to try the following:
\[ \frac{2x^2 - 5x + 7}{x(x^2 + 3)} = \frac{A}{x} + \frac{Bx + C}{x^2 + 3} = \frac{Ax^2 + 3A + Bx^2 + Cx}{x(x^2 + 3)} \]
so that the simultaneous equations to solve are
\[ A + B = 2, \qquad C = -5, \qquad 3A = 7. \]
These equations are not contradictory and we can solve them to find A = 7/3, B = -1/3 and C = -5. Hence,
\[
\begin{aligned}
\int \frac{2x^2 - 5x + 7}{x(x^2 + 3)}\,dx &= \int \frac{7}{3}\,\frac{1}{x}\,dx + \int \frac{-\frac{1}{3}x - 5}{x^2 + 3}\,dx \\
&= \frac{7}{3}\int \frac{1}{x}\,dx - \frac{1}{3}\int \frac{x}{x^2 + 3}\,dx - \int \frac{5}{x^2 + 3}\,dx \\
&= \frac{7}{3}\ln(|x|) - \frac{1}{6}\ln(|x^2 + 3|) - \frac{5}{\sqrt{3}}\arctan\Big(\frac{x}{\sqrt{3}}\Big) + K.
\end{aligned}
\]
There are some further variations which we will highlight here, namely, what happens when the degree of the polynomial in the numerator is the same as that in the denominator, or even greater? We will consider two simple examples of these cases and also make some comments on what to do if one cannot factorise the denominator.
Here the denominator has the same order as the numerator; our aim will be to write the numerator as a multiple of the denominator plus some remainder, to simplify the problem. If we can do this, we may then cancel out common factors and hope to find a simpler problem. Hence we must first find f(x) and the remainder R such that
\[ (x - 2)f(x) + R = x + 2. \]
We may carry out the long division, or alternatively we may write f(x) = ax + b and find a and b, i.e. from (x - 2)(ax + b) + R = ax^2 + bx - 2ax - 2b + R = x + 2 we have a = 0, b = 1 and so R = 4. Hence,
\[ \int \frac{x + 2}{x - 2}\,dx = \int \frac{(x - 2) + 4}{x - 2}\,dx = \int \frac{x - 2}{x - 2}\,dx + \int \frac{4}{x - 2}\,dx = x + 4\ln(|x - 2|) + K. \]
The factorisation of the numerator allowed us to split the fraction into a constant part and a term in which the order of the numerator was less than that of the denominator.
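The same Apart[] command used above also carries out this kind of polynomial division; the following is a quick check of the split we found by hand.
(* Constant part plus proper fraction; the output is equivalent to 1 + 4/(x-2). *)
Apart[(x + 2)/(x - 2)]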
\[ \int \frac{(x + 2)^3}{(x - 2)^2}\,dx. \]
We will aim to repeat the tactic of the previous example, but now there is some increased complication. Recall that the success of the previous example rested upon removing factors of the denominator from the numerator, so for this example we will aim to solve
\[ I_n = nI_{n-1}. \]
This is a recursion formula for the integral I_n: it expresses I_n as a function of I_{n-1}, and more general recursion formulae may involve other integrals of lower order in n, e.g. \(\hat{I}_n = f(\hat{I}_{n_1}, \hat{I}_{n_2}, \ldots, \hat{I}_0)\) where n > n_1 > n_2 > ... > 0. A recursion formula allows one to replace more complex integrals (where n is large) with integrals (where n is smaller) which one hopes will be simpler to compute. We can repeatedly use I_n = nI_{n-1} for successively smaller values of n until we find a simple integral that we can solve:
\[ I_n = nI_{n-1} = n(n-1)I_{n-2} = n(n-1)(n-2)I_{n-3} = \ldots = n(n-1)(n-2)\cdots 2\,I_1 = n!\,I_0. \]
Therefore,
\[ I_n \equiv \int_0^{\infty} x^n e^{-x}\,dx = n!. \]
This is a beautiful expression; notice how simple it is to compute all I_n (i.e. for any n in Z^+) once a recursion formula has been identified. What is remarkable about this particular integral is that \(\int_0^{\infty} x^n e^{-x}\,dx\) is well-defined for any real positive value of n, not only positive integer values, while the computation of the recursion relation is left unchanged. In fact mathematicians use this integral to define what is meant by n! when n is a positive real number. For this reason this integral occurs frequently in mathematics; it is called the gamma function and is defined as follows:
\[ \Gamma(n + 1) \equiv \int_0^{\infty} x^n e^{-x}\,dx \equiv n! \quad \forall\, n \in \mathbb{R}^+. \]
By repeating the integration by parts, one finds that the recursion relation for the gamma function is the same as for the original integral, namely,
\[ \Gamma(n + 1) = n\Gamma(n) \quad \forall\, n \in \mathbb{R}^+. \]
Of course if one begins with a non-integer value of n then by repeated use of the recursion relation the value of n (appearing in the integral) can be lowered until it lies between 0 and 1, at which point one can only try to compute the remaining integral. A very interesting set of examples occurs when n is half-integer: then the final integral to compute (after using the recursion relation) is \(\Gamma(\tfrac{1}{2}) = \sqrt{\pi}\), which can be found, for example, by using the substitution x = y² and the integral of (half of) the Gaussian function given by
\[ \int_0^{\infty} e^{-y^2}\,dy = \frac{\sqrt{\pi}}{2}. \]
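Mathematica has the gamma function built in as Gamma[], so the values quoted above can be checked directly; this is only a numerical/symbolic verification of the formulae in the text.
Gamma[1/2]                                (* Sqrt[Pi] *)
Gamma[5]                                  (* 24, i.e. 4! *)
Integrate[x^4*Exp[-x], {x, 0, Infinity}]  (* 24, agreeing with Gamma[5] *)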
\[ I_n = \frac{n-1}{n}I_{n-2} + \frac{1}{n}\cos^{n-1}x\,\sin x. \]
Hence find I_5 as a function of x. [Hint: commence by writing cos^n x = cos^{n-1} x cos x.]
We defined the integral to measure (up to a sign) the area under a curve f(x). It produces the area under a curve if the curve lies above the x-axis, but it produces minus the area between the curve and the x-axis where f(x) < 0. This seems to present a problem to anyone trying to use integration to find, for example, an expression for the area of a circle. If the circle has its centre located on the x-axis then the integral of the circle will be zero, as the positive half-area above the axis cancels out the negative half-area found by integrating the part of the circle beneath the x-axis. Hence we need a trick to use integration to evaluate the area of a circle, or indeed the area of any shape which is left unchanged by a reflection in the x-axis. However the trick is very simple: since we know that half the area is found by integrating under the positive part of the curve which defines the shape, we may simply evaluate this integral and then double the result to find the area of the full shape.
Let us consider the example of the circle of radius R, centred at the origin. It is defined as the set of points (x, y) satisfying
\[ x^2 + y^2 = R^2. \]
Hence, writing the curve as y(x), we have y(x) = ±\sqrt{R^2 - x^2} for x in [−R, R]. These are a pair of curves which are mapped into each other by reflection in the x-axis. Our aim will be to find the area under the positive curve, y(x) = \sqrt{R^2 - x^2}; this is the area of the semicircle as shaded in figure 5.10. Now we may compute the area of the semicircle by integration:
Figure 5.10: The integral of y(x) = \sqrt{R^2 - x^2} from x = −R to x = R gives the area of the semicircle.
\[
\begin{aligned}
A_{\text{semicircle}} = \int_{-R}^{R} y(x)\,dx &= \int_{-R}^{R} \sqrt{R^2 - x^2}\,dx \\
&= \int_{-\pi/2}^{\pi/2} \sqrt{R^2 - R^2\sin^2\theta}\,(R\cos\theta\,d\theta) \\
&= \int_{-\pi/2}^{\pi/2} R^2\cos^2\theta\,d\theta \\
&= R^2\int_{-\pi/2}^{\pi/2} \frac{1}{2}(1 + \cos(2\theta))\,d\theta \\
&= \frac{R^2}{2}\int_{-\pi/2}^{\pi/2} \frac{d}{d\theta}\Big(\theta + \frac{1}{2}\sin(2\theta)\Big)d\theta \\
&= \frac{R^2}{2}\Big[\theta + \frac{1}{2}\sin(2\theta)\Big]_{-\pi/2}^{\pi/2} \\
&= \frac{R^2}{2}\Big(\Big(\frac{\pi}{2} + 0\Big) - \Big(-\frac{\pi}{2} + 0\Big)\Big) \\
&= \frac{1}{2}\pi R^2,
\end{aligned}
\]
where we made the substitution x = R sin θ, hence dx = R cos θ dθ. This is the area of half the circle, so doubling it gives the familiar area πR² for the full circle.
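A quick numerical check is possible in Mathematica with NIntegrate[]; taking R = 1 for simplicity, the semicircular area should come out close to π/2 ≈ 1.5708.
(* Numerical area under the unit semicircle. *)
NIntegrate[Sqrt[1 - x^2], {x, -1, 1}]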
One can carry out a similar computation to find the area of an ellipse; this is the subject of the following example.
Example 5.23. Find the area of the ellipse with foci located at (±a, 0), where a is a positive real number, defined as the set of points (x, y) satisfying
\[ \sqrt{(x - a)^2 + y^2} + \sqrt{(x + a)^2 + y^2} = 2R \]
where R is a positive real number and R > a. We would like to work with an equation of the form y = f(x), so we rearrange the equation to obtain⁴
\[ y^2 = R^2 - x^2 - a^2 + \frac{a^2 x^2}{R^2}. \]
Hence the upper edge of the ellipse is given by
\[ y = \sqrt{R^2 - x^2 - a^2 + \frac{a^2 x^2}{R^2}} = \sqrt{(R^2 - a^2) - x^2\Big(1 - \frac{a^2}{R^2}\Big)}. \]
We would like to use an integral to find the area of the shaded region shown in figure 5.11. The area of the ellipse will be twice the area of the upper half of the ellipse, i.e. twice the integral of y(x), hence,
\[
\begin{aligned}
A_{\text{ellipse}} = 2\int_{-R}^{R} y(x)\,dx &= 2\int_{-R}^{R} \sqrt{(R^2 - a^2) - x^2\Big(\frac{R^2 - a^2}{R^2}\Big)}\,dx \\
&= 2\int_{-R}^{R} \sqrt{R^2 - a^2}\,\sqrt{1 - \frac{x^2}{R^2}}\,dx \\
&= 2\sqrt{R^2 - a^2}\int_{-\pi/2}^{\pi/2} \sqrt{1 - \sin^2\theta}\,(R\cos\theta\,d\theta) \\
&= 2\sqrt{R^2 - a^2}\int_{-\pi/2}^{\pi/2} R\cos^2\theta\,d\theta \\
&= 2R\sqrt{R^2 - a^2}\int_{-\pi/2}^{\pi/2} \frac{1}{2}(1 + \cos(2\theta))\,d\theta \\
&= \pi R\sqrt{R^2 - a^2}
\end{aligned}
\]
⁴This is a reasonably lengthy algebraic manipulation; there is no trick used here to simplify the work, we are only presenting the result.
Figure 5.11: The integral of y(x) = \sqrt{(R^2 - a^2) - x^2(1 - \frac{a^2}{R^2})} from x = −R to x = R gives the area of half of the ellipse.
where we have used the substitution x = R sin θ. Notice that this formula can be used to find the formula for the circle in the limit a = 0. Also note that the y-intercept of y(x) is given by y(0) = \sqrt{R^2 - a^2}, which is a helpful way to recall the formula for the ellipse.
as A(x_i) = π(f(x_i))², the area of a circle of radius f(x_i). If we split the volume into n cylinders of equal width w then we have x_i = a + (i − 1)w, and as w = (b − a)/n then x_i = a + (i − 1)(b − a)/n, so that x_{n+1} = b and x_1 = a. Now in the limit w → 0 we have (by the definition of the Riemann integral):
\[ \lim_{w \to 0,\, n \to \infty}(V) = \lim_{w \to 0,\, n \to \infty}\Big(\sum_{i=1}^{n} w\,\pi(f(x_i))^2\Big) = \pi\int_a^b f(x)^2\,dx. \]
Figure 5.12: A volume whose surface is formed by rotating a continuous curve y = f(x) about the x-axis may be sliced into thin circles of width w and radius f(x_i) at x = x_i. In the limit when w → 0 the circle has area A(x_i) = πf(x_i)².
The integral above is the definition of the volume of revolution of the continuous function f(x) about the x-axis from x = a to x = b.
Example 5.24. Use the formula for the volume of revolution to find the volume of the sphere
of radius R.
By rotating the circle defined by x² + y² = R² around the x-axis we find the volume of the sphere of radius R:
\[
\begin{aligned}
V = \pi\int_{-R}^{R} (f(x))^2\,dx &= \pi\int_{-R}^{R} (R^2 - x^2)\,dx \\
&= \pi\int_{-R}^{R} \frac{d}{dx}\Big(R^2 x - \frac{1}{3}x^3\Big)dx \\
&= \pi\Big[R^2 x - \frac{1}{3}x^3\Big]_{-R}^{R} \\
&= \pi\Big(\Big(R^3 - \frac{1}{3}R^3\Big) - \Big(-R^3 + \frac{1}{3}R^3\Big)\Big) \\
&= \pi\Big(\frac{2}{3}R^3 + \frac{2}{3}R^3\Big) \\
&= \frac{4}{3}\pi R^3.
\end{aligned}
\]
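The same computation can be reproduced symbolically in Mathematica in one line; this is just a check of the volume-of-revolution formula applied to the circle.
(* Volume of revolution of the circle of radius R about the x-axis; output (4*Pi*R^3)/3. *)
Integrate[Pi*(R^2 - x^2), {x, -R, R}]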
Exercise 5.8. The volume bounded by the ellipse when it is rotated about the x-axis is called an ellipsoid or sometimes a cigar. Use the formula for the volume of revolution to show that the volume formed by rotating the ellipse defined by
\[ \sqrt{(x - a)^2 + y^2} + \sqrt{(x + a)^2 + y^2} = 2R \]
as w = x_{i+1} − x_i. Now the full length of the curve is found by summing the l_i and taking the limit as w → 0 while x_1 = a and x_{n+1} = b are held fixed (i.e. this is the usual prescription for the Riemann integral).
The circumference is twice the length of the curve y = \sqrt{R^2 - x^2}, from x = −R to x = R.
We compute the ingredients used to find the length of the curve by taking the derivative with respect to x on both sides of y² = R² − x²:
\[ 2y\frac{dy}{dx} = -2x \implies \frac{dy}{dx} = -\frac{x}{y}. \]
Therefore,
\[ 1 + \Big(\frac{dy}{dx}\Big)^2 = 1 + \frac{x^2}{y^2} = \frac{x^2 + y^2}{y^2} = \frac{R^2}{y^2} = \frac{R^2}{R^2 - x^2}. \]
Hence,
\[
\begin{aligned}
L = 2\int_{-R}^{R}\sqrt{\frac{R^2}{R^2 - x^2}}\,dx &= 2\int_{-R}^{R}\frac{R}{\sqrt{R^2 - x^2}}\,dx \\
&= 2\int_{-\pi/2}^{\pi/2}\frac{R}{\sqrt{R^2 - R^2\sin^2\theta}}\,(R\cos\theta\,d\theta) \\
&= 2R\int_{-\pi/2}^{\pi/2} d\theta \\
&= 2\pi R.
\end{aligned}
\]
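For a numerical sanity check of the circumference one can again use NIntegrate[] with R = 1; the integrand has (integrable) singularities at the endpoints, which Mathematica normally handles, and the result should be close to 2π ≈ 6.2832.
(* Twice the arc-length integral of the upper unit semicircle. *)
2*NIntegrate[1/Sqrt[1 - x^2], {x, -1, 1}]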
Of course one can use this technique to evaluate the lengths of more interesting curves, for example the length of a spiral. However we will leave the examples of greater complexity to the problem sheets. Instead we will make a small digression to discuss one of the more seemingly paradoxical mathematical observations of the twentieth century, made in a famous mathematical paper.
The famous paper was called “How Long is the Coast of Britain? Statistical Self-Similarity
and Fractional Dimension.” It was written by Benoı̂t Mandelbrot and published in 1967 and
involved self-similar curves called fractals. One of his motivations was an observation of Lewis
Fry Richardson5 that the length of a coastline depends on the measurement scale that is used
to measure it. Empirical measurement confirms that the smaller the ruler used the longer
the coastline is measured to be! Our integral expressions for the length of a curve allow us to understand the idea: as w → 0 (our length scale is decreased) the measured curve length increases towards a maximum (but not infinite) length. The curves we are considering are well-behaved, continuous curves, so we might conclude that the result concerning the length of the coastline of Britain is perfectly sensible; however we might also wonder whether the coastline of Britain is given by a continuous function at all. The paradox contained in the measurement leads us
to ponder the question: what is the smallest unit of measurement in the natural world and
how long is a coastline measured using this length as a basic unit. We might speculate that
this basic unit is of sub-nuclear length and then we might wonder further where exactly is the
boundary of the nucleus - i.e. what boundary should we measure along at these small scales?
We might take heart if we know that most fundamental physics involves descriptions of the
world (often) in terms of continuous functions. So there may be hope yet of measuring the
coastline of Britain in the 21st century.
5
Not the Hollyoaks character but a real English mathematician who lived from 1881 to 1953.
6. Power Series
In which we return to study the infinite sum and give mathematically sound criteria when
such a sum exists. We meet the radius of convergence and show that the function ex and the
trigonometric and hyperbolic functions are well-defined. We state Taylor’s theorem which gives
a way to write a function locally as a power series and finally we show that an application of
Taylor’s theorem helps in the computation of many limits.
At the start of this course we asked the question: how many functions are there? Our aim
was not to give an answer to this question but to guide us to thinking about classes of functions.
In the course we have met many large classes of functions, for example polynomial functions,
trigonometric functions, inverse functions and so on. Throughout this tour of functions we
were haunted by the notion that some functions were more fundamental than others (i.e. they
could be used to form the other functions through addition, multiplication or composition) and
indeed that some functions were better behaved than other well-defined functions (for example
continuous functions and differentiable functions). In this chapter we will combine some of
these thoughts to define a purist’s notion of a well-behaved function: this will be the class of
analytic functions. To define an analytic function we will also face another puzzle that has
been with us in the course since we introduced the definition of the exponential function as a
series: how do we know when an infinite sum is well-defined and converges? We will use these
ideas to give the definition of a Taylor series expansion of a function about a point and then
be able to say precisely what the properties of an analytic function are.
3. We can immediately think of infinite series which do not sum to a finite number1 (e.g. $\sum_{n=0}^{\infty} n$); such series are not well-defined. As mathematicians we would like to know when an infinite sum is sensible.
6.2 Convergence
As mathematicians we are expected to know whether or not we can find the sum of a series (infinite or not). It is not common in everyday life to meet sums which do not converge, but in mathematics this is very common; such a sum or series is called divergent. Let us formally define these terms.
What can we say if we are given a series and asked to determine whether it is convergent or
divergent? There is something we can deduce immediately about any convergent series, namely
1The example used here has made it into popular culture because it is a very important divergent sum that appears, among other places, in string theory, and because the oft-quoted claim that it equals −1/12 is disturbing. One can show that this sum can be split in a sensible way into a finite part (equal to −1/12) and an infinite part, and there is a method called ζ-regularisation which is used to throw away the infinite part. However, sometimes you will see it claimed erroneously that $\sum_{n=0}^{\infty} n = -\frac{1}{12}$, where in reality it is only after the regularisation which removes an infinite part that this is the case. Without any regularisation it is a divergent sum.
that if an infinite series $S \equiv \sum_{n=n_0}^{\infty} a_n$ is convergent then we can say that
\[ \lim_{n\to\infty}(a_n) = 0. \]
This property of a convergent series is known as "the vanishing condition." One can see it is true by considering the difference between two consecutive partial sums of a convergent series and taking a limit: since S_n and S_{n-1} both tend to the same limit S,
\[ \lim_{n\to\infty}(S_n - S_{n-1}) = S - S = 0, \]
but notice that S_n − S_{n−1} = a_n, hence we have the vanishing condition for any convergent series. For example consider the convergent series
\[ \sum_{n=0}^{\infty} \frac{1}{2^n} = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \ldots \]
where a_n = 1/2^n. It is clear that the vanishing condition is satisfied as
\[ \lim_{n\to\infty}\frac{1}{2^n} = 0, \]
and on the other hand we can compute the partial sum using the geometric identity to find
\[ S_N = \sum_{n=0}^{N} \frac{1}{2^n} = \frac{1 - \frac{1}{2^{N+1}}}{1 - \frac{1}{2}} = \frac{2 - \frac{1}{2^N}}{2 - 1} = 2 - \frac{1}{2^N} \]
and hence
\[ \lim_{N\to\infty}(S_N) = \lim_{N\to\infty}\Big(2 - \frac{1}{2^N}\Big) = 2. \]
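Both the infinite sum and the first few partial sums can be checked in Mathematica with Sum[]; this is only an illustrative verification of the values just computed.
(* The infinite geometric sum and the partial sums S_N = 2 - 1/2^N for N = 0,...,5. *)
Sum[1/2^n, {n, 0, Infinity}]
Table[Sum[1/2^n, {n, 0, m}], {m, 0, 5}]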
So this infinite sum is convergent and it satisfies the vanishing condition. However the vanishing condition is a necessary condition for a convergent series but it is not a sufficient one. That is, the vanishing condition is true for all convergent series, but it is not a criterion to base a test for convergence on, as there are some divergent series which pass the vanishing condition test as well. For example the series
\[ \sum_{n=1}^{\infty} \frac{1}{n} \]
does not converge, but does satisfy the vanishing condition as \(\lim_{n\to\infty}(\frac{1}{n}) = 0\). There is a cunning argument to show that this sum does not converge. First consider the partial sum for this series:
\[ S_N \equiv \sum_{n=1}^{N} \frac{1}{n}. \]
If the series converged to some value T, then for its partial sums T_N we would have
\[ \lim_{N\to\infty}(T_{2N} - T_N) = T - T = 0, \]
but for the series \(\sum_{n=1}^{\infty}\frac{1}{n}\) we have found that
\[ \lim_{N\to\infty}(S_{2N} - S_N) \geq \frac{1}{2}, \]
hence we conclude that the series diverges even though it satisfies the vanishing condition.
The series \(\sum_{n=n_0}^{\infty}\frac{1}{n}\) diverges as the terms in the sum do not become small at a fast enough rate. By contrast the series \(\sum_{n=n_0}^{\infty}\frac{1}{n(n+1)}\) has terms which diminish in size at a faster rate than those in the series \(\sum_{n=n_0}^{\infty}\frac{1}{n}\), and it does converge. Let us convince ourselves of this. The partial sum of this series is
\[
\begin{aligned}
S_N \equiv \sum_{n=1}^{N} \frac{1}{n(n+1)} &= \sum_{n=1}^{N}\Big(\frac{1}{n} - \frac{1}{n+1}\Big) \\
&= \frac{1}{1} - \frac{1}{2} + \frac{1}{2} - \frac{1}{3} + \frac{1}{3} - \frac{1}{4} + \ldots - \frac{1}{N+1} \\
&= 1 - \frac{1}{N+1}.
\end{aligned}
\]
Hence,
\[ S \equiv \lim_{N\to\infty}(S_N) = \lim_{N\to\infty}\Big(1 - \frac{1}{N+1}\Big) = 1. \]
To recap: the series \(\sum_{n=1}^{\infty}\frac{1}{n}\) is divergent, while the series \(\sum_{n=1}^{\infty}\frac{1}{n(n+1)} = 1\) is convergent. It will be an interesting challenge to understand the relation between the exponent of n in the series term a_n and the convergence of the series. However our more immediate challenge is to develop some criteria which we can use to determine whether a series converges or not.
(C1) (The Limit Comparison Test.) If both a_n > 0 and b_n > 0 for all n in Z^+ and if \(\lim_{n\to\infty}\big(\frac{a_n}{b_n}\big) = L\) where 0 < L < ∞, then either both series \(\sum_n a_n\) and \(\sum_n b_n\) are convergent or both are divergent.
1. The first test uses a comparison against another known convergent series to determine if a series converges; the second and third tests require only knowledge of the series in question.
2. We draw attention to the fact that in (C2) and (C3) certain cases are omitted: specifically, when \(\lim_{n\to\infty}|a_n|^{1/n} = 1\) and \(\lim_{n\to\infty}|\frac{a_{n+1}}{a_n}| = 1\), the root and the ratio tests are both inconclusive.
3. The root test (C2) is more powerful than the ratio test (C3), but often (C3) is simpler to work with.
Proofs of these convergence criteria are not part of this course but they are interesting, so we include a proof of the limit comparison test here in the lecture notes; it is included for fun and will not be examinable!
Proof of (C1): We will make use of the following lemma2, which is known as the direct comparison test for convergence:
2"A subsidiary or intermediate theorem in an argument or proof", from the Greek meaning 'something assumed'. You might like to think of it as meaning a little theorem.
Lemma 6.1. Suppose that 0 < a_n ≤ b_n for all n in Z^+; then if \(\sum_{n=1}^{\infty} b_n\) converges then so does \(\sum_{n=1}^{\infty} a_n\).
Proof of lemma: Let the partial sums of the two series be denoted by \(S_N \equiv \sum_{n=1}^{N} a_n\) and \(T_N \equiv \sum_{n=1}^{N} b_n\); then as a_n ≤ b_n for all n in Z^+ we have S_N ≤ T_N for all N. Consequently (as 0 < a_n ≤ b_n), \(\lim_{N\to\infty}(S_N) \leq \lim_{N\to\infty}(T_N) = \sum_n b_n\), as \(\sum_n b_n\) is convergent.
Now if \(\lim_{n\to\infty}(\frac{a_n}{b_n}) = L\), where 0 < L < ∞ (as a_n > 0 and b_n > 0), then (by the definition of the existence of the asymptotic limit) for any ε > 0 there exists some N such that for all n > N
\[ -\epsilon < \frac{a_n}{b_n} - L < \epsilon, \]
hence,
\[ (L - \epsilon)b_n < a_n < (L + \epsilon)b_n. \]
Hence a_n < (L + ε)b_n for all n > N, so if \(\sum_n b_n\) is convergent then \((L + \epsilon)\sum_n b_n\) converges and so, by the lemma, \(\sum_{n>N} a_n\) is convergent. Now \(\sum_{n=1}^{N} a_n\), being a finite sum of real numbers, is convergent, hence
\[ \sum_n a_n = \sum_{n=1}^{N} a_n + \sum_{n=N+1}^{\infty} a_n \]
is convergent as it is the sum of two convergent pieces.
If, alternatively, we know that \(\sum_n a_n\) is convergent, then by choosing ε small enough (ε < L) we can guarantee that L − ε is positive and hence
\[ b_n < \frac{a_n}{L - \epsilon}, \]
and by the lemma \(\sum_n b_n\) is convergent. Hence if either series is convergent, then both are; or, alternatively, if either series is divergent then both are.
Let us practise using the limit comparison test.
Example 6.1. Use the limit comparison test (C1) to show that
\[ \sum_{n=1}^{\infty} \frac{1}{n^2} \]
is convergent.
Let a_n = 1/n²; now we must identify a convergent series \(\sum_n b_n\) such that \(\lim_{n\to\infty}\frac{a_n}{b_n}\) exists. A natural choice is the convergent series with b_n = \(\frac{1}{n(n+1)}\) studied above, for which
\[ \lim_{n\to\infty}\frac{a_n}{b_n} = \lim_{n\to\infty}\frac{n(n+1)}{n^2} = 1. \]
Hence, as the limit exists and \(\sum_n b_n\) is convergent, \(\sum_n \frac{1}{n^2}\) is convergent.
hence the series will only converge if \(\sum_m \frac{1}{2m(2m-1)}\) is convergent. Now we may use the limit comparison test (C1) to compare this series against the convergent series \(\sum_n \frac{1}{n^2}\), i.e. taking a_n = \(\frac{1}{2n(2n-1)}\) and b_n = \(\frac{1}{n^2}\), then
\[ \lim_{n\to\infty}\frac{a_n}{b_n} = \lim_{n\to\infty}\frac{n^2}{2n(2n-1)} = \lim_{n\to\infty}\frac{n^2}{4n^2 - 2n} = \frac{1}{4}. \]
Therefore, as the limit exists and \(\sum_n b_n\) is convergent, \(\sum_n \frac{1}{2n(2n-1)}\) is convergent and hence \(\sum_n \frac{(-1)^{n+1}}{n}\) is convergent.
We may think of a power series as giving a series for any given value of x; that is, a power series is a series \(\sum_n a_n\) where the terms a_n = b_n x^n each depend on x. We might immediately wonder how S(x) varies as x varies: in particular, is S(x) convergent for all or any values of x? Now we may make use of the convergence criteria, in particular Cauchy's n'th root test (C2) and D'Alembert's ratio test (C3), for the series \(\sum_n a_n\) by substituting a_n = b_n x^n into the tests. From the root test (C2) one finds that S(x) converges if
\[ |x| < R_1 \]
and S(x) will diverge if |x| > R_1. R_1 is called the radius of convergence as it delimits the points where the power series is well-defined and converges from the points where it diverges. From the convergence criterion (C3) we will find an alternative definition of the radius of convergence.
Substituting a_n = b_n x^n into (C3) we find that
\[ \lim_{n\to\infty}\Big|\frac{a_{n+1}}{a_n}\Big| = \lim_{n\to\infty}\Big|\frac{b_{n+1}x^{n+1}}{b_n x^n}\Big| = |x|\lim_{n\to\infty}\Big|\frac{b_{n+1}}{b_n}\Big| \;\begin{cases} < 1 & S(x)\ \text{converges} \\ > 1 & S(x)\ \text{diverges.} \end{cases} \]
Let us define \(\lim_{n\to\infty}\big|\frac{b_{n+1}}{b_n}\big| \equiv R_2^{-1}\); then S(x) will converge for all x such that
\[ |x| < R_2 \]
and will diverge for all x such that |x| > R_2. Naturally we would only expect there to be a single value R = R_1 = R_2 which is the radius of convergence for a power series. The relation between R_1 and R_2 is not part of this course, but of course there is an interesting interaction between the pair of radii of convergence. At the end of this chapter we include an appendix for the interested reader who wishes to understand more about the two formulae for the radius of convergence of a power series (this is not an examinable part of the course, but is interesting).
number R such that the series converges for all x such that |x| < R and diverges for all |x| > R, where
\[ R = \lim_{n\to\infty}|b_n|^{-\frac{1}{n}} \]
and, where it exists,
\[ R = \lim_{n\to\infty}\Big|\frac{b_n}{b_{n+1}}\Big|. \]
1. For |x| = R there is no general conclusion to be drawn; one must inspect the series being considered using some alternative method.
2. The second formula for R (from the ratio test) can only be used when the power series has consecutive terms which are non-zero. Many power series have only even or odd terms, in which case this limit is not defined. When both definitions of the radius of convergence exist they agree, and in the appendix you can see that the radius coming from the ratio comparison test implies the existence of the other definition (from the root test) of the radius of convergence, and in that case they coincide.
Here the series has coefficients b_n = \(\frac{1}{n!}\), hence
\[ R = \lim_{n\to\infty}\Big|\frac{b_n}{b_{n+1}}\Big| = \lim_{n\to\infty}\frac{\big(\frac{1}{n!}\big)}{\big(\frac{1}{(n+1)!}\big)} = \lim_{n\to\infty}\frac{(n+1)!}{n!} = \lim_{n\to\infty}(n+1) = \infty. \]
Hence e^x converges for all |x| < ∞, i.e. for all x in R. At this stage we may recall (with some satisfaction) all the results that rested upon this observation, including the exponential notation for the trigonometric and hyperbolic functions, Euler's formula, the polar coordinate notation for complex numbers, the roots of unity, the double and half angle trigonometric identities, and hence all the integrals and derivatives which required these identities.
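The ratio-test limit for the exponential series can also be evaluated in Mathematica as a quick check; the factorial ratio simplifies to n + 1 before the limit is taken.
(* Ratio of consecutive coefficients b_n = 1/n! of the exponential series. *)
Simplify[(n + 1)!/n!]                 (* 1 + n *)
Limit[(n + 1)!/n!, n -> Infinity]     (* Infinity, so the series converges for all x *)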
Example 6.4. Find the radius of convergence for the power series
\[ S(x) = \sum_{n=0}^{\infty} x^n. \]
Here b_n = 1 for all n, so R = \(\lim_{n\to\infty}|b_n/b_{n+1}| = 1\). Hence S(x) converges for all |x| < 1. From the identity for the geometric sum of a finite number of terms we know that
\[ \sum_{n=0}^{N} x^n = \frac{1 - x^{N+1}}{1 - x} \quad \forall\, x \neq 1. \]
Now we have
\[ b_n = \begin{cases} 0 & \text{if } n \text{ is even} \\ \dfrac{(-1)^m}{(2m+1)!} = \dfrac{(-1)^{(n-1)/2}}{n!} & \text{if } n = 2m+1 \text{ is odd.} \end{cases} \]
[Figure: the partial sums of the sine series for N = 0, 1, 2, 3 and 4.]
To show that this limit exists we can use Stirling's approximation for n!, which is given by
\[ n! \approx \Big(\frac{n}{e}\Big)^n\sqrt{2\pi n} \quad \text{for } n \gg 1; \]
this approximation is valid as n → ∞. Hence we have
\[ R = \lim_{n\to\infty}|n!|^{\frac{1}{n}} \approx \lim_{n\to\infty}\Big|\Big(\frac{n}{e}\Big)^n\sqrt{2\pi n}\Big|^{\frac{1}{n}} = \lim_{n\to\infty}\Big(\frac{n}{e}\Big)\lim_{n\to\infty}(2\pi n)^{\frac{1}{2n}} = \infty. \]
We are able to split the product of the limits as
\[ \lim_{n\to\infty}(n^{\frac{1}{2n}}) = \lim_{n\to\infty}\big(e^{\ln(n^{\frac{1}{2n}})}\big) = \lim_{n\to\infty}\big(e^{\frac{\ln n}{2n}}\big) = e^{\lim_{n\to\infty}(\frac{\ln n}{2n})} = e^0 = 1. \]
Therefore, as the radius of convergence is infinite, the sine function converges for all x in R.
This was a rather difficult conclusion to reach using the root test for convergence; alternatively we could have argued that sin(x) converges by using the triangle inequality:
\[ |\sin(x)| = \Big|\sum_{m=0}^{\infty}(-1)^m\frac{x^{2m+1}}{(2m+1)!}\Big| \leq \sum_{m=0}^{\infty}\frac{|x|^{2m+1}}{(2m+1)!} \leq \sum_{m=0}^{\infty}\frac{|x|^m}{m!} = e^{|x|}. \]
Let f^n(x_0) be a shorthand notation for the n'th derivative of f(x) evaluated at the point x = x_0. Using this notation we make the following claim, which will form a basis for Taylor's theorem: if a function f(x) can be written as a power series with a radius of convergence R > 0 (i.e. \(f(x) = \sum_{n=0}^{\infty} b_n x^n\) for all x in R such that |x| < R) and if we may exchange the order of the derivative and the summation for an infinite series (as we can for a finite series) then
\[ b_n = \frac{f^n(0)}{n!} \]
if the n'th derivative of f(x) exists at x = 0.
Proof of the claim: If
\[ S(x) \equiv \sum_{n=0}^{\infty}\frac{f^n(0)}{n!}x^n \quad \text{for } |x| < R \]
is the same function as f(x) then we can confirm this by checking that all of their derivatives at the point x = 0 are equal, as well as f(0) = S(0). So,
\[
\begin{aligned}
S^m(x) = \frac{d^m}{dx^m}\sum_{n=0}^{\infty}\frac{f^n(0)}{n!}x^n &= \sum_{n=0}^{\infty}\frac{f^n(0)}{n!}\,n(n-1)(n-2)\ldots(n-m+1)\,x^{n-m} \\
&= \sum_{n=m}^{\infty}\frac{f^n(0)}{(n-m)!}x^{n-m}.
\end{aligned}
\]
Now at x = 0 only the n = m term of the sum survives, so we have
\[ S^m(0) = \sum_{n=m}^{\infty}\frac{f^n(0)}{(n-m)!}x^{n-m}\Big|_{x=0} = f^m(0), \]
hence all the derivatives of S(x) and f(x) agree at x = 0 and
\[ f(x) = \sum_{n=0}^{\infty}\frac{f^n(0)}{n!}x^n \]
as claimed.
Now, because we might be (rightly) concerned about moving d/dx (which is a limit) past an infinite sum (which is another limit) we present a proof that sidesteps this issue and uses the fundamental theorem of calculus. Now,
\[ \int_0^x f^{n+1}(y)\,dy = \int_0^x \frac{d}{dy}(f^n(y))\,dy = f^n(x) - f^n(0), \]
i.e.
\[ f^n(x) = f^n(0) + \int_0^x f^{n+1}(y)\,dy. \]
Our intention is to make repeated use of the identity above. We will presume that f is differentiable as many times as needed (if the function can be written as a convergent series then it will be infinitely differentiable) and we commence by writing the statement above for the case n = 0:
\[
\begin{aligned}
f(x) &= f(0) + \int_0^x f^1(y)\,dy \\
&= f(0) + x\int_0^1 f^1(xz_1)\,dz_1 \\
&= f(0) + x\int_0^1\Big(f^1(0) + \int_0^{xz_1} f^2(y_1)\,dy_1\Big)dz_1 \\
&= f(0) + xf^1(0) + x\int_0^1\int_0^{xz_1} f^2(y_1)\,dy_1\,dz_1,
\end{aligned}
\]
where we have made the substitution y = xz_1, so that dy = x dz_1 and x is treated as a constant. In the final line we have made use of the identity for f^n(x) given above. Repeating this procedure on the innermost integral generates, term by term,
\[ f(x) = f(0) + xf^1(0) + \frac{x^2}{2}f^2(0) + \frac{x^3}{3!}f^3(0) + \ldots. \]
However, if we stopped the procedure after the first N terms of the Taylor series had been generated we would be left with an integral R_{N+1}(x), which is known as the remainder term:
\[ f(x) = \sum_{n=0}^{N}\frac{f^n(0)x^n}{n!} + R_{N+1}(x) \]
where
\[
\begin{aligned}
R_1(x) &= x\int_0^1 f^1(xz_1)\,dz_1 \\
R_2(x) &= x\int_0^1\int_0^{xz_1} f^2(y_1)\,dy_1\,dz_1 \\
R_3(x) &= x^2\int_0^1\int_0^1\int_0^{xz_1 z_2} z_1 f^3(y_2)\,dy_2\,dz_2\,dz_1 \\
&\ \ \vdots
\end{aligned}
\]
The result we have derived above is an example of a Taylor series; a Taylor series for f(x) about x = 0 is known as the Maclaurin series for f(x). Let us summarise:
Definition 6.4.1. The Maclaurin series for a function f(x) is a Taylor series expansion about x = 0 and to order N is written
\[ f(x) = \sum_{n=0}^{N}\frac{f^n(0)}{n!}x^n + R_{N+1}(x) \quad \text{where} \quad R_{N+1}(x) = \frac{1}{N!}\int_0^x f^{N+1}(y)(x - y)^N\,dy. \]
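Truncated series of this kind can be generated in Mathematica with the Series[] command; the following sketch shows the first few Maclaurin terms of the exponential function, with the O[x]^5 symbol standing in for the remainder.
(* First four orders of the Maclaurin series of e^x. *)
Series[Exp[x], {x, 0, 4}]
(* Output: 1 + x + x^2/2 + x^3/6 + x^4/24 + O[x]^5 *)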
Why can we write the remainder term this way? We can prove this by induction. Let S_N be the statement that
\[ R_{N+1}(x) = f(x) - \sum_{n=0}^{N}\frac{f^n(0)}{n!}x^n. \]
We will prove the inductive step first, namely that S_N implies S_{N+1}:
\[
\begin{aligned}
R_{N+2}(x) &= \frac{1}{(N+1)!}\int_0^x f^{N+2}(y)(x - y)^{N+1}\,dy \\
&= \frac{1}{(N+1)!}\int_0^x \frac{d}{dy}\big(f^{N+1}(y)\big)(x - y)^{N+1}\,dy \\
&= -\frac{1}{(N+1)!}\int_0^x f^{N+1}(y)\frac{d}{dy}\big((x - y)^{N+1}\big)\,dy + \frac{1}{(N+1)!}\Big[f^{N+1}(y)(x - y)^{N+1}\Big]_{y=0}^{y=x} \\
&= \frac{1}{N!}\int_0^x f^{N+1}(y)(x - y)^N\,dy - \frac{1}{(N+1)!}f^{N+1}(0)x^{N+1} \\
&= R_{N+1}(x) - \frac{1}{(N+1)!}f^{N+1}(0)x^{N+1} \\
&= f(x) - \sum_{n=0}^{N}\frac{f^n(0)}{n!}x^n - \frac{1}{(N+1)!}f^{N+1}(0)x^{N+1} \\
&= f(x) - \sum_{n=0}^{N+1}\frac{f^n(0)}{n!}x^n,
\end{aligned}
\]
which is the statement S_{N+1}. Now let us check the basis step for N = 0; the left-hand side of the statement S_0 is
\[ R_1(x) = \frac{1}{0!}\int_0^x f^1(y)(x - y)^0\,dy = \int_0^x f^1(y)\,dy = f(x) - f(0). \]
Therefore S_0 is a true statement and our proof by induction of the structure of the remainder term is complete.
The Maclaurin series gives an approximation for f(x) around zero, with the discrepancy given by the remainder term. In general one is interested in expanding about points x = a where a may be non-zero. This more general case is called the Taylor series; the expansion parameter is no longer x = (x − 0) but the parameter (x − a):
Definition 6.4.2. The Taylor series to order N around the point x = a of a function f(x) is
\[ f(x) = \sum_{n=0}^{N}\frac{f^n(a)}{n!}(x - a)^n + R_{N+1}(x) \quad \text{where} \quad R_{N+1}(x) = \frac{1}{N!}\int_a^x f^{N+1}(y)(x - y)^N\,dy. \]
One can set about proving this theorem in the same manner as we did for the Maclaurin
series, but instead of using x one works with the shifted variable (x − a).
Our aim is to get some familiarity with using the Taylor series by computing the Taylor
series for some familiar functions.
Example 6.6. Find the Maclaurin series for the exponential function up to order N and state the remainder term.
We have f(x) = e^x, hence f^n(x) = \(\frac{d^n}{dx^n}(e^x) = e^x\). For the Maclaurin series we need to expand about x = 0, so we need f^n(0) = e^0 = 1 for all n. Hence the Maclaurin series is
\[ e^x = \sum_{n=0}^{N}\frac{x^n}{n!} + R_{N+1}(x) \]
where
\[ R_{N+1}(x) = \frac{1}{N!}\int_0^x e^y(x - y)^N\,dy. \]
Although it is not required for the example, it is reassuring to note that the remainder term vanishes as N → ∞, by the following argument. Let M be an integer such that M > (x − y) > 0 (recall y lies in [0, x] in the integral); then for all N > M
\[ (x - y)^{N-M} < M^{N-M} < M(M+1)(M+2)\ldots(M + (N - M - 1)) = \frac{(N-1)!}{(M-1)!} \]
and so,
\[ (x - y)^N < (x - y)^M\frac{(N-1)!}{(M-1)!} < M^M\frac{(N-1)!}{(M-1)!}. \]
Therefore
\[ 0 \leq \lim_{N\to\infty}\frac{(x - y)^N}{N!} \leq \lim_{N\to\infty}\frac{M^M(N-1)!}{(M-1)!\,N!} = \frac{M^M}{(M-1)!}\lim_{N\to\infty}\frac{1}{N} = 0. \]
Consequently we see that in the limit N → ∞ the remainder term vanishes:
\[ \lim_{N\to\infty} R_{N+1}(x) = \int_0^x e^y \lim_{N\to\infty}\frac{(x - y)^N}{N!}\,dy = 0. \]
This is consistent with our observation that the infinite power series for the exponential function is convergent for all x in R.
Example 6.7. Find the Maclaurin series for the sine function up to order N, where N is an odd, positive integer, and state the remainder term.
We have f(x) = sin(x), hence
\[ f^n(0) = \begin{cases} 0 & \text{for even } n \\ (-1)^{(n-1)/2} & \text{for odd } n. \end{cases} \]
Hence, to find the terms up to order n = N we may write odd n = 2m + 1 for m in {0, 1, 2, 3, ..., (N−1)/2} (recall that N is odd by the assumption stated in the question), so that
\[ \sin x = \sum_{m=0}^{\frac{N-1}{2}}\frac{(-1)^m}{(2m+1)!}x^{2m+1} + R_{N+2}(x) \]
where
\[ R_{N+2}(x) = \frac{1}{(N+1)!}\int_0^x f^{N+2}(y)(x - y)^{N+1}\,dy = \frac{(-1)^{\frac{N+1}{2}}}{(N+1)!}\int_0^x \cos(y)(x - y)^{N+1}\,dy. \]
We observe here that as the terms of even order in x are zero, the remainder term after the x^N term (where N is odd) begins with terms proportional to x^{N+2}; hence the remainder term is R_{N+2} rather than R_{N+1}.
Example 6.8. Find the first N (non-zero) terms, where N is even, in the Taylor series for the cosine function expanded about the point x = a.
We have f(x) = cos(x), hence
\[ f^n(a) = \begin{cases} (-1)^{(n+1)/2}\sin a & \text{for odd } n \\ (-1)^{n/2}\cos a & \text{for even } n. \end{cases} \]
To find the first N terms we do not need to consider the remainder term, and, including the first term, we need only construct terms up to and including order N − 1; hence we have
\[ \cos x = \sum_{n=0}^{N-1}\frac{f^n(a)}{n!}(x - a)^n + \ldots. \]
Example 6.9. Find the first 4 (non-zero) terms in the Maclaurin series for the function f(x) = ln(1 − x).
We tabulate the derivatives:

n    f^n(x) = d^n/dx^n (ln(1 − x))    f^n(0)
0    ln(1 − x)                         0
1    −1/(1 − x)                        −1
2    −1/(1 − x)^2                      −1
3    −2/(1 − x)^3                      −2
4    −6/(1 − x)^4                      −6

Hence
\[
\begin{aligned}
\ln(1 - x) = \sum_{n=0}^{4}\frac{f^n(0)}{n!}x^n + R_5(x) &= -x - \frac{1}{2}x^2 - \frac{1}{3!}(2)x^3 - \frac{1}{4!}(6)x^4 + R_5(x) \\
&= -x - \frac{1}{2}x^2 - \frac{1}{3}x^3 - \frac{1}{4}x^4 + R_5(x).
\end{aligned}
\]
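The same truncated expansion can be checked directly with Series[]; this is only a verification of the coefficients in the table above.
(* First four orders of the Maclaurin series of ln(1 - x). *)
Series[Log[1 - x], {x, 0, 4}]
(* Output: -x - x^2/2 - x^3/3 - x^4/4 + O[x]^5 *)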
If we had continued the expansion to higher order terms the apparent pattern would continue and we would find
\[ \ln(1 - x) = -\sum_{n=1}^{\infty}\frac{x^n}{n} \]
so long as the general remainder term R_{N+1}(x) vanishes as N → ∞. So let us check when this is the case:
\[
\begin{aligned}
\lim_{N\to\infty}(R_{N+1}(x)) &= \lim_{N\to\infty}\Big(\frac{1}{N!}\int_0^x f^{N+1}(y)(x - y)^N\,dy\Big) \\
&= -\lim_{N\to\infty}\Big(\frac{1}{N!}\int_0^x N!\,\frac{(x - y)^N}{(1 - y)^{N+1}}\,dy\Big) \\
&= -\lim_{N\to\infty}\int_0^x \frac{(x - y)^N}{(1 - y)^{N+1}}\,dy \\
&= -\int_0^x \lim_{N\to\infty}\Big(\frac{(x - y)^N}{(1 - y)^N}\Big)\frac{1}{1 - y}\,dy.
\end{aligned}
\]
Now
\[ \lim_{N\to\infty}\frac{(x - y)^N}{(1 - y)^N} \]
converges to zero if \(\big|\frac{x - y}{1 - y}\big| < 1\), i.e. if |x| < 1, but if x > 1 the remainder term diverges. This is a sign that the power series
\[ \ln(1 - x) = -\sum_{n=1}^{\infty}\frac{x^n}{n} \]
converges only if |x| < 1; we can show this is the case by identifying the radius of convergence. We have b_n = −1/n and so,
\[ R = \lim_{n\to\infty}\Big|\frac{b_n}{b_{n+1}}\Big| = \lim_{n\to\infty}\frac{\big(\frac{1}{n}\big)}{\big(\frac{1}{n+1}\big)} = \lim_{n\to\infty}\frac{n+1}{n} = \lim_{n\to\infty}\Big(1 + \frac{1}{n}\Big) = 1. \]
Hence
\[ \ln(1 - x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \ldots \]
converges for |x| < 1.
We can now re-state the definition of the analytic function
Definition 6.4.3. A function which can be locally written as a convergent power series is called
an analytic function.
That is, if the Taylor series expansion for a function about a point x = a is convergent for some non-zero radius R about the point x = a, i.e. for all x with |x − a| < R, and if this is true for all a in R, then the function is analytic. For example, as the exponential function has a convergent power series for all x, it is an analytic function.
Now we can make a few simple observations about such analytic functions. As we can always find a convergent power series expansion, the function is differentiable (we have no trouble in taking the derivative of a power series, an "infinite polynomial") and so it is also continuous. Hence the analytic functions are the purist's idea of a well-behaved function of one variable. This observation represents the end of the long-term quest of this course.
\[ e^{-x} = 1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} + \ldots. \]
We may make a similar observation for
\[ \ln(1 - x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \ldots \]
which is convergent for x in (−1, 1), hence
\[ \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \ldots \]
is convergent for x in (−1, 1). We may use (well-defined) operations on analytic functions to find other series expansions, e.g.
\[ \frac{1}{1 + x} = \frac{d}{dx}\big(\ln(1 + x)\big) = \frac{d}{dx}\Big(x - \frac{x^2}{2} + \frac{x^3}{3} - \ldots\Big) = 1 - x + x^2 - x^3 + \ldots, \]
which we might have deduced by summing the geometric series on the right when |x| < 1. A further example is the expansion of 1/cos(x):
\[
\begin{aligned}
\frac{1}{\cos(x)} &= \frac{1}{1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \ldots} \equiv \frac{1}{1 + Y} \\
&= 1 - Y + Y^2 - Y^3 + \ldots \\
&= 1 - \Big(-\frac{x^2}{2!} + \frac{x^4}{4!} - \ldots\Big) + \Big(-\frac{x^2}{2!} + \frac{x^4}{4!} - \ldots\Big)^2 + \ldots \\
&= 1 + \frac{x^2}{2!} - \frac{x^4}{4!} + \frac{x^4}{4} + \ldots \\
&= 1 + \frac{x^2}{2!} + \frac{5x^4}{4!} + \ldots
\end{aligned}
\]
where we have required |Y| < 1; notice that in general Y = cos(x) − 1 lies in [−2, 0], so this restriction on Y corresponds to constraining 0 < cos(x) ≤ 1. By attempting other manipulations of functions where they are known to be analytic one can come up with other wonderful examples of Taylor series.
We conclude the course with a wonderful observation which is built upon the Taylor series
and is fantastically useful for computing limits.
where we have substituted f(0) = 0 and g(0) = 0 before taking the limit. So long as g^1(0) ≠ 0 this result gives a well-defined limit. Let us summarise the generalisation of this result, which is known as l'Hôpital's Rule:
Theorem 6.1. (l'Hôpital's Rule) Let f(x) and g(x) be two functions which are differentiable on the interval I (except possibly at the point x = x_0) such that \(\lim_{x\to x_0}(f(x)) = 0\), \(\lim_{x\to x_0}(g(x)) = 0\) and g'(x) ≠ 0 for all x in I \ {x_0}; then
\[ \lim_{x\to x_0}\frac{f(x)}{g(x)} = \lim_{x\to x_0}\frac{f'(x)}{g'(x)} \]
where x_0 lies in I.
It is possible to further generalise this statement of the theorem. To see this we will consider again the example where x_0 = 0 but now consider functions such that \(\lim_{x\to 0}(f(x)) = \pm\infty\) and \(\lim_{x\to 0}(g(x)) = \pm\infty\). Then we have
\[ \lim_{x\to 0}\frac{f(x)}{g(x)} = \lim_{x\to 0}\frac{1/g(x)}{1/f(x)} = \lim_{x\to 0}\frac{-g'(x)/(g(x))^2}{-f'(x)/(f(x))^2}, \]
where we have been able to invoke l'Hôpital's rule as we have \(\lim_{x\to 0}(1/f(x)) = 0\) and \(\lim_{x\to 0}(1/g(x)) = 0\). Now if the limit \(\lim_{x\to 0}\frac{(g(x))^2}{(f(x))^2}\) exists we can multiply the above by it to find
\[ \lim_{x\to 0}\frac{g(x)}{f(x)} = \lim_{x\to 0}\frac{g'(x)}{f'(x)}. \]
We note that
\[ x\ln(x) = \frac{\ln(x)}{1/x} \]
and as \(\lim_{x\to 0^+}(\ln(x)) = -\infty\) and \(\lim_{x\to 0^+}(1/x) = \infty\), we have
\[ \lim_{x\to 0^+} x\ln(x) = \lim_{x\to 0^+}\frac{\ln(x)}{1/x} = \lim_{x\to 0^+}\frac{1/x}{-1/x^2} = \lim_{x\to 0^+}\frac{-x}{1} = 0. \]
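This one-sided limit can also be checked in Mathematica; note that the syntax for specifying the direction of a one-sided limit varies between Mathematica versions, so the option shown here is a sketch for recent versions.
(* The limit of x ln(x) as x approaches 0 from above; expected output 0. *)
Limit[x*Log[x], x -> 0, Direction -> "FromAbove"]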
• f(x) and g(x) have well-defined Taylor expansions around the limit point, and
• the limit \(\lim_{x\to x_0}\frac{f'(x)}{g'(x)}\) exists, while f(x_0) = 0 and g(x_0) = 0, or f(x_0) = ±∞ and g(x_0) = ±∞.
Although l'Hôpital's rule seems like a panacea for all limits, the assumptions surrounding its construction mean one must take care when using it. For example consider the limit
\[ \lim_{x\to 2}\frac{2x^2 - 3x + 1}{5x + 4} = \frac{3}{14}, \]
as the quotient is well-defined at the limit point; if we had mistakenly applied l'Hôpital's rule we would have evaluated
\[ \lim_{x\to 2}\frac{\frac{d}{dx}(2x^2 - 3x + 1)}{\frac{d}{dx}(5x + 4)} = \lim_{x\to 2}\frac{4x - 3}{5} = 1, \]
which is not the correct answer. Another example where the rule is not valid is the limit
\[ \lim_{x\to\infty}\frac{x + \sin(x)}{x} = 1 + \lim_{x\to\infty}\frac{\sin(x)}{x} = 1. \]
Had we applied l'Hôpital's rule we would have been led to consider \(\lim_{x\to\infty}(1 + \cos(x))\), and this limit does not exist, so l'Hôpital's rule was not of use here and some other method is needed3.
3We made use of \(\lim_{x\to 0}\big(\frac{\sin(1/x)}{1/x}\big) = 0\) to deduce the limit, by using the change of limit variable y = 1/x.
Appendix
For different series it is useful to prefer one of these definitions of the radius of convergence over the other. In this appendix we will show that if R_2 exists, then R_1 also exists and in that case R_1 = R_2, giving a single radius of convergence from the pair of definitions.
Let us commence by recalling the ε-δ definition for the existence of the limit in equation (6.1), before putting the resulting expression in a helpful format.
If the limit defining R_2 exists then
\[ \Big|\,\Big|\frac{a_n}{a_{n+1}}\Big| - R_2\Big| < \epsilon_2 \quad \forall\, \epsilon_2 > 0,\ n > N, \]
where N is some sufficiently large value of n chosen such that the inequality is satisfied. Now it will be useful to consider the natural logarithm of R_2 in order to later compare it with R_1. We have
\[ \ln(R_2) \equiv \ln\Big(\lim_{n\to\infty}\Big|\frac{a_n}{a_{n+1}}\Big|\Big) = \lim_{n\to\infty}\big(\ln|a_n| - \ln|a_{n+1}|\big). \]
Now, to compare this with the limit R_1 we will also take the natural logarithm of R_1, so,
\[ \ln(R_1) \equiv \ln\Big(\lim_{n\to\infty}|a_n|^{-\frac{1}{n}}\Big) = \lim_{n\to\infty}\big(\ln(|a_n|^{-\frac{1}{n}})\big) = \lim_{n\to\infty}\Big(-\frac{1}{n}\ln|a_n|\Big) \quad (6.4) \]
4The material in this appendix is provided for interest and is not an examinable part of the course.
5Recall that |x| < R defines the values of x for which the power series is convergent and well-defined, and R is the radius of convergence.
196 CHAPTER 6. POWER SERIES
where M is an integer and M ≥ N. We may develop the left-hand side of the equation above by using the triangle inequality (i.e. |A + B| ≤ |A| + |B|), so that,
\[
\begin{aligned}
\sum_{n=N}^{M}\big|\ln|a_n| - \ln|a_{n+1}| - \ln(R_2)\big| &\geq \Big|\sum_{n=N}^{M}\big(\ln|a_n| - \ln|a_{n+1}| - \ln(R_2)\big)\Big| \\
&= \big|\ln|a_N| - \ln|a_{N+1}| + \ln|a_{N+1}| - \ln|a_{N+2}| + \ldots \\
&\qquad + \ln|a_M| - \ln|a_{M+1}| - (M - N + 1)\ln(R_2)\big| \\
&= \big|\ln|a_N| - \ln|a_{M+1}| - (M - N + 1)\ln(R_2)\big|.
\end{aligned}
\]
Taking the limit M → ∞ we have that
\[ \lim_{M\to\infty}\Big(-\frac{1}{M+1}\ln|a_{M+1}|\Big) = \ln(R_1), \]
hence,
\[ \ln(R_1) - \ln(R_2) = \ln\Big(\frac{R_1}{R_2}\Big) < \epsilon \quad \forall\, \epsilon. \]
This is true for all ε > 0, which implies that R_1 = R_2. Implicitly our work is now done, as if R_2 exists then we have shown that it is equal to R_1 and hence R_1 exists. However we set out to show that equation (6.3) implies equation (6.5). This will be achieved in a simple step from the equation we have derived above:
\[ \lim_{M\to\infty}\Big|-\frac{1}{M+1}\ln|a_{M+1}| - \ln(R_2)\Big| = \Big|\lim_{M\to\infty}\Big(-\frac{1}{M+1}\ln|a_{M+1}|\Big) - \ln(R_2)\Big| < \epsilon. \]
7. Tutorial Exercises
In this tutorial you will practise working with injective, surjective and bijective functions. Ques-
tions marked with ? should be submitted to your tutor in your tutorial class for marking and
feedback.
Tutor's Example1
Let f : A → B be given by
\[ f(x) = \frac{\sin x}{x}. \]
Identify the largest domain A ⊆ R for which f is well-defined. Sketch the graph of f. If B = (−∞, n] ⊂ R, identify the smallest value of n for which f is well-defined.
1. State the maximum domains and minimum ranges in R for which the following are well-defined functions:
(a) f(x) = \(\frac{(x+2)^2}{(x-2)^2}\)
(b) g(x) = \(\frac{1}{x}\)
(c) h(x) = \(\begin{cases} 0 & x < 0 \\ \frac{1}{2} & x = 0 \\ 1 & x > 0. \end{cases}\)
The function h(x) is a common definition of the Heaviside step function, named after Oliver Heaviside.
2. For each of the three functions in question 1, sketch the graph and qualitatively describe
the slope of each graph - in particular state whether the slope is positive, negative or does
not exist.
? 3. Which of the following graphs could correspond to well-defined functions? State whether the functions could be bijections. You may assume that the graphs are plotted over the full domain and range in each case (i.e. this means you can assume the range is such that the functions could be surjective).
1To be shown by the tutor on the board at the start of the tutorial
[Graphs (a), (b), (c), (d), (e) and (f) for question 3.]
? 4. Sketch the following functions and for each case state whether the function is injective, surjective, neither, or not well-defined:
6. Mathematica2 Graphs are plotted in Mathematica using the command Plot[]. For example, if you open Mathematica and type
Plot[x^2, {x, -5, 5}]
then push the shift and the enter key simultaneously, you will get a plot of f(x) = x² for x in [−5, 5] within the real numbers. There are many more options which can be added to the command, so that
Plot[x^2, {x, -5, 5}, GridLines -> Automatic, PlotRange -> {{-10, 10}, {-50, 50}}]
draws the parabola f(x) = x² for the values of x in the domain [−5, 5], on axes with grid lines added in automatically and with the horizontal axis running from −10 to 10 and the vertical axis from −50 to 50. Practise using Mathematica to confirm your graph sketches of the functions in problem 1 (some research will be required to plot h(x) in part (c)).
7. A challenging problem3 . Consider the set of all lists of positive integers, L, i.e. the ele-
ments of L are lists of positive integers {n1 , n2 , . . . , nN } and each ni ∈ Z+ , e.g. {1, 4, 6},
{4, 1, 6, 6, 6, 4} and {7, 52, 23456, 3, 504, 10023} are elements of L. Argue that L is count-
ably infinite and construct a bijection from L to N.
2
Occasionally there will be a challenge to use Mathematica - these challenges are for you to try outside the
tutorial class using the college’s computer rooms.
3
All challenging problems in the tutorial problem sets are beyond the scope of the course, and are not
examinable, but they are, hopefully interesting and inspiring.
In this tutorial you will work with inverse functions, exponents, logarithms, polynomial func-
tions and trigonometric functions. Questions marked with ? should be submitted to your tutor
in your tutorial class for marking and feedback.
Tutor’s Example4
Prove that the function is NOT a bijection. Identify domains (which together cover R) over
which f (x) becomes invertible.
? 10. For what value of n does log2 (3) log3 (4) log4 (5) . . . logn (n+1) = 10? [Hat-tip: this problem
was set at math10.com6 .]
11. Use the definitions of the sine, cosine and tangent functions, to prove the following pair
of trigonometric identities:
Both of these identities arise from using Pythagoras’ theorem for right-angled triangles.
Identify the right-angled triangles for which Pythagoras’ theorem gives the identities
above.
12. Prove the following properties of the exponential function:
(i) e^x e^y = e^{x+y}
(ii) e^{−x} = \(\frac{1}{e^x}\)
You may wish to use the binomial expansion \((x + y)^n = \sum_{k=0}^{n}\binom{n}{k}x^k y^{n-k}\).
13. Mathematica. Use Mathematica to plot f(x) = \(\sum_{k=0}^{N}\frac{1}{k!}x^k\) for N = 2, 5 and 10 and compare your plot to the exponential function, denoted by Exp[x] in Mathematica. Plot all four graphs on the same set of axes for the domain x in [−5, 5]. N.B. The Mathematica command for a summation is Sum[], which takes two arguments: the expression to be summed and the range of the summation variable. For example, the sum of the first 10 square numbers is found using Sum[k^2, {k, 1, 10}].
14. A challenging problem. In the lecture notes we considered differential equations of the form
\[ \frac{d^n}{dx^n}(f(x)) = Af(x). \]
When A = 1 and n = 1 the solution satisfying f(0) = 1 is the exponential function; when A = 1 and n = 2 the solutions are the hyperbolic functions (f(x) = cosh(x) when f(0) = 1 and f'(0) = 0, and f(x) = sinh(x) when f(0) = 0 and f'(0) = 1); when A = −1 and n = 2 the solutions are the trigonometric functions (f(x) = cos(x) when f(0) = 1 and f'(0) = 0, and f(x) = sin(x) when f(0) = 0 and f'(0) = 1). What are the solutions to the equation when
In this tutorial you will practise working with the inverse trigonometric functions, the hyper-
bolic functions, trigonometric functions with complex arguments, double-angle formulae and
composition of functions. Questions marked with ? should be submitted to your tutor in your
tutorial class for marking and feedback.
Tutor’s Example7
Prove that
a cos θ + b sin θ = c sin(θ + α)
? 15. Express the following combinations of functions in the form c sin(θ + α), where c and α are real constants which you have to find (assume |β| < π/2):
(a) sin(θ) + cos(θ)  (b) \(\sqrt{3}\)sin(θ) − cos(θ)  (c) sin(θ) − tan(β)cos(θ)
? 16. Use the explicit formula for the function arcsinh : R → R, namely arcsinh(y) = ln(|y + \(\sqrt{y^2 + 1}\)|), to show that
17. Use the exponential forms for the sinh and cosh functions to derive an analytic expression
for arctanh .
? 18. Show that cos(π/12) = \(\frac{1}{2}\sqrt{2 + \sqrt{3}}\).
(Hint: use the formula for cos(2θ) in terms of cos(θ).)
19. Find formulae that express the hyperbolic functions sinh(x), cosh(x) and tanh(x) as ratios
of polynomials of t = tanh(x/2).
7
To be shown by the tutor on the board at the start of the tutorial
? 20. Find all solutions θ ∈ R (if any) of the equation cot(θ) + tan(θ) = α, for the following
three cases:
(i) α = 1 (ii) α = 2 (iii) α = 4
21. Let α, β be real numbers. Rewrite each of the following expressions as a constant times a product of trigonometric functions:
(i) sin(α − β) + sin(α + β)  (ii) cos(α − β) − cos(α + β)
and
\[ 4z(1 - y^2) + 4y(1 - z^2) = (1 + z^2)(1 + y^2), \]
where x, y, z are real numbers, find the value of the following expression:
\[ \Big(\frac{2x}{1 + x^2} - \frac{1 - z^2}{1 + z^2}\Big)^2 + \Big(\frac{2z}{1 + z^2} - \frac{1 - x^2}{1 + x^2}\Big)^2. \]
Hint: Use the substitutions x = tan α, y = tan β and z = tan γ to rewrite the expressions.
In this tutorial you will work with the limit, continuous functions, the intermediate value theo-
rem, asymptotic limits and the algebra of limits. Questions marked with ? should be submitted
to your tutor in your tutorial class for marking and feedback.
Tutor’s Example8
is a continuous function.
? 25. For each function below identify the value of k that makes the function continuous:
(a) f : R → R given by f(x) = \(\begin{cases} \dfrac{x^2 + 2x - 15}{x - 3} & x < 3 \\ x^2 - kx + 1 & x \geq 3. \end{cases}\)
(b) f : [−3, 3] → R given by f(x) = \(\begin{cases} \dfrac{3x - 6}{k(x^3 + 2x^2 - 11x + 6)} & x < 2 \\ \dfrac{7x - 1}{12} & x \geq 2. \end{cases}\)
(c) f : R → R given by f(x) = \(\begin{cases} 2e^{3x} & x \leq 0 \\ \dfrac{\sin(kx)}{x} & x > 0. \end{cases}\)
27. Let n be a positive integer, and calculate the limit \(\lim_{n\to\infty}\big(1 + \frac{1}{n}\big)^n\).
Hint: substitute n^{−1} = x and try to use limits calculated in the lectures.
8
To be shown by the tutor on the board at the start of the tutorial
28. Calculate the following limits, if they exist, without using power series:
(a) \(\lim_{x\to 0}\frac{1 - \cos(x)}{x^2}\)  (b) \(\lim_{x\to 0}\frac{\sin(7x)}{\tan(2x)}\)  (c) \(\lim_{x\to 0}\frac{\sin(x^2)}{x}\)  (d) \(\lim_{x\to 0}\cosh(x)\)
29. Calculate the following limits, if they exist, using the power series representations of the hyperbolic functions:
(a) \(\lim_{x\to 0}\frac{\sinh(x)}{x}\)  (b) \(\lim_{x\to 0}\frac{1 - \cosh(x)}{x}\)  (c) \(\lim_{x\to 0}\frac{1 - \cosh(x)}{x^2}\)  (d) \(\lim_{x\to 0}\frac{\tanh(x)}{x}\)
(a) \(\lim_{x\to\pi}\frac{\sin(x)}{\pi - x}\)  (b) \(\lim_{x\to 0}\frac{\arcsin(x)}{x}\)  (c) \(\lim_{x\to 0}\frac{\arcsin(3x)}{\tan(5x)}\)  (d) \(\lim_{x\to 0}\frac{\mathrm{arctanh}(x)}{x}\)
\[ (x - 1)^2 + y^2 = 1 \]
and a shrinking circle C_2 with radius r and centre the origin. P is the point (0, r), Q is the upper point of intersection of the two circles, and R is the point of intersection of the line PQ and the x-axis. What happens to R as C_2 shrinks, that is, as r → 0^+?
9
This problem is perhaps not so challenging but the result is interesting - I first saw this question on the blog
https://mrchasemath.wordpress.com/2010/02/23/really-fun-limit-problem/ where you can find access
to a Java applet which will help you understand the solution.
In this tutorial you will do some more work with limits, deriving fundamental limits using
the sandwich theorem and simplifying complicated limit problems. Questions marked with ?
should be submitted to your tutor in your tutorial class for marking and feedback.
Tutor’s Example10
32. Use the sandwich theorem to show that limx→0 (x4 cos(2/x)) = 0.
\[ 4x - 9 \leq f(x) \leq x^2 - 4x + 7 \]
? 36. Calculate the following limits, if they exist, using any suitable method:
10
To be shown by the tutor on the board at the start of the tutorial
(a) \(\lim_{x\to\infty}\frac{\sin(x)}{x}\)
(b) \(\lim_{x\to\infty}\frac{\cos(x)}{x^2}\)
(c) \(\lim_{x\to\infty} x(e^{1/x} - 1)\)
(d) \(\lim_{x\to 0^+} x^x\)
(e) \(\lim_{x\to\infty}\frac{\ln(x)}{x^2 + 1}\)
(f) \(\lim_{x\to 0^+} x\ln(x)\)
(g) \(\lim_{x\to 0^+} e^{1/x}\)
(h) \(\lim_{x\to 0^+} x^{\sin(x)}\)
(i) \(\lim_{x\to 0}\frac{a^x - b^x}{x}\) where a > b > 0
(j) \(\lim_{x\to 0^+}\frac{(x + 1)\ln(x)}{\sin(x)}\)
(k) \(\lim_{x\to 0^-} e^{1/x}\)
(l) \(\lim_{x\to\infty}\frac{x^x - 1}{x e^{1/x}}\)
37. Mathematica. Use Mathematica to check your answers to problem 33. The Mathematica command for finding a limit is Limit[] and it takes at least two arguments, the first being the expression to be evaluated under the limit and the second showing the value the variable approaches in the limit. For example, to compute \(\lim_{x\to 0}\frac{\sin(x)}{x}\) one types
Limit[Sin[x]/x, x -> 0]
before pushing shift and enter simultaneously to find the result.
In this tutorial you will revise and work with the derivative. Questions marked with ? should
be submitted to your tutor in your tutorial class for marking and feedback.
Tutor's Example11
Let f : R → R be given by f(x) = \(\sqrt{1 + \sin(x)}\). Compute \(\frac{df}{dx}\) and sketch the graph of f(x) for x in [−π, π]. Is the function differentiable for x in [−π, π]?
41. Let f^n(x) denote the n'th derivative with respect to x of the function f : R → R, where n is a positive integer. Calculate f^n(x) at x = 0 when
(a) f(x) = e^x.
(b) f(x) = sin(x).
(c) f(x) = x^n e^x.
44. Consider the hyperbola defined as the set of points satisfying x2 − y 2 = 16. Let A =
(−5, yA ), B = (5, yB ) and C the point where the tangent to the hyperbola at A meets
the tangent to the hyperbola at B. Compute the area of the triangle ABC.
45. Mathematica. Computation of derivatives in Mathematica uses the operation D[]. For example, the operation D[x^2+3x+2,x] computes the derivative of x² + 3x + 2 with respect to x. To compute higher derivatives Mathematica uses the formulation D[f,{x,n}] to give \(\frac{d^n f}{dx^n}\). Use Mathematica to check your answers to questions 40, 41 part (c) and 42.
has an extreme value at a point where the sign of the derivative does not make a simple
change.
    12. This problem is, perhaps, less challenging than it is interesting. The function is taken from the marvellous
    book “Counterexamples in Analysis” by Bernard R. Gelbaum and John M. H. Olmsted.
In this tutorial you will work with derivatives of implicit functions, inverse functions, parametric
functions, and the mean value theorem. Questions marked with ? should be submitted to
your tutor in your tutorial class for marking and feedback.
Tutor’s Example13
The curve defined as the set of points $(x, y)$ satisfying the equation
    $y^2 = x^3 + x^2$
is an example of an elliptic curve. Find an expression for $\frac{dy}{dx}$ and use the result to sketch the
curve.
? 47. Each of the following is an equation that determines y as an implicit function of x. Find
in all cases an expression for dy/dx.
    (a) $x^2 + y^2 = 1$   (b) $y^3 + x^3 = 1$
? 48. Each of the following is a pair of equations which determines x and y in terms of a
    parameter t ∈ R, thereby defining a function y(x) implicitly. Find in all cases an expression
    for dy/dx.
    (a) Find an expression for $\frac{dy}{dx}$ in terms of t from this parametric form.
    (b) Find an explicit expression for y in terms of x that no longer involves t.
    (c) Use the result of (b) to find $\frac{dy}{dx}$ in terms of x, and verify that this agrees with what
        you got in (a).
50. Find all values of c for which the mean value theorem is satisfied for the functions:
51. Use both the intermediate value theorem and the mean value theorem to prove that
f : R → R given by f (x) = x3 − 2x2 + 4x − 8 has exactly one real root.
52. Mathematica. The plotting of graphs of parametric functions can be achieved in Mathematica
    by using ParametricPlot[]. The operation takes a pair of arguments, the first
    being the set of coordinates (x(t), y(t)) for the function and the second being the range
    of the parametric variable t. For example, ParametricPlot[{Cos[t],Sin[t]},{t,0,Pi}]
    plots the semicircle given by the coordinates x = cos(t), y = sin(t) for t ∈ [0, π].
    Practise using this command by plotting the parametric curves in question 48; you will
    wish to choose a range of t that gives an informative plot, rather than using t ∈ R.
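    For instance (the curve below is an arbitrary illustration, not necessarily one of the curves from question 48):
        ParametricPlot[{t^2, t^3 - t}, {t, -2, 2}, AxesLabel -> {"x", "y"}]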
    and use this formula to find expressions constraining $f^{(n)}(0)$ where $f(x) = \tan(x)$ (and
    $f^{(n)}(x) = \frac{d^n f}{dx^n}$). As far as possible, use these relations to find a closed expression for $f^{(n)}(0)$.
In this tutorial you will work with the integral, in particular the Riemann integral and the
fundamental theorems of calculus. Questions marked with ? should be submitted to your tutor
in your tutorial class for marking and feedback.
Tutor’s Example14
54. Construct staircase functions and then evaluate from first principles the Riemann integral
of f : R → R given by
? 55. Find a ‘primitive’ for each of the following functions, i.e. a function $F(x)$ such that
    $F'(x) = f(x)$. Prove your claims.
59. Use the fundamental theorem of calculus to show that, upon substitution of x = x(t),
    $\int_{x_1}^{x_2} f(x)\,dx = \int_{t_1}^{t_2} f(x(t))\,\frac{dx}{dt}\,dt.$
60. Mathematica. Mathematica evaluates standard integrals using the operation Integrate[].
    For example, to find the primitive of $x^2$ one evaluates Integrate[x^2,x]; the first argument
    gives the function to be integrated and the second argument gives the variable to
    integrate against. For definite integrals one must also specify a range for the variable, so
    that to find $\int_2^3 x^2\,dx$ one evaluates Integrate[x^2,{x,2,3}]. Use Mathematica to check
    your answers to problem 55.
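    A brief illustration of both forms (the last integrand is an arbitrary extra example, not one of the set questions):
        Integrate[x^2, x]                  (* indefinite integral: x^3/3 *)
        Integrate[x^2, {x, 2, 3}]          (* definite integral over [2, 3]: 19/3 *)
        Integrate[Cos[x]^2, {x, 0, Pi}]    (* a further example: Pi/2 *)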
    15. This problem was suggested by Mr. Joshua Brazier.
In this tutorial you will revise further techniques to integrate functions and some applications
of the integral. Questions marked with ? should be submitted to your tutor in your tutorial
class for marking and feedback.
Tutor’s Example16
    (e) $\int \frac{2x^3\,dx}{1+x}$   put $x = t - 1$
    (f) $\int \frac{x\,dx}{1+\sqrt{x}}$   put $x = t^2$
66. Define the indefinite integrals $I_n(x)$ for $n \in \mathbb{Z}$ as $I_n(x) = \int \cos^n(x)\,dx$. Use integration by
    parts to derive the following recursion formula:
    $I_n(x) = \frac{1}{n}\sin(x)\cos^{n-1}(x) + \frac{n-1}{n}\,I_{n-2}(x)$
    Use the recursion formula to find the indefinite integral $\int \cos^4(x)\,dx$.
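    If you have access to Mathematica (as in question 60), the answer obtained from the recursion can be checked against the built-in closed form, which should agree with yours up to an additive constant and trigonometric rearrangement:
        Integrate[Cos[x]^4, x]    (* closed form for the n = 4 case *)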
67. Use the method of partial fractions to calculate the following integrals:
    (a) $\int \frac{dx}{x^2+x-6}$   (b) $\int \frac{x\,dx}{(1-x)^2(1+x)}$   (c) $\int \frac{dx}{2x^2+x-1}$   (d) $\int \frac{x\,dx}{12x^2 - 7x - 12}$
68. A challenging problem. (Taken from the Indian Institute of Technology Joint Entrance
    Exam in 2006.)
    Evaluate the following:
    $5050\,\frac{\int_0^1 (1 - x^{20})^{100}\,dx}{\int_0^1 (1 - x^{20})^{101}\,dx}.$
    [Hint: commence by considering $I_n = \int_0^1 (1 - x^m)^n\,dx$.]
In this tutorial you will revise the radius of convergence and Taylor (and Maclaurin) series.
Questions marked with ? should be submitted to your tutor in your tutorial class for marking
and feedback.
Tutor’s Example17
69. Calculate the length L of the spiral in the plane described by the parametric equations
y(t) = vt cos(ωt) and x(t) = vt sin(ωt) from t = 0 to t = τ , where v, ω are constants.
? 70. Calculate the length L of the curve in the plane described by the function y = ln(cos(x)),
between the points x = −π/4 and x = π/4.
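    For both of these, the natural starting point is the arc-length formula from Section 5.4.3; as a reminder (assuming the curve is continuously differentiable over the stated range), $L = \int_{t_1}^{t_2} \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt$ for a parametric curve, and $L = \int_a^b \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx$ for the graph of a function $y(x)$.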
71. Find out for the following series whether they are convergent or divergent, using the
    various results stated and/or derived in the lectures or otherwise:
    (a) $\sum_n n^{-3}$   (b) $\sum_n (-1)^n n^{-2}$   (c) $\sum_n 1/\sqrt{n}$   (d) $\sum_n n^{\alpha} e^{-n}$ ($\alpha > 0$)
? 72. Find the radii of convergence for the following power series:
    (a) $\sum_n x^n/n^3$   (b) $\sum_n (-1)^n x^n/n^2$   (c) $\sum_n x^n/\sqrt{n}$   (d) $\sum_n x^n n^{\alpha} e^{-n}$ ($\alpha > 0$)
    17. To be shown by the tutor on the board at the start of the tutorial.
73. Derive the Taylor expansions for the following functions, up to order N = 3 about the
point x = 0 for the first three functions and up to order N for the last one, and give in
each case an exact expression for the remainder term:
    (a) $f(x) = \tan(x)$   (b) $f(x) = \tanh(x)$   (c) $f(x) = x^3$   (d) $f(x) = \sqrt{1+x}\; e^{\sin(x)}$
? 74. Write the following two functions as power series of the form $\sum_{n=0}^{\infty} b_n x^n$, and determine
    for each the associated radius of convergence:
    (a) $f(x) = 1/(1-x)^2$   (b) $f(x) = \arctan(x)$
Substitute x = 1 into your result under (b) (why is it not obvious that this is allowed?)
and derive an expression for the number π as a series (the so-called Leibniz series).
75. Find the first three terms (i.e. up to order $x^3$) of the Taylor expansions about x = 0 for
    the following two functions, by combining and/or manipulating the Taylor expansions of
    other functions that you know:
    (a) $f(x) = \tanh(x)$   (b) $f(x) = \ln\!\left(\frac{1+2x}{1-2x}\right)$
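    If you wish to check the expansions you obtain in questions 73–75, Mathematica's Series[] command (not required by these questions) prints the first few terms of a Taylor series about a chosen point; for example:
        Series[Tanh[x], {x, 0, 3}]                     (* expansion of tanh(x) about x = 0, to order 3 *)
        Series[Log[(1 + 2 x)/(1 - 2 x)], {x, 0, 3}]    (* the function from question 75(b) *)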
76. Show that for $x \in \mathbb{R}$ the series representation $e^x = \sum_{n=0}^{\infty} x^n/n!$ of the exponential function
    has the following properties (you may assume that you can interchange differentiation and
    summation):
    $\frac{d}{dx}\, e^{ax} = a\, e^{ax}$ and $e^0 = 1$