Refresher Lecture Notes
Keith Ball
Chapter 1. Functions
Functions are some of the most important objects in mathematics. But it was only in
the 19th century that mathematicians finally settled upon a clear definition of what they
meant by functions. They chose just about the simplest possible concept.
Suppose A and B are sets. A function from A to B sends each element of A to something
in B. That's all. We don't say anything about how the function does its job: the only thing we insist on is that it sends each element of the first set to something in the second.
Thus, every function comes "equipped" with two sets: A, the set of points where the
function is defined, and B, the set of possible values of the function. If the name of the
function is f , we can draw attention to these sets by writing
f : A → B.
The set A is called the domain of f; B is called its codomain. For each element x of
the domain we write f (x) for the place to which x is sent: the image of x under f .
If we picture a function as a collection of arrows from elements of A to elements of B, the only rule is that every element of the domain is the origin of precisely one arrow: for each x in A, there is a unique image, f(x).
For example, we can define a function s : R → R by
    s(x) = x^2. (1)
What does equation (1) tell us? It says that whatever real number you pick, the image will be the square of that number. The function s maps each real number to its square: s is the function that squares things.
The last sentence illustrates the real idea: A function is what it does. If you want to say what a particular function is, you have to say what it does to each point of its domain. In the case of s:
    s(x) = x^2, for each real number x. (2)
As I mentioned, this definition can be written: “s is the function that squares things.”
When you write the definition this way, you see that the “x” disappears. We don’t need
an x when we write the definition in words. This brings out the fact that the definition
is not telling us anything about x: it is telling us about the function s. Statement (2) says that, whatever number you put into s, you will get out the square of that number.
The statement
    s(w) = w^2, for each real number w (3)
says exactly the same as (2). Each of them tells us what the function s does: and therefore,
what s is.
Now let’s look at a rather different function. This time we will take the domain and
codomain to be the finite set
{1, 2, 3, 4}
consisting of just 4 numbers. Let p be the function given by a diagram of arrows, one from each element of the domain to its image. We have not written down any "formula" for the function. But we have certainly specified a perfectly good function: we know what p does to each element of its domain.
It is important to try to rid oneself of the feeling that a function has to be given by a
“formula.” There is nothing in the definition which requires that it should. All we know
is that each point of the domain has to get sent to a point of the codomain. To illustrate
this issue let me address the following question.
Find a function f defined for all real numbers with the property that f (0) = 1, f (1) = 0
and f (2) = 2.
So we want
    f(0) = 1
    f(1) = 0
    f(2) = 2.
You might be tempted to try to find a formula for something like the function on the
right. (And clearly I did just that in order to tell the computer how to draw it.)
Composition of functions
Functions can be combined in various ways. One of the most important is composition.
If you have functions
    g : A → B
and
    f : B → C
then it is possible to build a new function which maps from A to C by first applying g
and then applying f to the result. This new function sends an element x of A, to the
element f (g(x)) of C. The new function is called the composition of f and g and is often
written f ◦ g.
So f ◦ g is our name for the function which first does g and then does f to the result.
Suppose f : R → R and g : R → R are given by
    f(t) = t^(1/3)
and
    g(x) = 1 + x^2.
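Then f ∘ g is the function x ↦ (1 + x^2)^(1/3). As an illustrative Python sketch (mine, not from the notes), composition can be written down directly, and the sketch also shows that f ∘ g and g ∘ f are different functions:

```python
def f(t):
    return t ** (1 / 3)   # cube root, for non-negative t

def g(x):
    return 1 + x ** 2

def compose(f, g):
    # The composition f o g: first apply g, then apply f to the result.
    return lambda x: f(g(x))

fg = compose(f, g)   # x -> (1 + x^2)^(1/3)
gf = compose(g, f)   # t -> 1 + t^(2/3)

print(fg(2))   # 5^(1/3) ≈ 1.7099...
print(gf(2))   # 1 + 2^(2/3) ≈ 2.5874...
```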
Algebra of functions
Given two functions f and g from R to R, we can build a new function f + g by setting
    (f + g)(x) = f(x) + g(x), for each x.
Remember that in order to say what f + g is, we have to say what it does: and indeed
we have. f + g is the function which takes each number x to the sum of the values f (x)
and g(x).
This addition of functions is already very familiar to you. You have often worked with
polynomial functions
    x ↦ 1 + 3x + 2x^2
which are built by adding multiples of powers.
The point I want to make here, which you have perhaps glossed over in the past, is that in order to define the sum of two functions we use the addition of ordinary real numbers: at each point x we form the sum
    f(x) + g(x)
of two real numbers.
In a similar way we can multiply functions. We can form a new function f.g from two old
ones by defining
(f.g)(x) = f (x)g(x).
Again you are very familiar with this process. The function x ↦ (1 + x)√(1 + x^2) is obtained by multiplying the functions x ↦ 1 + x and x ↦ √(1 + x^2).
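In a programming language where functions are values, the sum and product of functions can be written exactly as defined above. A minimal sketch (my own, for illustration):

```python
import math

def add(f, g):
    # (f + g)(x) = f(x) + g(x): we add functions by adding real numbers.
    return lambda x: f(x) + g(x)

def mul(f, g):
    # (f.g)(x) = f(x)g(x)
    return lambda x: f(x) * g(x)

h = mul(lambda x: 1 + x, lambda x: math.sqrt(1 + x ** 2))
print(h(3))   # (1 + 3) * sqrt(1 + 9) ≈ 12.6491...
```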
Chapter 2. Polynomials
Polynomials make up the simplest class of functions which are varied enough to be inter-
esting. A polynomial is a function such as
    x ↦ 1 − 3x + (4/7)x^2 + 2πx^3
or, more generally,
    x ↦ a_0 + a_1 x + a_2 x^2 + … + a_n x^n
where n is a non-negative integer and a_0, a_1, … are numbers. The highest power with a
non-zero coefficient is called the degree of the polynomial.
We often use polynomials to approximate more complicated functions: they are varied
enough to enable us to approximate pretty accurately, but simple enough for us to be
able to calculate them efficiently.
The simplest polynomials (apart from the constants) are those of degree at most 1, such as
    x ↦ 1 + 4x
    x ↦ 2/7 − x
or, in general,
    x ↦ ax + b
where a and b are numbers. These functions are called linear because if you plot the
graph y = ax + b you get a straight line.
One useful property of linear functions is that you can easily solve equations of the form
ax + b = 0
(where a ≠ 0): so you can easily find where a linear function takes the value 0. You may
have exploited this fact already in Newton’s method for approximating the solutions of
equations.
Next come the quadratic polynomials
    x ↦ ax^2 + bx + c
for numbers a, b and c. If you plot y = ax^2 + bx + c you get a parabola (as long as a ≠ 0). Again we can solve equations of the form
    ax^2 + bx + c = 0
to find x. If a ≠ 0 and
    ax^2 + bx + c = 0
then
    x = (−b ± √(b^2 − 4ac))/(2a).
However, in this case we need square roots to express the solution.
To use the formula in concrete cases, we need an efficient and reliable method for calcu-
lating square roots. (Your calculator is equipped with such a method, probably a relative
of Newton’s method.)
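As a hedged sketch of such a method (my illustration, not the notes' own): Newton's method applied to x^2 − a = 0 repeatedly replaces a guess x by the average of x and a/x, and converges very quickly.

```python
def sqrt_newton(a, tol=1e-12):
    # Approximate the square root of a > 0 by Newton's method:
    # the update for x^2 - a = 0 is x -> (x + a/x) / 2.
    x = max(a, 1.0)          # any positive starting guess works
    while abs(x * x - a) > tol:
        x = (x + a / x) / 2
    return x

print(sqrt_newton(2))        # 1.4142135623...
```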
The most basic property of polynomials relates their zeros and their factors.
Theorem (Zeros and factors of polynomials). Let p be a polynomial and suppose that p(α) = 0: that is, α is a zero of p. Then we can factorise p as a product of two polynomials,
    p(x) = (x − α)q(x),
where q is a polynomial.
For example, if p(x) = x^3 − 11x^2 + 7x + 3 then you can check that p(1) = 0 and we can write
    x^3 − 11x^2 + 7x + 3 = (x − 1)(x^2 − 10x − 3).
Once you have factorised the polynomial you can immediately see that p(1) = 0 because
the first factor is 0 when x = 1. Putting it another way, if a polynomial is zero at α,
it is zero for a very simple reason: there is a linear factor of the polynomial which is
“obviously” zero at α. This is not true in any normal sense for other functions. sin x is
zero at x = π but sin x is not “divisible” by x − π in any algebraic sense.
How do we demonstrate that each zero corresponds to a linear factor? The argument we
need is very close to what we actually do to find the factor. Suppose we want to factor
the polynomial
    x^3 − 11x^2 + 7x + 3 = (x − 1)q(x).
How do we do it? We have in our minds, or on a piece of scrap paper, a tentative product
    x^3 − 11x^2 + 7x + 3 = (x − 1)(?x^2 + ?x + ?).
What do we need to start off with in the second factor if we are going to get x^3 in the product? Clearly we need an x^2:
    x^3 − 11x^2 + 7x + 3 = (x − 1)(x^2 + ?x + ?).
Now what? Our product now looks like (x − 1)x^2 = x^3 − x^2 so we need to get an extra −10x^2 somehow, in order to end up with −11x^2 altogether. We can get this −10x^2 by putting an extra −10x into q:
    x^3 − 11x^2 + 7x + 3 = (x − 1)(x^2 − 10x + ?).
Now we have a product (x − 1)(x^2 − 10x) = x^3 − 11x^2 + 10x, so we need a further −3x to get 7x. So we put −3 into q:
    x^3 − 11x^2 + 7x + 3 = (x − 1)(x^2 − 10x − 3).
Now we have no more question marks left to fill, so we just have to hope that the last
term of p, the 3, automatically works out right. Sure enough it does. Is this a miracle
or can we see why?
If we were to start with any polynomial p(x) and divide by x − 1, we could continue to
choose terms in q(x) until we had managed to get all the powers of x we wanted except
the last one: the constant term. So we would have
    p(x) = (x − 1)q(x) + r (4)
where q is a polynomial and r is a number. Now suppose that p(x) = 0 when x = 1; i.e.
p(1) = 0. Then, if we put x = 1 into equation (4) we get
0 = 0.q(1) + r
and hence r = 0.
In other words the constant term works out automatically. Now if we put r = 0 back into
(4) we get
p(x) = (x − 1).q(x)
which is the factorisation that we wanted. The division process just described is a special
case of a more general fact.
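The division process is easy to mechanise. Here is a small Python sketch (my own illustration) which divides a polynomial by x − α using exactly the coefficient-matching described above (often called synthetic division); the number left over at the end is the remainder r, and it equals p(α):

```python
def divide_by_linear(coeffs, alpha):
    # Divide p(x) by (x - alpha), where coeffs lists p's coefficients
    # from the highest power down.
    # Returns (coefficients of q, remainder r) with p(x) = (x - alpha) q(x) + r.
    q = []
    carry = 0
    for c in coeffs:
        carry = carry * alpha + c   # match one power of x at a time
        q.append(carry)
    return q[:-1], q[-1]

# p(x) = x^3 - 11x^2 + 7x + 3 divided by (x - 1):
q, r = divide_by_linear([1, -11, 7, 3], 1)
print(q, r)   # [1, -10, -3] 0, i.e. q(x) = x^2 - 10x - 3 and remainder 0
```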
Consequences of factorisation
Once you have factored a polynomial (if you can), you have a pretty good idea of how
the polynomial behaves. For example, if p(x) = x(x − 1)(x − 2) then it’s pretty easy to
see that the graph of y = p(x) looks something like this: it crosses the x-axis at 0, 1 and 2, is positive between 0 and 1, negative between 1 and 2, and heads off to −∞ on the far left and +∞ on the far right.
One of the most important consequences of factorisation is a bit more theoretical. If you
multiply together some linear factors, let's say five of them,
    (x − a_1)(x − a_2)(x − a_3)(x − a_4)(x − a_5),
you get a polynomial of degree 5. So we can immediately see that a polynomial of degree 4 cannot have 5 different zeros since this would make it a product of 5 or more factors. In general: a polynomial of degree n (other than the zero polynomial) has at most n different zeros.
This principle can be reinterpreted. Suppose that f and g are two polynomials of degree
at most 5 and suppose we know of 6 different places where f and g take the same value:
they agree at 6 different places. In other words I can find x_0, x_1, …, x_5 with
    f(x_i) = g(x_i) for each i.
Then the polynomial f − g also has degree at most 5 but is zero at 6 different places:
    (f − g)(x_0) = f(x_0) − g(x_0) = 0, and so on.
So this polynomial must be “identically zero”: it must be the zero polynomial. This in
turn means that f and g must be the same as one another.
It wasn’t too hard to show that polynomials can’t have too many zeros: we just observed
that they obviously can’t have too many factors. The question of whether they have any
zeros is much more difficult. For a start, in order to guarantee that you can factor
polynomials, you need to introduce complex numbers. This was not really done until the
18th century. The first arguments which clearly showed that you can always factorise
polynomials, did not appear until the 19th century. This stunning fact is called the Fundamental Theorem of Algebra. We shall discuss this remarkable fact later on.
Chapter 3. Summation
It often happens in mathematics that we need to refer to the sum of a large number of
terms or perhaps of an indeterminate number of terms
    1 + 2 + 3 + … + n.
In these situations it is convenient to use the ∑ notation. Thus, the expression above could be written as
    ∑_{i=1}^{n} i.
More generally, we write
    ∑_{i=1}^{n} x_i (5)
to mean
    x_1 + x_2 + x_3 + … + x_n. (6)
Before moving on to calculations, I want to draw your attention to the role of the letter i
in each of these expressions. Whereas i appears in the expression (5) it does not appear
in the expanded version (6). The letter i is merely a “dummy variable” which is being
used to give us an instruction: "Add up the numbers x_1, x_2, …, x_n." This remark makes
it clear that
    ∑_{i=1}^{13} x_i
and
    ∑_{k=1}^{13} x_k
are the same: they are different ways to write the same expression
    x_1 + x_2 + x_3 + … + x_13.
It is important to bear in mind the meaning of expressions such as these when using
them.
In addition to being a shorthand, sigma notation helps to clarify sums when the terms
are complicated. For example,
    1/6 + 1/24 + 1/60 + 1/120 + 1/210
means the same as
    ∑_{n=1}^{5} 1/(n(n + 1)(n + 2)),
but it is much easier to see what’s going on in the second expression.
Geometric series
The most important example of summation is one that you have met. Suppose r is a
number and we consider the sequence
    (1, r, r^2, r^3, …)
in which each number is r times the previous one. Such sequences occur naturally in the
calculation of interest repayments and the study of radioactive decay for example. Can
we find a simple expression for the sum of the terms of such a sequence
    ∑_{k=0}^{n} r^k ?
As we increase n, these sums look more and more complicated and it is less and less easy
to see how they behave:
    1
    1 + r
    1 + r + r^2
    1 + r + r^2 + r^3
    ⋮
However there is a simple trick which enables us to rewrite these sums. If you multiply
one of these expressions by 1 − r, almost everything cancels: for example,
    (1 − r)(1 + r + r^2 + r^3) = 1 + r + r^2 + r^3 − r − r^2 − r^3 − r^4
                               = 1 − r^4.
Hence, provided r ≠ 1,
    1 + r + r^2 + r^3 = (1 − r^4)/(1 − r),
which has the advantage that it does not become more complicated as n increases.
In the same way, for every n (and every r ≠ 1),
    1 + r + r^2 + … + r^n = (1 − r^(n+1))/(1 − r).
Notice that we could interpret this theorem as a statement about the factorisation of a
polynomial:
    1 − x^(n+1) = (1 − x)(1 + x + x^2 + … + x^n).
Thus, the formula for the sum of a geometric progression is just a statement about fac-
torising a special family of polynomials.
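A quick numerical check of the formula, as an illustrative Python sketch:

```python
r, n = 0.9, 20

partial_sum = sum(r ** k for k in range(n + 1))    # 1 + r + ... + r^n
formula = (1 - r ** (n + 1)) / (1 - r)

print(partial_sum, formula)   # both ≈ 8.9058...
```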
Infinite series
Power series are supposed to be like polynomials, but with infinitely many terms. It comes as no surprise that we have to be rather careful when we talk about infinite sums: they aren't quite as simple as finite sums. Consider, for example, the sum 1 + 1/2 + 1/4 + 1/8 + ⋯. Its partial sums are
    1, 1 1/2, 1 3/4, 1 7/8, …
and it's pretty clear that this sequence of numbers is approaching 2. So it seems reasonable to say that the infinite sum is equal to 2:
    1 + 1/2 + 1/4 + 1/8 + … = 2. (7)
On the other hand, consider the sum
    1 + 2 + 4 + 8 + 16 + … .
If we keep adding more terms, the result just shoots off out of sight, so we have no way
to make sense of this infinite sum.
Experience has shown us that the most convenient way to make sense of an infinite sum
corresponds to this idea of watching what happens as we add successive terms. If we have
a sequence of numbers
    a_1, a_2, a_3, a_4, a_5, …
and if the sequence of partial sums
    a_1
    a_1 + a_2
    a_1 + a_2 + a_3
    a_1 + a_2 + a_3 + a_4
    ⋮
approaches a fixed number A, then we say that the series ∑_{k=1}^{∞} a_k converges and
    ∑_{k=1}^{∞} a_k = A.
Equation (7) is a special case of a more general statement about geometric series. Suppose that
−1 < r < 1. Recall that
    1 + r + r^2 + … + r^n = (1 − r^(n+1))/(1 − r).
Thus our formula for the sum of a geometric progression tells us the size of the partial
sums of the infinite series
    1 + r + r^2 + … .
As n gets larger, the right-hand side approaches 1/(1 − r), because r^(n+1) → 0. (This depends upon the fact that −1 < r < 1.) Hence
    1 + r + r^2 + … = 1/(1 − r). (8)
Equation (8) tells us how to sum infinite geometric series. These are some of the most
important infinite series in mathematics. They crop up all over the place.
The Binomial Theorem concerns the expansions of the powers of a sum of two numbers
(hence “binomial”). The powers 2 and 3 give rise to the familiar expansions
    (x + y)^2 = x^2 + 2xy + y^2
and
    (x + y)^3 = x^3 + 3x^2 y + 3xy^2 + y^3.
In general we can write
    (x + y)^n = ?x^n + ?x^(n−1)y + ?x^(n−2)y^2 + … + ?y^n,
where the question-marks denote coefficients: the so-called binomial coefficients, which appear in what is known as Pascal's Triangle¹:
          1
         1 1
        1 2 1
       1 3 3 1
      1 4 6 4 1
    1 5 10 10 5 1
         ⋮
The coefficients in the expansion of (x+y)n appear in the nth row of the triangle, provided
we count the single 1 at the top, as the zeroth row. Each row of the triangle is obtained
from the previous row in a simple way: to obtain a given entry you add together the two
entries above it.
¹ The triangle was known hundreds of years before Pascal in India, China, Persia and probably elsewhere.
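The "add the two entries above" rule translates directly into a few lines of code. A sketch (mine, for illustration):

```python
def pascal_rows(n):
    # Build rows 0..n of Pascal's triangle, each row from the previous one.
    rows = [[1]]
    for _ in range(n):
        prev = rows[-1]
        # Each interior entry is the sum of the two entries above it.
        rows.append([1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1])
    return rows

for row in pascal_rows(5):
    print(row)
# [1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1], [1, 5, 10, 10, 5, 1]
```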
To see why Pascal’s triangle appears let’s try to find the coefficients in the expansion of
(x + y)4 from those before. We can write (x + y)4 in terms of the previous expansion in
an obvious way.
    (x + y)^4 = (x + y)(x + y)^3.
So, assuming we know the third row of the triangle, we get
    (x + y)^4 = (x + y)(x^3 + 3x^2 y + 3xy^2 + y^3).
If we multiply the second factor by x and by y in turn, we get two pieces which contribute
to the total:
    x^4 + 3x^3 y + 3x^2 y^2 + xy^3
and
    x^3 y + 3x^2 y^2 + 3xy^3 + y^4.
Each of these pieces has coefficients 1,3,3,1 like the 3rd row, but the coefficients are
attached to different combinations of x and y. When we combine the two pieces, we
therefore add shifted copies of the 3rd row to get
    (x + y)^4 = x^4 + 4x^3 y + 6x^2 y^2 + 4xy^3 + y^4.
The coefficient of x^(n−k) y^k in the expansion of (x + y)^n is written C(n, k) and pronounced "en choose kay", for reasons described later.
So far we have examined Pascal’s Triangle and explained why it is built up in the way
that it is, but we didn’t really do anything more than reinterpret multiplication by x + y
in terms of addition. If we are to use binomial coefficients, it would be nice to have a
simple formula which will enable us to calculate them without going through the whole
business of building Pascal’s Triangle.
How could we find a coefficient in the 20th row, for example, without finding 20 rows of the triangle? The answer is provided by the Binomial Theorem, which tells us that the coefficients can be written in terms of factorials:
    C(n, k) = n!/(k! (n − k)!).
The simplest way to prove the Binomial Theorem is to check algebraically that the ex-
pressions involving factorials do indeed satisfy the Pascal triangle property:
    C(n, k) = C(n − 1, k − 1) + C(n − 1, k).
Once you know that the factorial expressions do satisfy this equation you can deduce that
they are the binomial coefficients by induction: assuming that they give the correct values
in the nth row we can conclude that they also give the correct values in the (n + 1)st (and
so we only need to check the first row in order to get the induction started).
This inductive argument gives a perfectly good proof of the Binomial Theorem but it
isn’t very illuminating. It just seems to work by magic without really explaining why the
coefficients have the factorial formula or how someone came up with the formula in the
first place. There is a much more instructive way to find the formulae.
If you were to multiply out the three brackets in
    (x + y)^3 = (x + y)(x + y)(x + y)
all at once (instead of squaring x + y and then multiplying again) you would write down each possible product made up of one factor from each bracket:
    xxx, xxy, xyx, yxx, xyy, yxy, yyx, yyy.
This procedure makes it clear that the coefficient of xy^2 is equal to 3 because there are
three different ways of getting a product of one x and two y’s.
The binomial coefficient C(3, 2) is thus the number of ways of selecting two factors from among the three: namely the two factors which contribute y to the product. It is auto-
matically equal to the number of ways of choosing one factor from among the three (the
factor which contributes the x).
In the same way, C(n, k) is the number of different choices of k objects from a given n objects. This is why we call it "n choose k". For each k we have that
    C(n, k) = C(n, n − k)
simply because, instead of focusing upon the k we choose, we could instead focus upon
the n − k that we don’t.
Now we’re ready to derive the factorial formula. Let’s do a particular example, with
n = 7 and k = 4. We want to calculate how many different 4-somes we can make out of
7 objects.
Suppose that the objects are numbered from 1 to 7. Imagine first that we write down all
possible orderings of the 7 objects,
    1234567
    3146752
    ⋮
There are 7! = 5040 of them. Now from each ordering, select the first 4 objects. So from
the second ordering above we would select the foursome {1, 3, 4, 6}.
How many times will each 4-some get selected? The 4-some {1, 3, 4, 6} will be selected each
time that our ordering has these four numbers distributed among the first four positions
and the numbers 2, 5 and 7 distributed among the last three positions.
There are 4! × 3! ways of doing this, since the numbers 1, 3, 4 and 6 can be ordered in
4! ways and the other three in 3! ways. So from our 7! orderings, each foursome will get
selected 4! × 3! times. This means that the number of different foursomes is
    7!/(4! 3!) = 35,
which is what we wanted to check.
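The whole counting argument can be checked by brute force; a sketch (illustrative Python):

```python
from itertools import permutations
from math import factorial

# Select the first 4 objects from every ordering of {1, ..., 7}.
foursomes = {frozenset(p[:4]) for p in permutations(range(1, 8))}

print(len(foursomes))                                  # 35 distinct 4-somes
print(factorial(7) // (factorial(4) * factorial(3)))   # 7!/(4! 3!) = 35
```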
There are many ways to state the Binomial Theorem. The important thing to understand
is that the theorem links two apparently different questions and provides two different
interpretations of the binomial coefficients.
Consider the following simple problem. You have at your disposal two commercially
available mixtures of nitrate and phosphate fertilisers: call the mixtures X and Y. 1kg of
each of these mixtures contains the following amounts of each fertiliser.
(table: the amounts, in grams, of nitrate N and phosphate P contained in 1kg of each of X and Y)
You wish to make up a bag containing 120g of N and 150g of P. Can you do it, and if so
how much of each mixture do you need?
To solve the problem you let x be the number of kilos of X and y be the number of kilos
of Y. Then you want to arrange that the total amount of N in the bag is 120g and the total amount of P is 150g. These are simultaneous linear equations which are easily solved to yield
x = 0.3
y = 0.6
so that you need 300g of X and 600g of Y (and you can indeed achieve your aim).
Let us think for a moment why this problem gave rise to linear equations (which we have
no difficulty solving). Why is it that x and y appear only in simple linear combinations?
There are two closely related points involved:
• The total amount of nitrate is just the sum of the amount of nitrate coming from X and that coming from Y.
• The amount of nitrate coming from a quantity of X is just proportional to that quantity (and likewise for Y).
When you lump together some of X and some of Y you simply add the amounts of N
contributed by each (and similarly the amounts of P).
This "additivity of lumping together" principle holds in a wide variety of situations. For
example, if you put together two lumps of a certain radioactive substance, then the number
of atoms which decay in any given period is just the sum of the numbers from the two
lumps. As you may know, radioactive decay is governed by a differential equation rather
than by algebraic equations like those above. Nevertheless, we still refer to this differential
equation as a linear differential equation because it exhibits the same additivity principle
as the linear equations above.
Linear equations are the most useful equations in mathematics, for three reasons:
• They turn up naturally in many situations.
• Even when the true equations are non-linear, we can often approximate them by
linear ones.
• Linear equations are usually much easier to solve than nonlinear ones.
(The difficulty of solving non-linear equations is well-illustrated by the fact that we cannot
solve, precisely, the equations governing the motion of three heavy objects under gravity.)
Matrices
We have developed a useful shorthand for writing systems of linear equations using vectors
and matrices. The system
2x − y = 5
x + 2y = 5.
is written
    ( 2  −1 ) ( x )   ( 5 )
    ( 1   2 ) ( y ) = ( 5 ).
Our choice about how to multiply vectors by matrices is made deliberately so as to
correspond to the way in which linear equations are built from their coefficients. We
define
    ( 2  −1 ) ( x )   ( 2x − y )
    ( 1   2 ) ( y ) = ( x + 2y ).
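In code (a sketch using numpy, not part of the notes), the matrix form of the system and its solution look like this:

```python
import numpy as np

A = np.array([[2, -1],
              [1,  2]])
b = np.array([5, 5])

print(np.linalg.solve(A, b))   # [3. 1.]: the solution is x = 3, y = 1
print(A @ np.array([3, 1]))    # [5 5]: multiplying back recovers the right-hand side
```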
However, once we have decided how to multiply a vector by a matrix, we can study this
operation in a slightly different way. Instead of trying to solve some particular set of
equations we can think of multiplication by the matrix
    ( 2  −1 )
    ( 1   2 )
as giving rise to a transformation of the x, y-plane. The transformation is
    ( x )   ( 2x − y )
    ( y ) ↦ ( x + 2y ).
A diagram of the image of the unit square shows that this map is a rotation together with an enlargement.
In the same way, every 2 × 2 matrix gives rise to a transformation of the plane. The
matrix
    ( a  b )
    ( c  d )
takes the point (x, y) to the point (ax + by, cx + dy).
Since every matrix gives rise to a map, the obvious question is “Which maps arise from
matrices?” Not all of them do: only those with certain special properties. Suppose I
am thinking of a matrix and I tell you what it does to the points (1, 0) and (0, 1). For
example,
(1, 0) 7→ (3, 7)
(0, 1) 7→ (4, 5)
Suppose the matrix is
    ( a  b )
    ( c  d ).
We can see that such a matrix takes the point (1, 0) to the point (a, c). So it must be that a = 3 and c = 7. Similarly, b = 4 and d = 5. So the matrix is
    ( 3  4 )
    ( 7  5 ).
This tells us that any map that is given by a matrix is a very special map: once you know
what it does to the points (1, 0) and (0, 1), you know what it does to everything. The
reason for this is an additivity property for matrix multiplication. If M is a matrix and
u and v are vectors then
M.(u + v) = M.u + M.v.
Multiplication by M preserves addition of vectors: if you add the vectors you add their
images. Similarly, if you double a vector you double its image or if you multiply a vector
by the number t, you multiply its image by t.
Every map of the plane that is given by a matrix multiplication has these properties. If M is a matrix, u and v are vectors and λ is a number then:
    M.(u + v) = M.u + M.v
    M.(λu) = λM.u.
A map with this property is called a linear map. All maps of the plane given by matrix
multiplication are linear maps.
Have we answered our earlier question: “Which maps arise from matrices?” We can see
that all matrix maps are linear: is it true that all linear maps are given by matrices? Yes
indeed: and we have more or less demonstrated this, already. (See if you can give an
explanation of this fact: that any map which is linear is given by a matrix.)
Now we know that the linear maps of the plane to itself are exactly those maps which are
given by matrices. This should prompt us to ask the following question. If linear maps
are just the same as matrix maps, why have we invented a fancy new name for them:
“linear”? The reason is that linear maps turn up in many other situations (not just maps
of the plane), where matrices do not seem to be remotely relevant.
Definition (Linear maps). A map M is linear if whenever u and v are vectors and λ
is a number,
M.(u + v) = M.u + M.v
M.(λu) = λM.u.
There are at least two operations that you have met, other than matrix multiplication,
which possess properties like these: differentiation and integration. For example, when
you differentiate the sum of two functions, you can do it by differentiating each of them
and then adding the results. Differentiation and integration are linear maps: except that
these maps act, not on vectors or points, but on functions.
In the last chapter I talked about transformations of the plane which are given by matrix
multiplication. I glossed over the question of why these transformations of the plane are
interesting or useful. We shall see that among other things, rotations about the origin are
linear maps. Rotations are certainly very important: we need to understand rotations in
order to be able to relate observations made by one person to those made by someone
else, who is facing in a different direction.
For the moment, I want to take for granted that linear maps are useful and to talk a
bit more about some of their properties. One of the first things we always do when we
have thought of some kind of mathematical operation is to see what happens if we do
one after another (if we can). Suppose we have a pair of 2 × 2 matrices M and N . If
we transform the plane using M and then transform using N we will end up with some
overall transformation, and this overall transformation is itself given by a matrix. How is this matrix related to M and N? As you will have guessed, this new matrix is the product N.M. For example,
    ( 3  2 ) ( 2  1 )   ( 12  11 )
    ( 1  5 ) ( 3  4 ) = ( 17  21 ).
(Note that N is written before M in the product, even though the map it corresponds to
consists of “first do M and then do N ”. This inversion of the order results from the fact
that we write maps on the left of the vectors to which they apply.) In general, whenever
we apply a matrix map M followed by a matrix map N we get a matrix map given by
the matrix product N.M .
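A numerical sanity check (my own sketch): applying M and then N to a vector gives the same answer as applying the single matrix N.M.

```python
import numpy as np

M = np.array([[2, 1], [3, 4]])
N = np.array([[3, 2], [1, 5]])
v = np.array([1.0, -2.0])

print(N @ (M @ v))    # first do M, then do N
print((N @ M) @ v)    # the same: the composition is given by the product N.M
print(N @ M)          # [[12 11] [17 21]], as in the text
```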
You might like to check this for yourself by demonstrating it for an arbitrary pair of
matrices. Owing to the fact that you have met matrix multiplication before, I chose simply
to tell you that composition of maps corresponds to matrix multiplication. But this tends
to hide the crucial point: the reason that we multiply matrices the way we do (and not
some other way) is that we want to talk about combining transformations, one after
another. Funny rules for combining bracketed arrangements of numbers are not our real
aim. Our real aim is to describe what happens when we combine matrix transformations
of the plane (or higher-dimensional space). Matrices and matrix multiplication are the
tools that do the job.
When you first came across matrix multiplication you may have been a bit distressed by
the fact that it is not what we call “commutative”; the order of the matrices makes a
difference. The two products below do not produce the same result:
    ( 0  −1 ) ( 3  0 )        ( 3  0 ) ( 0  −1 )
    ( 1   0 ) ( 0  1 )  and  ( 0  1 ) ( 1   0 ).
Think about what happens to a square if you first stretch in the x direction and then
rotate through 90°; or if you first rotate and then stretch in the x direction.
Matrix inverses
Once we know how to combine linear maps, we can ask how to invert (or undo) them. If
you give me a linear transformation of the plane, can I find a linear transformation which
returns every point to its original position? Let’s try it for the matrix
    M = ( 2  1 )
        ( 3  4 ).
Writing down the equations which say that an unknown matrix multiplies M to give the identity I, and solving them, we arrive at the matrix
    (1/5) (  4  −1 )
          ( −3   2 ).
This matrix is called the inverse (or left inverse) of M and we usually write it M⁻¹. Notice that once you have found the inverse, you can check that M⁻¹.M = I very easily; much more easily than you could find M⁻¹.
Theorem (Left and right inverses). If two square matrices multiply to give the identity
in one order, then they automatically do the same, in the other order.
For 2 × 2 matrices you can check this by hand. For general n × n matrices some theory
is needed to handle the same question.
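As a quick check of the 2 × 2 case (an illustrative sketch), the inverse found above multiplies M to the identity in both orders:

```python
import numpy as np

M = np.array([[2.0, 1.0], [3.0, 4.0]])
M_inv = np.array([[4.0, -1.0], [-3.0, 2.0]]) / 5

print(M_inv @ M)   # the identity matrix
print(M @ M_inv)   # the identity again: left inverse = right inverse
```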
The Babylonians chose to measure angle in degrees, but this is a very arbitrary measure,
which is unsuitable for most mathematical purposes. The most natural way to measure
angle is in radians. Let's recall what that means.
Often when one first meets radians one feels that they have a rather mysterious quality.
They don’t. We just use the length of the circular arc, as a way to measure the angle.
Nothing could be simpler.
It is immediately clear that a full turn is angle 2π, a half turn is π and so on. Naturally,
if we have a circle whose radius is different from 1, we have to calculate angle by taking
the ratio of the length of the arc, to the radius.
Let us start by noticing that radians have one property, (in common with degrees), which
is vital.
When you rotate through one angle θ and then through another angle ϕ, the total angle you get, θ + ϕ, is the sum of the original ones. This is clear because the same statement holds for lengths: lengths add up in the obvious way.
Later we shall discuss the crucial property of radians which is not possessed by degrees (nor by any measure of angle other than radians).
Mark a point on the circle at an angle θ, measured from the horizontal. The x and y coordinates of this point are the numbers cos θ and sin θ: the point is (cos θ, sin θ).
Again, we can express each of these numbers as a ratio of lengths, when we look at a
circle whose radius is different from 1.
The geometric definition of cos and sin makes the following standard properties obvious.
For any θ,
    cos²θ + sin²θ = 1.
(This is Pythagoras’ Theorem.) Both functions cos and sin repeat themselves after an
angle 2π: they are periodic with period 2π. cos is an even function while sin is an odd function, and for any θ,
    sin θ = cos(π/2 − θ).
The graphs of cos and sin are the familiar wave-shaped pictures, each oscillating between −1 and 1.
There is no simple way to calculate cos and sin for a general angle. In order to approximate
them, we need to use the techniques of calculus, just as we do for the exponential and
logarithmic functions. However, for certain special angles, we can find the values of cos
and sin using simple geometric arguments. For integral multiples of π2 we can easily see
that both functions are ±1 or 0 and can easily check which. The cosines and sines of π3
and π4 are also not too hard to calculate. In the examples, you are asked to find cos π8 ,
π
cos 12 and cos 2π
5
. You could continue to think up new angles for which you can obtain
exact expressions, but this is not an efficient way to calculate for a general angle. We
shall discuss a more systematic approach in later chapters. I want to devote the rest of
this chapter to the addition formulae for cosine and sine.
In an earlier chapter I mentioned that rotations about the origin are linear maps: they
are given by matrices. Let’s quickly convince ourselves of that. What we want to know
is that if u and v are vectors and R is a rotation, then R(u + v) = R(u) + R(v). The
question is, do we get the same thing if we rotate the sum u + v, as if we add together the rotated vectors R(u) and R(v)? The picture tells the story:
(picture: the parallelogram spanned by u and v, with diagonal u + v, rotates as a whole, so R(u + v) is the diagonal of the parallelogram spanned by R(u) and R(v))
So now we know that rotations about the origin are linear maps. What are their matrices?
Let's find the matrix that gives the rotation through an angle θ anticlockwise. Suppose
it is the matrix
    ( a  b )
    ( c  d ).
We want to find the numbers a, b, c and d. As we did in the last chapter, we can look at what the matrix does to a special choice of vectors:
    ( 1 )       ( 0 )
    ( 0 )  and  ( 1 ).
The rotation takes these two points to (cos θ, sin θ) and (−sin θ, cos θ) respectively, and these images are precisely the columns of the matrix. So the matrix of the rotation through θ is
    ( cos θ  −sin θ )
    ( sin θ   cos θ ).
These matrices (for different θ) are used in many ways throughout physics and engineer-
ing: whenever you programme a computer to handle rotations you need to give it these
matrices. We don’t have any other quantitative description of a rotation.
The first use to which we will put these matrices is to discover the addition formulae for
cos and sin. You have met these formulae and begun to realise that they turn up a lot.
The reason is that addition of angle is a natural thing to do: as we saw, it corresponds
to following one rotation by another. Rotation through φ, followed by rotation through θ
produces rotation through θ + φ.
The matrices for rotations through θ and φ are
cos θ − sin θ cos φ − sin φ
and .
sin θ cos θ sin φ cos φ
But we saw yesterday that the matrix for the composition, one map followed by another,
is obtained by multiplying the matrices of the separate maps. So the last matrix is equal
to the product
    ( cos θ  −sin θ ) ( cos φ  −sin φ )
    ( sin θ   cos θ ) ( sin φ   cos φ )
which is
    ( cos θ cos φ − sin θ sin φ    −cos θ sin φ − sin θ cos φ )
    ( sin θ cos φ + cos θ sin φ    −sin θ sin φ + cos θ cos φ ).
By equating the two expressions for the "θ + φ" matrix we can immediately read off the standard addition formulae:
    cos(θ + φ) = cos θ cos φ − sin θ sin φ
    sin(θ + φ) = sin θ cos φ + cos θ sin φ.
If you haven’t ever tried it, you might find it instructive to derive the addition formulae
using “old-fashioned” constructive geometry: it isn’t too hard: but it isn’t too pleasant
either.
Once we have the addition formulae for cos and sin we can derive the formula for tan: try
it. Naturally we can also derive the double angle formulae: for example
cos 2θ = cos(θ + θ)
    = cos θ cos θ − sin θ sin θ
    = cos²θ − sin²θ
    = 2cos²θ − 1.
In a few chapters’ time, we will look at the addition formulae in the context of complex
numbers and see an alternative way to understand (and hence remember) them.
Chapter 8. Differentiation.
In the middle of the seventeenth century, with an extraordinary burst of activity, Isaac
Newton not only explained the motion of the planets and the behaviour of falling bodies on
the basis of a single principle of gravitation, but also created the mathematical tool known
as calculus. By relating the two operations that we now call differentiation and integration,
and by inventing a systematic method for carrying out the former, Newton practically
restarted mathematics. Ever since Newton, mathematical knowledge has grown at a rate
incomparably greater than it did at any time before him. In this and the next two chapters
I want to recall the major ideas involved in differentiation; in finding derivatives. The
aim is to study rates of change of numerical quantities (relative to one another).
As you know the earth is round (more or less). However, if you look out of the window, it
looks flat. The more closely you look at a curve, such as the curve of the earth’s surface,
the flatter it looks.
Let’s look at a mathematical curve which is easier to describe than the earth’s surface; for
example the curve y = s(x) = x^2. I have drawn three graphs of this function for different ranges of the x-coordinate, near to the point (1, 1) on the curve. The ranges are
    0 < x < 2,
    0.6 < x < 1.4 and
    0.9 < x < 1.1.
The graphs for the shorter ranges have been blown up to fill the same width of picture.
What you notice is that smaller pieces of the curve look flatter even when you blow
them up to the same size. Contrast this with what happens when you focus onto a
sharp corner.
In this case the corners look exactly the same as you blow them up: there is no flattening.
However in the case of the smooth curve, as you focus on smaller and smaller pieces, your
graph looks more and more like a straight line, even when you scale up the picture.
Not only does the curve look more and more straight as you focus on smaller pieces: there
is a particular straight line that the curve is copying: the tangent line. This straight line
just brushes the curve at a particular point.
How can we find which line is the tangent? One thing we know about the tangent is
obvious: it passes through the point. The problem is to find its slope. The tangent line
to a graph, at a point, is supposed to provide a good description of how the function is
behaving near that point. Can we understand algebraically how the function x 7→ x2 is
behaving near x = 1?
If x is close to 1, then I can write x as 1 + h for some small number h. The value of the
function at x is thus
    x^2 = (1 + h)^2 = 1 + 2h + h^2.
The first thing you notice about this expression is that if h is small, then the function is
fairly close to 1. If x is close to 1 then x2 is close to 1. That hardly comes as a surprise.
Much more important, however, is that if h is a small number, then h^2 is extremely small. For example, if h = 0.01 then h^2 = 0.0001 which is much smaller. So, ignoring the tiny error h^2, when x moves from 1 to 1 + h the value of x^2 moves from 1 to about 1 + 2h: the change in the function is about 2 times the change in x. This ratio, 2, is the slope of the tangent line that we wanted to find. As you know we call it the derivative of the function at x = 1.
Let’s try to find the derivative of the function x 7→ x2 at other places. Let’s try an
arbitrary value x = c. How does the function behave near c? Put x = c + h and evaluate
the function at x:
    x^2 = (c + h)^2 = c^2 + 2ch + h^2.
Near c we can see that the function is well approximated by the linear function c + h ↦ c^2 + 2ch. As we increase x from c to c + h, the value of x^2 increases by about 2c times as much; from c^2 to c^2 + 2ch. So the derivative of the function at c is 2c.
Let's do another example. Suppose q(x) = x^3 and we wish to calculate the derivative of q at an arbitrary number x. We choose a nearby number x + h and evaluate q there:
    q(x + h) = (x + h)^3 = x^3 + 3x^2 h + 3xh^2 + h^3.
If h is very small, the last expression differs from q(x) = x^3 by about 3x^2 h. So the slope of the curve at the point (x, x^3) is 3x^2.
Can we give a more systematic description of what we just did? We evaluated q at x and
x + h and looked at the difference between them:
    q(x + h) − q(x) = 3x^2 h + 3xh^2 + h^3.
Then we picked out "the coefficient of h" in this difference; namely 3x^2. We can do the "picking out" algebraically: divide the difference by h to get 3x^2 + 3xh + h^2 and now see what happens to this as h approaches zero. Clearly as h approaches 0, the expression 3x^2 + 3xh + h^2 approaches 3x^2.
So the procedure amounts to forming the ratio
    (q(x + h) − q(x))/h
and then asking what happens to it, as h approaches zero. We can try to repeat this procedure for any function. Let's call it f. We evaluate the function at x and at x + h. We take the difference, and divide by h:
    (f(x + h) − f(x))/h.
Now we ask, does this ratio approach a limiting value, as h approaches 0?
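The recipe is easy to try numerically. A small sketch (mine):

```python
def difference_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

# For x -> x^3 at x = 2 the ratios should approach 3 * 2^2 = 12.
for h in (0.1, 0.01, 0.001, 0.0001):
    print(h, difference_quotient(lambda x: x ** 3, 2, h))
```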
Let's try this procedure on the function
    r(x) = 1/x.
The value at x + h is
    1/(x + h).
Now it isn’t so easy to see what to do with this. We can’t “expand” this reciprocal in
the same way as a square or a cube. We have to use our more formal description of the
procedure. We write down the ratio
    (r(x + h) − r(x))/h = (1/(x + h) − 1/x)/h = (1/h)·(1/(x + h) − 1/x).
In order to see what happens to this expression as h approaches 0, our only hope is to
simplify it, by combining the fractions in the bracket. We get
    (1/h) · (x − (x + h))/((x + h)x) = (1/h) · (−h)/((x + h)x)
                                     = −1/((x + h)x).
Thus
    (r(x + h) − r(x))/h = −1/((x + h)x).
Now we are home. As h approaches 0, x + h approaches x and so this expression approaches
    −1/x^2
(at least provided x ≠ 0). So, the derivative of the function x ↦ 1/x at x is −1/x^2.
In general then: if the ratio
    (f(x + h) − f(x))/h
approaches some limiting value as h approaches 0, we say that f has derivative at x equal to this value. We denote it
    f′(x).
Near x, the function f is approximated by a linear function:
    f(x + h) ≈ f(x) + f′(x)·h.
We now understand what we mean by the derivative and we know how to calculate
derivatives for some special functions. We could continue, adding to our repertoire of
differentiable functions: but we would go nuts. Instead, we need a machine to do the
work for us. The point is that the functions we use in mathematics are built up in
fairly simple ways from a few basic functions. The machine tells us how to differentiate
complicated functions, once we already know how to differentiate the pieces out of which
they are built.
Among these basic functions are:
• exponentials
• trigonometric functions.
The sum rule expresses the derivative of a sum of functions in terms of the derivatives
of those functions. The product rule tells us how to differentiate a product of functions
as long as we already know how to differentiate the factors. If f and g are differentiable
functions then
    (f + g)′ = f′ + g′
and
    (fg)′ = f′g + fg′.
Once we know the derivatives of the two very simple functions x 7→ 1 and x 7→ x we can
use the product rule to find the derivatives of all the monomial functions
    x ↦ x^2
    x ↦ x^3
    ⋮
In other words we can find the derivatives of all functions x ↦ x^n where n is a positive
integer, since each of these is built up from the function x 7→ x by multiplication. Using
the sum rule as well, we can now find the derivatives of all polynomials.
The third part of the differentiation machine is the chain rule: it tells us how to express
the derivative of a composition of functions in terms of the derivatives of those functions.
Suppose f and g are functions and we form the composition
    x ↦ f(g(x)).
(We usually write this function as f ∘ g.) Choose a number x where we want to calculate the rate of change of f ∘ g. The value of the function at x depends upon the value of g at x and the value of f at g(x). Similarly, the values of f ∘ g near x depend upon the values of g near x and the values of f near g(x). So it is not surprising that the rate of change of the composition depends upon the derivative g′(x) of g at x, and also the derivative f′(g(x)) of f at the point g(x). The chain rule states that
    (f ∘ g)′(x) = f′(g(x))·g′(x).
Let's have an example. Let g and f be given by g(x) = 1 + x^2 and f(t) = 1/t respectively. Then
    g′(x) = 2x
while
    f′(t) = −1/t^2.
Hence
    f′(g(x)) = −1/g(x)^2 = −1/(1 + x^2)^2.
So
    d/dx [1/(1 + x^2)] = (−1/(1 + x^2)^2) · 2x = −2x/(1 + x^2)^2.
Derivatives of inverses
Inverse functions play a rather important role in mathematics: the logarithm as the inverse
of the exponential, the inverse sine function and square roots for example. So it is natural to ask whether we can find the derivative of the inverse of a function whose derivative we
already know. In order to find out, we need some sort of idea.
Suppose f is differentiable and g is its inverse. Since f and g are inverses of one another,
if you do g followed by f you get back where you started:
f (g(x)) = x
for each x in the domain of g. The right hand side of this equation, we know how to
differentiate explicitly: we get 1. The left hand side, we can differentiate using the chain
rule, to get
    f′(g(x))·g′(x).
Since the two sides have the same derivative,
    f′(g(x))·g′(x) = 1
and hence
    g′(x) = 1/f′(g(x)).
Let's apply this to the square-root function g : x ↦ √x, the inverse of the function
    t ↦ t^2
that squares things. As above we'll call this second function f. So f(t) = t^2, f′(t) = 2t, and the formula gives
    g′(x) = 1/f′(g(x)) = 1/(2√x).
Thus we get the familiar derivative for the function x ↦ x^(1/2) by using our knowledge of the derivative of its inverse.
There are many functions in which the variable is the exponent: x ↦ 2^x, x ↦ (5.267)^x and so on. These functions have certain things in common: most especially, if f is an exponential function then
    f(x + y) = f(x).f(y).
Exponentials turn addition into multiplication.
But there is one exponential function which is special: and because it is so special we call
it the exponential function. Let's look at the graph of y = 2^x.
If you try to estimate the slope of this curve at the point (0, 1) you will find that it comes
out to be about 0.693... This is not terribly convenient. If you try the same thing with
the graph of y = 3^x you get a slope of about 1.099... Again this is not very helpful.
However, there is a number, which we call e, between 2 and 3 with the property that the
curve y = e^x has a slope exactly equal to 1 at the point (0, 1).
Why is it vital that we should have a special exponential function whose derivative at 0 is 1? As you know,
    d/dx e^x = e^x:
the exponential function is its own derivative. For this to be true we need that the slope at 0 is equal to 1: because e^0 = 1.
Our choice f(x) = e^x of the exponential function is intended to guarantee that the exponential is its own derivative. In the first instance it just guarantees that e^x has the correct slope at x = 0. What this means is that as h → 0,
    (f(0 + h) − f(0))/h = (e^h − 1)/h → 1. (9)
Now let's calculate the derivative everywhere else. We want to know what happens to the ratio
    (f(x + h) − f(x))/h = (e^(x+h) − e^x)/h
as h → 0. But this is
    (e^x·e^h − e^x)/h = e^x·(e^h − 1)/h → e^x.
Hence if f(x) = e^x then f′(x) = e^x. The exponential function is thus its own derivative. This is what makes the exponential function useful in calculus.
The above treatment of the exponential function was a bit cavalier: I made no attempt to justify my claim that the number e exists, although this claim is intuitively very reasonable. I certainly made very little attempt to calculate e: could it be 19/7, perhaps? We shall return to this later.
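We can at least see numerically where e must lie. A rough Python sketch (my own illustration) estimates the slope of x ↦ a^x at 0 for various bases a:

```python
def slope_at_zero(a, h=1e-8):
    # Estimate the derivative of x -> a^x at x = 0 by a difference quotient.
    return (a ** h - 1) / h

print(slope_at_zero(2))             # ≈ 0.6931...
print(slope_at_zero(3))             # ≈ 1.0986...
print(slope_at_zero(19 / 7))        # ≈ 0.9985...: close to 1, so 19/7 is near e, but not e
print(slope_at_zero(2.718281828))   # ≈ 1.0000...
```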
In Chapter 7 we recalled that we measure angle in radians and found the addition formulae
for sine and cosine. My aim in this chapter is to find the derivatives of the trigonometric
functions and in doing so explain why radians are so important.
Compare the two functions
    x ↦ sin x
and
    x ↦ sin x°.
The two graphs are drawn on the same sized axes and in each case the horizontal and
vertical axes have the same scales.
Notice that for the first function I did not specify the units of angle: radians are taken for
granted. For the second function, angle is measured in degrees. Clearly the two graphs
look very different. The characteristic wiggly behaviour of sine doesn’t show up at all on
the second graph because the horizontal scale only goes up to 15°. More significantly, the
first graph has a slope at the origin that looks like a reasonably sized number; something
like 1; not something very large or very small, like 1000 or 0.021.
In fact the slope of the second graph at the origin is 0.0174.... whereas the slope of the
first graph is 1: the graph is instantaneously rising at 45°, at the origin. Let's see why
we get a slope of 1 for the graph of y = sin x at 0, as long as we measure angle in
radians.
We want to show that the curve y = sin x looks like the line y = x when x is a small
number: that the tangent line to the sine graph at x = 0 is the line y = x. So we want to verify that when x is small, the ratio
    (sin x)/x
is close to 1. For this we need to consider a small sector of a circle of radius 1: the arc has length x (that is what radians mean), while the height of its endpoint above the horizontal is sin x.
It is intuitively clear that for small angles the ratio of the arc to the height is very close
to 1. Remember that the length of the arc is the relevant quantity, precisely because we
are using radians to measure angle. (You may object that this “intuitive” argument is
not very precise, even though it is pretty convincing. In fact the only real obstacle to our
making it precise, is that we haven’t been absolutely precise about what we mean by
the length of a curve.)
Theorem (The slope of sine at 0). As x approaches 0, the ratio
    (sin x)/x
approaches 1, provided we measure angle in radians. Consequently, at 0, the slope of the curve y = sin x is 1.
Why is it such a good thing to have slope equal to 1 at the origin? As you know, and as
we shall see in detail next week, the derivative of sine is cosine. In other words the slope
of the curve y = sin x at x = t is equal to cos t; not 5.3 cos t: not 0.017 cos t but cos t.
Clearly, if the slope were not equal to 1 at the origin we wouldn’t get cosine; because
cos 0 = 1.
The reason we choose to measure angle in radians is that when we look at the functions
sine and cosine applied to angles in radians, we get the simplest possible derivatives: the
derivative of sin is cos and the derivative of cos is −sin. To finish this section let's remark
that the slope of cosine at 0 is easy to determine.
Theorem (The slope of cosine at 0). At 0, the slope of the curve y = cos x is 0.
The function is even so if it has a tangent line at x = 0 that line will be the graph of an
even function and hence horizontal.
In the previous section we saw that the slope of the curve y = sin x at 0, is 1 and that the
slope of the curve y = cos x at 0 is 0. The first amounted to showing that as h approaches
0,
    (sin h)/h → 1.
Our aim now is to find the derivatives of these functions at all other places. We want to
calculate, in particular, what happens as h approaches 0 to
    (sin(x + h) − sin x)/h.
To do this we use the addition formula, which tells us that
    sin(x + h) = sin x·cos h + cos x·sin h.
Hence
    (sin(x + h) − sin x)/h = (sin x·cos h + cos x·sin h − sin x)/h
                           = sin x·(cos h − 1)/h + cos x·(sin h)/h.
As h approaches 0 the ratio (cos h − 1)/h approaches 0 (the slope of cosine at 0) and (sin h)/h approaches 1 (the slope of sine at 0). So the derivative of sine at x is cos x.
You can find the derivative of cos in much the same way.
As an example of the derivative of an inverse, consider the inverse sine function g(x) = sin⁻¹ x, the inverse of f(t) = sin t restricted to the interval from −π/2 to π/2. Since f′(t) = cos t, the formula for the derivative of an inverse gives
    g′(x) = 1/f′(g(x)) = 1/cos g(x) = 1/cos(sin⁻¹ x).
This is not the expression you are accustomed to. The expression cos(sin⁻¹ x) simplifies. Suppose x is sin θ so that θ = sin⁻¹ x. Our aim is to find cos(sin⁻¹ x), which is cos θ.
In other words we are told that sin θ = x and we want to find cos θ. A picture will do it: draw a right-angled triangle with hypotenuse 1 and an angle θ whose opposite side is x; the adjacent side is then √(1 − x^2). If sin θ = x then cos θ = √(1 − x^2), so
    g′(x) = 1/√(1 − x^2).
Earlier we talked about differentiation: the first part of calculus. Now we are going to look
at the second part: integration. Integrals appear all over mathematics and have many
different interpretations and uses. They originate in the basic problem of calculating area.
Consider the function x ↦ x^2 on the interval {x : 0 ≤ x ≤ 1}. How can we calculate the area between the curve y = x^2 and the x-axis, from x = 0 to x = 1?
We divide up the interval into many equal pieces, and upon each piece we draw a rectangle
whose top touches the curve.
We can calculate the sum of all the areas of the rectangles, and we would expect this to
be a good estimate for the area we want. Let’s do the calculation for a general number,
n, of equal pieces. The rectangles are numbered from 1 to n. The first rectangle is based
on the interval {x : 0 ≤ x ≤ 1/n}. The second on the interval {x : 1/n ≤ x ≤ 2/n}. In general, the kth rectangle is based on the interval from (k − 1)/n to k/n. The height of the kth rectangle is thus
    (k/n)^2 = k^2/n^2.
The width of each rectangle is 1/n, so the area of the kth rectangle is
    k^2/n^3.
It is not too hard to find a formula for the sum of the square numbers from 1 to n:
    1^2 + 2^2 + … + n^2 = n(n + 1)(2n + 1)/6.
Using this we can write the total area as a single expression involving n. It is
    (1/n^3)·n(n + 1)(2n + 1)/6 = (n + 1)(2n + 1)/(6n^2),
which approaches 1/3 as n gets large.
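The rectangle computation is easy to replicate. A sketch (illustrative):

```python
def riemann_sum(n):
    # Total area of the n rectangles under y = x^2 on [0, 1]:
    # the k-th has height (k/n)^2 and width 1/n.
    return sum((k / n) ** 2 / n for k in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, riemann_sum(n))   # 0.385, 0.33835, 0.33383...: approaching 1/3
```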
We can in principle do this for any nice continuous function. This is a way to construct
the integral formally but it is an extremely cumbersome way to calculate integrals. We
need something more powerful. Suppose we have a function f and we look at its graph.
We decide to try to calculate infinitely many areas all at once: we try to calculate the
area between t = a and t = x for every number x. The area we get will certainly depend
upon the value of x: it will be some function of x. Let’s call it A(x) as shown below.
What can we say about the function A? The crucial discovery is that we can immediately
write down the derivative of the area function A.
    A′ = f.
The rate at which we add on new area as we increase x, is equal to the height of the
curve: f (x). This principle is so important that we call it the Fundamental Theorem of
Calculus.
The fundamental theorem really tells us two things. On the one hand, it tells us how to
differentiate a function which is given as a definite integral. On the other hand it gives
us a faster way to calculate the integrals of certain functions.
The second point is more familiar to you, even though the first point is really the simpler
of the two. Let’s have an example of the first point. Suppose we know that the function
F is given by
    F(x) = ∫_2^x 1/(ln t) dt
for each x > 2.
However, it turns out that you can’t express F using the normal functions of mathematics,
combined in the usual algebraic ways. You can’t write down any formula for F which is
"more explicit" than this:
    F(x) = ∫_2^x 1/(ln t) dt.
However, there are things you can say about F. You know its derivative: F′(x) = 1/(ln x). It
is extremely important to be comfortable with the idea that we can define a function using
a definite integral, because such definitions occur all the time when you solve differential
equations.
Let's return to what I called the second point about the Fundamental Theorem of Calculus and recall what you already know. If you want to evaluate an integral like ∫_0^1 t^2 dt, what you really do is to say: I shall evaluate all the integrals ∫_0^x t^2 dt, for all possible values of x. Suppose that, as above, you write
    A(x) = ∫_0^x t^2 dt.
You don't yet know what the function A is but you know its derivative: A′(x) = x^2. Now you ask yourself a question. Can I think of any function whose derivative is x^2? The answer is yes. The obvious one is
    x ↦ (1/3)x^3.
But this is not the only such function. If you add a constant to this function you don’t
change its derivative. If you just shift a graph upwards, you don’t change its slope. So
there are many functions whose derivative is x^2: for example
    x ↦ (1/3)x^3
    x ↦ (1/3)x^3 + 2
    x ↦ (1/3)x^3 + π
    ⋮
The obvious question is, are these the only ones? Is it true that if you know the slope
of a function at every point, then you know what the function is, apart from a possible
constant?
The answer is yes. I shan’t attempt to justify this but I think it is at least intuitively
reasonable. So now we know that our function A(x) is a function of the form
    x ↦ (1/3)x^3 + C
for some constant C. It only remains to decide which of these functions: to find the value
of C. We could do this if we knew the value of A at just one point.
And we do know one value: A(0) = 0, because if you integrate over a range with no thickness you get no area. So A(x) = (1/3)x^3 + C and A(0) = 0. That tells us that C = 0 and so the function we want is
    x ↦ (1/3)x^3.
So we have found that for every number x,
    ∫_0^x t^2 dt = (1/3)x^3.
We can now go back and find the integral that we originally wanted. We just put x = 1 and we get
    ∫_0^1 t^2 dt = 1/3.
That's how we find definite integrals using the fundamental theorem. But, as you know, there is a short cut. Instead of finding all functions whose derivative is x^2 and then choosing the right one, we can use a trick. We just think of one such function, say
    F(x) = (1/3)x^3,
and then we calculate F(1) − F(0). By subtracting off F(0) we automatically modify the function to get the right constant C.
I have explained the long-winded process for two reasons: partly to show why the trick
works and partly to explain something much more important. Frequently at school you were asked to find indefinite integrals, and you were told to put in the constant C. It is important to understand what you are doing. Try not to think of the indefinite integral as a thing (a mathematical thing like a number) but as a question. The indefinite integral
    ∫ t^2 dt
is asking the question, "Which functions have t^2 as their derivative? Tell me all of them." The answer is "All functions of the form t ↦ (1/3)t^3 + C". And now you see why that is a good question.
Consider the series
    1 + 1/4 + 1/9 + 1/16 + …
whose terms are the reciprocals of the squares. If you begin to calculate the partial sums
    1 = 1
    1 + 1/4 = 5/4
    1 + 1/4 + 1/9 = 49/36
    1 + 1/4 + 1/9 + 1/16 = 205/144
    ⋮
you quickly despair of finding a pattern. It is highly unlikely that there is any simple
formula for these partial sums: (we shall see some evidence in a moment). So it is not at
all clear whether this series converges: whether these sums approach a limit. However,
one thing is obvious about the partial sums listed above: they get larger as we add more
terms, because the terms that we are adding are positive.
If we are adding up a series of positive terms like this, there are only two possible kinds
of behaviour: either the partial sums approach a finite limit, or they increase without
bound. Either the sums eventually surpass any value we can think of or they are trapped
below some number. In the second case they get squeezed up against a ceiling and are
forced to approach a limit.
So if we can show that all the partial sums are less than 2 (for example), then we could conclude that they approach some limit, even though we would not know what that limit is. Note that the bound we find, 2, need not be the
limit. It is just some value that traps the sums. The limit can’t be larger than 2 but it
could be smaller.
The idea will be to compare the sums with certain integrals which we are able to evaluate.
Look at a graph of the function x ↦ 1/x².

[Figure: the graph of x ↦ 1/x²]
Thus, there is a rectangle of height 1/4 and width 1 underneath the curve between x = 1 and x = 2. The area of this rectangle is 1/4.
[Figure: the rectangles of heights 1/4, 1/9, 1/16, … sitting under the curve]

In the same way, the rectangle of height 1/n² and width 1 between x = n − 1 and x = n lies under the curve, so the sum 1/4 + 1/9 + 1/16 + · · · is at most the whole area under the curve to the right of x = 1, which is ∫_1^∞ dx/x² = 1. Hence, all the partial sums of this series are at most 2, and the series converges to
something. What Euler managed to do, three hundred years ago, was to find out what
that “something” is. He discovered the extraordinary and beautiful fact that
1 + 1/4 + 1/9 + 1/16 + · · · = π²/6.
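It is easy to watch the convergence happen. The following short Python sketch (an illustration of mine; the variable names are arbitrary) prints some partial sums alongside π²/6.

    import math

    # Partial sums of 1 + 1/4 + 1/9 + ... compared with pi^2 / 6.
    total = 0.0
    for n in range(1, 10001):
        total += 1 / n**2
        if n in (1, 2, 3, 4, 100, 10000):
            print(n, total)
    print("pi^2 / 6 =", math.pi**2 / 6)  # about 1.6449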
Approximation of one kind or another plays a vital role in mathematics: in some ways it
is the central problem in mathematics. Most of the equations that arise in physics cannot
be solved exactly and we need to find ways to approximate their solutions. The advent
of computers has made it possible to use approximation methods that would otherwise
be too complex. But these methods have to be invented before they can be programmed.
Even in cases where we are accustomed to thinking that we can solve equations, it turns
out, when we look closely, that some approximation is going on. For example, we normally
feel that we have no difficulty in solving equations like
x² = 2.

But the answer, √2, is a number which needs to be approximated if we wish to use it on a computer. We need a way of calculating the first 10 decimal places of √2 (or the first 15 or …).
For example,

∫_1^2 (1/x) dx

is an integral that we can “do exactly.” The value is ln 2. But if I ask you “How do you calculate ln 2?” you would have trouble giving me a much better answer than

ln 2 = ∫_1^2 (1/x) dx

and finding some way to estimate this integral. In mathematics, we need to be able to approximate particular numbers, like ln 2, and also to approximate functions.
It turns out that the first problem is often best tackled via the second. In this chapter
I want to discuss the most fundamental and important way to approximate functions:
Taylor approximation. As we saw in an earlier chapter, the derivative of a function at a
point tells us the slope of the tangent to the graph of the function at that point.
[Figure: the tangent line to the graph of a function at a point]
The derivative tells us how to approximate a function near a particular point, by a linear
function. Although linear approximations are extremely important for many reasons, it is
often necessary to approximate by other kinds of functions: for example, by polynomials
of degree larger than one. How can we approximate a function by a quadratic polynomial?
Just as a linear function is determined by its value at a point together with its slope so a
quadratic function is determined by its value at a point, its derivative at that point and
its second derivative there. Indeed, suppose I have a quadratic polynomial p given by
p(x) = a + bx + cx².

Then

p′(x) = b + 2cx

and

p″(x) = 2c.

Once I tell you the second derivative p″(0), you can find the coefficient

c = p″(0)/2.

From the derivative at 0, you can find b = p′(0). The value of the function itself, p(0), is a. Putting these together,

p(x) = p(0) + p′(0)x + (p″(0)/2)x².
Let’s have an example. Suppose f(x) = 1/(1 − x). The first two derivatives of f are

f′(x) = 1/(1 − x)²

and

f″(x) = 2/(1 − x)³.

Substituting 0 for x we obtain f(0) = 1, f′(0) = 1 and f″(0) = 2, so the quadratic Taylor approximation to f at 0 is

p(x) = 1 + x + x².
Notice again the sequence of operations in the above calculations. We are given a function:

x ↦ 1/(1 − x).

1. We use the machine to differentiate this function an appropriate number of times.
2. We substitute the chosen point (here 0) into each of these derivatives.
3. We use the resulting numbers as the coefficients of the approximating polynomial.
For a general point a we get that the quadratic Taylor approximation to a function f near
a is given by
p(x) = f(a) + f′(a)(x − a) + (f″(a)/2)(x − a)².
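If you have the sympy library available, you can let the machine do the differentiation and substitution. The helper function below is hypothetical, written just to mirror the formula above.

    import sympy as sp

    x = sp.symbols('x')

    # p(x) = f(a) + f'(a)(x - a) + f''(a)/2 (x - a)^2, built symbolically.
    def quadratic_taylor(f, x, a):
        return (f.subs(x, a)
                + sp.diff(f, x).subs(x, a) * (x - a)
                + sp.diff(f, x, 2).subs(x, a) / 2 * (x - a)**2)

    f = 1 / (1 - x)
    print(sp.expand(quadratic_taylor(f, x, 0)))  # x**2 + x + 1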
Now let’s move on to higher order approximations. The nth order Taylor approximation to f near a should be a polynomial of degree n whose value, and whose first n derivatives, at a are the same as those of f at a. The polynomial is given by

p(x) = f(a) + f′(a)(x − a) + (f″(a)/2!)(x − a)² + · · · + (f^{(n)}(a)/n!)(x − a)^n.
Let’s use the Taylor approximation formula to find further approximations to the function
x ↦ 1/(1 − x).
The successive derivatives of this function are

1/(1 − x), 1/(1 − x)², 2/(1 − x)³, 6/(1 − x)⁴, …, n!/(1 − x)^{n+1}, …

and at 0 these take the values

1, 1, 2, 6, 24, …, n!, …

This makes it easy to see that the nth order Taylor approximation at zero, to the function

x ↦ 1/(1 − x)

is

1 + x + x² + x³ + x⁴ + … + x^n.
In consequence we expect that if x is close to 0, and n is large,

1/(1 − x) ≈ 1 + x + x² + x³ + … + x^n.
This should not come as a great surprise. When we talked about geometric series, we
found that as long as |x| < 1,

1/(1 − x) = 1 + x + x² + x³ + … .
In other words, the infinite sum converges to the value 1/(1 − x). This is the same thing as saying that we get good approximations to 1/(1 − x) by taking a large enough number of terms of the series. For the function x ↦ 1/(1 − x), the Taylor approximations at zero approach the value of the function, as long as |x| < 1.
This kind of situation occurs quite frequently: there are many functions for which the
Taylor approximations at a point converge to the correct value, at least near the point.
When this happens we get an infinite series that represents the function (at least near a
point). Such a series is called the Taylor series for the function at that point. We can thus
state the above remarks in the following way. The Taylor series for the function x ↦ 1/(1 − x) at zero, is

1 + x + x² + x³ + x⁴ + … .
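A quick numerical experiment makes the condition |x| < 1 vivid. This Python sketch (mine, not from the notes) evaluates the partial sums at a few values of x.

    # Partial sums 1 + x + ... + x^n of the Taylor series of 1/(1-x) at 0.
    def partial_sum(x, n):
        return sum(x**k for k in range(n + 1))

    for x in (0.5, 0.9, 1.5):
        print(x, partial_sum(x, 50), 1 / (1 - x))
    # For x = 0.5 and x = 0.9 the sum is close to 1/(1-x);
    # for x = 1.5 it is astronomically far away: the series diverges.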
In the next chapter I will talk about perhaps the most important Taylor series of all.
In this chapter I want to talk about the Taylor series for the exponential function x ↦ e^x.
This function is its own derivative: so all its derivatives are the same function. Therefore,
at the point 0, all the derivatives of this function are equal to 1. Hence, the Taylor series
of the function, at 0, is
1 + x + x²/2 + x³/6 + x⁴/24 + · · · = ∑_{n=0}^{∞} x^n/n!.
It turns out that this series makes sense for any value of x. The series converges, however
large x may be. The reason (roughly) is that the denominators n! grow extremely rapidly
as n increases, so that the terms of the series eventually become very small, even if x is
large. So we can write
e^x = 1 + x + x²/2! + x³/3! + x⁴/4! + · · ·
for every number x.
If you remember, I stated earlier that there is a number e for which the function x ↦ e^x has derivative equal to 1 at 0. I did not demonstrate the existence of this number e. We can now say what this number is:

e = 1 + 1 + 1/2! + 1/3! + 1/4! + · · · .
Hence, for example, we can estimate e, as accurately as we wish, in a fairly simple way.
If you prefer, you could take this series to be the definition of e.
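Here is a minimal Python sketch of that estimation (my own illustration): each new term is obtained from the previous one by dividing by the next integer, so no factorials ever need to be computed separately.

    import math

    # Summing 1 + 1 + 1/2! + 1/3! + ... term by term.
    total, term = 0.0, 1.0
    for n in range(15):
        total += term          # term is 1/n! at this point
        term /= (n + 1)        # turn 1/n! into 1/(n+1)!
    print(total)               # 2.718281828...
    print(math.e)              # for comparison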
In fact, you could go a bit further, and say that you wouldn’t mind having a slightly clearer definition of e^x. After all, it isn’t entirely obvious what we mean by e^√2. How do we multiply the number e by itself √2 times? If you wish, you could take the Taylor series

1 + x + x²/2! + x³/3! + x⁴/4! + · · ·

as the definition of e^x.
as the definition of ex . What I am more interested in today is to see how to relate the
series to the properties with which you are more familiar.
The exponential function has two crucial properties both of which have already been
mentioned.
• d/dx e^x = e^x.

• e^{x+y} = e^x · e^y for all numbers x and y.
Remember that the first of these was the property that we used to find the Taylor series.
It is not too surprising that we can go back the other way: recover these properties from the series itself.
Let us make a fairly bold assumption: that we can differentiate the series
1 + x + x²/2 + x³/6 + x⁴/24 + · · ·

“term by term,” just as if it were a finite sum. What do we get?

0 + 1 + x + x²/2 + x³/6 + · · · .
So, when we differentiate the series we get exactly the same series back again. The 0 at
the front makes no difference to the value.
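If you want to see the bold assumption vindicated for a truncated series, the following sketch (assuming the sympy library is available) differentiates the first eight terms and subtracts the result from the original: only the top-degree term survives, exactly as the hand calculation predicts.

    import sympy as sp

    # Differentiate a truncated exponential series term by term.
    x = sp.symbols('x')
    series = sum(x**n / sp.factorial(n) for n in range(8))
    print(sp.expand(series - sp.diff(series, x)))
    # Only the top-degree term x**7/5040 survives: up to the point where
    # we cut the series off, differentiation reproduces the series exactly.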
Now let us check the second property. To form the product e^x · e^y we must multiply together the two series

(1 + x + x²/2 + x³/6 + · · ·)(1 + y + y²/2 + y³/6 + · · ·).
So we have to form all possible products of one term from the first bracket and one term
from the second bracket (and then add all of these products together).
When you are multiplying two infinite sums, you have to be a bit careful about how you write down the products so as not to miss any. In order to find a systematic way of doing this, let us arrange the products in a grid.
             1        x        x²/2      x³/6     …
    1        1        x        x²/2      x³/6     …
    y        y        xy       x²y/2     …
    y²/2     y²/2     xy²/2    …
    y³/6     y³/6     …
    …        …
Along the top are the terms of the first sum. Down the side are the terms of the second.
In the grid are the products of all possible pairs, located in the obvious positions. Our
job is to add up all the products in the grid.
You might be tempted to add up all the products in the first row, then move on to the
second and so on. But with infinite sums this is not a very good idea. You will never get
to the end of the first row: so you will never pick up any terms from any of the other
rows.
One way to make sure that we include everything is to add them diagonally. We start
with the top left corner 1. Then we take the next NE-SW diagonal down: the short
diagonal containing x and y. That gives us the sum x + y. Then the next diagonal down
gives us x²/2 + xy + y²/2. Continuing like this we pick up all the products in the grid.
Now let us collect each diagonal sum over a common denominator. The first is x + y. The
second is

(x² + 2xy + y²)/2

and the third is

(x³ + 3x²y + 3xy² + y³)/6.
We immediately see that we have the binomial expansions of (x + y)² and (x + y)³. So the whole sum looks like

1 + (x + y) + (x + y)²/2 + (x + y)³/6 + · · ·

and this is exp(x + y).
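One can ask a computer algebra system to carry out the multiplication for us. The sketch below (again assuming sympy; the truncation level N is my own choice) multiplies two truncated exponential series and checks that the product agrees with the series for exp(x + y) on every diagonal that was kept.

    import sympy as sp

    # Multiply two truncated exponential series and compare the result
    # with the truncated series for exp(x + y).
    x, y = sp.symbols('x y')
    N = 6
    ex = sum(x**n / sp.factorial(n) for n in range(N))
    ey = sum(y**n / sp.factorial(n) for n in range(N))
    exy = sum((x + y)**n / sp.factorial(n) for n in range(N))
    diff = sp.expand(ex * ey - exy)
    # Every surviving monomial has total degree >= N: the two sides agree
    # exactly on all the diagonals we kept.
    print(all(sum(m) >= N for m in sp.Poly(diff, x, y).monoms()))  # True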
What this argument shows is that the characteristic property of the exponential function, e^{x+y} = e^x · e^y, is actually the same as the binomial theorem.
The logarithm
The exponential function is strictly increasing: so it never takes the same value twice.
[Figure: the graph of the exponential function]

Because of this, the exponential function has an inverse, defined for the positive numbers: the logarithm, ln. The key property of the logarithm is that it turns products into sums:

ln(uv) = ln u + ln v.
Let’s prove this from the definition. We know that ln(uv) is the number whose exponential
is uv:
e^{ln(uv)} = uv.
But the number in question is ln u + ln v because

e^{ln u + ln v} = e^{ln u} · e^{ln v} = uv.

Hence

ln(uv) = ln u + ln v.
Let us use the chain rule to compute the derivative of ln from that of the exponential.
Let f be the exponential function and g the logarithm.
f(g(x)) = x

and so

f′(g(x)) g′(x) = 1.

In the present case

f′(x) = e^x

and hence

g′(x) = 1/f′(g(x)) = 1/exp(ln x) = 1/x.
For many reasons, it is important to be able to solve algebraic (or polynomial) equations:
equations like
x³ − 3x² + x − 5 = 0.
But the system of real numbers does not provide us with solutions, even to simple
quadratic equations, because negative numbers do not have real square roots. To solve
this problem, we adjoin to the real number system, a further number, which we call i (or
sometimes j), with the property that i² = −1. We then study the set of all numbers of
the form x + iy where x and y are real numbers. We call these expressions, complex
numbers.
You might at once wonder whether this i is really a number. What is i? Where is it?
Mathematicians have learnt from painful experience that to ponder such philosophical
questions is invariably fruitless. During the 15th century, there were misgivings about the
existence of negative numbers: such misgivings now look absurd. The only things that
matter are whether or not we can build a sensible arithmetic, using complex numbers, and whether this arithmetic gives us useful mathematical tools.
How do we do arithmetic with complex numbers? I have to tell you how to add and
multiply complex numbers. The sum of x + iy and u + iv is the number
(x + u) + i(y + v).
We add complex numbers by adding their real parts and adding their imaginary parts,
just as if these were ordinary algebraic expressions.
The rule for multiplication looks more complicated: the product of x + iy and u + iv is
the number
(xu − yv) + i(xv + yu).
But the rule is really very simple. We multiply the numbers as if they were ordinary algebraic expressions,

(x + iy)(u + iv) = xu + i xv + i yu + i² yv,

and then replace i² wherever it appears using the rule

i² = −1.
However, there are certain things we need to check before we rush off and do complex arithmetic. The most important thing is to check that each complex number other than 0 + 0i has a reciprocal. For example, is there a complex number equal to

1/(2 + 3i)?

Can we find a number x + iy for which (x + iy)(2 + 3i) = 1? Multiplying out, this asks for

2x − 3y = 1
3x + 2y = 0.

These simultaneous equations have the solution x = 2/13, y = −3/13.
As you know, there is a way to streamline the calculation above. You write the expression

1/(2 + 3i)

in the form

(2 − 3i)/((2 + 3i)(2 − 3i))

by “multiplying top and bottom by 2 − 3i.” You now observe that the bottom is 2² − (3i)² = 4 − 9i² = 4 + 9 = 13. So

1/(2 + 3i) = (2 − 3i)/((2 + 3i)(2 − 3i)) = (2 − 3i)/13.
Since the bottom is now a real number we know how to break the number up into real
and imaginary parts.
(2 − 3i)/13 = 2/13 − (3/13) i.
Once you have reciprocals, you have no problem dividing one complex number by another
(which isn’t zero). So we now have a system of arithmetic for complex numbers.
Once we have introduced complex numbers we can find a square root for each negative
number: in fact two square roots. For example, each of the numbers i and −i has -1 as
its square. Similarly, -4 has the square roots 2i and −2i. We can then solve all quadratic
equations
ax² + bx + c = 0.
It would be natural to expect that if you want to solve cubic equations, you have to throw
in some more numbers, and to solve quartics, yet more.... However, it turns out that to
solve polynomial equations, you only ever need the complex numbers. Even if
you wish to solve equations with complex coefficients (not real ones), you can do it with
complex numbers.
For example, we know that there is at least one complex number z which satisfies
z⁸ + (2 + i)z⁵ + iz − 3 = 0.
This astonishing fact is so important that we call it “The fundamental theorem of algebra.”
Now suppose that p is a polynomial with complex coefficients. Once you have found a
zero of p, p(α) = 0 say, you can factorise: p(x) = (x − α)q(x) where q is a polynomial of
degree one less than the degree of p. So you can continue the process, by finding a zero
of q. Eventually, you end up with the original polynomial p, expressed as a product of
linear factors with complex coefficients. We can thus state the fundamental theorem of algebra in the following way: every polynomial with complex coefficients can be factorised completely into linear factors with complex coefficients.
The fundamental theorem of algebra is not easy to prove. I certainly shan’t attempt
to prove it here. The first really convincing demonstration was found by Gauss, at the
beginning of the 19th century.
Since each complex number x+iy is built out of two real components we can represent the
complex numbers by the points of the plane: the number x + iy corresponds to the point
(x, y). Such a representation might be nothing more than an artificial correspondence.
However, it turns out to be anything but artificial: the arithmetic of the complex numbers
is intimately related to the geometry of the plane. Let us begin gently.
Consider what happens when we add the fixed number 2 − i to each complex number: the point x + iy moves to (x + 2) + i(y − 1).

[Figure: three points of the plane and their images under addition of 2 − i]

Thus, the map which takes a point x + iy to the point (x + iy) + (2 − i) acts on the plane as a translation, by the displacement (2, −1).
Multiplication is a bit more complicated. Let’s try multiplication by 2 + 3i. This takes a number x + iy to the number

(2x − 3y) + i(3x + 2y).
More generally, the map corresponding to multiplication by a + ib is the linear map given by the matrix

( a  −b )
( b   a ).

This map is a combination of an enlargement and a rotation. The scale factor of the enlargement is the number

√(a² + b²).
The real number √(a² + b²) is thus related to the algebraic behaviour of the complex number a + ib. Of course, it is also the distance of the point (a, b) from (0, 0). The appearance of the Euclidean (or Pythagorean) distance, lies at the heart of the relationship between the complex numbers and the geometry of the plane. We call the number √(a² + b²), the absolute value of the complex number a + ib and denote it with vertical bars:

|a + ib|.
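A small sketch makes the point concrete: multiplying by a + ib and applying the matrix above to the point (x, y) give the same answer, and the absolute value is the scale factor. (The particular numbers are my own choices.)

    import math

    # Multiplication by a + ib, done two ways: as complex arithmetic and
    # as the 2x2 matrix (a -b; b a) acting on the point (x, y).
    a, b = 2.0, 3.0
    x, y = 5.0, -1.0

    w = complex(a, b) * complex(x, y)
    matrix_image = (a * x - b * y, b * x + a * y)
    print(w)                      # (13+13j)
    print(matrix_image)           # (13.0, 13.0): the same point
    print(abs(complex(a, b)), math.hypot(a, b))  # sqrt(a^2 + b^2) both ways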
Thus addition of complex numbers corresponds to translation of the plane and multiplica-
tion of complex numbers corresponds to matrix operations on the plane. The arithmetic
of complex numbers fits with the geometry of the plane. When we talk about the complex
plane we don’t just mean that complex numbers are pairs of real numbers: the complex
numbers really do form a plane.
One final remark is appropriate for this chapter. We frequently abbreviate our notation
x + iy for a complex number. Unless we are explicitly interested in the real and imaginary
parts of a number, we usually represent it by a single letter instead of two (or three if you include the letter i). Thus, we quite happily write w or z to mean a complex number.
In the last chapter I introduced the complex numbers and mentioned the fantastic fact
that all polynomial equations have complex solutions. This fact alone would be enough to
make the complex numbers important in mathematics. But their importance in physics
and engineering stems from another, equally astonishing discovery. In this chapter I will
explain how we extend the exponential function to the complex numbers: how we define
the exponential of a complex number.
I pointed out that we have to be a bit careful what we mean by e^√2. It isn’t clear what it means to multiply a number by itself √2 times. However, at a pinch, I bet you could have come up with a sensible definition. The situation looks much more difficult when we move into the complex plane. What on earth is 7^i? That’s what this chapter will be about.
If things are going to work out nicely, one thing certainly ought to be true. Since 7 = e^{ln 7}, we ought to have 7^i = e^{i ln 7}. So it is enough to make sense of the exponential of a complex number,

e^z,

for an arbitrary complex number z.
At this point we make an inspired guess (or rather, Euler and his contemporaries made one).
Remember that for a real number x we had an expression for ex as an infinite series:
e^x = 1 + x + x²/2 + x³/6 + · · · .
Can we use the same formula to tell us the value for ez when z is a complex number?
What could go wrong? The first thing we have to do is to make sense of the infinite sum.
In the case of real numbers, this involved determining whether the successive sums ap-
proach a limiting value. Can we make sense of complex numbers approaching a limit?
Yes, because we can measure how far apart they are: we can measure the distance between complex numbers because they are points in the plane. So, we can at least say “We define

e^z = 1 + z + z²/2 + z³/6 + · · · .”
The second important thing we need to do is to check that our new function has the right property for an exponential:

e^{w+z} = e^w · e^z

for any two complex numbers w and z. We can do this, just as we did for real numbers, because the binomial theorem works just as well for complex numbers. Once we have the multiplicative property for the exponential, we feel a bit happier about writing it e^z.
Now comes the really important question. Is our complex exponential function useful, or is it just a mathematical trick with no real point? To answer this we try to understand a bit about it. If you wanted to calculate e^{s+it} for a complex number s + it written in terms of its real and imaginary parts, the natural thing to do would be to say

e^{s+it} = e^s · e^{it}.

Since s is a real number, we already understand e^s. The problem is to understand e^{it}, for each real number t.
We have

e^{it} = (1 − t²/2 + t⁴/24 − · · ·) + i (t − t³/6 + t⁵/5! − · · ·).
The two series for the real and imaginary parts of e^{it} are very familiar. They are the series for cos t and sin t respectively. What we have found is that

e^{it} = cos t + i sin t.

This remarkable formula links together the exponential series, which begins as a purely analytic construction, and the trigonometric functions, which are geometric constructions.
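Python’s cmath module knows the complex exponential, so the formula can be spot-checked numerically; this snippet is an illustration of mine, not a proof.

    import cmath, math

    # cmath.exp computes the complex exponential; compare with cos t + i sin t.
    for t in (0.0, 1.0, math.pi / 2, math.pi):
        print(cmath.exp(1j * t), complex(math.cos(t), math.sin(t)))
    # Each printed pair agrees (up to rounding): e.g. at t = pi we get
    # approximately (-1+0j), the famous e^{i*pi} = -1.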
Some of the most powerful mathematical ideas are those which link together algebra or analysis and geometry. The physicist Richard Feynman described the formula

e^{it} = cos t + i sin t

as “our jewel.”
Going back to our picture of the complex numbers as points in the plane, we can now draw in the point e^{it} for a real number t.
[Figure: the point e^{it} on the circle of radius 1 centred at 0]
Points of this form are exactly the points which lie on the circle of radius 1, centre
0, because the points have coordinates (cos t, sin t). We now have a way to define the
exponential of a complex number and we are able to give a geometric meaning to the
values we get. This, finally, justifies our view of the complex numbers as a plane. We
have an algebraic way to describe the circle of radius 1, sitting in the complex numbers.
We have a map
t ↦ e^{it}
from the real line to the complex plane, which winds the real line around the circle of
radius one, infinitely many times.
The rest of this chapter is devoted to applications of the jewel formula. The first is a quick
derivation of the addition formulae for the trigonometric functions. The second relates
the derivatives of the exponential and trigonometric functions.
We know from the magic formula of the last section that for any real numbers θ and φ,

e^{i(θ+φ)} = cos(θ + φ) + i sin(θ + φ).
But as soon as we see the exponential of a sum, we can immediately write it as a product, and each of the factors can then be put back in terms of trigonometric functions:

e^{i(θ+φ)} = e^{iθ} · e^{iφ} = (cos θ + i sin θ)(cos φ + i sin φ).

Multiplying out the brackets, we get
cos(θ + φ) + i sin(θ + φ) = (cos θ cos φ − sin θ sin φ) + i(sin θ cos φ + cos θ sin φ).
This equation involving complex numbers can be regarded as two equations involving real
numbers. By equating real and imaginary parts we can read off the addition formulae.
Once you have seen this, you can never again dare to admit that you can’t remember the
addition formulae for the trigonometric functions.
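And if you distrust the algebra, a one-line numerical spot check (mine, with arbitrary angles) is reassuring.

    import cmath, math

    # Spot-check the addition formulae read off from e^{i(theta+phi)}.
    theta, phi = 0.7, 1.9
    lhs = cmath.exp(1j * (theta + phi))
    rhs = (math.cos(theta) * math.cos(phi) - math.sin(theta) * math.sin(phi)
           + 1j * (math.sin(theta) * math.cos(phi) + math.cos(theta) * math.sin(phi)))
    print(abs(lhs - rhs))  # about 1e-16: the two sides agree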
We can also use the jewel to link together the derivatives of the exponential and trigono-
metric functions.
cos t + i sin t = e^{it}.
If we differentiate e^{it} with respect to t we get i e^{it}. Hence

d/dt (cos t + i sin t) = d/dt e^{it} = i e^{it} = i (cos t + i sin t) = i cos t − sin t.
By comparing real and imaginary parts we can confirm that

d/dt cos t = − sin t   and   d/dt sin t = cos t.
The formula

e^{it} = cos t + i sin t

only makes sense if we measure the angle in radians. Recall that we chose to use the exponential function rather than an exponential function (x ↦ 2^x for example) in order
to make the derivative of the function as simple as possible:
d/dx e^x = e^x.
We chose to measure angle in radians in order to make the derivatives of the trigonometric functions as simple as possible:

d/dx cos x = − sin x   and   d/dx sin x = cos x.
We now see that these two choices were in fact the same choice. Together they lead to
one of the most remarkable human achievements of all time