Mathematics (1)
Warning: All sections marked with an asterisk (*) as well as all proofs are reserved for PhD students.
Recall that N = {1, 2, 3, . . .} denotes the set of natural numbers.
1 Some authors will use ⊂ to denote a proper subset and ⊆ to denote a subset, but there is rarely a need to differentiate between the two.
* Countability
A set is said to be countable if it is either finite or countably infinite, which means that it has a one-to-one correspondence with N (i.e., each element in the set can be paired with exactly one element of N and vice-versa). So, N itself is obviously countably infinite. Another example of a countably infinite set is the set of even natural numbers, E = {2, 4, 6, . . .}. To see why this is the case, consider the following correspondence:
\[
\begin{array}{ccccccc}
\mathbb{N}: & 1 & 2 & 3 & \cdots & n & \cdots \\
 & \updownarrow & \updownarrow & \updownarrow & & \updownarrow & \\
E: & 2 & 4 & 6 & \cdots & 2n & \cdots
\end{array}
\]
To see why Z is a countably infinite set, consider the following correspondence:
\[
\begin{array}{ccccccccc}
\mathbb{N}: & 1 & 2 & 3 & 4 & 5 & \cdots & n & \cdots \\
 & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & & \updownarrow & \\
\mathbb{Z}: & 0 & -1 & 1 & -2 & 2 & \cdots & \begin{cases} (n-1)/2 & \text{if } n \text{ is odd} \\ -n/2 & \text{if } n \text{ is even} \end{cases} & \cdots
\end{array}
\]
What might be more surprising is that Q is also countably infinite. To see this, begin by defining, for each
n ∈ N, the set:
\[
A_n = \left\{ \pm\frac{p}{q} : p, q \in \mathbb{N} \text{ are in lowest terms with } p + q = n \right\}
\]
That is,
\[
A_1 = \left\{ \frac{0}{1} \right\}, \qquad A_2 = \left\{ \frac{1}{1}, \frac{-1}{1} \right\}, \qquad A_3 = \left\{ \frac{1}{2}, \frac{-1}{2}, \frac{2}{1}, \frac{-2}{1} \right\},
\]
and so on. We then have the following correspondence:
\[
\begin{array}{ccccccccc}
\mathbb{N}: & 1 & 2 & 3 & 4 & 5 & 6 & 7 & \cdots \\
 & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \\
\mathbb{Q}: & A_{1,1} & A_{2,1} & A_{2,2} & A_{3,1} & A_{3,2} & A_{3,3} & A_{3,4} & \cdots
\end{array}
\]
where An,i is the ith element of An (e.g., A2,1 = 1/1). On the other hand, R is not countably infinite (i.e.,
it is uncountable), but we won’t prove this here (for a proof, see Bartle & Sherbert, 2011, Theorem 2.54).
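To make this enumeration tangible, here is a minimal Python sketch (an illustration, not part of the original notes) that generates the sets An in order and pairs each rational with a natural number:

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def rationals():
    """Enumerate Q in the order A_1, A_2, A_3, ... described above."""
    yield Fraction(0, 1)               # A_1 = {0/1}
    n = 2
    while True:
        for p in range(1, n):          # A_n: +-p/q in lowest terms with p + q = n
            q = n - p
            if gcd(p, q) == 1:         # skip fractions not in lowest terms
                yield Fraction(p, q)
                yield Fraction(-p, q)
        n += 1

# Pair each rational with a natural number, as in the correspondence above.
for k, r in enumerate(islice(rationals(), 7), start=1):
    print(k, r)                        # 1 0, 2 1, 3 -1, 4 1/2, 5 -1/2, 6 2, 7 -2
```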
More About R
Let a, b ∈ R with a < b. Various intervals within R are defined as follows:
(a, b] = {x ∈ R : a < x ≤ b}
[a, b) = {x ∈ R : a ≤ x < b}
Note that a construction such as [−∞, ∞] makes no sense since −∞ and ∞ are not real numbers.
The non-negative real numbers [0, ∞) are denoted by R+ , while the non-positive real numbers (−∞, 0]
are denoted by R− .
The set S ⊂ R is said to be bounded from above if there is a u ∈ R such that s ≤ u for all s ∈ S (here, u is called an upper bound of S). For example, consider the sets A = [0, 1], B = (−∞, 1), and C = (0, ∞). Clearly A and B are bounded from above (any real number greater than or equal to 1 is an upper bound for both), while C is not bounded from above. The set S ⊂ R is said to be bounded from below if there is an l ∈ R such that s ≥ l for all s ∈ S (here, l is called a lower bound of S). In the example above, clearly A
and C are bounded from below (any real number less than or equal to 0 is a lower bound for both), while B
is not bounded from below. If the set S ⊂ R is bounded from both above and below, it is simply said to be
bounded. More directly, we can say that the set S is bounded if there exists a b ∈ R such that |s| ≤ b for
all s ∈ S. If S is not bounded it is said to be unbounded, even if it is bounded from either above or below
(but not both). In the example above, only A is bounded, while B and C are unbounded.
If a set S ⊂ R is bounded from above, its smallest upper bound is called the supremum of S and is denoted by sup S (another term for supremum is least upper bound). In the above example, sup A = sup B = 1. Since the set C is not bounded from above, it is convenient to write sup C = ∞. If a set S ⊂ R is bounded from below, its largest lower bound is called the infimum of S and is denoted by inf S (another term for infimum is greatest lower bound). In the above example, inf A = inf C = 0. Since the set B is not bounded from below, it is convenient to write inf B = −∞. If a set S ⊂ R has a largest element, we call that element the maximum of S and denote it by max S. In the above example, max A = 1 (this is also its supremum), while B and C have no maximum (notice that 1 ∉ B). If a set S ⊂ R has a smallest element, we call that element the minimum of S, and denote it by min S. In the above example, min A = 0 (this is also its infimum), while B and C have no minimum (notice that 0 ∉ C).
Functions
Given two sets A and B, a function f from A to B (written f : A → B) maps each element of A to a specific
element of B. That is, each a ∈ A is mapped to some f (a) ∈ B. It is important to distinguish between the
function f and the value f (a). We can think of f as a “process” that takes a as “input” and produces f (a)
as “output”. For example, consider the function f : {Red, Yellow, Green} → {Stop, Yield, Go} defined by
\[
f(\text{Colour}) = \begin{cases} \text{Stop} & \text{if Colour} = \text{Red}, \\ \text{Yield} & \text{if Colour} = \text{Yellow}, \\ \text{Go} & \text{if Colour} = \text{Green}. \end{cases}
\]
This function takes a colour as an input, and produces an action as an output, e.g., plugging the colour Red
into this function produces the action Stop as an output.
If f : A → B, the set A is referred to as the domain of f , and we say that f is defined on A (and on
any subset of A).2 The codomain of f is B, while the range of f is {f (a) ∈ B : a ∈ A} (i.e., the subset of
B containing only the elements of B that f can map to from A). We will sometimes denote the range of f
as f (A).
A function whose range is a subset of R is called a real-valued function. When we encounter a real-
valued function, we usually write its codomain as R, even if its range is some proper subset of R. For
example, consider the function f : R → R defined by f (x) = |x|. This function has range R+ , which is a
proper subset of its codomain R.
Taking the square root of both sides of the above produces the desired result.
The “Reverse Triangle Inequality” states that ||a| − |b|| ≤ |a − b| for all a, b ∈ R. You are asked to prove
this in Exercise 3.
For example, the sequence (an ) defined by an = n−2 has terms a1 = 1, a2 = 1/4, a3 = 1/9,
and so on.
A sequence (sn ) is said to converge to s ∈ R if, for every ϵ > 0, there exists an N ∈ N such that
|sn − s| < ϵ for all n > N . If (sn ) converges to s, we say that s is the limit of (sn ) and write lim sn = s. If
a sequence converges to some limit, then it is said to be convergent.
This definition needs some unpacking. Suppose (sn ) converges to s. What this is saying is that sn will
be “close” to s (less than ϵ away) provided that n is sufficiently large (i.e., larger than N , which depends on
ϵ).
For example, consider again the sequence (an ) defined by an = n−2 . It seems intuitive to say that
lim an = 0, but let’s try to relate this to the definition given above. Suppose, for the moment, that ϵ = 0.1.
Then, as Figure 1 makes clear, we have |n−2 − 0| < 0.1 for all n > 3 (all terms between the two dotted lines
satisfy this condition).
In general, we can choose ϵ as small as we want provided that we choose N large enough (in this
sense, we say that sn can be made “arbitrarily close” to s for sufficiently large n). A useful “trick” for
2 In the above example, f is defined on {Red, Yellow}, which is a subset of its domain. However, f is not defined on any set that contains elements outside {Red, Yellow, Green}.
Figure 1: The sequence (an ) defined by an = n−2
selecting the appropriate N corresponding to our choice of ϵ is to solve the equation |sN − s| = ϵ for N (if
the resulting value of N is not a natural number, then round down to the nearest one).3 For the sequence
(an ) considered above, we have |N −2 − 0| = ϵ. Since the quantity inside the absolute value function must
be positive (remember that N ∈ N meaning that N cannot be negative), this is equivalent to N −2 = ϵ or
N = ϵ−1/2 . Thus, with ϵ = 0.1, we have ϵ−1/2 ≈ 3.1623, so we set N = 3 as above. Similarly, with ϵ = 0.01,
we have ϵ−1/2 = 10, so we set N = 10 (notice that |11−2 − 0| ≈ 0.008 < 0.01).
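The following Python sketch (not part of the original notes) checks these choices of N numerically over a finite range of terms:

```python
def a(n):
    """Terms of the sequence a_n = n**(-2)."""
    return n ** -2

for eps in (0.1, 0.01):
    N = int(eps ** -0.5)               # eps**(-1/2), rounded down to a natural number
    # every term beyond N (up to a finite horizon) is within eps of the limit 0
    assert all(abs(a(n) - 0) < eps for n in range(N + 1, N + 5000))
    print(f"eps = {eps}: N = {N} works")
```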
A sequence that does not have a limit (i.e., that is not convergent) is said to be divergent. For example,
the sequence (bn ) defined by bn = (−1)n diverges since the terms of this sequence are
b1 = −1, b2 = 1, b3 = −1
and so on.
We say that a sequence (sn ) diverges to ∞ (and write lim sn = ∞) if, for every u > 0, there exists an N ∈ N such that sn > u for all n > N. For example, the sequence (cn ) defined by cn = √n diverges to ∞. To show this formally, let u > 0 and N = u². Then n > N implies n > u² and thus √n > u. Similarly,
we say that a sequence (sn ) diverges to −∞ (and write lim sn = −∞) if, for every l < 0, there exists an
N ∈ N such that sn < l for all n > N .
The sequence (sn ) is said to be bounded if {sn : n ∈ N} is a bounded set, i.e., if there exists a b ∈ R
such that |sn | ≤ b for all n ∈ N. For example, the sequence (an ) defined by an = n−2 is bounded (note that
sup{an : n ∈ N} = 1 and inf{an : n ∈ N} = 0, so we could say |an | ≤ 1 for all n ∈ N).
Clearly, any sequence that diverges to ∞ or −∞ is unbounded. However, it is important to recognize
that for any sequence (sn ), even for one that is unbounded, sn ∈ R for all n ∈ N, i.e., |sn | < ∞ for all n ∈ N.
In other words, saying that (sn ) diverges to ∞ or −∞ does not mean that sn is ever equal to ∞ or −∞
(remember that ∞ and −∞ are not real numbers). Instead, it just means that |sn | “grows without bound”.
The above discussion leads nicely into the following theorem:
Theorem 2. All convergent sequences are bounded.
Proof. Let (sn ) be a sequence converging to s ∈ R. Letting ϵ = 1, there exists an N such that |sn − s| < 1
for all n > N . Thus, by the triangle inequality, we have |sn | < |s| + 1 for all n > N , meaning that |sn | is
bounded for n > N . Of course, for any n ≤ N , there may be some term in the set {|s1 |, . . . , |sN |} which is
larger than |sn | (but still finite), so we write
\[
|s_n| \leq \max\{|s_1|, \ldots, |s_N|, |s| + 1\}
\]
for all n ∈ N.
As noted above, sequences that diverge to ∞ or −∞ are unbounded. However, some divergent sequences
are bounded. For example, the sequence (bn ) defined by bn = (−1)n is divergent but bounded since |bn | ≤ 1
for all n ∈ N.
The results in the next theorem will be used frequently:
Theorem 3 (Algebraic Limit Theorem). Let (sn ) and (tn ) be sequences converging to s and t, respectively.
Then,
(i) lim(asn ) = as, for all a ∈ R.
(ii) lim(sn + tn ) = s + t.
(iii) lim(sn tn ) = st.
(iv) lim(sn /tn ) = s/t, provided t ̸= 0.
Proof. (i) We consider only the case where a ̸= 0 (when a = 0, we have the sequence 0, 0, 0, . . . which
obviously converges to 0). Notice that |asn − as| = |a||sn − s|. Since ϵ is arbitrary, we know that there exists
an N ∈ N such that |sn − s| < ϵ/|a| for all n > N (if we can choose an Nϵ ∈ N such that |sn − s| < ϵ for all
n > Nϵ , then we can also choose an N such that |sn − s| < ϵ/|a| for all n > N ) . Thus, for any n > N , we
have
\[
|as_n - as| = |a||s_n - s| < |a| \cdot \frac{\epsilon}{|a|} = \epsilon.
\]
(ii) Notice that |(sn + tn ) − (s + t)| ≤ |sn − s| + |tn − t| by the Triangle Inequality. We know that there
exists an N1 ∈ N such that |sn − s| < ϵ/2 for all n > N1 , and an N2 ∈ N such that |tn − t| < ϵ/2 for all
n > N2 . Thus, for all n > max{N1 , N2 }, we have
\[
|(s_n + t_n) - (s + t)| \leq |s_n - s| + |t_n - t| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.
\]
(iii) Notice that
|sn tn − st| = |sn tn − stn + stn − st| ≤ |sn tn − stn | + |stn − st| = |tn ||sn − s| + |s||tn − t|
by the Triangle Inequality. Now consider the term |tn ||sn − s|. Since the sequence (tn ) is bounded (by
Theorem 2), we know there exists a b such that |tn | < b for all n ∈ N. Thus, we know that there exists an
N1 ∈ N such that |sn − s| < ϵ/(2b) for all n > N1 . Next consider the term |s||tn − t|, and assume for the
moment that s ̸= 0. We know that there exists an N2 ∈ N such that |tn − t| < ϵ/(2|s|) for all n > N2 . Thus,
for all n > max{N1 , N2 }, we have
\[
|s_n t_n - st| \leq |t_n||s_n - s| + |s||t_n - t| < b \cdot \frac{\epsilon}{2b} + |s| \cdot \frac{\epsilon}{2|s|} = \epsilon.
\]
All that remains is to consider the case where s = 0. Similar to above, we know that there exists an N3 ∈ N
such that |sn | < ϵ/b for all n > N3 . Thus, for all n > N3 , we have
\[
|s_n t_n| \leq |s_n||t_n| < \frac{\epsilon}{b} \cdot b = \epsilon.
\]
(iv) We just need to show that lim(1/tn ) = 1/t and then the result follows from (iii). Notice that
\[
\left| \frac{1}{t_n} - \frac{1}{t} \right| = \left| \frac{t - t_n}{t t_n} \right| = \frac{|t_n - t|}{|t t_n|} = \frac{1}{|t||t_n|}|t_n - t|.
\]
We know that there exists an N1 such that |tn − t| < |t|/2 for all n > N1 . By the Reverse Triangle
Inequality, we have
||t| − |tn || ≤ |t − tn | = |tn − t|.
Thus, since ||t| − |tn || ≥ |t| − |tn |, we have |t| − |tn | < |t|/2 for all n > N1 , which implies that |tn | > |t|/2, or
equivalently, that
\[
\frac{1}{|t_n|} < \frac{2}{|t|}
\]
for all n > N1 . We also know that there exists an N2 ∈ N such that |tn − t| < ϵ|t|2 /2 for all n > N2 . Thus,
for all n > max{N1 , N2 }, we have
\[
\left| \frac{1}{t_n} - \frac{1}{t} \right| = \frac{1}{|t||t_n|}|t_n - t| < \frac{2}{|t|^2} \cdot \frac{\epsilon |t|^2}{2} = \epsilon.
\]
It is worth noting here that lim sn /tn exists even if tn = 0 for some n < max{N1 , N2 }.
A sequence (sn ) is said to be non-decreasing if sn ≤ sn+1 for all n, and non-increasing if sn ≥ sn+1
for all n (if the inequalities are strict, then we say that the sequence is increasing rather than non-decreasing
or decreasing rather than non-increasing). A sequence that is either non-decreasing or non-increasing (or
both, in the case of a constant sequence) is called monotone. For example, the sequence (an ) defined by
an = n−2 is monotone (it is decreasing), while the sequence (bn ) defined by bn = (−1)n is not monotone.
We are now ready to prove one of the most important results we will encounter:
Theorem 4 (Monotone Convergence Theorem). If a sequence is monotonic and bounded, then it is conver-
gent.
Proof. Suppose the sequence (sn ) is monotone and bounded. More specifically, let’s say (sn ) is non-decreasing
(the non-increasing case can be handled analogously). Now, consider the set {sn : n ∈ N}. Since this set
is bounded, it has a supremum S ∈ R (by the completeness axiom), i.e., sn ≤ S for all n ∈ N. Intuitively,
lim sn = S, but let’s confirm this is indeed the case. For ϵ > 0, S − ϵ is not an upper bound of {sn : n ∈ N}
(S is the smallest upper bound of this set) and thus there exists an N such that sN > S − ϵ. Since (sn ) is
non-decreasing, sn ≥ sN for all n > N . Thus,
S − ϵ < sN ≤ sn ≤ S < S + ϵ
for all n > N (the last inequality above follows from the fact that ϵ > 0). Therefore, |sn − S| < ϵ for n > N ,
i.e., (sn ) converges to S.
Consider again the sequence (bn ) defined by bn = (−1)n . Although this sequence is bounded, it is not
monotone, so the MCT cannot be applied. Of course, this does not mean that a sequence must be monotone
in order to converge. For example, consider the sequence (dn ) defined by
\[
d_n = \frac{(-1)^n}{n},
\]
which has terms
\[
d_1 = -1, \quad d_2 = \frac{1}{2}, \quad d_3 = -\frac{1}{3},
\]
and so on. This sequence is not monotone, but it does converge to 0. To see this, let’s use our usual “trick”
and set
\[
\left| \frac{(-1)^N}{N} - 0 \right| = \epsilon,
\]
which is equivalent to
\[
\frac{1}{N} = \epsilon,
\]
or N = 1/ϵ. Thus, for all n > 1/ϵ, we have |dn − 0| < ϵ. To be concrete, set ϵ = 0.1 so that N = 10. As
Figure 2 makes clear, |dn − 0| < 0.1 for all n > 10 (e.g., |d11 − 0| ≈ 0.091).
Figure 2: The sequence (dn ) defined by dn = (−1)n /n
We will now consider an application of the MCT. Suppose (sn ) is a sequence, and let
\[
S_n = \sum_{k=1}^{n} s_k
\]
for each n ∈ N. Then (Sn ) is itself a sequence, which we call a sequence of partial sums. The terms in
(Sn ) are
S1 = s1 , S2 = s1 + s2 , S3 = s1 + s2 + s3 ,
and so on. If (Sn ) converges to S, then we say that the infinite series $\sum_{k=1}^{\infty} s_k$ converges to S and write $\sum_{k=1}^{\infty} s_k = S$. For example, consider again the sequence (an ) defined by an = n−2 , and let
\[
A_n = \sum_{k=1}^{n} a_k = \sum_{k=1}^{n} \frac{1}{k^2} = 1 + \frac{1}{4} + \frac{1}{9} + \cdots + \frac{1}{n^2}.
\]
Figure 3: The sequence (An ) defined by $A_n = \sum_{k=1}^{n} k^{-2}$
Notice that, for k ≥ 2, 1/k² < 1/(k(k − 1)) = 1/(k − 1) − 1/k, so An < 1 + (1 − 1/n) = 2 − 1/n, i.e., 2 is an upper bound for (An ) (and, obviously, 1 is a lower bound). Thus, by the MCT, (An ) converges.
We didn’t work out what lim An is, but it is not 2 (it is actually π 2 /6 = 1.64 . . .). Often, it’s enough just to
know whether or not a sequence/series converges.
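As a numerical aside (a sketch, not from the original notes), we can watch the partial sums An increase toward π²/6 while staying below the bound 2:

```python
import math

def A(n):
    """Partial sum A_n = sum_{k=1}^{n} 1/k**2."""
    return sum(1 / k**2 for k in range(1, n + 1))

for n in (1, 10, 100, 10_000):
    print(n, A(n))                     # increasing in n, always below 2
print("pi^2/6 =", math.pi**2 / 6)      # the actual limit, ~1.6449
```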
Notice that the sequence (An ) defined above can also be defined recursively as
\[
A_1 = 1, \qquad A_{n+1} = A_n + \frac{1}{(n+1)^2}.
\]
Another example of a recursively defined sequence is the well-known Fibonacci sequence (Fn ) defined by
\[
F_1 = F_2 = 1, \qquad F_{n+2} = F_{n+1} + F_n.
\]
Clearly, this sequence diverges to ∞. However, we can show that the sequence (Gn ) defined by Gn = Fn /2n
converges. First, notice that (Gn ) has terms
\[
G_1 = \frac{1}{2}, \quad G_2 = \frac{1}{4}, \quad G_3 = \frac{1}{4}, \quad G_4 = \frac{3}{16},
\]
and so on, and thus seems to be non-increasing (see also Figure 4). We can confirm this by noting that, for
n ≥ 3,
\[
G_n - G_{n+1} = \frac{F_n}{2^n} - \frac{F_{n+1}}{2^{n+1}} = \frac{2F_n - F_{n+1}}{2^{n+1}} = \frac{2F_n - (F_n + F_{n-1})}{2^{n+1}} = \frac{F_n - F_{n-1}}{2^{n+1}} > 0
\]
(since (Fn ) itself is increasing for n ≥ 3). This also makes it clear that 1/2 provides an upper bound for
(Gn ). Moreover, Gn > 0 for all n (both the numerator and denominator are positive for all n), i.e., 0 is a
lower bound for (Gn ). Thus, since (Gn ) is monotonic and bounded, the MCT tells us it converges.
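A short Python sketch (not part of the original notes; it assumes the initial conditions F1 = F2 = 1 given above) generates Gn = Fn /2n and checks the monotonicity and bounds that the MCT argument relies on:

```python
def G_terms(n_max):
    """Yield G_n = F_n / 2**n, where F_1 = F_2 = 1 and F_{n+2} = F_{n+1} + F_n."""
    a, b = 1, 1                        # F_1, F_2
    for n in range(1, n_max + 1):
        yield a / 2**n
        a, b = b, a + b                # advance the Fibonacci recursion

terms = list(G_terms(30))
print(terms[:5])                       # 0.5, 0.25, 0.25, 0.1875, 0.15625
# non-increasing, and bounded between 0 and 1/2
assert all(t1 >= t2 for t1, t2 in zip(terms, terms[1:]))
assert all(0 < t <= 0.5 for t in terms)
```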
Before moving on, let’s consider the series (Bn ) defined by
\[
B_n = \sum_{k=1}^{n} \alpha^k,
\]
where |α| < 1.
Figure 4: The sequence (Gn ) defined by Gn = Fn /2n where (Fn ) is the Fibonacci sequence
Notice that
\[
B_n - \alpha B_n = \alpha - \alpha^{n+1}
\]
(the intermediate terms cancel), so
\[
B_n = \frac{\alpha(1 - \alpha^n)}{1 - \alpha}.
\]
Thus, since lim αn = 0 (see Exercise 9), we have
\[
\lim B_n = \frac{\alpha}{1 - \alpha}
\]
by Theorem 3. It is worth noting that we can use this result to show that
\[
\lim \sum_{k=0}^{n} \alpha^k = \alpha^0 + \lim B_n = 1 + \frac{\alpha}{1 - \alpha} = \frac{1}{1 - \alpha}.
\]
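A quick numerical check of the geometric series (a sketch, not from the original notes), with α = 1/2 so that α/(1 − α) = 1:

```python
alpha = 0.5

def B(n):
    """Partial sum B_n = sum_{k=1}^{n} alpha**k."""
    return sum(alpha**k for k in range(1, n + 1))

for n in (1, 5, 10, 50):
    print(n, B(n))                                   # approaches alpha/(1 - alpha) = 1
print("closed form:", alpha * (1 - alpha**50) / (1 - alpha))  # matches B(50)
```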
* Subsequences
Let m1 , m2 , m3 . . . be natural numbers with m1 < m2 < m3 . . ., which implies that mn ≥ n for all n ∈
N. Given the sequence (sn ), the sequence (smn ), i.e., the sequence with nth term smn , is said to be a
subsequence of (sn ).
As an example, consider the sequence (en ) defined by
\[
e_n = (-1)^n \left( 1 + \frac{1}{n} \right),
\]
which has terms
\[
e_1 = -2, \quad e_2 = \frac{3}{2}, \quad e_3 = -\frac{4}{3}, \quad e_4 = \frac{5}{4},
\]
and so on. One possible subsequence of (en ) can be constructed by setting mn = 2n for all n ∈ N, i.e.,
letting
\[
e_{m_1} = e_2 = \frac{3}{2}, \quad e_{m_2} = e_4 = \frac{5}{4}, \quad e_{m_3} = e_6 = \frac{7}{6}
\]
and so on. The terms in this subsequence consist of only the positive terms in the original sequence (en ). In
this sense, a subsequence can be viewed as being obtained by “deleting” certain terms from a sequence (in
this example, we deleted all the negative terms from (en )). Of course, we could also obtain a subsequence
from (en ) by deleting all its positive terms (i.e., setting mn = 2n − 1 for all n ∈ N). Alternatively, we could
obtain a subsequence from (en ) by deleting the first k terms (i.e., setting mn = n + k for all n ∈ N).
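The three subsequences just described are easy to generate in Python (a sketch, not from the original notes):

```python
def e(n):
    """Terms of the sequence e_n = (-1)**n * (1 + 1/n)."""
    return (-1)**n * (1 + 1/n)

idx = range(1, 7)
positive = [e(2*n) for n in idx]       # m_n = 2n: only the positive terms
negative = [e(2*n - 1) for n in idx]   # m_n = 2n - 1: only the negative terms
shifted  = [e(n + 3) for n in idx]     # m_n = n + k with k = 3: drop the first 3 terms

print(positive)                        # 1.5, 1.25, 1.1666..., ...
print(negative)                        # -2.0, -1.3333..., -1.2, ...
print(shifted)
```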
The following lemma will be used to prove a useful theorem about subsequences (a lemma is a theorem
that is not so interesting itself, but is instead used to prove other theorems):
Lemma 1. Every sequence contains a monotone subsequence.
Proof. We will say that sm is a dominant term in the sequence (sn ) if sm ≥ sn for all n > m. We now
consider two cases:
Case 1: Suppose (sn ) has infinitely many dominant terms. Then we can construct a subsequence (smn ) consisting only of these dominant terms, and this subsequence will be non-increasing (i.e., monotone) since
sm1 ≥ sm2 ≥ sm3 ≥ . . ..
Case 2: Suppose (sn ) has finitely many (possibly zero) dominant terms. Let sm1 be the first term in
(sn ) after its final dominant term (if (sn ) has zero dominant terms, then m1 = 1). Since sm1 is not itself
dominant, there exists an m2 > m1 such that sm2 > sm1 . Similarly, since sm2 is not dominant, there exists
an m3 > m2 such that sm3 > sm2 , and so on. Thus, the subsequence (smn ) will be increasing (i.e., monotone)
since sm1 < sm2 < sm3 . . ..
Notice that the sequence (en ) defined above has infinitely many dominant terms, namely e2 , e4 , e6 , . . ., and
the subsequence that contains only these terms is monotone (it is decreasing). Similarly, the sequence (bn )
defined above has infinitely many dominant terms, namely b2 , b4 , b6 , . . ., and the subsequence that contains
only these terms is monotone (it is constant). On the other hand, the sequence (cn ) defined above has no
dominant terms since cn > cm for all n > m, and the subsequence (cmn ) that has nth term cmn = cn is
monotone (it is increasing).
We are now ready to prove the following:
Theorem 5 (Bolzano–Weierstrass Theorem). Every bounded sequence contains a convergent subsequence.
Proof. If (sn ) is a bounded sequence, then any subsequence of (sn ) is bounded. Thus by Lemma 1, (sn )
contains a monotone bounded subsequence, and by the MCT, this subsequence is convergent.
Notice that the sequence (en ) defined above is bounded (a lower bound is -2 and an upper bound is
3/2). The BWT says that this sequence must contain a convergent subsequence, even though it is not itself
convergent. One such subsequence is the one with terms e2 , e4 , e6 , . . . (which clearly converges to 1); another has the terms e1 , e3 , e5 , . . . (which clearly converges to −1). The limit of a convergent subsequence is called a
subsequential limit (e.g., 1 and -1 are both subsequential limits of the sequence (en ) given above).
Before moving on, we will prove another lemma about subsequences that will be used later on:
Lemma 2. Every subsequence of a convergent sequence has the same limit.
Proof. Let (sn ) be a sequence converging to s, i.e., for every ϵ > 0, there exists an N ∈ N such that |sn − s| < ϵ
for all n > N . Now, let (smn ) be an arbitrary subsequence of (sn ). Since mn ≥ n for all n ∈ N, we have
|smn − s| < ϵ for all n > N , and thus lim smn = s.
Recall that the sequence (en ) defined above has more than one subsequential limit. Thus, Lemma 2 tells
us that it is not convergent.
Suppose (sn ) is a bounded sequence. Defining the sequence (s̄n ) by s̄n = sup{sk : k ≥ n}, the limit superior of (sn ) is
\[
\limsup s_n = \lim \bar{s}_n.
\]
Similarly, defining the sequence (s̲n ) by s̲n = inf{sk : k ≥ n}, the limit inferior of (sn ) is
\[
\liminf s_n = \lim \underline{s}_n.
\]
Both of these limits exist, even if (sn ) is divergent. To see this, notice the sequence (s̄n ) has terms
s̄1 = sup{s1 , s2 , s3 , . . .}, s̄2 = sup{s2 , s3 , s4 , . . .}, s̄3 = sup{s3 , s4 , s5 , . . .}
and so on. Since
{sn , sn+1 , sn+2 , . . .} ⊃ {sn+1 , sn+2 , sn+3 , . . .}
for all n ∈ N, we have s̄n ≥ s̄n+1 for all n ∈ N. That is, (s̄n ) is non-increasing. Moreover, (s̄n ) is bounded
since (sn ) is bounded. Thus, (s̄n ) converges by the MCT. Similarly, the sequence (s̲n ) is non-decreasing and
bounded, and thus converges by the MCT.
Let’s now consider several examples. First consider the sequence (an ) defined above. We have s̄n = an and s̲n = 0 for all n ∈ N, so lim sup an = 0 and lim inf an = 0. Next consider the sequence (bn ) defined above. We have b̄n = 1 and b̲n = −1 for all n ∈ N, so lim sup bn = 1 and lim inf bn = −1. Finally, consider the sequence (en ) defined above. The sequence (ēn ) has terms
\[
\bar{e}_1 = \bar{e}_2 = \frac{3}{2}, \quad \bar{e}_3 = \bar{e}_4 = \frac{5}{4}, \quad \bar{e}_5 = \bar{e}_6 = \frac{7}{6},
\]
and so on, which makes it easy to see that lim ēn = 1, i.e., lim sup en = 1. Similarly, the sequence (e̲n ) has terms
\[
\underline{e}_1 = -2, \quad \underline{e}_2 = \underline{e}_3 = -\frac{4}{3}, \quad \underline{e}_4 = \underline{e}_5 = -\frac{6}{5},
\]
and so on, which makes it easy to see that lim e̲n = −1, i.e., lim inf en = −1.
As the first and third examples above make clear, it is possible that sn > lim sup sn for some n ∈ N, i.e., lim sup sn may be less than sup{sn : n ∈ N}.
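Since s̄n and s̲n are suprema and infima of tails, they are easy to approximate numerically by truncating each tail at a finite horizon. A Python sketch (not from the original notes):

```python
def tail_sup_inf(s, n, horizon=10_000):
    """Approximate (sup, inf) of the tail {s(k) : k >= n},
    truncated at a finite horizon."""
    tail = [s(k) for k in range(n, horizon)]
    return max(tail), min(tail)

e = lambda n: (-1)**n * (1 + 1/n)      # the sequence (e_n) from above

for n in (1, 2, 3, 4):
    print(n, tail_sup_inf(e, n))       # sups: 1.5, 1.5, 1.25, 1.25; infs: -2, -4/3, -4/3, -1.2
```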
We will now prove a useful lemma:
Lemma 3. The limit superior and limit inferior of a bounded sequence are subsequential limits of that
sequence.
Proof. We will prove that every bounded sequence contains a subsequence that converges to its limit superior. The proof that every bounded sequence contains a subsequence that converges to its limit inferior is similar.
Suppose (sn ) is a bounded sequence with limit superior s̄. As in the proof of Lemma 1, we need to
consider two cases:
Case 1: Suppose (sn ) has infinitely many dominant terms. Then the subsequence (smn ) that has only
these dominant terms has nth term smn = s̄mn . Since lim s̄n = s̄, we have lim smn = s̄.
Case 2: Suppose (sn ) has finitely many (possibly zero) dominant terms. Letting sm1 be the first term in
(sn ) after its final dominant term, we have s̄mn ≤ s̄ for all n ∈ N. Since sm1 is not itself dominant, there
exists an m2 > m1 such that sm2 > sm1 , and so on, which means that (smn ) is increasing. Finally, since s̄
is an upper bound on (smn ), for any ϵ > 0, there exists an N such that smn > s̄ − ϵ, and hence |smn − s̄| < ϵ, for all n > N. Thus, lim smn = s̄.
Theorem 6. A bounded sequence (sn ) converges if and only if lim sup sn = lim inf sn .
Proof. If (sn ) converges, then every subsequence of (sn ) has the same limit (by Lemma 2), and so, since lim sup sn and lim inf sn are subsequential limits (by Lemma 3), lim sup sn = lim inf sn . Conversely, suppose lim sup sn = lim inf sn , and let S denote the set of subsequential limits of (sn ). Consider any t ∈ S, i.e., the limit of some convergent subsequence (smn ) of (sn ). Notice that {smk : k ≥ n} ⊂ {sk : k ≥ n} for all n ∈ N, which means that
\[
\sup\{s_{m_k} : k \geq n\} \leq \sup\{s_k : k \geq n\}
\]
for all n ∈ N. Thus, t ≤ s̄, i.e., sup S = s̄. A similar argument can be used to show that inf S = s̲. Further, since s̄, s̲ ∈ S by Lemma 3, we can say that lim sup sn = max S and lim inf sn = min S. Since lim sup sn = lim inf sn , this means that S is a singleton. Thus, by Lemma 2, (sn ) converges.
To illustrate this, consider again the sequence (dn ) defined above. The sequence (d̄n ) has terms
\[
\bar{d}_1 = \bar{d}_2 = \frac{1}{2}, \quad \bar{d}_3 = \bar{d}_4 = \frac{1}{4}, \quad \bar{d}_5 = \bar{d}_6 = \frac{1}{6},
\]
and so on, which makes it easy to see that lim d̄n = 0, i.e., lim sup dn = 0. Similarly, the sequence (d̲n ) has terms
\[
\underline{d}_1 = -1, \quad \underline{d}_2 = \underline{d}_3 = -\frac{1}{3}, \quad \underline{d}_4 = \underline{d}_5 = -\frac{1}{5},
\]
and so on, which makes it easy to see that lim d̲n = 0, i.e., lim inf dn = 0. Since lim sup dn = lim inf dn = 0, Theorem 6 tells us that lim dn = 0 (as we saw earlier).
* Cauchy Sequences
A sequence (sn ) is called a Cauchy sequence if, for every ϵ > 0, there exists an N ∈ N such that m, n > N implies |sn − sm | < ϵ.
Suppose (sn ) is a Cauchy sequence. What this says is that sn and sm will be “close” to one another (i.e.,
they will be no more than ϵ apart) provided n and m are both “sufficiently large” (i.e., larger than N , which
depends on ϵ).
Consider again the sequence (an ) defined by an = n−2 . To see that this is a Cauchy sequence, let
N = ϵ−1/2 and notice that, for n, m > ϵ−1/2 , we have |n−2 − m−2 | < ϵ. Why? Suppose (without loss of
generality) that m > n. This implies that n−2 > m−2 and thus |n−2 − m−2 | = n−2 − m−2 < ϵ − m−2 < ϵ
(the second to last inequality follows from the fact that n−2 < ϵ when n > ϵ−1/2 , and the last inequality
follows from the fact that m−2 > 0).
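A brute-force check of the Cauchy condition for an = n−2 (a sketch, not from the original notes; it can only test finitely many pairs):

```python
def a(n):
    return n ** -2

eps = 0.1
N = int(eps ** -0.5)                   # eps**(-1/2) rounded down; here N = 3
# check |a_n - a_m| < eps for all pairs with N < n, m <= 200
ok = all(abs(a(n) - a(m)) < eps
         for n in range(N + 1, 201)
         for m in range(N + 1, 201))
print(ok)                              # True
```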
The following two lemmas will be used in proving an important theorem about Cauchy sequences:
Lemma 4. All convergent sequences are Cauchy sequences.
Proof. Let (sn ) be a convergent sequence with limit s, i.e., for all ϵ > 0, there exists an N ∈ N such that |sn − s| < ϵ for all n > N. Since this holds for all ϵ > 0, we can also say there exists an N ∈ N such that |sn − s| < ϵ/2 for all n > N.
Now, from the triangle inequality, we have
|sn − sm | ≤ |sn − s| + |s − sm |
(to see this, let a = sn − s and b = s − sm and recall that |a + b| ≤ |a| + |b| for all a, b ∈ R). Combining this
with the above fact that |sn − s| < ϵ/2 for all n > N (and, equivalently, that |sm − s| < ϵ/2 for all m > N ),
we have
|sn − sm | < ϵ
for all n, m > N .
Lemma 5. All Cauchy sequences are bounded.
Proof. If (sn ) is a Cauchy sequence then, taking ϵ = 1, there exists an N ∈ N such that |sn − sm | < 1 for all n, m > N, which means that |sn − sN +1 | < 1 for all n > N (obviously N + 1 > N). Now, from the triangle inequality, we have
\[
|s_n| \leq |s_n - s_{N+1}| + |s_{N+1}|
\]
(to see this, let a = sn − sN +1 and b = sN +1 and recall that |a + b| ≤ |a| + |b| for all a, b ∈ R). Combining this with the fact that |sn − sN +1 | < 1 for all n > N means that |sn | < |sN +1 | + 1 for all n > N. Thus, letting b = max{|s1 |, . . . , |sN |, |sN +1 | + 1}, we have |sn | ≤ b for all n ∈ N, i.e., (sn ) is bounded.
Theorem 7. A sequence converges if and only if it is a Cauchy sequence.
Proof. Lemma 4 establishes that all convergent sequences are Cauchy sequences. For the other direction, suppose (sn ) is a Cauchy sequence. By Lemma 5, (sn ) is bounded, and so, by the BWT, it has a subsequence (smn ) that converges to some s∗ . Let ϵ > 0. Since (sn ) is Cauchy, there exists an N such that |sn − sm | < ϵ/2 for all n, m > N, and since lim smn = s∗ , there exists a K > N of the form K = mn for some n such that |sK − s∗ | < ϵ/2. Thus, for all n > N,
\[
|s_n - s^*| \leq |s_n - s_K| + |s_K - s^*| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon
\]
(the first inequality follows from the triangle inequality; to see this, let a = sn − sK and b = sK − s∗ and recall that |a + b| ≤ |a| + |b| for all a, b ∈ R). Therefore, lim sn = s∗ , i.e., (sn ) converges.
Convex Functions
The function f : I → R is said to be convex if f (tx + (1 − t)y) ≤ tf (x) + (1 − t)f (y) for all x, y ∈ I and all t ∈ (0, 1). For example, let’s show that the function f : (0, ∞) → R defined by f (x) = 1/x is convex. Substituting into the definition, we need to show that
\[
\frac{1}{tx + (1-t)y} \leq \frac{t}{x} + \frac{1-t}{y} = \frac{ty + (1-t)x}{xy}.
\]
Since x, y ∈ (0, ∞) and t ∈ (0, 1), the denominator on each side of the above is positive, so we can write
\[
xy \leq (tx + (1-t)y)(ty + (1-t)x) = \left( t^2 + (1-t)^2 \right) xy + t(1-t)(x^2 + y^2).
\]
Figure 5: The function f : (0, ∞) → R defined by f (x) = 1/x
Since t ∈ (0, 1), this is equivalent to 0 ≤ (x − y)2 , which is true for any x, y ∈ R. Figure 5 depicts this for
the case where x = 1/2, y = 3/2, and t = 1/2. Notice that
\[
f(tx + (1-t)y) = f\left( \frac{1}{2} \times \frac{1}{2} + \frac{1}{2} \times \frac{3}{2} \right) = f(1) = 1,
\]
while tf (x) + (1 − t)f (y) = (1/2)(2) + (1/2)(2/3) = 4/3 > 1.
The downward-sloping dashed line shows that this ordering generalizes to other values of t ∈ (0, 1).
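We can also spot-check the convexity inequality for f (x) = 1/x at randomly sampled points (a sketch, not from the original notes):

```python
import random

f = lambda x: 1 / x

random.seed(0)
for _ in range(1000):
    x, y = random.uniform(0.01, 10), random.uniform(0.01, 10)
    t = random.uniform(0.001, 0.999)
    # convexity: f(tx + (1-t)y) <= t f(x) + (1-t) f(y), up to rounding error
    assert f(t*x + (1 - t)*y) <= t*f(x) + (1 - t)*f(y) + 1e-12
print("convexity inequality held at all sampled points")
```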
Some Topology of R
Let ϵ ∈ (0, ∞). The ϵ-neighbourhood of x ∈ R is Vϵ (x) = (x − ϵ, x + ϵ). The set Vϵ (x) \ {x} = (x − ϵ, x) ∪
(x, x + ϵ) is called the deleted ϵ-neighbourhood of x.
The point x ∈ R is an interior point of the set A ⊂ R if there exists a Vϵ (x) ⊂ A. Note that an interior
point of A must be in A, but this does not mean that every point in A is an interior point of A. For example,
the points a and b are not interior points of [a, b] since, for every ϵ > 0, Vϵ (a) \ [a, b] = (a − ϵ, a) ̸= ∅ and
Vϵ (b) \ [a, b] = (b, b + ϵ) ̸= ∅ (i.e., every Vϵ (a) and Vϵ (b) contain points outside [a, b]). On the other hand,
every point in (a, b) is an interior point of [a, b]. To see this, let x ∈ (a, b). We need to show that there is
some Vϵ (x) ⊂ [a, b]. There are two cases to consider. First, if x > (a + b)/2 (i.e., if x closer to b than a), then
Vb−x (x) = (2x − b, b) ⊂ [a, b] (since 2x − b > a). Second, if x ≤ (a + b)/2, then Vx−a (x) = (a, 2x − a) ⊂ [a, b]
(since 2x − a ≤ b). Similar arguments could be used to show that a and b are not interior points of (a, b),
but that every point in (a, b) is an interior point of (a, b). Finally, it should be clear that every point in R is
an interior point of R.
The point x ∈ R is a boundary point of the set A ⊂ R if every Vϵ (x) contains points in both A and
R \ A. Note that a boundary point of A may not be in A. For example, a is a boundary point of (a, b) since,
for every ϵ > 0, Vϵ (a) ∩ (a, b) = (a, min{a + ϵ, b}) ̸= ∅ and Vϵ (a) ∩ (R \ (a, b)) = (a − ϵ, a] ̸= ∅ (i.e., every Vϵ (a)
contains points inside (a, b) and outside (a, b)). Similarly, b is a boundary point of (a, b) since, for every
ϵ > 0, Vϵ (b) ∩ (a, b) = (max{a, b − ϵ}, b) ̸= ∅ and Vϵ (b) ∩ (R \ (a, b)) = [b, b + ϵ) ̸= ∅. On the other hand, no point in (a, b) is a boundary point of (a, b). To see this, let x ∈ (a, b). We need to show that there is some Vϵ (x) that does not contain any points outside (a, b). There are again two cases to consider. If x > (a + b)/2, then Vb−x (x) = (2x − b, b) ⊂ (a, b) and thus Vb−x (x) ∩ (R \ (a, b)) = ∅; if x ≤ (a + b)/2, then Vx−a (x) = (a, 2x − a) ⊂ (a, b) and thus Vx−a (x) ∩ (R \ (a, b)) = ∅. Similar arguments could be used to show that a and b are boundary points of [a, b], but that no point in (a, b) is a boundary point of [a, b]. Finally, it should be clear that R has no boundary points.
The point x ∈ R is a limit point of the set A ⊂ R if every (Vϵ (x) \ {x}) ∩ A ̸= ∅ (i.e., if every Vϵ (x)
contains at least one point in A other than x).4 Note that a limit point of A need not be an element of A. For
example, every point in [a, b] is a limit point of (a, b). First, the points a and b are limit points of (a, b) since, for every ϵ > 0, (Vϵ (a) \ {a}) ∩ (a, b) = (a, min{a + ϵ, b}) ̸= ∅ and (Vϵ (b) \ {b}) ∩ (a, b) = (max{a, b − ϵ}, b) ̸= ∅. Second, any point x ∈ (a, b) is a limit point of (a, b) since, for every ϵ > 0, (Vϵ (x) \ {x}) ∩ (a, b) = (max{a, x − ϵ}, min{b, x + ϵ}) \ {x} ̸= ∅. A similar argument could be used to show that every point in [a, b] is a limit
point of [a, b]. Finally, it should be clear that every point in R is a limit point of R.
At this point you might be thinking that the limit points of a set are just the union of its interior points
and its boundary points. This is true for intervals in R, but it is not true more generally. For example, the
finite set {0, 1, 2} has no limit points at all. The points 0, 1 and 2 are isolated points in this set, i.e., points
in the set which are not limit points of the set (also, the points 0 and 2 are boundary points of this set, while
the point 1 is an interior point of this set).
A set A ⊂ R is said to be open if every point in A is an interior point of A. As we just saw, the open
interval (a, b) is an open set (hence the name). On the other hand, the closed interval [a, b] is not an open
set since a and b are not interior points of it. Note also that the union of two open sets is an open set. To see this, consider any point c ∈ A ∪ B, where A and B are open sets. This means c ∈ A or c ∈ B (or both), so there must exist a Vϵ (c) ⊂ A or a Vϵ (c) ⊂ B (or both). Thus, Vϵ (c) ⊂ A ∪ B. Similar arguments could be used to show that the intersection of two open sets is an open set.
A set A ⊂ R is said to be closed if every boundary point of A is a point in A. Not surprisingly, the
closed interval [a, b] is a closed set since a, b ∈ [a, b]. On the other hand, the open interval (a, b) is not closed
since a, b ∈
/ (a, b). You should try to convince yourself that the union of two closed sets is closed, and that
the intersection of two closed sets is closed.
A surprising result is that R is both open and closed (it is “clopen”). It is open since, for every x ∈ R,
we have Vϵ (x) ⊂ R for any ϵ. It is closed since it has no boundary points and thus it is vacuously true that
it contains all of its boundary points. It may also be surprising that a set can be neither open nor closed.
For example, consider the half-open interval (a, b]. This set is not open since there exists no Vϵ (b) contained in it. It is also not closed since a is a boundary point of it but a ∉ (a, b]. Similarly, [a, b) is neither open nor closed.
On the other hand, (−∞, b] and [a, ∞) are closed since they include all their boundary points (b and a,
respectively).
Functional Limits
Let A ⊂ R and c be a limit point of A. We say f : A → R has limit L at c if, for any ϵ > 0, there
exists a δ > 0 such that |f (x) − L| < ϵ for all points x ∈ A satisfying 0 < |x − c| < δ. This is written as
limx→c f (x) = L. Alternatively, we may say that f converges to L at c.
We can also write this definition using the language of neighbourhoods. Specifically, we say that f has
limit L at c if, for any ϵ > 0, there exists a δ > 0 such that f (x) ∈ Vϵ (L) for any x ∈ (Vδ (c) \ {c}) ∩ A.
What limx→c f (x) = L means is that we can make f (x) “arbitrarily close” to L (i.e., within
the ϵ-neighbourhood of L) by making x “sufficiently close”, but not equal, to c (i.e., within the deleted
δ-neighbourhood of c). The key is that ϵ is arbitrary while δ will, in general, depend on ϵ. Accordingly, ϵ
can be seen to play the same role that it did in the definition of a limit of a sequence, while δ plays a similar
role to that of N .
4 A limit point is also called a cluster point or an accumulation point.
Figure 6: The function f : R+ → R defined by f (x) = √x
For example, let’s show formally that the function f : R+ → R defined by f (x) = √x has limit 2 at 4. This means that we need to find a δ > 0 such that x ∈ R+ and 0 < |x − 4| < δ imply |√x − 2| < ϵ for any ϵ > 0. Notice that
\[
|\sqrt{x} - 2| = \frac{|\sqrt{x} - 2|(\sqrt{x} + 2)}{\sqrt{x} + 2} = \frac{|x - 4|}{\sqrt{x} + 2} < \frac{|x - 4|}{2}
\]
(the inequality at the end follows from the fact that the denominator on the left is greater than the denom-
inator on the right). The above suggests that we could set δ = 2ϵ (this is a very “conservative” choice of δ).
To see that this “works”, note that 0 < |x − 4| < 2ϵ implies that
\[
\frac{|x - 4|}{2} < \epsilon.
\]
We saw above that the quantity on the left is larger than |√x − 2|, so |√x − 2| < ϵ as desired. To be concrete, suppose we want ϵ = 0.1 and thus set δ = 0.2. Now, consider a point that is within 0.2 of 4, say 3.81 (notice that |3.81 − 4| = 0.19 < 0.2). Since |√3.81 − 2| ≈ 0.048 < 0.1, we are safe. Indeed, as Figure 6 makes clear,
for any x ∈ (3.8, 4.2) (i.e., any x between the vertical dotted lines), we have f (x) ∈ (1.9, 2.1) (i.e., any f (x)
between the horizontal dotted lines).
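The choice δ = 2ϵ is easy to stress-test numerically (a sketch, not from the original notes):

```python
import math
import random

eps = 0.1
delta = 2 * eps                        # the "conservative" choice derived above

random.seed(1)
for _ in range(10_000):
    x = random.uniform(4 - delta, 4 + delta)
    if 0 < abs(x - 4) < delta:
        assert abs(math.sqrt(x) - 2) < eps
print("|sqrt(x) - 2| < 0.1 whenever 0 < |x - 4| < 0.2")
```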
It is important to emphasize that c need not be in A, which means that f need not be defined at c (this is what makes limits interesting!). For example, consider the function f : [0, 4) ∪ (4, ∞) → R defined by
\[
f(x) = \frac{\sqrt{x} - 2}{x - 4}
\]
(see Figure 7; the open dot emphasizes that f (4) is undefined). This function is not defined at 4, but 4 is a
limit point of its domain. We will now show that limx→4 f (x) = 1/4. First, note that
\[
\left| \frac{\sqrt{x} - 2}{x - 4} - \frac{1}{4} \right| = \frac{|x - 4|}{4(\sqrt{x} + 2)^2} < \frac{|x - 4|}{4}
\]
Figure 7: The function f : [0, 4) ∪ (4, ∞) → R defined by f (x) = (√x − 2)/(x − 4)
The above suggests that we can set δ = 4ϵ, which guarantees that |f (x) − 1/4| < ϵ whenever 0 < |x − 4| < δ (as in the previous example, this is a very “conservative” choice of δ).
Let A ⊂ R and f : A → R. If c is a limit point of the set {x ∈ A : x > c}, then we say that f has
right-hand limit L at c (and write limx→c+ f (x) = L) if, for any ϵ > 0, there exists a δ > 0 such that
|f (x) − L| < ϵ for all points x ∈ A satisfying 0 < x − c < δ (notice that the inequality on the left in this
condition implies that x > c). Similarly, if c is a limit point of the set {x ∈ A : x < c}, then we say that f
has left-hand limit L at c (and write limx→c− f (x) = L) if, for any ϵ > 0, there exists a δ > 0 such that
|f (x) − L| < ϵ for all points x ∈ A satisfying 0 < c − x < δ (notice that the inequality on the left in this
condition implies that x < c). The following result should be quite intuitive from these definitions:
Theorem 8. Let A ⊂ R, f : A → R, and c be a limit point of both {x ∈ A : x > c} and {x ∈ A : x < c}.
Then limx→c f (x) = L if and only if limx→c+ f (x) = limx→c− f (x) = L.
Proof. By definition, if f has limit L, then the left-hand and right-hand limits of f will both be equal to L.
Conversely, if the left-hand and right-hand limits of f both equal L, then for any ϵ > 0, there exists a δ1 > 0
such that |f (x) − L| < ϵ for all points x ∈ A satisfying 0 < x − c < δ1 and a δ2 such that |f (x) − L| < ϵ for
all points x ∈ A satisfying 0 < c − x < δ2 . Setting δ = min{δ1 , δ2 } we thus have |f (x) − L| < ϵ for all points
x ∈ A satisfying 0 < |x − c| < δ, i.e., f has limit L.
Notice that Theorem 8 does not apply to the endpoints of an interval. For example, if the domain of the function f is (a, b), then neither limx→a− f (x) nor limx→b+ f (x) exists, but it is possible for limx→a f (x) and limx→b f (x) to exist. In particular, if limx→a+ f (x) exists, then limx→a f (x) = limx→a+ f (x), and if limx→b− f (x) exists, then limx→b f (x) = limx→b− f (x). An example is given in Exercise 13.
A function that does not converge at some limit point of its domain is said to diverge at that point. For example, if limx→c+ f (x) and limx→c− f (x) both exist but limx→c+ f (x) ̸= limx→c− f (x), then f diverges at c (by Theorem 8). As an example, consider the function f : R → R defined by
\[
f(x) = \begin{cases} 1 + x & \text{if } x \geq 0 \\ x & \text{if } x < 0, \end{cases}
\]
or more compactly, f (x) = x + max{0, x}/x (see Figure 8; the open dot emphasizes that f (0) ̸= 0). First,
notice that 0 is a limit point of both (0, ∞) and (−∞, 0). To see that limx→0+ f (x) = 1, set δ = ϵ so that
Figure 8: The function f : R → R defined by f (x) = x + max{0, x}/x
|f (x) − 1| = x < ϵ whenever 0 < x < δ. To see that limx→0− f (x) = 0, set δ = ϵ so that |f (x) − 0| = −x < ϵ whenever 0 < −x < δ (or equivalently, x ∈ (−δ, 0)). Thus, we can say that f diverges at 0.
Let A ⊂ R and c be a limit point of A. We say f : A → R diverges to ∞ at c (and write limx→c f (x) = ∞)
if, for every α ∈ R, there exists a δ > 0 such that f (x) > α for all points x ∈ A satisfying 0 < |x − c| < δ. We
similarly say that f diverges to −∞ at c (and write limx→c f (x) = −∞) if, for every β ∈ R, there exists a
δ > 0 such that f (x) < β for all points x ∈ A satisfying 0 < |x − c| < δ.
For example, we can say that the function f : (−∞, 0) ∪ (0, ∞) → R defined by f (x) = 1/x² diverges to ∞ at 0. To see this, set δ = 1/√α. Then, for 0 < |x| < 1/√α, we have 1/|x| > √α, or equivalently, 1/x² > α.
More concretely, suppose α = 100 and pick some x such that 0 < |x| < 0.1, say x = 0.09. We then have
1/x2 ≈ 123.5 > 100 (see Figure 9). The general idea here is that, for any x in the deleted δ-neighbourhood
of c, we have f (x) > α (think of α as a large positive number and δ as depending on α).
We can combine the above definitions to write things like limx→c+ f (x) = ∞ (f diverges to ∞ from the
right), limx→c− f (x) = ∞ (f diverges to ∞ from the left), limx→c+ f (x) = −∞ (f diverges to −∞ from
the right), or limx→c− f (x) = −∞ (f diverges to −∞ from the left). For example, consider the function
f : (−∞, 0) ∪ (0, ∞) → R defined by f (x) = 1/x. It should be clear from Figure 10 that limx→0+ f (x) = ∞
and limx→0− f (x) = −∞ (Figure 10 should also make it clear how foolish it would be to write 1/0 = ∞).
Let A ⊂ R and f : A → R. If A is unbounded from above, we write limx→∞ f (x) = L if, for every
ϵ > 0, there exists an a ∈ R such that |f (x) − L| < ϵ for all points x ∈ A satisfying x > a. Similarly, if A
is unbounded from below, we write limx→−∞ f (x) = L if, for every ϵ > 0, there exists a b ∈ R such that
|f (x) − L| < ϵ for all points x ∈ A satisfying x < b.
For example, consider again the function f : (−∞, 0) ∪ (0, ∞) → R defined by f (x) = 1/x². To see that limx→∞ f (x) = 0, set a = 1/√ϵ. Then, for any x > 1/√ϵ, we have |1/x²| < ϵ. Concretely, let ϵ = 0.01 so that a = 10. Now, for any x > 10, we have |1/x²| < ϵ (e.g., with x = 11, we have |1/x²| ≈ 0.008). You should try to convince yourself that limx→−∞ f (x) = 0.
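A quick numerical check of the threshold a = 1/√ϵ (a sketch, not from the original notes):

```python
f = lambda x: 1 / x**2

eps = 0.01
a = eps ** -0.5                        # a = 1/sqrt(eps) = 10
xs = [a + k for k in range(1, 1000)]   # sample points with x > a
assert all(abs(f(x)) < eps for x in xs)
print(f"|1/x^2| < {eps} for all sampled x > {a}")
```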
Continuity
Let A ⊂ R and c ∈ A. We say f : A → R is continuous at c if, for any ϵ > 0, there exists a δ > 0 such that
|f (x) − f (c)| < ϵ for all points x ∈ A satisfying |x − c| < δ (notice that this allows for x = c). Alternatively,
Figure 9: The function f : (−∞, 0) ∪ (0, ∞) → R defined by f (x) = 1/x2
we may say that f is continuous at c if, for any ϵ > 0, there exists a δ such that f (A ∩ Vδ (c)) ⊂ Vϵ (f (c)). If
f is continuous at every point in B ⊂ A, we say that it is continuous on the set B.
It is useful to note that, if c is a limit point of A (i.e., so long as c is an element of A but is not an
isolated point of A), then we can say that f is continuous at c if and only if limx→c f (x) = f (c). On the
other hand, if c is an isolated point of A, then f is automatically continuous at c. The reason for this is
that, since c is an isolated point of A, there exists some δ such that Vδ (c) ∩ A = {c}, and thus for any x ∈ Vδ (c) ∩ A we have |f (x) − f (c)| = 0 (Exercise 15 provides a relevant example). Of course, we very rarely encounter a
function that has an isolated point in its domain, so we typically only care about continuity at limit points.
Continuing with one of the examples from the previous section, the function f : [0, ∞) → R defined by f (x) = √x is continuous at 4 since limx→4 f (x) = f (4). On the other hand, the function f : R → R defined
by f (x) = x + max{0, x}/x (see Figure 8) is not continuous at 0. To see this, note that limx→0− f (x) = 0
and limx→0+ f (x) = 1, so limx→0 f (x) does not exist (by Theorem 8). Discontinuities that occur where the
left-hand limit and right-hand limits are not equal are called jump discontinuities.
It is worth emphasizing that, if f is not defined at c, then c cannot possibly be a point of continuity, regardless of whether or not limx→c f (x) exists. For example, consider again the function f : [0, 4) ∪ (4, ∞) → R defined by f (x) = (√x − 2)/(x − 4) (see Figure 7). This function is not continuous at 4 since it
is not defined at 4. Discontinuities that occur where the limit exists but the function is not defined (or the
function is defined but has a different value than the limit) are called removable discontinuities.
Given a sequence (sn ) and a function f defined at each sn , we can form the new sequence (f (sn )). For example, applying the function f (x) = √x to the sequence (an ) defined by an = n−2 produces the sequence with terms
\[
f(a_1) = 1, \quad f(a_2) = \frac{1}{2}, \quad f(a_3) = \frac{1}{3},
\]
and so on. The point here is that, by applying a function to a sequence, we obtain another sequence to which we can apply all our usual results. For example, it is easy to show that lim √an = 0.
With this in mind, we now present the following theorem:
Theorem 9. Let A ⊂ R, c be a limit point of A, and f : A → R. The following two statements are equivalent:
(i) limx→c f (x) = L.
(ii) For every sequence (sn ) with sn ∈ A \ {c} for all n ∈ N and lim sn = c, we have lim f (sn ) = L.
Proof. We first show that (i) implies (ii). Suppose, for any ϵ > 0, there exists a δ such that |f (x) − L| < ϵ
for all x ∈ A such that 0 < |x − c| < δ. If (sn ) converges to c, then, for any δ > 0, there exists an N ∈ N
such that 0 < |sn − c| < δ for n > N (note that here we have used δ rather than ϵ in the definition of a limit
of a sequence, and also that |sn − c| > 0 since sn ∈ A \ {c}, i.e., sn ̸= c). This means that |f (sn ) − L| < ϵ
for n > N .
We now show that (ii) implies (i), this time using a contrapositive argument. Suppose that there exists
an ϵ > 0 such that, for any δ > 0, |f (x) − L| ≥ ϵ for at least one x ∈ A satisfying 0 < |x − c| < δ. Thus, for each n ∈ N, there is a number xn ∈ A such that 0 < |xn − c| < n−1 (ensuring that lim xn = c) but |f (xn ) − L| ≥ ϵ. In other
words, if (xn ) converges to c but lim f (xn ) = L fails to hold, then L is not the limit of f at c.
Statement (ii) in Theorem 9 is called the “sequential criterion” for a functional limit and will be quite
useful in proving some important results about functions.
We can also define continuity in terms of sequences:
Theorem 10. Let A ⊂ R, c ∈ A, and f : A → R. The following two statements are equivalent:
(i) f is continuous at c.
(ii) For every sequence (sn ) with sn ∈ A for all n ∈ N and lim sn = c, we have lim f (sn ) = f (c).
Proof. The proof is similar to that for Theorem 9 except that we may have sn = c for some n ∈ N.
Before moving on, we present a very simple lemma (to be used later on) that is proved using the sequential criterion for a functional limit:
Lemma 6. If a convergent sequence is bounded within a closed interval, then its limit is in that closed
interval.
Proof. Suppose the sequence (sn ) converges to s and that sn ∈ [a, b] for all n ∈ N. We want to show that
s ∈ [a, b]. First, for any ϵ > 0, there exists an N ∈ N such that |sn − s| < ϵ, or equivalently, s − ϵ < sn < s + ϵ
for all n > N . Now, since a ≤ sn ≤ b for all n ∈ N, we have a ≤ sn < s + ϵ, which implies a < s + ϵ and thus
a ≤ s, and similarly, s − ϵ < sn ≤ b, which implies s − ϵ < b and thus s ≤ b.
Composite Functions
We will sometimes be interested in applying one function to the output of another function. Specifically, let
A, B ⊂ R, f : A → R, and g : B → R, with f (A) ⊂ B (i.e., the range of f is a subset of the domain of g).
Then for any x ∈ A we define
(g ◦ f )(x) = g(f (x))
as the composition of g on f . The domain of this composite function is A (the domain of f ), while its
range is
g(f (A)) = {g(f (a)) ∈ R : a ∈ A},
which is a subset of the range of g (possibly, but not necessarily, a proper subset of it).
For example, let f : (0, ∞) → R be defined by f (x) = √(1/x) and g : R → R be defined by g(x) = −|x| (notice that f ((0, ∞)) = (0, ∞), which is a subset of R, the domain of g). Here, we have (g ◦ f )(x) = −|√(1/x)|, which has domain (0, ∞) and range (−∞, 0) (a proper subset of R− , the range of g). On the other hand, (f ◦ g)(x) = √(−1/|x|) (i.e., the composition of f on g) is completely nonsensical (it is not defined for any x).
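In Python, composition is just nested application. A sketch of the example above (not from the original notes), including the failure of f ◦ g:

```python
import math

f = lambda x: math.sqrt(1 / x)         # defined only for x > 0
g = lambda x: -abs(x)                  # defined for all real x

g_of_f = lambda x: g(f(x))             # fine: f maps (0, inf) into the domain of g
print(g_of_f(4.0))                     # -0.5

f_of_g = lambda x: f(g(x))             # nonsensical: g(x) <= 0 lies outside the domain of f
try:
    f_of_g(4.0)
except ValueError as err:              # math.sqrt raises on negative input
    print("f o g is undefined:", err)
```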
The following theorem relates to the continuity of compositions:
Theorem 13. Let A, B ⊂ R, f : A → R, and g : B → R, with f (A) ⊂ B. If f is continuous at c ∈ A and g
is continuous at f (c) ∈ B, then g ◦ f is continuous at c.
Proof. Let b = f (c) and let W be an ϵ-neighbourhood of g(b). Since g is continuous at b, there exists a
δ-neighbourhood V of b such that g(V ) ⊂ W . Backing up a step, since f is continuous at c, there exists a
γ-neighbourhood U of c such that f (U ) ⊂ V . Thus g ◦ f (U ) ⊂ W since f (A) ⊂ B.
An immediate consequence of Theorem 13 is that, if f is continuous on A and g is continuous on B, then
g ◦ f is continuous on A.
Differentiation
Let A ⊂ R and x ∈ A. We say that f : A → R is differentiable at the point x if
\[
\lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
\]
exists.5 If this limit does exist, it is called the derivative of f at x and is denoted by f ′ (x). If f is
differentiable at all points in B ⊂ A, we say that it is differentiable on the set B and f ′ is then a
real-valued function defined on B (do not confuse the real number f ′ (x) with the function f ′ ; the value of
f ′ (x) may be different for different values of x but it is always a real number). If f ′ (x) and
\[
\lim_{h \to 0} \frac{f'(x + h) - f'(x)}{h}
\]
both exist, we say that f is twice differentiable at the point x, and refer to the latter limit as the second
derivative of f at x, denoted f ′′ (x). If f is twice differentiable at all points in C ⊂ B, we say that it is
twice differentiable on the set C, and f ′′ is then a real-valued function defined on C. Note that, if f is
twice differentiable on C, then it must also be differentiable on C (f ′ cannot be differentiable on C if f is
not differentiable on C). Note that C ⊂ B ⊂ A allows for the possibility that A, B, and C are all equal, but
rules out the possibility that A is a proper subset of B or that B is a proper subset of C.
For example, consider the function f : R → R defined by f (x) = x2 . We have
\[
f'(x) = \lim_{h \to 0} \frac{(x + h)^2 - x^2}{h} = \lim_{h \to 0} (2x + h) = 2x
\]
and
\[
f''(x) = \lim_{h \to 0} \frac{2(x + h) - 2x}{h} = \lim_{h \to 0} 2 = 2,
\]
5 Letting y = x + h, we can also write this limit as limy→x (f (y) − f (x))/(y − x).
so f is twice differentiable on R. As another example, consider the function g : R+ → R defined by g(x) = √x. We have
\[
g'(x) = \lim_{h \to 0} \frac{\sqrt{x + h} - \sqrt{x}}{h} = \lim_{h \to 0} \frac{1}{\sqrt{x + h} + \sqrt{x}} = \frac{1}{2\sqrt{x}},
\]
so g is differentiable only on (0, ∞), which is a proper subset of R+ . Moreover,
\[
g''(x) = \lim_{h \to 0} \frac{\frac{1}{2\sqrt{x + h}} - \frac{1}{2\sqrt{x}}}{h} = \lim_{h \to 0} \left( -\frac{1}{2\sqrt{x}\sqrt{x + h}\left(\sqrt{x} + \sqrt{x + h}\right)} \right) = -\frac{1}{4x^{3/2}},
\]
so g is twice differentiable on (0, ∞).
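Difference quotients with a small h give a quick numerical check of these formulas (a sketch, not from the original notes; the step size h is an arbitrary choice):

```python
import math

def diff_quotient(f, x, h=1e-6):
    """Approximate f'(x) by the difference quotient (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

print(diff_quotient(lambda x: x**2, 3.0))   # ~ 6.0,  matching f'(x) = 2x at x = 3
print(diff_quotient(math.sqrt, 4.0))        # ~ 0.25, matching g'(x) = 1/(2*sqrt(x)) at x = 4
```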
Theorem 14. Let A ⊂ R and x ∈ A. If f : A → R is differentiable at x, then f is continuous at x.
Proof. Notice that
\[
\lim_{h \to 0} \left( f(x + h) - f(x) \right) = \lim_{h \to 0} h \cdot \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} h \cdot \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = 0 \cdot f'(x) = 0.
\]
Thus, limh→0 f (x + h) = f (x), or equivalently, limy→x f (y) = f (x) (to see this, let h = y − x).
It is possible that f is continuous at x, but not differentiable at x. For example, the function f : R → R
defined by f (x) = |x| is continuous at 0, but it is not differentiable at 0.
An immediate consequence of Theorem 14 is that, if f is differentiable on some set, then it is continuous
on that set. Moreover, if f is twice differentiable on some set, then both f and f ′ are continuous on that set.
The following theorem establishes some algebraic properties of derivatives:
Theorem 15. Let A ⊂ R and x ∈ A. If f : A → R and g : A → R are differentiable at x, then:
(i) (af )′ (x) = af ′ (x) for all a ∈ R.
(ii) (f + g)′ (x) = f ′ (x) + g ′ (x).
(iii) (f − g)′ (x) = f ′ (x) − g ′ (x).
(iv) (f g)′ (x) = f ′ (x)g(x) + f (x)g ′ (x).
(v) (f /g)′ (x) = (f ′ (x)g(x) − f (x)g ′ (x))/(g(x))² , provided g(x) ̸= 0.
Proof. (i) Notice that
\[
(af)'(x) = \lim_{h \to 0} \frac{af(x + h) - af(x)}{h} = a \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = af'(x).
\]
Notice that we used part (i) of Theorem 11 in moving a outside the limit. Other parts of Theorem 11 are
used below.
(ii) This is not really any more difficult:
\[
(f + g)'(x) = \lim_{h \to 0} \frac{(f(x + h) + g(x + h)) - (f(x) + g(x))}{h} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} + \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} = f'(x) + g'(x).
\]
(iii) The proof of (iii) is essentially identical to that of (ii).
(iv) Now things get more interesting:
\[
\begin{aligned}
(fg)'(x) &= \lim_{h \to 0} \frac{f(x + h)g(x + h) - f(x)g(x)}{h} \\
&= \lim_{h \to 0} \frac{f(x + h)g(x + h) - f(x)g(x + h) + f(x)g(x + h) - f(x)g(x)}{h} \\
&= \lim_{h \to 0} \frac{(f(x + h) - f(x))g(x + h) + f(x)(g(x + h) - g(x))}{h} \\
&= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \cdot \lim_{h \to 0} g(x + h) + f(x) \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} \\
&= f'(x)g(x) + f(x)g'(x).
\end{aligned}
\]
Here, we have used the fact that, since g is differentiable at x, it is continuous at x (by Theorem 14), and
thus limh→0 g(x + h) = g(x).
(v) We will again use the fact that limh→0 g(x+h) = g(x) and thus, since g(x) ̸= 0, we have limh→0 1/g(x+
h) = 1/g(x):
\[
\begin{aligned}
(f/g)'(x) &= \lim_{h \to 0} \frac{f(x + h)/g(x + h) - f(x)/g(x)}{h} \\
&= \lim_{h \to 0} \frac{f(x + h)g(x) - f(x)g(x + h)}{h\,g(x)g(x + h)} \\
&= \frac{1}{g(x)} \lim_{h \to 0} \frac{1}{g(x + h)} \cdot \lim_{h \to 0} \frac{f(x + h)g(x) - f(x)g(x) + f(x)g(x) - f(x)g(x + h)}{h} \\
&= \frac{1}{(g(x))^2} \lim_{h \to 0} \frac{(f(x + h) - f(x))g(x) - f(x)(g(x + h) - g(x))}{h} \\
&= \frac{1}{(g(x))^2} \left( g(x) \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} - f(x) \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} \right) \\
&= \frac{f'(x)g(x) - f(x)g'(x)}{(g(x))^2}.
\end{aligned}
\]
Figure 11: The function f : [−1, 1] → R defined by f (x) = x + |x|
To confirm this, define the function h : R → R by h(x) = 4 + 4x + x2 , and note that h(x) = (g ◦ f )(x) for
all x ∈ R. We can see directly that h′ (x) = 4 + 2x.
Optimization
Let A ⊂ R and f : A → R. Various optimizers of f are defined as follows:
• The point x∗ ∈ A is a global maximizer of f if f (x∗ ) ≥ f (x) for all x ∈ A (i.e., if f (x∗ ) = max f (A)).
• The point x∗ ∈ A is a local maximizer of f if f (x∗ ) ≥ f (x) for all x ∈ Vϵ (x∗ ) ∩ A for some ϵ > 0.
• The point x∗ ∈ A is a global minimizer of f if f (x∗ ) ≤ f (x) for all x ∈ A (i.e., if f (x∗ ) = min f (A)).
• The point x∗ ∈ A is a local minimizer of f if f (x∗ ) ≤ f (x) for all x ∈ Vϵ (x∗ ) ∩ A for some ϵ > 0.
These definitions should make it clear that a global optimizer is necessarily a local optimizer (but not
vice-versa).
If x∗ is a global/local maximizer of f , we say that f (x∗ ) is a global/local maximum value of f . Similarly, if x∗ is a global/local minimizer of f , we say that f (x∗ ) is a global/local minimum value of f .
The key word above is “a”: A function may have multiple global maximizers, multiple local maxi-
mizers, multiple global minimizers, and/or multiple local minimizers. For example, consider the function
f : [−1, 1] → R defined by f (x) = x + |x| (see Figure 11). Every point in [−1, 0] is a global minimizer (and
thus a local minimizer), while 1 is the only global maximizer (and thus a local maximizer). What may be
surprising is that every point in [−1, 0) is also a local maximizer (0 is not a local maximizer since f (x) > f (0)
for all x ∈ (0, 1]).
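A crude grid search over [−1, 1] (a sketch, not from the original notes; a finite grid can only suggest optimizers, not prove them) illustrates the optimizers of f (x) = x + |x|:

```python
f = lambda x: x + abs(x)

grid = [i / 1000 for i in range(-1000, 1001)]        # evenly spaced points in [-1, 1]
values = [f(x) for x in grid]

print(min(values), max(values))                      # 0.0 and 2.0
minimizers = [x for x, v in zip(grid, values) if v == min(values)]
print(minimizers[0], minimizers[-1])                 # every grid point in [-1, 0]
```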
The following theorem can be used to establish the existence of global optimizers:
Figure 12: The function f : (−2, 1] → R defined by f (x) = x2
Theorem 17 (Weierstrass Extreme Value Theorem). If a real-valued function defined on a closed interval
is continuous on its domain, then it reaches a global maximum and a global minimum.
Proof. Let f : [a, b] → R. Suppose that f is unbounded on [a, b]. Then for each n ∈ N, there is an sn ∈ [a, b]
such that |f (sn )| > n. By the BWT, the sequence (sn ) has a subsequence (smn ) that converges to some s.
Since (sn ) is bounded within [a, b], so is (smn ), and thus s ∈ [a, b] by Lemma 6. Now, since f is continuous
at s, we have lim f (smn ) = f (s), but lim |f (smn )| = ∞, which is a contradiction. Thus, f must be bounded
on [a, b].
Let M be the supremum of the range of f . Then for each n ∈ N, there is a tn ∈ [a, b] such that
M − n−1 < f (tn ) ≤ M , so that lim f (tn ) = M . By the BWT, the sequence (tn ) has a subsequence (tmn )
that converges to some t, and t ∈ [a, b] by the argument in the preceding paragraph. Now, since f is
continuous at t, we have lim f (tmn ) = f (t). Moreover, since f (tmn ) is a subsequence of f (tn ), and f (tn )
converges to M , Lemma 2 tells us that f (tmn ) also converges to M . Thus, M is a global maximum of f . To
show that f has a global minimum, we apply the preceding argument to −f .
It is possible that a function has a global maximizer or global minimizer even if it is not continuous or if its domain is not a closed interval. For example, the function f : [0, b) → R defined by f (x) = √x has a global minimizer at 0. That said, there is no global maximizer of this function. To see this, suppose that x∗ ∈ [0, b) is a global maximizer of f . Since √(x∗ + ϵ) > √x∗ for any ϵ ∈ (0, b − x∗ ), this is a contradiction.
On the other hand, the function g : [0, b] → R defined by g(x) = √x must have both a global maximizer and a global minimizer since its domain is a closed interval and it is continuous (the global maximizer is b). It should be clear that the function h : (0, b) → R defined by h(x) = √x has no global maximizers or global minimizers.
If a function does not have any local maximizers, then it definitely does not have any global maximizers, but a function can have one or more local maximizers and not have any global maximizers (the
same statement holds true if we replace the word “maximizers” with “minimizers”). To see this, consider
the function f : (−2, 1] → R defined by f (x) = x2 (see Figure 12; the open dot emphasizes that this function
is not defined for x = −2). Although 1 is a local maximizer, there is no global maximizer (1 is not a global
maximizer since f (x) > f (1) for any x ∈ (−2, −1)).
The following theorem provides a necessary condition for a local optimizer:
Theorem 18. Let I be an interval in R, x∗ be an interior point of I, and f : I → R be differentiable at x∗ . If x∗ is a local maximizer or a local minimizer of f , then f ′ (x∗ ) = 0.
Figure 13: The function f : (−1, 1) → R defined by f (x) = x3
Figure 14: The function f : [−2, 2] → R defined by f (x) = x4 − x2
Theorem 19 (Rolle’s Theorem). Let f : [a, b] → R be continuous on [a, b] and differentiable on (a, b). If
f (a) = f (b), then there exists a c ∈ (a, b) such that f ′ (c) = 0.
Proof. Since f is continuous on a closed interval, it has a global maximizer and a global minimizer by the
WEVT. If a (and thus b since f (a) = f (b)) is both a global maximizer and a global minimizer, then f is
constant and f ′ (c) = 0 for any c ∈ (a, b). Otherwise, there is a global optimizer in (a, b) and by Theorem
18, f ′ will be equal to zero at this optimizer.
Theorem 20 (Mean Value Theorem). Let f : [a, b] → R be continuous on [a, b] and differentiable on (a, b).
Then there exists a c ∈ (a, b) such that
\[
f'(c) = \frac{f(b) - f(a)}{b - a}.
\]
Proof. Consider the function g : [a, b] → R defined by
\[
g(x) = f(x) - \left( \frac{f(b) - f(a)}{b - a}(x - a) + f(a) \right).
\]
Note that g is continuous on [a, b] and differentiable on (a, b), and that g(a) = g(b). Thus, by Rolle’s
Theorem, there exists a c ∈ (a, b) such that g ′ (c) = 0. Since
\[
g'(c) = f'(c) - \frac{f(b) - f(a)}{b - a},
\]
we have the desired result.
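As a concrete illustration of the MVT (a sketch, not from the original notes), take f(x) = x² on [0, 1]: the secant slope is 1, and bisection on f′(x) − 1 = 2x − 1 recovers the promised point c = 1/2:

```python
f = lambda x: x**2
a, b = 0.0, 1.0
secant = (f(b) - f(a)) / (b - a)       # slope of the secant line; here 1.0

g = lambda x: 2*x - secant             # f'(x) minus the secant slope
lo, hi = a, b                          # bisection: g(lo) < 0 < g(hi)
for _ in range(50):
    mid = (lo + hi) / 2
    if g(mid) < 0:
        lo = mid
    else:
        hi = mid
print((lo + hi) / 2)                   # ~ 0.5, the c guaranteed by the MVT
```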
We will not use Rolle’s Theorem for anything other than proving the MVT, so it might better be called
a lemma, but we will honour tradition by calling it a theorem. In fact, we will only use the MVT to prove
the following lemma (which truly is a lemma):
Lemma 7. Let I be an interval in R and f : I → R be differentiable on I. For all x1 , x2 ∈ I satisfying
x1 < x2 :
(i) f ′ (x) ≥ 0 for all x ∈ I if and only if f (x1 ) ≤ f (x2 ) (i.e., f is non-decreasing on I).
(ii) f ′ (x) ≤ 0 for all x ∈ I if and only if f (x1 ) ≥ f (x2 ) (i.e., f is non-increasing on I).
Proof. (i) Suppose f ′ (x) ≥ 0 for all x ∈ I and apply the MVT to f on [x1 , x2 ] to obtain a c ∈ (x1 , x2 ) such
that
f (x2 ) − f (x1 ) = f ′ (c)(x2 − x1 ).
Since f ′ (c) ≥ 0 and x2 > x1 , the right-hand-side of the above is non-negative, which means that f (x1 ) ≤
f (x2 ). Moving in the other direction, if f (x1 ) ≤ f (x2 ), then the right-hand-side of the above must be
non-negative, which means that f ′ (c) ≥ 0 (since x2 > x1 ).
(ii) The proof of (ii) is similar and will be omitted.
Note that in Lemma 7, I need not be a closed interval, whereas in the previous two theorems, we required
f to be defined at the endpoints a and b (if f : (a, b) → R, then f is not defined at a or b). If we wanted
to apply Lemma 7 when I = (a, b), we just need to keep in mind that x1 > a and x2 < b since we require
x1, x2 ∈ I. On the other hand, if we wanted to apply Lemma 7 when I = [a, b], we could have x1 = a or
x2 = b. The key is simply that x1, x2 ∈ I. Of course, we could also apply Lemma 7 when I
is half-open or when I involves −∞ or ∞.
For example, consider again the function f : R+ → R defined by f(x) = √x. We have f′(x) = 1/(2√x) > 0
for all x > 0, so f is increasing on R+. This can be confirmed directly by noting that √x1 < √x2 for all
x1, x2 ∈ R+ satisfying x1 < x2.
We now use Lemma 7 to prove the following theorem, which justifies the so-called “first derivative test”:
Theorem 21. Let I be an interval in R, x∗ be an interior point of I, and f : I → R be differentiable on
Vϵ (x∗ ) \ {x∗ } for some ϵ > 0.
(i) x∗ is a local maximizer of f if and only if there exists an ϵ such that f ′ (x) ≥ 0 for all x ∈ (x∗ − ϵ, x∗ )
and f ′ (x) ≤ 0 for all x ∈ (x∗ , x∗ + ϵ).
(ii) x∗ is a local minimizer of f if and only if there exists an ϵ such that f ′ (x) ≤ 0 for all x ∈ (x∗ − ϵ, x∗ )
and f ′ (x) ≥ 0 for all x ∈ (x∗ , x∗ + ϵ).
Proof. (i) Suppose f′(x) ≥ 0 for all x ∈ (x∗ − ϵ, x∗) and f′(x) ≤ 0 for all x ∈ (x∗, x∗ + ϵ). By Lemma
7, this implies that f is non-decreasing on (x∗ − ϵ, x∗) and non-increasing on (x∗, x∗ + ϵ). Thus,
f (x∗ ) ≥ f (x) for all x ∈ (x∗ − ϵ, x∗ ) and f (x∗ ) ≥ f (x) for all x ∈ (x∗ , x∗ + ϵ), or equivalently, f (x∗ ) ≥ f (x)
for all x ∈ Vϵ (x∗ ). This argument also works in the other direction since Lemma 7 works in both directions.
(ii) The proof of (ii) is similar and will be omitted.
Consider again the function f : (−1, 1) → R defined by f(x) = x³. We have f′(x) = 3x² > 0 for all
x ∈ (−1, 1) \ {0}. Since f′ is positive on both sides of 0, neither sign condition in Theorem 21 can hold, so 0 is not a local optimizer.
Next, consider again the function f : (−1, 1) → R defined by f (x) = |x|. We have f ′ (x) = −1 for all
x ∈ (−1, 0) and f ′ (x) = 1 for all x ∈ (0, 1), so 0 is indeed a local minimizer as argued above. The interesting
thing about this example is that f ′ (0) does not even exist (note that Theorem 21 does not require f to be
differentiable at x∗ ).
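The first derivative test is also easy to probe numerically. The following Python sketch (grid, step size, and tolerances are arbitrary choices) samples central difference quotients of f(x) = |x| on either side of 0:

import numpy as np

f = lambda x: abs(x)
h, eps = 1e-6, 0.5
left = np.linspace(-eps, -1e-3, 50)            # points in (x* - eps, x*)
right = np.linspace(1e-3, eps, 50)             # points in (x*, x* + eps)
dleft = (f(left + h) - f(left - h)) / (2*h)    # approx. -1 throughout
dright = (f(right + h) - f(right - h)) / (2*h) # approx. +1 throughout
assert (dleft <= 0).all() and (dright >= 0).all()
print("0 passes the first-derivative test for a local minimizer")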
Theorem 21 is useful on its own, but can also be used to prove the following theorem, which provides
sufficient conditions for local maximizers and local minimizers:
Theorem 22. Let I be an interval in R, x∗ be an interior point of I, and f : I → R be twice differentiable
at x∗ with f′(x∗) = 0.
(i) If f″(x∗) < 0, then x∗ is a local maximizer of f.
(ii) If f″(x∗) > 0, then x∗ is a local minimizer of f.
Proof. (i) Note that
f″(x∗) = lim_{h→0} (f′(x∗ + h) − f′(x∗)) / h = lim_{h→0} f′(x∗ + h) / h
since f′(x∗) = 0. Thus, f″(x∗) < 0 implies that there exists an ϵ > 0 such that f′(x∗ + h) > 0 for all
h ∈ (−ϵ, 0) and f′(x∗ + h) < 0 for all h ∈ (0, ϵ), or equivalently, f′(x) > 0 for x ∈ (x∗ − ϵ, x∗) and f′(x) < 0
for x ∈ (x∗, x∗ + ϵ), so x∗ is a local maximizer by Theorem 21.
(ii) The proof of (ii) is similar and will be omitted.
For example, consider the function f : (−1, 1) → R defined by f(x) = x². We have f′(x) = 2x and
f″(x) = 2, so f′(0) = 0 and f″(0) > 0. Thus, 0 is a local minimizer of f.
As another example, consider again the function f : [−2, 2] → R defined by f(x) = x⁴ − x². As we saw
above, f′(−1/√2) = f′(0) = f′(1/√2) = 0. Moreover, f″(x) = 12x² − 2, so f″(−1/√2) = f″(1/√2) = 4 >
0. Thus, −1/√2 and 1/√2 are local minimizers of f. Since f″(0) = −2 < 0, Theorem 22 also tells us that
0 is a local maximizer of f. We can confirm this using Theorem 21: for any x ∈ (−1/√2, 0) we have f′(x) > 0, and for any x ∈ (0, 1/√2) we have f′(x) < 0.
Thus, 0 is a local maximizer of f.
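The computations in this example are easily verified numerically, for instance with the following Python snippet:

import math

fprime = lambda x: 4*x**3 - 2*x    # f'(x) for f(x) = x**4 - x**2
fsecond = lambda x: 12*x**2 - 2    # f''(x)
for xstar in (-1/math.sqrt(2), 0.0, 1/math.sqrt(2)):
    print(fprime(xstar), fsecond(xstar))
# f' is (numerically) zero at all three points; f'' is 4.0 at the two
# minimizers and -2.0 at the maximizer 0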
The following theorem provides necessary conditions for local maximizers and local minimizers:
Theorem 23. Let I be an interval in R, x∗ be an interior point of I, and f : I → R be twice differentiable
on some Vϵ (x∗ ).
(i) If x∗ is a local maximizer of f, then f′(x∗) = 0 and f″(x∗) ≤ 0.
(ii) If x∗ is a local minimizer of f, then f′(x∗) = 0 and f″(x∗) ≥ 0.
The following lemma characterizes convexity and concavity of differentiable functions in terms of tangent
lines (recall that f is convex on I if f((1 − t)x + ty) ≤ (1 − t)f(x) + tf(y) for all x, y ∈ I and t ∈ [0, 1], and
concave on I if the reverse inequality holds):
Lemma 8. Let I be an interval in R and f : I → R be differentiable on I.
(i) f is convex on I if and only if f(y) − f(x) ≥ f′(x)(y − x) for all x, y ∈ I.
(ii) f is concave on I if and only if f(y) − f(x) ≤ f′(x)(y − x) for all x, y ∈ I.
Proof. (i) Suppose f is convex on I, let x, y ∈ I, and define g : [0, 1] → R by g(t) = f(ty + (1 − t)x). By
convexity, g(t) ≤ (1 − t)f(x) + tf(y), and since g(0) = f(x), this gives g(t) − g(0) ≤ t(f(y) − f(x)) for
t ∈ (0, 1]. Re-arranging the above yields
f(y) − f(x) ≥ (g(t) − g(0)) / t.
Since f is differentiable at x, g is differentiable at 0. Thus, taking the (right-hand) limit of both sides of the
above at 0 yields
f (y) − f (x) ≥ g ′ (0).
Finally, since
g ′ (t) = f ′ (ty + (1 − t)x)(y − x)
(by the chain rule), we have
g ′ (0) = f ′ (x)(y − x)
which yields the desired result. To show the converse, suppose f(y) − f(x) ≥ f′(x)(y − x) for all x, y ∈ I,
let x, y ∈ I and t ∈ [0, 1], and set z = (1 − t)x + ty. Since z ∈ I, we have
f(x) − f(z) ≥ f′(z)(x − z)
and
f(y) − f(z) ≥ f′(z)(y − z).
Multiplying the first of these inequalities by 1 − t and the second by t and then adding, we have
(1 − t)(f (x) − f (z)) + t(f (y) − f (z)) ≥ (1 − t)f ′ (z)(x − z) + tf ′ (z)(y − z),
or equivalently,
(1 − t)f (x) + tf (y) − f (z) ≥ f ′ (z)((1 − t)x + ty − z).
Since z = (1 − t)x + ty, the right-hand side of the above is equal to zero, and we have
f((1 − t)x + ty) = f(z) ≤ (1 − t)f(x) + tf(y),
i.e., f is convex on I.
(ii) The proof of (ii) is similar and will be omitted.
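The tangent-line inequality in Lemma 8 (i) is easy to spot-check numerically. The following Python sketch samples random pairs for f(x) = x² (an arbitrary convex choice), where the inequality reduces to (y − x)² ≥ 0:

import random

f = lambda x: x**2
fprime = lambda x: 2*x
random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    # f(y) - f(x) - f'(x)(y - x) = (y - x)**2 >= 0 for this f
    assert f(y) - f(x) >= fprime(x)*(y - x) - 1e-9
print("tangent lines lie below the graph at all sampled pairs")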
Lemma 8 will now be used to prove the following theorem:
Theorem 24. Let I be an interval in R, x∗ be an interior point of I, and f : I → R be differentiable on I.
(i) If f is concave on I, then x∗ is a global maximizer of f if and only if f′(x∗) = 0.
(ii) If f is convex on I, then x∗ is a global minimizer of f if and only if f′(x∗) = 0.
Proof. (i) If f′(x∗) = 0, then, by Lemma 8, f(y) − f(x∗) ≤ f′(x∗)(y − x∗) = 0 (i.e., f(y) ≤ f(x∗)) for all
y ∈ I, so x∗ is a global maximizer of f. Conversely, if x∗ is a global maximizer of f (meaning that it is
also a local maximizer of f), then f′(x∗) = 0 by Theorem 18.
(ii) The proof of (ii) is similar and will be omitted.
For example, consider again the function f : (−2, 1] → R defined by f(x) = x² (see again Figure 12).
Since this function is convex on its domain (see Exercise 12) and f′(0) = 0, we can conclude that 0 is a
global minimizer. Note that, although 1 is a local maximizer of f, 1 is not an interior point of (−2, 1], so
Theorem 24 is not applicable to it.
Extensions to Rk
The Cartesian product of two sets A and B, denoted A × B, is the set of all ordered pairs (a, b) such
that a ∈ A and b ∈ B (these ordered pairs (a, b) are not sets themselves, but rather elements of the set
A × B). For example, if A = {Red, Yellow} and B = {Green, Blue}, then
A × B = {(Red, Green), (Red, Blue), (Yellow, Green), (Yellow, Blue)}.
Note that an ordered pair such as (Green, Red) is not an element of A × B since Green ∉ A and Red ∉ B.
On the other hand, note that (Red, Yellow) and (Yellow, Red) are distinct elements within A × A: with
ordered pairs, order matters. Taking the Cartesian product of R with itself k times yields
Rk = R × · · · × R (k times).
Elements in Rk (which we again call points) are k-vectors such as x = (x1 , . . . , xk ) or y = (y1 , . . . , yk ).6 For
example, consider the set
C = {(x1, x2) ∈ R2 : x1 < 0 and x2 > 0} = (−∞, 0) × (0, ∞)
(which is a proper subset of R2), and the points x = (−1, 1), y = (−3, 3), and z = (1, −1). We have x, y ∈ C
but z ∉ C.
A real-valued function whose domain is a subset of Rk with k > 1 is called a real-valued function of
several variables.7 For example, consider the function f : R2 → R defined by f (x1 , x2 ) = x1 x2 . This
function takes the point (x1 , x2 ) ∈ R2 as input and produces the point x1 x2 ∈ R as output.
Let x, y ∈ Rk. The function d : Rk × Rk → R defined by
d(x, y) = √(Σ_{i=1}^k (xi − yi)²)
is called the Euclidean distance between x and y. When y = 0, we call d(x, 0) = √(Σ_{i=1}^k xi²)
the Euclidean norm of x, and denote it by ||x|| (here, 0 is a k-vector of zeros).8 Note that when k = 1, we
have d(x, y) = |x − y| and ||x|| = |x|.
For example, consider again the set C and the points x, y ∈ C defined above. We have
d(x, y) = √((x1 − y1)² + (x2 − y2)²) = √((−1 + 3)² + (1 − 3)²) = √8,
||x|| = √2, and ||y|| = √18.
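These computations are straightforward to reproduce in Python; the helper d below is a direct transcription of the definition:

import math

def d(x, y):
    # Euclidean distance between two k-vectors
    return math.sqrt(sum((xi - yi)**2 for xi, yi in zip(x, y)))

x, y = (-1.0, 1.0), (-3.0, 3.0)
print(d(x, y))           # sqrt(8) ~= 2.828
print(d(x, (0.0, 0.0)))  # ||x|| = sqrt(2)
print(d(y, (0.0, 0.0)))  # ||y|| = sqrt(18)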
It is worth mentioning that the Euclidean distance is just one way to measure the distance between two
points in Rk (e.g., Σ_{i=1}^k |xi − yi| yields the “Manhattan distance” between x and y). Loosely defined, a
measure of distance between two elements in a set is called a metric, and a metric along with the set on
which it operates is called a metric space. The set Rk together with the function d defined above is known
as Euclidean k-space.
6 When working with matrices, we will always treat a k-vector as a k × 1 matrix.
7 We will also encounter functions whose range is a subset of Rk (e.g., when defining the derivative of a real-valued function
of several variables).
8 Recognizing that x − y = (x1 − y1, . . . , xk − yk), we can also write d(x, y) = ||x − y||.
It is straightforward to extend most of the definitions we encountered earlier to Rk using the function d.
For example, a set A ⊂ Rk is said to be bounded if there exists an M ∈ R such that d(x, y) < M for all
x, y ∈ A (i.e., if the Euclidean distance between any two points of A never exceeds some fixed bound).
Similarly, letting ϵ > 0, the ϵ-neighbourhood of x ∈ Rk is
Vϵ(x) = {y ∈ Rk : d(x, y) < ϵ}
(i.e., the set of all points whose Euclidean distance from x is less than ϵ). Note that, with k = 1, we have
Vϵ(x) = {y ∈ R : |x − y| < ϵ} = (x − ϵ, x + ϵ),
which agrees with our earlier definition, while with k = 2, Vϵ(x) = {y ∈ R2 : d(x, y) < ϵ}
(i.e., the set of all points inside a circle with radius ϵ centered at (x1, x2)). The set Vϵ(x) \ {x} is called the
deleted ϵ-neighbourhood of x (note that {x} is a singleton, i.e., a set with only one element, namely x).
With these extended definitions of ϵ-neighbourhoods and deleted ϵ-neighbourhoods in hand, our earlier
topological definitions are virtually unchanged:
• The point x ∈ Rk is an interior point of the set A ⊂ Rk if there exists a Vϵ (x) ⊂ A.
• The point x ∈ Rk is a boundary point of the set A ⊂ Rk if every Vϵ (x) contains points in both A
and Rk − A.
• The point x ∈ Rk is a limit point of the set A ⊂ Rk if (Vϵ(x) \ {x}) ∩ A ≠ ∅ for every ϵ > 0 (i.e., if
every Vϵ(x) contains at least one point in A other than x).
• A set A ⊂ Rk is said to be open if every point in A is an interior point of A.
• A set A ⊂ Rk is said to be closed if every boundary point of A is a point in A.
Let’s now look a bit more closely at the set C defined above. First, every point in C is an interior point. To
see this, let w ∈ C and set
ϵ = √(min{w1², w2²})
so that Vϵ (w) ⊂ C. Figure 15 illustrates this for the case where w = (−3, 4); notice that the right edge of
the circle representing V3 (−3, 4) just touches the vertical axis while the bottom edge of this circle is above
the horizontal axis, meaning that all points inside this circle are elements of C (if we had w1² > w2², then the
bottom edge of the circle would just touch the horizontal axis and the right edge of the circle would be to
the left of the vertical axis). Next, the set of all boundary points of C is
(this is the set of all points lying on either the vertical axis or the horizontal axis in Figure 15, none of
which are in C itself). For example, (0, 0) is a boundary point of C. To see this, let ϵ > 0 and consider the
points a = (−ϵ/2, ϵ/2) ∈ C and b = (ϵ/2, −ϵ/2) ∈ R2 − C. We have a, b ∈ Vϵ((0, 0)) since
d(a, (0, 0)) = d(b, (0, 0)) = ϵ/√2 < ϵ. That is, every ϵ-neighbourhood of (0, 0) includes points in C and
in R2 − C. Finally, the set of limit points of C is the union of its interior points and its boundary points:
(−∞, 0] × [0, ∞).
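The argument that (0, 0) is a boundary point can also be checked numerically; the following Python sketch tests a few (arbitrarily chosen) values of ϵ:

import math

in_C = lambda p: p[0] < 0 and p[1] > 0        # C = (-inf, 0) x (0, inf)
dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
for eps in (1.0, 0.1, 0.001):
    a = (-eps/2, eps/2)    # lies in C
    b = (eps/2, -eps/2)    # lies in R^2 - C
    assert in_C(a) and not in_C(b)
    assert dist(a, (0, 0)) < eps and dist(b, (0, 0)) < eps  # both are eps/sqrt(2) away
print("every sampled neighbourhood of (0, 0) meets both C and its complement")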
We now turn to limits and continuity of real-valued functions of several variables. Since we presented
definitions using the language of neighbourhoods already, there is nothing new here conceptually.9 Specifi-
cally, let A ⊂ Rk and c be a limit point of A. We say f : A → R has limit L at c if, for any ϵ > 0, there
exists a δ > 0 such that f(x) ∈ Vϵ(L) for any x ∈ (Vδ(c) \ {c}) ∩ A. This is written as limx→c f(x) = L.
Let A ⊂ Rk and c ∈ A. We say that f : A → R is continuous at the point c if, for any ϵ > 0, there
exists a δ > 0 such that f(A ∩ Vδ(c)) ⊂ Vϵ(f(c)). If c is a limit point of A, then f is continuous at c if and only if
9 We won’t even bother defining things like one-sided limits for real-valued functions of several variables; you have all the
tools you need to do so yourself.
Figure 15: V3 (−3, 4)
limx→c f (x) = f (c). As before, if f is continuous at every point in B ⊂ A, we say that it is continuous on
the set B.
Fortunately, the algebraic properties of functional limits and continuity presented in Theorems 11 and
12, respectively, as well as the chain rule (Theorem 16) continue to hold with real-valued functions of several
variables. The interested reader may refer to Rudin (1976, Theorems 4.4, 4.9, and 9.15) for formal statements
and proofs of these results.
Let A be an open set in Rk and x ∈ A. We say that f : A → R is differentiable at the point x if there
exists a k-vector of real numbers r such that
lim_{h→0} (f(x + h) − f(x) − r′h) / ||h||
exists and is equal to zero (here, h ∈ Rk and r′h = r1h1 + · · · + rkhk). For example, consider the function
f : R2 → R defined by f(x1, x2) = x1² + x2². Taking r = (2x1, 2x2) yields f(x + h) − f(x) − r′h = h1² + h2², and
lim_{h→0} (h1² + h2²) / √(h1² + h2²) = lim_{h→0} √(h1² + h2²) = 0,
so f is differentiable at every x ∈ R2.
Theorem 25. Let A be an open set in Rk and x ∈ A. If f : A → R is differentiable at x, then f is continuous at x.
Proof. Suppose f is differentiable at x. We have
lim_{h→0} (f(x + h) − f(x) − r′h) = lim_{h→0} ||h|| · lim_{h→0} (f(x + h) − f(x) − r′h) / ||h|| = 0 · 0 = 0,
which implies
lim_{h→0} (f(x + h) − r′h) = f(x)
(recall from earlier that, so long as the numerator is zero, it is not a problem if the limit of the denominator
is zero). Finally, since lim_{h→0} r′h = 0, we have lim_{h→0} f(x + h) = f(x), i.e., f is continuous at x.
The following theorem shows that the existence of all the partial derivatives is a necessary condition for differentiability:
Theorem 26. Let A be an open set in Rk and x ∈ A. If f : A → R is differentiable at x, then all the partial
derivatives of f at x exist.
Proof. Suppose f is differentiable at x. Letting hi = t and hj = 0 for all j ≠ i, we have r′h = t ri and
||h|| = |t|. Thus, from the definition of differentiability, we have
lim_{t→0} (f(x1, . . . , xi−1, xi + t, xi+1, . . . , xk) − f(x)) / t = ri,
i.e., fi (x) = ri .
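The difference quotient in this proof also suggests a simple numerical approximation of partial derivatives. The following Python sketch (with an arbitrarily chosen step t and evaluation point) illustrates it for the function f(x1, x2) = x1² + x2² discussed below:

f = lambda x1, x2: x1**2 + x2**2

def partial(f, x, i, t=1e-6):
    # one-sided difference quotient from the proof of Theorem 26
    xp = list(x)
    xp[i] += t
    return (f(*xp) - f(*x)) / t

x = (1.0, -2.0)
print(partial(f, x, 0), partial(f, x, 1))   # approx. 2*x1 = 2 and 2*x2 = -4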
It is possible that the partial derivatives of f at x exist even if f is not differentiable at x. In fact, the
partial derivatives of f at x can exist even if f is not continuous at x.
Let’s now return to the definition of differentiability. In the proof of Theorem 26, we saw that, if f is
differentiable at x, then
r = (f1(x), . . . , fk(x)).
We call this the Jacobian matrix of f at x. It is tempting to think of r as a function of x, but r ∈ Rk. In
particular, each ri = fi (x) ∈ R (do not confuse the real number fi (x) with the function fi ). That said, the
value of r may be different for different values of x (recall that (r1 , r2 ) = (2x1 , 2x2 ) in the example considered
above).
Now suppose that f is differentiable on B ⊂ A so that all the partial derivatives of f exist for every
x ∈ B (by Theorem 26), and consider the function f′ : B → Rk defined by
f′(x) = (f1(x), . . . , fk(x)).
This function takes the point x ∈ Rk as input and produces the point (f1(x), . . . , fk(x)) ∈ Rk as output
(this is a vector-valued function, as opposed to a real-valued function). Perhaps not surprisingly, we will
call f ′ (x) the derivative of f at x. We then say that f ′ is differentiable at the point x ∈ B, or equivalently,
that f is twice differentiable at the point x, if there exists a k × k matrix of real numbers H such that
lim_{h→0} ||f′(x + h) − f′(x) − Hh|| / ||h||
exists and is equal to zero (note that Hh is a k × 1 matrix, i.e., a k-vector like f ′ (x + h) and f ′ (x)). If f is
twice differentiable at all points in C ⊂ B, then we say f is twice differentiable on the set C. Moreover,
by Theorem 25, if f is twice differentiable on C, then f ′ is continuous on C.
Consider once again the function f : R2 → R defined by f(x1, x2) = x1² + x2², for which f′(x) = (2x1, 2x2).
Here, the above expression is
lim_{h→0} ||(2(x1 + h1), 2(x2 + h2)) − (2x1, 2x2) − Hh|| / √(h1² + h2²) = lim_{h→0} ||(2h1, 2h2) − Hh|| / √(h1² + h2²).
Setting
H =
[ 2 0 ]
[ 0 2 ],
we have
lim_{h→0} ||(2h1, 2h2) − (2h1, 2h2)|| / √(h1² + h2²) = lim_{h→0} √(0² + 0²) / √(h1² + h2²) = 0.
Thus, f is twice differentiable on R2 since the above holds for any (x1, x2) ∈ R2.
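The following Python sketch evaluates the same quotient at a few sample points h (chosen arbitrarily); for this particular f it is identically zero:

import numpy as np

x = np.array([1.0, -2.0])
fprime = lambda x: 2*x                     # f'(x) = (2x1, 2x2)
H = np.array([[2.0, 0.0], [0.0, 2.0]])
for scale in (1e-1, 1e-3, 1e-5):
    h = scale * np.array([0.6, -0.8])      # ||h|| = scale
    num = np.linalg.norm(fprime(x + h) - fprime(x) - H @ h)
    print(num / np.linalg.norm(h))         # 0.0 at every scale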
The analogue of Theorem 26 for vector-valued functions (see Rudin, 1976, Theorem 9.17) tells us that,
if f is twice differentiable at x, then all the second-order partial derivatives of f at x exist. In particular,
we have
H =
[ f11(x) f12(x) · · · f1k(x) ]
[ f21(x) f22(x) · · · f2k(x) ]
[   ⋮      ⋮     ⋱     ⋮    ]
[ fk1(x) fk2(x) · · · fkk(x) ].
We call H the Hessian matrix for f at x. It is important to emphasize that this is not a function, but
rather a k × k matrix of real numbers, i.e., each fij(x) ∈ R. In the example considered above, H is the same
for any x, but, in general, H may vary with x.
The following theorem will be helpful when dealing with Hessian matrices:
Theorem 27 (Young’s Theorem). Let A be an open set in Rk , x ∈ A, and f : A → R. If the partial
derivatives fi and fj exist in some Vϵ (x) and fij and fji are both continuous at x, then fij (x) = fji (x).
Proof. See Apostol (1974, Theorem 12.13).
If f is differentiable on some set and f ′ is continuous on that set, it is said to be continuously differen-
tiable on that set (this is true whether f is a real-valued function of one variable or of several variables, but
we did not need to introduce this concept in the previous section). If f ′ is continuously differentiable on some
set, then f is twice continuously differentiable on that set. Thus, if f is twice continuously differentiable
on some Vϵ (x), Young’s Theorem tells us that the Hessian matrix for f at x will be symmetric.
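Young’s Theorem can be spot-checked numerically with central differences; in the following Python sketch, the test function and evaluation point are arbitrary choices:

f = lambda x1, x2: x1**2 * x2 + x1 * x2**3   # f12 = f21 = 2*x1 + 3*x2**2

def second_partial(f, x, i, j, t=1e-4):
    # central-difference estimate of f_ij at x
    def fi(p):
        pp, pm = list(p), list(p)
        pp[i] += t; pm[i] -= t
        return (f(*pp) - f(*pm)) / (2*t)
    xp, xm = list(x), list(x)
    xp[j] += t; xm[j] -= t
    return (fi(xp) - fi(xm)) / (2*t)

x = (1.5, -0.5)
print(second_partial(f, x, 0, 1), second_partial(f, x, 1, 0))  # both approx. 3.75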
Keep in mind that the existence of partial derivatives does not imply differentiability (only the opposite
is true; see Theorem 26). However, the following theorem shows that continuity of partial derivatives (when
they exist) does imply differentiability:
Theorem 28. Let A be an open set in Rk , and f : A → R. f is continuously differentiable on A if and only
if all its partial derivatives exist and are continuous on A.
Proof. See Rudin (1976, Theorem 9.21).
We can now extend our optimization definitions. Let A ⊂ Rk and f : A → R:
• The point x∗ ∈ A is a global maximizer of f if f(x∗) ≥ f(x) for all x ∈ A (i.e., if f(x∗) = max f(A)).
• The point x∗ ∈ A is a local maximizer of f if f (x∗ ) ≥ f (x) for all x ∈ Vϵ (x∗ ) ∩ A for some ϵ > 0.
• The point x∗ ∈ A is a global minimizer of f if f(x∗) ≤ f(x) for all x ∈ A (i.e., if f(x∗) = min f(A)).
• The point x∗ ∈ A is a local minimizer of f if f (x∗ ) ≤ f (x) for all x ∈ Vϵ (x∗ ) ∩ A for some ϵ > 0.
Theorem 29. Let A be an open set in Rk , x∗ ∈ A, and f : A → R have all its partial derivatives at x∗ . If
x∗ is a local optimizer of f , then fi (x∗ ) = 0 for all i.
Proof. Assume x∗ is a local optimizer of f and fi(x∗) exists for all i. Next, fix some point z ∈ Rk and
consider the function g : R → R defined by
g(t) = f(x∗ + tz).
Note that g(0) = f (x∗ ). Thus, since x∗ is a local optimizer of f , it must be the case that g ′ (0) = 0 (by
Theorem 18). Now, since
g′(t) = Σ_{i=1}^k fi(x∗ + tz) zi,
setting t = 0 yields g′(0) = Σ_{i=1}^k fi(x∗) zi = 0. Since this needs to hold for any z ∈ Rk, we require fi(x∗) = 0 for all i.
For example, consider the function f : R2 → R defined by f(x1, x2) = −2x1² + x1x2 − x2² + 4x1 + 2x2.
If (x∗1, x∗2) is a local optimizer of f, then, by Theorem 29,
f1(x∗1, x∗2) = −4x∗1 + x∗2 + 4 = 0 and f2(x∗1, x∗2) = x∗1 − 2x∗2 + 2 = 0.
Solving the above two equations for x∗1 and x∗2 yields (x∗1, x∗2) = (10/7, 12/7). It is important to emphasize
that this point may not be an optimizer of f, but it is a candidate for one.
At this point, it would be prudent to review some facts from your first course in linear algebra. A pth-
order principal minor of a k × k matrix is the determinant of a submatrix that is obtained by deleting
the same k − p rows and columns of the matrix (note that 1 ≤ p ≤ k). In general, a k × k matrix has
(k choose p) = k! / (p!(k − p)!)
pth-order principal minors. The leading pth-order principal minor of a k × k matrix is the determinant
of the submatrix that is obtained by deleting the last k − p rows and columns of the matrix (or, more simply,
the determinant of the upper-left p × p submatrix of the matrix). For example, consider the 2 × 2 matrix
A =
[ a11 a12 ]
[ a21 a22 ].
A has 2 first-order principal minors: a11 and a22 (the first of these is the leading first-order principal minor).
Moreover, A has 1 second-order principal minor: det(A) = a11 a22 − a12 a21 (and this is the leading second-order
principal minor). Now consider the 3 × 3 matrix
B =
[ b11 b12 b13 ]
[ b21 b22 b23 ]
[ b31 b32 b33 ].
B has 3 first-order principal minors: b11, b22, and b33 (the first of these is the leading first-order principal
minor). B also has 3 second-order principal minors:
det [ b11 b12 ; b21 b22 ], det [ b11 b13 ; b31 b33 ], and det [ b22 b23 ; b32 b33 ]
(the first of these is the leading second-order principal minor). Finally, there is 1 third-order principal minor
of B:
det(B) = b11 det [ b22 b23 ; b32 b33 ] − b12 det [ b21 b23 ; b31 b33 ] + b13 det [ b21 b22 ; b31 b32 ]
(and this is the leading third-order principal minor).
A symmetric matrix is:
• positive definite if and only if all its leading principal minors are positive.
• positive semi-definite if and only if all its principal minors are non-negative.
• negative definite if and only if all its odd-numbered leading principal minors are negative and all its
even-numbered leading principal minors are positive.
• negative semi-definite if and only if all its odd-numbered principal minors are non-positive and all
its even-numbered principal minors are non-negative.
Note that a matrix of zeros is both positive semi-definite and negative semi-definite.
For example, consider the symmetric matrix
[ a b ]
[ b c ].
Its leading principal minors are a and ac − b², so it is positive definite if and only if a > 0 and ac − b² > 0,
and negative definite if and only if a < 0 and ac − b² > 0.
The following theorem, analogous to Theorem 22, provides sufficient conditions for local maximizers and
local minimizers:
Theorem 30. Let A be an open set in Rk, x∗ ∈ A, and f : A → R be twice continuously differentiable on
some Vϵ(x∗) with fi(x∗) = 0 for all i.
(i) If the Hessian matrix for f at x∗ is negative definite, then x∗ is a local maximizer of f.
(ii) If the Hessian matrix for f at x∗ is positive definite, then x∗ is a local minimizer of f.
Returning to the example above, the second-order partial derivatives of f at (x1, x2) are
f11(x1, x2) = −4, f12(x1, x2) = f21(x1, x2) = 1, and f22(x1, x2) = −2
(notice that none of these depend on x1 or x2, but this will not always be the case). Thus, the Hessian of f
at (x∗1, x∗2) = (10/7, 12/7) (really, at any (x1, x2) ∈ R2) is
H =
[ −4 1 ]
[ 1 −2 ].
This matrix is negative definite since −4 < 0 and 8 − 1 > 0. Thus, by Theorem 30, we can say that
(10/7, 12/7) is a local maximizer.
As emphasized earlier, it is possible that H takes on different values for different values of x. Accordingly,
it is possible (for example) that H is positive definite for some values of x but negative definite for other
values of x. In such cases, we would need to examine the definiteness of H specifically at x∗ .
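Definiteness checks of this kind are easy to automate. The following Python sketch computes leading principal minors with numpy and applies the criteria above to the Hessian from the example (note that it covers only the definite cases; the semi-definite cases require examining all principal minors, not just the leading ones):

import numpy as np

def leading_minors(M):
    # determinants of the upper-left p x p submatrices, p = 1, ..., k
    return [np.linalg.det(M[:p, :p]) for p in range(1, M.shape[0] + 1)]

H = np.array([[-4.0, 1.0], [1.0, -2.0]])
D = leading_minors(H)                                   # [-4.0, 7.0]
neg_def = all((d < 0) if p % 2 == 1 else (d > 0) for p, d in enumerate(D, 1))
pos_def = all(d > 0 for d in D)
print(D, neg_def, pos_def)                              # negative definite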
To round things out, we will present one last theorem, analogous to Theorem 23, which provides necessary
conditions for local maximizers and local minimizers:
Theorem 31. Let A be an open set in Rk , x∗ ∈ A, and f : A → R be twice continuously differentiable on
some Vϵ (x∗ ).
(i) If x∗ is a local maximizer, all first-order derivatives of f at x∗ are zero and the Hessian matrix for f
at x∗ is negative semi-definite.
(ii) If x∗ is a local minimizer, all first-order derivatives of f at x∗ are zero and the Hessian matrix for f
at x∗ is positive semi-definite.
Proof. (i) Assume x∗ is a local maximizer of f . Then fi (x∗ ) = 0 for all i (by Theorem 29). Next, fix some
other point z ∈ Rk and consider again the function g defined in the proof of Theorem 29. Since x∗ is a local
maximizer of f , it must be the case that g ′′ (0) ≤ 0 (by Theorem 23). We have
g″(t) = Σ_{i=1}^k Σ_{j=1}^k fij(x∗ + tz) zi zj,
so setting t = 0 yields, in matrix form, g″(0) = z′Hz ≤ 0, where H is the Hessian matrix for f at x∗. Since
this must hold for any z ∈ Rk, H must be negative semi-definite (note that this relies on the fact that H is
symmetric, which follows from an application of Young’s Theorem).
(ii) The proof of (ii) is similar and will be omitted.
Similar to Theorem 23, all that Theorem 31 can do for us is identify a candidate for a local optimizer.
As another example, consider the function f : R2+ → R defined by f(x1, x2) = x1^(3/2) + x2^(3/2). The first-order
partial derivatives of f at (x1, x2) are
f1(x1, x2) = (3/2)√x1 and f2(x1, x2) = (3/2)√x2,
and the second-order partial derivatives of f at (x1, x2) are
f11(x1, x2) = 3/(4√x1), f12(x1, x2) = f21(x1, x2) = 0, and f22(x1, x2) = 3/(4√x2).
Now suppose (x∗1, x∗2) ∈ R2+ is an optimizer. Then, by Theorem 29, we have
(3/2)√x∗1 = 0 and (3/2)√x∗2 = 0.
Solving the above two equations for x∗1 and x∗2 yields (x∗1, x∗2) = (0, 0). Unfortunately, f11 and f22 are
undefined at (0, 0), so Theorems 30 and 31 are not applicable (both of these theorems require that f be
twice continuously differentiable on some Vϵ((x∗1, x∗2))). More to the point, the Hessian of f at (0, 0) is
undefined, so we can't say anything about its definiteness. So, all that we can say is that (0, 0) is a candidate
for a local optimizer of f (it is actually a local minimizer of f since f(0, 0) = 0 and f(x1, x2) > 0 for all
(x1, x2) ∈ R2+ \ {(0, 0)}).
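A crude numerical scan (an arbitrary grid over part of R2+, in Python) is consistent with (0, 0) being a local minimizer:

f = lambda x1, x2: x1**1.5 + x2**1.5
grid = [i/100 for i in range(101)]        # 0.00, 0.01, ..., 1.00
vals = [f(a, b) for a in grid for b in grid if (a, b) != (0.0, 0.0)]
print(f(0.0, 0.0), min(vals) > 0)         # 0.0 True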
Constrained Optimization
In what follows, we will focus solely on maximization problems since the problem of minimizing the function
f is equivalent to the problem of maximizing the function −f . Moreover, to keep life simple, we will focus
on the case where the domain of f is a subset of R2 . This has the added benefit of allowing us to reduce
notational clutter slightly by denoting points in R2 by (x, y) rather than (x1 , x2 ).
Before proceeding, we need to digest the following theorem:
Theorem 32 (Implicit Function Theorem). Let A be an open subset of R2 and f : A → R be continuously
differentiable on A. If (c, d) ∈ A satisfies f (c, d) = 0 and f2 (c, d) ̸= 0, there exists an open interval I ⊂ R
containing c and one and only one function g : I → R implicitly defined by f (x, g(x)) = 0 for all x ∈ I and
satisfying g(c) = d. Moreover, g is continuously differentiable on I with
g′(c) = −f1(c, g(c)) / f2(c, g(c)).
Let A be an open subset of R2 and f : A → R. In the prelude, we considered methods for finding
local/global maximizers of f in the set A. Often, however, we want to restrict our search to some proper
subset of A. Specifically, letting g : A → R, suppose we want to restrict our search to the feasible set
C = {(x, y) ∈ A : g(x, y) = 0}.
For example, consider the function f : R2 → R defined by f(x, y) = 2x² − y². Now, suppose we want to
constrain our search for local/global maximizers to the feasible set
C = {(x, y) ∈ R2 : 2x + y = 1},
i.e., g(x, y) = 2x + y − 1. The standard approach to such problems uses the Lagrangian function
L : R × A → R defined by
L(λ, x, y) = f(x, y) + λg(x, y)
(λ ∈ R is called the Lagrange multiplier). The following theorem provides sufficient conditions for a local
maximizer of f on C (the proof should not be skipped):
Theorem 33. Let A be an open subset of R2 , f : A → R and g : A → R both be twice continuously
differentiable on A, and C = {(x, y) ∈ A : g(x, y) = 0}. Define L : R × A → R by L(λ, x, y) = f (x, y) +
λg(x, y), and let (λ∗ , x∗ , y ∗ ) ∈ R × A. If all first-order partial derivatives of L at (λ∗ , x∗ , y ∗ ) are zero and
the determinant of the Hessian matrix for L at (λ∗ , x∗ , y ∗ ) is positive, then (x∗ , y ∗ ) is a local maximizer of
f on C.
Proof. Suppose all first-order partial derivatives of L at (λ∗ , x∗ , y ∗ ) are zero and the determinant of the
Hessian matrix for L at (λ∗, x∗, y∗) is positive. The first-order partial derivatives of L at (λ∗, x∗, y∗) are
L1(λ∗, x∗, y∗) = g(x∗, y∗),
L2(λ∗, x∗, y∗) = f1(x∗, y∗) + λ∗g1(x∗, y∗),
and
L3(λ∗, x∗, y∗) = f2(x∗, y∗) + λ∗g2(x∗, y∗).
Note that, since L1 (λ∗ , x∗ , y ∗ ) = 0, we have g(x∗ , y ∗ ) = 0, i.e., (x∗ , y ∗ ) ∈ C as desired.
The Hessian matrix for L at (λ∗ , x∗ , y ∗ ) is
g1 (x∗ , y ∗ ) g2 (x∗ , y ∗ )
0
g1 (x∗ , y ∗ ) L22 (λ∗ , x∗ , y ∗ ) L23 (λ∗ , x∗ , y ∗ )
g2 (x∗ , y ∗ ) L23 (λ∗ , x∗ , y ∗ ) L33 (λ∗ , x∗ , y ∗ )
(since f and g are twice continuously differentiable on A, so is L, and thus L32 (λ∗ , x∗ , y ∗ ) = L23 (λ∗ , x∗ , y ∗ )
by Young’s Theorem). The determinant of this matrix is
D = −(L22 (λ∗ , x∗ , y ∗ )g2 (x∗ , y ∗ )2 − 2L23 (λ∗ , x∗ , y ∗ )g1 (x∗ , y ∗ )g2 (x∗ , y ∗ ) + L33 (λ∗ , x∗ , y ∗ )g1 (x∗ , y ∗ )2 ).
We thus require g1 (x∗ , y ∗ ) ̸= 0 or g2 (x∗ , y ∗ ) ̸= 0 (or both) so that D ̸= 0. Without loss of generality, assume
g2 (x∗ , y ∗ ) ̸= 0. Then, by the IFT, there exists an open interval I ⊂ R containing x∗ and a function h : I → R
implicitly defined by g(x, h(x)) = 0 for all x ∈ I satisfying h(x∗ ) = y ∗ and with
h′(x∗) = −g1(x∗, h(x∗)) / g2(x∗, h(x∗))
(note that g and h here correspond to f and g, respectively, in the statement of the IFT above).
We now introduce the function F : I → R defined by F(x) = f(x, h(x)), and show that x∗ is a local
maximizer of F, i.e., that F′(x∗) = 0 and F″(x∗) < 0. The derivative of F at x∗ is
F′(x∗) = f1(x∗, h(x∗)) + f2(x∗, h(x∗))h′(x∗) = 0,
which follows since L2(λ∗, x∗, y∗) = L3(λ∗, x∗, y∗) = 0 gives f1(x∗, y∗) = −λ∗g1(x∗, y∗) and f2(x∗, y∗) =
−λ∗g2(x∗, y∗), while the expression for h′(x∗) above gives g1(x∗, y∗) + g2(x∗, y∗)h′(x∗) = 0. The second
derivative of F at x∗ is
F″(x∗) = L22(λ∗, x∗, h(x∗)) + 2L23(λ∗, x∗, h(x∗))h′(x∗) + L33(λ∗, x∗, h(x∗))h′(x∗)² − L3(λ∗, x∗, h(x∗))h″(x∗)
= L22(λ∗, x∗, y∗) + 2L23(λ∗, x∗, y∗)(−g1(x∗, y∗)/g2(x∗, y∗)) + L33(λ∗, x∗, y∗)(g1(x∗, y∗)/g2(x∗, y∗))²
= −D / g2(x∗, y∗)²
< 0
since D > 0 (in the second equality above, we have used the fact that L3(λ∗, x∗, y∗) = 0). Thus, x∗ is a
local maximizer of F by Theorem 22, which means that (x∗, h(x∗)) = (x∗, y∗) is a local maximizer of f on C.
Let's return now to the example considered above. The Lagrangian function is
L(λ, x, y) = 2x² − y² + λ(2x + y − 1),
and its first-order partial derivatives are
L1(λ, x, y) = 2x + y − 1, L2(λ, x, y) = 4x + 2λ, and L3(λ, x, y) = −2y + λ.
Setting each of these first-order partial derivatives equal to zero and solving yields (λ∗, x∗, y∗) = (−2, 1, −1).
The Hessian for L at this point (actually, at any point in R3) is
[ 0 2 1 ]
[ 2 4 0 ]
[ 1 0 −2 ].
The determinant of this matrix is 4 (i.e., positive), so we can conclude that (1, −1) is a local maximizer of
f on C.
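Since the first-order conditions in this example are linear in (λ, x, y), they can be solved numerically, and the determinant checked, as in the following Python sketch (which uses the Lagrangian from the example above):

import numpy as np

# FOCs of L(lam, x, y) = 2x^2 - y^2 + lam*(2x + y - 1) as a linear system
A = np.array([[0.0, 2.0, 1.0],     # L1 = 2x + y - 1 = 0
              [2.0, 4.0, 0.0],     # L2 = 4x + 2*lam = 0
              [1.0, 0.0, -2.0]])   # L3 = -2y + lam = 0
b = np.array([1.0, 0.0, 0.0])
lam, x, y = np.linalg.solve(A, b)
print(lam, x, y)           # -2.0 1.0 -1.0
print(np.linalg.det(A))    # A is also the (constant) Hessian of L; det ~= 4 > 0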
To understand what is happening here, let's put this in the context of the proof of Theorem 33. Here, it's
easy to see that h(x) = 1 − 2x, and thus
F(x) = f(x, h(x)) = 2x² − (1 − 2x)² = −2x² + 4x − 1,
so that F′(x) = −4x + 4 and F″(x) = −4. Since F′(1) = 0 and F″(1) = −4 < 0, we can conclude that 1 is a local maximizer of F (meaning that
(1, h(1)) = (1, −1) is a local maximizer of f on C). Unfortunately, in many cases, we don't have an explicit
expression for h, so we need to use the Lagrangian method instead of this more direct approach.
For completeness, the following theorem provides sufficient conditions for a local maximizer in the general
case where f is a real-valued function of k ≥ 2 variables:
Theorem 34. Let k ≥ 2, A be an open subset of Rk , f : A → R and g : A → R both be twice continuously
differentiable on A, and C = {x ∈ A : g(x) = 0}. Define L : R × A → R by L(λ, x) = f (x) + λg(x), and let
(λ∗ , x∗ ) ∈ R × A and Dp be the pth-order leading principal minor of the Hessian matrix for L at (λ∗ , x∗ ). If
all first-order partial derivatives of L at (λ∗ , x∗ ) are zero and, for ℓ = 3, . . . , k + 1, Dℓ is positive for odd ℓ
and negative for even ℓ, then x∗ is a local maximizer of f on C.
Proof. See Simon and Blume (1994, Theorem 30.12).
Note that the Hessian matrix for L at (λ∗ , x∗ ) is (k + 1) × (k + 1), so Dk+1 is just the determinant of
this Hessian matrix itself. For example, with k = 2, we have k + 1 = 3, so the only second-order condition
in the above theorem is D3 > 0 (i.e., the determinant of the Hessian matrix is positive), which is consistent
with Theorem 33. When k = 3, the second-order conditions in the above theorem are D3 > 0 and D4 < 0.
When k = 4, the second-order conditions in the above theorem are D3 > 0, D4 < 0, and D5 > 0.
* Exercises
1. Show that N2 is countable.
2. In the proof of Theorem 1, we used the “obvious” facts that |x| ≥ x and |x|2 = x2 for all x ∈ R. Prove
these facts.
3. Prove the Reverse Triangle Inequality.
4. Show that |ab| = |a||b| for all a, b ∈ R.
5. Let (sn ) be some sequence. Show that, if lim |sn | = 0, then lim sn = 0.
6. Let (sn ) be some sequence. Show that, if lim sn = s, then lim |sn | = |s|. (HINT: You will need to use
the Reverse Triangle Inequality.)
7. Let (sn ) and (tn ) be sequences converging to the same limit. Show that lim |sn − tn | = 0.
8. Let (sn ) be some sequence. Show that sn ≤ sup{sk : k ≥ n} for all n ∈ N.
9. Consider the sequence (an ) defined by an = αn where |α| ∈ (0, 1). Show that lim an = 0.
10. Consider the sequence (an ) defined by an = (−1)n+1 . Find lim sup an and lim inf an .
11. Show that the sequence (an ) defined by an = (−1)n /n is a Cauchy sequence.
12. Show that the function f : R → R defined by f (x) = x2 is convex on R.
13. Consider the function f : R+ → R defined by f(x) = √x. Using the ϵ–δ definition of a (two-sided)
limit, show that limx→0 f(x) = 0.
14. Let f be a real-valued function and c be an interior point in the domain of f . Write a formal definition
of limx→c+ f (x) = ∞ (f diverges to ∞ from the right).
15. Consider the function f : (−∞, −1) ∪ {0} ∪ (1, ∞) → R defined by
f(x) = −1 if x < −1, f(x) = 0 if x = 0, and f(x) = 1 if x > 1.
References
Apostol, T.M. (1974). Mathematical Analysis, 2nd edition. Pearson.
Bartle, R.G. and Sherbert, D.R. (2011). Introduction to Real Analysis, 4th edition. Wiley.
Rudin, W. (1976). Principles of Mathematical Analysis, 3rd edition. McGraw-Hill.
Simon, C.P. and Blume, L. (1994). Mathematics for Economists. Norton.