Mathematics (1)
Warning: All sections marked with an asterisk (*) as well as all proofs are reserved for PhD students.
Recall that N = {1, 2, 3, . . .} denotes the set of natural numbers.
1 Some authors will use ⊂ to denote a proper subset and ⊆ to denote a subset, but there is rarely a need to differentiate between the two.
* Countability
A set is said to be countable if it is either finite or countably infinite, which means that it has a one-to-one correspondence with N (i.e., each element in the set can be paired with exactly one element of N and vice-versa). So, N itself is obviously countably infinite. Another example of a countably infinite set is the set of even natural numbers, E = {2, 4, 6, . . .}. To see why this is the case, consider the following correspondence:
\[
\begin{array}{ccccccc}
\mathbb{N}: & 1 & 2 & 3 & \cdots & n & \cdots \\
 & \updownarrow & \updownarrow & \updownarrow & & \updownarrow & \\
E: & 2 & 4 & 6 & \cdots & 2n & \cdots
\end{array}
\]
To see why Z is a countably infinite set, consider the following correspondence:
\[
\begin{array}{ccccccccc}
\mathbb{N}: & 1 & 2 & 3 & 4 & 5 & \cdots & n & \cdots \\
 & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & & \updownarrow & \\
\mathbb{Z}: & 0 & -1 & 1 & -2 & 2 & \cdots & \begin{cases} (n-1)/2 & \text{if } n \text{ is odd} \\ -n/2 & \text{if } n \text{ is even} \end{cases} & \cdots
\end{array}
\]
What might be more surprising is that Q is also countably infinite. To see this, begin by defining, for each
n ∈ N, the set:
\[
A_n = \left\{ \pm\frac{p}{q} : p, q \in \mathbb{N} \text{ are in lowest terms with } p + q = n \right\}
\]
That is,
\[
A_1 = \left\{ \frac{0}{1} \right\}, \qquad A_2 = \left\{ \frac{1}{1}, \frac{-1}{1} \right\}, \qquad A_3 = \left\{ \frac{1}{2}, \frac{-1}{2}, \frac{2}{1}, \frac{-2}{1} \right\},
\]
and so on. We then have the following correspondence:
\[
\begin{array}{ccccccccc}
\mathbb{N}: & 1 & 2 & 3 & 4 & 5 & 6 & 7 & \cdots \\
 & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \\
\mathbb{Q}: & A_{1,1} & A_{2,1} & A_{2,2} & A_{3,1} & A_{3,2} & A_{3,3} & A_{3,4} & \cdots
\end{array}
\]
where An,i is the ith element of An (e.g., A2,1 = 1/1). On the other hand, R is not countably infinite (i.e.,
it is uncountable), but we won’t prove this here (for a proof, see Bartle & Sherbert, 2011, Theorem 2.54).
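To make this enumeration tangible, here is a minimal Python sketch (an illustration, not part of the original notes) that generates the sets An in order and pairs each rational with a natural number:

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def rationals():
    """Enumerate Q in the order A_1, A_2, A_3, ... described above."""
    yield Fraction(0, 1)               # A_1 = {0/1}
    n = 2
    while True:
        for p in range(1, n):          # A_n: +-p/q in lowest terms with p + q = n
            q = n - p
            if gcd(p, q) == 1:         # skip fractions not in lowest terms
                yield Fraction(p, q)
                yield Fraction(-p, q)
        n += 1

# Pair each rational with a natural number, as in the correspondence above.
for k, r in enumerate(islice(rationals(), 7), start=1):
    print(k, r)                        # 1 0, 2 1, 3 -1, 4 1/2, 5 -1/2, 6 2, 7 -2
```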
More About R
Let a, b ∈ R with a < b. Various intervals within R are defined as follows:
(a, b] = {x ∈ R : a < x ≤ b}
[a, b) = {x ∈ R : a ≤ x < b}
Note that a construction such as [−∞, ∞] makes no sense since −∞ and ∞ are not real numbers.
The non-negative real numbers [0, ∞) are denoted by R+ , while the non-positive real numbers (−∞, 0]
are denoted by R− .
The set S ⊂ R is said to be bounded from above if there is a u ∈ R such that s ≤ u for all s ∈ S (here, u is called an upper bound of S). For example, consider the sets A = [0, 1], B = (−∞, 1), and C = (0, ∞). Clearly A and B are bounded from above (any real number greater than or equal to 1 is an upper bound for both), while C is not bounded from above. The set S ⊂ R is said to be bounded from below if there is an l ∈ R such that s ≥ l for all s ∈ S (here, l is called a lower bound of S). In the example above, clearly A
and C are bounded from below (any real number less than or equal to 0 is a lower bound for both), while B
is not bounded from below. If the set S ⊂ R is bounded from both above and below, it is simply said to be
bounded. More directly, we can say that the set S is bounded if there exists a b ∈ R such that |s| ≤ b for
all s ∈ S. If S is not bounded it is said to be unbounded, even if it is bounded from either above or below
(but not both). In the example above, only A is bounded, while B and C are unbounded.
If a set S ⊂ R is bounded from above, its smallest upper bound is called the supremum of S and is denoted by sup S (another term for supremum is least upper bound). In the above example, sup A = sup B = 1. Since the set C is not bounded from above, it is convenient to write sup C = ∞. If a set S ⊂ R is bounded from below, its largest lower bound is called the infimum of S and is denoted by inf S (another term for infimum is greatest lower bound). In the above example, inf A = inf C = 0. Since the set B is not bounded from below, it is convenient to write inf B = −∞. If a set S ⊂ R has a largest element, we call that element the maximum of S and denote it by max S. In the above example, max A = 1 (this is also its supremum), while B and C have no maximum (notice that 1 ∉ B). If a set S ⊂ R has a smallest element, we call that element the minimum of S, and denote it by min S. In the above example, min A = 0 (this is also its infimum), while B and C have no minimum (notice that 0 ∉ C).
Functions
Given two sets A and B, a function f from A to B (written f : A → B) maps each element of A to a specific
element of B. That is, each a ∈ A is mapped to some f (a) ∈ B. It is important to distinguish between the
function f and the value f (a). We can think of f as a “process” that takes a as “input” and produces f (a)
as “output”. For example, consider the function f : {Red, Yellow, Green} → {Stop, Yield, Go} defined by
\[
f(\text{Colour}) = \begin{cases} \text{Stop} & \text{if Colour} = \text{Red}, \\ \text{Yield} & \text{if Colour} = \text{Yellow}, \\ \text{Go} & \text{if Colour} = \text{Green}. \end{cases}
\]
This function takes a colour as an input, and produces an action as an output, e.g., plugging the colour Red
into this function produces the action Stop as an output.
If f : A → B, the set A is referred to as the domain of f , and we say that f is defined on A (and on
any subset of A).2 The codomain of f is B, while the range of f is {f (a) ∈ B : a ∈ A} (i.e., the subset of
B containing only the elements of B that f can map to from A). We will sometimes denote the range of f
as f (A).
A function whose range is a subset of R is called a real-valued function. When we encounter a real-
valued function, we usually write its codomain as R, even if its range is some proper subset of R. For
example, consider the function f : R → R defined by f (x) = |x|. This function has range R+ , which is a
proper subset of its codomain R.
Taking the square root of both sides of the above produces the desired result.
The “Reverse Triangle Inequality” states that ||a| − |b|| ≤ |a − b| for all a, b ∈ R. You are asked to prove
this in Exercise 3.
For example, the sequence (an ) defined by an = n−2 has terms a1 = 1, a2 = 1/4, a3 = 1/9,
and so on.
A sequence (sn ) is said to converge to s ∈ R if, for every ϵ > 0, there exists an N ∈ N such that
|sn − s| < ϵ for all n > N . If (sn ) converges to s, we say that s is the limit of (sn ) and write lim sn = s. If
a sequence converges to some limit, then it is said to be convergent.
This definition needs some unpacking. Suppose (sn ) converges to s. What this is saying is that sn will
be “close” to s (less than ϵ away) provided that n is sufficiently large (i.e., larger than N , which depends on
ϵ).
For example, consider again the sequence (an ) defined by an = n−2 . It seems intuitive to say that
lim an = 0, but let’s try to relate this to the definition given above. Suppose, for the moment, that ϵ = 0.1.
Then, as Figure 1 makes clear, we have |n−2 − 0| < 0.1 for all n > 3 (all terms between the two dotted lines
satisfy this condition).
In general, we can choose ϵ as small as we want provided that we choose N large enough (in this
sense, we say that sn can be made “arbitrarily close” to s for sufficiently large n). A useful “trick” for
2 In the above example, f is defined on {Red, Yellow}, which is a subset of its domain. However, f is not defined on any set that contains elements outside {Red, Yellow, Green}.
Figure 1: The sequence (an ) defined by an = n−2
selecting the appropriate N corresponding to our choice of ϵ is to solve the equation |sN − s| = ϵ for N (if
the resulting value of N is not a natural number, then round down to the nearest one).3 For the sequence
(an ) considered above, we have |N −2 − 0| = ϵ. Since the quantity inside the absolute value function must
be positive (remember that N ∈ N meaning that N cannot be negative), this is equivalent to N −2 = ϵ or
N = ϵ−1/2 . Thus, with ϵ = 0.1, we have ϵ−1/2 ≈ 3.1623, so we set N = 3 as above. Similarly, with ϵ = 0.01,
we have ϵ−1/2 = 10, so we set N = 10 (notice that |11−2 − 0| ≈ 0.008 < 0.01).
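The following Python sketch (not part of the original notes) checks these choices of N numerically over a finite range of terms:

```python
def a(n):
    """Terms of the sequence a_n = n**(-2)."""
    return n ** -2

for eps in (0.1, 0.01):
    N = int(eps ** -0.5)               # eps**(-1/2), rounded down to a natural number
    # every term beyond N (up to a finite horizon) is within eps of the limit 0
    assert all(abs(a(n) - 0) < eps for n in range(N + 1, N + 5000))
    print(f"eps = {eps}: N = {N} works")
```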
A sequence that does not have a limit (i.e., that is not convergent) is said to be divergent. For example,
the sequence (bn ) defined by bn = (−1)n diverges since the terms of this sequence are
b1 = −1, b2 = 1, b3 = −1
and so on.
We say that a sequence (sn ) diverges to ∞ (and write lim sn = ∞) if, for every u > 0, there exists an N ∈ N such that sn > u for all n > N. For example, the sequence (cn ) defined by cn = √n diverges to ∞. To show this formally, let u > 0 and N = u². Then n > N implies n > u² and thus √n > u. Similarly,
we say that a sequence (sn ) diverges to −∞ (and write lim sn = −∞) if, for every l < 0, there exists an
N ∈ N such that sn < l for all n > N .
The sequence (sn ) is said to be bounded if {sn : n ∈ N} is a bounded set, i.e., if there exists a b ∈ R
such that |sn | ≤ b for all n ∈ N. For example, the sequence (an ) defined by an = n−2 is bounded (note that
sup{an : n ∈ N} = 1 and inf{an : n ∈ N} = 0, so we could say |an | ≤ 1 for all n ∈ N).
Clearly, any sequence that diverges to ∞ or −∞ is unbounded. However, it is important to recognize
that for any sequence (sn ), even for one that is unbounded, sn ∈ R for all n ∈ N, i.e., |sn | < ∞ for all n ∈ N.
In other words, saying that (sn ) diverges to ∞ or −∞ does not mean that sn is ever equal to ∞ or −∞
(remember that ∞ and −∞ are not real numbers). Instead, it just means that |sn | “grows without bound”.
The above discussion leads nicely into the following theorem:
Theorem 2. All convergent sequences are bounded.
Proof. Let (sn ) be a sequence converging to s ∈ R. Letting ϵ = 1, there exists an N such that |sn − s| < 1
for all n > N . Thus, by the triangle inequality, we have |sn | < |s| + 1 for all n > N , meaning that |sn | is
bounded for n > N . Of course, for any n ≤ N , there may be some term in the set {|s1 |, . . . , |sN |} which is
larger than |sn | (but still finite), so we write
\[
|s_n| \leq \max\{|s_1|, \ldots, |s_N|, |s| + 1\}
\]
for all n ∈ N.
As noted above, sequences that diverge to ∞ or −∞ are unbounded. However, some divergent sequences
are bounded. For example, the sequence (bn ) defined by bn = (−1)n is divergent but bounded since |bn | ≤ 1
for all n ∈ N.
The results in the next theorem will be used frequently:
Theorem 3 (Algebraic Limit Theorem). Let (sn ) and (tn ) be sequences converging to s and t, respectively.
Then,
(i) lim(asn ) = as, for all a ∈ R.
(ii) lim(sn + tn ) = s + t.
(iii) lim(sn tn ) = st.
(iv) lim(sn /tn ) = s/t, provided t ̸= 0.
Proof. (i) We consider only the case where a ̸= 0 (when a = 0, we have the sequence 0, 0, 0, . . . which
obviously converges to 0). Notice that |asn − as| = |a||sn − s|. Since ϵ is arbitrary, we know that there exists
an N ∈ N such that |sn − s| < ϵ/|a| for all n > N (if we can choose an Nϵ ∈ N such that |sn − s| < ϵ for all
n > Nϵ , then we can also choose an N such that |sn − s| < ϵ/|a| for all n > N ) . Thus, for any n > N , we
have
\[
|as_n - as| = |a||s_n - s| < |a| \cdot \frac{\epsilon}{|a|} = \epsilon.
\]
(ii) Notice that |(sn + tn ) − (s + t)| ≤ |sn − s| + |tn − t| by the Triangle Inequality. We know that there
exists an N1 ∈ N such that |sn − s| < ϵ/2 for all n > N1 , and an N2 ∈ N such that |tn − t| < ϵ/2 for all
n > N2 . Thus, for all n > max{N1 , N2 }, we have
\[
|(s_n + t_n) - (s + t)| \leq |s_n - s| + |t_n - t| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.
\]
(iii) Notice that
|sn tn − st| = |sn tn − stn + stn − st| ≤ |sn tn − stn | + |stn − st| = |tn ||sn − s| + |s||tn − t|
by the Triangle Inequality. Now consider the term |tn ||sn − s|. Since the sequence (tn ) is bounded (by
Theorem 2), we know there exists a b such that |tn | < b for all n ∈ N. Thus, we know that there exists an
N1 ∈ N such that |sn − s| < ϵ/(2b) for all n > N1 . Next consider the term |s||tn − t|, and assume for the
moment that s ̸= 0. We know that there exists an N2 ∈ N such that |tn − t| < ϵ/(2|s|) for all n > N2 . Thus,
for all n > max{N1 , N2 }, we have
\[
|s_n t_n - st| \leq |t_n||s_n - s| + |s||t_n - t| < b \cdot \frac{\epsilon}{2b} + |s| \cdot \frac{\epsilon}{2|s|} = \epsilon.
\]
All that remains is to consider the case where s = 0. Similar to above, we know that there exists an N3 ∈ N
such that |sn | < ϵ/b for all n > N3 . Thus, for all n > N3 , we have
\[
|s_n t_n| \leq |s_n||t_n| < \frac{\epsilon}{b} \cdot b = \epsilon.
\]
(iv) We just need to show that lim(1/tn ) = 1/t and then the result follows from (iii). Notice that
\[
\left| \frac{1}{t_n} - \frac{1}{t} \right| = \left| \frac{t - t_n}{t t_n} \right| = \frac{|t_n - t|}{|t t_n|} = \frac{1}{|t||t_n|}|t_n - t|.
\]
We know that there exists an N1 such that |tn − t| < |t|/2 for all n > N1 . By the Reverse Triangle
Inequality, we have
||t| − |tn || ≤ |t − tn | = |tn − t|.
Thus, since ||t| − |tn || ≥ |t| − |tn |, we have |t| − |tn | < |t|/2 for all n > N1 , which implies that |tn | > |t|/2, or
equivalently, that
\[
\frac{1}{|t_n|} < \frac{2}{|t|}
\]
for all n > N1 . We also know that there exists an N2 ∈ N such that |tn − t| < ϵ|t|2 /2 for all n > N2 . Thus,
for all n > max{N1 , N2 }, we have
\[
\left| \frac{1}{t_n} - \frac{1}{t} \right| = \frac{1}{|t||t_n|}|t_n - t| < \frac{2}{|t|^2} \cdot \frac{\epsilon |t|^2}{2} = \epsilon.
\]
It is worth noting here that lim sn /tn exists even if tn = 0 for some n < max{N1 , N2 }.
A sequence (sn ) is said to be non-decreasing if sn ≤ sn+1 for all n, and non-increasing if sn ≥ sn+1
for all n (if the inequalities are strict, then we say that the sequence is increasing rather than non-decreasing
or decreasing rather than non-increasing). A sequence that is either non-decreasing or non-increasing (or
both, in the case of a constant sequence) is called monotone. For example, the sequence (an ) defined by
an = n−2 is monotone (it is decreasing), while the sequence (bn ) defined by bn = (−1)n is not monotone.
We are now ready to prove one of the most important results we will encounter:
Theorem 4 (Monotone Convergence Theorem). If a sequence is monotonic and bounded, then it is conver-
gent.
Proof. Suppose the sequence (sn ) is monotone and bounded. More specifically, let’s say (sn ) is non-decreasing
(the non-increasing case can be handled analogously). Now, consider the set {sn : n ∈ N}. Since this set
is bounded, it has a supremum S ∈ R (by the completeness axiom), i.e., sn ≤ S for all n ∈ N. Intuitively,
lim sn = S, but let’s confirm this is indeed the case. For ϵ > 0, S − ϵ is not an upper bound of {sn : n ∈ N}
(S is the smallest upper bound of this set) and thus there exists an N such that sN > S − ϵ. Since (sn ) is
non-decreasing, sn ≥ sN for all n > N . Thus,
S − ϵ < sN ≤ sn ≤ S < S + ϵ
for all n > N (the last inequality above follows from the fact that ϵ > 0). Therefore, |sn − S| < ϵ for n > N ,
i.e., (sn ) converges to S.
Consider again the sequence (bn ) defined by bn = (−1)n . Although this sequence is bounded, it is not
monotone, so the MCT cannot be applied. Of course, this does not mean that a sequence must be monotone
in order to converge. For example, consider the sequence (dn ) defined by
\[
d_n = \frac{(-1)^n}{n},
\]
which has terms
\[
d_1 = -1, \quad d_2 = \frac{1}{2}, \quad d_3 = -\frac{1}{3},
\]
and so on. This sequence is not monotone, but it does converge to 0. To see this, let’s use our usual “trick”
and set
\[
\left| \frac{(-1)^N}{N} - 0 \right| = \epsilon,
\]
which is equivalent to
\[
\frac{1}{N} = \epsilon,
\]
or N = 1/ϵ. Thus, for all n > 1/ϵ, we have |dn − 0| < ϵ. To be concrete, set ϵ = 0.1 so that N = 10. As
Figure 2 makes clear, |dn − 0| < 0.1 for all n > 10 (e.g., |d11 − 0| ≈ 0.091).
Figure 2: The sequence (dn ) defined by dn = (−1)n /n
We will now consider an application of the MCT. Suppose (sn ) is a sequence, and let
\[
S_n = \sum_{k=1}^{n} s_k
\]
for each n ∈ N. Then (Sn ) is itself a sequence, which we call a sequence of partial sums. The terms in
(Sn ) are
S1 = s1 , S2 = s1 + s2 , S3 = s1 + s2 + s3 ,
and so on. If (Sn ) converges to S, then we say that the infinite series $\sum_{k=1}^{\infty} s_k$ converges to S and write $\sum_{k=1}^{\infty} s_k = S$. For example, consider again the sequence (an ) defined by an = n−2 , and let
\[
A_n = \sum_{k=1}^{n} a_k = \sum_{k=1}^{n} \frac{1}{k^2} = 1 + \frac{1}{4} + \frac{1}{9} + \cdots + \frac{1}{n^2}.
\]
Figure 3: The sequence (An ) defined by $A_n = \sum_{k=1}^{n} k^{-2}$
Notice that, for k ≥ 2, 1/k² < 1/(k(k − 1)) = 1/(k − 1) − 1/k, so An < 1 + (1 − 1/n) = 2 − 1/n, i.e., 2 is an upper bound for (An ) (and, obviously, 1 is a lower bound). Thus, by the MCT, (An ) converges.
We didn’t work out what lim An is, but it is not 2 (it is actually π 2 /6 = 1.64 . . .). Often, it’s enough just to
know whether or not a sequence/series converges.
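As a numerical aside (a sketch, not from the original notes), we can watch the partial sums An increase toward π²/6 while staying below the bound 2:

```python
import math

def A(n):
    """Partial sum A_n = sum_{k=1}^{n} 1/k**2."""
    return sum(1 / k**2 for k in range(1, n + 1))

for n in (1, 10, 100, 10_000):
    print(n, A(n))                     # increasing in n, always below 2
print("pi^2/6 =", math.pi**2 / 6)      # the actual limit, ~1.6449
```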
Notice that the sequence (An ) defined above can also be defined recursively as
\[
A_1 = 1, \qquad A_{n+1} = A_n + \frac{1}{(n+1)^2}.
\]
Another example of a recursively defined sequence is the well-known Fibonacci sequence (Fn ) defined by
\[
F_1 = F_2 = 1, \qquad F_{n+2} = F_{n+1} + F_n.
\]
Clearly, this sequence diverges to ∞. However, we can show that the sequence (Gn ) defined by Gn = Fn /2n
converges. First, notice that (Gn ) has terms
\[
G_1 = \frac{1}{2}, \quad G_2 = \frac{1}{4}, \quad G_3 = \frac{1}{4}, \quad G_4 = \frac{3}{16},
\]
and so on, and thus seems to be non-increasing (see also Figure 4). We can confirm this by noting that, for
n ≥ 3,
\[
G_n - G_{n+1} = \frac{F_n}{2^n} - \frac{F_{n+1}}{2^{n+1}} = \frac{2F_n - F_{n+1}}{2^{n+1}} = \frac{2F_n - (F_n + F_{n-1})}{2^{n+1}} = \frac{F_n - F_{n-1}}{2^{n+1}} > 0
\]
(since (Fn ) itself is increasing for n ≥ 3). This also makes it clear that 1/2 provides an upper bound for
(Gn ). Moreover, Gn > 0 for all n (both the numerator and denominator are positive for all n), i.e., 0 is a
lower bound for (Gn ). Thus, since (Gn ) is monotonic and bounded, the MCT tells us it converges.
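A short Python sketch (not part of the original notes; it assumes the initial conditions F1 = F2 = 1 given above) generates Gn = Fn /2n and checks the monotonicity and bounds that the MCT argument relies on:

```python
def G_terms(n_max):
    """Yield G_n = F_n / 2**n, where F_1 = F_2 = 1 and F_{n+2} = F_{n+1} + F_n."""
    a, b = 1, 1                        # F_1, F_2
    for n in range(1, n_max + 1):
        yield a / 2**n
        a, b = b, a + b                # advance the Fibonacci recursion

terms = list(G_terms(30))
print(terms[:5])                       # 0.5, 0.25, 0.25, 0.1875, 0.15625
# non-increasing, and bounded between 0 and 1/2
assert all(t1 >= t2 for t1, t2 in zip(terms, terms[1:]))
assert all(0 < t <= 0.5 for t in terms)
```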
Before moving on, let’s consider the series (Bn ) defined by
\[
B_n = \sum_{k=1}^{n} \alpha^k,
\]
where |α| < 1.
Figure 4: The sequence (Gn ) defined by Gn = Fn /2n where (Fn ) is the Fibonacci sequence
Notice that
\[
B_n - \alpha B_n = \alpha - \alpha^{n+1}
\]
(the intermediate terms cancel), so
\[
B_n = \frac{\alpha(1 - \alpha^n)}{1 - \alpha}.
\]
Thus, since lim αn = 0 (see Exercise 9), we have
\[
\lim B_n = \frac{\alpha}{1 - \alpha}
\]
by Theorem 3. It is worth noting that we can use this result to show that
\[
\lim \sum_{k=0}^{n} \alpha^k = \alpha^0 + \lim B_n = 1 + \frac{\alpha}{1 - \alpha} = \frac{1}{1 - \alpha}.
\]
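A quick numerical check of the geometric series (a sketch, not from the original notes), with α = 1/2 so that α/(1 − α) = 1:

```python
alpha = 0.5

def B(n):
    """Partial sum B_n = sum_{k=1}^{n} alpha**k."""
    return sum(alpha**k for k in range(1, n + 1))

for n in (1, 5, 10, 50):
    print(n, B(n))                                   # approaches alpha/(1 - alpha) = 1
print("closed form:", alpha * (1 - alpha**50) / (1 - alpha))  # matches B(50)
```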
* Subsequences
Let m1 , m2 , m3 . . . be natural numbers with m1 < m2 < m3 . . ., which implies that mn ≥ n for all n ∈
N. Given the sequence (sn ), the sequence (smn ), i.e., the sequence with nth term smn , is said to be a
subsequence of (sn ).
As an example, consider the sequence (en ) defined by
\[
e_n = (-1)^n \left( 1 + \frac{1}{n} \right),
\]
which has terms
\[
e_1 = -2, \quad e_2 = \frac{3}{2}, \quad e_3 = -\frac{4}{3}, \quad e_4 = \frac{5}{4},
\]
and so on. One possible subsequence of (en ) can be constructed by setting mn = 2n for all n ∈ N, i.e.,
letting
\[
e_{m_1} = e_2 = \frac{3}{2}, \quad e_{m_2} = e_4 = \frac{5}{4}, \quad e_{m_3} = e_6 = \frac{7}{6}
\]
and so on. The terms in this subsequence consist of only the positive terms in the original sequence (en ). In
this sense, a subsequence can be viewed as being obtained by “deleting” certain terms from a sequence (in
this example, we deleted all the negative terms from (en )). Of course, we could also obtain a subsequence
from (en ) by deleting all its positive terms (i.e., setting mn = 2n − 1 for all n ∈ N). Alternatively, we could
obtain a subsequence from (en ) by deleting the first k terms (i.e., setting mn = n + k for all n ∈ N).
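The three subsequences just described are easy to generate in Python (a sketch, not from the original notes):

```python
def e(n):
    """Terms of the sequence e_n = (-1)**n * (1 + 1/n)."""
    return (-1)**n * (1 + 1/n)

idx = range(1, 7)
positive = [e(2*n) for n in idx]       # m_n = 2n: only the positive terms
negative = [e(2*n - 1) for n in idx]   # m_n = 2n - 1: only the negative terms
shifted  = [e(n + 3) for n in idx]     # m_n = n + k with k = 3: drop the first 3 terms

print(positive)                        # 1.5, 1.25, 1.1666..., ...
print(negative)                        # -2.0, -1.3333..., -1.2, ...
print(shifted)
```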
The following lemma will be used to prove a useful theorem about subsequences (a lemma is a theorem
that is not so interesting itself, but is instead used to prove other theorems):
Lemma 1. Every sequence contains a monotone subsequence.
Proof. We will say that sm is a dominant term in the sequence (sn ) if sm ≥ sn for all n > m. We now
consider two cases:
Case 1: Suppose (sn ) has infinitely many dominant terms. Then we can construct a subsequence (smn ) consisting only of these dominant terms, and this subsequence will be non-increasing (i.e., monotone) since
sm1 ≥ sm2 ≥ sm3 ≥ . . ..
Case 2: Suppose (sn ) has finitely many (possibly zero) dominant terms. Let sm1 be the first term in
(sn ) after its final dominant term (if (sn ) has zero dominant terms, then m1 = 1). Since sm1 is not itself
dominant, there exists an m2 > m1 such that sm2 > sm1 . Similarly, since sm2 is not dominant, there exists
an m3 > m2 such that sm3 > sm2 , and so on. Thus, the subsequence (smn ) will be increasing (i.e., monotone)
since sm1 < sm2 < sm3 . . ..
Notice that the sequence (en ) defined above has infinitely many dominant terms, namely e2 , e4 , e6 , . . ., and
the subsequence that contains only these terms is monotone (it is decreasing). Similarly, the sequence (bn )
defined above has infinitely many dominant terms, namely b2 , b4 , b6 , . . ., and the subsequence that contains
only these terms is monotone (it is constant). On the other hand, the sequence (cn ) defined above has no
dominant terms since cn > cm for all n > m, and the subsequence (cmn ) that has nth term cmn = cn is
monotone (it is increasing).
We are now ready to prove the following:
Theorem 5 (Bolzano–Weierstrass Theorem). Every bounded sequence contains a convergent subsequence.
Proof. If (sn ) is a bounded sequence, then any subsequence of (sn ) is bounded. Thus by Lemma 1, (sn )
contains a monotone bounded subsequence, and by the MCT, this subsequence is convergent.
Notice that the sequence (en ) defined above is bounded (a lower bound is -2 and an upper bound is
3/2). The BWT says that this sequence must contain a convergent subsequence, even though it is not itself
convergent. One such subsequence is the one with terms e2 , e4 , e6 , . . . (which clearly converges to 1); another has the terms e1 , e3 , e5 , . . . (which clearly converges to −1). The limit of a convergent subsequence is called a
subsequential limit (e.g., 1 and -1 are both subsequential limits of the sequence (en ) given above).
Before moving on, we will prove another lemma about subsequences that will be used later on:
Lemma 2. Every subsequence of a convergent sequence has the same limit.
Proof. Let (sn ) be a sequence converging to s, i.e., for every ϵ > 0, there exists an N ∈ N such that |sn − s| < ϵ
for all n > N . Now, let (smn ) be an arbitrary subsequence of (sn ). Since mn ≥ n for all n ∈ N, we have
|smn − s| < ϵ for all n > N , and thus lim smn = s.
Recall that the sequence (en ) defined above has more than one subsequential limit. Thus, Lemma 2 tells
us that it is not convergent.
Suppose (sn ) is a bounded sequence. Defining the sequence (s̄n ) by s̄n = sup{sk : k ≥ n}, the limit superior of (sn ) is
\[
\limsup s_n = \lim \bar{s}_n.
\]
Similarly, defining the sequence (s̲n ) by s̲n = inf{sk : k ≥ n}, the limit inferior of (sn ) is
\[
\liminf s_n = \lim \underline{s}_n.
\]
Both of these limits exist, even if (sn ) is divergent. To see this, notice the sequence (s̄n ) has terms
s̄1 = sup{s1 , s2 , s3 , . . .}, s̄2 = sup{s2 , s3 , s4 , . . .}, s̄3 = sup{s3 , s4 , s5 , . . .}
and so on. Since
{sn , sn+1 , sn+2 , . . .} ⊃ {sn+1 , sn+2 , sn+3 , . . .}
for all n ∈ N, we have s̄n ≥ s̄n+1 for all n ∈ N. That is, (s̄n ) is non-increasing. Moreover, (s̄n ) is bounded
since (sn ) is bounded. Thus, (s̄n ) converges by the MCT. Similarly, the sequence (s̲n ) is non-decreasing and
bounded, and thus converges by the MCT.
Let’s now consider several examples. First consider the sequence (an ) defined above. We have s̄n = an and s̲n = 0 for all n ∈ N, so lim sup an = 0 and lim inf an = 0. Next consider the sequence (bn ) defined above. We have b̄n = 1 and b̲n = −1 for all n ∈ N, so lim sup bn = 1 and lim inf bn = −1. Finally, consider the sequence (en ) defined above. The sequence (ēn ) has terms
\[
\bar{e}_1 = \bar{e}_2 = \frac{3}{2}, \quad \bar{e}_3 = \bar{e}_4 = \frac{5}{4}, \quad \bar{e}_5 = \bar{e}_6 = \frac{7}{6},
\]
and so on, which makes it easy to see that lim ēn = 1, i.e., lim sup en = 1. Similarly, the sequence (e̲n ) has terms
\[
\underline{e}_1 = -2, \quad \underline{e}_2 = \underline{e}_3 = -\frac{4}{3}, \quad \underline{e}_4 = \underline{e}_5 = -\frac{6}{5},
\]
and so on, which makes it easy to see that lim e̲n = −1, i.e., lim inf en = −1.
As the first and third examples above make clear, it is possible that sn > lim sup sn for some n ∈ N, i.e., lim sup sn may be less than sup{sn : n ∈ N}.
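Since s̄n and s̲n are suprema and infima of tails, they are easy to approximate numerically by truncating each tail at a finite horizon. A Python sketch (not from the original notes):

```python
def tail_sup_inf(s, n, horizon=10_000):
    """Approximate (sup, inf) of the tail {s(k) : k >= n},
    truncated at a finite horizon."""
    tail = [s(k) for k in range(n, horizon)]
    return max(tail), min(tail)

e = lambda n: (-1)**n * (1 + 1/n)      # the sequence (e_n) from above

for n in (1, 2, 3, 4):
    print(n, tail_sup_inf(e, n))       # sups: 1.5, 1.5, 1.25, 1.25; infs: -2, -4/3, -4/3, -1.2
```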
We will now prove a useful lemma:
Lemma 3. The limit superior and limit inferior of a bounded sequence are subsequential limits of that
sequence.
Proof. We will prove that every bounded sequence contains a subsequence that converges to its limit superior. The proof that every bounded sequence contains a subsequence that converges to its limit inferior is similar.
Suppose (sn ) is a bounded sequence with limit superior s̄. As in the proof of Lemma 1, we need to
consider two cases:
Case 1: Suppose (sn ) has infinitely many dominant terms. Then the subsequence (smn ) that has only
these dominant terms has nth term smn = s̄mn . Since lim s̄n = s̄, we have lim smn = s̄.
Case 2: Suppose (sn ) has finitely many (possibly zero) dominant terms. Letting sm1 be the first term in
(sn ) after its final dominant term, we have s̄mn ≤ s̄ for all n ∈ N. Since sm1 is not itself dominant, there
exists an m2 > m1 such that sm2 > sm1 , and so on, which means that (smn ) is increasing. Finally, since s̄
is an upper bound on (smn ), for any ϵ > 0, there exists an N such that smn > s̄ − ϵ, and hence |smn − s̄| < ϵ, for all n > N. Thus, lim smn = s̄.
Theorem 6. A bounded sequence (sn ) converges if and only if lim sup sn = lim inf sn .
Proof. If (sn ) converges, then every subsequence of (sn ) has the same limit (by Lemma 2), and so, since lim sup sn and lim inf sn are subsequential limits (by Lemma 3), lim sup sn = lim inf sn . Conversely, suppose lim sup sn = lim inf sn , and let S denote the set of subsequential limits of (sn ). Consider any t ∈ S, i.e., the limit of some convergent subsequence (smn ) of (sn ). Notice that {smk : k ≥ n} ⊂ {sk : k ≥ n} for all n ∈ N, which means that
\[
\sup\{s_{m_k} : k \geq n\} \leq \sup\{s_k : k \geq n\}
\]
for all n ∈ N. Thus, t ≤ s̄, i.e., sup S = s̄. A similar argument can be used to show that inf S = s̲. Further, since s̄, s̲ ∈ S by Lemma 3, we can say that lim sup sn = max S and lim inf sn = min S. Since lim sup sn = lim inf sn , this means that S is a singleton. Thus, by Lemma 2, (sn ) converges.
To illustrate this, consider again the sequence (dn ) defined above. The sequence (d̄n ) has terms
\[
\bar{d}_1 = \bar{d}_2 = \frac{1}{2}, \quad \bar{d}_3 = \bar{d}_4 = \frac{1}{4}, \quad \bar{d}_5 = \bar{d}_6 = \frac{1}{6},
\]
and so on, which makes it easy to see that lim d̄n = 0, i.e., lim sup dn = 0. Similarly, the sequence (d̲n ) has terms
\[
\underline{d}_1 = -1, \quad \underline{d}_2 = \underline{d}_3 = -\frac{1}{3}, \quad \underline{d}_4 = \underline{d}_5 = -\frac{1}{5},
\]
and so on, which makes it easy to see that lim d̲n = 0, i.e., lim inf dn = 0. Since lim sup dn = lim inf dn = 0, Theorem 6 tells us that lim dn = 0 (as we saw earlier).
* Cauchy Sequences
A sequence (sn ) is called a Cauchy sequence if, for every ϵ > 0, there exists an N ∈ N such that m, n > N implies |sn − sm | < ϵ.
Suppose (sn ) is a Cauchy sequence. What this says is that sn and sm will be “close” to one another (i.e.,
they will be no more than ϵ apart) provided n and m are both “sufficiently large” (i.e., larger than N , which
depends on ϵ).
Consider again the sequence (an ) defined by an = n−2 . To see that this is a Cauchy sequence, let
N = ϵ−1/2 and notice that, for n, m > ϵ−1/2 , we have |n−2 − m−2 | < ϵ. Why? Suppose (without loss of
generality) that m > n. This implies that n−2 > m−2 and thus |n−2 − m−2 | = n−2 − m−2 < ϵ − m−2 < ϵ
(the second to last inequality follows from the fact that n−2 < ϵ when n > ϵ−1/2 , and the last inequality
follows from the fact that m−2 > 0).
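A brute-force check of the Cauchy condition for an = n−2 (a sketch, not from the original notes; it can only test finitely many pairs):

```python
def a(n):
    return n ** -2

eps = 0.1
N = int(eps ** -0.5)                   # eps**(-1/2) rounded down; here N = 3
# check |a_n - a_m| < eps for all pairs with N < n, m <= 200
ok = all(abs(a(n) - a(m)) < eps
         for n in range(N + 1, 201)
         for m in range(N + 1, 201))
print(ok)                              # True
```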
The following two lemmas will be used in proving an important theorem about Cauchy sequences:
Lemma 4. All convergent sequences are Cauchy sequences.
Proof. Let (sn ) be a convergent sequence with limit s, i.e., for all ϵ > 0, there exists an N ∈ N such that |sn − s| < ϵ for all n > N. Since this holds for all ϵ > 0, we can also say there exists an N ∈ N such that |sn − s| < ϵ/2 for all n > N.
Now, from the triangle inequality, we have
|sn − sm | ≤ |sn − s| + |s − sm |
(to see this, let a = sn − s and b = s − sm and recall that |a + b| ≤ |a| + |b| for all a, b ∈ R). Combining this
with the above fact that |sn − s| < ϵ/2 for all n > N (and, equivalently, that |sm − s| < ϵ/2 for all m > N ),
we have
|sn − sm | < ϵ
for all n, m > N .
Lemma 5. All Cauchy sequences are bounded.
Proof. If (sn ) is a Cauchy sequence then, taking ϵ = 1, there exists an N ∈ N such that |sn − sm | < 1 for all n, m > N, which means that |sn − sN +1 | < 1 for all n > N (obviously N + 1 > N). Now, from the triangle inequality, we have
\[
|s_n| \leq |s_n - s_{N+1}| + |s_{N+1}|
\]
(to see this, let a = sn − sN +1 and b = sN +1 and recall that |a + b| ≤ |a| + |b| for all a, b ∈ R). Combining this with the fact that |sn − sN +1 | < 1 for all n > N means that |sn | < |sN +1 | + 1 for all n > N. Thus, letting b = max{|s1 |, . . . , |sN |, |sN +1 | + 1}, we have |sn | ≤ b for all n ∈ N, i.e., (sn ) is bounded.
Theorem 7. A sequence converges if and only if it is a Cauchy sequence.
Proof. Lemma 4 establishes that all convergent sequences are Cauchy sequences. For the other direction, suppose (sn ) is a Cauchy sequence. By Lemma 5, (sn ) is bounded, and so, by the BWT, it has a subsequence (smn ) that converges to some s∗ . Let ϵ > 0. Since (sn ) is Cauchy, there exists an N such that |sn − sm | < ϵ/2 for all n, m > N, and since lim smn = s∗ , there exists a K > N of the form K = mn for some n such that |sK − s∗ | < ϵ/2. Thus, for all n > N,
\[
|s_n - s^*| \leq |s_n - s_K| + |s_K - s^*| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon
\]
(the first inequality follows from the triangle inequality; to see this, let a = sn − sK and b = sK − s∗ and recall that |a + b| ≤ |a| + |b| for all a, b ∈ R). Therefore, lim sn = s∗ , i.e., (sn ) converges.
Convex Functions
The function f : I → R is said to be convex if f (tx + (1 − t)y) ≤ tf (x) + (1 − t)f (y) for all x, y ∈ I and all t ∈ (0, 1). For example, let’s show that the function f : (0, ∞) → R defined by f (x) = 1/x is convex. Substituting into the definition, we need to show that
\[
\frac{1}{tx + (1-t)y} \leq \frac{t}{x} + \frac{1-t}{y} = \frac{ty + (1-t)x}{xy}.
\]
Since x, y ∈ (0, ∞) and t ∈ (0, 1), the denominator on each side of the above is positive, so we can write
\[
xy \leq (tx + (1-t)y)(ty + (1-t)x) = \left( t^2 + (1-t)^2 \right) xy + t(1-t)(x^2 + y^2).
\]
Figure 5: The function f : (0, ∞) → R defined by f (x) = 1/x
Since t ∈ (0, 1), this is equivalent to 0 ≤ (x − y)2 , which is true for any x, y ∈ R. Figure 5 depicts this for
the case where x = 1/2, y = 3/2, and t = 1/2. Notice that
\[
f(tx + (1-t)y) = f\left( \frac{1}{2} \times \frac{1}{2} + \frac{1}{2} \times \frac{3}{2} \right) = f(1) = 1,
\]
while tf (x) + (1 − t)f (y) = (1/2)(2) + (1/2)(2/3) = 4/3 > 1.
The downward-sloping dashed line shows that this ordering generalizes to other values of t ∈ (0, 1).
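We can also spot-check the convexity inequality for f (x) = 1/x at randomly sampled points (a sketch, not from the original notes):

```python
import random

f = lambda x: 1 / x

random.seed(0)
for _ in range(1000):
    x, y = random.uniform(0.01, 10), random.uniform(0.01, 10)
    t = random.uniform(0.001, 0.999)
    # convexity: f(tx + (1-t)y) <= t f(x) + (1-t) f(y), up to rounding error
    assert f(t*x + (1 - t)*y) <= t*f(x) + (1 - t)*f(y) + 1e-12
print("convexity inequality held at all sampled points")
```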
Some Topology of R
Let ϵ ∈ (0, ∞). The ϵ-neighbourhood of x ∈ R is Vϵ (x) = (x − ϵ, x + ϵ). The set Vϵ (x) \ {x} = (x − ϵ, x) ∪
(x, x + ϵ) is called the deleted ϵ-neighbourhood of x.
The point x ∈ R is an interior point of the set A ⊂ R if there exists a Vϵ (x) ⊂ A. Note that an interior
point of A must be in A, but this does not mean that every point in A is an interior point of A. For example,
the points a and b are not interior points of [a, b] since, for every ϵ > 0, Vϵ (a) \ [a, b] = (a − ϵ, a) ̸= ∅ and
Vϵ (b) \ [a, b] = (b, b + ϵ) ̸= ∅ (i.e., every Vϵ (a) and Vϵ (b) contain points outside [a, b]). On the other hand,
every point in (a, b) is an interior point of [a, b]. To see this, let x ∈ (a, b). We need to show that there is
some Vϵ (x) ⊂ [a, b]. There are two cases to consider. First, if x > (a + b)/2 (i.e., if x closer to b than a), then
Vb−x (x) = (2x − b, b) ⊂ [a, b] (since 2x − b > a). Second, if x ≤ (a + b)/2, then Vx−a (x) = (a, 2x − a) ⊂ [a, b]
(since 2x − a ≤ b). Similar arguments could be used to show that a and b are not interior points of (a, b),
but that every point in (a, b) is an interior point of (a, b). Finally, it should be clear that every point in R is
an interior point of R.
The point x ∈ R is a boundary point of the set A ⊂ R if every Vϵ (x) contains points in both A and
R \ A. Note that a boundary point of A may not be in A. For example, a is a boundary point of (a, b) since,
for every ϵ > 0, Vϵ (a) ∩ (a, b) = (a, min{a + ϵ, b}) ̸= ∅ and Vϵ (a) ∩ (R \ (a, b)) = (a − ϵ, a] ̸= ∅ (i.e., every Vϵ (a)
contains points inside (a, b) and outside (a, b)). Similarly, b is a boundary point of (a, b) since, for every
ϵ > 0, Vϵ (b) ∩ (a, b) = (max{a, b − ϵ}, b) ̸= ∅ and Vϵ (b) ∩ (R \ (a, b)) = [b, b + ϵ) ̸= ∅. On the other hand, no point in (a, b) is a boundary point of (a, b). To see this, let x ∈ (a, b). We need to show that there is some Vϵ (x) that does not contain any points outside (a, b). There are again two cases to consider. If x > (a + b)/2, then Vb−x (x) = (2x − b, b) ⊂ (a, b) and thus Vb−x (x) ∩ (R \ (a, b)) = ∅; if x ≤ (a + b)/2, then Vx−a (x) = (a, 2x − a) ⊂ (a, b) and thus Vx−a (x) ∩ (R \ (a, b)) = ∅. Similar arguments could be used to show that a and b are boundary points of [a, b], but that no point in (a, b) is a boundary point of [a, b]. Finally, it should be clear that R has no boundary points.
The point x ∈ R is a limit point of the set A ⊂ R if every (Vϵ (x) \ {x}) ∩ A ̸= ∅ (i.e., if every Vϵ (x)
contains at least one point in A other than x).4 Note that a limit point of A need not be an element of A. For
example, every point in [a, b] is a limit point of (a, b). First, the points a and b are limit points of (a, b) since, for every ϵ > 0, (Vϵ (a) \ {a}) ∩ (a, b) = (a, min{a + ϵ, b}) ̸= ∅ and (Vϵ (b) \ {b}) ∩ (a, b) = (max{a, b − ϵ}, b) ̸= ∅. Second, any point x ∈ (a, b) is a limit point of (a, b) since, for every ϵ > 0, (Vϵ (x) \ {x}) ∩ (a, b) = (max{a, x − ϵ}, min{b, x + ϵ}) \ {x} ̸= ∅. A similar argument could be used to show that every point in [a, b] is a limit
point of [a, b]. Finally, it should be clear that every point in R is a limit point of R.
At this point you might be thinking that the limit points of a set are just the union of its interior points
and its boundary points. This is true for intervals in R, but it is not true more generally. For example, the
finite set {0, 1, 2} has no limit points at all. The points 0, 1 and 2 are isolated points in this set, i.e., points
in the set which are not limit points of the set (also, the points 0 and 2 are boundary points of this set, while
the point 1 is an interior point of this set).
A set A ⊂ R is said to be open if every point in A is an interior point of A. As we just saw, the open
interval (a, b) is an open set (hence the name). On the other hand, the closed interval [a, b] is not an open
set since a and b are not interior points of it. Note also that the union of two open sets is an open set. To see this, consider any point c ∈ A ∪ B, where A and B are open sets. This means c ∈ A or c ∈ B (or both), so there must exist a Vϵ (c) ⊂ A or a Vϵ (c) ⊂ B (or both). Thus, Vϵ (c) ⊂ A ∪ B. Similar arguments could be used to show that the intersection of two open sets is an open set.
A set A ⊂ R is said to be closed if every boundary point of A is a point in A. Not surprisingly, the
closed interval [a, b] is a closed set since a, b ∈ [a, b]. On the other hand, the open interval (a, b) is not closed
since a, b ∈
/ (a, b). You should try to convince yourself that the union of two closed sets is closed, and that
the intersection of two closed sets is closed.
A surprising result is that R is both open and closed (it is “clopen”). It is open since, for every x ∈ R,
we have Vϵ (x) ⊂ R for any ϵ. It is closed since it has no boundary points and thus it is vacuously true that
it contains all of its boundary points. It may also be surprising that a set can be neither open nor closed.
For example, consider the half-open interval (a, b]. This set is not open since there exists no Vϵ (b) contained in it. It is also not closed since a is a boundary point of it but a ∉ (a, b]. Similarly, [a, b) is neither open nor closed.
On the other hand, (−∞, b] and [a, ∞) are closed since they include all their boundary points (b and a,
respectively).
Functional Limits
Let A ⊂ R and c be a limit point of A. We say f : A → R has limit L at c if, for any ϵ > 0, there
exists a δ > 0 such that |f (x) − L| < ϵ for all points x ∈ A satisfying 0 < |x − c| < δ. This is written as
limx→c f (x) = L. Alternatively, we may say that f converges to L at c.
We can also write this definition using the language of neighbourhoods. Specifically, we say that f has
limit L at c if, for any ϵ > 0, there exists a δ > 0 such that f (x) ∈ Vϵ (L) for any x ∈ (Vδ (c) \ {c}) ∩ A.
What limx→c f (x) = L means is that we can make f (x) “arbitrarily close” to L (i.e., within
the ϵ-neighbourhood of L) by making x “sufficiently close”, but not equal, to c (i.e., within the deleted
δ-neighbourhood of c). The key is that ϵ is arbitrary while δ will, in general, depend on ϵ. Accordingly, ϵ
can be seen to play the same role that it did in the definition of a limit of a sequence, while δ plays a similar
role to that of N .
4 A limit point is also called a cluster point or an accumulation point.
Figure 6: The function f : R+ → R defined by f (x) = √x
For example, let’s show formally that the function f : R+ → R defined by f (x) = √x has limit 2 at 4. This means that we need to find a δ > 0 such that x ∈ R+ and 0 < |x − 4| < δ imply |√x − 2| < ϵ for any ϵ > 0. Notice that
\[
|\sqrt{x} - 2| = \frac{|\sqrt{x} - 2|(\sqrt{x} + 2)}{\sqrt{x} + 2} = \frac{|x - 4|}{\sqrt{x} + 2} < \frac{|x - 4|}{2}
\]
(the inequality at the end follows from the fact that the denominator on the left is greater than the denom-
inator on the right). The above suggests that we could set δ = 2ϵ (this is a very “conservative” choice of δ).
To see that this “works”, note that 0 < |x − 4| < 2ϵ implies that
\[
\frac{|x - 4|}{2} < \epsilon.
\]
We saw above that the quantity on the left is larger than |√x − 2|, so |√x − 2| < ϵ as desired. To be concrete, suppose we want ϵ = 0.1 and thus set δ = 0.2. Now, consider a point that is within 0.2 of 4, say 3.81 (notice that |3.81 − 4| = 0.19 < 0.2). Since |√3.81 − 2| ≈ 0.048 < 0.1, we are safe. Indeed, as Figure 6 makes clear,
for any x ∈ (3.8, 4.2) (i.e., any x between the vertical dotted lines), we have f (x) ∈ (1.9, 2.1) (i.e., any f (x)
between the horizontal dotted lines).
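The choice δ = 2ϵ is easy to stress-test numerically (a sketch, not from the original notes):

```python
import math
import random

eps = 0.1
delta = 2 * eps                        # the "conservative" choice derived above

random.seed(1)
for _ in range(10_000):
    x = random.uniform(4 - delta, 4 + delta)
    if 0 < abs(x - 4) < delta:
        assert abs(math.sqrt(x) - 2) < eps
print("|sqrt(x) - 2| < 0.1 whenever 0 < |x - 4| < 0.2")
```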
It is important to emphasize that c need not be in A, which means that f need not be defined at c (this is what makes limits interesting!). For example, consider the function f : [0, 4) ∪ (4, ∞) → R defined by
\[
f(x) = \frac{\sqrt{x} - 2}{x - 4}
\]
(see Figure 7; the open dot emphasizes that f (4) is undefined). This function is not defined at 4, but 4 is a
limit point of its domain. We will now show that limx→4 f (x) = 1/4. First, note that
\[
\left| \frac{\sqrt{x} - 2}{x - 4} - \frac{1}{4} \right| = \frac{|x - 4|}{4(\sqrt{x} + 2)^2} < \frac{|x - 4|}{4}
\]
Figure 7: The function f : [0, 4) ∪ (4, ∞) → R defined by f (x) = (√x − 2)/(x − 4)
The above suggests that we can set δ = 4ϵ, which guarantees that |f (x) − 1/4| < ϵ whenever 0 < |x − 4| < δ (as in the previous example, this is a very “conservative” choice of δ).
Let A ⊂ R and f : A → R. If c is a limit point of the set {x ∈ A : x > c}, then we say that f has
right-hand limit L at c (and write limx→c+ f (x) = L) if, for any ϵ > 0, there exists a δ > 0 such that
|f (x) − L| < ϵ for all points x ∈ A satisfying 0 < x − c < δ (notice that the inequality on the left in this
condition implies that x > c). Similarly, if c is a limit point of the set {x ∈ A : x < c}, then we say that f
has left-hand limit L at c (and write limx→c− f (x) = L) if, for any ϵ > 0, there exists a δ > 0 such that
|f (x) − L| < ϵ for all points x ∈ A satisfying 0 < c − x < δ (notice that the inequality on the left in this
condition implies that x < c). The following result should be quite intuitive from these definitions:
Theorem 8. Let A ⊂ R, f : A → R, and c be a limit point of both {x ∈ A : x > c} and {x ∈ A : x < c}.
Then limx→c f (x) = L if and only if limx→c+ f (x) = limx→c− f (x) = L.
Proof. By definition, if f has limit L, then the left-hand and right-hand limits of f will both be equal to L.
Conversely, if the left-hand and right-hand limits of f both equal L, then for any ϵ > 0, there exists a δ1 > 0
such that |f (x) − L| < ϵ for all points x ∈ A satisfying 0 < x − c < δ1 and a δ2 such that |f (x) − L| < ϵ for
all points x ∈ A satisfying 0 < c − x < δ2 . Setting δ = min{δ1 , δ2 } we thus have |f (x) − L| < ϵ for all points
x ∈ A satisfying 0 < |x − c| < δ, i.e., f has limit L.
Notice that Theorem 8 does not apply to the endpoints of an interval. For example, if the domain of the function f is (a, b), then neither limx→a− f (x) nor limx→b+ f (x) exists, but it is possible for limx→a f (x) and limx→b f (x) to exist. In particular, if limx→a+ f (x) exists, then limx→a f (x) = limx→a+ f (x), and if limx→b− f (x) exists, then limx→b f (x) = limx→b− f (x). An example is given in Exercise 13.
A function that does not converge at some limit point of its domain is said to diverge at that point. For example, if limx→c+ f (x) and limx→c− f (x) both exist but limx→c+ f (x) ̸= limx→c− f (x), then f diverges at c (by Theorem 8). As an example, consider the function f : R → R defined by
\[
f(x) = \begin{cases} 1 + x & \text{if } x \geq 0 \\ x & \text{if } x < 0, \end{cases}
\]
or more compactly, f (x) = x + max{0, x}/x (see Figure 8; the open dot emphasizes that f (0) ̸= 0). First,
notice that 0 is a limit point of both (0, ∞) and (−∞, 0). To see that limx→0+ f (x) = 1, set δ = ϵ so that
Figure 8: The function f : R → R defined by f (x) = x + max{0, x}/x
|f (x) − 1| = x < ϵ whenever 0 < x < δ. To see that limx→0− f (x) = 0, set δ = ϵ so that |f (x) − 0| = −x < ϵ whenever 0 < −x < δ (or equivalently, x ∈ (−δ, 0)). Thus, we can say that f diverges at 0.
Let A ⊂ R and c be a limit point of A. We say f : A → R diverges to ∞ at c (and write limx→c f (x) = ∞)
if, for every α ∈ R, there exists a δ > 0 such that f (x) > α for all points x ∈ A satisfying 0 < |x − c| < δ. We
similarly say that f diverges to −∞ at c (and write limx→c f (x) = −∞) if, for every β ∈ R, there exists a
δ > 0 such that f (x) < β for all points x ∈ A satisfying 0 < |x − c| < δ.
For example, we can say that the function f : (−∞, 0) ∪ (0, ∞) → R defined by f (x) = 1/x² diverges to ∞ at 0. To see this, set δ = 1/√α. Then, for 0 < |x| < 1/√α, we have 1/|x| > √α, or equivalently, 1/x² > α.
More concretely, suppose α = 100 and pick some x such that 0 < |x| < 0.1, say x = 0.09. We then have
1/x2 ≈ 123.5 > 100 (see Figure 9). The general idea here is that, for any x in the deleted δ-neighbourhood
of c, we have f (x) > α (think of α as a large positive number and δ as depending on α).
We can combine the above definitions to write things like limx→c+ f (x) = ∞ (f diverges to ∞ from the
right), limx→c− f (x) = ∞ (f diverges to ∞ from the left), limx→c+ f (x) = −∞ (f diverges to −∞ from
the right), or limx→c− f (x) = −∞ (f diverges to −∞ from the left). For example, consider the function
f : (−∞, 0) ∪ (0, ∞) → R defined by f (x) = 1/x. It should be clear from Figure 10 that limx→0+ f (x) = ∞
and limx→0− f (x) = −∞ (Figure 10 should also make it clear how foolish it would be to write 1/0 = ∞).
Let A ⊂ R and f : A → R. If A is unbounded from above, we write limx→∞ f (x) = L if, for every
ϵ > 0, there exists an a ∈ R such that |f (x) − L| < ϵ for all points x ∈ A satisfying x > a. Similarly, if A
is unbounded from below, we write limx→−∞ f (x) = L if, for every ϵ > 0, there exists a b ∈ R such that
|f (x) − L| < ϵ for all points x ∈ A satisfying x < b.
For example, consider again the function f : (−∞, 0) ∪ (0, ∞) → R defined by f (x) = 1/x². To see that limx→∞ f (x) = 0, set a = 1/√ϵ. Then, for any x > 1/√ϵ, we have |1/x²| < ϵ. Concretely, let ϵ = 0.01 so that a = 10. Now, for any x > 10, we have |1/x²| < ϵ (e.g., with x = 11, we have |1/x²| ≈ 0.008). You should try to convince yourself that limx→−∞ f (x) = 0.
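A quick numerical check of the threshold a = 1/√ϵ (a sketch, not from the original notes):

```python
f = lambda x: 1 / x**2

eps = 0.01
a = eps ** -0.5                        # a = 1/sqrt(eps) = 10
xs = [a + k for k in range(1, 1000)]   # sample points with x > a
assert all(abs(f(x)) < eps for x in xs)
print(f"|1/x^2| < {eps} for all sampled x > {a}")
```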
Continuity
Let A ⊂ R and c ∈ A. We say f : A → R is continuous at c if, for any ϵ > 0, there exists a δ > 0 such that
|f (x) − f (c)| < ϵ for all points x ∈ A satisfying |x − c| < δ (notice that this allows for x = c). Alternatively,
Figure 9: The function f : (−∞, 0) ∪ (0, ∞) → R defined by f (x) = 1/x2
we may say that f is continuous at c if, for any ϵ > 0, there exists a δ such that f (A ∩ Vδ (c)) ⊂ Vϵ (f (c)). If
f is continuous at every point in B ⊂ A, we say that it is continuous on the set B.
It is useful to note that, if c is a limit point of A (i.e., so long as c is an element of A but is not an
isolated point of A), then we can say that f is continuous at c if and only if limx→c f (x) = f (c). On the
other hand, if c is an isolated point of A, then f is automatically continuous at c. The reason for this is
that, since c is an isolated point of A, there exists some δ such that Vδ (c) ∩ A = {c}, and thus for any x ∈ Vδ (c) ∩ A we have |f (x) − f (c)| = 0 (Exercise 15 provides a relevant example). Of course, we very rarely encounter a
function that has an isolated point in its domain, so we typically only care about continuity at limit points.
Continuing with one of the examples from the previous section, the function f : [0, ∞) → R defined by f (x) = √x is continuous at 4 since limx→4 f (x) = f (4). On the other hand, the function f : R → R defined
by f (x) = x + max{0, x}/x (see Figure 8) is not continuous at 0. To see this, note that limx→0− f (x) = 0
and limx→0+ f (x) = 1, so limx→0 f (x) does not exist (by Theorem 8). Discontinuities that occur where the
left-hand limit and right-hand limits are not equal are called jump discontinuities.
It is worth emphasizing that, if f is not defined at c, then c cannot possibly be a point of continuity, regardless of whether or not limx→c f (x) exists. For example, consider again the function f : [0, 4) ∪ (4, ∞) → R defined by f (x) = (√x − 2)/(x − 4) (see Figure 7). This function is not continuous at 4 since it
is not defined at 4. Discontinuities that occur where the limit exists but the function is not defined (or the
function is defined but has a different value than the limit) are called removable discontinuities.
Given a sequence (sn ) and a function f defined at each sn , we can form the new sequence (f (sn )). For example, applying the function f (x) = √x to the sequence (an ) defined by an = n−2 produces the sequence with terms
\[
f(a_1) = 1, \quad f(a_2) = \frac{1}{2}, \quad f(a_3) = \frac{1}{3},
\]
and so on. The point here is that, by applying a function to a sequence, we obtain another sequence to which we can apply all our usual results. For example, it is easy to show that lim √an = 0.
With this in mind, we now present the following theorem:
Theorem 9. Let A ⊂ R, c be a limit point of A, and f : A → R. The following two statements are equivalent:
(i) limx→c f (x) = L.
(ii) For every sequence (sn ) with sn ∈ A \ {c} for all n ∈ N and lim sn = c, we have lim f (sn ) = L.
Proof. We first show that (i) implies (ii). Suppose, for any ϵ > 0, there exists a δ such that |f (x) − L| < ϵ
for all x ∈ A such that 0 < |x − c| < δ. If (sn ) converges to c, then, for any δ > 0, there exists an N ∈ N
such that 0 < |sn − c| < δ for n > N (note that here we have used δ rather than ϵ in the definition of a limit
of a sequence, and also that |sn − c| > 0 since sn ∈ A \ {c}, i.e., sn ̸= c). This means that |f (sn ) − L| < ϵ
for n > N .
We now show that (ii) implies (i), this time using a contrapositive argument. Suppose that there exists
an ϵ > 0 such that, for any δ > 0, |f (x) − L| ≥ ϵ for at least one x ∈ A satisfying 0 < |x − c| < δ. Thus, for each n ∈ N, there is a number xn ∈ A such that 0 < |xn − c| < n−1 (ensuring that lim xn = c) but |f (xn ) − L| ≥ ϵ. In other
words, if (xn ) converges to c but lim f (xn ) = L fails to hold, then L is not the limit of f at c.
Statement (ii) in Theorem 9 is called the “sequential criterion” for a functional limit and will be quite
useful in proving some important results about functions.
We can also define continuity in terms of sequences:
Theorem 10. Let A ⊂ R, c ∈ A, and f : A → R. The following two statements are equivalent:
(i) f is continuous at c.
(ii) For every sequence (sn ) with sn ∈ A for all n ∈ N and lim sn = c, we have lim f (sn ) = f (c).
Proof. The proof is similar to that for Theorem 9 except that we may have sn = c for some n ∈ N.
Before moving on, we present a very simple lemma (to be used later on) that is proved using the sequential criterion for a functional limit:
Lemma 6. If a convergent sequence is bounded within a closed interval, then its limit is in that closed
interval.
Proof. Suppose the sequence (sn ) converges to s and that sn ∈ [a, b] for all n ∈ N. We want to show that
s ∈ [a, b]. First, for any ϵ > 0, there exists an N ∈ N such that |sn − s| < ϵ, or equivalently, s − ϵ < sn < s + ϵ
for all n > N . Now, since a ≤ sn ≤ b for all n ∈ N, we have a ≤ sn < s + ϵ, which implies a < s + ϵ and thus
a ≤ s, and similarly, s − ϵ < sn ≤ b, which implies s − ϵ < b and thus s ≤ b.
Composite Functions
We will sometimes be interested in applying one function to the output of another function. Specifically, let
A, B ⊂ R, f : A → R, and g : B → R, with f (A) ⊂ B (i.e., the range of f is a subset of the domain of g).
Then for any x ∈ A we define
(g ◦ f )(x) = g(f (x))
as the composition of g on f . The domain of this composite function is A (the domain of f ), while its
range is
g(f (A)) = {g(f (a)) ∈ R : a ∈ A},
which is a subset of the range of g (possibly, but not necessarily, a proper subset of it).
For example, let f : (0, ∞) → R be defined by f (x) = √(1/x) and g : R → R be defined by g(x) = −|x| (notice that f ((0, ∞)) = (0, ∞), which is a subset of R, the domain of g). Here, we have (g ◦ f )(x) = −|√(1/x)|, which has domain (0, ∞) and range (−∞, 0) (a proper subset of R− , the range of g). On the other hand, (f ◦ g)(x) = √(−1/|x|) (i.e., the composition of f on g) is completely nonsensical (it is not defined for any x).
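In Python, composition is just nested application. A sketch of the example above (not from the original notes), including the failure of f ◦ g:

```python
import math

f = lambda x: math.sqrt(1 / x)         # defined only for x > 0
g = lambda x: -abs(x)                  # defined for all real x

g_of_f = lambda x: g(f(x))             # fine: f maps (0, inf) into the domain of g
print(g_of_f(4.0))                     # -0.5

f_of_g = lambda x: f(g(x))             # nonsensical: g(x) <= 0 lies outside the domain of f
try:
    f_of_g(4.0)
except ValueError as err:              # math.sqrt raises on negative input
    print("f o g is undefined:", err)
```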
The following theorem relates to the continuity of compositions:
Theorem 13. Let A, B ⊂ R, f : A → R, and g : B → R, with f (A) ⊂ B. If f is continuous at c ∈ A and g
is continuous at f (c) ∈ B, then g ◦ f is continuous at c.
Proof. Let b = f (c) and let W be an ϵ-neighbourhood of g(b). Since g is continuous at b, there exists a
δ-neighbourhood V of b such that g(V ) ⊂ W . Backing up a step, since f is continuous at c, there exists a
γ-neighbourhood U of c such that f (U ) ⊂ V . Thus g ◦ f (U ) ⊂ W since f (A) ⊂ B.
An immediate consequence of Theorem 13 is that, if f is continuous on A and g is continuous on B, then
g ◦ f is continuous on A.
Differentiation
Let A ⊂ R and x ∈ A. We say that f : A → R is differentiable at the point x if
\[
\lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
\]
exists.5 If this limit does exist, it is called the derivative of f at x and is denoted by f ′ (x). If f is
differentiable at all points in B ⊂ A, we say that it is differentiable on the set B and f ′ is then a
real-valued function defined on B (do not confuse the real number f ′ (x) with the function f ′ ; the value of
f ′ (x) may be different for different values of x but it is always a real number). If f ′ (x) and
\[
\lim_{h \to 0} \frac{f'(x + h) - f'(x)}{h}
\]
both exist, we say that f is twice differentiable at the point x, and refer to the latter limit as the second
derivative of f at x, denoted f ′′ (x). If f is twice differentiable at all points in C ⊂ B, we say that it is
twice differentiable on the set C, and f ′′ is then a real-valued function defined on C. Note that, if f is
twice differentiable on C, then it must also be differentiable on C (f ′ cannot be differentiable on C if f is
not differentiable on C). Note that C ⊂ B ⊂ A allows for the possibility that A, B, and C are all equal, but
rules out the possibility that A is a proper subset of B or that B is a proper subset of C.
For example, consider the function f : R → R defined by f (x) = x2 . We have
\[
f'(x) = \lim_{h \to 0} \frac{(x + h)^2 - x^2}{h} = \lim_{h \to 0} (2x + h) = 2x
\]
and
\[
f''(x) = \lim_{h \to 0} \frac{2(x + h) - 2x}{h} = \lim_{h \to 0} 2 = 2,
\]
5 Letting y = x + h, we can also write this limit as limy→x (f (y) − f (x))/(y − x).
so f is twice differentiable on R. As another example, consider the function g : R+ → R defined by g(x) = √x. We have
\[
g'(x) = \lim_{h \to 0} \frac{\sqrt{x + h} - \sqrt{x}}{h} = \lim_{h \to 0} \frac{1}{\sqrt{x + h} + \sqrt{x}} = \frac{1}{2\sqrt{x}},
\]
so g is differentiable only on (0, ∞), which is a proper subset of R+ . Moreover,
\[
g''(x) = \lim_{h \to 0} \frac{\frac{1}{2\sqrt{x + h}} - \frac{1}{2\sqrt{x}}}{h} = \lim_{h \to 0} \left( -\frac{1}{2\sqrt{x}\sqrt{x + h}\left(\sqrt{x} + \sqrt{x + h}\right)} \right) = -\frac{1}{4x^{3/2}},
\]
so g is twice differentiable on (0, ∞).
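Difference quotients with a small h give a quick numerical check of these formulas (a sketch, not from the original notes; the step size h is an arbitrary choice):

```python
import math

def diff_quotient(f, x, h=1e-6):
    """Approximate f'(x) by the difference quotient (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

print(diff_quotient(lambda x: x**2, 3.0))   # ~ 6.0,  matching f'(x) = 2x at x = 3
print(diff_quotient(math.sqrt, 4.0))        # ~ 0.25, matching g'(x) = 1/(2*sqrt(x)) at x = 4
```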
Theorem 14. Let A ⊂ R and x ∈ A. If f : A → R is differentiable at x, then f is continuous at x.
Proof. Notice that
\[
\lim_{h \to 0} \left( f(x + h) - f(x) \right) = \lim_{h \to 0} h \cdot \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} h \cdot \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = 0 \cdot f'(x) = 0.
\]
Thus, limh→0 f (x + h) = f (x), or equivalently, limy→x f (y) = f (x) (to see this, let h = y − x).
It is possible that f is continuous at x, but not differentiable at x. For example, the function f : R → R
defined by f (x) = |x| is continuous at 0, but it is not differentiable at 0.
An immediate consequence of Theorem 14 is that, if f is differentiable on some set, then it is continuous
on that set. Moreover, if f is twice differentiable on some set, then both f and f ′ are continuous on that set.
The following theorem establishes some algebraic properties of derivatives:
Theorem 15. Let A ⊂ R and x ∈ A. If f : A → R and g : A → R are differentiable at x, then:
(i) (af )′ (x) = af ′ (x) for all a ∈ R.
(ii) (f + g)′ (x) = f ′ (x) + g ′ (x).
(iii) (f − g)′ (x) = f ′ (x) − g ′ (x).
(iv) (f g)′ (x) = f ′ (x)g(x) + f (x)g ′ (x).
(v) (f /g)′ (x) = (f ′ (x)g(x) − f (x)g ′ (x))/(g(x))² , provided g(x) ̸= 0.
Proof. (i) Notice that
\[
(af)'(x) = \lim_{h \to 0} \frac{af(x + h) - af(x)}{h} = a \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = af'(x).
\]
Notice that we used part (i) of Theorem 11 in moving a outside the limit. Other parts of Theorem 11 are
used below.
(ii) This is not really any more difficult:
\[
(f + g)'(x) = \lim_{h \to 0} \frac{(f(x + h) + g(x + h)) - (f(x) + g(x))}{h} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} + \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} = f'(x) + g'(x).
\]
(iii) The proof of (iii) is essentially identical to that of (ii).
(iv) Now things get more interesting:
\[
\begin{aligned}
(fg)'(x) &= \lim_{h \to 0} \frac{f(x + h)g(x + h) - f(x)g(x)}{h} \\
&= \lim_{h \to 0} \frac{f(x + h)g(x + h) - f(x)g(x + h) + f(x)g(x + h) - f(x)g(x)}{h} \\
&= \lim_{h \to 0} \frac{(f(x + h) - f(x))g(x + h) + f(x)(g(x + h) - g(x))}{h} \\
&= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \cdot \lim_{h \to 0} g(x + h) + f(x) \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} \\
&= f'(x)g(x) + f(x)g'(x).
\end{aligned}
\]
Here, we have used the fact that, since g is differentiable at x, it is continuous at x (by Theorem 14), and
thus limh→0 g(x + h) = g(x).
(v) We will again use the fact that limh→0 g(x+h) = g(x) and thus, since g(x) ̸= 0, we have limh→0 1/g(x+
h) = 1/g(x):
\[
\begin{aligned}
(f/g)'(x) &= \lim_{h \to 0} \frac{f(x + h)/g(x + h) - f(x)/g(x)}{h} \\
&= \lim_{h \to 0} \frac{f(x + h)g(x) - f(x)g(x + h)}{h\,g(x)g(x + h)} \\
&= \frac{1}{g(x)} \lim_{h \to 0} \frac{1}{g(x + h)} \cdot \lim_{h \to 0} \frac{f(x + h)g(x) - f(x)g(x) + f(x)g(x) - f(x)g(x + h)}{h} \\
&= \frac{1}{(g(x))^2} \lim_{h \to 0} \frac{(f(x + h) - f(x))g(x) - f(x)(g(x + h) - g(x))}{h} \\
&= \frac{1}{(g(x))^2} \left( g(x) \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} - f(x) \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} \right) \\
&= \frac{f'(x)g(x) - f(x)g'(x)}{(g(x))^2}.
\end{aligned}
\]
Figure 11: The function f : [−1, 1] → R defined by f (x) = x + |x|
To confirm this, define the function h : R → R by h(x) = 4 + 4x + x2 , and note that h(x) = (g ◦ f )(x) for
all x ∈ R. We can see directly that h′ (x) = 4 + 2x.
Optimization
Let A ⊂ R and f : A → R. Various optimizers of f are defined as follows:
• The point x∗ ∈ A is a global maximizer of f if f (x∗ ) ≥ f (x) for all x ∈ A (i.e., if f (x∗ ) = max f (A)).
• The point x∗ ∈ A is a local maximizer of f if f (x∗ ) ≥ f (x) for all x ∈ Vϵ (x∗ ) ∩ A for some ϵ > 0.
• The point x∗ ∈ A is a global minimizer of f if f (x∗ ) ≤ f (x) for all x ∈ A (i.e., if f (x∗ ) = min f (A)).
• The point x∗ ∈ A is a local minimizer of f if f (x∗ ) ≤ f (x) for all x ∈ Vϵ (x∗ ) ∩ A for some ϵ > 0.
These definitions should make it clear that a global optimizer is necessarily a local optimizer (but not
vice-versa).
If x∗ is a global/local maximizer of f , we say that f (x∗ ) is a global/local maximum value of f . Similarly, if x∗ is a global/local minimizer of f , we say that f (x∗ ) is a global/local minimum value of f .
The key word above is “a”: A function may have multiple global maximizers, multiple local maxi-
mizers, multiple global minimizers, and/or multiple local minimizers. For example, consider the function
f : [−1, 1] → R defined by f (x) = x + |x| (see Figure 11). Every point in [−1, 0] is a global minimizer (and
thus a local minimizer), while 1 is the only global maximizer (and thus a local maximizer). What may be
surprising is that every point in [−1, 0) is also a local maximizer (0 is not a local maximizer since f (x) > f (0)
for all x ∈ (0, 1]).
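A crude grid search over [−1, 1] (a sketch, not from the original notes; a finite grid can only suggest optimizers, not prove them) illustrates the optimizers of f (x) = x + |x|:

```python
f = lambda x: x + abs(x)

grid = [i / 1000 for i in range(-1000, 1001)]        # evenly spaced points in [-1, 1]
values = [f(x) for x in grid]

print(min(values), max(values))                      # 0.0 and 2.0
minimizers = [x for x, v in zip(grid, values) if v == min(values)]
print(minimizers[0], minimizers[-1])                 # every grid point in [-1, 0]
```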
The following theorem can be used to establish the existence of global optimizers:
Figure 12: The function f : (−2, 1] → R defined by f (x) = x2
Theorem 17 (Weierstrass Extreme Value Theorem). If a real-valued function defined on a closed interval
is continuous on its domain, then it reaches a global maximum and a global minimum.
Proof. Let f : [a, b] → R. Suppose that f is unbounded on [a, b]. Then for each n ∈ N, there is an sn ∈ [a, b]
such that |f (sn )| > n. By the BWT, the sequence (sn ) has a subsequence (smn ) that converges to some s.
Since (sn ) is bounded within [a, b], so is (smn ), and thus s ∈ [a, b] by Lemma 6. Now, since f is continuous
at s, we have lim f (smn ) = f (s), but lim |f (smn )| = ∞, which is a contradiction. Thus, f must be bounded
on [a, b].
Let M be the supremum of the range of f . Then for each n ∈ N, there is a tn ∈ [a, b] such that
M − n−1 < f (tn ) ≤ M , so that lim f (tn ) = M . By the BWT, the sequence (tn ) has a subsequence (tmn )
that converges to some t, and t ∈ [a, b] by the argument in the preceding paragraph. Now, since f is
continuous at t, we have lim f (tmn ) = f (t). Moreover, since f (tmn ) is a subsequence of f (tn ), and f (tn )
converges to M , Lemma 2 tells us that f (tmn ) also converges to M . Thus, M is a global maximum of f . To
show that f has a global minimum, we apply the preceding argument to −f .
It is possible that a function has a global maximizer or global minimizer even if it is not continuous or if its domain is not a closed interval. For example, the function f : [0, b) → R defined by f (x) = √x has a global minimizer at 0. That said, there is no global maximizer of this function. To see this, suppose that x∗ ∈ [0, b) is a global maximizer of f . Since √(x∗ + ϵ) > √x∗ for any ϵ ∈ (0, b − x∗ ), this is a contradiction.
On the other hand, the function g : [0, b] → R defined by g(x) = √x must have both a global maximizer and a global minimizer since its domain is a closed interval and it is continuous (the global maximizer is b). It should be clear that the function h : (0, b) → R defined by h(x) = √x has no global maximizers or global minimizers.
If a function does not have any local maximizers, then it definitely does not have any global maximizers, but a function can have one or more local maximizers and not have any global maximizers (the
same statement holds true if we replace the word “maximizers” with “minimizers”). To see this, consider
the function f : (−2, 1] → R defined by f (x) = x2 (see Figure 12; the open dot emphasizes that this function
is not defined for x = −2). Although 1 is a local maximizer, there is no global maximizer (1 is not a global
maximizer since f (x) > f (1) for any x ∈ (−2, −1)).
The following theorem provides a necessary condition for a local optimizer:
Theorem 18. Let I be an interval in R, x∗ be an interior point of I, and f : I → R be differentiable at x∗ . If x∗ is a local maximizer or a local minimizer of f , then f ′ (x∗ ) = 0.
Figure 13: The function f : (−1, 1) → R defined by f (x) = x3
Figure 14: The function f : [−2, 2] → R defined by f (x) = x4 − x2
Theorem 19 (Rolle’s Theorem). Let f : [a, b] → R be continuous on [a, b] and differentiable on (a, b). If
f (a) = f (b), then there exists a c ∈ (a, b) such that f ′ (c) = 0.
Proof. Since f is continuous on a closed interval, it has a global maximizer and a global minimizer by the
WEVT. If a (and thus b since f (a) = f (b)) is both a global maximizer and a global minimizer, then f is
constant and f ′ (c) = 0 for any c ∈ (a, b). Otherwise, there is a global optimizer in (a, b) and by Theorem
18, f ′ will be equal to zero at this optimizer.
Theorem 20 (Mean Value Theorem). Let f : [a, b] → R be continuous on [a, b] and differentiable on (a, b).
Then there exists a c ∈ (a, b) such that
\[
f'(c) = \frac{f(b) - f(a)}{b - a}.
\]
Proof. Consider the function g : [a, b] → R defined by
\[
g(x) = f(x) - \left( \frac{f(b) - f(a)}{b - a}(x - a) + f(a) \right).
\]
Note that g is continuous on [a, b] and differentiable on (a, b), and that g(a) = g(b). Thus, by Rolle’s
Theorem, there exists a c ∈ (a, b) such that g ′ (c) = 0. Since
\[
g'(c) = f'(c) - \frac{f(b) - f(a)}{b - a},
\]
we have the desired result.
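As a concrete illustration of the MVT (a sketch, not from the original notes), take f(x) = x² on [0, 1]: the secant slope is 1, and bisection on f′(x) − 1 = 2x − 1 recovers the promised point c = 1/2:

```python
f = lambda x: x**2
a, b = 0.0, 1.0
secant = (f(b) - f(a)) / (b - a)       # slope of the secant line; here 1.0

g = lambda x: 2*x - secant             # f'(x) minus the secant slope
lo, hi = a, b                          # bisection: g(lo) < 0 < g(hi)
for _ in range(50):
    mid = (lo + hi) / 2
    if g(mid) < 0:
        lo = mid
    else:
        hi = mid
print((lo + hi) / 2)                   # ~ 0.5, the c guaranteed by the MVT
```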
We will not use Rolle’s Theorem for anything other than proving the MVT, so it might better be called
a lemma, but we will honour tradition by calling it a theorem. In fact, we will only use the MVT to prove
the following lemma (which truly is a lemma):
Lemma 7. Let I be an interval in R and f : I → R be differentiable on I. For all x1 , x2 ∈ I satisfying
x1 < x2 :
(i) f ′ (x) ≥ 0 for all x ∈ I if and only if f (x1 ) ≤ f (x2 ) (i.e., f is non-decreasing on I).
(ii) f ′ (x) ≤ 0 for all x ∈ I if and only if f (x1 ) ≥ f (x2 ) (i.e., f is non-increasing on I).
Proof. (i) Suppose f ′ (x) ≥ 0 for all x ∈ I and apply the MVT to f on [x1 , x2 ] to obtain a c ∈ (x1 , x2 ) such
that
f (x2 ) − f (x1 ) = f ′ (c)(x2 − x1 ).
Since f ′ (c) ≥ 0 and x2 > x1 , the right-hand-side of the above is non-negative, which means that f (x1 ) ≤
f (x2 ). Moving in the other direction, if f (x1 ) ≤ f (x2 ), then the right-hand-side of the above must be
non-negative, which means that f ′ (c) ≥ 0 (since x2 > x1 ).
(ii) The proof of (ii) is similar and will be omitted.
Note that in Lemma 7, I need not be a closed interval, whereas in the previous two theorems, we required
f to be defined at the endpoints a and b (if f : (a, b) → R, then f is not defined at a or b). If we wanted
to apply Lemma 7 when I = (a, b), we just need to keep in mind that x1 > a and x2 < b since we require
x1, x2 ∈ I. On the other hand, if we wanted to apply Lemma 7 when I = [a, b], we could have x1 = a or
x2 = b. The key is simply that x1, x2 ∈ I. Of course, we could also apply Lemma 7 when I
is half-open or when I involves −∞ or ∞.
For example, consider again the function f : R+ → R defined by f(x) = √x. We have f′(x) = 1/(2√x) > 0
for all x > 0, so f is increasing on R+. This can be confirmed directly by noting that √x1 < √x2 for all
x1, x2 ∈ R+ satisfying x1 < x2.
We now use Lemma 7 to prove the following theorem, which justifies the so-called “first derivative test”:
Theorem 21. Let I be an interval in R, x∗ be an interior point of I, and f : I → R be differentiable on
Vϵ (x∗ ) \ {x∗ } for some ϵ > 0.
(i) x∗ is a local maximizer of f if and only if there exists an ϵ such that f ′ (x) ≥ 0 for all x ∈ (x∗ − ϵ, x∗ )
and f ′ (x) ≤ 0 for all x ∈ (x∗ , x∗ + ϵ).
(ii) x∗ is a local minimizer of f if and only if there exists an ϵ such that f ′ (x) ≤ 0 for all x ∈ (x∗ − ϵ, x∗ )
and f ′ (x) ≥ 0 for all x ∈ (x∗ , x∗ + ϵ).
Proof. (i) Suppose f′(x) ≥ 0 for all x ∈ (x∗ − ϵ, x∗) and f′(x) ≤ 0 for all x ∈ (x∗, x∗ + ϵ). By Lemma
7, this implies that f is non-decreasing on (x∗ − ϵ, x∗) and non-increasing on (x∗, x∗ + ϵ). Thus,
f (x∗ ) ≥ f (x) for all x ∈ (x∗ − ϵ, x∗ ) and f (x∗ ) ≥ f (x) for all x ∈ (x∗ , x∗ + ϵ), or equivalently, f (x∗ ) ≥ f (x)
for all x ∈ Vϵ (x∗ ). This argument also works in the other direction since Lemma 7 works in both directions.
(ii) The proof of (ii) is similar and will be omitted.
Consider again the function f : (−1, 1) → R defined by f(x) = x³. We have f′(x) = 3x² > 0 for all
x ∈ (−1, 1) \ {0}. Since f′ is positive on both sides of 0, neither sign condition in Theorem 21 can hold, so 0 is not a local optimizer.
Next, consider again the function f : (−1, 1) → R defined by f (x) = |x|. We have f ′ (x) = −1 for all
x ∈ (−1, 0) and f ′ (x) = 1 for all x ∈ (0, 1), so 0 is indeed a local minimizer as argued above. The interesting
thing about this example is that f ′ (0) does not even exist (note that Theorem 21 does not require f to be
differentiable at x∗ ).
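The first derivative test is also easy to probe numerically. The following Python sketch (grid, step size, and tolerances are arbitrary choices) samples central difference quotients of f(x) = |x| on either side of 0:

import numpy as np

f = lambda x: abs(x)
h, eps = 1e-6, 0.5
left = np.linspace(-eps, -1e-3, 50)            # points in (x* - eps, x*)
right = np.linspace(1e-3, eps, 50)             # points in (x*, x* + eps)
dleft = (f(left + h) - f(left - h)) / (2*h)    # approx. -1 throughout
dright = (f(right + h) - f(right - h)) / (2*h) # approx. +1 throughout
assert (dleft <= 0).all() and (dright >= 0).all()
print("0 passes the first-derivative test for a local minimizer")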
Theorem 21 is useful on its own, but can also be used to prove the following theorem, which provides
sufficient conditions for local maximizers and local minimizers:
Theorem 22. Let I be an interval in R, x∗ be an interior point of I, and f : I → R be twice differentiable
at x∗ with f′(x∗) = 0.
(i) If f″(x∗) < 0, then x∗ is a local maximizer of f.
(ii) If f″(x∗) > 0, then x∗ is a local minimizer of f.
Proof. (i) Note that
f″(x∗) = lim_{h→0} (f′(x∗ + h) − f′(x∗)) / h = lim_{h→0} f′(x∗ + h) / h
since f′(x∗) = 0. Thus, f″(x∗) < 0 implies that there exists an ϵ > 0 such that f′(x∗ + h) > 0 for all
h ∈ (−ϵ, 0) and f′(x∗ + h) < 0 for all h ∈ (0, ϵ), or equivalently, f′(x) > 0 for x ∈ (x∗ − ϵ, x∗) and f′(x) < 0
for x ∈ (x∗, x∗ + ϵ), so x∗ is a local maximizer by Theorem 21.
(ii) The proof of (ii) is similar and will be omitted.
For example, consider the function f : (−1, 1) → R defined by f(x) = x². We have f′(x) = 2x and
f″(x) = 2, so f′(0) = 0 and f″(0) > 0. Thus, 0 is a local minimizer of f.
As another example, consider again the function f : [−2, 2] → R defined by f(x) = x⁴ − x². As we saw
above, f′(−1/√2) = f′(0) = f′(1/√2) = 0. Moreover, f″(x) = 12x² − 2, so f″(−1/√2) = f″(1/√2) = 4 >
0. Thus, −1/√2 and 1/√2 are local minimizers of f. Since f″(0) = −2 < 0, Theorem 22 also tells us that
0 is a local maximizer of f. We can confirm this using Theorem 21: for any x ∈ (−1/√2, 0) we have f′(x) > 0, and for any x ∈ (0, 1/√2) we have f′(x) < 0.
Thus, 0 is a local maximizer of f.
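The computations in this example are easily verified numerically, for instance with the following Python snippet:

import math

fprime = lambda x: 4*x**3 - 2*x    # f'(x) for f(x) = x**4 - x**2
fsecond = lambda x: 12*x**2 - 2    # f''(x)
for xstar in (-1/math.sqrt(2), 0.0, 1/math.sqrt(2)):
    print(fprime(xstar), fsecond(xstar))
# f' is (numerically) zero at all three points; f'' is 4.0 at the two
# minimizers and -2.0 at the maximizer 0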
The following theorem provides necessary conditions for local maximizers and local minimizers:
Theorem 23. Let I be an interval in R, x∗ be an interior point of I, and f : I → R be twice differentiable
on some Vϵ (x∗ ).
(i) If x∗ is a local maximizer of f, then f′(x∗) = 0 and f″(x∗) ≤ 0.
(ii) If x∗ is a local minimizer of f, then f′(x∗) = 0 and f″(x∗) ≥ 0.
The following lemma characterizes convexity and concavity of differentiable functions in terms of tangent
lines (recall that f is convex on I if f((1 − t)x + ty) ≤ (1 − t)f(x) + tf(y) for all x, y ∈ I and t ∈ [0, 1], and
concave on I if the reverse inequality holds):
Lemma 8. Let I be an interval in R and f : I → R be differentiable on I.
(i) f is convex on I if and only if f(y) − f(x) ≥ f′(x)(y − x) for all x, y ∈ I.
(ii) f is concave on I if and only if f(y) − f(x) ≤ f′(x)(y − x) for all x, y ∈ I.
Proof. (i) Suppose f is convex on I, let x, y ∈ I, and define g : [0, 1] → R by g(t) = f(ty + (1 − t)x). By
convexity, g(t) ≤ (1 − t)f(x) + tf(y), and since g(0) = f(x), this gives g(t) − g(0) ≤ t(f(y) − f(x)) for
t ∈ (0, 1]. Re-arranging the above yields
f(y) − f(x) ≥ (g(t) − g(0)) / t.
Since f is differentiable at x, g is differentiable at 0. Thus, taking the (right-hand) limit of both sides of the
above at 0 yields
f (y) − f (x) ≥ g ′ (0).
Finally, since
g ′ (t) = f ′ (ty + (1 − t)x)(y − x)
(by the chain rule), we have
g ′ (0) = f ′ (x)(y − x)
which yields the desired result. To show the converse, suppose f(y) − f(x) ≥ f′(x)(y − x) for all x, y ∈ I,
let x, y ∈ I and t ∈ [0, 1], and set z = (1 − t)x + ty. Since z ∈ I, we have
f(x) − f(z) ≥ f′(z)(x − z)
and
f(y) − f(z) ≥ f′(z)(y − z).
Multiplying the first of these inequalities by 1 − t and the second by t and then adding, we have
(1 − t)(f (x) − f (z)) + t(f (y) − f (z)) ≥ (1 − t)f ′ (z)(x − z) + tf ′ (z)(y − z),
or equivalently,
(1 − t)f (x) + tf (y) − f (z) ≥ f ′ (z)((1 − t)x + ty − z).
Since z = (1 − t)x + ty, the right-hand side of the above is equal to zero, and we have
f((1 − t)x + ty) = f(z) ≤ (1 − t)f(x) + tf(y),
i.e., f is convex on I.
(ii) The proof of (ii) is similar and will be omitted.
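The tangent-line inequality in Lemma 8 (i) is easy to spot-check numerically. The following Python sketch samples random pairs for f(x) = x² (an arbitrary convex choice), where the inequality reduces to (y − x)² ≥ 0:

import random

f = lambda x: x**2
fprime = lambda x: 2*x
random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    # f(y) - f(x) - f'(x)(y - x) = (y - x)**2 >= 0 for this f
    assert f(y) - f(x) >= fprime(x)*(y - x) - 1e-9
print("tangent lines lie below the graph at all sampled pairs")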
Lemma 8 will now be used to prove the following theorem:
Theorem 24. Let I be an interval in R, x∗ be an interior point of I, and f : I → R be differentiable on I.
(i) If f is concave on I, then x∗ is a global maximizer of f if and only if f′(x∗) = 0.
(ii) If f is convex on I, then x∗ is a global minimizer of f if and only if f′(x∗) = 0.
Proof. (i) If f′(x∗) = 0, then, by Lemma 8, f(y) − f(x∗) ≤ f′(x∗)(y − x∗) = 0 (i.e., f(y) ≤ f(x∗)) for all
y ∈ I, so x∗ is a global maximizer of f. Conversely, if x∗ is a global maximizer of f (meaning that it is
also a local maximizer of f), then f′(x∗) = 0 by Theorem 18.
(ii) The proof of (ii) is similar and will be omitted.
For example, consider again the function f : (−2, 1] → R defined by f(x) = x² (see again Figure 12).
Since this function is convex on its domain (see Exercise 12) and f′(0) = 0, we can conclude that 0 is a
global minimizer. Note that, although 1 is a local maximizer of f, 1 is not an interior point of (−2, 1], so
Theorem 24 is not applicable to it.
Extensions to Rk
The Cartesian product of two sets A and B, denoted A × B, is the set of all ordered pairs (a, b) such
that a ∈ A and b ∈ B (these ordered pairs (a, b) are not sets themselves, but rather elements of the set
A × B). For example, if A = {Red, Yellow} and B = {Green, Blue}, then
A × B = {(Red, Green), (Red, Blue), (Yellow, Green), (Yellow, Blue)}.
Note that an ordered pair such as (Green, Red) is not an element of A × B since Green ∉ A and Red ∉ B.
On the other hand, note that (Red, Yellow) and (Yellow, Red) are distinct elements within A × A: with
ordered pairs, order matters. Taking the Cartesian product of R with itself k times yields
Rk = R × · · · × R (k times).
Elements in Rk (which we again call points) are k-vectors such as x = (x1 , . . . , xk ) or y = (y1 , . . . , yk ).6 For
example, consider the set
C = {(x1, x2) ∈ R2 : x1 < 0 and x2 > 0} = (−∞, 0) × (0, ∞)
(which is a proper subset of R2), and the points x = (−1, 1), y = (−3, 3), and z = (1, −1). We have x, y ∈ C
but z ∉ C.
A real-valued function whose domain is a subset of Rk with k > 1 is called a real-valued function of
several variables.7 For example, consider the function f : R2 → R defined by f (x1 , x2 ) = x1 x2 . This
function takes the point (x1 , x2 ) ∈ R2 as input and produces the point x1 x2 ∈ R as output.
Let x, y ∈ Rk. The function d : Rk × Rk → R defined by
d(x, y) = √(Σ_{i=1}^k (xi − yi)²)
is called the Euclidean distance between x and y. When y = 0, we call d(x, 0) = √(Σ_{i=1}^k xi²)
the Euclidean norm of x, and denote it by ||x|| (here, 0 is a k-vector of zeros).8 Note that when k = 1, we
have d(x, y) = |x − y| and ||x|| = |x|.
For example, consider again the set C and the points x, y ∈ C defined above. We have
d(x, y) = √((x1 − y1)² + (x2 − y2)²) = √((−1 + 3)² + (1 − 3)²) = √8,
||x|| = √2, and ||y|| = √18.
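These computations are straightforward to reproduce in Python; the helper d below is a direct transcription of the definition:

import math

def d(x, y):
    # Euclidean distance between two k-vectors
    return math.sqrt(sum((xi - yi)**2 for xi, yi in zip(x, y)))

x, y = (-1.0, 1.0), (-3.0, 3.0)
print(d(x, y))           # sqrt(8) ~= 2.828
print(d(x, (0.0, 0.0)))  # ||x|| = sqrt(2)
print(d(y, (0.0, 0.0)))  # ||y|| = sqrt(18)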
It is worth mentioning that the Euclidean distance is just one way to measure the distance between two
points in Rk (e.g., Σ_{i=1}^k |xi − yi| yields the “Manhattan distance” between x and y). Loosely defined, a
measure of distance between two elements in a set is called a metric, and a metric along with the set on
which it operates is called a metric space. The set Rk together with the function d defined above is known
as Euclidean k-space.
6 When working with matrices, we will always treat a k-vector as a k × 1 matrix.
7 We will also encounter functions whose range is a subset of Rk (e.g., when defining the derivative of a real-valued function
of several variables).
8 Recognizing that x − y = (x1 − y1, . . . , xk − yk), we can also write d(x, y) = ||x − y||.
It is straightforward to extend most of the definitions we encountered earlier to Rk using the function d.
For example, a set A ⊂ Rk is said to be bounded if there exists an M ∈ R such that d(x, y) < M for all
x, y ∈ A (i.e., if the Euclidean distance between any two points of A never exceeds some fixed bound).
Similarly, letting ϵ > 0, the ϵ-neighbourhood of x ∈ Rk is
Vϵ(x) = {y ∈ Rk : d(x, y) < ϵ}
(i.e., the set of all points whose Euclidean distance from x is less than ϵ). Note that, with k = 1, we have
Vϵ(x) = {y ∈ R : |x − y| < ϵ} = (x − ϵ, x + ϵ),
which agrees with our earlier definition, while with k = 2, Vϵ(x) = {y ∈ R2 : d(x, y) < ϵ}
(i.e., the set of all points inside a circle with radius ϵ centered at (x1, x2)). The set Vϵ(x) \ {x} is called the
deleted ϵ-neighbourhood of x (note that {x} is a singleton, i.e., a set with only one element, namely x).
With these extended definitions of ϵ-neighbourhoods and deleted ϵ-neighbourhoods in hand, our earlier
topological definitions are virtually unchanged:
• The point x ∈ Rk is an interior point of the set A ⊂ Rk if there exists a Vϵ (x) ⊂ A.
• The point x ∈ Rk is a boundary point of the set A ⊂ Rk if every Vϵ (x) contains points in both A
and Rk − A.
• The point x ∈ Rk is a limit point of the set A ⊂ Rk if (Vϵ(x) \ {x}) ∩ A ≠ ∅ for every ϵ > 0 (i.e., if
every Vϵ(x) contains at least one point in A other than x).
• A set A ⊂ Rk is said to be open if every point in A is an interior point of A.
• A set A ⊂ Rk is said to be closed if every boundary point of A is a point in A.
Let’s now look a bit more closely at the set C defined above. First, every point in C is an interior point. To
see this, let w ∈ C and set
ϵ = √(min{w1², w2²})
so that Vϵ (w) ⊂ C. Figure 15 illustrates this for the case where w = (−3, 4); notice that the right edge of
the circle representing V3 (−3, 4) just touches the vertical axis while the bottom edge of this circle is above
the horizontal axis, meaning that all points inside this circle are elements of C (if we had w1² > w2², then the
bottom edge of the circle would just touch the horizontal axis and the right edge of the circle would be to
the left of the vertical axis). Next, the set of all boundary points of C is
(this is the set of all points lying on either the vertical axis or the horizontal axis in Figure 15, none of
which are in C itself). For example, (0, 0) is a boundary point of C. To see this, let ϵ > 0 and consider the
points a = (−ϵ/2, ϵ/2) ∈ C and b = (ϵ/2, −ϵ/2) ∈ R2 − C. We have a, b ∈ Vϵ((0, 0)) since
d(a, (0, 0)) = d(b, (0, 0)) = ϵ/√2 < ϵ. That is, every ϵ-neighbourhood of (0, 0) includes points in C and
in R2 − C. Finally, the set of limit points of C is the union of its interior points and its boundary points:
(−∞, 0] × [0, ∞).
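The argument that (0, 0) is a boundary point can also be checked numerically; the following Python sketch tests a few (arbitrarily chosen) values of ϵ:

import math

in_C = lambda p: p[0] < 0 and p[1] > 0        # C = (-inf, 0) x (0, inf)
dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
for eps in (1.0, 0.1, 0.001):
    a = (-eps/2, eps/2)    # lies in C
    b = (eps/2, -eps/2)    # lies in R^2 - C
    assert in_C(a) and not in_C(b)
    assert dist(a, (0, 0)) < eps and dist(b, (0, 0)) < eps  # both are eps/sqrt(2) away
print("every sampled neighbourhood of (0, 0) meets both C and its complement")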
We now turn to limits and continuity of real-valued functions of several variables. Since we presented
definitions using the language of neighbourhoods already, there is nothing new here conceptually.9 Specifi-
cally, let A ⊂ Rk and c be a limit point of A. We say f : A → R has limit L at c if, for any ϵ > 0, there
exists a δ > 0 such that f(x) ∈ Vϵ(L) for any x ∈ (Vδ(c) \ {c}) ∩ A. This is written as limx→c f(x) = L.
Let A ⊂ Rk and c ∈ A. We say that f : A → R is continuous at the point c if, for any ϵ > 0, there
exists a δ > 0 such that f(A ∩ Vδ(c)) ⊂ Vϵ(f(c)). If c is a limit point of A, then f is continuous at c if and only if
9 We won’t even bother defining things like one-sided limits for real-valued functions of several variables; you have all the
tools you need to do so yourself.
Figure 15: V3 (−3, 4)
limx→c f (x) = f (c). As before, if f is continuous at every point in B ⊂ A, we say that it is continuous on
the set B.
Fortunately, the algebraic properties of functional limits and continuity presented in Theorems 11 and
12, respectively, as well as the chain rule (Theorem 16) continue to hold with real-valued functions of several
variables. The interested reader may refer to Rudin (1976, Theorems 4.4, 4.9, and 9.15) for formal statements
and proofs of these results.
Let A be an open set in Rk and x ∈ A. We say that f : A → R is differentiable at the point x if there
exists a k-vector of real numbers r such that
lim_{h→0} (f(x + h) − f(x) − r′h) / ||h||
exists and is equal to zero (here, h ∈ Rk and r′h = r1h1 + · · · + rkhk). For example, consider the function
f : R2 → R defined by f(x1, x2) = x1² + x2². Taking r = (2x1, 2x2) yields f(x + h) − f(x) − r′h = h1² + h2², and
lim_{h→0} (h1² + h2²) / √(h1² + h2²) = lim_{h→0} √(h1² + h2²) = 0,
so f is differentiable at every x ∈ R2.
Theorem 25. Let A be an open set in Rk and x ∈ A. If f : A → R is differentiable at x, then f is continuous at x.
Proof. Suppose f is differentiable at x. We have
lim_{h→0} (f(x + h) − f(x) − r′h) = lim_{h→0} ||h|| · lim_{h→0} (f(x + h) − f(x) − r′h) / ||h|| = 0 · 0 = 0,
which implies
lim_{h→0} (f(x + h) − r′h) = f(x)
(recall from earlier that, so long as the numerator is zero, it is not a problem if the limit of the denominator
is zero). Finally, since lim_{h→0} r′h = 0, we have lim_{h→0} f(x + h) = f(x), i.e., f is continuous at x.
The following theorem shows that the existence of all the partial derivatives is a necessary condition for differentiability:
Theorem 26. Let A be an open set in Rk and x ∈ A. If f : A → R is differentiable at x, then all the partial
derivatives of f at x exist.
Proof. Suppose f is differentiable at x. Letting hi = t and hj = 0 for all j ≠ i, we have r′h = t ri and
||h|| = |t|. Thus, from the definition of differentiability, we have
lim_{t→0} (f(x1, . . . , xi−1, xi + t, xi+1, . . . , xk) − f(x)) / t = ri,
i.e., fi (x) = ri .
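The difference quotient in this proof also suggests a simple numerical approximation of partial derivatives. The following Python sketch (with an arbitrarily chosen step t and evaluation point) illustrates it for the function f(x1, x2) = x1² + x2² discussed below:

f = lambda x1, x2: x1**2 + x2**2

def partial(f, x, i, t=1e-6):
    # one-sided difference quotient from the proof of Theorem 26
    xp = list(x)
    xp[i] += t
    return (f(*xp) - f(*x)) / t

x = (1.0, -2.0)
print(partial(f, x, 0), partial(f, x, 1))   # approx. 2*x1 = 2 and 2*x2 = -4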
It is possible that the partial derivatives of f at x exist even if f is not differentiable at x. In fact, the
partial derivatives of f at x can exist even if f is not continuous at x.
Let’s now return to the definition of differentiability. In the proof of Theorem 26, we saw that, if f is
differentiable at x, then
r = (f1(x), . . . , fk(x)).
We call this the Jacobian matrix of f at x. It is tempting to think of r as a function of x, but r ∈ Rk. In
particular, each ri = fi (x) ∈ R (do not confuse the real number fi (x) with the function fi ). That said, the
value of r may be different for different values of x (recall that (r1 , r2 ) = (2x1 , 2x2 ) in the example considered
above).
Now suppose that f is differentiable on B ⊂ A so that all the partial derivatives of f exist for every
x ∈ B (by Theorem 26), and consider the function f′ : B → Rk defined by
f′(x) = (f1(x), . . . , fk(x)).
This function takes the point x ∈ Rk as input and produces the point (f1(x), . . . , fk(x)) ∈ Rk as output
(this is a vector-valued function, as opposed to a real-valued function). Perhaps not surprisingly, we will
call f ′ (x) the derivative of f at x. We then say that f ′ is differentiable at the point x ∈ B, or equivalently,
that f is twice differentiable at the point x, if there exists a k × k matrix of real numbers H such that
lim_{h→0} ||f′(x + h) − f′(x) − Hh|| / ||h||
exists and is equal to zero (note that Hh is a k × 1 matrix, i.e., a k-vector like f ′ (x + h) and f ′ (x)). If f is
twice differentiable at all points in C ⊂ B, then we say f is twice differentiable on the set C. Moreover,
by Theorem 25, if f is twice differentiable on C, then f ′ is continuous on C.
Consider once again the function f : R2 → R defined by f(x1, x2) = x1² + x2², for which f′(x) = (2x1, 2x2).
Here, the above expression is
lim_{h→0} ||(2(x1 + h1), 2(x2 + h2)) − (2x1, 2x2) − Hh|| / √(h1² + h2²) = lim_{h→0} ||(2h1, 2h2) − Hh|| / √(h1² + h2²).
Setting
H =
[ 2 0 ]
[ 0 2 ],
we have
lim_{h→0} ||(2h1, 2h2) − (2h1, 2h2)|| / √(h1² + h2²) = lim_{h→0} √(0² + 0²) / √(h1² + h2²) = 0.
Thus, f is twice differentiable on R2 since the above holds for any (x1, x2) ∈ R2.
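The following Python sketch evaluates the same quotient at a few sample points h (chosen arbitrarily); for this particular f it is identically zero:

import numpy as np

x = np.array([1.0, -2.0])
fprime = lambda x: 2*x                     # f'(x) = (2x1, 2x2)
H = np.array([[2.0, 0.0], [0.0, 2.0]])
for scale in (1e-1, 1e-3, 1e-5):
    h = scale * np.array([0.6, -0.8])      # ||h|| = scale
    num = np.linalg.norm(fprime(x + h) - fprime(x) - H @ h)
    print(num / np.linalg.norm(h))         # 0.0 at every scale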
The analogue of Theorem 26 for vector-valued functions (see Rudin, 1976, Theorem 9.17) tells us that,
if f is twice differentiable at x, then all the second-order partial derivatives of f at x exist. In particular,
we have
H =
[ f11(x) f12(x) · · · f1k(x) ]
[ f21(x) f22(x) · · · f2k(x) ]
[   ⋮      ⋮     ⋱     ⋮    ]
[ fk1(x) fk2(x) · · · fkk(x) ].
We call H the Hessian matrix for f at x. It is important to emphasize that this is not a function, but
rather a k × k matrix of real numbers, i.e., each fij(x) ∈ R. In the example considered above, H is the same
for any x, but, in general, H may vary with x.
The following theorem will be helpful when dealing with Hessian matrices:
Theorem 27 (Young’s Theorem). Let A be an open set in Rk , x ∈ A, and f : A → R. If the partial
derivatives fi and fj exist in some Vϵ (x) and fij and fji are both continuous at x, then fij (x) = fji (x).
Proof. See Apostol (1974, Theorem 12.13).
If f is differentiable on some set and f ′ is continuous on that set, it is said to be continuously differen-
tiable on that set (this is true whether f is a real-valued function of one variable or of several variables, but
we did not need to introduce this concept in the previous section). If f ′ is continuously differentiable on some
set, then f is twice continuously differentiable on that set. Thus, if f is twice continuously differentiable
on some Vϵ (x), Young’s Theorem tells us that the Hessian matrix for f at x will be symmetric.
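Young’s Theorem can be spot-checked numerically with central differences; in the following Python sketch, the test function and evaluation point are arbitrary choices:

f = lambda x1, x2: x1**2 * x2 + x1 * x2**3   # f12 = f21 = 2*x1 + 3*x2**2

def second_partial(f, x, i, j, t=1e-4):
    # central-difference estimate of f_ij at x
    def fi(p):
        pp, pm = list(p), list(p)
        pp[i] += t; pm[i] -= t
        return (f(*pp) - f(*pm)) / (2*t)
    xp, xm = list(x), list(x)
    xp[j] += t; xm[j] -= t
    return (fi(xp) - fi(xm)) / (2*t)

x = (1.5, -0.5)
print(second_partial(f, x, 0, 1), second_partial(f, x, 1, 0))  # both approx. 3.75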
Keep in mind that the existence of partial derivatives does not imply differentiability (only the opposite
is true; see Theorem 26). However, the following theorem shows that continuity of partial derivatives (when
they exist) does imply differentiability:
Theorem 28. Let A be an open set in Rk , and f : A → R. f is continuously differentiable on A if and only
if all its partial derivatives exist and are continuous on A.
Proof. See Rudin (1976, Theorem 9.21).
We can now extend our optimization definitions. Let A ⊂ Rk and f : A → R:
• The point x∗ ∈ A is a global maximizer of f if f(x∗) ≥ f(x) for all x ∈ A (i.e., if f(x∗) = max f(A)).
• The point x∗ ∈ A is a local maximizer of f if f (x∗ ) ≥ f (x) for all x ∈ Vϵ (x∗ ) ∩ A for some ϵ > 0.
• The point x∗ ∈ A is a global minimizer of f if f(x∗) ≤ f(x) for all x ∈ A (i.e., if f(x∗) = min f(A)).
• The point x∗ ∈ A is a local minimizer of f if f (x∗ ) ≤ f (x) for all x ∈ Vϵ (x∗ ) ∩ A for some ϵ > 0.
Theorem 29. Let A be an open set in Rk , x∗ ∈ A, and f : A → R have all its partial derivatives at x∗ . If
x∗ is a local optimizer of f , then fi (x∗ ) = 0 for all i.
Proof. Assume x∗ is a local optimizer of f and fi(x∗) exists for all i. Next, fix some point z ∈ Rk and
consider the function g : R → R defined by
g(t) = f(x∗ + tz).
Note that g(0) = f (x∗ ). Thus, since x∗ is a local optimizer of f , it must be the case that g ′ (0) = 0 (by
Theorem 18). Now, since
g′(t) = Σ_{i=1}^k fi(x∗ + tz) zi,
setting t = 0 yields g′(0) = Σ_{i=1}^k fi(x∗) zi = 0. Since this needs to hold for any z ∈ Rk, we require fi(x∗) = 0 for all i.
For example, consider the function f : R2 → R defined by f(x1, x2) = −2x1² + x1x2 − x2² + 4x1 + 2x2.
If (x∗1, x∗2) is a local optimizer of f, then, by Theorem 29,
f1(x∗1, x∗2) = −4x∗1 + x∗2 + 4 = 0 and f2(x∗1, x∗2) = x∗1 − 2x∗2 + 2 = 0.
Solving the above two equations for x∗1 and x∗2 yields (x∗1, x∗2) = (10/7, 12/7). It is important to emphasize
that this point may not be an optimizer of f, but it is a candidate for one.
At this point, it would be prudent to review some facts from your first course in linear algebra. A pth-
order principal minor of a k × k matrix is the determinant of a submatrix that is obtained by deleting
the same k − p rows and columns of the matrix (note that 1 ≤ p ≤ k). In general, a k × k matrix has
(k choose p) = k! / (p!(k − p)!)
pth-order principal minors. The leading pth-order principal minor of a k × k matrix is the determinant
of the submatrix that is obtained by deleting the last k − p rows and columns of the matrix (or, more simply,
the determinant of the upper-left p × p submatrix of the matrix). For example, consider the 2 × 2 matrix
A =
[ a11 a12 ]
[ a21 a22 ].
A has 2 first-order principal minors: a11 and a22 (the first of these is the leading first-order principal minor).
Moreover, A has 1 second-order principal minor: det(A) = a11 a22 − a12 a21 (and this is the leading second-order
principal minor). Now consider the 3 × 3 matrix
B =
[ b11 b12 b13 ]
[ b21 b22 b23 ]
[ b31 b32 b33 ].
B has 3 first-order principal minors: b11, b22, and b33 (the first of these is the leading first-order principal
minor). B also has 3 second-order principal minors:
det [ b11 b12 ; b21 b22 ], det [ b11 b13 ; b31 b33 ], and det [ b22 b23 ; b32 b33 ]
(the first of these is the leading second-order principal minor). Finally, there is 1 third-order principal minor
of B:
det(B) = b11 det [ b22 b23 ; b32 b33 ] − b12 det [ b21 b23 ; b31 b33 ] + b13 det [ b21 b22 ; b31 b32 ]
(and this is the leading third-order principal minor).
A symmetric matrix is:
• positive definite if and only if all its leading principal minors are positive.
• positive semi-definite if and only if all its principal minors are non-negative.
• negative definite if and only if all its odd-numbered leading principal minors are negative and all its
even-numbered leading principal minors are positive.
• negative semi-definite if and only if all its odd-numbered principal minors are non-positive and all
its even-numbered principal minors are non-negative.
Note that a matrix of zeros is both positive semi-definite and negative semi-definite.
For example, consider the symmetric matrix
[ a b ]
[ b c ].
Its leading principal minors are a and ac − b², so it is positive definite if and only if a > 0 and ac − b² > 0,
and negative definite if and only if a < 0 and ac − b² > 0.
The following theorem, analogous to Theorem 22, provides sufficient conditions for local maximizers and
local minimizers:
Theorem 30. Let A be an open set in Rk, x∗ ∈ A, and f : A → R be twice continuously differentiable on
some Vϵ(x∗) with fi(x∗) = 0 for all i.
(i) If the Hessian matrix for f at x∗ is negative definite, then x∗ is a local maximizer of f.
(ii) If the Hessian matrix for f at x∗ is positive definite, then x∗ is a local minimizer of f.
Returning to the example above, the second-order partial derivatives of f at (x1, x2) are
f11(x1, x2) = −4, f12(x1, x2) = f21(x1, x2) = 1, and f22(x1, x2) = −2
(notice that none of these depend on x1 or x2, but this will not always be the case). Thus, the Hessian of f
at (x∗1, x∗2) = (10/7, 12/7) (really, at any (x1, x2) ∈ R2) is
H =
[ −4 1 ]
[ 1 −2 ].
This matrix is negative definite since −4 < 0 and 8 − 1 > 0. Thus, by Theorem 30, we can say that
(10/7, 12/7) is a local maximizer.
As emphasized earlier, it is possible that H takes on different values for different values of x. Accordingly,
it is possible (for example) that H is positive definite for some values of x but negative definite for other
values of x. In such cases, we would need to examine the definiteness of H specifically at x∗ .
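Definiteness checks of this kind are easy to automate. The following Python sketch computes leading principal minors with numpy and applies the criteria above to the Hessian from the example (note that it covers only the definite cases; the semi-definite cases require examining all principal minors, not just the leading ones):

import numpy as np

def leading_minors(M):
    # determinants of the upper-left p x p submatrices, p = 1, ..., k
    return [np.linalg.det(M[:p, :p]) for p in range(1, M.shape[0] + 1)]

H = np.array([[-4.0, 1.0], [1.0, -2.0]])
D = leading_minors(H)                                   # [-4.0, 7.0]
neg_def = all((d < 0) if p % 2 == 1 else (d > 0) for p, d in enumerate(D, 1))
pos_def = all(d > 0 for d in D)
print(D, neg_def, pos_def)                              # negative definite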
To round things out, we will present one last theorem, analogous to Theorem 23, which provides necessary
conditions for local maximizers and local minimizers:
Theorem 31. Let A be an open set in Rk , x∗ ∈ A, and f : A → R be twice continuously differentiable on
some Vϵ (x∗ ).
(i) If x∗ is a local maximizer, all first-order derivatives of f at x∗ are zero and the Hessian matrix for f
at x∗ is negative semi-definite.
(ii) If x∗ is a local minimizer, all first-order derivatives of f at x∗ are zero and the Hessian matrix for f
at x∗ is positive semi-definite.
Proof. (i) Assume x∗ is a local maximizer of f . Then fi (x∗ ) = 0 for all i (by Theorem 29). Next, fix some
other point z ∈ Rk and consider again the function g defined in the proof of Theorem 29. Since x∗ is a local
maximizer of f , it must be the case that g ′′ (0) ≤ 0 (by Theorem 23). We have
g″(t) = Σ_{i=1}^k Σ_{j=1}^k fij(x∗ + tz) zi zj,
so setting t = 0 yields, in matrix form, g″(0) = z′Hz ≤ 0, where H is the Hessian matrix for f at x∗. Since
this must hold for any z ∈ Rk, H must be negative semi-definite (note that this relies on the fact that H is
symmetric, which follows from an application of Young’s Theorem).
(ii) The proof of (ii) is similar and will be omitted.
Similar to Theorem 23, all that Theorem 31 can do for us is identify a candidate for a local optimizer.
As another example, consider the function f : R2+ → R defined by f(x1, x2) = x1^(3/2) + x2^(3/2). The first-order
partial derivatives of f at (x1, x2) are
f1(x1, x2) = (3/2)√x1 and f2(x1, x2) = (3/2)√x2,
and the second-order partial derivatives of f at (x1, x2) are
f11(x1, x2) = 3/(4√x1), f12(x1, x2) = f21(x1, x2) = 0, and f22(x1, x2) = 3/(4√x2).
Now suppose (x∗1, x∗2) ∈ R2+ is an optimizer. Then, by Theorem 29, we have
(3/2)√x∗1 = 0 and (3/2)√x∗2 = 0.
Solving the above two equations for x∗1 and x∗2 yields (x∗1, x∗2) = (0, 0). Unfortunately, f11 and f22 are
undefined at (0, 0), so Theorems 30 and 31 are not applicable (both of these theorems require that f be
twice continuously differentiable on some Vϵ((x∗1, x∗2))). More to the point, the Hessian of f at (0, 0) is
undefined, so we can't say anything about its definiteness. So, all that we can say is that (0, 0) is a candidate
for a local optimizer of f (it is actually a local minimizer of f since f(0, 0) = 0 and f(x1, x2) > 0 for all
(x1, x2) ∈ R2+ \ {(0, 0)}).
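A crude numerical scan (an arbitrary grid over part of R2+, in Python) is consistent with (0, 0) being a local minimizer:

f = lambda x1, x2: x1**1.5 + x2**1.5
grid = [i/100 for i in range(101)]        # 0.00, 0.01, ..., 1.00
vals = [f(a, b) for a in grid for b in grid if (a, b) != (0.0, 0.0)]
print(f(0.0, 0.0), min(vals) > 0)         # 0.0 True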
Constrained Optimization
In what follows, we will focus solely on maximization problems since the problem of minimizing the function
f is equivalent to the problem of maximizing the function −f . Moreover, to keep life simple, we will focus
on the case where the domain of f is a subset of R2 . This has the added benefit of allowing us to reduce
notational clutter slightly by denoting points in R2 by (x, y) rather than (x1 , x2 ).
Before proceeding, we need to digest the following theorem:
Theorem 32 (Implicit Function Theorem). Let A be an open subset of R2 and f : A → R be continuously
differentiable on A. If (c, d) ∈ A satisfies f (c, d) = 0 and f2 (c, d) ̸= 0, there exists an open interval I ⊂ R
containing c and one and only one function g : I → R implicitly defined by f (x, g(x)) = 0 for all x ∈ I and
satisfying g(c) = d. Moreover, g is continuously differentiable on I with
g′(c) = −f1(c, g(c)) / f2(c, g(c)).
Let A be an open subset of R2 and f : A → R. In the prelude, we considered methods for finding
local/global maximizers of f in the set A. Often, however, we want to restrict our search to some proper
subset of A. Specifically, letting g : A → R, suppose we want to restrict our search to the feasible set
C = {(x, y) ∈ A : g(x, y) = 0}.
For example, consider the function f : R2 → R defined by f(x, y) = 2x² − y². Now, suppose we want to
constrain our search for local/global maximizers to the feasible set
C = {(x, y) ∈ R2 : 2x + y = 1},
i.e., g(x, y) = 2x + y − 1. The standard approach to such problems uses the Lagrangian function
L : R × A → R defined by
L(λ, x, y) = f(x, y) + λg(x, y)
(λ ∈ R is called the Lagrange multiplier). The following theorem provides sufficient conditions for a local
maximizer of f on C (the proof should not be skipped):
Theorem 33. Let A be an open subset of R2 , f : A → R and g : A → R both be twice continuously
differentiable on A, and C = {(x, y) ∈ A : g(x, y) = 0}. Define L : R × A → R by L(λ, x, y) = f (x, y) +
λg(x, y), and let (λ∗ , x∗ , y ∗ ) ∈ R × A. If all first-order partial derivatives of L at (λ∗ , x∗ , y ∗ ) are zero and
the determinant of the Hessian matrix for L at (λ∗ , x∗ , y ∗ ) is positive, then (x∗ , y ∗ ) is a local maximizer of
f on C.
Proof. Suppose all first-order partial derivatives of L at (λ∗ , x∗ , y ∗ ) are zero and the determinant of the
Hessian matrix for L at (λ∗, x∗, y∗) is positive. The first-order partial derivatives of L at (λ∗, x∗, y∗) are
L1(λ∗, x∗, y∗) = g(x∗, y∗),
L2(λ∗, x∗, y∗) = f1(x∗, y∗) + λ∗g1(x∗, y∗),
and
L3(λ∗, x∗, y∗) = f2(x∗, y∗) + λ∗g2(x∗, y∗).
Note that, since L1 (λ∗ , x∗ , y ∗ ) = 0, we have g(x∗ , y ∗ ) = 0, i.e., (x∗ , y ∗ ) ∈ C as desired.
The Hessian matrix for L at (λ∗ , x∗ , y ∗ ) is
g1 (x∗ , y ∗ ) g2 (x∗ , y ∗ )
0
g1 (x∗ , y ∗ ) L22 (λ∗ , x∗ , y ∗ ) L23 (λ∗ , x∗ , y ∗ )
g2 (x∗ , y ∗ ) L23 (λ∗ , x∗ , y ∗ ) L33 (λ∗ , x∗ , y ∗ )
(since f and g are twice continuously differentiable on A, so is L, and thus L32 (λ∗ , x∗ , y ∗ ) = L23 (λ∗ , x∗ , y ∗ )
by Young’s Theorem). The determinant of this matrix is
D = −(L22 (λ∗ , x∗ , y ∗ )g2 (x∗ , y ∗ )2 − 2L23 (λ∗ , x∗ , y ∗ )g1 (x∗ , y ∗ )g2 (x∗ , y ∗ ) + L33 (λ∗ , x∗ , y ∗ )g1 (x∗ , y ∗ )2 ).
We thus require g1 (x∗ , y ∗ ) ̸= 0 or g2 (x∗ , y ∗ ) ̸= 0 (or both) so that D ̸= 0. Without loss of generality, assume
g2 (x∗ , y ∗ ) ̸= 0. Then, by the IFT, there exists an open interval I ⊂ R containing x∗ and a function h : I → R
implicitly defined by g(x, h(x)) = 0 for all x ∈ I satisfying h(x∗ ) = y ∗ and with
h′(x∗) = −g1(x∗, h(x∗)) / g2(x∗, h(x∗))
(note that g and h here correspond to f and g, respectively, in the statement of the IFT above).
We now introduce the function F : I → R defined by F(x) = f(x, h(x)), and show that x∗ is a local
maximizer of F, i.e., that F′(x∗) = 0 and F″(x∗) < 0. The derivative of F at x∗ is
F′(x∗) = f1(x∗, h(x∗)) + f2(x∗, h(x∗))h′(x∗) = 0,
which follows since L2(λ∗, x∗, y∗) = L3(λ∗, x∗, y∗) = 0 gives f1(x∗, y∗) = −λ∗g1(x∗, y∗) and f2(x∗, y∗) =
−λ∗g2(x∗, y∗), while the expression for h′(x∗) above gives g1(x∗, y∗) + g2(x∗, y∗)h′(x∗) = 0. The second
derivative of F at x∗ is
F″(x∗) = L22(λ∗, x∗, h(x∗)) + 2L23(λ∗, x∗, h(x∗))h′(x∗) + L33(λ∗, x∗, h(x∗))h′(x∗)² − L3(λ∗, x∗, h(x∗))h″(x∗)
= L22(λ∗, x∗, y∗) + 2L23(λ∗, x∗, y∗)(−g1(x∗, y∗)/g2(x∗, y∗)) + L33(λ∗, x∗, y∗)(g1(x∗, y∗)/g2(x∗, y∗))²
= −D / g2(x∗, y∗)²
< 0
since D > 0 (in the second equality above, we have used the fact that L3(λ∗, x∗, y∗) = 0). Thus, x∗ is a
local maximizer of F by Theorem 22, which means that (x∗, h(x∗)) = (x∗, y∗) is a local maximizer of f on C.
Let's return now to the example considered above. The Lagrangian function is
L(λ, x, y) = 2x² − y² + λ(2x + y − 1),
and its first-order partial derivatives are
L1(λ, x, y) = 2x + y − 1, L2(λ, x, y) = 4x + 2λ, and L3(λ, x, y) = −2y + λ.
Setting each of these first-order partial derivatives equal to zero and solving yields (λ∗, x∗, y∗) = (−2, 1, −1).
The Hessian for L at this point (actually, at any point in R3) is
[ 0 2 1 ]
[ 2 4 0 ]
[ 1 0 −2 ].
The determinant of this matrix is 4 (i.e., positive), so we can conclude that (1, −1) is a local maximizer of
f on C.
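Since the first-order conditions in this example are linear in (λ, x, y), they can be solved numerically, and the determinant checked, as in the following Python sketch (which uses the Lagrangian from the example above):

import numpy as np

# FOCs of L(lam, x, y) = 2x^2 - y^2 + lam*(2x + y - 1) as a linear system
A = np.array([[0.0, 2.0, 1.0],     # L1 = 2x + y - 1 = 0
              [2.0, 4.0, 0.0],     # L2 = 4x + 2*lam = 0
              [1.0, 0.0, -2.0]])   # L3 = -2y + lam = 0
b = np.array([1.0, 0.0, 0.0])
lam, x, y = np.linalg.solve(A, b)
print(lam, x, y)           # -2.0 1.0 -1.0
print(np.linalg.det(A))    # A is also the (constant) Hessian of L; det ~= 4 > 0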
To understand what is happening here, let's put this in the context of the proof of Theorem 33. Here, it's
easy to see that h(x) = 1 − 2x, and thus
F(x) = f(x, h(x)) = 2x² − (1 − 2x)² = −2x² + 4x − 1,
so that F′(x) = −4x + 4 and F″(x) = −4. Since F′(1) = 0 and F″(1) = −4 < 0, we can conclude that 1 is a local maximizer of F (meaning that
(1, h(1)) = (1, −1) is a local maximizer of f on C). Unfortunately, in many cases, we don't have an explicit
expression for h, so we need to use the Lagrangian method instead of this more direct approach.
For completeness, the following theorem provides sufficient conditions for a local maximizer in the general
case where f is a real-valued function of k ≥ 2 variables:
Theorem 34. Let k ≥ 2, A be an open subset of Rk , f : A → R and g : A → R both be twice continuously
differentiable on A, and C = {x ∈ A : g(x) = 0}. Define L : R × A → R by L(λ, x) = f (x) + λg(x), and let
(λ∗ , x∗ ) ∈ R × A and Dp be the pth-order leading principal minor of the Hessian matrix for L at (λ∗ , x∗ ). If
all first-order partial derivatives of L at (λ∗ , x∗ ) are zero and, for ℓ = 3, . . . , k + 1, Dℓ is positive for odd ℓ
and negative for even ℓ, then x∗ is a local maximizer of f on C.
Proof. See Simon and Blume (1994, Theorem 30.12).
Note that the Hessian matrix for L at (λ∗ , x∗ ) is (k + 1) × (k + 1), so Dk+1 is just the determinant of
this Hessian matrix itself. For example, with k = 2, we have k + 1 = 3, so the only second-order condition
in the above theorem is D3 > 0 (i.e., the determinant of the Hessian matrix is positive), which is consistent
with Theorem 33. When k = 3, the second-order conditions in the above theorem are D3 > 0 and D4 < 0.
When k = 4, the second-order conditions in the above theorem are D3 > 0, D4 < 0, and D5 > 0.
* Exercises
1. Show that N2 is countable.
2. In the proof of Theorem 1, we used the “obvious” facts that |x| ≥ x and |x|2 = x2 for all x ∈ R. Prove
these facts.
3. Prove the Reverse Triangle Inequality.
4. Show that |ab| = |a||b| for all a, b ∈ R.
5. Let (sn ) be some sequence. Show that, if lim |sn | = 0, then lim sn = 0.
6. Let (sn ) be some sequence. Show that, if lim sn = s, then lim |sn | = |s|. (HINT: You will need to use
the Reverse Triangle Inequality.)
7. Let (sn ) and (tn ) be sequences converging to the same limit. Show that lim |sn − tn | = 0.
8. Let (sn ) be some sequence. Show that sn ≤ sup{sk : k ≥ n} for all n ∈ N.
9. Consider the sequence (an ) defined by an = αn where |α| ∈ (0, 1). Show that lim an = 0.
10. Consider the sequence (an ) defined by an = (−1)n+1 . Find lim sup an and lim inf an .
11. Show that the sequence (an ) defined by an = (−1)n /n is a Cauchy sequence.
12. Show that the function f : R → R defined by f (x) = x2 is convex on R.
13. Consider the function f : R+ → R defined by f(x) = √x. Using the ϵ–δ definition of a (two-sided)
limit, show that limx→0 f(x) = 0.
14. Let f be a real-valued function and c be an interior point in the domain of f . Write a formal definition
of limx→c+ f (x) = ∞ (f diverges to ∞ from the right).
15. Consider the function f : (−∞, −1) ∪ {0} ∪ (1, ∞) → R defined by
f(x) = −1 if x < −1, f(x) = 0 if x = 0, and f(x) = 1 if x > 1.
References
Apostol, T.M. (1974). Mathematical Analysis, 2nd edition. Pearson.
Bartle, R.G. and Sherbert, D.R. (2011). Introduction to Real Analysis, 4th edition. Wiley.
Rudin, W. (1976). Principles of Mathematical Analysis, 3rd edition. McGraw-Hill.
Simon, C.P. and Blume, L. (1994). Mathematics for Economists. Norton.