Math Review: Appendix
The last word, when all is heard: Fear God and keep his commandments, for this is man's all; because God will bring to judgment every work, with all its hidden qualities, whether good or bad.
—Ecclesiastes 12:13–14
The appendix reviews mathematical techniques that are used in the book. The appendix also
serves as a glossary of symbols, notation, and concepts employed in the text. For more infor-
mation on the topics covered here, the reader should consult references supplied at the end of
the appendix.
In an attempt to keep the presentation as clear as possible, we employ the theorem-proof
style whenever appropriate. Major results of independent interest are encapsulated as theorems.
A lemma is used to denote an auxiliary result that is used as a stepping stone toward a theorem.
We use the term proposition as a way to announce a minor theorem. A corollary is a direct
consequence of a theorem or proposition. A hypothesis is a statement that is assumed as a premise for further
reasoning in a proof. For more information on the subject of writing for mathematical or technical
sciences, we recommend a very clear and useful book by Higham [122].
In set-builder notation such as {x : P(x)}, the colon following x is read "such that." The following sets are used in the book:
N = {1, 2, 3, . . .}, the set of positive integers,
Z = {0, 1, −1, 2, −2, . . .}, the set of integers,
R, the set of real numbers,
C, the set of complex numbers.
Sometimes, we take a fixed set and then carry out manipulations with reference to the fixed set.
We call this fixed set the universe of discourse, or just the universe, for short.
Throughout the text, we work with statements that are combinations of individual statements.
It is necessary to distinguish between sentences and statements. A sentence is a part of a
language. Sentences are used to make different statements. In classical logic, statements are
either true or false. Consider the following two statements:
A : x > 7;
B : x^2 > 49.
We can combine the above two statements into one statement, called a conditional, that has the
form “IF A, THEN B.” Thus, a conditional is a compound statement obtained by placing the
word “IF” before the first statement and inserting the word “THEN” before the second statement.
We use the symbol “⇒” to represent the conditional “IF A, THEN B” as A ⇒ B. The statement
A ⇒ B also reads as “A implies B,” or “A only if B,” or “A is sufficient for B,” or “B is
necessary for A.” Statements A and B may be either true or false. The relationship between the
truth or falsity of A and B and the conditional A ⇒ B can be illustrated by means of a diagram
called a truth table shown in Table A.1. In the table, T stands for “true” and F stands for “false.”
It is intuitively clear that if A is true, then B must also be true for the statement A ⇒ B to be
true. If A is not true, then the sentence A ⇒ B does not have an obvious meaning in everyday
language. We can interpret A ⇒ B to mean that we cannot have A true and B not true. One can
check that the truth values of the statements A ⇒ B and “not (A and not B)” are the same. We
can also check that the truth values of A ⇒ B and “not B ⇒ not A” are the same. The above
is the basis for three methods, or inference processes, for proving statements of the form A ⇒ B.
Inference is a process by which one statement is arrived at, using reasoning, on the basis of one
or more other statements accepted as a starting point. Reasoning is a special kind of thinking
in which inference takes place using the laws of thought. By a proof we mean the process of
establishing the truth of a statement by arriving at it from other statements using principles of
reasoning. Most of the time we are concerned with the statements of the form A ⇒ B and use
Table A.1 Truth table for the conditional A ⇒ B

A    B    A ⇒ B
F    F    T
F    T    T
T    F    F
T    T    T
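The equivalences noted above (A ⇒ B, "not (A and not B)," and "not B ⇒ not A") can be checked mechanically; a minimal Python sketch that enumerates all truth assignments:

from itertools import product

def implies(a, b):
    # A => B is false only when A is true and B is false
    return (not a) or b

for a, b in product([False, True], repeat=2):
    direct = implies(a, b)
    no_counterexample = not (a and not b)    # "not (A and not B)"
    contrapositive = implies(not b, not a)   # "not B => not A"
    assert direct == no_counterexample == contrapositive
    print(a, b, direct)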
A.2 Vectors
We define an n-dimensional column vector to be an n-tuple of numbers:
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, (A.1)
where n is a positive integer. Thus a vector can be viewed as an ordered set of scalars. We
use lowercase bold letters—for example, x—to denote such n-tuples. We call the numbers
x1 , x2 , . . . , xn the coordinates or components of the vector x. The number of components of
a vector is its dimension. When referring to a particular component of a vector, we use either the subscripted component symbol, for example x_i, or the subscripted name of the vector, for example (x)_i. The set of all n-tuples, that is, n-vectors, with real components is denoted R^n and called
the real n-space. The transpose of the vector x, denoted x T , is an n-dimensional row vector
x^T = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}^T = [x_1 \;\; x_2 \;\; \cdots \;\; x_n]. (A.2)
The sum of two n-vectors x and y is an n-vector, denoted x + y, whose ith component is
xi + yi . The following rules are satisfied:
1. (x + y) + z = x + ( y + z), that is, vector addition is associative.
2. x + y = y + x, which means that vector addition is commutative.
3. Let 0 = [0 0 · · · 0]T denote the zero vector. Then,
0 + x = x + 0 = x.
4. Let x = [x1 x2 · · · xn ]T and let −x = [−x1 −x2 · · · −xn ]T . Then,
x + (−x) = 0.
The product of an n-vector x by a scalar c is an n-vector, denoted cx, whose ith component is cx_i.
For two vectors x and y from Rn , we define their scalar product, also called the inner product,
to be
\langle x, y \rangle = x^T y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n. (A.3)
The scalar product of vectors from R^n is commutative, that is,
\langle x, y \rangle = \langle y, x \rangle,
and distributivity over vector addition holds:
\langle x, y + z \rangle = \langle x, y \rangle + \langle x, z \rangle.
Two vectors x and y are said to be orthogonal if \langle x, y \rangle = 0.
A vector norm, denoted \|\cdot\|, satisfies the following properties:
VN 1. \|x\| > 0 for any nonzero vector x.
VN 2. For any scalar c, we have \|cx\| = |c|\,\|x\|.
VN 3. For any two vectors x and y, the triangle inequality holds, that is, \|x + y\| \le \|x\| + \|y\|.
The Euclidean norm, or the 2-norm, of a vector x ∈ R^n, denoted \|x\|_2, is defined as
\|x\|_2 = \sqrt{\langle x, x \rangle} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}. (A.4)
Note that \|x\|_2^2 = \langle x, x \rangle = x^T x. The Euclidean norm is a special case of the class of norms called the
Hölder or p-norms, defined by
\|x\|_p = (|x_1|^p + \cdots + |x_n|^p)^{1/p}, \quad p \ge 1. (A.5)
Other often-used norms that are special cases of the above class of norms are the 1-norm and
the infinity norm, defined respectively as
\|x\|_1 = |x_1| + \cdots + |x_n| (A.6)
and
\|x\|_\infty = \max_i |x_i|. (A.7)
Most of the time, we use the Euclidean norm for vectors. So for the sake of brevity, we write
\|x\| to mean \|x\|_2.
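A minimal numerical illustration of the scalar product and the p-norms (a sketch using NumPy; the vectors are arbitrary examples):

import numpy as np

x = np.array([3.0, -4.0, 0.0])
y = np.array([1.0, 2.0, 2.0])

inner = x @ y                        # <x, y> = x^T y
norm2 = np.sqrt(x @ x)               # Euclidean (2-) norm
norm1 = np.sum(np.abs(x))            # 1-norm
norm_inf = np.max(np.abs(x))         # infinity norm
norm3 = np.sum(np.abs(x)**3)**(1/3)  # Hoelder p-norm with p = 3

print(inner, norm2, norm1, norm_inf, norm3)
# Cross-check against the library routines
assert np.isclose(norm2, np.linalg.norm(x, 2))
assert np.isclose(norm1, np.linalg.norm(x, 1))
assert np.isclose(norm_inf, np.linalg.norm(x, np.inf))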
We now introduce the notion of projection. Let x, y be two vectors, where y ≠ 0. Let z = cy, c ∈ R, be the vector such that the vector x − z is orthogonal to y; see Figure A.1 for an illustration.
Figure A.1 Computing the projection of x along y.
Multiplying both sides of (A.17) by \|y\|^2 and taking the square root concludes the proof.
A.3 Matrices and Determinants
A rectangular array of scalars arranged in m rows and n columns,
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix},
is called a matrix. Thus a matrix is a set of scalars subject to two orderings. We say that it is an
m by n matrix, or m × n matrix. This matrix has m rows and n columns. If m = n, then we say
that it is a square matrix. If the elements of an m × n matrix A are real numbers, then we write
A ∈ Rm×n . If the elements of the matrix A are complex numbers, then we write A ∈ Cm×n . We
define the zero matrix, denoted O, to be the matrix such that ai j = 0 for all i, j. We define the
n × n identity matrix, denoted I n , as the matrix having diagonal components all equal to 1 and
all other components equal to 0.
Given an m × n matrix A, the n × m matrix B such that b_{ji} = a_{ij} is called the transpose of
A and is denoted AT . Thus, the transpose of a matrix is obtained by changing rows into columns
and vice versa. A matrix A that is equal to its transpose—that is, A = AT —is called symmetric.
We now define the sum of two matrices. For two matrices A and B of the same size, we
define A + B to be the matrix whose elements are ai j + bi j ; that is, we add matrices of the same
size componentwise.
Let c be a number and let A be a matrix. We define c A to be the matrix whose components
are cai j . Thus, to obtain c A, we multiply each component of A by c.
We now define the product of two matrices. Let A be an m × n matrix and let B be an n × s matrix. Let a_i^T denote the ith row of A, and let b_j denote the jth column of B. Then, the product AB is an m × s matrix C defined as
C = AB = \begin{bmatrix} a_1^T b_1 & a_1^T b_2 & \cdots & a_1^T b_s \\ a_2^T b_1 & a_2^T b_2 & \cdots & a_2^T b_s \\ \vdots & \vdots & \ddots & \vdots \\ a_m^T b_1 & a_m^T b_2 & \cdots & a_m^T b_s \end{bmatrix}. (A.23)
It is not always true that AB = B A; that is, matrix multiplication is not necessarily commutative.
Let A, B, and C be matrices such that A, B can be multiplied, A, C can be multiplied, and
B, C can be added. Then,
A(B + C) = AB + AC. (A.24)
The above property is called the distributive law. If x is a number, then
A(x B) = x( AB). (A.25)
Furthermore, matrix multiplication is associative because
( AB)C = A(BC). (A.26)
Suppose now that A, B can be multiplied. Then,
( AB)T = B T AT . (A.27)
Thus the transpose of a product is the product of transposed matrices in reverse order.
Let A be an n × n matrix. An inverse of A, if it exists, is an n × n matrix B such that
AB = B A = I n . (A.28)
If an inverse exists, then it is unique. A square matrix that is invertible is said to be nonsingular.
We denote the inverse of A by A−1 . Hence, we have
A−1 A = A A−1 = I n .
The transpose of an inverse is the inverse of the transpose,
( A−1 )T = ( AT )−1 , (A.29)
and we denote it as A−T .
We now present the Sherman–Morrison–Woodbury formula for inverting a special kind of matrix.
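For reference, in its commonly quoted standard form (the exact variant and equation number used in the text may differ), the identity reads: for matrices of compatible dimensions such that the indicated inverses exist,

(A + UCV)^{-1} = A^{-1} - A^{-1} U \,(C^{-1} + V A^{-1} U)^{-1} V A^{-1},

with the rank-one (Sherman–Morrison) special case

(A + u v^{T})^{-1} = A^{-1} - \frac{A^{-1} u \, v^{T} A^{-1}}{1 + v^{T} A^{-1} u},

which is the form applied later in the recursive least squares derivation leading to (A.112).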
This is the expansion of the determinant along the ith row. In a similar fashion, we can expand
the determinant along a column. The determinant has the following properties:
D 1. det A = det AT .
D 2. If A and B are square matrices of the same size, then det(AB) = det A \cdot det B = det(BA).
D 3. det O = 0.
D 4. det I n = 1.
We now list additional properties of determinants. Let ai denote the ith column of A. Then,
we write
det A = det(a1 , a2 , . . . , an ).
D 5. Suppose that the ith column ai is written as a sum
ai = b + c.
Then, we have
det(a1 , . . . , ai , . . . , an ) = det(a1 , . . . , b + c, . . . , an )
= det(a1 , . . . , b, . . . , an ) + det(a1 , . . . , c, . . . , an ).
D 6. If x is a number, then
det(a1 , . . . , x ai , . . . , an ) = x det(a1 , . . . , ai , . . . , an ).
D 7. If two columns of the matrix are equal, then the determinant is equal to zero.
D 8. If we add a multiple of one column to another, then the value of the determinant does
not change. In other words, if x is a number, then
det(a1 , . . . , ai , . . . , a j , . . . , an ) = det(a1 , . . . , ai + x a j , . . . , a j , . . . , an ).
Properties D 5–D 8 hold also for rows instead of columns because det A = det AT .
Using the cofactors of A and its determinant, we can give a formula for A−1 . First, we define
the adjugate of A, denoted adj A, to be the matrix whose (i, j)th entry is the cofactor c ji ; that
is, adj A is the transpose of the cofactor matrix of A. Then,
A^{-1} = \frac{1}{\det A}\, \mathrm{adj}\, A = \frac{\mathrm{adj}\, A}{\det A}. (A.35)
If A is n × n and v is a nonzero n × 1 vector such that for some scalar λ,
Av = λv, (A.36)
then v is an eigenvector corresponding to the eigenvalue λ. In order that λ be an eigenvalue of
A, it is necessary and sufficient for the matrix λI n − A to be singular; that is, det(λI n − A) = 0.
This leads to an nth-order polynomial equation in λ of the form
det(λI n − A) = λn + an−1 λn−1 + · · · + a1 λ + a0 = 0. (A.37)
This equation, called the characteristic polynomial equation, must have n, possibly nondis-
tinct, complex roots that are the eigenvalues of A. The spectral radius of A, denoted ρ( A), is
defined as
\rho(A) = \max_{1 \le i \le n} |\lambda_i(A)|.
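These quantities are easy to compute numerically; a short NumPy sketch (the matrix A is an arbitrary example):

import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])

eigvals = np.linalg.eigvals(A)     # roots of det(lambda*I - A) = 0
rho = np.max(np.abs(eigvals))      # spectral radius rho(A)
char_poly = np.poly(A)             # coefficients of the characteristic polynomial
print(eigvals, rho, char_poly)     # eigenvalues -1 and -2, rho = 2, lambda^2 + 3*lambda + 2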
A.4 Quadratic Forms
We now present tests for definiteness properties of symmetric matrices. We state these tests
in terms of principal minors of a given real, symmetric, n × n matrix Q with elements qi j . We
also give tests for definiteness in terms of eigenvalues of Q. The principal minors of Q are
det Q itself and the determinants of submatrices of Q obtained by removing successively an ith
row and ith column. Specifically, for p = 1, 2, . . . , n, the principal minors are
\Delta_p \begin{pmatrix} i_1, i_2, \ldots, i_p \\ i_1, i_2, \ldots, i_p \end{pmatrix} = \det \begin{bmatrix} q_{i_1 i_1} & q_{i_1 i_2} & \cdots & q_{i_1 i_p} \\ q_{i_2 i_1} & q_{i_2 i_2} & \cdots & q_{i_2 i_p} \\ \vdots & \vdots & \ddots & \vdots \\ q_{i_p i_1} & q_{i_p i_2} & \cdots & q_{i_p i_p} \end{bmatrix}, \qquad 1 \le i_1 < i_2 < \cdots < i_p \le n.
We now give two tests for Q = Q T to be positive definite: one in terms of its leading principal
minors, and the other in terms of its eigenvalues. The leading principal minors of Q are
\Delta_1 = q_{11}, \quad \Delta_2 = \det \begin{bmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \end{bmatrix}, \quad \Delta_3 = \det \begin{bmatrix} q_{11} & q_{12} & q_{13} \\ q_{21} & q_{22} & q_{23} \\ q_{31} & q_{32} & q_{33} \end{bmatrix}, \quad \ldots, \quad \Delta_n = \det Q.
The matrix Q = Q^T is positive definite if and only if all its leading principal minors are positive; equivalently, if and only if all eigenvalues of Q are positive.
◆ Example A.1
We will find the range of the parameter γ for which the quadratic form
f = x^T \begin{bmatrix} -5-\gamma & 6 \\ 0 & -2 \end{bmatrix} x
is negative definite. A quadratic form is negative definite if and only if its negative is positive definite. So instead of working with f, we will find the range of the parameter γ for which g = −f is positive definite. Writing g with a symmetric matrix (a quadratic form depends only on the symmetric part of its matrix),
g = x^T \begin{bmatrix} 5+\gamma & -3 \\ -3 & 2 \end{bmatrix} x.
The first-order leading principal minor of the above matrix is \Delta_1 = 5 + \gamma, which is positive if and only if γ > −5. The second-order leading principal minor is
\Delta_2 = \det \begin{bmatrix} 5+\gamma & -3 \\ -3 & 2 \end{bmatrix} = 2\gamma + 1,
which is positive for γ > −1/2. Thus the quadratic form f is negative definite if and only if γ > −1/2.
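The conclusion can be cross-checked numerically; a minimal NumPy sketch testing the eigenvalue criterion at sample values of γ:

import numpy as np

def is_negative_definite(gamma):
    # Symmetric matrix of the quadratic form g = -f from Example A.1
    Q = np.array([[5.0 + gamma, -3.0],
                  [-3.0, 2.0]])
    # f is negative definite iff g = -f is positive definite,
    # i.e., iff all eigenvalues of Q are positive
    return bool(np.all(np.linalg.eigvalsh(Q) > 0))

print(is_negative_definite(0.0))    # True:  0 > -1/2
print(is_negative_definite(-0.6))   # False: -0.6 < -1/2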
If a given matrix Q ∈ R^{n×n} is symmetric, then its eigenvalues are real and there exists an orthogonal matrix U such that
U^T Q U = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = \Lambda. (A.40)
Using the above, one can show that a symmetric positive semidefinite matrix Q has a positive semidefinite symmetric square root, denoted Q^{1/2} or \sqrt{Q}, that satisfies
Q^{1/2} Q^{1/2} = Q. (A.41)
Indeed, it follows from Theorem A.5 that if Q = Q^T ≥ 0 then its eigenvalues are all nonnegative. Hence,
\Lambda^{1/2} = \begin{bmatrix} \lambda_1^{1/2} & 0 & \cdots & 0 \\ 0 & \lambda_2^{1/2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^{1/2} \end{bmatrix}. (A.42)
Let V ∈ R^{n×r} be a matrix that consists of the first r columns of the matrix U, and let
C = \begin{bmatrix} \lambda_1^{1/2} & 0 & \cdots & 0 \\ 0 & \lambda_2^{1/2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_r^{1/2} \end{bmatrix} V^T. (A.45)
Then,
Q = C^T C. (A.46)
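As an illustration of the eigendecomposition-based square root (a NumPy sketch; the symmetric square root is formed as U Λ^{1/2} U^T, and the matrix Q below is an arbitrary positive definite example):

import numpy as np

Q = np.array([[4.0, 2.0],
              [2.0, 3.0]])               # symmetric positive definite example

lam, U = np.linalg.eigh(Q)               # Q = U diag(lam) U^T with orthogonal U
Q_half = U @ np.diag(np.sqrt(lam)) @ U.T # symmetric square root Q^{1/2}

assert np.allclose(Q_half @ Q_half, Q)   # Q^{1/2} Q^{1/2} = Q
assert np.allclose(Q_half, Q_half.T)     # Q^{1/2} is symmetric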
A.5 The Kronecker Product
Given an m × n matrix A = [a_{ij}] and a p × q matrix B, the Kronecker product A ⊗ B is the mp × nq block matrix
A \otimes B = \begin{bmatrix} a_{11} B & a_{12} B & \cdots & a_{1n} B \\ \vdots & \vdots & & \vdots \\ a_{m1} B & a_{m2} B & \cdots & a_{mn} B \end{bmatrix}.
Thus, the matrix A ⊗ B consists of mn blocks. Using the definition of the Kronecker product, we can verify the following two properties:
(A \otimes B)(C \otimes D) = AC \otimes BD, (A.48)
(A \otimes B)^T = A^T \otimes B^T. (A.49)
Let X = [x_1 \;\; x_2 \;\; \cdots \;\; x_m] be an n × m matrix with columns x_1, x_2, ..., x_m, and denote by
s(X) = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}
the column nm-vector formed from the columns of X taken in order. Let now A be an n × n matrix, and let C and X be n × m matrices. Then, the matrix equation
AX = C (A.50)
can be written as
(I_m \otimes A)\, s(X) = s(C). (A.51)
Similarly, for an m × m matrix B, the matrix equation
XB = C (A.52)
can be represented as
(B^T \otimes I_n)\, s(X) = s(C). (A.53)
Using the above two facts, we can verify that the matrix equation
AX + XB = C, (A.54)
where A is n × n, B is m × m, and C and X are n × m, can be represented as
(I_m \otimes A + B^T \otimes I_n)\, s(X) = s(C). (A.55)
◆ Example A.2
Consider the matrix equation (A.54), where
A = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} -2 & 0 \\ -3 & 1 \end{bmatrix}, \quad \text{and} \quad C = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}.
We will represent the above equation in the matrix–vector format as given by (A.55), that is,
(I_2 \otimes A + B^T \otimes I_2)\, s(X) = s(C).
The above system of linear equations has a unique solution because the matrix
I_2 \otimes A + B^T \otimes I_2 = \begin{bmatrix} -2 & 1 & -3 & 0 \\ 0 & -1 & 0 & -3 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix}
is nonsingular.
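A minimal NumPy sketch of this example: form the Kronecker-product coefficient matrix of (A.55), solve for s(X), and check that the recovered X satisfies AX + XB = C.

import numpy as np

A = np.array([[0.0, 1.0], [0.0, 1.0]])
B = np.array([[-2.0, 0.0], [-3.0, 1.0]])
C = np.array([[1.0, 0.0], [1.0, 1.0]])

n, m = A.shape[0], B.shape[0]
# Coefficient matrix of (A.55): I_m kron A + B^T kron I_n
M = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))

s_C = C.flatten(order='F')           # stack the columns of C
s_X = np.linalg.solve(M, s_C)        # unique solution since M is nonsingular
X = s_X.reshape((n, m), order='F')   # un-stack back into an n x m matrix

assert np.allclose(A @ X + X @ B, C)
print(X)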
Let \lambda_i, v_i be the eigenvalues and eigenvectors, respectively, of the n × n matrix A, and let \mu_j and w_j be the eigenvalues and eigenvectors of the m × m matrix B. Then,
(A \otimes B)(v_i \otimes w_j) = A v_i \otimes B w_j = \lambda_i v_i \otimes \mu_j w_j = \lambda_i \mu_j (v_i \otimes w_j). (A.56)
Thus, the eigenvalues of A ⊗ B are λi μ j , and their respective eigenvectors are v i ⊗ w j for
i = 1, 2, . . . , n, j = 1, 2, . . . , m.
◆ Example A.3
For the given two matrices
A = \begin{bmatrix} 0 & 0 \\ 2 & -2 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 4 & 6 & -8 \\ 0 & -1 & 0 \\ 0 & 2 & 7 \end{bmatrix},
we will find the eigenvalues of the matrix (I + A ⊗ B^T), where I is the identity matrix of appropriate dimensions. We note that the eigenvalues of I + A ⊗ B^T are 1 + λμ, where λ ranges over eig(A) and μ over eig(B), because for any square matrix M we have eig(I + M) = 1 + eig(M) and eig(M^T) = eig(M). By inspection,
eig(A) = {0, −2} and eig(B) = {4, −1, 7}.
Hence,
eig(I + A ⊗ B^T) = {1, 1, 1, −7, 3, −13}.
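This can be verified directly; a short NumPy sketch:

import numpy as np

A = np.array([[0.0, 0.0], [2.0, -2.0]])
B = np.array([[4.0, 6.0, -8.0], [0.0, -1.0, 0.0], [0.0, 2.0, 7.0]])

M = np.eye(6) + np.kron(A, B.T)
print(np.sort(np.linalg.eigvals(M).real))   # -13, -7, 1, 1, 1, 3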
We will now use (A.56) to study the matrix equation (A.54), which we represent as
M s(X) = s(C), (A.57)
where
M = I m ⊗ A + BT ⊗ I n . (A.58)
The solution to (A.57) is unique if, and only if, the mn × mn matrix M is nonsingular. To find
the condition for this to hold, consider the matrix
(I_m + \varepsilon B^T) \otimes (I_n + \varepsilon A) = I_m \otimes I_n + \varepsilon M + \varepsilon^2 B^T \otimes A (A.59)
whose eigenvalues are
(1 + \varepsilon \mu_j)(1 + \varepsilon \lambda_i) = 1 + \varepsilon(\mu_j + \lambda_i) + \varepsilon^2 \mu_j \lambda_i (A.60)
because for a square matrix Q we have
λi (I n + ε Q) = 1 + ελi ( Q).
Comparing the terms in ε in (A.59) and (A.60), we conclude that the eigenvalues of M are
λi + μ j , i = 1, 2, . . . , n, j = 1, 2, . . . , m. Hence, M is nonsingular if and only if
\lambda_i + \mu_j \neq 0. (A.61)
The above is a necessary and sufficient condition for the solution X of the matrix equation
AX + X B = C to be unique.
Another useful identity involving the Kronecker product is
s( ABC) = (C T ⊗ A)s(B). (A.62)
For further information on the subject of the Kronecker product, we refer the reader to
Brewer [33].
The Least Upper Bound Axiom Every nonempty set S of real numbers that has an upper bound has a least upper bound, also called the supremum of S and denoted
sup{x : x ∈ S}.
Examples
1. \sup\left\{ \frac{1}{2}, \frac{2}{3}, \ldots, \frac{n}{n+1}, \ldots \right\} = 1;
2. \sup\left\{ -\frac{1}{2}, -\frac{1}{8}, -\frac{1}{27}, \ldots, -\frac{1}{n^3}, \ldots \right\} = 0;
3. \sup\{x : x^2 < 3\} = \sup\{x : -\sqrt{3} < x < \sqrt{3}\} = \sqrt{3}.
Theorem A.7 If M = sup{x : x ∈ S} and ε > 0, then there is at least one number x in
S such that
M − ε < x ≤ M.
Proof The condition x ≤ M is satisfied by all numbers x in S by virtue of the least
upper bound axiom. We have to show that for any ε > 0 there is a number x ∈ S
such that
M − ε < x.
We prove the statement by contradiction. We suppose that there is no such number
in S. Then,
x ≤ M−ε for all x ∈ S
and hence M − ε would be an upper bound of S that is less than M. This contradicts the fact that M is the least upper bound of S, and thus the proof is complete.
To illustrate the above theorem, we consider the following set of real numbers:
S = \left\{ \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \ldots, \frac{n}{n+1}, \ldots \right\}.
Note that
\sup\{x : x \in S\} = 1.
Let ε = 0.1. Then, for example, the element of the set S that is equal to 99/100 satisfies
1 - \varepsilon < \frac{99}{100} \le 1.
Theorem A.8 Every nonempty set of real numbers that has a lower bound has a
greatest lower bound, also called infimum, and is denoted as
inf{x : x ∈ S}.
Proof By assumption, S is nonempty and has a lower bound that we call s. Thus
s≤x for all x ∈ S.
Hence
−x ≤ −s for all x ∈ S;
that is, the set {−x : x ∈ S} has an upper bound −s. From the least upper bound
axiom, we conclude that the set {−x : x ∈ S} has a least upper bound, supremum.
We call it −s0 . Because
−x ≤ −s0 for all x ∈ S,
we have
s0 ≤ x for all x ∈ S,
and thus s0 is a lower bound for S. We now prove, by contradiction, that s0 is the
greatest lower bound, infimum, of the set S. Indeed, if there was a number x̃ satisfying
s0 < x̃ ≤ x for all x ∈ S,
then we would have
−x ≤ −x̃ < −s0 for all x ∈ S,
which contradicts the fact that −s0 is the supremum of {−x : x ∈ S}.
Using arguments similar to those we used in the proof of Theorem A.8, we can prove the
following theorem:
Theorem A.9 If m = inf{x : x ∈ S} and ε > 0, then there is at least one number x in
S such that
m ≤ x < m + ε.
A.7 Sequences
A sequence of real numbers can be viewed as a function whose domain is the set of natural
numbers 1, 2, . . . , n, . . . and whose range is contained in R. Thus, a sequence of real numbers
can be viewed as a set of ordered pairs (1, a1 ), (2, a2 ), . . . , (n, an ), . . . . We denote a sequence
Figure A.2 An illustration of the notion of the limit of a sequence.
of real numbers
a1 , a2 , . . . , an , . . .
as {an }.
A sequence {a_n} is increasing if a_1 < a_2 < \cdots < a_n < \cdots. In general, a sequence is increasing
if an < an+1 . If an ≤ an+1 , then we say that the sequence is nondecreasing. Similarly, we can
define decreasing and nonincreasing sequences. Nonincreasing or nondecreasing sequences are
called monotone sequences.
A number g is called the limit of the infinite sequence a1 , a2 , . . . , an , . . . if for any positive
ε there is a natural number k = k(ε) such that for all n > k we have
|an − g| < ε;
that is, an lies between g − ε and g + ε for all n > k. In other words, for any ε > 0 we have
|an − g| < ε for sufficiently large n’s (see Figure A.2), and we write
g = \lim_{n \to \infty} a_n,
or
an → g.
A sequence that has a limit is called a convergent sequence. A sequence that has no limit is
called divergent.
The notion of the sequence can be extended to sequences with elements in R p . Specifically,
a sequence in R p is a function whose domain is the set of natural numbers 1, 2, . . . , n, . . . and
whose range is contained in R p .
Suppose that a1 , a2 , . . . , an , . . . , denoted {an }, is a sequence in R p . Then, g ∈ R p is called
the limit of this sequence if for each ε > 0 there exists a natural number k = k(ε) such that for
all n > k we have
\|a_n - g\| < \varepsilon,
and we write
g = \lim_{n \to \infty} a_n,
or
an → g.
A sequence in R^p is bounded if there exists a number M ≥ 0 such that \|a_n\| \le M for all
n = 1, 2, . . . . The set of all bounded sequences is sometimes denoted as l∞ .
Every convergent sequence is bounded. Indeed, suppose a_n → g and choose k such that \|a_n - g\| < 1, and hence \|a_n\| < \|g\| + 1, for all n > k. Let
M = \max(\|a_1\|, \|a_2\|, \ldots, \|a_k\|, \|g\| + 1);
then we have
M \ge \|a_n\| \quad \text{for all } n,
which means that a convergent sequence is bounded, and the proof is complete. Similarly, if a_n → g and {a_{m_n}} is a subsequence of {a_n} (so that m_n ≥ n), then for sufficiently large n we have \|a_{m_n} - g\| < \varepsilon; that is, every subsequence of a convergent sequence converges to the same limit.
We now state, without proof, an important result due to Bolzano and Weierstrass: every bounded sequence in R^p contains a convergent subsequence.
Theorem A.14 (Cauchy) A sequence {a_n} in R^p is convergent if and only if for every ε > 0 there exists an r = r(ε) such that
\|a_n - a_r\| < \varepsilon
holds for all n > r.
Proof (⇒) Let
\lim_{n \to \infty} a_n = g,
and let ε > 0 be given. Then, for some l = l(ε),
\|a_n - g\| < \tfrac{1}{2}\varepsilon \quad \text{for all } n > l.
Let r = l + 1; then
\|a_r - g\| < \tfrac{1}{2}\varepsilon.
Adding the above inequalities, we obtain
\|a_n - g\| + \|a_r - g\| < \varepsilon \quad \text{for all } n > r.
On the other hand,
\|a_n - a_r\| \le \|a_n - g\| + \|a_r - g\|.
Hence,
\|a_n - a_r\| < \varepsilon \quad \text{for all } n > r.
(⇐) We now assume that for every ε > 0 there exists an r = r(ε) such that
\|a_n - a_r\| < \varepsilon \quad \text{for all } n > r.
We then show that the above condition implies
\lim_{n \to \infty} a_n = g.
First, we show that {a_n} is a bounded sequence. Let ε = 1. Then, an r = r(ε) exists such that \|a_n - a_r\| < 1 for all n > r. Hence,
\|a_n\| - \|a_r\| \le \|a_n - a_r\| < 1,
From the hypothesis of the theorem it follows that for some r = r(ε/3) we have
\|a_n - a_r\| < \tfrac{1}{3}\varepsilon \quad \text{for all } n > r. (A.65)
On the other hand, the assumption that the subsequence {a_{m_n}} is convergent implies that for some k = k(ε/3) we have
\|a_{m_n} - g\| < \tfrac{1}{3}\varepsilon \quad \text{for all } n > k. (A.66)
We can select k so that k > r. In such a case, (A.65) and (A.66) are satisfied for all n > k. Because m_n ≥ n > r, we have for all n > k the following:
\|a_{m_n} - a_r\| < \tfrac{1}{3}\varepsilon,
or, equivalently,
\|a_r - a_{m_n}\| < \tfrac{1}{3}\varepsilon. (A.67)
Given a sequence {a_n} of real numbers, its Nth partial sum is
s_N = \sum_{n=1}^{N} a_n = a_1 + a_2 + \cdots + a_N,
and the associated infinite series is
\sum_{n=1}^{\infty} a_n. (A.68)
If the sequence of the partial sums has a limit—that is, s N → s as N → ∞—then we say that
the series converges to the sum s, and write
s = \sum_{n=1}^{\infty} a_n.
The above formula can only make sense when the series (A.68) converges.
Theorem A.16 If the series \sum_{n=1}^{\infty} a_n converges, then
a_n \to 0 \quad \text{as } n \to \infty.
Proof Let s be the sum of the series \sum_{n=1}^{\infty} a_n. Then,
s_N = \sum_{n=1}^{N} a_n \to s \quad \text{as } N \to \infty. (A.69)
We also have
s N−1 → s as N → ∞, (A.70)
and
a_N = (a_1 + a_2 + \cdots + a_N) - (a_1 + a_2 + \cdots + a_{N-1}) = s_N - s_{N-1}.
Therefore, by (A.69) and (A.70),
a N → (s − s) = 0 as N → ∞,
and the proof is complete.
A.8 Functions
We discuss functions of one and several variables and their basic properties. A function f from
a set A to a set B, denoted
f : A → B,
is a rule that assigns to each x ∈ A a unique element y ∈ B. The set A is the domain of f and
the range of f is a subset of B, not necessarily the whole of B.
Let f : S → R, where S is a subset of R. Then, f is said to be continuous at z 0 ∈ S if for
every ε > 0, there exists δ = δ(ε, z 0 ) such that if z ∈ S, |z − z 0 | < δ, then | f (z) − f (z 0 )| < ε.
This is also written as f (z) → f (z 0 ) as z → z 0 . If f is continuous at every point of S, then we
say that f is continuous on S.
We say that the function f is bounded on a set S if there is M ≥ 0 such that for any z ∈ S
we have
| f (z)| ≤ M.
With the help of the above lemma, we can now prove the maximum–minimum theorem of
Weierstrass.
A sphere, or ball, around x in R^p is a set of the form {y : \|x - y\| < ε} for some ε > 0. A set \Omega in R^p is said to be open if around every point x in \Omega there is a sphere that is contained in \Omega. If x ∈ R^p, then any set that contains an open set containing x is called a neighborhood of x. A sphere around x is a neighborhood of x.
A point x ∈ R^p is called a boundary point of a set \Omega ⊆ R^p if every neighborhood of x contains a point of \Omega and a point that is not in \Omega. The set of all boundary points of \Omega is called the boundary of \Omega. A set is said to be closed if it contains its boundary. Equivalently, a set \Omega is closed if x_k → g with x_k ∈ \Omega implies g ∈ \Omega. A set is bounded if it is contained within some sphere of finite radius. A set is compact if it is both closed and bounded. Any finite set \Omega = {x_1, x_2, \ldots, x_k} in R^p is compact.
We now generalize Lemma A.2 for real-valued functions of several variables.
A function f is said to be absolutely continuous if for every ε > 0 there is a δ > 0 such that for every finite collection of pairwise disjoint intervals (a_k, b_k), k = 1, 2, \ldots, n, satisfying
\sum_{k=1}^{n} (b_k - a_k) < \delta, (A.72)
we have
\sum_{k=1}^{n} |f(b_k) - f(a_k)| < \varepsilon. (A.73)
Note that every absolutely continuous function is continuous, because the case n = 1 is not excluded in the above definition. A function f that satisfies the Lipschitz condition,
|f(x) - f(y)| \le L|x - y|,
is absolutely continuous; indeed, given ε > 0, the choice δ = ε/L satisfies the definition above.
A.9 Linear Operators
The matrix norm—that is, the norm of the linear map F given by (A.75)—is equivalent to
\|A\| = \max_{\|x\|=1} \|Ax\|. (A.76)
A matrix norm produced using (A.75) or (A.76) is called the induced norm. Note that
\|Ax - Ay\| = \|A(x - y)\| \le \|A\|\,\|x - y\|;
that is, the linear map F represented by the matrix A is uniformly continuous.
We now comment on the notation used in (A.75) and (A.76). First, the same symbol is used
for the matrix norm as for the norm of a vector. Second, the norms appearing on the right-hand
side of (A.75) and (A.76) are the norms of the vectors x ∈ Rn and Ax ∈ Rm . Thus, if Rn is
equipped with the vector norm ·s while Rm is equipped with the vector norm ·t , then (A.76)
should be interpreted as
\|A\| = \max_{\|x\|_s = 1} \|Ax\|_t.
Vector norms and a matrix norm are said to be compatible, or consistent, if they satisfy
\|Ax\| \le \|A\|\,\|x\|. (A.77)
We will show that (A.76) is well-defined. For this, we need the following lemma.
By the continuity of the vector norm and the maximum–minimum theorem of Weierstrass stated earlier, a vector x_0 can be found such that \|x_0\| = 1 and \|A\| = \|Ax_0\| = \max_{\|x\|=1} \|Ax\|. Thus, the expression (A.76) is well-defined.
Next, we will show that (A.76) satisfies (A.77). Suppose that we are given a vector y ≠ 0. Then,
x = \frac{1}{\|y\|}\, y
satisfies the condition \|x\| = 1. Hence,
\|Ay\| \le \|y\|\,\|A\|.
Proof of MN 4. For the product of two matrices A and B, there exists a vector x_0, \|x_0\| = 1, such that \|AB\| = \|ABx_0\|. Then, using (A.76), we have
\|AB\| = \|A(Bx_0)\| \le \|A\|\,\|Bx_0\|.
Employing (A.76) again yields
\|AB\| \le \|A\|\,\|B\|\,\|x_0\| = \|A\|\,\|B\|.
The induced matrix norms corresponding to the one-, two-, and infinity-norms for vectors are:
• \|A\|_1 = \max_j \|a_j\|_1, that is, \|A\|_1 is the maximum absolute column sum;
• \|A\|_2 = \sigma_1(A), which is the largest singular value of A;
• \|A\|_\infty = \max_i \|a_i^T\|_1, which is the maximum absolute row sum,
where a_j denotes the jth column and a_i^T the ith row of A.
A matrix norm that is not induced by a vector norm is the Frobenius norm, defined for an m × n matrix A as
\|A\|_F = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 \right)^{1/2}. (A.78)
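These norms can be computed and cross-checked numerically; a short NumPy sketch (the matrix A is an arbitrary example):

import numpy as np

A = np.array([[1.0, -2.0,  3.0],
              [0.0,  4.0, -5.0]])

norm_1   = np.max(np.sum(np.abs(A), axis=0))      # maximum absolute column sum
norm_inf = np.max(np.sum(np.abs(A), axis=1))      # maximum absolute row sum
norm_2   = np.linalg.svd(A, compute_uv=False)[0]  # largest singular value
norm_F   = np.sqrt(np.sum(A**2))                  # Frobenius norm

assert np.isclose(norm_1,   np.linalg.norm(A, 1))
assert np.isclose(norm_inf, np.linalg.norm(A, np.inf))
assert np.isclose(norm_2,   np.linalg.norm(A, 2))
assert np.isclose(norm_F,   np.linalg.norm(A, 'fro'))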
can be expressed as a linear combination of these vectors. Any such set of vectors is called a
basis of the subspace. The vectors forming a basis are not unique.
The number of linearly independent columns of a matrix A is called the rank of A and is
denoted rank( A). The rank of A is also equal to the number of linearly independent rows of A.
For the product of an m ×n matrix A and an n × p matrix B, we have Sylvester’s inequalities:
rank( A) + rank(B) − n ≤ rank( AB) ≤ min{rank( A), rank(B)}. (A.88)
For an m × n matrix A, the set of all n-vectors orthogonal to the rows of A is called the
kernel, or null space, of A and is denoted ker( A), that is,
ker( A) = {x : Ax = 0}. (A.89)
The kernel of A is a subspace of Rn . Indeed, if x 1 , x 2 ∈ ker( A), then Ax 1 = 0 and Ax 2 = 0.
Hence, for any scalars α1 and α2 , we have
A(α1 x 1 + α2 x 2 ) = α1 Ax 1 + α2 Ax 2 = 0. (A.90)
Thus, if x 1 , x 2 ∈ ker( A), then α1 x 1 + α2 x 2 ∈ ker( A), which means that ker( A) is a subspace
of Rn . Similarly, we define the kernel of AT to be the set
ker( AT ) = {w : AT w = 0}. (A.91)
The intersection of range( A) and ker( AT ) contains only the zero vector, that is,
range( A) ∩ ker( AT ) = {0}. (A.92)
For a given subspace S of Rn , the orthogonal complement S ⊥ of S is defined as
S ⊥ = {x ∈ Rn : x T y = 0 for all y ∈ S}. (A.93)
The vectors in ker( AT ) are orthogonal to the vectors in range( A), and vice versa. We say
that range( A) and ker( AT ) are orthogonal complements of one another and write
range( A)⊥ = ker( AT ) (A.94)
and
ker( AT )⊥ = range( A). (A.95)
Similarly,
range( AT )⊥ = ker( A) (A.96)
and
ker( A)⊥ = range( AT ). (A.97)
If A is an m × n matrix with linearly independent columns, which implies that m ≥ n, then
rank( A) = n = dim range( A)
and
dim ker( AT ) = m − n.
We now define a normed space. Let C(\Omega) denote the set of real-valued functions continuous on a compact set \Omega. The function \|\cdot\| : C(\Omega) \to \mathbb{R} defined as
\|f\| = \max_{x \in \Omega} |f(x)| (A.98)
satisfies the norm axioms. The function space C(\Omega), with scalars from \mathbb{R} and with the norm (A.98), is a linear normed space. In particular, the space of all real-valued functions f(x) continuous on some interval [a, b] of the real line with the norm
\|f\| = \max_{x \in [a,b]} |f(x)| (A.99)
Figure A.3 Graphical interpretation of the least squares solution: the vector Ax lies in range(A).
x^* = (A^T A)^{-1} A^T b. (A.107)
The above point satisfies the second-order sufficient condition to be the minimizer of (A.105)
because the Hessian of (A.105) is
2A^T A,
which is positive definite. The solution given by (A.107) is called the least squares solution of the inconsistent linear system of equations Ax = b because it minimizes the sum of squared residuals
\|e\|^2 = e^T e = \sum_{i=1}^{m} e_i^2,
where e = Ax − b. For a graphical interpretation of the least squares solution, we refer to Figure A.3.
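A minimal NumPy illustration of (A.107) on a small overdetermined system (the data are arbitrary):

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Least squares solution via the normal equations (A.107)
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against the library least squares routine
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_star, x_lstsq)
print(x_star)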
Suppose now that we interpret the ith row of the matrix A and the ith component of the vector b as the ith data pair. Thus, the ith data pair has the form (a_i^T; b_i). Suppose that the k data pairs are represented by the pair (A; b). We know that the least squares solution to the system of equations represented by the data pair (A; b) is given by (A.107). If the new data pair (a_{k+1}^T; b_{k+1}) becomes available, then we could find the least squares solution to the system of equations
\begin{bmatrix} A \\ a_{k+1}^T \end{bmatrix} x = \begin{bmatrix} b \\ b_{k+1} \end{bmatrix} (A.108)
by applying the formula (A.107) to (A.108). Let x_{k+1} denote the least squares solution to the above problem. Then,
x_{k+1} = \left( \begin{bmatrix} A \\ a_{k+1}^T \end{bmatrix}^T \begin{bmatrix} A \\ a_{k+1}^T \end{bmatrix} \right)^{-1} \begin{bmatrix} A \\ a_{k+1}^T \end{bmatrix}^T \begin{bmatrix} b \\ b_{k+1} \end{bmatrix}. (A.109)
In the case of large k the above method of recomputing the new least squares solution from
scratch may be time-consuming. We present a recursive method for computing least squares
solutions as the data pairs become available. To proceed, let
P_k = (A^T A)^{-1}. (A.110)
In view of the above, we can write
P_{k+1}^{-1} = \begin{bmatrix} A \\ a_{k+1}^T \end{bmatrix}^T \begin{bmatrix} A \\ a_{k+1}^T \end{bmatrix} = \begin{bmatrix} A^T & a_{k+1} \end{bmatrix} \begin{bmatrix} A \\ a_{k+1}^T \end{bmatrix} = A^T A + a_{k+1} a_{k+1}^T = P_k^{-1} + a_{k+1} a_{k+1}^T. (A.111)
Hence,
P_{k+1} = \left( P_k^{-1} + a_{k+1} a_{k+1}^T \right)^{-1}.
Applying to the above the matrix inversion formula (A.30), we obtain
P_{k+1} = P_k - \frac{P_k a_{k+1} a_{k+1}^T P_k}{1 + a_{k+1}^T P_k a_{k+1}}. (A.112)
P_k^{-1} = P_{k+1}^{-1} - a_{k+1} a_{k+1}^T.
The two equations (A.112) and (A.118) constitute the recursive least squares (RLS) algorithm. To initialize the RLS, we can set
P_0 = (A_n^T A_n)^{-1} \quad \text{and} \quad x_0 = P_0 A_n^T b_n,
where the pair (A_n; b_n) represents the first n data pairs.
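A compact sketch of the RLS recursion in NumPy. Since only the covariance update (A.112) is reproduced above, the estimate update below uses the standard RLS form x_{k+1} = x_k + P_{k+1} a_{k+1}(b_{k+1} − a_{k+1}^T x_k), which is assumed here to correspond to (A.118); the data are synthetic.

import numpy as np

def rls_update(x, P, a, b):
    # One RLS step for a new data pair (a^T; b)
    P_new = P - np.outer(P @ a, a @ P) / (1.0 + a @ P @ a)   # covariance update (A.112)
    x_new = x + P_new @ a * (b - a @ x)  # standard RLS estimate update (assumed form of (A.118))
    return x_new, P_new

rng = np.random.default_rng(0)
x_true = np.array([2.0, -1.0])
A0 = rng.standard_normal((2, 2))         # first n = 2 data pairs for initialization
b0 = A0 @ x_true
P = np.linalg.inv(A0.T @ A0)
x = P @ A0.T @ b0

for _ in range(20):                      # process new data pairs one at a time
    a = rng.standard_normal(2)
    b = a @ x_true
    x, P = rls_update(x, P, a, b)

print(x)  # recovers x_true for noise-free data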
A.12 Contraction Maps
A map F is called a contraction map if there exists a constant L, 0 < L < 1, such that \|F(x) - F(y)\| \le L\|x - y\| for all x and y in its domain. A point x^* such that
F(x^*) = x^* (A.120)
is called a fixed point of F. The importance of contraction maps lies in the fact that a contraction map has a unique fixed point. Furthermore, this point may be found by successive iterations, as stated in the following theorem.
Observe that
\|x^{(j)} - x^{(j+1)}\| = \|F(x^{(j-1)}) - F(x^{(j)})\| \le L\,\|x^{(j-1)} - x^{(j)}\| \le \cdots \le L^{j}\,\|x^{(0)} - x^{(1)}\|, (A.124)
We use the above to further evaluate the right-hand side of (A.123) to obtain
\|x^{(m)} - x^{(n)}\| \le L^{n}\,\|x^{(0)} - x^{(1)}\|\,(1 + L + \cdots + L^{m-n-1}) \le L^{n}\,\|x^{(0)} - x^{(1)}\|\,(1 + L + \cdots + L^{m-n-1} + L^{m-n} + \cdots) = \frac{L^{n}}{1 - L}\,\|x^{(0)} - x^{(1)}\|
because 0 < L < 1. Recall that a sequence x^{(0)}, x^{(1)}, \ldots is a Cauchy sequence if for each ε > 0 there exists a positive integer n = n(ε) such that \|x^{(m)} - x^{(n)}\| < \varepsilon for all m > n. Given an ε > 0, let n be such that
L^{n} < \frac{\varepsilon(1 - L)}{\|x^{(0)} - x^{(1)}\|}.
Then,
\|x^{(m)} - x^{(n)}\| \le \frac{L^{n}}{1 - L}\,\|x^{(0)} - x^{(1)}\| < \varepsilon \quad \text{for all } m > n. (A.125)
Therefore, the sequence x^{(0)}, x^{(1)}, \ldots is a Cauchy sequence, and we will now show that the limit g of this sequence is the fixed point x^* of the map F. Note that F is uniformly continuous. Indeed, for each ε > 0 there exists a δ = δ(ε) such that if \|x - y\| < \delta, then \|F(x) - F(y)\| < \varepsilon. Since \|F(x) - F(y)\| \le L\|x - y\|, given an ε > 0, we can choose δ = ε/L. Because F is continuous, we have
\lim_{n \to \infty} F\big(x^{(n)}\big) = F\Big(\lim_{n \to \infty} x^{(n)}\Big) = F(g).
Using the above proof, we can give an estimate of the norm of x^{(n)} - x^*. Indeed, if we fix n in (A.125) and allow m \to \infty, then
\|x^{(n)} - x^*\| \le \frac{L^{n}}{1 - L}\,\|x^{(0)} - F(x^{(0)})\|. (A.126)
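A small numerical illustration of successive iteration and of the a priori bound (A.126). The map F(x) = cos(x) on [0, 1] is used as an example; it is a contraction there with Lipschitz constant L = sin(1) ≈ 0.84.

import numpy as np

L = np.sin(1.0)          # Lipschitz constant of cos on [0, 1]
x0 = 1.0
x = x0
history = [x]
for _ in range(30):      # successive iterations x^(j+1) = F(x^(j))
    x = np.cos(x)
    history.append(x)

x_star = x               # numerical fixed point: cos(x*) = x*
# A priori bound (A.126): |x^(n) - x*| <= L^n / (1 - L) * |x^(0) - F(x^(0))|
for n in (5, 10, 20):
    bound = L**n / (1.0 - L) * abs(x0 - np.cos(x0))
    error = abs(history[n] - x_star)
    print(n, error <= bound, error, bound)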
To prove this equivalence, let φ be a solution of the initial value problem (A.128). Then x(t0 ) = x0
and
φ̇(t) = f (t, φ(t)) for all t ∈ I.
Integrating the above from t_0 to t, we obtain
\int_{t_0}^{t} \dot{\phi}(s)\, ds = \int_{t_0}^{t} f(s, \phi(s))\, ds.
Hence,
\phi(t) - x_0 = \int_{t_0}^{t} f(s, \phi(s))\, ds.
Hence,
x(t) = e^{\int_{t_0}^{t} a(s)\, ds}\, x_0 + \int_{t_0}^{t} e^{\int_{\tau}^{t} a(s)\, ds}\, b(\tau)\, d\tau. (A.136)
Lemma A.5 Let v(t) and w(t) be continuous real functions of t. Let w(t) ≥ 0, and c
be a real constant. If
v(t) \le c + \int_{0}^{t} w(s) v(s)\, ds, (A.138)
then
v(t) \le c\, e^{\int_{0}^{t} w(s)\, ds}. (A.139)
Proof Denote the right-hand side of (A.138) by x(t); that is, let
x(t) = c + \int_{0}^{t} w(s) v(s)\, ds (A.140)
and let
z(t) = x(t) − v(t). (A.141)
Note that z(t) ≥ 0, and x(t) is differentiable. Differentiating both sides of (A.140) and
using (A.141) yields
ẋ(t) = w(t)v(t)
= w(t) x(t) − w(t) z(t). (A.142)
We use (A.137) to represent (A.142) in the equivalent integral form as
x(t) = e^{\int_{0}^{t} w(s)\, ds}\, x_0 - \int_{0}^{t} e^{\int_{\tau}^{t} w(s)\, ds}\, w(\tau)\, z(\tau)\, d\tau. (A.143)
Because z(t) ≥ 0 and w(t) ≥ 0, we have
x(t) \le e^{\int_{0}^{t} w(s)\, ds}\, x_0. (A.144)
From the definition of x(t) given by (A.140), we have x(0) = c. On the other hand,
v(t) = x(t) − z(t) ≤ x(t), (A.145)
because z(t) ≥ 0. Hence,
v(t) \le c\, e^{\int_{0}^{t} w(s)\, ds},
and the proof is completed.
We refer to the above equation as the equation of comparison. Denote the solution to the equation
of comparison by
x̃(t) = x̃(t; t0 , x̃ 0 ).
The following theorem, which can be found, for example, in Corduneanu [55, p. 49], relates the
solutions of the differential inequality and the associated differential equation.
Theorem A.20 If x(t) is a differentiable solution to the differential inequality and x̃(t)
is the solution to the associated differential equation with the corresponding initial
conditions such that
x0 ≤ x̃ 0 ,
then
x(t) ≤ x̃(t)
for all t ≥ t0 .
Proof We prove the theorem by contradiction. We suppose that x0 ≤ x̃ 0 but that
x(t) ≤ x̃(t) does not hold. Thus, there exists t1 > t0 such that
x(t1 ) > x̃(t1 ).
Let
y(t) = x(t) − x̃(t).
Note that y(t0 ) = x(t0 ) − x̃(t0 ) ≤ 0 and that y(t1 ) = x(t1 ) − x̃(t1 ) > 0. Denote by
τ the largest t ∈ [t0 , t1 ] such that y(τ ) = 0. Such a value of t exists by virtue of the
continuity of y. Thus,
y(t) > 0 for all t ∈ (τ, t1 ].
On the other hand, for t ∈ [τ, t1 ],
\dot{y}(t) = \dot{x}(t) - \dot{\tilde{x}}(t)
≤ ω(t, x(t)) − ω(t, x̃(t))
≤ |ω(t, x(t)) − ω(t, x̃(t))|
≤ L |x(t) − x̃(t)|.
In the interval [τ, t1 ],
y(t) = x(t) − x̃(t) ≥ 0.
Hence,
ẏ(t) ≤ L |x(t) − x̃(t)|
= L ( x(t) − x̃(t))
= L y(t).
Therefore, in the interval [τ, t1 ], the function y satisfies
y(t) ≤ y(τ )eL (t−τ ) = 0
because y(τ ) = 0. But this is a contradiction to the previous conclusion that y(t) > 0
for all t ∈ (τ, t1 ], which completes the proof.
A.15 Solving the State Equations
The solution to ẋ(t) = Ax(t) subject to the initial condition x(0) = x_0 is
x(t) = e^{At} x_0.
The above is a generalization of the situation when A is a scalar. The exponential matrix e At is
defined as
e^{At} = I_n + tA + \frac{t^2}{2!} A^2 + \frac{t^3}{3!} A^3 + \cdots.
Observe that
\frac{d}{dt} e^{At} = A e^{At} = e^{At} A.
Suppose now that the initial condition x(0) = x 0 is replaced with a more general condition
x(t0 ) = x 0 ,
where t0 is a given initial time instant. Then, the solution to ẋ(t) = Ax(t) subject to x(t0 ) = x 0 is
x(t) = e A(t−t0 ) x 0
As in the case of the uncontrolled system, we can also use the Laplace transform technique
to obtain the solution to (A.149) subject to the initial condition x(0) = x 0 . Indeed, taking the
Laplace transform of (A.149) and performing some manipulations yields
X(s) = (sI_n - A)^{-1} x(0) + (sI_n - A)^{-1} B U(s).
Hence,
x(t) = \mathcal{L}^{-1}\{(sI_n - A)^{-1} x(0)\} + \mathcal{L}^{-1}\{(sI_n - A)^{-1} B U(s)\}.
If x_0 = x(t_0), then
x(t) = \mathcal{L}^{-1}\{e^{-t_0 s}(sI_n - A)^{-1} x(t_0)\} + \mathcal{L}^{-1}\{e^{-t_0 s}(sI_n - A)^{-1} B U(s)\}.
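A short numerical illustration of the matrix exponential solution (a sketch using SciPy; the matrix A and the initial condition are arbitrary examples):

import math
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
x0 = np.array([1.0, 0.0])

def x(t):
    # Solution of x_dot(t) = A x(t), x(0) = x0:  x(t) = e^{At} x0
    return expm(A * t) @ x0

# Cross-check the truncated series definition e^{At} = I + tA + (tA)^2/2! + ...
t = 0.5
series = sum(np.linalg.matrix_power(A * t, k) / math.factorial(k) for k in range(20))
assert np.allclose(series, expm(A * t))

print(x(1.0))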
A.16 Curves and Surfaces
Consider a point moving in R^n. Its position at time t is described by the position vector
x(t) = [x_1(t) \;\; x_2(t) \;\; \cdots \;\; x_n(t)]^T. (A.152)
For example, if the point is moving along a straight line through the fixed point y in the direction of the fixed vector z in R^n, then
x(t) = y + t z. (A.153)
As another example, consider
x(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} \cos t \\ \sin t \end{bmatrix} \in \mathbb{R}^2. (A.154)
Then, the point moves around a circle of radius 1 in the counterclockwise direction, as shown in Figure A.4. Here t is the angle that specifies the position of the point on the circle. We are now ready for a formal definition of a curve.
Definition A.6 Let I ⊆ R be an interval. A parameterized curve defined on the interval I
is an association which to each point of I associates a vector, where x(t) denotes the vector
associated to t ∈ I by x. We write the association t → x(t) as
x : I → Rn .
We also call this association the parameterization of a curve. We call x(t) the position vector
at time t.
The position vector can be represented in terms of its coordinates as in (A.152), where each
component xi (t) is a function of time t. We say that this curve is differentiable if each function
xi (t) is a differentiable function of time t. We define the derivative d x(t)/dt to be
\dot{x}(t) = \frac{dx(t)}{dt} = \begin{bmatrix} \dfrac{dx_1(t)}{dt} \\ \dfrac{dx_2(t)}{dt} \\ \vdots \\ \dfrac{dx_n(t)}{dt} \end{bmatrix}
and call it the velocity vector of the curve at time t. The velocity vector is located at the origin.
However, when we translate it to the point x(t), as in Figure A.5, then we visualize it as tangent
to the curve x(t).
Figure A.5 The velocity vector of a curve at time t is parallel to a tangent vector at time t.
We define the tangent line to a curve x at time t to be the line passing through x(t) in the
direction of \dot{x}(t), provided that \dot{x}(t) \neq 0. If \dot{x}(t) = 0, we do not define a tangent line. The speed
of the curve x(t), denoted v(t), is v(t) = \|\dot{x}(t)\|. To illustrate the above, we consider the point
moving on the circle. The point position at time t is
x(t) = [cos t sin t]T .
The velocity vector of the curve at time t is
ẋ(t) = [−sin t cos t]T ,
and the speed of the curve x(t) at time t is
v(t) = \|\dot{x}(t)\| = \sqrt{(-\sin t)^2 + \cos^2 t} = 1.
The set of points in R3 such that for some F : R3 → R we have
F(x) = F(x1 , x2 , x3 ) = 0
and
D F(x_1, x_2, x_3) = \nabla F(x_1, x_2, x_3)^T = \left[ \frac{\partial F}{\partial x_1} \;\; \frac{\partial F}{\partial x_2} \;\; \frac{\partial F}{\partial x_3} \right] \neq 0^T
is called a surface in R^3, where \nabla F = \mathrm{grad}\, F. For example, let
F(x1 , x2 , x3 ) = x12 + x22 + x32 − 1;
then, the surface is the sphere of radius 1, centered at the origin of R3 .
Now let
C(t) = [x1 (t) x2 (t) ··· xn (t)]T
be a differentiable curve. We say that the curve lies on the surface described by
F1 (x1 , x2 , . . . , xn ) = 0,
..
.
Fm (x1 , x2 , . . . , xn ) = 0
if, for all t, we have
F1 (x1 (t), x2 (t), . . . , xn (t)) = 0,
..
.
Fm (x1 (t), x2 (t), . . . , xn (t)) = 0.
Figure: the curve C(t) lying on the surfaces F_1 = 0 and F_2 = 0, with the gradients grad F_1 and grad F_2 at the point x_0 = C(t_0).
for all x ∈ U. We now consider an inverse problem: When is a given vector field in Rn a
gradient?
Definition A.9 Let F be a vector field on an open set U. If f is a differentiable function on
U such that
F = ∇ f,
then we say that f is a potential function for F.
We can then ask the question: do potential functions exist? To answer the question, we recast the problem as one of solving n first-order partial differential equations,
\frac{\partial f}{\partial x_i} = F_i(x), \quad i = 1, 2, \ldots, n,
with the initial condition f(x_0) = f_0.
To answer the above question we need one more definition.
Definition A.10 An open set U is said to be connected if, given two points in U, there is a
differentiable curve in U that joins the two points.
If U = Rn , then any two points can be joined by a straight line. Of course, it is not always the
case that two points of an open set can be joined by a straight line, as illustrated in Figure A.8.
We say that the function F = [F1 F2 · · · Fn ]T is a C 1 function if all partial derivatives of all
functions Fi exist and are continuous.
Figure: a rectangle in the (x_1, x_2)-plane with corner point (x_{10}, x_{20}).
Let
-F_2(x_{10}, x_2) + \frac{\partial u(x_2)}{\partial x_2} = 0,
that is, \frac{\partial u(x_2)}{\partial x_2} = F_2(x_{10}, x_2). Then, D_2 f = F_2 as required. To conclude the proof, we let
u(x_2) = \int_{x_{20}}^{x_2} F_2(x_{10}, t)\, dt,
which yields
f(x_1, x_2) = \int_{x_{10}}^{x_1} F_1(s, x_2)\, ds + \int_{x_{20}}^{x_2} F_2(x_{10}, t)\, dt. (A.155)
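A small symbolic illustration of the construction (A.155) (a SymPy sketch; the vector field below is an arbitrary example satisfying the usual equality of mixed partials):

import sympy as sp

x1, x2, s, t = sp.symbols('x1 x2 s t')
x10, x20 = 0, 0                      # base point (x10, x20)

# Example vector field F = (F1, F2) with dF1/dx2 = dF2/dx1
F1 = 2*x1*x2 + sp.cos(x1)
F2 = x1**2 + 3*x2**2

# Potential function built as in (A.155)
f = sp.integrate(F1.subs(x1, s), (s, x10, x1)) + \
    sp.integrate(F2.subs({x1: x10, x2: t}), (t, x20, x2))

assert sp.simplify(sp.diff(f, x1) - F1) == 0
assert sp.simplify(sp.diff(f, x2) - F2) == 0
print(sp.simplify(f))                # equals x1**2*x2 + x2**3 + sin(x1)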
We note that if the connected region U is an arbitrary open set, then the above proof may not
work. This is because we needed to integrate over intervals that were contained in the region U.
Suppose we are given a vector field F on an open set U in the plane. We can interpret F as a
field of forces. We wish to calculate the work done when moving a particle from a point C(t_1) to a point C(t_2) along the curve C(t) in U, as illustrated in Figure A.10.
We assume that each component of F is differentiable; that is, F is a differentiable vector field.
We next assume that the curve C(t) is defined on a closed interval [t1 , t2 ] with t1 < t2 . For each
t ∈ [t1 , t2 ] the value C(t) is a point in R2 . The curve C lies in U if C(t) is a point in U for
all t ∈ [t1 , t2 ]. We further assume that the curve C is continuously differentiable and that its
derivative Ċ(t) = d C(t)/dt exists and is continuous; that is, we assume that C ∈ C 1 .
Figure A.10 A curve C(t) in U, with velocity \dot{C}(t), joining C(t_1) and C(t_2).
Theorem A.22 (Green) Let the vector field F be given on a region A that is the interior of a closed path C parameterized counterclockwise. Then,
\int_C F_1\, dx_1 + F_2\, dx_2 = \iint_A \left( \frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2} \right) dx_1\, dx_2.
\nabla(x^T A y) = A y, (A.159)
\nabla(y^T A x) = A^T y, (A.160)
Let f : R^n \to R^n and g : R^n \to R^n be two maps, and let \partial f/\partial x and \partial g/\partial x be the Jacobian matrices of f and g, respectively. Then,
\nabla(f(x)^T g(x)) = \nabla(g(x)^T f(x)) = \left( \frac{\partial f}{\partial x} \right)^T g + \left( \frac{\partial g}{\partial x} \right)^T f. (A.162)
Notes
For more on the subject of proof techniques, we refer to Velleman [288]. For further reading on
linear algebra, and in particular matrix theory, we recommend classical texts of Gel’fand [95]
and Gantmacher [93]. Analyses of numerical methods for matrix computations can be found in
Faddeeva [78], Householder [127], Golub and Van Loan [104], and Gill, Murray, and Wright [97].
Lang [175] and Seeley [255] are recommended for a review of calculus of several variables. A
straightforward approach to mathematical analysis is presented in Binmore [28]. Principles of
real analysis can be found in Natanson [215], Bartle [22], and Rudin [244]. Some facts from
basic functional analysis that we touched upon in this appendix are discussed, for example, by
Maddox [193] and by Debnath and Mikusiński [58]. Lucid expositions of differential equations
are in Driver [69], Miller and Michel [205], and Boyce and DiPrima [32].
G. H. Hardy (1877–1947) argues in reference 114 (p. 113) that in truly great theorems
and their proofs, “there is a very high degree of unexpectedness, combined with inevitability
and economy. The arguments take so odd and surprising a form; the weapons used seem so
childishly simple when compared with the far-reaching results; but there is no escape from
the conclusions.” He further states that “A mathematical proof should resemble a simple and
clear-cut constellation, not a scattered cluster in the Milky Way.”
EXERCISES
A.1 The open unit disc with respect to the norm \|\cdot\|_p is the set
\Omega_p = \{x \in \mathbb{R}^2 : \|x\|_p < 1\}.
The unit discs are quite different for different choices of the norm \|\cdot\|_p. Sketch \Omega_1, \Omega_2, \Omega_3, and \Omega_\infty.
A.2 Given a linear combination of matrices,
M = \alpha_1 M_1 + \cdots + \alpha_r M_r, (A.164)
where for i = 1, \ldots, r, \alpha_i \in \mathbb{R} and M_i \in \mathbb{R}^{n \times n}. Let m_i^j be the jth row of the matrix M_i. Thus,
M_i = \begin{bmatrix} m_i^1 \\ m_i^2 \\ \vdots \\ m_i^n \end{bmatrix}.
Show that
\det M = \sum_{s_1 = 1}^{r} \cdots \sum_{s_n = 1}^{r} \alpha_{s_1} \cdots \alpha_{s_n} \det \begin{bmatrix} m_{s_1}^1 \\ \vdots \\ m_{s_n}^n \end{bmatrix}. (A.165)
A.3 Let X ∈ Rn×m and Y ∈ Rm×n . Show that
det(I n + XY ) = det(I m + Y X).
A.4 Let λi ( A) denote the ith eigenvalue of a matrix A ∈ Rn×n and let α be a scalar. Show
that
(a) λi (α A) = αλi ( A);
(b) λi (I n ± α A) = 1 ± αλi ( A).
A.5 Show that the formulas (A.75) and (A.76) for matrix norms are equivalent.
A.6 Show that for any square matrix A ∈ Rn×n , we have
\|A\| \ge \max_{1 \le i \le n} |\lambda_i(A)|. (A.166)
is
\|A\| < 1.
A.9 Let A ∈ \mathbb{R}^{n \times n}. Show that a necessary and sufficient condition for
\lim_{k \to \infty} A^k = O
is
|\lambda_i(A)| < 1, \quad i = 1, 2, \ldots, n.
A.10 Let eig(M) denote the set of eigenvalues of the square matrix M and let be the open
unit disk in the complex plane. Prove that if eig(M) ⊂ ∪ {1}, then limk→∞ M k exists
if and only if the dimension of the eigenspace corresponding to the eigenvalue 1 is
equal to the multiplicity of the eigenvalue 1; that is, the geometric multiplicity of the
eigenvalue 1 is equal to its algebraic multiplicity.
A.11
(a) Show that the series
\sum_{n=1}^{\infty} \frac{1}{n}
diverges to +∞.
(b) Show that if α is a rational number such that α > 1, then the series
\sum_{n=1}^{\infty} \frac{1}{n^{\alpha}}
converges.
(c) Use MATLAB's Symbolic Math Toolbox to compute
\sum_{n=1}^{\infty} \frac{1}{n^2}.
A.12
(a) Let 0 < x < 1. Show that
ln(1 − x) ≤ −x.
(b) Let 0 < x ≤ 1/2. Show that
ln(1 − x) ≥ −2x.
A.13 Let 0 < a_k < 1 for k ≥ 0. Use a proof by contraposition, Theorem A.16, and Exercise A.12 to show that
\prod_{k=0}^{\infty} (1 - a_k) = 0 \quad \text{if and only if} \quad \sum_{k=0}^{\infty} a_k = \infty.
A.17 For
A = \begin{bmatrix} -2 & -6 \\ 2 & -10 \end{bmatrix},
find constants M > 0 and α > 0 such that
\|e^{At}\|_2 \le M e^{-\alpha t}, \quad t \ge 0.
A.18 Show that the function space C([a, b]) of real-valued functions, f : R → R, with the
norm defined by (A.99) is a normed linear space. Show that this space is a Banach
space.
A.19 Derive a sufficient condition for the convergence of the iteration process
x^{(k+1)} = A x^{(k)} + b, \quad k = 0, 1, \ldots,
for any initial x^{(0)}, where A ∈ \mathbb{R}^{n \times n} and b ∈ \mathbb{R}^n.