Laa 2024
Chapter 1. INTRODUCTION
Chapter 2. VECTOR SPACES
Chapter 3. LINEAR TRANSFORMS AND MATRICES
Chapter 4. INNER-PRODUCT AND NORMED SPACES
Chapter 8. DIAGONALIZATION
Chapter 9. EIGEN DECOMPOSITION
Chapter 10. SINGULAR VALUE DECOMPOSITION
Chapter 11. POSITIVE MATRICES
CHAPTER 1
INTRODUCTION
We discuss some concrete problems that can be solved using linear algebra.
(1) Linear regression. Given a matrix A and a vector b, find vector x that
minimizes
‖Ax − b‖₂² = ∑_{i=1}^{m} ( ∑_j Aij xj − bi )².
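For concreteness, here is a minimal NumPy sketch of this problem on synthetic data (the matrix A and vector b below are made-up examples, not taken from the notes); the minimizer is computed with a standard least-squares solver and checked against the normal equations A⊤Ax = A⊤b.

```python
import numpy as np

# Synthetic example: 50 equations in 3 unknowns (made-up data).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))
b = rng.standard_normal(50)

# x minimizing ||Ax - b||_2^2, via a stable least-squares solver.
x, residual, rank, svals = np.linalg.lstsq(A, b, rcond=None)

# The minimizer satisfies the normal equations A^T A x = A^T b.
print(np.allclose(A.T @ A @ x, A.T @ b))
```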
CHAPTER 2
VECTOR SPACES
and
L2 = { x ∈ S : |x1|² + |x2|² + · · · < ∞ }.
One direction is obvious. In the other direction, all we need to check is that 0 ∈ V
also belongs to U and that for all u ∈ U, we have −u ∈ U; the rest of the vector
space axioms hold automatically.
E XAMPLE 2. In the examples below, V is a vector space and U ⊂ V is a subspace.
is a subspace of Rn .
E XERCISE 8. Let V be a vector space over F. Let X be a nonempty subset of V with the
property that
∀ u, v ∈ X, t ∈ F : tu + (1 − t)v ∈ X.
Prove that there exists an x0 ∈ X and a subspace U ⊆ V such that X = x0 + U.
We now describe two commonly used operations for generating new subspaces
from existing ones.
PROPOSITION 2. Let (Uα)α∈I be a collection of subspaces of V, where I is some index set.
Then ⋂α∈I Uα is a subspace of V.
Unfortunately, the union of two subspaces need not be a subspace. For example, if
V = R2 , U1 = { x ∈ V : x1 = 0}, and U2 = { x ∈ V : x2 = 0}, then U1 ∪ U2 is a
cross-shaped object which is not closed under addition.
D EFINITION 3. Let U1 , . . . , Um be subspaces of V. Their sum is defined as
U1 + · · · + Um = { u1 + · · · + um : u1 ∈ U1 , . . . , um ∈ Um }.
E XAMPLE 3. Consider the example above: U1 = { x ∈ V : x1 = 0} and U2 = { x ∈ V :
x2 = 0}. Then U1 + U2 = R2 .
Next, we look at the concepts of span, independence, and basis. All of these rest on
the idea of a “linear combination,” which in turn is built on the two operational
pillars of a vector space.
DEFINITION 5. Let v1 , . . . , vm ∈ V. We say that v ∈ V is a linear combination of
v1 , . . . , vm if there exist a1 , . . . , am ∈ F such that v = a1 v1 + · · · + am vm . The set of all
possible linear combinations is called the span of v1 , . . . , vm :
span(v1 , . . . , vm ) = { a1 v1 + · · · + am vm : a1 , . . . , am ∈ F }.
E XAMPLE 4.
(1) Any v ∈ R2 is in the span of v1 = (1, 0) and v2 = (0, 1).
(2) The vector v = (1, 1, 1) is not in the span of v1 = (1, 0, 0) and v2 = (0, 1, 0).
EXERCISE 11. Let v1 = (1, −1) and v2 = (1, 1). Show that span(v1 , v2 ) = R2 .
E XERCISE 12. Show that U = span(v1 , . . . , vm ) is a subspace of V. Moreover, argue why
U is the smallest subspace containing v1 , . . . , vm .
We now define the core concept for this course, the notion of finite-dimensional
vector spaces.
DEFINITION 6. A vector space V is finite-dimensional if there exist n ≥ 1 and v1 , . . . , vn ∈
V such that V = span(v1 , . . . , vn ). A vector space that is not finite-dimensional is called
infinite-dimensional.
Note that we have not defined “dimension” at this point. Moreover, n in the above
definition is not necessarily the dimension of the vector space.
E XAMPLE 5.
(1) Fm is finite-dimensional.
(2) The space Fm×n of (m × n) F-valued matrices is finite-dimensional.
(3) The space of polynomials of degree at most k is finite-dimensional.
E XERCISE 13. Prove that every subspace of a finite-dimensional vector space is finite-
dimensional.
E XERCISE 14. Verify that R is a vector space over R. Also, verify that R is a vector space
over Q. Is R finite-dimensional over R? Is R finite-dimensional over Q?
E XERCISE 15. Show that the following spaces are infinite-dimensional.
(1) The space of polynomials with complex coefficients.
(2) The space of continuous functions f : R → R.
10 2. VECTOR SPACES
E XERCISE 16. Let V be a vector space. Show that the following are equivalent:
(1) V is infinite-dimensional.
(2) For all k ≥ 1, there exist v1 , . . . , vk ∈ V that are linearly independent.
(3) For all k > 1 and v1 , . . . , vk ∈ V, there exists v ∈ V that does not belong to
span(v1 , . . . , vk ).
D EFINITION 7. We say that v1 , . . . , vm ∈ V are linearly independent if
a1 v1 + . . . + am vm = 0 ⇒ a1 = . . . = am = 0.
In other words, there do not exist a1 , . . . , am , not all zero, such that
a1 v1 + · · · + am vm = 0.
If the vectors are not linearly independent, we say they are linearly dependent.
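Numerically, linear independence can be checked by stacking the vectors as columns and computing the rank of the resulting matrix; a small sketch (an illustration, not part of the notes):

```python
import numpy as np

def are_independent(*vectors):
    """The vectors are linearly independent exactly when the matrix having
    them as columns has full column rank (checked up to numerical tolerance)."""
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == M.shape[1]

print(are_independent([1.0, -1.0], [1.0, 1.0]))    # True
print(are_independent([1.0, -1.0], [-1.0, 1.0]))   # False: the second is (-1) times the first
```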
E XERCISE 17.
(1) Show that v1 = (1, −1) and v2 = (1, 1) are linearly independent in R2 .
(2) In Fn , define vectors e1 , . . . , en by
∀ i, j = 1, . . . , n : ei ( j) = 1 if j = i, and ei ( j) = 0 if j ≠ i.
Show that e1 , . . . , en are linearly independent. We call this the standard basis.
(3) Show that the system of functions e1 , e2 , . . . : [0, 1] → C defined as
ek (t) = exp(ι2πkt) (t ∈ [0, 1]),
are linearly independent in the space of complex-valued functions on [0, 1].
E XERCISE 18. If v1 , . . . , vm are linearly independent, then show that
span(v1 , . . . , vm ) = span(v1 ) ⊕ · · · ⊕ span(vm ).
Thus, a clear relationship exists between the cardinality of spanning vectors and
linearly independent vectors. As a consequence of the above observation, we can
conclude the following.
C OROLLARY 1. Any (n + 1) vectors in Fn must be linearly dependent and any (n − 1)
vectors cannot span Fn .
Later, we will see that every finite-dimensional vector space has a special basis
known as an orthonormal basis, which is easier to work with.
We next give a useful characterization of a basis (this is the definition of a basis in
some books).
EXERCISE 20. Show that a set of vectors {v1 , . . . , vn } is a basis of V if and only if, for all
v ∈ V, there exist unique a1 , . . . , an ∈ F such that v = a1 v1 + · · · + an vn .
D EFINITION 9. For a finite-dimensional vector space V, we define its dimension to be the
cardinality of a basis of V. We denote this by dim V.
The trivial space V = {0} does not admit a basis; its dimension is defined as 0.
The following result confirms that the notion of “dimension” does not depend on
the choice of basis; it is an intrinsic quantity for a vector space.
P ROPOSITION 4. Let V be a finite-dimensional vector space. Then any two bases of V
have the same cardinality.
EXERCISE 21. Let dim V = n. Show that
(1) Any n linearly independent vectors form a basis of V.
(2) Any n vectors that span V form a basis of V.
E XERCISE 22. Let V be a finite-dimensional space and U be a subspace of V. Then show
that dim U ≤ dim V.
E XERCISE 23. Find dim H where H is the subspace in Exercise 7.
E XERCISE 24. Let U1 and U2 be two subspaces of a vector space V. If dim U1 +
dim U2 > dim V, then prove that there must exist a nonzero vector in U1 ∩ U2 .
EXERCISE 25. Let V be a vector space over R. A subspace U ⊆ V is proper if U ≠ V.
Prove that if dim V ≥ 1, then V cannot be written as a finite union of proper subspaces.
We now look at the central objects of study in linear algebra — linear transforms
and matrices. We recall that the underlying field F is R or C.
D EFINITION 10. Let V and W be two vector spaces over F. A linear transform from V to
W is a function T : V → W with the following properties:
∀ u, v ∈ V : T (u + v) = T (u) + T (v).
∀ u ∈ V, a ∈ F : T ( au) = aT (u).
R EMARK 3.
(1) Linear transforms are also called linear maps or linear operators. In this course,
we will reserve the term “operator” for the case where V = W.
(2) We will use L(V, W) to denote the set of all possible linear transforms from V to
W. In particular, we will use the shorthand L(V) = L(V, V).
(3) A linear transform T : V → W maps 0V to 0W .
(4) We will use 0 and Id to denote the zero and identity maps:
∀v ∈ V : 0(v) = 0 and Id(v) = v.
E XAMPLE 6.
The set L(V, W) comes with a natural vector space structure. Namely, if T, T′ ∈
L(V, W) and a ∈ F, then we can define T + T′ : V → W and aT : V → W as
follows:
∀v ∈ V : ( T + T′ )(v) = T (v) + T′ (v) and ( aT )(v) = aT (v).
EXERCISE 30. Verify that T + T′ and aT are linear transforms and that L(V, W) is a
vector space over F. In particular, prove that dim L(V, W) = dim V · dim W.
What is the relation between the different matrix representations in Exercise 38?
D EFINITION 17. We say that M, N ∈ Fn×n are similar, denoted M ∼ N, if there exists
an invertible matrix P ∈ Fn×n such that M = PNP−1 .
E XERCISE 39. Let M and N be matrix representations of a linear operator corresponding
to two different bases. Prove that M ∼ N.
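As a quick numerical illustration of Exercise 39 (a made-up example, not from the notes): similar matrices share their similarity invariants, e.g. trace and eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
N = rng.standard_normal((4, 4))       # a matrix representation in one basis
P = rng.standard_normal((4, 4))       # a change-of-basis matrix (invertible with high probability)
M = P @ N @ np.linalg.inv(P)          # M ~ N

# Similar matrices represent the same operator, so these quantities agree
# (up to round-off error).
print(np.isclose(np.trace(M), np.trace(N)))
print(np.allclose(np.sort_complex(np.linalg.eigvals(M)),
                  np.sort_complex(np.linalg.eigvals(N))))
```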
In Problem (14), we show that the space of matrices Fn×n can be partitioned into
equivalence classes, where each equivalence class [ M] = { N : N ∼ M} consists of
matrix representations of the linear transform represented by M.
CHAPTER 4
INNER-PRODUCT AND NORMED SPACES
We first recall the concepts of dot product and Euclidean distance on Rn and Cn .
The dot product is defined as
(2) x · y = x1 y1 + · · · + xn yn ( x, y ∈ Rn ),
and
(3) x · y = x̄1 y1 + · · · + x̄n yn ( x, y ∈ Cn ),
where z̄ denotes the conjugate of z ∈ C. The length of x ∈ Rn is given by
(4) ‖x‖ = √(x · x) = ( x1² + · · · + xn² )^{1/2},
and that of z ∈ Cn is
(5) ‖z‖ = √(z · z) = ( |z1|² + · · · + |zn|² )^{1/2}.
E XERCISE 42. Let V = Rn and fix w1 , . . . , wn > 0. Verify that the following is an inner
product on V:
∀ x, y ∈ V : h x, yi = w1 x1 y1 + · · · + wn xn yn .
Thus, Rn and Cn are examples of inner-product spaces. In fact, we can make the
following general statement.
T HEOREM 4. Any finite-dimensional vector space is an inner-product space.
We now turn to normed spaces. In this case, we do not need to distinguish between
real and complex spaces.
DEFINITION 21. Let V be a real or complex vector space. A norm on V is a map
‖·‖ : V → R such that
(1) ‖x‖ ≥ 0 for all x ∈ V, and ‖x‖ = 0 if and only if x = 0.
(2) ∀ x ∈ V, a ∈ F : ‖ax‖ = |a| ‖x‖.
(3) ∀ x, y ∈ V : ‖x + y‖ ≤ ‖x‖ + ‖y‖.
R EMARK 9.
(1) Property 1 is positive definiteness, 2 is homogeneity, and 3 is the triangle inequality.
(2) The norm of a vector is a real number, even for a complex vector space.
E XERCISE 44. Verify that (4) and (5) are valid norms on Rn and Cn .
D EFINITION 22. A vector space with a norm is called a normed space.
Thus, Rn and Cn are both inner product and normed spaces. Note that the respec-
tive norms (4) and (5) are derived from the respective dot products. In fact, any
inner product naturally induces a norm as follows.
EXERCISE 45. Let (V, ⟨·, ·⟩) be an inner product space. Define
∀x ∈ V : ‖x‖ = ⟨x, x⟩^{1/2} .
Verify that ‖·‖ is a norm. We call ‖·‖ the norm induced by the inner product ⟨·, ·⟩.
While every inner product space is automatically a normed space, the converse
is not true, i.e., not all norms are induced by an inner product, as shown in the
following example.
EXERCISE 46. Let V = R2 and ‖x‖ = |x1| + |x2|. Verify that this is a norm. Explain
why there does not exist an inner product ⟨·, ·⟩ on V such that ‖x‖ = ⟨x, x⟩^{1/2} .
We can now state the Pythagoras theorem and the parallelogram law.
EXERCISE 47. Show that if x ⊥ y, then ‖x + y‖² = ‖x‖² + ‖y‖².
EXERCISE 48. Show that ‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²).
R EMARK 10. The proof of the parallelogram law depends crucially on the fact that the
norm in question is derived from an inner product. In fact, the parallelogram law need not
hold in general normed spaces.
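As a concrete check of Exercise 46 and Remark 10, the ℓ1 norm on R2 violates the parallelogram law, so it cannot be induced by any inner product (a small sketch; the vectors are arbitrary examples):

```python
import numpy as np

def l1(v):
    return np.sum(np.abs(v))

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

lhs = l1(x + y)**2 + l1(x - y)**2    # = 2^2 + 2^2 = 8
rhs = 2 * (l1(x)**2 + l1(y)**2)      # = 2 * (1 + 1) = 4
print(lhs, rhs, lhs == rhs)          # the parallelogram law fails for the l1 norm
```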
DEFINITION 24. A list of vectors q1 , . . . , qn in V is said to be an orthonormal system if
‖qi‖ = 1 for all i and qi ⊥ qj for all i ≠ j.
E XAMPLE 9.
(1) The vectors in Exercise 17 form an orthonormal system w.r.t. the dot product.
(2) The functions in Exercise 17 are orthonormal w.r.t. the following inner product:
⟨ f , g⟩ = ∫₀¹ f̄(t) g(t) dt.
D EFINITION 25. A basis in which the vectors form an orthonormal system is called an
orthonormal basis (ONB).
E XERCISE 50. Show that the orthonormal system in Exercise 17 is an ONB of Rn .
The above ONB is w.r.t. the special inner product on Rn , namely, the dot product.
Can we come up with an ONB if we work with a different inner product? What
about an abstract inner product?
T HEOREM 7. Any finite-dimensional inner product space has an ONB.
We call the latter Parseval’s formula; this gives us an easy means of computing the
norm of a vector from its coefficients.
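A minimal sketch of how such an ONB can be produced in practice: the Gram–Schmidt procedure (the standard constructive route to Theorem 7, though the notes do not spell it out here), applied to the weighted inner product of Exercise 42 with made-up weights; Parseval's formula is then checked numerically.

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0])              # example weights as in Exercise 42

def ip(x, y):
    return np.sum(w * x * y)               # <x, y> = w1*x1*y1 + ... + wn*xn*yn

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list w.r.t. the inner product ip."""
    basis = []
    for v in vectors:
        u = v - sum(ip(q, v) * q for q in basis)   # subtract components along previous q's
        basis.append(u / np.sqrt(ip(u, u)))        # normalize
    return basis

Q = gram_schmidt([np.array([1.0, 0.0, 0.0]),
                  np.array([1.0, 1.0, 0.0]),
                  np.array([1.0, 1.0, 1.0])])

# Parseval: the squared norm equals the sum of squared coefficients in the ONB.
v = np.array([0.5, -1.0, 2.0])
coeffs = np.array([ip(q, v) for q in Q])
print(np.isclose(ip(v, v), np.sum(coeffs**2)))     # True
```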
Having discussed inner product spaces, we move to linear operators on (and
between) such spaces. A fundamental result in this regard is the characterization of
the dual space V′ = L(V, F) — the space of linear functionals on V.
T HEOREM 8. Let V be a finite-dimensional inner product space. Then
∀ ℓ ∈ V′ , ∃ v ∈ V : ℓ( x) = ⟨v, x⟩ ( x ∈ V).
That is, every linear functional on V is given by an inner product.
EXERCISE 53. If V is finite-dimensional, then show that V′ ≅ V.
(1) P2 = P.
(2) R( P) = U and N ( P) = U⊥ .
(3) P + PU⊥ = Id.
(4) ‖Px‖ ≤ ‖x‖ for all x ∈ V.
(5) For all x ∈ V,
P( x) = argmin_{u∈U} ‖u − x‖.
In other words, P( x) is the (unique) point in U that is “closest” to x.
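A small numerical sketch of these properties (the subspace U below is a made-up example): the projector onto U is built as P = QQ⊤, where the columns of Q are an orthonormal basis of U obtained from a QR factorization.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 2))        # U = span of the columns of B, a 2-dim subspace of R^5
Q, _ = np.linalg.qr(B)                 # reduced QR: columns of Q form an ONB of U
P = Q @ Q.T                            # orthogonal projector onto U

x = rng.standard_normal(5)
print(np.allclose(P @ P, P))                          # (1) P^2 = P
print(np.linalg.norm(P @ x) <= np.linalg.norm(x))     # (4) ||Px|| <= ||x||
print(np.allclose(B.T @ (x - P @ x), 0))              # x - Px is orthogonal to U
```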
E XERCISE 58. If P is an orthogonal projection, then show that
∀ x, y ∈ V : h Px, yi = h x, Pyi.
We call such operators self-adjoint. But first, we must define the “adjoint” of an
operator; the definition is motivated by the following results.
EXERCISE 59. For A ∈ Rm×n , define its transpose A> ∈ Rn×m to be (A>)ij = Aji for
all 1 ≤ i ≤ n, 1 ≤ j ≤ m. Show that
∀ x ∈ Rn , y ∈ Rm : ⟨Ax, y⟩ = ⟨x, A> y⟩,
where ⟨·, ·⟩ is the standard dot product.
On the other hand, for A ∈ Cm×n define its Hermitian transpose AH ∈ Cn×m to be
(AH)ij = Āji for all 1 ≤ i ≤ n, 1 ≤ j ≤ m. Show that
∀ x ∈ Cn , y ∈ Cm : ⟨Ax, y⟩ = ⟨x, AH y⟩,
We can identify the following four “fundamental subspaces” associated with any
T ∈ L(V, W):
N ( T ), R( T ∗ ) ⊆ V and N ( T ∗ ), R( T ) ⊆ W.
There exists a particular relation between these subspaces.
T HEOREM 10. Let T ∈ L(V, W). Then
(1) N ( T ∗ T ) = N ( T ) and N ( TT ∗ ) = N ( T ∗ ).
(2) N ( T )⊥ = R( T ∗ ) and N ( T ∗ )⊥ = R( T ).
(3) V = N ( T ) ⊕ R( T ∗ ) and W = N ( T ∗ ) ⊕ R( T ).
We end this part by discussing some important classes of operators that will be
studied in greater detail later.
D EFINITION 28. An operator is self-adjoint if T ∗ = T, i.e., if
∀ x, y ∈ V : h Tx, yi = h x, Tyi.
In particular, a real self-adjoint matrix (A> = A) is called symmetric, and a complex
self-adjoint matrix (AH = A) is called Hermitian.
R EMARK 12. The inner product is essential in deciding if an operator is self-adjoint. An
operator that is self-adjoint w.r.t. one inner product need not be self-adjoint w.r.t. a different
inner product.
E XERCISE 64. Let P ∈ L(V) be such that P2 = P. Prove P is an orthogonal projection if
and only if P is self-adjoint.
What is special about a self-adjoint operator? We will later see that such operators
can always be diagonalized in an ONB. In fact, they form a subclass of a larger set
of diagonalizable operators.
DEFINITION 29. An operator T ∈ L(V) is said to be normal if it commutes with its
adjoint: T ∗ T = TT ∗ .
Similar to vectors, we can define inner products and norms for matrices. After all,
the space of matrices (of a fixed shape) is finite-dimensional, and we know that
such a space is automatically an inner product space.
We first look at the concept of operator norm, which can even be defined for
operators on infinite-dimensional spaces.
D EFINITION 31. Let (V, k·kV ) and (W, k·kW ) be two normed spaces. The operator (or
induced) norm of T ∈ L(V, W) is defined as
(8) ‖T‖ = max { ‖Tx‖W : x ∈ V, ‖x‖V = 1 }.
E XERCISE 70. Verify that (8) is a valid norm on L(V, W). Moreover, show that
(9) ∀ T, S ∈ L(V) : ‖TS‖ ≤ ‖T‖ ‖S‖.
R EMARK 13.
(1) The sup in (8) is finite if V is finite-dimensional but need not be finite if V is
infinite-dimensional; if the sup is finite, we call T a bounded operator.
(2) An important special case is when T is a matrix A and the norms are Euclidean.
In this case, we call (8) the spectral norm and denote it by ‖A‖2 . We will later
see that this corresponds to the largest singular value of A.
Generally, a “matrix norm” is a standard norm on the space of matrices with the
additional submultiplicative property (9). We record this for completeness.
D EFINITION 32. A map k·k : Cn×n → R is called a matrix norm if it satisfies the
following properties.
We already know from Exercise 70 that the spectral norm is a matrix norm. Another
important example is the Frobenius norm.
D EFINITION 33. The Frobenius norm of A ∈ Cm×n is defined as
(10) ‖A‖F = ( ∑_{i=1}^{m} ∑_{j=1}^{n} |aij|² )^{1/2} .
E XERCISE 71. Verify that (10) is a matrix norm in the sense of Definition 32 but is not an
induced norm in the sense of (8).
E XERCISE 72. Let A ∈ Cm×n be of rank r. Prove that
‖A‖2 ≤ ‖A‖F ≤ √r ‖A‖2 .
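These inequalities are easy to check numerically; a small sketch on a made-up matrix of rank at most 4 (not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 8))   # rank at most 4

spec = np.linalg.norm(A, 2)        # spectral norm = largest singular value
frob = np.linalg.norm(A, 'fro')    # Frobenius norm
r = np.linalg.matrix_rank(A)

print(spec <= frob <= np.sqrt(r) * spec)   # the chain of inequalities in Exercise 72
```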
Verify that these are norms on Cm×n and that they are induced by norms on Cm and Cn .
The following result tells us that the set of invertible matrices in Cn×n form an open
set.
E XERCISE 75. Let A ∈ Cn×n be invertible and k · k be a norm on Cn×n . Show (without
using determinants) that there exists δ > 0 such that B is invertible if k B − Ak < δ.
We will later need the following result to establish the Cayley-Hamilton theorem.
E XERCISE 76. Let p be a polynomial with complex coefficients and k · k be a norm on Cn×n .
Let A ∈ Cn×n and ( An ) be a sequence of matrices in Cn×n such that k An − Ak → 0.
Prove that
lim_{n→∞} ‖p( An ) − p( A)‖ = 0.
CHAPTER 7
In this part, we will talk about the trace and determinant of a matrix. These are
useful for simplifying calculations involving matrices.
Unlike the previous chapters, we will depart from generic linear transforms and
deal exclusively with matrices. In particular, we will use determinants to character-
ize invertible matrices and find eigenvalues of matrices. Unless otherwise stated,
the underlying field F is R or C.
DEFINITION 34. The trace of A ∈ Fn×n is tr( A) = A11 + · · · + Ann .
We can view trace as a linear functional on the space of matrices, i.e., tr ∈ L(Fn×n , F).
In particular, we have the following properties.
E XERCISE 77. Show that
(1) tr( AB) = tr( BA) for all A ∈ Fm×n and B ∈ Fn×m .
(2) k Ak2F = tr( AH A) for all A ∈ Fm×n .
(3) kuk2 = tr(uu> ) for all u ∈ Rn .
(4) kuk2 = tr(uuH ) for all u ∈ Cn .
(5) A ∼ B ⇒ tr( A) = tr( B) for all A, B ∈ Fn×n .
EXERCISE 78. Let A and B be two matrix representations of an operator T. Show that
tr( A) = tr( B). Thus, we can define the trace of T as tr( A), where A = [ T ].
E XAMPLE 11. (2, 1) and (3, 2, 1) are transpositions but (2, 3, 1) and (3, 1, 2) are not
transpositions.
E XERCISE 80. Compute |Tn | and compare with |Πn |.
There are different ways to express a permutation using transpositions, e.g., apply-
ing the same transposition an even number of times does not make any difference.
What is invariant is the parity of the number of transpositions in the decomposition.
EXERCISE 82. Let σ ∈ Πn . Suppose there exist τ1 , . . . , τk ∈ Tn and ρ1 , . . . , ρℓ ∈ Tn
such that
σ = τ1 ◦ · · · ◦ τk = ρ1 ◦ · · · ◦ ρℓ .
Show that both k and ℓ are even or both are odd. Thus, every σ ∈ Πn has even parity (even
permutation) or odd parity (odd permutation).
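For concreteness, here is a small helper (a hypothetical illustration, not part of the notes) that computes the parity of a permutation given in one-line notation, using the fact that a cycle of length L factors into L − 1 transpositions.

```python
def parity(sigma):
    """Parity of a permutation of {1, ..., n} in one-line notation,
    e.g. (2, 3, 1) means 1 -> 2, 2 -> 3, 3 -> 1.
    Returns 0 for an even permutation and 1 for an odd one."""
    n = len(sigma)
    seen = [False] * n
    transpositions = 0
    for i in range(n):
        if not seen[i]:
            # Traverse the cycle through i; a cycle of length L
            # contributes L - 1 transpositions.
            j, length = i, 0
            while not seen[j]:
                seen[j] = True
                j = sigma[j] - 1
                length += 1
            transpositions += length - 1
    return transpositions % 2

print(parity((2, 1, 3)))   # a single transposition: odd (1)
print(parity((2, 3, 1)))   # a 3-cycle = two transpositions: even (0)
```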
CHAPTER 8
DIAGONALIZATION
In this chapter, we will work with a finite-dimensional vector space V over the field
C. We will use PC to denote the space of polynomials with complex coefficients.
We wish to understand the conditions for an operator to be “diagonalizable”.
D EFINITION 39. An operator T ∈ L(V) is said to be diagonalizable if there exists a basis
v1 , . . . , vn of V and λ1 , . . . , λn ∈ C such that
∀ 1 ≤ i ≤ n : Tvi = λi vi .
REMARK 14. The matrix representation [ T ] with respect to v1 , . . . , vn is the diagonal
matrix A = diag(λ1 , . . . , λn ).
In other words, a diagonalizable matrix is similar to a diagonal matrix.
(1) For the zero operator, every nonzero vector is an eigenvector, and 0 is the only
eigenvalue.
(2) For the identity operator, every nonzero vector is an eigenvector, and 1 is the only
eigenvalue.
(3) The eigenvalues of an (n × n) diagonal matrix are the diagonal elements, and
eigenvectors are the columns of the identity matrix In .
E XERCISE 86. Let T be such that T k = 0 for some k > 2. Find the eigenvalues of T.
The following relates eigenvalues with determinants. Recall that a square matrix is
not invertible if and only if det( A) = 0.
E XERCISE 87. Let T ∈ L(V) and λ ∈ C. Show that λ is an eigenvalue of T if and only if
T − λId is not injective. In particular, if A = [ T ] ∈ Cn×n , then show that λ ∈ C is an
eigenvalue of T if and only if A − λIn is not invertible, or equivalently, det( A − λIn ) = 0.
Using the permutation-based formula for determinants, we can deduce the follow-
ing.
E XERCISE 88. Let A ∈ Cn×n . Show that p(t) = det( A − tIn ) is a polynomial of the
form p(t) = (−1)n tn + a1 tn−1 + · · · + an .
We will later need the following result called the Cayley-Hamilton theorem. This
requires the concept of a matrix polynomial. For any polynomial q(t) = a0 + a1 t +
· · · + am tm , ai ∈ C, and A ∈ Cn×n , we define the matrix q( A) ∈ Cn×n as follows:
q( A) = a0 In + a1 A + · · · + am Am .
T HEOREM 11. Let A ∈ Cn×n . Then p A ( A) = 0.
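Theorem 11 is easy to test numerically; the sketch below (an illustration, not part of the notes) obtains the coefficients of the characteristic polynomial with np.poly, which works with det(tI − A) rather than det(A − tI), a harmless sign difference, and evaluates the matrix polynomial as defined above.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
n = A.shape[0]

# Coefficients of det(tI - A), highest degree first (monic, length n + 1).
c = np.poly(A)

# Evaluate the polynomial at the matrix A: c[0] A^n + c[1] A^(n-1) + ... + c[n] I.
pA = sum(c[k] * np.linalg.matrix_power(A, n - k) for k in range(n + 1))

print(np.allclose(pA, np.zeros_like(A)))   # Cayley-Hamilton: p_A(A) = 0
```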
E XERCISE 90. Let A ∈ Cn×n and q ∈ PC . Show that if λ is an eigenvalue of A, then
q(λ ) is an eigenvalue of q( A). Conversely, if µ is an eigenvalue of q( A), then show that
µ = q(λ ) for some eigenvalue λ of A.
Use the above result to establish the Cayley-Hamilton theorem for diagonalizable
matrices.
E XERCISE 91. Establish Theorem 11 for a diagonalizable matrix.
That an operator T ∈ L(V) can have at most n = dim(V) distinct eigenvalues can
be deduced from the following observation.
We remark that the last condition is not necessary for diagonalizability. Indeed,
the zero and identity transforms have just one distinct eigenvalue each but are trivially
diagonalizable.
We will now develop a necessary and sufficient condition for an operator to be
diagonalizable. We need the concepts of algebraic and geometric multiplicity in
this regard.
D EFINITION 43. Let λ ∈ C be an eigenvalue of T. We say that the algebraic multiplicity
of λ is k if λ appears k times as a root of the characteristic polynomial. More precisely,
we can write the characteristic polynomial as (t − λ )k q(t) for some polynomial q where
q(λ ) 6= 0.
We will establish the Schur decomposition using induction on n. We will use the
following base case.
E XERCISE 96. Establish the Schur decomposition for any A ∈ C2×2 .
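For reference, SciPy provides a Schur routine; a minimal sketch (tooling assumed, not part of the notes) computing the complex Schur form A = QTQᴴ and checking that T is upper triangular with the eigenvalues of A on its diagonal:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# Complex Schur form: A = Q T Q^H with Q unitary and T upper triangular.
T, Q = schur(A, output='complex')

print(np.allclose(A, Q @ T @ Q.conj().T))
print(np.allclose(np.tril(T, -1), 0))                  # strictly lower part of T is zero
print(np.allclose(np.sort_complex(np.diag(T)),
                  np.sort_complex(np.linalg.eigvals(A))))
```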
We will next deduce the Cayley-Hamilton theorem using a limiting argument and
the Schur decomposition.
L EMMA 2. Diagonalizable matrices are dense in Cn×n .
By dense, we mean that for any A ∈ Cn×n , there exists a sequence of diagonalizable
matrices An such that k An − Ak → 0 as n → ∞, where k · k is some matrix norm.
Using Lemma 2 and Exercise 76, we can establish the Cayley-Hamilton theorem for
general matrices.
CHAPTER 9
EIGEN DECOMPOSITION
Recall the concept of a self-adjoint operator on an inner product space. This includes,
in particular, symmetric and Hermitian matrices. They are, in turn, a particular case
of normal operators, which include skew-adjoint and unitary operators. We will
show that these operators can be diagonalized in an orthonormal (or unitary) basis.
Notably, we will develop this theory without using determinants, characteristic
equations, or the fundamental theorem of algebra.
T HEOREM 15. Let V be a complex, finite-dimensional inner product space, and let T ∈
L(V) be self-adjoint. Then there exists an orthonormal basis of eigenvectors of T.
R EMARK 16.
Theorem 15 can be deduced via induction on the dimension of the space, along
with the following result. That the eigenvalues are real is a part of the proof.
L EMMA 3. Every self-adjoint operator has an eigenvector with a real eigenvalue.
We can extend the spectral theorem to the larger class of normal operators. However,
the eigenvalues are no longer guaranteed to be real.
T HEOREM 16. Let V be a complex, finite-dimensional inner product space, and let T ∈
L(V) be a normal operator. Then there exists an orthonormal basis of eigenvectors of T.
R EMARK 17.
(1) By extending the orthonormal vectors into orthonormal bases, we can write
(16) A = UΣVᴴ ,
where U ∈ Cm×m and V ∈ Cn×n are unitary matrices, and Σ ∈ Rm×n is
defined by Σij = σi if 1 ≤ i = j ≤ r, and Σij = 0 otherwise.
We call (15) the partial SVD of A and (16) the full SVD.
(2) We call σ1 , . . . , σr the singular values, u1 , . . . , ur the left singular vectors, and
v1 , . . . , vr the right singular vectors.
(3) The SVD and eigendecomposition of A need not coincide even if A is Hermit-
ian. Indeed, while eigenvalues can take negative values, singular values are
nonnegative by construction. However, they are the same if A is PSD.
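The full SVD (16) and the points in this remark can be illustrated with NumPy (a made-up example, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))

# Full SVD: A = U Sigma V^H with U, V unitary and Sigma diagonal (padded with zeros).
U, s, Vh = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vh))
print(np.all(s >= 0))                      # singular values are nonnegative

# A Hermitian matrix with a negative eigenvalue: singular values are the
# absolute values of the eigenvalues, so SVD and eigendecomposition differ.
B = np.diag([2.0, -1.0])
print(np.linalg.svd(B, compute_uv=False), np.linalg.eigvalsh(B))
```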
E XERCISE 101. Prove that for A ∈ Cm×n ,
σ1 ( A) = max { ‖Ax‖Cm : ‖x‖Cn = 1 } ,
In other words, σ1 ( A) = k Ak2 , the spectral norm induced by the standard Eu-
clidean norms on Cn and Cm .
Consider the full SVD (16). Let v1 , . . . , vn ∈ Cn be the columns of V and u1 , . . . , um ∈
Cm be the columns of U. The following result relates these to the four fundamental
subspaces of A.
E XERCISE 102. Show that R( A) = span(u1 , . . . , ur ) and N ( A) = span(vr+1 , . . . , vn ).
We can use the SVD to define two important matrix norms: the spectral norm
(largest singular value) and the nuclear norm (sum of the singular values).
EXERCISE 105. Let (15) be the SVD of A ∈ Cm×n . Define ‖A‖2 = σ1 and ‖A‖∗ =
σ1 + · · · + σr . Verify that these are matrix norms on Cm×n . Moreover, prove that these are
dual to each other in the sense that
‖A‖∗ = max { tr( X>A) : ‖X‖2 ≤ 1 },
and
‖A‖2 = max { tr( X>A) : ‖X‖∗ ≤ 1 },
where the variable X ∈ Cm×n .
We can use the SVD for low-rank approximation. In particular, we have the fol-
lowing result which states that the optimal rank-k approximation of a matrix is
provided by a k-term truncation of (15).
EXERCISE 106. Let (15) be the SVD of A ∈ Cm×n and let k ≤ r. Define
Ak = σ1 u1 v1ᴴ + · · · + σk uk vkᴴ .
Prove that
‖Ak − A‖∗ = min { ‖X − A‖∗ : rank( X ) ≤ k },
where ‖ · ‖∗ is the nuclear norm.
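A sketch of the k-term truncation (not part of the notes): we build A_k from the leading singular triplets and compare its nuclear-norm error against one arbitrary rank-k competitor; Exercise 106 asserts that no rank-k matrix can do better.

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k truncation A_k = sigma_1 u_1 v_1^H + ... + sigma_k u_k v_k^H."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vh[:k, :]

def nuclear_norm(A):
    return np.sum(np.linalg.svd(A, compute_uv=False))

rng = np.random.default_rng(7)
A = rng.standard_normal((8, 6))
k = 2
Ak = truncated_svd(A, k)

# One arbitrary rank-k competitor; the truncation should incur no larger error.
X = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
print(nuclear_norm(Ak - A) <= nuclear_norm(X - A))   # True
```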
CHAPTER 11
POSITIVE MATRICES
In this chapter, we will work with real square matrices. However, for eigenvalue
computation, we will consider them as elements of Cn×n . We say that A is positive
(A > 0) if each Aij > 0 and that A is nonnegative (A ≥ 0) if each Aij ≥ 0. Similarly,
we will use the symbols x > 0 and x ≥ 0 for vectors. We will use σ ( A) for the
spectrum of A, the set of eigenvalues of A. The spectral radius of A is defined as
ρ( A) = max { |λ| : λ ∈ σ ( A) }.
In fact, any positive matrix has a positive eigenvalue (Perron root) and a correspond-
ing positive eigenvector (Perron vector). Moreover, any nonnegative eigenvector
must be a multiple of the Perron vector.
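A standard way to compute the Perron pair in practice is the power method (a hedged sketch on a made-up positive matrix; the method itself is only mentioned later, in the PCA discussion):

```python
import numpy as np

def power_method(A, num_iters=200):
    """Approximate the dominant eigenvalue/eigenvector of A by repeated multiplication;
    for A > 0 the iterates converge to the Perron root and Perron vector."""
    x = np.ones(A.shape[0])
    for _ in range(num_iters):
        x = A @ x
        x = x / np.linalg.norm(x)
    return x @ A @ x, x               # Rayleigh quotient and unit eigenvector

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])            # an example positive matrix
r, v = power_method(A)
print(r, v)                           # r > 0 and v > 0 (the Perron pair)
print(np.allclose(A @ v, r * v))
```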
T HEOREM 19. Let A > 0 and r = ρ( A). Then
Among other things, we will need the following results to establish Theorem 19.
EXERCISE 107. Let ‖·‖ be an induced matrix norm on Cn×n . Prove that
∀ A ∈ Cn×n : ρ( A) = lim_{k→∞} ‖A^k‖^{1/k} .
EXERCISE 108. Let A ∈ Cn×n be such that ρ( A) < 1. Then, for any matrix norm ‖ · ‖,
lim_{k→∞} ‖A^k‖ = 0.
PROPOSITION 6. Let A > 0 and ρ( A) = 1. Suppose q > 0 is such that q ≤ Aq. Show
that q = Aq.
What can be said if A is not positive but nonnegative, i.e., has some zeros? Clearly,
r cannot be guaranteed to be positive since this class includes the 0 matrix. A more
interesting example is
(18) A = [ 0 1 ; 0 0 ].
(1) r ∈ σ ( A).
(2) There exists x ≥ 0, x ≠ 0, such that Ax = rx.
The pre-multiplication with P> and post-multiplication with P have the effect of
reordering the rows and columns of A. Thus, (18) is reducible, while (19) is irre-
ducible.
EXERCISE 115. Let A ≥ 0 be irreducible. Show that, for all 1 ≤ i, j ≤ n, there exists
1 ≤ k ≤ n − 1 such that ( A^k )ij > 0. In particular, show that ( I + A)^{n−1} > 0.
EXERCISE 116. Let A ∈ Cn×n and p be a polynomial with complex coefficients. Show that
σ ( p( A)) = { p(λ ) : λ ∈ σ ( A)} .
EXERCISE 117. Let A ∈ Cn×n be such that ρ( A) ∈ σ ( A). Show that, for all k ≥ 1,
ρ( ( I + A)^k ) = (1 + ρ( A))^k .
CHAPTER 12
APPLICATIONS
1. Linear regression
Despite the nonlinear nature of p, we can find the optimal set of coefficients that
minimizes (1) using linear algebra. More precisely, define the coefficient vector
β = (β0 , . . . , βn ), the measurement vector y = ( y1 , . . . , yK ), and the data matrix
X = [ 1  x1  x1²  · · ·  x1ⁿ
      1  x2  x2²  · · ·  x2ⁿ
      ⋮   ⋮    ⋮           ⋮
      1  xK  xK²  · · ·  xKⁿ ].
That is, we can find the minimizer by solving X>Xβ = X>y. These are called the
normal equations and have a nice geometric interpretation (see Fig. 2).
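A minimal sketch of the whole pipeline on synthetic data (made-up sample points, not from the notes): build the data matrix X above and solve the normal equations; a least-squares solver gives the same coefficients.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 3                                        # polynomial degree
xs = np.linspace(0.0, 1.0, 20)               # sample points x_1, ..., x_K
ys = 1.0 - 2.0 * xs + 0.5 * xs**3 + 0.05 * rng.standard_normal(xs.size)   # noisy samples

# Data matrix with rows (1, x_k, x_k^2, ..., x_k^n).
X = np.vander(xs, N=n + 1, increasing=True)

# Solve the normal equations X^T X beta = X^T y.
beta = np.linalg.solve(X.T @ X, X.T @ ys)
print(beta)                                   # roughly (1, -2, 0, 0.5)

# The stable way in practice: a least-squares solver gives the same minimizer.
print(np.allclose(beta, np.linalg.lstsq(X, ys, rcond=None)[0]))
```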
How do we know that these equations have a solution in the first place? Fortunately,
they do. We can use the range-nullspace decomposition of a linear transform to
establish the following result.
E XERCISE 119. Prove that, for any X and y, there exists β such that X >Xβ = X >y.
In other words, we can always solve the regression problem (21) irrespective of the
nature of the data points.
3. Subspace approximation
The mathematical model for this approximation problem can be viewed as a gen-
eralization of linear regression to higher dimensions; however, there is a subtle
difference in the way the loss function is defined. More precisely, let PV ∈ L(Rd )
be the orthogonal projector onto the subspace V. The “best” approximation of each
xk would be PV ( xk ) and the distortion incurred is the projection error ‖xk − PV xk‖,
where ‖ · ‖ is the standard Euclidean norm. On summing the errors from all the
points, we obtain the loss function
(26) ℓ(V) = (1/K) ∑_{k=1}^{K} ‖xk − PV xk‖² ,
where the variable is the subspace V. Thus, finding the optimal subspace amounts
to minimizing (26) w.r.t. V.
We can solve the above problem by representing V in an orthonormal basis and
expressing (26) in terms of these basis vectors. More precisely, let dim V = p and
V = span(v1 , . . . , v p ), where v1 , . . . , v p is an orthonormal basis of V.
A popular technique for reducing the number of features in a dataset without losing
important information is principal component analysis (PCA). This is modeled
on the approximation problem discussed above. Based on the above analysis, we
can summarize the key steps in PCA:
(1) Center the data to have zero mean (e.g. by subtracting the mean).
(2) Compute the covariance matrix C of the centered data.
(3) Find the top-p eigenvectors of C (e.g. using the power method).
(4) Project the data onto the subspace spanned by these eigenvectors.
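A compact sketch of these four steps on synthetic data (not part of the notes); for simplicity the top-p eigenvectors are obtained with a dense eigendecomposition rather than the power method mentioned in step (3).

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 5))   # 100 points in R^5
p = 2                                                             # target dimension

# (1) Center the data.
Xc = X - X.mean(axis=0)
# (2) Covariance matrix of the centered data.
C = (Xc.T @ Xc) / Xc.shape[0]
# (3) Top-p eigenvectors of C (np.linalg.eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(C)
V = eigvecs[:, -p:]
# (4) Project the data onto the subspace spanned by these eigenvectors.
Y = Xc @ V                                                        # p-dimensional representation
print(Y.shape)                                                    # (100, 2)
```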
4. Low-rank approximation
6. Markov chain
PROBLEMS
P ROBLEM 1. Let K denote the set of real-valued functions on [0, 1]. With the natural
vector space operations on K, explain why K is infinite-dimensional. On the other hand,
what can we say about the dimension of the vector space of functions f : {0, 1} → R?
P ROBLEM 2. Consider the space P of polynomials of degree at most n. Verify that p0 (t) =
1, p1 (t) = t, . . . , pn (t) = tn form a basis for P.
Consider the subspace
Q = { p ∈ P : ∫₀¹ p(t) dt = 0 } .
Find the dimension of Q.
P ROBLEM 3. Let V be a vector space and v1 , . . . , vn ∈ V be linearly independent. Show
that for any v ∈ V, the dimension of span(v1 + v, . . . , vn + v) is at least n − 1.
P ROBLEM 4. Let U1 and U2 be subspaces of a finite-dimensional inner product space V.
Prove that (U1⊥)⊥ = U1 and (U1 ∩ U2 )⊥ = U1⊥ + U2⊥ .
PROBLEM 12. Suppose T ∈ L(V, W) and T′ ∈ L(W, V) are such that T ◦ T′ = Id.
Show that N ( T ) = {0} and R( T ) = W.
P ROBLEM 13. Let T ∈ L(V, W), where V and W are finite-dimensional spaces. Define
S : R( T ∗ ) → R( T ), ∀ x ∈ R( T ∗ ) : Sx = Tx.
Prove that S is a linear isomorphism. Conclude that rank ( T ∗ ) = rank ( T ).
P ROBLEM 14. Recall that we use A ∼ B to mean that A and B are similar. Show that ∼
is an equivalence relation on the set Fn×n , i.e., ∼ is reflexive, symmetric, and transitive.
P ROBLEM 15. Let A ∈ Rm×n and b ∈ Rm be such that the equation Ax = b admits
a solution. Prove that there exists exactly one solution x of the form x = A>z for some
z ∈ Rm .
P ROBLEM 16. Let G be an undirected graph with vertices V = {1, . . . , n} and edges E .
Define L ∈ Rn×n as follows:
Lij = di if i = j, −1 if (i, j) ∈ E , and 0 otherwise,
Using these, prove that ‖A ◦ B‖2 ≤ ‖A‖2 ‖B‖2 for all A, B ∈ Rn×n .
P ROBLEM 23. Let A, B ∈ Cn×n . Prove that
(1) I − AB invertible ⇐⇒ I − BA invertible.
(2) λ is an eigenvalue of AB ⇐⇒ λ is an eigenvalue of BA.
P ROBLEM 24. Let A, B ∈ Rn×n be symmetric matrices and let A be positive definite.
Prove that the eigenvalues of A−1 B are real. Moreover, if B is positive definite, then prove
that the eigenvalues of A−1 B are positive.
PROBLEM 25. Let A ∈ Cn×n and τ > 0 be such that
∀ 1 ≤ i ≤ n : ∑_{j=1}^{n} |Aij| ≤ τ .
Prove that ρ( A) ≤ τ.
P ROBLEM 26. Suppose A ∈ Rn×n is diagonalizable and two of its eigenvalues are equal.
Show that v, Av, . . . , An−1 v are linearly dependent for all v ∈ Rn .
PROBLEM 27. The spectral norm of A ∈ Rn×n is defined as
‖A‖2 = max { ‖Ax‖ : ‖x‖ = 1 } ,
where ‖·‖ is the standard Euclidean norm on Rn . Prove that
‖A‖2 = max { ‖Ax‖ : ‖x‖ ≤ 1 } = max { y>Ax : ‖x‖ ≤ 1 and ‖y‖ ≤ 1 } = σmax ( A).