
LINEAR ALGEBRA AND ITS APPLICATIONS

© Kunal N. Chaudhury 2024


Contents

Chapter 1. INTRODUCTION
Chapter 2. VECTOR SPACES
Chapter 3. LINEAR TRANSFORMS AND MATRICES
Chapter 4. INNER-PRODUCT AND NORMED SPACES
Chapter 5. TRANSFORMS ON INNER PRODUCT SPACES
Chapter 6. OPERATOR AND MATRIX NORMS
Chapter 7. TRACE AND DETERMINANT
Chapter 8. DIAGONALIZATION
Chapter 9. EIGEN DECOMPOSITION
Chapter 10. SINGULAR VALUE DECOMPOSITION
Chapter 11. POSITIVE MATRICES
Chapter 12. APPLICATIONS
1. Linear regression
2. Linearly constrained optimization
3. Subspace approximation
4. Low-rank approximation
5. Closest orthogonal transform
6. Markov chain
7. Convolutions and DFT
Chapter 13. PROBLEMS

CHAPTER 1

INTRODUCTION

Linear algebra is a fundamental topic with applications across various fields.

(1) Machine Learning: Algorithms such as linear regression, support vector machines, and principal component analysis rely on linear algebra concepts.
(2) Signal Processing: Linear algebra comes up in applications such as image and audio filtering, detection, and compression.
(3) Control and Networks: Used in control engineering to design control and
dynamical systems. Matrices are also used to analyze and model social or
transportation networks.
(4) Computer Graphics: Used to represent and manipulate 2D and 3D objects, perform transformations (rotation, scaling, translation), and handle lighting and shading computations.
(5) Operations Research: Linear programming, which comes up in resource
allocation and scheduling problems, involves optimizing a linear function
subject to linear constraints.
(6) Physical Sciences: Essential in solving problems in classical mechanics,
quantum mechanics, and electromagnetism.
(7) Economics and Finance: Used in economic models, financial analysis, and
portfolio optimization.

We discuss some concrete problems that can be solved using linear algebra.

(1) Linear regression. Given a matrix A and a vector b, find vector x that
minimizes
k Ax − bk^2 = ∑_{i=1}^{m} ( ∑_{j=1}^{n} Ai j x j − bi )^2 ,

where n is the number of variables and m is the number of measurements.


We use this when the equation Ax = b is not solvable and we wish to find
an approximate solution. This is the backbone of numerous prediction
and inference problems.
(2) Quadratic optimization.
More generally, we might have a problem of the form

min_x  x>Qx + q>x   subject to   Bx = c,

where Q is a symmetric matrix and q is a vector.



(3) Eigenvalue problem. Given a symmetric matrix Q, we wish to find


orthonormal vectors x1 , . . . , xk that minimize
x1>Qx1 + · · · + xk>Qxk .
For the case k = 1, the solution to this problem is the eigenvector with the
smallest eigenvalue. However, the solution is far from obvious when k > 1.
The eigenvalue problem has profound applications ranging from circuit
theory and structural engineering to quantum mechanics and control
theory.
(4) Principal component analysis. Given vectors z1 , . . . , zk in an n-dimensional
space, find a subspace S of dimension d < n that minimizes
∑_{i=1}^{k} dist(zi , S)^2 = ∑_{i=1}^{k} min_{x∈S} k x − zi k^2 .
This is not so obvious, but we can transform it into an eigenvalue problem. This has numerous applications, including data compression, noise reduction, clustering, and feature engineering.
(5) Low-rank approximation. Given a matrix A of rank k, we wish to find
another matrix X of rank r < k that minimizes k X − Ak, where k · k is
some norm on the space of matrices.
Low-rank approximation has use cases in image and video compression, recommendation systems, feature extraction, and dimensionality reduction.
(6) Closest orthogonal transform. Given a matrix A, we wish to find an orthogonal matrix X (used to model rotations or reflections) that is closest to A, namely, that minimizes k X − Ak. This has applications in computer graphics, quantum mechanics, multiview reconstruction, and determination of molecular structure, to name a few. (A short numerical sketch of problems (1), (3), (5), and (6) is given after this list.)
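To make these problems concrete, here is a minimal numerical sketch (Python/numpy, with random matrices as placeholder data, not examples from the text) of problems (1), (3), (5), and (6); the SVD-based recipes for (5) and (6) are developed later in the course.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 5))            # m = 20 measurements, n = 5 variables
    b = rng.standard_normal(20)

    # (1) Linear regression: minimize ||Ax - b||^2.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.linalg.norm(A @ x - b))            # least-squares residual

    # (3) Eigenvalue problem (k = 1): for a symmetric Q, the minimizer of x'Qx
    # over unit vectors is an eigenvector for the smallest eigenvalue.
    Q = A.T @ A
    w, V = np.linalg.eigh(Q)                    # eigenvalues in ascending order
    x_min = V[:, 0]

    # (5) Low-rank approximation: truncate the SVD (Eckart-Young).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = 2
    A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
    print(np.linalg.matrix_rank(A_r))           # 2

    # (6) Closest orthogonal transform: orthogonal factor of the SVD of M.
    M = rng.standard_normal((5, 5))
    U, s, Vt = np.linalg.svd(M)
    X = U @ Vt
    print(np.allclose(X.T @ X, np.eye(5)))      # True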
The above problems involve optimizing an objective function over a feasible (or
constraint) set. Two fundamental questions in this regard are:
(i) How do we know beforehand that the problem has a solution?
(ii) Can we develop an efficient algorithm to find the solution?
We will see how these can be addressed using concepts from linear algebra: solution
of linear systems, eigendecomposition, singular value decomposition, etc. In fact,
the nontrivial part is to develop a mathematical formulation that allows us to
reduce them to a linear algebra problem, which then can be solved using efficient
algorithms. This comes under the scope of numerical linear algebra and will not be
covered in this course.
Since most of you have taken a course on matrices, let’s start by discussing some
fundamental results that will be covered in this course:
(1) Rank-Nullity theorem.
(2) Diagonalization and Triangularization.
(3) Eigenvalue Decomposition.
(4) Singular Value Decomposition.
(5) Perron-Frobenius theorem.
CHAPTER 2

VECTOR SPACES

A vector space (also called a linear space) is a mathematical abstraction of physical


two- and three-dimensional spaces. In three-dimensional space, we can add two
vectors and scale a vector, and these operations come with specific properties. By
defining these properties axiomatically, we arrive at the abstract concept of a vector
space. The advantage of this abstraction is that it extends beyond traditional vectors;
the elements of a vector space can be matrices, sequences, or even functions.
To define a vector space, we need the concept of a (scalar) field, which we will
denote by F. In this course, F = R (real numbers) or F = C (complex numbers).
We will skip the abstract definition of a field. We just need that R and C come
with addition and multiplication operations, denoted by + and ∗, and that every
non-zero element has a multiplicative inverse.
D EFINITION 1. A vector space over a field (F, +, ∗) is a set V with operations + :
V × V → V and · : F × V → V having the following properties.
(1) ∀ u, v ∈ V : u + v = v + u.
(2) ∀ u, v, w ∈ V : u + (v + w) = (u + v) + w.
(3) ∀ a, b ∈ F, u ∈ V : a · (b · u) = ( a ∗ b) · u.
(4) ∀ a ∈ F, u, v ∈ V : a · (u + v) = ( a · u) + ( a · v).
(5) ∀ a, b ∈ F, u ∈ V : ( a + b) · u = ( a · u) + (b · u).
(6) ∀ u ∈ V : 1F · u = u.
(7) ∃ 0 ∈ V : ∀ u ∈ V, u + 0 = u.
(8) ∀ u ∈ V, ∃ (−u) ∈ V : u + (−u) = 0.
R EMARK 1. Let us note the following.
(1) In property 6, 1F denotes the multiplicative identity of F.
(2) Property 1 is commutativity, 2 and 3 are associativity, 4 and 5 are distributivity.
(3) We call 0 the zero (or null) vector and −v the inverse of v.
(4) The same + symbol is used for F and V to keep the notations simple.
(5) The same V can give rise to different spaces if we change the + or · operations.
(6) We will simply denote a · u as au.
E XERCISE 1. Prove that the null vector and the inverse of a given vector are unique.
E XERCISE 2. Let V be a vector space over F. Then show that
(1) ∀ a ∈ F : a · 0V = 0V .
(2) ∀ v ∈ V : 0F · v = 0V .
(3) ∀ v ∈ V : (−1)F · v = −v,
where 0F and 0V are the identity elements of F and V and (−1)F is the inverse of 1F .

E XAMPLE 1. We give some examples of vector spaces.

(1) C is a vector space over C and over R.


(2) Let F be a field and n ∈ N. Define
Fn = {( x1 , . . . , xn ) : x1 , . . . , xn ∈ F}.


Define + and · as follows:


( x1 , . . . , xn ) + ( y1 , . . . , yn ) = ( x1 + y1 , . . . , xn + yn ),
and
∀ a ∈ F : a · ( x1 , . . . , xn ) = ( ax1 , . . . , axn ).
Then (Fn , +, ·) is a vector space over F.
(3) Let X be a set and F = (F, +, ∗) be a field. Let F be the collection of functions
f : X → F. Then F is a vector space with the following natural operations:
∀x ∈ X : ( f + g)( x) = f ( x) + g( x),
and
∀x ∈ X : ( a · f )( x) = a ∗ f ( x).
E XERCISE 3. Let S be the collection of complex-valued sequences x = ( x1 , x2 , . . .), where
xn ∈ C for all n ∈ N. Let
L1 = { x ∈ S : | x1 | + | x2 | + · · · < ∞ },


and
L2 = { x ∈ S : | x1 |^2 + | x2 |^2 + · · · < ∞ }.


Verify that L1 and L2 are vector spaces over C.


E XERCISE 4. Let GLn (C) be the set of (n × n) complex-valued matrices that are invertible.
Is GLn (C) a vector space over C?
E XERCISE 5. Verify that the set of continuous functions f : R → R form a vector space
over R.

The example in Exercise 5 demonstrates that thinking of a vector as merely a point


in a coordinate space can be misleading.
A recurring theme in linear algebra is to decompose a vector space into smaller
vector spaces with specific structures, known as subspaces.
D EFINITION 2. Let (V, +, ·) be a vector space over F. A nonempty set U ⊆ V is called a
subspace of V if (U, +, ·) is a vector space over F.
R EMARK 2.

(1) When we say U is a subspace, it is understood that U is a subset of some vector


space.
(2) In some textbooks, instead of asking U to be nonempty, the condition 0 ∈ U is
used; these are in fact equivalent.
(3) Once we fix U ⊂ V, we must have u + v ∈ V and au ∈ V for all u, v ∈ U and
a ∈ F; what is not obvious is that u + v ∈ U and au ∈ U.
P ROPOSITION 1. A nonempty set U ⊆ V is a subspace of V if and only if u + v ∈ U and
au ∈ U for all u, v ∈ U and a ∈ F.

One direction is obvious. In the other direction, all we need to check is that 0 ∈ V also belongs to U and that for all u ∈ U, we have −u ∈ U (indeed, 0 = 0F · u and −u = (−1)F · u by Exercise 2); the rest of the vector space axioms hold automatically.
E XAMPLE 2. In the examples below, V is a vector space and U ⊂ V is a subspace.

(1) V = Fn and U = { x ∈ V : x1 = 0}.


(2) V = Fn and U = { x ∈ V : x1 = · · · = xn }.
(3) V = Fn and U = { x ∈ V : a1 x1 + · · · + an xn = 0} where a ∈ Fn .
(4) V is the space of polynomials and U = { p ∈ V : p(0) = 0}.
E XERCISE 6. Let V be the space of complex-valued sequences and let U consist of those
x ∈ V for which xn → 0 as n → ∞. Verify that U is a subspace of V.
(xn → 0 means that, for all ε > 0, there exists N > 1 such that | xn | < ε for all n > N.)
E XERCISE 7. Let a = ( a1 , . . . , an ) ∈ Rn . Show that
H = {( x1 , . . . , xn ) ∈ Rn : a1 x1 + · · · + an xn = 0}


is a subspace of Rn .
E XERCISE 8. Let V be a vector space over F. Let X be a nonempty subset of V with the
property that
∀ u, v ∈ X, t ∈ F : tu + (1 − t)v ∈ X.
Prove that there exists an x0 ∈ X and a subspace U ⊆ V such that X = x0 + U.

We now describe two commonly used operations for generating new subspaces
from existing ones.
P ROPOSITION 2. Let (Uα )α ∈ I be a collection of subspaces of V, where I is some index set.
Then ∩_{α∈ I} Uα is a subspace of V.

Unfortunately, the union of two subspaces need not be a subspace. For example, if V = R2 , U1 = { x ∈ V : x1 = 0}, and U2 = { x ∈ V : x2 = 0}, then U1 ∪ U2 is a cross-shaped set that is not closed under addition.
D EFINITION 3. Let U1 , . . . , Um be subspaces of V. Their sum is defined as

U1 + · · · + Um = {u1 + · · · + um : u1 ∈ U1 , . . . , um ∈ Um }.
E XAMPLE 3. Consider the example above: U1 = { x ∈ V : x1 = 0} and U2 = { x ∈ V :
x2 = 0}. Then U1 + U2 = R2 .

The sum of subspaces is the smallest subspace containing their union.


E XERCISE 9. Verify that U1 + · · · + Um is a subspace of V. Moreover, show that it is the
smallest subspace containing U1 , . . . , Um , namely, for any subspace W ⊂ V containing
U1 ∪ . . . ∪ Um , we have U1 + · · · + Um ⊆ W.

We again revisit the example U1 = { x ∈ V : x1 = 0} and U2 = { x ∈ V : x2 = 0}.


For every v ∈ U1 + U2 = R2 , there exists u1 ∈ U1 and u2 ∈ U2 such that v =
u1 + u2 . This decomposition is unique and is called a “direct” sum.

D EFINITION 4. Let U1 , . . . , Um be subspaces of V. We say that U1 + · · · + Um is a direct


sum if every v ∈ U1 + · · · + Um can be uniquely written as v = u1 + · · · + um for some
ui ∈ Ui . We use the symbol U1 ⊕ · · · ⊕ Um for a direct sum.

We give a straightforward characterization of the direct sum; for m = 2 it takes a particularly simple form.


E XERCISE 10. Prove that U1 + · · · + Um is a direct sum if and only if u1 + · · · + um = 0
and ui ∈ Ui implies ui = 0. In particular, show that U1 + U2 is a direct sum if and only if
U1 ∩ U2 = {0}.

Next, we look at the concepts of span, independence, and basis. All of these rest on
the idea of a “linear combination,” which in turn is built on the two operational
pillars of a vector space.
D EFINITION 5. Let v1 , . . . , vm ∈ V. We say that v ∈ V is a linear combination of
v1 , . . . , vm if there exists a1 , . . . am ∈ F such that v = a1 v1 + · · · + am vm . The set of all
possible linear combinations is called the span of v1 , . . . , vm :

span(v1 , . . . , vm ) = { a1 v1 + · · · + am vm : a1 , . . . , am ∈ F}.
E XAMPLE 4.
(1) Any v ∈ R2 is in the span of v1 = (1, 0) and v2 = (0, 1).
(2) The vector v = (1, 1, 1) is not in the span of v1 = (1, 0, 0) and v2 = (0, 1, 0).
E XERCISE 11. Let v1 = (1, −1) and v2 = (1, 1). Show that span(v1 , v2 ) = R2 .
E XERCISE 12. Show that U = span(v1 , . . . , vm ) is a subspace of V. Moreover, argue why
U is the smallest subspace containing v1 , . . . , vm .

We now define the core concept for this course, the notion of finite-dimensional
vector spaces.
D EFINITION 6. A vector space V is finite-dimensional if there exists n > 1 and v1 , . . . , vn ∈
V such that V = span(v1 , . . . , vn ). A vector space that is not finite-dimensional is called
infinite-dimensional.

Note that we have not defined “dimension” at this point. Moreover, n in the above
definition is not necessarily the dimension of the vector space.
E XAMPLE 5.
(1) Fm is finite-dimensional.
(2) The space Fm×n of (m × n) F-valued matrices is finite-dimensional.
(3) The space of polynomials of degree at most k is finite-dimensional.
E XERCISE 13. Prove that every subspace of a finite-dimensional vector space is finite-
dimensional.
E XERCISE 14. Verify that R is a vector space over R. Also, verify that R is a vector space
over Q. Is R finite-dimensional over R? Is R finite-dimensional over Q?
E XERCISE 15. Show that the following spaces are infinite-dimensional.
(1) The space of polynomials with complex coefficients.
(2) The space of continuous functions f : R → R.

E XERCISE 16. Let V be a vector space. Show that the following are equivalent:

(1) V is infinite-dimensional.
(2) For all k > 1, there exists v1 , . . . , vk ∈ V that are linearly independent.
(3) For all k > 1 and v1 , . . . , vk ∈ V, there exists v ∈ V that does not belong to
span(v1 , . . . , vk ).
D EFINITION 7. We say that v1 , . . . , vm ∈ V are linearly independent if
a1 v1 + . . . + am vm = 0 ⇒ a1 = . . . = am = 0.

In other words, there does not exist a1 , . . . , am that are not all zero and yet
a1 v1 + . . . + am vm = 0.
If the vectors are not linearly independent, we say they are linearly dependent.
E XERCISE 17.

(1) Show that v1 = (1, −1) and v2 = (1, 1) are linearly independent in R2 .
(2) In Fn , define vectors e1 , . . . , en to be
∀ i = 1, . . . , n, ∀ j = 1, . . . , n : ei ( j) = 1 if j = i, and 0 if j 6= i.
Show that e1 , . . . , en are linearly independent. We call this the standard basis.
(3) Show that the system of functions e1 , e2 , . . . : [0, 1] → C defined as
ek (t) = exp(ι2πkt) (t ∈ [0, 1]),
are linearly independent in the space of complex-valued functions on [0, 1].
E XERCISE 18. If v1 , . . . , vm are linearly independent, then show that
span(v1 , . . . , vm ) = span(v1 ) ⊕ · · · ⊕ span(vm ).
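For vectors in Fn, linear independence can also be checked numerically by stacking the vectors as columns of a matrix and computing its rank; the vectors are independent exactly when the rank equals their number. A minimal sketch (numpy, with the vectors of Exercises 11 and 17(1) as examples):

    import numpy as np

    # Columns are v1 = (1, -1) and v2 = (1, 1); rank 2 means linearly independent.
    V = np.array([[1.0, 1.0],
                  [-1.0, 1.0]])
    print(np.linalg.matrix_rank(V))           # 2

    # The standard basis e1, ..., en of Exercise 17(2), stacked as columns,
    # is the identity matrix; its rank is n.
    n = 4
    print(np.linalg.matrix_rank(np.eye(n)))   # 4

    # A dependent pair: (1, -1) and (-1, 1) span only a line.
    W = np.array([[1.0, -1.0],
                  [-1.0, 1.0]])
    print(np.linalg.matrix_rank(W))           # 1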

We can generalize the above result as follows.


P ROPOSITION 3. Let V = span(v1 , . . . , vm ) and let u1 , . . . , un ∈ V be linearly independent. Then m > n.

Thus, a clear relationship exists between the cardinality of spanning vectors and
linearly independent vectors. As a consequence of the above observation, we can
conclude the following.
C OROLLARY 1. Any (n + 1) vectors in Fn must be linearly dependent and any (n − 1)
vectors cannot span Fn .

We can prove Proposition 3 using the following lemma.


L EMMA 1. Let v1 , . . . , vk be linearly independent and let vk+1 , . . . , vk+ p be such that
v1 , . . . , vk+ p are linearly dependent. Then there exists 1 6 j 6 p such that
vk+ j ∈ span(v1 , . . . , vk+ j−1 ).
D EFINITION 8. A basis for V is a set of linearly independent vectors that span V.
E XERCISE 19.

(1) Show that e1 , . . . , en in Exercise 17 form a basis of Fn .


(2) Find a basis for the space of matrices Fm×n .
(3) Find a basis for the space of cubic polynomials.

In fact, we can always find a basis if the space is finite-dimensional.


T HEOREM 1. Every finite-dimensional vector space has a basis.

Later, we will see that every finite-dimensional vector space has a special basis
known as an orthonormal basis, which is easier to work with.
We next give a useful characterization of a basis (this is the definition of a basis in
some books).
E XERCISE 20. Show that a set of vectors {v1 , . . . , vn } is a basis of V if and only if, for all
v ∈ V, there exists unique a1 , . . . , an ∈ F such that v = a1 v1 + . . . + an vn .
D EFINITION 9. For a finite-dimensional vector space V, we define its dimension to be the
cardinality of a basis of V. We denote this by dim V.

The trivial space V = {0} does not admit a basis; its dimension is defined as 0.
The following result confirms that the notion of “dimension” does not depend on
the choice of basis; it is an intrinsic quantity for a vector space.
P ROPOSITION 4. Let V be a finite-dimensional vector space. Then any two bases of V
have the same cardinality.
E XERCISE 21. Let dim V = n. Show that
(1) Any n linearly independent vectors form a basis of V.
(2) Any n vectors that span V form a basis of V.
E XERCISE 22. Let V be a finite-dimensional space and U be a subspace of V. Then show
that dim U 6 dim V.
E XERCISE 23. Find dim H where H is the subspace in Exercise 7.
E XERCISE 24. Let U1 and U2 be two subspaces of a vector space V. If dim U1 +
dim U2 > dim V, then prove that there must exist a nonzero vector in U1 ∩ U2 .
E XERCISE 25. Let V be a vector space over R. A subspace U ⊆ V is proper if U 6= V.
Prove that if dim V > 1, then V cannot be written as the union of finitely many proper subspaces.

Next, we introduce the concept of a complementary subspace.


P ROPOSITION 5. Let V be a finite-dimensional space and U be a subspace of V. Then there
exists a subspace W of V such that V = U ⊕ W.

We call W the complementary subspace of U. Note that the complementary subspace is not unique.
E XERCISE 26. In Proposition 5, prove that dim V = dim U + dim W.
E XERCISE 27. Let V = R2 and U = {( x, 0) : x ∈ R}. Show that
W1 = {(0, x) : x ∈ R} and W2 = {( x, x) : x ∈ R}
are complementary subspaces of U.
CHAPTER 3

LINEAR TRANSFORMS AND MATRICES

We now look at the central objects of study in linear algebra — linear transforms
and matrices. We recall that the underlying field F is R or C.
D EFINITION 10. Let V and W be two vector spaces over F. A linear transform from V to
W is a function T : V → W with the following properties:
∀ u, v ∈ V : T (u + v) = T (u) + T (v).
∀ u ∈ V, a ∈ F : T ( au) = aT (u).
R EMARK 3.

(1) Linear transforms are also called linear maps or linear operators. In this course,
we will reserve the term “operator” for the case where V = W.
(2) We will use L(V, W) to denote the set of all possible linear transforms from V to
W. In particular, we will use the shorthand L(V) = L(V, V).
(3) A linear transform T : V → W maps 0V to 0W .
(4) We will use 0 and Id to denote the zero and identity maps:
∀v ∈ V : 0(v) = 0 and Id(v) = v.
E XAMPLE 6.

(1) V = W = R, and T ( x) = ax, where a ∈ R.


(2) V = Rn , W = R, and T ( x) = a1 x1 + · · · + an xn , where a1 , . . . , an ∈ R.
(3) V = W = R2 , and T ( x1 , x2 ) = ( x2 , x1 ).
(4) V and W are the space of polynomials and T is the derivative operator.
(5) V is the space of continuous functions on [0, 1], W = R, and
∀ f ∈ C ([0, 1]) : T( f ) = ∫_0^1 f (t) dt.

Linear transforms come up naturally in defining a “coordinate map” associated


with a finite-dimensional vector space.
D EFINITION 11. Let V be a finite-dimensional vector space and dim V = n. Fix a basis
{v1 , . . . , vn } of V. Define
φ : V → Fn , φ(v) = ( x1 , . . . , xn ),
where v = x1 v1 + · · · + xn vn is the unique representation of v in the basis. We call φ the
coordinate map and x = φ(v) the coordinates of v.

In fact, φ has more structure to it than just linearity — it is an isomorphism.



D EFINITION 12. A linear map T ∈ L(V, W) is called an isomorphism if it is a bijection,


i.e., if for all w ∈ W, there exists a unique v ∈ V such that w = T (v). We say that V and
W are isomorphic (written as V ≅ W) if there exists an isomorphism T ∈ L(V, W).
E XERCISE 28. Show that the coordinate map is a linear isomorphism from V to Fdim V .

Every isomorphism admits an “inverse”, which is again linear.


E XERCISE 29. Let T ∈ L(V, W) be an isomorphism. Show that T −1 ∈ L(W, V) is also
an isomorphism.

The set L(V, W) comes with a natural vector space structure. Namely, if T, T 0 ∈
L(V, W) and a ∈ F, then we can define T + T 0 : V → W and aT : V → W as
follows:
∀v ∈ V : ( T + T 0 )(v) = T (v) + T 0 (v) and ( aT )(v) = aT (v).
E XERCISE 30. Verify that T + T 0 and aT are linear transforms and that L(V, W) is a
vector space over F. In particular, prove that dim L(V, W) = dim V · dim W.

Next, we introduce two natural subspaces associated with a linear transform.


D EFINITION 13. Let T ∈ L(V, W). The null space and range space of T are defined to be

N ( T ) = {v ∈ V : T (v) = 0} and R( T ) = { T (v) : v ∈ V}.
R EMARK 4.

(1) N ( T ) and R( T ) are nonempty as 0V ∈ N ( T ) and 0W ∈ R( T ).


(2) N ( T ) ⊆ V and R( T ) ⊆ W.
(3) If T = 0, then N ( T ) = V and R( T ) = {0}.
(4) If T = Id, then N ( T ) = {0} and R( T ) = W.
E XERCISE 31. Let V = R2 , W = R, and T ( x) = x1 + x2 . Find N ( T ) and R( T ) and
give their geometric descriptions.

Since T is linear, we can say more about N ( T ) and R( T ).


E XERCISE 32. Show that N ( T ) is a subspace of V and R( T ) is a subspace of W.

Since N ( T ) and R( T ) are subspaces, we have the following definitions.


D EFINITION 14. The nullity and rank of T are defined as
nullity T = dim N ( T ) and rank T = dim R( T ).
E XERCISE 33. Find nullity T and rank T for the example in Exercise 31.

There exists a precise relation between these quantities.


T HEOREM 2. Let T ∈ L(V, W). If V is finite-dimensional, then R( T ) is finite-dimensional,
and
nullity T + rank T = dim V.
R EMARK 5. We will refer to this result as the rank-nullity theorem. For the above result to
hold, V must be finite-dimensional, but W can be infinite-dimensional.
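For a matrix A ∈ Rm×n viewed as the map x ↦ Ax from Rn to Rm, the theorem can be checked numerically. A minimal sketch (numpy and scipy, with a random rank-3 matrix as a placeholder):

    import numpy as np
    from scipy.linalg import null_space

    rng = np.random.default_rng(1)
    m, n = 4, 6
    A = rng.standard_normal((m, 3)) @ rng.standard_normal((3, n))  # rank 3 by construction

    rank = np.linalg.matrix_rank(A)        # dim R(A)
    nullity = null_space(A).shape[1]       # dim N(A), from an orthonormal basis of the null space
    print(rank, nullity, rank + nullity == n)   # 3 3 True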

As a direct corollary of rank-nullity, we have the following results.


C OROLLARY 2. For all T ∈ L(V), N ( T ) = {0} ⇐⇒ R( T ) = V.
C OROLLARY 3. Let T ∈ L(V, W). If dim W < dim V, then T cannot be injective. On
the other hand, if dim W > dim V, then T cannot be surjective.

We next discuss an important class of operators. For any P ∈ L(V), we define


P2 = P ◦ P.
D EFINITION 15. We say that P ∈ L(V) is a projection operator (or projector) if P2 = P.
E XAMPLE 7. We give some examples of projection operators below.
(1) Zero and identity maps.
(2) V = R2 , P ∈ L(V) : P( x1 , x2 ) = ( x1 , 0).
(3) V = R2 , P ∈ L(V) : P( x) = ( x1 + x2 , 0).
E XERCISE 34. Show that P is a projector if and only if Id − P is a projector.

The relation between projection and direct sum is described next.


T HEOREM 3. Let P ∈ L(V) be a projection operator. Then we can decompose V as
V = N ( P) ⊕ R( P). Conversely, if V = U ⊕ W, then we can find a projection operator
P ∈ L(V) such that U = N ( P) and W = R( P).

In the latter case, we call P the projection of V onto U along W.


We conclude this part by connecting linear transforms and matrices. A matrix is
just an array of numbers. More precisely, an F-valued matrix A of size (m × n) is
represented as

(1)        [ a11  · · ·  a1n ]
           [ a21  · · ·  a2n ]
     A  =  [  .           .  ]        ( ai j ∈ F).
           [ am1  · · ·  amn ]
We say that A has m rows and n columns. The (i, j)-th component of A is ai j . For
1 6 j 6 n, the vector c j = ( a1 j , . . . , am j )> ∈ Fm×1
is the j-th column of A. Similarly, for 1 6 i 6 m, the vector
ri = ( ai1 · · · ain ) ∈ F1×n
is the i-th row of A.
Matrices can be viewed as linear transforms in their own right. The space of
F-valued (m × n) matrices is denoted by Fm×n .
D EFINITION 16. Let A ∈ Fm×n and B ∈ Fn× p . Their product is the matrix C ∈ Fm× p
given by
∀ 1 6 i 6 m, 1 6 j 6 p : ci j = ∑_{k=1}^{n} aik bk j .

We denote this as C = AB.


E XERCISE 35. Let m, n > 1 and A be defined as in (1). Consider the map T : Fn → Fm
defined as
∀ x ∈ Fn : ( T ( x))i = ∑_{j=1}^{n} ai j x j   (1 6 i 6 m).
Show that T ∈ L(Fn , Fm ).

Any linear transform between finite-dimensional spaces can be represented using


matrices. This is similar to using a coordinate map to represent a vector in a vector
space.
E XERCISE 36. Let T ∈ L(V, W), where dim V = n and dim W = m. Let v1 , . . . , vn
be a basis of V and φ : V → Fn be the corresponding coordinate map. Similarly, let
w1 , . . . , wm be a basis of W and ψ : W → Fm be the corresponding coordinate map.
For all 1 6 j 6 n, there exists a1 j , . . . , am j ∈ F such that
T (v j ) = ∑_{i=1}^{m} ai j wi .
Define A ∈ Fm × n as in (1). Show that T = ψ−1 ◦ A ◦ φ.

We call A the “matrix representation” of T and denote it as A = [ T ].


R EMARK 6.
(1) We obtain different matrix representations using different sets of bases, i.e., a
matrix representation depends on the choice of bases.
(2) We will exploit this idea when we discuss different matrix decompositions. In
particular, we will ask whether a given transform can be “diagonalized” using
appropriate bases. A diagonal representation is useful for computational reasons.
E XERCISE 37. Let T1 ∈ L(V, W) and T2 ∈ L(U, V). For a fixed choice of bases for U, V
and W, show that
[ T1 ◦ T2 ] = [ T1 ][ T2 ].
E XERCISE 38. Let T ∈ L(R2 , R2 ) be given by T ( x1 , x2 ) = ( x1 + 2x2 , 2x1 + 4x2 ). Find
[ T ] for the following choice of bases:
(1) v1 = (1, 0) and v2 = (0, 1); w1 = (1, 0) and w2 = (0, 1).
(2) v1 = (1, 2) and v2 = (−2, 1); w1 = (1, 0) and w2 = (0, 1).
(3) v1 = (1, 2) and v2 = (−2, 1); w1 = (1, 2) and w2 = (−2, 1).

What is the relation between the different matrix representations in Exercise 38? (A numerical sketch of Exercises 38 and 39 is given at the end of this chapter.)
D EFINITION 17. We say that M, N ∈ Fn×n are similar, denoted M ∼ N, if there exists
an invertible matrix P ∈ Fn×n such that M = PNP−1 .
E XERCISE 39. Let M and N be matrix representations of a linear operator corresponding
to two different bases. Prove that M ∼ N.

In Problem (14), we show that the space of matrices Fn×n can be partitioned into
equivalence classes, where each equivalence class [ M] = { N : N ∼ M} consists of
matrix representations of the linear transform represented by M.
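As a minimal numerical illustration of Exercises 38 and 39 (numpy; the bases are those listed in Exercise 38), the representation of T in the basis v1 = (1, 2), v2 = (−2, 1) can be obtained from the standard representation by a change of basis, and the two matrices are similar:

    import numpy as np

    # T(x1, x2) = (x1 + 2*x2, 2*x1 + 4*x2): matrix in the standard basis.
    M = np.array([[1.0, 2.0],
                  [2.0, 4.0]])

    # Basis v1 = (1, 2), v2 = (-2, 1), used for domain and codomain (case (3)).
    P = np.array([[1.0, -2.0],
                  [2.0, 1.0]])          # columns are v1 and v2

    N = np.linalg.inv(P) @ M @ P        # representation of T in the new basis
    print(N)                            # approximately diag(5, 0): v1, v2 are eigenvectors
    print(np.allclose(M, P @ N @ np.linalg.inv(P)))   # True: M ~ N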
CHAPTER 4

INNER-PRODUCT AND NORMED SPACES

We first recall the concepts of dot product and Euclidean distance on Rn and Cn .
The dot product is defined as
(2) x · y = x1 y1 + · · · + xn yn ( x, y ∈ Rn ),
and
(3) x · y = x̄1 y1 + · · · + x̄n yn ( x, y ∈ Cn ),
where z̄ denotes the conjugate of z ∈ C. The length of x ∈ Rn is given by
(4) k xk = √(x · x) = ( x1^2 + · · · + xn^2 )^{1/2} ,
and that of z ∈ Cn is
(5) k zk = √(z · z) = ( | z1 |^2 + · · · + | zn |^2 )^{1/2} .

The Euclidean distance between x and y in Rn or Cn is defined as k x − yk.


The above structures form the basis of Euclidean geometry. Can we extend these
definitions to the space of polynomials or continuous functions on [0, 1]? More
generally, how do we define them for an abstract vector space? This brings us to
the concepts of “inner product” and “norm”. We will see that while every inner
product gives rise to a norm, the converse is not true.
To define the inner product, we must distinguish between real and complex spaces.
We call V a real (resp. complex) vector space if F = R (resp. F = C).
D EFINITION 18. Let V be a real vector space. An inner product on V is a map h·, ·i :
V × V → R such that
(1) h x, xi > 0 for all x ∈ V, and h x, xi = 0 if and only if x = 0.
(2) ∀ x, y ∈ V : h x, yi = h y, xi.
(3) ∀ y ∈ V, the map x 7→ h y, xi is linear.
R EMARK 7. Property 1 is called positive definiteness and property 2 symmetry. It follows from
properties 1-3 that the map x 7→ h x, yi is linear for all y ∈ V. For this reason, we refer
to 3 as bilinearity. Thus, any positive definite, symmetric, bilinear form on V is an inner
product.
E XERCISE 40. Verify that (2) has the properties of an inner product on Rn .
E XERCISE 41. Let V be the space of polynomials with real coefficients. Verify that the
following is an inner product on V:
(6) ∀ p, q ∈ V : h p, qi = ∫_{−1}^{1} p(t)q(t) dt.

E XERCISE 42. Let V = Rn and fix w1 , . . . , wn > 0. Verify that the following is an inner
product on V:
∀ x, y ∈ V : h x, yi = w1 x1 y1 + · · · + wn xn yn .

The definition of inner product is slightly different for a complex space.


D EFINITION 19. Let V be a complex vector space. An inner product on V is a map
h·, ·i : V × V → C such that
(1) h x, xi > 0 for all x ∈ V, and h x, xi = 0 if and only if x = 0.
(2) ∀ x, y ∈ V : h x, yi is the complex conjugate of h y, xi.
(3) ∀ y ∈ V, the map x 7→ h y, xi is linear.
R EMARK 8. Property 2 is called conjugate (or Hermitian) symmetry. Unlike the inner product on a real vector space, we do not have bilinearity but something called sesquilinearity:
h x + y, vi = h x, vi + h y, vi and h ax, vi = āh x, vi.
E XERCISE 43. Verify that (3) has the properties of an inner product on Cn .
D EFINITION 20. A vector space with an inner product is called an inner product space.

Thus, Rn and Cn are examples of inner-product spaces. In fact, we can make the
following general statement.
T HEOREM 4. Any finite-dimensional vector space is an inner-product space.

We now turn to normed spaces. In this case, we do not need to distinguish between
real and complex spaces.
D EFINITION 21. Let V be a real or complex vector space. A norm on V is a map
k·k : V → R such that
(1) k xk > 0 for all x ∈ V, and k xk = 0 if and only if x = 0.
(2) ∀ x ∈ V, a ∈ F : k axk = | a| k xk.
(3) ∀ x, y ∈ V, k x + yk 6 k xk + k yk.
R EMARK 9.
(1) Property 1 is positive definiteness, 2 is homogeneity, and 3 is triangle inequality.
(2) The norm of a vector is a real number, even for a complex vector space.
E XERCISE 44. Verify that (4) and (5) are valid norms on Rn and Cn .
D EFINITION 22. A vector space with a norm is called a normed space.

Thus, Rn and Cn are both inner product and normed spaces. Note that the respec-
tive norms (4) and (5) are derived from the respective dot products. In fact, any
inner product naturally induces a norm as follows.
E XERCISE 45. Let (V, h·, ·i) be an inner product space. Define
∀ x ∈ V : k xk = h x, xi^{1/2} .
Verify that k·k is a norm. We call k·k the norm induced by the inner product h·, ·i.

Combining the above observation with Theorem 4, we have the following.



T HEOREM 5. Any finite-dimensional vector space is a normed space.

While every inner product space is automatically a normed space, the converse
is not true, i.e., not all norms are induced by an inner product, as shown in the
following example.
E XERCISE 46. Let V = R2 and k xk = | x1 | + | x2 |. Verify that this is a norm. Explain
why there does not exist an inner product h·, ·i on V such that k xk = h x, xi^{1/2} .

We now look at the famous Cauchy-Schwarz inequality relating an inner product


to its induced norm.
T HEOREM 6. Let (V, h·, ·i) be an inner product space with induced norm k·k. Then
∀ x, y ∈ V : |h x, yi| 6 k xk · k yk.

A central concept in Euclidean geometry is that of orthogonality. For example, we


have the Pythagoras theorem for right-angle triangles. We also have the parallelogram law, which states that the sum of the squares of the sides of a parallelogram equals the sum of the squares of the diagonals. These hold in the abstract setting of an inner product space.
In the rest of the discussion, we will assume that V is an inner product space.
D EFINITION 23. We say that x, y ∈ V are orthogonal, denoted x ⊥ y, if h x, yi = 0.
Furthermore, they are orthonormal if k xk = k yk = 1.
E XAMPLE 8.
(1) Vectors (1, 1) and (1, −1) are orthogonal but not vectors (1, 1) and (1, 0).
(2) Vectors (1, 0) and (0, 1) are orthonormal.
(3) Consider the inner product (6). Then p(t) = t and q(t) = t2 are orthogonal.

We can now state the Pythagoras theorem and the parallelogram law.
E XERCISE 47. Show that if x ⊥ y, then k x + yk^2 = k xk^2 + k yk^2 .
E XERCISE 48. Show that k x + yk2 + k x − yk2 = 2(k xk2 + k yk2 ).
R EMARK 10. The proof of the parallelogram law depends crucially on the fact that the
norm in question is derived from an inner product. In fact, the parallelogram law need not
hold in general normed spaces.
D EFINITION 24. A list of vectors q1 , . . . , qn in V are said to be an orthonormal system if
kqi k = 1 for all i and qi ⊥ q j for all i 6= j.
E XAMPLE 9.
(1) The standard basis e1 , . . . , en in Exercise 17(2) form an orthonormal system w.r.t. the dot product.
(2) The functions in Exercise 17(3) are orthonormal w.r.t. the following inner product:
h f , gi = ∫_0^1 f̄ (t) g(t) dt.

At this point, we introduce the concept of an orthonormal basis. This is motivated


by the following result.
E XERCISE 49. Show that any orthonormal system of vectors is linearly independent.

D EFINITION 25. A basis in which the vectors form an orthonormal system is called an
orthonormal basis (ONB).
E XERCISE 50. Show that the orthonormal system in Exercise 17(2) is an ONB of Rn .

The above ONB is w.r.t. the special inner product on Rn , namely, the dot product.
Can we come up with an ONB if we work with a different inner product? What
about an abstract inner product?
T HEOREM 7. Any finite-dimensional inner product space has an ONB.

This can be shown using an inductive process called Gram-Schmidt orthogonaliza-


tion.
E XERCISE 51. Let V be the space of polynomials and consider the inner product (6). Apply
Gram-Schmidt to the polynomials p1 (t) = 1, p2 (t) = t, and p3 (t) = t2 to derive an
orthonormal system which spans the same space as p1 , p2 , and p3 .
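A minimal sketch of classical Gram-Schmidt for vectors in Rn with the dot product (numpy; the input vectors are arbitrary placeholders). The same procedure, run with the integral inner product (6) in place of the dot product, produces the orthonormal polynomials asked for in Exercise 51 (scaled Legendre polynomials).

    import numpy as np

    def gram_schmidt(vectors):
        """Return an orthonormal system spanning the same space (dot product)."""
        q = []
        for v in vectors:
            u = v.astype(float)
            for e in q:
                u = u - np.dot(e, u) * e      # subtract the projection onto e
            norm = np.linalg.norm(u)
            if norm > 1e-12:                  # skip (numerically) dependent vectors
                q.append(u / norm)
        return q

    vecs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
    Q = np.column_stack(gram_schmidt(vecs))
    print(np.allclose(Q.T @ Q, np.eye(3)))    # True: the columns are orthonormal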

As mentioned earlier, working with an ONB has computational advantages. This is


because it is easy to compute the coordinates w.r.t. an ONB.
E XERCISE 52. Let q1 , . . . , qn be an ONB of V. Prove that, for all x ∈ V,
x = ∑_{i=1}^{n} h x, qi i qi   and   k xk^2 = ∑_{i=1}^{n} |h x, qi i|^2 .

We call the latter Parseval’s formula; this gives us an easy means of computing the
norm of a vector from its coefficients.
Having discussed inner product spaces, we move to linear operators on (and
between) such spaces. A fundamental result in this regard is the characterization of
the dual space V0 = L(V, F) — the space of linear functionals on V.
T HEOREM 8. Let V be a finite-dimensional inner product space. Then
∀ ` ∈ V0 , ∃ v ∈ V : `( x) = hv, xi ( x ∈ V).
That is, every linear functional on V is given by an inner product.
E XERCISE 53. If V is finite-dimensional, then show that V0 ≅ V.

We revisit the concept of a complementary subspace for an inner product space.


D EFINITION 26. Let U be a subspace of an inner product space V. The orthogonal
complement of U is defined as
U⊥ = { x ∈ V : x ⊥ u for all u ∈ U }.


We record two trivial cases: V⊥ = {0} and {0}⊥ = V.


E XERCISE 54. Verify that U⊥ is a subspace of V. Moreover, show that
U ∩ U⊥ = {0}, (U⊥ )⊥ = U, and V = U ⊕ U⊥ .

The last relation is called the orthogonal decomposition of V.


E XERCISE 55. Let q1 , . . . , qn be an ONB. For 1 < k < n, let U = span(q1 , . . . , qk ) and
W = span(qk+1 , . . . , qn ). Show that W = U⊥ .
CHAPTER 5

TRANSFORMS ON INNER PRODUCT SPACES

Given the orthogonal decomposition V = U ⊕ U⊥ , we are naturally led to the


concept of an orthogonal projection.
D EFINITION 27. The orthogonal projection onto a subspace U is an operator PU : V → V
defined as
∀ x ∈ V : PU ( x) = u ( x = u + v, u ∈ U, v ∈ U⊥ ).
R EMARK 11.
(1) It is clear that PU ∈ L(V).
(2) When we say that P is an orthogonal projection, it is understood that P = PU for
some subspace U ⊆ V.
E XERCISE 56. Let u ∈ V and U = span(u). Give a formula for PU .

Some properties of orthogonal projection are mentioned below.


E XERCISE 57. Let P = PU be an orthogonal projection on V. Show that

(1) P2 = P.
(2) R( P) = U and N ( P) = U⊥ .
(3) P + PU⊥ = Id.
(4) k Pxk 6 k xk for all x ∈ V.
(5) For all x ∈ V,
P( x) = argmin_{u∈U} ku − xk.
In other words, P( x) is the (unique) point in U that is “closest” to x.
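A minimal numerical sketch (numpy): if the columns of Q form an orthonormal basis of a subspace U ⊆ Rn (obtained here from a QR factorization of arbitrary spanning vectors), then P = QQ> is the matrix of the orthogonal projection onto U, and the properties in Exercise 57 can be checked directly.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 2))          # U = column space of A (dimension 2)
    Q, _ = np.linalg.qr(A)                   # orthonormal basis of U
    P = Q @ Q.T                              # orthogonal projector onto U

    x = rng.standard_normal(5)
    print(np.allclose(P @ P, P))             # P^2 = P
    print(np.allclose(P, P.T))               # self-adjoint
    print(np.linalg.norm(P @ x) <= np.linalg.norm(x) + 1e-12)   # ||Px|| <= ||x||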
E XERCISE 58. If P is an orthogonal projection, then show that
∀ x, y ∈ V : h Px, yi = h x, Pyi.

We call such operators self-adjoint. But first, we must define the “adjoint” of an
operator; the definition is motivated by the following results.
E XERCISE 59. For A ∈ Rm×n , define its transpose A> ∈ Rn×m to be ( A> )i j = A ji for
all 1 6 i 6 m, 1 6 j 6 n. Show that
∀ x, y ∈ Rn : h Ax, yi = h x, A> yi,
where h·, ·i is the standard dot product.
On the other hand, for A ∈ Cm×n define its Hermitian transpose AH ∈ Cn×m to be
( AH )i j = Ā ji for all 1 6 i 6 n, 1 6 j 6 m. Show that
∀ x, y ∈ Cn : h Ax, yi = h x, AH yi,

where h·, ·i is the Hermitian dot product.

In fact, every operator on a finite-dimensional space has an adjoint; this essentially


follows from Theorem 8.
T HEOREM 9. Let V and W be finite-dimensional and T ∈ L(V, W). Then there
exists a unique operator T ∗ ∈ L(W, V) such that
(7) ∀ x ∈ V, y ∈ W : h Tx, yi = h x, T ∗ yi.

We call T ∗ the adjoint of T. We list some useful properties of the adjoint.


E XERCISE 60. Let T, S ∈ L(V) and a ∈ F. Show that
(1) ( T + S)∗ = T ∗ + S∗ and ( aT )∗ = āT ∗ .
(2) ( T ∗ )∗ = T.
(3) ( TS)∗ = S∗ T ∗ .

We can identify the following four “fundamental subspaces” associated with any
T ∈ L(V, W):
N ( T ), R( T ∗ ) ⊆ V and N ( T ∗ ), R( T ) ⊆ W.
There exists a particular relation between these subspaces.
T HEOREM 10. Let T ∈ L(V, W). Then
(1) N ( T ∗ T ) = N ( T ) and N ( TT ∗ ) = N ( T ∗ ).
(2) N ( T )⊥ = R( T ∗ ) and N ( T ∗ )⊥ = R( T ).
(3) V = N ( T ) ⊕ R( T ∗ ) and W = N ( T ∗ ) ⊕ R( T ).

As a corollary of the above result, we can conclude the following.


E XERCISE 61. Prove that for all T ∈ L(V, W), rank T ∗ = rank T.
E XERCISE 62. Let A ∈ Rm×n and b ∈ Rm be such that the equation Ax = b admits a
solution. Show that there is a solution of the form x = A>z where z ∈ Rm .
E XERCISE 63. Let A ∈ Rm×n . Show that the equation A>Ax = A>b has a solution for all
b ∈ Rm . Moreover, show that the solution is unique if and only if N ( A) = {0}.
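A minimal sketch of Exercise 63 in numpy (random A and b as placeholders): the normal equations A>Ax = A>b always have a solution, and when N(A) = {0} it coincides with the least-squares solution of Ax = b.

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((8, 3))                 # full column rank, so N(A) = {0}
    b = rng.standard_normal(8)

    x_normal = np.linalg.solve(A.T @ A, A.T @ b)    # solve the normal equations
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None) # least-squares solution
    print(np.allclose(x_normal, x_lstsq))           # True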

We end this part by discussing some important classes of operators that will be
studied in greater detail later.
D EFINITION 28. An operator T ∈ L(V) is self-adjoint if T ∗ = T, i.e., if
∀ x, y ∈ V : h Tx, yi = h x, Tyi.
In particular, a real self-adjoint matrix (A> = A) is called symmetric, and a complex
self-adjoint matrix (AH = A) is called Hermitian.
R EMARK 12. The inner product is essential in deciding if an operator is self-adjoint. An
operator that is self-adjoint w.r.t. one inner product need not be self-adjoint w.r.t. a different
inner product.
E XERCISE 64. Let P ∈ L(V) be such that P2 = P. Prove P is an orthogonal projection if
and only if P is self-adjoint.

E XERCISE 65. Let u ∈ Rn . Verify that the matrix A given by Ai j = ui u j , 1 6 i, j 6 n, is symmetric. On the other hand, verify that the matrix A given by Ai j = vi v̄ j , 1 6 i, j 6 n, is Hermitian, where v ∈ Cn .
E XERCISE 66. Let A ∈ Cn×n . Show that if
∀ x ∈ Cn : h x, AxiCn = 0,
where h·, ·iCn is the dot product on Cn , then A = 0.
On the other hand, give a counterexample to show that if A ∈ R2×2 , then the condition
∀ x ∈ R2 : h x, AxiR2 = 0,
where h·, ·iR2 is the dot product on R2 , does not imply that A = 0.

What is special about a self-adjoint operator? We will later see that such operators
can always be diagonalized in an ONB. In fact, they form a subclass of a larger set
of diagonalizable operators.
D EFINITION 29. An operator T ∈ L(V) is said to be normal if it commutes with its
adjoint: T ∗ T = TT ∗ .

Obviously, a self-adjoint operator is normal. In particular, any symmetric or Hermitian matrix is normal. Another important class of normal operators is the unitary operators.
D EFINITION 30. An operator T ∈ L(V) is said to be unitary if T ∗ T = TT ∗ = Id. A
real-valued unitary matrix is called an orthogonal matrix.

In fact, we need just one equality in the above definition if V is finite-dimensional.


This is because of the following observation.
E XERCISE 67. Let V be finite-dimensional. Show that T ∗ T = Id if and only if TT ∗ = Id.

The following is a useful characterization of unitary operators.


E XERCISE 68. Let V be finite-dimensional and let T ∈ L(V). Show that the following are
equivalent:
(1) T is unitary, i.e., T ∗ T = TT ∗ = Id.
(2) T is an isomorphism and T −1 = T ∗ .
(3) ∀ x ∈ V : k Txk = k xk.
(4) ∀ x, y ∈ V : h Tx, Tyi = h x, yi.
(5) T maps any ONB into a (possibly different) ONB.
E XERCISE 69. Show that the following operators are unitary.

(1) T ∈ L(R2 ) given by T ( x1 , x2 ) = ( x2 , − x1 ).


(2) A ∈ R2×2 given by
 
A = [ cos θ   sin θ ; − sin θ   cos θ ].
(3) A ∈ C2×2 given by
A = (1/√2) [ e^{ιθ}   −e^{−ιθ} ; e^{ιθ}   e^{−ιθ} ].
CHAPTER 6

OPERATOR AND MATRIX NORMS

Similar to vectors, we can define inner products and norms for matrices. After all,
the space of matrices (of a fixed shape) is finite-dimensional, and we know that
such spaces are automatically an inner product space.
We first look at the concept of operator norm, which can even be defined for
operators on infinite-dimensional spaces.
D EFINITION 31. Let (V, k·kV ) and (W, k·kW ) be two normed spaces. The operator (or
induced) norm of T ∈ L(V, W) is defined as
(8) k T k = max { k TxkW : x ∈ V, k xkV = 1 }.

E XERCISE 70. Verify that (8) is a valid norm on L(V, W). Moreover, show that
(9) ∀ T, S ∈ L(V) : k TSk 6 k T k k Sk.
R EMARK 13.

(1) The sup in (8) is finite if V is finite-dimensional but need not be finite if V is
infinite-dimensional; if the sup is finite, we call T a bounded operator.
(2) An important special case is when T is a matrix A and the norms are Euclidean.
In this case, we call (8) the spectral norm and is denoted by k Ak2 . We will later
see that this corresponds to the largest singular value of a matrix.

Generally, a “matrix norm” is a standard norm on the space of matrices with the
additional submultiplicative property (9). We record this for completeness.
D EFINITION 32. A map k·k : Cn×n → R is called a matrix norm if it satisfies the
following properties.

(1) ∀ A ∈ Cn×n : k Ak > 0 and k Ak = 0 ⇐⇒ A = 0.


(2) ∀ A ∈ Cn×n , c ∈ C : kcAk = |c| · k Ak.
(3) ∀ A, B ∈ Cn×n : k A + Bk 6 k Ak + k Bk.
(4) ∀ A, B ∈ Cn×n : k ABk 6 k Ak k Bk.

We already know from Exercise 70 that the spectral norm is a matrix norm. Another
important example is the Frobenius norm.
D EFINITION 33. The Frobenius norm of A ∈ Cm×n is defined as
(10) k Ak F = ( ∑_{i=1}^{m} ∑_{j=1}^{n} | ai j |^2 )^{1/2} .


E XERCISE 71. Verify that (10) is a matrix norm in the sense of Definition 32 but is not an
induced norm in the sense of (8).
E XERCISE 72. Let A ∈ Cm×n be of rank r. Prove that

k Ak2 6 k Ak F 6 √r k Ak2 .
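A minimal numerical check of Exercise 72 (numpy, random matrix as a placeholder): np.linalg.norm computes the spectral norm with ord=2 and the Frobenius norm with ord='fro'.

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((6, 4))
    r = np.linalg.matrix_rank(A)                # 4 (generically)

    spec = np.linalg.norm(A, 2)                 # spectral norm (largest singular value)
    frob = np.linalg.norm(A, 'fro')             # Frobenius norm
    print(spec <= frob <= np.sqrt(r) * spec)    # True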

We next look at some other examples of matrix norms.


E XERCISE 73. For A ∈ Cm×n , define
k Ak1 = max_{1 6 j 6 n} ∑_{i=1}^{m} | ai j |   and   k Ak∞ = max_{1 6 i 6 m} ∑_{j=1}^{n} | ai j | .

Verify that these are norms on Cm×n and that they are induced by norms on Cm and Cn .

We know that the series 1 + a + a2 + · · · converges for any a ∈ C if | a| < 1. The


following is a generalization of this result for matrices. A sequence of matrices
A1 , A2 , . . . is said to converge to A if k An − Ak → 0 as n → ∞ (the choice of matrix
norm does not matter).
E XERCISE 74. Let A ∈ Cn×n such that k Ak < 1, where k·k is an induced matrix norm.
Prove that I − A is invertible, and
k( I − A)−1 k 6 1/(1 − k Ak).
Moreover, prove that
lim_{n→∞} ( I + A + A^2 + · · · + A^n ) = ( I − A)−1 .
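A minimal numerical check of Exercise 74 (numpy): take a matrix with spectral norm less than 1 and compare the partial sums I + A + · · · + A^n with (I − A)^{−1}.

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((4, 4))
    A *= 0.5 / np.linalg.norm(A, 2)              # rescale so that ||A||_2 = 0.5 < 1

    S = np.eye(4)
    term = np.eye(4)
    for _ in range(60):                          # partial sum I + A + ... + A^60
        term = term @ A
        S += term

    inv = np.linalg.inv(np.eye(4) - A)
    print(np.allclose(S, inv))                   # True (the tail is of order 0.5^61)
    print(np.linalg.norm(inv, 2) <= 1 / (1 - 0.5) + 1e-9)   # the bound from the exercise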

The following result tells us that the set of invertible matrices in Cn×n form an open
set.
E XERCISE 75. Let A ∈ Cn×n be invertible and k · k be a norm on Cn×n . Show (without
using determinants) that there exists δ > 0 such that B is invertible if k B − Ak < δ.

We will later need the following result to establish the Cayley-Hamilton theorem.
E XERCISE 76. Let p be a polynomial with complex coefficients and k · k be a norm on Cn×n .
Let A ∈ Cn×n and ( An ) be a sequence of matrices in Cn×n such that k An − Ak → 0.
Prove that
lim k p( An ) − p( A)k = 0.
n→∞
CHAPTER 7

TRACE AND DETERMINANT

In this part, we will talk about the trace and determinant of a matrix. These are
useful for simplifying calculations involving matrices.
Unlike the previous chapters, we will depart from generic linear transforms and
deal exclusively with matrices. In particular, we will use determinants to character-
ize invertible matrices and find eigenvalues of matrices. Unless otherwise stated,
the underlying field F is R or C.
D EFINITION 34. The trace of A ∈ Fn×n is tr( A) = A11 + · · · + Ann .

We can view trace as a linear functional on the space of matrices, i.e., tr ∈ L(Fn×n , F).
In particular, we have the following properties.
E XERCISE 77. Show that
(1) tr( AB) = tr( BA) for all A ∈ Fm×n and B ∈ Fn×m .
(2) k Ak2F = tr( AH A) for all A ∈ Fm×n .
(3) kuk2 = tr(uu> ) for all u ∈ Rn .
(4) kuk2 = tr(uuH ) for all u ∈ Cn .
(5) A ∼ B ⇒ tr( A) = tr( B) for all A, B ∈ Fn×n .
E XERCISE 78. Let A and B be two matrix representations of an operator T. Show that
tr( A) = tr( B). Thus, we can define the trace of T as tr( A), where A = [ T ].

The definition of determinants is more complex. We will do so using permutations.


A permutation on [n] = {1, 2, . . . , n} is a reordering of the elements of [n]. We can
also view them as bijections on [n].
D EFINITION 35. A permutation on [n] is a bijection σ : [n] → [n]. We will use Πn to
denote the set of permutations on [n].
E XAMPLE 10. Π1 = {(1)} and Π2 = {(1, 2), (2, 1)}.
E XERCISE 79. Compute |Πn |.

A permutation can be decomposed into elementary permutations.


D EFINITION 36. A permutation τ ∈ Πn is called a transposition if there exists p, q ∈ [n], p 6= q, such that τ ( p) = q, τ (q) = p, and τ (i ) = i for all i ∉ { p, q}.

We will use Tn to denote the set of transpositions on [n].



E XAMPLE 11. (2, 1) and (3, 2, 1) are transpositions but (2, 3, 1) and (3, 1, 2) are not
transpositions.
E XERCISE 80. Compute |Tn | and compare with |Πn |.

The reason we are interested in transpositions is the following result.


E XERCISE 81. Show that any permutation is a composition of transpositions, i.e.,
∀ σ ∈ Πn , ∃ τ1 , . . . , τk ∈ Tn : σ = τ1 ◦ · · · ◦ τk .
E XAMPLE 12. We can write (3, 1, 2) as the transposition (1, 2, 3) → (1, 3, 2) followed
by the transposition (1, 3, 2) → (3, 1, 2).

There are different ways to express a permutation using transpositions, e.g., applying the same transposition an even number of times does not make any difference.
What is invariant is the parity of the number of transpositions in the decomposition.
E XERCISE 82. Let σ ∈ Πn . Suppose there exists τ1 , . . . , τk ∈ Tn and ρ1 , . . . , ρ` ∈ Tn
such that
σ = τ1 ◦ · · · ◦ τk = ρ1 ◦ · · · ◦ ρ` .
Show that both k and ` are even or both are odd. Thus, every σ ∈ Πn has even parity (even
permutation) or odd parity (odd permutation).

We encode the above structure using a ±1 signature.


D EFINITION 37. The sign of a permutation σ ∈ Πn is defined as
sign(σ ) = +1 if σ is even, and −1 if σ is odd.

E XERCISE 83. Prove the following results.


(1) sign(σρ) = sign(σ ) sign(ρ) for all σ , ρ ∈ Πn .
(2) Πn = {σρ : ρ ∈ Πn } = {ρσ : ρ ∈ Πn } for all σ ∈ Πn .
(3) Πn = {σ −1 : σ ∈ Πn }.
D EFINITION 38. The determinant of A = ( ai j ) ∈ Fn×n is defined as
(11) det( A) = ∑_{σ∈Πn} sign(σ ) ∏_{j=1}^{n} aσ ( j), j .

We collect some useful properties of determinants.


E XERCISE 84. Let A, B ∈ Fn×n . Deduce the following properties from definition (11).
(1) det( A) = ∑_{σ∈Πn} sign(σ ) ∏_{i=1}^{n} ai,σ (i) .
(2) det( In ) = 1.
(3) ∀ c ∈ F : det(cA) = c^n det( A).
(4) det( A> ) = det( A).
(5) det( AB) = det( A) det( B).
(6) det( A) = 0 ⇐⇒ A is not invertible.
(7) A ∼ B ⇒ det( A) = det( B).
(8) A has two identical columns (or rows) ⇒ det( A) = 0.
(9) A unitary ⇒ | det( A)| = 1.
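Definition (11) can be evaluated directly for small matrices by enumerating permutations; a minimal sketch (Python, using itertools and comparing with numpy's determinant):

    import itertools
    import numpy as np

    def sign(perm):
        """Sign of a permutation in one-line notation (tuple of 0-based indices)."""
        s, seen = 1, [False] * len(perm)
        for i in range(len(perm)):
            if not seen[i]:
                # A cycle of length L contributes (-1)^(L-1) to the sign.
                j, length = i, 0
                while not seen[j]:
                    seen[j] = True
                    j = perm[j]
                    length += 1
                s *= (-1) ** (length - 1)
        return s

    def det_by_permutations(A):
        n = A.shape[0]
        total = 0.0
        for perm in itertools.permutations(range(n)):
            prod = 1.0
            for j in range(n):
                prod *= A[perm[j], j]          # a_{sigma(j), j} as in (11)
            total += sign(perm) * prod
        return total

    A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
    print(det_by_permutations(A), np.linalg.det(A))   # both 18.0 (up to rounding)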
CHAPTER 8

DIAGONALIZATION

In this chapter, we will work with a finite-dimensional vector space V over the field
C. We will use PC to denote the space of polynomials with complex coefficients.
We wish to understand the conditions for an operator to be “diagonalizable”.
D EFINITION 39. An operator T ∈ L(V) is said to be diagonalizable if there exists a basis
v1 , . . . , vn of V and λ1 , . . . , λn ∈ C such that
∀1 6 i 6 n : Tvi = λi vi .
R EMARK 14. The matrix representation [ T ] with respect to v1 , . . . , vn is diagonal:
 
A = diag(λ1 , . . . , λn ), i.e., the (n × n) matrix with λ1 , . . . , λn on the diagonal and zeros elsewhere.
In other words, a diagonalizable matrix is similar to a diagonal matrix.

Are all operators diagonalizable?


E XERCISE 85. Let T ∈ L(V) be a nonzero operator such that T n = 0 for some n > 2.
Show that T cannot be diagonalized. In particular, show that the matrix
 
A = [ 0  1 ; 0  0 ]
cannot be diagonalized as an operator A ∈ L(C2 ).
D EFINITION 40. We say that λ ∈ C is an eigenvalue of T ∈ L(V) if there exists a nonzero
v ∈ V such that Tv = λv. We call v an eigenvector in this case.
E XAMPLE 13.

(1) For the zero operator, every nonzero vector is an eigenvector, and 0 is the only eigenvalue.
(2) For the identity operator, every nonzero vector is an eigenvector, and 1 is the only eigenvalue.
(3) The eigenvalues of an (n × n) diagonal matrix are the diagonal elements, and
eigenvectors are the columns of the identity matrix In .
E XERCISE 86. Let T be such that T k = 0 for some k > 2. Find the eigenvalues of T.

The following relates eigenvalues with determinants. Recall that a square matrix is
not invertible if and only if det( A) = 0.

E XERCISE 87. Let T ∈ L(V) and λ ∈ C. Show that λ is an eigenvalue of T if and only if
T − λId is not injective. In particular, if A = [ T ] ∈ Cn×n , then show that λ ∈ C is an
eigenvalue of T if and only if A − λIn is not invertible, or equivalently, det( A − λIn ) = 0.

Using the permutation-based formula for determinants, we can deduce the following.
E XERCISE 88. Let A ∈ Cn×n . Show that p(t) = det( A − tIn ) is a polynomial of the
form p(t) = (−1)^n t^n + a1 t^{n−1} + · · · + an .

The polynomial in question is an important object in the study of eigenvalues.


D EFINITION 41. Let A = [ T ] ∈ Cn×n be a matrix representation of T ∈ L(V). We call
p A (t) = det( A − tIn ) the characteristic polynomial of T.
E XERCISE 89. If A and B are the matrix representations of some operator T, then p A = p B .
R EMARK 15. An operator on a vector space over the field R is not guaranteed to have an
eigenvalue. Indeed, for the example
 
A = [ 0  −1 ; 1  0 ],
the characteristic polynomial is p A (t) = t2 + 1, which has no real roots. However, if we
switch to F = C, then A has two eigenvalues.

We will later need the following result called the Cayley-Hamilton theorem. This
requires the concept of a matrix polynomial. For any polynomial q(t) = a0 + a1 t +
· · · + am t^m , ai ∈ C, and A ∈ Cn×n , we define the matrix q( A) ∈ Cn×n as follows:
q( A) = a0 In + a1 A + · · · + am A^m .
T HEOREM 11. Let A ∈ Cn×n . Then p A ( A) = 0.
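A minimal numerical check of Theorem 11 (numpy, random matrix as a placeholder): np.poly returns the coefficients of the characteristic polynomial det(tI − A) (this differs from p_A(t) = det(A − tI) only by the factor (−1)^n, which does not affect whether the matrix polynomial vanishes), and Horner's rule evaluates it at A.

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((4, 4))

    coeffs = np.poly(A)           # coefficients of det(t*I - A), highest power first

    # Evaluate the matrix polynomial with Horner's rule.
    P = np.zeros_like(A)
    for c in coeffs:
        P = P @ A + c * np.eye(4)

    print(np.allclose(P, np.zeros((4, 4)), atol=1e-8))   # True: p_A(A) = 0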
E XERCISE 90. Let A ∈ Cn×n and q ∈ PC . Show that if λ is an eigenvalue of A, then
q(λ ) is an eigenvalue of q( A). Conversely, if µ is an eigenvalue of q( A), then show that
µ = q(λ ) for some eigenvalue λ of A.

Use the above result to establish the Cayley-Hamilton theorem for diagonalizable
matrices.
E XERCISE 91. Establish Theorem 11 for a diagonalizable matrix.

We have seen that the degree of the characteristic polynomial of an (n × n) matrix is


n. It follows from the fundamental theorem of algebra that the matrix has precisely
n complex eigenvalues, including repetitions.
D EFINITION 42. Let λ ∈ C be an eigenvalue of T. The algebraic multiplicity of λ, denoted a(λ ), is the number of times λ appears as a root of the characteristic polynomial.

In other words, we can write the characteristic polynomial as


p[ T ] (t) = (t − λ )^{a(λ )} q(t),   q(λ ) 6= 0.

That an operator T ∈ L(V) can have at most n = dim(V) distinct eigenvalues can
be deduced from the following observation.

E XERCISE 92. Let λ1 , . . . , λk be distinct eigenvalues of T and let v1 , . . . , vk be the corresponding eigenvectors. Then v1 , . . . , vk are linearly independent.

We record the previous fact and an important consequence of it — a sufficient


condition for an operator to be diagonalizable.
T HEOREM 12. Let V be finite-dimensional. Then any T ∈ L(V) can have at most dim V
distinct eigenvalues. In particular, T is diagonalizable if it has dim V distinct eigenvalues.

We remark that the last condition is not necessary for diagonalizability. Indeed,
the zero and identity transform have just one distinct eigenvalue but are trivially
diagonalizable.
We will now develop a necessary and sufficient condition for an operator to be
diagonalizable. We need the concepts of algebraic and geometric multiplicity in
this regard.
D EFINITION 43. Let λ ∈ C be an eigenvalue of T. We say that the algebraic multiplicity
of λ is k if λ appears k times as a root of the characteristic polynomial. More precisely,
we can write the characteristic polynomial as (t − λ )k q(t) for some polynomial q where
q(λ ) 6= 0.

In particular, if λ1 , . . . , λk ∈ C are the distinct eigenvalues of an operator, and


a1 , . . . , ak are the corresponding algebraic multiplicities, then the characteristic
polynomial must be of the form
p(t) = (−1)^n (t − λ1 )^{a1} · · · (t − λk )^{ak} .
D EFINITION 44. Let λ ∈ C be an eigenvalue of T. We call N ( T − λ Id) the eigenspace of λ, and its geometric multiplicity is defined as g(λ ) = nullity( T − λ Id).
E XERCISE 93. Let λ1 , . . . , λk ∈ C be the distinct eigenvalues of T ∈ L(V). Let u1 , . . . , uk
be such that ui ∈ N ( T − λi Id) for i = 1, . . . , k. Prove that
u1 + · · · + uk = 0 ⇒ u1 = · · · = uk = 0.

A natural question is whether there is a relation between algebraic and geometric


multiplicity of an eigenvalue.
E XERCISE 94. Show that the algebraic multiplicity of an eigenvalue is at least as large as
its geometric multiplicity. Moreover, using the example
 
(12) A = [ 0  1 ; 0  0 ],
show that the algebraic multiplicity can be larger than the geometric multiplicity.
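A minimal numerical look at (12) (numpy and scipy): the characteristic polynomial is t², so the eigenvalue 0 has algebraic multiplicity 2, while the null space of A − 0·I is one-dimensional, i.e., the geometric multiplicity is 1.

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[0.0, 1.0],
                  [0.0, 0.0]])

    print(np.linalg.eigvals(A))            # [0. 0.]: 0 has algebraic multiplicity 2
    print(null_space(A).shape[1])          # 1: geometric multiplicity of 0
    print(null_space(A))                   # a basis vector proportional to e1 = (1, 0)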

We say that an operator is “defective” if the algebraic multiplicity of one of its


eigenvalues is strictly larger than its geometric multiplicity. The following result
tells us that these are precisely the operators that cannot be diagonalized.
T HEOREM 13. Let λ1 , . . . , λk ∈ C be the distinct eigenvalues of T ∈ L(V). Then the
following are equivalent.
(1) g(λ1 ) + · · · + g(λk ) = dim V.

(2) ∀ 1 6 i 6 k : a(λi ) = g(λi ).


(3) T is diagonalizable.

As a corollary of Theorem 13, we get the sufficient condition in Theorem 12.


We conclude this section with a useful result called the Schur decomposition. We
have seen that not all matrices are diagonalizable. However, every matrix is similar
to an upper triangular matrix.
T HEOREM 14. For any A ∈ Cn×n , there exists a unitary matrix U ∈ Cn×n and an
upper triangular matrix Λ ∈ Cn×n such that A = UΛU H .

This is called the Schur decomposition. It relies on the existence of eigenvalues of a


complex matrix. By upper triangular, we mean Λi j = 0 for i > j.
E XERCISE 95. Show that the diagonal elements of Λ are the eigenvalues of A.

We will establish the Schur decomposition using induction on n. We will use the
following base case.
E XERCISE 96. Establish the Schur decomposition for any A ∈ C2×2 .

We will next deduce the Cayley-Hamilton theorem using a limiting argument and
the Schur decomposition.
L EMMA 2. Diagonalizable matrices are dense in Cn×n .

By dense, we mean that for any A ∈ Cn×n , there exists a sequence of diagonalizable
matrices An such that k An − Ak → 0 as n → ∞, where k · k is some matrix norm.
Using Lemma 2 and Exercise 76, we can establish the Cayley-Hamilton theorem for
general matrices.
CHAPTER 9

EIGEN DECOMPOSITION

Recall the concept of a self-adjoint operator on an inner product space. This includes,
in particular, symmetric and Hermitian matrices. They are, in turn, a particular case
of normal operators, which include skew-adjoint and unitary operators. We will
show that these operators can be diagonalized in an orthonormal (or unitary) basis.
Notably, we will develop this theory without using determinants, characteristic
equations, or the fundamental theorem of algebra.
T HEOREM 15. Let V be a complex, finite-dimensional inner product space, and let T ∈
L(V) be self-adjoint. Then there exists an orthonormal basis of eigenvectors of T.
R EMARK 16.

(1) We can deduce that the eigenvalues of T are real.


(2) The above result is called the spectral decomposition of T.
(3) That the eigenvalues are real is a bonus and is not strictly required for T to be
diagonalizable.
(4) Specializing the above result to a symmetric matrix A, we can conclude the
existence of a diagonal matrix Λ = diag(λ1 , . . . , λn ) and an orthogonal matrix
Q such that
(13) A = QΛQ> .
On the other hand, if A ∈ Cn×n is Hermitian, then we can write
(14) A = UΛU H ,
where U is unitary and Λ = diag(λ1 , . . . , λn ), λi ∈ R. We refer to (13) and
(14) as the eigendecomposition of A.
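For concreteness, here is a small NumPy sketch (our own addition, with a randomly generated symmetric matrix) of the eigendecomposition (13): numpy.linalg.eigh returns the real eigenvalues and an orthonormal eigenbasis, and we verify A = QΛQ> numerically.

import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                               # symmetric test matrix

lam, Q = np.linalg.eigh(A)                      # real eigenvalues, orthonormal eigenvectors
print(np.allclose(A, Q @ np.diag(lam) @ Q.T))   # A = Q Lam Q^T
print(np.allclose(Q.T @ Q, np.eye(5)))          # Q is orthogonal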

Theorem 15 can be deduced via induction on the dimension of the space, along
with the following result. That the eigenvalues are real is a part of the proof.
L EMMA 3. Every self-adjoint operator has an eigenvector with a real eigenvalue.

The induction process is based on the concept of an invariant subspace.


D EFINITION 45. A subspace U ⊂ V is an invariant subspace of T ∈ L(V) if Tu ∈ U for
all u ∈ U.
E XERCISE 97. Let T ∈ L(V) be self-adjoint and U be an invariant subspace of T. Define
TU ∈ L(U), the restriction of T on U, as follows:
∀u ∈ U : TU u = Tu.
Show that TU ∈ L(U) is self-adjoint.

The archetypal examples of invariant subspaces are an eigenspace and its orthogonal
complement.
E XERCISE 98. Let q be an eigenvector of a self-adjoint operator T. Show that span{q} and
span{q}⊥ are invariant subspaces of T.

We can extend the spectral theorem to the larger class of normal operators. However,
the eigenvalues are no longer guaranteed to be real.
T HEOREM 16. Let V be a complex, finite-dimensional inner product space, and let T ∈
L(V) be a normal operator. Then there exists an orthonormal basis of eigenvectors of T.

Theorem 16 can be deduced from the following observation, which is an important
result in itself.
E XERCISE 99. Let V be a finite-dimensional inner product space. Suppose T, S ∈ L(V) are
self-adjoint and TS = ST. Then T and S can be diagonalized in a common orthonormal
basis.

We next look at a variational characterization of eigenvalues called the min-max theorem.
It allows us to define eigenvalues without the need to associate eigenvectors
with them.
T HEOREM 17. Let V be a finite-dimensional inner product space. Let T be self-adjoint and
let λ1 ≥ λ2 ≥ · · · ≥ λn be its eigenvalues. Then, for all 1 ≤ k ≤ n,
λk = max_{dim U = k} min { ⟨x, Tx⟩ : x ∈ U, ‖x‖ = 1 },
where U ranges over the subspaces of V. In particular,
λ1 = max_{x ∈ V, ‖x‖ = 1} ⟨x, Tx⟩ and λn = min_{x ∈ V, ‖x‖ = 1} ⟨x, Tx⟩.

Using Theorem 17, establish the perturbation result in Problem 17.


We conclude this part by discussing a particular class of self-adjoint operators that
come up in various applications.
D EFINITION 46. A self-adjoint operator T is said to be positive semidefinite (resp. positive
definite) if the eigenvalues of T are nonnegative (resp. positive).
E XAMPLE 14. Following are some examples of positive semidefinite (PSD) and positive
definite (PD) operators.
(1) The zero operator is PSD, while the identity operator is PD.
(2) For all q ∈ V, the operator T ( x) = h x, qi q is PSD.
(3) Hessian matrix of a twice continuously differentiable convex function.
(4) Gram and covariance matrices associated with a system of vectors.

The following is a useful characterization of PSD and PD operators.


E XERCISE 100. Let T ∈ L(V) be self-adjoint. Show that
T is PSD ⇐⇒ ∀ x ∈ V : ⟨x, Tx⟩ ≥ 0,
and
T is PD ⇐⇒ ∀ x ∈ V \ {0} : ⟨x, Tx⟩ > 0.
CHAPTER 10

SINGULAR VALUE DECOMPOSITION

Let A ∈ Cm×n . The question of diagonalization does not arise if m ≠ n (rectangular
matrix). In fact, we have even seen examples of square matrices which cannot be
diagonalized. The next best is the singular value decomposition (SVD) of A.
T HEOREM 18. Let A ∈ Cm×n and r = rank A ≥ 1. Then there exist orthonormal
vectors v1 , . . . , vr ∈ Cn , orthonormal vectors u1 , . . . , ur ∈ Cm , and σ1 ≥ · · · ≥ σr > 0,
such that
(15) A = σ1 u1 v1^H + · · · + σr ur vr^H .

R EMARK 17.

(1) By extending the orthonormal vectors into orthonormal bases, we can write
(16) A = UΣV^H ,
where U ∈ Cm×m and V ∈ Cn×n are unitary matrices, and Σ ∈ Rm×n is defined by
Σi j = σi for 1 ≤ i = j ≤ r, and Σi j = 0 otherwise.
We call (15) the partial SVD of A and (16) the full SVD.
(2) We call σ1 , . . . , σr the singular values, u1 , . . . , ur the left singular vectors, and
v1 , . . . , vr the right singular vectors.
(3) The SVD and eigendecomposition of A need not coincide even if A is Hermit-
ian. Indeed, while eigenvalues can take negative values, singular values are
nonnegative by construction. However, they are the same if A is PSD.
E XERCISE 101. Prove that for A ∈ Cm×n ,
σ1 ( A) = max { ‖Ax‖Cm : ‖x‖Cn = 1 },
where ‖·‖Cm and ‖·‖Cn are the Euclidean norms on Cm and Cn .

In other words, σ1 ( A) = k Ak2 , the spectral norm induced by the standard Eu-
clidean norms on Cn and Cm .
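For a quick numerical check (our own example, not from the text), the sketch below computes the full SVD with numpy.linalg.svd, verifies the reconstruction (16), and confirms that σ1 coincides with the spectral norm.

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

U, s, Vh = np.linalg.svd(A)                # full SVD: A = U @ Sigma @ Vh, with Vh = V^H
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)                 # embed the singular values into a 5x3 matrix

print(np.allclose(A, U @ Sigma @ Vh))      # A = U Sigma V^H
print(np.isclose(s[0], np.linalg.norm(A, 2)))   # sigma_1 equals the spectral norm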
Consider the full SVD (16). Let v1 , . . . , vn ∈ Cn be the columns of V and u1 , . . . , um ∈
Cm be the columns of U. The following result relates these to the four fundamental
subspaces of A.
E XERCISE 102. Show that R( A) = span(u1 , . . . , ur ) and N ( A) = span(vr+1 , . . . , vn ).

Similar to Theorem 17, we have the following min-max characterization.



E XERCISE 103. Show that, for all 1 ≤ k ≤ r,
σk = max_{dim U = k} min { ‖Ax‖2 : x ∈ U, ‖x‖2 = 1 },
where U ranges over the subspaces of Cn .

We have the following perturbation result as an application of the min-max characterization.
E XERCISE 104. Let A ∈ Cn×n . Let rank ( A) = r and σ1 > · · · > σr be the singular
values of A. If k Ek2 < σr , then prove that
rank ( A + E) > r.
This means that a “small” perturbation cannot reduce the rank.

We can use the SVD to define two important matrix norms: the spectral norm
(largest singular value) and the nuclear norm (sum of the singular values).
E XERCISE 105. Let (15) be the SVD of A ∈ Cm×n . Define ‖A‖2 = σ1 and ‖A‖∗ =
σ1 + · · · + σr . Verify that these are matrix norms on Cm×n . Moreover, prove that they are
dual to each other in the sense that
‖A‖∗ = max { tr( X>A) : ‖X‖2 ≤ 1 },
and
‖A‖2 = max { tr( X>A) : ‖X‖∗ ≤ 1 },
where the variable is X ∈ Cm×n .

We can use the SVD for low-rank approximation. In particular, we have the fol-
lowing result which states that the optimal rank-k approximation of a matrix is
provided by a k-term truncation of (15).
E XERCISE 106. Let (15) be the SVD of A ∈ Cm×n and let k ≤ r. Define
Ak = σ1 u1 v1^H + · · · + σk uk vk^H .
Prove that
‖Ak − A‖∗ = min { ‖X − A‖∗ : rank ( X ) ≤ k },
where ‖ · ‖∗ is the nuclear norm.
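The following sketch (an illustration we add, with random data) builds the rank-k truncation Ak from the SVD and checks that a few arbitrary rank-k matrices do no better in the nuclear norm, in the spirit of Exercise 106.

import numpy as np

def truncated_svd(A, k):
    # rank-k truncation: keep the k largest singular values/vectors
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vh[:k, :]

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 6))
k = 2
Ak = truncated_svd(A, k)

nuc = lambda M: np.linalg.norm(M, "nuc")    # nuclear norm = sum of singular values
best = nuc(Ak - A)
for _ in range(100):
    X = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))  # a random rank-k matrix
    assert nuc(X - A) >= best - 1e-9        # never beats the truncated SVD
print("rank-%d truncation error (nuclear norm): %.4f" % (k, best))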
CHAPTER 11

POSITIVE MATRICES

In this chapter, we will work with real square matrices. However, for eigenvalue
computation, we will consider them as elements of Cn×n . We say that A is positive
(A > 0) if each Ai j > 0 and that A is nonnegative (A ≥ 0) if each Ai j ≥ 0. Similarly,
we will use the symbols x > 0 and x ≥ 0 for vectors. We will use σ ( A) for the
spectrum of A, the set of eigenvalues of A. The spectral radius of A is defined as

ρ( A) = max { |λ | : λ ∈ σ ( A) }.

To motivate the Perron-Frobenius theorem, it would be instructive to look at a 2 × 2
matrix,
(17) A = [ a b ; c d ],
where a, b, c, and d are positive. The eigenvalues of A are
( a + d ± ∆)/2, where ∆ = (( a − d)^2 + 4bc)^{1/2} .
Since ∆ > 0, we can conclude that
• r = ρ( A) = ( a + d + ∆)/2 is positive and r ∈ σ ( A).
• r is the only eigenvalue such that |λ | = r.
• The algebraic multiplicity of r is one.
• The eigenvector corresponding to r is p = ( a − d + ∆, 2c), a positive vector.
• The other eigenvector, (2b, d − a − ∆), has a negative component.

In fact, any positive matrix has a positive eigenvalue (Perron root) and a correspond-
ing positive eigenvector (Perron vector). Moreover, any nonnegative eigenvector
must be a multiple of the Perron vector.
T HEOREM 19. Let A > 0 and r = ρ( A). Then
(1) r > 0 and r is an eigenvalue of A.
(2) r is the only eigenvalue on the spectral circle {λ ∈ C : |λ | = r}.
(3) There exists p > 0 such that Ap = rp.
(4) The algebraic (hence the geometric) multiplicity of r is one.
(5) If x is an eigenvector of A and x ≥ 0, then x = cp for some c > 0.
R EMARK 18. We call r the Perron root of A. The Perron vector is the unique vector p > 0
such that Ap = rp and ‖p‖1 = 1.
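To see the theorem in action, here is a short NumPy sketch (our own example, with a made-up positive matrix) that approximates the Perron root and Perron vector by power iteration and compares the root against the spectral radius obtained from numpy.linalg.eigvals.

import numpy as np

def perron(A, iters=200):
    # power iteration for a positive matrix; p is normalized so that ||p||_1 = 1
    p = np.ones(A.shape[0])
    for _ in range(iters):
        q = A @ p
        p = q / np.linalg.norm(q, 1)
    r = np.linalg.norm(A @ p, 1)        # since A p ~ r p with p >= 0 and ||p||_1 = 1
    return r, p

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 2.0]])
r, p = perron(A)
print(r, max(abs(np.linalg.eigvals(A))))   # Perron root equals the spectral radius
print(p)                                   # entrywise positive Perron vector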

Among other things, we will need the following results to establish Theorem 19.

E XERCISE 107. Let ‖·‖ be an induced matrix norm on Cn×n . Prove that
∀ A ∈ Cn×n : ρ( A) = lim_{k→∞} ‖A^k ‖^{1/k} .
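A quick numerical illustration of this formula (added by us, with a random matrix and the spectral norm as the chosen induced norm):

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
rho = max(abs(np.linalg.eigvals(A)))        # spectral radius of A

Ak = np.eye(4)
for k in range(1, 61):
    Ak = Ak @ A
    if k % 20 == 0:
        print(k, np.linalg.norm(Ak, 2) ** (1.0 / k), rho)   # approaches rho as k grows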

E XERCISE 108. Let A ∈ Cn×n be such that ρ( A) < 1. Then, for any matrix norm ‖ · ‖,
lim_{k→∞} ‖A^k ‖ = 0.

P ROPOSITION 6. Let A > 0 and ρ( A) = 1. Suppose q > 0 is such that q ≤ Aq. Then
q = Aq.

What can be said if A is not positive but nonnegative, i.e., has some zeros? Clearly,
r cannot be guaranteed to be positive since this class includes the 0 matrix. A more
interesting example is
 
0 1
(18) A= .
0 0

In this case, σ ( A) = {0}, ρ( A) = 0, the algebraic multiplicity of 0 is two, and its
eigenvectors are multiples of (1, 0). Thus, (18) violates Theorem 19 on multiple
counts. However, we can guarantee the following.

E XERCISE 109. Let A ≥ 0 and r = ρ( A). Prove that
(1) r ∈ σ ( A).
(2) There exists x ≥ 0, x ≠ 0, such that Ax = rx.

This is as far as Theorem 19 can be generalized without additional assumptions.
Frobenius noticed that, depending on the “position of the zeros”, one can recover
almost all the properties in Theorem 19. For example, consider the matrix
 
0 1
(19) A= .
1 0

In this case, σ ( A) = {−1, 1} and ρ( A) = 1. Moreover, the positive vector p = (1, 1)
corresponds to the eigenvalue λ = 1. Thus, (19) satisfies properties 1, 3, and 4 in
Theorem 19.

D EFINITION 47. A matrix A is said to be reducible if there exists a permutation matrix P,
square matrices X and Z (each of size at least 1 × 1), and a matrix Y such that
(20) P>AP = [ X Y ; 0 Z ].
Otherwise, A is said to be irreducible.

The pre-multiplication with P> and post-multiplication with P have the effect of
reordering the rows and columns of A. Thus, (18) is reducible, while (19) is irre-
ducible.

E XERCISE 110. Prove that the following (n × n) tridiagonal matrix, with ones on the main
diagonal, the subdiagonal, and the superdiagonal, and zeros elsewhere, is irreducible:
[ 1 1 0 ... 0 ]
[ 1 1 1 ... 0 ]
[ 0 1 1 ... 0 ]
[ ... ... ... ]
[ 0 ... 0 1 1 ]

We can characterize the irreducibility of A ∈ Rn×n based on its graphical representation.
Let G( A) be a directed graph with nodes {1, 2, . . . , n} that has an edge
i → j if and only if Ai j > 0. A path from node i to node j is a sequence of nodes
i = i0 , i1 , . . . , im = j such that ik → ik+1 for all k = 0, . . . , m − 1. We say that G( A) is
strongly connected if, for every pair of nodes (i, j), there exists a path from i to j.
E XERCISE 111. Prove that A is irreducible if and only if G( A) is strongly connected.
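As a small computational check (our own addition), the sketch below tests irreducibility of a nonnegative matrix using the criterion (I + A)^{n−1} > 0 from Exercise 115 below, and applies it to the matrices (18) and (19).

import numpy as np

def is_irreducible(A):
    # a nonnegative matrix A is irreducible iff (I + A)^(n-1) has all positive entries
    n = A.shape[0]
    M = np.linalg.matrix_power(np.eye(n) + A, n - 1)
    return bool(np.all(M > 0))

print(is_irreducible(np.array([[0.0, 1.0], [0.0, 0.0]])))   # matrix (18): False, reducible
print(is_irreducible(np.array([[0.0, 1.0], [1.0, 0.0]])))   # matrix (19): True, irreducible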
T HEOREM 20. Let A ≥ 0 be irreducible and let r = ρ( A). Then
(1) r > 0 and r ∈ σ ( A).
(2) The algebraic multiplicity of r is one.
(3) There exists p > 0 such that Ap = rp.
(4) The only nonnegative eigenvectors of A are p and its positive multiples.
R EMARK 19. The only property that irreducibility cannot salvage is property 2 in Theo-
rem 19. Indeed, we have two different eigenvalues of (19) on the spectral circle |λ | = 1.
E XERCISE 112. Verify that the tridiagonal matrix
 
1 1 0
A = 1 1 1
0 1 1
satisfies the properties in Theorem 20.

We can deduce Theorem 20 from Theorem 19 using the following observations.


E XERCISE 113. Let A, B ∈ Rn×n be such that A ≥ B ≥ 0. Show that ρ( A) ≥ ρ( B).
E XERCISE 114. Let A ≥ 0 and define Ak = A + k^{−1} E for all k ≥ 1, where E is the
all-ones matrix. Prove that
lim_{k→∞} ρ( Ak ) = ρ( A).

E XERCISE 115. Let A ≥ 0 be irreducible. Show that, for all 1 ≤ i, j ≤ n, there exists
1 ≤ k ≤ n − 1 such that ( A^k )i j > 0. In particular, show that ( I + A)^{n−1} > 0.
E XERCISE 116. Let A ∈ Cn×n and let p be a polynomial with complex coefficients. Show that
σ ( p( A)) = { p(λ ) : λ ∈ σ ( A)} .
E XERCISE 117. Let A ∈ Cn×n be such that ρ( A) ∈ σ ( A). Show that, for all k ≥ 1,
ρ(( I + A)^k ) = (1 + ρ( A))^k .

CHAPTER 12

APPLICATIONS

1. Linear regression

Linear regression is a powerful and versatile technique widely employed in data
analysis, machine learning, and scientific research. It uses a linear (affine) model
to capture the relationship between dependent and independent variables. We will
consider a simple yet profound application of linear regression, namely, fitting a
polynomial using linear regression. In many real-world scenarios, the relationship
between variables is not linear. Polynomial regression can be used to capture these
nonlinear relationships by fitting a polynomial equation to the data.

F IGURE 1. Fitting a cubic polynomial.

A polynomial of degree n can be expressed as
p( x) = β0 + β1 x + β2 x^2 + · · · + βn x^n ,
where x is the independent variable, β0 , . . . , βn are the coefficients, and the dependent
variable is modeled as y = p( x). Suppose we are given data points ( x1 , y1 ), . . . , ( xK , yK ). The fitting
problem is to find a polynomial that best explains the data. In other words, we wish
to find β0 , . . . , βn such that p( xk ) ≈ yk for all k = 1, . . . , K. We use “≈” since there
might not exist a p that fits each point exactly (think of fitting three non-collinear
points with a line). In least-squares regression, we wish to find β0 , . . . , βn that
minimize the loss
ℓ(β0 , . . . , βn ) = ∑_{k=1}^{K} ( yk − p( xk ))^2 .

F IGURE 2. The normal equation can be written as X >( Xβ∗ − y) = 0,
which requires the optimal residual vector Xβ∗ − y to be perpendicular
to the range space of X. The latter property is essentially a consequence
of the Pythagoras theorem.

Despite the nonlinear nature of p, we can find the optimal set of coefficients that
minimizes the loss ℓ using linear algebra. More precisely, define the coefficient vector
β = (β0 , . . . , βn ), the measurement vector y = ( y1 , . . . , yK ), and the data matrix
    [ 1   x1   x1^2   · · ·   x1^n ]
X = [ 1   x2   x2^2   · · ·   x2^n ]
    [ :    :     :               : ]
    [ 1   xK   xK^2   · · ·   xK^n ] .

We can then express the regression problem as
(21) min_{β ∈ R^{n+1}} ℓ(β) = ‖y − Xβ‖^2 ,

where ‖ · ‖ is the standard Euclidean norm on RK . This is an instance of an unconstrained
optimization problem, where the loss function is quadratic. We wish to find
β∗ such that ℓ(β∗ ) ≤ ℓ(β) for all β; we call β∗ a minimizer of (21). The following
result connects this problem with the solution of linear systems of equations.
E XERCISE 118. Prove that β∗ is a minimizer of (21) if and only if X >Xβ∗ = X >y.

That is, we can find the minimizer by solving X >Xβ = X >y. These are called the
normal equations and have a nice geometric interpretation (see Figure 2).
How do we know that the normal equations have a solution in the first place? Fortunately,
they do. We can use the range-nullspace decomposition of a linear
transform to establish the following result.
E XERCISE 119. Prove that, for any X and y, there exists β such that X >Xβ = X >y.

F IGURE 3. Optimizing a quadratic function over an affine space.

In other words, we can always solve the regression problem (21) irrespective of the
nature of the data points.
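To make this concrete, here is a short NumPy sketch (our own illustration, with synthetic data) that fits a cubic polynomial by solving the normal equations for the data matrix X; numpy.linalg.lstsq could be used instead and is numerically more robust.

import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 - 2.0 * x + 0.5 * x**3 + 0.1 * rng.standard_normal(50)   # noisy cubic data

n = 3
X = np.vander(x, n + 1, increasing=True)     # columns: 1, x, x^2, x^3

beta = np.linalg.solve(X.T @ X, X.T @ y)     # solve the normal equations X^T X beta = X^T y
print(beta)                                  # approximately (1, -2, 0, 0.5)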

2. Linearly constrained optimization

We often encounter the problem of minimizing a differentiable function f : Rn → R
over the affine space {β ∈ Rn : Aβ = b} (A ∈ Rm×n , b ∈ Rm ):
(22) min_{β ∈ Rn} f (β) subject to Aβ = b.
This is an example of a constrained optimization problem. Unlike linear regression,
the variables in this case are restricted and cannot take on all possible values.
However, note that the regression problem is a special case of (22): on setting A = 0
and b = 0 (every β ∈ Rn is admissible) and f (β) = ‖Xβ − y‖^2 in (22), we obtain (21).
We can characterize the solution of (22) using linear algebra.
E XERCISE 120. Let β∗ ∈ Rn be a solution of (22). Prove that there exists λ ∈ Rm s.t.
(23) ∇f (β∗ ) = A>λ.

Consider the particular case
(24) f (β) = (1/2) β>Qβ,
where Q is a symmetric matrix. Then (23) reduces to Qβ∗ = A>λ. Also, since
Aβ∗ = b, we obtain the system of equations
(25) [ A 0 ; Q −A> ] [ β∗ ; λ ] = [ b ; 0 ].

This constitutes m + n equations in m + n variables. Assuming we can solve this
system of equations, can we assert that β∗ is the desired minimizer? Unfortunately,
(23) is just a necessary condition and the answer is generally no. However, if Q is
positive semidefinite (i.e., f is convex), then β∗ is indeed the desired minimizer.
E XERCISE 121. Let β∗ and λ be a solution of (25) and f be given by (24) where Q is PSD.
Prove that β∗ is a solution of (22).
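As an illustration (our own, with randomly generated problem data), the sketch below solves min (1/2) β>Qβ subject to Aβ = b by assembling and solving the block system (25), and then checks the optimality conditions.

import numpy as np

rng = np.random.default_rng(6)
n, m = 5, 2
B = rng.standard_normal((n, n))
Q = B.T @ B + np.eye(n)                  # symmetric positive definite
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# block system (25): [[A, 0], [Q, -A^T]] [beta; lam] = [b; 0]
K = np.block([[A, np.zeros((m, m))],
              [Q, -A.T]])
sol = np.linalg.solve(K, np.concatenate([b, np.zeros(n)]))
beta, lam = sol[:n], sol[n:]

print(np.allclose(A @ beta, b))              # feasibility: A beta = b
print(np.allclose(Q @ beta, A.T @ lam))      # stationarity condition (23)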

3. Subspace approximation

We consider the problem of approximating a set of points x1 , . . . , xK in Rd by a
subspace V ⊂ Rd . This comes up in dimensionality reduction techniques (where
dim V ≪ d) such as principal component analysis. We assume the points are
centered around the origin.

F IGURE 4. Approximating a point cloud in R3 by a plane. This is
done by computing the eigenbasis of the covariance matrix.

The mathematical model for this approximation problem can be viewed as a gen-
eralization of linear regression to higher dimensions; however, there is a subtle
difference in the way the loss function is defined. More precisely, let PV ∈ L(Rd )
be the orthogonal projector onto the subspace V. The “best” approximation of each
xk would be PV ( xk ) and the distortion incurred is the projection error k xk − PV xk k,
where k · k is the standard Euclidean norm. On summing the errors from all the
points, we obtain the loss function
(26) ℓ(V) = (1/K) ∑_{k=1}^{K} ‖xk − PV xk ‖^2 ,

where the variable is the subspace V. Thus, finding the optimal subspace amounts
to minimizing (26) w.r.t. V.
We can solve the above problem by representing V in an orthonormal basis and
expressing (26) in terms of these basis vectors. More precisely, let dim V = p and
V = span(v1 , . . . , v p ), where v1 , . . . , v p is an orthonormal basis of V.

E XERCISE 122. For any x ∈ Rd , show that
(27) ‖x − PV x‖^2 = ‖x‖^2 − ∑_{j=1}^{p} ( x>v j )^2 .

In particular, define the covariance matrix
C = (1/K) ∑_{k=1}^{K} xk xk> .
Explain using (27) why minimizing (26) is the same as maximizing
(28) ∑_{j=1}^{p} v j>Cv j ,
where the variable is the orthonormal basis v1 , . . . , v p .

In other words, we have reduced the original subspace approximation problem to
an eigenvalue problem. This has a standard solution. Notice that C is a PSD matrix
with nonnegative eigenvalues.
E XERCISE 123. Prove that (28) is maximized if we take v1 , . . . , v p to be the top eigenvectors
of C, i.e., the eigenvectors corresponding to the largest eigenvalues of C.

A popular technique for reducing the number of features in a dataset without losing
important information is the principal component analysis (PCA). This is modeled
on the approximation problem discussed above. Based on the above analysis, we
can summarize the key steps in PCA:
(1) Center the data to have zero mean (e.g. by subtracting the mean).
(2) Compute the covariance matrix C of the centered data.
(3) Find the top-p eigenvectors of C (e.g. using the power method).
(4) Project the data onto the subspace spanned by these eigenvectors.
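A minimal NumPy sketch of these steps (our own illustration, on randomly generated data; for the eigenvectors we use numpy.linalg.eigh rather than the power method):

import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.1])   # K = 200 points in R^3

Xc = X - X.mean(axis=0)                  # (1) center the data
C = (Xc.T @ Xc) / Xc.shape[0]            # (2) covariance matrix
p = 2
lam, V = np.linalg.eigh(C)               # eigenvalues in ascending order
Vp = V[:, ::-1][:, :p]                   # (3) eigenvectors for the p largest eigenvalues
scores = Xc @ Vp                         # (4) project onto the spanned subspace
print(scores.shape)                      # (200, 2)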

4. Low-rank approximation

5. Closest orthogonal transform

6. Markov chain

7. Convolutions and DFT


CHAPTER 13

PROBLEMS

P ROBLEM 1. Let K denote the set of real-valued functions on [0, 1]. With the natural
vector space operations on K, explain why K is infinite-dimensional. On the other hand,
what can we say about the dimension of the vector space of functions f : {0, 1} → R?
P ROBLEM 2. Consider the space P of polynomials of degree at most n. Verify that p0 (t) =
1, p1 (t) = t, . . . , pn (t) = tn form a basis for P.
Consider the subspace
Q = { p ∈ P : ∫_0^1 p(t) dt = 0 }.
Find the dimension of Q.
P ROBLEM 3. Let V be a vector space and v1 , . . . , vn ∈ V be linearly independent. Show
that for any v ∈ V, the dimension of span(v1 + v, . . . , vn + v) is at least n − 1.
P ROBLEM 4. Let U1 and U2 be subspaces of a finite-dimensional inner product space V.
Prove that (U1⊥ )⊥ = U1 and (U1 ∩ U2 )⊥ = U1⊥ + U2⊥ .

P ROBLEM 5. Let U1 and U2 be subspaces of a finite-dimensional space. Show that


dim (U1 + U2 ) = dim U1 + dim U2 − dim (U1 ∩ U2 ).
In particular, if U1 and U2 form a direct sum, then show that
dim (U1 ⊕ U2 ) = dim U1 + dim U2 .
P ROBLEM 6. Let U1 , U2 , U3 be subspaces of an n-dimensional vector space. Prove that
dim (U1 ∩ U2 ∩ U3 ) > dim U1 + dim U2 + dim U3 − 2n.
P ROBLEM 7. Let V be a finite-dimensional real vector space. Prove that L(V, R) is a
finite-dimensional vector space by explicitly constructing a basis for L(V, R). Further,
prove that dim L(V, R) = dim V.
P ROBLEM 8. Let T ∈ L(V). Prove that
T isomorphism ⇐⇒ N ( T ) = {0} ⇐⇒ R( T ) = V.
P ROBLEM 9. Let (V, k·k) be a normed vector space and let T ∈ L(V) be such that
k T (v)k 6 kvk for all v ∈ V. Prove that T − 2I is invertible, where I is the identity map
on V.
P ROBLEM 10. Let V and W be two finite-dimensional spaces. Prove that V is isomorphic
to W if and only if dim V = dim W.
P ROBLEM 11. Suppose V is finite-dimensional and T1 , T2 ∈ L(V). Prove that
dim N ( T1 ◦ T2 ) 6 dim N ( T1 ) + dim N ( T2 ).

P ROBLEM 12. Suppose T ∈ L(V, W) and T 0 ∈ L(W, V) are such that T ◦ T 0 = Id.
Show that N ( T ) = {0} and R( T ) = W.
P ROBLEM 13. Let T ∈ L(V, W), where V and W are finite-dimensional spaces. Define
S : R( T ∗ ) → R( T ), ∀ x ∈ R( T ∗ ) : Sx = Tx.
Prove that S is a linear isomorphism. Conclude that rank ( T ∗ ) = rank ( T ).
P ROBLEM 14. Recall that we use A ∼ B to mean that A and B are similar. Show that ∼
is an equivalence relation on the set Fn×n , i.e., ∼ is reflexive, symmetric, and transitive.
P ROBLEM 15. Let A ∈ Rm×n and b ∈ Rm be such that the equation Ax = b admits
a solution. Prove that there exists exactly one solution x of the form x = A>z for some
z ∈ Rm .
P ROBLEM 16. Let G be an undirected graph with vertices V = {1, . . . , n} and edges E .
Define L ∈ Rn×n as follows:
Li j = di if i = j, −1 if (i, j) ∈ E , and 0 otherwise,
where di is the degree of vertex i.

(1) Show that L is PSD but not PD.


(2) Prove that nullity ( L) = 1 if and only if G is connected. More generally, prove
that nullity ( L) is the number of connected components of G .
P ROBLEM 17. Let A, B ∈ Cn×n be Hermitian, with eigenvalues λ1 ( · ) ≥ · · · ≥ λn ( · )
arranged in decreasing order. Show that
∀ 1 ≤ i ≤ n : λi ( A) − ‖B‖2 ≤ λi ( A + B) ≤ λi ( A) + ‖B‖2 .
In particular, explain how we can conclude that the map A 7→ λi ( A) is continuous.
P ROBLEM 18. Let α ≠ β and A ∈ Cn×n . Establish from first principles that
N (( A − αIn )( A − βIn )) = N ( A − αIn ) ⊕ N ( A − βIn ).
Using the above decomposition, prove that A is similar to
 
αIk 0
0 βIn−k
if and only if ( A − αIn )( A − βIn ) = 0. (Here, I p denotes the p × p identity matrix.)
P ROBLEM 19. Let V be a finite-dimensional space. Let P ∈ L(V) be a projection operator
on V. Let [ P] be a matrix representation of P with respect to some basis of V. Prove that
rank P = trace [ P].
P ROBLEM 20. For any A ∈ Rn×n , prove that there exists an orthogonal matrix Q ∈ Rn×n
and a PSD matrix P ∈ Rn×n such that A = QP.
P ROBLEM 21. Let P1 , . . . , Pm ∈ L(V) be projection operators such that P1 + · · · + Pm is
the identity operator on V. Prove that

(1) rank P1 + · · · + rank Pm = dim V, and


(2) V = R( P1 ) ⊕ · · · ⊕ R( Pm ).

P ROBLEM 22. For A, B ∈ Rn×n , define their Hadamard product A ◦ B ∈ Rn×n to be


∀ 1 6 i, j 6 n : ( A ◦ B)i j = Ai j Bi j .
Let k Ak2 be the spectral norm of A and λmax ( B) be the largest eigenvalue of B. Prove that
(1) ∀ C, D ∈ Sn+ : kC ◦ D k2 6 λmax (C )λmax ( D ).
(2) ∀ A, B ∈ Rn×n : k A ◦ Bk22 6 k( A> A) ◦ ( B> B)k2 .

Using these, prove that k A ◦ Bk2 6 k Ak2 k Bk2 for all A, B ∈ Rn×n .
P ROBLEM 23. Let A, B ∈ Cn×n . Prove that
(1) I − AB invertible ⇐⇒ I − BA invertible.
(2) λ is an eigenvalue of AB ⇐⇒ λ is an eigenvalue of BA.
P ROBLEM 24. Let A, B ∈ Rn×n be symmetric matrices and let A be positive definite.
Prove that the eigenvalues of A−1 B are real. Moreover, if B is positive definite, then prove
that the eigenvalues of A−1 B are positive.
P ROBLEM 25. Let A ∈ Cn×n and τ > 0 be such that
n
∀1 6 i 6 n : ∑ | Ai j | 6 τ .
j=1

Prove that ρ( A) 6 τ.
P ROBLEM 26. Suppose A ∈ Rn×n is diagonalizable and two of its eigenvalues are equal.
Show that v, Av, . . . , An−1 v are linearly dependent for all v ∈ Rn .
P ROBLEM 27. The spectral norm of A ∈ Rn×n is defined as
‖A‖2 = max { ‖Ax‖ : ‖x‖ = 1 },
where ‖·‖ is the standard Euclidean norm on Rn . Prove that ‖A‖2 equals each of the following:
(1) max { ‖Ax‖ : ‖x‖ ≤ 1 },
(2) max { y>Ax : ‖x‖ ≤ 1 and ‖y‖ ≤ 1 },
(3) σmax ( A).

P ROBLEM 28. Suppose u, v ∈ V, where V is a real inner-product space. Prove that


hu, vi = 0 if and only if kuk 6 ku + cvk for all c ∈ R.
P ROBLEM 29. Let V be a real inner-product space. Let f : V → V be such that
∀ x, y ∈ V : h f ( x), f ( y)i = h x, yi.
Prove that f is linear. On the other hand, give a simple example to show that if
∀x ∈ V : h f ( x), f ( x)i = h x, xi,
then f need not be linear.
P ROBLEM 30. For some A ∈ Cn×n , suppose xH Ax = 0 for all x ∈ Cn . Show that A = 0.
On the other hand, give an example of A ∈ R2×2 s.t. A 6= 0 but x>Ax = 0 for all x ∈ R2 .
P ROBLEM 31. Let A and B be PSD. Prove that A^2 = B^2 only if A = B.
P ROBLEM 32. Using the permutation-based formula for determinants, prove that det( AB) =
det( A) det( B) for any A, B ∈ Rn×n .
