LinearAlgebra GDF Jan5 23
LinearAlgebra GDF Jan5 23
3.4 The Principle of Superposition . . . 65 6.3 Similar Matrices and Jordan Normal
Form . . . . . . . . . . . . . . . 163
3.5 Composition and Multiplication of
Matrices . . . . . . . . . . . . . 69 6.4 Sinks, Saddles, and Sources . . . . . 168
3.6 Properties of Matrix Multiplication . 73 6.5 *Matrix Exponentials . . . . . . . . 174
3.7 Solving Linear Systems and Inverses 77 6.6 *The Cayley Hamilton Theorem . . 180
3.8 Determinants of 2 × 2 Matrices . . . 84 6.7 *Second Order Equations . . . . . . 182
4 Solving Linear Differential Equations . 87 7 Determinants and Eigenvalues . . . . . 187
4.1 A Single Differential Equation . . . . 88 7.1 Determinants . . . . . . . . . . . . . 188
7.2 Eigenvalues and Eigenvectors . . . . 198 Index . . . . . . . . . . . . . . . . . . . . . . . 292
7.3 Real Diagonalizable Matrices . . . . 204
7.4 *Existence of Determinants . . . . . 208
8 Linear Maps and Changes of Coordi-
nates . . . . . . . . . . . . . . . . . . 211
8.1 Linear Mappings and Bases . . . . . 212
8.2 Row Rank Equals Column Rank . . 217
8.3 Vectors and Matrices in Coordinates 221
8.4 *Matrices of Linear Maps on a Vector
Space . . . . . . . . . . . . . . . 229
9 Least Squares . . . . . . . . . . . . . . . . 233
9.1 Least Squares Approximations . . . . 234
9.2 Best Approximate Solution . . . . . 238
9.3 Least Squares Fitting of Data . . . . 240
10 Orthogonality . . . . . . . . . . . . . . . 247
10.1 Orthonormal Bases and Orthogonal
Matrices . . . . . . . . . . . . . 248
10.2 Gram-Schmidt Orthonormalization
Process . . . . . . . . . . . . . . 251
10.3 The Spectral Theory of Symmetric
Matrices . . . . . . . . . . . . . 254
10.4 *QR Decompositions . . . . . . . . 257
11 *Matrix Normal Forms . . . . . . . . . 262
11.1 Simple Complex Eigenvalues . . . . 263
11.2 Multiplicity and Generalized Eigen-
vectors . . . . . . . . . . . . . . 270
11.3 The Jordan Normal Form Theorem 275
11.4 *Markov Matrix Theory . . . . . . 281
11.5 *Proof of Jordan Normal Form . . . 284
12 Matlab Commands . . . . . . . . . . . 287
Preface
i
Preface
There are two types of exercises included with most sec- In our course we ask the students to read the material in
tions — those that should be completed using pencil and Chapter 1 and to use the computer instructions in that
paper (called Hand Exercises) and those that should be chapter as an entry into MATLAB. In class we cover only
completed with the assistance of computers (called Com- the material on dot product. Chapter 2 explains how to
puter Exercises). solve systems of linear equations and is required for a first
course on linear algebra. The proof of the uniqueness of
reduced echelon form matrices is not very illuminating
Ways to use the text: We envision this course as a one- for students and can be omitted in classroom discussion.
year sequence replacing the standard one semester lin- Sections whose material we feel can be omitted are noted
ear algebra and ODE courses. There is a natural one by asterisks in the Table of Contents and Section 2.6 is
semester Linear Systems course that can be taught us- the first example of such a section.
ing the material in this book. In this course students
will learn both the basics of linear algebra and the ba- In Chapter 3 we introduce matrix multiplication as a
sics of linear systems of differential equations. This one notation that simplifies the presentation of systems of
semester course covers the material in the first eight chap- linear equations. We then show how matrix multiplica-
ters. The Linear Systems course stresses eigenvalues and tion leads to linear mappings and how linearity leads to
a baby Jordan normal form theory for 2 × 2 matrices and the principle of superposition. Multiplication of matrices
culminates in a classification of phase portraits for planar is introduced as composition of linear mappings, which
constant coefficient linear systems of differential equa- makes transparent the observation that multiplication of
tions. Time permitting additional linear algebra topics matrices is associative. The chapter ends with a discus-
from Chapters 9 and 10 may be included. Such material sion of inverse matrices and the role that inverses play in
includes changes of coordinates for linear mappings, and solving systems of linear equations. The determinant of
orthogonality including Gram-Schmidt orthonormaliza- a 2 × 2 matrix is introduced and its role in determining
tion and least squares fitting of data. matrix inverses is emphasized.
We believe that by being exposed to ODE theory a stu-
dent taking just the first semester of this sequence will
gain a better appreciation of linear algebra than will a Chapter 4 This chapter provides a nonstandard intro-
student who takes a standard one semester introduction duction to differential equations. We begin by emphasiz-
to linear algebra. However, a more traditional Linear Al- ing that solutions to differential equations are functions
gebra course can be taught by omitting Chapter 7 and (or pairs of functions for planar systems). We explain in
de-emphasizing some of the material in Chapter 6. Then detail the two ways that we may graph solutions to differ-
there will be time in a one semester course to cover a se- ential equations (time series and phase space) and how to
lection of the linear algebra topics mentioned at the end go back and forth between these two graphical represen-
of the previous paragraph. tations. The use of the computer is mandatory in this
chapter. Chapter 4 dwells on the qualitative theory of
solutions to autonomous ordinary differential equations.
Chapters 1–3 We consider the first two chapters to be In one dimension we discuss the importance of knowing
introductory material and we attempt to cover this mate- equilibria and their stability so that we can understand
rial as quickly as we can. Chapter 1 introduces MATLAB the fate of all solutions. In two dimensions we empha-
along with elementary remarks on vectors and matrices. size constant coefficient linear systems and the existence
ii
Preface
(numerical) of invariant directions (eigendirections). In Chapter 6 describes closed form solutions to planar sys-
this way we motivate the introduction of eigenvalues and tems of constant coefficient linear differential equations
eigenvectors, which are discussed in detail for 2 × 2 ma- in two different ways: a direct method based on eigen-
trices. Once we know how to compute eigenvalues and values and eigenvectors and a related method based on
eigendirections, we then show how this information cou- similarity of matrices. Each method has its virtues and
pled with superposition leads to closed form solution to vices. Note that the Jordan normal form theorem for
initial value problems, at least when the eigenvalues are 2 × 2 matrices is proved when discussing how to solve
real and distinct. linear planar systems using similarity of matrices.
We are not trying to give a thorough grounding in tech-
niques for solving differential equations in Chapter 4;
Chapters 7, 8, 10, and 11 Chapter 7 discusses deter-
rather we are trying to give an introduction to the ways
minants, characteristic polynomials, and eigenvalues for
that modern computer programs will represent graph-
n × n matrices. Chapter 8 presents more advanced mate-
ically solutions to differential equations. We have in-
rial on linear mappings including row rank equals column
cluded, however, a section on separation of variables for
rank and the matrix representation of mappings in differ-
those who wish to introduce techniques for finding closed
ent coordinate systems. The material in Sections 8.1 and
form solutions to single differential equations at this time.
8.2 could be presented directly after Chapter 5, while the
Our preference is to omit this section in the Linear Sys-
material in Section 8.3 explains the geometric meaning
tems course as well as to omit the applications in Sec-
of similarity.
tion 4.2 of the linear growth model in one dimension to
interest rates and population dynamics. Orthogonal bases and orthogonal matrices, least squares
and Gram-Schmidt orthonormalization, and symmetric
matrices are presented in Chapter 10. This material is
Chapter 5 In this chapter we introduce vector space the- very important, but is not required later in the text, and
ory: vector spaces, subspaces, spanning sets, linear inde- may be omitted.
pendence, bases, dimensions and the other basic notions
in linear algebra. Since solutions to differential equations The Jordan normal form theorem for n × n matrices is
naturally reside in function spaces, we are able to illus- presented in Chapter 11. Diagonalization of matrices
trate that vector spaces other than Rn arise naturally. with distinct real and complex eigenvalues is presented
We have found that, depending on time, the proof of in the first two sections. The appendices, including the
the main theorem, which appears in Section 5.6, may be proof of the complete Jordan normal form theorem, are
omitted in a first course. The material in these chapters included for completeness and should be omitted in class-
is mandatory in any first course on linear algebra. room presentations.
Chapter 6 At this juncture the text divides into two The Classroom Use of Computers At the University of
tracks: one concerned with the qualitative theory of so- Houston we use a classroom with an IBM compatible
lutions to linear and nonlinear planar systems of differen- PC and an overhead display. Lectures are presented three
tial equations and one mainly concerned with the devel- hours a week using a combination of blackboard and com-
opment of higher dimensional linear algebra. We begin puter display. We find it inadvisable to use the computer
with a description of the differential equations chapters. for more than five minutes at a time; we tend to go back
iii
Preface
and forth between standard lecture style and computer May, 1998 Michael Dellnitz
presentations. (The preloaded matrices and differential Columbus Martin Golubitsky
equations are important to the smooth use of the com-
February, 2018 James Fowler
puter in class.)
We ask students to enroll in a one hour computer lab
where they can practice using the material in the text on
a computer, do their homework and additional projects,
and ask questions of TA’s. Our computer lab happens to
have 15 power macs. In addition, we ensure that MAT-
LAB and the laode files are available on student use com-
puters around the campus (which is not always easy).
The laode files are on the enclosed CDROM; they may
also be downloaded by using a web browser or by anony-
mous ftp.
iv
Chapter 1 Preliminaries
1 Preliminaries
The subjects of linear algebra and differential equations
involve manipulating vector equations. In this chapter
we introduce our notation for vectors and matrices —
and we introduce MATLAB, a computer program that is
designed to perform vector manipulations in a natural
way.
We begin, in Section 1.1, by defining vectors and matri-
ces, and by explaining how to add and scalar multiply
vectors and matrices. In Section 1.2 we explain how to
enter vectors and matrices into MATLAB, and how to
perform the operations of addition and scalar multiplica-
tion in MATLAB. There are many special types of ma-
trices; these types are introduced in Section 1.3. In the
concluding section, we introduce the geometric interpre-
tations of vector addition and scalar multiplication; in
addition we discuss the angle between vectors through
the use of the dot product of two vectors.
1
§1.1 Vectors and Matrices
1.1 Vectors and Matrices Addition and Scalar Multiplication of Vectors There
are two basic operations on vectors: addition and scalar
In their elementary form, matrices and vectors are just
multiplication. Let x = (x1 , . . . , xn ) and y = (y1 , . . . , yn )
lists of real numbers in different formats. An n-vector is
be n-vectors. Then
a list of n numbers (x1 , x2 , . . . , xn ). We may write this
vector as a row vector as we have just done — or as a x + y = (x1 + y1 , . . . , xn + yn );
column vector
that is, vector addition is defined as componentwise ad-
x1
.. dition.
. .
Similarly, scalar multiplication is defined as component-
xn wise multiplication. A scalar is just a number. Initially,
we use the term scalar to refer to a real number — but
The set of all (real-valued) n-vectors is denoted by Rn ; later on we sometimes use the term scalar to refer to a
so points in Rn are called vectors. The sets Rn when n complex number. Suppose r is a real number; then the
is small are very familiar sets. The set R1 = R is the real multiplication of a vector by the scalar r is defined as
number line, and the set R2 is the Cartesian plane. The
set R3 consists of points or vectors in three dimensional rx = (rx1 , . . . , rxn ).
space.
Subtraction of vectors is defined simply as
An m × n matrix is a rectangular array of numbers with
m rows and n columns. A general 2 × 3 matrix has the x − y = (x1 − y1 , . . . , xn − yn ).
form
Formally, subtraction of vectors may also be defined as
a11 a12 a13
A= .
a21 a22 a23 x − y = x + (−1)y.
We use the convention that matrix entries aij are indexed Division of a vector x by a scalar r is defined to be
so that the first subscript i refers to the row while the 1
second subscript j refers to the column. So the entry a21 r
x.
refers to the matrix entry in the 2nd row, 1st column.
The standard difficulties concerning division by zero still
An n × m matrix A and an n0 × m0 matrix B are equal hold.
precisely when the sizes of the matrices are equal (n = n0
and m = m0 ) and when each of the corresponding entries
are equal (aij = bij ). Addition and Scalar Multiplication of Matrices Simi-
larly, we add two m × n matrices by adding correspond-
There is some redundancy in the use of the terms “vec-
ing entries, and we multiply a scalar times a matrix by
tor” and “matrix”. For example, a row n-vector may be
multiplying each entry of the matrix by that scalar. For
thought of as a 1 × n matrix, and a column n-vector may
example,
be thought of as a n × 1 matrix. There are situations
where matrix notation is preferable to vector notation
0 2 1 −3 1 −1
and vice-versa. + =
4 6 1 4 5 10
2
§1.1 Vectors and Matrices
and
1 3 2 1
8. A = and B = .
2 −4 8 −16 0 4 1 −2
4 = .
3 1 12 4
2 1 0
The main restriction on adding two matrices is that the
2 1
9. A = 4 1 0 and B = .
1 −2
matrices must be of the same size. So you cannot add a 0 0 0
4 × 3 matrix to 6 × 2 matrix — even though they both
have twelve entries.
2 1
In Exercises 10 – 11, let A = and B =
−1 4
Exercises
0 2
and compute the given expression.
3 −1
10. 4A + B.
In Exercises 1 – 3, let x = (2, 1, 3) and y = (1, 1, −1) and
compute the given expression. 11. 2A − 3B.
1. x + y.
2. 2x − 3y.
3. 4x.
3
§1.2 MATLAB
1.2 MATLAB x + y
We shall use MATLAB to compute addition and scalar
and MATLAB responds with
multiplication of vectors in two and three dimensions.
This will serve the purpose of introducing some basic ans =
MATLAB commands. 3 1 2
Entering Vectors and Vector Operations Begin a This vector is easily checked to be the sum of the vectors
MATLAB session. We now discuss how to enter a vector x and y. Similarly, to perform a scalar multiplication,
into MATLAB. The syntax is straightforward; to enter type
the row vector x = (1, 2, 1) type2
2*x
x = [1 2 1] which yields
y = [2 -1 1] ans =
-1 3 0
and MATLAB responds with
We mention two points concerning the operations that
y = we have just performed in MATLAB.
2 -1 1
(a) When entering a vector or a number, MATLAB auto-
matically echoes what has been entered. This echo-
To add the vectors x and y, type
ing can be suppressed by appending a semicolon to
2
MATLAB has several useful line editing features. We point the line. For example, type
out two here:
(a) Horizontal arrow keys (→, ←) move the cursor one space with- z = [-1 2 3];
out deleting a character.
(b) Vertical arrow keys (↑, ↓) recall previous and next command
and MATLAB responds with a new line awaiting a
lines. new command. To see the contents of the vector z
just type z and MATLAB responds with
4
§1.2 MATLAB
z = just type:
-1 2 3
z = [-1; 2; 3]
(b) MATLAB stores in a new vector the information ob-
tained by algebraic manipulation. Type and MATLAB responds with
a = 2*x - 3*y + 4*z;
z =
Now type a to find -1
2
a = 3
-8 15 11
We see that MATLAB has created a new row vector Note that MATLAB will not add a row vector and a col-
a with the correct number of entries. umn vector. Try typing x + z.
Individual entries of a vector can also be addressed. For
Note: In order to use the result of a calculation later instance, to display the first component of z type z(1).
in a MATLAB session, we need to name the result of
that calculation. To recall the calculation 2*x - 3*y +
4*z, we needed to name that calculation, which we did Entering Matrices Matrices are entered into MATLAB
by typing a = 2*x - 3*y + 4*z. Then we were able to row by row with rows separated either by semicolons or
recall the result just by typing a. by line returns. To enter the 2 × 3 matrix
We have seen that we enter a row n vector into MATLAB
2 3 1
A= ,
by surrounding a list of n numbers separated by spaces 1 4 7
with square brackets. For example, to enter the 5-vector
w = (1, 3, 5, 7, 9) just type just type
w = [1 3 5 7 9] A = [2 3 1; 1 4 7]
Note that the addition of two vectors is only defined when MATLAB has very sophisticated methods for addressing
the vectors have the same number of entries. Trying to the entries of a matrix. You can directly address individ-
add the 3-vector x with the 5-vector w by typing x + w ual entries, individual rows, and individual columns. To
in MATLAB yields the warning: display the entry in the 1st row, 3rd column of A, type
A(1,3). To display the 2nd column of A, type A(:,2);
??? Error using ==> + and to display the 1st row of A, type A(1,:). For ex-
Matrix dimensions must agree. ample, to add the two rows of A and store them in the
vector x, just type
In MATLAB new rows are indicated by typing ;. For
example, to enter the column vector x = A(1,:) + A(2,:)
−1
z = 2 , MATLAB has many operations involving matrices —
3 these will be introduced later, as needed.
5
§1.2 MATLAB
(a) a13 + a32 . 9. (matlab) Use MATLAB to compute log(−0.1). Are you
surprised by the answer?
(b) Three times the 3 rd
column of A.
(c) Twice the 2 nd
row of A minus the 3rd row.
(d) The sum of all of the columns of A.
In Exercises 5 – 6, let
1.2 2.3 −0.5 −2.9 1.23 1.6
A= and B=
0.7 −1.4 2.3 −2.2 1.67 0
6
§1.3 Special Kinds of Matrices
7
§1.3 Special Kinds of Matrices
8
§1.3 Special Kinds of Matrices
In Exercises 21-22, compute A + At and show it is symmetric: 27. Show that an n×n skew-symmetric matrix A has diagonal
elements aii = 0, for all i = 1, . . . , n.
2 3
21. A = .
−1 6
−1 1 −2 3
Exercises 24-25 prove several properties of the transpose op- Let A = and B = . In Exer-
4 5 0 1
eration
cises 30-33, determine whether the given matrix is symmetric,
24. Let A and B be m × n matrices. Show that skew-symmetric, or neither.
31. At − B.
9
§1.3 Special Kinds of Matrices
A + At A − At
A= + . (1.3.3)
2 2
Use (1.3.3) to verify that any n×n matrix can be expressed as
a sum of a symmetric matrix and a skew-symmetric matrix.
13 2 −41 81
−2 11 −7 6
38. (matlab) Let A = . Use
15 −3 −20 −19
4 −8 −27 0
(1.3.3) and MATLAB to express A = B + C, where B is
symmetric and C is skew-symmetric.
10
§1.4 The Geometry of Vector Operations
Operations 4
3 y
−1
11
§1.4 The Geometry of Vector Operations
3
x · y = x1 y1 + · · · + xn yn . (1.4.2)
2.5 Note that x · x is just ||x|| , the length of x squared.
2
0.5
x = [1 4 2];
y = [2 3 -1];
dot(x,y)
0
4 0
2 4
0
0
2
MATLAB responds with the dot product of x and y,
−2
−4
−2 namely,
−4
ans =
Figure 2: Addition of two vectors in three dimensions.
12
MATLAB responds with: One of the most important facts concerning dot products
is the one that states
ans = x·y =0 if and only if x and y are perpendicular.
4.5826 (1.4.3)
√ Indeed, dot product also gives a way of numerically de-
which is indeed approximately
p
1 + 4 2 + 22 = 21. termining the angle between n-vectors, as follows.
Now suppose r ∈ R and x ∈ R . A calculation shows
n
Theorem 1.4.1. Let θ be the angle between two nonzero
that n-vectors x and y. Then
||rx|| = |r|||x||. (1.4.1)
x·y
cos θ = . (1.4.4)
See Exercise 18. Note also that if r is positive, then ||x||||y||
the direction of rx is the same as that of x; while if r
is negative, then the direction of rx is opposite to the It follows that cos θ = 0 if and only if x · y = 0. Thus
direction of x. The lengths of the vectors 3x and −3x (1.4.3) is valid.
are each three times the length of x — but these vectors We show that Theorem 1.4.1 is just a restatement of the
point in opposite directions. Scalar multiplication by the law of cosines. This law states
scalar 0 produces the 0 vector, the vector whose entries
are all zero. c2 = a2 + b2 − 2ab cos θ,
where a, b, c are the lengths of the sides of a triangle and
Dot Product and Angles The dot product of two n- θ is the interior angle opposite the side of length c. See
vectors x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ) is an im- Figure 3.
12
§1.4 The Geometry of Vector Operations
(a cos θ, a sin θ)
x-y
a c
x
θ
(0, 0) b (b, 0) θ y
13
§1.4 The Geometry of Vector Operations
1. x = (3, 0).
to obtain the answer of 45.5847◦ .
2. x = (2, −1).
We verify (1.4.5) as follows. Note that the area of P is 5. x = (1, 3) and y = (3, −1).
the same as the area of the rectangle R also pictured in 6. x = (2, −1) and y = (−2, 1).
Figure 5. The side lengths of R are: ||v|| and ||w|| sin θ
where θ is the angle between v and w. A computation 7. x = (1, 1, 3, 5) and y = (1, −4, 3, 0).
using (1.4.4) shows that
8. x = (2, 1, 4, 5) and y = (1, −4, 3, −2).
2 2 2 2
|R| = ||v|| ||w|| sin θ
= ||v||2 ||w||2 (1 − cos2 θ) 9. Find a real number a so that the vectors
2 !
v·w
2 2
= ||v|| ||w|| 1 − x = (1, 3, 2) and y = (2, a, −6)
||v||||w||
are perpendicular.
= ||v||2 ||w||2 − (v · w)2 ,
P |w| sin(θ) R
11. Find the cosine of the angle between the normal vectors
θ to the planes
0 v |v| 2x − 2y + z = 14 and x + y − 2z = −10.
14
§1.4 The Geometry of Vector Operations
12. x = (2, 0) and y = (2, 1). In Exercises 25 – 26 let P be the parallelogram generated by
the given vectors v and w in R3 . Compute the area of that
13. x = (2, −1) and y = (1, 2). parallelogram.
14. x = (−1, 1, 4) and y = (0, 1, 3).
25. (matlab) v = (1, 5, 7) and w = (−2, 4, 13).
15. x = (−10, 1, 0) and y = (0, 1, 20).
26. (matlab) v = (2, −1, 1) and w = (−1, 4, 3).
16. x = (2, −1, 1, 3, 0) and y = (4, 0, 2, 7, 5).
17. x = (5, −1, 4, 1, 0, 0) and y = (−3, 0, 0, 1, 10, −5). 27. Show that the only vector that has norm zero is the zero
vector. In other words, ||x|| = 0 implies that x = 0.
18. Using the definition of length, verify that formula (1.4.1)
is valid.
15
Chapter 2 Solving Linear Equations
2 Solving Linear Equations tance) in Section 2.6. This section is included mainly for
completeness and need not be covered on a first reading.
The primary motivation for the study of vectors and ma-
trices is based on the study of solving systems of linear
equations. The algorithms that enable us to find solu-
tions are themselves based on certain kinds of matrix
manipulations. In these algorithms, matrices serve as a
shorthand for calculation, rather than as a basis for a
theory. We will see later that these matrix manipula-
tions do lead to a rich theory of how to solve systems
of linear equations. But our first step is just to see how
these equations are actually solved.
We begin with a discussion in Section 2.1 of how to write
systems of linear equations in terms of matrices. We
also show by example how complicated writing down the
answer to such systems can be. In Section 2.2, we recall
that solution sets to systems of linear equations in two
and three variables are lines and planes.
The best known and probably the most efficient method
for solving systems of linear equations (especially with
a moderate to large number of unknowns) is Gaussian
elimination. The idea behind this method, which is in-
troduced in Section 2.3, is to manipulate matrices by
elementary row operations to reduced echelon form. It
is then possible just to look at the reduced echelon form
matrix and to read off the solutions to the linear system,
if any. The process of reading off the solutions is formal-
ized in Section 2.4; see Theorem 2.4.6. Our discussion
of solving linear equations is presented with equations
whose coefficients are real numbers — though most of
our examples have just integer coefficients. The meth-
ods work just as well with complex numbers, and this
generalization is discussed in Section 2.5.
Throughout this chapter, we alternately discuss the the-
ory and show how calculations that are tedious when
done by hand can easily be performed by computer using
MATLAB. The chapter ends with a proof of the unique-
ness of row echelon form (a topic of theoretical impor-
16
§2.1 Systems of Linear Equations and Matrices
2.1 Systems of Linear Equations The algorithmic method used to solve (2.1.1) can be ex-
panded to produce a method, called substitution, for solv-
and Matrices ing larger systems. We describe the substitution method
It is a simple exercise to solve the system of two equations as it applies to (2.1.2). Solve the 1st equation in (2.1.2)
for x1 , obtaining
x + y = 7
(2.1.1) 4 4 3 6 2
−x + 3y = 1 x1 = + x2 − x3 + x4 − x5 . (2.1.3)
5 5 5 5 5
to find that x = 5 and y = 2. One way to solve system Then substitute the right hand side of (2.1.3) for x1 in
(2.1.1) is to add the two equations, obtaining the remaining four equations in (2.1.2) to obtain a new
system of four equations in the four variables x2 ,x3 ,x4 ,x5 .
4y = 8; This procedure eliminates the variable x1 . Now proceed
inductively — solve the 1st equation in the new system
hence y = 2. Substituting y = 2 into the 1st equation in for x2 and substitute this expression into the remaining
(2.1.1) yields x = 5. three equations to obtain a system of three equations in
This system of equations can be solved in a more algo- three unknowns. This step eliminates the variable x2 .
rithmic fashion by solving the 1st equation in (2.1.1) for Continue by substitution to eliminate the variables x3
x as and x4 , and arrive at a simple equation in x5 — which
x = 7 − y, can be solved. Once x5 is known, then x4 , x3 , x2 , and
x1 can be found in turn.
and substituting this answer into the 2nd equation in
(2.1.1), to obtain
Two Questions
−(7 − y) + 3y = 1.
• Is it realistic to expect to complete the substitution
This equation simplifies to: procedure without making a mistake in arithmetic?
17
§2.1 Systems of Linear Equations and Matrices
e2_1_4 x =
5.0000
followed by a carriage return. This instruction tells MAT- 2.0000
LAB to load equation (2.1.4*) of Chapter 2. The matrix 3.0000
of coefficients is now available in MATLAB; note that this 4.0000
matrix is stored in the 5 × 5 array A. What should appear 1.0000
is:
This answer is interpreted as follows: the five values of
the unknowns x1 ,x2 ,x3 ,x4 ,x5 are stored in the vector x;
A =
that is,
5 -4 3 -6 2
2 1 -1 -1 1 x1 = 5, x2 = 2, x3 = 3, x4 = 4, x5 = 1. (2.1.6)
18
§2.1 Systems of Linear Equations and Matrices
The reader may verify that (2.1.6) is indeed a solution Thus the command A(3,4) = -2 changes the entry in
of (2.1.2) by substituting the values in (2.1.6) into the the 3rd row, 4th column of A from 1 to −2. In other words,
equations in (2.1.2). we have now entered into MATLAB the information that
is needed to solve the system of equations
Changing Entries in MATLAB MATLAB also permits 5x1 − 4x2 + 3x3 − 6x4 + 2x5 = 4
access to single components of x. For instance, type 2x1 + x2 − x3 − x4 + x5 = 6
x1 + 2x2 + x3 − 2x4 + 3x5 = 19
x(5) −2x1 − x2 − x3 + x4 − x5 = −12
x1 − 6x2 + x3 + x4 + 4x5 = 4.
and the 5th entry of x is displayed,
As expected, this change in the coefficient matrix results
ans = in a change in the solution of system (2.1.2), as well.
1.0000 Typing
19
§2.1 Systems of Linear Equations and Matrices
x = A\b 1.
2x − y = 0
now leads to the message 3x = 6
2.
Warning: Matrix is singular to working precision. 3x − 4y = 2
2y + z = 1
x = 3z = 9
Inf 3.
Inf −2x + y = 9
Inf 3x + 3y = −9
Inf
Inf
4. Write the coefficient matrices for each of the systems of
Obviously, something is wrong; MATLAB cannot find linear equations given in Exercises 1 – 3.
a solution to this system of equations! Assuming that
MATLAB is working correctly, we have shed light on one
of our previous questions: the method of substitution 5. Neither of the following systems of three equations in three
described by (2.1.3) need not always lead to a solution, unknowns has a unique solution — but for different reasons.
even though the method does work for system (2.1.2). Solve these systems and explain why these systems cannot be
Why? As we will see, this is one of the questions that is solved uniquely.
answered by the theory of linear algebra. In the case of
(2.1.7), it is fairly easy to see what the difficulty is: the x − y = 4
second and fourth equations have the form y = 6 and (a) x + 3y − 2z = −6
−
−y = −12, respectively. 4x + 2y 3z = 1
and
Warning: The MATLAB command 2x − 4y + 3z = 4
(b) 3x − 5y + 3z = 5
x = A\b 2y − 3z = −4
20
§2.1 Systems of Linear Equations and Matrices
7. (a) Find a quadratic polynomial p(x) = ax2 + bx + c 9. (matlab) Matrices are entered in MATLAB as follows.
satisfying p(0) = 1, p(1) = 5, and p(−1) = −5. To enter the 2 × 3 matrix A, type A = [ -1 1 2; 4 1 2].
Enter this matrix into MATLAB; the displayed matrix should
be
A =
(b) Prove that for every triple of real numbers L, M , and -1 1 2
N , there is a quadratic polynomial satisfying p(0) = L, 4 1 2
p(1) = M , and p(−1) = N .
Now change the entry in the 2nd row, 1st column to −5.
21
§2.1 Systems of Linear Equations and Matrices
x = A\b
x =
-0.2000
1.0000
-1.2000
Find an integer for the entry in the 2nd row, 2nd column of A
so that the solution
x = A\b
Vitamin S1 S2 S3 S4
A 25% 19% 20% 3%
B 2% 14% 2% 14%
C 8% 4% 1% 0%
F 25% 31% 25% 16%
22
§2.2 The Geometry of Low-Dimensional Solutions
Linear Equations in Two Dimensions The set of all produces a vector whose entries correspond to the y-
solutions to the equation coordinates of points on the line (2.2.1). Then typing
2x − y = 6 (2.2.1) plot(x,y)
is a straight line in the xy plane; this line has slope 2 and
produces the desired plot. It is useful to label the axes
y-intercept equal to −6. We can use MATLAB to plot the
on this figure, which is accomplished by typing
solutions to this equation — though some understanding
of the way MATLAB works is needed.
xlabel('x')
The plot command in MATLAB plots a sequence of ylabel('y')
points in the plane, as follows. Let X and Y be n vectors.
Then We can now use MATLAB to solve the equation (2.1.1)
graphically. Recall that (2.1.1) is:
plot(X,Y)
x + y =7
will plot the points (X(1), Y (1)), (X(2), Y (2)), …, −x + 3y = 1
(X(n), Y (n)) in the xy-plane.
A solution to this system of equations is a point that lies
To plot points on the line (2.2.1) we need to enter the x- on both lines in the system. Suppose that we search for a
coordinates of the points we wish to plot. If we want to solution to this system that has an x-coordinate between
plot a hundred points, we would be facing a tedious task. −3 and 7. Then type the commands
MATLAB has a command to simplify this task. Typing
x = linspace(-3,7,100);
x = linspace(-5,5,100); y = 7 - x;
plot(x,y)
produces a vector x with 100 entries with the 1st entry xlabel('x')
equal to −5, the last entry equal to 5, and the remaining ylabel('y')
98 entries equally spaced between −5 and 5. MATLAB hold on
has another command that allows us to create a vector y = (1 + x)/3;
of points x. In this command we specify the distance plot(x,y)
between points rather than the number of points. That axis('equal')
command is: grid
23
§2.2 The Geometry of Low-Dimensional Solutions
The MATLAB command hold on tells MATLAB to keep these lines are parallel and unequal, then there are no
the present figure and to add the information that follows solutions, as there are no points of intersection.
to that figure. The command axis('equal') instructs
MATLAB to make unit distances on the x and y axes
Linear Equations in Three Dimensions We begin by
equal. The last MATLAB command superimposes grid
observing that the set of all solutions to a linear equa-
lines. See Figure 6. From this figure you can see that
tion in three variables forms a plane. More precisely, the
the solution to this system is (x, y) = (5, 2), which we
solutions to the equation
already knew.
10 ax + by + cz = d (2.2.2)
X · Y = x1 y1 + x2 y2 + x3 y3 ,
y
4
where X = (x1 , x2 , x3 ) and Y = (y1 , y2 , y3 ). We recall
from Chapter 1 (1.4.3) the following important fact con-
cerning dot products:
2
X ·Y =0
24
§2.2 The Geometry of Low-Dimensional Solutions
puts equation (2.2.3) into the form (2.2.2). In this way The first command tells MATLAB to create a square grid
we see that the set of solutions to a single linear equation in the xy-plane. Grid points are equally spaced between
in three variables forms a plane. See Figure 7. −5 and 5 at intervals of 0.5 on both the x and y axes. The
second command tells MATLAB to compute the z value
N of the solution to (2.2.4) at each grid point. The third
command tells MATLAB to graph the surface containing
the points (x, y, z). See Figure 8.
X 30
X
0 20
10
−10
−20
0 −30
5
N.
0
0
−5
We now use MATLAB to visualize the planes that are
−5
25
§2.2 The Geometry of Low-Dimensional Solutions
20
10
30
0
20
−10
10
−20
0
−30
−10 5
5
−20
0
0
−30
5
−5 −5
5
0
0
Figure 10: Point of intersection of three planes.
−5 −5
Figure 9: Line of intersection of two planes. this system accurately. Denote the 3 × 3 matrix of coef-
ficients by A, the vector of coefficients on the right hand
We can now see geometrically that the solution to three side by b, and the solution by x. Solve the system in
simultaneous linear equations in three unknowns will gen- MATLAB by typing
erally be a point — since generally three planes in three
space intersect in a point. To visualize this intersection, A = [ -2 3 1; 2 -3 1; -3 0.2 1];
as shown in Figure 10, we extend the previous system of b = [2; 0; 1];
equations to x = A\b
26
§2.2 The Geometry of Low-Dimensional Solutions
12
y
a single variable, such as 8
y = x2 − 2x + 3 (2.2.5) 6
produces the graph of (2.2.5) in Figure 11. In a similar 4. Find a system of two linear equations in three unknowns
fashion, MATLAB has the ‘dot’ operations of ./, .\, and whose solution set is the line consisting of scalar multiples of
.^, as well as .*. the vector (1, 2, 1).
27
§2.2 The Geometry of Low-Dimensional Solutions
5. Find the cosine of the angle between the normal vectors 9. (matlab) Use MATLAB to solve graphically the planar
to the planes system of linear equations
x + 4y = −4
4x + 3y = 4 12. (matlab) Use MATLAB to graph the function y =
2 − x sin(x2 − 1) on the interval [−2, 3]. How many relative
to an accuracy of two decimal points. maxima does this function have on this interval?
Hint: The MATLAB command zoom on allows us to view
the plot in a window whose axes are one-half those of original.
Each time you click with the mouse on a point, the axes’ limits
are halved and centered at the designated point. Coupling
zoom on with grid on allows you to determine approximate
numerical values for the intersection point.
28
§2.3 Gaussian Elimination
Easily Solved Equations Some systems are easily Definition 2.3.1. A linear system of equations is in-
solved. The system of three equations (m = 3) in three consistent if the system has no solutions and consistent
unknowns (n = 3) if the system does have solutions.
x1 + 2x2 + 3x3 = 10
1 7 As discussed in the previous section, (2.1.7) is an example
x2 − x3 = (2.3.2) of a linear system that MATLAB cannot solve. In fact,
5 5
x3 = 3 that system is inconsistent — inspect the 2nd and 4th
equations in (2.1.7).
is one example. The 3rd equation states that x3 = 3.
Gaussian elimination is an algorithm for finding all so-
Substituting this value into the 2nd equation allows us to
lutions to a system of linear equations by reducing the
solve the 2nd equation for x2 = 2. Finally, substituting
given system to ones like (2.3.2) and (2.3.3), that are
x2 = 2 and x3 = 3 into the 1st equation allows us to solve
easily solved by back substitution. Consequently, Gaus-
for x1 = −3. The process that we have just described is
sian elimination can also be used to determine whether
called back substitution.
a system is consistent or inconsistent.
Next, consider the system of two equations (m = 2) in
three unknowns (n = 3):
Elementary Equation Operations There are three ways
x1 + 2x2 + 3x3 = 10
(2.3.3) to change a system of equations without changing the
x3 = 3 . set of solutions; Gaussian elimination is based on this
observation. The three elementary operations are:
The 2nd equation in (2.3.3) states that x3 = 3. Substitut-
ing this value into the 1st equation leads to the equation
(a) Swap two equations.
x1 = 1 − 2x2 .
(b) Multiply a single equation by a nonzero number.
We have shown that every solution to (2.3.3) has the
form (x1 , x2 , x3 ) = (1 − 2x2 , x2 , 3) and that every vector (c) Add a scalar multiple of one equation to another.
29
§2.3 Gaussian Elimination
We begin with an example: The augmented matrix contains all of the information
that is needed to solve system (2.3.1).
x1 + 2x2 + 3x3 = 10
x1 + 2x2 + x3 = 4 (2.3.4)
2x1 + 9x2 + 5x3 = 27 . Elementary Row Operations The elementary opera-
tions used in Gaussian elimination can be interpreted as
Gaussian elimination works by eliminating variables from
row operations on the augmented matrix, as follows:
the equations in a fashion similar to the substitution
method in the previous section. To begin, eliminate the
(a) Swap two rows.
variable x1 from all but the 1st equation, as follows. Sub-
tract the 1st equation from the 2nd , and subtract twice (b) Multiply a single row by a nonzero number.
the 1st equation from the 3rd , obtaining:
(c) Add a scalar multiple of one row to another.
x1 + 2x2 + 3x3 = 10
−2x3 = −6 (2.3.5) We claim that by using these elementary row operations
5x2 − x3 = 7 . intelligently, we can always solve a consistent linear sys-
Next, swap the 2nd and 3rd equations, so that the coef- tem — indeed, we can determine when a linear system
ficient of x2 in the new 2nd equation is nonzero. This is consistent or inconsistent. The idea is to perform ele-
yields mentary row operations in such a way that the new aug-
mented matrix has zero entries below the diagonal.
x1 + 2x2 + 3x3 = 10
5x2 − x3 = 7 (2.3.6) We describe this process inductively. Begin with the 1st
−2x3 = −6 . column. We assume for now that some entry in this col-
umn is nonzero. If a11 = 0, then swap two rows so that
Now, divide the 2nd equation by 5 and the 3rd equation the number a11 is nonzero. Then divide the 1st row by
by −2 to obtain a system of equations identical to our a11 so that the leading entry in that row is 1. Now sub-
first example (2.3.2), which we solved by back substitu- tract ai1 times the 1st row from the ith row for each row
tion. i from 2 to m. The end result is that the 1st column has
a 1 in the 1st row and a 0 in every row below the 1st .
The result is
Augmented Matrices The process of performing Gaus-
1 ∗ ··· ∗
sian elimination when the number of equations is greater 0 ∗ ··· ∗
than two or three is painful. The computer, however, can .. .. .. .. .
. . . .
help with the manipulations. We begin by introducing
the augmented matrix. The augmented matrix associ- 0 ∗ ··· ∗
ated with (2.3.1) has m rows and n + 1 columns and is
written as: Next we consider the 2nd column. We assume that some
entry in that column below the 1st row is nonzero. So,
a11 a12 · · · a1n b1
a21 a22 · · · a2n b2 if necessary, we can swap two rows below the 1st row so
(2.3.7) that the entry a22 is nonzero. Then we divide the 2nd
.. .. .. ..
. . . . row by a22 so that its leading nonzero entry is 1. Then
am1 am2 ··· amn bm we subtract appropriate multiples of the 2nd row from
30
§2.3 Gaussian Elimination
each row below the 2nd so that all the entries in the 2nd A(4,:) = A(4,:) - 3*A(7,:)
column below the 2nd row are 0. The result is
The first elementary row operation, swapping two rows,
1 ∗ ··· ∗ requires a different kind of MATLAB command. In MAT-
··· ∗ LAB, the ith and j th rows of the matrix A are permuted
0 1
.. .. .. .. .
. . . . by the command
0 0 ··· ∗
A([i j],:) = A([j i],:)
Then we continue with the 3rd column. That’s the idea. So, to swap the 1st and 3rd rows of the matrix A, we type
However, does this process always work and what hap-
pens if all of the entries in a column are zero? Before A([1 3],:) = A([3 1],:)
answering these questions we do experimentation with
MATLAB.
Examples of Row Reduction in MATLAB Let us see
how the row operations can be used in MATLAB. As an
Row Operations in MATLAB In MATLAB the ith row
example, we consider the augmented matrix
of a matrix A is specified by A(i,:). Thus to replace the
5th row of a matrix A by twice itself, we need only type:
1 3 0 −1 −8
2 6 −4 4 4
(2.3.8*)
A(5,:) = 2*A(5,:)
1 0 −1 −9 −35
0 1 0 3 10
In general, we can replace the ith row of the matrix A by
c times itself by typing We enter this information into MATLAB by typing
31
§2.3 Gaussian Elimination
A = A =
1 3 0 -1 -8 1 3 0 -1 -8
0 0 -4 6 20 0 1 0 3 10
1 0 -1 -9 -35 0 0 -1 1 3
0 1 0 3 10 0 0 -4 6 20
In the next step, we eliminate the 1 from the entry in the Now we have set all entries in the 2nd column below the
3rd row, 1st column of A. We do this by typing 2nd row to 0.
Next, we set the first nonzero entry in the 3rd row to 1
A(3,:) = A(3,:) - A(1,:)
by multiplying the 3rd row by −1, obtaining
which yields
A =
A = 1 3 0 -1 -8
1 3 0 -1 -8 0 1 0 3 10
0 0 -4 6 20 0 0 1 -1 -3
0 -3 -1 -8 -27 0 0 -4 6 20
0 1 0 3 10
Since the leading nonzero entry in the 3rd row is 1, we
Using elementary row operations, we have now set the next eliminate the nonzero entry in the 3rd column, 4th
entries in the 1st column below the 1st row to 0. Next, row. This is accomplished by the following MATLAB
we alter the 2nd column. We begin by swapping the 2nd command:
and 4th rows so that the leading nonzero entry in the 2nd
row is 1. To accomplish this swap, we type A(4,:) = A(4,:) + 4*A(3,:)
32
§2.3 Gaussian Elimination
e2_3_11 A =
1 0 -2 3 4 0 1
The information in (2.3.11*) is contained in the coeffi- 0 1 2 4 0 -2 0
cient matrix C and the right hand side b. A direct solu- 0 -1 0 -6 -10 8 -6
tion is found by typing -3 0 6 -8 -12 2 -2
which yields the same answer as in (2.3.10), namely, A(4,:) = A(4,:) + 3*A(1,:)
33
§2.3 Gaussian Elimination
to create two more zeros in the 4th row. Finally, we nonzero coefficient. In this case, we use the 4th equa-
eliminate the -1 in the 3rd row, 2nd column by tion to solve for x4 in terms of x5 and x6 , and then we
substitute for x4 in the first three equations. This process
A(3,:) = A(3,:) + A(2,:) can also be accomplished by elementary row operations.
Indeed, eliminating the variable x4 from the first three
to arrive at equations is the same as using row operations to set the
first three entries in the 4th column to 0. We can do this
A = by typing
1 0 -2 3 4 0 1
0 1 2 4 0 -2 0 A(3,:) = A(3,:) + A(4,:);
0 0 2 -2 -10 6 -6 A(2,:) = A(2,:) - 4*A(4,:);
0 0 0 1 0 2 1 A(1,:) = A(1,:) - 3*A(4,:)
Next we set the leading nonzero entry in the 3rd row to Remember: By typing semicolons after the first two
1 by dividing the 3rd row by 2. That is, we type rows, we have told MATLAB not to print the intermediate
results. Since we have not typed a semicolon after the
A(3,:) = A(3,:)/2 3rd row, MATLAB outputs
to obtain A =
1 0 -2 0 4 -6 -2
A = 0 1 2 0 0 -10 -4
1 0 -2 3 4 0 1 0 0 1 0 -5 5 -2
0 1 2 4 0 -2 0 0 0 0 1 0 2 1
0 0 1 -1 -5 3 -3
0 0 0 1 0 2 1 We proceed with back substitution by eliminating the
nonzero entries in the first two rows of the 3rd column.
We say that the matrix A is in (row) echelon form since To do this, type
the first nonzero entry in each row is a 1, each entry in a
column below a leading 1 is 0, and the leading 1 moves A(2,:) = A(2,:) - 2*A(3,:);
to the right as you go down the matrix. In row echelon A(1,:) = A(1,:) + 2*A(3,:)
form, the entries where leading 1’s occur are called pivots.
If we compare the structure of this matrix to the ones we which yields
have obtained previously, then we see that here we have
two columns too many. Indeed, we may solve these equa- A =
tions by back substitution for any choice of the variables
1 0 0 0 -6 4 -6
x5 and x6 .
0 1 0 0 10 -20 0
The idea behind back substitution is to solve the last 0 0 1 0 -5 5 -2
equation for the variable corresponding to the first 0 0 0 1 0 2 1
34
§2.3 Gaussian Elimination
The augmented matrix is now in reduced echelon form It follows that the general solution to a linear system of
and the corresponding system of equations has the form equations is given by a single solution (x5 = x6 = 0) plus
the linear combination of a finite number of vectors. We
x1 − 6x5 + 4x6 = −6 will discuss reduced echelon form in more detail in the
x2 + 10x5 − 20x6 = 0
(2.3.13) next section.
x3 − 5x5 + 5x6 = −2
x4 + 2x6 = 1,
Exercises
A matrix is in reduced echelon form if it is in echelon
form and if every entry in a column containing a pivot,
other than the pivot itself, is 0. In Exercises 1 – 3 determine whether the given matrix is in
Reduced echelon form allows us to solve directly this sys- reduced echelon form.
tem of equations in terms of the variables x5 and x6 ,
1 −1 0 1
1. 0 1 0 −6 .
x1 −6 + 6x5 − 4x6 0 0 1 0
x2 −10x5 + 20x6
1 0 −2 0
x3 −2 + 5x5 − 5x6
= . (2.3.14) 2. 0 1 4 0 .
x4 1 − 2x6
x5
0 0 0 1
x5
x6 x6
0 1 0 3
3. 0 0 2 1 .
It is important to note that every consistent system of 0 0 0 0
linear equations corresponding to an augmented matrix
in reduced echelon form can be solved as in (2.3.14) —
and this is one reason for emphasizing reduced echelon In Exercises 4 – 6 we list the reduced echelon form of an
form. We can rewrite the solutions in (2.3.14) in the augmented matrix of a system of linear equations. Which
columns in these augmented matrices contain pivots? De-
form:
scribe all solutions to these systems of equations in the form
of (2.3.14).
x1 , −6 6 −4
x2 0 −10 20
1 4 0 0
4. 0 0 1 5 .
x3 −2 5
−5
= + x5 + x 6
.
−2
x4 1 0 0 0 0 0
x5 0 1 0
x6 0 0 1 1 2 0 0 0
5. 0 0 1 1 0 .
Definition 2.3.2. A linear combination of the vectors 0 0 0 0 1
v1 , . . . , vk in Rn is a vector in Rn of the form
1 −6 0 −1 1
6. 0 0 1 2 9 .
v = α1 v1 + · · · + αk vk 0 0 0 0 0
where α1 , . . . αk are scalars in R.
35
§2.3 Gaussian Elimination
7. Suppose that à = [A|b] is the augmented matrix of a (b) if solutions are not unique, how many variables can be
system of 4 linear equations in 7 unknowns. Suppose that assigned arbitrary values.
the solution is defined by 5 parameters. Let E be the 4 × 8
reduced echelon form of Ã. How many pivots does E have?
1 0 0 3
12. 0 2 1 1 .
0 0 0 0
36
§2.3 Gaussian Elimination
In Exercises 21 – 23 use elementary row operations and MAT- 26. (matlab) Comment: To understand the point of this
LAB to put each of the given matrices into row echelon form. exercise you must begin by typing the MATLAB command
Suppose that the matrix is the augmented matrix for a system format short e. This command will set a format in which
of linear equations. Is the system consistent or inconsistent? you can see the difficulties that sometimes arise in numerical
computations.
21. (matlab)
Consider the following two 3 × 3-matrices:
2 1 1
.
4 2 3
1 3 4 3 14
22. (matlab) A = 2 1 1 and B = 1 21 .
3 −4 0 2 −4 3 5 3 −45
0 2 3 1 . (2.3.17*)
3 1 4 5 Note that matrix B is obtained from matrix A by interchang-
ing the first two columns.
23. (matlab)
−2 1 9 1
(a) Use MATLAB to put A into row echelon form using the
3 3 −4 2 . transformations
1 4 5 5 (a) Subtract 2 times the 1st row from the 2nd .
(b) Add 4 times the 1st row to the 3rd .
Observation: In standard format MATLAB displays all (c) Divide the 2nd row by −5.
nonzero real numbers with four decimal places while it dis-
(d) Subtract 15 times the 2nd row from the 3rd .
plays zero as 0. An unfortunate consequence of this display
is that when a matrix has both zero and noninteger entries, (b) Put B by hand into row echelon form using the trans-
the columns will not align — which is a nuisance. You can formations
work with rational numbers rather than decimal numbers by (a) Divide the 1st row by 3.
typing format rational. Then the columns will align.
(b) Subtract the 1st row from the 2nd .
24. (matlab) Load the following 6 × 8 matrix A into MAT- (c) Subtract 3 times the 1st row from the 3rd .
LAB by typing e2_3_16. (d) Multiply the 2nd row by 3/5.
(e) Add 5 times the 2nd row to the 3rd .
0 0 0 1 3 5 0 9
0 3
6 −6 −6 −12 0 1
(c) Use MATLAB to put B into row echelon form using the
0 2 4 −5 −7 14 0 1 (2.3.16*)
same transformations as in part (b).
A= 0 1 2 1 14 21 0 −1
(d) Discuss the outcome of the three transformations. Is
0 0 0 2 4 9 0 7
there a difference in the results? Would you expect to
0 5 10 −11 −13 2 0 2
see a difference? Could the difference be crucial when
Use MATLAB to transform this matrix to row echelon form. solving a system of linear equations?
25. (matlab) Use row reduction and back substitution to 27. (matlab) Find a cubic polynomial
solve the following system of linear equations:
p(x) = ax3 + bx2 + cx + d
2x1 + 3x2 − 4x3 + x4 = 2
3x1 − x2 − x3 + 2x4 = 4 so that p(1) = 2, p(2) = 3, p0 (−1) = −1, and p0 (3) = 1.
x1 − 7x2 + 5x3 − x4 = 6
37
§2.4 Reduction to Echelon Form
2.4 Reduction to Echelon Form Here are three examples of matrices that are not in ech-
elon form.
In this section, we formalize our previous numerical ex-
0 0 1 15
periments. We define more precisely the notions of ech- 1 −1 14 −6
elon form and reduced echelon form matrices, and we 0 0 0 0
prove that every matrix can be put into reduced eche-
lon form using a sequence of elementary row operations.
1 −1 14 −6
Consequently, we will have developed an algorithm for 0 0 3 15
determining whether a system of linear equations is con- 0 0 0 0
sistent or inconsistent, and for determining all solutions
to a consistent system. 1 −1 14 −6
0 0 0 0
Definition 2.4.1. A matrix E is in (row) echelon form 0 0 1 15
if two conditions hold.
Definition 2.4.2. Two m×n matrices are row equivalent
(a) The first nonzero entry in each row of E is equal to if one can be transformed to the other by a sequence of
1. This leading entry 1 is called a pivot. elementary row operations.
(b) A pivot in the (i + 1)st row of E occurs in a column
to the right of the column where the pivot in the ith Let A = (aij ) be a matrix with m rows and n columns.
row occurs. We want to show that we can perform row operations
on A so that the transformed matrix is in echelon form;
Note: A consequence of Definition 2.4.1 is that all rows that is, A is row equivalent to a matrix in echelon form.
in an echelon form matrix that are identically zero occur If A = 0, then we are finished. So we assume that some
at the bottom of the matrix. entry in A is nonzero and that the 1st column where that
Here are three examples of matrices that are in echelon nonzero entry occurs is in the k th column. By swapping
form. The pivot in each row (which is always equal to 1) rows we can assume that a1k is nonzero. Next, divide
is preceded by a ∗. the 1st row by a1k , thus setting a1k = 1. Now, using
MATLAB notation, perform the row operations
∗1 0 −1 0 −6 4 −6
0 ∗1 4 0 0 −2 0
A(i,:) = A(i,:) - A(i,k)*A(1,:)
0 0 0 ∗1 −5 5 −2
0 0 0 0 0 ∗1 0 for each i ≥ 2. This sequence of row operations leads to
a matrix whose first nonzero column has a 1 in the 1st
∗1 0 −1 0 −6
0 ∗1 0 3 0 row and a zero in each row below the 1st row.
0 ∗1 −5 Now we look for the next column that has a nonzero entry
0 0
0 0 0 0 0 below the 1st row and call that column `. By construction
` > k. We can swap rows so that the entry in the 2nd
0 ∗1 −1 14 −6
0 0 0 ∗1 15 row, `th column is nonzero. Then we divide the 2nd row
by this nonzero element, so that the pivot in the 2nd row
0 0 0 0 0
0 0 0 0 0 is 1. Again we perform elementary row operations so that
38
§2.4 Reduction to Echelon Form
all entries below the 2nd row in the `th column are set to Reduced Echelon Form in MATLAB Preprogrammed
0. Now proceed inductively until we run out of nonzero into MATLAB is a routine to row reduce any matrix to
rows. reduced echelon form. The command is rref. For ex-
ample, recall the 4 × 7 matrix A in (2.3.12*) by typing
This argument proves:
e2_3_12. Put A into reduced row echelon form by typing
Proposition 2.4.3. Every matrix is row equivalent to a rref(A) and obtaining
matrix in echelon form.
ans =
More importantly, the previous argument provides an al- 1 0 0 0 -6 4 -6
gorithm for transforming matrices into echelon form. 0 1 0 0 10 -20 0
0 0 1 0 -5 5 -2
Reduction to Reduced Echelon Form 0 0 0 1 0 2 1
Definition 2.4.4. A matrix E is in reduced echelon form Compare the result with the system of equations (2.3.13).
if
Proof Let A be a matrix. Proposition 2.4.3 states all elementary row operations are invertible
that we can transform A by elementary row operations
— they can be undone.
to a matrix E in echelon form. Next we transform E
into reduced echelon form by some additional elementary For example, swapping two rows is undone by just swap-
row operations, as follows. Choose the pivot in the last ping these rows again. Similarly, multiplying a row by a
nonzero row of E. Call that row `, and let k be the nonzero number c is undone by just dividing that same
column where the pivot occurs. By adding multiples of row by c. Finally, adding c times the j th row to the ith
the `th row to the rows above, we can transform each row is undone by subtracting c times the j th row from
entry in the k th column above the pivot to 0. Note that the ith row.
none of these row operations alters the matrix before the Thus, we can make several observations about solutions
k th column. (Also note that this process is identical to to linear systems. Let E be an augmented matrix corre-
the process of back substitution.)
sponding to a system of linear equations having n vari-
Again we proceed inductively by choosing the pivot in ables. Since an augmented matrix is formed from the
the (` − 1)st row, which is 1, and zeroing out all entries matrix of coefficients by adding a column, we see that
above that pivot using elementary row operations. the augmented matrix has n + 1 columns.
39
§2.4 Reduction to Echelon Form
Theorem 2.4.6. Suppose that E is an m × (n + 1) aug- Thus, each choice of the n − ` numbers x`+1 , . . . , xn
mented matrix that is in reduced echelon form. Let ` be uniquely determines values of x1 , . . . , x` so that
the number of nonzero rows in E x1 , . . . , xn is a solution to this system. In particular,
the system is consistent, so (a) is proved; and the set of
(a) The system of linear equations corresponding to E all solutions is parameterized by n − ` numbers, so (b) is
is inconsistent if and only if the `th row in E has a proved.
pivot in the (n + 1)st column.
(b) If the linear system corresponding to E is consistent, Two Examples Illustrating Theorem 2.4.6 The reduced
then the set of all solutions is parameterized by n − ` echelon form matrix
parameters.
1 5 0 0
E= 0 0 1 0
0 0 0 1
Proof Suppose that the last nonzero row in E has its
pivot in the (n + 1)st column. Then the corresponding is the augmented matrix of an inconsistent system of
equation is: three equations in three unknowns.
This system can be rewritten in the form: Consequences of Theorem 2.4.6 It follows from Theo-
rem 2.4.6 that linear systems of equations with fewer
x1 = b1 − a1,`+1 x`+1 − · · · − a1,n xn equations than unknowns and with zeros on the right
hand side always have nonzero solutions. More precisely:
x2 = b2 − a2,`+1 x`+1 − · · · − a2,n xn (2.4.1)
.. .. Corollary 2.4.7. Let A be an m×n matrix where m < n.
. . Then the system of linear equations whose augmented
x` = b` − a`,`+1 x`+1 − · · · − a`,n xn . matrix is (A|0) has a nonzero solution.
40
§2.4 Reduction to Echelon Form
Proof  Perform elementary row operations on the augmented matrix (A|0) to arrive at the reduced echelon form matrix (E|0). Since the zero vector is a solution, the associated system of equations is consistent. Now the number of nonzero rows ℓ in (E|0) is less than or equal to the number of rows m in E. By assumption m < n and hence ℓ < n. It follows from Theorem 2.4.6 that solutions to the linear system are parametrized by n − ℓ ≥ 1 parameters and that there are nonzero solutions.

Recall that two m × n matrices are row equivalent if one can be transformed to the other by elementary row operations.

Corollary 2.4.8. Let A be an n × n square matrix and let b be in Rn. Then A is row equivalent to the identity matrix In if and only if the system of linear equations whose augmented matrix is (A|b) has a unique solution.

Proof  Suppose that A is row equivalent to In. Then, by using the same sequence of elementary row operations, it follows that the n × (n + 1) augmented matrix (A|b) is row equivalent to (In|c) for some vector c ∈ Rn. The system of linear equations that corresponds to (In|c) is:

x1 = c1
  ⋮
xn = cn,

which transparently has the unique solution x = (c1, . . . , cn). Since elementary row operations do not change the solutions of the equations, the original augmented system (A|b) also has a unique solution.

Conversely, suppose that the system of linear equations associated to (A|b) has a unique solution. Suppose that (A|b) is row equivalent to a reduced echelon form matrix E. Suppose that the last nonzero row in E is the ℓth row. Since the system has a solution, it is consistent. Hence Theorem 2.4.6(b) implies that the solutions to the system corresponding to E are parameterized by n − ℓ parameters. If ℓ < n, then the solution is not unique. So ℓ = n.

Next observe that since the system of linear equations is consistent, it follows from Theorem 2.4.6(a) that the pivot in the nth row must occur in a column before the (n + 1)st. It follows that the reduced echelon matrix E = (In|c) for some c ∈ Rn. Since (A|b) is row equivalent to (In|c), it follows, by using the same sequence of elementary row operations, that A is row equivalent to In.

Uniqueness of Reduced Echelon Form and Rank  Abstractly, our discussion of reduced echelon form has one point remaining to be proved. We know that every matrix A can be transformed by elementary row operations to reduced echelon form. Suppose, however, that we use two different sequences of elementary row operations to transform A to two reduced echelon form matrices E1 and E2. Can E1 and E2 be different? The answer is: No.

Theorem 2.4.9. For each matrix A, there is precisely one reduced echelon form matrix E that is row equivalent to A.

The proof of Theorem 2.4.9 is given in Section 2.6. Since every matrix is row equivalent to a unique matrix in reduced echelon form, we can define the rank of a matrix as follows.

Definition 2.4.10. Let A be an m × n matrix that is row equivalent to a reduced echelon form matrix E. Then the rank of A, denoted rank(A), is the number of nonzero rows in E.
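The rank of a matrix is available directly in MATLAB. The following lines are only a quick illustration (any matrix would do; the one chosen here reappears in Section 3.4): they compare the output of rank with the number of nonzero rows of rref.

A = [1 2 -1 1; 2 5 -4 -1];   % an example matrix
rref(A)                      % reduced echelon form: two nonzero rows
rank(A)                      % returns 2, the number of nonzero rows above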
Corollary 2.4.11. We make four remarks concerning the rank of a matrix.

(c) The number ℓ in the statement of Theorem 2.4.6 is just the rank of E.

(d) In particular, if the augmented matrix corresponding to a consistent system of linear equations in n unknowns has rank ℓ, then the solutions to this system are parametrized by n − ℓ parameters.

Exercises

4. The augmented matrix of a consistent system of five equations in seven unknowns has rank equal to three. How many parameters are needed to specify all solutions?

5. The augmented matrix of a consistent system of nine equations in twelve unknowns has rank equal to five. How many parameters are needed to specify all solutions?

6. Consider the system of equations
8. (matlab)
b1
b2
2
4 6 −2 1
(a) Describe the sets of vectors b = b3 ∈ R such that
4
B= 0 0 4 1 −1 (2.4.3*) b4
2 4 0 1 2 the system of equations Ax = b has (i) no solution, (ii)
one solution, and (iii) infinitely many solutions.
9. (matlab)
(b) Denote the first column of A by C1 , the second column
2 3 −1 4 by C2 and the third column by C3 . Can you write the
C= 8 11 −7 8 (2.4.4*) vector
2 2 −4 −3
2
10. (matlab) y=
−4
5
2.3 4.66 −1.2 2.11 −2 0
0 0 1.33 0 1.44
D= in the form
4.6 9.32 −7.986 4.22 −10.048
1.84 3.728 −5.216 1.688 −6.208 x1 C1 + x2 C2 + x3 C3 (2.4.6)
(2.4.5*)
where x1 , x2 , x3 ∈ R? If so, express y in this form.
(i) no solution,
14. Prove that the rank of an m × n matrix A is less than or (ii) one solution, and
equal to the minimum of m and n.
(iii) infinitely many solutions.
15. Consider the matrix Solution: Subtracting r times the first row of A from the
1 0 −1
second row of that matrix yields
−2 0 2
A=
0 1 −2 1 −r 1 1 −r 1
2 =
0 0 0 0 r −1 1−r 0 (r + 1)(r − 1) 1−r
(a) rank(A) = 2 if r ≠ 1.

(b) The linear system corresponding to the augmented matrix A has

(i) no solution if r = −1,
(ii) one solution if r ≠ ±1, and
(iii) infinitely many solutions if r = 1.
2.5 Linear Equations with Special Coefficients

Then divide the 2nd row by 36.2 − 3π√2, obtaining:

   1   π√2   11.2√2
   0    1    (e − 33.6√2)/(36.2 − 3π√2)
to obtain

ans =
   24.5417
   -1.9588

The reader may check that this answer agrees with the answer in (2.5.2) to MATLAB output accuracy by typing

x2 = (exp(1)-33.6*sqrt(2))/(36.2-3*pi*sqrt(2))
x1 = 11.2*sqrt(2)-pi*sqrt(2)*x2

to obtain

x1 =
   24.5417

and

x2 =
   -1.9588

Integers and Rational Numbers  Now suppose that all of the coefficients in a system of linear equations are integers. When we add, subtract or multiply integers — we get integers. In general, however, when we divide an integer by an integer we get a rational number rather than an integer. Indeed, since elementary row operations involve only the operations of addition, subtraction, multiplication and division, we see that if we perform elementary row operations on a matrix with integer entries, we will end up with a matrix with rational numbers as entries. MATLAB can display calculations using rational numbers rather than decimal numbers. To display calculations using only rational numbers, type

format rational

For example, let

        2  2  1  0
   A =  1  3 -5  1        (2.5.3*)
        4  2  1  3
        2  1 -1  4
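As a small illustration of this display mode (the particular computation below is only an example, not a prescribed one), the inverse of the integer matrix A is shown with rational entries:

format rational
A = [2 2 1 0; 1 3 -5 1; 4 2 1 3; 2 1 -1 4];
inv(A)              % the entries appear as ratios of integers
format short        % return to the default decimal display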
MATLAB handles the arithmetic of complex numbers in the same format it uses to do arithmetic with real and rational numbers. For example, we can solve the system of linear equations

(4 − i)x1 + 2x2 = 3 − i
2x1 + (4 − 3i)x2 = 2 + i

in MATLAB by typing

A = [4-i 2; 2 4-3i];
b = [3-i; 2+i];
A\b

The solution to this system of equations is:

ans =
   0.8457 - 0.1632i
  -0.1098 + 0.2493i

Note: Care must be given when entering complex numbers into arrays in MATLAB. For example, if you type

b = [3 -i; 2 +i]

Complex Conjugation  Let a = α + iβ be a complex number. Then the complex conjugate of a is defined to be ā = α − iβ.

Let a = α + iβ and c = γ + iδ be complex numbers. Then we claim that

a + c  has conjugate  ā + c̄
ac     has conjugate  ā c̄          (2.5.5)

To verify these statements, calculate

a + c = (α + γ) + i(β + δ),  whose conjugate is  (α + γ) − i(β + δ) = (α − iβ) + (γ − iδ) = ā + c̄,

and

ac = (αγ − βδ) + i(αδ + βγ),  whose conjugate is  (αγ − βδ) − i(αδ + βγ) = (α − iβ)(γ − iδ) = ā c̄.
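A quick numerical spot check of (2.5.5) can be made in MATLAB using the built-in conj function; the two complex numbers below are chosen arbitrarily.

a = 2 + 3i;
c = -1 + 4i;
conj(a + c) - (conj(a) + conj(c))    % both differences should be 0
conj(a*c) - conj(a)*conj(c)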
In Exercises 4 – 6 use MATLAB to solve the given system of In Exercise 12-15, write the given expression in the form α+iβ
linear equations to four significant decimal places. where α, β ∈ R:
4. (matlab) 12.
1−i
.
√ 1+i
0.1x1
√ + 5x2 − 2x3 = 1 √
3+i
− 3x1 + πx2 − 2.6x3 = 14.3 .
√ 13. √ .
π 1 − 3i
x1 − 7x2 + x3 = 2
2
14. (−2 + 3i)(1 − 2i)2 .
5. (matlab)
(5 + 4i)(1 + i)
(4 − i)x1 + (2 + 3i)x2 = −i 15. .
. 3 − 2i
ix1 − 4x2 = 2.2
2.6 Uniqueness of Reduced Echelon Form

In this section we prove Theorem 2.4.9, which states that every matrix is row equivalent to precisely one reduced echelon form matrix.

Proof of Theorem 2.4.9:  Suppose that E and F are two m × n reduced echelon matrices that are row equivalent to A. Since elementary row operations are invertible, the two matrices E and F are row equivalent. Thus, the systems of linear equations associated to the m × (n + 1) matrices (E|0) and (F|0) must have exactly the same set of solutions. It is the fact that the solution sets of the linear equations associated to (E|0) and (F|0) are identical that allows us to prove that E = F.

Begin by renumbering the variables x1, . . . , xn so that the equations associated to (E|0) have the form:

x1 = −a1,ℓ+1 xℓ+1 − · · · − a1,n xn
x2 = −a2,ℓ+1 xℓ+1 − · · · − a2,n xn        (2.6.1)
  ⋮
xℓ = −aℓ,ℓ+1 xℓ+1 − · · · − aℓ,n xn.

In this form, pivots of E occur in the columns 1, . . . , ℓ. We begin by showing that the matrix F also has pivots in columns 1, . . . , ℓ. Moreover, there is a unique solution to these equations for every choice of numbers xℓ+1, . . . , xn.

Suppose that the pivots of F do not occur in columns 1, . . . , ℓ. Then there is a row in F whose first nonzero entry occurs in a column k > ℓ. This row corresponds to an equation

xk = ck+1 xk+1 + · · · + cn xn.

Now, consider solutions that satisfy

xk+1 = · · · = xn = 0.

In the equations associated to the matrix (E|0) the variable xk is one of the free variables xℓ+1, . . . , xn and may be chosen arbitrarily; but in the equations associated to the matrix (F|0), xk must be zero to be a solution. This argument contradicts the fact that the (E|0) equations and the (F|0) equations have the same solutions. So the pivots of F must also occur in columns 1, . . . , ℓ, and the equations associated to F must have the form:

x1 = −â1,ℓ+1 xℓ+1 − · · · − â1,n xn
x2 = −â2,ℓ+1 xℓ+1 − · · · − â2,n xn        (2.6.2)
  ⋮
xℓ = −âℓ,ℓ+1 xℓ+1 − · · · − âℓ,n xn

where âi,j are scalars.

To complete this proof, we show that ai,j = âi,j. These equalities are verified as follows. There is just one solution to each system (2.6.1) and (2.6.2) of the form

xℓ+1 = 1, xℓ+2 = · · · = xn = 0.

These solutions are

(−a1,ℓ+1, . . . , −aℓ,ℓ+1, 1, 0, · · · , 0)

for (2.6.1) and

(−â1,ℓ+1, . . . , −âℓ,ℓ+1, 1, 0, · · · , 0)

for (2.6.2). It follows that aj,ℓ+1 = âj,ℓ+1 for j = 1, . . . , ℓ. Complete this proof by repeating this argument. Just inspect solutions of the form

xℓ+1 = 0, xℓ+2 = 1, xℓ+3 = · · · = xn = 0

through

xℓ+1 = · · · = xn−1 = 0, xn = 1.
Chapter 3 Matrices and Linearity
3.1 Matrix Multiplication of Vectors

In Chapter 2 we discussed how matrices appear when solving systems of m linear equations in n unknowns. Given the system

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2        (3.1.1)
  ⋮
am1 x1 + am2 x2 + · · · + amn xn = bm,

we saw that all relevant information is contained in the m × n matrix of coefficients

        a11  a12  · · ·  a1n
   A =  a21  a22  · · ·  a2n
         ⋮                ⋮
        am1  am2  · · ·  amn

and the m vector

        b1
   b =   ⋮  .
        bm

The product of the m × n matrix A and the n vector x = (x1, . . . , xn) is defined by

   a11  · · ·  a1n     x1       a11 x1 + · · · + a1n xn
    ⋮           ⋮       ⋮    =            ⋮                   (3.1.2)
   am1  · · ·  amn     xn       am1 x1 + · · · + amn xn

so that the system (3.1.1) may be written compactly as Ax = b, where A is the m × n matrix of coefficients, x is the n vector of unknowns, and b is the m vector of constants on the right hand side of (3.1.1).

For example, when m = 2 and n = 3, then the product is a 2-vector

   a11  a12  a13     x1       a11 x1 + a12 x2 + a13 x3
                     x2   =                               .     (3.1.3)
   a21  a22  a23     x3       a21 x1 + a22 x2 + a23 x3

As a specific example, compute

   2  3  -1      2        2·2 + 3·(−3) + (−1)·4        −9
                -3   =                             =        .
   4  1   5      4        4·2 + 1·(−3) + 5·4            25

Using (3.1.2) we have a compact notation for writing systems of linear equations. For example, using a special instance of (3.1.3),

   2  3  -1     x1       2x1 + 3x2 − x3
                x2   =                    .
   4  1   5     x3       4x1 + x2 + 5x3
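The specific product computed above is easy to check in MATLAB:

A = [2 3 -1; 4 1 5];
x = [2; -3; 4];
A*x                  % returns the 2-vector (-9, 25)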
We may verify this result by solving the system of linear equations Ax = b. Indeed if we type

6. A = (5) and x = (3, 1, 0)t.
and
2.7
6.1
−8.3
x=
8.9 .
8.3
2
−4.9
3.2 Matrix Mappings

Having illustrated the notational advantage of using matrices and matrix multiplication, we now begin to discuss why there is also a conceptual advantage to matrix multiplication, a conceptual advantage that will help us to understand how systems of linear equations and linear differential equations may be solved.

Matrix multiplication allows us to view m × n matrices as mappings from Rn to Rm. Let A be an m × n matrix and let x be an n vector. Then

x ↦ Ax

defines a mapping from Rn to Rm.

The simplest example of a matrix mapping is given by 1 × 1 matrices. Matrix mappings defined from R → R are

x ↦ ax

where a is a real number. Note that the graph of this function is just a straight line through the origin (with slope a). From this example we see that matrix mappings are very special mappings indeed. In higher dimensions, matrix mappings provide a richer set of mappings; we explore here planar mappings — mappings of the plane into itself — using MATLAB graphics and the program map.

The simplest planar matrix mappings are the dilatations. Let A = cI2 where c > 0 is a scalar. When c < 1 vectors are contracted by a factor of c and these mappings are examples of contractions. When c > 1 vectors are stretched or expanded by a factor of c and these dilatations are examples of expansions. We now explore some more complicated planar matrix mappings.

The next planar motions that we study are those given by the matrices

   A =  λ  0
        0  µ .

Here the matrix mapping is given by (x, y) ↦ (λx, µy); that is, a mapping that independently stretches and/or contracts the x and y coordinates. Even these simple looking mappings can move objects in the plane in a somewhat complicated fashion.

The Program map  We use MATLAB to explore planar matrix mappings using the program map. In MATLAB type the command

map

and a window appears labeled Map. The 2 × 2 matrix

   0  -1        (3.2.1)
   1   0

has been pre-entered. Click on the Custom button. In the Icons menu click on an icon — say Dog — and a blue ‘Dog’ will appear in the graphing window. Next click on the Iterate button and a new version of the Dog will appear in yellow — the yellow Dog is just rotated about the origin counterclockwise by 90° from the blue dog. Indeed, the matrix (3.2.1) rotates the plane counterclockwise by 90°. To verify this statement click on Iterate again and see that the yellow dog rotates 90° counterclockwise into the magenta dog. Of course, the magenta dog is rotated 180° from the original blue dog. Clicking on Iterate once more produces a fourth dog — this one in cyan. Finally, one more click on the Iterate button will rotate the cyan dog into a red dog that exactly covers the original blue dog.

Other matrices will produce different motions of the plane. Click on the Reset button. Then either push the Custom button, type the entries in the matrix, and click on the Iterate button; or choose one of the pre-assigned matrices listed in the Gallery menu and click on the Iterate button. For example, clicking on the Contracting rotation button recalls the matrix

   0.3  -0.8
   0.8   0.3
This matrix rotates the plane through an angle of approximately 69.4° counterclockwise and contracts the plane by a factor of approximately 0.85. Now click on Dog in the Icons menu to bring up the blue dog again. Repeated clicking on Iterate rotates and contracts the dog so that dogs in a cycling set of colors slowly converge towards the origin in a spiral of dogs.⁴

Rotations  Rotating the plane counterclockwise through an angle θ is a motion given by a matrix mapping. We show that the matrix that performs this rotation is:

   Rθ =  cos θ   −sin θ        (3.2.2)
         sin θ    cos θ .

To verify that Rθ rotates the plane counterclockwise through angle θ, let vϕ be the unit vector whose angle from the horizontal is ϕ; that is, vϕ = (cos ϕ, sin ϕ). We can write every vector in R2 as rvϕ for some number r ≥ 0. Using the trigonometric identities for the cosine and sine of the sum of two angles, we have:

Rθ(rvϕ) =  cos θ  −sin θ    r cos ϕ
           sin θ   cos θ    r sin ϕ

        =  r cos θ cos ϕ − r sin θ sin ϕ
           r sin θ cos ϕ + r cos θ sin ϕ

        =  r  cos(θ + ϕ)
              sin(θ + ϕ)

        =  rvϕ+θ.

This calculation shows that Rθ rotates every vector in the plane counterclockwise through angle θ.

It follows from (3.2.2) that R180° = −I2. So rotating a vector in the plane by 180° is the same as reflecting the vector through the origin. It also follows that the movement associated with the linear map x ↦ −cx where x ∈ R2 and c > 0 may be thought of as a dilatation (x ↦ cx) followed by rotation through 180° (x ↦ −x).

We claim that combining dilatations with general rotations produces spirals. Consider the matrix

   S =  c cos θ   −c sin θ   = cRθ
        c sin θ    c cos θ

where c < 1. Then a calculation similar to the previous one shows that

S(rvϕ) = c(rvϕ+θ).

So S rotates vectors in the plane while contracting them by the factor c. Thus, multiplying a vector repeatedly by S spirals that vector into the origin. The example that we just considered while using map is

   0.3  -0.8    ≈    0.85 cos(69.4°)   −0.85 sin(69.4°)
   0.8   0.3         0.85 sin(69.4°)    0.85 cos(69.4°) ,

which is an example of S with c = 0.85 and θ = 69.4°.

A Notation for Matrix Mappings  We reinforce the idea that matrices are mappings by introducing a notation for the mapping associated with an m × n matrix A. Define

LA : Rn → Rm

by

LA(x) = Ax,

for every x ∈ Rn.

There are two special matrices: the m × n zero matrix O all of whose entries are 0 and the n × n identity matrix In whose diagonal entries are 1 and whose off diagonal entries are 0. For instance,

   I3 =  1 0 0
         0 1 0
         0 0 1 .

⁴When using the program map first choose an Icon (or Vector), second choose a Matrix from the Gallery (or a Custom matrix), and finally click on Iterate. Then Iterate again or Reset to start over.
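The rotation formula (3.2.2) and the contracting rotation used by map can be checked numerically; the lines below are only a sketch (MATLAB works in radians, so the angle is converted first).

theta = 69.4*pi/180;
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];
0.85*R                     % approximately [0.3 -0.8; 0.8 0.3], the Gallery matrix
R90 = [cos(pi/2) -sin(pi/2); sin(pi/2) cos(pi/2)];
R90*[1; 0]                 % (1,0) is rotated counterclockwise to (0,1)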
The mappings associated with these special matrices are also special. Let x be an n vector. Then

Ox = 0,        (3.2.3)

where the 0 on the right hand side of (3.2.3) is the m vector all of whose entries are 0. The mapping LO is the zero mapping — the mapping that maps every vector x to 0.

Similarly,

In x = x

for every vector x. It follows that

LIn(x) = x

is the identity mapping, since it maps every vector to itself. It is for this reason that the matrix In is called the n × n identity matrix.

6. What 2 × 2 matrix rotates the plane clockwise by 90° while dilating it by a factor of 2?

7. Find a 2 × 2 matrix that reflects vectors in the (x, y) plane across the x axis.

8. Find a 2 × 2 matrix that reflects vectors in the (x, y) plane across the y axis.

9. Find a 2 × 2 matrix that reflects vectors in the (x, y) plane across the line x = y.

10. Suppose the mapping L : R3 → R2 is linear and satisfies

L(1, 0, 0)t = (1, 2)t,   L(0, 1, 0)t = (2, 0)t,   L(0, 0, 1)t = (−1, 4)t
In Exercises 19 – 21 use Exercise 14 and map to verify that rotates the plane through an angle of 180◦ . Using the program
the given matrices rotate te through an angle θ followed by a map verify that both matrices map the vector (1, 1) to its neg-
dilatation cI2 . Find θ and c in each case. ative (−1, −1). Now perform two experiments. First, choose
the dog icon and move that dog by the matrix A. Second,
move that dog using the matrix B. Describe the difference in
1 −2
19. (matlab) A = .
2 1 the result.
−2.4 −0.2
20. (matlab) B = .
0.2 −2.4
2.67 1.3
21. (matlab) C = .
−1.3 2.67
Multiplication of a scalar times a vector is defined by

       x1       cx1
   c    ⋮    =   ⋮   .
       xn       cxn

z1 = A*(x+y)
z2 = A*x + A*y

and compare z1 and z2. The fact that they are both equal to

   35
   33

verifies (3.3.1) in this case. Similarly, type

(a) L(x + y) = L(x) + L(y) for all x, y ∈ Rn.

(b) L(cx) = cL(x) for all x ∈ Rn and all scalars c ∈ R.

Hence

L(x + y) = L(x) + L(y)

for every pair of vectors x and y in R2.

Similarly, to verify Definition 3.3.1(b), let c ∈ R be a scalar and compute

L(cx) = L(cx1, cx2) = ((cx1) + 3(cx2), 2(cx1) − (cx2)).
and

cL(x) = c(x1 + 3x2, 2x1 − x2) = (c(x1 + 3x2), c(2x1 − x2)),

from which it follows that

L(cx) = cL(x)

for every vector x ∈ R2 and every scalar c ∈ R. Thus L is a linear mapping.

In fact, the mapping (3.3.4) is a matrix mapping and could have been written in the form

   L(x) =  1   3   x.
           2  -1

• f(x) = x². Calculate

f(x + y) = (x + y)² = x² + 2xy + y²

while

f(x) + f(y) = x² + y².

The two expressions are not equal and f(x) = x² is not linear.

• f(x) = e^x. Calculate

f(x + y) = e^(x+y) = e^x e^y
So if we let a = L(1), then we see that

L(x) = ax.

Thus linear mappings of R into R are very special mappings indeed; they are all scalar multiples of the identity mapping.

All Linear Mappings are Matrix Mappings  We end this section by proving that every linear mapping is given by matrix multiplication. But first we state and prove two lemmas. There is a standard set of vectors that is used over and over again in linear algebra, which we now define.

Definition 3.3.2. Let j be an integer between 1 and n. The n-vector ej is the vector that has a 1 in the jth entry and zeros in all other entries.

Lemma 3.3.3. Let L1 : Rn → Rm and L2 : Rn → Rm be linear mappings. Suppose that L1(ej) = L2(ej) for every j = 1, . . . , n. Then L1 = L2.

Proof  Let x = (x1, . . . , xn) be a vector in Rn. Then

x = x1 e1 + · · · + xn en.

Linearity of L1 and L2 implies that

L1(x) = x1 L1(e1) + · · · + xn L1(en)
      = x1 L2(e1) + · · · + xn L2(en)
      = L2(x).

Since L1(x) = L2(x) for all x ∈ Rn, it follows that L1 = L2.

Lemma 3.3.4. Let A be an m × n matrix. Then Aej is the jth column of A.

Proof  Recall the definition of matrix multiplication given in (3.1.2). In that formula, just set xi equal to zero for all i ≠ j and set xj = 1.

Theorem 3.3.5. Let L : Rn → Rm be a linear mapping. Then there exists an m × n matrix A such that L = LA.

Proof  There are two steps to the proof: determine the matrix A and verify that LA = L.

Let A be the matrix whose jth column is L(ej). By Lemma 3.3.4 L(ej) = Aej; that is, L(ej) = LA(ej). Lemma 3.3.3 implies that L = LA.

Theorem 3.3.5 provides a simple way of showing that

L(0) = 0

for any linear map L. Indeed, L(0) = LA(0) = A0 = 0 for some matrix A. (This fact can also be proved directly from the definition of linear mapping.)

Using Theorem 3.3.5 to Find Matrices Associated to Linear Maps  The proof of Theorem 3.3.5 shows that the jth column of the matrix A associated to a linear mapping L is L(ej) viewed as a column vector. As an example, let L : R2 → R2 be rotation clockwise through 90°. Geometrically, it is easy to see that

L(e1) = L(1, 0)t = (0, −1)t

and

L(e2) = L(0, 1)t = (1, 0)t.

Since we know that rotations are linear maps, it follows that the matrix A associated to the linear map L is:

   A =   0  1
        -1  0 .

Additional examples of linear mappings whose associated matrices can be found using Theorem 3.3.5 are given in Exercises 11 – 14.
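Lemma 3.3.4 and this construction are easy to test numerically; for the clockwise rotation just discussed:

A = [0 1; -1 0];    % matrix of clockwise rotation by 90 degrees
A*[1; 0]            % the first column of A, which is L(e1) = (0,-1)
A*[0; 1]            % the second column of A, which is L(e2) = (1,0)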
is a linear mapping.
In Exercises 6 – 9 determine whether the given transformation (b) Find the 3 × 3 matrix A such that
is linear.
L(x) = Ax,
6. T : R3 → R2 defined by T (x1 , x2 , x3 ) = (x1 +2x2 −x3 , x1 −
4x3 ). that is, L = LA .
13. Argue geometrically that rotation of the plane counter- 18. (matlab) Let
clockwise through an angle of 45◦ is a linear mapping. Find a
2 × 2 matrix A such that LA rotates the plane counterclock-
0 0.5
A= .
wise by 45◦ . −0.5 0
20. (matlab)
16. Let P : Rn → Rm and Q : Rn → Rm be linear mappings.
4 0 −3 2 4 1
(a) Prove that S : Rn → Rm defined by −4 −1
2 8 3 3
A= −1 2 1 10 −2 −2
, x=
S(x) = P (x) + Q(x) 4 4 −2 1 2 3
−2 3 1 1 −1 −1
is also a linear mapping. (3.3.6*)
(b) Theorem 3.3.5 states that there are matrices A, B and 2
C such that 0
13 ,
y= c = −13.
and Q = LB and
P = LA S = LC . −2
What is the relationship between the matrices A, B, and 1
C?
3.4 The Principle of Superposition

The principle of superposition is just a restatement of the fact that matrix mappings are linear. Nevertheless, this restatement is helpful when trying to understand the structure of solutions to systems of linear equations.

Homogeneous Equations  A system of linear equations is homogeneous if it has the form

Ax = 0,        (3.4.1)

where A is an m × n matrix and x ∈ Rn. Note that homogeneous systems are consistent since 0 ∈ Rn is always a solution, that is, A(0) = 0.

The principle of superposition makes two assertions:

• Suppose that y and z in Rn are solutions to (3.4.1) (that is, suppose that Ay = 0 and Az = 0); then y + z is a solution to (3.4.1).

• Suppose that c is a scalar; then cy is a solution to (3.4.1).

The principle of superposition is proved using the linearity of matrix multiplication. Calculate

A(y + z) = Ay + Az = 0 + 0 = 0

to verify that y + z is a solution, and calculate

A(cy) = c(Ay) = c · 0 = 0

to verify that cy is a solution.

We see that solutions to homogeneous systems of linear equations always satisfy the general property of superposition: sums of solutions are solutions and scalar multiples of solutions are solutions.

We illustrate this principle by explicitly solving the system of equations

   1  2  -1   1     x1       0
                    x2   =      .
   2  5  -4  -1     x3       0
                    x4

Use row reduction to show that the matrix

   1  2  -1   1
   2  5  -4  -1

is row equivalent to

   1  0   3   7
   0  1  -2  -3

which is in reduced echelon form. Recall, using the methods of Section 2.3, that every solution to this linear system has the form

   -3x3 - 7x4           -3          -7
    2x3 + 3x4   =  x3    2   + x4    3
       x3                1           0
       x4                0           1 .

Superposition is verified again by observing that the form of the solutions is preserved under vector addition and scalar multiplication. For instance, suppose that

        -3          -7                  -3          -7
   α1    2   + α2    3      and    β1    2   + β2    3
         1           0                   1           0
         0           1                   0           1

are two solutions. Then the sum has the form

        -3          -7
   γ1    2   + γ2    3
         1           0
         0           1

where γj = αj + βj.
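The row reduction and the two basic solutions quoted above can be reproduced in MATLAB:

A = [1 2 -1 1; 2 5 -4 -1];
rref(A)              % returns [1 0 3 7; 0 1 -2 -3]
A*[-3; 2; 1; 0]      % both basic solutions are mapped to the zero vector
A*[-7; 3; 0; 1]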
We have actually proved more than superposition. We have shown in this example that every solution is a superposition of just two solutions

   -3           -7
    2    and     3
    1            0
    0            1 .

Inhomogeneous Equations  The linear system of m equations in n unknowns is written as

Ax = b

where A is an m × n matrix, x ∈ Rn, and b ∈ Rm. This system is inhomogeneous when the vector b is nonzero. Note that if y, z ∈ Rn are solutions to the inhomogeneous equation (that is, Ay = b and Az = b), then y − z is a solution to the homogeneous equation. That is,

A(y − z) = Ay − Az = b − b = 0.

For example, let

   A =   1  2  0      and   b =   3
        -2  0  1                 -1 .

Then

        1                 3
   y =  1    and    z =   0
        1                 5

are both solutions to the linear system Ax = b. It follows that

           -2
   y − z =  1
           -4

is a solution to the homogeneous system Ax = 0, which can be checked by direct calculation.

Thus we can completely solve the inhomogeneous equation by finding one solution to the inhomogeneous equation and then adding to that solution every solution of the homogeneous equation. More precisely, suppose that we know all of the solutions w to the homogeneous equation Ax = 0 and one solution y to the inhomogeneous equation Ax = b. Then y + w is another solution to the inhomogeneous equation and every solution to the inhomogeneous equation has this form.

An Example of an Inhomogeneous Equation  Suppose that we want to find all solutions of Ax = b where

   A =  3  2   1      and   b =  -2
        0  1  -2                  4
        3  3  -1                  2 .

Suppose that you are told that y = (−5, 6, 1)t is a solution of the inhomogeneous equation. (This fact can be verified by a short calculation — just multiply Ay and see that the result equals b.) Next find all solutions to the homogeneous equation Ax = 0 by putting A into reduced echelon form. The resulting row echelon form matrix is

   1  0  5/3
   0  1  -2
   0  0   0 .

Hence we see that the solutions of the homogeneous equation Ax = 0 are

   -(5/3)s            -5/3
     2s       =   s    2
      s                1 .

Combining these results, we conclude that all the solutions of Ax = b are given by

   -5            -5/3
    6    +   s    2
    1             1 .
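These claims are quick to confirm in MATLAB:

A = [3 2 1; 0 1 -2; 3 3 -1];
b = [-2; 4; 2];
A*[-5; 6; 1]         % equals b, so (-5,6,1) solves the inhomogeneous system
rref(A)              % returns [1 0 5/3; 0 1 -2; 0 0 0]
A*[-5/3; 2; 1]       % equals the zero vector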
10. Let A be an n × n matrix with rank n − 1. Suppose u and (a) Determine the matrix representation of L. Namely, find
v in Rn are distinct solutions to the inhomogeneous system the matrix A such that LA = L.
Ax = b. Verify that every solution to Ax = b can be written
0
as αu + (1 − α)v for some α ∈ R. (b) Verify that 2 is a solution to Lx =
0
2
(Hint: idea is similar to Exercise 5.) −2
0
(c) Find the full set of solutions of Lx = .
2
11. Let L : R3 → R4 be a mapping such that
x1 − x2
x1 + x2 − x3
Lx =
−x1 + x2 + 4x3
−3x1 + x2 − x3
x1
for all x = x2 .
x3
2
1
0
(c) Find the full set of solutions of Lx =
−9 .
and
0
0
L 1 = .
1
−1
3.5 Composition and Multiplication of Matrices

The composition of two matrix mappings leads to another matrix mapping from which the concept of multiplication of two matrices follows. Matrix multiplication can be introduced by formula, but then the idea is unmotivated and one is left to wonder why matrix multiplication is defined in such a seemingly awkward way.

We begin with the example of 2 × 2 matrices. Suppose that

   A =  2   1      and   B =   0  3
        1  -1                 -1  4 .

We have seen that the mappings

x ↦ Ax   and   x ↦ Bx

map 2-vectors to 2-vectors. So we can ask what happens when we compose these mappings. In symbols, we compute

LA ∘ LB(x) = LA(LB(x)) = A(Bx).

In coordinates, let x = (x1, x2) and compute

A(Bx) = A   3x2            =   -x1 + 10x2
           -x1 + 4x2            x1 - x2 .

It follows that we can rewrite A(Bx) using multiplication of a matrix times a vector as

A(Bx) =  -1  10    x1
          1  -1    x2 .

In particular, LA ∘ LB is again a linear mapping, namely LC, where

   C =  -1  10
         1  -1 .

With this computation in mind, we define the product

AB =  2   1     0  3    =   -1  10
      1  -1    -1  4         1  -1 .

Using the same approach we can derive a formula for matrix multiplication of 2 × 2 matrices. Suppose

   A =  a11  a12      and   B =  b11  b12
        a21  a22                 b21  b22 .

Then

A(Bx) = A  b11 x1 + b12 x2
           b21 x1 + b22 x2

      =  a11(b11 x1 + b12 x2) + a12(b21 x1 + b22 x2)
         a21(b11 x1 + b12 x2) + a22(b21 x1 + b22 x2)

      =  (a11 b11 + a12 b21)x1 + (a11 b12 + a12 b22)x2
         (a21 b11 + a22 b21)x1 + (a21 b12 + a22 b22)x2

      =  a11 b11 + a12 b21    a11 b12 + a12 b22     x1
         a21 b11 + a22 b21    a21 b12 + a22 b22     x2 .

Hence, for 2 × 2 matrices, we see that composition of matrix mappings defines the matrix multiplication

   a11  a12     b11  b12
   a21  a22     b21  b22

to be

   a11 b11 + a12 b21    a11 b12 + a12 b22        (3.5.1)
   a21 b11 + a22 b21    a21 b12 + a22 b22 .

Formula (3.5.1) may seem a bit formidable, but it does have structure. Suppose A and B are 2 × 2 matrices, then the entry of

C = AB

in the ith row, jth column may be written as

ai1 b1j + ai2 b2j = Σ_{k=1}^{2} aik bkj.
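The 2 × 2 example above can be checked immediately in MATLAB:

A = [2 1; 1 -1];
B = [0 3; -1 4];
A*B                  % returns [-1 10; 1 -1], the matrix C obtained by composing the mappings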
We shall see that an analog of this formula is available for matrix multiplications of all sizes. But to derive this formula, it is easier to develop matrix multiplication abstractly.

where Bj ≡ Bej is the jth column of the matrix B. Therefore,

C = (AB1 | · · · | ABp).        (3.5.2)
Exercise 15 - 22 discuss permutation matrices: n × n matrices 21. Show that there are n! different permutation matrices of
that have exactly one 1 in each row and each column, and size n × n, where n! = n · (n − 1) · · · 2 · 1. Hint: Use induction.
whose remaining entries are 0.
15. There
are two 2× 2 permutation
matrices: I2 and 22. Consider the following n × n permutation matrix
0 1 0 1 x1
. Compute . Then describe in
1 0 1 0 x2
0 1 0 0 ··· 0
0 1
words how acts on each coordinate vector 0 0 1 0 ··· 0
1 0
0 0 0 1 ··· 0
A= .. .. .. .. .. .
1 0 . . . . .
e1 = , e2 = .
0 1
0 0 0 0 ··· 1
1 0 0 0 ··· 0
Exercises 16 - 20 consider 3 × 3 permutation matrices. Let Show that An = In . Hint: Describe the action of A on
coordinate vectors.
0 1 0 1 0 0
A= 0 0 1 and B = 0 0 1 .
1 0 0 0 1 0
• Let A and B be m×n matrices and let C be an n×p where atjk is the (j, k)th entry in At and btij is the (i, j)th
matrix. Then entry in B t . It follows from the definition of transpose
that the (i, k)th entry in B t At is:
(A + B)C = AC + BC.
n n
Similarly, if D is a q × m matrix, then
X X
bji akj = akj bji ,
j=1 j=1
D(A + B) = DA + DB.
which verifies the claim.
So matrix multiplication distributes across matrix
addition.
Matrix Multiplication in MATLAB Let us now explain
• If α and β are scalars, then how matrix multiplication works in MATLAB. We load
(α + β)A = αA + βA. the matrices
−5 2 0
So addition distributes with scalar multiplication.
−1 1 −4 2 −2 −2 5 5
A= and B = 4 −5 1 −1 2
• Scalar multiplication and matrix multiplication sat- −4 4 2
3 2 3 −3 3
isfy: −1 3 −1
(αA)C = α(AC). (3.6.2*)
by typing
Matrix Multiplication and Transposes Let A be an m × n e3_6_2
matrix and let B be an n × p matrix, so that the matrix
product AB is defined and AB is an m × p matrix. Note
Now the command C = A*B asks MATLAB to compute
that At is an n×m matrix and that B t is a p×n matrix, so
the matrix C as the product of A and B. We obtain
that in general the product At B t is not defined. However,
the product B t At is defined and is an p × m matrix, as
C =
is the matrix (AB)t . We claim that
-2 0 12 -27 -21
(AB)t = B t At . (3.6.1) -10 -11 -9 6 -15
14 -8 18 -30 -6
We verify this claim by direct computation. The (i, k)th 7 -15 2 -5 -2
entry in (AB)t is the (k, i)th entry in AB. That entry is:
n Let us confirm this result by another computation. As
we have seen above the 4th column of C should be given
X
akj bji .
j=1 by the product of A with the 4th column of B. Indeed,
if we perform this computation and type
The (i, k)th entry in B t At is:
n A*B(:,4)
X
btij atjk ,
j=1 the result is
ans = 3. Let
-27
0 1 0
6 A= 0 0 1 .
0 0 0
-30
-5 1 1
Compute B = I3 + A + A2 and C = I3 + tA + (tA)2 where
2 2
t is a real number.
which is precisely the 4th column of C.
MATLAB also recognizes when a matrix multiplication of
two matrices is not defined. For example, the product of 4. Let
the 3 × 5 matrix B with the 4 × 3 matrix A is not defined,
1 0 0 −1
and if we type B*A then we obtain the error message I=
0 1
and J=
1 0
.
3.7 Solving Linear Systems and Inverses

When we solve the simple equation

ax = b,

we do so by dividing by a to obtain

x = (1/a) b.

This division works as long as a ≠ 0.

Writing systems of linear equations as

Ax = b

suggests that solutions should have the form

x = (1/A) b

and the MATLAB command for solving linear systems

x = A\b

suggests that there is some merit to this analogy.

The following is a better analogy. Multiplication by a has the inverse operation: division by a; multiplying a number x by a and then multiplying the result by a−1 = 1/a leaves the number x unchanged (as long as a ≠ 0). In this sense we should write the solution to ax = b as

x = a−1 b.

For systems of equations Ax = b we wish to write solutions as

x = A−1 b.

In this section we consider the questions: What does A−1 mean and when does A−1 exist? (Even in one dimension, we have seen that the inverse does not always exist, since 0−1 = 1/0 is undefined.)

Invertibility  We begin by giving a precise definition of invertibility for square matrices.

Definition 3.7.1. The n × n matrix A is invertible if there is an n × n matrix B such that

AB = In   and   BA = In.

The matrix B is called an inverse of A. If A is not invertible, then A is noninvertible or singular.

Geometrically, we can see that some matrices are invertible. For example, the matrix

   R90 =  0  -1
          1   0

rotates the plane counterclockwise through 90° and is invertible. The inverse matrix of R90 is the matrix that rotates the plane clockwise through 90°. That matrix is:

   R−90 =   0  1
           -1  0 .

This statement can be checked algebraically by verifying that R90 R−90 = I2 and that R−90 R90 = I2.

Similarly,

   B =  5  3
        2  1

is an inverse of

   A =  -1   3
         2  -5 ,

as matrix multiplication shows that AB = I2 and BA = I2. In fact, there is an elementary formula for finding inverses of 2 × 2 matrices (when they exist); see (3.8.1) in Section 3.8.

On the other hand, not all matrices are invertible. For example, the zero matrix is noninvertible, since 0B = 0 for any matrix B.
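Both invertibility claims above are easy to confirm in MATLAB:

R90 = [0 -1; 1 0];
Rm90 = [0 1; -1 0];
R90*Rm90, Rm90*R90   % both products return the 2 x 2 identity
A = [-1 3; 2 -5];
B = [5 3; 2 1];
A*B, B*A             % again both products return eye(2)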
Lemma 3.7.2. If an n × n matrix A is invertible, then its inverse is unique and is denoted by A−1.

Proof  Let B and C be n × n matrices that are inverses of A. Then

BA = In   and   AC = In.

We use the associativity of matrix multiplication to prove that B = C. Compute

B = BIn = B(AC) = (BA)C = In C = C.

We now show how to compute inverses for products of invertible matrices.

Proposition 3.7.3. Let A and B be two invertible n × n matrices. Then AB is also invertible and

(AB)−1 = B−1 A−1.

Proof  Use associativity of matrix multiplication to compute

(AB)(B−1 A−1) = A(BB−1)A−1 = AIn A−1 = AA−1 = In.

Invertibility and Unique Solutions  Next we discuss the implications of invertibility for the solution of the inhomogeneous linear system:

Ax = b,        (3.7.1)

where A is an n × n matrix and b ∈ Rn.

Proposition 3.7.5. Let A be an invertible n × n matrix and let b be in Rn. Then the system of linear equations (3.7.1) has a unique solution.

Proof  We can solve the linear system (3.7.1) by setting

x = A−1 b.        (3.7.2)

This solution is easily verified by calculating

Ax = A(A−1 b) = (AA−1)b = In b = b.

Next, suppose that x is a solution to (3.7.1). Then

x = In x = (A−1 A)x = A−1(Ax) = A−1 b.

So A−1 b is the only possible solution.
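Propositions 3.7.3 and 3.7.5 can also be spot checked numerically. The sketch below uses randomly generated matrices (which are invertible with probability one), so it is an illustration rather than a proof.

A = rand(3); B = rand(3);          % two (almost surely invertible) 3 x 3 matrices
norm(inv(A*B) - inv(B)*inv(A))     % essentially zero, illustrating (AB)^(-1) = B^(-1) A^(-1)
b = rand(3,1);
norm(A\b - inv(A)*b)               % the unique solution of Ax = b computed two ways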
(In |B). Eliminating all columns from the right half of (b) ⇒ (c) This implication is straightforward — just take
M except the j th column yields the matrix (A|ej ). The b = 0 in (3.7.1).
same sequence of elementary row operations states that
(c) ⇒ (d) This implication is just a restatement of Chap-
the matrix (A|ej ) is row equivalent to (In |Bj ) where Bj
ter 2, Corollary 2.4.8.
is the j th column of B. It follows that Bj is the solution
to the system of linear equations Ax = ej and that the (d) ⇒ (a). This implication is just Proposition 3.7.7.
matrix product
AB = (AB1 | · · · |ABn ) = (e1 | · · · |en ) = In . A Method for Computing Inverse Matrices The proof
of Proposition 3.7.7 gives a constructive method for find-
So AB = In . ing the inverse of any invertible square matrix.
We claim that BA = In and hence that A is invertible. Theorem 3.7.9. Let A be an n × n matrix that is row
To verify this claim form the n × 2n matrix N = (In |A). equivalent to In and let M be the n×2n augmented matrix
Using the same sequence of elementary row operations
M = (A|In ). (3.7.3)
again shows that N is row equivalent to (B|In ). By con-
struction the matrix B is row equivalent to In . There- Then the matrix M is row equivalent to (In |A−1 ).
fore, there is a unique solution to the system of linear
equations Bx = ej . Now eliminating all columns except
the j th from the right hand side of the matrix (B|In ) An Example Compute the inverse of the matrix
shows that the solution to the system of linear equations
1 2 0
Bx = ej is just Aj , where Aj is the j th column of A. It A = 0 1 3 .
follows that 0 0 1
BA = (BA1 | · · · |BAn ) = (e1 | · · · |en ) = In . Begin by forming the 3 × 6 matrix
Hence BA = In . 1 2 0 1 0 0
M = 0 1 3 0 1 0 .
Theorem 3.7.8. Let A be an n × n matrix. Then the 0 0 1 0 0 1
following are equivalent:
To put M in row echelon form by row reduction, first
subtract 3 times the 3rd row from the 2nd row, obtaining
(a) A is invertible.
1 2 0 1 0 0
(b) The equation Ax = b has a unique solution for each 0 1 0 0 1 −3 .
b ∈ Rn . 0 0 1 0 0 1
(c) The only solution to Ax = 0 is x = 0.
Second, subtract 2 times the 2nd row from the 1st row,
(d) A is row equivalent to In . obtaining
1 0 0 1 −2 6
Proof (a) ⇒ (b) This implication is just Proposi- 0 1 0 0 1 −3 .
tion 3.7.5. 0 0 1 0 0 1
M = [A eye(3)] to obtain
M =
in MATLAB yields the result 1.0000 0 0 -1.0000 2.0000 -2.0000
0 1.0000 0 5.0000 -9.0000 11.0000
0 0 1.0000 -2.0000 4.0000 -5.0000
M =
1 2 4 1 0 0 Thus C = A−1 is obtained by extracting the last three
3 1 1 0 1 0 columns of M by typing
2 0 -1 0 0 1
C = M(:,[4 5 6])
Now row reduce M to reduced echelon form as follows.
Type which yields
You may check that C is the inverse of A by typing A*C This computation also illustrates the fact that even when
and C*A. the matrix A has integer entries, the inverse of A usually
has noninteger entries.
In fact, this entire scheme for computing the inverse of
a matrix has been preprogrammed into MATLAB . Just Let b = (2, −8, 18, −6, −1). Then we may use the inverse
type B = A−1 to compute the solution of Ax = b. Indeed if
we type
inv(A)
b = [2;-8;18;-6;-1];
to obtain x = B*b
ans = then we obtain
-1.0000 2.0000 -2.0000
5.0000 -9.0000 11.0000 x =
-2.0000 4.0000 -5.0000 -1.0000
2.0000
We illustrate again this simple method for computing the
1.0000
inverse of a matrix A. For example, reload the matrix in
-1.0000
(3.1.4*) by typing e3_1_4 and obtaining:
3.0000
A =
5 -4 3 -6 2 as desired (see (3.1.5*)). With this computation we have
2 -4 -2 -1 1 confirmed the analytical results of the previous subsec-
1 2 1 -5 3 tions.
-2 -1 -2 1 -1
1 -6 1 1 4 Exercises
The command B = inv(A) stores the inverse of the ma-
trix A in the matrix B, and we obtain the result
1. Verify by matrix multiplication that the following matrices
B = are inverses of each other:
-0.0712 0.2856 -0.0862 -0.4813 -
1 0 2 −1 0 2
0.0915 0 −1 2 and 2 −1 −2 .
-0.1169 0.0585 0.0690 -0.2324 -
1 0 1 1 0 −1
0.0660
0.1462 -0.3231 -
0.0862 0.0405 0.0825 2. Let α 6= 0 be a real number and let A be an invertible
-0.1289 0.0645 -0.1034 - matrix. Show that the inverse of the matrix αA is given by
0.2819 0.0555 1 −1
A .
-0.1619 0.0810 0.1724 - α
0.1679 0.1394
In Exercises 5 – 6 use row reduction to find the inverse of the In Exercises 11 – 12 use row reduction to find the inverse of
given matrix. the given matrix and confirm your results using the command
inv.
1 4 5
5. 0 1 −1 .
11. (matlab)
−2 0 −8
2 1 3
A= 1 2 3 . (3.7.5*)
1 −1 −1
6. 0 2 0 . 5 1 0
2 0 −1
12. (matlab)
7. True or false? If true, explain why; if false, give a coun- 0 5 1 3
terexample.
1 5 3 −1
B=
. (3.7.6*)
2 1 0 −4
(a) If A and B are matrices such that AB = I2 , then BA = 1 7 2 3
I2 .
(b) If A, B, C are matrices such that AB = In and BC = In ,
then A = C. 13. (matlab) Try to compute the inverse of the matrix
(c) Let A be an m × n matrix and b be a column m vector.
1 0 3
If the system of linear equations Ax = b has a unique C = −1 2 −2 (3.7.7*)
solution, then A is invertible. 0 2 1
14. Let A and B be 3 × 3 invertible matrices so that 17. Let A be an m × n matrix with rank n. Use Exercise 16
to prove: Suppose the system of linear equations
1 0 −1 1 1 1
−1
A = −1 −1 0 and B = 1 1 0
−1
Ax = b (3.7.8)
0 1 −1 1 0 0
is consistent, then
Without computing A or B, determine the following:
x = (At A)−1 At b.
(a) rank(A)
(b) The solution to
1
Bx = 1
1
(c) (2BA)−1
(d) The matrix C so that ACB + 3I3 = 0.
This is most easily verified by directly applying the for- Corollary 3.8.3. A 2 × 2 matrix A is invertible if and
mula for matrix multiplication. So A is invertible when only if det(A) 6= 0.
ad − bc 6= 0. We shall prove below that ad − bc must be
nonzero when A is invertible. Proof If A is invertible, then AA−1 = I2 . Proposi-
From this discussion it is clear that the number ad − bc tion 3.8.2 implies that
must be an important quantity for 2 × 2 matrices. So we
det(A) det(A−1 ) = det(I2 ) = 1.
define:
Definition 3.8.1. The determinant of the 2 × 2 matrix Therefore, det(A) 6= 0. Conversely, if det(A) 6= 0, then
A is (3.8.1) implies that A is invertible.
det(A) = ad − bc. (3.8.2)
Proposition 3.8.2. As a function on 2 × 2 matrices, the Determinants and Area Suppose that v and w are two
determinant satisfies the following properties. vectors in R2 that point in different directions. Then, the
set of points
(a) The determinant of an upper triangular matrix is the
product of the diagonal elements. z = αv + βw where 0 ≤ α, β ≤ 1
(b) The determinants of a matrix and its transpose are is a parallelogram, that we denote by P . We denote
equal. the area of P by |P |. For example, the unit square S,
whose corners are (0, 0), (1, 0), (0, 1), and (1, 1), is the
(c) det(AB) = det(A) det(B). parallelogram generated by the unit vectors e1 and e2 .
Proof Both (a) and (b) are easily verified by direct Next let A be a 2 × 2 matrix and let
calculation. Property (c) is also verified by direct calcu-
A(P ) = {Az : z ∈ P }.
lation — but of a more extensive sort. Note that
It follows from linearity (since Az = αAv + βAw) that
a b α β aα + bγ aβ + bδ
= . A(P ) is the parallelogram generated by Av and Aw.
c d γ δ cα + dγ cβ + dδ
where P is the parallelogram generated by v and w. 4. Let A be a 2 × 2 matrix having integer entries. Find a
Therefore, (det A)2 = |A(S)|2 and (3.8.3) is verified. condition on the entries of A that guarantees that A−1 has
integer entries.
Theorem 3.8.5. Let P be a parallelogram in R2 and let
A be a 2 × 2 matrix. Then
5. Let A be a 2×2 matrix and assume that det(A) 6= 0. Then
|A(P )| = | det A||P |. (3.8.4) use the explicit form for A−1 given in (3.8.1) to verify that
1
Proof First note that (3.8.3) a special case of (3.8.4), det(A−1 ) = .
det(A)
since |S| = 1. Next, let P be the parallelogram gen-
erated by the (column) vectors v and w, and let B =
(v|w). Then P = B(S). It follows from (3.8.3) that 6. Suppose a 2 × 2 matrix A satisfies the following equation:
|P | = | det B|. Moreover,
0 2 −1 2
A = (3.8.5)
|A(P )| = |(AB)(S)| 1 2 1 4
= | det(AB)| Without calculating the entries of A, find det(A).
= | det A|| det B|
7. Find the entries of A defined in (3.8.5) and verify your
= | det A||P |, determinant calculation from Exercise 6.
as desired.
8. Sketch the triangle whose vertices are 0, p = (3, 0)t , and In Exercises 13 – 16 use the unit square icon in the program
q = (0, 2)t ; and find the area of this triangle. Let map to test Proposition 3.8.4, as follows. Enter the given
matrix A into map and map the unit square icon. Compute
−4 −3 det(A) by estimating the area of A(S) — given that S has
M= .
5 −2 unit area. For each matrix, use this numerical experiment to
Sketch the triangle whose vertices are 0, M p, and M q; and decide whether or not the matrix is invertible.
find the area of this triangle.
0 −2
13. (matlab) A = .
2 0
9. Cramer’s rule provides a method based on determinants
−0.5 −0.5
for finding the unique solution to the linear equation Ax = b 14. (matlab) A = .
0.7 0.7
when A is an invertible matrix. More precisely, let A be an
invertible 2 × 2 matrix and let b ∈ R2 be a column vector. Let
−1 −0.5
15. (matlab) A = .
Bj be the 2 × 2 matrix obtained from A by replacing the j th −2 −1
column of A by the vector b. Let x = (x1 , x2 )t be the unique
solution to Ax = b. Then Cramer’s rule states that
0.7071 0.7071
16. (matlab) A = .
−0.7071 0.7071
det(Bj )
xj = . (3.8.6)
det(A)
Prove Cramer’s rule. Hint: Write the general system of two
equations in two unknowns as 17. Suppose a 2 × 2 matrix A satisfies the following equation:
a11 x1 + a12 x2 = b1 0 2 −1 2
A = .
1 2 1 4
a21 x1 + a22 x2 = b2 .
Subtract a11 times the second equation from a21 times the Without calculating the entries of A, find det(A).
first equation to eliminate x1 ; then solve for x2 , and verify
(3.8.6). Use a similar calculation to solve for x1 .
4x − 3y = −1
11. Solve for y.
x + 2y = 7
4 Solving Linear Differential Equations

The study of linear systems of equations given in Chapter 2 provides one motivation for the study of matrices and linear algebra. Linear constant coefficient systems of ordinary differential equations provide a second motivation for this study. In this chapter we show how the phase space geometry of systems of differential equations motivates the idea of eigendirections (or invariant directions) and eigenvalues (or growth rates).

We begin this chapter with a discussion of the theory and application of the simplest of linear differential equations, the linear growth equation, ẋ = λx. In Section 4.1, we solve the linear growth equation and discuss the fact that solutions to differential equations are functions; and we emphasize this point by using MATLAB to graph solutions of x as a function of t. In the optional Section 4.2 we illustrate the applicability of this very simple equation with a discussion of compound interest and a simple population model.

The next two sections introduce planar constant coefficient linear differential equations. In these sections we use the program PhasePlane (written by John Polking and updated by Roy Goodman) that solves numerically planar systems of differential equations. In Section 4.3 we discuss uncoupled systems — two independent one dimensional systems like those presented in Section 4.1 — whose solution geometry in the plane is somewhat more complicated than might be expected. In Section 4.4 we discuss coupled linear systems. Here we illustrate the existence and nonexistence of eigendirections.

In Section 4.5 we show how initial value problems can be solved by building the solution — through the use of superposition as discussed in Section 3.4 — from simpler solutions. These simpler solutions are ones generated from real eigenvalues and eigenvectors — when they exist. In Section 4.6 we develop the theory of eigenvalues and characteristic polynomials of 2 × 2 matrices. (The corresponding theory for n × n matrices is developed in Chapter 7.)

The method for solving planar constant coefficient linear differential equations with real eigenvalues is summarized in Section 4.7. This method is based on the material of Sections 4.5 and 4.6. The complete discussion of the solutions of linear planar systems of differential equations is given in Chapter 6. This discussion is best done after we have introduced the linear algebra concepts of vector subspaces and bases in Chapter 5.

The chapter ends with an optional discussion of Markov chains in Section 4.8. Markov chains give a method for analyzing branch processes where at each time unit several outcomes are possible, each with a given probability.
4.1 A Single Differential Equation where C is an arbitrary constant. (It is tempting to put
a constant of integration on both sides of (4.1.2), but two
Algebraic operations such as addition and multiplication
constants are not needed, as we can just combine both
are performed on numbers while the calculus operations
constants on the right hand side of this equation.) Since
of differentiation and integration are performed on func-
the indefinite integral of dx/dt is just the function x(t),
tions. Thus algebraic equations (such as x2 = 9) are
we have
solved for numbers (x = ±3) while differential (and inte- Z
gral) equations are solved for functions. x(t) = f (τ )dτ + C. (4.1.3)
In Chapter 2 we discussed how to solve systems of linear In particular, finding closed form solutions to differential
equations, such as equations of the type (4.1.1) is equivalent to finding all
definite integrals of the function f (t). Indeed, to find
x1 + x2 = 2
closed form solutions to differential equations like (4.1.1)
x1 − x2 = 4 we need to know all of the techniques of integration from
for numbers integral calculus.
We note that if x(t) is a real-valued function of t, then
x1 = 3 and x2 = −1,
we denote the derivative of x with respect to t using the
while in this chapter we discuss how to solve some linear following
systems of differential equations for functions. dx
ẋ x0
dt
Solving a single linear equation in one unknown x is a
all of which are standard notations for the derivative.
simple task. For example, solve
2x = 4 Initial Conditions and the Role of the Integration Constant
for x = 2. Solving a single differential equation in one C Equation (4.1.3) tells us that there are an infinite
unknown function x(t) is far from trivial. number of solutions to the differential equation (4.1.1),
each one corresponding to a different choice of the con-
stant C. To understand how to interpret the constant C,
Integral Calculus as a Differential Equation Mathemati- consider the example
cally, the simplest type of differential equation is:
dx
dx (t) = cos t.
(t) = f (t) (4.1.1) dt
dt Using (4.1.3) we see that the answer is
where f is some continuous function. In words, this equa- Z
tion asks us to find all functions x(t) whose derivative is x(t) = cos τ dτ + C = sin t + C.
f (t). The fundamental theorem of calculus tells us the
answer: x(t) is an antiderivative of f (t). Thus to find Note that
all solutions, we just integrate both sides of (4.1.1) with x(0) = sin(0) + C = C.
respect to t. Formally, using indefinite integrals, Thus, the constant C represents an initial condition for
the differential equation. We will return to the discussion
Z Z
dx
(t)dt = f (t)dt + C, (4.1.2) of initial conditions several times in this chapter.
dt
The Linear Differential Equation of Growth and Decay is a constant (independent of t). Using the product rule
The subject of differential equations that we study begins and (4.1.4), compute
when the function f on the right hand side of (4.1.1) de-
pends explicitly on the function x, and the simplest such d
x(t)e−λt =
d
(x(t)) e−λt + x(t)
d −λt
e
differential equation is: dt dt dt
= (λx(t))e−λt + x(t)(−λe−λt )
dx = 0.
(t) = x(t).
dt
Now recall that the only functions whose derivatives are
Using results from differential calculus, we can solve this identically zero are the constant functions. Thus,
equation; indeed, we can solve the slightly more compli-
cated equation x(t)e−λt = K
dx
(t) = λx(t), (4.1.4) for some constant K ∈ R. Hence x(t) has the form
dt
(4.1.5), as claimed.
where λ ∈ R is a constant. The differential equation
(4.1.4) is linear since x(t) appears by itself on the right Next, we discuss the role of the constant K. We have
hand side. Moreover, (4.1.4) is homogeneous since the written the function as x(t), and we have meant the
constant function x(t) = 0 is a solution. reader to think of the variable t as time. Thus x(0) is
the initial value of the function x(t) at time t = 0; we say
In words (4.1.4) asks: For which functions x(t) is the
that x(0) is the initial value of x(t). From (4.1.5) we see
derivative of x(t) equal to λx(t). The function
that
x(0) = K,
x(t) = eλt
and that K is the initial value of the solution of (4.1.4).
is such a function, since Henceforth, we write K as x0 so that the notation calls
attention to the special meaning of this constant.
dx d
(t) = eλt = λeλt = λx(t). By deriving (4.1.5) we have proved:
dt dt
Theorem 4.1.1. There is a unique solution to the initial
More generally, the function
value problem
x(t) = Keλt (4.1.5) dx
(t) = λx(t)
is a solution to (4.1.4) for any real constant K. We claim
dt (4.1.6)
that the functions (4.1.5) list all (differentiable) functions x(0) = x0 .
that solve (4.1.4).
That solution is
To verify this claim, we let x(t) be a solution to (4.1.4) x(t) = x0 eλt .
and show that the ratio
x
by graphing the solutions in MATLAB.
3
Suppose we set x0 = 1 and λ = ±0.5. Type
2
x0 = 1;
lambda = 0.5; 1
t = linspace(-1,4,100);
x = x0*exp(lambda*t); 0
plot(t,x) −1 −0.5 0 0.5 1 1.5
t
2 2.5 3 3.5 4
hold on
xlabel('t') Figure 12: Solutions of (4.1.4) for t ∈ [−1, 4], x0 = 1 and
ylabel('x') λ = ±0.5.
lambda = -0.5;
x = x0*exp(lambda*t);
dx
plot(t,x) 2. ODE: = x + et .
dt
The result of this calculation is shown in Figure 12. Functions: x1 (t) = tet and x2 (t) = 2et .
In this way we can actually see the difference between dx
exponential growth (λ = 0.5) and exponential decay 3. ODE: = x2 + 1.
dt
(λ = −0.5), as discussed in the limit in (4.1.7). Functions: x1 (t) = − tan t and x2 (t) = tan t.
dx x
4. ODE: = .
Exercises dt t
Functions: x1 (t) = t + 1 and x2 (t) = 5t.
Hint: Use the fact that the trigonometric functions sin and
cos can be evaluated in MATLAB in the same way as the ex-
ponential function, that is, by using sin and cos instead
of exp.
spaced) times during that year, and the effective interest rate when compounding N times is:

(1 + r/N)^N − 1.

For the curious, we can write a program in MATLAB to compute (4.2.1). Suppose we assume that the initial deposit P0 = $1,000, the simple interest rate is 6% per year, and the interest payments are made monthly. In MATLAB one can type a few lines to carry this out; a sketch follows below.

It follows that

ΔP/Δt = rP,

and, on taking the limit Δt → 0, we have the differential equation

dP/dt (t) = rP(t).

Since P(0) = P0 the solution of the initial value problem given in Theorem 4.1.1 shows that

P(t) = P0 e^{rt}.
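A minimal sketch of such a computation follows; it assumes only that (4.2.1) is the balance after one year when interest at simple rate r is compounded N times, which is all this illustration needs.

P0 = 1000; r = 0.06; N = 12;       % $1,000 deposit, 6% per year, monthly compounding
P = P0*(1 + r/N)^N                 % balance after one year
effective = (1 + r/N)^N - 1        % effective interest rate when compounding N times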
they just differ on the precise form of r. In general, the hours? Express your answer in terms of x0 , the initial number
rate r will depend on the size of the population p as well of bacteria.
as the time t, that is, r is a function r(p, t).
The simplest population model — which we now assume
5. Suppose you deposit $10,000 in a bank at an interest of
— is the one in which r is assumed to be constant. Then
7.5% compounded continuously. How much money will be in
equation (4.2.2) is identical to (4.1.4) after identifying your account a year and a half later? How much would you
p with x and r with λ. Hence we may interpret r as have if the interest were compounded monthly?
the growth rate for the population. The form of the
solution in (4.1.5) shows that the size of a population
grows exponentially if r > 0 and decays exponentially if 6. Newton’s law of cooling states that the rate at which a
r < 0. body changes temperature is proportional to the difference
The mathematical description of this simplest population between the body temperature and the temperature of the
model shows that the assumption of a constant growth surrounding medium. That is,
rate leads to exponential growth (or exponential decay). dT
Is this realistic? Surely, no population will grow expo- = α(T − Tm ) (4.2.3)
dt
nentially for all time, and other factors, such as limited
living space, have to be taken into account. On the other where T (t) is the temperature of the body at time t, Tm is
the constant temperature of the surrounding medium, and α
hand, exponential growth describes well the growth in
is the constant of proportionality. Suppose the body is in air
human population during much of human history. So
of temperature 50◦ and the body cools from 100◦ to 75◦ in 20
this model, though surely oversimplified, gives some in- minutes. What will the temperature of the body be after one
sight into population growth. hour? Hint: Rewrite (4.2.3) in terms of U (t) = T (t) − Tm .
Exercises
7. Let p(t) be the population of group Grk at time t mea-
sured in years. Let r be the growth rate of the group Grk.
Suppose that the population of Grks changes according to the
In Exercises 1 – 3 find solutions to the given initial value differential equation (4.2.2). Find r so that the population of
problems. Grks doubles every 50 years. How large must r be so that the
dx population doubles every 25 years?
1. = sin(2t), x(π) = 2.
dt
dx
2. = t2 , x(2) = 8. 8. You deposit $4,000 in a bank at an interest of 5.5% but
dt
after half a year the bank changes the interest rate to 4.5%.
dx 1 Suppose that the interest is compounded continuously. How
3. = 2, x(1) = 1.
dt t much money will be in your account after one year?
10. Two banks each pay 7% interest per year — one compounds money daily and one compounds money continuously. What is the difference in earnings in one year in an account holding $10,000?
4.3 Uncoupled Linear Systems of Two Equations

A system of two linear ordinary differential equations has the form

dx/dt (t) = ax(t) + by(t)
dy/dt (t) = cx(t) + dy(t),    (4.3.1)

where a, b, c, d are real constants. Solutions of (4.3.1) are pairs of functions (x(t), y(t)).

A solution to the planar system (4.3.1) that is constant in time t is called an equilibrium. Observe that the origin (x(t), y(t)) = (0, 0) is always an equilibrium solution to the linear system (4.3.1).

We begin our discussion of linear systems of differential equations by considering uncoupled systems of the form

dx/dt (t) = ax(t)
dy/dt (t) = dy(t).    (4.3.2)

Since the system is uncoupled (that is, the equation for ẋ does not depend on y and the equation for ẏ does not depend on x), we can solve this system by solving each equation independently, as we did for (4.1.4):

x(t) = x0 e^{at}
y(t) = y0 e^{dt}.    (4.3.3)

There are now two initial conditions that are identified by

x(0) = x0 and y(0) = y0.

Having found all the solutions to (4.3.2) in (4.3.3), we now explore the geometry of the phase plane for these uncoupled systems both analytically and by using MATLAB.

Asymptotic Stability of the Origin As we did for the single equation (4.1.4), we ask what happens to solutions to (4.3.2) starting at (x0, y0) as time t increases. That is, we compute

lim_{t→∞} (x(t), y(t)) = lim_{t→∞} (x0 e^{at}, y0 e^{dt}).

This limit is (0, 0) when both a < 0 and d < 0; but if either a or d is positive, then most solutions diverge to infinity, since either

lim_{t→∞} |x(t)| = ∞ or lim_{t→∞} |y(t)| = ∞.

Roughly speaking, an equilibrium (x0, y0) is asymptotically stable if every trajectory (x(t), y(t)) beginning from an initial condition near (x0, y0) stays near (x0, y0) for all positive t, and

lim_{t→∞} (x(t), y(t)) = (x0, y0).

The equilibrium is unstable if there are trajectories with initial conditions arbitrarily close to the equilibrium that move far away from that equilibrium.

At this stage, it is not clear how to determine whether the origin is asymptotically stable for a general linear system (4.3.1). However, for uncoupled linear systems we have shown that the origin is an asymptotically stable equilibrium when both a < 0 and d < 0. If either a > 0 or d > 0, then (0, 0) is unstable.

Invariance of the Axes There is another observation that we can make for uncoupled systems. Suppose that the initial condition for an uncoupled system lies on the x-axis; that is, suppose y0 = 0. Then the solution (x(t), y(t)) = (x0 e^{at}, 0) also lies on the x-axis for all time. Similarly, if the initial condition lies on the y-axis, then the solution (0, y0 e^{dt}) lies on the y-axis for all time.

This invariance of the coordinate axes for uncoupled systems follows directly from (4.3.3). It turns out that
many linear systems of differential equations have invariant lines; this is a topic to which we return later in this chapter.

Generating Phase Space Pictures with PhasePlane How can we visualize a solution (x(t), y(t)) in (4.3.3) to the system of differential equations (4.3.2)? The time series approach suggests that we should graph (x(t), y(t)) as a function of t; that is, we should plot the curve

(t, x(t), y(t))

in three dimensions. Using MATLAB it is possible to plot such a graph — but such a graph by itself is difficult to interpret. Alternatively, we could graph either of the functions x(t) or y(t) by themselves as we do for solutions to single equations — but then some information is lost.

The method we prefer is the phase space plot obtained by thinking of (x(t), y(t)) as the position of a particle in the xy-plane at time t. We then graph the point (x(t), y(t)) in the plane as t varies. When looking at phase space plots, it is natural to call solutions trajectories, since we can imagine that we are watching a particle moving in the plane as time changes.

We begin by considering uncoupled linear equations. As we saw, when the initial conditions are on a coordinate axis (either (x0, 0) or (0, y0)), the solutions remain on that coordinate axis for all time t. For these initial conditions, the equations behave as if they were one dimensional. However, if we consider an initial condition (x0, y0) that is not on a coordinate axis, then even for an uncoupled system it is a little difficult to see what the trajectory looks like. At this point it is useful to use the computer.

The method used to integrate planar systems of differential equations is similar to that used to integrate single equations. The solution curve (x(t), y(t)) to (4.3.2) at a point (x0, y0) is tangent to the direction (f, g) = (ax0 + by0, cx0 + dy0). So the differential equation solver plots the direction field (f, g) and then finds curves that are tangent to these vectors at each point in time.

The program PhasePlane, written by Roy Goodman and originally by John Polking under the name pplane, draws two-dimensional phase planes. In MATLAB type

PhasePlane

and the PhasePlane window will appear. PhasePlane has a number of preprogrammed differential equations listed in a menu accessed by clicking on library. To explore linear systems, choose linear system in the library.

To integrate the uncoupled linear system, set the parameters b and c equal to zero, a = 2, and d = −3. We now have the system (4.3.2). After pushing Update, the Phase plane window is filled by vectors (f, g) indicating directions.

We may start the computations by clicking with a mouse button on an initial value (x0, y0). For example, if we click approximately onto (x(0), y(0)) = (x0, y0) = (1, 1), then the trajectory in the upper right quadrant of Figure 13 displays.

First PhasePlane draws the trajectory in forward time for t ≥ 0 and then it draws the trajectory in backwards time for t ≤ 0. More precisely, when we click on a point (x0, y0) in the (x, y)-plane, PhasePlane computes that part of the solution that lies inside the specified display window and that goes through this point. For linear systems there is precisely one solution that goes through a specified point in the (x, y)-plane.

Saddles, Sinks, and Sources for the Uncoupled System (4.3.2) In a qualitative fashion, the trajectories of uncoupled linear systems are determined by the invariance of the coordinate axes and by the signs of the constants a and d.
Figure 13: PHASEPLANE Display for (4.3.2) with a = 2, d = −3 and x, y ∈ [−5, 5]. Solutions going through (±1, ±1) are shown.
Saddles: ad < 0 In Figure 13, where a = 2 > 0 and d = −3 < 0, the origin is a saddle. If we choose several initial values (x0, y0) one after another, then we find that as time increases all solutions approach the x-axis. That is, if (x(t), y(t)) is a solution to this system of differential equations, then lim_{t→∞} y(t) = 0. This observation is particularly noticeable when we choose initial conditions close to the origin (0, 0). On the other hand, solutions also approach the y-axis as t → −∞. These qualitative features of the phase plane are valid whenever a > 0 and d < 0.

When a < 0 and d > 0, then the origin is also a saddle — but the roles of the x and y axes are reversed.

origin as time tends to infinity. Hence — as mentioned previously, and in contrast to saddles — the equilibrium (0, 0) is asymptotically stable. Observe that solutions approach the origin on trajectories that are tangent to the x-axis. Since d < a < 0, the trajectory decreases to zero faster in the y direction than it does in the x-direction. If you change parameters so that a < d < 0, then trajectories will approach the origin tangent to the y-axis.

Sources: a > 0 and d > 0 Choose the constants a and d so that both are positive. In forward time, all trajectories, except the equilibrium at the origin, move towards infinity and the origin is called a source.
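For readers without PhasePlane, a minimal sketch using only core MATLAB (ode45) can reproduce trajectories like those in Figure 13 for the uncoupled system (4.3.2) with a = 2 and d = −3; the loop below is illustrative and is not the text's own code.

a = 2; d = -3;                      % saddle: a > 0, d < 0
f = @(t,z) [a*z(1); d*z(2)];        % right-hand side of (4.3.2)
hold on
for x0 = [-1 1]
    for y0 = [-1 1]
        [~,zf] = ode45(f, [0 3], [x0; y0]);    % forward time
        [~,zb] = ode45(f, [0 -3], [x0; y0]);   % backward time
        plot(zf(:,1), zf(:,2), zb(:,1), zb(:,2))
    end
end
axis([-5 5 -5 5]); xlabel('x'); ylabel('y')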
solution (x(t), y(t)) in the upper right window of PhasePlane.

Exercises

5. a = 0 and d = −2.3.

(a) Show that the points (x(t), y(t)) lie on the curve whose equation is:

y0^a x^d − x0^d y^a = 0.

7. Use the phase plane picture given in Figure 13 to draw the time series x(t) when (x(0), y(0)) = (1/2, 1/2). Check your answer using PhasePlane.
4.4 Coupled Linear Systems

The general linear constant coefficient system in two unknown functions x1, x2 is:

dx1/dt (t) = ax1(t) + bx2(t)
dx2/dt (t) = cx1(t) + dx2(t).    (4.4.1)

The uncoupled systems studied in Section 4.3 are obtained by setting b = c = 0 in (4.4.1). We have discussed how to solve (4.4.1) by formula (4.3.3) when the system is uncoupled. We have also discussed how to visualize the phase plane for different choices of the diagonal entries a and d. At present, we cannot solve (4.4.1) by formula when the coefficient matrix is not diagonal. But we may explore such a system numerically, by solving

dx1/dt (t) = −x1(t) + 3x2(t)
dx2/dt (t) = 3x1(t) − x2(t).

Similarly, in backward time t the solutions approach the anti-diagonal x1 = −x2. In other words, as for the case of uncoupled systems, we find two distinguished directions in the (x, y)-plane. See Figure 14. Moreover, the computations indicate that these lines are invariant in the sense that solutions starting on these lines remain on them for all time. This statement can be verified numerically by choosing initial conditions (x0, y0) = (1, 1) and (x0, y0) = (1, −1).
the x and y axes are eigendirections. The numerical computations that we have just performed indicate that eigendirections exist for many coupled systems. This discussion leads naturally to two questions:

(a) Do eigendirections always exist?

(b) How can we find eigendirections?

The second question will be answered in Sections 4.5 and 4.6. We can answer the first question by performing another numerical computation. In the setup window, change the parameter b to −2. Then numerically compute some solutions to see that there are no eigendirections in the phase space of this system. Observe that all solutions appear to spiral into the origin as time goes to infinity. The phase portrait is shown in Figure 15.

Figure 15: PHASEPLANE Display for the linear system with a = −1, b = −2, c = 3, d = −1.

Nonexistence of Eigendirections We now show analytically that certain linear systems of differential equations have no invariant lines in their phase portrait. Consider the system

ẋ = y
ẏ = −x.    (4.4.2)

Observe that (x(t), y(t)) = (sin t, cos t) is a solution to (4.4.2) by calculating

ẋ(t) = d/dt sin t = cos t = y(t)
ẏ(t) = d/dt cos t = −sin t = −x(t).

We have shown analytically that the unit circle centered at the origin is a solution trajectory for (4.4.2). Hence (4.4.2) has no eigendirections. It may be checked using MATLAB that all solution trajectories for (4.4.2) are just circles centered at the origin.

Exercises

1. (matlab) Choose the linear system in PhasePlane and set a = 0, b = 1, and c = −1. Then find values d such that except for the origin itself all solutions appear to

2. (matlab) Choose the linear system in PhasePlane and set a = −1, c = 3, and d = −1. Then find a value for b such that the behavior of the solutions of the system is "qualitatively" the same as for a diagonal system where a and d are negative. In particular, the origin should be an asymptotically stable equilibrium and the solutions should approach that equilibrium along a distinguished line.
3. (matlab) Choose the linear system in PhasePlane and set a = d and b = c. Verify that for these systems of differential equations:

(a) When |a| < b typical trajectories approach the line y = x as t → ∞ and the line y = −x as t → −∞.

(b) Assume that b is positive, a is negative, and b < −a. With these assumptions show that the origin is a sink and that typical trajectories approach the origin tangent to the line y = x.

8. The ODE is:

ẋ = y
ẏ = −(1/t^2) x + (1/t) y + 1.

The pairs of functions are:

(x1(t), y1(t)) = (t^2, 2t) and (x2(t), y2(t)) = (2t^2, 4t).
4.5 The Initial Value Problem and Eigenvectors
In Section 4.4, we plotted the phase space picture of the planar system of differential equations

(ẋ, ẏ)^t = C (x(t), y(t))^t    (4.5.3)

for some scalar α. Similarly, we also saw in our MATLAB experiments that there was a solution that for all time stayed on the anti-diagonal, the line y = −x. Such a solution must have the
It follows from (4.5.7) and (4.5.8) that if v is an eigenvector of C with eigenvalue λ, then

du/dt = λu.

Thus we have returned to our original linear differential equation that has solutions

u(t) = Ke^{λt},

We ask: Is there a value of λ and a nonzero vector (x, y) such that

C (x, y)^t = λ (x, y)^t?    (4.5.9)

Equation (4.5.9) implies that

( −1 − λ    −2   ) ( x )
(   3     −1 − λ ) ( y ) = 0.
( 1   2/(1 + λ) )
( 3    −1 − λ   )

Subtracting 3 times the 1st row from the second produces the matrix

( 1   2/(1 + λ)             )
( 0   −(1 + λ) − 6/(1 + λ)  )

This matrix is not row equivalent to I2 when the lower right hand entry is zero; that is, when

(1 + λ) + 6/(1 + λ) = 0.

That is, when

(1 + λ)^2 = −6,

which is not possible for any real number λ. This example shows that the question of whether a given matrix has a real eigenvalue and a real eigenvector — and hence when the associated system of differential equations has a line that is invariant under the dynamics — is a subtle question.

Questions concerning eigenvectors and eigenvalues are central to much of the theory of linear algebra. We discuss this topic for 2 × 2 matrices in Section 4.6 and Chapter 6 and for general square matrices in Chapters 7 and 11.

in matrix form.

2. Show that all solutions to the system of linear differential equations

dx/dt = 3x
dy/dt = −2y

are linear combinations of the two solutions

U(t) = e^{3t} (1, 0)^t and V(t) = e^{−2t} (0, 1)^t.

3. Consider

dX/dt (t) = CX(t)    (4.5.10)

where

C = ( 2   3 )
    ( 0  −1 ).

Let

v1 = (1, 0)^t and v2 = (1, −1)^t,

and let

Y(t) = e^{2t} v1 and Z(t) = e^{−t} v2.

(a) Show that Y(t) and Z(t) are solutions to (4.5.10).

(b) Show that X(t) = 2Y(t) − 14Z(t) is a solution to (4.5.10).

(c) Use the principle of superposition to verify that X(t) = αY(t) + βZ(t) is a solution to (4.5.10).
(d) Using the general solution found in part (c), find a solution X(t) to (4.5.10) such that

X(0) = (3, −1)^t.

7. Let

C = (  1   2 )
    ( −3  −1 ).

Show that C has no real eigenvectors.

4.6 Eigenvalues of 2 × 2 Matrices
Characteristic Polynomials Corollary 3.8.3 of Chapter 3 states that 2 × 2 matrices are singular precisely when their determinant is zero. It follows that λ ∈ R is an eigenvalue for the 2 × 2 matrix A precisely when

det(A − λI2) = 0.    (4.6.2)

We can compute (4.6.2) explicitly as follows. Note that

A − λI2 = ( a − λ     b    )
          (   c     d − λ  ).

Therefore

det(A − λI2) = (a − λ)(d − λ) − bc = λ^2 − (a + d)λ + (ad − bc).    (4.6.3)

It is now easy to verify (4.6.5) for (4.6.6).

Eigenvalues For 2 × 2 matrices A, pA(λ) is a quadratic polynomial. As we have discussed, the real roots of pA are real eigenvalues of A. For 2 × 2 matrices we now generalize our first definition of eigenvalues, Definition 4.5.1, to include complex eigenvalues, as follows.

Definition 4.6.2. An eigenvalue of A is a root of the characteristic polynomial pA.

It follows from Definition 4.6.2 that every 2 × 2 matrix has precisely two eigenvalues, which may be equal or complex conjugate pairs.
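As an aside not in the text, (4.6.3) can be checked numerically: MATLAB's poly(A) returns the coefficients of det(λI2 − A), which for a 2 × 2 matrix are 1, −(a + d), and ad − bc. The matrix below is an arbitrary illustration.

A = [1 2; 3 4];          % an arbitrary 2-by-2 example
p = poly(A)              % coefficients of lambda^2 - (a+d)*lambda + (ad - bc)
% p(2) is -(a+d) = -trace(A), and p(3) is ad - bc = det(A):
[-trace(A) det(A)]       % compare with p(2:3)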
Suppose that λ1 and λ2 are the roots of pA. It follows that

pA(λ) = (λ − λ1)(λ − λ2) = λ^2 − (λ1 + λ2)λ + λ1λ2.    (4.6.7)

Equating the two forms of pA (4.6.5) and (4.6.7) shows that

tr(A) = λ1 + λ2    (4.6.8)
det(A) = λ1λ2.    (4.6.9)

Thus, for 2 × 2 matrices, the trace is the sum of the eigenvalues and the determinant is the product of the eigenvalues. In Chapter 7, Theorems 7.2.4(b) and 7.2.9 we show that these statements are also valid for n × n matrices.

Recall that in example (4.6.6) the characteristic polynomial is

pA(λ) = λ^2 − 6λ + 5 = (λ − 1)(λ − 5).

Thus the eigenvalues of A are λ1 = 1 and λ2 = 5 and identities (4.6.8) and (4.6.9) are easily verified for this example.

Using the quadratic formula we see that the roots of pB (that is, the eigenvalues of B) are

λ1 = 3 + i√2 and λ2 = 3 − i√2.

Again the sum of the eigenvalues is 6 which equals the trace of B and the product of the eigenvalues is 11 which equals the determinant of B.

Since the characteristic polynomial of 2 × 2 matrices is always a quadratic polynomial, it follows that 2 × 2 matrices have precisely two eigenvalues — including multiplicity — and these can be described as follows. The discriminant of A is:

D = [tr(A)]^2 − 4 det(A).    (4.6.10)

Theorem 4.6.3. There are three possibilities for the two eigenvalues of a 2 × 2 matrix A that we can describe in terms of the discriminant:

(i) The eigenvalues of A are real and distinct (D > 0).

(ii) The eigenvalues of A are a complex conjugate pair (D < 0).

(iii) The eigenvalues of A are real and equal (D = 0).

Proof We can find the roots of the characteristic polynomial using the form of pA given in (4.6.5) and the quadratic formula. The roots are:

λ1,2 = (tr(A) ± √D)/2.

Eigenvectors The following lemma contains an important observation about eigenvectors:

Lemma 4.6.4. Every eigenvalue λ of a 2 × 2 matrix A has an eigenvector v. That is, there is a nonzero vector v ∈ C^2 satisfying

Av = λv.
Proof When the eigenvalue λ is real we know that an eigenvector v ∈ R^2 exists. However, when λ is complex, then we must show that there is a complex eigenvector v ∈ C^2, and this we have not yet done. More precisely, we must show that if λ is a complex root of the characteristic polynomial pA, then there is a complex vector v such that

(A − λI2)v = 0.

As we discussed in Section 2.5, finding v is equivalent to showing that the complex matrix

A − λI2 = ( a − λ     b    )
          (   c     d − λ  )

is not row equivalent to the identity matrix. See Theorem 2.5.2 of Chapter 2. Since a is real and λ is not, a − λ ≠ 0. A short calculation shows that A − λI2 is row equivalent to the matrix

( 1   b/(a − λ)     )
( 0   pA(λ)/(a − λ) ).

This matrix is not row equivalent to the identity matrix since pA(λ) = 0.

It follows that v1 = (3, −1)^t is an eigenvector since

(A − I2)v1 = 0.

Similarly, to find an eigenvector associated with the eigenvalue λ2 = 5 compute

A − λ2I2 = A − 5I2 = ( −3   3 )
                     (  1  −1 ).

It follows that v2 = (1, 1)^t is an eigenvector since

(A − 5I2)v2 = 0.

Examples of Matrices with Complex Eigenvectors Let

A = ( 0  −1 )
    ( 1   0 ).

Then pA(λ) = λ^2 + 1 and the eigenvalues of A are ±i. To find the eigenvector v ∈ C^2 whose existence is guaranteed by Lemma 4.6.4, we need to solve the complex system of linear equations Av = iv. We can rewrite this system as:

( −i  −1 ) ( v1 )
(  1  −i ) ( v2 ) = 0.
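A quick numerical check of this example — an illustrative sketch, not part of the text — uses MATLAB's eig and also confirms the trace and determinant identities (4.6.8) and (4.6.9):

A = [0 -1; 1 0];
[V, D] = eig(A);              % columns of V are eigenvectors, D holds the eigenvalues
lambda = diag(D)              % returns i and -i (in some order)
sum(lambda)                   % equals trace(A) = 0, as in (4.6.8)
prod(lambda)                  % equals det(A) = 1, as in (4.6.9)
A*V(:,1) - lambda(1)*V(:,1)   % numerically the zero vector, so A*v = lambda*v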
More generally, let

A = ( σ  −τ )
    ( τ   σ ),    (4.6.12)

where τ ≠ 0. Then

pA(λ) = λ^2 − 2σλ + σ^2 + τ^2 = (λ − (σ + iτ))(λ − (σ − iτ)),

and the eigenvalues of A are the complex conjugates σ ± iτ. Thus A has no real eigenvectors. The complex eigenvectors of A are v and v̄ where v is defined in (4.6.11).

not invertible? Note: These values of λ are just the eigenvalues of the matrix

( 1  4 )
( 2  3 ).

In Exercises 2 – 5 compute the determinant, trace, and characteristic polynomials for the given 2 × 2 matrix.

2. ( 1   4 )
   ( 0  −1 ).

3. (  2  13 )
   ( −1   5 ).

4. ( 1   4 )
   ( 1  −1 ).

5. ( 4  10 )
   ( 2   5 ).

In Exercises 6 – 8 compute the eigenvalues for the given 2 × 2 matrix.

6. ( 1   2 )
   ( 0  −5 ).

7. ( −3  2 )
   (  1  0 ).

8. ( 3  −2 )
   ( 2  −1 ).

9. Suppose that the characteristic polynomial of the 2 × 2 matrix A is pA(λ) = λ^2 + 2λ − 6. Find det(A) and tr(A).

In Exercises 11 – 13 use the program map to guess whether the given matrix has real or complex conjugate eigenvalues. For each example, write the reasons for your guess.

11. (matlab) A = ( 0.97  −0.22 )
                 ( 0.22   0.97 ).

12. (matlab) B = ( 0.97  0.22 )
                 ( 0.22  0.97 ).

13. (matlab) C = ( 0.4  −1.4 )
                 ( 1.5   0.5 ).

In Exercises 14 – 15 use the program map to guess one of the eigenvectors of the given matrix. What is the corresponding eigenvalue? Using map, can you find a second eigenvalue and eigenvector?
14. (matlab) A = ( 2  4 )
                 ( 2  0 ).

15. (matlab) B = ( 2     −1 )
                 ( 0.25   1 ).

Hint: Use the feature Rescale in the MAP Options. Then the length of the vector is rescaled to one after each use of the command Map. In this way you can avoid overflows in the computations while still being able to see the directions where the vectors are moved by the matrix mapping.
4.7 Initial Value Problems Revisited

To summarize the ideas developed in this chapter, we review the method that we have developed to solve the system of differential equations (4.7.1) satisfying the initial conditions

x(0) = x0
y(0) = y0.    (4.7.2)

Begin by rewriting (4.7.1) in matrix form

Ẋ = CX    (4.7.3)

where C is the 2 × 2 coefficient matrix. Rewrite the initial conditions (4.7.2) in vector form

X(0) = X0    (4.7.4)

where

X0 = (x0, y0)^t.

When the eigenvalues of C are real and distinct we now know how to solve the initial value problem (4.7.3) and (4.7.4). This solution is found in four steps.

Step 1: Find the eigenvalues λ1 and λ2 of C.

These eigenvalues are the roots of the characteristic polynomial as given by (4.6.5):

pC(λ) = λ^2 − tr(C)λ + det(C).

These roots may be found either by factoring pC or by using the quadratic formula. The roots are real and distinct when the discriminant

D = tr(C)^2 − 4 det(C) > 0.

Step 2: Find eigenvectors v1 and v2 of C associated with the eigenvalues λ1 and λ2.

For j = 1 and j = 2, the eigenvector vj is found by solving the homogeneous system of linear equations

(C − λj I2)v = 0    (4.7.5)

for one nonzero solution. Lemma 4.6.4 tells us that there is always a nonzero solution to (4.7.5) since λj is an eigenvalue of C.

Step 3: Write the general solution

X(t) = α1 e^{λ1 t} v1 + α2 e^{λ2 t} v2,    (4.7.6)

where α1, α2 ∈ R.

Theorem 4.5.3 tells us that for j = 1, 2

Xj(t) = e^{λj t} vj

is a solution to (4.7.3). The principle of superposition (see Section 4.5) allows us to conclude that

X(t) = α1 X1(t) + α2 X2(t)

is also a solution to (4.7.3) for any scalars α1, α2 ∈ R. Thus, (4.7.6) is valid.

Note that the initial condition corresponding to the general solution (4.7.6) is

X(0) = α1 v1 + α2 v2,    (4.7.7)

since e^0 = 1.
Step 4: Solve the initial value problem by solving the system of linear equations

X0 = α1 v1 + α2 v2    (4.7.8)

for α1 and α2 (see (4.7.7)).

Let A be the 2 × 2 matrix whose columns are v1 and v2. That is,

A = (v1|v2).    (4.7.9)

Then we may rewrite (4.7.8) in the form

A (α1, α2)^t = X0.    (4.7.10)

We claim that the matrix A = (v1|v2) (defined in (4.7.9)) is always invertible. Recall Lemma 4.5.2 which states that if w is a nonzero multiple of v2, then w is also an eigenvector of C associated to the eigenvalue λ2. Since the eigenvalues λ1 and λ2 are distinct, it follows that the eigenvector v1 is not a scalar multiple of the eigenvector v2 (see Lemma 4.5.2). Therefore, the area of the parallelogram spanned by v1 and v2 is nonzero and the determinant of A is nonzero by Theorem 3.8.5 of Chapter 3. Corollary 3.8.3 of Chapter 3 now implies that A is invertible. Thus, the unique solution to (4.7.10) is

(α1, α2)^t = A^{−1} X0.

This equation is easily solved since we have an explicit formula for A^{−1} when A is a 2 × 2 matrix (see (3.8.1) in Section 3.8). Indeed,

A^{−1} = (1/det(A)) (  d  −b )
                    ( −c   a ).

An Initial Value Problem Solved by Hand Solve the linear system of differential equations

ẋ = 3x − y
ẏ = 4x − 2y    (4.7.11)

with initial conditions

x(0) = 2
y(0) = −3.    (4.7.12)

Rewrite the system (4.7.11) in matrix form as

Ẋ = CX

where

C = ( 3  −1 )
    ( 4  −2 ).

Rewrite the initial conditions (4.7.12) in vector form

X(0) = X0 = (2, −3)^t.

Now proceed through the four steps outlined previously.

Step 1: Find the eigenvalues of C.

The characteristic polynomial of C is

pC(λ) = λ^2 − tr(C)λ + det(C) = λ^2 − λ − 2 = (λ − 2)(λ + 1).

Therefore, the eigenvalues of C are

λ1 = 2 and λ2 = −1.

Step 2: Find the eigenvectors of C.

Find an eigenvector associated with the eigenvalue λ1 = 2 by solving the system of equations

(C − λ1 I2)v = (C − 2I2)v = ( 1  −1 )
                            ( 4  −4 ) v = 0.

One particular solution to this system is

v1 = (1, 1)^t.
Similarly, find an eigenvector associated with the eigenvalue λ2 = −1 by solving the system of equations

(C − λ2 I2)v = (C + I2)v = ( 4  −1 )
                           ( 4  −1 ) v = 0.

One particular solution to this system is

v2 = (1, 4)^t.

Step 3: Write the general solution to the system of differential equations.

Using superposition the general solution to the system (4.7.11) is:

X(t) = α1 e^{2t} v1 + α2 e^{−t} v2 = α1 e^{2t} (1, 1)^t + α2 e^{−t} (1, 4)^t,

where α1, α2 ∈ R. Note that the initial state of this solution is:

X(0) = α1 (1, 1)^t + α2 (1, 4)^t = (α1 + α2, α1 + 4α2)^t.

Step 4: Solve the initial value problem.

Let

A = (v1|v2) = ( 1  1 )
              ( 1  4 ).

The equation for the initial condition is

A (α1, α2)^t = X0.

See (4.7.9).

We can write the inverse of A by formula as

A^{−1} = (1/3) (  4  −1 )
               ( −1   1 ).

It follows that we solve for the coefficients αj as

(α1, α2)^t = A^{−1} X0 = (1/3) (4·2 + (−1)·(−3), (−1)·2 + 1·(−3))^t = (1/3) (11, −5)^t.

In coordinates

α1 = 11/3 and α2 = −5/3.

The solution to the initial value problem (4.7.11) and (4.7.12) is:

X(t) = (1/3) (11 e^{2t} v1 − 5 e^{−t} v2) = (1/3) (11 e^{2t} (1, 1)^t − 5 e^{−t} (1, 4)^t).

Expressing the solution in coordinates, we obtain:

x(t) = (1/3) (11 e^{2t} − 5 e^{−t})
y(t) = (1/3) (11 e^{2t} − 20 e^{−t}).

An Initial Value Problem Solved using MATLAB Next, solve the system of ODEs

ẋ = 1.7x + 3.5y
ẏ = 1.3x − 4.6y

with initial conditions

x(0) = 2.7
y(0) = 1.1.

Rewrite this system in matrix form as

Ẋ = CX

where

C = ( 1.7   3.5 )
    ( 1.3  −4.6 ).
C = [1.7 3.5; 1.3 -4.6]
X0 = [2.7; 1.1]

Step 1: Find the eigenvalues of C by typing

lambda = eig(C)

and obtaining

lambda =
    2.3543
   -5.2543

So the eigenvalues of C are real and distinct.

Step 2: To find the eigenvectors of C we need to solve two homogeneous systems of linear equations. The matrix associated with the first system is obtained by typing

C1 = C - lambda(1)*eye(2)

which yields

C1 =
   -0.6543    3.5000
    1.3000   -6.9543

The corresponding eigenvector is obtained by typing v1 = null(C1). Similarly, to find an eigenvector associated to the eigenvalue λ2 type

C2 = C - lambda(2)*eye(2);
v2 = null(C2)

obtaining

v2 =
   -0.4496
    0.8932

Step 3: The general solution to this system of differential equations is:

X(t) = α1 e^{2.3543t} (−0.9830, −0.1838)^t + α2 e^{−5.2543t} (−0.4496, 0.8932)^t.

Step 4: Solve the initial value problem by finding the scalars α1 and α2. Form the matrix A by typing

A = [v1 v2]

Then solve for the α's by typing

alpha = inv(A)*X0

obtaining
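The numerical output of the last command is not shown in this excerpt. As an illustrative sketch — not the text's own listing — the four steps can be collected into one script so the computation can be rerun end to end:

% Solve Xdot = C*X, X(0) = X0, when C has real distinct eigenvalues.
C  = [1.7 3.5; 1.3 -4.6];
X0 = [2.7; 1.1];
lambda = eig(C);                  % Step 1: eigenvalues
v1 = null(C - lambda(1)*eye(2));  % Step 2: eigenvectors
v2 = null(C - lambda(2)*eye(2));
A = [v1 v2];                      % Step 4: solve for the coefficients
alpha = A\X0                      % backslash is preferred to inv(A)*X0
% Step 3: X(t) = alpha(1)*exp(lambda(1)*t)*v1 + alpha(2)*exp(lambda(2)*t)*v2
t = linspace(0, 1, 50);
X = v1*(alpha(1)*exp(lambda(1)*t)) + v2*(alpha(2)*exp(lambda(2)*t));
plot(t, X(1,:), t, X(2,:)); xlabel('t'); legend('x(t)','y(t)')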
4.8 *Markov Chains

Markov chains provide an interesting and useful application of matrices and linear algebra. In this section we introduce Markov chains via some of the theory and two examples. The theory can be understood and applied to examples using just the background in linear algebra that we have developed in this chapter.

An Example of Cats Consider the four room apartment pictured in Figure 16. One way passages between the rooms are indicated by arrows. For example, it is possible to go from room 1 directly to any other room, but when in room 3 it is possible to go only to room 4.

Figure 16: Schematic design of apartment passages.

Suppose that there is a cat in the apartment and that at each hour the cat is asked to move from the room that it is in to another. True to form, however, the cat chooses with equal probability to stay in the room for another hour or to move through one of the allowed passages. Suppose that we let pij be the probability that the cat will move from room i to room j; in particular, pii is the probability that the cat will stay in room i. For example, when the cat is in room 1, it has four choices — it can stay in room 1 or move to any of the other rooms. Assuming that each of these choices is made with equal probability, we see that

p11 = p12 = p13 = p14 = 1/4.

It is now straightforward to verify that

p21 = 1/2    p22 = 1/2    p23 = 0      p24 = 0
p31 = 0      p32 = 0      p33 = 1/2    p34 = 1/2
p41 = 0      p42 = 1/3    p43 = 1/3    p44 = 1/3.

Putting these probabilities together yields the transition matrix:

P = ( 1/4  1/4  1/4  1/4 )
    ( 1/2  1/2   0    0  )
    (  0    0   1/2  1/2 )    (4.8.1*)
    (  0   1/3  1/3  1/3 )

This transition matrix has the properties that all entries are nonnegative and that the entries in each row sum to 1.

Three Basic Questions Using the transition matrix P, we discuss the answers to three questions:
p14^(2) = p11 p14 + p12 p24 + p13 p34 + p14 p44;    (4.8.2)

that is, the probability is the sum of the probabilities that the cat will move from room 1 to each room i and then from room i to room 4. In this case the answer is:

p14^(2) = 1/4 × 1/4 + 1/4 × 0 + 1/4 × 1/2 + 1/4 × 1/3 = 13/48 ≈ 0.27.

It follows from (4.8.2) and the definition of matrix multiplication that p14^(2) is just the (1, 4)th entry in the matrix P^2. An induction argument shows that the probability of the cat moving from room i to room j in k steps is precisely the (i, j)th entry in the matrix P^k — which answers Question (A). In particular, we can answer the question: What is the probability that the cat will move from room 4 to room 3 in four steps? Using MATLAB the answer is given by typing e4_10_1 to recall the matrix P and then typing

P4 = P^4;
P4(4,3)

v2^(1) = p12 v1 + p22 v2 + p32 v3 + p42 v4;    (4.8.3)

that is, v2^(1) is the sum of the proportion of cats in each room i that are expected to migrate to room 2 in one step. In this case, the answer is:

(1/4) v1 + (1/2) v2 + (1/3) v4.

It now follows from (4.8.3), the definition of the transpose of a matrix, and the definition of matrix multiplication that v2^(1) is the 2nd entry in the vector P^t V0. Indeed, it follows by induction that vi^(k) is the ith entry in the vector (P^t)^k V0, which answers the first part of Question (B).

We may rephrase the second part of Question (B) as follows. Let

Vk = (v1^(k), v2^(k), v3^(k), v4^(k))^t = (P^t)^k V0.

Question (B) actually asks: What will the vector Vk look like for large k? To answer that question we need some
results about matrices like the matrix P in (4.8.1*). But first we explore the answer to this question numerically using MATLAB.

Suppose, for example, that the initial vector is

V0 = (2, 43, 21, 34)^t.    (4.8.4*)

Typing e4_10_1 and e4_10_4 enters the matrix P and the initial vector V0 into MATLAB. To compute V20, the distribution of cats after 20 steps, type

Q=P'
V20 = Q^(20)*V0

and obtain

V20 =
   18.1818
   27.2727
   27.2727
   27.2727

Thus, after rounding to the nearest integer, we expect 27 cats to be in each of rooms 2, 3 and 4 and 18 cats to be in room 1 after 20 steps. In fact, the vector V20 has a remarkable feature. Compute Q*V20 in MATLAB and see that V20 = P^t V20; that is, V20 is, to within four digit numerical precision, an eigenvector of P^t with eigenvalue equal to 1. This computation was not a numerical accident, as we now describe. Indeed, compute V20 for several initial distributions V0 of cats and see that the answer will always be the same — up to four digit accuracy.

A Discussion of Question (C) Suppose there is just one cat in the apartment; and we ask how many times that cat is expected to visit room 3 in 100 steps. Suppose the cat starts in room 1; then the initial distribution of cats is one cat in room 1 and zero cats in any of the other rooms. So V0 = e1. In our discussion of Question (B) we saw that the 3rd entry in (P^t)^k V0 gives the probability ck that the cat will be in room 3 after k steps.

In the extreme, suppose that the probability that the cat will be in room 3 is 1 for each step k. Then the fraction of the time that the cat is in room 3 is

(1 + 1 + · · · + 1)/100 = 1.

In general, the fraction of the time f that the cat will be in room 3 during a span of 100 steps is

f = (1/100) (c1 + c2 + · · · + c100).

Since ck = (P^t)^k V0, we see that

f = (1/100) (P^t V0 + (P^t)^2 V0 + · · · + (P^t)^100 V0).    (4.8.5)

So, to answer Question (C), we need a way to sum the expression for f in (4.8.5), at least approximately. This is not an easy task — though the answer itself is easy to explain. Let V be the eigenvector of P^t with eigenvalue 1 such that the sum of the entries in V is 1. The answer is: f is approximately equal to V. See Theorem 4.8.4 for a more precise statement.

In our previous calculations the vector V20 was seen to be (approximately) an eigenvector of P^t with eigenvalue 1. Moreover the sum of the entries in V20 is precisely 100. Therefore, we normalize V20 to get V by setting

V = (1/100) V20.

So, the fraction of time that the cat spends in room 3 is f ≈ 0.2727. Indeed, we expect the cat to spend approximately 27% of its time in rooms 2, 3, 4 and about 18% of its time in room 1.
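The m-files e4_10_1 and e4_10_4 used above are not reproduced in this excerpt; the following sketch simply defines P from (4.8.1*) and V0 from (4.8.4*) directly so the computations above can be repeated.

% Transition matrix (4.8.1*) and initial cat distribution (4.8.4*).
P  = [1/4 1/4 1/4 1/4; 1/2 1/2 0 0; 0 0 1/2 1/2; 0 1/3 1/3 1/3];
V0 = [2; 43; 21; 34];
P4 = P^4;
P4(4,3)            % probability of moving from room 4 to room 3 in four steps
Q   = P';
V20 = Q^20*V0      % distribution of the 100 cats after 20 steps
Q*V20 - V20        % nearly zero: V20 is (approximately) an eigenvector with eigenvalue 1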
Markov Matrices We now abstract the salient properties of our cat example. A Markov chain is a system with a finite number of states labeled 1,…,n along with probabilities pij of moving from site i to site j in a single step. The Markov assumption is that these probabilities depend only on the site that you are in and not on how you got there. In our example, we assumed that the probability of the cat moving from say room 2 to room 4 did not depend on how the cat got to room 2 in the first place.

We make a second assumption: there is a k such that it is possible to move from any site i to any site j in exactly k steps. This assumption is not valid for general Markov chains, though it is valid for the cat example, since it is possible to move from any room to any other room in that example in exactly three steps. (It takes a minimum of three steps to get from room 3 to room 1 in the cat example.) To simplify our discussion we include this assumption in our definition of a Markov chain.

Definition 4.8.1. Markov matrices are square matrices P such that

(a) all entries in P are nonnegative,

(b) the entries in each row of P sum to 1, and

(c) there is a positive integer k such that all of the entries in P^k are positive.

It is straightforward to verify that parts (a) and (b) in the definition of Markov matrices are satisfied by the transition matrix

P = ( p11  · · ·  p1n )
    (  ⋮    ⋱     ⋮   )
    ( pn1  · · ·  pnn )

of a Markov chain. To verify part (c) requires further discussion.

Proposition 4.8.2. Let P be a transition matrix for a Markov chain.

(a) The probability of moving from site i to site j in exactly k steps is the (i, j)th entry in the matrix P^k.

(b) The expected number of individuals at site i after exactly k steps is the ith entry in the vector Vk ≡ (P^t)^k V0.

(c) P is a Markov matrix.

Proof Only minor changes in our discussion of the cat example proves parts (a) and (b) of the proposition.

(c) The assumption that it is possible to move from each site i to each site j in exactly k steps means that the (i, j)th entry of P^k is positive. For that k, all of the entries of P^k are positive. In the cat example, all entries of P^3 are positive.

Proposition 4.8.2 gives the answer to Question (A) and the first part of Question (B) for general Markov chains. Let vi^(0) ≥ 0 be the number of individuals initially at site i, and let V0 = (v1^(0), . . . , vn^(0))^t. The total number of individuals in the initial population is:

#(V0) = v1^(0) + · · · + vn^(0).

Theorem 4.8.3. Let P be a Markov matrix. Then

(a) #(Vk) = #(V0); that is, the number of individuals after k time steps is the same as the initial number.

(b) V = lim_{k→∞} Vk exists and #(V) = #(V0).

(c) P^t V = V; that is, V is an eigenvector of P^t with eigenvalue 1.

Proof (a) By induction it is sufficient to show that #(V1) = #(V0). We do this by calculating from V1 =
P^t V0 that

#(V1) = v1^(1) + · · · + vn^(1)
      = (p11 v1^(0) + · · · + pn1 vn^(0)) + · · · + (p1n v1^(0) + · · · + pnn vn^(0))
      = (p11 + · · · + p1n) v1^(0) + · · · + (pn1 + · · · + pnn) vn^(0)
      = v1^(0) + · · · + vn^(0)

since the entries in each row of P sum to 1. Thus #(V1) = #(V0), as claimed.

(b) The hard part of this theorem is proving that the limiting vector V exists; we give a proof of this fact in Chapter 11, Theorem 11.4.4. Once V exists it follows directly from (a) that #(V) = #(V0).

(c) Just calculate that

P^t V = P^t (lim_{k→∞} Vk) = P^t (lim_{k→∞} (P^t)^k V0) = lim_{k→∞} (P^t)^{k+1} V0 = lim_{k→∞} (P^t)^k V0 = V,

which proves (c).

Theorem 4.8.3(b) gives the answer to the second part of Question (B) for general Markov chains. Next we discuss Question (C).

Theorem 4.8.4. Let P be a Markov matrix. Let V be the eigenvector of P^t with eigenvalue 1 and #(V) = 1. Then after a large number of steps N the expected number of times an individual will visit site i is N vi where vi is the ith entry in V.

Sketch of Proof In our discussion of Question (C) for the cat example, we explained why the fraction fN of time that an individual will visit site j when starting initially at site i is the jth entry in the sum

See (4.8.5). The proof of this theorem involves being able to calculate the limit of fN as N → ∞. There are two main ideas. First, the limit of the matrix (P^t)^N exists as N approaches infinity — call that limit Q. Moreover, Q is a matrix all of whose columns equal V. Second, for large N, the sum

P^t + (P^t)^2 + · · · + (P^t)^N ≈ Q + Q + · · · + Q = NQ,

so that the limit of the fN is Q ei = V.

The verification of these statements is beyond the scope of this text. For those interested, the idea of the proof of the second part is roughly the following. Fix k large enough so that (P^t)^k is close to Q. Then when N is large, much larger than k, the sum of the first k terms in the series is nearly zero.

Theorem 4.8.4 gives the answer to Question (C) for a general Markov chain. It follows from Theorem 4.8.4 that for Markov chains the amount of time that an individual spends in room i is independent of the individual's initial room — at least after a large number of steps.

A complete proof of this theorem relies on a result known as the ergodic theorem. Roughly speaking, the ergodic theorem relates space averages with time averages. To see how this point is relevant, note that Question (B) deals with the issue of how a large number of individuals will be distributed in space after a large number of steps, while Question (C) deals with the issue of how the path of a single individual will be distributed in time after a large number of steps.

An Example of Umbrellas This example focuses on the utility of answering Question (C) and reinforces the fact that results in Theorem 4.8.3 have the second interpretation given in Theorem 4.8.4.
for his office, then the man takes an umbrella from home to office, assuming that he has an umbrella at home. If it is raining in the afternoon, then the man takes an umbrella from office to home, assuming that he has an umbrella in his office. Suppose that the probability that it will rain in the morning is p = 0.2 and the probability that it will rain in the afternoon is q = 0.3, and these probabilities are independent. What percentage of days will the man get wet going from home to office; that is, what percentage of the days will the man be at home on a rainy morning with all of his umbrellas at the office?

There are five states in the system depending on the number of umbrellas that are at home. Let si where 0 ≤ i ≤ 4 be the state with i umbrellas at home and 4 − i umbrellas at work. For example, s2 is the state of having two umbrellas at home and two at the office. Let P be the 5 × 5 transition matrix of state changes from morning to afternoon and Q be the 5 × 5 transition matrix of state changes from afternoon to morning. For example, the probability p23 of moving from site s2 to site s3 is 0, since it is not possible to have more umbrellas at home after going to work in the morning. The probability q23 = q, since the number of umbrellas at home will increase by one only if it is raining in the afternoon. The transition probabilities between all states are given in the following transition matrices:

P = ( 1    0    0    0    0   )
    ( p  1−p    0    0    0   )
    ( 0    p  1−p    0    0   )
    ( 0    0    p  1−p    0   )
    ( 0    0    0    p  1−p   )

Q = ( 1−q   q    0    0    0 )
    (  0  1−q    q    0    0 )
    (  0    0  1−q    q    0 )
    (  0    0    0  1−q    q )
    (  0    0    0    0    1 )

Specifically,

P = ( 1    0    0    0    0   )
    ( 0.2  0.8  0    0    0   )
    ( 0    0.2  0.8  0    0   )    (4.8.6*)
    ( 0    0    0.2  0.8  0   )
    ( 0    0    0    0.2  0.8 )

Q = ( 0.7  0.3  0    0    0   )
    ( 0    0.7  0.3  0    0   )
    ( 0    0    0.7  0.3  0   )
    ( 0    0    0    0.7  0.3 )
    ( 0    0    0    0    1   )

The transition matrix M of moving from state si on one morning to state sj the next morning is just M = PQ. We can compute this matrix using MATLAB by typing

e4_10_6
M = P*Q

obtaining

M =
    0.7000    0.3000         0         0         0
    0.1400    0.6200    0.2400         0         0
         0    0.1400    0.6200    0.2400         0
         0         0    0.1400    0.6200    0.2400
         0         0         0    0.1400    0.8600

It is easy to check using MATLAB that all entries in the matrix M^4 are nonzero. So M is a Markov matrix and we can use Theorem 4.8.4 to find the limiting distribution of states. Start with some initial condition like V0 = (0, 0, 1, 0, 0)^t corresponding to the state in which two umbrellas are at home and two at the office. Then compute the vectors Vk = (M^t)^k V0 until arriving at an eigenvector of M^t with eigenvalue 1. For example, V70 is computed by typing V70 = M'^(70)*V0 and obtaining
V70 =
    0.0419
    0.0898
    0.1537
    0.2633
    0.4512

We interpret V ≈ V70 in the following way. Since v1 is approximately .042, it follows that for approximately 4.2% of all steps the umbrellas are in state s0. That is, approximately 4.2% of all days there are no umbrellas at home. The probability that it will rain in the morning on one of those days is 0.2. Therefore, the probability of being at home in the morning when it is raining without any umbrellas is approximately 0.008.
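As a sketch that is not part of the text (with the file e4_10_6 replaced by explicit definitions), the limiting vector V and the final answer can also be computed directly from the eigenvector of M^t with eigenvalue 1:

p = 0.2; q = 0.3;
P = [1 0 0 0 0; p 1-p 0 0 0; 0 p 1-p 0 0; 0 0 p 1-p 0; 0 0 0 p 1-p];
Q = [1-q q 0 0 0; 0 1-q q 0 0; 0 0 1-q q 0; 0 0 0 1-q q; 0 0 0 0 1];
M = P*Q;
[W, D] = eig(M');               % eigenvectors of M^t
[~, j] = min(abs(diag(D) - 1)); % pick the eigenvalue closest to 1
V = W(:,j)/sum(W(:,j))          % normalize so the entries sum to 1
p*V(1)                          % probability of a wet walk to work (about 0.008)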
8. (matlab) A truck rental company has locations in three cities A, B and C. Statistically, the company knows that the trucks rented at one location will be returned in one week to the three locations in the following proportions.

Suppose that the company has 250 trucks. How should the company distribute the trucks so that the number of trucks available at each location remains approximately constant from one week to the next?

9. (matlab) Let

P = ( 0.10  0.20  0.30  0.15  0.25 )
    ( 0.05  0.35  0.10  0.40  0.10 )
    ( 0     0     0.35  0.55  0.10 )    (4.8.7*)
    ( 0.25  0.25  0.25  0.25  0    )
    ( 0.33  0.32  0     0     0.35 )

11. (matlab) Suppose that the original man in the text with umbrellas has only three umbrellas instead of four. What is the probability that on a given day he will get wet going to work?
Chapter 5 Vector Spaces
5.1 Vector Spaces and Subspaces

Vector spaces abstract the arithmetic properties of addition and scalar multiplication of vectors. In R^n we know how to add vectors and to multiply vectors by scalars. Indeed, it is straightforward to verify that each of the eight properties listed in Table 1 is valid for vectors in V = R^n. Remarkably, sets that satisfy these eight properties have much in common with R^n. So we define:

Definition 5.1.1. Let V be a set having the two operations of addition and scalar multiplication. Then V is a vector space if the eight properties listed in Table 5.1.1 hold. The elements of a vector space are called vectors.

The vector 0 mentioned in (A3) in Table 1 is called the zero vector.

When we say that a vector space V has the two operations of addition and scalar multiplication we mean that the sum of two vectors in V is again a vector in V and the scalar product of a vector with a number is again a vector in V. These two properties are called closure under addition and closure under scalar multiplication.

In this discussion we focus on just two types of vector spaces: R^n and function spaces. The reason that we make this choice is that solutions to linear equations are vectors in R^n while solutions to linear systems of differential equations are vectors of functions.

An Example of a Function Space For example, let F denote the set of all functions f : R → R. Note that functions like f1(t) = t^2 − 2t + 7 and f2(t) = sin t are in F since they are defined for all real numbers t, but that functions like g1(t) = 1/t and g2(t) = tan t are not in F since they are not defined for all t.

We can add two functions f and g by defining the function f + g to be:

(f + g)(t) = f(t) + g(t).

We can also multiply a function f by a scalar c ∈ R by defining the function cf to be:

(cf)(t) = cf(t).

With these operations of addition and scalar multiplication, F is a vector space; that is, F satisfies the eight vector space properties in Table 1. More precisely:

(A3) Define the zero function O by

O(t) = 0 for all t ∈ R.

For every x in F the function O satisfies:

(x + O)(t) = x(t) + O(t) = x(t) + 0 = x(t).

Therefore, x + O = x and O is the additive identity in F.

(A4) Let x be a function in F and define y(t) = −x(t). Then y is also a function in F, and

(x + y)(t) = x(t) + y(t) = x(t) + (−x(t)) = 0 = O(t).

Thus, x has the additive inverse −x.

After these comments it is straightforward to verify that the remaining six properties in Table 1 are satisfied by functions in F.

Sets that are not Vector Spaces It is worth considering how closure under vector addition and scalar multiplication can fail. Consider the following three examples.

(i) Let V1 be the set that consists of just the x and y axes in the plane. Since (1, 0) and (0, 1) are in V1 but

(1, 0) + (0, 1) = (1, 1)

is not in V1, we see that V1 is not closed under vector addition. On the other hand, V1 is closed under scalar multiplication.
(ii) Let V2 be the set of all vectors (k, ℓ) ∈ R^2 where k and ℓ are integers. The set V2 is closed under addition but not under scalar multiplication since (1/2)(1, 0) = (1/2, 0) is not in V2.

(iii) Let V3 = [1, 2] be the closed interval in R. The set V3 is neither closed under addition (1 + 1.5 = 2.5 ∉ V3) nor under scalar multiplication (4 · 1.5 = 6 ∉ V3). Hence the set V3 is not closed under vector addition and not closed under scalar multiplication.

Subspaces

Definition 5.1.2. Let V be a vector space. A nonempty subset W ⊂ V is a subspace if W is a vector space using the operations of addition and scalar multiplication defined on V.

Note that in order for a subset W of a vector space V to be a subspace it must be closed under addition and closed under scalar multiplication. That is, suppose w1, w2 ∈ W and r ∈ R. Then

The x-axis and the xz-plane are examples of subsets of R^3 that are closed under addition and closed under scalar multiplication. Every vector on the x-axis has the form (a, 0, 0) ∈ R^3. The sum of two vectors (a, 0, 0) and (b, 0, 0) on the x-axis is (a + b, 0, 0) which is also on the x-axis. The x-axis is also closed under scalar multiplication as r(a, 0, 0) = (ra, 0, 0), and the x-axis is a subspace of R^3. Similarly, every vector in the xz-plane in R^3 has the form (a1, 0, a3). As in the case of the x-axis, it is easy to verify that this set of vectors is closed under addition and scalar multiplication. Thus, the xz-plane is also a subspace of R^3.

In Theorem 5.1.4 we show that every subset of a vector space that is closed under addition and scalar multiplication is a subspace. To verify this statement, we need the following lemma in which some special notation is used. Typically, we use the same notation 0 to denote the real number zero and the zero vector. In the following lemma it is convenient to distinguish the two different uses of 0, and we write the zero vector in boldface.

Lemma 5.1.3. Let V be a vector space, and let 0 ∈ V be the zero vector. Then
Proof Let v be a vector in V and use (D1) to compute

0v + 0v = (0 + 0)v = 0v.

By (A4) the vector 0v has an additive inverse −0v. Adding −0v to both sides yields

(0v + 0v) + (−0v) = 0v + (−0v) = 0.

Associativity of addition (A2) now implies

0v + (0v + (−0v)) = 0.

A second application of (A4) implies that

0v + 0 = 0

and (A3) implies that 0v = 0.

Next, we show that the additive inverse −v of a vector v is unique. That is, if v + a = 0, then a = −v.

Before beginning the proof, note that commutativity of addition (A1) together with (A3) implies that 0 + v = v. Similarly, (A1) and (A4) imply that −v + v = 0.

To prove uniqueness of additive inverses, add −v to both sides of the equation v + a = 0 yielding

−v + (v + a) = −v + 0.

Properties (A2) and (A3) imply

(−v + v) + a = −v.

But

(−v + v) + a = 0 + a = a.

Therefore a = −v, as claimed.

To verify that (−1)v = −v, we show that (−1)v is the additive inverse of v. Using (M1), (D1), and the fact that 0v = 0, calculate

v + (−1)v = 1v + (−1)v = (1 − 1)v = 0v = 0.

Thus, (−1)v is the additive inverse of v and must equal −v, as claimed.

Theorem 5.1.4. Let W be a subset of the vector space V. If W is closed under addition and closed under scalar multiplication, then W is a subspace.

Proof We have to show that W is a vector space using the operations of addition and scalar multiplication defined on V. That is, we need to verify that the eight properties listed in Table 1 are satisfied. Note that properties (A1), (A2), (M1), (M2), (D1), and (D2) are valid for vectors in W since they are valid for vectors in V.

It remains to verify (A3) and (A4). Let w ∈ W be any vector. Since W is closed under scalar multiplication, it follows that 0w and (−1)w are in W. Lemma 5.1.3 states that 0w = 0 and (−1)w = −w; it follows that 0 and −w are in W. Hence, properties (A3) and (A4) are valid for vectors in W, since they are valid for vectors in V.

Examples of Subspaces of R^n

Example 5.1.5. (a) Let V be a vector space. Then the subsets V and {0} are always subspaces of V. A subspace W ⊂ V is proper if W ≠ 0 and W ≠ V.

(b) Lines through the origin are subspaces of R^n. Let w ∈ R^n be a nonzero vector and let W = {rw : r ∈ R}. The set W is closed under addition and scalar multiplication and is a subspace of R^n by Theorem 5.1.4. The subspace W is just a line through the origin in R^n, since the vector rw points in the same direction as w when r > 0 and the exact opposite direction when r < 0.

(c) Planes containing the origin are subspaces of R^3. To verify this point, let P be a plane through the origin and let N be a vector perpendicular to P. Then P consists of all vectors v ∈ R^3 perpendicular to N; using the dot-product (see Chapter 2, (2.2.3)) we recall that such vectors satisfy the linear equation N · v = 0. By superposition, the set of all solutions
to this equation is closed under addition and scalar multiplication and is therefore a subspace by Theorem 5.1.4.

In a sense that will be made precise all subspaces of R^n can be written as the span of a finite number of vectors generalizing Example 5.1.5(b) or as solutions to a system of linear equations generalizing Example 5.1.5(c).

(iii) u(t) = csc(t) is neither defined nor continuous at t = kπ for any integer k.

The subset C^1 ⊂ F is a subspace and hence a vector space. The reason is simple. If x(t) and y(t) are continuously differentiable, then

d/dt (x + y) = dx/dt + dy/dt.
2. Let V2 be the set of all 2 × 3 matrices. Verify that V2 is a vector space.

3. Let

A = ( 1   1  0 )
    ( 1  −1  1 ).

Let V3 be the set of vectors x ∈ R^3 such that Ax = 0. Verify that V3 is a subspace of R^3. Compare V1 with V3.

In Exercises 4 – 10 you are given a vector space V and a subset

their first component.

6. V = R^2 and W consists of vectors in R^2 for which the sum of the components is 1.

7. V = R^2 and W consists of vectors in R^2 for which the sum of the components is 0.

8. V = C^1 and W consists of functions x(t) ∈ C^1 satisfying

∫_{−2}^{4} x(t) dt = 0.

9. V = C^1 and W consists of functions x(t) ∈ C^1 satisfying x(1) = 0.

10. V = C^1 and W consists of functions x(t) ∈ C^1 satisfying x(1) = 1.

In Exercises 11 – 15 which of the sets S are subspaces?

11. S = {(a, b, c) ∈ R^3 : a ≥ 0, b ≥ 0, c ≥ 0}.

12. S = {(x1, x2, x3) ∈ R^3 : a1 x1 + a2 x2 + a3 x3 = 0 where a1, a2, a3 ∈ R are fixed}.

13. S = {(x, y) ∈ R^2 : (x, y) is on the line through (1, 1) with slope 1}.

14. S = {x ∈ R^2 : Ax = 0} where A is a 3 × 2 matrix.

15. S = {x ∈ R^2 : Ax = b} where A is a 3 × 2 matrix and b ∈ R^3 is a fixed nonzero vector.

16. Let V be a vector space and let W1 and W2 be subspaces. Show that the intersection W1 ∩ W2 is also a subspace of V.

17. For which scalars a, b, c do the solutions to the equation ax + by = c form a subspace of R^2?

19. Show that the set of all solutions to the differential equation ẋ = 2x is a subspace of C^1.

20. Recall from equation (4.5.6) of Section 4.5 that solutions to the system of differential equations

dX/dt = ( −1   3 )
        (  3  −1 ) X

are

X(t) = α e^{2t} (1, 1)^t + β e^{−4t} (1, −1)^t.

Use this formula for solutions to show that the set of solutions to this system of differential equations is a vector subspace of (C^1)^2.

21. Let V = R^+ = {x ∈ R : x > 0}. Show that V is a vector space under the operations of 'addition' (⊕)

v ⊕ w = vw for all v, w ∈ V

and 'scalar multiplication' (⊗)

r ⊗ v = v^r for all v ∈ V and r ∈ R.

Hints: The additive identity is v = 1; the additive inverse is 1/v; and the multiplicative identity is r = 1.
5.2 Construction of Subspaces

The principle of superposition shows that the set of all solutions to a homogeneous system of linear equations is closed under addition and scalar multiplication and is a subspace. Indeed, there are two ways to describe subspaces: first as solutions to linear systems, and second as the span of a set of vectors. We shall see that solving a homogeneous linear system of equations just means writing the solution set as the span of a finite set of vectors.

Solutions to Homogeneous Systems Form Subspaces

Definition 5.2.1. Let A be an m × n matrix. The null space of A is the set of solutions to the homogeneous system of linear equations

Ax = 0.    (5.2.1)

Lemma 5.2.2. Let A be an m × n matrix. Then the null space of A is a subspace of R^n.

Proof Suppose that x and y are solutions to (5.2.1). Then

A(x + y) = Ax + Ay = 0 + 0 = 0;

so x + y is a solution of (5.2.1). Similarly, for r ∈ R

A(rx) = rAx = r0 = 0;

We will see later that a solution to (5.2.2) has coordinate functions xj(t) in C^1. The principle of superposition then shows that W is a subspace of (C^1)^n. Suppose x(t) and y(t) are solutions of (5.2.2). Then

d/dt (x(t) + y(t)) = dx/dt (t) + dy/dt (t) = Cx(t) + Cy(t) = C(x(t) + y(t));

so x(t) + y(t) is a solution of (5.2.2) and in W. A similar calculation shows that rx(t) is also in W and that W ⊂ (C^1)^n is a subspace.

Writing Solution Subspaces as a Span The way we solve homogeneous systems of equations gives a second method for defining subspaces. For example, consider the system

Ax = 0,

where

A = (  2  1  4  0 )
    ( −1  0  2  1 ).

The matrix A is row equivalent to the reduced echelon form matrix

E = ( 1  0  −2  −1 )
    ( 0  1   8   2 ).
two vectors by use of vector addition and scalar multiplication. We say that this subspace is spanned by the two vectors
(2, −8, 1, 0)^t and (1, −2, 0, 1)^t.
For example, a calculation verifies that the vector
(−1, −2, 1, −3)^t
is also a solution of Ax = 0. Indeed, we may write it as
(−1, −2, 1, −3)^t = 1 (2, −8, 1, 0)^t − 3 (1, −2, 0, 1)^t. (5.2.3)

Spans  Let v1, . . . , vk be a set of vectors in a vector space V. A vector v ∈ V is a linear combination of v1, . . . , vk if
v = r1 v1 + · · · + rk vk
for some scalars r1, . . . , rk.

Definition 5.2.3. The set of all linear combinations of the vectors v1, . . . , vk in a vector space V is the span of v1, . . . , vk and is denoted by span{v1, . . . , vk}.

For example, the vector on the left hand side in (5.2.3) is a linear combination of the two vectors on the right hand side.

The simplest example of a span is R^n itself. Let vj = ej where ej ∈ R^n is the vector with a 1 in the jth coordinate and 0 in all other coordinates. Then every vector x = (x1, . . . , xn) ∈ R^n can be written as
x = x1 e1 + · · · + xn en.
It follows that
R^n = span{e1, . . . , en}.
Similarly, the set span{e1, e3} ⊂ R^3 is just the x1x3-plane, since vectors in this span are
x1 e1 + x3 e3 = x1 (1, 0, 0) + x3 (0, 0, 1) = (x1, 0, x3).

Proposition 5.2.4. Let V be a vector space and let w1, . . . , wk ∈ V. Then W = span{w1, . . . , wk} ⊂ V is a subspace.

Proof  Suppose x, y ∈ W. Then
x = r1 w1 + · · · + rk wk
y = s1 w1 + · · · + sk wk
for some scalars r1, . . . , rk and s1, . . . , sk. It follows that
x + y = (r1 + s1) w1 + · · · + (rk + sk) wk
and
rx = (rr1) w1 + · · · + (rrk) wk
are both in span{w1, . . . , wk}. Hence W ⊂ V is closed under addition and scalar multiplication, and is a subspace by Theorem 5.1.4.

For example, let
v = (2, 1, 0) and w = (1, 1, 1) (5.2.4)
be vectors in R^3. Then linear combinations of the vectors v and w have the form
αv + βw = (2α + β, α + β, β)
for real numbers α and β. Note that every one of these vectors is a solution to the linear equation
x1 − 2x2 + x3 = 0, (5.2.5)
that is, the 1st coordinate minus twice the 2nd coordinate plus the 3rd coordinate equals zero. Moreover, you may verify that every solution of (5.2.5) is a linear combination of the vectors v and w in (5.2.4). Thus, the set of solutions to the homogeneous linear equation (5.2.5) is a subspace, and that subspace can be written as the span of all linear combinations of the vectors v and w.

In this language we see that the process of solving a homogeneous system of linear equations is just the process of finding a set of vectors that span the subspace of all solutions. Indeed, we can now restate Theorem 2.4.6 of Chapter 2. Recall that a matrix A has rank ℓ if it is row equivalent to a matrix in echelon form with ℓ nonzero rows.

Proposition 5.2.5. Let A be an m × n matrix with rank ℓ. Then the null space of A is the span of n − ℓ vectors.

We have now seen that there are two ways to describe subspaces — as solutions of homogeneous systems of linear equations and as a span of a set of vectors, the span-

In Exercises 5 – 8 each of the given matrices is in reduced echelon form. Write solutions of the corresponding homogeneous system of linear equations as a span of vectors.

5. A = [1 2 0 1 0; 0 0 1 4 0; 0 0 0 0 1].

6. B = [1 3 0 5; 0 0 1 2].

7. A = [1 0 2; 0 1 1].

8. B = [1 −1 0 5 0 0; 0 0 1 2 0 2; 0 0 0 0 1 2].

9. Write a system of two linear equations of the form Ax = 0 where A is a 2 × 4 matrix whose subspace of solutions in R^4 is the span of the two vectors
v1 = (1, −1, 0, ...)^t and v2 = (0, 0, 1, ...)^t.
span{v, v} = span{v}.
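The null space calculation in this section is easy to confirm numerically. The following MATLAB lines are a sketch (not a listing from the text) checking that the two spanning vectors found for the matrix A above solve Ax = 0 and that the linear combination in (5.2.3) reproduces the third solution.

A  = [2 1 4 0; -1 0 2 1];        % the matrix A from the example above
v1 = [2; -8; 1; 0];              % spanning vectors read off from E
v2 = [1; -2; 0; 1];
A*v1, A*v2                       % both products are the zero vector
u  = [-1; -2; 1; -3];
A*u                              % also zero, and indeed
u - (1*v1 - 3*v2)                % equation (5.2.3): this difference is zero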
5.3 Spanning Sets and MATLAB

In this section we discuss:

• how to find a spanning set for the subspace of solutions to a homogeneous system of linear equations using the MATLAB command null, and

• how to determine when a vector is in the subspace spanned by a set of vectors using the MATLAB command rref.

Spanning Sets for Homogeneous Linear Equations  In Chapter 2 we saw how to use Gaussian elimination, back substitution, and MATLAB to compute solutions to a system of linear equations. For systems of homogeneous equations, MATLAB provides a command to find a spanning set for the subspace of solutions. That command is null. For example, if we type

The two columns of the matrix B span the set of solutions of the equation Ax = 0. In particular, the vector (2, −8, 1, 0) is a solution to Ax = 0 and is therefore a linear combination of the column vectors of B. Indeed, type

4.1404*B(:,1)-7.2012*B(:,2)

and observe that this linear combination is the desired one.

Next we describe how to find the coefficients 4.1404 and -7.2012 by showing that these coefficients themselves are solutions to another system of linear equations.

When is a Vector in a Span?  Let w1, . . . , wk and v be vectors in R^n. We now describe a method that allows us to decide whether v is in span{w1, . . . , wk}. To answer this question one has to solve a system of n linear equations in k unknowns. The unknowns correspond to the coefficients in the linear combination of the vectors w1, . . . , wk that gives v.

Let us be more precise. The vector v is in span{w1, . . . , wk} if and only if there are constants r1, . . . , rk such that the equation
r1 w1 + · · · + rk wk = v (5.3.1)

Ar = v. (5.3.3)

To summarize:

Lemma 5.3.1. Let w1, . . . , wk and v be vectors in R^n. Then v is in span{w1, . . . , wk} if and only if the system of linear equations (5.3.3) has a solution where A is the n × k matrix defined in (5.3.2).
So v = (3/2) w1 + (1/2) w2.

Row reduction to reduced echelon form has been preprogrammed in the MATLAB command rref. Consider the following example. Let
w1 = (2, 0, −1, 4) and w2 = (2, −1, 0, 2) (5.3.4)
and ask the question whether v = (−2, 4, −3, 4) is in span{w1, w2}.

In MATLAB load the matrix A having w1 and w2 as its columns and the vector v by typing e5_3_5
A = [2 2; 0 −1; −1 0; 4 2] and v = (−2, 4, −3, 4)^t. (5.3.5*)
We can solve the system of equations using MATLAB. First, form the augmented matrix by typing

aug = [A v]
rref(aug)

which yields

ans =
     1     0     0
     0     1     0
     0     0     1
     0     0     0

This matrix corresponds to an inconsistent system; thus v is no longer in the span of w1 and w2.

Exercises

In Exercises 1 – 3 use the null command in MATLAB to find all the solutions of the linear system of equations Ax = 0.
1. (matlab)
A = [−4 0 −4 3; −4 1 −1 1] (5.3.6*)

2. (matlab)
A = [1 2; 1 0; 3 −2] (5.3.7*)

3. (matlab)
A = [1 1 2; −1 2 −1]. (5.3.8*)
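As a concrete check of Lemma 5.3.1, the following MATLAB sketch carries out the span test for the vectors in (5.3.4) and (5.3.5*). It is only a sketch: the modification v(2) = 0 in the last step is an illustrative choice, not necessarily the one used in the original example.

w1 = [2; 0; -1; 4];
w2 = [2; -1; 0; 2];
v  = [-2; 4; -3; 4];
A  = [w1 w2];                    % the n x k matrix of Lemma 5.3.1
rref([A v])                      % gives [1 0 3; 0 1 -4; 0 0 0; 0 0 0],
% so v = 3*w1 - 4*w2 is in span{w1, w2}.
v(2) = 0;                        % perturb one entry of v (illustrative)
rref([A v])                      % the last column now contains a pivot:
% the system A*r = v is inconsistent and v is not in the span.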
5.4 Linear Dependence and Linear Independence

An important question in linear algebra concerns finding spanning sets for subspaces having the smallest number of vectors. Let w1, . . . , wk be vectors in a vector space V and let W = span{w1, . . . , wk}. Suppose that W is generated by a subset of these k vectors. Indeed, suppose that the kth vector is redundant in the sense that W = span{w1, . . . , wk−1}. Since wk ∈ W, this is possible only if wk is a linear combination of the k − 1 vectors w1, . . . , wk−1; that is, only if
wk = r1 w1 + · · · + rk−1 wk−1. (5.4.1)

Definition 5.4.1. Let w1, . . . , wk be vectors in the vector space V. The set {w1, . . . , wk} is linearly dependent if one of the vectors wj can be written as a linear combination of the remaining k − 1 vectors.

Note that when k = 1, the phrase '{w1} is linearly dependent' means that w1 = 0.

If we set rk = −1, then we may rewrite (5.4.1) as

Since linear independence means not linearly dependent, Lemma 5.4.2 can be rewritten as:

Lemma 5.4.4. The set of vectors {w1, . . . , wk} is linearly independent if and only if whenever
r1 w1 + · · · + rk wk = 0,
it follows that
r1 = r2 = · · · = rk = 0.

Let ej be the vector in R^n whose jth component is 1 and all of whose other components are 0. The set of vectors e1, . . . , en is the simplest example of a set of linearly independent vectors in R^n. We use Lemma 5.4.4 to verify independence by supposing that
r1 e1 + · · · + rn en = 0.
A calculation shows that
0 = r1 e1 + · · · + rn en = (r1, . . . , rn).
is a solution to the system of equations AR = 0 precisely when
r1 w1 + · · · + rk wk = 0. (5.4.3)
If there is a nonzero solution R to AR = 0, then the vectors {w1, . . . , wk} are linearly dependent; if the only solution to AR = 0 is R = 0, then the vectors are linearly independent. The preceding discussion is summarized by:

Lemma 5.4.5. The vectors w1, . . . , wk in R^n are linearly dependent if the null space of the n × k matrix A defined in (5.4.2) is nonzero and linearly independent if the null space of A is zero.

A Simple Example of Linear Independence with Two Vectors  The two vectors
w1 = (2, −8, 1, 0)^t and w2 = (1, −2, 0, 1)^t
are linearly independent. To see this suppose that r1 w1 + r2 w2 = 0. Using the components of w1 and w2 this equality is equivalent to the system of four equations
2r1 + r2 = 0, −8r1 − 2r2 = 0, r1 = 0, and r2 = 0.
In particular, r1 = r2 = 0; hence w1 and w2 are linearly independent.

Using MATLAB to Decide Linear Dependence  Suppose that we want to determine whether or not the vectors
w1 = (2, −1, 1, 3, 5)^t, w2 = (1, 4, −1, −2, 0)^t, w3 = (1, −1, 1, 3, 12)^t, w4 = (4, 3, 0, 1, −2)^t (5.4.4*)
are linearly dependent. After typing e5_4_4 in MATLAB, form the 5 × 4 matrix A by typing

A = [w1 w2 w3 w4]

Determine whether there is a nonzero solution to AR = 0 by typing

null(A)

The response from MATLAB is

ans =
   -0.7559
   -0.3780
    0.3780
    0.3780

showing that there is a nonzero solution to AR = 0 and the vectors wj are linearly dependent. Indeed, this solution for R shows that we can solve for w1 in terms of w2, w3, w4. We can now ask whether or not w2, w3, w4 are linearly dependent. To answer this question form the matrix

B = [w2 w3 w4]

and type null(B) to obtain

ans =
   Empty matrix: 3-by-0

showing that the only solution to BR = 0 is the zero solution R = 0. Thus, w2, w3, w4 are linearly independent. For these particular vectors, any three of the four are linearly independent.
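The output of null can be turned into an explicit dependence relation. The short MATLAB sketch below assumes the setup above (the vectors w1, . . . , w4 in the workspace and A = [w1 w2 w3 w4] with a one dimensional null space); it is an illustration, not a listing from the text.

R = null(A);                    % one column, since the null space is one dimensional
R = R / R(1);                   % rescale so that the coefficient of w1 is 1
% Now w1 + R(2)*w2 + R(3)*w3 + R(4)*w4 = 0, so w1 is recovered from the others:
w1_check = -(R(2)*w2 + R(3)*w3 + R(4)*w4);
norm(w1 - w1_check)             % zero up to round-off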
1. Let w be a vector in the vector space V. Show that the sets of vectors {w, 0} and {w, −w} are linearly dependent.

u1 = (1, −1, 1) u2 = (2, 1, −2) u3 = (10, 2, −6).
Is the set {u1, u2, u3} linearly dependent or linearly independent?

4. For which values of b are the vectors (1, b, 2b) and (2, 1, 4) linearly independent?

{u1 + u2, u2 + u3, u3 + u1}
is also linearly independent.

(a) Show that the rows of A are linearly dependent.
(b) Find two subsets S1 and S2 of rows of A such that
(i) S1 ≠ S2.
(ii) span S1 = span S2.
(iii) The vectors in S1 are linearly independent and the vectors in S2 are linearly independent.
5.5 Dimension and Bases

The minimum number of vectors that span a vector space has special significance.

Definition 5.5.1. The vector space V has finite dimension if V is the span of a finite number of vectors. If V has finite dimension, then the smallest number of vectors that span V is called the dimension of V and is denoted by dim V.

For example, recall that ej is the vector in R^n whose jth component is 1 and all of whose other components are 0. Let x = (x1, . . . , xn) be in R^n. Then
x = x1 e1 + · · · + xn en. (5.5.1)
Since every vector in R^n is a linear combination of the vectors e1, . . . , en, it follows that R^n = span{e1, . . . , en}. Thus, R^n is finite dimensional. Moreover, the dimension of R^n is at most n, since R^n is spanned by n vectors. It seems unlikely that R^n could be spanned by fewer than n vectors — but this point needs to be proved.

An Example of a Vector Space that is Not Finite Dimensional  Next we discuss an example of a vector space that does not have finite dimension. Consider the subspace P ⊂ C^1 consisting of polynomials of all degrees. We show that P is not the span of a finite number of vectors and hence that P does not have finite dimension. Let p1(t), p2(t), . . . , pk(t) be a set of k polynomials and let d be the maximum degree of these k polynomials. Then every polynomial in the span of p1(t), . . . , pk(t) has degree less than or equal to d. In particular, p(t) = t^{d+1} is a polynomial that is not in the span of p1(t), . . . , pk(t) and P is not spanned by finitely many vectors.

Bases and The Main Theorem

Definition 5.5.2. Let B = {w1, . . . , wk} be a set of vectors in a vector space W. The subset B is a basis for W if B is a spanning set for W with the smallest number of elements in a spanning set for W.

It follows that if {w1, . . . , wk} is a basis for W, then k = dim W. The main theorem about bases is:

Theorem 5.5.3. A set of vectors B = {w1, . . . , wk} in a vector space W is a basis for W if and only if the set B is linearly independent and spans W.

Remark: The importance of Theorem 5.5.3 is that we can show that a set of vectors is a basis by verifying spanning and linear independence. We never have to check directly that the spanning set has the minimum number of vectors for a spanning set.

For example, we have shown previously that the set of vectors {e1, . . . , en} in R^n is linearly independent and spans R^n. It follows from Theorem 5.5.3 that this set is a basis, and that the dimension of R^n is n. In particular, R^n cannot be spanned by fewer than n vectors.

The proof of Theorem 5.5.3 is given in Section 5.6.

Consequences of Theorem 5.5.3  We discuss two applications of Theorem 5.5.3. First, we use this theorem to derive a way of determining the dimension of the subspace spanned by a finite number of vectors. Second, we show that the dimension of the subspace of solutions to a homogeneous system of linear equations Ax = 0 is n − rank(A) where A is an m × n matrix.

Computing the Dimension of a Span  We show that the dimension of a span of vectors can be found using elementary row operations on M.

Lemma 5.5.4. Let w1, . . . , wk be k row vectors in R^n
and let W = span{w1, . . . , wk} ⊂ R^n. Define
M = [w1; . . . ; wk]
to be the matrix whose rows are the wj's. Then
dim(W) = rank(M). (5.5.2)

E = [v1; . . . ; vℓ; 0; . . . ; 0],
where the vj are the nonzero rows in the reduced echelon form matrix.

We claim that the vectors v1, . . . , vℓ are linearly independent. It then follows from Theorem 5.5.3 that {v1, . . . , vℓ} is a basis for W and that the dimension of W is ℓ. To verify the claim, suppose
a1 v1 + · · · + aℓ vℓ = 0. (5.5.3)
We show that ai must equal 0 as follows. In the ith row, the pivot must occur in some column — say in the jth column. It follows that the jth entry in the vector of the left hand side of (5.5.3) is
0a1 + · · · + 0ai−1 + 1ai + 0ai+1 + · · · + 0aℓ = ai,

ans =
    1.0000         0    1.4706    1.1176
         0    1.0000    1.7059    2.1765
         0         0         0         0

indicating that the dimension of the subspace W is two, and therefore {w1, w2, w3} is not a basis of W. Alternatively, we can use the MATLAB command rank(M) to compute the rank of M and the dimension of the span W.

However, if we change one of the entries in w3, for instance w3(3)=-18 then indeed the command
After solving for the variables corresponding to pivots, we find that the spanning set of the null space consists of p vectors in R^n, which we label as {wj1, . . . , wjp}. See (5.5.6). Note that the jm-th entry of wjm is 1 while the jm-th entry in all of the other p − 1 vectors is 0. Again, see (5.5.6) as an example that supports this statement. It follows that the set of spanning vectors is a linearly independent set. That is, suppose that
r1 wj1 + · · · + rp wjp = 0.
From the jm-th entry in this equation, it follows that rm = 0; and the vectors are linearly independent.

1. Show that U = {u1, u2, u3} where
u1 = (1, 1, 0) u2 = (0, 1, 0) u3 = (−1, 0, 1)
is a basis for R^3.

3. Let S = span{v1, v2, v3} where
v1 = (1, 0, −1, 0) v2 = (0, 1, 1, 1) v3 = (5, 4, −1, 4).
Find the dimension of S and find a basis for S.

4. Find a basis for the null space of
A = [1 0 −1 2; 1 −1 0 0; 4 −5 1 −2].
What is the dimension of the null space of A?

p, dp/dt, d^2p/dt^2, d^3p/dt^3
is a basis for P3.

8. Let u ∈ R^n be a nonzero row vector.
9. Let {v1, v2, v3} be a basis for R^3. Find all k so that
5.6 The Proof of the Main Theorem

We begin the proof of Theorem 5.5.3 with two lemmas on linearly independent and spanning sets.

Lemma 5.6.1. Let {w1, . . . , wk} be a set of vectors in a vector space V and let W be the subspace spanned by these vectors. Then there is a linearly independent subset of {w1, . . . , wk} that also spans W.

Proof  If {w1, . . . , wk} is linearly independent, then the lemma is proved. If not, then the set {w1, . . . , wk} is linearly dependent. If this set is linearly dependent, then at least one of the vectors is a linear combination of the others. By renumbering if necessary, we can assume that wk is a linear combination of w1, . . . , wk−1; that is,
wk = a1 w1 + · · · + ak−1 wk−1.
Now suppose that w ∈ W. Then
w = b1 w1 + · · · + bk wk.
It follows that
w = (b1 + bk a1) w1 + · · · + (bk−1 + bk ak−1) wk−1,
and that W = span{w1, . . . , wk−1}. If the vectors w1, . . . , wk−1 are linearly independent, then the proof of the lemma is complete. If not, continue inductively until a linearly independent subset of the wj that also spans W is found.

The important point in proving that linear independence together with spanning imply that we have a basis is discussed in the next lemma.

Lemma 5.6.2. Let W be an m-dimensional vector space and let k > m be an integer. Then any set of k vectors in W is linearly dependent.

Proof  Since the dimension of W is m we know that this vector space can be written as W = span{v1, . . . , vm}. Moreover, Lemma 5.6.1 implies that the vectors v1, . . . , vm are linearly independent. Suppose that {w1, . . . , wk} is another set of vectors where k > m. We have to show that the vectors w1, . . . , wk are linearly dependent; that is, we must show that there exist scalars r1, . . . , rk not all of which are zero that satisfy
r1 w1 + · · · + rk wk = 0. (5.6.1)
We find these scalars by solving a system of linear equations, as we now show.

The fact that W is spanned by the vectors vj implies that
w1 = a11 v1 + · · · + am1 vm
w2 = a12 v1 + · · · + am2 vm
...
wk = a1k v1 + · · · + amk vm.
It follows that r1 w1 + · · · + rk wk equals
r1 (a11 v1 + · · · + am1 vm) + r2 (a12 v1 + · · · + am2 vm) + · · · + rk (a1k v1 + · · · + amk vm).
Rearranging terms leads to the expression:
(a11 r1 + · · · + a1k rk) v1 + (a21 r1 + · · · + a2k rk) v2 + · · · + (am1 r1 + · · · + amk rk) vm. (5.6.2)
Thus, (5.6.1) is valid if and only if (5.6.2) sums to zero. Since the set {v1, . . . , vm} is linearly independent, (5.6.2) can equal zero if and only if
a11 r1 + · · · + a1k rk = 0
a21 r1 + · · · + a2k rk = 0
...
am1 r1 + · · · + amk rk = 0.
Since m < k, Chapter 2, Theorem 2.4.6 implies that this system of homogeneous linear equations always has a nonzero solution r = (r1, . . . , rk) — from which it follows that the wi are linearly dependent.

Corollary 5.6.3. Let V be a vector space of dimension n and let {u1, . . . , uk} be a linearly independent set of vectors in V. Then k ≤ n.

We now discuss a second approach to finding a basis for a nonzero subspace W of a finite dimensional vector space V.

Lemma 5.6.4. Let {u1, . . . , uk} be a linearly independent set of vectors in a vector space V and assume that
uk+1 ∉ span{u1, . . . , uk}.
Then {u1, . . . , uk+1} is also a linearly independent set.
• Continue until a spanning set for W is found. This set is a basis for W.

We now justify this approach to finding bases for subspaces. Suppose that W is a subspace of a finite dimensional vector space V. For example, suppose that W ⊂ R^n. Then our approach to finding a basis of W is as follows. Choose a nonzero vector w1 ∈ W. If W = span{w1}, then we are done. If not, choose a vector w2 ∈ W − span{w1}. It follows from Lemma 5.6.4 that {w1, w2} is linearly independent. If W = span{w1, w2}, then Theorem 5.5.3 implies that {w1, w2} is a basis for W, dim W = 2, and we are done. If not, choose w3 ∈ W − span{w1, w2} and {w1, w2, w3} is linearly independent. The finite dimension of V implies that continuing inductively must lead to a spanning set of linearly independent vectors for W — which by Theorem 5.5.3 is a basis. This discussion proves:

Corollary 5.6.5. Every linearly independent subset of a finite dimensional vector space V can be extended to a basis of V.

Further consequences of Theorem 5.5.3  We summarize here several important facts about dimensions.

Corollary 5.6.6. Let W be a subspace of a finite dimensional vector space V.
(a) Suppose that W is a proper subspace. Then dim W < dim V.
(b) Suppose that dim W = dim V. Then W = V.

Proof  (a) Let dim W = k and let {w1, . . . , wk} be a basis for W. Since W is a proper subspace of V, there is a vector w ∈ V − W. It follows from Lemma 5.6.4 that {w1, . . . , wk, w} is a linearly independent set. Therefore, Corollary 5.6.3 implies that k + 1 ≤ n.

(b) Let {w1, . . . , wk} be a basis for W. Theorem 5.5.3 implies that this set is linearly independent. If {w1, . . . , wk} does not span V, then it can be extended to a basis as above. But then dim V > dim W, which is a contradiction.

Corollary 5.6.7. Let B = {w1, . . . , wn} be a set of n vectors in an n-dimensional vector space V. Then the following are equivalent:
(a) B is a spanning set of V,
(b) B is a basis for V, and
(c) B is a linearly independent set.

Proof  By definition, (a) implies (b) since a basis is a spanning set with the number of vectors equal to the dimension of the space. Theorem 5.5.3 states that a basis is a linearly independent set; so (b) implies (c). If B is a linearly independent set of n vectors, then it spans a subspace W of dimension n. It follows from Corollary 5.6.6(b) that W = V and that (c) implies (a).

Subspaces of R^3  We can now classify all subspaces of R^3. They are: the origin, lines through the origin, planes through the origin, and R^3. All of these sets were shown to be subspaces in Example 5.1.5(a–c).

To verify that these sets are the only subspaces of R^3, note that Theorem 5.5.3 implies that proper subspaces of R^3 have dimension equal either to one or two. (The zero dimensional subspace is the origin and the only three dimensional subspace is R^3 itself.) One dimensional subspaces of R^3 are spanned by one nonzero vector and are just lines through the origin. See Example 5.1.5(b). We claim that all two dimensional subspaces are planes through the origin.

Suppose that W ⊂ R^3 is a subspace spanned by two non-collinear vectors w1 and w2. We show that W is a plane
through the origin using results in Chapter 2. Observe that there is a vector N = (N1, N2, N3) perpendicular to w1 = (a11, a12, a13) and w2 = (a21, a22, a23). Such a vector N satisfies the two linear equations:
w1 · N = a11 N1 + a12 N2 + a13 N3 = 0
w2 · N = a21 N1 + a22 N2 + a23 N3 = 0.
Chapter 2, Theorem 2.4.6 implies that a system of two linear equations in three unknowns has a nonzero solution. Let P be the plane perpendicular to N that contains the origin. We show that W = P and hence that the claim is valid.

The choice of N shows that the vectors w1 and w2 are both in P. In fact, since P is a subspace it contains every vector in span{w1, w2}. Thus W ⊂ P. If P contains just one additional vector w3 ∈ R^3 that is not in W, then the span of w1, w2, w3 is three dimensional and P = R^3, which contradicts the fact that P is a plane. Hence W = P.

Exercises

In Exercises 1 – 3 you are given a pair of vectors v1, v2 spanning a subspace of R^3. Decide whether that subspace is a line or a plane through the origin. If it is a plane, then compute a vector N that is perpendicular to that plane.

1. v1 = (2, 1, 2) and v2 = (0, −1, 1).

2. v1 = (2, 1, −1) and v2 = (−4, −2, 2).

3. v1 = (0, 1, 0) and v2 = (4, 1, 0).

5. Let A be a 7 × 5 matrix with rank(A) = r.
(a) What is the largest value that r can have?
(b) Give a condition equivalent to the system of equations Ax = b having a solution.
(c) What is the dimension of the null space of A?
(d) If there is a solution to Ax = b, then how many parameters are needed to describe the set of all solutions?

6. Let
A = [1 3 −1 4; 2 1 5 7; 3 4 4 11].
(a) Find a basis for the subspace C ⊂ R^3 spanned by the columns of A.
(b) Find a basis for the subspace R ⊂ R^4 spanned by the rows of A.
(c) What is the relationship between dim C and dim R?

are linearly independent. Show that the span of v1 and v2 forms a plane in R^3 by showing that every linear combination is the solution to a single linear equation. Use this equation to determine the normal vector N to this plane. Verify Lemma 5.6.4 by verifying directly that v1, v2, N are linearly independent vectors.

8. Let W be an infinite dimensional subspace of the vector space V. Show that V is infinite dimensional.
(a) Find a value for λ such that the dimension of span{w1, w2, w3, w4} is three. Then decide whether {w1, w2, w3} or {w1, w2, w4} is a basis for R^3.
(b) Find a value for λ such that the dimension of span{w1, w2, w3, w4} is two.

Consider the vectors
v1 = (−1, 1, 0, −1)^t, v2 = (2, 5, 3, −1)^t, v3 = (2, 1, 1, 1)^t, v4 = (2, −2, 0, 2)^t.
6 Closed Form Solutions for Planar ODEs

In this chapter we describe several methods for finding closed form solutions to planar constant coefficient systems of linear differential equations and we use these methods to discuss qualitative features of phase portraits of these solutions.

In Section 6.1 we show how uniqueness to initial value problems implies that the space of solutions to a constant coefficient system of n linear differential equations is n dimensional. Using this observation we present a direct method for solving planar linear systems in Section 6.2. This method extends the discussion of solutions to systems whose coefficient matrices have distinct real eigenvalues given in Section 4.7 to the cases of complex eigenvalues and equal real eigenvalues.

A second method for finding solutions is to use changes of coordinates to make the coefficient matrix of the differential equation as simple as possible. This idea leads to the notion of similarity of matrices, which is discussed in Section 6.3, and leads to the second method for solving planar linear systems. Similarity also leads to the Jordan Normal Form theorem for 2 × 2 matrices. Both the direct method and the method based on similarity require being able to compute the eigenvalues and eigenvectors of the coefficient matrix.

The important subject of qualitative features of phase portraits of linear systems is explored in Section 6.4. Specifically we discuss saddles, sinks, sources and asymptotic stability. This discussion also uses similarity and Jordan Normal Form. We find that the qualitative theory is determined by the eigenvalues and eigenvectors of the coefficient matrix — which is not surprising given that we can classify matrices up to similarity by just knowing their eigenvalues and eigenvectors.

Chapter 6 ends with three optional sections. Matrix exponentials yield an elegant third way to derive closed form solutions to n-dimensional linear ODE systems (Section 6.5). This method leads to a proof of uniqueness of solutions to initial value problems of linear systems (Theorem 6.5.1). A proof of the Cayley Hamilton Theorem for 2 × 2 matrices is given in Section 6.6. In the last section, Section 6.7, we obtain solutions to second order equations by reducing them to first order systems.
6.1 The Initial Value Problem

Recall that a planar autonomous constant coefficient system of ordinary differential equations has the form
dx/dt = ax + by
dy/dt = cx + dy (6.1.1)
where a, b, c, d ∈ R. Computer experiments using PhasePlane lead us to believe that there is just one solution to (6.1.1) satisfying the initial conditions
x(0) = x0
y(0) = y0.
We prove existence in this section and the next by determining explicit formulas for solutions.

The Initial Value Problem for Linear Systems  In this chapter we discuss how to find solutions (x(t), y(t)) to (6.1.1) satisfying the initial values x(0) = x0 and y(0) = y0. It is convenient to rewrite (6.1.1) in matrix form as:
dX/dt (t) = CX(t). (6.1.2)
The initial value problem is then stated as: Find a solution to (6.1.2) satisfying X(0) = X0 where X0 = (x0, y0)^t. Everything that we have said here works equally well for n dimensional systems of linear differential equations. Just let C be an n × n matrix and let X0 be an n vector of initial conditions.

Solving the Initial Value Problem Using Superposition  In Section 4.7 we discussed how to solve (6.1.2) when the eigenvalues of C are real and distinct. Recall that when λ1 and λ2 are distinct real eigenvalues of C with associated eigenvectors v1 and v2, there are two solutions to (6.1.2) given by the explicit formulas
X1(t) = e^{λ1 t} v1 and X2(t) = e^{λ2 t} v2.
Superposition guarantees that every linear combination of these solutions
X(t) = α1 X1(t) + α2 X2(t) = α1 e^{λ1 t} v1 + α2 e^{λ2 t} v2
is a solution to (6.1.2). Since v1 and v2 are linearly independent, we can always choose scalars α1, α2 ∈ R to solve any given initial value problem of (6.1.2). It follows from the uniqueness of solutions to initial value problems that all solutions to (6.1.2) are included in this family of solutions. Uniqueness is proved in the special case of linear systems in Theorem 6.5.1. This proof uses matrix exponentials.

We generalize this discussion so that we will be able to find closed form solutions to (6.1.2) in Section 6.2 when the eigenvalues of C are complex or are real and equal.

Suppose that X1(t) and X2(t) are two solutions to (6.1.1) such that
v1 = X1(0) and v2 = X2(0)
are linearly independent. Then all solutions to (6.1.1) are linear combinations of these two solutions. We verify this statement as follows. Corollary 5.6.7 of Chapter 5 states that since {v1, v2} is a linearly independent set in R^2, it is also a basis of R^2. Thus for every X0 ∈ R^2 there exist scalars r1, r2 such that
X0 = r1 v1 + r2 v2.
It follows from superposition that the solution
X(t) = r1 X1(t) + r2 X2(t)
is the unique solution whose initial condition vector is X0.

We have proved that every solution to this linear system of differential equations is a linear combination of these two solutions — that is, we have proved that the dimension of the space of solutions to (6.1.2) is two. This proof generalizes immediately to a proof of the following theorem for n × n systems.
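The superposition recipe just described (expand X0 in the eigenvector basis and attach the exponential factors) is easy to carry out in MATLAB. The matrix and initial condition below are illustrative choices, not taken from the text.

C  = [1 3; 3 1];               % sample matrix with real distinct eigenvalues 4 and -2
X0 = [2; 0];
[V, D] = eig(C);               % columns of V are eigenvectors v1, v2
r = V \ X0;                    % scalars with X0 = r(1)*v1 + r(2)*v2
t = 1.5;                       % evaluate the solution at any chosen time t
Xt = r(1)*exp(D(1,1)*t)*V(:,1) + r(2)*exp(D(2,2)*t)*V(:,2)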
We call (6.1.3) the general solution to the system of differential equations Ẋ = CX. When solving the initial value problem we find a particular solution by specifying the scalars r1, . . . , rn.

Corollary 6.1.2. Let C be an n × n matrix and let
X = {X1(t), . . . , Xn(t)}

are eigenvectors of the coefficient matrix of (6.1.6) and find the associated eigenvalues.

2. Find the solution to (6.1.6) satisfying initial conditions X(0) = (−14, 22)^t.

3. Find the solution to (6.1.6) satisfying initial conditions X(0) = (−3, 5)^t.
In Exercises 9 – 12, consider the system of differential equations
dx/dt = −y
dy/dt = x. (6.1.8)

9. Show that (x1(t), y1(t)) = (cos t, sin t) is a solution to (6.1.8).

10. Show that (x2(t), y2(t)) = (− sin t, cos t) is a solution to (6.1.8).

11. Using Exercises 9 and 10, find a solution (x(t), y(t)) to (6.1.8) that satisfies (x(0), y(0)) = (0, 1).

12. Using Exercises 9 and 10, find a solution (x(t), y(t)) to (6.1.8) that satisfies (x(0), y(0)) = (1, 1).

In Exercises 13 – 14, consider the system of differential equations
dx/dt = −2x + 7y
dy/dt = 5y. (6.1.9)

13. Find a solution to (6.1.9) satisfying the initial condition (x(0), y(0)) = (1, 0).

14. Find a solution to (6.1.9) satisfying the initial condition (x(0), y(0)) = (−1, 2).

In Exercises 15 – 17, consider the matrix
C = [−1 −10 −6; 0 4 3; 0 −14 −9].

16. Find a solution to the system of differential equations Ẋ = CX satisfying the initial condition X(0) = (10, −4, 9)^t.

17. Find a solution to the system of differential equations Ẋ = CX satisfying the initial condition X(0) = (2, −1, 3)^t.

18. Show that for some nonzero a the function x(t) = at^5 is a solution to the differential equation ẋ = x^{4/5}. Then show that there are at least two solutions to the initial value problem x(0) = 0 for this differential equation.

19. (matlab) Use PhasePlane to investigate the system of differential equations
dx/dt = −2y
dy/dt = −x + y. (6.1.10)

(a) Use PhasePlane to find two independent eigendirections (and hence eigenvectors) for (6.1.10).
(b) Using (a), find the eigenvalues of the coefficient matrix of (6.1.10).
(c) Find a closed form solution to (6.1.10) satisfying the initial condition X(0) = (4, −1)^t.
(d) Study the time series of y versus t for the solution in (c) by comparing the graph of the closed form solution obtained in (c) with the time series graph using PhasePlane.
6.2 Closed Form Solutions by the Direct Method

In Section 4.7 we showed in detail how solutions to planar systems of constant coefficient differential equations with distinct real eigenvalues are found. This method was just reviewed in Section 6.1 where we saw that the crucial step in solving these systems of differential equations is the step where we find two linearly independent solutions. In this section we discuss how to find these two linearly independent solutions when the eigenvalues of the coefficient matrix are either complex or real and equal.

By finding these two linearly independent solutions we will find both the general solution of the system of differential equations Ẋ = CX and a method for solving the initial value problem
dX/dt = CX
X(0) = X0. (6.2.1)

The principal results of this section are summarized as follows. Let C be a 2 × 2 matrix with eigenvalues λ1 and λ2, and associated eigenvectors v1 and v2.

(a) If the eigenvalues are real and v1 and v2 are linearly independent, then the general solution to (6.2.1) is given by (6.2.2).

(b) If the eigenvalues are complex, then the general solution to (6.2.1) is given by (6.2.3) and (6.2.4).

(c) If the eigenvalues are equal (and hence real) and there is only one linearly independent eigenvector, then the general solution to (6.2.1) is given by (6.2.18).

For completeness we repeat the result. The general solution is:
X(t) = α1 e^{λ1 t} v1 + α2 e^{λ2 t} v2. (6.2.2)
The initial value problem is solved by finding real numbers α1 and α2 such that
X0 = α1 v1 + α2 v2.
See Section 4.7 for a detailed discussion with examples.

Complex Conjugate Eigenvalues  Suppose that the eigenvalues of C are complex, that is, suppose that λ1 = σ + iτ with τ ≠ 0 is an eigenvalue of C with eigenvector v1 = v + iw, where v, w ∈ R^2. We claim that X1(t) and X2(t), where
X1(t) = e^{σt}(cos(τt) v − sin(τt) w)
X2(t) = e^{σt}(sin(τt) v + cos(τt) w), (6.2.3)
are solutions to (6.2.1) and that the general solution to (6.2.1) is:
X(t) = α1 X1(t) + α2 X2(t), (6.2.4)
where α1, α2 are real scalars.

Two basic observations are needed when deriving (6.2.3) and (6.2.4); these observations use basic properties of the complex exponential function.

The first property is Euler's celebrated formula:
e^{iτ} = cos τ + i sin τ (6.2.5)
for any real number τ. A justification of this formula is given in Exercise 1. The second property is the important feature of exponential functions:
e^{x+y} = e^x e^y (6.2.6)
Euler’s formula allows us to differentiate complex expo- Lemma 6.2.1. The complex vector-valued function X(t)
nentials, obtaining the expected result: is a solution to Ẋ = CX if and only if the real and
imaginary parts are real vector-valued solutions to Ẋ =
d iτ t d CX.
e = (cos(τ t) + i sin(τ t))
dt dt
= τ (− sin(τ t) + i cos(τ t))
Proof Equating the real and imaginary parts of
= iτ (cos(τ t) + i sin(τ t)) (6.2.10) implies that Ẋ1 = CX1 and Ẋ2 = CX2 .
= iτ eiτ t .
It follows from Lemma 6.2.1 that finding one complex-
Euler’s formula also implies that valued solution to a linear differential equation provides
us with two real-valued solutions. Identity (6.2.9) implies
eλt = eσt+iτ t = eσt eiτ t = eσt (cos(τ t)+i sin(τ t)), (6.2.8) that
X(t) = eλ1 t v1
where λ = σ + iτ . Most importantly, we note that
is a complex-valued solution to (6.2.1). Using Euler’s
d λt formula we compute the real and imaginary parts of X(t),
e = λeλt . (6.2.9)
dt as follows.
We use (6.2.8) and the product rule for differentiation to X(t) = e(σ+iτ )t (v + iw)
verify (6.2.9) as follows: = eσt (cos(τ t) + i sin(τ t))(v + iw)
d λt d σt iτ t = eσt (cos(τ t)v − sin(τ t)w)
e = e e
dt dt + ieσt (sin(τ t)v + cos(τ t)w).
σeσt eiτ t + eσt iτ eiτ t
=
Since the real and imaginary parts of X(t) are solutions
= (σ + iτ )eσt+iτ t to Ẋ = CX, it follows that the real-valued functions
= λeλt . X1 (t) and X2 (t) defined in (6.2.3) are indeed solutions.
Returning to the case where C is a 2 × 2 matrix, we
Verification that (6.2.4) is the General Solution A com- see that if X1 (0) = v and X2 (0) = w are linearly inde-
plex vector-valued function X(t) = X1 (t) + iX2 (t) ∈ Cn pendent, then Corollary 6.1.2 implies that (6.2.4) is the
consists of a real part X1 (t) ∈ Rn and an imaginary part general solution to Ẋ = CX. The linear independence
X2 (t) ∈ Rn . For such functions X(t) we define of v and w is verified using the following lemma.
Ẋ1 + iẊ2 = Ẋ = CX = CX1 + iCX2 . (6.2.10) and v and w are linearly independent vectors.
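The formulas in (6.2.3) can also be checked numerically. The MATLAB sketch below does this for a sample matrix with eigenvalues −2 ± 3i; the matrix, the test time, and the finite difference step are illustrative choices, not taken from the text.

C = [-2 -3; 3 -2];                    % sample matrix with complex eigenvalues
[V, D] = eig(C);
lambda1 = D(1,1);  v1 = V(:,1);       % lambda1 = sigma + i*tau, v1 = v + i*w
sigma = real(lambda1);  tau = imag(lambda1);
v = real(v1);  w = imag(v1);
X1 = @(t) exp(sigma*t)*(cos(tau*t)*v - sin(tau*t)*w);   % first solution in (6.2.3)
t = 0.7;  h = 1e-6;                   % test time and finite difference step
norm((X1(t+h) - X1(t))/h - C*X1(t))   % small, so X1 solves Xdot = C*X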
The characteristic polynomial for the matrix C is:
pC(λ) = λ^2 + 4λ + 13,
whose roots are λ1 = −2 + 3i and λ2 = −2 − 3i. So

The characteristic polynomial of A is
pA(λ) = λ^2 − tr(A)λ + det(A) = λ^2 − 2λ1 λ + λ1^2 = (λ − λ1)^2.
Thus the eigenvalues of A both equal λ1.
In matrix form this equation is
0 = (A − λ1 I2)v = [0 1; 0 0] v.
A quick calculation shows that all solutions are multiples of v1 = e1 = (1, 0)^t.

In fact, this observation is valid for any 2 × 2 matrix that has equal eigenvalues and is not a scalar multiple of the identity, as the next lemma shows.

Lemma 6.2.3. Let C be a 2 × 2 matrix. Suppose that C has two linearly independent eigenvectors both with eigenvalue λ1. Then C = λ1 I2.

Therefore, Cv = λ1 v for every v ∈ R^2 and hence C = λ1 I2.

Generalized Eigenvectors  Suppose that C has exactly one linearly independent real eigenvector v1 with a double real eigenvalue λ1. We call w1 a generalized eigenvector of C if it satisfies the system of linear equations
(C − λ1 I2) w1 = v1. (6.2.16)
The matrix A in (6.2.15) has a generalized eigenvector. To verify this point solve the linear system
(C − λ1 I2) w1 = [0 1; 0 0] w1 = (1, 0)^t = v1
for w1 = e2. Note that for this matrix C, v1 = e1 and w1 = e2 are linearly independent. The next lemma shows that this observation about generalized eigenvectors is always valid.

Lemma 6.2.4. Let C be a 2 × 2 matrix with both eigenvalues equal to λ1 and with one linearly independent eigenvector v1. Let w1 be a generalized eigenvector of C; then v1 and w1 are linearly independent.

Proof  If v1 and w1 were linearly dependent, then w1 would be a multiple of v1 and hence an eigenvector of C. But C − λ1 I2 applied to an eigenvector is zero, which is a contradiction. Therefore, v1 and w1 are linearly independent.

Lemma 6.2.5. Let C be a 2 × 2 matrix with a double eigenvalue λ1 ∈ R. Then
(C − λ1 I2)^2 = 0. (6.2.17)

Proof  Suppose that C has two linearly independent eigenvectors. Then Lemma 6.2.3 implies that C − λ1 I2 = 0 and hence that (C − λ1 I2)^2 = 0.

Suppose that C has one linearly independent eigenvector v1 and a generalized eigenvector w1. It follows from Lemma 6.2.4 that {v1, w1} is a basis of R^2. It also follows by definition of eigenvector and generalized eigenvector that
(C − λ1 I2)^2 v1 = (C − λ1 I2) 0 = 0
(C − λ1 I2)^2 w1 = (C − λ1 I2) v1 = 0.
Hence, (6.2.17) is valid.
Independent Solutions to Differential Equations with Equal Eigenvalues  Suppose that the 2 × 2 matrix C has a double eigenvalue λ1. Then the general solution to the initial value problem Ẋ = CX and X(0) = X0 is:
X(t) = e^{λ1 t} [I2 + t(C − λ1 I2)] X0. (6.2.18)
This is the form of the solution that is given by matrix exponentials. We verify (6.2.18) by observing that X(0) = X0 and calculating

An Example with Equal Eigenvalues  Consider the system of differential equations
dX/dt = [1 −1; 9 −5] X (6.2.19)
with initial value
X0 = (2, 3)^t.
The characteristic polynomial for the matrix [1 −1; 9 −5] is
pC(λ) = λ^2 + 4λ + 4 = (λ + 2)^2.
Thus λ1 = −2 is an eigenvalue of multiplicity two. It follows that
C − λ1 I2 = [3 −1; 9 −3]
and from (6.2.18) that
X(t) = e^{−2t} [1 + 3t, −t; 9t, 1 − 3t] (2, 3)^t = e^{−2t} (2 + 3t, 3 + 9t)^t.

In modern language De Moivre's formula states that
e^{niθ} = (e^{iθ})^n.
In Exercises 2 - 3 use De Moivre's formula coupled with Euler's formula (6.2.5) to determine trigonometric identities for the given quantity in terms of cos θ, sin θ, cos ϕ, sin ϕ.

2. cos(θ + ϕ).

3. sin(3θ).

In Exercises 4 – 7 compute the general solution for the given system of differential equations.
to prove

10. e^{iπ} + 1.
Definition 6.3.1. The n × n matrices B and C are similar if there exists an invertible n × n matrix P such that
C = P^{−1} B P.

Our interest in similar matrices stems from the fact that if we know the solutions to the system of differential equations Ẏ = CY, then we also know the solutions to the system of differential equations Ẋ = BX. More precisely,

Lemma 6.3.2. Suppose that B and C = P^{−1}BP are similar matrices. If Y(t) is a solution to the system of differential equations Ẏ = CY, then X(t) = PY(t) is a solution to the system of differential equations Ẋ = BX.

Proof  Since the entries in the matrix P are constants, it follows that
dX/dt = P dY/dt.
Since Y(t) is a solution to the Ẏ = CY equation, it follows that
dX/dt = PCY.
Since Y = P^{−1}X and PCP^{−1} = B,
dX/dt = PCP^{−1} X = BX.
Thus X(t) is a solution to Ẋ = BX, as claimed.

and the eigenvalues of A and B are equal.

Proof  The determinant is a function on 2 × 2 matrices that has several important properties. Recall, in particular, from Chapter 3, Theorem 3.8.2 that for any pair of 2 × 2 matrices A and B:
det(AB) = det(A) det(B), (6.3.1)
and for any invertible 2 × 2 matrix P
det(P^{−1}) = 1/det(P). (6.3.2)
Let P be an invertible 2 × 2 matrix so that B = P^{−1}AP. Using (6.3.1) and (6.3.2) we see that
pB(λ) = det(B − λI2)
      = det(P^{−1}AP − λI2)
      = det(P^{−1}(A − λI2)P)
      = det(A − λI2)
      = pA(λ).
Hence the eigenvalues of A and B are the same. It follows from (4.6.8) and (4.6.9) of Section 4.6 that the determinants and traces of A and B are equal.

For example, if
A = [−1 0; 0 1] and P = [1 2; 1 1],
(c) Suppose that C has exactly one linearly independent real eigenvector v1 with real eigenvalue λ1. Then
P^{−1}CP = [λ1 1; 0 λ1],
where v2 is a generalized eigenvector of C that satisfies
(C − λ1 I2) v2 = v1. (6.3.3)

P^{−1}CP e1 = P^{−1}C v1 = σ P^{−1} v1 + τ P^{−1} v2 = σ e1 + τ e2,
and
P^{−1}CP e2 = P^{−1}C v2 = −τ P^{−1} v1 + σ P^{−1} v2 = −τ e1 + σ e2.
Thus the columns of P^{−1}CP are
(σ, τ)^t and (−τ, σ)^t,
Thus the matrix P that transforms C into normal form is
P = [2 0; −1 −3] and P^{−1} = (1/6) [3 0; −1 −2].
It follows from (6.3.4) that the solution to the initial value problem is
X(t) = e^{−2t} P [cos(3t) −sin(3t); sin(3t) cos(3t)] P^{−1} X0
     = (1/6) e^{−2t} [2 0; −1 −3] [cos(3t) −sin(3t); sin(3t) cos(3t)] [3 0; −1 −2] X0.

1. Suppose that the matrices A and B are similar and the matrices B and C are similar. Show that A and C are also similar matrices.

2. Use (4.6.13) in Chapter 4 to verify that the traces of similar matrices are equal.

In Exercises 3 – 4 determine whether or not the given matrices are similar, and why.

3. A = [1 2; 3 4] and B = [2 −2; −3 8].
7. Solve the initial value problem
ẋ = 2x + 3y
ẏ = −3x + 2y
where x(0) = 1 and y(0) = −2.

8. Solve the initial value problem
ẋ = −2x + y
ẏ = −2y

and let
B = P^{−1}CP = [−2.8 −3.4; 2.6 2.8].
Describe the solutions to the system
dX/dt = BX. (6.3.6)
What is the relationship between solutions of (6.3.5) and solutions of (6.3.6)?
6.4 Sinks, Saddles, and Sources

The qualitative theory of autonomous differential equations begins with the observation that many important properties of solutions to constant coefficient systems of differential equations
dX/dt = CX (6.4.1)
are unchanged by similarity.

We call the origin of the linear system (6.4.1) a sink (or asymptotically stable) if all solutions X(t) satisfy
lim_{t→∞} X(t) = 0.
The origin is a source if all nonzero solutions X(t) satisfy
lim_{t→∞} ||X(t)|| = ∞.
Finally, the origin is a saddle if some solutions limit to 0 and some solutions grow infinitely large. Recall also from Lemma 6.3.2 that if B = P^{−1}CP, then P^{−1}X(t) is a solution to Ẋ = BX whenever X(t) is a solution to (6.4.1). Since P^{−1} is a matrix of constants that do not depend on t, it follows that
lim_{t→∞} X(t) = 0 ⟺ lim_{t→∞} P^{−1}X(t) = 0
or
lim_{t→∞} ||X(t)|| = ∞ ⟺ lim_{t→∞} ||P^{−1}X(t)|| = ∞.
It follows that the origin is a sink (or saddle or source) for (6.4.1) if and only if it is a sink (or saddle or source) for Ẋ = BX.

Theorem 6.4.1. Consider the system (6.4.1) where C is a 2 × 2 matrix.

(a) If the eigenvalues of C have negative real part, then the origin is a sink.

(b) If the eigenvalues of C have positive real part, then the origin is a source.

(c) If one eigenvalue of C is positive and one is negative, then the origin is a saddle.

Proof  Lemma 6.3.3 states that the similar matrices B and C have the same eigenvalues. Moreover, as noted, the origin is a sink, saddle, or source for B if and only if it is a sink, saddle, or source for C. Thus, we need only verify the theorem for normal form matrices as given in Table 2.

(a) If the eigenvalues λ1 and λ2 are real and there are two independent eigenvectors, then Chapter 6, Theorem 6.3.4 states that the matrix C is similar to the diagonal matrix
B = [λ1 0; 0 λ2].
The general solution to the differential equation Ẋ = BX is
x1(t) = α1 e^{λ1 t} and x2(t) = α2 e^{λ2 t}.
Since
lim_{t→∞} e^{λ1 t} = 0 = lim_{t→∞} e^{λ2 t}
when λ1 and λ2 are negative, it follows that
lim_{t→∞} X(t) = 0
for all solutions X(t), and the origin is a sink. Note that if both of the eigenvalues are positive, then X(t) will undergo exponential growth and the origin is a source.

(b) If the eigenvalues of C are the complex conjugates σ ± iτ where τ ≠ 0, then Chapter 6, Theorem 6.3.4 states that after a similarity transformation (6.4.1) has the form
and solutions for this equation have the form (6.3.4) of Chapter 6, that is,
X(t) = e^{σt} [cos(τt) −sin(τt); sin(τt) cos(τt)] X0 = e^{σt} R_{τt} X0,
where R_{τt} is a rotation matrix (recall (3.2.2) of Chapter 3). It follows that as time evolves the vector X0 is rotated about the origin and then expanded or contracted by the factor e^{σt}. So when σ < 0, lim_{t→∞} X(t) = 0 for all solutions X(t). Hence the origin is a sink and when σ > 0 solutions spiral away from the origin and the origin is a source.

(c) If the eigenvalues are both equal to λ1 and if there is only one independent eigenvector, then Chapter 6, Theorem 6.3.4 states that after a similarity transformation (6.4.1) has the form
Ẋ = [λ1 1; 0 λ1] X,
whose solutions are
X(t) = e^{tλ1} [1 t; 0 1] X0
using Table 2(c). Note that the functions e^{λ1 t} and t e^{λ1 t} both have limits equal to zero as t → ∞ when λ1 < 0. In the second case, use l'Hôpital's rule and the assumption that −λ1 > 0 to compute
lim_{t→∞} t / e^{−λ1 t} = − lim_{t→∞} 1 / (λ1 e^{−λ1 t}) = 0.
Hence lim_{t→∞} X(t) = 0 for all solutions X(t) and the origin is asymptotically stable. Note that initially ||X(t)|| can grow since t is increasing. But eventually exponential decay wins out and solutions limit on the origin. Note that solutions grow exponentially when λ1 > 0.

Theorem 6.4.1 shows that the qualitative features of the origin for (6.4.1) depend only on the eigenvalues of C and not on the formulae for solutions to (6.4.1). This is a much simpler calculation. However, Theorem 6.4.2 simplifies the calculation substantially further.

Theorem 6.4.2. (a) If det(C) < 0, then 0 is a saddle.
(b) If det(C) > 0 and tr(C) < 0, then 0 is a sink.
(c) If det(C) > 0 and tr(C) > 0, then 0 is a source.

Proof  Recall from (4.6.9) that det(C) is the product of the eigenvalues of C. Hence, if det(C) < 0, then the signs of the eigenvalues must be opposite, and we have a saddle. Next, suppose det(C) > 0. If the eigenvalues are real, then the eigenvalues are either both positive (a source) or both negative (a sink). Recall from (4.6.8) that tr(C) is the sum of the eigenvalues and the sign of the trace determines the sign of the eigenvalues. Finally, assume the eigenvalues are complex conjugates σ ± iτ. Then det(C) = σ^2 + τ^2 > 0 and tr(C) = 2σ. Thus, the sign of the real parts of the complex eigenvalues is given by the sign of tr(C).

Time Series  It is instructive to note how the time series x1(t) damps down to the origin in the three cases listed in Theorem 6.4.1. In Figure 18 we present the time series for the three coefficient matrices:
C1 = [−2 0; 0 −1],
C2 = [−1 −55; 55 −1],
C3 = [−2 1; 0 −2].
In this figure, we can see the exponential decay to zero associated with the unequal real eigenvalues of C1; the damped oscillation associated with the complex eigenvalues of C2; and the initial growth of the time series due to the te^{−2t} term followed by exponential decay to zero in the equal eigenvalue C3 example.
Figure 18: Time series x versus t for the three coefficient matrices C1, C2, and C3.
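Theorem 6.4.2 is easy to apply by machine. The following MATLAB sketch classifies the origin for the three coefficient matrices above; it is an illustration, not a listing from the text.

Cs = {[-2 0; 0 -1], [-1 -55; 55 -1], [-2 1; 0 -2]};
for k = 1:3
    C = Cs{k};
    d = det(C);  T = trace(C);
    if d < 0
        fprintf('C%d: saddle\n', k);
    elseif T < 0
        fprintf('C%d: sink (det = %g > 0, tr = %g < 0)\n', k, d, T);
    else
        fprintf('C%d: source\n', k);
    end
end
% All three matrices satisfy det(C) > 0 and tr(C) < 0, so each origin is a sink,
% consistent with the decaying time series in Figure 18.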
Sources Versus Sinks  The explicit form of solutions to planar linear systems shows that solutions with initial conditions near the origin grow exponentially in forward time when the origin of (6.4.1) is a source. We can prove this point geometrically, as follows.

The phase planes of sources and sinks are almost the same; they have the same trajectories but the arrows are reversed. To verify this point, note that
Ẋ = −CX (6.4.2)
is a sink when (6.4.1) is a source; observe that the trajectories of solutions of (6.4.1) are the same as those of (6.4.2) — just with time running backwards. For let X(t) be a solution to (6.4.1); then X(−t) is a solution to (6.4.2). See Figure 19 for plots of Ẋ = BX and Ẋ = −BX where
B = [−1 −5; 5 −1]. (6.4.3)

Phase Portraits for Saddles  Next we discuss the phase portraits of linear saddles. Using PhasePlane, draw the phase portrait of the saddle
ẋ = 2x + y
ẏ = −x − 3y, (6.4.4)
as in Figure 20. The important feature of saddles is that there are special trajectories (the eigendirections) that limit on the origin in either forward or backward time.

Definition 6.4.3. The stable manifold or stable orbit of a saddle consists of those trajectories that limit on the origin in forward time; the unstable manifold or unstable orbit of a saddle consists of those trajectories that limit on the origin in backward time.

Let λ1 < 0 and λ2 > 0 be the eigenvalues of a saddle with associated eigenvectors v1 and v2. The stable orbits are given by the solutions X(t) = ±e^{λ1 t} v1 and the unstable orbits are given by the solutions X(t) = ±e^{λ2 t} v2.
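The eigendirections of the saddle (6.4.4) can be computed directly; the short MATLAB sketch below does so (an illustration, not a listing from the text).

C = [2 1; -1 -3];              % coefficient matrix of (6.4.4)
[V, D] = eig(C);
diag(D)                        % one negative and one positive eigenvalue: a saddle
% The column of V belonging to the negative eigenvalue spans the stable
% orbits; the column belonging to the positive eigenvalue spans the
% unstable orbits (Definition 6.4.3).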
Figure 19: (Left) Sink Ẋ = BX where B is given in (6.4.3). (Right) Source Ẋ = −BX.
principal use of this feature is seen when analyzing nonlinear systems, it is useful to introduce this feature now.

As an example, load the linear system (6.4.4) into PhasePlane and click on Update. Now pull down the Analysis menu and click on Find nearby equilibrium. Click the cross hairs in the PHASEPLANE Display window on a point near the origin; PhasePlane responds by plotting the equilibrium and the real eigenvectors — and by putting a small circle about the origin. The circle indicates that the numerical algorithm programmed into PhasePlane has detected an equilibrium near the chosen point. This process numerically verifies that the origin is a saddle (a fact that could have been verified in a more straightforward way).

Now pull down the Analysis menu again and click on Solve for stable separatrices. PhasePlane responds by drawing the stable and unstable orbits. The result is shown in Figure 20(left). On this figure we have also plotted one trajectory from each quadrant; thus obtaining the phase portrait of a saddle. On the right of Figure 20 we have plotted a time series of the first quadrant solution. Note how the x time series increases exponentially to +∞ in forward time and the y time series decreases in forward time while going exponentially towards −∞. The two time series together give the trajectory (x(t), y(t)) that in forward time is asymptotic to the line given by the unstable eigendirection.

Exercises

In Exercises 1 – 3 determine whether or not the equilibrium at the origin in the system of differential equations Ẋ = CX is asymptotically stable.

1. C = [1 2; 4 1].

2. C = [−1 2; −4 −1].
Figure 20: (Left) Saddle phase portrait. (Right) First quadrant solution time series.
3. C = [2 1; 1 −5].

9. C = [1 −8; 2 1].
16. v1 = (1, −1)^t; v2 = (1, 1)^t; λ1 = −1; λ2 = 3; X0 = (2, 0)^t.

17. v1 = (1, −1)^t; v2 = (0, −3)^t; λ1 = −1; λ2 = 3; X0 = (2, −2)^t.

18. v1 = (1 − i, −1)^t; v2 = (1 + i, −1)^t; λ1 = −1 + i; λ2 = −1 − i; X0 = (2, 0)^t.
(6.5.1) in closed form is equivalent to finding a closed form expression for the matrix exponential e^{tC}.

Theorem 6.5.1. The unique solution to the initial value problem
dX/dt = CX
X(0) = X0
is
X(t) = e^{tC} X0.

Proof  Existence follows from the previous discussion. For uniqueness, suppose that Y(t) is a solution to Ẏ = CY with Y(0) = X0. We claim that Y(t) = X(t). Let Z(t) = e^{−tC} Y(t) and use the product rule to compute
dZ/dt = −C e^{−tC} Y(t) + e^{−tC} dY/dt (t) = e^{−tC} (−CY(t) + CY(t)) = 0.
It follows that Z is constant in t and Z(t) = Z(0) = Y(0) = X0, or Y(t) = e^{tC} X0 = X(t), as claimed.

Similarity and Matrix Exponentials  We introduce similarity at this juncture for the following reason: if C is a matrix that is similar to B, then e^C can be computed from e^B. More precisely:

Lemma 6.5.2. Let C and B be n × n similar matrices, and let P be an invertible n × n matrix such that
C = P^{−1} B P.
Then
e^C = P^{−1} e^B P. (6.5.6)

Proof  Note that for all powers of k we have
(P^{−1} B P)^k = P^{−1} B^k P.
Next verify (6.5.6) by computing
e^C = Σ_{k=0}^∞ (1/k!) C^k = Σ_{k=0}^∞ (1/k!) (P^{−1} B P)^k = Σ_{k=0}^∞ P^{−1} (1/k!) B^k P = P^{−1} (Σ_{k=0}^∞ (1/k!) B^k) P = P^{−1} e^B P.

Explicit Computation of Matrix Exponentials  We begin with the simplest computation of a matrix exponential.

(a) Let L be a multiple of the identity; that is, let L = αI_n where α is a real number. Then
e^{αI_n} = e^α I_n. (6.5.7)
That is, e^{αI_n} is a scalar multiple of the identity. To verify (6.5.7), compute
e^{αI_n} = I_n + αI_n + (α^2/2!) I_n^2 + (α^3/3!) I_n^3 + · · · = (1 + α + α^2/2! + α^3/3! + · · ·) I_n = e^α I_n.

(b) Let C be a 2 × 2 diagonal matrix,
C = [λ1 0; 0 λ2],
where λ1 and λ2 are real constants. Then
e^{tC} = [e^{λ1 t} 0; 0 e^{λ2 t}]. (6.5.8)
To verify (6.5.8) compute

    e^{tC} = I2 + tC + (t^2/2!) C^2 + (t^3/3!) C^3 + · · ·
           = [1 0; 0 1] + [λ1 t 0; 0 λ2 t] + (t^2/2!) [λ1^2 0; 0 λ2^2] + · · ·
           = [e^{λ1 t} 0; 0 e^{λ2 t}].

(c) Suppose that

    C = [0 −1; 1 0].

Then

    e^{tC} = [cos t  −sin t; sin t  cos t].     (6.5.9)

We begin this computation by observing that

    C^2 = −I2,   C^3 = −C,   and   C^4 = I2.

Therefore, by collecting terms of odd and even power in the series expansion for the matrix exponential we obtain

    e^{tC} = I2 + tC + (t^2/2!) C^2 + (t^3/3!) C^3 + · · ·
           = I2 + tC − (t^2/2!) I2 − (t^3/3!) C + · · ·
           = (1 − t^2/2! + t^4/4! − t^6/6! + · · ·) I2 + (t − t^3/3! + t^5/5! − t^7/7! + · · ·) C
           = (cos t) I2 + (sin t) C
           = [cos t  −sin t; sin t  cos t].

In this computation we have used the fact that the trigonometric functions cos t and sin t have the power series expansions:

    cos t = 1 − (1/2!) t^2 + (1/4!) t^4 − · · · = Σ_{k=0}^{∞} ((−1)^k/(2k)!) t^{2k},
    sin t = t − (1/3!) t^3 + (1/5!) t^5 − · · · = Σ_{k=0}^{∞} ((−1)^k/(2k+1)!) t^{2k+1}.

See Exercise 10 for an alternative proof of (6.5.9).

To compute the matrix exponential MATLAB provides the command expm. We use this command to compute the matrix exponential e^{tC} for

    C = [0 −1; 1 0]   and   t = π/4.

Type

    C = [0, -1; 1, 0];
    t = pi/4;
    expm(t*C)

that gives the answer

    ans =
        0.7071   -0.7071
        0.7071    0.7071

Indeed, this is precisely what we expect by (6.5.9), since

    cos(π/4) = sin(π/4) = 1/√2 ≈ 0.70710678.

(d) Let

    C = [0 1; 0 0].

Then C^2 = 0, so the series for the matrix exponential terminates and e^{tC} = I2 + tC = [1 t; 0 1].
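The computation in (c) can be confirmed for other values of t as well; a short MATLAB sketch comparing expm with the closed form (6.5.9):

    % Compare expm(t*C) with the closed form (6.5.9) for several values of t.
    C = [0 -1; 1 0];
    for t = [0.3 1.0 2.5]
        E1 = expm(t*C);                           % numerical matrix exponential
        E2 = [cos(t) -sin(t); sin(t) cos(t)];     % closed form from (6.5.9)
        fprintf('t = %g, difference = %g\n', t, norm(E1 - E2));
    end

The differences are at the level of round-off error.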
Exercises

1. (matlab) Let L be the 3 × 3 matrix

    L = [2 0 −1; 0 −1 3; 0 0 0].

Use expm to compute e^{tL} by choosing for t the values 1.0, 1.5 and 2.5. Does e^{C} e^{1.5C} = e^{2.5C}?

3. (matlab) For the scalar exponential function e^t it is well known that for any pair of real numbers t1, t2 the following equality holds:

    e^{t1 + t2} = e^{t1} e^{t2}.

Use MATLAB to find two 2 × 2 matrices C1 and C2 such that

    e^{C1 + C2} ≠ e^{C1} e^{C2}.

(One way to search for such matrices is sketched after these exercises.)

6. C = [0 −2; 2 0].

Consider the initial value problem

    dx/dt = x,   x(0) = e^s.     (6.5.11)

(b) Fix s and verify that z(t) = e^t e^s is also a solution to (6.5.11).

(c) Use Theorem 6.5.1 to conclude that y(t) = z(t) for every s.

In this exercise you will need to use the following theorem from analysis: a series

    c1 + c2 + · · · + cN + · · ·

converges absolutely if there is a constant K such that for every N the partial sum satisfies

    |c1| + |c2| + · · · + |cN| ≤ K.

To compute ||A||m, first sum the absolute values of the entries in each row of A, and then take the maximum of these sums. Prove that

    ||AB||m ≤ ||A||m ||B||m.

Hint:

    ||AB||m ≤ max_{1≤i≤n} Σ_{j=1}^{n} Σ_{k=1}^{n} |a_ik b_kj|.

10. Prove that

    exp( t [0 −1; 1 0] ) = [cos t  −sin t; sin t  cos t].     (6.5.13)

Hint:

(a) Verify that X1(t) = (cos t, sin t)^t and X2(t) = (−sin t, cos t)^t are solutions Xj of

    dX/dt = [0 −1; 1 0] X     (6.5.14)

for j = 1, 2.

(b) Since Xj(0) = ej, use Theorem 6.5.1 to verify that

    Xj(t) = exp( t [0 −1; 1 0] ) ej.     (6.5.15)
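For Exercise 3, the key fact is that e^{C1+C2} = e^{C1} e^{C2} can fail when C1 and C2 do not commute. A minimal MATLAB sketch along these lines (the particular pair below is just one convenient non-commuting choice, not the only answer):

    % Two matrices that do not commute; for such pairs the identity
    % e^(C1+C2) = e^C1 * e^C2 generally fails.
    C1 = [0 1; 0 0];
    C2 = [0 0; 1 0];

    lhs = expm(C1 + C2);       % exponential of the sum
    rhs = expm(C1) * expm(C2); % product of the exponentials

    disp(lhs)
    disp(rhs)
    disp(norm(lhs - rhs))      % nonzero, so the two sides differ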
6.6 *The Cayley Hamilton Theorem

The Jordan normal form theorem (Theorem 6.3.4) for real 2×2 matrices states that every 2×2 matrix is similar to one of the matrices in Table 2. We use this theorem to prove the Cayley Hamilton theorem for 2 × 2 matrices and then use the Cayley Hamilton theorem to present another method for computing solutions to planar linear systems of differential equations in the case of real equal eigenvalues.

The Cayley Hamilton theorem states that a matrix satisfies its own characteristic polynomial. More precisely:

Theorem 6.6.1 (Cayley Hamilton Theorem). Let A be a 2 × 2 matrix and let

    pA(λ) = λ^2 + aλ + b

be the characteristic polynomial of A. Then

    pA(A) = A^2 + aA + bI2 = 0.

Proof  Suppose B = P^{−1}AP and A are similar matrices. We claim that if pA(A) = 0, then pB(B) = 0. To verify this claim, recall from Lemma 6.3.3 that pA = pB and calculate

    pB(B) = pA(P^{−1}AP) = (P^{−1}AP)^2 + aP^{−1}AP + bI2 = P^{−1} pA(A) P = 0.

Theorem 6.3.4 classifies 2 × 2 matrices up to similarity. Thus, we need only verify this theorem for the matrices in Table 2. Using the fact that pA(λ) = λ^2 − tr(A)λ + det(A), we see that

    pC(λ) = (λ − λ1)(λ − λ2)
    pD(λ) = λ^2 − 2σλ + (σ^2 + τ^2)
    pE(λ) = (λ − λ1)^2.

It now follows that

    pC(C) = (C − λ1 I2)(C − λ2 I2) = [0 0; 0 λ2 − λ1] [λ1 − λ2 0; 0 0] = 0,

and

    pD(D) = [σ^2 − τ^2  −2στ; 2στ  σ^2 − τ^2] − 2σ [σ  −τ; τ  σ] + (σ^2 + τ^2) [1 0; 0 1] = 0,

and

    pE(E) = (E − λ1 I2)^2 = [0 1; 0 0]^2 = 0.

The Example with Equal Eigenvalues Revisited  When the eigenvalues λ1 = λ2, the closed form solution of Ẋ = CX is a straightforward formula.
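Theorem 6.6.1 is easy to spot-check numerically. A minimal MATLAB sketch (the matrix A below is an arbitrary example; polyvalm evaluates a polynomial with a matrix argument):

    % Numerical check of the Cayley Hamilton theorem for a 2 x 2 matrix.
    A = [3 -2; 5 4];             % any 2 x 2 matrix will do
    p = poly(A);                 % coefficients of det(lambda*I - A);
                                 % for a 2 x 2 matrix p = [1, -trace(A), det(A)]
    pA_of_A = polyvalm(p, A);    % evaluates A^2 + p(2)*A + p(3)*eye(2)
    disp(pA_of_A)                % the zero matrix, up to round-off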
6.7 *Second Order Equations

A second order constant coefficient homogeneous differential equation is a differential equation of the form:

    ẍ + bẋ + ax = 0,     (6.7.1)

where a and b are real numbers. For example, a point particle moving only under the influence of gravity satisfies

    d^2x/dt^2 + g = 0.     (6.7.4)

Let x(t) measure the distance that the spring is extended (or compressed). It follows from Newton's Law that (6.7.3) is satisfied. Hooke's law states that the force F acting on a spring is

    F = −κx,

where κ is a positive constant. If the spring is damped by sliding friction, then

    m ẍ + μ ẋ + κ x = 0.     (6.7.5)
A Reduction to a First Order System  There is a simple trick that reduces a single linear second order differential equation to a system of two linear first order equations. For example, consider the linear homogeneous ordinary differential equation (6.7.1). To reduce this second order equation to a first order system, just set y = ẋ. Then (6.7.1) becomes

    ẏ + by + ax = 0.

It follows that if x(t) is a solution to (6.7.1) and y(t) = ẋ(t), then (x(t), y(t)) is a solution to

    ẋ = y
    ẏ = −ax − by.     (6.7.6)

We can rewrite (6.7.6) as Ẋ = QX, where

    Q = [0 1; −a −b].     (6.7.7)

Second, we know from the general theory of planar systems that solutions will have the form x(t) = e^{λ0 t} for some scalar λ0. We need only determine the values of λ0 for which we get solutions to (6.7.1).

We now discuss the second approach. Suppose that x(t) = e^{λ0 t} is a solution to (6.7.1). Substituting this form of x(t) in (6.7.1) yields the equation

    (λ0^2 + bλ0 + a) e^{λ0 t} = 0.

So x(t) = e^{λ0 t} is a solution to (6.7.1) precisely when pQ(λ0) = 0, where

    pQ(λ) = λ^2 + bλ + a     (6.7.8)

is the characteristic polynomial of the matrix Q in (6.7.7).

Suppose that λ1 and λ2 are distinct real roots of pQ. Then the general solution to (6.7.1) is

    x(t) = α1 e^{λ1 t} + α2 e^{λ2 t}.
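The two viewpoints above, eigenvalues of the matrix Q versus roots of λ^2 + bλ + a, can be compared directly in MATLAB. The sketch below uses the equation of Exercise 3, ẍ + 2ẋ − 3x = 0, so a = −3 and b = 2; any other choice of a and b works the same way.

    % Compare the eigenvalues of the matrix Q of (6.7.7)
    % with the roots of the characteristic polynomial (6.7.8).
    a = -3;  b = 2;                      % from xddot + 2*xdot - 3*x = 0
    Q = [0 1; -a -b];                    % first order system matrix
    lambda_Q    = sort(eig(Q));          % eigenvalues of Q
    lambda_poly = sort(roots([1 b a]));  % roots of lambda^2 + b*lambda + a
    disp([lambda_Q, lambda_poly])        % the two columns agree: -3 and 1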
So α1 = −2, α2 = 2, and the solution to the initial value problem for (6.7.9) is

    x(t) = −2e^{−t} + 2e^{−2t}.

An Example with Complex Conjugate Eigenvalues  Consider the differential equation

    ẍ − 2ẋ + 5x = 0.     (6.7.10)

The roots of the characteristic polynomial associated to (6.7.10) are λ1 = 1 + 2i and λ2 = 1 − 2i. It follows from the discussion in the previous section that the general solution to (6.7.10) is

    x(t) = Re( α1 e^{λ1 t} + α2 e^{λ2 t} ),     (6.7.11)

where α1 and α2 are complex scalars. Indeed, we can rewrite this solution in real form (using Euler's formula) as

    x(t) = e^{t} ( β1 cos(2t) + β2 sin(2t) ).

Summary  It follows from this discussion that solutions to second order homogeneous linear equations are either a linear combination of two exponentials (real unequal eigenvalues), α + βt times one exponential (real equal eigenvalues), or a time periodic function times an exponential (complex eigenvalues).

In particular, if the real part of the complex eigenvalues is zero, then the solution is time periodic. The frequency of this periodic solution is often called the internal frequency, a point that is made more clearly in the next example.

Solving the Spring Equation  Consider the equation for the frictionless spring without external forcing. From (6.7.5) we get

    m ẍ + κ x = 0,

where κ > 0. The roots are λ1 = √(κ/m) i and λ2 = −√(κ/m) i. So the general solution is

    x(t) = α cos(τ t) + β sin(τ t),

where τ = √(κ/m).
Figure 22: (Left) Graph of solution to undamped spring equation with initial conditions x(0) = 1 and ẋ(0) = 0. (Right)
Graph of solution to damped spring equation with the same initial conditions.
Since σ < 0, these solutions oscillate but damp down to zero. In particular, the solution satisfying initial conditions x(0) = 1 and ẋ(0) = 0 is

    x(t) = e^{−μt/2m} ( cos(τ t) − (μ/(2mτ)) sin(τ t) ).

The graph of this solution when τ = 1 and μ/(2m) = 0.07 is given in Figure 22 (right). Compare the solutions for the undamped and damped springs.

Exercises

2. By direct integration solve the differential equation (6.7.4) for a point particle moving only under the influence of gravity. Show that the solution is

    x(t) = −(1/2) g t^2 + v0 t + x0,

where x0 is the initial position of the particle and v0 is the initial velocity.

In Exercises 3 – 5 find the general solution to the given differential equation. (A MATLAB check using roots is sketched after Exercise 5.)
3. ẍ + 2ẋ − 3x = 0.
4. ẍ − 6ẋ + 9x = 0. In addition, find the solution to this
equation satisfying initial values x(1) = 1 and ẋ(1) = 0.
5. ẍ + 2ẋ + 2x = 0.
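The general solutions in Exercises 3 – 5 are determined by the roots of the characteristic polynomial λ^2 + bλ + a in (6.7.8). A short MATLAB sketch for Exercise 5 (ẍ + 2ẋ + 2x = 0); the same lines handle Exercises 3 and 4 after changing the coefficient vector:

    % Classify the characteristic roots for Exercise 5: xddot + 2*xdot + 2*x = 0.
    p = [1 2 2];                  % coefficients of lambda^2 + 2*lambda + 2
    lambda = roots(p);            % complex conjugate pair -1 + i and -1 - i
    disp(lambda)
    % Complex roots sigma +/- tau*i give solutions of the form
    %     x(t) = exp(sigma*t)*(beta1*cos(tau*t) + beta2*sin(tau*t)).
    sigma = real(lambda(1));      % -1
    tau   = abs(imag(lambda(1))); %  1
    fprintf('x(t) = e^(%gt)(beta1*cos(%gt) + beta2*sin(%gt))\n', sigma, tau, tau);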
7 Determinants and Eigenvalues
In Section 3.8 we introduced determinants for 2 × 2 ma-
trices A. There we showed that the determinant of A is
nonzero if and only if A is invertible. In Section 4.6 we
saw that the eigenvalues of A are the roots of its char-
acteristic polynomial, and that its characteristic polyno-
mial is just the determinant of a related matrix, namely,
pA (λ) = det(A − λI2 ).
In Section 7.1 we generalize the concept of determinants
to n×n matrices, and in Section 7.2 we use determinants
to show that every n×n matrix has exactly n eigenvalues
— the roots of its characteristic polynomial. Properties
of eigenvalues are also discussed in detail in Section 7.2.
Certain details concerning determinants are deferred to
Appendix 7.4.
7.1 Determinants

There are several equivalent ways to introduce determinants — none of which are easily motivated. We prefer to define determinants through the properties they satisfy rather than by formula. These properties actually enable us to compute determinants of n × n matrices where n > 3, which further justifies the approach. Later on, we will give an inductive formula (7.1.9) for computing the determinant.

Definition 7.1.1. A determinant function of a square n × n matrix A is a real number D(A) that satisfies three properties:

(a) If A = (aij) is lower triangular, then D(A) is the product of the diagonal entries; that is,

    D(A) = a11 · · · · · ann.

(b) D(A^t) = D(A).

(c) Let B be an n × n matrix. Then

    D(AB) = D(A)D(B).     (7.1.1)

Theorem 7.1.2. There exists a unique determinant function det satisfying the three properties of Definition 7.1.1.

We will show that it is possible to compute the determinant of any n × n matrix using Definition 7.1.1. Here we present a few examples:

Lemma 7.1.3. Let A be an n × n matrix.

(a) Let c ∈ R be a scalar. Then D(cA) = c^n D(A).

(b) If one row or one column of A is zero, then D(A) = 0.

Proof  (a) Note that Definition 7.1.1(a) implies that D(cIn) = c^n. It follows from (7.1.1) that

    D(cA) = D(cIn A) = D(cIn) D(A) = c^n D(A).

(b) Definition 7.1.1(b) implies that it suffices to prove this assertion when one row of A is zero. Suppose that the ith row of A is zero. Let J be an n × n diagonal matrix with a 1 in every diagonal entry except the ith diagonal entry which is 0. A matrix calculation shows that JA = A. It follows from Definition 7.1.1(a) that D(J) = 0 and from (7.1.1) that D(A) = 0.

Determinants of 2 × 2 Matrices  Before discussing how to compute determinants, we discuss the special case of 2 × 2 matrices. Recall from (3.8.2) of Section 3.8 that when

    A = [a b; c d]

we defined

    det(A) = ad − bc.     (7.1.2)

We check that (7.1.2) satisfies the three properties in Definition 7.1.1. Observe that when A is lower triangular, then b = 0 and det(A) = ad. So (a) is satisfied. It is straightforward to verify (b). We already verified (c) in Chapter 3, Proposition 3.8.2.

It is less obvious perhaps — but true nonetheless — that the three properties of D(A) actually force the determinant of 2 × 2 matrices to be given by formula (7.1.2). We begin by showing that Definition 7.1.1 implies that

    D([0 1; 1 0]) = −1.     (7.1.3)

We verify (7.1.3) by observing that
the matrix [0 1; 1 0] equals a product of lower and upper triangular matrices (7.1.4). Hence properties (c), (a) and (b) imply that D([0 1; 1 0]) = −1, as desired.

We have verified that the only possible determinant function for 2 × 2 matrices is the determinant function defined by (7.1.2).
A by swapping the ith and jth rows. For example,

    [0 0 1; 0 1 0; 1 0 0] [a11 a12 a13; a21 a22 a23; a31 a32 a33] = [a31 a32 a33; a21 a22 a23; a11 a12 a13],

which swaps the 1st and 3rd rows. Another calculation shows that R^2 = In and hence that R is invertible since R^{−1} = R.

Finally, we claim that adding c times the ith row of A to the jth row of A can be viewed as matrix multiplication. Let Ekℓ be the matrix all of whose entries are 0 except for the entry in the kth row and ℓth column which is 1. Then R = In + cEij has the property that RA is the matrix obtained by adding c times the jth row of A to the ith row. We can verify by multiplication that R is invertible and that R^{−1} = In − cEij. More precisely,

    (In + cEij)(In − cEij) = In + cEij − cEij − c^2 Eij^2 = In,

since Eij^2 = O for i ≠ j. For example,

    (I3 + 5E12) A = [1 5 0; 0 1 0; 0 0 1] [a11 a12 a13; a21 a22 a23; a31 a32 a33] = [a11 + 5a21  a12 + 5a22  a13 + 5a23; a21 a22 a23; a31 a32 a33],

adds 5 times the 2nd row to the 1st row.

Proof  (a) The matrix that adds a multiple of one row to another is triangular (either upper or lower) and has 1's on the diagonal. Thus property (a) in Definition 7.1.1 implies that the determinants of these matrices are equal to 1.

(b) The matrix that multiplies the ith row by c ≠ 0 is a diagonal matrix all of whose diagonal entries are 1 except for aii = c. Again property (a) implies that the determinant of this matrix is c ≠ 0.

(c) The matrix that swaps the ith row with the jth row is the product of four matrices of types (a) and (b). To see this let A be an n × n matrix whose ith row vector is ai. Then perform the following four operations in order:

    Operation               Result                        Matrix
    Add ri to rj            ri = ai,   rj = ai + aj       B1
    Multiply ri by −1       ri = −ai,  rj = ai + aj       B2
    Add rj to ri            ri = aj,   rj = ai + aj       B3
    Subtract ri from rj     ri = aj,   rj = ai            B4

It follows that the swap matrix equals B4 B3 B2 B1. Therefore

    det(swap) = det(B4) det(B3) det(B2) det(B1) = (1)(−1)(1)(1) = −1.
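The bookkeeping above, row operations as left multiplication by elementary matrices and a swap as a product of four such matrices, is easy to experiment with in MATLAB. A small sketch for the 3 × 3 case:

    % Row operations as left multiplication by elementary matrices (3 x 3 case).
    n = 3;
    A = magic(n);                      % any test matrix

    % Add 5 times row 2 to row 1:  R = I + 5*E12.
    E12 = zeros(n);  E12(1,2) = 1;
    R = eye(n) + 5*E12;
    disp(R*A - (A + 5*[A(2,:); zeros(2,n)]))   % zero matrix: same effect

    % A row swap has determinant -1.
    S = eye(n);  S([1 3],:) = S([3 1],:);      % swap rows 1 and 3 of the identity
    disp(det(S))                               % -1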
trices R1, . . . , Rs such that

    E = Rs · · · R1 A.     (7.1.5)

It follows from Definition 7.1.1(c) that we can compute the determinant of A once we know the determinants of reduced echelon form matrices and the determinants of elementary row matrices. In particular

    D(A) = D(E) / ( D(R1) · · · D(Rs) ).     (7.1.6)

It is easy to compute the determinant of any matrix in reduced echelon form using Definition 7.1.1(a) since all reduced echelon form n × n matrices are upper triangular. Lemma 7.1.5 tells us how to compute the determinants of elementary row matrices. This discussion proves:

We still need to show that determinant functions exist when n > 2. More precisely, we know that the reduced echelon form matrix E is uniquely defined from A (Chapter 2, Theorem 2.4.9), but there is more than one way to perform elementary row operations on A to get to E. Thus, we can write A in the form (7.1.6) in many different ways, and these different decompositions might lead to different values for det A. (They don't.)

An Example of Determinants by Row Reduction  As a practical matter we row reduce a square matrix A by premultiplying A by an elementary row matrix Rj. Thus

    det(A) = det(Rj A) / det(Rj).     (7.1.7)

We use this approach to compute the determinant of the 4 × 4 matrix

    A = [0 2 10 −2; 1 2 4 0; 1 6 1 −2; 2 1 1 0].

The idea is to use (7.1.7) to keep track of the determinant while row reducing A to upper triangular form. For instance, swapping rows changes the sign of the determinant; so

    det(A) = − det [1 2 4 0; 0 2 10 −2; 1 6 1 −2; 2 1 1 0].

Adding multiples of the 1st row to the other rows does not change the determinant, so clearing the entries in the 1st column below the 1st row gives

    det(A) = − det [1 2 4 0; 0 2 10 −2; 0 4 −3 −2; 0 −3 −7 0].

Multiplying a row by a scalar c corresponds to an elementary row matrix whose determinant is c. To make sure that we do not change the value of det(A), we have to divide the determinant by c as we multiply a row of A by c. So as we divide the second row of the matrix by 2, we multiply the whole result by 2, obtaining

    det(A) = −2 det [1 2 4 0; 0 1 5 −1; 0 4 −3 −2; 0 −3 −7 0].
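At this point the bookkeeping can already be checked numerically: the factor −2 recorded so far times the determinant of the partially reduced matrix should equal det(A).

    % Check the running bookkeeping in the row-reduction example.
    A = [0 2 10 -2; 1 2 4 0; 1 6 1 -2; 2 1 1 0];
    M = [1 2 4 0; 0 1 5 -1; 0 4 -3 -2; 0 -3 -7 0];  % after the steps so far
    disp(det(A))         % -106
    disp(-2*det(M))      % also -106, as the bookkeeping predicts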
We continue row reduction by zeroing out the last two entries in the 2nd column, obtaining

    det(A) = −2 det [1 2 4 0; 0 1 5 −1; 0 0 −23 2; 0 0 8 −3]
           = 46 det [1 2 4 0; 0 1 5 −1; 0 0 1 −2/23; 0 0 8 −3].

Thus

    det(A) = 46 det [1 2 4 0; 0 1 5 −1; 0 0 1 −2/23; 0 0 0 −53/23] = −106.

Determinants and Inverses  We end this subsection with an important observation about the determinant function. This observation generalizes to dimension n Corollary 3.8.3 of Chapter 3.

Theorem 7.1.7. An n × n matrix A is invertible if and only if det(A) ≠ 0. Moreover, if A^{−1} exists, then

    det(A^{−1}) = 1 / det(A).     (7.1.8)

Proof  If A is invertible, then

    det(A) det(A^{−1}) = det(AA^{−1}) = det(In) = 1.

Thus det(A) ≠ 0 and (7.1.8) is valid. In particular, the determinants of elementary row matrices are nonzero, since they are all invertible. (This point was proved by direct calculation in Lemma 7.1.5.)

If A is singular, then A is row equivalent to a non-identity reduced echelon form matrix E whose determinant is zero (since E is upper triangular and its last diagonal entry is zero). So it follows from (7.1.5) that

    0 = det(E) = det(R1) · · · det(Rs) det(A).

Since det(Rj) ≠ 0, it follows that det(A) = 0.

Corollary 7.1.8. If the rows of an n × n matrix A are linearly dependent (for example, if one row of A is a scalar multiple of another row of A), then det(A) = 0.

An Inductive Formula for Determinants  In this subsection we present an inductive formula for the determinant — that is, we assume that the determinant is known for square (n−1)×(n−1) matrices and use this formula to define the determinant for n × n matrices. This inductive formula is called expansion by cofactors.

Let A = (aij) be an n × n matrix. Let Aij be the (n−1) × (n−1) matrix formed from A by deleting the ith row and the jth column. The matrices Aij are called cofactor matrices of A.

Inductively we define the determinant of an n × n matrix A by:

    det(A) = Σ_{j=1}^{n} (−1)^{1+j} a1j det(A1j) = a11 det(A11) − a12 det(A12) + · · ·     (7.1.9)

In Appendix 7.4 we show that the determinant function defined by (7.1.9) satisfies all properties of a determinant function. Formula (7.1.9) is also called expansion by cofactors along the 1st row, since the a1j are taken from the 1st row of A. Since det(A) = det(A^t), it follows that if (7.1.9) is valid as an inductive definition of determinant,
then expansion by cofactors along the 1st column is also valid. That is,

    det(A) = a11 det(A11) − a21 det(A21) + · · · + (−1)^{n+1} an1 det(An1).     (7.1.10)

We now explore some of the consequences of this definition, beginning with determinants of small matrices. For example, Definition 7.1.1(a) implies that the determinant of a 1 × 1 matrix is just

    det(a) = a.

Therefore, using (7.1.9), the determinant of a 2 × 2 matrix is:

    det [a11 a12; a21 a22] = a11 det(a22) − a12 det(a21) = a11 a22 − a12 a21,

which is just the formula for determinants of 2 × 2 matrices given in (7.1.2).

Similarly, we can now find a formula for the determinant of 3 × 3 matrices A as follows.

    det(A) = a11 det [a22 a23; a32 a33] − a12 det [a21 a23; a31 a33] + a13 det [a21 a22; a31 a32]
           = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a11 a23 a32 − a12 a21 a33 − a13 a22 a31.     (7.1.11)

As an example, compute

    det [2 1 4; 1 −1 3; 5 6 −2]

using formula (7.1.11) as

    2(−1)(−2) + 1·3·5 + 4·6·1 − 4(−1)5 − 3·6·2 − (−2)1·1 = 4 + 15 + 24 + 20 − 36 + 2 = 29.

There is a visual mnemonic for remembering how to compute the six terms in formula (7.1.11) for the determinant of 3 × 3 matrices. Write the matrix as a 3 × 5 array by repeating the first two columns, as shown in bold face in Figure 23: Then add the product of terms connected by solid lines sloping down and to the right and subtract the products of terms connected by dashed lines sloping up and to the right. Warning: this nice crisscross algorithm for computing determinants of 3 × 3 matrices does not generalize to n × n matrices.

When computing determinants of n × n matrices when n > 3, it is usually more efficient to compute the determinant using row reduction rather than by using formula (7.1.9). In the appendix to this chapter, Section 7.4, we verify that formula (7.1.9) actually satisfies the three properties of a determinant, thus completing the proof of Theorem 7.1.2.

An interesting and useful formula for reducing the effort in computing determinants is given by the following formula.

Lemma 7.1.9. Let A be an n × n matrix of the form

    A = [B 0; C D],

where B is a k × k matrix and D is an (n−k) × (n−k) matrix. Then

    det(A) = det(B) det(D).

Proof  We prove this result using (7.1.9) coupled with induction. Assume that this lemma is valid for all (n−1) × (n−1) matrices of the appropriate form. Now use (7.1.9) to compute

    det(A) = a11 det(A11) − a12 det(A12) + · · · ± a1n det(A1n)
           = b11 det(A11) − b12 det(A12) + · · · ± b1k det(A1k).

Note that the cofactor matrices A1j are obtained from A by deleting the 1st row and the jth column.
These matrices all have the form

    A1j = [B1j 0; Cj D],

where Cj is obtained from C by deleting the jth column. By induction on k,

    det(A1j) = det(B1j) det(D).

Determinants in MATLAB  The determinant function det has been preprogrammed in MATLAB and is easy to use. For example, typing e8_1_11 will load the matrix

    A = [1 2 3 0; 2 1 4 1; −2 −1 0 1; −1 0 −2 3].     (7.1.12*)

To compute the determinant of A just type det(A) and obtain the answer

    ans =
       -46

We can check this answer by row reducing A to upper triangular form. For example, clearing the entries in the 1st column below the 1st row gives

    A =
         1     2     3     0
         0    -3    -2     1
         0     3     6     1
         0     2     1     3

To clear the 2nd column below the 2nd row type

    A(3,:) = A(3,:) + A(2,:); A(4,:) = A(4,:) - A(4,2)*A(2,:)/A(2,2)

obtaining

    A =
        1.0000    2.0000    3.0000         0
             0   -3.0000   -2.0000    1.0000
             0         0    4.0000    2.0000
             0         0   -0.3333    3.6667

Finally, to clear the entry (4,3) type

    A(4,:) = A(4,:) - A(4,3)*A(3,:)/A(3,3)

to obtain

    A =
        1.0000    2.0000    3.0000         0
             0   -3.0000   -2.0000    1.0000
             0         0    4.0000    2.0000
             0         0         0    3.8333

To evaluate the determinant of A, which is now an upper triangular matrix, multiply the diagonal entries (for instance by typing prod(diag(A))); this gives

    ans =
       -46

as expected.

Exercises

In Exercises 1 – 3 compute the determinants of the given matrix.

1. A = [−2 1 0; 4 5 0; 1 0 2].

2. B = [1 0 2 3; −1 −2 3 2; 4 −2 0 3; 1 2 0 −3].

3. C = [2 1 −1 0 0; 1 −2 3 0 0; −3 2 −2 0 0; −1 1 1 2 4; 0 2 3 −1 −3].

4. Compute det [0 2 0 1; 1 −1 0 −1; 1 1 1 3; 0 1 0 1].

5. Find det(A^{−1}) where A = [−2 −3 2; 4 1 3; −1 1 1].
Let

    A = [2 −1 0; 0 3 0; 1 5 3]   and   B = [2 0 0; 0 −1 0; 0 0 3].

(a) For what values of λ is det(λA − B) = 0?

(b) Is there a vector x for which Ax = Bx?

12. Compute det [−1 2 3 1; 1 −1 2 −1; 1 1 1 3; 0 −3 2 −1].

In Exercises 13 – 14 verify that the given matrix has determinant −1.

18. Suppose that two n × p matrices A and B are row equivalent. Show that there is an invertible n × n matrix P such that B = PA.

19. Let A be an invertible n × n matrix and let b ∈ Rn be a column vector. Let Bj be the n × n matrix obtained from A by replacing the jth column of A by the vector b. Let x = (x1, . . . , xn)^t be the unique solution to Ax = b. Then Cramer's rule states that

    xj = det(Bj) / det(A).     (7.1.13)
Prove Cramer's rule. Hint: Let Aj be the jth column of A so that Aj = Aej. Show that

    Bj = A(e1 | · · · |ej−1 |x|ej+1 | · · · |en).

Using this product, compute the determinant of Bj and verify (7.1.13). (A numerical check of Cramer's rule is sketched after Exercise 25.)

20. Show that the determinant of a permutation matrix is 1 or −1. Hint: Related to Exercise 21. Use induction.

24. Let

    V3 = [x1^2 x1 1; x2^2 x2 1; x3^2 x3 1].

Verify that

    det(V3) = (x1 − x2)(x1 − x3)(x2 − x3).

25. The (n + 1) × (n + 1) Vandermonde matrix is

    Vn+1 = [x1^n  x1^{n−1}  · · ·  x1^2  x1  1;
            x2^n  x2^{n−1}  · · ·  x2^2  x2  1;
            ...
            x_{n+1}^n  x_{n+1}^{n−1}  · · ·  x_{n+1}^2  x_{n+1}  1].     (7.1.14)

Verify that

    det(Vn+1) = Π_{1≤i<j≤n+1} (xi − xj).     (7.1.15)
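Cramer's rule (7.1.13) is easy to test numerically before proving it. The sketch below builds each Bj by replacing a column of A with b and compares det(Bj)/det(A) with the solution returned by MATLAB's backslash operator; the particular A and b are arbitrary test data.

    % Numerical check of Cramer's rule (7.1.13) on random test data.
    n = 4;
    A = randn(n);            % (almost surely) invertible test matrix
    b = randn(n,1);

    x_cramer = zeros(n,1);
    for j = 1:n
        Bj = A;
        Bj(:,j) = b;                     % replace the j-th column of A by b
        x_cramer(j) = det(Bj)/det(A);    % Cramer's rule
    end

    x_backslash = A\b;                   % MATLAB's linear solver
    disp(norm(x_cramer - x_backslash))   % tiny (round-off only)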
7.2 Eigenvalues and Eigenvectors
    Av = λv.     (7.2.1)

It follows that the matrix A − λIn is singular, and Theorem 7.1.7 implies that

    det(A − λIn) = 0.

With these observations in mind, we can make the following definition.

Definition 7.2.1. Let A be an n × n matrix. The characteristic polynomial of A is:

    pA(λ) = det(A − λIn).

In Theorem 7.2.3 we show that pA(λ) is indeed a polynomial of degree n in λ. Note here that the roots of pA are the eigenvalues of A. As we discussed, the real eigenvalues of A are roots of the characteristic polynomial. Conversely, if λ is a real root of pA, then Theorem 7.1.7 states that the matrix A − λIn is singular and therefore that there exists a nonzero vector v such that (7.2.1) is satisfied. Similarly, by using this extended algebraic definition of eigenvalues we allow the possibility of complex eigenvalues. The complex analog of Theorem 7.1.7 shows that if λ is a complex eigenvalue, then there exists a nonzero complex n-vector v such that (7.2.1) is satisfied.

Since the determinant of a triangular matrix is the product of the diagonal entries, it follows that

    pA(λ) = (a11 − λ) · · · (ann − λ),     (7.2.2)

and hence that the diagonal entries of A are roots of the characteristic polynomial. A similar argument works if A is upper triangular.

It follows from (7.2.2) that the characteristic polynomial of a triangular matrix is a polynomial of degree n and that

    pA(λ) = (−1)^n λ^n + b_{n−1} λ^{n−1} + · · · + b0     (7.2.3)

for some real constants b0, . . . , b_{n−1}. In fact, this statement is true in general.

Theorem 7.2.3. Let A be an n × n matrix. Then pA is a polynomial of degree n of the form (7.2.3).

Proof  Let C be an n × n matrix whose entries have the form cij + dij λ. Then det(C) is a polynomial in λ of degree at most n. We verify this statement by induction. It is easily verified when n = 1, since then C = (c + dλ) for some real numbers c and d. Then det(C) = c + dλ which is a polynomial of degree at most one. (It may have degree zero, if d = 0.) So assume that this statement is true for (n−1) × (n−1) matrices. Recall from (7.1.9) that

    det(C) = (c11 + d11 λ) det(C11) + · · · + (−1)^{n+1} (c1n + d1n λ) det(C1n).
By induction each of the determinants C1j is a polynomial of degree at most n−1. It follows that multiplication by c1j + d1j λ yields a polynomial of degree at most n in λ. Since the sum of polynomials of degree at most n is a polynomial of degree at most n, we have verified our assertion.

Since A − λIn is a matrix whose entries have the desired form, it follows that pA(λ) is a polynomial of degree at most n in λ. To complete the proof of this theorem we need to show that the coefficient of λ^n is (−1)^n. Again, we verify this statement by induction. This statement is easily verified for 1 × 1 matrices — we assume that it is true for (n−1) × (n−1) matrices. Again use (7.1.9) to compute

    det(A − λIn) = (a11 − λ) det(B11) − a12 det(B12) + · · · + (−1)^{n+1} a1n det(B1n),

where B1j are the cofactor matrices of A − λIn. Using our previous observation all of the terms det(B1j) are polynomials of degree at most n − 1. Thus, in this expansion, the only term that can contribute a term of degree n is:

    −λ det(B11).

Note that the cofactor matrix B11 is the (n−1) × (n−1) matrix

    B11 = A11 − λI_{n−1},

where A11 is the first cofactor matrix of the matrix A. By induction, det(B11) is a polynomial of degree n − 1 with leading term (−1)^{n−1} λ^{n−1}. Multiplying this polynomial by −λ yields a polynomial of degree n with the correct leading term.

General Properties of Eigenvalues  The fundamental theorem of algebra states that every polynomial of degree n has exactly n roots (counting multiplicity). For example, the quadratic formula shows that every quadratic polynomial has exactly two roots. In general, the proof of the fundamental theorem is not easy and is certainly beyond the limits of this course. Indeed, the difficulty in proving the fundamental theorem of algebra is in proving that a polynomial p(λ) of degree n > 0 has one (complex) root. Suppose that λ0 is a root of p(λ); that is, suppose that p(λ0) = 0. Then it follows that

    p(λ) = (λ − λ0) q(λ)     (7.2.4)

for some polynomial q of degree n − 1. So once we know that p has a root, then we can argue by induction to prove that p has n roots. A linear algebra proof of (7.2.4) is sketched in Exercise 17.

Recall that a polynomial need not have any real roots. For example, the polynomial p(λ) = λ^2 + 1 has no real roots, since p(λ) > 0 for all real λ. This polynomial does have two complex roots ±i = ±√−1.

However, a polynomial with real coefficients has either real roots or complex roots that come in complex conjugate pairs. To verify this statement, we need to show that if λ0 is a complex root of p(λ), then so is \overline{λ0}. We claim that

    \overline{p(λ)} = p(\overline{λ}).

To verify this point, suppose that

    p(λ) = cn λ^n + c_{n−1} λ^{n−1} + · · · + c0,

where each cj ∈ R. Then

    \overline{p(λ)} = cn \overline{λ}^n + c_{n−1} \overline{λ}^{n−1} + · · · + c0 = p(\overline{λ}).

If λ0 is a root of p(λ), then

    p(\overline{λ0}) = \overline{p(λ0)} = \overline{0} = 0.
(a) pA(λ) = (λ1 − λ) · · · (λn − λ),

(b) det(A) = λ1 · · · λn.

The eigenvalues of a matrix do not have to be different. For example, consider the extreme case of a strictly triangular matrix A. Example 7.2.2 shows that all of the eigenvalues of A are zero.

We now discuss certain properties of eigenvalues.

Corollary 7.2.5. Let A be an n × n matrix. Then A is invertible if and only if zero is not an eigenvalue of A.

Proof  The proof follows from Theorem 7.1.7 and Theorem 7.2.4(b).

Lemma 7.2.6. Let A be a singular n × n matrix. Then the null space of A is the span of all eigenvectors whose associated eigenvalue is zero.

Theorem 7.2.7. Let A be an invertible n × n matrix with eigenvalues λ1, · · · , λn. Then the eigenvalues of A^{−1} are λ1^{−1}, · · · , λn^{−1}.

Proof  Compute

    (−1)^n det(A) λ^n p_{A^{−1}}(1/λ) = (−λ)^n det(A) det(A^{−1} − (1/λ) In)
                                      = det(−λA) det(A^{−1} − (1/λ) In)
                                      = det(−λA (A^{−1} − (1/λ) In))
                                      = det(A − λIn)
                                      = pA(λ),

which verifies the claim.

Theorem 7.2.8. Let A and B be similar n × n matrices. Then

    pA = pB,

and hence the eigenvalues of A and B are identical.
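Theorems 7.2.7 and 7.2.8 are easy to illustrate numerically. The sketch below uses an arbitrary invertible test matrix and an arbitrary invertible change of coordinates:

    % Illustrate Theorems 7.2.7 and 7.2.8 numerically.
    A = [2 1 0; 1 3 1; 0 1 4];     % an invertible test matrix
    P = [1 1 0; 0 1 1; 1 0 2];     % an invertible change of coordinates

    % Eigenvalues of inv(A) are the reciprocals of the eigenvalues of A.
    disp(sort(eig(inv(A))))
    disp(sort(1./eig(A)))

    % Similar matrices have the same characteristic polynomial.
    B = inv(P)*A*P;
    disp(poly(A))
    disp(poly(B))                  % same coefficients, up to round-off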
Recall that the trace of an n × n matrix A is the sum of the diagonal entries of A; that is,

    tr(A) = a11 + · · · + ann.

We state without proof the following theorem:

Theorem 7.2.9. Let A be an n × n matrix with eigenvalues λ1, . . . , λn. Then

    tr(A) = λ1 + · · · + λn.

It follows from Theorem 7.2.8 that the traces of similar matrices are equal.

Definition 7.2.10. Associated with each eigenvalue λ of the square matrix A is a vector subspace of Rn. This subspace, called the eigenspace of λ, is the null space of A − λIn and consists of all eigenvectors associated with the eigenvalue λ.

MATLAB Calculations  The commands for computing characteristic polynomials and eigenvalues of square matrices are straightforward in MATLAB. In particular, for an n × n matrix A, the MATLAB command poly(A) returns the coefficients of (−1)^n pA(λ).

For example, reload the 4 × 4 matrix A of (7.1.12*) by typing e8_1_11. The characteristic polynomial of A is found by typing poly(A); it is

    pA(λ) = λ^4 − 5λ^3 + 15λ^2 − 10λ − 46.

The eigenvalues of A are found by typing eig(A) and obtaining

    ans =
      -1.2224
       1.6605 + 3.1958i
       1.6605 - 3.1958i
       2.9014

Thus A has two real eigenvalues and one complex conjugate pair of eigenvalues. Note that MATLAB has preprogrammed not only the algorithm for finding the characteristic polynomial, but also numerical routines for finding the roots of the characteristic polynomial.

The trace of A is found by typing trace(A) and obtaining

    ans =
         5

Using the MATLAB command sum we can verify the statement of Theorem 7.2.9. Indeed sum(v) computes the sum of the components of the vector v and typing

    sum(eig(A))
Exercises
8 5
6. Consider the matrix A = .
−10 −7
A2 + A + In = 0.
4. Consider the matrix
Prove that A is invertible.
−1 1 1
A= 1 −1 1 .
1 1 −1
9. Let A be an n × n matrix. Explain why the eigenvalues of
(a) Verify that the characteristic polynomial of A is pλ (A) = A and At are identical.
(λ − 1)(λ + 2)2 .
(b) Show that (1, 1, 1) is an eigenvector of A corresponding
In Exercises 10 – 12 decide whether the given statements are
to λ = 1.
true or false. If the statements are false, give a counterexam-
(c) Show that (1, 1, 1) is orthogonal to every eigenvector of ple; if the statements are true, give a proof.
A corresponding to the eigenvalue λ = −2.
10. If the eigenvalues of a 2 × 2 matrix are equal to 1, then
the four entries of that matrix are each less than 500.
5. Let
11. If A is a 4 × 4 matrix and det(A) > 0, then det(−A) > 0.
0 −3 −2
A= 1 −4 −2
−3 4 1 12. The trace of the product of two n × n matrices is the
product of the traces.
One of the eigenvalues of A is −1. Find the other eigenvalues
of A.
13. When n is odd show that every real n × n matrix has a 17. (matlab) Verify (7.2.4) by proving the following. Let
real eigenvalue. Pn be the vector space of polynomials in λ of degree less than
or equal to n.
In Exercises 14 – 15, use MATLAB to compute (a) the eigen- (a) Prove that dim Pn is n+1 by showing that {1, λ, . . . , λn }
values, traces, and characteristic polynomials of the given is a basis.
matrix. (b) Use the results from part (a) to confirm The- (b) For every λ0 prove that {1, λ − λ0 , . . . , (λ − λ0 )n } is a
orems 7.2.7 and 7.2.9. basis of Pn .
14. (matlab) (c) Use (b) to verify (7.2.4).
−12 −19 −3 14 0
18. Let A be an n × n matrix. List as TRUE all of the
−12 10 14 −19 8
(7.2.5*)
A= 4 −2 1 7 −3 following that are equivalent to A being invertible and FALSE
.
otherwise:
−9 17 −12 −5 −8
−12 −1 7 13 −12
(a) dim(range(LA )) = n
15. (matlab)
(b) A has n distinct real eigenvalues
(c) 0 is not an eigenvalue of A
−12 −5 13 −6 −5 12
7 14 6 1 8 18
(d) The system of equations Ax = e1 is consistent
−8 14 13 9 2 1
(7.2.6*)
B= .
(e) The system of equations Ax = e1 has a unique solution
2 4 6 −8 −2 15
(f) A is similar to In
−14 0 −6 14 8 −13
8 16 −8 3 5 19
(g) det(A) 6= 0
(h) The rows of A form a basis for Rn
16. (matlab) Use MATLAB to compute the characteristic
polynomial of the following matrix:
4 −6 7
A= 2 0 5
−10 2 5
B = −(A3 + p2 A2 + p1 A + p0 I).
7.3 Real Diagonalizable Matrices Since A is triangular, it follows that both eigenvalues of
A are equal to 1. Since A is not the identity matrix,
An n × n matrix is real diagonalizable if it is similar to
it cannot be diagonalizable. More generally, if N is a
a diagonal matrix. More precisely, an n × n matrix A
nonzero strictly upper triangular n × n matrix, then the
is real diagonalizable if there exists an invertible n × n
matrix In + N is not diagonalizable.
matrix S such that
These examples show that complex eigenvalues are al-
D = S −1 AS ways obstructions to real diagonalization and multiple
real eigenvalues are sometimes obstructions to diagonal-
is a diagonal matrix. In this section we investigate when ization. Indeed,
a matrix is diagonalizable. In this discussion we assume
that all matrices have real entries. Theorem 7.3.1. Let A be an n×n matrix with n distinct
We begin with the observation that not all matrices are real eigenvalues. Then A is real diagonalizable.
real diagonalizable. We saw in Example 7.2.2 that the
diagonal entries of the diagonal matrix D are the eigen- There are two ideas in the proof of Theorem 7.3.1, and
values of D. Theorem 7.2.8 states that similar matrices they are summarized in the following lemmas.
have the same eigenvalues. Thus if a matrix is real diag- Lemma 7.3.2. Let λ1 , . . . , λk be distinct real eigenvalues
onalizable, then it must have real eigenvalues. It follows, for an n × n matrix A. Let vj be eigenvectors associated
for example, that the 2 × 2 matrix with the eigenvalue λj . Then {v1 , . . . , vk } is a linearly
0 −1
independent set.
1 0
is not real diagonalizable, since its eigenvalues are ±i. Proof We prove the lemma by using induction on k.
When k = 1 the proof is simple, since v1 6= 0. So we can
However, even if a matrix A has real eigenvalues, it need assume that {v1 , . . . , vk−1 } is a linearly independent set.
not be diagonalizable. For example, the only matrix sim-
ilar to the identity matrix In is the identity matrix itself. Let α1 , . . . , αk be scalars such that
To verify this point, calculate
α1 v1 + · · · + αk vk = 0. (7.3.1)
−1 −1
S In S = S S = In .
We must show that all αj = 0.
Suppose that A is a matrix all of whose eigenvalues are Begin by multiplying both sides of (7.3.1) by A, to obtain:
equal to 1. If A is similar to a diagonal matrix D, then
D must have all of its eigenvalues equal to 1. Since the 0 = A(α1 v1 + · · · + αk vk )
identity matrix is the only diagonal matrix with all eigen- = α1 Av1 + · · · + αk Avk (7.3.2)
values equal to 1, D = In . So, if A is similar to a diagonal
= α1 λ1 v1 + · · · + αk λk vk .
matrix, it must itself be the identity matrix. Consider,
however, the 2 × 2 matrix
Now subtract λk times (7.3.1) from (7.3.2), to obtain:
1 1
A= . α1 (λ1 − λk )v1 + · · · + αk−1 (λk−1 − λk )vk−1 = 0.
0 1
Since {v1 , . . . , vk−1 } is a linearly independent set, it fol- {v1 , . . . , vn } is a linearly independent set of eigenvectors
lows that of A.
αj (λj − λk ) = 0, Since D is diagonal, Dej = λj ej for some real number
for j = 1, . . . , k − 1. Since all of the eigenvalues are λj . It follows that
distinct, λj − λk 6= 0 and αj = 0 for j = 1, . . . , k − 1. Avj = SDS −1 vj = SDS −1 Sej = SDej = λj Sej = λj vj .
Substituting this information into (7.3.1) yields αk vk =
0. Since vk 6= 0, αk is also equal to zero. So vj is an eigenvector of A. Since the matrix S is in-
vertible, its columns are linearly independent. Since the
Lemma 7.3.3. Let A be an n × n matrix. Then A is columns of S are vj , the set {v1 , . . . , vn } is a linearly
real diagonalizable if and only if A has n real linearly independent set of eigenvectors of A, as claimed.
independent eigenvectors.
Proof of Theorem 7.3.1 Let λ1 , . . . , λn be the dis-
tinct eigenvalues of A and let v1 , . . . , vn be the cor-
Proof Suppose that A has n linearly independent responding eigenvectors. Lemma 7.3.2 implies that
eigenvectors v1 , . . . , vn . Let λ1 , . . . , λn be the corre- {v1 , . . . , vn } is a linearly independent set in Rn and there-
sponding eigenvalues of A; that is, Avj = λj vj . Let fore a basis. Lemma 7.3.3 implies that A is diagonaliz-
S = (v1 | · · · |vn ) be the n × n matrix whose columns are able.
the eigenvectors vj . We claim that D = S −1 AS is a
diagonal matrix. Compute Remark. Theorem 7.3.1 can be generalized as follows.
Suppose all eigenvalues of the n × n matrix A are real.
D = S −1 AS = S −1 A(v1 | · · · |vn ) = S −1 (Av1 | · · · |Avn ) Then A is diagonalizable if and only if the dimension of
the eigenspace associated with each eigenvalue λ is equal
= S −1 (λ1 v1 | · · · |λn vn ). to the number of times λ is an eigenvalue of A. Issues
surrounding this remark are discussed in Chapter 11.
It follows that
Note that
−6 12 4
S −1 vj = ej , A= 8 −21 −8 . (7.3.3*)
−29 72 27
since
Sej = vj . We use MATLAB to answer the questions: Is A real
diagonalizable and, if it is, can we find the matrix S such
Therefore, that S −1 AS is diagonal? We can find the eigenvalues of
D = (λ1 e1 | · · · |λn en ) A by typing eig(A). MATLAB’s response is:
is a diagonal matrix.
ans =
Conversely, suppose that A is a real diagonalizable ma- -2.0000
trix. Then there exists an invertible matrix S such that -1.0000
D = S −1 AS is diagonal. Let vj = Sej . We claim that 3.0000
Since the eigenvalues of A are real and distinct, Theo- and MATLAB responds with
rem 7.3.1 states that A can be diagonalized. That is,
there is a matrix S such that S =
−1 0 0
-0.7071 0.8729 -0.0000
S −1 AS = 0 −2 0 -0.0000 0.4364 -0.3162
0 0 3 -0.7071 -0.2182 0.9487
Finally, check that S −1 AS is the desired diagonal matrix (a) Find the eigenvalues and eigenvectors of A.
by typing inv(S)*A*S to obtain (b) Find an invertible matrix S such that S −1 AS is a diag-
onal matrix D. What is D?
ans =
-1.0000 0.0000 0
0.0000 -2.0000 -0.0000 2. The eigenvalues of
0.0000 0 3.0000
−1 2 −1
It is cumbersome to use the null command to find eigen- A= 3 0 1
vectors and MATLAB has been preprogrammed to do −3 −2 −3
these computations automatically. We can use the eig
are 2, −2, −4. Find the eigenvectors of A for each of these
command to find the eigenvectors and eigenvalues of a eigenvalues and find a 3×3 invertible matrix S so that S −1 AS
matrix A, as follows. Type is diagonal.
[S,D] = eig(A)
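The call [S,D] = eig(A) returns the eigenvectors as the columns of S and the eigenvalues on the diagonal of D, so the diagonalization can be confirmed directly. A minimal sketch for the matrix A of (7.3.3*):

    % Confirm the diagonalization of the matrix in (7.3.3*).
    A = [ -6  12   4;
           8 -21  -8;
         -29  72  27];
    [S, D] = eig(A);          % columns of S are eigenvectors, D is diagonal
    disp(diag(D))             % eigenvalues -2, -1, 3 (in some order)
    disp(norm(A*S - S*D))     % tiny: A*S = S*D, so inv(S)*A*S = D
    disp(inv(S)*A*S)          % numerically diagonal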
12. (matlab)
5. Let A and B be n × n matrices. Suppose that A is real
diagonalizable and that B is similar to A. Show that B is real
−2.2 4.1 −1.5 −0.2
diagonalizable.
−3.4 4.8 −1.0 0.2
A=
. (7.3.5*)
−1.0 0.4 1.9 0.2
−14.5 17.8 −6.7 0.6
6. Let A be an n × n real diagonalizable matrix. Show that
13. (matlab)
A + αIn is also real diagonalizable.
1.9 2.2 1.5 −1.6 −2.8
0.8 2.6 1.5 −1.8 −2.0
7. Let A be an n × n matrix with a real eigenvalue λ and
(7.3.6*)
B= 2.6 2.8 1.6 −2.1 −3.8 .
associated eigenvector v. Assume that all other eigenvalues
4.8 3.6 1.5 −3.1 −5.2
of A are different from λ. Let B be an n × n matrix that
−2.1 1.2 1.7 −0.2 0.0
commutes with A; that is, AB = BA. Show that v is also an
eigenvector for B.
7.4 *Existence of Determinants j > 2, let Ĉ be obtained from C by swapping rows j and
2. The cofactors Ĉ1k are then obtained from the cofac-
The purpose of this appendix is to verify the inductive
tors C1k by swapping rows j − 1 and 1. The induction
definition of determinant (7.1.9). We have already shown
hypothesis then implies that det(Ĉ1k ) = − det(C1k ) and
that if a determinant function exists, then it is unique.
det(Ĉ) = − det(C). Thus, verifying that det(C) = 0
We also know that the determinant function exists for
reduces to verifying the result when rows 1 and 2 are
1 × 1 matrices. So we assume by induction that the de-
equal.
terminant function exists for (n − 1) × (n − 1) matrices
and prove that the inductive definition gives a determi- Indeed, the most difficult part of this proof is the calcula-
nant function for n × n matrices. tion that shows that if the 1st and 2nd rows of C are equal,
then D(C) = 0. This calculation is tedious and requires
Recall that Aij is the cofactor matrix obtained from A
some facility with indexes and summations. Rather than
by deleting the ith row and j th column — so Aij is an
prove this for general n, we present the proof for n = 4.
(n − 1) × (n − 1) matrix. The inductive definition is:
This case contains all of the ideas of the general proof.
D(A) = a11 det(A11 )−a12 det(A12 )+· · ·+(−1)n+1 a1n det(A1n ). We can assume that
a1 a2 a3 a4
We use the notation D(A) to remind us that we have not a1 a2 a3 a4
yet verified that this definition satisfies properties (a)- C= c31 c32 c33 c34
(c) of Definition 7.1.1. In this appendix we verify these
c41 c42 c43 c44
properties after assuming that the inductive definition
satisfies properties (a)-(c) for (n − 1) × (n − 1) matrices.
Using the cofactor definition D(C) =
For emphasis, we use the notation det to indicate the
determinant of square matrices of size less than n. Note
a2 a3 a4 a1 a3 a4
that Lemma 7.1.5, the computation of determinants of el- a1 det c32 c33 c34 − a2 det c31 c33 c34 +
ementary row operations, can therefore be assumed valid c42 c43 c44 c41 c43 c44
for (n − 1) × (n − 1) matrices. a1 a2 a4 a1 a2 a3
c31 c32 c34 − a4 det c31 c32 c33 .
We begin with the following two lemmas. a3 det
c41 c42 c44 c41 c42 c43
Lemma 7.4.1. Let C be an n × n matrix. If two rows
Next we expand each of the four 3 × 3 matrices along
of C are equal or one row of C is zero, then D(C) = 0.
their 1st rows, obtaining D(C) =
c33 c34 c32 c34 c32 c33
Proof Suppose that row i of C is zero. If i > 1, then a1 a2 det − a3 det + a4 det
each cofactor has a zero row and by induction the deter- c43 c44 c42 c44 c42 c43
c33 c34 c31 c34 c31 c33
minant of the cofactor is 0. If row 1 is zero, then the −a2 a1 det
c c
− a3 det
c c
+ a4 det
43 44 41 44 c41 c43
cofactor expansion is 0 and D(C) = 0.
c32 c34
c31 c34
c31 c32
+a3 a1 det − a2 det + a4 det
Suppose that row i and row j are equal, where i > 1 c42 c44 c41 c44 c41 c42
and j > 1. Then the result follows by the induction hy- −a4 a1 det
c32 c33
− a2 det
c31 c33
+ a3 det
c31 c32
c42 c43 c41 c43 c41 c42
pothesis, since each of the cofactors has two equal rows.
So, we can assume that row 1 and row j are equal. If Combining the 2×2 determinants leads to D(C) = 0.
Lemma 7.4.2. Let E be an n×n elementary row matrix (II) Next suppose that E adds row i to row j. If i, j > 1,
and let B be an n × n matrix. Then then the result follows from the induction hypothesis
since the new cofactors are obtained from the old co-
D(EB) = D(E)D(B) (7.4.1) factors by adding row i − 1 to row j − 1.
and direct calculation shows that the bottom row of RB a nonzero n-vector w such that Rt w = 0. It follows that
is also zero. Hence D(RB) = 0 and property (c) is valid. v = (Est )−1 · · · (E1t )−1 w satisfies At w = Rt v = 0 and At
is singular.
Next suppose now that A is nonsingular. It follows that
At = Rt E1t · · · Est
8 Linear Maps and Changes of Coordinates

8.1 Linear Mappings and Bases
Linear independence implies that αj − βj = 0; that is There are two assertions made in Theorem 8.1.2. The
αj = βj . We can now define first is that a linear map exists mapping vi to wi . The
second is that there is only one linear mapping that ac-
L(v) = α1 w1 + · · · + αn wn . (8.1.1) complishes this task. If we drop the constraint that the
map be linear, then many mappings may satisfy these
We claim that L is linear. Let v̂ ∈ V be another vector conditions. For example, find a linear map from R → R
and let that maps 1 to 4. There is only one: y = 4x. However
there are many nonlinear maps that send 1 to 4. Exam-
v̂ = β1 v1 + · · · + βn vn .
ples are y = x + 3 and y = 4x2 .
It follows that
Finding the Matrix of a Linear Map from Rn → Rm Given
v + v̂ = (α1 + β1 )v1 + · · · + (αn + βn )vn ,
by Theorem 8.1.2 Suppose that V = Rn and W = Rm .
We know that every linear map L : Rn → Rm can be
and hence by (8.1.1) that
defined as multiplication by an m × n matrix. The ques-
tion that we next address is: How can we find the matrix
L(v + v̂) = (α1 + β1 )w1 + · · · + (αn + βn )wn
whose existence is guaranteed by Theorem 8.1.2?
= (α1 w1 + · · · + αn wn ) + (β1 w1 + · · · + βn wn )
More precisely, let v1 , . . . , vn be a basis for Rn and let
= L(v) + L(v̂).
w1 , . . . , wn be vectors in Rm . We suppose that all of
these vectors are row vectors. Then we need to find an
Similarly m × n matrix A such that Avit = wit for all i. We find
A as follows. Let v ∈ Rn be a row vector. Since the vi
L(cv) = L((cα1 )v1 + · · · + (cαn )vn ) form a basis, there exist scalars αi such that
= c(α1 w1 + · · · + αn wn )
v = α1 v1 + · · · + αn vn .
= cL(v).
In coordinates
Thus L is linear.
α1
Let M : V → W be another linear mapping such that v t = (v1t | · · · |vnt ) ... , (8.1.2)
M (vi ) = wi . Then αn
where (w1t | · · · |wnt ) is an m × n matrix. Using (8.1.2) we Now apply (8.1.3) to obtain
see that
1 0 1
1 2 1 1 −1 1 −1
Av t = (w1t | · · · |wnt )(v1t | · · · |vnt )−1 v t , A=
0 1 −1
−1 0 1 =
1 −1 3
.
2
−3 2 −5
and
A = (w1t | · · · |wnt )(v1t | · · · |vnt )−1 (8.1.3) As a check, verify by matrix multiplication that Avi =
wi , as claimed.
is the desired m × n matrix.
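Formula (8.1.3) translates directly into MATLAB: stack the basis vectors vi and the target vectors wi as columns and multiply on the right by the inverse of the basis matrix. A minimal sketch, with hypothetical small vectors chosen only for illustration:

    % Build the matrix A of the linear map sending v_i to w_i, as in (8.1.3):
    %     A = (w1^t | ... | wn^t) * (v1^t | ... | vn^t)^(-1).
    V = [1 0; 1 1];          % columns are v1^t, v2^t (a basis of R^2)
    W = [2 1; 0 3];          % columns are the desired images w1^t, w2^t

    A = W / V;               % same as W*inv(V), but numerically preferable
    disp(A*V - W)            % zero matrix: A maps each v_i to w_i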
is invertible. This can either be done in MATLAB using Proof If w1 , . . . , wn is a basis for W , then use The-
the inv command or by hand by row reducing the matrix orem 8.1.2 to define a linear map M : W → V by
M (wj ) = vj . Note that
1 −1 0 1 0 0
4 1 1 0 1 0 L◦M (wj ) = L(vj ) = wj .
1 1 0 0 0 1
It follows by linearity (using the uniqueness part of The-
to obtain orem 8.1.2) that L◦M is the identity of W . Similarly,
M ◦L is the identity map on V , and L is invertible.
1 0 1
1
(v1t |v2t |v3t )−1 = −1 0 1 . Conversely, suppose that L◦M and M ◦L are identity
2
−3 2 −5 maps and that wj = L(vj ). We must show that
then apply M to both sides of this equation to obtain v1 = (1, 0, 2) v2 = (2, −1, 1) v3 = (−2, 1, 0)
0 = M (α1 w1 + · · · + αn wn ) = α1 v1 + · · · + αn vn . and
w1 = (−1, 0) w2 = (0, 1) w3 = (3, 1).
But the vj are linearly independent. Therefore, αj = 0
and the wj are linearly independent.
2. Let Pn be the vector space of polynomials p(t) of degree
To show that the wj span W , let w be a vector in less than or equal to n. Show that {1, t, t2 , . . . , tn } is a basis
W . Since the vj are a basis for V , there exist scalars for Pn .
β1 , . . . , βn such that
5. Show that 10. Let M(n) denote the vector space of n × n matrices and
let A be an n × n matrix. Let L : M(n) → M(n) be the
Z t
L(p) = p(s)ds
0 mapping defined by L(X) = AX − XA where X ∈ M(n).
is a linear mapping of P2 → P3 . Verify that L is a linear mapping. Show that the null space
of L, {X ∈ M(n) : L(X) = 0}, is a subspace consisting of all
matrices X that commute with A.
6. Let P3 ⊂ C 1 be the vector space of polynomials of degree
less than or equal to three. Let T : P3 → R be the function
dp 11. Which of the following are True and which False. Give
T (p) = (0), where p ∈ P. reasons for your answer.
dt
(a) Show that T is linear. (a) For any n × n matrix A, det(A) is the product of its n
(b) Find a basis for the null space of T . eigenvalues.
(c) Let S : P3 → R be the function S(p) = p(0)2 . Show that (b) Similar matrices always have the same eigenvectors.
S is not linear. (c) For any n × n matrix A and scalar k ∈ R, det(kA) =
kn det(A).
(d) There is a linear map L : R3 → R2 such that
7. Use Exercises 4, 5 and Theorem 8.1.2 to show that
L(1, 2, 3) = (0, 1) and L(2, 4, 6) = (1, 1).
d
◦L : P2 → P2
dt (e) The only rank 0 matrix is the zero matrix.
is the identity map.
Z 2π
12. Let L : C 1 → R be defined by L(f ) = f (t) cos(t)dt
8. Let W ⊂ Rn be a k-dimensional subspace where k < n. 0
for f ∈ C . Verify that L is a linear mapping.
1
Define
W ⊥ = {v ∈ Rn : v · w = 0 for all w ∈ W }
13. Let P be the vector space of polynomials in one variable
(a) Show that W ⊥ is a subspace of Rn .
Z t
t. Define L : P → P by L(p)(t) = (s − 1)p(s)ds. Verify
(b) Find a basis for W ⊥ in the special case that W = 0
that L is a linear mapping.
span{e1 , e2 , e3 } ⊂ R5 .
9. Let C denote the set of complex numbers. Verify that C is 14. Show that
a two-dimensional vector space. Show that L : C → C defined d2
: P4 → P2
by dt2
L(z) = λz, is a linear mapping. Then compute bases for the null space
where λ = σ + iτ is a linear mapping. d2
and range of 2 .
dt
8.2 Row Rank Equals Column Rank So the null space of L is closed under addition and scalar
multiplication and is a subspace of V .
Let A be an m × n matrix. The row space of A is the
span of the row vectors of A and is a subspace of Rn . To prove that the range of L is a subspace of W , let w1
The column space of A is the span of the columns of A and w2 be in the range of L. Then, by definition, there
and is a subspace of Rm . exist v1 and v2 in V such that L(vj ) = wj . It follows
that
Definition 8.2.1. The row rank of A is the dimension
of the row space of A and the column rank of A is the L(v1 + v2 ) = L(v1 ) + L(v2 ) = w1 + w2 .
dimension of the column space of A.
Therefore, w1 + w2 is in the range of L. Similarly,
Lemma 5.5.4 of Chapter 5 states that
L(cv1 ) = cL(v1 ) = cw1 .
row rank(A) = rank(A).
So the range of L is closed under addition and scalar
We show below that row ranks and column ranks are multiplication and is a subspace of W .
equal. We begin by continuing the discussion of the pre-
vious section on linear maps between vector spaces.
Suppose that A is an m × n matrix and LA : Rn → Rm is
the associated linear map. Then the null space of LA is
Null Space and Range Each linear map between vector precisely the null space of A, as defined in Definition 5.2.1
spaces defines two subspaces. Let V and W be vector of Chapter 5. Moreover, the range of LA is the column
spaces and let L : V → W be a linear map. Then space of A. To verify this, write A = (A1 | · · · |An ) where
Aj is the j th column of A and let v = (v1 , . . . vn )t . Then,
null space(L) = {v ∈ V : L(v) = 0} ⊂ V LA (v) is the linear combination of columns of A
and
LA (v) = Av = v1 A1 + · · · + vn An .
range(L) = {L(v) ∈ W : v ∈ V } ⊂ W.
Lemma 8.2.2. Let L : V → W be a linear map between There is a theorem that relates the dimensions of the null
vector spaces. Then the null space of L is a subspace of space and range with the dimension of V .
V and the range of L is a subspace of W .
Theorem 8.2.3. Let V and W be vector spaces with V
finite dimensional and let L : V → W be a linear map.
Proof The proof that the null space of L is a subspace Then
of V follows from linearity in precisely the same way that
the null space of an m × n matrix is a subspace of Rn . dim(V ) = dim(null space(L)) + dim(range(L)).
That is, if v1 and v2 are in the null space of L, then
spanned by the vectors L(vj ) where v1 , . . . , vn is a basis Row Rank and Column Rank Recall Theorem 5.5.6 of
for V ). Let u1 , . . . , uk be a basis for the null space of L Chapter 5 that states that the nullity plus the rank of an
and let w1 , . . . , w` be a basis for the range of L. Choose m × n matrix equals n. At first glance it might seem that
vectors yj ∈ V such that L(yj ) = wj . We claim that this theorem and Theorem 8.2.3 contain the same infor-
u1 , . . . , uk , y1 , . . . , y` is a basis for V , which proves the mation, but they do not. Theorem 5.5.6 of Chapter 5
theorem. is proved using a detailed analysis of solutions of linear
To verify that u1 , . . . , uk , y1 , . . . , y` are linear indepen- equations based on Gaussian elimination, back substitu-
dent, suppose that tion, and reduced echelon form, while Theorem 8.2.3 is
proved using abstract properties of linear maps.
α1 u1 + · · · + αk uk + β1 y1 + · · · + β` y` = 0. (8.2.1) Let A be an m × n matrix. Theorem 5.5.6 of Chapter 5
states that
Apply L to both sides of (8.2.1) to obtain
β1 w1 + · · · + β` w` = 0. nullity(A) + rank(A) = n.
Since the wj are linearly independent, it follows that βj = Meanwhile, Theorem 8.2.3 states that
0 for all j. Now (8.2.1) implies that
dim(null space(LA )) + dim(range(LA )) = n.
α1 u1 + · · · + αk uk = 0.
Since the uj are linearly independent, it follows that αj = But the dimension of the null space of LA equals the
0 for all j. nullity of A and the dimension of the range of A equals
the dimension of the column space of A. Therefore,
To verify that u1 , . . . , uk , y1 , . . . , y` span V , let v be in
V . Since w1 , . . . , w` span W , it follows that there exist nullity(A) + dim(column space(A)) = n.
scalars βj such that
9. Let
a11 a12 a13 a14 a15
A=
a21 a22 a23 a24 a25
and suppose a11 a22 − a12 a21 6= 0. What is the nullity(A)?
Explain your answer.
8.3 Vectors and Matrices in uniquely as a linear combination of vectors in W; that is,
Coordinates v = α1 w1 + · · · + αn wn ,
In the last half of this chapter we discuss how similarity
of matrices should be thought of as change of coordi- for uniquely defined scalars α1 , . . . , αn .
nates for linear mappings. There are three steps in this
discussion. Proof Since W is a basis, Theorem 5.5.3 of Chapter 5
implies that the vectors w1 , . . . , wn span V and are lin-
(a) Formalize the idea of coordinates for a vector in early independent. Therefore, we can write v in V as a
terms of basis. linear combination of vectors in B. That is, there are
scalars α1 , . . . , αn such that
(b) Discuss how to write a linear map as a matrix in
each coordinate system. v = α1 w1 + · · · + αn wn .
(c) Determine how the matrices corresponding to the Next we show that these scalars are uniquely defined.
same linear map in two different coordinate systems Suppose that we can write v as a linear combination of
are related. the vectors in B in a second way; that is, suppose
Coordinates of Vectors using Bases Throughout, we Since the vectors in W are linearly independent, it follows
have written vectors v ∈ Rn in coordinates as v = that αj = βj for all j.
(v1 , . . . , vn ), and we have used this notation almost with-
out comment. From the point of view of vector space Definition 8.3.2. Let W = {w1 , . . . , wn } be a basis in
operations, we are just writing a vector space V . Lemma 8.3.1 states that we can write
v ∈ V uniquely as
v = v1 e 1 + · · · + vn e n
v = α1 w1 + · · · + αn wn . (8.3.1)
as a linear combination of the standard basis E =
{e1 , . . . , en } of Rn . The scalars α1 , . . . , αn are the coordinates of v relative to
the basis W, and we denote the coordinates of v in the
More generally, each basis provides a set of coordinates
basis W by
for a vector space. This fact is described by the following
lemma (although its proof is identical to the first part of [v]W = (α1 , . . . , αn ) ∈ Rn . (8.3.2)
the proof of Theorem 8.1.2.
Lemma 8.3.1. Let W = {w1 , . . . , wn } be a basis for the We call the coordinates of a vector v ∈ Rn relative to the
vector space V . Then each vector v in V can be written standard basis, the standard coordinates of v.
Writing Linear Maps in Coordinates as Matrices   Let V be a finite dimensional vector space of dimension n and let L : V → V be a linear mapping. We now show how each basis of V allows us to associate an n × n matrix to L. Previously we considered this question with the standard basis on V = Rn. We showed in Chapter 3 that we can write the linear mapping L as a matrix mapping, as follows. Let E = {e1, . . . , en} be the standard basis in Rn. Let A be the n × n matrix whose jth column is the n vector L(ej). Then Chapter 3, Theorem 3.3.5 shows that the linear map is given by matrix multiplication as

L(v) = Av.

Thus every linear mapping on Rn can be written in this matrix form.

Remark. Another way to think of the jth column of the matrix A is as the coordinate vector of L(ej) relative to the standard basis, that is, as [L(ej)]E. We denote the matrix A by [L]E; this notation emphasizes the fact that A is the matrix of L relative to the standard basis.

We now discuss how to write a linear map L as a matrix using different coordinates.

Proof   The process of choosing the coordinates of vectors relative to a given basis W = {w1, . . . , wn} of a vector space V is itself linear. Indeed,

[u + v]W = [u]W + [v]W
[cv]W = c[v]W.

Thus the coordinate mapping relative to a basis W of V defined by

v ↦ [v]W    (8.3.4)

is a linear mapping of V into Rn. We denote this linear mapping by [·]W : V → Rn.

It now follows that both the left hand and right hand sides of (8.3.3) can be thought of as linear mappings of V → Rn. In verifying this comment, we recall (Lemma 8.1.3) that the composition of linear maps is linear. On the left hand side we have the mapping

v ↦ L(v) ↦ [L(v)]W,

which is the composition of the linear maps [·]W with L. See (8.3.4). The right hand side is
Computations of Vectors in Coordinates in Rn   We divide this subsection into three parts. We consider a simple example in R2 algebraically in the first part and geometrically in the second. In the third part we formalize and extend the algebraic discussion to Rn.

An Example of Coordinates in R2   How do we find the coordinates of a vector v in a basis? For example, choose a (nonstandard) basis in the plane — say

w1 = (1, 1) and w2 = (1, −2).

Since {w1, w2} is a basis, we may write the vector v as a linear combination of the vectors w1 and w2. Thus we can find scalars α1 and α2 so that

and (1.5, 0.5) are the coordinates of v in the basis {w1, w2}.

Using the notation in (8.3.2), we may rewrite (8.3.5) as

[v]W = (1/3) ( 2   1
               1  −1 ) [v]E,

where E = {e1, e2} is the standard basis.

Planar Coordinates Viewed Geometrically using MATLAB   Next we use MATLAB to view geometrically the notion of coordinates relative to a basis W = {w1, w2} in the plane. Type
Figure 24: The coordinates of v = (2.0, 0.5) in the basis w1 = (1, 1), w2 = (1, −2).

Thus,

[v]W = (α1, . . . , αn)^t = PW^{-1} v^t,    (8.3.6)

where PW = (w1^t | · · · | wn^t). Since the wj are a basis for Rn, the columns of the matrix PW are linearly independent, and PW is invertible.

We may use (8.3.6) to compute [v]W using MATLAB. For example, let

v = (4, 1, 3)

and

w1 = (1, 4, 7)   w2 = (2, 1, 0)   w3 = (−4, 2, 1).

Then [v]W is found by typing

Determining the Matrix of a Linear Mapping in Coordinates   Suppose that we are given the linear map LA : Rn → Rn associated to the matrix A in standard coordinates and a basis w1, . . . , wn of Rn. How do we find the matrix [LA]W? As above, we assume that the vectors wj and the vector v are row vectors. Since LA(v) = Av^t, we can rewrite (8.3.3) as

[LA]W [v]W = [Av^t]W.

As above, let PW = (w1^t | · · · | wn^t). Using (8.3.6) we see that

[LA]W PW^{-1} v^t = PW^{-1} A v^t.

Setting

u = PW^{-1} v^t

we see that

[LA]W u = PW^{-1} A PW u.

Therefore,

[LA]W = PW^{-1} A PW.

We have proved:

Theorem 8.3.5. Let A be an n × n matrix and let LA : Rn → Rn be the associated linear map. Let W = {w1, . . . , wn} be a basis for Rn. Then the matrix [LA]W associated to LA in the basis W is similar to A. Therefore the determinant, trace, and eigenvalues of [LA]W are identical to those of A.
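Both computations above can be carried out directly in MATLAB: the coordinates [v]W from (8.3.6), and the similarity [LA]W = PW^{-1} A PW asserted in Theorem 8.3.5. The following is a minimal sketch; the matrix A used to illustrate the theorem is made up for the illustration.

v  = [4 1 3];
w1 = [1 4 7]; w2 = [2 1 0]; w3 = [-4 2 1];
PW = [w1' w2' w3'];                    % columns are the basis vectors
vW = inv(PW)*v'                        % the coordinates [v]_W of (8.3.6)
A  = [1 2 0; 0 1 1; 2 0 1];            % an illustrative 3 x 3 matrix
LW = inv(PW)*A*PW;                     % the matrix [L_A]_W
[det(A) det(LW); trace(A) trace(LW)]   % determinant and trace agree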
Matrix Normal Forms in R2   If we are careful about how we choose the basis W, then we can simplify the form of the matrix [L]W. Indeed, we have already seen examples of this process when we discussed how to find closed form solutions to linear planar systems of ODEs in the previous chapter. For example, suppose that L : R2 → R2 has real eigenvalues λ1 and λ2 with two linearly independent eigenvectors w1 and w2. Then the matrix associated to L in the basis W = {w1, w2} is the diagonal matrix

[L]W = ( λ1  0
          0  λ2 ),    (8.3.7)

since

[L(w1)]W = [λ1 w1]W = ( λ1
                         0 )

and

[L(w2)]W = [λ2 w2]W = ( 0
                        λ2 ).

In Chapter 6 we showed how to classify 2 × 2 matrices up to similarity (see Theorem 6.3.4) and how to use this classification to find closed form solutions to planar systems of linear ODEs (see Section 6.3). We now use the ideas of coordinates and matrices associated with bases to reinterpret the normal form result (Theorem 6.3.4) in a more geometric fashion.

Theorem 8.3.6. Let L : R2 → R2 be a linear mapping. Then in an appropriate coordinate system defined by the basis W below, the matrix [L]W has one of the following forms.

(a) Suppose that L has two linearly independent real eigenvectors w1 and w2 with real eigenvalues λ1 and λ2. Then

[L]W = ( λ1  0
          0  λ2 ).

(b) Suppose that L has no real eigenvectors and complex conjugate eigenvalues σ ± iτ where τ ≠ 0. Let w1 + iw2 be a complex eigenvector of L associated with the eigenvalue σ − iτ. Then W = {w1, w2} is a basis and

[L]W = ( σ  −τ
         τ   σ ).

(c) Suppose that L has exactly one linearly independent real eigenvector w1 with real eigenvalue λ. Choose the generalized eigenvector w2 so that

(L − λI2)(w2) = w1.    (8.3.8)

Then W = {w1, w2} is a basis and

[L]W = ( λ  1
         0  λ ).

Proof   The verification of (a) was discussed in (8.3.7). The verification of (b) follows from (6.2.11) on equating w1 with v and w2 with w. The verification of (c) follows directly from (8.3.8) as

[L(w1)]W = λe1 and [L(w2)]W = e1 + λe2.

Visualization of Coordinate Changes in ODEs   We consider two examples. As a first example note that the matrices

C = ( 1   0 )           ( 4  −3 )
    ( 0  −2 )   and B = ( 6  −5 ),

are similar matrices. Indeed, B = P^{-1}CP where

P = ( 2  −1
      1  −1 ).    (8.3.9)

The phase portraits of the differential equations Ẋ = BX and Ẋ = CX are shown in Figure 25. Note that both
[Figure 25: phase portraits of Ẋ = BX (left) and Ẋ = CX (right).]
phase portraits are pictures of the same saddle — just in different coordinate systems.

As a second example note that the matrices

C = ( 0  2 )           ( 6   −4 )
    (−2  0 )   and B = ( 10  −6 )

are similar matrices, and both are centers. Indeed, B = P^{-1}CP where P is the same matrix as in (8.3.9). The phase portraits of the differential equations Ẋ = BX and Ẋ = CX are shown in Figure 26. Note that both phase portraits are pictures of the same center — just in different coordinate systems.

Exercises

1. Let w1 = (1, 4), w2 = (−2, 1) and W = {w1, w2}. Find the coordinates of v = (−1, 32) in the W basis.

2. Let W = {v1, . . . , vn} be a basis of Rn.

(a) State the definition of the coordinates of a vector x ∈ Rn relative to W, and describe how to find them given the standard coordinates of x.

(b) What vector v ∈ Rn satisfies [v]W = e1 − e2?

(c) What is the definition of the matrix of a linear function T : Rn → Rn relative to W?

(d) Let T : Rn → Rn be a linear transformation with standard matrix A so that [T]W = B. What is the relationship between A and B?

3. Let W = {w1, w2} be a basis for R2 where w1 = (1, 2) and w2 = (0, 1). Let LA : R2 → R2 be the linear map given by the matrix

A = ( 2  1
     −1  0 )

in standard coordinates. Find the matrix [L]W.
[Figure 26: phase portraits of Ẋ = BX (left) and Ẋ = CX (right).]
4. Let Eij be the 2 × 3 matrix whose entry in the ith row and jth column is 1 and all of whose other entries are 0.

(a) Show that

V = {E11, E12, E13, E21, E22, E23}

6. Verify that V = {p1, p2, p3} where

p1(t) = 1 + 2t,  p2(t) = t + 2t²,  and  p3(t) = 2 − t²,

is a basis for the vector space of polynomials P2. Let p(t) = t and find [p]V.
w1 = (1, 2, 3, 4)
w2 = (0, −1, 1, 3)
(8.3.12*)
w3 = (2, 0, 0, 1)
w4 = (−1, 1, 3, 0)
8.4 *Matrices of Linear Maps on a Vector Space
Figure 27: The coordinates of v = (1.9839, −0.0097) in the bases w1 = (1, 1), w2 = (1, −2) and z1 = (1, 3), z2 = (−1, 2).
1. Let

w1 = (1, 2) and w2 = (0, 1)

and

z1 = (2, 3) and z2 = (3, 4)

and

z1 = (−1.4, 0.3) and z2 = (0.1, −0.2)

be two bases of R2 and let v = (0.6, 0.1). Find [v]W, [v]Z, and CWZ.
(a) Try to determine the way that the matrix A moves vec-
tors in R3 . For example, let
w1 = (1, 1, 1)t    w2 = (1/√6)(1, −2, 1)t    w3 = (1/√2)(1, 0, −1)t
and compute Awj .
(b) Let W = {w1 , w2 , w3 } be the basis of R3 given in (a).
Compute [LA ]W .
(c) Determine the way that the matrix [LA ]W moves vectors
in R3 . For example, consider how this matrix moves the
standard basis vectors e1 , e2 , e3 . Compare this answer
with that in part (a).
9 Least Squares
In Section 9.1 we study the geometric problem of least
squares approximations: Given a point x0 and a subspace
W ⊂ Rn , find the point w0 ∈ W closest to x0 . We then
use least squares approximation to discuss two applica-
tions: the best approximate solution to an inconsistent
linear system in Section 9.2 and least squares fitting of
data in Section 9.3.
9.1 Least Squares Approximations

Let W ⊂ Rn be a subspace and b ∈ Rn be a vector. In this section we solve a basic geometric problem and investigate some of its consequences. The problem is:

Find the vector w̃ ∈ W that is the nearest vector to b in W.

Definition 9.1.1. The vector w̃ in the subspace W of Rn that is the nearest to the vector b in Rn is called the least squares approximation of b in W.

The form (9.1.2) means that the sum of the squares of the components of the vector b − w is minimal at w = w̃.

Recall from (1.4.3) that two vectors z1, z2 ∈ Rn are perpendicular or equivalently orthogonal if z1 · z2 = 0. Before continuing, we state and prove

Lemma 9.1.2 (The Law of Pythagoras). The vectors z1, z2 ∈ Rn are orthogonal if and only if

||z1 + z2||² = ||z1||² + ||z2||².    (9.1.3)

Proof   To verify (9.1.3) calculate

||z1 + z2||² = (z1 + z2) · (z1 + z2) = ||z1||² + 2 z1 · z2 + ||z2||².

It follows that z1 and z2 satisfy (9.1.3) if and only if z1 · z2 = 0 if and only if z1 and z2 are orthogonal.
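A quick numerical illustration of Lemma 9.1.2 (the two vectors below are illustrative):

z1 = [1 2 0]'; z2 = [-2 1 3]';            % orthogonal, since z1'*z2 = 0
[norm(z1+z2)^2  norm(z1)^2+norm(z2)^2]    % the two sides of (9.1.3) agree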
dimensional with basis vector w. Since W is one dimensional, a vector w̃ ∈ W must be a multiple of w; that is, w̃ = αw for α ∈ R. Suppose that we can find a scalar α so that b − αw is orthogonal to every vector in W. Then it follows from Lemma 9.1.3 that w̃ is the closest vector in W to b. To find α, calculate

0 = (b − αw) · w = b · w − α w · w.

Then

α = (b · w)/||w||²

and

w̃ = ((b · w)/||w||²) w.    (9.1.4)

Observe that ||w||² ≠ 0 since w is a basis vector.

For example, if b = (1, 2, −1, 3) ∈ R4 and w = (0, 1, 2, 3), then the vector w̃ in the space spanned by w that is nearest to b is

w̃ = (9/14) w,

since b · w = 9 and ||w||² = 14.

Least Squares Distance to a Subspace   Similarly, we solve the general least squares problem by solving a system of linear equations.

Theorem 9.1.4. Let b ∈ Rn be a vector, let {w1, . . . , wk} be a basis for the subspace W ⊂ Rn, and let W = (w1| · · · |wk) be the n × k matrix whose columns are the basis vectors of W. Suppose

Proof   Observe that the vector b − w̃ is orthogonal to every vector in W precisely when b − w̃ is orthogonal to each basis vector wj. It follows from Lemma 9.1.3 that w̃ is the closest vector to b in W if

(b − w̃) · wj = 0

for every j. That is, if

w̃ · wj = b · wj

for every j. These equations can be rewritten as a system of equations in terms of the αi, as follows:

w1·w1 α1 + · · · + w1·wk αk = w1·b
        ⋮                              (9.1.7)
wk·w1 α1 + · · · + wk·wk αk = wk·b.

Note that if u, v ∈ Rn are column vectors, then u · v = u^t v. Therefore, we can rewrite (9.1.7) as

W^t W (α1, . . . , αk)^t = W^t b,

where W is the matrix whose columns are the wj and b is viewed as a column vector. Note that the matrix W^t W is a k × k matrix.

We claim that W^t W is invertible. To verify this claim, it suffices to show that the null space of W^t W is zero; that is, if W^t W z = 0 for some z ∈ Rk, then we show z = 0. First, calculate

||Wz||² = Wz · Wz = (Wz)^t Wz = z^t W^t W z = z^t 0 = 0.
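Before turning to the closed-form statement below, here is a minimal MATLAB sketch of the computation in Theorem 9.1.4; the basis vectors and the vector b are illustrative.

W = [0 1; 1 1; 2 0; 3 1];     % columns w1, w2 span a subspace W of R^4
b = [1; 2; -1; 3];
alpha = (W'*W)\(W'*b);        % solve the system (9.1.7)
wtilde = W*alpha              % the least squares approximation of b in W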
Corollary 9.1.5. Let b be a vector in Rn, let W be a subspace of Rn, and let w1, . . . , wk in Rn be a basis for W. Let W be the n × k matrix (w1| · · · |wk). Then

w̃ = W(W^t W)^{-1} W^t b ∈ W    (9.1.8)

is the least squares approximation of b in W.

Exercises

1. Use the least squares method, specifically formula (9.1.4), to find the minimal distance between the point (3, 4) and the x-axis.

2. Use the least squares method, specifically formula (9.1.4), to find the minimal distance between the point (3, 4) and the y-axis.

3. Find the point on the line y = x in R2 that has minimal distance to the point (1, 6).

4. Find the vector in the plane x − y − z = 0 that has the minimal distance to the point x0 = (1, 1, 1)t.

5. Prove that the least squares approximation is unique. More precisely, let W be a subspace of Rn and let b be a vector in Rn outside W. Suppose w1, w2 ∈ W are two vectors such that the distance ||w1 − b|| = ||w2 − b|| is minimal among all vectors in W. Then show that w1 = w2.

Let W be the subspace spanned by the vectors

w1 = (1, 2, −3, 1)t and w2 = (−1, 1, 0, 2)t

in R4. Find the closest point w̃ in W to the point b = (1, 3, −1, 2)t in R4.

8. (matlab) Let W be the vector space spanned by the vectors

(1, 3, 7, 1, 0)t,  (−3, 0, 8, 3, 9)t,  (−5, 2, −2, 1, 4)t.    (9.1.9)

Find the closest point w̃ in W to the point b = (9, 6, 1, 1, 4)t in R5.
9.2 Best Approximate Solution
(b) Find the range of A.

(c) Let W = range(A). Find w̃ in W that is closest to b = (4, 2)t among all w ∈ W.

(d) Find a least squares approximation solution to the system.

3. Consider the following system of linear equations

x1 + x2 = 0
2x1 − 3x2 = 1
5x1 − 2x2 = 2

x̃ + x

6. (matlab) Consider the system of 3 linear equations in 3 unknowns

Ax = b

where

A = ( 2  1  −3 )           ( 7 )
    ( 0  3  −3 )   and b = ( 1 ).
    (−3  0   3 )           ( 3 )

Use MATLAB to verify that the system is inconsistent and to find a least squares approximation solution of the system.
9.3 Least Squares Fitting of Data

We begin this section by using the method of least squares to find the best straight line fit to a set of data. Later in the section we will discuss best fits to other curves.

An Example of Best Linear Fit to Data   Suppose that we are given n data points (xi, yi) for i = 1, . . . , 10. For example, consider the ten points

(2.0, 0.1)  (3.0, 2.7)  (1.5, −1.1)  (−1.0, −5.5)  (0.0, −3.4)
(3.6, 3.0)  (0.7, −2.8)  (4.1, 4.0)  (1.9, −1.9)  (5.0, 5.5)
(9.3.1*)

The ten points (xi, yi) are plotted in Figure 29 using the commands

e9_3_1
plot(X,Y,'o')
axis([-3,7,-8,8])
xlabel('x')
ylabel('y')

Figure 29: Scatter plot of data in (9.3.1*).

Next, suppose that there is a linear relation between the xi and the yi; that is, we assume that there are constants b1 and b2 (that do not depend on i) for which yi = b1 + b2 xi for each i. But these points are just data; errors may have been made in their measurement. So we ask: Find b01 and b02 so that the error made in fitting the data to the line y = b01 + b02 x is minimal, that is, the error that is made in that fit is less than or equal to the error made in fitting the data to the line y = b1 + b2 x for any other choice of b1 and b2.

We begin by discussing what that error actually is. Given constants b1 and b2 and given a data point xi, the difference between the data value yi and the hypothesized value b1 + b2 xi is the error that is made at that data point. Next, we combine the errors made at all of the data points; a standard way to combine the errors is to use the Euclidean distance

E(b) = ((y1 − (b1 + b2 x1))² + · · · + (y10 − (b1 + b2 x10))²)^(1/2).

Rewriting E(b) in vector notation leads to an economy in notation and to a conceptual advantage. Let

X = (x1, . . . , x10)^t   Y = (y1, . . . , y10)^t   and   F1 = (1, 1, . . . , 1).

It follows that

E(b) = ||Y − (b1 F1 + b2 X)||.

The problem of making a least squares fit is to minimize E over all b1 and b2.

To solve the minimization problem, note that the vectors b1 F1 + b2 X form a two dimensional subspace W = span{F1, X} ⊂ R10 (at least when X is not a scalar multiple of F1, which is almost always). Minimizing E is
that the values of b01 and b02 are obtained using (9.1.6).

A = [F1 X];
b0 = inv(A'*A)*A'*Y

to obtain

b0(1) = -3.8597
b0(2) = 1.8845

Superimposing the line y = −3.8597 + 1.8845x on the scatter plot in Figure 29 yields the plot in Figure 30. The total error is E(b0) = 1.9634 (obtained in MATLAB by typing norm(Y-(b0(1)*F1+b0(2)*X))). Compare this with the error E(2, −4) = 2.0928.

General Linear Regression   We can summarize the previous discussion, as follows. Given n data points

(x1, y1), . . . , (xn, yn);

form the vectors

X = (x1, . . . , xn)^t   Y = (y1, . . . , yn)^t   and   F1 = (1, . . . , 1)^t

in Rn. Find constants b01 and b02 so that b01 F1 + b02 X is a vector in W = span{F1, X} ⊂ Rn that is nearest to Y. Let

A = (F1|X)

be the n × 2 matrix. This problem is solved by least squares in (9.1.6) as

(b01, b02)^t = (A^t A)^{-1} A^t Y.    (9.3.2)

Figure 30: Scatter plot of data in (9.3.1*) with best linear approximation.

Least Squares Fit to a Quadratic Polynomial   Suppose that we want to fit the data (xi, yi) to a quadratic polynomial

y = b1 + b2 x + b3 x²

by least squares methods. We want to find constants b01, b02, b03 so that the error made in using the quadratic polynomial y = b01 + b02 x + b03 x² is minimal among all possible choices of quadratic polynomials. The least squares error is

E(b) = ||Y − (b1 F1 + b2 X + b3 X^(2))||

where

X^(2) = (x1², . . . , xn²)^t

and, as before, F1 is the n vector with all components equal to 1.

We solve the minimization problem as before. In this case, the space of possible approximations to the data W is three dimensional; indeed, W = span{F1, X, X^(2)}. As in the case of fits to lines we try to find a point in W that is nearest to the vector Y ∈ Rn. By (9.1.6), the
answer is:

b = (A^t A)^{-1} A^t Y,

This equation can be solved in MATLAB as follows

e9_3_1
A = [F1 X X.*X];
b = inv(A'*A)*A'*Y;

to obtain

b =
   -3.8197
    1.7054
    0.0443

So the best parabolic fit to this data is y = −3.8197 + 1.7054x + 0.0443x². Note that the coefficient of x² is small, suggesting that the data was well fit by a straight line. Note also that the error is E(b0) = 1.9098 which is only marginally smaller than the error for the best linear fit. For comparison, in Figure 31 we superimpose the equation for the quadratic fit onto Figure 30.

is the function g0(x) ∈ C that is nearest to the data set in the following sense. Let

X = (x1, . . . , xn)^t and Y = (y1, . . . , yn)^t

be column vectors in Rn. For any function g(x) define the column vector
nearest to Y. This can be solved in general using (9.1.6). That is, let A be the n × m matrix

A = (F1| · · · |Fm)

where Fj ∈ Rn is the column vector associated to the jth basis element of C, that is,

Fj = (fj(x1), . . . , fj(xn))^t ∈ Rn.    (9.3.4)

The minimizing function g0(x) ∈ C is a linear combination of the basis functions f1(x), . . . , fm(x), that is,

g0(x) = b1 f1(x) + · · · + bm fm(x)

for scalars bi. If we set

b = (b1, . . . , bm) ∈ Rm,

then least squares minimization states that

b = (A^t A)^{-1} A^t Y.    (9.3.3)

This equation can be solved easily in MATLAB. Enter the data as column n-vectors X and Y. Compute the column vectors Fj = fj(X) and then form the matrix A = [F1 F2 · · · Fm]. Finally compute

b = inv(A'*A)*A'*Y

Least Squares Fit to a Sinusoidal Function   We discuss a specific example of the general least squares formulation by considering the weather. It is reasonable to expect monthly data on the weather to vary periodically in time with a period of one year. In Table 3 we give average daily high and low temperatures for each month of the year for Paris and Rio de Janeiro. We attempt to fit this data with curves of the form:

g(T) = b1 + b2 cos((2π/12) T) + b3 sin((2π/12) T),

where T is time measured in months and b1, b2, b3 are scalars. These functions are 12-periodic, which seems appropriate for weather data, and form a three dimensional function space C. Recall the trigonometric identity

a cos(ωt) + c sin(ωt) = d sin(ω(t − ϕ))

where

d = √(a² + c²).

Based on this identity we call C the space of sinusoidal functions. The number d is called the amplitude of the sinusoidal function g(T).

Note that each data set consists of twelve entries — one for each month. Let T = (1, 2, . . . , 12)^t be the vector X ∈ R12 in the general presentation. Next let Y be the data in one of the data sets — say the high temperatures in Paris.

Now we turn to the vectors representing basis functions in C. Let

F1=[1 1 1 1 1 1 1 1 1 1 1 1]'

be the vector associated with the basis function f1(T) = 1. Let F2 and F3 be the column vectors associated to the basis functions

f2(T) = cos((2π/12) T) and f3(T) = sin((2π/12) T).

These vectors are computed by typing

F2 = cos(2*pi/12*T);
F3 = sin(2*pi/12*T);

By typing temper, we enter the temperatures and the vectors T, F1, F2 and F3 into MATLAB.

To find the best fit to the data by a sinusoidal function g(T), we use (9.1.6). Let A be the 12 × 3 matrix
Table 3: Monthly Average of Daily High and Low Temperatures in Paris and Rio de Janeiro.
Figure 32: Monthly averages of daily high temperatures in Paris (left) and Rio de Janeiro (right) with best sinusoidal
approximation.
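A minimal sketch of this computation, assuming the vectors T, F1, F2, F3 from temper are in the workspace; the name ParisH for the Paris high temperature data is an assumption, and any of the data vectors (for example RioH) can be substituted.

A = [F1 F2 F3];               % the 12 x 3 matrix of basis-function values
b = inv(A'*A)*A'*ParisH       % coefficients b1, b2, b3 of the sinusoidal fit
d = sqrt(b(2)^2 + b(3)^2)     % amplitude of the fitted sinusoid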
This MATLAB command can be checked on the sinusoidal fit to the high temperature Rio de Janeiro data by typing

b = A\RioH

and obtaining

b =
   79.0833
    3.0877
    3.6487

(a) Find m and b to give the best linear fit to this data.

(b) Use this linear approximation to the data to make predictions of the world populations in the year 1910 and 2000.

Year   Population (in millions)   Year   Population (in millions)
1900   1625                       1950   2516
1910   n.a.                       1960   3020
1920   1813                       1970   3698
1930   1987                       1980   4448
1940   2213                       1990   5292
City Rainy Days Sunny (%) City Rainy Days Sunny (%)
Charleston 92 72 Kansas City 98 59
Chicago 121 54 Miami 114 85
Dallas 82 65 New Orleans 103 61
Denver 82 67 Phoenix 28 88
Duluth 136 52 Salt Lake City 99 59
10 Orthogonality
In Section 10.1 we discuss orthonormal bases (bases in
which each basis vector has unit length and any two ba-
sis vectors are perpendicular) and orthogonal matrices
(matrices whose columns form an orthonormal basis).
We will see that the computation of coordinates in an
orthonormal basis is particularly straightforward. The
Gram-Schmidt orthonormalization process for construct-
ing orthonormal bases is presented in Section 10.2. We
use orthogonality in Section 10.3 to study the eigenvalues
and eigenvectors of symmetric matrices (the eigenvalues
are real and the eigenvectors can be chosen to be or-
thonormal). The chapter ends with a discussion of the
QR decomposition for finding orthonormal bases in Sec-
tion 10.4. This decomposition leads to an algorithm that
is numerically superior to Gram-Schmidt and is the one
used in MATLAB.
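For reference, the built-in factorization referred to here is invoked as follows; the matrix A is a generic illustration.

A = [1 1; 1 0; 0 1];      % any n x k matrix
[Q,R] = qr(A)             % Q is n x n orthogonal, R is n x k upper triangular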
10.1 Orthonormal Bases and Orthogonal Matrices

In Section 8.3 we discussed how to write the coordinates of a vector in a basis. We now show that finding coordinates of vectors in certain bases is a very simple task — these bases are called orthonormal bases.

Nonzero vectors v1, . . . , vk in Rn are orthogonal if the dot products

vi · vj = 0

when i ≠ j. The vectors are orthonormal if they are orthogonal and of unit length, that is,

vi · vi = 1.

α1 v1 + · · · + αk vk = 0.

Corollary 10.1.2. A set of n nonzero orthogonal vectors in Rn is a basis.

Proof   Lemma 10.1.1 implies that the n vectors are linearly independent, and Chapter 5, Corollary 5.6.7 states that n linearly independent vectors in Rn form a basis.

Next we discuss how to find coordinates of a vector in an orthonormal basis, that is, a basis consisting of orthonormal vectors.

Theorem 10.1.3. Let V ⊂ Rn be a subspace and let {v1, . . . , vk} be an orthonormal basis of V. Let v ∈ V be a vector. Then

v = α1 v1 + · · · + αk vk,

where

αi = v · vi.

Proof   Since {v1, . . . , vk} is a basis of V, we can write

An Example in R3   Let

A short calculation verifies that these vectors have unit length and are pairwise orthogonal. Let v = (1, 2, 3) be a vector and determine the coordinates of v in the basis V = {v1, v2, v3}. Theorem 10.1.3 states that these coordinates are:

[v]V = (v · v1, v · v2, v · v3) = (2√3, 7/√6, −√2).
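A short MATLAB illustration of Theorem 10.1.3; the orthonormal basis below is chosen for the illustration and is not the basis of the example above.

v1 = [1 1 1]'/sqrt(3);                   % an orthonormal basis of R^3
v2 = [1 -1 0]'/sqrt(2);
v3 = [1 1 -2]'/sqrt(6);
v  = [1 2 3]';
alpha = [v'*v1; v'*v2; v'*v3]            % coordinates of v in this basis
alpha(1)*v1 + alpha(2)*v2 + alpha(3)*v3  % reconstructs v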
Matrices in Orthonormal Coordinates   Next we discuss how to find the matrix associated with a linear map in an orthonormal basis. Let L : Rn → Rn be a linear map and let V = {v1, . . . , vn} be an orthonormal basis for Rn. Then the matrix associated to L in the basis V can be calculated in terms of dot product. That matrix is:

[L]V = {L(vj) · vi}.    (10.1.1)

To verify (10.1.1), recall from Definition 8.3.3 that the (i, j)th entry of [L]V is the ith entry in the vector [L(vj)]V, which is L(vj) · vi by Theorem 10.1.3.

(b) Q is orthogonal if and only if Q^{-1} = Q^t;

(c) If Q1, Q2 are orthogonal matrices, then Q1 Q2 is an orthogonal matrix.

Proof   (a) Let Q = (v1| · · · |vn). Since Q is orthogonal, the vj form an orthonormal basis. By direct computation note that Q^t Q = {(vi · vj)} = In, since the vj are orthonormal. Note that (b) is simply a restatement of (a).

(c) Now let Q1, Q2 be orthogonal. Then (a) implies
The columns of B form an orthonormal basis for the null space of A. This assertion can be checked by first typing

v1 = B(:,1);
v2 = B(:,2);

and then typing

norm(v1)
norm(v2)
dot(v1,v2)
A*v1
A*v2

yields answers 1, 1, 0, (0, 0, 0)t, (0, 0, 0)t (to within numerical accuracy). Recall that the MATLAB command norm(v) computes the norm of a vector v.

Exercises

4. Show that if P is an n × n orthogonal matrix, then det(P) = ±1.

In Exercises 5 – 9 decide whether or not the given matrix is orthogonal.

5. ( 2  0
     0  1 ).

6. ( 0  1  0
     0  0  1
     1  0  0 ).

7. ( 0  −1  0
     0   0  1
    −1   0  0 ).

8. ( cos(1)  −sin(1)
     sin(1)   cos(1) ).

9. ( 1  0  4
     0  1  0 ).
10.2 Gram-Schmidt Orthonormalization Process

Suppose that W = {w1, . . . , wk} is a basis for the subspace V ⊂ Rn. There is a natural process by which the W basis can be transformed into an orthonormal basis V of V. This process proceeds inductively on the wj; the orthonormal vectors v1, . . . , vk can be chosen so that

span{v1, . . . , vj} = span{w1, . . . , wj}

for each j ≤ k. Moreover, the vj are chosen using the theory of least squares that we have just discussed.

The Case j = 2   To gain a feeling for how the induction process works, we verify the case j = 2. Set

v1 = (1/||w1||) w1;    (10.2.1)

so v1 points in the same direction as w1 and has unit length, that is, v1 · v1 = 1. The normalization is shown in Figure 33.

Next, we find a unit length vector v2′ in the plane spanned by w1 and w2 that is perpendicular to v1. Let w0 be the vector on the line generated by v1 that is nearest to w2. It follows from (9.1.4) that

w0 = ((w2 · v1)/||v1||²) v1 = (w2 · v1) v1.

The vector w0 is shown on Figure 33 and, as Lemma 9.1.3 states, the vector v2′ = w2 − w0 is perpendicular to v1. That is,

v2′ = w2 − (w2 · v1) v1    (10.2.2)

is orthogonal to v1.

Finally, set

v2 = (1/||v2′||) v2′    (10.2.3)

so that v2 has unit length. Since v2 and v2′ point in the same direction, v1 and v2 are orthogonal. Note also that v1 and v2 are linear combinations of w1 and w2. Since v1 and v2 are orthogonal, they are linearly independent. It follows that

span{v1, v2} = span{w1, w2}.

In summary: computing v1 and v2 using (10.2.1), (10.2.2) and (10.2.3) yields an orthonormal basis for the plane spanned by w1 and w2.
Exercises
10.3 The Spectral Theory of Symmetric Matrices
since Āt = At = A.

Hermitian Inner Products   Let v, w ∈ Cn be two complex n-vectors. Define

⟨v, w⟩ = v1 w̄1 + · · · + vn w̄n.

Note that the coordinates wi of the second vector enter this formula with a complex conjugate. However, if v and w are real vectors, then

⟨v, w⟩ = v · w.

An alternative notation for the Hermitian inner product is given by matrix multiplication. Suppose that v and w are column n-vectors. Then

⟨v, w⟩ = v^t w̄.

The properties of the Hermitian inner product are similar to those of dot product. We note three. Let c ∈ C be a

Proof of Theorem 10.3.1   Let λ be an eigenvalue of A and let v be the associated eigenvector. Since Av = λv we can use (10.3.2) to compute

λ⟨v, v⟩ = ⟨Av, v⟩ = ⟨v, Av⟩ = λ̄⟨v, v⟩.

Since ⟨v, v⟩ = ||v||² > 0, it follows that λ = λ̄ and λ is real.

Proof of Theorem 10.3.2   Let A be a real symmetric n × n matrix. We show that there is an orthonormal basis of Rn consisting of eigenvectors of A. The proof follows directly from Corollary 10.1.2 if the eigenvalues are distinct. If some of the eigenvalues are multiple, the proof is more complicated and uses Gram-Schmidt orthonormalization.
The proof proceeds inductively on n. The theorem is trivially valid for n = 1; so we assume that it is valid for n − 1.

Theorem 7.2.4 of Chapter 7 implies that A has an eigenvalue λ1 and Theorem 10.3.1 states that this eigenvalue is real. Let v1 be a unit length eigenvector corresponding to the eigenvalue λ1. Extend v1 to an orthonormal basis v1, w2, . . . , wn of Rn and let P = (v1|w2| · · · |wn) be the matrix whose columns are the vectors in this orthonormal basis. Orthonormality and direct multiplication implies that

P^t P = In.    (10.3.3)

Therefore P is invertible; indeed P^{-1} = P^t. Next, let

B = P^{-1} A P.

By direct computation

B e1 = P^{-1}AP e1 = P^{-1}Av1 = λ1 P^{-1}v1 = λ1 e1.

It follows that B has the form

B = ( λ1  ∗
       0  C )

where C is an (n − 1) × (n − 1) matrix. Since P^{-1} = P^t, it follows that B is a symmetric matrix; to verify this point compute

B^t = (P^t A P)^t = P^t A^t (P^t)^t = P^t A P = B.

It follows that

B = ( λ1  0
       0  C )

where C is a symmetric matrix. By induction we can use the Gram-Schmidt orthonormalization process to choose an orthonormal basis z2, . . . , zn in {0} × Rn−1 consisting of eigenvectors of C. It follows that e1, z2, . . . , zn is an orthonormal basis for Rn consisting of eigenvectors of B.

Finally, let vj = P^{-1} zj for j = 2, . . . , n. Since v1 = P^{-1} e1, it follows that v1, v2, . . . , vn is a basis of Rn consisting of eigenvectors of A. We need only show that the vj form an orthonormal basis of Rn. This is done using (10.3.2). For notational convenience let z1 = e1 and compute

⟨vi, vj⟩ = ⟨P^{-1}zi, P^{-1}zj⟩ = ⟨P^t zi, P^t zj⟩ = ⟨zi, P P^t zj⟩ = ⟨zi, zj⟩,

since P P^t = In. Thus the vectors vj form an orthonormal basis since the vectors zj form an orthonormal basis.

Proof of Theorem 10.3.3   As a consequence of Theorem 10.3.2, let V = {v1, . . . , vn} be an orthonormal basis for Rn consisting of eigenvectors of A. Indeed, suppose

Avj = λj vj

where λj ∈ R. Note that

Avj · vi = λj if i = j, and Avj · vi = 0 if i ≠ j.

It follows from (10.1.1) that

[A]V = ( λ1       0
             ⋱
         0       λn )

is a diagonal matrix. So every symmetric matrix A is similar by an orthogonal matrix P to a diagonal matrix where P is the matrix whose columns are the eigenvectors of A; namely, P = [v1| · · · |vn].

Exercises

1. Let

A = ( a  b
      b  d )
2. Let

A = ( 1   2
      2  −2 ).

Find the eigenvalues and eigenvectors of A and verify that the eigenvectors are orthogonal.
10.4 *QR Decompositions
(b) Hu = −u.

We claim that the matrix of a reflection across a hyperplane is orthogonal and there is a simple formula for that matrix.

Definition 10.4.1. A Householder matrix is an n × n matrix of the form

H = In − (2/(u^t u)) u u^t    (10.4.1)

where u ∈ Rn is a nonzero vector.

This definition makes sense since u^t u = ||u||² is a number while the product u u^t is an n × n matrix.

Lemma 10.4.2. Let u ∈ Rn be a nonzero vector and let V be the hyperplane orthogonal to u. Then the Householder matrix H is a reflection across V and is orthogonal.

QR Decompositions   The Gram-Schmidt process is not used in practice to find orthonormal bases as there are other techniques available that are preferable for orthogonalization on a computer. One such procedure for the construction of an orthonormal basis is based on QR decompositions using Householder transformations. This method is the one implemented in MATLAB.

An n × k matrix R = {rij} is upper triangular if rij = 0 whenever i > j.

Definition 10.4.3. An n × k matrix A has a QR decomposition if

A = QR,    (10.4.2)

where Q is an n × n orthogonal matrix and R is an n × k upper triangular matrix.

QR decompositions can be used to find orthonormal bases as follows. Suppose that W = {w1, . . . , wk} is a
basis for the subspace W ⊂ Rn. Then define the n × k matrix A which has the wj as columns, that is

A = (w1^t | · · · | wk^t).

Suppose that A = QR is a QR decomposition. Since Q is orthogonal, the columns of Q are orthonormal. So write

Q = (v1^t | · · · | vn^t).

On taking transposes we arrive at the equation A^t = R^t Q^t:

( w1 )   ( r11  0    · · ·  0    · · ·  0 ) ( v1 )
(  ⋮ ) = ( r12  r22  · · ·  0    · · ·  0 ) (  ⋮ )
( wk )   (  ⋮    ⋮          ⋮           ⋮ ) ( vn )
         ( r1k  r2k  · · ·  rkk  · · ·  0 )

By equating rows in this matrix equation we arrive at the system

w1 = r11 v1
w2 = r12 v1 + r22 v2
      ⋮                                (10.4.3)
wk = r1k v1 + r2k v2 + · · · + rkk vk.

It now follows that W = span{v1, . . . , vk} and that {v1, . . . , vk} is an orthonormal basis for W. We have proved:

Proposition 10.4.4. Suppose that there exist an orthogonal n × n matrix Q and an upper triangular n × k matrix R such that the n × k matrix A has a QR decomposition

A = QR.    (10.4.4)

Then the first k columns v1, . . . , vk of the matrix Q form an orthonormal basis of the subspace W = span{w1, . . . , wk}, where the wj are the columns of A. Moreover, rij = vi · wj is the coordinate of wj in the orthonormal basis.

Conversely, we can also write down a QR decomposition for a matrix A, if we have computed an orthonormal basis for the columns of A. Indeed, using the Gram-Schmidt process, Theorem 10.2.1, we have shown that QR decompositions always exist. In the remainder of this section we discuss a different way for finding QR decompositions using Householder matrices.

Construction of a QR Decomposition Using Householder Matrices   The QR decomposition by Householder transformations is based on the following observation:

Proposition 10.4.5. Let z = (z1, . . . , zn) ∈ Rn be nonzero and let

r = √(zj² + · · · + zn²).

Define u = (u1, . . . , un) ∈ Rn by

( u1   )   ( 0      )
(  ⋮   )   (  ⋮     )
( uj−1 )   ( 0      )
( uj   ) = ( zj − r )
( uj+1 )   ( zj+1   )
(  ⋮   )   (  ⋮     )
( un   )   ( zn     )

Then

2 u^t z = u^t u

and

     ( z1   )
     (  ⋮   )
     ( zj−1 )
Hz = ( r    )
     ( 0    )
     (  ⋮   )
     ( 0    )

holds for the Householder matrix H = In − (2/(u^t u)) u u^t.
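A minimal MATLAB sketch of Proposition 10.4.5 with j = 1 (the vector z is illustrative); the final comment points to the built-in qr command mentioned earlier.

z = [3; 1; 4; 1];                % any nonzero vector; here j = 1
r = norm(z);                     % r = sqrt(z1^2 + ... + zn^2) when j = 1
u = z; u(1) = z(1) - r;          % the vector u of Proposition 10.4.5
H = eye(4) - 2*(u*u')/(u'*u);    % the Householder matrix (10.4.1)
H*z                              % returns (r, 0, 0, 0)' up to roundoff
% [Q,R] = qr(A) computes a QR decomposition of any n x k matrix A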
11 *Matrix Normal Forms

11.1 Simple Complex Eigenvalues
Moreover, the basis in which A has the form (11.1.1) is So, in an appropriately chosen coordinate system, mul-
found as follows. Let v = w1 + iw2 be the eigenvector of tiplication by A rotates vectors counterclockwise
√ by 45◦
A corresponding to the eigenvalue σ − iτ . Then {w1 , w2 } and then expands the result by a factor of 2. See Ex-
is the desired basis. ercise 3.
Geometrically, multiplication of vectors in R2 by (11.1.1)
is the same as a rotation followed by a dilatation. More The Algebra of Complex Eigenvalues: Complex Multiplica-
specifically, let r = σ 2 + τ 2 . So the point (σ, τ ) lies on tion We have shown that the normal form (11.1.1) can
p
the circle of radius r about the origin, and there is an be interpreted geometrically as a rotation followed by a
angle θ such that (σ, τ ) = (r cos θ, r sin θ). Now we can dilatation. There is a second algebraic interpretation of
(11.1.1), and this interpretation is based on multiplica- for any real number θ. It follows that we can write a
tion by complex numbers. complex number λ = σ + iτ in polar form as
Let λ = σ + iτ be a complex number and consider the
matrix associated with complex multiplication, that is, λ = reiθ
the linear mapping
where r2 = λλ = σ 2 + τ 2 , σ = r cos θ, and τ = r sin θ.
z 7→ λz (11.1.3)
Now consider multiplication by λ in polar form. Write
on the complex plane. By identifying real and imaginary z = seiϕ in polar form, and compute
parts, we can rewrite (11.1.3) as a real 2 × 2 matrix in
the following way. Let z = x + iy. Then λz = reiθ seiϕ = rsei(ϕ+θ) .
λz = (σ + iτ )(x + iy) = (σx − τ y) + i(τ x + σy).
It follows from polar form that multiplication of z by λ =
reiθ rotates z through an angle θ and dilates the result
Now identify z with the vector (x, y); that is, the vec-
by the factor r. Thus Euler’s formula directly relates the
tor whose first component is the real part of z and whose
geometry of rotations and dilatations with the algebra of
second component is the imaginary part. Using this iden-
multiplication by a complex number.
tification the complex number λz is identified with the
vector (σx − τ y, τ x + σy). So, in real coordinates and in
matrix form, (11.1.3) becomes
Normal Form Matrices with Distinct Complex Eigen-
x
σx − τ y
σ −τ
x
values In the first parts of this section we have dis-
y
7→
τ x + σy
=
τ σ y
. cussed a geometric and an algebraic approach to matrix
multiplication by 2×2 matrices with complex eigenvalues.
That is, the matrix corresponding to multiplication of We now turn our attention to classifying n × n matrices
z = x + iy by the complex number λ = σ + iτ is the that have distinct eigenvalues, whether these eigenvalues
one that multiplies the vector (x, y)t by the normal form are real or complex. We will see that there are two ways
matrix (11.1.1). to frame this classification — one algebraic (using com-
plex numbers) and one geometric (using rotations and
dilatations).
Direct Agreement Between the Two Interpretations of
(11.1.1) We have shown that matrix multiplication by
(11.1.1) may be thought of either algebraically as mul- Algebraic Normal Forms: The Complex Case Let A be an
tiplication by a complex number (an eigenvalue) or geo- n × n matrix with real entries and n distinct eigenvalues
metrically as a rotation followed by a dilatation. We now λ1 , . . . , λn . Let vj be an eigenvector associated with the
show how to go directly from the algebraic interpretation eigenvalue λj . By methods that are entirely analogous
to the geometric interpretation. to those in Section 7.3 we can diagonalize the matrix
A over the complex numbers. The resulting theorem is
Euler’s formula (Chapter 6, (6.2.5)) states that
analogous to Theorem 7.3.1.
eiθ = cos θ + i sin θ More precisely, the n × n matrix A is complex diagonal-
izable if there is a complex n × n matrix T such that Moreover, the columns of T are the complex eigenvectors
v1 and v2 associated to the eigenvalues λ and λ.
λ1 0 · · · 0
0 λ2 · · · 0 It can be checked that the eigenvectors of B are v1 =
T −1 AT = .
. .. . . .. . (1, −i)t and v2 = (1, i)t . On setting
. . . .
0 0 · · · λn 1 1
T = ,
−i i
Theorem 11.1.1. Let A be an n × n matrix with n
distinct eigenvalues. Then A is complex diagonalizable. it is a straightforward calculation to verify that C =
T −1 BT .
The proof of Theorem 11.1.1 follows from a theoretical As a second example, consider the matrix
development virtually word for word the same as that
used to prove Theorem 7.3.1 in Section 7.3. Beginning
4 2 1
from the theory that we have developed so far, the diffi- A = 2 −3 1 . (11.1.5*)
culty in proving this theorem lies in the need to base the 1 −1 −3
theory of linear algebra on complex scalars rather than Using MATLAB we find the eigenvalues of A by typing
real scalars. We will not pursue that development here. eig(A). They are:
As in Theorem 7.3.1, the proof of Theorem 11.1.1 shows
that the complex matrix T is the matrix whose columns ans =
are the eigenvectors vj of A; that is, 4.6432
-3.3216 + 0.9014i
T = (v1 | · · · |vn ). -3.3216 - 0.9014i
Finally, we mention that the computation of inverse ma- We can diagonalize (over the complex numbers) using
trices with complex entries is the same as that for matri- MATLAB — indeed MATLAB is programmed to do these
ces with real entries. That is, row reduction of the n × 2n calculations over the complex numbers. Type [T,D] =
matrix (T |In ) leads, when T is invertible, to the matrix eig(A) and obtain
(In |T −1 ).
T =
0.9604 -0.1299 + 0.1587i -0.1299 - 0.1587i
Two Examples As a first example, consider the normal 0.2632 0.0147 - 0.5809i 0.0147 + 0.5809i
form 2 × 2 matrix (11.1.1) that has eigenvalues λ and λ, 0.0912 0.7788 - 0.1173i 0.7788 + 0.1173i
where λ = σ + iτ . Let
D =
4.6432 0 0
σ −τ λ 0
B= and C = . 0 -3.3216 + 0.9014i 0
τ σ 0 λ
0 0 -3.3216 - 0.9014i
Since the eigenvalues of B and C are identical, Theo-
rem 11.1.1 implies that there is a 2 × 2 complex matrix This calculation can be checked by typing inv(T)*A*T
T such that to see that the diagonal matrix D appears. One can also
C = T −1 BT. (11.1.4) check that the columns of T are eigenvectors of A.
Note that the development here does not depend on the We need two preliminary results.
matrix A having real entries. Indeed, this diagonalization
can be completed using n × n matrices with complex Lemma 11.1.3. Let λ1 , . . . , λq be distinct (possible com-
entries — and MATLAB can handle such calculations. plex) eigenvalues of an n × n matrix A. Let vj be a (pos-
sibly complex) eigenvector associated with the eigenvalue
λj . Then v1 , . . . , vq are linearly independent in the sense
Geometric Normal Forms: Block Diagonalization There that if
is a second normal form theorem based on the geometry α1 v1 + · · · + αq vq = 0 (11.1.8)
of rotations and dilatations for real n × n matrices A.
In this normal form we determine all matrices A that for (possibly complex) scalars αj , then αj = 0 for all j.
have distinct eigenvalues — up to similarity by real n × n
matrices S. The normal form results in matrices that are Proof The proof is identical in spirit with the proof of
block diagonal with either 1 × 1 blocks or 2 × 2 blocks of Lemma 7.3.2. Proceed by induction on q. When q = 1
the form (11.1.1) on the diagonal. the lemma is trivially valid, as αv = 0 for v 6= 0 implies
A real n × n matrix is in real block diagonal form if it is that α = 0, even when α ∈ C and v ∈ Cn .
a block diagonal matrix By induction assume the lemma is valid for q − 1. Now
apply A to (11.1.8) obtaining
B1 0 · · · 0
0 B2 · · · 0 α1 λ1 v1 + · · · + αq λq vq = 0.
.. .. .. .. , (11.1.6)
. . . .
Subtract this identity from λq times (11.1.8), and obtain
0 0 ··· Bm
where each Bj is either a 1 × 1 block α1 (λ1 − λq )v1 + · · · + αq−1 (λq−1 − λq )vq−1 = 0.
Bj = λj By induction
αj (λj − λq ) = 0
for some real number λj or a 2 × 2 block
for j = 1, . . . , q − 1. Since the λj are distinct it follows
that αj = 0 for j = 1, . . . , q − 1. Hence (11.1.8) implies
σj −τj
Bj = (11.1.7)
τj σj that αq vq = 0; since vq 6= 0, αq = 0.
where σj and τj 6= 0 are real numbers. A matrix is real Lemma 11.1.4. Let µ1 , . . . , µk be distinct real eigen-
block diagonalizable if it is similar to a real block diagonal values of an n × n matrix A and let ν1 , ν 1 . . . , ν` , ν ` be
form matrix. distinct complex conjugate eigenvalues of A. Let vj ∈ Rn
Note that the real eigenvalues of a real block diagonal be eigenvectors associated to µj and let wj = wjr + iwji
form matrix are just the real numbers λj that occur in the be eigenvectors associated with the eigenvalues νj . Then
1×1 blocks. The complex eigenvalues are the eigenvalues the k + 2` vectors
of the 2 × 2 blocks Bj and are σj ± iτj .
v1 , . . . , vk , w1r , w1i , . . . , w`r , w`i
Theorem 11.1.2. Every n × n matrix A with n distinct
eigenvalues is real block diagonalizable. in Rn are linearly independent.
Proof Let w = wr + iwi be a vector in Cn and let β r (11.1.11). Since these vectors are linearly independent, S
and β i be real scalars. Then is invertible. We claim that S −1 AS is real block diagonal.
This statement is verified by direct calculation.
β r wr + β i wi = βw + βw, (11.1.9)
First, note that Sej = vj for j = 1, . . . , k and compute
1
where β = (β r − iβ i ). Identity (11.1.9) is verified by (S −1 AS)ej = S −1 Avj = µj S −1 vj = µj ej .
2
direct calculation.
It follows that the first k columns of S −1 AS are zero
Suppose now that except for the diagonal entries, and those diagonal entries
equal µ1 , . . . , µk .
α1 v1 + · · · + αk vk + β1r w1r + β1i w1i + · · · + β`r w`r + β`i w`i = 0
(11.1.10) Second, note that Sek+1 = w1r and Sek+2 = w1i . Write
for real scalars αj , βjr and βji . Using (11.1.9) we can the complex eigenvalues as
rewrite (11.1.10) as
νj = σj + iτj .
α1 v1 + · · · + αk vk + β1 w1 + β 1 w1 + · · · + β` w` + β ` w` = 0,
Since Aw1 = ν 1 w1 , it follows that
1
where βj = (βjr − iβji ). Since the eigenvalues Aw1r + iAw1i = (σ1 − iτ1 )(w1r + iw1i )
2
= (σ1 w1r + τ1 w1i ) + i(−τ1 w1r + σ1 w1i ).
µ1 , . . . , µ k , ν 1 , ν 1 . . . , ν ` , ν `
Equating real and imaginary parts leads to
are all distinct, we may apply Lemma 11.1.3 to conclude
that αj = 0 and βj = 0. It follows that βjr = 0 and Aw1r = σ1 w1r + τ1 w1i
(11.1.12)
βji = 0, as well, thus proving linear independence. Aw1i = −τ1 w1r + σ1 w1i .
MATLAB Calculations of Real Block Diagonal Form Let To find the matrix S that puts C in real block diagonal
C be the 4 × 4 matrix form, we need to take the real and imaginary parts of
the eigenvectors corresponding to the complex eigenval-
ues and the real eigenvectors corresponding to the real
1 0 2 3
C=
2 1 4 6
. (11.1.13*) eigenvalues. In this case, type
−1 −5 1 3
1 4 7 10 S = [real(T(:,1)) imag(T(:,1)) T(:,3) T(:,4)]
We see that C has two real and two complex conjugate Note that the 1st and 2nd columns are the real and
eigenvalues. To find the complex eigenvectors associated imaginary parts of the complex eigenvector. Check that
with these eigenvalues, type inv(S)*C*S is the matrix in complex diagonal form
11.2 Multiplicity and Generalized Eigenvectors
Then v is an eigenvector of Jn (λ0 ) if and only if N v = Lemma 11.2.3. The algebraic multiplicity of an eigen-
0. Therefore, Jn (λ0 ) has a unique linearly independent value is greater than or equal to its geometric multiplicity.
eigenvector if
Proof For ease of notation we prove this lemma only
Lemma 11.2.1. nullity(N ) = 1. for real eigenvalues, though the proof for complex eigen-
values is similar. Let A be an n × n matrix and let λ0
be a real eigenvalue of A. Let k be the geometric multi- Lemma 11.2.4. Let λ0 be a complex number. Then the
plicity of λ0 and let v1 , . . . , vk be k linearly independent algebraic multiplicity of the eigenvalue λ0 in the 2n × 2n
eigenvectors of A with eigenvalue λ0 . We can extend matrix Jbn (λ0 ) is n and the geometric multiplicity is 1.
{v1 , . . . , vk } to be a basis V = {v1 , . . . , vn } of Rn . In this
basis, the matrix of A is Proof We begin by showing that the eigenvalues of
J = Jbn (λ0 ) are λ0 and λ0 , each with algebraic multi-
[A]V =
λ0 Ik (∗)
. plicity n. The characteristic polynomial of J is pJ (λ) =
0 B det(J−λI2n ). From Lemma 7.1.9 of Chapter 7 and induc-
tion, we see that pJ (λ) = pB (λ)n . Since the eigenvalues
The matrices A and [A]V are similar matrices. There- of B are λ0 and λ0 , we have proved that the algebraic
fore, they have the same characteristic polynomials and multiplicity of each of these eigenvalues in J is n.
the same eigenvalues with the same algebraic multiplici-
ties. It follows from Lemma 7.1.9 that the characteristic Next, we compute the eigenvectors of J. Let Jv = λ0 v
polynomial of A is: and let v = (v1 , . . . , vn ) where each vj ∈ C2 . Observe
that (J − λ0 I2n )v = 0 if and only if
pA (λ) = p[A]V (λ) = (λ − λ0 )k pB (λ).
Qv1 + v2 = 0
Hence λ0 appears as a root of pA (λ) at least k times and ..
.
the algebraic multiplicity of λ0 is greater than or equal to
k. The same proof works when λ0 is a complex eigenvalue Qvn−1 + vn = 0
— but all vectors chosen must be complex rather than Qvn = 0,
real.
where Q = B − λ0 I2 . Using the fact that λ0 = σ + iτ , it
follows that
Deficiency in Eigenvectors with Complex Eigenvalues An
i 1
example of a real matrix with complex conjugate eigen- Q = B − λ0 I2 = −τ
−1 i
.
values having geometric multiplicity less than algebraic
multiplicity is the 2n × 2n block matrix Hence
2 2 i 1
Q = 2τ i = −2τ iQ.
−1 i
B I2 0 ··· 0 0
0 B I2 ··· 0 0 Thus
0 0 B ··· 0 0
0 = Q2 vn−1 + Qvn = −2τ iQvn−1 ,
Jn (λ0 ) = .. .. .. .. .. .. (11.2.3)
.
b
. . . . .
from which it follows that Qvn−1 + vn = vn = 0. Sim-
0 0 0 ··· B I2 ilarly, v2 = · · · = vn−1 = 0. Since there is only one
0 0 0 ··· 0 B nonzero complex vector v1 (up to a complex scalar mul-
tiple) satisfying
where λ0 = σ + iτ and B is the 2 × 2 matrix Qv1 = 0,
it follows that the geometric multiplicity of λ0 in the
σ −τ
B= . matrix Jbn (λ0 ) equals 1.
τ σ
eigenvector for the n × n matrix A with eigenvalue λ if Proof We can prove the lemma by induction on j if
we can show that
(A − λIn )k v = 0 (11.2.4)
for some positive integer k. The smallest integer k for null space(Ak+1
0 ) = null space(Ak+2
0 ).
which (11.2.4) is satisfied is called the index of the gen-
Since null space(Ak+1 ) ⊂ null space(Ak+2 ), we need to
eralized eigenvector v. 0 0
show that
Note: Eigenvectors are generalized eigenvectors with in-
null space(Ak+2 ) ⊂ null space(Ak+1 ).
dex equal to 1. 0 0
An Example of Generalized Eigenvectors Find the gener- Thus, for this example, all generalized eigenvectors that
alized eigenvectors of the 4 × 4 matrix are not eigenvectors have index 2.
−24 −58 −2 −8
A=
15 35 1 4 . (11.2.6*) Exercises
3 5 7 4
3 6 0 6
In Exercises 1 – 4 determine the eigenvalues and their geo-
and their indices. When finding generalized eigenvectors
metric and algebraic multiplicities for the given matrix.
of a matrix A, the first two steps are:
2 0 0 0
(i) Find the eigenvalues of A. 1. A =
0 3 1 0
0 0 3 0 .
5 1 0
8. D = −3 1 1 .
−12 −4 0
9. (matlab)
2 3 −21 −3
2 7 −41 −5
A=
. (11.2.7*)
0 1 −5 −1
0 0 4 4
10. (matlab)
179 −230 0 10 −30
144 −185 0 8 −24
(11.2.8*)
B= 30 −39 −1 3 −9 .
192 −245 0 9 −30
40 −51 0 2 −7
11.3 The Jordan Normal Form Theorem

n × n matrix. Then A is similar to a Jordan normal form matrix and to a real Jordan normal form matrix.
The question that we discussed in Sections 7.3 and 11.1 This theorem is proved by constructing a basis V for Rn
is: Up to similarity, what is the simplest form that a so that the matrix S −1 AS is in Jordan normal form,
matrix can have? We have seen that if A has real dis- where S is the matrix whose columns consists of vectors
tinct eigenvalues, then A is real diagonalizable. That is, in V. The algorithm for finding the basis V is compli-
A is similar to a diagonal matrix whose diagonal entries cated and is found in Appendix 11.5. In this section we
are the real eigenvalues of A. Similarly, if A has dis- construct V only in the special and simpler case where
tinct real and complex eigenvalues, then A is complex each eigenvalue of A is real and is associated with exactly
diagonalizable; that is, A is similar either to a diagonal one Jordan block.
matrix whose diagonal entries are the real and complex
More precisely, let λ1 , . . . , λs be the distinct eigenvalues
eigenvalues of A or to a real block diagonal matrix.
of A and let
In this section we address the question of simplest form Aj = A − λj In .
when a matrix has multiple eigenvalues. In much of this
The eigenvectors corresponding to λj are the vectors in
discussion we assume that A is an n × n matrix with
the null space of Aj and the generalized eigenvectors are
only real eigenvalues. Lemma 7.3.3 shows that if the
the vectors in the null space of Akj for some k. The di-
eigenvectors of A form a basis, then A is diagonalizable.
mension of the null space of Aj is precisely the number
Indeed, for A to be diagonalizable, there must be a basis
of Jordan blocks of A associated to the eigenvalue λj . So
of eigenvectors of A. It follows that if A is not diagonaliz-
the assumption that we make here is
able, then A must have fewer than n linearly independent
eigenvectors. nullity(Aj ) = 1
The prototypical examples of matrices having fewer
eigenvectors than eigenvalues are the matrices Jn (λ) for λ for j = 1, . . . , s.
real (see (11.2.1)) and Jbn (λ) for λ complex (see (11.2.3)). Let kj be the integer whose existence is specified by
Lemma 11.2.7. Since, by assumption, there is only one
Definition 11.3.1. A matrix is in Jordan normal form Jordan block associated with the eigenvalue λj , it follows
if it is block diagonal and the matrix in each block on that kj is the algebraic multiplicity of the eigenvalue λj .
the diagonal is a Jordan block, that is, J` (λ) for some
integer ` and some real or complex number λ. To find a basis in which the matrix A is in Jordan normal
form, we proceed as follows. First, let wjkj be a vector
A matrix is in real Jordan normal form if it is block in
diagonal and the matrix in each block on the diagonal is null space(Aj j ) – null space(Aj j ).
k k −1
a real Jordan block, that is, either J` (λ) for some integer
` and some real number λ or Jb` (λ) for some integer ` and Define the vectors wji by
some complex number λ.
wj,kj −1 = Aj wj,kj
The main theorem about Jordan normal form is: ..
.
Theorem 11.3.2 (Jordan normal form). Let A be an wj,1 = Aj wj,2 .
Second, when λj is real, let the kj vectors vji = wji , and where A1 and A2 are square matrices. Then
when λj is complex, let the 2kj vectors vji be defined by
pA (λ) = pA1 (λ)pA2 (λ).
vj,2i−1 = Re(wji )
This observation follows directly from Lemma 7.1.9.
vj,2i = Im(wji ). Since k
k A1 0
Let V be the set of vectors vji ∈ Rn . We will show in A =
0 Ak2
,
Appendix 11.5 that the set V consists of n vectors and
is a basis of Rn . Let S be the matrix whose columns are it follows that
the vectors in V. Then S −1 AS is in Jordan normal form.
The Cayley Hamilton Theorem  As a corollary of the Jordan normal form theorem, we prove the Cayley Hamilton theorem, which states that a square matrix satisfies its characteristic polynomial. More precisely:

Theorem 11.3.3 (Cayley Hamilton). Let A be a square matrix and let pA(λ) be its characteristic polynomial. Then

pA(A) = 0.

Proof  Let A be an n × n matrix. The characteristic polynomial of A is

pA(λ) = det(A − λIn).

Suppose that B = P⁻¹AP is a matrix similar to A. Theorem 7.2.8 states that pB = pA. Therefore

pB(B) = pA(P⁻¹AP) = P⁻¹ pA(A) P.

So if the Cayley Hamilton theorem holds for a matrix similar to A, then it is valid for the matrix A. Moreover, using the Jordan normal form theorem, we may assume that A is in Jordan normal form.

Suppose that A is block diagonal, that is,

A = [ A1  0
      0   A2 ],

where A1 and A2 are square matrices. Then

pA(λ) = pA1(λ) pA2(λ).

This observation follows directly from Lemma 7.1.9. Since

A^k = [ A1^k  0
        0     A2^k ],

it follows that

pA(A) = [ pA(A1)  0
          0       pA(A2) ]
      = [ pA1(A1) pA2(A1)  0
          0                pA1(A2) pA2(A2) ].

It now follows from this calculation that if the Cayley Hamilton theorem is valid for Jordan blocks, then pA1(A1) = 0 = pA2(A2). So pA(A) = 0 and the Cayley Hamilton theorem is valid for all matrices.

A direct calculation shows that Jordan blocks satisfy the Cayley Hamilton theorem. To begin, suppose that the eigenvalue of the Jordan block is real. Note that the characteristic polynomial of the Jordan block Jn(λ0) in (11.2.1) is (λ − λ0)^n. Indeed, Jn(λ0) − λ0In is strictly upper triangular and (Jn(λ0) − λ0In)^n = 0. If λ0 is complex, then either repeat this calculation using the complex Jordan form or show by direct calculation that (A − λ0In)(A − λ̄0In) is strictly upper triangular when A = Ĵn(λ0) is the real Jordan form of the Jordan block in (11.2.3).
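The Cayley Hamilton theorem is also easy to check numerically. MATLAB's polyvalm command evaluates a polynomial, given by its vector of coefficients, at a square matrix, so applying it to the characteristic polynomial of a matrix should return the zero matrix up to round-off error. The following check is our own illustration; any square matrix may be used in place of the random one.

A = rand(5);             % a random 5 x 5 test matrix
p = poly(A);             % coefficients of the characteristic polynomial of A
E = polyvalm(p,A);       % evaluate the characteristic polynomial at the matrix A
norm(E)                  % zero up to round-off error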
An Example  Consider the 4 × 4 matrix

A = [ −147  −106   −66   −488
       604   432   271   1992
       621   448   279   2063
      −169  −122   −76   −562 ].      (11.3.1*)

Using MATLAB we can compute the characteristic polynomial of A by typing
poly(A)

obtaining

ans =
    1.0000   -2.0000  -15.0000   -0.0000    0.0000

Note that since A is a matrix of integers we know that the coefficients of the characteristic polynomial of A must be integers. Thus the characteristic polynomial is exactly:

pA(λ) = λ^4 − 2λ^3 − 15λ^2 = λ^2(λ − 5)(λ + 3).

So λ1 = 0 is an eigenvalue of A with algebraic multiplicity two, and λ2 = 5 and λ3 = −3 are simple eigenvalues.

We can find eigenvectors of A corresponding to the simple eigenvalues by typing

v2 = null(A-5*eye(4));
v3 = null(A+3*eye(4));

To find a generalized eigenvector associated with the double eigenvalue 0, we compute a basis for the null space of A^2 by typing

null2 = null(A*A)

obtaining

null2 =
   -0.2193   -0.2236
   -0.5149   -0.8216
   -0.8139    0.4935
    0.1561    0.1774

Choose one of these vectors, say the first vector, to be v12 by typing

v12 = null2(:,1);

Since the algebraic multiplicity of the eigenvalue 0 is two, we choose the fourth basis vector to be v11 = Av12. In MATLAB we type

v11 = A*v12
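The remaining step is to assemble the basis and to verify that it puts A into Jordan normal form. The lines below are a sketch of one way to finish the computation using the vectors v11, v12, v2 and v3 found above; the ordering of the basis vectors is one possible choice of ours, not necessarily the one used in the text.

S = [v11 v12 v2 v3];     % chain for the eigenvalue 0 followed by the simple eigenvectors
J = S\A*S                % Jordan normal form of A, up to round-off error

Up to round-off error, J is block diagonal with the 2 × 2 Jordan block J2(0) in its upper left corner and the simple eigenvalues 5 and −3 in the remaining diagonal entries.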
(c) Show that any matrix similar to a nilpotent matrix is also nilpotent.

(d) Let N be a matrix all of whose eigenvalues are zero. Use the Jordan normal form theorem to show that N is nilpotent.

16. (matlab)

A = [ −3   −4   −2    0
      −9  −39  −16   −7
      18   64   27   10
      15   86   34   18 ].      (11.3.2*)

17. (matlab)

A = [  9   45   18    8
       0   −4   −1   −1
     −16  −69  −29  −12
      25  123   49   23 ].      (11.3.3*)

18. (matlab)

A = [ −5  −13   17   42
     −10  −57   66  187
      −4  −23   26   77
      −1   −9    9   32 ].      (11.3.4*)

20. (matlab)

A = [ −1  −1   1   0
      −3   1   1   0
      −3   2  −1   1
      −3   2   0   0 ].      (11.3.6*)
11.4 *Markov Matrix Theory

In this appendix we use the Jordan normal form theorem to study the asymptotic dynamics of transition matrices such as those of Markov chains introduced in Section 4.8. The basic result is the following theorem.

Theorem 11.4.1. Let A be an n × n matrix and assume that all eigenvalues λ of A satisfy |λ| < 1. Then for every vector v0 ∈ Rn

lim_{k→∞} A^k v0 = 0.      (11.4.1)

Proof  Suppose that A and B are similar matrices; that is, B = SAS⁻¹ for some invertible matrix S. Then B^k = SA^kS⁻¹ and for any vector v0 ∈ Rn (11.4.1) is valid if and only if

lim_{k→∞} B^k v0 = 0.

Thus, when proving this theorem, we may assume that A is in Jordan normal form.

Suppose that A is in block diagonal form; that is, suppose that

A = [ C  0
      0  D ],

where C is an ℓ × ℓ matrix and D is an (n − ℓ) × (n − ℓ) matrix. Then

A^k = [ C^k  0
        0    D^k ].

So for every vector v0 = (w0, u0) ∈ R^ℓ × R^(n−ℓ), (11.4.1) is valid if and only if

lim_{k→∞} C^k w0 = 0  and  lim_{k→∞} D^k u0 = 0.

So, when proving this theorem, we may assume that A is a Jordan block.

Consider the case of a simple Jordan block. Suppose that n = 1 and that A = (λ) where λ is either real or complex. Then

A^k v0 = λ^k v0.

It follows that (11.4.1) is valid precisely when |λ| < 1.

Next, suppose that A is a nontrivial Jordan block. For example, let

A = [ λ  1
      0  λ ] = λI2 + N,

where N^2 = 0. It follows by induction that

A^k v0 = λ^k v0 + kλ^(k−1) N v0 = λ^k v0 + (k/λ) λ^k N v0.

Thus (11.4.1) is valid precisely when |λ| < 1. The reason for this convergence is as follows. The first term converges to 0 as before, but the second term is the product of three terms: k, λ^k, and (1/λ)N v0. The first increases to infinity, the second decreases to zero, and the third is constant independent of k. In fact, geometric decay (λ^k, when |λ| < 1) always beats polynomial growth. Indeed,

lim_{m→∞} m^j λ^m = 0      (11.4.2)

for any integer j. This fact can be proved using l'Hôpital's rule and induction.

So we see that when A has a nontrivial Jordan block, convergence is subtler than when A has only simple Jordan blocks, as initially the vectors A^k v0 grow in magnitude. For example, suppose that λ = 0.75 and v0 = (0, 1)^t. Then A^9 v0 = (0.901, 0.075)^t is the first vector in the sequence A^k v0 whose norm is less than 1; that is, A^9 v0 is the first vector in the sequence closer to the origin than v0.
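This transient growth is easy to observe in MATLAB. The short loop below is our own illustration (the loop bound 10 is arbitrary); it prints k together with the norm of A^k v0 for the 2 × 2 Jordan block with λ = 0.75.

lambda = 0.75;
A = [lambda 1; 0 lambda];    % a nontrivial 2 x 2 Jordan block
v = [0; 1];                  % the vector v0
for k = 1:10
    v = A*v;                 % v now equals A^k v0
    disp([k norm(v)])
end

The norms first grow, because of the polynomial factor k, and drop below 1 only at k = 9.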
It is also true that (11.4.1) is valid for any Jordan block A and for all v0 precisely when |λ| < 1. To verify this fact we use the binomial theorem. We can write a nontrivial
Jordan block as λIn + N where N^(k+1) = 0 for some integer k. We just discussed the case k = 1. In the general case

(λIn + N)^m = λ^m In + mλ^(m−1) N + (m choose 2) λ^(m−2) N^2 + ··· + (m choose k) λ^(m−k) N^k,

where

(m choose j) = m!/(j!(m−j)!) = m(m−1)···(m−j+1)/j!.

When |λ| < 1, (11.4.2) implies that each of the terms in this expansion, applied to v0, converges to 0 as m → ∞, and so (11.4.1) holds. This proves Theorem 11.4.1.

Theorem 11.4.3. Let P be a Markov matrix. Then P has a dominant eigenvalue equal to 1.

Proof  Recall from Chapter 3, Definition 4.8.1 that a Markov matrix is a square matrix P whose entries are nonnegative, whose rows sum to 1, and for which some power P^k has all positive entries. To prove this theorem we must show that all eigenvalues λ of P satisfy |λ| ≤ 1 and that 1 is a simple eigenvalue of P.

Let λ be an eigenvalue of P and let v = (v1, ..., vn)^t be an eigenvector corresponding to the eigenvalue λ. We prove that |λ| ≤ 1. Choose j so that |vj| ≥ |vi| for all i. Since Pv = λv, we can equate the j-th coordinates of both sides of this equality, obtaining

pj1 v1 + ··· + pjn vn = λvj.

Taking absolute values and using the triangle inequality together with pj1 + ··· + pjn = 1 gives |λ||vj| ≤ pj1|v1| + ··· + pjn|vn| ≤ |vj|, so |λ| ≤ 1.

Now suppose that λ = 1, and let Q = P^k be a power of P all of whose entries qji are positive, so that Qv = v. Equality must then hold in the corresponding estimate for Q, and this equality is valid only if all of the vi are nonnegative or all are nonpositive. Without loss of generality, we
assume that all vi ≥ 0. It follows that

vj = qj1 v1 + ··· + qjn vn.

Since qji > 0, this equality can hold only if all of the vi are equal. Hence v is a scalar multiple of (1, ..., 1)^t.

Theorem 11.4.4. (a) Let Q be an n × n matrix with dominant eigenvalue λ > 0 and associated eigenvector v. Let v0 be any vector in Rn. Then

lim_{k→∞} (1/λ^k) Q^k v0 = cv

for some scalar c.

(b) Let P be a Markov matrix and v0 a nonzero vector in Rn with all entries nonnegative. Then

lim_{k→∞} (P^t)^k v0 = V,

where V is the eigenvector of P^t with eigenvalue 1 such that the sum of the entries in V is equal to the sum of the entries in v0.

Proof  (a) After a similarity transformation, if needed, we can assume that Q is in Jordan normal form. More precisely, we can assume that

(1/λ) Q = [ 1  0
            0  A ],

where all of the eigenvalues of the matrix A have absolute value less than 1 and, in these coordinates, v is the first standard basis vector. Writing v0 = (c, u0), Theorem 11.4.1 implies that

(1/λ^k) Q^k v0 = (c, A^k u0) → (c, 0) = cv

as k → ∞, which proves part (a).

(b) Theorem 11.4.3 states that a Markov matrix has a dominant eigenvalue equal to 1. The Jordan normal form theorem implies that the eigenvalues of P^t are equal to the eigenvalues of P with the same algebraic and geometric multiplicities. It follows that 1 is also a dominant eigenvalue of P^t. It follows from Part (a) that

lim_{k→∞} (P^t)^k v0 = cV

for some scalar c. But Theorem 4.8.3 in Chapter 3 implies that the sum of the entries in v0 equals the sum of the entries in cV which, by assumption, equals the sum of the entries in V. Thus, c = 1.
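The convergence asserted in part (b) is easy to observe numerically. In the sketch below P is a hypothetical 3 × 3 Markov matrix of our own choosing (nonnegative entries, each row summing to 1); the iterate (P^t)^k v0 is compared with the eigenvector of P^t associated with the eigenvalue 1, rescaled so that its entries sum to the sum of the entries of v0.

P = [0.5 0.3 0.2; 0.1 0.7 0.2; 0.4 0.2 0.4];   % hypothetical Markov matrix; each row sums to 1
v0 = [1; 0; 0];
v = v0;
for k = 1:100
    v = P'*v;                % v now equals (P^t)^k v0
end
v                            % numerical limit of the iteration
[EV,D] = eig(P');            % eigenvectors and eigenvalues of P^t
[~,j] = max(abs(diag(D)));   % locate the dominant eigenvalue 1
EV(:,j)*sum(v0)/sum(EV(:,j)) % eigenvector rescaled so that its entries sum to sum(v0)

The two printed vectors agree to within round-off and iteration error.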
Exercises

1. Let A be an n × n matrix. Suppose that

lim_{k→∞} A^k v0 = 0

for every vector v0 ∈ Rn. Show that the eigenvalues λ of A all satisfy |λ| < 1.
11.5 *Proof of Jordan Normal Form

We prove the Jordan normal form theorem under the assumption that the eigenvalues of A are all real. The proof for matrices having both real and complex eigenvalues proceeds along similar lines.

Let A be an n × n matrix, let λ1, ..., λs be the distinct eigenvalues of A, and let Aj = A − λj In.

Lemma 11.5.1. The linear mappings Ai and Aj commute.

Proof  Just compute

Ai Aj = (A − λi In)(A − λj In) = A^2 − λi A − λj A + λi λj In,

and

Aj Ai = (A − λj In)(A − λi In) = A^2 − λj A − λi A + λj λi In.

So Ai Aj = Aj Ai, as claimed.

Let Vj be the generalized eigenspace corresponding to the eigenvalue λj.

Lemma 11.5.2. Let i ≠ j. Then the linear mapping B = Ai|Vj : Vj → Vj is invertible.

Proof  Recall from Lemma 11.2.7 that Vj = null space(Aj^k) for some k ≥ 1. Suppose that v ∈ Vj. We first verify that Ai v is also in Vj. Using Lemma 11.5.1, just compute

Aj^k(Ai v) = Ai(Aj^k v) = Ai 0 = 0,

so Ai v ∈ Vj. Now w ∈ null space(B) if w ∈ Vj and Ai w = 0. Since Ai w = (A − λi In)w = 0, it follows that Aw = λi w. Hence

Aj w = (A − λj In)w = (λi − λj)w

and

Aj^k w = (λi − λj)^k w.

Since λi ≠ λj, it follows that Aj^k w = 0 only when w = 0. Hence the nullity of B is zero. We conclude that

dim range(B) = dim(Vj).

Thus, B is invertible, since the domain and range of B are the same space.

Lemma 11.5.3. Nonzero vectors taken from different generalized eigenspaces Vj are linearly independent. More precisely, if wj ∈ Vj and

w = w1 + ··· + ws = 0,

then wj = 0.

Proof  Let Vj = null space(Aj^kj) for some integer kj, and let C = A2^k2 ··· As^ks. Then

0 = Cw = Cw1,

since Aj^kj wj = 0 for j = 2, ..., s. But Lemma 11.5.2 implies that C|V1 is invertible. Therefore, w1 = 0. Similarly, all of the remaining wj have to vanish.
Lemma 11.5.4. Every vector in Rn is a linear combination of generalized eigenvectors of A.

Proof  Let W denote the span of the generalized eigenvectors of A, and suppose that W ≠ Rn. Choose a basis of W consisting of t < n generalized eigenvectors of A and extend this set to a basis W of Rn. In this basis the matrix [A]W has block form, that is,

[A]W = [ A11  A12
         0    A22 ],

where A22 is an (n − t) × (n − t) matrix. The eigenvalues of A22 are eigenvalues of A. Since all of the distinct eigenvalues and eigenvectors of A are accounted for in W (that is, in A11), we have a contradiction. So W = Rn, as claimed.

Lemma 11.5.5. Let Vj be a basis for the generalized eigenspace Vj and let V be the union of the sets Vj. Then V is a basis for Rn.

Proof  We first show that the vectors in V span Rn. It follows from Lemma 11.5.4 that every vector in Rn is a linear combination of vectors in the generalized eigenspaces Vj. But each vector in Vj is a linear combination of the basis vectors in Vj. Hence, the vectors in V span Rn.

Second, we show that the vectors in V are linearly independent. Suppose that a linear combination of vectors in V sums to zero. We can write this sum as

β1 w1 + ··· + βs ws = 0.

Grouping the terms that lie in each generalized eigenspace and applying Lemma 11.5.3 shows that each group must vanish; since the vectors in each basis Vj are linearly independent, all of the coefficients are zero. Hence the vectors in V are linearly independent.

Lemma 11.5.6. Let V be the basis of Rn given in Lemma 11.5.5. Then the matrix [A]V is block diagonal,

[A]V = diag(A11, ..., Ass),

where all of the eigenvalues of Ajj equal λj.

Proof  It follows from Lemma 11.5.1 that A : Vj → Vj. Suppose that vj ∈ Vj. Then Avj is in Vj and Avj is a linear combination of vectors in Vj. The block diagonalization of [A]V follows. Since Vj = null space(Aj^kj), it follows that all eigenvalues of Ajj equal λj.

Lemma 11.5.6 implies that to prove the Jordan normal form theorem, we must find a basis in which each of the matrices Ajj is in Jordan normal form. So, without loss of generality, we may assume that all eigenvalues of A equal λ0, and then find a basis in which A is in Jordan normal form. Moreover, we can replace A by the matrix A − λ0 In, a matrix all of whose eigenvalues are zero. So, without loss of generality, we assume that A is an n × n matrix all of whose eigenvalues are zero. We now sketch the remainder of the proof of Theorem 11.3.2.

Let k be the smallest integer such that Rn = null space(A^k) and let

s = dim null space(A^k) − dim null space(A^(k−1)) > 0.
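For a concrete nilpotent matrix these dimensions, and with them the sizes of the Jordan blocks, are easy to compute in MATLAB. In the sketch below N is a hypothetical 4 × 4 example of our own, consisting of two Jordan blocks J2(0); the j-th entry of d is dim null space(N^j).

N = [0 1 0 0; 0 0 0 0; 0 0 0 1; 0 0 0 0];   % two 2 x 2 Jordan blocks J2(0)
n = size(N,1);
d = zeros(1,n);
for j = 1:n
    d(j) = n - rank(N^j);    % dim null space(N^j) = n - rank(N^j)
end
d                            % here d = [2 4 4 4]

The nullity of N is 2, so N has two Jordan blocks, and the null spaces stabilize at the second power, so both blocks are 2 × 2.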
12 Matlab Commands
† indicates an laode toolbox command not found in MATLAB.
Chapter 1: Preliminaries
format            Changes the display format to the standard five-digit format
format long       Changes the display format to 15 digits
format rational   Changes the display format to rational numbers
format short e    Changes the display format to five-digit floating point (scientific) notation
Matrix Commands
A*x        Performs the matrix-vector product of the matrix A with the vector x
A*B        Performs the matrix product of the matrices A and B
size(A)    Determines the number of rows and columns of the matrix A
inv(A)     Computes the inverse of the matrix A
Matrix Commands
orth(A) Computes an orthonormal basis for the column space of the matrix A
[Q,R] = qr(A,0) Computes the QR decomposition of the matrix A
Graphics Commands
axis([xmin,xmax,ymin,ymax])
           Forces MATLAB to use the intervals [xmin,xmax] and [ymin,ymax]
           for the x- and y-axes in a two-dimensional plot
plot(x,y,'o')
           Same as plot, except that the points (x(i),y(i)) are marked by
           circles and are not connected in sequence
Index
R3
  subspaces, 150
Rn, 2
ej, 62

MATLAB Instructions
  \, 18, 20, 46, 48, 53
  ', 7, 224
  *, 53, 74
  .^, 27
  :, 5
  ;, 4
  [1 2 1], 4
  [1; 2; 3], 5
  .*, 27
  ./, 27
  A(3,4), 19
  A([1 3],:), 31
  acos, 13
  addvec, 11
  addvec3, 11
  axis('equal'), 24
  bcoord, 223
  ccoord, 230
  cos, 91
  det, 194
  diag, 7
  dog, 56
  dot, 12, 13
  eig, 112, 116, 206, 268, 278
  exp(1), 45
  expm, 176
  eye, 7, 80
  format
    long, 21, 46
    rational, 21, 46
  grid, 24
  hold, 24
  i, 48
  imag, 268
  inf, 20
  inv, 81, 206, 224
  linspace, 23
  map, 56, 269
  meshgrid, 25
  norm, 12, 241, 250
  null, 116, 136, 137, 140, 206, 249
  orth, 250
  PhasePlane, 97, 156, 170
  pi, 45
  plot, 23
  poly, 201, 276
  prod, 177
  qr, 260
  rand, 22, 142
  rank, 42, 144
  real, 268
  rref, 39, 137
  sin, 91
  size, 75
  sqrt, 45
  sum, 201
  surf, 25
  trace, 201
  view, 28
  xlabel, 23
  ylabel, 23
  zeros, 7
  zoom, 28

acceleration, 182
amplitude, 243
angle between vectors, 13
associative, 73, 128
autonomous, 154

back substitution, 29, 34
basis, 143, 149, 150, 214, 224, 229, 235, 263
  construction, 149
  orthonormal, 248, 249, 251, 252, 254, 258, 260
binomial theorem, 281
  linearly independent, 160, 205, 275
  real, 105, 160
elementary row operations, 30, 143, 189, 191, 209
  in MATLAB, 31
equilibrium, 96
Euler's formula, 157, 264
expansion, 56, 263
exponential
  decay, 90
  growth, 90
external force, 182

first order
  reduction to, 183
fitting of data, 240
force, 182
frequency
  internal, 184
function space, 127, 242
  subspace of, 130
fundamental theorem of algebra, 199

Gaussian elimination, 29, 32
general solution, 113, 155, 157, 161, 183
generalized eigenspace, 272
geometric decay, 281
Goodman, Roy, i, 87, 97
Gram-Schmidt orthonormalization, 251, 252
growth rate, 90

Hermitian inner product, 254
homogeneous, 65, 132, 134, 136, 145, 182
Hooke's law, 182
hyperplane, 257

identity mapping, 58
inconsistent, 29, 40, 137
index, 272, 273
inhomogeneous, 66, 78
initial condition, 104, 154
  linear independence, 155
initial position, 183
initial value problem, 89, 103, 104, 113, 114, 154, 157, 175
  for second order equations, 183
initial velocity, 183
integral calculus, 88
inverse, 77, 78, 84, 128, 192, 265
  computation, 79
invertible, 77, 79, 84, 107, 163, 175, 192, 200, 214, 215

Jordan block, 272, 275, 281
Jordan normal form, 275, 277, 284
  basis for, 275

law of cosines, 12
Law of Pythagoras, 234
least squares, 241
  approximation, 234
  distance to a line, 234
  distance to a subspace, 235
  fit to a quadratic polynomial, 241, 242
  fit to a sinusoidal function, 243
  fitting of data, 240
  general fit, 242
length, 11
linear, 29, 60, 212, 229
  combination, 133, 136, 139, 145, 221
  fit to data, 240
  mapping, 60–62, 212, 222, 230
    construction, 212
    matrix, 213
  regression, 241, 242
linearly
  dependent, 139, 140, 148
  independent, 139, 140, 143, 148–150, 154, 158, 204, 248

MATLAB Instructions
  \, 288
  ↑, 287
  ', 287
  *, 289
  .^, 288
  :, 287
  ;, 287
  PhasePlane, 290
  .*, 288
zero mapping, 58
zero vector, 127, 128