
Linear Algebra

and Differential Equations


using MATLAB
January 4, 2023

by Martin Golubitsky, Michael Dellnitz, and Jim Fowler

This document was typeset on Wednesday 4th January, 2023.


Copyright © 1998 Martin Golubitsky and Michael Dellnitz.
Copyright © 2022 Martin Golubitsky and Michael Dellnitz and Jim Fowler.
This work is licensed under the Creative Commons Attribution-ShareAlike License.
To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/
The cover photograph was taken by Ben Scumin and is licensed under a CC BY-SA license.
If you distribute this work or a derivative, include the history of the document. The source code is available at:
http://github.com/mooculus/laode/
This book is typeset using LaTeX and the STIX and Gillius fonts.
This book uses the XIMERA document class.
We will be glad to receive corrections and suggestions for improvement at [email protected]
Contents

Preface  i

1 Preliminaries  1
   1.1 Vectors and Matrices  2
   1.2 MATLAB  4
   1.3 Special Kinds of Matrices  7
   1.4 The Geometry of Vector Operations  11

2 Solving Linear Equations  16
   2.1 Systems of Linear Equations and Matrices  17
   2.2 The Geometry of Low-Dimensional Solutions  23
   2.3 Gaussian Elimination  29
   2.4 Reduction to Echelon Form  38
   2.5 Linear Equations with Special Coefficients  45
   2.6 Uniqueness of Reduced Echelon Form  50

3 Matrices and Linearity  51
   3.1 Matrix Multiplication of Vectors  52
   3.2 Matrix Mappings  56
   3.3 Linearity  60
   3.4 The Principle of Superposition  65
   3.5 Composition and Multiplication of Matrices  69
   3.6 Properties of Matrix Multiplication  73
   3.7 Solving Linear Systems and Inverses  77
   3.8 Determinants of 2 × 2 Matrices  84

4 Solving Linear Differential Equations  87
   4.1 A Single Differential Equation  88
   4.2 *Rate Problems  92
   4.3 Uncoupled Linear Systems of Two Equations  96
   4.4 Coupled Linear Systems  100
   4.5 The Initial Value Problem and Eigenvectors  103
   4.6 Eigenvalues of 2 × 2 Matrices  108
   4.7 Initial Value Problems Revisited  113
   4.8 *Markov Chains  118

5 Vector Spaces  126
   5.1 Vector Spaces and Subspaces  127
   5.2 Construction of Subspaces  132
   5.3 Spanning Sets and MATLAB  136
   5.4 Linear Dependence and Linear Independence  139
   5.5 Dimension and Bases  143
   5.6 The Proof of the Main Theorem  148

6 Closed Form Solutions for Planar ODEs  153
   6.1 The Initial Value Problem  154
   6.2 Closed Form Solutions by the Direct Method  157
   6.3 Similar Matrices and Jordan Normal Form  163
   6.4 Sinks, Saddles, and Sources  168
   6.5 *Matrix Exponentials  174
   6.6 *The Cayley Hamilton Theorem  180
   6.7 *Second Order Equations  182

7 Determinants and Eigenvalues  187
   7.1 Determinants  188
   7.2 Eigenvalues and Eigenvectors  198
   7.3 Real Diagonalizable Matrices  204
   7.4 *Existence of Determinants  208

8 Linear Maps and Changes of Coordinates  211
   8.1 Linear Mappings and Bases  212
   8.2 Row Rank Equals Column Rank  217
   8.3 Vectors and Matrices in Coordinates  221
   8.4 *Matrices of Linear Maps on a Vector Space  229

9 Least Squares  233
   9.1 Least Squares Approximations  234
   9.2 Best Approximate Solution  238
   9.3 Least Squares Fitting of Data  240

10 Orthogonality  247
   10.1 Orthonormal Bases and Orthogonal Matrices  248
   10.2 Gram-Schmidt Orthonormalization Process  251
   10.3 The Spectral Theory of Symmetric Matrices  254
   10.4 *QR Decompositions  257

11 *Matrix Normal Forms  262
   11.1 Simple Complex Eigenvalues  263
   11.2 Multiplicity and Generalized Eigenvectors  270
   11.3 The Jordan Normal Form Theorem  275
   11.4 *Markov Matrix Theory  281
   11.5 *Proof of Jordan Normal Form  284

12 MATLAB Commands  287

Index  292
Preface

These notes provide an integrated approach to linear algebra and ordinary differential equations based on computers — in this case the software package MATLAB¹. We believe that computers can improve the conceptual understanding of mathematics — not just enable the completion of complicated calculations. We use computers in two ways: in linear algebra computers reduce the drudgery of calculations and enable students to focus on concepts and methods, while in differential equations computers display phase portraits graphically and enable students to focus on the qualitative information embodied in solutions rather than just on developing formulas for solutions.

¹MATLAB is a registered trademark of The MathWorks, Inc., Natick, MA.

We develop methods for solving both systems of linear equations and systems of (constant coefficient) linear ordinary differential equations. It is generally accepted that linear algebra methods aid in finding closed form solutions to systems of linear differential equations. The fact that the graphical solution of systems of differential equations can motivate concepts (both geometric and algebraic) in linear algebra is less often discussed.

These notes begin by solving linear systems of equations (through standard Gaussian elimination theory) and discussing elementary matrix theory. We then introduce simple differential equations — both single equations and planar systems — to motivate the notions of eigenvectors and eigenvalues. In subsequent chapters linear algebra and ODE theory are often mixed.

Regarding differential equations, our purpose is to introduce at the sophomore – junior level ideas from dynamical systems theory. We focus on phase portraits (and time series) rather than on techniques for finding closed form solutions. We assume that now and in the future practicing scientists and mathematicians will use ODE solving computer programs more frequently than they will use techniques of integration. For this reason we have focused on the information that is embedded in the computer graphical approach. We discuss both typical phase portraits (Morse-Smale systems) and typical one parameter bifurcations (both local and global). Our goal is to provide the mathematical background that is needed when interpreting the results of computer simulation.

The integration of computers: Our approach assumes that students have an easier time learning with computers if the computer segments are fully integrated with the course material. So we have interleaved the instructions on how to use MATLAB with the examples and theory in the text. With ease of use in mind, we have also provided a number of preloaded matrices and differential equations with the notes. Any equation label in this text that is followed by an asterisk can be loaded into MATLAB just by typing the formula number. For the successful use of this text, it is important that students have access to computers with MATLAB and the computer files associated with these notes.

John Polking developed an excellent graphical user interface for solving planar systems of autonomous differential equations called pplane. This program has been updated by Roy Goodman and is now called PhasePlane. We use PhasePlane instead of using the MATLAB native commands for solving ODEs. In these notes we also provide an introduction to PhasePlane.

For the most part we treat the computer as a black box. We have not attempted to explain how the computer, or more precisely MATLAB, performs computations. Linear algebra structures are developed (typically) with proofs, while differential equations theorems are presented (typically) without proof and are instead motivated by computer experimentation.

There are two types of exercises included with most sections — those that should be completed using pencil and paper (called Hand Exercises) and those that should be completed with the assistance of computers (called Computer Exercises).

Ways to use the text: We envision this course as a one-year sequence replacing the standard one semester linear algebra and ODE courses. There is a natural one semester Linear Systems course that can be taught using the material in this book. In this course students will learn both the basics of linear algebra and the basics of linear systems of differential equations. This one semester course covers the material in the first eight chapters. The Linear Systems course stresses eigenvalues and a baby Jordan normal form theory for 2 × 2 matrices and culminates in a classification of phase portraits for planar constant coefficient linear systems of differential equations. Time permitting, additional linear algebra topics from Chapters 9 and 10 may be included. Such material includes changes of coordinates for linear mappings, and orthogonality including Gram-Schmidt orthonormalization and least squares fitting of data.

We believe that by being exposed to ODE theory a student taking just the first semester of this sequence will gain a better appreciation of linear algebra than will a student who takes a standard one semester introduction to linear algebra. However, a more traditional Linear Algebra course can be taught by omitting Chapter 7 and de-emphasizing some of the material in Chapter 6. Then there will be time in a one semester course to cover a selection of the linear algebra topics mentioned at the end of the previous paragraph.

Chapters 1–3: We consider the first two chapters to be introductory material and we attempt to cover this material as quickly as we can. Chapter 1 introduces MATLAB along with elementary remarks on vectors and matrices. In our course we ask the students to read the material in Chapter 1 and to use the computer instructions in that chapter as an entry into MATLAB. In class we cover only the material on dot product. Chapter 2 explains how to solve systems of linear equations and is required for a first course on linear algebra. The proof of the uniqueness of reduced echelon form matrices is not very illuminating for students and can be omitted in classroom discussion. Sections whose material we feel can be omitted are noted by asterisks in the Table of Contents, and Section 2.6 is the first example of such a section.

In Chapter 3 we introduce matrix multiplication as a notation that simplifies the presentation of systems of linear equations. We then show how matrix multiplication leads to linear mappings and how linearity leads to the principle of superposition. Multiplication of matrices is introduced as composition of linear mappings, which makes transparent the observation that multiplication of matrices is associative. The chapter ends with a discussion of inverse matrices and the role that inverses play in solving systems of linear equations. The determinant of a 2 × 2 matrix is introduced and its role in determining matrix inverses is emphasized.

Chapter 4: This chapter provides a nonstandard introduction to differential equations. We begin by emphasizing that solutions to differential equations are functions (or pairs of functions for planar systems). We explain in detail the two ways that we may graph solutions to differential equations (time series and phase space) and how to go back and forth between these two graphical representations. The use of the computer is mandatory in this chapter. Chapter 4 dwells on the qualitative theory of solutions to autonomous ordinary differential equations. In one dimension we discuss the importance of knowing equilibria and their stability so that we can understand the fate of all solutions. In two dimensions we emphasize constant coefficient linear systems and the (numerical) existence of invariant directions (eigendirections). In this way we motivate the introduction of eigenvalues and eigenvectors, which are discussed in detail for 2 × 2 matrices. Once we know how to compute eigenvalues and eigendirections, we then show how this information coupled with superposition leads to closed form solutions of initial value problems, at least when the eigenvalues are real and distinct.

We are not trying to give a thorough grounding in techniques for solving differential equations in Chapter 4; rather we are trying to give an introduction to the ways that modern computer programs will represent graphically solutions to differential equations. We have included, however, a section on separation of variables for those who wish to introduce techniques for finding closed form solutions to single differential equations at this time. Our preference is to omit this section in the Linear Systems course, as well as to omit the applications in Section 4.2 of the linear growth model in one dimension to interest rates and population dynamics.

Chapter 5: In this chapter we introduce vector space theory: vector spaces, subspaces, spanning sets, linear independence, bases, dimensions and the other basic notions in linear algebra. Since solutions to differential equations naturally reside in function spaces, we are able to illustrate that vector spaces other than Rn arise naturally. We have found that, depending on time, the proof of the main theorem, which appears in Section 5.6, may be omitted in a first course. The material in these chapters is mandatory in any first course on linear algebra.

Chapter 6: At this juncture the text divides into two tracks: one concerned with the qualitative theory of solutions to linear and nonlinear planar systems of differential equations and one mainly concerned with the development of higher dimensional linear algebra. We begin with a description of the differential equations chapters. Chapter 6 describes closed form solutions to planar systems of constant coefficient linear differential equations in two different ways: a direct method based on eigenvalues and eigenvectors and a related method based on similarity of matrices. Each method has its virtues and vices. Note that the Jordan normal form theorem for 2 × 2 matrices is proved when discussing how to solve linear planar systems using similarity of matrices.

Chapters 7, 8, 10, and 11: Chapter 7 discusses determinants, characteristic polynomials, and eigenvalues for n × n matrices. Chapter 8 presents more advanced material on linear mappings including row rank equals column rank and the matrix representation of mappings in different coordinate systems. The material in Sections 8.1 and 8.2 could be presented directly after Chapter 5, while the material in Section 8.3 explains the geometric meaning of similarity.

Orthogonal bases and orthogonal matrices, least squares and Gram-Schmidt orthonormalization, and symmetric matrices are presented in Chapter 10. This material is very important, but is not required later in the text, and may be omitted.

The Jordan normal form theorem for n × n matrices is presented in Chapter 11. Diagonalization of matrices with distinct real and complex eigenvalues is presented in the first two sections. The appendices, including the proof of the complete Jordan normal form theorem, are included for completeness and should be omitted in classroom presentations.

The Classroom Use of Computers: At the University of Houston we use a classroom with an IBM compatible PC and an overhead display. Lectures are presented three hours a week using a combination of blackboard and computer display. We find it inadvisable to use the computer for more than five minutes at a time; we tend to go back and forth between standard lecture style and computer presentations. (The preloaded matrices and differential equations are important to the smooth use of the computer in class.)

We ask students to enroll in a one hour computer lab where they can practice using the material in the text on a computer, do their homework and additional projects, and ask questions of TA's. Our computer lab happens to have 15 power macs. In addition, we ensure that MATLAB and the laode files are available on student use computers around the campus (which is not always easy). The laode files are on the enclosed CDROM; they may also be downloaded by using a web browser or by anonymous ftp.

Acknowledgements: This course was first taught on a pilot basis during the 1995–96 academic year at the University of Houston. We thank the Mathematics Department and the College of Natural Sciences and Mathematics of the University of Houston for providing the resources needed to bring a course such as this to fruition. We gratefully acknowledge John Polking's help in adapting his software for our use and for allowing us access to his code so that we could write companion software for use in linear algebra.

We thank Denny Brown for his advice and his careful readings of the many drafts of this manuscript. We thank Gerhard Dangelmayr, Michael Field, Michael Friedberg, Steven Fuchs, Kimber Gross, Barbara Keyfitz, Charles Peters and David Wagner for their advice on the presentation of the material. We also thank Elizabeth Golubitsky, who has written the companion Solutions Manual, for her help in keeping the material accessible and in a proper order. Finally, we thank the students who stayed with this course on an experimental basis and by doing so helped to shape its form.

Houston and Bayreuth        Martin Golubitsky
May, 1998                   Michael Dellnitz

Columbus                    Martin Golubitsky
February, 2018              James Fowler


1 Preliminaries
The subjects of linear algebra and differential equations
involve manipulating vector equations. In this chapter
we introduce our notation for vectors and matrices —
and we introduce MATLAB, a computer program that is
designed to perform vector manipulations in a natural
way.
We begin, in Section 1.1, by defining vectors and matri-
ces, and by explaining how to add and scalar multiply
vectors and matrices. In Section 1.2 we explain how to
enter vectors and matrices into MATLAB, and how to
perform the operations of addition and scalar multiplica-
tion in MATLAB. There are many special types of ma-
trices; these types are introduced in Section 1.3. In the
concluding section, we introduce the geometric interpre-
tations of vector addition and scalar multiplication; in
addition we discuss the angle between vectors through
the use of the dot product of two vectors.

1.1 Vectors and Matrices

In their elementary form, matrices and vectors are just lists of real numbers in different formats. An n-vector is a list of n numbers (x1, x2, . . . , xn). We may write this vector as a row vector as we have just done — or as a column vector

   [ x1 ]
   [ ⋮  ]
   [ xn ]

The set of all (real-valued) n-vectors is denoted by Rn; so points in Rn are called vectors. The sets Rn when n is small are very familiar sets. The set R1 = R is the real number line, and the set R2 is the Cartesian plane. The set R3 consists of points or vectors in three dimensional space.

An m × n matrix is a rectangular array of numbers with m rows and n columns. A general 2 × 3 matrix has the form

A = [ a11 a12 a13 ; a21 a22 a23 ].

We use the convention that matrix entries aij are indexed so that the first subscript i refers to the row while the second subscript j refers to the column. So the entry a21 refers to the matrix entry in the 2nd row, 1st column.

An n × m matrix A and an n′ × m′ matrix B are equal precisely when the sizes of the matrices are equal (n = n′ and m = m′) and when each of the corresponding entries are equal (aij = bij).

There is some redundancy in the use of the terms "vector" and "matrix". For example, a row n-vector may be thought of as a 1 × n matrix, and a column n-vector may be thought of as an n × 1 matrix. There are situations where matrix notation is preferable to vector notation and vice-versa.

Addition and Scalar Multiplication of Vectors: There are two basic operations on vectors: addition and scalar multiplication. Let x = (x1, . . . , xn) and y = (y1, . . . , yn) be n-vectors. Then

x + y = (x1 + y1, . . . , xn + yn);

that is, vector addition is defined as componentwise addition.

Similarly, scalar multiplication is defined as componentwise multiplication. A scalar is just a number. Initially, we use the term scalar to refer to a real number — but later on we sometimes use the term scalar to refer to a complex number. Suppose r is a real number; then the multiplication of a vector by the scalar r is defined as

rx = (rx1, . . . , rxn).

Subtraction of vectors is defined simply as

x − y = (x1 − y1, . . . , xn − yn).

Formally, subtraction of vectors may also be defined as

x − y = x + (−1)y.

Division of a vector x by a scalar r is defined to be

(1/r)x.

The standard difficulties concerning division by zero still hold.

Addition and Scalar Multiplication of Matrices: Similarly, we add two m × n matrices by adding corresponding entries, and we multiply a scalar times a matrix by multiplying each entry of the matrix by that scalar. For example,

[ 0 2 ; 4 6 ] + [ 1 −3 ; 1 4 ] = [ 1 −1 ; 5 10 ]
and

4 [ 2 −4 ; 3 1 ] = [ 8 −16 ; 12 4 ].

The main restriction on adding two matrices is that the matrices must be of the same size. So you cannot add a 4 × 3 matrix to a 6 × 2 matrix — even though they both have twelve entries.
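The two computations above are easy to confirm in MATLAB once matrix entry (described in Section 1.2) is familiar. The lines below are a minimal check, not part of the text's exercises:

A = [0 2; 4 6];
B = [1 -3; 1 4];
A + B             % componentwise sum, reproducing [1 -1; 5 10]
4*[2 -4; 3 1]     % scalar multiplication, reproducing [8 -16; 12 4]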
Exercises

In Exercises 1 – 3, let x = (2, 1, 3) and y = (1, 1, −1) and compute the given expression.

1. x + y.

2. 2x − 3y.

3. 4x.

4. Let A be the 3 × 4 matrix

A = [ 2 −1 0 1 ; 3 4 −7 10 ; 6 −3 4 2 ].

(a) For which n is a row of A a vector in Rn?
(b) What is the 2nd column of A?
(c) Let aij be the entry of A in the ith row and the jth column. What is a23 − a31?

For each of the pairs of vectors or matrices in Exercises 5 – 9, decide whether addition of the members of the pair is possible; and, if addition is possible, perform the addition.

5. x = (2, 1) and y = (3, −1).

6. x = (1, 2, 2) and y = (−2, 1, 4).

7. x = (1, 2, 3) and y = (−2, 1).

8. A = [ 1 3 ; 0 4 ] and B = [ 2 1 ; 1 −2 ].

9. A = [ 2 1 0 ; 4 1 0 ; 0 0 0 ] and B = [ 2 1 ; 1 −2 ].

In Exercises 10 – 11, let A = [ 2 1 ; −1 4 ] and B = [ 0 2 ; 3 −1 ] and compute the given expression.

10. 4A + B.

11. 2A − 3B.
1.2 MATLAB

We shall use MATLAB to compute addition and scalar multiplication of vectors in two and three dimensions. This will serve the purpose of introducing some basic MATLAB commands.

Entering Vectors and Vector Operations: Begin a MATLAB session. We now discuss how to enter a vector into MATLAB. The syntax is straightforward; to enter the row vector x = (1, 2, 1) type²

x = [1 2 1]

and MATLAB responds with

x =
     1     2     1

²MATLAB has several useful line editing features. We point out two here: (a) horizontal arrow keys (→, ←) move the cursor one space without deleting a character; (b) vertical arrow keys (↑, ↓) recall previous and next command lines.

Next we show how easy it is to perform addition and scalar multiplication in MATLAB. Enter the row vector y = (2, −1, 1) by typing

y = [2 -1 1]

and MATLAB responds with

y =
     2    -1     1

To add the vectors x and y, type

x + y

and MATLAB responds with

ans =
     3     1     2

This vector is easily checked to be the sum of the vectors x and y. Similarly, to perform a scalar multiplication, type

2*x

which yields

ans =
     2     4     2

MATLAB subtracts the vector y from the vector x in the natural way. Type

x - y

to obtain

ans =
    -1     3     0

We mention two points concerning the operations that we have just performed in MATLAB.

(a) When entering a vector or a number, MATLAB automatically echoes what has been entered. This echoing can be suppressed by appending a semicolon to the line. For example, type

z = [-1 2 3];

and MATLAB responds with a new line awaiting a new command. To see the contents of the vector z just type z and MATLAB responds with
z =
    -1     2     3

(b) MATLAB stores in a new vector the information obtained by algebraic manipulation. Type

a = 2*x - 3*y + 4*z;

Now type a to find

a =
    -8    15    11

We see that MATLAB has created a new row vector a with the correct number of entries.

Note: In order to use the result of a calculation later in a MATLAB session, we need to name the result of that calculation. To recall the calculation 2*x - 3*y + 4*z, we needed to name that calculation, which we did by typing a = 2*x - 3*y + 4*z. Then we were able to recall the result just by typing a.

We have seen that we enter a row n-vector into MATLAB by surrounding a list of n numbers separated by spaces with square brackets. For example, to enter the 5-vector w = (1, 3, 5, 7, 9) just type

w = [1 3 5 7 9]

Note that the addition of two vectors is only defined when the vectors have the same number of entries. Trying to add the 3-vector x with the 5-vector w by typing x + w in MATLAB yields the warning:

??? Error using ==> +
Matrix dimensions must agree.

In MATLAB new rows are indicated by typing ;. For example, to enter the column vector

z =
   -1
    2
    3

just type:

z = [-1; 2; 3]

and MATLAB responds with

z =
    -1
     2
     3

Note that MATLAB will not add a row vector and a column vector. Try typing x + z.

Individual entries of a vector can also be addressed. For instance, to display the first component of z type z(1).

Entering Matrices: Matrices are entered into MATLAB row by row with rows separated either by semicolons or by line returns. To enter the 2 × 3 matrix

A = [ 2 3 1 ; 1 4 7 ],

just type

A = [2 3 1; 1 4 7]

MATLAB has very sophisticated methods for addressing the entries of a matrix. You can directly address individual entries, individual rows, and individual columns. To display the entry in the 1st row, 3rd column of A, type A(1,3). To display the 2nd column of A, type A(:,2); and to display the 1st row of A, type A(1,:). For example, to add the two rows of A and store them in the vector x, just type

x = A(1,:) + A(2,:)

MATLAB has many operations involving matrices — these will be introduced later, as needed.
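As a small illustration of these addressing commands, the lines below combine entry, row, and column access on the matrix A just entered. This is only a sketch; the last line uses the built-in command sum (which sums each column of a matrix) even though that command is not described above.

A = [2 3 1; 1 4 7];
A(1,3)                 % the entry in the 1st row, 3rd column
A(:,2)                 % the 2nd column of A
x = A(1,:) + A(2,:)    % the sum of the two rows, as above
sum(A)                 % sums each column of A, giving [3 7 8]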
Exercises

1. (matlab) Enter the 3 × 4 matrix

A = [ 1 2 5 7 ; −1 2 1 −2 ; 4 6 8 0 ].

As usual, let aij denote the entry of A in the ith row and jth column. Use MATLAB to compute the following:

(a) a13 + a32.
(b) Three times the 3rd column of A.
(c) Twice the 2nd row of A minus the 3rd row.
(d) The sum of all of the columns of A.

2. (matlab) Verify that MATLAB adds vectors only if they are of the same type, by typing

(a) x = [1 2], y = [2; 3] and x + y.
(b) x = [1 2], y = [2 3 1] and x + y.

In Exercises 3 – 4, let x = (1.2, 1.4, −2.45) and y = (−2.6, 1.1, 0.65) and use MATLAB to compute the given expression.

3. (matlab) 3.27x − 7.4y.

4. (matlab) 1.65x + 2.46y.

In Exercises 5 – 6, let

A = [ 1.2 2.3 −0.5 ; 0.7 −1.4 2.3 ]  and  B = [ −2.9 1.23 1.6 ; −2.2 1.67 0 ]

and use MATLAB to compute the given expression.

5. (matlab) −4.2A + 3.1B.

6. (matlab) 2.67A − 1.1B.

In Exercises 7 – 9, use MATLAB as a calculator to perform the indicated computations:

7. (matlab) Use MATLAB to determine 18(2 + 3.5)/4.7 to 4 decimal places.

8. (matlab) Use MATLAB to determine the vector [√3, e², sin(π/6)] to 4 decimal places.

9. (matlab) Use MATLAB to compute log(−0.1). Are you surprised by the answer?
1.3 Special Kinds of Matrices

There are many matrices that have special forms and hence have special names — which we now list.

• A square matrix is a matrix with the same number of rows and columns; that is, a square matrix is an n × n matrix.

• A diagonal matrix is a square matrix whose only nonzero entries are along the main diagonal; that is, aij = 0 if i ≠ j. The following is a 3 × 3 diagonal matrix:

  [ 1 0 0 ; 0 2 0 ; 0 0 3 ].

  There is a shorthand in MATLAB for entering diagonal matrices. To enter this 3 × 3 matrix, type diag([1 2 3]).

• The identity matrix is the diagonal matrix all of whose diagonal entries equal 1. The n × n identity matrix is denoted by In. This identity matrix is entered in MATLAB by typing eye(n).

• A zero matrix is a matrix all of whose entries are 0. A zero matrix is denoted by 0. This notation is ambiguous since there is a zero m × n matrix for every m and n. Nevertheless, this ambiguity rarely causes any difficulty. In MATLAB, to define an m × n matrix A whose entries all equal 0, just type A = zeros(m,n). To define an n × n zero matrix B, type B = zeros(n).

• The transpose of an m × n matrix A is the n × m matrix obtained from A by interchanging rows and columns. Thus the transpose of the 4 × 2 matrix

  [ 2 1 ; −1 2 ; 3 −4 ; 5 7 ]

  is the 2 × 4 matrix

  [ 2 −1 3 5 ; 1 2 −4 7 ].

  Suppose that you enter this 4 × 2 matrix into MATLAB by typing

  A = [2 1; -1 2; 3 -4; 5 7]

  The transpose of a matrix A is denoted by A^t. To compute the transpose of A in MATLAB, just type A'.

• A symmetric matrix is a square matrix whose entries are symmetric about the main diagonal; that is aij = aji. Note that a symmetric matrix is a square matrix A for which A^t = A.

• An upper triangular matrix is a square matrix all of whose entries below the main diagonal are 0; that is, aij = 0 if i > j. A strictly upper triangular matrix is an upper triangular matrix whose diagonal entries are also equal to 0. Similar definitions hold for lower triangular and strictly lower triangular matrices. The following four 3 × 3 matrices are examples of upper triangular, strictly upper triangular, lower triangular, and strictly lower triangular matrices:

  [ 1 2 3 ; 0 2 4 ; 0 0 6 ]     [ 0 2 3 ; 0 0 4 ; 0 0 0 ]
  [ 7 0 0 ; 5 2 0 ; −4 1 −3 ]   [ 0 0 0 ; 5 0 0 ; 10 1 0 ].

• A square matrix A is block diagonal if

  A = [ B1 0 ⋯ 0 ; 0 B2 ⋯ 0 ; ⋮ ⋱ ⋮ ; 0 0 ⋯ Bk ]
  where each Bj is itself a square matrix. An example of a 5 × 5 block diagonal matrix with one 2 × 2 block and one 3 × 3 block is:

  [ 2 3 0 0 0 ; 4 1 0 0 0 ; 0 0 1 2 3 ; 0 0 3 2 4 ; 0 0 1 1 5 ].
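The MATLAB shorthands mentioned in this list can be combined into a short session. The following is only a sketch; the comparison command isequal and the symmetric matrix S used here are illustrations and do not appear elsewhere in this section.

D = diag([1 2 3])      % the 3 x 3 diagonal matrix above
I = eye(3)             % the 3 x 3 identity matrix
Z = zeros(2,3)         % a 2 x 3 zero matrix
A = [2 1; -1 2; 3 -4; 5 7];
A'                     % the transpose, a 2 x 4 matrix
S = [1 2; 2 5];
isequal(S, S')         % returns 1 (true), so S equals its transpose and is symmetric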
Exercises

In Exercises 1 – 5 decide whether or not the given matrix is symmetric.

1. [ 2 1 ; 1 5 ].

2. [ 1 1 ; 0 −5 ].

3. (3).

4. [ 3 4 ; 4 3 ; 0 1 ].

5. A = [ 3 4 −1 ; 4 3 1 ; −1 1 10 ].

In Exercises 6 – 10 decide which of the given matrices are upper (or lower) triangular and which are strictly upper (or lower) triangular.

6. [ 2 0 ; −1 −2 ].

7. [ 0 4 ; 0 0 ].

8. (2).

9. [ 3 2 ; 0 1 ; 0 0 ].

10. [ 0 2 −4 ; 0 7 −2 ; 0 0 0 ].

A general 2 × 2 diagonal matrix has the form [ a 0 ; 0 b ]. Thus two numbers a and b are needed to specify each 2 × 2 diagonal matrix. In Exercises 11 – 16, how many different numbers are needed to specify each of the given matrices:

11. An upper triangular 2 × 2 matrix?

12. A symmetric 2 × 2 matrix?

13. An m × n matrix?

14. A diagonal n × n matrix?

15. An upper triangular n × n matrix?

16. A symmetric n × n matrix?

In each of Exercises 17 – 19 determine whether the statement is True or False.

17. Every symmetric upper triangular matrix is diagonal.

18. Every diagonal matrix is a scalar multiple of the identity matrix.

19. Every block diagonal matrix is symmetric.

20. (matlab) Use MATLAB to compute A^t when

A = [ 1 2 4 7 ; 2 1 5 6 ; 4 6 2 1 ]    (1.3.1)

Use MATLAB to verify that (A^t)^t = A by setting B=A', C=B', and checking that C = A.
In Exercises 21 – 22, compute A + A^t and show it is symmetric:

21. A = [ 2 3 ; −1 6 ].

22. A = [ 3 7 5 ; −2 0 1 ; −3 4 2 ].

23. (matlab) Use MATLAB to show that A + A^t is symmetric for

A = [ 2 8 3 0 ; 1 0 8 2 ; 2 −5 −6 7 ; 9 −1 −4 5 ]

in the following way:

(a) Enter matrix A in MATLAB.
(b) Define B = A + A^t.
(c) Verify that B − B^t = 0.

Exercises 24 – 25 prove several properties of the transpose operation.

24. Let A and B be m × n matrices. Show that

(A + B)^t = A^t + B^t.    (1.3.2)

25. Let A be an m × n matrix and let c be a real number. Show that (cA)^t = cA^t.

26. Let A be an n × n matrix. Use (1.3.2) to show that A + A^t is a symmetric matrix.

Definition 1.3.1. An n × n matrix is skew-symmetric if A^t = −A. Equivalently, aij = −aji for each i, j = 1, . . . , n.

27. Show that an n × n skew-symmetric matrix A has diagonal elements aii = 0, for all i = 1, . . . , n.

28. If A is an n × n matrix, show that A − A^t is skew-symmetric.

29. (matlab) Use MATLAB to show that A − A^t is skew-symmetric for

A = [ 2 8 3 0 ; 1 0 8 2 ; 2 −5 −6 7 ; 9 −1 −4 5 ]

in the following way:

(a) Enter matrix A in MATLAB.
(b) Define B = A − A^t.
(c) Verify that B + B^t = 0.

Let A = [ −1 1 ; 4 5 ] and B = [ −2 3 ; 0 1 ]. In Exercises 30 – 33, determine whether the given matrix is symmetric, skew-symmetric, or neither.

30. A + B.

31. A^t − B.

32. (A + A^t)/2.

33. (2021/2)(B − B^t).

34. Show that if an n × n matrix A is both symmetric and skew-symmetric, then A is a zero matrix.
35. Let A be an n × n matrix. Verify that

A = (A + A^t)/2 + (A − A^t)/2.    (1.3.3)

Use (1.3.3) to verify that any n × n matrix can be expressed as a sum of a symmetric matrix and a skew-symmetric matrix.

In Exercises 36 – 37, use (1.3.3) to express A = B + C, where B is symmetric and C is skew-symmetric.

36. A = [ 1 9 ; 3 2 ].

37. A = [ 0 3 −4 ; 3 9 −1 ; 1 6 −4 ].

38. (matlab) Let A = [ 13 2 −41 81 ; −2 11 −7 6 ; 15 −3 −20 −19 ; 4 −8 −27 0 ]. Use (1.3.3) and MATLAB to express A = B + C, where B is symmetric and C is skew-symmetric.
1.4 The Geometry of Vector Operations

In this section we discuss the geometry of addition, scalar multiplication, and dot product of vectors. We also use MATLAB graphics to visualize these operations.

Geometry of Addition: MATLAB has an excellent graphics language that we shall use at various times to illustrate concepts in both two and three dimensions. In order to make the connections between ideas and graphics more transparent, we will sometimes use previously developed MATLAB programs. We begin with such an example — the illustration of the parallelogram law for vector addition.

Suppose that x and y are two planar vectors. Think of these vectors as line segments from the origin to the points x and y in R2. We use a program written by T.A. Bryan to visualize x + y. In MATLAB type³:

x = [1 2];
y = [-2 3];
addvec(x,y)

³Note that all MATLAB commands are case sensitive — upper and lower case must be correct.

The vector x is displayed in blue, the vector y in green, and the vector x + y in red. Note that x + y is just the diagonal of the parallelogram spanned by x and y. A black and white version of this figure is given in Figure 1.

Figure 1: Addition of two planar vectors.

The parallelogram law (the diagonal of the parallelogram spanned by x and y is x + y) is equally valid in three dimensions. Use MATLAB to verify this statement by typing:

x = [1 0 2];
y = [-1 4 1];
addvec3(x,y)

The parallelogram spanned by x and y in R3 is shown in cyan; the diagonal x + y is shown in blue. See Figure 2. To test your geometric intuition, make several choices of vectors x and y. Note that one vertex of the parallelogram is always the origin.

Geometry of Scalar Multiplication: In all dimensions scalar multiplication just scales the length of the vector. To discuss this point we need to define the length of a vector. View an n-vector x = (x1, . . . , xn) as a line segment from the origin to the point x. Using the Pythagorean theorem, it can be shown that the length or norm of this line segment is:

||x|| = √(x1² + · · · + xn²).

MATLAB has the command norm for finding the length of a vector. Test this by entering the 3-vector

x = [1 4 2];

Then type

norm(x)
Figure 2: Addition of two vectors in three dimensions.

MATLAB responds with:

ans =
    4.5826

which is indeed approximately √(1 + 4² + 2²) = √21.

Now suppose r ∈ R and x ∈ Rn. A calculation shows that

||rx|| = |r| ||x||.    (1.4.1)

See Exercise 18. Note also that if r is positive, then the direction of rx is the same as that of x; while if r is negative, then the direction of rx is opposite to the direction of x. The lengths of the vectors 3x and −3x are each three times the length of x — but these vectors point in opposite directions. Scalar multiplication by the scalar 0 produces the 0 vector, the vector whose entries are all zero.

Dot Product and Angles: The dot product of two n-vectors x = (x1, . . . , xn) and y = (y1, . . . , yn) is an important operation on vectors. It is defined by:

x · y = x1y1 + · · · + xnyn.    (1.4.2)

Note that x · x is just ||x||², the length of x squared.

MATLAB also has a command for computing dot products of n-vectors. Type

x = [1 4 2];
y = [2 3 -1];
dot(x,y)

MATLAB responds with the dot product of x and y, namely,

ans =
    12

One of the most important facts concerning dot products is the one that states

x · y = 0   if and only if   x and y are perpendicular.    (1.4.3)

Indeed, dot product also gives a way of numerically determining the angle between n-vectors, as follows.

Theorem 1.4.1. Let θ be the angle between two nonzero n-vectors x and y. Then

cos θ = (x · y) / (||x|| ||y||).    (1.4.4)

It follows that cos θ = 0 if and only if x · y = 0. Thus (1.4.3) is valid.

We show that Theorem 1.4.1 is just a restatement of the law of cosines. This law states

c² = a² + b² − 2ab cos θ,

where a, b, c are the lengths of the sides of a triangle and θ is the interior angle opposite the side of length c. See Figure 3.
Figure 3: Triangle formed by sides of length a, b, c with interior angle θ opposite side c.

We use trigonometry to verify the law of cosines. First, translate the triangle so that a vertex is at the origin. Second, rotate the triangle placing a vertex on the x-axis and another vertex above the x-axis. After translating and rotating, the coordinates of the nonzero vertex on the x-axis is (b, 0). Observe that the vertex above the x-axis has coordinates (a cos θ, a sin θ). Then use the distance formula to observe that the length c is the distance from the vertex at (b, 0) to the vertex at (a cos θ, a sin θ). That is,

c² = (a cos θ − b)² + (a sin θ)²
   = a² cos² θ − 2ab cos θ + b² + a² sin² θ
   = a² + b² − 2ab cos θ.

Proof of Theorem 1.4.1: In vector notation we can form a triangle two of whose sides are given by x and y in Rn. The third side is just x − y as x = y + (x − y), as in Figure 4.

Figure 4: Triangle formed by vectors x and y with interior angle θ.

It follows from the law of cosines that

||x − y||² = ||x||² + ||y||² − 2||x|| ||y|| cos θ.

We claim that

||x − y||² = ||x||² + ||y||² − 2x · y.

Assuming that the claim is valid, it follows that

x · y = ||x|| ||y|| cos θ,

which proves the theorem. Finally, compute

||x − y||² = (x1 − y1)² + · · · + (xn − yn)²
           = (x1² − 2x1y1 + y1²) + · · · + (xn² − 2xnyn + yn²)
           = (x1² + · · · + xn²) − 2(x1y1 + · · · + xnyn) + (y1² + · · · + yn²)
           = ||x||² − 2x · y + ||y||²

to verify the claim. ∎

Theorem 1.4.1 gives a numerically efficient method for computing the angle between vectors x and y. In MATLAB this computation proceeds by typing

theta = acos(dot(x,y)/(norm(x)*norm(y)))

where acos is the inverse cosine of a number. For example, using the 3-vectors x = (1, 4, 2) and y = (2, 3, −1) entered previously, MATLAB responds with

theta =
    0.7956
Remember that this answer is in radians. To convert this answer to degrees, just multiply by 360 and divide by 2π:

360*theta / (2*pi)

to obtain the answer of 45.5847°.

Area of Parallelograms: Let P be a parallelogram whose sides are the vectors v and w as in Figure 5. Let |P| denote the area of P. As an application of dot products and (1.4.4), we calculate |P|. We claim that

|P|² = ||v||² ||w||² − (v · w)².    (1.4.5)

We verify (1.4.5) as follows. Note that the area of P is the same as the area of the rectangle R also pictured in Figure 5. The side lengths of R are ||v|| and ||w|| sin θ, where θ is the angle between v and w. A computation using (1.4.4) shows that

|R|² = ||v||² ||w||² sin² θ
     = ||v||² ||w||² (1 − cos² θ)
     = ||v||² ||w||² (1 − ((v · w)/(||v|| ||w||))²)
     = ||v||² ||w||² − (v · w)²,

which establishes (1.4.5).

Figure 5: Parallelogram P beside rectangle R with same area.
area. In Exercises 12 – 17 compute the dot product x·y for the given
pair of vectors and the cosine of the angle between them.

14
12. x = (2, 0) and y = (2, 1).

13. x = (2, −1) and y = (1, 2).

14. x = (−1, 1, 4) and y = (0, 1, 3).

15. x = (−10, 1, 0) and y = (0, 1, 20).

16. x = (2, −1, 1, 3, 0) and y = (4, 0, 2, 7, 5).

17. x = (5, −1, 4, 1, 0, 0) and y = (−3, 0, 0, 1, 10, −5).

18. Using the definition of length, verify that formula (1.4.1) is valid.

19. (matlab) Use addvec and addvec3 to add vectors in R2 and R3. More precisely, enter pairs of 2-vectors x and y of your choosing into MATLAB, use addvec to compute x + y, and note the parallelogram formed by 0, x, y, x + y. Similarly, enter pairs of 3-vectors and use addvec3.

20. (matlab) Determine the vector of length 1 that points in the same direction as the vector

x = (2, 13.5, −6.7, 5.23).

21. (matlab) Determine the vector of length 1 that points in the same direction as the vector

y = (2.1, −3.5, 1.5, 1.3, 5.2).

In Exercises 22 – 24 find the angle in degrees between the given pair of vectors.

22. (matlab) x = (2, 1, −3, 4) and y = (1, 1, −5, 7).

23. (matlab) x = (2.43, 10.2, −5.27, π) and y = (−2.2, 0.33, 4, −1.7).

24. (matlab) x = (1, −2, 2, 1, 2.1) and y = (−3.44, 1.2, 1.5, −2, −3.5).

In Exercises 25 – 26 let P be the parallelogram generated by the given vectors v and w in R3. Compute the area of that parallelogram.

25. (matlab) v = (1, 5, 7) and w = (−2, 4, 13).

26. (matlab) v = (2, −1, 1) and w = (−1, 4, 3).

27. Show that the only vector that has norm zero is the zero vector. In other words, ||x|| = 0 implies that x = 0.
2 Solving Linear Equations

The primary motivation for the study of vectors and matrices is based on the study of solving systems of linear equations. The algorithms that enable us to find solutions are themselves based on certain kinds of matrix manipulations. In these algorithms, matrices serve as a shorthand for calculation, rather than as a basis for a theory. We will see later that these matrix manipulations do lead to a rich theory of how to solve systems of linear equations. But our first step is just to see how these equations are actually solved.

We begin with a discussion in Section 2.1 of how to write systems of linear equations in terms of matrices. We also show by example how complicated writing down the answer to such systems can be. In Section 2.2, we recall that solution sets to systems of linear equations in two and three variables are lines and planes.

The best known and probably the most efficient method for solving systems of linear equations (especially with a moderate to large number of unknowns) is Gaussian elimination. The idea behind this method, which is introduced in Section 2.3, is to manipulate matrices by elementary row operations to reduced echelon form. It is then possible just to look at the reduced echelon form matrix and to read off the solutions to the linear system, if any. The process of reading off the solutions is formalized in Section 2.4; see Theorem 2.4.6. Our discussion of solving linear equations is presented with equations whose coefficients are real numbers — though most of our examples have just integer coefficients. The methods work just as well with complex numbers, and this generalization is discussed in Section 2.5.

Throughout this chapter, we alternately discuss the theory and show how calculations that are tedious when done by hand can easily be performed by computer using MATLAB. The chapter ends with a proof of the uniqueness of row echelon form (a topic of theoretical importance) in Section 2.6. This section is included mainly for completeness and need not be covered on a first reading.
2.1 Systems of Linear Equations and Matrices

It is a simple exercise to solve the system of two equations

x + y = 7
−x + 3y = 1    (2.1.1)

to find that x = 5 and y = 2. One way to solve system (2.1.1) is to add the two equations, obtaining

4y = 8;

hence y = 2. Substituting y = 2 into the 1st equation in (2.1.1) yields x = 5.

This system of equations can be solved in a more algorithmic fashion by solving the 1st equation in (2.1.1) for x as

x = 7 − y,

and substituting this answer into the 2nd equation in (2.1.1), to obtain

−(7 − y) + 3y = 1.

This equation simplifies to:

4y = 8.

Now proceed as before.

Solving Larger Systems by Substitution: In contrast to solving the simple system of two equations, it is less clear how to solve a complicated system of five equations such as:

5x1 − 4x2 + 3x3 − 6x4 + 2x5 = 4
2x1 + x2 − x3 − x4 + x5 = 6
x1 + 2x2 + x3 + x4 + 3x5 = 19    (2.1.2)
−2x1 − x2 − x3 + x4 − x5 = −12
x1 − 6x2 + x3 + x4 + 4x5 = 4.

The algorithmic method used to solve (2.1.1) can be expanded to produce a method, called substitution, for solving larger systems. We describe the substitution method as it applies to (2.1.2). Solve the 1st equation in (2.1.2) for x1, obtaining

x1 = 4/5 + (4/5)x2 − (3/5)x3 + (6/5)x4 − (2/5)x5.    (2.1.3)

Then substitute the right hand side of (2.1.3) for x1 in the remaining four equations in (2.1.2) to obtain a new system of four equations in the four variables x2, x3, x4, x5. This procedure eliminates the variable x1. Now proceed inductively — solve the 1st equation in the new system for x2 and substitute this expression into the remaining three equations to obtain a system of three equations in three unknowns. This step eliminates the variable x2. Continue by substitution to eliminate the variables x3 and x4, and arrive at a simple equation in x5 — which can be solved. Once x5 is known, then x4, x3, x2, and x1 can be found in turn.

Two Questions:

• Is it realistic to expect to complete the substitution procedure without making a mistake in arithmetic?

• Will this procedure work — or will some unforeseen difficulty arise?

Almost surely, attempts to solve (2.1.2) by hand, using the substitution procedure, will lead to arithmetic errors. However, computers and software have developed to the point where solving a system such as (2.1.2) is routine. In this text, we use the software package MATLAB to illustrate just how easy it has become to solve equations such as (2.1.2).

The answer to the second question requires knowledge of the theory of linear algebra. In fact, no difficulties will develop when trying to solve the particular system (2.1.2) using the substitution algorithm. We discuss why later.
Solving Equations by MATLAB: We begin by discussing the information that is needed by MATLAB to solve (2.1.2). The computer needs to know that there are five equations in five unknowns — but it does not need to keep track of the unknowns (x1, x2, x3, x4, x5) by name. Indeed, the computer just needs to know the matrix of coefficients in (2.1.2)

[ 5 −4 3 −6 2 ; 2 1 −1 −1 1 ; 1 2 1 1 3 ; −2 −1 −1 1 −1 ; 1 −6 1 1 4 ]    (2.1.4*)

and the vector on the right hand side of (2.1.2)

[ 4 ; 6 ; 19 ; −12 ; 4 ].    (2.1.5*)

We now describe how we enter this information into MATLAB. To reduce the drudgery and to allow us to focus on ideas, the entries in equations having a ∗ after their label, such as (2.1.4*), have been entered in the laode toolbox. This information can be accessed as follows. After starting your MATLAB session, type

e2_1_4

followed by a carriage return. This instruction tells MATLAB to load equation (2.1.4*) of Chapter 2. The matrix of coefficients is now available in MATLAB; note that this matrix is stored in the 5 × 5 array A. What should appear is:

A =
     5    -4     3    -6     2
     2     1    -1    -1     1
     1     2     1     1     3
    -2    -1    -1     1    -1
     1    -6     1     1     4

Indeed, comparing this result with (2.1.4*), we see that A contains precisely the same information.

Since the label (2.1.5*) is followed by a '∗', we can enter the vector in (2.1.5*) into MATLAB by typing

e2_1_5

Note that the right hand side of (2.1.2) is stored in the vector b. MATLAB should have responded with

b =
     4
     6
    19
   -12
     4

Now MATLAB has all the information it needs to solve the system of equations given in (2.1.2). To have MATLAB solve this system, type

x = A\b

to obtain

x =
    5.0000
    2.0000
    3.0000
    4.0000
    1.0000

This answer is interpreted as follows: the five values of the unknowns x1, x2, x3, x4, x5 are stored in the vector x; that is,

x1 = 5, x2 = 2, x3 = 3, x4 = 4, x5 = 1.    (2.1.6)
The reader may verify that (2.1.6) is indeed a solution of (2.1.2) by substituting the values in (2.1.6) into the equations in (2.1.2).
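This check can also be done inside MATLAB. Matrix-vector multiplication is not discussed until Chapter 3, but MATLAB already accepts the product A*x, and the short sketch below (assuming A, b, and x are still the arrays loaded above) simply compares that product with b:

A*x - b     % substitutes the computed values into all five equations at once
% a column of zeros (up to rounding error) confirms that (2.1.6) solves (2.1.2)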
is needed to solve the system of equations
Changing Entries in MATLAB MATLAB also permits 5x1 − 4x2 + 3x3 − 6x4 + 2x5 = 4
access to single components of x. For instance, type 2x1 + x2 − x3 − x4 + x5 = 6
x1 + 2x2 + x3 − 2x4 + 3x5 = 19
x(5) −2x1 − x2 − x3 + x4 − x5 = −12
x1 − 6x2 + x3 + x4 + 4x5 = 4.
and the 5th entry of x is displayed,
As expected, this change in the coefficient matrix results
ans = in a change in the solution of system (2.1.2), as well.
1.0000 Typing

We see that the component x(i) of x corresponds to the x = A\b


component xi of the vector x where i = 1, 2, 3, 4, 5. Sim-
ilarly, we can access the entries of the coefficient matrix now leads to the solution
A. For instance, by typing
x =
A(3,4) 1.9455
3.0036
MATLAB responds with 3.0000
1.7309
ans = 3.8364
1
that is displayed to an accuracy of four decimal places.
It is also possible to change an individual entry in either
In the next step, change A as follows:
a vector or a matrix. For example, if we enter
A(2,3) = 1
A(3,4) = -2
The new system of equations is:
we obtain a new matrix A which when displayed is:
5x1 − 4x2 + 3x3 − 6x4 + 2x5 = 4
A = 2x1 + x2 + x3 − x4 + x5 = 6
5 -4 3 -6 2 x1 + 2x2 + x3 − 2x4 + 3x5 = 19 (2.1.7)
2 1 -1 -1 1 −2x1 − x2 − x3 + x4 − x5 = −12
1 2 1 -2 3 x1 − 6x2 + x3 + x4 + 4x5 = 4.
-2 -1 -1 1 -1
1 -6 1 1 4 The command

19
x = A\b

now leads to the message

Warning: Matrix is singular to working precision.

x =
   Inf
   Inf
   Inf
   Inf
   Inf

Obviously, something is wrong; MATLAB cannot find a solution to this system of equations! Assuming that MATLAB is working correctly, we have shed light on one of our previous questions: the method of substitution described by (2.1.3) need not always lead to a solution, even though the method does work for system (2.1.2). Why? As we will see, this is one of the questions that is answered by the theory of linear algebra. In the case of (2.1.7), it is fairly easy to see what the difficulty is: the second and fourth equations have the form y = 6 and −y = −12, respectively.

Warning: The MATLAB command

x = A\b

may give an error message similar to the previous one. When this happens, one must approach the answer with caution.
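One cautious habit, sketched below, is to look at the residual of whatever answer MATLAB returns; this uses the matrix-vector product A*x, which is only treated systematically in Chapter 3, so it is an illustration rather than a command introduced here. For a well-behaved system the residual is a column of (near) zeros; for the singular system (2.1.7) the entries of x are infinite, so the residual is not small and the printed answer should not be trusted.

x = A\b          % may produce the warning shown above
A*x - b          % the residual; near zero only when x really solves the system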
Exercises

In Exercises 1 – 3 find solutions to the given system of linear equations.

1. 2x − y = 0
   3x = 6

2. 3x − 4y = 2
   2y + z = 1
   3z = 9

3. −2x + y = 9
   3x + 3y = −9

4. Write the coefficient matrices for each of the systems of linear equations given in Exercises 1 – 3.

5. Neither of the following systems of three equations in three unknowns has a unique solution — but for different reasons. Solve these systems and explain why these systems cannot be solved uniquely.

(a) x − y = 4
    x + 3y − 2z = −6
    4x + 2y − 3z = 1

and

(b) 2x − 4y + 3z = 4
    3x − 5y + 3z = 5
    2y − 3z = −4

6. Last year Dick was twice as old as Jane. Four years ago the sum of Dick's age and Jane's age was twice Jane's age now. How old are Dick and Jane?
Answer: Dick is 17 and Jane is 9.
Solution: Rewrite the two statements as linear equations in D — Dick's age now — and J — Jane's age now. Then solve the system of linear equations.
§2.1 Systems of Linear Equations and Matrices

7. (a) Find a quadratic polynomial p(x) = ax2 + bx + c 9. (matlab) Matrices are entered in MATLAB as follows.
satisfying p(0) = 1, p(1) = 5, and p(−1) = −5. To enter the 2 × 3 matrix A, type A = [ -1 1 2; 4 1 2].
Enter this matrix into MATLAB; the displayed matrix should
be

A =
(b) Prove that for every triple of real numbers L, M , and -1 1 2
N , there is a quadratic polynomial satisfying p(0) = L, 4 1 2
p(1) = M , and p(−1) = N .
Now change the entry in the 2nd row, 1st column to −5.

10. (matlab) Column vectors with n entries are viewed by


(c) Let x1 , x2 , x3 be three unequal real numbers and let
MATLAB as n × 1 matrices. Enter the vector b = [1; 2;
A1 , A2 , A3 be three real numbers. Show that finding a
-4]. Then change the 3rd entry in b to 13.
quadratic polynomial q(x) that satisfies q(xi ) = Ai is
equivalent to solving a system of three linear equations.

11. (matlab) This problem illustrates some of the differ-


ent ways that MATLAB displays numbers using the format
long, the format short and the format rational com-
mands.
Use MATLAB to solve the following system of equations
8. (matlab) Using MATLAB type the commands e2_1_8 2x1 − 4.5x2 + 3.1x3 = 4.2
and e2_1_9 to load the matrices: x1 + x2 + x3 = −5.1
  x1 − 6.2x2 + x3 = 1.3 .
−5.6 0.4 −9.8 8.6 4.0 −3.4
 −9.1 6.6 −2.3 6.9 8.2 2.7 You may change the format of your answer in MATLAB. For

 
 3.6 −9.3 −8.7 0.5 5.2 5.1 example, to print your result with an accuracy of 15 digits

A=  3.6 −8.9 −1.7 −8.2 −4.8

9.8 type format long and redisplay the answer. Similarly, to

 
 8.7 0.6 3.7 3.1 −9.1 −2.7 
print your result as fractions type format rational and re-
−2.3 3.4 1.8 −1.7 4.7 −5.1 display your answer.
(2.1.8*)
and the vector  
9.7
 4.5 12. (matlab) Enter the following matrix and vector into
MATLAB

 
 5.1
(2.1.9*)

b=
 3.0


A = [ 1 0 -1 ; 2 5 3 ; 5 -1 0];
 
 −8.5 
2.6 b = [ 1; 1; -2];
Solve the corresponding system of linear equations.
and solve the corresponding system of linear equations by
typing


x = A\b

Your answer should be

x =
-0.2000
1.0000
-1.2000

Find an integer for the entry in the 2nd row, 2nd column of A
so that the solution

x = A\b

is not defined. Hint: The answer is an integer between −4


and 4.

13. (matlab) The MATLAB command rand(m,n) defines


matrices with random entries between 0 and 1. For example,
the command A = rand(5,5) generates a random 5 × 5 ma-
trix, whereas the command b = rand(5,1) generates a col-
umn vector with 5 random entries. Use these commands to
construct several systems of linear equations and then solve
them.

14. (matlab) Suppose that the four substances S1 , S2 , S3 ,


S4 contain the following percentages of vitamins A, B, C and
F by weight

Vitamin S1 S2 S3 S4
A 25% 19% 20% 3%
B 2% 14% 2% 14%
C 8% 4% 1% 0%
F 25% 31% 25% 16%

Mix the substances S1 , S2 , S3 and S4 so that the result-


ing mixture contains precisely 3.85 grams of vitamin A, 2.30
grams of vitamin B, 0.80 grams of vitamin C, and 5.95 grams
of vitamin F. How many grams of each substance have to be
contained in the mixture?
Discuss what happens if we require that the resulting mixture
contains 2.00 grams of vitamin B instead of 2.30 grams.


2.2 The Geometry of Low-Dimensional Solutions

In this section we discuss how to use MATLAB graphics to solve systems of linear equations in two and three unknowns. We begin with two dimensions.

Linear Equations in Two Dimensions  The set of all solutions to the equation

2x − y = 6        (2.2.1)

is a straight line in the xy plane; this line has slope 2 and y-intercept equal to −6. We can use MATLAB to plot the solutions to this equation — though some understanding of the way MATLAB works is needed.

The plot command in MATLAB plots a sequence of points in the plane, as follows. Let X and Y be n vectors. Then

plot(X,Y)

will plot the points (X(1), Y(1)), (X(2), Y(2)), …, (X(n), Y(n)) in the xy-plane.

To plot points on the line (2.2.1) we need to enter the x-coordinates of the points we wish to plot. If we want to plot a hundred points, we would be facing a tedious task. MATLAB has a command to simplify this task. Typing

x = linspace(-5,5,100);

produces a vector x with 100 entries with the 1st entry equal to −5, the last entry equal to 5, and the remaining 98 entries equally spaced between −5 and 5. MATLAB has another command that allows us to create a vector of points x. In this command we specify the distance between points rather than the number of points. That command is:

x = -5:0.1:5;

Producing x by either command is acceptable.

Typing

y = 2*x - 6;

produces a vector whose entries correspond to the y-coordinates of points on the line (2.2.1). Then typing

plot(x,y)

produces the desired plot. It is useful to label the axes on this figure, which is accomplished by typing

xlabel('x')
ylabel('y')

We can now use MATLAB to solve the equation (2.1.1) graphically. Recall that (2.1.1) is:

x + y = 7
−x + 3y = 1

A solution to this system of equations is a point that lies on both lines in the system. Suppose that we search for a solution to this system that has an x-coordinate between −3 and 7. Then type the commands

x = linspace(-3,7,100);
y = 7 - x;
plot(x,y)
xlabel('x')
ylabel('y')
hold on
y = (1 + x)/3;
plot(x,y)
axis('equal')
grid

The MATLAB command hold on tells MATLAB to keep the present figure and to add the information that follows to that figure. The command axis('equal') instructs MATLAB to make unit distances on the x and y axes equal. The last MATLAB command superimposes grid lines. See Figure 6. From this figure you can see that the solution to this system is (x, y) = (5, 2), which we already knew.

Figure 6: Graph of equations in (2.1.1)

There are several principles that follow from this exercise.

• Solutions to a single linear equation in two variables form a straight line.

• Solutions to two linear equations in two unknowns lie at the intersection of two straight lines in the plane.

It follows that the solution to two linear equations in two variables is a single point if the lines are not parallel. If these lines are parallel and unequal, then there are no solutions, as there are no points of intersection.
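The numerical method of Section 2.1 provides a quick, independent check of the graphical answer. The following lines (a separate computation, not part of the plotting commands above) enter the coefficient matrix and right hand side of (2.1.1) and solve the system directly:

A = [1 1; -1 3];
b = [7; 1];
A\b

MATLAB returns the vector (5, 2), the same point read off from the figure.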

Linear Equations in Three Dimensions  We begin by observing that the set of all solutions to a linear equation in three variables forms a plane. More precisely, the solutions to the equation

ax + by + cz = d        (2.2.2)

form a plane that is perpendicular to the vector (a, b, c) — assuming of course that the vector (a, b, c) is nonzero.

This fact is most easily proved using the dot product. Recall from Chapter 1 (1.4.2) that the dot product is defined by

X · Y = x1 y1 + x2 y2 + x3 y3,

where X = (x1, x2, x3) and Y = (y1, y2, y3). We recall from Chapter 1 (1.4.3) the following important fact concerning dot products:

X · Y = 0

if and only if the vectors X and Y are perpendicular.

Suppose that N = (a, b, c) ≠ 0. Consider the plane that is perpendicular to the normal vector N and that contains the point X0. If the point X lies in that plane, then X − X0 is perpendicular to N; that is,

(X − X0) · N = 0.        (2.2.3)

If we use the notation

X = (x, y, z) and X0 = (x0, y0, z0),

then (2.2.3) becomes

a(x − x0) + b(y − y0) + c(z − z0) = 0.

Setting

d = ax0 + by0 + cz0

puts equation (2.2.3) into the form (2.2.2). In this way we see that the set of solutions to a single linear equation in three variables forms a plane. See Figure 7.

Figure 7: The plane containing X0 and perpendicular to N.
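For example, the plane with normal vector N = (1, 2, 2) that contains the point X0 = (1, 0, 1) (the particular numbers are chosen only for illustration) is found by computing d with the MATLAB dot command:

N = [1 2 2];
X0 = [1 0 1];
d = dot(N,X0)

MATLAB returns d = 3, so this plane is the set of solutions to x + 2y + 2z = 3.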

We now use MATLAB to visualize the planes that are solutions to linear equations. Plotting an equation in three dimensions in MATLAB follows a structure similar to the planar plots. Suppose that we wish to plot the solutions to the equation

−2x + 3y + z = 2.        (2.2.4)

We can rewrite (2.2.4) as

z = 2x − 3y + 2.

It is this function that we actually graph by typing the commands

[x,y] = meshgrid(-5:0.5:5);
z = 2*x - 3*y + 2;
surf(x,y,z)

The first command tells MATLAB to create a square grid in the xy-plane. Grid points are equally spaced between −5 and 5 at intervals of 0.5 on both the x and y axes. The second command tells MATLAB to compute the z value of the solution to (2.2.4) at each grid point. The third command tells MATLAB to graph the surface containing the points (x, y, z). See Figure 8.

Figure 8: Graph of (2.2.4).

We can now see that solutions to a system of two linear equations in three unknowns consists of points that lie simultaneously on two planes. As long as the normal vectors to these planes are not parallel, the intersection of the two planes will be a line in three dimensions. Indeed, consider the equations

−2x + 3y + z = 2
2x − 3y + z = 0.

We can graph the solution using MATLAB, as follows. We continue from the previous graph by typing

hold on
z = -2*x + 3*y;
surf(x,y,z)

The result, which illustrates that the intersection of two planes in R3 is generally a line, is shown in Figure 9.

Figure 9: Line of intersection of two planes.

We can now see geometrically that the solution to three simultaneous linear equations in three unknowns will generally be a point — since generally three planes in three space intersect in a point. To visualize this intersection, as shown in Figure 10, we extend the previous system of equations to

−2x + 3y + z = 2
2x − 3y + z = 0
−3x + 0.2y + z = 1.

Continuing in MATLAB type

z = 3*x - 0.2*y + 1;
surf(x,y,z)

Figure 10: Point of intersection of three planes.

Unfortunately, visualizing the point of intersection of these planes geometrically does not really help to get an accurate numerical value of the coordinates of this intersection point. However, we can use MATLAB to solve this system accurately. Denote the 3 × 3 matrix of coefficients by A, the vector of coefficients on the right hand side by b, and the solution by x. Solve the system in MATLAB by typing

A = [ -2 3 1; 2 -3 1; -3 0.2 1];
b = [2; 0; 1];
x = A\b

The point of intersection of the three planes is at

x =
    0.0233
    0.3488
    1.0000
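Since hold on is still in effect from the earlier plotting commands, this point can be marked on the three-plane figure as a rough visual check (the marker style below is just one possible choice):

plot3(x(1),x(2),x(3),'k.','MarkerSize',30)

The dot is drawn at the point where the three planes meet.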

Three planes in three dimensional space need not intersect in a single point. For example, if two of the planes are parallel they need not intersect at all. The normal vectors must point in independent directions to guarantee that the intersection is a point. Understanding the notion of independence (it is more complicated than just not being parallel) is part of the subject of linear algebra. MATLAB returns "Inf", which we have seen previously, when these normal vectors are (approximately) dependent. For example, consider Exercise 7.

Plotting Nonlinear Functions in MATLAB  Suppose that we want to plot the graph of a nonlinear function of a single variable, such as

y = x^2 − 2x + 3        (2.2.5)

on the interval [−2, 5] using MATLAB. There is a difficulty: How do we enter the term x^2? For example, suppose that we type

x = linspace(-2,5);
y = x*x - 2*x + 3;

Then MATLAB responds with

??? Error using ==> *
Inner matrix dimensions must agree.

The problem is that in MATLAB the variable x is a vector of 100 equally spaced points x(1), x(2), …, x(100). What we really need is a vector consisting of entries x(1)*x(1), x(2)*x(2), …, x(100)*x(100). MATLAB has the facility to perform this operation automatically and the syntax for the operation is .* rather than *. So typing

x = linspace(-2,5);
y = x.*x - 2*x + 3;
plot(x,y)

produces the graph of (2.2.5) in Figure 11. In a similar fashion, MATLAB has the 'dot' operations of ./, .\, and .^, as well as .*.

Figure 11: Graph of y = x^2 − 2x + 3.
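For instance, the parabola above can also be produced with the .^ operation, a small variation on the commands just given rather than a different plotting method:

x = linspace(-2,5);
y = x.^2 - 2*x + 3;
plot(x,y)

Here x.^2 squares each entry of the vector x, so y is the same vector as before.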

Exercises

1. Find the equation for the plane perpendicular to the vector (2, 3, 1) and containing the point (−1, −2, 3).

2. Determine three systems of two linear equations in two unknowns so that the first system has a unique solution, the second system has an infinite number of solutions, and the third system has no solutions.

3. Write the equation of the plane through the origin containing the vectors (1, 0, 1) and (2, −1, 2).

4. Find a system of two linear equations in three unknowns whose solution set is the line consisting of scalar multiples of the vector (1, 2, 1).

5. Find the cosine of the angle between the normal vectors to the planes

2x − 2y + z = 14 and x + y − 2z = −10.

6. (a) Find a vector u normal to the plane 2x + 2y + z = 3.
(b) Find a vector v normal to the plane x + y + 2z = 4.
(c) Find the cosine of the angle θ between the vectors u and v.

7. (matlab) Determine graphically the geometry of the set of solutions to the system of equations in the three unknowns x, y, z:

x + 3z = 1
3x − z = 1
z = 2

by sketching the plane of solutions for each equation individually. Describe in words why there are no solutions to this system. (Use MATLAB graphics to verify your sketch. Note that you should enter the last equation as z = 2 - 0*x - 0*y and the first two equations with 0*y terms. Try different views — but include view([0 1 0]) as one view.)

8. (matlab) Use MATLAB to solve graphically the planar system of linear equations

x + 4y = −4
4x + 3y = 4

to an accuracy of two decimal points.
Hint: The MATLAB command zoom on allows us to view the plot in a window whose axes are one-half those of original. Each time you click with the mouse on a point, the axes' limits are halved and centered at the designated point. Coupling zoom on with grid on allows you to determine approximate numerical values for the intersection point.

9. (matlab) Use MATLAB to solve graphically the planar system of linear equations

4.23x + 0.023y = −1.1
1.65x − 2.81y = 1.63

to an accuracy of two decimal points.

10. (matlab) Use MATLAB to find an approximate graphical solution to the three dimensional system of linear equations

3x − 4y + 2z = −11
2x + 2y + z = 7
−x + y − 5z = 7.

Then use MATLAB to find an exact solution.

11. (matlab) Use MATLAB to determine graphically the geometry of the set of solutions to the system of equations:

x + 3y + 4z = 5
2x + y + z = 1
−4x + 3y + 5z = 7.

Attempt to use MATLAB to find an exact solution to this system and discuss the implications of your calculations.
Hint: After setting up the graphics display in MATLAB, you can use the command view([0,1,0]) to get a better view of the solution point.

12. (matlab) Use MATLAB to graph the function y = 2 − x sin(x^2 − 1) on the interval [−2, 3]. How many relative maxima does this function have on this interval?

2.3 Gaussian Elimination

A general system of m linear equations in n unknowns has the form

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
   ...                                        (2.3.1)
am1 x1 + am2 x2 + · · · + amn xn = bm

The entries aij and bi are constants. Our task is to find a method for solving (2.3.1) for the variables x1, . . . , xn.

Easily Solved Equations  Some systems are easily solved. The system of three equations (m = 3) in three unknowns (n = 3)

x1 + 2x2 + 3x3 = 10
x2 − (1/5) x3 = 7/5        (2.3.2)
x3 = 3

is one example. The 3rd equation states that x3 = 3. Substituting this value into the 2nd equation allows us to solve the 2nd equation for x2 = 2. Finally, substituting x2 = 2 and x3 = 3 into the 1st equation allows us to solve for x1 = −3. The process that we have just described is called back substitution.

Next, consider the system of two equations (m = 2) in three unknowns (n = 3):

x1 + 2x2 + 3x3 = 10
x3 = 3.        (2.3.3)

The 2nd equation in (2.3.3) states that x3 = 3. Substituting this value into the 1st equation leads to the equation

x1 = 1 − 2x2.

We have shown that every solution to (2.3.3) has the form (x1, x2, x3) = (1 − 2x2, x2, 3) and that every vector (1 − 2x2, x2, 3) is a solution of (2.3.3). Thus, there is an infinite number of solutions to (2.3.3), and these solutions can be parameterized by one number x2.
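Back substitution is simple enough to carry out directly in MATLAB. For system (2.3.2), for example, the three steps described above can be typed as ordinary arithmetic (a small illustration, not a new MATLAB command):

x3 = 3;
x2 = 7/5 + x3/5;
x1 = 10 - 2*x2 - 3*x3

MATLAB returns x1 = -3, recovering the solution (x1, x2, x3) = (−3, 2, 3).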
Equations Having No Solutions  Note that the system of equations

x1 − x2 = 1
x1 − x2 = 2

has no solutions.

Definition 2.3.1. A linear system of equations is inconsistent if the system has no solutions and consistent if the system does have solutions.

As discussed in the previous section, (2.1.7) is an example of a linear system that MATLAB cannot solve. In fact, that system is inconsistent — inspect the 2nd and 4th equations in (2.1.7).

Gaussian elimination is an algorithm for finding all solutions to a system of linear equations by reducing the given system to ones like (2.3.2) and (2.3.3), that are easily solved by back substitution. Consequently, Gaussian elimination can also be used to determine whether a system is consistent or inconsistent.

Elementary Equation Operations  There are three ways to change a system of equations without changing the set of solutions; Gaussian elimination is based on this observation. The three elementary operations are:

(a) Swap two equations.

(b) Multiply a single equation by a nonzero number.

(c) Add a scalar multiple of one equation to another.

We begin with an example: The augmented matrix contains all of the information
that is needed to solve system (2.3.1).
x1 + 2x2 + 3x3 = 10
x1 + 2x2 + x3 = 4 (2.3.4)
2x1 + 9x2 + 5x3 = 27 . Elementary Row Operations The elementary opera-
tions used in Gaussian elimination can be interpreted as
Gaussian elimination works by eliminating variables from
row operations on the augmented matrix, as follows:
the equations in a fashion similar to the substitution
method in the previous section. To begin, eliminate the
(a) Swap two rows.
variable x1 from all but the 1st equation, as follows. Sub-
tract the 1st equation from the 2nd , and subtract twice (b) Multiply a single row by a nonzero number.
the 1st equation from the 3rd , obtaining:
(c) Add a scalar multiple of one row to another.
x1 + 2x2 + 3x3 = 10
−2x3 = −6 (2.3.5) We claim that by using these elementary row operations
5x2 − x3 = 7 . intelligently, we can always solve a consistent linear sys-
Next, swap the 2nd and 3rd equations, so that the coef- tem — indeed, we can determine when a linear system
ficient of x2 in the new 2nd equation is nonzero. This is consistent or inconsistent. The idea is to perform ele-
yields mentary row operations in such a way that the new aug-
mented matrix has zero entries below the diagonal.
x1 + 2x2 + 3x3 = 10
5x2 − x3 = 7 (2.3.6) We describe this process inductively. Begin with the 1st
−2x3 = −6 . column. We assume for now that some entry in this col-
umn is nonzero. If a11 = 0, then swap two rows so that
Now, divide the 2nd equation by 5 and the 3rd equation the number a11 is nonzero. Then divide the 1st row by
by −2 to obtain a system of equations identical to our a11 so that the leading entry in that row is 1. Now sub-
first example (2.3.2), which we solved by back substitu- tract ai1 times the 1st row from the ith row for each row
tion. i from 2 to m. The end result is that the 1st column has
a 1 in the 1st row and a 0 in every row below the 1st .
The result is
Augmented Matrices The process of performing Gaus-
 
1 ∗ ··· ∗
sian elimination when the number of equations is greater  0 ∗ ··· ∗ 
than two or three is painful. The computer, however, can  .. .. .. ..  .
 
 . . . . 
help with the manipulations. We begin by introducing
the augmented matrix. The augmented matrix associ- 0 ∗ ··· ∗
ated with (2.3.1) has m rows and n + 1 columns and is
written as: Next we consider the 2nd column. We assume that some
  entry in that column below the 1st row is nonzero. So,
a11 a12 · · · a1n b1
 a21 a22 · · · a2n b2  if necessary, we can swap two rows below the 1st row so
(2.3.7) that the entry a22 is nonzero. Then we divide the 2nd
.. .. .. .. 
 
. . . .  row by a22 so that its leading nonzero entry is 1. Then


am1 am2 ··· amn bm we subtract appropriate multiples of the 2nd row from


each row below the 2nd so that all the entries in the 2nd A(4,:) = A(4,:) - 3*A(7,:)
column below the 2nd row are 0. The result is
  The first elementary row operation, swapping two rows,
1 ∗ ··· ∗ requires a different kind of MATLAB command. In MAT-
··· ∗  LAB, the ith and j th rows of the matrix A are permuted
 0 1
 .. .. .. ..  .
 
 . . . .  by the command
0 0 ··· ∗
A([i j],:) = A([j i],:)

Then we continue with the 3rd column. That’s the idea. So, to swap the 1st and 3rd rows of the matrix A, we type
However, does this process always work and what hap-
pens if all of the entries in a column are zero? Before A([1 3],:) = A([3 1],:)
answering these questions we do experimentation with
MATLAB.
Examples of Row Reduction in MATLAB Let us see
how the row operations can be used in MATLAB. As an
Row Operations in MATLAB In MATLAB the ith row
example, we consider the augmented matrix
of a matrix A is specified by A(i,:). Thus to replace the
5th row of a matrix A by twice itself, we need only type:
 
1 3 0 −1 −8
 2 6 −4 4 4 
(2.3.8*)
A(5,:) = 2*A(5,:)
 
 1 0 −1 −9 −35 
0 1 0 3 10
In general, we can replace the ith row of the matrix A by
c times itself by typing We enter this information into MATLAB by typing

A(i,:) = c*A(i,:) e2_3_8

which produces the result


Similarly, we can divide the ith row of the matrix A by
the nonzero number c by typing A =
1 3 0 -1 -8
A(i,:) = A(i,:)/c 2 6 -4 4 4
1 0 -1 -9 -35
The third elementary row operation is performed simi- 0 1 0 3 10
larly. Suppose we want to add c times the ith row to the
j th row, then we type We now perform Gaussian elimination on A, and then
solve the resulting system by back substitution. Gaus-
A(j,:) = A(j,:) + c*A(i,:) sian elimination uses elementary row operations to set
the entries that are in the lower left part of A to zero.
For example, subtracting 3 times the 7th row from the These entries are indicated by numbers in the following
4th row of the matrix A is accomplished by typing: matrix:


* * * * * A([2 4],:) = A([4 2],:)


2 * * * *
1 0 * * * and obtain
0 1 0 * *
A =
Gaussian elimination works inductively. Since the first 1 3 0 -1 -8
entry in the matrix A is equal to 1, the first step in 0 1 0 3 10
Gaussian elimination is to set to zero all entries in the 1st 0 -3 -1 -8 -27
column below the 1st row. We begin by eliminating the 0 0 -4 6 20
2 that is the first entry in the 2nd row of A. We replace
the 2nd row by the 2nd row minus twice the 1st row. To The next elementary row operation is the command
accomplish this elementary row operation, we type
A(3,:) = A(3,:) + 3*A(2,:)
A(2,:) = A(2,:) - 2*A(1,:)
which leads to
and the result is

A = A =
1 3 0 -1 -8 1 3 0 -1 -8
0 0 -4 6 20 0 1 0 3 10
1 0 -1 -9 -35 0 0 -1 1 3
0 1 0 3 10 0 0 -4 6 20

In the next step, we eliminate the 1 from the entry in the Now we have set all entries in the 2nd column below the
3rd row, 1st column of A. We do this by typing 2nd row to 0.
Next, we set the first nonzero entry in the 3rd row to 1
A(3,:) = A(3,:) - A(1,:)
by multiplying the 3rd row by −1, obtaining
which yields
A =
A = 1 3 0 -1 -8
1 3 0 -1 -8 0 1 0 3 10
0 0 -4 6 20 0 0 1 -1 -3
0 -3 -1 -8 -27 0 0 -4 6 20
0 1 0 3 10
Since the leading nonzero entry in the 3rd row is 1, we
Using elementary row operations, we have now set the next eliminate the nonzero entry in the 3rd column, 4th
entries in the 1st column below the 1st row to 0. Next, row. This is accomplished by the following MATLAB
we alter the 2nd column. We begin by swapping the 2nd command:
and 4th rows so that the leading nonzero entry in the 2nd
row is 1. To accomplish this swap, we type A(4,:) = A(4,:) + 4*A(3,:)


Finally, divide the 4th row by 2 to obtain: x =


2.0000
A = -2.0000
1 3 0 -1 -8 1.0000
0 1 0 3 10 4.0000
0 0 1 -1 -3
0 0 0 1 4
Introduction to Echelon Form Next, we discuss how
Gaussian elimination works in an example in which the
By using elementary row operations, we have arrived at number of rows and the number of columns in the coef-
the system ficient matrix are unequal. We consider the augmented
matrix
x1 + 3x2 − x4 = −8
x2 + 3x4 = 10
 
(2.3.9) 1 0 −2 3 4 0 1
x3 − x4 = −3  0 1 2 4 0 −2 0 
  (2.3.12*)
x4 = 4 ,  2 −1 −4 0 −2 8 −4 
−3 0 6 −8 −12 2 −2
that can now be solved by back substitution. We obtain
This information is entered into MATLAB by typing
x4 = 4, x3 = 1, x2 = −2, x1 = 2. (2.3.10)
e2_3_12
We return to the original set of equations corresponding
to (2.3.8*)
Again, the augmented matrix is denoted by A.
x1 + 3x2 − x4 = −8 We begin by eliminating the 2 in the entry in the 3rd row,
2x1 + 6x2 − 4x3 + 4x4 = 4 1st column. To accomplish the corresponding elementary
(2.3.11*)
x1 − x3 − 9x4 = −35 row operation, we type
x2 + 3x4 = 10 .
A(3,:) = A(3,:) - 2*A(1,:)
Load the corresponding linear system into MATLAB by
typing resulting in

e2_3_11 A =
1 0 -2 3 4 0 1
The information in (2.3.11*) is contained in the coeffi- 0 1 2 4 0 -2 0
cient matrix C and the right hand side b. A direct solu- 0 -1 0 -6 -10 8 -6
tion is found by typing -3 0 6 -8 -12 2 -2

x = C\b We proceed with

which yields the same answer as in (2.3.10), namely, A(4,:) = A(4,:) + 3*A(1,:)


to create two more zeros in the 4th row. Finally, we nonzero coefficient. In this case, we use the 4th equa-
eliminate the -1 in the 3rd row, 2nd column by tion to solve for x4 in terms of x5 and x6 , and then we
substitute for x4 in the first three equations. This process
A(3,:) = A(3,:) + A(2,:) can also be accomplished by elementary row operations.
Indeed, eliminating the variable x4 from the first three
to arrive at equations is the same as using row operations to set the
first three entries in the 4th column to 0. We can do this
A = by typing
1 0 -2 3 4 0 1
0 1 2 4 0 -2 0 A(3,:) = A(3,:) + A(4,:);
0 0 2 -2 -10 6 -6 A(2,:) = A(2,:) - 4*A(4,:);
0 0 0 1 0 2 1 A(1,:) = A(1,:) - 3*A(4,:)

Next we set the leading nonzero entry in the 3rd row to Remember: By typing semicolons after the first two
1 by dividing the 3rd row by 2. That is, we type rows, we have told MATLAB not to print the intermediate
results. Since we have not typed a semicolon after the
A(3,:) = A(3,:)/2 3rd row, MATLAB outputs

to obtain A =
1 0 -2 0 4 -6 -2
A = 0 1 2 0 0 -10 -4
1 0 -2 3 4 0 1 0 0 1 0 -5 5 -2
0 1 2 4 0 -2 0 0 0 0 1 0 2 1
0 0 1 -1 -5 3 -3
0 0 0 1 0 2 1 We proceed with back substitution by eliminating the
nonzero entries in the first two rows of the 3rd column.
We say that the matrix A is in (row) echelon form since To do this, type
the first nonzero entry in each row is a 1, each entry in a
column below a leading 1 is 0, and the leading 1 moves A(2,:) = A(2,:) - 2*A(3,:);
to the right as you go down the matrix. In row echelon A(1,:) = A(1,:) + 2*A(3,:)
form, the entries where leading 1’s occur are called pivots.
If we compare the structure of this matrix to the ones we which yields
have obtained previously, then we see that here we have
two columns too many. Indeed, we may solve these equa- A =
tions by back substitution for any choice of the variables
1 0 0 0 -6 4 -6
x5 and x6 .
0 1 0 0 10 -20 0
The idea behind back substitution is to solve the last 0 0 1 0 -5 5 -2
equation for the variable corresponding to the first 0 0 0 1 0 2 1


The augmented matrix is now in reduced echelon form It follows that the general solution to a linear system of
and the corresponding system of equations has the form equations is given by a single solution (x5 = x6 = 0) plus
the linear combination of a finite number of vectors. We
x1 − 6x5 + 4x6 = −6 will discuss reduced echelon form in more detail in the
x2 + 10x5 − 20x6 = 0
(2.3.13) next section.
x3 − 5x5 + 5x6 = −2
x4 + 2x6 = 1,
Exercises
A matrix is in reduced echelon form if it is in echelon
form and if every entry in a column containing a pivot,
other than the pivot itself, is 0. In Exercises 1 – 3 determine whether the given matrix is in
Reduced echelon form allows us to solve directly this sys- reduced echelon form.
tem of equations in terms of the variables x5 and x6 ,
 
1 −1 0 1
    1.  0 1 0 −6 .
x1 −6 + 6x5 − 4x6 0 0 1 0
 x2   −10x5 + 20x6   
1 0 −2 0
   
 x3   −2 + 5x5 − 5x6 
 = . (2.3.14) 2.  0 1 4 0 .
 x4   1 − 2x6 
  
 x5  
 0 0 0 1
x5 
x6 x6
 
0 1 0 3
3.  0 0 2 1 .
It is important to note that every consistent system of 0 0 0 0
linear equations corresponding to an augmented matrix
in reduced echelon form can be solved as in (2.3.14) —
and this is one reason for emphasizing reduced echelon In Exercises 4 – 6 we list the reduced echelon form of an
form. We can rewrite the solutions in (2.3.14) in the augmented matrix of a system of linear equations. Which
columns in these augmented matrices contain pivots? De-
form:
scribe all solutions to these systems of equations in the form
of (2.3.14).
       
x1 , −6 6 −4
 x2   0   −10   20   
        1 4 0 0
4.  0 0 1 5 .
 x3   −2   5 
  −5 
 =  + x5  + x 6
 .
 −2 
 x4   1   0   0 0 0 0
     
 x5   0   1   0   
x6 0 0 1 1 2 0 0 0
5.  0 0 1 1 0 .
Definition 2.3.2. A linear combination of the vectors 0 0 0 0 1
v1 , . . . , vk in Rn is a vector in Rn of the form 
1 −6 0 −1 1

6.  0 0 1 2 9 .
v = α1 v1 + · · · + αk vk 0 0 0 0 0
where α1 , . . . αk are scalars in R.


7. Suppose that à = [A|b] is the augmented matrix of a (b) if solutions are not unique, how many variables can be
system of 4 linear equations in 7 unknowns. Suppose that assigned arbitrary values.
the solution is defined by 5 parameters. Let E be the 4 × 8
reduced echelon form of Ã. How many pivots does E have?
 
1 0 0 3
12.  0 2 1 1 .
0 0 0 0

8. (a) Consider the 2 × 2 matrix


 
1 2 0 0 3
  13.  0 1 1 0 1 .
a b 0 0 0 0 2
(2.3.15)
c 1  
1 0 2 1
where a, b, c ∈ R and a 6= 0. Show that (2.3.15) is row equiv- 14.  0 5 0 2 .
alent to the matrix 0 0 4 3
 
b  
1 1 0 2 0 3
a
a − bc  .
 
 2 3 6 1 16 
15.  .

0  0 3 2 1 10 
a
0 0 0 0 0
(b) Show that (2.3.15) is row equivalent to the identity matrix
if and only if a 6= bc.

A system of m equations in n unknowns is linear if it has the


9. Use row reduction and back substitution to solve the fol- form (2.3.1); any other system of equations is called nonlinear.
lowing system of two equations in three unknowns: In Exercises 16 – 20 decide whether each of the given systems
of equations is linear or nonlinear.
x1 − x2 + x3 = 1
2x1 + x2 − x3 = −1 16.
3x1 − 2x2 + 14x3 − 7x4 = 35
2x1 + 5x2 − 3x3 + 12x4 = −1
In Exercises 10 – 11 determine the augmented matrix and all
solutions for each system of linear equations 17.
3x1 + πx2 = 0
x−y+z = 1 2x1 − ex2 = 1
10. 4x + y + z = 5 .
18.
2x + 3y − z = 2
3x1 x2 − x2 = 10
2x − y + z + w = 1 2x1 − x22 = −5
11. .
x + 2y − z + w = 7 19.
3x1 − x2 = cos(12)
2x1 − x2 = −5
In Exercises 12 – 15 consider the augmented matrices repre-
senting systems of linear equations, and decide 20.
3x1 − sin(x2 ) = 12
(a) if there are zero, one or infinitely many solutions, and 2x1 − x3 = −5


In Exercises 21 – 23 use elementary row operations and MAT- 26. (matlab) Comment: To understand the point of this
LAB to put each of the given matrices into row echelon form. exercise you must begin by typing the MATLAB command
Suppose that the matrix is the augmented matrix for a system format short e. This command will set a format in which
of linear equations. Is the system consistent or inconsistent? you can see the difficulties that sometimes arise in numerical
computations.
21. (matlab)
Consider the following two 3 × 3-matrices:
 
2 1 1
.
4 2 3    
1 3 4 3 14
22. (matlab)   A =  2 1 1  and B =  1 21 .
3 −4 0 2 −4 3 5 3 −45
 0 2 3 1 . (2.3.17*)
3 1 4 5 Note that matrix B is obtained from matrix A by interchang-
ing the first two columns.
23. (matlab)

−2 1 9 1
 (a) Use MATLAB to put A into row echelon form using the
 3 3 −4 2 . transformations
1 4 5 5 (a) Subtract 2 times the 1st row from the 2nd .
(b) Add 4 times the 1st row to the 3rd .
Observation: In standard format MATLAB displays all (c) Divide the 2nd row by −5.
nonzero real numbers with four decimal places while it dis-
(d) Subtract 15 times the 2nd row from the 3rd .
plays zero as 0. An unfortunate consequence of this display
is that when a matrix has both zero and noninteger entries, (b) Put B by hand into row echelon form using the trans-
the columns will not align — which is a nuisance. You can formations
work with rational numbers rather than decimal numbers by (a) Divide the 1st row by 3.
typing format rational. Then the columns will align.
(b) Subtract the 1st row from the 2nd .
24. (matlab) Load the following 6 × 8 matrix A into MAT- (c) Subtract 3 times the 1st row from the 3rd .
LAB by typing e2_3_16. (d) Multiply the 2nd row by 3/5.
(e) Add 5 times the 2nd row to the 3rd .
 
0 0 0 1 3 5 0 9
 0 3
 6 −6 −6 −12 0 1 
 (c) Use MATLAB to put B into row echelon form using the
 0 2 4 −5 −7 14 0 1  (2.3.16*)

same transformations as in part (b).
A=  0 1 2 1 14 21 0 −1
(d) Discuss the outcome of the three transformations. Is

 
 0 0 0 2 4 9 0 7
there a difference in the results? Would you expect to

0 5 10 −11 −13 2 0 2
see a difference? Could the difference be crucial when
Use MATLAB to transform this matrix to row echelon form. solving a system of linear equations?

25. (matlab) Use row reduction and back substitution to 27. (matlab) Find a cubic polynomial
solve the following system of linear equations:
p(x) = ax3 + bx2 + cx + d
2x1 + 3x2 − 4x3 + x4 = 2
3x1 − x2 − x3 + 2x4 = 4 so that p(1) = 2, p(2) = 3, p0 (−1) = −1, and p0 (3) = 1.
x1 − 7x2 + 5x3 − x4 = 6


2.4 Reduction to Echelon Form Here are three examples of matrices that are not in ech-
elon form.
In this section, we formalize our previous numerical ex- 
0 0 1 15

periments. We define more precisely the notions of ech-  1 −1 14 −6 
elon form and reduced echelon form matrices, and we 0 0 0 0
prove that every matrix can be put into reduced eche-
lon form using a sequence of elementary row operations.
 
1 −1 14 −6
Consequently, we will have developed an algorithm for  0 0 3 15 
determining whether a system of linear equations is con- 0 0 0 0
sistent or inconsistent, and for determining all solutions  
to a consistent system. 1 −1 14 −6
 0 0 0 0 
Definition 2.4.1. A matrix E is in (row) echelon form 0 0 1 15
if two conditions hold.
Definition 2.4.2. Two m×n matrices are row equivalent
(a) The first nonzero entry in each row of E is equal to if one can be transformed to the other by a sequence of
1. This leading entry 1 is called a pivot. elementary row operations.
(b) A pivot in the (i + 1)st row of E occurs in a column
to the right of the column where the pivot in the ith Let A = (aij ) be a matrix with m rows and n columns.
row occurs. We want to show that we can perform row operations
on A so that the transformed matrix is in echelon form;
Note: A consequence of Definition 2.4.1 is that all rows that is, A is row equivalent to a matrix in echelon form.
in an echelon form matrix that are identically zero occur If A = 0, then we are finished. So we assume that some
at the bottom of the matrix. entry in A is nonzero and that the 1st column where that
Here are three examples of matrices that are in echelon nonzero entry occurs is in the k th column. By swapping
form. The pivot in each row (which is always equal to 1) rows we can assume that a1k is nonzero. Next, divide
is preceded by a ∗. the 1st row by a1k , thus setting a1k = 1. Now, using
  MATLAB notation, perform the row operations
∗1 0 −1 0 −6 4 −6
 0 ∗1 4 0 0 −2 0 
  A(i,:) = A(i,:) - A(i,k)*A(1,:)
 0 0 0 ∗1 −5 5 −2 
0 0 0 0 0 ∗1 0 for each i ≥ 2. This sequence of row operations leads to
a matrix whose first nonzero column has a 1 in the 1st
 
∗1 0 −1 0 −6
 0 ∗1 0 3 0  row and a zero in each row below the 1st row.
 
0 ∗1 −5  Now we look for the next column that has a nonzero entry
 0 0
0 0 0 0 0 below the 1st row and call that column `. By construction
` > k. We can swap rows so that the entry in the 2nd
 
0 ∗1 −1 14 −6
 0 0 0 ∗1 15  row, `th column is nonzero. Then we divide the 2nd row
by this nonzero element, so that the pivot in the 2nd row
 
 0 0 0 0 0 
0 0 0 0 0 is 1. Again we perform elementary row operations so that


all entries below the 2nd row in the `th column are set to Reduced Echelon Form in MATLAB Preprogrammed
0. Now proceed inductively until we run out of nonzero into MATLAB is a routine to row reduce any matrix to
rows. reduced echelon form. The command is rref. For ex-
ample, recall the 4 × 7 matrix A in (2.3.12*) by typing
This argument proves:
e2_3_12. Put A into reduced row echelon form by typing
Proposition 2.4.3. Every matrix is row equivalent to a rref(A) and obtaining
matrix in echelon form.
ans =
More importantly, the previous argument provides an al- 1 0 0 0 -6 4 -6
gorithm for transforming matrices into echelon form. 0 1 0 0 10 -20 0
0 0 1 0 -5 5 -2
Reduction to Reduced Echelon Form 0 0 0 1 0 2 1

Definition 2.4.4. A matrix E is in reduced echelon form Compare the result with the system of equations (2.3.13).
if

(a) E is in echelon form, and Solutions to Systems of Linear Equations Originally,


we introduced elementary row operations as operations
(b) in every column of E having a pivot, every entry in that do not change solutions to the linear system. More
that column other than the pivot is 0. precisely, we discussed how solutions to the original sys-
tem are still solutions to the transformed system and how
We can now prove no new solutions are introduced by elementary row op-
Theorem 2.4.5. Every matrix is row equivalent to a erations. This argument is most easily seen by observing
matrix in reduced echelon form. that

Proof Let A be a matrix. Proposition 2.4.3 states all elementary row operations are invertible
that we can transform A by elementary row operations
— they can be undone.
to a matrix E in echelon form. Next we transform E
into reduced echelon form by some additional elementary For example, swapping two rows is undone by just swap-
row operations, as follows. Choose the pivot in the last ping these rows again. Similarly, multiplying a row by a
nonzero row of E. Call that row `, and let k be the nonzero number c is undone by just dividing that same
column where the pivot occurs. By adding multiples of row by c. Finally, adding c times the j th row to the ith
the `th row to the rows above, we can transform each row is undone by subtracting c times the j th row from
entry in the k th column above the pivot to 0. Note that the ith row.
none of these row operations alters the matrix before the Thus, we can make several observations about solutions
k th column. (Also note that this process is identical to to linear systems. Let E be an augmented matrix corre-
the process of back substitution.)
sponding to a system of linear equations having n vari-
Again we proceed inductively by choosing the pivot in ables. Since an augmented matrix is formed from the
the (` − 1)st row, which is 1, and zeroing out all entries matrix of coefficients by adding a column, we see that
above that pivot using elementary row operations.  the augmented matrix has n + 1 columns.


Theorem 2.4.6. Suppose that E is an m × (n + 1) aug- Thus, each choice of the n − ` numbers x`+1 , . . . , xn
mented matrix that is in reduced echelon form. Let ` be uniquely determines values of x1 , . . . , x` so that
the number of nonzero rows in E x1 , . . . , xn is a solution to this system. In particular,
the system is consistent, so (a) is proved; and the set of
(a) The system of linear equations corresponding to E all solutions is parameterized by n − ` numbers, so (b) is
is inconsistent if and only if the `th row in E has a proved. 
pivot in the (n + 1)st column.
(b) If the linear system corresponding to E is consistent, Two Examples Illustrating Theorem 2.4.6 The reduced
then the set of all solutions is parameterized by n − ` echelon form matrix
parameters. 
1 5 0 0

E= 0 0 1 0 
0 0 0 1
Proof Suppose that the last nonzero row in E has its
pivot in the (n + 1)st column. Then the corresponding is the augmented matrix of an inconsistent system of
equation is: three equations in three unknowns.

0x1 + 0x2 + · · · + 0xn = 1, The reduced echelon form matrix


 
1 5 0 2
which has no solutions. Thus the system is inconsistent. E= 0 0 1 5 
Conversely, suppose that the last nonzero row has its 0 0 0 0
pivot before the last column. Without loss of generality,
is the augmented matrix of a consistent system of three
we can renumber the columns — that is, we can renumber
equations in three unknowns x1 , x2 , x3 . For this matrix
the variables xj — so that the pivot in the ith row occurs
n = 3 and ` = 2. It follows from Theorem 2.4.6 that the
in the ith column, where 1 ≤ i ≤ `. Then the associated
solutions to this system are specified by one parameter.
system of linear equations has the form:
Indeed, the solutions are
x1 + a1,`+1 x`+1 + · · · + a1,n xn = b1 x1 = 2 − 5x2
x2 + a2,`+1 x`+1 + · · · + a2,n xn = b2 x3 = 5
.. ..
. . and are specified by the one parameter x2 .
x` + a`,`+1 x`+1 + · · · + a`,n xn = b` .

This system can be rewritten in the form: Consequences of Theorem 2.4.6 It follows from Theo-
rem 2.4.6 that linear systems of equations with fewer
x1 = b1 − a1,`+1 x`+1 − · · · − a1,n xn equations than unknowns and with zeros on the right
hand side always have nonzero solutions. More precisely:
x2 = b2 − a2,`+1 x`+1 − · · · − a2,n xn (2.4.1)
.. .. Corollary 2.4.7. Let A be an m×n matrix where m < n.
. . Then the system of linear equations whose augmented
x` = b` − a`,`+1 x`+1 − · · · − a`,n xn . matrix is (A|0) has a nonzero solution.


Proof Perform elementary row operations on the aug- Hence Theorem 2.4.6(b) implies that the solutions to the
mented matrix (A|0) to arrive at the reduced echelon system corresponding to E are parameterized by n − `
form matrix (E|0). Since the zero vector is a solution, parameters. If ` < n, then the solution is not unique. So
the associated system of equations is consistent. Now the ` = n.
number of nonzero rows ` in (E|0) is less than or equal to Next observe that since the system of linear equations
the number of rows m in E. By assumption m < n and is consistent, it follows from Theorem 2.4.6(a) that the
hence ` < n. It follows from Theorem 2.4.6 that solu- pivot in the nth row must occur in a column before the
tions to the linear system are parametrized by n − ` ≥ 1 (n + 1)st . It follows that the reduced echelon matrix
parameters and that there are nonzero solutions. 
E = (In |c) for some c ∈ Rn . Since (A|b) is row equiva-
lent to (In |c), it follows, by using the same sequence of
Recall that two m × n matrices are row equivalent if one elementary row operations, that A is row equivalent to
can be transformed to the other by elementary row op- In . 
erations.

Corollary 2.4.8. Let A be an n × n square matrix and Uniqueness of Reduced Echelon Form and Rank Ab-
let b be in Rn . Then A is row equivalent to the identity stractly, our discussion of reduced echelon form has one
matrix In if and only if the system of linear equations point remaining to be proved. We know that every ma-
whose augmented matrix is (A|b) has a unique solution. trix A can be transformed by elementary row operations
to reduced echelon form. Suppose, however, that we use
two different sequences of elementary row operations to
Proof Suppose that A is row equivalent to In . Then, transform A to two reduced echelon form matrices E1
by using the same sequence of elementary row operations, and E2 . Can E1 and E2 be different? The answer is:
it follows that the n × (n + 1) augmented matrix (A|b) No.
is row equivalent to (In |c) for some vector c ∈ Rn . The
system of linear equations that corresponds to (In |c) is: Theorem 2.4.9. For each matrix A, there is precisely
one reduced echelon form matrix E that is row equivalent
x1 = c1 to A.
.. .. ..
. . . The proof of Theorem 2.4.9 is given in Section 2.6. Since
xn = cn , every matrix is row equivalent to a unique matrix in re-
duced echelon form, we can define the rank of a matrix
which transparently has the unique solution x =
as follows.
(c1 , . . . , cn ). Since elementary row operations do not
change the solutions of the equations, the original aug- Definition 2.4.10. Let A be an m × n matrix that is
mented system (A|b) also has a unique solution. row equivalent to a reduced echelon form matrix E. Then
Conversely, suppose that the system of linear equations the rank of A, denoted rank(A), is the number of nonzero
associated to (A|b) has a unique solution. Suppose that rows in E.
(A|b) is row equivalent to a reduced echelon form matrix
E. Suppose that the last nonzero row in E is the `th Corollary 2.4.11. We make four remarks concerning
row. Since the system has a solution, it is consistent. the rank of a matrix.


(a) An echelon form matrix is always row equivalent to


 
1 3 1
a reduced echelon form matrix with the same num- (b) A =  2 1 0 
ber of nonzero rows. Thus, to compute the rank of a 0 0 1
matrix, we need only perform elementary row oper-
 
1 1 1
ations until the matrix is in echelon form. (c) A =  1 2 1 
1 1 1
(b) The rank of any matrix is easily computed in MAT-
LAB. Enter a matrix A and type rank(A).

(c) The number ` in the statement of Theorem 2.4.6 is 4. The augmented matrix of a consistent system of five equa-
tions in seven unknowns has rank equal to three. How many
just the rank of E.
parameters are needed to specify all solutions?
(d) In particular, if the rank of the augmented matrix
corresponding to a consistent system of linear equa-
tions in n unknowns has rank `, then the solutions 5. The augmented matrix of a consistent system of nine equa-
to this system are parametrized by n − ` parameters. tions in twelve unknowns has rank equal to five. How many
parameters are needed to specify all solutions?

Exercises
6. Consider the system of equations

In Exercises 1 – 2 row reduce the given matrix to reduced x1 + 3x3 = 1


echelon form by hand and determine its rank. −x1 + 2x2 − 3x3 = 1
  2x2 + ax3 = b
1 2 1 6
1. A =  3 6 1 14  Find all pairs of real numbers a and b where the system has
1 2 2 8 no solutions, a unique solution, or infinitely many solutions?
  Your answer should subdivide the ab-plane into three disjoint
1 −2 3
sets.
2. B =  3 −6 9 
1 −8 2
In Exercises 7 – 10, use rref on the given augmented ma-
trices to determine whether the associated system of linear
3. How many solutions does the equation
equations is consistent or inconsistent. If the equations are
consistent, then determine how many parameters are needed
   
x1 2
A  x2  =  1  to enumerate all solutions.
x3 2
7. (matlab)
have for the following choices of A. Explain your reasoning.  
  2 1 3 −2 4 1
1 0 1  5 12 −1 3 5 1 
A= (2.4.2*)
(a) A =  0 1 0 

 −4 −21 11 −12 2 1 
0 0 0 23 59 −8 17 21 4


8. (matlab)
 
b1
 b2 
2

4 6 −2 1
 (a) Describe the sets of vectors b =   b3  ∈ R such that
 4

B= 0 0 4 1 −1  (2.4.3*) b4
2 4 0 1 2 the system of equations Ax = b has (i) no solution, (ii)
one solution, and (iii) infinitely many solutions.
9. (matlab)
  (b) Denote the first column of A by C1 , the second column
2 3 −1 4 by C2 and the third column by C3 . Can you write the
C= 8 11 −7 8  (2.4.4*) vector
2 2 −4 −3  
2
10. (matlab) y=
 −4 

 5 
 
2.3 4.66 −1.2 2.11 −2 0
 0 0 1.33 0 1.44 
D= in the form
 
4.6 9.32 −7.986 4.22 −10.048 
1.84 3.728 −5.216 1.688 −6.208 x1 C1 + x2 C2 + x3 C3 (2.4.6)
(2.4.5*)
where x1 , x2 , x3 ∈ R? If so, express y in this form.

In Exercises 11 – 13 compute the rank of the given matrix.


 
1 −2
11. (matlab) . 16. Consider the augmented matrix
−3 6
   
2 1 0 1 1 −r 1
A=
12. (matlab)  −1 3 2 4 . r −1 1
5 −1 2 −2
  where r is a real parameter.
3 1 0
 −1 2 4 
13. (matlab)   2
. (a) Find all r so that rank(A) = 2.
3 4 
4 −1 −4 (b) Find all r for which the corresponding linear system has

(i) no solution,
14. Prove that the rank of an m × n matrix A is less than or (ii) one solution, and
equal to the minimum of m and n.
(iii) infinitely many solutions.

15. Consider the matrix Solution: Subtracting r times the first row of A from the

1 0 −1
 second row of that matrix yields
 −2 0 2 
A=     
 0 1 −2  1 −r 1 1 −r 1
2 =
0 0 0 0 r −1 1−r 0 (r + 1)(r − 1) 1−r


So the reduced row echelon form of A is



1

 1 0


1+r 

r 6= ±1

1

  
1 −


 0


 1+! r
RREF(A) = 1 −1 1
 r=1



 0 0 0
 !

 1 1 0
r = −1



 0 0 1

(a) rank(A) = 2 if r 6= 1.
(b) The linear system corresponding to the augmented ma-
trix A has
(i) no solution if r = −1,
(ii) one solution if r 6= ±1, and
(iii) infinitely many solutions if r = 1.



2.5 Linear Equations with Special Then divide the 2nd row by 36.2 − 3π 2, obtaining:
Coefficients 
1 π 2
√ √
11.2 2√

In this chapter we have shown how to use elementary row  e − 33.6 2  .


0 1 √
operations to solve systems of linear equations. We have 36.2 − 3π 2
assumed that each linear equation in the system has the √
form Finally, multiply the 2nd row by π 2 and subtract it
aj1 x1 + · · · + ajn xn = bj , from the 1st row to obtain:
√ 
where the aji s and the bj s are real numbers. For simplic- √ √ e − 33.6 2

ity, in our examples we have only chosen equations with  1 0 11.2 2 − π 2 36.2 − 3π √2 
√ .
integer coefficients — such as:

 e − 33.6 2 
0 1 √
36.2 − 3π 2
2x1 − 3x2 + 15x3 = −1.
So

Systems with Nonrational Coefficients In fact, a more √ √ e − 33.6 2
x1 = 11.2 2 − π 2 √
general choice of coefficients for a system of two equations 36.2 − 3π 2
might have been (2.5.2)
√ √
2x1 + 2πx2 = 22.4 e − 33.6 2
x2 = √
3x1 + 36.2x2 = e. (2.5.1) 36.2 − 3π 2
which is both hideous to look at and quite uninformative.
It is, however, correct.
Suppose that we solve (2.5.1) by elementary row opera-
Both x1 and x2 are real numbers — they had to be be-
tions. In matrix form we have the augmented matrix
cause all of the manipulations involved addition, sub-
 √  traction, multiplication, and division of real numbers —
2 2π 22.4
. which yield real numbers.
3 36.2 e
If we wanted to use MATLAB√ to perform these calcula-
Proceed with the following
√ elementary row operations. tions, we have to convert 2, π, and e to their decimal
Divide the 1st row by 2 to obtain equivalents — at least up to a certain decimal place ac-
 √ √  curacy. This introduces errors — which for the moment
1 π 2 11.2 2
. we assume are small.
3 36.2 e
To enter A and b in MATLAB , type
Next, subtract 3 times the 1 row from the 2
st nd
row to
obtain: A = [sqrt(2) 2*pi; 3 36.2];
 √ √  b = [22.4; exp(1)];
1 π 2 √ 11.2 2√
.
0 36.2 − 3π 2 e − 33.6 2 Now type A to obtain:


A = More Accuracy MATLAB can display numbers in ma-


1.4142 6.2832 chine precision (15 digits) rather than the standard four
3.0000 36.2000 decimal place accuracy. To change to this display, type

As its default display, MATLAB displays real numbers to format long


four decimal place accuracy. Similarly, type b to obtain
Now solve the system of equations (2.5.1) again by typing
b =
22.4000 A\b
2.7183
and obtaining
Next use MATLAB to solve this system by typing:
ans =
24.54169560069650
A\b -1.95875151860858

to obtain
Integers and Rational Numbers Now suppose that all
of the coefficients in a system of linear equations are inte-
ans =
gers. When we add, subtract or multiply integers — we
24.5417
get integers. In general, however, when we divide an inte-
-1.9588
ger by an integer we get a rational number rather than an
integer. Indeed, since elementary row operations involve
The reader may check that this answer agrees with the only the operations of addition, subtraction, multiplica-
answer in (2.5.2) to MATLAB output accuracy by typing tion and division, we see that if we perform elementary
row operations on a matrix with integer entries, we will
x2 = (exp(1)-33.6*sqrt(2))/(36.2-3*pi*sqrt(2)) end up with a matrix with rational numbers as entries.
x1 = 11.2*sqrt(2)-pi*sqrt(2)*x2 MATLAB can display calculations using rational numbers
rather than decimal numbers. To display calculations
to obtain using only rational numbers, type

x1 = format rational
24.5417
For example, let
and 2

2 1 0

 1 3 −5 1 
x2 = A=
 4
 (2.5.3*)
2 1 3 
-1.9588 2 1 −1 4


and let Theorem 2.5.1. Let A be an n × n matrix that is row


equivalent to In , and let b be an n vector. Suppose that
 
1
 1 
b= (2.5.4*) all entries of A and b are rational numbers. Then there
 −5  .

is a unique solution to the system corresponding to the
2 augmented matrix (A|b) and this solution has rational
Enter A and b into MATLAB by typing numbers as entries.

e2_5_3 Proof Since A is row equivalent to In , Corollary 2.4.8


e2_5_4 states that this linear system has a unique solution x. As
we have just discussed, solutions are found using elemen-
Solve the system by typing tary row operations — hence the entries of x are rational
numbers. 
A\b

to obtain Complex Numbers In the previous parts of this sec-


tion, we have discussed why solutions to linear systems
ans = whose coefficients are rational numbers must themselves
-357/41 have entries that are rational numbers. We now discuss
309/41 solving linear equations whose coefficients are more gen-
137/41 eral than real numbers; that is, whose coefficients are
156/41 complex numbers.
To display the answer in standard decimal form, type First recall that addition, subtraction, multiplication and
division of complex numbers yields complex numbers.
format Suppose that
A\b
a = α + iβ
obtaining b = γ + iδ

ans = where α, β, γ, δ are real numbers and i = −1. Then
-8.7073
7.5366 a+b = (α + γ) + i(β + δ)
3.3415 a−b = (α − γ) + i(β − δ)
3.8049
ab = (αγ − βδ) + i(αδ + βγ)
The same logic shows that if we begin with a system a
=
αγ + βδ
+i 2
βγ − αδ
of equations whose coefficients are rational numbers, we b 2
γ +δ 2 γ + δ2
will obtain an answer consisting of rational numbers —
since adding, subtracting, multiplying and dividing ra- MATLAB has been programmed to do arithmetic with
tional numbers yields rational numbers. More precisely: complex numbers using exactly the same instructions as


it uses to do arithmetic with real and rational numbers. Complex Conjugation Let a = α + iβ be a complex num-
For example, we can solve the system of linear equations ber. Then the complex conjugate of a is defined to be
(4 − i)x1 + 2x2 = 3−i a = α − iβ.
2x1 + (4 − 3i)x2 = 2+i
Let a = α + iβ and c = γ + iδ be complex numbers. Then
in MATLAB by typing we claim that
a+c = a+c
A = [4-i 2; 2 4-3i]; (2.5.5)
ac = a c
b = [3-i; 2+i];
A\b To verify these statements, calculate

a + c = (α + γ) + i(β + δ) = (α + γ) − i(β + δ)
The solution to this system of equations is:
= (α − iβ) + (γ − iδ) = a + c
ans =
0.8457 - 0.1632i and
-0.1098 + 0.2493i
ac = (αγ − βδ) + i(αδ + βγ)
Note: Care must be given when entering complex numbers into = (αγ − βδ) − i(αδ + βγ)
arrays in MATLAB. For example, if you type
= (α − iβ)(γ − iδ) = a c.
b = [3 -i; 2 +i]

then MATLAB will respond with the 2 × 2 matrix Exercises


b =
3.0000 0 - 1.0000i
2.0000 0 + 1.0000i
1. Solve the system of equations
Typing either b = [3-i; 2+i] or b = [3 - i; 2 + i] will yield
x1 − ix2 = 1
the desired 2 × 1 column vector.
ix1 + 3x2 = −1
All of the theorems concerning the existence and unique-
ness of row echelon form — and for solving systems of lin- Check your answer using MATLAB.
ear equations — work when the coefficients of the linear
system are complex numbers as opposed to real numbers. Solve the systems of linear equations given in Exercises 2 – 3
In particular: and verify that the answers are rational numbers.
Theorem 2.5.2. If the coefficients of a system of n lin- x1 + x2 − 2x3 = 1
ear equations in n unknowns are complex numbers and if 2. x1 + x2 + x3 = 2
the coefficient matrix is row equivalent to In , then there x1 − 7x2 + x3 = 3
is a unique solution to this system whose entries are com-
plex numbers. 3.
x1 − x2 = 1
x1 + 3x2 = −1


In Exercises 4 – 6 use MATLAB to solve the given system of In Exercise 12-15, write the given expression in the form α+iβ
linear equations to four significant decimal places. where α, β ∈ R:

4. (matlab) 12.
1−i
.
√ 1+i
0.1x1
√ + 5x2 − 2x3 = 1 √
3+i
− 3x1 + πx2 − 2.6x3 = 14.3 .
√ 13. √ .
π 1 − 3i
x1 − 7x2 + x3 = 2
2
14. (−2 + 3i)(1 − 2i)2 .
5. (matlab)
(5 + 4i)(1 + i)
(4 − i)x1 + (2 + 3i)x2 = −i 15. .
. 3 − 2i
ix1 − 4x2 = 2.2

6. (matlab) In Exercise 16-17, use MATLAB to simplify the expression


√ and expression the solution as x + iy, where x and y are real
(2 + i)x1 + ( 2 −√3i)x2 − 10.66x3 = 4.23
numbers.
14x1 − 5ix2 + (10.2 − i)x3 = 3 −√
1.6i . √
−4.276x1 − (4 − 2i)x3 3 5 − 6i
+ 2x2 = 2i 16. (matlab) .
(7 + 2i)2

Hint: When entering 2i in MATLAB you must type √
17. (matlab) (−3 + 2i)4 2 − 5i.
sqrt(2)*i, even though when you enter 2i, you can just type
2i.
For a complex number z = x + iy, x is called the real part of z
√ and y is called the imaginary part of z. In MATLAB, real(z)
Let a = 1 + i and b = 3 − i. In Exercise 7-10, simplify the
returns the real part of z and imag(z) returns the imaginary
given expression.
part of z. For example, on entering real(2 + 3i), MAT-
7. a + b. LAB returns 2, whereas on entering imag(2 + 3i), MATLAB
returns 3.
8. b − 3a.
18. (matlab) Verify the identity (2.5.6) for z = −21 + 56i.
1
9. . √
a 19. (matlab) Verify the identity (2.5.7) for z = 14 − 4 5i.
10. ab̄.

11. Let z = x + iy be a complex number.

(a) Verify that


z z̄ = x2 + y 2 . (2.5.6)
(b) Verify that
1 x − iy
= 2 . (2.5.7)
z x + y2


2.6 Uniqueness of Reduced Echelon in the equations associated to the matrix (F |0), xk must
be zero to be a solution. This argument contradicts the
Form fact that the (E|0) equations and the (F |0) equations
In this section we prove Theorem 2.4.9, which states that have the same solutions. So the pivots of F must also
every matrix is row equivalent to precisely one reduced occur in columns 1, . . . , `, and the equations associated
echelon form matrix. to F must have the form:
Proof of Theorem 2.4.9: Suppose that E and F are x1 = −â1,`+1 x`+1 − · · · − â1,n xn
two m × n reduced echelon matrices that are row equiva- x2 = −â2,`+1 x`+1 − · · · − â2,n xn
lent to A. Since elementary row operations are invertible, .. .. (2.6.2)
the two matrices E and F are row equivalent. Thus, the . .
systems of linear equations associated to the m × (n + 1) x` = −â`,`+1 x`+1 − · · · − â`,n xn
matrices (E|0) and (F |0) must have exactly the same set
of solutions. It is the fact that the solution sets of the lin- where âi,j are scalars.
ear equations associated to (E|0) and (F |0) are identical To complete this proof, we show that ai,j = âi,j . These
that allows us to prove that E = F . equalities are verified as follows. There is just one solu-
Begin by renumbering the variables x1 , . . . , xn so that tion to each system (2.6.1) and (2.6.2) of the form
the equations associated to (E|0) have the form:
x`+1 = 1, x`+2 = · · · = xn = 0.
x1 = −a1,`+1 x`+1 − · · · − a1,n xn
x2 = −a2,`+1 x`+1 − · · · − a2,n xn These solutions are
.. .. (2.6.1)
. . (−a1,`+1 , . . . , −a`,`+1 , 1, 0, · · · , 0)
x` = −a`,`+1 x`+1 − · · · − a`,n xn .
for (2.6.1) and
In this form, pivots of E occur in the columns 1, . . . , `.
We begin by showing that the matrix F also has pivots in (−â1,`+1 , . . . , −â`,`+1 , 1, 0 · · · , 0)
columns 1, . . . , `. Moreover, there is a unique solution to
these equations for every choice of numbers x`+1 , . . . , xn . for (2.6.2). It follows that aj,`+1 = âj,`+1 for j = 1, . . . , `.
Complete this proof by repeating this argument. Just
Suppose that the pivots of F do not occur in columns inspect solutions of the form
1, . . . , `. Then there is a row in F whose first nonzero
entry occurs in a column k > `. This row corresponds to x`+1 = 0, x`+2 = 1, x`+3 = · · · = xn = 0
an equation
through
xk = ck+1 xk+1 + · · · + cn xn .
Now, consider solutions that satisfy x`+1 = · · · = xn−1 = 0, xn = 1.

x`+1 = · · · = xk−1 = 0 and xk+1 = · · · = xn = 0.


In the equations associated to the matrix (E|0), there is
a unique solution associated with every number xk ; while
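
As an illustration (a sketch, not part of the proof), MATLAB's rref command returns the reduced echelon form, and two row-equivalent matrices give the same answer, as Theorem 2.4.9 predicts:

A = [1 2 -1 1; 2 5 -4 -1];
rref(A)                       % the reduced echelon form of A
B = [4 10 -8 -2; 1 2 -1 1];   % B comes from A by swapping rows and doubling one row
rref(B)                       % identical to rref(A)
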


3 Matrices and Linearity


In this chapter we take the first step in abstracting vec-
tors and matrices to mathematical objects that are more
than just arrays of numbers. We begin the discussion in
Section 3.1 by introducing the multiplication of a matrix
times a vector. Matrix multiplication simplifies the way
in which we write systems of linear equations and is the
way by which we view matrices as mappings. This latter
point is discussed in Section 3.2.
The mappings that are produced by matrix multiplica-
tion are special and are called linear mappings. Some
properties of linear maps are discussed in Section 3.3.
One consequence of linearity is the principle of superpo-
sition that enables solutions to systems of linear equa-
tions to be built out of simpler solutions. This principle
is discussed in Section 3.4.
In Section 3.5 we introduce multiplication of two matri-
ces and discuss properties of this multiplication in Sec-
tion 3.6. Matrix multiplication is defined in terms of
composition of linear mappings which leads to an ex-
plicit formula for matrix multiplication. This dual role
of multiplication of two matrices — first by formula and
second as composition — enables us to solve linear equa-
tions in a conceptual way as well as in an algorithmic
way. The conceptual way of solving linear equations is
through the use of matrix inverses (or inverse mappings)
which is described in Section 3.7. In this section we also
present important properties of matrix inversion and a
method of computation of matrix inverses. There is a
simple formula for computing inverses of 2 × 2 matrices
based on determinants. The chapter ends with a discus-
sion of determinants of 2 × 2 matrices in Section 3.8.


3.1 Matrix Multiplication of Vectors For example, when m = 2 and n = 3, then the product
is a 2-vector
In Chapter 2 we discussed how matrices appear when
solving systems of m linear equations in n unknowns.
 
x
a11 a12 a13  1 
   
a11 x1 + a12 x2 + a13 x3
Given the system x2 = .
a21 a22 a23 a21 x1 + a22 x2 + a23 x3
x3
a11 x1 + a12 x2 + ··· + a1n xn = b1 (3.1.3)
a21 x1 + a22 x2 + ··· + a2n xn = b2 As a specific example, compute
.. .. .. .. (3.1.1)
. . . . 
 
 2  
am1 x1 + am2 x2 + · · · + amn xn = bm , 2 3 −1   2 · 2 + 3 · (−3) + (−1) · 4
−3 =
4 1 5 4 · 2 + 1 · (−3) + 5·4
we saw that all relevant information is contained in the 4
m × n matrix of coefficients
 
−9
= .

a11 a12 · · · a1n
 25
 a21 a22 · · · a2n 
A=

.. .. .. 
 Using (3.1.2) we have a compact notation for writing sys-
 . . .  tems of linear equations. For example, using a special
am1 am2 · · · amn instance of (3.1.3),
and the m vector
 
  x1  
2 3 −1  2x1 + 3x2 − x3

b1
 x2  = .
4 1 5 4x1 + x2 + 5x3
b =  ...  .
  x3

bm In this notation we can write the system of two linear


equations in three unknowns

Matrices Times Vectors We motivate multiplication of 2x1 + 3x2 − x3 = 2


a matrix times a vector just as a notational advance that 4x1 + x2 + 5x3 = −1
simplifies the presentation of the linear systems. It is, as the matrix equation
however, much more than that. This concept of mul-
tiplication allows us to think of matrices as mappings
 
  x1  
2 3 −1 2
and these mappings tell us much about the structure of  x2  = .
4 1 5 −1
solutions to linear systems. But first we discuss the no- x3
tational advantage.
Multiplying an m × n matrix A times an n vector x pro-
duces an m vector, as follows:

         [ a11  · · ·  a1n ] [ x1 ]   [ a11 x1 + · · · + a1n xn ]
    Ax = [  ⋮           ⋮  ] [  ⋮ ] = [            ⋮            ]        (3.1.2)
         [ am1  · · ·  amn ] [ xn ]   [ am1 x1 + · · · + amn xn ]

Indeed, the general system of linear equations (3.1.1) can
be written in matrix form using matrix multiplication as

    Ax = b

where A is the m × n matrix of coefficients, x is the n
vector of unknowns, and b is the m vector of constants
on the right hand side of (3.1.1).


Matrices Times Vectors in MATLAB We have already A\b


seen how to define matrices and vectors in MATLAB. Now
we show how to multiply a matrix times a vector using then we get the vector x back as the answer.
MATLAB.
Load the matrix A Exercises
 
5 −4 3 −6 2
 2 −4 −2 −1 1 
(3.1.4*)
 
A=  1 2 1 −5 3 
 1. Let
 −2 −1 −2 1 −1     
2 1 3
1 −6 1 1 4 A= and x= .
−1 4 −2
and the vector x Compute Ax.
 
−1
 2 
(3.1.5*)
 
 1
x= 
2. Let

 −1 
3  

2

3 4 1
B= and y =  5 .
into MATLAB by typing 1 2 3
−2

e3_1_4 Compute By.


e3_1_5

The multiplication Ax can be performed by typing

b = A*x In Exercises 3 – 6 decide whether or not the matrix vector


product Ax can be computed; if it can, compute the product.
and the result should be
   
1 2 2
3. A = and x = .
0 −5 2
b =  
2
2
 
1 2
4. A = and x =  2 .
-8 0 −5
4
18
-6
 
−1
-1 5. A = 1 2 4 and x =  1 .


3
We may verify this result by solving the system of linear 
1

equations Ax = b. Indeed if we type 6. A = (5) and x = .
0


7. Let 11. Let A be a 2 × 2 matrix. Find A so that


   
··· 1 3
   
a11 a12 a1n x1
A =
 a21 a22 ··· a2n   x2  0 −5
A= .. .. .. and x =  . .
   
 .. 
   
. . . 0 1

  A = .
am1 am2 ··· amn xn 1 4

Denote the columns of the matrix A by


12. Let A be a 2 × 2 matrix. Find A so that
     
a11 a12 a1n    
1 2
 a21   a22   a2n  A =
A1 =  .  , A2 =  .  , · · · 1 −1
An =  .. .
     
 ..   ..  .
   
1 4
 
am1 am2 amn A = .
−1 3
Show that the matrix vector product Ax can be written as
13. Is there an upper triangular 2 × 2 matrix A such that
Ax = x1 A1 + x2 A2 + · · · + xn An ,
   
1 1
where xj Aj denotes scalar multiplication (see Chapter 1). A = ? (3.1.6)
0 2

Is there a symmetric 2 × 2 matrix A satisfying (3.1.6)?
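
A numerical spot-check of the identity Ax = x1 A1 + · · · + xn An in Exercise 7, sketched with randomly generated data:

A = rand(3,4);  x = rand(4,1);
s = zeros(3,1);
for j = 1:4
    s = s + x(j)*A(:,j);      % x1*A1 + x2*A2 + x3*A3 + x4*A4
end
norm(A*x - s)                 % numerically zero
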


8. Let    
1 1 1
C= and b = . In Exercises 14 – 15 use MATLAB to compute b = Ax for the
2 −1 1
given A and x.
Find a 2-vector z such that Cz = b.
14. (matlab)
   
−0.2 −1.8 3.9 −6 −1.6 −2.6
9. Write the system of linear equations  6.3 8 3 2.5 5.1   2.4 
and
   
A=  −0.8 −9.9 9.7 4.7 5.9 x=  4.6
 .
2x1 + 3x2 − 2x3 = 4  −0.9 −4.1
 
1.1 −2.5 8.4   −6.1 
6x1 − 5x3 = 1 −1 −9 −2 −9.8 6.9 8.1
(3.1.7*)
in the matrix form Ax = b.
15. (matlab)
 
14 −22 −26 −2 −77 100 −90
10. Find all solutions to  26
 25 −15 −63 33 92 14 
 

x1

   −53 40 19 40 −27 −88 40 
1 3 −1 4 14 
−21 −72 −28

 2 1 5 7
 x2 
 =  17  .
A=  10 13 97 92 
−17
  86 43 61 13 10 50 

 x3 
3 4 4 11 31 
 −33 31 2 41 65 −48 48 

x4
31 68 55 −3 35 19 −14
(3.1.8*)


and  
2.7

 6.1 


 −8.3 

x=
 8.9 .


 8.3 

 2 
−4.9

16. (matlab) Let


   
2 4 −1 2
A= 1 3 2  and b =  1 . (3.1.9*)
−1 −2 5 4

Find a 3-vector x such that Ax = b.

17. (matlab) Let


   
1.3 −4.15 −1.2 1.12
A =  1.6 −1.2 2.4  and b =  −2.1  .
−2.5 2.35 5.09 4.36
(3.1.10*)
Find a 3-vector x such that Ax = b.

18. (matlab) Let A be a 3 × 3 matrix. Find A so that


   
2 1
A  −1  =  1 
1 −1
   
1 −1
A  −1  =  −2 
0 1
   
0 5
A 2  =  1 .
4 1

Hint: Rewrite these three conditions as a system of linear


equations in the nine entries of A. Then solve this system
using MATLAB. (Then pray that there is an easier way.)


3.2 Matrix Mappings Here the matrix mapping is given by (x, y) 7→ (λx, µy);
that is, a mapping that independently stretches and/or
Having illustrated the notational advantage of using ma-
contracts the x and y coordinates. Even these simple
trices and matrix multiplication, we now begin to discuss
looking mappings can move objects in the plane in a
why there is also a conceptual advantage to matrix mul-
somewhat complicated fashion.
tiplication, a conceptual advantage that will help us to
understand how systems of linear equations and linear
differential equations may be solved. The Program map We use MATLAB to explore planar
Matrix multiplication allows us to view m × n matrices matrix mappings using the program map. In MATLAB
as mappings from Rn to Rm . Let A be an m × n matrix type the command
and let x be an n vector. Then
map
x 7→ Ax
and a window appears labeled Map. The 2 × 2 matrix
defines a mapping from R to R .
n m  
0 −1
. (3.2.1)
The simplest example of a matrix mapping is given by 1 0
1 × 1 matrices. Matrix mappings defined from R → R has been pre-entered. Click on the Custom button. In the
are Icons menu click on an icon — say Dog — and a blue ‘Dog’
x 7→ ax will appear in the graphing window. Next click on the
where a is a real number. Note that the graph of this Iterate button and a new version of the Dog will appear in
function is just a straight line through the origin (with yellow —the yellow Dog is just rotated about the origin
slope a). From this example we see that matrix mappings counterclockwise by 90◦ from the blue dog. Indeed, the
are very special mappings indeed. In higher dimensions, matrix (3.2.1) rotates the plane counterclockwise by 90◦ .
matrix mappings provide a richer set of mappings; we To verify this statement click on Iterate again and see
explore here planar mappings — mappings of the plane that the yellow dog rotates 90◦ counterclockwise into the
into itself — using MATLAB graphics and the program magenta dog. Of course, the magenta dog is rotated 180◦
map. from the original blue dog. Clicking on Iterate once more
produces a fourth dog — this one in cyan. Finally, one
The simplest planar matrix mappings are the dilatations. more click on the Iterate button will rotate the cyan dog
Let A = cI2 where c > 0 is a scalar. When c < 1 into a red dog that exactly covers the original blue dog.
vectors are contracted by a factor of c and and these
mappings are examples of contractions. When c > 1 Other matrices will produce different motions of the
vectors are stretched or expanded by a factor of c and plane. Click on the Reset button. Then either push the
these dilatations are examples of expansions. We now Custom button, type the entries in the matrix, and click
explore some more complicated planar matrix mappings. on the Iterate button; or choose one of the pre-assigned
matrices listed in the Gallery menu and click on the It-
The next planar motions that we study are those given erate button. For example, clicking on the Contracting
by the matrices rotation button recalls the matrix
   
λ 0 0.3 −0.8
A= .
0 µ 0.8 0.3


This matrix rotates the plane through an angle of approx- movement associated with the linear map x 7→ −cx where
imately 69.4◦ counterclockwise and contracts the plane x ∈ R2 and c > 0 may be thought of as a dilatation
by a factor of approximately 0.85. Now click on Dog in (x 7→ cx) followed by rotation through 180◦ (x 7→ −x).
the Icons menu to bring up the blue dog again. Repeated
We claim that combining dilatations with general rota-
clicking on Iterate rotates and contracts the dog so that tions produces spirals. Consider the matrix
dogs in a cycling set of colors slowly converge towards
the origin in a spiral of dogs.4
 
c cos θ −c sin θ
S= = cRθ
c sin θ c cos θ
Rotations Rotating the plane counterclockwise through where c < 1. Then a calculation similar to the previous
an angle θ is a motion given by a matrix mapping. We one shows that
show that the matrix that performs this rotation is:
  S(rvϕ ) = c(rvϕ+θ ).
cos θ − sin θ
Rθ = . (3.2.2) So S rotates vectors in the plane while contracting them
sin θ cos θ
by the factor c. Thus, multiplying a vector repeatedly by
To verify that Rθ rotates the plane counterclockwise S spirals that vector into the origin. The example that
through angle θ, let vϕ be the unit vector whose angle we just considered while using map is
from the horizontal is ϕ; that is, vϕ = (cos ϕ, sin ϕ). We
0.85 cos(69.4◦ ) −0.85 sin(69.4◦ )
   
can write every vector in R2 as rvϕ for some number 0.3 −0.8 ∼
= ◦ ◦ ,
r ≥ 0. Using the trigonometric identities for the cosine 0.8 0.3 0.85 sin(69.4 ) 0.85 cos(69.4 )
and sine of the sum of two angles, we have: which is an example of S with c = 0.85 and θ = 69.4◦ .
  
cos θ − sin θ r cos ϕ
Rθ (rvϕ ) =
sin θ cos θ r sin ϕ A Notation for Matrix Mappings We reinforce the idea

r cos θ cos ϕ − r sin θ sin ϕ
 that matrices are mappings by introducing a notation for
=
r sin θ cos ϕ + r cos θ sin ϕ the mapping associated with an m × n matrix A. Define
 
cos(θ + ϕ) LA : Rn → Rm
= r
sin(θ + ϕ)
= rvϕ+θ . by
LA (x) = Ax,
This calculation shows that Rθ rotates every vector in for every x ∈ R .
n
the plane counterclockwise through angle θ.
There are two special matrices: the m × n zero matrix O
It follows from (3.2.2) that R180◦ = −I2 . So rotating all of whose entries are 0 and the n × n identity matrix
a vector in the plane by 180◦ is the same as reflecting In whose diagonal entries are 1 and whose off diagonal
the vector through the origin. It also follows that the entries are 0. For instance,
4
When using the program map first choose an Icon (or Vector),  
second choose a Matrix from the Gallery (or a Custom matrix), 1 0 0
and finally click on Iterate. Then Iterate again or Reset to start I3 =  0 1 0  .
over. 0 0 1
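
A small sketch showing the rotation matrix (3.2.2) numerically; the variable names here are ours:

theta = pi/3;                 % rotate counterclockwise by 60 degrees
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];
w = R*[1; 0]                  % the rotated vector (cos 60, sin 60)
atan2(w(2), w(1))*180/pi      % its angle in degrees: 60
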


The mappings associated with these special matrices are 6. What 2×2 matrix rotates the plane clockwise by 90◦ while
also special. Let x be an n vector. Then dilating it by a factor of 2?

Ox = 0, (3.2.3)
7. Find a 2 × 2 matrix that reflects vectors in the (x, y) plane
where the 0 on the right hand side of (3.2.3) is the m across the x axis.
vector all of whose entries are 0. The mapping LO is the
zero mapping — the mapping that maps every vector x
to 0. 8. Find a 2 × 2 matrix that reflects vectors in the (x, y) plane
across the y axis.
Similarly,
In x = x
9. Find a 2 × 2 matrix that reflects vectors in the (x, y) plane
for every vector x. It follows that across the line x = y.
LIn (x) = x
10. Suppose the mapping L : R3 → R2 is linear and satisfies
is the identity mapping, since it maps every vector to
itself. It is for this reason that the matrix In is called the
     
1   0   0  
1 2 −1
n × n identity matrix. L 0  =
2
L 1  =
0
L 0  =
4
0 1 1

What is the 2 × 3 matrix A such that L = LA ?


Exercises

11. The matrix


In Exercises 1 – 3 find a nonzero vector that is mapped to the
 
1 K
A=
origin by the given matrix. 0 1

0 1
 is a shear. Describe the action of A on the plane for different
1. A = . values of K.
0 −2
 
1 2
2. B = . 12. Determine a rotation matrix that maps the vectors (3, 4)
−2 −4
  and (1, −2) onto the vectors (−4, 3) and (2, 1) respectively.
3 −1
3. C = .
−6 2
13. Find a 2 × 3 matrix P that projects three dimensional
xyz space onto the xy plane. Hint: Such a matrix will satisfy
4. What 2×2 matrix rotates the plane about the origin coun-    
0 x
terclockwise by 30◦ ?
   
0 x
P  0  = and P  y  = .
0 y
z 0

5. What 2 × 2 matrix rotates the plane clockwise by 45◦ ?


In Exercises 22 – 26 use map to help describe the planar mo-


 
a −b
14. Show that every matrix of the form corre-
b a tions of the associated linear mappings for the given 2 × 2
sponds to rotating the plane through the angle θ followed by matrix.
a dilatation cI2 where  √ 
p 3 1
c = a2 + b2 22. (matlab) A =  2 √2  .

a 1 3
cos θ = −
c 2 2
b  1
sin θ = . 1 
c −
23. (matlab) B =  2 1
2 .
1
  2 2
3 4
15. Using Exercise 14 observe that the matrix
 
0 1
−4 3 24. (matlab) C = .
rotates the plane counterclockwise through an angle θ and 1 0
then dilates the planes by a factor of c. Find θ and c. Use  
1 0
map to verify your results. 25. (matlab) D = .
0 0
 1 1 
In Exercises 16 – 18 use map to find vectors that are stretched
and/or contracted to a multiple of themselves by the given 26. (matlab) E =  2
1
2 .
1
linear mapping. Hint: Choose a vector in the Map window 2 2
and apply Iterate several times.
 
16. (matlab) A =
2 0
. 27. (matlab) The matrix
1.5 0.5
 
0 −1
A=
 
1.2 −1.5
17. (matlab) B = . −1 0
−0.4 1.2
  reflects the xy-plane across the diagonal line y = −x while
18. (matlab) C =
2 −1.25
. the matrix  
0 −0.5 −1 0
B=
0 −1

In Exercises 19 – 21 use Exercise 14 and map to verify that rotates the plane through an angle of 180◦ . Using the program
the given matrices rotate the plane through an angle θ followed by a map verify that both matrices map the vector (1, 1) to its neg-
dilatation cI2 . Find θ and c in each case. ative (−1, −1). Now perform two experiments. First, choose
the dog icon and move that dog by the matrix A. Second,
move that dog using the matrix B. Describe the difference in
 
1 −2
19. (matlab) A = .
2 1 the result.
 
−2.4 −0.2
20. (matlab) B = .
0.2 −2.4
 
2.67 1.3
21. (matlab) C = .
−1.3 2.67


3.3 Linearity w1 = A*(c*x)


w2 = c*(A*x)
We begin by recalling the vector operations of addition
and scalar multiplication. Given two n vectors, vector
addition is defined by and compare w1 and w2 to verify (3.3.2).

x1
 
y1
 
x1 + y1
 The central idea in linear algebra is the notion of linear-
 ..   .. .. ity.
 .   . + = . .
  

xn yn xn + yn Definition 3.3.1. A mapping L : Rn → Rm is linear if

Multiplication of a scalar times a vector is defined by (a) L(x + y) = L(x) + L(y) for all x, y ∈ Rn .
   
x1 cx1
(b) L(cx) = cL(x) for all x ∈ Rn and all scalars c ∈ R.
c  ...  =  ...  .
   

xn cxn To better understand the meaning of Defini-


Using (3.1.2) we can check that matrix multiplication tion 3.3.1(a,b), we verify these conditions for the
satisfies mapping L : R2 → R2 defined by

A(x + y) = Ax + Ay (3.3.1) L(x) = (x1 + 3x2 , 2x1 − x2 ), (3.3.4)


A(cx) = c(Ax). (3.3.2)
where x = (x1 , x2 ) ∈ R2 . To verify Definition 3.3.1(a),
Using MATLAB we can also verify that the identities let y = (y1 , y2 ) ∈ R2 . Then
(3.3.1) and (3.3.2) are valid for some particular choices
of x, y, c and A. For example, let c = 5 and L(x + y) = L(x1 + y1 , x2 + y2 )
= ((x1 + y1 ) + 3(x2 + y2 ), 2(x1 + y1 ) − (x2 + y2 ))
   
  1 1
2 3 4 1  5   −1  = (x1 + y1 + 3x2 + 3y2 , 2x1 + 2y1 − x2 − y2 ).
A= , x=  4  , y =  −1  .
  
1 1 2 3
3 4 On the other hand,
(3.3.3*)
Typing e3_3_3 enters this information into MATLAB. L(x) + L(y) = (x1 + 3x2 , 2x1 − x2 ) + (y1 + 3y2 , 2y1 − y2 )
Now type = (x1 + 3x2 + y1 + 3y2 , 2x1 − x2 + 2y1 − y2 ).

z1 = A*(x+y) Hence
z2 = A*x + A*y L(x + y) = L(x) + L(y)
and compare z1 and z2. The fact that they are both for every pair of vectors x and y in R2 .
equal to   Similarly, to verify Definition 3.3.1(b), let c ∈ R be a
35
scalar and compute
33
verifies (3.3.1) in this case. Similarly, type L(cx) = L(cx1 , cx2 ) = ((cx1 ) + 3(cx2 ), 2(cx1 ) − (cx2 )).


Then compute Examples of Mappings that are Not Linear

cL(x) = c(x1 +3x2 , 2x1 −x2 ) = (c(x1 +3x2 ), c(2x1 −x2 )), • f (x) = x2 . Calculate
from which it follows that f (x + y) = (x + y)2 = x2 + 2xy + y 2
L(cx) = cL(x) while
for every vector x ∈ R and every scalar c ∈ R. Thus L
2 f (x) + f (y) = x2 + y 2 .
is a linear mapping. The two expressions are not equal and f (x) = x2 is
In fact, the mapping (3.3.4) is a matrix mapping and not linear.
could have been written in the form • f (x) = ex . Calculate
 
1 3
L(x) =
2 −1
x. f (x + y) = ex+y = ex ey

Hence the linearity of L could have been checked using while


identities (3.3.1) and (3.3.2). Indeed, matrix mappings f (x) + f (y) = ex + ey .
are always linear mappings, as we now discuss. The two expressions are not equal and f (x) = ex is
not linear.
Matrix Mappings are Linear Mappings Let A be an m × n • f (x) = sin x. Recall that
matrix and recall that the matrix mapping LA : Rn →
Rm is defined by LA (x) = Ax. We may rewrite (3.3.1) f (x + y) = sin(x + y) = sin x cos y + cos x sin y
and (3.3.2) using this notation as
while
LA (x + y) = LA (x) + LA (y) f (x) + f (y) = sin x + sin y.
LA (cx) = cLA (x). The two expressions are not equal and f (x) = sin x
Thus all matrix mappings are linear mappings. We will is not linear.
show that all linear mappings are matrix mappings (see
Theorem 3.3.5). But first we discuss linearity in the sim- Linear Functions of One Variable Suppose we take the
plest context of mappings from R → R. opposite approach and ask what functions of R → R are
linear. Observe that if L : R → R is linear, then
Linear and Nonlinear Mappings of R → R Note that L(x) = L(x · 1).
1 × 1 matrices are just scalars A = (a). It follows from
(3.3.1) and (3.3.2) that we have shown that the matrix Since we are looking at the special case of linear mappings
mappings LA (x) = ax are all linear, though this point on R, we note that x is a real number as well as a vector.
could have been verified directly. Before showing that Thus we can use Definition 3.3.1(b) to observe that
these are all the linear mappings of R → R, we focus on
examples of functions of R → R that are not linear. L(x · 1) = xL(1).


So if we let a = L(1), then we see that Theorem 3.3.5. Let L : Rn → Rm be a linear mapping.
Then there exists an m × n matrix A such that L = LA .
L(x) = ax.
Thus linear mappings of R into R are very special map- Proof There are two steps to the proof: determine the
pings indeed; they are all scalar multiples of the identity matrix A and verify that LA = L.
mapping.
Let A be the matrix whose j th column is L(ej ). By
Lemma 3.3.4 L(ej ) = Aej ; that is, L(ej ) = LA (ej ).
All Linear Mappings are Matrix Mappings We end this Lemma 3.3.3 implies that L = LA . 
section by proving that every linear mapping is given
by matrix multiplication. But first we state and prove Theorem 3.3.5 provides a simple way of showing that
two lemmas. There is a standard set of vectors that is
used over and over again in linear algebra, which we now L(0) = 0
define.
for any linear map L. Indeed, L(0) = LA (0) = A0 = 0
Definition 3.3.2. Let j be an integer between 1 and n. for some matrix A. (This fact can also be proved directly
The n-vector ej is the vector that has a 1 in the j th entry from the definition of linear mapping.)
and zeros in all other entries.

Lemma 3.3.3. Let L1 : Rn → Rm and L2 : Rn → Rm be Using Theorem 3.3.5 to Find Matrices Associated to Linear
linear mappings. Suppose that L1 (ej ) = L2 (ej ) for every Maps The proof of Theorem 3.3.5 shows that the j th
j = 1, . . . , n. Then L1 = L2 . column of the matrix A associated to a linear mapping
L is L(ej ) viewed as a column vector. As an example,
Proof Let x = (x1 , . . . , xn ) be a vector in Rn . Then let L : R2 → R2 be rotation clockwise through 90◦ . Ge-
ometrically, it is easy to see that
x = x1 e1 + · · · + xn en .
   
Linearity of L1 and L2 implies that 1 0
L(e1 ) = L =
0 −1
L1 (x) = x1 L1 (e1 ) + · · · + xn L1 (en )
and
= x1 L2 (e1 ) + · · · + xn L2 (en ) 
0
 
1

= L2 (x). L(e2 ) = L = .
1 0
Since L1 (x) = L2 (x) for all x ∈ Rn , it follows that L1 = Since we know that rotations are linear maps, it follows
L2 .  that the matrix A associated to the linear map L is:
Lemma 3.3.4. Let A be an m × n matrix. Then Aej is
 
0 1
the j th column of A. A=
−1 0
.

Proof Recall the definition of matrix multiplication Additional examples of linear mappings whose associated
given in (3.1.2). In that formula, just set xi equal to zero matrices can be found using Theorem 3.3.5 are given in
for all i 6= j and set xj = 1.  Exercises 11 – 14.


Exercises 8. T : R2 → R2 defined by T (x1 , x2 ) = (x1 + x2 , x1 − x2 − 1).

9. T : R2 → R3 defined by T (x1 , x2 ) = (1, x1 + x2 , 2x2 )

1. Compute ax + by for each of the following:


10. Determine which of the following maps are linear maps.
(a) a = 2, b = −3, x = (2, 4) and y = (3, −1). If the map is linear give the matrix associated to the linear
(b) a = 10, b = −2, x = (1, 0, −1) and y = (2, −4, 3). map. Explain your reasoning.
(c) a = 5, b = −1, x = (4, 2, −1, 1) and y = (−1, 3, 5, 7). 
x
 
x+y+3

(a) L1 : R2 → R2 where L1 =
y 2y + 1

2. Let x = (4, 7) and y = (2, −1). Write the vector αx + βy


 
  sin x
x
as a vector in coordinates. (b) L2 : R2 → R3 where L2 = x+y 
y
2y
 
x
3. Let x = (1, 2), y = (1, −3), and z = (−2, −1). Show that (c) L3 : R2 → R where L3 =x+y
y
you can write
z = αx + βy
for some α, β ∈ R. 11. Find the 2 × 3 matrix A that satisfies
Hint: Set up a system of two linear equations in the un- 
2
 
1
 
0

knowns α and β, and then solve this linear system. Ae1 = , Ae2 = , and Ae3 = .
3 −1 1

4. Can the vector z = (2, 3, −1) be written as


12. The cross product of two 3-vectors x = (x1 , x2 , x3 ) and
z = αx + βy y = (y1 , y2 , y3 ) is the 3-vector

where x = (2, 3, 0) and y = (1, −1, 1)? x × y = (x2 y3 − x3 y2 , −(x1 y3 − x3 y1 ), x1 y2 − x2 y1 ).

Let K = (2, 1, −1).


5. Let x = (3, −2), y = (2, 3), and z = (1, 4). For which real
numbers α, β, γ does (a) Show that the mapping L : R3 → R3 defined by

αx + βy + γz = (1, −2)? L(x) = x × K

is a linear mapping.
In Exercises 6 – 9 determine whether the given transformation (b) Find the 3 × 3 matrix A such that
is linear.
L(x) = Ax,
6. T : R3 → R2 defined by T (x1 , x2 , x3 ) = (x1 +2x2 −x3 , x1 −
4x3 ). that is, L = LA .

7. T : R2 → R2 defined by T (x1 , x2 ) = (x1 + x1 x2 , 2x2 ).


13. Argue geometrically that rotation of the plane counter- 18. (matlab) Let
clockwise through an angle of 45◦ is a linear mapping. Find a
2 × 2 matrix A such that LA rotates the plane counterclock-
 
0 0.5
A= .
wise by 45◦ . −0.5 0

Use map to determine how the mapping LA acts on 2-vectors.


Describe this action in words.
14. Let σ : R3 → R3 permute coordinates cyclically; that is,

σ(x1 , x2 , x3 ) = (x2 , x3 , x1 ). In Exercises 19 – 20 use MATLAB to verify (3.3.1) and (3.3.2).

Find the 3 × 3 matrix A such that σ = LA . 19. (matlab)


     
1 2 3 3 0
A =  0 1 −2  , x =  2 , y =  −5  , c = 21;
15. Let L be a linear map. Using the definition of linearity,
4 0 1 −1 10
prove that L(0) = 0.
(3.3.5*)

20. (matlab)
16. Let P : Rn → Rm and Q : Rn → Rm be linear mappings.    
4 0 −3 2 4 1
(a) Prove that S : Rn → Rm defined by −4 −1
 2 8 3   3 
   
A=  −1 2 1 10 −2 −2
, x= 
  
S(x) = P (x) + Q(x)  4 4 −2 1 2   3 
−2 3 1 1 −1 −1
is also a linear mapping. (3.3.6*)
(b) Theorem 3.3.5 states that there are matrices A, B and 2
 
C such that  0 
 
 13  ,
y= c = −13.
and Q = LB and

P = LA S = LC .  −2 
What is the relationship between the matrices A, B, and 1
C?

17. (matlab) Let


 
0.5 0
A= .
0 2

Use map to verify that the linear mapping LA halves the x-


component of a point while it doubles the y-component.


3.4 The Principle of Superposition We illustrate this principle by explicitly solving the sys-
tem of equations
The principle of superposition is just a restatement of
the fact that matrix mappings are linear. Nevertheless,
 
x1
this restatement is helpful when trying to understand the
   
1 2 −1 1   x2  =
 0
structure of solutions to systems of linear equations. 2 5 −4 −1  x3  0
.
x4

Homogeneous Equations A system of linear equations Use row reduction to show that the matrix
is homogeneous if it has the form  
1 2 −1 1
2 5 −4 −1
Ax = 0, (3.4.1)
is row equivalent to
where A is an m × n matrix and x ∈ Rn . Note that ho-
mogeneous systems are consistent since 0 ∈ Rn is always
 
1 0 3 7
a solution, that is, A(0) = 0. 0 1 −2 −3
The principle of superposition makes two assertions: which is in reduced echelon form. Recall, using the meth-
ods of Section 2.3, that every solution to this linear sys-
• Suppose that y and z in Rn are solutions to (3.4.1) tem has the form
(that is, suppose that Ay = 0 and Az = 0); then 
−3x3 − 7x4
 
−3
 
−7

y + z is a solution to (3.4.1).  2x3 + 3x4 
 = x3  2  + x4  3  .
   

x3 1 0 
• Suppose that c is a scalar; then cy is a solution to
    
x4 0 1
(3.4.1).
Superposition is verified again by observing that the form
The principle of superposition is proved using the linear- of the solutions is preserved under vector addition and
ity of matrix multiplication. Calculate scalar multiplication. For instance, suppose that
       
−3 −7 −3 −7
A(y + z) = Ay + Az = 0 + 0 = 0  2   3   2   3
and β1 

α1  +α +β
1  2 0  1  2 0
       
to verify that y + z is a solution, and calculate

0 1 0 1
A(cy) = c(Ay) = c · 0 = 0 are two solutions. Then the sum has the form
   
to verify that cy is a solution. −3 −7
 2
We see that solutions to homogeneous systems of linear  + γ2  3 
  
γ1 
 1  0 
equations always satisfy the general property of superpo-

0 1
sition: sums of solutions are solutions and scalar multi-
ples of solutions are solutions. where γj = αj + βj .
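
Superposition is also easy to check numerically; a short sketch using the example above:

A  = [1 2 -1 1; 2 5 -4 -1];
w1 = [-3; 2; 1; 0];  w2 = [-7; 3; 0; 1];
A*(4*w1 - 2*w2)               % zero: linear combinations of solutions are solutions
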


We have actually proved more than superposition. We the homogeneous equation. More precisely, suppose that
have shown in this example that every solution is a su- we know all of the solutions w to the homogeneous equa-
perposition of just two solutions tion Ax = 0 and one solution y to the inhomogeneous
    equation Ax = b. Then y + w is another solution to
−3 −7 the inhomogeneous equation and every solution to the
 2   3 
inhomogeneous equation has this form.
 1  and  0  .
   

0 1
An Example of an Inhomogeneous Equation Suppose that
Inhomogeneous Equations The linear system of m we want to find all solutions of Ax = b where
equations in n unknowns is written as 
3 2 1
 
−2

Ax = b A =  0 1 −2  and b =  4  .
3 3 −1 2
where A is an m × n matrix, x ∈ Rn , and b ∈ Rm . This
system is inhomogeneous when the vector b is nonzero. Suppose that you are told that y = (−5, 6, 1)t is a solu-
Note that if y, z ∈ Rn are solutions to the inhomogeneous tion of the inhomogeneous equation. (This fact can be
equation (that is, Ay = b and Az = b), then y − z is a verified by a short calculation — just multiply Ay and
solution to the homogeneous equation. That is, see that the result equals b.) Next find all solutions to the
homogeneous equation Ax = 0 by putting A into reduced
A(y − z) = Ay − Az = b − b = 0.
echelon form. The resulting row echelon form matrix is
For example, let
5
 
    1 0
1 2 0 3 3 
and b = 1 −2  .

A= .  0
−2 0 1 −1
0 0 0
Then
Hence we see that the solutions of the homogeneous equa-
   
1 3
y= 1  and z =  0  tion Ax = 0 are
1 5
5 5
   
are both solutions to the linear system Ax = b. It follows − s −
 3   3 
that    2s  = s  2 .
−2
s 1
y−z = 1 
−4 Combining these results, we conclude that all the solu-
is a solution to the homogeneous system Ax = 0, which tions of Ax = b are given by
can be checked by direct calculation.
5
 
Thus we can completely solve the inhomogeneous equa-
 
−5 −
3 .
tion by finding one solution to the inhomogeneous equa-  6  + s
 2 
tion and then adding to that solution every solution of 1 1
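
A quick numerical check of this conclusion, sketched with the matrices above:

A = [3 2 1; 0 1 -2; 3 3 -1];  b = [-2; 4; 2];
y = [-5; 6; 1];               % the particular solution used above
w = [-5/3; 2; 1];             % generates the homogeneous solutions
A*y - b                       % zero
A*(y + 7*w) - b               % zero: y plus any multiple of w also solves Ax = b
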


Exercises 5. Let A be a 3 × 3 matrix with rank 2. Suppose the linear


system Ax = b has two solutions
   
1 0
y =  3  and z =  0 
1. Consider the homogeneous linear equation 4 1
x+y+z =0 Find the full set of solutions to Ax = b.

(a) Write all solutions to this equation as a general super-


position of a pair of vectors v1 and v2 . 6. Suppose A is a 3 × 3 full rank matrix. Determine how
many solutions the homogeneous system Ax = 0 has and how
(b) Write all solutions as a general superposition of a second
many solutions the inhomogeneous system Ax = b has.
pair of vectors w1 and w2 .

7. Suppose A is a 3 × 3 matrix with rank less than 3. De-


2. Write all solutions to the homogeneous system of linear termine how many solutions the homogeneous system Ax = 0
equations has and how many solutions the inhomogeneous system Ax = b
has.
x1 + 2x2 + x4 − x5 = 0
x3 − 2x4 + x5 = 0
8. Let A be a 3 × 3 matrix. Suppose
as the general superposition of three vectors.
       
−1 3 0 −2
A 2  =  1  and A  4  =  0  .
1 1 0 1
3. (a) Find all solutions to the homogeneous equation Ax =
0 where (a) Find a solution to the inhomogeneous system
 
2 3 1 
1

A= .
1 1 4 Ax =  1  .
(b) Find a single solution to the inhomogeneous equation 2
  (b) Is the solution unique?
6
Ax = . (3.4.2)
6
9. Let A be a n × m matrix. Suppose there are n vectors
(c) Use your answers in (a) and (b) to find all solutions to u1 , . . . , un such that
(3.4.2).      
1 0 0
 0   1   0 
Au1 =  .  , Au2 =  .  , . . . , Aun =  .. .
     
 ..   ..   . 
4. How many solutions can a homogeneous system of 4 linear 0 0 1
equations in 7 unknowns have?
Then verify that for any m × 1 vector b, the inhomogeneous
equation Ax = b always has a solution.


10. Let A be an n × n matrix with rank n − 1. Suppose u and (a) Determine the matrix representation of L. Namely, find
v in Rn are distinct solutions to the inhomogeneous system the matrix A such that LA = L.
Ax = b. Verify that every solution to Ax = b can be written 
0

 
as αu + (1 − α)v for some α ∈ R. (b) Verify that  2  is a solution to Lx =
0
2
(Hint: idea is similar to Exercise 5.) −2
 
0
(c) Find the full set of solutions of Lx = .
2
11. Let L : R3 → R4 be a mapping such that
 
x1 − x2
 x1 + x2 − x3 
Lx = 
 −x1 + x2 + 4x3 

−3x1 + x2 − x3
 
x1
for all x =  x2  .
x3

(a) Find matrix representative of L.


 
−1/2
(b) Verify that x =  −3/2  is a solution to the equation
−2
 
1
 0 
Lx =  −9 .

2
 
1
 0 
(c) Find the full set of solutions of Lx = 
 −9 .

12. Let L : R3 → R2 be a linear mapping such that


   
1   0  
−3 2
L  −1  = , L 1  = ,
1 −1
0 1

and  
0  
0
L 1  = .
1
−1


3.5 Composition and Multiplication of Matrices

With this computation in mind, we define the product

         [ 2  1 ] [  0  3 ]   [ −1  10 ]
    AB = [ 1 −1 ] [ −1  4 ] = [  1  −1 ].

The composition of two matrix mappings leads to another
matrix mapping from which the concept of multiplication
of two matrices follows. Matrix multiplication can be Using the same approach we can derive a formula for
introduced by formula, but then the idea is unmotivated matrix multiplication of 2 × 2 matrices. Suppose
and one is left to wonder why matrix multiplication is 
a11 a12
 
b11 b12

defined in such a seemingly awkward way. A= and B = .
a21 a22 b21 b22
We begin with the example of 2 × 2 matrices. Suppose
that Then
 
    b11 x1 + b12 x2
2 1 0 3 A(Bx) = A
A= and B = . b21 x1 + b22 x2
1 −1 −1 4  
a11 (b11 x1 + b12 x2 ) + a12 (b21 x1 + b22 x2 )
We have seen that the mappings =
a21 (b11 x1 + b12 x2 ) + a22 (b21 x1 + b22 x2 )
x 7→ Ax and
 
x 7→ Bx (a11 b11 + a12 b21 )x1 + (a11 b12 + a12 b22 )x2
=
(a21 b11 + a22 b21 )x1 + (a21 b12 + a22 b22 )x2
map 2-vectors to 2-vectors. So we can ask what hap- 
a11 b11 + a12 b21 a11 b12 + a12 b22

x1

pens when we compose these mappings. In symbols, we =
a21 b11 + a22 b21 a21 b12 + a22 b22 x2
.
compute
Hence, for 2 × 2 matrices, we see that composition of
LA ◦LB (x) = LA (LB (x)) = A(Bx). matrix mappings defines the matrix multiplication
In coordinates, let x = (x1 , x2 ) and compute
  
a11 a12 b11 b12
  a21 a22 b21 b22
3x2
A(Bx) = A
−x1 + 4x2 to be
 
−x1 + 10x2
 
a11 b11 + a12 b21 a11 b12 + a12 b22
=
x1 − x2
. . (3.5.1)
a21 b11 + a22 b21 a21 b12 + a22 b22
It follows that we can rewrite A(Bx) using multiplication
Formula (3.5.1) may seem a bit formidable, but it does
of a matrix times a vector as
have structure. Suppose A and B are 2×2 matrices, then
the entry of
  
−1 10 x1
A(Bx) = .
1 −1 x2 C = AB

In particular, LA ◦LB is again a linear mapping, namely in the ith row, j th column may be written as
LC , where   2
−1 10
X
C= . ai1 b1j + ai2 b2j = aik bkj .
1 −1 k=1


We shall see that an analog of this formula is available where Bj ≡ Bej is the j th column of the matrix B.
for matrix multiplications of all sizes. But to derive this Therefore,
formula, it is easier to develop matrix multiplication ab-
stractly. C = (AB1 | · · · |ABp ). (3.5.2)

Lemma 3.5.1. Let L1 : Rn → Rm and L2 : Rp → Rn be


Indeed, the (i, j)th entry of C is the ith entry of ABj ,
linear mappings. Then L = L1 ◦L2 : Rp → Rm is a linear
that is, the ith entry of
mapping.
   
b1j a11 b1j + · · · + a1n bnj
Proof Compute
A  ...  =  ..
. .
   
L(x + y) = L1 ◦L2 (x + y) bnj am1 b1j + · · · + amn bnj
= L1 (L2 (x) + L2 (y))
= L1 (L2 (x)) + L1 (L2 (y)) It follows that the entry cij of C in the ith row and j th
= L1 ◦L2 (x) + L1 ◦L2 (y) column is
= L(x) + L(y).
n
aik bkj . (3.5.3)
X
Similarly, compute L1 ◦L2 (cx) = cL1 ◦L2 (x).  cij = ai1 b1j + ai2 b2j + · · · + ain bnj =
k=1

We apply Lemma 3.5.1 in the following way. Let A be


an m × n matrix and let B be an n × p matrix. Then We can interpret (3.5.3) in the following way. To cal-
LA : Rn → Rm and LB : Rp → Rn are linear mappings, culate cij : multiply the entries of the ith row of A with
and the mapping L = LA ◦LB : Rp → Rm is defined the corresponding entries in the j th column of B and add
and linear. Theorem 3.3.5 implies that there is an m × p the results. This interpretation reinforces the idea that
matrix C such that L = LC . Abstractly, we define the for the matrix product AB to be defined, the number of
matrix product AB to be C. columns in A must equal the number of rows in B.
For example, we now perform the following multiplica-
Note that the matrix product AB is defined only tion:
when the number of columns of A is equal to the
number of rows of B.
 
1 −2
Calculating the Product of Two Matrices Next we discuss
 
2 3 1  3 1 
how to calculate the product of matrices; this discussion 3 −1 2
−1 4
generalizes our discussion of the product of 2×2 matrices.  
Lemma 3.3.4 tells how to compute C = AB. The j th 2 · 1 + 3 · 3 + 1 · (−1) 2 · (−2) + 3 · 1 + 1 · 4
=
column of the matrix product is just 3 · 1 + (−1) · 3 + 2 · (−1) 3 · (−2) + (−1) · 1 + 2 · 4
 
10 3
Cej = A(Bej ), = .
−2 1
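
The same product can be verified with MATLAB's * operator:

A = [2 3 1; 3 -1 2];
B = [1 -2; 3 1; -1 4];
A*B                           % returns [10 3; -2 1]
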


Some Special Matrix Products Let A be an m×n matrix.


 
8 −1
Then 4. Answer: A =  −3 12  and B =
5 −4
 
OA = O 2 8 0 −3
 1 4 0 1 
AO = O −5 6 7 −20
AIn = A
Im A = A In Exercises 5 – 8 compute the given matrix product.

The first two equalities are easily checked using (3.5.3).


  
2 3 −1 1
5. .
−3 2
It is not significantly more difficult to verify the last two 0 1
equalities using (3.5.3), but we shall verify these equali-
 
  2 3
ties using the language of linear mappings, as follows: 6.
1 2 3 
−2 5 .
−2 3 −1
1 −1
LAIn (x) = LA ◦LIn (x) = LA (x),  
2 3  
1 2 3
since LIn (x) = x is the identity map. Therefore AIn = A. 7.  −2 5  .
−2 3 −1
1 −1
A similar proof verifies that Im A = A. Although the
verification of these equalities using the notions of linear
  
2 −1 3 1 7
mappings may appear to be a case of overkill, the next 8.  1 0 5   −2 −1 .
section contains results where these notions truly simplify 1 5 −1 −5 3
the discussion.
9. Determine all the 2 × 2 matrices B such that AB = BA
where A is the matrix
Exercises  
2 0
A= .
0 −1
In Exercises 1 – 4 determine whether or not the matrix prod-
ucts AB or BA can be computed for each given pair of ma- 10. Let
trices A and B. If the product is possible, perform the com-    
putation. 2 5 a 3
A= and B = .
1 4 b 2
   
1 0 −2 0
1. A = and B = . For which values of a and b does AB = BA?
−2 1 3 −1
   
0 −2 1 0 2 11. Let
2. A = and B = .
4 10 0 3 −1
 
1 0 −3
  A =  −2 1 1 .
  0 2 5 0 1 −5
8 0 2 3
3. A = and B =  −1 3 −1 .
−3 0 −10 3
0 1 −5 Let At is the transpose of the matrix A, as defined in Sec-
tion 1.3. Compute AAt .

71
§3.5 Composition and Multiplication of Matrices

In Exercises 12 – 14 decide for the given pair of matrices A


 
x1
and B whether or not the products AB or BA are defined 16. Compute A  x2 . Then describe in words how A acts
and compute the products when possible. x3
on each coordinate vector
12. (matlab)      
  1 0 0
  3 −2 0 e1 =  0  , e 2 =  1  , e 3 =  0  .
2 2 −2
A= and B= 0 −1 4  0 0 1
−4 4 0
−2 −3 5
(3.5.4*)
 
x1
17. Compute B  x2  and describe in words how B acts
13. (matlab) x3
 on each coordinate
 vector
  1 3 −4 3 −2 1
−4 1 0 5 −1
0 3 2 3 −1 4 
     
and B =  1 0 0

A= 5 −1 −2 −4 −2  
 5 4 4 5 −1 e01 =  0  , e2 =  1  , e3 =  0  .

1 5 −4 1 5
−4 −3 2 4 1 4 0 0 1
(3.5.5*)
18. Compute A2 and A3 and verify that each is a permutation
14. (matlab)
matrix.
   
−2 −2 4 5 2 3 −4 5
 0 −3 19. Compute AB, A2 B, and BA and verify that each is a
−4 3   4 −3 0 −2 
A=  1 −3
 and B=
 −3
 permutation matrix.
1 1  −4 −4 −3 
0 1 0 4 −2 −2 3 −1 20. Find all 3 × 3 permutation matrices.
(3.5.6*)

Exercises 15 – 22 discuss permutation matrices: n × n matrices
that have exactly one 1 in each row and each column, and size n × n, where n! = n · (n − 1) · · · 2 · 1. Hint: Use induction.
whose remaining entries are 0.
15. There
  are two 2× 2 permutation
  matrices: I2 and 22. Consider the following n × n permutation matrix
0 1 0 1 x1
. Compute . Then describe in
1 0 1 0 x2
0 1 0 0 ··· 0
   
0 1
words how acts on each coordinate vector  0 0 1 0 ··· 0 
1 0  
 0 0 0 1 ··· 0 
A= .. .. .. .. .. .
     
1 0 . . . . .
e1 = , e2 = .  
0 1 
 0 0 0 0 ··· 1


1 0 0 0 ··· 0
Exercises 16 - 20 consider 3 × 3 permutation matrices. Let Show that An = In . Hint: Describe the action of A on
    coordinate vectors.
0 1 0 1 0 0
A= 0 0 1  and B =  0 0 1 .
1 0 0 0 1 0


3.6 Properties of Matrix and


Multiplication A(BC) = (AB)C.

In this section we discuss the facts that matrix multi- 


plication is associative (but not commutative) and that
certain distributive properties hold. We also discuss how It is worth convincing yourself that Theorem 3.6.1 has
matrix multiplication is performed in MATLAB . content by verifying by hand that matrix multiplication
of 2 × 2 matrices is associative.
Matrix Multiplication is Associative
Matrix Multiplication is Not Commutative Although ma-
Theorem 3.6.1. Matrix multiplication is associative. trix multiplication is associative, it is not commutative.
That is, let A be an m × n matrix, let B be a n × p This statement is trivially true when the matrix AB is
matrix, and let C be a p × q matrix. Then defined while that matrix BA is not. Suppose, for exam-
ple, that A is a 2 × 3 matrix and that B is a 3 × 4 matrix.
(AB)C = A(BC).
Then AB is a 2 × 4 matrix, while the multiplication BA
makes no sense whatsoever.
Proof Begin by observing that composition of map-
pings is always associative. In symbols, let f : Rn → Rm , More importantly, suppose that A and B are both n × n
g : Rp → Rn , and h : Rq → Rp . Then square matrices. Then AB = BA is generally not valid.
For example, let
f ◦(g ◦h)(x) = f [(g ◦h)(x)]    
1 0 0 1
= f [g(h(x))] A= and B = .
0 0 0 0
= (f ◦g)(h(x))
= [(f ◦g)◦h](x). Then
   
0 1 0 0
It follows that AB = and BA = .
0 0 0 0
f ◦(g ◦h) = (f ◦g)◦h.
So AB 6= BA. In certain cases it does happen that AB =
We can apply this result to linear mappings. Thus BA. For example, when B = In ,
LA ◦(LB ◦LC ) = (LA ◦LB )◦LC .
AIn = A = In A.
Since
But these cases are rare.
LA(BC) = LA ◦LBC = LA ◦(LB ◦LC )
and
Additional Properties of Matrix Multiplication Recall that
L(AB)C = LAB ◦LC = (LA ◦LB )◦LC ,
if A = (aij ) and B = (bij ) are both m × n matrices, then
it follows that A + B is the m × n matrix (aij + bij ). We now enumerate
LA(BC) = L(AB)C , several properties of matrix multiplication.


• Let A and B be m×n matrices and let C be an n×p where atjk is the (j, k)th entry in At and btij is the (i, j)th
matrix. Then entry in B t . It follows from the definition of transpose
that the (i, k)th entry in B t At is:
(A + B)C = AC + BC.
n n
Similarly, if D is a q × m matrix, then
X X
bji akj = akj bji ,
j=1 j=1
D(A + B) = DA + DB.
which verifies the claim.
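
Identity (3.6.1), and the failure of commutativity, can also be checked numerically; a sketch with randomly generated matrices (for real matrices the MATLAB operator ' is the transpose):

A = rand(3,4);  B = rand(4,2);
norm((A*B)' - B'*A')          % zero: (AB)^t equals B^t A^t
C = rand(3,3);  D = rand(3,3);
norm(C*D - D*C)               % generally nonzero: matrix multiplication is not commutative
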
So matrix multiplication distributes across matrix
addition.
Matrix Multiplication in MATLAB Let us now explain
• If α and β are scalars, then how matrix multiplication works in MATLAB. We load
(α + β)A = αA + βA. the matrices
 
−5 2 0
So addition distributes with scalar multiplication.
 
 −1 1 −4  2 −2 −2 5 5
A=  and B =  4 −5 1 −1 2 
• Scalar multiplication and matrix multiplication sat-  −4 4 2 
3 2 3 −3 3
isfy: −1 3 −1
(αA)C = α(AC). (3.6.2*)
by typing
Matrix Multiplication and Transposes Let A be an m × n e3_6_2
matrix and let B be an n × p matrix, so that the matrix
product AB is defined and AB is an m × p matrix. Note
Now the command C = A*B asks MATLAB to compute
that At is an n×m matrix and that B t is a p×n matrix, so
the matrix C as the product of A and B. We obtain
that in general the product At B t is not defined. However,
the product B t At is defined and is an p × m matrix, as
C =
is the matrix (AB)t . We claim that
-2 0 12 -27 -21
(AB)t = B t At . (3.6.1) -10 -11 -9 6 -15
14 -8 18 -30 -6
We verify this claim by direct computation. The (i, k)th 7 -15 2 -5 -2
entry in (AB)t is the (k, i)th entry in AB. That entry is:
n Let us confirm this result by another computation. As
we have seen above the 4th column of C should be given
X
akj bji .
j=1 by the product of A with the 4th column of B. Indeed,
if we perform this computation and type
The (i, k)th entry in B t At is:
n A*B(:,4)
X
btij atjk ,
j=1 the result is


ans = 3. Let
-27
 
0 1 0
6 A= 0 0 1 .
0 0 0
-30
-5 1 1
Compute B = I3 + A + A2 and C = I3 + tA + (tA)2 where
2 2
t is a real number.
which is precisely the 4th column of C.
MATLAB also recognizes when a matrix multiplication of
two matrices is not defined. For example, the product of 4. Let
the 3 × 5 matrix B with the 4 × 3 matrix A is not defined,    
1 0 0 −1
and if we type B*A then we obtain the error message I=
0 1
and J=
1 0
.

??? Error using ==> * (a) Show that J 2 = −I.


Inner matrix dimensions must agree.
(b) Evaluate (aI + bJ)(cI + dJ) in terms of I and J.

We remark that the size of a matrix A can be seen using


the MATLAB command size. For example, the com-
5. Recall that a square matrix C is upper triangular if cij = 0
mand size(A) leads to when i > j. Show that the matrix product of two upper
triangular n × n matrices is also upper triangular.
ans =
4 3
In Exercises 6 – 8 use MATLAB to verify that (A + B)C =
reflecting the fact that A is a matrix with four rows and AC + BC for the given matrices.
three columns.    
0 2 −2 1
6. (matlab) A = , B = and C =
2 1 3 0
Exercises
 
2 −1
1 5
   
12 −2 8 −20
7. (matlab) A = , B = and
3 1 3 10
1. Let A be an m × n matrix. Show that the matrices AAt  
10 2 4
and At A are symmetric. C=
2 13 −4
   
6 1 2 −10
2. Let
8. (matlab) A =  3 20 , B =  5 0  and
−5
   
1 2 2 3 3 3 1
A= and B= .  
−1 −1 1 4 −2 10
C=
12 10
Compute AB and B t At . Verify that (AB)t = B t At for these
matrices A and B.


9. (matlab) Use the rand(3,3) command in MATLAB to


choose five pairs of 3 × 3 matrices A and B at random. Com-
pute AB and BA using MATLAB to see that in general these
matrix products are unequal.

10. (matlab) Experimentally, find two symmetric 2 × 2


matrices A and B for which the matrix product AB is not
symmetric.

11. Verify associativity of matrix multiplication (AB)C =


A(BC) by hand where
 
 1 
A = 4 −2 B= C= 0 2 .
−3

12. Verify associativity of matrix multiplication (AB)C =


A(BC) by hand where
 
  0 −1  
2 3 1 3 1
A= B= 2 3  C= .
0 −1 2 0 −2
−2 1

13. (matlab) Verify associativity of matrix multiplication


(AB)C = A(BC) using MATLAB where
 
5 1    
2 0 4 12 −3
A =  −8 −2  B = C=
3 −6 −2 −5 7
7 10


3.7 Solving Linear Systems and Invertibility We begin by giving a precise definition of
invertibility for square matrices.
Inverses
When we solve the simple equation Definition 3.7.1. The n × n matrix A is invertible if
there is an n × n matrix B such that
ax = b,
AB = In and BA = In .
we do so by dividing by a to obtain
The matrix B is called an inverse of A. If A is not
1 invertible, then A is noninvertible or singular.
x = b.
a
This division works as long as a 6= 0. Geometrically, we can see that some matrices are invert-
ible. For example, the matrix
Writing systems of linear equations as
 
0 −1
Ax = b R90 =
1 0
suggests that solutions should have the form
rotates the plane counterclockwise through 90◦ and is
1 invertible. The inverse matrix of R90 is the matrix that
x= b
A rotates the plane clockwise through 90◦ . That matrix is:
and the MATLAB command for solving linear systems 
0 1

R−90 = .
−1 0
x=A\b
This statement can be checked algebraically by verifying
suggests that there is some merit to this analogy. that R90 R−90 = I2 and that R−90 R90 = I2 .
The following is a better analogy. Multiplication by a Similarly,
has the inverse operation: division by a; multiplying a
 
5 3
number x by a and then multiplying the result by a−1 = B=
2 1
1/a leaves the number x unchanged (as long as a 6= 0).
is an inverse of
In this sense we should write the solution to ax = b as
 
−1 3
x = a−1 b. A= ,
2 −5
For systems of equations Ax = b we wish to write solu-
as matrix multiplication shows that AB = I2 and BA =
tions as
I2 . In fact, there is an elementary formula for finding
x = A−1 b.
inverses of 2 × 2 matrices (when they exist); see (3.8.1)
In this section we consider the questions: What does A−1 in Section 3.8.
mean and when does A−1 exist? (Even in one dimension,
On the other hand, not all matrices are invertible. For
we have seen that the inverse does not always exist, since
1 example, the zero matrix is noninvertible, since 0B = 0
0−1 = is undefined.) for any matrix B.
0


Lemma 3.7.2. If an n × n matrix A is invertible, then Invertibility and Unique Solutions Next we discuss the
its inverse is unique and is denoted by A−1 . implications of invertibility for the solution of the inho-
mogeneous linear system:
Proof Let B and C be n×n matrices that are inverses
of A. Then Ax = b, (3.7.1)
BA = In and AC = In . where A is an n × n matrix and b ∈ Rn .
We use the associativity of matrix multiplication to prove Proposition 3.7.5. Let A be an invertible n × n matrix
that B = C. Compute and let b be in Rn . Then the system of linear equations
B = BIn = B(AC) = (BA)C = In C = C. (3.7.1) has a unique solution.


Proof We can solve the linear system (3.7.1) by setting
We now show how to compute inverses for products of x = A−1 b. (3.7.2)
invertible matrices.
Proposition 3.7.3. Let A and B be two invertible n × n This solution is easily verified by calculating
matrices. Then AB is also invertible and Ax = A(A−1 b) = (AA−1 )b = In b = b.
−1 −1 −1
(AB) =B A .
Next, suppose that x is a solution to (3.7.1). Then
Proof Use associativity of matrix multiplication to
x = In x = (A−1 A)x = A−1 (Ax) = A−1 b.
compute
(AB)(B −1 A−1 ) = A(BB −1 )A−1 = AIn A−1 = AA−1 = In . So A−1 b is the only possible solution. 

Similarly, Corollary 3.7.6. An invertible matrix is row equivalent


−1 −1 −1 −1 −1 to In .
(B A )(AB) = B (A A)B = B B = In .
Therefore AB is invertible with the desired inverse.  Proof Let A be an invertible n × n matrix. Propo-
Proposition 3.7.4. Suppose that A is an invertible n×n sition 3.7.5 states that the system of linear equations
matrix. Then At is invertible and Ax = b has a unique solution. Chapter 2, Corollary 2.4.8
states that A is row equivalent to In . 
(At )−1 = (A−1 )t .
The converse of Corollary 3.7.6 is also valid.
Proof We must show that (A−1 )t is the inverse of At .
Identity (3.6.1) implies that Proposition 3.7.7. An n×n matrix A that is row equiv-
(A −1 t t
) A = (AA −1 t
) = (In ) = In , t alent to In is invertible.

and Proof Form the n × 2n matrix M = (A|In ). Since


At (A−1 )t = (A−1 A)t = (In )t = In . A is row equivalent to In , there is a sequence of ele-
Therefore, (A−1 )t is the inverse of At , as claimed.  mentary row operations so that M is row equivalent to


(In |B). Eliminating all columns from the right half of (b) ⇒ (c) This implication is straightforward — just take
M except the j th column yields the matrix (A|ej ). The b = 0 in (3.7.1).
same sequence of elementary row operations states that
(c) ⇒ (d) This implication is just a restatement of Chap-
the matrix (A|ej ) is row equivalent to (In |Bj ) where Bj
ter 2, Corollary 2.4.8.
is the j th column of B. It follows that Bj is the solution
to the system of linear equations Ax = ej and that the (d) ⇒ (a). This implication is just Proposition 3.7.7. 
matrix product

AB = (AB1 | · · · |ABn ) = (e1 | · · · |en ) = In . A Method for Computing Inverse Matrices The proof
of Proposition 3.7.7 gives a constructive method for find-
So AB = In . ing the inverse of any invertible square matrix.
We claim that BA = In and hence that A is invertible. Theorem 3.7.9. Let A be an n × n matrix that is row
To verify this claim form the n × 2n matrix N = (In |A). equivalent to In and let M be the n×2n augmented matrix
Using the same sequence of elementary row operations
M = (A|In ). (3.7.3)
again shows that N is row equivalent to (B|In ). By con-
struction the matrix B is row equivalent to In . There- Then the matrix M is row equivalent to (In |A−1 ).
An Example  Compute the inverse of the matrix

A = ( 1  2  0
      0  1  3
      0  0  1 ).

Begin by forming the 3 × 6 matrix

M = ( 1  2  0  1  0  0
      0  1  3  0  1  0
      0  0  1  0  0  1 ).

To put M in row echelon form by row reduction, first subtract 3 times the 3rd row from the 2nd row, obtaining

( 1  2  0  1  0  0
  0  1  0  0  1 -3
  0  0  1  0  0  1 ).

Second, subtract 2 times the 2nd row from the 1st row, obtaining

( 1  0  0  1 -2  6
  0  1  0  0  1 -3
  0  0  1  0  0  1 ).

Theorem 3.7.9 implies that M =


1 2 4 1 0 0
0 -5 -11 -3 1 0
 
1 −2 6
A−1 =  0 1 −3  , 0 -4 -9 -2 0 1
0 0 1
Next type
which can be verified by matrix multiplication.
M(2,:) = M(2,:)/M(2,2)
M(3,:) = M(3,:) + 4*M(2,:)
Computing the Inverse Using MATLAB There are two
M(1,:) = M(1,:) - 2*M(2,:)
ways that we can compute inverses using MATLAB . Ei-
ther we can perform the row reduction of (3.7.3) directly to obtain
or we can use the MATLAB the command inv. We illus-
trate both of these methods. First type e3_7_4 to recall M =
the matrix 1.0000 0 -0.4000 -0.2000 0.4000 0
  0 1.0000 2.2000 0.6000 -0.2000 0
1 2 4 0 0 -0.2000 0.4000 -0.8000 1.0000
A= 3 1 1 . (3.7.4*)
2 0 −1 Finally, type

To perform the row reduction of (3.7.3) we need to form M(3,:) = M(3,:)/M(3,3)


the matrix M . The MATLAB command for generating M(2,:) = M(2,:) - M(2,3)*M(3,:)
an n × n identity matrix is eye(n). Therefore, typing M(1,:) = M(1,:) - M(1,3)*M(3,:)

M = [A eye(3)] to obtain

M =
in MATLAB yields the result 1.0000 0 0 -1.0000 2.0000 -2.0000
0 1.0000 0 5.0000 -9.0000 11.0000
0 0 1.0000 -2.0000 4.0000 -5.0000
M =
1 2 4 1 0 0 Thus C = A−1 is obtained by extracting the last three
3 1 1 0 1 0 columns of M by typing
2 0 -1 0 0 1
C = M(:,[4 5 6])
Now row reduce M to reduced echelon form as follows.
Type which yields

M(3,:) = M(3,:) - 2*M(1,:) C =


M(2,:) = M(2,:) - 3*M(1,:) -1.0000 2.0000 -2.0000
5.0000 -9.0000 11.0000
obtaining -2.0000 4.0000 -5.0000


You may check that C is the inverse of A by typing A*C This computation also illustrates the fact that even when
and C*A. the matrix A has integer entries, the inverse of A usually
has noninteger entries.
In fact, this entire scheme for computing the inverse of
a matrix has been preprogrammed into MATLAB . Just Let b = (2, −8, 18, −6, −1). Then we may use the inverse
type B = A−1 to compute the solution of Ax = b. Indeed if
we type
inv(A)
b = [2;-8;18;-6;-1];
to obtain x = B*b
ans = then we obtain
-1.0000 2.0000 -2.0000
5.0000 -9.0000 11.0000 x =
-2.0000 4.0000 -5.0000 -1.0000
2.0000
We illustrate again this simple method for computing the
1.0000
inverse of a matrix A. For example, reload the matrix in
-1.0000
(3.1.4*) by typing e3_1_4 and obtaining:
3.0000
A =
5 -4 3 -6 2 as desired (see (3.1.5*)). With this computation we have
2 -4 -2 -1 1 confirmed the analytical results of the previous subsec-
1 2 1 -5 3 tions.
-2 -1 -2 1 -1
1 -6 1 1 4 Exercises
The command B = inv(A) stores the inverse of the ma-
trix A in the matrix B, and we obtain the result
1. Verify by matrix multiplication that the following matrices
B = are inverses of each other:
-0.0712 0.2856 -0.0862 -0.4813 -    
1 0 2 −1 0 2
0.0915  0 −1 2  and  2 −1 −2  .
-0.1169 0.0585 0.0690 -0.2324 -
1 0 1 1 0 −1
0.0660
0.1462 -0.3231 -
0.0862 0.0405 0.0825 2. Let α 6= 0 be a real number and let A be an invertible
-0.1289 0.0645 -0.1034 - matrix. Show that the inverse of the matrix αA is given by
0.2819 0.0555 1 −1
A .
-0.1619 0.0810 0.1724 - α
0.1679 0.1394


10. For which values of a, b, c is the matrix


 
a 0
3. Let A = be a 2 × 2 diagonal matrix. For which
0 b
values of a and b is A invertible?
 
1 a b
A= 0 1 c 
0 0 1
4. Let A, B, C be general n × n matrices. Simplify the ex-
pression A−1 (BA−1 )−1 (CB −1 )−1 . invertible? Find A−1 when it exists.

In Exercises 5 – 6 use row reduction to find the inverse of the In Exercises 11 – 12 use row reduction to find the inverse of
given matrix. the given matrix and confirm your results using the command
inv.
 
1 4 5
5.  0 1 −1 .
11. (matlab)
−2 0 −8  
2 1 3
A= 1 2 3 . (3.7.5*)
 
1 −1 −1
6.  0 2 0 . 5 1 0
2 0 −1
12. (matlab)
 
7. True or false? If true, explain why; if false, give a coun- 0 5 1 3
terexample.
 1 5 3 −1 
B=
 . (3.7.6*)
2 1 0 −4 
(a) If A and B are matrices such that AB = I2 , then BA = 1 7 2 3
I2 .
(b) If A, B, C are matrices such that AB = In and BC = In ,
then A = C. 13. (matlab) Try to compute the inverse of the matrix
(c) Let A be an m × n matrix and b be a column m vector.
 
1 0 3
If the system of linear equations Ax = b has a unique C =  −1 2 −2  (3.7.7*)
solution, then A is invertible. 0 2 1

in MATLAB using the command inv. What happens — can


8. Let A be an n × n matrix that satisfies you explain the outcome?
A3 + a2 A2 + a1 A + In = 0, Now compute the inverse of the matrix
where A = AA and A = AA . Show that A is invertible.
2 3 2  
1  3
Hint: Let B = −(A2 + a2 A + a1 In ) and verify that AB =  −1 2 −2 
BA = In . 0 2 1

for some nonzero numbers  of your choice. What can be


9. Let A be an n × n matrix that satisfies observed in the inverse if  is very small? What happens
when  tends to zero?
Am + am−1 Am−1 + · · · + a1 A + In = 0.
Show that A is invertible.


14. Let A and B be 3 × 3 invertible matrices so that 17. Let A be an m × n matrix with rank n. Use Exercise 16
    to prove: Suppose the system of linear equations
1 0 −1 1 1 1
−1
A =  −1 −1 0  and B =  1 1 0 
−1
Ax = b (3.7.8)
0 1 −1 1 0 0
is consistent, then
Without computing A or B, determine the following:
x = (At A)−1 At b.
(a) rank(A)
(b) The solution to  
1
Bx =  1 
1

(c) (2BA)−1
(d) The matrix C so that ACB + 3I3 = 0.

15. True or False: Determine whether the following state-


ments are true or false, and explain your answer.

(a) The only 3 × 2 matrix A so that Ax = 0 for all x ∈ R2


is A = 0.
(b) A system of 5 equations in 3 unknowns with the solu-
tion x1 = 0, x2 = −3, x3 = 1 must have infinitely many
solutions.
(c) If A is a 2 × 2 matrix and A2 = 0, then A = 0.
(d) If u, v ∈ R3 are perpendicular, then ku + vk = kuk+kvk.

16. Let A be an m × n matrix with rank(A) = n. Use the


following steps to prove that At A is an invertible n × n ma-
trix.

(a) Let x be an n × 1 column vector. Prove that if (At A)x =


0, then Ax = 0.
Hint: Calculate xt (At A)x = ||Ax||2 .
(b) Use rank(A) = n to prove that if Ax = 0, then x = 0.
(c) Use (a) and (b) to conclude that if (At A)x = 0 then
x = 0. Then use Theorem 3.7.8 to imply that At A is
invertible.


3.8 Determinants of 2 × 2 Matrices

There is a simple way for determining whether a 2 × 2 matrix A is invertible and there is a simple formula for finding A^{-1}. First, we present the formula. Let

A = ( a  b
      c  d )

and suppose that ad − bc ≠ 0. Then

A^{-1} = 1/(ad − bc) (  d  -b
                       -c   a ).   (3.8.1)

This is most easily verified by directly applying the formula for matrix multiplication. So A is invertible when ad − bc ≠ 0. We shall prove below that ad − bc must be nonzero when A is invertible.

From this discussion it is clear that the number ad − bc must be an important quantity for 2 × 2 matrices. So we define:

Definition 3.8.1. The determinant of the 2 × 2 matrix A is

det(A) = ad − bc.   (3.8.2)

Proposition 3.8.2. As a function on 2 × 2 matrices, the determinant satisfies the following properties.

(a) The determinant of an upper triangular matrix is the product of the diagonal elements.
(b) The determinants of a matrix and its transpose are equal.
(c) det(AB) = det(A) det(B).

Proof  Both (a) and (b) are easily verified by direct calculation. Property (c) is also verified by direct calculation — but of a more extensive sort. Note that

( a  b )( α  β )  =  ( aα + bγ   aβ + bδ
( c  d )( γ  δ )       cα + dγ   cβ + dδ ).

Therefore,

det(AB) = (aα + bγ)(cβ + dδ) − (aβ + bδ)(cα + dγ)
        = (acαβ + bcβγ + adαδ + bdγδ) − (acαβ + bcαδ + adβγ + bdγδ)
        = bc(βγ − αδ) + ad(αδ − βγ)
        = (ad − bc)(αδ − βγ)
        = det(A) det(B),

as asserted.

Corollary 3.8.3. A 2 × 2 matrix A is invertible if and only if det(A) ≠ 0.

Proof  If A is invertible, then AA^{-1} = I_2. Proposition 3.8.2 implies that

det(A) det(A^{-1}) = det(I_2) = 1.

Therefore, det(A) ≠ 0. Conversely, if det(A) ≠ 0, then (3.8.1) implies that A is invertible.
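These facts are easy to confirm numerically. The lines below are a quick MATLAB check of (3.8.1) and of Proposition 3.8.2(c) on one arbitrarily chosen pair of 2 × 2 matrices; they are a sketch, not a computation used elsewhere in the text.

% Check (3.8.1), Proposition 3.8.2(c), and Corollary 3.8.3 on sample matrices
A = [2 1; 3 2];                            % det(A) = 1, so A is invertible
B = [1 4; 2 9];                            % det(B) = 1
formula = (1/(2*2 - 1*3)) * [2 -1; -3 2];  % right hand side of (3.8.1) for A
err1 = norm(formula - inv(A))              % should be zero up to roundoff
err2 = det(A*B) - det(A)*det(B)            % should be zero up to roundoff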
Determinants and Area  Suppose that v and w are two vectors in R^2 that point in different directions. Then, the set of points

z = αv + βw   where 0 ≤ α, β ≤ 1

is a parallelogram, that we denote by P. We denote the area of P by |P|. For example, the unit square S, whose corners are (0, 0), (1, 0), (0, 1), and (1, 1), is the parallelogram generated by the unit vectors e_1 and e_2.

Next let A be a 2 × 2 matrix and let

A(P) = {Az : z ∈ P}.

It follows from linearity (since Az = αAv + βAw) that A(P) is the parallelogram generated by Av and Aw.

Proposition 3.8.4. Let A be a 2 × 2 matrix and let S Exercises


be the unit square. Then

|A(S)| = | det A|. (3.8.3)


1. Find the inverse of the matrix
Proof Note that A(S) is the parallelogram generated 
2 1

by u1 = Ae1 and u2 = Ae2 , and u1 and u2 are the 3 2
.
columns of A. It follows that
 t
u1 u1 ut1 u2

2 t t
(det A) = det(A ) det(A) = det(A A) = det . 
1 K

ut2 u1 ut2 u2 2. Find the inverse of the shear matrix .
0 1
Hence
||u1 ||2
 
u1 · u2
(det A)2 = det = ||u1 ||2 ||u2 ||2 −(u1 ·u2 )2 . 3. Show that the 2 × 2 matrix A =
 
a b
u1 · u2 ||u2 ||2 is row equiv-
c d
alent to I2 if and only if ad − bc 6= 0. Hint: Prove this result
Recall that (1.4.5) of Chapter 1 states that separately in the two cases a 6= 0 and a = 0.
|P |2 = ||v||2 ||w||2 − (v · w)2 .

where P is the parallelogram generated by v and w. 4. Let A be a 2 × 2 matrix having integer entries. Find a
Therefore, (det A)2 = |A(S)|2 and (3.8.3) is verified.  condition on the entries of A that guarantees that A−1 has
integer entries.
Theorem 3.8.5. Let P be a parallelogram in R2 and let
A be a 2 × 2 matrix. Then
5. Let A be a 2×2 matrix and assume that det(A) 6= 0. Then
|A(P )| = | det A||P |. (3.8.4) use the explicit form for A−1 given in (3.8.1) to verify that

1
Proof First note that (3.8.3) a special case of (3.8.4), det(A−1 ) = .
det(A)
since |S| = 1. Next, let P be the parallelogram gen-
erated by the (column) vectors v and w, and let B =
(v|w). Then P = B(S). It follows from (3.8.3) that 6. Suppose a 2 × 2 matrix A satisfies the following equation:
|P | = | det B|. Moreover,    
0 2 −1 2
A = (3.8.5)
|A(P )| = |(AB)(S)| 1 2 1 4
= | det(AB)| Without calculating the entries of A, find det(A).
= | det A|| det B|
7. Find the entries of A defined in (3.8.5) and verify your
= | det A||P |, determinant calculation from Exercise 6.
as desired. 


8. Sketch the triangle whose vertices are 0, p = (3, 0)t , and In Exercises 13 – 16 use the unit square icon in the program
q = (0, 2)t ; and find the area of this triangle. Let map to test Proposition 3.8.4, as follows. Enter the given
  matrix A into map and map the unit square icon. Compute
−4 −3 det(A) by estimating the area of A(S) — given that S has
M= .
5 −2 unit area. For each matrix, use this numerical experiment to
Sketch the triangle whose vertices are 0, M p, and M q; and decide whether or not the matrix is invertible.
find the area of this triangle. 
0 −2

13. (matlab) A = .
2 0
9. Cramer’s rule provides a method based on determinants
 
−0.5 −0.5
for finding the unique solution to the linear equation Ax = b 14. (matlab) A = .
0.7 0.7
when A is an invertible matrix. More precisely, let A be an
invertible 2 × 2 matrix and let b ∈ R2 be a column vector. Let
 
−1 −0.5
15. (matlab) A = .
Bj be the 2 × 2 matrix obtained from A by replacing the j th −2 −1
column of A by the vector b. Let x = (x1 , x2 )t be the unique
solution to Ax = b. Then Cramer’s rule states that
 
0.7071 0.7071
16. (matlab) A = .
−0.7071 0.7071
det(Bj )
xj = . (3.8.6)
det(A)
Prove Cramer’s rule. Hint: Write the general system of two
equations in two unknowns as 17. Suppose a 2 × 2 matrix A satisfies the following equation:
   
a11 x1 + a12 x2 = b1 0 2 −1 2
A = .
1 2 1 4
a21 x1 + a22 x2 = b2 .

Subtract a11 times the second equation from a21 times the Without calculating the entries of A, find det(A).
first equation to eliminate x1 ; then solve for x2 , and verify
(3.8.6). Use a similar calculation to solve for x1 .

In Exercises 10 – 11 use Cramer’s rule (3.8.6) to solve the


given system of linear equations.
2x + 3y = 2
10. Solve for x.
3x − 5y = 1

4x − 3y = −1
11. Solve for y.
x + 2y = 7

12. (matlab) Use MATLAB to choose five 2 × 2 matrices at


random and compute their inverses. Do you get the impres-
sion that ‘typically’ 2 × 2 matrices are invertible? Try to find
a reason for this fact using the determinant of 2 × 2 matrices.


4 Solving Linear Differential ist. In Section 4.6 we develop the theory of eigenvalues
and characteristic polynomials of 2 × 2 matrices. (The
Equations corresponding theory for n × n matrices is developed in
Chapter 7.)
The study of linear systems of equations given in Chap-
ter 2 provides one motivation for the study of matrices The method for solving planar constant coefficient linear
and linear algebra. Linear constant coefficient systems of differential equations with real eigenvalues is summarized
ordinary differential equations provide a second motiva- in Section 4.7. This method is based on the material
tion for this study. In this chapter we show how the phase of Sections 4.5 and 4.6. The complete discussion of the
space geometry of systems of differential equations moti- solutions of linear planar systems of differential equations
vates the idea of eigendirections (or invariant directions) is given in Chapter 6. This discussion is best done after
and eigenvalues (or growth rates). we have introduced the linear algebra concepts of vector
subspaces and bases in Chapter 5.
We begin this chapter with a discussion of the theory and
application of the simplest of linear differential equations, The chapter ends with an optional discussion of Markov
the linear growth equation, ẋ = λx. In Section 4.1, we chains in Section 4.8. Markov chains give a method for
solve the linear growth equation and discuss the fact that analyzing branch processes where at each time unit sev-
solutions to differential equations are functions; and we eral outcomes are possible, each with a given probability.
emphasize this point by using MATLAB to graph solu-
tions of x as a function of t. In the optional Section 4.2
we illustrate the applicability of this very simple equa-
tion with a discussion of compound interest and a simple
population model.
The next two sections introduce planar constant coeffi-
cient linear differential equations. In these sections we
use the program PhasePlane (written by John Polking
and updated by Roy Goodman) that solves numerically
planar systems of differential equations. In Section 4.3
we discuss uncoupled systems — two independent one di-
mensional systems like those presented in Section 4.1 —
whose solution geometry in the plane is somewhat more
complicated than might be expected. In Section 4.4 we
discuss coupled linear systems. Here we illustrate the
existence and nonexistence of eigendirections.
In Section 4.5 we show how initial value problems can
be solved by building the solution — through the use of
superposition as discussed in Section 3.4 — from simpler
solutions. These simpler solutions are ones generated
from real eigenvalues and eigenvectors — when they ex-


4.1 A Single Differential Equation

Algebraic operations such as addition and multiplication are performed on numbers while the calculus operations of differentiation and integration are performed on functions. Thus algebraic equations (such as x^2 = 9) are solved for numbers (x = ±3) while differential (and integral) equations are solved for functions.

In Chapter 2 we discussed how to solve systems of linear equations, such as

x_1 + x_2 = 2
x_1 − x_2 = 4

for numbers

x_1 = 3 and x_2 = −1,

while in this chapter we discuss how to solve some linear systems of differential equations for functions.

Solving a single linear equation in one unknown x is a simple task. For example, solve

2x = 4

for x = 2. Solving a single differential equation in one unknown function x(t) is far from trivial.

Integral Calculus as a Differential Equation  Mathematically, the simplest type of differential equation is:

dx/dt (t) = f(t)   (4.1.1)

where f is some continuous function. In words, this equation asks us to find all functions x(t) whose derivative is f(t). The fundamental theorem of calculus tells us the answer: x(t) is an antiderivative of f(t). Thus to find all solutions, we just integrate both sides of (4.1.1) with respect to t. Formally, using indefinite integrals,

∫ dx/dt (t) dt = ∫ f(t) dt + C,   (4.1.2)

where C is an arbitrary constant. (It is tempting to put a constant of integration on both sides of (4.1.2), but two constants are not needed, as we can just combine both constants on the right hand side of this equation.) Since the indefinite integral of dx/dt is just the function x(t), we have

x(t) = ∫ f(τ) dτ + C.   (4.1.3)

In particular, finding closed form solutions to differential equations of the type (4.1.1) is equivalent to finding all definite integrals of the function f(t). Indeed, to find closed form solutions to differential equations like (4.1.1) we need to know all of the techniques of integration from integral calculus.

We note that if x(t) is a real-valued function of t, then we denote the derivative of x with respect to t using the following

dx/dt   ẋ   x′

all of which are standard notations for the derivative.

Initial Conditions and the Role of the Integration Constant C  Equation (4.1.3) tells us that there are an infinite number of solutions to the differential equation (4.1.1), each one corresponding to a different choice of the constant C. To understand how to interpret the constant C, consider the example

dx/dt (t) = cos t.

Using (4.1.3) we see that the answer is

x(t) = ∫ cos τ dτ + C = sin t + C.

Note that

x(0) = sin(0) + C = C.

Thus, the constant C represents an initial condition for the differential equation. We will return to the discussion of initial conditions several times in this chapter.
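One way to picture this family of solutions is to graph x(t) = sin t + C for several choices of C. The following MATLAB fragment is a sketch of such a plot (the commands are ours, not a prescribed sequence from the text); each curve is a vertical translate of the others, and fixing the initial value x(0) selects exactly one of them.

% Plot solutions x(t) = sin(t) + C of dx/dt = cos(t) for several C
t = linspace(0, 4*pi, 200);
hold on
for C = -2:2
    plot(t, sin(t) + C)
end
xlabel('t'), ylabel('x'), hold off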


The Linear Differential Equation of Growth and Decay  The subject of differential equations that we study begins when the function f on the right hand side of (4.1.1) depends explicitly on the function x, and the simplest such differential equation is:

dx/dt (t) = x(t).

Using results from differential calculus, we can solve this equation; indeed, we can solve the slightly more complicated equation

dx/dt (t) = λx(t),   (4.1.4)

where λ ∈ R is a constant. The differential equation (4.1.4) is linear since x(t) appears by itself on the right hand side. Moreover, (4.1.4) is homogeneous since the constant function x(t) = 0 is a solution.

In words (4.1.4) asks: For which functions x(t) is the derivative of x(t) equal to λx(t). The function

x(t) = e^{λt}

is such a function, since

dx/dt (t) = d/dt e^{λt} = λe^{λt} = λx(t).

More generally, the function

x(t) = Ke^{λt}   (4.1.5)

is a solution to (4.1.4) for any real constant K. We claim that the functions (4.1.5) list all (differentiable) functions that solve (4.1.4).

To verify this claim, we let x(t) be a solution to (4.1.4) and show that the ratio

x(t)/e^{λt} = x(t)e^{−λt}

is a constant (independent of t). Using the product rule and (4.1.4), compute

d/dt (x(t)e^{−λt}) = (d/dt x(t)) e^{−λt} + x(t) (d/dt e^{−λt})
                   = (λx(t))e^{−λt} + x(t)(−λe^{−λt})
                   = 0.

Now recall that the only functions whose derivatives are identically zero are the constant functions. Thus,

x(t)e^{−λt} = K

for some constant K ∈ R. Hence x(t) has the form (4.1.5), as claimed.

Next, we discuss the role of the constant K. We have written the function as x(t), and we have meant the reader to think of the variable t as time. Thus x(0) is the initial value of the function x(t) at time t = 0; we say that x(0) is the initial value of x(t). From (4.1.5) we see that

x(0) = K,

and that K is the initial value of the solution of (4.1.4). Henceforth, we write K as x_0 so that the notation calls attention to the special meaning of this constant.

By deriving (4.1.5) we have proved:

Theorem 4.1.1. There is a unique solution to the initial value problem

dx/dt (t) = λx(t)
x(0) = x_0.   (4.1.6)

That solution is

x(t) = x_0 e^{λt}.
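Theorem 4.1.1 can also be corroborated numerically. The sketch below, with an arbitrary choice of λ and x_0, integrates the initial value problem with MATLAB's general purpose solver ode45 and compares the result with the closed form solution x_0 e^{λt}.

% Compare the closed-form solution of (4.1.6) with a numerical solution
lambda = -0.5;  x0 = 1;                      % sample growth rate and initial value
[t, x] = ode45(@(t,x) lambda*x, [0 4], x0);  % numerical solution of dx/dt = lambda*x
err = max(abs(x - x0*exp(lambda*t)))         % small, of the order of the solver tolerance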
As a consequence of Theorem 4.1.1 we see that there is a qualitative difference in the behavior of solutions to


(4.1.6) depending on whether λ > 0 or λ < 0. Suppose that x_0 > 0. Then

lim_{t→∞} x(t) = lim_{t→∞} x_0 e^{λt} = { +∞  if λ > 0
                                           0   if λ < 0.   (4.1.7)

When λ > 0 we say that the solution has exponential growth and when λ < 0 we say that the solution has exponential decay. In either case, however, the number λ is called the growth rate. We can visualize this discussion by graphing the solutions in MATLAB.

Suppose we set x_0 = 1 and λ = ±0.5. Type

x0 = 1;
lambda = 0.5;
t = linspace(-1,4,100);
x = x0*exp(lambda*t);
plot(t,x)
hold on
xlabel('t')
ylabel('x')
lambda = -0.5;
x = x0*exp(lambda*t);
plot(t,x)

The result of this calculation is shown in Figure 12. In this way we can actually see the difference between exponential growth (λ = 0.5) and exponential decay (λ = −0.5), as discussed in the limit in (4.1.7).

Figure 12: Solutions of (4.1.4) for t ∈ [−1, 4], x_0 = 1 and λ = ±0.5.

Exercises

In Exercises 1 – 4 determine whether or not each of the given functions x_1(t) and x_2(t) is a solution to the given differential equation.

1. ODE: dx/dt = t/(x − 1).
Functions: x_1(t) = t + 1 and x_2(t) = (1 + √(4t^2 + 1))/2.

2. ODE: dx/dt = x + e^t.
Functions: x_1(t) = te^t and x_2(t) = 2e^t.

3. ODE: dx/dt = x^2 + 1.
Functions: x_1(t) = − tan t and x_2(t) = tan t.

4. ODE: dx/dt = x/t.
Functions: x_1(t) = t + 1 and x_2(t) = 5t.

5. Solve the differential equation

t dx/dt = 2x,

where x(0) = 1. At what time t_1 will x(t_1) = 2?

6. Solve the differential equation


dx
= −3x.
dt
At what time t1 will x(t1 ) be half of x(0)?

In Exercises 7 – 10 use MATLAB to graph the given function


f on the specified interval.

7. (matlab) f (t) = t2 on the interval t ∈ [0, 2].

8. (matlab) f (t) = et − t on the interval t ∈ [0, 3].

9. (matlab) f (t) = cos(2t) − t on the interval t ∈ [2, 8].

10. (matlab) f (t) = sin(5t) on the interval t ∈ [0, 6.5].

Hint: Use the fact that the trigonometric functions sin and
cos can be evaluated in MATLAB in the same way as the ex-
ponential function, that is, by using sin and cos instead
of exp.


4.2 *Rate Problems January 1,


Even though the homogeneous linear differential equa- • the money is not withdrawn for one year,
tion (4.1.6) is one of the simplest differential equations,
it still has some use in applications. We present two here: • no new money is deposited in that account during
compound interest and population dynamics. the year,

• the yearly interest rate r remains constant through-


Compound Interest Banks pay interest on an account in out the year, and
the following way. At the end of each day, the bank de-
termines the interest rate rday for that day, checks the • interest is added to the account N times during the
principal P in the account, and then deposits an addi- year.
tional rday P . So the next day the principal in this ac-
count is (1 + rday )P . Note that if r denotes the interest In this model, simple interest corresponds to N = 1,
rate per year, then rday = r/365. Of course, a day is just compound monthly interest to N = 12, and compound
a convenient measure for elapsed time. Before computers daily interest to N = 365.
were prevalent, banks paid interest yearly or quarterly or
We first answer the question: How much money is in this
monthly or, in a few cases, even weekly, depending on the 1
particular bank rules. account after one year? After one time unit of year,
N
Observe that the more frequently interest is paid, the the amount of money in the account is
more money is earned. For example, if interest is paid  r
only once at the end of a year, then the money in the Q1 = 1 +
N
P0 .
account at the end of the year is (1+r)P , and the amount
r
rP is called simple interest. But if interest is paid twice The interest rate in each time period is , the yearly
a year, then the principal at the end of six months will N
rate r divided by the number of time periods N . Here we
r
be (1 + )P , and the principal at the end of the year will have used the assumption that the interest rate remains
2
r
be (1 + )2 P . Since constant throughout the year. After two time units, the
2 principal is:
 r 2 1
1+ = 1 + r + r2 > 1 + r,  r  r 2
2 4 Q2 = 1 + Q1 = 1 + P0 ,
N N
there is more money in the account at the end of the year
if the interest is compounded semiannually rather than and at the end of the year (that is, after N time periods)
annually. But how much is the difference and what is the
maximum earning potential?
 r N
QN = 1 + P0 . (4.2.1)
N
While making the calculation in the previous paragraph,
we implicitly made a number of simplifying assumptions. Here we have used the assumption that money is neither
In particular, we assumed deposited nor withdrawn from our account. Note that
QN is the amount of money in the bank after one year
• an initial principal P0 is deposited in the bank on assuming that interest has been compounded N (equally


spaced) times during that year, and the effective interest It follows that
rate when compounding N times is: ∆P
= rP,
∆t
r N and, on taking the limit ∆t → 0, we have the differential

1+ − 1.
N equation
dP
For the curious, we can write a program in MATLAB (t) = rP (t).
dt
to compute (4.2.1). Suppose we assume that the initial
deposit P0 = $1, 000, the simple interest rate is 6% per Since P (0) = P0 the solution of the initial value problem
year, and the interest payments are made monthly. In given in Theorem 4.1.1 shows that
MATLAB type P (t) = P0 ert .

N = 12; After one year (t = 1) we find that


P0 = 1000; P (1) = er P0 .
r = 0.06;
QN = (1 + r/N)^N*P0 Note that
P (1) = lim QN ,
N →∞
The answer is QN = $1, 061.68, and the effective interest
and we have thus verified that
rate for monthly payments is 6.16778%. For daily interest
payments N = 365, the answer is QN = $1, 061.83, and r N

lim 1 + = er .
the effective interest rate is 6.18313%. N →∞ N
To find the maximum effective interest, we ask the bank Thus the maximum effective interest rate is er −1. When
to compound interest continuously; that is, we ask the r = 6% the maximum effective interest rate is 6.18365%.
bank to compute
 r N Population Dynamics To provide a second interpretation
lim 1 + . of the constant λ in (4.1.4), we discuss a simplified model
N →∞ N
for population dynamics. Let p(t) be the size of a popula-
We compute this limit using differential equations. The tion of a certain species at time t and let r be the rate at
concept of continuous interest is rephrased as follows. Let which the population p is changing at time t. In general,
P (t) be the principal at time t, where t is measured in r depends on the time t and is a complicated function
units of years. Suppose that we assume that interest is of birth and death rates and of immigration and emigra-
compounded N times during the year. The length of tion, as well as of other factors. Indeed, the rate r may
time in each compounding period is well depend on the size of the population itself. (Over-
crowding can be modeled by assuming that the death
1
∆t = , rate increases with the size of the population.) These
N population models assume that the rate of change in the
and the change in principal during that time period is size of the population dp/dt is given by
r dp
∆P = P = rP ∆t. (t) = rp(t), (4.2.2)
N dt


they just differ on the precise form of r. In general, the hours? Express your answer in terms of x0 , the initial number
rate r will depend on the size of the population p as well of bacteria.
as the time t, that is, r is a function r(p, t).
The simplest population model — which we now assume
5. Suppose you deposit $10,000 in a bank at an interest of
— is the one in which r is assumed to be constant. Then
7.5% compounded continuously. How much money will be in
equation (4.2.2) is identical to (4.1.4) after identifying your account a year and a half later? How much would you
p with x and r with λ. Hence we may interpret r as have if the interest were compounded monthly?
the growth rate for the population. The form of the
solution in (4.1.5) shows that the size of a population
grows exponentially if r > 0 and decays exponentially if 6. Newton’s law of cooling states that the rate at which a
r < 0. body changes temperature is proportional to the difference
The mathematical description of this simplest population between the body temperature and the temperature of the
model shows that the assumption of a constant growth surrounding medium. That is,
rate leads to exponential growth (or exponential decay). dT
Is this realistic? Surely, no population will grow expo- = α(T − Tm ) (4.2.3)
dt
nentially for all time, and other factors, such as limited
living space, have to be taken into account. On the other where T (t) is the temperature of the body at time t, Tm is
the constant temperature of the surrounding medium, and α
hand, exponential growth describes well the growth in
is the constant of proportionality. Suppose the body is in air
human population during much of human history. So
of temperature 50◦ and the body cools from 100◦ to 75◦ in 20
this model, though surely oversimplified, gives some in- minutes. What will the temperature of the body be after one
sight into population growth. hour? Hint: Rewrite (4.2.3) in terms of U (t) = T (t) − Tm .

Exercises
7. Let p(t) be the population of group Grk at time t mea-
sured in years. Let r be the growth rate of the group Grk.
Suppose that the population of Grks changes according to the
In Exercises 1 – 3 find solutions to the given initial value differential equation (4.2.2). Find r so that the population of
problems. Grks doubles every 50 years. How large must r be so that the
dx population doubles every 25 years?
1. = sin(2t), x(π) = 2.
dt
dx
2. = t2 , x(2) = 8. 8. You deposit $4,000 in a bank at an interest of 5.5% but
dt
after half a year the bank changes the interest rate to 4.5%.
dx 1 Suppose that the interest is compounded continuously. How
3. = 2, x(1) = 1.
dt t much money will be in your account after one year?

4. Bacteria grown in a culture increase at a rate proportional


to the number present. If the number of bacteria doubles 9. As an application of (4.1.3) answer the following question
every 2 hours, then how many bacteria will be present after 5 (posed by R.P. Agnew).


One day it started snowing at a steady rate. A


snowplow started at noon and went two miles in the
first hour and one mile in the second hour. Assume
that the speed of the snowplow times the depth of
the snow is constant. At what time did it start to
snow?

To set up this problem, let d(t) be the depth of the snow at


time t where t is measured in hours and t = 0 is noon. Since
the snow is falling at a constant rate r, d(t) = r(t − t0 ) where
t0 is the time that it started snowing. Let x(t) be the position
of the snowplow along the road. The assumption that speed
times the depth equals a constant k means that
dx/dt (t) = k/d(t) = K/(t − t_0)
where K = k/r. The information about how far the snowplow
goes in the first two hours translates to
x(1) = 2 and x(2) = 3.
Now solve the problem.

10. Two banks each pay 7% interest per year — one com-
pounds money daily and one compounds money continuously.
What is the difference in earnings in one year in an account
having $10,000.

11. There are two banks in town — Intrastate and Statewide.


You plan to deposit $5,000 in one of these banks for two
years. Statewide Bank’s best savings account pays 8% interest
per year compounded quarterly and charges $10 to open an
account. Intrastate Bank’s best savings account pays 7.75%
interest compounded daily. Which bank will pay you the most
money when you withdraw your money? Would your answer
change if you had planned to keep your money in the bank
for only one year?

12. In the beginning of the year 1990 the population of the


United States was approximately 250,000,000 people and the
growth rate was estimated at 3% per year. Assuming that
the growth rate does not change, during what year will the
population of the United States reach 400,000,000?


4.3 Uncoupled Linear Systems of Two Equations

A system of two linear ordinary differential equations has the form

dx/dt (t) = ax(t) + by(t)
dy/dt (t) = cx(t) + dy(t),   (4.3.1)

where a, b, c, d are real constants. Solutions of (4.3.1) are pairs of functions (x(t), y(t)).

A solution to the planar system (4.3.1) that is constant in time t is called an equilibrium. Observe that the origin (x(t), y(t)) = (0, 0) is always an equilibrium solution to the linear system (4.3.1).

We begin our discussion of linear systems of differential equations by considering uncoupled systems of the form

dx/dt (t) = ax(t)
dy/dt (t) = dy(t).   (4.3.2)

Since the system is uncoupled (that is, the equation for ẋ does not depend on y and the equation for ẏ does not depend on x), we can solve this system by solving each equation independently, as we did for (4.1.4):

x(t) = x_0 e^{at}
y(t) = y_0 e^{dt}.   (4.3.3)

There are now two initial conditions that are identified by

x(0) = x_0 and y(0) = y_0.

Having found all the solutions to (4.3.2) in (4.3.3), we now explore the geometry of the phase plane for these uncoupled systems both analytically and by using MATLAB.
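Even before turning to PhasePlane, the closed form solution (4.3.3) can be plotted directly. The fragment below is a sketch of such a plot, using the same coefficients a = 2 and d = −3 that appear later in Figure 13; it draws the four trajectories through (±1, ±1).

% Plot trajectories (x0*exp(a*t), y0*exp(d*t)) of the uncoupled system (4.3.2)
a = 2;  d = -3;
t = linspace(-2, 2, 400);
hold on
for x0 = [-1 1]
    for y0 = [-1 1]
        plot(x0*exp(a*t), y0*exp(d*t))
    end
end
axis([-5 5 -5 5]), xlabel('x'), ylabel('y'), hold off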
Asymptotic Stability of the Origin  As we did for the single equation (4.1.4), we ask what happens to solutions to (4.3.2) starting at (x_0, y_0) as time t increases. That is, we compute

lim_{t→∞} (x(t), y(t)) = lim_{t→∞} (x_0 e^{at}, y_0 e^{dt}).

This limit is (0, 0) when both a < 0 and d < 0; but if either a or d is positive, then most solutions diverge to infinity, since either

lim_{t→∞} |x(t)| = ∞  or  lim_{t→∞} |y(t)| = ∞.

Roughly speaking, an equilibrium (x_0, y_0) is asymptotically stable if every trajectory (x(t), y(t)) beginning from an initial condition near (x_0, y_0) stays near (x_0, y_0) for all positive t, and

lim_{t→∞} (x(t), y(t)) = (x_0, y_0).

The equilibrium is unstable if there are trajectories with initial conditions arbitrarily close to the equilibrium that move far away from that equilibrium.

At this stage, it is not clear how to determine whether the origin is asymptotically stable for a general linear system (4.3.1). However, for uncoupled linear systems we have shown that the origin is an asymptotically stable equilibrium when both a < 0 and d < 0. If either a > 0 or d > 0, then (0, 0) is unstable.

Invariance of the Axes  There is another observation that we can make for uncoupled systems. Suppose that the initial condition for an uncoupled system lies on the x-axis; that is, suppose y_0 = 0. Then the solution (x(t), y(t)) = (x_0 e^{at}, 0) also lies on the x-axis for all time. Similarly, if the initial condition lies on the y-axis, then the solution (0, y_0 e^{dt}) lies on the y-axis for all time.

This invariance of the coordinate axes for uncoupled systems follows directly from (4.3.3). It turns out that
LAB. tems follows directly from (4.3.3). It turns out that


many linear systems of differential equations have invari- (ax0 + by0 , cx0 . + dy0 ). So the differential equation solver
ant lines; this is a topic to which we return later in this plots the direction field (f, g) and then finds curves that
chapter. are tangent to these vectors at each point in time.
The program PhasePlane, written by Roy Goodman and
Generating Phase Space Pictures with PhasePlane originally by John Polking under the name pplane, draws
How can we visualize a solution (x(t), y(t)) in (4.3.3) to two-dimensional phase planes. In MATLAB type
the system of differential equations (4.3.2)? The time se-
ries approach suggests that we should graph (x(t), y(t)) PhasePlane
as a function of t; that is, we should plot the curve
and the PhasePlane window will appear. PhasePlane has
(t, x(t), y(t)) a number of preprogrammed differential equations listed
in a menu accessed by clicking on library. To explore
in three dimensions. Using MATLAB it is possible to linear systems, choose linear system in the library.
plot such a graph — but such a graph by itself is difficult
to interpret. Alternatively, we could graph either of the To integrate the uncoupled linear system, set the param-
functions x(t) or y(t) by themselves as we do for solutions eters b and c equal to zero, a = 2, and d = −3. We
to single equations — but then some information is lost. now have the system (4.3.2). After pushing Update, the
Phase plane window is filled by vectors (f, g) indicating
The method we prefer is the phase space plot obtained by directions.
thinking of (x(t), y(t)) as the position of a particle in the
xy-plane at time t. We then graph the point (x(t), y(t)) We may start the computations by clicking with a mouse
in the plane as t varies. When looking at phase space button on an initial value (x0 , y0 ). For example, if we
plots, it is natural to call solutions trajectories, since we click approximately onto (x(0), y(0)) = (x0 , y0 ) = (1, 1),
can imagine that we are watching a particle moving in then the trajectory in the upper right quadrant of Fig-
the plane as time changes. ure 13 displays.

We begin by considering uncoupled linear equations. As First PhasePlane draws the trajectory in forward time
we saw, when the initial conditions are on a coordinate for t ≥ 0 and then it draws the trajectory in backwards
axis (either (x0 , 0) or (0, y0 )), the solutions remain on time for t ≤ 0. More precisely, when we click on a
that coordinate axis for all time t. For these initial point (x0 , y0 ) in the (x, y)-plane, PhasePlane computes
conditions, the equations behave as if they were one di- that part of the solution that lies inside the specified dis-
mensional. However, if we consider an initial condition play window and that goes through this point. For linear
(x0 , y0 ) that is not on a coordinate axis, then even for an systems there is precisely one solution that goes through
uncoupled system it is a little difficult to see what the a specified point in the (x, y)-plane.
trajectory looks like. At this point it is useful to use the
computer. Saddles, Sinks, and Sources for the Uncoupled System
The method used to integrate planar systems of differ- (4.3.2) In a qualitative fashion, the trajectories of un-
ential equations is similar to that used to integrate sin- coupled linear systems are determined by the invariance
gle equations. The solution curve (x(t), y(t)) to (4.3.2) of the coordinate axes and by the signs of the constants
at a point (x0 , y0 ) is tangent to the direction (f, g) = a and d.


Figure 13: PHASEPLANE Display for (4.3.2) with a = 2, d = −3 and x, y ∈ [−5, 5]. Solutions going through (±1, ±1) are shown.

Saddles: ad < 0  In Figure 13, where a = 2 > 0 and d = −3 < 0, the origin is a saddle. If we choose several initial values (x_0, y_0) one after another, then we find that as time increases all solutions approach the x-axis. That is, if (x(t), y(t)) is a solution to this system of differential equations, then lim_{t→∞} y(t) = 0. This observation is particularly noticeable when we choose initial conditions close to the origin (0, 0). On the other hand, solutions also approach the y-axis as t → −∞. These qualitative features of the phase plane are valid whenever a > 0 and d < 0.

When a < 0 and d > 0, then the origin is also a saddle — but the roles of the x and y axes are reversed.

Sinks: a < 0 and d < 0  Now change the parameter a to −1. After clicking on Proceed and specifying several initial conditions, we see that all solutions approach the origin as time tends to infinity. Hence — as mentioned previously, and in contrast to saddles — the equilibrium (0, 0) is asymptotically stable. Observe that solutions approach the origin on trajectories that are tangent to the x-axis. Since d < a < 0, the trajectory decreases to zero faster in the y direction than it does in the x-direction. If you change parameters so that a < d < 0, then trajectories will approach the origin tangent to the y-axis.

Sources: a > 0 and d > 0  Choose the constants a and d so that both are positive. In forward time, all trajectories, except the equilibrium at the origin, move towards infinity and the origin is called a source.

Time Series  PhasePlane simultaneously graphs the time series of the single components x(t) and y(t) of a

solution (x(t), y(t)) in the upper right window of Phase- 7. Use the phase plane picture given in  Figure
 13 to draw
Plane. the time series x(t) when (x(0), y(0)) =
1 1
, . Check your
2 2
answer using PhasePlane.
Exercises

8. (matlab) For the three choices of a and d in the uncoupled


In Exercises 1 – 2 find all equilibria of the given system of system of linear differential equations in Exercises 3 – 5, use
nonlinear autonomous differential equations. PhasePlane to compute phase portraits. Use Keyboard input
to look at solutions with initial conditions on the x and y axes.
1. As time t increases, do solutions with these initial conditions
tend towards or away from the origin?
ẋ = x−y
ẏ = x2 − y.
9. (matlab) Suppose that a and d are both negative, so
2.
that the origin is asymptotically stable. Make several choices
of a < d < 0 and observe that solution trajectories tend to
ẋ = x2 − xy
approach the origin tangent to one of the axes. Determine
ẏ = x2 + y 2 − 4. which one. Try to prove that your experimental guess is al-
ways correct?

In Exercises 3 – 5 consider the uncoupled system of differen-


tial equations (4.3.2). For each choice of a and d, determine 10. (matlab) Suppose that a = d < 0. Verify experimen-
whether the origin is a saddle, source, or sink. tally using PhasePlane that all trajectories approach the origin
along straight lines. Try to prove this conjecture?
3. a = 1 and d = −1.

4. a = −0.01 and d = −2.4.

5. a = 0 and d = −2.3.

6. Let (x(t), y(t)) be the solution (4.3.3) of (4.3.2) with initial


condition (x(0), y(0)) = (x0 , y0 ), where x0 6= 0 6= y0 .

(a) Show that the points (x(t), y(t)) lie on the curve whose
equation is:
y0a xd − xd0 y a = 0.

(b) Verify that if a = 1 and d = 2, then the solution lies on


a parabola tangent to the x-axis.


4.4 Coupled Linear Systems

The general linear constant coefficient system in two unknown functions x_1, x_2 is:

dx_1/dt (t) = ax_1(t) + bx_2(t)
dx_2/dt (t) = cx_1(t) + dx_2(t).   (4.4.1)

The uncoupled systems studied in Section 4.3 are obtained by setting b = c = 0 in (4.4.1). We have discussed how to solve (4.4.1) by formula (4.3.3) when the system is uncoupled. We have also discussed how to visualize the phase plane for different choices of the diagonal entries a and d. At present, we cannot solve (4.4.1) by formula when the coefficient matrix is not diagonal. But we may use PhasePlane to solve the initial value problems numerically for these coupled systems. We illustrate this point by solving

dx_1/dt (t) = −x_1(t) + 3x_2(t)
dx_2/dt (t) = 3x_1(t) − x_2(t).

After starting PhasePlane, select linear system from the Gallery and set the constants to:

a = −1, b = 3, c = 3, d = −1.

In order to have equally spaced coordinates on the x and y axes, do the following. In the PhasePlane Display window enter x0 = 0.5 and y0 = 0 and click on Update. Repeat three times with x0 = −0.5 and y0 = 0, x0 = 0 and y0 = 0.5, and x0 = 0 and y0 = −0.5 to recreate Figure 14.

Eigendirections  After computing several solutions, we find that for increasing time t all the solutions seem to approach the diagonal line given by the equation x_1 = x_2. Similarly, in backward time t the solutions approach the anti-diagonal x_1 = −x_2. In other words, as for the case of uncoupled systems, we find two distinguished directions in the (x, y)-plane. See Figure 14. Moreover, the computations indicate that these lines are invariant in the sense that solutions starting on these lines remain on them for all time. This statement can be verified numerically by choosing initial conditions (x_0, y_0) = (1, 1) and (x_0, y_0) = (1, −1).

Figure 14: PHASEPLANE Display for (4.4.1) with a = −1 = d; b = 3 = c; and x, y ∈ [−5, 5]. Solutions going through (±0.5, 0) and (0, ±0.5) are shown.

Definition 4.4.1. An invariant line for a linear system of differential equations is called an eigendirection.

Observe that eigendirections vary if we change parameters. For example, if we set b to 1, then there are still two distinguished lines but these lines are no longer perpendicular.
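Invariance of these two lines can also be checked by a small computation: a line through the origin in the direction of a vector v is invariant exactly when Cv is a multiple of v. The following MATLAB lines carry out this check for the coefficient matrix used above (a sketch that anticipates the eigenvector language of Section 4.5).

% Check that (1,1) and (1,-1) span invariant lines for the matrix of this section
C = [-1 3; 3 -1];
C*[1; 1]      % returns [2; 2]  = 2*[1; 1],   so the diagonal is invariant
C*[1; -1]     % returns [-4; 4] = -4*[1; -1], so the anti-diagonal is invariant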

For uncoupled systems, we have shown analytically that the x and y axes are eigendirections. The numerical computations that we have just performed indicate that eigendirections exist for many coupled systems. This discussion leads naturally to two questions:

(a) Do eigendirections always exist?
(b) How can we find eigendirections?

The second question will be answered in Sections 4.5 and 4.6. We can answer the first question by performing another numerical computation. In the setup window, change the parameter b to −2. Then numerically compute some solutions to see that there are no eigendirections in the phase space of this system. Observe that all solutions appear to spiral into the origin as time goes to infinity. The phase portrait is shown in Figure 15.

Figure 15: PHASEPLANE Display for the linear system with a = −1, b = −2, c = 3, d = −1.

Nonexistence of Eigendirections  We now show analytically that certain linear systems of differential equations have no invariant lines in their phase portrait. Consider the system

ẋ = y
ẏ = −x.   (4.4.2)

Observe that (x(t), y(t)) = (sin t, cos t) is a solution to (4.4.2) by calculating

ẋ(t) = d/dt sin t = cos t = y(t)
ẏ(t) = d/dt cos t = − sin t = −x(t)

We have shown analytically that the unit circle centered at the origin is a solution trajectory for (4.4.2). Hence (4.4.2) has no eigendirections. It may be checked using MATLAB that all solution trajectories for (4.4.2) are just circles centered at the origin.

Exercises

1. (matlab) Choose the linear system in PhasePlane and set a = 0, b = 1, and c = −1. Then find values d such that except for the origin itself all solutions appear to

(a) spiral into the origin;
(b) spiral away from the origin;
(c) form circles around the origin;

2. (matlab) Choose the linear system in PhasePlane and set a = −1, c = 3, and d = −1. Then find a value for b such that the behavior of the solutions of the system is “qualitatively” the same as for a diagonal system where a and d are negative. In particular, the origin should be an asymptotically stable equilibrium and the solutions should approach that equilibrium along a distinguished line.

3. (matlab) Choose the linear system in PhasePlane and set 8. The ODE is:
a = d and b = c. Verify that for these systems of differential
equations: ẋ = y
1 1
(a) When |a| < b typical trajectories approach the line y = x ẏ = − x + y + 1.
t2 t
as t → ∞ and the line y = −x as t → −∞.
The pairs of functions are:
(b) Assume that b is positive, a is negative, and b < −a.
With these assumptions show that the origin is a sink (x1 (t), y1 (t)) = (t2 , 2t) and (x2 (t), y2 (t)) = (2t2 , 4t).
and that typical trajectories approach the origin tangent
to the line y = x.

4. (matlab) Sketch the time series y(t) for the solution to


the differential equation whose phase plane ispictured
 in Fig-
1 1
ure 15 with initial condition (x(0), y(0)) = , . Check
2 2
your answer using PhasePlane.

In Exercises 5 – 8, determine which of the function pairs


(x1 (t), y1 (t)) and (x2 (t), y2 (t)) are solutions to the given sys-
tem of ordinary differential equations.
5. The ODE is:
ẋ = 2x + y
ẏ = 3y.
The pairs of functions are:
(x1 (t), y1 (t)) = (e2t , 0) and (x2 (t), y2 (t)) = (e3t , e3t ).
6. The ODE is:
ẋ = 2x − 3y
ẏ = x − 2y.
The pairs of functions are:
(x1 (t), y1 (t)) = et (3, 1) and (x2 (t), y2 (t)) = (e−t , e−t ).
7. The ODE is:
ẋ = x+y
ẏ = −x + y.
The pairs of functions are:
(x1 (t), y1 (t)) = (3et , −2et ) and (x2 (t), y2 (t)) = et (sin t, cos t).


4.5 The Initial Value Problem and where  


Eigenvectors
−1 3
C= .
3 −1
The general constant coefficient system of n differential In those calculations we observed that there is a solution
equations in n unknown functions has the form to (4.5.3) that stayed on the main diagonal for each mo-
dx1 ment in time. Note that a vector is on the main diagonal
(t) = c11 x1 (t) + · · · + c1n xn (t)

1
dt if it is a scalar multiple of . Thus a solution that
.. .. .. 1
. . . (4.5.1)
stays on the main diagonal for all time t must have the
dxn form
(t) = cn1 x1 (t) + · · · + cnn xn (t) 
x(t)
  
1
dt = u(t) (4.5.4)
y(t) 1
where the coefficients cij ∈ R are constants. Suppose
that (4.5.1) satisfies the initial conditions for some real-valued function u(t). When a function of
form (4.5.4) is a solution to (4.5.3), it satisfies:
x1 (0) = K1 , . . . , xn (0) = Kn .
     
1 ẋ(t) x(t)
Using matrix multiplication of a vector and matrix, we u̇(t)
1
=
ẏ(t)
=C
y(t)
can rewrite these differential equations in a compact    
1 1
form. Consider the n × n coefficient matrix = Cu(t) = u(t)C .
  1 1
c11 c12 · · · c1n
 c21 c22 · · · c2n  A calculation shows that
C= .. .. .. 
     
 . . .  C
1
=2
1
.
cn1 cn2 ··· cnn 1 1

and the n vectors of initial conditions and unknowns Hence    


1 1
u̇(t) = 2u(t) .
   
K1 x1 1 1
X0 =  ...  and X =  ...  .
   
It follows that the function u(t) must satisfy the differ-
Kn xn ential equation
du
Then (4.5.1) has the compact form = 2u.
dt
dX
= CX whose solutions are
dt (4.5.2)
X(0) = X0 . u(t) = αe2t ,

In Section 4.4, we plotted the phase space picture of the for some scalar α.
planar system of differential equations Similarly, we also saw in our MATLAB experiments that
there was a solution that for all time stayed on the anti-
   
ẋ x(t)
=C (4.5.3) diagonal, the line y = −x. Such a solution must have the
ẏ y(t)


form Initial Value Problems Suppose that we wish to find a


solution to (4.5.3) satisfying the initial conditions
   
x(t) 1
= v(t) .
y(t) −1    
x(0) 1
A similar calculation shows that v(t) must satisfy the y(0)
=
3
.
differential equation
Then we can use the principle of superposition to find
dv
= −4v. this solution in closed form. Superposition implies that
dt for each pair of scalars α, β ∈ R, the functions
Solutions to this equation all have the form      
x(t) 1 1
= αe2t + βe−4t , (4.5.6)
v(t) = βe−4t , y(t) 1 −1

are solutions to (4.5.3). Moreover, for a solution of this


for some real constant β.
form
Thus, using matrix multiplication, we are able to prove
   
x(0) α+β
= .
analytically that there are solutions to (4.5.3) of exactly y(0) α−β
the type suggested by our MATLAB experiments. How-
ever, even more is true and this extension is based on the Thus we can solve our prescribed initial value problem,
principle of superposition that was introduced for alge- if we can solve the system of linear equations
braic equations in Section 3.4.
α+β =1
α − β = 3.
Superposition in Linear Differential Equations Con-
sider a general linear differential equation of the form This system is solved for α = 2 and β = −1. Thus
     
dX x(t) 2t 1 −4t 1
= CX, (4.5.5) y(t)
= 2e
1
−e
−1
dt
where C is an n × n matrix. Suppose that Y (t) and Z(t) is the desired closed form solution.
are solutions to (4.5.5) and α, β ∈ R are scalars. Then
X(t) = αY (t) + βZ(t) is also a solution. We verify this Eigenvectors and Eigenvalues We emphasize that just
fact using the ‘linearity’ of d/dt. Calculate knowing that there are two lines in the plane that are
d dY dZ invariant under the dynamics of the system of linear dif-
X(t) = α (t) + β (t) ferential equations is sufficient information to solve these
dt dt dt
= αCY (t) + βCZ(t) equations. So it seems appropriate to ask the question:
When is there a line that is invariant under the dynamics
= C(αY (t) + βZ(t)) of a system of linear differential equations? This question
= CX(t). is equivalent to asking: When is there a nonzero vector
v and a nonzero real-valued function u(t) such that
So superposition is valid for solutions of linear differential
equations. X(t) = u(t)v


is a solution to (4.5.5)? for all constants K.


Suppose that X(t) is a solution to the system of differ- We have proved the following theorem.
ential equations Ẋ = CX. Then u(t) and v must satisfy
Theorem 4.5.3. Let v be an eigenvector of the n × n
u̇(t)v =
dX
= CX(t) = u(t)Cv. (4.5.7) matrix C with eigenvalue λ. Then
dt
Since u is nonzero, it follows that v and Cv must lie on X(t) = eλt v
the same line through the origin. Hence
is a solution to the system of differential equations Ẋ =
Cv = λv, (4.5.8) CX.
for some real number λ.
Finding eigenvalues and eigenvectors from first principles
Definition 4.5.1. A nonzero vector v satisfying (4.5.8) — even for 2 × 2 matrices — is not a simple task. We
is called an eigenvector of the matrix C, and the number end this section with a calculation illustrating that real
λ is an eigenvalue of the matrix C. eigenvalues need not exist. In Section 4.6, we present a
natural method for computing eigenvalues (and eigenvec-
Geometrically, the matrix C maps an eigenvector onto a tors) of 2 × 2 matrices. We defer the discuss of how to
multiple of itself — that multiple is the eigenvalue. find eigenvalues and eigenvectors of n × n matrices until
Note that scalar multiples of eigenvectors are also eigen- Chapter 7.
vectors. More precisely:
Lemma 4.5.2. Let v be an eigenvector of the matrix C An Example of a Matrix with No Real Eigenvalues Not
with eigenvalue λ. Then αv is also an eigenvector of C every matrix has real eigenvalues and eigenvectors. Re-
with eigenvalue λ as long as α 6= 0. call the linear system of differential equations ẋ = Cx
whose phase plane is pictured in Figure 15. That phase
Proof By assumption, Cv = λv and v is nonzero. Now plane showed no evidence of an invariant line and indeed
calculate there is none. The matrix C in that example was
C(αv) = αCv = αλv = λ(αv).  
−1 −2
C= .
The lemma follows from the definition of eigenvector.  3 −1

It follows from (4.5.7) and (4.5.8) that if v is an eigen- We ask: Is there a value of λ and a nonzero vector (x, y)
vector of C with eigenvalue λ, then such that    
x x
du C =λ ? (4.5.9)
= λu. y y
dt
Thus we have returned to our original linear differential Equation (4.5.9) implies that
equation that has solutions   
−1 − λ −2 x
u(t) = Keλt , = 0.
3 −1 − λ y


If this matrix is row equivalent to the identity matrix, Exercises


then the only solution of the linear system is x = y = 0.
To have a nonzero solution, the matrix

1. Write the system of linear ordinary differential equations


 
−1 − λ −2
3 −1 − λ
dx1
(t) = 4x1 (t) + 5x2 (t)
dt
must not be row equivalent to I2 . Dividing the 1st row dx2
by −(1 + λ) leads to dt
(t) = 2x1 (t) − 3x2 (t)

2 ! in matrix form.
1
1+λ .
3 −1 − λ
2. Show that all solutions to the system of linear differential
Subtracting 3 times the 1st row from the second produces equations
the matrix
dx
= 3x
2 dt
 
 1 1+λ dy
= −2y
.

 6 dt
0 −(1 + λ) −
1+λ are linear combinations of the two solutions
This matrix is not row equivalent to I2 when the lower
   
1 0
U (t) = e3t and V (t) = e−2t
right hand entry is zero; that is, when 0 1
.

6
(1 + λ) + = 0.
1+λ 3. Consider
dX
That is, when (t) = CX(t) (4.5.10)
dt
2
(1 + λ) = −6, where  
2 3
which is not possible for any real number λ. This example C=
0 −1
.
shows that the question of whether a given matrix has
Let
a real eigenvalue and a real eigenvector — and hence 
1
 
1

when the associated system of differential equations has v1 = and v2 = ,
0 −1
a line that is invariant under the dynamics — is a subtle and let
question. Y (t) = e2t v1 and Z(t) = e−t v2 .
Questions concerning eigenvectors and eigenvalues are central to much of the theory of linear algebra. We discuss this topic for 2 × 2 matrices in Section 4.6 and Chapter 6 and for general square matrices in Chapters 7 and 11.

Exercises

1. Write the system of linear ordinary differential equations
\[ \frac{dx_1}{dt}(t) = 4x_1(t) + 5x_2(t) \]
\[ \frac{dx_2}{dt}(t) = 2x_1(t) - 3x_2(t) \]
in matrix form.

2. Show that all solutions to the system of linear differential equations
\[ \frac{dx}{dt} = 3x \]
\[ \frac{dy}{dt} = -2y \]
are linear combinations of the two solutions
\[ U(t) = e^{3t} \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad V(t) = e^{-2t} \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \]

3. Consider
\[ \frac{dX}{dt}(t) = CX(t) \tag{4.5.10} \]
where
\[ C = \begin{pmatrix} 2 & 3 \\ 0 & -1 \end{pmatrix}. \]
Let
\[ v_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad v_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \]
and let
\[ Y(t) = e^{2t} v_1 \quad\text{and}\quad Z(t) = e^{-t} v_2. \]
(a) Show that Y(t) and Z(t) are solutions to (4.5.10).
(b) Show that X(t) = 2Y(t) − 14Z(t) is a solution to (4.5.10).
(c) Use the principle of superposition to verify that X(t) = αY(t) + βZ(t) is a solution to (4.5.10).
(d) Using the general solution found in part (c), find a solu- 7. Let
tion X(t) to (4.5.10) such that
 
1 2
C= .
−3 −1
Show that C has no real eigenvectors.
 
3
X(0) = .
−1

8. Suppose that A is an n × n matrix with zero as an eigen-


value. Show that A is not invertible. Hint: Assume that A
4. Find a solution to
is invertible and compute A−1 Av where v is an eigenvector of
Ẋ(t) = CX(t) A corresponding to the zero eigenvalue.

where   Remark: In fact, A is invertible if all of the eigenvalues of A


1 −1
C= are nonzero. See Corollary 7.2.5 of Chapter 7.
−1 1
and
 
1 1.5
9. (matlab) C = .
 
2 0 −2
X(0) = .
1
Hint: Observe that 10. (matlab) Use MATLAB to verify that solutions to the

1
 
1
 system of linear differential equations
and
1 −1 dx
= 2x + y
dt
are eigenvectors of C. dy
= y
dt
are linear combinations of the two solutions
5. Solve the initial value problem to the planar system of    
differential equations U (t) = e2t
1
and V (t) = et
−1
.
  0 1
dX 2 −1
= X = CX More concretely, proceed as follows:
dt −4 2
(a) By superposition, the general solution to the differen-
where X(0) = (0, 4)t . tial equation has the form X(t) = αU (t) + βV (t).
 Find
0
constants α and β such that αU (0) + βV (0) = .
1
6. Let   (b) Graph the second component y(t) of this solution using
a b
C= . the MATLAB plot command.
b a
(c) Use PhasePlane to compute a solution via the Keyboard
Show that     input starting at (x(0), y(0)) = (0, 1) and then use the y
1 1
and vs t command in PhasePlane to graph this solution.
1 −1
(d) Compare the results of the two plots.
are eigenvectors of C. What are the corresponding eigenval-  
ues? (e) Repeat steps (a)–(d) using the initial vector
1
.
1

107
§4.6 Eigenvalues of 2 × 2 Matrices

4.6 Eigenvalues of 2 × 2 Matrices

We now discuss how to find eigenvalues of 2 × 2 matrices in a way that does not depend explicitly on finding eigenvectors. This direct method will show that eigenvalues can be complex as well as real.

We begin the discussion with a general square matrix. Let A be an n × n matrix. Recall that λ ∈ R is an eigenvalue of A if there is a nonzero vector v ∈ R^n for which
\[ Av = \lambda v. \tag{4.6.1} \]
The vector v is called an eigenvector. We may rewrite (4.6.1) as:
\[ (A - \lambda I_n) v = 0. \]
Since v is nonzero, it follows that if λ is an eigenvalue of A, then the matrix A − λI_n is singular.

Conversely, suppose that A − λI_n is singular for some real number λ. Then Theorem 3.7.8 of Chapter 3 implies that there is a nonzero vector v ∈ R^n such that (A − λI_n)v = 0. Hence (4.6.1) holds and λ is an eigenvalue of A. So, if we had a direct method for determining when a matrix is singular, then we would have a method for determining eigenvalues.

Characteristic Polynomials  Corollary 3.8.3 of Chapter 3 states that 2 × 2 matrices are singular precisely when their determinant is zero. It follows that λ ∈ R is an eigenvalue for the 2 × 2 matrix A precisely when
\[ \det(A - \lambda I_2) = 0. \tag{4.6.2} \]
We can compute (4.6.2) explicitly as follows. Note that
\[ A - \lambda I_2 = \begin{pmatrix} a-\lambda & b \\ c & d-\lambda \end{pmatrix}. \]
Therefore
\[ \det(A - \lambda I_2) = (a-\lambda)(d-\lambda) - bc = \lambda^2 - (a+d)\lambda + (ad-bc). \tag{4.6.3} \]

Definition 4.6.1. The characteristic polynomial of the matrix A is
\[ p_A(\lambda) = \det(A - \lambda I_2). \]

For an n × n matrix A = (a_{ij}), define the trace of A to be the sum of the diagonal elements of A; that is
\[ \mathrm{tr}(A) = a_{11} + \cdots + a_{nn}. \tag{4.6.4} \]
Thus, using (4.6.3), we can rewrite the characteristic polynomial for 2 × 2 matrices as
\[ p_A(\lambda) = \lambda^2 - \mathrm{tr}(A)\lambda + \det(A). \tag{4.6.5} \]

As an example, consider the 2 × 2 matrix
\[ A = \begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix}. \tag{4.6.6} \]
Then
\[ A - \lambda I_2 = \begin{pmatrix} 2-\lambda & 3 \\ 1 & 4-\lambda \end{pmatrix}, \]
and
\[ p_A(\lambda) = (2-\lambda)(4-\lambda) - 3 = \lambda^2 - 6\lambda + 5. \]
It is now easy to verify (4.6.5) for (4.6.6).

Eigenvalues  For 2 × 2 matrices A, p_A(λ) is a quadratic polynomial. As we have discussed, the real roots of p_A are real eigenvalues of A. For 2 × 2 matrices we now generalize our first definition of eigenvalues, Definition 4.5.1, to include complex eigenvalues, as follows.

Definition 4.6.2. An eigenvalue of A is a root of the characteristic polynomial p_A.

It follows from Definition 4.6.2 that every 2 × 2 matrix has precisely two eigenvalues, which may be equal or complex conjugate pairs.
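MATLAB offers a quick check of formula (4.6.5): the command poly, applied to a matrix, returns the coefficients of its characteristic polynomial. The lines below are added here as an illustration and are not part of the original text.

A = [2 3; 1 4];
poly(A)                    % returns [1 -6 5], i.e. lambda^2 - 6*lambda + 5
[1, -trace(A), det(A)]     % the same coefficients, as predicted by (4.6.5)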
Suppose that λ1 and λ2 are the roots of p_A. It follows that
\[ p_A(\lambda) = (\lambda-\lambda_1)(\lambda-\lambda_2) = \lambda^2 - (\lambda_1+\lambda_2)\lambda + \lambda_1\lambda_2. \tag{4.6.7} \]
Equating the two forms of p_A (4.6.5) and (4.6.7) shows that
\[ \mathrm{tr}(A) = \lambda_1 + \lambda_2 \tag{4.6.8} \]
\[ \det(A) = \lambda_1 \lambda_2. \tag{4.6.9} \]
Thus, for 2 × 2 matrices, the trace is the sum of the eigenvalues and the determinant is the product of the eigenvalues. In Chapter 7, Theorems 7.2.4(b) and 7.2.9 we show that these statements are also valid for n × n matrices.

Recall that in example (4.6.6) the characteristic polynomial is
\[ p_A(\lambda) = \lambda^2 - 6\lambda + 5 = (\lambda-5)(\lambda-1). \]
Thus the eigenvalues of A are λ1 = 1 and λ2 = 5 and identities (4.6.8) and (4.6.9) are easily verified for this example.

Next, we consider an example with complex eigenvalues and verify that these identities are equally valid in this instance. Let
\[ B = \begin{pmatrix} 2 & -3 \\ 1 & 4 \end{pmatrix}. \]
The characteristic polynomial is:
\[ p_B(\lambda) = \lambda^2 - 6\lambda + 11. \]
Using the quadratic formula we see that the roots of p_B (that is, the eigenvalues of B) are
\[ \lambda_1 = 3 + i\sqrt{2} \quad\text{and}\quad \lambda_2 = 3 - i\sqrt{2}. \]
Again the sum of the eigenvalues is 6 which equals the trace of B and the product of the eigenvalues is 11 which equals the determinant of B.

Since the characteristic polynomial of 2 × 2 matrices is always a quadratic polynomial, it follows that 2 × 2 matrices have precisely two eigenvalues — including multiplicity — and these can be described as follows. The discriminant of A is:
\[ D = [\mathrm{tr}(A)]^2 - 4\det(A). \tag{4.6.10} \]

Theorem 4.6.3. There are three possibilities for the two eigenvalues of a 2 × 2 matrix A that we can describe in terms of the discriminant:

(i) The eigenvalues of A are real and distinct (D > 0).

(ii) The eigenvalues of A are a complex conjugate pair (D < 0).

(iii) The eigenvalues of A are real and equal (D = 0).

Proof We can find the roots of the characteristic polynomial using the form of p_A given in (4.6.5) and the quadratic formula. The roots are:
\[ \frac{1}{2}\left( \mathrm{tr}(A) \pm \sqrt{[\mathrm{tr}(A)]^2 - 4\det(A)} \right) = \frac{\mathrm{tr}(A) \pm \sqrt{D}}{2}. \]
The proof of the theorem now follows. If D > 0, then the eigenvalues of A are real and distinct; if D < 0, then eigenvalues are complex conjugates; and if D = 0, then the eigenvalues are real and equal. □
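The discriminant test of Theorem 4.6.3 is easy to carry out in MATLAB. The lines below are added here as an illustration (they are not part of the original text) and classify the two matrices A and B used above; they also confirm identities (4.6.8) and (4.6.9).

A = [2 3; 1 4];
D = trace(A)^2 - 4*det(A)    % D = 16 > 0, so real distinct eigenvalues
eig(A)                       % returns 1 and 5
B = [2 -3; 1 4];
D = trace(B)^2 - 4*det(B)    % D = -8 < 0, so a complex conjugate pair
eig(B)                       % returns 3 + 1.4142i and 3 - 1.4142i
sum(eig(B))                  % 6, which equals trace(B) as in (4.6.8)
prod(eig(B))                 % approximately 11, which equals det(B) as in (4.6.9)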
Eigenvectors  The following lemma contains an important observation about eigenvectors:

Lemma 4.6.4. Every eigenvalue λ of a 2 × 2 matrix A has an eigenvector v. That is, there is a nonzero vector v ∈ C^2 satisfying
\[ Av = \lambda v. \]
Proof When the eigenvalue λ is real we know that an eigenvector v ∈ R^2 exists. However, when λ is complex, then we must show that there is a complex eigenvector v ∈ C^2, and this we have not yet done. More precisely, we must show that if λ is a complex root of the characteristic polynomial p_A, then there is a complex vector v such that
\[ (A - \lambda I_2) v = 0. \]
As we discussed in Section 2.5, finding v is equivalent to showing that the complex matrix
\[ A - \lambda I_2 = \begin{pmatrix} a-\lambda & b \\ c & d-\lambda \end{pmatrix} \]
is not row equivalent to the identity matrix. See Theorem 2.5.2 of Chapter 2. Since a is real and λ is not, a − λ ≠ 0. A short calculation shows that A − λI_2 is row equivalent to the matrix
\[ \begin{pmatrix} 1 & \dfrac{b}{a-\lambda} \\ 0 & \dfrac{p_A(\lambda)}{a-\lambda} \end{pmatrix}. \]
This matrix is not row equivalent to the identity matrix since p_A(λ) = 0. □

An Example of a Matrix with Real Eigenvectors  Once we know the eigenvalues of a 2 × 2 matrix, the associated eigenvectors can be found by direct calculation. For example, we showed previously that the matrix
\[ A = \begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix} \]
in (4.6.6) has eigenvalues λ1 = 1 and λ2 = 5. With this information we can find the associated eigenvectors. To find an eigenvector associated with the eigenvalue λ1 = 1 compute
\[ A - \lambda_1 I_2 = A - I_2 = \begin{pmatrix} 1 & 3 \\ 1 & 3 \end{pmatrix}. \]
It follows that v1 = (3, −1)^t is an eigenvector since
\[ (A - I_2) v_1 = 0. \]
Similarly, to find an eigenvector associated with the eigenvalue λ2 = 5 compute
\[ A - \lambda_2 I_2 = A - 5I_2 = \begin{pmatrix} -3 & 3 \\ 1 & -1 \end{pmatrix}. \]
It follows that v2 = (1, 1)^t is an eigenvector since
\[ (A - 5I_2) v_2 = 0. \]

Examples of Matrices with Complex Eigenvectors  Let
\[ A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. \]
Then p_A(λ) = λ^2 + 1 and the eigenvalues of A are ±i. To find the eigenvector v ∈ C^2 whose existence is guaranteed by Lemma 4.6.4, we need to solve the complex system of linear equations Av = iv. We can rewrite this system as:
\[ \begin{pmatrix} -i & -1 \\ 1 & -i \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = 0. \]
A calculation shows that
\[ v = \begin{pmatrix} i \\ 1 \end{pmatrix} \tag{4.6.11} \]
is a solution. Since the coefficients of A are real, we can take the complex conjugate of the equation Av = iv to obtain
\[ A\bar{v} = -i\bar{v}. \]
Thus
\[ \bar{v} = \begin{pmatrix} -i \\ 1 \end{pmatrix} \]
is the eigenvector corresponding to the eigenvalue −i. This comment is valid for any complex eigenvalue.
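MATLAB can reproduce these eigenvector computations. The commands below are added here as an illustration and are not part of the original text; eig and null are standard MATLAB functions, and null returns a unit-length basis vector for the null space, which is a scalar multiple of the eigenvectors found above.

A = [2 3; 1 4];
eig(A)                     % returns 1 and 5
null(A - 1*eye(2))         % a multiple of (3, -1)^t, the eigenvector for lambda = 1
null(A - 5*eye(2))         % a multiple of (1, 1)^t, the eigenvector for lambda = 5
[V, D] = eig([0 -1; 1 0])  % columns of V are complex eigenvectors; D contains +i and -i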
More generally, let In Exercises 6 – 8 compute the eigenvalues for the given 2 × 2
  matrix.
σ −τ
(4.6.12)
 
A= , 1 2
τ σ 6. .
0 −5
where τ 6= 0. Then 
−3 2

7. .
1 0
pA (λ) = λ2 − 2σλ + σ 2 + τ 2
 
= (λ − (σ + iτ ))(λ − (σ − iτ )), 3 −2
8. .
2 −1
and the eigenvalues of A are the complex conjugates
σ ± iτ . Thus A has no real eigenvectors. The com-
plex eigenvectors of A are v and v where v is defined 9. Suppose that the characteristic polynomial of the 2 × 2
in (4.6.11). matrix A is pA (λ) = λ2 + 2λ − 6. Find det(A) and tr(A).

Exercises 10. (a) Let A and B be 2 × 2 matrices. Using direct calcu-


lation, show that

tr(AB) = tr(BA). (4.6.13)


1. For which values of λ is the matrix
  (b) Now let A and B be n × n matrices. Verify by direct
1−λ 4 calculation that (4.6.13) is still valid.
2 3−λ

not invertible? Note: These values of λ are just the eigen- In Exercises 11 – 13 use the program map to guess whether
1 4
values of the matrix . the given matrix has real or complex conjugate eigenvalues.
2 3
For each example, write the reasons for your guess.
 
0.97 −0.22
In Exercises 2 – 5 compute the determinant, trace, and char- 11. (matlab) A = .
0.22 0.97
acteristic polynomials for the given 2 × 2 matrix.
 
0.97 0.22
12. (matlab) B = .
 
1 4
2. . 0.22 0.97
0 −1
 
0.4 −1.4
13. (matlab) C = .
 
2 13
3. . 1.5 0.5
−1 5
 
1 4
4. . In Exercises 14 – 15 use the program map to guess one of the
1 −1
eigenvectors of the given matrix. What is the corresponding
eigenvalue? Using map, can you find a second eigenvalue and
 
4 10
5. . eigenvector?
2 5

 
2 4
14. (matlab) A = .
2 0
 
2 −1
15. (matlab) B = .
0.25 1
Hint: Use the feature Rescale in the MAP Options. Then
the length of the vector is rescaled to one after each use of
the command Map. In this way you can avoid overflows in
the computations while still being able to see the directions
where the vectors are moved by the matrix mapping.

16. (matlab) The MATLAB command eig computes the


eigenvalues
 of matrices. Use eig to compute the eigenvalues
2.34 −1.43
of A = .
π e

4.7 Initial Value Problems Revisited

To summarize the ideas developed in this chapter, we review the method that we have developed to solve the system of differential equations
\[ \begin{aligned} \dot{x} &= ax + by \\ \dot{y} &= cx + dy \end{aligned} \tag{4.7.1} \]
satisfying the initial conditions
\[ \begin{aligned} x(0) &= x_0 \\ y(0) &= y_0. \end{aligned} \tag{4.7.2} \]
Begin by rewriting (4.7.1) in matrix form
\[ \dot{X} = CX \tag{4.7.3} \]
where
\[ C = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad\text{and}\quad X(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix}. \]
Rewrite the initial conditions (4.7.2) in vector form
\[ X(0) = X_0 \tag{4.7.4} \]
where
\[ X_0 = \begin{pmatrix} x_0 \\ y_0 \end{pmatrix}. \]
When the eigenvalues of C are real and distinct we now know how to solve the initial value problem (4.7.3) and (4.7.4). This solution is found in four steps.

Step 1: Find the eigenvalues λ1 and λ2 of C.

These eigenvalues are the roots of the characteristic polynomial as given by (4.6.5):
\[ p_C(\lambda) = \lambda^2 - \mathrm{tr}(C)\lambda + \det(C). \]
These roots may be found either by factoring p_C or by using the quadratic formula. The roots are real and distinct when the discriminant
\[ D = \mathrm{tr}(C)^2 - 4\det(C) > 0. \]
Recall (4.6.10) and Theorem 4.6.3.

Step 2: Find eigenvectors v1 and v2 of C associated with the eigenvalues λ1 and λ2.

For j = 1 and j = 2, the eigenvector vj is found by solving the homogeneous system of linear equations
\[ (C - \lambda_j I_2) v = 0 \tag{4.7.5} \]
for one nonzero solution. Lemma 4.6.4 tells us that there is always a nonzero solution to (4.7.5) since λj is an eigenvalue of C.

Step 3: Using superposition, write the general solution to the system of ODEs (4.7.3) as
\[ X(t) = \alpha_1 e^{\lambda_1 t} v_1 + \alpha_2 e^{\lambda_2 t} v_2, \tag{4.7.6} \]
where α1, α2 ∈ R.

Theorem 4.5.3 tells us that for j = 1, 2
\[ X_j(t) = e^{\lambda_j t} v_j \]
is a solution to (4.7.3). The principle of superposition (see Section 4.5) allows us to conclude that
\[ X(t) = \alpha_1 X_1(t) + \alpha_2 X_2(t) \]
is also a solution to (4.7.3) for any scalars α1, α2 ∈ R. Thus, (4.7.6) is valid.

Note that the initial condition corresponding to the general solution (4.7.6) is
\[ X(0) = \alpha_1 v_1 + \alpha_2 v_2, \tag{4.7.7} \]
since e^0 = 1.
Step 4: Solve the initial value problem by solving the system of linear equations
\[ X_0 = \alpha_1 v_1 + \alpha_2 v_2 \tag{4.7.8} \]
for α1 and α2 (see (4.7.7)).

Let A be the 2 × 2 matrix whose columns are v1 and v2. That is,
\[ A = (v_1 | v_2). \tag{4.7.9} \]
Then we may rewrite (4.7.8) in the form
\[ A \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = X_0. \tag{4.7.10} \]
We claim that the matrix A = (v1|v2) (defined in (4.7.9)) is always invertible. Recall Lemma 4.5.2 which states that if w is a nonzero multiple of v2, then w is also an eigenvector of A associated to the eigenvalue λ2. Since the eigenvalues λ1 and λ2 are distinct, it follows that the eigenvector v1 is not a scalar multiple of the eigenvector v2 (see Lemma 4.5.2). Therefore, the area of the parallelogram spanned by v1 and v2 is nonzero and the determinant of A is nonzero by Theorem 3.8.5 of Chapter 3. Corollary 3.8.3 of Chapter 3 now implies that A is invertible. Thus, the unique solution to (4.7.10) is
\[ \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = A^{-1} X_0. \]
This equation is easily solved since we have an explicit formula for A^{-1} when A is a 2 × 2 matrix (see (3.8.1) in Section 3.8). Indeed,
\[ A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}. \]

An Initial Value Problem Solved by Hand  Solve the linear system of differential equations
\[ \begin{aligned} \dot{x} &= 3x - y \\ \dot{y} &= 4x - 2y \end{aligned} \tag{4.7.11} \]
with initial conditions
\[ \begin{aligned} x(0) &= 2 \\ y(0) &= -3. \end{aligned} \tag{4.7.12} \]
Rewrite the system (4.7.11) in matrix form as
\[ \dot{X} = CX \]
where
\[ C = \begin{pmatrix} 3 & -1 \\ 4 & -2 \end{pmatrix}. \]
Rewrite the initial conditions (4.7.12) in vector form
\[ X(0) = X_0 = \begin{pmatrix} 2 \\ -3 \end{pmatrix}. \]
Now proceed through the four steps outlined previously.

Step 1: Find the eigenvalues of C.

The characteristic polynomial of C is
\[ p_C(\lambda) = \lambda^2 - \mathrm{tr}(C)\lambda + \det(C) = \lambda^2 - \lambda - 2 = (\lambda-2)(\lambda+1). \]
Therefore, the eigenvalues of C are
\[ \lambda_1 = 2 \quad\text{and}\quad \lambda_2 = -1. \]

Step 2: Find the eigenvectors of C.

Find an eigenvector associated with the eigenvalue λ1 = 2 by solving the system of equations
\[ (C - \lambda_1 I_2) v = \left( \begin{pmatrix} 3 & -1 \\ 4 & -2 \end{pmatrix} - \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \right) v = \begin{pmatrix} 1 & -1 \\ 4 & -4 \end{pmatrix} v = 0. \]
One particular solution to this system is
\[ v_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]
Similarly, find an eigenvector associated with the eigenvalue λ2 = −1 by solving the system of equations
\[ (C - \lambda_2 I_2) v = \left( \begin{pmatrix} 3 & -1 \\ 4 & -2 \end{pmatrix} - \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \right) v = \begin{pmatrix} 4 & -1 \\ 4 & -1 \end{pmatrix} v = 0. \]
One particular solution to this system is
\[ v_2 = \begin{pmatrix} 1 \\ 4 \end{pmatrix}. \]

Step 3: Write the general solution to the system of differential equations.

Using superposition the general solution to the system (4.7.11) is:
\[ X(t) = \alpha_1 e^{2t} v_1 + \alpha_2 e^{-t} v_2 = \alpha_1 e^{2t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \alpha_2 e^{-t} \begin{pmatrix} 1 \\ 4 \end{pmatrix}, \]
where α1, α2 ∈ R. Note that the initial state of this solution is:
\[ X(0) = \alpha_1 \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \alpha_2 \begin{pmatrix} 1 \\ 4 \end{pmatrix} = \begin{pmatrix} \alpha_1 + \alpha_2 \\ \alpha_1 + 4\alpha_2 \end{pmatrix}. \]

Step 4: Solve the initial value problem.

Let
\[ A = (v_1 | v_2) = \begin{pmatrix} 1 & 1 \\ 1 & 4 \end{pmatrix}. \]
The equation for the initial condition is
\[ A \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = X_0. \]
See (4.7.9).

We can write the inverse of A by formula as
\[ A^{-1} = \frac{1}{3} \begin{pmatrix} 4 & -1 \\ -1 & 1 \end{pmatrix}. \]
It follows that we solve for the coefficients αj as
\[ \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = A^{-1} X_0 = \frac{1}{3} \begin{pmatrix} 4 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ -3 \end{pmatrix} = \frac{1}{3} \begin{pmatrix} 11 \\ -5 \end{pmatrix}. \]
In coordinates
\[ \alpha_1 = \frac{11}{3} \quad\text{and}\quad \alpha_2 = -\frac{5}{3}. \]
The solution to the initial value problem (4.7.11) and (4.7.12) is:
\[ X(t) = \frac{1}{3}\left( 11 e^{2t} v_1 - 5 e^{-t} v_2 \right) = \frac{1}{3}\left( 11 e^{2t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} - 5 e^{-t} \begin{pmatrix} 1 \\ 4 \end{pmatrix} \right). \]
Expressing the solution in coordinates, we obtain:
\[ \begin{aligned} x(t) &= \frac{1}{3}\left( 11 e^{2t} - 5 e^{-t} \right) \\ y(t) &= \frac{1}{3}\left( 11 e^{2t} - 20 e^{-t} \right). \end{aligned} \]
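The four-step method is easy to mirror in MATLAB, and doing so checks the hand computation. The script below is added here as an illustration and is not part of the original text; it reuses the eigenvectors found by hand and compares the closed-form solution with MATLAB's built-in matrix exponential expm at a sample time.

C = [3 -1; 4 -2];
X0 = [2; -3];
lambda = eig(C)               % Step 1: the eigenvalues 2 and -1
v1 = [1; 1];  v2 = [1; 4];    % Step 2: the eigenvectors found above
A = [v1 v2];
alpha = A\X0                  % Step 4: alpha = (11/3, -5/3)
t = 0.5;
Xformula = alpha(1)*exp(2*t)*v1 + alpha(2)*exp(-t)*v2
Xcheck = expm(C*t)*X0         % agrees with Xformula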
An Initial Value Problem Solved using MATLAB  Next, solve the system of ODEs
\[ \begin{aligned} \dot{x} &= 1.7x + 3.5y \\ \dot{y} &= 1.3x - 4.6y \end{aligned} \]
with initial conditions
\[ \begin{aligned} x(0) &= 2.7 \\ y(0) &= 1.1. \end{aligned} \]
Rewrite this system in matrix form as
\[ \dot{X} = CX \]
where
\[ C = \begin{pmatrix} 1.7 & 3.5 \\ 1.3 & -4.6 \end{pmatrix}. \]
Rewrite the initial conditions in vector form v1 = null(C1)


 
X0 =
2.7
. and obtain
1.1
v1 =
Now proceed through the four steps outlined previously. -0.9830
In MATLAB begin by typing -0.1838

C = [1.7 3.5; 1.3 -4.6] Similarly, to find an eigenvector associated to the eigen-
X0 = [2.7; 1.1] value λ2 type

C2 = C - lambda(2)*eye(2);
Step 1: Find the eigenvalues of C by typing v2 = null(C2)

lambda = eig(C) and obtain

and obtaining v2 =
-0.4496
lambda = 0.8932
2.3543
-5.2543 Step 3: The general solution to this system of differen-
tial equations is:
So the eigenvalues of C are real and distinct.    
2.3543t −0.9830 −5.2543t −0.4496
X(t) = α1 e +α2 e .
−0.1838 0.8932
Step 2: To find the eigenvectors of C we need to solve
two homogeneous systems of linear equations. The ma-
trix associated with the first system is obtained by typing Step 4: Solve the initial value problem by finding the
scalars α1 and α2 . Form the matrix A by typing
C1 = C - lambda(1)*eye(2)
A = [v1 v2]
which yields
Then solve for the α’s by typing
C1 = alpha = inv(A)*X0
-0.6543 3.5000
1.3000 -6.9543 obtaining

We can solve the homogeneous system (C1)x = 0 by alpha =


row reduction — but MATLAB has this process prepro- -3.0253
grammed in the command null. So type 0.6091

Therefore, the closed form solution to the initial value


   
1.23 2π 1.2
7. (matlab) C = and X0 = .
problem is: π/2 1.45 1.6
 
2.3543t 0.9830
X(t) =3.0253e
0.1838 In Exercises 8 – 9, find the solution to Ẋ = CX satisfying
  X(0) = X0 .
−0.4496
+0.6091e−5.2543t . Use MATLAB to find the eigenvalues and eigenvectors of C
0.8932
and then find a closed form solution X(t). Use this formula
to evaluate X(0.5) to three decimal places.
Exercises  
2.65 −2.34
8. (matlab) C = and X0 =
−1.5 −1.2
 
0.5
In Exercises 1 – 4 find the solution to the system of differential .
0.1
equations Ẋ = CX satisfying X(0) = X0 .
   
    1.2 2.4 0.5
1. C =
1 1
and X0 =
1
. 9. (matlab) C = and X0 = .
0 2 4 0.6 −3.5 0.7
   
2 −3 1
2. C = and X0 = .
0 −1 −2
   
−3 2 −1
3. C = and X0 = .
−2 2 3
   
2 1 1
4. C = and X0 = .
1 2 2

5. Solve the initial value problem Ẋ = CX where X0 = e1


given that
 
1
(a) X(t) = e−t is a solution,
2
(b) tr(C) = 3, and
(c) C is a symmetric matrix.

In Exercises 6 – 7, with MATLAB assistance, find the solution


to the system of differential equations Ẋ = CX satisfying
X(0) = X0 .
 
1.76 4.65
6. (matlab) C = and X0 =
0.23 1.11
 
0.34
.
−0.50

4.8 *Markov Chains

Markov chains provide an interesting and useful application of matrices and linear algebra. In this section we introduce Markov chains via some of the theory and two examples. The theory can be understood and applied to examples using just the background in linear algebra that we have developed in this chapter.

An Example of Cats  Consider the four room apartment pictured in Figure 16. One way passages between the rooms are indicated by arrows. For example, it is possible to go from room 1 directly to any other room, but when in room 3 it is possible to go only to room 4.

Figure 16: Schematic design of apartment passages.

Suppose that there is a cat in the apartment and that at each hour the cat is asked to move from the room that it is in to another. True to form, however, the cat chooses with equal probability to stay in the room for another hour or to move through one of the allowed passages. Suppose that we let p_{ij} be the probability that the cat will move from room i to room j; in particular, p_{ii} is the probability that the cat will stay in room i. For example, when the cat is in room 1, it has four choices — it can stay in room 1 or move to any of the other rooms. Assuming that each of these choices is made with equal probability, we see that
\[ p_{11} = p_{12} = p_{13} = p_{14} = \frac{1}{4}. \]
It is now straightforward to verify that
\[ \begin{aligned} p_{21} &= \tfrac{1}{2} & p_{22} &= \tfrac{1}{2} & p_{23} &= 0 & p_{24} &= 0 \\ p_{31} &= 0 & p_{32} &= 0 & p_{33} &= \tfrac{1}{2} & p_{34} &= \tfrac{1}{2} \\ p_{41} &= 0 & p_{42} &= \tfrac{1}{3} & p_{43} &= \tfrac{1}{3} & p_{44} &= \tfrac{1}{3}. \end{aligned} \]
Putting these probabilities together yields the transition matrix:
\[ P = \begin{pmatrix} \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} \\ \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 \\ 0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} \\ 0 & \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \end{pmatrix} \tag{4.8.1*} \]
This transition matrix has the properties that all entries are nonnegative and that the entries in each row sum to 1.
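In MATLAB the transition matrix can be entered directly and these two properties checked; the book's script e4_10_1 loads the same matrix P. The lines below are added here as an illustration and are not part of the original text.

P = [1/4 1/4 1/4 1/4; 1/2 1/2 0 0; 0 0 1/2 1/2; 0 1/3 1/3 1/3];
sum(P, 2)         % a column of ones: each row of P sums to 1
min(P(:)) >= 0    % returns 1 (true): all entries are nonnegative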
Three Basic Questions  Using the transition matrix P, we discuss the answers to three questions:
(A) What is the probability that a cat starting in room i obtaining


will be in room j after exactly k steps? We call the
movement that occurs after each hour a step. ans =
0.2728
(B) Suppose that we put 100 cats in the apartment with
some initial distribution of cats in each room. What
will the distribution of cats look like after a large A Discussion of Question (B) We answer Question (B)
number of steps? in two parts: first we compute a formula for determining
the number of cats that are expected to be in room i after
(C) Suppose that a cat is initially in room i and takes a k steps, and second we explore that formula numerically
large number of steps. For how many of those steps for large k. We begin by supposing that 100 cats are
will the cat be expected to be in room j? distributed in the rooms according to the initial vector
V0 = (v1 , v2 , v3 , v4 )t ; that is, the number of cats initially
A Discussion of Question (A) We begin to answer Ques- in room i is vi . Next, we denote the number of cats that
are expected to be in room i after k steps by vi . For
(k)
tion (A) by determining the probability that the cat
moves from room 1 to room 4 in two steps. We denote example, we determine how many cats we expect to be
this probability by p14 and compute
(2) in room 2 after one step. That number is:

(4.8.3)
(1)
v2 = p12 v1 + p22 v2 + p32 v3 + p42 v4 ;
(4.8.2)
(2)
p14 = p11 p14 + p12 p24 + p13 p34 + p14 p44 ;
that is, v2 is the sum of the proportion of cats in each
(1)
that is, the probability is the sum of the probabilities
that the cat will move from room 1 to each room i and room i that are expected to migrate to room 2 in one
then from room i to room 4. In this case the answer is: step. In this case, the answer is:

(2) 1 1 1 1 1 1 1 13 1 1 1
p14 = × + ×0+ × + × = ≈ 0.27 . v1 + v2 + v4 .
4 4 4 4 2 4 3 48 4 2 3

It follows from (4.8.2) and the definition of matrix multi- It now follows from (4.8.3), the definition of the transpose
plication that p14 is just the (1, 4)th entry in the matrix
(2) of a matrix, and the definition of matrix multiplication
that v2 is the 2nd entry in the vector P t V0 . Indeed,
(1)
P . An induction argument shows that the probability
2

of the cat moving from room i to room j in k steps is it follows by induction that vi is the ith entry in the
(k)

precisely the (i, j)th entry in the matrix P k — which vector (P t )k V0 which answers the first part of Question
answers Question (A). In particular, we can answer the (B).
question: What is the probability that the cat will move We may rephrase the second part of Question (B) as
from room 4 to room 3 in four steps? Using MATLAB the follows. Let
answer is given by typing e4_10_1 to recall the matrix
P and then typing Vk = (v1k , v2k , v3k , v4k )t = (P t )k V0 .

P4 = P^4; Question (B) actually asks: What will the vector Vk look
P4(4,3) like for large k. To answer that question we need some

results about matrices like the matrix P in (4.8.1*). But cat starts in room 1; then the initial distribution of cats
first we explore the answer to this question numerically is one cat in room 1 and zero cats in any of the other
using MATLAB. rooms. So V0 = e1 . In our discussion of Question (B) we
saw that the 3rd entry in (P t )k V0 gives the probability
Suppose, for example, that the initial vector is
ck that the cat will be in room 3 after k steps.
 
2 In the extreme, suppose that the probability that the cat
will be in room 3 is 1 for each step k. Then the fraction
 43 
V0 =  21  .
 (4.8.4*)
of the time that the cat is in room 3 is
34
(1 + 1 + · · · + 1)/100 = 1.
Typing e4_10_1 and e4_10_4 enters the matrix P and
the initial vector V0 into MATLAB. To compute V20 , the In general, the fraction of the time f that the cat will be
distribution of cats after 20 steps, type in room 3 during a span of 100 steps is

Q=P' 1
f= (c1 + c2 + · · · + c100 ).
V20 = Q^(20)*V0 100
Since ck = (P t )k V0 , we see that
and obtain
1
f= (P t V0 + (P t )2 V0 + · · · + (P t )100 V0 ). (4.8.5)
V20 = 100
18.1818
27.2727 So, to answer Question (C), we need a way to sum the
27.2727 expression for f in (4.8.5), at least approximately. This
27.2727 is not an easy task — though the answer itself is easy to
explain. Let V be the eigenvector of P t with eigenvalue
Thus, after rounding to the nearest integer, we expect 1 such that the sum of the entries in V is 1. The answer
27 cats to be in each of rooms 2,3 and 4 and 18 cats to is: f is approximately equal to V . See Theorem 4.8.4 for
be in room 1 after 20 steps. In fact, the vector V20 has a more precise statement.
a remarkable feature. Compute Q*V20 in MATLAB and In our previous calculations the vector V20 was seen to be
see that V20 = P t V20 ; that is, V20 is, to within four digit (approximately) an eigenvector of P t with eigenvalue 1.
numerical precision, an eigenvector of P t with eigenvalue Moreover the sum of the entries in V20 is precisely 100.
equal to 1. This computation was not a numerical acci- Therefore, we normalize V20 to get V by setting
dent, as we now describe. Indeed, compute V20 for several
initial distributions V0 of cats and see that the answer will 1
V = V20 .
always be the same — up to four digit accuracy. 100
So, the fraction of time that the cat spends in room 3 is
A Discussion of Question (C) Suppose there is just one f ≈ 0.2727. Indeed, we expect the cat to spend approxi-
cat in the apartment; and we ask how many times that mately 27% of its time in rooms 2,3,4 and about 18% of
cat is expected to visit room 3 in 100 steps. Suppose the its time in room 1.

Markov Matrices We now abstract the salient proper- Proposition 4.8.2. Let P be a transition matrix for a
ties of our cat example. A Markov chain is a system Markov chain.
with a finite number of states labeled 1,…,n along with
probabilities pij of moving from site i to site j in a single (a) The probability of moving from site i to site j in
step. The Markov assumption is that these probabili- exactly k steps is the (i, j)th entry in the matrix P k .
ties depend only on the site that you are in and not on
how you got there. In our example, we assumed that the (b) The expected number of individuals at site i after
probability of the cat moving from say room 2 to room 4 exactly k steps is the ith entry in the vector Vk ≡
did not depend on how the cat got to room 2 in the first (P t )k V0 .
place. (c) P is a Markov matrix.
We make a second assumption: there is a k such that
it is possible to move from any site i to any site j in Proof Only minor changes in our discussion of the cat
exactly k steps. This assumption is not valid for general example proves parts (a) and (b) of the proposition.
Markov chains, though it is valid for the cat example,
(c) The assumption that it is possible to move from each
since it is possible to move from any room to any other
site i to each site j in exactly k steps means that the
room in that example in exactly three steps. (It takes a
(i, j)th entry of P k is positive. For that k, all of the
minimum of three steps to get from room 3 to room 1 in
entries of P k are positive. In the cat example, all entries
the cat example.) To simplify our discussion we include
of P 3 are positive. 
this assumption in our definition of a Markov chain.
Definition 4.8.1. Markov matrices are square matrices Proposition 4.8.2 gives the answer to Question (A) and
P such that the first part of Question (B) for general Markov chains.
Let vi ≥ 0 be the number of individuals initially at
(0)
(a) all entries in P are nonnegative,
site i, and let V0 = (v1 , . . . , vn(0) )t . The total number of
(0)

(b) the entries in each row of P sum to 1, and individuals in the initial population is:
(c) there is a positive integer k such that all of the en- (0)
#(V0 ) = v1 + · · · + vn(0) .
tries in P k are positive.
Theorem 4.8.3. Let P be a Markov matrix. Then
It is straightforward to verify that parts (a) and (b) in the
definition of Markov matrices are satisfied by the transi- (a) #(Vk ) = #(V0 ); that is, the number of individuals
tion matrix after k time steps is the same as the initial number.
(b) V = lim Vk exists and #(V ) = #(V0 ).
 
p11 · · · p1n
P =  ... .. ..  k→∞
. . 

pn1 ··· pnn (c) V is an eigenvector of P t with eigenvalue equal to 1.

of a Markov chain. To verify part (c) requires further Proof (a) By induction it is sufficient to show that
discussion. #(V1 ) = #(V0 ). We do this by calculating from V1 =

P t V0 that See (4.8.5). The proof of this theorem involves being able
to calculate the limit of fN as N → ∞. There are two
(1)
#(V1 ) = v1 + · · · + vn(1) main ideas. First, the limit of the matrix (P t )N exists
(p11 v1 + · · · + pn1 vn(0) ) + · · · + (p1n v1 + · · · + pnnasvn(0)
N )approaches infinity — call that limit Q. Moreover,
(0) (0)
=
(0)
Q is a matrix all of whose columns equal V . Second, for
= (p11 + · · · + p1n )v1 + · · · + (pn1 + · · · + pnn )vn(0) large N , the sum
(0)
= v1 + · · · + vn(0)
P t + (P t )2 + · · · + (P t )N ≈ Q + Q + · · · + Q = N Q,
since the entries in each row of P sum to 1. Thus #(V1 ) =
so that the limit of the fN is Qei = V .
#(V0 ), as claimed.
The verification of these statements is beyond the scope
(b) The hard part of this theorem is proving that the
of this text. For those interested, the idea of the proof
limiting vector V exists; we give a proof of this fact in
of the second part is roughly the following. Fix k large
Chapter 11, Theorem 11.4.4. Once V exists it follows
enough so that (P t )k is close to Q. Then when N is large,
directly from (a) that #(V ) = #(V0 ).
much larger than k, the sum of the first k terms in the
(c) Just calculate that series is nearly zero. 

P t V = P t ( lim Vk ) = P t ( lim (P t )k V0 ) Theorem 4.8.4 gives the answer to Question (C) for a
k→∞ k→∞
= lim (P )t k+1 t k
V0 = lim (P ) V0 = V, general Markov chain. It follows from Theorem 4.8.4 that
k→∞ k→∞ for Markov chains the amount of time that an individual
spends in room i is independent of the individual’s initial
which proves (c). 
room — at least after a large number of steps.
A complete proof of this theorem relies on a result known
Theorem 4.8.3(b) gives the answer to the second part of
as the ergodic theorem. Roughly speaking, the ergodic
Question (B) for general Markov chains. Next we discuss
theorem relates space averages with time averages. To
Question (C).
see how this point is relevant, note that Question (B)
Theorem 4.8.4. Let P be a Markov matrix. Let V be deals with the issue of how a large number of individuals
the eigenvector of P t with eigenvalue 1 and #(V ) = 1. will be distributed in space after a large number of steps,
Then after a large number of steps N the expected number while Question (C) deals with the issue of how the path
of times an individual will visit site i is N vi where vi is of a single individual will be distributed in time after a
the ith entry in V . large number of steps.

Sketch of Proof In our discussion of Question (C) An Example of Umbrellas This example focuses on the
for the cat example, we explained why the fraction fN utility of answering Question (C) and reinforces the fact
of time that an individual will visit site j when starting that results in Theorem 4.8.3 have the second interpre-
initially at site i is the j th entry in the sum tation given in Theorem 4.8.4.

1 t Consider the problem of a man with four umbrellas. If it


fN =
N
(P + (P t )2 + · · · + (P t )N )ei . is raining in the morning when the man is about to leave

for his office, then the man takes an umbrella from home to office, assuming that he has an umbrella at home. If it is raining in the afternoon, then the man takes an umbrella from office to home, assuming that he has an umbrella in his office. Suppose that the probability that it will rain in the morning is p = 0.2 and the probability that it will rain in the afternoon is q = 0.3, and these probabilities are independent. What percentage of days will the man get wet going from home to office; that is, what percentage of the days will the man be at home on a rainy morning with all of his umbrellas at the office?

There are five states in the system depending on the number of umbrellas that are at home. Let s_i where 0 ≤ i ≤ 4 be the state with i umbrellas at home and 4 − i umbrellas at work. For example, s_2 is the state of having two umbrellas at home and two at the office. Let P be the 5 × 5 transition matrix of state changes from morning to afternoon and Q be the 5 × 5 transition matrix of state changes from afternoon to morning. For example, the probability p_{23} of moving from site s_2 to site s_3 is 0, since it is not possible to have more umbrellas at home after going to work in the morning. The probability q_{23} = q, since the number of umbrellas at home will increase by one only if it is raining in the afternoon. The transition probabilities between all states are given in the following transition matrices:
\[ P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ p & 1-p & 0 & 0 & 0 \\ 0 & p & 1-p & 0 & 0 \\ 0 & 0 & p & 1-p & 0 \\ 0 & 0 & 0 & p & 1-p \end{pmatrix}; \qquad Q = \begin{pmatrix} 1-q & q & 0 & 0 & 0 \\ 0 & 1-q & q & 0 & 0 \\ 0 & 0 & 1-q & q & 0 \\ 0 & 0 & 0 & 1-q & q \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}. \]
Specifically,
\[ P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0.2 & 0.8 & 0 & 0 & 0 \\ 0 & 0.2 & 0.8 & 0 & 0 \\ 0 & 0 & 0.2 & 0.8 & 0 \\ 0 & 0 & 0 & 0.2 & 0.8 \end{pmatrix} \qquad Q = \begin{pmatrix} 0.7 & 0.3 & 0 & 0 & 0 \\ 0 & 0.7 & 0.3 & 0 & 0 \\ 0 & 0 & 0.7 & 0.3 & 0 \\ 0 & 0 & 0 & 0.7 & 0.3 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \tag{4.8.6*} \]
The transition matrix M for moving from state s_i on one morning to state s_j the next morning is just M = PQ. We can compute this matrix using MATLAB by typing

e4_10_6
M = P*Q

obtaining

M =
    0.7000    0.3000         0         0         0
    0.1400    0.6200    0.2400         0         0
         0    0.1400    0.6200    0.2400         0
         0         0    0.1400    0.6200    0.2400
         0         0         0    0.1400    0.8600

It is easy to check using MATLAB that all entries in the matrix M^4 are nonzero. So M is a Markov matrix and we can use Theorem 4.8.4 to find the limiting distribution of states. Start with some initial condition like V0 = (0, 0, 1, 0, 0)^t corresponding to the state in which two umbrellas are at home and two at the office. Then compute the vectors V_k = (M^t)^k V0 until arriving at an eigenvector of M^t with eigenvalue 1. For example, V70 is computed by typing V70 = M'^(70)*V0 and obtaining
V70 =
    0.0419
    0.0898
    0.1537
    0.2633
    0.4512

We interpret V ≈ V70 in the following way. Since v1 is approximately .042, it follows that for approximately 4.2% of all steps the umbrellas are in state s_0. That is, approximately 4.2% of all days there are no umbrellas at home. The probability that it will rain in the morning on one of those days is 0.2. Therefore, the probability of being at home in the morning when it is raining without any umbrellas is approximately 0.008.
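Instead of powering M^t, the limiting vector can also be computed directly as the eigenvector of M^t with eigenvalue 1, normalized so that its entries sum to 1 (see Theorem 4.8.4). The lines below are added here as an illustration and are not part of the original text; they rebuild P and Q so that the snippet is self-contained.

p = 0.2; q = 0.3;
P = [1 0 0 0 0; p 1-p 0 0 0; 0 p 1-p 0 0; 0 0 p 1-p 0; 0 0 0 p 1-p];
Q = [1-q q 0 0 0; 0 1-q q 0 0; 0 0 1-q q 0; 0 0 0 1-q q; 0 0 0 0 1];
M = P*Q;
[W, D] = eig(M');                % eigenvectors and eigenvalues of M transpose
[~, k] = min(abs(diag(D) - 1));  % locate the eigenvalue closest to 1
V = W(:,k)/sum(W(:,k))           % normalize so the entries sum to 1; V matches V70
0.2*V(1)                         % probability of a wet walk to work, about 0.008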

Exercises

Figure 17: State diagram of a Markov chain.

6. Suppose that P and Q are each n × n matrices whose rows


1. Let P be a Markov matrix and let w = (1, . . . , 1)t . Show sum to 1. Show that P Q is also an n × n matrix whose rows
that the vector w is an eigenvector of P with eigenvalue 1. sum to 1.

In Exercises 2 – 4 which of the matrices are Markov matrices,


and why? 7. (matlab) Suppose the apartment in Figure 16 is popu-
  lated by dogs rather than cats. Suppose that dogs will ac-
0.8 0.2 tually move when told; that is, at each step a dog will move
2. P = .
0.2 0.8 from the room that he occupies to another room.
 
0.8 0.2 (a) Calculate the transition matrix PDOG for this Markov
3. Q = .
0 1 chain and verify that PDOG is a Markov matrix.
(b) Find the probability that a dog starting in room 2 will
 
0.8 0.2
4. R = .
−0.2 1.2 end up in room 3 after 5 steps.
(c) Find the probability that a dog starting in room 3 will
end up in room 1 after 4 steps. Explain why your answer
5. The state diagram of a Markov chain is given in Figure 17. is correct without using MATLAB.
Assume that each arrow leaving a state has equal probability
(d) Suppose that the initial population consists of 100 dogs.
of being chosen. Find the transition matrix for this chain.
After a large number of steps what will be the distribu-
tion of the dogs in the four rooms.

8. (matlab) A truck rental company has locations in three 11. (matlab) Suppose that the original man in the text with
cities A, B and C. Statistically, the company knows that the umbrellas has only three umbrellas instead of four. What is
trucks rented at one location will be returned in one week to the probability that on a given day he will get wet going to
the three locations in the following proportions. work?

Rental Location Returned to A Returned to B Returned to C


A 75% 10% 15%
B 5% 85% 10%
C 20% 20% 60%

Suppose that the company has 250 trucks. How should the
company distribute the trucks so that the number of trucks
available at each location remains approximately constant
from one week to the next?

9. (matlab) Let
 
0.10 0.20 0.30 0.15 0.25
 0.05 0.35 0.10 0.40 0.10 
(4.8.7*)
 
P = 0 0 0.35 0.55 0.10 

 0.25 0.25 0.25 0.25 0 
0.33 0.32 0 0 0.35

be the transition matrix of a Markov chain.

(a) What is the probability that an individual at site 2 will


move to site 5 in three steps?
(b) What is the probability that an individual at site 4 will
move to site 1 in seven steps?
(c) Suppose that 100 individuals are initially uniformly dis-
tributed at the five sites. How will the individuals be
distributed after four steps?
(d) Find an eigenvector of P t with eigenvalue 1.

10. (matlab) Suppose that the probability that it will rain


in the morning in p = 0.3 and the probability that it will
rain in the afternoon is q = 0.25. In the man with umbrellas
example, what is the probability that the man will be at home
with no umbrellas while it is raining?

5 Vector Spaces gives a computable method for determining when a set is


a basis, is given in Section 5.6. This proof may be omit-
In Chapter 2 we discussed how to solve systems of m lin- ted on a first reading, but the statement of the theorem
ear equations in n unknowns. We found that solutions of is most important and must be understood.
these equations are vectors (x1 , . . . , xn ) ∈ Rn . In Chap-
ter 3 we discussed how the notation of matrices and ma-
trix multiplication drastically simplifies the presentation
of linear systems and how matrix multiplication leads to
linear mappings. We also discussed briefly how linear
mappings lead to methods for solving linear systems —
superposition, eigenvectors, inverses. In Chapter 4 we
discussed how to solve systems of n linear differential
equations in n unknown functions. These chapters have
provided an introduction to many of the ideas of linear
algebra and now we begin the task of formalizing these
ideas.
Sets having the two operations of vector addition and
scalar multiplication are called vector spaces. This con-
cept is introduced in Section 5.1 along with the two pri-
mary examples — the set Rn in which solutions to sys-
tems of linear equations sit and the set C 1 of differentiable
functions in which solutions to systems of ordinary differ-
ential equations sit. Solutions to systems of homogeneous
linear equations form subspaces of Rn and solutions of
systems of linear differential equations form subspaces of
C 1 . These issues are discussed in Sections 5.1 and 5.2.
When we solve a homogeneous system of equations, we
write every solution as a superposition of a finite num-
ber of specific solutions. Abstracting this process is one
of the main points of this chapter. Specifically, we show
that every vector in many commonly occurring vector
spaces (in particular, the subspaces of solutions) can be
written as a linear combination (superposition) of a few
solutions. The minimum number of solutions needed is
called the dimension of that vector space. Sets of vectors
that generate all solutions by superposition and that con-
sist of that minimum number of vectors are called bases.
These ideas are discussed in detail in Sections 5.3–5.5.
The proof of the main theorem (Theorem 5.5.3), which

5.1 Vector Spaces and Subspaces We can also multiply a function f by a scalar c ∈ R by
defining the function cf to be:
Vector spaces abstract the arithmetic properties of addi-
tion and scalar multiplication of vectors. In Rn we know (cf )(t) = cf (t).
how to add vectors and to multiply vectors by scalars. In-
deed, it is straightforward to verify that each of the eight With these operations of addition and scalar multiplica-
properties listed in Table 1 is valid for vectors in V = Rn . tion, F is a vector space; that is, F satisfies the eight
Remarkably, sets that satisfy these eight properties have vector space properties in Table 1. More precisely:
much in common with Rn . So we define:
(A3) Define the zero function O by
Definition 5.1.1. Let V be a set having the two oper-
ations of addition and scalar multiplication. Then V is O(t) = 0 for all t ∈ R.
a vector space if the eight properties listed in Table 5.1.1
hold. The elements of a vector space are called vectors. For every x in F the function O satisfies:
(x + O)(t) = x(t) + O(t) = x(t) + 0 = x(t).
The vector 0 mentioned in (A3) in Table 1 is called the
zero vector. Therefore, x + O = x and O is the additive identity
When we say that a vector space V has the two opera- in F.
tions of addition and scalar multiplication we mean that (A4) Let x be a function in F and define y(t) = −x(t).
the sum of two vectors in V is again a vector in V and Then y is also a function in F, and
the scalar product of a vector with a number is again
a vector in V . These two properties are called closure (x + y)(t) = x(t) + y(t) = x(t) + (−x(t)) = 0 = O(t).
under addition and closure under scalar multiplication.
Thus, x has the additive inverse −x.
In this discussion we focus on just two types of vector
spaces: Rn and function spaces. The reason that we After these comments it is straightforward to verify that
make this choice is that solutions to linear equations are the remaining six properties in Table 1 are satisfied by
vectors in Rn while solutions to linear systems of differ- functions in F.
ential equations are vectors of functions.
Sets that are not Vector Spaces It is worth considering
An Example of a Function Space For example, let F de- how closure under vector addition and scalar multiplica-
note the set of all functions f : R → R. Note that func- tion can fail. Consider the following three examples.
tions like f1 (t) = t2 − 2t + 7 and f2 (t) = sin t are in F
since they are defined for all real numbers t, but that (i) Let V1 be the set that consists of just the x and y
functions like g1 (t) =
1
and g2 (t) = tan t are not in F axes in the plane. Since (1, 0) and (0, 1) are in V1
t but
since they are not defined for all t.
(1, 0) + (0, 1) = (1, 1)
We can add two functions f and g by defining the func-
is not in V1 , we see that V1 is not closed under vector
tion f + g to be:
addition. On the other hand, V1 is closed under
(f + g)(t) = f (t) + g(t). scalar multiplication.

Table 1: Properties of Vector Spaces: suppose u, v, w ∈ V and r, s ∈ R.

(A1) Addition is commutative v+w =w+v


(A2) Addition is associative (u + v) + w = u + (v + w)
(A3) Additive identity 0 exists v+0=v
(A4) Additive inverse −v exists v + (−v) = 0
(M1) Multiplication is associative (rs)v = r(sv)
(M2) Multiplicative identity exists 1v = v
(D1) Distributive law for scalars (r + s)v = rv + sv
(D2) Distributive law for vectors r(v + w) = rv + rw

(ii) Let V2 be the set of all vectors (k, `) ∈ R2 where The x-axis and the xz-plane are examples of subsets
k and ` are integers. The set V2 is closed under of R3 that are closed under addition and closed under
addition but not under scalar multiplication since scalar multiplication. Every vector on the x-axis has the
1 1
(1, 0) = ( , 0) is not in V2 . form (a, 0, 0) ∈ R3 . The sum of two vectors (a, 0, 0) and
2 2 (b, 0, 0) on the x-axis is (a + b, 0, 0) which is also on the
(iii) Let V3 = [1, 2] be the closed interval in R. The set V3 x-axis. The x-axis is also closed under scalar multiplica-
is neither closed under addition (1 + 1.5 = 2.5 6∈ V3 ) tion as r(a, 0, 0) = (ra, 0, 0), and the x-axis is a subspace
nor under scalar multiplication (4 · 1.5 = 6 6∈ V3 ). of R3 . Similarly, every vector in the xz-plane in R3 has
Hence the set V3 is not closed under vector addition the form (a1 , 0, a3 ). As in the case of the x-axis, it is easy
and not closed under scalar multiplication. to verify that this set of vectors is closed under addition
and scalar multiplication. Thus, the xz-plane is also a
subspace of R3 .
Subspaces
In Theorem 5.1.4 we show that every subset of a vector
Definition 5.1.2. Let V be a vector space. A nonempty space that is closed under addition and scalar multiplica-
subset W ⊂ V is a subspace if W is a vector space us- tion is a subspace. To verify this statement, we need the
ing the operations of addition and scalar multiplication following lemma in which some special notation is used.
defined on V . Typically, we use the same notation 0 to denote the real
number zero and the zero vector. In the following lemma
Note that in order for a subset W of a vector space V to it is convenient to distinguish the two different uses of 0,
be a subspace it must be closed under addition and closed and we write the zero vector in boldface.
under scalar multiplication. That is, suppose w1 , w2 ∈ W Lemma 5.1.3. Let V be a vector space, and let 0 ∈ V
and r ∈ R. Then be the zero vector. Then

(i) w1 + w2 ∈ W , and 0v = 0 and (−1)v = −v

(ii) rw1 ∈ W . for every vector in v ∈ V .

Proof Let v be a vector in V and use (D1) to compute Theorem 5.1.4. Let W be a subset of the vector space
V . If W is closed under addition and closed under scalar
0v + 0v = (0 + 0)v = 0v. multiplication, then W is a subspace.
By (A4) the vector 0v has an additive inverse −0v.
Adding −0v to both sides yields
Proof We have to show that W is a vector space us-
(0v + 0v) + (−0v) = 0v + (−0v) = 0. ing the operations of addition and scalar multiplication
defined on V . That is, we need to verify that the eight
Associativity of addition (A2) now implies properties listed in Table 1 are satisfied. Note that prop-
0v + (0v + (−0v)) = 0. erties (A1), (A2), (M1), (M2), (D1), and (D2) are valid
for vectors in W since they are valid for vectors in V .
A second application of (A4) implies that
It remains to verify (A3) and (A4). Let w ∈ W be any
0v + 0 = 0 vector. Since W is closed under scalar multiplication, it
follows that 0w and (−1)w are in W . Lemma 5.1.3 states
and (A3) implies that 0v = 0. that 0w = 0 and (−1)w = −w; it follows that 0 and −w
Next, we show that the additive inverse −v of a vector v are in W . Hence, properties (A3) and (A4) are valid for
is unique. That is, if v + a = 0, then a = −v. vectors in W , since they are valid for vectors in V . 
Before beginning the proof, note that commutativity of
addition (A1) together with (A3) implies that 0 + v = v. Examples of Subspaces of Rn
Similarly, (A1) and (A4) imply that −v + v = 0.
Example 5.1.5. (a) Let V be a vector space. Then
To prove uniqueness of additive inverses, add −v to both the subsets V and {0} are always subspaces of V . A
sides of the equation v + a = 0 yielding subspace W ⊂ V is proper if W 6= 0 and W 6= V .
−v + (v + a) = −v + 0. (b) Lines through the origin are subspaces of Rn . Let
Properties (A2) and (A3) imply w ∈ Rn be a nonzero vector and let W = {rw :
r ∈ R}. The set W is closed under addition and
(−v + v) + a = −v. scalar multiplication and is a subspace of Rn by The-
orem 5.1.4. The subspace W is just a line through
But the origin in Rn , since the vector rw points in the
(−v + v) + a = 0 + a = a. same direction as w when r > 0 and the exact op-
Therefore a = −v, as claimed. posite direction when r < 0.
To verify that (−1)v = −v, we show that (−1)v is the (c) Planes containing the origin are subspaces of R3 . To
additive inverse of v. Using (M1), (D1), and the fact that verify this point, let P be a plane through the origin
0v = 0, calculate and let N be a vector perpendicular to P . Then P
v + (−1)v = 1v + (−1)v = (1 − 1)v = 0v = 0. consists of all vectors v ∈ R3 perpendicular to N ;
using the dot-product (see Chapter 2, (2.2.3)) we
Thus, (−1)v is the additive inverse of v and must equal recall that such vectors satisfy the linear equation
−v, as claimed.  N · v = 0. By superposition, the set of all solutions

to this equation is closed under addition and scalar (iii) u(t) = csc(t) is neither defined nor continuous at
multiplication and is therefore a subspace by Theo- t = kπ for any integer k.
rem 5.1.4.
The subset C 1 ⊂ F is a subspace and hence a vector
In a sense that will be made precise all subspaces of Rn space. The reason is simple. If x(t) and y(t) are contin-
can be written as the span of a finite number of vectors uously differentiable, then
generalizing Example 5.1.5(b) or as solutions to a system d dx dy
of linear equations generalizing Example 5.1.5(c). dt
(x + y) =
dt
+
dt
.

Hence x + y is differentiable and is in C 1 and C 1 is closed


Examples of Subspaces of the Function Space F Let P be under addition. Similarly, C 1 is closed under scalar mul-
the set of all polynomials in F. The sum of two polyno- tiplication. Let r ∈ R and let x ∈ C 1 . Then
mials is a polynomial and the scalar multiple of a poly- d dx
nomial is a polynomial. Thus, P is closed under addition (rx)(t) = r (t).
dt dt
and scalar multiplication, and P is a subspace of F.
Hence rx is differentiable and is in C 1 .
As a second example of a subspace of F, let C 1 be the set
of all continuously differentiable functions u : R → R. A
function u is in C 1 if u and u0 exist and are continuous The Vector Space (C 1 )n Another example of a vector
for all t ∈ R. Examples of functions in C 1 are: space that combines the features of both Rn and C 1 is
(C 1 )n . Vectors u ∈ (C 1 )n have the form
(i) Every polynomial p(t) = am tm + am−1 tm−1 + · · · +
u(t) = (u1 (t), . . . , un (t)),
a1 t + a0 is in C 1 .
where each coordinate function uj (t) ∈ C 1 . Addition
(ii) The function u(t) = eλt is in C 1 for each constant and scalar multiplication in (C 1 )n are defined coordinate-
λ ∈ R. wise — just like addition and scalar multiplication in Rn .
(iii) The trigonometric functions u(t) = sin(λt) and That is, let u, v be in (C 1 )n and let r be in R, then
v(t) = cos(λt) are in C 1 for each constant λ ∈ R. (u + v)(t) = (u1 (t) + v1 (t), . . . , un (t) + vn (t))
(ru)(t) = (ru1 (t), . . . , run (t)).
(iv) u(t) = t 7/3
is twice differentiable everywhere and is The set (C 1 )n satisfies the eight properties of vector
in C 1 . spaces and is a vector space. Solutions to systems of n
linear ordinary differential equations are vectors in (C 1 )n .
Equally there are many commonly used functions that
are not in C 1 . Examples include:
Exercises
1
(i) u(t) = is neither defined nor continuous at
t−5
t = 5.
1. Verify that the set V1 consisting of all scalar multiples of
(ii) u(t) = |t| is not differentiable (at t = 0). (1, −1, −2) is a subspace of R3 .

2. Let V2 be the set of all 2 × 3 matrices. Verify that V2 is a 14. S = {x ∈ R2 : Ax = 0} where A is a 3 × 2 matrix.
vector space. 15. S = {x ∈ R2 : Ax = b} where A is a 3 × 2 matrix and
b ∈ R3 is a fixed nonzero vector.
3. Let
16. Let V be a vector space and let W1 and W2 be subspaces.
 
1 1 0
A= .
1 −1 1 Show that the intersection W1 ∩ W2 is also a subspace of V .
Let V3 be the set of vectors x ∈ R3 such that Ax = 0. Verify
that V3 is a subspace of R3 . Compare V1 with V3 . 17. For which scalars a, b, c do the solutions to the equation
ax + by = c
In Exercises 4 – 10 you are given a vector space V and a subset form a subspace of R ?2

W . For each pair, decide whether or not W is a subspace of


V , and explain why. 18. For which scalars a, b, c, d do the solutions to the equation
4. V = R and W consists of vectors in R that have a 0 in
3 3
ax + by + cz = d
their first component. form a subspace of R ?3

5. V = R and W consists of vectors in R that have a 1 in


3 3

their first component. 19. Show that the set of all solutions to the differential equa-
tion ẋ = 2x is a subspace of C 1 .
6. V = R2 and W consists of vectors in R2 for which the sum
of the components is 1.
20. Recall from equation (4.5.6) of Section 4.5 that solutions
7. V = R2 and W consists of vectors in R2 for which the sum to the system of differential equations
of the components is 0. dX

−1 3

= X
8. V = C 1 and W consists of functions x(t) ∈ C 1 satisfying dt 3 −1
are
Z 4
x(t)dt = 0.
   
1 1
−2 X(t) = αe2t + βe−4t .
1 −1
9. V = C 1 and W consists of functions x(t) ∈ C 1 satisfying Use this formula for solutions to show that the set of solutions
x(1) = 0. to this system of differential equations is a vector subspace of
(C 1 )2 .
10. V = C 1 and W consists of functions x(t) ∈ C 1 satisfying
x(1) = 1.
21. Let V = R+ = {x ∈ R : x > 0}. Show that V is a vector
space under the operations of ‘addition’ (⊕)
In Exercises 11 – 15 which of the sets S are subspaces? v ⊕ w = vw for all v, w ∈ V
11. S = {(a, b, c) ∈ R : a ≥ 0, b ≥ 0, c ≥ 0}.
3
and ‘scalar multiplication’ (⊗)
12. S = {(x1 , x2 , x3 ) ∈ R : a1 x1 + a2 x2 + a3 x3 =
3
r ⊗ v = vr for all v ∈ V and r ∈ R
0 where a1 , a2 , a3 ∈ R are fixed}. Hints: The additive identity is v = 1; the additive inverse is
1
13. S = {(x, y) ∈ R2 : ; and the multiplicative identity is r = 1.
(x, y) is on the line through (1, 1) with slope 1}. v

131
§5.2 Construction of Subspaces

5.2 Construction of Subspaces We will see later that a solution to (5.2.2) has coordinate
functions xj (t) in C 1 . The principle of superposition then
The principle of superposition shows that the set of all
shows that W is a subspace of (C 1 )n . Suppose x(t) and
solutions to a homogeneous system of linear equations is
y(t) are solutions of (5.2.2). Then
closed under addition and scalar multiplication and is a
subspace. Indeed, there are two ways to describe sub- d dx dy
spaces: first as solutions to linear systems, and second as (x(t)+y(t)) = (t)+ (t) = Cx(t)+Cy(t) = C(x(t)+y(t));
dt dt dt
the span of a set of vectors. We shall see that solving a
homogeneous linear system of equations just means writ- so x(t) + y(t) is a solution of (5.2.2) and in W . A similar
ing the solution set as the span of a finite set of vectors. calculation shows that rx(t) is also in W and that W ⊂
(C 1 )n is a subspace.
Solutions to Homogeneous Systems Form Subspaces
Definition 5.2.1. Let A be an m × n matrix. The null Writing Solution Subspaces as a Span The way we
space of A is the set of solutions to the homogeneous solve homogeneous systems of equations gives a second
system of linear equations method for defining subspaces. For example, consider
the system
Ax = 0. (5.2.1)
Ax = 0,
Lemma 5.2.2. Let A be an m × n matrix. Then the null where
space of A is a subspace of Rn . 
2 1 4 0

A= .
−1 0 2 1
Proof Suppose that x and y are solutions to (5.2.1).
Then The matrix A is row equivalent to the reduced echelon
A(x + y) = Ax + Ay = 0 + 0 = 0; form matrix
so x + y is a solution of (5.2.1). Similarly, for r ∈ R 
1 0 −2 −1

E= .
A(rx) = rAx = r0 = 0; 0 1 8 2

so rx is a solution of (5.2.1). Thus, x + y and rx are Therefore x = (x1 , x2 , x3 , x4 ) is a solution of Ex = 0 if


in the null space of A, and the null space is closed under and only if x1 = 2x3 +x4 and x2 = −8x3 −2x4 . It follows
addition and scalar multiplication. So Theorem 5.1.4 im- that every solution of Ex = 0 can be written as:
plies that the null space is a subspace of the vector space
Rn . 
   
2 1
 −8   −2 
x = x3 
 1  + x4  0  .
  
Solutions to Linear Systems of Differential Equations Form
Subspaces Let C be an n×n matrix and let W be the set 0 1
of solutions to the linear system of ordinary differential
Since row operations do not change the set of solutions,
equations
dx it follows that every solution of Ax = 0 has this form.
(t) = Cx(t). (5.2.2) We have also shown that every solution is generated by
dt

132
§5.2 Construction of Subspaces

two vectors by use of vector addition and scalar multiplication. We say that this subspace is spanned by the two vectors
(2, −8, 1, 0)^t and (1, −2, 0, 1)^t.
For example, a calculation verifies that the vector
(−1, −2, 1, −3)^t
is also a solution of Ax = 0. Indeed, we may write it as
(−1, −2, 1, −3)^t = (2, −8, 1, 0)^t − 3 (1, −2, 0, 1)^t.  (5.2.3)

Spans  Let v1, . . . , vk be a set of vectors in a vector space V. A vector v ∈ V is a linear combination of v1, . . . , vk if
v = r1v1 + · · · + rkvk
for some scalars r1, . . . , rk.

Definition 5.2.3. The set of all linear combinations of the vectors v1, . . . , vk in a vector space V is the span of v1, . . . , vk and is denoted by span{v1, . . . , vk}.

For example, the vector on the left hand side in (5.2.3) is a linear combination of the two vectors on the right hand side.

The simplest example of a span is R^n itself. Let vj = ej where ej ∈ R^n is the vector with a 1 in the jth coordinate and 0 in all other coordinates. Then every vector x = (x1, . . . , xn) ∈ R^n can be written as
x = x1e1 + · · · + xnen.
It follows that
R^n = span{e1, . . . , en}.
Similarly, the set span{e1, e3} ⊂ R^3 is just the x1x3-plane, since vectors in this span are
x1e1 + x3e3 = x1(1, 0, 0) + x3(0, 0, 1) = (x1, 0, x3).

Proposition 5.2.4. Let V be a vector space and let w1, . . . , wk ∈ V. Then W = span{w1, . . . , wk} ⊂ V is a subspace.

Proof  Suppose x, y ∈ W. Then
x = r1w1 + · · · + rkwk
y = s1w1 + · · · + skwk
for some scalars r1, . . . , rk and s1, . . . , sk. It follows that
x + y = (r1 + s1)w1 + · · · + (rk + sk)wk
and
rx = (rr1)w1 + · · · + (rrk)wk
are both in span{w1, . . . , wk}. Hence W ⊂ V is closed under addition and scalar multiplication, and is a subspace by Theorem 5.1.4. □

For example, let
v = (2, 1, 0) and w = (1, 1, 1)  (5.2.4)
be vectors in R^3. Then linear combinations of the vectors v and w have the form
αv + βw = (2α + β, α + β, β)
for real numbers α and β. Note that every one of these vectors is a solution to the linear equation
x1 − 2x2 + x3 = 0,  (5.2.5)


that is, the 1st coordinate minus twice the 2nd coordinate plus the 3rd coordinate equals zero. Moreover, you may verify that every solution of (5.2.5) is a linear combination of the vectors v and w in (5.2.4). Thus, the set of solutions to the homogeneous linear equation (5.2.5) is a subspace, and that subspace can be written as the span of all linear combinations of the vectors v and w.

In this language we see that the process of solving a homogeneous system of linear equations is just the process of finding a set of vectors that span the subspace of all solutions. Indeed, we can now restate Theorem 2.4.6 of Chapter 2. Recall that a matrix A has rank ℓ if it is row equivalent to a matrix in echelon form with ℓ nonzero rows.

Proposition 5.2.5. Let A be an m × n matrix with rank ℓ. Then the null space of A is the span of n − ℓ vectors.

We have now seen that there are two ways to describe subspaces — as solutions of homogeneous systems of linear equations and as a span of a set of vectors, the spanning set. Much of linear algebra is concerned with determining how one goes from one description of a subspace to the other.

Exercises

In Exercises 1 – 4 a single equation in three variables is given. For each equation write the subspace of solutions in R^3 as the span of two vectors in R^3.

1. 4x − 2y + z = 0.

2. x − y + 3z = 0.

3. x + y + z = 0.

4. y = z.

In Exercises 5 – 8 each of the given matrices is in reduced echelon form. Write solutions of the corresponding homogeneous system of linear equations as a span of vectors.

5. A = \begin{pmatrix} 1 & 2 & 0 & 1 & 0 \\ 0 & 0 & 1 & 4 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}.

6. B = \begin{pmatrix} 1 & 3 & 0 & 5 \\ 0 & 0 & 1 & 2 \end{pmatrix}.

7. A = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \end{pmatrix}.

8. B = \begin{pmatrix} 1 & -1 & 0 & 5 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 2 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{pmatrix}.

9. Write a system of two linear equations of the form Ax = 0 where A is a 2 × 4 matrix whose subspace of solutions in R^4 is the span of the two vectors
v1 = (1, −1, 0, 0)^t and v2 = (0, 0, 1, −1)^t.

For each subset of a vector space given in Exercises 10 – 13 determine whether the subset is a vector subspace and, if it is a vector subspace, find the smallest number of vectors that spans the space.

10. S = {p(t) ∈ P5 : p(2) = 0 and p′(1) = 0}

11. T = symmetric 2 × 2 matrices. That is, T is the set of 2 × 2 matrices A so that A = A^t.

12. U = 2 × 3 matrices in reduced row-echelon form

13. Let A be the 3 × 4 matrix
A = \begin{pmatrix} 1 & 2 & 1 & 2 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 \end{pmatrix}
and let
V = {y ∈ R^3 : there exists x ∈ R^4 such that Ax = y}
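For the matrix exercises above (5 – 8 and 13), one way to check a hand-computed spanning set is to ask MATLAB for a rational basis of the null space. This is only an aside, not part of the exercise set; the sketch below uses the matrix of Exercise 5.

A = [1 2 0 1 0; 0 0 1 4 0; 0 0 0 0 1];
null(A,'r')   % each column is a solution of Ax = 0 obtained by setting
              % one free variable equal to 1; the columns span the null space

Here the columns returned correspond to the free variables x2 and x4 of the reduced echelon form.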


14. Write the matrix A = \begin{pmatrix} 2 & 2 \\ -3 & 0 \end{pmatrix} as a linear combination of the matrices
B = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} and C = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.

22. Let V be a vector space and let v, w ∈ V be vectors. Show that
span{v, w} = span{v, w, v + 3w}.
23. Let W = span{w1 , . . . , wk } be a subspace of the vector
space V and let wk+1 ∈ W be another vector. Prove that
15. Is (2, 20, 0) in the span of w1 = (1, 1, 3) and w2 = (1, 4, 2)? W = span{w1 , . . . , wk+1 }.
Answer this question by setting up a system of linear equa-
tions and solving that system by row reducing the associated
augmented matrix. 24. Let W ⊂ Rn be a k-dimensional subspace where k < n.
Define

In Exercises 16 – 19 let W ⊂ C 1 be the subspace spanned V = {v ∈ Rn : v · w = 0 for all w ∈ W }


by the two polynomials x1 (t) = 1 and x2 (t) = t2 . For the
given function y(t) decide whether or not y(t) is an element Show that V is a subspace of Rn .
of W . Furthermore, if y(t) ∈ W , determine whether the set
{y(t), x2 (t)} is a spanning set for W .
25. Let Ax = b be a system of m linear equations in n un-
16. y(t) = 1 − t2 , knowns, and let r = rank(A) and s = rank(A|b). Suppose
that this system has a unique solution. What can you say
17. y(t) = t4 ,
about the relative magnitudes of m, n, r, s?
18. y(t) = sin t,

19. y(t) = 0.5t2

20. Let W ⊂ R4 be the subspace that is spanned by the


vectors

w1 = (−1, 2, 1, 5) and w2 = (2, 1, 3, 0).

Find a linear system of two equations such that W =


span{w1 , w2 } is the set of solutions of this system.

21. Let V be a vector space and let v ∈ V be a nonzero


vector. Show that

span{v, v} = span{v}.
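A MATLAB sketch related to Exercise 20 (an aside, not part of the exercise): the rows of the desired coefficient matrix must be perpendicular to w1 and w2, so they can be read off from the null space of the matrix whose rows are w1 and w2.

w1 = [-1 2 1 5]; w2 = [2 1 3 0];
A = null([w1; w2], 'r')'   % the two rows of A give a linear system Ax = 0
A*[w1' w2']                % the zero matrix, so w1 and w2 satisfy both equations

Since A has rank two, the solution set of Ax = 0 is two dimensional and therefore equals span{w1, w2}.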


5.3 Spanning Sets and MATLAB

In this section we discuss:

• how to find a spanning set for the subspace of solutions to a homogeneous system of linear equations using the MATLAB command null, and

• how to determine when a vector is in the subspace spanned by a set of vectors using the MATLAB command rref.

Spanning Sets for Homogeneous Linear Equations  In Chapter 2 we saw how to use Gaussian elimination, back substitution, and MATLAB to compute solutions to a system of linear equations. For systems of homogeneous equations, MATLAB provides a command to find a spanning set for the subspace of solutions. That command is null. For example, if we type

A = [2 1 4 0; -1 0 2 1]
B = null(A)

then we obtain

B =
    0.4830         0
   -0.4140    0.8729
   -0.1380   -0.2182
    0.7591    0.4364

The two columns of the matrix B span the set of solutions of the equation Ax = 0. In particular, the vector (2, −8, 1, 0) is a solution to Ax = 0 and is therefore a linear combination of the column vectors of B. Indeed, type

4.1404*B(:,1)-7.2012*B(:,2)

and observe that this linear combination is the desired one.

Next we describe how to find the coefficients 4.1404 and -7.2012 by showing that these coefficients themselves are solutions to another system of linear equations.

When is a Vector in a Span?  Let w1, . . . , wk and v be vectors in R^n. We now describe a method that allows us to decide whether v is in span{w1, . . . , wk}. To answer this question one has to solve a system of n linear equations in k unknowns. The unknowns correspond to the coefficients in the linear combination of the vectors w1, . . . , wk that gives v.

Let us be more precise. The vector v is in span{w1, . . . , wk} if and only if there are constants r1, . . . , rk such that the equation
r1w1 + · · · + rkwk = v  (5.3.1)
is valid. Define the n × k matrix A as the one having w1, . . . , wk as its columns; that is,
A = (w1 | · · · | wk).  (5.3.2)
Let r be the k-vector
r = (r1, . . . , rk)^t.
Then we may rewrite equation (5.3.1) as
Ar = v.  (5.3.3)
To summarize:

Lemma 5.3.1. Let w1, . . . , wk and v be vectors in R^n. Then v is in span{w1, . . . , wk} if and only if the system of linear equations (5.3.3) has a solution where A is the n × k matrix defined in (5.3.2).
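Those two coefficients can also be recovered directly rather than by hand. The following sketch is an aside, not part of the original example; it uses the fact that the backslash operator solves the linear system B*c = v.

A = [2 1 4 0; -1 0 2 1];
B = null(A);
v = [2; -8; 1; 0];    % a known solution of Ax = 0
c = B \ v             % returns approximately 4.1404 and -7.2012
B*c - v               % essentially zero, confirming the linear combination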


To solve (5.3.3) we row reduce the augmented matrix (A|v). For example, is v = (2, 1) in the span of w1 = (1, 1) and w2 = (1, −1)? That is, do there exist scalars r1, r2 such that
r1 (1, 1)^t + r2 (1, −1)^t = (2, 1)^t?
As noted, we can rewrite this equation as
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} r1 \\ r2 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}.
We can solve this equation by row reducing the augmented matrix
\begin{pmatrix} 1 & 1 & 2 \\ 1 & -1 & 1 \end{pmatrix}
to obtain
\begin{pmatrix} 1 & 0 & 3/2 \\ 0 & 1 & 1/2 \end{pmatrix}.
So v = (3/2) w1 + (1/2) w2.

Row reduction to reduced echelon form has been preprogrammed in the MATLAB command rref. Consider the following example. Let
w1 = (2, 0, −1, 4) and w2 = (2, −1, 0, 2)  (5.3.4)
and ask the question whether v = (−2, 4, −3, 4) is in span{w1, w2}.

In MATLAB load the matrix A having w1 and w2 as its columns and the vector v by typing e5_3_5
A = \begin{pmatrix} 2 & 2 \\ 0 & -1 \\ -1 & 0 \\ 4 & 2 \end{pmatrix} and v = (−2, 4, −3, 4)^t.  (5.3.5*)
We can solve the system of equations using MATLAB. First, form the augmented matrix by typing

aug = [A v]

Then solve the system by typing rref(aug) to obtain

ans =
     1     0     3
     0     1    -4
     0     0     0
     0     0     0

It follows that (r1, r2) = (3, −4) is a solution and v = 3w1 − 4w2.

Now we change the 4th entry in v slightly by typing v(4) = 4.01. There is no solution to the system of equations
Ar = (−2, 4, −3, 4.01)^t
as we now show. Type

aug = [A v]
rref(aug)

which yields

ans =
     1     0     0
     0     1     0
     0     0     1
     0     0     0

This matrix corresponds to an inconsistent system; thus v is no longer in the span of w1 and w2.

Exercises

In Exercises 1 – 3 use the null command in MATLAB to find all the solutions of the linear system of equations Ax = 0.


1. (matlab)
 
−4 0 −4 3
A= (5.3.6*)
−4 1 −1 1

2. (matlab)  
1 2
A= 1 0  (5.3.7*)
3 −2

3. (matlab)  
1 1 2
A= . (5.3.8*)
−1 2 −1

4. (matlab) Use the null command in MATLAB to verify


your answers to Exercises 5 and 6.

5. (matlab) Use row reduction to find the solutions to Ax =


0 where A is given in (5.3.6*). Does your answer agree with
the MATLAB answer using null? If not, explain why.

In Exercises 6 – 8 let W ⊂ R5 be the subspace spanned by


the vectors

w1 = (2, 0, −1, 3, 4), w2 = (1, 0, 0, −1, 2),


w3 = (0, 1, 0, 0, −1).
(5.3.9*)
Use MATLAB to decide whether the given vectors are ele-
ments of W .

6. (matlab) v1 = (2, 1, −2, 8, 3).

7. (matlab) v2 = (−1, 12, 3, −14, −1).

8. (matlab) v3 = (−1, 12, 3, −14, −14).
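For Exercises 6 – 8 the computation follows the pattern in the text. As a sketch (an aside, not part of the exercises), the membership test can also be phrased with rank, since v lies in W exactly when adjoining v to the spanning set does not increase the rank.

w1 = [2; 0; -1; 3; 4]; w2 = [1; 0; 0; -1; 2]; w3 = [0; 1; 0; 0; -1];
A = [w1 w2 w3];
v1 = [2; 1; -2; 8; 3];        % the vector from Exercise 6
rank([A v1]) == rank(A)       % returns true exactly when v1 lies in W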


5.4 Linear Dependence and Linear Since linear independence means not linearly dependent,
Lemma 5.4.2 can be rewritten as:
Independence
An important question in linear algebra concerns finding Lemma 5.4.4. The set of vectors {w1 , . . . , wk } is lin-
spanning sets for subspaces having the smallest number early independent if and only if whenever
of vectors. Let w1 , . . . , wk be vectors in a vector space
V and let W = span{w1 , . . . , wk }. Suppose that W is r1 w1 + · · · + rk wk = 0,
generated by a subset of these k vectors. Indeed, sup-
it follows that
pose that the k th vector is redundant in the sense that
W = span{w1 , . . . , wk−1 }. Since wk ∈ W , this is possi- r1 = r2 = · · · = rk = 0.
ble only if wk is a linear combination of the k − 1 vectors
w1 , . . . , wk−1 ; that is, only if
Let ej be the vector in Rn whose j th component is 1 and
wk = r1 w1 + · · · + rk−1 wk−1 . (5.4.1) all of whose other components are 0. The set of vectors
e1 , . . . , en is the simplest example of a set of linearly in-
Definition 5.4.1. Let w1 , . . . , wk be vectors in the vec-
dependent vectors in Rn . We use Lemma 5.4.4 to verify
tor space V . The set {w1 , . . . , wk } is linearly dependent
independence by supposing that
if one of the vectors wj can be written as a linear com-
bination of the remaining k − 1 vectors. r1 e1 + · · · + rn en = 0.
Note that when k = 1, the phrase ‘{w1 } is linearly de- A calculation shows that
pendent’ means that w1 = 0.
If we set rk = −1, then we may rewrite (5.4.1) as 0 = r1 e1 + · · · + rn en = (r1 , . . . , rn ).

r1 w1 + · · · + rk−1 wk−1 + rk wk = 0. It follows that each rj equals 0, and the vectors e1 , . . . , en


are linearly independent.
It follows that:
Lemma 5.4.2. The set of vectors {w1 , . . . , wk } is lin-
early dependent if and only if there exist scalars r1 , . . . , rk Deciding Linear Dependence and Linear Independence
such that Deciding whether a set of k vectors in Rn is linearly de-
pendent or linearly independent is equivalent to solving
(a) at least one of the rj is nonzero, and a system of linear equations. Let w1 , . . . , wk be vectors
in Rn , and view these vectors as column vectors. Let
(b) r1 w1 + · · · + rk wk = 0.
A = (w1 | · · · |wk ) (5.4.2)
For example, the vectors w1 = (2, 4, 7), w2 = (5, 1, −1),
and w3 = (1, −7, −15) are linearly dependent since 2w1 − be the n × k matrix whose columns are the vectors wj .
w2 + w3 = 0. Then a vector  
r1
Definition 5.4.3. A set of k vectors {w1 , . . . , wk } is lin-
early independent if none of the k vectors can be written R =  ... 
 
as a linear combination of the other k − 1 vectors. rk


is a solution to the system of equations AR = 0 precisely when
r1w1 + · · · + rkwk = 0.  (5.4.3)
If there is a nonzero solution R to AR = 0, then the vectors {w1, . . . , wk} are linearly dependent; if the only solution to AR = 0 is R = 0, then the vectors are linearly independent.

The preceding discussion is summarized by:

Lemma 5.4.5. The vectors w1, . . . , wk in R^n are linearly dependent if the null space of the n × k matrix A defined in (5.4.2) is nonzero and linearly independent if the null space of A is zero.

A Simple Example of Linear Independence with Two Vectors  The two vectors
w1 = (2, −8, 1, 0)^t and w2 = (1, −2, 0, 1)^t
are linearly independent. To see this suppose that r1w1 + r2w2 = 0. Using the components of w1 and w2 this equality is equivalent to the system of four equations
2r1 + r2 = 0, −8r1 − 2r2 = 0, r1 = 0, and r2 = 0.
In particular, r1 = r2 = 0; hence w1 and w2 are linearly independent.

Using MATLAB to Decide Linear Dependence  Suppose that we want to determine whether or not the vectors
w1 = (1, 2, −1, 3, 5)^t, w2 = (−1, 1, 4, −2, 0)^t, w3 = (1, 1, −1, 3, 12)^t, w4 = (0, 4, 3, 1, −2)^t  (5.4.4*)
are linearly dependent. After typing e5_4_4 in MATLAB, form the 5 × 4 matrix A by typing

A = [w1 w2 w3 w4]

Determine whether there is a nonzero solution to AR = 0 by typing

null(A)

The response from MATLAB is

ans =
   -0.7559
   -0.3780
    0.3780
    0.3780

showing that there is a nonzero solution to AR = 0 and the vectors wj are linearly dependent. Indeed, this solution for R shows that we can solve for w1 in terms of w2, w3, w4. We can now ask whether or not w2, w3, w4 are linearly dependent. To answer this question form the matrix

B = [w2 w3 w4]

and type null(B) to obtain

ans =
   Empty matrix: 3-by-0

showing that the only solution to BR = 0 is the zero solution R = 0. Thus, w2, w3, w4 are linearly independent. For these particular vectors, any three of the four are linearly independent.
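The decimal entries returned by null can obscure the dependence. As an aside (not part of the text), asking for a rational basis exhibits the same relation with integer coefficients; here the vectors of (5.4.4*) are typed in directly rather than loaded with e5_4_4.

w1 = [1; 2; -1; 3; 5];  w2 = [-1; 1; 4; -2; 0];
w3 = [1; 1; -1; 3; 12]; w4 = [0; 4; 3; 1; -2];
A = [w1 w2 w3 w4];
null(A,'r')   % a multiple of (2, 1, -1, -1)', so 2*w1 + w2 = w3 + w4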


Exercises 8. Suppose that the three vectors u1 , u2 , u3 ∈ Rn are linearly


independent. Show that the set

{u1 + u2 , u2 + u3 , u3 + u1 }
1. Let w be a vector in the vector space V . Show that the
is also linearly independent.
sets of vectors {w, 0} and {w, −w} are linearly dependent.

9. Consider the matrix


2. For which values of b are the vectors (1, b) and (3, −1)
linearly independent?
 
1 −6 −1
 −3 0 2 
A=
 .
0 1 −2 
3. Let 0 −2 4

u1 = (1, −1, 1) u2 = (2, 1, −2) u3 = (10, 2, −6). (a) Show that the rows of A are linearly dependent.

Is the set {u1 , u2 , u3 } linearly dependent or linearly indepen- (b) Find two subsets S1 and S2 of rows of A such that
dent? (i) S1 6= S2 .
(ii) span S1 = span S2 .
4. For which values of b are the vectors (1, b, 2b) and (2, 1, 4) (iii) The vectors in S1 are linearly independent and the
linearly independent? vectors in S2 are linearly independent.

In Exercises 10 – 12, determine whether the given sets of


5. Let
     vectors are linearly independent or linearly dependent.
3 −6 0
u= 2  v= 1  w= 5  10. (matlab)
−5 7 −3
v1 = (2, 1, 3, 4) v2 = (−4, 2, 3, 1) v3 = (2, 9, 21, 22)
(a) Determine whether the sets {u, v}, {u, w}, {v, w} are lin- (5.4.5*)
early independent.
11. (matlab)
(b) Is the set {u, v, w} linearly independent?
w1 = (1, 2, 3) w2 = (2, 1, 5) w3 = (−1, 2, −4) w4 = (0, 2, −1)
(5.4.6*)
6. Show that the polynomials p1 (t) = 2+t, p2 (t) = 1+t2 , and
12. (matlab)
p3 (t) = t − t2 are linearly independent vectors in the vector
space C 1 . x1 = (3, 4, 1, 2, 5) x2 = (−1, 0, 3, −2, 1) x3 = (2, 4, −3, 0, 2)
(5.4.7*)

7. Show that the functions f1 (t) = sin t, f2 (t) = cos t, and


π
f3 (t) = cos t + are linearly dependent vectors in C 1 . 13. (matlab) Perform the following experiments.
3


(a) Use MATLAB to choose randomly three column vectors


in R3 . The MATLAB commands to choose these vectors
are:
y1 = rand(3,1)
y2 = rand(3,1)
y3 = rand(3,1)
Use the methods of this section to determine whether
these vectors are linearly independent or linearly depen-
dent.
(b) Now perform this exercise five times and record the num-
ber of times a linearly independent set of vectors is cho-
sen and the number of times a linearly dependent set is
chosen.
(c) Repeat the experiment in (b) — but this time randomly
choose four vectors in R3 to be in your set.
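One possible way to automate a single trial of this experiment (a sketch only; the exercise asks you to use the methods of this section) is:

y1 = rand(3,1); y2 = rand(3,1); y3 = rand(3,1);
A = [y1 y2 y3];
if size(null(A),2) == 0     % trivial null space: the columns are independent
    disp('linearly independent')
else
    disp('linearly dependent')
end

Wrapping these lines in a loop makes it easy to repeat the experiment in part (b) five times.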


5.5 Dimension and Bases if B is a spanning set for W with the smallest number of
The minimum number of vectors that span a vector space elements in a spanning set for W .
has special significance.
It follows that if {w1 , . . . , wk } is a basis for W , then k =
Definition 5.5.1. The vector space V has finite dimen-
dim W . The main theorem about bases is:
sion if V is the span of a finite number of vectors. If V
has finite dimension, then the smallest number of vectors
that span V is called the dimension of V and is denoted Theorem 5.5.3. A set of vectors B = {w1 , . . . , wk } in
by dim V . a vector space W is a basis for W if and only if the set
B is linearly independent and spans W .
For example, recall that ej is the vector in Rn whose j th
component is 1 and all of whose other components are 0. Remark: The importance of Theorem 5.5.3 is that we
Let x = (x1 , . . . , xn ) be in Rn . Then can show that a set of vectors is a basis by verifying
spanning and linear independence. We never have to
x = x1 e1 + · · · + xn en . (5.5.1)
check directly that the spanning set has the minimum
Since every vector in R is a linear combination of the
n
number of vectors for a spanning set.
vectors e1 , . . . , en , it follows that Rn = span{e1 , . . . , en }. For example, we have shown previously that the set of
Thus, Rn is finite dimensional. Moreover, the dimension vectors {e1 , . . . , en } in Rn is linearly independent and
of Rn is at most n, since Rn is spanned by n vectors. It spans Rn . It follows from Theorem 5.5.3 that this set is
seems unlikely that Rn could be spanned by fewer than a basis, and that the dimension of Rn is n. In particular,
n vectors— but this point needs to be proved. Rn cannot be spanned by fewer than n vectors.
The proof of Theorem 5.5.3 is given in Section 5.6.
An Example of a Vector Space that is Not Finite Dimen-
sional Next we discuss an example of a vector space that
does not have finite dimension. Consider the subspace
P ⊂ C 1 consisting of polynomials of all degrees. We Consequences of Theorem 5.5.3 We discuss two ap-
show that P is not the span of a finite number of vectors plications of Theorem 5.5.3. First, we use this theorem
and hence that P does not have finite dimension. Let to derive a way of determining the dimension of the sub-
p1 (t), p2 (t), . . . , pk (t) be a set of k polynomials and let d space spanned by a finite number of vectors. Second,
be the maximum degree of these k polynomials. Then we show that the dimension of the subspace of solutions
every polynomial in the span of p1 (t), . . . , pk (t) has de- to a homogeneous system of linear equation Ax = 0 is
gree less than or equal to d. In particular, p(t) = td+1 n − rank(A) where A is an m × n matrix.
is a polynomial that is not in the span of p1 (t), . . . , pk (t)
and P is not spanned by finitely many vectors.
Computing the Dimension of a Span We show that the
dimension of a span of vectors can be found using ele-
Bases and The Main Theorem
mentary row operations on M .
Definition 5.5.2. Let B = {w1 , . . . , wk } be a set of vec-
tors in a vector space W . The subset B is a basis for W Lemma 5.5.4. Let w1 , . . . , wk be k row vectors in Rn


and let W = span{w1, . . . , wk} ⊂ R^n. Define
M = \begin{pmatrix} w1 \\ \vdots \\ wk \end{pmatrix}
to be the matrix whose rows are the wj. Then
dim(W) = rank(M).  (5.5.2)

Proof  To verify (5.5.2), observe that the span of w1, . . . , wk is unchanged by
(a) swapping wi and wj,
(b) multiplying wi by a nonzero scalar, and
(c) adding a multiple of wi to wj.
That is, if we perform elementary row operations on M, the vector space spanned by the rows of M does not change. So we may perform elementary row operations on M until we arrive at the matrix E in reduced echelon form. Suppose that ℓ = rank(M); that is, suppose that ℓ is the number of nonzero rows in E. Then
E = \begin{pmatrix} v1 \\ \vdots \\ vℓ \\ 0 \\ \vdots \\ 0 \end{pmatrix},
where the vj are the nonzero rows in the reduced echelon form matrix.
We claim that the vectors v1, . . . , vℓ are linearly independent. It then follows from Theorem 5.5.3 that {v1, . . . , vℓ} is a basis for W and that the dimension of W is ℓ. To verify the claim, suppose
a1v1 + · · · + aℓvℓ = 0.  (5.5.3)
We show that ai must equal 0 as follows. In the ith row, the pivot must occur in some column — say in the jth column. It follows that the jth entry in the vector on the left hand side of (5.5.3) is
0a1 + · · · + 0ai−1 + 1ai + 0ai+1 + · · · + 0aℓ = ai,
since all entries in the jth column of E other than the pivot must be zero, as E is in reduced echelon form. □

For instance, let W = span{w1, w2, w3} in R^4 where
w1 = (3, −2, 1, −1),
w2 = (1, 5, 10, 12),  (5.5.4*)
w3 = (1, −12, −19, −25).
To compute dim W in MATLAB, type e5_5_4 to load the vectors and type

M = [w1; w2; w3]

Row reduction of the matrix M in MATLAB leads to the reduced echelon form matrix

ans =
    1.0000         0    1.4706    1.1176
         0    1.0000    1.7059    2.1765
         0         0         0         0

indicating that the dimension of the subspace W is two, and therefore {w1, w2, w3} is not a basis of W. Alternatively, we can use the MATLAB command rank(M) to compute the rank of M and the dimension of the span W.

However, if we change one of the entries in w3, for instance w3(3)=-18 then indeed the command


rank([w1;w2;w3]) gives the answer three, indicating that for this choice of vectors {w1, w2, w3} is a basis for span{w1, w2, w3}.

Solutions to Homogeneous Systems Revisited  We return to our discussions in Chapter 2 on solving linear equations. Recall that we can write all solutions to the system of homogeneous equations Ax = 0 in terms of a few parameters, and that the null space of A is the subspace of solutions (See Definition 5.2.1). More precisely, Proposition 5.2.5 states that the number of parameters needed is n − rank(A) where n is the number of variables in the homogeneous system. We claim that the dimension of the null space is exactly n − rank(A).

For example, consider the reduced echelon form 3 × 7 matrix
A = \begin{pmatrix} 1 & -4 & 0 & 2 & -3 & 0 & 8 \\ 0 & 0 & 1 & 3 & 2 & 0 & 4 \\ 0 & 0 & 0 & 0 & 0 & 1 & 2 \end{pmatrix}  (5.5.5)
that has rank three. Suppose that the unknowns for this system of equations are x1, . . . , x7. We can solve the equations associated with A by solving the first equation for x1, the second equation for x3, and the third equation for x6, as follows:
x1 = 4x2 − 2x4 + 3x5 − 8x7
x3 = −3x4 − 2x5 − 4x7
x6 = −2x7
Thus, all solutions to this system of equations have the form
(4x2 − 2x4 + 3x5 − 8x7, x2, −3x4 − 2x5 − 4x7, x4, x5, −2x7, x7)^t  (5.5.6)
which equals
x2 (4, 1, 0, 0, 0, 0, 0)^t + x4 (−2, 0, −3, 1, 0, 0, 0)^t + x5 (3, 0, −2, 0, 1, 0, 0)^t + x7 (−8, 0, −4, 0, 0, −2, 1)^t.
We can rewrite the right hand side of (5.5.6) as a linear combination of four vectors w2, w4, w5, w7
x2w2 + x4w4 + x5w5 + x7w7.  (5.5.7)
This calculation shows that the null space of A, which is W = {x ∈ R^7 : Ax = 0}, is spanned by the four vectors w2, w4, w5, w7. Moreover, this same calculation shows that the four vectors are linearly independent. From the left hand side of (5.5.6) we see that if this linear combination sums to zero, then x2 = x4 = x5 = x7 = 0. It follows from Theorem 5.5.3 that dim W = 4.

Definition 5.5.5. The nullity of A is the dimension of the null space of A.

Theorem 5.5.6. Let A be an m × n matrix. Then
nullity(A) + rank(A) = n.

Proof  Neither the rank nor the null space of A are changed by elementary row operations. So we can assume that A is in reduced echelon form. The rank of A is the number of nonzero rows in the reduced echelon form matrix. Proposition 5.2.5 states that the null space is spanned by p vectors where p = n − rank(A). We must show that these vectors are linearly independent.
Let j1, . . . , jp be the columns of A that do not contain pivots. In example (5.5.5) p = 4 and
j1 = 2, j2 = 4, j3 = 5, j4 = 7.


After solving for the variables corresponding to pivots, 3. Let S = span{v1 , v2 , v3 } where
we find that the spanning set of the null space consists
v1 = (1, 0, −1, 0) v2 = (0, 1, 1, 1) v3 = (5, 4, −1, 4).
of p vectors in Rn , which we label as {wj1 , . . . , wjp }. See
(5.5.6). Note that the jm th entry of wjm is 1 while the Find the dimension of S and find a basis for S.
jm th entry in all of the other p − 1 vectors is 0. Again,
see (5.5.6) as an example that supports this statement.
It follows that the set of spanning vectors is a linearly 4. Find a basis for the null space of
independent set. That is, suppose that 
1 0 −1 2

A =  1 −1 0 0 .
r1 wj1 + · · · + rp wjp = 0. 4 −5 1 −2

From the jm th entry in this equation, it follows that rm = What is the dimension of the null space of A?
0; and the vectors are linearly independent. 

5. Show that the set V of all 2 × 2 matrices is a vector space.


Theorem 5.5.6 has an interesting and useful interpreta-
Show that the dimension of V is four by finding a basis of V
tion. We have seen in the previous subsection that the with four elements. Show that the space M (m, n) of all m × n
rank of a matrix A is just the number of linearly inde- matrices is also a vector space. What is dim M (m, n)?
pendent rows in A. In linear systems each row of the co-
efficient matrix corresponds to a linear equation. Thus,
the rank of A may be thought of as the number of inde- 6. Show that the set Pn of all polynomials of degree less than
pendent equations in a system of linear equations. This or equal to n is a subspace of C 1 . What is dim P2 ? What is
theorem just states that the space of solutions loses a dim Pn ?
dimension for each independent equation.
7. Let P3 be the vector space of polynomials of degree at
Exercises most three in one variable t. Let p(t) = t3 + a2 t2 + a1 t + a0
where a0 , a1 , a2 ∈ R are fixed constants. Show that

dp d2 p d3 p
 
p, , 2 , 3
dt dt dt
1. Show that U = {u1 , u2 , u3 } where
is a basis for P3 .
u1 = (1, 1, 0) u2 = (0, 1, 0) u3 = (−1, 0, 1)

is a basis for R3 .
8. Let u ∈ Rn be a nonzero row vector.

(a) Show that the n × n matrix A = ut u is symmetric and


2. Determine whether or not the vectors
that rank(A) = 1. Hint: Begin by showing that Av t = 0
v1 = (−1, 1, 1) v2 = (1, −1, 0) v3 = (1, 1, −1) for every row vector v ∈ Rn that is perpendicular to u
and that Aut is a nonzero multiple of ut .
form a basis of R3 . (b) Show that the matrix P = In + ut u is invertible. Hint:
Show that rank(P ) = n.


9. Let
{v1 , v2 , v3 }
be a basis for R . Find all k so that
3

{v1 , kv2 , v2 + (1 − k)v3 }

is also a basis for R3 . Justify your answer.

10. Determine whether each of the following statements is


true or false and explain your answer.

(a) If A is an m × n matrix and the equation AX = b is


consistent for some b, then the columns of A span Rm .
(b) Let A and B be n × n matrices. If AB = BA and if A
is invertible, then A−1 B = BA−1 .
(c) If A and B are m×n matrices, then both AB t and At B
are defined.
(d) Similar matrices always have the same eigenvectors.
(e) If u, v, w are vectors such that {u, v}, {u, w}, and {v, w}
are linearly independent sets, then {u, v, w} is a linearly
independent set.
(f) Let {v1 , v2 , v3 } be a basis for a vector space V . If U is a
subspace of V, then some subset of {v1 , v2 , v3 } is a basis
for U .
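As a closing numerical aside (not one of the exercises), Theorem 5.5.6 can be checked on the matrix A of (5.5.5) with the rank and null commands:

A = [1 -4 0 2 -3 0 8; 0 0 1 3 2 0 4; 0 0 0 0 0 1 2];
r = rank(A)                  % 3
nullityA = size(null(A),2)   % 4
r + nullityA                 % 7, the number of columns of A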


5.6 The Proof of the Main Proof Since the dimension of W is m we know
that this vector space can be written as W =
Theorem span{v1 , . . . , vm }. Moreover, Lemma 5.6.1 implies that
We begin the proof of Theorem 5.5.3 with two lemmas the vectors v1 , . . . , vm are linearly independent. Suppose
on linearly independent and spanning sets. that {w1 , . . . , wk } is another set of vectors where k > m.
We have to show that the vectors w1 , . . . , wk are linearly
Lemma 5.6.1. Let {w1 , . . . , wk } be a set of vectors in
dependent; that is, we must show that there exist scalars
a vector space V and let W be the subspace spanned by
r1 , . . . , rk not all of which are zero that satisfy
these vectors. Then there is a linearly independent subset
of {w1 , . . . , wk } that also spans W . r1 w1 + · · · + rk wk = 0. (5.6.1)
We find these scalars by solving a system of linear equa-
Proof If {w1 , . . . , wk } is linearly independent, then tions, as we now show.
the lemma is proved. If not, then the set {w1 , . . . , wk } is The fact that W is spanned by the vectors vj implies that
linearly dependent. If this set is linearly dependent, then
at least one of the vectors is a linear combination of the w1 = a11 v1 + · · · + am1 vm
others. By renumbering if necessary, we can assume that w2 = a12 v1 + · · · + am2 vm
wk is a linear combination of w1 , . . . , wk−1 ; that is, ..
.
wk = a1 w1 + · · · + ak−1 wk−1 .
wk = a1k v1 + · · · + amk vm .
Now suppose that w ∈ W . Then
It follows that r1 w1 + · · · + rk wk equals
w = b1 w1 + · · · + bk wk . r1 (a11 v1 + · · · + am1 vm ) +
r2 (a12 v1 + · · · + am2 vm ) + · · · +
It follows that
rk (a1k v1 + · · · + amk vm )
w = (b1 + bk a1 )w1 + · · · + (bk−1 + bk ak−1 )wk−1 , Rearranging terms leads to the expression:
and that W = span{w1 , . . . , wk−1 }. If the vectors (a11 r1 + · · · + a1k rk )v1 +
w1 , . . . , wk−1 are linearly independent, then the proof of (a21 r1 + · · · + a2k rk )v2 +···+ (5.6.2)
the lemma is complete. If not, continue inductively until (am1 r1 + · · · + amk rk )vm .
a linearly independent subset of the wj that also spans
W is found.  Thus, (5.6.1) is valid if and only if (5.6.2) sums to zero.
Since the set {v1 , . . . , vm } is linearly independent, (5.6.2)
The important point in proving that linear independence can equal zero if and only if
together with spanning imply that we have a basis is a11 r1 + · · · + a1k rk = 0
discussed in the next lemma.
a21 r1 + · · · + a2k rk = 0
Lemma 5.6.2. Let W be an m-dimensional vector space ..
and let k > m be an integer. Then any set of k vectors .
in W is linearly dependent. am1 r1 + · · · + amk rk = 0.


Since m < k, Chapter 2, Theorem 2.4.6 implies that We now discuss a second approach to finding a basis for a
this system of homogeneous linear equations always has a nonzero subspace W of a finite dimensional vector space
nonzero solution r = (r1 , . . . , rk ) — from which it follows V.
that the wi are linearly dependent. 
Lemma 5.6.4. Let {u1 , . . . , uk } be a linearly indepen-
Corollary 5.6.3. Let V be a vector space of dimension dent set of vectors in a vector space V and assume that
n and let {u1 , . . . , uk } be a linearly independent set of
uk+1 6∈ span{u1 , . . . , uk }.
vectors in V . Then k ≤ n.
Then {u1 , . . . , uk+1 } is also a linearly independent set.

Proof If k > n then Lemma 5.6.2 implies that


{u1 , . . . , uk } is linearly dependent. Since we have as- Proof Let r1 , . . . , rk+1 be scalars such that
sumed that this set is linearly independent, it follows
that k ≤ n.  r1 u1 + · · · + rk+1 uk+1 = 0. (5.6.3)

To prove independence, we need to show that all rj = 0.


Suppose rk+1 6= 0. Then we can solve (5.6.3) for
Proof of Theorem 5.5.3 Suppose that B = 1
{w1 , . . . , wk } is a basis for W . By definition, B spans uk+1 = − (r1 u1 + · · · + rk uk ),
rk+1
W and k = dim W . We must show that B is linearly
independent. Suppose B is linearly dependent, then which implies that uk+1 ∈ span{u1 , . . . , uk }. This con-
Lemma 5.6.1 implies that there is a proper subset of tradicts the choice of uk+1 . So rk+1 = 0 and
B that spans W (and is linearly independent). This
contradicts the fact that as a basis B has the smallest r1 u1 + · · · + rk uk = 0.
number of elements of any spanning set for W .
Since {u1 , . . . , uk } is linearly independent, it follows that
Suppose that B = {w1 , . . . , wk } both spans W and is
r1 = · · · = rk = 0. 
linearly independent. Linear independence and Corol-
lary 5.6.3 imply that k ≤ dim W . Since, by definition,
any spanning set of W has at least dim W vectors, it The second method for constructing a basis is:
follows that k ≥ dim W . Thus, k = dim W and B is a
basis.  • Choose a nonzero vector w1 in W .

• If W is not spanned by w1 , then choose a vector w2


Extending Linearly Independent Sets to Bases that is not on the line spanned by w1 .
Lemma 5.6.1 leads to one approach to finding bases. Sup- • If W 6= span{w1 , w2 }, then choose a vector w3 6∈
pose that the subspace W is spanned by a finite set of span{w1 , w2 }.
vectors {w1 , . . . , wk }. Then, we can throw out vectors
one by one until we arrive at a linearly independent sub- • If W 6= span{w1 , w2 , w3 }, then choose a vector w4 6∈
set of the wj . This subset is a basis for W . span{w1 , w2 , w3 }.


• Continue until a spanning set for W is found. This (b) Let {w1 , . . . , wk } be a basis for W . Theorem 5.5.3 im-
set is a basis for W . plies that this set is linearly independent. If {w1 , . . . , wk }
does not span V , then it can be extended to a basis as
We now justify this approach to finding bases for sub- above. But then dim V > dim W , which is a contradic-
spaces. Suppose that W is a subspace of a finite di- tion. 
mensional vector space V . For example, suppose that
W ⊂ Rn . Then our approach to finding a basis of W Corollary 5.6.7. Let B = {w1 , . . . , wn } be a set of n
is as follows. Choose a nonzero vector w1 ∈ W . If vectors in an n-dimensional vector space V . Then the
W = span{w1 }, then we are done. If not, choose a vector following are equivalent:
w2 ∈ W – span{w1 }. It follows from Lemma 5.6.4 that
{w1 , w2 } is linearly independent. If W = span{w1 , w2 }, (a) B is a spanning set of V ,
then Theorem 5.5.3 implies that {w1 , w2 } is a basis (b) B is a basis for V , and
for W , dim W = 2, and we are done. If not, choose
w3 ∈ W – span{w1 , w2 } and {w1 , w2 , w3 } is linearly in- (c) B is a linearly independent set.
dependent. The finite dimension of V implies that con-
tinuing inductively must lead to a spanning set of linear Proof By definition, (a) implies (b) since a basis is
independent vectors for W — which by Theorem 5.5.3 is a spanning set with the number of vectors equal to the
a basis. This discussion proves: dimension of the space. Theorem 5.5.3 states that a ba-
sis is a linearly independent set; so (b) implies (c). If B
Corollary 5.6.5. Every linearly independent subset of is a linearly independent set of n vectors, then it spans
a finite dimensional vector space V can be extended to a a subspace W of dimension n. It follows from Corol-
basis of V . lary 5.6.6(b) that W = V and that (c) implies (a). 

Further consequences of Theorem 5.5.3 We summa- Subspaces of R3 We can now classify all subspaces of
rize here several important facts about dimensions. R3 . They are: the origin, lines through the origin, planes
through the origin, and R3 . All of these sets were shown
Corollary 5.6.6. Let W be a subspace of a finite dimen-
to be subspaces in Example 5.1.5(a–c).
sional vector space V .
To verify that these sets are the only subspaces of R3 ,
(a) Suppose that W is a proper subspace. Then dim W < note that Theorem 5.5.3 implies that proper subspaces
dim V . of R3 have dimension equal either to one or two. (The
zero dimensional subspace is the origin and the only three
(b) Suppose that dim W = dim V . Then W = V . dimensional subspace is R3 itself.) One dimensional sub-
spaces of R3 are spanned by one nonzero vector and
Proof (a) Let dim W = k and let {w1 , . . . , wk } be a are just lines through the origin. See Example 5.1.5(b).
We claim that all two dimensional subspaces are planes
basis for W . Since W is a proper subspace of V , there is
through the origin.
a vector w ∈ V – W . It follows from Lemma 5.6.4 that
{w1 , . . . , wk , w} is a linearly independent set. Therefore, Suppose that W ⊂ R3 is a subspace spanned by two non-
Corollary 5.6.3 implies that k + 1 ≤ n. collinear vectors w1 and w2 . We show that W is a plane


through the origin using results in Chapter 2. Observe 5. Let A be a 7 × 5 matrix with rank(A) = r.
that there is a vector N = (N1 , N2 , N3 ) perpendicular
(a) What is the largest value that r can have?
to w1 = (a11 , a12 , a13 ) and w2 = (a21 , a22 , a23 ). Such a
vector N satisfies the two linear equations: (b) Give a condition equivalent to the system of equations
Ax = b having a solution.
w1 · N = a11 N1 + a12 N2 + a13 N3 = 0 (c) What is the dimension of the null space of A?
w2 · N = a21 N1 + a22 N2 + a23 N3 = 0. (d) If there is a solution to Ax = b, then how many param-
eters are needed to describe the set of all solutions?
Chapter 2, Theorem 2.4.6 implies that a system of two
linear equations in three unknowns has a nonzero solu-
tion. Let P be the plane perpendicular to N that con- 6. Let  
1 3 −1 4
tains the origin. We show that W = P and hence that A= 2 1 5 7 .
the claim is valid. 3 4 4 11
The choice of N shows that the vectors w1 and w2 are
(a) Find a basis for the subspace C ⊂ R3 spanned by the
both in P . In fact, since P is a subspace it contains every columns of A.
vector in span{w1 , w2 }. Thus W ⊂ P . If P contains just
(b) Find a basis for the subspace R ⊂ R4 spanned by the
one additional vector w3 ∈ R3 that is not in W , then the rows of A.
span of w1 , w2 , w3 is three dimensional and P = W = R3 .
(c) What is the relationship between dim C and dim R?

Exercises 7. Show that the vectors


v1 = (2, 3, 1) and v2 = (1, 1, 3)

In Exercises 1 – 3 you are given a pair of vectors v1 , v2 span- are linearly independent. Show that the span of v1 and v2
ning a subspace of R3 . Decide whether that subspace is a line forms a plane in R3 by showing that every linear combination
or a plane through the origin. If it is a plane, then compute is the solution to a single linear equation. Use this equa-
a vector N that is perpendicular to that plane. tion to determine the normal vector N to this plane. Verify
Lemma 5.6.4 by verifying directly that v1 , v2 , N are linearly
1. v1 = (2, 1, 2) and v2 = (0, −1, 1). independent vectors.
2. v1 = (2, 1, −1) and v2 = (−4, −2, 2).
3. v1 = (0, 1, 0) and v2 = (4, 1, 0). 8. Let W be an infinite dimensional subspace of the vector
space V . Show that V is infinite dimensional.

4. The pairs of vectors


v1 = (−1, 1, 0) and v2 = (1, 0, 1) 9. (matlab) Consider the following set of vectors

span a plane P in R3 . The pairs of vectors w1 = (2, −2, 1),


w2 = (−1, 2, 0),
w1 = (0, 1, 0) and w2 = (1, 1, 0)
w3 = (3, −2, λ),
span a plane Q in R3 . Show that P and Q are different and
w4 = (−5, 6, −2),
compute the subspace of R3 that is given by the intersection
P ∩ Q. where λ is a real number.


(a) Find a value for λ such that the dimension of Consider the vectors
span{w1 , w2 , w3 , w4 } is three. Then decide whether        
−1 2 2 2
{w1 , w2 , w3 } or {w1 , w2 , w4 } is a basis for R3 .  1   5   1   −2 
v1 =  v2 =  v3 =  v4 = 
(b) Find a value for λ such that the dimension of
     
0  3   1   0 
span{w1 , w2 , w3 , w4 } is two. −1 −1 1 2

Find all bases of V of the form {vi , vj } with 1 ≤ i < j ≤ 4.


(Hint: use Corollary 5.6.7.)
10. (matlab) Find a basis for R as follows. Randomly
5

choose vectors x1 , x2 ∈ R5 by typing x1 = rand(5,1) and x2


= rand(5,1). Check that these vectors are linearly indepen- 13. Let A be an m × n matrix and B be an n × k matrix.
dent. If not, choose another pair of vectors until you find a
linearly independent set. Next choose a vector x3 at random (a) Show that null space(B) ⊆ null space(AB).
and check that x1 , x2 , x3 are linearly independent. If not, (b) Show that nullity(B) ≤ nullity(AB).
randomly choose another vector for x3 . Continue until you
have five linearly independent vectors — which by a dimen-
sion count must be a basis and span R5 . Verify this comment 14. Let {v1 , v2 , v3 } and {w1 , w2 } be linearly independent sets
by using MATLAB to write the vector of vectors in a vector space V . Show that if

2
 span{v1 , v2 , v3 } ∩ span{w1 , w2 } = {0}

 1 
 then

 3 
 dim(span{v1 , v2 , v3 , w1 , w2 }) = 5
 −2 
4 Hint: First show that if v ∈ span{v1 , v2 , v3 }, w ∈
span{w1 , w2 }, and v + w = 0, then v = w = 0.
as a linear combination of x1 , . . . , x5 .
In Exercises 15-20 decide whether the statement is true or
false, and explain your answer.
11. (matlab) Find a basis for the subspace of R5 spanned
by 15. Every set of three vectors in R3 is a basis for R3 .
u1 = (1, 1, 0, 0, 1) 16. Every set of four vectors in R3 is linearly dependent.
u2 = (0, 2, 0, 1, −1)
u3 = (0, −1, 1, 0, 2) (5.6.4*) 17. If {v1 , v2 } is a basis for the plane z = 0 in R3 , then
u4 = (1, 4, 1, 2, 1) {v1 , v2 , e3 } is a basis for R3 .
u5 = (0, 0, 2, 1, 3).
18. If {v1 , v2 , v3 } is a basis for R3 , the only subspaces of R3
of dimension one are span{v1 }, span{v2 }, and span{v3 }.
19. The only subspace of R3 that contains finitely many vec-
12. Let V be the subspace of R defined by the equations
4 tors is {0}.
20. If U is a subspace of R3 of dimension 1 and V is a subspace
x1 − x3 − x4 = 0 of R3 of dimension 2, then U ∩ V = {0}.
x2 − 2x3 + x4 = 0


6 Closed Form Solutions for Planar ODEs

In this chapter we describe several methods for finding closed form solutions to planar constant coefficient systems of linear differential equations and we use these methods to discuss qualitative features of phase portraits of these solutions.

In Section 6.1 we show how uniqueness of solutions to initial value problems implies that the space of solutions to a constant coefficient system of n linear differential equations is n dimensional. Using this observation we present a direct method for solving planar linear systems in Section 6.2. This method extends the discussion of solutions to systems whose coefficient matrices have distinct real eigenvalues given in Section 4.7 to the cases of complex eigenvalues and equal real eigenvalues.

A second method for finding solutions is to use changes of coordinates to make the coefficient matrix of the differential equation as simple as possible. This idea leads to the notion of similarity of matrices, which is discussed in Section 6.3, and leads to the second method for solving planar linear systems. Similarity also leads to the Jordan Normal Form theorem for 2 × 2 matrices. Both the direct method and the method based on similarity require being able to compute the eigenvalues and eigenvectors of the coefficient matrix.

The important subject of qualitative features of phase portraits of linear systems is explored in Section 6.4. Specifically we discuss saddles, sinks, sources and asymptotic stability. This discussion also uses similarity and Jordan Normal Form. We find that the qualitative theory is determined by the eigenvalues and eigenvectors of the coefficient matrix — which is not surprising given that we can classify matrices up to similarity by just knowing their eigenvalues and eigenvectors.

Chapter 6 ends with three optional sections. Matrix exponentials yield an elegant third way to derive closed form solutions to n-dimensional linear ODE systems (Section 6.5). This method leads to a proof of uniqueness of solutions to initial value problems of linear systems (Theorem 6.5.1). A proof of the Cayley Hamilton Theorem for 2 × 2 matrices is given in Section 6.6. In the last section, Section 6.7, we obtain solutions to second order equations by reducing them to first order systems.

6.1 The Initial Value Problem

Recall that a planar autonomous constant coefficient system of ordinary differential equations has the form
dx/dt = ax + by
dy/dt = cx + dy  (6.1.1)
where a, b, c, d ∈ R. Computer experiments using PhasePlane lead us to believe that there is just one solution to (6.1.1) satisfying the initial conditions
x(0) = x0
y(0) = y0.
We prove existence in this section and the next by determining explicit formulas for solutions.

The Initial Value Problem for Linear Systems  In this chapter we discuss how to find solutions (x(t), y(t)) to (6.1.1) satisfying the initial values x(0) = x0 and y(0) = y0. It is convenient to rewrite (6.1.1) in matrix form as:
dX/dt (t) = CX(t).  (6.1.2)
The initial value problem is then stated as: Find a solution to (6.1.2) satisfying X(0) = X0 where X0 = (x0, y0)^t. Everything that we have said here works equally well for n dimensional systems of linear differential equations. Just let C be an n × n matrix and let X0 be an n vector of initial conditions.

Solving the Initial Value Problem Using Superposition  In Section 4.7 we discussed how to solve (6.1.2) when the eigenvalues of C are real and distinct. Recall that when λ1 and λ2 are distinct real eigenvalues of C with associated eigenvectors v1 and v2, there are two solutions to (6.1.2) given by the explicit formulas
X1(t) = e^{λ1 t} v1 and X2(t) = e^{λ2 t} v2.
Superposition guarantees that every linear combination of these solutions
X(t) = α1X1(t) + α2X2(t) = α1 e^{λ1 t} v1 + α2 e^{λ2 t} v2
is a solution to (6.1.2). Since v1 and v2 are linearly independent, we can always choose scalars α1, α2 ∈ R to solve any given initial value problem of (6.1.2). It follows from the uniqueness of solutions to initial value problems that all solutions to (6.1.2) are included in this family of solutions. Uniqueness is proved in the special case of linear systems in Theorem 6.5.1. This proof uses matrix exponentials.

We generalize this discussion so that we will be able to find closed form solutions to (6.1.2) in Section 6.2 when the eigenvalues of C are complex or are real and equal.

Suppose that X1(t) and X2(t) are two solutions to (6.1.1) such that
v1 = X1(0) and v2 = X2(0)
are linearly independent. Then all solutions to (6.1.1) are linear combinations of these two solutions. We verify this statement as follows. Corollary 5.6.7 of Chapter 5 states that since {v1, v2} is a linearly independent set in R^2, it is also a basis of R^2. Thus for every X0 ∈ R^2 there exist scalars r1, r2 such that
X0 = r1v1 + r2v2.
It follows from superposition that the solution
X(t) = r1X1(t) + r2X2(t)
is the unique solution whose initial condition vector is X0.

We have proved that every solution to this linear system of differential equations is a linear combination of these two solutions — that is, we have proved that the dimension of the space of solutions to (6.1.2) is two. This proof generalizes immediately to a proof of the following theorem for n × n systems.


Theorem 6.1.1. Let C be an n × n matrix. Suppose Exercises


that
X1 (t), . . . , Xn (t)
are solutions to Ẋ = CX such that the vectors of initial In Exercises 1 – 4, consider the system of differential equations
conditions vj = Xj (0) are linearly independent in Rn . dx
Then the unique solution to the system (6.1.2) with initial = 65x + 42y
dt (6.1.6)
condition X(0) = X0 is dy
= −99x − 64y.
dt
X(t) = r1 X1 (t) + · · · + rn Xn (t), (6.1.3)
1. Verify that
where r1 , . . . , rn are scalars satisfying ! !
2 −7
v1 = and v2 =
X0 = r1 v1 + · · · + rn vn . (6.1.4) −3 11

We call (6.1.3) the general solution to the system of dif- are eigenvectors of the coefficient matrix of (6.1.6) and find
ferential equations Ẋ = CX. When solving the initial the associated eigenvalues.
value problem we find a particular solution by specifying 2. Find the solution to (6.1.6) satisfying initial conditions
the scalars r1 , . . . , rn . X(0) = (−14, 22)t .
Corollary 6.1.2. Let C be an n × n matrix and let 3. Find the solution to (6.1.6) satisfying initial conditions
X = {X1 (t), . . . , Xn (t)} X(0) = (−3, 5)t .

4. Find the solution to (6.1.6) satisfying initial conditions


be solutions to the differential equation Ẋ = CX such
X(0) = (9, −14)t .
that the vectors Xj (0) are linearly independent in Rn .
Then the set of all solutions to Ẋ = CX is an n-
dimensional subspace of (C 1 )n , and X is a basis for the In Exercises 5 – 8, consider the system of differential equations
solution subspace.
dx
= x−y
Consider a special case of Theorem 6.1.1. Suppose that dt
dy (6.1.7)
the matrix C has n linearly independent eigenvectors dt
= −x + y.
v1 , . . . , vn with real eigenvalues λ1 , . . . , λn . Then the
functions Xj (t) = eλj t vj are solutions to Ẋ = CX. 5. The eigenvalues of the coefficient matrix of (6.1.7) are 0
Corollary 6.1.2 implies that the functions Xj form a ba- and 2. Find the associated eigenvectors.
sis for the space of solutions of this system of differential 6. Find the solution to (6.1.7) satisfying initial conditions
equations. Indeed, the general solution to (6.1.2) is X(0) = (2, −2)t .
X(t) = r1 eλ1 t v1 + · · · + rn eλn t vn . (6.1.5) 7. Find the solution to (6.1.7) satisfying initial conditions
X(0) = (2, 6)t .
The particular solution that solves the initial value
X(0) = X0 is found by solving (6.1.4) for the scalars 8. Find the solution to (6.1.7) satisfying initial conditions
r1 , . . . , rn . X(0) = (1, 0)t .


In Exercises 9 – 12, consider the system of differential equa- 16. Find a solution to the system of differential equations
tions Ẋ = CX satisfying the initial condition X(0) = (10, −4, 9)t .
dx
= −y
dt (6.1.8) 17. Find a solution to the system of differential equations
dy Ẋ = CX satisfying the initial condition X(0) = (2, −1, 3)t .
= x.
dt
9. Show that (x1 (t), y1 (t)) = (cos t, sin t) is a solution to
(6.1.8). 18. Show that for some nonzero a the function x(t) = at5 is a
solution to the differential equation ẋ = x4/5 . Then show that
10. Show that (x2 (t), y2 (t)) = (− sin t, cos t) is a solution to
there are at least two solutions to the initial value problem
(6.1.8).
x(0) = 0 for this differential equation.
11. Using Exercises 9 and 10, find a solution (x(t), y(t)) to
(6.1.8) that satisfies (x(0), y(0)) = (0, 1).

12. Using Exercises 9 and 10, find a solution (x(t), y(t)) to 19. (matlab) Use PhasePlane to investigate the system of
(6.1.8) that satisfies (x(0), y(0)) = (1, 1). differential equations
dx
= −2y
In Exercises 13 – 14, consider the system of differential equa- dt (6.1.10)
dy
tions = −x + y.
dx dt
= −2x + 7y
dt
dy (6.1.9) (a) Use PhasePlane to find two independent eigendirections
dt
= 5y, (and hence eigenvectors) for (6.1.10).
(b) Using (a), find the eigenvalues of the coefficient matrix
13. Find a solution to (6.1.9) satisfying the initial condition
of (6.1.10).
(x(0), y(0)) = (1, 0).
(c) Find a closed form solution to (6.1.10) satisfying the ini-
14. Find a solution to (6.1.9) satisfying the initial condition tial condition !
(x(0), y(0)) = (−1, 2). 4
X(0) = .
−1
In Exercises 15 – 17, consider the matrix (d) Study the time series of y versus t for the solution in
  (c) by comparing the graph of the closed form solution
−1 −10 −6 obtained in (c) with the time series graph using Phase-
C= 0 4 3 . Plane.
 

0 −14 −9

15. Verify that


     
1 2 6
v1 =  0 v2 =  −1  and v3 =  −3 
     

0 2 7

are eigenvectors of C and find the associated eigenvalues.
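For Exercises 15 – 17 a short MATLAB check can save arithmetic. The sketch below is an aside, not part of the exercises; it verifies the eigenvector claim without stating the eigenvalues.

C = [-1 -10 -6; 0 4 3; 0 -14 -9];
V = [1 2 6; 0 -1 -3; 0 2 7];   % columns are v1, v2, v3 from Exercise 15
C*V                            % compare each column of C*V with the same column of V;
                               % the common ratio in each column is the associated eigenvalue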


6.2 Closed Form Solutions by the ness we repeat the result. The general solution is:
Direct Method X(t) = α1 eλ1 t v1 + α2 eλ2 t v2 . (6.2.2)
In Section 4.7 we showed in detail how solutions to pla-
nar systems of constant coefficient differential equations The initial value problem is solved by finding real num-
with distinct real eigenvalues are found. This method bers α1 and α2 such that
was just reviewed in Section 6.1 where we saw that the X0 = α1 v1 + α2 v2 .
crucial step in solving these systems of differential equa-
tions is the step where we find two linearly independent See Section 4.7 for a detailed discussion with examples.
solutions. In this section we discuss how to find these
two linearly independent solutions when the eigenvalues
of the coefficient matrix are either complex or real and Complex Conjugate Eigenvalues Suppose that the
equal. eigenvalues of C are complex, that is, suppose that
λ1 = σ + iτ with τ 6= 0 is an eigenvalue of C with eigen-
By finding these two linearly independent solutions we vector v1 = v +iw, where v, w ∈ R2 . We claim that X1 (t)
will find both the general solution of the system of differ- and X2 (t), where
ential equations Ẋ = CX and a method for solving the
initial value problem X1 (t) = eσt (cos(τ t)v − sin(τ t)w)
(6.2.3)
X2 (t) = eσt (sin(τ t)v + cos(τ t)w),
dX
= CX
dt (6.2.1) are solutions to (6.2.1) and that the general solution to
X(0) = X0 .
(6.2.1) is:
X(t) = α1 X1 (t) + α2 X2 (t), (6.2.4)
The principle results of this section are summarized as
follows. Let C be a 2 × 2 matrix with eigenvalues λ1 and where α1 , α2 are real scalars.
λ2 , and associated eigenvectors v1 and v2 . Two basic observations are needed when deriving (6.2.3)
and (6.2.4); these observations use basic properties of the
(a) If the eigenvalues are real and v1 and v2 are linearly complex exponential function.
independent, then the general solution to (6.2.1) is
given by (6.2.2). The first property is Euler’s celebrated formula:

(b) If the eigenvalues are complex, then the general so- eiτ = cos τ + i sin τ (6.2.5)
lution to (6.2.1) is given by (6.2.3) and (6.2.4).
for any real number τ . A justification of this formula is
(c) If the eigenvalues are equal (and hence real) and given in Exercise 1. The second property is the important
there is only one linearly independent eigenvector, feature of exponential functions:
then the general solution to (6.2.1) is given by
(6.2.18). ex+y = ex ey (6.2.6)

for any complex numbers x, y. The two formulas together


Real Distinct Eigenvalues We have discussed the case imply
when λ1 6= λ2 ∈ R on several occasions. For complete- eσ+iτ = eσ (cos τ + i sin τ ) (6.2.7)


Euler’s formula allows us to differentiate complex expo- Lemma 6.2.1. The complex vector-valued function X(t)
nentials, obtaining the expected result: is a solution to Ẋ = CX if and only if the real and
imaginary parts are real vector-valued solutions to Ẋ =
d iτ t d CX.
e = (cos(τ t) + i sin(τ t))
dt dt
= τ (− sin(τ t) + i cos(τ t))
Proof Equating the real and imaginary parts of
= iτ (cos(τ t) + i sin(τ t)) (6.2.10) implies that Ẋ1 = CX1 and Ẋ2 = CX2 . 
= iτ eiτ t .
It follows from Lemma 6.2.1 that finding one complex-
Euler’s formula also implies that valued solution to a linear differential equation provides
us with two real-valued solutions. Identity (6.2.9) implies
eλt = eσt+iτ t = eσt eiτ t = eσt (cos(τ t)+i sin(τ t)), (6.2.8) that
X(t) = eλ1 t v1
where λ = σ + iτ . Most importantly, we note that
is a complex-valued solution to (6.2.1). Using Euler’s
d λt formula we compute the real and imaginary parts of X(t),
e = λeλt . (6.2.9)
dt as follows.

We use (6.2.8) and the product rule for differentiation to X(t) = e(σ+iτ )t (v + iw)
verify (6.2.9) as follows: = eσt (cos(τ t) + i sin(τ t))(v + iw)
d λt d σt iτ t  = eσt (cos(τ t)v − sin(τ t)w)
e = e e
dt dt  + ieσt (sin(τ t)v + cos(τ t)w).
σeσt eiτ t + eσt iτ eiτ t

=
Since the real and imaginary parts of X(t) are solutions
= (σ + iτ )eσt+iτ t to Ẋ = CX, it follows that the real-valued functions
= λeλt . X1 (t) and X2 (t) defined in (6.2.3) are indeed solutions.
Returning to the case where C is a 2 × 2 matrix, we
Verification that (6.2.4) is the General Solution A com- see that if X1 (0) = v and X2 (0) = w are linearly inde-
plex vector-valued function X(t) = X1 (t) + iX2 (t) ∈ Cn pendent, then Corollary 6.1.2 implies that (6.2.4) is the
consists of a real part X1 (t) ∈ Rn and an imaginary part general solution to Ẋ = CX. The linear independence
X2 (t) ∈ Rn . For such functions X(t) we define of v and w is verified using the following lemma.

Ẋ = Ẋ1 + iẊ2 Lemma 6.2.2. Let λ1 = σ + iτ with τ = 6 0 be a complex


eigenvalue of the 2 × 2 matrix C with eigenvector v1 =
and v + iw where v, w ∈ R2 . Then
CX = CX1 + iCX2 .
Cv = σv − τ w
To say that X(t) is a solution to Ẋ = CX means that (6.2.11)
Cw = τ v + σw.

Ẋ1 + iẊ2 = Ẋ = CX = CX1 + iCX2 . (6.2.10) and v and w are linearly independent vectors.


Proof By assumption Cv1 = λ1 v1 , that is, It follows from (6.2.3) that

C(v + iw) = (σ + iτ )(v + iw) X1 (t) = e−2t (cos(3t)v − sin(3t)w)


(6.2.12) X2 (t) = e−2t (sin(3t)v + cos(3t)w),
= (σv − τ w) + i(τ v + σw).
are solutions to (6.2.13) and X = α1 X1 + α2 X2 is the
Equating real and imaginary parts of (6.2.12) leads to
general solution to (6.2.13). To solve the initial value
the system of equations (6.2.11). Note that if w = 0,
problem we need to find α1 , α2 such that
then v 6= 0 and τ v = 0. Hence τ = 0, contradicting the
assumption that τ 6= 0. So w 6= 0. X0 = X(0) = α1 X1 (0) + α2 X2 (0) = α1 v + α2 w,
Note also that if v and w are linearly dependent, then
that is,
v = αw. It then follows from the previous equation that      
Cw = (τ α + σ)w. (1, 1)t = α1 (2, −1)t + α2 (0, 3)t .

Hence w is a real eigenvector; but the eigenvalues of C Therefore, α1 = 1/2 and α2 = 1/2 and
are not real and C has no real eigenvectors.
X(t) = e−2t (cos(3t) + sin(3t), cos(3t) − 2 sin(3t))t .    (6.2.14)
ample of an initial value problem for a linear system with
complex eigenvalues. Let Real and Equal Eigenvalues There are two types of
  2 × 2 matrices that have real and equal eigenvalues —
dX −1 2 those that are scalar multiples of the identity and those
= X = CX, (6.2.13)
dt −5 −3 that are not. An example of a 2 × 2 matrix that has real
and equal eigenvalues is
and  
1
 
λ1 1
X0 =
1
. A= , λ1 ∈ R. (6.2.15)
0 λ1

The characteristic polynomial for the matrix C is: The characteristic polynomial of A is
pA (λ) = λ2 −tr(A)λ+det(A) = λ2 −2λ1 λ+λ21 = (λ−λ1 )2 .
pC (λ) = λ2 + 4λ + 13,
Thus the eigenvalues of A both equal λ1 .
whose roots are λ1 = −2 + 3i and λ2 = −2 − 3i. So

σ = −2 and τ = 3. Only One Linearly Independent Eigenvector An impor-


tant fact about the matrix A in (6.2.15) is that it has
An eigenvector corresponding to the eigenvalue λ1 is only one linearly independent eigenvector. To verify this
      fact, solve the system of linear equations
2 2 0
v1 = = +i = v + iw. Av = λ1 v.
−1 + 3i −1 3


In matrix form this equation is that this observation about generalized eigenvectors is
  always valid.
0 1
0 = (A − λ1 I2 )v = v.
0 0 Lemma 6.2.4. Let C be a 2×2 matrix with both eigenval-
ues equal to λ1 and with one linearly independent eigen-
A quick calculation shows that all solutions are multiples vector v1 . Let w1 be a generalized eigenvector of C, then
of v1 = e1 = (1, 0)t . v1 and w1 are linearly independent.
In fact, this observation is valid for any 2 × 2 matrix that
has equal eigenvalues and is not a scalar multiple of the Proof If v1 and w1 were linearly dependent, then w1
identity, as the next lemma shows. would be a multiple of v1 and hence an eigenvector of C.
Lemma 6.2.3. Let C be a 2 × 2 matrix. Suppose that But C − λ1 I2 applied to an eigenvector is zero, which is
C has two linearly independent eigenvectors both with a contradiction. Therefore, v1 and w1 are linearly inde-
eigenvalue λ1 . Then C = λ1 I2 . pendent. 

The Cayley Hamilton theorem (see Section 6.6) coupled


Proof Let v1 and v2 be two linearly independent eigen- with matrix exponentials (see Section 6.5) lead to a sim-
vectors of C; that is, Cvj = λ1 vj . Since dim(R2 ) = 2, ple method for finding solutions to differential equations
Corollary 5.6.7 implies that {v1 , v2 } is a basis of R2 . in the multiple eigenvalue case — one that does not re-
Hence, every vector v has the form v = α1 v1 + α2 v2 . quire solving for either the eigenvector v1 or the gener-
Linearity implies alized eigenvector w1 . We next prove the special case of
Cv = C(α1 v1 + α2 v2 ) = α1 λ1 v1 + α2 λ1 v2 = λ1 v Cayley-Hamilton that is needed.

Therefore, Cv = λ1 v for every v ∈ R2 and hence C = Lemma 6.2.5. Let C be a 2 × 2 matrix with a double
λ1 I2 .  eigenvalue λ1 ∈ R. Then

(C − λ1 I2 )2 = 0. (6.2.17)
Generalized Eigenvectors Suppose that C has exactly one
linearly independent real eigenvector v1 with a double Proof Suppose that C has two linearly independent
real eigenvalue λ1 . We call w1 a generalized eigenvector eigenvectors. Then Lemma 6.2.3 implies that C − λ1 I2 =
of C if it satisfies the system of linear equations vector v1 . Let w1 be a generalized eigenvector of C, then
(C − λ1 I2 )w1 = v1 . (6.2.16) Suppose that C has one linearly independent eigenvec-
tor v1 and a generalized eigenvector w1 . It follows from
The matrix A in (6.2.15) has a generalized eigenvector. Lemma 6.2.4(a) that {v1 , w1 } is a basis of R2 . It also
To verify this point solve the linear system follows by definition of eigenvector and generalized eigen-
    vector that
0 1 1
(C − λ1 I2 )w1 = w1 = v1 = (C − λ1 I2 )2 v1 = (C − λ1 I2 )0 = 0
0 0 0
(C − λ1 I2 )2 w1 = (C − λ1 I2 )v1 = 0
for w1 = e2 . Note that for this matrix C, v1 = e1 and
w1 = e2 are linearly independent. The next lemma shows Hence, (6.2.17) is valid. 


Independent Solutions to Differential Equations with Equal Thus λ1 = −2 is an eigenvalue of multiplicity two. It
Eigenvalues Suppose that the 2 × 2 matrix C has a dou- follows that
ble eigenvalue λ1 . Then the general solution to the initial  
3 −1
value problem Ẋ = CX and X(0) = X0 is: C − λ1 I2 =
9 −3
X(t) = eλ1 t [I2 + t(C − λ1 I2 )]X0 . (6.2.18)
and from (6.2.18) that
This is the form of the solution that is given by ma-
trix exponentials. We verify (6.2.18) by observing that
    
X(t) = e−2t [1 + 3t, −t; 9t, 1 − 3t] (2, 3)t = e−2t (2 + 3t, 3 + 9t)t .
X(0) = X0 and calculating

CX(t) = eλ1 t [C + t(C 2 − λ1 C)]X0


Exercises
Ẋ(t) = eλ1 t [λ1 (I2 + t(C − λ1 I2 )) + (C − λ1 I2 )]X0 .
Therefore
CX − Ẋ = eλ1 t M X0
1. Justify Euler’s formula (6.2.5) as follows. Recall the Taylor
where (6.2.17) implies series
1 2 1
M = C + t(C 2 − λ1 C) − λ1 (I2 + t(C − λ1 I2 )) ex = 1+x+ x + · · · + xn + · · ·
2! n!
−(C − λ1 I2 )
1 1 1
= t(C − λ1 I2 )2 cos x = 1 − x2 + x4 + · · · + (−1)n x2n + · · ·
2! 4! (2n)!
= 0. 1 1 1
sin x = x − x3 + x5 + · · · + (−1)n x2n+1 + · · · .
on use of (6.2.17). A remarkable feature of formula 3! 5! (2n + 1)!
(6.2.18) is that it is not necessary to compute either the Now evaluate the Taylor series eiθ and separate into real and
eigenvector of C or its generalized eigenvector. imaginary parts.

An Example with Equal Eigenvalues Consider the system In modern language De Moivre’s formula states that
of differential equations  n
  eniθ = eiθ .
dX 1 −1
= X (6.2.19)
dt 9 −5 In Exercises 2 - 3 use De Moivre’s formula coupled with Eu-
ler’s formula (6.2.5) to determine trigonometric identities for
with initial value
  the given quantity in terms of cos θ, sin θ, cos ϕ, sin ϕ.
2
X0 = . 2. cos(θ + ϕ).
3
  3. sin(3θ).
1 −1
The characteristic polynomial for the matrix
9 −5
is In Exercises 4 – 7 compute the general solution for the given
pC (λ) = λ2 + 4λ + 4 = (λ + 2)2 . system of differential equations.
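A quick numerical check of formula (6.2.18) for this example: the MATLAB lines below compare (6.2.18) with the matrix exponential solution (see Section 6.5) at an arbitrarily chosen time.

C  = [1 -1; 9 -5];                  % the coefficient matrix in (6.2.19)
X0 = [2; 3];
lambda1 = -2;                       % the double eigenvalue
t  = 0.7;                           % an arbitrary time
Xformula = exp(lambda1*t)*(eye(2) + t*(C - lambda1*eye(2)))*X0;   % formula (6.2.18)
Xformula - expm(t*C)*X0             % zero up to roundoff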


13. Show that multiplication of the plane C by i rotates C


 
dX −1 −4
4. = X.
dt 2 3 counterclockwise by 90◦ .

14. Show that multiplication of thecomplex plane C by σ+iτ


 
dX 8 −15
5. = X.
−4

dt 3 σ −τ
moves C by the matrix .
  τ σ
dX 5 −1
6. = X.
dt 1 3
 
dX −4 4
7. = X.
dt −1 0

8. Use Euler’s formula (6.2.5) and the identity

ei(a−b) = eia e−ib (6.2.20)

to prove

cos(a − b) = cos(a) cos(b) + sin(a) sin(b), (6.2.21)

sin(a − b) = sin(a) cos(b) − cos(a) sin(b). (6.2.22)

9. Equate the real and imaginary parts of both sides of the


identity
e2iθ = eiθ · eiθ (6.2.23)

and Euler’s formula (6.2.5) to prove the identities

sin(2θ) = 2 sin(θ) cos(θ), (6.2.24)

cos(2θ) = cos2 (θ) − sin2 (θ). (6.2.25)

In Exercise 10-11, use Euler’s formula (6.2.5) and (6.2.7) to


express the given complex numbers as x + iy, where x and y
are real numbers.

10. eiπ + 1.

11. eln 2+i



2 .

12. (matlab) Use MATLAB to verify

e−13+31i = e−13 (cos(31) + i sin(31)). (6.2.26)


6.3 Similar Matrices and Jordan Invariants of Similarity


Normal Form Lemma 6.3.3. Let A and B be similar 2 × 2 matrices.
In a certain sense every 2×2 matrix can be thought of as a Then
member of one of three families of matrices. Specifically
pA (λ) = pB (λ),
we show that every 2 × 2 matrix is similar to one of
the matrices listed in Theorem 6.3.4, where similarity is det(A) = det(B),
defined as follows. tr(A) = tr(B),

Definition 6.3.1. The n × n matrices B and C are sim- and the eigenvalues of A and B are equal.
ilar if there exists an invertible n × n matrix P such that
Proof The determinant is a function on 2 × 2 matrices
C = P −1 BP. that has several important properties. Recall, in partic-
ular, from Chapter 3, Theorem 3.8.2 that for any pair of
Our interest in similar matrices stems from the fact that 2 × 2 matrices A and B:
if we know the solutions to the system of differential equa-
det(AB) = det(A) det(B), (6.3.1)
tions Ẏ = CY , then we also know the solutions to the
system of differential equations Ẋ = BX. More precisely, and for any invertible 2 × 2 matrix P
Lemma 6.3.2. Suppose that B and C = P −1 BP are 1
det(P −1 ) = . (6.3.2)
similar matrices. If Y (t) is a solution to the system of det(P )
differential equations Ẏ = CY , then X(t) = P Y (t) is a
solution to the system of differential equations Ẋ = BX. Let P be an invertible 2 × 2 matrix so that B = P −1 AP .
Using (6.3.1) and (6.3.2) we see that
Proof Since the entries in the matrix P are constants,
pB (λ) = det(B − λI2 )
it follows that
dX dY = det(P −1 AP − λI2 )
=P .
dt dt = det(P −1 (A − λI2 )P )
Since Y (t) is a solution to the Ẏ = CY equation, it = det(A − λI2 )
follows that = pA (λ).
dX
= P CY. Hence the eigenvalues of A and B are the same. It follows
dt
from (4.6.8) and (4.6.9) of Section 4.6 that the determi-
Since Y = P −1 X and P CP −1 = B, nants and traces of A and B are equal. 

dX For example, if
= P CP −1 X = BX.
dt    
−1 0 1 2
Thus X(t) is a solution to Ẋ = BX, as claimed.  A= and P = ,
0 1 1 1


then Proof The strategy in the proof of this theorem is to


determine the 1st and 2nd columns of P −1 CP by com-
 
−1 −1 2
P =
1 −1 puting (in each case) P −1 CP ej for j = 1 and j = 2.
and   Note from the definition of P that
−1 3 4
P AP =
−2 −3
. P e 1 = v1 and P e2 = v2 .
A calculation shows that In addition, if P is invertible, then
det(P −1
AP ) = −1 = det(A) and tr(P −1
AP ) = 0 = tr(A), P −1 v1 = e1 and P −1 v2 = e2 .
as stated in Lemma 6.3.3. Note that if v1 and v2 are linearly independent, then P
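These invariants are easily confirmed in MATLAB for the matrices A and P above:

A = [-1 0; 0 1];
P = [1 2; 1 1];
B = inv(P)*A*P                      % the matrix P^(-1)AP computed above
[det(A) det(B)]                     % both determinants equal -1
[trace(A) trace(B)]                 % both traces equal 0
[sort(eig(A)) sort(eig(B))]         % the eigenvalues -1 and 1 agree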
is invertible.
Classification of Jordan Normal Form 2 × 2 Matrices (a) Since v1 and v2 are assumed to be linearly inde-
We now classify all 2 × 2 matrices up to similarity. pendent, P is invertible. So we can compute
Theorem 6.3.4. Let C and P = (v1 |v2 ) be 2×2 matrices P −1 CP e1 = P −1 Cv1 = λP −1 v1 = λe1 .
where the vectors v1 and v2 are specified below.
It follows that the 1st column of P −1 CP is
(a) Suppose that C has two linearly independent real  
eigenvectors v1 and v2 with real eigenvalues λ1 and λ1
.
λ2 . Then 0

Similarly, the 2nd column of P −1 CP is


 
−1 λ1 0
P CP = .
0 λ2  
0
(b) Suppose that C has no real eigenvectors and complex λ2
conjugate eigenvalues σ ± iτ where τ 6= 0. Then thus verifying (a).
(b) Lemma 6.2.2 implies that v1 and v2 are linearly in-
 
−1 σ −τ
P CP = ,
τ σ dependent and hence that P is invertible. Using (6.2.11),
where v1 + iv2 is an eigenvector of C associated with with τ replaced by −τ , v replaced by v1 , and w replaced
the eigenvalue λ1 = σ − iτ . by w1 , we calculate

(c) Suppose that C has exactly one linearly independent P −1 CP e1 = P −1 Cv1 = σP −1 v1 + τ P −1 v2 = σe1 + τ e2 ,
real eigenvector v1 with real eigenvalue λ1 . Then and
 
−1 λ1 1
P CP = , P −1 CP e2 = P −1 Cv2 = −τ P −1 v1 +σP −1 v2 = −τ e1 +σe2 .
0 λ1
where v2 is a generalized eigenvector of C that sat- Thus the columns of P −1 CP are
isfies
   
σ −τ
(C − λ1 I2 )v2 = v1 . (6.3.3) and ,
τ σ


as desired. to transform a given system into another normal form


(c) Let v1 be an eigenvector and assume that v2 system whose solution is already known. This method is
very much like the technique of change of variables used
is a generalized eigenvector satisfying (6.3.3). By
when finding indefinite integrals in calculus.
Lemma 6.2.4 the vectors v1 and v2 exist and are linearly
independent. We suppose that we are given a system of differential
For this choice of v1 and v2 , compute equations Ẋ = CX and use Theorem 6.3.4 to transform
C by similarity to one of the normal form matrices listed
P −1 CP e1 = P −1 Cv1 = λ1 P −1 v1 = λ1 e1 , in that theorem. We then solve the transformed equa-
tion (see Table 2) and use Lemma 6.3.2 to transform the
and solution back to the given system.
For example, suppose that C has a complex eigenvalue
P −1 CP e2 = P −1 Cv2 = P −1 v1 + λ1 P −1 v2 = e1 + λ1 e2 .
σ − iτ with corresponding eigenvector v + iw. Then The-
Thus the two columns of P −1 CP are: orem 6.3.4 states that
 
    −1 σ −τ
λ1 1 B = P CP = ,
and . τ σ
0 λ1
where P = (v|w) is an invertible matrix. Using Table 2
 the general solution to the system of equations Ẏ = BY
is:
Solutions of Jordan Normal Form Equations The
  
cos(τ t) − sin(τ t) α
Y (t) = eσt .
eigenvectors of the matrices in Table 2(a) are v1 = (1, 0)t sin(τ t) cos(τ t) β
and v2 = (0, 1)t . Hence, the closed form solution of (a) Lemma 6.3.2 states that
in that table follows from the direct solution in (6.2.2).
X(t) = P Y (t)
The eigenvectors of the matrices in Table 2(b) are v1 =
v + iw and v2 = v − iw, where v = (0, 1)t and w = (1, 0)t . is the general solution to the Ẋ = CX system. Moreover,
Hence, the closed form solution of (a) in that table follows we can solve the initial value problem by solving
from the direct solution in (6.2.18)  
α
Finally, the eigenvector and generalized eigenvector of X0 = P Y (0) = P
β
the matrices in Table 2(c) are v1 = (1, 0)t and w1 =
(0, 1)t . Hence, the closed form solution of (c) in that for α and β. In particular,
table follows from the direct solution in (6.2.3)
 
α
= P −1 X0 .
β
Closed Form Solutions Using Similarity We now use Putting these steps together implies that
Lemma 6.3.2, Theorem 6.3.4, and the explicit solutions  
cos(τ t) − sin(τ t)
to the normal form equations Table 2 to find solutions X(t) = eσt P P −1 X0 (6.3.4)
sin(τ t) cos(τ t)
for Ẋ = CX where C is any 2 × 2 matrix. The idea
behind the use of similarity to solve systems of ODEs is is the solution to the initial value problem.
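In MATLAB the procedure just described can be carried out with the command eig. The following sketch, for the matrix C of (6.2.13) whose eigenvalues are complex, builds P from the real and imaginary parts of an eigenvector for the eigenvalue σ − iτ and then evaluates (6.3.4); the time t = 1 is an arbitrary test value.

C  = [-1 2; -5 -3];  X0 = [1; 1];       % the system (6.2.13)
[V,D] = eig(C);
k     = find(imag(diag(D)) < 0);        % pick the eigenvalue sigma - i*tau with tau > 0
sigma = real(D(k,k));  tau = -imag(D(k,k));
v = real(V(:,k));  w = imag(V(:,k));
P = [v w];                              % P = (v|w) as in Theorem 6.3.4(b)
t = 1;
R = [cos(tau*t) -sin(tau*t); sin(tau*t) cos(tau*t)];
exp(sigma*t)*P*R*inv(P)*X0              % formula (6.3.4)
expm(t*C)*X0                            % agrees up to roundoff (see Section 6.5)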


name   normal form equation            closed form solution
(a)    Ẋ = [λ1 0; 0 λ2] X              X(t) = [eλ1 t 0; 0 eλ2 t] X0
(b)    Ẋ = [σ −τ; τ σ] X               X(t) = eσt [cos(τ t) −sin(τ t); sin(τ t) cos(τ t)] X0
(c)    Ẋ = [λ1 1; 0 λ1] X              X(t) = eλ1 t [1 t; 0 1] X0

Table 2: Solutions to Jordan normal form ODEs with X(0) = X0 .

The Example with Complex Eigenvalues Revisited Recall A calculation gives


the example in (6.2.13) 1 −2t
   
2 0 cos(3t) − sin(3t) 1
X(t) = e
  2 −1 −3 sin(3t) cos(3t) −1
dX −1 2 
cos(3t) + sin(3t)

= X, = e
−2t
.
dt −5 −3 cos(3t) − 2 sin(3t)

Thus the solution to (6.2.13) that we have found using


with initial values
similarity of matrices is identical to the solution (6.2.14)
that we found by the direct method.
 
1
X0 = .
1 Solving systems with either distinct real eigenvalues or
equal eigenvalues works in a similar fashion.
This linear system has a complex eigenvalue σ − iτ =
−2 − 3i with corresponding eigenvector
  Exercises
2
v + iw = .
−1 − 3i

Thus the matrix P that transforms C into normal form 1. Suppose that the matrices A and B are similar and the
is matrices B and C are similar. Show that A and C are also
similar matrices.
   
2 0 1 3 0
P = and P −1 = .
−1 −3 6 −1 −2 2. Use (4.6.13) in Chapter 3 to verify that the traces of similar
matrices are equal.
It follows from (6.3.4) that the solution to the initial value
problem is
In Exercises 3 – 4 determine whether or not the given matrices
are similar, and why.
 
−2t cos(3t) − sin(3t) −1
X(t) = e P P X0
sin(3t) cos(3t)    
1 2 2 −2
3. and .
   
1 −2t 2 0 cos(3t) − sin(3t) 3 0 A = B =
= e X0 . 3 4 −3 8
6 −1 −3 sin(3t) cos(3t) −1 −2


10. Use PhasePlane to verify that the nonzero solutions to


   
2 2 4 −2
4. C = and D = .
2 2 −2 4 the system
dX
= CX
dt
5. Let B = P −1 AP so that A and B are similar matrices. where  
Suppose that v is an eigenvector of B with eigenvalue λ. Show 0 −1
C= (6.3.5)
that P v is an eigenvector of A with eigenvalue λ. 1 0
are circles around the origin. Let
 
2 1
6. Which n × n matrices are similar to In ? P =
3 4

and let
7. Solve the initial value problem
 
−2.8 −3.4
B = P −1 CP =
2.6 2.8
ẋ = 2x + 3y
Describe the solutions to the system
ẏ = −3x + 2y
dX
where x(0) = 1 and y(0) = −2. = BX. (6.3.6)
dt
What is the relationship between solutions of (6.3.5) to solu-
8. Solve the initial value problem tions of (6.3.6)?

ẋ = −2x + y
ẏ = −2y

where x(0) = 4 and y(0) = −1.

9. (matlab) Use PhasePlane to plot phase plane portraits


for each of the three types of linear systems (a), (b) and (c)
in Table 2. Based on this computer exploration answer the
following questions:

(i) If a solution to that system spirals about the origin, is


the system of differential equations of type (a), (b) or
(c)?
(ii) How many eigendirections are there for equations of type
(c)?
(iii) Let (x(t), y(t)) be a solution to one of these three types
of systems and suppose that y(t) oscillates up and down
infinitely often. Then (x(t), y(t)) is a solution for which
type of system?


6.4 Sinks, Saddles, and Sources (b) If the eigenvalues of C have positive real part, then
the origin is a source.
The qualitative theory of autonomous differential equa-
tions begins with the observation that many important (c) If one eigenvalue of C is positive and one is negative,
properties of solutions to constant coefficient systems of then the origin is a saddle.
differential equations
dX
= CX (6.4.1) Proof Lemma 6.3.3 states that the similar matrices B
dt and C have the same eigenvalues. Moreover, as noted
are unchanged by similarity. the origin is a sink, saddle, or source for B if and only if
it is a sink, saddle, or source for C. Thus, we need only
We call the origin of the linear system (6.4.1) a sink (or verify the theorem for normal form matrices as given in
asymptotically stable) if all solutions X(t) satisfy Table 2.
lim X(t) = 0. (a) If the eigenvalues λ1 and λ2 are real and there
t→∞
are two independent eigenvectors, then Chapter 6, The-
The origin is a source if all nonzero solutions X(t) satisfy orem 6.3.4 states that the matrix C is similar to the di-
agonal matrix
lim ||X(t)|| = ∞.
 
λ1 0
t→∞ B= .
0 λ2
Finally, the origin is a saddle if some solutions limit to
0 and some solutions grow infinitely large. Recall also The general solution to the differential equation Ẋ = BX
from Lemma 6.3.2 that if B = P −1 CP , then P −1 X(t) is
is a solution to Ẋ = BX whenever X(t) is a solution to x1 (t) = α1 eλ1 t and x2 (t) = α2 eλ2 t .
(6.4.1). Since P −1 is a matrix of constants that do not
depend on t, it follows that Since
lim eλ1 t = 0 = lim eλ2 t ,
−1 t→∞ t→∞
lim X(t) = 0 ⇐⇒ lim P X(t) = 0.
t→∞ t→∞
when λ1 and λ2 are negative, it follows that
or
lim X(t) = 0
lim ||X(t)|| = ∞ ⇐⇒ lim ||P −1 X(t)|| = ∞. t→∞
t→∞ t→∞
for all solutions X(t), and the origin is a sink. Note that
It follows that the origin is a sink (or saddle or source)
if both of the eigenvalues are positive, then X(t) will
for (6.4.1) if and only if P −1 X(t) is a sink (or saddle or
undergo exponential growth and the origin is a source.
source) for Ẋ = BX.
(b) If the eigenvalues of C are the complex conjugates
Theorem 6.4.1. Consider the system (6.4.1) where C σ ±iτ where τ 6= 0, then Chapter 6, Theorem 6.3.4 states
is a 2 × 2 matrix. that after a similarity transformation (6.4.1) has the form

(a) If the eigenvalues of C have negative real part, then


 
σ −τ
the origin is a sink. Ẋ = X,
τ σ


and solutions for this equation have the form (6.3.4) of and not on the formulae for solutions to (6.4.1). This
Chapter 6, that is, is a much simpler calculation. However, Theorem 6.4.2
  simplifies the calculation substantially further.
σt cos(τ t) − sin(τ t)
X(t) = e X0 = eσt Rτ t X0 , Theorem 6.4.2. (a) If det(C) < 0, then 0 is a saddle.
sin(τ t) cos(τ t)
where Rτ t is a rotation matrix (recall (3.2.2) of Chap- (b) If det(C) > 0 and tr(C) < 0, then 0 is a sink.
ter 3). It follows that as time evolves the vector X0 is (c) If det(C) > 0 and tr(C) > 0, then 0 is a source.
rotated about the origin and then expanded or contracted
by the factor eσt . So when σ < 0, lim X(t) = 0 for all Proof Recall from (4.6.9) that det(C) is the product
t→∞
solutions X(t). Hence the origin is a sink and when σ > 0 of the eigenvalues of C. Hence, if det(C) < 0, then the
solutions spiral away from the origin and the origin is a signs of the eigenvalues must be opposite, and we have
source. a saddle. Next, suppose det(C) > 0. If the eigenvalues
(c) If the eigenvalues are both equal to λ1 and if there is are real, then the eigenvalues are either both positive (a
only one independent eigenvector, then Chapter 6, The- source) or both negative (a sink). Recall from (4.6.8) that
orem 6.3.4 states that after a similarity transformation tr(C) is sum of the eigenvalues and the sign of the trace
(6.4.1) has the form determines the sign of the eigenvalues. Finally, assume
the eigenvalues are complex conjugates σ ± iτ . Then
det(C) = σ 2 + τ 2 > 0 and tr(C) = 2σ. Thus, the sign of
 
λ1 1
Ẋ = X,
0 λ1 the real parts of the complex eigenvalues is given by the
sign of tr(C). 
whose solutions are
 
1 t
X(t) = e tλ
X0 Time Series It is instructive to note how the time series
0 1
x1 (t) damps down to the origin in the three cases listed
using Table 2(c). Note that the functions in Theorem 6.4.1. In Figure 18 we present the time series
eλ1 t and teλ1 t both have limits equal to zero as for the three coefficient matrices:
t → ∞. In the second case, use l’Hôspital’s rule and the
 
−2 0
assumption that −λ1 > 0 to compute C1 =
0 −1
,
 
t 1 −1 −55
lim = − lim = 0. C2 = ,
t→∞ e−λ1 t t→∞ λ1 e−λ1 t 55 −1
Hence lim X(t) = 0 for all solutions X(t) and the origin
 
−2 1
t→∞ C3 = .
0 −2
is asymptotically stable. Note that initially ||X(t)|| can
grow since t is increasing. But eventually exponential In this figure, we can see the exponential decay to zero
decay wins out and solutions limit on the origin. Note associated with the unequal real eigenvalues of C1 ; the
that solutions grow exponentially when λ1 > 0.  damped oscillation associated with the complex eigenval-
ues of C2 ; and the initial growth of the time series due
Theorem 6.4.1 shows that the qualitative features of the to the te−2t term followed by exponential decay to zero
origin for (6.4.1) depend only on the eigenvalues of C in the equal eigenvalue C3 example.


[Three time series plots of x versus t, one for each of the coefficient matrices C1 , C2 , and C3 .]

Figure 18: Time series for different sinks.

Sources Versus Sinks The explicit form of solutions to Phase Portraits for Saddles Next we discuss the phase
planar linear systems shows that solutions with initial portraits of linear saddles. Using PhasePlane, draw the
conditions near the origin grow exponentially in forward phase portrait of the saddle
time when the origin of (6.4.1) is a source. We can prove
this point geometrically, as follows. ẋ = 2x + y
(6.4.4)
ẏ = −x − 3y,
The phase planes of sources and sinks are almost the
same; they have the same trajectories but the arrows are
as in Figure 20. The important feature of saddles is that
reversed. To verify this point, note that
there are special trajectories (the eigendirections) that
limit on the origin in either forward or backward time.
Ẋ = −CX (6.4.2)
Definition 6.4.3. The stable manifold or stable orbit of
is a sink when (6.4.1) is a source; observe that the tra- a saddle consists of those trajectories that limit on the
jectories of solutions of (6.4.1) are the same as those origin in forward time; the unstable manifold or unstable
of (6.4.2) — just with time running backwards. For orbit of a saddle consists of those trajectories that limit
let X(t) be a solution to (6.4.1); then X(−t) is a so- on the origin in backward time.
lution to (6.4.2). See Figure 19 for plots of Ẋ = BX and
Ẋ = −BX where Let λ1 < 0 and λ2 > 0 be the eigenvalues of a saddle with
  associated eigenvectors v1 and v2 . The stable orbits are
−1 −5
B= . (6.4.3) given by the solutions X(t) = ±eλ1 t v1 and the unstable
5 −1
orbits are given by the solutions X(t) = ±eλ2 t v2 .
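For the saddle (6.4.4) these eigendirections are easily computed in MATLAB:

C = [2 1; -1 -3];                       % the coefficient matrix of (6.4.4)
[V,D] = eig(C);
lambda = diag(D)                        % one negative and one positive eigenvalue
V(:, lambda < 0)                        % eigenvector v1: direction of the stable orbits
V(:, lambda > 0)                        % eigenvector v2: direction of the unstable orbits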

So when we draw schematic phase portraits for sinks, we


automatically know how to draw schematic phase por- Stable and Unstable Orbits using PhasePlane The pro-
traits for sources. The trajectories are the same — but gram PhasePlane is programmed to draw the stable and
the arrows point in the opposite direction. unstable orbits of a saddle on command. Although the


[Two phase portraits in the (x, y)-plane: the sink Ẋ = BX on the left and the source Ẋ = −BX on the right.]

Figure 19: (Left) Sink Ẋ = BX where B is given in (6.4.3). (Right) Source Ẋ = −BX.

principal use of this feature is seen when analyzing non- plotted a time series of the first quadrant solution. Note
linear systems, it is useful to introduce this feature now. how the x time series increases exponentially to +∞ in
As an example, load the linear system (6.4.4) into Phase- forward time and the y time series decreases in forward
Plane and click on Update. Now pull down the Anal- time while going exponentially towards −∞. The two
ysis menu and click on Find nearby equilibrium. Click time series together give the trajectory (x(t), y(t)) that
the cross hairs in the PHASEPLANE Display window on in forward time is asymptotic to the line given by the
a point near the origin; PhasePlane responds by plot- unstable eigendirection.
ting the equilibrium and the real eigenvectors — and by
putting a small circle about the origin. The circle in-
dicates that the numerical algorithm programmed into Exercises
PhasePlane has detected an equilibrium near the chosen
point. This process numerically verifies that the origin is
a saddle (a fact that could have been verified in a more In Exercises 1 – 3 determine whether or not the equilibrium
straightforward way). at the origin in the system of differential equations Ẋ = CX
is asymptotically stable.
Now pull down the Analysis menu again and click on Solve
for stable separatrices. PhasePlane responds by drawing 
1 2

the stable and unstable orbits. The result is shown in 1. C = .
4 1
Figure 20(left). On this figure we have also plotted one
trajectory from each quadrant; thus obtaining the phase
 
−1 2
2. C = .
portrait of a saddle. On the right of Figure 20 we have −4 −1


[Left: a phase portrait in the (x, y)-plane. Right: time series of x and y versus t.]

Figure 20: (Left) Saddle phase portrait. (Right) First quadrant solution time series.

   
2 1 1 −8
3. C = . 9. C = .
1 −5 2 1

In Exercises 10 – 13 use PhasePlane to determine whether the


In Exercises 4 – 9 determine whether the equilibrium at the
origin is a saddle, sink, or source in Ẋ = CX for the given
origin in the system of differential equations Ẋ = CX is a
matrix C.
sink, a saddle or a source.  
10 −2.7
  10. (matlab) C = .
−2 2 4.32 1.6
4. C = .
0 −1  
−10 −2.7
11. (matlab) C = .
4.32 1.6
 
3 5
5. C = .
0 −2  
−1 2
12. (matlab) C = .
  4.76 1.5
4 2
6. C = .
−1 2
 
−2 −2
13. (matlab) C = .
  4 1
8 0
7. C = .
−5 3
In Exercises 14 – 15 the given matrices B and C are similar.
Observe that the phase portraits of the systems Ẋ = BX and
 
9 −11
8. C = . Ẋ = CX are qualitatively the same in two steps.
−11 9


(a) Use MATLAB to find the 2 × 2 matrix P such that B =


P −1 CP . Use map to understand how the matrix P moves
points in the plane.
(b) Use PhasePlane to observe that P moves solutions of Ẋ =
BX to the solution of Ẋ = CX. Write a sentence or two
describing your results.
 
2 3
14. (matlab) C = and B =
−1 −3
 
1 1 −1
.
2 −9 −3
 
−1 5
15. (matlab) C = and B =
−5 −1
 
−1 0.5
.
−50 −1

In Exercises 16-18 use the given data (the eigenvectors v1 , v2 ∈


R2 and associated eigenvalues λ1 , λ2 ∈ C of the 2 × 2 matrix
C, and initial condition X0 ∈ R2 ) to

(a) Find the general solution of the system of differential


equations
dX
= CX. (6.4.5)
dt
(b) Sketch the trajectory in phase space of (6.4.5) with initial
condition X0 .

16.
     
1 1 2
v1 = v2 = λ1 = −1 λ2 = 3 X0 =
−1 1 0

17.
     
1 0 2
v1 = ; v2 = ; λ1 = −1; λ2 = 3; X0 =
−1 −3 −2

18.
     
1−i 1+i 2
v1 = ; v2 = ; λ1 = −1+i; λ2 = −1−i; X0 =
−1 −1 0


6.5 *Matrix Exponentials In this formula L2 = LL is the matrix product of L


with itself, and the power Lk is defined inductively by
In Section 4.2 we showed that the solution of the sin-
Lk = LLk−1 for k > 1. Hence eL is an n × n matrix and
gle ordinary differential equation ẋ(t) = λx(t) with ini-
is the infinite sum of n × n matrices.
tial condition x(0) = x0 is x(t) = etλ x0 (see (4.1.4) in
Chapter 4). In this section we show that we may write Remark: The infinite series for matrix exponentials
solutions of systems of equations in a similar form. In (6.5.3) converges for all n × n matrices L. This fact is
particular, we show that the solution to the linear sys- proved in Exercises 13 and 14.
tem of ODEs Using (6.5.3), we can write the matrix exponential of tC
dX
= CX (6.5.1) for each real number t. Since (tC)k = tk C k we obtain
dt
with initial condition etC = In + tC +
1 1
(tC)2 + (tC)3 + · · ·
2!2 3! (6.5.4)
X(0) = X0 , t t3
= In + tC + C 2 + C 3 + · · · .
2! 3!
where C is an n × n matrix and X0 ∈ Rn , is Next we claim that
d tC
X(t) = etC X0 . (6.5.2) e = CetC . (6.5.5)
dt
We verify the claim by supposing that we can differenti-
In order to make sense of the solution (6.5.2) we need
ate (6.5.4) term by term with respect to t. Then
to understand matrix exponentials. More precisely, since
tC is an n × n matrix for each t ∈ R, we need to make d t2 2 d t3 3
   
d tC d d
e = (In ) + (tC) + C + C +
sense of the expression eL where L is an n × n matrix. dt dt dt dt 2! dt 3!
For this we recall the form of the exponential function as  4 
d t 4
a power series: C + ···
dt 4!
1 2 1 1 t2 t3
et = 1 + t + t + t3 + t4 + · · · . = 0 + C + tC 2 + C 3 + C 4 + · · ·
2! 3! 4! 2! 3!
t2 2 t3 3
 
In more compact notation we have = C In + tC + C + C + · · ·
2! 3!

X 1 k = CetC .
et = t .
k!
k=0 It follows that the function X(t) = etC X0 is a solution
of (6.5.1) for each X0 ∈ Rn ; that is,
By analogy, define the matrix exponential eL by
d d
X(t) = etC X0 = CetC X0 = CX(t).
1 1 dt dt
e L
= In + L + L2 + L3 + · · · (6.5.3)
2! 3! Since (6.5.3) implies that e0C = e0 = In , it follows that

1 k X(t) = etC X0 is a solution of (6.5.1) with initial con-
X
= L .
k=0
k! dition X(0) = X0 . This discussion shows that solving


(6.5.1) in closed form is equivalent to finding a closed Next verify (6.5.6) by computing
form expression for the matrix exponential etC .
∞ ∞
1 k X 1 −1
Theorem 6.5.1. The unique solution to the initial value
X
eC = C = (P BP )k
problem k!
k=0
k!
k=0
dX ∞ ∞
!
= CX X 1 −1 k X 1 k
dt = P B P = P −1 B P = P −1 eB P.
k! k!
k=0 k=0
X(0) = X0

is 
tC
X(t) = e X0 .

Proof Existence follows from the previous discussion. Explicit Computation of Matrix Exponentials We be-
For uniqueness, suppose that Y (t) is a solution to Ẏ = gin with the simplest computation of a matrix exponen-
CY with Y (0) = X0 . We claim that Y (t) = X(t). Let tial.
Z(t) = e−tC Y (t) and use the product rule to compute (a) Let L be a multiple of the identity; that is, let
L = αIn where α is a real number. Then
dZ dY
= −Ce−tC Y (t)+e−tC (t) = e−tC (−CY (t)+CY (t)) = 0
dt dt
eαIn = eα In . (6.5.7)
It follows that Z is constant in t and Z(t) = Z(0) =
Y (0) = X0 or Y (t) = etC X0 = X(t), as claimed.  That is, eαIn is a scalar multiple of the identity. To verify
(6.5.7), compute
Similarity and Matrix Exponentials We introduce similar-
α2 2 α3 3
ity at this juncture for the following reason: if C is a ma- eαIn = In + αIn + I + I + ···
trix that is similar to B, then eC can be computed from 2! n 3! n
eB . More precisely: α2 α3
= (1 + α + + + · · · )In = eα In .
2! 3!
Lemma 6.5.2. Let C and B be n × n similar matrices,
and let P be an invertible n × n matrix such that (b) Let C be a 2 × 2 diagonal matrix,
C = P −1 BP. !
λ1 0
Then C= ,
0 λ2
C
e =P −1 B
e P. (6.5.6)
where λ1 and λ2 are real constants. Then
Proof Note that for all powers of k we have !
eλ1 t 0
etC
= . (6.5.8)
(P −1 BP )k = P −1 B k P. 0 eλ 2 t


To verify (6.5.8) compute In this computation we have used the fact that the
trigonometric functions cos t and sin t have the power se-
t2 2 t3 3 ries expansions:
etC = I2 + tC + C + C + ···
2! 3!
 2 
X (−1)k ∞
t 2 1 2 1
t + t4 + · · · = t2k ,
! !
1 0 λ1 t 0  2! λ1 0  cos t = 1−
2! 4! (2k)!
= + + 2 + ··· k=0
0 1 0 λ2 t t 2 
0 λ ∞
2! 2 1 3 1 5 X (−1)k 2k+1
! sin t = t − t + t + ··· = t .
eλ1 t 0 3! 5! (2k + 1)!
k=0
= .
0 eλ2 t
See Exercise 10 for an alternative proof of (6.5.9).
(c) Suppose that To compute the matrix exponential MATLAB provides
the command expm. We use this command to compute
the matrix exponential etC for
!
0 −1
C= .
1 0 !
0 −1 π
C= and t = .
Then ! 1 0 4
cos t − sin t
etC = . (6.5.9)
sin t cos t Type
We begin this computation by observing that
C = [0, -1; 1, 0];
C 2 = −I2 , C 3 = −C, and C 4 = In . t = pi/4;
expm(t*C)
Therefore, by collecting terms of odd and even power in
the series expansion for the matrix exponential we obtain that gives the answer
t2 2 t3 3
etC = I2 + tC + C + C + ··· ans =
2! 3! 0.7071 -0.7071
t2 t3 0.7071 0.7071
= I2 + tC − I2 − C + · · ·
2! 3!
t2 t4 t6
 
= 1 − + − + · · · I2 + Indeed, this is precisely what we expect by (6.5.9), since
2! 4! 6!
π π 1
t3 t5 t7
 
t − + − + ··· C cos = sin = √ ≈ 0.70710678.
3! 5! 7! 4 4 2
= (cos t)I2 + (sin t)C
! (d) Let
cos t − sin t
!
= . 0 1
sin t cos t C= .
0 0


Then In Exercises 4 – 6 compute the matrix exponential etC for the


matrix.
!
1 t
etC = I2 + tC = , (6.5.10)
0 1
!
0 1
4. .
since C 2 = 0. 0 0
 
0 1 0
Exercises 5.  0

0 1 .

0 0 0
!
1. (matlab) Let L be the 3 × 3 matrix 6.
0 −2
.
  2 0
2 0 −1
L =  0 −1 3 .
 

1 0 1 7. Let α, β be real numbers and let αI and βI be correspond-


ing n × n diagonal matrices. Use properties of the scalar ex-
Find the smallest integer m such that ponential function to show that
1 2 1 1 m
I3 + L + L + L3 + · · · + L e(α+β)I = eαI eβI .
2! 3! m!
is equal to eL up to a precision of two decimal places. More
exactly, use the MATLAB command expm to compute eL and In Exercises 8 – 10 we use Theorem 6.5.1, the uniqueness of
use MATLAB commands to compute the series expansion to solutions to initial value problems, in perhaps a surprising
order m. Note that the command for computing n! in MAT- way.
LAB is prod(1:n).
8. Prove that
et+s = et es
2. (matlab) Use MATLAB to compute the matrix expo- for all real numbers s and t. Hint:
nential etC for
(a) Fix s and verify that y(t) = et+s is a solution to the
!
1 1
C=
2 −1 initial value problem

by choosing for t the values 1.0, 1.5 and 2.5. Does eC e1.5C = dx
= x
e2.5C ? dt (6.5.11)
x(0) = es

3. (matlab) For the scalar exponential function et it is well (b) Fix s and verify that z(t) = et es is also a solution to
known that for any pair of real numbers t1 , t2 the following (6.5.11).
equality holds: (c) Use Theorem 6.5.1 to conclude that y(t) = z(t) for every
et1 +t2 = et1 et2 . s.
Use MATLAB to find two 2 × 2 matrices C1 and C2 such that
In this exercise you will need to use the following theorem
eC1 +C2 6= eC1 eC2 . from analysis:


Theorem 6.5.3. If f (x) is differentiable near x0 , and if


df (c) Show that (6.5.15) proves (6.5.13)
dx
is continuous, then there is a unique solution to the differential
equation ẋ = f (x) with initial condition x(0) = x0 . 11. Compute eA where
9. Let A be an n × n matrix. Prove that !
3 −1
A= .
e(t+s)A = etA esA 1 1
for all real numbers s and t. Hint: Check your answer using MATLAB.

(a) Fix s ∈ R and X0 ∈ R and verify that Y (t) = e


n (t+s)A
X0
is a solution to the initial value problem 12. Let C be an n × n matrix. Use Theorem 6.5.1 to show
dX that the n columns of the n × n matrix etC give a basis of
= AX solutions for the system of differential equations Ẋ = CX.
dt (6.5.12)
X(0) = esA X0
  Remark: The completion of Exercises 13 and 14 constitutes
(b) Fix s and verify that Z(t) = etA esA X0 is also a solu- a proof that the infinite series definition of the matrix expo-
tion to (6.5.12). nential is a convergent series for all n × n matrices.
(c) Use the n dimensional version of Theorem 6.5.1 to con- 13. Let A = (aij ) be an n × n matrix. Define
clude that Y (t) = Z(t) for every s and every X0 . !
n
X
Remark: Compare the result in this exercise with the calcu- ||A||m = max (|ai1 | + · · · + |ain |) = max
1≤i≤n 1≤i≤n
|aij | .
lation in Exercise 7. j=1

10. Prove that That is, to compute ||A||m , first sum the absolute values of
!! ! the entries in each row of A, and then take the maximum of
0 −1 cos t − sin t these sums. Prove that:
exp t = . (6.5.13)
1 0 sin t cos t
||AB||m ≤ ||A||m ||B||m .
Hint:
Hint: Begin by noting that
! !
cos t − sin t ! !
(a) Verify that X1 (t) = and X2 (t) =
n X
X n n X
X n

sin t cos t ||AB||m = max aik bkj ≤ max |aik bkj |


1≤i≤n 1≤i≤n
are solutions to the initial value problems j=1 k=1
!
j=1 k=1

! n X
X n
dX 0 −1 = max |aik bkj | .
= X 1≤i≤n
dt 1 0 (6.5.14) k=1 j=1

X(0) = ej 14. Recall that an infinite series of real numbers

for j = 1, 2. c1 + c2 + · · · + cN + · · ·
(b) Since Xj (0) = ej , use Theorem 6.5.1 to verify that converges absolutely if there is a constant K such that for
!! every N the partial sum satisfies:
0 −1
Xj (t) = exp t ej . (6.5.15)
1 0 |c1 | + |c2 | + · · · + |cN | ≤ K.


Let A be an n × n matrix. To prove that the matrix ex-


ponential eA is an absolutely convergent infinite series use
Exercise 13 and the following steps. Let aN be the (i, j)th
entry in the matrix AN where A0 = In .

(a) |aN | ≤ ||AN ||m .


(b) ||AN ||m ≤ ||A||N
m.
1
(c) |a0 | + |a1 | + · · · + |aN | ≤ e||A||m .
N!

15. When the eigenvalues λ1 and λ2 of the 2×2 matrix C are


real and distinct, etC can be computed without determining
the associated eigenvectors. To see this, prove that
1  
etC = eλ1 t (C − λ2 I2 ) − eλ2 t (C − λ1 I2 ) . (6.5.16)
λ2 − λ1
Hint: The left and right hand sides of (6.5.16) are linear maps.
Two linear maps are identical if they have the same values on a
basis of vectors v1 and v2 . Verify that the maps in (6.5.16) are
equal when applied to the linearly independent eigenvectors
of C.


6.6 *The Cayley Hamilton Theorem Using the fact that pA (λ) = λ2 − tr(A)λ + det(A), we see
that
The Jordan normal form theorem (Theorem 6.3.4) for
real 2×2 matrices states that every 2×2 matrix is similar pC (λ) = (λ − λ1 )(λ − λ2 )
to one of the matrices in Table 2. We use this theorem
pD (λ) = λ2 − 2σλ + (σ 2 + τ 2 )
to prove the Cayley Hamilton theorem for 2 × 2 matrices
and then use the Cayley Hamilton theorem to present pE (λ) = (λ − λ1 )2 .
another method for computing solutions to planar linear
It now follows that
systems of differential equations in the case of real equal
eigenvalues. pC (C) = (C − λ1 I2 )(C − λ2 I2 )
The Cayley Hamilton theorem states that a matrix sat-
  
0 0 λ1 − λ2 0
=
isfies its own characteristic polynomial. More precisely: 0 λ2 − λ1 0 0
= 0,
Theorem 6.6.1 (Cayley Hamilton Theorem). Let A be
a 2 × 2 matrix and let and
σ 2 − τ 2 −2στ
 
pA (λ) = λ2 + aλ + b pD (D) = −
2στ σ2 − τ 2
be the characteristic polynomial of A. Then
 
σ −τ
2σ +
τ σ
pA (A) = A2 + aA + bI2 = 0.  
2 2 1 0
(σ + τ ) = 0,
0 1
Proof Suppose B = P −1 AP and A are similar matri-
ces. We claim that if pA (A) = 0, then pB (B) = 0. To and
verify this claim, recall from Lemma 6.3.3 that pA = pB 
0 1
2
2
and calculate pE (E) = (E − λ1 I2 ) =
0 0
= 0.

pB (B) = pA (P −1 AP ) = (P −1 AP )2 + aP −1 AP + bI2 
−1
=P pA (A)P = 0.
The Example with Equal Eigenvalues Revisited When
Theorem 6.3.4 classifies 2 × 2 matrices up to similarity. the eigenvalues λ1 = λ2 , the closed form solution of Ẋ =
Thus, we need only verify this theorem for the matrices CX is a straightforward formula

X(t) = eλ1 t (I2 + tN ) (6.6.1)


     
λ1 0 σ −τ λ1 1
C= ,D = ,E = ,
0 λ2 τ σ 0 λ1
where N = C − λ1 I2 .
that is, we need to verify that Note that when using (6.6.1), it is not necessary to com-
pute the eigenvector or generalized eigenvector of C, and
pC (C) = 0 pD (D) = 0 pE (E) = 0. this is a substantial simplification.


Verification of (6.6.1) We use the Cayley-Hamilton the- Exercises


orem to verify (6.6.1) as follows. Specifically, since C is
assumed to have a double eigenvalue λ1 , it follows that
N = C − λ1 I2 1. Solve the initial value problem
has zero as a double eigenvalue. Hence, the character- dX

0 1

istic polynomial pN (λ) = λ2 and the Cayley Hamilton dt
=
−2 3
X
theorem implies that N 2 = 0. Therefore,
where X(0) = (2, 1)t .
CX(t) = eλ1 t C(I2 +tN )X0 = eλ1 t (λ1 I2 +N )(I2 +tN )X0
Thus, using N 2 = 0, we see that
2. Find all solutions to the linear system of ODEs
CX(t) = eλ1 t (λ1 I2 + tλ1 N + N )X0 = Ẋ(t),
as desired
 
dX −2 4
= X.
dt −1 1
Let us reconsider the system of differential equations
(6.2.19)  
dX
=
1 −1
X = CX 3. Solve the initial value problem
dt 9 −5  
dX 2 1
with initial value =
−2 0
X
  dt
2
X0 = . where X(0) = (1, 1)t .
3
The eigenvalues of C are real and equal to λ1 = −2.
We may write 4. Let A be a 2 × 2 matrix. Show that
C = λ1 I2 + N = −2I2 + N, A2 = tr(A)A − det(A)I2 .
where  
3 −1
N= .
9 −3
It follows from (6.6.1) that
    
etC = e−2t [ I2 + t [3, −1; 9, −3] ] = e−2t [1 + 3t, −t; 9t, 1 − 3t] .    (6.6.2)
Hence the solution to the initial value problem is:
  
X(t) = etC X0 = e−2t [1 + 3t, −t; 9t, 1 − 3t] (2, 3)t = e−2t (2 + 3t, 3 + 9t)t .
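Both the identity N 2 = 0 and the formula (6.6.2) are easy to confirm in MATLAB; for instance, with an arbitrarily chosen time t:

C = [1 -1; 9 -5];
N = C + 2*eye(2)                        % N = C - lambda1*I2 with lambda1 = -2
N^2                                     % the zero matrix, as the Cayley Hamilton theorem predicts
t = 0.3;                                % an arbitrary time
expm(t*C) - exp(-2*t)*(eye(2) + t*N)    % (6.6.2) agrees with the matrix exponential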


6.7 *Second Order Equations x(t) measure the distance that the spring is extended (or
compressed). It follows from Newton’s Law that (6.7.3)
A second order constant coefficient homogeneous differ-
is satisfied. Hooke’s law states that the force F acting on
ential equation is a differential equation of the form:
a spring is
ẍ + bẋ + ax = 0, (6.7.1) F = −κx,

where a and b are real numbers. where κ is a positive constant. If the spring is damped
by sliding friction, then

Newton’s Second Law Newton’s second law of motion F = −κx − µ


dx
,
is a second order ordinary differential equation, and for dt
this reason second order equations arise naturally in me-
where µ is also a positive constant. Suppose, in addition,
chanical systems. Newton’s second law states that
that an external force Fext (t) also acts on the mass and
F = ma (6.7.2) that that force is time-dependent. Then the entire force
acting on the mass is
where F is force, m is mass, and a is acceleration.
dx
F = −κx − µ + Fext (t).
dt
Newton’s Second Law and Particle Motion on a Line For
a point mass moving along a line, (6.7.2) is By Newton’s second law, the motion of the mass is de-
scribed by
d2 x
F =m , (6.7.3) d2 x dx
dt2 m 2
+µ + κx = Fext (t), (6.7.5)
dt dt
where x(t) is the position of the point mass at time t.
For example, suppose that a particle of mass m is falling which is again a second order ordinary differential equa-
towards the earth. If we let g be the gravitational con- tion.
stant and if we ignore all forces except gravitation, then
the force acting on that particle is F = −mg. In this case
Newton’s second law leads to the second order ordinary
differential equation

d2 x
+ g = 0. (6.7.4) m
dt2

Newton’s Second Law and the Motion of a Spring As a


second example, consider the spring model pictured in L x
Figure 21. Assume that the spring has zero mass and
that an object of mass m is attached to the end of the Figure 21: Hooke’s Law spring.
spring. Let L be the natural length of the spring, and let


A Reduction to a First Order System There is a simple Q in (6.7.7). Second, we know from the general theory of
trick that reduces a single linear second order differential planar systems that solutions will have the form x(t) =
equation to a system of two linear first order equations. eλ0 t for some scalar λ0 . We need only determine the
For example, consider the linear homogeneous ordinary values of λ0 for which we get solutions to (6.7.1).
differential equation (6.7.1). To reduce this second order We now discuss the second approach. Suppose that
equation to a first order system, just set y = ẋ. Then x(t) = eλ0 t is a solution to (6.7.1). Substituting this
(6.7.1) becomes form of x(t) in (6.7.1) yields the equation
ẏ + by + ax = 0. λ20 + bλ0 + a eλ0 t = 0.


It follows that if x(t) is a solution to (6.7.1) and y(t) = So x(t) = eλ0 t is a solution to (6.7.1) precisely when
ẋ(t), then (x(t), y(t)) is a solution to pQ (λ0 ) = 0, where
ẋ = y
(6.7.6) pQ (λ) = λ2 + bλ + a (6.7.8)
ẏ = −ax − by.
is the characteristic polynomial of the matrix Q in (6.7.7).
We can rewrite (6.7.6) as
Suppose that λ1 and λ2 are distinct real roots of pQ .
Ẋ = QX. Then the general solution to (6.7.1) is

x(t) = α1 eλ1 t + α2 eλ2 t ,


where  
0 1 where αj ∈ R.
Q= . (6.7.7)
−a −b
Note that if (x(t), y(t)) is a solution to (6.7.6), then x(t) An Example with Distinct Real Eigenvalues For example,
is a solution to (6.7.1). Thus solving the single second solve the initial value problem
order linear equation is exactly the same as solving the
corresponding first order linear system. ẍ + 3ẋ + 2x = 0 (6.7.9)

with initial conditions x(0) = 0 and ẋ(0) = −2. The


The Initial Value Problem To solve the homogeneous characteristic polynomial is
system (6.7.6) we need to specify two initial conditions
X(0) = (x(0), y(0))t . It follows that to solve the single pQ (λ) = λ2 + 3λ + 2 = (λ + 2)(λ + 1),
second order equation we need to specify two initial con-
ditions x(0) and ẋ(0); that is, we need to specify both whose roots are λ1 = −1 and λ2 = −2. So the general
initial position and initial velocity. solution to (6.7.9) is

x(t) = α1 e−t + α2 e−2t


The General Solution There are two ways in which we
To find the precise solution we need to solve
can solve the second order homogeneous equation (6.7.1).
First, we know how to solve the system (6.7.6) by finding x(0) = α1 + α2 = 0
the eigenvalues and eigenvectors of the coefficient matrix ẋ(0) = −α1 − 2α2 = −2


So α1 = −2, α2 = 2, and the solution to the initial value Summary It follows from this discussion that solutions
problem for (6.7.9) is to second order homogeneous linear equations are either
a linear combination of two exponentials (real unequal
x(t) = −2e−t + 2e−2t eigenvalues), α + βt times one exponential (real equal
eigenvalues), or a time periodic function times an expo-
nential (complex eigenvalues).
An Example with Complex Conjugate Eigenvalues Con- In particular, if the real part of the complex eigenvalues
sider the differential equation is zero, then the solution is time periodic. The frequency
of this periodic solution is often called the internal fre-
ẍ − 2ẋ + 5x = 0. (6.7.10) quency, a point that is made more clearly in the next
example.
The roots of the characteristic polynomial associated to
(6.7.10) are λ1 = 1 + 2i and λ2 = 1 − 2i. It follows from
the discussion in the previous section that the general Solving the Spring Equation Consider the equation for
solution to (6.7.10) is the frictionless spring without external forcing. From
(6.7.5) we get
x(t) = Re α1 eλ1 t + α2 eλ2 t (6.7.11)

mẍ + κx = 0.
r r
κ κ
where α1 and α2 are complex scalars. Indeed, we can where κ > 0. The roots are λ1 = i and λ2 = − i.
rewrite this solution in real form (using Euler’s formula) m m
So the general solution is
as
x(t) = et (β1 cos(2t) + β2 sin(2t)) , x(t) = α cos(τ t) + β sin(τ t),

for real scalars β1 and β2 .


r
κ
where τ = . Under these assumptions the motion
In general, if the roots of the characteristic polynomial m
are σ ± iτ , then the general solution to the differential of the spring is time periodic with period

or internal
equation is: τ
τ
frequency . In particular, the solution satisfying initial

x(t) = eσt (β1 cos(τ t) + β2 sin(τ t)) . conditions x(0) = 1 and ẋ(0) = 0 (the spring is extended
one unit in distance and released with no initial velocity)
is
An Example with Multiple Eigenvalues Note that the co- x(t) = cos(τ t).
efficient matrix Q of the associated first order system in
(6.7.7) is never a multiple of I2 . It follows from the pre- The graph of this function when τ = 1 is given on the
vious section that when the roots of the characteristic left in Figure 22.
polynomial are real and equal, the general solution has If a small amount of friction is added, then the spring
the form equation is
x(t) = α1 eλ1 t + α2 teλ2 t . mẍ + µẋ + κx = 0


[Two time series plots of x versus t, one for the undamped spring and one for the damped spring.]

Figure 22: (Left) Graph of solution to undamped spring equation with initial conditions x(0) = 1 and ẋ(0) = 0. (Right)
Graph of solution to damped spring equation with the same initial conditions.

where µ > 0 is small. Since the eigenvalues of the char- Exercises


acteristic polynomial are λ = σ ± iτ where
r
µ κ  µ 2 1. By direct integration solve the differential equation (6.7.4)
σ=− <0 and τ = − ,
2m m 2m for a point particle moving only under the influence of gravity.
Find the solution for a particle starting at a height of 10 feet
the general solution is above ground with an upward velocity of 20 feet/sec. At what
time will the particle hit the ground? (Recall that acceleration
x(t) = eσt (α cos(τ t) + β sin(τ t)). due to gravity is 32 feet/sec2 .)

Since σ < 0, these solutions oscillate but damp down to 2. By direct integration solve the differential equation (6.7.4)
zero. In particular, the solution satisfying initial condi- for a point particle moving only under the influence of gravity.
tions x(0) = 1 and ẋ(0) = 0 is Show that the solution is
1
 µ  x(t) = − gt2 + v0 t + x0
x(t) = e−µt/2m cos(τ t) − sin(τ t) . 2
2mτ where x0 is the initial position of the particle and v0 is the
initial velocity.
µ
The graph of this solution when τ = 1 and = 0.07
2m
is given in Figure 22 (right). Compare the solutions for In Exercises 3 – 5 find the general solution to the given dif-
the undamped and damped springs. ferential equation.


3. ẍ + 2ẋ − 3x = 0.
4. ẍ − 6ẋ + 9x = 0. In addition, find the solution to this
equation satisfying initial values x(1) = 1 and ẋ(1) = 0.
5. ẍ + 2ẋ + 2x = 0.

6. Prove that a nonzero solution to a second order linear dif-


ferential equation with constant coefficients cannot be identi-
cally equal to zero on a nonempty interval.

7. Let r > 0 and w > 0 be constants, and let x(t) be a


solution to the differential equation
ẍ + rẋ + wx = 0.
Show that lim x(t) = 0.
t→∞

In Exercises 8 – 10, let x(t) be a solution to the second order


linear homogeneous differential equation (6.7.1). Determine
whether the given statement is true or false.
8. If x(t) is nonconstant and time periodic, then the roots of
the characteristic polynomial are purely imaginary.
9. If x(t) is constant in t, then one of the roots of the char-
acteristic polynomial is zero.
10. If x(t) is not bounded, then the roots of the characteristic
polynomial are equal.

11. Consider the second order differential equation


d2 x dx
+ a(x) + b(x) = 0. (6.7.12)
dt2 dt
Let y(t) = ẋ(t) and show that (6.7.12) may be rewritten as a
first order coupled system in x(t) and y(t) as follows:
ẋ = y
ẏ = −b(x) − a(x)y.

12. (matlab) Use PhasePlane to compute solutions to the


system corresponding to the spring equations with small slid-
ing friction. Plot the time series (in x) of the solution and
observe the oscillating and damping of the solution.


7 Determinants and Eigenvalues
In Section 3.8 we introduced determinants for 2 × 2 ma-
trices A. There we showed that the determinant of A is
nonzero if and only if A is invertible. In Section 4.6 we
saw that the eigenvalues of A are the roots of its char-
acteristic polynomial, and that its characteristic polyno-
mial is just the determinant of a related matrix, namely,
pA (λ) = det(A − λI2 ).
In Section 7.1 we generalize the concept of determinants
to n×n matrices, and in Section 7.2 we use determinants
to show that every n×n matrix has exactly n eigenvalues
— the roots of its characteristic polynomial. Properties
of eigenvalues are also discussed in detail in Section 7.2.
Certain details concerning determinants are deferred to
Appendix 7.4.


7.1 Determinants Proof (a) Note that Definition 7.1.1(a) implies that
D(cIn ) = cn . It follows from (7.1.1) that
There are several equivalent ways to introduce determi-
nants — none of which are easily motivated. We prefer to
D(cA) = D(cIn A) = D(cIn )D(A) = cn D(A).
define determinants through the properties they satisfy
rather than by formula. These properties actually en-
able us to compute determinants of n × n matrices where (b) Definition 7.1.1(b) implies that it suffices to prove
n > 3, which further justifies the approach. Later on, we this assertion when one row of A is zero. Suppose that
will give an inductive formula (7.1.9) for computing the the ith row of A is zero. Let J be an n × n diagonal
determinant. matrix with a 1 in every diagonal entry except the ith
diagonal entry which is 0. A matrix calculation shows
Definition 7.1.1. A determinant function of a square that JA = A. It follows from Definition 7.1.1(a) that
n × n matrix A is a real number D(A) that satisfies three D(J) = 0 and from (7.1.1) that D(A) = 0. 
properties:
Determinants of 2 × 2 Matrices Before discussing how
(a) If A = (aij ) is lower triangular, then D(A) is the
to compute determinants, we discuss the special case of
product of the diagonal entries; that is,
2 × 2 matrices. Recall from (3.8.2) of Section 3.8 that
D(A) = a11 · · · · · ann . when  
a b
A=
c d
(b) D(At ) = D(A).
we defined
(c) Let B be an n × n matrix. Then det(A) = ad − bc. (7.1.2)
We check that (7.1.2) satisfies the three properties in Def-
D(AB) = D(A)D(B). (7.1.1)
inition 7.1.1. Observe that when A is lower triangular,
then b = 0 and det(A) = ad. So (a) is satisfied. It is
Theorem 7.1.2. There exists a unique determinant straightforward to verify (b). We already verified (c) in
function det satisfying the three properties of Defini- Chapter 3, Proposition 3.8.2.
tion 7.1.1. It is less obvious perhaps — but true nonetheless — that
the three properties of D(A) actually force the determi-
We will show that it is possible to compute the determi- nant of 2 × 2 matrices to be given by formula (7.1.2). We
nant of any n × n matrix using Definition 7.1.1. Here we begin by showing that Definition 7.1.1 implies that
present a few examples:  
0 1
Lemma 7.1.3. Let A be an n × n matrix. D = −1. (7.1.3)
1 0

(a) Let c ∈ R be a scalar. Then D(cA) = cn D(A). We verify (7.1.3) by observing that

(b) If all of the entries in either a row or a column of A


 
0 1
are zero, then D(A) = 0. 1 0

188
§7.1 Determinants

equals as desired.

1 −1

1 0

1 0

1 1
 We have verified that the only possible determinant func-
. (7.1.4) tion for 2×2 matrices is the determinant function defined
0 1 1 1 0 −1 0 1
by (7.1.2).
Hence property (c), (a) and (b) imply that

Row Operations are Invertible Matrices


 
0 1
D = 1 · 1 · (−1) · 1 = −1.
1 0
Proposition 7.1.4. Let A and B be m × n matrices
It is helpful to interpret the matrices in (7.1.4) as elemen- where B is obtained from A by a single elementary row
tary row operations. Then (7.1.4) states that swapping operation. Then there exists an invertible m × m matrix
two rows in a 2 × 2 matrix is the same as performing the R such that B = RA.
following row operations in order:
Proof First consider multiplying the j th row of A by
• add the 2nd row to the 1st row; the nonzero constant c. Let R be the diagonal matrix
• multiply the 2nd row by −1; whose j th entry on the diagonal is c and whose other
diagonal entries are 1. Then the matrix RA is just the
• add the 1st row to the 2nd row; and matrix obtained from A by multiplying the j th row of A
by c. Note that R is invertible when c 6= 0 and that R−1
• subtract the 2nd row from the 1st row. 1
is the diagonal matrix whose j th entry is and whose
c
Suppose that d 6= 0. Then other diagonal entries are 1. For example
b ad − bc
! !     
1 0 0 a11 a12 a13 a11 a12 a13
 
a b 1 0
A= = d d .  0 1 0   a21 a22 a23  =  a21 a22 a23  ,
c d 0 1 c d 0 0 2 a31 a32 a33 2a31 2a32 2a33
It follows from properties (c), (b) and (a) that
multiplies the 3rd row by 2.
ad − bc
D(A) = d = ad − bc = det(A), Next we show that the elementary row operation that
d swaps two rows may also be thought of as matrix mul-
as claimed. tiplication. Let R = (rkl ) be the matrix that deviates
from the identity matrix by changing in the four entries:
Now suppose that d = 0 and note that

a b
 
0 1

c 0
 rii = 0
A= = .
c 0 1 0 a b rjj = 0
rij = 1
Using (7.1.3) we see that
  rji = 1
c 0
D(A) = −D = −bc = det(A), A calculation shows that RA is the matrix obtained from
a b

189
§7.1 Determinants

A by swapping the ith and j th rows. For example, Proof (a) The matrix that adds a multiple of one row
     to another is triangular (either upper or lower) and has
0 0 1 a11 a12 a13 a31 a32 a33 1’s on the diagonal. Thus property (a) in Definition 7.1.1
implies that the determinants of these matrices are equal
 0 1 0   a21 a22 a23  =  a21 a22 a23  ,
1 0 0 a31 a32 a33 a11 a12 a13 to 1.
which swaps the 1st and 3rd rows. Another calculation (b) The matrix that multiplies the ith row by c 6= 0 is
shows that R2 = In and hence that R is invertible since a diagonal matrix all of whose diagonal entries are 1 ex-
R−1 = R. cept for aii = c. Again property (a) implies that the
determinant of this matrix is c 6= 0.
Finally, we claim that adding c times the ith row of A to
the j th row of A can be viewed as matrix multiplication. (c) The matrix that swaps the ith row with the j th row
Let Ek` be the matrix all of whose entries are 0 except is the product of four matrices of types (a) and (b). To
for the entry in the k th row and `th column which is 1. see this let A be an n × n matrix whose ith row vector is
Then R = In + cEij has the property that RA is the ai . Then perform the following four operations in order:
matrix obtained by adding c times the j th row of A to
the ith row. We can verify by multiplication that R is
invertible and that R−1 = In − cEij . More precisely, Operation Result Matrix
Add ri to rj ri = ai rj = ai + aj B1
(In + cEij )(In − cEij ) = In + cEij − cEij − c2 Eij Multiply ri by −1
2
= In , rj = −ai rj = ai + aj B2
Add rj to ri ri = aj rj = ai + aj B3
since Eij
2
= O for i 6= j. For example, Subtract ri from rj ri = aj rj = ai B4
  
1 5 0 a11 a12 a13
It follows that the swap matrix equals B4 B3 B2 B1 .
(I3 + 5E12 )A =  0 1 0   a21 a22 a23 
Therefore
0 0 1 a31 a32 a33
  det(swap) = det(B4 ) det(B3 ) det(B2 ) det(B1 )
a11 + 5a21 a12 + 5a22 a13 + 5a23
= a21 a22 a23 ,
= (1)(−1)(1)(1) = −1.
a31 a32 a33

adds 5 times the 2nd row to the 1st row. 

Computation of Determinants We now show how to


Determinants of Elementary Row Matrices
compute the determinant of any n × n matrix A using
Lemma 7.1.5. (a) The determinant of the matrix that elementary row operations and Definition 7.1.1. It fol-
adds a multiple of one row to another is 1. lows from Proposition 7.1.4 that every elementary row
operation on A may be performed by premultiplying A
(b) The determinant of the matrix that multiplies one by an elementary row matrix.
row by c is c.
For each matrix A there is a unique reduced echelon
(c) The determinant of a swap matrix is −1. form matrix E and a sequence of elementary row ma-

190
§7.1 Determinants

trices R1 . . . Rs such that We use this approach to compute the determinant of the
4 × 4 matrix
E = Rs · · · R1 A. (7.1.5)
 
0 2 10 −2
It follows from Definition 7.1.1(c) that we can compute  1 2 4 0 
A= .
the determinant of A once we know the determinants of

1 6 1 −2 
reduced echelon form matrices and the determinants of 2 1 1 0
elementary row matrices. In particular
The idea is to use (7.1.7) to keep track of the determi-
D(E) nant while row reducing A to upper triangular form. For
D(A) = . (7.1.6)
D(R1 ) · · · D(Rs ) instance, swapping rows changes the sign of the determi-
nant; so
It is easy to compute the determinant of any matrix in
reduced echelon form using Definition 7.1.1(a) since all
 
1 2 4 0
reduced echelon form n×n matrices are upper triangular.  0 2 10 −2 
Lemma 7.1.5 tells us how to compute the determinants det(A) = − det  .
 1 6 1 −2 
of elementary row matrices. This discussion proves: 2 1 1 0

Proposition 7.1.6. If a determinant function exists for


n × n matrices, then it is unique. We call the unique Adding multiples of one row to another leaves the deter-
determinant function det. minant unchanged; so

 
1 2 4 0
We still need to show that determinant functions exist  0 2 10 −2 
when n > 2. More precisely, we know that the reduced det(A) = − det  .
 0 4 −3 −2 
echelon form matrix E is uniquely defined from A (Chap-
0 −3 −7 0
ter 2, Theorem 2.4.9), but there is more than one way to
perform elementary row operations on A to get to E.
Thus, we can write A in the form (7.1.6) in many differ- Multiplying a row by a scalar c corresponds to an ele-
ent ways, and these different decompositions might lead mentary row matrix whose determinant is c. To make
to different values for det A. (They don’t.) sure that we do not change the value of det(A), we have
to divide the determinant by c as we multiply a row of A
by c. So as we divide the second row of the matrix by 2,
An Example of Determinants by Row Reduction As a we multiply the whole result by 2, obtaining
practical matter we row reduce a square matrix A by
premultiplying A by an elementary row matrix Rj . Thus
 
1 2 4 0
 0 1 5 −1 
det(A) = −2 det  .
1  0 4 −3 −2 
det(A) = det(Rj A). (7.1.7)
det(Rj ) 0 −3 −7 0

191
§7.1 Determinants

We continue row reduction by zeroing out the last two If A is singular, then A is row equivalent to a non-identity
entries in the 2nd column, obtaining reduced echelon form matrix E whose determinant is zero
  (since E is upper triangular and its last diagonal entry
1 2 4 0 is zero). So it follows from (7.1.5) that
 0 1 5 −1 
det(A) = −2 det 
 0 0 −23

2  0 = det(E) = det(R1 ) · · · det(Rs ) det(A)
0 0 8 −3

1 2 4 0
 Since det(Rj ) 6= 0, it follows that det(A) = 0. 
 0 1 5 −1 
= 46 det 

2 .
 Corollary 7.1.8. If the rows of an n × n matrix A are
 0 0 1 −  linearly dependent (for example, if one row of A is a
23
0 0 8 −3 scalar multiple of another row of A), then det(A) = 0.
Thus

1 2 4 0
 An Inductive Formula for Determinants In this sub-
 0 1 5 −1 section we present an inductive formula for the determi-
nant — that is, we assume that the determinant is known


det(A) = 46 det  2 
 = −106.
 0 0 1 −
23
 for square (n−1)×(n−1) matrices and use this formula to
53 define the determinant for n × n matrices. This inductive
 
0 0 0 −
23 formula is called expansion by cofactors.
Let A = (aij ) be an n × n matrix. Let Aij be the (n −
Determinants and Inverses We end this subsection with 1) × (n − 1) matrix formed from A by deleting the ith row
an important observation about the determinant func- and the j th column. The matrices Aij are called cofactor
tion. This observation generalizes to dimension n Corol- matrices of A.
lary 3.8.3 of Chapter 3.
Inductively we define the determinant of an n × n matrix
Theorem 7.1.7. An n × n matrix A is invertible if and A by:
only if det(A) 6= 0. Moreover, if A−1 exists, then
n
X
1 det(A) = (−1)1+j a1j det(A1j )
det A−1 = . (7.1.8) j=1
det A
= a11 det(A11 ) − a12 det(A12 ) + · · ·

Proof If A is invertible, then + (−1)n+1 a1n det(A1n ). (7.1.9)

det(A) det(A−1 ) = det(AA−1 ) = det(In ) = 1. In Appendix 7.4 we show that the determinant function
defined by (7.1.9) satisfies all properties of a determinant
Thus det(A) 6= 0 and (7.1.8) is valid. In particular, the function. Formula (7.1.9) is also called expansion by co-
determinants of elementary row matrices are nonzero, factors along the 1st row, since the a1j are taken from the
since they are all invertible. (This point was proved by 1st row of A. Since det(A) = det(At ), it follows that if
direct calculation in Lemma 7.1.5.) (7.1.9) is valid as an inductive definition of determinant,

192
§7.1 Determinants

then expansion by cofactors along the 1st column is also There is a visual mnemonic for remembering how to com-
valid. That is, pute the six terms in formula (7.1.11) for the determinant
of 3 × 3 matrices. Write the matrix as a 3 × 5 array by
det(A) = a11 det(A11 )−a21 det(A21 )+· · ·+(−1)n+1 an1 det(An1 ).repeating the first two columns, as shown in bold face in
(7.1.10) Figure 23: Then add the product of terms connected by
We now explore some of the consequences of this defini- solid lines sloping down and to the right and subtract the
tion, beginning with determinants of small matrices. For products of terms connected by dashed lines sloping up
example, Definition 7.1.1(a) implies that the determinant and to the right. Warning: this nice crisscross algorithm
of a 1 × 1 matrix is just for computing determinants of 3 × 3 matrices does not
generalize to n × n matrices.
det(a) = a.
When computing determinants of n × n matrices when
Therefore, using (7.1.9), the determinant of a 2×2 matrix n > 3, it is usually more efficient to compute the determi-
is: nant using row reduction rather than by using formula
(7.1.9). In the appendix to this chapter, Section 7.4,
= a11 det(a22 )−a12 det(a21 ) = a11 a22 −a12 a21we
, verify that formula (7.1.9) actually satisfies the three
 
a11 a12
det
a21 a22 properties of a determinant, thus completing the proof of
Theorem 7.1.2.
which is just the formula for determinants of 2 × 2 ma-
trices given in (7.1.2). An interesting and useful formula for reducing the effort
in computing determinants is given by the following for-
Similarly, we can now find a formula for the determinant
mula.
of 3 × 3 matrices A as follows.
    Lemma 7.1.9. Let A be an n × n matrix of the form
a22 a23 a21 a23
det(A) = a11 det − a12 det 
B 0

a32 a33 a31 a33 A= ,
  C D
a21 a22
+ a13 det
a31 a32 where B is a k × k matrix and D is an (n − k) × (n − k)
= a11 a22 a33 + a12 a23 a31 + a13 a21 a32 matrix. Then
− a11 a23 a32 − a12 a21 a33 − a13 a22 a31 . (7.1.11) det(A) = det(B) det(D).

As an example, compute Proof We prove this result using (7.1.9) coupled with
  induction. Assume that this lemma is valid or all (n −
2 1 4 1) × (n − 1) matrices of the appropriate form. Now use
det  1 −1 3  (7.1.9) to compute
5 6 −2
det(A) = a11 det(A11 ) − a12 det(A12 ) + · · · ± a1n det(A1n )
using formula (7.1.11) as = b11 det(A11 ) − b12 det(A12 ) + · · · ± b1k det(A1k ).
2(−1)(−2) + 1 · 3 · 5 + 4 · 6 · 1 − 4(−1)5 − 3 · 6 · 2 − (−2)1 · 1 Note that the cofactor matrices A1j are obtained from
= 4 + 15 + 24 + 20 − 36 + 2 = 29. A by deleting the 1st row and the j th column. These

193
§7.1 Determinants

a11 a12 a13 a11 a12

a21 a22 a23 a21 a22

a31 a32 a33 a31 a32

Figure 23: Mnemonic for computation of determinants of 3 × 3 matrices.

matrices all have the form to use. For example, typing e8_1_11 will load the matrix
   
B1j 0 1 2 3 0
A1j = ,
Cj D  2 1 4 1 
A=  −2 −1
. (7.1.12*)
0 1 
where Cj is obtained from C by deleting the j th column. −1 0 −2 3
By induction on k
To compute the determinant of A just type det(A) and
det(A1j ) = det(B1j ) det(D). obtain the answer

It follows that ans =


-46
det(A) = (b11 det(B11 ) − b12 det(B12 ) + · · ·
±b1k det(B1k )) det(D) Alternatively, we can use row reduction techniques in
= det(B) det(D), MATLAB to compute the determinant of A — just to
test the theory that we have developed. Note that to
as desired.  compute the determinant we do not need to row reduce
to reduced echelon form — we need only reduce to an
upper triangular matrix. This can always be done by
Determinants in MATLAB The determinant function successively adding multiples of one row to another —
has been preprogrammed in MATLAB and is quite easy an operation that does not change the determinant. For

194
§7.1 Determinants

example, to clear the entries in the 1st column below the To evaluate the determinant of A, which is now an upper
1st row, type triangular matrix, type

A(2,:) = A(2,:) - 2*A(1,:); A(1,1)*A(2,2)*A(3,3)*A(4,4)


A(3,:) = A(3,:) + 2*A(1,:);
A(4,:) = A(4,:) + A(1,:) obtaining

obtaining ans =
-46
A =
1 2 3 0 as expected.
0 -3 -2 1
0 3 6 1 Exercises
0 2 1 3

To clear the 2nd column below the 2nd row type In Exercises 1 – 3 compute the determinants of the given
matrix.
A(3,:) = A(3,:) + A(2,:);A(4,:)  
−2 1 0
= A(4,:) - A(4,2)*A(2,:)/A(2,2) 1. A =  4 5 0 .
1 0 2
obtaining  
1 0 2 3
 −1 −2 3 2 
A = 2. B =  .
4 −2 0 3 
1.0000 2.0000 3.0000 0
1 2 0 −3
0 -3.0000 -2.0000 1.0000
0 0 4.0000 2.0000
 
2 1 −1 0 0
0 0 -0.3333 3.6667  1 −2 3 0 0 
3. C =  .
 
 −3 2 −2 0 0 
Finally, to clear the entry (4, 3) type −1
 1 1 2 4 
0 2 3 −1 −3
A(4,:) = A(4,:) -A(4,3)*A(3,:)/A(3,3)
 
0 2 0 1
to obtain  1 −1 0 −1 
4. Compute det 
 1
.
1 1 3 
A = 0 1 0 1
1.0000 2.0000 3.0000 0
0 -3.0000 -2.0000 1.0000 −2

−3 2

0 0 4.0000 2.0000 5. Find det(A−1 ) where A =  4 1 3 .
0 0 0 3.8333 −1 1 1

195
§7.1 Determinants

6. Show that the determinants of similar n × n matrices are


 
1 0 0
equal. 13. A =  0 0 1 .
0 1 0

In Exercises 7 – 9 use row reduction to compute the determi-


 
0 0 1
nant of the given matrix. 14. B =  0 1 0 .
  1 0 0
−1 −2 1
7. A =  3 1 3 .
−1 1 1
15.
 Compute 
the cofactor matrices A13 , A22 , A21 when A =

1 0 1 0
 3 2 −4
 0 1 0 −1 
 0 1 5 .
8. B =  1 0 −1
. 0 0 6
0 
0 1 0 1

16. Compute the cofactor


 matrices B11 , B23 , B43 when B =
 
1 2 0 1 
 0 2 1 0  0 2 −4 5
9. C =  −2 −3 3 −1 .

 −1 7 −2
 10 
.
1 0 5 2  0 0 0 −1 
3 4 2 −10
 
2 −1 −1
10. Compute det  1 1 3 . 17. Find values of λ where the determinant of the matrix
−1 1 1  
λ−1 0 −1
Aλ =  0 λ−1 1 
11. Let −1 1 λ

vanishes.
   
2 −1 0 2 0 0
A= 0 3 0  and B= 0 −1 0 .
1 5 3 0 0 3
18. Suppose that two n × p matrices A and B are row equiv-
(a) For what values of λ is det(λA − B) = 0?
alent. Show that there is an invertible n × n matrix P such
(b) Is there a vector x for which Ax = Bx? that B = P A.

 
−1 2 3 1 19. Let A be an invertible n × n matrix and let b ∈ Rn be
 1 −1 2 −1  a column vector. Let Bj be the n × n matrix obtained from
12. Compute det 
 .
1 1 1 3  A by replacing the j th column of A by the vector b. Let
0 −3 2 −1 x = (x1 , . . . , xn )t be the unique solution to Ax = b. Then
Cramer’s rule states that
In Exercises 13 – 14 verify that the given matrix has deter- det(Bj )
minant −1. xj = . (7.1.13)
det(A)

196
§7.1 Determinants

Prove Cramer’s rule. Hint: Let Aj be the j th column of A 25. The (n + 1) × (n + 1) Vandermonde matrix is
so that Aj = Aej . Show that
xn xn−1 ··· x21 x1 1
 
1 1
Bj = A(e1 | · · · |ej−1 |x|ej+1 | · · · |en ).  xn 2 xn−1
2 ··· x22 x2 1 
Vn+1 =  .. .. .. .. .. .
 
Using this product, compute the determinant of Bj and verify  . . . . . 
(7.1.13). xn
n+1 xn−1
n+1 ··· x2n−1 xn+1 1
(7.1.14)
Verify that
20. Show that the determinant of a permutation matrix is 1
(7.1.15)
Y
or −1. Hint: Related to Exercise 21. Use induction. det(Vn+1 ) = (xi − xj ).
1≤i<j≤n+1

Recall that a skew-symmetric matrix A is an n × n matrix


such that A = −At (See Definition 1.3.1). 26. (This exercise generalizes Exercise 7 in Section 2.1.) Let
x1 , . . . , xn+1 be distinct real numbers and let b1 , . . . , bn+1 be
21. When n = 3, a skew-symmetric matrix has the form real numbers. Then there exists a unique polynomial
 
0 a b P (x) = an xn + an−1 xn−1 + · · · + a0
A =  −a 0 c 
−b −c 0 of degree n such that

for some a, b, c ∈ R. Show that det(A) = 0 by direct compu- P (xi ) = bi (7.1.16)


tation.
for i = 1, . . . , n + 1.
Hint: Rewrite the system of equations (7.1.16) as the system
22. Suppose A is an n × n skew-symmetric matrix with n of linear equations Vn+1 x = b, where Vn+1 is the Vander-
being odd. Show that det(A) = 0. Hint: Use Lemma 7.1.3 monde matrix (7.1.14).
(a) and Definition 7.1.1 (b).

23. Compute det(V2 ) where


 
x1 1
V2 = .
x2 1

24. Let
x21
 
x1 1
V3 =  x22 x2 1 
x23 x3 1
Verify that

det(V3 ) = (x1 − x2 )(x1 − x3 )(x2 − x3 ).

197
§7.2 Eigenvalues and Eigenvectors

7.2 Eigenvalues and Eigenvectors Example 7.2.2. Let A be an n × n lower triangular


matrix. Then the diagonal entries are the eigenvalues of
In this section we discuss how to find eigenvalues for an
A. We verify this statement as follows.
n × n matrix A. This discussion parallels the discussion
for 2 × 2 matrices given in Section 4.6. As we noted in 
a11 − λ 0

that section, λ is a real eigenvalue of A if there exists a ..
nonzero eigenvector v such that A − λIn =  . .
 

(∗) ann − λ
Av = λv. (7.2.1)
Since the determinant of a triangular matrix is the prod-
It follows that the matrix A − λIn is singular since uct of the diagonal entries, it follows that

(A − λIn )v = 0. pA (λ) = (a11 − λ) · · · (ann − λ), (7.2.2)

Theorem 7.1.7 implies that and hence that the diagonal entries of A are roots of the
characteristic polynomial. A similar argument works if
det(A − λIn ) = 0. A is upper triangular.

With these observations in mind, we can make the fol- It follows from (7.2.2) that the characteristic polynomial
lowing definition. of a triangular matrix is a polynomial of degree n and
that
Definition 7.2.1. Let A be an n × n matrix. The char-
acteristic polynomial of A is: pA (λ) = (−1)n λn + bn−1 λn−1 + · · · + b0 . (7.2.3)

pA (λ) = det(A − λIn ). for some real constants b0 , . . . , bn−1 . In fact, this state-
ment is true in general.
Theorem 7.2.3. Let A be an n × n matrix. Then pA is
a polynomial of degree n of the form (7.2.3).
In Theorem 7.2.3 we show that pA (λ) is indeed a poly-
nomial of degree n in λ. Note here that the roots of pA
are the eigenvalues of A. As we discussed, the real eigen- Proof Let C be an n × n matrix whose entries have
values of A are roots of the characteristic polynomial. the form cij + dij λ. Then det(C) is a polynomial in λ of
Conversely, if λ is a real root of pA , then Theorem 7.1.7 degree at most n. We verify this statement by induction.
states that the matrix A − λIn is singular and therefore It is easily verified when n = 1, since then C = (c + dλ)
that there exists a nonzero vector v such that (7.2.1) for some real numbers c and d. Then det(C) = c + dλ
is satisfied. Similarly, by using this extended algebraic which is a polynomial of degree at most one. (It may have
definition of eigenvalues we allow the possibility of com- degree zero, if d = 0.) So assume that this statement is
plex eigenvalues. The complex analog of Theorem 7.1.7 true for (n − 1) × (n − 1) matrices. Recall from (7.1.9)
shows that if λ is a complex eigenvalue, then there exists that
a nonzero complex n-vector v such that (7.2.1) is satis-
fied. det(C) = (c11 +d11 λ) det(C11 )+· · ·+(−1)n+1 (c1n +d1n λ) det(C1n ).

198
§7.2 Eigenvalues and Eigenvectors

By induction each of the determinants C1j is a polyno- polynomial has exactly two roots. In general, the proof
mial of degree at most n−1. It follows that multiplication of the fundamental theorem is not easy and is certainly
by c1j + d1j λ yields a polynomial of degree at most n in beyond the limits of this course. Indeed, the difficulty in
λ. Since the sum of polynomials of degree at most n is proving the fundamental theorem of algebra is in proving
a polynomial of degree at most n, we have verified our that a polynomial p(λ) of degree n > 0 has one (complex)
assertion. root. Suppose that λ0 is a root of p(λ); that is, suppose
Since A − λIn is a matrix whose entries have the desired that p(λ0 ) = 0. Then it follows that
form, it follows that pA (λ) is a polynomial of degree at p(λ) = (λ − λ0 )q(λ) (7.2.4)
most n in λ. To complete the proof of this theorem we
need to show that the coefficient of λn is (−1)n . Again, for some polynomial q of degree n − 1. So once we know
we verify this statement by induction. This statement is that p has a root, then we can argue by induction to prove
easily verified for 1 × 1 matrices — we assume that it is that p has n roots. A linear algebra proof of (7.2.4) is
true for (n − 1) × (n − 1) matrices. Again use (7.1.9) to sketched in Exercise 17.
compute
Recall that a polynomial need not have any real roots.
det(A − λIn ) = (a11 − λ) det(B11 ) − a12 det(B12 ) + · · · For example, the polynomial p(λ) = λ2 + 1 has no real
roots, since p(λ) > 0 for all real √
λ. This polynomial does
+ (−1)n+1 a1n det(B1n ).
have two complex roots ±i = ± −1.
where B1j are the cofactor matrices of A−λIn . Using our However, a polynomial with real coefficients has either
previous observation all of the terms det(B1j ) are poly- real roots or complex roots that come in complex con-
nomials of degree at most n − 1. Thus, in this expansion, jugate pairs. To verify this statement, we need to show
the only term that can contribute a term of degree n is: that if λ0 is a complex root of p(λ), then so is λ0 . We
claim that
−λ det(B11 ). p(λ) = p(λ).

Note that the cofactor matrix B11 is the (n − 1) × (n − 1) To verify this point, suppose that
matrix
p(λ) = cn λn + cn−1 λn−1 + · · · + c0 ,
B11 = A11 − λIn−1 ,
where A11 is the first cofactor matrix of the matrix A. By where each cj ∈ R. Then
induction, det(B11 ) is a polynomial of degree n − 1 with
leading term (−1)n−1 λn−1 . Multiplying this polynomial p(λ)
by −λ yields a polynomial of degree n with the correct = cn λn + cn−1 λn−1 + · · · + c0
leading term.  n n−1
= cn λ + cn−1 λ + · · · + c0
= p(λ)
General Properties of Eigenvalues The fundamental
theorem of algebra states that every polynomial of degree If λ0 is a root of p(λ), then
n has exactly n roots (counting multiplicity). For exam-
ple, the quadratic formula shows that every quadratic p(λ0 ) = p(λ0 ) = 0 = 0.

199
§7.2 Eigenvalues and Eigenvectors

Hence λ0 is also a root of p. Proof An eigenvector v of A has eigenvalue zero if and


It follows that only if
Av = 0.
Theorem 7.2.4. Every (real) n×n matrix A has exactly
n eigenvalues λ1 , . . . , λn . These eigenvalues are either This statement is valid if and only if v is in the null space
real or complex conjugate pairs. Moreover, of A. 

(a) pA (λ) = (λ1 − λ) · · · (λn − λ), Theorem 7.2.7. Let A be an invertible n×n matrix with
eigenvalues λ1 , · · · , λn . Then the eigenvalues of A−1 are
(b) det(A) = λ1 · · · λn . 1 , · · · , λn .
λ−1 −1

Proof Since the characteristic polynomial pA is a poly-


Proof We claim that
nomial of degree n with real coefficients, the first part of
the theorem follows from the preceding discussion. In 1
particular, it follows from (7.2.4) that pA (λ) = (−1)n det(A)λn pA−1 ( ).
λ
pA (λ) = c(λ1 − λ) · · · (λn − λ), 1
It then follows that is an eigenvalue for A−1 for each
for some constant c. Formula (7.2.3) implies that c = λ
eigenvalue λ of A. This makes sense, since the eigenvalues
1 — which proves (a). Since pA (λ) = det(A − λIn ), of A are nonzero.
it follows that pA (0) = det(A). Thus (a) implies that
pA (0) = λ1 · · · λn , thus proving (b).  Compute:

1 1
The eigenvalues of a matrix do not have to be different. (−1)n det(A)λn pA−1 ( ) = (−λ)n det(A) det(A−1 − In )
λ λ
For example, consider the extreme case of a strictly tri- 1
angular matrix A. Example 7.2.2 shows that all of the = det(−λA) det(A−1 − In )
λ
eigenvalues of A are zero. 1
= det(−λA(A−1 − In ))
We now discuss certain properties of eigenvalues. λ
= det(A − λIn )
Corollary 7.2.5. Let A be an n × n matrix. Then A is
= pA (λ),
invertible if and only if zero is not an eigenvalue of A.
which verifies the claim. 
Proof The proof follows from Theorem 7.1.7 and The-
orem 7.2.4(b).  Theorem 7.2.8. Let A and B be similar n × n matrices.
Then
Lemma 7.2.6. Let A be a singular n × n matrix. Then pA = pB ,
the null space of A is the span of all eigenvectors whose
associated eigenvalue is zero. and hence the eigenvalues of A and B are identical.

200
§7.2 Eigenvalues and Eigenvectors

Proof Since B and A are similar, there exists an in- to obtain


vertible n × n matrix S such that B = S −1 AS. It follows
that ans =
det(B − λIn ) = det(S −1
AS − λIn ) 1.0000 -5.0000 15.0000 -10.0000 -
46.0000
= det(S −1 (A − λIn )S) = det(A − λIn ),

which verifies that pA = pB .  Thus the characteristic polynomial of A is:

Recall that the trace of an n × n matrix A is the sum of pA (λ) = λ4 − 5λ3 + 15λ2 − 10λ − 46.
the diagonal entries of A; that is
The eigenvalues of A are found by typing eig(A) and
tr(A) = a11 + · · · + ann . obtaining
We state without proof the following theorem:
ans =
Theorem 7.2.9. Let A be an n × n matrix with eigen- -1.2224
values λ1 , . . . , λn . Then 1.6605 + 3.1958i
1.6605 - 3.1958i
tr(A) = λ1 + · · · + λn .
2.9014
It follows from Theorem 7.2.8 that the traces of similar
matrices are equal. Thus A has two real eigenvalues and one complex conju-
gate pair of eigenvalues. Note that MATLAB has prepro-
Definition 7.2.10. Associated with each eigenvalue λ grammed not only the algorithm for finding the charac-
of the square matrix A is a vector subspace of Rn . This teristic polynomial, but also numerical routines for find-
subspace, called the eigenspace of λ, is the null space(A− ing the roots of the characteristic polynomial.
λIn ) and consists of all eigenvectors associated with the
The trace of A is found by typing trace(A) and obtaining
eigenvalue λ.

ans =
MATLAB Calculations The commands for computing 5
characteristic polynomials and eigenvalues of square ma-
trices are straightforward in MATLAB . In particular, for
an n × n matrix A, the MATLAB command poly(A) re- Using the MATLAB command sum we can verify the
turns the coefficients of (−1)n pA (λ). statement of Theorem 7.2.9. Indeed sum(v) computes
the sum of the components of the vector v and typing
For example, reload the 4 × 4 matrix A of (7.1.12*) by
typing e8_1_11. The characteristic polynomial of A is
found by typing sum(eig(A))

poly(A) we obtain the answer 5.0000, as expected.

201
§7.2 Eigenvalues and Eigenvectors

Exercises
 
8 5
6. Consider the matrix A = .
−10 −7

(a) Find the eigenvalues and eigenvectors of A.


In Exercises 1 – 2 determine the characteristic polynomial and
the eigenvalues of the given matrices. (b) Show that the eigenvectors found in (a) form a basis for
  R2 .
−9 −2 −10
(c) Find the coordinates of the vector (x1 , x2 ) relative to the
1. A =  3 2 3 .
basis in part (b).
8 2 9
 
2 1 −5 2
 1 2 13 2  7. Find the characteristic polynomial and the eigenvalues of
2. B =  .
 0 0 3 −1   
0 0 1 1 −1 2 2
A= 2 2 2 .
−3 −6 −6
3. Find a basis for the eigenspace of A associated with the
Find eigenvectors corresponding to each of the three eigenval-
eigenvalue 2, where
ues.
 
3 1 −1
A =  −1 1 1 
2 2 0 8. Let A be an n × n matrix. Suppose that

A2 + A + In = 0.
4. Consider the matrix
  Prove that A is invertible.
−1 1 1
A= 1 −1 1 .
1 1 −1
9. Let A be an n × n matrix. Explain why the eigenvalues of
(a) Verify that the characteristic polynomial of A is pλ (A) = A and At are identical.
(λ − 1)(λ + 2)2 .
(b) Show that (1, 1, 1) is an eigenvector of A corresponding
In Exercises 10 – 12 decide whether the given statements are
to λ = 1.
true or false. If the statements are false, give a counterexam-
(c) Show that (1, 1, 1) is orthogonal to every eigenvector of ple; if the statements are true, give a proof.
A corresponding to the eigenvalue λ = −2.
10. If the eigenvalues of a 2 × 2 matrix are equal to 1, then
the four entries of that matrix are each less than 500.
5. Let
11. If A is a 4 × 4 matrix and det(A) > 0, then det(−A) > 0.
 
0 −3 −2
A= 1 −4 −2 
−3 4 1 12. The trace of the product of two n × n matrices is the
product of the traces.
One of the eigenvalues of A is −1. Find the other eigenvalues
of A.

202
§7.2 Eigenvalues and Eigenvectors

13. When n is odd show that every real n × n matrix has a 17. (matlab) Verify (7.2.4) by proving the following. Let
real eigenvalue. Pn be the vector space of polynomials in λ of degree less than
or equal to n.

In Exercises 14 – 15, use MATLAB to compute (a) the eigen- (a) Prove that dim Pn is n+1 by showing that {1, λ, . . . , λn }
values, traces, and characteristic polynomials of the given is a basis.
matrix. (b) Use the results from part (a) to confirm The- (b) For every λ0 prove that {1, λ − λ0 , . . . , (λ − λ0 )n } is a
orems 7.2.7 and 7.2.9. basis of Pn .
14. (matlab) (c) Use (b) to verify (7.2.4).
 
−12 −19 −3 14 0
18. Let A be an n × n matrix. List as TRUE all of the
 −12 10 14 −19 8 
(7.2.5*)
 
A= 4 −2 1 7 −3 following that are equivalent to A being invertible and FALSE
.
 
otherwise:
 −9 17 −12 −5 −8 
−12 −1 7 13 −12
(a) dim(range(LA )) = n
15. (matlab)
(b) A has n distinct real eigenvalues
(c) 0 is not an eigenvalue of A
 
−12 −5 13 −6 −5 12
7 14 6 1 8 18
(d) The system of equations Ax = e1 is consistent
 
 
 −8 14 13 9 2 1
(7.2.6*)

B= .
(e) The system of equations Ax = e1 has a unique solution

 2 4 6 −8 −2 15 

(f) A is similar to In
 −14 0 −6 14 8 −13 
8 16 −8 3 5 19
(g) det(A) 6= 0
(h) The rows of A form a basis for Rn
16. (matlab) Use MATLAB to compute the characteristic
polynomial of the following matrix:
 
4 −6 7
A= 2 0 5 
−10 2 5

Denote this polynomial by pA (λ) = −(λ3 + p2 λ2 + p1 λ + p0 ).


Then compute the matrix

B = −(A3 + p2 A2 + p1 A + p0 I).

What do you observe? In symbols B = pA (A). Compute


the matrix B for examples of other square matrices A and
determine whether or not your observation was an accident.

203
§7.3 Real Diagonalizable Matrices

7.3 Real Diagonalizable Matrices Since A is triangular, it follows that both eigenvalues of
A are equal to 1. Since A is not the identity matrix,
An n × n matrix is real diagonalizable if it is similar to
it cannot be diagonalizable. More generally, if N is a
a diagonal matrix. More precisely, an n × n matrix A
nonzero strictly upper triangular n × n matrix, then the
is real diagonalizable if there exists an invertible n × n
matrix In + N is not diagonalizable.
matrix S such that
These examples show that complex eigenvalues are al-
D = S −1 AS ways obstructions to real diagonalization and multiple
real eigenvalues are sometimes obstructions to diagonal-
is a diagonal matrix. In this section we investigate when ization. Indeed,
a matrix is diagonalizable. In this discussion we assume
that all matrices have real entries. Theorem 7.3.1. Let A be an n×n matrix with n distinct
We begin with the observation that not all matrices are real eigenvalues. Then A is real diagonalizable.
real diagonalizable. We saw in Example 7.2.2 that the
diagonal entries of the diagonal matrix D are the eigen- There are two ideas in the proof of Theorem 7.3.1, and
values of D. Theorem 7.2.8 states that similar matrices they are summarized in the following lemmas.
have the same eigenvalues. Thus if a matrix is real diag- Lemma 7.3.2. Let λ1 , . . . , λk be distinct real eigenvalues
onalizable, then it must have real eigenvalues. It follows, for an n × n matrix A. Let vj be eigenvectors associated
for example, that the 2 × 2 matrix with the eigenvalue λj . Then {v1 , . . . , vk } is a linearly

0 −1
 independent set.
1 0

is not real diagonalizable, since its eigenvalues are ±i. Proof We prove the lemma by using induction on k.
When k = 1 the proof is simple, since v1 6= 0. So we can
However, even if a matrix A has real eigenvalues, it need assume that {v1 , . . . , vk−1 } is a linearly independent set.
not be diagonalizable. For example, the only matrix sim-
ilar to the identity matrix In is the identity matrix itself. Let α1 , . . . , αk be scalars such that
To verify this point, calculate
α1 v1 + · · · + αk vk = 0. (7.3.1)
−1 −1
S In S = S S = In .
We must show that all αj = 0.
Suppose that A is a matrix all of whose eigenvalues are Begin by multiplying both sides of (7.3.1) by A, to obtain:
equal to 1. If A is similar to a diagonal matrix D, then
D must have all of its eigenvalues equal to 1. Since the 0 = A(α1 v1 + · · · + αk vk )
identity matrix is the only diagonal matrix with all eigen- = α1 Av1 + · · · + αk Avk (7.3.2)
values equal to 1, D = In . So, if A is similar to a diagonal
= α1 λ1 v1 + · · · + αk λk vk .
matrix, it must itself be the identity matrix. Consider,
however, the 2 × 2 matrix
  Now subtract λk times (7.3.1) from (7.3.2), to obtain:
1 1
A= . α1 (λ1 − λk )v1 + · · · + αk−1 (λk−1 − λk )vk−1 = 0.
0 1

204
§7.3 Real Diagonalizable Matrices

Since {v1 , . . . , vk−1 } is a linearly independent set, it fol- {v1 , . . . , vn } is a linearly independent set of eigenvectors
lows that of A.
αj (λj − λk ) = 0, Since D is diagonal, Dej = λj ej for some real number
for j = 1, . . . , k − 1. Since all of the eigenvalues are λj . It follows that
distinct, λj − λk 6= 0 and αj = 0 for j = 1, . . . , k − 1. Avj = SDS −1 vj = SDS −1 Sej = SDej = λj Sej = λj vj .
Substituting this information into (7.3.1) yields αk vk =
0. Since vk 6= 0, αk is also equal to zero.  So vj is an eigenvector of A. Since the matrix S is in-
vertible, its columns are linearly independent. Since the
Lemma 7.3.3. Let A be an n × n matrix. Then A is columns of S are vj , the set {v1 , . . . , vn } is a linearly
real diagonalizable if and only if A has n real linearly independent set of eigenvectors of A, as claimed. 
independent eigenvectors.
Proof of Theorem 7.3.1 Let λ1 , . . . , λn be the dis-
tinct eigenvalues of A and let v1 , . . . , vn be the cor-
Proof Suppose that A has n linearly independent responding eigenvectors. Lemma 7.3.2 implies that
eigenvectors v1 , . . . , vn . Let λ1 , . . . , λn be the corre- {v1 , . . . , vn } is a linearly independent set in Rn and there-
sponding eigenvalues of A; that is, Avj = λj vj . Let fore a basis. Lemma 7.3.3 implies that A is diagonaliz-
S = (v1 | · · · |vn ) be the n × n matrix whose columns are able. 
the eigenvectors vj . We claim that D = S −1 AS is a
diagonal matrix. Compute Remark. Theorem 7.3.1 can be generalized as follows.
Suppose all eigenvalues of the n × n matrix A are real.
D = S −1 AS = S −1 A(v1 | · · · |vn ) = S −1 (Av1 | · · · |Avn ) Then A is diagonalizable if and only if the dimension of
the eigenspace associated with each eigenvalue λ is equal
= S −1 (λ1 v1 | · · · |λn vn ). to the number of times λ is an eigenvalue of A. Issues
surrounding this remark are discussed in Chapter 11.
It follows that

D = (λ1 S −1 v1 | · · · |λn S −1 vn ). Diagonalization Using MATLAB Let

Note that
 
−6 12 4
S −1 vj = ej , A= 8 −21 −8  . (7.3.3*)
−29 72 27
since
Sej = vj . We use MATLAB to answer the questions: Is A real
diagonalizable and, if it is, can we find the matrix S such
Therefore, that S −1 AS is diagonal? We can find the eigenvalues of
D = (λ1 e1 | · · · |λn en ) A by typing eig(A). MATLAB’s response is:
is a diagonal matrix.
ans =
Conversely, suppose that A is a real diagonalizable ma- -2.0000
trix. Then there exists an invertible matrix S such that -1.0000
D = S −1 AS is diagonal. Let vj = Sej . We claim that 3.0000

205
§7.3 Real Diagonalizable Matrices

Since the eigenvalues of A are real and distinct, Theo- and MATLAB responds with
rem 7.3.1 states that A can be diagonalized. That is,
there is a matrix S such that S =

−1 0 0
 -0.7071 0.8729 -0.0000
S −1 AS =  0 −2 0  -0.0000 0.4364 -0.3162
0 0 3 -0.7071 -0.2182 0.9487

The proof of Lemma 7.3.3 tells us how to find the matrix D =


S. We need to find the eigenvectors v1 , v2 , v3 associated -2.0000 0 0
with the eigenvalues −1, −2, 3, respectively. Then the 0 -1.0000 0
matrix (v1 |v2 |v3 ) whose columns are the eigenvectors is 0 0 3.0000
the matrix S. To verify this construction we first find
the eigenvectors of A by typing The matrix S is the transition matrix whose columns are
the eigenvectors of A and the matrix D is a diagonal
v1 = null(A+eye(3)); matrix whose j th diagonal entry is the eigenvalue of A
v2 = null(A+2*eye(3)); corresponding to the eigenvector in the j th column of S.
v3 = null(A-3*eye(3));

Now type S = [v1 v2 v3] to obtain Exercises


S =
0.8729 0.7071 0
0.4364 0.0000 0.3162
 
0 3
1. Let A = .
-0.2182 0.7071 -0.9487 3 0

Finally, check that S −1 AS is the desired diagonal matrix (a) Find the eigenvalues and eigenvectors of A.
by typing inv(S)*A*S to obtain (b) Find an invertible matrix S such that S −1 AS is a diag-
onal matrix D. What is D?
ans =
-1.0000 0.0000 0
0.0000 -2.0000 -0.0000 2. The eigenvalues of
0.0000 0 3.0000  
−1 2 −1
It is cumbersome to use the null command to find eigen- A= 3 0 1 
vectors and MATLAB has been preprogrammed to do −3 −2 −3
these computations automatically. We can use the eig
are 2, −2, −4. Find the eigenvectors of A for each of these
command to find the eigenvectors and eigenvalues of a eigenvalues and find a 3×3 invertible matrix S so that S −1 AS
matrix A, as follows. Type is diagonal.

[S,D] = eig(A)

206
§7.3 Real Diagonalizable Matrices

3. Let   11. (matlab) Consider the 4 × 4 matrix


−1 4 −2  
A= 0 3 −2  . 12 48 68 88
0 4 −3  −19 −54 −57 −68 
C= . (7.3.4*)
Find the eigenvalues and eigenvectors of A, and find an in-
 22 52 66 96 
vertible matrix S so that S −1 AS is diagonal. −11 −26 −41 −64

Use MATLAB to show that the eigenvalues of C are real and


distinct. Find a matrix S so that S −1 CS is diagonal.
4. Let A and B be similar n × n matrices.

(a) Show that if A is invertible, then B is invertible.


In Exercises 12 – 13 use MATLAB to decide whether or not
(b) Show that A + A−1 is similar to B + B −1 . the given matrix is real diagonalizable.

12. (matlab)
5. Let A and B be n × n matrices. Suppose that A is real
diagonalizable and that B is similar to A. Show that B is real
 
−2.2 4.1 −1.5 −0.2
diagonalizable.
 −3.4 4.8 −1.0 0.2 
A=
 . (7.3.5*)
−1.0 0.4 1.9 0.2 
−14.5 17.8 −6.7 0.6
6. Let A be an n × n real diagonalizable matrix. Show that
13. (matlab)
A + αIn is also real diagonalizable.
 
1.9 2.2 1.5 −1.6 −2.8
0.8 2.6 1.5 −1.8 −2.0
7. Let A be an n × n matrix with a real eigenvalue λ and
 
(7.3.6*)
 
B= 2.6 2.8 1.6 −2.1 −3.8 .
associated eigenvector v. Assume that all other eigenvalues 
4.8 3.6 1.5 −3.1 −5.2

of A are different from λ. Let B be an n × n matrix that
 
−2.1 1.2 1.7 −0.2 0.0
commutes with A; that is, AB = BA. Show that v is also an
eigenvector for B.

8. Let A be an n × n matrix with distinct real eigenvalues


and let B be an n × n matrix that commutes with A. Using
the result of Exercise 7, show that there is a matrix S that
simultaneously diagonalizes A and B; that is, S −1 AS and
S −1 BS are both diagonal matrices.

9. Let A be an n × n matrix all of whose eigenvalues equal


±1. Show that if A is diagonalizable, the A2 = In .

10. Let A be an n × n matrix all of whose eigenvalues equal


0 and 1. Show that if A is diagonalizable, the A2 = A.

207
§7.4 *Existence of Determinants

7.4 *Existence of Determinants j > 2, let Ĉ be obtained from C by swapping rows j and
2. The cofactors Ĉ1k are then obtained from the cofac-
The purpose of this appendix is to verify the inductive
tors C1k by swapping rows j − 1 and 1. The induction
definition of determinant (7.1.9). We have already shown
hypothesis then implies that det(Ĉ1k ) = − det(C1k ) and
that if a determinant function exists, then it is unique.
det(Ĉ) = − det(C). Thus, verifying that det(C) = 0
We also know that the determinant function exists for
reduces to verifying the result when rows 1 and 2 are
1 × 1 matrices. So we assume by induction that the de-
equal.
terminant function exists for (n − 1) × (n − 1) matrices
and prove that the inductive definition gives a determi- Indeed, the most difficult part of this proof is the calcula-
nant function for n × n matrices. tion that shows that if the 1st and 2nd rows of C are equal,
then D(C) = 0. This calculation is tedious and requires
Recall that Aij is the cofactor matrix obtained from A
some facility with indexes and summations. Rather than
by deleting the ith row and j th column — so Aij is an
prove this for general n, we present the proof for n = 4.
(n − 1) × (n − 1) matrix. The inductive definition is:
This case contains all of the ideas of the general proof.
D(A) = a11 det(A11 )−a12 det(A12 )+· · ·+(−1)n+1 a1n det(A1n ). We can assume that
 
a1 a2 a3 a4
We use the notation D(A) to remind us that we have not  a1 a2 a3 a4 
yet verified that this definition satisfies properties (a)- C=  c31 c32 c33 c34 

(c) of Definition 7.1.1. In this appendix we verify these
c41 c42 c43 c44
properties after assuming that the inductive definition
satisfies properties (a)-(c) for (n − 1) × (n − 1) matrices.
Using the cofactor definition D(C) =
For emphasis, we use the notation det to indicate the
determinant of square matrices of size less than n. Note
   
a2 a3 a4 a1 a3 a4
that Lemma 7.1.5, the computation of determinants of el- a1 det  c32 c33 c34  − a2 det  c31 c33 c34  +
ementary row operations, can therefore be assumed valid  c42 c43 c44   c41 c43 c44 
for (n − 1) × (n − 1) matrices. a1 a2 a4 a1 a2 a3
 c31 c32 c34  − a4 det  c31 c32 c33  .
We begin with the following two lemmas. a3 det
c41 c42 c44 c41 c42 c43
Lemma 7.4.1. Let C be an n × n matrix. If two rows
Next we expand each of the four 3 × 3 matrices along
of C are equal or one row of C is zero, then D(C) = 0.
their 1st rows, obtaining D(C) =
      
c33 c34 c32 c34 c32 c33
Proof Suppose that row i of C is zero. If i > 1, then a1 a2 det − a3 det + a4 det
each cofactor has a zero row and by induction the deter-  c43 c44  c42 c44  c42 c43 
c33 c34 c31 c34 c31 c33
minant of the cofactor is 0. If row 1 is zero, then the −a2 a1 det
c c
− a3 det
c c
+ a4 det
43 44 41 44  c41 c43 
cofactor expansion is 0 and D(C) = 0.  
c32 c34
 
c31 c34

c31 c32
+a3 a1 det − a2 det + a4 det
Suppose that row i and row j are equal, where i > 1   c42 c44   c41 c44   c41 c42 
and j > 1. Then the result follows by the induction hy- −a4 a1 det
c32 c33
− a2 det
c31 c33
+ a3 det
c31 c32
c42 c43 c41 c43 c41 c42
pothesis, since each of the cofactors has two equal rows.
So, we can assume that row 1 and row j are equal. If Combining the 2×2 determinants leads to D(C) = 0. 

208
§7.4 *Existence of Determinants

Lemma 7.4.2. Let E be an n×n elementary row matrix (II) Next suppose that E adds row i to row j. If i, j > 1,
and let B be an n × n matrix. Then then the result follows from the induction hypothesis
since the new cofactors are obtained from the old co-
D(EB) = D(E)D(B) (7.4.1) factors by adding row i − 1 to row j − 1.

Proof We recall that the three elementary row oper- If j = 1, then


ations are generated by two: (I) multiply row i by a
D(EB) = (b11 + bi1 ) det(B11 ) + · · · +
nonzero scalar c and (II) add row i to row j. The re-
maining elementary row operations are obtained as fol- (−1)n+1 (b1n + bin ) det(B1n )
lows. Adding c times row i to row j is the composition of = b11 det(B11 ) + · · · + (−1)n+1 b1n det(B1n ) +
 

multiplying row i by c, adding row i to row j, and multi- 


bi1 det(B11 ) + · · · + (−1)n+1 bi1 det(B1n )

plying row i by 1/c. For 2 × 2 matrices swapping rows 1
= D(B) + D(C)
and 2 was written in terms of four other elementary row
operations in (7.1.4). This observation works in general, where the 1st and ith rows of C are equal. The fact that
as follows. Consider the sequence of row operations: D(C) = 0 follows from Lemma 7.4.1.
• add row j to row i If i = 1, then the cofactors are unchanged. It fol-
lows by direct calculation of the cofactor expansion that
• multiply row j by −1 D(EB) = D(B) + D(C) where the 1st and ith rows of C
• add row i to row j are equal. Again, the fact that D(C) = 0 follows from
Lemma 7.4.1. 
• subtract row j from row i
Property (a) is verified for D(A) using cofactors since
We can write swapping rows i and j schematically as: if A is lower triangular, then
         
ri ri + rj ri + rj ri + rj rj
→ → → → D(A) = a11 det(A11 )
rj rj −rj ri ri
and
Thus, we need to verify (7.4.1) for two types of elemen- det(A11 ) = a22 · · · ann
tary row operation: multiply row i by c 6= 0 and add row
j to row i. by the induction hypothesis.
(I) Suppose that E multiplies the ith row by a nonzero Property (c) (D(AB) = D(A)D(B)) is proved sepa-
scalar c. If i > 1, then the cofactor matrix (EA)1j rately for A singular and A nonsigular. In either case,
is obtained from the cofactor matrix A1j by multiply- row reduction implies that A = Es · · · E1 R where R is in
ing the (i − 1)st row by c. By induction, det(EA)1j = reduced echelon form.
c det(A1j ) and D(EA) = cD(A). On the other hand, If A is singular, the the bottom row of R is zero and
D(E) = det(E11 ) = c. So (7.4.1) is verified in this in- together Lemmas 7.4.1 and 7.4.2 imply that D(A) = 0.
stance. If i = 1, then the 1st row of EA is (ca11 , . . . , ca1n ) On the other hand these lemmas also imply that
from which it is easy to use the cofactor formula to verify
(7.4.1). D(AB) = D(Es · · · E1 RB) = D(Es · · · E1 )D(RB)

209
§7.4 *Existence of Determinants

and direct calculation shows that the bottom row of RB a nonzero n-vector w such that Rt w = 0. It follows that
is also zero. Hence D(RB) = 0 and property (c) is valid. v = (Est )−1 · · · (E1t )−1 w satisfies At w = Rt v = 0 and At
is singular. 
Next suppose now that A is nonsingular. It follows that

AB = Es · · · E1 B. Property (b) We prove D(At ) = D(A) in two steps.


Write
Using (7.4.1) we see that A = Es · · · E1 R, (7.4.2)
where the Ej are elementary row matrices and R is in
D(AB) = D(Es ) · · · D(E1 )D(B) = D(Es · · · E1 )D(B) = D(A)D(B),
reduced echelon form. It follows that
as desired.
At = Rt E1t · · · Est . (7.4.3)
Before verifying property (b) we prove the following:
If A is invertible, then R = In and D(At ) = D(A). If A is
Lemma 7.4.3. Let E be an elementary row operation
singular, then At is also singular and D(A) = 0 = D(At ).
matrix. Then D(E t ) = D(E). An n × n matrix A is
singular if and only if At is singular. We have now completed the proof that a determinant
function exists.
Proof The two generators of elementary row opera-
tions are: multiply row i by c and add row i to row j.
The first matrix is diagonal; so E t = E. Denote the sec-
ond matrix by Fij . It follows that Fijt = Fji . We claim
that D(Fij ) = 1 for all i, j and hence that D(E t ) = D(E)
for all E. If i < j, then Fij is lower triangular with 1s
on the diagonal. Hence D(Fij ) = 1. If 1 < j < i,
then D(Fijt ) = D(Fji ) = 1 by induction. If j = 1, then
D(Fi1t
) = 1 by direct calculation.
If A is singular, then A = Es · · · E1 R, where R is in
reduced echelon form and its bottom row is zero. Hence
R is singular. It follows that D(A) = 0. Note that

At = Rt E1t · · · Est

Here we use the fact that (BC)t = C t B t that was dis-


cussed in (3.6.1). By counting pivots in R, we see that
the column space and the row space of R have the same
dimensions. Therefore, the dimension of the row space of
Rt equals the dimension of the column space of Rt equals
the dimension of the row space of R, and all of these are
less than n. Hence Rt is singular. Therefore, there exists

210
Chapter 8 Linear Maps and Changes of Coordinates

8 Linear Maps and Changes


of Coordinates
The first section in this chapter, Section 8.1, defines lin-
ear mappings between abstract vector spaces, shows how
such mappings are determined by their values on a basis,
and derives basic properties of invertible linear mappings.
The notions of row rank and column rank of a matrix
are discussed in Section 8.2 along with the theorem that
states that these numbers are equal to the rank of that
matrix.
Section 8.3 discusses the underlying meaning of similarity
— the different ways to view the same linear mapping on
Rn in different coordinates systems or bases. This discus-
sion makes sense only after the definitions of coordinates
corresponding to bases and of changes in coordinates are
given and justified. In Section 8.4, we discuss the ma-
trix associated to a linearity transformation between two
finite dimensional vector spaces in a given set of coordi-
nates and show that changes in coordinates correspond
to similarity of the corresponding matrices.

211
§8.1 Linear Mappings and Bases

8.1 Linear Mappings and Bases (b) The map L : C 1 → R defined by


The examples of linear mappings from Rn → Rm that we L(f ) = f 0 (2)
introduced in Section 3.3 were matrix mappings. More
precisely, let A be an m × n matrix. Then is linear. Indeed,

LA (x) = Ax L(f + g) = (f + g)0 (2) = f 0 (2) + g 0 (2) = L(f ) + L(g).

defines the linear mapping LA : Rn → Rm . Recall that Similarly, L(cf ) = cL(f ).


Aej is the j th column of A (see Chapter 3, Lemma 3.3.4); (c) The map L : C 1 → C 1 defined by
it follows that A can be reconstructed from the vec-
tors Ae1 , . . . , Aen . This remark implies (Chapter 3, L(f )(t) = f (t − 1)
Lemma 3.3.3) that linear mappings of Rn to Rm are de-
is linear. Indeed,
termined by their values on the standard basis e1 , . . . , en .
Next we show that this result is valid in greater general- L(f +g)(t) = (f +g)(t−1) = f (t−1)+g(t−1) = L(f )(t)+L(g)(t).
ity. We begin by defining what we mean for a mapping
between vector spaces to be linear. Similarly, L(cf ) = cL(f ). It may be helpful to compute
L(f )(t) when f (t) = t2 − t + 1. That is,
Definition 8.1.1. Let V and W be vector spaces and
let L : V → W be a mapping. The map L is linear if L(f )(t) = (t−1)2 −(t−1)+1 = t2 −2t+1−t+1+1 = t2 −3t+3.

L(u + v) = L(u) + L(v)


Constructing Linear Mappings from Bases
L(cv) = cL(v)
Theorem 8.1.2. Let V and W be vector spaces. Let
for all u, v ∈ V and c ∈ R. {v1 , . . . , vn } be a basis for V and let {w1 , . . . , wn } be n
vectors in W . Then there exists a unique linear map
L : V → W such that L(vi ) = wi .
Examples of Linear Mappings (a) Let v ∈ Rn be a fixed
vector. Use the dot product to define the mapping L : Proof Let v ∈ V be a vector. Since
Rn → R by span{v1 , . . . , vn } = V , we may write v as
L(x) = x · v.
v = α1 v1 + · · · + αn vn ,
Then L is linear. Just check that
where α1 , . . . , αn in R. Moreover, v1 , . . . , vn are linearly
L(x + y) = (x + y) · v = x · v + y · v = L(x) + L(y) independent, these scalars are uniquely defined. More
precisely, if
for every vector x and y in Rn and
α1 v1 + · · · + αn vn = β1 v1 + · · · + βn vn ,
L(cx) = (cx) · v = c(x · v) = cL(x)
then
for every scalar c ∈ R. (α1 − β1 )v1 + · · · + (αn − βn )vn = 0.

212
§8.1 Linear Mappings and Bases

Linear independence implies that αj − βj = 0; that is There are two assertions made in Theorem 8.1.2. The
αj = βj . We can now define first is that a linear map exists mapping vi to wi . The
second is that there is only one linear mapping that ac-
L(v) = α1 w1 + · · · + αn wn . (8.1.1) complishes this task. If we drop the constraint that the
map be linear, then many mappings may satisfy these
We claim that L is linear. Let v̂ ∈ V be another vector conditions. For example, find a linear map from R → R
and let that maps 1 to 4. There is only one: y = 4x. However
there are many nonlinear maps that send 1 to 4. Exam-
v̂ = β1 v1 + · · · + βn vn .
ples are y = x + 3 and y = 4x2 .
It follows that
Finding the Matrix of a Linear Map from Rn → Rm Given
v + v̂ = (α1 + β1 )v1 + · · · + (αn + βn )vn ,
by Theorem 8.1.2 Suppose that V = Rn and W = Rm .
We know that every linear map L : Rn → Rm can be
and hence by (8.1.1) that
defined as multiplication by an m × n matrix. The ques-
tion that we next address is: How can we find the matrix
L(v + v̂) = (α1 + β1 )w1 + · · · + (αn + βn )wn
whose existence is guaranteed by Theorem 8.1.2?
= (α1 w1 + · · · + αn wn ) + (β1 w1 + · · · + βn wn )
More precisely, let v1 , . . . , vn be a basis for Rn and let
= L(v) + L(v̂).
w1 , . . . , wn be vectors in Rm . We suppose that all of
these vectors are row vectors. Then we need to find an
Similarly m × n matrix A such that Avit = wit for all i. We find
A as follows. Let v ∈ Rn be a row vector. Since the vi
L(cv) = L((cα1 )v1 + · · · + (cαn )vn ) form a basis, there exist scalars αi such that
= c(α1 w1 + · · · + αn wn )
v = α1 v1 + · · · + αn vn .
= cL(v).
In coordinates
Thus L is linear.
 
α1
Let M : V → W be another linear mapping such that v t = (v1t | · · · |vnt )  ...  , (8.1.2)
 
M (vi ) = wi . Then αn

L(v) = L(α1 v1 + . . . + αn vn ) where (v1t | · · · |vnt ) is an n × n invertible matrix. By defi-


= α1 w1 + · · · + αn wn nition (see (8.1.1))
= α1 M (v1 ) + · · · + αn M (vn ) L(v) = α1 w1 + · · · + αn wn .
= M (α1 v1 + · · · + αn vn ) Thus the matrix A must satisfy
= M (v).  
α1
Thus L = M and the linear mapping is uniquely defined. Av t = (w1t | · · · |wnt )  ...  ,
 
 αn

213
§8.1 Linear Mappings and Bases

where (w1t | · · · |wnt ) is an m × n matrix. Using (8.1.2) we Now apply (8.1.3) to obtain
see that  
  1 0 1  
1 2 1 1  −1 1 −1
Av t = (w1t | · · · |wnt )(v1t | · · · |vnt )−1 v t , A=
0 1 −1
−1 0 1 =
1 −1 3
.
2
−3 2 −5
and
A = (w1t | · · · |wnt )(v1t | · · · |vnt )−1 (8.1.3) As a check, verify by matrix multiplication that Avi =
wi , as claimed.
is the desired m × n matrix.

Properties of Linear Mappings


An Example of a Linear Map from R3 to R2 As an exam-
ple we illustrate Theorem 8.1.2 and (8.1.3) by defining a Lemma 8.1.3. Let U, V, W be vector spaces and L : V →
linear mapping from R3 to R2 by its action on a basis. W and M : U → V be linear maps. Then L◦M : U → W
Let is linear.

v1 = (1, 4, 1) v2 = (−1, 1, 1) v3 = (0, 1, 0).


Proof The proof of Lemma 8.1.3 is identical to that
We claim that {v1 , v2 , v3 } is a basis of R and that there
3 of Chapter 3, Lemma 3.5.1. 
is a unique linear map for which L(vi ) = wi where
A linear map L : V → W is invertible if there exists a
w1 = (2, 0) w2 = (1, 1) w3 = (1, −1). linear map M : W → V such that L◦M : W → W is the
identity map on W and M ◦L : V → V is the identity
We can verify that {v1 , v2 , v3 } is a basis of R3 by showing map on V .
that the matrix
Theorem 8.1.4. Let V and W be finite dimensional
vector spaces and let v1 , . . . , vn be a basis for V . Let
 
1 −1 0
(v1t |v2t |v3t ) =  4 1 1  L : V → W be a linear map. Then L is invertible if and
1 1 0 only if w1 , . . . , wn is a basis for W where wj = L(vj ).

is invertible. This can either be done in MATLAB using Proof If w1 , . . . , wn is a basis for W , then use The-
the inv command or by hand by row reducing the matrix orem 8.1.2 to define a linear map M : W → V by
  M (wj ) = vj . Note that
1 −1 0 1 0 0
 4 1 1 0 1 0  L◦M (wj ) = L(vj ) = wj .
1 1 0 0 0 1
It follows by linearity (using the uniqueness part of The-
to obtain orem 8.1.2) that L◦M is the identity of W . Similarly,
  M ◦L is the identity map on V , and L is invertible.
1 0 1
1
(v1t |v2t |v3t )−1 =  −1 0 1 . Conversely, suppose that L◦M and M ◦L are identity
2
−3 2 −5 maps and that wj = L(vj ). We must show that

214
§8.1 Linear Mappings and Bases

w1 , . . . , wn is a basis. We use Theorem 5.5.3 and ver- Exercises


ify separately that w1 , . . . , wn are linearly independent
and span W .
If there exist scalars α1 , . . . , αn such that 1. Use Theorem 8.1.2 and (8.1.3) to construct matrix of a
linear mapping L from R3 to R2 with L(vi ) = wi , i = 1, 2, 3,
α1 w1 + · · · + αn wn = 0, where

then apply M to both sides of this equation to obtain v1 = (1, 0, 2) v2 = (2, −1, 1) v3 = (−2, 1, 0)

0 = M (α1 w1 + · · · + αn wn ) = α1 v1 + · · · + αn vn . and
w1 = (−1, 0) w2 = (0, 1) w3 = (3, 1).
But the vj are linearly independent. Therefore, αj = 0
and the wj are linearly independent.
2. Let Pn be the vector space of polynomials p(t) of degree
To show that the wj span W , let w be a vector in less than or equal to n. Show that {1, t, t2 , . . . , tn } is a basis
W . Since the vj are a basis for V , there exist scalars for Pn .
β1 , . . . , βn such that

M (w) = β1 v1 + · · · + βn vn . 3. Which of the following mappings T are linear? Circle those


maps that are linear and cross out those maps that are not
Applying L to both sides of this equation yields linear.

w = L◦M (w) = β1 w1 + · · · + βn wn . (a) T : R2 → R3 where T (x1 , x2 ) = (2x1 −x2 , x2 +x1 , x1 +2)


Therefore, the wj span W.

(b) T : Pk → R2 where T (p(t)) = (p′(1), ∫_2^5 p(t) dt)

(c) Fix B ∈ Mn,n . Then T : Mn,n → Mn,n where T (A) =


Corollary 8.1.5. Let V and W be finite dimensional AB − BA
vector spaces. Then there exists an invertible linear map
L : V → W if and only if dim(V ) = dim(W ). Recall that Pk is the vector space of polynomials of degree
≤ k and Mn,n is the vector space of n × n square matrices.
Proof Suppose that L : V → W is an invertible linear
map. Let v1 , . . . , vn be a basis for V where n = dim(V ).
Then Theorem 8.1.4 implies that L(v1 ), . . . , L(vn ) is a 4. Show that
d
basis for W and dim(W ) = n = dim(V ). : P3 → P2
dt
Conversely, suppose that dim(V ) = dim(W ) = n. Let d
is a linear mapping, where be a transformation that maps
v1 , . . . , vn be a basis for V and let w1 , . . . , wn be a basis dt
for W . Using Theorem 8.1.2 define the linear map L : d
p(t) 7→ p(t).
dt
V → W by L(vj ) = wj . Theorem 8.1.4 states that L is
invertible. 


5. Show that 10. Let M(n) denote the vector space of n × n matrices and
let A be an n × n matrix. Let L : M(n) → M(n) be the
L(p) = ∫_0^t p(s) ds
mapping defined by L(X) = AX − XA where X ∈ M(n).
is a linear mapping of P2 → P3 . Verify that L is a linear mapping. Show that the null space
of L, {X ∈ M(n) : L(X) = 0}, is a subspace consisting of all
matrices X that commute with A.
6. Let P3 ⊂ C 1 be the vector space of polynomials of degree
less than or equal to three. Let T : P3 → R be the function
dp 11. Which of the following are True and which False. Give
T (p) = (0), where p ∈ P. reasons for your answer.
dt
(a) Show that T is linear. (a) For any n × n matrix A, det(A) is the product of its n
(b) Find a basis for the null space of T . eigenvalues.

(c) Let S : P3 → R be the function S(p) = p(0)2 . Show that (b) Similar matrices always have the same eigenvectors.
S is not linear. (c) For any n × n matrix A and scalar k ∈ R, det(kA) =
kn det(A).
(d) There is a linear map L : R3 → R2 such that
7. Use Exercises 4, 5 and Theorem 8.1.2 to show that
L(1, 2, 3) = (0, 1) and L(2, 4, 6) = (1, 1).
d
◦L : P2 → P2
dt (e) The only rank 0 matrix is the zero matrix.
is the identity map.
Z 2π
12. Let L : C 1 → R be defined by L(f ) = f (t) cos(t)dt
8. Let W ⊂ Rn be a k-dimensional subspace where k < n. 0
for f ∈ C . Verify that L is a linear mapping.
1
Define

W ⊥ = {v ∈ Rn : v · w = 0 for all w ∈ W }
13. Let P be the vector space of polynomials in one variable
(a) Show that W ⊥ is a subspace of Rn .
Z t
t. Define L : P → P by L(p)(t) = (s − 1)p(s)ds. Verify
(b) Find a basis for W ⊥ in the special case that W = 0
that L is a linear mapping.
span{e1 , e2 , e3 } ⊂ R5 .

9. Let C denote the set of complex numbers. Verify that C is 14. Show that
a two-dimensional vector space. Show that L : C → C defined d2
: P4 → P2
by dt2
L(z) = λz, is a linear mapping. Then compute bases for the null space
where λ = σ + iτ is a linear mapping. d2
and range of 2 .
dt
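Before leaving this section, note that the construction (8.1.3) is easy to check numerically for the example given earlier, with v1 = (1, 4, 1), v2 = (−1, 1, 1), v3 = (0, 1, 0) and w1 = (2, 0), w2 = (1, 1), w3 = (1, −1). The following MATLAB lines are a small check of our own (the variable names are illustrative):

v1 = [1 4 1]; v2 = [-1 1 1]; v3 = [0 1 0];
w1 = [2 0];   w2 = [1 1];    w3 = [1 -1];
A = [w1' w2' w3']*inv([v1' v2' v3'])   % formula (8.1.3)
A*[v1' v2' v3']                        % its columns are w1', w2', w3', so A*vi' = wi'

The first command returns the matrix A = [ −1 1 −1 ; 1 −1 3 ] computed earlier in this section.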


8.2 Row Rank Equals Column Rank

So the null space of L is closed under addition and scalar
multiplication and is a subspace of V .
Let A be an m × n matrix. The row space of A is the
span of the row vectors of A and is a subspace of Rn . To prove that the range of L is a subspace of W , let w1
The column space of A is the span of the columns of A and w2 be in the range of L. Then, by definition, there
and is a subspace of Rm . exist v1 and v2 in V such that L(vj ) = wj . It follows
that
Definition 8.2.1. The row rank of A is the dimension
of the row space of A and the column rank of A is the L(v1 + v2 ) = L(v1 ) + L(v2 ) = w1 + w2 .
dimension of the column space of A.
Therefore, w1 + w2 is in the range of L. Similarly,
Lemma 5.5.4 of Chapter 5 states that
L(cv1 ) = cL(v1 ) = cw1 .
row rank(A) = rank(A).
So the range of L is closed under addition and scalar
We show below that row ranks and column ranks are multiplication and is a subspace of W . 
equal. We begin by continuing the discussion of the pre-
vious section on linear maps between vector spaces.
Suppose that A is an m × n matrix and LA : Rn → Rm is
the associated linear map. Then the null space of LA is
Null Space and Range Each linear map between vector precisely the null space of A, as defined in Definition 5.2.1
spaces defines two subspaces. Let V and W be vector of Chapter 5. Moreover, the range of LA is the column
spaces and let L : V → W be a linear map. Then space of A. To verify this, write A = (A1 | · · · |An ) where
Aj is the j th column of A and let v = (v1 , . . . vn )t . Then,
null space(L) = {v ∈ V : L(v) = 0} ⊂ V LA (v) is the linear combination of columns of A
and
LA (v) = Av = v1 A1 + · · · + vn An .
range(L) = {L(v) ∈ W : v ∈ V } ⊂ W.
Lemma 8.2.2. Let L : V → W be a linear map between There is a theorem that relates the dimensions of the null
vector spaces. Then the null space of L is a subspace of space and range with the dimension of V .
V and the range of L is a subspace of W .
Theorem 8.2.3. Let V and W be vector spaces with V
finite dimensional and let L : V → W be a linear map.
Proof The proof that the null space of L is a subspace Then
of V follows from linearity in precisely the same way that
the null space of an m × n matrix is a subspace of Rn . dim(V ) = dim(null space(L)) + dim(range(L)).
That is, if v1 and v2 are in the null space of L, then

L(v1 + v2 ) = L(v1 ) + L(v2 ) = 0 + 0 = 0,

and for c ∈ R Proof Since V is finite dimensional, the null space of


L is finite dimensional (since the null space is a subspace
L(cv1 ) = cL(v1 ) = c0 = 0. of V ) and the range of L is finite dimensional (since it is


spanned by the vectors L(vj ) where v1 , . . . , vn is a basis Row Rank and Column Rank Recall Theorem 5.5.6 of
for V ). Let u1 , . . . , uk be a basis for the null space of L Chapter 5 that states that the nullity plus the rank of an
and let w1 , . . . , w` be a basis for the range of L. Choose m × n matrix equals n. At first glance it might seem that
vectors yj ∈ V such that L(yj ) = wj . We claim that this theorem and Theorem 8.2.3 contain the same infor-
u1 , . . . , uk , y1 , . . . , y` is a basis for V , which proves the mation, but they do not. Theorem 5.5.6 of Chapter 5
theorem. is proved using a detailed analysis of solutions of linear
To verify that u1 , . . . , uk , y1 , . . . , y` are linearly indepen- equations based on Gaussian elimination, back substitu-
dent, suppose that tion, and reduced echelon form, while Theorem 8.2.3 is
proved using abstract properties of linear maps.
α1 u1 + · · · + αk uk + β1 y1 + · · · + β` y` = 0. (8.2.1) Let A be an m × n matrix. Theorem 5.5.6 of Chapter 5
states that
Apply L to both sides of (8.2.1) to obtain

β1 w1 + · · · + β` w` = 0. nullity(A) + rank(A) = n.

Since the wj are linearly independent, it follows that βj = Meanwhile, Theorem 8.2.3 states that
0 for all j. Now (8.2.1) implies that
dim(null space(LA )) + dim(range(LA )) = n.
α1 u1 + · · · + αk uk = 0.

Since the uj are linearly independent, it follows that αj = But the dimension of the null space of LA equals the
0 for all j. nullity of A and the dimension of the range of A equals
the dimension of the column space of A. Therefore,
To verify that u1 , . . . , uk , y1 , . . . , y` span V , let v be in
V . Since w1 , . . . , w` span W , it follows that there exist nullity(A) + dim(column space(A)) = n.
scalars βj such that

L(v) = β1 w1 + · · · + β` w` . Hence, the rank of A equals the column rank of A. Since


rank and row rank are identical, we have proved:
Note that by choice of the yj
Theorem 8.2.4. Let A be an m × n matrix. Then
L(β1 y1 + · · · + β` y` ) = β1 w1 + · · · + β` w` .
row rank A = column rank A.
It follows by linearity that

u = v − (β1 y1 + · · · + β` y` ) Since the row rank of A equals the column rank of At ,


we have:
is in the null space of L. Hence there exist scalars αj
such that Corollary 8.2.5. Let A be an m × n matrix. Then
u = α1 u1 + · · · + αk uk .
Thus, v is in the span of u1 , . . . , uk , y1 , . . . , y` , as desired. rank(A) = rank(At ).
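Theorem 8.2.4 and Corollary 8.2.5 are easy to illustrate in MATLAB. The lines below are a small numerical check (any matrix may be used; here we borrow the 3 × 3 matrix of Exercise 1 below, whose rank is two):

A = [1 2 5; 2 -1 1; 3 1 6];
rank(A)                      % the rank of A
rank(A')                     % the rank of the transpose; by Corollary 8.2.5 it is the same
size(null(A),2) + rank(A)    % nullity plus rank equals the number of columns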



Exercises 6. Let B be an m × p matrix and let C be a p × n matrix.


Prove that the rank of the m × n matrix A = BC satisfies

rank(A) ≤ min{rank(B), rank(C)}.


1. The 3 × 3 matrix
 
1 2 5
A= 2 −1 1  7. (matlab) Let
3 1 6  
1 1 2 2
has rank two. Let r1 , r2 , r3 be the rows of A and c1 , c2 , c3 be  0 −1 3 1 
the columns of A. Find all scalars α1 , α2 , α3 and β1 , β2 , β3 A= . (8.2.2*)
 2 −1 1 0 
such that −1 0 7 4
α1 r1 + α2 r2 + α3 r3 = 0 (a) Compute rank(A) and exhibit a basis for the row space
β1 c1 + β2 c2 + β3 c3 = 0. of A.
(b) Find a basis for the column space of A.
(c) Find all solutions to the homogeneous equation Ax = 0.
2. What is the largest row rank that a 5 × 3 matrix can have?
(d) Does  
4
3. Let  
 2 
Ax =  
1 1 0 1  2 
A= 0 −1 1 2 . 1
1 2 −1 3 have a solution?
(a) Find a basis for the row space of A and the row rank of
A.
(b) Find a basis for the column space of A and the column 8. Let C be the 3 × 3 matrix
rank of A.  
1 1 1
(c) Find a basis for the null space of A and the nullity of A.
C = −1 b −1 
(d) Find a basis for the null space of At and the nullity of 2 2 b2 + 1
At .
(a) Find all b so that
(i) dim(range(C)) = 3
4. Let A be a nonzero 3 × 3 matrix such that A2 = 0. Show
that rank(A) = 1. (ii) dim(range(C)) = 2
(iii) dim(range(C)) = 1
(iv) dim(range(C)) = 0
5. Let V = range(LA ) where A is an n × m matrix of rank 2.
Is V a subspace of Rn and, if so, what is dim(V )? (b) Find all b so that
(i) dim(null space(C)) = 3


(ii) dim(null space(C)) = 2


(iii) dim(null space(C)) = 1
(iv) dim(null space(C)) = 0
(c) Find all b so that  
1
Cx = 1 (8.2.3)
2
is consistent. (Hint: can you convert this into a state-
ment about the range of C?)

9. Let  
a11 a12 a13 a14 a15
A=
a21 a22 a23 a24 a25
and suppose a11 a22 − a12 a21 6= 0. What is the nullity(A)?
Explain your answer.


8.3 Vectors and Matrices in Coordinates

uniquely as a linear combination of vectors in W; that is,
v = α1 w1 + · · · + αn wn ,
In the last half of this chapter we discuss how similarity
of matrices should be thought of as change of coordi- for uniquely defined scalars α1 , . . . , αn .
nates for linear mappings. There are three steps in this
discussion. Proof Since W is a basis, Theorem 5.5.3 of Chapter 5
implies that the vectors w1 , . . . , wn span V and are lin-
(a) Formalize the idea of coordinates for a vector in early independent. Therefore, we can write v in V as a
terms of basis. linear combination of vectors in B. That is, there are
scalars α1 , . . . , αn such that
(b) Discuss how to write a linear map as a matrix in
each coordinate system. v = α1 w1 + · · · + αn wn .
(c) Determine how the matrices corresponding to the Next we show that these scalars are uniquely defined.
same linear map in two different coordinate systems Suppose that we can write v as a linear combination of
are related. the vectors in B in a second way; that is, suppose

The answer to the last question is simple: the matrices v = β1 w 1 + · · · + βn w n


are related by a change of coordinates if and only if they
are similar. We discuss these steps in this section in Rn for scalars β1 , . . . , βn . Then
and in Section 8.4 for general vector spaces.
(α1 − β1 )w1 + · · · + (αn − βn )wn = 0.

Coordinates of Vectors using Bases Throughout, we Since the vectors in W are linearly independent, it follows
have written vectors v ∈ Rn in coordinates as v = that αj = βj for all j. 
(v1 , . . . , vn ), and we have used this notation almost with-
out comment. From the point of view of vector space Definition 8.3.2. Let W = {w1 , . . . , wn } be a basis in
operations, we are just writing a vector space V . Lemma 8.3.1 states that we can write
v ∈ V uniquely as
v = v1 e 1 + · · · + vn e n
v = α1 w1 + · · · + αn wn . (8.3.1)
as a linear combination of the standard basis E =
{e1 , . . . , en } of Rn . The scalars α1 , . . . , αn are the coordinates of v relative to
the basis W, and we denote the coordinates of v in the
More generally, each basis provides a set of coordinates
basis W by
for a vector space. This fact is described by the following
lemma (although its proof is identical to the first part of [v]W = (α1 , . . . , αn ) ∈ Rn . (8.3.2)
the proof of Theorem 8.1.2.
Lemma 8.3.1. Let W = {w1 , . . . , wn } be a basis for the We call the coordinates of a vector v ∈ Rn relative to the
vector space V . Then each vector v in V can be written standard basis, the standard coordinates of v.


Writing Linear Maps in Coordinates as Matrices Let Proof The process of choosing the coordinates of vec-
V be a finite dimensional vector space of dimension n and tors relative to a given basis W = {w1 , . . . , wn } of a
let L : V → V be a linear mapping. We now show how vector space V is itself linear. Indeed,
each basis of V allows us to associate an n × n matrix
to L. Previously we considered this question with the [u + v]W = [u]W + [v]W
standard basis on V = Rn . We showed in Chapter 3 that [cv]W = c[v]W .
we can write the linear mapping L as a matrix mapping,
as follows. Let E = {e1 , . . . , en } be the standard basis in Thus the coordinate mapping relative to a basis W of V
Rn . Let A be the n × n matrix whose j th column is the defined by
n vector L(ej ). Then Chapter 3, Theorem 3.3.5 shows v 7→ [v]W (8.3.4)
that the linear map is given by matrix multiplication as
is a linear mapping of V into Rn . We denote this linear
L(v) = Av. mapping by [·]W : V → Rn .
Thus every linear mapping on Rn can be written in this It now follows that both the left hand and right hand
matrix form. sides of (8.3.3) can be thought of as linear mappings
of V → Rn . In verifying this comment, we recall
Remark. Another way to think of the j th column of the Lemma 8.1.3 of Chapter 5 that states that the compo-
matrix A is as the coordinate vector of L(ej ) relative to sition of linear maps is linear. On the left hand side we
the standard basis, that is, as [L(ej )]E . We denote the have the mapping
matrix A by [L]E ; this notation emphasizes the fact that
A is the matrix of L relative to the standard basis. v 7→ L(v) 7→ [L(v)]W ,

We now discuss how to write a linear map L as a matrix which is the composition of the linear maps: [·]W with
using different coordinates. L. See (8.3.4). The right hand side is

Definition 8.3.3. Let W = {w1 , . . . , wn } be a basis for v 7→ [v]W 7→ [L]W [v]W ,


the vector space V . The n × n matrix [L]W associated
to the linear map L : V → V and the basis W is defined which is the composition of the linear maps: multiplica-
as follows. The j th column of [L]W is [L(wj )]W — the tion by the matrix [L]W with [·]W .
coordinates of L(wj ) relative to the basis W.
Theorem 8.1.2 states that linear mappings are deter-
mined by their actions on a basis. Thus to verify (8.3.3),
Note that when V = Rn and when W = E, the standard
we need only verify this equality for v = wj for all j.
basis of Rn , then the definition of the matrix [L]E is ex-
Since [wj ]W = ej , the right hand side of (8.3.3) is:
actly the same as the matrix associated with the linear
map L in Remark 8.3.
[L]W [wj ]W = [L]W ej ,
Lemma 8.3.4. The coordinate vector of L(v) relative to
the basis W is which is just the j th column of [L]W . The left hand side
of (8.3.3) is the vector [L(wj )]W , which by definition is
[L(v)]W = [L]W [v]W . (8.3.3) also the j th column of [L]W (see Definition 8.3.3). 
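Definition 8.3.3 and Lemma 8.3.4 can also be explored numerically before continuing. The following MATLAB sketch is illustrative only: the matrix A is an arbitrary choice, and the basis is the one that reappears later in this section. Each column of [L]W is found by solving a linear system for the coordinates of L(wj ) relative to W.

A  = [1 2; 0 3];              % a linear map L = L_A in standard coordinates (arbitrary choice)
w1 = [1; 1]; w2 = [1; -2];    % a basis W of R^2
PW = [w1 w2];
LW = [PW\(A*w1), PW\(A*w2)]   % the j-th column is [L(w_j)]_W, as in Definition 8.3.3
v  = [2; 0.5];
[PW\(A*v), LW*(PW\v)]         % the two columns agree, illustrating (8.3.3)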


Computations of Vectors in Coordinates in Rn We and (1.5, 0.5) are the coordinates of v in the basis
divide this subsection into three parts. We consider a {w1 , w2 }.
simple example in R2 algebraically in the first part and
Using the notation in (8.3.2), we may rewrite (8.3.5) as
geometrically in the second. In the third part we formal-
ize and extend the algebraic discussion to Rn .  
1 2 1
[v]W = [v]E ,
3 1 −1
An Example of Coordinates in R How do we find the
2

coordinates of a vector v in a basis? For example, choose where E = {e1 , e2 } is the standard basis.
a (nonstandard) basis in the plane — say

w1 = (1, 1) and w2 = (1, −2). Planar Coordinates Viewed Geometrically using MATLAB
Since {w1 , w2 } is a basis, we may write the vector v as Next we use MATLAB to view geometrically the notion
a linear combination of the vectors w1 and w2 . Thus we of coordinates relative to a basis W = {w1 , w2 } in the
can find scalars α1 and α2 so that plane. Type

v = α1 w1 +α2 w2 = α1 (1, 1)+α2 (1, −2) = (α1 +α2 , α1 −2α2 ). w1 = [1 1];


w2 = [1 -2];
In standard coordinates, set v = (v1 , v2 ); this equation bcoord
leads to the system of linear equations

v1 = α1 + α2 MATLAB will create a graphics window showing the two


v2 = α1 − 2α2 basis vectors w1 and w2 in red. Using the mouse click
on a point near (2, 0.5) in that figure. MATLAB will
in the two variables α1 and α2 . As we have seen, the respond by plotting the new vector v in yellow and the
fact that w1 and w2 form a basis of R2 implies that these parallelogram generated by α1 w1 and α2 w2 in cyan. The
equations do have a solution. Indeed, we can write this values of α1 and α2 are also plotted on this figure. See
system in matrix form as Figure 24.
    
v1 1 1 α1
= ,
v2 1 −2 α2 Abstracting R2 to Rn Suppose that we are given a basis
W = {w1 , . . . , wn } of Rn and a vector v ∈ Rn . How do
which is solved by inverting the matrix to obtain:
we find the coordinates [v]W of v in the basis W?
For definiteness, assume that v and the wj are row vec-
    
α1 1 2 1 v1
= . (8.3.5)
α2 3 1 −1 v2 tors. Equation (8.3.1) may be rewritten as

For example, suppose v = (2.0, 0.5). Using (8.3.5) we 


α1

find that (α1 , α2 ) = (1.5, 0.5); that is, we can write
v t = (w1t | · · · |wnt )  ...  .
 
v = 1.5w1 + 0.5w2 , αn
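This computation can also be reproduced directly at the MATLAB prompt. The following lines are a small check of (8.3.5), separate from the bcoord demonstration described below:

w1 = [1 1]; w2 = [1 -2];
v  = [2.0 0.5];
inv([w1' w2'])*v'             % returns 1.5 and 0.5, the coordinates of v in the basis {w1,w2}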


Coordinates in the {w1,w2} basis w1 = [ 1 4 7];


2 w2 = [ 2 1 0];
w3 = [-4 2 1];
1.5 inv([w1' w2' w3'])*[4 1 3]'

1 w1 The answer is:


1.499
0.5 v
ans =
0
0.5306
0.3061
−0.5 0.5075 -0.7143

−1
Determining the Matrix of a Linear Mapping in Co-
−1.5 ordinates Suppose that we are given the linear map
LA : Rn → Rn associated to the matrix A in standard
−2 w2 coordinates and a basis w1 , . . . , wn of Rn . How do we find
−2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5 the matrix [LA ]W . As above, we assume that the vectors
wj and the vector v are row vectors Since LA (v) = Av t
Figure 24: The coordinates of v = (2.0, 0.5) in the basis we can rewrite (8.3.3) as
w1 = (1, 1), w2 = (1, −2).
[LA ]W [v]W = [Av t ]W
As above, let PW = (w1t | · · · |wnt ). Using (8.3.6) we see
Thus,
  that
α1 −1 t
[LA ]W PW v = PW −1
Av t .
[v]W =  ...  = PW
−1 t
v, (8.3.6)
 
Setting
αn −1 t
u = PW v
where PW = (w1t | · · · |wnt ). Since the wj are a basis for we see that
−1
Rn , the columns of the matrix PW are linearly indepen- [LA ]W u = PW APW u.
dent, and PW is invertible. Therefore,
−1
We may use (8.3.6) to compute [v]W using MATLAB. For [LA ]W = PW APW .
example, let We have proved:
v = (4, 1, 3)
Theorem 8.3.5. Let A be an n × n matrix and let
and LA : Rn → Rn be the associated linear map. Let
W = {w1 , . . . , wn } be a basis for Rn . Then the ma-
w1 = (1, 4, 7) w2 = (2, 1, 0) w3 = (−4, 2, 1). trix [LA ]W associated to to LA in the basis W is similar
to A. Therefore the determinant, trace, and eigenvalues
Then [v]W is found by typing of [LA ]W are identical to those of A.


Matrix Normal Forms in R2 If we are careful about iw2 be a complex eigenvector of L associated with
how we choose the basis W, then we can simplify the the eigenvalue σ − iτ . Then W = {w1 , w2 } is a basis
form of the matrix [L]W . Indeed, we have already seen and
examples of this process when we discussed how to find
 
σ −τ
[L]W = .
closed form solutions to linear planar systems of ODEs τ σ
in the previous chapter. For example, suppose that L :
R2 → R2 has real eigenvalues λ1 and λ2 with two linearly (c) Suppose that L has exactly one linearly independent
independent eigenvectors w1 and w2 . Then the matrix real eigenvector w1 with real eigenvalue λ. Choose
associated to L in the basis W = {w1 , w2 } is the diagonal the generalized eigenvector w2
matrix
(L − λI2 )(w2 ) = w1 . (8.3.8)
 
λ1 0
[L]W = , (8.3.7)
0 λ2
Then W = {w1 , w2 } is a basis and
since  
λ1 
λ 1

[L(w1 )]W = [λ1 w1 ]W = [L]W = .
0 0 λ
and  
0
[L(w2 )]W = [λ2 w2 ]W = . Proof The verification of (a) was discussed in (8.3.7).
λ2
The verification of (b) follows from (6.2.11) on equating
w1 with v and w2 with w. The verification of (c) follows
In Chapter 6 we showed how to classify 2 × 2 matrices
directly from (8.3.8) as
up to similarity (see Theorem 6.3.4) and how to use this
classification to find closed form solutions to planar sys-
[L(w1 )]W = λe1 and [L(w2 )]W = e1 + λe2 .
tems of linear ODEs (see Section 6.3). We now use the
ideas of coordinates and matrices associated with bases 
to reinterpret the normal form result (Theorem 6.3.4) in
a more geometric fashion.
Visualization of Coordinate Changes in ODEs We con-
Theorem 8.3.6. Let L : R2 → R2 be a linear mapping. sider two examples. As a first example note that the
Then in an appropriate coordinate system defined by the matrices
basis W below, the matrix [L]W has one of the following
forms.
   
1 0 4 −3
C= and B = ,
0 −2 6 −5
(a) Suppose that L has two linearly independent real
eigenvectors w1 and w2 with real eigenvalues λ1 and are similar matrices. Indeed, B = P −1 CP where
λ2 . Then  
2 −1
 
[L]W =
λ1 0
. P = . (8.3.9)
0 λ2 1 −1

(b) Suppose that L has no real eigenvectors and complex The phase portraits of the differential equations Ẋ = BX
conjugate eigenvalues σ ± iτ where τ 6= 0. Let w1 + and Ẋ = CX are shown in Figure 25. Note that both



Figure 25: Phase planes for the saddles Ẋ = BX and Ẋ = CX.

phase portraits are pictures of the same saddle — just in 2. Let W = {v1 , . . . , vn } be a basis of Rn .
different coordinate systems.
(a) State the definition of the coordinates of a vector x ∈ Rn
As a second example note that the matrices relative to W, and describe how to find them given the
    standard coordinates of x.
0 2 6 −4
C= and B = (b) What vector v ∈ Rn satisfies
−2 0 10 −6
are similar matrices, and both are centers. Indeed, B = [v]W = e1 − e2
P −1 CP where P is the same matrix as in (8.3.9). The
phase portraits of the differential equations Ẋ = BX (c) What is the definition of the matrix of a linear function
T : Rn → Rn relative to W?
and Ẋ = CX are shown in Figure 26. Note that both
phase portraits are pictures of the same center — just in (d) Let T : Rn → Rn be a linear transformation with stan-
different coordinate systems. dard matrix A so that [T ]W = B. What is the relation-
ship between A and B?

Exercises
3. Let W = {w1 , w2 } be a basis for R2 where w1 = (1, 2) and
w2 = (0, 1). Let LA : R2 → R2 be the linear map given by
the matrix
1. Let w1 = (1, 4), w2 = (−2, 1) and W = {w1 , w2 }. Find the
 
2 1
A=
coordinates of v = (−1, 32) in the W basis. −1 0
in standard coordinates. Find the matrix [L]W .



Figure 26: Phase planes for the centers Ẋ = BX and Ẋ = CX.

4. Let Eij be the 2 × 3 matrix whose entry in the ith row and 6. Verify that V = {p1 , p2 , p3 } where
j th column is 1 and all of whose other entries are 0.
p1 (t) = 1 + 2t, p2 (t) = t + 2t2 , and p3 (t) = 2 − t2 ,
(a) Show that
is a basis for the vector space of polynomials P2 . Let p(t) = t
V = {E11 , E12 , E13 , E21 , E22 , E23 } and find [p]V .

is a basis for the vector space of 2 × 3 matrices.


(b) Compute [A]V where 7. (matlab) Let
 
A=
−1 0 2
. w1 = (1, 0, 2), w2 = (2, 1, 4), and w3 = (0, 1, −1)
3 −2 4
be a basis for R3 . Find [v]W where v = (2, 1, 5).

5. Suppose the mapping L : R3 → R2 is linear and satisfies


8. (matlab) Let
     
1   0   0  
1 2 −1 w1 = (0.2, −1.3, 0.34, −1.1)
L  0  = L  1  = L  0  =
2 0 4 w2 = (0.5, −0.6, 0.7, 0.8)
0 1 1 (8.3.10*)
w3 = (−1.0, 1.0, 2.0, 4.5)
What is the 2 × 3 matrix A such that L = LA ? w4 = (−5.1, 0.0, 1.6, −1.7)

be a basis W for R4 . Find [v]W where v = (1.7, 2.3, 1.0, −5.0).


9. Find a basis W = {w1 , w2 } such that [LA ]W is a diagonal


matrix, where LA is the linear map associated with the matrix
 
−10 −6
A= .
18 11

10. (matlab) Let A be the 4 × 4 matrix


 
2 1 4 6
 1 2 1 1 
A=   (8.3.11*)
0 1 2 4 
2 1 1 5

and let W = {w1 , w2 , w3 , w4 } where

w1 = (1, 2, 3, 4)
w2 = (0, −1, 1, 3)
(8.3.12*)
w3 = (2, 0, 0, 1)
w4 = (−1, 1, 3, 0)

Verify that W is a basis of R4 and compute the matrix asso-


ciated to A in the W basis.


8.4 *Matrices of Linear Maps on a is the transition matrix, where


Vector Space z1 = c11 w1 + · · · + cn1 wn
Returning to the general finite dimensional vector space .. (8.4.5)
.
V , suppose that zn = c1n w1 + · · · + cnn wn
W = {w1 , . . . , wn } and Z = {z1 , . . . , zn }
for scalars cij .
are bases of V . Then we can write
Proof We can restate (8.4.5) as
v = α1 w1 + · · · + αn wn and v = β1 z1 + · · · + βn zn
to obtain the coordinates
 
c1j
[zj ]W =  ...  .
 
[v]W = (α1 , . . . , αn ) and [v]Z = (β1 , . . . , βn ) (8.4.1)
cnj
of v relative to the bases W and Z. The question that we
address is: How are [v]W and [v]Z related? We answer Note that
this question by finding an n × n matrix CWZ such that [zj ]Z = ej ,
by definition. Since the transition matrix satisfies [v]W =
   
α1 β1
 ..   .  CWZ [v]Z for all vectors v ∈ V , it must satisfy this rela-
 .  = CWZ  ..  . (8.4.2)
tion for v = zj . Therefore,
αn βn
[zj ]W = CWZ [zj ]Z = CWZ ej .
We may rewrite (8.4.2) as
[v]W = CWZ [v]Z . (8.4.3) It follows that [zj ]W is the j th column of CWZ , which
proves the theorem. 
Definition 8.4.1. Let W and Z be bases for the n-
dimensional vector space V . The n × n matrix CWZ is a
transition matrix if CWZ satisfies (8.4.3). A Formula for CWZ when V = Rn For bases in Rn ,
there is a formula for finding transition matrices. Let
W = {w1 , . . . , wn } and Z = {z1 , . . . , zn } be bases of Rn
Transition Mappings Defined The next theorem presents — written as row vectors. Also, let v ∈ Rn be written as
a method for finding the transition matrix between co- a row vector. Then (8.3.6) implies that
ordinates associated to bases in an n-dimensional vector
space V .
−1 t
[v]W = PW v and [v]Z = PZ−1 v t ,

Theorem 8.4.2. Let W = {w1 , . . . , wn } and Z = where


{z1 , . . . , zn } be bases for the n-dimensional vector space
V . Then PW = (w1t | · · · |wnt ) and PZ = (z1t | · · · |znt ).
 
c11 · · · c1n
It follows that
CWZ =  ... ..
.
.. 
.  (8.4.4)

−1
cn1 ··· cnn [v]W = PW PZ [v]Z


and that w1 = [1 1];


−1
CWZ = PW PZ . (8.4.6) w2 = [1 -2];
z1 = [1 3];
As an example, consider the following bases of R4 . Let z2 = [-1 2];
ccoord
w1 = [1, 4, 2, 3] z1= [3, 2, 0, 1]
w2 = [2, 1, 1, 4] z2= [−1, 0, 2, 3]
w3 = [0, 1, 5, 6] z3= [3, 1, 1, 3] The MATLAB program ccoord opens two graphics win-
w4 = [2, 5, −1, 0] z4= [2, 2, 3, 5] dows representing the W and Z planes with the basis
(8.4.7*) vectors plotted in red. Clicking the left mouse button on
Then the matrix CWZ is obtained by typing e9_4_7 to a vector in the W plane simultaneously plots this vector
enter the bases and v in both planes in yellow and the coordinates of v in the
respective bases in cyan. See Figure 27. From this dis-
play you can visualize the coordinates of a vector relative
inv([w1' w2' w3' w4'])*[z1' z2' z3' z4']
to two different bases.
to compute CWZ . The answer is: Note that the program ccoord prints the transition matrix
CWZ in the MATLAB control window. We can verify the
ans = calculations of the program ccoord on this example by
-8.0000 5.5000 -7.0000 -3.2500 hand. Recall that (8.4.6) states that
-0.5000 0.7500 0.0000 0.1250 −1 
4.5000 -2.7500 4.0000 2.3750
 
1 2 1 2
CWZ =
6.0000 -4.0000 5.0000 2.5000 2 3 4 1
  
−3 2 1 2
=
Coordinates Relative to Two Different Bases in R2 Recall 2 −1 4 1
the basis W
 
5 −4
= .
−2 3
w1 = (1, 1) and w2 = (1, −2)

of R2 that was used in a previous example. Suppose that


Z = {z1 , z2 } is a second basis of R2 . Write v = (v1 , v2 ) Matrices of Linear Maps in Different Bases
as a linear combination of the basis Z
Theorem 8.4.3. Let L : V → V be a linear mapping
v = β 1 z 1 + β2 z 2 , and let W and Z be bases of V . Then

obtaining the coordinates [v]Z = (β1 , β2 ). [L]Z and [L]W


We use MATLAB to illustrate how the coordinates of a
vector v relative to two bases may be viewed geometri- are similar matrices. More precisely,
cally. Suppose that z1 = (1, 3) and z2 = (1, −2). Then
enter the two bases W and Z by typing −1
[L]W = CZW [L]Z CZW . (8.4.8)



Figure 27: The coordinates of v = (1.9839, −0.0097) in the bases w1 = (1, 1), w2 = (1, −2) and z1 = (1, 3), z2 = (−1, 2).

Proof For every v ∈ Rn we compute be two bases of R2 . Find CW Z .

CZW [L]W [v]W = CZW [L(v)]W


= [L(v)]Z 2. Let f1 (t) = cos t and f2 (t) = sin t be functions in C 1 .
Let V be the two dimensional subspace spanned by f1 , f2 ; so
= [L]Z [v]Z F = {f1 , f2 } is a basis for V . Let L : V → V be the linear
= [L]Z CZW [v]W . df
mapping defined by L(f ) = . Find [L]F .
dt
Since this computation holds for every [v]W , it follows
that
3. Let L : V → W and M : W → V be linear mappings,
CZW [L]W = [L]Z CZW .
and assume dim V > dim W . Show that M ◦L : V → V is not
Thus (8.4.8) is valid.  invertible.

Exercises 4. (matlab) Let

w1 = (0.23, 0.56) and w2 = (0.17, −0.71)

1. Let and
w1 = (1, 2) and w2 = (0, 1) z1 = (−1.4, 0.3) and z2 = (0.1, −0.2)
and be two bases of R2 and let v = (0.6, 0.1). Find [v]W , [v]Z ,
z1 = (2, 3) and z2 = (3, 4) and CWZ .


5. Let W = {v1 , v2 , v3 } be the basis of R3 where


     
−2 0 −1
v1 =  1  v2 =  0  v3 =  1  .
1 2 0

Let T : R3 → R3 be a linear function so that


     
−1 −4 1
T (v1 ) =  1  T (v2 ) =  2  T (v3 ) =  0  .
0 2 0

(a) Find the coordinates of T (v3 ) relative to the basis W.


That is, compute [T (v3 )]W .
(b) Find the matrix [T ]W of T relative to the basis W.
(c) Find the standard matrix [T ]E of T .

6. (matlab) Consider the matrix


 √ √   
1√ 1 − 3 1 + √3 0.3333 −0.2440 0.9107
1
A= 1 + √3 1√ 1 − 3  =  0.9107 0.3333 −0.2440 
3
1− 3 1+ 3 1 −0.2440 0.9107 0.3333
(8.4.9*)

(a) Try to determine the way that the matrix A moves vec-
tors in R3 . For example, let
1 1
w1 = (1, 1, 1)t w2 = √ (1, −2, 1)t w3 = √ (1, 0, −1)t
6 2
and compute Awj .
(b) Let W = {w1 , w2 , w3 } be the basis of R3 given in (a).
Compute [LA ]W .
(c) Determine the way that the matrix [LA ]W moves vectors
in R3 . For example, consider how this matrix moves the
standard basis vectors e1 , e2 , e3 . Compare this answer
with that in part (a).


9 Least Squares
In Section 9.1 we study the geometric problem of least
squares approximations: Given a point x0 and a subspace
W ⊂ Rn , find the point w0 ∈ W closest to x0 . We then
use least squares approximation to discuss two applica-
tions: the best approximate solution to an inconsistent
linear system in Section 9.2 and least squares fitting of
data in Section 9.3.


9.1 Least Squares Approximations

The form (9.1.2) means that the sum of the squares of
the components of the vectors b−w is minimal at w = w̃.
Let W ⊂ R be a subspace and b ∈ R be a vector.
n n

In this section we solve a basic geometric problem and Recall from (1.4.3) that two vectors z1 , z2 ∈ Rn are per-
investigate some of its consequences. The problem is: pendicular or equivalently orthogonal if z1 ·z2 = 0. Before
continuing, we state and prove
Find the vector w̃ ∈ W that is the nearest vec-
Lemma 9.1.2 (The Law of Pythagorus). The vectors
tor to b in W .
z1 , z2 ∈ Rn are orthogonal if and only if
Definition 9.1.1. The vector w̃ in the subspace W of
Rn that is the nearest to the vector b in Rn is called the ||z1 + z2 ||2 = ||z1 ||2 + ||z2 ||2 . (9.1.3)
least squares approximation of b in W .
Proof To verify (9.1.3) calculate

b ||z1 + z2 ||2 = (z1 + z2 ) · (z1 + z2 )


= z1 · z1 + 2z1 · z2 + z2 · z2
= ||z1 ||2 + 2z1 · z2 + ||z2 ||2 .

w
~ It follows that z1 and z2 satisfy (9.1.3) if and only if
W z1 · z2 = 0 if and only if z1 and z2 are orthogonal. 

Using (9.1.1) and (9.1.3), we can rephrase the minimum


distance problem as follows.
Lemma 9.1.3. The vector w̃ ∈ W is the closest vector
to b ∈ Rn if the vector b − w̃ is orthogonal to every vector
in W . See Figure 28.
Figure 28: The vector w̃ is the least squares approxima-
tion to the vector b by a vector in W . Proof Write b − w = z1 + z2 where z1 = b − w̃ and
z2 = w̃ − w. By assumption, b − w̃ is orthogonal to every
The distance between two vectors v and w is ||v − w||. vector in W ; so z1 and z2 ∈ W are orthogonal. It follows
Hence the least squares approximation can be rephrased from (9.1.3) that
as follows: find a vector w̃ ∈ W such that
||b − w||2 = ||b − w̃||2 + ||w̃ − w||2 .
||b − w̃|| ≤ ||b − w|| ∀w ∈ W. (9.1.1)
Since ||w̃ − w||2 ≥ 0, (9.1.2) is valid, and w̃ is the vector
Condition (9.1.1) is also called the least squares approx- in W that is nearest to b. 
imation. In order to see where this name comes from,
(9.1.1) can be rewritten in the equivalent form
Least Squares Distance to a Line Suppose W is as
||b − w̃||2 ≤ ||b − w||2 ∀w ∈ W. (9.1.2) simple a subspace as possible; that is, suppose W is one


dimensional with basis vector w. Since W is one dimen- Proof Observe that the vector b − w̃ is orthogonal to
sional, a vector w̃ ∈ W must be a multiple of w; that is, every vector in W precisely when b − w̃ is orthogonal to
w̃ = αw for α ∈ R. Suppose that we can find a scalar a each basis vector wj . It follows from Lemma 9.1.3 that
so that b − αw is orthogonal to every vector in W . Then w̃ is the closest vector to b in W if
it follows from Lemma 9.1.3 that w̃ is the closest vector
(b − w̃) · wj = 0
in W to b. To find α, calculate
for every j. That is, if
0 = (b − αw) · w = b · w − αw · w.
w̃ · wj = b · wj
Then
b·w for every j. These equations can be rewritten as a system
α=
||w||2 of equations in terms of the αi , as follows:
and w1 · w1 α1 + · · · + w1 · wk αk = w1 · b
b·w .. (9.1.7)
w̃ = w. (9.1.4) .
||w||2
wk · w1 α1 + · · · + wk · wk αk = wk · b.
Observe that ||w||2 6= 0 since w is a basis vector.
For example, if b = (1, 2, −1, 3) ∈ R4 and w = (0, 1, 2, 3). Note that if u, v ∈ Rn are column vectors, then u·v = ut v.
Then the vector w̃ in the space spanned by w that is Therefore, we can rewrite (9.1.7) as
nearest to b is 
α1

9
w̃ = w W t W  ...  = W t b,
 
14
since b · w = 9 and ||w||2 = 14. αk
where W is the matrix whose columns are the wj and b is
Least Squares Distance to a Subspace Similarly, we viewed as a column vector. Note that the matrix W t W
solve the general least squares problem by solving a sys- is a k × k matrix.
tem of linear equations. We claim that W t W is invertible. To verify this claim, it
suffices to show that the null space of W t W is zero; that
Theorem 9.1.4. Let b ∈ Rn be a vector, let {w1 , . . . , wk }
is, if W t Wz = 0 for some z ∈ Rk , then we show z = 0.
be a basis for the subspace W ⊂ Rn , and let W =
First, calculate
(w1 | · · · |wk ) be the n × k matrix whose columns are the
basis vectors of W . Suppose ||Wz||2 = Wz · Wz = (Wz)t Wz = z t W t Wz = z t 0 = 0.

w̃ = α1 w1 + · · · + αk wk (9.1.5) It follows that Wz = 0. Now if we let z = (z1 , . . . , zk )t ,


then the equation Wz = 0 may be rewritten as
is the vector in W nearest to b. Then
z1 w1 + · · · + zk wk = 0.
 
α1
 ..  Since the wj are linearly independent, it follows that z1 =
t
 .  = (W W) W b.
−1 t
(9.1.6) · · · = zk = 0 and z = 0. Since W t W is invertible, (9.1.6)
αk is valid, and the theorem is proved. 


Corollary 9.1.5. Let b be a vector in Rn , let W be a 5. Prove that the least squares approximation is unique.
subspace of Rn , and let w1 , . . . , wk in Rn be a basis for More precisely, let W be a subspace of Rn and let b be a
W . Let W be the n × k matrix (w1 | · · · |wk ). Then vector in Rn outside W . Suppose w1 , w2 ∈ W are two vectors
such that the distance ||w1 − b|| = ||w2 − b|| is minimal among
w̃ = W(W t W)−1 W t b ∈ W (9.1.8) all vectors in W . Then show that w1 = w2 .

is the least squares approximation to b in W . The dis-


tance of b to W is ||b − w̃||. 6.
 Verify that when k = 1, the coefficient column vector
α1
 .  b·w
 ..  in (9.1.6) coincides with the coefficient a =
Proof Define scalars α1 , . . . , αk by (9.1.6). It follows ||w||2
from Theorem 9.1.4 that the least squares approximation αk
in (9.1.4). Similarly, verify that when k = 1, (9.1.8) coincides
is 
α1
 with (9.1.4).

w̃ = W  ...  = W(W t W)−1 W t b,


 

αk 7. Let W be the vector space spanned by

as claimed.
   
 1 −1
 2   1 
w1 =  −3  and w2 =  0 
  

Exercises 1 2

in
 R .
4
Find the closest point w̃ in W to the point b =
1
1. Use the least squares method, specifically formula (9.1.4)  3
 in R4 .

to find the minimal distance between the point (3, 4) to the

 −1 
x-axis. 2

2. Use the least squares method, specifically formula (9.1.4), 8. (matlab) Let W be the vector space spanned by the
to find the minimal distance between the point (3, 4) to the vectors      
y-axis. 1 −3 −5
 3   0   2 
(9.1.9)
     
 7  ,  8  ,  −2  .
     
 1   3   1 
3. Find the point on the line y = x in R2 that has minimal 0 9 4
distance to the point (1, 6).
 
9
 6 
Find the closest point w̃ in W to the point b =   in R5 .
 
1 1 4
4. Find the vector in the plane x − y − z = 0 that has the  
2 3  3 
1 0
minimal distance to the point x0 =  1 .
1


9. (matlab) Use MATLAB to find the vector w̃ on the


subspace defined by the equation
1 √
3x1 − x2 + x3 + 4.5x4 − 2x5 + 2x6 = 0 (9.1.10)
7
that realizes the minimal distance to the point
(1, 3, 5, 7, 9, 11)t .

10. (matlab) Let W be the vector subspace of R5 defined


by the following system of equations:

x1 + 2.5x2 − 5 3x3 + x4 − 3x5 = 0
7x1 − x2 − 5.5x3 − 10x4 − x5 = 0 (9.1.11*)
3x1 + 4x2 − 5x3 − 2x4 − 6x5 = 0

(a) Let A be its coefficients matrix of size 3 × 5. Find a basis


of the subspace W .
(b) Find the distance between the vector b = (2, 4, 6, 8, 10)
and W .

11. (matlab) Consider the following system of linear equa-


tions
2x1 + x2 + x3 = 0
x1 − 6x3 = −2
3x2 + 5x3 = 4 (9.1.12*)
5x1 − 2x2 + 12x3 = −7
2x1 − 9x2 − 11x3 = 0
Use MATLAB to verify that the system of linear equations
is inconsistent. Then find the least squares approximation
solution of the system.


9.2 Best Approximate Solution

Verify that Ax = b is inconsistent, rank(A) = 2, and


Here we present an application of least squares approx-  
2

1

imation. Let Ax = b be a system of m linear equations w1 =  1  w2 =  2 
in n unknowns. We have discussed various methods for 1 3
solving such a linear system when the system is consis-
tent. Now we use least squares to answer a related but is a basis for range(A) = column space of A. Let
different question:  
2 1
What is the closest vector x̃ to an image point W = (w1 |w2 ) =  1 2  .
Ax̃ to b when the system is inconsistent? 1 3

Use MATLAB and (9.1.8) to compute


Specifically, we use least squares to find x̃ ∈ Rn where
the image Ax̃ is closest to b among all vectors in Rn . w̃ = W(W t W)−1 W t b = (0.8, 1.0, 1.4)t .
This question is best answered in two steps.
Finally, use (9.1.6) to see that
(a) Find w̃ ∈ W = range(A) that is closest to b. This x̃ = (α1 , α2 , 0)t = (0.2, 0.4, 0)t
step can be done using (9.1.8).
is a best approximate solution. We end by noting that w̃
(b) Solve the consistent system Ax̃ = w̃. Do this by and the distance between w̃ and b are unique, whereas x̃
choosing x1 , . . . , xk ∈ Rn such that Axj = wj for all is not unique if the nullity of A is nonzero.
j. Then
x̃ = α1 x1 + · · · + αk xk
Exercises
where the αj are found using (9.1.6).
 
6
Computationally it is simplest to reorder unknowns so 1. Find the distance from the point b =  −12  to the
that the basis for the k-dimensional column space of A 0
is given by the first k columns w1 , . . . , wk of A. It then plane W that consists of the solutions of the equation −x1 +
follows that Aej = wj , where ej is the j th standard basis 2x2 − x3 = 0.
element of Rn . Hence

x̃ = α1 e1 + · · · + αk ek = (α1 , . . . , αk , 0, . . . , 0)t 2. Consider the system of 2 linear equations in 2 unknown

Example 9.2.1. Consider the system Ax = b where Ax = b


     
3 2 x1 4
    where A = ,x= and b = .
2 1 5 1 6 4 x2 2
A= 1 2 4  and b =  0 
1 3 5 2 (a) Verify that this system is inconsistent.


(b) Find the range of A. 6. (matlab) Consider the system of 3 linear equations in 3
(c) Let 
W = range(A). Find w̃ in W that is closest to unknowns
4 Ax = b
b= among all w ∈ W .
2 where
(d) Find a least squares approximation solution to the sys-
tem.
   
2 1 −3 7
A= 0 3 −3  and b =  1 .
−3 0 3 3
3. Consider the following system of linear equations Use MATLAB to verify that the system is inconsistent and
to find a least squares approximation solution of the system.
x1 + x2 = 0
2x1 − 3x2 = 1
5x1 − 2x2 = 2

Verify that the system of linear equations is inconsistent.


Then find the least squares approximation solution of the sys-
tem.

4. Suppose the linear system Ax = b is inconsistent. Let


x̃ be a least squares approximation solution. Then use the
principle of superposition in Section 3.4 to prove that the set
of least squares solutions of the system consists of

x̃ + x

where x ∈ null space(A).

5. (matlab) Consider the system of 4 linear equations in 3


unknown
Ax = b
where
   
7 4 −3 4
 0 9 −10   0 
A=
  and b = 
 3 .

3 −2 −5 
9 2 1 1

Use MATLAB to verify that the system is inconsistent and


to find a least squares approximation solution of the system.


9.3 Least Squares Fitting of Data

b1 and b2 (that do not depend on i) for which yi = b1 +
b2 xi for each i. But these points are just data; errors
We begin this section by using the method of least
may have been made in their measurement. So we ask:
squares to find the best straight line fit to a set of data.
Find b01 and b02 so that the error made in fitting the data
Later in the section we will discuss best fits to other
to the line y = b01 + b02 x is minimal, that is, the error that
curves.
is made in that fit is less than or equal to the error made
in fitting the data to the line y = b1 + b2 x for any other
An Example of Best Linear Fit to Data Suppose that we choice of b1 and b2 .
are given n data points (xi , yi ) for i = 1, . . . , 10. For
We begin by discussing what that error actually is. Given
example, consider the ten points
constants b1 and b2 and given a data point xi , the dif-
(2.0, 0.1) (3.0, 2.7) (1.5, −1.1) (−1.0, −5.5) (0.0, −3.4) ference between the data value yi and the hypothesized
(3.6, 3.0) (0.7, −2.8) (4.1, 4.0) (1.9, −1.9) (5.0, 5.5) value b1 + b2 xi is the error that is made at that data
(9.3.1*) point. Next, we combine the errors made at all of the
The ten points (xi , yi ) are plotted in Figure 29 using the data points; a standard way to combine the errors is to
commands use the Euclidean distance

e9_3_1 1
2 2

plot(X,Y,'o') E(b) = (y1 − (b1 + b2 x1 )) + · · · + (y10 − (b1 + b2 x10 )) 2.
axis([-3,7,-8,8])
xlabel('x') Rewriting E(b) in vector notation leads to an economy
ylabel('y') in notation and to a conceptual advantage. Let

8
X = (x1 , . . . , x10 )t Y = (y1 , . . . , y10 )t and F1 = (1, 1, . . . , 1)

6 be vectors in R10 . Then in coordinates


 
4
y1 − (b1 + b2 x1 )
..
Y − (b1 F1 + b2 X) =  . .
2
 
0 y10 − (b1 + b2 x10 )
y

It follows that
−2

−4

−6
E(b) = ||Y − (b1 F1 + b2 X)||.

−8
−3 −2 −1 0 1 2 3 4 5 6 7 The problem of making a least squares fit is to minimize
x
E over all b1 and b2 .
Figure 29: Scatter plot of data in (9.3.1*). To solve the minimization problem, note that the vec-
tors b1 F1 + b2 X form a two dimensional subspace W =
Next, suppose that there is a linear relation between the span{F1 , X} ⊂ R10 (at least when X is not a scalar mul-
xi and the yi ; that is, we assume that there are constants tiple of F1 , which is almost always). Minimizing E is


identical to finding a vector w0 = b01 F1 + b02 X ∈ W that 8

is nearest to the vector Y ∈ R10 . This is the least squares 6

question that we solved in the Section 9.1. 4

We can use MATLAB to compute the values of b01 and 2


b02 that give the best linear approximation to Y . If we
set the matrix A = (F1 |X), then Theorem 9.1.4 implies 0

y
that the values of b01 and b02 are obtained using (9.1.6). −2

In particular, type e10_3_1 to call the vectors X, Y, F1


into MATLAB, and then type
−4

−6

A = [F1 X]; −8
b0 = inv(A'*A)*A'*Y −3 −2 −1 0 1 2
x
3 4 5 6 7

to obtain Figure 30: Scatter plot of data in (9.3.1*) with best linear
approximation.
b0(1) = -3.8597
b0(2) = 1.8845
Least Squares Fit to a Quadratic Polynomial Suppose
Superimposing the line y = −3.8597 + 1.8845x on the that we want to fit the data (xi , yi ) to a quadratic poly-
scatter plot in Figure 29 yields the plot in Figure 30. nomial
The total error is E(b0) = 1.9634 (obtained in MATLAB y = b1 + b2 x + b3 x2
by typing norm(Y-(b0(1)*F1+b0(2)*X)). Compare this
with the error E(2, −4) = 2.0928. by least squares methods. We want to find constants
b01 , b02 , b03 so that the error made is using the quadratic
polynomial y = b01 + b02 x + b03 x2 is minimal among all pos-
General Linear Regression We can summarize the previ-
sible choices of quadratic polynomials. The least squares
ous discussion, as follows. Given n data points
error is
(x1 , y1 ), . . . , (xn , yn );  
E(b) = ||Y − b1 F1 + b2 X + b3 X (2) ||
form the vectors
X = (x1 , . . . , xn )t Y = (y1 , . . . , yn )t and F1 = (1, . . . , 1)t where
t
in Rn . Find constants b01 and b02 so that b01 F1 + b02 X is a X (2) = x21 , . . . , x2n
vector in W = span{F1 , X} ⊂ Rn that is nearest to Y . and, as before, F1 is the n vector with all components
Let equal to 1.
A = (F1 |X)
We solve the minimization problem as before. In this
be the n × 2 matrix. This problem is solved by least case, the space of possible approximations to the data
squares in (9.1.6) as W is three dimensional; indeed, W = span{F1 , X, X (2) }.
As in the case of fits to lines we try to find a point in
 0 
b1
= (At A)−1 At Y. (9.3.2) W that is nearest to the vector Y ∈ Rn . By (9.1.6), the
b02


answer is: 8

t −1 t
b = (A A) A Y, 6

where A = (F1 |X|X (2)


) is an n × 3 matrix. 4

Suppose that we try to fit the data in (9.3.1*) with a 2

quadratic polynomial rather than a linear one. Use MAT- 0

y
LAB as follows
−2

e9_3_1 −4

A = [F1 X X.*X]; −6
b = inv(A'*A)*A'*Y;
−8
−3 −2 −1 0 1 2 3 4 5 6 7

to obtain
x

Figure 31: Scatter plot of data in (9.3.1*) with best lin-


b0(1) = 0.0443
ear and quadratic approximations. The best linear fit is
b0(2) = 1.7054
plotted with a dashed line.
b0(3) = -3.8197

So the best parabolic fit to this data is y = −3.8197 + is the function g0 (x) ∈ C that is nearest to the data set
1.7054x + 0.0443x2 . Note that the coefficient of x2 is in the following sense. Let
small suggesting that the data was well fit by a straight
line. Note also that the error is E(b0) = 1.9098 which is X = (x1 , . . . , xn )t and Y = (y1 , . . . , yn )t
only marginally smaller than the error for the best linear
fit. For comparison, in Figure 31 we superimpose the be column vectors in Rn . For any function g(x) define
equation for the quadratic fit onto Figure 30. the column vector

G = (g(x1 ), . . . , g(xn ))t ∈ Rn .


General Least Squares Fit The approximation to a
quadratic polynomial shows that least squares fits can So G is the evaluation of g(x) on the data set. Then the
be made to any finite dimensional function space. More error
precisely, Let C be a finite dimensional space of functions E(g) = ||Y − G||
and let is minimal for g = g0 .
f1 (x), . . . , fm (x)
More precisely, we think of the data Y as representing
be a basis for C. We have just considered two such spaces: the (approximate) evaluation of a function on the xi .
C = span{f1 (x) = 1, f2 (x) = x} for linear regression and Then we try to find a function g0 ∈ C whose values on
C = span{f1 (x) = 1, f2 (x) = x, f3 (x) = x2 } for least the xi are as near as possible to the vector Y . This
squares fit to a quadratic polynomial. is just a least squares problem. Let W ⊂ Rn be the
The general least squares fit of a data set vector subspace spanned by the evaluations of function
g ∈ C on the data points xi , that is, the vectors G. The
(x1 , y1 ), . . . , (xn , yn ) minimization problem is to find a vector in W that is


nearest to Y . This can be solved in general using (9.1.6). where T is time measured in months and b1 , b2 , b3 are
That is, let A be the n × m matrix scalars. These functions are 12 periodic, which seems ap-
propriate for weather data, and form a three dimensional
A = (F1 | · · · |Fm ) function space C. Recall the trigonometric identity
where Fj ∈ Rn is the column vector associated to the j th a cos(ωt) + c sin(ωt) = d sin(ω(t − ϕ))
basis element of C, that is,
where
Fj = (fj (x1 ), . . . , fj (xn ))t ∈ Rn . (9.3.4)
p
d= a2 + c2 .
The minimizing function g0 (x) ∈ C is a linear combina- Based on this identity we call C the space of sinusoidal
tion of the basis functions f1 (x), . . . , fn (x), that is, functions. The number d is called the amplitude of the
sinusoidal function g(T ).
g0 (x) = b1 f1 (x) + · · · + bm fm (x)
Note that each data set consists of twelve entries — one
for scalars bi . If we set for each month. Let T = (1, 2, . . . , 12)t be the vector
X ∈ R12 in the general presentation. Next let Y be the
b = (b1 , . . . , bm ) ∈ Rm , data in one of the data sets — say the high temperatures
in Paris.
then least squares minimization states that
Now we turn to the vectors representing basis functions
b = (A0 A)−1 A0 Y. (9.3.3) in C. Let

This equation can be solved easily in MATLAB. Enter the F1=[1 1 1 1 1 1 1 1 1 1 1 1]'
data as column n-vectors X and Y. Compute the column
vectors Fj = fj (X) and then form the matrix A = [F1 be the vector associated with the basis function f1 (T ) =
F2 · · · Fm]. Finally compute 1. Let F2 and F3 be the column vectors associated to the
basis functions
b = inv(A'*A)*A'*Y    
2π 2π
f2 (T ) = cos T and f3 (T ) = sin T .
12 12
Least Squares Fit to a Sinusoidal Function We discuss a
specific example of the general least squares formulation These vectors are computed by typing
by considering the weather. It is reasonable to expect
monthly data on the weather to vary periodically in time F2 = cos(2*pi/12*T);
with a period of one year. In Table 3 we give average F3 = sin(2*pi/12*T);
daily high and low temperatures for each month of the
year for Paris and Rio de Janeiro. We attempt to fit this By typing temper, we enter the temperatures and the
data with curves of the form: vectors T, F1, F2 and F3 into MATLAB.
To find the best fit to the data by a sinusoidal function
   
2π 2π
g(T ) = b1 + b2 cos T + b3 sin T , g(T ), we use (9.1.6). Let A be the 12 × 3 matrix
12 12


Paris Rio de Janeiro Paris Rio de Janeiro


Month High Low High Low Month High Low High Low
1 55 39 84 73 7 81 64 75 63
2 55 41 85 73 8 81 64 76 64
3 59 45 83 72 9 77 61 75 65
4 64 46 80 69 10 70 54 77 66
5 68 55 77 66 11 63 46 79 68
6 75 61 76 64 12 55 41 82 71

Table 3: Monthly Average of Daily High and Low Temperatures in Paris and Rio de Janeiro.

A = [F1 F2 F3]; A similar exercise allows us to compute the best approxi-


mation to the Rio de Janeiro high temperatures obtaining
The table data is entered in column vectors ParisH and
ParisL for the high and low Paris temperatures and RioH b(1) = 79.0833
and RioL for the high and low Rio de Janeiro tempera- b(2) = 3.0877
tures. We can find the best least squares fit of the Paris b(3) = 3.6487
high temperatures by a sinusoidal function g0 (T ) by typ-
ing
The value of b(1) is just the mean high temperature and
b = inv(A'*A)*A'*ParisH not surprisingly that value is much higher in Rio than in
Paris. There is yet more information contained in these
obtaining approximations. For the high temperatures in Paris and
Rio
b(1) = 66.9167 dP = 13.3244 and dR = 4.7798.
b(2) = -9.4745
b(3) = -9.3688 The amplitude d, where d is defined in (9.3.4), measures
the variation of the high temperature about its mean. It
The result is plotted in Figure 32 by typing is much greater in Paris than in Rio, indicating that the
difference in temperature between winter and summer is
plot(T,ParisH,'o') much greater in Paris than in Rio.
axis([0,13,0,100])
xlabel('time (months)')
ylabel('temperature (Fahrenheit)') Least Squares Fit in MATLAB The general formula for
hold on a least squares fit of data (9.3.3) has been prepro-
xx = linspace(0,13); grammed in MATLAB. After setting up the matrix A
yy = b(1) + b(2)*cos(2*pi*xx/12) + whose columns are the vectors Fj just type
b(3)*sin(2*pi*xx/12);
plot(xx,yy) b = A\Y
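As a check that the backslash command really returns the least squares solution, one can compare it with the normal-equations formula used earlier in this section. The following lines are a small check of our own using the data set (9.3.1*):

e9_3_1                        % enter X, Y, and F1 for the data (9.3.1*)
A = [F1 X];
[inv(A'*A)*A'*Y, A\Y]         % the two columns agree; both give the least squares coefficients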



Figure 32: Monthly averages of daily high temperatures in Paris (left) and Rio de Janeiro (right) with best sinusoidal
approximation.

This MATLAB command can be checked on the sinu- (a) Find m and b to give the best linear fit to this data.
soidal fit to the high temperature Rio de Janeiro data by (b) Use this linear approximation to the data to make pre-
typing dictions of the world populations in the year 1910 and
2000.
b = A\RioH

and obtaining Year Population (in millions) Year Population (in million
1900 1625 1950 2516
b = 1910 n.a. 1960 3020
79.0833 1920 1813 1970 3698
3.0877 1930 1987 1980 4448
3.6487 1940 2213 1990 5292

Table 4: Twentieth Century World Population Data by


Exercises Decades.

2. (matlab) Continue from Exercise 1. Do you expect the


1. (matlab) World population data for each decade of this prediction for the year 2000 to be high or low or on target?
century (except for 1910) is given in Table 4. Assume that Explain why by graphing the data with the best linear fit su-
population growth is linear P = mT + b where time T is perimposed and by using the differential equation population
measured in decades since the year 1900 and P is measured model discussed in Section 4.2.
in billions of people. This data can be recovered by typing
e9_3_po.


3. (matlab) Find the best sinusoidal approximation to


the monthly average low temperatures in Paris and Rio de
Janeiro. How does the variation of these temperatures about
the mean compare to the high temperature calculations? Re-
call that by typing temper in the MATLAB command window,
the temperatures and the vectors T, F1, F2 and F3 are entered
into MATLAB.

4. (matlab) In Table 5 we present weather data from ten


U.S. cities. The data is the average number of days in the
year with precipitation and the percentage of sunny hours to
hours when it could be sunny. Find the best linear fit to this
data.

City Rainy Days Sunny (%) City Rainy Days Sunny (%)
Charleston 92 72 Kansas City 98 59
Chicago 121 54 Miami 114 85
Dallas 82 65 New Orleans 103 61
Denver 82 67 Phoenix 28 88
Duluth 136 52 Salt Lake City 99 59

Table 5: Precipitation Days Versus Sunny Time for Selected U.S. Cities.
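For Exercise 4, one possible MATLAB setup (our own sketch; the numbers are simply retyped from Table 5) is:

rainy = [92 121 82 82 136 98 114 103 28 99]';   % average rainy days per year
sunny = [72 54 65 67 52 59 85 61 88 59]';       % percentage of sunny hours
A = [ones(10,1) rainy];                         % model: sunny = b(1) + b(2)*rainy
b = A\sunny                                     % best least squares coefficients
plot(rainy,sunny,'o'), hold on
xx = linspace(20,140);
plot(xx, b(1) + b(2)*xx)                        % superimpose the best linear fit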


10 Orthogonality
In Section 10.1 we discuss orthonormal bases (bases in
which each basis vector has unit length and any two ba-
sis vectors are perpendicular) and orthogonal matrices
(matrices whose columns form an orthonormal basis).
We will see that the computation of coordinates in an
orthonormal basis is particularly straightforward. The
Gram-Schmidt orthonormalization process for construct-
ing orthonormal bases is presented in Section 10.2. We
use orthogonality in Section 10.3 to study the eigenvalues
and eigenvectors of symmetric matrices (the eigenvalues
are real and the eigenvectors can be chosen to be or-
thonormal). The chapter ends with a discussion of the
QR decomposition for finding orthonormal bases in Sec-
tion 10.4. This decomposition leads to an algorithm that
is numerically superior to Gram-Schmidt and is the one
used in MATLAB.


10.1 Orthonormal Bases and Orthogonal Matrices

In Section 8.3 we discussed how to write the coordinates of a vector in a basis. We now show that finding coordinates of vectors in certain bases is a very simple task — these bases are called orthonormal bases.

Nonzero vectors v1, . . . , vk in Rn are orthogonal if the dot products

vi · vj = 0

when i ≠ j. The vectors are orthonormal if they are orthogonal and of unit length, that is,

vi · vi = 1.

The standard example of a set of orthonormal vectors in Rn is the standard basis e1, . . . , en.

Lemma 10.1.1. Nonzero orthogonal vectors are linearly independent.

Proof   Let v1, . . . , vk be a set of nonzero orthogonal vectors in Rn and suppose that

α1 v1 + · · · + αk vk = 0.

To prove the lemma we must show that each αj = 0. Since vi · vj = 0 for i ≠ j,

αj vj · vj = α1 v1 · vj + · · · + αk vk · vj = (α1 v1 + · · · + αk vk) · vj = 0 · vj = 0.

Since vj · vj = ||vj||² > 0, it follows that αj = 0.  □

Corollary 10.1.2. A set of n nonzero orthogonal vectors in Rn is a basis.

Proof   Lemma 10.1.1 implies that the n vectors are linearly independent, and Chapter 5, Corollary 5.6.7 states that n linearly independent vectors in Rn form a basis.  □

Next we discuss how to find coordinates of a vector in an orthonormal basis, that is, a basis consisting of orthonormal vectors.

Theorem 10.1.3. Let V ⊂ Rn be a subspace and let {v1, . . . , vk} be an orthonormal basis of V. Let v ∈ V be a vector. Then

v = α1 v1 + · · · + αk vk,

where

αi = v · vi.

Proof   Since {v1, . . . , vk} is a basis of V, we can write

v = α1 v1 + · · · + αk vk

for some scalars αj. It follows that

v · vj = (α1 v1 + · · · + αk vk) · vj = αj,

as claimed.  □

An Example in R3   Let

v1 = (1/√3)(1, 1, 1),
v2 = (1/√6)(1, −2, 1),
v3 = (1/√2)(1, 0, −1).

A short calculation verifies that these vectors have unit length and are pairwise orthogonal. Let v = (1, 2, 3) be a vector and determine the coordinates of v in the basis V = {v1, v2, v3}. Theorem 10.1.3 states that these coordinates are:

[v]V = (v · v1, v · v2, v · v3) = (2√3, 0, −√2).
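A quick MATLAB check of this example (our own snippet, not part of the text) computes the coordinates as dot products and verifies that they reconstruct v:

v1 = [1 1 1]'/sqrt(3);
v2 = [1 -2 1]'/sqrt(6);
v3 = [1 0 -1]'/sqrt(2);
v  = [1 2 3]';
alpha = [dot(v,v1) dot(v,v2) dot(v,v3)]    % approximately [3.4641 0 -1.4142]
alpha(1)*v1 + alpha(2)*v2 + alpha(3)*v3    % reconstructs v = (1,2,3)'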


Matrices in Orthonormal Coordinates   Next we discuss how to find the matrix associated with a linear map in an orthonormal basis. Let L : Rn → Rn be a linear map and let V = {v1, . . . , vn} be an orthonormal basis for Rn. Then the matrix associated to L in the basis V can be calculated in terms of dot product. That matrix is:

[L]V = {L(vj) · vi}.   (10.1.1)

To verify (10.1.1), recall from Definition 8.3.3 that the (i, j)th entry of [L]V is the ith entry in the vector [L(vj)]V, which is L(vj) · vi by Theorem 10.1.3.

An Example in R2   Let V = {v1, v2} ⊂ R2 where

v1 = (1/√2)(1, 1)   and   v2 = (1/√2)(1, −1).

The set V is an orthonormal basis of R2. Using (10.1.1) we can find the matrix associated to the linear map

LA(x) = [2 1; −1 3] x

in the basis V. That is, compute

[L]V = [Av1·v1  Av2·v1; Av1·v2  Av2·v2] = (1/2)[5 −3; 1 5].

Orthogonal Matrices

Definition 10.1.4. An n × n matrix Q is orthogonal if its columns form an orthonormal basis of Rn.

The following lemma states elementary properties of orthogonal matrices. In particular, an orthogonal matrix is invertible and it is straightforward to compute its inverse.

Lemma 10.1.5. Let Q be an n × n matrix. Then

(a) Q is orthogonal if and only if Q^t Q = In;

(b) Q is orthogonal if and only if Q^{-1} = Q^t;

(c) If Q1, Q2 are orthogonal matrices, then Q1 Q2 is an orthogonal matrix.

Proof   (a) Let Q = (v1| · · · |vn). Since Q is orthogonal, the vj form an orthonormal basis. By direct computation note that Q^t Q = {(vi · vj)} = In, since the vj are orthonormal. Note that (b) is simply a restatement of (a).

(c) Now let Q1, Q2 be orthogonal. Then (a) implies

(Q1 Q2)^t (Q1 Q2) = Q2^t Q1^t Q1 Q2 = Q2^t In Q2 = Q2^t Q2 = In,

thus proving (c).  □

Remarks Concerning MATLAB   In the next section we prove that every vector subspace of Rn has an orthonormal basis (see Theorem 10.2.1), and we present a method for constructing such a basis (the Gram-Schmidt orthonormalization process). Here we note that certain commands in MATLAB produce bases for vector spaces. For those commands MATLAB always produces an orthonormal basis. For example, null(A) produces a basis for the null space of A. Take the 3 × 5 matrix

A = [1 2 3 4 5; 0 1 2 3 4; 2 3 4 0 0]   (10.1.2*)

Since rank(A) = 3, it follows that the null space of A is two-dimensional. Typing B = null(A) in MATLAB produces

B =
   -0.4666         0
    0.6945    0.4313
   -0.2876   -0.3235
    0.3581   -0.6470
   -0.2984    0.5392


The columns of B form an orthonormal basis for the null space of A. This assertion can be checked by first typing

v1 = B(:,1);
v2 = B(:,2);

and then typing

norm(v1)
norm(v2)
dot(v1,v2)
A*v1
A*v2

yields answers 1, 1, 0, (0, 0, 0)^t, (0, 0, 0)^t (to within numerical accuracy). Recall that the MATLAB command norm(v) computes the norm of a vector v.

Exercises

1. Find an orthonormal basis for the solutions to the linear equation

2x1 − x2 + x3 = 0.

2. (a) Find the coordinates of the vector v = (1, 4) in the orthonormal basis V

v1 = (1/√5)(1, 2)   and   v2 = (1/√5)(2, −1).

(b) Let A = [1 1; 2 −3]. Find [A]V.

3. (matlab) Load the matrix

A = [1 2 0; 0 1 0; 0 0 0]

into MATLAB. Then type the command orth(A). Verify that the result is an orthonormal basis for the column space of A.

4. Show that if P is an n × n orthogonal matrix, then det(P) = ±1.

In Exercises 5 – 9 decide whether or not the given matrix is orthogonal.

5. [2 0; 0 1].

6. [0 1 0; 0 0 1; 1 0 0].

7. [0 −1 0; 0 0 1; −1 0 0].

8. [cos(1) −sin(1); sin(1) cos(1)].

9. [1 0 4; 0 1 0].

10. Prove that the rows of an n × n orthogonal matrix form an orthonormal basis for Rn.

11. Show that if P and Q are n × n orthogonal matrices, then PQ is an n × n orthogonal matrix.
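A quick numerical test of orthogonality in MATLAB, useful for Exercises 5 – 9 (a throwaway check of ours, not part of the text), is to compare Q'*Q with the identity:

Q = [0 1 0; 0 0 1; 1 0 0];    % the matrix of Exercise 6
norm(Q'*Q - eye(3))           % essentially zero, so Q'*Q = I3 and Q is orthogonal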


10.2 Gram-Schmidt Orthonormalization Process

Suppose that W = {w1, . . . , wk} is a basis for the subspace V ⊂ Rn. There is a natural process by which the W basis can be transformed into an orthonormal basis V of V. This process proceeds inductively on the wj; the orthonormal vectors v1, . . . , vk can be chosen so that

span{v1, . . . , vj} = span{w1, . . . , wj}

for each j ≤ k. Moreover, the vj are chosen using the theory of least squares that we have just discussed.

The Case j = 2   To gain a feeling for how the induction process works, we verify the case j = 2. Set

v1 = (1/||w1||) w1;   (10.2.1)

so v1 points in the same direction as w1 and has unit length, that is, v1 · v1 = 1. The normalization is shown in Figure 33.

Figure 33: Planar illustration of Gram-Schmidt orthonormalization. (The figure shows w1, w2, the unit vectors v1 and v2, the projection w0 = (w2 · v1)v1, and v2' = w2 − w0.)

Next, we find a unit length vector v2' in the plane spanned by w1 and w2 that is perpendicular to v1. Let w0 be the vector on the line generated by v1 that is nearest to w2. It follows from (9.1.4) that

w0 = ((w2 · v1)/||v1||²) v1 = (w2 · v1)v1.

The vector w0 is shown on Figure 33 and, as Lemma 9.1.3 states, the vector v2' = w2 − w0 is perpendicular to v1. That is,

v2' = w2 − (w2 · v1)v1   (10.2.2)

is orthogonal to v1.

Finally, set

v2 = (1/||v2'||) v2'   (10.2.3)

so that v2 has unit length. Since v2 and v2' point in the same direction, v1 and v2 are orthogonal. Note also that v1 and v2 are linear combinations of w1 and w2. Since v1 and v2 are orthogonal, they are linearly independent. It follows that

span{v1, v2} = span{w1, w2}.

In summary: computing v1 and v2 using (10.2.1), (10.2.2) and (10.2.3) yields an orthonormal basis for the plane spanned by w1 and w2.

The General Case

Theorem 10.2.1. (Gram-Schmidt Orthonormalization)  Let w1, . . . , wk be a basis for the subspace W ⊂ Rn. Define v1 as in (10.2.1) and then define inductively

v_{j+1}' = w_{j+1} − [(w_{j+1} · v1)v1 + · · · + (w_{j+1} · vj)vj]   (10.2.4)

v_{j+1} = (1/||v_{j+1}'||) v_{j+1}'.   (10.2.5)


Then span{v1, . . . , vj} = span{w1, . . . , wj} and v1, . . . , vk is an orthonormal basis of W.

Proof   We assume that we have constructed orthonormal vectors v1, . . . , vj such that

span{v1, . . . , vj} = span{w1, . . . , wj}.

Our purpose is to find a unit vector v_{j+1} that is orthogonal to each vi and that satisfies

span{v1, . . . , v_{j+1}} = span{w1, . . . , w_{j+1}}.

We construct v_{j+1} in two steps. First we find a vector v_{j+1}' that is orthogonal to each of the vi using least squares. Let w0 be the vector in span{v1, . . . , vj} that is nearest to w_{j+1}. Theorem 9.1.4 tells us how to make this construction. Let A be the matrix whose columns are v1, . . . , vj. Then (9.1.6) states that the coordinates of w0 in the vi basis is given by (A^t A)^{-1} A^t w_{j+1}. But since the vi's are orthonormal, the matrix A^t A is just Ij. Hence

w0 = (w_{j+1} · v1)v1 + · · · + (w_{j+1} · vj)vj.

Note that v_{j+1}' = w_{j+1} − w0 is the vector defined in (10.2.4). We claim that v_{j+1}' = w_{j+1} − w0 is orthogonal to vk for k ≤ j and hence to every vector in span{v1, . . . , vj}. Just calculate

v_{j+1}' · vk = w_{j+1} · vk − w0 · vk = w_{j+1} · vk − w_{j+1} · vk = 0.

Define v_{j+1} as in (10.2.5). It follows that v1, . . . , v_{j+1} are orthonormal and that each vector is a linear combination of w1, . . . , w_{j+1}.  □

An Example of Orthonormalization   Let W ⊂ R4 be the subspace spanned by the vectors

w1 = (1, 0, −1, 0),  w2 = (2, −1, 0, 1),  w3 = (0, 0, −2, 1).   (10.2.6)

We find an orthonormal basis for W using Gram-Schmidt orthonormalization.

Step 1: Set

v1 = (1/||w1||) w1 = (1/√2)(1, 0, −1, 0).

Step 2: Following the Gram-Schmidt process, use (10.2.4) to define

v2' = w2 − (w2 · v1)v1 = (2, −1, 0, 1) − √2 (1/√2)(1, 0, −1, 0) = (1, −1, 1, 1).

Normalization using (10.2.5) yields

v2 = (1/||v2'||) v2' = (1/2)(1, −1, 1, 1).

Step 3: Using (10.2.4) set

v3' = w3 − (w3 · v1)v1 − (w3 · v2)v2
    = (0, 0, −2, 1) − √2 (1/√2)(1, 0, −1, 0) − (−1/2)(1/2)(1, −1, 1, 1)
    = (1/4)(−3, −1, −3, 5).

Normalization using (10.2.5) yields

v3 = (1/||v3'||) v3' = (1/√44)(−3, −1, −3, 5).

Hence we have constructed an orthonormal basis


{v1, v2, v3} for W, namely

v1 = (1/√2)(1, 0, −1, 0) ≈ (0.7071, 0, −0.7071, 0)
v2 = (1/2)(1, −1, 1, 1) = (0.5, −0.5, 0.5, 0.5)
v3 = (1/√44)(−3, −1, −3, 5) ≈ (−0.4523, −0.1508, −0.4523, 0.7538)
   (10.2.7)
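The same computation is easy to script. The following is a minimal MATLAB implementation of (10.2.4) and (10.2.5) (our own sketch; the file name gramschmidt.m is not part of the text), followed by a check on the vectors in (10.2.6):

function V = gramschmidt(W)
% GRAMSCHMIDT  Orthonormalize the columns of W.
% Assumes the columns of W are linearly independent; save as gramschmidt.m.
[n,k] = size(W);
V = zeros(n,k);
for j = 1:k
   vprime = W(:,j);
   for i = 1:j-1
      vprime = vprime - (W(:,j)'*V(:,i))*V(:,i);   % subtract projections, as in (10.2.4)
   end
   V(:,j) = vprime/norm(vprime);                   % normalize, as in (10.2.5)
end

Typing

W = [1 2 0; 0 -1 0; -1 0 -2; 0 1 1];   % columns are w1, w2, w3 from (10.2.6)
V = gramschmidt(W)

then reproduces the basis (10.2.7), one vector per column.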

Exercises

1. Find an orthonormal basis of R2 by applying Gram-


Schmidt orthonormalization to the vectors w1 = (3, 4) and
w2 = (1, 5).

2. Find an orthonormal basis of the plane W ⊂ R3 spanned


by the vectors w1 = (1, 2, 3) and w2 = (2, 5, −1) by applying
Gram-Schmidt orthonormalization.

3. Let W = {w1 , . . . , wk } be an orthonormal basis of the


subspace W ⊂ Rn . Prove that W can be extended to an
orthonormal basis {w1 , . . . , wn } of Rn .

4. (matlab) Use Gram-Schmidt orthonormalization to find


an orthonormal basis for the subspace of R5 spanned by the
vectors

w1 = (2, 1, 3, 5, 7) w2 = (2, −1, 5, 2, 3) and w3 = (10, 1, −23, 2, 3).


(10.2.8*)
Extend this basis to an orthonormal basis of R5 .
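One possible MATLAB approach to this exercise (our own sketch, reusing the gramschmidt helper defined above; null is used only to extend the basis):

W = [2 1 3 5 7; 2 -1 5 2 3; 10 1 -23 2 3]';   % columns are w1, w2, w3 from (10.2.8*)
V = gramschmidt(W);        % orthonormal basis for span{w1,w2,w3}
V5 = [V null(V')]          % null(V') spans the orthogonal complement of the span,
                           % so the columns of V5 form an orthonormal basis of R5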


10.3 The Spectral Theory of Symmetric Matrices

Eigenvalues and eigenvectors of symmetric matrices have remarkable properties that can be summarized in three theorems.

Theorem 10.3.1. Let A be a symmetric matrix. Then every eigenvalue of A is real.

Theorem 10.3.2. Let A be an n × n symmetric matrix. Then there is an orthonormal basis of Rn consisting of eigenvectors of A.

Theorem 10.3.3. For each n × n symmetric matrix A, there exists an orthogonal matrix P such that P^t A P is a diagonal matrix.

The proof of Theorem 10.3.1 uses the Hermitian inner product — a generalization of dot product to complex vectors.

Hermitian Inner Products   Let v, w ∈ Cn be two complex n-vectors. Define

⟨v, w⟩ = v1 w̄1 + · · · + vn w̄n.

Note that the coordinates w̄i of the second vector enter this formula with a complex conjugate. However, if v and w are real vectors, then ⟨v, w⟩ = v · w.

An alternative notation for the Hermitian inner product is given by matrix multiplication. Suppose that v and w are column n-vectors. Then

⟨v, w⟩ = v^t w̄.

The properties of the Hermitian inner product are similar to those of dot product. We note three. Let c ∈ C be a complex scalar. Then

⟨v, v⟩ = ||v||² ≥ 0
⟨cv, w⟩ = c ⟨v, w⟩
⟨v, cw⟩ = c̄ ⟨v, w⟩

Note the complex conjugation of the complex scalar c in the previous formula.

Let C be a complex n × n matrix. Then the most important observation concerning Hermitian inner products that we shall use is:

⟨Cv, w⟩ = ⟨v, C̄^t w⟩.   (10.3.1)

This fact is verified by calculating

⟨Cv, w⟩ = (Cv)^t w̄ = (v^t C^t) w̄ = v^t (C^t w̄) = ⟨v, C̄^t w⟩.

So if A is an n × n real symmetric matrix, then

⟨Av, w⟩ = ⟨v, Aw⟩,   (10.3.2)

since Ā^t = A^t = A.

Proof of Theorem 10.3.1   Let λ be an eigenvalue of A and let v be the associated eigenvector. Since Av = λv we can use (10.3.2) to compute

λ⟨v, v⟩ = ⟨Av, v⟩ = ⟨v, Av⟩ = λ̄⟨v, v⟩.

Since ⟨v, v⟩ = ||v||² > 0, it follows that λ = λ̄ and λ is real.  □

Proof of Theorem 10.3.2   Let A be a real symmetric n × n matrix. We show that there is an orthonormal basis of Rn consisting of eigenvectors of A. The proof follows directly from Corollary 10.1.2 if the eigenvalues are distinct. If some of the eigenvalues are multiple, the proof is more complicated and uses Gram-Schmidt orthonormalization.


The proof proceeds inductively on n. The theorem is Finally, let vj = P −1 zj for j = 2, . . . , n. Since v1 =
trivially valid for n = 1; so we assume that it is valid for P −1 e1 , it follows that v1 , v2 , . . . , vn is a basis of Rn con-
n − 1. sisting of eigenvectors of A. We need only show that the
vj form an orthonormal basis of Rn . This is done us-
Theorem 7.2.4 of Chapter 7 implies that A has an eigen-
value λ1 and Theorem 10.3.1 states that this eigenvalue ing (10.3.2). For notational convenience let z1 = e1 and
is real. Let v1 be a unit length eigenvector corresponding compute
to the eigenvalue λ1 . Extend v1 to an orthonormal basis hvi , vj i = hP −1 zi , P −1 zj i = hP t zi , P t zj i
v1 , w2 , . . . , wn of Rn and let P = (v1 |w2 | · · · |wn ) be the
= hzi , P P t zj i = hzi , zj i,
matrix whose columns are the vectors in this orthonormal
basis. Orthonormality and direct multiplication implies since P P t = In . Thus the vectors vj form an orthonor-
that mal basis since the vectors zj form an orthonormal ba-
P t P = In . (10.3.3) sis. 

Therefore P is invertible; indeed P −1 = P t . Next, let Proof of Theorem 10.3.3 As a consequence of The-
−1
orem 10.3.2, let V = {v1 , . . . , vn } be an orthonormal basis
B=P AP. for Rn consisting of eigenvectors of A. Indeed, suppose
By direct computation Avj = λj vj
where λj ∈ R. Note that
Be1 = P −1 AP e1 = P −1 Av1 = λ1 P −1 v1 = λ1 e1 .

λj i=j
It follows that that B has the form Avj · vi =
0 i 6= j
It follows from (10.1.1) that
 
λ1 ∗
B=
0 C  
λ1 0
..
where C is an (n − 1) × (n − 1) matrix. Since P −1 = P t , [A]V =  .
 

it follows that B is a symmetric matrix; to verify this 0 λn
point compute
is a diagonal matrix. So every symmetric matrix A is
t t t t t t t
B = (P AP ) = P A (P ) = P AP = B. t similar by an orthogonal matrix P to a diagonal matrix
where P is the matrix whose columns are the eigenvectors
It follows that of A; namely, P = [v1 | · · · |vn ]. 
 
λ1 0
B=
0 C
Exercises
where C is a symmetric matrix. By induction we can use
the Gram-Schmidt orthonormalization process to choose
an orthonormal basis z2 , . . . , zn in {0} × Rn−1 consisting
1. Let
of eigenvectors of C. It follows that e1 , z2 , . . . , zn is an 
a b

orthonormal basis for Rn consisting of eigenvectors of B. A=
b d


be the general real 2 × 2 symmetric matrix.

(a) The discriminant of the characteristic polynomial of a


2 × 2 matrix is
D = tr(A)2 − 4 det(A).
Use the discriminant to show that A has real eigenvalues.
(b) Show that A has equal eigenvalues only if A is a scalar
multiple of I2 .

2. Let  
1 2
A= .
2 −2
Find the eigenvalues and eigenvectors of A and verify that the
eigenvectors are orthogonal.

3. Let Q be an orthogonal n × n matrix. Show that Q pre-


serves the length of vectors, that is
kQvk = kvk for all v ∈ Rn .

4. Let S2 be the set of real 2 × 2 symmetric matrices.

(a) Verify that S2 is a vector space and that


     
1 0 0 1 0 0
E1 = E2 = E3 =
0 0 1 0 0 1
(10.3.4)
is a basis of S2 . Hence S2 is 3-dimensional.
(b) Let P be a 2 × 2 orthogonal matrix. Verify that the map
MP : S2 → S2 defined by
MP (A) = P t AP
is linear.
(c) Let P be the matrix
 
0 −1
P = (10.3.5)
1 0
Verify that P is an orthogonal matrix and compute the
eigenvalues and eigenvectors of MP .
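A quick numerical companion to Theorems 10.3.1 – 10.3.3, which can also be used to check Exercise 2 (the snippet is our own illustration, not part of the text):

A = [1 2; 2 -2];     % the symmetric matrix of Exercise 2
[P,D] = eig(A)       % D is real and diagonal; the columns of P are eigenvectors
P'*P                 % equals I2 up to roundoff, so P is orthogonal
P'*A*P               % recovers the diagonal matrix D, as in Theorem 10.3.3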


10.4 *QR Decompositions Proof By definition every vector v ∈ V satisfies ut v =


u · v = 0. Therefore,
In this section we describe an alternative approach to
Gram-Schmidt orthonormalization for constructing an 2
orthonormal basis of a subspace W ⊂ Rn . This method Hv = v − uut v = v,
ut u
is called the QR decomposition and is numerically supe-
rior to Gram-Schmidt. Indeed, the QR decomposition is and
2
the method used by MATLAB to compute orthonormal Hu = u − uut u = u − 2u = −u.
bases. To discuss this decomposition we need to intro- ut u
duce a new type of matrices, the orthogonal matrices. Hence H is a reflection across the hyperplane V . It also
follows that H 2 = In since H 2 v = H(Hv) = Hv = v for
all v ∈ V and H 2 u = H(−u) = u. So H 2 acts like the
Reflections Across Hyperplanes: Householder Matrices identity on a basis of Rn and H 2 = In .
Useful examples of orthogonal matrices are reflections
across hyperplanes. An n − 1 dimensional subspace of To show that H is orthogonal, we first calculate
Rn is called a hyperplane. Let V be a hyperplane and 2 2
let u be a nonzero vector normal to V . Then a reflection H t = Int −
ut u
(uut )t = In − t uut = H.
uu
across V is a linear map H : Rn → Rn such that
Therefore In = HH = HH t and H t = H −1 . Now apply
(a) Hv = v for all v ∈ V . Lemma 10.1.5(b). 

(b) Hu = −u.
QR Decompositions The Gram-Schmidt process is not
We claim that the matrix of a reflection across a hyper- used in practice to find orthonormal bases as there are
plane is orthogonal and there is a simple formula for that other techniques available that are preferable for orthog-
matrix. onalization on a computer. One such procedure for the
construction of an orthonormal basis is based on QR de-
Definition 10.4.1. A Householder matrix is an n × n compositions using Householder transformations. This
matrix of the form method is the one implemented in MATLAB .
2 An n × k matrix R = {rij } is upper triangular if rij = 0
H = In − uut (10.4.1)
ut u whenever i > j.
where u ∈ Rn is a nonzero vector. . Definition 10.4.3. An n × k matrix A has a QR decom-
position if
This definition makes sense since ut u = ||u||2 is a number A = QR. (10.4.2)
while the product uut is an n × n matrix.
where Q is an n × n orthogonal matrix and R is an n × k
Lemma 10.4.2. Let u ∈ Rn be a nonzero vector and let upper triangular matrix R.
V be the hyperplane orthogonal to u. Then the House-
holder matrix H is a reflection across V and is orthogo- QR decompositions can be used to find orthonormal
nal. bases as follows. Suppose that W = {w1 , . . . , wk } is a


basis for the subspace W ⊂ Rn . Then define the n × k Conversely, we can also write down a QR decomposition
matrix A which has the wj as columns, that is for a matrix A, if we have computed an orthonormal
basis for the columns of A. Indeed, using the Gram-
A = (w1t | · · · |wkt ). Schmidt process, Theorem 10.2.1, we have shown that
QR decompositions always exist. In the remainder of
Suppose that A = QR is a QR decomposition. Since Q is this section we discuss a different way for finding QR
orthogonal, the columns of Q are orthonormal. So write decompositions using Householder matrices.
Q = (v1t | · · · |vnt ).
Construction of a QR Decomposition Using Householder
On taking transposes we arrive at the equation At = Matrices The QR decomposition by Householder trans-
Rt Qt : formations is based on the following observation :
Proposition 10.4.5. Let z = (z1 , . . . , zn ) ∈ Rn be
 
  r11 0 · · · 0 ··· 0  
w1 v1
 ..  
 r12 r22 · · · 0 ··· 0   .. 
 nonzero and let
 .  =  .. .. .. ..  . .
 . . . .
q
··· ···  r = zj2 + · · · + zn2 .
wk vn
r1k r2k · · · rkk ··· 0
Define u = (u1 , . . . , un ) ∈ Rn by
By equating rows in this matrix equation we arrive at    
the system u1 0
 ..   ..
 .   .


w1 = r11 v1 
 uj−1  
 
0


w2 = r12 v1 + r22 v2   
 uj  =  zj − r

.
.. (10.4.3)
.
   
 uj+1   zj+1 
 .   ..
   
wk = r1k v1 + r2k v2 + · · · + rkk vk .  ..   .


It now follows that the W = span{v1 , . . . , vk } and that un zn
{v1 , . . . , vk } is an orthonormal basis for W . We have Then
proved: 2ut z = ut u
Proposition 10.4.4. Suppose that there exist an orthog- and  
z1
onal n×n matrix Q and an upper triangular n×k matrix  ..
R such that the n × k matrix A has a QR decomposition  .


 
 zj−1 
A = QR. (10.4.4)
 
 r
Hz =  

 0
Then the first k columns v1 , . . . , vk of the matrix

 .
 
Q form an orthonormal basis of the subspace W =  ..


span{w1 , . . . , wk }, where the wj are the columns of A. 0
Moreover, rij = vi · wj is the coordinate of wj in the 2
orthonormal basis. holds for the Householder matrix H = In − uut .
ut u


Proof Begin by computing Then the matrix A1 = H1 A can be written as


t 2
uz = uj zj + zj+1 + ··· + zn2 A1 = (r1 |w21 | · · · |wk1 ),
= zj2 − rzj + 2
zj+1 + ··· + zn2
2 where wj1 = H1 wj0 for j = 2, . . . , k.
= −rzj + r .
Second, set z = w21 in Proposition 10.4.5 and construct
Next, compute
the Householder matrix H2 such that
ut u = 2
(zj − r)(zj − r) + zj+1 + · · · + zn2  
r12
= zj2 − 2rzj + r2 + zj+1
2
+ · · · + zn2  r22 
2(−rzj + r2 ).
 
= H2 w2 =  0  ≡ r2 .
1  
 .. 
Hence 2ut z = ut u, as claimed.  . 
0
Note that z − u is the vector on the right hand side of
(10.4.4). So, compute Then the matrix A2 = H2 A1 = H2 H1 A can be written
2ut z as
 
2
Hz = In − t uut z = z − t u = z − u
uu uu A2 = (r1 |r2 |w32 | · · · |wk2 )
to see that (10.4.4) is valid.  where wj2 = H2 wj1 for j = 3, . . . , k. Observe that the
1st column r1 is not affected by the matrix multiplica-
An inspection of the proof of Proposition 10.4.5 shows tion, since H2 leaves the first component of a vector un-
that we could have chosen changed.
uj = zj + r Proceeding inductively, in the ith step, set z = wii−1 and
use Proposition 10.4.5 to construct the Householder ma-
instead of uj = zj − r. Therefore, the choice of H is not trix Hi such that:
unique.
Proposition 10.4.5 allows us to determine inductively a
 
r1i
QR decomposition of the matrix  .. 
 . 
 
 rii 
A = (w10 | · · · |wk0 ), i−1
Hi wi =    ≡ ri
 0 

where each wj0 ∈ Rn . So, A is an n × k matrix and k ≤ n.  . 
 .. 
First, set z = w10 and use Proposition 10.4.5 to construct 0
the Householder matrix H1 such that
  and the matrix Ai = Hi Ai−1 = Hi · · · H1 A can be writ-
r11
 0  ten as
H1 w10 =  .  ≡ r1 .
i
Ai = (r1 | · · · |ri |wi+1 | · · · |wki ),
 .. 
 

0 where wi2 = Hi wji−1 for j = i + 1, . . . , k.


After k steps we arrive at R =


-1.4142 -1.4142 -1.4142
Hk · · · H1 A = R, 0 2.0000 -0.5000
0 0 1.6583
where R = (r1 | · · · |rk ) is an upper triangular n × k ma-
trix. Since the Householder matrices H1 , . . . , Hk are
A comparison with (10.2.7) shows that the columns of
orthogonal, it follows from Lemma 10.1.5(c) that the
the matrix Q are the elements in the orthonormal basis.
Qt = Hk · · · H1 is orthogonal. Thus, A = QR is a QR
The only difference is that the sign of the first vector is
decomposition of A.
opposite. However, this is not surprising since we know
that there is some freedom in the choice of Householder
Orthonormalization with MATLAB Given a set matrices, as remarked after Proposition 10.4.5.
w1 , . . . , wk of linearly independent vectors in Rn the In addition, the command qr produces the matrix R
MATLAB command qr allows us to compute an or- whose entries rij are the coordinates of the vectors wj in
thonormal basis of the spanning set of these vectors. As the new orthonormal basis as in (10.4.3). For instance,
mentioned earlier, the underlying technique MATLAB the second column of R tells us that
uses for the computation of the QR decomposition is
based on Householder transformations. w2 = r12 v1 + r22 v2 + r32 v3 = −1.4142v1 + 2.0000v2 .
The syntax of the QR decomposition in MATLAB is
quite simple. For example, let w1 = (1, 0, −1, 0), w2 = Exercises
(2, −1, 0, 1) and w3 = (0, 0, −2, 1) be the three vectors
in (10.2.6). In Section 5.5 we computed an orthonormal
basis for the subspace of R4 spanned by w1 , w2 , w3 . Here In Exercises 1 – 4, compute the Householder matrix H corre-
we use the MATLAB command qr to find an orthonormal sponding to the given vector u.
basis for this subspace. Let A be the matrix having the  
vectors w1t , w2t and w3t as columns. So, A is: 1. u =
1
.
1
A = [1 2 0; 0 -1 0; -1 0 -2; 0 1 1]
 
0
2. u = .
−2
The command  
−1
3. u =  1 .
[Q R] = qr(A,0) 5

leads to the answer


 
1
 0 
4. u = 
 4 .

Q =
−2
-0.7071 0.5000 -0.4523
0 -0.5000 -0.1508
0.7071 0.5000 -0.4523 5. Find the matrix that reflects the plane across the line
0 0.5000 0.7538 generated by the vector (1, 2).


In Exercises 6 – 9, use the MATLAB command qr to compute


an orthonormal basis for each of the subspaces spanned by the
given set of vectors.

6. (matlab) w1 = (1, −1), w2 = (1, 2).

7. (matlab) w1 = (1, −2, 3), w2 = (0, 1, 1).

8. (matlab) w1 = (1, −2, 3), w2 = (0, 1, 1), w3 =


(2, 2, 0).

9. (matlab) v1 = (1, 0, −2, 0, −1), v2 =


(2, −1, 4, 2, 0), v3 = (0, 3, 5, 1, −1).

10. (matlab) Find the 4 × 4 Householder matrices H1 and


H2 corresponding to the vectors

u1 = (1.04, 2, 0.76, −0.32)


(10.4.5*)
u2 = (1.4, −1.3, 0.6, 1.2).

Compute H = H1 H2 and verify that H is an orthogonal


matrix.
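Formula (10.4.1) translates directly into MATLAB and may be helpful for Exercises 1 – 4 and 10. The helper below is our own sketch (the file name house.m is not part of the text):

function H = house(u)
% HOUSE  Householder matrix (10.4.1) for a nonzero vector u; save as house.m.
u = u(:);                                  % force a column vector
H = eye(length(u)) - (2/(u'*u))*(u*u');    % H = I - (2/u^t u) u u^t

For example, with u = [1; 1], house(u)*u returns -u, house(u)*[1; -1] returns [1; -1] (a vector in the hyperplane orthogonal to u), and house(u)'*house(u) returns the identity, illustrating Lemma 10.4.2.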


11 *Matrix Normal Forms


In this chapter we generalize to n × n matrices the the-
ory of matrix normal forms presented in Chapter 6 for
2 × 2 matrices. In this theory we ask: What is the sim-
plest form that a matrix can have up to similarity? Af-
ter first presenting several preliminary results, the theory
culminates in the Jordan normal form theorem, Theo-
rem 11.3.2.
The first of the matrix normal form results — every ma-
trix with n distinct real eigenvalues can be diagonalized
— is presented in Section 7.3. The basic idea is that
when a matrix has n distinct real eigenvalues, then it
has n linearly independent eigenvectors. In Section 11.1
we discuss matrix normal forms when the matrix has n
distinct eigenvalues some of which are complex. When an
n×n matrix has fewer than n linearly independent eigen-
vectors, it must have multiple eigenvalues and general-
ized eigenvectors. This topic is discussed in Section 11.2.
The Jordan normal form theorem is introduced in Sec-
tion 11.3 and describes similarity of matrices when the
matrix has fewer than n independent eigenvectors. The
proof is given in Appendix 11.5.
We introduced Markov matrices in Section 4.8. One of
the theorems discussed there has a proof that relies on the
Jordan normal form theorem, and we prove this theorem
in Appendix 11.4.


11.1 Simple Complex Eigenvalues rewrite (11.1.1) as


Theorem 7.3.1 states that a matrix A with real unequal    
σ −τ cos θ − sin θ
eigenvalues may be diagonalized. It follows that in an ap- =r = rRθ ,
τ σ sin θ cos θ
propriately chosen basis (the basis of eigenvectors), ma-
trix multiplication by A acts as multiplication by these where Rθ is rotation counterclockwise through angle θ.
real eigenvalues. Moreover, geometrically, multiplication From this discussion we see that geometrically complex
by A stretches or contracts vectors in eigendirections (de- eigenvalues are associated with rotations followed either
pending on whether the eigenvalue is greater than or less by stretching (r > 1) or contracting (r < 1).
than 1 in absolute value).
As an example, consider the matrix
The purpose of this section is to show that a similar kind
of diagonalization is possible when the matrix has dis-
 
2 1
tinct complex eigenvalues. See Theorem 11.1.1 and The- A= . (11.1.2)
−2 0
orem 11.1.2. We show that multiplication by a matrix
with complex eigenvalues corresponds to multiplication The characteristic polynomial of A is pA (λ) = λ2 −2λ+2.
by complex numbers. We also show that multiplication Thus the eigenvalues of A are 1 ± i, and σ = 1 and
by complex eigenvalues correspond geometrically to ro- τ = 1 for this example. An eigenvector associated to the
tations as well as expansions and contractions. eigenvalue 1 − i is v = (1, −1 − i)t = (1, −1)t + i(0, −1)t .
Therefore,
The Geometry of Complex Eigenvalues: Rotations and Di- 
1 −1
 
1 0

latations Real 2 × 2 matrices are the smallest real ma- B = S −1 AS = where S = ,
1 1 −1 −1
trices where complex eigenvalues can possibly occur. In
Chapter 6, Theorem 6.3.4(b) we discussed the classifica- as can be checked by direct calculation. Moreover, we
tion of such matrices up to similarity. Recall that if the can rewrite
eigenvalues of a 2 × 2 matrix A are σ ± iτ , then A is  √ √ 
similar to the matrix 2

2
√  √
  B = 2  √2 √2   = 2R π4 .
σ −τ 2 2
. (11.1.1)
τ σ 2 2

Moreover, the basis in which A has the form (11.1.1) is So, in an appropriately chosen coordinate system, mul-
found as follows. Let v = w1 + iw2 be the eigenvector of tiplication by A rotates vectors counterclockwise
√ by 45◦
A corresponding to the eigenvalue σ − iτ . Then {w1 , w2 } and then expands the result by a factor of 2. See Ex-
is the desired basis. ercise 3.
Geometrically, multiplication of vectors in R2 by (11.1.1)
is the same as a rotation followed by a dilatation. More The Algebra of Complex Eigenvalues: Complex Multiplica-
specifically, let r = σ 2 + τ 2 . So the point (σ, τ ) lies on tion We have shown that the normal form (11.1.1) can
p

the circle of radius r about the origin, and there is an be interpreted geometrically as a rotation followed by a
angle θ such that (σ, τ ) = (r cos θ, r sin θ). Now we can dilatation. There is a second algebraic interpretation of


(11.1.1), and this interpretation is based on multiplica- for any real number θ. It follows that we can write a
tion by complex numbers. complex number λ = σ + iτ in polar form as
Let λ = σ + iτ be a complex number and consider the
matrix associated with complex multiplication, that is, λ = reiθ
the linear mapping
where r2 = λλ = σ 2 + τ 2 , σ = r cos θ, and τ = r sin θ.
z 7→ λz (11.1.3)
Now consider multiplication by λ in polar form. Write
on the complex plane. By identifying real and imaginary z = seiϕ in polar form, and compute
parts, we can rewrite (11.1.3) as a real 2 × 2 matrix in
the following way. Let z = x + iy. Then λz = reiθ seiϕ = rsei(ϕ+θ) .
λz = (σ + iτ )(x + iy) = (σx − τ y) + i(τ x + σy).
It follows from polar form that multiplication of z by λ =
reiθ rotates z through an angle θ and dilates the result
Now identify z with the vector (x, y); that is, the vec-
by the factor r. Thus Euler’s formula directly relates the
tor whose first component is the real part of z and whose
geometry of rotations and dilatations with the algebra of
second component is the imaginary part. Using this iden-
multiplication by a complex number.
tification the complex number λz is identified with the
vector (σx − τ y, τ x + σy). So, in real coordinates and in
matrix form, (11.1.3) becomes
Normal Form Matrices with Distinct Complex Eigen-

x
 
σx − τ y
 
σ −τ

x
 values In the first parts of this section we have dis-
y
7→
τ x + σy
=
τ σ y
. cussed a geometric and an algebraic approach to matrix
multiplication by 2×2 matrices with complex eigenvalues.
That is, the matrix corresponding to multiplication of We now turn our attention to classifying n × n matrices
z = x + iy by the complex number λ = σ + iτ is the that have distinct eigenvalues, whether these eigenvalues
one that multiplies the vector (x, y)t by the normal form are real or complex. We will see that there are two ways
matrix (11.1.1). to frame this classification — one algebraic (using com-
plex numbers) and one geometric (using rotations and
dilatations).
Direct Agreement Between the Two Interpretations of
(11.1.1) We have shown that matrix multiplication by
(11.1.1) may be thought of either algebraically as mul- Algebraic Normal Forms: The Complex Case Let A be an
tiplication by a complex number (an eigenvalue) or geo- n × n matrix with real entries and n distinct eigenvalues
metrically as a rotation followed by a dilatation. We now λ1 , . . . , λn . Let vj be an eigenvector associated with the
show how to go directly from the algebraic interpretation eigenvalue λj . By methods that are entirely analogous
to the geometric interpretation. to those in Section 7.3 we can diagonalize the matrix
A over the complex numbers. The resulting theorem is
Euler’s formula (Chapter 6, (6.2.5)) states that
analogous to Theorem 7.3.1.
eiθ = cos θ + i sin θ More precisely, the n × n matrix A is complex diagonal-


izable if there is a complex n × n matrix T such that Moreover, the columns of T are the complex eigenvectors
  v1 and v2 associated to the eigenvalues λ and λ.
λ1 0 · · · 0
 0 λ2 · · · 0  It can be checked that the eigenvectors of B are v1 =
T −1 AT =  .
. .. . . ..  . (1, −i)t and v2 = (1, i)t . On setting
 
 . . . .   
0 0 · · · λn 1 1
T = ,
−i i
Theorem 11.1.1. Let A be an n × n matrix with n
distinct eigenvalues. Then A is complex diagonalizable. it is a straightforward calculation to verify that C =
T −1 BT .
The proof of Theorem 11.1.1 follows from a theoretical As a second example, consider the matrix
development virtually word for word the same as that
used to prove Theorem 7.3.1 in Section 7.3. Beginning
 
4 2 1
from the theory that we have developed so far, the diffi- A =  2 −3 1 . (11.1.5*)
culty in proving this theorem lies in the need to base the 1 −1 −3
theory of linear algebra on complex scalars rather than Using MATLAB we find the eigenvalues of A by typing
real scalars. We will not pursue that development here. eig(A). They are:
As in Theorem 7.3.1, the proof of Theorem 11.1.1 shows
that the complex matrix T is the matrix whose columns ans =
are the eigenvectors vj of A; that is, 4.6432
-3.3216 + 0.9014i
T = (v1 | · · · |vn ). -3.3216 - 0.9014i

Finally, we mention that the computation of inverse ma- We can diagonalize (over the complex numbers) using
trices with complex entries is the same as that for matri- MATLAB — indeed MATLAB is programmed to do these
ces with real entries. That is, row reduction of the n × 2n calculations over the complex numbers. Type [T,D] =
matrix (T |In ) leads, when T is invertible, to the matrix eig(A) and obtain
(In |T −1 ).
T =
0.9604 -0.1299 + 0.1587i -0.1299 - 0.1587i
Two Examples As a first example, consider the normal 0.2632 0.0147 - 0.5809i 0.0147 + 0.5809i
form 2 × 2 matrix (11.1.1) that has eigenvalues λ and λ, 0.0912 0.7788 - 0.1173i 0.7788 + 0.1173i
where λ = σ + iτ . Let
D =
4.6432 0 0
   
σ −τ λ 0
B= and C = . 0 -3.3216 + 0.9014i 0
τ σ 0 λ
0 0 -3.3216 - 0.9014i
Since the eigenvalues of B and C are identical, Theo-
rem 11.1.1 implies that there is a 2 × 2 complex matrix This calculation can be checked by typing inv(T)*A*T
T such that to see that the diagonal matrix D appears. One can also
C = T −1 BT. (11.1.4) check that the columns of T are eigenvectors of A.


Note that the development here does not depend on the We need two preliminary results.
matrix A having real entries. Indeed, this diagonalization
can be completed using n × n matrices with complex Lemma 11.1.3. Let λ1 , . . . , λq be distinct (possible com-
entries — and MATLAB can handle such calculations. plex) eigenvalues of an n × n matrix A. Let vj be a (pos-
sibly complex) eigenvector associated with the eigenvalue
λj . Then v1 , . . . , vq are linearly independent in the sense
Geometric Normal Forms: Block Diagonalization There that if
is a second normal form theorem based on the geometry α1 v1 + · · · + αq vq = 0 (11.1.8)
of rotations and dilatations for real n × n matrices A.
In this normal form we determine all matrices A that for (possibly complex) scalars αj , then αj = 0 for all j.
have distinct eigenvalues — up to similarity by real n × n
matrices S. The normal form results in matrices that are Proof The proof is identical in spirit with the proof of
block diagonal with either 1 × 1 blocks or 2 × 2 blocks of Lemma 7.3.2. Proceed by induction on q. When q = 1
the form (11.1.1) on the diagonal. the lemma is trivially valid, as αv = 0 for v 6= 0 implies
A real n × n matrix is in real block diagonal form if it is that α = 0, even when α ∈ C and v ∈ Cn .
a block diagonal matrix By induction assume the lemma is valid for q − 1. Now
  apply A to (11.1.8) obtaining
B1 0 · · · 0
 0 B2 · · · 0  α1 λ1 v1 + · · · + αq λq vq = 0.
 .. .. .. .. , (11.1.6)
 
 . . . .
Subtract this identity from λq times (11.1.8), and obtain

0 0 ··· Bm
where each Bj is either a 1 × 1 block α1 (λ1 − λq )v1 + · · · + αq−1 (λq−1 − λq )vq−1 = 0.

Bj = λj By induction
αj (λj − λq ) = 0
for some real number λj or a 2 × 2 block
for j = 1, . . . , q − 1. Since the λj are distinct it follows
that αj = 0 for j = 1, . . . , q − 1. Hence (11.1.8) implies
 
σj −τj
Bj = (11.1.7)
τj σj that αq vq = 0; since vq 6= 0, αq = 0. 
where σj and τj 6= 0 are real numbers. A matrix is real Lemma 11.1.4. Let µ1 , . . . , µk be distinct real eigen-
block diagonalizable if it is similar to a real block diagonal values of an n × n matrix A and let ν1 , ν 1 . . . , ν` , ν ` be
form matrix. distinct complex conjugate eigenvalues of A. Let vj ∈ Rn
Note that the real eigenvalues of a real block diagonal be eigenvectors associated to µj and let wj = wjr + iwji
form matrix are just the real numbers λj that occur in the be eigenvectors associated with the eigenvalues νj . Then
1×1 blocks. The complex eigenvalues are the eigenvalues the k + 2` vectors
of the 2 × 2 blocks Bj and are σj ± iτj .
v1 , . . . , vk , w1r , w1i , . . . , w`r , w`i
Theorem 11.1.2. Every n × n matrix A with n distinct
eigenvalues is real block diagonalizable. in Rn are linearly independent.


Proof Let w = wr + iwi be a vector in Cn and let β r (11.1.11). Since these vectors are linearly independent, S
and β i be real scalars. Then is invertible. We claim that S −1 AS is real block diagonal.
This statement is verified by direct calculation.
β r wr + β i wi = βw + βw, (11.1.9)
First, note that Sej = vj for j = 1, . . . , k and compute
1
where β = (β r − iβ i ). Identity (11.1.9) is verified by (S −1 AS)ej = S −1 Avj = µj S −1 vj = µj ej .
2
direct calculation.
It follows that the first k columns of S −1 AS are zero
Suppose now that except for the diagonal entries, and those diagonal entries
equal µ1 , . . . , µk .
α1 v1 + · · · + αk vk + β1r w1r + β1i w1i + · · · + β`r w`r + β`i w`i = 0
(11.1.10) Second, note that Sek+1 = w1r and Sek+2 = w1i . Write
for real scalars αj , βjr and βji . Using (11.1.9) we can the complex eigenvalues as
rewrite (11.1.10) as
νj = σj + iτj .
α1 v1 + · · · + αk vk + β1 w1 + β 1 w1 + · · · + β` w` + β ` w` = 0,
Since Aw1 = ν 1 w1 , it follows that
1
where βj = (βjr − iβji ). Since the eigenvalues Aw1r + iAw1i = (σ1 − iτ1 )(w1r + iw1i )
2
= (σ1 w1r + τ1 w1i ) + i(−τ1 w1r + σ1 w1i ).
µ1 , . . . , µ k , ν 1 , ν 1 . . . , ν ` , ν `
Equating real and imaginary parts leads to
are all distinct, we may apply Lemma 11.1.3 to conclude
that αj = 0 and βj = 0. It follows that βjr = 0 and Aw1r = σ1 w1r + τ1 w1i
(11.1.12)
βji = 0, as well, thus proving linear independence.  Aw1i = −τ1 w1r + σ1 w1i .

. Using (11.1.12), compute

(S −1 AS)ek+1 = S −1 Aw1r = S −1 (σ1 w1r + τ1 w1i )


Proof of Theorem 11.1.2 Let µj for j = 1, . . . , k be = σ1 ek+1 + τ1 ek+2 .
the real eigenvalues of A and let νj , ν j for j = 1, . . . , ` be
the complex eigenvalues of A. Since the eigenvalues are Similarly,
all distinct, it follows that k + 2` = n.
(S −1 AS)ek+2 = S −1 Aw1i = S −1 (−τ1 w1r + σ1 w1i )
Let vj and wj = wjr +iwji be eigenvectors associated with
the eigenvalues µj and ν j . It follows from Lemma 11.1.4 = −τ1 ek+1 + σ1 ek+2 .
that the n real vectors
Thus, the k th and (k + 1)st columns of S −1 AS have the
v1 , . . . , vk , w1r , w1i , . . . , w`r , w`i (11.1.11) desired diagonal block in the k th and (k + 1)st rows, and
have all other entries equal to zero.
are linearly independent and hence form a basis for Rn .
The same calculation is valid for the complex eigenval-
We now show that A is real block diagonalizable. Let ues ν2 , . . . , ν` . Thus, S −1 AS is real block diagonal, as
S be the n × n matrix whose columns are the vectors in claimed. 


MATLAB Calculations of Real Block Diagonal Form Let To find the matrix S that puts C in real block diagonal
C be the 4 × 4 matrix form, we need to take the real and imaginary parts of
the eigenvectors corresponding to the complex eigenval-
ues and the real eigenvectors corresponding to the real
 
1 0 2 3
C=
 2 1 4 6 
. (11.1.13*) eigenvalues. In this case, type
 −1 −5 1 3 
1 4 7 10 S = [real(T(:,1)) imag(T(:,1)) T(:,3) T(:,4)]

Using MATLAB enter C by typing e13_2_14 and find


to obtain
the eigenvalues of C by typing eig(C) to obtain
S =
ans =
-0.0787 0.0899 0.0464 0.2209
0.5855 + 0.8861i
0.0772 0.2476 0.0362 0.4803
0.5855 - 0.8861i
-0.5558 -0.5945 -0.8421 -0.0066
-0.6399
0.3549 0.3607 0.5361 0.8488
12.4690

We see that C has two real and two complex conjugate Note that the 1st and 2nd columns are the real and
eigenvalues. To find the complex eigenvectors associated imaginary parts of the complex eigenvector. Check that
with these eigenvalues, type inv(S)*C*S is the matrix in complex diagonal form

[T,D] = eig(C) ans =


0.5855 0.8861 0.0000 0.0000
MATLAB responds with -0.8861 0.5855 0.0000 -0.0000
0.0000 0.0000 -0.6399 0.0000
T = -0.0000 -0.0000 -0.0000 12.4690
-0.0787+0.0899i -0.0787-0.0899i 0.0464 0.2209
0.0772+0.2476i 0.0772-0.2476i 0.0362 0.4803 as proved in Theorem 11.1.2.
-0.5558-0.5945i -0.5558+0.5945i -0.8421 -0.0066
0.3549+0.3607i 0.3549-0.3607i 0.5361 0.8488
Exercises
D =
0.586+0.886i 0 0 0
0 0.586-0.886i 0 0 1. Consider the 2 × 2 matrix
0 0 -0.640 0 
3 1

0. 0 0 12.469 A=
−2 1
,

whose eigenvalues are 2 ± i and whose associated eigenvectors


The 4 × 4 matrix T has the eigenvectors of C as columns.
are:
The j th column is the eigenvector associated with the j th 
1−i
 
1+i

diagonal entry in the diagonal matrix D. and
2i −2i


Find a complex 2 × 2 matrix T such that C = T −1 AT is 5. (matlab)


complex diagonal and a real 2 × 2 matrix S so that B =
S −1 AS is in real block diagonal form.
 
1 2 4
A= 2 −4 −5  . (11.1.14*)
1 10 −15
2. Let 
2 5
 6. (matlab)
A= .
−2 0  
−15.1220 12.2195 13.6098 14.9268
Find a complex 2 × 2 matrix T such that T AT is complex
−1  −28.7805 21.8049 25.9024 28.7317 
A= .
diagonal and a real 2 × 2 matrix S so that S −1 AS is in real

60.1951 −44.9512 −53.9756 −60.6829 
block diagonal form. −44.5122 37.1220 43.5610 47.2927
(11.1.15*)
7. (matlab)
3. (matlab) Use map to verify that the normal form ma- 
2.2125 5.1750 8.4250 15.0000 19.2500 0.5125

trices (11.1.1) are just rotations followed by dilatations. In  −1.9500 −3.9000 −6.5000 −7.4000 −12.0000 −2.9500
particular, use map to study the normal form matrix

 2.2250 3.9500 6.0500 0.9000 1.5000 1.0250
 
A= .

   −0.2000 −0.4000 0 0.1000 0 −0.2000 
1 −1  −0.7875 −0.8250 −1.5750 1.0000 2.2500 0.5125 
A= .
1 1 1.7875 2.8250 4.5750 0 4.7500 5.4875
(11.1.16*)
Then compare your results with the similar matrix
 
B=
2 1
. 8. (matlab)
−2 0  
−12 15 0
A= 1 5 2 . (11.1.17*)
4. (matlab) Consider the 2 × 2 matrix −5 1 5
 
−0.8318 −1.9755
A= .
0.9878 1.1437

(a) Use MATLAB to find the complex conjugate eigenvalues


and eigenvectors of A.
(b) Find the real block diagonal normal form of A and de-
scribe geometrically the motion of this normal form on
the plane.
(c) Using map describe geometrically how A maps vectors
in the plane to vectors in the plane.

In Exercises 5 – 8 find a square real matrix S so that S −1 AS


is in real block diagonal form and a complex square matrix T
so that T −1 AT is in complex diagonal form.


11.2 Multiplicity and Generalized Proof In coordinates the equation N v = 0 is:


Eigenvectors 
0 1 0 ··· 0 0

v1
 
v2

The difficulty in generalizing the results in the previous 1 ··· 0 0 


 0 0   v2
   v3 
   
two sections to matrices with multiple eigenvalues stems 0 · · · 0 0   v3
 0 0     v4 
 .. .. .. . . . .   ..  =  ..  = 0.
. .. .. 
   
from the fact that these matrices may not have enough  . . .  .   . 
(linearly independent) eigenvectors. In this section we
    
 0 0 0 · · · 0 1   vn−1   vn 
present the basic examples of matrices with a deficiency 0 0 0 ··· 0 0 vn 0
of eigenvectors, as well as the definitions of algebraic and
geometric multiplicity. These matrices will be the build- Thus v2 = v3 = · · · vn = 0, and the solutions are all
ing blocks of the Jordan normal form theorem — the multiples of e1 . Therefore, the nullity of N is 1. 
theorem that classifies all matrices up to similarity.
Note that we can express matrix multiplication by N as

Deficiency in Eigenvectors for Real Eigenvalues An exam- N e1 = 0


(11.2.2)
ple of deficiency in eigenvectors is given by the following N ej = ej−1 j = 2, . . . , n.
n × n matrix
Note that (11.2.2) implies that N n = 0.
The n × n matrix N motivates the following definitions.
 
λ0 1 0 ··· 0 0
 0 λ0 1 · · · 0 0 

 0 0 λ · · · 0 0 
 Definition 11.2.2. Let λ0 be an eigenvalue of A. The
0
Jn (λ0 ) =  . .. .. .. .. ..  (11.2.1) algebraic multiplicity of λ0 is the number of times that λ0
 ..
 
. . . . .  appears as a root of the characteristic polynomial pA (λ).
The geometric multiplicity of λ0 is the number of linearly
 
 0 0 0 · · · λ0 1 
0 0 0 · · · 0 λ0 independent eigenvectors of A having eigenvalue equal to
λ0 .
where λ0 ∈ R. Note that Jn (λ0 ) has all diagonal entries
equal to λ0 , all superdiagonal entries equal to 1, and all Abstractly, the geometric multiplicity is:
other entries equal to 0. Since Jn (λ0 ) is upper triangular,
all n eigenvalues of Jn (λ0 ) are equal to λ0 . However, nullity(A − λ0 In ).
Jn (λ0 ) has only one linearly independent eigenvector. To
verify this assertion let Our previous calculations show that the matrix Jn (λ0 )
has an eigenvalue λ0 with algebraic multiplicity equal to
N = Jn (λ0 ) − λ0 In . n and geometric multiplicity equal to 1.

Then v is an eigenvector of Jn (λ0 ) if and only if N v = Lemma 11.2.3. The algebraic multiplicity of an eigen-
0. Therefore, Jn (λ0 ) has a unique linearly independent value is greater than or equal to its geometric multiplicity.
eigenvector if
Proof For ease of notation we prove this lemma only
Lemma 11.2.1. nullity(N ) = 1. for real eigenvalues, though the proof for complex eigen-
values is similar. Let A be an n × n matrix and let λ0


be a real eigenvalue of A. Let k be the geometric multi- Lemma 11.2.4. Let λ0 be a complex number. Then the
plicity of λ0 and let v1 , . . . , vk be k linearly independent algebraic multiplicity of the eigenvalue λ0 in the 2n × 2n
eigenvectors of A with eigenvalue λ0 . We can extend matrix Jbn (λ0 ) is n and the geometric multiplicity is 1.
{v1 , . . . , vk } to be a basis V = {v1 , . . . , vn } of Rn . In this
basis, the matrix of A is Proof We begin by showing that the eigenvalues of
  J = Jbn (λ0 ) are λ0 and λ0 , each with algebraic multi-
[A]V =
λ0 Ik (∗)
. plicity n. The characteristic polynomial of J is pJ (λ) =
0 B det(J−λI2n ). From Lemma 7.1.9 of Chapter 7 and induc-
tion, we see that pJ (λ) = pB (λ)n . Since the eigenvalues
The matrices A and [A]V are similar matrices. There- of B are λ0 and λ0 , we have proved that the algebraic
fore, they have the same characteristic polynomials and multiplicity of each of these eigenvalues in J is n.
the same eigenvalues with the same algebraic multiplici-
ties. It follows from Lemma 7.1.9 that the characteristic Next, we compute the eigenvectors of J. Let Jv = λ0 v
polynomial of A is: and let v = (v1 , . . . , vn ) where each vj ∈ C2 . Observe
that (J − λ0 I2n )v = 0 if and only if
pA (λ) = p[A]V (λ) = (λ − λ0 )k pB (λ).
Qv1 + v2 = 0
Hence λ0 appears as a root of pA (λ) at least k times and ..
.
the algebraic multiplicity of λ0 is greater than or equal to
k. The same proof works when λ0 is a complex eigenvalue Qvn−1 + vn = 0
— but all vectors chosen must be complex rather than Qvn = 0,
real. 
where Q = B − λ0 I2 . Using the fact that λ0 = σ + iτ , it
follows that
Deficiency in Eigenvectors with Complex Eigenvalues An 
i 1

example of a real matrix with complex conjugate eigen- Q = B − λ0 I2 = −τ
−1 i
.
values having geometric multiplicity less than algebraic
multiplicity is the 2n × 2n block matrix Hence  
2 2 i 1
Q = 2τ i = −2τ iQ.
−1 i
 
B I2 0 ··· 0 0
0 B I2 ··· 0 0 Thus
 
 
 0 0 B ··· 0 0 
0 = Q2 vn−1 + Qvn = −2τ iQvn−1 ,
Jn (λ0 ) =  .. .. .. .. .. .. (11.2.3)
 
.
b
. . . . .




 from which it follows that Qvn−1 + vn = vn = 0. Sim-
 0 0 0 ··· B I2  ilarly, v2 = · · · = vn−1 = 0. Since there is only one
0 0 0 ··· 0 B nonzero complex vector v1 (up to a complex scalar mul-
tiple) satisfying
where λ0 = σ + iτ and B is the 2 × 2 matrix Qv1 = 0,
it follows that the geometric multiplicity of λ0 in the
 
σ −τ
B= . matrix Jbn (λ0 ) equals 1. 
τ σ


Definition 11.2.5. The real matrices Jn (λ0 ) when λ0 ∈ It follows that


R and Jbn (λ0 ) when λ0 ∈ C are real Jordan blocks. The
matrices Jn (λ0 ) when λ0 ∈ C are (complex) Jordan null space(Ak0 ) = null space(Ak+1
0 ). (11.2.5)
blocks.
Lemma 11.2.7. Let λ0 be a real eigenvalue of the n × n
matrix A and let A0 = A − λ0 In . Let k be the smallest
integer for which (11.2.5) is valid. Then
Generalized Eigenvectors and Generalized Eigenspaces
What happens when n×n matrices have fewer that n lin- null space(Ak0 ) = null space(Ak+j )
early independent eigenvectors? Answer: The matrices 0

gain generalized eigenvectors. for every interger j > 0.


Definition 11.2.6. A vector v ∈ C is a generalized
n

eigenvector for the n × n matrix A with eigenvalue λ if Proof We can prove the lemma by induction on j if
we can show that
(A − λIn )k v = 0 (11.2.4)
for some positive integer k. The smallest integer k for null space(Ak+1
0 ) = null space(Ak+2
0 ).
which (11.2.4) is satisfied is called the index of the gen-
Since null space(Ak+1 ) ⊂ null space(Ak+2 ), we need to
eralized eigenvector v. 0 0
show that
Note: Eigenvectors are generalized eigenvectors with in-
null space(Ak+2 ) ⊂ null space(Ak+1 ).
dex equal to 1. 0 0

Let λ0 be a real number and let N = Jn (λ0 ) − λ0 In . Let w ∈ null space(Ak+2


0 ). It follows that
Recall that (11.2.2) implies that N n = 0. Hence every
vector in Rn is a generalized eigenvector for the matrix Ak+1 Aw = Ak+2 w = 0;
Jn (λ0 ). So Jn (λ0 ) provides a good example of a matrix so Aw ∈ null space(Ak+1 ) = null space(Ak0 ), by (11.2.5).
whose lack of eigenvectors (there is only one independent Therefore,
0

eigenvector) is made up for by generalized eigenvectors Ak+1 w = Ak (Aw) = 0,


(there are n independent generalized eigenvectors).
which verifies that w ∈ null space(Ak+1 ). 
Let λ0 be an eigenvalue of the n × n matrix A and let 0

A0 = A − λ0 In . For simplicity, assume that λ0 is real.


Note that Let Vλ0 be the set of all generalized eigenvectors of A with
eigenvalue λ0 . Let k be the smallest integer satisfying
null space(A0 ) ⊂ null space(A20 ) ⊂ · · · (11.2.5), then Lemma 11.2.7 implies that
⊂ null space(Ak0 ) ⊂ · · ·
Vλ0 = null space(Ak0 ) ⊂ Rn
⊂ Rn .
is a subspace called the generalized eigenspace of A as-
Therefore, the dimensions of the null spaces are bounded
sociated to the eigenvalue λ0 . It will follow from the
above by n and there must be a smallest k such that
Jordan normal form theorem (see Theorem 11.3.2) that
dim null space(Ak0 ) = dim null space(Ak+1
0 ). the dimension of Vλ0 is the algebraic multiplicity of λ0 .


An Example of Generalized Eigenvectors Find the gener- Thus, for this example, all generalized eigenvectors that
alized eigenvectors of the 4 × 4 matrix are not eigenvectors have index 2.
 
−24 −58 −2 −8
A=
 15 35 1 4 . (11.2.6*) Exercises
 3 5 7 4 
3 6 0 6
In Exercises 1 – 4 determine the eigenvalues and their geo-
and their indices. When finding generalized eigenvectors
metric and algebraic multiplicities for the given matrix.
of a matrix A, the first two steps are:  
2 0 0 0
(i) Find the eigenvalues of A. 1. A = 
 0 3 1 0 
 0 0 3 0 .

(ii) Find the eigenvectors of A. 0 0 0 4


 
After entering A into MATLAB by typing e13_3_6, we 2 0 0 0
 0 2 0 0 
type eig(A) and find that all of the eigenvalues of A 2. B =  0 0 3 1 .

equal 6. Without additional information, there could be 0 0 0 3
1,2,3 or 4 linearly independent eigenvectors of A corre-
sponding to the eigenvalue 6. In MATLAB we determine
 
−1 1 0 0
the number of linearly independent eigenvectors by typ- 3. C = 
 0 −1 0 0  .
ing null(A-6*eye(4)) and obtaining 0 −1 0 
 0
0 0 0 1
ans = 
2 −1 0 0

0.8892 0  1 2 0 0 
4. D =  .
-0.4446 0.0000  0 0 2 1 
-0.0262 0.9701 0 0 0 2
-0.1046 -0.2425
In Exercises 5 – 8 find a basis consisting of the eigenvectors
We now know that (numerically) there are two linearly for the given matrix supplemented by generalized eigenvec-
independent eigenvectors. The next step is find the num- tors. Choose the generalized eigenvectors with lowest index
ber of independent generalized eigenvectors of index 2. possible.
To complete this calculation, we find a basis for the null  
1 −1
space of (A − 6I4 )2 by typing null((A-6*eye(4))^2) 5. A =
1 3
.
obtaining  
−2 0 −2
ans = 6. B =  −1 1 −2 .
1 0 0 0 0 1 −1
0 1 0 0 
−6 31 −14

0 0 1 0 7. C =  −1 6 −2 .
0 0 0 1 0 2 1


 
5 1 0
8. D =  −3 1 1 .
−12 −4 0

In Exercises 9 – 10, use MATLAB to find the eigenvalues


and their algebraic and geometric multiplicities for the given
matrix.

9. (matlab)
 
2 3 −21 −3
 2 7 −41 −5 
A=
 . (11.2.7*)
0 1 −5 −1 
0 0 4 4

10. (matlab)
 
179 −230 0 10 −30
 144 −185 0 8 −24 
(11.2.8*)
 
B=  30 −39 −1 3 −9 .

 192 −245 0 9 −30 
40 −51 0 2 −7
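For Exercises 9 and 10 the two multiplicities can be estimated numerically; a small sketch of ours, illustrated on the matrix A of (11.2.6*) and its eigenvalue 6:

lambda = 6;
alg = sum(abs(eig(A) - lambda) < 1e-3)     % algebraic multiplicity (here 4); repeated
                                           %   eigenvalues are computed only approximately,
                                           %   so a loose tolerance is used
geo = size(null(A - lambda*eye(4)), 2)     % geometric multiplicity = nullity(A - 6*I4) (here 2)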


11.3 The Jordan Normal Form Theorem

The question that we discussed in Sections 7.3 and 11.1 is: Up to similarity, what is the simplest form that a matrix can have? We have seen that if A has real distinct eigenvalues, then A is real diagonalizable. That is, A is similar to a diagonal matrix whose diagonal entries are the real eigenvalues of A. Similarly, if A has distinct real and complex eigenvalues, then A is complex diagonalizable; that is, A is similar either to a diagonal matrix whose diagonal entries are the real and complex eigenvalues of A or to a real block diagonal matrix.

In this section we address the question of simplest form when a matrix has multiple eigenvalues. In much of this discussion we assume that A is an n × n matrix with only real eigenvalues. Lemma 7.3.3 shows that if the eigenvectors of A form a basis, then A is diagonalizable. Indeed, for A to be diagonalizable, there must be a basis of eigenvectors of A. It follows that if A is not diagonalizable, then A must have fewer than n linearly independent eigenvectors.

The prototypical examples of matrices having fewer eigenvectors than eigenvalues are the matrices Jn(λ) for λ real (see (11.2.1)) and Ĵn(λ) for λ complex (see (11.2.3)).

Definition 11.3.1. A matrix is in Jordan normal form if it is block diagonal and the matrix in each block on the diagonal is a Jordan block, that is, Jℓ(λ) for some integer ℓ and some real or complex number λ.

A matrix is in real Jordan normal form if it is block diagonal and the matrix in each block on the diagonal is a real Jordan block, that is, either Jℓ(λ) for some integer ℓ and some real number λ or Ĵℓ(λ) for some integer ℓ and some complex number λ.

The main theorem about Jordan normal form is:

Theorem 11.3.2 (Jordan normal form). Let A be an n × n matrix. Then A is similar to a Jordan normal form matrix and to a real Jordan normal form matrix.

This theorem is proved by constructing a basis V for Rn so that the matrix S^{-1}AS is in Jordan normal form, where S is the matrix whose columns consist of vectors in V. The algorithm for finding the basis V is complicated and is found in Appendix 11.5. In this section we construct V only in the special and simpler case where each eigenvalue of A is real and is associated with exactly one Jordan block.

More precisely, let λ1, . . . , λs be the distinct eigenvalues of A and let

Aj = A − λj In.

The eigenvectors corresponding to λj are the vectors in the null space of Aj and the generalized eigenvectors are the vectors in the null space of Aj^k for some k. The dimension of the null space of Aj is precisely the number of Jordan blocks of A associated to the eigenvalue λj. So the assumption that we make here is

nullity(Aj) = 1

for j = 1, . . . , s.

Let kj be the integer whose existence is specified by Lemma 11.2.7. Since, by assumption, there is only one Jordan block associated with the eigenvalue λj, it follows that kj is the algebraic multiplicity of the eigenvalue λj.

To find a basis in which the matrix A is in Jordan normal form, we proceed as follows. First, let wj,kj be a vector in

null space(Aj^{kj}) − null space(Aj^{kj−1}).

Define the vectors wj,i by

wj,kj−1 = Aj wj,kj
⋮
wj,1 = Aj wj,2.


Second, when λj is real, let the kj vectors vj,i = wj,i, and when λj is complex, let the 2kj vectors vj,i be defined by

vj,2i−1 = Re(wj,i)
vj,2i = Im(wj,i).

Let V be the set of vectors vj,i ∈ Rn. We will show in Appendix 11.5 that the set V consists of n vectors and is a basis of Rn. Let S be the matrix whose columns are the vectors in V. Then S^{-1}AS is in Jordan normal form.

The Cayley Hamilton Theorem  As a corollary of the Jordan normal form theorem, we prove the Cayley Hamilton theorem which states that a square matrix satisfies its characteristic polynomial. More precisely:

Theorem 11.3.3 (Cayley Hamilton). Let A be a square matrix and let pA(λ) be its characteristic polynomial. Then

pA(A) = 0.

Proof  Let A be an n × n matrix. The characteristic polynomial of A is

pA(λ) = det(A − λIn).

Suppose that B = P^{-1}AP is a matrix similar to A. Theorem 7.2.8 states that pB = pA. Therefore

pB(B) = pA(P^{-1}AP) = P^{-1} pA(A) P.

So if the Cayley Hamilton theorem holds for a matrix similar to A, then it is valid for the matrix A. Moreover, using the Jordan normal form theorem, we may assume that A is in Jordan normal form.

Suppose that A is block diagonal, that is

A = [A1 0; 0 A2],

where A1 and A2 are square matrices. Then

pA(λ) = pA1(λ) pA2(λ).

This observation follows directly from Lemma 7.1.9. Since

A^k = [A1^k 0; 0 A2^k],

it follows that

pA(A) = [pA(A1) 0; 0 pA(A2)] = [pA1(A1)pA2(A1) 0; 0 pA1(A2)pA2(A2)].

It now follows from this calculation that if the Cayley Hamilton theorem is valid for Jordan blocks, then pA1(A1) = 0 = pA2(A2). So pA(A) = 0 and the Cayley Hamilton theorem is valid for all matrices.

A direct calculation shows that Jordan blocks satisfy the Cayley Hamilton theorem. To begin, suppose that the eigenvalue of the Jordan block is real. Note that the characteristic polynomial of the Jordan block Jn(λ0) in (11.2.1) is (λ − λ0)^n. Indeed, Jn(λ0) − λ0 In is strictly upper triangular and (Jn(λ0) − λ0 In)^n = 0. If λ0 is complex, then either repeat this calculation using the complex Jordan form or show by direct calculation that (A − λ0 In)(A − λ̄0 In) is strictly upper triangular when A = Ĵn(λ0) is the real Jordan form of the Jordan block in (11.2.3).

An Example  Consider the 4 × 4 matrix

A = [−147 −106 −66 −488; 604 432 271 1992; 621 448 279 2063; −169 −122 −76 −562].   (11.3.1*)

Using MATLAB we can compute the characteristic polynomial of A by typing


poly(A)

The output is

ans =
    1.0000   -2.0000  -15.0000   -0.0000    0.0000

Note that since A is a matrix of integers we know that the coefficients of the characteristic polynomial of A must be integers. Thus the characteristic polynomial is exactly:

pA(λ) = λ^4 − 2λ^3 − 15λ^2 = λ^2(λ − 5)(λ + 3).

So λ1 = 0 is an eigenvalue of A with algebraic multiplicity two and λ2 = 5 and λ3 = −3 are simple eigenvalues of multiplicity one.

We can find eigenvectors of A corresponding to the simple eigenvalues by typing

v2 = null(A-5*eye(4));
v3 = null(A+3*eye(4));

At this stage we do not know how many linearly independent eigenvectors have eigenvalue 0. There are either one or two linearly independent eigenvectors and we determine which by typing null(A) and obtaining

ans =
   -0.1818
    0.6365
    0.7273
   -0.1818

So MATLAB tells us that there is just one linearly independent eigenvector having 0 as an eigenvalue. There must be a generalized eigenvector in V0. Indeed, the null space of A^2 is two dimensional and this fact can be checked by typing

null2 = null(A^2)

obtaining

null2 =
   -0.2193   -0.2236
   -0.5149   -0.8216
   -0.8139    0.4935
    0.1561    0.1774

Choose one of these vectors, say the first vector, to be v12 by typing

v12 = null2(:,1);

Since the algebraic multiplicity of the eigenvalue 0 is two, we choose the fourth basis vector to be v11 = A v12. In MATLAB we type

v11 = A*v12

obtaining

v11 =
   -0.1263
    0.4420
    0.5051
   -0.1263

Since v11 is nonzero, we have found a basis for V0. We can now put the matrix A in Jordan normal form by setting

S = [v11 v12 v2 v3];
J = inv(S)*A*S

to obtain


J =
   -0.0000    1.0000    0.0000   -0.0000
    0.0000    0.0000    0.0000   -0.0000
   -0.0000   -0.0000    5.0000    0.0000
    0.0000   -0.0000   -0.0000   -3.0000

We have only discussed a Jordan normal form example when the eigenvalues are real and multiple. The case when the eigenvalues are complex and multiple first occurs when n = 4. A sample complex Jordan block when the matrix has algebraic multiplicity two eigenvalues σ ± iτ of geometric multiplicity one is

[σ −τ 1 0; τ σ 0 1; 0 0 σ −τ; 0 0 τ σ].

Numerical Difficulties  When a matrix has multiple eigenvalues, then numerical difficulties can arise when using the MATLAB command eig(A), as we now explain.

Let p(λ) = λ^2. Solving p(λ) = 0 is very easy — in theory — as λ = 0 is a double root of p. Suppose, however, that we want to solve p(λ) = 0 numerically. Then, numerical errors will lead to solving the equation

λ^2 = ε

where ε is a small number. Note that if ε > 0, the solutions are ±√ε; while if ε < 0, the solutions are ±i√|ε|. Since numerical errors are machine dependent, ε can be of either sign. The numerical process of finding double roots of a characteristic polynomial (that is, double eigenvalues of a matrix) is similar to numerically solving the equation λ^2 = 0, as we shall see.

For example, on a Sun SPARCstation 10 using MATLAB version 4.2c, the eigenvalues of the 4 × 4 matrix A in (11.3.1*) (in format long) obtained using eig(A) are:

ans =
    5.00000000001021
   -0.00000000000007 + 0.00000023858927i
   -0.00000000000007 - 0.00000023858927i
   -3.00000000000993

That is, MATLAB computes two complex conjugate eigenvalues

±0.00000023858927i

which corresponds to an ε of -5.692483975913288e-14. On an IBM compatible 486 computer using MATLAB version 4.2 the same computation yields eigenvalues

ans =
    4.99999999999164
    0.00000057761008
   -0.00000057760735
   -2.99999999999434

That is, on this computer MATLAB computes two real, near zero, eigenvalues

±0.00000057761

which corresponds to an ε of 3.336333121e-13. These errors are within round off error in double precision computation.

A consequence of these kinds of errors, however, is that when a matrix has multiple eigenvalues, we cannot use the command [V,D] = eig(A) with confidence. On the Sun SPARCstation, this command yields a matrix

V =
   -0.1652   0.0000 - 0.1818i   0.0000 + 0.1818i  -0.1642
    0.6726  -0.0001 + 0.6364i  -0.0001 - 0.6364i   0.6704
    0.6962  -0.0001 + 0.7273i  -0.0001 - 0.7273i   0.6978
   -0.1888   0.0000 - 0.1818i   0.0000 + 0.1818i  -0.1915

that suggests that A has two complex eigenvectors corresponding to the ‘complex’ pair of near zero eigenvalues. The IBM compatible yields the matrix


V =
   -0.1652    0.1818   -0.1818   -0.1642
    0.6726   -0.6364    0.6364    0.6704
    0.6962   -0.7273    0.7273    0.6978
   -0.1888    0.1818   -0.1818   -0.1915

indicating that MATLAB has found two real eigenvectors corresponding to the near zero real eigenvalues. Note that the two eigenvectors corresponding to the eigenvalues 5 and −3 are correct on both computers.

Exercises

1. Write two different 4 × 4 Jordan normal form matrices all of whose eigenvalues equal 2 for which the geometric multiplicity is two.

2. How many different 6 × 6 Jordan form matrices have all eigenvalues equal to 3? (We say that two Jordan form matrices are the same if they have the same number and type of Jordan block, though not necessarily in the same order along the diagonal.)

3. A 5 × 5 matrix A has three eigenvalues equal to 4 and two eigenvalues equal to −3. List the possible Jordan normal forms for A (up to similarity). Suppose that you can ask your computer to compute the nullity of precisely two matrices. Can you devise a strategy for determining the Jordan normal form of A? Explain your answer.

4. Find the Jordan Normal Form for each matrix

C1 = [1 4; 1 1]   C2 = [0 −1; 1 −2]   C3 = [−2 1; −10 4]

and for each j determine whether Ẋ = Cj X is a saddle, sink, or source.

5. An 8 × 8 real matrix A has three eigenvalues equal to 2, two eigenvalues equal to 1 + i, and one zero eigenvalue. List the possible Jordan normal forms for A (up to similarity). Suppose that you can ask your computer to compute the nullity of precisely two matrices. Can you devise a strategy for determining the Jordan normal form of A? Explain your answer.

In Exercises 6 – 11 find the Jordan normal forms for the given matrix.

6. A = [2 4; 1 1].

7. B = [9 25; −4 −11].

8. C = [−5 −8 −9; 5 9 9; −1 −2 −1].

9. D = [0 1 0; 0 0 1; 1 1 −1].

10. E = [2 0 −1; 2 1 −1; 1 0 0].

11. F = [3 −1 2; −1 2 −1; −1 1 0].

12. Compute e^{tJ} where J = [2 0 0; 0 −1 1; 0 0 −1].

13. Compute e^{tJ} where J = [2 1 0 0 0; 0 2 0 0 0; 0 0 3 1 0; 0 0 0 3 1; 0 0 0 0 3].

14. An n × n matrix N is nilpotent if N^k = 0 for some positive integer k.

(a) Show that the matrix N defined in (11.2.2) is nilpotent.

(b) Show that all eigenvalues of a nilpotent matrix equal zero.

(c) Show that any matrix similar to a nilpotent matrix is also nilpotent.

(d) Let N be a matrix all of whose eigenvalues are zero. Use the Jordan normal form theorem to show that N is nilpotent.

15. Let A be a 3 × 3 matrix. Use the Cayley-Hamilton theorem to show that A^{-1} is a linear combination of I3, A, A^2. That is, there exist real scalars a, b, c such that

A^{-1} = aI3 + bA + cA^2.

In Exercises 16 – 20, (a) determine the real Jordan normal form for the given matrix A, and (b) find the matrix S so that S^{-1}AS is in real Jordan normal form.

16. (matlab)
A = [−3 −4 −2 0; −9 −39 −16 −7; 18 64 27 10; 15 86 34 18].   (11.3.2*)

17. (matlab)
A = [9 45 18 8; 0 −4 −1 −1; −16 −69 −29 −12; 25 123 49 23].   (11.3.3*)

18. (matlab)
A = [−5 −13 17 42; −10 −57 66 187; −4 −23 26 77; −1 −9 9 32].   (11.3.4*)

19. (matlab)
A = [1 0 −9 18; 12 −7 −26 77; 5 −2 −13 32; 2 −1 −4 11].   (11.3.5*)

20. (matlab)
A = [−1 −1 1 0; −3 1 1 0; −3 2 −1 1; −3 2 0 0].   (11.3.6*)

21. (matlab)
A = [0 0 −1 2 2; 1 −2 0 2 2; 1 −1 −1 2 2; 0 0 0 1 2; 0 0 0 −1 3].   (11.3.7*)
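For Exercise 15, and as a check on the matrices in the MATLAB exercises above, the Cayley Hamilton theorem can be verified numerically with the command polyvalm, which evaluates a polynomial at a matrix argument. The sketch below is ours (not part of the exercise statements) and assumes the matrix of interest has already been entered as A:

p = poly(A);           % coefficients of the characteristic polynomial of A
E = polyvalm(p,A);     % evaluate the characteristic polynomial at the matrix A
norm(E)                % zero up to round off error, by Theorem 11.3.3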


11.4 *Markov Matrix Theory

In this appendix we use the Jordan normal form theorem to study the asymptotic dynamics of transition matrices such as those of Markov chains introduced in Section 4.8. The basic result is the following theorem.

Theorem 11.4.1. Let A be an n × n matrix and assume that all eigenvalues λ of A satisfy |λ| < 1. Then for every vector v0 ∈ Rn

lim_{k→∞} A^k v0 = 0.   (11.4.1)

Proof  Suppose that A and B are similar matrices; that is, B = SAS^{-1} for some invertible matrix S. Then B^k = SA^kS^{-1} and for any vector v0 ∈ Rn (11.4.1) is valid if and only if

lim_{k→∞} B^k v0 = 0.

Thus, when proving this theorem, we may assume that A is in Jordan normal form.

Suppose that A is in block diagonal form; that is, suppose

A = [C 0; 0 D],

where C is an ℓ × ℓ matrix and D is an (n − ℓ) × (n − ℓ) matrix. Then

A^k = [C^k 0; 0 D^k].

So for every vector v0 = (w0, u0) ∈ Rℓ × R^{n−ℓ} (11.4.1) is valid if and only if

lim_{k→∞} C^k w0 = 0   and   lim_{k→∞} D^k u0 = 0.

So, when proving this theorem, we may assume that A is a Jordan block.

Consider the case of a simple Jordan block. Suppose that n = 1 and that A = (λ) where λ is either real or complex. Then

A^k v0 = λ^k v0.

It follows that (11.4.1) is valid precisely when |λ| < 1.

Next, suppose that A is a nontrivial Jordan block. For example, let

A = [λ 1; 0 λ] = λ I2 + N

where N^2 = 0. It follows by induction that

A^k v0 = λ^k v0 + kλ^{k−1} N v0 = λ^k v0 + k λ^k (1/λ) N v0.

Thus (11.4.1) is valid precisely when |λ| < 1. The reason for this convergence is as follows. The first term converges to 0 as before but the second term is the product of three terms k, λ^k, and (1/λ) N v0. The first increases to infinity, the second decreases to zero, and the third is constant independent of k. In fact, geometric decay (λ^k, when |λ| < 1) always beats polynomial growth. Indeed,

lim_{m→∞} m^j λ^m = 0   (11.4.2)

for any integer j. This fact can be proved using l'Hôpital's rule and induction.

So we see that when A has a nontrivial Jordan block, convergence is subtler than when A has only simple Jordan blocks, as initially the vectors A^k v0 grow in magnitude. For example, suppose that λ = 0.75 and v0 = (1, 0)^t. Then A^8 v0 = (0.901, 0.075)^t is the first vector in the sequence A^k v0 whose norm is less than 1; that is, A^8 v0 is the first vector in the sequence closer to the origin than v0.

It is also true that (11.4.1) is valid for any Jordan block A and for all v0 precisely when |λ| < 1. To verify this fact we use the binomial theorem.


We can write a nontrivial Jordan block as λIn + N where N^{k+1} = 0 for some integer k. We just discussed the case k = 1. In this case

(λIn + N)^m = λ^m In + m λ^{m−1} N + (m choose 2) λ^{m−2} N^2 + · · · + (m choose k) λ^{m−k} N^k,

where

(m choose j) = m!/(j!(m − j)!) = m(m − 1) · · · (m − j + 1)/j!.

To verify that

lim_{m→∞} (λIn + N)^m = 0

we need only verify that each term

lim_{m→∞} (m choose j) λ^{m−j} N^j = 0.

Such terms are the product of three terms

m(m − 1) · · · (m − j + 1)/(j! λ^j)   and   λ^m   and   N^j.

The first term has polynomial growth to infinity dominated by m^j, the second term decreases to zero geometrically, and the third term is constant independent of m. The desired convergence to zero follows from (11.4.2).

Definition 11.4.2. The n × n matrix A has a dominant eigenvalue λ0 > 0 if λ0 is a simple eigenvalue and all other eigenvalues λ of A satisfy |λ| < λ0.

Theorem 11.4.3. Let P be a Markov matrix. Then 1 is a dominant eigenvalue of P.

Proof  Recall from Chapter 4, Definition 4.8.1 that a Markov matrix is a square matrix P whose entries are nonnegative, whose rows sum to 1, and for which some power P^k has all positive entries. To prove this theorem we must show that all eigenvalues λ of P satisfy |λ| ≤ 1 and that 1 is a simple eigenvalue of P.

Let λ be an eigenvalue of P and let v = (v1, . . . , vn)^t be an eigenvector corresponding to the eigenvalue λ. We prove that |λ| ≤ 1. Choose j so that |vj| ≥ |vi| for all i. Since P v = λv, we can equate the j th coordinates of both sides of this equality, obtaining

pj1 v1 + · · · + pjn vn = λ vj.

Therefore,

|λ| |vj| = |pj1 v1 + · · · + pjn vn| ≤ pj1 |v1| + · · · + pjn |vn|,

since the pij are nonnegative. It follows that

|λ| |vj| ≤ (pj1 + · · · + pjn) |vj| = |vj|,

since |vi| ≤ |vj| and rows of P sum to 1. Since |vj| > 0, it follows that |λ| ≤ 1.

Next we show that 1 is a simple eigenvalue of P. Recall, or just calculate directly, that the vector (1, . . . , 1)^t is an eigenvector of P with eigenvalue 1. Now let v = (v1, . . . , vn)^t be an eigenvector of P with eigenvalue 1. Let Q = P^k so that all entries of Q are positive. Observe that v is an eigenvector of Q with eigenvalue 1, and hence that all rows of Q also sum to 1.

To show that 1 is a simple eigenvalue of Q, and therefore of P, we must show that all coordinates of v are equal. Using the previous estimates (with λ = 1), we obtain

|vj| = |qj1 v1 + · · · + qjn vn| ≤ qj1 |v1| + · · · + qjn |vn| ≤ |vj|.   (11.4.3)

Hence

|qj1 v1 + · · · + qjn vn| = qj1 |v1| + · · · + qjn |vn|.

This equality is valid only if all of the vi are nonnegative or all are nonpositive.


Without loss of generality, we assume that all vi ≥ 0. It follows from (11.4.3) that

vj = qj1 v1 + · · · + qjn vn.

Since qji > 0, this equality can hold only if all of the vi are equal.

Theorem 11.4.4. (a) Let Q be an n × n matrix with dominant eigenvalue λ > 0 and associated eigenvector v. Let v0 be any vector in Rn. Then

lim_{k→∞} (1/λ^k) Q^k v0 = cv,

for some scalar c.

(b) Let P be a Markov matrix and v0 a nonzero vector in Rn with all entries nonnegative. Then

lim_{k→∞} (P^t)^k v0 = V

where V is the eigenvector of P^t with eigenvalue 1 such that the sum of the entries in V is equal to the sum of the entries in v0.

Proof  (a) After a similarity transformation, if needed, we can assume that Q is in Jordan normal form. More precisely, we can assume that

(1/λ) Q = [1 0; 0 A]

where A is an (n−1) × (n−1) matrix with all eigenvalues µ satisfying |µ| < 1. Suppose v0 = (c0, w0) ∈ R × R^{n−1}. It follows from Theorem 11.4.1 that

lim_{k→∞} (1/λ^k) Q^k v0 = lim_{k→∞} ((1/λ) Q)^k v0 = lim_{k→∞} (c0, A^k w0)^t = c0 e1.

Since e1 is the eigenvector of Q with eigenvalue λ, Part (a) is proved.

(b) Theorem 11.4.3 states that a Markov matrix has a dominant eigenvalue equal to 1. The Jordan normal form theorem implies that the eigenvalues of P^t are equal to the eigenvalues of P with the same algebraic and geometric multiplicities. It follows that 1 is also a dominant eigenvalue of P^t. It follows from Part (a) that

lim_{k→∞} (P^t)^k v0 = cV

for some scalar c. But Theorem 4.8.3 in Chapter 4 implies that the sum of the entries in v0 equals the sum of the entries in cV which, by assumption, equals the sum of the entries in V. Thus, c = 1.

Exercises

1. Let A be an n × n matrix. Suppose that

lim_{k→∞} A^k v0 = 0

for every vector v0 ∈ Rn. Then the eigenvalues λ of A all satisfy |λ| < 1.
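The following short MATLAB experiment illustrates Theorem 11.4.4(b). The 3 × 3 Markov matrix P and the starting vector v0 are made-up examples (they do not appear elsewhere in the text); iterating the transpose of P sends v0 to the eigenvector of P^t with eigenvalue 1 whose entries sum to sum(v0).

P  = [0.5 0.3 0.2; 0.1 0.8 0.1; 0.2 0.3 0.5];   % rows sum to 1, all entries positive
v0 = [2; 1; 1];                                 % nonzero vector with nonnegative entries
v  = v0;
for k = 1:100
    v = P'*v;               % apply the transpose transition matrix repeatedly
end
v                           % the limiting vector V
[sum(v) sum(v0)]            % the two sums agree, so c = 1 as in the proof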


11.5 *Proof of Jordan Normal Form

We prove the Jordan normal form theorem under the assumption that the eigenvalues of A are all real. The proof for matrices having both real and complex eigenvalues proceeds along similar lines.

Let A be an n × n matrix, let λ1, . . . , λs be the distinct eigenvalues of A, and let Aj = A − λj In.

Lemma 11.5.1. The linear mappings Ai and Aj commute.

Proof  Just compute

Ai Aj = (A − λi In)(A − λj In) = A^2 − λi A − λj A + λi λj In,

and

Aj Ai = (A − λj In)(A − λi In) = A^2 − λj A − λi A + λj λi In.

So Ai Aj = Aj Ai, as claimed.

Let Vj be the generalized eigenspace corresponding to eigenvalue λj.

Lemma 11.5.2. Ai : Vj → Vj is invertible when i ≠ j.

Proof  Recall from Lemma 11.2.7 that Vj = null space(Aj^k) for some k ≥ 1. Suppose that v ∈ Vj. We first verify that Ai v is also in Vj. Using Lemma 11.5.1, just compute

Aj^k Ai v = Ai Aj^k v = Ai 0 = 0.

Therefore, Ai v ∈ null space(Aj^k) = Vj.

Let B be the linear mapping Ai|Vj. It follows from Chapter 8, Theorem 8.2.3 that

nullity(B) + dim range(B) = dim(Vj).

Now w ∈ null space(B) if w ∈ Vj and Ai w = 0. Since Ai w = (A − λi In)w = 0, it follows that Aw = λi w. Hence

Aj w = (A − λj In)w = (λi − λj)w

and

Aj^k w = (λi − λj)^k w.

Since λi ≠ λj, it follows that Aj^k w = 0 only when w = 0. Hence the nullity of B is zero. We conclude that

dim range(B) = dim(Vj).

Thus, B is invertible, since the domain and range of B are the same space.

Lemma 11.5.3. Nonzero vectors taken from different generalized eigenspaces Vj are linearly independent. More precisely, if wj ∈ Vj and

w = w1 + · · · + ws = 0,

then wj = 0.

Proof  Let Vj = null space(Aj^{kj}) for some integer kj. Let C = A2^{k2} ◦ · · · ◦ As^{ks}. Then

0 = Cw = Cw1,

since Aj^{kj} wj = 0 for j = 2, . . . , s. But Lemma 11.5.2 implies that C|V1 is invertible. Therefore, w1 = 0. Similarly, all of the remaining wj have to vanish.

Lemma 11.5.4. Every vector in Rn is a linear combination of vectors in the generalized eigenspaces Vj.

Proof  Let W be the subspace of Rn consisting of all vectors of the form z1 + · · · + zs where zj ∈ Vj. We need to verify that W = Rn. Suppose that W is a proper


subspace. Then choose a basis of W and extend this set to a basis W of Rn. In this basis the matrix [A]W has block form, that is,

[A]W = [A11 A12; 0 A22],

where A22 is an (n − t) × (n − t) matrix. The eigenvalues of A22 are eigenvalues of A. Since all of the distinct eigenvalues and eigenvectors of A are accounted for in W (that is, in A11), we have a contradiction. So W = Rn, as claimed.

Lemma 11.5.5. Let Vj be a basis for the generalized eigenspaces Vj and let V be the union of the sets Vj. Then V is a basis for Rn.

Proof  We first show that the vectors in V span Rn. It follows from Lemma 11.5.4 that every vector in Rn is a linear combination of vectors in Vj. But each vector in Vj is a linear combination of vectors in Vj. Hence, the vectors in V span Rn.

Second, we show that the vectors in V are linearly independent. Suppose that a linear combination of vectors in V sums to zero. We can write this sum as

w1 + · · · + ws = 0,

where wj is the linear combination of vectors in Vj. Lemma 11.5.3 implies that each wj = 0. Since Vj is a basis for Vj, it follows that the coefficients in the linear combinations wj must all be zero. Hence, the vectors in V are linearly independent.

Finally, it follows from Theorem 5.5.3 of Chapter 5 that V is a basis.

Lemma 11.5.6. In the basis V of Rn guaranteed by Lemma 11.5.5, the matrix [A]V is block diagonal, that is,

[A]V = [A11 0 0; 0 ⋱ 0; 0 0 Ass],

where all of the eigenvalues of Ajj equal λj.

Proof  It follows from Lemma 11.5.1 that A : Vj → Vj. Suppose that vj ∈ Vj. Then Avj is in Vj and Avj is a linear combination of vectors in Vj. The block diagonalization of [A]V follows. Since Vj = null space(Aj^{kj}), it follows that all eigenvalues of Ajj equal λj.

Lemma 11.5.6 implies that to prove the Jordan normal form theorem, we must find a basis in which the matrix Ajj is in Jordan normal form. So, without loss of generality, we may assume that all eigenvalues of A equal λ0, and then find a basis in which A is in Jordan normal form. Moreover, we can replace A by the matrix A − λ0 In, a matrix all of whose eigenvalues are zero. So, without loss of generality, we assume that A is an n × n matrix all of whose eigenvalues are zero. We now sketch the remainder of the proof of Theorem 11.3.2.

Let k be the smallest integer such that Rn = null space(A^k) and let

s = dim null space(A^k) − dim null space(A^{k−1}) > 0.

Let z1, . . . , zn−s be a basis for null space(A^{k−1}) and extend this set to a basis for null space(A^k) by adjoining the linearly independent vectors w1, . . . , ws. Let

Wk = span{w1, . . . , ws}.

It follows that Wk ∩ null space(A^{k−1}) = {0}.

We claim that the ks vectors W = {wjℓ = A^ℓ(wj)} where 0 ≤ ℓ ≤ k − 1 and 1 ≤ j ≤ s are linearly independent. We can write any linear combination of the vectors in W as yk + · · · + y1, where yj ∈ A^{k−j}(Wk). Suppose that

yk + · · · + y1 = 0.

Then A^{k−1}(yk + · · · + y1) = A^{k−1} yk = 0. Therefore, yk is in Wk and in null space(A^{k−1}). Hence, yk = 0.


Similarly, A^{k−2}(yk−1 + · · · + y1) = A^{k−2} yk−1 = 0. But yk−1 = A ŷk where ŷk ∈ Wk and ŷk ∈ null space(A^{k−1}). Hence, ŷk = 0 and yk−1 = 0. Similarly, all of the yj = 0.

It follows from yj = 0 that a linear combination of the vectors A^{k−j}(w1), . . . , A^{k−j}(ws) is zero; that is

0 = β1 A^{k−j}(w1) + · · · + βs A^{k−j}(ws) = A^{k−j}(β1 w1 + · · · + βs ws).

Applying A^{j−1} to this expression, we see that

β1 w1 + · · · + βs ws

is in Wk and in the null space(A^{k−1}). Hence,

β1 w1 + · · · + βs ws = 0.

Since the wj are linearly independent, each βj = 0, thus verifying the claim.

Next, we find the largest integer m so that

t = dim null space(A^m) − dim null space(A^{m−1}) > 0.

Proceed as above. Choose a basis for null space(A^{m−1}) and extend to a basis for null space(A^m) by adjoining the vectors x1, . . . , xt. Adjoin the mt vectors A^ℓ xj to the set V and verify that these vectors are all linearly independent. And repeat the process. Eventually, we arrive at a basis for Rn = null space(A^k).

In this basis the matrix [A]V is block diagonal; indeed, each of the blocks is a Jordan block, since

A(wjℓ) = { wjℓ−1   if 0 < ℓ ≤ k − 1
         { 0        if ℓ = 1.

Note the resemblance with (11.2.2).
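The construction above can be traced in MATLAB on a small example. The 3 × 3 nilpotent matrix N below is a made-up illustration (all of its eigenvalues are zero); the vector w lies in null space(N^3) but not in null space(N^2), and the chain N^2 w, N w, w is the basis produced by the proof.

N = [0 1 2; 0 0 3; 0 0 0];      % nilpotent: N^3 = 0, while N^2 is nonzero
w = [0; 0; 1];                  % in null space(N^3) = R^3 but not in null space(N^2)
S = [N^2*w, N*w, w];            % chain basis, ordered as in the example of Section 11.3
inv(S)*N*S                      % a single 3 x 3 Jordan block with eigenvalue 0

The result has 1's on the superdiagonal and 0's elsewhere, which is the Jordan block J3(0).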


12 Matlab Commands
† indicates an laode toolbox command not found in MATLAB .

Chapter 1: Preliminaries

Editing and Number Commands

quit Ends MATLAB session


; (a) At end of line the semicolon suppresses echo printing
(b) When entering an array the semicolon indicates a new row
↑ Displays previous MATLAB command
[] Brackets indicating the beginning and the end of a vector or a matrix
x=y Assigns x the value of y
x(j) Recalls j th entry of vector x
A(i,j) Recalls ith row, j th column of matrix A
A(i,:) Recalls ith row of matrix A
A(:,j) Recalls j th column of matrix A

Vector Commands

norm(x) The norm or length of a vector x


dot(x,y) Computes the dot product of vectors x and y
†addvec(x,y) Graphics display of vector addition in the plane
†addvec3(x,y) Graphics display of vector addition in three dimensions

Matrix Commands

A' (Conjugate) transpose of the matrix A


zeros(m,n) Creates an m × n matrix all of whose entries equal 0
zeros(n) Creates an n × n matrix all of whose entries equal 0
diag(x) Creates an n × n diagonal matrix whose diagonal entries
are the components of the vector x ∈ Rn
eye(n) Creates an n × n identity matrix

Special Numbers in MATLAB


pi The number π = 3.1415 . . .


acos(a) The inverse cosine of the number a

Chapter 2: Solving Linear Equations

Editing and Number Commands

format Changes the numbers display format to standard five digit format
format long Changes display format to 15 digits
format rational Changes display format to rational numbers
format short e Changes display to five digit floating point numbers

Vector Commands

x.*y Componentwise multiplication of the vectors x and y


x./y Componentwise division of the vectors x and y
x.^y Componentwise exponentiation of the vectors x and y

Matrix Commands

A([i j],:) = A([j i],:)


Swaps ith and j th rows of matrix A
A\b Solves the system of linear equations associated with
the augmented matrix (A|b)
x = linspace(xmin,xmax,N)
Generates a vector x whose entries are N equally spaced points
from xmin to xmax
x = xmin:xstep:xmax
Generates a vector whose entries are equally spaced points from xmin to xmax
with stepsize xstep
[x,y] = meshgrid(XMIN:XSTEP:XMAX,YMIN:YSTEP:YMAX);
Generates two vectors x and y. The entries of x are values from XMIN to XMAX
in steps of XSTEP. Similarly for y.
rand(m,n) Generates an m × n matrix whose entries are randomly and uniformly chosen


from the interval [0, 1]


rref(A) Returns the reduced row echelon form of the m × n matrix A
rank(A) Returns the rank of the m × n matrix A

Graphics Commands

plot(x,y) Plots a graph connecting the points (x(i), y(i)) in sequence


xlabel('labelx') Prints labelx along the x axis
ylabel('labely') Prints labely along the y axis
surf(x,y,z) Plots a three dimensional graph of z(j) as a function of x(j) and y(j)
hold on Instructs MATLAB to add new graphics to the previous figure
hold off Instructs MATLAB to clear figure when new graphics are generated
grid Toggles grid lines on a figure
axis('equal') Forces MATLAB to use equal x and y dimensions
view([a b c]) Sets viewpoint from which an observer sees the current 3-D plot
zoom Zoom in and out on 2-D plot. On each mouse click, axes change by a factor of 2
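As a quick illustration of how several of these plotting commands combine, the short session below graphs a sample surface; the particular function z = x^2 − y^2 is just an example, not one used elsewhere in the text.

x = linspace(-2,2,41);        % 41 equally spaced points from -2 to 2
[X,Y] = meshgrid(x,x);        % grid of (x,y) values
Z = X.^2 - Y.^2;              % componentwise operations build the surface values
surf(X,Y,Z)                   % three dimensional plot of Z over the grid
xlabel('x'), ylabel('y')      % label the axes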

Special Numbers and Functions in MATLAB

exp(x) The number e^x where e = exp(1) = 2.7182 . . .
sqrt(x) The number √x
i The number √−1

Chapter 3: Matrices and Linearity

Matrix Commands

A*x Performs the matrix vector product of the matrix A with the vector x
A*B Performs the matrix product of the matrices A and B
size(A) Determines the numbers of rows and columns of a matrix A
inv(A) Computes the inverse of a matrix A

Program for Matrix Mappings

†map Allows the graphic exploration of planar matrix mappings


Chapter 4: Solving Ordinary Differential Equations

Special Functions in MATLAB

sin(x) The number sin(x)


cos(x) The number cos(x)

Matrix Commands

eig(A) Computes the eigenvalues of the matrix A


null(A) Computes the solutions to the homogeneous equation Ax = 0

Programs for the Solution of ODEs

†pline Dynamic illustration of phase line plots for single


autonomous differential equations
†PhasePlane Displays phase space and time series plots for systems of autonomous differential equations

Chapter 7: Determinants and Eigenvalues

Matrix Commands

det(A) Computes the determinant of the matrix A


poly(A) Returns the characteristic polynomial of the matrix A
sum(v) Computes the sum of the components of the vector v
trace(A) Computes the trace of the matrix A
[V,D] = eig(A) Computes eigenvectors and eigenvalues of the matrix A
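A brief example of the last command (the matrix here is an arbitrary illustration):

A = [2 1; 1 2];     % any square matrix will do
[V,D] = eig(A)      % columns of V are eigenvectors, D is diagonal with the eigenvalues
A*V - V*D           % zero up to round off error, since A*V = V*D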

Chapter 8: Linear Maps and Changes of Coordinates

Vector Commands

†bcoord Geometric illustration of planar coordinates by vector addition


†ccoord Geometric illustration of coordinates relative to two bases


Chapter 10: Orthogonality

Matrix Commands

orth(A) Computes an orthonormal basis for the column space of the matrix A
[Q,R] = qr(A,0) Computes the QR decomposition of the matrix A

Graphics Commands

axis([xmin,xmax,ymin,ymax])
Forces MATLAB to use the intervals [xmin,xmax] and [ymin,ymax]
for the x- and y-axes in a two-dimensional plot
plot(x,y,'o') Same as plot but now the points (x(i), y(i)) are marked by
circles and no longer connected in sequence

Chapter 11: Matrix Normal Forms

Vector Commands

real(v) Returns the vector of the real parts of the components


of the vector v
imag(v) Returns the vector of the imaginary parts of the components
of the vector v

Index
R3 inv, 81, 206, 224
subspaces, 150 linspace, 23
Rn , 2 map, 56, 269
ej , 62 meshgrid, 25
MATLAB Instructions norm, 12, 241, 250
\, 18, 20, 46, 48, 53 null, 116, 136, 137, 140, 206, 249
’, 7, 224 orth, 250
*, 53, 74 PhasePlane, 97, 156, 170
.^, 27 pi, 45
:, 5 plot, 23
;, 4 poly, 201, 276
[1 2 1], 4 prod, 177
[1; 2; 3], 5 qr, 260
.*, 27 rand, 22, 142
./, 27 rank, 42, 144
A(3,4), 19 real, 268
A([1 3],:), 31 rref, 39, 137
acos, 13 sin, 91
addvec, 11 size, 75
addvec3, 11 sqrt, 45
axis(’equal’), 24 sum, 201
bcoord, 223 surf, 25
ccoord, 230 trace, 201
cos, 91 view, 28
det, 194 xlabel, 23
diag, 7 ylabel, 23
dog, 56 zeros, 7
dot, 12, 13 zoom, 28
eig, 112, 116, 206, 268, 278
exp(1), 45 acceleration, 182
expm, 176 amplitude, 243
eye, 7, 80 angle between vectors, 13
format associative, 73, 128
long, 21, 46 autonomous, 154
rational, 21, 46
grid, 24 back substitution, 29, 34
hold, 24 basis, 143, 149, 150, 214, 224, 229, 235, 263
i, 48 construction, 149
imag, 268 orthonormal, 248, 249, 251, 252, 254, 258, 260
inf, 20 binomial theorem, 281


Cartesian plane, 2 in MATLAB , 194


Cayley Hamilton theorem, 180, 276 inductive formula for, 192, 208
center, 226 of 2 × 2 matrices, 188, 193
change of coordinates, 225 of 3 × 3 matrices, 193
characteristic polynomial, 108, 159, 163, 180, 198, 263, 271, uniqueness, 188, 191
276 diagonalization
of triangular matrices, 198 in MATLAB , 205
roots, 198 differential equation
closed form solution, 165 superposition, 104
closure dilatation, 56, 263, 264
under addition, 127 dimension, 143, 144, 149, 150, 217, 222
under scalar multiplication, 127 finite, 143
cofactor, 192, 199, 208 infinite, 143
collinear, 150 of Rn , 143
column, 2 of null space, 145
rank, 217, 218 direction field, 97
space, 217 discriminant, 109
commutative, 73, 128 distance
complex conjugation, 48 between vectors, 234
complex diagonalizable, 265 Euclidean, 240
complex numbers, 47 to a line, 234
complex valued solution, 158 to a subspace, 235
imaginary part, 158 distributive, 73, 128
real part, 158 dot product, 12, 14, 24, 212, 248, 254
composition, 69 double precision, 278
of linear mappings, 214
compound interest, 92 echelon form, 33, 34, 38, 289
consistent, 29, 40 reduced, 35, 39, 137, 144, 190, 194
contraction, 56, 263 uniqueness, 41
coordinate system, 226 eigendirection, 100
coordinates, 221–223, 229 eigenvalue, 105, 108, 109, 187, 198, 200, 204, 265
in R2 , 223 complex, 110, 157, 158, 166, 198, 263
in Rn , 223 distinct, 264
in MATLAB , 223 multiple, 278
standard, 221 dominant, 282
coupled system, 100 existence, 200
Cramer’s rule, 86, 196 of inverse, 200
of symmetric matrix, 254
data points, 240 real, 105, 157, 198
data value, 240 distinct, 204
degrees, 14 equal, 159
determinant, 84, 163, 188, 190, 192, 200, 208 eigenvector, 105, 108, 109, 158, 204, 272
computation, 190, 193 generalized, 160, 225, 272, 273, 277


linearly independent, 160, 205, 275 for second order equations, 183
real, 105, 160 initial velocity, 183
elementary row operations, 30, 143, 189, 191, 209 integral calculus, 88
in MATLAB , 31 inverse, 77, 78, 84, 128, 192, 265
equilibrium, 96 computation, 79
Euler’s formula, 157, 264 invertible, 77, 79, 84, 107, 163, 175, 192, 200, 214, 215
expansion, 56, 263
exponential Jordan block, 272, 275, 281
decay, 90 Jordan normal form, 275, 277, 284
growth, 90 basis for, 275
external force, 182
law of cosines, 12
first order Law of Pythagorus, 234
reduction to, 183 least squares, 241
fitting of data, 240 approximation, 234
force, 182 distance to a line, 234
frequency distance to a subspace, 235
internal, 184 fit to a quadratic polynomial, 241, 242
function space, 127, 242 fit to a sinusoidal function, 243
subspace of, 130 fitting of data, 240
fundamental theorem of algebra, 199 general fit, 242
length, 11
Gaussian elimination, 29, 32 linear, 29, 60, 212, 229
general solution, 113, 155, 157, 161, 183 combination, 133, 136, 139, 145, 221
generalized eigenspace, 272 fit to data, 240
geometric decay, 281 mapping, 60–62, 212, 222, 230
Goodman, Roy, i, 87, 97 construction, 212
Gram-Schmidt orthonormalization, 251, 252 matrix, 213
growth rate, 90 regression, 241, 242
linearly
Hermitian inner product, 254 dependent, 139, 140, 148
homogeneous, 65, 132, 134, 136, 145, 182 independent, 139, 140, 143, 148–150, 154, 158, 204, 248
Hooke’s law, 182
hyperplane, 257 MATLAB Instructions
\, 288
identity mapping, 58 ↑, 287
inconsistent, 29, 40, 137 ’, 287
index, 272, 273 *, 289
inhomogeneous, 66, 78 .^, 288
initial condition, 104, 154 :, 287
linear independence, 155 ;, 287
initial position, 183 PhasePlane, 290
initial value problem, 89, 103, 104, 113, 114, 154, 157, 175 .*, 288


./, 288 sum, 290


acos, 288 surf, 289
addvec, 287 trace, 290
addvec3, 287 view, 289
axis, 291 xlabel, 289
axis(’equal’), 289 ylabel, 289
bcoord, 290 zeros, 287
ccoord, 290 zoom, 289
cos, 290 Markov chain, 121, 281
det, 290 mass, 182
diag, 287 matrix, 2, 5
dot, 287 addition, 2
eig, 290 associated to a linear map, 222
exp(1), 289 augmented, 30, 33, 41, 135, 137
eye, 287 block diagonal, 7
format, 288 real, 266
e, 288 coefficient, 18, 19
long, 288 diagonal, 7, 204, 265
rational, 288 exponential, 174
grid, 289 computation, 175
hold, 289 in MATLAB , 176
i, 289 Householder, 257, 259
imag, 291 identity, 7, 41, 48, 57
inv, 289 invertible, 213
linspace, 288 lower triangular, 7, 188, 198
map, 289 mappings, 56, 57, 61, 222
meshgrid, 288 Markov, 121–123, 281–283
norm, 287 multiplication, 52, 69, 73, 78, 254, 263
null, 290 in MATLAB , 74
orth, 291 nilpotent, 280
pi, 288 orthogonal, 249, 257
pline, 290 permutation, 72
plot, 289, 291 product, 70
poly, 290 scalar multiplication, 2
qr, 291 skew-symmetric, 9
quit, 287 square, 7, 276
rand, 288 strictly upper triangular, 204
rank, 289 symmetric, 7, 254
real, 291 transition, 118, 121, 123, 206, 229, 281
rref, 289 transpose, 7, 74, 78, 84, 188, 218
sin, 290 upper triangular, 7, 84, 257
size, 289 Vandermonde, 197
sqrt, 289 zero, 7, 57


matrix vector product, 52 population dynamics, 93, 94


in MATLAB , 53 population model, 94
minimization problem, 240 principle of superposition, 65
multiplicity product rule, 89
algebraic, 270, 277
geometric, 270 QR decomposition, 257, 258, 260
using Householder matrices, 258
Newton’s law of cooling, 94
Newton’s second law, 182 radians, 14
nilpotent, 280 range, 217
noninvertible, 77 rank, 41, 134, 144, 145, 151
nonlinear, 36 real block diagonal form
norm, 11 in MATLAB , 268
normal form, 164, 225, 263, 264 real diagonalizable, 204, 205
geometric, 266 reflection, 257
normal vector, 24 rotation, 57, 263, 264
null space, 132, 134, 140, 200, 217, 235, 249, 275 matrix, 169
dimension, 145 round off error, 278
nullity, 145, 270 row, 2
equivalent, 38, 39, 41, 48, 196
orthogonal, 248, 251 rank, 217, 218
orthonormal, 248 reduction, 144, 191, 265
orthonormalization space, 217
with MATLAB , 260
saddle, 98, 168, 226
parabolic fit, 242 scalar, 2
parallelogram, 14, 84 scalar multiplication, 2, 11, 60, 128, 133
parallelogram law, 11 in MATLAB , 4
particle motion, 182 scatter plot, 241, 242
particular solution, 155 shear, 58, 85
perpendicular, 12, 129, 151, 251 similar, 163, 175, 200, 204, 224, 230, 266, 275
phase matrices, 163
portrait, 170 singular, 77, 198, 200
for a saddle, 170 sink, 98, 168
for a sink, 170 sinusoidal functions, 243
for a source, 170 sliding friction, 182
space, 97 source, 98, 168
pivot, 34, 38, 144 span, 132–134, 136, 137, 139, 143, 150, 240, 251
planar mappings, 56 spanning set, 134, 143, 150, 260
plane, 24, 151, 251 spring, 182
Polking, John, i, 87, 97 damped, 184
polynomial, 130 motion of, 182
polynomial growth, 281 undamped, 184


spring equation, 184


stability
asymptotic, 96
stable
manifold, 170
orbit, 170
subspace, 128, 129, 132, 133, 217, 235, 248
of function space, 130
of polynomials, 143
of solutions, 155
proper, 129, 150
substitution, 17
superposition, 65, 104, 129
system of differential equations, 96
constant coefficient, 103
uncoupled, 96

trace, 108, 163, 201


trajectory, 97
trigonometric function, 130

uniqueness of solutions, 78, 175


unstable, 96
manifold, 170
orbit, 170

vector, 2, 127, 136


addition, 2, 11, 60, 128, 133
complex, 254
coordinates, 221
in C 1 , 130
length, 11
norm, 11
space, 127, 133, 149, 212, 229
subtraction, 2

zero mapping, 58
zero vector, 127, 128
