427J LinearAlgebraNotes


1. Solving Systems of Linear Equations


A linear equation is an equation like
5x1 − 3x2 + 8x3 − 4x4 = 9
Each variable is multiplied only by a constant. A system of linear equations is a collection of
one or more linear equations in the same variables. Informally, a solution to a linear system
is a set of choices for the variables which satisfies all equations in the system. Formally, a
solution is an n-tuple of numbers, (c1 , c2 , . . . , cn ) such that all of the equations in the system
hold true if we substitute c1 for x1 , c2 for x2 , etc.
Example 1. Consider the system of linear equations
2x1 − x2 +3x3 = 12
8x1 + 5x2 −x3 = 14
x1 − x2 = −1
6x1 + x2 −x3 = 4
(1, 2, 4) is a solution to this system because if we plug in x1 = 1, x2 = 2, x3 = 4, each of the
four equations is true. Informally, we can just write the solution as
x1 = 1
x2 = 2
x3 = 4
The technique for solving linear systems is based on a very simple idea. Consider the
following system.
Example 2.
x1 + 3x2 − x3 + 6x4 =9
x2 + 8x3 − x4 =6
x3 − 4x4 =2
x4 =5
This system requires no effort at all to solve. From the last equation, we see that x4 = 5.
We plug that value for x4 into the third equation to see that x3 must equal 22. Then we
plug x3 = 22, x4 = 5 into the second equation and see that x2 = 6 + 5 − 8 · 22 = −165.
Finally x1 = 9 − 3(−165) + 22 − 6(5) = 496. So the formal solution is (496, −165, 22, 5).
This technique is called back substitution.
The general technique is to first reduce our system to an easy one by a method called
Gaussian elimination and then to finish the problem with the back substitution method we
saw in Example 2. We illustrate in the next example.
Example 3.
2x1 + 3x2 + 4x3 = 31
4x1 + x2 − 5x3 = −18
x1 − x2 + x3 = 3
The idea is to make Example 3 look like Example 2 by removing x1 from the second
and third equations and removing x2 from the third equation. All we need is a theorem
which tells us that it is safe to change the equations without changing the solution. I should
remark that two systems which have exactly the same solutions are said to be equivalent.

Theorem 1. Suppose we have a system of linear equations, two of which (not necessarily
the first two) are designated as (A) and (B). The set of solutions to the system is not
changed if we do any of the following three things.
(1) Replace (A) by (A′) where (A′) is the equation obtained by adding d · (B) to (A)
where d is any real number.
(2) Replace (A) by (A′) where (A′) is the equation obtained by multiplying (A) by d
where d is any nonzero real number.
(3) Interchange equations (A) and (B).

Proof. What does it mean for the solution set to remain the same? It means that each
solution of the old system is a solution to the new one and that no new solutions are
created. Because each of these operations is reversible, it is actually enough to prove that
no solutions are lost. This is true because if a new solution were created, it would be lost
when we went backwards.
As (3) is obvious - they are the same equations in a different order - we need only
prove (1) and (2). Let (c1 , c2 , . . . , cn ) be a solution to our original system. It obviously
satisfies every equation in the new system except (A′) since they are the same equations
we had before. So we need only check that (c1 , c2 , . . . , cn ) satisfies (A′) to complete the
proof. Consider conclusion (1). Suppose (A) is the equation a1 x1 + a2 x2 + · · · + an xn = f
and (B) is the equation b1 x1 + b2 x2 + · · · + bn xn = g. Then (A′) is the equation (a1 +
db1 )x1 + (a2 + db2 )x2 + · · · + (an + dbn )xn = f + dg. We simply plug our solution into
the left side of the equation and verify that it works. Since a1 c1 + a2 c2 + · · · + an cn = f
and b1 c1 + b2 c2 + · · · + bn cn = g, then (a1 + db1 )c1 + (a2 + db2 )c2 + · · · + (an + dbn )cn =
(a1 c1 + a2 c2 + · · · + an cn ) + d(b1 c1 + b2 c2 + · · · + bn cn ) = f + dg.
The proof of conclusion (2) follows the same pattern, but is a bit less complicated. 

Now we return to Example 3. First we multiply the first equation by 1/2. By
Theorem 1(2), this does not change the solution. We now have

x1 + (3/2)x2 + 2x3 = (31/2)


4x1 + x2 − 5x3 = −18
x1 − x2 + x3 = 3

Next we use Theorem 1(1) to add (−4) times the first equation to the second and (−1)
times the first to the third. This gives

x1 + (3/2)x2 + 2x3 = (31/2)


−5x2 − 13x3 = −80
(−5/2)x2 − x3 = −25/2
Then we divide the second equation by −5 and the system becomes
x1 + (3/2)x2 + 2x3 = (31/2)
x2 + (13/5)x3 = 16
(−5/2)x2 − x3 = −25/2
Next we add (5/2) times the second equation to the third and the third equation becomes
(11/2)x3 = 55/2. Finally we multiply this equation by 2/11 to get the system we want.
x1 + (3/2)x2 + 2x3 = (31/2)
x2 + (13/5)x3 = 16
x3 = 5
Now back substitution gives us in turn x3 = 5, x2 = 3, x1 = 1.
There is a more formal way to finish the problem, one which is really about the same.
Use Theorem 1(1) and the third equation to eliminate x3 from the first two equations.
This gives the same first two equations that you would get if you simply plugged x3 = 5
into the first two equations. The system becomes
x1 + (3/2)x2 = (11/2)
x2 = 3
x3 = 5
With this method, we finish by subtracting 3/2 times the second equation from the first
equation and get
x1 = 1
x2 = 3
x3 = 5
Unfortunately, I seem to have used more than a page to do a very simple problem. What
we need is better notation - notation that allows us to avoid writing so many things over and
over again. We write the system of equations in matrix form and we use a reformulation of
Theorem  ?? to solve the
system
  more  efficiently.
 First the system is the same as the matrix
2 3 4 x1 31
equation 4 1 −5 x2  = −18. We represent this system by the augmented
1 −1 1 x 3
 3
2 3 4 31
matrix 4 1 −5 −18. Our solution process relies on the next theorem.
1 −1 1 3
Theorem 2. Suppose a system of equations is represented by the augmented matrix A. The
set of solutions to the system is not changed if we perform any of the following elementary
row operations on A.
(1) Add d times one row to a different row where d is any real number.
(2) Multiply any row by a nonzero real number.
(3) Interchange two rows.
I won't prove this. It is simply the matrix form of Theorem 1.
Now I will give the matrix form of the computations above. The symbol ∼ will be used
to imply that one matrix can be derived from the previous by elementary row operations.
     
    [ 2  3  4 |  31 ]     [ 1  3/2   2 | 31/2 ]     [ 1   3/2    2 |  31/2 ]
    [ 4  1 -5 | -18 ]  ~  [ 4    1  -5 |  -18 ]  ~  [ 0    -5  -13 |   -80 ]  ~
    [ 1 -1  1 |   3 ]     [ 1   -1   1 |    3 ]     [ 0  -5/2   -1 | -25/2 ]

    [ 1   3/2     2 |  31/2 ]     [ 1  3/2     2 | 31/2 ]     [ 1  3/2     2 | 31/2 ]
    [ 0     1  13/5 |    16 ]  ~  [ 0    1  13/5 |   16 ]  ~  [ 0    1  13/5 |   16 ]  ~
    [ 0  -5/2    -1 | -25/2 ]     [ 0    0  11/2 | 55/2 ]     [ 0    0     1 |    5 ]

    [ 1  3/2  0 | 11/2 ]     [ 1  0  0 | 1 ]
    [ 0    1  0 |    3 ]  ~  [ 0  1  0 | 3 ]
    [ 0    0  1 |    5 ]     [ 0  0  1 | 5 ]

For a quick description of the above, in the first step we divide the first row by 2 to create
a 1 in the upper left corner. This 1 is referred to as a pivot. In the second step, that pivot
was used to clear out the first column using Theorem 2(1). In the third step, a pivot was
created in the (2,2) position. In the fourth step, this pivot was used to clear out the column
beneath it. In the fifth step, a pivot was created for the third row. In the sixth step, the
column above the third pivot was cleared out and in the last step, the final zero was added
in the (1,2) position.
If we translate this matrix back into the system of equations it represents, we get a
system that is already solved.
x1 = 1
x2 = 3
x3 = 5
I have just solved the same system twice and really used the same calculations. However,
the second solution involved much less writing and so we prefer it.
HINT: I recommend doing this problem exactly as I did it. Notice that in the first step, I
changed Row 1, while in the second step I performed two operations at once - using Row
1 to change Rows 2 and 3. I do not like combining the first two operations as changing
the first row and using it in the same step risks computational error. On the other hand,
changing Rows 2 and  3 in the same
 step saves writing and is not risky.
Now the matrix

    [ 1 0 0 | 1 ]
    [ 0 1 0 | 3 ]
    [ 0 0 1 | 5 ]

is in a very useful form and deserves a special name.
Definition. A matrix is said to be in reduced row echelon form (RREF) if
(1) The rows consisting entirely of zeroes (if any) are beneath all nonzero rows.
(2) The leftmost nonzero element of each nonzero row is a 1. Leading 1's are called
pivots.
(3) Each pivot is to the right of any pivot in a row above it.
(4) If a column contains a pivot, all other entries in the column are zero.
Here is a slightly more interesting example of a matrix in reduced row echelon form.

Example 4.

    [ 1  0   3  0   4 ]
    [ 0  1  -4  0   1 ]
    [ 0  0   0  1  -8 ]
    [ 0  0   0  0   0 ]

Notice that the "1" in the (2,5) position is not a pivot and that the fifth column, like the third, can be messy.
For completeness, I should add one additional definition.

Definition. A matrix is in row echelon form if it satisfies properties (1)-(3) in the definition
of RREF.

Note that in the chain of row operations we performed to find a matrix in RREF, the
last three matrices are in row echelon form. There are two ways to solve systems and they
are very similar. Either put the augmented matrix in reduced row echelon form or put it in
row echelon form and finish the problem by back substitution. Either is fine; in this course
we will use RREF.
So far, we have only seen two actual systems of equations and both have had a unique
solution. This is not a general rule. A system of equations can have a unique solution or
infinitely many solutions or no solutions at all. However, we can solve any system with the
same technique - put it in RREF and read off the answer.
    
Example 5. Solve the system of equations

    [ 1  2   3   4 ] [ x1 ]   [ 10 ]
    [ 2  4   6   9 ] [ x2 ] = [ 22 ]
    [ 3  6   8  11 ] [ x3 ]   [ 29 ]
    [ 4  8  11  15 ] [ x4 ]   [ 39 ]

First we form the augmented matrix

    [ 1  2   3   4 | 10 ]
    [ 2  4   6   9 | 22 ]
    [ 3  6   8  11 | 29 ]
    [ 4  8  11  15 | 39 ]

Then we use Type I operations to make the entries below the pivot in the upper left corner equal to zero, obtaining

    [ 1  2   3   4 | 10 ]
    [ 0  0   0   1 |  2 ]
    [ 0  0  -1  -1 | -1 ]
    [ 0  0  -1  -1 | -1 ]

At this point, we are supposed to ignore the first row and create a pivot in the second row. Since the second column is all zeroes (remember we are ignoring the first row), we must move on to the third column. As the (2,3) entry is zero, we employ Row operation 3 and switch the second and third rows. This gives us

    [ 1  2   3   4 | 10 ]
    [ 0  0  -1  -1 | -1 ]
    [ 0  0   0   1 |  2 ]
    [ 0  0  -1  -1 | -1 ]

We multiply the second equation by -1, giving

    [ 1  2   3   4 | 10 ]
    [ 0  0   1   1 |  1 ]
    [ 0  0   0   1 |  2 ]
    [ 0  0  -1  -1 | -1 ]

and then add row 2 to row 4 to obtain

    [ 1  2   3   4 | 10 ]
    [ 0  0   1   1 |  1 ]
    [ 0  0   0   1 |  2 ]
    [ 0  0   0   0 |  0 ]

Finally, we use Row operation 1 three times to create the necessary zeros in columns 4 and 3 and obtain a matrix in reduced row echelon form

    [ 1  2   0   0 |  5 ]
    [ 0  0   1   0 | -1 ]
    [ 0  0   0   1 |  2 ]
    [ 0  0   0   0 |  0 ]

The solution to this system can be read off immediately, IF you know how to do it. In equation form, this system is
x1 + 2x2 = 5
x3 = −1
x4 = 2
0=0
Because there was no pivot in the second column, we declare x2 to be a free variable and
move it to the right hand side of the equation(s). The system becomes
x1 = 5 − 2x2
x3 = −1
x4 = 2
0=0
The number of solutions is infinite. We can choose any value for x2 that we like and then
there is a unique choice for x1 , x3 , x4 . The solution set is {(5 − 2t, t, −1, 2) | t ∈ R}.
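If you would rather let software find the RREF and identify the free column, a quick SymPy sketch (again, just an illustration, not something the notes rely on) reproduces this:

    from sympy import Matrix

    # Augmented matrix for Example 5.
    M = Matrix([[1, 2, 3, 4, 10],
                [2, 4, 6, 9, 22],
                [3, 6, 8, 11, 29],
                [4, 8, 11, 15, 39]])
    R, pivots = M.rref()
    print(R)        # the RREF found above
    print(pivots)   # (0, 2, 3) in 0-based indexing: no pivot in the second column, so x2 is free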
There is no limit to the number of free variables. We get one for each column without
a pivot. It is also possible that we may get no solutions at all. The next example illustrates
this.
    
Example 6. Solve the system of equations

    [ 1  2  3 ] [ x1 ]   [ 2 ]
    [ 4  5  6 ] [ x2 ] = [ 3 ]
    [ 7  8  9 ] [ x3 ]   [ 6 ]

First we form the augmented matrix

    [ 1  2  3 | 2 ]
    [ 4  5  6 | 3 ]
    [ 7  8  9 | 6 ]

We then solve as usual by row operations. I won't show the work here, but the first four steps lead us to the matrix

    [ 1  2  3 |  2  ]
    [ 0  1  2 | 5/3 ]
    [ 0  0  0 |  2  ]

and four more steps lead us to the reduced row echelon form

    [ 1  0 -1 | 0 ]
    [ 0  1  2 | 0 ]
    [ 0  0  0 | 1 ]

In equation form, this system is
x1 − x3 = 0
x2 + 2x3 = 0
0=1
No matter how we choose x1 , x2 , x3 , the equation 0 = 1 will never be satisfied. Hence
this system is inconsistent and has no solutions. What tips us off immediately that this is
the case is the pivot in the last column of the augmented matrix. Moreover, when we get to the matrix

    [ 1  2  3 |  2  ]
    [ 0  1  2 | 5/3 ]
    [ 0  0  0 |  2  ]

we know where all the pivots will be located. We know there
will be a pivot in the final column and we can skip the last few steps. In general, for any
problem, halfway through the solution process, we know where all the pivots are located.
So we know if there is no solution, a unique solution, or infinitely many solutions. In the
case of infinitely many solutions, we also know which variables are the free variables.
I’ll add one example where there is more than one free variable.
 
Example 7. Solve the system of equations

    [ 1  2  0  -4  5  0   0 ]        [ 2 ]
    [ 0  0  1   7  8  0  -5 ]  x  =  [ 3 ]
    [ 0  0  0   0  0  1  -9 ]        [ 5 ]

where x = (x1 , x2 , x3 , x4 , x5 , x6 , x7 ).
Since this matrix is already in reduced row echelon form, we can just read off the solution.
The free variables are x2 , x4 , x5 , x7 . Informally, the solutions are
x1 = 2 − 2r + 4s − 5t
x2 = r
x3 = 3 − 7s − 8t + 5u
x4 = s
x5 = t
x6 = 5 + 9u
x7 = u
where r, s, t, u can take on any value. In set notation, the solution set is
{(2 − 2r + 4s − 5t, r, 3 − 7s − 8t + 5u, s, t, 5 + 9u, u) | r, s, t, u ∈ R}.
Now we know how to solve systems and how to express the solutions. Sometimes though,
we just are interested in a simpler question. How many solutions does a system have? Is
it exactly one, infinitely many, or none at all? Let’s sum up what is totally obvious from
RREF.
Theorem 3. Suppose A x = b is a linear system. Suppose the matrix (A | b) is equivalent
to the RREF (C | d). Then
(1) If there is a pivot in the d column, the system has no solution. Otherwise it does.
(2) Now assume there is a solution. If every column of C contains a pivot, then the
solution is unique. If some column does not have a pivot, there are infinitely many
solutions.
Corollary 4. If a linear system has more unknowns than equations, then a unique solution
is impossible. If the system is consistent, i.e., has a solution, it has infinitely many.
Proof. The number of pivots cannot exceed the number of rows in the matrix, which is of
course the number of equations. Since the number of columns is greater, there must be at
least one column without a pivot. 

2. Homogeneous Systems
When I first learned about simultaneous equations (and I suspect it was the same for
most of you), the most important problems were systems which had a unique solution and
typically consisted of 2 equations in 2 unknowns or 3 equations in 3 unknowns. There could
be more equations and unknowns, but such problems took too long to do by hand. Now we
learned that some systems had infinitely many solutions or none, but those systems seemed
like oddities.
In reality, we were not wrong in thinking that systems with unique solutions are impor-
tant. We note that that is how we find the solution to a second order linear differential
equation satisfying specific initial conditions. However, systems with infinitely many solu-
tions or no solution at all are not just curiosities, but mathematical objects of real signifi-
cance. Even systems where the number of equations does not equal the number of variables
naturally arise.
One type of problem is extremely important in linear algebra and will play a major role
in this course. These are called homogeneous linear systems and are characterized by the
property that the constants on the right hand side of all equations are zero. We will soon
see that homogeneous systems are vital in determining linear independence. Later on, we
will see that they are crucial for finding the general solution to systems of linear differential
equations.
    
Example 8.

    [ 1  4   2  1 ] [ x1 ]   [ 0 ]
    [ 2  7   3  0 ] [ x2 ] = [ 0 ]
    [ 3  1  -5  0 ] [ x3 ]   [ 0 ]
    [ 6  5  -7  1 ] [ x4 ]   [ 0 ]
This is an example of a homogeneous system. If you compare it to a non-homogeneous
system, e.g., Example 5 or Example 6, you might notice something right away. It has a
solution. We can recall that Example 5 had infinitely many solutions while Example 6 did
not have any, but we really didn't know that until we had solved the system. In Example 8,
we can notice that x1 = 0, x2 = 0, x3 = 0, x4 = 0 is a solution without doing any work at
all. This solution is called the trivial solution and works for every homogeneous system.
Hence a homogeneous system has either infinitely many solutions or the trivial solution as
a unique solution.
Now we solve homogeneous equations in much the same way as other systems with a few
adjustments. First of all, there is no point in writing out the augmented matrix. You could
do this if you like, but the last column would be all zeroes and at every step, this column
won’t change. So we just work with the original matrix and imagine the zeroes are there
when we want to read off the answer. Now let’s solve Example 8 by putting the matrix into
RREF and then reading off the solution.
       
    [ 1   4   2  1 ]     [ 1    4    2   1 ]     [ 1    4    2   1 ]     [ 1  4  2   1 ]
    [ 2   7   3  0 ]  ~  [ 0   -1   -1  -2 ]  ~  [ 0    1    1   2 ]  ~  [ 0  1  1   2 ]  ~
    [ 3   1  -5  0 ]     [ 0  -11  -11  -3 ]     [ 0  -11  -11  -3 ]     [ 0  0  0  19 ]
    [ 6   5  -7  1 ]     [ 0  -19  -19  -5 ]     [ 0  -19  -19  -5 ]     [ 0  0  0  33 ]

    [ 1  4  2   1 ]     [ 1  4  2  0 ]     [ 1  0  -2  0 ]
    [ 0  1  1   2 ]  ~  [ 0  1  1  0 ]  ~  [ 0  1   1  0 ]
    [ 0  0  0   1 ]     [ 0  0  0  1 ]     [ 0  0   0  1 ]
    [ 0  0  0  33 ]     [ 0  0  0  0 ]     [ 0  0   0  0 ]
This gives us a single free variable x3 because the first, second, and fourth columns have
pivots but the third does not. As we did before - and remembering our right hand side was
just zeroes - we can replace our original system by x1 = 2x3 , x2 = −x3 , x4 = 0 and see that
the solution set is {(2t, −t, t, 0) | t ∈ R}.
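SymPy will carry out exactly this computation; its nullspace method returns one basic solution per free variable. A minimal sketch, purely illustrative:

    from sympy import Matrix

    A = Matrix([[1, 4, 2, 1],
                [2, 7, 3, 0],
                [3, 1, -5, 0],
                [6, 5, -7, 1]])
    R, pivots = A.rref()
    print(pivots)         # (0, 1, 3) in 0-based indexing: the third column (x3) is free
    print(A.nullspace())  # one basic solution, a multiple of (2, -1, 1, 0)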
Now there is another way in which homogeneous systems differ from non-homogeneous
ones. Notice that if we choose t = 1, we get a solution (2, −1, 1, 0) and that every other
solution is just a multiple of this solution. If you can imagine 4-space, the solution set is
just a straight line through the origin determined by this one other point. [The solution
set for a non-homogeneous system does not contain the origin.] We refer to the solution
(2, −1, 1, 0) as a basic solution.
The general situation is a little more complicated than this, but not much. What happens
when we put the matrix from a homogeneous problem into RREF? There can never be a
pivot in the constants column of course, because the constants column is zeroes and we don’t
even use it. If every other column contains a pivot, then the trivial solution is the unique
solution. If there are columns without pivots, than we have one or more free variables.
Example ?? is an illustration of what happens when there is one free variable. To see
what happens when there are more free variables, I will look at a homogeneous version of
Example ??.
 
Example 9. Solve the system of equations

    [ 1  2  0  -4  5  0   0 ]        [ 0 ]
    [ 0  0  1   7  8  0  -5 ]  x  =  [ 0 ]
    [ 0  0  0   0  0  1  -9 ]        [ 0 ]

where x = (x1 , x2 , x3 , x4 , x5 , x6 , x7 ).
As in Example 7, we can read off the general solution

x1 = −2r + 4s − 5t
x2 =r
x3 = −7s − 8t + 5u
x4 =s
x5 =t
x6 = 9u
x7 =u

However, there is another way to look at this. Suppose we set one of our free variables,
say x2 equal to 1 and set each of the other free variables equal to zero. This means we are
letting r = 1 and s = t = u = 0. We get the basic solution.

x1 = −2
x2 =1
x3 =0
x4 =0
x5 =0
x6 =0
x7 =0

Written in vector form, this is (−2, 1, 0, 0, 0, 0, 0).


Of course, we can do the same thing with x4 , x5 , x7 and we get three more basic solutions,
namely (4, 0, −7, 1, 0, 0, 0), (−5, 0, −8, 0, 1, 0, 0), (0, 0, 5, 0, 0, 9, 1). Now we have a situation
similar to the previous example where every solution was a multiple of the basic solution.
Here every solution is a linear combination of the four basic solutions. Precisely,
(−2r + 4s − 5t, r, −7s − 8t + 5u, s, t, 9u, u) =
r(−2, 1, 0, 0, 0, 0, 0) + s(4, 0, −7, 1, 0, 0, 0) + t(−5, 0, −8, 0, 1, 0, 0) + u(0, 0, 5, 0, 0, 9, 1).
What we will shortly learn is that the solution set to any homogeneous system is a vector
space and that this particular example gives a four dimensional vector space. The four
basic solutions constitute a set of building blocks or basis for this vector space. This will
be extremely important when we consider eigenvectors and generalized eigenvectors.

3. Understanding Matrix Multiplication


A matrix is an array of numbers, but it also has a simpler interpretation as an ordered
set of vectors.
  Consider  thecolumn vectors
   
3 −2 6 1
v1 = −1 , v2 =
   4 , v3 =
  2 , v4 = 1. Notice that the matrix A =
 
0 11 −5 6
 
3 −2 6 1 
−1 4 2 1 is simply the four vectors juxtaposed. We can think A = v1 v2 v3 v4 .
0 11 −5 6
 
Next consider the column vector

    w = [  3 ]
        [  6 ]
        [ -5 ]
        [  2 ]

Since A is a 3 × 4 matrix and w is a 4 × 1 matrix, we can multiply. However, notice that what we get is

    3 [  3 ]  +  6 [ -2 ]  −  5 [  6 ]  +  2 [ 1 ]
      [ -1 ]       [  4 ]       [  2 ]       [ 1 ]
      [  0 ]       [ 11 ]       [ -5 ]       [ 6 ]

So Aw = 3 v1 + 6 v2 − 5 v3 + 2 v4 is actually a linear combination of the column vectors which
comprise A and the weights are just the entries of the vector w.
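A quick numerical check of this column-combination view of Aw (using NumPy purely as an illustration; the notes themselves do not use software):

    import numpy as np

    A = np.array([[3, -2, 6, 1],
                  [-1, 4, 2, 1],
                  [0, 11, -5, 6]])
    w = np.array([3, 6, -5, 2])

    # A @ w and the weighted sum of the columns of A are the same vector.
    print(A @ w)
    print(3*A[:, 0] + 6*A[:, 1] - 5*A[:, 2] + 2*A[:, 3])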
This perspective even makes more general matrix multiplication easier to understand.
Suppose A is an m × n matrix and B is an n × k matrix. Then we can view both A and B
as juxtapositions of column vectors. Suppose A = (v1 · · · vn ) and B = (w1 · · · wk ). Then
AB is also a juxtaposition of column vectors, in fact AB = (Aw1 · · · Awk ) and of course
the description of A helps us compute each Awi .
Now consider the following example.
Example 10. A company manufactures five products. Product A requires 3 units of steel, 1
unit of aluminum, 5 units of plastic, and 1 unit of glass. Product B requires 0 units of steel,
4 units of aluminum, 2 units of plastic, and 2 units of glass. Product C requires 5 units
of steel, 0 units of aluminum, 6 units of plastic, and 0 units of glass. Product D requires
8 units of steel, 0 units of aluminum, 0 units of plastic, and 3 units of glass. Product E
requires 3 units of steel, 9 units of aluminum, 8 units of plastic, and 4 units of glass. If they
wish to manufacture 1000 of Product A, 26 of Product B, 970 of Product C, 100 of Product
D, and 65 of Product E, how many units of each raw material is required?
What we need very simply is a function of five variables. We must input the number
of each product that we wish to manufacture and the function should then spit out four
numbers, one for each of the raw materials. We are looking for a function of the sort
f (a, b, c, d, e) = (r, s, t, u). The way we do this is to create a vector for each product.
Let

    vA = [ 3 ]   vB = [ 0 ]   vC = [ 5 ]   vD = [ 8 ]   vE = [ 3 ]
         [ 1 ]        [ 4 ]        [ 0 ]        [ 0 ]        [ 9 ]
         [ 5 ]        [ 2 ]        [ 6 ]        [ 0 ]        [ 8 ]
         [ 1 ]        [ 2 ]        [ 0 ]        [ 3 ]        [ 4 ]

Now notice that 1000vA + 26vB + 970vC + 100vD + 65vE is a 4-vector which gives the answer to our question.

If we let S = (vA vB vC vD vE ), then the answer is just the product

    S [ 1000 ]
      [   26 ]
      [  970 ]
      [  100 ]
      [   65 ]
So the function we are seeking is simply multiplication by the matrix S. The function
takes a 5-vector w and produces a 4-vector Sw, which is of course just a suitable linear
combination of the five vectors which detail how much of each raw material was needed for
each of the products.
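As a concrete illustration (NumPy again, purely a sketch), the raw-material totals for the production run described above come out of one matrix-vector product:

    import numpy as np

    # Columns are products A..E; rows are steel, aluminum, plastic, glass.
    S = np.array([[3, 0, 5, 8, 3],
                  [1, 4, 0, 0, 9],
                  [5, 2, 6, 0, 8],
                  [1, 2, 0, 3, 4]])
    order = np.array([1000, 26, 970, 100, 65])
    print(S @ order)   # [ 8845  1689 11392  1612]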
This is a special kind of function called a linear transformation, something we will see
more of in Section 3.7 of the text and in Section 7 below. Linear transformations are great
to work with because there are surefire techniques to work with them. It is not like solving
differential equations by integration, where you can always be stumped by an impossible
integral. If we change Example 10 into a company which manufactures 17 products, each
requiring some subset of 46 ingredients, we can easily get a computer to solve the problem.

4. The Basic Vector Spaces


To be overly simplistic, vector spaces are a wonderful concept because they crop up
in many places and they are all the same. This is an exaggeration of course, but only
an exaggeration. We will see in Section 3.3 of the text that to each vector space, we
can associate a number called dimension. The simplest example of a two dimensional
vector space is the set of 2-vectors, which we can think of as points or arrows in the plane.
Another example, which arises in a completely different fashion, is the set of solutions to
the differential equation y'' − 5y' + 6y = 0. That solution set is of course the set of all functions
C1 e2t +C2 e3t . Notice how C1 and C2 play basically the same role as the x and y coordinates
of the points. Understanding vector spaces allows us to take advantage of these similarities
and this will be vital when we get to the more complicated vector spaces which arise later
in Chapter 3.
We saw on page 273 the definition of a vector space. We will work with many examples
of vector spaces and we need to know that the examples are actually vector spaces. The
best technique for proving that something is a vector space is by something I’ll call the
“subspace method”. If V is a subspace of a known vector space (See the next section for
the definition.), then V is itself a vector space. This is really the technique that Braun
uses, but unfortunately he does not actually mention subspaces, so we need to rectify that
omission in these notes. Virtually every vector space we will be concerned with is a subspace
of what I will refer to as a basic vector space. Here I will discuss the basic vector spaces
and show why each is truly a vector space.
We begin with the simplest vector space of all.
Example 11. R, the real numbers, is a vector space.
To understand why this is true, just look at the axioms. All they say is that addition
and multiplication behave nicely. Each of the axioms is a simple fact that all of us have
known for years.
We prove all of our other examples with one theorem. In the hypothesis, a set T and a
vector space V are mentioned. You may consider on first reading the case where both T
and V are the real numbers and so W is just the set of all functions from the real numbers
to the real numbers. Later, when you understand the proof on those terms, you can see
that it works for any T or any V .
Theorem 5. Let T be a set and let V be a vector space. Let W be the set of all functions
from T to V . Then W is a vector space.
Proof. We add functions by the simple rule (f1 + f2 )(t) = f1 (t) + f2 (t) for every t ∈ T and
we scalar multiply by the rule (cf )(t) = cf (t). Now we must verify the axioms. For Axiom
(i) to hold, we must have f + g = g + f for each f, g ∈ W . So when are two functions equal?
- when they take on the same value at every point. Hence we need (f + g)(t) = (g + f )(t)
for every value of t. Of course, by definition this is the same as f (t) + g(t) = g(t) + f (t).
Since f (t), g(t) are elements in the vector space V , this is Axiom (i) for V and so Axiom
(i) holds for W . Likewise, Axiom (ii) reduces to an equation in V which holds because the
vector space V satisfies Axiom (ii). The same proof works for Axioms (v),(vi),(vii),(viii).
For Axiom (iii), we must actually define our zero vector. It will be the zero function,
z(t) = 0V for all t ∈ T where 0V is the zero vector in V . Again, to see that f + z = f for
any function f , we need only show that f (t) + z(t) = f (t) for every value of t. And again
we are just checking an equation in V , in this case f (t) + 0V = f (t), which is indeed true.
For Axiom (iv), we define our minus functions by (−f )(t) = −f (t) and as usual we verify
that f + (−f ) = z by showing that f (t) + (−f )(t) = z(t) for every t ∈ T . One more time,
we have reduced our goal to checking an equation in the vector space V and once again it
holds by the corresponding axiom for V . Here f (t) + (−f (t)) = 0V . 
So far we know only one vector space, the real numbers R. Now we can use this theorem
with several different sets T , starting from R, to create other important examples.
Example 12. The set of all functions from the real numbers to the real numbers form a
vector space. The same is also true if we choose a smaller domain, e.g., functions defined
on a particular interval.
This is the special case of the theorem where T = R and V = R that I suggested you
look at to better understand the proof.
Example 13. Rn , the set of n-vectors, is a vector space.
Proof. An n-vector is really just a function from the set {1, 2, . . . , n} to the real numbers.
For example, we can regard the vector

    [  4 ]
    [  π ]
    [  6 ]
    [ -3 ]

as the function f where f (1) = 4, f (2) = π, f (3) = 6, f (4) = −3.
Example 14. The set of m × n matrices is a vector space.
Proof. A matrix is just a function from {(1, 1), . . . , (1, n), (2, 1), . . . , (2, n), . . . , (m, 1), . . . , (m, n)}
to the real numbers.
And finally one last example which will be very important for us. This differs from the
last three examples in that we will employ Theorem 5 with V equal to the vector space
given by Example 12.
Example 15. The set of n-vectors whose entries are functions from the real numbers (or some
subset of the real numbers such as an interval) to the real numbers is a vector space.
Proof. As in Example 13, we are letting T = {1, 2, . . . , n} in an application of Theorem 5.
Here we are using Example 12 for V .
Vector spaces are abstract mathematical objects. However, every vector space we en-
counter in this course will either be one of these or a subspace of one of these or a complex
number analogue. For example, the solutions to second order linear differential equations
are subspaces of Example 12. The solutions to the equation (d/dt)x = A x will be subspaces
of Example 15.

5. Subspaces of Vector Spaces


In this section, we continue the theme of the last one - showing something is a vector
space without actually checking the axioms. It turns out to be surprisingly easy. Some
vector spaces that we encounter are among the basic vector spaces discussed in Section 4.
However another situation arises commonly. There is a vector space W that attracts our
interest (except we don’t know if it is a vector space or not). The elements of W are a subset
of the elements of a known vector space V . The way you add vectors in W and multiply
scalars times vectors in W is exactly the same way that you would do these things if you
regarded the elements as vectors in V . So W is a subset of V with the same operations. In
this setting, we refer to W as a subspace of V .
Example 16. Let V be the vector space of all functions from the real numbers to the real
numbers and let W be the set of solutions to the differential equation y'' − 5y' + 6y = 0.
Then W is a subspace of V .
Notice that all of the solutions are functions and the way that we add and scalar multiply
solutions is exactly the way we add and scalar multiply all functions.
Why is this a useful concept? Because it is actually easier to prove that W is a subspace
of V than it is to directly prove that W is a vector space. Then, when we have proved that
W is a subspace of V , we know it has to be a vector space because that is what subspaces
are.
Think about the problem of showing that W is a vector space. We must verify the eight
axioms on page 273. Consider Axiom (i). We must verify that if the functions y1 (t) and
y2 (t) both satisfy the differential equation, then y1 (t) + y2 (t) = y2 (t) + y1 (t). We can prove
this, but of course it is true because it is true for any functions y1 (t), y2 (t) whether they
satisfy the differential equation or not. Axiom (i) for W is simply a special case of Axiom
(i) for V . We learned in the previous section (Example ??) that V is a vector space and so
we already know that Axiom (i) holds for V . Why should we need to prove Axiom (i) for
W?
If you look at the list of axioms further, you see that Axiom (i) is not the only axiom
like this. Axioms (ii), (v), (vi), (vii), (viii) work exactly the same way. Only Axioms (iii)
and (iv) are more complicated because there is a danger that when we tossed some of the
elements out of V to get W , we might have discarded 0 or some of the negatives. All in
all, though, if W is a subset of a known vector space V and we want to show that W is a
vector space, the easiest approach is to prove that W is a subspace of V . To do this, we
invoke the following theorem, which I will prove for completeness.
Theorem 6. Let W be a subset of a vector space V which is given the same operations as
V . Then W is a subspace of V if and only if
(1) 0 ∈ W
(2) if x, y ∈ W , then x + y ∈ W (closure under addition)
(3) if x ∈ W and c is a scalar, then cx ∈ W (closure under scalar multiplication)
Proof. The necessity of the three conditions is obvious. If condition (1) fails, axiom (iii)
won’t hold. If condition (2) fails, you actually don’t really have vector addition. You think
you know how to add x and y, but sometimes the sum is something not allowed, so you
really can’t add after all. Likewise, if condition (3) fails, there is no scalar multiplication.
Now we must prove these three conditions are sufficient. We must check the eight axioms.
As I argued above, six of the axioms (all but Axioms (iii) and (iv)) are implied by the
corresponding axioms for V , which we know to be true. For example, if the associative law
holds when adding all vectors in V , it certainly holds when adding some of the vectors in V ,
i.e., those in W . Also, since we assume 0 ∈ W , we also get Axiom (iii). So it only remains
to prove Axiom (iv).
For axiom (iv), we need to know that if x ∈ W , then its negative, − x, is in W . Since W
is closed under scalar multiplication, we have (−1) · x ∈ W . Obviously, for any vector space
we will encounter, (−1) · x is the negative of x and that completes the proof. Technically,
this last fact must be proved for abstract vector spaces, but such a proof, though actually
not difficult, is not in the spirit of this course. 
Now I remark that while Braun never defines subspaces or proves this theorem, this is
essentially how he demonstrates things are vector spaces in Section 3.2 and how you are
expected to do the homework from Section 3.2. Braun actually adds the step of proving
the existence of negatives, but this is unnecessary.
Next we see how to easily get subspaces; we can choose them to satisfy the hypothesis of
Theorem 6.
Theorem 7. Let V be a vector space and let S be a nonempty subset of V . Define the
span of S, which we write span(S), to be the set of all linear combinations of elements in S.
Then span(S) is a subspace of V and it is in fact the smallest subspace of V which contains
S.
Proof. Suppose S equals the finite set {x1 , x2 , . . . , xn }. We shall verify the three conditions
of Theorem 6 for span(S) and thus show span(S) is a subspace. Since 0 = 0·x1 + · · ·+0·xn ∈
span(S), condition (1) holds. For condition (2), suppose v, w ∈ span(S). Then we have
numbers ci , di such that v = c1 x1 + · · · + cn xn and w = d1 x1 + · · · + dn xn . We see that
v + w = (c1 + d1 ) x1 + · · · + (cn + dn ) xn ∈ span(S). Finally, for condition (3), let a be
any number and v = c1 x1 + · · · + cn xn as above. Then a · v = a(c1 x1 + · · · + cn xn ) =
(ac1 )x1 + · · · + (acn )xn ∈ span(S).
If the set S is infinite, the proof is a little more complicated, but essentially the same. I
won’t include it here. 
This theorem is very useful and I will offer two examples. The first shows us how linear
algebra can be used to understand Chapter 2. The second gives a preview of the next
section.
Example 17. Let W be the vector space of Example 16, the set of solutions to the differential
equation y'' − 5y' + 6y = 0. Then the functions e2t and e3t are vectors in W . Theorem 7
tells us that W1 = span({e2t , e3t }) is a subspace of W . In Section 3.3 of the text, we
will learn about dimension and see that because both of these are two dimensional vector
spaces, they are necessarily equal. Thus we have the important fact that every solution to
the differential equation has the form C1 e2t + C2 e3t .
 
Example 18. The set of all vectors of the form

    [    a    ]
    [ 2a + 7b ]
    [ 3a + 9b ]

that is, the set of all linear combinations of the vectors

    [ 1 ]        [ 0 ]
    [ 2 ]  and   [ 7 ]
    [ 3 ]        [ 9 ]

constitute a subspace of R3 .
6. Linear Independence, Span, Basis
This section assumes you have already read Section 3.3. It doesn’t assume that you
understand it.  
Every vector in R3 has the form

    [ c1 ]
    [ c2 ]
    [ c3 ]

You know the vector if you know the entries.
If V is a vector space with basis v1 , v2 , v3 , then every vector in V can be expressed
uniquely in the form c1 v1 +c2 v2 +c3 v3 . The coefficients c1 , c2 , c3 play the same role as the
x, y, z coordinates. A basis lets us coordinatize vectors and lets us pretend (for calculational
purposes) that every finite dimensional vector space is Rn for some n. If you know the basis
and you know the ”coordinates”, you know the vector.
How do you find a basis? How do you recognize a basis when you have one? The answer
to the second question comes from the following theorem, which is really just a summary
of the important results of Section 3.3.
Theorem 8. Suppose V is a vector space and S = {v1 , v2 , . . . , vn } is a set of vectors in
V.
(1) If S is linearly independent and dim V = n, then S is a basis.
(2) If S spans V and dim V = n, then S is a basis.
(3) If S is linearly independent and spans V , then S is a basis and dim V = n.
In this course, our primary use of this theorem will be Part (1). In Chapter 2, we knew
that the vector space of solutions to a second order homogeneous linear differential equation
was two-dimensional and so we just had to find two linearly independent solutions to get our
basis. In Chapter 3, we know that the vector space of solutions to a homogeneous system
of n linear equations is n-dimensional and so the game is to find n linearly independent
solutions. In this world, it is important to be able to determine whether or not a set of
vectors is linearly independent while we never end up trying to prove that a set spans.
So how do we test to see if a set is linearly independent? We need to determine if the
equation c1 v1 + c2 v2 + · · · + ck vk = 0 has just one solution (the trivial solution) or infinitely
many. Well, if v1 , v2 , . . . , vk are vectors in Rn , this equation is a matrix equation, namely

    (v1 v2 · · · vk ) [ c1 ]   [ 0 ]
                      [ c2 ] = [ 0 ]
                      [ .. ]   [ ..]
                      [ ck ]   [ 0 ]

This is just a homogeneous system of n equations in
k unknowns and we solve it by row reduction. If there is a pivot in every column, then
the solution is unique and the set is linearly independent. If some column lacks a pivot,
then there is a free variable, infinitely many solutions, and the set is linearly dependent. Of
course, this is what has to happen if n < k because the number of pivots can’t exceed the
number of rows and so there are not enough pivots to put one in every column. In any case
though, checking vectors in Rn for linear independence is completely straightforward. Set
up the matrix, put in row echelon form, check the pivots, and you have your answer.
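For vectors in Rn this test is a one-liner in a computer algebra system. A small SymPy sketch (illustrative only), using the columns of the matrix from Example 6:

    from sympy import Matrix

    # Three vectors in R^3, placed as the columns of a matrix.
    M = Matrix([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
    R, pivots = M.rref()
    print(pivots)                  # (0, 1): the third column has no pivot
    print(len(pivots) == M.cols)   # False, so the three columns are linearly dependent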
So what happens if the vectors are not in Rn ? Well, if we have another basis, we can
simply coordinatize and use the same process. Otherwise the process is trickier. This is
why the theorem (Theorem 6) we will encounter on Page 293 is so wonderful. It tells us how
to check for linear independence when our vectors are n-vectors of functions which solve a
linear system by checking vectors in Rn .

7. Linear Transformations, Null and Column Spaces


A very important concept in linear algebra is the notion of a function from one vector
space to another. Of course, we want a function that respects the addition and scalar
multiplication. These functions will be called linear transformations. We can actually
define a linear transformation T : V → W where V and W are any vector spaces, but the
most important cases are linear transformations of the form T : Rn → Rm . Braun defines
these more specific linear transformations when m = n on page 324 of the text, but I will
give the more general definition here.

Definition. Let V and W be vector spaces. A function T : V → W is called a linear
transformation provided that, for all v1 , v2 ∈ V and all numbers c,
(1) T (v1 + v2 ) = T (v1 ) + T (v2 )
(2) T (c v1 ) = cT (v1 )

Example 19. Let P be the vector space of all polynomials (a subspace of the vector space
of all functions). Define T : P → P by T (f (t)) = (d/dt) f (t). Notice that Rules (1) and (2) for
linear transformation are just our old friends - the rules that the derivative of a sum is the
sum of the derivatives and that we can factor out constants when we are taking derivatives.

Example 20. Let A be an m × n matrix. The matrix A allows us to define a linear
transformation T : Rn → Rm via the equation T (x) = Ax. In the case m = n, the statement
that (1) and (2) are satisfied is precisely the lemma on page 270 of the text. However,
nowhere in the proof is the value of m actually used. It shows that the ith components
of the corresponding vectors are equal and it makes no difference whatsoever how many
components there are.

Now we come to one of the really beautiful ideas in linear algebra. If V is a finite
dimensional vector space, it has a basis x1 , . . . , xn . Any vector in the vector space can be
uniquely written in the form c1 x1 + · · · + cn xn and so you can visualize it as a vector in Rn ,

    [ c1 ]
    [ c2 ]
    [ .. ]
    [ cn ]

a column vector which, if you know what the basis is, describes the original vector completely.
With this coordinatization, a linear transformation T : V → V can be visualized as a function
taking the column vector with entries c1 , . . . , cn to the column vector with entries d1 , . . . , dn .
Because T is a linear transformation, each di will be a linear function of the ci 's and so

    [ d1 ]        [ c1 ]
    [ d2 ]  =  A  [ c2 ]
    [ .. ]        [ .. ]
    [ dn ]        [ cn ]

for some n × n matrix A.
What we see then is that in the world of finite dimensional vector spaces, Example 20 isn't
just one example; it is the only example.
The next observation is both wonderful and a bit baffling. If you use a different basis for
V , the matrix of the transformation will be different. The confusing part is that knowing
the matrix of a transformation does not tell you the transformation unless you know what
the basis is. Many matrices can refer to the same transformation, while one matrix can
refer to many transformations. The wonderful part is that since we have many matrices
to pick from, we can choose one that is easy to work with. The next example is our first
encounter with diagonalizing matrices.
 
Example 21. Consider the transformation T : R2 → R2 given by

    T (x) = [  42   30 ] x
            [ -60  -43 ]

This is a simple transformation of the plane, but it certainly isn't easy to picture what it
does. However, the plane has another basis,

    [  3 ]      [ -2 ]
    [ -4 ]  ,   [  3 ]

With respect to this basis, the transformation is given by the matrix

    [ 2   0 ]
    [ 0  -3 ]

Now, given any x ∈ R2 , you can actually construct T (x) geometrically without doing any
calculations at all. TRY IT. You will be working with arrows and not algebra, but I'll
describe what is happening algebraically. What is T ((4, −5))? You decompose the vector
into two vectors, one in the (3, −4) direction and one in the (−2, 3) direction. As it happens,

    [  4 ]      [  3 ]     [ -2 ]
    [ -5 ]  = 2 [ -4 ]  +  [  3 ]

Now you stretch the first vector to twice its length and you reverse the second vector and
triple its length. The exact answer is then

    4 [  3 ]  −  3 [ -2 ]   =   [  18 ]
      [ -4 ]       [  3 ]       [ -25 ]

which is hopefully close to what you got graphically.

However, the value of this goes beyond just understanding the transformation better.
What we are working toward is solving linear systems of differential equations. For example,
we would like to find the general solution to the differential equation

    (d/dt) [ x1 ]   [  42   30 ] [ x1 ]
           [ x2 ] = [ -60  -43 ] [ x2 ]

In fact, the general solution to this system is

    C1 e2t [  3 ]  +  C2 e−3t [ -2 ]
           [ -4 ]             [  3 ]

So you can see
that Example 21 holds the key to solving linear systems. Now finding that special basis
was not guesswork. In Section 3.8, we will learn how to find that alternate basis.
There are three important vector spaces which are connected to an m×n matrix A. First
let T denote the linear transformation from the previous example. Recalling how matrix
multiplication works from Section 3, we can easily see exactly which vectors in Rm can be
written as Ax. If b = Ax, then b is necessarily a linear combination of the columns of A
and it is just as clear that every linear combination of the columns of A is equal to A x
for some choice of x. The image or range of the function T is thus the set of all linear
combinations of the columns of A. This motivates the following definition.
Definition. The column space of an m×n matrix A is the subspace of Rm which is spanned
by the columns of A, or equivalently, the set of all linear combinations of the columns of A.
Of course, this means that A x = b has a solution if and only if b is in the column space of
A. Using function terminology, the column space is the range of the transformation defined
by A.
There is also a row space, which is also the column space of the transpose of A. Finally,
there is the vector space of vectors which solve T (x) = 0 or A x = 0.
Definition. The nullspace of A is the set of all solutions to the homogeneous equation
A x = 0.
In understanding transformations, which we really won’t pursue in this course, the
nullspace is very important. For our purposes however, it is important that we be able
to find the nullspace and to find a basis for the nullspace. In the next section, we will be
finding eigenvectors. For each eigenvalue λ, we will have to find its eigenspace, which is just
the nullspace for the matrix A − λI. Then we will need to find a basis for that eigenspace.
The technique for solving is exactly the one we encountered in Section 2. We find a
basis by identifying the free variables - which correspond to the columns without pivots.
Then, we choose one vector for each free variable. To find v, the vector corresponding to
the free variable xi , we let xi = 1 and all of the other free variables equal to zero. We then
solve for the dependent variables.
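As a preview of that eigenvector computation, here is a small SymPy sketch (illustrative only, not part of the notes' toolkit) applied to the matrix of Example 21: the eigenspace for each eigenvalue is found as the nullspace of A − λI.

    from sympy import Matrix, eye, symbols, solve

    lam = symbols('lam')
    A = Matrix([[42, 30], [-60, -43]])

    # Eigenvalues are the roots of det(A - lam*I) = 0.
    print(solve((A - lam*eye(2)).det(), lam))   # the eigenvalues 2 and -3

    # The eigenspace for lam = 2 is the nullspace of A - 2I, and similarly for lam = -3.
    print((A - 2*eye(2)).nullspace())   # a multiple of (3, -4)
    print((A + 3*eye(2)).nullspace())   # a multiple of (-2, 3)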

8. Determinants and Inverses


In this section, we only deal with square matrices. Neither the concept of determinant
nor the concept of inverse makes sense for matrices that are not square.
I don’t like determinants because, unless a matrix is small, its determinant is very hard
to compute. Almost any problem in linear algebra can be done using determinants and
invariably, the determinantal method is clumsy and inefficient. They provide a poor way
to solve equations (Cramer’s Rule), a poor way to compute inverses, a poor way to test
for linear independence, and so on. Still, alas, they have two real uses and so I must teach
them.
The determinant of a square matrix A can be defined in a number of ways, both formally
and informally. I start with an informal one that is slightly ambiguous.
(1) In R1 , a point is just a number a and we can draw a line from the origin 0
to a. Suppose we have two points in R2 , say (2, 3) and (5, 1). If we throw in
the origin, we can form a parallelogram which has vertices, traveling clockwise,
(0, 0), (2, 3), (7, 4), (5, 1). Suppose we have three points in R3 , say, (4, 0, 1), (2, 6, −7), (8, 3, 5).
Then the three vectors from the origin to the points determine a parallelepiped.
Suppose we have four points in R4 , say (1, 2, 3, 4), (0, 0, −6, 5), (2, 4, 3, 7), (1, 9, 8, 3).
Then the four vectors from the origin to these points determine a four dimensional
parallelotope. And so on.
The line in R1 has a length, the parallelogram has an area, the parallelepiped has
a volume, the parallelotope has a four-dimensional ”volume”. Each of these things
can be computed using a mathematical formula and the coordinates of the points.
Since the length of the line is |a|, we see that our formula might require the use of
an absolute value sign at the last step and in fact it does in every dimension. We
want to use the formula before the absolute value is applied.
We can create square matrices by letting the columns of each matrix be the vectors
mentioned, which I will do. The determinant  is defined  so that the determinants of
  1 0 2 1
  4 2 8
 2 5 2 0 4 9 
the matrices a , , 0 6 3,  3 −6 3 8 are equal to ± the length

3 1
1 −7 5
4 5 7 3
of the line, the area of the parallelogram, the volume of the parallelepiped, the (4-
dimensional) volume of the parallelotope respectively. The single most interesting
thing about the determinant however is what it means when the determinant is
zero. If a 3 × 3 determinant is zero for example, it says that the volume of the
parallelepiped is zero. That is precisely the same thing as saying that the three
vectors all lie in the same plane and so don’t create a three dimensional object as
any truly three dimensional object has volume. So the determinant is zero exactly
when the vectors are linearly dependent.
(2) Alternately, we can actually write down the formula which gives the exact value
of a determinant. I will tell you what it is for completeness, but frankly it is
neither intuitive nor particularly useful to know. Of course, that means of no use
whatsoever in this course. Before writing the definition down, I should define a
permutation of the numbers {1, 2, . . . , n}. It is a rearrangement of them, the same
numbers in a possibly different order. Now any rearrangement can be accomplished
by swapping two numbers at a time. So a permutation is accomplished by a certain
number of transpositions. There are many ways to create the same permutation
by taking transpositions. For example, I can create (4, 3, 2, 1) via the sequence
(1, 2, 3, 4) → (4, 2, 3, 1) → (4, 3, 2, 1) or I could do it via (1, 2, 3, 4) → (2, 1, 3, 4) →
(2, 3, 1, 4) → (2, 3, 4, 1) → (3, 2, 4, 1) → (3, 4, 2, 1) → (4, 3, 2, 1). The first took two
steps; the second took six. What is not a coincidence is that both took an even
number of steps. We refer to (4, 3, 2, 1) as an even permutation and permutations
are either even or odd; this is a mathematical theorem which I won’t prove. If σ is
an odd permutation, define sign(σ) = −1 and if σ is an even permutation, define
sign(σ) = +1.
Definition. If A is an n × n matrix, det A = Σσ sign(σ) a1,σ(1) a2,σ(2) · · · an,σ(n) . The
sum is taken over all possible permutations σ, of which there are n!.

The way to compute determinants however is not by using the definition. Before
discussing this, I do want to discuss the two actual uses of determinants. The first
application is to vector calculus. It does not pertain to this course, but is very
important. The transformation given by a 3 × 3 matrix A maps the unit cube to
the parallelepiped determined by the columns. So it maps something of volume 1
to something of volume | det A|. In fact, the transformation multiplies all volumes
by the same factor and so the determinant is a key correcting factor when changing
variables in triple integrals. Moreover, there is nothing special about dimension
three.
The second important application - which is critical in this course - is related to
the fact that the RREF of A will have a column without a pivot precisely when
det A = 0. When we are simply working with A, this is clumsy. However, the
question that will concern us is for which values of λ will A − λI have a column
without a pivot. These values are found by solving the equation det(A − λI) = 0.
Now we move on to computing the value of the determinant. For small values
of n, the determinant is easy to calculate. If n = 1, det A = a11 . If n = 2,
det A = a11 a22 − a12 a21 . If n = 3, det A = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 −
a13 a22 a31 − a11 a23 a32 − a12 a21 a33 . However, the determinant of a 4 × 4 matrix has
24 terms and should be done some other way.
The standard way to compute a 3 × 3 determinant is by a technique referred to as
basket-weaving. Set up a five-column matrix by repeating the first two columns:

    [ a11  a12  a13  a11  a12 ]
    [ a21  a22  a23  a21  a22 ]
    [ a31  a32  a33  a31  a32 ]

Then multiply along the diagonals, adding the terms
that go down to the right and subtracting the terms that go down to the left.
 
Example 22. Find the determinant of the matrix

    [ 4   2  8 ]
    [ 0   6  3 ]
    [ 1  -7  5 ]

We set up the 3 × 5 matrix

    [ 4   2  8  4   2 ]
    [ 0   6  3  0   6 ]
    [ 1  -7  5  1  -7 ]

and see that the determinant
equals (4)(6)(5) + (2)(3)(1) + (8)(0)(−7) − (8)(6)(1) − (4)(3)(−7) − (2)(0)(5) = 162.
The determinant of a 2×2 matrix is found by an even easier form of basket-weaving,
but the technique fails for 4 × 4. Notice that you only get eight terms and there are
24 of them.
Larger determinants are computed in one of two ways. There is a very cumber-
some technique of expanding about a row or column. Unless the matrix has lots
of 0’s, this requires many calculations. In general, a better technique more closely
resembles putting the matrix in RREF. As computing large determinants will not
be stressed in this course, these methods will be relegated to the Appendix at the
end of these notes.
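A one-line software check of the basket-weaving computation in Example 22 (SymPy again, purely as an illustration):

    from sympy import Matrix

    B = Matrix([[4, 2, 8],
                [0, 6, 3],
                [1, -7, 5]])
    print(B.det())   # 162, agreeing with Example 22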

Definition. If A is an n × n matrix, the inverse of A, denoted A−1 , is the unique matrix
which has the property that A−1 A = AA−1 = In , the n × n identity matrix.
Not every square matrix has an inverse. In fact, a matrix will have an inverse precisely
when its RREF is the identity matrix, or equivalently, when its determinant is nonzero.
There is a technique using determinants to find the inverse, but it is clumsy except for 2 × 2
matrices.
   
Theorem 9. If

    A = [ a  b ]
        [ c  d ]

then

    A−1 = (1/ det(A)) [  d  -b ]
                      [ -c   a ]
The easiest way to find A−1 for larger matrices is by solving equations. Recalling what
we did in Section 3, suppose A is a 3 × 3 matrix and suppose A−1 = (v1 v2 v3 ) where
the columns vi are 3-vectors which of course we don't know yet. To find the columns, we
want to solve the equation

    A (v1 v2 v3 ) = [ 1 0 0 ]
                    [ 0 1 0 ]
                    [ 0 0 1 ]

The first column can be found by solving

    A v1 = [ 1 ]
           [ 0 ]
           [ 0 ]

by Gaussian elimination. The second column can be found by solving

    A v2 = [ 0 ]
           [ 1 ]
           [ 0 ]

The third column can be found by solving

    A v3 = [ 0 ]
           [ 0 ]
           [ 1 ]

It turns
out that it is much easier to solve three problems with the same coefficient matrix than it is
to solve three completely different problems. The reason is that the row operations that we
use to put the augmented matrix in RREF will be the same in each case and so we can be
more efficient. We begin with an augmented matrix that has three columns to the right of
the augmentation bar and when we perform row operations to put A into RREF, all three
answers will appear at once. This is actually a useful trick if you are asked to solve several
problems A x = b1 , A x = b2 , A x = b3 , A x = b4 , but in this course, we will only use the
trick to find inverses.
 
Example 23. Find the inverse of

    [  1  -4   1 ]
    [  1   1  -2 ]
    [ -1   1   1 ]

We set up a large augmented matrix and we create the identity matrix on the left side by performing row operations.

    [  1  -4   1 | 1 0 0 ]     [ 1  -4   1 |  1 0 0 ]
    [  1   1  -2 | 0 1 0 ]  ~  [ 0   5  -3 | -1 1 0 ]  ~
    [ -1   1   1 | 0 0 1 ]     [ 0  -3   2 |  1 0 1 ]

    [ 1  -4    1  |   1    0   0 ]     [ 1  0  -7/5 |  1/5  4/5  0 ]
    [ 0   1  -3/5 | -1/5  1/5  0 ]  ~  [ 0  1  -3/5 | -1/5  1/5  0 ]  ~
    [ 0  -3    2  |   1    0   1 ]     [ 0  0   1/5 |  2/5  3/5  1 ]

    [ 1  0  -7/5 |  1/5  4/5  0 ]     [ 1  0  0 | 3  5  7 ]
    [ 0  1  -3/5 | -1/5  1/5  0 ]  ~  [ 0  1  0 | 1  2  3 ]
    [ 0  0    1  |   2    3   5 ]     [ 0  0  1 | 2  3  5 ]

The vector (3, 1, 2) is the first column of the inverse, and so forth.

    A−1 = [ 3  5  7 ]
          [ 1  2  3 ]
          [ 2  3  5 ]
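To check this by machine, the same augmented-matrix trick can be handed to SymPy (an illustrative sketch, not part of the notes' toolkit):

    from sympy import Matrix, eye

    A = Matrix([[1, -4, 1],
                [1, 1, -2],
                [-1, 1, 1]])

    # Row-reduce the big augmented matrix (A | I); the right-hand block is the inverse.
    R, _ = Matrix.hstack(A, eye(3)).rref()
    print(R[:, 3:])   # the inverse found in Example 23
    print(A.inv())    # the same matrix, computed directly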

9. Solving systems of linear differential equations


We begin by considering the elementary problem of solving a system of one equation in
one variable. Consider the problem dx/dt = ax with x(0) = C. The solution, as we all know,
is x = Ceat . For reasons that will ultimately become clear, I prefer to write the solution as
x = eat · C.
Now, this may appear silly but we can view this as a vector and matrix problem.
    (d/dt) x = A x   and   x(0) = x0

    (d/dt) [x] = [a] [x]   and   [x](0) = [C]

    [x] = [eat ] [C]

This suggests a rather preposterous solution to the general problem

    (d/dt) x = A x   and   x(0) = x0

namely

    x = eAt x0
Amazingly, this incredibly naive idea is correct. Of course, at first we are baffled by the
very idea of eAt . Below I will explain what it means, how it can be found, and perhaps
most importantly, why we rarely ever need to find it. First I am simply going to tell you
that when

    A = [ 2  0 ]
        [ 1  3 ]

we have

    eAt = [ e2t          0   ]
          [ e3t − e2t   e3t  ]
   
Example 24. Solve $\frac{d}{dt}\mathbf{x} = \begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix}\mathbf{x}$ where $\mathbf{x}$ satisfies the initial condition $\mathbf{x}(0) = \begin{pmatrix} 4 \\ 7 \end{pmatrix}$.
Then $\mathbf{x} = \begin{pmatrix} e^{2t} & 0 \\ e^{3t} - e^{2t} & e^{3t} \end{pmatrix}\begin{pmatrix} 4 \\ 7 \end{pmatrix} = \begin{pmatrix} 4e^{2t} \\ 11e^{3t} - 4e^{2t} \end{pmatrix}$ and it is easy to check that this answer is correct.
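One way to "check that this answer is correct" numerically is the following sketch (my own illustration, assuming numpy and scipy are available): it compares scipy's matrix exponential against the closed form above at one value of t.

import numpy as np
from scipy.linalg import expm

A = np.array([[2., 0.],
              [1., 3.]])
x0 = np.array([4., 7.])
t = 0.5
# e^{At} x0 computed numerically vs. the closed form (4e^{2t}, 11e^{3t} - 4e^{2t}).
numeric = expm(A * t) @ x0
closed = np.array([4*np.exp(2*t), 11*np.exp(3*t) - 4*np.exp(2*t)])
print(np.allclose(numeric, closed))   # True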
In Section 3.8, we learned how to do this problem without computing $e^{At}$. The reason this was possible is because if $\mathbf{v}_1, \ldots, \mathbf{v}_n$ is a basis for $\mathbb{R}^n$, the matrix $\begin{pmatrix} e^{At}\mathbf{v}_1 & e^{At}\mathbf{v}_2 & \cdots & e^{At}\mathbf{v}_n \end{pmatrix}$ contains the same information as the matrix $e^{At}$, which is the same as $\begin{pmatrix} e^{At}\mathbf{e}_1 & e^{At}\mathbf{e}_2 & \cdots & e^{At}\mathbf{e}_n \end{pmatrix}$ if you think about it.
According to the procedure in Section 3.8, we first find the eigenvalues and eigenvectors. To find the eigenvalues, we compute the determinant of $\begin{pmatrix} 2-\lambda & 0 \\ 1 & 3-\lambda \end{pmatrix}$. The determinant is $(2-\lambda)(3-\lambda)$ and so the eigenvalues turn out to be 2 and 3. We get the first eigenvector by solving $\begin{pmatrix} 2-2 & 0 \\ 1 & 3-2 \end{pmatrix}\mathbf{x} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$. A solution to $\begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}\mathbf{x} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ is $\mathbf{v}_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$. Then, with $\lambda = 3$, we must solve $\begin{pmatrix} -1 & 0 \\ 1 & 0 \end{pmatrix}\mathbf{x} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and we get our second eigenvector $\mathbf{v}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$.
Then we conclude that we have two independent solutions $e^{2t}\begin{pmatrix} 1 \\ -1 \end{pmatrix}$ and $e^{3t}\begin{pmatrix} 0 \\ 1 \end{pmatrix}$. If you compute, you will notice that these two solutions are exactly $e^{At}\mathbf{v}_1$ and $e^{At}\mathbf{v}_2$.
Incidentally, if you used the method in Section 3.8, you can use the solutions $e^{2t}\begin{pmatrix} 1 \\ -1 \end{pmatrix}$ and $e^{3t}\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ to find $e^{At}$. You find the columns one at a time. Since $\mathbf{e}_1 = \mathbf{v}_1 + \mathbf{v}_2$ and $\mathbf{e}_2 = \mathbf{v}_2$, we have $e^{At}\mathbf{e}_1 = e^{At}\mathbf{v}_1 + e^{At}\mathbf{v}_2$ and $e^{At}\mathbf{e}_2 = e^{At}\mathbf{v}_2$.

Let's go back to the first question. How do we make sense of $e^{At}$? Remember that $e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + \cdots$. So we define $e^{At} = I + At + (At)^2/2! + (At)^3/3! + \cdots$. This converges to an $n \times n$ matrix and by taking derivatives term by term, we see that $\frac{d}{dt}e^{At} = Ae^{At}$, which is exactly the property that we want $e^{At}$ to have.
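As a concrete illustration of this definition, here is a short sketch (my own, again assuming numpy and scipy) that sums the series directly and compares the truncation with scipy's built-in matrix exponential.

import numpy as np
from scipy.linalg import expm

def expm_series(M, terms=30):
    # Sum I + M + M^2/2! + ... + M^(terms-1)/(terms-1)!
    total = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        total = total + term
    return total

A = np.array([[2., 0.], [1., 3.]])
print(np.allclose(expm_series(A * 0.5), expm(A * 0.5)))   # True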
 
Example 25. It is easy to find $e^D$ if $D$ is a diagonal matrix. Let us compute $e^D$ for $D = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 2 \end{pmatrix}$.
Since $D^2 = \begin{pmatrix} 9 & 0 & 0 \\ 0 & 16 & 0 \\ 0 & 0 & 4 \end{pmatrix}$, $D^3 = \begin{pmatrix} 27 & 0 & 0 \\ 0 & 64 & 0 \\ 0 & 0 & 8 \end{pmatrix}$, $D^4 = \begin{pmatrix} 81 & 0 & 0 \\ 0 & 256 & 0 \\ 0 & 0 & 16 \end{pmatrix}$, etc., we see that there is no interaction of terms and $e^D = \begin{pmatrix} e^3 & 0 & 0 \\ 0 & e^4 & 0 \\ 0 & 0 & e^2 \end{pmatrix}$.
This is another illustration of how diagonalizing matrices - the process of finding a basis
so that the matrix of the linear transformation with respect to that basis is diagonal - is so
useful.
Example 26. Another easy example occurs when $A^k = 0$ for some $k$ because in that case the infinite series is just a finite sum. For instance, suppose $A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$. Then $A^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$ and $A^3$ is the zero matrix. So
$$e^A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} + \frac{1}{2}\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1/2 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$
Likewise
$$e^{At} = \begin{pmatrix} 1 & t & t^2/2 \\ 0 & 1 & t \\ 0 & 0 & 1 \end{pmatrix}$$
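A quick numerical check of this finite sum (a sketch of my own, with numpy and scipy assumed):

import numpy as np
from scipy.linalg import expm

N = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])
# N^3 = 0, so the series stops: e^N = I + N + N^2/2.
finite_sum = np.eye(3) + N + (N @ N) / 2
print(np.allclose(finite_sum, expm(N)))   # True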
 
Example 27. Find $e^{\begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix} t}$.
Because
$$e^{\begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix} t} = e^{\begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix} t}\, e^{\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} t},$$
the answer is just
$$\begin{pmatrix} e^{3t} & 0 & 0 \\ 0 & e^{3t} & 0 \\ 0 & 0 & e^{3t} \end{pmatrix} \begin{pmatrix} 1 & t & t^2/2 \\ 0 & 1 & t \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} e^{3t} & te^{3t} & t^2e^{3t}/2 \\ 0 & e^{3t} & te^{3t} \\ 0 & 0 & e^{3t} \end{pmatrix}$$
This last example used a theorem, which is important.
Theorem 10. If $A, B$ are $n \times n$ matrices such that $AB = BA$, then $e^A e^B = e^{A+B}$. In particular, if $A = \lambda I$, a scalar matrix (diagonal with equal entries along the diagonal), then $e^A e^B = e^{A+B}$.
Proof. $e^{A+B} = I + (A+B) + (A+B)^2/2 + (A+B)^3/3! + \cdots = I + (A+B) + \frac{1}{2}(A^2 + 2AB + B^2) + \frac{1}{6}(A^3 + 3A^2B + 3AB^2 + B^3) + \cdots$, and this equals $(I + A + A^2/2 + A^3/3! + \cdots)(I + B + B^2/2 + \cdots)$ for the same reason that the formula works for numbers. □
Warning. If $AB \neq BA$, this formula does not hold. In this case, $(A+B)^2 = A^2 + AB + BA + B^2$ and the argument above does not work.
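A small numerical illustration of the warning (my own sketch, assuming numpy and scipy): the product formula fails for a non-commuting pair but holds when one factor is a scalar matrix.

import numpy as np
from scipy.linalg import expm

A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0., 0.], [1., 0.]])
print(np.allclose(A @ B, B @ A))                     # False: they do not commute
print(np.allclose(expm(A) @ expm(B), expm(A + B)))   # False: e^A e^B != e^(A+B)
S = 3 * np.eye(2)                                    # scalar matrices commute with everything
print(np.allclose(expm(S) @ expm(A), expm(S + A)))   # True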
NOW I WILL TELL YOU HOW TO SOLVE ALL LINEAR SYSTEMS, at least theoretically. If you can find a basis for $\mathbb{R}^n$ consisting of eigenvectors, then the method discussed in Sections 3.8 and 3.9 leads to a solution every time. Since eigenvectors corresponding to different eigenvalues are always linearly independent, this can always be done if the characteristic polynomial has no repeated roots. However, when the characteristic polynomial has a repeated root, a basis of eigenvectors exists only when we are lucky, and we are not usually lucky.
Definition. Let $A$ be an $n \times n$ matrix. If $\lambda$ is an eigenvalue, an eigenvector for $\lambda$ is a nonzero solution of the equation $(A - \lambda I)\mathbf{v} = 0$. A generalized eigenvector is a nonzero solution of the equation $(A - \lambda I)^n\mathbf{v} = 0$.
Now if $\mathbf{v}$ is a generalized eigenvector, $e^{At}\mathbf{v}$ is a solution that is actually easy to find. Let $B = A - \lambda I$. Then $e^{At} = e^{\lambda I t} e^{Bt} = e^{\lambda t} e^{Bt}$. So $e^{At}\mathbf{v} = e^{\lambda t}(I + tB + t^2B^2/2 + t^3B^3/3! + t^4B^4/4! + \cdots)\mathbf{v} = e^{\lambda t}(I\mathbf{v} + tB\mathbf{v} + t^2B^2\mathbf{v}/2 + t^3B^3\mathbf{v}/3! + t^4B^4\mathbf{v}/4! + \cdots)$. Since $B^n\mathbf{v} = 0$, all of the terms from $B^n\mathbf{v}$ on are zero and it is a finite sum. The process then is to find a basis of generalized eigenvectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$ and compute $n$ linearly independent solutions $e^{At}\mathbf{v}_1, \ldots, e^{At}\mathbf{v}_n$.
    
Example 28. Find the general solution to the differential equation $\frac{d}{dt}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 6 & 1 \\ -9 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$.
The characteristic polynomial is just $(6-\lambda)(0-\lambda) - (1)(-9) = \lambda^2 - 6\lambda + 9 = (\lambda - 3)^2$. So we have a single eigenvalue $\lambda = 3$. We find an eigenvector by solving
$$\begin{pmatrix} 3 & 1 \\ -9 & -3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
Our matrix is row equivalent to $\begin{pmatrix} 3 & 1 \\ 0 & 0 \end{pmatrix}$, the free variable is $x_2$ and our eigenvector is $\begin{pmatrix} 1 \\ -3 \end{pmatrix}$. We have our first solution $e^{3t}\begin{pmatrix} 1 \\ -3 \end{pmatrix}$. Unfortunately, we have no more eigenvectors and so we must find a generalized eigenvector for $\lambda = 3$.
$$(A - 3 \cdot I)^2 = \begin{pmatrix} 3 & 1 \\ -9 & -3 \end{pmatrix}^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$$
Hence every vector is a generalized eigenvector and we may choose any vector we like as long as we don't pick a multiple of $\begin{pmatrix} 1 \\ -3 \end{pmatrix}$. I choose $\mathbf{v}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. Our second solution will be $e^{At}\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, which we must compute.
$$e^{At}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = e^{3t}\left(\begin{pmatrix} 0 \\ 1 \end{pmatrix} + t\begin{pmatrix} 3 & 1 \\ -9 & -3 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix}\right) = e^{3t}\begin{pmatrix} t \\ 1 - 3t \end{pmatrix}$$
The general solution is $C_1 e^{3t}\begin{pmatrix} 1 \\ -3 \end{pmatrix} + C_2 e^{3t}\begin{pmatrix} t \\ 1 - 3t \end{pmatrix}$.
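To see the generalized-eigenvector computation agree with the matrix exponential, here is a small check I added (again assuming numpy and scipy; it is only an illustration).

import numpy as np
from scipy.linalg import expm

A = np.array([[6., 1.],
              [-9., 0.]])
t = 0.7
v2 = np.array([0., 1.])            # the generalized eigenvector chosen above
numeric = expm(A * t) @ v2
closed = np.exp(3*t) * np.array([t, 1 - 3*t])
print(np.allclose(numeric, closed))   # True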
    
Example 29. Find the general solution to the differential equation $\frac{d}{dt}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ -4 & 1 & 0 \\ 3 & 6 & 2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$.
Since the matrix is lower triangular, the characteristic polynomial is easy to compute and it is $(1-\lambda)^2(2-\lambda)$. For $\lambda = 2$, we find our eigenvector by solving
$$\begin{pmatrix} -1 & 0 & 0 \\ -4 & -1 & 0 \\ 3 & 6 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
Our matrix is row equivalent to $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$, the free variable is $x_3$ and our eigenvector is $\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$. We have our first solution $e^{2t}\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$. Next we move on to the eigenvalue $\lambda = 1$. We must solve
$$\begin{pmatrix} 0 & 0 & 0 \\ -4 & 0 & 0 \\ 3 & 6 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
This matrix is row equivalent to $\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 6 & 1 \end{pmatrix}$ (I'm avoiding fractions.) and so we again have one free variable $x_3$ and an eigenvector $\begin{pmatrix} 0 \\ -1 \\ 6 \end{pmatrix}$. This gives us our second solution $e^{t}\begin{pmatrix} 0 \\ -1 \\ 6 \end{pmatrix}$.
Unfortunately, we have no more eigenvectors and so we must find a generalized eigenvector for $\lambda = 1$. [Since $\lambda = 1$ is a double root, we get exactly two of our solutions from $\lambda = 1$.]
$$(A - 1 \cdot I)^2 = \begin{pmatrix} 0 & 0 & 0 \\ -4 & 0 & 0 \\ 3 & 6 & 1 \end{pmatrix}^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ -21 & 6 & 1 \end{pmatrix}$$
So we must solve
$$\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ -21 & 6 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
Obviously the solution set to this equation is a two dimensional vector space, but we already have one of the basis vectors, namely $\begin{pmatrix} 0 \\ -1 \\ 6 \end{pmatrix}$, and so only need a second. $\begin{pmatrix} 2 \\ 7 \\ 0 \end{pmatrix}$ is an obvious second solution [Yes, $\begin{pmatrix} 1 \\ 0 \\ 21 \end{pmatrix}$ also works.] and so our third solution is $e^{At}\begin{pmatrix} 2 \\ 7 \\ 0 \end{pmatrix}$, which we must compute.
$$e^{At}\begin{pmatrix} 2 \\ 7 \\ 0 \end{pmatrix} = e^{t}\left(\begin{pmatrix} 2 \\ 7 \\ 0 \end{pmatrix} + t\begin{pmatrix} 0 & 0 & 0 \\ -4 & 0 & 0 \\ 3 & 6 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ 7 \\ 0 \end{pmatrix}\right) = e^{t}\begin{pmatrix} 2 \\ 7 - 8t \\ 48t \end{pmatrix}$$
The general solution is $C_1 e^{2t}\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + C_2 e^{t}\begin{pmatrix} 0 \\ -1 \\ 6 \end{pmatrix} + C_3 e^{t}\begin{pmatrix} 2 \\ 7 - 8t \\ 48t \end{pmatrix}$.
Remark 11. This can get pretty messy if we have double complex roots, but the theory is
the same and you won’t encounter anything like that in this course. The smallest matrix
where this could happen occurs when n = 4.

There is a sophisticated way to find $e^{A}$ which you will not be responsible for, but which is
interesting and should help your understanding of the subject. It also shows just how neat
math can be. A matrix is only diagonalizable if it has a basis of eigenvectors. However, we
can put non-diagonalizable matrices in a nice form as well. There are matrices called Jordan
blocks. A Jordan block is simply a scalar matrix with 1’s just above the main diagonal.
Some examples should give you a clear idea of what they are.
$$\begin{pmatrix} 7 \end{pmatrix}, \quad \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix}, \quad \begin{pmatrix} -4 & 1 & 0 \\ 0 & -4 & 1 \\ 0 & 0 & -4 \end{pmatrix}, \quad \begin{pmatrix} 9 & 1 & 0 & 0 \\ 0 & 9 & 1 & 0 \\ 0 & 0 & 9 & 1 \\ 0 & 0 & 0 & 9 \end{pmatrix}, \quad \begin{pmatrix} 5 & 1 & 0 & 0 & 0 \\ 0 & 5 & 1 & 0 & 0 \\ 0 & 0 & 5 & 1 & 0 \\ 0 & 0 & 0 & 5 & 1 \\ 0 & 0 & 0 & 0 & 5 \end{pmatrix}$$
Next, a Jordan matrix is a matrix which is built by arranging Jordan blocks down the diagonal and filling out the rest of the matrix with zeroes. Here is an example of a Jordan matrix made up of four Jordan blocks.
 
Example 30.
$$\begin{pmatrix} 3 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 3 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 3 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 2 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$
The blocks have sizes 1, 3, 2, 1 respectively.
Of course, the easiest Jordan matrices are diagonal matrices. For our purposes, there are several very nice properties that Jordan matrices have. For instance, if $J$ is the matrix from Example 30, I can write down $e^{Jt}$ without doing any calculations at all.
$$e^{Jt} = \begin{pmatrix} e^{3t} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & e^{3t} & te^{3t} & t^2e^{3t}/2 & 0 & 0 & 0 \\ 0 & 0 & e^{3t} & te^{3t} & 0 & 0 & 0 \\ 0 & 0 & 0 & e^{3t} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & e^{2t} & te^{2t} & 0 \\ 0 & 0 & 0 & 0 & 0 & e^{2t} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$
One just handles the Jordan blocks separately and patches them together. Example 27 shows us how to handle Jordan blocks.
The second neat thing is the following theorem.
Theorem 12. Suppose A is a square matrix.
(1) There exists a matrix B and a Jordan matrix J such that $A = BJB^{-1}$.
(2) $e^{At} = Be^{Jt}B^{-1}$.
I won’t prove this theorem, but the second statement is easy if you write out the terms
of the power series. I might remark that J is referred to as the Jordan canonical form.
Alas, this does not really make the problem at hand any easier. To use this theorem
to solve systems of differential equations, you must find the matrices J and B. Well, the
columns of B are actually a basis of generalized eigenvectors and so one must actually go
through the process above to find B.
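As an illustration of Theorem 12, here is a small sketch using sympy's exact-arithmetic Jordan form (the library choice is mine, and sympy's B need not be the same basis of generalized eigenvectors you would pick by hand). It recovers B and J for the matrix of Example 28 and confirms the factorization.

from sympy import Matrix

A = Matrix([[6, 1],
            [-9, 0]])
B, J = A.jordan_form()          # returns (B, J) with A = B * J * B**(-1)
print(J)                        # Matrix([[3, 1], [0, 3]]) -- one 2x2 Jordan block
print(B * J * B.inv() == A)     # True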
The other point about Jordan canonical forms is that they say that linear transformations
are actually quite easy to understand if you pick the right basis. They fall into a small
number of different groups.

10. Appendix - Properties and Computation of Matrices


A third way to understand determinants is to know the properties they satisfy. These
can be proved using the formula - and Braun does this in the text. Also, they should seem
reasonable from the volume perspective.
Property 1. Interchanging two rows of a matrix reverses the sign of the determinant.
Property 2. Multiplying a row of a matrix by a constant c multiplies the determinant by
c.
Property 3. Adding a multiple of one row to another does not affect the determinant.
Property 4. The determinant of a diagonal matrix or an upper triangular matrix or a
lower triangular matrix is just the product of the diagonal entries.
Property 5. The determinant of the transpose of a matrix equals the determinant of the
original matrix.
Property 6. det(AB) = det(A) det(B).
Here are three more, which are simple consequences of the previous.
Property 7. If two rows of a matrix are equal, then the determinant is zero.
Proof. If we switch the two identical rows, we don’t change the matrix but reverse the sign
of its determinant. Zero is the only number which equals its negative. 
Property 8. det(cA) = c^n det(A)
We are multiplying the n rows by c one at a time.
Property 9. Column operations have the same effect as row operations.
This is a consequence of Property 5.
Now, in light of the first three properties, we can perform row operations on a matrix and
have a predictable effect on the determinant. If we put a matrix into RREF and keep track
of what operations we use, we can easily compute the determinant. If n ≥ 4, this is much
easier than computing n! products. In fact, we can actually stop when we get the matrix
into upper triangular form. I will illustrate this technique below. First a very important
theorem.
Theorem 13. The reduced row echelon form of a matrix is the identity matrix if and only
if the determinant of the original matrix is nonzero. Consequently, a system of equations
Ax = b has a unique solution if and only if det(A) 6= 0.
Proof. The first three properties tell us that elementary row operations can change the
determinant, but they cannot change whether or not the determinant is zero. If the RREF
is the identity matrix, there is a unique solution and the determinant is nonzero. If the
RREF is not the identity matrix, it will have fewer than n pivots and so a nonzero row
and a column without a pivot. We will not get a unique solution and we will get a zero
determinant. 

So finding the determinant is a way of predicting how row reduction will turn out and
row reduction is a way to compute a determinant.
Finally, I should say a little about computing large determinants. In general, the way to
go is by row reduction.
 
Example 31. Find the determinant of the matrix $\begin{pmatrix} 1 & 2 & 3 & 4 & 10 \\ 2 & 4 & 6 & 9 & 16 \\ 3 & 6 & 8 & 11 & 29 \\ 4 & 7 & 3 & 2 & 18 \\ 4 & 8 & 11 & 15 & 21 \end{pmatrix}$. First we use the “1” in the upper left corner to zero out the rest of the first column. This is completely harmless by Property 3. We get
$$\begin{pmatrix} 1 & 2 & 3 & 4 & 10 \\ 0 & 0 & 0 & 1 & -4 \\ 0 & 0 & -1 & -1 & -1 \\ 0 & -1 & -9 & -14 & -22 \\ 0 & 0 & -1 & -1 & -19 \end{pmatrix}$$
Then we interchange the second and fourth rows, making a note that we have reversed the sign of the determinant.
$$\begin{pmatrix} 1 & 2 & 3 & 4 & 10 \\ 0 & -1 & -9 & -14 & -22 \\ 0 & 0 & -1 & -1 & -1 \\ 0 & 0 & 0 & 1 & -4 \\ 0 & 0 & -1 & -1 & -19 \end{pmatrix}$$
Next we subtract the third row from the fifth row to create a zero in the (5, 3) place.
$$\begin{pmatrix} 1 & 2 & 3 & 4 & 10 \\ 0 & -1 & -9 & -14 & -22 \\ 0 & 0 & -1 & -1 & -1 \\ 0 & 0 & 0 & 1 & -4 \\ 0 & 0 & 0 & 0 & -18 \end{pmatrix}$$
Now we have an upper triangular matrix whose determinant is $(1)(-1)(-1)(1)(-18) = -18$. Recalling that we reversed the sign at the second step, we see that the determinant of the original matrix is $+18$.
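Here is a short sketch of this procedure in code (my own, not part of the notes): reduce to upper triangular form with partial pivoting, track the sign changes from row swaps, and multiply the diagonal.

import numpy as np

def det_by_row_reduction(M):
    # Reduce M to upper triangular form, tracking the sign flips from row swaps.
    U = np.array(M, dtype=float)
    n = U.shape[0]
    sign = 1.0
    for j in range(n):
        pivot = np.argmax(np.abs(U[j:, j])) + j
        if U[pivot, j] == 0:
            return 0.0
        if pivot != j:
            U[[j, pivot]] = U[[pivot, j]]
            sign = -sign
        for i in range(j + 1, n):
            U[i] -= (U[i, j] / U[j, j]) * U[j]
    return sign * np.prod(np.diag(U))

A = [[1, 2, 3, 4, 10],
     [2, 4, 6, 9, 16],
     [3, 6, 8, 11, 29],
     [4, 7, 3, 2, 18],
     [4, 8, 11, 15, 21]]
print(det_by_row_reduction(A))   # 18.0 (up to rounding)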
Life is probably easier if you avoid Type 2 row operations. It is not necessary to make the pivots all equal to “1”, and using Property 2 just gives you more to keep track of.
Here is an easy example to illustrate this.
 
Example 32. Find the determinant of the matrix $\begin{pmatrix} 4 & 7 & 9 \\ 2 & -1 & 14 \\ 3 & 29 & -4 \end{pmatrix}$. First we put zeros in the first column below the 4 by subtracting $\frac{2}{4}$ times the first row from the second and $\frac{3}{4}$ times the first row from the third. If it is easier to envision using integer multiples of $\begin{pmatrix} 1 & \frac{7}{4} & \frac{9}{4} \end{pmatrix}$, by all means do so. However, it serves no purpose to actually change the first row and it forces you to remember that you divided the determinant by 4 and so must multiply by 4 later. Just leave the 4 in the upper left corner. So we get $\begin{pmatrix} 4 & 7 & 9 \\ 0 & -\frac{9}{2} & \frac{19}{2} \\ 0 & \frac{95}{4} & -\frac{43}{4} \end{pmatrix}$. Then add $\frac{95}{4} \div \frac{9}{2}$ times the second row to the third to get $\begin{pmatrix} 4 & 7 & 9 \\ 0 & -\frac{9}{2} & \frac{19}{2} \\ 0 & 0 & \frac{709}{18} \end{pmatrix}$. The determinant then is just $(4)\left(-\frac{9}{2}\right)\left(\frac{709}{18}\right) = -709$.

Yes, it is true that this problem is probably easier to do by basket-weaving, but one still
wants to know the method because row reduction is the superior technique for determinants
of size 4 × 4 or larger.
The alternative to row reduction is direct calculation. It would be very difficult to keep
track of all of the terms without being systematic. There is an approach which is systematic,
the cofactor method.
Definition. Let $A$ be an $n \times n$ matrix. Let $A_{ij}$ denote the $(n-1) \times (n-1)$ matrix you get by deleting row $i$ and column $j$. Also let $|A|$ denote the determinant of $A$. So $|A_{ij}|$ is the determinant of $A_{ij}$.
Theorem 14. Let $A$ be an $n \times n$ matrix. Then, for any fixed choice of column $j$, $|A| = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} |A_{ij}|$. Also, for any fixed choice of row $i$, $|A| = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} |A_{ij}|$.

I will go back to Example 31 to show what this theorem means. However, you should quickly be convinced that this is a very bad way to do Example 31. Then I will offer another
example where this technique is actually helpful. The second matrix will have lots of zeroes
in it.  
The determinant of $\begin{pmatrix} 1 & 2 & 3 & 4 & 10 \\ 2 & 4 & 6 & 9 & 16 \\ 3 & 6 & 8 & 11 & 29 \\ 4 & 7 & 3 & 2 & 18 \\ 4 & 8 & 11 & 15 & 21 \end{pmatrix}$ equals
$$1 \cdot \det\begin{pmatrix} 4 & 6 & 9 & 16 \\ 6 & 8 & 11 & 29 \\ 7 & 3 & 2 & 18 \\ 8 & 11 & 15 & 21 \end{pmatrix} - 2 \cdot \det\begin{pmatrix} 2 & 3 & 4 & 10 \\ 6 & 8 & 11 & 29 \\ 7 & 3 & 2 & 18 \\ 8 & 11 & 15 & 21 \end{pmatrix} + 3 \cdot \det\begin{pmatrix} 2 & 3 & 4 & 10 \\ 4 & 6 & 9 & 16 \\ 7 & 3 & 2 & 18 \\ 8 & 11 & 15 & 21 \end{pmatrix}$$
$$- 4 \cdot \det\begin{pmatrix} 2 & 3 & 4 & 10 \\ 4 & 6 & 9 & 16 \\ 6 & 8 & 11 & 29 \\ 8 & 11 & 15 & 21 \end{pmatrix} + 4 \cdot \det\begin{pmatrix} 2 & 3 & 4 & 10 \\ 4 & 6 & 9 & 16 \\ 6 & 8 & 11 & 29 \\ 7 & 3 & 2 & 18 \end{pmatrix}$$
and there is still a lot of work ahead of us. What we did above is much much easier.
 
Example 33. Find the determinant of the matrix $\begin{pmatrix} 3 & 2 & 0 & 1 & 0 \\ 2 & 0 & 0 & 0 & 0 \\ 2 & 1 & 5 & 6 & 3 \\ 9 & 7 & 0 & 4 & 2 \\ 2 & 7 & 0 & 5 & 0 \end{pmatrix}$. Taking advantage of the zeroes, I will first expand about the second row, then expand about the new second column, then expand about the last column. In each case, there will only be one nonzero term. Finally, I will compute the easy two by two determinant.
$$\det\begin{pmatrix} 3 & 2 & 0 & 1 & 0 \\ 2 & 0 & 0 & 0 & 0 \\ 2 & 1 & 5 & 6 & 3 \\ 9 & 7 & 0 & 4 & 2 \\ 2 & 7 & 0 & 5 & 0 \end{pmatrix} = -2 \cdot \det\begin{pmatrix} 2 & 0 & 1 & 0 \\ 1 & 5 & 6 & 3 \\ 7 & 0 & 4 & 2 \\ 7 & 0 & 5 & 0 \end{pmatrix} = (-2)(5)\det\begin{pmatrix} 2 & 1 & 0 \\ 7 & 4 & 2 \\ 7 & 5 & 0 \end{pmatrix} = (-2)(5)(-2)\det\begin{pmatrix} 2 & 1 \\ 7 & 5 \end{pmatrix} = (-2)(5)(-2)(2 \cdot 5 - 7) = 60.$$
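For completeness, here is a small recursive cofactor-expansion routine (my own sketch; it always expands along the first row, so it does not exploit zeroes the way the hand computation above does).

def cofactor_det(M):
    # Recursive Laplace expansion along the first row.
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * cofactor_det(minor)
    return total

A = [[3, 2, 0, 1, 0],
     [2, 0, 0, 0, 0],
     [2, 1, 5, 6, 3],
     [9, 7, 0, 4, 2],
     [2, 7, 0, 5, 0]]
print(cofactor_det(A))   # 60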
7 5
