Math 146 Notes
Here is our official definition of “field” (from Appendix C of Friedberg, Insel & Spence).
Definition. A field is a set F on which two operations + and · (called addition and multiplica-
tion) are defined (for all possible pairs in F, and always producing an element of F), and having
two privileged elements 0 and 1 with 0 ̸= 1, such that the following laws hold: for all a, b, c ∈ F,
(F 1) a+b=b+a
a·b=b·a
(F 2) a + (b + c) = (a + b) + c
a · (b · c) = (a · b) · c
(F 3) 0+a=a
1 · a = a.
(F 4) For every a ∈ F there exists x ∈ F with a + x = 0.
For every a ∈ F with a ̸= 0 there exists y ∈ F with a · y = 1.
(F 5) a · (b + c) = (a · b) + (a · c)
In any field F, we define −a to be the unique solution x to a + x = 0. Note that (F 4) guarantees
the existence of a solution, and uniqueness can be deduced from (F 1)–(F 3).
In any field F, if a ∈ F and a ̸= 0, then we define a−1 to be the unique solution x to ax = 1.
Note that (F 4) guarantees the existence of a solution, and uniqueness can again be deduced from
(F 1)–(F 3).
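For instance, uniqueness of the solution in the additive half of (F 4) can be checked directly from (F 1)–(F 3): if a + x = 0 and a + x′ = 0, then
    x = 0 + x = (a + x′) + x = (x′ + a) + x = x′ + (a + x) = x′ + 0 = 0 + x′ = x′.
The multiplicative case is the same argument with · in place of + and 1 in place of 0.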
MATH 146 January 11 Lecture notes
Announcements:
P(F) with addition and scalar multiplication defined term-wise is a vector space over F.
Remark. A more standard notation for P(F) is F[x], but we will use P(F) in this course.
Next: some basic facts true of all vector spaces.
Theorem 1.1 (Cancellation Law). Let V be a vector space. If x, y, u ∈ V and x + u = y + u, then x = y.
Proof. Assume x + u = y + u. Let 0 be the zero vector of V (given by (VS 3)). By (VS 4), there exists
z ∈ V with u + z = 0. Then
x = x+0 (VS 3)
= x + (u + z) by choice of z above
= (x + u) + z (VS 2)
= (y + u) + z assumption
= y + (u + z) (VS 2)
= y+0 choice of z above
= y (VS 3).
□
Corollary 1. Suppose V is a vector space. There is only one vector in V that can be the zero vector.
Proof. Suppose 01 , 02 ∈ V both satisfy (VS 3). Thus x + 01 = x + 02 = x for all x ∈ V . Use (VS 1) to flip
this to get 01 + x = 02 + x. Then 01 = 02 by the Cancellation Law. □
Corollary 2. Suppose V is a vector space and x ∈ V . There is only one vector u ∈ V satisfying x + u = 0.
Proof. Like the proof of Corollary 1; exercise. □
Definition. Let V be a vector space and x, y ∈ V .
(1) −x denotes the unique vector u ∈ V satisfying x + u = 0.
(2) x − y denotes x + (−y).
Theorem 1.2. Suppose V is a vector space over F, x ∈ V , and a ∈ F.
(1) 0x = 0.
(2) (−a)x = −(ax) = a(−x).
(3) a0 = 0.
Remark. Parts (1) and (2) are proved in the text. Parts (1) and (3) are also proved in Prof. Tatarko’s
notes.
Also note the overloaded notation in the statement. For example, in part (1), the first 0 is the scalar
0 ∈ F, while the second 0 is the zero vector in V . In part (2), the first − means minus in F, while the
second and third mean the minus in V defined above.
Here is a proof of (1), where I will use bold font for vectors, nonbold for scalars: In the field F we have
0 + 0 = 0; thus 0x = (0 + 0)x = 0x + 0x by (VS 8). On the other hand, we have 0x = 0x + 0 = 0 + 0x by
(VS 3) and (VS 1). Putting these together gives 0x + 0x = 0 + 0x. Now apply the Cancellation Law to
get 0x = 0.
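Part (3) can be proved by the same pattern (a sketch along the same lines): in V we have 0 + 0 = 0, so a0 = a(0 + 0) = a0 + a0 by (VS 7); also a0 = a0 + 0 = 0 + a0 by (VS 3) and (VS 1). Putting these together gives a0 + a0 = 0 + a0, and the Cancellation Law yields a0 = 0.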
MATH 146 January 13 Section 2
Assume V is a vector space over F and x1 , . . . , xn ∈ V . Because of (VS 2), we can (and do) write
expressions like x1 + x2 + · · · + xn without declaring where the brackets go. And we could (though we
won’t) prove true facts like
x1 + x2 + · · · + xn = xσ(1) + xσ(2) + · · · + xσ(n) for any permutation σ of {1, 2, . . . , n}
c(x1 + x2 + · · · + xn ) = cx1 + cx2 + · · · + cxn for any scalar c ∈ F
−(x1 + x2 + · · · + xn ) = −x1 − x2 − · · · − xn
Going forward, you can freely use these facts, except when told not to.
(VS 4): given x ∈ W , recall that −1 ∈ F so (−1)x ∈ W . Applying Theorem 1.2(2) and (VS 5) to V , we
get (−1)x = −(1x) = −x (the additive inverse of x in V ), which proves −x ∈ W . We now show that −x
is also an additive inverse of x in W :
x + (−x) = x +V (−x) = 0V by definition of −x. □
Incidentally, the proof shows that if W is a subspace of V , then:
• W always contains the zero vector of V (which is also the zero vector of W );
• W is closed under additive inverses (calculated in V ). Moreover, if x ∈ W then the additive inverse
of x calculated in V is the same as the additive inverse of x calculated in the vector space W .
Example.
(1) V = R3 as a vector space over R (usual operations). Let W be the following plane in R3 :
W = {(x, y, z) ∈ R3 : x + y + z = 0}.
It’s easy to check that W is a subspace of R3 : i.e., it is a subset of R3 , it is closed under addition
and scalar multiplication, and it is nonempty.
(2) Same V . Let W1 = {(x, y, z) ∈ R3 : x + y + z = 1}. W1 is not a subspace of R3 , for many reasons:
(i) it doesn’t contain the zero element of R3 , so it can’t be a subspace by the observation following
the proof of the Lemma; (ii) W1 isn’t closed under scalar multiplication; and (iii) W1 isn’t closed
under vector addition.
(3) Let W2 be the paraboloid
W2 = {(x, y, z) ∈ R3 : z = x2 + y 2 }.
W2 is not a subspace of R3 , as it is not closed under scalar multiplication. (It’s also not closed
under vector addition.)
(4) Given any vector space V :
• V is a subspace of itself.
• {0} is a subspace.
(5) The subspaces of R3 are R3 , all planes through 0, all lines through 0, and {0}.
(6) Let V be C considered as a vector space over R. (So scalar multiplication is R × C → C.)
(a) R is a subspace of V . (Proof: R ̸= ∅, R is closed under (complex) addition, and R is closed
under scaling by real numbers.)
(b) Can you find another subspace of V (other than C and {0})?
(7) Is R2 a subspace of R3 ?
(8) For each field F and n ≥ 0, Pn (F) is a subspace of Pn+1 (F) and is also a subspace of P(F).
MATH 146 January 16 Section 2
Theorem 1.4. Let V be a vector space over F. Let W1, W2 be subspaces of V . Then W1 ∩ W2 is also a
subspace of V .
Proof. We check the definition of being a subspace.
(1) Let x, y ∈ W1 ∩ W2 . Then x, y ∈ W1 so x + y ∈ W1 . Similarly, x, y ∈ W2 so x + y ∈ W2 . Thus
x + y ∈ W1 ∩ W2 , proving W1 ∩ W2 is closed under addition.
(2) Let x ∈ W1 ∩ W2 and c ∈ F. A similar proof shows cx ∈ W1 ∩ W2 .
(3) We have 0 ∈ W1 and 0 ∈ W2 by the earlier Lemma, so 0 ∈ W1 ∩ W2 , proving W1 ∩ W2 ̸= ∅. □
Note: in general, it is not true that the union of two subspaces is a subspace. See Assignment 1.
§1.4
MATH 146 January 18 Section 2
Step 5: Neither occurrence of d can be eliminated (without re-introducing a, b or c). This signals that we
are done with the elimination process. Since we didn’t find a contradiction (such as 0 = 1), this system
has solutions. Since solutions exist, this means that the answer to our original question is YES:
  [ 2 1; −1 1 ]  is in  span{ [ 1 0; −1 0 ], [ 1 1; 0 −1 ], [ 0 1; 1 1 ], [ 1 2; 1 0 ] }
(writing [ a b; c d ] for the 2 × 2 matrix with first row (a, b) and second row (c, d)).
To actually see how to express the first matrix as a linear combination of the other four, we first describe
all solutions to the system (∗). Introducing a new symbol, or parameter, for d, say d = t, we can express
a, b, c, d in terms of t using the equations in (∗∗), obtaining
  a = 2,   b = −t,   c = 1 − t,   d = t,      t ∈ R.
As t ranges over R, the equations above describe all solutions to (∗). For example, when t = 0 we get the
solution (a, b, c, d) = (2, 0, 1, 0). Thus one way to describe the original matrix as a sum of the other four is
“2 times the first matrix plus the third matrix,” i.e.,
  [ 2 1; −1 1 ] = 2·[ 1 0; −1 0 ] + 0·[ 1 1; 0 −1 ] + 1·[ 0 1; 1 1 ] + 0·[ 1 2; 1 0 ].
More information about the method of elimination will be placed on Learn.
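As a quick numerical sanity check (a NumPy sketch of mine, not part of the course method), we can flatten each 2 × 2 matrix to a vector of length 4 and solve for the coefficients:

    import numpy as np

    # Each 2x2 matrix is flattened row by row into a length-4 vector.
    target = np.array([2, 1, -1, 1], dtype=float)   # the matrix we want to express
    M1 = np.array([1, 0, -1, 0], dtype=float)
    M2 = np.array([1, 1, 0, -1], dtype=float)
    M3 = np.array([0, 1, 1, 1], dtype=float)
    M4 = np.array([1, 2, 1, 0], dtype=float)

    # Columns of C are the four spanning matrices; solve C @ (a, b, c, d) = target.
    C = np.column_stack([M1, M2, M3, M4])
    coeffs, _, _, _ = np.linalg.lstsq(C, target, rcond=None)
    print(coeffs)                                             # one solution of the system
    print(np.allclose(C @ np.array([2, 0, 1, 0]), target))    # True: 2*M1 + 0*M2 + 1*M3 + 0*M4 = target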
MATH 146 January 20 Section 2
§1.5
Jargon. Let V be a vector space over F. Given a linear combination
a1 v1 + · · · + an vn
of vectors from V , we call it trivial if a1 = a2 = · · · = an = 0, and nontrivial otherwise.
Definition. Let V be a vector space over a field F. Let S ⊆ V .
(1) Say S is linearly dependent if there exists a nontrivial linear combination of distinct vectors from
S which equals 0; i.e., if it is possible to write
(∗) a1 v1 + a2 v2 + · · · + an vn = 0
for some distinct vectors v1 , . . . , vn ∈ S , and such that at least one ai is ̸= 0 .
(2) Say S is linearly independent if S is not linearly dependent: i.e., if
  v1 , . . . , vn ∈ S are distinct and a1 v1 + · · · + an vn = 0   =⇒   a1 = a2 = · · · = an = 0.
Example. (1) In R2 , the set {(1, −1), (−2, 2), (3, 4)} is linearly dependent because we can write
  2·(1, −1) + 1·(−2, 2) + 0·(3, 4) = (0, 0).
(2) In F3 , is the set {(1, 1, 0), (1, 0, 1), (0, 1, 1)} linearly dependent or independent?
Solution: solve a(1, 1, 0) + b(1, 0, 1) + c(0, 1, 1) = (0, 0, 0). We get the system
  a + b = 0
  a + c = 0
  b + c = 0.
If F = R (or C or Q), we can solve to get a = b = c = 0, meaning the set is linearly independent.
But if F = Z2 , we get the solution a = b = c = t (t ∈ Z2 ). In fact, the sum of the three vectors in
(Z2 )3 is the zero vector, so in (Z2 )3 these vectors are linearly dependent. (A short computational
check appears after this list.)
(3) In any vector space, {0} is linearly dependent. Any set containing 0 is linearly dependent.
(4) {v} is linearly dependent iff v = 0. (Use A1 Problem 3 for ⇒.)
(5) What about ∅? Is it linearly dependent, or independent? (Answer: independent)
(6) Suppose S ⊆ T ⊆ V . Which of the following implications are correct?
  S is linearly dependent =⇒ T is linearly dependent?  (True)
  T is linearly dependent =⇒ S is linearly dependent?  (False)
  S is linearly independent =⇒ T is linearly independent?  (False)
  T is linearly independent =⇒ S is linearly independent?  (True)
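Here is the computational sketch promised in example (2) above (my own illustration; it assumes NumPy is available):

    import numpy as np

    # The three vectors of example (2), stacked as the columns of A.
    A = np.array([[1, 1, 0],
                  [1, 0, 1],
                  [0, 1, 1]])

    # Over R (or Q, or C): the rank is 3, so A @ (a,b,c) = 0 has only the trivial solution.
    print(np.linalg.matrix_rank(A))           # 3  -> linearly independent over R

    # Over Z2: the sum of the three columns is the zero vector mod 2, a nontrivial dependence.
    print((A[:, 0] + A[:, 1] + A[:, 2]) % 2)  # [0 0 0]  -> linearly dependent over Z2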
Proposition. Let V be a vector space over F. Let S ⊆ V . TFAE:
(1) S is linearly dependent.
(2) S = {0} or ∃v ∈ S such that v can be expressed as a linear combination of other vectors in S.
Proof. (⇐) We have already explained why {0} is linearly dependent. Suppose the vector v ∈ S can be
written as a linear combination of other vectors u1 , . . . , un ∈ S, say
v = c1 u1 + · · · + cn un , c1 , . . . , cn ∈ F.
We can assume that u1 , . . . , un are distinct. Since v ̸∈ {u1 , . . . , un } by assumption, it follows that
u1 , . . . , un , v are distinct. Now note that (−1)v = −v by Theorem 1.2(2) and so
c1 u1 + · · · + cn un + (−1)v = 0.
As u1 , . . . , un , v are distinct vectors in S and at least one of the coefficients (−1) in the above linear
combination is not 0, we get that S is linearly dependent.
(⇒) Assume S is linearly dependent. By assumption there exist distinct u1 , . . . , un ∈ S and a1 , . . . , an ∈
F, not all 0, such that
(∗) a1 u1 + · · · + an un = 0.
By “weeding out” terms where ai = 0, we can assume that ai ̸= 0 for all i = 1, . . . , n.
Case 1: n = 1. Then a1 u1 = 0 with a1 ̸= 0. Thus u1 = 0 by A1 Problem 3, so {0} ⊆ S. One option is
S = {0}; the other is that there exists v ∈ S with v ̸= 0. In the second option, we can write 0 ∈ S as a
linear combination of v ∈ S, namely, 0 = 0v.
Case 2: n > 1.
In this case we can show that every ui can be written as a linear combination of the other uj ’s. For
example, here is the proof that u1 can be written as a linear combination of u2 , . . . , un : rewrite (∗) as
a1 u1 = (−a2 )u2 + · · · + (−an )un .
Recall that a1 ̸= 0, so a1 has a multiplicative inverse in F. Multiplying both sides of the equation by it gives
  u1 = (−a2 /a1 )u2 + · · · + (−an /a1 )un ,
proving u1 ∈ S can be written as a linear combination of u2 , . . . , un ∈ S. □
Comment. Another equivalent condition to “S is linearly dependent” is
∃v ∈ S with v ∈ span(S \ {v}).
Indeed, if v ∈ S and v = a1 u1 + · · · + an un with u1 , . . . , un ∈ S and ui ̸= v for all i, then v ∈ span(S \ {v}).
If S = {0} then 0 ∈ span(∅) = span(S \ {0}).
Taking the negation, we get
Corollary. A set S in a vector space is linearly independent iff ∀v ∈ S we have v ̸∈ span(S \ {v}).
MATH 146 January 23 Section 2
Definition. Let V be a vector space over F. A subset S ⊆ V is a basis for V if S is linearly independent
and spans V .
Example. (1) V = Rn : a basis is En = {e1 , e2 , . . . , en }, where e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0),
. . . , en = (0, . . . , 0, 1).
(2) More generally, En is a basis for Fn (for any F).
(3) V = Mm×n (F): a basis is Em,n = {Eij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} where Eij is the matrix of all 0’s
except one 1 in the (i, j) spot.
(4) V = P(F): a basis is B = {1, x, x2 , . . .}.
Proof sketch of (4). Show span(B) = P(F): ⊆ is obvious. For ⊇, let f ∈ P(F), say
f = a0 + a1 x + a2 x2 + · · · + an xn
= a0 ·1 + a1 x + · · · + an xn .
So f is a linear combination of 1, x, x2 , . . . , xn , all in B; so f ∈ span(B). This proves P(F) ⊆ span(B).
Prove B is linearly independent: assume xi1 , . . . , xik are distinct vectors in B, ai1 , . . . , aik ∈ F, and
ai1 xi1 + · · · + aik xik = 0.
By definition, a polynomial is the zero polynomial iff all of its coefficients are 0, so
ai 1 = · · · = ai k = 0
proving B is linearly independent. So B is a basis. □
(5) Here is another basis for P(R), invented by Legendre: {P0 , P1 , . . . , Pn , . . .} where
  Pn = Σ_{k=0}^{n} (n choose k) (n+k choose k) ((x − 1)/2)^k .
For example, P0 = 1, P1 = x, P2 = (1/2)(3x2 − 1), . . . This basis for P(R) is “better” than B in some
applications.
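One can check the first few Legendre polynomials against this formula symbolically (a SymPy sketch of mine, not part of the notes):

    from sympy import symbols, binomial, expand

    x = symbols('x')

    def P(n):
        # Sum_{k=0}^{n} C(n,k) * C(n+k,k) * ((x-1)/2)**k, exactly as in the formula above.
        return expand(sum(binomial(n, k) * binomial(n + k, k) * ((x - 1) / 2) ** k
                          for k in range(n + 1)))

    print(P(0), P(1), P(2))   # 1, x, 3*x**2/2 - 1/2   (i.e. (1/2)(3x^2 - 1))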
Theorem 1.8. Suppose B = {v1 , . . . , vn } ⊆ V is finite. TFAE:
(1) B is a basis for V .
(2) Every v ∈ V admits a unique representation v = a1 v1 + · · · + an vn as a linear combination of B.
Proof sketch. (1) ⇒ (2). Since B spans V , every vector in V admits some representation as a linear
combination of B. To prove uniqueness, assume that v ∈ V admits two representations
v = a1 v1 + · · · + an vn
and v = b1 v1 + · · · + bn vn .
Subtract the 2nd from the first to get
0 = (a1 − b1 )v1 + · · · + (an − bn )vn .
Use linear independence of B to get
a1 − b1 = · · · = an − bn = 0, i.e., ai = bi for all i = 1, . . . , n.
(2) ⇒ (1). By assumption, every v ∈ V admits a representation as a linear combination of B, so
span(B) = V . To prove B is linearly independent, note that one representation of the zero vector (as a
linear combination of B) is
0 = 0v1 + · · · + 0vn .
By the uniqueness assumption, this is the only representation of the zero vector. That is,
a1 v1 + · · · + an vn = 0 =⇒ a1 = · · · = an = 0,
proving B is linearly independent. □
Corollary. Let V be a vector space over F. If V has a finite basis B = {v1 , . . . , vn }, then V is “naturally” in
1-1 correspondence with Fn via
  a1 v1 + · · · + an vn  ↭  (a1 , . . . , an ),
with the left-hand side in V and the right-hand side in Fn .
Existence of bases
Theorem 1.9. Suppose the vector space V is spanned by a finite set S. Then S can be “shrunk” to a
basis B of V (i.e., ∃ basis B ⊆ S).
Proof sketch. We have span(S) = V . If S is linearly independent, then S itself is a basis. Otherwise,
∃v ∈ S such that v ∈ span(S \ {v}), and hence span(S \ {v}) = span(S) = V . Choose such v and let
S ′ = S \ {v}. It still spans V but is smaller. Repeat until you can’t anymore; the final S (k) must be linearly
independent and a basis. □
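A rough computational analogue of this shrinking procedure over R, using ranks to detect redundant vectors (a sketch of mine, not the proof itself):

    import numpy as np

    def shrink_to_basis(vectors):
        """Greedily discard vectors lying in the span of the others, in the spirit of Theorem 1.9."""
        vecs = [np.asarray(v, dtype=float) for v in vectors]
        changed = True
        while changed:
            changed = False
            for i in range(len(vecs)):
                rest = vecs[:i] + vecs[i + 1:]
                # vecs[i] is redundant iff removing it does not drop the rank of the span
                if rest and np.linalg.matrix_rank(np.array(rest)) == np.linalg.matrix_rank(np.array(vecs)):
                    vecs.pop(i)
                    changed = True
                    break
        return vecs

    S = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (2, 0, 0)]   # spans the xy-plane in R^3
    print(shrink_to_basis(S))                          # two independent vectors remain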
Question: is the theorem true for infinite spanning sets? More care would be needed in the proof. For
example, it is possible to construct a vector space V ̸= {0} and an infinite spanning set {v0 , v1 , . . .} with
the property that for every k, vk ∈ span({vk+1 , vk+2 , . . .}). So you might first throw away v0 , then v1 , then
v2 , etc. and at the end you will have reduced S to ∅ (which doesn’t span V ).1
Question. Suppose V has a finite linearly independent set S. Can S always be “grown” (i.e., extended)
to a basis for V ?
Naive algorithm proving “YES.”
Assume S ⊆ V is linearly independent. If span(S) = V , then S is already a basis. Otherwise, there
exists x ∈ V with x ̸∈ span(S). Let S′ = S ∪ {x}. I want to say that S′ is also linearly independent. (I’ll
justify this later.)
Repeat: if span(S′ ) = V then S′ is a basis. Otherwise, there exists x′ ∈ V with x′ ̸∈ span(S′ ). Let
S″ = S′ ∪ {x′ }. Then S″ is still linearly independent.
Continue: we get S ⊂ S′ ⊂ S″ ⊂ · · · ⊂ S(n) ⊂ · · · with each set S(k) linearly independent. Now there is
the question of whether this process terminates. It turns out that we can prove termination, provided we
assume that V has a finite spanning set.
To fully justify the claims above, we need the following two theorems.
Theorem 1.7. Suppose V is a vector space, S ⊆ V , and x ∈ V with x ̸∈ S. Then
S ∪ {x} is linearly independent ⇐⇒ (S is linearly independent and x ̸∈ span(S)).
Proof sketch. (⇒) By known facts.
(⇐) Assume S is linearly independent and x ̸∈ span(S). Suppose S ∪ {x} is linearly dependent. So
there exist v1 , . . . , vn ∈ S ∪ {x} (distinct) and scalars a1 , . . . , an ∈ F, not all 0, with
(∗) a1 v1 + · · · + an vn = 0.
By weeding out those ai vi for which ai = 0, we can assume that we have ai ̸= 0 for all i.
We must have x ∈ {v1 , . . . , vn } (why?). WLOG assume that vn = x. Since an ̸= 0, we can “solve for x”
in (∗) to get x ∈ span(v1 , . . . , vn−1 ) ⊆ span(S), contradiction.
Theorem 1.10 (Baby Replacement Theorem1). Suppose V is spanned by some finite set of size n. Then
every linearly independent set S ⊆ V satisfies |S| ≤ n.
(Proof deferred until Friday).
Now consider again the Question and the naive algorithm for solving it. Assume V is spanned by a
finite set of size n, and S is linearly independent. Recall the sequence of growing linearly independent sets
S ⊂ S 0 ⊂ S 00 ⊂ · · ·
By the Baby Replacement Theorem, all of these sets must have size ≤ n, so the sequence cannot go on
forever. Therefore at some step, say S (k) , the algorithm terminates, which must be because S (k) is a basis.
This proves:
Corollary 2 (c). Suppose V is spanned by a finite set. Then every linearly independent set S ⊆ V can be
extended to a basis for V .
Here is another application of the Baby Replacement Theorem.
Corollary 1. Assume V is spanned by a finite set.
(1) V has a finite basis B.
(2) Let n = |B|. Every basis of V has size n.
1The full Replacement Theorem says that, if T is the spanning set of size n, then not only do we have |S| ≤ n, but also
there exists a subset H ⊆ T of size |H| = n − |S|, such that S ∪ H spans V .
Proof. (1) Let S be a finite spanning set. S can be shrunk to a basis B, by Theorem 1.9 (Jan 23). Obviously
B is also finite.
(2) Let C be another basis. Apply the Baby Replacement Theorem 1.10 using B as the finite spanning
set and S := C as the linearly independent set to get |C| ≤ |B|. So C is also a finite basis. So we can
reverse the roles of B and C in the above argument to get |B| ≤ |C|. So |C| = |B| = n.
MATH 146 January 27 Section 2
Recap: suppose V has a finite spanning set.
• Theorem 1.9: finite spanning sets shrink to bases.
• (Baby) Replacement Theorem: finite spanning set B, S lin. indep. =⇒ |S| ≤ |B|.
• Corollary 2(c): lin. indep. sets extend to bases.
• Corollary 1(1): V has a basis.
• Corollary 1(2): all bases have same (finite) size.
Still need to prove the (Baby) Replacement Theorem. First we prove:
Exchange Lemma∗ . Suppose V is a vector space, span(x1 , . . . , xk , v1 , . . . , vℓ ) = V (possibly with k = 0),
and x ∈ V with x ̸∈ span(x1 , . . . , xk ). Then there exists i ∈ {1, . . . , ℓ} such that
  span(x1 , . . . , xk , x, v1 , . . . , vi−1 , vi+1 , . . . , vℓ ) = V.
Repeating, we have x2 ̸∈ span(x1 ) (why?). Again by the Exchange Lemma, there exists j ∈ {2, . . . , n}
such that (B′ \ {vj }) ∪ {x2 } still spans V . For concreteness, assume it is j = 2, so
span(x1 , x2 , v3 , . . . , vn ) = V.
Keep repeating until we run out of v’s. Then we will have
span(x1 , x2 , . . . , xn ) = V.
In particular, xn+1 ∈ span(x1 , . . . , xn ), contradicting that S is linearly independent.
Corollary 1 justifies the next definition.
Definition. Let V be a vector space.
(1) Suppose V is spanned by a finite set. The dimension of V , written dim V , is the unique size n of
every basis of V . We also say that V is finite-dimensional.
(2) If V has no finite spanning set, then we say V is infinite-dimensional.
Example.
(1) Fn . It has the finite basis En of size n. So dim Fn = n.
(2) Mm×n (F). It has the finite basis Em,n of size mn. So dim Mm×n (F) = mn.
(3) P(F). It has the infinite basis {1, x, x2 , . . .} (which is linearly independent). By the Replacement
Theorem, P(F) has no finite spanning set, i.e., is infinite-dimensional.
(4) F(R, R). It has an infinite linearly independent subset (A2). So by the Replacement Theorem, it
has no finite spanning set. So it is infinite-dimensional.
Here is an intuitively obvious but important consequence of our results.
Theorem 1.11. Let V be a finite-dimensional vector space. Let W be a subspace. Then:
(1) W is finite-dimensional and dim W ≤ dim V .
(2) If W is a proper subspace (i.e., W ⊂ V ), then dim W < dim V .
Proof. Let n = dim V . Fix a subspace W . By the Replacement Theorem or Corollary 2(c), every lin.
indep. set in V has size ≤ n. In particular, every lin. indep. set in W has size ≤ n. This means that if we
start with some lin. indep. subset S ⊆ W (say S = ∅), the Naive Algorithm for extending S to a basis
for W will terminate, say at C. Then |C| = dim W (by definition). As C is linearly independent (in W ,
and hence in V ), we have |C| ≤ n (see the comments at the start of this proof). Thus dim W ≤ n, proving
(1).
(2) Let C be the basis for W we found in (1). C is linearly independent in V , so we can extend C to a
basis B of V (by Corollary 2(c)). We have C ⊆ B, |C| = dim W ≤ n, and |B| = dim V = n. If C = B, then
we would have W = span(C) = span(B) = V , contradiction. So C ⊂ B, so |C| < |B|, proving (2).
Here is another cute result that is sometimes useful.
Corollary 2 (continued). Assume V is a finite-dimensional vector space with dim V = n. Suppose S ⊆ V
is a subset with |S| = n. Then
(1)  S is linearly independent ⇐⇒ S spans V ⇐⇒ S is a basis for V .
Proof. (⇒) By Corollary 2(c), S can be extended to a basis C. Then |C| = n. But S ⊆ C and |S| = n. So
S = C and S is already a basis for V .
(⇐) By Theorem 1.9 (Jan 23), we can shrink S to a basis C. Then |C| = n. But C ⊆ S and |S| = n. So
S = C and S is already a basis for V .
MATH 146 January 30 Section 2
The material in this week’s lectures is not in our textbooks (only in the exercises of Friedberg-Insel-Spence).
Example. V = M2×2 (R),
  W1 = { [ a b; c 0 ] : a, b, c ∈ R },      W2 = { [ 0 b; b d ] : b, d ∈ R }.
Warm-up: what is dim W1 ? dim W2 ? W1 ∩ W2 ? dim(W1 ∩ W2 )?
Definition. Suppose V is a vector space and W1 , W2 are subspaces. Their sum is the set
  W1 + W2 := {w1 + w2 : wi ∈ Wi for i = 1, 2} ⊆ V.
Example. V = M2×2 (R), W1 and W2 as in the previous lecture. W1 + W2 = V .
Example. V = R3 , W1 = {the x-axis} = span(e1 ) and W2 = {the y-axis} = span(e2 ). Then
W1 + W2 = {(a, 0, 0) + (0, b, 0) : a, b ∈ R} = {(a, b, 0) : a, b ∈ R} = {the x, y-plane} = span(e1 , e2 ).
Proposition 1. If W1 , W2 are subspaces of V , then W1 + W2 is a subspace of V . Moreover, W1 + W2
contains both W1 and W2 .
Proof. 0 = 0 + 0 ∈ W1 + W2 , so W1 + W2 ̸= ∅.
Suppose x, y ∈ W1 + W2 (must show x + y ∈ W1 + W2 ). By definition this means x = w1 + w2 and
y = u1 + u2 for some w1 , u1 ∈ W1 and w2 , u2 ∈ W2 . So
  x + y = (w1 + w2 ) + (u1 + u2 ) = (w1 + u1 ) + (w2 + u2 )   (by commutativity, associativity),
where w1 + u1 ∈ W1 and w2 + u2 ∈ W2 .
So x + y ∈ W1 + W2 . A similar proof, using one of the distributive axioms, shows W1 + W2 is closed under
scalar multiplication. So W1 + W2 is a subspace of V .
Finally, w1 ∈ W1 =⇒
w1 = w1 + 0 ∈ W1 + W2
so W1 ⊆ W1 + W2 . Similarly, w2 = 0 + w2 ∈ W1 + W2 shows W2 ⊆ W1 + W2 .
Remark. W1 + W2 is the smallest subspace containing both W1 and W2 ; that is, every subspace that
contains W1 and W2 also contains W1 + W2 .
Proposition 2. span(S ∪ T ) = span(S) + span(T ).
Proof. (⊆) S ⊆ span(S) ⊆ span(S) + span(T ). Similarly, T ⊆ span(T ) ⊆ span(S) + span(T ). So
S ∪ T ⊆ span(S) + span(T ).
Since the RHS is a subspace (Prop. 1), we get
span(S ∪ T ) ⊆ span(S) + span(T )
by Theorem 1.6.
(⊇) Clearly span(S ∪ T ) is a subspace containing both span(S) and span(T ). Thus span(S ∪ T ) contains
the smallest subspace containing span(S) and span(T ), which by a Remark is span(S) + span(T ).
Remark. Can define W1 + W2 + · · · + Wk similarly. It is always a subspace, and is the smallest subspace
containing all W1 , . . . , Wk . span(S1 ∪ · · · ∪ Sk ) = span(S1 ) + · · · + span(Sk ).
Definition. If W1 , W2 are subspaces of V , then we say that V is the direct sum of W1 and W2 , and we
write V = W1 ⊕ W2 , provided:
(1) W1 + W2 = V .
(2) Every v ∈ V can be written uniquely as v = w1 + w2 with w1 ∈ W1 and w2 ∈ W2 . This means if
v ∈ V and v = w1 + w2 = w1′ + w2′ with wi , wi′ ∈ Wi for i = 1, 2, then wi = wi′ for i = 1, 2.
Remark. We view this as “decomposing” or “factoring” the space V . This is especially useful in analyzing
the structure of linear operators on vector spaces (MATH 245).
Example. Let V = M2×2 (R) and W1 be as before. Let W3 = { [ 0 b; 0 b ] : b ∈ R }. Then V = W1 ⊕ W3 ,
because every [ a b; c d ] ∈ V can be written
  [ a b; c d ] = [ a (b − d); c 0 ] + [ 0 d; 0 d ],
where the first summand is in W1 and the second is in W3 .
[Figure: a translate v + W of the subspace W, with the observation that v + W = v′ + W for two different representatives v and v′.]
MATH 146 February 3 Section 2
Lemma 1. Let W be a subspace of V . Let u, v ∈ V . Then
u + W = v + W ⇐⇒ u − v ∈ W.
Proof. (⇒) Assume u + W = v + W . Observe that u ∈ u + W . So u ∈ v + W , meaning u = v + w for
some w ∈ W . Then u − v ∈ W .
(⇐) Assume u − v =: w ∈ W . Then
u + W = (v + w) + W = v + (w + W ) = v + W (as w + W = W ; exercise.) □
Lemma 2. Let W be a subspace of V . The translations of W partition V ; that is,
(1) Their union is V , and
(2) If u + W ̸= v + W then (u + W ) ∩ (v + W ) = ∅.
Proof. (1) is obvious (v ∈ v + W for each v ∈ V ). To prove (2), we prove its contrapositive. Assume
(u + W ) ∩ (v + W ) ̸= ∅. Choose x ∈ (u + W ) ∩ (v + W ). This means
  x = u + w       for some w ∈ W
    = v + w′      for some w′ ∈ W .
Then u + w = v + w′ , so u − v = w′ − w ∈ W . So u + W = v + W by Lemma 1. □
Note: the translations of W are the equivalence classes of an equivalence relation we can define on V ,
called “congruence mod W .” The definition is
u≡v (mod W ) ⇐⇒ u − v ∈ W.
Recall that for a fixed subspace W , V /W denotes the set of all translations of W . Thus
elements of V /W ↭ (certain) subsets of V
We use vectors in V to “name” elements of V /W (v “names” v + W ), but the vector names are not the
translations. In particular, elements of V /W can be “named” in more than one way. For example, if
V = R2 , W = span({(1, 1)}), and S = (−1, 0) + W , then also S = (0, 1) + W , since
(−1, 0) − (0, 1) = (−1, −1) ∈ W, so (−1, 0) + W = (0, 1) + W (Lemma 1).
We want to turn the set V /W into a vector space. So we need to define addition and scalar multiplication
on the elements of V /W . The idea is illustrated in the following example.
Example. V = R2 , W = span({(1, 1)}), S = (−1, 0) + W , T = (2, −1) + W .
[Figure: in R2 , the parallel lines W, S, T and S + T, with u = (−1, 0) marked as a representative of S.]
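For concreteness, here is the computation the picture suggests (a small worked example of mine, using the addition on V /W that is spelled out in the Proposition below): S + T = ((−1, 0) + (2, −1)) + W = (1, −1) + W . The answer does not depend on the representatives chosen: for instance S = (0, 1) + W as noted earlier, and ((0, 1) + (2, −1)) + W = (2, 0) + W , which equals (1, −1) + W by Lemma 1 since (2, 0) − (1, −1) = (1, 1) ∈ W .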
Remark. Similarly, if F is the field of scalars for V , then we can define scalar multiplication on V /W by
  c · (v + W ) := (cv) + W
and there is no problem (it is well-defined: if v + W = v ′ + W then (cv) + W = (cv ′ ) + W ).
Proposition. V /W with the operations defined as above is a vector space over F.
Proof. Ugh. There are 8 axioms that we must verify. Some are easy: e.g., commutativity (VS 1). Suppose
S, T ∈ V /W , say S = x + W and T = y + W . Then
  S + T = (x + W ) + (y + W ) = (x + y) + W = (y + x) + W = (y + W ) + (x + W ) = T + S,
where the middle equality is an application of (VS 1) in V and the other steps use the definition of + on V /W .
Or left distributivity (VS 7):
  a(S + T ) = a((x + W ) + (y + W )) = a((x + y) + W ) = (a(x + y)) + W = (ax + ay) + W
            = (ax + W ) + (ay + W ) = a(x + W ) + a(y + W ) = aS + aT,
where the step (a(x + y)) + W = (ax + ay) + W uses (VS 7) in V and the remaining steps use the definitions
of the operations on V /W .
Existence of a zero vector is more interesting. What is the zero vector in V /W ? (It is W .)
And the additive inverse of v+W is (−v)+W , because obviously (v+W )+((−v)+W ) = 0+W = W . □
Now that we know that V /W is a vector space, we can talk about linear combinations of “vectors” (i.e.,
translations of W ), linear independence, span, etc. The next theorem tells us what is dim(V /W ) when
dim V is finite.
Theorem. Suppose V is finite-dimensional, say dim V = n. Let W be a subspace, say of dimension k.
Then dim(V /W ) = n − k = dim V − dim W .
Proof. Let B = {v1 , . . . , vk } be a basis for W . We can view B as a linearly independent subset of V . So
we can extend B to a basis C = {v1 , . . . , vk , vk+1 , . . . , vn } for V . Let D := {vk+1 + W, . . . , vn + W }. We
will show that D is a basis for V /W . Since |D| = n − k, this will prove the theorem.
We first show that D is linearly independent (in V /W ). Suppose ak+1 , . . . , an ∈ F and
(∗) ak+1 (vk+1 + W ) + · · · + an (vn + W ) = W.
Using the definitions of the operations in V /W , we can simplify this to
(ak+1 vk+1 + · · · + an vn ) + W = W.
By Lemma 1, this implies
ak+1 vk+1 + · · · + an vn ∈ W,
so
  ak+1 vk+1 + · · · + an vn = b1 v1 + · · · + bk vk   for some b1 , . . . , bk ∈ F,
since {v1 , . . . , vk } spans W . Then
(−b1 )v1 + · · · + (−bk )vk + ak+1 vk+1 + · · · + an vn = 0.
This is a linear combination of C. Since C is linearly independent, all the coefficients are 0. In particular,
ak+1 = · · · = an = 0. This proves that the only solution to (∗) is the trivial one, so D is linearly
independent.
Next we will show that D spans V /W . Let v + W ∈ V /W be given. Because C spans V , we can write
v = a1 v1 + · · · + ak vk + ak+1 vk+1 + · · · + an vn for some a1 , . . . , an ∈ F.
Claim. v + W = ak+1 (vk+1 + W ) + · · · + an (vn + W ) (so v + W ∈ span(D)).
Proof of Claim. The RHS of the Claim simplifies to u + W where u = ak+1 vk+1 + · · · + an vn . We want to
prove v + W = u + W . We can use Lemma 1: it is enough to prove v − u ∈ W . Well,
v − u = (a1 v1 + · · · + ak vk + ak+1 vk+1 + · · · + an vn ) − (ak+1 vk+1 + · · · + an vn )
= a1 v1 + · · · + ak vk ,
which is in W because v1 , . . . , vk ∈ W and W is a subspace. □
MATH 146 February 6 Section 2
Every element of V can be written as on the LHS, and in just one way (Thm. 1.8, Jan 23), so there is no
issue of being well-defined.
Clearly T vi = wi for each i = 1, . . . , n since
  T vi = T (0·v1 + · · · + 1·vi + · · · + 0·vn ) = 0·w1 + · · · + 1·wi + · · · + 0·wn = wi
(using the definition of T ).
Check that T is linear. Let x, y ∈ V . Say x = a1 v1 + · · · + an vn and y = b1 v1 + · · · + bn vn . Then
  T (x + y) = T ((a1 + b1 )v1 + · · · + (an + bn )vn ) = (a1 + b1 )w1 + · · · + (an + bn )wn
            = (a1 w1 + · · · + an wn ) + (b1 w1 + · · · + bn wn ) = T x + T y.
MATH 146 February 8 Section 2
MATH 146 February 10 Section 2
Proof. Suppose dim V = n and dim Ker(T ) = k. We need bases for V , Ker(T ), and Ran(T ) which are
related, so we can compare their sizes.
Start with a basis S = {v1 , . . . , vk } for Ker T . S is linearly independent in V , so we can extend S to a
basis B = {v1 , . . . , vk , x1 , . . . , xn−k } for V . Since B spans V , it follows that
  {T v1 , . . . , T vk , T x1 , . . . , T xn−k } spans Ran(T )   (Theorem 2.2),
and T v1 = · · · = T vk = 0 since v1 , . . . , vk ∈ Ker(T ).
Let C = {T x1 , . . . , T xn−k }. So C spans Ran(T ). We’ll show that |C| = n−k and C is linearly independent.
Assume first that ∃i ̸= j with T xi = T xj . Then T (xi − xj ) = T xi − T xj = 0, so xi − xj ∈ Ker(T ). Then
xi − xj = a1 v1 + · · · + ak vk for some scalars a1 , . . . , ak . But then xi can be written as a linear combination
of other vectors in B, contradicting linear independence of B. So T xi ̸= T xj ∀i ̸= j.
Next let’s show that C is linearly independent. Suppose c1 , . . . , cn−k ∈ F and
c1 T x1 + · · · + cn−k T xn−k = 0.
Then
T (c1 x1 + · · · + cn−k xn−k ) = 0,
so
c1 x1 + · · · + cn−k xn−k ∈ Ker(T ).
Recall that {v1 , . . . , vk } spans Ker(T ), so ∃a1 , . . . , ak ∈ F with
c1 x1 + · · · + cn−k xn−k = a1 v1 + · · · + ak vk .
So
(−a1 )v1 + · · · + (−ak )vk + c1 x1 + · · · + cn−k xn−k = 0.
But this is a linear combination of B, and B is linearly independent. So c1 = · · · = cn−k = 0 (and also
a1 = · · · = ak = 0). This proves C is linearly independent.
So C is a basis for Ran(T ), and |C| = n − k. This proves dim Ran(T ) = n − k = dim V − dim Ker(T ).
(⇐) Assume dim V = dim W = n. Let {v1 , . . . , vn } be a basis for V . Let {w1 , . . . , wn } be a basis for W .
Let T be the unique linear transformation T : V → W sending vi 7→ wi for i = 1, . . . , n (Theorem 2.6).
Then
Ran(T ) = span(T v1 , . . . , T vn ) (Theorem 2.2)
= span(w1 , . . . , wn ) = W as {w1 , . . . , wn } is a basis for W
so T is surjective. Since dim V = dim W < ∞, T is also injective (Theorem 2.5). □
Definition 2.20. We write V ≅ W to mean ∃ an isomorphism T : V → W .
Example.
(1) Pn (F) ≅ Fn+1 (since both have dimension n + 1).
(2) Mm×n (F) ≅ Fmn (since both have dimension mn).
(3) C “as a vector space over R” ≅ R2 (both have dimension 2).
MATH 146 February 13 Section 2
MATH 146 February 15 Section 2
§2.2. Coordinatization
Definition. Let V be a finite-dimensional vector space with dim V = n. An ordered basis for V is a
basis B = {v1 , . . . , vn } with a specified order (or if you are Patrick, indexing). I’ll write B = (v1 , . . . , vn ).
Example. Let V = Fn . The standard ordered basis is E = (e1 , . . . , en ).
Definition. Let V be a finite-dimensional vector space over F, let B = (v1 , . . . , vn ) be an ordered basis
for V , and let v ∈ V . Recall (Theorem 1.8, Jan. 23) that we can write
(∗) v = a1 v1 + · · · + an vn
in exactly one way. The unique scalars a1 , . . . , an satisfying (∗) are called the coordinates of v with
respect to the ordered basis B. The coordinate vector of v with respect to B, denoted [v]B , is the column
vector (a1 , . . . , an ) ∈ Fn . If also w = b1 v1 + · · · + bn vn , then v + w = (a1 + b1 )v1 + · · · + (an + bn )vn ,
which tells us that [v + w]B = (a1 + b1 , . . . , an + bn ), which equals [v]B + [w]B . A similar proof shows
[αv]B = α[v]B .
Now suppose V, W are finite-dimensional vector spaces over F, say dim V = n and dim W = m. Let
A = (v1 , . . . , vn ) and B = (w1 , . . . , wm ) be ordered bases for V and W , and let T : V → W be linear. Note that:
• T is completely determined by its values T v1 , . . . , T vn on A (by Theorem 2.6, Feb. 6).
• Each T vj is determined, relative to B, by its coordinate vector [T vj ]B ∈ Fm .
Definition. In this context, the matrix of T with respect to A and B is the m × n matrix over F
whose columns are [T v1 ]B , . . . , [T vn ]B :
  [ [T v1 ]B   · · ·   [T vj ]B   · · ·   [T vn ]B ].
We denote this matrix by [T ]BA (preferred) or [T ]B_A (as in the text).
Example. Consider D : P3 (R) → P2 (R) given by D(f ) = f ′. Choosing ordered bases A = (1, x, x2 , x3 )
and B = (1, x, x2 ), we calculate
  D(1)  = 0    = 0(1) + 0(x) + 0(x2 )
  D(x)  = 1    = 1(1) + 0(x) + 0(x2 )
  D(x2 ) = 2x  = 0(1) + 2(x) + 0(x2 )
  D(x3 ) = 3x2 = 0(1) + 0(x) + 3(x2 )
so
  [D]BA = [ 0 1 0 0 ]
          [ 0 0 2 0 ]
          [ 0 0 0 3 ].
Given T : V → W with dim V = n and dim W = m, ordered bases A and B for V and W , and a vector
v ∈ V , we have:
• the vector [v]A ∈ Fn .
• the vector [T v]B ∈ Fm .
• the matrix [T ]BA ∈ Mm×n (F).
Theorem 2.14. In the context above, [T ]BA [v]A = [T v]B .
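As a quick check of Theorem 2.14 on the derivative example above (a NumPy sketch of mine): with f = 1 + 2x + 3x^2 + 4x^3 we have f ′ = 2 + 6x + 12x^2, and multiplying [D]BA by [f ]A reproduces [f ′]B.

    import numpy as np

    D = np.array([[0, 1, 0, 0],
                  [0, 0, 2, 0],
                  [0, 0, 0, 3]])     # [D]_BA from the example above

    f_A = np.array([1, 2, 3, 4])     # coordinates of f = 1 + 2x + 3x^2 + 4x^3 in A = (1, x, x^2, x^3)
    print(D @ f_A)                   # [ 2  6 12] = coordinates of f' = 2 + 6x + 12x^2 in B = (1, x, x^2)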
MATH 146 February 17 Section 2
Theorem 2.14. Fix V, W vector spaces over F, dim V = n, dim W = m, A an ordered basis for V , B
an ordered basis for W . Let T : V → W be linear and v ∈ V . Then
[T ]BA [v]A = [T v]B .
Proof. Write v = a1 v1 + · · · + an vn . Then
T v = T (a1 v1 + · · · + an vn ) = a1 T v1 + · · · + an T vn .
So
  [T v]B = [a1 T v1 + · · · + an T vn ]B
         = a1 [T v1 ]B + · · · + an [T vn ]B        because [ ]B is linear
         = [T ]BA (a1 , . . . , an )                (a linear combination of the columns of [T ]BA )
         = [T ]BA [v]A . □
[Diagram: the square relating T : V → W to TB (= LB ) : Fn → Fm , where B = [T ]BA , via the coordinate
isomorphisms [ ]A : V ≅ Fn and [ ]B : W ≅ Fm .]
Again with V, W, A, B fixed, but now allowing T to vary, we can view [ ]BA as a map L(V, W ) → Mm×n (F).
Theorem 2.8. With V, W, A, B as above, [ ]BA : L(V, W ) → Mm×n (F) is an isomorphism.
Proof. At the end of these notes. □
Note: we’ve already seen something like this in a special case: when V = Fn and W = Fm . In that case,
we had L : Mm×n (F) ≅ L(Fn , Fm ). It turns out that if we choose the standard ordered bases En and Em
for Fn and Fm , then [ ]Em En : L(Fn , Fm ) ≅ Mm×n (F) and [ ]Em En is the inverse map to L.
§2.3
It’s time to define multiplication of matrices. Suppose A ∈ Mk×m (F) and B ∈ Mm×n (F). We will define
AB to be a k × n matrix as follows. First, an example:
  A = [ 1  2  3  0 ]          B = [ 2   0 ]
      [ 0 −1  1  5 ]              [ 1  −4 ]
      [ 1  0  2  4 ]              [ 3   2 ]
                                  [ 1   2 ]
Write B = [b1 b2 ]. The first column of AB is
  Ab1 = 2·(1, 0, 1) + 1·(2, −1, 0) + 3·(3, 1, 2) + 1·(0, 5, 4) = (13, 7, 12),
and the second column of AB is
  Ab2 = 0·(1, 0, 1) + (−4)·(2, −1, 0) + 2·(3, 1, 2) + 2·(0, 5, 4) = (−2, 16, 12).
So
  AB = [ 13  −2 ]
       [  7  16 ]
       [ 12  12 ].
Also note that if you want to compute just one entry of AB, say the row-2, column-2 entry, just multiply
the elements of row 2 of A with the elements of column 2 of B and add: (0)(0) + (−1)(−4) + (1)(2) + (5)(2) = 16.
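For readers who want to double-check the arithmetic, here is the same product in NumPy (my own check, not part of the notes):

    import numpy as np

    A = np.array([[1, 2, 3, 0],
                  [0, -1, 1, 5],
                  [1, 0, 2, 4]])
    B = np.array([[2, 0],
                  [1, -4],
                  [3, 2],
                  [1, 2]])

    print(A @ B)
    # [[13 -2]
    #  [ 7 16]
    #  [12 12]]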
  [T vi ]B = (0, . . . , 0)   (the zero column vector in Fm ).
Let B = (w1 , . . . , wm ). The equation displayed above is equivalent to
T vi = 0w1 + · · · + 0wm
which implies T vi = 0 (the zero vector of W ) for all i = 1, . . . , n. Thus if v ∈ V is an arbitrary vector,
then v = a1 v1 + · · · + an vn for some scalars a1 , . . . , an , and
T v = T (a1 v1 + · · · + an vn ) = a1 T v1 + · · · + an T vn = a1 0 + · · · + an 0 = 0,
which proves that T is the constant zero transformation, i.e., T = 0.
We’ve shown ker([ ]BA ) ⊆ {0}, and the opposite inclusion is obvious, so ker([ ]BA ) = {0} and hence [ ]BA
is injective.
Finally, we show that [ ]BA is surjective. Let A ∈ Mm×n be given. Recall that B = (w1 , . . . , wm ). For
each i = 1, . . . , n, let the i-th column of A be (a1i , a2i , . . . , ami ),
and define xi = a1i w1 + a2i w2 + · · · + ami wm ∈ W . Using Theorem 2.6 (Feb 6), there exists a (unique)
linear transformation T : V → W satisfying T vi = xi for i = 1, . . . , n. Obviously
  [T vi ]B = (a1i , a2i , . . . , ami ) = the i-th column of A,
so [T ]BA and A have exactly the same columns. Hence [T ]BA = A, proving [ ]BA is surjective. □
MATH 146 February 27 Section 2
MATH 146 March 1 Section 2
Recall: If A ∈ Mn (F), we say that A is invertible if there exists B ∈ Mn (F) with AB = BA = In . Such
B is called an inverse of A.
Proof. (1) We just need to show that B := A works as an inverse to A−1 , i.e., A−1 A = AA−1 = I, which
is immediate since A−1 is the inverse to A.
(2) (αA)((1/α)A−1 ) = (α · (1/α))(AA−1 ) = 1·I = I. Similarly for the product in the other direction.
(3) By associativity, (AB)(B −1 A−1 ) = ((AB)B −1 )A−1 = (A(BB −1 ))A−1 = (AI)A−1 = AA−1 = I.
Similarly for the product in the other order. □
Definition. Given T ∈ L(V, W ), we say that T is invertible if there exists S ∈ L(W, V ) such that
ST = IV and T S = IW , where IV : V → V is the identity transformation on V , and similarly for IW . Such
an S is called an inverse of T .
(2) (⇒) Assume T is invertible, so an inverse S ∈ L(W, V ) exists. We have ST = IV , which is a bijection;
so T is injective. And we have T S = IW , which is a bijection; so T is surjective. Hence T is a bijection,
so is an isomorphism.
(⇐) Assume T is an isomorphism. So T is a bijection from V to W . Hence the inverse map T −1 : W → V
exists and is defined by T −1 (w) = v iff T v = w. We need to show that T −1 ∈ L(W, V ) and that T and T −1
satisfy the compositions T T −1 = IW and T −1 T = IV . The compositions are easily checked. We need to
prove T −1 is linear. Suppose w1 , w2 ∈ W ; let v1 , v2 ∈ V be such that T vi = wi (and T −1 wi = vi ). We know
T (v1 + v2 ) = T v1 + T v2 = w1 + w2 ; hence T −1 (w1 + w2 ) = v1 + v2 = T −1 w1 + T −1 w2 . So T −1 preserves +.
A similar proof works for scalar multiplication. □
Theorem 2.18. Let V, W be vector spaces over F with dim(V ) = n and dim(W ) = m and ordered bases
A and B. Let T ∈ L(V, W ). Then T is invertible iff n = m and [T ]BA is invertible.
MATH 146 March 3 Section 2
Recall:
Theorem 2.18. Let V, W be vector spaces over F with dim(V ) = n and dim(W ) = m and ordered bases
A and B. Let T ∈ L(V, W ). Then T is invertible iff n = m and [T ]BA is invertible.
Definition. Given A ∈ Mm×n (F), its transpose is the matrix At ∈ Mn×m (F) obtained from A by
“switching rows and columns.”
Example. If A = [ 1 2 4 ]    then At = [ 1 0 ]
                [ 0 1 5 ],             [ 2 1 ]
                                       [ 4 5 ].
MATH 146 March 6 Section 2
Definition. Given a matrix A ∈ Mm×n (F), the transpose of A is the matrix At ∈ Mn×m (F) obtained
from A by “switching rows and columns.”
Formally, the (i, j)-entry of At is the (j, i)-entry of A.
Properties of Transpose.
(1) (A + B)t = At + B t
(2) (αA)t = α(At )
(3) (At )t = A
(4) (AB)t = B t At (whenever A is m × n and B is n × k).
Proof. (1)–(3) are obvious, since switching rows with columns commutes with addition and scalar multi-
plication.
(4) is not obvious, but can be deduced as follows. AB is m × k, so (AB)t is k × m. B t is k × n and At
is n × m, so B t At is also k × m. Finally,
(i, j)-entry of (AB)t = (j, i)-entry of AB
= sum of products of pairs from row j of A and column i of B
= sum of products of pairs from column j of At and row i of B t
= (i, j) entry of B t At .
Thus (AB)t and B t At have the same dimensions and exactly the same entries, so (AB)t = B t At . □
Corollary. Let A ∈ Mn (F). If A is invertible, then so is At and (At )−1 = (A−1 )t .
Proof. From A−1 A = In we get (A−1 A)t = (In )t which simplifies to At (A−1 )t = In . Then the 2nd Corollary
from March 3 gives that At is invertible and (A−1 )t = (At )−1 . □
Fundamental Subspaces
One of the most interesting facts about ranks is that rank(A) = rank(At ), or equivalently, Col(A) and
Row(A) have the same dimension. Before proving it, let me state the key idea on which the proof rests.
Lemma. Suppose A ∈ Mm×n (F) and B ∈ Mm×r (F). The following are equivalent:
(1) Col(A) ⊆ Col(B).
(2) There exists C ∈ Mr×n (F) with A = BC.
Proof. (2) ⇒ (1). Assume A = BC with C = [c1 · · · cn ]. Then the columns of A are Bc1 , Bc2 , . . . , Bcn .
Recall that Bx is a linear combination of the columns of B and so is in Col(B) (for any x ∈ Fr ). In
particular, every column of A is in Col(B), and hence Col(A) ⊆ Col(B).
(1) ⇒ (2). Write A = [a1 · · · an ] and B = [b1 · · · br ]. Since Col(A) ⊆ Col(B) we get that a1 , . . . , an ∈
span{b1 , . . . , br }. Consider a1 . By the previous remarks, there exist scalars α1 , . . . , αr ∈ F so that a1 =
α1 b1 + · · · + αr br . Define
  c1 = (α1 , . . . , αr ) ∈ Fr   (a column vector).
Then Bc1 = α1 b1 + · · · + αr br = a1 . Similarly we can find c2 , . . . , cn ∈ Fr so that Bci = ai for each
i = 1, . . . , n. Let C = [c1 · · · cn ] ∈ Mr×n (F). Then
BC = B[c1 · · · cn ] = [Bc1 · · · Bcn ] = [a1 · · · an ] = A. □
Theorem 1. For any A ∈ Mm×n (F), rank(A) = rank(At ).
Proof. Write A = [a1 a2 · · · an ] and let r = rank(A). Let {b1 , . . . , br } ⊆ Fm be a basis for Col(A).
Let B = [b1 · · · br ] ∈ Mm×r (F). By definition, Col(B) = Col(A). Hence by the Lemma, there exists
C ∈ Mr×n (F) with
A = BC.
Taking transposes of both sides gives
At = (BC)t = C t B t .
Then the Lemma gives Col(At ) ⊆ Col(C t ). Hence
rank(At ) = dim Col(At ) ≤ dim Col(C t ) = dim Row(C) ≤ r since C has only r rows.
In other words, rank(At ) ≤ rank(A). Applying this to At gives rank((At )t ) ≤ rank(At ), so rank(A) =
rank(At ). □
Corollary. For any matrix A, dim Row(A) = dim Col(A) = rank(A).
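A quick numerical illustration of rank(A) = rank(At ) (a NumPy sketch of mine, using a random integer matrix):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, size=(4, 6)).astype(float)

    # The two ranks always agree, illustrating Theorem 1.
    print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))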
MATH 146 March 8 Section 2
§3.1
Consider a system (S) of m linear equations in n unknowns:
        a11 x1 + a12 x2 + · · · + a1n xn = b1
        a21 x1 + a22 x2 + · · · + a2n xn = b2
(S)       ⋮
        am1 x1 + am2 x2 + · · · + amn xn = bm
We always have a field F in mind. The coefficients aij and right-hand sides bi belong to F, and we are
looking for solutions x1 , . . . , xn ∈ F.
We can write the system as
  x1 (a11 , . . . , am1 ) + · · · + xn (a1n , . . . , amn ) = (b1 , . . . , bm )   (columns in Fm )
and hence as
  [ a11  · · ·  a1n ] [ x1 ]   [ b1 ]
  [  ⋮          ⋮  ] [  ⋮ ] = [  ⋮ ]
  [ am1  · · ·  amn ] [ xn ]   [ bm ],
that is, as a matrix-vector equation
Ax = b
where A ∈ Mm×n (F) is the coefficient matrix of (S), b ∈ Fm is the RHS vector of (S), and x ∈ Fn
represents a potential solution.
It is also convenient to form the m × (n + 1) matrix
  (A | b) = [ a11  · · ·  a1n | b1 ]
            [  ⋮          ⋮  |  ⋮ ]
            [ am1  · · ·  amn | bm ]
which encodes (S) and which we call the augmented matrix of (S).
Example. For the system
2x1 + 3x2 − x3 = 1
x1 − 2x2 + 4x3 = 2
we have
  A = [ 2  3 −1 ]        (A | b) = [ 2  3 −1 | 1 ]
      [ 1 −2  4 ],                 [ 1 −2  4 | 2 ],
  x = (x1 , x2 , x3 ), and b = (1, 2).
Goal: to develop techniques to
(1) Solve a system of linear equations.
(2) Determine the rank of a matrix A.
(3) Determine whether a square matrix A is invertible, and if so, to find A−1 .
We will do all of these via manipulations of matrices called elementary operations.
Definition. An elementary row operation (on matrices) is any one of the following actions:
(1) Switching two rows. Ri ⇆ Rj
(2) Multiplying one row by a nonzero scalar. Ri ← αRi (α ̸= 0)
(3) Adding a scalar multiple of one row to another row. Ri ← Ri + aRj
An elementary column operation is any action of the above kinds, but with rows replaced by columns.
We use notation Ci ⇆ Cj , Ci ← αCi etc.
An elementary row operation applies to m × n matrices where m is at least as large as the index i (or
indices i, j) of the row(s) it references. Similarly, an elementary column operation applies to m×n matrices
where n is at least as large as the index i (or indices i, j) of the column(s) it references.
Notation. If A ∈ Mm×n , O is an elementary operation that can be applied to A, and A′ is the result of
applying O to A, then we write A −O→ A′ .
Newton’s 3rd Law of Elementary Operations. To every elementary operation O there is an equal
and opposite elementary operation O−1 of the same kind.
For example, if O is the row operation Ri ← Ri + αRj , then O−1 is Ri ← Ri + (−α)Rj . Clearly if
A −O→ A′ then A′ −O−1→ A.
Willard’s Law. For every elementary row operation O there is a transpose column operation Ot , so that
A −O→ B iff At −Ot→ B t . (Just change rows to columns.) Similarly for column operations.
Definition. Let O be an elementary operation and let n be an integer big enough so that O can act on
n × n matrices. The n × n elementary matrix corresponding to O is the matrix E where In −O→ E.
Example.
  [ 1 0 0 ]
  [ 0 0 1 ]   is an elementary matrix of type 1, using either R2 ⇆ R3 or C2 ⇆ C3.
  [ 0 1 0 ]

  [ 1 0 0 ]
  [ 0 1 0 ]   (where α ̸= 0) is an elementary matrix of type 2, using either R3 ← αR3 or C3 ← αC3.
  [ 0 0 α ]

  [ 1 0 0 ]
  [ α 1 0 ]   is an elementary matrix of type 3, using either R2 ← R2 + αR1 or C1 ← C1 + αC2.
  [ 0 0 1 ]
Theorem 3.1. Fix m, n and suppose that O is an elementary column operation which can act on □ × n
matrices. Let In −O→ E, so E is the corresponding n × n elementary matrix. Then for all A ∈ Mm×n (F),
A −O→ AE.
Proof. Write A = [a1 · · · an ] and In = [e1 · · · en ]. Now consider cases.
Case 1: O is Ci ⇆ Cj . (Assume i < j.)
Then E = [e1 · · · ej · · · ei · · · en ]. So
  AE = [Ae1 · · · Aej · · · Aei · · · Aen ] = [a1 · · · aj · · · ai · · · an ],
which is the result of switching columns i and j of A, so A −O→ AE.
Case 2: O is Ci ← αCi (α ̸= 0).
Then E = [e1 · · · αei · · · en ], so
  AE = [Ae1 · · · A(αei ) · · · Aen ] = [Ae1 · · · α(Aei ) · · · Aen ] = [a1 · · · αai · · · an ],
which is the result of multiplying column i of A by α, so A −O→ AE.
Case 3: O is Ci ← Ci + αCj . (Left as an exercise.) □
There is a corresponding result for elementary row operations.
Corollary. Fix m, n and suppose that O is an elementary row operation that can act on m × □ matrices.
Let Im −O→ E, so E is the corresponding m×m elementary matrix. Then for all A ∈ Mm×n (F), A −O→ EA.
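To see Theorem 3.1 and its Corollary numerically (a NumPy sketch of mine; the particular matrices are arbitrary choices):

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [4., 5., 6.]])

    # Row operation R2 <- R2 + 10*R1 applied to the 2x2 identity gives the elementary matrix E_row.
    E_row = np.array([[1., 0.],
                      [10., 1.]])
    print(E_row @ A)          # same as applying R2 <- R2 + 10*R1 directly to A

    # Column operation C1 <-> C3 applied to the 3x3 identity gives the elementary matrix F_col.
    F_col = np.eye(3)[:, [2, 1, 0]]
    print(A @ F_col)          # same as swapping columns 1 and 3 of A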
MATH 146 March 10 Section 2
Theorem 3.1. Suppose O is an elementary column operation which can act on □ × n matrices. Let E be
the corresponding n × n elementary matrix, i.e., In −O→ E. Then for all A ∈ Mm×n (F), A −O→ AE.
Corollary 1. Suppose O is an elementary row operation that can act on m × □ matrices. Let Im −O→ E.
Then for all A ∈ Mm×n (F), A −O→ EA.
Proof sketch. Ot is an elementary column operation that can act on □ × m matrices. We have Im −O→ E,
so Im = Imt −Ot→ E t . We can apply Ot to At and by Theorem 3.1 we get At −Ot→ At E t = (EA)t . So, by
Willard’s Law, A −O→ EA. □
Now we can prove the following intuitively obvious result.
Theorem 3.2. Elementary matrices are invertible. Moreover, if E is the n × n elementary matrix corre-
sponding to O, then E −1 is the n × n elementary matrix corresponding to O−1 .
Proof. Let In −O−1→ F . Then F −O→ In .
Case 1: O is a column op. Then F −O→ F E (Theorem 3.1), so F E = In . Since E, F are square, we get
E is invertible and F = E −1 (2nd Corollary, March 3).
Case 2: O is a row op. Then F −O→ EF (Corollary 1). So EF = In . Use the same logic. □
§3.2
Notation. Suppose A, B ∈ Mm×n (F). We write A ⇝ B if there exists a sequence of elementary operations
that can transform A to B. We write A ⇝row B if we can do this using just row operations; similarly for
A ⇝col B.
Consider the following sequence O1 , . . . , O11 which witnesses

  A = [ 2  4  1  0 ]        [ 1 0 0 0 ]
      [ −1 −2 1  3 ]   ⇝    [ 0 1 0 0 ]  = B :
      [ 3  6  0 −3 ]        [ 0 0 0 0 ]

  A  −O1 : R1 ⇆ R2→        [ −1 −2 1 3 ; 2 4 1 0 ; 3 6 0 −3 ]
     −O2 : R1 ← (−R1)→     [ 1 2 −1 −3 ; 2 4 1 0 ; 3 6 0 −3 ]
     −O3 : R2 ← R2 − 2R1→  [ 1 2 −1 −3 ; 0 0 3 6 ; 3 6 0 −3 ]
     −O4 : R3 ← R3 − 3R1→  [ 1 2 −1 −3 ; 0 0 3 6 ; 0 0 3 6 ]
     −O5 : C2 ← C2 − 2C1→  [ 1 0 −1 −3 ; 0 0 3 6 ; 0 0 3 6 ]
     −O6 : C3 ← C3 + C1→   [ 1 0 0 −3 ; 0 0 3 6 ; 0 0 3 6 ]
     −O7 : C4 ← C4 + 3C1→  [ 1 0 0 0 ; 0 0 3 6 ; 0 0 3 6 ]
     −O8 : R3 ← R3 − R2→   [ 1 0 0 0 ; 0 0 3 6 ; 0 0 0 0 ]
     −O9 : C4 ← C4 − 2C3→  [ 1 0 0 0 ; 0 0 3 0 ; 0 0 0 0 ]
     −O10 : C2 ⇆ C3→       [ 1 0 0 0 ; 0 3 0 0 ; 0 0 0 0 ]
     −O11 : R2 ← (1/3)R2→  [ 1 0 0 0 ; 0 1 0 0 ; 0 0 0 0 ] = B
(writing [ r1 ; r2 ; r3 ] for the matrix with rows r1 , r2 , r3 ).
Observe that O1 , . . . , O4 , O8 , O11 are elementary row operations while O5 , O6 , O7 , O9 , O10 are elementary
column operations. For i = 1, . . . , 11 let Ei be the appropriately sized elementary matrix corresponding to
Oi . (So E1 , . . . , E4 , E8 , E11 are 3 × 3 and E5 , E6 , E7 , E9 , E10 are 4 × 4.) Then
  B = (E11 E8 E4 E3 E2 E1 ) A (E5 E6 E7 E9 E10 ) = P AQ,   where P := E11 E8 E4 E3 E2 E1 and Q := E5 E6 E7 E9 E10 .
Since elementary matrices are invertible (Theorem 3.2), and products of invertible matrices are invertible
(Properties of inverses, Feb 27), we get that P, Q are invertible. This obviously generalizes.
Theorem 1. Let A, B ∈ Mm×n (F).
(1) If A ⇝ B, then B = P AQ for some invertible P ∈ Mm (F) and Q ∈ Mn (F).
(2) If A ⇝row B, then B = P A for some invertible P ∈ Mm (F).
(3) If A ⇝col B, then B = AQ for some invertible Q ∈ Mn (F).
Theorem 3.4. Suppose A ∈ Mm×n (F). Let P ∈ Mm (F) and Q ∈ Mn (F) be invertible.
(1) Col(A) = Col(AQ).
(2) Row(A) = Row(P A).
(3) rank(A) = rank(P A) = rank(AQ) = rank(P AQ).
Proof. (1) Col(AQ) = Ran(TAQ ) = Ran(TA ◦ TQ ). Q is invertible so TQ is an isomorphism and hence is
surjective. So Ran(TA ◦ TQ ) = Ran(TA ) = Col(A).
(2) Row(P A) = Col((P A)t ) = Col(At P t ) = Col(At ) (by (1)) = Row(A).
(3) rank(AQ) = dim Col(AQ) = dim Col(A) = rank(A). Similarly for rank(P A). Finally,
rank(P AQ) = rank((P A)Q) = rank(P A) = rank(A). □
Combining Theorem 3.4 with Theorem 1, we get:
Corollary 2.
(1) If A ⇝row B, then Row(A) = Row(B).
(2) If A ⇝col B, then Col(A) = Col(B).
(3) If A ⇝ B, then rank(A) = rank(B).
Example. Find rank(A) where A = [ 1 1 1 2 ; 2 0 −1 2 ; 1 1 1 2 ].
Solution:
  A  −R2 ← R2 − 2R1, R3 ← R3 − R1→   [ 1 1 1 2 ; 0 −2 −3 −2 ; 0 0 0 0 ]
     −C2 ← C2 − C1→                  [ 1 0 1 2 ; 0 −2 −3 −2 ; 0 0 0 0 ]
     −C2 ← (−1/2)C2→                 [ 1 0 1 2 ; 0 1 −3 −2 ; 0 0 0 0 ]  =: B.
Obviously rank(B) = 2 (the first two columns of B form a basis for Col(B)). Hence rank(A) = 2 by
Corollary 2.
Note that A, B do not have the same column space. But their column spaces have the same dimension.
Also note that, if we wanted, we could continue the above calculation as follows:
  B  −C3 ← C3 − C1, C4 ← C4 − 2C1→    [ 1 0 0 0 ; 0 1 −3 −2 ; 0 0 0 0 ]
     −C3 ← C3 + 3C2, C4 ← C4 + 2C2→   [ 1 0 0 0 ; 0 1 0 0 ; 0 0 0 0 ]  =: D.
This generalizes.
Theorem 3.6. For every nonzero matrix A ∈ Mm×n (F) there exists a matrix D of the form
  D = [ Ir   O  ]
      [ O′   O″ ]
where r ≥ 1 and O, O′, O″ are all-zero matrices, such that A ⇝ D. Obviously rank(D) = r, so rank(A) = r.
Proof sketch. A has a nonzero entry, and using type-1 operations we can move it to the 1,1 position. By
a type-2 operation, we can change it to 1. Then using type-3 operations, we can “clear” the remaining
entries in the first row and column. Thus we have converted A to a matrix A′ of the form
  A′ = [ 1  0  · · ·  0 ]
       [ 0              ]
       [ ⋮       B      ]
       [ 0              ]
If B is all 0s we’re done. Else repeat: we can move a nonzero entry of B to the 2,2 position of A′ ; make it
equal 1; and then clear the rest of the 2nd row and column. Etc. □
MATH 146 March 13 Section 2
MATH 146 March 15 Section 2
We’ve seen that every A ⇝ a matrix of the form [ Ir O ; O′ O″ ], and the RHS is unique (for A). What if
we can only use ⇝row?
In the following definition, the leading entry of a nonzero row is the first (left-most) nonzero entry.
For example, in the following matrix
  [ 1 0  2 0 −2 3 ]
  [ 0 1 −1 0  1 1 ]
  [ 0 0  0 1 −2 2 ]
  [ 0 0  0 0  0 0 ]
the leading entries of rows 1–3 are in columns 1, 2 and 4.
Definition. A matrix R ∈ Mm×n (F) is in reduced row echelon form (RREF) if:
(1) All zero rows (if any) are at the bottom.
(2) The leading entry of a nonzero row is always strictly to the right of the leading entries of the rows
above it.
(3) Every leading entry (of a nonzero row) equals 1.
(4) The entries directly above a leading entry (i.e., in the same column) all equal 0.
If R just satisfies (1) and (2), then it is in row echelon form (REF).
Examples (in RREF):
  [ 1 0 0  2 ]      [ 0 1 0 0  2 0 ]      [ 1 2 2 ]
  [ 0 1 0  0 ]      [ 0 0 1 0  0 0 ]      [ 0 0 0 ]
  [ 0 0 1 −1 ]      [ 0 0 0 1 −1 0 ]      [ 0 0 0 ]
Some matrices not in RREF:
  [ 1 0  2 0 ]      [ 0 1 0 2 ]
  [ 0 1 −1 0 ]      [ 0 0 0 1 ]
  [ 0 0  0 2 ]      [ 0 0 0 0 ]
Note: in each RREF example R, the columns containing the leading 1s are standard basis vectors (so
are linearly independent), and span Col(R). This is always the case. Thus
Theorem 1. If R is in RREF, then rank(R) = number of leading ones = number of nonzero rows.
row
Theorem 2. For every A ∈ Mm×n (F) there exists R ∈ Mm×n (F) in RREF such that A ⇝ R.
Proof sketch. Like the proof of Thm 3.6, but no column operations allowed:
• If A = O, then we’re done. Else, find the first nonzero column; pick a nonzero entry in it and use
a row op to move it to the top of that column. (This entry is called a pivot.)
• Using row operations, change the pivot to 1 and “clear” the entries below it. We now have something
like
  A′ = [ 0  0  1  ∗  · · ·  ∗ ]
       [ 0  0  0             ]
       [ ⋮  ⋮  ⋮      B      ]
       [ 0  0  0             ]
• Repeat. If B = O then we are done. Otherwise, find its first nonzero column, pick a nonzero entry
in that column, move it to the top of B (so to the 2nd row of A′ ). (This is our 2nd pivot.)
• Change this pivot to a 1 and use it to clear the entries above and below it to get something like
  A″ = [ 0  0  1  ∗  ∗  0  ∗  · · ·  ∗ ]
       [ 0  0  0  0  0  1  ∗  · · ·  ∗ ]
       [ 0  0  0  0  0  0             ]
       [ ⋮  ⋮  ⋮  ⋮  ⋮  ⋮      C      ]
       [ 0  0  0  0  0  0             ]
Eventually we arrive at an RREF matrix R. We used only row ops, so A ⇝row R. □
Notes
(1) The algorithm sketched in the above proof is known as Gauss-Jordan elimination. Another
variation is described in the example below.
(2) This algorithm is non-deterministic – each time you examine a first nonzero column, you have a
choice in which nonzero element from that column to use as a pivot.
Example. Consider the following matrix A ∈ M4×6 (R):
  A = [ 2  4  1  0  −4  2 ]
      [ 0  0  2 −4   4  4 ]
      [ 1  2  2 −3   1  4 ]
      [ 3  6 −2  7 −13  4 ]
We can transform it to a matrix in RREF as follows.
(1) In the leftmost nonzero column, use elementary row operations (if necessary) to get a 1 in the first
row. (This will be a leading one.)
  A  −R1 ← (1/2)R1→  [ 1 2 1/2 0 −2 1 ; 0 0 2 −4 4 4 ; 1 2 2 −3 1 4 ; 3 6 −2 7 −13 4 ]
(2) By means of type 3 elementary row operations, use the first row to create zeros in the remaining
entries of the leftmost nonzero column; that is, below the leading one created in the previous step.
     −R3 ← R3 − R1, R4 ← R4 − 3R1→  [ 1 2 1/2 0 −2 1 ; 0 0 2 −4 4 4 ; 0 0 3/2 −3 3 3 ; 0 0 −7/2 7 −7 1 ]
(3) Consider the submatrix consisting of the columns to the right of the column we just modified and
the rows beneath the row that just got a leading one. Use elementary row operations (if necessary)
to get a leading one in the top of the first nonzero column of this submatrix.
     −R2 ← (1/2)R2→  [ 1 2 1/2 0 −2 1 ; 0 0 1 −2 2 2 ; 0 0 3/2 −3 3 3 ; 0 0 −7/2 7 −7 1 ]
(4) Use elementary row operations to obtain zeros below the 1 created in the preceding step. (But do
not create zeroes above the leading one now; Gaussian elimination does this later.)
     −R3 ← R3 − (3/2)R2, R4 ← R4 + (7/2)R2→  [ 1 2 1/2 0 −2 1 ; 0 0 1 −2 2 2 ; 0 0 0 0 0 0 ; 0 0 0 0 0 8 ]
(5) Repeat Steps 3 and 4 until no nonzero rows remain. This completes the forward phase.
     −R3 ↔ R4→       [ 1 2 1/2 0 −2 1 ; 0 0 1 −2 2 2 ; 0 0 0 0 0 8 ; 0 0 0 0 0 0 ]
     −R3 ← (1/8)R3→  [ 1 2 1/2 0 −2 1 ; 0 0 1 −2 2 2 ; 0 0 0 0 0 1 ; 0 0 0 0 0 0 ]
(6) Now we will create zeroes above the leading ones. Working backwards, begin with the last nonzero
row and add multiples of it to each row above it to create zeros above its leading one.
     −R1 ← R1 − R3, R2 ← R2 − 2R3→  [ 1 2 1/2 0 −2 0 ; 0 0 1 −2 2 0 ; 0 0 0 0 0 1 ; 0 0 0 0 0 0 ]
(7) Repeat the process in Step 6 for the second-to-last nonzero row, then the third-to-last nonzero row,
etc. until it has been performed to every nonzero row except the first row. This completes the
backward phase. At this point the matrix should be in RREF.
     −R1 ← R1 − (1/2)R2→  [ 1 2 0 1 −3 0 ; 0 0 1 −2 2 0 ; 0 0 0 0 0 1 ; 0 0 0 0 0 0 ]   in RREF.
Note that, in this example, we did not clear the entries above the pivots until the end (and then we
did them in reverse order). This variation of the algorithm is called Gaussian elimination and is the
preferred algorithm for computer implementations, because it is slightly more efficient than the Gauss-
Jordan algorithm.
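Here is a minimal Python sketch of the Gauss-Jordan version of this procedure, using exact arithmetic with Fraction (my own illustration, not code from the course); applied to the example matrix A it reproduces the same RREF.

    from fractions import Fraction

    def rref(rows):
        """Row-reduce a matrix (list of lists) to RREF over Q, Gauss-Jordan style."""
        M = [[Fraction(x) for x in row] for row in rows]
        m, n = len(M), len(M[0])
        pivot_row = 0
        for col in range(n):
            # find a pivot in this column at or below pivot_row
            pr = next((r for r in range(pivot_row, m) if M[r][col] != 0), None)
            if pr is None:
                continue
            M[pivot_row], M[pr] = M[pr], M[pivot_row]          # type 1: swap rows
            piv = M[pivot_row][col]
            M[pivot_row] = [x / piv for x in M[pivot_row]]     # type 2: make the pivot equal 1
            for r in range(m):                                 # type 3: clear the rest of the column
                if r != pivot_row and M[r][col] != 0:
                    factor = M[r][col]
                    M[r] = [a - factor * b for a, b in zip(M[r], M[pivot_row])]
            pivot_row += 1
        return M

    A = [[2, 4, 1, 0, -4, 2],
         [0, 0, 2, -4, 4, 4],
         [1, 2, 2, -3, 1, 4],
         [3, 6, -2, 7, -13, 4]]
    for row in rref(A):
        print([str(x) for x in row])
    # expected rows: [1 2 0 1 -3 0], [0 0 1 -2 2 0], [0 0 0 0 0 1], [0 0 0 0 0 0]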
MATH 146 March 17 Section 2
Theorem 4. Let A ∈ Mm×n (F) and b ∈ Fm . Let S be the solution set to Ax = b and let SH be the solution
set to Ax = 0. Assume that Ax = b is consistent (i.e., S ̸= ∅) and let xp ∈ Fn be one particular solution
(i.e., xp ∈ S). Then
S = xp + SH .
Proof. (⊆) Suppose x ∈ S, so Ax = b. Then A(x − xp ) = Ax − Axp = b − b = 0 so x − xp ∈ SH . Then
x = xp + (x − xp ) ∈ xp + SH .
(⊇) Suppose x ∈ xp + SH . Then x = xp + v for some v ∈ SH . Then Ax = A(xp + v) = Axp + Av = b + 0 = b
so x ∈ S. □
Computational problem: given A ∈ Mm×n (F) and b ∈ Fm , “describe” the solution set S to Ax = b by:
(P1) Determining whether Ax = b is consistent (i.e., whether S ̸= ∅).
(P2) If S ̸= ∅, finding
(a) one solution xp to Ax = b, and (b) a basis {v1 , . . . , vk } for SH (= Null(A)),
so that S = xp + span{v1 , . . . , vk }.
Fact. Let R ∈ Mm×n (F) and b ∈ Fm . If (R | b) is in RREF, then we can easily “read off” a solution to
(P1) and (P2) for Rx = b.
(P1) Rx = b is consistent ⇐⇒ b ∈ Col(R) ⇐⇒ (R | b) does not have a pivot in the last column (b).
    (R | b) = [ 1 2 0 −1 | 0 ]
              [ 0 0 1  4 | 0 ]
              [ 0 0 0  0 | 1 ]     inconsistent (pivot in the last column)

    (R | b) = [ 1 2 0 −1 | 2 ]
              [ 0 0 1  4 | 3 ]
              [ 0 0 0  0 | 0 ]     consistent (no pivot in the last column)
This is because, generally, a pivot column in an RREF matrix is never in the span of the columns
to its left.
(P2) If Rx = b is consistent:
(a) Write the system of equations corresponding to Rx = b. In the 2nd example above, this would
correspond to
x1 + 2x2 − x4 = 2
x3 + 4x4 = 3
0 = 0
(b) The columns of R correspond to the variables x1 , . . . , xn in the equations. The variables
corresponding to columns with a pivot are called pivot variables. The remaining variables
are called free variables.
If r = rank(R), then R has r pivots. So R has n − r free variables. Choose new parameter
names for them s1 , . . . , sk (k = n − r) (for convenience).
In the above example, n = 4 and r = 2, so k = 2. The free variables are x2 , x4 and we
rename them x2 = s1 and x4 = s2 .
(c) Replace the free variables with their parameter names and move them to the RHS:
x1 = 2 − 2s1 + s2
x3 = 3 − 4s2
(d) (Optional) Insert the equations equating the free variables with the parameters.
x1 = 2 − 2s1 + 1s2
x2 = 0 + 1s1 + 0s2
x3 = 3 + 0s1 − 4s2
x4 = 0 + 0s1 + 1s2
(e) Translate to a vector equation:
    (x1 , x2 , x3 , x4 ) = (2, 0, 3, 0) + s1 (−2, 1, 0, 0) + s2 (1, 0, −4, 1)
Here s1 , s2 can freely vary over F.
(f) Thus the solution set S to Rx = b is
    S = (2, 0, 3, 0) + span{ (−2, 1, 0, 0), (1, 0, −4, 1) },

where xp = (2, 0, 3, 0) and { (−2, 1, 0, 0), (1, 0, −4, 1) } is a basis for SH .
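Aside. A small Python sketch of steps (a)–(f): given (R | b) already in RREF and known to be consistent, read off one particular solution and a basis for Null(R). The helper name read_off_solution and the representation of vectors as lists are illustrative assumptions, not course notation.

from fractions import Fraction

def read_off_solution(R, b):
    # R is assumed to be in RREF and the system Rx = b is assumed consistent.
    m, n = len(R), len(R[0])
    pivot_cols = []
    for i in range(m):
        nonzero = [j for j in range(n) if R[i][j] != 0]
        if nonzero:
            pivot_cols.append(nonzero[0])          # column of the leading one in row i
    free_cols = [j for j in range(n) if j not in pivot_cols]
    # particular solution xp: set every free variable to 0
    xp = [Fraction(0)] * n
    for i, c in enumerate(pivot_cols):
        xp[c] = Fraction(b[i])
    # one basis vector of Null(R) per free variable: set that free variable to 1
    basis = []
    for f in free_cols:
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for i, c in enumerate(pivot_cols):
            v[c] = -Fraction(R[i][f])              # moving the free column to the right-hand side
        basis.append(v)
    return xp, basis

R = [[1, 2, 0, -1], [0, 0, 1, 4], [0, 0, 0, 0]]
b = [2, 3, 0]
xp, basis = read_off_solution(R, b)
# xp = (2, 0, 3, 0); basis vectors (-2, 1, 0, 0) and (1, 0, -4, 1), as computed above.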
MATH 146 March 20 Section 2
One more fact about row-reduction to RREF: it preserves “linear dependencies among the columns.”
Consider the example from March 15 lecture notes:
    A = [ 2  4  1   0  −4  2 ]
        [ 0  0  2  −4   4  4 ]
        [ 1  2  2  −3   1  4 ]
        [ 3  6 −2   7 −13  4 ]

row-reduces (A ⇝ R) to

    R = [ 1  2  0   1  −3  0 ]
        [ 0  0  1  −2   2  0 ]
        [ 0  0  0   0   0  1 ]
        [ 0  0  0   0   0  0 ]     in RREF.
Write A = [a1 a2 · · · a6 ] and R = [r1 r2 · · · r6 ]. Observe that:
• r1 ̸= 0. Also a1 ̸= 0.
• r2 = 2r1 . Also a2 = 2a1 .
• r3 ̸∈ span{r1 , r2 }. Also a3 ̸∈ span{a1 , a2 }.
• r4 = r1 − 2r3 . Claim: a4 = a1 − 2a3 .
• By the apparent pattern, we should get a5 = −3a1 + 2a3 and a6 ̸∈ span{a1 , . . . , a5 }, because
r5 = −3r1 + 2r3 and r6 ̸∈ span{r1 , . . . , r5 }.
The formal statement of the pattern observed above is the following theorem.
Theorem 3.16. Suppose A ∈ Mm×n (F) and A ⇝ R (via row operations) with R in RREF. Write A = [a1 a2 · · · an ] and R = [r1 r2 · · · rn ]. Then for all t = 1, . . . , n:
(1) rt = 0 iff at = 0.
(2) For all i1 , . . . , ik ̸= t and α1 , . . . , αk ∈ F,
rt = α1 ri1 + · · · + αk rik ⇐⇒ at = α1 ai1 + · · · + αk aik .
Proof sketch. (Not given in class) Because A ⇝ R via row operations, there exists invertible P ∈ Mm (F) with P A = R and P⁻¹R = A. Hence P ai = ri and P⁻¹ri = ai for i = 1, . . . , n.
(1) Suppose at = 0. Multiply both sides on the left by P to get rt = 0. Conversely, if rt = 0, multiply
both sides on the left by P −1 to get at = 0.
(2) Suppose at = α1 ai1 + · · · + αk aik . Multiply both sides on the left by P (and use linearity of matrix-
vector multiplication) to get rt = α1 ri1 + · · · + αk rik . For the opposite implication, multiply both sides on
the left by P −1 . □
Returning to the example at the top of this page: obviously {r1 , . . . , r6 } spans Col(R). Consider what
will happen if we apply the Naive Algorithm to shrink {r1 , . . . , r6 } to a basis of Col(R). The above bullet
points (starting with r1 ̸= 0) tell us that the output of the algorithm will be {r1 , r3 , r6 }, i.e., the columns
of R containing pivots.
Now consider what will happen if we apply the Naive Algorithm to shrink {a1 , . . . , a6 } to a basis of
Col(A). Because the columns of A satisfy exactly the same linear dependencies as do the columns of R,
the output of the algorithm will be {a1 , a3 , a6 }. Hence {a1 , a3 , a6 } is a basis for Col(A).
These remarks hold generally.
Corollary 1. Suppose A ∈ Mm×n (F) and A ⇝ R (via row operations) with R in RREF. Write A = [a1 a2 · · · an ]. Suppose R has
pivots in columns i1 , i2 , . . . , ir . Then the Naive Algorithm to shrink {a1 , . . . , an } to a basis for Col(A) will
return {ai1 , . . . , air }. Hence the columns of A corresponding to the pivot columns of R form a basis for
Col(A).
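Aside. The same computation can be delegated to a computer algebra system. Assuming sympy is available, Matrix.rref() returns the RREF together with the tuple of pivot column indices, so Corollary 1 can be applied directly; the variable names below are illustrative.

from sympy import Matrix

A = Matrix([[2, 4, 1, 0, -4, 2],
            [0, 0, 2, -4, 4, 4],
            [1, 2, 2, -3, 1, 4],
            [3, 6, -2, 7, -13, 4]])
R, pivot_cols = A.rref()
print(pivot_cols)                                   # (0, 2, 5), i.e. columns 1, 3, 6 in the notation above
basis_for_col_A = [A.col(j) for j in pivot_cols]    # {a1, a3, a6}, a basis for Col(A)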
Lastly, we can use Theorem 3.16 to prove the uniqueness of RREFs.
Corollary 2. For all A ∈ Mm×n (F) there exists only one RREF matrix R with A ⇝ R (via row operations).
Proof sketch. (Not given in class) Suppose A ⇝ R via row operations with R in RREF. Write R = [r1 · · · rn ]. We will show
that the columns of R are determined by A. If A = 0 then R = 0 and R is completely determined. If
A ̸= 0, let i1 be the column number of the first nonzero column of A. Then the first i1 − 1 columns of
R equal 0 (by Theorem 3.16(1)) and column i1 of R equals e1 (by Theorem 3.16(1) and the definition of
RREF). So the first i1 columns of R are determined. Inductively suppose that i1 < t ≤ n and we have
proved that the first t − 1 columns of R are determined. Let the pivot columns of R in the first t − 1
columns be in column numbers i1 , . . . , ik (so the corresponding columns of R are e1 , . . . , ek ). Consider rt .
Case 1: at ∈ span{ai1 , . . . , aik }. Say
(∗) at = α 1 ai 1 + · · · + α k ai k .
Then rt = α1 e1 + · · · + αk ek by Theorem 3.16(2) so rt is determined by (∗), which is determined by A.
Case 2: at ̸∈ span{ai1 , . . . , aik }.
Then rt ̸∈ span{e1 , . . . , ek } by Theorem 3.16(2), so rt is the next pivot column (by definition of RREF),
i.e., rt = ek+1 , which was determined by A. □
MATH 146 March 22 Section 2
MATH 146 March 24 Section 2
Corollary 2. Let E ∈ Mn (F) be elementary, let O be the elementary column operation corresponding to
E, and let α be the nonzero scalar corresponding to O by Corollary 1.
(1) α = det E.
(2) For all A ∈ Mn (F), det(AE) = (det A)(det E).
Proof. (1) In −→ E via the operation O, so det E = α · det In by Corollary 1. Apply (P3).
(2) A −→ AE via the operation O, so det(AE) = α · det A by Corollary 1. Apply (1). □
Lemma 7. For all A ∈ Mn (F) and all elementary matrices E1 , . . . , Eℓ ,
(1) det(AE1 E2 · · · Eℓ ) = (det A)(det E1 )(det E2 ) · · · (det Eℓ ), and
(2) det(E1 E2 · · · Eℓ ) = (det E1 )(det E2 ) · · · (det Eℓ ).
Proof. (Not given in lecture.) (2) follows from (1) by setting A = In . We prove (1) by induction on ℓ. If ℓ = 1 then the claim is just Corollary 2. Inductively,
    det(AE1 · · · Eℓ Eℓ+1 ) = det((AE1 · · · Eℓ )Eℓ+1 )
                            = det(AE1 · · · Eℓ ) · (det Eℓ+1 )                  (Corollary 2)
                            = (det A)(det E1 ) · · · (det Eℓ ) · (det Eℓ+1 ).   (Induction)  □
Theorem 2. det(AB) = (det A)(det B) for all A, B ∈ Mn (F).
Proof. If rank B < n, then rank(AB) ≤ rank B < n and det B = det(AB) = 0 by Theorem 1, so the result holds. If rank B = n, then B is a product of elementary matrices, say B = E1 · · · Eℓ . Apply Lemma 7. □
Theorem 3. det Aᵗ = det A for all A ∈ Mn (F).
Proof. First we prove det Eᵗ = det E whenever E is elementary. Let O be the elementary column operation corresponding to E. If O is of type 1 (Ci ↔ Cj ) or of type 2 (Ci ← αCi ), then Eᵗ = E, as can be seen by inspection, so obviously det Eᵗ = det E. If instead O is of type 3 (Ci ← Ci + βCj ), then Eᵗ is the elementary matrix of the type-3 operation O′ : Cj ← Cj + βCi (as can be seen by inspection). The scalar α associated to both O and O′ by Corollary 1 is α = 1, so det E = det Eᵗ = 1 in this case (by Corollary 2).
Now we prove the theorem in the general case. If rank A < n then rank Aᵗ < n and det A = 0 = det Aᵗ by Theorem 1. If rank A = n, then A = E1 · · · Ek for some elementary matrices Ei , so
    Aᵗ = (E1 · · · Ek )ᵗ = (Ek )ᵗ · · · (E1 )ᵗ,
so
    det Aᵗ = (det(Ek )ᵗ) · · · (det(E1 )ᵗ)       (Lemma 7(2))
           = (det Ek ) · · · (det E1 )           (comments in the first paragraph)
           = det A.                              (Lemma 7(2))  □
Theorem 3 implies that to every “column fact” from Wednesday there is a corresponding “row fact”; for
example, there is a “row version” of Corollary 1.
Theorem 4 (Leibniz formula). Let A ∈ Mn (F), say

    A = [ a11 a12 · · · a1n ]
        [ a21 a22 · · · a2n ]
        [  ⋮   ⋮    ⋱    ⋮  ]
        [ an1 an2 · · · ann ] .

Then

    det A = Σ_{σ∈Sn} sgn(σ) a1σ(1) a2σ(2) · · · anσ(n) .
Proof sketch. Let the rows of A be v1 , . . . , vn , and note that each vi = Σ_{j=1}^{n} aij ej . We will pass to Aᵗ, whose columns are v1 , . . . , vn .

    det A = det Aᵗ = D( Σ_{j1=1}^{n} a1j1 ej1 , Σ_{j2=1}^{n} a2j2 ej2 , . . . , Σ_{jn=1}^{n} anjn ejn )
          = Σ_{j1=1}^{n} Σ_{j2=1}^{n} · · · Σ_{jn=1}^{n} a1j1 a2j2 · · · anjn D(ej1 , ej2 , . . . , ejn ).

Note that each D(ej1 , ej2 , . . . , ejn ) = 0 unless all of j1 , . . . , jn are distinct, in which case the map i ↦ ji is a permutation σ. Thus

    det A = Σ_{σ∈Sn} a1σ(1) a2σ(2) · · · anσ(n) D(eσ(1) , eσ(2) , . . . , eσ(n) )
          = Σ_{σ∈Sn} a1σ(1) a2σ(2) · · · anσ(n) det(Pσ )
          = Σ_{σ∈Sn} a1σ(1) a2σ(2) · · · anσ(n) sgn(σ). □
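Aside. The Leibniz formula can be typed in almost verbatim; with its n! terms it is only a sanity check for small matrices, not a practical algorithm. The helper names below are illustrative, and sgn is computed by counting inversions.

from itertools import permutations
from math import prod

def sgn(sigma):
    # sign of a permutation, computed by counting inversions
    inversions = sum(1 for i in range(len(sigma))
                       for j in range(i + 1, len(sigma))
                       if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det_leibniz(A):
    n = len(A)
    return sum(sgn(sigma) * prod(A[i][sigma[i]] for i in range(n))
               for sigma in permutations(range(n)))

print(det_leibniz([[1, 2], [3, 4]]))    # 1*4 - 2*3 = -2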
In the first and third lines we can inductively use the fact that Ln−1 satisfies (P1)(a) to pull out a factor
of α from each summand; since the middle line also has a factor of α, we can see that Ln (A) = αLn (B) as
required.
(P1)(b). Let

    A = [ a11 · · · b+c · · · a1n ] ,    B = [ a11 · · · b · · · a1n ] ,    C = [ a11 · · · c · · · a1n ] ,
        [ v1  · · · x+y · · · vn  ]          [ v1  · · · x · · · vn  ]          [ v1  · · · y · · · vn  ]

where the j-th columns of A, B, C are (b+c, x+y), (b, x), and (c, y) respectively (top entry first). Then

    Ln (A) = Σ_{i=1}^{j−1} (−1)^{i−1} a1i Ln−1 [ v1 · · · vi−1 vi+1 · · · (x+y) · · · vn ]      (skip vi ; x+y in column j)
           + (−1)^{j−1} (b + c) Ln−1 [ v1 · · · vj−1 vj+1 · · · vn ]                            (skip x+y)
           + Σ_{i=j+1}^{n} (−1)^{i−1} a1i Ln−1 [ v1 · · · (x+y) · · · vi−1 vi+1 · · · vn ],     (x+y in column j ; skip vi )

    Ln (B) = Σ_{i=1}^{j−1} (−1)^{i−1} a1i Ln−1 [ v1 · · · vi−1 vi+1 · · · x · · · vn ]
           + (−1)^{j−1} b Ln−1 [ v1 · · · vj−1 vj+1 · · · vn ]
           + Σ_{i=j+1}^{n} (−1)^{i−1} a1i Ln−1 [ v1 · · · x · · · vi−1 vi+1 · · · vn ],

and

    Ln (C) = Σ_{i=1}^{j−1} (−1)^{i−1} a1i Ln−1 [ v1 · · · vi−1 vi+1 · · · y · · · vn ]
           + (−1)^{j−1} c Ln−1 [ v1 · · · vj−1 vj+1 · · · vn ]
           + Σ_{i=j+1}^{n} (−1)^{i−1} a1i Ln−1 [ v1 · · · y · · · vi−1 vi+1 · · · vn ].
By applying (P1)(b) to Ln−1 in the first and third lines of the formula for Ln (A), and expanding the
product in the second line, we can see that Ln (A) = Ln (B) + Ln (C) as required.
(P2). Suppose

    A = [ a11 · · · b b · · · a1n ]
        [ v1  · · · x x · · · vn  ]

has two consecutive equal columns in columns j and j + 1 as shown. In the Laplace expansion of Ln (A) we will have:
• n − 2 summands of the form (−1)i−1 a1i Ln−1 [· · · x x · · · ]. Note that all of these equal 0 because
Ln−1 satisfies (P2).
MATH 146 March 27 Section 2
det A = −a12 det A12 + a22 det A22 + · · · + (−1)^n an2 det An2 .
Example 2. Using A from Example 1, note that row 3 has only one nonzero entry. Using cofactor
expansion along row 3, we get
    det A = +0 · det[4 0 9; −3 0 8; 5 7 6] − 2 · det[2 0 9; −1 0 8; 4 7 6]
              + 0 · det[2 4 9; 1 −3 8; 4 5 6] − 0 · det[2 4 0; −1 −3 0; 4 5 7]
          = −2 · det[2 0 9; −1 0 8; 4 7 6].

We can calculate the 3 × 3 determinant by cofactor expansion along column 2:

    det[2 0 9; −1 0 8; 4 7 6] = −0 · det[−1 8; 4 6] + 0 · det[2 9; 4 6] − 7 · det[2 9; −1 8]
                              = −7 · det[2 9; −1 8].

Hence

    det A = (−2)(−7) det[2 9; −1 8] = 14(2 · 8 − (−1) · 9) = 14(25) = 350.
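Aside. A recursive Python sketch of cofactor expansion along the first row. It is exponential time in general, but skipping zero entries makes it reasonable for sparse matrices like the one above; the names are illustrative. Applied to the 3 × 3 matrix from Example 2 it returns −175, and indeed det A = (−2)(−175) = 350.

def det_cofactor(A):
    # Determinant by Laplace (cofactor) expansion along the first row.
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        if A[0][j] == 0:
            continue                        # zero entries contribute nothing -- the point of choosing a sparse row
        minor = [row[:j] + row[j + 1:] for row in A[1:]]    # delete row 1 and column j+1
        total += (-1) ** j * A[0][j] * det_cofactor(minor)
    return total

print(det_cofactor([[2, 0, 9], [-1, 0, 8], [4, 7, 6]]))     # -175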
Theorem 5 (Cofactor expansion along any row). Let A ∈ Mn (F). Then for any row i = 1, . . . , n, cofactor
expansion along row i correctly calculates det A. That is,
    det A = Σ_{j=1}^{n} (−1)^{i+j} aij det Aij .
Proof sketch. (Not given in lectures) If i = 1 then this is just the last Theorem from the March 24 lecture.
Consider the case i = 2. Write
    A = [ a11 · · · a1,j−1  a1j  a1,j+1 · · · a1n ]
        [ a21 · · · a2,j−1  a2j  a2,j+1 · · · a2n ]
        [ v1  · · ·  vj−1    vj    vj+1  · · · vn  ]

(where v1 , . . . , vn denote the columns of the remaining n − 2 rows), so that

    A2j = [ a11 · · · a1,j−1  a1,j+1 · · · a1n ]
          [ v1  · · ·  vj−1    vj+1  · · · vn  ] .
Finally, consider the general case i ≥ 2. Let B be the matrix obtained from A by cyclically permuting
the first i rows of A, so that rows 1, 2, . . . , i − 1, i of A become rows 2, 3, . . . , i, 1 of B. In particular, the
i-th row of A is the first row of B. The cyclic permutation (1 2 · · · i) can be simulated by i − 1 swaps
of pairs of rows. Hence det B = (−1)^{i−1} det A, i.e., det A = (−1)^{i−1} det B. Expanding det B by cofactors along its first row (the case i = 1, already proved), and noting that the first row of B is row i of A and that B1j = Aij , we get det A = (−1)^{i−1} Σ_{j} (−1)^{1+j} aij det Aij = Σ_{j} (−1)^{i+j} aij det Aij , as required. □
Corollary (cofactor expansion along any column). For any column j = 1, . . . , n, det A = Σ_{i=1}^{n} (−1)^{i+j} aij det Aij .
Proof. Cofactor expansion of det A along column j is cofactor expansion of det Aᵗ along row j. □
The method of expansion by cofactors is useful if a matrix is sparse (has many zeroes), but otherwise is
generally not useful in calculations. The best method in practice uses row-reduction to an upper-triangular
matrix.
Definition. Let A ∈ Mn (F), say

    A = [ a11 a12 · · · a1n ]
        [ a21 a22 · · · a2n ]
        [  ⋮   ⋮    ⋱    ⋮  ]
        [ an1 an2 · · · ann ] .
(1) The diagonal entries of A are a11 , a22 , . . . , ann . Collectively they are called the diagonal.
(2) A is diagonal if all entries above and below the diagonal are 0.
(3) A is upper triangular if all entries below the diagonal are 0.
Theorem 6. If A ∈ Mn (F) is upper-triangular, then det A is the product of the diagonal entries of A.
Proof sketch. By induction on n. In the inductive step, write

    A = [ a11 a12 · · · a1n ]
        [  0  a22 · · · a2n ]
        [  ⋮   ⋮    ⋱    ⋮  ]
        [  0   0  · · · ann ]

and expand det A by cofactors along the first column. □
This suggests a method to calculate det A: reduce
A⇝U upper triangular
keeping track of the elementary row and/or column operations O1 , . . . , Oℓ used. If α1 , . . . , αℓ are the scalars
corresponding to O1 , . . . , Oℓ , then
det U = α1 · · · αℓ (det A).
det U can be easily calculated using Theorem 6, and α1 , . . . , αℓ are known, so we can solve for det A.
Example. To find det A where A = [0 1 3; −2 −4 −5; 3 −1 1], you can do

    A = [0 1 3; −2 −4 −5; 3 −1 1]
      → [−2 −4 −5; 0 1 3; 3 −1 1]          (R1 ↔ R2, scalar −1)
      → [1 2 5/2; 0 1 3; 3 −1 1]           (R1 ← −(1/2)·R1, scalar −1/2)
      → [1 2 5/2; 0 1 3; 0 −7 −13/2]       (R3 ← R3 − 3·R1, scalar 1)
      → [1 2 5/2; 0 1 3; 0 0 29/2] =: U    (R3 ← R3 + 7·R2, scalar 1).

Then det U = (−1)(−1/2)(1)(1) det A by tracking the row operations, while det U = 29/2 by upper-triangularity. Hence det A = 29.
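Aside. A Python sketch of the method just described. For simplicity it uses only type 1 and type 3 operations in the reduction (so the tracked scalar is ±1), works in exact rational arithmetic, and finishes with Theorem 6; the function name is illustrative.

from fractions import Fraction

def det_by_row_reduction(rows):
    A = [[Fraction(x) for x in row] for row in rows]
    n = len(A)
    scale = Fraction(1)                     # the product alpha_1 * ... * alpha_l of the operation scalars
    for c in range(n):
        piv = next((i for i in range(c, n) if A[i][c] != 0), None)
        if piv is None:
            return Fraction(0)              # no pivot in this column, so rank A < n and det A = 0
        if piv != c:
            A[c], A[piv] = A[piv], A[c]     # type 1 swap: scalar -1
            scale *= -1
        for i in range(c + 1, n):           # type 3 operations: scalar 1
            f = A[i][c] / A[c][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    det_U = Fraction(1)
    for i in range(n):
        det_U *= A[i][i]                    # Theorem 6: det U is the product of the diagonal of U
    return det_U / scale                    # det U = (alpha_1 ... alpha_l) det A

print(det_by_row_reduction([[0, 1, 3], [-2, -4, -5], [3, -1, 1]]))   # 29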
Here is one last fact to blow your mind.
Definition. Let A ∈ Mn (F) and 1 ≤ i, j ≤ n. The (i, j)-cofactor of A is the scalar (−1)i+j det Aij .
Definition. Given A ∈ Mn (F), for each 1 ≤ i, j ≤ n let cij be the (i, j)-cofactor of A, and let C be the
n × n matrix whose (i, j)-entry is cij . The transpose C t of C is called the adjugate of A.
Theorem 7. For any A ∈ Mn (F), A(C t ) = det A · In .
Proof sketch. (Not given in lecture) Consider the (i, i)-entry of A(C t ). It is the sum of pairwise products
from row i of A and column i of C t , or in other words, of the i-th rows of A and C. The i-th rows of A
and C are
j: 1 2 ··· n
A: ai1 ai2 ··· ain
C: (−1)i+1 det Ai1 (−1)i+2 det Ai2 · · · (−1)i+n det Ain
so the (i, i) entry of A(C t ) is
ai1 (−1)i+1 det Ai1 + ai2 (−1)i+2 det Ai2 + · · · + ain (−1)i+n det Ain
which is exactly the cofactor expansion of det A on row i, and hence equals det A.
Next consider the (i, j)-entry of A(C t ) where j ̸= i. Let B be the matrix obtained from A by replacing
row j (of A) with a copy of row i (of A). Thus det B = 0 since B has two equal rows.
As before, the (i, j)-entry of A(C t ) is the sum of pairwise products from row i of A and column j of C t ,
or in other words, of row i of A and row j of C. Note that row i of A is the same as row j of B. And
because A and B are identical except on row j, we get that Bjℓ = Ajℓ for all ℓ = 1, . . . , n. From this we
can deduce that the (i, j) entry is the cofactor expansion of det B on row j, which equals 0. □
Corollary 3. Suppose A ∈ Mn (F) and det A ̸= 0. Then A⁻¹ = (1/det A)·Cᵗ, where Cᵗ is the adjugate of A.
Proof. A · ((1/det A)·Cᵗ) = (1/det A)·A(Cᵗ) = (1/det A)·(det A · In ) = In . □
Here is a typical application.
Corollary 4. Suppose A ∈ Mn (R) has integer entries. If det A = ±1, then A−1 also has integer entries.
Proof. The determinant of any square matrix with integer entries is an integer, by the Leibniz formula.
Since A has integer entries, so does every submatrix Aij , and hence every cofactor (−1)i+j det Aij . Thus
the adjugate C t has integer entries. If det A = ±1 then A−1 = ±C t by Corollary 3 and so A−1 has integer
entries. □
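Aside. A small Python sketch of Corollary 3 and Corollary 4: build the cofactor matrix C, transpose it to get the adjugate, and (when det A = ±1) observe that the inverse has integer entries. The naive determinant routine and the helper names are illustrative choices.

def det_cofactor(A):
    # naive Laplace expansion along the first row (fine for the small matrices here)
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det_cofactor([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def adjugate(A):
    n = len(A)
    # C[i][j] is the (i, j)-cofactor (-1)^(i+j) det A_ij
    C = [[(-1) ** (i + j)
          * det_cofactor([row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i])
          for j in range(n)] for i in range(n)]
    return [[C[j][i] for j in range(n)] for i in range(n)]   # the transpose C^t

A = [[1, 2, 3], [0, 1, 4], [0, 0, 1]]    # integer entries with det A = 1
print(det_cofactor(A))                   # 1
print(adjugate(A))                       # [[1, -2, 5], [0, 1, -4], [0, 0, 1]], the inverse of A, with integer entries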
MATH 146 March 29 Section 2
§5.1 Diagonalizability
Motivation: given a linear operator T ∈ L(V ) (with dim V = n), we can represent T by [T ]BA where A, B
are ordered bases of V .
If we are free to choose A and B, then we can always find A, B so that
    [T ]BA = [ Ir  O   ]
             [ O′  O′′ ]      with r = rank(T )
(exercise) which is a very nice matrix. However if we want to understand T 2 , T 3 , etc., then we will likely
want A = B. (Then [T 2 ]B = [T ]B [T ]B = ([T ]B )2 , [T 3 ]B = ([T ]B )3 , etc.)
Definition. T ∈ L(V ) is called diagonalizable if there exists an ordered basis B for V such that [T ]B is
diagonal. If V = Fn and T = TA , then we also say that A is diagonalizable.
Diagonalization Problem: Which T ∈ L(V ) are diagonalizable? If T is diagonalizable, how can I find
B so that [T ]B is diagonal?
Special Case: V = Fn , T = TA . Suppose B = (v1 , . . . , vn ) is an ordered basis such that [TA ]B is a
diagonal matrix, say D. Let E be the standard basis for Fn . We have
    D = [TA ]B = [Id]BE [TA ]E [Id]EB = Q⁻¹AQ,
where Q = [v1 · · · vn ]. So A is similar to D. The converse is also true: if A is similar to the diagonal
matrix D, say D = Q−1 AQ, then [TA ]B = D where B is the ordered basis for Fn consisting of the columns
of Q (exercise). So TA is diagonalizable iff A is similar to a diagonal matrix.
Matrix diagonalization problem: Which A ∈ Mn (F) are similar to a diagonal matrix? If A is diago-
nalizable, how can I find invertible Q and diagonal D such that Q−1 AQ = D (equivalently, QDQ−1 = A)?
Recall: if A ∈ Mn (F) then pA (t) = det(A − tIn ). The eigenvalues of A are the roots of pA (t).
Example. A = [3/2 −1; 1/2 0], pA (t) = t² − (3/2)t + 1/2; the eigenvalues are λ = 1/2 and λ = 1.
Theorem 2. Let A ∈ Mn (F).
(1) pA (t) is a polynomial cn tn + cn−1 tn−1 + · · · + c1 t + c0 of degree n.
(2) cn = (−1)n .
(3) cn−1 = (−1)n−1 trace(A).
(4) c0 = det A.
Proof. (4) c0 = pA (0) = det(A − 0In ) = det A.
(1)–(3): At end of these lecture notes. □
Now we turn to eigenvectors. First, another definition.
Definition. Suppose V is a vector space over F, T ∈ L(V ), and λ ∈ F is an eigenvalue of T . The
eigenspace of T corresponding to λ is
Eλ = Ker(T − λ·Id) = {v ∈ V : T v = λv}
= {eigenvectors corresponding to λ} ∪ {0}.
Notes:
(1) If V = Fn and T = TA , then Eλ = Null(A − λIn ).
(2) dim Eλ ≥ 1 always.
To find eigenvector(s) corresponding to an eigenvalue λ for A, we can let B = A − λIn and solve Bv = 0.
Our usual method gives a basis for the solution set (which is Eλ ); all nonzero linear combinations of the
basis vectors are eigenvectors.
Recall: to diagonalize T , what we need is a basis for V consisting of eigenvectors; that means a basis
contained in the union of the eigenspaces. The next result will guarantee that we can “glue” bases of the
eigenspaces together and still have a linearly independent set.
Theorem 3. Assume T ∈ L(V ) and λ1 , . . . , λk are distinct eigenvalues of T . Let W = Eλ1 + · · · + Eλk .
Then W = Eλ1 ⊕ · · · ⊕ Eλk .
Proof. At end of these lecture notes. □
As a consequence, we get the following.
Corollary 1. Suppose dim V = n, T ∈ L(V ), and λ1 , . . . , λk are the distinct eigenvalues of T .
(1) dim Eλ1 + · · · + dim Eλk = dim(Eλ1 ⊕ · · · ⊕ Eλk ) ≤ n.
(2) T is diagonalizable ⇐⇒ dim Eλ1 + · · · + dim Eλk = n ⇐⇒ V = Eλ1 ⊕ · · · ⊕ Eλk .
(3) If T is diagonalizable, then a diagonalizing basis can be found by taking the union of bases for the
eigenspaces of T .
Proof. At end of these lecture notes. □
Example 1. A ∈ M2 (R) at start of lecture. Two distinct eigenvalues λ = 1/2 and λ = 1. Each eigenspace has dimension ≥ 1, and the sum of their dimensions is ≤ 2, so each of E1/2 and E1 has dimension exactly 1 and the dimensions sum to 2. Hence A is diagonalizable.
Example 2. Let A = [4 0 1; 2 3 2; 1 0 4] ∈ M3 (R).
    pA (t) = det(A − tI3 ) = det[4−t 0 1; 2 3−t 2; 1 0 4−t] = (3 − t) · det[4−t 1; 1 4−t]
           = (3 − t)((4 − t)² − 1) = (3 − t)(t² − 8t + 15) = −(t − 3)²(t − 5).
Thus the eigenvalues are 3 and 5 (we say that λ = 3 has “algebraic multiplicity 2”)
Next let’s find the eigenspaces and their dimensions.
λ=3
    E3 = Null(A − 3I3 ) = Null[1 0 1; 2 0 2; 1 0 1]
By inspection, rank(A − 3I3 ) = 1, so dim(E3 ) = nullity(A − 3I3 ) = 2. To get a basis for E3 , solve the
system (A − 3I3 )x = 0. The augmented matrix for this system is
    [1 0 1 | 0; 2 0 2 | 0; 1 0 1 | 0]  →  [1 0 1 | 0; 0 0 0 | 0; 0 0 0 | 0].
A basis for the solution set is v1 = (0, 1, 0), v2 = (−1, 0, 1).
λ=5
    E5 = Null(A − 5I3 ) = Null[−1 0 1; 2 −2 2; 1 0 −1]
By inspection, rank(A − 5I3 ) = 2, so dim(E5 ) = nullity(A − 5I3 ) = 1.
To get a basis for E5 , solve the system (A − 5I3 )x = 0. The augmented matrix for this system is
    [−1 0 1 | 0; 2 −2 2 | 0; 1 0 −1 | 0]  →  [1 0 −1 | 0; 0 1 −2 | 0; 0 0 0 | 0].
A basis for the solution set is v3 = (1, 2, 1).
We have dim E3 + dim E5 = 3 so A is diagonalizable. A diagonalizing basis for TA is B = (v1 , v2 , v3 ) and
    [TA ]B = [3 0 0; 0 3 0; 0 0 5] =: D.
If we let
    Q = [v1 v2 v3 ] = [0 −1 1; 1 0 2; 0 1 1]
then Q−1 AQ = D.
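Aside. A quick numerical check of Example 2, assuming numpy is available. With Q as above, Q⁻¹AQ should come out as D = diag(3, 3, 5) up to floating-point rounding; numpy can also produce eigenvalues and eigenvectors directly.

import numpy as np

A = np.array([[4, 0, 1], [2, 3, 2], [1, 0, 4]], dtype=float)
Q = np.array([[0, -1, 1], [1, 0, 2], [0, 1, 1]], dtype=float)
print(np.round(np.linalg.inv(Q) @ A @ Q))     # approximately diag(3, 3, 5)

vals, vecs = np.linalg.eig(A)                 # eigenvalues 3, 3, 5 in some order;
print(vals)                                   # the columns of vecs are corresponding eigenvectors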
Example 3. Let’s do the same thing for the matrix B = [3 1 0; 0 3 0; 0 0 5].
    pB (t) = det(B − tI3 ) = det[3−t 1 0; 0 3−t 0; 0 0 5−t] = (3 − t)²(5 − t) = −(t − 3)²(t − 5).
The same as pA (t); hence the same eigenvalues and same multiplicities. Let’s find the eigenspaces and
their dimensions.
λ=3
    E3 = Null(B − 3I3 ) = Null[0 1 0; 0 0 0; 0 0 2].
By inspection, rank(B − 3I3 ) = 2, so dim(E3 ) = nullity(B − 3I3 ) = 1.
λ = 5.
We will see next week that dim Eλ is always ≤ the algebraic multiplicity of λ. Since 5 has algebraic
multiplicity 1, we know that dim(E5 ) = 1.
So dim E3 + dim E5 = 1 + 1 = 2 < 3, so B is not diagonalizable.
    (a11 − t)(a22 − t) · · · (ann − t) = (−t)^n + ( Σ_{i=1}^{n} aii ) (−t)^{n−1} + (lower terms)
                                       = (−1)^n t^n + (−1)^{n−1} (a11 + · · · + ann ) t^{n−1} + (lower terms),

where a11 + · · · + ann = trace A.
By induction,
(λ1 − λk )x1 = · · · = (λk−1 − λk )xk−1 = 0
and since the λi ’s are distinct, we get
x1 = · · · = xk−1 = 0.
Plugging back into (1) gives xk = 0 as well. □
Before proving Corollary 1, we state and prove a consequence of Theorem 3 that we will need.
Corollary 0. Suppose dim V = n, T ∈ L(V ), and λ1 , . . . , λk are the distinct eigenvalues of T . Let
B1 , . . . , Bk be bases for Eλ1 , . . . , Eλk respectively. Then B1 ∪ · · · ∪ Bk is linearly independent.
Proof. Let di = dim Eλi and write Bi = {v_1^i , . . . , v_{di}^i }. Suppose a_t^i are scalars such that

    x1 + x2 + · · · + xk = 0,    where    xi = a_1^i v_1^i + · · · + a_{di}^i v_{di}^i    for i = 1, . . . , k.

Then xi ∈ Eλi for each i. By the proof of Theorem 3, x1 = · · · = xk = 0. That is, for each fixed i, Σ_{t=1}^{di} a_t^i v_t^i = 0. Since Bi is linearly independent, each a_t^i = 0. Since all coefficients are 0, B1 ∪ · · · ∪ Bk is linearly independent. □
Proof of Corollary 1. (1) The equality follows from repeated use of dim(W1 ⊕ W2 ) = dim W1 + dim W2 (see
the remarks after Theorem 1 on Feb. 1). The inequality follows from the fact that Eλ1 ⊕ · · · ⊕ Eλk is a
subspace of V .
(2) For i = 1, . . . , k let di = dim Eλi . Clearly dim(Eλ1 ⊕ · · · ⊕ Eλk ) = Σ_{i=1}^{k} di , so V = Eλ1 ⊕ · · · ⊕ Eλk
MATH 146 April 3 Section 2
Definition. Given A ∈ Mn (F) and an eigenvalue λ of A, the algebraic multiplicity of λ is the maximum
m ≥ 1 such that (t − λ)m | pA (t).
Example. Let A = [4 0 1; 2 3 2; 1 0 4] ∈ M3 (R). We saw on Friday that pA (t) = −(t − 3)²(t − 5). Eigenvalue
λ = 3 has algebraic multiplicity 2. Eigenvalue λ = 5 has algebraic multiplicity 1.
Theorem 4. Suppose A ∈ Mn (F) and λ is an eigenvalue of A. Then
dim Eλ ≤ (algebraic multiplicity of λ).
(We’ll prove this at the end of the lecture.)
Example. A = [1 0 0; 0 0 −1; 0 1 0] ∈ M3 (R). Is A diagonalizable?

    pA (t) = det(A − tI3 ) = det[1−t 0 0; 0 −t −1; 0 1 −t] = (1 − t) · det[−t −1; 1 −t] = −(t − 1)(t² + 1).
t² + 1 cannot be factored over R. We say that pA (t) doesn’t split (over R), because it doesn’t completely
factor as a product of linear factors. Considered as a matrix over R, the only eigenvalue is λ = 1. Its
eigenspace will have dimension 1, not enough to diagonalize A.
If on the other hand we consider A as a matrix over C, then pA (t) does split:
pA (t) = −(t − 1)(t − i)(t + i)
There are three distinct eigenvalues: 1, i and −i. Each has algebraic multiplicity 1. The dimension of each
eigenspace is 1, so the sum of the dimensions is 3, and A is diagonalizable under this interpretation.
We see that when discussing diagonalizability of a matrix, we must also be careful to specify over which
field F we are considering it. (In other words, we must specify the vector space Fn on which TA operates.)
Using Theorem 4, we can prove one more characterization of diagonalizability.
Theorem 5. Suppose A ∈ Mn (F). A is diagonalizable over F iff pA (t) splits over F and dim Eλ =
(alg. mult. of λ) for each eigenvalue λ.
Proof. (Not given in lectures) Let λ1 , . . . , λk be the distinct eigenvalues of A and let m1 , . . . , mk be their
algebraic multiplicities. Then
pA (t) = c(t − λ1 )m1 · · · (t − λk )mk if pA (t) splits;
while if pA (t) doesn’t split, then
pA (t) = c(t − λ1 )m1 · · · (t − λk )mk q(t) with deg(q(t)) ≥ 2.
Note that deg(pA (t)) = n by Theorem 2 (March 31). Hence using Theorem 4 we get

    Σ_{i=1}^{k} dim Eλi  ≤  Σ_{i=1}^{k} mi  =  { n                  if pA (t) splits
                                               { n − deg(q(t))      if pA (t) does not split.

So we get Σ_{i=1}^{k} dim Eλi = n exactly when pA (t) splits and dim Eλi = mi for all i. □
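Aside. The criterion of Theorem 5 can be tested mechanically. Assuming sympy is available, Matrix.eigenvects() returns, for each eigenvalue it can find, the algebraic multiplicity together with a basis of the eigenspace; comparing the two multiplicities (and checking that they sum to n, i.e. that pA (t) splits over the field sympy works in) reproduces the two examples from March 29. The function name is an illustrative choice; sympy also offers is_diagonalizable() directly.

from sympy import Matrix

def diagonalizable(A):
    n = A.shape[0]
    data = A.eigenvects()       # list of (eigenvalue, algebraic multiplicity, basis of eigenspace)
    if sum(mult for _, mult, _ in data) != n:
        return False            # p_A(t) does not split (over the field sympy is using)
    return all(len(basis) == mult for _, mult, basis in data)

print(diagonalizable(Matrix([[4, 0, 1], [2, 3, 2], [1, 0, 4]])))   # True  (Example 2, March 29)
print(diagonalizable(Matrix([[3, 1, 0], [0, 3, 0], [0, 0, 5]])))   # False (Example 3, March 29)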
The rest of this lecture is devoted to a proof of Theorem 4. To get there, we need to extend some of our
analysis of eigenvalues from matrices to linear operators.
First, given T ∈ L(V ) where V is finite-dimensional, how can we find the eigenvalues of T ?
Answer: Pick an ordered basis B, find eigenvalues of [T ]B .
This works because T and A := [T ]B have the same eigenvalues:
T v = λv ⇐⇒ [T v]B = [λv]B
⇐⇒ [T ]B [v]B = λ[v]B
⇐⇒ A[v]B = λ[v]B .
We also get a complete translation between
    Eigenvectors of T and eigenvectors of [T ]B :    v ↭ [v]B
    Eigenspaces of T and eigenspaces of [T ]B :    E_λ^T ≅ E_λ^A via v ↦ [v]B
    Dimensions of eigenspaces:    dim E_λ^T = dim E_λ^A .
Note: we haven’t yet defined the characteristic polynomial of a linear operator. Time to fix that.
Definition. Suppose V is a vector space over F with dim V = n, and T ∈ L(V ). The characteristic
polynomial of T , denoted pT (t), is obtained by choosing an ordered basis B for V , letting A = [T ]B , and
defining pT (t) := pA (t).
There is one annoying issue: we need to prove that pT (t) is well-defined, i.e., doesn’t depend on the
choice of B.
Lemma. Let V and T be as above. Then pT (t) is well-defined; that is, if A = [T ]B and B = [T ]C where
B, C are ordered bases for V , then pA (t) = pB (t).
Proof. This will follow from the fact that A and B are similar. Write B = Q−1 AQ where Q is the
change-of-coordinate matrix from C to B. First observe that det B = det A, since
det B = (det Q−1 )(det A)(det Q)
= (det A)(det Q−1 )(det Q)
= (det A) det(Q−1 Q) = det A.
(Hey, we’ve proved that similar matrices always have the same determinant.) Next observe that
B − tIn = B − tQ−1 In Q
= Q−1 AQ − Q−1 (tIn )Q
= Q−1 (A − tIn )Q.
So B − tIn and A − tIn are similar. So
pB (t) = det(B − tIn ) = det(A − tIn ) = pA (t). □
Now we can prove Theorem 4. It is easier to prove its generalization to linear operators.
Theorem 4 (Generalization). Suppose V is finite-dimensional, T ∈ L(V ), and λ is an eigenvalue of T .
Then
dim Eλ ≤ (algebraic multiplicity of λ).
Proof. Let d = dim Eλ . It suffices to prove (t − λ)d | pT (t).
Let (v1 , . . . , vd ) be an ordered basis for Eλ , and extend it to an ordered basis B = (v1 , . . . , vd , vd+1 , . . . , vn )
for V . Let A = [T ]B , so pT (t) = pA (t) by the Lemma.
Observe that for i = 1, . . . , d,
T (vi ) = λvi (because vi ∈ Eλ )
= 0v1 + · · · + 0vi−1 + λvi + 0vi+1 + · · · + 0vd + 0vd+1 + · · · + 0vn .
Hence

    A = [ λ  0  · · ·  0  ∗  · · ·  ∗ ]
        [ 0  λ  · · ·  0  ∗  · · ·  ∗ ]
        [ ⋮  ⋮    ⋱    ⋮  ∗  · · ·  ∗ ]      =   [ λId   B ]
        [ 0  0  · · ·  λ  ∗  · · ·  ∗ ]          [  O    C ]
        [ 0  0  · · ·  0  ∗  · · ·  ∗ ]
        [ ⋮  ⋮         ⋮  ⋮         ⋮ ]
        [ 0  0  · · ·  0  ∗  · · ·  ∗ ]

for some matrices B, C, O of the appropriate dimensions. Then

    A − tIn = [ (λ − t)Id      B     ]
              [     O       C − tIm ]      where m = n − d.
Both A and A − tIn are block upper triangular (with square blocks on the diagonal). In A6 Q3 you will
show that if B = O then the determinant of such a matrix is equal to the product of the determinants of
the two diagonal blocks. Essentially the same argument shows that this is true even if B ̸= O (exercise!).
Note that

    det((λ − t)Id ) = det [ λ−t   0   · · ·   0  ]
                          [  0   λ−t  · · ·   0  ]
                          [  ⋮    ⋮     ⋱     ⋮  ]
                          [  0    0   · · ·  λ−t ]    (a d × d matrix)    = (λ − t)^d .
Thus
pT (t) = det((λ − t)Id ) · det(C − tIm )
= (λ − t)d · det(C − tIm )
= (λ − t)d · pC (t).
This proves that (t − λ)d divides pT (t) as required. □
MATH 146 April 5 Section 2
Here is a little parlour trick which is actually examinable material. Suppose A ∈ Mn (F) is diagonalizable.
Thus there exists an invertible matrix P and a diagonal matrix
    D = [ λ1   0  · · ·  0  ]
        [  0  λ2  · · ·  0  ]
        [  ⋮   ⋮    ⋱    ⋮  ]
        [  0   0  · · ·  λn ]
such that P −1 AP = D. We can solve for A to get A = P DP −1 . Then for any m ≥ 1,
    A^m = (P DP⁻¹)(P DP⁻¹)(P DP⁻¹) · · · (P DP⁻¹)      (m factors)
        = P D^m P⁻¹.
Now D^m is easily calculated:

    D^m = [ λ1^m    0    · · ·   0   ]
          [   0   λ2^m   · · ·   0   ]
          [   ⋮     ⋮      ⋱     ⋮   ]
          [   0     0    · · ·  λn^m ]
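Aside. A numpy sketch of this trick. np.linalg.eig returns the eigenvalues together with a matrix P whose columns are eigenvectors, so when A is diagonalizable, A^m = P D^m P⁻¹ can be computed by powering the diagonal entries; the function name is illustrative, and the 2 × 2 matrix is the example from March 29.

import numpy as np

def power_via_diagonalization(A, m):
    vals, P = np.linalg.eig(A)        # assumes A is diagonalizable, so the columns of P form a basis of eigenvectors
    return P @ np.diag(vals ** m) @ np.linalg.inv(P)

A = np.array([[1.5, -1.0], [0.5, 0.0]])              # eigenvalues 1 and 1/2
print(np.round(power_via_diagonalization(A, 10), 6))
print(np.round(np.linalg.matrix_power(A, 10), 6))    # should agree with the line above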
As ε, δ → 0, the second eigenspace may rotate wildly, and in effect can be anything we want. In particular,
we can pick our rates of convergence so that E1+ε converges to the y-axis. So in the limit, when the two
eigenvalues converge to each other, we might expect a 2-dimensional eigenspace; which is what happens.
Now do the same thing to B; let
    Bε,δ = [ 1−ε   1+δ ]
           [  0    1+ε ] .
Bε,δ has the same two eigenvalues: 1 − ε and 1 + ε. But this time its eigenspaces are
    E1−ε = Null[ 0  1+δ ; 0  2ε ] = span{ (1, 0) },    the x-axis;
    E1+ε = Null[ −2ε  1+δ ; 0  0 ] = span{ (1, 2ε/(1+δ)) },    the line through 0 with slope 2ε/(1+δ).
As ε, δ → 0, the second eigenspace converges to the first, so in the limit, there is only 1 dimension’s worth
of eigenvectors. Viewed in this way, the 1-dimensional eigenspace of B can be seen as a “singularity,” that
is, two 1-dimensional eigenspaces which have collapsed onto each other.