The Characteristic Polynomial of a Matrix
Recall that a number λ is an eigenvalue of A ∈ R^{n×n} if there exists a non-zero vector v such
that
Av = λv
or equivalently if v ∈ Null(A − λI). In other words, λ is an eigenvalue of A if and only if the
subspace Null(A − λI) contains a vector other than the zero vector. We know that any matrix
M has a non-trivial null space if and only if M is non-invertible, if and only if det(M) = 0.
Hence, λ is an eigenvalue of A if and only if λ satisfies det(A − λI) = 0.
Let’s compute the expression det(A − λI) for a generic 2 × 2 matrix:

                  | a11 − λ   a12     |
    det(A − λI) = |                   | = (a11 − λ)(a22 − λ) − a12 a21 = λ^2 − (a11 + a22)λ + a11 a22 − a12 a21.
                  | a21       a22 − λ |

Thus, if A is 2 × 2 then

det(A − λI) = λ^2 − (a11 + a22)λ + a11 a22 − a12 a21

is a polynomial in the variable λ of degree n = 2. This motivates the following definition.
Definition Let A be an n × n matrix. The polynomial
p(λ) = det(A − λI)
is called the characteristic polynomial of A.
In summary, to find the eigenvalues of A we must find the roots of the characteristic poly-
nomial:
p(λ) = det(A − λI).
The following theorem asserts that what we observed for the case n = 2 is indeed true for
all n.
Theorem 2. The characteristic polynomial p(λ) = det(A − λI) of an n × n matrix A is an nth
degree polynomial.
Proof: Recall that for the case n = 2 we computed that

det(A − λI) = λ^2 − (a11 + a22)λ + a11 a22 − a12 a21.

Therefore, the claim holds for n = 2. By induction, suppose that the claim holds for some n ≥ 2. If A is an (n + 1) × (n + 1) matrix, then expanding det(A − λI) along the first row gives

det(A − λI) = (a11 − λ) det(A11 − λI) + Σ_{k=2}^{n+1} (−1)^{1+k} a1k det((A − λI)1k),

where M1k denotes the submatrix of M obtained by deleting row 1 and column k. By the induction hypothesis, det(A11 − λI) is an nth degree polynomial, and hence (a11 − λ) det(A11 − λI) is an (n + 1)th degree polynomial. On the other hand, each submatrix (A − λI)1k with k ≥ 2 contains only n − 1 entries involving λ, so det((A − λI)1k) is a polynomial of degree at most n − 1. The sum over k ≥ 2 therefore cannot cancel the leading term, and det(A − λI) has degree exactly n + 1. This ends the proof.
Example: Find the eigenvalues of

        [ −4  −6  −7 ]
    A = [  3   5   3 ]
        [  0   0   3 ]

Solution. Compute

             [ −4  −6  −7 ]   [ λ  0  0 ]   [ −4 − λ   −6      −7    ]
    A − λI = [  3   5   3 ] − [ 0  λ  0 ] = [  3       5 − λ    3    ]
             [  0   0   3 ]   [ 0  0  λ ]   [  0       0       3 − λ ]
Then, expanding det(A − λI) along the first column:

    det(A − λI) = (−4 − λ) det [ 5 − λ   3     ] − 3 det [ −6   −7    ]
                               [ 0       3 − λ ]         [  0   3 − λ ]

                = (−4 − λ)(5 − λ)(3 − λ) + 18(3 − λ) = (3 − λ)(λ^2 − λ − 2).

Thus the characteristic polynomial factors as

p(λ) = det(A − λI) = −(λ − 2)(λ − 3)(λ + 1).

Therefore, the eigenvalues of A are λ1 = 2, λ2 = 3, λ3 = −1.
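As a quick numerical cross-check, here is a minimal Python/numpy sketch (an illustration, not part of the notes; note that np.poly uses the convention det(λI − A), which for odd n is −det(A − λI) but has the same roots):

    import numpy as np

    A = np.array([[-4., -6., -7.],
                  [ 3.,  5.,  3.],
                  [ 0.,  0.,  3.]])

    # Coefficients of det(lambda*I - A), highest power first:
    # [1, -4, 1, 6], i.e. lambda^3 - 4 lambda^2 + lambda + 6.
    coeffs = np.poly(A)
    print(coeffs)
    print(np.roots(coeffs))       # roots 3, 2, -1 (in some order)
    print(np.linalg.eigvals(A))   # the same eigenvalues, computed directly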
form a basis for R^3. Hence, for the repeated eigenvalue λ2 = 2, we were able to find two linearly independent eigenvectors.
Cayley-Hamilton Theorem
Statement: Every square matrix satisfies its own characteristic equation.
Example: Verify the Cayley-Hamilton theorem for

        [  2  −1   2 ]
    A = [ −1   2  −1 ]
        [  1  −1   2 ]

and use it to find A^4 and A^-1.
Solution. The characteristic equation of A is λ^3 − S1 λ^2 + S2 λ − S3 = 0, where
S1 = sum of the leading diagonal elements = 2 + 2 + 2 = 6
S2 = sum of the minors of the leading diagonal elements
   = |  2  −1 | + | 2  2 | + |  2  −1 |
     | −1   2 |   | 1  2 |   | −1   2 |
   = (4 − 1) + (4 − 2) + (4 − 1) = 3 + 2 + 3 = 8
S3 = |A| = 2(4 − 1) + 1(−2 + 1) + 2(1 − 2) = 2(3) + 1(−1) + 2(−1) = 6 − 1 − 2 = 3
So, the characteristic equation of A is λ^3 − 6λ^2 + 8λ − 3 = 0.
By the Cayley-Hamilton theorem, A must satisfy

A^3 − 6A^2 + 8A − 3I = O.    (1)

Verification:

                    [  2  −1   2 ] [  2  −1   2 ]   [  7  −6   9 ]
    A^2 = A × A   = [ −1   2  −1 ] [ −1   2  −1 ] = [ −5   6  −6 ]
                    [  1  −1   2 ] [  1  −1   2 ]   [  5  −5   7 ]

                    [  2  −1   2 ] [  7  −6   9 ]   [  29  −28   38 ]
    A^3 = A × A^2 = [ −1   2  −1 ] [ −5   6  −6 ] = [ −22   23  −28 ]
                    [  1  −1   2 ] [  5  −5   7 ]   [  22  −22   29 ]

So,

                           [  29  −28   38 ]     [  7  −6   9 ]     [  2  −1   2 ]     [ 1  0  0 ]
    A^3 − 6A^2 + 8A − 3I = [ −22   23  −28 ] − 6 [ −5   6  −6 ] + 8 [ −1   2  −1 ] − 3 [ 0  1  0 ]
                           [  22  −22   29 ]     [  5  −5   7 ]     [  1  −1   2 ]     [ 0  0  1 ]

                           [ 0  0  0 ]
                         = [ 0  0  0 ] = O.
                           [ 0  0  0 ]
To find A^4: From (1),

A^3 = 6A^2 − 8A + 3I.    (2)

Multiplying both sides of (2) by A and substituting (2) again for A^3, we get

A^4 = 6A^3 − 8A^2 + 3A = 6(6A^2 − 8A + 3I) − 8A^2 + 3A = 28A^2 − 45A + 18I.    (3)

Using (3),

             [  7  −6   9 ]      [  2  −1   2 ]      [ 1  0  0 ]
    A^4 = 28 [ −5   6  −6 ] − 45 [ −1   2  −1 ] + 18 [ 0  1  0 ]
             [  5  −5   7 ]      [  1  −1   2 ]      [ 0  0  1 ]

          [  124  −123   162 ]
        = [  −95    96  −123 ]
          [   95   −95   124 ]
To find A^-1: Multiplying (1) by A^-1,

A^2 − 6A + 8I − 3A^-1 = O
3A^-1 = A^2 − 6A + 8I

            [  7  −6   9 ]     [  2  −1   2 ]     [ 1  0  0 ]
    3A^-1 = [ −5   6  −6 ] − 6 [ −1   2  −1 ] + 8 [ 0  1  0 ]
            [  5  −5   7 ]     [  1  −1   2 ]     [ 0  0  1 ]

            [  3   0  −3 ]
          = [  1   2   0 ]
            [ −1   1   3 ]

So,

                 [  3   0  −3 ]
    A^-1 = (1/3) [  1   2   0 ]
                 [ −1   1   3 ]
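These hand computations are easy to confirm numerically; the following is a minimal Python/numpy sketch (illustrative, using the matrix A and coefficients above):

    import numpy as np

    A = np.array([[ 2., -1.,  2.],
                  [-1.,  2., -1.],
                  [ 1., -1.,  2.]])
    I = np.eye(3)

    # Characteristic polynomial det(lambda*I - A): coefficients [1, -6, 8, -3]
    print(np.poly(A))

    A2 = A @ A
    A3 = A2 @ A

    # Cayley-Hamilton, equation (1): A^3 - 6A^2 + 8A - 3I = O
    print(np.allclose(A3 - 6*A2 + 8*A - 3*I, 0))              # True

    # Equation (3): A^4 = 28A^2 - 45A + 18I
    print(np.allclose(A3 @ A, 28*A2 - 45*A + 18*I))           # True

    # A^{-1} = (A^2 - 6A + 8I)/3
    print(np.allclose(np.linalg.inv(A), (A2 - 6*A + 8*I)/3))  # True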
However, as we will see in the next example, the geometric multiplicity gi can be strictly less than the algebraic multiplicity ki; in general we only have

gi ≤ ki.
Example: Find the eigenvalues of A and a basis for each eigenspace:

        [  2   4   3 ]
    A = [ −4  −6  −3 ]
        [  3   3   1 ]

For each eigenvalue of A, find its algebraic and geometric multiplicity. Does R^3 have a basis of eigenvectors of A?
Solution. One computes:
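(The following Python sketch with sympy, illustrative rather than the notes' hand computation, carries out this calculation; eigenvects() reports each eigenvalue together with its algebraic multiplicity and a basis of its eigenspace.)

    from sympy import Matrix

    A = Matrix([[ 2,  4,  3],
                [-4, -6, -3],
                [ 3,  3,  1]])

    # eigenvects() returns triples (eigenvalue, algebraic multiplicity, eigenspace basis)
    for lam, k, basis in A.eigenvects():
        g = len(basis)  # geometric multiplicity = dim Null(A - lam*I)
        print(f"lambda = {lam}: algebraic k = {k}, geometric g = {g}")

    # Output: lambda = 1 has k = g = 1, while lambda = -2 has k = 2 but g = 1.
    # Since g < k for lambda = -2, R^3 has no basis of eigenvectors of A.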
If A is similar to B then B is similar to A, because from the equation A = PBP^-1 we can multiply on the left by P^-1 and on the right by P to obtain

P^-1 AP = B.

Hence, with Q = P^-1, we have that B = QAQ^-1 and thus B is similar to A. We may therefore simply say that A and B are similar. Similar matrices are clearly not necessarily equal; nevertheless, there are good reasons for the word "similar," a few of which are collected in the following theorem.
Theorem 3. If A and B are similar matrices then the following are true:
1. rank(A) = rank(B)
2. det(A) = det(B)
3. A and B have the same characteristic polynomial, and hence the same eigenvalues.
Proof of 3: Writing A = PBP^-1 and I = PP^-1,

det(A − λI) = det(A − λPP^-1) = det(PBP^-1 − λPP^-1) = det(P(B − λI)P^-1) = det(P) det(B − λI) det(P^-1) = det(B − λI).

Thus, A and B have the same characteristic polynomial, and hence the same eigenvalues.
Diagonalization
Eigenvalues of Triangular Matrices
Before discussing diagonalization, we first consider the eigenvalues of triangular matrices.
Theorem 4. Let A be a triangular matrix (either upper or lower). Then the eigenvalues of A
are its diagonal entries.
Proof: We will prove the theorem for the case n = 3 and A upper triangular; the general case is similar. Suppose then that A is a 3 × 3 upper triangular matrix:

        [ a11  a12  a13 ]
    A = [ 0    a22  a23 ]
        [ 0    0    a33 ]

Then

             [ a11 − λ   a12       a13     ]
    A − λI = [ 0         a22 − λ   a23     ]
             [ 0         0         a33 − λ ]

and thus the characteristic polynomial of A is

p(λ) = det(A − λI) = (a11 − λ)(a22 − λ)(a33 − λ),

whose roots are exactly the diagonal entries a11, a22, a33.
Example: Consider the following matrix:

        [  6   0  0   0  0 ]
        [ −1   0  0   0  0 ]
    A = [  0   0  7   0  0 ]
        [ −1   0  0  −4  0 ]
        [  8  −2  3   0  7 ]

Since A is lower triangular, by Theorem 4 its eigenvalues are its diagonal entries: 6, 0, 7, −4, 7.
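A quick numerical confirmation in Python/numpy (illustrative):

    import numpy as np

    A = np.array([[ 6,  0, 0,  0, 0],
                  [-1,  0, 0,  0, 0],
                  [ 0,  0, 7,  0, 0],
                  [-1,  0, 0, -4, 0],
                  [ 8, -2, 3,  0, 7]], dtype=float)

    # For a triangular matrix the eigenvalues equal the diagonal entries.
    print(np.sort(np.linalg.eigvals(A).real))  # [-4. 0. 6. 7. 7.]
    print(np.sort(np.diag(A)))                 # the same multiset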
We now introduce a very special type of triangular matrix, namely, a diagonal matrix.
Definition: A matrix D whose off-diagonal entries are all zero is called a diagonal matrix. For example, here is a 3 × 3 diagonal matrix:

        [ 3   0   0 ]
    D = [ 0  −5   0 ]
        [ 0   0  −8 ]
A diagonal matrix is clearly also a triangular matrix, and therefore the eigenvalues of a diagonal matrix D are simply the diagonal entries of D. Moreover, the powers of a diagonal matrix are easy to compute. For example, if

        [ λ1  0  ]
    D = [ 0   λ2 ]

then

          [ λ1  0  ] [ λ1  0  ]   [ λ1^2  0    ]
    D^2 = [ 0   λ2 ] [ 0   λ2 ] = [ 0     λ2^2 ]

and similarly for any integer k = 1, 2, 3, . . ., we have that

          [ λ1^k  0    ]
    D^k = [ 0     λ2^k ]
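Computationally, this means D^k never requires repeated matrix multiplication, only an elementwise power of the diagonal; a small numpy sketch (illustrative, with the 3 × 3 diagonal matrix from above):

    import numpy as np

    lam = np.array([3.0, -5.0, -8.0])  # diagonal entries of D
    D = np.diag(lam)
    k = 4

    print(np.linalg.matrix_power(D, k))  # D^k by repeated multiplication
    print(np.diag(lam**k))               # the same result, one elementwise power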
Diagonalization
Definition: A matrix A is called diagonalizable if it is similar to a diagonal matrix D. In other
words, if there exists an invertible matrix P such that
A = PDP^-1.
How do we determine when a given matrix A is diagonalizable? Let us first determine what conditions must be met. Suppose then that A is diagonalizable. Then by the definition, there exists an invertible matrix

P = [v1 v2 · · · vn]
and a diagonal matrix

        [ λ1  0   · · ·  0  ]
    D = [ 0   λ2  · · ·  0  ]
        [ :   :          :  ]
        [ 0   0   · · ·  λn ]
such that A = PDP^-1. Multiplying both sides of the equation A = PDP^-1 on the right by the matrix P, we obtain

AP = PD.

Now

AP = [Av1 Av2 · · · Avn]

while on the other hand

PD = [λ1 v1 λ2 v2 · · · λn vn].

Therefore, since it holds that AP = PD, we have

[Av1 Av2 · · · Avn] = [λ1 v1 λ2 v2 · · · λn vn].
Comparing columns, we must have that
Avi = λi vi .
Thus, the columns v1, v2, . . . , vn of P are eigenvectors of A, and they form a basis for R^n because P is invertible. In conclusion, if A is diagonalizable, then R^n has a basis consisting of eigenvectors of A.
Suppose instead that {v1, v2, . . . , vn} is a basis of R^n consisting of eigenvectors of A. Let λ1, λ2, . . . , λn be the eigenvalues of A associated with v1, v2, . . . , vn, respectively, and set

P = [v1 v2 · · · vn].
Then P is invertible because {v1 , v2 , . . . , vn } are linearly independent. Let
        [ λ1  0   · · ·  0  ]
    D = [ 0   λ2  · · ·  0  ]
        [ :   :          :  ]
        [ 0   0   · · ·  λn ]
Now, since Avi = λi vi, we have that

AP = A[v1 v2 · · · vn] = [Av1 Av2 · · · Avn] = [λ1 v1 λ2 v2 · · · λn vn].

Therefore,

AP = [λ1 v1 λ2 v2 · · · λn vn].

On the other hand,

                          [ λ1  0   · · ·  0  ]
    PD = [v1 v2 · · · vn] [ 0   λ2  · · ·  0  ] = [λ1 v1 λ2 v2 · · · λn vn].
                          [ :   :          :  ]
                          [ 0   0   · · ·  λn ]

Therefore, AP = PD, and since P is invertible we have that

A = PDP^-1.
Thus, if R^n has a basis consisting of eigenvectors of A, then A is diagonalizable. We have
therefore proved the following theorem.
Theorem 5. A matrix A is diagonalizable if and only if there is a basis {v1 , v2 , . . . , vn } of R^n
consisting of eigenvectors of A.
Theorem 6. If an n × n matrix A has n distinct eigenvalues, then A is diagonalizable.
Proof. Each eigenvalue λi produces an eigenvector vi. The set of eigenvectors {v1, v2, . . . , vn} is linearly independent because the vectors correspond to distinct eigenvalues. Therefore, {v1, v2, . . . , vn} is a basis of R^n consisting of eigenvectors of A, and then by Theorem 5 we conclude that A is diagonalizable.
What if A does not have distinct eigenvalues? Can A still be diagonalizable? The following
theorem completely answers this question.
Theorem 7. A matrix A is diagonalizable if and only if the algebraic and geometric multiplicities
of each eigenvalue are equal.
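Here is a small numpy sketch (illustrative) of these theorems in action, reusing the 3 × 3 matrix from the first example, whose eigenvalues 2, 3, −1 are distinct:

    import numpy as np

    A = np.array([[-4., -6., -7.],
                  [ 3.,  5.,  3.],
                  [ 0.,  0.,  3.]])

    # eig returns the eigenvalues and a matrix P whose columns are eigenvectors;
    # distinct eigenvalues guarantee the columns are independent, so P is invertible.
    evals, P = np.linalg.eig(A)
    D = np.diag(evals)

    print(evals)                                     # 2, 3, -1 (in some order)
    print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True: A = P D P^{-1}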
Therefore, a matrix that diagonalizes A is

        [ −1  −2  −2 ]
    P = [  1   0   1 ]
        [  0   1   0 ]

One finds that g2 = dim(Null(A − λ2 I)) = 2, and two linearly independent eigenvectors for λ2 are

                 [ −1 ]   [ 0 ]
    {v2 , v3} =  [  0 ] , [ 1 ]
                 [  2 ]   [ 0 ]
Orthogonal Diagonalization
Recall that an n × n matrix A is diagonalizable if and only if it has n linearly independent
eigenvectors. Moreover, the matrix P with these eigenvectors as columns is a diagonalizing
matrix for A, that is,
P^-1 AP is diagonal.
As we have seen, the really nice bases of R^n are the orthogonal ones, so a natural question is:
which n × n matrices have an orthogonal basis of eigenvectors? These turn out to be precisely
the symmetric matrices, and this is the main result of this section.
Before proceeding, recall that an orthogonal set of vectors is called orthonormal if ∥v∥ = 1 for each vector v in the set, and that any orthogonal set {v1, v2, . . . , vk} can be “normalized,” that is, converted into an orthonormal set:

{ v1/∥v1∥, v2/∥v2∥, . . . , vk/∥vk∥ }.
In particular, if a matrix A has n orthogonal eigenvectors, they can (by normalizing) be taken
to be orthonormal. The corresponding diagonalizing matrix P has orthonormal columns, and
such matrices are very easy to invert.
Theorem 8. The following conditions are equivalent for an n × n matrix P .
1. P is invertible and P^-1 = P^T.
2. The rows of P are orthonormal.
3. The columns of P are orthonormal.
Proof. First, recall that condition (1) is equivalent to PP^T = I. Let x1, x2, . . . , xn denote the rows of P. Then xj^T is the jth column of P^T, so the (i, j)-entry of PP^T is given by xi · xj. Thus, PP^T = I means that xi · xj = 0 if i ≠ j and xi · xj = 1 if i = j. Hence, condition (1) is equivalent to (2). The proof of the equivalence of (1) and (3) is similar.
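Numerically, Theorem 8 can be checked on any matrix with orthonormal columns, for example the Q factor of a QR factorization; a small numpy sketch (illustrative, with an arbitrary random seed):

    import numpy as np

    rng = np.random.default_rng(0)
    # The Q factor of a QR factorization always has orthonormal columns.
    P, _ = np.linalg.qr(rng.standard_normal((4, 4)))

    print(np.allclose(P.T @ P, np.eye(4)))     # True: columns orthonormal
    print(np.allclose(P @ P.T, np.eye(4)))     # True: rows orthonormal
    print(np.allclose(np.linalg.inv(P), P.T))  # True: P^{-1} = P^T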
Theorem 9. The following conditions are equivalent for an n × n matrix A.
1. A has an orthonormal set of n eigenvectors.
2. A is orthogonally diagonalizable.
3. A is symmetric.
Quadratic Forms
Definition: A quadratic form in n variables is a function f : R^n → R of the form

f(x) = f(x1, . . . , xn) = Σ_{1 ≤ i ≤ j ≤ n} cij xi xj.    (∗)

For example, f(x1) = x1^2 is a quadratic form in one variable. Every quadratic form (∗) can be written as

f(x) = x^T Ax,

where A is the symmetric n × n matrix with aii = cii and aij = aji = cij/2 for i < j. This matrix A is called the matrix of the quadratic form f.
Example: Let

f(x1, x2) = 2x1^2 − 3x2^2 − x1 x2.

Then,

                           [  2     −1/2 ] [ x1 ]
    f(x1, x2) = [ x1  x2 ] [ −1/2   −3   ] [ x2 ].
Since a symmetric matrix is involved, we can use Theorem 9, which states that there exists an orthogonal matrix Q such that

Q^T AQ = D = diag(λ1, . . . , λn),

where λ1, . . . , λn are the eigenvalues of A.
Setting y = Q^-1 x = Q^T x, we obtain x = Qy, so that

f(x) = (Qy)^T A(Qy) = y^T Q^T AQy = y^T Dy.

If y = (y1, . . . , yn)^T, then

y^T Dy = λ1 y1^2 + · · · + λn yn^2,

which is a quadratic form in the variables y1, . . . , yn with no cross terms. This process is called the diagonalization of the quadratic form f.
Theorem 10. (Principal Axes Theorem) Every quadratic form f can be diagonalized. Specifically, if

f(x) = x^T Ax

is a quadratic form in x = (x1, . . . , xn)^T, then there exists an orthogonal matrix Q such that

f(x) = λ1 y1^2 + · · · + λn yn^2,

where

(y1, . . . , yn)^T = Q^T (x1, . . . , xn)^T,

and λ1, . . . , λn are the eigenvalues of A.
From linear algebra, we know that Q can be taken to be the matrix whose columns form an orthonormal set of eigenvectors of A.
Example: Let
f (x1 , x2 , x3 ) = 2x1 x2 + 2x1 x3 + 2x2 x3 .
The matrix of f is

        [ 0  1  1 ]
    A = [ 1  0  1 ] .
        [ 1  1  0 ]

The eigenvalues of A are 2, −1, −1 with corresponding unit eigenvectors

    [ 1/√3 ]   [ −2/√6 ]   [ 0     ]
    [ 1/√3 ] , [  1/√6 ] , [ 1/√2  ] .
    [ 1/√3 ]   [  1/√6 ]   [ −1/√2 ]

Thus,

        [ 1/√3  −2/√6   0    ]
    Q = [ 1/√3   1/√6   1/√2 ] .
        [ 1/√3   1/√6  −1/√2 ]
Setting y = Q^T x, we get

y1 = (1/√3)(x1 + x2 + x3),  y2 = (1/√6)(−2x1 + x2 + x3),  y3 = (1/√2)(x2 − x3).

Expressing f in terms of y1, y2, y3, we obtain

f = 2y1^2 − y2^2 − y3^2.
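The whole procedure is easy to carry out numerically; the following Python/numpy sketch (illustrative; the test point x is arbitrary) diagonalizes this f with np.linalg.eigh, which returns the eigenvalues of a symmetric matrix in ascending order together with an orthogonal matrix of unit eigenvectors:

    import numpy as np

    A = np.array([[0., 1., 1.],
                  [1., 0., 1.],
                  [1., 1., 0.]])

    # Eigenvalues -1, -1, 2; the columns of Q may differ from the text's choice
    # within the lambda = -1 eigenspace, but any such Q diagonalizes f.
    evals, Q = np.linalg.eigh(A)

    x = np.array([1.0, 2.0, -1.0])  # an arbitrary test point
    y = Q.T @ x

    f_direct = x @ A @ x                 # x^T A x
    f_diag = np.sum(evals * y**2)        # lambda_1 y_1^2 + ... + lambda_n y_n^2
    print(np.isclose(f_direct, f_diag))  # True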
Definition: A quadratic form f is called positive definite if f(x) > 0 for all x ≠ 0.
Theorem 11. Let f(x) = x^T Ax be a quadratic form with matrix A. Then f is positive definite if and only if all the eigenvalues of A are positive.
Proof: By the Principal Axes Theorem, there exists an orthogonal matrix Q such that

f(x) = λ1 y1^2 + · · · + λn yn^2,

where y = (y1, . . . , yn)^T = Q^T x and λ1, . . . , λn are the eigenvalues of A. If all the λi are positive, then f(x) > 0 except when y = 0. But y = 0 happens if and only if x = 0, because Q^T is invertible. Therefore, f is positive definite.
On the other hand, if one of the eigenvalues λi ≤ 0, then letting y = ei and x = Qy, we get f(x) = λi ≤ 0, and so f is not positive definite.
We say that a symmetric matrix A is positive definite if the associated quadratic form

f(x) = x^T Ax

is positive definite. The Principal Axes Theorem has important applications in geometry.
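In computations, Theorem 11 gives a practical test for positive definiteness; a small numpy sketch (illustrative; the helper name is mine, and the tolerance guards against round-off):

    import numpy as np

    def is_positive_definite(A, tol=1e-12):
        """Assumes A is symmetric; True iff every eigenvalue of A exceeds tol."""
        # eigvalsh computes the eigenvalues of a symmetric matrix, ascending.
        return bool(np.linalg.eigvalsh(A).min() > tol)

    print(is_positive_definite(np.array([[2., -1.], [-1., 2.]])))  # True: eigenvalues 1, 3
    print(is_positive_definite(np.array([[0., 1.], [1., 0.]])))    # False: eigenvalues -1, 1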