Unitary and Orthogonal Operators and Their Matrices, Part 2
Definition 6.36. A linear isometry of an inner product space V over F is a linear map T satisfying
∀x ∈ V, ||T(x)|| = ||x||
It should be clear that every eigenvalue of an isometry must have modulus 1: if T(w) = λw with w ≠ 0, then ‖w‖ = ‖T(w)‖ = ‖λw‖ = |λ| ‖w‖, whence |λ| = 1.
The matrix in this example is very special in that its inverse is its transpose:
A^{-1} = \frac{1}{\frac{16}{25} + \frac{9}{25}} \cdot \frac{1}{5}\begin{pmatrix} 4 & 3 \\ -3 & 4 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 4 & 3 \\ -3 & 4 \end{pmatrix} = A^T
Definition 6.38. A unitary operator T on an inner product space V is an invertible linear map satis-
fying T∗ T = I = TT∗ . A unitary matrix is a matrix satisfying A∗ A = I.
• If V is real, we usually call these orthogonal operators/matrices: this isn’t necessary, since unitary
encompasses both real and complex spaces. Note that an orthogonal matrix satisfies A T A = I.
Theorem 6.40. Let T be a linear operator on an inner product space V.
1. If T is unitary, then T preserves the inner product, ⟨T(x), T(y)⟩ = ⟨x, y⟩ (†), and is therefore an isometry.
2. If V is finite-dimensional and T is an isometry, then T is unitary.
The finite-dimensional restriction is important in part 2: we use the existence of adjoints, the spectral
theorem, and that a left-inverse is also a right-inverse. Again, see Exercise 6.5.12. for an example of a
non-unitary isometry in infinite dimensions.
The proof shows a little more:
Corollary 6.41. On a finite dimensional space, being unitary is equivalent to each of the following:
(a) Preservation of the inner product^a (†). In particular, in a real inner product space isometries also preserve the angle θ between vectors, since cos θ = \frac{\langle x, y\rangle}{\|x\| \, \|y\|}.
(b) The existence of an orthonormal basis β = {w1 , . . . , wn } such that T( β) = {T(w1 ), . . . , T(wn )}
is also orthonormal.
(c) That every orthonormal basis β of V is mapped to an orthonormal basis T( β).
a (†) is in fact equivalent to being an isometry in infinite dimensions: recall the polarization identity. . .
While (a) is simply (†), claims (b) and (c) are also worth proving explicitly: see Exercise 6.5.8. If β is
the standard orthonormal basis of Fn and T = L A , then the columns of A form the orthonormal set
T ( β). This makes identifying unitary/orthogonal matrices easy:
Corollary 6.42. A matrix A ∈ Mn (R) is orthogonal if and only if its columns form an orthonormal
basis of Rn with respect to the standard (dot) inner product.
A matrix A ∈ Mn (C) is unitary if and only if its columns form an orthonormal basis of Cn with
respect to the standard (hermitian) inner product.
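For readers who like to check such claims numerically, here is a minimal Python/numpy sketch of the column criterion in Corollary 6.42 (numpy and the particular test matrix, the rotation from Example 6.37, are assumptions of this illustration):

import numpy as np

# Rotation by theta = arctan(3/4): cos = 4/5, sin = 3/5 (Example 6.37)
A = np.array([[4.0, -3.0],
              [3.0,  4.0]]) / 5

# Columns orthonormal  <=>  A* A = I  (A is real, so A* = A^T)
print(np.allclose(A.conj().T @ A, np.eye(2)))   # True: A is orthogonal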
Examples 6.43. 1. The matrix A_θ = \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix} ∈ M_2(R) is orthogonal for any θ. Example 6.37 is this with θ = \tan^{-1}\frac{3}{4}. More generally (Exercise 6.5.6.), it can be seen that every real orthogonal 2 × 2 matrix has the form A_θ or
B_θ = \begin{pmatrix} \cos θ & \sin θ \\ \sin θ & -\cos θ \end{pmatrix}
for some angle θ. The effect of the linear map L_{A_θ} is to rotate counter-clockwise by θ, while that of L_{B_θ} is to reflect across the line making angle \frac{1}{2}θ with the positive x-axis.
2. A = \frac{1}{\sqrt{6}}\begin{pmatrix} \sqrt{2} & \sqrt{3} & 1 \\ \sqrt{2} & 0 & -2 \\ -\sqrt{2} & \sqrt{3} & -1 \end{pmatrix} ∈ M_3(R) is orthogonal: check the columns!
3. The matrix A = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix} is unitary: indeed it maps the standard basis to the orthonormal basis
T(β) = \left\{ \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}, \frac{1}{\sqrt{2}}\begin{pmatrix} i \\ 1 \end{pmatrix} \right\}
It is also easy to check that the characteristic polynomial is
p(t) = \det\begin{pmatrix} \frac{1}{\sqrt{2}} - t & \frac{i}{\sqrt{2}} \\ \frac{i}{\sqrt{2}} & \frac{1}{\sqrt{2}} - t \end{pmatrix} = \left( t - \frac{1}{\sqrt{2}} \right)^2 + \frac{1}{2} \implies t = \frac{1}{\sqrt{2}}(1 ± i) = e^{±πi/4}
For example, let T be the operator on C[−π, π] (with the integral inner product) defined by T(f)(x) = e^{ix} f(x). Then:
• T has no eigenvalues, since T(f) = λf ⟺ ∀x, e^{ix} f(x) = λf(x) ⟺ f(x) ≡ 0: indeed e^{ix} = λ for at most one x ∈ [−π, π], whence the continuous function f vanishes identically.
• T certainly maps any orthonormal set to an orthonormal set, however it can be seen that C[−π, π] has no orthonormal basis!^a
^a An orthonormal set β = {f_k : k ∈ Z} can be found so that every function f equals an infinite series in the sense that \lim_{n→∞}‖f − \sum_{|k|≤n} a_k f_k‖ = 0. However, these are not finite sums and so β is not a basis. Moreover, given that the norm is defined by
an integral, this also isn’t quite the same as saying that f = ∑ ak f k as functions. Indeed there is no guarantee that such an
infinite series is itself continuous! For these reasons, when working with Fourier series, one tends to consider a broader
class than the continuous functions.
Unitary and Orthogonal Equivalence
Suppose A ∈ Mn (R) is symmetric (self-adjoint) A T = A. By the spectral theorem, A has an orthonor-
mal eigenbasis β = {w1 , . . . , wn }: Aw j = λ j w j . If we write U = (w1 · · · wn ), then the columns of U
are orthonormal and thus U is an orthogonal matrix. We can therefore write
A = UDU^{-1} = U\begin{pmatrix} λ_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & λ_n \end{pmatrix}U^T
The same approach works if A ∈ Mn (C) is normal: we now have A = UDU ∗ where U is unitary.
Example 6.44. The matrix A = \begin{pmatrix} 1+i & 1+i \\ -1-i & 1+i \end{pmatrix} is normal, as can easily be checked. Its characteristic polynomial is
p(t) = t^2 − (2+2i)t + 4i = (t − 2)(t − 2i)
whence the eigenvalues are 2 and 2i.
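A quick numerical confirmation of normality and of these eigenvalues, using numpy (an assumed computational aid, not part of the development):

import numpy as np

A = np.array([[ 1+1j, 1+1j],
              [-1-1j, 1+1j]])

print(np.allclose(A @ A.conj().T, A.conj().T @ A))  # True: A A* = A* A, so A is normal
print(np.linalg.eigvals(A))                          # approximately 2 and 2j (in some order)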
Definition 6.45. Square matrices A, B are unitarily equivalent if there exists a unitary matrix U such
that B = U ∗ AU. Orthogonal equivalence is similar: B = U T AU.
Theorem 6.46. A ∈ Mn (C) is normal if and only if it is unitarily equivalent to a diagonal matrix
(the matrix of its eigenvalues).
Similarly, A ∈ Mn (R) is symmetric if and only if it is orthogonally equivalent to a diagonal matrix.
Indeed, if A = U^∗DU with U unitary and D diagonal, then
A^∗A = (U^∗DU)^∗U^∗DU = U^∗D^∗UU^∗DU = U^∗D^∗DU = U^∗DD^∗U = U^∗DU(U^∗DU)^∗ = AA^∗
so A is normal; similarly, if A = U^TDU with U orthogonal and D real diagonal, then
A^T = (U^TDU)^T = U^TD^TU = U^TDU = A
Exercises. 6.5.1. For each matrix A find an orthogonal or unitary U and a diagonal D = U ∗ AU.
(a) \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} \quad (b) \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \quad (c) \begin{pmatrix} 2 & 3-3i \\ 3+3i & 5 \end{pmatrix} \quad (d) \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}
6.5.2. Which of the following pairs are unitarily/orthogonally equivalent? Explain your answer.
(a) A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} and B = \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix} \qquad (b) A = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} and B = \begin{pmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
(c) A = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} and B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & i & 0 \\ 0 & 0 & -i \end{pmatrix}
6.5.3. Let a, b ∈ C be such that |a|^2 + |b|^2 = 1. Prove that every 2 × 2 matrix of the form \begin{pmatrix} a & -e^{iθ}\overline{b} \\ b & e^{iθ}\overline{a} \end{pmatrix} is unitary. Are these all the unitary 2 × 2 matrices? Prove or disprove. . .
6.5.4. If A, B are orthogonal/unitary, prove that AB and A−1 are also orthogonal/unitary.
(This shows that the orthogonal/unitary matrices form groups under matrix multiplication.)
6.5.5. Check that A = \frac{1}{3}\begin{pmatrix} 4i & 5 \\ -5 & 4i \end{pmatrix} ∈ M_2(C) satisfies A^TA = I and is therefore a complex orthogonal matrix.
(These don't have the same nice relationship with inner products, and are thus less useful to us.)
6.5.6. Supply the details of Example 6.43.1.
(Hints: β = {i, j} is orthonormal, whence { Ai, Aj} must be orthonormal. Now draw pictures to
compute the result of rotating and reflecting the vectors i and j.)
6.5.7. Prove that A ∈ Mn (C) has an orthonormal basis of eigenvectors whose eigenvalues have
modulus 1, if and only if A is unitary.
6.5.8. Prove parts (b) and (c) of Corollary 6.41 for a finite-dimensional inner product space:
(a) If β is an orthonormal basis such that T( β) is orthonormal, then T is unitary.
(b) If T is unitary, and η is an orthonormal basis, then T(η ) is an orthonormal basis.
6.5.9. Let T be a linear operator on a finite-dimensional inner product space V. If ||T(x)|| = ||x|| for
all x in some orthonormal basis of V, must T be unitary? Prove or disprove.
6.5.10. Let T be a unitary operator on an inner product space V and let W be a finite-dimensional
T-invariant subspace of V. Prove:
(a) T(W) = W (Hint: show that the restriction T_W is injective. . . );
(b) W ⊥ is T-invariant.
6.5.11. Let W be a subspace of an inner product space V such that V = W ⊕ W^⊥. Define T ∈ L(V) by
T(u + w) = u − w where u ∈ W and w ∈ W ⊥ . Prove that T is unitary and self-adjoint.
6.5.12. In the inner product space `2 of square-summable sequences (see section 6.1), consider the
linear operator T( x1 , x2 , . . .) = (0, x1 , x2 , . . .). Prove that T is an isometry and compute its
adjoint. Check that T is non-invertible and non-unitary.
6.5.13. Prove Schur’s Lemma for matrices: every A ∈ M_n(C) is unitarily equivalent to an upper triangular matrix, and every A ∈ M_n(R) whose characteristic polynomial splits over R is orthogonally equivalent to an upper triangular matrix.
6.6 Orthogonal Projections
Recall the discussion surrounding the Gram-Schmidt process, where we saw that any finite-dimensional
subspace W of an inner product space V has an orthonormal basis βW = {w1 , . . . , wn }. We could
then define the orthogonal projection onto W as the map
π_W : V → V : x ↦ \sum_{j=1}^{n} ⟨x, w_j⟩ w_j
In this section we develop the projections more rigorously, though we start from a slightly different
place. First recall the notion of a direct sum within a vector space V:
Example 6.48. A = \frac{1}{5}\begin{pmatrix} -1 & 2 \\ -3 & 6 \end{pmatrix} is a projection matrix with R(A) = Span\left\{\begin{pmatrix} 1 \\ 3 \end{pmatrix}\right\} and N(A) = Span\left\{\begin{pmatrix} 2 \\ 1 \end{pmatrix}\right\}.
Identifying projections is very easy: a linear map T ∈ L(V) is a projection if and only if T^2 = T. Check this yourself for the above matrix!
(⇐) Suppose T^2 = T. Note first that if r ∈ R(T), then r = T(v) for some v ∈ V, whence
T(r) = T^2(v) = T(v) = r \qquad (†)
Thus T is the identity on R(T). Moreover, if x ∈ R(T) ∩ N(T), then (†) says that^a x = T(x) = 0.
Now observe that for any v ∈ V,
T\big(v − T(v)\big) = T(v) − T^2(v) = 0 \implies v − T(v) ∈ N(T)
so that v = T(v) + v − T(v) is a decomposition into R(T)- and N (T)-parts. We conclude that
V = R(T) ⊕ N (T) and that T is a projection.
a By the Rank–Nullity Theorem, this is enough to establish V = R(T) ⊕ N (T) when V is finite-dimensional.
We can generalize the above to describe all projection matrices in M2 (R). There are three cases:
Thus far the discussion hasn't had anything to do with inner products. Now we specialize:
In the language above, the identity and zero matrices are both 2 × 2 real orthogonal projection matrices, while those of type 3 are orthogonal if \begin{pmatrix} c \\ d \end{pmatrix} is parallel to \begin{pmatrix} -b \\ a \end{pmatrix}:
A = \frac{1}{a^2 + b^2}\begin{pmatrix} a \\ b \end{pmatrix}\begin{pmatrix} a & b \end{pmatrix} = \frac{1}{a^2 + b^2}\begin{pmatrix} a^2 & ab \\ ab & b^2 \end{pmatrix}
More generally, if β = {v_1, . . . , v_k} ⊆ F^n is orthonormal, then π_{Span β} has matrix \sum_{j=1}^{k} v_j v_j^∗.
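As a sketch of this formula (assuming numpy; the orthonormal pair below is an arbitrary illustration, not taken from the notes), one can build the matrix ∑ v_j v_j^∗ and confirm it is idempotent and self-adjoint:

import numpy as np

v1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)   # an assumed orthonormal pair in R^3
v2 = np.array([0.0, 0.0, 1.0])

P = np.outer(v1, v1.conj()) + np.outer(v2, v2.conj())   # sum of v_j v_j^*

print(np.allclose(P @ P, P))         # P^2 = P: a projection
print(np.allclose(P, P.conj().T))    # P^* = P: an orthogonal projection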
Proof. (⇒) By assumption, R(T) and N(T) are orthogonal subspaces. Letting x, y ∈ V and using subscripts to denote R(T)- and N(T)-parts, we see that
⟨T(x), y⟩ = ⟨x_R, y_R + y_N⟩ = ⟨x_R, y_R⟩ = ⟨x_R + x_N, y_R⟩ = ⟨x, T(y)⟩
Moreover, since T is a projection already, we have that V = R(T) ⊕ N(T) = R(T) ⊕ R(T)^⊥, from which^a R(T) = (R(T)^⊥)^⊥ = N(T)^⊥.
^a Recall that if V = U ⊕ U^⊥, then (U^⊥)^⊥ = U. Alternatively, one can check explicitly that ‖x − T(x)‖ = 0 for any x ∈ N(T)^⊥ to see that N(T)^⊥ ≤ R(T) ≤ (R(T)^⊥)^⊥ = N(T)^⊥; though the calculation is really just a combination of part of Lemma 6.49 and the proof that V = U ⊕ U^⊥ ⟹ (U^⊥)^⊥ = U.
Orthogonal Projections and the Spectral Theorem
It should be clear that every projection T has (at most) two eigenspaces: E_1 = R(T) (eigenvalue 1) and E_0 = N(T) (eigenvalue 0). With respect to a basis obtained by concatenating bases of these subspaces,
[T]_β = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}
where rank I = rank T. In particular, every such projection is diagonalizable. The language of projections allows us to rephrase the Spectral Theorem.
Theorem 6.52 (Spectral Theorem, mk. II). Let V be finite-dimensional and T ∈ L(V ) be
normal/self-adjointa with distinct eigenvalues λ1 , . . . , λk and corresponding eigenspaces E1 , . . . , Ek .
Let π_j ∈ L(V) be the orthogonal projection onto E_j. Then:
1. V = E_1 ⊕ · · · ⊕ E_k and, for each j, E_j^⊥ = \bigoplus_{i ≠ j} E_i;
2. π_iπ_j = 0 if i ≠ j;
3. IV = π1 + · · · + πk ;
4. T = λ1 π1 + · · · + λk πk .
a Normal if V is complex, self-adjoint if V is real.
Proof. 1. T is diagonalizable and so V is the direct sum of the eigenspaces of T. Since T is normal,
the eigenvectors corresponding to distinct eigenvalues are orthogonal, whence the eigenspaces
are mutually orthogonal. In particular, this says that
\hat{E}_j := \bigoplus_{i ≠ j} E_i ≤ E_j^⊥
Definition 6.53. The spectrum of a normal/self-adjoint operator T on a finite dimensional inner
product space is its set of eigenvalues. The expressions in parts 3 and 4 of the theorem are called,
respectively, the resolution of the identity and the spectral decomposition of T.
Examples 6.54. 1. Recall Example 6.44 where we had a normal matrix A = \begin{pmatrix} 1+i & 1+i \\ -1-i & 1+i \end{pmatrix} with orthonormal eigenvectors
w_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}, \qquad w_{2i} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}
Writing π2 , π2i for the orthogonal projection matrices onto these eigenspaces, it is easy to see
that
π_2 = w_2w_2^∗ = \frac{1}{2}\begin{pmatrix} 1 \\ -i \end{pmatrix}\begin{pmatrix} 1 & i \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix}, \qquad π_{2i} = w_{2i}w_{2i}^∗ = \frac{1}{2}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix}
It is now easy to check the resolution of the identity and the spectral decomposition:
π_2 + π_{2i} = I_2, \qquad 2π_2 + 2iπ_{2i} = A
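If you prefer to see the arithmetic done by machine, here is a short numpy check of both identities for this example (the library is an assumption of this aside):

import numpy as np

A   = np.array([[ 1+1j, 1+1j],
                [-1-1j, 1+1j]])
w2  = np.array([1, -1j]) / np.sqrt(2)
w2i = np.array([1,  1j]) / np.sqrt(2)

pi2  = np.outer(w2,  w2.conj())    # projection onto the 2-eigenspace
pi2i = np.outer(w2i, w2i.conj())   # projection onto the 2i-eigenspace

print(np.allclose(pi2 + pi2i, np.eye(2)))    # resolution of the identity
print(np.allclose(2*pi2 + 2j*pi2i, A))       # spectral decomposition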
Exercises. 6.6.1. Compute the matrices of the orthogonal projections onto the following subspaces:
in all cases we use the standard inner product.
(a) Span\left\{\begin{pmatrix} -4 \\ 1 \end{pmatrix}\right\} in R^2
(b) Span\left\{\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}\right\} in R^3
(c) Span\left\{\begin{pmatrix} i \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ i \\ 1 \end{pmatrix}\right\} in C^3
(d) Span\left\{\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}\right\} in R^3 (watch out, these vectors aren't orthogonal!)
6.6.2. For each of the matrices in Exercise 6.5.1., compute the projections onto each eigenspace, verify
the resolution of the identity and the spectral decomposition.
6.6.3. Let W be a finite-dimensional subspace of an inner product space V. If T = π_W is the orthogonal
projection onto W, prove that I − T is the orthogonal projection onto W ⊥ .
6.6.4. Let T ∈ L(V ) where V is finite-dimensional.
(a) If T is an orthogonal projection, prove that ||T(x)|| ≤ ||x|| for all x ∈ V.
(b) Give an example of a projection for which the inequality in (a) is false.
(c) If T is a projection for which ||T(x)|| = ||x|| for all x ∈ V, what is T?
(d) If T is a projection for which ||T(x)|| ≤ ||x|| for all x ∈ V, prove that T is an orthogonal
projection.
6.6.5. Let T be a normal operator on a finite-dimensional inner product space. If T is a projection,
prove that it must be an orthogonal projection.
6.6.6. Let T be a normal operator on a finite-dimensional complex inner product space V. Use the
spectral decomposition T = λ1 π1 + · · · + λk πk to prove:
(a) If Tn is the zero map for some n ∈ N, then T is the zero map.
(b) U ∈ L(V ) commutes with T if and only if U commutes with each π j .
(c) There exists a normal U ∈ L(V ) such that U2 = T.
(d) T is invertible if and only if λ j 6= 0 for all j.
(e) T is a projection if and only if every λ j = 0 or 1.
(f) T = −T∗ if and only if every λ j is imaginary.
6.7 The Singular Value Decomposition and the Pseudoinverse
Given T ∈ L(V, W ) between finite-dimensional inner product spaces, the overarching concern of this
chapter is the existence and computation of bases β, γ of V, W with two properties:
• That β, γ be orthonormal, thus facilitating easy calculation within V, W;
• That the matrix [T]_β^γ be as simple as possible.
The bases β, γ come close to ‘diagonalizing’ the operator. The resulting scalars on the main diagonal (4, 2\sqrt{3}) behave very like eigenvalues. Our main result says that such bases always exist.
Theorem 6.56 (Singular Value Decomposition). Suppose V, W are finite-dimensional inner product spaces and that T ∈ L(V, W) has rank r. Then there exist orthonormal bases β = {v_1, . . . , v_n} of V and γ = {w_1, . . . , w_m} of W, and positive scalars σ_1 ≥ · · · ≥ σ_r, such that
T(v_j) = \begin{cases} σ_jw_j & \text{if } j ≤ r \\ 0 & \text{if } j > r \end{cases}
Definition 6.57. The numbers σ_1, . . . , σ_r are the singular values of T. If T does not have maximum rank, we have additional zero singular values σ_{r+1} = · · · = σ_{min(m,n)} = 0. If A is a matrix, its singular values
are those of the linear map L A .
• While the singular values are determined by T, there is often significant freedom of choice of
the bases β, γ, particularly if any of the eigenspaces of T∗ T have dimension ≥ 2.
• For matrices, the decomposition reads A = UΣV^∗, where Σ is the m × n matrix whose only non-zero entries are Σ_{jj} = σ_j for j ≤ r. Since the columns of U, V are the orthonormal elements of γ, β respectively, these matrices are unitary.
Examples 6.58. 1. First recall Example 6.55. We have A^∗A = A^TA = \begin{pmatrix} 14 & 2 \\ 2 & 14 \end{pmatrix} with eigenvalues σ_1^2 = 16 and σ_2^2 = 12 and orthonormal eigenvectors v_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, v_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} -1 \\ 1 \end{pmatrix}. The singular values are therefore σ_1 = 4, σ_2 = 2\sqrt{3}. Now compute
w_1 = \frac{1}{σ_1}Av_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \qquad w_2 = \frac{1}{σ_2}Av_2 = \frac{1}{\sqrt{6}}\begin{pmatrix} -1 \\ -2 \\ 1 \end{pmatrix}
and observe that these are orthonormal. Finally choose w_3 = \frac{1}{\sqrt{3}}\begin{pmatrix} 1 \\ -1 \\ -1 \end{pmatrix} to complete the orthonormal basis γ of R^3. We therefore have the singular value decomposition
A = \begin{pmatrix} 3 & 1 \\ 2 & -2 \\ 1 & 3 \end{pmatrix} = UΣV^∗ = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{-2}{\sqrt{6}} & \frac{-1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{-1}{\sqrt{3}} \end{pmatrix}\begin{pmatrix} 4 & 0 \\ 0 & 2\sqrt{3} \\ 0 & 0 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}
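The same decomposition can be cross-checked with numpy's built-in SVD (used here purely as an assumed numerical aid; the signs of the columns of U and rows of V^∗ may differ from the hand computation):

import numpy as np

A = np.array([[3.0,  1.0],
              [2.0, -2.0],
              [1.0,  3.0]])

U, S, Vt = np.linalg.svd(A)
print(S)                                    # approx [4.0, 3.4641...] = [4, 2*sqrt(3)]
print(np.allclose((U[:, :2] * S) @ Vt, A))  # True: A = U Sigma V*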
2. The matrix A = \begin{pmatrix} 2 & 3 \\ 0 & 2 \end{pmatrix} has A^TA = \begin{pmatrix} 4 & 6 \\ 6 & 13 \end{pmatrix} with eigenvalues σ_1^2 = 16 and σ_2^2 = 1 and orthonormal eigenbasis
β = \left\{ \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ 2 \end{pmatrix}, \frac{1}{\sqrt{5}}\begin{pmatrix} -2 \\ 1 \end{pmatrix} \right\}
The singular values are therefore σ_1 = 4 and σ_2 = 1, from which we obtain
γ = \left\{ \frac{1}{σ_1}Av_1, \frac{1}{σ_2}Av_2 \right\} = \left\{ \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ 1 \end{pmatrix}, \frac{1}{\sqrt{5}}\begin{pmatrix} -1 \\ 2 \end{pmatrix} \right\}
and the singular value decomposition
A = UΣV^∗ = \begin{pmatrix} \frac{2}{\sqrt{5}} & \frac{-1}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{pmatrix}\begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ \frac{-2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{pmatrix}
Proof of the Singular Value Decomposition. 1. T^∗T is self-adjoint: by the spectral theorem it has an orthonormal basis of eigenvectors β = {v_1, . . . , v_n}. Suppose T^∗T(v_j) = λ_jv_j; since λ_j = ⟨T^∗T(v_j), v_j⟩ = ‖T(v_j)‖^2 ≥ 0 and rank T^∗T = rank T = r, we may order the basis so that
λ_1 ≥ · · · ≥ λ_r > 0 = λ_{r+1} = · · · = λ_n
If j ≤ r, define σ_j := \sqrt{λ_j} > 0 and w_j := \frac{1}{σ_j}T(v_j); then the set {w_1, . . . , w_r} is orthonormal. If necessary, extend this to an orthonormal basis γ of W.
⟨T^∗(w_j), v_k⟩ = ⟨w_j, T(v_k)⟩ = ⟨w_j, σ_kw_k⟩ = σ_kδ_{jk} = ⟨σ_jv_j, v_k⟩ \implies T^∗(w_j) = σ_jv_j
It is typically much harder to find singular values in non-standard inner product spaces, since computation of the adjoint is often so difficult. Here is a classic example.
Example 6.59. Consider the inner product ⟨f, g⟩ = \int_0^1 f(x)g(x)\,dx on the polynomial spaces P_2(R) and P_1(R). As previously seen, we have orthonormal bases
β = \{\sqrt{5}(6x^2 − 6x + 1), \sqrt{3}(2x − 1), 1\}, \qquad γ = \{\sqrt{3}(2x − 1), 1\}
Let T = \frac{d}{dx} be the derivative operator. It is easy to find the matrix of T:
[T]_β^γ = \begin{pmatrix} 2\sqrt{15} & 0 & 0 \\ 0 & 2\sqrt{3} & 0 \end{pmatrix}
This matrix is already in the required form, whence β, γ are suitable bases, and the singular values of T are σ_1 = 2\sqrt{15} and σ_2 = 2\sqrt{3}.
In case you are unconvinced and want to use the method to evaluate this directly, use the orthonor-
mality of β, γ to compute
[T^∗T]_β = ([T]_β^γ)^∗[T]_β^γ = \begin{pmatrix} 60 & 0 & 0 \\ 0 & 12 & 0 \\ 0 & 0 & 0 \end{pmatrix} \implies σ_1^2 = 60, \; σ_2^2 = 12
The standard basis of R3 is clearly an orthonormal basis of eigenvectors for this matrix: up to sign,
{[v1 ] β , [v2 ] β , [v3 ] β } is therefore forced to be the standard ordered basis of R3 . This says that β was the
correct basis of P2 (R) all along.
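A tiny numerical check of this last computation (numpy assumed, working directly with the matrix [T]^γ_β since β, γ are orthonormal):

import numpy as np

T = np.array([[2*np.sqrt(15), 0,            0],
              [0,             2*np.sqrt(3), 0]])

print(T.T @ T)                # diag(60, 12, 0), as claimed
print(np.sqrt([60.0, 12.0]))  # the singular values 2*sqrt(15) and 2*sqrt(3)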
The Pseudoinverse
Given the singular values of an operator, it is straightforward to define something that looks like an
inverse map, but which makes sense even when the operator is not invertible!
Definition 6.60. Suppose we have the singular value decomposition of T ∈ L(V, W ). The pseudoin-
verse of T is the linear map T† ∈ L(W, V ) defined by
T^†(w_j) = \begin{cases} \frac{1}{σ_j}v_j & \text{if } j ≤ r \\ 0 & \text{otherwise} \end{cases}
1. For the matrix A of Example 6.58.1, the pseudoinverse is
A^† = \frac{1}{σ_1}v_1w_1^∗ + \frac{1}{σ_2}v_2w_2^∗ = \frac{1}{4} \cdot \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 0 & 1 \end{pmatrix} + \frac{1}{2\sqrt{3}} \cdot \frac{1}{\sqrt{2}}\begin{pmatrix} -1 \\ 1 \end{pmatrix}\frac{1}{\sqrt{6}}\begin{pmatrix} -1 & -2 & 1 \end{pmatrix}
= \frac{1}{8}\begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix} + \frac{1}{12}\begin{pmatrix} 1 & 2 & -1 \\ -1 & -2 & 1 \end{pmatrix} = \frac{1}{24}\begin{pmatrix} 5 & 4 & 1 \\ 1 & -4 & 5 \end{pmatrix}
which is exactly what we would have found by computing A† = VΣ† U ∗ . Observe that
A^†A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad AA^† = \frac{1}{3}\begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}
are the orthogonal projection matrices onto Span{v1 , v2 } = R2 and Span{w1 , w2 } ≤ R3 respec-
tively. These are projections onto two-dimensional subspaces since rank A = 2. It is also easy to
check that
A( A T A)−1 A T = AA†
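These observations are easily verified numerically; a minimal numpy sketch (the library call pinv is numpy's standard numerical pseudoinverse, used here only as an assumed check):

import numpy as np

A = np.array([[3.0,  1.0],
              [2.0, -2.0],
              [1.0,  3.0]])

Adag = np.linalg.pinv(A)
print(np.allclose(Adag, np.array([[5, 4, 1], [1, -4, 5]]) / 24))   # matches the hand computation
print(np.allclose(Adag @ A, np.eye(2)))                            # projection onto R^2
print(np.allclose(A @ Adag, (np.ones((3, 3)) + np.eye(3)) / 3))    # (1/3)[[2,1,1],[1,2,1],[1,1,2]]
print(np.allclose(A @ np.linalg.inv(A.T @ A) @ A.T, A @ Adag))     # A(A^T A)^{-1} A^T = A A-dagger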
2. Finally we check that the pseudoinverse of the derivative operator behaves roughly as ex-
pected.
d
The pseudoinverse of T = dx : P2 (R) → P1 (R), as seen in Example 6.59, maps
T^†(\sqrt{3}(2x − 1)) = \frac{1}{2\sqrt{15}}\sqrt{5}(6x^2 − 6x + 1) = \frac{1}{2\sqrt{3}}(6x^2 − 6x + 1)
T^†(1) = \frac{1}{2\sqrt{3}}\sqrt{3}(2x − 1) = x − \frac{1}{2}
\implies T^†(a + bx) = T^†\left( a + \frac{b}{2} + \frac{b}{2\sqrt{3}}\sqrt{3}(2x − 1) \right) = \left( a + \frac{b}{2} \right)\left( x − \frac{1}{2} \right) + \frac{b}{12}(6x^2 − 6x + 1)
= \frac{b}{2}x^2 + ax − \frac{a}{2} − \frac{b}{6}
The pseudoinverse of ‘differentiation’ therefore returns a particular choice of anti-derivative.
Exercises. 6.7.1. Find the ingredients β, γ and the singular values for each of the following:
(a) T ∈ L(R^2, R^3) where T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ x + y \\ x − y \end{pmatrix}
(b) T : P_2(R) → P_1(R) and T(f) = f'' where ⟨f, g⟩ := \int_0^1 f(x)g(x)\,dx
(c) V = W = Span{1, \sin x, \cos x} and ⟨f, g⟩ = \int_0^{2π} f(x)g(x)\,dx, with T(f) = f' + 2f
6.7.2. Find a singular value decomposition of each of the matrices:
(a) \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ -1 & -1 \end{pmatrix} \quad (b) \begin{pmatrix} 1 & 0 & -1 \\ 1 & 0 & 1 \end{pmatrix} \quad (c) \begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}
6.7.3. Find an explicit formula for T† in each of the examples in Exercise 6.7.1..
6.7.4. Find the pseudoinverse of each of the matrices in Exercise 6.7.2..
6.7.5. Suppose T : V → W is written according to the singular value decomposition. Compute T∗
in terms of β, γ and prove that γ is a basis of eigenvectors of TT∗ with the same non-zero
eigenvalues as T∗ T, including repetitions.
6.7.6. Suppose T ∈ L(V) is a normal operator. Prove that each v_j in the singular value theorem may
be chosen to be an eigenvector of T and that σj is the modulus of the corresponding eigenvalue.
6.7.7. Let V, W be finite-dimensional inner product spaces and T ∈ L(V, W ). Prove:
(a) If T is injective, then T∗ T is invertible and T† = (T∗ T)−1 T∗ .
(b) If T is surjective, then TT∗ is invertible and T† = T∗ (TT∗ )−1 .
6.7.8. Suppose A ∈ Mm×n (R) and b ∈ Rm and define z = A† b.
(a) If Ax = b is consistent (has a solution), prove that z is the solution with minimal norm.
(b) If Ax = b is inconsistent, prove that z minimizes ‖Ax − b‖ and is the unique such minimizer with minimal norm.
6.7.9. Find the minimal norm solution to the first system, and the vector which comes closest to
solving the second:
\begin{cases} 3x + 2y + z = 9 \\ x − 2y + 3z = 3 \end{cases} \qquad\qquad \begin{cases} 3x + y = 1 \\ 2x − 2y = 0 \\ x + 3y = 0 \end{cases}
6.8 Bilinear and Quadratic Forms
In this section we slightly generalize the idea of an inner product. Throughout, V is a vector space
over a field F: it need not be an inner product space and F can be any field (not just R or C).
Definition 6.62. A bilinear form B : V × V → F is a function which is linear in each entry when the other is held fixed. That is: ∀v, x, y ∈ V, λ ∈ F,
B(λx + y, v) = λB(x, v) + B(y, v) \quad\text{and}\quad B(v, λx + y) = λB(v, x) + B(v, y)
Examples 6.63. 1. If V is a real inner product space, then the inner product h , i is a symmetric
bilinear form. Note that a complex inner product is not bilinear!
2. If A ∈ Mn (F), then B(x, y) := x T Ay is a bilinear form on Fn . For instance, on R2 ,
B(x, y) = x^T\begin{pmatrix} 1 & 2 \\ 2 & 0 \end{pmatrix}y = x_1y_1 + 2x_1y_2 + 2x_2y_1
defines a symmetric bilinear form, though not an inner product since it isn’t positive definite;
for example B(j, j) = 0.
Definition 6.64. Let B be a bilinear form on a finite-dimensional V with basis β = {v1 , . . . , vn }. The
matrix of B with respect to β is the matrix [ B] β = A ∈ Mn (F) with ijth entry
Aij = B(vi , v j )
Given x, y ∈ V, compute their co-ordinate vectors [x]_β, [y]_β with respect to β; then
B(x, y) = [x]_β^T A[y]_β
The set of bilinear forms on V is therefore in bijective correspondence with the set Mn (F). Moreover,
B(y, x) = [y]_β^TA[x]_β = \left( [y]_β^TA[x]_β \right)^T = [x]_β^TA^T[y]_β
Finally, if γ is another basis of V, then an appeal to the change of co-ordinate matrix Q_γ^β yields
B(x, y) = [x]_β^TA[y]_β = (Q_γ^β[x]_γ)^TA(Q_γ^β[y]_γ) = [x]_γ^T(Q_γ^β)^TAQ_γ^β[y]_γ \implies [B]_γ = (Q_γ^β)^T[B]_β Q_γ^β
We conclude:
1. If A is the matrix of B with respect to some basis, then every other matrix of B has the form
Q T AQ for some invertible Q.
2. B is symmetric if and only if its matrix with respect to any (and all) bases is symmetric.
Diagonalization of symmetric bilinear forms
As with everything else in this chapter, the ideal situation is when a basis can be found which diago-
nalizes a matrix. Bilinear forms are no different. For instance, Example 6.63.2 can be written
B(x, y) = x^T\begin{pmatrix} 1 & 2 \\ 2 & 0 \end{pmatrix}y = x_1y_1 + 2x_1y_2 + 2x_2y_1 = (x_1 + 2x_2)(y_1 + 2y_2) − 4x_2y_2
= \begin{pmatrix} x_1 + 2x_2 \\ x_2 \end{pmatrix}^T\begin{pmatrix} 1 & 0 \\ 0 & -4 \end{pmatrix}\begin{pmatrix} y_1 + 2y_2 \\ y_2 \end{pmatrix} \qquad (∗)
\implies [B]_γ = \begin{pmatrix} 1 & 0 \\ 0 & -4 \end{pmatrix} \quad\text{where}\quad γ = \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -2 \\ 1 \end{pmatrix} \right\}
If β is the standard basis, then the change of co-ordinate matrix is Q_γ^β = \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix}, hence
(Q_γ^β)^T[B]_β Q_γ^β = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 2 & 0 \end{pmatrix}\begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -4 \end{pmatrix} = [B]_γ
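A two-line numpy check of this congruence computation (an assumed numerical aid only):

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 0.0]])
Q = np.array([[1.0, -2.0],
              [0.0,  1.0]])   # columns are the vectors of gamma

print(Q.T @ A @ Q)            # [[1, 0], [0, -4]] = [B]_gamma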
With respect to the basis γ = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} -3 \\ 0 \\ 1 \end{pmatrix} \right\} the symmetric bilinear form B is diagonal. If you're having trouble believing it, invert the change of co-ordinate matrix and check that
Warning! If F = R then every symmetric B is diagonalizable with respect to an orthonormal basis
of eigenvectors (for the usual inner product on Rn ). It is very unlikely that the above algorithm will
produce such a basis! Note how γ in the previous example is not orthogonal in R3 . The algorithm
has several advantages over the spectral theorem: it is typically faster than computing eigenvectors
and it applies to vector spaces over any field. A disadvantage of the algorithm is that there are many, many choices: here is an example.
Example 6.67. The bilinear form B(x, y) = x^T\begin{pmatrix} 1 & 6 \\ 6 & 3 \end{pmatrix}y = x_1y_1 + 6x_1y_2 + 6x_2y_1 + 3x_2y_2 can be diagonalized as follows:
• \begin{pmatrix} 1 & 0 \\ -6 & 1 \end{pmatrix}\begin{pmatrix} 1 & 6 \\ 6 & 3 \end{pmatrix}\begin{pmatrix} 1 & -6 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -33 \end{pmatrix} = [B]_γ where γ = \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -6 \\ 1 \end{pmatrix} \right\}. This corresponds to
B(x, y) = (x_1 + 6x_2)(y_1 + 6y_2) − 33x_2y_2
• \begin{pmatrix} 0 & 1 \\ 1 & -2 \end{pmatrix}\begin{pmatrix} 1 & 6 \\ 6 & 3 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & -2 \end{pmatrix} = \begin{pmatrix} 3 & 0 \\ 0 & -11 \end{pmatrix} = [B]_η where η = \left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -2 \end{pmatrix} \right\}. This corresponds to
B(x, y) = 3(2x_1 + x_2)(2y_1 + y_2) − 11x_1y_1
• Using instead an orthonormal basis ζ of eigenvectors (the spectral theorem),
[B]_ζ = \frac{1}{74 + 2\sqrt{37}}\begin{pmatrix} 6 & 1+\sqrt{37} \\ -1-\sqrt{37} & 6 \end{pmatrix}\begin{pmatrix} 1 & 6 \\ 6 & 3 \end{pmatrix}\begin{pmatrix} 6 & -1-\sqrt{37} \\ 1+\sqrt{37} & 6 \end{pmatrix} = \begin{pmatrix} 2+\sqrt{37} & 0 \\ 0 & 2-\sqrt{37} \end{pmatrix}
which corresponds to
B(x, y) = (2 + \sqrt{37})\frac{(6x_1 + (1+\sqrt{37})x_2)(6y_1 + (1+\sqrt{37})y_2)}{74 + 2\sqrt{37}} + (2 − \sqrt{37})\frac{(6x_2 − (1+\sqrt{37})x_1)(6y_2 − (1+\sqrt{37})y_1)}{74 + 2\sqrt{37}}
2. If B is symmetric and F does not have characteristic two (see below), then B is diagonalizable.
2. The converse is more difficult. First suppose B is non-zero (otherwise the result is trivial). We
prove by induction on n = dim V.
If n = 1 the result is trivial: B( x, y) = axy for some a ∈ F is clearly symmetric.
Fix n ∈ N and assume that every non-zero symmetric bilinear form on a dimension n vector
space over a field with char F 6= 2 is diagonalizable! Let dim V = n + 1. By the discussion
below, ∃x ∈ V such that B(x, x) 6= 0. Consider the linear map
T : V → F : v 7→ B(x, v)
Aside: Characteristic two fields This means 1 + 1 = 0 in F, which holds, for instance, in the field
Z2 = {0, 1} of remainders modulo 2. We now see the importance char F 6= 2 has to the above result.
• The proof requires the existence of x ∈ V such that B(x, x) ≠ 0. If B is non-zero, there exist u, v such that B(u, v) ≠ 0. If both B(u, u) = 0 = B(v, v), then x = u + v does the job, since B(x, x) = B(u, u) + 2B(u, v) + B(v, v) = 2B(u, v) ≠ 0 whenever char F ≠ 2.
• Consider B(x, y) = x^T\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}y on the 2-dimensional vector space Z_2^2 = \left\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\} over Z_2. Every element of this space satisfies B(x, x) = 0! Perhaps surprisingly, the matrix of B is identical with respect to any basis of Z_2^2, and so B is symmetric but non-diagonalizable.
In Example 6.67, notice how the three diagonal matrix representations have something in common: each has one positive and one negative diagonal entry. This is a general phenomenon:
Theorem 6.69 (Sylvester’s Law of Inertia). Suppose B is a symmetric bilinear form on a real vector
space V with diagonal matrix representation diag(λ1 , . . . , λn ). Then the number of entries λ j which
are positive/negative/zero is independent of the diagonal representation.
Sketch Proof. For simplicity, let V = R^n and write B(x, y) = x^TAy where A is symmetric.
1. First define rank B := rank A and observe that the rank of any matrix of B is independent of
basis (exercises).
β = \{v_1, . . . , v_p, v_{p+1}, . . . , v_r, v_{r+1}, . . . , v_n\}
γ = \{w_1, . . . , w_q, w_{q+1}, . . . , w_r, w_{r+1}, . . . , w_n\}
Quadratic Forms
Given a symmetric bilinear form B on a vector space V over F, we obtain a function
K : V → F : x ↦ B(x, x)
A function K : V → F is termed a quadratic form when such a symmetric bilinear form exists.
Examples 6.71. 1. If B is a real inner product, then K (v) = hv, vi = ||v||2 is the square of the norm.
2. Let dim V = n and A be the matrix of B with respect to a basis β. By the symmetry of A,
K(x) = x^TAx = \sum_{i,j=1}^{n} x_iA_{ij}x_j = \sum_{1 ≤ i ≤ j ≤ n} \tilde{a}_{ij}x_ix_j \quad\text{where}\quad \tilde{a}_{ij} = \begin{cases} A_{ij} & \text{if } i = j \\ 2A_{ij} & \text{if } i ≠ j \end{cases}
E.g., K(x) = 3x_1^2 + 4x_2^2 − 2x_1x_2 corresponds to the bilinear form B(x, y) = x^T\begin{pmatrix} 3 & -1 \\ -1 & 4 \end{pmatrix}y
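A quick numerical sanity check that this matrix really does reproduce K (numpy assumed; the test vector is an arbitrary choice for illustration):

import numpy as np

A = np.array([[ 3.0, -1.0],
              [-1.0,  4.0]])
x = np.array([2.0, 5.0])                       # an arbitrary test vector

print(x @ A @ x)                               # 92.0
print(3*x[0]**2 + 4*x[1]**2 - 2*x[0]*x[1])     # 92.0: K(x) = x^T A x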
Diagonalizing Conics A fun application of quadratic forms and their relationship to symmetric
(diagonalizable) bilinear forms is to the study of quadratic manifolds: in R2 these are conics, and in
R3 quadratic surfaces such as ellipsoids. For example, the general non-zero quadratic form on R2 is
K(x) = ax^2 + 2bxy + cy^2 \quad\longleftrightarrow\quad B(v, w) = v^T\begin{pmatrix} a & b \\ b & c \end{pmatrix}w
Diagonalizing with respect to an orthonormal basis {v_1, v_2} of eigenvectors, there exist scalars λ_1, λ_2 with K(x) = λ_1t_1^2 + λ_2t_2^2, where x = t_1v_1 + t_2v_2.
With respect to this basis the general conic has the form
λ1 t21 + λ2 t22 + µ1 t1 + µ2 t2 = η, λ1 , λ2 , µ1 , µ2 , η ∈ R
If the λ_i are non-zero, we may complete the squares via the linear transformations s_j = t_j + \frac{μ_j}{2λ_j}. The canonical forms are then recovered:
Ellipse λ_1λ_2 > 0. \quad Hyperbola λ_1λ_2 < 0. \quad Parabola λ_1 or λ_2 = 0.
Since we only applied a rotation/reflection (change to the orthonormal basis {v1 , v2 }) and translation
(completing the square), it is clear that every conic may be recovered thus from the canonical forms.
One could instead diagonalize K using our earlier algorithm, though this probably won’t produce
an orthonormal basis and so we couldn’t interpret the transformation as a rotation/reflection. By
Sylvester’s Law, however, the diagonal entries will have the same number of positive/negative/zero
terms, so the canonical forms will still look the same.
Examples 6.72. 1. We describe and plot the conic with equation 7x2 + 24xy = 144.
The matrix of the associated bilinear form is \begin{pmatrix} 7 & 12 \\ 12 & 0 \end{pmatrix}, which has orthonormal eigenpairs
(λ_1, v_1) = \left( 16, \tfrac{1}{5}\begin{pmatrix} 4 \\ 3 \end{pmatrix} \right), \qquad (λ_2, v_2) = \left( -9, \tfrac{1}{5}\begin{pmatrix} -3 \\ 4 \end{pmatrix} \right)
In the rotated basis, we have the canonical hyperbola
16t_1^2 − 9t_2^2 = 144 \iff \frac{t_1^2}{3^2} − \frac{t_2^2}{4^2} = 1
which is easily plotted. Note here that
t_1 = \frac{1}{5}(4x + 3y), \qquad t_2 = \frac{1}{5}(−3x + 4y)
which quickly recovers the original equation.
[Figure: the hyperbola 7x^2 + 24xy = 144 plotted in the xy-plane together with the rotated t_1, t_2-axes.]
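The eigenpairs above are easily confirmed numerically (numpy assumed; eigh returns eigenvalues in ascending order and eigenvectors up to sign):

import numpy as np

A = np.array([[ 7.0, 12.0],
              [12.0,  0.0]])

lams, vecs = np.linalg.eigh(A)
print(lams)    # [-9., 16.]
print(vecs)    # columns proportional to (-3, 4)/5 and (4, 3)/5, up to sign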
2. The conic defined by K(x) = x^2 + 12xy + 3y^2 = −33 defines a hyperbola in accordance with Example 6.67. With respect to the basis η = \left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -2 \end{pmatrix} \right\}, we see that 3t_1^2 − 11t_2^2 = −33.
[Figure: the hyperbola x^2 + 12xy + 3y^2 = −33 plotted in the xy-plane with the t_1, t_2-axes.]
Exercises. 6.8.1. Prove that the sum of any two bilinear forms is bilinear, and that any scalar multi-
ple of a bilinear form is bilinear: thus the set of bilinear forms on V is a vector space.
(You can’t use matrices here, since V could be infinite-dimensional!)
6.8.2. Compute the matrix of the bilinear form
B(x, y) = x1 y1 − 2x1 y2 + x2 y1 − x3 y3
on R^3 with respect to the basis β = \left\{ \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \right\}.
6.8.3. Check that the function B(f, g) = f'(0)g''(0) is a bilinear form on the vector space of twice-
differentiable functions. Find the matrix of B with respect to β = {cos t, sin t, cos 2t, sin 2t}
when restricted to the subspace Span β.
6.8.4. For each matrix A with real valued entries, find a diagonal matrix D and an invertible matrix
Q such that Q T AQ = D.
(a) \begin{pmatrix} 1 & 3 \\ 3 & 2 \end{pmatrix} \quad (b) \begin{pmatrix} 3 & 1 & 2 \\ 1 & 4 & 0 \\ 2 & 0 & -1 \end{pmatrix}
6.8.5. If K is a quadratic form and K (x) = 2, what is the value of K (3x)?
6.8.6. If F does not have characteristic 2, and K (x) = B(x, x) is a quadratic form, prove that we can
recover the bilinear form B via
B(x, y) = \frac{1}{2}\left( K(x + y) − K(x) − K(y) \right)
6.8.7. If B(x, y) = x^T\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}y is a bilinear form on F^2, compute the quadratic form K(x).
6.8.8. With reference to the proof of Sylvester’s Law, explain why B(x, x) = 0 ⇐⇒ Ax = 0. Also
explain why rank B is independent of the choice of diagonalizing basis.
6.8.9. If char F ≠ 2, apply the diagonalizing algorithm to the symmetric bilinear form B(x, y) = x^T\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}y on F^2. What goes wrong if char F = 2?
6.8.10. Describe and plot the following conics:
(a) x2 + y2 + xy = 6
(b) 35x2 + 120xy = 4x + 3y
6.8.11. Suppose that a non-empty, non-degenerate^a conic C in R^2 has the form ax^2 + 2bxy + cy^2 + dx + ey + f = 0, where at least one of a, b, c ≠ 0, and define ∆ = b^2 − ac. Prove that:
• C is a parabola if and only if ∆ = 0;
• C is an ellipse if and only if ∆ < 0;
• C is a hyperbola if and only if ∆ > 0.
(Hint: λ1 , λ2 are the eigenvalues of a symmetric matrix, so. . . )
^a The conic contains at least two points and cannot be factorized as a product of two straight lines: for example, the following are disallowed;
• x2 + y2 + 1 = 0 is empty (unless one allows conics over C . . .)
• x2 + y2 = 0 contains only one point;
• x2 − xy − x + y = ( x − 1)( x − y) = 0 is the product of two lines.