
4  Introduction to Matrix Algebra

In the previous chapter, we learned the algebraic results that form the foundation for the study of factor analysis and structural equation modeling. These results, powerful as they are, are somewhat cumbersome to apply in more complicated systems involving large numbers of variables. Matrix algebra provides us with a new mathematical notation that is ideally suited for developing results involving linear combinations and transformations. Once we have developed a few basic results and learned how to use them, we will be ready to derive the fundamental equations of factor analysis and structural modeling.

4.1 BASIC TERMINOLOGY

Definition 4.1 (Matrix) A matrix is defined as an ordered array of numbers, of dimensions p, q.

Our standard notation for a matrix A of order p, q will be:
\[ {}_p\mathbf{A}_q \]
There are numerous other notations. For example, one might indicate a matrix of order p, q as A (p × q). Frequently, we shall refer to such a matrix as a p × q matrix A.

On occasion, we shall refer explicitly to the elements of a matrix (i.e., the numbers or random variables in the array). In this case, we use the following notation to indicate that $\mathbf{A}$ is a matrix with elements $a_{ij}$.
\[ \mathbf{A} = \{a_{ij}\} \]

When we refer to element $a_{ij}$, the first subscript will refer to the row position of the element in the array. The second subscript (regardless of which letter is used in this position) will refer to the column position. Hence, a typical matrix $_p\mathbf{A}_q$ will be of the form:
\[
\mathbf{A} = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1q} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2q} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3q} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{p1} & a_{p2} & a_{p3} & \cdots & a_{pq}
\end{bmatrix}
\]
We shall generally use bold capital letters to indicate matrices, and employ lower case letters to signify elements of an array, except where clarity dictates. In particular, we may find it convenient, when referring to matrices of random variables, to refer to the elements with capital letters, so we can distinguish notationally between constants and random variables.

Definition 4.2 (Column Vector) A column vector of numbers or random variables will be a matrix of order p × 1. We will, in general, indicate column vectors with the following notation:
\[ {}_p\mathbf{x}_1 \]

Definition 4.3 (Row Vector) A row vector of numbers or random variables will be a matrix of order 1 × q. We will, in general, indicate row vectors with the following notation:
\[ {}_1\mathbf{x}_q \]

A column vector with all elements equal to one will be symbolized as either
j or 1.
Note how we reserve the use of boldface for matrices and vectors.

4.1.1 Special Matrices

We will refer occasionally to special types of matrices by name. For any $_p\mathbf{A}_q$,

1. If p ≠ q, A is a rectangular matrix.

2. If p = q, A is a square matrix.

3. In a square matrix, the elements $a_{ii}$, i = 1, …, p define the diagonal of the matrix.

4. A square matrix is lower triangular if $a_{ij} = 0$ for i < j.

5. A square matrix is upper triangular if $a_{ij} = 0$ for i > j.

6. A square matrix is a diagonal matrix if $a_{ij} = 0$ for i ≠ j.

7. A square matrix is a scalar matrix if it is a diagonal matrix and all diagonal elements are equal.

8. An identity matrix is a scalar matrix with diagonal elements equal to one. We use the notation $\mathbf{I}_p$ to denote a p × p identity matrix.

9. 0, a matrix composed entirely of zeros, is called a null matrix.

10. A square matrix A is symmetric if $a_{ij} = a_{ji}$ for all i, j.

11. A 1 × 1 matrix is a scalar.

Example 4.1 (Special Matrices) Some examples follow:

1. A rectangular matrix
\[ \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \end{bmatrix} \]

2. A square matrix
\[ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]

3. A lower triangular matrix
\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 3 & 0 & 0 \\ 4 & 5 & 6 & 0 \\ 7 & 8 & 9 & 10 \end{bmatrix} \]

4. An upper triangular matrix
\[ \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 5 & 6 & 7 \\ 0 & 0 & 8 & 9 \\ 0 & 0 & 0 & 10 \end{bmatrix} \]

5. A diagonal matrix
\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 7 \end{bmatrix} \]

6. A scalar matrix
\[ \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \]

7. A symmetric matrix
\[ \begin{bmatrix} 1 & 2 & 3 \\ 2 & 2 & 4 \\ 3 & 4 & 2 \end{bmatrix} \]

4.1.2 Partitioning of Matrices

In many theoretical discussions of matrices, it will be useful to conceive of a matrix as being composed of sub-matrices. When we do this, we will partition the matrix symbolically by breaking it down into its components. The components can be either matrices or scalars. Here is a simple example.

Example 4.2 (A Simple Partitioned Matrix) In discussions of simple multiple regression, where there is one criterion variable Y and p predictor variables that can be placed in a vector x, it is common to refer to the correlation matrix of the entire set of variables using partitioned notation, as follows:
\[
\mathbf{R} = \begin{bmatrix} 1 & \mathbf{r}_{Yx} \\ \mathbf{r}_{xY} & \mathbf{R}_{xx} \end{bmatrix}
\]

In the above notation, the scalar 1 refers to the correlation of the Y variable with itself. The row vector $\mathbf{r}_{Yx}$ refers to the set of correlations between the variable Y and the set of p random variables in x. $\mathbf{R}_{xx}$ is the p × p matrix of correlations of the predictor variables. We will refer to the order of the partitioned form as the number of rows and columns in the partitioning, which is distinct from the number of rows and columns in the matrix being represented. For example, suppose there were p = 5 predictor variables in Example 4.2. Then the matrix R is a 6 × 6 matrix, but the example shows a 2 × 2 partitioned form.
When matrices are partitioned properly, it is understood that pieces that appear to the left or right of other pieces have the same number of rows, and pieces that appear above or below other pieces have the same number of columns. So, in the above example, $\mathbf{R}_{xx}$, appearing to the right of the p × 1 column vector $\mathbf{r}_{xY}$, must have p rows, and since it appears below the 1 × p row vector $\mathbf{r}_{Yx}$, it must have p columns. Hence, it must be a p × p matrix.

4.2 SOME MATRIX OPERATIONS

In this section, we review the fundamental operations on matrices.

4.2.1 Matrix (and Vector) Addition and Subtraction

For the addition and subtraction operations to be defined for two matrices A and B, they must be conformable.

Definition 4.4 (Conformability for Addition and Subtraction) Two matrices are conformable for addition and subtraction if and only if they are of the same order.

Definition 4.5 (Matrix Addition) Let $\mathbf{A} = \{a_{ij}\}$ and $\mathbf{B} = \{b_{ij}\}$. Let A and B be conformable. The sum A + B = C is defined as:
\[ \mathbf{C} = \mathbf{A} + \mathbf{B} = \{c_{ij}\} = \{a_{ij} + b_{ij}\} \]

Definition 4.6 (Matrix Subtraction) Let $\mathbf{A} = \{a_{ij}\}$ and $\mathbf{B} = \{b_{ij}\}$. Let A and B be conformable. The difference A − B = C is defined as:
\[ \mathbf{C} = \mathbf{A} - \mathbf{B} = \{c_{ij}\} = \{a_{ij} - b_{ij}\} \]

Comment. Matrix addition and subtraction are natural, intuitive exten-


sions to scalar addition and subtraction. One simply adds elements in the
same position.

Example 4.3 (Matrix Addition) Let
\[
\mathbf{A} = \begin{bmatrix} 1 & 4 & 5 \\ 2 & 3 & 4 \\ 4 & 4 & 0 \end{bmatrix}, \qquad
\mathbf{B} = \begin{bmatrix} 3 & 2 & 1 \\ 2 & 3 & 1 \\ 1 & 3 & 2 \end{bmatrix}.
\]
Find C = A + B and D = A − B.

Solution.
\[
\mathbf{C} = \begin{bmatrix} 4 & 6 & 6 \\ 4 & 6 & 5 \\ 5 & 7 & 2 \end{bmatrix}, \qquad
\mathbf{D} = \begin{bmatrix} -2 & 2 & 4 \\ 0 & 0 & 3 \\ 3 & 1 & -2 \end{bmatrix}
\]
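For readers who like to verify such computations by machine, a minimal sketch in Python (assuming the NumPy library is available; the code is illustrative only) reproduces Example 4.3:

```python
import numpy as np

A = np.array([[1, 4, 5],
              [2, 3, 4],
              [4, 4, 0]])
B = np.array([[3, 2, 1],
              [2, 3, 1],
              [1, 3, 2]])

C = A + B   # element-wise sum
D = A - B   # element-wise difference

print(C)    # [[4 6 6] [4 6 5] [5 7 2]]
print(D)    # [[-2  2  4] [ 0  0  3] [ 3  1 -2]]
```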

Denition 4.7 (Matrix Equality) Two matrices are equal if and only if they
are of the same row and column order, and have all elements equal.

Matrix addition has some important mathematical properties, which, for-


tunately, mimic those of scalar addition and subtraction. Consequently, there
is little negative transfer involved in generalizing from the scalar to the
matrix operations.

Result 4.1 (Properties of Matrix Addition) For matrices A, B, and C,


properties include:

1. Associativity
A + (B + C) = (A + B) + C

2. Commutativity
A+B=B+A

3. There exists a neutral element for addition, i.e., the null matrix 0,
such that A + 0 = A.

4. There exist inverse elements for addition, in the sense that for any matrix A, there exists a matrix −A, such that A + (−A) = 0.

4.2.2 Scalar Multiples and Scalar Products

In the previous section, we examined some matrix operations, addition and


subtraction, that operate very much like their scalar algebraic counterparts.
In this section, we begin to see a divergence between matrix algebra and scalar
algebra.

Definition 4.8 (Scalar Multiplication) Given a matrix $\mathbf{A} = \{a_{ij}\}$ and a scalar c, $\mathbf{B} = c\mathbf{A} = \{c\,a_{ij}\}$ is called a scalar multiple of A.

Comment. Scalar multiples are not to be confused with scalar products, which will be defined subsequently. Scalar multiplication is a simple idea: to multiply a matrix by a scalar, simply multiply every element of the matrix by the scalar.

Example 4.4 (Scalar Multiple) Let
\[
\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix}. \quad \text{Then} \quad
2\mathbf{A} = \begin{bmatrix} 4 & 2 \\ 6 & 8 \end{bmatrix}.
\]

Result 4.2 (Properties of Scalar Multiplication) For matrices A and


B, and scalars a and b, scalar multiplication has the following mathemati-
cal properties:

1. (a + b)A = aA + bA

2. a(A + B) = aA + aB

3. a(bA) = (ab)A

4. aA = Aa

Definition 4.9 (Scalar Product) Given a row vector $_1\mathbf{a}_p$ and a column vector $_p\mathbf{b}_1$, let $\mathbf{a} = \{a_i\}$ and $\mathbf{b} = \{b_i\}$. The scalar product $\mathbf{ab}$ is defined as
\[ \mathbf{ab} = \sum_i a_i b_i \]

Note: This is simply the sum of cross products of the elements of the two vectors.



Example 4.5 (Scalar Product) Let $\mathbf{a} = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} 4 \\ 2 \\ 1 \end{bmatrix}$. Then $\mathbf{ab} = 11$.

4.2.3 Matrix Multiplication

Matrix multiplication is an operation with properties quite different from its scalar counterpart. To begin with, order matters in matrix multiplication. That is, the matrix product AB need not be the same as the matrix product BA. Indeed, the matrix product AB might be well-defined, while the product BA might not exist. First, we establish the fundamental property of conformability.

Definition 4.10 (Conformability for Matrix Multiplication) $_p\mathbf{A}_q$ and $_r\mathbf{B}_s$ are conformable for matrix multiplication if and only if q = r.

The matrix multiplication operation is defined as follows.

Definition 4.11 (Matrix Multiplication) Let $_p\mathbf{A}_q = \{a_{ij}\}$ and $_q\mathbf{B}_s = \{b_{jk}\}$. Then $_p\mathbf{C}_s = \mathbf{AB} = \{c_{ik}\}$, where
\[ c_{ik} = \sum_{j=1}^{q} a_{ij} b_{jk} \]

Example 4.6 (The Row by Column Method) The meaning of the formal definition of matrix multiplication might not be obvious at first glance. Indeed, there are several ways of thinking about matrix multiplication. The first way, which I call the row by column approach, works as follows. Visualize $_p\mathbf{A}_q$ as a set of p row vectors and $_q\mathbf{B}_s$ as a set of s column vectors. Then if C = AB, element $c_{ik}$ of C is the scalar product (i.e., the sum of cross products) of the ith row of A with the kth column of B. For example, let
\[
\mathbf{A} = \begin{bmatrix} 2 & 4 & 6 \\ 5 & 7 & 1 \\ 2 & 3 & 5 \end{bmatrix}, \qquad
\mathbf{B} = \begin{bmatrix} 4 & 1 \\ 0 & 2 \\ 5 & 1 \end{bmatrix}.
\]
Then
\[
\mathbf{C} = \mathbf{AB} = \begin{bmatrix} 38 & 16 \\ 25 & 20 \\ 33 & 13 \end{bmatrix}.
\]
Consider element $c_{21}$, which has a value of 25. This element, which is in the second row and first column of C, is computed by taking the sum of cross products of the elements of the second row of A with the first column of B. That is, (5 × 4) + (7 × 0) + (1 × 5) = 25.
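The row by column rule is easy to check numerically. A brief sketch (Python with NumPy, illustrative only) reproduces the product of Example 4.6 and recomputes element $c_{21}$ directly as a scalar product:

```python
import numpy as np

A = np.array([[2, 4, 6],
              [5, 7, 1],
              [2, 3, 5]])
B = np.array([[4, 1],
              [0, 2],
              [5, 1]])

C = A @ B                 # matrix product via the row-by-column rule
print(C)                  # [[38 16] [25 20] [33 13]]

# Element c21: scalar product of row 2 of A with column 1 of B
c21 = A[1, :] @ B[:, 0]
print(c21)                # 25
```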

Result 4.3 (Properties of Matrix Multiplication) The following are some key properties of matrix multiplication:

1. Associativity.
\[ (\mathbf{AB})\mathbf{C} = \mathbf{A}(\mathbf{BC}) \]

2. Not generally commutative. That is, often AB ≠ BA.

3. Distributive over addition and subtraction.
\[ \mathbf{C}(\mathbf{A} + \mathbf{B}) = \mathbf{CA} + \mathbf{CB} \]

4. Assuming it is conformable, the identity matrix I functions like the number 1, that is, $_p\mathbf{A}_q\,\mathbf{I}_q = \mathbf{A}$ and $\mathbf{I}_p\,{}_p\mathbf{A}_q = \mathbf{A}$.

5. AB = 0 does not necessarily imply that either A = 0 or B = 0.

Several of the above results are surprising, and result in negative transfer for beginning students as they attempt to reduce matrix algebra expressions.

Example 4.7 (A Null Matrix Product) The following example shows that one can, indeed, obtain a null matrix as the product of two non-null matrices. Let
\[
\mathbf{a}' = \begin{bmatrix} 6 & 2 & 2 \end{bmatrix}, \qquad
\mathbf{B} = \begin{bmatrix} -8 & 12 & 12 \\ 12 & -40 & 4 \\ 12 & 4 & -40 \end{bmatrix}.
\]
Then $\mathbf{a}'\mathbf{B} = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}$.

Definition 4.12 (Pre-multiplication and Post-multiplication) When we talk about the product of matrices A and B, it is important to remember that AB and BA are usually not the same. Consequently, it is common to use the terms pre-multiplication and post-multiplication. When we say A is post-multiplied by B, or B is pre-multiplied by A, we are referring to the product AB. When we say B is post-multiplied by A, or A is pre-multiplied by B, we are referring to the product BA.

4.2.4 Matrix Transposition

Transposing a matrix is an operation which plays a very important role in


multivariate statistical theory. The operation, in essence, switches the rows
and columns of a matrix.

Definition 4.13 (Transpose of a Matrix) Let $_p\mathbf{A}_q = \{a_{ij}\}$. Then the transpose of A, denoted A′, is defined as
\[ {}_q\mathbf{A}'_p = \{a_{ji}\} \]

Comment. The above notation may seem cryptic to the beginner, so some clarification may be useful. The typical element of A is $a_{ij}$. This means that, in the i, j position of matrix A is found element $a_{ij}$. Suppose A is of order 3 × 3. Then it can be written as
\[
\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\]
A′ is a matrix with typical element $a_{ji}$. This means that, in the i, j position of matrix A′ is found element $a_{ji}$ of the original matrix A. For example, element 2,1 of A′ is element 1,2 of the original matrix. So, in terms of the elements of the original A, the transpose has the representation
\[
\mathbf{A}' = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{bmatrix}
\]
Studying the above element-wise representation, you can see that transposition does not change the diagonal elements of a matrix, and constructs the columns of the transpose from the rows of the original matrix.

Example 4.8 (Matrix Transposition) Let
\[
\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 4 & 5 \end{bmatrix}. \quad \text{Then} \quad
\mathbf{A}' = \begin{bmatrix} 1 & 1 \\ 2 & 4 \\ 3 & 5 \end{bmatrix}
\]

Result 4.4 (Properties of Matrix Transposition) Here are some frequently used properties of the matrix transposition operation:

1. (A′)′ = A

2. (cA)′ = cA′

3. (A + B)′ = A′ + B′

4. (AB)′ = B′A′

5. A square matrix A is symmetric if and only if A = A′.

Comment. You should study the above results carefully, as they are used
frequently in reducing and simplifying matrix algebra expressions.

4.2.5 Linear Combinations of Matrix Rows and Columns

We have already discussed the row by column conceptualization of matrix multiplication. However, there are some other ways of conceptualizing matrix multiplication that are particularly useful in the field of multivariate statistics. To begin with, we need to enhance our understanding of the way matrix multiplication and transposition work with partitioned matrices.

Definition 4.14 (Multiplication and Transposition of Partitioned Matrices) Being able to transpose and multiply a partitioned matrix is a skill that is important for understanding the key equations of structural equation modeling. Assuming that the matrices are partitioned properly, the rules are quite simple:

1. To transpose a partitioned matrix, treat the sub-matrices in the partition as though they were elements of a matrix, but transpose each sub-matrix. The transpose of a p × q partitioned form will be a q × p partitioned form.

2. To multiply partitioned matrices, treat the sub-matrices as though they were elements of a matrix. The product of p × q and q × r partitioned forms is a p × r partitioned form.

Some examples will illustrate the above definition.

Example 4.9 (Transposing a Partitioned Matrix) Suppose A is partitioned as
\[
\mathbf{A} = \begin{bmatrix} \mathbf{C} & \mathbf{D} \\ \mathbf{E} & \mathbf{F} \\ \mathbf{G} & \mathbf{H} \end{bmatrix}
\]
Then
\[
\mathbf{A}' = \begin{bmatrix} \mathbf{C}' & \mathbf{E}' & \mathbf{G}' \\ \mathbf{D}' & \mathbf{F}' & \mathbf{H}' \end{bmatrix}
\]

Example 4.10 (Product of Two Partitioned Matrices) Suppose
\[
\mathbf{A} = \begin{bmatrix} \mathbf{X} & \mathbf{Y} \end{bmatrix}, \qquad
\mathbf{B} = \begin{bmatrix} \mathbf{G} \\ \mathbf{H} \end{bmatrix}.
\]
Then AB = XG + YH.

Example 4.11 (Linearly Combining Columns of a Matrix) Consider an N × p matrix X, containing the scores of N persons on p variables. One can conceptualize the matrix X as a set of p column vectors. In partitioned matrix form, we can represent X as
\[
\mathbf{X} = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \mathbf{x}_3 & \cdots & \mathbf{x}_p \end{bmatrix}
\]
Now suppose one were to post-multiply X with a p × 1 vector b. The product is an N × 1 column vector y. Utilizing the above results on multiplication of partitioned matrices, it is easy to see that the result can be written as follows.
\[
\begin{aligned}
\mathbf{y} &= \mathbf{Xb} \\
&= \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \mathbf{x}_3 & \cdots & \mathbf{x}_p \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_p \end{bmatrix} \\
&= b_1\mathbf{x}_1 + b_2\mathbf{x}_2 + b_3\mathbf{x}_3 + \cdots + b_p\mathbf{x}_p
\end{aligned}
\]

The above example illustrates a very general principle in matrix algebra.


If you wish to linearly combine a set of variables that are in the columns of
a matrix X, simply place the desired linear weights in a vector b, and post-
multiply X by b. We illustrate this below with a simple numerical example.

Example 4.12 (Computing Difference Scores) Suppose the matrix X consists of a set of scores on two variables, and you wish to compute the difference scores on the variables. Simply apply the linear weight +1 to the first column, and −1 to the second column, by post-multiplying with a vector containing those weights. Specifically,
\[
\begin{aligned}
\mathbf{y} &= \mathbf{Xb} \\
&= \begin{bmatrix} 80 & 70 \\ 77 & 79 \\ 64 & 64 \end{bmatrix}
\begin{bmatrix} +1 \\ -1 \end{bmatrix} \\
&= +1\begin{bmatrix} 80 \\ 77 \\ 64 \end{bmatrix} - 1\begin{bmatrix} 70 \\ 79 \\ 64 \end{bmatrix} \\
&= \begin{bmatrix} 10 \\ -2 \\ 0 \end{bmatrix}
\end{aligned}
\]
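In computational terms, the same difference scores are obtained by post-multiplying the data matrix by the weight vector. A minimal sketch (Python/NumPy, illustrative only):

```python
import numpy as np

X = np.array([[80, 70],
              [77, 79],
              [64, 64]])
b = np.array([+1, -1])   # weights: +1 for column 1, -1 for column 2

y = X @ b                # linear combination of the columns of X
print(y)                 # [10 -2  0]
```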

Example 4.13 (Computing Course Grades) Suppose the matrix X contained grades on two exams, a mid-term and a final, and that the final exam is counted twice as much as the mid-term in determining the course grade. In that case, course grades might be computed using the linear weights +1/3 and +2/3. Specifically,
\[
\begin{aligned}
\mathbf{y} &= \mathbf{Xb} \\
&= \begin{bmatrix} 80 & 70 \\ 77 & 79 \\ 64 & 64 \end{bmatrix}
\begin{bmatrix} +1/3 \\ +2/3 \end{bmatrix} \\
&= +\tfrac{1}{3}\begin{bmatrix} 80 \\ 77 \\ 64 \end{bmatrix} + \tfrac{2}{3}\begin{bmatrix} 70 \\ 79 \\ 64 \end{bmatrix} \\
&= \begin{bmatrix} 73\tfrac{1}{3} \\ 78\tfrac{1}{3} \\ 64 \end{bmatrix}
\end{aligned}
\]

In the preceding examples, we linearly combined columns of a matrix, using


post-multiplication by a vector of linear weights. Of course, one can perform
an analogous operation on the rows of a matrix by pre-multiplication with a
row vector of linear weights.

Example 4.14 (Linearly Combining Rows of a Matrix) Suppose we view the p × q matrix X as being composed of p row vectors. If we pre-multiply X with a 1 × p row vector b′, the elements of b′ are linear weights applied to the rows of X. Specifically,
\[
\begin{aligned}
\mathbf{y}' &= \mathbf{b}'\mathbf{X} \\
&= \begin{bmatrix} b_1 & b_2 & \cdots & b_p \end{bmatrix}
\begin{bmatrix} \mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_p' \end{bmatrix} \\
&= b_1\mathbf{x}_1' + b_2\mathbf{x}_2' + \cdots + b_p\mathbf{x}_p'
\end{aligned}
\]

4.2.6 Sets of Linear Combinations

There is, of course, no need to restrict oneself to a single linear combina-


tion of the rows and columns of a matrix. To create more than one linear
combination, simply add columns (or rows) to the post-multiplying (or pre-
multiplying) matrix!

Example 4.15 (Taking the Sum and Difference of Two Columns) Suppose the matrix X consists of a set of scores on two variables, and you wish to compute both the sum and the difference scores on the variables. In this case, we post-multiply by two vectors of linear weights, creating two linear combinations. Specifically,
\[
\begin{aligned}
\mathbf{Y} &= \mathbf{XB} \\
&= \begin{bmatrix} 80 & 70 \\ 77 & 79 \\ 64 & 64 \end{bmatrix}
\begin{bmatrix} +1 & +1 \\ +1 & -1 \end{bmatrix} \\
&= \begin{bmatrix} 150 & 10 \\ 156 & -2 \\ 128 & 0 \end{bmatrix}
\end{aligned}
\]

Linear Combinations can be used to perform a number of basic operations


on matrices. For example, you can

1. Extract a column of a matrix;

2. Extract a row of a matrix;

3. Exchange rows and/or columns of a matrix;

4. Re-scale the rows and/or columns of a matrix by multiplication or divi-


sion.

5. Extract a particular single element from a matrix.

See if you can figure out, for yourself, before scanning the examples below,
how to perform the above 5 operations with linear combinations.

Example 4.16 (Extracting a Column of a Matrix) This is perhaps the simplest case of a linear combination. Simply multiply the desired column by the linear weight +1, and make the linear weights 0 for all other columns. So, for example,
\[
\begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
\begin{bmatrix} 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}
\]
Comment. We can, if we wish, give such a matrix a formal name! (See definition below.)

Definition 4.15 (Selection Vector) The selection vector $\mathbf{s}_{[i]}$ is a vector with all elements zero except the ith element, which is 1.

Example 4.17 (Extracting a Row of a Matrix) Just as we can extract the ith column of a matrix by post-multiplication by $\mathbf{s}_{[i]}$, we can extract the ith row by pre-multiplication by $\mathbf{s}_{[i]}'$. In this case, we extract the second row by pre-multiplication by $\mathbf{s}_{[2]}'$.
\[
\begin{bmatrix} 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
= \begin{bmatrix} 2 & 5 \end{bmatrix}
\]

Example 4.18 (Exchanging Columns of a Matrix) We can use two strategically selected linear combinations to exchange columns of a matrix. Suppose we wish to exchange columns 1 and 2 of the matrix from the previous example. Consider a post-multiplying matrix of linear weights B. We simply make its first column $\mathbf{s}_{[2]}$, thereby selecting the second column of the matrix to be the first column of the new matrix. We then make the second column of B the selection vector $\mathbf{s}_{[1]}$, placing the first column of the old matrix in the second column position of the new matrix. This is illustrated below.
\[
\begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} 4 & 1 \\ 5 & 2 \\ 6 & 3 \end{bmatrix}
\]

Comment. Exchanging rows of a matrix is an obvious generalization. Simply pre-


multiply by appropriately chosen (row) selection vectors.

Example 4.19 (Rescaling Rows and Columns of a Matrix) This is accomplished by post-multiplication and/or pre-multiplication by appropriately configured diagonal matrices. Consider the following example.
\[
\begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}
= \begin{bmatrix} 2 & 12 \\ 4 & 15 \\ 6 & 18 \end{bmatrix}
\]

Example 4.20 (Selecting a Specific Element of a Matrix) By selecting the column of a matrix with post-multiplication, and the row with pre-multiplication, one may, using two appropriately chosen selection vectors, pick out any element of an array. For example, if we wished to select element $X_{12}$, we would pre-multiply by $\mathbf{s}_{[1]}'$ and post-multiply by $\mathbf{s}_{[2]}$. For example,
\[
\begin{bmatrix} 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
\begin{bmatrix} 0 \\ 1 \end{bmatrix}
= 4
\]
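All of these selection operations are just matrix products. The sketch below (Python/NumPy, illustrative only) extracts a column, a row, and a single element of the 3 × 2 matrix used above:

```python
import numpy as np

X = np.array([[1, 4],
              [2, 5],
              [3, 6]])

s1 = np.array([1, 0, 0])          # selection vector s[1] (length 3, for rows)
s2 = np.array([0, 1])             # selection vector s[2] (length 2, for columns)

col2 = X @ s2                     # post-multiplication extracts column 2 -> [4 5 6]
row2 = np.array([0, 1, 0]) @ X    # pre-multiplication extracts row 2 -> [2 5]
x12  = s1 @ X @ s2                # both together pick out element X12 -> 4

print(col2, row2, x12)
```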

4.3 MATRIX ALGEBRA OF SOME SAMPLE STATISTICS

In this section, we show how matrix algebra can be used to express some
common statistical formulas in a succinct way that allows us to derive some
important results in multivariate analysis.

4.3.1 The Data Matrix

Suppose we wish to discuss a set of sample data representing scores for N people on p variables. We can represent the people in rows and the variables in columns, or vice-versa. Placing the variables in columns seems like a more natural way to do things for the modern computer user, as most computer files for standard statistical software represent the cases as rows, and the variables as columns. Ultimately, we will develop the ability to work with both notational variations, but for the time being, we'll work with our data in column form, i.e., with the variables in columns. Consequently, our standard notation for a data matrix is $_N\mathbf{X}_p$.

4.3.2 Converting to Deviation Scores

Suppose x is an N × 1 matrix of scores for N people on a single variable. We wish to transform the scores in x to deviation score form. (In general, we will find this a source of considerable convenience.) To accomplish the deviation score transformation, the arithmetic mean $\bar{x}$ must be subtracted from each score in x.
Let 1 be an N × 1 vector of ones. We will refer to such a vector on occasion as a summing vector, for the following reason. Consider any vector x, for example a 3 × 1 column vector with the numbers 1, 2, 3. If we compute 1′x, we are taking the sum of cross products of a set of 1s with the numbers in x. In summation notation,
\[
\mathbf{1}'\mathbf{x} = \sum_{i=1}^{N} 1_i x_i = \sum_{i=1}^{N} x_i
\]
So 1′x is how we express the sum of the x's in matrix notation.


Consequently,

x = (1/N )1 x

To transform x to deviation score form, we need to subtract x from every


element of x. We can easily construct a vector with every element equal to
x by simply multiplying the scalar x by a summing vector. Consequently,
if we denote the vector of deviation scores as x , we have

\[
\begin{aligned}
\mathbf{x}^* &= \mathbf{x} - \mathbf{1}\bar{x} \\
&= \mathbf{x} - \mathbf{1}\frac{\mathbf{1}'\mathbf{x}}{N} && (4.1) \\
&= \mathbf{x} - \frac{\mathbf{1}\mathbf{1}'}{N}\mathbf{x} \\
&= \mathbf{I}\mathbf{x} - \frac{\mathbf{1}\mathbf{1}'}{N}\mathbf{x} \\
&= \left(\mathbf{I} - \frac{\mathbf{1}\mathbf{1}'}{N}\right)\mathbf{x} && (4.2) \\
&= (\mathbf{I} - \mathbf{P})\,\mathbf{x} && (4.3) \\
\mathbf{x}^* &= \mathbf{Q}\mathbf{x} && (4.4)
\end{aligned}
\]
where
\[
\mathbf{Q} = \mathbf{I} - \mathbf{P} \quad \text{and} \quad \mathbf{P} = \frac{\mathbf{1}\mathbf{1}'}{N}
\]
Comment. A number of points need to be made about the above derivation:

1. You should study the above derivation carefully, making certain you understand all steps.

2. You should carefully verify that the matrix 11′ is an N × N matrix of 1s, so the expression 11′/N is an N × N matrix with each element equal to 1/N. (Division of a matrix by a non-zero scalar is a special case of a scalar multiple, and is perfectly legal.)

3. Since x can be converted from raw score form to deviation score form by pre-multiplication with a single matrix, it follows that any particular deviation score can be computed with one pass through a list of numbers.

4. We would probably never want to compute deviation scores in practice using the above formula, as it would be inefficient. However, the formula does allow us to see some interesting things that are difficult to see using scalar notation (more about that later).

5. If one were, for some reason, to write a computer program using Equation 4.4, one would not need (or want) to save the matrix Q, for several reasons. First, it can be very large! Second, no matter how large N is, the elements of Q take on only two distinct values. Diagonal elements of Q are always equal to (N − 1)/N, and off-diagonal elements are always equal to −1/N. In general, there would be no need to store the numbers.

Example 4.21 (The Deviation Score Projection Operator) Any vector of N raw scores can be converted into deviation score form by pre-multiplication by a projection operator Q. Diagonal elements of Q are always equal to (N − 1)/N, and off-diagonal elements are always equal to −1/N. Suppose we have the vector
\[
\mathbf{x} = \begin{bmatrix} 4 \\ 2 \\ 0 \end{bmatrix}
\]
Construct a projection operator Q such that Qx will be in deviation score form.

Solution. We have
\[
\mathbf{Qx} = \begin{bmatrix} 2/3 & -1/3 & -1/3 \\ -1/3 & 2/3 & -1/3 \\ -1/3 & -1/3 & 2/3 \end{bmatrix}
\begin{bmatrix} 4 \\ 2 \\ 0 \end{bmatrix}
= \begin{bmatrix} 2 \\ 0 \\ -2 \end{bmatrix}
\]
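A small sketch (Python/NumPy, illustrative only) constructs Q = I − 11′/N directly and verifies both that it produces deviation scores and that it is idempotent:

```python
import numpy as np

x = np.array([4.0, 2.0, 0.0])
N = len(x)

P = np.ones((N, N)) / N        # P = 11'/N
Q = np.eye(N) - P              # Q = I - P

print(Q @ x)                   # [ 2.  0. -2.]  (deviation scores)
print(np.allclose(Q @ Q, Q))   # True: Q is idempotent
```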

Example 4.22 (Computing the ith Deviation Score) An implication of the preceding result is that one can compute the ith deviation score as a single linear combination of the N scores in a list. For example, the 3rd deviation score in a list of 3 is computed as $[\mathbf{dx}]_3 = -\tfrac{1}{3}x_1 - \tfrac{1}{3}x_2 + \tfrac{2}{3}x_3$.

Let us now investigate the properties of the matrices P and Q that accomplish this transformation. First, we should establish an additional definition and result.

Definition 4.16 (Idempotent Matrix) A matrix C is idempotent if $\mathbf{C}^2 = \mathbf{CC} = \mathbf{C}$.

Theorem 4.1 If C is idempotent and I is a conformable identity matrix, then I − C is also idempotent.

Proof. To prove the result, we need merely show that $(\mathbf{I} - \mathbf{C})^2 = (\mathbf{I} - \mathbf{C})$. This is straightforward.
\[
\begin{aligned}
(\mathbf{I} - \mathbf{C})^2 &= (\mathbf{I} - \mathbf{C})(\mathbf{I} - \mathbf{C}) \\
&= \mathbf{I}^2 - \mathbf{CI} - \mathbf{IC} + \mathbf{C}^2 \\
&= \mathbf{I} - \mathbf{C} - \mathbf{C} + \mathbf{C} \\
&= \mathbf{I} - \mathbf{C}
\end{aligned}
\]

Recall that P is an N × N symmetric matrix with each element equal to 1/N. P is also idempotent, since:
\[
\begin{aligned}
\mathbf{PP} &= \frac{\mathbf{1}\mathbf{1}'}{N}\,\frac{\mathbf{1}\mathbf{1}'}{N} \\
&= \frac{\mathbf{1}\mathbf{1}'\mathbf{1}\mathbf{1}'}{N^2} \\
&= \frac{\mathbf{1}(\mathbf{1}'\mathbf{1})\mathbf{1}'}{N^2} \\
&= \frac{\mathbf{1}(N)\mathbf{1}'}{N^2} \\
&= \frac{\mathbf{1}\mathbf{1}'(N)}{N^2} \\
&= \mathbf{1}\mathbf{1}'\frac{N}{N^2} \\
&= \frac{\mathbf{1}\mathbf{1}'}{N} \\
&= \mathbf{P}
\end{aligned}
\]

The above derivation demonstrates some principles that are generally useful
in reducing simple statistical formulas in matrix form:

1. Scalars can be moved through matrices to any position in the expres-


sion that is convenient.

2. Any expression of the form x′y is a scalar product, and hence it is a scalar, and can be moved intact through other matrices in the expression. So, for example, we recognized that 1′1 is a scalar and can be reduced and eliminated in the above derivation.

You may easily verify the following properties:

1. The matrix Q = I − P is also symmetric and idempotent. (Hint: Use Theorem 4.1.)

2. Q1 = 0 (Hint: First prove that P1 = 1.)

4.3.3 The Sample Variance and Covariance


Since the sample variance $S_X^2$ is defined in Definition 3.6 as the sum of squared deviations divided by N − 1, it is easy to see that, if the scores in a vector x are in deviation score form, then the sum of squared deviations is simply x′x, and the sample variance may be written
\[
S_X^2 = 1/(N-1)\,\mathbf{x}'\mathbf{x} \qquad (4.5)
\]

If x is not in deviation score form, we may use the Q operator to convert it into deviation score form first. Hence, in general,
\[
\begin{aligned}
S_X^2 &= 1/(N-1)\,\mathbf{x}^{*\prime}\mathbf{x}^* \\
&= 1/(N-1)\,(\mathbf{Qx})'(\mathbf{Qx}) \\
&= 1/(N-1)\,\mathbf{x}'\mathbf{Q}'\mathbf{Qx},
\end{aligned}
\]
since the transpose of a product of two matrices is the product of their transposes in reverse order.
The expression can be reduced further. Since Q is symmetric, it follows immediately that Q′ = Q, and (remembering also that Q is idempotent) that Q′Q = Q. Hence
\[
S_X^2 = 1/(N-1)\,\mathbf{x}'\mathbf{Qx}
\]
As an obvious generalization of the above, we write the matrix form for the covariance between two vectors of scores x and y as
\[
S_{XY} = 1/(N-1)\,\mathbf{x}'\mathbf{Qy}
\]


Sometimes a surprising result is staring us right in the face, if we are only able to see it. Notice that the sum of cross products of deviation scores can be computed as
\[
\begin{aligned}
\mathbf{x}^{*\prime}\mathbf{y}^* &= (\mathbf{Qx})'(\mathbf{Qy}) \\
&= \mathbf{x}'\mathbf{Q}'\mathbf{Qy} \\
&= \mathbf{x}'\mathbf{Qy} \\
&= \mathbf{x}'(\mathbf{Qy}) \\
&= \mathbf{x}'\mathbf{y}^* \\
&= \mathbf{y}^{*\prime}\mathbf{x}
\end{aligned}
\]
Because products of the form QQ or Q′Q can be collapsed into a single Q, when computing the sum of cross products of deviation scores of two variables, one variable can be left in raw score form and the sum of cross products will remain the same! This surprising result is somewhat harder to see (and prove) using summation algebra.
In what follows, we will generally assume, unless explicitly stated otherwise, that our data matrices have been transformed to deviation score form. (The Q operator discussed above will accomplish this simultaneously for the case of scores of N subjects on several, say p, variates.) For example, consider a data matrix $_N\mathbf{X}_p$, whose p columns are the scores of N subjects on p different variables. If the columns of X are in raw score form, the matrix $\mathbf{X}^* = \mathbf{QX}$ will have p columns of deviation scores.
We shall concentrate on results in the case where X is in column variate form, i.e., is an N × p matrix. Equivalent results may be developed for row variate form p × N data matrices, which have the N scores on p variables

arranged in p rows. The choice of whether to use row or column variate representations is arbitrary, and varies in books and articles. One must, ultimately, be equally fluent with either notation, although modern computer software tends to emphasize column variate form.

4.3.4 The Variance-Covariance Matrix

Consider the case in which we have N scores on p variables. We define the variance-covariance matrix $\mathbf{S}_{xx}$ to be a symmetric p × p matrix with element $s_{ij}$ equal to the covariance between variable i and variable j. Naturally, the ith diagonal element of this matrix contains the covariance of variable i with itself, i.e., its variance. As a generalization of our results for a single vector of scores, the variance-covariance matrix may be written as follows. First, for raw scores in column variate form:
\[
\mathbf{S}_{xx} = 1/(N-1)\,\mathbf{X}'\mathbf{QX}
\]
We obtain a further simplification if X is in deviation score form. In that case, we have:
\[
\mathbf{S}_{xx} = 1/(N-1)\,\mathbf{X}'\mathbf{X}
\]
Note that some authors use the terms variance-covariance matrix and covariance matrix interchangeably.
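As a concrete check, the following sketch (Python/NumPy, with made-up data) computes $\mathbf{S}_{xx} = \mathbf{X}'\mathbf{QX}/(N-1)$ from raw scores and compares it with NumPy's own covariance routine:

```python
import numpy as np

# Hypothetical raw data: N = 5 subjects, p = 2 variables
X = np.array([[80., 70.],
              [77., 79.],
              [64., 64.],
              [90., 85.],
              [72., 68.]])
N = X.shape[0]

Q = np.eye(N) - np.ones((N, N)) / N      # deviation-score projection operator
S = X.T @ Q @ X / (N - 1)                # variance-covariance matrix

print(S)
print(np.allclose(S, np.cov(X, rowvar=False)))   # True
```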

4.3.5 The Correlation Matrix


For p variables in the data matrix X, the correlation matrix $\mathbf{R}_{xx}$ is a p × p symmetric matrix with typical element $r_{ij}$ equal to the correlation between variables i and j. Of course, the diagonal elements of this matrix represent the correlation of a variable with itself, and are all equal to 1. Recall that all of the elements of the variance-covariance matrix $\mathbf{S}_{xx}$ are covariances, since the variances are covariances of variables with themselves. We know that, in order to convert $s_{ij}$ (the covariance between variables i and j) to a correlation, we simply standardize it by dividing by the product of the standard deviations of variables i and j. This is very easy to accomplish in matrix notation. Specifically, let $\mathbf{D}_{xx} = \mathrm{diag}(\mathbf{S}_{xx})$ be a diagonal matrix with ith diagonal element equal to the variance of the ith variable in X. Then let $\mathbf{D}^{1/2}$ be a diagonal matrix with elements equal to standard deviations, and $\mathbf{D}^{-1/2}$ be a diagonal matrix with ith diagonal element equal to $1/s_i$, where $s_i$ is the standard deviation of the ith variable. Then we may quickly verify that the correlation matrix is computed as:
\[
\mathbf{R}_{xx} = \mathbf{D}^{-1/2}\,\mathbf{S}_{xx}\,\mathbf{D}^{-1/2}
\]
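The rescaling by $\mathbf{D}^{-1/2}$ is straightforward to carry out numerically. A sketch (Python/NumPy, with a made-up covariance matrix):

```python
import numpy as np

# Hypothetical variance-covariance matrix for p = 3 variables
S = np.array([[4.0, 2.0, 1.0],
              [2.0, 9.0, 3.0],
              [1.0, 3.0, 4.0]])

d_inv_sqrt = 1.0 / np.sqrt(np.diag(S))   # 1/s_i for each variable
D_inv_sqrt = np.diag(d_inv_sqrt)         # D^{-1/2}

R = D_inv_sqrt @ S @ D_inv_sqrt          # correlation matrix
print(R)                                 # unit diagonal, correlations off-diagonal
```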



4.3.6 The Covariance Matrix

Given $_N\mathbf{X}_m$ and $_N\mathbf{Y}_p$, two data matrices in deviation score form, the covariance matrix $\mathbf{S}_{xy}$ is an m × p matrix with element $s_{ij}$ equal to the covariance between the ith variable in X and the jth variable in Y. $\mathbf{S}_{xy}$ is computed as
\[
\mathbf{S}_{xy} = 1/(N-1)\,\mathbf{X}'\mathbf{Y}
\]

4.4 VARIANCE OF A LINEAR COMBINATION

In an earlier section, we developed a summation algebra expression for evaluating the variance of a linear combination of variables. In this section, we derive the same result using matrix algebra.
We first note the following result.

Theorem 4.2 Given X, a data matrix in column variate deviation score form. For any linear composite y = Xb, y will also be in deviation score form.

Proof. The variables in X are in deviation score form if and only if the sum of scores in each column is zero, i.e., 1′X = 0′. But if 1′X = 0′, then for any linear combination y = Xb, we have, immediately,
\[
\begin{aligned}
\mathbf{1}'\mathbf{y} &= \mathbf{1}'\mathbf{Xb} \\
&= (\mathbf{1}'\mathbf{X})\,\mathbf{b} \\
&= \mathbf{0}'\mathbf{b} \\
&= 0
\end{aligned}
\]
Since, for any b, the linear combination scores in y sum to zero, it must be in deviation score form.

We now give a result that is one of the cornerstones of multivariate statis-


tics.

Theorem 4.3 (Variance of a Linear Combination) Given X, a set of N deviation scores on p variables in column variate form, having variance-covariance matrix $\mathbf{S}_{xx}$. The variance of any linear combination y = Xb may be computed as
\[
S_y^2 = \mathbf{b}'\mathbf{S}_{xx}\mathbf{b} \qquad (4.6)
\]

Proof. Suppose X is in deviation score form. Then, by Theorem 4.2, so must be y = Xb, for any b. From Equation 4.5, we know that
\[
\begin{aligned}
S_y^2 &= 1/(N-1)\,\mathbf{y}'\mathbf{y} \\
&= 1/(N-1)\,(\mathbf{Xb})'(\mathbf{Xb}) \\
&= 1/(N-1)\,\mathbf{b}'\mathbf{X}'\mathbf{Xb} \\
&= \mathbf{b}'\left[1/(N-1)\,\mathbf{X}'\mathbf{X}\right]\mathbf{b} \\
&= \mathbf{b}'\mathbf{S}_{xx}\mathbf{b}
\end{aligned}
\]

This is a very useful result, as it allows the variance of a linear composite to be computed directly from the variance-covariance matrix of the original variables. This result may be extended immediately to obtain the variance-covariance matrix of a set of linear composites in a matrix Y = XB. The proof is not given, as it is a straightforward generalization of the previous proof.

Theorem 4.4 (Variance-Covariance Matrix of Several Linear Combinations) Given X, a set of N deviation scores on p variables in column variate form, having variance-covariance matrix $\mathbf{S}_{xx}$. The variance-covariance matrix of any set of linear combinations Y = XB may be computed as
\[
\mathbf{S}_{YY} = \mathbf{B}'\mathbf{S}_{xx}\mathbf{B} \qquad (4.7)
\]

In a similar manner, we may prove the following:

Theorem 4.5 (Covariance Matrix of Two Sets of Linear Combinations) Given X and Y, two sets of N deviation scores on p and q variables in column variate form, having covariance matrix $\mathbf{S}_{xy}$. The covariance matrix of any two sets of linear combinations W = XB and M = YC may be computed as
\[
\mathbf{S}_{wm} = \mathbf{B}'\mathbf{S}_{xy}\mathbf{C} \qquad (4.8)
\]
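Theorem 4.4 is easy to verify numerically: the covariance matrix of the composite scores XB equals $\mathbf{B}'\mathbf{S}_{xx}\mathbf{B}$. A sketch (Python/NumPy, with simulated data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # N = 100 scores on p = 3 variables
X = X - X.mean(axis=0)                   # put the columns in deviation-score form

B = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [0.5,  0.0]])              # two linear combinations

Sxx = X.T @ X / (X.shape[0] - 1)         # variance-covariance matrix of X
Y = X @ B                                # composite scores

S_direct  = Y.T @ Y / (Y.shape[0] - 1)   # covariance matrix computed from Y
S_theorem = B.T @ Sxx @ B                # covariance matrix from Theorem 4.4
print(np.allclose(S_direct, S_theorem))  # True
```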

4.5 TRACE OF A SQUARE MATRIX

Traces play an important role in regression analysis and other fields of statistics.

Definition 4.17 (Trace of a Square Matrix) The trace of a square matrix A, denoted Tr(A), is defined as the sum of its diagonal elements, i.e.,
\[
\mathrm{Tr}(\mathbf{A}) = \sum_{i=1}^{N} a_{ii}
\]

Result 4.5 (Properties of the Trace) We may verify that the trace has the following properties:

1. Tr(A + B) = Tr(A) + Tr(B)

2. Tr(A) = Tr(A′)

3. Tr(cA) = c Tr(A)

4. $\mathrm{Tr}(\mathbf{A}'\mathbf{B}) = \sum_i \sum_j a_{ij} b_{ij}$

5. $\mathrm{Tr}(\mathbf{E}'\mathbf{E}) = \sum_i \sum_j e_{ij}^2$

6. The cyclic permutation rule:
\[
\mathrm{Tr}(\mathbf{ABC}) = \mathrm{Tr}(\mathbf{CAB}) = \mathrm{Tr}(\mathbf{BCA})
\]

4.6 RANDOM VECTORS AND RANDOM MATRICES

In this section, we extend our results on linear combinations of variables to random vector notation. The generalization is straightforward.

Definition 4.18 (Random Vector) A random vector is a vector whose elements are random variables.

One (informal) way of thinking of a random variable is that it is a process that generates numbers according to some law. An analogous way of thinking of a random vector is that it produces a vector of numbers according to some law.

Definition 4.19 (Random Matrix) A random matrix is a matrix whose elements are random variables.

Definition 4.20 (Expected Value of a Random Vector or Matrix) The expected value of a random vector (or matrix) is a vector (or matrix) whose elements are the expected values of the individual random variables that are the elements of the random vector.

Example 4.23 (Expected Value of a Random Vector) Suppose, for example, we have two random variables x and y, and their expected values are 0 and 2, respectively. If we put these variables into a vector ξ, it follows that
\[
E(\boldsymbol{\xi}) = \begin{bmatrix} 0 \\ 2 \end{bmatrix}
\]

Result 4.6 (Variance-Covariance Matrix of a Random Vector) Given a random vector ξ with expected value μ, the variance-covariance matrix $\boldsymbol{\Sigma}_{\xi\xi}$ is defined as
\[
\begin{aligned}
\boldsymbol{\Sigma}_{\xi\xi} &= E(\boldsymbol{\xi} - \boldsymbol{\mu})(\boldsymbol{\xi} - \boldsymbol{\mu})' && (4.9) \\
&= E(\boldsymbol{\xi}\boldsymbol{\xi}') - \boldsymbol{\mu}\boldsymbol{\mu}' && (4.10)
\end{aligned}
\]
If ξ is a deviation score random vector, then
\[
\boldsymbol{\Sigma}_{\xi\xi} = E(\boldsymbol{\xi}\boldsymbol{\xi}')
\]

Comment. Result 4.6 frequently is confusing to beginners. Let's concretize it a bit by giving an example with just two variables.

Example 4.24 (Variance-Covariance Matrix) Suppose
\[
\boldsymbol{\xi} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\quad \text{and} \quad
\boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}
\]
Note that ξ contains random variables, while μ contains constants. Computing $E(\boldsymbol{\xi}\boldsymbol{\xi}')$, we find
\[
\begin{aligned}
E(\boldsymbol{\xi}\boldsymbol{\xi}') &= E\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \begin{bmatrix} x_1 & x_2 \end{bmatrix} \right) \\
&= E\begin{bmatrix} x_1^2 & x_1 x_2 \\ x_2 x_1 & x_2^2 \end{bmatrix} \\
&= \begin{bmatrix} E(x_1^2) & E(x_1 x_2) \\ E(x_2 x_1) & E(x_2^2) \end{bmatrix} && (4.11)
\end{aligned}
\]
In a similar vein, we find that
\[
\begin{aligned}
\boldsymbol{\mu}\boldsymbol{\mu}' &= \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \begin{bmatrix} \mu_1 & \mu_2 \end{bmatrix} \\
&= \begin{bmatrix} \mu_1^2 & \mu_1\mu_2 \\ \mu_2\mu_1 & \mu_2^2 \end{bmatrix} && (4.12)
\end{aligned}
\]
Subtracting Equation 4.12 from Equation 4.11, and recalling Equation 3.2, we find
\[
\begin{aligned}
E(\boldsymbol{\xi}\boldsymbol{\xi}') - \boldsymbol{\mu}\boldsymbol{\mu}' &=
\begin{bmatrix} E(x_1^2) - \mu_1^2 & E(x_1 x_2) - \mu_1\mu_2 \\ E(x_2 x_1) - \mu_2\mu_1 & E(x_2^2) - \mu_2^2 \end{bmatrix} \\
&= \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix}
\end{aligned}
\]

Result 4.7 (Covariance Matrix for Two Random Vectors) Given two random vectors ξ and η, their covariance matrix $\boldsymbol{\Sigma}_{\xi\eta}$ is defined as
\[
\begin{aligned}
\boldsymbol{\Sigma}_{\xi\eta} &= E(\boldsymbol{\xi}\boldsymbol{\eta}') - E(\boldsymbol{\xi})E(\boldsymbol{\eta}') && (4.13) \\
&= E(\boldsymbol{\xi}\boldsymbol{\eta}') - \boldsymbol{\mu}_{\xi}\boldsymbol{\mu}_{\eta}' && (4.14)
\end{aligned}
\]
We now present some key results involving the expected value algebra of random matrices and vectors.

Result 4.8 (Expected Value of a Linear Combination) As a generalization of results we presented in scalar algebra, we find that, for a matrix of constants B, and a random vector x with expected value μ,
\[
E(\mathbf{B}'\mathbf{x}) = \mathbf{B}'E(\mathbf{x}) = \mathbf{B}'\boldsymbol{\mu}
\]

Result 4.9 (Expected Value of Sums of Random Vectors) For random vectors x and y, we find
\[
E(\mathbf{x} + \mathbf{y}) = E(\mathbf{x}) + E(\mathbf{y})
\]

Comment. The result obviously generalizes to the expected value of the difference of random vectors.

Result 4.10 (Expected Value Algebra for Random Vectors) Some key implications of the preceding two results, which are especially useful for reducing matrix algebra expressions, are the following:

1. The expected value operator distributes over addition and/or subtraction of random vectors and matrices.

2. The parentheses of an expected value operator can be moved through multiplied matrices or vectors of constants from both the left and right of any term, until the first random vector or matrix is encountered.

3. E(x′) = (E(x))′

Example 4.25 (Expected Value Algebra) As an example of Result 4.10, we reduce the following expression. Suppose the Greek letters are random vectors with zero expected value, and the matrices contain constants. Then
\[
\begin{aligned}
E(\mathbf{A}'\mathbf{B}\boldsymbol{\xi}\boldsymbol{\eta}'\mathbf{C}) &= \mathbf{A}'\mathbf{B}\,E(\boldsymbol{\xi}\boldsymbol{\eta}')\,\mathbf{C} \\
&= \mathbf{A}'\mathbf{B}\boldsymbol{\Sigma}_{\xi\eta}\mathbf{C}
\end{aligned}
\]

Theorem 4.6 (Variance-Covariance Matrix of Linear Combinations) Given x, a random vector with p variables, having variance-covariance matrix $\boldsymbol{\Sigma}_{xx}$. The variance-covariance matrix of any set of linear combinations y = B′x may be computed as
\[
\boldsymbol{\Sigma}_{yy} = \mathbf{B}'\boldsymbol{\Sigma}_{xx}\mathbf{B} \qquad (4.15)
\]

In a similar manner, we may prove the following:

Theorem 4.7 (Covariance Matrix of Two Sets of Linear Combinations) Given x and y, two random vectors with p and q variables having covariance matrix $\boldsymbol{\Sigma}_{xy}$. The covariance matrix of any two sets of linear combinations w = B′x and m = C′y may be computed as
\[
\boldsymbol{\Sigma}_{wm} = \mathbf{B}'\boldsymbol{\Sigma}_{xy}\mathbf{C} \qquad (4.16)
\]

4.7 INVERSE OF A SQUARE MATRIX

For an N × N square matrix A, the inverse of A, $\mathbf{A}^{-1}$, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination of the others. $\mathbf{A}^{-1}$ is the unique matrix that satisfies
\[
\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}
\]
If a square matrix A has an inverse, we say that A is invertible, non-singular, and of full rank. If the transpose of a matrix is its inverse, and vice versa, i.e., AA′ = A′A = I, we say that the matrix A is orthogonal.

Result 4.11 (Properties of a Matrix Inverse) $\mathbf{A}^{-1}$ has the following properties:

1. $(\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'$

2. If A = A′, then $\mathbf{A}^{-1} = (\mathbf{A}^{-1})'$

3. The inverse of the product of several invertible square matrices is the product of their inverses in reverse order. For example,
\[
(\mathbf{ABC})^{-1} = \mathbf{C}^{-1}\mathbf{B}^{-1}\mathbf{A}^{-1}
\]

4. For nonzero scalar c, $(c\mathbf{A})^{-1} = (1/c)\mathbf{A}^{-1}$

5. For a diagonal matrix D, $\mathbf{D}^{-1}$ is a diagonal matrix with diagonal elements equal to the reciprocals of the corresponding diagonal elements of D.
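A brief numerical sketch (Python/NumPy, with made-up matrices) of the defining property and of the reverse-order rule for the inverse of a product:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[1.0, 2.0],
              [0.0, 1.0]])

A_inv = np.linalg.inv(A)
print(np.allclose(A_inv @ A, np.eye(2)))     # True: A^{-1}A = I

# Inverse of a product is the product of inverses in reverse order
lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))                 # True
```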

4.8 BILINEAR AND QUADRATIC FORMS, RANK, AND LINEAR DEPENDENCE

In this section, we discuss a number of terms related to the rank of a matrix, a concept that is closely related to the dimensionality of a linear model.

Definition 4.21 (Bilinear Form) For a matrix A, a bilinear form is a scalar expression of the form
\[
\mathbf{b}'\mathbf{A}\mathbf{c}
\]
for vectors b and c.

Definition 4.22 (Quadratic Form) A quadratic form is a scalar expression of the form
\[
\mathbf{b}'\mathbf{A}\mathbf{b}
\]
for a vector b.

Definition 4.23 (Positive Definite Matrix) A matrix A is positive definite if for any non-null b, the quadratic form b′Ab is greater than zero. Similarly, we say that A is positive semidefinite if for any non-null b, the quadratic form b′Ab is greater than or equal to zero.

Definition 4.24 (Linear Independence and Dependence) A set of p-component vectors $\mathbf{x}_i$, i = 1, …, p is said to be linearly independent if there exists no (non-zero) set of linear weights $b_i$ such that
\[
\sum_{i=1}^{p} b_i \mathbf{x}_i = \mathbf{0}
\]
If a set of vectors is not linearly independent, then the vectors are linearly dependent.

Example 4.26 (Linear Dependence) Let $\mathbf{x}' = \begin{bmatrix} 1 & -1 & 2 \end{bmatrix}$, $\mathbf{y}' = \begin{bmatrix} 2 & 0 & -1 \end{bmatrix}$, and $\mathbf{z}' = \begin{bmatrix} 0 & -2 & 5 \end{bmatrix}$. Then x, y, and z are linearly dependent because z − 2x + y = 0.

Definition 4.25 (Rank of a Matrix) The rank of a matrix A, denoted Rank(A), is the maximal number of linearly independent rows or columns in A.

Comment. In some theoretical discussions, reference is made to row rank and column rank. The row rank of a matrix is equal to the number of linearly independent rows in the matrix. The column rank of a matrix is equal to its number of linearly independent columns. For any matrix A, row rank and column rank are equal. If the row rank of a matrix is equal to its number of rows, we say the matrix is of full row rank, and if its row rank is less than its number of rows, we say it is of deficient row rank. Analogous terminology holds for columns. If a matrix is square, we refer to it as full rank or deficient rank, depending on whether or not it has the maximum possible number of linearly independent rows and columns.

Result 4.12 (Some Properties of Rank) For any matrix A, we have

1. Rank(A′) = Rank(A).

2. Rank(A′A) = Rank(AA′) = Rank(A).

3. For any conformable matrix B, Rank(AB) ≤ min(Rank(A), Rank(B)).

4. For any nonsingular, square matrix B, Rank(AB) = Rank(A).
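These properties are easy to check numerically. A sketch (Python/NumPy, with a made-up rank-deficient matrix whose third row is twice the first row minus the second):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [0., 1., 4.],
              [2., 3., 2.]])             # row 3 = 2*(row 1) - (row 2)

print(np.linalg.matrix_rank(A))          # 2: deficient rank
print(np.linalg.matrix_rank(A.T))        # 2: row rank equals column rank
print(np.linalg.matrix_rank(A.T @ A))    # 2: Rank(A'A) = Rank(A)
```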

4.9 DETERMINANT OF A SQUARE MATRIX

In this section, we define the determinant of a square matrix, and explore its properties. The determinant of a matrix A is a scalar function that is zero if and only if the matrix is of deficient rank. This fact is sufficient information about the determinant to allow the reader to continue through much of the remainder of this book. The remainder of this section is presented primarily for mathematical completeness, and may be omitted on first reading.

Definition 4.26 (Determinant) The determinant of a square matrix A, denoted Det(A) or |A|, is defined as follows:

1. For an N × N matrix A, form all possible products of N elements of the matrix such that no two elements in any product are from the same row or column of A. There are N! such products available. For example, when N = 3, the required products are the 6 quantities $a_{11}a_{22}a_{33}$, $a_{12}a_{23}a_{31}$, $a_{13}a_{21}a_{32}$, $a_{13}a_{22}a_{31}$, $a_{11}a_{23}a_{32}$, $a_{12}a_{21}a_{33}$.

2. Within each product, arrange the factors so that the row subscripts are in natural order 1, 2, …, N, as in the example above.

3. Then examine the order in which the column subscripts appear. Specifically, note how many times a larger number precedes a smaller number in the sequence of column subscripts for the product. For product $p_i$, call this number the number of inversions $k_i$.

4. The determinant is computed as
\[
|\mathbf{A}| = \sum_{i=1}^{N!} p_i (-1)^{k_i} \qquad (4.17)
\]

Example 4.27 (Some Simple Determinants) Applying the above formula to a 3 × 3 matrix, we obtain
\[
|\mathbf{A}| = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32}
- a_{13}a_{22}a_{31} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} \qquad (4.18)
\]
For a 2 × 2 matrix, we have
\[
|\mathbf{A}| = a_{11}a_{22} - a_{12}a_{21} \qquad (4.19)
\]



Clearly, for N > 3, calculating the determinant using the basic formula would be exceedingly tedious. However, if we define $M_{ij}$ as the determinant of the matrix of order (N − 1) × (N − 1) obtained by crossing out the ith row and jth column of A, we find that, for any row i,
\[
|\mathbf{A}| = \sum_{j=1}^{N} a_{ij} M_{ij} (-1)^{i+j} \qquad (4.20)
\]
$M_{ij}$ is referred to as a minor of A. If we define the cofactor $C_{ij} = (-1)^{i+j} M_{ij}$, then we find that, for any row i,
\[
|\mathbf{A}| = \sum_{j=1}^{N} a_{ij} C_{ij} \qquad (4.21)
\]
Also, for any column j,
\[
|\mathbf{A}| = \sum_{i=1}^{N} a_{ij} C_{ij} \qquad (4.22)
\]

Result 4.13 (Selected Properties of Determinants) For any N × N matrix A,

1. The determinant of a diagonal matrix or a triangular matrix is the product of its diagonal elements.

2. If the elements of a single row or column of A are multiplied by the scalar c, the determinant of the new matrix is equal to c|A|. If every element of A is multiplied by c, then $|c\mathbf{A}| = c^N|\mathbf{A}|$.

3. If two columns (or rows) of A are interchanged, the sign of |A| is reversed.

4. If two columns or two rows of a matrix are equal, then |A| = 0.

5. The determinant of a matrix is unchanged if a multiple of some column is added to another column. A similar property holds for rows.

6. If all elements of a row or column of A are zero, then |A| = 0.

7. If A and B are both N × N, then |AB| = |A||B|.

8. The sum of the products of the elements of a given row of a square matrix with the corresponding cofactors of a different row is equal to zero. A similar result holds for columns.

4.10 EIGENVALUES AND EIGENVECTORS OF A SQUARE MATRIX

The eigenvalues and eigenvectors of a square matrix play a key role in some
important operations in statistics. In particular, they are intimately con-
nected with the determination of the rank of a matrix, and the factoring of
a matrix into a product of matrices.

Definition 4.27 (Eigenvalues and Eigenvectors) For a square matrix A, a scalar c and a vector v are an eigenvalue and associated eigenvector, respectively, if and only if they satisfy the equation
\[
\mathbf{Av} = c\mathbf{v} \qquad (4.23)
\]

Comment. There are infinitely many solutions to Equation 4.23 unless some identification constraint is placed on the size of the vector v. For example, for any v satisfying the equation, a rescaled vector such as 2v satisfies it as well (with the same eigenvalue c). Consequently, eigenvectors are generally assumed to be normalized, i.e., to satisfy the constraint that v′v = 1. The eigenvalues $c_i$ are the roots of the determinantal equation
\[
|\mathbf{A} - c\mathbf{I}| = 0 \qquad (4.24)
\]

Result 4.14 (Properties of Eigenvalues and Eigenvectors) For an N × N matrix A with eigenvalues $c_i$ and associated eigenvectors $\mathbf{v}_i$,

1. \[ \mathrm{Tr}(\mathbf{A}) = \sum_{i=1}^{N} c_i \]

2. \[ |\mathbf{A}| = \prod_{i=1}^{N} c_i \]

3. The eigenvalues of a symmetric matrix with real elements are all real.

4. The eigenvalues of a positive definite matrix are all positive.

5. If an N × N symmetric matrix A is positive semidefinite and of rank r, it has exactly r positive eigenvalues and N − r zero eigenvalues.

6. The nonzero eigenvalues of the product AB are equal to the nonzero eigenvalues of BA. Hence the traces of AB and BA are equal.

7. The characteristic roots of a diagonal matrix are its diagonal elements.

8. The scalar multiple bA has eigenvalue $bc_i$ with eigenvector $\mathbf{v}_i$. (Proof: $\mathbf{Av}_i = c_i\mathbf{v}_i$ implies immediately that $(b\mathbf{A})\mathbf{v}_i = (bc_i)\mathbf{v}_i$.)

9. Adding a constant b to every diagonal element of A creates a matrix A + bI with eigenvalues $c_i + b$ and associated eigenvectors $\mathbf{v}_i$. (Proof: $(\mathbf{A} + b\mathbf{I})\mathbf{v}_i = \mathbf{Av}_i + b\mathbf{v}_i = c_i\mathbf{v}_i + b\mathbf{v}_i = (c_i + b)\mathbf{v}_i$.)

10. $\mathbf{A}^m$ has $c_i^m$ as an eigenvalue, and $\mathbf{v}_i$ as its eigenvector. (Proof: Consider $\mathbf{A}^2\mathbf{v}_i = \mathbf{A}(\mathbf{Av}_i) = \mathbf{A}(c_i\mathbf{v}_i) = c_i(\mathbf{Av}_i) = c_i c_i\mathbf{v}_i = c_i^2\mathbf{v}_i$. The general case follows by induction.)

11. $\mathbf{A}^{-1}$, if it exists, has $1/c_i$ as an eigenvalue, and $\mathbf{v}_i$ as its eigenvector. (Proof: $\mathbf{Av}_i = c_i\mathbf{v}_i$, so $\mathbf{A}^{-1}\mathbf{Av}_i = \mathbf{v}_i = \mathbf{A}^{-1}\mathbf{v}_i c_i = c_i\mathbf{A}^{-1}\mathbf{v}_i$. Hence $(1/c_i)\mathbf{v}_i = \mathbf{A}^{-1}\mathbf{v}_i$.)

12. For symmetric A, for distinct eigenvalues $c_i$, $c_j$ with associated eigenvectors $\mathbf{v}_i$, $\mathbf{v}_j$, we have $\mathbf{v}_i'\mathbf{v}_j = 0$. (Proof: $\mathbf{Av}_i = c_i\mathbf{v}_i$ and $\mathbf{Av}_j = c_j\mathbf{v}_j$. So $\mathbf{v}_j'\mathbf{Av}_i = c_i\mathbf{v}_j'\mathbf{v}_i$ and $\mathbf{v}_i'\mathbf{Av}_j = c_j\mathbf{v}_i'\mathbf{v}_j$. But, since a bilinear form is a scalar, it is equal to its transpose, and, remembering that A = A′, $\mathbf{v}_j'\mathbf{Av}_i = \mathbf{v}_i'\mathbf{Av}_j$. So $c_i\mathbf{v}_j'\mathbf{v}_i = c_j\mathbf{v}_i'\mathbf{v}_j = c_j\mathbf{v}_j'\mathbf{v}_i$. If $c_i$ and $c_j$ are different, this implies $\mathbf{v}_j'\mathbf{v}_i = 0$.)

13. For any real, symmetric A, there exists a V such that V′AV = D, where D is diagonal. Moreover, any real, symmetric matrix A can be written as A = VDV′, where V contains the eigenvectors $\mathbf{v}_i$ of A in order in its columns, and D contains the eigenvalues $c_i$ of A in the ith diagonal position.
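For symmetric matrices, several of these facts are directly visible in a numerical eigendecomposition. A sketch (Python/NumPy, with a made-up symmetric matrix):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                       # real, symmetric

c, V = np.linalg.eigh(A)                         # eigenvalues c_i; eigenvectors in the columns of V
D = np.diag(c)

print(np.allclose(V @ D @ V.T, A))               # True: A = VDV'
print(np.allclose(np.trace(A), c.sum()))         # True: trace = sum of eigenvalues
print(np.allclose(np.linalg.det(A), c.prod()))   # True: determinant = product of eigenvalues
```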

4.11 APPLICATIONS OF EIGENVALUES AND EIGENVECTORS

Eigenvalues and eigenvectors have widespread practical application in multivariate statistics. In this section, we demonstrate a few such applications. First, we deal with the notion of matrix factorization.

Definition 4.28 (Powers of a Diagonal Matrix) Diagonal matrices act much more like scalars than most matrices do. For example, we can define fractional powers of diagonal matrices, as well as positive powers. Specifically, if a diagonal matrix D has diagonal elements $d_i$, the matrix $\mathbf{D}^x$ has diagonal elements $d_i^x$. If x is negative, it is assumed D is positive definite. With this definition, the powers of D behave essentially like scalars. For example, $\mathbf{D}^{1/2}\mathbf{D}^{1/2} = \mathbf{D}$.

Example 4.28 (Powers of a Diagonal Matrix) Suppose we have
\[
\mathbf{D} = \begin{bmatrix} 4 & 0 \\ 0 & 9 \end{bmatrix}
\]
Then
\[
\mathbf{D}^{1/2} = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}
\]

Example 4.29 (Matrix Factorization) Suppose you have a variance-covariance matrix Σ for some statistical population. Assuming Σ is positive semidefinite, then (from Result 4.14) it can be written in the form $\boldsymbol{\Sigma} = \mathbf{V}\mathbf{D}\mathbf{V}' = \mathbf{F}\mathbf{F}'$, where $\mathbf{F} = \mathbf{V}\mathbf{D}^{1/2}$. F is called a Gram-factor of Σ.

Comment. Gram-factors are not, in general, uniquely defined. For example, suppose Σ = FF′. Then consider any orthogonal matrix T, such that TT′ = T′T = I. There are infinitely many orthogonal matrices of order 2 × 2 and higher. Then for any such matrix T, we have $\boldsymbol{\Sigma} = \mathbf{F}\mathbf{T}\mathbf{T}'\mathbf{F}' = \mathbf{F}^*\mathbf{F}^{*\prime}$, where $\mathbf{F}^* = \mathbf{F}\mathbf{T}$.
Gram-factors have some significant applications. For example, in the field of random number generation, it is relatively easy to generate pseudo-random numbers that mimic p variables that are independent with zero mean and unit variance. But suppose we wish to mimic p variables that are not independent, but have variance-covariance matrix Σ? The following result describes one method for doing this.

Result 4.15 (Simulating Nonindependent Random Numbers) Given a p × 1 random vector x having variance-covariance matrix I. Let F be a Gram-factor of Σ = FF′. Then y = Fx will have variance-covariance matrix Σ.
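Result 4.15 suggests a simple simulation recipe. The sketch below (Python/NumPy, with a made-up target Σ, illustrative only) builds a Gram-factor from the eigendecomposition and checks that the transformed variables have approximately the desired covariance matrix:

```python
import numpy as np

Sigma = np.array([[4.0, 2.0],
                  [2.0, 3.0]])               # target variance-covariance matrix

c, V = np.linalg.eigh(Sigma)                 # Sigma = V D V'
F = V @ np.diag(np.sqrt(c))                  # Gram-factor: Sigma = F F'

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 100_000))        # independent, zero-mean, unit-variance scores
Y = F @ X                                    # each column y = F x has covariance Sigma

print(np.cov(Y))                             # approximately [[4, 2], [2, 3]]
```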

In certain intermediate and advanced derivations in matrix algebra, reference is made to symmetric powers of a symmetric matrix A, in particular the symmetric square root of A, a symmetric matrix which, when multiplied by itself, yields A.

Example 4.30 (Symmetric Powers of a Symmetric Matrix) When investigating properties of eigenvalues and eigenvectors, we pointed out that, for distinct eigenvalues of a symmetric matrix A, the associated eigenvectors are orthogonal. Since the eigenvectors are normalized to have a sum of squares equal to 1, it follows that if we place the eigenvectors in a matrix V, this matrix will be orthogonal, i.e., VV′ = V′V = I. This fact allows us to create symmetric powers of a symmetric matrix very efficiently if we know the eigenvectors. For example, suppose you wish to create a symmetric matrix $\mathbf{A}^{1/2}$ such that $\mathbf{A}^{1/2}\mathbf{A}^{1/2} = \mathbf{A}$. Let the diagonal matrix D contain the eigenvalues of A in proper order. Then A = VDV′, and it is easy to verify that $\mathbf{A}^{1/2} = \mathbf{V}\mathbf{D}^{1/2}\mathbf{V}'$ has the required properties. To prove that $\mathbf{A}^{1/2}$ is symmetric, we need simply show that it is equal to its transpose, which is trivial (so long as you recall that any diagonal matrix is symmetric, and that the transpose of a product of several matrices is the product of the transposes in reverse order). That $\mathbf{A}^{1/2}\mathbf{A}^{1/2} = \mathbf{A}$ follows immediately by substitution, i.e.,
\[
\begin{aligned}
\mathbf{A}^{1/2}\mathbf{A}^{1/2} &= \mathbf{V}\mathbf{D}^{1/2}\mathbf{V}'\,\mathbf{V}\mathbf{D}^{1/2}\mathbf{V}' \\
&= \mathbf{V}\mathbf{D}^{1/2}\left(\mathbf{V}'\mathbf{V}\right)\mathbf{D}^{1/2}\mathbf{V}' \\
&= \mathbf{V}\mathbf{D}^{1/2}\,[\mathbf{I}]\,\mathbf{D}^{1/2}\mathbf{V}' \\
&= \mathbf{V}\mathbf{D}^{1/2}\mathbf{D}^{1/2}\mathbf{V}' \\
&= \mathbf{V}\mathbf{D}\mathbf{V}' \\
&= \mathbf{A}
\end{aligned}
\]
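A numerical sketch of the construction (Python/NumPy, with a made-up positive definite matrix):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                  # symmetric, positive definite

c, V = np.linalg.eigh(A)                    # A = V D V'
A_half = V @ np.diag(np.sqrt(c)) @ V.T      # symmetric square root A^{1/2} = V D^{1/2} V'

print(np.allclose(A_half, A_half.T))        # True: A^{1/2} is symmetric
print(np.allclose(A_half @ A_half, A))      # True: A^{1/2} A^{1/2} = A
```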
