4 Introduction to Matrix Algebra
In the previous chapter, we learned the algebraic results that form the foundation for the study of factor analysis and structural equation modeling. These results, powerful as they are, are somewhat cumbersome to apply in more complicated systems involving large numbers of variables. Matrix algebra provides us with a new mathematical notation that is ideally suited for developing results involving linear combinations and transformations. Once we have developed a few basic results and learned how to use them, we will be ready to derive the fundamental equations of factor analysis and structural modeling.
$${}_{p}\mathbf{A}_{q}$$

There are numerous other notations. For example, one might indicate a matrix of order $p, q$ as $\mathbf{A}_{(p \times q)}$. Frequently, we shall refer to such a matrix as a $p \times q$ matrix $\mathbf{A}$.
$$\mathbf{A} = \{a_{ij}\}$$
When we refer to element $a_{ij}$, the first subscript will refer to the row position of the element in the array. The second subscript (regardless of which letter is used in this position) will refer to the column position. Hence, a typical matrix ${}_{p}\mathbf{A}_{q}$ will be of the form:
$$\mathbf{A} = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1q} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2q} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3q} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{p1} & a_{p2} & a_{p3} & \cdots & a_{pq}
\end{bmatrix}$$
We shall generally use bold capital letters to indicate matrices, and employ lower case letters to signify elements of an array, except where clarity dictates. In particular, we may find it convenient, when referring to matrices of random variables, to refer to the elements with capital letters, so we can distinguish notationally between constants and random variables.
A column vector with all elements equal to one will be symbolized as either $\mathbf{j}$ or $\mathbf{1}$.
Note how we reserve the use of boldface for matrices and vectors.
1. If $p \neq q$, $\mathbf{A}$ is a rectangular matrix.
2. If $p = q$, $\mathbf{A}$ is a square matrix.
2. A square matrix:
$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$
5. A diagonal matrix:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 7 \end{bmatrix}$$
6. A scalar matrix:
$$\begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$
7. A symmetric matrix:
$$\begin{bmatrix} 1 & 2 & 3 \\ 2 & 2 & 4 \\ 3 & 4 & 2 \end{bmatrix}$$
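These special types are easy to verify numerically. Below is a minimal sketch in Python with numpy (our choice of tool here, not the text's), using the matrices from the list above:

```python
import numpy as np

# The example matrices shown above.
D = np.diag([1, 2, 7])            # diagonal: nonzero entries only on the diagonal
S = 2 * np.eye(3, dtype=int)      # scalar: a diagonal matrix with equal diagonal entries
A = np.array([[1, 2, 3],
              [2, 2, 4],
              [3, 4, 2]])         # symmetric: a[i, j] == a[j, i]

assert np.array_equal(D, np.diag(np.diag(D)))             # diagonal matrix test
assert np.array_equal(A, A.T)                             # symmetric: equals its transpose
assert np.array_equal(S, S[0, 0] * np.eye(3, dtype=int))  # scalar: a multiple of I
```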
In the above notation, the scalar 1 refers to the correlation of the $Y$ variable with itself. The row vector $\mathbf{r}_{Yx}'$ refers to the set of correlations between the variable $Y$ and the set of $p$ random variables in $\mathbf{x}$. $\mathbf{R}_{xx}$ is the $p \times p$ matrix of correlations of the predictor variables. We will refer to the order of the partitioned form as the number of rows and columns in the partitioning, which is distinct from the number of rows and columns in the matrix being represented. For example, suppose there were $p = 5$ predictor variables in Example 4.2. Then the matrix $\mathbf{R}$ would be a $6 \times 6$ matrix, but the example shows a $2 \times 2$ partitioned form.
When matrices are partitioned properly, it is understood that pieces that appear to the left or right of other pieces have the same number of rows, and pieces that appear above or below other pieces have the same number of columns. So, in the above example, $\mathbf{R}_{xx}$, appearing to the right of the $p \times 1$ column vector $\mathbf{r}_{xY}$, must have $p$ rows, and since it appears below the $1 \times p$ row vector $\mathbf{r}_{Yx}'$, it must have $p$ columns. Hence, it must be a $p \times p$ matrix.
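A sketch of how these conformability rules play out in practice, using numpy's block facility; the correlation values below are invented purely for illustration:

```python
import numpy as np

p = 2                                    # two predictor variables, chosen arbitrarily
r_xY = np.array([[0.5], [0.3]])          # p x 1 column of criterion correlations (illustrative)
R_xx = np.array([[1.0, 0.2],
                 [0.2, 1.0]])            # p x p predictor correlation matrix (illustrative)

# Assemble the 2 x 2 partitioned form of the (p+1) x (p+1) matrix R.
# np.block raises an error unless the pieces are conformable, exactly as described above.
R = np.block([[np.ones((1, 1)), r_xY.T],
              [r_xY,            R_xx  ]])
print(R.shape)   # (3, 3): a 2 x 2 partitioned form representing a 3 x 3 matrix
```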
For the addition and subtraction operations to be defined for two matrices $\mathbf{A}$, $\mathbf{B}$, they must be conformable.

Definition 4.5 (Matrix Addition) Let $\mathbf{A} = \{a_{ij}\}$ and $\mathbf{B} = \{b_{ij}\}$. Let $\mathbf{A}$ and $\mathbf{B}$ be conformable. The sum $\mathbf{A} + \mathbf{B} = \mathbf{C}$ is defined as:
$$\mathbf{C} = \{c_{ij}\} = \{a_{ij} + b_{ij}\}$$
Definition 4.7 (Matrix Equality) Two matrices are equal if and only if they are of the same row and column order, and have all elements equal.
1. Associativity: $\mathbf{A} + (\mathbf{B} + \mathbf{C}) = (\mathbf{A} + \mathbf{B}) + \mathbf{C}$
2. Commutativity: $\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}$
3. There exists a neutral element for addition, i.e., the null matrix $\mathbf{0}$, such that $\mathbf{A} + \mathbf{0} = \mathbf{A}$.
4. There exist inverse elements for addition, in the sense that for any matrix $\mathbf{A}$, there exists a matrix $-\mathbf{A}$, such that $\mathbf{A} + (-\mathbf{A}) = \mathbf{0}$.
1. $(a + b)\mathbf{A} = a\mathbf{A} + b\mathbf{A}$
2. $a(\mathbf{A} + \mathbf{B}) = a\mathbf{A} + a\mathbf{B}$
3. $a(b\mathbf{A}) = (ab)\mathbf{A}$
4. $a\mathbf{A} = \mathbf{A}a$
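These properties can be spot-checked numerically. A small numpy sketch with arbitrary random matrices (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.integers(-5, 5, (3, 3)) for _ in range(3))
a, b = 2, 3
Z = np.zeros((3, 3), dtype=int)         # the null matrix

# Matrix addition is associative and commutative ...
assert np.array_equal(A + (B + C), (A + B) + C)
assert np.array_equal(A + B, B + A)
# ... has a neutral element and additive inverses ...
assert np.array_equal(A + Z, A)
assert np.array_equal(A + (-A), Z)
# ... and scalar multiplication distributes as in the list above.
assert np.array_equal((a + b) * A, a * A + b * A)
assert np.array_equal(a * (A + B), a * A + a * B)
assert np.array_equal(a * (b * A), (a * b) * A)
```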
Definition 4.9 (Scalar Product) Given the row vector ${}_{1}\mathbf{a}_{p}'$ and the column vector ${}_{p}\mathbf{b}_{1}$, let $\mathbf{a} = \{a_i\}$ and $\mathbf{b} = \{b_i\}$. The scalar product $\mathbf{a}'\mathbf{b}$ is defined as
$$\mathbf{a}'\mathbf{b} = \sum_{i} a_i b_i$$
Note: This is simply the sum of cross products of the elements of the two vectors.
Example 4.5 (Scalar Product) Let $\mathbf{a}' = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$. Let $\mathbf{b} = \begin{bmatrix} 4 \\ 2 \\ 1 \end{bmatrix}$. Then $\mathbf{a}'\mathbf{b} = (1)(4) + (2)(2) + (3)(1) = 11$.
Example 4.6 (The Row by Column Method) The meaning of the formal definition of matrix multiplication might not be obvious at first glance. Indeed, there are several ways of thinking about matrix multiplication. The first way, which I call the row by column approach, works as follows. Visualize ${}_{p}\mathbf{A}_{q}$ as a set of $p$ row vectors and ${}_{q}\mathbf{B}_{s}$ as a set of $s$ column vectors. Then if $\mathbf{C} = \mathbf{AB}$, element $c_{ik}$ of $\mathbf{C}$ is the scalar product (i.e., the sum of cross products) of the $i$th row of $\mathbf{A}$ with the $k$th column of $\mathbf{B}$. For example, let
$$\mathbf{A} = \begin{bmatrix} 2 & 4 & 6 \\ 5 & 7 & 1 \\ 2 & 3 & 5 \end{bmatrix}, \quad \text{and let} \quad \mathbf{B} = \begin{bmatrix} 4 & 1 \\ 0 & 2 \\ 5 & 1 \end{bmatrix}.$$
Then
$$\mathbf{C} = \mathbf{AB} = \begin{bmatrix} 38 & 16 \\ 25 & 20 \\ 33 & 13 \end{bmatrix}.$$
Consider element $c_{21}$, which has a value of 25. This element, which is in the second row and first column of $\mathbf{C}$, is computed by taking the sum of cross products of the elements of the second row of $\mathbf{A}$ with the first column of $\mathbf{B}$. That is, $(5 \times 4) + (7 \times 0) + (1 \times 5) = 25$.
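The row by column method translates directly into code. A minimal numpy sketch reproducing the example above (the explicit loop is for exposition; in practice one would simply write A @ B):

```python
import numpy as np

A = np.array([[2, 4, 6],
              [5, 7, 1],
              [2, 3, 5]])
B = np.array([[4, 1],
              [0, 2],
              [5, 1]])

# Row-by-column method: c[i, k] is the scalar product of the ith row of A
# with the kth column of B.
C = np.zeros((A.shape[0], B.shape[1]), dtype=int)
for i in range(A.shape[0]):
    for k in range(B.shape[1]):
        C[i, k] = A[i, :] @ B[:, k]

assert np.array_equal(C, A @ B)   # agrees with the built-in matrix product
print(C)                          # [[38 16], [25 20], [33 13]]
```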
Example 4.7 (A Null Matrix Product) The following example shows that one can, indeed, obtain a null matrix as the product of two non-null matrices. Let $\mathbf{a}' = \begin{bmatrix} 6 & 2 & 2 \end{bmatrix}$, and let
$$\mathbf{B} = \begin{bmatrix} 8 & -12 & -12 \\ -12 & 40 & -4 \\ -12 & -4 & 40 \end{bmatrix}.$$
Then $\mathbf{a}'\mathbf{B} = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}$.
Comment. The above notation may seem cryptic to the beginner, so some clarification may be useful. The typical element of $\mathbf{A}$ is $a_{ij}$. This means that, in the $i,j$ position of matrix $\mathbf{A}$ is found element $a_{ij}$. Suppose $\mathbf{A}$ is of order $3 \times 3$. Then it can be written as
$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$
$\mathbf{A}'$ is a matrix with typical element $a_{ji}$. This means that, in the $i,j$ position of matrix $\mathbf{A}'$ is found element $a_{ji}$ of the original matrix $\mathbf{A}$. For example, element $2,1$ of $\mathbf{A}'$ is element $1,2$ of the original matrix. So, in terms of the elements of the original $\mathbf{A}$, the transpose has the representation
$$\mathbf{A}' = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{bmatrix}$$
Studying the above element-wise representation, you can see that transposition does not change the diagonal elements of a matrix, and constructs columns of the transpose from the rows of the original matrix.
1. $(\mathbf{A}')' = \mathbf{A}$
2. $(c\mathbf{A})' = c\mathbf{A}'$
3. $(\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'$
4. $(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'$
Comment. You should study the above results carefully, as they are used
frequently in reducing and simplifying matrix algebra expressions.
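A quick numerical spot-check of these four properties, using numpy with arbitrary random matrices (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))
c = 2.5

assert np.allclose(A.T.T, A)                # (A')' = A
assert np.allclose((c * A).T, c * A.T)      # (cA)' = cA'
assert np.allclose((A + B).T, A.T + B.T)    # (A + B)' = A' + B'
assert np.allclose((A @ C).T, C.T @ A.T)    # (AC)' = C'A' -- note the reversed order
```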
Then
$$\mathbf{A}' = \begin{bmatrix} \mathbf{C}' & \mathbf{E}' & \mathbf{G}' \\ \mathbf{D}' & \mathbf{F}' & \mathbf{H}' \end{bmatrix}$$
Now suppose $\mathbf{A} = \begin{bmatrix} \mathbf{X} & \mathbf{Y} \end{bmatrix}$ and $\mathbf{B} = \begin{bmatrix} \mathbf{G} \\ \mathbf{H} \end{bmatrix}$. Then $\mathbf{AB} = \mathbf{XG} + \mathbf{YH}$.
$$\mathbf{y} = \mathbf{Xb} = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \mathbf{x}_3 & \cdots & \mathbf{x}_p \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_p \end{bmatrix} = b_1\mathbf{x}_1 + b_2\mathbf{x}_2 + b_3\mathbf{x}_3 + \cdots + b_p\mathbf{x}_p$$
$$\mathbf{y} = \mathbf{Xb} = \begin{bmatrix} 80 & 70 \\ 77 & 79 \\ 64 & 64 \end{bmatrix} \begin{bmatrix} +1 \\ -1 \end{bmatrix} = (+1)\begin{bmatrix} 80 \\ 77 \\ 64 \end{bmatrix} + (-1)\begin{bmatrix} 70 \\ 79 \\ 64 \end{bmatrix} = \begin{bmatrix} 10 \\ -2 \\ 0 \end{bmatrix}$$
Specifically,
$$\mathbf{y} = \mathbf{Xb} = \begin{bmatrix} 80 & 70 \\ 77 & 79 \\ 64 & 64 \end{bmatrix} \begin{bmatrix} +1/3 \\ +2/3 \end{bmatrix} = (+1/3)\begin{bmatrix} 80 \\ 77 \\ 64 \end{bmatrix} + (+2/3)\begin{bmatrix} 70 \\ 79 \\ 64 \end{bmatrix} = \begin{bmatrix} 73\tfrac{1}{3} \\ 78\tfrac{1}{3} \\ 64 \end{bmatrix}$$
$$\mathbf{y}' = \mathbf{b}'\mathbf{X} = \begin{bmatrix} b_1 & b_2 & \cdots & b_p \end{bmatrix} \begin{bmatrix} \mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_p' \end{bmatrix} = b_1\mathbf{x}_1' + b_2\mathbf{x}_2' + \cdots + b_p\mathbf{x}_p'$$
Example 4.15 (Taking the Sum and Difference of Two Columns) Suppose the matrix $\mathbf{X}$ consists of a set of scores on two variables, and you wish to compute both the sum and the difference scores on the variables. In this case, we post-multiply by two vectors of linear weights, creating two linear combinations.
Specifically,
$$\mathbf{Y} = \mathbf{XB} = \begin{bmatrix} 80 & 70 \\ 77 & 79 \\ 64 & 64 \end{bmatrix} \begin{bmatrix} +1 & +1 \\ +1 & -1 \end{bmatrix} = \begin{bmatrix} 150 & 10 \\ 156 & -2 \\ 128 & 0 \end{bmatrix}$$
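All of the linear combination examples above can be reproduced with a few lines of numpy (again, our choice of tool, not the text's):

```python
import numpy as np

X = np.array([[80, 70],
              [77, 79],
              [64, 64]])

# Post-multiplying by a weight vector produces one linear combination per row.
diff = X @ np.array([1, -1])       # difference scores: [10, -2, 0]
avg  = X @ np.array([1/3, 2/3])    # weighted average:  [73.33..., 78.33..., 64.]

# Two combinations at once: post-multiply by a matrix of weights.
B = np.array([[1,  1],
              [1, -1]])
both = X @ B                       # columns are the sum and difference scores
print(both)                        # [[150  10], [156  -2], [128   0]]
```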
See if you can figure out, for yourself, before scanning the examples below, how to perform the above 5 operations with linear combinations.
Definition 4.15 (Selection Vector) The selection vector $\mathbf{s}_{[i]}$ is a vector with all elements zero except the $i$th element, which is 1.
For example, the second row of a matrix can be extracted by pre-multiplication by $\mathbf{s}_{[2]}'$:
$$\begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 5 \end{bmatrix}$$
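A small numpy sketch of selection by multiplication, reproducing the example above:

```python
import numpy as np

X = np.array([[1, 4],
              [2, 5],
              [3, 6]])

s2 = np.array([0, 1, 0])        # selection vector s[2] for the three rows

print(s2 @ X)                   # pre-multiplying by s[2]' extracts row 2: [2 5]
print(X @ np.array([0, 1]))     # post-multiplying by a selection vector extracts column 2: [4 5 6]
```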
In this section, we show how matrix algebra can be used to express some
common statistical formulas in a succinct way that allows us to derive some
important results in multivariate analysis.
$$\mathbf{1}'\mathbf{x} = \sum_{i=1}^{N} (1)(x_i) = \sum_{i=1}^{N} x_i$$
$$\bar{x} = (1/N)\,\mathbf{1}'\mathbf{x}$$
$$\begin{aligned}
\mathbf{x}^{*} &= \mathbf{x} - \mathbf{1}\bar{x} \\
&= \mathbf{x} - \mathbf{1}\frac{\mathbf{1}'\mathbf{x}}{N} \qquad (4.1)\\
&= \mathbf{x} - \frac{\mathbf{1}\mathbf{1}'}{N}\,\mathbf{x} \\
&= \mathbf{I}\mathbf{x} - \frac{\mathbf{1}\mathbf{1}'}{N}\,\mathbf{x} \\
&= \left(\mathbf{I} - \frac{\mathbf{1}\mathbf{1}'}{N}\right)\mathbf{x} \qquad (4.2)\\
&= (\mathbf{I} - \mathbf{P})\,\mathbf{x} \qquad (4.3)\\
&= \mathbf{Q}\mathbf{x} \qquad (4.4)
\end{aligned}$$
where
$$\mathbf{Q} = \mathbf{I} - \mathbf{P}$$
and
$$\mathbf{P} = \frac{\mathbf{1}\mathbf{1}'}{N}$$
Comment. A number of points need to be made about the above derivation:
1. You should study the above derivation carefully, making certain you
understand all steps.
3. Since x can be converted from raw score form to deviation score form
by pre-multiplication with a single matrix, it follows that any particular
deviation score can be computed with one pass through a list of numbers.
5. If one were, for some reason, to write a computer program using Equation 4.4, one would not need (or want) to save the matrix $\mathbf{Q}$, for several reasons. First, it can be very large! Second, no matter how large $N$ is, the elements of $\mathbf{Q}$ take on only two distinct values. Diagonal elements of $\mathbf{Q}$ are always equal to $(N-1)/N$, and off-diagonal elements are always equal to $-1/N$. In general, there would be no need to store the numbers.
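As a concrete illustration of Equation 4.4 (and of point 5 above), here is a minimal numpy sketch; the data values are arbitrary:

```python
import numpy as np

x = np.array([80.0, 77.0, 64.0])
N = len(x)

P = np.ones((N, N)) / N      # P = 11'/N: every element equals 1/N
Q = np.eye(N) - P            # Q = I - P

dev = Q @ x                  # deviation scores in a single multiplication
assert np.allclose(dev, x - x.mean())

# Q really does take on only two distinct values:
assert np.allclose(np.diag(Q), (N - 1) / N)               # diagonal elements: (N-1)/N
assert np.allclose(Q[~np.eye(N, dtype=bool)], -1 / N)     # off-diagonal elements: -1/N
```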
Let us now investigate the properties of the matrices $\mathbf{P}$ and $\mathbf{Q}$ that accomplish this transformation. First, we should establish an additional definition and result.
Proof. To prove the result, we need merely show that $(\mathbf{I} - \mathbf{C})^2 = (\mathbf{I} - \mathbf{C})$. This is straightforward:
$$\begin{aligned}
(\mathbf{I} - \mathbf{C})^2 &= (\mathbf{I} - \mathbf{C})(\mathbf{I} - \mathbf{C}) \\
&= \mathbf{I}^2 - \mathbf{C}\mathbf{I} - \mathbf{I}\mathbf{C} + \mathbf{C}^2 \\
&= \mathbf{I} - \mathbf{C} - \mathbf{C} + \mathbf{C} \\
&= \mathbf{I} - \mathbf{C} \qquad \square
\end{aligned}$$
$$\begin{aligned}
\mathbf{P}\mathbf{P} &= \frac{\mathbf{1}\mathbf{1}'}{N}\,\frac{\mathbf{1}\mathbf{1}'}{N} \\
&= \frac{\mathbf{1}\mathbf{1}'\mathbf{1}\mathbf{1}'}{N^2} \\
&= \frac{\mathbf{1}(\mathbf{1}'\mathbf{1})\mathbf{1}'}{N^2} \\
&= \frac{\mathbf{1}(N)\mathbf{1}'}{N^2} \\
&= \frac{\mathbf{1}\mathbf{1}'(N)}{N^2} \\
&= \mathbf{1}\mathbf{1}'\frac{N}{N^2} \\
&= \frac{\mathbf{1}\mathbf{1}'}{N} \\
&= \mathbf{P}
\end{aligned}$$
The above derivation demonstrates some principles that are generally useful
in reducing simple statistical formulas in matrix form:
since the transpose of a product of two matrices is the product of their trans-
poses in reverse order.
The expression can be reduced further. Since $\mathbf{Q}$ is symmetric, it follows immediately that $\mathbf{Q}' = \mathbf{Q}$, and (remembering also that $\mathbf{Q}$ is idempotent) that $\mathbf{Q}'\mathbf{Q} = \mathbf{Q}$. Hence
$$S_X^2 = \frac{1}{N-1}\,\mathbf{x}'\mathbf{Q}\mathbf{x}$$
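A numerical spot-check that the quadratic form $\mathbf{x}'\mathbf{Q}\mathbf{x}/(N-1)$ reproduces the ordinary sample variance; the data are arbitrary:

```python
import numpy as np

x = np.array([1.0, 4.0, 2.0, 7.0, 6.0])
N = len(x)
Q = np.eye(N) - np.ones((N, N)) / N

var_matrix = x @ Q @ x / (N - 1)     # S^2 = x'Qx / (N-1)
var_direct = np.var(x, ddof=1)       # the usual sample variance
assert np.isclose(var_matrix, var_direct)
```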
As an obvious generalization of the above, we write the matrix form for the covariance between two vectors of scores $\mathbf{x}$ and $\mathbf{y}$ as
$$S_{XY} = \frac{1}{N-1}\,\mathbf{x}'\mathbf{Q}\mathbf{y}$$
In the row variate representation, the $p$ variables are arranged in $p$ rows. The choice of whether to use row or column variate representations is arbitrary, and varies in books and articles. One must, ultimately, be equally fluent with either notation, although modern computer software tends to emphasize column variate form.
Given ${}_{N}\mathbf{X}_{m}$ and ${}_{N}\mathbf{Y}_{p}$, two data matrices in deviation score form, the covariance matrix $\mathbf{S}_{xy}$ is an $m \times p$ matrix with element $s_{ij}$ equal to the covariance between the $i$th variable in $\mathbf{X}$ and the $j$th variable in $\mathbf{Y}$. $\mathbf{S}_{xy}$ is computed as
$$\mathbf{S}_{xy} = \frac{1}{N-1}\,\mathbf{X}'\mathbf{Y}$$
Proof. The variables in $\mathbf{X}$ are in deviation score form if and only if the sum of scores in each column is zero, i.e., $\mathbf{1}'\mathbf{X} = \mathbf{0}'$. But if $\mathbf{1}'\mathbf{X} = \mathbf{0}'$, then for any linear combination $\mathbf{y} = \mathbf{Xb}$, we have, immediately,
$$\begin{aligned}
\mathbf{1}'\mathbf{y} &= \mathbf{1}'\mathbf{Xb} \\
&= (\mathbf{1}'\mathbf{X})\,\mathbf{b} \\
&= \mathbf{0}'\mathbf{b} \\
&= 0
\end{aligned}$$
Since, for any $\mathbf{b}$, the linear combination scores in $\mathbf{y}$ sum to zero, it must be in deviation score form. $\square$
$$\begin{aligned}
S_y^2 &= \frac{1}{N-1}\,\mathbf{y}'\mathbf{y} \\
&= \frac{1}{N-1}\,(\mathbf{Xb})'(\mathbf{Xb}) \\
&= \frac{1}{N-1}\,\mathbf{b}'\mathbf{X}'\mathbf{Xb} \\
&= \mathbf{b}'\left[\frac{1}{N-1}\,\mathbf{X}'\mathbf{X}\right]\mathbf{b} \\
&= \mathbf{b}'\mathbf{S}_{xx}\mathbf{b}
\end{aligned}$$
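The result $S_y^2 = \mathbf{b}'\mathbf{S}_{xx}\mathbf{b}$ is easy to confirm numerically. A numpy sketch with simulated deviation scores (values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 3))
X = X - X.mean(axis=0)              # put the columns in deviation score form
b = np.array([1.0, -2.0, 0.5])      # arbitrary linear weights

S = X.T @ X / (X.shape[0] - 1)      # sample covariance matrix S_xx
y = X @ b                           # linear combination scores

assert np.isclose(y @ y / (len(y) - 1), b @ S @ b)   # S_y^2 = b'S_xx b
```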
Traces play an important role in regression analysis and other fields of statistics.
$$\operatorname{Tr}(\mathbf{A}) = \sum_{i=1}^{N} a_{ii}$$
Result 4.5 (Properties of the Trace) We may verify that the trace has the following properties:

1. $\operatorname{Tr}(\mathbf{A} + \mathbf{B}) = \operatorname{Tr}(\mathbf{A}) + \operatorname{Tr}(\mathbf{B})$
2. $\operatorname{Tr}(\mathbf{A}) = \operatorname{Tr}(\mathbf{A}')$
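A quick numerical check of both trace properties:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))   # Tr(A+B) = Tr(A) + Tr(B)
assert np.isclose(np.trace(A), np.trace(A.T))                   # Tr(A) = Tr(A')
```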
$$\begin{aligned}
\boldsymbol{\Sigma}_{\xi\xi} &= E(\boldsymbol{\xi} - \boldsymbol{\mu})(\boldsymbol{\xi} - \boldsymbol{\mu})' \qquad (4.9)\\
&= E(\boldsymbol{\xi}\boldsymbol{\xi}' - \boldsymbol{\xi}\boldsymbol{\mu}' - \boldsymbol{\mu}\boldsymbol{\xi}' + \boldsymbol{\mu}\boldsymbol{\mu}') \qquad (4.10)\\
&= E(\boldsymbol{\xi}\boldsymbol{\xi}') - \boldsymbol{\mu}\boldsymbol{\mu}'
\end{aligned}$$
Suppose, for example, that
$$\boldsymbol{\xi} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \quad \text{and} \quad \boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$$
Note that $\boldsymbol{\xi}$ contains random variables, while $\boldsymbol{\mu}$ contains constants. Computing $E(\boldsymbol{\xi}\boldsymbol{\xi}')$, we find
$$E(\boldsymbol{\xi}\boldsymbol{\xi}') = E\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\begin{bmatrix} x_1 & x_2 \end{bmatrix}\right) = E\begin{bmatrix} x_1^2 & x_1 x_2 \\ x_2 x_1 & x_2^2 \end{bmatrix} = \begin{bmatrix} E(x_1^2) & E(x_1 x_2) \\ E(x_2 x_1) & E(x_2^2) \end{bmatrix} \qquad (4.11)$$
Likewise,
$$\boldsymbol{\mu}\boldsymbol{\mu}' = \begin{bmatrix} \mu_1^2 & \mu_1\mu_2 \\ \mu_2\mu_1 & \mu_2^2 \end{bmatrix} \qquad (4.12)$$
Subtracting Equation 4.12 from Equation 4.11, and recalling Equation 3.2, we find
$$E(\boldsymbol{\xi}\boldsymbol{\xi}') - \boldsymbol{\mu}\boldsymbol{\mu}' = \begin{bmatrix} E(x_1^2) - \mu_1^2 & E(x_1 x_2) - \mu_1\mu_2 \\ E(x_2 x_1) - \mu_2\mu_1 & E(x_2^2) - \mu_2^2 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix}$$
Result 4.7 (Covariance Matrix for Two Random Vectors) Given two random vectors $\boldsymbol{\xi}$ and $\boldsymbol{\eta}$, their covariance matrix $\boldsymbol{\Sigma}_{\xi\eta}$ is defined as
$$\begin{aligned}
\boldsymbol{\Sigma}_{\xi\eta} &= E(\boldsymbol{\xi}\boldsymbol{\eta}') - E(\boldsymbol{\xi})E(\boldsymbol{\eta}') \qquad (4.13)\\
&= E(\boldsymbol{\xi}\boldsymbol{\eta}') - E(\boldsymbol{\xi})E(\boldsymbol{\eta})' \qquad (4.14)
\end{aligned}$$
We now present some key results involving the expected value algebra of random matrices and vectors.

Result 4.8 (Expected Value of a Linear Combination) As a generalization of results we presented in scalar algebra, we find that, for a matrix of constants $\mathbf{B}$ and a random vector $\mathbf{x}$,
$$E(\mathbf{B}'\mathbf{x}) = \mathbf{B}'E(\mathbf{x}) = \mathbf{B}'\boldsymbol{\mu}$$
Result 4.9 (Expected Value of Sums of Random Vectors) For random vectors $\mathbf{x}$ and $\mathbf{y}$, we find
$$E(\mathbf{x} + \mathbf{y}) = E(\mathbf{x}) + E(\mathbf{y})$$
Comment. The result obviously generalizes to the expected value of the difference of random vectors.
Result 4.10 (Expected Value Algebra for Random Vectors) Some key implications of the preceding two results, which are especially useful for reducing matrix algebra expressions, are the following:

1. The expected value operator distributes over addition and/or subtraction of random vectors and matrices.
2. The parentheses of an expected value operator can be moved through multiplied matrices or vectors of constants from both the left and right of any term, until the first random vector or matrix is encountered.
3. $E(\mathbf{x}') = (E(\mathbf{x}))'$
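A Monte Carlo spot-check of Result 4.8, using numpy; the mean vector, the matrix of constants, and the sample size below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0])              # population mean vector (illustrative)
B = np.array([[2.0, 0.0],
              [1.0, 3.0]])              # matrix of constants (illustrative)

# Draws of the random vector x; each row of the array is one realization.
x = rng.multivariate_normal(mu, np.eye(2), size=200_000)

# E(B'x) = B'E(x): compare the sample mean of B'x with B'mu.
lhs = (x @ B).mean(axis=0)              # each row of x @ B is one draw of (B'x)'
rhs = B.T @ mu
assert np.allclose(lhs, rhs, atol=0.05) # agreement up to Monte Carlo error
```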
If a set of vectors is not linearly independent, then the vectors are linearly dependent.
2. $\operatorname{Rank}(\mathbf{A}'\mathbf{A}) = \operatorname{Rank}(\mathbf{A}\mathbf{A}') = \operatorname{Rank}(\mathbf{A})$
3. For any conformable matrix $\mathbf{B}$, $\operatorname{Rank}(\mathbf{AB}) \leq \min(\operatorname{Rank}(\mathbf{A}), \operatorname{Rank}(\mathbf{B}))$.
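These rank properties can be spot-checked with numpy's matrix_rank; the matrices below are arbitrary:

```python
import numpy as np
from numpy.linalg import matrix_rank

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))   # a 6 x 4 matrix of rank at most 3
B = rng.standard_normal((4, 5))

assert matrix_rank(A.T @ A) == matrix_rank(A @ A.T) == matrix_rank(A)
assert matrix_rank(A @ B) <= min(matrix_rank(A), matrix_rank(B))
```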
In this section, we define the determinant of a square matrix and explore its properties. The determinant of a matrix $\mathbf{A}$ is a scalar function that is zero if and only if the matrix is of deficient rank. This fact is sufficient information about the determinant to allow the reader to continue through much of the remainder of this book. The remainder of this section is presented primarily for mathematical completeness, and may be omitted on first reading.
$$|\mathbf{A}| = \sum_{i=1}^{N!} p_i\,(-1)^{k_i} \qquad (4.17)$$
$$|\mathbf{A}| = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} \qquad (4.18)$$
Clearly, for $N > 3$, calculating the determinant using the basic formula would be exceedingly tedious. However, if we define $M_{ij}$ as the determinant of the matrix of order $(N-1) \times (N-1)$ obtained by crossing out the $i$th row and $j$th column of $\mathbf{A}$, we find that, for any row $i$,
$$|\mathbf{A}| = \sum_{j=1}^{N} a_{ij} M_{ij} (-1)^{i+j} \qquad (4.20)$$
Defining the cofactor $C_{ij} = (-1)^{i+j} M_{ij}$, we may write the expansion along row $i$ as
$$|\mathbf{A}| = \sum_{j=1}^{N} a_{ij} C_{ij} \qquad (4.21)$$
A similar expansion holds along any column $j$:
$$|\mathbf{A}| = \sum_{i=1}^{N} a_{ij} C_{ij} \qquad (4.22)$$
3. If two columns (or rows) of $\mathbf{A}$ are interchanged, the sign of $|\mathbf{A}|$ is reversed.
8. The sum of the products of the elements of a given row of a square matrix with the corresponding cofactors of a different row is equal to zero. A similar result holds for columns.
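The cofactor expansion of Equation 4.21 translates directly into a short recursive routine. A minimal Python sketch (exponential in $N$, so purely pedagogical), checked against numpy's determinant:

```python
import numpy as np

def det_by_cofactors(A: np.ndarray) -> float:
    """Determinant by cofactor expansion along the first row (Equation 4.21)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # Minor M[1][j]: delete the first row and the jth column.
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        # Cofactor C[1][j] = (-1)^(1+j) M[1][j]  (zero-based j here, so (-1)**j).
        total += A[0, j] * (-1) ** j * det_by_cofactors(minor)
    return total

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])
assert np.isclose(det_by_cofactors(A), np.linalg.det(A))   # both give -3
```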
The eigenvalues and eigenvectors of a square matrix play a key role in some important operations in statistics. In particular, they are intimately connected with the determination of the rank of a matrix, and the factoring of a matrix into a product of matrices.
$$\mathbf{A}\mathbf{v} = c\mathbf{v} \qquad (4.23)$$
Comment. There are infinitely many solutions to Equation 4.23 unless some identification constraint is placed on the size of the vector $\mathbf{v}$: for any $c$ and $\mathbf{v}$ satisfying the equation, $c$ and $2\mathbf{v}$ must also satisfy it. Consequently, in practice, eigenvectors are assumed to be normalized, i.e., to satisfy the constraint $\mathbf{v}'\mathbf{v} = 1$. The eigenvalues $c_i$ are roots of the determinantal equation
$$|\mathbf{A} - c\mathbf{I}| = 0 \qquad (4.24)$$
2. $$|\mathbf{A}| = \prod_{i=1}^{N} c_i$$
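A numerical illustration of Equation 4.23, the normalization constraint, and the determinant property above, using numpy's eigendecomposition; the matrix is arbitrary:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

vals, vecs = np.linalg.eig(A)   # eigenvalues and (column) eigenvectors

# Each eigenpair satisfies Av = cv, with v normalized so that v'v = 1.
for c, v in zip(vals, vecs.T):
    assert np.allclose(A @ v, c * v)
    assert np.isclose(v @ v, 1.0)

# The determinant equals the product of the eigenvalues.
assert np.isclose(np.linalg.det(A), np.prod(vals))
```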
For example, if
$$\mathbf{D} = \begin{bmatrix} 4 & 0 \\ 0 & 9 \end{bmatrix}$$
then
$$\mathbf{D}^{1/2} = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$$