Appendix B Matrix Algebra

This document provides definitions and properties of matrix algebra. It defines matrices, vectors, and common matrix operations such as addition, scalar multiplication, and matrix multiplication. It also defines transpose and properties of transpose operations. Key matrix types are defined, including diagonal, identity, zero, and symmetric matrices. Matrix operations like addition and multiplication follow similar rules to arithmetic but matrix multiplication is noncommutative. The document aims to introduce basic concepts of matrix algebra.


Lecture 2 Basics of Matrix Algebra

Shaojian Chen
September 23, 2021

1 Basic Definitions
Definition 1.1 (Matrix). A matrix is a rectangular array of numbers. More precisely,
an m × n matrix has m rows and n columns. The positive integer m is called the row
dimension, and n is called the column dimension.

We use uppercase boldface letters to denote matrices. We can write an m × n matrix generically as

\[
\mathbf{A} = [a_{ij}] =
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
\vdots &        &        &        & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{bmatrix}
\]

where aij represents the element in the ith row and the jth column. For example, a25
stands for the number in the second row and the fifth column of A. A specific example
of a 2 × 3 matrix is

\[
\mathbf{A} = \begin{bmatrix} 2 & -1 & 7 \\ -4 & 5 & 0 \end{bmatrix} \tag{A.1}
\]

where a13 = 7. The shorthand A = [aij] is often used to define matrix operations.
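Concretely, the matrix in (A.1) can be entered and indexed with NumPy (assumed available for the numeric sketches in these notes); note that NumPy indexes from zero, so a13 lives at position [0, 2]:

```python
import numpy as np

# The 2 x 3 matrix from equation (A.1)
A = np.array([[2, -1, 7],
              [-4, 5, 0]])

print(A.shape)  # (2, 3): row dimension m = 2, column dimension n = 3
print(A[0, 2])  # a13 = 7 (zero-based indexing)
```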

Definition 1.2 (Square Matrix). A square matrix has the same number of rows and
columns. The dimension of a square matrix is its number of rows and columns.

Definition 1.3 (Vectors)

(i) A 1 × m matrix is called a row vector (of dimension m) and can be written as
x ≡ (x1, x2, …, xm).

(ii) An n × 1 matrix is called a column vector and can be written as

\[
\mathbf{x} \equiv \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]

Definition 1.4 (Diagonal Matrix). A square matrix A is a diagonal matrix when all of its
off-diagonal elements are zero, that is, aij = 0 for all i ≠ j. We can always write a diagonal
matrix as

\[
\mathbf{A} = \begin{bmatrix}
a_{11} & 0 & \cdots & 0 \\
0 & a_{22} & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}
\end{bmatrix}
\]

Definition 1.5 (Identity and Zero Matrices)

(i) The n × n identity matrix, denoted I, or sometimes In to emphasize its dimension,
is the diagonal matrix with unity (one) in each diagonal position and zero elsewhere:

\[
\mathbf{I} \equiv \mathbf{I}_n \equiv \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
\]

(ii) The m × n zero matrix, denoted 0, is the m × n matrix with zero for all entries. This
need not be a square matrix.

2 Matrix Operations
2.1 Matrix Addition

Two matrices A and B, each having dimension m × n, can be added element by element:
A + B = [aij + bij]. More precisely,

\[
\mathbf{A} + \mathbf{B} = \begin{bmatrix}
a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1n}+b_{1n} \\
a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2n}+b_{2n} \\
\vdots & & & \vdots \\
a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mn}+b_{mn}
\end{bmatrix}
\]

For example,

\[
\begin{bmatrix} 2 & -1 & 7 \\ -4 & 5 & 0 \end{bmatrix}
+ \begin{bmatrix} 1 & 0 & -4 \\ 4 & 2 & 3 \end{bmatrix}
= \begin{bmatrix} 3 & -1 & 3 \\ 0 & 7 & 3 \end{bmatrix}
\]

Matrices of different dimensions cannot be added.

2.2 Scalar Multiplication

Given any real number γ (often called a scalar), scalar multiplication is defined as
γA ≡ [γaij], or

\[
\gamma \mathbf{A} = \begin{bmatrix}
\gamma a_{11} & \gamma a_{12} & \cdots & \gamma a_{1n} \\
\gamma a_{21} & \gamma a_{22} & \cdots & \gamma a_{2n} \\
\vdots & & & \vdots \\
\gamma a_{m1} & \gamma a_{m2} & \cdots & \gamma a_{mn}
\end{bmatrix}
\]

For example, if γ = 2 and A is the matrix in equation (A.1), then

\[
\gamma \mathbf{A} = \begin{bmatrix} 4 & -2 & 14 \\ -8 & 10 & 0 \end{bmatrix}
\]

2.3 Matrix Multiplication

To multiply matrix A by matrix B to form the product AB, the column dimension of A
must equal the row dimension of B. Therefore, let A be an m × n matrix and let B be an
n × p matrix. Then matrix multiplication is defined as

\[
\mathbf{AB} = \left[ \sum_{k=1}^{n} a_{ik} b_{kj} \right]
\]

In other words, the (i, j)th element of the new matrix AB is obtained by multiplying each
element in the ith row of A by the corresponding element in the jth column of B and
adding these n products together.

For example,

\[
\begin{bmatrix} 2 & -1 & 0 \\ -4 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 1 & 6 & 0 \\ -1 & 2 & 0 & 1 \\ 3 & 0 & 0 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 12 & -1 \\ -1 & -2 & -24 & 1 \end{bmatrix}
\]

We can also multiply a matrix and a vector. If A is an n × m matrix and y is an m × 1
vector, then Ay is an n × 1 vector. If x is a 1 × n vector, then xA is a 1 × m vector.
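Assuming NumPy is available, the product above can be verified with the `@` operator, which also handles matrix-vector products:

```python
import numpy as np

A = np.array([[2, -1, 0],
              [-4, 1, 0]])         # 2 x 3
B = np.array([[0, 1, 6, 0],
              [-1, 2, 0, 1],
              [3, 0, 0, 0]])       # 3 x 4

# The @ operator performs matrix multiplication; the result is 2 x 4
print(A @ B)   # [[1 0 12 -1], [-1 -2 -24 1]]

# Matrix-vector product: (2 x 3) @ (3,) -> (2,)
y = np.array([1, 0, 2])
print(A @ y)   # [2 -4]
```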

Matrix addition, scalar multiplication, and matrix multiplication can be combined in
various ways, and these operations satisfy several rules that are familiar from basic
operations on numbers. In the following list of properties, A, B, and C are matrices with
appropriate dimensions for applying each operation, and α and β are real numbers.
Most of these properties are easy to illustrate from the definitions.

Properties of Matrix Operations. (1) (α + β)A = αA + βA; (2) α(A + B) = αA + αB;
(3) (αβ)A = α(βA); (4) α(AB) = (αA)B; (5) A + B = B + A; (6) (AB)C = A(BC);
(7) (A + B) + C = A + (B + C); (8) A(B + C) = AB + AC; (9) (A + B)C = AC + BC;
(10) IA = AI = A; (11) A + 0 = 0 + A = A; (12) A − A = 0; (13) A0 = 0A = 0; and (14)
AB ≠ BA in general, even when both products are defined.

The last property deserves further comment. If A is n × m and B is m × p, then AB is
defined, but BA is defined only if n = p (the row dimension of A equals the column
dimension of B). If A is m × n and B is n × m, then AB and BA are both defined, but they
are not usually the same; in fact, they have different dimensions unless A and B are both
square matrices. Even when A and B are both square, AB ≠ BA, except under special
circumstances.
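A minimal numeric illustration of noncommutativity (NumPy assumed; the matrices are arbitrary):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

# Even for square matrices, AB and BA generally differ
print(A @ B)   # [[2 1], [4 3]]
print(B @ A)   # [[3 4], [1 2]]
```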

2.4 Transpose

Definition 2.1 (Transpose). Let A = [aij] be an m × n matrix. The transpose of A,
denoted A′ (called A prime), is the n × m matrix obtained by interchanging the rows
and columns of A. We can write this as A′ ≡ [aji].

For example,

\[
\mathbf{A} = \begin{bmatrix} 2 & -1 & 7 \\ -4 & 5 & 0 \end{bmatrix}, \qquad
\mathbf{A}' = \begin{bmatrix} 2 & -4 \\ -1 & 5 \\ 7 & 0 \end{bmatrix}
\]

Properties of Transpose. (1) (A′)′ = A; (2) (αA)′ = αA′ for any scalar α; (3)
(A + B)′ = A′ + B′; (4) (AB)′ = B′A′, where A is m × n and B is n × k; (5)
x′x = x1² + x2² + … + xn², where x is an n × 1 vector; and (6) if A is an n × k matrix
with rows given by the 1 × k vectors a1, a2, …, an, so that we can write

\[
\mathbf{A} = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_n \end{bmatrix}
\]

then A′ = (a1′, a2′, …, an′).
Definition 2.2 (Symmetric Matrix). A square matrix A is a symmetric matrix if, and
only if, A′ = A.

If X is any n × k matrix, then X′X is always defined and is a symmetric matrix, as can be
seen by applying the first and fourth transpose properties.
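The transpose properties are easy to spot-check numerically. The sketch below (NumPy assumed; matrices chosen arbitrarily) verifies property (4) and the symmetry of X′X:

```python
import numpy as np

X = np.array([[1, 2],
              [3, 4],
              [5, 6]])             # 3 x 2
A = np.array([[1, 0, 2]])          # 1 x 3

# (AB)' = B'A'  (transpose property 4)
assert np.array_equal((A @ X).T, X.T @ A.T)

# X'X is always defined and symmetric
G = X.T @ X
print(G)                         # [[35 44], [44 56]]
print(np.array_equal(G, G.T))    # True
```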

2.5 Partitioned Matrix Multiplication

Let A be an n × k matrix with rows given by the 1 × k vectors a1, a2, …, an, and let B be
an n × m matrix with rows given by the 1 × m vectors b1, b2, …, bn:

\[
\mathbf{A} = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_n \end{bmatrix}, \qquad
\mathbf{B} = \begin{bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \\ \vdots \\ \mathbf{b}_n \end{bmatrix}
\]

Then,

\[
\mathbf{A}'\mathbf{B} = \sum_{i=1}^{n} \mathbf{a}_i'\mathbf{b}_i
\]

where for each i, ai′bi is a k × m matrix. Therefore, A′B can be written as the sum of n
matrices, each of which is k × m. As a special case, we have

\[
\mathbf{A}'\mathbf{A} = \sum_{i=1}^{n} \mathbf{a}_i'\mathbf{a}_i
\]

where ai′ai is a k × k matrix for all i.

A more general form of partitioned matrix multiplication holds when we have matrices
A (m × n) and B (n × p) written as

\[
\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}, \qquad
\mathbf{B} = \begin{bmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{bmatrix}
\]

where A11 is m1 × n1, A12 is m1 × n2, A21 is m2 × n1, A22 is m2 × n2, B11 is n1 × p1, B12 is
n1 × p2, B21 is n2 × p1, and B22 is n2 × p2. Naturally, m1 + m2 = m, n1 + n2 = n, and
p1 + p2 = p.

When we form the product AB, the expression looks just as it would if the entries were scalars:

\[
\mathbf{AB} = \begin{bmatrix}
\mathbf{A}_{11}\mathbf{B}_{11} + \mathbf{A}_{12}\mathbf{B}_{21} & \mathbf{A}_{11}\mathbf{B}_{12} + \mathbf{A}_{12}\mathbf{B}_{22} \\
\mathbf{A}_{21}\mathbf{B}_{11} + \mathbf{A}_{22}\mathbf{B}_{21} & \mathbf{A}_{21}\mathbf{B}_{12} + \mathbf{A}_{22}\mathbf{B}_{22}
\end{bmatrix}
\]

Note that each of the matrix multiplications that form the partition on the right is well
defined because the column and row dimensions are compatible for multiplication.
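The row-by-row decomposition A′B = Σ ai′bi is easy to verify numerically. Assuming NumPy is available, the sketch below checks it for small made-up matrices using `np.outer` for each ai′bi term:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])         # n = 3 rows, each a 1 x k (k = 2) vector
B = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [2.0, 2.0, 2.0]])    # n = 3 rows, each a 1 x m (m = 3) vector

# A'B equals the sum over rows of the k x m outer products a_i' b_i
total = sum(np.outer(A[i], B[i]) for i in range(A.shape[0]))
print(np.allclose(total, A.T @ B))   # True
```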

2.6 Trace

The trace of a matrix is a very simple operation defined only for square matrices.

Definition 2.3 (Trace). For any n × n matrix A, the trace of A, denoted tr(A), is
the sum of its diagonal elements. Mathematically,

\[
\operatorname{tr}(\mathbf{A}) = \sum_{i=1}^{n} a_{ii}
\]

Properties of Trace. (1) tr(In) = n; (2) tr(A + B) = tr(A) + tr(B); (3) tr(A′) = tr(A); (4)
tr(αA) = α tr(A) for any scalar α; and (5) tr(AB) = tr(BA), where A is m × n and B is
n × m.
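Property (5) is worth a numeric check, since AB and BA need not even have the same dimension (NumPy assumed; entries arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])    # 2 x 3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])         # 3 x 2

# AB is 2 x 2 while BA is 3 x 3, yet the traces agree
print(np.trace(A @ B))   # 24.0
print(np.trace(B @ A))   # 24.0
```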

2.7 Inverse

The notion of a matrix inverse is very important for square matrices.

Definition 2.4 (Inverse). An n × n matrix A has an inverse, denoted A⁻¹, provided that
A⁻¹A = In and AA⁻¹ = In. In this case, A is said to be invertible or nonsingular. Otherwise,
it is said to be noninvertible or singular.

Properties of Inverse. (1) If an inverse exists, it is unique; (2) (αA)⁻¹ = (1/α)A⁻¹, if
α ≠ 0 and A is invertible; (3) (AB)⁻¹ = B⁻¹A⁻¹, if A and B are both n × n and
invertible; and (4) (A′)⁻¹ = (A⁻¹)′.
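A quick check of properties (3) and the definition itself, using NumPy's `linalg.inv` on arbitrary invertible matrices (assumed setup):

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
B = np.array([[1.0, 2.0],
              [3.0, 5.0]])

A_inv = np.linalg.inv(A)
# Definition: A^{-1} A = A A^{-1} = I
print(np.allclose(A @ A_inv, np.eye(2)))   # True

# Property (3): (AB)^{-1} = B^{-1} A^{-1}
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))   # True
```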

3 Linear Independence and Rank of a Matrix


Definition 3.1 (Linear Independence). Let {x1, x2, …, xr} be a set of n × 1 vectors.
These are linearly independent vectors if, and only if,

α1x1 + α2x2 + ⋯ + αrxr = 0   (A.2)

implies that α1 = α2 = ⋯ = αr = 0. If (A.2) holds for a set of scalars that are not all zero,
then {x1, x2, …, xr} is linearly dependent.

The statement that {x1, x2, …, xr} is linearly dependent is equivalent to saying that at
least one vector in this set can be written as a linear combination of the others.

Definition 3.2 (Rank)


(i) Let A be an n × m matrix. The rank of A, denoted rank(A), is the maximum
number of linearly independent columns of A.

(ii) If A is n × m and rank(A) = m, then A has full column rank.

If A is n × m, its rank can be at most m. A matrix has full column rank if its columns form
a linearly independent set. For example, the 3 × 2 matrix

\[
\begin{bmatrix} 1 & 3 \\ 2 & 6 \\ 0 & 0 \end{bmatrix}
\]

can have rank at most two. In fact, its rank is only one because the second column is
three times the first column.

Properties of Rank. (1) rank(A′) = rank(A); (2) if A is n × k, then rank(A) ≤ min(n, k);
and (3) if A is k × k and rank(A) = k, then A is invertible.
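NumPy can confirm the rank of the 3 × 2 example above (its `matrix_rank` routine is assumed available):

```python
import numpy as np

# The 3 x 2 example from the text: second column = 3 * first column
A = np.array([[1, 3],
              [2, 6],
              [0, 0]])

print(np.linalg.matrix_rank(A))     # 1
print(np.linalg.matrix_rank(A.T))   # rank(A') = rank(A) = 1
```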

4 Quadratic Forms and Positive Definite Matrices


Definition 4.1 (Quadratic Form). Let A be an n × n symmetric matrix. The quadratic
form associated with the matrix A is the real-valued function defined for all n × 1 vectors
x:

\[
f(\mathbf{x}) = \mathbf{x}'\mathbf{A}\mathbf{x} = \sum_{i=1}^{n} a_{ii} x_i^2 + 2 \sum_{i=1}^{n} \sum_{j>i} a_{ij} x_i x_j
\]

Definition 4.2 (Positive Definite and Positive Semi-Definite)

(i) A symmetric matrix A is said to be positive definite (p.d.) if

x′Ax > 0 for all n × 1 vectors x except x = 0.

(ii) A symmetric matrix A is positive semi-definite (p.s.d.) if

x′Ax ≥ 0 for all n × 1 vectors x.

If a matrix is said to be positive definite or positive semi-definite, it is automatically
assumed to be symmetric.

Properties of Positive Definite and Positive Semi-Definite Matrices. (1) A p.d.
matrix has diagonal elements that are strictly positive, while a p.s.d. matrix has
nonnegative diagonal elements; (2) if A is p.d., then A⁻¹ exists and is p.d.; (3) if X is n × k,
then X′X and XX′ are p.s.d.; and (4) if X is n × k and rank(X) = k, then X′X is p.d. (and
therefore nonsingular).
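One practical way to check positive definiteness of a symmetric matrix is to check that all its eigenvalues are strictly positive. A sketch with a hypothetical full-column-rank X (NumPy assumed):

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])         # n = 3, k = 2, full column rank

G = X.T @ X                        # X'X should be p.d. by property (4)
# A symmetric matrix is p.d. iff all its eigenvalues are strictly positive
eigvals = np.linalg.eigvalsh(G)
print(np.all(eigvals > 0))   # True

# Quadratic form x'Gx is positive for a nonzero x
x = np.array([1.0, -1.0])
print(x @ G @ x)   # 2.0
```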

5 Idempotent Matrices
Definition 5.1 (Idempotent Matrix). Let A be an n × n symmetric matrix. Then A is said
to be an idempotent matrix if, and only if, AA = A.

For example,

\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

is an idempotent matrix, as direct multiplication verifies.

Properties of Idempotent Matrices. Let A be an n × n idempotent matrix. (1)
rank(A) = tr(A); (2) A is positive semi-definite; and (3) its characteristic roots are either
zero or one.

We can construct idempotent matrices very generally. Let X be an n × k matrix with
rank(X) = k. Define

P ≡ X(X′X)⁻¹X′
M ≡ In − X(X′X)⁻¹X′ = In − P

Then P and M are symmetric, idempotent matrices with rank(P) = k and
rank(M) = n − k. The ranks are most easily obtained by using Property 1:
tr(P) = tr[(X′X)⁻¹X′X] (from Property 5 for trace) = tr(Ik) = k (by Property 1 for trace).
It easily follows that tr(M) = tr(In) − tr(P) = n − k.
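A sketch of this construction (NumPy assumed; X is an arbitrary full-column-rank matrix):

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])         # n = 4, k = 2, rank 2

P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(4) - P

# Both are symmetric and idempotent
print(np.allclose(P @ P, P), np.allclose(M @ M, M))   # True True

# rank equals trace: tr(P) = k = 2 and tr(M) = n - k = 2
print(round(np.trace(P)), round(np.trace(M)))   # 2 2
```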

6 Differentiation of Linear and Quadratic Forms

Let g(x) = x′Ax, where x = (x1, x2, …, xn)′ and a are n × 1 vectors and A is an n × n
symmetric matrix. Then:

(1) ∂(a′x)/∂x = ∂(x′a)/∂x = a

(2) ∂g(x)/∂x = [∂g(x)/∂x1, …, ∂g(x)/∂xn]′, the column vector of partial derivatives

(3) ∂g(x)/∂x′ = [∂g(x)/∂x1 ⋯ ∂g(x)/∂xn], the corresponding row vector

(4) ∂(Ax)/∂x′ = A

(5) ∂(x′Ax)/∂x = (A + A′)x

(6) ∂²(x′Ax)/∂x∂x′ = A + A′
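Property (5) can be verified numerically by comparing (A + A′)x with a central-difference approximation of the gradient (NumPy assumed; A and x are arbitrary):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])         # symmetric
x = np.array([0.5, -1.0])

g = lambda v: v @ A @ v            # the quadratic form g(x) = x'Ax
h = 1e-6
# Central-difference approximation of the gradient of g at x
num_grad = np.array([(g(x + h * e) - g(x - h * e)) / (2 * h)
                     for e in np.eye(2)])
analytic = (A + A.T) @ x           # property (5)
print(np.allclose(num_grad, analytic, atol=1e-5))   # True
```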

7 Moments and Distributions of Random Vectors


In order to derive the expected value and variance of the OLS estimators using matrices,
we need to define the expected value and variance of a random vector. As its name
suggests, a random vector is simply a vector of random variables. We also need to define
the multivariate normal distribution.

7.1 Expected Value

Definition 7.1 (Expected Value)

(i) If x is an n × 1 random vector, the expected value of x, denoted E(x), is the vector of
expected values: E(x) = [E(x1), E(x2), …, E(xn)].

(ii) If Z is an n × m random matrix, E(Z) is the n × m matrix of expected values:
E(Z) = [E(zij)].

Properties of Expected Value. (1) If A is an m × n matrix and b is an m × 1 vector, where
both are nonrandom, then E(Ax + b) = AE(x) + b; (2) if A is p × n and B is m × k, where
both are nonrandom, then E(AZB) = AE(Z)B.

7.2 Variance-Covariance Matrix

Definition 7.2 (Variance-Covariance Matrix). If x is an n × 1 random vector, its
variance-covariance matrix, denoted Var(x), is defined as

\[
\operatorname{Var}(\mathbf{x}) = \begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\
\vdots & & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2
\end{bmatrix}
\]

where σj² = Var(xj) and σij = Cov(xi, xj). In other words, the variance-covariance
matrix has the variances of each element of x down its diagonal, with covariance terms
in the off-diagonals. Because Cov(xi, xj) = Cov(xj, xi), it immediately follows that a
variance-covariance matrix is symmetric.

Properties of Variance. (1) If a is an n × 1 nonrandom vector, then Var(a′x) =
a′[Var(x)]a ≥ 0; (2) if Var(a′x) > 0 for all a ≠ 0, Var(x) is positive definite; (3) Var(x) =
E[(x − μ)(x − μ)′], where μ = E(x); (4) if the elements of x are uncorrelated, Var(x) is a
diagonal matrix; if, in addition, Var(xj) = σ² for j = 1, 2, …, n, then Var(x) = σ²In; and (5)
if A is an m × n nonrandom matrix and b is an m × 1 nonrandom vector, then
Var(Ax + b) = A[Var(x)]A′.
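Property (5) is purely a matrix computation, so it can be illustrated directly. The covariance matrix and transformation below are hypothetical examples (NumPy assumed):

```python
import numpy as np

# Var(Ax + b) = A Var(x) A'  (variance property 5)
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])         # a hypothetical Var(x): symmetric, p.d.
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])        # a hypothetical transformation

new_var = A @ V @ A.T
print(new_var)                     # equals [[4, 1], [1, 2]] here
# The result is again symmetric, as any variance-covariance matrix must be
print(np.allclose(new_var, new_var.T))   # True
```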

7.3 Multivariate Normal Distribution

If x is an n × 1 multivariate normal random vector with mean μ and variance-covariance
matrix Σ, we write x ~ Normal(μ, Σ). We now state several useful properties of the
multivariate normal distribution.

Properties of the Multivariate Normal Distribution. (1) If x ~ Normal(μ, Σ), then
each element of x is normally distributed; (2) if x ~ Normal(μ, Σ), then xi and xj, any two
elements of x, are independent if, and only if, they are uncorrelated, that is, σij = 0; (3) if
x ~ Normal(μ, Σ), then Ax + b ~ Normal(Aμ + b, AΣA′), where A and b are nonrandom;
(4) if x ~ Normal(0, Σ), then, for nonrandom matrices A and B, Ax and Bx are
independent if, and only if, AΣB′ = 0; in particular, if Σ = σ²In, then AB′ = 0 is
necessary and sufficient for independence of Ax and Bx; (5) if x ~ Normal(0, σ²In), A is
a k × n nonrandom matrix, and B is an n × n symmetric, idempotent matrix, then Ax and
x′Bx are independent if, and only if, AB = 0; and (6) if x ~ Normal(0, σ²In) and A and B are
nonrandom symmetric, idempotent matrices, then x′Ax and x′Bx are independent
if, and only if, AB = 0.

7.4 Chi-Square Distribution

We defined a chi-square random variable as the sum of squared independent
standard normal random variables. In vector notation, if x ~ Normal(0, In), then
x′x ~ χ²_n.

Properties of the Chi-Square Distribution. (1) If x ~ Normal(0, In) and A is an n × n
symmetric, idempotent matrix with rank(A) = q, then x′Ax ~ χ²_q; (2) if x ~ Normal(0, In)
and A and B are n × n symmetric, idempotent matrices such that AB = 0, then
x′Ax and x′Bx are independent, chi-square random variables; and (3) if z ~ Normal(0, C),
where C is an m × m nonsingular matrix, then z′C⁻¹z ~ χ²_m.
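Property (1) can be illustrated by simulation (NumPy assumed; the idempotent matrix below is a hypothetical rank-2 example, and the seed is fixed for reproducibility):

```python
import numpy as np

# Monte Carlo sketch: with A symmetric, idempotent, rank q,
# x'Ax should behave like a chi-square(q) random variable.
rng = np.random.default_rng(0)
n, q = 5, 2
A = np.zeros((n, n))
A[0, 0] = A[1, 1] = 1.0            # symmetric, idempotent, rank q = 2

draws = rng.standard_normal((10000, n))          # rows are x ~ Normal(0, I_n)
quad = np.einsum('ij,jk,ik->i', draws, A, draws)  # x'Ax for each draw

# A chi-square(q) variable has mean q and variance 2q
print(quad.mean(), quad.var())
```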

7.5 t Distribution

Property of the t Distribution. If x ~ Normal(0, In), c is an n × 1 nonrandom vector,
A is a nonrandom n × n symmetric, idempotent matrix with rank q, and Ac = 0, then

\[
\frac{\mathbf{c}'\mathbf{x} / (\mathbf{c}'\mathbf{c})^{1/2}}{(\mathbf{x}'\mathbf{A}\mathbf{x} / q)^{1/2}} \sim t_q
\]

7.6 F Distribution

Recall that an F random variable is obtained by taking two independent chi-square
random variables and forming their ratio, with each standardized by its degrees of freedom.

Property of the F Distribution. If x ~ Normal(0, In) and A and B are n × n nonrandom
symmetric, idempotent matrices with rank(A) = k1, rank(B) = k2, and AB = 0, then

\[
\frac{\mathbf{x}'\mathbf{A}\mathbf{x} / k_1}{\mathbf{x}'\mathbf{B}\mathbf{x} / k_2} \sim F_{k_1, k_2}
\]

