Sst414-Lesson 4
4.1 Introduction
In this Lesson we will consider a multivariate statistical method known as principal
component analysis.
4.2.1 The principal components of a random vector when the matrix Σ is specified
and when Σ is unspecified
We shall first introduce some ideas in matrix algebra necessary for the derivation of
results in principal component analysis.
In the prelude to this course we saw that a matrix X of observations on a random vector
can be represented in terms of its columns as a data matrix given by
X = [X1, X2, ..., Xn]

For example, with n = 3 observations on a three-dimensional random vector,

\[ X = [X_1 \;\; X_2 \;\; X_3] = \begin{pmatrix} X_{11} & X_{12} & X_{13} \\ X_{21} & X_{22} & X_{23} \\ X_{31} & X_{32} & X_{33} \end{pmatrix} \]

where each Xj is a 3 × 1 column matrix.
Orthogonal Matrices
Orthogonal matrices have the property that the sum of cross products of any two
rows (columns) is equal to zero.
Example 4.1

The matrix

\[ P = \begin{pmatrix} \tfrac{1}{3} & -\tfrac{2}{3} & \tfrac{2}{3} \\[4pt] \tfrac{2}{3} & -\tfrac{1}{3} & -\tfrac{2}{3} \\[4pt] \tfrac{2}{3} & \tfrac{2}{3} & \tfrac{1}{3} \end{pmatrix} \]

is an orthogonal matrix.
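As a quick numerical check (a minimal sketch using NumPy, not part of the original notes; P is taken as reconstructed above), one can verify that the rows and columns of P are orthonormal and that |det P| = 1:

import numpy as np

# The matrix P of Example 4.1 (all entries are thirds)
P = np.array([[1, -2,  2],
              [2, -1, -2],
              [2,  2,  1]]) / 3.0

# For an orthonormal matrix, PP' = P'P = I and det P = ±1
print(np.allclose(P @ P.T, np.eye(3)))   # True: rows are orthonormal
print(np.allclose(P.T @ P, np.eye(3)))   # True: columns are orthonormal
print(round(abs(np.linalg.det(P)), 6))   # 1.0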
Orthonormal Matrices
These are orthogonal matrices which have three additional properties.
1. The sum of squares of the elements of any row (column) is equal to 1.
2. The determinant of P, |P|, is equal to ±1.
3. The transpose of P equals its inverse, so that P′P = PP′ = I.
If Σ is the p × p covariance matrix of a random vector X, then there exists an orthonormal matrix U such that

\[ U'\Sigma U = \Lambda \qquad (4.1) \]

The diagonal elements of Λ, λ1, λ2, ..., λp, are called the characteristic roots (latent roots or eigenvalues) of Σ. They are the solutions of the determinantal equation

\[ |\Sigma - \lambda I_p| = 0 \qquad (4.2) \]

from which the values λ1, λ2, ..., λp are obtained. The characteristic vector ti corresponding to the root λi satisfies

\[ (\Sigma - \lambda_i I_p)\, t_i = 0, \qquad i = 1, 2, \ldots, p \qquad (4.3) \]

and the normalised characteristic vectors, which form the columns of U, are

\[ u_i = \frac{t_i}{\sqrt{t_i' t_i}}, \qquad i = 1, 2, \ldots, p \qquad (4.4) \]

When Σ is unspecified it is replaced by the sample covariance matrix S, and the same procedure gives

\[ U'SU = L \qquad (4.5) \]

where the diagonal elements l1, l2, ..., lp of L are the roots of the determinantal equation

\[ |S - lI_p| = 0 \qquad (4.6) \]
Geometrically, the procedure described thus far is nothing more than a principal axis
rotation of the covariance matrix, and the elements of the characteristic vectors are the
direction cosines of the new axes relative to the old axes.
The elements of the sample covariance matrix S are given by

\[ s_{ij} = \frac{1}{n-1}\left( \sum_{k=1}^{n} X_{ik} X_{jk} - \frac{\sum_{k=1}^{n} X_{ik}\sum_{k=1}^{n} X_{jk}}{n} \right), \qquad i, j = 1, 2, \ldots, p \]

and s_ij = s_ji, i.e. S is a symmetric matrix.
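The computation of S can be illustrated with a short NumPy sketch (a minimal illustration, not part of the original notes; the data values below are made up, and the array is laid out with rows as observations and columns as variables, the transpose of the data-matrix layout shown earlier):

import numpy as np

# Hypothetical data: n = 5 observations on p = 3 variables
X = np.array([[2.0, 4.1, 1.0],
              [3.5, 5.0, 0.8],
              [2.8, 4.4, 1.3],
              [3.1, 5.2, 0.7],
              [2.2, 4.0, 1.1]])
n, p = X.shape

# Sample covariance matrix built element by element from the formula for s_ij
S = np.empty((p, p))
for i in range(p):
    for j in range(p):
        S[i, j] = (X[:, i] @ X[:, j] - X[:, i].sum() * X[:, j].sum() / n) / (n - 1)

print(np.allclose(S, S.T))                      # True: S is symmetric
print(np.allclose(S, np.cov(X, rowvar=False)))  # True: agrees with NumPy's covariance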
A principal axis transformation will transform the p correlated variables X1, X2, ..., Xp
forming the original random vector into p new uncorrelated variables Z1, Z2, ..., Zp,
the coordinate axes of these new variables being described by the vectors ui which make
up the matrix U of direction cosines used in the following transformation

\[ Z = U'(X - \bar{X}) \qquad (4.7) \]

where X and X̄ are p × 1 vectors of the original variables and their means respectively.
The transformed variables are called the Principal Components of X. The i-th
Principal Component is then given by

\[ Z_i = u_i'(X - \bar{X}) \qquad (4.8) \]

and will have mean zero and variance li. The li's are arranged in descending order of
magnitude, l1 ≥ l2 ≥ ... ≥ lp. This means that the first PC has the largest variance,
followed by the second PC, and so on.
Since PCs are linear combinations of the original variables, the latter would need to be
standardised to make the PCs unit free.
From (4.7), if one transforms a set of variables X by a linear transformation

\[ Z = U'(X - \bar{X}) \]

where U may or may not be orthogonal, the covariance matrix of the new variables, S_Z, can be
determined directly by the relationship

\[ S_Z = \operatorname{var}\!\big(U'(X - \bar{X})\big) = U'\operatorname{var}(X)\,U = U'S_X U \qquad (4.9) \]

which equals the diagonal matrix L when U is the matrix of normalised characteristic vectors of S_X.
However, the fact that U is orthonormal is not in itself a sufficient condition for the new
variables to be uncorrelated: only a transformation such as the principal axis
transformation will produce an S_Z which is the diagonal matrix L. The fact that S_Z is
diagonal means that the PCs are uncorrelated.
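The relationship (4.9) and the diagonalising property of the principal axis transformation can be checked numerically (a minimal NumPy sketch with made-up data, not part of the original notes):

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: n = 200 observations on p = 3 correlated variables (rows = observations)
X = rng.multivariate_normal(mean=[10.0, 5.0, 2.0],
                            cov=[[4.0, 2.0, 1.0],
                                 [2.0, 3.0, 0.5],
                                 [1.0, 0.5, 2.0]],
                            size=200)

S = np.cov(X, rowvar=False)                   # sample covariance matrix S_X
l, U = np.linalg.eigh(S)                      # characteristic roots and vectors
order = np.argsort(l)[::-1]                   # arrange the roots in descending order
l, U = l[order], U[:, order]

Z = (X - X.mean(axis=0)) @ U                  # Z = U'(X - Xbar), observation by observation
S_Z = np.cov(Z, rowvar=False)

print(np.allclose(U.T @ S @ U, np.diag(l)))   # True: U'SU = L, as in (4.5)
print(np.allclose(S_Z, np.diag(l)))           # True: S_Z is the diagonal matrix L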
Uses of Principal Components
1. One use of PCs is to reduce dimensionality, i.e. if the dimensionality (number of
variables) is large, then using PC analysis we can decide to retain only a few PCs which
cumulatively explain quite a "big" proportion of the variance (some practitioners
use > 75%); see the sketch following this list.
2. Another use is that PC analysis forms a preliminary step for identifying underlying
factors in Factor Analysis.
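The cut-off rule mentioned in point 1 can be sketched as follows (a hypothetical illustration, not from the notes; the characteristic roots and the 75% threshold used here are assumptions made up for the example):

import numpy as np

# Hypothetical characteristic roots of a sample covariance matrix, in descending order
l = np.array([4.2, 2.1, 0.6, 0.3, 0.2])

proportion = l / l.sum()                  # l_i / trace(S)
cumulative = np.cumsum(proportion)

# Retain the smallest number of PCs whose cumulative proportion exceeds 75%
k = int(np.searchsorted(cumulative, 0.75) + 1)
print(proportion.round(3))   # [0.568 0.284 0.081 0.041 0.027]
print(cumulative.round(3))   # [0.568 0.851 0.932 0.973 1.   ]
print(k)                     # 2: two PCs already explain more than 75% of the variance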
Example 4.2 In a study involving two correlated variables, X1 and X2, the following
sample statistics were obtained:

\[ S = \begin{pmatrix} 0.7986 & 0.6793 \\ 0.6793 & 0.7343 \end{pmatrix}, \qquad \bar{X} = [10.0 \;\; 10.0]' \]
(a) Determine the Principal Components and their variances.
(b) By computing the proportion of the variance explained by the first Principal
Component, would you consider reducing the dimensionality of the data?
Solution:
(a) There are two variables, so p = 2, and we have

\[ |S - lI_2| = \begin{vmatrix} 0.7986 - l & 0.6793 \\ 0.6793 & 0.7343 - l \end{vmatrix} = 0 \]

which gives

\[ l^2 - 1.5329\,l + 0.12496 = 0 \]

Using the formula for solving a quadratic equation we obtain l1 = 1.4465 and l2 = 0.0864.
The characteristic vectors may then be obtained by solving the equations

\[ (S - l_i I_2)\, t_i = 0 \quad \text{and} \quad u_i = \frac{t_i}{\sqrt{t_i' t_i}}, \qquad i = 1, 2 \]

Hence for i = 1, (S − l1 I2) t1 = 0 gives

\[ (0.7986 - 1.4465)\, t_{11} + 0.6793\, t_{21} = 0 \]
\[ 0.6793\, t_{11} + (0.7343 - 1.4465)\, t_{21} = 0 \]

These are two homogeneous equations in two unknowns. To solve, let t11 = 1 and work
with just one equation:

\[ -0.6479 + 0.6793\, t_{21} = 0 \quad \text{which gives} \quad t_{21} = 0.9538 \]

So that

\[ u_1 = \frac{t_1}{\sqrt{t_1' t_1}} = \frac{1}{\sqrt{1.9097}} \begin{pmatrix} 1.0 \\ 0.9538 \end{pmatrix} = \begin{pmatrix} 0.7236 \\ 0.6902 \end{pmatrix} \]

Similarly, for i = 2,

\[ u_2 = \begin{pmatrix} -0.6902 \\ 0.7236 \end{pmatrix} \]
The matrix U is then given by

\[ U = [u_1 \;\; u_2] = \begin{pmatrix} 0.7236 & -0.6902 \\ 0.6902 & 0.7236 \end{pmatrix} \]

and it can be verified that

\[ U'SU = \begin{pmatrix} 1.4465 & 0 \\ 0 & 0.0864 \end{pmatrix} = L \]
The principal components are therefore

\[ \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} = \begin{pmatrix} 0.7236 & 0.6902 \\ -0.6902 & 0.7236 \end{pmatrix} \begin{pmatrix} X_1 - 10 \\ X_2 - 10 \end{pmatrix} \]

That is,

First PC: Z1 = 0.7236(X1 − 10) + 0.6902(X2 − 10), with variance l1 = 1.4465
Second PC: Z2 = −0.6902(X1 − 10) + 0.7236(X2 − 10), with variance l2 = 0.0864
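The hand computation above can be checked with a few lines of NumPy (a minimal sketch, not part of the original notes; note that the signs of the characteristic vectors may differ, since eigenvectors are determined only up to sign):

import numpy as np

S = np.array([[0.7986, 0.6793],
              [0.6793, 0.7343]])

l, U = np.linalg.eigh(S)              # roots returned in ascending order
order = np.argsort(l)[::-1]           # re-arrange in descending order
l, U = l[order], U[:, order]

print(l.round(4))                     # [1.4465 0.0864]
print(U.round(4))                     # columns are u1 and u2 (up to sign)
print(np.allclose(U.T @ S @ U, np.diag(l)))   # True: U'SU = L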
Correlation between the Principal Components and the Original Variables

Take Xj = aj′X, where aj′ = [0, 0, ..., 1, 0, ..., 0] has a 1 in the j-th position. Then

\[ \operatorname{cov}(X_j, Z_i) = \operatorname{cov}\!\big(a_j'X,\; u_i'(X - \bar{X})\big) = a_j'\operatorname{var}(X)\, u_i = a_j'\Sigma u_i \]

But Σui = λi ui, so that cov(Xj, Zi) = λi aj′ui = λi Uji, where Uji is the j-th element of ui.
Therefore

\[ \operatorname{corr}(X_j, Z_i) = \frac{\lambda_i U_{ji}}{\sqrt{\operatorname{var}(X_j)\operatorname{var}(Z_i)}} = \frac{\lambda_i U_{ji}}{\sqrt{\sigma_{jj}\,\lambda_i}} = \frac{U_{ji}\sqrt{\lambda_i}}{\sqrt{\sigma_{jj}}} \qquad (4.10) \]

In terms of sample estimates,

\[ r_{Z_i X_j} = \frac{U_{ji}\sqrt{l_i}}{s_j}, \qquad \text{where } s_j = \sqrt{s_{jj}} \]
Example 4.3: For the data given in Example 4.2, compute the correlations between the PCs
and each of the original variables.

Solution:

Using r_{Zi Xj} = Uji √li / sj, the correlations between each PC and X2 are

\[ r_{Z_1, X_2} = \frac{U_{21}\sqrt{l_1}}{s_2} = \frac{0.6902\sqrt{1.4465}}{\sqrt{0.7343}} = 0.9687 \]

\[ r_{Z_2, X_2} = \frac{U_{22}\sqrt{l_2}}{s_2} = \frac{0.7236\sqrt{0.0864}}{\sqrt{0.7343}} = 0.2482 \]

The correlations with X1 are obtained in the same way, giving the full set of correlations:

            Z1         Z2
  X1      0.974     −0.2270
  X2      0.9687     0.2482
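The table can be reproduced with a short NumPy sketch (a minimal illustration building on the quantities computed above; the signs of the characteristic vectors are fixed to match the hand computation):

import numpy as np

S = np.array([[0.7986, 0.6793],
              [0.6793, 0.7343]])
l = np.array([1.4465, 0.0864])               # characteristic roots l1, l2
U = np.array([[0.7236, -0.6902],             # columns are u1 and u2
              [0.6902,  0.7236]])
s = np.sqrt(np.diag(S))                      # standard deviations s1, s2

# r_{Z_i X_j} = U_ji * sqrt(l_i) / s_j ; rows indexed by variable j, columns by PC i
R = U * np.sqrt(l) / s[:, None]
print(R.round(4))   # approximately [[0.974 -0.227], [0.9687 0.2482]], agreeing with the table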
Two ways of describing, in one number, the variability of a set of related variables are
1. The determinant of the covariance matrix, |S|
2. The trace of the covariance matrix, trace(S)
The determinant is related to the area or volume generated by the set of variables, while
the trace is the sum of the variances and equals the sum of the characteristic roots,
l1 + l2 + ... + lp. For the data of Example 4.2,

\[ \frac{l_1}{\operatorname{trace}(S)} = \frac{1.4465}{1.5329} = 0.944 \]

and

\[ \frac{l_2}{\operatorname{trace}(S)} = \frac{0.0864}{1.5329} = 0.056 \]
This says that roughly 94% of the total variability of these measurements is associated with
the First PC while only 6% is due to the Second PC. This helps to explain the correlations
between X and Z (the high positive correlation between the First PC and each of the
original variables). Since the characteristic roots from S are sample estimates, the
proportions li / trace(S) are also sample estimates.
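Both summary measures, and the proportions just quoted, are straightforward to compute (a minimal NumPy sketch using the covariance matrix of Example 4.2):

import numpy as np

S = np.array([[0.7986, 0.6793],
              [0.6793, 0.7343]])
l = np.linalg.eigvalsh(S)[::-1]        # characteristic roots in descending order

print(round(np.linalg.det(S), 5))      # 0.12496 : the determinant of S
print(round(np.trace(S), 4))           # 1.5329  : trace(S), the total variance
print((l / np.trace(S)).round(3))      # [0.944 0.056] : proportions explained by each PC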
4.5 Exercise
4.5.1. Let the random vector X have the variance-covariance matrix

\[ \Sigma = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

Find
(a) Σ⁻¹
(b) The eigenvalues of Σ
(c) The eigenvalues and eigenvectors of Σ⁻¹
Show that Σ is positive definite.
4.5.2. A sample covariance matrix computed from observations on two variables is

\[ S = \begin{pmatrix} 17.5 & 7.0 \\ 7.0 & 5.0 \end{pmatrix} \]
(a) Determine the principal components
(b) Find the percentage of the total variance explained by the first principal
component. How would you interpret these principal components with regard to
their structure?
4.5.3
(a) Explain the use of principal components in multivariate analysis.
(b) Let X = ( X 1 , X 2 )′ be a two-dimensional random vector whose variance-covariance
matrix is given by
\[ \Sigma = \begin{pmatrix} 1 & \alpha \\ \alpha & 1 \end{pmatrix}, \qquad \text{where } 0 < \alpha < 1 \]
(i) Determine the principal components of X.
(ii) Find the percentage of the total variance explained by the first principal component if
α = 1/2.
Summary
In this Lesson we considered a multivariate statistical method known as principal
component analysis.
In particular we have:
1. Obtained the principal components of a random vector when the matrix Σ is
specified and when Σ is unspecified
2. Obtained the correlation between principal components and original variables
References
1. Manly, B.F.J. (2004). Multivariate Statistical Methods: A Primer, 3rd Edition. Chapman & Hall/CRC. ISBN: 1584884142, ISBN-13: 978-1583883149.
2. Morrison, D.F. (2004). Multivariate Statistical Methods, 4th Edition. McGraw Hill. ISBN: 07-043185.
3. Krzanowski, W.J. (2000). Principles of Multivariate Analysis, 2nd Edition. Oxford University Press. ISBN: 0198507089, 97801198507086.
4. https://www.worldcat.org/title/applied-multivariate-analysis/oclc/1035710263