
LESSON 4 (2hrs)

PRINCIPAL COMPONENTS ANALYSIS

4.1 Introduction
In this Lesson we will consider a multivariate statistical method known as principal
component analysis.

4.2 Lesson Learning Outcomes


By the end of this Lesson you will be able to:
4.2.1 Obtain the principal components of a random vector when the matrix Σ is
specified and when Σ is unspecified
4.2.2 Obtain the correlation between principal components and original variables

4.2.1 The principal components of a random vector when the matrix Σ is specified
and when Σ is unspecified

We shall first introduce some ideas in matrix algebra necessary for the derivation of
results in principal component analysis.

Some Matrix Algebra

In the prelude to this course we saw that a matrix X of observations on a random vector
can be represented in terms of its columns as a data matrix given by

X = [X_1, X_2, ..., X_n]

For instance, if X is a 3 × 3 matrix, it can be partitioned as

X = [X_1  X_2  X_3] = \begin{pmatrix} X_{11} & X_{12} & X_{13} \\ X_{21} & X_{22} & X_{23} \\ X_{31} & X_{32} & X_{33} \end{pmatrix}

where each X_j is a 3 × 1 column matrix.

Orthogonal Matrices
Orthogonal matrices have the property that the sum of cross products of the elements of
any two rows (columns) is equal to zero.
Example

The matrix

P = \begin{pmatrix} \tfrac{1}{3} & \tfrac{2}{3} & \tfrac{2}{3} \\ \tfrac{2}{3} & \tfrac{1}{3} & -\tfrac{2}{3} \\ \tfrac{2}{3} & -\tfrac{2}{3} & \tfrac{1}{3} \end{pmatrix}

is an orthogonal matrix.

Orthonormal Matrices
These are orthogonal matrices which have three additional properties.
1. The sum of squares of the elements of any row (column) is equal to 1
2. The determinant of P, |P|, is equal to ±1
3. The inverse of P, P⁻¹, is equal to its transpose P′, i.e. P⁻¹ = P′
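These properties are easy to verify numerically. The following short NumPy sketch (an
illustration added here, not part of the original lesson text; it assumes the matrix P
reconstructed in the example above) checks all three properties:

```python
import numpy as np

# The orthogonal matrix P from the example above
P = np.array([[1.0, 2.0, 2.0],
              [2.0, 1.0, -2.0],
              [2.0, -2.0, 1.0]]) / 3.0

# Property 1: the sum of squares of the elements of each row (column) is 1
print(np.sum(P**2, axis=1))                  # -> [1. 1. 1.]

# Property 2: the determinant |P| is +1 or -1
print(np.linalg.det(P))                      # -> -1.0 (up to rounding)

# Property 3: the inverse of P equals its transpose, i.e. P'P = I
print(np.allclose(P.T @ P, np.eye(3)))       # -> True
print(np.allclose(np.linalg.inv(P), P.T))    # -> True
```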


The key idea from matrix algebra related to the method of Principal Components (PC)
is that a p × p symmetric, nonsingular matrix, such as the variance-covariance matrix
Σ, may be reduced to a diagonal matrix Λ by pre-multiplying and post-multiplying it
by a particular orthonormal matrix U. That is,

U′ΣU = Λ                                                        (4.1)

The diagonal elements of Λ, namely λ_1, λ_2, ..., λ_p, are called the characteristic
roots (latent roots or eigenvalues) of Σ. The characteristic roots may be obtained by
solving the determinantal equation

|Σ − λI_p| = 0                                                  (4.2)

where I_p is the p × p identity matrix. This produces a p-th degree polynomial in λ from
which the values λ_1, λ_2, ..., λ_p are obtained. The characteristic vector t_i
corresponding to the characteristic root λ_i is obtained by solving the p simultaneous
equations

(Σ − λ_i I_p) t_i = 0,    i = 1, 2, ..., p                      (4.3)

Then the normalized values of t_i are obtained by

u_i = t_i / √(t_i′ t_i),    i = 1, 2, ..., p                    (4.4)

If the matrix Σ is unspecified (which is usually the case) it is replaced by its sample
counterpart S and the matrix U is determined to satisfy

U′SU = L                                                        (4.5)

where L is a diagonal matrix with elements l_1, l_2, ..., l_p obtained by solving the
determinantal equation

|S − lI_p| = 0                                                  (4.6)

Geometrically, the procedure described thus far is nothing more than a principal axis
rotation of the covariance matrix, and the elements of the characteristic vectors are the
direction cosines of the new axes relative to the old axes.
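In practice the roots and vectors in (4.5)-(4.6) are computed numerically rather than by
expanding the determinantal equation. The sketch below (an illustration only; the 2 × 2
matrix S is just a stand-in for any sample covariance matrix) uses NumPy's symmetric
eigensolver and checks the rotation U′SU = L directly:

```python
import numpy as np

# A stand-in sample covariance matrix S (any symmetric p x p matrix will do)
S = np.array([[0.7986, 0.6793],
              [0.6793, 0.7343]])

# eigh returns the characteristic roots in ascending order; reverse so l1 >= l2 >= ...
l, U = np.linalg.eigh(S)
order = np.argsort(l)[::-1]
l, U = l[order], U[:, order]

# The principal axis rotation U'SU reproduces the diagonal matrix L of roots
L = U.T @ S @ U
print(np.round(L, 4))      # off-diagonal entries are numerically zero
print(np.round(l, 4))      # characteristic roots l1 >= l2
```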

Application of Method of Principal Axis Rotation to Statistical Analysis


The starting point of the statistical application of Principal Components is the sample
covariance matrix S.
For a p-variate problem the sample dispersion matrix is

S_u = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}
    = \begin{pmatrix} s_1^2 & s_{12} & \cdots & s_{1p} \\ s_{21} & s_2^2 & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_p^2 \end{pmatrix}

where

s_{ij} = \frac{1}{n-1}\left( \sum_{k=1}^{n} X_{ik}X_{jk} - \frac{1}{n}\sum_{k=1}^{n} X_{ik}\sum_{k=1}^{n} X_{jk} \right),   i, j = 1, 2, ..., p

and s_ij = s_ji, i.e. S_u is a symmetric matrix.
A principal axis transformation will transform the p correlated variables X_1, X_2, ..., X_p
forming the original random vector into p new uncorrelated variables Z_1, Z_2, ..., Z_p,
the coordinate axes of these new variables being described by the vectors u_i which make
up the matrix U of direction cosines used in the following transformation:

Z = U′(X − X̄)                                                   (4.7)

where X and X̄ are p × 1 vectors of the original variables and their means respectively.
The transformed variables are called the Principal Components of X. The i-th Principal
Component is then given by

Z_i = u_i′(X − X̄)                                               (4.8)
and will have mean zero and variance l_i. When the l_i's are arranged in order of
magnitude, i.e. l_1 ≥ l_2 ≥ ... ≥ l_p, the corresponding u_i vectors determine the
corresponding Principal Components, i.e.

Z_1 = u_1′(X − X̄),  Z_2 = u_2′(X − X̄),  ...,  Z_p = u_p′(X − X̄)

This means that the first PC has the largest variance, followed by the second PC, and so
on.
Since PC's are linear combinations of the original variables, the original variables would
need to be standardized first if the PC's are to be unit free.
From (4.7), if one wishes to transform a set of variables X by a linear transformation

Z = U′(X − X̄)

where U may or may not be orthogonal, the covariance matrix of the new variables, S_Z,
can be determined directly by the relationship

S_Z = var(U′(X − X̄))
    = U′ var(X) U
    = U′ S_X U                                                  (4.9)

which equals the diagonal matrix L when U is the matrix of characteristic vectors of S_X.
However, the fact that U is orthonormal is not a sufficient condition for the new
variables to be uncorrelated. Only a transformation such as the principal axis
transformation will produce an S_Z which is the diagonal matrix L. The fact that S_Z is
diagonal means that the PC's are uncorrelated.
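The following sketch illustrates (4.7) and (4.9) on purely hypothetical data (generated
only to show the mechanics): the scores Z = U′(X − X̄) computed from the characteristic
vectors of S have a sample covariance matrix that is, up to rounding, the diagonal
matrix L.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data matrix: n = 200 observations (rows) on p = 3 correlated variables
n, p = 200, 3
A = rng.normal(size=(p, p))
X = rng.normal(size=(n, p)) @ A.T

S = np.cov(X, rowvar=False)            # sample covariance matrix S
l, U = np.linalg.eigh(S)
order = np.argsort(l)[::-1]            # arrange the roots so that l1 >= l2 >= l3
l, U = l[order], U[:, order]

Z = (X - X.mean(axis=0)) @ U           # principal component scores, eq. (4.7)
S_Z = np.cov(Z, rowvar=False)          # covariance matrix of the scores, eq. (4.9)

print(np.round(S_Z, 6))                # approximately diag(l1, l2, l3)
print(np.round(l, 6))
```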
Uses of Principal Components
1. One use of PC's is to reduce dimensionality, i.e. if the dimensionality (number of
variables) is large, then using PC analysis we can decide to retain only a few PC's which
cumulatively explain quite a "big" proportion of the variance (some practitioners use
> 75%); a small computational sketch of this retention rule is given after this list.
2. Another use is that PC analysis forms a preliminary step for identifying underlying
factors in 'Factor Analysis'.
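A minimal sketch of the retention rule in use 1 (the 75% figure is just the rule of thumb
quoted above, and the roots l are assumed to have been computed and sorted as in the
earlier sketches):

```python
import numpy as np

# Characteristic roots of S, already sorted so that l1 >= l2 >= ...
l = np.array([1.4465, 0.0864])

proportion = l / l.sum()               # proportion of total variance per PC
cumulative = np.cumsum(proportion)     # cumulative proportion explained

# Retain the smallest number of PC's whose cumulative proportion exceeds 75%
k = int(np.searchsorted(cumulative, 0.75) + 1)
print(proportion.round(3))             # [0.944 0.056]
print(k)                               # 1 -> keep only the first PC
```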

Example 4.1: In a study involving two correlated variables, X_1 and X_2, the following
sample statistics were obtained:

S = \begin{pmatrix} 0.7986 & 0.6793 \\ 0.6793 & 0.7343 \end{pmatrix},   X̄ = [10.0  10.0]′

(a) Determine the Principal Components and their variances
(b) By computing the proportion of the variance explained by the first Principal
Component, would you consider reducing the dimensionality of the data?

Solution:
(a) There are two variables, so p = 2, and we have

|S − lI_2| = \begin{vmatrix} 0.7986 - l & 0.6793 \\ 0.6793 & 0.7343 - l \end{vmatrix} = 0

This gives the quadratic equation

0.12496 − 1.5329 l + l² = 0

Using the formula for solving a quadratic equation we have

l = [ 1.5329 ± √( (1.5329)² − (4)(1)(0.12496) ) ] / 2

This gives

l_1 = 1.4465,   l_2 = 0.0864

The characteristic vectors may then be obtained by solving the equations

(S − l_i I_2) t_i = 0   and   u_i = t_i / √(t_i′ t_i),   i = 1, 2

Hence for i = 1, (S − l_1 I_2) t_1 = 0 gives

\begin{pmatrix} 0.7986 - 1.4465 & 0.6793 \\ 0.6793 & 0.7343 - 1.4465 \end{pmatrix}
\begin{pmatrix} t_{11} \\ t_{21} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}

These are two homogeneous equations in two unknowns. To solve, let t_{11} = 1 and work
with just the first equation:

−0.6479 + 0.6793 t_{21} = 0   gives   t_{21} = 0.9538

So that

u_1 = t_1 / √(t_1′ t_1) = (1/√1.9097) \begin{pmatrix} 1.0 \\ 0.9538 \end{pmatrix}
    = \begin{pmatrix} 0.7236 \\ 0.6902 \end{pmatrix}

Similarly, using l_2 = 0.0864 and letting t_{22} = 1 we get

u_2 = \begin{pmatrix} -0.6902 \\ 0.7236 \end{pmatrix}

The matrix U is then given by

U = [u_1  u_2] = \begin{pmatrix} 0.7236 & -0.6902 \\ 0.6902 & 0.7236 \end{pmatrix}

Note that U is orthonormal, i.e. u_1′u_1 = 1, u_2′u_2 = 1 and u_1′u_2 = 0.

U′SU = \begin{pmatrix} 0.7236 & 0.6902 \\ -0.6902 & 0.7236 \end{pmatrix}
       \begin{pmatrix} 0.7986 & 0.6793 \\ 0.6793 & 0.7343 \end{pmatrix}
       \begin{pmatrix} 0.7236 & -0.6902 \\ 0.6902 & 0.7236 \end{pmatrix}
     = \begin{pmatrix} 1.4465 & 0 \\ 0 & 0.0864 \end{pmatrix} = L

Hence the PC's are

\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}
  = \begin{pmatrix} 0.7236 & 0.6902 \\ -0.6902 & 0.7236 \end{pmatrix}
    \begin{pmatrix} X_1 - 10 \\ X_2 - 10 \end{pmatrix}

That is,
First PC:  Z_1 = 0.7236(X_1 − 10) + 0.6902(X_2 − 10)
Second PC: Z_2 = −0.6902(X_1 − 10) + 0.7236(X_2 − 10)

with var(Z_1) = l_1 = 1.4465 and var(Z_2) = l_2 = 0.0864.


(b) Here

l_1 / (l_1 + l_2) = 1.4465 / 1.5329 = 0.94

i.e. the first principal component explains 94% of the total variability. Since this is
greater than 75%, we may consider reducing the dimensionality to one.
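The hand computation in Example 4.1 can be checked with a few lines of NumPy (a
verification sketch only; note that the signs of the characteristic vectors returned by
the solver may be reversed, since an eigenvector is only determined up to sign):

```python
import numpy as np

S = np.array([[0.7986, 0.6793],
              [0.6793, 0.7343]])
xbar = np.array([10.0, 10.0])

l, U = np.linalg.eigh(S)                 # roots in ascending order
order = np.argsort(l)[::-1]
l, U = l[order], U[:, order]

print(np.round(l, 4))                    # [1.4465 0.0864] -> variances of Z1, Z2
print(np.round(U, 4))                    # columns are u1, u2 (up to sign)
print(round(l[0] / l.sum(), 2))          # 0.94 -> proportion explained by the first PC

# Score of a (hypothetical) new observation x, eq. (4.8): z = U'(x - xbar)
x = np.array([11.0, 9.5])
print(np.round(U.T @ (x - xbar), 4))
```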
4.2.2 Correlation between Principal Components and original variables
We can determine the correlation of each PC with each of the original variables for
diagnostic purposes.
Consider cov(X_j, Z_i).

Take X_j = a_j′X, where a_j′ = [0, 0, ..., 1, 0, ..., 0] has the 1 in the j-th position.

Then

cov(X_j, Z_i) = cov(a_j′X, u_i′X)
             = a_j′ var(X) u_i
             = a_j′ Σ u_i

But Σu_i = λ_i u_i, so that

cov(X_j, Z_i) = λ_i a_j′ u_i = λ_i U_ji

Therefore

corr(X_j, Z_i) = λ_i U_ji / √( var(X_j) var(Z_i) )
             = λ_i U_ji / √( σ_jj λ_i )
             = U_ji √λ_i / √σ_jj                                (4.10)

In terms of sample values we have

r_{Z_i, X_j} = U_ji √l_i / s_j,   where s_j = √s_jj

Example 4.2: For the data given in Example 4.1, compute the correlations between the PC's
and each of the original variables.
Solution:
The correlation between Z_1 and each of the X_j's:

r_{Z_1, X_1} = U_11 √l_1 / s_1 = (0.7236)√1.4465 / √0.7986 = 0.974

r_{Z_1, X_2} = U_21 √l_1 / s_2 = (0.6902)√1.4465 / √0.7343 = 0.968

The correlation between Z_2 and each of the X_j's:

r_{Z_2, X_1} = U_12 √l_2 / s_1 = (−0.6902)√0.0864 / √0.7986 = −0.227

r_{Z_2, X_2} = U_22 √l_2 / s_2 = (0.7236)√0.0864 / √0.7343 = 0.2482

We can form the table below:

        Z_1        Z_2
X_1     0.974     −0.2270
X_2     0.9687     0.2482
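The whole correlation table can be reproduced in vectorized form, as in the sketch below
(again assuming the matrix S, the roots l and the vectors U from the verification sketch
for Example 4.1; signs may differ by a factor of −1 per column, as noted above):

```python
import numpy as np

S = np.array([[0.7986, 0.6793],
              [0.6793, 0.7343]])
l, U = np.linalg.eigh(S)
order = np.argsort(l)[::-1]
l, U = l[order], U[:, order]

# r_{Z_i, X_j} = U_ji * sqrt(l_i) / s_j, arranged with rows = X_j and columns = Z_i
s = np.sqrt(np.diag(S))                   # standard deviations s_j
corr = U * np.sqrt(l) / s[:, None]        # broadcast over columns (l) and rows (s)
print(np.round(corr, 4))                  # reproduces the table above (up to sign)
```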

Generalized Measures of Variability

Two ways of describing, in one number, the variability of a set of related variables are:
1. The determinant of the covariance matrix, |S|
2. The sum of the variances of the variables, s_1² + s_2² + ... + s_p² = trace(S)

A useful property of PC's is that the variability specified by either measure is
preserved. That is,

|S| = |L|                                                       (4.11)

where these determinants are related to the area or volume generated by the set of
variables. Likewise,

trace(S) = trace(L)                                             (4.12)


If we obtain the ratio of each characteristic root to the total variability, we obtain the
proportion of total variability associated with each PC. For instance, in Example 4.1 we
have

l_1 / trace(S) = 1.4465 / 1.5329 = 0.944

and

l_2 / trace(S) = 0.0864 / 1.5329 = 0.056

This says that roughly 94% of the total variability of these measurements is associated
with the first PC while only 6% is due to the second PC. This helps to explain the
correlations between X and Z (high positive correlation between the first PC and the
original variables). Since the characteristic roots from S are sample estimates, the
proportions l_i / trace(S) are also sample estimates.
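A short numerical check of (4.11) and (4.12) for the covariance matrix of Example 4.1
(a verification sketch only):

```python
import numpy as np

S = np.array([[0.7986, 0.6793],
              [0.6793, 0.7343]])
l = np.linalg.eigvalsh(S)                  # characteristic roots of S

# Both generalized measures of variability are preserved by the PC rotation
print(round(float(np.linalg.det(S)), 5), round(float(np.prod(l)), 5))  # |S| = |L|
print(round(float(np.trace(S)), 4), round(float(l.sum()), 4))          # trace(S) = trace(L)

# Proportion of total variability associated with each PC
print(np.round(np.sort(l)[::-1] / l.sum(), 3))                         # [0.944 0.056]
```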

4.4 Assessment Questions


4.4.1 Explain how you would determine the principal components of a random vector
4.4.2 Give three uses of principal components analysis
4.4.3 Give two generalized measures of dispersion in multivariate statistics

4.5 Exercise
4.5.1 Let the random vector X have the variance-covariance matrix

Σ = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 1 \end{pmatrix}

Find
(a) Σ⁻¹
(b) The eigenvalues of Σ
(c) The eigenvalues and eigenvectors of Σ⁻¹
Show that Σ is positive definite.

4.5.2 Let X = (X_1, X_2)′ be a two-dimensional random vector whose sample variance-
covariance matrix is given as

S = \begin{pmatrix} 17.5 & 7.0 \\ 7.0 & 5.0 \end{pmatrix}
(a) Determine the principal components
(b) Find the percentage of the total variance explained by the first principal
component. How would you interpret these principal components with regard to
their structure?
4.5.3
(a) Explain the use of principal components in multivariate analysis.
(b) Let X = (X_1, X_2)′ be a two-dimensional random vector whose variance-covariance
matrix is given by

Σ = \begin{pmatrix} 1 & α \\ α & 1 \end{pmatrix},   where 0 < α < 1

(i) Obtain the first and the second principal components

(ii) Find the percentage of the total variance explained by the first principal component
if α = 1/2

(iii) Show that |Σ| = |Λ|

(iv) Show that trace(Σ) = trace(Λ), where Λ = diag(λ_1, λ_2), λ_i, i = 1, 2, being the
characteristic roots of Σ.

Summary
In this Lesson we considered a multivariate statistical method known as principal
component analysis. In particular we have:
1. Obtained the principal components of a random vector when the matrix Σ is
specified and when Σ is unspecified
2. Obtained the correlation between principal components and original variables

References
1. Manly, B.F.J. (2004). Multivariate Statistical Methods: A Primer, 3rd Edition.
Chapman & Hall/CRC. ISBN 1584884142.
2. Morrison, D. F. (2004). Multivariate Statistical Methods, 4th Edition. McGraw-Hill.
ISBN 07-043185.
3. Krzanowski, W. J. (2000). Principles of Multivariate Analysis, 2nd Edition. Oxford
University Press. ISBN 0198507089.
4. https://www.worldcat.org/title/applied-multivariate-analysis/oclc/1035710263
