Factor Analysis
Factor Analysis - Example
A marketing firm wishes to determine how consumers choose
to patronize certain stores.
Orthogonal Factor Model
$X = (X_1, X_2, \ldots, X_p)'$ is a $p$-dimensional vector of
observable traits distributed with mean vector $\mu$ and
covariance matrix $\Sigma$.
In matrix notation: $(X - \mu)_{p \times 1} = L_{p \times m} F_{m \times 1} + \epsilon_{p \times 1}$, where $L$
is the matrix of factor loadings and $F$ is the vector of values
for the $m$ unobservable common factors.
Notice that the model looks very much like an ordinary linear
model. Since we do not observe anything on the right hand
side, however, we cannot do anything with this model
unless we impose some more structure. The orthogonal factor
model assumes that
$E(F) = 0$, $\mathrm{Var}(F) = E(FF') = I$,
$E(\epsilon) = 0$, $\mathrm{Var}(\epsilon) = E(\epsilon\epsilon') = \Psi$ (a diagonal matrix), and $\mathrm{Cov}(\epsilon, F) = 0$.
Assuming that the variances of the factors are all one is not
a restriction, as it can be achieved by properly scaling the
factor loadings.
Assuming that the common factors are uncorrelated and the
unique factors are uncorrelated are the defining restrictions
of the orthogonal factor model.
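The assumptions of the orthogonal factor model have
implications for the structure of $\Sigma$. If
$(X - \mu)_{p \times 1} = L_{p \times m} F_{m \times 1} + \epsilon_{p \times 1}$,
then it follows that
$$
\begin{aligned}
(X - \mu)(X - \mu)' &= (LF + \epsilon)(LF + \epsilon)' \\
&= (LF + \epsilon)((LF)' + \epsilon') \\
&= LFF'L' + \epsilon F'L' + LF\epsilon' + \epsilon\epsilon'.
\end{aligned}
$$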
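$$
\begin{aligned}
\Sigma &= E(X - \mu)(X - \mu)' \\
&= L\,E(FF')\,L' + E(\epsilon F')L' + L\,E(F\epsilon') + E(\epsilon\epsilon') \\
&= LL' + \Psi,
\end{aligned}
$$
since $E(FF') = \mathrm{Var}(F) = I$ and $E(\epsilon F') = \mathrm{Cov}(\epsilon, F) = 0$.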
Also,
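$$\mathrm{Var}(X_i) = \sigma_{ii} = \ell_{i1}^2 + \ell_{i2}^2 + \cdots + \ell_{im}^2 + \psi_i$$
$$\mathrm{Cov}(X_i, X_k) = \sigma_{ik} = \ell_{i1}\ell_{k1} + \ell_{i2}\ell_{k2} + \cdots + \ell_{im}\ell_{km}.$$
$$\sigma_{ii} = \ell_{i1}^2 + \ell_{i2}^2 + \cdots + \ell_{im}^2 + \psi_i = h_i^2 + \psi_i.$$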
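The covariance structure can be checked numerically. The following is a minimal Python sketch, assuming a hypothetical 3 x 2 loading matrix and unit variances (the numbers are illustrative and do not come from these notes). It builds $LL' + \Psi$ and shows that each diagonal element equals the communality plus the specific variance.

```python
# Hypothetical loadings for p = 3 variables and m = 2 factors (illustration only).
L = [[0.9, 0.1],
     [0.8, 0.3],
     [0.2, 0.7]]

def communalities(loadings):
    """h_i^2 = sum of squared loadings in row i."""
    return [sum(l * l for l in row) for row in loadings]

def implied_covariance(loadings, psi):
    """Sigma = L L' + Psi, where Psi is diagonal with entries psi."""
    p = len(loadings)
    sigma = [[sum(a * b for a, b in zip(loadings[i], loadings[k]))
              for k in range(p)] for i in range(p)]
    for i in range(p):
        sigma[i][i] += psi[i]
    return sigma

h2 = communalities(L)
psi = [1.0 - h for h in h2]   # assumes Var(X_i) = 1, so psi_i = 1 - h_i^2
Sigma = implied_covariance(L, psi)

for i, row in enumerate(Sigma):
    print(f"X{i + 1}:", [round(v, 4) for v in row])
```

The off-diagonal entries depend only on the loadings, while the diagonal entries are restored to 1 by the specific variances.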
Example: No Proper Solution
Further,
$$1 = \ell_{11}^2 + \psi_1 \implies \psi_1 = 1 - \ell_{11}^2 = -0.575,$$
which cannot be true because $\psi_1 = \mathrm{Var}(\epsilon_1)$. Thus, for $m = 1$
we get a numerical solution that is not consistent with the
model or with the interpretation of its parameters.
Rotation of Factor Loadings
$X - \mu = LF + \epsilon = LTT'F + \epsilon = L^*F^* + \epsilon$,
with $L^* = LT$ and $F^* = T'F$.
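Since $T$ is orthogonal ($TT' = T'T = I$), the rotated factors also satisfy
$E(F^*) = T'E(F) = 0$ and $\mathrm{Var}(F^*) = T'\,\mathrm{Var}(F)\,T = I$.
Notice that the two sets of loadings generate the same
covariance matrix $\Sigma$:
$$\Sigma = LL' + \Psi = LTT'L' + \Psi = L^*L^{*\prime} + \Psi.$$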
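A small numerical check of this invariance, written as a Python sketch. The loading matrix and the rotation angle are hypothetical values chosen only for illustration.

```python
import math

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical loadings (p = 3, m = 2) and an arbitrary rotation angle.
L = [[0.9, 0.1],
     [0.8, 0.3],
     [0.2, 0.7]]
theta = 0.6
T = [[math.cos(theta), math.sin(theta)],
     [-math.sin(theta), math.cos(theta)]]   # orthogonal: T T' = I

L_star = matmul(L, T)                        # rotated loadings L* = L T
common = matmul(L, transpose(L))             # L L'
common_star = matmul(L_star, transpose(L_star))  # L* L*'

max_diff = max(abs(a - b)
               for row_a, row_b in zip(common, common_star)
               for a, b in zip(row_a, row_b))
print("largest difference between LL' and L*L*':", max_diff)
```

The individual loadings change under rotation, but the product that enters the covariance matrix does not, which is why the data alone cannot pick out one set of loadings.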
How to resolve this ambiguity?
Estimation in Orthogonal Factor Models
We begin with a sample of size $n$ of $p$-dimensional vectors
$x_1, x_2, \ldots, x_n$ and (based on our knowledge of the problem)
choose a small number $m$ of factors.
The Principal Component Method
Let $(\lambda_i, e_i)$ denote the eigenvalues and eigenvectors of $\Sigma$ and
recall the spectral decomposition which establishes that
$$\Sigma = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_p e_p e_p'.$$
Use $L$ to denote the $p \times p$ matrix with columns equal to $\sqrt{\lambda_i}\, e_i$,
$i = 1, \ldots, p$. Then the spectral decomposition of $\Sigma$ is given
by
$$\Sigma = LL' + 0 = LL',$$
and corresponds to a factor model in which there are as many
factors as variables ($m = p$) and where the specific variances
$\psi_i = 0$. The loadings on the $j$th factor are just the
coefficients of the $j$th principal component multiplied by $\sqrt{\lambda_j}$.
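Retaining only the first $m < p$ columns of $L$ gives the approximation $\Sigma \approx L_{p \times m} L'_{m \times p}$.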
The communality for the $i$-th observed variable is the amount
of its variance that can be attributed to the variation in the
$m$ factors:
$$h_i^2 = \sum_{j=1}^{m} \ell_{ij}^2 \quad \text{for } i = 1, 2, \ldots, p.$$
This yields the approximation $L_{p \times m} L'_{m \times p} + \Psi$
for $\Sigma$, which exactly reproduces the variances of the $p$ measured
traits but only approximates the correlations.
Principal Component Estimation
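The estimated specific variances are given by the diagonal
elements of $S - \tilde{L}\tilde{L}'$, so
$$\tilde{\Psi} = \begin{bmatrix} \tilde\psi_1 & 0 & \cdots & 0 \\ 0 & \tilde\psi_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \tilde\psi_p \end{bmatrix}, \qquad \tilde\psi_i = s_{ii} - \sum_{j=1}^{m} \tilde\ell_{ij}^2.$$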
In many applications of factor analysis, m, the number of
factors, is decided prior to the analysis.
Strategy for PC Factor Analysis
First, center observations (and perhaps standardize)
Example: Stock Price Data
Note that the first three are chemical companies and the last
two are oil companies.
The sample correlation matrix $R$ is
$$R = \begin{bmatrix} 1 & 0.58 & 0.51 & 0.39 & 0.46 \\ & 1 & 0.60 & 0.39 & 0.32 \\ & & 1 & 0.44 & 0.42 \\ & & & 1 & 0.52 \\ & & & & 1 \end{bmatrix}.$$
Recall that the method of principal components results in
factor loadings equal to $\sqrt{\hat\lambda_j}\,\hat e_j$, so in this case
The first factor appears to be a market-wide effect on weekly
stock price gains whereas the second factor reflects
industry-specific effects on chemical and oil stock price returns.
In this example, most of the residuals appear to be small,
with the exception of the {4, 5} element and perhaps also
the {1, 2}, {1, 3}, {2, 3} elements.
SAS code: Stock Price Data
data set1;
infile "c:\stat501\data\stocks.dat";
input x1-x5;
label x1 = ALLIED CHEM
x2 = DUPONT
x3 = UNION CARBIDE
x4 = EXXON
x5 = TEXACO;
run;
/* Compute principal components */
Factor Pattern
Factor1 Factor2
x1 ALLIED CHEM 0.78344 -0.21665
x2 DUPONT 0.77251 -0.45794
x3 UNION CARBIDE 0.79432 -0.23439
x4 EXXON 0.71268 0.47248
x5 TEXACO 0.71209 0.52373
R code: Stock Price Data
stocks[1:6, ]
x1 x2 x3 x4 x5
1 0.000000 0.000000 0.000000 0.039473 0.000000
2 0.027027 -0.044855 -0.003030 -0.014466 0.043478
3 0.122807 0.060773 0.088146 0.086238 0.078124
4 0.057031 0.029948 0.066808 0.013513 0.019512
5 0.063670 -0.003793 -0.039788 -0.018644 -0.024154
6 0.003521 0.050761 0.082873 0.074265 0.049504
# Create a scatter plot matrix of the standardized data
pairs(stockss,labels=c("Allied","Dupont","Carbide", "Exxon",
"Texaco"), panel=function(x,y){panel.smooth(x,y)
abline(lsfit(x,y),lty=2) })
[Figure: scatter plot matrix of the standardized data, with panels for Allied, Dupont, Carbide, Exxon, and Texaco]
# Compute principal components from the correlation matrix
x1 x2 x3 x4 x5
x1 1.0000000 0.5769308 0.5086555 0.3867206 0.4621781
x2 0.5769308 1.0000000 0.5983817 0.3895188 0.3219545
x3 0.5086555 0.5983817 1.0000000 0.4361014 0.4256266
x4 0.3867206 0.3895188 0.4361014 1.0000000 0.5235293
x5 0.4621781 0.3219545 0.4256266 0.5235293 1.0000000
s <- var(s.pc$x)
pvar<-round(diag(s)/sum(diag(s)), digits=6)
pvar
PC1 PC2 PC3 PC4 PC5
0.571298 0.161824 0.108008 0.090270 0.068600
# Plot component scores
par(fin=c(5,5))
plot(s.pc$x[,1],s.pc$x[,2],
xlab="PC1: Overall Market",
ylab="PC2: Oil vs. Chemical",type="p")
[Figure: plot of component scores, PC1: Overall Market (horizontal axis) against PC2: Oil vs. Chemical (vertical axis)]
Principal Factor Method
Consider the estimated model for the correlation matrix,
$$R \approx L^*(L^*)' + \Psi^*.$$
The estimated loading matrix should provide a good
approximation for all of the correlations and part of the variances
as follows:
$$L^*(L^*)' \approx R - \Psi^* = \begin{bmatrix} (h_1^*)^2 & r_{12} & r_{13} & \cdots & r_{1p} \\ r_{21} & (h_2^*)^2 & r_{23} & \cdots & r_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & r_{p3} & \cdots & (h_p^*)^2 \end{bmatrix},$$
where $(h_i^*)^2 = 1 - \psi_i^*$ is the estimated communality for the
$i$-th variable.
Start by obtaining initial estimates of the communalities,
$(h_1^*)^2, (h_2^*)^2, \ldots, (h_p^*)^2$.
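Use the initial estimates to compute
$$R - \Psi^* = \begin{bmatrix} (h_1^*)^2 & r_{12} & r_{13} & \cdots & r_{1p} \\ r_{21} & (h_2^*)^2 & r_{23} & \cdots & r_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & r_{p3} & \cdots & (h_p^*)^2 \end{bmatrix}$$
and compute a new estimate of the loading matrix,
$$L^* = \left[ \sqrt{\hat\lambda_1^*}\,\hat e_1^* \;\; \cdots \;\; \sqrt{\hat\lambda_m^*}\,\hat e_m^* \right],$$
where $(\hat\lambda_j^*, \hat e_j^*)$, $j = 1, \ldots, m$, are the largest eigenvalues and corresponding eigenvectors of $R - \Psi^*$.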
Note that $R - \Psi^*$ is generally not positive definite and some
eigenvalues can be negative.
Stock Price Example
The estimated loadings from the principal factor method are
displayed below.
              Loadings      Loadings      Specific                        Communalities
              on factor 1   on factor 2   variances
Variable      $\ell_{i1}^*$      $\ell_{i2}^*$      $\psi_i^* = 1 - (h_i^*)^2$      $(h_i^*)^2$
Allied Chem    0.70         -0.09         0.50                            0.50
Du Pont        0.71         -0.25         0.44                            0.56
Union Carb     0.72         -0.11         0.47                            0.53
Exxon          0.62          0.23         0.57                            0.43
Texaco         0.62          0.28         0.54                            0.46
/* SAS code for the principal factor method */
Maximum Likelihood Estimation
To implement the ML method, we need to include some
assumptions about the distribution of the $p$-dimensional
vector $X_j$ and the $m$-dimensional vector $F_j$:
$$X_j \sim N_p(\mu, \Sigma), \quad F_j \sim N_m(0, I_m), \quad \epsilon_j \sim N_p(0, \Psi_{p \times p}),$$
where $X_j - \mu = LF_j + \epsilon_j$, $\Sigma = LL' + \Psi$, and $F_j$ is independent of
$\epsilon_j$. Also, $\Psi$ is a diagonal matrix.
Maximum Likelihood Estimation: Stock Prices
Results:
                    Loadings       Loadings       Specific
Variable            for factor 1   for factor 2   variances
Allied Chem          0.684          0.189         0.50
Du Pont              0.694          0.517         0.25
Union Carb.          0.681          0.248         0.47
Exxon                0.621         -0.073         0.61
Texaco               0.792         -0.442         0.18
Prop. of variance    0.485          0.113
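The last column and the last row of this table follow directly from the loadings. As a rough check, the Python sketch below recomputes them from the rounded loadings shown above, assuming the analysis is on the correlation scale so that each variable has variance 1.

```python
# ML loadings copied from the table above (rounded to three decimals there).
names = ["Allied Chem", "Du Pont", "Union Carb.", "Exxon", "Texaco"]
L_ml = [[0.684, 0.189],
        [0.694, 0.517],
        [0.681, 0.248],
        [0.621, -0.073],
        [0.792, -0.442]]
p = len(L_ml)

# Communality h_i^2 = sum of squared loadings; specific variance = 1 - h_i^2.
h2 = [sum(l * l for l in row) for row in L_ml]
psi = [1.0 - h for h in h2]

# Proportion of total (standardized) variance explained by each factor.
prop = [sum(row[j] ** 2 for row in L_ml) / p for j in range(2)]

for name, h, s in zip(names, h2, psi):
    print(f"{name:12s} communality {h:.3f}  specific variance {s:.3f}")
print("proportion of variance:", [round(v, 3) for v in prop])
```

Small differences from the table in the third decimal are expected because the inputs are already rounded.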
Likelihood Ratio test for Number of Factors
We wish to test whether the $m$ factor model appropriately
describes the covariances among the $p$ variables.
We test
$$H_0 : \Sigma_{p \times p} = L_{p \times m} L'_{m \times p} + \Psi_{p \times p}$$
versus
$$H_a : \Sigma \text{ is a positive definite matrix}$$
$$df = \frac{1}{2}\left[(p - m)^2 - p - m\right]$$
Where does this come from?
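We reject $H_0$ at level $\alpha$ if
$$-2 \ln \Lambda = \left(n - 1 - \frac{2p + 4m + 5}{6}\right) \ln \frac{|\hat{L}\hat{L}' + \hat\Psi|}{|\hat\Sigma|} > \chi^2_{df,\alpha},$$
with $df = \frac{1}{2}\left[(p - m)^2 - p - m\right]$, for large $n$ and large $n - p$.
To have $df > 0$, we must have $m < \frac{1}{2}\left(2p + 1 - \sqrt{8p + 1}\right)$.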
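The degrees of freedom and the bound on $m$ are easy to tabulate. Here is a minimal Python sketch; the function names are hypothetical helpers introduced only for this illustration, and $p = 5$ matches the five stocks in the running example.

```python
import math

def lrt_degrees_of_freedom(p, m):
    """df = ((p - m)^2 - p - m) / 2 for the m-factor likelihood ratio test."""
    return ((p - m) ** 2 - p - m) / 2

def max_testable_factors(p):
    """Largest integer m with m < (2p + 1 - sqrt(8p + 1)) / 2, so that df > 0."""
    bound = (2 * p + 1 - math.sqrt(8 * p + 1)) / 2
    return math.ceil(bound) - 1

def bartlett_multiplier(n, p, m):
    """The factor n - 1 - (2p + 4m + 5)/6 that multiplies the log determinant ratio."""
    return n - 1 - (2 * p + 4 * m + 5) / 6

p = 5
for m in range(1, 4):
    print("m =", m, "df =", lrt_degrees_of_freedom(p, m))
print("largest m with df > 0:", max_testable_factors(p))
```

With five variables the test is available for one or two factors only, since three factors would give negative degrees of freedom.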
Stock Price Data
Maximum Likelihood Estimation: Stock Prices
The same SAS program can be used, but now use method
= ml instead of prinit in the proc factor statement.
Initial Factor Method: Maximum Likelihood
x1 x2 x3 x4 x5
0.43337337 0.46787840 0.44606336 0.34657438 0.37065882
Factor1 Factor2
0.88925225 0.70421149
Eigenvalues of the Weighted Reduced Correlation
Matrix: Total = 10.4103228 Average = 2.08206456
Factor Pattern
Factor1 Factor2
x5 TEXACO 0.79413 0.43944
x2 DUPONT 0.69216 -0.51894
x1 ALLIED CHEM 0.68315 -0.19184
x3 UNION CARBIDE 0.68015 -0.25092
x4 EXXON 0.62087 0.07000
Variance Explained by Each Factor
x1 x2 x3
x1 ALLIED CHEM 0.49651 0.00453 -0.00413
x2 DUPONT 0.00453 0.25161 -0.00261
x3 UNION CARBIDE -0.00413 -0.00261 0.47443
x4 EXXON -0.02399 -0.00389 0.03138
x5 TEXACO 0.00397 0.00033 -0.00424
x4 x5
x1 ALLIED CHEM -0.02399 0.00397
x2 DUPONT -0.00389 0.00033
x3 UNION CARBIDE 0.03138 -0.00424
x4 EXXON 0.60963 -0.00028
x5 TEXACO -0.00028 0.17625
x1 x2 x3 x4 x5
0.01253908 0.00326122 0.01602114 0.01984784 0.00291394
Maximum Likelihood Estimation with R
stocks.fac
Call:
factanal(x = stocks, factors = 2, method = "mle", scale=T, center=T)
Uniquenesses:
x1 x2 x3 x4 x5
0.497 0.252 0.474 0.610 0.176
Loadings:
Factor1 Factor2
x1 0.601 0.378
x2 0.849 0.165
x3 0.643 0.336
x4 0.365 0.507
x5 0.207 0.884
Factor1 Factor2
SS loadings 1.671 1.321
Proportion Var 0.334 0.264
Cumulative Var 0.334 0.598
Measure of Sampling Adequacy
x1 x2 x3 x4 x5
0.80012154 0.74270277 0.80925875 0.80426564 0.75573970
Tucker and Lewis Reliability Coefficient
$$\hat\Psi_m = \mathrm{diag}(R - \hat{L}_m \hat{L}_m')$$
$$G_m = \hat\Psi_m^{-1/2} \left(R - \hat{L}_m \hat{L}_m'\right) \hat\Psi_m^{-1/2}$$
The mean square for the model with zero common factors is
$$M_0 = \frac{\sum\sum_{i<j} r_{ij}^2}{p(p - 1)/2}.$$
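As an illustration, $M_0$ can be computed for the stock price data from the rounded sample correlations reported earlier in these notes. This is a Python sketch using only those ten off-diagonal values.

```python
# Off-diagonal sample correlations r_ij (i < j) for the five stocks,
# copied from the rounded correlation matrix R given earlier.
r_upper = [0.58, 0.51, 0.39, 0.46,   # x1 with x2..x5
           0.60, 0.39, 0.32,         # x2 with x3..x5
           0.44, 0.42,               # x3 with x4, x5
           0.52]                     # x4 with x5

p = 5
n_pairs = p * (p - 1) // 2           # number of distinct pairs, p(p-1)/2
assert len(r_upper) == n_pairs

# Mean square for the zero-factor model: average squared correlation.
M0 = sum(r * r for r in r_upper) / n_pairs
print("M0 =", round(M0, 5))
```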
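Cronbach's Alpha
Define
$$\bar{r} = \frac{\frac{1}{p(p-1)/2} \sum\sum_{i<j} \mathrm{Cov}(X_i, X_j)}{\frac{1}{p} \sum_{i=1}^{p} \mathrm{Var}(X_i)} = \frac{\frac{2}{p-1} \sum\sum_{i<j} S_{ij}}{\sum_{i=1}^{p} S_{ii}}.$$
Cronbach's Alpha is
$$\alpha = \frac{p\,\bar{r}}{1 + (p - 1)\bar{r}} = \frac{p}{p-1}\left[1 - \frac{\sum_i \mathrm{Var}(X_i)}{\mathrm{Var}\left(\sum_i X_i\right)}\right].$$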
For standardized variables, $\sum_{i=1}^{p} \mathrm{Var}(X_i) = p$.
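In the extreme case where all pairwise correlations are 1 we
have
$$\frac{\sum_i \mathrm{Var}(X_i)}{\mathrm{Var}\left(\sum_i X_i\right)} = \frac{p}{p^2} = \frac{1}{p}$$
and
$$\alpha = \frac{p}{p-1}\left[1 - \frac{\sum_i \mathrm{Var}(X_i)}{\mathrm{Var}\left(\sum_i X_i\right)}\right] = \frac{p}{p-1}\left[1 - \frac{p}{p^2}\right] = 1.$$
When $\bar{r} = 0$, then $\alpha = 0$.
Be sure that the scores for all items are orientated in the
same direction so all correlations are positive.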
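The two expressions for alpha and the two extreme cases can be confirmed numerically. The Python sketch below uses hypothetical function names, and applies the formulas to the rounded stock price correlation matrix from earlier in these notes purely as sample input.

```python
def alpha_from_variances(S):
    """alpha = p/(p-1) * (1 - sum_i Var(X_i) / Var(sum_i X_i))."""
    p = len(S)
    sum_of_vars = sum(S[i][i] for i in range(p))
    var_of_sum = sum(S[i][j] for i in range(p) for j in range(p))
    return p / (p - 1) * (1 - sum_of_vars / var_of_sum)

def alpha_from_rbar(S):
    """alpha = p * rbar / (1 + (p - 1) * rbar)."""
    p = len(S)
    off = sum(S[i][j] for i in range(p) for j in range(i + 1, p))
    rbar = (2 / (p - 1)) * off / sum(S[i][i] for i in range(p))
    return p * rbar / (1 + (p - 1) * rbar)

def symmetric(upper_rows):
    """Build a full symmetric matrix from rows of an upper triangle."""
    p = len(upper_rows)
    S = [[0.0] * p for _ in range(p)]
    for i, row in enumerate(upper_rows):
        for k, v in enumerate(row):
            S[i][i + k] = S[i + k][i] = v
    return S

# Rounded sample correlation matrix of the five stocks (upper triangle).
R = symmetric([[1, 0.58, 0.51, 0.39, 0.46],
               [1, 0.60, 0.39, 0.32],
               [1, 0.44, 0.42],
               [1, 0.52],
               [1]])
all_ones = [[1.0] * 4 for _ in range(4)]                 # every correlation is 1
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

print("stocks:", round(alpha_from_variances(R), 4), round(alpha_from_rbar(R), 4))
print("all correlations 1:", alpha_from_variances(all_ones))
print("all correlations 0:", alpha_from_variances(identity))
```

Both functions return the same value for any covariance matrix, which is the algebraic identity stated above.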
Factor Rotation
Since only the loadings change by rotation, we rotate factors
to see if we can better interpret results.
Varimax Rotation
Define $\tilde\ell_{ij}^* = \hat\ell_{ij}^* / \hat{h}_i$ as the scaled loading of the $i$-th variable
on the $j$-th rotated factor.
Quartimax Rotation
PROMAX Transformation
1. is not a rotation
1. First perform a varimax rotation to obtain loadings $L$
$$LF = LMM^{-1}F = L_P F_P,$$
with $L_P = LM$ and $F_P = M^{-1}F$.
Derivation:
Maximum Likelihood Estimation: Stock Prices
/* Varimax Rotation */
1 2
1 0.67223 0.74034
2 -0.74034 0.67223
Factor1 Factor2
x2 DUPONT 0.84949 0.16359
x3 UNION CARBIDE 0.64299 0.33487
x1 ALLIED CHEM 0.60126 0.37680
x5 TEXACO 0.20851 0.88333
x4 EXXON 0.36554 0.50671
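The rotated factor pattern above should equal the unrotated ML factor pattern post-multiplied by the orthogonal transformation matrix. The Python sketch below checks this using the printed SAS values; agreement is only expected up to the rounding of the printed output.

```python
# Unrotated ML factor pattern (from the earlier SAS "Factor Pattern" output).
unrotated = {
    "x2 DUPONT":        [0.69216, -0.51894],
    "x3 UNION CARBIDE": [0.68015, -0.25092],
    "x1 ALLIED CHEM":   [0.68315, -0.19184],
    "x5 TEXACO":        [0.79413,  0.43944],
    "x4 EXXON":         [0.62087,  0.07000],
}
# Orthogonal transformation matrix printed for the varimax rotation.
T = [[0.67223, 0.74034],
     [-0.74034, 0.67223]]
# Rotated factor pattern as printed by SAS.
printed = {
    "x2 DUPONT":        [0.84949, 0.16359],
    "x3 UNION CARBIDE": [0.64299, 0.33487],
    "x1 ALLIED CHEM":   [0.60126, 0.37680],
    "x5 TEXACO":        [0.20851, 0.88333],
    "x4 EXXON":         [0.36554, 0.50671],
}

def rotate(row, T):
    """One row of L* = L T."""
    return [sum(row[k] * T[k][j] for k in range(len(row))) for j in range(len(T[0]))]

rotated = {name: rotate(row, T) for name, row in unrotated.items()}
max_error = max(abs(a - b)
                for name in printed
                for a, b in zip(rotated[name], printed[name]))
print("largest discrepancy from the printed rotated loadings:", max_error)
```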
Variance Explained by Each Factor
The FACTOR Procedure
Rotation Method: Quartimax
1 2
1 0.88007 -0.47484
2 0.47484 0.88007
Factor1 Factor2
x2 DUPONT 0.82529 -0.25940
x3 UNION CARBIDE 0.72488 -0.01060
x1 ALLIED CHEM 0.70807 0.04611
x4 EXXON 0.56231 0.27237
x5 TEXACO 0.60294 0.67839
Variance Explained by Each Factor
The FACTOR Procedure
Prerotation Method: Varimax
Variance Explained by Each Factor
Factor1 Factor2
x2 DUPONT 1.00000 0.00733
x3 UNION CARBIDE 0.73686 0.10691
x1 ALLIED CHEM 0.64258 0.16242
x5 TEXACO 0.01281 1.00000
x4 EXXON 0.21150 0.57860
1 2
1 1.11385973 -0.3035593
2 -0.2810538 1.10793795
Inter-Factor Correlations
Factor1 Factor2
Factor1 1.00000 0.49218
Factor2 0.49218 1.00000
Rotated Factor Pattern (Standardized Regression Coefficients)
Factor1 Factor2
x2 DUPONT 0.90023 -0.07663
x3 UNION CARBIDE 0.62208 0.17583
x1 ALLIED CHEM 0.56382 0.23495
x5 TEXACO -0.01601 0.91538
x4 EXXON 0.26475 0.45044
Reference Structure (Semipartial Correlations)
Factor1 Factor2
x2 DUPONT 0.78365 -0.06670
x3 UNION CARBIDE 0.54152 0.15306
x1 ALLIED CHEM 0.49081 0.20452
x5 TEXACO -0.01394 0.79683
x4 EXXON 0.23046 0.39211
Factor1 Factor2
x2 DUPONT 0.86252 0.36645
x3 UNION CARBIDE 0.70862 0.48200
x1 ALLIED CHEM 0.67946 0.51245
x5 TEXACO 0.43452 0.90750
x4 EXXON 0.48644 0.58074
Scoring Coefficients Estimated by Regression
Factor1 Factor2
0.83601404 0.84825886
Factor1 Factor2
x2 DUPONT 0.58435 -0.01834
x3 UNION CARBIDE 0.21791 0.06645
x1 ALLIED CHEM 0.18991 0.08065
x5 TEXACO 0.02557 0.78737
x4 EXXON 0.07697 0.11550
R code posted in stocks.R
# Compute maximum likelihood estimates for factors
stocks.fac
Uniquenesses:
x1 x2 x3 x4 x5
0.497 0.252 0.474 0.610 0.176
Loadings:
Factor1 Factor2
x1 0.601 0.378
x2 0.849 0.165
x3 0.643 0.336
x4 0.365 0.507
x5 0.207 0.884
Factor1 Factor2
SS loadings 1.671 1.321
Proportion Var 0.334 0.264
Cumulative Var 0.334 0.598
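# Residuals: sample correlations minus the correlations implied by the fit
# (s.cor is assumed to hold the sample correlation matrix of the stocks data)
pred <- stocks.fac$loadings %*% t(stocks.fac$loadings) +
  diag(stocks.fac$uniquenesses)
resid <- s.cor - pred
resid
x1 x2 x3 x4 x5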
x1 -8.4905e-07 4.5256e-03 -4.1267e-03 -2.3991e-02 3.9726e-03
x2 4.5256e-03 1.8641e-07 -2.6074e-03 -3.8942e-03 3.2772e-04
x3 -4.1267e-03 -2.6074e-03 7.4009e-08 3.1384e-02 -4.2413e-03
x4 -2.3991e-02 -3.8942e-03 3.1384e-02 -1.7985e-07 -2.8193e-04
x5 3.9726e-03 3.2772e-04 -4.2413e-03 -2.8193e-04 2.7257e-08
# List factor scores
stocks.fac$scores
Factor1 Factor2
1 -0.05976839 0.02677225
2 -1.22226529 1.44753390
3 1.62739601 2.47578797
. . .
. . .
99 0.71440818 -0.02122186
100 -0.59289113 0.34344739
# You could use the following code in Splus to apply
# the quartimax rotation, but this is not available in R
#
# stocks.fac <- factanal(stocks, factors=2, rotation="quartimax",
# method="mle", scores="regression")
# stocks.fac
promax(stocks.fac$loadings, m=3)
Loadings:
Factor1 Factor2
x1 0.570 0.195
x2 0.965 -0.174
x3 0.639 0.125
x4 0.226 0.456
x5 -0.128 0.984
Factor1 Factor2
SS loadings 1.732 1.260
Proportion Var 0.346 0.252
Cumulative Var 0.346 0.598
$rotmat
[,1] [,2]
[1,] 1.2200900 -0.4410072
[2,] -0.4315772 1.2167132
# You could try to apply the promax rotation with the
# factanal function. It selects the power as m=4
>
> stocks.fac <- factanal(stocks, factors=2, rotation="promax",
+ method="mle", scores="regression")
>
> stocks.fac
Call:
factanal(x = stocks, factors = 2, scores = "regression", rotation = "
Uniquenesses:
x1 x2 x3 x4 x5
0.497 0.252 0.474 0.610 0.176
Loadings:
Factor1 Factor2
x1 0.576 0.175
x2 1.011 -0.232
x3 0.653
x4 0.202 0.466
x5 -0.202 1.037
Factor1 Factor2
SS loadings 1.863 1.387
Proportion Var 0.373 0.277
Cumulative Var 0.373 0.650
Example: Test Scores
All variables load highly on the first factor. We call that a
general intelligence factor.
Half of the loadings are positive and half are negative on the
second factor. The positive loadings correspond to the
verbal scores and the negative correspond to the math scores.
We plot the six loadings for each factor $(\ell_{i1}, \ell_{i2})$ on the
original coordinate system and also on a rotated set of
coordinates chosen so that one axis goes through the loadings
$(\ell_{41}, \ell_{42})$ of the fourth variable on the two factors.
Factor Rotation for Test Scores
Varimax Rotation for Test Scores
PROMAX Rotation for Test Scores
F1 is more clearly a mathematics ability factor and F2 is a
verbal ability factor.
              Loadings on   Loadings on   Communalities
Variable      F1            F2            $h_i^2$
Gaelic         0.059        -0.668        0.490
English        0.191         0.519        0.406
History       -0.084         0.635        0.356
Arithmetic     0.809         0.041        0.623
Algebra        0.743         0.021        0.569
Geometry       0.575         0.064        0.372
Factor Scores
Sometimes, we require estimates of the factor values for each
respondent in the sample.
To estimate $f_j$ we act as if $\hat{L}$ (or $\tilde{L}$) and $\hat\Psi$ (or $\tilde\Psi$) are
true values.
Weighted Least Squares for Factor Scores
From the model above, we can try to minimize the weighted
sum of squares
$$\sum_{i=1}^{p} \frac{\epsilon_i^2}{\psi_i} = \epsilon'\Psi^{-1}\epsilon = (x - \mu - Lf)'\Psi^{-1}(x - \mu - Lf).$$
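Minimizing this quadratic form in $f$ gives the usual weighted least squares solution $\hat{f} = (L'\Psi^{-1}L)^{-1} L'\Psi^{-1}(x - \mu)$. The Python sketch below implements it for the two-factor case only, using the rounded ML loadings and specific variances from the stock price example as stand-ins for the true values. The factor vector `f_true` and the function name are hypothetical, chosen so the result can be checked.

```python
# ML loadings and specific variances from the stock price table (rounded).
L_ml = [[0.684, 0.189],
        [0.694, 0.517],
        [0.681, 0.248],
        [0.621, -0.073],
        [0.792, -0.442]]
psi_ml = [0.50, 0.25, 0.47, 0.61, 0.18]

def wls_scores_two_factors(L, psi, dev):
    """f_hat = (L' Psi^-1 L)^-1 L' Psi^-1 dev, for m = 2 and diagonal Psi.

    dev is the centered observation x - mu.
    """
    # Entries of the symmetric 2 x 2 matrix A = L' Psi^-1 L.
    a = sum(r[0] * r[0] / s for r, s in zip(L, psi))
    b = sum(r[0] * r[1] / s for r, s in zip(L, psi))
    d = sum(r[1] * r[1] / s for r, s in zip(L, psi))
    # Right-hand side c = L' Psi^-1 dev.
    u = sum(r[0] * x / s for r, s, x in zip(L, psi, dev))
    v = sum(r[1] * x / s for r, s, x in zip(L, psi, dev))
    det = a * d - b * b
    # Solve A f = c with the explicit 2 x 2 inverse.
    return [(d * u - b * v) / det, (a * v - b * u) / det]

# Hypothetical check: an observation generated with no specific error
# should give back the factor values that produced it.
f_true = [1.0, -0.5]
dev = [r[0] * f_true[0] + r[1] * f_true[1] for r in L_ml]
f_hat = wls_scores_two_factors(L_ml, psi_ml, dev)
print("recovered factor scores:", [round(v, 6) for v in f_hat])
```

Variables with small specific variances receive the most weight, so in this example Texaco and Du Pont dominate the estimated scores.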
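When factor loadings are estimated with the PC method,
ordinary least squares is sometimes used to get the factor
scores. In this case
$$\tilde{f}_j = (\tilde{L}'\tilde{L})^{-1}\tilde{L}'(x_j - \bar{x}).$$
Since $\tilde{L} = \left[\sqrt{\hat\lambda_1}\,\hat e_1 \;\; \sqrt{\hat\lambda_2}\,\hat e_2 \;\; \cdots \;\; \sqrt{\hat\lambda_m}\,\hat e_m\right]$, we find that
$$\tilde{f}_j = \begin{bmatrix} \hat\lambda_1^{-1/2}\, \hat e_1'(x_j - \bar{x}) \\ \hat\lambda_2^{-1/2}\, \hat e_2'(x_j - \bar{x}) \\ \vdots \\ \hat\lambda_m^{-1/2}\, \hat e_m'(x_j - \bar{x}) \end{bmatrix}.$$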
Regression Method for Factor Scores
We again start from the model $(X - \mu) = LF + \epsilon$.
Factor Scores for the Stock Price Data
Heywood Cases and Other Potential Problems
When iterative algorithms in SAS fail to converge, we can
include a heywood or an ultra heywood option in the proc
factor statement.
The dataset for women includes seven variables x1, ..., x7 that
correspond to national records on 100m, 200m, 400m, 800m,
1500m, 3000m and marathon races.
The dataset for men includes eight variables x1, ..., x8 that
correspond to national records on 100m, 200m, 400m, 800m,
1500m, 5000m, 10000m and marathon races.
The first three times are in seconds and the remaining times
are in minutes.
Example: track records for men and women
Example: track records for men
The scatter plot matrix reveals a couple of extreme countries
with slow times but they conform to the correlation pattern.
All relationships are approximately straight line relationships.
[Figure: scatter plot matrix of the men's track records, with panels for 100m, 200m, 400m, 800m, 1500m, 5000m, 10000m, and marathon]
Example: track records for women
The two factor model does not appear to fit for women
(p-value<.0001).
[Figure: scatter plot matrix of the women's track records, with panels for 100m, 200m, 400m, 800m, 1500m, 3000m, and marathon]