Factor Analysis Using SPSS: Example
Example
Factor analysis is frequently used to develop questionnaires: after all, if you want to measure
an ability or trait, you need to ensure that the questions asked relate to the construct that
you intend to measure. I have noticed that a lot of students become very stressed about
SPSS, so I decided to devise a questionnaire to measure a trait that I termed ‘SPSS
anxiety’, covering various aspects of students’ anxiety towards learning SPSS.
I generated questions based on interviews with anxious and
non-anxious students and came up with 23 possible questions to include. Each question was a
statement followed by a five-point Likert scale ranging from ‘strongly disagree’ through
‘neither agree nor disagree’ to ‘strongly agree’. The questionnaire is printed in Field (2005, p.
639).
The questionnaire was designed to predict how anxious a given individual would be about
learning how to use SPSS. What’s more, I wanted to know whether anxiety about SPSS could
be broken down into specific forms of anxiety. So, in other words, are there other traits that
might contribute to anxiety about SPSS? With a little help from a few lecturer friends I
collected 2571 completed questionnaires (at this point it should become apparent that this
example is fictitious!). The data are stored in the file SAQ.sav.
Questionnaires are made up of multiple items, each of which elicits a
response from the same person. As such, a questionnaire is a repeated
measures design.
Given that repeated measures go in different columns, different
questions on a questionnaire should each have their own column in SPSS.
Initial Considerations
Sample Size
Correlation coefficients fluctuate from sample to sample, much more so in small samples than
in large. Therefore, the reliability of factor analysis is also dependent on sample size. Field
(2005) reviews many suggestions about the sample size necessary for factor analysis and
concludes that it depends on many things. In general over 300 cases is probably adequate but
communalities after extraction should probably be above 0.5 (see Field, 2005).
Data Screening
SPSS will nearly always find a factor solution to a set of variables. However, the solution is
unlikely to have any real meaning if the variables analysed are not sensible. The first thing to
do when conducting a factor analysis is to look at the inter-correlation between variables. If
our test questions measure the same underlying dimension (or dimensions) then we would
expect them to correlate with each other (because they are measuring the same thing). If we
find any variables that do not correlate with any other variables (or with very few), then we
should consider excluding these variables before the factor analysis is run. The correlations between
variables can be checked using the correlate procedure (see Chapter 4) to create a correlation
matrix of all variables. This matrix can also be created as part of the main factor analysis.
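As a rough illustration of this screening step outside SPSS, the sketch below uses Python with NumPy (my choice for illustration; it is not part of the SPSS workflow described here). The function name screen_items, the cut-offs of 0.3 and 0.9, and the simulated data are all assumptions made for the example; none of them come from the SAQ data.

```python
import numpy as np

def screen_items(data, low=0.3, high=0.9):
    """Flag items that barely correlate with anything, and pairs that correlate very highly.

    `low` and `high` are illustrative cut-offs (assumptions), not fixed rules.
    """
    r = np.corrcoef(data, rowvar=False)              # columns are questionnaire items
    off = np.abs(r - np.eye(r.shape[0]))             # absolute correlations, diagonal zeroed
    weak = np.where(off.max(axis=1) < low)[0]        # items whose largest |r| is below `low`
    rows, cols = np.where(np.triu(off, k=1) > high)  # upper triangle: each pair counted once
    return r, weak.tolist(), list(zip(rows.tolist(), cols.tolist()))

# Simulated (hypothetical) responses: items 0-3 share one factor, item 4 is
# unrelated noise, and item 5 is almost a copy of item 0.
rng = np.random.default_rng(42)
n = 500
factor = rng.normal(size=n)
items = [factor + 0.7 * rng.normal(size=n) for _ in range(4)]
items.append(rng.normal(size=n))
items.append(items[0] + 0.1 * rng.normal(size=n))
data = np.column_stack(items)

r, weak_items, collinear_pairs = screen_items(data)
print("Items with no correlation above 0.3:", weak_items)
print("Pairs correlating above 0.9:", collinear_pairs)
```

Items flagged in the first list are candidates for exclusion; pairs in the second list point to the opposite problem of variables that correlate too highly.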
The opposite problem is when variables correlate too highly. Although mild multicollinearity is
not a problem for factor analysis, it is important to avoid extreme multicollinearity (i.e.
variables that are very highly correlated) and singularity (variables that are perfectly
correlated). As with regression, singularity causes problems in factor analysis because it
becomes impossible to determine the unique contribution to a factor of the variables that are
highly correlated.
SPSS Output 1
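SPSS Output 2 shows several very important parts of the output: the Kaiser-Meyer-Olkin
measure of sampling adequacy and Bartlett's test of sphericity. The KMO statistic varies
between 0 and 1. A value of 0 indicates that the sum of partial correlations is large relative
to the sum of correlations, indicating diffusion in the pattern of correlations (so factor
analysis is likely to be inappropriate).
[Table: KMO and Bartlett's Test]
Bartlett's test is highly significant (p < 0.001), and therefore factor analysis is appropriate.
[Fragment of the Total Variance Explained table. Component 3: initial eigenvalue 1.317, 5.725% of variance, 44.981% cumulative; extraction sums of squared loadings 1.317, 5.725%, 44.981% cumulative; rotation sums of squared loadings 2.55, 11.099%, 41.842% cumulative.]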
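To make the KMO statistic and Bartlett's test less of a black box, here is a minimal sketch in Python with NumPy. It assumes the usual definitions: KMO is the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations (off-diagonal elements only), and Bartlett's statistic is based on the determinant of the correlation matrix. The function name kmo_and_bartlett and the small example matrix are hypothetical; this is not SPSS's own code and does not use the SAQ data.

```python
import numpy as np

def kmo_and_bartlett(r, n):
    """Return (KMO, Bartlett chi-square, degrees of freedom).

    r : correlation matrix of the items
    n : number of cases the correlations are based on
    """
    p = r.shape[0]
    inv = np.linalg.inv(r)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                    # partial correlations between item pairs
    mask = ~np.eye(p, dtype=bool)             # off-diagonal elements only
    sum_r2 = np.sum(r[mask] ** 2)             # sum of squared correlations
    sum_p2 = np.sum(partial[mask] ** 2)       # sum of squared partial correlations
    kmo = sum_r2 / (sum_r2 + sum_p2)
    chi2 = -((n - 1) - (2 * p + 5) / 6) * np.log(np.linalg.det(r))
    df = p * (p - 1) // 2
    return kmo, chi2, df

# Hypothetical example: four items that all correlate 0.5 with each other.
r = np.full((4, 4), 0.5)
np.fill_diagonal(r, 1.0)
kmo, chi2, df = kmo_and_bartlett(r, n=100)
print(f"KMO = {kmo:.3f}, Bartlett chi-square = {chi2:.2f} on {df} df")
```

When the partial correlations are large relative to the correlations, the second sum dominates and KMO moves towards 0; when they are small, KMO moves towards 1. Turning the chi-square value into a p-value needs the chi-square distribution (for example scipy.stats.chi2.sf), which I have left out to keep the sketch dependent on NumPy only.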
Communalities
Item   Initial   Extraction
Q01    1.000     .435
Q02    1.000     .414
Q03    1.000     .530
Q04    1.000     .469
Q05    1.000     .343
Q06    1.000     .654
Q07    1.000     .545
Q08    1.000     .739
Q09    1.000     .484
Q10    1.000     .335
Q11    1.000     .690
Q12    1.000     .513
Q13    1.000     .536
Q14    1.000     .488
Q15    1.000     .378
Q16    1.000     .487
...

Component Matrix
Item   Component 1   Components 2 to 4
Q18    .701
Q07    .685
Q16    .679
Q13    .673
Q12    .669
Q21    .658
Q14    .656
Q11    .652          -.400 (column not recoverable)
Q17    .643
Q04    .634
Q03    -.629
Q15    .59...
...
SPSS Output 4
This output also shows the component matrix before rotation. This matrix contains the
loadings of each variable onto each factor. By default SPSS displays all loadings; however, we
requested that all loadings less than 0.4 be suppressed in the output and so there are blank
spaces for many of the loadings. This matrix is not particularly important for interpretation.
At this stage SPSS has extracted four factors. Factor analysis is an exploratory tool and so it
should be used to guide the researcher to make various decisions: you shouldn't leave the
computer to make them. One important decision is the number of factors to extract. By
Kaiser's criterion we should extract four factors and this is what SPSS has done. However, this
criterion is accurate when there are fewer than 30 variables and communalities after extraction
are greater than 0.7, or when the sample size exceeds 250 and the average communality is
greater than 0.6. The communalities are shown in SPSS Output 4, and only one of those listed
(Q08, 0.739) exceeds 0.7. The
average of the communalities can be found by adding them up and dividing by the number of
communalities (11.573/23 = 0.503). So, on both grounds Kaiser's rule may not be accurate.
However, you should consider the huge sample that we have, because the research into
Kaiser's criterion gives recommendations for much smaller samples. We can also use the scree
plot, which we asked SPSS to produce. The scree plot is shown below with a thunderbolt
indicating the point of inflexion on the curve. This curve is difficult to interpret because the
curve begins to tail off after three factors, but there is another drop after four factors before a
stable plateau is reached. Therefore, we could probably justify retaining either two or four
factors. Given the large sample, it is probably safe to assume Kaiser's criterion; however, you
could rerun the analysis specifying that SPSS extract only two factors and compare the
results.
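If you want to see where the eigenvalues behind Kaiser's criterion and the scree plot come from, the sketch below computes them from a correlation matrix in Python with NumPy. The function names eigen_summary and kaiser_count and the six-item correlation matrix are hypothetical; the matrix is built so that two clear factors exist, and it is not the SAQ data.

```python
import numpy as np

def eigen_summary(r):
    """Eigenvalues of a correlation matrix (largest first), with % and cumulative % of variance."""
    eig = np.sort(np.linalg.eigvalsh(r))[::-1]   # eigvalsh returns ascending order, so reverse
    pct = 100 * eig / eig.sum()                  # eigenvalues sum to the number of variables
    return eig, pct, np.cumsum(pct)

def kaiser_count(eig):
    """Number of components with an eigenvalue above 1 (Kaiser's criterion)."""
    return int(np.sum(eig > 1.0))

# Hypothetical correlation matrix: two unrelated clusters of three items,
# with every correlation inside a cluster equal to 0.6.
block = np.full((3, 3), 0.6)
np.fill_diagonal(block, 1.0)
r = np.block([[block, np.zeros((3, 3))],
              [np.zeros((3, 3)), block]])

eig, pct, cum = eigen_summary(r)
for k, (e, p_var, c) in enumerate(zip(eig, pct, cum), start=1):
    print(f"Component {k}: eigenvalue {e:.3f}, {p_var:.2f}% of variance, {c:.2f}% cumulative")
print("Components retained by Kaiser's criterion:", kaiser_count(eig))
```

Plotting the eigenvalues against the component number gives the scree plot; the point of inflexion is still something you have to judge by eye.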
[Scree plot: eigenvalue (y-axis) against component number, 1 to 23 (x-axis)]
SPSS Output 5
If there are fewer than 30 variables and communalities after extraction are
greater than 0.7, or if the sample size exceeds 250 and the average
communality is greater than 0.6, then retain all factors with eigenvalues
above 1 (Kaiser’s criterion).
If none of the above apply, a scree plot can be used when the sample size
is large (around 300 or more cases).
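The rule of thumb above can be written down as a small check. This is only a sketch in Python: the function name kaiser_is_trustworthy is hypothetical, and I have assumed that 'communalities after extraction are greater than 0.7' means every communality exceeds 0.7. The example uses the sixteen communalities that are legible in SPSS Output 4, so it covers only part of the 23-item questionnaire.

```python
def kaiser_is_trustworthy(n_variables, communalities, sample_size):
    """Apply the two conditions under which Kaiser's criterion is considered accurate.

    Assumption: 'communalities greater than 0.7' is read as *all* of them.
    """
    average = sum(communalities) / len(communalities)
    few_variables_high_communalities = n_variables < 30 and min(communalities) > 0.7
    large_sample_good_average = sample_size > 250 and average > 0.6
    return few_variables_high_communalities or large_sample_good_average

# Communalities after extraction for Q01 to Q16, as listed in SPSS Output 4
# (the remaining seven items are not included here).
listed = [.435, .414, .530, .469, .343, .654, .545, .739,
          .484, .335, .690, .513, .536, .488, .378, .487]

print("Average of listed communalities:", round(sum(listed) / len(listed), 3))
print("Kaiser's criterion trustworthy?", kaiser_is_trustworthy(23, listed, 2571))
```

When the check fails, as it does here, fall back on the scree plot, provided the sample is large enough.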
Factor Rotation
The first analysis I asked you to run was using an orthogonal rotation. SPSS Output 6 shows
the rotated component matrix (also called the rotated factor matrix in factor analysis) which
is a matrix of the factor loadings for each variable onto each factor. This matrix contains the
same information as the component matrix in SPSS Output 4 except that it is calculated after
rotation. There are several things to consider about the format of this matrix. First, factor
loadings less than 0.4 have not been displayed because we asked for these loadings to be
suppressed. If you didn't select this option, or didn't adjust the criterion value to 0.4, then
your output will differ. Second, the variables are listed in the order of size of their factor
loadings because we asked for the output to be Sorted by size. If this option was not selected
your output will look different. Finally, for all other parts of the output I suppressed the
variable labels (for reasons of space) but for this matrix I have allowed the variable labels to
be printed to aid interpretation.
Compare this matrix with the unrotated solution. Before rotation, most variables loaded highly
onto the first factor and the remaining factors didn't really get a look in. However, the rotation
of the factor structure has clarified things considerably: there are four factors and variables
load very highly onto only one factor (with the exception of one question). The suppression of
loadings less than 0.4 and ordering variables by loading size also makes interpretation
considerably easier (because you don't have to scan the matrix to identify substantive
loadings).
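The two display options mentioned above (suppressing small loadings and sorting by size) are easy to mimic, which may help you see what SPSS is doing to the table. The sketch below is in Python with NumPy; the function name display_loadings and the four-item loading matrix are hypothetical, and the sorting is my approximation of the idea (group items by the factor they load on most strongly, then order by size), not a copy of SPSS's routine.

```python
import numpy as np

def display_loadings(loadings, labels, cutoff=0.4):
    """Sort items by factor and loading size, blanking loadings smaller than `cutoff`."""
    loadings = np.asarray(loadings, dtype=float)
    strongest = np.abs(loadings).argmax(axis=1)   # factor with the largest |loading| per item
    size = np.abs(loadings).max(axis=1)           # size of that loading
    order = np.lexsort((-size, strongest))        # by factor first, then descending size
    rows = []
    for i in order:
        cells = [f"{v: .3f}" if abs(v) >= cutoff else "" for v in loadings[i]]
        rows.append((labels[i], cells))
    return rows

# Hypothetical rotated loadings for four items on two factors.
labels = ["A", "B", "C", "D"]
loadings = [[0.20, 0.70],
            [0.80, 0.10],
            [0.50, 0.45],
            [0.30, -0.60]]

rows = display_loadings(loadings, labels)
for label, cells in rows:
    print(f"{label:<4}" + "".join(f"{c:>8}" for c in cells))
```

Item C keeps both of its loadings because both are at least 0.4, which is the kind of cross-loading question mentioned above.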
C8057 (Research Methods II): Factor Analysis on
SPSS
SPSS Output 6
Use orthogonal rotation when you believe your factors should theoretically be
independent (unrelated to each other).
Use oblique rotation when you believe factors should be related to each
other.
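For the curious, an orthogonal rotation can be sketched in a few lines. The code below is a minimal varimax rotation in Python with NumPy, using a common SVD-based iteration. It is an assumption-laden illustration: the function name varimax and the unrotated loadings are hypothetical, and it leaves out the Kaiser normalisation that SPSS applies by default, so it will not reproduce SPSS output exactly.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a loading matrix orthogonally towards simple structure (varimax).

    Returns the rotated loadings and the rotation matrix.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)                                  # start from no rotation
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ R
        column_ss = (rotated ** 2).sum(axis=0)     # sum of squared loadings per factor
        target = L.T @ (rotated ** 3 - rotated @ np.diag(column_ss) / p)
        u, s, vt = np.linalg.svd(target)
        R = u @ vt                                 # closest orthogonal matrix to the target
        new_criterion = s.sum()
        if criterion > 0 and new_criterion < criterion * (1 + tol):
            break                                  # no meaningful improvement: stop
        criterion = new_criterion
    return L @ R, R

# Hypothetical unrotated loadings: every item loads on factor 1, as is typical
# before rotation.
unrotated = np.array([[0.70, 0.30],
                      [0.80, 0.40],
                      [0.60, 0.35],
                      [0.60, -0.50],
                      [0.70, -0.45],
                      [0.65, -0.55]])

rotated, R = varimax(unrotated)
print(np.round(rotated, 3))
```

Because the rotation matrix is orthogonal, each item's communality (the sum of its squared loadings) is the same before and after rotation; only the way the variance is shared out between factors changes. An oblique rotation drops that orthogonality constraint so that the factors are allowed to correlate.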
Interpretation
The next step is to look at the content of questions that load onto the same factor to try to
identify common themes. If the mathematical factor produced by the analysis represents
some real-world construct then common themes among highly loading questions can help us
identify what the construct might be. The questions that load highly on factor 1 seem to all
relate to using computers or SPSS. Therefore we might label this factor fear of computers.
The questions that load highly on factor 2 all seem to relate to different aspects of statistics;
therefore, we might label this factor fear of statistics. The three questions that load highly on
factor 3 all seem to relate to mathematics; therefore, we might label this factor fear of
mathematics. Finally, the questions that load highly on factor 4 all contain some component of
social evaluation from friends; therefore, we might label this factor peer evaluation. This
analysis seems to reveal that the initial questionnaire, in reality, is composed of four sub-
scales: fear of computers, fear of statistics, fear of maths, and fear of negative peer
evaluation. There are two possibilities here. The first is that the SAQ failed to measure what it
set out to (namely SPSS anxiety) but does measure some related constructs. The second is
that these four constructs are sub-components of SPSS anxiety; however, the factor analysis
does not indicate which of these possibilities is true.
Guided Example
The University of Sussex is constantly seeking to employ the best people possible as lecturers
(no, really, it is). Anyway, they wanted to revise a questionnaire based on Bland’s theory of
research methods lecturers. This theory predicts that good research methods lecturers should
have four characteristics: (1) a profound love of statistics; (2) an enthusiasm for experimental
design; (3) a love of teaching; and (4) a complete absence of normal interpersonal skills.
These characteristics should be related (i.e. correlated). The ‘Teaching Of Statistics for
Scientific Experiments’ (TOSSE) already existed, but the university revised this questionnaire
and it became the ‘Teaching Of Statistics for Scientific Experiments — Revised’ (TOSSE—R).
They gave this questionnaire to 239 research methods lecturers around the world to see if it
supported Bland’s theory.
The questionnaire is below.
1. I once woke up in the middle of a vegetable patch hugging a turnip that I'd mistakenly dug up thinking it was Roy's largest root
2. If I had a big gun I'd shoot all the students I have to teach
6. Teaching others makes me want to swallow a large bottle of bleach because the pain of my burning oesophagus would be light relief in comparison
11. I like it when people tell me I've helped them to understand factor rotation
12. People fall asleep as soon as I open my mouth to speak
14. I'd rather think about appropriate dependent variables than go to the pub
23. I often spend my spare time talking to the pigeons ... and even they die of boredom
24. I tried to build myself a time machine so that I could go back to the 1930s and follow Fisher around on my hands and knees licking the floor on which he'd just trodden
25. I love teaching
27. I love teaching because students have to pretend to like me or they'll get bad marks
Your Answer:
Is the sample size adequate? Explain your answer quoting any relevant
statistics.
Your Answer:
How many factors should be retained? Explain your answer quoting any
relevant statistics.
Your Answer:
What method of rotation have you used and why?
Your Answer:
Which items load onto which factors? Do these factors make psychological
sense (i.e. can you name them based on the items that load onto them?)
Your Answer:
Unguided Example
Re-run the SAQ analysis using oblique rotation (use Field, 2005 to help
you). Compare the results to the current analysis. Also, look over Field
(2005) and find out about Factor Scores and how to interpret them.