Factor Analysis Exercises

This document discusses using factor analysis to analyze labor and social protection data from 27 EU countries. The analysis involves dealing with missing data, assessing correlations between variables using Bartlett's test of sphericity and the KMO measure, removing variables, interpreting communalities from the principal axis factoring analysis, selecting the number of factors based on eigenvalues above 1 or a scree plot, and potentially rotating factors. While only 81% of variance is explained by two factors, this is considered a good result from reducing ten variables down to two factors.

Uploaded by

aida
Copyright
© All Rights Reserved

Exercise on Factor Analysis

You are an expert in labor policies working for the European Commission and you have been asked
by your supervisor to compile a report on the situation of labor and social protection within the 27
countries belonging to the European Union. For this purpose, you decide to use data made
available from the World Bank for the year 2012.

Table 4.1: Descriptive statistics


Variable                                                        N  Minimum  Maximum     Mean  Std. Deviation
Unemployment, youth total (% of total labor force ages 15-24)  27     8.10    55.30  24.9704        11.38108
Long-term unemployment (% of total unemployment)               27     1.10    14.40   4.9333         3.22979
Employment in agriculture (% of total employment)              26     1.00    29.00   5.8346         5.84417
Employers, total (% of employment)                             27     1.20     7.20   4.0593         1.22734
Self-employed, total (% of total employed)                     27     8.60    36.80  15.8963         6.80359
Vulnerable employment, total (% of total employment)           27     5.00    31.50  11.8407         6.55786
Wage and salaried workers, total (% of total employed)         27    63.20    91.40  84.0741         6.79471
Contributing family workers, total (% of total employed)       25      .20    12.60   1.6440         2.59343
Employment in services (% of total employment)                 26    42.40    84.10  68.9538         9.08001
Employment in industry (% of total employment)                 26    12.40    38.10  24.9385         6.22923
Unemployment, total (% of total labor force)                   27     4.30    25.00  10.5593         5.15075
Employment to population ratio, 15+, total (%)                 27    40.30    61.30  52.2370         5.06869
Valid N (listwise)                                             24

All variables consist of percentages, therefore all of them are expressed on the same scale.

1) Based on Table 4.1, which problem do you have to deal with in order to proceed with the
analysis?
Some values are missing: N ranges from 25 to 27 across the variables, and only 24 countries
have complete data (Valid N, listwise). I could therefore either exclude the cases with
missing information, listwise or pairwise, or replace the missing values with the mean of
the corresponding variable.
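
The three options can be sketched in Python with pandas. This is only an illustration under stated assumptions: the exercise itself uses SPSS, and the small data frame below (column names and values) is hypothetical, not the World Bank data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows are countries, NaN marks a missing value.
df = pd.DataFrame({
    "youth_unemp": [8.1, 55.3, 23.0, 18.5],
    "agri_emp": [1.0, np.nan, 5.2, 29.0],
    "family_workers": [0.2, 1.5, np.nan, 12.6],
})

# Listwise deletion: keep only the countries with no missing value at all.
listwise = df.dropna()

# Pairwise deletion: each correlation uses every row in which both variables
# are observed (DataFrame.corr excludes missing values pair by pair).
pairwise_corr = df.corr()

# Mean replacement: fill each gap with the mean of its own column.
mean_filled = df.fillna(df.mean())

print(len(listwise), "complete cases out of", len(df))
```

Listwise deletion gives a consistent sample but discards countries, while mean replacement keeps all countries at the cost of shrinking the variance of the imputed variables.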

2) What does Bartlett’s test of sphericity tell you about the correlation between the
items? What is the null hypothesis behind the test? Given a value of 673.178 (p-
value=.000) for Bartlett’s test, would you proceed with the analysis or not? Are you
confident with that or would you need some other piece of information?
Bartlett’s test of sphericity is a global test on the correlation matrix. Its null
hypothesis states that all correlations between the variables are equal to zero, that is,
the population correlation matrix is an identity matrix. With a p-value below .001, I
reject the null hypothesis. However, the result of the test alone is not enough to proceed
with the analysis: it only tells me that there is some correlation between the variables,
not how strong it is or how many variables are involved. For this information, I need the
value of the KMO measure.
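
As a rough sketch of what the test computes, the statistic can be written in Python with NumPy and SciPy using the usual formula chi2 = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom. The function name `bartlett_sphericity` and the simulated data are assumptions made for illustration; they do not reproduce the value 673.178 from the exercise.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(data):
    """Return (chi-square statistic, degrees of freedom, p-value)."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # ln|R| is 0 for an identity matrix and negative otherwise.
    _, logdet = np.linalg.slogdet(corr)
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof, chi2.sf(statistic, dof)

# Hypothetical data: 27 observations on 4 variables sharing one common component.
rng = np.random.default_rng(0)
common = rng.normal(size=(27, 1))
data = common + 0.5 * rng.normal(size=(27, 4))

statistic, dof, p_value = bartlett_sphericity(data)
print(f"chi2 = {statistic:.2f}, df = {dof}, p = {p_value:.4g}")
```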

3) You decide to remove two variables from the analysis (“employment in industry” and
“employers”). Looking at the main diagonal of the new anti-image matrix, you see that the
values range from 0.644 (“Unemployment, total”) to 0.800 (“Vulnerable employment,
total”). What do these values tell us? Are you confident enough with going on with the
analysis (the new KMO measure is 0.742)?
The main diagonal of the anti-image correlation matrix reports the measure of sampling
adequacy (KMO MSA) for each variable, that is to say, the amount of variance that each
variable has in common with the remaining ones. The smallest value, 0.644, indicates that
the inter-correlation between “Unemployment, total” and the other variables is just
mediocre. However, if the other values are above 0.7, their degree of inter-correlation
should be satisfactory. Moreover, the overall KMO measure (0.742) is larger than 0.7,
indicating that the total degree of inter-correlation in the data is satisfactory. Hence,
I would proceed with the analysis.
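
The per-variable MSA values and the overall KMO can be sketched from a correlation matrix as follows. This is a minimal illustration with NumPy: the function name `kmo` and the example matrix are hypothetical, and the partial correlations are obtained from the inverse correlation matrix.

```python
import numpy as np

def kmo(corr):
    """Return (per-variable MSA, overall KMO) for a correlation matrix."""
    inv = np.linalg.inv(corr)
    # Partial correlations between each pair, controlling for all other variables.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    r2 = corr ** 2
    a2 = partial ** 2
    # Only off-diagonal elements enter the measure.
    np.fill_diagonal(r2, 0.0)
    np.fill_diagonal(a2, 0.0)
    msa = r2.sum(axis=0) / (r2.sum(axis=0) + a2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + a2.sum())
    return msa, overall

# Hypothetical example: four variables, all pairwise correlations equal to 0.5.
corr = np.full((4, 4), 0.5)
np.fill_diagonal(corr, 1.0)

msa, overall = kmo(corr)
print(msa.round(3), round(overall, 3))
```

The measure approaches 1 when the partial correlations are small relative to the simple correlations, which is the situation in which common factors are plausible.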

Table 4.2: Communalities


Variable                                                       Initial  Extraction
Employment in services (% of total employment)                    .676        .449
Unemployment, youth total (% of total labor force ages 15-24)     .954        .954
Long-term unemployment (% of total unemployment)                  .909        .842
Employment in agriculture (% of total employment)                 .895        .777
Self-employed, total (% of total employed)                       1.000        .883
Employment to population ratio, 15+, total (%)                    .767        .585
Vulnerable employment, total (% of total employment)              .993        .936
Wage and salaried workers, total (% of total employed)           1.000        .881
Contributing family workers, total (% of total employed)          .902        .859
Unemployment, total (% of total labor force)                      .967        .929
Extraction Method: Principal Axis Factoring.

4) What are communalities? What is the difference between the initial communalities for a
“principal axis factoring analysis” and the initial communalities for a “principal component
analysis”?
The communality of a variable is the proportion of its variance that it shares with the
other variables and that can therefore be explained by the common factors. It can be
computed as the sum of the variable’s squared loadings on the retained factors.
Under principal axis factoring, the initial communalities estimate the share of variance
that each variable has in common with all the remaining ones (the squared multiple
correlation of the variable with the others), so they are numbers between 0 and 1. This
is because the method does not try to explain the whole variance in the data, but only
the variance common to the variables.
Under PCA, instead, the initial communalities are all equal to one, because the analysis
aims to reduce the dimension of the data while explaining as much as possible of the
total variability, not only the part shared among the variables.
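
The contrast between the two sets of initial communalities can be illustrated with a short NumPy sketch. The correlation matrix below is hypothetical; the squared multiple correlations are computed with the standard identity 1 - 1/diag(R^-1).

```python
import numpy as np

# Hypothetical correlation matrix: four variables, all correlations equal to 0.5.
corr = np.full((4, 4), 0.5)
np.fill_diagonal(corr, 1.0)

# Principal axis factoring: initial communalities are squared multiple
# correlations of each variable with all the others.
paf_initial = 1 - 1 / np.diag(np.linalg.inv(corr))

# Principal component analysis: every variable starts with communality 1,
# because the whole variance is to be explained.
pca_initial = np.ones(corr.shape[0])

print(paf_initial.round(3), pca_initial)
```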

5) Table 4.3 shows the total variance explained by each of the factors. What possible criteria
might have been used to select the number of factors? Is it OK if the two factors explain
only 81% of the whole variance?
One possible criterion is to retain the factors with an initial eigenvalue larger than
one, which indicates that the factor accounts for more variability than a single
standardized variable. Another possible criterion is to look at the scree plot, which
plots the eigenvalues against the factor number. The number of factors to retain is the
one after which the curve becomes almost flat, since from that point on each successive
factor accounts for only a small additional amount of the total variance.
I am fine with the fact that only around 80% of the total variance is explained by the
factors. The aim of FA is to reduce the data to a smaller set of variables while keeping
as much as possible of their capacity to explain the total variance in the initial data.
In this case, the result is very good, because about 80% of the total variance is
explained by two factors instead of the ten initial variables.
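
The eigenvalue-greater-than-one rule can be checked directly on the initial eigenvalues of Table 4.3. The following NumPy sketch only re-derives the percentages shown in the table; variable names are chosen here for illustration.

```python
import numpy as np

# Initial eigenvalues as reported in Table 4.3.
eigenvalues = np.array([5.736, 2.697, 0.709, 0.389, 0.260,
                        0.116, 0.066, 0.021, 0.007, 3.802e-05])

# With standardized variables the total variance equals the number of variables.
n_variables = len(eigenvalues)
pct_variance = 100 * eigenvalues / n_variables
cumulative_pct = np.cumsum(pct_variance)

# Kaiser criterion: retain the factors whose eigenvalue exceeds 1.
n_retained = int((eigenvalues > 1).sum())

print("factors retained:", n_retained)
print("cumulative % for retained factors:", round(cumulative_pct[n_retained - 1], 2))
```

Note that the cumulative percentage of the initial eigenvalues (about 84%) is slightly higher than the roughly 81% in the extraction columns, because principal axis factoring only accounts for common variance.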

6) If you asked for rotation, SPSS output finally shows you two matrices, one with the
unrotated factors and the other with the rotated ones. Which one would you consider to
facilitate your task of interpreting the latent factors that you found? And why?
I would consider the matrix with the rotated factors. Indeed, rotation is a mathematical
transformation that alters neither the communalities of the variables nor the total
amount of variability that the factors explain. Rather, by redistributing the explained
variance more evenly among the factors, so that each variable loads highly on as few
factors as possible, rotation makes interpretation of the factors more straightforward.
Different kinds of rotation are possible. For instance, varimax rotation keeps the
factors uncorrelated, while oblique rotation allows them to be correlated.
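
A small numerical sketch can make the invariance under orthogonal rotation concrete. The loadings and the 40-degree angle below are hypothetical values chosen for illustration; they are not the SPSS output, and the rotation is an arbitrary orthogonal one rather than the varimax solution.

```python
import numpy as np

# Hypothetical unrotated loadings: 4 variables, 2 orthogonal factors.
unrotated = np.array([
    [0.70, 0.50],
    [0.65, 0.55],
    [0.60, -0.45],
    [0.72, -0.50],
])

# An orthogonal rotation multiplies the loadings by a rotation matrix.
theta = np.deg2rad(40)
rotation = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta), np.cos(theta)],
])
rotated = unrotated @ rotation

# Row sums of squares (communalities) are unchanged by the rotation,
communalities_before = (unrotated ** 2).sum(axis=1)
communalities_after = (rotated ** 2).sum(axis=1)
# while the variance attributed to each factor (column sums) is redistributed.
factor_variance_before = (unrotated ** 2).sum(axis=0)
factor_variance_after = (rotated ** 2).sum(axis=0)

print(np.allclose(communalities_before, communalities_after))
print(factor_variance_before.round(3), factor_variance_after.round(3))
```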

7) Using Table 4.4, with the rotated factors, obtained through the Varimax method with
Kaiser normalization, would you be able to compute the communality for the item
“Long-term unemployment”?
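Yes. The communality for the item is the sum of its squared loadings on the two factors:
(0.097)^2 + (0.912)^2 = 0.0094 + 0.8317 = 0.8412,
which agrees, up to rounding of the loadings, with the extraction communality of .842
reported in Table 4.2.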
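
The same computation can be repeated for every item with a short NumPy sketch that uses the rotated loadings of Table 4.4. The shortened variable labels are chosen here for readability; small differences from Tables 4.2 and 4.3 are expected because the loadings are rounded to three decimals.

```python
import numpy as np

# Rotated loadings (factor 1, factor 2) as reported in Table 4.4.
loadings = {
    "Contributing family workers": (0.926, -0.029),
    "Vulnerable employment": (0.909, 0.331),
    "Employment in agriculture": (0.877, 0.088),
    "Self-employed": (0.840, 0.421),
    "Wage and salaried workers": (-0.839, -0.421),
    "Employment in services": (-0.669, 0.033),
    "Unemployment, total": (0.050, 0.963),
    "Unemployment, youth": (0.172, 0.961),
    "Long-term unemployment": (0.097, 0.912),
    "Employment to population ratio": (-0.207, -0.736),
}
matrix = np.array(list(loadings.values()))

# Communality of each item: sum of its squared loadings across the factors.
communalities = dict(zip(loadings, (matrix ** 2).sum(axis=1)))

# Sum of squared loadings per factor: compare with the rotation columns of Table 4.3.
rotation_ssl = (matrix ** 2).sum(axis=0)

print(round(communalities["Long-term unemployment"], 4))
print(rotation_ssl.round(3))
```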

8) Instead of the Varimax method for rotation, say, you opted for the Direct Oblimin method.
What is the assumption behind this rotation method? You therefore got the output shown
in Table 4.5. Given this piece of information, would you be confident enough in using your
two new factors as independent variables in a regression model?

Direct Oblimin is a method for oblique rotation. It assumes that the latent factors may
be correlated, so the rotation is not orthogonal and the factors are no longer
constrained to be uncorrelated. In this particular case, since the correlation between
the two factors is rather small, just 0.282 (Table 4.5), I would be confident enough in
using the two factors as independent variables in a regression model, because
multicollinearity between them would be limited.
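
One way to quantify this is the variance inflation factor (VIF) implied by the factor correlation in Table 4.5. The sketch below is an added illustration, not part of the exercise; the rule of thumb that VIF values above 5 or 10 signal a problem is a common convention assumed here, not something stated in the source.

```python
import numpy as np

# Factor correlation matrix as reported in Table 4.5.
factor_corr = np.array([[1.000, 0.282],
                        [0.282, 1.000]])

# For standardized predictors, the VIFs are the diagonal of the inverse
# correlation matrix; with two predictors this equals 1 / (1 - r**2).
vif = np.diag(np.linalg.inv(factor_corr))

print(vif.round(3))
```

A VIF this close to 1 means that the standard errors of the regression coefficients would be almost unaffected by the correlation between the two factors.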

Table 4.3: Total variance explained


        Initial Eigenvalues                      Extraction Sums of Squared Loadings      Rotation Sums of Squared Loadings
Factor       Total  % of Variance  Cumulative %       Total  % of Variance  Cumulative %       Total  % of Variance  Cumulative %
1            5.736         57.357        57.357       5.577         55.774        55.774       4.395         43.953        43.953
2            2.697         26.974        84.331       2.517         25.169        80.943       3.699         36.990        80.943
3             .709          7.089        91.420
4             .389          3.887        95.307
5             .260          2.604        97.911
6             .116          1.156        99.066
7             .066           .657        99.723
8             .021           .211        99.934
9             .007           .065       100.000
10      3.802E-005           .000       100.000

Table 4.4: Rotated factor matrix

Variable                                                       Factor 1  Factor 2
Contributing family workers, total (% of total employed)           .926     -.029
Vulnerable employment, total (% of total employment)               .909      .331
Employment in agriculture (% of total employment)                  .877      .088
Self-employed, total (% of total employed)                         .840      .421
Wage and salaried workers, total (% of total employed)            -.839     -.421
Employment in services (% of total employment)                    -.669      .033
Unemployment, total (% of total labor force)                       .050      .963
Unemployment, youth total (% of total labor force ages 15-24)      .172      .961
Long-term unemployment (% of total unemployment)                   .097      .912
Employment to population ratio, 15+, total (%)                    -.207     -.736
Extraction Method: Principal Axis Factoring.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

Table 4.5: Factor correlation matrix

Factor 1 2
1 1.000 .282
2 .282 1.000
