Chapter 19
Factor Analysis
Factor analysis allows us to look at groups
correlations.
Objectives
After reading this chapter, the student should be able to:
1. Describe the concept of factor analysis and explain how it is different from
analysis of variance, multiple regression, and discriminant analysis.
2. Discuss the procedure for conducting factor analysis, including problem
formulation, construction of the correlation matrix, selection of an appropriate
method, determination of the number of factors, rotation, and interpretation
of factors.
3. Understand the distinction between principal component factor analysis
and common factor analysis methods.
4. Explain the selection of surrogate variables and their application, with
emphasis on their use in subsequent analysis.
5. Describe the procedure for determining the fit of a factor analysis model
using the observed and the reproduced correlations.
Overview
In analysis of variance (Chapter 16), regression (Chapter 17), and discriminant analysis (Chapter 18), one of the variables is clearly identified as the dependent variable. We now turn to a procedure, factor analysis, in which variables are not classified as independent or dependent. Instead, the whole set of interdependent relationships among variables is examined.
This chapter discusses the basic concept of factor analysis and gives an exposition of the
factor model. We describe the steps in factor analysis and illustrate them in the context of
principal components analysis. Next, we present an application of common factor analysis.
Finally, we discuss the use of software in factor analysis. Help for running the SPSS and
SAS Learning Edition programs used in this chapter is provided in four ways: (1) detailed
step-by-step instructions are given later in the chapter, (2) you can download (from the Web
site for this book) computerized demonstration movies illustrating these step-by-step
instructions, (3) you can download screen captures with notes illustrating these step-by-step
instructions, and (4) you can refer to the Study Guide and Technology Manual, a supplement
that accompanies this book.
To begin, we provide some examples to illustrate the usefulness of factor analysis.
PART III • DATA COLLECTION, PREPARATION, ANALYSIS, AND REPORTING
relatives, attractiveness of the physical structure, community involvement, and obtainability of loans.
Competence consisted of employee competence and availability of auxiliary banking services. It was
concluded that consumers evaluated banks using the four basic factors of traditional services, convenience,
visibility, and competence, and banks must excel on these factors to project a good image. By emphasizing
these factors, JPMorgan Chase & Co. became one of the largest U.S. banks and bought the banking operations of bankrupt rival Washington Mutual in September 2008.1 ■
Basic Concept
Factor analysis is a general name denoting a class of procedures primarily used for data reduction and summarization. In marketing research, there may be a large number of variables, most of which are correlated and which must be reduced to a manageable level. Relationships among sets of many interrelated variables are examined and represented in terms of a few underlying factors. For example, store image may be measured by asking respondents to evaluate stores on a series of items on a semantic differential scale. These item evaluations may then be analyzed to determine the factors underlying store image.
In analysis of variance, multiple regression, and discriminant analysis, one variable is considered as the dependent or criterion variable, and the others as independent or predictor variables. However, no such distinction is made in factor analysis. Rather, factor analysis is an interdependence technique in that an entire set of interdependent relationships is examined.2

factor analysis
A class of procedures primarily used for data reduction and summarization.
interdependence technique
Multivariate statistical techniques in which the whole set of interdependent relationships is examined.
factors
An underlying dimension that explains the correlations among a set of variables.

Factor analysis is used in the following circumstances:
1. To identify underlying dimensions, or factors, that explain the correlations among a set of variables. For example, a set of lifestyle statements may be used to measure the psychographic profiles of consumers. These statements may then be factor analyzed to identify the underlying psychographic factors, as illustrated in the department store example. This is also illustrated in Figure 19.1 derived based on empirical analysis, where the seven psychographic variables can be represented by two factors. In this figure, factor 1 can be interpreted as homebody versus socialite, and factor 2 can be interpreted as sports versus movies/plays.
2. To identify a new, smaller set of uncorrelated variables to replace the original set of correlated variables in subsequent multivariate analysis (regression or discriminant analysis). For example, the psychographic factors identified may be used as independent variables in explaining the differences between loyal and nonloyal consumers. Thus, instead of the seven correlated psychographic variables of Figure 19.1, we can use the two uncorrelated factors, i.e., homebody versus socialite, and sports versus movies/plays, in subsequent analysis.
3. To identify a smaller set of salient variables from a larger set for use in subsequent multivariate analysis. For example, a few of the original lifestyle statements that correlate highly with the identified factors may be used as independent variables to explain the differences between the loyal and nonloyal users. Specifically, based on theory and empirical results (Figure 19.1), we can select home is best place and football as independent variables, and drop the other five variables to avoid problems due to multicollinearity (see Chapter 17).
[Figure 19.1: plot of the psychographic variables against Factor 1 and Factor 2; surviving labels: Evening at home, Plays, Movies.]
All these uses are exploratory in nature and, therefore, factor analysis is also called exploratory
factor analysis (EFA). The technique has numerous applications in marketing research. For
example:
• It can be used in market segmentation for identifying the underlying variables on which to group the customers. New car buyers might be grouped based on the relative emphasis they place on economy, convenience, performance, comfort, and luxury. This might result in five segments: economy seekers, convenience seekers, performance seekers, comfort seekers, and luxury seekers.
• In product research, factor analysis can be employed to determine the brand attributes that influence consumer choice. Toothpaste brands might be evaluated in terms of protection against cavities, whiteness of teeth, taste, fresh breath, and price.
• In advertising studies, factor analysis can be used to understand the media consumption habits of the target market. The users of frozen foods may be heavy viewers of cable TV, see a lot of movies, and listen to country music.
• In pricing studies, it can be used to identify the characteristics of price-sensitive consumers. For example, these consumers might be methodical, economy minded, and home centered.
FIGURE 19.2 Graphical Illustration of Factor Analysis
[Plot with axes X1 and X2; the factor is drawn as the linear combination Fi = Wi1X1 + Wi2X2.]
The researcher decides on the number of factors to be extracted and the method of rotation.
Next, the rotated factors should be interpreted. Depending upon the objectives, the factor scores
may be calculated, or surrogate variables selected, to represent the factors in subsequent multivariate
analysis. Finally, the fit of the factor analysis model is determined. We discuss these
steps in more detail in the following sections.4
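As a rough illustration of how factor scores are calculated, the sketch below multiplies a respondent's standardized ratings by the factor score coefficients reported in Table 19.3. The function name `factor_scores` and the example respondent are hypothetical, invented only for illustration; NumPy is assumed to be available.

```python
import numpy as np

# Factor score coefficient matrix from Table 19.3
# (rows V1..V6, columns factor 1 and factor 2).
W = np.array([
    [ 0.358,  0.011],
    [-0.001,  0.375],
    [ 0.345, -0.043],
    [-0.017,  0.377],
    [-0.350, -0.059],
    [ 0.052,  0.395],
])

def factor_scores(z, weights=W):
    """Return F = z @ W for standardized ratings z of shape (6,) or (n, 6)."""
    return np.asarray(z, dtype=float) @ weights

# HYPOTHETICAL standardized ratings for one respondent (not taken from Table 19.1).
z_example = np.array([1.2, -0.5, 0.9, -0.3, -1.1, 0.1])
scores = factor_scores(z_example)
print("Factor 1 score: %.3f, Factor 2 score: %.3f" % (scores[0], scores[1]))
```

Each score is simply a weighted sum of the six standardized variables, so the same function works for a whole data matrix with one respondent per row.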
TABLE 19.1
Toothpaste Attribute Ratings
Respondent
Number V1 V2 V3 V4 V5 V6
TABLE 19.2
Correlation Matrix
Variables    V1       V2       V3       V4       V5       V6
V1          1.00
V2         -0.053    1.00
V3          0.873   -0.155    1.00
V4         -0.086    0.572   -0.248    1.00
V5         -0.858    0.020   -0.778   -0.007    1.00
V6          0.004    0.640   -0.018    0.640   -0.136    1.00
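The eigenvalues and the Bartlett statistic reported in Table 19.3 can be approximated directly from this correlation matrix. The following is a minimal sketch, not the SPSS or SAS procedure itself: it assumes NumPy, uses the usual chi-square approximation for Bartlett's test of sphericity, and assumes a sample size of 30 respondents, which is not stated in this excerpt. Because the correlations are rounded to three decimals, the results should only be close to the tabled values.

```python
import numpy as np

# Lower triangle of Table 19.2, mirrored into a full symmetric correlation matrix.
lower = [
    [1.0],
    [-0.053, 1.0],
    [0.873, -0.155, 1.0],
    [-0.086, 0.572, -0.248, 1.0],
    [-0.858, 0.020, -0.778, -0.007, 1.0],
    [0.004, 0.640, -0.018, 0.640, -0.136, 1.0],
]
p = len(lower)
R = np.zeros((p, p))
for i, row in enumerate(lower):
    for j, r in enumerate(row):
        R[i, j] = R[j, i] = r

# eigvalsh is meant for symmetric matrices and returns ascending values; reverse them.
eigenvalues = np.linalg.eigvalsh(R)[::-1]
pct_variance = 100 * eigenvalues / p      # each standardized variable has variance 1
cumulative = np.cumsum(pct_variance)
for k, (ev, pct, cum) in enumerate(zip(eigenvalues, pct_variance, cumulative), start=1):
    print(f"Factor {k}: eigenvalue={ev:.3f}  % of variance={pct:.2f}  cumulative={cum:.2f}")

# Rule of thumb: retain factors whose eigenvalue exceeds 1.
n_factors = int(np.sum(eigenvalues > 1.0))
print("Factors with eigenvalue > 1:", n_factors)

# Bartlett's test of sphericity (chi-square approximation).
n_respondents = 30   # ASSUMPTION: sample size is not given in this excerpt
_, logdet = np.linalg.slogdet(R)
chi_square = -((n_respondents - 1) - (2 * p + 5) / 6) * logdet
df = p * (p - 1) // 2
print(f"Bartlett chi-square={chi_square:.3f}, df={df}")
```

The percentage of variance for each factor is its eigenvalue divided by the number of variables, which is why the eigenvalues sum to six here.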
TABLE 19.3
Results of Principal Components Analysis

Bartlett test of sphericity
Approx. chi-square = 111.314, df = 15, significance = 0.00000
Kaiser-Meyer-Olkin measure of sampling adequacy = 0.660

Communalities
Variable    Initial    Extraction
V1          1.000      0.926
V2          1.000      0.723
V3          1.000      0.894
V4          1.000      0.739
V5          1.000      0.878
V6          1.000      0.790

Initial Eigenvalues
Factor    Eigenvalue    % of Variance    Cumulative %
1         2.731         45.520           45.520
2         2.218         36.969           82.488
3         0.442          7.360           89.848
4         0.341          5.688           95.536
5         0.183          3.044           98.580
6         0.085          1.420          100.000

Extraction Sums of Squared Loadings
Factor    Eigenvalue    % of Variance    Cumulative %
1         2.731         45.520           45.520
2         2.218         36.969           82.488

Factor Matrix
Variable    Factor 1    Factor 2
V1           0.928       0.253
V2          -0.301       0.795
V3           0.936       0.131
V4          -0.342       0.789
V5          -0.869      -0.351
V6          -0.177       0.871

Rotation Sums of Squared Loadings
Factor    Eigenvalue    % of Variance    Cumulative %
1         2.688         44.802           44.802
2         2.261         37.687           82.488

Rotated Factor Matrix
Variable    Factor 1    Factor 2
V1           0.962      -0.027
V2          -0.057       0.848
V3           0.934      -0.146
V4          -0.098       0.854
V5          -0.933      -0.084
V6           0.083       0.885

Factor Score Coefficient Matrix
Variable    Factor 1    Factor 2
V1           0.358       0.011
V2          -0.001       0.375
V3           0.345      -0.043
V4          -0.017       0.377
V5          -0.350      -0.059
V6           0.052       0.395

Reproduced Correlation Matrix
Variables    V1        V2        V3        V4        V5        V6
V1           0.926*    0.024    -0.029     0.031     0.038    -0.053
V2          -0.078     0.723*    0.022    -0.158     0.038    -0.105
V3           0.902    -0.177     0.894*   -0.031     0.081     0.033
V4          -0.117     0.730    -0.217     0.739*   -0.027    -0.107
V5          -0.895    -0.018    -0.859     0.020     0.878*    0.016
V6           0.057     0.746    -0.051     0.748    -0.152     0.790*
*The lower left triangle contains the reproduced correlation matrix; the diagonal, the
communalities; the upper right triangle, the residuals between the observed correlations
and the reproduced correlations.
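Several parts of Table 19.3 follow arithmetically from the unrotated factor matrix. The sketch below, which assumes NumPy and takes the loadings and observed correlations as printed in Tables 19.2 and 19.3, recomputes the communalities (row sums of squared loadings), the extraction eigenvalues (column sums of squared loadings), the reproduced correlations, and the residuals. Small differences from the table are expected because the inputs are rounded.

```python
import numpy as np

variables = ["V1", "V2", "V3", "V4", "V5", "V6"]

# Unrotated factor matrix (loadings) from Table 19.3.
loadings = np.array([
    [ 0.928,  0.253],
    [-0.301,  0.795],
    [ 0.936,  0.131],
    [-0.342,  0.789],
    [-0.869, -0.351],
    [-0.177,  0.871],
])

# Observed correlations from Table 19.2.
R_observed = np.array([
    [ 1.000, -0.053,  0.873, -0.086, -0.858,  0.004],
    [-0.053,  1.000, -0.155,  0.572,  0.020,  0.640],
    [ 0.873, -0.155,  1.000, -0.248, -0.778, -0.018],
    [-0.086,  0.572, -0.248,  1.000, -0.007,  0.640],
    [-0.858,  0.020, -0.778, -0.007,  1.000, -0.136],
    [ 0.004,  0.640, -0.018,  0.640, -0.136,  1.000],
])

# Communality of a variable: sum of its squared loadings across the factors.
communalities = (loadings ** 2).sum(axis=1)

# Variance explained by a factor: sum of its squared loadings across the variables.
explained = (loadings ** 2).sum(axis=0)

# Correlations implied by the two-factor solution, and what is left over.
reproduced = loadings @ loadings.T
residuals = R_observed - reproduced
np.fill_diagonal(residuals, 0.0)   # the diagonal holds communalities, not residuals

for name, h2 in zip(variables, communalities):
    print(f"{name}: communality={h2:.3f}")
print("Variance explained by each factor:", np.round(explained, 3))
print("Largest absolute residual:", round(float(np.abs(residuals).max()), 3))
```

Small residuals indicate that the two factors reproduce the observed correlations well, which is how the fit of the model is judged.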
Rotate Factors
An important output from factor analysis is the factor matrix, also called the factor pattern matrix. The factor matrix contains the coefficients used to express the standardized variables in terms of the factors. These coefficients, the factor loadings, represent the correlations between the factors and the variables. A coefficient with a large absolute value indicates that the factor and the variable are closely related. The coefficients of the factor matrix can be used to interpret the factors.
[Figure: scree plot of eigenvalues (vertical axis) against the number of factors, 1 to 6 (horizontal axis).]
Although the initial or unrotated factor matrix indicates the relationship between the factors and individual variables, it seldom results in factors that can be interpreted, because the factors are correlated with many variables. For example, in Table 19.3, under “Factor Matrix,” factor 1 is at least somewhat correlated with five of the six variables (an absolute value of factor loading greater than 0.3). Likewise, factor 2 is at least somewhat correlated with four of the six variables. Moreover, variables 2, 4, and 5 load at least somewhat on both the factors. This is illustrated in Figure 19.5(a). How should these factors be interpreted? In such a complex matrix it is difficult to interpret the factors. Therefore, through rotation, the factor matrix is transformed into a simpler one that is easier to interpret.
In rotating the factors, we would like each factor to have nonzero, or significant, loadings or
coefficients for only some of the variables. Likewise, we would like each variable to have nonzero
or significant loadings with only a few factors, if possible with only one. If several factors have
high loadings with the same variable, it is difficult to interpret them. Rotation does not affect the
communalities and the percentage of total variance explained. However, the percentage of
variance accounted for by each factor does change. This is seen in Table 19.3 by comparing
“Extraction Sums of Squared Loadings” with “Rotation Sums of Squared Loadings.” The
variance explained by the individual factors is redistributed by rotation. Hence, different methods
of rotation may result in the identification of different factors.
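To make these properties concrete, the sketch below applies a varimax rotation, the orthogonal method discussed in this section, to the unrotated factor matrix of Table 19.3. It is a generic NumPy implementation of the standard SVD-based varimax iteration with Kaiser (row) normalization, not the code used by SPSS or SAS; the function name and its parameters are illustrative assumptions. The rotated loadings should come out close to the table's rotated factor matrix, possibly with signs or columns flipped.

```python
import numpy as np

# Unrotated factor matrix from Table 19.3 (rows V1..V6).
unrotated = np.array([
    [ 0.928,  0.253],
    [-0.301,  0.795],
    [ 0.936,  0.131],
    [-0.342,  0.789],
    [-0.869, -0.351],
    [-0.177,  0.871],
])

def varimax(loadings, normalize=True, max_iter=1000, tol=1e-10):
    """Orthogonal varimax rotation; returns (rotated loadings, rotation matrix)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    # Kaiser normalization: scale each row to unit length before rotating.
    h = np.sqrt((L ** 2).sum(axis=1)) if normalize else np.ones(p)
    A = L / h[:, None]
    T = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        B = A @ T
        # Gradient of the varimax criterion with respect to the rotated loadings.
        grad = A.T @ (B ** 3 - B * (B ** 2).sum(axis=0) / p)
        u, s, vt = np.linalg.svd(grad)
        T = u @ vt                      # nearest orthogonal matrix to the gradient
        new_objective = s.sum()
        if new_objective < objective * (1 + tol):
            break
        objective = new_objective
    return (A @ T) * h[:, None], T

rotated, T = varimax(unrotated)

print("Communalities before:", np.round((unrotated ** 2).sum(axis=1), 3))
print("Communalities after: ", np.round((rotated ** 2).sum(axis=1), 3))
print("Variance per factor before:", np.round((unrotated ** 2).sum(axis=0), 3))
print("Variance per factor after: ", np.round((rotated ** 2).sum(axis=0), 3))
print("Rotated loadings:\n", np.round(rotated, 3))
```

The row sums of squared loadings (communalities) and the total variance explained stay the same, while the column sums change, which is the redistribution described above.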
The rotation is called orthogonal rotation if the axes are maintained at right angles. The most commonly used method for rotation is the varimax procedure. This is an orthogonal method of rotation that minimizes the number of variables with high loadings on a factor, thereby enhancing the interpretability of the factors.8 Orthogonal rotation results in factors that are uncorrelated. The rotation is called oblique rotation when the axes are not maintained at right angles, and the factors are correlated. Sometimes, allowing for correlations among factors can simplify the factor pattern matrix. Oblique rotation should be used when factors in the population are likely to be strongly correlated.
In Table 19.3, by comparing the varimax rotated factor matrix with the unrotated matrix (titled “Factor Matrix”), we can see how rotation achieves simplicity and enhances interpretability. Whereas five variables correlated with factor 1 in the unrotated matrix, only variables V1, V3, and V5 correlate with factor 1 after rotation. The remaining variables, V2, V4, and V6, correlate highly with factor 2. Furthermore, no variable correlates highly with both the factors. This can be seen clearly in Figure 19.5(b). The rotated factor matrix forms the basis for interpretation of the factors.

orthogonal rotation
Rotation of factors in which the axes are maintained at right angles.
varimax procedure
An orthogonal method of factor rotation that minimizes the number of variables with high loadings on a factor, thereby enhancing the interpretability of the factors.
oblique rotation
Rotation of factors when the axes are not maintained at right angles.

Interpret Factors
Interpretation is facilitated by identifying the variables that have large loadings on the same factor.
That factor can then be interpreted in terms of the variables that load high on it. Another useful aid
in interpretation is to plot the variables using the factor loadings as coordinates. Variables at the
end of an axis are those that have high loadings on only that factor, and hence describe the factor.
Variables near the origin have small loadings on both the factors. Variables that are not near any of
the axes are related to both the factors. If a factor cannot be clearly defined in terms of the original
variables, it should be labeled as an undefined or a general factor.
In the rotated factor matrix of Table 19.3, factor 1 has high coefficients for variables V1 (prevention of cavities) and V3 (strong gums), and a negative coefficient for V5 (prevention of tooth decay is not important). Therefore, this factor may be labeled a health benefit factor. Note that a negative coefficient for a negative variable (V5) leads to a positive interpretation that prevention of tooth decay is important. Factor 2 is highly related with variables V2 (shiny teeth), V4 (fresh breath), and V6 (attractive teeth). Thus, factor 2 may be labeled a social benefit factor. A plot of the factor loadings, given in Figure 19.6, confirms this interpretation. Variables V1, V3, and V5 are at the ends of the horizontal axis (factor 1), with V5 at the end opposite to V1 and V3, whereas variables V2, V4, and V6 are at the end of the vertical axis (factor 2). One could summarize the data by stating that consumers appear to seek two major kinds of benefits from a toothpaste: health benefits and social benefits.
[Figure: factor loading plot with Factor 1 on the horizontal axis and Factor 2 on the vertical axis; surviving point labels include V1, V3, and V5.]
than one with a slightly higher loading. Likewise, if a variable has a slightly lower loading but has
been measured more precisely, it should be selected as the surrogate variable. In Table 19.3, the
variables V1, V3, and V5 all have high loadings on factor 1, and all are fairly close in magnitude,
although V1 has relatively the highest loading and would therefore be a likely candidate. However,
if prior knowledge suggests that prevention of tooth decay is a very important benefit, V5 would be
selected as the surrogate for factor 1. Also, the choice of a surrogate for factor 2 is not straightforward.
Variables V2, V4, and V6 all have comparable high loadings on this factor. If prior knowledge
suggests that attractive teeth is the most important social benefit sought from a toothpaste, the
researcher would select V6.
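The purely statistical part of this choice can be sketched in a few lines. The example below, which assumes NumPy and uses the rotated factor matrix of Table 19.3, lists the candidate variables for each factor and the one with the largest absolute loading. The function names and the 0.8 cutoff are hypothetical choices for illustration; as the discussion above stresses, prior knowledge or measurement quality may justify a different surrogate.

```python
import numpy as np

variables = ["V1", "V2", "V3", "V4", "V5", "V6"]

# Rotated factor matrix from Table 19.3.
rotated = np.array([
    [ 0.962, -0.027],
    [-0.057,  0.848],
    [ 0.934, -0.146],
    [-0.098,  0.854],
    [-0.933, -0.084],
    [ 0.083,  0.885],
])

def pick_surrogates(loadings, names):
    """For each factor, return the variable with the largest absolute loading."""
    best_rows = np.abs(loadings).argmax(axis=0)
    return [names[i] for i in best_rows]

def candidates(loadings, names, threshold=0.8):
    """For each factor, list the variables whose absolute loading meets the cutoff."""
    strong = np.abs(loadings) >= threshold   # threshold is an illustrative assumption
    return [[names[i] for i in np.flatnonzero(strong[:, j])]
            for j in range(loadings.shape[1])]

print("Candidates per factor:", candidates(rotated, variables))
print("Highest-loading variable per factor:", pick_surrogates(rotated, variables))
```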
Factor Interpretation
Factor (% variance explained) Loading Variables Included in the Factor
were significant at p < 0.001. Correct classification into high, medium, and low categories was achieved
for 65 percent of the cases. The order of entry into discriminant analysis was used to determine the relative
importance of factors as trade support influencers, as shown in Table 3.9
In keeping with the results of this study, P&G decided to emphasize item importance, item profitability,
incentive amount, and its reputation in order to garner retailers’ promotion support. Partially as a result of
these efforts, P&G brands touched the lives of people around the world three billion times a day in 2009. ■
ACTIVE RESEARCH
Wendy’s: How Old-Fashioned Are Consumers’ Choice Criteria for Fast Foods?
Visit www.wendys.com and search the Internet using a search engine as well as your library’s online database
to determine the choice criteria of consumers in selecting a fast-food restaurant.
As the marketing director for Wendy’s, what marketing strategies would you formulate to increase
your patronage?
Describe the data you would collect and the analysis you would conduct to determine the choice
criteria of consumers in selecting a fast-food restaurant.
TABLE 19.4
Results of Common Factor Analysis

Bartlett test of sphericity
Approx. chi-square = 111.314, df = 15, significance = 0.00000
Kaiser-Meyer-Olkin measure of sampling adequacy = 0.660

Communalities
Variable    Initial    Extraction
V1          0.859      0.928
V2          0.480      0.562
V3          0.814      0.836
V4          0.543      0.600
V5          0.763      0.789
V6          0.587      0.723

Initial Eigenvalues
Factor    Eigenvalue    % of Variance    Cumulative %
1         2.731         45.520           45.520
2         2.218         36.969           82.488
3         0.442          7.360           89.848
4         0.341          5.688           95.536
5         0.183          3.044           98.580
6         0.085          1.420          100.000

Extraction Sums of Squared Loadings
Factor    Eigenvalue    % of Variance    Cumulative %
1         2.570         42.837           42.837
2         1.868         31.126           73.964

Factor Matrix
Variable    Factor 1    Factor 2
V1           0.949       0.168
V2          -0.206       0.720
V3           0.914       0.038
V4          -0.246       0.734
V5          -0.850      -0.259
V6          -0.101       0.844

Factor Score Coefficient Matrix
Variable    Factor 1    Factor 2
V1           0.628       0.101
V2          -0.024       0.253
V3           0.217      -0.169
V4          -0.023       0.271
V5          -0.166      -0.059
V6           0.083       0.500
*The lower left triangle contains the reproduced correlation matrix; the diagonal, the
communalities; the upper right triangle, the residuals between the observed
correlations and the reproduced correlations.
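In common factor analysis the initial communalities are no longer 1.0. A common choice of starting estimate in principal axis factoring is each variable's squared multiple correlation with the other variables; the sketch below assumes that this is how the "Initial" column of Table 19.4 was obtained, which this excerpt does not confirm. It also recomputes the extraction communalities and the extraction sums of squared loadings from the Table 19.4 factor matrix. NumPy is assumed, and rounding in the inputs means the results should only approximate the table.

```python
import numpy as np

# Observed correlations from Table 19.2.
R = np.array([
    [ 1.000, -0.053,  0.873, -0.086, -0.858,  0.004],
    [-0.053,  1.000, -0.155,  0.572,  0.020,  0.640],
    [ 0.873, -0.155,  1.000, -0.248, -0.778, -0.018],
    [-0.086,  0.572, -0.248,  1.000, -0.007,  0.640],
    [-0.858,  0.020, -0.778, -0.007,  1.000, -0.136],
    [ 0.004,  0.640, -0.018,  0.640, -0.136,  1.000],
])

# Squared multiple correlation of each variable with the remaining five:
# SMC_i = 1 - 1 / (i-th diagonal element of the inverse correlation matrix).
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))

# Unrotated factor matrix from Table 19.4 (common factor analysis).
paf_loadings = np.array([
    [ 0.949,  0.168],
    [-0.206,  0.720],
    [ 0.914,  0.038],
    [-0.246,  0.734],
    [-0.850, -0.259],
    [-0.101,  0.844],
])

extraction = (paf_loadings ** 2).sum(axis=1)        # communality per variable
common_variance = (paf_loadings ** 2).sum(axis=0)   # variance explained per factor
pct_of_total = 100 * common_variance / R.shape[0]

print("Squared multiple correlations:", np.round(smc, 3))
print("Extraction communalities:     ", np.round(extraction, 3))
print("Variance per factor:", np.round(common_variance, 3))
print("% of total variance:", np.round(pct_of_total, 2))
```

Because only the common variance is analyzed, the two factors account for a smaller share of the total variance here than in the principal components solution of Table 19.3.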
respectively. Factor 1 was defined as a representation of consumers’ faith in the rebate system (Faith). Factor
2 seemed to capture the consumers’ perceptions of the efforts and difficulties associated with rebate redemption
(Efforts). Factor 3 represented consumers’ perceptions of the manufacturers’ motives for offering
rebates (Motives). The loadings of items on their respective factor ranged from 0.527 to 0.744.
Therefore, companies such as AT&T that employ rebates should ensure that the effort and difficulties
of consumers in taking advantage of the rebates are minimized. They should also try to build consumers’
faith in the rebate system and portray honest motives for offering rebates.10 ■
Note that in this example, when the initial factor solution was not interpretable, items that had low loadings were deleted and the factor analysis was performed on the remaining items. If the number of variables is large (greater than 15), principal components analysis and common factor analysis result in similar solutions. However, principal components analysis is less prone to misinterpretation and is recommended for the nonexpert user. The next example illustrates an application of principal components analysis in international marketing research, and the example after that presents an application in the area of ethics.
To simplify the table, only varimax rotated loadings of 0.40 or greater are reported. Each was rated on a 5-point scale with 1 “strongly agree”
and 5 “strongly disagree.”
One of these scales included 11 items pertaining to the extent that ethical problems plagued the organization,
and what top management’s actions were toward ethical situations. A principal components analysis with
varimax rotation indicated that the data could be represented by two factors.
These two factors were then used in a multiple regression along with four other predictor variables.
They were found to be the two best predictors of unethical marketing research practices.12 ■
The Situation
Tiffany & Co. (www.tiffany.com) is the internationally renowned retailer, designer, manufacturer, and
distributor of fine jewelry, timepieces, sterling silverware, china, crystal, stationery, fragrances, and accessories.
Founded in 1837 by Charles Lewis Tiffany, the company had 184 Tiffany & Co. stores and boutiques
serving customers in the United States and international markets in 2009. Tiffany’s main growth strategies
consist of expanding its channels of distribution in important markets around the world, complementing
its existing product offerings with an active product development program, enhancing customer awareness
of its product designs, quality, and value, and providing levels of customer service that guarantee a great
shopping experience. Tiffany & Company’s revenues exceeded $2.94 billion in 2008.
Tiffany is slowly and subtly embracing the middle class, a potential danger for one of retail’s most
exclusive names. Over the past decade, the luxury jewelry retailer has nearly tripled its stores in the United
States and changed its promotions to highlight more lower-price wares. It currently has 70 U.S. locations
and another 114 overseas. Tiffany’s new 5,000-square-foot format will allow it to also expand into smaller
markets and double up in bigger towns. In the process of reaching out to embrace more markets, Tiffany
may be driving away some of its core customers. Although Tiffany has a long way to go before it is fully
accessible, the Tiffany Heart Tag silver bracelet has become quite a popular item among many, including
Reese Witherspoon and her entourage in Legally Blonde. Mr. James E. Quinn, president, is wondering
what the psychographic profile of Tiffany’s core customers is and what the company should do to maintain
and build upon the loyalty of its core customers. This is critical to success in the future.
Statistical Software
Computer programs are available to implement both of the approaches: principal components analysis and common factor analysis. We discuss the use of SPSS and SAS in detail in the subsequent sections. Here, we briefly describe the use of MINITAB. In MINITAB, factor analysis can be accessed using Multivariate>Factor analysis. Principal components or maximum likelihood can be used to determine the initial factor extraction. If maximum likelihood is used, specify the number of factors to extract. If a number is not specified with a principal component extraction, the program will set it equal to the number of variables in the data set. Factor analysis is not available in EXCEL.
SPSS Windows
To select this procedure using SPSS for Windows, click:
Analyze>Data Reduction>Factor . . .
The following are the detailed steps for running principal components analysis on the toothpaste attribute ratings (V1 to V6) using the data of Table 19.1.
The procedure for running common factor analysis is similar, except that in step 5, for
METHOD select PRINCIPAL AXIS FACTORING.
Analyze>Multivariate>Factor Analysis
The following are the detailed steps for running principal components analysis on the toothpaste attribute ratings (V1 to V6) using the data of Table 19.1.
Project Activities
Download the SPSS or SAS data file Sears Data 17 from the Web site for this book. See Chapter 17 for a description of this file.
1. Can the 21 lifestyle statements be represented by a reduced set of factors? If so, what would be the interpretation of these factors? Conduct a principal components analysis and save the factor scores.
2. Can the importance attached to the eight factors of the choice criteria be represented by a reduced set of factors? If so, what would be the interpretation of these factors? Conduct a principal components analysis. ■
Summary
Factor analysis, also called exploratory factor analysis (EFA), is a class of procedures used for reducing and summarizing data. Each variable is expressed as a linear combination of the underlying factors. Likewise, the factors themselves can be expressed as linear combinations of the observed variables. The factors are extracted in such a way that the first factor accounts for the highest variance in the data, the second the next highest, and so on. Additionally, it is possible to extract the factors so that the factors are uncorrelated, as in principal components analysis. Figure 19.7 gives a concept map for factor analysis.
FIGURE 19.7
A Concept Map for Factor Analysis
In formulating the factor analysis problem, the variables to be included in the analysis should be specified based on past research, theory, and the judgment of the researcher. These variables should be measured on an interval or ratio scale. Factor analysis is based on a matrix of correlation between the variables. The appropriateness of the correlation matrix for factor analysis can be statistically tested.
The two basic approaches to factor analysis are principal components analysis and common factor analysis. In principal components analysis, the total variance in the data is considered. Principal components analysis is recommended when the researcher’s primary concern is to determine the minimum number of factors that will account for maximum variance in the data for use in subsequent multivariate analysis. In common factor analysis, the factors are estimated based only on the common variance. This method is appropriate when the primary concern is to identify the underlying dimensions, and when the common variance is of interest. This method is also known as principal axis factoring.
The number of factors that should be extracted can be determined a priori or based on eigenvalues, scree plots, percentage of variance, split-half reliability, or significance tests. Although the initial or unrotated factor matrix indicates the relationship between the factors and individual variables, it seldom results in factors that can be interpreted, because the factors are correlated with many variables. Therefore, rotation is used to transform the factor matrix into a simpler one that is easier to interpret. The most commonly used method of rotation is the varimax procedure, which results in orthogonal factors. If the factors are highly correlated in the population, oblique rotation can be utilized. The rotated factor matrix forms the basis for interpreting the factors. Factor scores can be computed for each respondent. Alternatively, surrogate variables may be selected by examining the factor matrix and selecting for each factor a variable with the highest or near highest loading. The differences between the observed correlations and the reproduced correlations, as estimated from the factor matrix, can be examined to determine model fit.
Video Case
23.1 Marriott