Principal Components Analysis
The purpose of principal components factor analysis is to reduce the number of variables in the analysis by using a surrogate variable, or factor, to represent a number of variables, while retaining the variance that was present in the original variables. The data analysis indicates the relationship between the original variables and the factors, so that we know how to make the substitutions. Principal components analysis is frequently used to simplify a data set prior to conducting a multiple regression or discriminant analysis. To demonstrate principal components analysis, we will use the sample problem in the text, which begins on page 120.
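For readers who want to experiment outside SPSS, here is a minimal sketch of the same idea in Python using scikit-learn. The data array, the six variables, and the choice of two components are illustrative assumptions, not the text's sample problem:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder data: 100 cases measured on 6 metric variables
X = np.random.default_rng(0).normal(size=(100, 6))

# Standardize first so the components are extracted from the correlation matrix
Z = StandardScaler().fit_transform(X)

pca = PCA(n_components=2).fit(Z)
scores = pca.transform(Z)                    # two surrogate variables replace six
print(pca.explained_variance_ratio_.sum())   # variance retained by the components
```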
Slide 1
Slide 2
The variables included in the analysis are core elements of the business. There do not appear to be any extraneous variables.
Slide 3
Slide 5
First, highlight the variables to include in the analysis. Second, click on the move arrow to move the highlighted variables to the 'Variables:' list.
Slide 6
Third, mark the checkboxes for 'Coefficients', 'KMO and Bartlett's test of sphericity', and 'Anti-image' on the 'Correlation Matrix' panel. Clear all other checkboxes.
Fourth, click on the 'Continue' button to complete the 'Factor Analysis: Descriptives' dialog box.
Slide 7
First, click on the 'Extraction...' button.
Fourth, mark the checkboxes for 'Unrotated factor solution' and 'Scree plot' on the 'Display' panel.
Fifth, accept the default values of 'Eigenvalues over: 1' on the 'Extract' panel and the 'Maximum Iterations for Convergence: 25'.
Sixth, click on the 'Continue' button to complete the dialog box.
Slide 8
Third, mark the checkbox for 'Rotated solution' on the 'Display' panel. Clear all other checkboxes.
Slide 9
Slide 10
Slide 11
Interpretive adjectives for the Kaiser-Meyer-Olkin Measure of Sampling Adequacy are: in the 0.90's, marvelous; in the 0.80's, meritorious; in the 0.70's, middling; in the 0.60's, mediocre; in the 0.50's, miserable; and below 0.50, unacceptable. The value of the KMO Measure of Sampling Adequacy for this set of variables is .446, falling below the acceptable level. We will examine the anti-image correlation matrix to see if it provides us with any possible remedies.
Slide 12
The Anti-image Correlation Matrix contains the measures of sampling adequacy for the individual variables on the diagonal of the matrix, highlighted in cyan. The measures for three variables fall below the acceptable level of 0.50: X1 'Delivery Speed' (.344), X2 'Price Level' (.330), and X5 'Service' (.288). The corrective action is to delete the variables one at a time, starting with the one with the smallest value, until the problem is corrected.
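As a cross-check outside SPSS, the overall KMO statistic and the per-variable measures of sampling adequacy on the anti-image diagonal can be reproduced with the third-party factor_analyzer package. The file and column names below are assumptions for illustration:

```python
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_kmo

# Hypothetical file/column names for the seven original variables (X1-X7)
df = pd.read_csv("hatco.csv")[["x1", "x2", "x3", "x4", "x5", "x6", "x7"]]

kmo_per_variable, kmo_overall = calculate_kmo(df)
print(kmo_overall)        # overall MSA (.446 in the text's run)
print(kmo_per_variable)   # per-variable MSAs from the anti-image diagonal
# Remedy: drop the variable with the smallest MSA below .50 and re-run
```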
Slide 13
Fourth, click on the move arrow to return 'Service (X5)' to the list of available variables.
Slide 14
Bartlett's test of sphericity tests the hypothesis that the correlation matrix is an identity matrix; i.e., all diagonal elements are 1 and all off-diagonal elements are 0, implying that all of the variables are uncorrelated. If the Sig. value for this test is less than our alpha level, we reject the null hypothesis that the population matrix is an identity matrix. The Sig. value for this analysis leads us to reject the null hypothesis and conclude that there are correlations in the data set that are appropriate for factor analysis.
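The test statistic itself is simple enough to compute by hand: chi-square = -(n - 1 - (2p + 5)/6) * ln|R|, with p(p - 1)/2 degrees of freedom, where R is the correlation matrix of p variables observed on n cases. A sketch in Python (the input array X is assumed, not taken from the text):

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(X):
    """Bartlett's test that the population correlation matrix is an identity matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, stats.chi2.sf(chi2, df)   # statistic and Sig. (p) value

# Usage: chi2, sig = bartlett_sphericity(X); reject the null when sig < alpha
```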
Slide 15
The new anti-image correlation matrix indicates that the sampling adequacy for each variable is above the 0.50 threshold.
Slide 16
Slide 17
Slide 18
Slide 19
Slide 20
Slide 21
Slide 22
In this criterion, we count the number of components that would be necessary to explain 70% or more of the variance in the original set of variables. In this analysis, we reach the 70% minimum with two components.
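The same count can be read off the cumulative proportions of the eigenvalues of the correlation matrix. A sketch in Python, assuming hypothetical file and column names for the six retained variables:

```python
import numpy as np
import pandas as pd

# Hypothetical file/column names for the six variables retained after dropping X5
X = pd.read_csv("hatco.csv")[["x1", "x2", "x3", "x4", "x6", "x7"]].to_numpy()

R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # eigenvalues, largest first

cum = np.cumsum(eigvals / eigvals.sum())         # cumulative proportion of variance
n_components = int(np.argmax(cum >= 0.70)) + 1   # first count reaching 70%
print(cum, n_components)
```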
Slide 23
In my analysis of the scree plot, the eigenvalues level off beginning with the third eigenvalue. The number of components to retain corresponds to the number of eigenvalues before the line levels off. Therefore, we would retain two components, which corresponds to the number determined by the latent root criterion. (NOTE: in applying this test, the text identifies three components using their interpretation of the criteria).
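A scree plot is equally easy to draw outside SPSS; this sketch reuses the hypothetical file and column names from the previous sketch:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

X = pd.read_csv("hatco.csv")[["x1", "x2", "x3", "x4", "x6", "x7"]].to_numpy()
eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

plt.plot(np.arange(1, len(eigvals) + 1), eigvals, marker="o")
plt.axhline(1.0, linestyle="--")   # eigenvalue = 1 line (latent root criterion)
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```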
Slide 24
Slide 25
Slide 26
The table of Communalities for this analysis shows communalities for all variables above 0.50, so we would not exclude any variables on the basis of low communalities. If we did exclude a variable for a low communality, we should re-run the factor analysis without that variable before proceeding.
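A communality is the sum of a variable's squared loadings across the retained components, so the table can be reproduced from the eigen-decomposition of the correlation matrix. A sketch, again with the hypothetical file and column names used above:

```python
import numpy as np
import pandas as pd

X = pd.read_csv("hatco.csv")[["x1", "x2", "x3", "x4", "x6", "x7"]].to_numpy()
R = np.corrcoef(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:2]                    # retain two components
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])   # unrotated loadings

communalities = (loadings ** 2).sum(axis=1)   # shared variance per variable
low = np.where(communalities < 0.50)[0]       # candidates for exclusion, if any
```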
Slide 27
Slide 28
In this component matrix, each variable does have one substantial loading on a component. If one or more variables did not have a substantial loading on a factor, we would re-run the factor analysis excluding those variables one at a time, until we have a solution in which all of the variables in the analysis load on at least one factor.
In this component matrix, each of the original variables also has a substantial loading on only one factor. If a variable had a substantial loading on more than one factor, we would refer to that variable as "complex," meaning that it has a relationship to two or more of the derived factors. There are a variety of prescriptions for handling complex variables. The simplest prescription is to ignore the complexity and treat the variable as belonging to the factor on which it has the highest loading. A second simple solution to complexity is to eliminate the complex variable from the factor analysis. I have seen other instances where authors chose to include it as a variable in multiple factors, or to assign it to a factor for conceptual reasons. Other prescriptions are to try different methods of factor extraction and rotation to see if a more interpretable solution can be found.
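Both screening rules can be expressed as simple tests on the loading matrix. In the sketch below, the 0.40 cutoff for a "substantial" loading is an assumption (texts vary on the threshold), and loadings is the matrix computed in the previous sketch:

```python
import numpy as np

threshold = 0.40                              # assumed cutoff for "substantial"
substantial = np.abs(loadings) >= threshold   # loadings: from the previous sketch
per_variable = substantial.sum(axis=1)

no_loading = np.where(per_variable == 0)[0]   # remove one at a time, re-running
complex_vars = np.where(per_variable > 1)[0]  # "complex": loads on 2+ components
```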
Slide 29
Slide 30
Slide 31
First, select the 'Random Number Seed...' command from the 'Transform' menu.
Second, click on the 'Set seed to:' option to access the text box for the seed number. Third, type '34567' in the 'Set seed to:' text box. (This is the same random number seed specified by the authors on page 705 of the text.) Fourth, click on the OK button to complete this action.
Slide 32
Compute the Variable to Randomly Split the Sample into Two Halves
First, select the 'Compute...' command from the Transform menu.
Second, create a new variable named 'split' that has the values 1 and 0 to divide the sample into two parts. Type the name 'split' into the 'Target Variable:' text box.
Third, type the formula 'uniform(1) > 0.52' in the 'Numeric Expression:' text box. The uniform function will generate a random number between 0.0 and 1.0 for each case. If the generated random number is greater than 0.52, the numeric expression will result in a 1, since the numeric expression is true. If the generated random number is 0.52 or less, the numeric expression will produce a 0, since its value is false. In many computer programs, true is represented by the number 1 and false is represented by a 0.
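The same split can be mimicked in Python. Note that NumPy's random number generator differs from SPSS's uniform(1), so seeding with 34567 will not reproduce SPSS's case-by-case assignments; the sample size of 100 below is also an assumption for illustration:

```python
import numpy as np

# Seed from the text; NumPy's generator differs from SPSS's, so the
# resulting 0/1 assignment will not match SPSS case for case
rng = np.random.default_rng(34567)

u = rng.uniform(0.0, 1.0, size=100)   # one random draw per case (assumed n = 100)
split = (u > 0.52).astype(int)        # True -> 1, False -> 0
```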
Slide 33
Compute the Factor Analysis for the First Half of the Sample
First, select the 'Data Reduction | Factor...' command from the Analyze menu.
Second, highlight the 'split' variable and click on the move button to put it into the 'Selection Variable:' text box.
Slide 34
First, click on the 'Value...' button which was activated when the split variable moved to the 'Selection Variable:' text box. Second, type the value 0 into the 'Value for Selection Variable:' text box so that the first analysis uses the cases with split=0. Third, click on the Continue button to complete the value assignment. Click on the OK button in the Factor Analysis dialog to compute the factor analysis for the first half of the sample.
Slide 35
Compute the Factor Analysis for the Second Half of the Sample
First, select the 'Data Reduction | Factor...' command from the Analyze menu. Second, click on the 'Selection Variable:' text box to highlight it.
Third, click on the 'Value...' button which was activated when the 'Selection Variable:' text box was highlighted.
Fourth, type the value 1 into the 'Value for Selection Variable:' text box to replace the '0' in the 'split=0' entry in the 'Selection Variable:' text box with 'split=1'.
Fifth, click on the Continue button to complete the value assignment. Click on the OK button in the Factor Analysis dialog to compute the factor analysis for the second half of the sample.
Slide 36
Slide 37
Slide 38
2. Identification of Outliers
SPSS proposes a strategy for identifying outliers that is not found in the text (see: SPSS Base 7.5 Applications Guide, pp. 303-304). SPSS computes the factor scores as standard scores with a mean of 0 and a standard deviation of 1. We can examine the factor scores to see if any are above or below the standard score size associated with extreme cases, i.e., ±2.5 or ±3.0. For this analysis, we will need to compute the factor scores, which we have not requested to this point.
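Once the factor-score variables exist, flagging extreme cases is a one-line comparison. A sketch, assuming the scores have been exported to a file with the variable names SPSS assigns (FAC1_1, FAC2_1); the file name is hypothetical:

```python
import numpy as np
import pandas as pd

scores = pd.read_csv("hatco_with_scores.csv")[["FAC1_1", "FAC2_1"]].to_numpy()

cutoff = 2.5                              # or 3.0 for a stricter definition
extreme = np.abs(scores) > cutoff         # factor scores are standardized
print(np.where(extreme.any(axis=1))[0])   # candidate outlier cases
```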
Slide 39
First, we re-open the Factor Analysis dialog box by selecting the 'Data Reduction | Factor...' command from the Analyze menu.
Second, we highlight the 'split=1' selection variable and click on the move arrow to remove it, so that the factor scores are computed using the parameters for the full sample.
Slide 40
Third, we accept the default 'Regression' method for computing the factor scores.
Fourth, we click on the 'Continue' button to close the 'Factor Analysis: Factor Scores' dialog and the OK button to request the output.
Slide 41
SPSS adds variables for the factor scores to the data set.
Slide 42
First, select the 'Descriptive Statistics | Explore' command from the Analyze menu.
Second, move the 'FAC1_1 REGR factor score 1 for analysis 1' and 'FAC2_1 REGR factor score 2 for analysis 1' variables computed by the Factor Analysis to the 'Dependent List:' list box.
Third, move the ID variable to the 'Label Cases by:' text box so that the case ID will appear in the output listings.
Fourth, mark the 'Statistics' option on the Display panel. Fifth, click on the 'Statistics...' button to request the listing of outliers.
Slide 43
First, we mark the 'Outliers' check box and clear all other check boxes.
Slide 44
Slide 45
First, select the 'Select Cases...' command from the Data menu.
Second, mark the 'If condition is satisfied' option in the 'Select' panel.
Slide 46
Slide 47
Slide 48
Correlation Matrix

                     Delivery   Price   Price        Manufacturer  Salesforce  Product
                     Speed      Level   Flexibility  Image         Image       Quality
Delivery Speed        1.000     -.319    .487         -.039         -.020      -.450
Price Level           -.319     1.000   -.471          .353          .272       .449
Price Flexibility      .487     -.471   1.000         -.186         -.107      -.426
Manufacturer Image    -.039      .353   -.186         1.000          .761       .295
Salesforce Image      -.020      .272   -.107          .761         1.000       .284
Product Quality       -.450      .449   -.426          .295          .284      1.000
Slide 49
Slide 50
Slide 51
Another option for reducing the data set is to select one of the variables on each factor to use as a surrogate for all the variables that loaded on that factor.
A more common method for incorporating the results of the factor analysis is to create summated scale variables. In this method, the variables which load on each factor are simply summed to form the scale score, rather than using the weights or coefficients for each variable that SPSS uses in calculating factor scores.
Summated scales are easier to compute than weighted factor scores and can easily be applied to cases not included in the original factor analysis. When summated scales are used, it is customary to compute Cronbach's alpha as a measure of each scale's reliability.
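Computing the summated scales outside SPSS amounts to row sums over each factor's items. The file and column names below are hypothetical; x1, x2, x3, and x7 stand for the four variables loading on the first factor, and x4 and x6 for the two image variables on the second:

```python
import pandas as pd

df = pd.read_csv("hatco.csv")                             # hypothetical file
df["scale1"] = df[["x1", "x2", "x3", "x7"]].sum(axis=1)   # first-factor items
df["scale2"] = df[["x4", "x6"]].sum(axis=1)               # second-factor items
```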
Slide 52
Summated or additive scales are formed by summing the scores for a set of variables that load on a factor. If you incorporate summated or additive scales into your research, there is an expectation that you will include efforts to assess the reliability of your measures.
Recall that an underlying construct is a hypothetical variable that you wish to measure, but which cannot be directly measured. The observed variables, on the other hand, consist of measurements that are actually obtained. A reliability coefficient is defined as the percent of variance in an observed variable that is accounted for by the true scores on the underlying construct. Since it is generally not possible to obtain true scores on the underlying construct, reliability is usually defined in practice in terms of the consistency of the scores that are obtained on the observed variables; an instrument is said to be reliable if it is shown to provide consistent scores upon repeated administration, upon administration in alternate forms, and so forth.

A variety of methods for estimating scale reliability are used in practice. Test-retest reliability is assessed by administering the same instrument to the same sample of subjects at two points in time and computing the correlation between the two sets of scores. However, this can be a time-consuming and expensive procedure, in which you are collecting additional data that cannot be used in other analyses.

Because of the cost and time involved in test-retest procedures, indices of reliability that require only one administration are often used. The most popular of these indices are the internal consistency indices of reliability. Briefly, internal consistency is the extent to which the individual items that constitute a test correlate with one another or with the test total. In the social sciences, one of the most widely used indices of internal consistency is coefficient alpha, or Cronbach's alpha.
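Coefficient alpha has a closed form: alpha = (k / (k - 1)) * (1 - (sum of the item variances) / (variance of the summated scale)), for k items. A minimal sketch in Python:

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a cases-by-items array of scale item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    scale_var = items.sum(axis=1).var(ddof=1)    # variance of the summated scale
    return (k / (k - 1)) * (1.0 - item_vars.sum() / scale_var)
```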
Slide 53
Slide 54
First, select the 'Scale | Reliability Analysis...' from the Analyze menu.
Second, move the items loading on the first scale, Delivery Speed, Price Level, Price Flexibility, and Product Quality, to the list box of 'Items:'.
Third, select 'Alpha' from the drop down menu of 'Model:' choices.
Fourth, click on the 'Statistics...' button to specify the statistics we want included in the output.
Slide 55
First, we mark the check boxes for 'Scale' and 'Scale if item deleted' and clear all other check boxes. If the obtained value of coefficient alpha is below the acceptable criterion, these statistics will suggest a remedy for correcting the problem.
Second, click on the 'Continue' button to close the 'Reliability Analysis: Statistics' dialog box.
Third, click on the OK button to close the 'Reliability Analysis' dialog box.
Slide 56
Slide 57
First, select the 'Scale | Reliability Analysis...' from the Analyze menu.
Second, remove the variables for the first scale from the 'Items:' list box and move the items for the second scale, Manufacturer Image and Salesforce Image, to the list box of 'Items:'.
Third, all other specifications remain the same, so we click on the OK button to produce the output.
Slide 58
Slide 59