Factor Analysis
The perceptions of JPMorgan Chase on 38 attributes (Q1 to Q15) are examined for the following reasons:
• To understand whether these perceptions can be “grouped.”
• To reduce the 38 variables to a smaller number.
1. Adequate sample size: The number of cases must be greater than the number of factors. Here we
have 500 cases, which is much greater than the number of factors.
2. No perfect multicollinearity: Factor analysis is an interdependency technique, so there should
not be perfect multicollinearity between the variables.
From the above table we find that there is no perfect collinearity between the variables Q1_a
through Q1_l.
The above table represents the correlations between the variables Q6_a through Q6_m. From this
table we find that there is no perfect collinearity between the variables Q6_a through
Q6_m.
3. Linearity: Factor analysis is also based on a linearity assumption. Non-linear variables can be
used, but only after they have been transformed into linear variables.
From the above two plots we find that there is a linear relationship between the variables Q1_a
through Q1_l.
From the above two plots we find that there is a linear relationship between the variables Q6_a
through Q6_m.
F.1. Can the importance variables (Q1_a through Q1_l) be represented by a reduced set of factors? Conduct a principal
components analysis using varimax rotation. Save the factor scores.
The above table presents the KMO measure and Bartlett’s test. Bartlett’s test checks whether the
correlation matrix has an identity structure or differs significantly from one. From the above
table we find that the significance value is less than 0.01, so we conclude that the correlation
matrix of our measured variables is significantly different from an identity matrix. This is
consistent with the assumption that the matrix is factorable.
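As an illustration of what Bartlett’s test computes, the following is a minimal sketch, assuming the usual chi-square approximation based on the determinant of the correlation matrix. The function name and the synthetic data are hypothetical stand-ins and are not the survey data or the SPSS output.

```python
import numpy as np


def bartlett_sphericity(data):
    """Return (chi_square, degrees_of_freedom) for Bartlett's test of sphericity."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # Log-determinant of the correlation matrix; slogdet avoids underflow.
    _, log_det = np.linalg.slogdet(corr)
    chi_square = -(n - 1 - (2 * p + 5) / 6) * log_det
    dof = p * (p - 1) // 2
    return chi_square, dof


# Synthetic stand-in data (NOT the survey): 500 cases, 12 items, 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
items = latent @ rng.normal(size=(2, 12)) + rng.normal(size=(500, 12))

chi_square, dof = bartlett_sphericity(items)
print(f"chi-square = {chi_square:.1f} on {dof} degrees of freedom")
```

The significance value is then the upper-tail probability of a chi-square distribution with `dof` degrees of freedom. An identity correlation matrix has determinant 1, so the statistic is close to zero when the variables are uncorrelated.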
The above table lists all the components extracted before the number of factors is decided. A
total of 12 components was extracted, which is equal to the number of measured variables. Each
component has an eigenvalue that summarizes the amount of variation in the measured variables
accounted for by that component. The components are extracted in such a way that together they
summarize all of the variation and are orthogonal to each other.
In our PCA we retain only those components whose eigenvalue is greater than 1. Hence only the
first two components have been retained, and together they explain 57% of the variation.
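The eigenvalue-greater-than-1 rule can be sketched as follows. This is only an illustration on synthetic data, assuming the components are taken from the correlation matrix; the function name `retained_components` is hypothetical.

```python
import numpy as np


def retained_components(data, threshold=1.0):
    """Eigenvalues of the correlation matrix, their variance shares, and the
    number of components whose eigenvalue exceeds the threshold."""
    data = np.asarray(data, dtype=float)
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted largest first
    share = eigenvalues / eigenvalues.sum()       # proportion of variance
    n_retained = int(np.sum(eigenvalues > threshold))
    return eigenvalues, share, n_retained


# Synthetic stand-in data (NOT the survey): 500 cases, 12 items, 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
items = latent @ rng.normal(size=(2, 12)) + rng.normal(size=(500, 12))

eigenvalues, share, n_retained = retained_components(items)
print("components retained:", n_retained)
print("cumulative variance explained:", share[:n_retained].sum().round(3))
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue above 1 means the component accounts for more variation than a single original variable.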
The above scree plot confirms the number of components retained in our analysis, as only two
components have an eigenvalue above 1.
The above table shows the loading of each variable on the two components, that is, the
correlation between the variable and the component.
The above table shows the same loadings after rotation.
If we compare the two tables, we find that in the rotated table each variable loads more highly
on one of the components, which makes the components easier to interpret.
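To show what the rotation does to a loading matrix, here is a minimal sketch of a commonly used SVD-based varimax iteration. It is not the SPSS implementation, it omits Kaiser normalization, and the loading values are hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a (variables x components) loading matrix with raw varimax."""
    loadings = np.asarray(loadings, dtype=float)
    p, k = loadings.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        target = rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix
        current = s.sum()
        if previous != 0.0 and current / previous < 1 + tol:
            break
        previous = current
    return loadings @ rotation


def varimax_criterion(loadings):
    """Sum over components of the variance of the squared loadings."""
    return float(np.var(np.asarray(loadings) ** 2, axis=0).sum())


# Hypothetical unrotated loadings for four variables on two components.
unrotated = np.array([[0.7, 0.5], [0.6, 0.6], [0.7, -0.4], [0.6, -0.5]])
rotated = varimax(unrotated)
print(np.round(rotated, 3))
```

The rotation is orthogonal, so each variable keeps the same communality (row sum of squared loadings). Only the distribution of the loadings across the components changes.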
The above table shows the component score coefficients for all the variables, which are used to
compute the saved factor scores.
F.2. Can the ratings of the primary financial provider (Q6_a through Q6_m) be represented by a reduced set of factors?
Conduct a principal components analysis using varimax rotation. Save the factor scores.
The above table presents the KMO measure and Bartlett’s test. Bartlett’s test checks whether the
correlation matrix has an identity structure or differs significantly from one. From the above
table we find that the significance value is less than 0.01, so we conclude that the correlation
matrix of our measured variables is significantly different from an identity matrix. This is
consistent with the assumption that the matrix is factorable.
The above table lists all the components extracted before the number of factors is decided. A
total of 13 components was extracted, which is equal to the number of measured variables. Each
component has an eigenvalue that summarizes the amount of variation in the measured variables
accounted for by that component. The components are extracted in such a way that together they
summarize all of the variation and are orthogonal to each other.
In our PCA we retain only those components whose eigenvalue is greater than 1. Hence only the
first two components have been retained, and together they explain 58.302% of the variation.
The above scree plot confirms the number of components retained in our analysis, as only two
components have an eigenvalue above 1.
The above table shows the loading of each variable on the two components, that is, the
correlation between the variable and the component.
The above table shows the same loadings after rotation.
If we compare the two tables, we find that in the rotated table each variable loads more highly
on one of the components, which makes the components easier to interpret.
The above table shows the component score coefficients for all the variables, which are used to
compute the saved factor scores.
F.3. Can the likelihood of “recommend your primary provider to someone you know” (Q2) be explained by the factor scores
of ratings of the primary financial provider (Q6_a through Q6_m) when these factor scores are considered simultaneously?
To analyse this we use regression analysis to check the overall impact of the factor scores of the ratings of
the primary financial provider (Q6_a through Q6_m) on “recommend your primary provider to someone you know” (Q2).
From the above table we can see that the value of adjusted R square is 0.301, which shows that the factors
explain only 30.1% of the variation in the likelihood to “recommend your primary provider to someone you know” (Q2).
From the above table we find that the significance values of all three factors are lower than 0.05.
Hence each of them has a significant impact on Q2, although together they explain only 30.1% of its variation.
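The relationship between R square and adjusted R square can be sketched as follows. This is a minimal least-squares illustration on synthetic data; the variable names and coefficients are hypothetical and do not reproduce the 0.301 reported above.

```python
import numpy as np


def ols_adjusted_r2(X, y):
    """Fit y = b0 + X b by least squares; return (coefficients, R2, adjusted R2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # add the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1 - ss_res / ss_tot
    # Penalize R2 for the number of predictors k.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return coef, r2, adj_r2


# Synthetic stand-in: two factor scores and a noisy outcome (NOT the survey data).
rng = np.random.default_rng(0)
scores = rng.normal(size=(500, 2))
outcome = 0.5 * scores[:, 0] + 0.3 * scores[:, 1] + rng.normal(size=500)

coef, r2, adj_r2 = ols_adjusted_r2(scores, outcome)
print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}")
```

With 500 cases and only a few predictors the adjustment is small, so R square and adjusted R square are close to each other.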
F.4. Can the likelihood of “continue to use your primary provider at least at the same level as up to now” (Q3) be explained
by the factor scores of ratings of the primary financial provider (Q6_a through Q6_m) when these factor scores are
considered simultaneously?
To analyze this we use regression analysis to check the overall impact of the factor scores of the ratings
of the primary financial provider (Q6_a through Q6_m) on “continue to use your primary provider at least at the same level
as up to now” (Q3).
From the above table we can see that the value of adjusted R square is 0.236, which shows that the factors
explain only 23.6% of the variation in the likelihood to “continue to use your primary provider at least at the same level as up to now” (Q3).
From the above table we find that the significance values of factor score 1 and factor score 2 are lower
than 0.05. Hence they have a significant impact on Q3, although together they explain only 23.6% of its variation.
F.5. Do the factor scores of ratings of the primary financial provider (Q6_a through Q6_m) considered simultaneously
explain who switched some assets from one investment/savings provider to another and who did not (Q7)?
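We can analyze this by performing a logistic regression, taking the factor scores as the independent
variables and Q7 as the dependent variable.
From the above table we find that the value of Cox and Snell R square is 0.007, which indicates that
the factor scores explain very little of the variation in Q7.
The above table shows an overall classification accuracy of 85.6%. Given the very low Cox and Snell
R square, this accuracy should be compared with the share of the larger Q7 group before it is read as
evidence of predictive power.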
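The two logistic regression figures can be related with a short sketch. It assumes the standard Cox and Snell definition based on the null and fitted log-likelihoods. The counts used below are hypothetical, chosen only so that the majority-class baseline equals 85.6%; the actual Q7 split is not given in this report.

```python
import math


def cox_snell_r2(ll_null, ll_model, n):
    """Cox and Snell R2 from the null and fitted model log-likelihoods."""
    return 1 - math.exp(2 * (ll_null - ll_model) / n)


def null_log_likelihood(n_events, n):
    """Log-likelihood of the intercept-only model for a binary outcome."""
    p = n_events / n
    return n_events * math.log(p) + (n - n_events) * math.log(1 - p)


def majority_class_accuracy(n_events, n):
    """Accuracy obtained by always predicting the larger group."""
    return max(n_events, n - n_events) / n


# HYPOTHETICAL counts: 72 switchers out of 500 respondents.
n, switchers = 500, 72
ll_null = null_log_likelihood(switchers, n)
# Hypothetical fitted model that improves the log-likelihood by only 1.75.
ll_model = ll_null + 1.75

print("baseline accuracy:", majority_class_accuracy(switchers, n))
print("Cox and Snell R2:", round(cox_snell_r2(ll_null, ll_model, n), 3))
```

Under these assumed counts, a model that barely improves the log-likelihood still classifies most cases correctly, because always predicting the larger group is already accurate.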
F.6. Do the factor scores of ratings of the primary financial provider (Q6_a through Q6_m) considered simultaneously
explain the various decision-making approaches (Q8)?
To analyze this we use regression analysis to check the overall impact of the factor scores of the ratings
of the primary financial provider (Q6_a through Q6_m) on the various decision-making approaches (Q8).
From the above table we can see that the value of adjusted R square is 0.133, which shows that the factors
explain only 13.3% of the variation in the decision-making approaches (Q8).
From the above table we find that the significance values of all the factor scores are lower than 0.05.
Hence they have a significant impact on Q8, although together they explain only 13.3% of its variation.
F.7. Do the factor scores of the importance variables (Q1_a through Q1_l) considered simultaneously explain the various
decision-making approaches (Q8)?
To analyse this we use regression analysis to check the overall impact of the factor scores of the
importance variables (Q1_a through Q1_l) on the various decision-making approaches (Q8).
From the above table we can see that the value of adjusted R square is 0.155, which shows that the factors
explain only 15.5% of the variation in the decision-making approaches (Q8).
From the above table we find that the significance values of all the factor scores are lower than 0.05.
Hence they have a significant impact on Q8, although together they explain only 15.5% of its variation.
The above table shows that the first data set comprises 2 different variables, each with 500
data points and no missing values.
Descriptive Statistics
Variable                                                                     N  Min  Max  Mean  Std. Dev.  Skewness  Std. Error  Kurtosis  Std. Error
Performance of investments with this provider                              500    1    5  4.28       .875    -1.760        .109     4.046        .218
Fees or commissions charged                                                500    1    5  3.79       .862     -.492        .109      .505        .218
Depth of products and services to meet the range of your investment needs  500    1    5  3.79       .929     -.947        .109     1.142        .218
Ability to resolve problems                                                500    1    5  3.95       .925    -1.095        .109     1.515        .218
Online services offered                                                    500    1    5  2.66      1.255      .193        .109     -.892        .218
Multiple providers' products to choose from                                500    1    5  3.36      1.088     -.494        .109     -.227        .218
Quality of advice                                                          500    1    5  4.24       .894    -1.449        .109     2.471        .218
Knowledge of representatives or advisors you deal with                     500    1    5  4.20       .922    -1.517        .109     2.774        .218
Representative knowing your overall situation and needs                    500    1    5  3.99      1.033    -1.107        .109      .894        .218
Access to other professional resources                                     500    1    5  3.40      1.025     -.396        .109     -.033        .218
Degree to which my provider knows me                                       500    1    5  3.68      1.066     -.729        .109      .086        .218
Quality of service                                                         500    1    5  4.43       .668    -1.439        .109     4.303        .218
Valid N (listwise)                                                         500
The table presents the descriptive statistics of the first sample (Q1_a through Q1_l), which we are
going to use for our cluster analysis.
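The standard errors of .109 and .218 in the descriptive statistics depend only on the sample size. The sketch below assumes the usual formulas for the standard errors of sample skewness and kurtosis; the function names are hypothetical.

```python
import math


def se_skewness(n):
    """Standard error of sample skewness for n cases."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample excess kurtosis for n cases."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


n = 500
print(round(se_skewness(n), 3), round(se_kurtosis(n), 3))

# Ratio of a skewness statistic to its standard error, using the value reported
# for "Performance of investments with this provider".
print(-1.760 / se_skewness(n))
```

A ratio far outside the range of about -2 to 2 is a common rule of thumb for a skewed distribution.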
Tests of Normality
(K-S = Kolmogorov-Smirnov, S-W = Shapiro-Wilk)
Variable                                                                   K-S Stat.  K-S df  K-S Sig.  S-W Stat.  S-W df  S-W Sig.
Performance of investments with this provider                                   .267     500      .000       .716     500      .000
Fees or commissions charged                                                     .237     500      .000       .854     500      .000
Depth of products and services to meet the range of your investment needs       .295     500      .000       .836     500      .000
Ability to resolve problems                                                     .291     500      .000       .817     500      .000
Online services offered                                                         .183     500      .000       .893     500      .000
Multiple providers' products to choose from                                     .208     500      .000       .893     500      .000
Quality of advice                                                               .257     500      .000       .758     500      .000
Knowledge of representatives or advisors you deal with                          .262     500      .000       .754     500      .000
Representative knowing your overall situation and needs                         .264     500      .000       .816     500      .000
Access to other professional resources                                          .201     500      .000       .892     500      .000
Degree to which my provider knows me                                            .251     500      .000       .871     500      .000
Quality of service                                                              .306     500      .000       .706     500      .000
The above table presents the normality tests for the data that we will use in our analysis. From the
above table we can see that the significance value of the Shapiro-Wilk test is less than 0.05 in all
cases. Hence the data are not normally distributed, and the data set contains a considerable number of
outliers. In the previous section we tried to normalize these data with different methods but could
not, so the outliers remain in the data set.
Descriptive Statistics
The table presents the descriptive statistics of the second sample, the factor scores of Q1_a
through Q1_l, which we are going to use for our cluster analysis.
Tests of Normality (Kolmogorov-Smirnov and Shapiro-Wilk)
The above table presents the normality tests for the data that we will use in our analysis. From the
above table we can see that the significance value of the Shapiro-Wilk test is less than 0.05 in all
cases. Hence the data are not normally distributed, and the data set contains a considerable number of
outliers. In the previous section we tried to normalize these data with different methods but could
not, so the outliers remain in the data set.
The above table represents the correlations between all the variables. From the above table
we can see that every Pearson coefficient is less than 0.8. Hence there is no substantial
multicollinearity between the variables.
The above table represents the correlations between the variables of the second data set which we
are going to use. None of these variables has a Pearson correlation greater than 0.8, so there is
no substantial multicollinearity between the variables.
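The 0.8 check on the correlation matrix can be sketched as follows. The data are synthetic and the function name is hypothetical; the threshold of 0.8 is the one used above.

```python
import numpy as np


def max_abs_correlation(data):
    """Largest absolute Pearson correlation between two different variables."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)  # ignore the diagonal of 1s
    return float(np.abs(corr[off_diagonal]).max())


# Synthetic stand-in data (NOT the survey): three unrelated variables.
rng = np.random.default_rng(1)
independent = rng.normal(size=(500, 3))
# Add a fourth variable that is almost a copy of the first one.
near_copy = independent[:, 0] + 0.1 * rng.normal(size=500)
collinear = np.column_stack([independent, near_copy])

print("unrelated variables pass the check:", max_abs_correlation(independent) < 0.8)
print("near-duplicate variable fails it:", max_abs_correlation(collinear) > 0.8)
```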
G.1. Cluster the respondents based on the importance variables (Q1_a through Q1_l). Use K-means clustering and specify a
two-cluster solution. Interpret the resulting clusters.
The initial cluster centers are the variable values of the 2 well-spaced observations.
1. The iteration history shows the progress of the clustering process at each step.
2. In early iterations, the cluster centers shift quite a lot.
3. By the 10th iteration, they have settled down to the general area of their final
location, and the last four iterations are minor adjustments.
The above ANOVA table indicates which variables contribute the most to our cluster solution.
Variables with large F values provide the greatest separation between clusters.
The final cluster centers are computed as the mean for each variable within each final cluster. The
final cluster centers reflect the characteristics of the typical case for each cluster.
Cluster 1 represents those respondents who gave higher ratings, whereas Cluster 2
comprises those respondents who gave lower ratings.
This table shows the Euclidean distances between the final cluster centers. Greater distances between
clusters correspond to greater dissimilarities.
• The distance between Clusters 1 and 2 is 3.728, which indicates that the two clusters are well separated.
These relationships between the clusters can also be intuited from the final cluster centers, but this
becomes more difficult as the number of clusters and variables increases.
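The steps described above (initial centers, iterations, final centers and the distance between them) can be sketched with a plain K-means loop. This is a minimal illustration on synthetic data. Unlike the procedure described above, it starts from randomly chosen cases rather than well-spaced ones, and the function name is hypothetical.

```python
import numpy as np


def kmeans(data, k=2, max_iter=100, seed=0):
    """Plain K-means; returns (final cluster centers, cluster labels)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: random cases as initial centers.
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # Euclidean distance from every case to every center.
        distances = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # New center = mean of the cases assigned to the cluster.
        new_centers = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # the centers have stopped moving
        centers = new_centers
    return centers, labels


# Synthetic stand-in data (NOT the survey): two well-separated groups of 50 cases.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
group_b = rng.normal(loc=10.0, scale=1.0, size=(50, 2))
points = np.vstack([group_a, group_b])

centers, labels = kmeans(points, k=2)
distance = float(np.linalg.norm(centers[0] - centers[1]))
print("cases per cluster:", np.bincount(labels, minlength=2))
print("distance between final centers:", round(distance, 3))
```

The distance printed at the end is the same quantity as the Euclidean distance between the final cluster centers discussed above.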
A large number of cases have been classified into Cluster 1, which, encouragingly, is the cluster
with the higher ratings.
G.2. Cluster the respondents based on the factor scores of the importance variables (Q1_a through Q1_l). Use K-means
clustering and specify a two-cluster solution. Interpret the resulting clusters. Compare your results to those obtained by
clustering on the original importance variables.
The initial cluster centers are the variable values of the 2 well-spaced observations.
1. The iteration history shows the progress of the clustering process at each step.
2. In early iterations, the cluster centers shift quite a lot.
3. By the 10th iteration, they have settled down to the general area of their final
location, and the last five iterations are minor adjustments.
The above ANOVA table indicates which variables contribute the most to our cluster solution.
Variables with large F values provide the greatest separation between clusters.
The final cluster centers are computed as the mean for each variable within each final cluster. The
final cluster centers reflect the characteristics of the typical case for each cluster.
Cluster 1 represents those respondents who gave higher ratings, whereas Cluster 2
comprises those respondents who gave lower ratings.
This table shows the Euclidean distances between the final cluster centers. Greater distances between
clusters correspond to greater dissimilarities.
• The distance between Clusters 1 and 2 is only 1.865, so the two clusters are not strongly separated.
A large number of cases have been classified into Cluster 1, which, encouragingly, is the cluster
with the higher ratings.
Section H: Report and Presentation
H.1. Write a report for JPMorgan Chase based on all the analyses that you have conducted. What do you recommend that
JPMorgan Chase do in order to continue to grow?
From the above analysis we have found that a number of significant factors affect the continued
growth of the company. The most important factors with a positive impact on growth are
“Overall satisfaction level”, “Degree to which my provider knows me”,
“Representative knowing your overall situation and needs”, and “Quality of advice”, as the
coefficients of all these variables are positive and significant in the regression model. The
factors with a negative impact on the growth of the company are “Fees or
commissions charged”, “Online services offered”, “Knowledge of representatives or advisors you deal
with”, and “Quality of service”, as the coefficients of all these factors are negative and
significant in the regression model. Also, our analysis found no significant impact of
demographic factors on the growth of the company. So, in order to sustain its growth,
the company needs to strengthen the factors that have a positive impact and address
those that have a negative impact.