Principal Component Analysis

Principal component analysis (PCA) and factor analysis are statistical techniques used to analyze interrelationships among variables and reduce dimensionality in a dataset. PCA transforms variables into principal components that retain maximum variation in the original variables while being uncorrelated. Factor analysis describes covariance relationships between variables in terms of underlying random factors inferred from observed variables. Both techniques simplify data by grouping related variables.

Fall 2017

Principal Component Analysis & Factor Analysis


PCA
• The central idea of principal component analysis (PCA) is to reduce the dimensionality of a dataset consisting of a large number of interrelated variables, while retaining as much as possible of the variation present in the dataset.
• This is achieved by transforming to a new set of variables, the principal components (PCs), which are uncorrelated and which are ordered so that the first few retain most of the variation present in all of the original variables.
Mathematics Behind PCA
• The process of obtaining principal components from a raw dataset can be simplified into six steps:
• Take the whole dataset consisting of d dimensions (ignoring the labels).
• Compute the mean for every dimension.
• Compute the variance-covariance matrix of the whole dataset.
• Compute the eigenvectors and the corresponding eigenvalues.
• Sort the eigenvectors by decreasing eigenvalue and choose the k eigenvectors with the largest eigenvalues to form a d × k dimensional matrix.
• Use this d × k eigenvector matrix to transform the samples onto the new subspace.
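The six steps above can be sketched in a few lines of NumPy (a minimal illustration, not an optimized implementation; the covariance divides by n, matching the worked example that follows):

```python
import numpy as np

def pca(X, k):
    """Project an (n_samples, d) array X onto its top-k principal components."""
    # Steps 1-2: take the data and compute the mean of every dimension.
    a = X - X.mean(axis=0)              # centered data
    # Step 3: variance-covariance matrix (dividing by n, as in the slides).
    V = a.T @ a / X.shape[0]
    # Step 4: eigenvectors and eigenvalues (eigh, since V is symmetric).
    eigvals, eigvecs = np.linalg.eigh(V)
    # Step 5: sort by decreasing eigenvalue and keep the k largest,
    # giving a d x k eigenvector matrix.
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]
    # Step 6: transform the samples onto the new subspace.
    return a @ W
```

Because the eigenvectors are orthogonal, the resulting components are uncorrelated, as the definition above requires.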
Example: Data
Scores of five students on three tests (Math, English, Art), with labels removed:

           90  60  90
           90  90  30
    A  =   60  60  60
           60  60  90
           30  30  30

Dimensions & their mean
The mean of each dimension (column) is 66, 60, 60.

A − Ā = a

    90  60  90     66  60  60      24    0   30
    90  90  30     66  60  60      24   30  -30
    60  60  60  −  66  60  60  =   -6    0    0
    60  60  90     66  60  60      -6    0   30
    30  30  30     66  60  60     -36  -30  -30


a′ · a

                                24    0   30
    24   24   -6   -6  -36      24   30  -30      2520  1800   900
     0   30    0    0  -30  ·   -6    0    0   =  1800  1800     0
    30  -30    0   30  -30      -6    0   30       900     0  3600
                               -36  -30  -30
V = a′ · a / n

         2520/5  1800/5   900/5      504  360  180
    V =  1800/5  1800/5     0/5   =  360  360    0
          900/5     0/5  3600/5      180    0  720


Formula: V = a′ · a / n, i.e. cov(X, Y) = Σ (xᵢ − X̄)(yᵢ − Ȳ) / n. (This example divides by n; the sample covariance would divide by n − 1.)
Variance-Covariance Matrix

• The variance of the scores for each test appears along the diagonal. The art test has the largest variance (720) and the English test the smallest (360).

• The covariances appear in the off-diagonal elements of the matrix:

• The covariance between math and English (360) and between math and art (180) is positive.
• The covariance between English and art, however, is zero. This means there tends to be no predictable relationship between the movement of English and art scores.
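The covariance matrix from the worked example can be verified in a few lines of NumPy (a quick check of the arithmetic above; columns are Math, English, Art):

```python
import numpy as np

# Test scores from the example slides.
A = np.array([[90, 60, 90],
              [90, 90, 30],
              [60, 60, 60],
              [60, 60, 90],
              [30, 30, 30]], dtype=float)

a = A - A.mean(axis=0)      # centered scores
V = a.T @ a / len(A)        # divide by n, as in the slides

print(V)
# [[504. 360. 180.]
#  [360. 360.   0.]
#  [180.   0. 720.]]
```

The diagonal reproduces the variances (504, 360, 720) and the off-diagonals the covariances (360, 180, 0) discussed above.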
Factor Analysis
• Factor Analysis is an extension of Principal Component Analysis (PCA). Both models try to approximate the covariance matrix.

• The essential purpose of Factor Analysis is to describe the covariance relationships between several variables in terms of a few underlying and unobservable random components that we will call factors.

• In statistics, latent (unobservable) variables are variables that are not directly observed but are instead inferred, through a mathematical model, from other variables that are observed (directly measured).
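To make the idea concrete, here is a small sketch using scikit-learn's FactorAnalysis on synthetic data (the data, loadings, and noise level are all invented for illustration; this is not the slides' computation): two latent factors drive six observed variables, and the model recovers a loading matrix from the observations alone.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 500

# Two unobservable (latent) factors.
factors = rng.normal(size=(n, 2))

# Hypothetical true loadings: variables 1-3 load on factor 1, 4-6 on factor 2.
true_loadings = np.array([[0.9, 0.0],
                          [0.8, 0.1],
                          [0.7, 0.2],
                          [0.1, 0.9],
                          [0.0, 0.8],
                          [0.2, 0.7]])

# Observed variables = latent factors times loadings, plus noise.
X = factors @ true_loadings.T + 0.3 * rng.normal(size=(n, 6))

fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(X)
print(fa.components_.shape)   # estimated loadings: (n_factors, n_variables)
```

Only X is given to the model; the factors themselves are never observed, which is exactly the latent-variable setting described above.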
The Big 5 personality traits
• The Big Five personality traits is a taxonomy for personality traits built
using factor analysis.
• The five factors are:
• Openness to experience
• Conscientiousness
• Extraversion
• Agreeableness
• Neuroticism
• Beneath each proposed global factor there are a number of correlated and more specific primary factors.
• For example, extraversion includes such related qualities as gregariousness, assertiveness, excitement seeking, warmth, activity, and positive emotions.
Factor Analysis — Example

                                  F1      F2      F3
    X1: Convenient location     0.954  -0.234  -0.236
    X2: Near home               0.942   0.254   0.325
    X3: Value for money         0.251   0.723  -0.221
    X4: Attractive promotions   0.124   0.884  -0.251
    X5: Low prices             -0.132   0.952   0.122
    X6: Easy to locate items    0.114   0.231   0.945
    X7: Good service           -0.122   0.341   0.789
    X8: Ease of parking         0.181  -0.332   0.678
    X9: Efficient checkouts     0.238   0.102   0.988


Factor Analysis
• The prime objective of factor analysis is to simplify the data. Based on patterns in the data, the technique summarizes numerous variables into a few factors.
• For example, the 9 variables (attributes) in the previous slide are summarized as 3 factors.
Variables grouped into Factors
• Variables with high loadings help define the factor. For instance, the variables ‘value for money’, ‘attractive promotions’, and ‘low prices’ move in concert and are associated most strongly with F2.
Latent Factors

• Variables that define the same factor are usually grouped under their
respective factors.
Factor Loading & Naming
• The factor loading is the correlation between the variable and the factor. (For example, ‘convenient location’ has a correlation of 0.954 with factor F1.)

• There often exists some common meaning among the variables that define a factor. Factor naming is a subjective process that combines an understanding of the market with inspection of the variables that define the factor.
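The grouping described above can be automated: assign each variable to the factor on which it has the largest absolute loading. A short sketch using the loading matrix from the example table:

```python
import numpy as np

# Loading matrix from the example: rows X1..X9, columns F1..F3.
loadings = np.array([
    [ 0.954, -0.234, -0.236],   # X1: Convenient location
    [ 0.942,  0.254,  0.325],   # X2: Near home
    [ 0.251,  0.723, -0.221],   # X3: Value for money
    [ 0.124,  0.884, -0.251],   # X4: Attractive promotions
    [-0.132,  0.952,  0.122],   # X5: Low prices
    [ 0.114,  0.231,  0.945],   # X6: Easy to locate items
    [-0.122,  0.341,  0.789],   # X7: Good service
    [ 0.181, -0.332,  0.678],   # X8: Ease of parking
    [ 0.238,  0.102,  0.988],   # X9: Efficient checkouts
])

# For each variable, pick the factor with the largest absolute loading.
assignment = np.abs(loadings).argmax(axis=1) + 1   # 1-based factor numbers
print(assignment)   # [1 1 2 2 2 3 3 3 3]
```

This reproduces the grouping in the slides: X1-X2 define F1 (location), X3-X5 define F2 (price/value), and X6-X9 define F3 (in-store experience); naming those groups remains the subjective step.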
