Principal Component Analysis

Principal component analysis (PCA) and factor analysis are statistical techniques used to analyze interrelationships among variables and reduce dimensionality in a dataset. PCA transforms variables into principal components that retain maximum variation in the original variables while being uncorrelated. Factor analysis describes covariance relationships between variables in terms of underlying random factors inferred from observed variables. Both techniques simplify data by grouping related variables.

Fall 2017

Principal Component Analysis & Factor Analysis


PCA
• The central idea of principal component analysis (PCA) is to reduce the dimensionality of a dataset consisting of a large number of interrelated variables, while retaining as much as possible of the variation present in the dataset.
• This is achieved by transforming to a new set of variables, the principal components (PCs), which are uncorrelated and which are ordered so that the first few retain most of the variation present in all of the original variables.
Mathematics Behind PCA
• The process of obtaining principal components from a raw dataset can be simplified into six steps:
• Take the whole dataset consisting of d dimensions (ignoring the labels).
• Compute the mean for every dimension.
• Compute the variance-covariance matrix of the whole dataset.
• Compute the eigenvectors and the corresponding eigenvalues.
• Sort the eigenvectors by decreasing eigenvalue and choose the k eigenvectors with the largest eigenvalues to form a d × k dimensional matrix.
• Use this d × k eigenvector matrix to transform the samples onto the new subspace.
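The six steps above can be sketched in a few lines of NumPy (a minimal illustration, not an optimized implementation; the covariance divides by n, matching the worked example that follows):

```python
import numpy as np

def pca(X, k):
    """Project an (n_samples, d) array X onto its top-k principal components."""
    # Steps 1-2: take the data and compute the mean of every dimension.
    a = X - X.mean(axis=0)              # centered data
    # Step 3: variance-covariance matrix (dividing by n, as in the slides).
    V = a.T @ a / X.shape[0]
    # Step 4: eigenvectors and eigenvalues (eigh, since V is symmetric).
    eigvals, eigvecs = np.linalg.eigh(V)
    # Step 5: sort by decreasing eigenvalue and keep the k largest,
    # giving a d x k eigenvector matrix.
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]
    # Step 6: transform the samples onto the new subspace.
    return a @ W
```

Because the eigenvectors are orthogonal, the resulting components are uncorrelated, as the definition above requires.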
Example: Data
Scores of five students on three tests (Math, English, Art), with labels removed:

           90  60  90
           90  90  30
    A  =   60  60  60
           60  60  90
           30  30  30

Dimensions & their mean
The mean of each dimension (column) is 66, 60, 60.

A − Ā = a

    90  60  90     66  60  60      24    0   30
    90  90  30     66  60  60      24   30  -30
    60  60  60  −  66  60  60  =   -6    0    0
    60  60  90     66  60  60      -6    0   30
    30  30  30     66  60  60     -36  -30  -30


a′ · a

                                24    0   30
    24   24   -6   -6  -36      24   30  -30      2520  1800   900
     0   30    0    0  -30  ·   -6    0    0   =  1800  1800     0
    30  -30    0   30  -30      -6    0   30       900     0  3600
                               -36  -30  -30
V = a′ · a / n

         2520/5  1800/5   900/5      504  360  180
    V =  1800/5  1800/5     0/5   =  360  360    0
          900/5     0/5  3600/5      180    0  720


Formula: V = a′ · a / n, i.e. cov(X, Y) = Σ (xᵢ − X̄)(yᵢ − Ȳ) / n. (This example divides by n; the sample covariance would divide by n − 1.)
Variance-Covariance Matrix

• The variance of the scores for each test appears along the diagonal. The art test has the largest variance (720) and the English test the smallest (360).

• The covariances appear in the off-diagonal elements of the matrix:

• The covariance between math and English (360) and between math and art (180) is positive.
• The covariance between English and art, however, is zero. This means there tends to be no predictable relationship between the movement of English and art scores.
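The covariance matrix from the worked example can be verified in a few lines of NumPy (a quick check of the arithmetic above; columns are Math, English, Art):

```python
import numpy as np

# Test scores from the example slides.
A = np.array([[90, 60, 90],
              [90, 90, 30],
              [60, 60, 60],
              [60, 60, 90],
              [30, 30, 30]], dtype=float)

a = A - A.mean(axis=0)      # centered scores
V = a.T @ a / len(A)        # divide by n, as in the slides

print(V)
# [[504. 360. 180.]
#  [360. 360.   0.]
#  [180.   0. 720.]]
```

The diagonal reproduces the variances (504, 360, 720) and the off-diagonals the covariances (360, 180, 0) discussed above.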
Factor Analysis
• Factor Analysis is an extension of Principal Component Analysis (PCA). Both models try to approximate the covariance matrix.

• The essential purpose of Factor Analysis is to describe the covariance relationships between several variables in terms of a few underlying and unobservable random components that we will call factors.

• In statistics, latent (unobservable) variables are variables that are not directly observed but are instead inferred, through a mathematical model, from other variables that are observed (directly measured).
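To make the idea concrete, here is a small sketch using scikit-learn's FactorAnalysis on synthetic data (the data, loadings, and noise level are all invented for illustration; this is not the slides' computation): two latent factors drive six observed variables, and the model recovers a loading matrix from the observations alone.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 500

# Two unobservable (latent) factors.
factors = rng.normal(size=(n, 2))

# Hypothetical true loadings: variables 1-3 load on factor 1, 4-6 on factor 2.
true_loadings = np.array([[0.9, 0.0],
                          [0.8, 0.1],
                          [0.7, 0.2],
                          [0.1, 0.9],
                          [0.0, 0.8],
                          [0.2, 0.7]])

# Observed variables = latent factors times loadings, plus noise.
X = factors @ true_loadings.T + 0.3 * rng.normal(size=(n, 6))

fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(X)
print(fa.components_.shape)   # estimated loadings: (n_factors, n_variables)
```

Only X is given to the model; the factors themselves are never observed, which is exactly the latent-variable setting described above.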
The Big 5 personality traits
• The Big Five personality traits is a taxonomy for personality traits built
using factor analysis.
• The five factors are:
• Openness to experience
• Conscientiousness
• Extraversion
• Agreeableness
• Neuroticism
• Beneath each proposed global factor there are a number of correlated and more specific primary factors.
• For example, extraversion includes such related qualities as gregariousness, assertiveness, excitement seeking, warmth, activity, and positive emotions.
Factor Analysis — Example

                                  F1      F2      F3
    X1: Convenient location     0.954  -0.234  -0.236
    X2: Near home               0.942   0.254   0.325
    X3: Value for money         0.251   0.723  -0.221
    X4: Attractive promotions   0.124   0.884  -0.251
    X5: Low prices             -0.132   0.952   0.122
    X6: Easy to locate items    0.114   0.231   0.945
    X7: Good service           -0.122   0.341   0.789
    X8: Ease of parking         0.181  -0.332   0.678
    X9: Efficient checkouts     0.238   0.102   0.988


Factor Analysis
• The prime objective of factor analysis is to simplify the data. Based on patterns in the data, the technique summarizes numerous variables into a few factors.
• For example, the 9 variables (attributes) in the previous slide are summarized as 3 factors.
Variables grouped into Factors
• Variables with high loadings help define the factor. For instance, the variables ‘value for money’, ‘attractive promotions’, and ‘low prices’ move in concert and are associated most strongly with F2.
Latent Factors

• Variables that define the same factor are usually grouped under their
respective factors.
Factor Loading & Naming
• The factor loading is the correlation between the variable and the factor. (For example, ‘convenient location’ has a correlation of 0.954 with factor F1.)

• There often exists some common meaning among the variables that define a factor. Factor naming is a subjective process that combines an understanding of the market with inspection of the variables that define the factor.
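The grouping described above can be automated: assign each variable to the factor on which it has the largest absolute loading. A short sketch using the loading matrix from the example table:

```python
import numpy as np

# Loading matrix from the example: rows X1..X9, columns F1..F3.
loadings = np.array([
    [ 0.954, -0.234, -0.236],   # X1: Convenient location
    [ 0.942,  0.254,  0.325],   # X2: Near home
    [ 0.251,  0.723, -0.221],   # X3: Value for money
    [ 0.124,  0.884, -0.251],   # X4: Attractive promotions
    [-0.132,  0.952,  0.122],   # X5: Low prices
    [ 0.114,  0.231,  0.945],   # X6: Easy to locate items
    [-0.122,  0.341,  0.789],   # X7: Good service
    [ 0.181, -0.332,  0.678],   # X8: Ease of parking
    [ 0.238,  0.102,  0.988],   # X9: Efficient checkouts
])

# For each variable, pick the factor with the largest absolute loading.
assignment = np.abs(loadings).argmax(axis=1) + 1   # 1-based factor numbers
print(assignment)   # [1 1 2 2 2 3 3 3 3]
```

This reproduces the grouping in the slides: X1-X2 define F1 (location), X3-X5 define F2 (price/value), and X6-X9 define F3 (in-store experience); naming those groups remains the subjective step.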
