
Multivariate Statistics

Sudipta Das

Assistant Professor,
Department of Data Science,
Ramakrishna Mission Vivekananda University, Kolkata
Outline I

1 Principal Component Analysis

Introduction I

A principal component analysis (PCA) is concerned with explaining the variance-covariance structure of a set of variables through a few linear combinations of these variables.

Objectives
Data reduction
Data interpretation

Introduction II

By PCA we select k principal components from a set of p (≥ k) initial variables such that as much of the total system variability as possible is retained.

PCA
Data set of size (n × p) ⟹ Data set of size (n × k)

Note
To retain all of the total system variability, we need to retain all p principal components. A minimal code sketch of this reduction follows.
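Below is a minimal numpy sketch of the (n × p) ⟹ (n × k) reduction, assuming a data matrix with observations in rows; the function name reduce_dim and the simulated data are purely illustrative, not part of the slides.

```python
import numpy as np

def reduce_dim(X, k):
    """Project an (n x p) data matrix onto its first k principal components."""
    Xc = X - X.mean(axis=0)                # center each variable
    S = np.cov(Xc, rowvar=False)           # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder so lambda_1 >= ... >= lambda_p
    E_k = eigvecs[:, order[:k]]            # p x k matrix of leading eigenvectors
    return Xc @ E_k                        # n x k matrix of component scores

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # n = 100 observations, p = 5 variables
print(reduce_dim(X, 2).shape)              # (100, 2)
```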

Population Principal Components I

Principal components are particular linear combinations of the p random features/variables $X_1, X_2, \ldots, X_p$.
These linear combinations represent a new coordinate system, obtained by rotating the original system that has $X_1, X_2, \ldots, X_p$ as its coordinate axes.
The new axes point in the directions of maximum variability and provide a simpler, more parsimonious description of the covariance structure.

Note:
Principal components depend solely on the covariance matrix $\Sigma$ of $X_1, X_2, \ldots, X_p$.
Their development does not require a multivariate normality assumption.
However, standard results on inference can be used if the sample is assumed to come from a normal population.

Population Principal Components II

Formal definition
First principal component: the linear combination $a_1'X$ that maximizes $\mathrm{Var}(a_1'X)$ subject to $a_1'a_1 = 1$.
Second principal component: the linear combination $a_2'X$ that maximizes $\mathrm{Var}(a_2'X)$ subject to $a_2'a_2 = 1$ and $\mathrm{Cov}(a_2'X, a_1'X) = 0$.
$\cdots\cdots$
$i$th principal component: the linear combination $a_i'X$ that maximizes $\mathrm{Var}(a_i'X)$ subject to $a_i'a_i = 1$ and $\mathrm{Cov}(a_i'X, a_k'X) = 0$ for all $k < i$.

Population Principal Components III

Result: Let $\Sigma$ be the covariance matrix associated with the random vector $X = [X_1, X_2, \ldots, X_p]'$. Let $\Sigma$ have the eigenvalue-eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_p, e_p)$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$. Then the $i$th principal component is given by

$$Y_i = e_i'X = e_{i1}X_1 + e_{i2}X_2 + \cdots + e_{ip}X_p, \quad \text{for } i = 1, \ldots, p$$

With these choices,

$$\mathrm{Var}(Y_i) = e_i'\Sigma e_i = \lambda_i, \quad \text{for } i = 1, \ldots, p$$

$$\mathrm{Cov}(Y_i, Y_k) = e_i'\Sigma e_k = 0, \quad \text{for } i \neq k$$

Note: If some $\lambda_i$ are equal, then the choices of the corresponding coefficient vectors $e_i$, and hence $Y_i$, are not unique.
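This result is straightforward to check numerically. A minimal sketch follows; the covariance matrix Sigma below is an arbitrary illustrative choice, not taken from the slides.

```python
import numpy as np

# An arbitrary covariance matrix, chosen purely for illustration.
Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]      # sort so lambda_1 >= lambda_2 >= lambda_3
lam = eigvals[order]
E = eigvecs[:, order]                  # columns are e_1, e_2, e_3

# Var(Y_i) = e_i' Sigma e_i = lambda_i and Cov(Y_i, Y_k) = 0 for i != k,
# i.e. E' Sigma E should equal diag(lambda_1, ..., lambda_p).
print(np.allclose(E.T @ Sigma @ E, np.diag(lam)))   # True
```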

Population Principal Components IV

Sketch of proof:
To get the first principal component, we need

$$\max_a \mathrm{Var}(a'X) \ \text{ s.t. } a'a = 1 \;\Rightarrow\; \max_a \frac{a'\Sigma a}{a'a}$$

Thus (Lemma: Maximization of Quadratic Forms for Points on the Unit Sphere),

$$\max_a \frac{a'\Sigma a}{a'a} = \lambda_1,$$

and the maximum is attained at $a = e_1$.

Hence, $Y_1 = e_1'X$ and $\mathrm{Var}(Y_1) = e_1'\Sigma e_1 = \lambda_1$.
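The lemma can also be illustrated numerically: sampling many random directions, no Rayleigh quotient exceeds $\lambda_1$. This is a sketch using the same illustrative $\Sigma$ as above, not a proof.

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(Sigma)
lam1, e1 = eigvals[-1], eigvecs[:, -1]   # largest eigenvalue and its eigenvector

# Rayleigh quotient a' Sigma a / a'a for many random directions a.
rng = np.random.default_rng(1)
A = rng.normal(size=(10000, 3))
quot = np.einsum('ij,jk,ik->i', A, Sigma, A) / np.einsum('ij,ij->i', A, A)

print(quot.max() <= lam1 + 1e-9)         # True: no direction beats lambda_1
print(np.isclose(e1 @ Sigma @ e1, lam1)) # True: the maximum is attained at e_1
```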

Population Principal Components V

Sketch of proof (contd.):
To get the $i$th principal component, we need

$$\max_a \mathrm{Var}(a'X) \ \text{ s.t. } a'a = 1 \text{ and } \mathrm{Cov}(a'X, a_k'X) = 0 \text{ for all } k < i$$

$$\Rightarrow\; \max_a \frac{a'\Sigma a}{a'a} \ \text{ s.t. } \mathrm{Cov}(a'X, e_k'X) = 0 \text{ for all } k < i$$

$$\Rightarrow\; \max_{a \perp e_1, \ldots, e_{i-1}} \frac{a'\Sigma a}{a'a}, \quad [\text{ since } a'\Sigma e_k = a'\lambda_k e_k = 0 \Rightarrow a \perp e_k ]$$

Thus (Lemma: Maximization of Quadratic Forms for Points on the Unit Sphere),

$$\max_{a \perp e_1, \ldots, e_{i-1}} \frac{a'\Sigma a}{a'a} = \lambda_i,$$

and the maximum is attained at $a = e_i$.

Hence, $Y_i = e_i'X$ and $\mathrm{Var}(Y_i) = e_i'\Sigma e_i = \lambda_i$. Also,

$$\mathrm{Cov}(Y_i, Y_k) = \mathrm{Cov}(e_i'X, e_k'X) = e_i'\Sigma e_k = 0.$$

Population Principal Components VI

Result: Let the random vector $X = [X_1, X_2, \ldots, X_p]'$ have covariance matrix $\Sigma$, with the eigenvalue-eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_p, e_p)$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$. Let $Y_1 = e_1'X, Y_2 = e_2'X, \ldots, Y_p = e_p'X$ be the principal components. Then

$$\sum_{i=1}^{p} \mathrm{Var}(X_i) = \sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp} = \mathrm{tr}(\Sigma) = \lambda_1 + \lambda_2 + \cdots + \lambda_p = \sum_{i=1}^{p} \mathrm{Var}(Y_i).$$
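The identity $\mathrm{tr}(\Sigma) = \sum_i \lambda_i$ is immediate to verify numerically (same illustrative $\Sigma$ as before):

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

# tr(Sigma) = sigma_11 + ... + sigma_pp = lambda_1 + ... + lambda_p
print(np.isclose(np.trace(Sigma), np.linalg.eigvalsh(Sigma).sum()))   # True
```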

Population Principal Components VII

Proportion of total population variance explained by the $k$th principal component:

$$\frac{\lambda_k}{\lambda_1 + \cdots + \lambda_k + \cdots + \lambda_p}$$

Proportion of total population variance explained by the first $k$ principal components:

$$\frac{\lambda_1 + \cdots + \lambda_k}{\lambda_1 + \cdots + \lambda_k + \cdots + \lambda_p}$$
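A small sketch computing both proportions, for a hypothetical set of eigenvalues:

```python
import numpy as np

lam = np.array([3.8, 1.9, 0.9, 0.4])   # hypothetical eigenvalues, lambda_1 >= ... >= lambda_4
prop = lam / lam.sum()                 # proportion explained by each component
cumprop = np.cumsum(prop)              # proportion explained by the first k components
print(prop.round(3))                   # [0.543 0.271 0.129 0.057]
print(cumprop.round(3))                # [0.543 0.814 0.943 1.   ]
```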

Population Principal Components VIII

Result: If $Y_1 = e_1'X, Y_2 = e_2'X, \ldots, Y_p = e_p'X$ are the principal components obtained from the covariance matrix $\Sigma$, then

$$\rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \quad \text{for } i, k = 1, 2, \ldots, p$$

are the correlation coefficients between the components $Y_i$ and the variables $X_k$. Here $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_p, e_p)$ are the eigenvalue-eigenvector pairs for $\Sigma$.

The magnitude of $e_{ik}$ measures the importance of the $k$th variable ($X_k$) to the $i$th principal component ($Y_i$).
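A sketch computing the full matrix of these correlations; $\Sigma$ is again the illustrative matrix used earlier.

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
lam, E = eigvals[order], eigvecs[:, order]   # E[:, i] is e_i, so E[k, i] = e_ik

# rho_{Y_i, X_k} = e_ik * sqrt(lambda_i) / sqrt(sigma_kk); rows index i, columns index k.
rho = (E * np.sqrt(lam)).T / np.sqrt(np.diag(Sigma))
print(rho.round(3))
```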

Population Principal Components IX

Sketch of proof: Write $X_k = a_k'X$, where $a_k = [0\;0\;\ldots\;1\;\ldots\;0]'$ has a 1 in the $k$th position. Then

$$\rho_{Y_i, X_k} = \mathrm{Cor}(Y_i, X_k) = \frac{\mathrm{Cov}(e_i'X,\, a_k'X)}{\sqrt{\mathrm{Var}(Y_i)\,\mathrm{Var}(X_k)}} = \frac{a_k'\Sigma e_i}{\sqrt{\lambda_i\,\sigma_{kk}}} = \frac{\lambda_i\, e_{ik}}{\sqrt{\lambda_i\,\sigma_{kk}}} = \frac{\sqrt{\lambda_i}\, e_{ik}}{\sqrt{\sigma_{kk}}}$$

Principal Components on Standardized Variables I

Given the vector $X$, the standardized vector can be obtained as

$$Z = V^{-1/2}(X - \mu),$$

recalling that $V$ is the diagonal matrix of variances,

$$V = \begin{pmatrix} \sigma_{11} & \ldots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \ldots & \sigma_{pp} \end{pmatrix}$$

Note:
$E(Z) = 0 = [0 \ldots 0]'$

$$\mathrm{Cov}(Z) = \rho = \begin{pmatrix} 1 & \rho_{12} & \ldots & \rho_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1p} & \rho_{2p} & \ldots & 1 \end{pmatrix},$$

the correlation matrix of $X$.
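A minimal sketch of the sample analogue, assuming a generic data matrix: standardizing each column yields data whose covariance matrix is the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.5]])   # correlated toy data

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-scores: estimated V^{-1/2}(X - mu)

# Cov(Z) equals the correlation matrix of X (1s on the diagonal).
print(np.allclose(np.cov(Z, rowvar=False), np.corrcoef(X, rowvar=False)))   # True
```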

Principal Components on Standardized Variables II

Result: The $i$th principal component of the standardized variables $Z = [Z_1\, Z_2 \ldots Z_p]'$ with $\mathrm{Cov}(Z) = \rho$ is given by

$$Y_i = e_i'Z, \quad \text{for } i = 1, 2, \ldots, p.$$

Moreover,

$$\sum_{i=1}^{p} \mathrm{Var}(Y_i) = \sum_{i=1}^{p} \mathrm{Var}(Z_i) = p.$$

In this case, $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_p, e_p)$ are the eigenvalue-eigenvector pairs for $\rho$, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$.

Principal Components on Standardized Variables III

Proportion of total population variance explained by the $k$th principal component:

$$\frac{\lambda_k}{p}$$

Proportion of total population variance explained by the first $k$ principal components:

$$\frac{\lambda_1 + \cdots + \lambda_k}{p}$$

Summarizing Sample Variations by Principal Components I

Result: Let $X$ be the observation on the variables $X_1, X_2, \ldots, X_p$ with the corresponding sample covariance matrix $S_{p \times p}$. Then the $i$th sample principal component is given by

$$\hat{Y}_i = \hat{e}_i'X = \hat{e}_{i1}X_1 + \cdots + \hat{e}_{ip}X_p, \quad \text{for } i = 1, 2, \ldots, p,$$

where $(\hat{\lambda}_1, \hat{e}_1), (\hat{\lambda}_2, \hat{e}_2), \ldots, (\hat{\lambda}_p, \hat{e}_p)$ are the eigenvalue-eigenvector pairs for $S$, with $\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_p \ge 0$. Also,

$$\mathrm{Var}(\hat{Y}_i) = \hat{\lambda}_i, \quad \text{for } i = 1, 2, \ldots, p$$

and

$$\mathrm{Cov}(\hat{Y}_i, \hat{Y}_k) = 0, \quad \text{for } i \neq k.$$

In addition,

$$\text{Total Sample Variance} = \sum_{i=1}^{p} s_{ii} = \sum_{i=1}^{p} \hat{\lambda}_i$$
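A sketch of the sample computation, with simulated observations standing in for real data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal(mean=[0.0, 0.0, 0.0],
                            cov=[[4, 2, 0], [2, 3, 1], [0, 1, 2]],
                            size=200)          # n = 200 observations, p = 3

S = np.cov(X, rowvar=False)                    # sample covariance matrix S
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
lam_hat, E_hat = eigvals[order], eigvecs[:, order]

Y_hat = (X - X.mean(axis=0)) @ E_hat           # sample principal component scores

# Sample variance of the ith score equals lambda_hat_i, and
# total sample variance sum(s_ii) equals sum(lambda_hat_i).
print(np.allclose(Y_hat.var(axis=0, ddof=1), lam_hat))   # True
print(np.isclose(np.trace(S), lam_hat.sum()))            # True
```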

Summarizing Sample Variations by Principal Components II

Result: Let $Z$ be the observation on the standardized variables $Z_i = \frac{X_i - \bar{X}_i}{\sqrt{s_{ii}}}$, $i = 1, \ldots, p$, with the corresponding sample covariance matrix $R_{p \times p}$. Then the $i$th sample principal component is given by

$$\hat{Y}_i = \hat{e}_i'Z = \hat{e}_{i1}Z_1 + \cdots + \hat{e}_{ip}Z_p, \quad \text{for } i = 1, 2, \ldots, p,$$

where $(\hat{\lambda}_1, \hat{e}_1), (\hat{\lambda}_2, \hat{e}_2), \ldots, (\hat{\lambda}_p, \hat{e}_p)$ are the eigenvalue-eigenvector pairs for $R$, with $\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_p \ge 0$. Also,

$$\mathrm{Var}(\hat{Y}_i) = \hat{\lambda}_i, \quad \text{for } i = 1, 2, \ldots, p$$

and

$$\mathrm{Cov}(\hat{Y}_i, \hat{Y}_k) = 0, \quad \text{for } i \neq k.$$

In addition,

$$\text{Total Sample Variance} = \sum_{i=1}^{p} \hat{\lambda}_i = p$$

Summarizing Sample Variations by Principal Components III

How many principal components should be retained?

There is no definite answer. Subjectively, we decide based on
the relative sizes of the eigenvalues and the amount of sample variation explained;
subject-matter interpretation of the components is also important.

Visual aid: Scree Plot
A plot of $\hat{\lambda}_i$ vs $i$; see the sketch below.
To determine the appropriate number of components, we look for an elbow (bend) in the scree plot.
The number of components is taken to be the point at which the remaining eigenvalues are relatively small and all about the same size.
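A minimal matplotlib sketch of a scree plot; the eigenvalues are hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt

lam_hat = np.array([3.8, 1.9, 0.9, 0.4, 0.3, 0.2])   # hypothetical eigenvalues

plt.plot(np.arange(1, len(lam_hat) + 1), lam_hat, 'o-')
plt.xlabel('component number i')
plt.ylabel('eigenvalue')
plt.title('Scree plot')
plt.show()
# The bend after i = 2 here would suggest retaining the first two components.
```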
