Dimensionality Reduction by PCA
Motivation
• Doing exhaustive search over feature subsets is very expensive – non-feasible
• Doing wrapper-based feature selection (SFS, SBS, SFFS, etc.) is again very expensive
• Doing filter-based feature selection is suboptimal
• Solution? Try to automate the process
Spread of Data
• Often the data varies in only a few directions
• You can't spot low-dimensional structure just by looking at the raw numbers
Example
Dimensionality Reduction
• Reduce dimensions by projecting onto a low-dimensional subspace with maximum variation
• You can think of this as dropping the unnecessary axes and rotating the remaining ones
Data Compression
PCA is not linear regression
Orthogonal axes that capture the maximum variance of the data
Principal Components
• The first principal component is the direction of greatest variability (covariance) in the data
• And so on: each subsequent component is the next orthogonal direction of greatest remaining variability …
• Properties
  • It can be viewed as a rotation of the existing axes to new positions in the space defined by the original variables
  • The new axes are orthogonal and represent the directions with maximum variability
SOME BACKGROUND ON STATISTICS
1st order statistics
• Standard deviation:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \mu)^2}{n-1}} $$

• Variance ($s^2$):

$$ s^2 = \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{n-1} $$
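As a quick check of the two formulas above, here is a minimal NumPy sketch (the sample values are made up for illustration):

import numpy as np

X = np.array([2.5, 0.5, 2.2, 1.9, 3.1])      # hypothetical 1-D sample
mu = X.mean()

var = ((X - mu) ** 2).sum() / (len(X) - 1)    # variance, with (n-1) in the denominator
std = np.sqrt(var)                            # standard deviation

# NumPy agrees when told to use the (n-1) denominator (ddof=1)
assert np.isclose(var, X.var(ddof=1))
assert np.isclose(std, X.std(ddof=1))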
Covariance
• Covariance is always measured between two dimensions, cov(x, y)
• Covariance of a dimension with itself is the variance
• For a 3-dimensional data set (x, y, z), measure cov between (x, y), (x, z) and (y, z)
• Variance:

$$ \mathrm{var}(X) = \frac{\sum_{i=1}^{n} (X_i - \mu)(X_i - \mu)}{n-1} $$

• Covariance:

$$ \mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \mu_X)(Y_i - \mu_Y)}{n-1} $$
In English
• For each data item, multiply the difference between the x value and the mean of x by the difference between the y value and the mean of y. Add all these up and divide by (n-1), as in the sketch below.
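A minimal sketch of that recipe with made-up values; the hand-computed result matches NumPy's np.cov:

import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9, 3.1])      # hypothetical x values
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0])      # hypothetical y values
mx, my = x.mean(), y.mean()

# multiply the x-deviation by the y-deviation, sum, divide by (n-1)
cov_xy = ((x - mx) * (y - my)).sum() / (len(x) - 1)

assert np.isclose(cov_xy, np.cov(x, y)[0, 1])   # np.cov uses (n-1) by default
assert np.isclose(cov_xy, np.cov(y, x)[0, 1])   # and cov(X, Y) == cov(Y, X)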
Question
• Is cov(X, Y) equal to cov(Y, X)?
• Yes: (Xi − μx)(Yi − μy) = (Yi − μy)(Xi − μx), because multiplication is commutative
Covariance Matrix
• For a data set with more than 2 dimensions, you can calculate $\frac{n!}{(n-2)! \cdot 2}$ different covariance values
• Calculate for n = 3:

$$ C = \begin{pmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\ \mathrm{cov}(y,x) & \mathrm{cov}(y,y) & \mathrm{cov}(y,z) \\ \mathrm{cov}(z,x) & \mathrm{cov}(z,y) & \mathrm{cov}(z,z) \end{pmatrix} $$
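In code, the full matrix comes from a single call; a minimal sketch with made-up 3-dimensional data:

import numpy as np

# hypothetical data: columns are the x, y, z dimensions, rows are the items
data = np.array([[2.5, 2.4, 1.2],
                 [0.5, 0.7, 0.3],
                 [2.2, 2.9, 1.1],
                 [1.9, 2.2, 0.9]])

C = np.cov(data, rowvar=False)   # 3x3 covariance matrix
print(C)                         # symmetric; the diagonal holds cov(x,x), cov(y,y), cov(z,z)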
Matrix Algebra
• Matrix * vector = rotated and scaled vector (in general)
• Matrix * vector = ONLY a scaled vector with NO rotation ⇒ that vector is an eigenvector
Example
Example 2
Eigen Vectors
• Eigenvectors can only be found for square matrices
• Given an n×n matrix, there are (up to) n eigenvectors
• Even if we scale an eigenvector, multiplying by the matrix still gives the same multiple, because scaling only changes the vector's length; its direction remains the same
• All eigenvectors of a symmetric matrix (such as a covariance matrix) are orthogonal
• Usually eigenvectors are reported as unit vectors: magnitude exactly one
Eigen Vector
The eigenvector $\begin{pmatrix} 3 \\ 2 \end{pmatrix}$ has length $\sqrt{3^2 + 2^2} = \sqrt{13}$, so the corresponding unit eigenvector is $\begin{pmatrix} 3/\sqrt{13} \\ 2/\sqrt{13} \end{pmatrix}$.
Eigen Value
• In both of those examples, the amount by which the original vector was scaled after multiplication by the square matrix was the same
• The eigenvalue of this eigenvector is 4 (see the check below)
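The example matrix itself is not reproduced on this slide, so the check below assumes A = [[2, 3], [2, 1]] (an assumption; it is consistent with the eigenvector (3, 2) and eigenvalue 4 quoted above):

import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])       # assumed example matrix

v = np.array([3.0, 2.0])
print(A @ v)                     # [12.  8.] = 4 * v, so v is an eigenvector with eigenvalue 4

vals, vecs = np.linalg.eig(A)    # NumPy returns unit-length eigenvectors (as columns)
i = np.argmax(vals)              # index of the eigenvalue 4 (the other eigenvalue is -1)
print(vals[i], vecs[:, i])       # 4.0 and ±(3, 2)/sqrt(13)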
Principal Components Analysis
• Step 1: Get some data
DATA:
x y
2.5 2.4
0.5 0.7
2.2 2.9
1.9 2.2
3.1 3.0
2.3 2.7
2 1.6
1 1.1
1.5 1.6
1.1 0.9
• Step 2: Subtract the mean

ZERO MEAN DATA:
x y
.69 .49
-1.31 -1.21
.39 .99
.09 .29
1.29 1.09
.49 .79
.19 -.31
-.81 -.81
-.31 -.31
-.71 -1.01
• Step 3: Calculate the covariance matrix

cov = ( .616555556  .615444444
        .615444444  .716555556 )

• Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix

eigenvalues = ( .0490833989
                1.28402771 )

eigenvectors = ( -.735178656  -.677873399
                  .677873399  -.735178656 )
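Steps 1–4 can be reproduced with a few lines of NumPy; a minimal sketch (the eigenvector signs and ordering may differ from the slide, which does not change the subspace):

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

Xc = X - X.mean(axis=0)              # Step 2: zero-mean data
C = np.cov(Xc, rowvar=False)         # Step 3: covariance matrix, (n-1) denominator
vals, vecs = np.linalg.eig(C)        # Step 4: eigenvalues and (unit) eigenvectors

print(C)      # [[0.61655556 0.61544444]
              #  [0.61544444 0.71655556]]
print(vals)   # 0.0490834 and 1.28402771
print(vecs)   # eigenvectors in the columns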
Eigen Vectors
Note:
• They are perpendicular to each other.
• One of the eigenvectors goes through the middle of the points, like drawing a line of best fit.
• The second eigenvector gives us the other, less important, pattern in the data: all the points follow the main line, but are off to the side of it by some amount.

You do lose some information, but if the eigenvalues are small, you don't lose much.
PCA Example – STEP 5
• Feature Vector

FeatureVector = (eig1 eig2 eig3 … eign)

We can either form a feature vector with both of the eigenvectors:

( -.677873399  -.735178656
  -.735178656   .677873399 )

or we can choose to leave out the smaller, less significant component and keep only a single column:

( -.677873399
  -.735178656 )
Eigen Vectors
Ureduce = U(:,1:k);      % keep the first k eigenvectors (columns of U)
z = Ureduce' * x;        % projected data
PCA Example – STEP 5
• Deriving the new data

FinalData = RowFeatureVector × RowZeroMeanData

RowFeatureVector is the matrix with the eigenvectors in the columns, transposed so that the eigenvectors are now in the rows, with the most significant eigenvector at the top.

RowZeroMeanData is the mean-adjusted data transposed, i.e. the data items are in the columns, with each row holding a separate dimension.
PCA Example – STEP 5
• Z = Uᵀ * X
• X = (U⁻¹ * Z) + originalMean
Transformed data (projection onto the most significant eigenvector):
x
-.827970186
1.77758033
-.992197494
-.274210416
-1.67580142
-.912949103
.0991094375
1.14457216
.438046137
1.22382056
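A minimal NumPy sketch of this last step, projecting onto the most significant eigenvector and then mapping back; it reproduces the column above:

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
mean = X.mean(axis=0)
Xc = (X - mean).T                             # RowZeroMeanData: dimensions in rows, items in columns

u1 = np.array([-0.677873399, -0.735178656])   # most significant eigenvector from Step 4
Z = u1 @ Xc                                   # Z = U^T * X : 1-D projected data
print(Z)                                      # -0.82797..., 1.77758..., ... (the column above)

# Approximate reconstruction: for unit eigenvectors U^{-1} = U^T,
# so X ≈ U * Z + originalMean
X_approx = np.outer(u1, Z) + mean[:, None]
print(X_approx.T)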
HOW TO SELECT K PCS
• The first PC retains the greatest amount of variation in the sample
• The kth PC retains the kth greatest fraction of the variation in the sample
• The kth largest eigenvalue of the covariance matrix C is the variance in the sample along the kth PC
Dimensionality Reduction
$$ S = \begin{pmatrix} s_{11} & 0 & \cdots & 0 \\ 0 & s_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & s_{nn} \end{pmatrix} $$

Choose the smallest k such that

$$ \frac{\sum_{i=1}^{k} s_{ii}}{\sum_{i=1}^{n} s_{ii}} \geq 0.99 $$

so that 99% of the variance is retained.
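A minimal sketch of this criterion: compute the eigenvalues s_ii of the covariance matrix of a data matrix X (one sample per row) and return the smallest k that retains the requested fraction of variance:

import numpy as np

def choose_k(X, retain=0.99):
    C = np.cov(X, rowvar=False)                   # covariance matrix
    s = np.sort(np.linalg.eigvalsh(C))[::-1]      # eigenvalues s_ii, largest first
    ratio = np.cumsum(s) / s.sum()                # variance retained by the first k PCs
    return int(np.searchsorted(ratio, retain) + 1)   # smallest k with ratio >= retain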