Steps Involved in The PCA: Dataset Matrix

The document outlines the steps involved in principal component analysis (PCA): 1) Standardize the dataset. 2) Calculate the covariance matrix. 3) Calculate the eigenvalues and eigenvectors of the covariance matrix. 4) Sort the eigenvalues and eigenvectors. 5) Form a matrix using the top k eigenvectors. 6) Transform the original dataset using the eigenvector matrix.

Uploaded by

laalu mukkamala
Copyright
© All Rights Reserved

Steps Involved in the PCA

Step 1: Standardize the dataset.


Step 2: Calculate the covariance matrix for the features in the dataset.
Step 3: Calculate the eigenvalues and eigenvectors for the covariance matrix.
Step 4: Sort eigenvalues and their corresponding eigenvectors.
Step 5: Pick k eigenvalues and form a matrix of eigenvectors.
Step 6: Transform the original matrix.

1. Standardize the Dataset


Assume we have the dataset below, which has 4 features and a total of 5 training
examples.

Dataset matrix

First, we need to standardize the dataset. To do that, we calculate the mean and
standard deviation of each feature.

Standardization formula: x_new = (x - μ) / σ, where μ is the mean and σ the standard deviation of the feature

Mean and standard deviation before standardization


After applying the formula, each feature in the dataset is transformed as shown below:

Standardized Dataset
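
The standardization step can be sketched in a few lines of Python. The raw column below is an assumption: the dataset table is not reproduced in this text, so the values were chosen to give the standardized f1 values quoted in the covariance calculation of step 2. The helper name standardize is illustrative, and the sample standard deviation (divisor n - 1) is assumed.

```python
from statistics import mean, stdev

def standardize(column):
    """Return (x - mean) / std for every value in one feature column."""
    mu = mean(column)
    sigma = stdev(column)  # sample standard deviation (divisor n - 1), an assumption
    return [(x - mu) / sigma for x in column]

# Hypothetical raw column, chosen to reproduce the standardized f1 values in the text
f1_raw = [1, 5, 1, 5, 8]
f1_std = standardize(f1_raw)
print([round(x, 2) for x in f1_std])  # [-1.0, 0.33, -1.0, 0.33, 1.33]
```

The same function would be applied to each of the four feature columns in turn.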

2. Calculate the covariance matrix for the whole dataset


The formula to calculate the covariance matrix:

Covariance formula: cov(X, Y) = Σ (x_i - mean(X)) * (y_i - mean(Y)) / n

The covariance matrix for the given dataset is calculated as shown below.

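Since we have standardized the dataset, the mean of each feature is 0 and the
standard deviation is 1. That standard deviation is the sample one (divisor n - 1 = 4);
the calculations below divide by n = 5, which is why the variance comes out as 0.8
rather than 1.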

var(f1) = ((-1.0-0)² + (0.33-0)² + (-1.0-0)² + (0.33-0)² + (1.33-0)²)/5

var(f1) = 0.8

cov(f1,f2) =
((-1.0-0)*(-0.632456-0) +
(0.33-0)*(1.264911-0) +
(-1.0-0)*(0.632456-0) +
(0.33-0)*(0.000000-0) +
(1.33-0)*(-1.264911-0))/5
cov(f1,f2) = -0.25298

In a similar way, we can calculate the other covariances, which results in the
covariance matrix below.

covariance matrix (population formula)
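
The hand calculation above can be checked with a short Python sketch. It covers only the f1 and f2 columns, since those are the only standardized values quoted in the text; 1/3 and 4/3 are assumed to be the unrounded forms of 0.33 and 1.33, and the function name population_cov is illustrative.

```python
def population_cov(xs, ys):
    """Population covariance: sum of (x - mean_x) * (y - mean_y), divided by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Standardized columns from the text; 1/3 and 4/3 are assumed to be the
# unrounded forms of the quoted 0.33 and 1.33.
f1 = [-1.0, 1/3, -1.0, 1/3, 4/3]
f2 = [-0.632456, 1.264911, 0.632456, 0.0, -1.264911]

# 2 x 2 block of the covariance matrix for these two features
columns = [f1, f2]
cov_matrix = [[population_cov(a, b) for b in columns] for a in columns]
for row in cov_matrix:
    print([round(value, 5) for value in row])
# [0.8, -0.25298]
# [-0.25298, 0.8]
```

Passing the same column twice gives the variance, so the diagonal entries reproduce var(f1) = 0.8.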

3. Calculate eigenvalues and eigenvectors.


An eigenvector of a linear transformation is a nonzero vector that changes at most by
a scalar factor when the transformation is applied to it. The corresponding eigenvalue
is the factor by which the eigenvector is scaled.

Let A be a square matrix (in our case the covariance matrix), ν a nonzero vector and λ
a scalar such that Aν = λν. Then λ is called an eigenvalue of A, and ν an eigenvector
associated with it. Rearranging the above equation,

Aν-λν =0 ; (A-λI)ν = 0
Since ν is a nonzero vector, this equation can hold only if the matrix A-λI is
singular, that is, if

det(A-λI) = 0

Solving the equation det(A-λI) = 0 for λ gives the eigenvalues:

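λ = 2.51579324, 1.0652885, 0.39388704, 0.02503121

These four values sum to 4, which is the trace of the covariance matrix when it is
computed with the sample divisor n - 1 (unit diagonal). With the population divisor
n = 5 used in step 2 the diagonal is 0.8, and each eigenvalue is multiplied by 4/5; the
eigenvectors are the same in both cases.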
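
For a 2 × 2 matrix, det(A-λI) = 0 is a quadratic in λ and can be solved directly. The sketch below does this for the f1/f2 block of the covariance matrix only (entries 0.8 and -0.25298, with var(f2) = 0.8 assumed from the same standardization), so its two eigenvalues are not the four values listed above. The function name eigenvalues_2x2 is illustrative.

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]].

    det(A - λI) = λ² - (a + d)λ + (ad - bc) = 0 is solved with the quadratic formula.
    """
    trace = a + d
    det = a * d - b * c
    root = math.sqrt(trace * trace - 4 * det)  # real for a symmetric matrix
    return (trace + root) / 2, (trace - root) / 2

# f1/f2 block of the covariance matrix: var = 0.8, cov(f1,f2) = -0.25298
lam_big, lam_small = eigenvalues_2x2(0.8, -0.25298, -0.25298, 0.8)
print(round(lam_big, 5), round(lam_small, 5))  # 1.05298 0.54702
```

The two eigenvalues add up to 1.6, the trace of the block. For the full 4 × 4 matrix the characteristic equation is a quartic, which is normally solved numerically.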

Eigenvectors:

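Solving the equation (A-λI)ν = 0 for the vector ν with each value of λ:

For λ = 2.51579324, A-λI is singular, so the system has infinitely many solutions.
Fixing one component, solving the remaining equations (for example with Cramer's rule)
and normalizing the result to unit length gives the components of ν: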

v1 = 0.16195986
v2 = -0.52404813
v3 = -0.58589647
v4 = -0.59654663
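
The same idea can be sketched for the 2 × 2 case, where one row of (A-λI)ν = 0 already fixes the direction of the eigenvector. The matrix used is the f1/f2 block of the covariance matrix (an illustrative reduction, not the full 4 × 4 problem), and the function name unit_eigenvector_2x2 is hypothetical. The last lines check that the four-component vector quoted above has unit length.

```python
import math

def unit_eigenvector_2x2(a, b, lam):
    """Unit eigenvector of the symmetric matrix [[a, b], [b, d]] for eigenvalue lam.

    The first row of (A - λI)v = 0 reads (a - λ)·v1 + b·v2 = 0, so v is
    proportional to (b, λ - a) whenever b is nonzero.
    """
    v1, v2 = b, lam - a
    norm = math.hypot(v1, v2)
    return v1 / norm, v2 / norm

a, b, d = 0.8, -0.25298, 0.8
lam = 1.05298                     # larger eigenvalue of this 2 x 2 block
v1, v2 = unit_eigenvector_2x2(a, b, lam)

# Residual of A·v - λ·v; both entries should be close to zero
residual = (a * v1 + b * v2 - lam * v1, b * v1 + d * v2 - lam * v2)

# The four-component eigenvector quoted in the text should have unit length
quoted = [0.16195986, -0.52404813, -0.58589647, -0.59654663]
quoted_norm = math.sqrt(sum(x * x for x in quoted))
print(v1, v2, residual, quoted_norm)
```

The sign of an eigenvector is arbitrary: multiplying every component by -1 gives an equally valid result.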

Following the same approach, we can calculate the eigenvectors for the other
eigenvalues. We can then form a matrix using the eigenvectors.

Eigenvectors (4 × 4 matrix)

4. Sort eigenvalues and their corresponding eigenvectors.


The eigenvalues are already sorted in descending order in this case, so there is no need to sort them again.
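
When the eigenvalues do not come out in order, each eigenvector has to move together with its eigenvalue. A minimal sketch, with the eigenvalues from the text deliberately shuffled and placeholder labels standing in for the eigenvector columns (both are assumptions made for the illustration):

```python
# Eigenvalues from the text, deliberately shuffled for the illustration
eigenvalues = [0.39388704, 2.51579324, 0.02503121, 1.0652885]
# Placeholder labels standing in for the matching eigenvector columns
eigenvectors = ["v_c", "v_a", "v_d", "v_b"]

# Sort the pairs together so each eigenvector stays with its eigenvalue
pairs = sorted(zip(eigenvalues, eigenvectors), key=lambda pair: pair[0], reverse=True)
sorted_values = [value for value, _ in pairs]
sorted_vectors = [vector for _, vector in pairs]
print(sorted_values)   # [2.51579324, 1.0652885, 0.39388704, 0.02503121]
print(sorted_vectors)  # ['v_a', 'v_b', 'v_c', 'v_d']
```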

5. Pick k eigenvalues and form a matrix of eigenvectors


If we choose the top 2 eigenvectors, the matrix will look like this:
Top 2 eigenvectors (4 × 2 matrix)
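
The text does not say how k = 2 was chosen. One common criterion, shown here as an assumption rather than as the author's method, is the cumulative share of the total variance carried by the first k eigenvalues:

```python
from itertools import accumulate

# Eigenvalues from the text, already sorted in descending order
eigenvalues = [2.51579324, 1.0652885, 0.39388704, 0.02503121]

total = sum(eigenvalues)
# Share of total variance retained by the first k components, for k = 1..4
cumulative = [partial / total for partial in accumulate(eigenvalues)]

for k, share in enumerate(cumulative, start=1):
    print(k, round(share, 4))
```

With these eigenvalues the first two components account for roughly 90% of the total variance.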

6. Transform the original matrix.


Feature matrix (5 × 4, standardized) * top k eigenvectors (4 × k) = Transformed Data (5 × k)

Data Transformation
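
The projection is an ordinary matrix product. The sketch below uses a reduced version of the example: only the standardized f1 and f2 columns quoted in step 2 (5 × 2), projected onto the top eigenvector of their 2 × 2 covariance block (2 × 1). The helper matmul and the reduction to two features are assumptions made for the illustration; the full example multiplies a 5 × 4 matrix by a 4 × 2 matrix in the same way.

```python
import math

# Standardized f1 and f2 columns from the text (1/3 and 4/3 assumed to be
# the unrounded forms of 0.33 and 1.33), arranged as 5 rows x 2 features
features = [
    [-1.0, -0.632456],
    [1/3, 1.264911],
    [-1.0, 0.632456],
    [1/3, 0.0],
    [4/3, -1.264911],
]

# Top-1 eigenvector of the block [[0.8, -0.25298], [-0.25298, 0.8]],
# stored as a 2 x 1 matrix; its sign is arbitrary
s = math.sqrt(0.5)
top_k = [[-s], [s]]

def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a * b for a, b in zip(row, col)) for col in zip(*right)]
        for row in left
    ]

transformed = matmul(features, top_k)  # 5 rows x 1 component
for row in transformed:
    print([round(value, 4) for value in row])
```

The population variance of the projected column comes out close to 1.05298, the larger eigenvalue of the 2 × 2 block, which is the sense in which the first principal component captures the most variance.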
