Principal Component Analysis
The main idea behind PCA is to find patterns and correlations among the various features in the data set. When strong correlations between variables are found, the dimensionality of the data can be reduced in a way that still retains the most significant information.
Consider an example: say we have 2 variables in our data set, one with values ranging between 10 and 100 and the other with values between 1000 and 5000. In such a scenario, the output calculated from these predictor variables is going to be biased, since the variable with the larger range will have a disproportionate impact on the outcome. Simple math, isn't it? Now let's move on and look at the next step in PCA.
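The scale effect described above can be seen numerically. A minimal sketch (the values are made up; only the two ranges come from the example above):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical features on very different scales, as in the example
small = rng.uniform(10, 100, size=5)     # values between 10 and 100
large = rng.uniform(1000, 5000, size=5)  # values between 1000 and 5000

# Without scaling, the large-range feature dominates the total variance,
# so any variance-based method such as PCA is biased towards it.
print(small.var(), large.var())
```

This is why standardization, covered below, must happen before the covariance step.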
The last step in performing PCA is to re-arrange the original data along the final principal components, which capture the maximum and most significant information of the data set. To replace the original data axes with the newly formed principal components, you simply multiply the transpose of the obtained feature vector by the transpose of the standardized data set (and transpose the result back).
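In matrix form, that multiplication can be sketched as follows; the data values and the feature vector here are hypothetical, only the shapes matter (5 examples, 4 features, 2 components):

```python
import numpy as np

# Hypothetical standardized data: 5 examples x 4 features
data = np.array([[ 1.0, -0.5,  0.3,  0.8],
                 [-1.0,  0.5, -0.3, -0.8],
                 [ 0.2,  1.0, -1.2,  0.1],
                 [-0.2, -1.0,  1.2, -0.1],
                 [ 0.0,  0.0,  0.0,  0.0]])

# Hypothetical feature vector: 2 orthonormal columns standing in for
# the kept eigenvectors (4 features x 2 components)
feature_vector = np.linalg.qr(
    np.random.default_rng(1).normal(size=(4, 2)))[0]

# FinalData^T = FeatureVector^T x StandardizedData^T; transpose back
final_data = (feature_vector.T @ data.T).T
print(final_data.shape)  # 5 examples x 2 components
```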
So that was the theory behind the entire PCA process. It's time to get your hands dirty and perform all these steps using a real data set.
Step 1: Standardize the dataset.
Step 2: Calculate the covariance matrix for the features in the dataset.
Step 3: Calculate the eigenvalues and eigenvectors for the covariance
matrix.
Assume we have the dataset below, which has 4 features and a total of 5 training examples.
Dataset matrix
First, we need to standardize the dataset, and for that we need to calculate the mean and standard deviation for each feature.
Standardization formula: z = (x − μ) / σ

For example, for the first feature (mean μ = 4, sample standard deviation σ = 3):

(1 − 4) / 3 = −1 and (5 − 4) / 3 = 0.3333
After applying the formula, each feature in the dataset is transformed as below:
Standardized Dataset
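As a sketch of this step in NumPy: the 5 × 4 matrix below is hypothetical, but its first column is chosen so that its mean is 4 and its sample standard deviation is 3, matching the worked example (1 − 4)/3 = −1 and (5 − 4)/3 = 0.3333 above.

```python
import numpy as np

# Hypothetical 5x4 dataset: rows are training examples, columns are features.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 5.0, 6.0, 7.0],
              [1.0, 4.0, 2.0, 3.0],
              [5.0, 3.0, 2.0, 1.0],
              [8.0, 1.0, 2.0, 2.0]])

# z = (x - mean) / std, computed per feature (column).
mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)   # sample standard deviation, as in the example
Z = (X - mean) / std

print(Z[:, 0])  # first feature: [-1, 0.3333, -1, 0.3333, 1.3333]
```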
Covariance formula: cov(x, y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

Using this formula, the covariance matrix for the given dataset is calculated as below.
For the standardized dataset, the mean of each feature is 0 and the standard deviation is 1.
In a similar way we can calculate the other covariances, which results in the covariance matrix below.
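A sketch of this computation on the standardized data (using the same hypothetical 5 × 4 dataset as in the standardization step):

```python
import numpy as np

# Hypothetical dataset from the standardization step, standardized again here
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 5.0, 6.0, 7.0],
              [1.0, 4.0, 2.0, 3.0],
              [5.0, 3.0, 2.0, 1.0],
              [8.0, 1.0, 2.0, 2.0]])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# cov(x, y) = sum((x_i - x_mean) * (y_i - y_mean)) / (n - 1);
# since Z has zero-mean columns this is just Z^T Z / (n - 1).
n = Z.shape[0]
C = Z.T @ Z / (n - 1)          # 4x4 covariance matrix

# Same result from the library routine (rowvar=False: columns are variables)
assert np.allclose(C, np.cov(Z, rowvar=False))

# For standardized data, each feature's variance (the diagonal) is 1
print(np.diag(C))  # [1, 1, 1, 1]
```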
Let A be a square matrix (in our case the covariance matrix), ν a vector
and λ a scalar that satisfies Aν = λν, then λ is called eigenvalue
associated with eigenvector ν of A.
Rearranging the above equation,

Aν − λν = 0 ;  (A − λI)ν = 0
Since we already know that ν is a non-zero vector, the only way this equation can equal zero is if
det(A-λI) = 0
Eigenvectors:
Solving the equation (A − λI)ν = 0 for the vector ν with different λ values:
For λ = 2.51579324, solving the above equation using Cramer's rule, the
values for v vector are
v1 = 0.16195986
v2 = -0.52404813
v3 = -0.58589647
v4 = -0.59654663
Going by the same approach, we can calculate the eigenvectors for the other eigenvalues. We can then form a matrix using these eigenvectors.
Step 4: Sort eigenvalues and their corresponding eigenvectors.
Since the eigenvalues are already sorted in this case, there is no need to sort them again.
If we choose the top 2 eigenvectors, the matrix will look like this:
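Putting the steps together, sorting, keeping the top 2 eigenvectors, and projecting can be sketched as follows (same hypothetical dataset as above):

```python
import numpy as np

# Hypothetical dataset used throughout the sketches
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 5.0, 6.0, 7.0],
              [1.0, 4.0, 2.0, 3.0],
              [5.0, 3.0, 2.0, 1.0],
              [8.0, 1.0, 2.0, 2.0]])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
C = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)

# Sort eigenvalues in descending order and keep the top 2 eigenvectors
order = np.argsort(eigvals)[::-1]
feature_vector = eigvecs[:, order[:2]]   # 4x2 matrix of eigenvectors

# Project: FinalData^T = FeatureVector^T x StandardizedData^T
final_data = (feature_vector.T @ Z.T).T  # 5 examples x 2 components
print(final_data.shape)  # (5, 2)
```

The variance of each projected column equals the corresponding eigenvalue, which is what makes the top components the "most significant" ones.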
https://www.youtube.com/watch?v=Ao_iYZ50RNY
https://www.youtube.com/watch?v=kn_rLM1ZS2Q
https://www.youtube.com/watch?v=cCqCcC2o16U
https://www.youtube.com/watch?v=g-Hb26agBFg
https://www.youtube.com/watch?v=VzPpJXISz-E
https://www.youtube.com/watch?v=f9mZ8dSrVJA
https://www.youtube.com/watch?v=kEjhbylvk0I
PCA in R
https://www.youtube.com/watch?v=0Jp4gsfOLMs
https://www.youtube.com/watch?v=xKl4LJAXnEA
https://www.youtube.com/watch?v=NLrb41ls4qo