A Step-by-Step Explanation of Principal Component Analysis (PCA)
Zakaria Jaadi
April 1, 2021 | Updated: December 1, 2021
PCA is a widely covered method on the web, and there are some great articles about it, but many spend too much time in the weeds when most of us just want to know how it works in a simplified way.
Principal component analysis can be broken down into five steps. I'll go through
each step, providing logical explanations of what PCA is doing and
simplifying mathematical concepts such as standardization, covariance,
eigenvectors and eigenvalues without focusing on how to compute them.
Reducing the number of variables of a data set naturally comes at the expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity. Smaller data sets are easier to explore and visualize, and machine learning algorithms can analyze the data much more easily and quickly without extraneous variables to process.
So to sum up, the idea of PCA is simple — reduce the number of variables of a data
set, while preserving as much information as possible.
STEP 1: STANDARDIZATION
The aim of this step is to standardize the range of the continuous initial variables
so that each one of them contributes equally to the analysis.
More specifically, the reason why it is critical to perform standardization prior to PCA is that the latter is quite sensitive to the variances of the initial variables. That is, if there are large differences between the ranges of the initial variables, those variables with larger ranges will dominate over those with small ranges (for example, a variable that ranges between 0 and 100 will dominate over a variable that ranges between 0 and 1), which will lead to biased results. So, transforming the data to comparable scales can prevent this problem.
Mathematically, this can be done by subtracting the mean and dividing by the
standard deviation for each value of each variable.
Once the standardization is done, all the variables will be transformed to the same
scale.
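As a minimal sketch of this step in NumPy (the array and its values are illustrative assumptions, not data from the article):

```python
import numpy as np

# Illustrative data set: 4 observations of 2 variables with very different ranges.
X = np.array([[ 90.0, 0.8],
              [ 60.0, 0.2],
              [ 30.0, 0.5],
              [120.0, 0.1]])

# z = (value - mean) / standard deviation, computed per variable (column).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # ~0 for each variable
print(X_std.std(axis=0))   # 1 for each variable
```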
STEP 2: COVARIANCE MATRIX COMPUTATION
The aim of this step is to understand how the variables of the input data set are varying from the mean with respect to each other, or in other words, to see if there is any relationship between them. Sometimes, variables are highly correlated in such a way that they contain redundant information. So, in order to identify these correlations, we compute the covariance matrix.
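For example, for a three-dimensional data set with variables x, y and z, the covariance matrix is a 3×3 matrix of this form:

\[
\begin{pmatrix}
\mathrm{Cov}(x,x) & \mathrm{Cov}(x,y) & \mathrm{Cov}(x,z)\\
\mathrm{Cov}(y,x) & \mathrm{Cov}(y,y) & \mathrm{Cov}(y,z)\\
\mathrm{Cov}(z,x) & \mathrm{Cov}(z,y) & \mathrm{Cov}(z,z)
\end{pmatrix}
\]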
Since the covariance of a variable with itself is its variance (Cov(a,a)=Var(a)), in the main diagonal (top left to bottom right) we actually have the variances of each initial variable. And since covariance is commutative (Cov(a,b)=Cov(b,a)), the entries of the covariance matrix are symmetric with respect to the main diagonal, which means that the upper and the lower triangular portions are equal.
What do the covariances that we have as entries of the matrix tell us about the correlations between the variables? It's actually the sign of the covariance that matters: if it's positive, the two variables increase or decrease together (they are correlated); if it's negative, one increases when the other decreases (they are inversely correlated).
Now that we know that the covariance matrix is no more than a table that summarizes the correlations between all the possible pairs of variables, let's move to the next step.
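A minimal NumPy sketch of this step (the standardized array is an illustrative assumption):

```python
import numpy as np

# Standardized data: rows are observations, columns are variables.
X_std = np.array([[ 0.45,  1.34],
                  [-0.45, -0.89],
                  [-1.34,  0.22],
                  [ 1.34, -0.67]])

# rowvar=False tells NumPy that variables are in columns.
cov_matrix = np.cov(X_std, rowvar=False)
print(cov_matrix)  # symmetric, with the variances on the main diagonal
```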
STEP 3: COMPUTE THE EIGENVECTORS AND EIGENVALUES OF THE COVARIANCE MATRIX TO IDENTIFY THE PRINCIPAL COMPONENTS
Eigenvectors and eigenvalues are the linear algebra concepts that we need to compute from the covariance matrix in order to determine the principal components of the data. Before getting to the explanation of these concepts, let's first understand what we mean by principal components.
Principal components are new variables that are constructed as linear combinations or mixtures of the initial variables. These combinations are done in such a way that the new variables (i.e., principal components) are uncorrelated and most of the information within the initial variables is squeezed or compressed into the first components.
[Figure: percentage of variance (information) accounted for by each principal component.]
Organizing information in principal components this way will allow you to reduce dimensionality without losing much information, by discarding the components with low information and considering the remaining components as your new variables.
An important thing to realize here is that the principal components are less interpretable and don't have any real meaning, since they are constructed as linear combinations of the initial variables.
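To make "components with low information" concrete, here is a hedged sketch using scikit-learn's PCA to inspect the share of variance per component (the data set and the use of sklearn are my illustration, not the article's own code):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=100)  # third variable is nearly redundant

pca = PCA()
pca.fit(X)

# Fraction of total variance (information) carried by each principal component;
# the last component should carry almost none.
print(pca.explained_variance_ratio_)
```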
As there are as many principal components as there are variables in the data, principal components are constructed in such a manner that the first principal component accounts for the largest possible variance in the data set. For example, let's assume that the scatter plot of our data set is as shown below. Can we guess the first principal component? Yes, it's approximately the line that matches the purple marks, because it goes through the origin and it's the line on which the projection of the points (red dots) is the most spread out. Or, mathematically speaking, it's the line that maximizes the variance (the average of the squared distances from the projected points (red dots) to the origin).
[Figure: scatter plot of the data set, with the first principal component as the line through the origin along which the projected points are most spread out.]
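A small NumPy sketch of that "maximize the variance of the projections" idea (the data and the brute-force angle search are illustrative assumptions, not from the article):

```python
import numpy as np

rng = np.random.default_rng(1)
# Correlated two-dimensional data, centered so candidate lines pass through the origin.
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])
X -= X.mean(axis=0)

def projection_variance(X, angle):
    """Variance of the points projected onto the unit vector at this angle."""
    u = np.array([np.cos(angle), np.sin(angle)])
    return np.var(X @ u)

# Brute-force search over directions: the best one approximates PC1.
angles = np.linspace(0.0, np.pi, 180)
best = max(angles, key=lambda a: projection_variance(X, a))
print("direction of maximum projection variance (radians):", best)
```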
The second principal component is calculated in the same way, with the condition that it is uncorrelated with (i.e., perpendicular to) the first principal component and that it accounts for the next highest variance.
Without further ado, it is eigenvectors and eigenvalues that are behind all the magic explained above: the eigenvectors of the covariance matrix are actually the directions of the axes where there is the most variance (most information), and these are what we call principal components. Eigenvalues are simply the coefficients attached to eigenvectors, which give the amount of variance carried in each principal component.
Example:
Let’s suppose that our data set is 2-dimensional with 2 variables x,y and that the
eigenvectors and eigenvalues of the covariance matrix are as follows:
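The figure with these values isn't reproduced here; the following eigenpairs are illustrative stand-ins consistent with the discussion below:

\[
v_1 = \begin{pmatrix} 0.678 \\ 0.735 \end{pmatrix},\quad \lambda_1 = 1.284
\qquad
v_2 = \begin{pmatrix} -0.735 \\ 0.678 \end{pmatrix},\quad \lambda_2 = 0.049
\]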
If we rank the eigenvalues in descending order, we get λ1 > λ2, which means that the eigenvector that corresponds to the first principal component (PC1) is v1 and the one that corresponds to the second principal component (PC2) is v2.
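A minimal NumPy sketch of this step (the covariance matrix is an illustrative assumption, chosen so its eigenpairs match the illustrative values above):

```python
import numpy as np

# Illustrative 2x2 covariance matrix of standardized variables x and y.
cov_matrix = np.array([[0.6166, 0.6154],
                       [0.6154, 0.7166]])

# eigh is suited to symmetric matrices; it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Reorder so the largest eigenvalue (PC1) comes first; eigenvectors are columns.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

print(eigenvalues)   # variance carried by each principal component
print(eigenvectors)  # each column is a principal component direction
```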
STEP 4: FEATURE VECTOR
As we saw in the previous step, computing the eigenvectors and ordering them by their eigenvalues in descending order allows us to find the principal components in order of significance. In this step, what we do is choose whether to keep all these components or discard those of lesser significance (of low eigenvalues), and form with the remaining ones a matrix of vectors that we call the feature vector.
So, the feature vector is simply a matrix that has as columns the eigenvectors of
the components that we decide to keep. This makes it the first step towards
dimensionality reduction, because if we choose to keep only p eigenvectors
(components) out of n, the final data set will have only p dimensions.
Example:
Continuing with the example from the previous step, we can either form a feature
vector with both of the eigenvectors v1 and v2:
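With the illustrative eigenvectors from above as columns, that feature vector would be:

\[
\text{FeatureVector} = \begin{pmatrix} 0.678 & -0.735 \\ 0.735 & 0.678 \end{pmatrix}
\]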
Or discard the eigenvector v2, which is the one of lesser significance, and form a
feature vector with v1 only:
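Again with the illustrative numbers:

\[
\text{FeatureVector} = \begin{pmatrix} 0.678 \\ 0.735 \end{pmatrix}
\]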
So, as we saw in the example, it's up to you to choose whether to keep all the components or discard the ones of lesser significance, depending on what you are looking for. If you just want to describe your data in terms of new variables (principal components) that are uncorrelated, without seeking to reduce dimensionality, then leaving out the less significant components is not needed.
STEP 5: RECAST THE DATA ALONG THE PRINCIPAL COMPONENTS AXES
In the previous steps, apart from standardization, you do not make any changes to the data; you just select the principal components and form the feature vector, but the input data set always remains in terms of the original axes (i.e., in terms of the initial variables).
In this step, which is the last one, the aim is to use the feature vector formed from the eigenvectors of the covariance matrix to reorient the data from the original axes to the ones represented by the principal components (hence the name principal component analysis). This can be done by multiplying the transpose of the feature vector by the transpose of the standardized original data set.
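In formula form:

\[
\text{FinalDataSet} = \text{FeatureVector}^{T} \times \text{StandardizedOriginalDataSet}^{T}
\]

And, as a hedged end-to-end sketch tying the five steps together in NumPy (the function name pca and the data are illustrative assumptions, not the article's own code):

```python
import numpy as np

def pca(X, p):
    """Recast X (rows = observations) onto its first p principal components."""
    # Step 1: standardization.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: covariance matrix.
    cov_matrix = np.cov(X_std, rowvar=False)
    # Step 3: eigenvectors and eigenvalues, reordered by descending eigenvalue.
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
    order = np.argsort(eigenvalues)[::-1]
    eigenvectors = eigenvectors[:, order]
    # Step 4: feature vector = the p most significant eigenvectors, as columns.
    feature_vector = eigenvectors[:, :p]
    # Step 5: FinalDataSet = FeatureVector^T x StandardizedOriginalDataSet^T,
    # transposed back so rows are observations again.
    return (feature_vector.T @ X_std.T).T

X = np.random.default_rng(2).normal(size=(50, 3))
print(pca(X, p=2).shape)  # (50, 2)
```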
***
Zakaria Jaadi is a data scientist and machine learning engineer. Check out more of
his content on Data Science topics on Medium.