D3S2 _ Unsupervised - Dimensionality Reduction
Dimensionality Reduction
[Figure: a dataset table with features f1, f2, f3 and rows Row 1, Row 2, Row 3, illustrating the number of features to be reduced]
For ease of understanding, let us consider that we are trying to convert a dataset with two features, x1 and x2, into one feature.
[Figure: scatter plot of the data in the x1-x2 plane]
Now, which line did a better job at capturing the data with less loss of variety in the information?
[Figure: the same scatter plot with two candidate lines, Line 1 and Line 2, drawn through the data]
As compared to line 2, the spread of the data along line 1 is captured well if you compare it to the original data. Thus, the variety of information is captured more in line 1.
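To make "spread captured by a line" concrete, here is a minimal sketch (the points and the two directions below are made up for illustration; they are not the data from the plot): it projects 2-D points onto two candidate unit directions and compares the variance of the resulting 1-D values.

```python
import numpy as np

# Hypothetical 2-D data (NOT the data from the plot above)
X = np.array([[2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.1], [6.0, 5.9]])
X_centered = X - X.mean(axis=0)

# Two candidate unit directions ("lines" through the centre of the data)
line1 = np.array([1.0, 1.0]) / np.sqrt(2)   # roughly along the data's spread
line2 = np.array([1.0, -1.0]) / np.sqrt(2)  # roughly across the data's spread

# Variance of the 1-D projections onto each line
var1 = np.var(X_centered @ line1, ddof=1)
var2 = np.var(X_centered @ line2, ddof=1)
print(var1, var2)  # the direction along the spread retains far more variance
```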
PCA Terminologies
Dimension:-
- Number of features / variables / columns in a dataset
[Figure: a dataset table; each column (feature) is one dimension]
Principal Component:-
- New feature(s) that are constructed as linear combinations of initial variables
- The lines we drew were each a principal component
- Principal Component 1 (PC1) captures the maximum variance in the data (line 1)
- Followed by PC2 which captures the next maximum variance in the data (line 2)
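As a quick sanity check of these terms, the sketch below (assuming scikit-learn is available; the toy data is made up) fits PCA on a two-feature dataset: components_ holds the linear-combination weights that define each principal component, and explained_variance_ confirms that PC1 captures the most variance, followed by PC2.

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up two-feature dataset, just to inspect the PCA terminology
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, 2 * x1 + rng.normal(scale=0.3, size=100)])

pca = PCA(n_components=2).fit(X)

# Each row of components_ is one principal component: the weights of the
# linear combination of the original features (x1, x2) that defines it
print(pca.components_)

# Variance captured decreases from PC1 to PC2
print(pca.explained_variance_)
```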
Answer this!
Q. The number of principal components is always less than or equal to the number of
features in the dataset.
A) True
B) False
Answer this!
Q. Why should principal components be orthogonal (perpendicular) to each other?
Flowchart of steps involved in PCA
Standardization
● Gives all features equal weight and ensures that they have comparable variances
Eigenvectors
● They represent the loadings we saw earlier, which are the axes in the PCA plot
Projection
● Project the data onto the new coordinate system to show on the PCA plot
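Read as code, the flowchart amounts to the short NumPy sketch below (a generic outline on made-up data, not the worked example that follows): standardize, build the covariance matrix, decompose it into eigenvalues and eigenvectors, and project the data onto the eigenvectors.

```python
import numpy as np

# Made-up data: 100 rows, 3 features
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))

# 1. Standardization: zero mean, unit variance per feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# 2. Covariance matrix of the standardized data (3 x 3 here)
cov = np.cov(X_std, rowvar=False)

# 3. Eigen-decomposition; sort eigenvalues (variances along PCs) in descending order
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Projection: coordinates of each row on PC1, PC2, PC3
projections = X_std @ eigvecs
print(eigvals)           # variance captured by each PC
print(projections[:3])   # first three rows in PC space
```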
Let’s understand this process with an example
Again, note that we are choosing only two dimensions in the example for the ease of demonstration.

Data    x1    x2
Row 1    4    11
Row 2    8     4
Row 3   13     5
Row 4    7    14
Step 1:- Standardization
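A minimal sketch of this step on the table above. One assumption to flag: the projection values shown later (for example -4.31 for Row 1) are consistent with mean-centering the data only, so that is what is done here; full z-scoring would additionally divide each column by its standard deviation.

```python
import numpy as np

X = np.array([[4, 11],
              [8, 4],
              [13, 5],
              [7, 14]], dtype=float)

# Mean-center each feature (assumption: the later projection values match
# centered, unscaled data, so centering is used here)
X_centered = X - X.mean(axis=0)          # the column means are (8, 8.5)
print(X_centered)

# Full z-scoring would additionally be:
# X_scaled = X_centered / X.std(axis=0, ddof=1)
```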
Step 2:- Compute the Covariance Matrix
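Continuing the sketch, the covariance matrix of the centered data can be computed as below (the values are computed here, not read off the slide; the sample covariance with N - 1 in the denominator is assumed).

```python
import numpy as np

X = np.array([[4, 11], [8, 4], [13, 5], [7, 14]], dtype=float)
X_centered = X - X.mean(axis=0)

# 2 x 2 covariance matrix of the centered data (sample covariance)
cov = np.cov(X_centered, rowvar=False)
print(cov)
# [[ 14. -11.]
#  [-11.  23.]]
```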
Can you answer this?
Q. Guess the shape of the covariance matrix with 3 features. How will the matrix look?
Step 3:- Decompose Covariance Matrix into Eigenvalues
● However, we can get two principal components at most in this case. If the data had 30 features, we would get 30 eigenvalues and could then have the corresponding principal components PC1 to PC30.
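Decomposing that covariance matrix in code (the eigenvalues below are computed under the same assumptions as before):

```python
import numpy as np

cov = np.array([[14.0, -11.0],
                [-11.0, 23.0]])

# Eigen-decomposition of the symmetric covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; sort descending so PC1 comes first
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
print(eigvals)   # approx. [30.4, 6.6] -> variance captured by PC1 and PC2
```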
Step 4:- Computation of Eigenvectors from Eigenvalues
● Unit eigenvectors have a magnitude of 1, making them invariant to scaling and ensuring that the variance captured is due solely to their direction.
● You can try and do it for the second eigenvector, for PC2.
● The calculation process is just the same!
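The corresponding unit eigenvectors can be computed the same way (note that the sign of an eigenvector is arbitrary, so your result may come out flipped):

```python
import numpy as np

cov = np.array([[14.0, -11.0],
                [-11.0, 23.0]])

eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Columns are unit-length eigenvectors (loadings) for PC1 and PC2;
# signs are arbitrary, the directions are what matter
print(eigvecs)                           # approx. [[0.557, 0.830], [-0.830, 0.557]] up to sign
print(np.linalg.norm(eigvecs, axis=0))   # both have magnitude 1
```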
Step 5:- Projections on principal components
Let’s add PC1 and PC2 columns here to show the projection
Data    x1    x2    PC1
Row 1    4    11   -4.31
Row 2    8     4    3.74
Row 3   13     5    5.69
Row 4    7    14   -5.12
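Putting it together, the projections are just the centered data multiplied by the eigenvector matrix. This sketch reproduces the PC1 column above (-4.31, 3.74, 5.69, -5.12) and gives PC2 as well, up to an arbitrary sign flip per component.

```python
import numpy as np

X = np.array([[4, 11], [8, 4], [13, 5], [7, 14]], dtype=float)
X_centered = X - X.mean(axis=0)

cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvecs = eigvecs[:, order]

# Project each centered row onto PC1 and PC2
projections = X_centered @ eigvecs
print(np.round(projections, 2))
# PC1 column: approx. [-4.31, 3.74, 5.69, -5.12] (up to sign)
# PC2 for Row 1: approx. -1.93 (up to sign)
```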
Let’s visualize what just happened!
[Figure: the x1-x2 scatter plot with the PC1 and PC2 axes overlaid; e.g. Row 1 (4, 11) has PC1 = -4.31 and PC2 = -1.93]