
DYPW - DAY 3 SESSION 2

Unsupervised - Dimensionality Reduction


Topics to cover in this session
1. Dimensionality Reduction
2. Principal Component Analysis (PCA)
3. t-Distributed Stochastic Neighbor Embedding (t-SNE)
Dimensionality Reduction
● It refers to the techniques that reduce the number of input variables in a dataset.
Dimensionality Reduction

Imagine you have such a dataset to deal with…


It has a total of 50 variables (features) in it.

Data x1 x2 x3 x4 x5 x6 … x49 x50

Row 1 …

Row 2 …

Row 3 …
Dimensionality Reduction

Now, if you reduce this high-dimensional data to, let’s say, only 2 or 3 features…

Data f1 f2 f3
Row 1
Row 2
Row 3

…then the technique used here is referred to as a dimensionality reduction technique.
Dimensionality Reduction
We are going to study two such techniques in this session:-
1. PCA
2. t-SNE
Need for Dimensionality Reduction?
There are multiple reasons why reducing the dimensions of data is useful!

● What would happen if this five-day workshop’s content were taught to you in just one day, and you then had one assignment to solve based on it?

● What sort of problems do you think you would face?

1) The more the dimensions, the more the axes, and the more difficult it is to plot and visualize the data.
2) The more the dimensions, the more the computations, and hence the more time consumed.
3) The more the dimensions, the higher the possibility of overfitting on the data.
Principal Component Analysis
It is a technique to reduce the dimensionality of datasets, increasing interpretability while minimizing the loss of information.
For ease of understanding, let us consider that we are trying to convert a dataset with two features (x1 and x2) into one feature.

● Let’s try to draw a line on this feature space and project the data points onto that line.
● Now, draw another line, which should be perpendicular to the first line, and project the data points onto that line as well.

[Figure: data points in the x1 vs x2 feature space, projected onto two perpendicular lines]
Now, which line did a better job at capturing the data with less loss of variety in the information?

[Figure: the projections of the data points onto Line 1 and Line 2]

As compared to Line 2, the spread of the data along Line 1 is captured well if you compare it to the original data. Thus, the variety of information is captured more in Line 1.
PCA Terminologies
Dimension:-
- Number of features / variables / columns in a dataset

Data x1 x2 x3 x4 x5 x6 … x49 x50

Row 1 …

Row 2 …

Row 3 …
PCA Terminologies
Principal Component:-
- New feature(s) constructed as linear combinations of the initial variables
- The lines we drew earlier were each a principal component
- Principal Component 1 (PC1) captures the maximum variance in the data (Line 1)
- It is followed by PC2, which captures the next-largest share of variance in the data (Line 2)

Line 1 = PC1, Line 2 = PC2


PCA Terminologies
Projections:-
- Shifting (projecting) the original data points onto the principal components.

[Figure: data points in the x1-x2 space projected onto a principal component]
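As a sketch of the formula behind this (generic symbols, not taken from the slides): the coordinate of a data point along the k-th principal component is the dot product of the mean-centered point with that component’s unit eigenvector,

$$\text{score}_{ik} = (\mathbf{x}_i - \bar{\mathbf{x}}) \cdot \mathbf{v}_k.$$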
Answer this!
Q. The number of principal components is always less than or equal to the number of features in the dataset.

A) True
B) False

Answer: A) True. The number of principal components can be at most equal to the number of original features.
Answer this!
Q. Why should principal components be orthogonal (perpendicular) to each other?

A) It is not compulsory, they could be non-orthogonal too
B) To ensure they are uncorrelated, simplifying interpretation and subsequent analysis
C) To maximize the variance, making them highly correlated with each other
D) All of the above

Answer: B) Orthogonal components are uncorrelated, which simplifies interpretation and subsequent analysis.
Mathematical Intuition

● Each principal component is written as a linear combination of the original features; the coefficients of that combination are called the loadings.
● The whole idea is to find the values of these loadings.
● These loadings represent the directions (of the line / axis) in the original feature space.
● They indicate the contribution of each original feature to its principal component.
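As a generic sketch of that linear combination (the symbols below are illustrative, not taken from the slides), the first principal component of features x1, …, xp is

$$PC_1 = \phi_{11}x_1 + \phi_{21}x_2 + \dots + \phi_{p1}x_p, \qquad \sum_{j=1}^{p}\phi_{j1}^{2} = 1,$$

where the loadings $\phi_{j1}$ are the entries of the first unit eigenvector.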
Flowchart of steps involved in PCA

1. Standardization
● Gives all features equal weight and ensures that they have comparable variances

2. Covariance Matrix
● Represents the relationships and variances between pairs of features

3. Eigenvalues
● Represent the amount of variance explained by each principal component

4. Eigenvectors
● They represent the loadings, as we saw earlier, which become the axes of the PCA plot

5. Projection
● Project the data onto the new coordinate system to show on the PCA plot
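A minimal NumPy sketch of these five steps, applied to the small 4-row, 2-feature dataset used in the next slides (variable names are mine, not from the slides; full standardization is skipped here, matching the slide’s observation that it is not required for this data):

import numpy as np

# The example dataset used in the following slides (4 rows, 2 features).
X = np.array([[4.0, 11.0],
              [8.0, 4.0],
              [13.0, 5.0],
              [7.0, 14.0]])

# Step 1: Standardization is skipped (features share a similar range);
#         the data is only centered at its mean.
Xc = X - X.mean(axis=0)

# Step 2: Covariance matrix of the features (columns).
cov = np.cov(Xc, rowvar=False)

# Steps 3 and 4: Eigenvalues and eigenvectors of the covariance matrix,
#                sorted so that PC1 (largest variance) comes first.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 5: Projection of the centered data onto the principal components.
#         Note: eigenvector signs are arbitrary, so the score signs may be
#         flipped relative to the slides.
scores = Xc @ eigvecs

print(np.round(cov, 2))      # covariance matrix
print(np.round(eigvals, 2))  # variance explained by PC1 and PC2
print(np.round(scores, 2))   # PC1 / PC2 coordinates of each row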
Let’s understand this process with an example

Again, note that we are choosing only two dimensions in the example for the ease of demonstration.

Data x1 x2
Row 1 4 11
Row 2 8 4
Row 3 13 5
Row 4 7 14
Step 1:- Standardization

Data x1 x2
Row 1 4 11
Row 2 8 4
Row 3 13 5
Row 4 7 14

Do you think standardization is required for this data?

- No, as the data points are in the same range, it is not required.
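For data whose features are on very different scales, a z-score standardization would be applied first. A minimal sketch (the function name is mine):

import numpy as np

def standardize(X):
    # z-score each feature: subtract its mean and divide by its standard deviation
    return (X - X.mean(axis=0)) / X.std(axis=0)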
Step 2:- Compute the Covariance Matrix

Data x1 x2
Row 1 4 11
Row 2 8 4
Row 3 13 5
Row 4 7 14
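The worked computation from the slide images is not reproduced above, so here is a sketch of it, assuming the sample covariance with an n-1 denominator (this choice is consistent with the PC1/PC2 values shown later). Deviations are taken from the feature means $\bar{x}_1 = 8$ and $\bar{x}_2 = 8.5$:

$$\mathrm{var}(x_1)=\tfrac{(-4)^2+0^2+5^2+(-1)^2}{3}=14,\qquad \mathrm{var}(x_2)=\tfrac{2.5^2+(-4.5)^2+(-3.5)^2+5.5^2}{3}=23$$

$$\mathrm{cov}(x_1,x_2)=\tfrac{(-4)(2.5)+(0)(-4.5)+(5)(-3.5)+(-1)(5.5)}{3}=-11 \;\;\Rightarrow\;\; C=\begin{bmatrix}14 & -11\\ -11 & 23\end{bmatrix}$$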
Can you answer this?
Q. Guess the shape of the covariance matrix for a dataset with 3 features. How would the matrix look?
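As a sketch of the answer: the covariance matrix of 3 features is 3 x 3 and symmetric, with variances on the diagonal and pairwise covariances off the diagonal:

$$C=\begin{bmatrix}\mathrm{var}(x_1) & \mathrm{cov}(x_1,x_2) & \mathrm{cov}(x_1,x_3)\\ \mathrm{cov}(x_2,x_1) & \mathrm{var}(x_2) & \mathrm{cov}(x_2,x_3)\\ \mathrm{cov}(x_3,x_1) & \mathrm{cov}(x_3,x_2) & \mathrm{var}(x_3)\end{bmatrix}$$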
Step 3:- Decompose Covariance Matrix into Eigenvalues

Data x1 x2
Row 1 4 11
Row 2 8 4
Row 3 13 5
Row 4 7 14

● The first eigenvalue explains the maximum variance, hence the corresponding eigenvector is the first principal component, PC1.
● The second eigenvalue corresponds to the second principal component, PC2, which captures the next-largest amount of variance after PC1.
● As we are looking to reduce the dimensionality of the given dataset, PC1 is essentially what we are trying to get here.
● However, we can get at most two principal components in this case. If the data had 30 features, we would get 30 eigenvalues and the corresponding principal components PC1 to PC30.
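A sketch of the eigenvalue computation for the covariance matrix found above (values rounded):

$$\det(C-\lambda I)=(14-\lambda)(23-\lambda)-(-11)^2=\lambda^{2}-37\lambda+201=0$$

$$\lambda_1\approx 30.39\ \ (\text{PC1}),\qquad \lambda_2\approx 6.61\ \ (\text{PC2})$$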
Step 4:- Computation of Eigenvectors from Eigenvalues

Data x1 x2
Row 1 4 11
Row 2 8 4
Row 3 13 5
Row 4 7 14

● Unit eigenvectors have a magnitude of 1, making them invariant to scaling and ensuring that the variance is captured solely due to the direction.
● You can try and do it for the second eigenvector, for PC2.
● The calculation process is just the same!
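A sketch of how the first unit eigenvector follows from $\lambda_1$ (values rounded; this vector is consistent with the PC1 projection values shown in the next step):

$$(C-\lambda_1 I)\,\mathbf{v}=\mathbf{0}\;\Rightarrow\;(14-30.39)\,v_1-11\,v_2=0\;\Rightarrow\;v_2\approx-1.49\,v_1$$

$$\mathbf{v}_{1}\approx\begin{bmatrix}0.557\\-0.830\end{bmatrix}\ (\text{PC1}),\qquad \mathbf{v}_{2}\approx\begin{bmatrix}0.830\\0.557\end{bmatrix}\ (\text{PC2, orthogonal to PC1})$$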
Step 5:- Projections on principal components

Let’s add PC1 and PC2 columns here to show the projections.

Data x1 x2 PC1 PC2
Row 1 4 11 -4.31 -1.93
Row 2 8 4 3.74 -2.51
Row 3 13 5 5.69 2.20
Row 4 7 14 -5.12 2.24

Repeat these steps for PC2.
- Note that we don’t really need PC2 here, but for the visualization purpose we are calculating it here.
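As a sketch of where one of these values comes from: center Row 1 at the mean (8, 8.5) and take its dot product with the PC1 unit eigenvector from the previous step:

$$\text{PC1(Row 1)}=(4-8)(0.557)+(11-8.5)(-0.830)\approx-2.23-2.08\approx-4.31$$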
Let’s visualize what just happened!

Data x1 x2 PC1 PC2
Row 1 4 11 -4.31 -1.93
Row 2 8 4 3.74 -2.51
Row 3 13 5 5.69 2.20
Row 4 7 14 -5.12 2.24

[Figure sequence: the data plotted on the original x1-x2 axes, the PC1 and PC2 axes overlaid through the mean, and finally the data shown on the PCA axes]

● This is the actual plot of the data, including the mean (8, 8.5).
● As we centralized our data at the mean, that point is the origin for our PCA plot. Add dotted lines at that origin for the PCA plot.
● Now, mark the direction of PC1, and then the direction of PC2.
● Plot the PC1 axis and the PC2 axis on the graph: these are the transformed axes for the PCA plot. Remove anything extra…
● Check the projections of the data points on PC1 now: -4.31, 3.74, 5.69, -5.12. Similarly, you can do the projections for PC2.
● But did you get the whole point of what has happened? If not, this will make it clear: make the center of the PCA plot meet the origin of the actual plot and change the coordinates.
● Did you see the anticlockwise rotational transformation?
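A minimal Matplotlib sketch of this visualization (my own code, not the deck’s notebook): it plots the data in the original x1-x2 space with the PC directions drawn through the mean, and then the same points after the rotation onto the PCA axes.

import numpy as np
import matplotlib.pyplot as plt

X = np.array([[4, 11], [8, 4], [13, 5], [7, 14]], dtype=float)
mean = X.mean(axis=0)
Xc = X - mean

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]   # columns: PC1, PC2
scores = Xc @ eigvecs                             # data in PC coordinates

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Left: original feature space with the PC directions through the mean.
ax1.scatter(X[:, 0], X[:, 1])
for k, label in enumerate(["PC1", "PC2"]):
    d = eigvecs[:, k] * 5                         # scale the direction for visibility
    ax1.plot([mean[0] - d[0], mean[0] + d[0]],
             [mean[1] - d[1], mean[1] + d[1]], "--")
    ax1.annotate(label, mean + d)
ax1.set_xlabel("x1"); ax1.set_ylabel("x2"); ax1.set_title("Original axes")

# Right: the same points after the rotation into PC coordinates.
ax2.scatter(scores[:, 0], scores[:, 1])
ax2.axhline(0, linestyle="--"); ax2.axvline(0, linestyle="--")
ax2.set_xlabel("PC1"); ax2.set_ylabel("PC2"); ax2.set_title("PCA axes")

plt.tight_layout()
plt.show()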
Check out the Python implementation of PCA
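The notebook itself is not included in this deck; as a minimal scikit-learn sketch on the same example dataset (component signs may come out flipped relative to the slides, since an eigenvector’s direction is arbitrary):

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[4, 11], [8, 4], [13, 5], [7, 14]], dtype=float)

pca = PCA(n_components=2)        # keep both components just to inspect them
scores = pca.fit_transform(X)    # data expressed in PC1/PC2 coordinates

print(np.round(pca.components_, 3))           # loadings (unit eigenvectors)
print(np.round(pca.explained_variance_, 2))   # eigenvalues: ~30.39 and ~6.61
print(np.round(scores, 2))                    # compare with the PC1/PC2 table above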
