
DYPW - DAY 3 SESSION 2

Unsupervised - Dimensionality Reduction


Topics to cover in this session
1. Dimensionality Reduction
2. Principal Component Analysis (PCA)
3. t-Distributed Stochastic Neighbor Embedding (t-SNE)
Dimensionality Reduction
● It refers to the techniques that reduce the number of input variables in a dataset.
Dimensionality Reduction

Imagine you have such a dataset to deal with…


It has a total of 50 variables (features) in it.

Data x1 x2 x3 x4 x5 x6 … x49 x50

Row 1 …

Row 2 …

Row 3 …
Dimensionality Reduction

Now, if you reduce this high-dimensional data to, let’s say, only 2 or 3 features…

Data f1 f2 f3
Row 1
Row 2
Row 3

…then the technique used here is referred to as a dimensionality reduction technique.
Dimensionality Reduction
We are going to study two such techniques in this session:-
1. PCA
2. t-SNE
Need for Dimensionality Reduction?
There are multiple reasons why reducing the dimensions of data is useful!

● What would happen if this five-day workshop’s content were taught to you in just one day, and you then had one assignment to solve based on it?

● What sort of problems do you think you would face?

1) The more the dimensions, the more the axes, and the more difficult it is to plot and visualize the data.
2) The more the dimensions, the more the computations, and hence the more time consumed.
3) The more the dimensions, the higher the possibility of overfitting on the data.
Principal Component Analysis
It is a technique to reduce the dimensionality of datasets, increasing interpretability while minimizing the loss of information.
For ease of understanding, let us consider that we are trying to convert a dataset with two features (x1 and x2) into one feature.

● Let’s try to draw a line on this feature space and project the data points onto that line.
● Now, draw another line, which should be perpendicular to the first line, and project the data points onto that line as well.

[Figure: data points in the x1 vs x2 feature space, projected onto two perpendicular lines]
Now, which line did a better job at capturing the data with less loss of variety in the information?

[Figure: the projections of the data points onto Line 1 and Line 2]

As compared to Line 2, the spread of the data along Line 1 is captured well if you compare it to the original data. Thus, the variety of information is captured more in Line 1.
PCA Terminologies
Dimension:-
- Number of features / variables / columns in a dataset

Data x1 x2 x3 x4 x5 x6 … x49 x50

Row 1 …

Row 2 …

Row 3 …
PCA Terminologies
Principal Component:-
- New feature(s) constructed as linear combinations of the initial variables
- The lines we drew earlier were each a principal component
- Principal Component 1 (PC1) captures the maximum variance in the data (Line 1)
- It is followed by PC2, which captures the next-largest share of variance in the data (Line 2)

Line 1 = PC1, Line 2 = PC2


PCA Terminologies
Projections:-
- Shifting (projecting) the original data points onto the principal components.

[Figure: data points in the x1-x2 space projected onto a principal component]
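As a sketch of the formula behind this (generic symbols, not taken from the slides): the coordinate of a data point along the k-th principal component is the dot product of the mean-centered point with that component’s unit eigenvector,

$$\text{score}_{ik} = (\mathbf{x}_i - \bar{\mathbf{x}}) \cdot \mathbf{v}_k.$$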
Answer this!
Q. The number of principal components is always less than or equal to the number of features in the dataset.

A) True
B) False

Answer: A) True. The number of principal components can be at most equal to the number of original features.
Answer this!
Q. Why should principal components be orthogonal (perpendicular) to each other?

A) It is not compulsory, they could be non-orthogonal too
B) To ensure they are uncorrelated, simplifying interpretation and subsequent analysis
C) To maximize the variance, making them highly correlated with each other
D) All of the above

Answer: B) Orthogonal components are uncorrelated, which simplifies interpretation and subsequent analysis.
Mathematical Intuition

● Each principal component is written as a linear combination of the original features; the coefficients of that combination are called the loadings.
● The whole idea is to find the values of these loadings.
● These loadings represent the directions (of the line / axis) in the original feature space.
● They indicate the contribution of each original feature to its principal component.
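As a generic sketch of that linear combination (the symbols below are illustrative, not taken from the slides), the first principal component of features x1, …, xp is

$$PC_1 = \phi_{11}x_1 + \phi_{21}x_2 + \dots + \phi_{p1}x_p, \qquad \sum_{j=1}^{p}\phi_{j1}^{2} = 1,$$

where the loadings $\phi_{j1}$ are the entries of the first unit eigenvector.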
Flowchart of steps involved in PCA

1. Standardization
● Gives all features equal weight and ensures that they have comparable variances

2. Covariance Matrix
● Represents the relationships and variances between pairs of features

3. Eigenvalues
● Represent the amount of variance explained by each principal component

4. Eigenvectors
● They represent the loadings, as we saw earlier, which become the axes of the PCA plot

5. Projection
● Project the data onto the new coordinate system to show on the PCA plot
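A minimal NumPy sketch of these five steps, applied to the small 4-row, 2-feature dataset used in the next slides (variable names are mine, not from the slides; full standardization is skipped here, matching the slide’s observation that it is not required for this data):

import numpy as np

# The example dataset used in the following slides (4 rows, 2 features).
X = np.array([[4.0, 11.0],
              [8.0, 4.0],
              [13.0, 5.0],
              [7.0, 14.0]])

# Step 1: Standardization is skipped (features share a similar range);
#         the data is only centered at its mean.
Xc = X - X.mean(axis=0)

# Step 2: Covariance matrix of the features (columns).
cov = np.cov(Xc, rowvar=False)

# Steps 3 and 4: Eigenvalues and eigenvectors of the covariance matrix,
#                sorted so that PC1 (largest variance) comes first.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 5: Projection of the centered data onto the principal components.
#         Note: eigenvector signs are arbitrary, so the score signs may be
#         flipped relative to the slides.
scores = Xc @ eigvecs

print(np.round(cov, 2))      # covariance matrix
print(np.round(eigvals, 2))  # variance explained by PC1 and PC2
print(np.round(scores, 2))   # PC1 / PC2 coordinates of each row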
Let’s understand this process with an example

Again, note that we are choosing only two dimensions in the example for the ease of demonstration.

Data x1 x2
Row 1 4 11
Row 2 8 4
Row 3 13 5
Row 4 7 14
Step 1:- Standardization

Data x1 x2
Row 1 4 11
Row 2 8 4
Row 3 13 5
Row 4 7 14

Do you think standardization is required for this data?

- No, as the data points are in the same range, it is not required.
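For data whose features are on very different scales, a z-score standardization would be applied first. A minimal sketch (the function name is mine):

import numpy as np

def standardize(X):
    # z-score each feature: subtract its mean and divide by its standard deviation
    return (X - X.mean(axis=0)) / X.std(axis=0)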
Step 2:- Compute the Covariance Matrix

Data x1 x2
Row 1 4 11
Row 2 8 4
Row 3 13 5
Row 4 7 14
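The worked computation from the slide images is not reproduced above, so here is a sketch of it, assuming the sample covariance with an n-1 denominator (this choice is consistent with the PC1/PC2 values shown later). Deviations are taken from the feature means $\bar{x}_1 = 8$ and $\bar{x}_2 = 8.5$:

$$\mathrm{var}(x_1)=\tfrac{(-4)^2+0^2+5^2+(-1)^2}{3}=14,\qquad \mathrm{var}(x_2)=\tfrac{2.5^2+(-4.5)^2+(-3.5)^2+5.5^2}{3}=23$$

$$\mathrm{cov}(x_1,x_2)=\tfrac{(-4)(2.5)+(0)(-4.5)+(5)(-3.5)+(-1)(5.5)}{3}=-11 \;\;\Rightarrow\;\; C=\begin{bmatrix}14 & -11\\ -11 & 23\end{bmatrix}$$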
Can you answer this?
Q. Guess the shape of the covariance matrix for a dataset with 3 features. How would the matrix look?
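As a sketch of the answer: the covariance matrix of 3 features is 3 x 3 and symmetric, with variances on the diagonal and pairwise covariances off the diagonal:

$$C=\begin{bmatrix}\mathrm{var}(x_1) & \mathrm{cov}(x_1,x_2) & \mathrm{cov}(x_1,x_3)\\ \mathrm{cov}(x_2,x_1) & \mathrm{var}(x_2) & \mathrm{cov}(x_2,x_3)\\ \mathrm{cov}(x_3,x_1) & \mathrm{cov}(x_3,x_2) & \mathrm{var}(x_3)\end{bmatrix}$$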
Step 3:- Decompose Covariance Matrix into Eigenvalues

Data x1 x2
Row 1 4 11
Row 2 8 4
Row 3 13 5
Row 4 7 14

● The first eigenvalue explains the maximum variance, hence the corresponding eigenvector is the first principal component, PC1.
● The second eigenvalue corresponds to the second principal component, PC2, which captures the next-largest amount of variance after PC1.
● As we are looking to reduce the dimensionality of the given dataset, PC1 is essentially what we are trying to get here.
● However, we can get at most two principal components in this case. If the data had 30 features, we would get 30 eigenvalues and the corresponding principal components PC1 to PC30.
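A sketch of the eigenvalue computation for the covariance matrix found above (values rounded):

$$\det(C-\lambda I)=(14-\lambda)(23-\lambda)-(-11)^2=\lambda^{2}-37\lambda+201=0$$

$$\lambda_1\approx 30.39\ \ (\text{PC1}),\qquad \lambda_2\approx 6.61\ \ (\text{PC2})$$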
Step 4:- Computation of Eigenvectors from Eigenvalues

Data x1 x2
Row 1 4 11
Row 2 8 4
Row 3 13 5
Row 4 7 14

● Unit eigenvectors have a magnitude of 1, making them invariant to scaling and ensuring that the variance is captured solely due to the direction.
● You can try and do it for the second eigenvector, for PC2.
● The calculation process is just the same!
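A sketch of how the first unit eigenvector follows from $\lambda_1$ (values rounded; this vector is consistent with the PC1 projection values shown in the next step):

$$(C-\lambda_1 I)\,\mathbf{v}=\mathbf{0}\;\Rightarrow\;(14-30.39)\,v_1-11\,v_2=0\;\Rightarrow\;v_2\approx-1.49\,v_1$$

$$\mathbf{v}_{1}\approx\begin{bmatrix}0.557\\-0.830\end{bmatrix}\ (\text{PC1}),\qquad \mathbf{v}_{2}\approx\begin{bmatrix}0.830\\0.557\end{bmatrix}\ (\text{PC2, orthogonal to PC1})$$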
Step 5:- Projections on principal components

Let’s add PC1 and PC2 columns here to show the projections.

Data x1 x2 PC1 PC2
Row 1 4 11 -4.31 -1.93
Row 2 8 4 3.74 -2.51
Row 3 13 5 5.69 2.20
Row 4 7 14 -5.12 2.24

Repeat these steps for PC2.
- Note that we don’t really need PC2 here, but for the visualization purpose we are calculating it here.
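As a sketch of where one of these values comes from: center Row 1 at the mean (8, 8.5) and take its dot product with the PC1 unit eigenvector from the previous step:

$$\text{PC1(Row 1)}=(4-8)(0.557)+(11-8.5)(-0.830)\approx-2.23-2.08\approx-4.31$$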
Let’s visualize what just happened!

Data x1 x2 PC1 PC2
Row 1 4 11 -4.31 -1.93
Row 2 8 4 3.74 -2.51
Row 3 13 5 5.69 2.20
Row 4 7 14 -5.12 2.24

[Figure sequence: the data plotted on the original x1-x2 axes, the PC1 and PC2 axes overlaid through the mean, and finally the data shown on the PCA axes]

● This is the actual plot of the data, including the mean (8, 8.5).
● As we centralized our data at the mean, that point is the origin for our PCA plot. Add dotted lines at that origin for the PCA plot.
● Now, mark the direction of PC1, and then the direction of PC2.
● Plot the PC1 axis and the PC2 axis on the graph: these are the transformed axes for the PCA plot. Remove anything extra…
● Check the projections of the data points on PC1 now: -4.31, 3.74, 5.69, -5.12. Similarly, you can do the projections for PC2.
● But did you get the whole point of what has happened? If not, this will make it clear: make the center of the PCA plot meet the origin of the actual plot and change the coordinates.
● Did you see the anticlockwise rotational transformation?
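A minimal Matplotlib sketch of this visualization (my own code, not the deck’s notebook): it plots the data in the original x1-x2 space with the PC directions drawn through the mean, and then the same points after the rotation onto the PCA axes.

import numpy as np
import matplotlib.pyplot as plt

X = np.array([[4, 11], [8, 4], [13, 5], [7, 14]], dtype=float)
mean = X.mean(axis=0)
Xc = X - mean

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]   # columns: PC1, PC2
scores = Xc @ eigvecs                             # data in PC coordinates

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Left: original feature space with the PC directions through the mean.
ax1.scatter(X[:, 0], X[:, 1])
for k, label in enumerate(["PC1", "PC2"]):
    d = eigvecs[:, k] * 5                         # scale the direction for visibility
    ax1.plot([mean[0] - d[0], mean[0] + d[0]],
             [mean[1] - d[1], mean[1] + d[1]], "--")
    ax1.annotate(label, mean + d)
ax1.set_xlabel("x1"); ax1.set_ylabel("x2"); ax1.set_title("Original axes")

# Right: the same points after the rotation into PC coordinates.
ax2.scatter(scores[:, 0], scores[:, 1])
ax2.axhline(0, linestyle="--"); ax2.axvline(0, linestyle="--")
ax2.set_xlabel("PC1"); ax2.set_ylabel("PC2"); ax2.set_title("PCA axes")

plt.tight_layout()
plt.show()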
Check out the Python implementation of PCA
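The notebook itself is not included in this deck; as a minimal scikit-learn sketch on the same example dataset (component signs may come out flipped relative to the slides, since an eigenvector’s direction is arbitrary):

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[4, 11], [8, 4], [13, 5], [7, 14]], dtype=float)

pca = PCA(n_components=2)        # keep both components just to inspect them
scores = pca.fit_transform(X)    # data expressed in PC1/PC2 coordinates

print(np.round(pca.components_, 3))           # loadings (unit eigenvectors)
print(np.round(pca.explained_variance_, 2))   # eigenvalues: ~30.39 and ~6.61
print(np.round(scores, 2))                    # compare with the PC1/PC2 table above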
