ML Unit 2, Part 2

Dimensionality reduction is a technique to convert high-dimensional datasets into lower dimensions while preserving essential information, addressing the challenges of high-dimensional data, such as overfitting. The two main approaches are feature selection, which involves selecting relevant features, and feature extraction, which transforms high-dimensional data into fewer dimensions. Principal Component Analysis (PCA) is a key method in feature extraction that identifies the most significant variables by maximizing variance in the lower-dimensional space.


DIMENSIONALITY REDUCTION

The dimensionality reduction technique can be defined as "a way of converting a higher-dimensional dataset into a lower-dimensional dataset while ensuring that it provides similar information."

Need for dimensionality reduction:

Handling high-dimensional data is very difficult in practice, a problem commonly known as the curse of
dimensionality. As the dimensionality of the input dataset increases, machine learning algorithms and
models become more complex. As the number of features grows, the number of samples required to cover
the feature space grows rapidly, and the chance of overfitting increases. A machine learning model
trained on high-dimensional data therefore tends to become overfitted and performs poorly on new data.

Hence, it is often required to reduce the number of features, which can be done with
dimensionality reduction.

Approaches to Dimensionality Reduction:

There are two ways to apply dimensionality reduction, which are given below:
Feature Selection:

Feature selection is the process of selecting a subset of the relevant features and leaving out the
irrelevant features present in a dataset in order to build a highly accurate model. In other words, it
is a way of selecting the optimal features from the input dataset.

Three kinds of methods are used for feature selection:

1. Filter Methods

In this method, the dataset is filtered using statistical measures, and a subset that contains only
the relevant features is kept. Some common filter techniques are listed below (a short sketch follows the list):

o Correlation
o Chi-Square Test
o ANOVA
o Information Gain, etc.
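
As a minimal, hedged sketch of a filter method, the snippet below uses scikit-learn's SelectKBest with the chi-square test; the Iris dataset and the choice of k = 2 are illustrative assumptions, not part of these notes:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Load a small dataset with 4 features; chi2 requires non-negative features.
X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest chi-square score against the labels.
selector = SelectKBest(score_func=chi2, k=2)
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)   # (150, 4) -> (150, 2)
print("chi-square scores:", selector.scores_)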

2. Wrapper Methods

The wrapper method has the same goal as the filter method, but it uses a machine learning model for
its evaluation. In this method, subsets of features are fed to the ML model and its performance is
evaluated; the results decide whether features are added or removed to increase the accuracy of the
model. This approach is more accurate than filtering but more complex and computationally expensive.
Some common wrapper techniques are listed below (a sketch follows the list):

o Forward Selection
o Backward Selection
o Bi-directional Elimination
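
As a hedged sketch of a wrapper method, the snippet below performs forward selection with scikit-learn's SequentialFeatureSelector; the k-nearest-neighbours model and the target of 2 features are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# The ML model whose cross-validated accuracy evaluates each feature subset.
model = KNeighborsClassifier(n_neighbors=3)

# Forward selection: start empty, repeatedly add the feature that helps most.
sfs = SequentialFeatureSelector(model, n_features_to_select=2,
                                direction="forward", cv=5)
sfs.fit(X, y)
print("selected feature indices:", sfs.get_support(indices=True))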

3. Embedded Methods: Embedded methods perform selection during the training of the machine learning
model itself, checking the different training iterations and evaluating the importance of each
feature. Some common embedded techniques are listed below (a sketch follows the list):

o LASSO
o Elastic Net
o Ridge Regression, etc.
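
As a hedged sketch of an embedded method, the snippet below fits a LASSO model, whose L1 penalty drives some coefficients to exactly zero during training; the Diabetes dataset and alpha = 0.5 are illustrative assumptions:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

# Training itself performs the selection: zeroed coefficients mean dropped features.
lasso = Lasso(alpha=0.5).fit(X, y)
print("nonzero coefficients:", np.sum(lasso.coef_ != 0), "of", X.shape[1])

# Keep only the features whose coefficients survived the L1 penalty.
X_reduced = SelectFromModel(lasso, prefit=True).transform(X)
print(X.shape, "->", X_reduced.shape)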

Feature Extraction:

Feature extraction is the process of transforming data from a space with many dimensions into a
space with fewer dimensions. This approach is useful when we want to keep the whole information
while using fewer resources to process it.
Some common feature extraction techniques are listed below (a brief sketch follows the list):

a. Principal Component Analysis
b. Linear Discriminant Analysis
c. Kernel PCA
d. Quadratic Discriminant Analysis
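
As a brief, hedged sketch, the snippet below applies two of these techniques in scikit-learn; the Iris dataset, 2 components, and the RBF kernel are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA

X, _ = load_iris(return_X_y=True)

# Linear projection onto the 2 directions of maximum variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Kernel PCA does the same in an implicit nonlinear feature space.
X_kpca = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)

print(X.shape, "->", X_pca.shape, "and", X_kpca.shape)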

Principal Component Analysis

Karl Pearson was the first to introduce this technique. It is based on the idea that when data from
a higher-dimensional space is projected into a lower-dimensional space, the lower-dimensional space
should retain the maximum variation of the data. In simple terms, principal component analysis (PCA)
is a way to obtain important variables (in the form of components) from a large set of variables in
a dataset. It finds the directions in which the data is most spread out. PCA is most useful when the
data has three or more dimensions.
When applying the PCA method, the following are the primary steps to follow (a NumPy sketch of
these steps appears after the list):
1. Obtain the dataset you need.
2. Calculate the mean vector (the mean along each dimension).
3. Subtract the mean from each data point to center the data.
4. Compute the covariance matrix of the centered data.
5. Determine the eigenvectors and eigenvalues of the covariance matrix.
6. Create a feature vector by deciding which components are the major ones, i.e. the
principal components.
7. Create a new dataset by projecting the centered data onto the chosen eigenvectors. As a result,
we keep a smaller number of eigenvectors, and some information may be lost in the process;
however, the remaining eigenvectors retain the most significant variances.
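
As a from-scratch, hedged sketch of these steps in NumPy (the random data, the variable names, and keeping k = 1 component are illustrative assumptions):

import numpy as np

X = np.random.default_rng(0).normal(size=(100, 3))   # step 1: obtain a dataset

mean = X.mean(axis=0)                   # step 2: mean vector
Xc = X - mean                           # step 3: subtract the mean
cov = np.cov(Xc, rowvar=False)          # step 4: covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # step 5: eigenvalues and eigenvectors

order = np.argsort(eigvals)[::-1]       # step 6: rank components by variance
components = eigvecs[:, order]          # columns are the principal components

k = 1                                   # number of components to keep
X_new = Xc @ components[:, :k]          # step 7: project onto the new basis
print("principal component:", components[:, 0])
print("projected shape:", X_new.shape)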

Below is a practice question for Principal Component Analysis (PCA):

Problem-01: The given data are x = 2, 3, 4, 5, 6, 7 and y = 1, 5, 3, 6, 7, 8. Using the PCA
algorithm, compute the principal component.

OR

Consider the two-dimensional patterns (2, 1), (3, 5), (4, 3), (5, 6), (6, 7) and (7, 8). Using the
PCA algorithm, compute the principal component.


OR

Calculate the principal component of the following data:

        Class 1 values    Class 2 values
X       2, 3, 4           5, 6, 7
Y       1, 5, 3           6, 7, 8

Answer:
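
The worked answer is left blank in these notes. As a hedged numerical sketch, the principal component of the six patterns can be computed with NumPy; dividing the covariance by n (the population convention common in textbook treatments of this problem) is an assumption here:

import numpy as np

# The six patterns: x = (2, 3, 4, 5, 6, 7), y = (1, 5, 3, 6, 7, 8).
X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]], dtype=float)

mean = X.mean(axis=0)              # -> [4.5, 5.0]
Xc = X - mean
cov = (Xc.T @ Xc) / len(X)         # approximately [[2.92, 3.67], [3.67, 5.67]]

eigvals, eigvecs = np.linalg.eigh(cov)
principal = eigvecs[:, np.argmax(eigvals)]   # eigenvector of the largest eigenvalue

print("largest eigenvalue:", eigvals.max())  # roughly 8.21
print("principal component:", principal)     # direction of maximum spread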
