ML Unit 2 Part -2
Dimensionality reduction can be defined as "a way of converting a higher-dimensional dataset into a
lower-dimensional dataset while ensuring that it provides similar information."
Handling high-dimensional data is very difficult in practice, a problem commonly known as the curse of
dimensionality. As the dimensionality of the input dataset increases, machine learning algorithms and
models become more complex. As the number of features increases, the number of samples needed to
cover the feature space also increases, and so does the chance of overfitting. A machine learning model
trained on high-dimensional data therefore tends to become overfitted and perform poorly.
Hence, it is often required to reduce the number of features, which can be done with
dimensionality reduction.
There are two ways to apply dimensionality reduction, which are given below:
Feature Selection:
Feature selection is the process of selecting a subset of the relevant features and leaving out the
irrelevant features present in a dataset, in order to build a model of high accuracy. In other words, it is a
way of selecting the optimal features from the input dataset.
1. Filter Methods
In this method, the dataset is filtered, and only a subset containing the relevant features is kept; no
learning model is involved in the scoring. Some common filter techniques are listed below (a small
correlation-filter sketch follows the list):
o Correlation
o Chi-Square Test
o ANOVA
o Information Gain, etc.
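As an illustration of a simple filter, the sketch below scores each feature by its absolute Pearson correlation with the target and keeps the top k. The synthetic data, the value of k, and the helper name correlation_filter are illustrative assumptions, not part of these notes.

```python
import numpy as np

def correlation_filter(X, y, k):
    """Keep the k features with the largest absolute Pearson correlation to the target."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    kept = np.sort(np.argsort(scores)[::-1][:k])   # indices of the k best-scoring features
    return kept, scores

# toy data: five features, but only the first two actually drive the target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

kept, scores = correlation_filter(X, y, k=2)
print("kept feature indices:", kept)
print("correlation scores:", scores.round(2))
```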
2. Wrapper Methods
The wrapper method has the same goal as the filter method, but it uses a machine learning model for its
evaluation. Subsets of features are fed to the ML model and its performance is evaluated; that
performance decides whether features are added or removed to increase the accuracy of the model.
This approach is usually more accurate than filtering but more expensive to run. Some common wrapper
techniques are listed below (a forward-selection sketch follows the list):
o Forward Selection
o Backward Selection
o Bi-directional Elimination
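The sketch below illustrates forward selection, one of the wrapper techniques above: starting from an empty set, the feature that most improves cross-validated accuracy is added at each step, and the loop stops when no candidate helps. The breast-cancer dataset, the logistic-regression model, and the stopping rule are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
while remaining:
    # evaluate every candidate feature when added to the current subset
    trials = [(cross_val_score(model, X[:, selected + [f]], y, cv=5).mean(), f)
              for f in remaining]
    score, best_f = max(trials)
    if score <= best_score:          # stop once no candidate improves accuracy
        break
    best_score, selected = score, selected + [best_f]
    remaining.remove(best_f)

print("selected feature indices:", selected)
print("cross-validated accuracy:", round(best_score, 3))
```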
3. Embedded Methods: Embedded methods perform feature selection as part of the model's own
training process, evaluating the importance of each feature while the model is being fitted. Some
common embedded techniques are listed below (a LASSO sketch follows the list):
o LASSO
o Elastic Net
o Ridge Regression, etc.
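As a sketch of an embedded method, the example below fits a LASSO model; the L1 penalty drives the coefficients of unhelpful features to zero, so selection happens during training itself. The synthetic data and the alpha value are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# only features 0 and 3 matter; LASSO should shrink the other coefficients to zero
y = 4 * X[:, 0] - 3 * X[:, 3] + rng.normal(scale=0.1, size=200)

X_std = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=0.1).fit(X_std, y)

print("coefficients:", lasso.coef_.round(2))
print("selected features:", np.flatnonzero(lasso.coef_ != 0))
```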
Feature Extraction:
Feature extraction is the process of transforming a space containing many dimensions into a space with
fewer dimensions. This approach is useful when we want to keep the information as a whole but use
fewer resources while processing it.
A widely used feature extraction technique is Principal Component Analysis (PCA):
Karl Pearson was the first to propose this idea. It rests on the principle that when data from a
higher-dimensional space is projected into a lower-dimensional space, the lower-dimensional space
should retain the maximum variance. In simple terms, principal component analysis (PCA) is a way to
obtain important variables (in the form of components) from a large set of variables in a data set. It
finds the directions in which the data is most spread out. PCA is most useful when the data has three or
more dimensions.
When applying the PCA method, the following are the primary steps to follow (a NumPy sketch after
the list mirrors these steps):
1. Obtain the dataset you need.
2. Calculate the mean vector of the data.
3. Subtract the mean from every data point, so that the data is centred.
4. Compute the covariance matrix of the centred data.
5. Determine the eigenvectors and eigenvalues of the covariance matrix.
6. Form a feature vector by choosing the eigenvectors with the largest eigenvalues; these are the
principal components.
7. Create the new data set by projecting the centred data onto the chosen eigenvectors. Because fewer
eigenvectors are kept, some information is lost, but the retained components preserve the most
significant variance.
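A minimal NumPy sketch of these steps is given below; the function name pca and its return values are illustrative, not a fixed interface from the notes.

```python
import numpy as np

def pca(X, n_components):
    """Step-by-step PCA following the procedure above."""
    mean = X.mean(axis=0)                     # step 2: mean vector
    X_centred = X - mean                      # step 3: centre the data
    cov = np.cov(X_centred, rowvar=False)     # step 4: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # step 5: eigen-decomposition (eigh suits symmetric matrices)
    order = np.argsort(eigvals)[::-1]         # step 6: sort components by decreasing eigenvalue
    components = eigvecs[:, order[:n_components]]
    projected = X_centred @ components        # step 7: project the centred data onto the components
    return projected, eigvals[order], components
```

Calling pca(X, 1) on two-dimensional data, for example, returns the data projected onto its first principal component.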
OR
Consider the two-dimensional patterns (2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (8, 8) and (9, 10), and compute
the principal components using the PCA algorithm.
X: 2, 3, 4, 5, 6, 8, 9
Y: 1, 5, 3, 6, 7, 8, 10
Answer:
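Before the hand-worked answer, the short NumPy check below runs the same steps on the patterns given in the question; it is a cross-check sketch, not the official solution.

```python
import numpy as np

# the seven patterns stated in the question
X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [8, 8], [9, 10]], dtype=float)

mean = X.mean(axis=0)                     # step 2: mean vector
X_centred = X - mean                      # step 3: centre the data
cov = np.cov(X_centred, rowvar=False)     # step 4: covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # step 5: eigen-decomposition

print("mean vector:", mean)
print("covariance matrix:\n", cov.round(3))
print("eigenvalues (descending):", eigvals[::-1].round(3))
print("first principal component:", eigvecs[:, -1].round(3))   # step 6: largest-eigenvalue direction
```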