Dimensionality Reduction
Dimensionality reduction refers to the process of reducing the number of attributes in a dataset while keeping as much of the variation in the original dataset as possible. It is a data preprocessing step, meaning that we perform dimensionality reduction before training the model. In this article, we will discuss 11 such dimensionality reduction techniques and implement them on real-world datasets using Python and the Scikit-learn library.
The importance of dimensionality reduction
When we reduce the dimensionality of a dataset, we lose some percentage (usually 1%–15%, depending on the number of components or features that we keep) of the variability in the original data. But don’t worry about that loss, because dimensionality reduction brings the following advantages.
A lower number of dimensions in the data means less training time, fewer computational resources, and better overall performance of machine learning algorithms — Machine learning problems that involve many features make training extremely slow. Because there is so much room in a high-dimensional space, most data points lie close to the border of that space and far away from each other, so algorithms cannot train effectively and efficiently on high-dimensional data. In machine learning, that kind of problem is referred to as the curse of dimensionality — this is just a technical term that you do not need to worry about!
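For instance, here is a minimal sketch of the speed-up, using a synthetic dataset from make_classification and a plain LogisticRegression as stand-ins; both are my choices for illustration, not part of any particular workflow.

# A minimal sketch: train a classifier on a synthetic high-dimensional
# dataset, then again after PCA, and compare the training times.
import time
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, n_features=500, random_state=0)

start = time.time()
LogisticRegression(max_iter=1000).fit(X, y)
print("Training time on 500 features:", round(time.time() - start, 2), "s")

X_reduced = PCA(n_components=50).fit_transform(X)  # 500 features -> 50 components

start = time.time()
LogisticRegression(max_iter=1000).fit(X_reduced, y)
print("Training time on 50 components:", round(time.time() - start, 2), "s")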
Dimensionality reduction avoids the problem of overfitting — When there are many features in the data, models become more complex and tend to overfit the training data. To see this in action, read my “How to Mitigate Overfitting with Dimensionality Reduction” article.
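As a rough illustration (not the exact experiment from that article), the following sketch fits a flexible decision tree on a small, wide synthetic dataset before and after PCA; the gap between training and test accuracy usually narrows once the redundant dimensions are removed.

# A rough sketch of overfitting mitigation with PCA on synthetic data.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=100, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Flexible model on all 100 features: large train/test accuracy gap.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("All features  - train:", tree.score(X_train, y_train),
      "test:", tree.score(X_test, y_test))

# Same model on 10 principal components: usually a smaller gap.
pca = PCA(n_components=10).fit(X_train)
tree_pca = DecisionTreeClassifier(random_state=0).fit(pca.transform(X_train), y_train)
print("10 components - train:", tree_pca.score(pca.transform(X_train), y_train),
      "test:", tree_pca.score(pca.transform(X_test), y_test))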
Dimensionality reduction is extremely useful for data visualization — When we reduce the dimensionality of high-dimensional data to two or three components, the data can easily be plotted on a 2D or 3D plot. To see this in action, read my “Principal Component Analysis (PCA) with Scikit-learn” article.
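Here is a minimal sketch of that idea, using the Iris dataset (my choice for illustration) reduced to two principal components and plotted with matplotlib.

# A minimal sketch of 2-D visualization with PCA.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)            # 4 original features
X_2d = PCA(n_components=2).fit_transform(X)  # reduce to 2 components

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
plt.show()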
Dimensionality reduction takes care of multicollinearity — In regression, multicollinearity occurs when an independent variable is highly correlated with one or more of the other independent variables. Dimensionality reduction takes advantage of this and combines those highly correlated variables into a set of uncorrelated variables, which addresses the problem of multicollinearity. To see this in action, read my “How do you apply PCA to Logistic Regression to remove Multicollinearity?” article.
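The following sketch illustrates this: after PCA, the correlation matrix of the components is (numerically) the identity matrix. The Breast Cancer dataset is used here only as a convenient example of data with highly correlated columns.

# A quick sketch showing that principal components are uncorrelated.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

components = PCA(n_components=5).fit_transform(X_scaled)
# Correlation matrix of the components is (numerically) the identity matrix.
print(np.round(np.corrcoef(components, rowvar=False), 3))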
Dimensionality reduction is very useful for factor analysis — This is a useful approach to find latent variables that are not directly measured in a single variable but are instead inferred from other variables in the dataset. These latent variables are called factors. To see this in action, read my “Factor Analysis on Women Track Records Data with R and Python” article.

Dimensionality reduction removes noise in the data — By keeping only the most important features and removing the redundant ones, dimensionality reduction removes noise in the data. This improves the model accuracy.
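As a quick illustration of the factor analysis use case, here is a minimal Scikit-learn sketch; the Iris data and the choice of two factors are arbitrary and purely for demonstration.

# A minimal factor analysis sketch with Scikit-learn.
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(X_scaled)  # factor scores for each sample
print(fa.components_)                # loadings of each feature on each factor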
Dimensionality reduction can be used for image compression — Image compression is a technique that minimizes the size in bytes of an image while keeping as much of the image quality as possible. The pixels that make up the image can be considered dimensions (columns/variables) of the image data. We perform PCA to keep an optimal number of components that balances the explained variability in the image data against the image quality. To see this in action, read my “Image Compression Using Principal Component Analysis (PCA)” article.
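Here is a minimal sketch of the idea on the 8x8 digit images that ship with Scikit-learn; the choice of 16 components is arbitrary and would normally be tuned against the explained variance.

# A minimal sketch of PCA-based image compression.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)       # 1797 images, 64 pixels each

pca = PCA(n_components=16).fit(X)
X_compressed = pca.transform(X)                    # 64 pixels -> 16 components
X_restored = pca.inverse_transform(X_compressed)   # approximate reconstruction

print("Explained variance kept:", pca.explained_variance_ratio_.sum())
print("Original shape:", X.shape, "-> compressed shape:", X_compressed.shape)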
Dimensionality reduction can be used to transform non-linear data into a linearly separable form — Read the Kernel PCA section of this article to see this in action!
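As a preview, here is a minimal Kernel PCA sketch on the classic concentric-circles toy data; the RBF kernel and the gamma value are assumptions chosen for this toy example.

# A minimal Kernel PCA sketch: circles are not linearly separable in the
# original 2-D space, but a linear classifier works after the kernel mapping.
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

print("Accuracy on original data:", LogisticRegression().fit(X, y).score(X, y))
print("Accuracy after Kernel PCA:", LogisticRegression().fit(X_kpca, y).score(X_kpca, y))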
Principal Component Analysis (PCA)
PCA is one of my favorite machine learning algorithms. PCA is a linear dimensionality reduction technique (algorithm) that transforms a set of p correlated variables into a smaller number k (k < p) of uncorrelated variables called principal components, while retaining as much of the variation in the original dataset as possible. In the context of machine learning (ML), PCA is an unsupervised machine learning algorithm used for dimensionality reduction. As this is one of my favorite algorithms, I have previously written several articles on PCA. If you’re interested in learning more about the theory behind PCA and its Scikit-learn implementation, you may read the following articles written by me.
Principal Component Analysis (PCA) with Scikit-learn
Statistical and Mathematical Concepts behind PCA
Principal Component Analysis for Breast Cancer Data with R and Python
Image Compression Using Principal Component Analysis (PCA)
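To make the above concrete, here is a minimal PCA sketch with Scikit-learn on the Breast Cancer dataset (30 correlated features); standardizing first and keeping two components are choices made for illustration only.

# A minimal PCA sketch: standardize, fit PCA, and inspect how much of the
# original variation each principal component retains.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scales

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)           # 30 features -> 2 components

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Total variation kept:", pca.explained_variance_ratio_.sum())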