6 Dimension Reduction Theory

Why is Dimensionality Reduction important in Machine Learning and Predictive Modeling?
An intuitive example of dimensionality reduction is a simple e-mail classification problem, where we need to classify whether an e-mail is spam or not. This can involve a large number of features, such as whether or not the e-mail has a generic title, the content of the e-mail, whether the e-mail uses a template, and so on. However, some of these features may overlap. Similarly, a classification problem that relies on both humidity and rainfall can collapse the two into a single underlying feature, since they are highly correlated. Hence, we can reduce the number of features in such problems.

Reasons for reducing dimensionality include the following:

• Making the dataset easier to use
• Reducing the computational cost of many algorithms
• Removing noise
• Making the results easier to understand

Advantages of Dimensionality Reduction:


• It helps in data compression, and hence reduces storage space.
• It reduces computation time.
• It also helps remove redundant features, if any.
Singular Value Decomposition (SVD)
Factorize the following matrix using SVD:

A = [ 4   3 ]
    [ 0  −5 ]
Step 1: Compute its transpose Aᵀ and the product AᵀA.

Step 2: Determine the eigenvalues of AᵀA and sort them in descending order of absolute value. Take the square roots of these eigenvalues to obtain the singular values of A.

Step 3: Construct the diagonal matrix S by placing the singular values in descending order along its diagonal. Compute its inverse, S⁻¹.
Step 4: Using the ordered eigenvalues from Step 2, compute the corresponding eigenvectors of AᵀA. Place these eigenvectors along the columns of V and compute its transpose, Vᵀ.
Step 5: Compute U as U = AVS⁻¹. To verify the factorization, check that A = USVᵀ.
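To make the procedure concrete, here is a minimal sketch in Python with NumPy (the library choice and variable names are ours, not part of the original notes) that follows Steps 1–5 for the 2 × 2 matrix above and confirms A = USVᵀ:

```python
import numpy as np

# Matrix from the example above
A = np.array([[4.0, 3.0],
              [0.0, -5.0]])

# Step 1: transpose and A^T A
AtA = A.T @ A

# Step 2: eigenvalues of A^T A in descending order; their square roots are the singular values
eigvals, eigvecs = np.linalg.eigh(AtA)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
singular_values = np.sqrt(eigvals)      # approx. [6.3246, 3.1623]

# Step 3: diagonal matrix S and its inverse
S = np.diag(singular_values)
S_inv = np.linalg.inv(S)

# Step 4: V holds the eigenvectors of A^T A as columns
V = eigvecs

# Step 5: U = A V S^-1, then verify A = U S V^T
U = A @ V @ S_inv
print(np.allclose(A, U @ S @ V.T))      # True
```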

Principal Component Analysis (PCA)


In PCA, the dataset is transformed from its original coordinate system to a new coordinate system. The new coordinate system is chosen by the data itself. The first new axis is chosen in the direction of the greatest variance in the data. The second axis is orthogonal to the first and points in the direction of the largest remaining variance. This procedure is repeated for as many features as there were in the original data. We find that the majority of the variance is contained in the first few axes. Therefore, we can ignore the remaining axes and thereby reduce the dimensionality of our data.

• Data on p variables; these variables may be correlated.
• Correlation indicates that information contained in one variable is also contained in some of the other p−1 variables.
• PCA transforms the p original correlated variables into p uncorrelated components (also called orthogonal components or principal components).
• These components are linear functions of the original variables.

The transformation is written as Z = XA,
where X is the n × p matrix of n observations on p variables,
Z is the n × p matrix of n values for each of the p components, and
A is the p × p matrix of coefficients defining the linear transformation.

1. Find the mean of each of the p variables (for two variables, x̅ and y̅).

2. Form the matrix X of deviations from the respective means, i.e. X contains the columns xᵢ − x̅ and yᵢ − y̅.
3. Find the covariances between the p variables.

4. Construct the covariance matrix C (distinct from the coefficient matrix A in Z = XA). For two variables:

C = [ cov(x, x)   cov(x, y) ]
    [ cov(y, x)   cov(y, y) ]

5. Find the eigenvalues and eigenvectors of the covariance matrix.

6. Choose the principal components. The fraction of the total variance accounted for by the jth principal component is λⱼ / (λ₁ + λ₂ + … + λₚ), where λⱼ is the jth eigenvalue of the covariance matrix.

7. Form a feature vector from the chosen eigenvectors.
8. Derive the new data set Z:
Z = XA, where X is the matrix containing the columns xᵢ − x̅ and yᵢ − y̅, and A is the matrix of chosen eigenvectors.
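As an illustrative sketch of Steps 1–8, assuming a small synthetic two-variable dataset (the data and the use of NumPy below are for illustration only, not from the original notes):

```python
import numpy as np

# Synthetic data: n = 100 observations on p = 2 correlated variables (illustration only)
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.8 * x + rng.normal(scale=0.3, size=100)
data = np.column_stack([x, y])                 # n x p

# Steps 1-2: subtract the column means to get the deviation matrix X
X = data - data.mean(axis=0)

# Steps 3-4: covariance matrix of the p variables
C = np.cov(X, rowvar=False)

# Step 5: eigenvalues and eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 6: fraction of total variance explained by each component
explained = eigvals / eigvals.sum()

# Steps 7-8: keep the leading eigenvector(s) as the feature vector A and project
A = eigvecs[:, :1]                             # first principal component only
Z = X @ A                                      # n x 1 reduced data set
print(explained)
```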
Singular Value Decomposition (SVD)
It is one of the most widely used unsupervised learning techniques and is at the center of many recommendation systems and dimensionality-reduction methods.
In simple terms, SVD is the factorization of a matrix into three matrices. So if we have a matrix A, then its SVD is represented by:

A = U𝚺Vᵀ
where A is an m × n matrix,
U is an m × m orthogonal matrix (its columns are the left singular vectors),
𝚺 is an m × n nonnegative rectangular diagonal matrix (its diagonal entries are the singular values), and
V is an n × n orthogonal matrix (its columns are the right singular vectors).
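A minimal sketch of these shapes, assuming NumPy and an arbitrary 4 × 3 example matrix chosen only for illustration:

```python
import numpy as np

# Arbitrary m x n example matrix (m = 4, n = 3), chosen only to show the shapes
A = np.arange(12, dtype=float).reshape(4, 3)

U, s, Vt = np.linalg.svd(A)                    # full SVD
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)                     # rectangular diagonal m x n matrix

print(U.shape, Sigma.shape, Vt.shape)          # (4, 4) (4, 3) (3, 3)
print(np.allclose(A, U @ Sigma @ Vt))          # True
```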

Independent component analysis (ICA)


“It is a method for finding underlying factors or components from multivariate (multi-dimensional) statistical data. What distinguishes ICA from other methods is that it looks for components that are both statistically independent and non-Gaussian.”
-- A. Hyvärinen, J. Karhunen, E. Oja
ICA estimation principles
• Principle 1: “Nonlinear decorrelation.” Find the matrix W so that for any i ≠ j, the components yᵢ and yⱼ are uncorrelated, and the transformed components g(yᵢ) and h(yⱼ) are uncorrelated, where g and h are some suitable nonlinear functions.
• Principle 2: “Maximum non-Gaussianity.” Find the local maxima of non-Gaussianity of a linear combination y = Wx under the constraint that the variance of y is constant.
• Each local maximum gives one independent component.

Applications include audio processing, medical data, finance, array processing (beamforming), etc.
ICA mathematical approach
Given a set of observations of random variables x₁(t), x₂(t), …, xₙ(t), where t is the time or sample index, assume that they are generated as a linear mixture of independent components. Independent component analysis consists of finding a matrix W such that the components of y = Wx are statistically independent, i.e. estimating both the matrix W and the yᵢ(t) when we only observe the xᵢ(t).

Example: Simple “Cocktail Party” Problem


Simple scenario: Two people are speaking simultaneously in a room. Their speech is recorded by two microphones in separate locations.

• Let s₁(t), s₂(t) be the speech signals emitted by the two speakers.
• The time signals recorded by the two microphones are denoted by x₁(t), x₂(t).

The recorded time signals can be expressed as a linear equation:

x₁(t) = a₁₁s₁(t) + a₁₂s₂(t)

x₂(t) = a₂₁s₁(t) + a₂₂s₂(t)

We use a statistical “latent variable” model: the random variables sₖ (used in place of the time signals) are latent, i.e. unknown, and the mixing matrix A is also unknown. The task is to estimate A and s using only the observable random vector x.

xⱼ = aⱼ₁s₁ + aⱼ₂s₂ + … + aⱼₙsₙ, for all j


x = As

Let us assume that the number of independent components equals the number of observed mixtures, and that A is square and invertible.

So after estimating A, we can compute W = A⁻¹ and hence s = Wx = A⁻¹x.
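The cocktail-party setup can be sketched numerically as follows; the synthetic source signals, the example mixing matrix, and the use of scikit-learn's FastICA estimator are illustrative assumptions rather than part of the original notes:

```python
import numpy as np
from sklearn.decomposition import FastICA   # scikit-learn assumed to be installed

# Two synthetic "speech" sources s1(t), s2(t) (illustrative signals, not real audio)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)
s2 = np.sign(np.sin(3 * t))
S = np.column_stack([s1, s2])

# Assumed mixing matrix A; the microphones observe x = A s
A_mix = np.array([[1.0, 0.5],
                  [0.4, 1.2]])
X = S @ A_mix.T                             # observed mixtures x1(t), x2(t)

# Estimate the unmixing matrix W and recover the sources (up to order and scaling)
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                # recovered independent components
W_est = ica.components_                     # estimated unmixing matrix W
```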

Some related concepts to understand the above method:

Independent component analysis (ICA) is a method for finding underlying factors or components from multivariate (multi-dimensional) statistical data. What distinguishes ICA from other methods is that it looks for components that are both statistically independent and non-Gaussian.

In probability theory, two events are independent (statistically independent, or stochastically independent) if the occurrence of one does not affect the probability of occurrence of the other.

In physics, a non-Gaussianity is a correction that modifies the expected Gaussian estimate for the measurement of a physical quantity.

Gaussian functions are widely used in statistics to describe normal distributions.

In probability theory, the normal (or Gaussian, Gauss, or Laplace–Gauss) distribution is a very common continuous probability distribution.

Invertible matrix: A square matrix A is called invertible if there exists a matrix B such that AB = BA = I. If this is the case, then the matrix B is uniquely determined by A and is called the inverse of A, denoted by A⁻¹. A square matrix that is not invertible is called singular or degenerate. A square matrix is singular if and only if its determinant is 0.
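As a quick numerical illustration of this point (a minimal sketch assuming NumPy; the example matrices are arbitrary):

```python
import numpy as np

# A square matrix is invertible exactly when its determinant is nonzero
A = np.array([[4.0, 3.0],
              [0.0, -5.0]])
print(np.linalg.det(A))       # -20.0 (nonzero), so A is invertible
print(np.linalg.inv(A))       # its inverse, A^-1

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.det(B))       # 0.0 (up to floating point), so B is singular and has no inverse
```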

Latent variables are variables that are not directly observed but are rather inferred (through a mathematical model) from other variables that are observed (directly measured).