CSC604 (Machine Learning)
Module VI
➢ BY:
DR. ARUNDHATI DAS
Module VI: Dimensionality Reduction
• 6.1 Curse of Dimensionality.
• 6.2 Feature Selection and Feature Extraction.
• 6.3 Dimensionality Reduction Techniques, Principal Component Analysis.
Curse of Dimensionality
• In machine learning classification problems, the higher the number of features, the harder it gets to visualize the training set and then work on it. Sometimes, most of these features are correlated and hence redundant. This is where dimensionality reduction algorithms come into play. Dimensionality reduction is the process of reducing the number of features.
• Due to the high dimensionality of the data, a phenomenon called the curse of dimensionality arises.
• The curse of dimensionality causes overfitting, resulting in high test error (low test accuracy) in machine learning. (A small illustrative sketch follows below.)
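Below is a minimal sketch (not from the slides; the dataset, classifier, and feature counts are illustrative assumptions) showing how the test accuracy of a simple classifier can degrade as more and more uninformative features are added, which is one face of the curse of dimensionality.

```python
# Illustrative sketch: keep the number of informative features fixed and add more
# and more noisy features, then watch a k-NN classifier's test accuracy degrade.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

for n_features in (10, 100, 1000):
    X, y = make_classification(
        n_samples=500,
        n_features=n_features,   # total number of features
        n_informative=5,         # only 5 of them carry class information
        n_redundant=0,
        random_state=0,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    print(n_features, "features -> test accuracy:", clf.score(X_test, y_test))
```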
Introduction to Dimensionality Reduction
• There are two main dimensionality reduction approaches:
• Feature Selection
• Feature Extraction
• Feature Selection: A feature selection method selects a subset of relevant features from the original feature set.
• Feature Extraction: A feature extraction method creates new features based on combinations or transformations of the original feature set.
Introduction to Dimensionality Reduction
Feature Selection:
• Original features are maintained; feature selection algorithms keep a subset of the original features.
• Reduces the dimensionality of the feature space by selecting a subset from the original set.
• Requires domain knowledge and feature engineering to write an algorithm for selecting features.
• May lose some information and introduce bias (error due to incorrect assumptions in an algorithm) if the wrong features are selected.
Feature Extraction:
• Feature extraction algorithms transform the data onto a new feature space.
• Represents the original features in a lower-dimensional transformed feature space by capturing only the essential information from the original set.
• Can be applied to raw data without explicitly writing selection algorithms that require domain knowledge.
• May introduce some noise and redundancy if the extracted features are not informative.
Feature Selection
• Feature selection, also known as variable selection, attribute selection, or variable subset selection, is used in machine learning or statistics to select a subset of features from the original set of features to construct models for describing the data.
• Example datasets on which feature selection is commonly applied: Diabetes dataset, Iris dataset, Ionosphere dataset.
Feature selection
• Why is feature selection needed?
• Feature selection is used to choose a subset of relevant features for effective classification
of data.
• In high-dimensional data classification, the performance of a classifier often depends on the feature subset used for classification.
• People use feature selection to
• Minimize redundancy
• Reduce dimensionality (to reduce number of features)
• Improve predictive accuracy
• The main objective of feature selection is to identify the m most informative features out of the d original features, where m < d (for example, m = 10, d = 100).
Feature selection
• Feature selection approaches are classified mainly into three categories.
• 1. Filter approach
• 2. Wrapper approach
• 3. Embedded approach
Figure: Classification of feature selection approaches (Image Courtesy: Liping Xei)
Feature selection
1. Filter approach (Guyon & Elisseeff, 2003): selects a subset of features based on some measure calculated on the features, without using a learning algorithm; filter methods are faster than wrapper-based methods and are used on many datasets where the number of features is high.
2. Wrapper approach (Blum & Langley, 1997): uses a learning algorithm to evaluate the accuracy produced by the selected features in classification; wrapper methods can give high classification accuracy for particular classifiers but generally have high computational complexity.
3. Embedded approach (Guyon & Elisseeff, 2003): performs feature selection during the training process and is specific to the applied learning algorithm.
Hybrid approach (Hsu, Hsieh, & Lu, 2011): a combination of the filter and wrapper approaches; the filter selects a candidate feature set from the original feature set, which is then refined by the wrapper, exploiting the advantages of both.
Each approach is described in more detail on the following slides.
Feature selection
1. Filter approach (Guyon & Elisseeff, 2003):
a. It is commonly used on datasets where the number of features is high.
b. This approach selects a subset of features depending on some measure calculated on the features, without using a learning algorithm. Some of the criteria or measures used to select features are:
i. Information Gain – the amount of information a feature provides for identifying the target value. The information gain of each attribute is calculated with respect to the target values for feature selection.
ii. Variance Threshold – removes all features whose variance does not meet a specified threshold. By default, this method removes features having zero variance. The assumption is that higher-variance features are likely to contain more information.
iii. Pearson Correlation – measures the linear relationship between a feature and the target variable; it indicates how one variable changes in response to another.
c. Filter-based feature selection methods are faster than wrapper-based methods since they do not use a learning algorithm. (A small sketch of these measures follows below.)
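Below is a minimal sketch of the filter measures above, assuming scikit-learn and the Iris dataset; the threshold value and k are illustrative choices, and mutual information is used here as a stand-in for information gain.

```python
# Filter-based selection: variance thresholding, a univariate information score,
# and Pearson correlation of each feature with the target.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold, SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# 1. Variance Threshold: drop features whose variance is below 0.2 (threshold is our choice).
vt = VarianceThreshold(threshold=0.2)
X_vt = vt.fit_transform(X)
print("Kept after variance threshold:", vt.get_support())

# 2. Information-based score: keep the k = 2 features with the highest mutual information.
skb = SelectKBest(score_func=mutual_info_classif, k=2)
X_best = skb.fit_transform(X, y)
print("Mutual information scores:", skb.scores_)

# 3. Pearson correlation of each feature with the (numeric-coded) target.
corr = [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]
print("Pearson correlation with target:", np.round(corr, 3))
```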
Feature selection
2. Wrapper approach (Blum & Langley, 1997):
1. This approach uses a learning algorithm to evaluate the accuracy produced by the use of the selected features in classification.
2. Wrapper methods can give high classification accuracy for particular classifiers, but generally they have high computational complexity.
3. Some of the techniques used are (a sketch follows below):
1. Forward selection – an iterative approach where we start with an empty set of features and, at each iteration, add the feature that best improves the model. The procedure stops when adding a new feature no longer improves the performance of the model.
2. Backward elimination – also an iterative approach where we start with all features and, at each iteration, remove the least significant feature. The procedure stops when removing a feature no longer improves the performance of the model.
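Below is a minimal sketch of forward selection and backward elimination, assuming scikit-learn's SequentialFeatureSelector wrapped around a logistic regression model on the Iris dataset; the number of features to keep is an illustrative choice.

```python
# Wrapper-based selection: the selector repeatedly retrains the wrapped classifier,
# adding (forward) or removing (backward) one feature at a time.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
estimator = LogisticRegression(max_iter=1000)

# Forward selection: start from the empty set, add the feature that helps most.
forward = SequentialFeatureSelector(estimator, n_features_to_select=2, direction="forward")
forward.fit(X, y)
print("Forward selection kept:", forward.get_support())

# Backward elimination: start from all features, drop the least useful one each step.
backward = SequentialFeatureSelector(estimator, n_features_to_select=2, direction="backward")
backward.fit(X, y)
print("Backward elimination kept:", backward.get_support())
```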
Feature selection
3. Embedded approach (Guyon & Elisseeff, 2003):
1. This approach performs feature selection during the process of training and is specific to the applied learning algorithm.
2. The feature selection algorithm is blended into the learning algorithm, which thus has its own built-in feature selection method.
3. Some techniques used are (a sketch follows below):
Tree-based methods – methods such as Random Forest and Gradient Boosting provide feature importance scores that can also be used to select features. Feature importance tells us how much each feature contributes to predicting the target.
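Below is a minimal sketch of embedded, tree-based selection, assuming scikit-learn's RandomForestClassifier and SelectFromModel on the Iris dataset; the number of trees and the "mean importance" threshold are illustrative choices.

```python
# Embedded selection: the ensemble is trained once, and the feature importances
# produced during training drive the selection.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# Importances come "for free" from training the forest.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print("Feature importances:", forest.feature_importances_)

# Keep only the features whose importance is above the mean importance.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0), threshold="mean"
)
X_selected = selector.fit_transform(X, y)
print("Kept features:", selector.get_support(), "new shape:", X_selected.shape)
```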
Feature selection
Hybrid approach (Hsu, Hsieh, & Lu, 2011):
1. This approach is a combination of both filter and wrapper-based methods.
2. The filter approach selects a candidate feature set from the original feature set, and the candidate feature set is then refined by the wrapper approach.
3. It exploits the advantages of these two approaches. (A sketch follows below.)
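Below is a minimal sketch of a hybrid pipeline, assuming scikit-learn and the Breast Cancer dataset: a cheap filter stage (SelectKBest) produces a candidate set that a wrapper stage (SequentialFeatureSelector) then refines. The values of k = 10 candidates and 5 final features are illustrative.

```python
# Hybrid selection: filter first, then wrapper on the surviving candidates.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, SequentialFeatureSelector, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)          # 30 original features

# Filter stage: keep the 10 features with the highest ANOVA F-score.
filt = SelectKBest(score_func=f_classif, k=10)
X_candidate = filt.fit_transform(X, y)

# Wrapper stage: forward selection on the candidate set down to 5 features.
wrapper = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000), n_features_to_select=5, direction="forward"
)
X_final = wrapper.fit_transform(X_candidate, y)
print("Shapes:", X.shape, "->", X_candidate.shape, "->", X_final.shape)
```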
Feature Extraction:
• A feature extraction method creates new features based on combinations or transformations of the original feature set.
• A natural question is whether the new transformed features are better than the original features.
• E.g.: PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis).
Principal Component Analysis (PCA).
• PCA is one of the most popular and widely used feature extraction
techniques.
• It was introduced by Karl Pearson.
• Why it is so popular:
• Simple, based on applied linear algebra.
• Non-parametric method of extracting relevant information from confusing data sets.
• PCA makes one stringent but powerful assumption: linearity.
• PCA is unsupervised in nature.
Principal Component Analysis (PCA)
➢ Consider the example shown in Figure 1. A spring-like structure is stretched and released, and the system exhibits oscillatory motion in one direction.
➢ The underlying dynamics can be expressed as a function of a single variable x.
➢ Unfortunately, because of our ignorance, we do not even know what the real “x”, “y” and “z” axes are, so we choose three camera axes {a, b, c} at some arbitrary angles with respect to the system.
➢ If we were smart experimenters, we would have just measured the position along the x-axis with one camera. But this is not what happens in the real world.
➢ We often do not know which measurements best reflect the dynamics of the system in question.
➢ Furthermore, we sometimes record more dimensions than we actually need!
➢ Any data point can be represented as a linear combination of basis vectors. For example, in the standard basis, (x, y) = (3, −2) => 3·(1, 0) + (−2)·(0, 1).
o What is a basis?
o We could choose different basis vectors!
➢ Principal component analysis computes the most meaningful basis to re-express a noisy data set.
➢ The hope is that this new basis will filter out the noise and reveal the hidden dynamics.
➢ The basis vectors will always be orthogonal (e.g., the x and y coordinate axes are basis vectors). (A small change-of-basis sketch follows below.)
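A tiny NumPy sketch (not from the slides) of the change-of-basis idea: the same point (3, −2) is expressed first in the standard basis and then in a rotated orthonormal basis.

```python
# Re-expressing the same point in two different orthonormal bases.
import numpy as np

p = np.array([3.0, -2.0])

# Standard basis: the coordinates are just the components themselves.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(p @ e1, p @ e2)                    # 3.0 -2.0

# A different orthonormal basis (axes rotated by 45 degrees).
u1 = np.array([1.0, 1.0]) / np.sqrt(2)
u2 = np.array([-1.0, 1.0]) / np.sqrt(2)
coords = np.array([p @ u1, p @ u2])      # coordinates of p in the new basis
print(coords)
print(coords[0] * u1 + coords[1] * u2)   # reconstructs the original point (3, -2)
```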
Principal Component Analysis (PCA)
• For the shown example, in other words, the goal of PCA is to determine x′, the unit basis vector along the x-axis, which is the important dimension.
• Determining this fact allows an experimenter to discern which dynamics are important and which are merely redundant.
• PCA asks: Is there another basis, which is a linear combination of the original basis, that best re-expresses our data set?
• PCA transforms or maps data from a higher-dimensional space into a lower-dimensional space such that the variance of the data in the transformed space is maximized.
• PCA involves the following steps (a from-scratch sketch follows below):
• Construct the covariance matrix of the data.
• Compute the eigenvectors of this matrix.
• Eigenvectors corresponding to the largest eigenvalues are used to reconstruct a large fraction of the variance of the original data.
• Hence, we are left with a smaller number of eigenvectors, and there may have been some data loss in the process. However, the most important variance should be retained by the remaining eigenvectors.
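Below is a minimal from-scratch sketch (an assumed example, not from the slides) of the steps listed above, using NumPy: center the data, build the covariance matrix, eigendecompose it, and project onto the eigenvectors with the largest eigenvalues. The synthetic data and the choice k = 2 are illustrative.

```python
# PCA via the covariance matrix and its eigendecomposition.
import numpy as np

def pca(X, k):
    """Project the (n_samples, n_features) matrix X onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(X_centered, rowvar=False)           # covariance matrix (features x features)
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]                # sort eigenvalues in decreasing order
    top_vecs = eigvecs[:, order[:k]]                 # eigenvectors of the k largest eigenvalues
    explained = eigvals[order[:k]] / eigvals.sum()   # fraction of variance each PC retains
    return X_centered @ top_vecs, explained

# Example on random 3-D data where one feature is nearly redundant, reduced to 2 PCs.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=100)
Z, explained = pca(X, k=2)
print(Z.shape, explained)
```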
Basic concepts for PCA
• Since PCA computes the covariance matrix, eigenvectors, eigenvalues, etc., let us brush up our concepts on all of these (sample versions shown, dividing by n − 1).
• Equation to find eigenvalues (characteristic equation): det(A − λI) = 0
• Eigenvector equation: A v = λ v
• Variance: Var(X) = (1/(n − 1)) Σ (xᵢ − x̄)²
• Covariance: Cov(X, Y) = (1/(n − 1)) Σ (xᵢ − x̄)(yᵢ − ȳ)
• Covariance matrix for two columns X, Y:
  C = [ Var(X)     Cov(X, Y) ]
      [ Cov(Y, X)  Var(Y)    ]
• Covariance matrix for three columns X, Y, Z: the 3×3 matrix whose (i, j)-th entry is the covariance between the i-th and j-th columns (variances on the diagonal).
• Numericals discussed on board (a small computational sketch follows below).
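Below is a small computational sketch (an assumed example; the actual numericals were worked on the board) of the quantities named above for a tiny two-column dataset, using NumPy.

```python
# Variance, covariance matrix, and eigenvalues/eigenvectors for a 2-column dataset.
import numpy as np

data = np.array([[2.0, 1.0],
                 [3.0, 5.0],
                 [4.0, 3.0],
                 [5.0, 6.0],
                 [6.0, 7.0]])        # columns: X, Y

print("Sample variance of X:", np.var(data[:, 0], ddof=1))   # divide by n - 1
C = np.cov(data, rowvar=False)                                # 2x2 covariance matrix
print("Covariance matrix:\n", C)

eigvals, eigvecs = np.linalg.eig(C)   # solves det(C - lambda*I) = 0 and C v = lambda v
print("Eigenvalues:", eigvals)
print("Eigenvectors (columns):\n", eigvecs)
```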
Basic concepts for PCA
• The covariance matrix is also known as the dispersion matrix or variance-covariance matrix.
• Variance measures how much a single random variable varies.
• Covariance measures how much two random variables vary together.
• The covariance matrix collects the covariance between every pair of columns (variables) of a dataset; its diagonal entries are the variances of the individual columns.
Principal Component Analysis (PCA).
• The idea of principal component analysis (PCA) is to reduce the dimensionality of a dataset consisting of a large number of related variables while retaining as much of the variance in the data as possible.
• PCA finds a set of new variables that are linear combinations of the original variables.
• The new variables are called Principal Components (PCs).
• These principal components are orthogonal: in a 3-D case, the principal components are perpendicular to each other.
• Figure 2 shows the intuition of PCA: it “rotates” the axes to line up better with the data.
• The first principal component captures most of the variance in the data, followed by the second, third, and so on. As a result, the new data will have fewer dimensions. (See the sketch below.)
Figure 2: Visualization of PCs
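Below is a minimal sketch (an assumed example, not from the slides) using scikit-learn's PCA on the standardized Iris data; explained_variance_ratio_ shows the first component capturing most of the variance, the second less, and so on.

```python
# PCA with scikit-learn: project onto 2 PCs and inspect the variance each retains.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)      # PCA is sensitive to feature scales

pca = PCA(n_components=2)
Z = pca.fit_transform(X_std)                   # data re-expressed in the first 2 PCs
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Cumulative:", np.cumsum(pca.explained_variance_ratio_))
print("New shape:", Z.shape)
```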
Dimensionality Reduction
• Advantages of Dimensionality Reduction
• It helps in data compression and hence reduces storage space.
• It reduces computation time.
• It also helps remove redundant features, if any.
• It overcomes the curse of dimensionality problem.
• Disadvantages of Dimensionality Reduction
• It may lead to some amount of data loss.
• PCA tends to find only linear correlations between variables, which is sometimes undesirable.
• PCA fails in cases where the mean and covariance are not enough to define a dataset.
• We may not know how many principal components to keep; in practice, some rules of thumb are applied (e.g., keeping enough components to explain most of the variance).
Numerical Example on PCA