Statistical Learning
Feature Extraction
Feature extraction is a machine learning and data analysis process that transforms raw data into numerical features that can be processed while preserving the information in the original data set.
It reduces the number of resources needed for processing without losing important or relevant information.
Why is Feature Extraction Important?
Reduction of Computational Cost: By reducing the dimensionality of the
data, machine learning algorithms can run more quickly.
Improved Performance: Algorithms often perform better with a reduced
number of features. This is because noise and irrelevant details are removed,
allowing the algorithm to focus on the most important aspects of the data.
Prevention of Overfitting: With too many features, models can become
overfitted to the training data, meaning they may not generalize well to new,
unseen data. Feature extraction helps to prevent this by simplifying the model.
Better Understanding of Data: Extracting and selecting important features
can provide insights into the underlying processes that generated the data.
Principal Component Analysis
Principal Component Analysis (PCA) is an unsupervised learning algorithm used for dimensionality (feature) reduction in machine learning.
Dimensionality reduction refers to the process of reducing the number of random variables (or features) under consideration.
This reduction is achieved by obtaining a set of principal variables.
Dimensionality reduction can be used for feature selection, feature extraction, or a combination of the two.
PCA is also used to examine the interrelations among a set of variables and is sometimes referred to as general factor analysis.
Principal Component Analysis (PCA) is a statistical procedure that
uses an orthogonal transformation that converts a set of correlated
variables to a set of uncorrelated variables.
Set of Correlated Variables: PCA is typically applied to a dataset
containing multiple correlated variables (features). The goal of PCA is
to find a new set of variables (principal components) that are linear
combinations of the original variables but are uncorrelated with each
other.
Uncorrelated Variables: The principal components derived from
PCA are uncorrelated, which means that the covariance between any
pair of principal components is zero.
The PCA algorithm is based on some mathematical concepts such as:
• Variance and Covariance
• Eigenvalues and Eigenvectors
Step 1: Standardization
Step 2: Covariance Matrix Computation
Step 3: Compute Eigenvalues and Eigenvectors of the Covariance Matrix to Identify Principal Components
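A minimal sketch of these three steps with NumPy is shown below; the data matrix X is synthetic and purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 features (made-up data)
X[:, 2] = X[:, 0] + 0.1 * X[:, 2]        # make feature 3 correlated with feature 1

# Step 1: standardization (zero mean, unit variance per feature)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized features
cov = np.cov(X_std, rowvar=False)

# Step 3: eigenvalues and eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh suits symmetric matrices
order = np.argsort(eigvals)[::-1]        # sort components by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the top 2 principal components
X_pca = X_std @ eigvecs[:, :2]
print(np.round(np.cov(X_pca, rowvar=False), 6))  # off-diagonals ~0: uncorrelated

The near-zero off-diagonal entries of the printed covariance matrix illustrate the "uncorrelated variables" property of the principal components described above.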
Singular Value Decomposition (SVD)
The singular value decomposition (SVD) of a matrix A is a factorization of that matrix into three matrices: A = U Σ Vᵀ, where U and V have orthonormal columns and Σ is a diagonal matrix of non-negative singular values.
Singular-value decomposition is also one of the popular
dimensionality reduction techniques.
It is a matrix-factorization method from linear algebra and is widely used in applications such as feature selection, visualization, noise reduction, and many more.
An image can be viewed as a matrix of pixel intensity levels (a discrete signal). Because an image is spatially contiguous, the values of most pixels depend on the pixels around them, so the matrix carries a lot of redundancy that a low-rank approximation can exploit.
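This redundancy is what SVD-based compression exploits. A small sketch on a synthetic "image" matrix follows; real use would load an actual grayscale image.

import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64))              # made-up stand-in for a grayscale image

# Full SVD: image = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(image, full_matrices=False)

# Keep only the k largest singular values (a rank-k approximation)
k = 10
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Storage drops from 64*64 values to k*(64 + 64 + 1)
error = np.linalg.norm(image - approx) / np.linalg.norm(image)
print(f"rank-{k} approximation, relative error: {error:.3f}")

Increasing k lowers the reconstruction error at the cost of storing more values, which is the trade-off behind the compression and noise-reduction applications listed next.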
Applications of SVD
Image Recovery
Image Compression
Feature Selection
While developing a machine learning model, only a few variables in the dataset are useful for building the model; the rest of the features are either redundant or irrelevant.
If we feed the model a dataset containing all these redundant and irrelevant features, it may negatively impact the overall performance and accuracy of the model.
A feature is an attribute that has an impact on a problem or is useful
for the problem, and choosing the important features for the model is
known as feature selection.
The main difference between feature selection and feature extraction is that feature selection selects a subset of the original feature set, whereas feature extraction creates new features.
Feature selection is a way of reducing the input variables for the model by using only relevant data, which helps reduce overfitting in the model.
It is a process of automatically or manually selecting the
subset of most appropriate and relevant features to be used
in model building.
Need for Feature Selection
Before implementing any technique, it is important to understand why it is needed, and the same applies to feature selection.
As we know, in machine learning, it is necessary to provide a pre-
processed and good input dataset in order to get better outcomes.
We collect a huge amount of data to train our model and help it to
learn better. Generally, the dataset consists of noisy data, irrelevant
data, and some part of useful data. Moreover, the huge amount of
data also slows down the training process of the model, and with
noise and irrelevant data, the model may not predict and perform
well.
Benefits of using feature selection in machine learning:
• It helps in avoiding the curse of dimensionality.
• It helps in the simplification of the model so that it can be
easily interpreted by the researchers.
• It reduces the training time.
• It reduces overfitting and hence enhances generalization.
Feature Selection Techniques
Wrapper Method
In wrapper methods, different subsets of features are evaluated by
training a model for each subset, and then the performance is
compared and the right combination is chosen.
Some techniques of wrapper methods:
Forward selection
Backward elimination
Exhaustive Feature Selection
Recursive Feature Elimination
Forward Selection
• Starting from Scratch: Begin with an empty set of features and
iteratively add one feature at a time.
• Model Evaluation: At each step, train and evaluate the machine
learning model using the selected features.
• Stopping Criterion: Continue until a predefined stopping criterion is
met, such as a maximum number of features or a significant drop in
performance.
Backward Elimination
• Starting with Everything: Start with all available features.
• Iterative Removal: In each iteration, remove the least important
feature and evaluate the model.
• Stopping Criterion: Continue until a stopping condition is met.
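A sketch of both procedures above, assuming scikit-learn's SequentialFeatureSelector; the dataset, estimator, and number of features to keep are placeholder choices.

from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)

# Forward selection: start from an empty set and add one feature per step
forward = SequentialFeatureSelector(
    knn, n_features_to_select=5, direction="forward").fit(X, y)

# Backward elimination: start from all features and drop one per step
backward = SequentialFeatureSelector(
    knn, n_features_to_select=5, direction="backward").fit(X, y)

print(forward.get_support(indices=True))
print(backward.get_support(indices=True))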
Exhaustive Search
• Exploring All Possibilities: Evaluate all possible combinations of
features, which ensures finding the best subset for model
performance.
• Computational Cost: This can be computationally expensive,
especially with a large number of features.
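A brute-force sketch of exhaustive selection using itertools and cross-validation; the dataset is deliberately tiny because the cost grows as 2^n with the number of features.

from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

best_score, best_subset = -1.0, None
for k in range(1, n_features + 1):
    for subset in combinations(range(n_features), k):
        # Score every candidate subset with 5-fold cross-validation
        score = cross_val_score(LogisticRegression(max_iter=1000),
                                X[:, subset], y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print(best_subset, round(best_score, 3))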
Recursive Feature Elimination (RFE)
• Ranking Features: Start with all features and rank them based on
their importance or contribution to the model.
• Iterative Removal: In each iteration, remove the least important
feature(s).
• Stopping Criterion: Continue until a desired number of features is
reached.
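A minimal sketch using scikit-learn's RFE; the estimator and the target number of features are arbitrary choices.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)     # scale so coefficients are comparable

rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=5, step=1).fit(X, y)

print(rfe.support_)    # boolean mask over the original features
print(rfe.ranking_)    # rank 1 = kept; larger ranks were eliminated earlier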
Filter Method
Filter methods are generally applied during the pre-processing step. They select features from the dataset irrespective of any machine learning algorithm.
They are computationally fast and inexpensive and are very good at removing duplicated, correlated, and redundant features, but they do not remove multicollinearity.
Techniques
Information Gain
Chi-square test
Fisher’s Score
Correlation Coefficient
Information Gain
Information gain is the amount of information a feature provides for identifying the target value; it measures the reduction in entropy. The information gain of each attribute is calculated with respect to the target values for feature selection.
In the context of information theory and machine learning, entropy
is a measure of uncertainty or randomness associated with a random
variable.
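Information gain is closely related to the mutual information between a feature and the target; a sketch using scikit-learn's mutual_info_classif as a stand-in score:

from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)

# Higher score = larger reduction in uncertainty (entropy) about the target
scores = mutual_info_classif(X, y, random_state=0)
print(scores)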
Chi-square test
The chi-square (χ²) test is generally used to test the relationship between categorical variables. It compares the observed values of different attributes in the dataset to their expected values.
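A sketch of chi-square scoring with SelectKBest; note that chi2 requires non-negative feature values (counts or encoded categories), and the dataset here is a placeholder choice.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)          # all feature values are non-negative

selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
print(selector.scores_)                    # chi-square statistic per feature
print(selector.get_support(indices=True))  # indices of the k best features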
Fisher’s Score
Fisher’s Score scores each feature independently according to the Fisher criterion, which can lead to a suboptimal set of features. The larger the Fisher’s score, the better the selected feature.
Correlation Coefficient
Pearson’s Correlation Coefficient quantifies the strength and direction of the association between two continuous variables, with values ranging from -1 to 1.
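A sketch of Pearson correlation between each feature and a continuous target using pandas; the dataset is a placeholder choice.

import pandas as pd
from sklearn.datasets import load_diabetes

data = load_diabetes()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target

# Correlation of every feature with the target, ranging from -1 to 1
print(df.corr()["target"].drop("target").sort_values())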
Embedded Method
Embedded methods combine the advantages of both filter and wrapper methods by considering interactions between features while keeping the computational cost low.
They are fast, like filter methods, but more accurate.
These methods are also iterative: each iteration of model training is evaluated, and the features that contribute most to that iteration are identified.
Some techniques of embedded methods:
Regularization
Random Forest Importance
Regularization
Regularization adds a penalty term to different parameters of the
machine learning model to avoid overfitting in the model.
This penalty term is added to the coefficients; hence it shrinks some
coefficients to zero. Those features with zero coefficients can be
removed from the dataset.
The main regularization techniques are L1 regularization (Lasso) and L2 regularization (Ridge); it is the L1 penalty that can shrink coefficients exactly to zero.
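A sketch of L1 (Lasso) regularization driving some coefficients exactly to zero; the dataset and penalty strength are placeholder choices.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)

lasso = Lasso(alpha=1.0).fit(X, y)         # alpha controls the penalty strength
print(lasso.coef_)                         # several coefficients are exactly 0
print(np.flatnonzero(lasso.coef_))         # indices of the surviving features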
Random Forest Importance
Different tree-based methods of feature selection help us with feature
importance to provide a way of selecting features. Here, feature
importance specifies which feature has more importance in model
building or has a great impact on the target variable.
Random Forest is a tree-based method and a type of bagging algorithm that aggregates a number of decision trees.
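A sketch of ranking features by Random Forest importance; the dataset and forest size are placeholder choices.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(data.data, data.target)

# Importances sum to 1; larger values indicate a bigger impact on predictions
ranked = sorted(zip(forest.feature_importances_, data.feature_names),
                reverse=True)
print(ranked[:5])                          # five most important features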
Evaluating ML Algorithms and Model Selection
Model evaluation is the process of using metrics to analyze the performance of a model.
Accuracy
Accuracy is defined as the ratio of the number of correct predictions to the total number of predictions.
accuracy_score()
Precision
Precision is the ratio of true positives to the summation of true
positives and false positives. It basically analyses the positive
predictions.
The drawback of Precision is that it does not consider the True
Negatives and False Negatives.
precision_score()
Recall
Recall is the ratio of true positives to the summation of true positives
and false negatives. It basically analyses the number of correct
positive samples.
recall_score()
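A sketch of the three metrics above with scikit-learn, on a small set of made-up labels:

from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # made-up ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # made-up predictions

print(accuracy_score(y_true, y_pred))    # (TP + TN) / total = 6/8 = 0.75
print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 3/4 = 0.75
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 3/4 = 0.75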
Confusion Matrix
The confusion matrix is a tabular summary of a classification model's predictions, from which accuracy and the other metrics above can be computed.
True Positive: the number of times the model predicted an observation as positive when the actual value was positive.
False Positive: the number of times the model predicted an observation as positive when the actual value was negative.
False Negative: the number of times the model predicted an observation as negative when the actual value was positive.
True Negative: the number of times the model predicted an observation as negative when the actual value was negative.
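A sketch of the confusion matrix for the same made-up labels as above; scikit-learn lays it out as [[TN, FP], [FN, TP]] for binary labels {0, 1}.

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))
# [[3 1]
#  [1 3]]  ->  TN = 3, FP = 1, FN = 1, TP = 3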