Review
In this paper, the authors survey various aspects of variable and feature selection methods, including better definition of the objective function, feature construction, feature ranking, and improving the performance of predictors. Two examples are used for illustration throughout the paper: gene selection from microarray data and text categorization.
1. Variable Ranking
To perform variable selection, researchers often use variable ranking as an auxiliary selection mechanism because of its simplicity and scalability. It is used, for example, in microarray analysis for drug discovery, by finding genes that discriminate between healthy and diseased patients. The authors illustrate three criteria for variable ranking, i.e., correlation criteria, single variable classifiers, and information theoretic ranking criteria, which are discussed below.
Consider a set of m examples {x_k, y_k} (k = 1, ..., m) consisting of n input variables x_{k,i} (i = 1, ..., n) and one output variable y_k. To rank the variables of the input vector x, a scoring function S(i) is applied to each variable and the scores are sorted; a higher score indicates a more valuable variable. Under the correlation criterion, the score of the i-th variable x_i is the correlation between that variable and the output vector y. Mathematically, it is written as:
R(i) = \frac{\sum_{k=1}^{m} (x_{k,i} - \bar{x}_i)(y_k - \bar{y})}{\sqrt{\sum_{k=1}^{m} (x_{k,i} - \bar{x}_i)^2 \, \sum_{k=1}^{m} (y_k - \bar{y})^2}}
To rank the variables, R(i)^2 is used, as it enforces a ranking according to the goodness of linear fit of individual variables.
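A minimal sketch of this ranking criterion in Python (assuming a data matrix X of shape m x n and a target vector y; the function name and the toy data are illustrative, not from the paper):

import numpy as np

def correlation_ranking(X, y):
    """Rank variables by the squared Pearson correlation R(i)^2 with the target."""
    Xc = X - X.mean(axis=0)          # center each variable
    yc = y - y.mean()                # center the target
    num = Xc.T @ yc                  # sum_k (x_ki - mean_i)(y_k - mean_y)
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = num / den                    # R(i) for each variable i
    return np.argsort(r ** 2)[::-1]  # indices sorted by decreasing R(i)^2

# usage on toy data: variable 3 is the informative one
X = np.random.randn(100, 20)
y = X[:, 3] + 0.1 * np.random.randn(100)
print(correlation_ranking(X, y)[:5])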
The single variable classifier criterion computes the predictive power of each variable individually and, in its basic form, applies to binary classification problems. To compute the predictive power, a threshold θ is set on the variable value so that the variable itself acts as a single variable classifier, and its error rate measures how predictive it is. This criterion can still be used when a very large number of variables is present, where other ranking criteria may fail.
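A hedged sketch of this idea, sweeping a threshold θ on one variable and taking the best accuracy (in either orientation) as its predictive power; binary labels in {0, 1} are assumed and the function name is illustrative:

import numpy as np

def single_variable_power(x, y):
    """Predictive power of one variable for binary labels y in {0, 1}:
    best accuracy over candidate thresholds theta, in either orientation."""
    best = 0.0
    for theta in np.unique(x):
        pred = (x > theta).astype(int)          # threshold classifier
        acc = max((pred == y).mean(), ((1 - pred) == y).mean())
        best = max(best, acc)
    return best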
The information theoretic ranking criterion computes the mutual information between each variable and the target. Mathematically, it is computed as:
I(i) = \int_{x_i} \int_{y} p(x_i, y) \, \log \frac{p(x_i, y)}{p(x_i)\, p(y)} \, dx \, dy
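For discrete data, the integral reduces to a sum over empirical frequencies. A minimal plug-in estimate, assuming x and y are arrays of discrete values (the function name is illustrative):

import numpy as np

def mutual_information(x, y):
    """Empirical I(i) between a discrete variable x and a discrete target y,
    using relative frequencies as plug-in estimates of the probabilities."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))
            p_x, p_y = np.mean(x == xv), np.mean(y == yv)
            if p_xy > 0:
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi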
2. Small but Revealing Examples
In this section, the authors use small constructed examples to answer several questions regarding redundant variables and to outline the usefulness and limitations of ranking techniques. Among other points, the examples show that adding presumably redundant variables can help reduce noise, that perfectly correlated variables are truly redundant, and that a variable that is useless by itself can be useful together with others.
3. Variable Subset Selection
To select a subset of variables, several methods have been proposed, i.e., 1) wrappers and embedded methods, 2) nested subset methods, and 3) direct objective optimization, which are discussed below.
Wrappers address the problem of variable selection regardless of the chosen learning algorithm, treating the learning machine as a black box: the prediction performance of the learner is used to assess the relative usefulness of subsets of variables. When the number of variables is too large, the search space grows combinatorially and the variable selection problem becomes NP-hard. Efficient search techniques such as greedy search, which is computationally advantageous and robust against overfitting, have been applied to address this problem.
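A sketch of a wrapper with greedy forward selection, assuming scikit-learn is available, a logistic regression as the black-box learner, and 5-fold cross-validated accuracy as the assessment (all names and choices are illustrative, not prescribed by the paper):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_selection(X, y, max_features=5):
    """Greedy forward selection: at each step add the variable that most
    improves cross-validated accuracy of the black-box learner."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(max_features):
        scores = [(cross_val_score(LogisticRegression(max_iter=1000),
                                   X[:, selected + [j]], y, cv=5).mean(), j)
                  for j in remaining]
        _, best_j = max(scores)          # variable giving the best CV score
        selected.append(best_j)
        remaining.remove(best_j)
    return selected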
To find the optimal variable subset, researchers have also proposed nested subset methods that select subsets according to the change they induce in the objective function J. These methods are discussed below:
1) Finite difference calculation: the difference between J(s) and J(s+1) or J(s-1) is computed for the addition or removal of a variable.
2) Quadratic approximation of the cost function: this method is used to prune the weights of the variables in a backward elimination procedure. A second order Taylor expansion of J is computed and the first order terms are neglected, which yields the variation DJ_i = \frac{1}{2} \frac{\partial^2 J}{\partial w_i^2} (Dw_i)^2 for variable i; the change in weight Dw_i corresponds to the removal of the variable.
3) Sensitivity of the objective function calculation: in this method, the square of the derivative of J with respect to x_i is used.
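A sketch of criterion 2) for the special case of a linear least-squares model J(w) = ½‖y − Xw‖², where the diagonal of the Hessian is available in closed form and removing variable i corresponds to Dw_i = −w_i (the setting and names are assumptions for illustration):

import numpy as np

def obd_saliency(X, y):
    """Saliency DJ_i = 1/2 * d2J/dw_i^2 * w_i^2 for a linear least-squares model
    J(w) = 1/2 * ||y - Xw||^2; removing variable i corresponds to Dw_i = -w_i."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)   # fitted weights
    h_diag = (X ** 2).sum(axis=0)               # diagonal of the Hessian X^T X
    return 0.5 * h_diag * w ** 2                # smaller saliency -> prune first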
4. Feature Construction and Space Dimensionality Reduction
Dimensionality reduction of the input data is always advantageous for storing and processing the data. On the other hand, it is said that better performance can be achieved using features constructed from the original input variables. Several techniques have been proposed to reduce the dimensionality of the feature space, such as PCA and LDA. For feature construction, two objectives can be pursued: achieving the best reconstruction of the data, or being most efficient for making predictions. Developing or applying unsupervised algorithms is also advantageous even when the task is supervised, since most of the data is often unlabeled; in text categorization, for example, most documents are unlabeled.
4.1. Clustering
In this method, a group of similar variables is replaced by the cluster centroid, which becomes a feature. K-means and hierarchical clustering are the most popular algorithms. In distributional clustering, if 𝑋̂ is the random variable representing the constructed features, the information bottleneck (IB) method minimizes the mutual information I(X, 𝑋̂) while preserving the mutual information I(𝑋̂, Y); that is, it searches for the largest possible compression that retains information about the target.
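A sketch of this idea, clustering the variables (columns of X) with K-means and replacing each group by its centroid; scikit-learn is assumed and the names are illustrative:

import numpy as np
from sklearn.cluster import KMeans

def cluster_features(X, n_clusters=10):
    """Group similar variables (columns of X) with K-means and replace each
    group by its centroid, yielding n_clusters constructed features."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X.T)   # cluster the variables
    X_new = np.zeros((X.shape[0], n_clusters))
    for c in range(n_clusters):
        X_new[:, c] = X[:, km.labels_ == c].mean(axis=1)     # centroid as feature
    return X_new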
4.2. Singular Value Decomposition
This method uses singular value decomposition (SVD) for feature construction. It forms a set of features that are linear combinations of the original input variables and yields the reconstruction of the original data that is optimal in the least squares sense.
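A minimal sketch of SVD-based feature construction using only NumPy; the choice of k and the centering step are illustrative assumptions:

import numpy as np

def svd_features(X, k=10):
    """Construct k features as linear combinations of the original variables
    using the truncated SVD X ~= U_k S_k V_k^T, which gives the best rank-k
    reconstruction of X in the least-squares sense."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T          # projected features, shape (m, k)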
4.3. Supervised Feature Selection
The authors review three approaches for selecting constructed features, which are discussed below:
Nested subset methods: neural networks use hidden nodes to extract features from the input data; node selection or pruning therefore acts as a feature selection process.
Filters: the mutual information between the constructed features and the output is maximized, and gradient descent is used to optimize the weights of the feature extractor.
Direct objective optimization: kernel methods possess an implicit feature space revealed by the kernel expansion; it has been shown that selecting these implicit features can improve the generalization of the model.
5. Validation Methods
In this section, the authors address two problems, i.e., out-of-sample performance prediction and model selection. For model selection, only the training and validation data are used, and various methods can be applied. A major difficulty is deciding how many samples to use for training versus validation. Many researchers follow the leave-one-out technique, but it can often lead to overfitting. In metric-based methods, unlabeled data is used, and the discrepancy between models trained on different subsets of the data serves as the selection criterion.
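A sketch of this workflow, assuming scikit-learn: candidate variable subsets are compared by cross-validation on the training data, and out-of-sample performance is estimated once on a held-out test set (the subsets, the learner, and the split are illustrative choices):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=300, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

candidate_subsets = [[0, 1, 2], [0, 1, 2, 3, 4], list(range(10))]   # illustrative subsets
cv_scores = [cross_val_score(LogisticRegression(max_iter=1000),
                             X_train[:, s], y_train, cv=5).mean()
             for s in candidate_subsets]
best = candidate_subsets[cv_scores.index(max(cv_scores))]           # model selection

# out-of-sample performance is then estimated once, on the held-out test set
model = LogisticRegression(max_iter=1000).fit(X_train[:, best], y_train)
print(model.score(X_test[:, best], y_test))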
6. Open Problems
This section discusses seven open problems, outlined below:
Variance of Variable Subset Selection: small perturbations of the experimental data can lead to very different variable subsets and poor performance. To stabilize the selection, bootstrapping is used: subsets of the training data are drawn several times and variable selection is repeated on each.
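A sketch of this stabilization idea, assuming a ranking function such as the correlation ranking sketched earlier; the selection frequency over bootstrap samples indicates how stable each variable is (names are illustrative):

import numpy as np

def bootstrap_selection(X, y, rank_fn, top_k=10, n_boot=50, seed=0):
    """Repeat a ranking-based selection on bootstrap samples and count how
    often each variable lands in the top_k; frequently chosen variables are kept."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))        # bootstrap sample
        ranked = rank_fn(X[idx], y[idx])             # e.g. correlation_ranking above
        counts[ranked[:top_k]] += 1
    return np.argsort(counts)[::-1], counts / n_boot  # variables by stability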
Variable Ranking in the Context of Others: various methods for ranking individual variables have been discussed above. Another algorithm, the Relief algorithm, which is based on nearest neighbors, ranks variables in the context of others. For each example, the closest example from the same class (nearest hit) and the closest example from a different class (nearest miss) are found. The score of the i-th variable is computed from the difference, on that variable, between the nearest miss and the nearest hit, averaged over all examples.
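A minimal sketch of the Relief idea described above, using L1 distances; the function name and the distance choice are illustrative assumptions:

import numpy as np

def relief_scores(X, y):
    """Relief-style scores: variables on which nearest misses differ more
    than nearest hits receive high scores."""
    m, n = X.shape
    scores = np.zeros(n)
    for k in range(m):
        d = np.abs(X - X[k]).sum(axis=1)              # L1 distances to example k
        d[k] = np.inf                                  # exclude the example itself
        same, other = (y == y[k]), (y != y[k])
        hit = np.argmin(np.where(same, d, np.inf))     # nearest hit (same class)
        miss = np.argmin(np.where(other, d, np.inf))   # nearest miss (other class)
        scores += np.abs(X[k] - X[miss]) - np.abs(X[k] - X[hit])
    return scores / m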
Unsupervised Variable Selection: it is sometimes necessary to select variables in the absence of a target label y. For this purpose, a number of unsupervised variable ranking criteria can be used, e.g., saliency, entropy, smoothness, density and reliability.
Forward vs Backward Selection: forward selection is computationally more efficient than backward selection, but it is argued that weaker subsets can be found by forward selection because the importance of a variable is not assessed in the context of variables that have not yet been included. The authors illustrate this with an example in the paper.
Multi-class Problem: some variable selection methods treat the multi-class problem directly rather than decomposing it into several two-class problems. Criteria based on mutual information, for example, extend directly to the multi-class case. The multi-class setting can be advantageous for variable selection, since it is less likely that random features will give good accuracy across all classes.
Selection of Examples: mislabeled data can lead to wrong choices of variables, whereas reliably labeled data leads to better performance and helps avoid the selection of wrong variables.
Inverse Problems: the authors consider this one of the most challenging problems. It is often necessary to uncover the underlying distribution, i.e., the process that generated the data. This is used, for example, in identifying the variables that actually cause a disease, which helps in diagnosis.
Finally, the authors recommend using a linear predictor and selecting variables in two alternative ways: with a variable ranking method using a correlation coefficient or mutual information, or with a nested subset selection method.