
Module-3
Feature Generation and Feature Selection

Extracting Meaning from Data: Motivating application: user (customer) retention. Feature
Generation (brainstorming, role of domain expertise, and place for imagination), Feature Selection
algorithms. Filters; Wrappers; Decision Trees; Random Forests. Recommendation Systems:
Building a User-Facing Data Product, Algorithmic ingredients of a Recommendation Engine,
Dimensionality Reduction, Singular Value Decomposition, Principal Component Analysis, Exercise:
build your own recommendation system.
Feature Selection
• Feature selection methods can be broadly categorized into three types:

Types of Feature Selection:


Filter Method: Based on a statistical measure of the relationship between each feature and the target variable. Features that are strongly related to the target are selected.

Wrapper Method: Based on evaluating feature subsets with a specific machine learning algorithm. The feature subset that yields the best model performance is selected.

Embedded Method: Feature selection is performed as part of the training process of the machine learning algorithm itself.
Filter-based feature selection:
• The filter-based approach selects the best features from the original feature set based on some statistical criterion.
• The process of selecting the significant features is independent of the ML algorithm that will be used in building the model.
• Outline of the filter approach: score each feature against the target with a statistical test, rank the features, and pass only the top-ranked subset to the learning algorithm.
Feature selection techniques
• Two strategies are used by feature selection techniques:
• eliminate the correlated/redundant features
• select features which impact the target variable
• The following are filter-based feature selection methods (a sketch applying two of them follows this list):
• Correlation Coefficient: for continuous target variables.
• Chi-Square Test: for categorical target variables.
• ANOVA F-Test: for continuous target variables.
• Mutual Information: for both continuous and categorical target variables.
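As an illustration, the Python sketch below applies two of these filter scores using scikit-learn's SelectKBest. The breast-cancer data set and the choice of k = 5 are arbitrary assumptions made for the example, not part of the original notes.

# Sketch: filter-based selection with scikit-learn's SelectKBest.
# The breast-cancer data set and k=5 are illustrative choices only.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

data = load_breast_cancer()
X, y = data.data, data.target

# ANOVA F-test: scores each feature against the (categorical) target.
anova = SelectKBest(score_func=f_classif, k=5).fit(X, y)
print("Top 5 by ANOVA F-test:", data.feature_names[anova.get_support()])

# Mutual information: works for continuous and categorical targets alike.
mi = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)
print("Top 5 by mutual information:", data.feature_names[mi.get_support()])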
(a) Filter-based feature selection based on correlation coefficient:
• How do we measure or decide whether features are redundant in the given data?
• Mathematically, redundancy between features can be measured using the correlation coefficient or mutual information.
• One way to analyze redundant features is to compute the pairwise correlations and summarize them in a correlation matrix.
• Correlation measures the degree to which an increase or decrease in one variable is accompanied by an increase or decrease in another variable; its strength is quantified by the correlation coefficient.
• The correlation coefficient ranges from −1 to +1.
• An absolute value of 1 indicates a perfect linear relationship between the variables.
• A correlation close to 0 indicates no linear relationship between the variables.
• The sign of the coefficient indicates the direction of the relationship.
Correlation matrix
• Below is the correlation matrix (plotted in Python) for the Boston house price prediction data set, which has 13 features:
• CRIM - per capita crime rate by town
• ZN - proportion of residential land zoned for lots over 25,000 sq. ft.
• INDUS - proportion of non-retail business acres per town
• CHAS - Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
• NOX - nitric oxides concentration (parts per 10 million)
• RM - average number of rooms per dwelling
• AGE - proportion of owner-occupied units built before 1940
• DIS - weighted distances to five Boston employment centers
• RAD - index of accessibility to radial highways
• TAX - full-value property-tax rate per $10,000
• PTRATIO - pupil-teacher ratio by town
• B - 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
• LSTAT - % lower status of the population
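A minimal Python sketch of how such a plot can be produced with pandas and seaborn is given below. The file name boston_housing.csv is a placeholder: recent scikit-learn releases (1.2+) no longer ship the Boston data set, so it must be loaded from a local copy.

# Sketch: correlation matrix heatmap for the Boston housing features.
# "boston_housing.csv" is a placeholder path for a local copy of the data.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("boston_housing.csv")   # columns CRIM, ZN, ..., LSTAT
corr = df.corr()                         # Pearson coefficients in [-1, +1]

plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation matrix of the Boston housing features")
plt.tight_layout()
plt.show()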
Interpreting the correlation matrix:
• The plot shows a square matrix, one row and one column per variable, and color-fills each cell based on the correlation coefficient of the pair it represents.
• Each cell in the grid holds the value of the correlation coefficient between two variables.
• All diagonal elements are 1: since a diagonal element represents the correlation of a variable with itself, it is always equal to 1.
• A large positive value (near 1.0) indicates a strong positive correlation.
• A large negative value (near −1.0) indicates a strong negative correlation.
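One common recipe for acting on the matrix (an illustrative sketch, not prescribed by these notes) is to scan its upper triangle and drop one feature from every pair whose absolute correlation exceeds a chosen threshold:

import numpy as np
import pandas as pd

def drop_redundant(df: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Drop one feature from every pair with |correlation| > threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

The 0.8 threshold is only a conventional starting point; the right cutoff depends on the data and the downstream model.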

Wrapper methods
• Wrapper methods are a category of feature selection techniques that focus on optimizing the performance
of a specific machine learning model by selecting a subset of features.
• These methods are aptly named because they “wrap” around the machine learning algorithm in question
and iteratively evaluate different combinations of features to determine which subset results in the best
model performance.
Wrapper methods
1. Forward Selection:
• Starting from Scratch: Begin with an empty set of features and iteratively add one feature at a time.
• Model Evaluation: At each step, train and evaluate the machine learning model using the selected features.
• Stopping Criterion: Continue until a predefined stopping criterion is met, such as reaching a maximum number of features or seeing no significant improvement in performance. A code sketch follows below.
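A minimal sketch of forward selection, assuming scikit-learn's SequentialFeatureSelector; the logistic-regression estimator and n_features_to_select=5 are illustrative choices:

# Sketch: forward selection with scikit-learn's SequentialFeatureSelector.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Start from an empty set; at each step add the feature that most
# improves the cross-validated score, stopping at 5 features.
sfs = SequentialFeatureSelector(model, n_features_to_select=5,
                                direction="forward", cv=5)
sfs.fit(X, y)
print("Selected feature indices:", sfs.get_support(indices=True))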
2. Backward Elimination:
• Starting with Everything: Start with all available features.
• Iterative Removal: In each iteration, remove the least important feature and evaluate the model.
• Stopping Criterion: Continue until a stopping condition is met (see the sketch below).
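Continuing the previous sketch, the same scikit-learn class expresses backward elimination by flipping the search direction (again an illustrative configuration, not the only possible implementation):

# Backward elimination: start with all features and, at each step, drop
# the feature whose removal hurts the cross-validated score the least.
sfs_back = SequentialFeatureSelector(model, n_features_to_select=5,
                                     direction="backward", cv=5)
sfs_back.fit(X, y)
print("Selected feature indices:", sfs_back.get_support(indices=True))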
3. Recursive Feature Elimination (RFE):
• Ranking Features: Start with all features and rank them based on their importance or contribution to the model.
• Iterative Removal: In each iteration, remove the least important feature(s) and re-rank the remainder.
• Stopping Criterion: Continue until the desired number of features is reached (sketch below).
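A minimal RFE sketch using scikit-learn; the decision-tree estimator (which exposes the importances used for ranking) and the target of 5 features are assumptions for the example:

# Sketch: Recursive Feature Elimination. The estimator must expose
# coef_ or feature_importances_ so features can be ranked each round.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

rfe = RFE(estimator=DecisionTreeClassifier(random_state=0),
          n_features_to_select=5, step=1)   # drop 1 feature per round
rfe.fit(X, y)
print("Selected indices:", rfe.get_support(indices=True))
print("Ranking (1 = kept):", rfe.ranking_)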
Why Feature Selection?
• Reduces Overfitting: By using only the most relevant features, the
model can generalize better to new data.
• Improves Model Performance: Selecting the right features can
improve the accuracy, precision, and recall of the model.
• Decreases Computational Costs: A smaller number of features requires less computation and storage.
• Improves Interpretability: By reducing the number of features, it is
easier to understand and interpret the results of the model.
Filter Vs Wrapper methods for feature selection
• Wrapper methods are a feature selection technique in machine learning that evaluates the usefulness of features based on the performance of a given model.
• Unlike filter methods, which rely on the intrinsic properties of the data, wrapper methods use the model's predictive power to assess which features are most beneficial for prediction.
Embedded Methods: Definition
• Embedded methods complete the feature selection process within
the construction of the machine learning algorithm itself. In other
words, they perform feature selection during the model training,
which is why we call them embedded methods.
• A learning algorithm takes advantage of its own variable selection
process and performs feature selection and classification/regression
at the same time.
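A minimal sketch of one embedded method, L1 (Lasso-style) regularization: the penalty drives the coefficients of unhelpful features to exactly zero while the model trains, so fitting and feature selection happen in a single step. The data set and C=0.1 are illustrative assumptions.

# Sketch: embedded selection via an L1-penalized model.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)    # L1 penalties are scale-sensitive

# Training the model performs the selection: zeroed coefficients = dropped.
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
selector = SelectFromModel(l1_model).fit(X, y)
print("Kept", selector.get_support().sum(), "of", X.shape[1], "features")

Tree ensembles such as random forests work the same way through their feature_importances_ attribute, which SelectFromModel can also consume.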
