
Module-3
Feature Generation and Feature Selection

Extracting Meaning from Data: Motivating application: user (customer) retention. Feature
Generation (brainstorming, role of domain expertise, and place for imagination), Feature Selection
algorithms. Filters; Wrappers; Decision Trees; Random Forests. Recommendation Systems:
Building a User-Facing Data Product, Algorithmic ingredients of a Recommendation Engine,
Dimensionality Reduction, Singular Value Decomposition, Principal Component Analysis, Exercise:
build your own recommendation system.
Feature Selection
• Feature selection methods can be broadly categorized into three types:

Types of Feature Selection:


Filter Method: Based on a statistical measure of the relationship between each feature and the target variable. Features that are strongly related to the target are selected.

Wrapper Method: Based on evaluating feature subsets with a specific machine learning algorithm. The feature subset that yields the best model performance is selected.

Embedded Method: Feature selection is performed as part of the training process of the machine learning algorithm itself.
Filter-based feature selection:
• The filter-based approach selects the best features from the original feature set based on some statistical criterion.
• The process of selecting the significant features is independent of the ML algorithm that will be used in building the model.
• Outline of the filter approach: score each feature against the target with a statistical test, rank the features, and pass only the top-ranked subset to the learning algorithm.
Feature selection techniques
• Two strategies are used by feature selection techniques:
• eliminate the correlated/redundant features
• select features which impact the target variable
• The following are filter-based feature selection methods (a sketch applying two of them follows this list):
• Correlation Coefficient: for continuous target variables.
• Chi-Square Test: for categorical target variables.
• ANOVA F-Test: for continuous target variables.
• Mutual Information: for both continuous and categorical target variables.
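As an illustration, the Python sketch below applies two of these filter scores using scikit-learn's SelectKBest. The breast-cancer data set and the choice of k = 5 are arbitrary assumptions made for the example, not part of the original notes.

# Sketch: filter-based selection with scikit-learn's SelectKBest.
# The breast-cancer data set and k=5 are illustrative choices only.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

data = load_breast_cancer()
X, y = data.data, data.target

# ANOVA F-test: scores each feature against the (categorical) target.
anova = SelectKBest(score_func=f_classif, k=5).fit(X, y)
print("Top 5 by ANOVA F-test:", data.feature_names[anova.get_support()])

# Mutual information: works for continuous and categorical targets alike.
mi = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)
print("Top 5 by mutual information:", data.feature_names[mi.get_support()])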
(a) Filter-based feature selection based on correlation coefficient:
• How do we measure or decide whether features are redundant in the given data?
• Mathematically, redundancy between features can be measured using the correlation coefficient or mutual information.
• One way to analyze redundant features is to compute the pairwise correlations and summarize them in a correlation matrix.
• Correlation measures the degree to which an increase or decrease in one variable is accompanied by an increase or decrease in another variable; its strength is quantified by the correlation coefficient.
• The correlation coefficient ranges from −1 to +1.
• An absolute value of 1 indicates a perfect linear relationship between the variables.
• A correlation close to 0 indicates no linear relationship between the variables.
• The sign of the coefficient indicates the direction of the relationship.
Correlation matrix
• Below is the correlation matrix (plotted in Python) for the Boston house price prediction data set, which has 13 features:
• CRIM - per capita crime rate by town
• ZN - proportion of residential land zoned for lots over 25,000 sq. ft.
• INDUS - proportion of non-retail business acres per town
• CHAS - Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
• NOX - nitric oxides concentration (parts per 10 million)
• RM - average number of rooms per dwelling
• AGE - proportion of owner-occupied units built before 1940
• DIS - weighted distances to five Boston employment centers
• RAD - index of accessibility to radial highways
• TAX - full-value property-tax rate per $10,000
• PTRATIO - pupil-teacher ratio by town
• B - 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
• LSTAT - % lower status of the population
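A minimal Python sketch of how such a plot can be produced with pandas and seaborn is given below. The file name boston_housing.csv is a placeholder: recent scikit-learn releases (1.2+) no longer ship the Boston data set, so it must be loaded from a local copy.

# Sketch: correlation matrix heatmap for the Boston housing features.
# "boston_housing.csv" is a placeholder path for a local copy of the data.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("boston_housing.csv")   # columns CRIM, ZN, ..., LSTAT
corr = df.corr()                         # Pearson coefficients in [-1, +1]

plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation matrix of the Boston housing features")
plt.tight_layout()
plt.show()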
Interpreting the correlation matrix:
• The plot shows a square matrix, one row and one column per variable, and color-fills each cell based on the correlation coefficient of the pair it represents.
• Each cell in the grid holds the value of the correlation coefficient between two variables.
• All diagonal elements are 1: since a diagonal element represents the correlation of a variable with itself, it is always equal to 1.
• A large positive value (near 1.0) indicates a strong positive correlation.
• A large negative value (near −1.0) indicates a strong negative correlation.
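One common recipe for acting on the matrix (an illustrative sketch, not prescribed by these notes) is to scan its upper triangle and drop one feature from every pair whose absolute correlation exceeds a chosen threshold:

import numpy as np
import pandas as pd

def drop_redundant(df: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Drop one feature from every pair with |correlation| > threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

The 0.8 threshold is only a conventional starting point; the right cutoff depends on the data and the downstream model.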

Wrapper methods
• Wrapper methods are a category of feature selection techniques that focus on optimizing the performance
of a specific machine learning model by selecting a subset of features.
• These methods are aptly named because they “wrap” around the machine learning algorithm in question
and iteratively evaluate different combinations of features to determine which subset results in the best
model performance.
Wrapper methods
1. Forward Selection:
• Starting from Scratch: Begin with an empty set of features and iteratively add one feature at a time.
• Model Evaluation: At each step, train and evaluate the machine learning model using the selected features.
• Stopping Criterion: Continue until a predefined stopping criterion is met, such as reaching a maximum number of features or seeing no significant improvement in performance. A code sketch follows below.
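A minimal sketch of forward selection, assuming scikit-learn's SequentialFeatureSelector; the logistic-regression estimator and n_features_to_select=5 are illustrative choices:

# Sketch: forward selection with scikit-learn's SequentialFeatureSelector.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Start from an empty set; at each step add the feature that most
# improves the cross-validated score, stopping at 5 features.
sfs = SequentialFeatureSelector(model, n_features_to_select=5,
                                direction="forward", cv=5)
sfs.fit(X, y)
print("Selected feature indices:", sfs.get_support(indices=True))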
2. Backward Elimination:
• Starting with Everything: Start with all available features.
• Iterative Removal: In each iteration, remove the least important feature and evaluate the model.
• Stopping Criterion: Continue until a stopping condition is met (see the sketch below).
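Continuing the previous sketch, the same scikit-learn class expresses backward elimination by flipping the search direction (again an illustrative configuration, not the only possible implementation):

# Backward elimination: start with all features and, at each step, drop
# the feature whose removal hurts the cross-validated score the least.
sfs_back = SequentialFeatureSelector(model, n_features_to_select=5,
                                     direction="backward", cv=5)
sfs_back.fit(X, y)
print("Selected feature indices:", sfs_back.get_support(indices=True))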
3. Recursive Feature Elimination (RFE):
• Ranking Features: Start with all features and rank them based on their importance or contribution to the model.
• Iterative Removal: In each iteration, remove the least important feature(s) and re-rank the remainder.
• Stopping Criterion: Continue until the desired number of features is reached (sketch below).
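A minimal RFE sketch using scikit-learn; the decision-tree estimator (which exposes the importances used for ranking) and the target of 5 features are assumptions for the example:

# Sketch: Recursive Feature Elimination. The estimator must expose
# coef_ or feature_importances_ so features can be ranked each round.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

rfe = RFE(estimator=DecisionTreeClassifier(random_state=0),
          n_features_to_select=5, step=1)   # drop 1 feature per round
rfe.fit(X, y)
print("Selected indices:", rfe.get_support(indices=True))
print("Ranking (1 = kept):", rfe.ranking_)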
Why Feature Selection?
• Reduces Overfitting: By using only the most relevant features, the
model can generalize better to new data.
• Improves Model Performance: Selecting the right features can
improve the accuracy, precision, and recall of the model.
• Decreases Computational Costs: A smaller number of features requires less computation and storage.
• Improves Interpretability: By reducing the number of features, it is
easier to understand and interpret the results of the model.
Filter Vs Wrapper methods for feature selection
• Wrapper methods are a feature selection technique in machine learning that evaluates the usefulness of features based on the performance of a given model.
• Unlike filter methods, which rely on the intrinsic properties of the data, wrapper methods use the model's predictive power to assess which features are most beneficial for prediction.
Embedded Methods: Definition
• Embedded methods complete the feature selection process within
the construction of the machine learning algorithm itself. In other
words, they perform feature selection during the model training,
which is why we call them embedded methods.
• A learning algorithm takes advantage of its own variable selection
process and performs feature selection and classification/regression
at the same time.
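A minimal sketch of one embedded method, L1 (Lasso-style) regularization: the penalty drives the coefficients of unhelpful features to exactly zero while the model trains, so fitting and feature selection happen in a single step. The data set and C=0.1 are illustrative assumptions.

# Sketch: embedded selection via an L1-penalized model.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)    # L1 penalties are scale-sensitive

# Training the model performs the selection: zeroed coefficients = dropped.
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
selector = SelectFromModel(l1_model).fit(X, y)
print("Kept", selector.get_support().sum(), "of", X.shape[1], "features")

Tree ensembles such as random forests work the same way through their feature_importances_ attribute, which SelectFromModel can also consume.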
