
Feature Selection

• Feature Selection is the process of selecting a subset of relevant features from the dataset to be used in a machine-learning model. It is an important step in the feature engineering process, as it can have a significant impact on the model's performance.
• After generating a large set of features, use statistical and machine learning techniques to identify the most relevant ones. This includes:
• 1. Correlation Analysis: Checking how each feature correlates with the target variable (e.g., user retention) — see the sketch after this list.
• 2. Model-Based Selection: Using algorithms like random forest or LASSO regression to identify important features.
• 3. Cross-Validation: Ensuring the features selected improve model performance on unseen data.
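To make the correlation step concrete, here is a minimal pandas sketch. The DataFrame and its column names (including the user_retention target) are purely illustrative assumptions, not taken from any real dataset.

```python
import pandas as pd

# Hypothetical dataset: a few feature columns plus a target column named
# "user_retention" (all names are illustrative assumptions).
df = pd.DataFrame({
    "sessions_per_week":   [3, 5, 1, 7, 2, 6],
    "avg_session_minutes": [12.0, 30.5, 5.2, 44.1, 8.3, 25.7],
    "days_since_signup":   [10, 40, 5, 90, 7, 60],
    "user_retention":      [0, 1, 0, 1, 0, 1],
})

# Correlation of every feature with the target, ranked by absolute value.
correlations = (
    df.corr(numeric_only=True)["user_retention"]
      .drop("user_retention")
      .abs()
      .sort_values(ascending=False)
)
print(correlations)
```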
• Benefits of Feature Selection:
• 1. Reduces Overfitting: By using only the most relevant features, the model can generalize better to new data.
• 2. Improves Model Performance: Selecting the right features can improve the accuracy, precision, and recall of the model.
• 3. Decreases Computational Costs: A smaller number of features requires less computation and storage resources.
• 4. Improves Interpretability: By reducing the number of features, it is easier to understand and interpret the results of the model.

• There can be various reasons to perform feature selection:
• Simplification of the model.
• Less computational time.
• To avoid the curse of dimensionality.
• Improved compatibility of data with models.
Feature selection techniques can be roughly divided into three parts:
• 1. Filters
• 2. Wrappers
• 3. Embedded methods
Filter Methods
• Filters rank features based on a statistical measure of their relationship with the outcome variable. This is a good initial step but does not account for interactions between features.
• The filter method filters out irrelevant and redundant columns from the model by ranking features with different metrics.
• Advantages:
• Simple and fast to compute; does not overfit the data.
• Provides a preliminary ranking of features.
• Disadvantages:
• Ignores feature interactions and redundancy.
Filter Methods - example

• Linear Regression Test: For each feature, run a linear regression with only that feature as a predictor, then rank features by p-value or R-squared.
• Steps:
• 1. Compute correlation: Measure the correlation between each feature and the target variable (e.g., user retention).
• 2. Rank features: Order features by their p-values or R-squared values.
• 3. Select top features: Choose a subset of top-ranked features for further analysis (a sketch of this ranking is shown below).
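As an illustration of this filter, the sketch below uses scikit-learn's univariate f_regression test (a per-feature linear regression) on synthetic data. The dataset and the choice of k = 3 are assumptions made only for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data standing in for the real features and target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

# Univariate linear regression test: F-statistic and p-value per feature.
f_scores, p_values = f_regression(X, y)
ranking = np.argsort(p_values)          # lowest p-value first
print("Features ranked by p-value:", ranking)

# Keep only the top-k ranked features for further analysis.
selector = SelectKBest(score_func=f_regression, k=3).fit(X, y)
X_top = selector.transform(X)
print("Selected feature indices:", selector.get_support(indices=True))
print("Reduced feature matrix shape:", X_top.shape)
```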
Wrapper Methods
• In the wrapper methodology, feature selection is treated as a search problem: different combinations of features are made, evaluated, and compared with other combinations. The algorithm is trained iteratively on a subset of features; based on the model's output, features are added or removed, and the model is trained again with the new feature set.
• Wrappers consider feature interactions but can be computationally expensive and prone to overfitting.
• Types of Wrappers:
• 1. Forward Selection:
• a. Start with no features.
• b. Add features one at a time, selecting the one that improves the model the most.
• c. Stop when adding more features does not improve the model.
• 2. Backward Elimination:
• a. Start with all features.
• b. Remove features one at a time, selecting the one whose removal improves (or least degrades) the model.
• c. Stop when removing more features degrades the model.
• 3. Combined Approach: Use a hybrid of forward selection and backward elimination to balance feature inclusion and exclusion.

• Steps involved in Wrapper Methods:
• 1. Select an algorithm: Choose forward selection, backward elimination, or a combined approach.
• 2. Evaluate subsets: Use cross-validation to evaluate the performance of different feature subsets.
• 3. Optimize selection: Use criteria such as R-squared values, AIC, or BIC to select the best subset (a scikit-learn sketch follows).
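The sketch below shows forward selection and backward elimination with cross-validated scoring, using scikit-learn's SequentialFeatureSelector on synthetic data. The estimator, scoring metric, and number of features to keep are illustrative choices, not prescribed by the text.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)

# Forward selection: start with no features and greedily add the feature whose
# inclusion gives the best cross-validated score, until 4 features are kept.
forward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=4,
    direction="forward", cv=5, scoring="r2",
).fit(X, y)
print("Forward selection kept:", forward.get_support(indices=True))

# Backward elimination: start with all features and greedily drop the feature
# whose removal hurts the cross-validated score the least.
backward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=4,
    direction="backward", cv=5, scoring="r2",
).fit(X, y)
print("Backward elimination kept:", backward.get_support(indices=True))
```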
Selection criteria of features
• 1. R-squared:
• 1. Measures the proportion of variance explained by the model.
• 2. Higher R-squared indicates a better fit.
• 2. P-values:
• 1. Assess the significance of individual features.
• 2. Lower p-values indicate higher significance.
• 3. AIC (Akaike Information Criterion):
• 1. Balances model fit and complexity.
• 2. Lower AIC indicates a better model.

• 4. BIC (Bayesian Information Criterion):
• 1. Similar to AIC but with a stronger penalty for model complexity.
• 2. Lower BIC indicates a better model.
• 5. Entropy:
• 1. Calculate entropy: Compute the entropy for the entire dataset.
• 2. Compute information gain: For each feature, calculate the information gain resulting from splitting the dataset on that feature.
• 3. Select features: Choose features with the highest information gain, as these contribute the most to reducing uncertainty (a small sketch follows).
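The sketch below implements these three entropy steps directly. The toy DataFrame and its column names are hypothetical, used only to show the calculation.

```python
import numpy as np
import pandas as pd

def entropy(labels):
    """Shannon entropy of a label column, in bits."""
    probs = pd.Series(labels).value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())

def information_gain(df, feature, target):
    """Entropy of the target minus the weighted entropy after splitting on `feature`."""
    total = entropy(df[target])
    weighted = sum(
        (len(subset) / len(df)) * entropy(subset[target])
        for _, subset in df.groupby(feature)
    )
    return total - weighted

# Hypothetical categorical dataset (column names are illustrative).
df = pd.DataFrame({
    "plan":     ["free", "paid", "free", "paid", "free", "paid"],
    "device":   ["mobile", "desktop", "mobile", "mobile", "desktop", "desktop"],
    "retained": [0, 1, 0, 1, 0, 1],
})

gains = {col: information_gain(df, col, "retained") for col in ["plan", "device"]}
print(gains)  # select the features with the highest information gain
```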

• The Akaike Information Criterion (AIC) is a mathematical method for evaluating how well a model fits the data it was generated from. In statistics, AIC is used to compare different possible models and determine which one is the best fit for the data. AIC is calculated from:
1. the number of independent variables used to build the model, and
2. the maximum likelihood estimate of the model (how well the model reproduces the data).
• The best-fit model according to AIC is the one that explains the greatest amount of variation using the fewest possible independent variables.
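In formula form, with k the number of estimated parameters and \hat{L} the maximized value of the model's likelihood function:

```latex
\mathrm{AIC} = 2k - 2\ln(\hat{L})
```

The first term penalizes model complexity; the second rewards goodness of fit, so lower AIC is better.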
• Model selection example: In a study of how hours spent studying and test format (multiple choice vs. written answers) affect test scores, you create two models:
1. Final test score in response to hours spent studying
2. Final test score in response to hours spent studying + test format
• You find an R² of 0.45 with a p-value less than 0.05 for model 1, and an R² of 0.46 with a p-value less than 0.05 for model 2. Model 2 fits the data slightly better – but was it worth it to add another parameter just to get this small increase in model fit?
• You run an AIC test to find out, which shows that model 1 has the
lower AIC score because it requires less information to predict with
almost the exact same level of precision. Another way to think of this
is that the increased precision in model 2 could have happened by
chance.
• From the AIC test, you decide that model 1 is the best model for your
study.
• The Bayesian Information Criterion (BIC) is a criterion for model selection among a finite set of models. It is based, in part, on the likelihood function, and it is closely related to the Akaike Information Criterion (AIC).
• When fitting models, it is possible to increase the likelihood by
adding parameters, but doing so may result in overfitting. The
BIC resolves this problem by introducing a penalty term for the
number of parameters in the model. The penalty term is larger
in BIC than in AIC.
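For comparison with AIC, the usual BIC formula (k parameters, n observations, \hat{L} the maximized likelihood) is:

```latex
\mathrm{BIC} = k\ln(n) - 2\ln(\hat{L})
```

Because ln(n) exceeds 2 once n is 8 or more, BIC's complexity penalty is heavier than AIC's for all but the smallest samples, which is why it is described above as the stronger penalty.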
Random Forest Algorithm

• Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both classification and regression problems in ML. It is based on the concept of ensemble learning, which is a process of combining multiple classifiers to solve a complex problem and to improve the performance of the model.
• "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset."
• A greater number of trees in the forest generally leads to higher accuracy and helps prevent overfitting.
• The working process can be explained in the steps below (a small scikit-learn sketch follows):
• Step-1: Select K random data points from the training set.
• Step-2: Build the decision trees associated with the selected data points (subsets).
• Step-3: Choose the number N of decision trees that you want to build.
• Step-4: Repeat Steps 1 & 2 until N trees are built.
• Step-5: For new data points, find the prediction of each decision tree, and assign the new data points to the category that wins the majority vote.
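The sketch below mirrors these steps with scikit-learn's RandomForestClassifier on synthetic data (the dataset and hyperparameters are illustrative assumptions). It also prints the impurity-based feature importances, which tie back to the model-based selection technique mentioned earlier.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic classification data standing in for a real dataset.
X, y = make_classification(n_samples=500, n_features=12, n_informative=4,
                           random_state=0)

# Each tree is grown on a bootstrap sample of the rows (with a random subset of
# features considered at each split); predictions are combined by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("Predicted class for one new point:", forest.predict(X[:1]))

# Impurity-based importances can double as a model-based feature-selection signal.
top_features = sorted(enumerate(forest.feature_importances_),
                      key=lambda item: item[1], reverse=True)[:5]
for idx, score in top_features:
    print(f"feature {idx}: importance {score:.3f}")
```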
