
Data Prep for ML

• Data Cleaning
• Feature Selection
• Data Transformation
• Feature Engineering
• Dimensionality Reduction
Understanding the Data – EDA
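The slides give EDA as a heading only; as a rough, illustrative first pass (not from the slides), typical steps with pandas might look like the following, where the DataFrame name df and the file data.csv are hypothetical:

    import pandas as pd

    df = pd.read_csv("data.csv")       # hypothetical input file

    print(df.shape)                    # number of rows and columns
    df.info()                          # column dtypes and non-null counts
    print(df.describe())               # summary statistics for numeric columns
    print(df.isna().sum())             # missing-value count per column
    print(df.corr(numeric_only=True))  # pairwise correlations among numeric columns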
Feature Selection – Filter methods
Some examples of filter methods for feature selection include (a short scikit-learn sketch follows this list):
1. Correlation-based feature selection: This method selects features based on their
correlation with the target variable. Features that are highly correlated with the
target are considered more important and are kept, while weakly correlated or
uncorrelated features are discarded.
2. Mutual information-based feature selection: This method selects features based on
the mutual information between each feature and the target variable. Features with
high mutual information are considered more important and are selected.
3. Chi-squared test: This method selects features based on the chi-squared statistic, which
measures the dependence between categorical variables. Features that are strongly
associated with the target variable are selected.
4. ANOVA: This method selects features based on an analysis of variance between
group means, e.g., an F-test comparing a feature's means across the target classes.
Features whose means differ significantly across classes are selected.
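A minimal sketch of these four filter methods using scikit-learn; the dataset (breast cancer) and the choice of k=5 features are illustrative assumptions, not from the slides:

    import pandas as pd
    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import SelectKBest, f_classif, chi2, mutual_info_classif

    data = load_breast_cancer()
    X = pd.DataFrame(data.data, columns=data.feature_names)
    y = pd.Series(data.target)

    # 1. Correlation-based: rank features by |Pearson correlation| with the target.
    corr = X.corrwith(y).abs().sort_values(ascending=False)
    print(corr.head(5))

    # 2. Mutual information: also captures non-linear dependence on the target.
    mi = SelectKBest(mutual_info_classif, k=5).fit(X, y)

    # 3. Chi-squared: requires non-negative features (true for this dataset).
    chi = SelectKBest(chi2, k=5).fit(X, y)

    # 4. ANOVA F-test: compares each feature's means across the target classes.
    anova = SelectKBest(f_classif, k=5).fit(X, y)

    for name, sel in [("mutual info", mi), ("chi2", chi), ("ANOVA", anova)]:
        print(name, list(X.columns[sel.get_support()]))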
Feature Selection – Wrapper methods
Some examples of wrapper methods for feature selection include (a sketch follows this list):
1. Forward selection: This method starts with an empty set of features and adds the
single most useful feature at each step, stopping when model performance no longer improves.
2. Backward selection: This method starts with the full set of features and removes the
least useful feature at each step, stopping when model performance no longer improves.
3. Recursive feature elimination (RFE): This method uses a machine learning algorithm
to rank the features in the dataset based on their importance. The least important
features are removed iteratively until the desired number of features is reached.
4. Genetic algorithms: These algorithms use principles from evolutionary biology to
search for a strong subset of features. They start with a population of random
feature subsets and apply selection, crossover, and mutation to evolve the subsets
over multiple generations, converging toward a high-performing (though not
necessarily optimal) subset.
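A minimal sketch of forward selection, backward selection, and RFE with scikit-learn; the estimator (logistic regression) and the target of 5 features are illustrative assumptions. Genetic-algorithm selection is not built into scikit-learn and typically needs a third-party library, so it is omitted here:

    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import RFE, SequentialFeatureSelector
    from sklearn.linear_model import LogisticRegression

    X, y = load_breast_cancer(return_X_y=True)
    est = LogisticRegression(max_iter=5000)

    # Forward selection: start empty, add the best feature at each step.
    fwd = SequentialFeatureSelector(est, n_features_to_select=5,
                                    direction="forward").fit(X, y)

    # Backward selection: start full, drop the weakest feature at each step.
    bwd = SequentialFeatureSelector(est, n_features_to_select=5,
                                    direction="backward").fit(X, y)

    # RFE: fit, rank features by importance (here, coefficient size),
    # remove the weakest, and repeat until 5 features remain.
    rfe = RFE(est, n_features_to_select=5).fit(X, y)

    print("forward :", fwd.get_support().nonzero()[0])
    print("backward:", bwd.get_support().nonzero()[0])
    print("RFE     :", rfe.get_support().nonzero()[0])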
Feature Selection – Dimensionality Reduction
Unlike filter and wrapper methods, these techniques construct a smaller set of new features rather than selecting a subset of the originals. Some examples of dimensionality reduction techniques include (a short scikit-learn sketch follows this list):
• Principal component analysis (PCA): PCA is a linear dimensionality reduction
technique that projects the data onto a lower-dimensional space spanned by the
directions of greatest variance, reducing the number of features while preserving
as much of the variance as possible.
• Singular value decomposition (SVD): SVD is a matrix factorization technique that
decomposes a matrix into a product of three matrices; keeping only the largest
singular values yields a low-rank approximation of the data with fewer features.
• Independent component analysis (ICA): ICA recovers statistically independent
components of a signal by maximizing their independence, representing the data
with a smaller set of independent features.
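A minimal sketch of the three techniques with scikit-learn; the dataset (iris) and the choice of n_components=2 are illustrative assumptions:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA, FastICA, TruncatedSVD

    X, _ = load_iris(return_X_y=True)   # 150 samples, 4 features

    # PCA: project onto the directions of maximum variance.
    X_pca = PCA(n_components=2).fit_transform(X)

    # Truncated SVD: keep the largest singular values for a low-rank approximation.
    X_svd = TruncatedSVD(n_components=2).fit_transform(X)

    # ICA: unmix the data into statistically independent components.
    X_ica = FastICA(n_components=2, random_state=0).fit_transform(X)

    print(X_pca.shape, X_svd.shape, X_ica.shape)   # (150, 2) each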
