Feature Engineering PDF

Feature engineering is the process of transforming raw data into features that better represent the underlying problem to improve machine learning results. It involves extracting relevant attributes from data, creating new features from existing ones, selecting important features, and reducing dimensionality. Some key techniques include feature extraction, creation, selection, and dimensionality reduction methods like PCA. Feature engineering is especially important when data is limited, to avoid overfitting.

Feature Engineering

(In Machine Learning)


What is a feature?

An attribute (coordinate) of the observation (point) that is important from a learning or prediction point of view.

Not all attributes are features.
Examples of Features …
• An attribute (column) in a table
• A line in an image
• A phrase
• A word count
What is Feature Engineering?

A set of steps taken to present the original and/or transformed data to a machine learning strategy, such that inherent important structures in the data are exposed for the purpose of model creation.
Feature Engineering: when required, and not …
• Feature engineering is required when …
– Limited data is available
• “Curse of dimensionality” if more features are considered in model building
• Cases of over-fitting if there are more features and less data
– Limited computation power is available

• Feature engineering may not be required when …
– Copious data is available (e.g. images, server logs)
– Computation power is not an issue (e.g. cloud computing)
– Most important: availability of universal function approximators
• Artificial Neural Networks, Deep Learning Networks
The Feature Engineering Process
Execution of the following steps:

1. Create / identify a set of relevant features
2. Fit a model and run validation tests
3. Re-design or re-select features based on the validation results
4. Repeat from step 2

Repeat the process until ‘satisfactory’ results are obtained or there is no further improvement.
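The loop above can be sketched as a small experiment: fit a model on each candidate feature set, validate on held-out data, and keep the best set. A minimal sketch in NumPy, using a toy dataset where only the first feature is informative (all data and names here are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends on feature 0 only; feature 1 is pure noise.
X = rng.normal(size=(100, 2))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)

X_train, X_val = X[:80], X[80:]
y_train, y_val = y[:80], y[80:]

def validation_mse(cols):
    """Fit least squares on the chosen feature columns, score on holdout."""
    w, *_ = np.linalg.lstsq(X_train[:, cols], y_train, rcond=None)
    resid = y_val - X_val[:, cols] @ w
    return float(np.mean(resid ** 2))

# Steps 1-4: propose feature sets, fit + validate each, keep the best.
candidates = [[0], [1], [0, 1]]
scores = {tuple(c): validation_mse(c) for c in candidates}
best = min(scores, key=scores.get)
print(best, scores[best])
```

The informative feature set scores near the noise floor, while the noise-only set scores near the raw variance of y, which is what drives the re-selection in step 3.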
Components of Feature Engineering
• Feature Extraction
• Feature Creation
• Feature Selection
• Dimensionality Reduction
– PCA, SVD

Feature Extraction
• Goal
– To increase the level of abstraction
– To reduce the total data sent into learning algorithms
• Examples
– Edge detection in images
– Curvature detection in 3D models
– Number of concavities / convexities in 3D models
– Identifying regions with the same “colours”
• Satellite imaging
• Temperature-based tool condition monitoring
• MRI / X-Ray processing
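As a concrete sketch of the edge-detection example, gradients can be approximated with simple finite differences and summarised into a single scalar feature; real pipelines would use proper filters (e.g. Sobel or Canny), so this is only illustrative:

```python
import numpy as np

# Feature extraction sketch: reduce a raw image to an "edge count" feature
# via finite-difference gradients (a minimal stand-in for edge detection).

def edge_strength(image):
    """Gradient magnitude at each interior pixel."""
    gx = image[1:-1, 2:] - image[1:-1, :-2]   # horizontal difference
    gy = image[2:, 1:-1] - image[:-2, 1:-1]   # vertical difference
    return np.hypot(gx, gy)

# Toy image: dark left half, bright right half -> one vertical edge.
img = np.zeros((8, 8))
img[:, 4:] = 1.0

edges = edge_strength(img)
edge_pixels = int((edges > 0.5).sum())  # a single scalar feature for a learner
print(edge_pixels)  # 12
```

The 64 raw pixel values collapse to one abstract feature, which is exactly the data-reduction goal stated above.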
Feature Creation
• Goal: To create a set of attributes
– based on domain knowledge or pre-processing / visualization
– that are known to better describe the structure of the data to be processed
• Examples:
– In linear regression, addition of new terms like ‘log’, ‘tanh’, ‘exp’, ‘sin’, square, cube, x1 * x2 (feature combinations), etc.
– One-hot encoding: creation of dummy variables
– Discretizing continuous attributes
– Combining multiple attributes into one feature
– Addition of new terms resulting from ‘feature extraction’
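A minimal sketch of feature creation on a hypothetical two-attribute dataset, adding a log transform, a polynomial term, and a feature combination (the attribute names are illustrative assumptions):

```python
import math

# Feature creation sketch: derive new columns from raw attributes.
rows = [
    {"x1": 1.0, "x2": 4.0},
    {"x1": 2.0, "x2": 9.0},
]

def create_features(r):
    return {
        **r,
        "log_x2": math.log(r["x2"]),   # non-linear transform
        "x1_sq": r["x1"] ** 2,         # polynomial term
        "x1_x2": r["x1"] * r["x2"],    # feature combination
    }

engineered = [create_features(r) for r in rows]
print(engineered[0]["x1_x2"])  # 4.0
```

Each derived column gives a linear model access to a non-linear structure it could not otherwise represent.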
One-hot-encoding
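A minimal sketch of one-hot encoding in plain Python (in practice, libraries such as pandas `get_dummies` or scikit-learn's `OneHotEncoder` do this for you):

```python
# One-hot encoding sketch: turn a categorical attribute into dummy
# variables, one binary column per category.

colours = ["red", "green", "red", "blue"]
categories = sorted(set(colours))            # ['blue', 'green', 'red']

def one_hot(value):
    return [1 if value == c else 0 for c in categories]

encoded = [one_hot(v) for v in colours]
print(encoded[0])  # 'red' -> [0, 0, 1]
```

The single categorical attribute becomes three numeric dummy variables that a learner can consume directly.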
Feature Selection
• Goal: To reduce the total number of ‘features’ sent into the machine learning algorithm
– To reduce model complexity and model computation time
• Methods
– Forward selection:
• Start with a minimal set and gradually add features
– Backward selection:
• Start with a maximal set and gradually remove features
– Filter methods
• Based on analysis such as the Pearson correlation coefficient
– Embedded methods
• LASSO (Least Absolute Shrinkage and Selection Operator: L1 penalty), Ridge (L2), ElasticNet (L1 + L2) regression
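Forward selection as listed above can be sketched as a greedy loop: repeatedly add whichever remaining feature most improves holdout error, and stop when nothing helps. A toy illustration (the data and helper names are assumptions, not from the slides):

```python
import numpy as np

# Forward-selection sketch: greedy holdout-validated feature addition.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)  # only x0, x2 matter
Xtr, Xva, ytr, yva = X[:150], X[150:], y[:150], y[150:]

def mse(cols):
    """Least-squares fit on training columns, mean squared error on holdout."""
    w, *_ = np.linalg.lstsq(Xtr[:, cols], ytr, rcond=None)
    return float(np.mean((yva - Xva[:, cols] @ w) ** 2))

selected, remaining = [], [0, 1, 2, 3]
best_err = float("inf")
while remaining:
    cand = min(remaining, key=lambda j: mse(selected + [j]))
    err = mse(selected + [cand])
    if err >= best_err:
        break                      # no improvement: stop adding features
    selected.append(cand)
    remaining.remove(cand)
    best_err = err

print(sorted(selected))
```

Backward selection is the mirror image: start from all four columns and greedily drop the feature whose removal hurts holdout error least.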
Dimensionality Reduction
• Goal:
– To reduce the number of features by identifying feature combinations
• Example
– Principal Component Analysis (PCA)
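PCA can be sketched via the SVD of the centred data matrix: the right singular vectors give the directions of maximum variance, and projecting onto the top k of them yields the reduced features. A toy illustration in which three correlated features really carry one underlying signal:

```python
import numpy as np

# PCA sketch via SVD: project centred data onto the top-k variance directions.
rng = np.random.default_rng(2)
t = rng.normal(size=100)
# Three features driven by one underlying signal (plus small noise):
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(100, 3))

Xc = X - X.mean(axis=0)                # centre each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)        # variance fraction per component

k = 1
Z = Xc @ Vt[:k].T                      # reduced representation: 100 x 1
print(explained[0])
```

Here the first component captures nearly all the variance, so the three original columns can be replaced by a single derived feature with almost no information loss.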
