UNIT 4
Feature Engineering
Feature Engineering is the process of creating new features or transforming existing features
to improve the performance of a machine-learning model. It involves selecting relevant
information from raw data and transforming it into a format that can be easily understood by
a model. The goal is to improve model accuracy by providing more meaningful and relevant
information.
Feature engineering is the process of transforming raw data into features that are suitable
for machine learning models. In other words, it is the process of selecting, extracting, and
transforming the most relevant features from the available data to build more accurate and
efficient machine learning models.
The success of machine learning models heavily depends on the quality of the features used
to train them. Feature engineering involves a set of techniques that enable us to create new
features by combining or transforming the existing ones. These techniques help to highlight
the most important patterns and relationships in the data, which in turn helps the machine
learning model to learn from the data more effectively.
What is a Feature?
● A feature is an individual measurable property or characteristic of the data that is
used as an input to a model. For example, in a dataset of housing prices, features
could include the number of bedrooms, the square footage, the location, and the age
of the property. In a dataset of customer demographics, features could include age,
gender, income level, and occupation.
● The choice and quality of features are critical in machine learning, as they can
greatly impact the accuracy and performance of the model.
Feature engineering comprises several processes, the main ones being:
1. Feature Creation
Feature Creation is the process of generating new features based on domain knowledge or by
observing patterns in the data. It is a form of feature engineering that can significantly
improve the performance of a machine-learning model.
2. Feature Transformation
Feature Transformation is the process of transforming the features into a more suitable
representation for the machine learning model. This is done to ensure that the model can
effectively learn from the data.
3. Feature Extraction
Feature Extraction is the process of creating new features from existing ones to provide more
relevant information to the machine learning model. This is done by transforming,
combining, or aggregating existing features.
Benefits of Feature Extraction:
1. Improves Model Performance: By creating new and more relevant features, the
model can learn more meaningful patterns in the data.
2. Reduces Overfitting: By reducing the dimensionality of the data, the model is
less likely to overfit the training data.
3. Improves Computational Efficiency: The transformed features often require
fewer computational resources.
4. Improves Model Interpretability: By creating new features, it can be easier to
understand the model’s predictions.
4. Feature Selection
Feature Selection is the process of selecting a subset of relevant features from the dataset to
be used in a machine-learning model. It is an important step in the feature engineering
process as it can have a significant impact on the model’s performance.
Types of Feature Selection:
1. Filter Method: Ranks features by a statistical measure of the relationship between each
feature and the target variable (for example, correlation), independently of any model.
The features with the strongest relationship are selected.
2. Wrapper Method: Evaluates candidate feature subsets by training a specific machine
learning algorithm on each. The feature subset that results in the best performance is
selected.
3. Embedded Method: Performs feature selection as part of the training process of the
machine learning algorithm.
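As a minimal sketch of the filter method, the snippet below ranks features by the absolute
Pearson correlation with the target and keeps the top k. The synthetic housing data, the
feature names, and the helper select_top_k_by_correlation are assumptions made for
illustration; they are not part of any library.

```python
import numpy as np

def select_top_k_by_correlation(X, y, names, k):
    """Return the k feature names with the largest |Pearson r| against y."""
    scores = {}
    for j, name in enumerate(names):
        # np.corrcoef returns a 2x2 matrix; entry [0, 1] is r(feature, target)
        scores[name] = abs(np.corrcoef(X[:, j], y)[0, 1])
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

rng = np.random.default_rng(0)
n = 200
sqft = rng.normal(1500, 300, n)   # strongly related to the target
bedrooms = rng.normal(3, 1, n)    # moderately related
noise = rng.normal(0, 1, n)       # unrelated
price = 100 * sqft + 20000 * bedrooms + rng.normal(0, 10000, n)

X = np.column_stack([sqft, bedrooms, noise])
selected, scores = select_top_k_by_correlation(
    X, price, ["sqft", "bedrooms", "noise"], k=2
)
print(selected)
```

Correlation only captures linear relationships, so a filter based on it can miss features
whose effect on the target is non-linear.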
Benefits of Feature Selection:
1. Reduces Overfitting: By using only the most relevant features, the model can
generalize better to new data.
2. Improves Model Performance: Selecting the right features can improve the
accuracy, precision, and recall of the model.
3. Decreases Computational Costs: A smaller number of features requires less
computation and storage resources.
4. Improves Interpretability: By reducing the number of features, it is easier to
understand and interpret the results of the model.
5. Feature Scaling
Feature Scaling is the process of transforming the features so that they have a similar scale.
This is important in machine learning because the scale of the features can affect the
performance of the model.
Overall, the goal of feature engineering is to create a set of informative and relevant features
that can be used to train a machine learning model and improve its accuracy and
performance. The specific steps involved in the process may vary depending on the type of
data and the specific machine-learning problem at hand.
Various techniques can be used in feature engineering to create new features by combining
or transforming the existing ones. The following are some of the commonly used feature
engineering techniques:
One-Hot Encoding
One-hot encoding is a technique used to transform categorical variables into numerical values
that can be used by machine learning models. In this technique, each category is transformed
into a binary value indicating its presence or absence. For example, consider a categorical
variable “Colour” with three categories: Red, Green, and Blue. One-hot encoding would
transform this variable into three binary variables: Colour_Red, Colour_Green, and
Colour_Blue, where the value of each variable would be 1 if the corresponding category is
present and 0 otherwise.
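The Colour example above can be reproduced with pandas. This is a small sketch that assumes
pandas is available; the sample rows are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Colour": ["Red", "Green", "Blue", "Green"]})

# One binary column per category; dtype=int gives 0/1 instead of booleans
encoded = pd.get_dummies(df, columns=["Colour"], dtype=int)
print(encoded)
```

Each row has exactly one 1 across the three new columns, which is where the name "one-hot"
comes from.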
Binning
Scaling
The most common scaling techniques are standardization and normalization. Standardization
scales the variable so that it has zero mean and unit variance. Normalization scales the
variable so that it has a range of values between 0 and 1.
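Both scaling techniques can be written in a line of NumPy each. The sketch below uses a
made-up array and the population standard deviation (NumPy's default); libraries such as
scikit-learn wrap the same arithmetic in reusable transformer objects.

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Standardization (z-score): subtract the mean, divide by the standard deviation
standardized = (x - x.mean()) / x.std()

# Min-max normalization: map the smallest value to 0 and the largest to 1
normalized = (x - x.min()) / (x.max() - x.min())

print(standardized)
print(normalized)
```

In practice the mean, standard deviation, minimum, and maximum are computed on the training
data only and then reused to transform validation and test data.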
Feature Split
Feature splitting is a powerful technique used in feature engineering to improve the
performance of machine learning models. It involves dividing single features into multiple
sub-features or groups based on specific criteria. This process unlocks valuable insights and
enhances the model’s ability to capture complex relationships and patterns within the data.
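As an illustration of feature splitting, the sketch below breaks a full name and a timestamp
string into several sub-features. The records, the field names, and the helper
split_features are hypothetical and exist only for this example.

```python
from datetime import datetime

records = [
    {"name": "Ada Lovelace", "timestamp": "2024-03-15 08:30:00"},
    {"name": "Alan Turing", "timestamp": "2024-11-02 17:45:00"},
]

def split_features(record):
    """Split one record into finer-grained sub-features."""
    first, last = record["name"].split(" ", 1)
    ts = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S")
    return {
        "first_name": first,
        "last_name": last,
        "year": ts.year,
        "month": ts.month,
        "hour": ts.hour,
        "is_weekend": ts.weekday() >= 5,  # Monday=0 ... Saturday=5, Sunday=6
    }

split = [split_features(r) for r in records]
print(split)
```

A model cannot use a raw timestamp string directly, but it can learn from the hour or the
weekend flag derived from it.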
Moving Window Functions
Definition
A moving window function computes a statistic over a fixed-size sliding window of data in a
time series. It is useful for smoothing, trend detection, and calculating rolling statistics.
Key Concepts
Window Size
Use min_periods to specify the minimum number of non-NA values needed for calculation.
Results are NaN for windows with insufficient valid data.
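The effect of min_periods is easiest to see on a short series with a missing value. The
series below is made up for illustration.

```python
import numpy as np
import pandas as pd

ts = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])

# Default: min_periods equals the window size, so any window containing a NaN
# (or fewer than 3 observations) yields NaN
strict = ts.rolling(window=3).mean()

# min_periods=2: a result is produced once at least 2 non-NA values are present
relaxed = ts.rolling(window=3, min_periods=2).mean()

print(pd.DataFrame({"ts": ts, "strict": strict, "relaxed": relaxed}))
```

With the default setting every window here touches the missing value or is too short, so
the strict result is entirely NaN, while the relaxed version still produces averages.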
Multiple Window Sizes
Analyze both short- and long-term trends by applying .rolling() separately with different
window sizes and comparing the results.
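A sketch of using two window sizes side by side is shown below. The series is synthetic
(a linear trend plus alternating noise), and the window sizes 3 and 10 and the column names
are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: linear trend plus a little alternating noise
values = np.arange(30, dtype=float) + np.tile([0.5, -0.5], 15)
ts = pd.Series(values)

features = pd.DataFrame({
    "value": ts,
    "mean_3": ts.rolling(window=3).mean(),    # short-term view
    "mean_10": ts.rolling(window=10).mean(),  # long-term view
})

# A simple derived feature: short-term average relative to the long-term one
features["short_minus_long"] = features["mean_3"] - features["mean_10"]
print(features.tail())
```

A positive short_minus_long value indicates that recent observations sit above the
longer-run average, which is one common way to turn rolling statistics into model features.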
Techniques and Applications
Trend Estimation
Compute rolling averages to identify long-term trends and remove short-term noise.
Smoothing
Detects repeating patterns by setting a window size matching the seasonal cycle.
Outlier Detection
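Exponential smoothing: $S_t = \alpha \cdot X_t + (1 - \alpha) \cdot S_{t-1}$, where $X_t$ is
the observation at time $t$, $S_t$ is the smoothed value, and $\alpha$ (between 0 and 1) is
the smoothing factor.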
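The recursion above can be implemented directly and checked against pandas. In this sketch
the initial value is set to the first observation, which is an assumption (other
initializations are possible), and the data are made up.

```python
import numpy as np
import pandas as pd

def exponential_smoothing(x, alpha):
    """Apply S_t = alpha * X_t + (1 - alpha) * S_{t-1}, with S_0 = X_0."""
    s = np.empty(len(x), dtype=float)
    s[0] = x[0]  # assumption: initialize with the first observation
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    return s

x = np.array([10.0, 12.0, 11.0, 15.0, 14.0])
manual = exponential_smoothing(x, alpha=0.5)

# pandas computes the same recursion when adjust=False
with_pandas = pd.Series(x).ewm(alpha=0.5, adjust=False).mean().to_numpy()

print(manual)       # [10.  11.  11.  13.  13.5]
print(with_pandas)
```

A larger alpha follows the data more closely, while a smaller alpha gives a smoother
series that reacts more slowly to changes.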
Correlation Analysis
Rolling Mean
rolling_mean = ts.rolling(window=3).mean()
Expanding Mean
exp_mean = ts.expanding().mean()
Percentile Rank
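from scipy.stats import percentileofscore
scores = [3, 5, 7, 9, 10]  # illustrative data, not from the original notes
rank = percentileofscore(scores, 9)  # percentile rank of the value 9 within scores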
These techniques provide powerful tools for time series analysis, enabling the extraction of
meaningful insights and patterns.
Fourier Decomposition
Fourier decomposition is very mathematical and not at all obvious. Figure 5-16 shows an
example of the technique. Any N point signal can be decomposed into N + 2 signals, half of
them sine waves and half of them cosine waves. The lowest frequency cosine wave (called
xC0[n] in this illustration), makes zero complete cycles over the N samples, i.e., it is a DC
signal. The next cosine components: xC1[n], xC2[n], and xC3[n], make 1, 2, and 3 complete
cycles over the N samples, respectively. This pattern holds for the remainder of the cosine
waves, as well as for the sine wave components. Since the frequency of each component is
fixed, the only thing that changes for different signals being decomposed is the amplitude of
each of the sine and cosine waves.
Fourier decomposition is important for three reasons. First, a wide variety of signals are
inherently created from superimposed sinusoids. Audio signals are a good example of this.
Fourier decomposition provides a direct analysis of the information contained in these types of
signals. Second, linear systems respond to sinusoids in a unique way: a sinusoidal input always
results in a sinusoidal output. In this approach, systems are characterized by how they change
the amplitude and phase of sinusoids passing through them. Since an input signal can be
decomposed into sinusoids, knowing how a system will react to sinusoids allows the output of
the system to be found. Third, the Fourier decomposition is the basis for a broad and powerful
area of mathematics called Fourier analysis, and the even more advanced Laplace and
z-transforms. Most cutting-edge DSP algorithms are based on some aspect of these techniques.
Why is it even possible to decompose an arbitrary signal into sine and cosine waves? How are
the amplitudes of these sinusoids determined for a particular signal? What kinds of systems
can be designed with this technique? These are the questions to be answered in later chapters.
The details of the Fourier decomposition are too involved to be presented in this brief
overview. For now, the important idea to understand is that when all of the component
sinusoids are added together, the original signal is exactly reconstructed.
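The reconstruction claim can be checked numerically. The sketch below uses NumPy's real FFT
to obtain the amplitudes of N/2 + 1 cosine waves and N/2 + 1 sine waves (N + 2 signals in
total) for a random 16-point signal, then adds them back together. The scaling convention
for the amplitudes and the variable names are choices made for this example.

```python
import numpy as np

N = 16
n = np.arange(N)
rng = np.random.default_rng(0)
x = rng.normal(size=N)  # an arbitrary N-point signal

# The real FFT gives N/2 + 1 complex coefficients (frequencies 0 .. N/2)
X = np.fft.rfft(x)
k = np.arange(N // 2 + 1)

# Amplitudes of the cosine and sine components
a = X.real / (N / 2)
b = -X.imag / (N / 2)
a[0] /= 2    # the DC term and the N/2 term are scaled differently
a[-1] /= 2

# Build every component wave: row k makes k complete cycles over the N samples
cosines = a[:, None] * np.cos(2 * np.pi * k[:, None] * n / N)
sines = b[:, None] * np.sin(2 * np.pi * k[:, None] * n / N)

# Adding all component sinusoids recovers the original signal
reconstructed = cosines.sum(axis=0) + sines.sum(axis=0)
print(np.allclose(reconstructed, x))
```

Row 0 of cosines is constant (the DC component), matching the description of the
lowest-frequency cosine wave in the text.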
Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN)
designed to better capture long-range dependencies in sequences of data. In deep learning,
LSTMs are particularly useful for tasks where the order of inputs is crucial, such as time
series forecasting, speech recognition, language modeling, and many other sequence-related
tasks.
1. Memory Cell:
2. Gates:
○ LSTMs utilize gates to control the flow of information. These gates decide
what information should be updated, forgotten, or passed forward. The three
main gates in LSTM are:
■ Forget Gate: Decides which information should be discarded from the
memory cell.
■ Input Gate: Determines what new information should be added to the
memory cell.
■ Output Gate: Controls the output of the memory cell, deciding what
information to pass to the next layer or timestep.
3. Cell State:
○ The cell state is the internal memory of the LSTM unit. It carries information
across time steps and is updated by the gates. It can carry important features
that were learned earlier, allowing the network to maintain long-term context.
4. Capturing Long-Range Dependencies:
○ LSTMs are highly effective for predicting future values in time series data
(e.g., stock prices, weather forecasting, energy demand) because of their
ability to learn long-term temporal dependencies.
3. Anomaly Detection:
○ LSTMs can be used for action recognition in videos, where they model
temporal dependencies between frames to identify actions or events over time.
5. Healthcare and Bioinformatics:
○ LSTMs are used for analyzing medical time series data, such as heart rate
signals, EEG data, or patient health records, where sequential patterns are
important.
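Returning to the gate and cell-state mechanics described earlier, a single LSTM time step
can be sketched in NumPy as follows. The weight layout, the random initialization, the
sizes, and the function name lstm_step are assumptions for illustration; deep learning
frameworks differ in such details.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. Shapes: W (4H, D), U (4H, H), b (4H,)."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:H])          # forget gate: what to discard from the cell
    i = sigmoid(z[H:2 * H])      # input gate: how much new information to add
    o = sigmoid(z[2 * H:3 * H])  # output gate: what to expose as output
    g = np.tanh(z[3 * H:4 * H])  # candidate values for the cell state
    c_t = f * c_prev + i * g     # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)       # hidden state passed to the next step or layer
    return h_t, c_t

D, H = 3, 4  # input size and hidden size (arbitrary)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):  # a sequence of 5 time steps
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h, c)
```

Because the cell state is updated additively (f * c_prev + i * g), information can persist
across many time steps when the forget gate stays close to 1.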
Advantages of LSTM:
Image Histogram:
1. Pixel Intensity:
○ Each pixel has an intensity value (0-255 for grayscale, or separate values for
Red, Green, and Blue in color images).
2. X-axis: Intensity Levels:
○ The x-axis represents the range of intensity values (0 to 255 for grayscale or
separate channels for RGB).
3. Y-axis: Frequency:
○ The y-axis shows the number of pixels at each intensity level, representing the
distribution of pixel values.
Types of Histograms:
1. Grayscale Histogram:
○ For a grayscale image, the histogram consists of 256 bins (one for each
possible intensity level from 0 to 255). A grayscale histogram indicates how
many pixels in the image have a particular gray level intensity.
2. Color Histogram:
○ For a color image, there are three histograms: one for each color channel (Red,
Green, Blue). Each channel's histogram will also have 256 bins, indicating the
distribution of that particular color intensity.
1. Image Enhancement:
● Underexposed Image: The histogram might be concentrated towards the left (dark
regions), with few pixels in the higher intensity range.
● Overexposed Image: The histogram might be skewed towards the right (bright
regions), with few pixels in the lower intensity range.
● Balanced Image: A well-balanced histogram will have a distribution of pixel
intensities across the full range, with no significant concentration at one end.
For a color image, you will usually see three separate histograms:
● Red Histogram: Shows the distribution of the red channel intensity values.
● Green Histogram: Shows the distribution of the green channel intensity values.
● Blue Histogram: Shows the distribution of the blue channel intensity values.
Each of these histograms can help in adjusting the color balance of the image. If one channel
(say, red) dominates, the image may appear too red, and adjustments can be made to correct
the balance.
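Per-channel histograms can be computed with NumPy alone. The sketch below uses a random
synthetic image as a stand-in for a loaded photo and assumes the channels are ordered
red, green, blue (OpenCV, for instance, loads images in blue, green, red order).

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 64x64 RGB image with 8-bit channels (stand-in for a loaded image)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

channels = ["red", "green", "blue"]

# One 256-bin histogram per colour channel: counts of pixels at each intensity
hist = {
    name: np.bincount(image[:, :, idx].ravel(), minlength=256)
    for idx, name in enumerate(channels)
}

# Grayscale histogram from a simple channel average (an illustrative choice;
# libraries typically use a weighted luminance formula instead)
gray = image.mean(axis=2).astype(np.uint8)
gray_hist = np.bincount(gray.ravel(), minlength=256)

# The mean intensity per channel gives a rough hint about colour balance
channel_means = {name: image[:, :, idx].mean() for idx, name in enumerate(channels)}
print(channel_means)
```

If one channel's mean is much higher than the others, the image is likely to have a colour
cast in that channel.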
import cv2
import matplotlib.pyplot as plt
1. Scale-Invariance:
○ SIFT (Scale-Invariant Feature Transform) detects features that are invariant to scale,
meaning it can find the same features in images that are resized.
2. Rotation-Invariance:
○ SIFT can identify the same features even if the image is rotated.
3. Keypoint Detection:
4. Orientation Assignment:
○ Each keypoint is assigned a consistent orientation based on the local image gradient.
This ensures rotation-invariance.
5. Feature Description:
○ SIFT generates a descriptor for each keypoint using the gradient orientations around
the keypoint, creating a robust feature vector for matching keypoints across images.
6. Robust to Noise and Illumination:
7. Keypoint Matching:
○ Once keypoints are detected and described, SIFT can be used to match corresponding
keypoints across different images, useful in tasks like image stitching, 3D
reconstruction, and object recognition.
Applications of SIFT:
Limitations:
Sample Code:
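import cv2
# Load the image in grayscale ("image.jpg" is a placeholder path)
img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)
# Detect keypoints and compute their descriptors
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# Draw the keypoints and display the result
out = cv2.drawKeypoints(img, keypoints, None)
cv2.imshow("SIFT keypoints", out)
cv2.waitKey(0)
cv2.destroyAllWindows()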
Convolutional Layers:
● Core Operation: Slide a filter over the image to detect features (edges, textures).
● Multiple Filters: Different filters capture different features (e.g., horizontal/vertical
edges).
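The sliding-filter operation can be sketched in plain NumPy. The helper conv2d, the 6x6
test image, and the two edge filters are assumptions chosen for illustration. Like most deep
learning frameworks, the sketch does not flip the kernel, so strictly speaking it computes a
cross-correlation.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a filter over the image with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Element-wise product of the filter and the patch under it, summed
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# 6x6 image: dark left half, bright right half (a vertical edge in the middle)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)
horizontal_edge = vertical_edge.T

v_map = conv2d(image, vertical_edge)    # responds strongly at the edge
h_map = conv2d(image, horizontal_edge)  # no horizontal edges, so all zeros
print(v_map)
print(h_map)
```

The vertical-edge filter produces large values only where the window straddles the
dark-to-bright boundary, while the horizontal-edge filter stays at zero, which shows how
different filters capture different features.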
Pooling Layers:
● Max Pooling: Downsamples by selecting the max value in a region (e.g., 2x2).
● Average Pooling: Takes the average value in a region.
● Purpose: Reduces spatial dimensions, lessens computation, and improves translation
invariance.
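Max and average pooling over 2x2 regions can be sketched with a reshape. The helper pool2d
and the sample feature map are made up for this example, and the sketch assumes
non-overlapping windows (stride equal to the window size).

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride = size)."""
    h, w = feature_map.shape
    # Crop so that height and width are multiples of the window size
    cropped = feature_map[:h - h % size, :w - w % size]
    blocks = cropped.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 0, 5, 6],
                 [1, 2, 7, 8]], dtype=float)

max_pooled = pool2d(fmap, mode="max")  # [[4, 2], [2, 8]]
avg_pooled = pool2d(fmap, mode="avg")  # [[2.5, 1.0], [0.75, 6.5]]
print(max_pooled)
print(avg_pooled)
```

The 4x4 input becomes a 2x2 output, so later layers process a quarter as many values.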
Fully Connected Layers:
● After convolution and pooling, these layers make predictions by flattening feature
maps into 1D vectors.
Activation Functions:
Stride:
● Controls how much the filter moves during convolution. Larger strides reduce feature
map size and computation.
Padding:
● Adds extra pixels around the input to preserve spatial dimensions. Types: "valid" (no
padding), "same" (keeps dimensions).
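The combined effect of stride and padding on the feature map size follows the standard
formula floor((n + 2p - k) / s) + 1 along each spatial dimension. The helper names below
are hypothetical, and the "same" padding helper assumes stride 1 and an odd kernel size.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial dimension: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

def same_padding(kernel):
    """Padding per side that keeps the size unchanged (stride 1, odd kernel)."""
    return (kernel - 1) // 2

valid = conv_output_size(32, kernel=3)                            # 30
same = conv_output_size(32, kernel=3, padding=same_padding(3))    # 32
strided = conv_output_size(32, kernel=3, stride=2, padding=1)     # 16
print(valid, same, strided)
```

With a 32-pixel input and a 3x3 filter, "valid" shrinks the map to 30, "same" keeps it at
32, and a stride of 2 roughly halves it.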
Local Receptive Fields:
● Each neuron connects to a small image region, focusing on local patterns and
reducing parameters.
Weight Sharing:
● The same filter is applied across different regions of the image, reducing parameters
and overfitting.
Hierarchical Feature Learning:
● CNNs learn from low-level features (edges, textures) to high-level features (objects,
shapes).
Transfer Learning:
● Pre-trained models (like VGG, ResNet) can be reused and fine-tuned for specific
tasks, saving data and training time.
Applications of CNNs:
1. Image Classification:
○ CNNs are widely used to classify images into categories. For example,
recognizing whether an image contains a dog, cat, or other objects.
2. Object Detection:
○ CNNs can identify the locations of objects within an image and classify them.
This is commonly done using architectures like YOLO (You Only Look
Once) or Faster R-CNN.
3. Image Segmentation:
○ CNNs are used for pixel-wise classification, where each pixel is labeled as
part of a particular object or background. Techniques like U-Net or FCN
(Fully Convolutional Network) are used for segmentation tasks.
4. Face Recognition:
○ CNNs are commonly used in facial recognition systems, where they learn
distinctive features of faces and match them across different images.
Advantages of CNNs:
● Automatic Feature Extraction: CNNs learn features directly from the data, reducing
the need for manual feature engineering.
● Parameter Efficiency: Due to weight sharing and local receptive fields, CNNs are
highly efficient in terms of the number of parameters.
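● Robustness to Variations: CNNs tolerate small shifts in the input and, when trained with
suitably varied data or augmentation, can become reasonably robust to variations such as
scaling, rotation, and partial occlusion.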
● Hierarchical Learning: CNNs can learn increasingly complex features as they go
deeper, allowing for rich and high-level representations of data.
Challenges: