Week 10
Feature Engineering
Feature engineering is the process of extracting meaningful features from
raw data. We can experiment with different features based on our domain knowledge
or understanding of the data.
•There are mainly four different ways to do feature engineering:
1. Feature Transformation (FT)
2. Feature Construction
3. Feature Selection
4. Feature Extraction
1. Feature Transformation
Feature transformation is the process of modifying features to make them more suitable
for machine learning algorithms. It includes:
1. Handling missing values,
2. Handling categorical values (converting categorical features to numerical values),
3. Detecting outliers, and
4. Scaling features to a standard or common range.
Handling Missing Values
•Missing values can corrupt our data and degrade the model if they are overlooked.
•There are two main approaches to handling missing values:
Imputation: This is like filling in the blanks with estimates. We can use the mean, median, or mode of the
neighbouring values, or we can use some other logic to fill in the blanks.
•Python Code:
#file: cse315_1.py
#data file impu.csv
import pandas as pd

# read the dataset containing missing values
data1 = pd.read_csv("C:/Users/HP/Desktop/ddd/impu.csv")
# impute: replace every missing value (NaN) with 0
impudat = data1.fillna(0)
print(" Before imputation", data1, sep='\n')
print(" After imputation", impudat, sep='\n')
Deletion
Deletion: We can remove the rows or columns with missing values.
#file: cse315_del.py
#data file deletion.csv
import pandas as pd

# read the dataset containing missing values
dele = pd.read_csv("C:/Users/HP/Desktop/ddd/deletion.csv")
# drop every row that contains at least one missing value
deldat = dele.dropna(inplace=False)
print(" Before deletion", dele, sep='\n')
print(" After deletion", deldat, sep='\n')
Encoding: Handling Categorical Variables
Data can be divided into numerical (quantitative) and categorical
(qualitative). Categorical data can be divided into nominal and
ordinal data. Depending on the data type, there are different ways
to convert categorical data to numerical data. This process is
called encoding.
Encoding refers to the process of converting categorical data
into a numerical format.
One-hot encoding
•By far the most common way to represent categorical variables is one-hot encoding, also
called one-out-of-N encoding or dummy variables.
One-hot encoding converts a categorical variable into separate columns, one per category, and marks the
presence of each category in its column with Boolean True/False or 0/1.
scikit-learn provides the OneHotEncoder class for this, but in the following example we create the
dummy variables with the pandas get_dummies function.
For example, two separate columns are created for the two categories of the OWN_OCCUPIED column, and
the value of each new column is expressed as 0/1.
#One Hot Encoding for nominal data
import pandas as pd

df = ……..  # load the dataset containing the OWN_OCCUPIED column
df1 = pd.get_dummies(df, columns=['OWN_OCCUPIED'])
print(df1)
Example of one-hot encoding
• Example: Consider the data where fruits, their corresponding categorical values, and prices are given.
One-hot encoding replaces the single fruit column with one 0/1 column per fruit:

  Fruit    Categorical value of fruit   Price   |   apple   mango   orange   price
  apple               1                   5     |     1       0        0       5
  mango               2                  10     |     0       1        0      10
  apple               1                  15     |     1       0        0      15
  orange              3                  20     |     0       0        1      20
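A minimal sketch reproducing the fruit table above with pandas (the column names "Fruit" and "Price" are taken from the table; the rest is illustration):

import pandas as pd

# the fruit data from the example above
df = pd.DataFrame({"Fruit": ["apple", "mango", "apple", "orange"],
                   "Price": [5, 10, 15, 20]})

# one column per fruit, encoded as 0/1
encoded = pd.get_dummies(df, columns=["Fruit"], dtype=int)
print(encoded)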
Ordinal Encoding
Ordinal data (categories with a natural order, such as small < medium < large) can be converted to numbers with ordinal encoding.
Example:
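The slide's original example was not recoverable; a minimal sketch with scikit-learn's OrdinalEncoder, assuming a hypothetical "Size" column ordered small < medium < large:

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"Size": ["small", "large", "medium", "small"]})

# pass the categories in their natural order so small=0, medium=1, large=2
enc = OrdinalEncoder(categories=[["small", "medium", "large"]])
df["Size_encoded"] = enc.fit_transform(df[["Size"]]).ravel()
print(df)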
Data Transformation
• Data transformation is an important step in preparing data for machine learning.
• Suppose you have a dataset that contains some people's age and income data. Human ages
usually range from 0 to 100, and in practice most earners are between 25 and 60 years old.
• Income, on the other hand, can range from a few thousand to a few lakhs. So it is clear
that there is a big difference between the age range and the income range.
Sometimes such differences can cause the model to be biased.
• Also, scaling the data improves the performance of the model, and in many cases the
model takes less time to run.
• Many people also refer to data transformation as feature engineering.
Label Encoding
One-hot and ordinal encoders can be used for explanatory/independent
variables (x). For the prediction/target variable (y), we use label encoding, which is
specially designed for output or target variables:
from sklearn.preprocessing import LabelEncoder
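A minimal sketch of label encoding a target column, assuming a hypothetical yes/no churn label:

from sklearn.preprocessing import LabelEncoder

y = ["yes", "no", "no", "yes", "no"]   # hypothetical target variable

le = LabelEncoder()
y_encoded = le.fit_transform(y)        # classes are sorted, so no -> 0, yes -> 1
print(list(le.classes_), y_encoded)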
Detecting Outliers
• Outliers are data points that are significantly different from the rest of the data set. They can
affect the accuracy of our model.
Standardization
A value is standardized as follows:
y_i = (x_i − x̄) / s,
where the mean x̄ = (Σ x_i) / n, the standard deviation s = √( Σ (x_i − x̄)² / (n − 1) ),
x_i = a particular value, and n = the number of values.
We can guesstimate a mean of 10.0 and a standard deviation of about 5.0. Using
these values, we can standardize the first value of 20.7 as follows:
y = (x – mean) / standard_deviation
y = (20.7 – 10) / 5 = 10.7 / 5 = 2.14
The mean and standard deviation estimates of a dataset can be more robust to
new data than the minimum and maximum.
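A minimal sketch of standardization with scikit-learn; the sample values are hypothetical, and note that StandardScaler divides by the population standard deviation (n rather than n − 1):

import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[20.7], [5.0], [8.0], [12.5], [3.8]])  # hypothetical values

scaler = StandardScaler()
x_std = scaler.fit_transform(x)          # (x - mean) / standard_deviation
print(scaler.mean_, x_std.ravel())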
Normalization
Min-Max Normalization
Min-max normalization rescales a value x into the range [0, 1] using x' = (x − min) / (max − min).
• Example: apply min-max normalization to the salary values 64000 and 55000, where the
maximum salary = 100000 and the minimum salary = 19000.
Applying the min-max normalization formula,
For 64000, x' = (64000 − 19000) / (100000 − 19000) = 45000 / 81000 ≈ 0.556
For 55000, x' = (55000 − 19000) / (100000 − 19000) = 36000 / 81000 ≈ 0.444
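The same computation with scikit-learn's MinMaxScaler, using the salary figures above (a minimal sketch):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

salaries = np.array([[19000], [55000], [64000], [100000]])  # includes the min and max

scaler = MinMaxScaler()                    # scales to [0, 1] by default
print(scaler.fit_transform(salaries).ravel())
# -> [0.  0.444...  0.555...  1.]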
Both standard and robust scalers transform inputs to comparable scales. The
difference lies in how they scale raw input values. Robust scaling answers a simple
question: how far is each data point from the input's median? More precisely, it
measures this distance in terms of the IQR: x_scaled = (x − median) / IQR.
Z-score standardization, by contrast, measures the distance from the mean in units of the standard deviation:
z = (x − x̄) / s,
where the mean x̄ = (Σ x) / n, the standard deviation s = √( Σ (x − x̄)² / (n − 1) ),
x = a particular value, and n = the number of values.
Example: for the three values 71, 67, and 87,
Mean x̄ = (71 + 67 + 87) / 3 = 75
∴ sd s = √( ((71 − 75)² + (67 − 75)² + (87 − 75)²) / 2 ) = √112 ≈ 10.58
Z-Score Normalization
After applying the formula we get,
For 71, z = (71 − 75) / 10.58 = −0.3780
For 67, z = (67 − 75) / 10.58 = −0.7559
For 87, z = (87 − 75) / 10.58 = 1.1339
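A quick check of these numbers in Python (a minimal sketch; scipy's zscore with ddof=1 matches the n − 1 formula used above):

import numpy as np
from scipy.stats import zscore

values = np.array([71, 67, 87])
print(values.mean())            # 75.0
print(values.std(ddof=1))       # 10.583...
print(zscore(values, ddof=1))   # [-0.378 -0.756  1.134]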
Decimal Scale Normalization
Decimal scaling normalization aims to scale the feature values by a power of 10,
ensuring that the largest absolute value in each feature becomes less than 1. It is
useful when the range of values in a dataset is known, but the range varies
across features. The formula for decimal scaling normalization is:
X_decimal = X / 10^d
X is the original feature value, and d is the smallest integer such that the largest
absolute value in the feature becomes less than 1.
For example, if the largest absolute value in a feature is 350, then d would be 3,
and the feature would be scaled by 10³ (i.e. divided by 1000).
Decimal scaling normalization is advantageous when dealing with datasets
where the absolute magnitude of values matters more than their specific scale.
Decimal Scale Normalization
  CGPA   Formula    After Decimal Normalization   |   Money   Formula     After Decimal Normalization
  2      2/10       0.2                           |   500     500/1000    0.5
  3      3/10       0.3                           |   320     320/1000    0.32
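A minimal sketch of decimal scaling in plain Python/NumPy; no standard scikit-learn transformer implements it, so this is a small hand-rolled helper:

import numpy as np

def decimal_scale(x):
    """Divide by the smallest power of 10 that makes max(|x|) less than 1."""
    x = np.asarray(x, dtype=float)
    d = int(np.floor(np.log10(np.abs(x).max()))) + 1
    return x / (10 ** d)

print(decimal_scale([2, 3]))        # [0.2 0.3]
print(decimal_scale([500, 320]))    # [0.5 0.32]
print(decimal_scale([350, 120]))    # d = 3 -> [0.35 0.12]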
Normalization vs. Standardization

Normalization:
• Objective: bring the values of a feature within a specific range, often between 0 and 1.
• Maintains the interpretability of the original values within the specified range.
• Can lead to faster convergence, especially in algorithms that rely on gradient descent.
• Use cases: image processing, neural networks, algorithms sensitive to feature scales.

Standardization:
• Objective: transform the values of a feature to have a mean of 0 and a standard deviation of 1.
• Alters the original values, making interpretation more challenging due to the shift in scale and units.
• Also contributes to faster convergence, particularly in algorithms sensitive to the scale of input features.
• Use cases: linear regression, support vector machines, algorithms assuming a normal distribution.
Which procedure is appropriate when?
• It is difficult to say in general which transformation to use; it depends on the type of problem.
• Data scaling is very important for distance-based algorithms such as SVM, KNN, and clustering.
• On the other hand, scaling is not very important for non-distance-based algorithms such as
naive Bayes and the various tree-based algorithms.
• Normalization brings the data onto a scale from 0 to 1; standardization, on the other hand,
brings the data to mean 0 and standard deviation 1.
• Normalization can be used if there is a large difference in the ranges of the dataset's features.
• Standardization works well if there are outliers in the data.
• However, in most cases, standardization works well overall.
Feature Construction
The process of developing new features from existing features or from our domain knowledge is known as
feature construction.
•Making the features more informative and relevant to the task helps machine learning models perform better.
•There are numerous ways to build features, but some typical techniques include:
Repurposing existing features: This is like remixing old songs. We can combine, alter, or create new
features from existing ones to make something new and useful. For example, you could combine the
features "sibsp" and "parch" in the Titanic dataset to create a new feature called "family" (see the
sketch after this list).
Using domain expertise: This is like consulting a chef. You can use your domain understanding to create
new features that are important to the task. For example, if you are developing a model to predict
customer churn, you could add a new feature called "number of months since last purchase" if you know
that customers who haven't purchased in a while are more likely to churn.
Using feature selection algorithms: This is like hiring a personal shopper. You can use these algorithms to
determine the most important features from a data set and then build new features based on those
features. It's like curating the best attributes to create powerful new ones!
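A minimal sketch of the Titanic example mentioned above, assuming a DataFrame with the "SibSp" and "Parch" columns as named in the public Titanic dataset:

import pandas as pd

# hypothetical slice of the Titanic data
df = pd.DataFrame({"SibSp": [1, 0, 3], "Parch": [0, 0, 1]})

# construct a new feature: family size = siblings/spouses + parents/children + the passenger
df["Family"] = df["SibSp"] + df["Parch"] + 1
print(df)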
Feature Extraction
Feature Extraction: Linear Discriminant Analysis (LDA)
LDA is a supervised machine learning algorithm that seeks to find the directions in
the data that best separate the different known categories.
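A minimal sketch of LDA as a feature extractor with scikit-learn, using the bundled Iris data purely for illustration:

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# project the 4 original features onto 2 discriminant directions
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)      # supervised: uses the class labels y
print(X_lda.shape)                   # (150, 2)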
Feature Extraction: t-Distributed Stochastic Neighbor Embedding (t-SNE)
Feature Selection: Filter methods
ANOVA: This method is like a judge in a courtroom. It looks at the means of multiple
groups and decides if there is enough evidence to convict them of being essential
features.
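A minimal sketch of an ANOVA-based filter with scikit-learn's SelectKBest, again using the Iris data for illustration:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# keep the 2 features with the highest ANOVA F-score against the target
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
print(selector.scores_)      # F-score of each original feature
print(X_selected.shape)      # (150, 2)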
Feature Selection: Wrapper methods
•Filter methods are a simple way to select features, but they do not consider the relationship
between features. This can lead to the selection of features that are not relevant to the target
variable.
•The wrapper method is like a dating app for features. It takes many features out on dates
with a machine learning algorithm and then sees which ones the algorithm likes the best.
The feature that gets the most dates is the one that gets selected.
The wrapper method considers the relationships between features by training a machine learning
algorithm on a subset of features and then evaluating the algorithm's performance.
•This process is repeated for different subsets of features, and the subset that results in the
best performance is selected (see the sketch below).
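One common wrapper-style approach in scikit-learn is recursive feature elimination (RFE), which repeatedly fits a model and drops the weakest features; a minimal sketch:

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# wrap a model: fit, rank features, drop the weakest, and repeat
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)
print(rfe.support_)     # mask of the selected features
print(rfe.ranking_)     # 1 = selected, higher = eliminated earlier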
Feature Selection: Embedded methods
•Embedded methods integrate the feature selection process into the machine learning algorithm itself.
•Random forests are a common example: their built-in feature importances can be used to select features.
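A minimal sketch of embedded selection via random forest feature importances, using scikit-learn's SelectFromModel:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# the forest learns feature importances as part of training
forest = RandomForestClassifier(n_estimators=100, random_state=0)
selector = SelectFromModel(forest, threshold="median")
X_selected = selector.fit_transform(X, y)
print(selector.estimator_.feature_importances_)
print(X_selected.shape)      # features with importance at or above the median are kept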
Tests for feature selection
• In the case of filter-based methods, statistical tests are used to determine the strength of
correlation of the feature with the target variable. The choice of the test depends on the
data types of both the input and the output variable (i.e., whether they are categorical or
numerical). The most popular tests are shown in the table below.
  Input         Output        Feature Selection Model
  Numerical     Numerical     Pearson's correlation coefficient
  Numerical     Categorical   ANOVA
  Categorical   Numerical     ANOVA
  Categorical   Categorical   Chi-squared test