Machine Learning – I[1]

The document outlines the principles of well-posed learning problems in machine learning, emphasizing the importance of existence, uniqueness, and stability of solutions. It details the steps for designing a learning system, including problem definition, data gathering, model selection, training, evaluation, and deployment, while also discussing concepts like empirical risk minimization and inductive bias. Additionally, it provides examples, such as classifying emails as spam, to illustrate these concepts in practice.

Uploaded by

ihtgoot
Machine Learning – I

Well-Posed Learning Problems


• A problem is considered well-posed if it meets three essential criteria
(as described by Hadamard in the context of mathematical modeling):
• Existence of a Solution (There must exist a hypothesis or model that
can solve the given problem using the available data.)
• Uniqueness of the Solution (The solution should be unique or at least
behave consistently across similar datasets.)
• Stability of the Solution (The solution should not change drastically
for small changes in the input data.)
Applying the Concept to Machine Learning
• Having sufficient and high-quality data to train the model (ensures
existence).
• Avoiding ambiguity in the target output by clearly defining the goal
(ensures uniqueness).
• Using regularization techniques or proper validation methods to
prevent overfitting (ensures stability).
Example: Classifying Emails as Spam or Not Spam
• Problem: Classify incoming emails into two categories: spam or not
spam.
• Existence: We can collect a labeled dataset of emails, and algorithms
like Logistic Regression or Decision Trees can be used to learn a
classification function.
• Uniqueness: If two models are trained on sufficiently large and
representative datasets, they should converge to similar spam
classifiers.
• Stability: Small variations in the email content (like adding a few
irrelevant words) should not drastically change the classification.
Designing a Learning System
1. Defining the Problem
• The first step in designing a learning system is clearly defining the
problem to be solved.
• Example: Predict whether a customer will buy a product based on
their browsing history.
• Clearly identifying the input (customer data) and the desired output
(purchase or not) helps determine the type of learning problem
(classification, regression, etc.).
2. Gathering and Preprocessing Data
• Data Collection: Collect data relevant to the problem.
• Example: For predicting customer purchases, collect data on user interactions,
demographics, and previous purchase history.
• Data Cleaning: Handle missing values, outliers, and inconsistencies.
• Feature Engineering: Create meaningful features by transforming or
combining raw data.
• Data Normalization: Scale features to a standard range (e.g., 0 to 1) if
required, to improve model performance.
3. Choosing the Type of Learning
Based on the problem definition, decide whether the system will use:
• Supervised Learning: The system learns from labeled data (e.g., spam
email classification).
• Unsupervised Learning: The system finds patterns in unlabeled data
(e.g., customer segmentation).
• Reinforcement Learning: The system learns by interacting with an
environment to achieve a goal (e.g., training a robot to walk).
4. Selecting a Model
Choose an appropriate model based on the type of problem and data.
• Examples:
• For classification: Decision Trees, SVM, Logistic Regression, or Neural
Networks.
• For regression: Linear Regression, Random Forest Regressor, or Polynomial
Regression.
• For unsupervised tasks: K-Means Clustering, PCA, or DBSCAN.
5. Training the Model
Training involves feeding the data to the selected model and adjusting
its internal parameters (weights) to minimize the error.
• Training Process:
• Split the data into training and testing sets.
• Use the training set to train the model by minimizing a loss function (e.g.,
Mean Squared Error for regression or Cross-Entropy Loss for classification).
• Regularize the model to avoid overfitting.
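The training process above can be sketched in plain Python. This is a minimal illustration, not a production routine: the toy data (generated from y = 2x + 1), the learning rate, and the step count are all assumptions chosen for the example, and regularization is left out to keep it short.

```python
# Minimal sketch: fit y = w*x + b by gradient descent on the mean squared error.
# The toy data, learning rate, and step count are illustrative assumptions.

def train_linear(xs, ys, lr=0.05, steps=2000):
    """Adjust the parameters (w, b) to minimize MSE on the training set."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def mse(xs, ys, w, b):
    """Mean squared error of the line w*x + b on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data from y = 2x + 1; the last point is held out as a test set
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
x_train, y_train = xs[:5], ys[:5]
x_test, y_test = xs[5:], ys[5:]

w, b = train_linear(x_train, y_train)
train_error = mse(x_train, y_train, w, b)
test_error = mse(x_test, y_test, w, b)
print("w =", w, "b =", b)
print("train MSE =", train_error, "test MSE =", test_error)
```

The held-out point plays the role of the test set: a low test error suggests that the fitted parameters generalize beyond the points used for training.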
6. Evaluating the Model
After training, evaluate the model’s performance using appropriate
metrics.
• Metrics for Classification: Accuracy, Precision, Recall, F1-Score.
• Metrics for Regression: R² Score, Mean Absolute Error (MAE), Mean
Squared Error (MSE).
• Cross-Validation: Use techniques like k-fold cross-validation to get a
robust estimate of model performance.
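As a rough illustration of the classification metrics listed above, the sketch below computes them from scratch. The label lists are made-up example data, and the function name `classification_metrics` is hypothetical.

```python
# Minimal sketch of accuracy, precision, recall, and F1 for binary labels.
# y_true and y_pred are made-up example data.

def classification_metrics(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision asks how many predicted positives were right, while recall asks how many actual positives were found, so the two can differ sharply on imbalanced data even when accuracy looks high.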
7. Tuning Hyperparameters
Hyperparameters are the external settings of the model (e.g., learning
rate, number of trees in a Random Forest).
• Techniques:
• Grid Search: Try all combinations of hyperparameters.
• Random Search: Randomly sample combinations.
• Bayesian Optimization: Use probabilistic models to find the best
hyperparameters.
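Grid search and random search can be sketched as follows. The function `validation_error` is a hypothetical stand-in for "train a model with these settings and measure its validation error"; its formula and the candidate values are assumptions made only so the example runs on its own.

```python
import itertools
import random

def validation_error(learning_rate, n_trees):
    # Hypothetical stand-in for training a model and scoring it on
    # validation data; it is smallest at learning_rate=0.1, n_trees=100.
    return (learning_rate - 0.1) ** 2 + ((n_trees - 100) / 100) ** 2

grid = {"learning_rate": [0.01, 0.1, 1.0], "n_trees": [50, 100, 200]}

# Grid search: evaluate every combination of the listed values
names = list(grid)
candidates = [dict(zip(names, values))
              for values in itertools.product(*grid.values())]
best_grid = min(candidates, key=lambda params: validation_error(**params))

# Random search: evaluate a fixed budget of randomly sampled settings
rng = random.Random(0)
samples = [{"learning_rate": 10 ** rng.uniform(-2, 0),
            "n_trees": rng.randint(50, 200)}
           for _ in range(20)]
best_random = min(samples, key=lambda params: validation_error(**params))

print("Grid search best:", best_grid)
print("Random search best:", best_random)
```

Grid search cost grows multiplicatively with each added hyperparameter, which is why random search with a fixed budget is often preferred for larger search spaces.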
8. Deploying the Model
• Once the model is trained and fine-tuned, it can be deployed for real-
world use.
• Deployment Options:
• Deploy as a web service using Flask or FastAPI.
• Integrate into an existing application or system.
• Monitor the model’s performance over time and retrain when necessary.
Example Workflow
For a spam email classifier:
• Define the problem: Classify emails as spam or not spam.
• Gather data: Collect a dataset of labeled emails.
• Preprocess data: Clean the email text, remove stop words, and convert text to
numerical features.
• Choose a model: Use a Naïve Bayes classifier.
• Train the model: Fit the Naïve Bayes model to the training data.
• Evaluate the model: Use metrics like Precision and Recall to assess its performance.
• Tune hyperparameters: Optimize parameters like smoothing factor in Naïve
Bayes.
• Deploy the model: Integrate the classifier into an email filtering system.
• Monitor performance: Continuously track misclassified emails and retrain when
necessary.
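The preprocessing, training, and prediction steps of this workflow can be sketched with a small multinomial Naïve Bayes written from scratch. Everything specific here is an assumption for illustration: the four training emails, the tiny stop-word list, and the helper names `tokenize`, `train_naive_bayes`, and `predict`. The `alpha` argument is the smoothing factor mentioned under hyperparameter tuning.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list)
STOP_WORDS = {"the", "a", "to", "is", "at", "for", "you", "your"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def train_naive_bayes(emails, labels, alpha=1.0):
    """Count words per class; alpha is the Laplace smoothing factor."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    for text, label in zip(emails, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for c in classes for w in word_counts[c]}
    return {"classes": classes, "word_counts": word_counts,
            "class_counts": Counter(labels), "vocab": vocab, "alpha": alpha}

def predict(model, text):
    """Return the class with the highest log-posterior score."""
    total_docs = sum(model["class_counts"].values())
    scores = {}
    for c in model["classes"]:
        counts = model["word_counts"][c]
        denom = sum(counts.values()) + model["alpha"] * len(model["vocab"])
        score = math.log(model["class_counts"][c] / total_docs)  # log prior
        for w in tokenize(text):
            if w in model["vocab"]:  # ignore words never seen in training
                score += math.log((counts[w] + model["alpha"]) / denom)
        scores[c] = score
    return max(scores, key=scores.get)

train_emails = [
    "win free money now",
    "free prize claim now",
    "meeting at noon tomorrow",
    "project report for the meeting",
]
train_labels = ["spam", "spam", "not spam", "not spam"]

model = train_naive_bayes(train_emails, train_labels)
prediction_a = predict(model, "claim your free money")
prediction_b = predict(model, "report for tomorrow meeting")
print(prediction_a, "|", prediction_b)
```

A real filter would train on far more data and evaluate on a held-out set before deployment; the sketch only shows how the workflow steps map onto code.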
Well Posed Learning
Definition # 1
Arthur Samuel (1959) coined the term machine learning and defined it
as: ‘the field of study that gives computers the ability to learn without
being explicitly programmed.’ This is an informal and old definition of
machine learning.
Definition # 2
In 1997, Tom Mitchell gave a more formal definition of machine learning: ‘A
computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance
at tasks in T, as measured by P, improves with experience E.’
Well Posed Learning Problem

• Classification of emails as spam or not spam --- Task T
• Tracking user marking emails as spam or not spam --- Experience E
• The number of emails correctly classified as spam or not spam --- Performance P
Designing a Learning System
How Machines Learn?
Learning Input Output Functions
• Goal: We are trying to learn a function f (the target function) that maps
inputs to outputs: Input → f → Output
• f takes a vector-valued input, an n-tuple x = (x1, x2, …, xn)
• f itself may be vector-valued, yielding a k-tuple as output
Example: Learning Input-Output Functions
Statistical Learning Framework
• The Statistical Learning Framework provides a formal approach to
understanding and modeling how machines learn from data.
• It is grounded in probability theory and statistics, aiming to find
patterns or functions that can predict outputs from inputs effectively.
1. Problem Setup
• Input Space (X): The set of all possible input feature vectors.
Example: In house price prediction, features could include the size of
the house, the number of rooms, etc.
Each input is a vector x = (x1, x2, …, xn) in X
• Output Space (Y): The set of all possible output values.
Example: The price of the house.
Y=R (for regression) or Y={0,1} (for classification)
2. Goal of Learning
The aim is to find a function 𝑓:𝑋→𝑌 that can predict the output 𝑌 given
an input 𝑋. The function 𝑓 belongs to a set of possible functions called
the Hypothesis Space 𝐻.
3. Loss Function (L)
To measure how well a function 𝑓 predicts the output, we define a loss
function 𝐿(𝑦,𝑦^),
where 𝑦 is the true output, and
𝑦^=𝑓(𝑥) is the predicted output.

Common examples:
For regression: squared error, 𝐿(𝑦,𝑦^)=(𝑦−𝑦^)^2 (its average over a dataset is
the Mean Squared Error, MSE)
For classification: 0-1 loss (0 if the prediction is correct, 1 otherwise)
4. Risk Function
To measure the overall performance of the model across all possible inputs,
we define the expected risk (also called the true risk): the expected
loss over the distribution that generates the data.
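Written out, and assuming the input-output pairs (x, y) are drawn from an unknown distribution D (the same D that appears in the PAC learning section), the expected risk can be sketched as:

```latex
R(f) = \mathbb{E}_{(x, y) \sim D}\big[ L(y, f(x)) \big]
```

Because D is unknown in practice, this quantity cannot be computed directly and has to be approximated from a finite sample.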
5. Empirical Risk Minimization (ERM)
Empirical risk minimization is a principle where a model minimizes the
average loss (or risk) over a finite dataset or on a given training set.
Since we cannot compute the true risk (which involves all possible data
points in the universe), we approximate it by minimizing the risk on the
given dataset.
$$R_{emp}(f) = \frac{1}{n} \sum_{i=1}^{n} L(y_i, f(x_i))$$
where L(y_i, f(x_i)) is the loss function measuring the difference
between the true output y_i and the predicted output f(x_i), and
n is the number of training samples.
Visualizing ERM
• Imagine a target (the true function) and a set of darts (models).
• Goal: to throw a dart as close to the center of the target as possible.
• The Target: Represents the true underlying function.
• The Darts: Represent different models, each with its own set of parameters.
• The Bullseye: Represents the optimal model, which minimizes the loss.
If the dataset contains the following three house prices and predicted
prices, with squared-error loss:
House   True Price ($)   Predicted Price ($)   Loss ($²)
A       250,000          240,000               100,000,000
B       300,000          310,000               100,000,000
C       200,000          180,000               400,000,000

Empirical risk (average loss over the three houses):

$$R_{emp}(f) = \frac{1}{3}\,(100{,}000{,}000 + 100{,}000{,}000 + 400{,}000{,}000) = 200{,}000{,}000$$
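The arithmetic for the house price example can be checked with a few lines of Python. The dictionary layout and the helper name `squared_error` are choices made for this sketch; the prices are the ones in the table.

```python
# Empirical risk with squared-error loss for the three houses in the table.
# Each entry maps a house to (true price, predicted price) in dollars.
houses = {
    "A": (250_000, 240_000),
    "B": (300_000, 310_000),
    "C": (200_000, 180_000),
}

def squared_error(y_true, y_pred):
    return (y_true - y_pred) ** 2

losses = {name: squared_error(t, p) for name, (t, p) in houses.items()}
empirical_risk = sum(losses.values()) / len(losses)

print(losses)
print("Empirical risk:", empirical_risk)
```

Note how house C, with an error twice as large as the others, contributes four times the loss: squared error penalizes large mistakes disproportionately.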
Scenario: Predicting whether a customer will
buy a product based on age and income.
1. Input features (X):
• Age of the customer
• Income of the customer
2. Output (Y):
• Binary classification (0 = No, 1 = Yes)
3. Loss Function:
Let’s use the 0-1 loss function, where the loss is 0 if the prediction is correct
and 1 if it is incorrect:
$$L(y_i, h(x_i)) = \begin{cases} 0 & \text{if } y_i = h(x_i) \\ 1 & \text{if } y_i \neq h(x_i) \end{cases}$$
4. Dataset (D): Suppose the dataset contains 5 customer records:
Age (X1) Income (X2) Will Buy (Y)
25 30k 1
35 50k 0
45 70k 0
30 40k 1
40 60k 1

Empirical Risk Calculation:


Let’s say a simple model h predicts that customers under 40 always buy the product (i.e., h(x) = 1 if age < 40).

Applying the model to the dataset:


For (25, 30k, 1), prediction = 1 → Correct → Loss = 0
For (35, 50k, 0), prediction = 1 → Incorrect → Loss = 1
For (45, 70k, 0), prediction = 0 → Correct → Loss = 0
For (30, 40k, 1), prediction = 1 → Correct → Loss = 0
For (40, 60k, 1), prediction = 0 (age 40 is not under 40) → Incorrect → Loss = 1

The empirical risk is the average loss:

$$R_{emp}(h) = \frac{0+1+0+0+1}{5} = \frac{2}{5} = 0.4$$
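The same calculation can be sketched in Python using the five customer records and the rule h(x) = 1 if age < 40. Applied strictly, the rule predicts 0 for the 40-year-old customer, so two of the five records are misclassified. The variable and function names are choices made for this sketch.

```python
# Empirical risk under 0-1 loss for the rule "predict 1 if age < 40".
# Records are (age, income in thousands, will_buy) from the dataset above.
dataset = [
    (25, 30, 1),
    (35, 50, 0),
    (45, 70, 0),
    (30, 40, 1),
    (40, 60, 1),
]

def h(age, income):
    # Income is ignored by this simple rule
    return 1 if age < 40 else 0

def zero_one_loss(y_true, y_pred):
    return 0 if y_true == y_pred else 1

losses = [zero_one_loss(y, h(age, income)) for age, income, y in dataset]
empirical_risk = sum(losses) / len(losses)

print("Losses:", losses)
print("Empirical risk:", empirical_risk)
```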
6. Overfitting and Underfitting
• While minimizing empirical risk, two common issues can arise:
• Overfitting: The model learns the noise in the training data, leading to
poor generalization on new data.
Example: You create a very complex model that memorizes every detail of the
training emails, including specific words and sender names. It performs
perfectly on the training data but poorly on new emails because it learned
noise rather than general patterns.
• Underfitting: The model is too simple to capture the underlying pattern in
the data, resulting in high errors on both training and test sets.
Example: You create a very simple model that only looks at whether the email
contains the word "discount." It misses many spam emails that don’t contain
this word, resulting in high errors on both training and new data.
7. Inductive Bias
• Inductive bias is what lets a learner balance overfitting against
underfitting: a bias that is too weak permits overfitting, while one that
is too strong causes underfitting.
• It is the set of assumptions a model makes to generalize from the
training data to unseen data. Without inductive bias, the problem of
learning would be impossible since an infinite number of hypotheses
can fit the training data perfectly.
• ERM with inductive bias means selecting a hypothesis from a
restricted class (biased towards simpler models) that minimizes
empirical risk while assuming it will generalize well.
ERM with Inductive Bias
• Overfitting can be avoided by:
• Limiting the number of possible hypotheses
• Increasing the amount of training data
• Restricting the search space of ERM to a hypothesis class (H):
• By limiting the search space to a specific hypothesis class, we can
reduce the risk of overfitting. This restriction is often referred to as
inductive bias.
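ERM over a restricted hypothesis class can be sketched with the five-customer dataset from the empirical risk example. The class H chosen here, rules of the form "predict 1 if age < t" for a handful of thresholds t, is an assumption made for illustration; the point is only that ERM searches inside H and nowhere else.

```python
# ERM restricted to a small hypothesis class of age-threshold rules.
# Records are (age, will_buy) from the customer dataset.
dataset = [(25, 1), (35, 0), (45, 0), (30, 1), (40, 1)]

# Hypothesis class H: one rule per threshold t, h_t(age) = 1 if age < t
thresholds = [20, 25, 30, 35, 40, 45, 50]

def empirical_risk(t):
    """Average 0-1 loss of the rule 'predict 1 if age < t'."""
    errors = sum(1 for age, y in dataset if (1 if age < t else 0) != y)
    return errors / len(dataset)

risks = {t: empirical_risk(t) for t in thresholds}
best_threshold = min(thresholds, key=empirical_risk)  # first minimizer wins ties

print(risks)
print("ERM picks t =", best_threshold, "with risk", risks[best_threshold])
```

No rule in this class reaches zero training error, which is the price of the restriction; in exchange, the learner cannot memorize individual customers.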
The Experiment Setup
• Subjects: Pigeons.
• Apparatus: A box (commonly referred to as a "Skinner box")
equipped with a mechanism to deliver food to the pigeons at regular
intervals.
• Procedure: The pigeons were placed in the box, where food
(reinforcement) was delivered at fixed time intervals, regardless of
what the pigeon was doing at that moment.
• Importantly, the food delivery was not contingent on the pigeons’
behavior.
Observations
• The pigeons began to associate their random actions with the food
delivery.
• For instance, if a pigeon happened to turn its head just before the
food arrived, it might repeatedly turn its head in the same way,
believing this action was responsible for the food reward.
• Each pigeon developed a unique behavior or "ritual," such as
spinning, pecking the corner, or flapping its wings.
• Skinner concluded that the pigeons developed superstitious behavior
because they falsely believed that their actions caused the food
delivery.
Pigeon Superstition Experiment and ERM with
Inductive Bias
• The Pigeon Superstition Experiment can serve as a metaphorical
bridge to understand Empirical Risk Minimization (ERM) with
Inductive Bias, particularly in how seemingly random actions or
assumptions can affect learning outcomes
1. The Connection Between the Experiment
and ERM
Pigeon Superstition Experiment:
• Pigeons developed behaviors (e.g., spinning or head-turning) because they
incorrectly associated their random actions with food delivery.
• The pigeons operated without any guiding assumption (inductive bias) to
differentiate between relevant and irrelevant actions.
ERM in Machine Learning:
• ERM minimizes the empirical risk (average loss on training data).
• Without inductive bias, the model might find spurious patterns in the
data, leading to overfitting. These spurious patterns are analogous to the
pigeons’ superstitions—they "work" on the training data but don’t
generalize well.
2. Inductive Bias as a Solution
What is Inductive Bias?
• Inductive bias introduces assumptions about the relationship between
inputs and outputs. These assumptions guide the learning algorithm to
prioritize certain hypotheses over others, even if they don’t perfectly fit the
training data.
• In the pigeon experiment, inductive bias would act like a guiding principle
to help the pigeons focus on actions more likely to be causally linked to
food delivery.
How Inductive Bias Improves ERM:
• It prevents the model from learning random correlations (like pigeon
behaviors).
• It restricts the hypothesis space to simpler, more generalizable functions,
improving performance on unseen data.
3. Applying the Pigeon Experiment to ERM
with Inductive Bias
Scenario Without Inductive Bias:
• Imagine a machine learning model trained on noisy data:
• Instance Space (Input): Pigeons’ random movements.
• Target Output: Food delivery.
• Without inductive bias, the model might "learn" that spinning or flapping causes
food delivery, fitting the noisy training data but failing on new inputs.
Scenario With Inductive Bias:
• Introduce an inductive bias, such as:
• A rule: "Only actions involving the food dispenser matter."
• The model now ignores irrelevant actions (e.g., wing flapping) and focuses on
more meaningful behaviors.
Inductive Bias Example
• Linear regression: Assumes the relationship between input features
and the target is linear.
• Decision tree: Assumes the data can be split into discrete regions
using simple rules.
• KNN: Assumes similar inputs have similar outputs.
Example with Inductive Bias
• Scenario: Predicting house prices based on size.
• If we use linear regression, our inductive bias is that price increases
linearly with size. If we use decision trees, our inductive bias is that
house prices can be grouped into ranges, where different ranges have
different prices.
Training data:
Size (sq. ft)   Price ($)
1000            150,000
1500            200,000
2000            250,000
• A linear regression model assumes a straight-line relationship:
Price = 100 × Size + 50,000
• A decision tree model might split size into ranges like:
• If size < 1250, price = $150,000
• If 1250 ≤ size < 1750, price = $200,000
• If size ≥ 1750, price = $250,000
Both models minimize empirical risk with different inductive biases.
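The effect of the two inductive biases can be sketched by comparing a linear model and the decision-tree rules on the three tabulated houses. The linear model used below is the line passing exactly through those three points, Price = 100 × Size + 50,000, which is an assumption computed here from the table; the query size of 1700 sq. ft is also made up.

```python
# Two models with different inductive biases fitted to the same three points.
data = [(1000, 150_000), (1500, 200_000), (2000, 250_000)]  # (size, price)

def linear_model(size):
    # Bias: price changes linearly with size (line through the three points)
    return 100 * size + 50_000

def tree_model(size):
    # Bias: price is constant within each size range
    if size < 1250:
        return 150_000
    if size < 1750:
        return 200_000
    return 250_000

def empirical_risk(model):
    """Mean squared error of a model on the training data."""
    return sum((price - model(size)) ** 2 for size, price in data) / len(data)

risk_linear = empirical_risk(linear_model)
risk_tree = empirical_risk(tree_model)
new_size = 1700  # an unseen house
print("Training risk:", risk_linear, risk_tree)
print("Predictions at 1700 sq. ft:", linear_model(new_size), tree_model(new_size))
```

Both models fit the training data perfectly, yet they disagree on the unseen house: the training set alone cannot say which is right, so the choice of inductive bias decides the prediction.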
8. Regularization
• Regularization is a technique used to control overfitting by adding a penalty term to the loss
function.
• It ensures that the model does not become too complex.
• L1 Regularization (Lasso): Adds the sum of absolute values of weights to the loss.
$$R_{reg}(f) = R_{emp}(f) + \lambda \sum_{j=1}^{p} |w_j|$$
• L2 Regularization (Ridge): Adds the sum of squared values of weights to the loss.
$$R_{reg}(f) = R_{emp}(f) + \lambda \sum_{j=1}^{p} w_j^2$$
Here, λ controls the strength of regularization and p is the number of weights.
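The two penalty terms can be illustrated numerically. The weight vector, the empirical risk value, and λ below are made-up numbers, and the function names are hypothetical; the sketch only shows how each penalty is added to the empirical risk.

```python
# Regularized risk = empirical risk + lambda * penalty(weights).

def l1_penalty(weights):
    """Sum of absolute values of the weights (Lasso)."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Sum of squared weights (Ridge)."""
    return sum(w * w for w in weights)

def regularized_risk(empirical_risk, weights, lam, penalty):
    return empirical_risk + lam * penalty(weights)

weights = [0.5, -2.0, 0.0, 3.0]  # made-up model weights
empirical_risk = 1.0             # made-up training loss
lam = 0.1                        # regularization strength (lambda)

risk_l1 = regularized_risk(empirical_risk, weights, lam, l1_penalty)
risk_l2 = regularized_risk(empirical_risk, weights, lam, l2_penalty)
print("L1-regularized risk:", risk_l1)
print("L2-regularized risk:", risk_l2)
```

The L2 penalty grows quadratically, so the single large weight (3.0) dominates it, whereas the L1 penalty treats every unit of weight the same.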
9. Generalization
• The ultimate goal of a learning system is to generalize well to unseen
data. This means the model should perform well not just on the
training data but also on new, unseen examples.
Good generalization is achieved by:
• Using enough training data.
• Choosing a model with the right capacity (neither too simple nor too
complex).
• Employing techniques like regularization and cross-validation.
Probably Approximately Correct (PAC)
Learning
• Provides a formal analysis of how much data is needed for a learning
algorithm to perform well on unseen data.
• Ensures that, with high probability (1−δ), the learned model will have low
error (at most ϵ) on new data.
• Probably: Refers to the confidence (1−δ).
• Approximately Correct: Refers to the error tolerance (ϵ).
• PAC learning guarantees that the learned model will have low error
on unseen data, with high probability.
Definition
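• A concept class C is PAC-learnable if there exists a learning algorithm
such that,
• For every target concept c∈C, every error tolerance ϵ (0 < ϵ < 1), every
confidence parameter δ (0 < δ < 1), and every distribution D over the inputs:
• Given a sufficient number of training samples (polynomial in 1/ϵ and 1/δ),
the algorithm outputs a hypothesis h∈H such that the error of h is at
most ϵ, with probability at least 1−δ.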
Goals of PAC Learning
• To find a learning algorithm that can learn any function from a given
hypothesis space with high probability and low error, given enough
training data.
• To find a learning algorithm that generalizes well to unseen data.
• The learned hypothesis should have low error (≤ϵ) on unseen
examples with high probability (≥1−δ).
1. Approximately Correct Hypothesis
• A hypothesis is approximately correct if its error is less than or equal
to ϵ.
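• Error formula:
$$P_{x \sim D}\big(c(x) \neq h(x)\big) \leq \epsilon$$
This means the hypothesis h disagrees with the target concept c on at most
ϵ⋅100% of instances drawn from the distribution D.
2. Probably Approximately Correct (PAC)
• The probability that the hypothesis h has an error ≤ϵ is at least 1−δ.
• Probability formula:
$$\Pr\Big(P_{x \sim D}\big(c(x) \neq h(x)\big) \leq \epsilon\Big) \geq 1 - \delta$$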
Example - Spam email classification
Scenario: Suppose you want to classify whether an email is spam or
not.
• You collect 1,000 emails and train a classifier.
• Your goal is for the classifier to have an error rate of less than 5%
(ϵ=0.05) with 95% confidence (1−δ=0.95).
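• If the hypothesis space is suitably limited and the training sample is
large enough (see Sample Complexity), PAC theory says that a classifier
that fits the training set will, with probability at least 1−δ, have
error at most ϵ on new, unseen emails drawn from the same distribution.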
PAC Learning Guarantees
Key Points in PAC Learning
• Error Bound (ϵ): The maximum allowed error on unseen data.
• Confidence Level (1−δ): The probability that the hypothesis will meet
the error bound.
• Sample Complexity: The number of samples required to achieve the
desired ϵ and δ.
• Hypothesis Space (H): The set of all possible functions the model can
choose from.
PAC Learnability
• A concept class C is PAC-learnable if there exists an algorithm that can,
with high probability (1−δ), output a hypothesis h such that the error
of h is at most ϵ, given a sufficient number of samples.
• Error Definition: The true error of a hypothesis h is:
$$Error(h) = P_{(x, y) \sim D}\big(h(x) \neq y\big)$$
Where:
(x, y) are drawn from the data distribution D.
h(x) ≠ y indicates a misclassification.
Sample Complexity
• The number of training examples m required for PAC learning
depends on:
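1. The size of the hypothesis space (H).
2. The error tolerance (ϵ).
3. The confidence parameter (δ), where the confidence level is 1−δ.
For a finite hypothesis space and a learner that outputs a hypothesis
consistent with the training data:
$$m \geq \frac{1}{\epsilon}\left(\ln |H| + \ln \frac{1}{\delta}\right)$$
Where ∣H∣ is the size of the hypothesis space. Larger hypothesis spaces
require more samples to ensure good generalization.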
Example of PAC Learning (SPAM Classification)
• Hypothesis Space: Suppose you are using a linear classifier. The
hypothesis space H consists of all possible linear decision boundaries.
• Error Tolerance (ϵ): You want the model to classify with less than 5%
error.
• Confidence (1−δ): You want to be 95% confident in the result.
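• Using the PAC formula for sample complexity:
• Suppose the hypothesis space is restricted to a finite set with ∣H∣=10^6
(the bound requires a finite ∣H∣). To achieve ϵ=0.05 and δ=0.05:
$$m \geq \frac{1}{0.05}\left(\ln 10^6 + \ln \frac{1}{0.05}\right) \approx \frac{13.82 + 3.00}{0.05} \approx 336.2$$
This means you need at least 337 labeled samples to ensure the
classifier performs within the desired bounds.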
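Evaluating the sample complexity bound numerically is a useful check on the arithmetic. The sketch below assumes the finite-hypothesis-space bound m ≥ (1/ϵ)(ln|H| + ln(1/δ)) and uses natural logarithms; the function name `pac_sample_bound` is hypothetical.

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Smallest integer m with m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    bound = (math.log(hypothesis_count) + math.log(1 / delta)) / epsilon
    return math.ceil(bound)

# Values from the spam classification example
m = pac_sample_bound(10**6, epsilon=0.05, delta=0.05)
print("Samples needed:", m)

# Tightening the error tolerance to 1% raises the requirement roughly fivefold
m_tight = pac_sample_bound(10**6, epsilon=0.01, delta=0.05)
print("Samples needed at epsilon=0.01:", m_tight)
```

The bound depends on 1/ϵ linearly but on |H| and 1/δ only logarithmically, so demanding lower error is far more expensive than demanding higher confidence.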
Why PAC Learning is Important?
• Theoretical Guarantees: PAC learning provides a framework to
understand the conditions under which a model will generalize well.
• Guidance for Practice: It highlights the trade-off between model
complexity and data requirements.
• Robustness: It allows for the evaluation of different algorithms and
their ability to generalize.
Example 2: Medium Built Person
• Training Set: Height and Weight of m individuals.
• Target: Whether the person is medium built or not.
Building Good Training Sets
1. Data Preprocessing
• Data preprocessing is the critical first step in machine learning, where
raw data is prepared to ensure that it is clean, structured, and ready
for analysis. It includes cleaning the data, transforming it, and
handling inconsistencies.
Why is it Important?
• Ensures better model accuracy and faster convergence.
• Reduces noise and irrelevant information that may mislead the
model.
Steps in Data Preprocessing:
• Removing Duplicates: Identical rows can distort patterns.
• Handling Missing Values: Replace missing data with:
• Mean/Median/Mode (numerical).
• Most frequent value (categorical).
• Outlier Detection: Identify and treat outliers using statistical methods
(e.g., IQR or z-scores).
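Duplicate removal and IQR-based outlier detection can be sketched with the standard library alone. The rows and sizes are made-up data, the helper names are hypothetical, and the 1.5 × IQR fence is the conventional choice rather than anything prescribed by the text. Note that `statistics.quantiles` uses its default "exclusive" method here, so other tools may give slightly different quartiles.

```python
import statistics

def remove_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen, unique = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up (size, bedrooms) rows with one exact duplicate
rows = [(1200, 3), (1500, 4), (1200, 3), (1800, 4)]
unique_rows = remove_duplicates(rows)

# Made-up house sizes with one suspicious entry
sizes = [1200, 1500, 1800, 1600, 1400, 9000]
outliers = iqr_outliers(sizes)

print("Unique rows:", unique_rows)
print("Outliers:", outliers)
```

Whether a flagged value should be dropped, capped, or kept is a judgment call that depends on the domain; the sketch only identifies candidates.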
Example
Scenario: Predicting house prices using a dataset
with missing values.
Size (sq ft) Bedrooms Price ($)
1200 3 300,000
1500 NaN 400,000
NaN 4 500,000
1800 4 NaN
Code for Handling Missing Data:
import pandas as pd
from sklearn.impute import SimpleImputer

# Example dataset
data = {
"Size (sq ft)": [1200, 1500, None, 1800],
"Bedrooms": [3, None, 4, 4],
"Price ($)": [300000, 400000, 500000, None]
}
df = pd.DataFrame(data)

# Impute missing values: replace each NaN with the mean of its column
imputer = SimpleImputer(strategy='mean')
df_imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print("Original Data:\n", df)
print("\nAfter Imputation:\n", df_imputed)
2. Handling Categorical Data
Categorical data represents labels or categories (e.g., "Red," "Blue,"
"Green"). Machine learning models require numerical input, so we
need to convert categories into numbers.
Techniques:
• One-Hot Encoding: Creates binary columns for each category.
• Label Encoding: Assigns an integer to each category.
• Ordinal Encoding: Encodes categories with a meaningful order.
Example
Scenario: Encode "Color" (Red, Blue, Green).
Color One-Hot Encoding Label Encoding
Red [1,0,0] 0
Blue [0,1,0] 1
Green [0,0,1] 2
Code - Example
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

# Categorical data
colors = ['Red', 'Blue', 'Green', 'Blue']

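# One-Hot Encoding (the sparse_output argument requires scikit-learn >= 1.2)
# Columns follow the sorted category order: Blue, Green, Red,
# so the output differs from the illustrative table above
one_hot_encoder = OneHotEncoder(sparse_output=False)
colors_one_hot = one_hot_encoder.fit_transform([[c] for c in colors])

# Label Encoding (integers also follow sorted order: Blue=0, Green=1, Red=2)
label_encoder = LabelEncoder()
colors_label = label_encoder.fit_transform(colors)

print("One-Hot Encoding:\n", colors_one_hot)
print("\nLabel Encoding:\n", colors_label)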
3. Partitioning a Dataset into Training and Test
Sets
Why Split Data?
To evaluate how well the model generalizes to unseen data.
• Training Set: Used to train the model.
• Test Set: Used to assess model performance.
Typical Split Ratios:
• 80/20 Split: 80% for training, 20% for testing.
• 70/30 Split: Common for smaller datasets.
• Cross-Validation: Splits the data multiple times so the performance estimate depends less on any single split.
Partitioning Dataset
Code - Example
from sklearn.model_selection import train_test_split

# Sample dataset
X = [[1], [2], [3], [4], [5]]
y = [0, 1, 0, 1, 0]

# Split the dataset: 80% for training, 20% for testing (fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Training Data:", X_train, y_train)
print("Testing Data:", X_test, y_test)
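The cross-validation option listed among the split strategies can be sketched by generating k-fold index splits by hand. This is a simplified illustration without shuffling; the function name `k_fold_indices` and the sample count are assumptions. scikit-learn's `KFold` provides the same idea with optional shuffling.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k folds of near-equal size.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds

folds = k_fold_indices(n_samples=10, k=3)
for fold_number, (train_idx, test_idx) in enumerate(folds, start=1):
    print("Fold", fold_number, "train:", train_idx, "test:", test_idx)
```

Each sample lands in the test set exactly once, so averaging the k test scores uses all of the data for evaluation without ever testing on a sample the model was trained on in that fold.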
4. Normalization
• Normalization rescales data to fit within a specific range (e.g., 0 to 1) to
ensure all features have the same scale.
Why Normalize?
• Improves the performance of algorithms sensitive to feature scaling (e.g.,
Gradient Descent, KNN).
• Scale data for consistent feature ranges
• Min-Max scaling for bounded values
• Standardize to zero mean and unit variance
• Improve convergence rates for optimization
• Essential for distance-based algorithms like KNN
• Avoid scale-related bias in models
Code - Example
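A minimal sketch of the two scaling schemes named above, written in plain Python so the arithmetic is visible. The house sizes are made-up values and the function names are hypothetical; scikit-learn offers `MinMaxScaler` and `StandardScaler` for the same purpose.

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

sizes = [1200, 1500, 1800, 2100]  # made-up house sizes in sq ft
scaled = min_max_scale(sizes)
standardized = standardize(sizes)

print("Min-max scaled:", scaled)
print("Standardized:", standardized)
```

In practice the minimum, maximum, mean, and standard deviation should be computed on the training set only and then reused to transform the test set, so that no information leaks from test data into training.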
5. Handling Imbalanced Datasets
• In imbalanced datasets, one class significantly outnumbers the
other(s) (e.g., Fraud detection: 99% non-fraud, 1% fraud).
• Solutions:
• Oversampling: Duplicate or synthesize minority class examples.
• Undersampling: Reduce the size of the majority class.
• Class Weights: Adjust the model’s loss function to penalize
misclassification of the minority class more heavily.
Handling Imbalanced Datasets
• Random Under-Sampler (RU)
• Random Over-Sampler (RO)
• Synthetic Minority Oversampling Technique (SMOTE)
• Borderline SMOTE
• SVM-SMOTE
• Adaptive Synthetic Sampling (ADASYN)
CODE - Example
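Two of the simpler remedies, random oversampling and class weights, can be sketched in plain Python. The 9-to-1 toy dataset and the helper names are assumptions for illustration. The weight formula n_samples / (n_classes × class_count) is the same heuristic scikit-learn applies for `class_weight='balanced'`; SMOTE and its variants need a dedicated library and are not shown.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority examples until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

# Toy imbalanced dataset: nine examples of class 0, one of class 1
X = [[i] for i in range(10)]
y = [0] * 9 + [1]

X_balanced, y_balanced = random_oversample(X, y)
weights = balanced_class_weights(y)

print("Class counts after oversampling:", Counter(y_balanced))
print("Class weights:", weights)
```

Resampling should be applied to the training split only; oversampling before splitting lets copies of the same example appear in both training and test sets and inflates the measured performance.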
6. Feature Selection and Dimensionality Reduction
• Selecting the most relevant features to improve model performance and
reduce complexity.
• What is Dimensionality Reduction?
• Reducing the number of input features while retaining important
information (e.g., PCA).
Feature Selection
Feature Selection Methods
Principal Component Analysis (PCA)
Code - Example
from sklearn.decomposition import PCA

# Example data
X = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Apply PCA
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)

print("Original Data:\n", X)
print("\nReduced Data:\n", X_reduced)
