
SemVII_MachineLearning

The document outlines important questions and concepts related to machine learning, including the steps to design a machine learning problem, challenges in the field, and various algorithms and their applications. It covers topics such as linear regression, decision trees, performance metrics, logistic regression, ensemble learning methods like bagging and boosting, and the necessity of cross-validation. Additionally, it discusses specific use cases for machine learning in healthcare, finance, and weather forecasting.


ML Imp. Questions for Sem End

Mod. 1
Q1. What are the Steps to Design a Machine Learning Problem ?
OR Explain the procedure for designing a Machine Learning system.
OR Steps in developing a ML application.
Q2. What are the issues and challenges in Machine Learning ? [V IMP]

Q.3 Explain the steps required for selecting the right Machine Learning algorithm [V IMP]

Q.4 Applications or Business Applications of Machine Learning [V IMP]


1. Learning Associations: This technique is often applied in market basket analysis.
Association learning, or association rule mining, is a method in ML for discovering
relationships between variables in large datasets. It is commonly used to identify
patterns or rules, such as products that are often purchased together in retail. If
people who buy product A typically also buy product B, then a customer who buys A but
not B is a potential customer for product B.
In business, this association rule gives the conditional probability of a customer
buying one product given that they bought another. For example, P(Milk | Bread) = 0.7
means that 70% of customers who buy bread also buy milk (a minimal code sketch of this
calculation follows this list).
2.​ Classification: Classification is a supervised learning technique where the model is
trained to categorize data into predefined classes. It’s widely used for tasks where data
needs to be sorted into different categories based on features.​
In business, when an amount of money is loaned by a financial institution, it is important
for the bank to predict in advance the risk associated with the loan, that is, the
probability that the customer will pay the whole amount back.
3.​ Regression: Regression is a technique in supervised learning that predicts a continuous
outcome based on one or more input features. The goal is to establish a relationship
between independent variables and a dependent variable to make numeric predictions.​
In business, real estate companies use regression to estimate property prices. By
analyzing features like location, square footage, number of bedrooms, and market
trends, a model can predict the price of a house.
4.​ Financial Services and Fraud Detection: Banks and financial institutions use ML for
credit scoring, risk assessment, fraud detection, and algorithmic trading. ML algorithms
identify unusual transaction patterns that may indicate fraud, helping in real-time fraud
prevention.
5.​ Supply Chain and Demand Forecasting: ML predicts demand for products and
optimizes inventory management, improving supply chain efficiency. Retailers use ML to
forecast demand based on historical sales data, weather patterns, and seasonal trends,
avoiding stockouts and overstocking.
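
A minimal code sketch (plain Python, with a made-up list of transactions) of how the confidence of the rule Bread -> Milk, i.e. P(Milk | Bread), can be estimated from market-basket data:

# Toy market-basket transactions (hypothetical data for illustration only)
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk", "Eggs"},
    {"Milk", "Eggs"},
    {"Bread", "Milk"},
]

bread_baskets = [t for t in transactions if "Bread" in t]
both = [t for t in bread_baskets if "Milk" in t]

# Confidence of the rule Bread -> Milk, i.e. P(Milk | Bread)
confidence = len(both) / len(bread_baskets)
print(f"P(Milk | Bread) = {confidence:.2f}")  # 3 of the 4 bread baskets also contain milk -> 0.75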

Q.5 Explain Machine Learning and its types.


Machine Learning (ML) is a branch of artificial intelligence that enables systems to learn and
make decisions from data without being explicitly programmed for each task. In ML, algorithms
identify patterns in large datasets and use these patterns to make predictions or decisions on
new, unseen data. The main types of machine learning are supervised learning (learning from
labeled data), unsupervised learning (finding structure in unlabeled data), and reinforcement
learning (learning from rewards through interaction with an environment).
Mod. 2
Q.6 Explain Linear Regression with an Example + Numerical [IMP]
OR Explain the concept of Linear Regression and enlist its types
Q.7 Write a note on Decision Tree + Numerical [VV IMP]
A Decision Tree is a popular supervised learning algorithm used in machine learning for both
classification and regression tasks. The structure resembles a flowchart, where each internal
node represents a feature (attribute), each branch represents a decision rule, and each leaf
node represents an outcome or a prediction.
- Root Node: It represents the entire set of records (the whole dataset), which is then
divided into two or more subsets.
- Splitting: Splitting is the procedure used to divide a node into two or more sub-nodes
based on a criterion. At each node, the algorithm chooses the feature that best separates
the data according to a measure such as Gini impurity or Information Gain (as used in ID3).
- Decision Node: A decision node is a sub-node that is further divided into more sub-nodes.
- Leaf/Terminal Node: A leaf node is a node that is not divided further, i.e. a node with no
children.
- Parent and Child Node: A parent node is a node that is split into sub-nodes; those
sub-nodes are called the children of the parent node.
- Branch/Sub-Tree: A branch or sub-tree is a sub-part of the decision tree.
- Pruning: Pruning is used to reduce the size of a decision tree by removing nodes.
For example, a decision tree can classify whether a loan applicant is “low-risk” or “high-risk”
based on factors like income, age, and credit score. The model splits the data at each node by
these features to reach a final decision at the leaf.
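
A minimal sketch (assuming scikit-learn; the tiny loan dataset below is invented purely for illustration) of training such a decision tree and printing its learned rules:

from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical applicants: [income, age, credit score]
X = [[55000, 40, 720],
     [22000, 23, 580],
     [80000, 50, 760],
     [18000, 21, 550],
     [60000, 35, 690]]
y = [0, 1, 0, 1, 0]  # 0 = low-risk, 1 = high-risk

tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=["income", "age", "credit_score"]))  # learned split rules
print(tree.predict([[30000, 28, 600]]))  # risk prediction for a new applicant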

Q.8 Explain any five performance measures along with examples or Performance metrics
for Classification [V IMP]
OR Explain performance evaluation metrics for binary classification with a suitable
example.
-​ Accuracy: Accuracy measures the proportion of correctly predicted instances (both true
positives and true negatives) out of the total number of instances.​
Accuracy = (TP+TN) / Total​
Example: Suppose a model predicts whether an email is spam or not, and out of 100
emails, it correctly classifies 85 (both spam and not spam). The accuracy would be
85/100 = 0.85 or 85%
- Precision: Precision indicates the proportion of positive predictions that are actually
correct. It answers the question, "Of all items predicted as positive, how many are truly
positive?"
Precision = TP / (TP+FP)
Example: In a medical diagnosis, if a model predicts cancer, precision tells us how many
of those predicted cases are actually cancerous. If the model predicted cancer 20 times,
but only 15 were correct, precision would be 15/20 = 0.75 or 75%.
-​ Recall: Recall measures the proportion of actual positives that were correctly identified
by the model. It answers, “Of all items that are truly positive, how many did we correctly
identify?”​
Recall = TP / (TP+FN)
Example: If there are 25 fraudulent transactions, and the model identifies 20, recall
would be 20/25 = 0.8 or 80%.
- F1 Score: The F1 score is the harmonic mean of precision and recall, offering a balance
between them. It is useful when you need to consider both false positives and false
negatives, and is particularly helpful in cases of imbalanced datasets.
F1 = 2 x (Precision x Recall) / (Precision + Recall)
Example: With precision 0.75 and recall 0.8, F1 = 2 x (0.75 x 0.8) / (0.75 + 0.8) = 0.77.
-​ AUC: AUC represents the area under the ROC curve, which plots the True Positive Rate
(TPR) against the False Positive Rate (FPR) across various threshold levels. AUC
values range from 0 to 1, with 1 indicating perfect classification.​
Example: In binary classification, if a model has an AUC of 0.95, it’s generally
considered very good, meaning the model has a high true positive rate and low false
positive rate across different thresholds.
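
A minimal sketch (assuming scikit-learn; the labels and scores below are made up) that computes the metrics above for a binary spam classifier:

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # 1 = spam, 0 = not spam (ground truth)
y_pred  = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]   # hard class predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.3, 0.7, 0.6, 0.95, 0.05]  # predicted probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))  # AUC needs scores, not hard labels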

Q.9 Explain Regression line, Scatter plot, Error in prediction and Best fitting line.
Regression Line: The regression line is a straight line that represents the relationship between
two variables in a simple linear regression. This line is drawn through a set of data points in
such a way that it minimizes the total distance between itself and the actual data points. The
regression line is used to predict the dependent variable (Y) based on the value of the
independent variable (X).​
Equation for a simple linear regression line is Y = mX + c, where:​
Y is the Predicted value​
m is the Slope of the line​
c is the Y-intercept

Scatter Plot: A scatter plot is a graphical representation of individual data points, where each
point represents a pair of values for the variables X (independent) and Y (dependent). This allows
you to visually assess the relationship between the two variables.​
Scatter plots help to visualize whether there is a trend or correlation between X and Y. For
instance, in a dataset of house prices (Y) and house size (X), a scatter plot could show a
positive correlation if larger houses tend to be more expensive.

Error in Prediction: The error in prediction, also known as residual in regression, is the
difference between the actual observed value and the predicted value on the regression line.
This error shows how far off a model's prediction is from the actual observed value. The aim in
regression is to minimize these errors across all data points to make the predictions more
accurate.​
Error (Residual) = Y observed - Y predicted. For example, if the actual sales of a product (Y)
are 100 units and the model predicts 90 units, then the error is 100 - 90 = 10.

Best Fitting Line: The best fitting line, also known as the line of best fit, is the regression line
that minimizes the sum of squared errors (the differences between actual and predicted values)
across all data points, ensuring that the overall error is as low as possible. The best fitting line
provides the most accurate representation of the linear relationship between X and Y.​
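
A minimal sketch (assuming NumPy; the house size/price numbers are invented) of fitting the best fitting line Y = mX + c by least squares and inspecting the residuals (errors in prediction):

import numpy as np

X = np.array([50, 70, 90, 110, 130])     # house size (sq. m), independent variable
Y = np.array([150, 200, 240, 300, 330])  # price (in thousands), dependent variable

m, c = np.polyfit(X, Y, deg=1)           # slope and intercept of the regression line
Y_pred = m * X + c                       # predictions on the regression line
residuals = Y - Y_pred                   # error in prediction for each data point

print(f"Best fitting line: Y = {m:.2f}X + {c:.2f}")
print("Residuals:", np.round(residuals, 2))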

Q.10 Explain the terms overfitting, underfitting, bias & variance tradeoff with respect to
Machine Learning.

Q.11 Write a short note on Logistic Regression or Logistic vs SVM.


Logistic Regression is a statistical model used for binary classification tasks, where the goal is
to predict the probability of an instance belonging to one of two classes, typically labeled as 0
and 1. Unlike linear regression, which predicts a continuous output, logistic regression predicts
a probability that maps to one of the two classes using a logistic function, also known as the
sigmoid function.​
Sigmoid Function: Logistic regression uses the sigmoid function to model the probability of a
binary outcome. The function takes any real-valued number z and maps it to a value between 0
and 1: sigmoid(z) = 1 / (1 + e^(-z)).

Decision Boundary: By setting a threshold, such as 0.5, we can classify instances into classes.
If the probability is greater than 0.5, the model assigns the instance to class 1; otherwise, it
assigns it to class 0.
Applications: Logistic regression is widely used in various domains for classification tasks,
including ​
Medical Diagnosis: Predicting the presence or absence of a disease​
Marketing: Predicting whether a customer will purchase a product​
Finance: Assessing the likelihood of loan default
Logistic Regression vs Support Vector Machine:
- Logistic Regression is a probabilistic model: its goal is to find the probability that a
given instance belongs to a particular class using the sigmoid function. SVM is a
non-probabilistic model: its primary goal is to find the optimal decision boundary
(hyperplane) that maximizes the distance between the classes.
- Logistic Regression is used mainly for classification by using probability. SVM focuses on
correctly separating classes with the widest possible margin rather than on probability.
- Logistic Regression can only create a linear decision boundary. SVM can handle non-linear
data effectively using kernel functions.
- Logistic Regression is more interpretable, as it directly provides the probability that an
instance belongs to a class, making it easier to understand how a feature influences the
classification. SVM does not provide probabilities directly, so it is often harder to
interpret.
- Logistic Regression is preferred for simple and linearly separable problems. SVM is
preferred when there is a need for complex decision boundaries.
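
A minimal sketch (assuming scikit-learn; the small age/income dataset is invented) of logistic regression producing a sigmoid probability and applying the 0.5 decision boundary:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical loan data: [age, income in thousands]; 1 = defaulted, 0 = repaid
X = np.array([[25, 20], [45, 80], [30, 30], [50, 90], [28, 25], [48, 85]])
y = np.array([1, 0, 1, 0, 1, 0])

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba([[35, 40]])[0, 1]  # sigmoid output: P(class 1) for a new applicant
label = int(proba > 0.5)                     # decision boundary at 0.5
print(f"P(default) = {proba:.2f} -> class {label}")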

Q.12 Explain Multivariate Linear regression method.


Multivariate Linear Regression is an extension of simple linear regression that models the
relationship between multiple independent variables (predictors) and a single dependent
variable (outcome). It’s widely used when the outcome depends on several factors rather than
just one.
The formula of multivariate linear regression is Y = b0 + b1X1 + b2X2 + ... + bnXn + e, where:
Y is the dependent variable we want to predict
X1, X2, ..., Xn are the independent variables (predictors)
b0 is the intercept
b1, b2, ..., bn are the coefficients of the Xs
e is the error term (irreducible error)
Multivariate linear regression incorporates multiple factors to make more accurate predictions as
compared to linear regression. So it's widely used in areas like economics (predicting GDP
based on multiple factors), finance (stock prices) and medicine.
Example: Suppose we want to predict a diabetes risk score (Y) based on several factors:
BMI (X1), skin thickness (X2) and age (X3); the model would look like:
Diabetes risk = b0 + b1(BMI) + b2(Skin Thickness) + b3(Age) + e
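
A minimal sketch (assuming scikit-learn; the BMI/skin-thickness/age values and risk scores are invented) of fitting the multivariate model above and reading off its coefficients:

import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: BMI (X1), skin thickness (X2), age (X3)
X = np.array([[22.5, 20, 25],
              [31.0, 35, 45],
              [27.8, 28, 38],
              [35.2, 40, 52],
              [24.1, 22, 30]])
y = np.array([0.15, 0.60, 0.40, 0.75, 0.20])  # diabetes risk score (dependent variable)

model = LinearRegression().fit(X, y)
print("Intercept b0      :", model.intercept_)
print("Coefficients b1-b3:", model.coef_)
print("Prediction        :", model.predict([[29.0, 30, 41]]))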
Mod. 3
Q.13 Compare Bagging and Boosting with reference to ensemble learning. Explain how
these methods help to improve the performance of the machine learning model. [VV IMP]

Q.14 Explain the different ways to combine the classifier [VV IMP]
Ensemble Learning leverages multiple individual models, or "weak learners," to create a
stronger combined model that performs better than any single classifier alone. The idea is that
by combining different models, we can reduce errors and variance, leading to more accurate
and stable predictions.
Here are the main ways to combine classifiers under ensemble learning:
Bagging: Bagging reduces variance by training multiple models on different random subsets of
the training data (with replacement) and then averaging their predictions (or taking a majority
vote). Each model, often a decision tree, is trained on a random sample of the data. Once all
models are trained, predictions from each model are aggregated.
Bagging is useful in reducing overfitting and variance, especially in high-variance models like
decision trees.
Example: Random Forest
Boosting: Boosting aims to convert weak learners into strong learners by sequentially training
models, with each model attempting to correct the errors of the previous one. Each new model
focuses on the samples that were previously misclassified. The final prediction is a weighted
average (or weighted vote) of the predictions.
Boosting is effective for reducing both bias and variance, creating a more accurate model
Example: AdaBoost and Gradient Boosting

Stacking: Stacking uses multiple models of different types and combines their predictions using
a "meta-learner" or "meta-model," which learns how to best combine these predictions.

Stacking can often yield better results by leveraging the strengths of diverse models, though it
can be more complex and computationally intensive.
Example: A stacking ensemble could combine a decision tree, a logistic regression model, and
a support vector machine, with a meta-model learning the optimal combination of their outputs.
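
A minimal sketch (assuming scikit-learn, on a synthetic dataset) comparing the three combination strategies above with cross-validated accuracy:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier, StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

bagging  = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)
stacking = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()),
                ("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC())],
    final_estimator=LogisticRegression())  # meta-learner combining the base predictions

for name, model in [("Bagging", bagging), ("Boosting", boosting), ("Stacking", stacking)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())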

Q.15 Explain the necessity of cross-validation in Machine Learning applications and
K-fold cross-validation in detail [VV IMP]
Necessity of K-fold Cross-Validation:
-​ Improved Model Evaluation: Provides a more reliable estimate of model performance
by testing it across different subsets of the data. Instead of relying on a single train-test
split (which may be biased), it ensures that each observation is used for both training
and validation, minimizing the impact of any one specific split.
-​ Better use of data: When data is limited, k-fold cross-validation maximizes data usage
by ensuring that each data point gets used in both training and validation sets.
-​ Reduced Variance from Averaging: By averaging the results across k different
validations, k-fold cross-validation reduces the risk of overfitting to any one train-test
division. This means you’re less likely to have a model that performs well on one subset
but poorly on others.
-​ Reduced Bias from Random splits: Unlike a single train-test split that might
accidentally favor or disfavor certain observations, k-fold cross-validation rotates through
k subsets, reducing bias introduced by any single split.
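
A minimal sketch (assuming scikit-learn and its bundled Iris dataset) of 5-fold cross-validation, where each fold serves once as the validation set and the scores are averaged:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)  # k = 5 rotating train/validation splits
scores = cross_val_score(model, X, y, cv=kfold)

print("Per-fold accuracy:", scores)
print("Mean accuracy    :", scores.mean())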

Q.16 Random forest algorithm or Explain Ensemble learning algorithm Random Forest
and its use cases in real-world applications. [VV IMP]
Random Forest is an ensemble (bagging) method that builds many decision trees on random
subsets of the data and features, and combines their outputs by majority vote (for
classification) or averaging (for regression).
Real-world applications of Random Forest:
Healthcare Diagnostics: Random Forest is frequently used for predicting the likelihood of
diseases like diabetes, heart disease, and cancer by analyzing patient data, symptoms, and
diagnostic indicators.
Weather Forecasting: Random Forest can be used in weather prediction models to forecast
temperatures, rainfall, and other weather conditions. It helps in analyzing meteorological
data to predict local and regional climate trends, aiding agriculture, disaster management,
and public safety.
Finance: Financial institutions use Random Forest for credit scoring, fraud detection, and
risk assessment. The model can help classify loan applicants as high or low risk by analyzing
factors such as credit history, income, and transaction patterns.
Stock Market Analysis: Traders and analysts use Random Forest for stock price prediction by
analyzing historical data, market trends, and external economic indicators.

Mod. 4
Q.17 Explain Multiclass Classification [VV IMP]
Multi-class classification is a type of classification task where a model is designed to classify
input data into one of three or more classes (categories). Unlike binary classification, where
there are only two possible classes, multi-class classification involves multiple distinct
categories, requiring the model to make more complex decisions.
Training Data: The model is trained on labeled data where each instance belongs to one of
several classes.
Prediction: For each new instance, the model predicts the most likely class among the multiple
options.

Techniques for Multi-Class Classification


One-vs-All (OvA): In this approach, a separate binary classifier is trained for each class,
treating that class as positive and all other classes as negative. If there are 4 classes,
the model will train 4 binary classifiers. During prediction, the class with the highest
confidence score across these classifiers is selected as the final output.
One-vs-One (OvO): Here, a classifier is trained for each pair of classes, resulting in C(C-1)/2
classifiers if there are C classes. For instance, with 4 classes, 6 classifiers are created: (Class 1
vs. Class 2), (Class 1 vs. Class 3), and so on. The final classification decision is made by
majority voting from all classifiers, where the class with the most "votes" across classifiers is
chosen.
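
A minimal sketch (assuming scikit-learn; the 3-class Iris dataset stands in for any multi-class problem) of wrapping a binary classifier with the One-vs-All and One-vs-One strategies:

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)  # 3 classes

ova = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)
ovo = OneVsOneClassifier(LinearSVC(max_iter=10000)).fit(X, y)

print("OvA binary classifiers:", len(ova.estimators_))  # one per class -> 3
print("OvO binary classifiers:", len(ovo.estimators_))  # C(C-1)/2 pairs -> 3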

Some common examples of multi-class classification tasks are:​


Image Classification: Classifying an image as either "cat," "dog," "bird," or "fish."​
Text Classification: Sorting documents or emails into categories such as "business," "sports,"
"politics," and "entertainment."​
Digit Recognition: Identifying handwritten digits (0–9) as in the MNIST dataset.

Q.18 Define Support Vector Machine. Explain how margin is computed and optimal
hyper-plane is decided + Concept of Margin and SVM. [IMP]

Q.19 Explain Support Vector Machine as a constrained optimization problem.


Support Vector Machine (SVM) can be formulated as a constrained optimization problem to find
the optimal hyperplane that best separates data into distinct classes. The goal of SVM is to find
the optimal hyperplane that maximizes the margin between two classes, where the margin is the
distance between the closest points of each class to the hyperplane (also known as the support
vectors).
Each point must be on the correct side of the margin. This leads to a constrained
optimization problem: minimize (1/2)∥w∥² subject to yi(w·xi + b) ≥ 1 for every training point
(xi, yi), where 'w' is the weight vector that determines the orientation of the hyperplane
and 'b' is the bias term. Minimizing ∥w∥ is equivalent to maximizing the margin, which equals
2/∥w∥.

Q.20 Define the following terminologies with reference to SVM.

Kernel: A kernel is a mathematical function in SVM that helps us handle data that isn’t linearly
separable in its original form. It works by transforming the data into a higher-dimensional space
where it becomes easier to separate using a straight line (or hyperplane). ​
If we have two groups of points that can’t be separated by a line in a 2D plane, we can lift those
points into 3D space, in this new space it might be possible to draw a plane that separates
them.
Hard Margin: A hard margin refers to the maximum-margin hyperplane that perfectly separates
the data points of different classes without any misclassifications.
Soft Margin: When data contains outliers or is not perfectly separable, SVM uses the soft
margin technique. This method introduces a slack variable for each data point to allow some
misclassifications while balancing between maximizing the margin and minimizing violations.

Q.21 Explain the Kernel Trick in SVM


The kernel trick in SVM is a powerful technique that allows us to apply SVM to datasets that
aren’t linearly separable by transforming them into a higher-dimensional space where they
become separable.
Instead of working directly in the original data space, we apply a kernel function that
transforms the data into a higher-dimensional space. In this new space, the data points that
were not separable by a line or hyperplane in the original space can now be separated.
The key advantage of the kernel trick is that it avoids explicitly computing the
transformation for each data point. Instead of transforming each data point into this
higher-dimensional space (which could be computationally intensive), the kernel trick calculates
the relationships (or distances) between pairs of points directly in the original space.
The kernel trick allows the SVM to act as if it's operating in a higher-dimensional space, where it
finds a separating boundary, but without the costly computation of transforming all points.
This is both efficient and effective for separating complex data patterns.
Types of Kernels:
Linear Kernel: Useful when data is already linearly separable.
Polynomial Kernel: Adds polynomial features to create more complex boundaries.
Radial Basis Function (RBF) Kernel: Maps data to an infinite-dimensional space; effective for
non-linearly separable data.
Sigmoid Kernel: Works somewhat like a neural network by applying a sigmoid function.
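
A minimal sketch (assuming scikit-learn) of the kernel trick on concentric circles, which no straight line can separate; the RBF kernel separates them without ever computing the higher-dimensional mapping explicitly:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm    = SVC(kernel="rbf", gamma="scale").fit(X, y)

print("Linear kernel accuracy:", linear_svm.score(X, y))  # poor: no linear boundary exists
print("RBF kernel accuracy   :", rbf_svm.score(X, y))     # close to 1.0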

Q.22 Consider the use case of Email spam detection. Identify and explain the suitable
machine learning technique for this task. [Naive Bayes]
For the use case of email spam detection, Naive Bayes is a highly suitable machine learning
technique due to its simplicity, effectiveness, and efficiency in handling text classification
problems like spam filtering.
- Naive Bayes is a probabilistic classifier that uses Bayes' Theorem to calculate the
probability that a given email is spam or not, based on the words (features) it contains:
P(spam | words) is proportional to P(words | spam) x P(spam). Given an email, Naive Bayes
computes the probability of it being spam based on the presence of certain words commonly
associated with spam (e.g., "free," "winner," "discount").
-​ The "naive" aspect of Naive Bayes assumes that the presence of each word in an email
is independent of the presence of other words. While this assumption might not hold
perfectly, it simplifies the computations and has been shown to perform very well in
practice for text classification tasks.
-​ Emails have a large vocabulary (number of unique words), making them
high-dimensional data. Naive Bayes handles high-dimensionality effectively because it
only requires word probabilities, not the full vocabulary structure.
-​ Naive Bayes is computationally efficient and requires relatively little training data,
making it fast and easy to train even with large datasets.
Training Phase: A dataset with labeled examples of spam and non-spam emails is used for
training. For each word, Naive Bayes calculates the probability of the word appearing in spam
emails and the probability of it appearing in non-spam emails.
Prediction Phase: For a new email, Naive Bayes calculates the probabilities of the email being
spam or not based on the presence of each word in the email. The classifier then assigns the
label “spam” or “non-spam” based on the higher probability.
Example: Suppose we have a simple spam email dataset with two classes: spam and not
spam. If the email contains words like “winner,” “free,” and “prize” frequently in spam emails,
Naive Bayes will assign a high probability to these words being in spam emails.

So, if a new email contains the words “You are a winner! Claim your free prize,” Naive Bayes will
calculate a high probability that the email is spam based on these words' presence and classify
it as spam.
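
A minimal sketch (assuming scikit-learn; the tiny labeled corpus is made up) of spam detection with word counts and Multinomial Naive Bayes:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "claim your free discount, winner",
          "meeting agenda for tomorrow", "project report attached",
          "you are a winner, free gift", "lunch at noon?"]
labels = [1, 1, 0, 0, 1, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)   # bag-of-words counts per email
clf = MultinomialNB().fit(X, labels)   # learns per-word spam / non-spam probabilities

new_email = ["You are a winner! Claim your free prize"]
print(clf.predict(vectorizer.transform(new_email)))        # expected: [1] (spam)
print(clf.predict_proba(vectorizer.transform(new_email)))  # class probabilities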

Q.23 Write a short note on optimization technique in machine learning [Explain Steepest
Descent method of optimization]
In machine learning, optimization techniques are methods used to minimize or maximize an
objective function, typically a cost or loss function. The goal of these techniques is to find the
optimal parameters for a model and reduce errors which lead to the best predictions.
Gradient Descent method or the Steepest Descent method, is one of the simplest and most
commonly used optimization techniques in machine learning.
It aims to find the minimum of a function by taking steps proportional to the negative of the
gradient (slope) of the function at the current point. This method is useful when dealing with
continuous variables.

Steps of Steepest Descent method:


Step 1: Initialize Parameters: Start with an initial guess for the parameters for e.g weights in a
neural network
Step 2: Compute Gradient: Calculate the gradient of the objective function (loss function) with
respect to the current position’s parameters. The gradient or slope represents the direction of
the steepest increase in function’s value.
Step 3: Update Parameters: Adjust the parameters in the opposite direction of the gradient by
a factor of the learning rate: theta_new = theta_old - (learning rate) x gradient.

Step 4: Repeat: Continue updating parameters iteratively until convergence (when the change
in parameters is very small, or the loss stops decreasing significantly).

Example: Consider minimizing a simple quadratic cost function, like a mean squared error. The
Steepest Descent method would compute the gradient at each point (for example, how the error
changes with respect to each weight in a linear regression) and then adjust the weights in the
direction that reduces the error. Over time, these steps bring the weights to values that minimize
the error, thus optimizing the model.
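
A minimal sketch (assuming NumPy; the data are invented) of steepest descent minimizing a mean squared error for a one-parameter model y = w*x, following the four steps above:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])  # roughly y = 2x

w = 0.0       # Step 1: initial guess for the parameter
lr = 0.01     # learning rate
for step in range(200):
    grad = np.mean(2 * (w * x - y) * x)  # Step 2: gradient of the MSE with respect to w
    w -= lr * grad                       # Step 3: move against the gradient
    if abs(grad) < 1e-6:                 # Step 4: stop once the updates become negligible
        break

print(f"Learned weight w = {w:.3f}")     # converges to about 2.03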

Q.24 Radial Basis Function


The Radial Basis Function (RBF) is a popular type of kernel function often used in Support
Vector Machines (SVMs) and other machine learning algorithms to handle non-linear
relationships. The RBF kernel maps data points into a higher-dimensional space, where they
can be linearly separated, even if they’re not linearly separable in the original space.
The RBF kernel computes similarity based on distance, K(x, x') = exp(-gamma * ∥x - x'∥²):
points that are closer together have higher kernel values (closer to 1), and points that are
farther apart have lower kernel values (closer to 0). The parameter gamma controls how quickly
the similarity falls off with distance.
This similarity function creates a non-linear decision boundary in the original space by acting
as if the data points are mapped into a higher-dimensional space.

Example: Imagine we have two classes in a dataset that form concentric circles. A linear
boundary in the original 2D space would not separate these classes, but using an RBF kernel
transforms the data into a higher dimension where a hyperplane (linear boundary) can
effectively separate the classes.

Mod. 5
Q.25 Expectation Maximization Algorithm Short note (EM Algo) [VV IMP]
Q.26 DBSCAN or What is Density-based clustering ? Explain the steps used for
clustering using the DBSCAN algorithm. [VV IMP]

Q.27 K-mean algorithm or Numerical


Q.28 Explain the distance metrics used in clustering
In clustering, distance metrics are used to measure the similarity or dissimilarity between data
points, helping to determine which points should be grouped together. Some commonly used
distance metrics in clustering are:

Euclidean Distance: Euclidean distance is the "straight-line" distance between two points in
Euclidean space: d(x, y) = √( Σ (xi - yi)² ). It is common in k-means clustering and scenarios
where the data has a continuous, multidimensional structure.

Manhattan Distance: The Manhattan distance between two points is the sum of the absolute
differences of their coordinates: d(x, y) = Σ |xi - yi|. It is useful for sparse data and
cases where the directions of changes are more important than exact distances, and is often
used in city-block grids or high-dimensional spaces.
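
A minimal sketch (assuming NumPy) computing both metrics for a pair of points:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))  # straight-line distance -> 5.0
manhattan = np.sum(np.abs(a - b))          # sum of absolute differences -> 7.0

print("Euclidean:", euclidean)
print("Manhattan:", manhattan)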

Mod. 6
Q.29 What is Dimensionality reduction ? Explain how it can be utilized for classification
and clustering tasks in Machine Learning.
Dimensionality reduction is the process of reducing the number of features (or dimensions) in
a dataset, while retaining as much of the original information as possible. This is essential when
working with high-dimensional data, as too many features can make models complex, increase
computation time, and lead to issues like overfitting. Dimensionality reduction helps simplify
models, improve performance, and make data more manageable and interpretable.

Utilization of Dimensionality Reduction:


Noise Reduction: Dimensionality reduction helps remove irrelevant or redundant features,
simplifying the data and making patterns clearer for the classifier. This can improve accuracy, as
it reduces overfitting and noise.
Faster Training: With fewer dimensions, classifiers can be trained faster, as they have fewer
features to process.
Enhanced Interpretability: Reducing features makes it easier to visualize and interpret the
data, especially in exploratory data analysis.
Capturing Core patterns: In clustering, dimensionality reduction can reveal core patterns in
data by consolidating features that exhibit similar trends, which improves the ability to form
distinct clusters.
Visualization of Clusters: High-dimensional data is difficult to visualize. Reducing it to 2D or
3D makes it possible to visually inspect clusters and help in understanding cluster formations.

Q.30 Principal Component Analysis for Dimension Reduction [VV IMP]


OR What is Dimensionality reduction ? Describe how PCA is carried out to reduce
dimensionality of data sets.
Principal Component Analysis (PCA) is a widely used dimensionality reduction technique that
transforms a dataset with many features (dimensions) into a smaller set of new, uncorrelated
features, called principal components. These components capture most of the data's
variance, allowing us to reduce the number of dimensions while preserving essential
information.
Steps of PCA:
Step 1: Standardize the Data: Since PCA is sensitive to scale, the first step is to standardize
each feature to have a mean of zero and unit variance (standard deviation of 1).
Step 2: Calculate Covariance Matrix: PCA finds patterns in the data based on the
relationships between features. A covariance matrix is calculated to show how features vary
together.
Step 3: Calculate the Eigenvalues and Eigenvectors: The covariance matrix is decomposed
into eigenvalues and eigenvectors. The eigenvalues represent the amount of variance each
component captures, while the eigenvectors define the direction of the new feature space.
Step 4: Select Principal Components: Sort the eigenvalues in descending order, and choose
the top k components that capture the desired level of variance (often 95% or more). These
principal components will be used to transform the data.
Step 5: Transform the Data: Finally, project the original data onto the selected principal
components, reducing it to the new lower-dimensional space.
Benefits of PCA (the same three points as the utilization of dimensionality reduction above):
1.​ Noise Reduction / Reduction of Overfitting
2.​ Faster Training / Sped up Computation
3.​ Enhanced Interpretability / Visualization

Example: Imagine a dataset with thousands of genes as features in a medical study. PCA can
be applied to reduce these thousands of features to a smaller set of components that capture
the core variance. This allows for effective visualization, easier clustering of patient data, and
more efficient modeling.
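
A minimal sketch (assuming scikit-learn; the Iris dataset stands in for the high-dimensional data) of the PCA steps above, keeping enough components to explain about 95% of the variance:

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)          # 4 original features

X_std = StandardScaler().fit_transform(X)  # Step 1: zero mean, unit variance
pca = PCA(n_components=0.95)               # keep the top components covering ~95% of variance
X_reduced = pca.fit_transform(X_std)       # Steps 2-5: covariance, eigendecomposition, projection

print("Original dimensions:", X.shape[1])
print("Reduced dimensions :", X_reduced.shape[1])
print("Explained variance ratio:", pca.explained_variance_ratio_)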

Q.31 Linear Discriminant Analysis for Dimension Reduction Theory / [Numerical] [VV
IMP]
OR Explain the Dimensionality Reduction technique of LDA and its real world
applications.
Linear Discriminant Analysis (LDA) is a powerful technique that can be used for both
dimensionality reduction and classification. While it is mainly known for classification, it can
also reduce the dimensions of data, just like Principal Component Analysis (PCA), but with a
key difference: LDA considers class labels in the process.

Steps of LDA:
Step 1: Compute the Mean of each class: For each class in the dataset, calculate the mean of
the features.
Step 2: Compute the Between-Class Scatter Matrix: Measures the spread of the class
means from the overall mean of the dataset. It tries to capture how different the class means are
from each other. Variance BETWEEN all the classes.
Step 3: Compute the Within-Class Scatter Matrix: Measures how much each class's data
points deviate from the mean of that class. Variance WITHIN a single class.
Step 4: Compute the Optimal Projection (Eigenvectors): By solving the eigenvalue problem for
the matrix Sw⁻¹Sb (the inverse of the within-class scatter matrix multiplied by the
between-class scatter matrix), LDA computes discriminant directions that maximize class
separability; the eigenvectors represent the axes along which the classes are most
distinguishable.
Step 5: Transform the Data: Choose the top k eigenvectors corresponding to largest
eigenvalues to project data into a lower-dimensional space.
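
A minimal sketch (assuming scikit-learn; Iris again serves as the labeled dataset) of LDA reducing 4 features with 3 classes to at most C - 1 = 2 discriminant directions:

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 4 features, 3 classes

lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)    # unlike PCA, LDA uses the class labels y

print("Reduced shape:", X_lda.shape)  # (150, 2)
print("Explained variance ratio:", lda.explained_variance_ratio_)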




Q.32 SVD numerical
