MIT School of Computing
Department of Computer Science & Engineering
Third Year Engineering
23CSE3006 - MACHINE LEARNING
Class - T.Y. AIA (SEM-II)
Unit II: Supervised Machine Learning
Name Of the Course Coordinator:
Prof. Aarti Pimpalkar
Team Members
1. Prof. Dr. Nilima Kulkarni
2. Prof. Abhishek Das
3. Prof. Dattatray Kale
4. Prof. Nilesh Kulal
AY 2025-2026 SEM-II
Support Vector Machine (SVM)
Key Concepts of Support Vector Machine
•Hyperplane: A decision boundary that separates different classes
in feature space; in linear classification it is represented by the
equation w·x + b = 0.
•Support Vectors: The data points closest to the hyperplane,
crucial for determining the hyperplane and margin in SVM.
•Margin: The distance between the hyperplane and the support
vectors. SVM aims to maximize this margin for better
classification performance.
•Kernel: A function that maps data to a higher-dimensional
space, enabling SVM to handle non-linearly separable data.
•Hard Margin: A maximum-margin hyperplane that perfectly
separates the data without misclassifications.
•Soft Margin: Allows some misclassifications by introducing
slack variables, balancing margin maximization against
misclassification penalties when the data is not perfectly separable.
How does the Support Vector Machine Algorithm Work?
The key idea behind the SVM algorithm is to find the hyperplane
that best separates two classes by maximizing the margin between
them. This margin is the distance from the hyperplane to the
nearest data points (support vectors) on each side.
The best hyperplane, also known as the "hard margin," is the one
that maximizes the distance between the hyperplane and the nearest
data points from both classes, ensuring a clear separation between
the classes. In the accompanying figure, the hyperplane L2 is
therefore chosen as the hard margin. Let us now consider a scenario
in which an outlier appears among the data points.
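Before turning to outliers, here is a minimal sketch of margin maximization, assuming scikit-learn and NumPy are available (the toy data below is hypothetical): fit a linear SVM with a very large C to approximate a hard margin, then inspect the support vectors and the margin width 2/||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (hypothetical toy data)
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [6, 6]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates a hard margin (no violations tolerated)
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the learned hyperplane
b = clf.intercept_[0]   # bias term
print("Hyperplane: w =", w, ", b =", b)
print("Support vectors:\n", clf.support_vectors_)
print("Margin width = 2/||w|| =", 2 / np.linalg.norm(w))
```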
How does SVM classify the data?
• A blue ball lying among the red balls is an outlier of the blue class. The SVM
algorithm can ignore such outliers and still find the hyperplane that maximizes
the margin; in this sense, SVM is robust to outliers.
• A soft margin allows some misclassifications or violations of the margin
to improve generalization.
• The SVM optimizes the following objective to balance margin maximization
and penalty minimization:
minimize (1/2)||w||² + C Σᵢ ξᵢ subject to yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0,
where the slack variables ξᵢ measure the violations and C controls the
trade-off between a wide margin and few violations.
• The penalty used for violations is often the hinge loss,
L(y, f(x)) = max(0, 1 − y·f(x)), which has the following behavior
(a small sketch of this loss follows the list):
• If a data point is correctly classified and lies outside the margin,
there is no penalty (loss = 0).
• If a point is misclassified or falls inside the margin, the hinge loss
increases in proportion to the size of the violation.
• Until now we have discussed linearly separable data, where the groups of
blue balls and red balls can be separated by a straight line.
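The hinge loss above is straightforward to compute directly. A minimal NumPy sketch (the scores below are hypothetical values of w·x + b):

```python
import numpy as np

def hinge_loss(y, scores):
    """Per-sample hinge loss max(0, 1 - y * f(x)): zero for points
    correctly classified outside the margin, growing linearly with
    the size of the violation otherwise."""
    return np.maximum(0.0, 1.0 - y * scores)

y = np.array([+1, +1, -1, -1])
scores = np.array([2.0, 0.5, -3.0, 0.2])  # hypothetical w.x + b values
print(hinge_loss(y, scores))              # -> [0.  0.5 0.  1.2]
```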
What to do if the data are not linearly separable?
When data is not linearly separable, i.e., it cannot be divided by a
straight line, SVM uses a technique called kernels to map the
data into a higher-dimensional space where it becomes
separable. This transformation helps SVM find a decision
boundary even for non-linear data.
A kernel is a function that maps data points into a
higher-dimensional space without explicitly computing the
coordinates in that space. This allows SVM to work efficiently
with non-linear data by implicitly performing the mapping. For
example, consider data points that are not linearly separable: by
applying a kernel function, SVM transforms them into a
higher-dimensional space where they become linearly separable.
•Linear Kernel: For linearly separable data.
•Polynomial Kernel: Maps data into a polynomial feature space.
•Radial Basis Function (RBF) Kernel: Transforms data into a
space based on distances between data points.
In this case, a new variable y is created as a function of each
point's distance from the origin, so that points at different
distances from the origin take different values of y and become
linearly separable.
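To see this effect concretely, here is a minimal sketch (assuming scikit-learn is available) on concentric circles, a classic dataset that no straight line can split: the linear kernel performs near chance, while the RBF kernel separates the classes almost perfectly.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    print(kernel, "test accuracy:", clf.score(X_te, y_te))
```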
Mathematical Computation of SVM
Consider a binary classification problem with two classes, labeled +1 and -1, and a training dataset
of input feature vectors X with their corresponding class labels Y. The equation of the linear
hyperplane can be written as:
wᵀx + b = 0
Where:
•w is the normal vector to the hyperplane (the direction perpendicular to it).
•b is the offset or bias term, representing the distance of the hyperplane from the origin along the
normal vector w.
Distance from a Data Point to the Hyperplane
The distance between a data point x_i and the decision boundary can be calculated as:
d_i = (wᵀx_i + b) / ||w||
where ||w|| is the Euclidean norm of the normal vector w.
Linear SVM Classifier
The predicted label y^ of a data point x is given by the sign of the decision function:
y^ = +1 if wᵀx + b ≥ 0, and y^ = -1 otherwise.
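A minimal NumPy sketch of the two formulas above, using a hypothetical weight vector and bias:

```python
import numpy as np

w = np.array([2.0, 1.0])   # hypothetical normal vector
b = -4.0                   # hypothetical bias term

def distance_to_hyperplane(x):
    # Signed distance d_i = (w . x_i + b) / ||w||
    return (w @ x + b) / np.linalg.norm(w)

def predict(x):
    # Decision rule: y^ = +1 if w . x + b >= 0, else -1
    return 1 if (w @ x + b) >= 0 else -1

x_i = np.array([3.0, 1.0])
print("distance:", distance_to_hyperplane(x_i))  # (7 - 4)/sqrt(5) ~ 1.34
print("label:", predict(x_i))                    # +1
```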
Types of Support Vector Machine
Based on the nature of the decision boundary, Support Vector Machines (SVM) can be divided into two main types:
•Linear SVM: Linear SVMs use a linear decision boundary to separate the data points of different classes. They are
well suited when the data is precisely linearly separable, meaning a single straight line (in 2D) or a hyperplane (in
higher dimensions) can entirely divide the data points into their respective classes. The decision boundary is the
hyperplane that maximizes the margin between the classes.
•Non-Linear SVM: Non-linear SVMs are used when the data cannot be separated into two classes by a straight line
(in the 2D case). By using kernel functions, they can handle non-linearly separable data: the kernel transforms the
original input into a higher-dimensional feature space where the data points can be linearly separated, and the linear
separator found in that space corresponds to a non-linear decision boundary in the original space (see the sketch
after this list).
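In practice, the choice between a linear and a non-linear SVM can be made by cross-validation. A minimal sketch, assuming scikit-learn (the dataset and parameter grid here are illustrative choices, not a prescribed setup):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Two interleaving half-moons: a standard non-linearly separable dataset
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# Cross-validate the kernel type and the regularization strength C
param_grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]}
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print("Best settings:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)
```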