Support Vector Machine Algorithm
By: Dr Rashmi Popli
Associate Professor
Department of Computer Engineering
Support Vector Machine Algorithm
• SVM is one of the most popular Supervised Machine Learning algorithms. It can be used for both Classification and Regression problems, but it is primarily used for Classification problems in Machine Learning.
• The goal is to create the best decision boundary that can segregate n-dimensional space into classes, so that a new data point can easily be placed in the correct category in the future. This best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. A minimal usage sketch follows.
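To make this concrete, here is a minimal sketch of training an SVM classifier with scikit-learn (not from the slides; the Iris dataset and the linear kernel are illustrative choices):

```python
# A minimal sketch of SVM classification with scikit-learn.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)          # illustrative toy dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = SVC(kernel="linear")                           # linear hyperplane
clf.fit(X_train, y_train)                            # learn from labelled data

print("Support vectors per class:", clf.n_support_) # the "extreme points"
print("Test accuracy:", clf.score(X_test, y_test))
```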
[Figure: Margin]
Hyperplane
• A hyperplane is a decision boundary that separates the data into different classes.
• There can be multiple lines/decision boundaries that segregate the classes in n-dimensional space, but we need to find the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM.
• The dimension of the hyperplane depends on the number of features in the dataset: if there are 2 features, the hyperplane is a straight line, and if there are 3 features, the hyperplane is a 2-dimensional plane.
• We always create the hyperplane that has the maximum margin, i.e., the maximum distance between the hyperplane and the nearest data points of either class, as formalised below.
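In symbols (standard linear-SVM notation, not shown on the slides): for weight vector w and bias b, the hyperplane and the margin-maximization objective are

```latex
% Hyperplane and maximum-margin objective (standard linear-SVM notation)
\[
  \mathbf{w}^{\top}\mathbf{x} + b = 0 \qquad \text{(the hyperplane)}
\]
\[
  \max_{\mathbf{w},\,b}\ \frac{2}{\lVert \mathbf{w}\rVert}
  \quad \text{subject to} \quad
  y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1 \ \ \text{for all } i
\]
```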
Example
SVM can be understood with the example that we used in the KNN classifier.
Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm. We first train the model with many images of cats and dogs so that it learns the different features of cats and dogs, and then we test it on this strange creature. The SVM creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors) of each class. On the basis of these support vectors, it classifies the creature as a cat.
The SVM algorithm can be used for face detection, image classification, text categorization, etc.
SVM Types:
• Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by using a single straight line, such data is termed linearly separable data, and the classifier used is called the Linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by using a straight line, such data is termed non-linear data, and the classifier used is called the Non-linear SVM classifier. The sketch after this list contrasts the two.
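A minimal sketch contrasting the two types (the make_moons dataset and the RBF kernel are illustrative assumptions, not from the slides):

```python
# Linear vs Non-linear SVM on data that a straight line cannot separate.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.15, random_state=0)

linear_clf = SVC(kernel="linear").fit(X, y)   # Linear SVM
rbf_clf = SVC(kernel="rbf").fit(X, y)         # Non-linear SVM (RBF kernel)

print("Linear SVM accuracy:    ", linear_clf.score(X, y))  # limited by the line
print("Non-linear SVM accuracy:", rbf_clf.score(X, y))     # captures the curve
```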
Working of SVM
Linear SVM:
The working of the SVM algorithm can be understood with an example. Suppose we have a dataset with two tags (green and blue) and two features, x1 and x2. We want a classifier that can classify any pair (x1, x2) of coordinates as either green or blue.
Since this is a 2-D space, we can easily separate the two classes by just using a straight line.
But there can be multiple lines that can separate these classes. Consider the below image:
The SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called the hyperplane.
The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors.
The distance between these vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane, as the sketch below illustrates.
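A minimal sketch of reading the support vectors and the margin off a fitted linear SVM (the make_blobs data and C=1000 are illustrative assumptions):

```python
# Fit a (nearly) hard-margin linear SVM and inspect what it learned.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=6)  # two separable blobs

clf = SVC(kernel="linear", C=1000)    # large C approximates a hard margin
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("Support vectors:\n", clf.support_vectors_)   # the closest points
print("Hyperplane: %.3f*x1 + %.3f*x2 + %.3f = 0" % (w[0], w[1], b))
print("Margin width:", 2 / np.linalg.norm(w))       # the quantity SVM maximizes
```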
Non-Linear SVM
• If data is linearly arranged, we can separate it by using a straight line, but for non-linear data we cannot draw a single straight line.
So to separate these data points, we need to add one more dimension. For linear data we have used two dimensions, x and y; for non-linear data we add a third dimension z, calculated as:
z = x² + y²
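A minimal sketch of this lift (the make_circles dataset is an illustrative assumption): adding z = x² + y² turns concentric circles into linearly separable data.

```python
# The z = x^2 + y^2 lift: inseparable in 2-D, separable by a plane in 3-D.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.4, noise=0.05, random_state=0)

z = (X ** 2).sum(axis=1)              # third dimension: z = x^2 + y^2
X3d = np.column_stack([X, z])         # lifted (x, y, z) points

print("2-D linear SVM:", SVC(kernel="linear").fit(X, y).score(X, y))      # ~0.5
print("3-D linear SVM:", SVC(kernel="linear").fit(X3d, y).score(X3d, y))  # ~1.0
```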
[Figure: Low-dimensional data → Kernel → High-dimensional data]
The kernel trick means replacing the dot product in the mapping function with a kernel function.
Kernel
• Support Vector Machines (SVMs) use kernel methods to transform the input data into a higher-dimensional feature space, which makes it simpler to distinguish between classes or generate predictions.
• Kernel approaches in SVMs work on the fundamental principle of implicitly mapping input data into a higher-dimensional feature space without directly computing the coordinates of the data points in that space, as the sketch below demonstrates.
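A minimal sketch of this implicit mapping for the polynomial kernel K(x, y) = (x · y)² in 2-D, whose explicit feature map is φ(x) = (x1², √2·x1·x2, x2²):

```python
# The kernel trick: the kernel value equals phi(x) . phi(y)
# without ever computing the feature map phi.
import numpy as np

def phi(v):
    """Explicit map of a 2-D point into the 3-D feature space."""
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])

implicit = np.dot(x, y) ** 2          # kernel trick: stay in the input space
explicit = np.dot(phi(x), phi(y))     # same value via the explicit mapping

print(implicit, explicit)             # both print 25.0
```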
• By adding the third dimension, the sample space becomes as in the below image.
So now SVM will divide the datasets into classes in the following way. Consider the below image:
• Since we are in 3-D space, the boundary looks like a plane parallel to the x-axis. If we convert it back to 2-D space with z = 1, it becomes a circle.
Hence we get a circle of radius 1 in the case of non-linear data, as derived below.
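The derivation in symbols:

```latex
% From the 3-D separating plane back to the original 2-D space
\[
  z = x^{2} + y^{2}, \qquad z = 1
  \;\Longrightarrow\;
  x^{2} + y^{2} = 1
\]
% a circle of radius 1 centred at the origin
```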
Characteristics of Kernel Function
• Mercer's condition: A kernel function must satisfy Mercer's condition to be valid. This condition ensures that the kernel is positive semi-definite, i.e., every Gram matrix it produces has eigenvalues greater than or equal to zero (checked empirically in the sketch after this list).
• Positive definiteness: A kernel function is positive definite if it is always greater than zero, except when the inputs are equal to each other.
• Non-negativity: A kernel function is non-negative, meaning that it produces non-negative values for all inputs.
• Symmetry: A kernel function is symmetric, meaning that it produces the same value regardless of the order in which the inputs are given.
• Reproducing property: A kernel function satisfies the reproducing property if it can be used to reconstruct the input data in the feature space.
• Smoothness: A kernel function is said to be smooth if it produces a smooth transformation of the input data into the feature space.
• Complexity: The complexity of a kernel function is an important consideration, as more complex kernel functions may lead to overfitting and reduced generalization performance.
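A minimal sketch checking symmetry and Mercer's positive semi-definiteness empirically for the RBF kernel (the random points and gamma = 0.5 are illustrative assumptions):

```python
# Verify symmetry and positive semi-definiteness of an RBF Gram matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                      # 8 random points in R^3

gamma = 0.5
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq_dists)                    # RBF Gram matrix

print("Symmetric:", np.allclose(K, K.T))               # K(x, y) = K(y, x)
print("Min eigenvalue:", np.linalg.eigvalsh(K).min())  # >= 0 (up to rounding)
```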
Linear Kernel
A linear kernel is a type of kernel function used in machine learning, including in SVMs (Support Vector Machines). It is the simplest and most commonly used kernel function, and it is defined as the dot product between the input vectors in the original feature space:
K(x, y) = x · y
where x and y are the input feature vectors. The dot product of the input vectors is a measure of their similarity in the original feature space.
When using a linear kernel in an SVM, the decision boundary is a linear hyperplane that separates the different classes in the feature space. This linear boundary is useful when the data is already separable by a linear decision boundary, or when dealing with high-dimensional data, where more complex kernel functions may lead to overfitting.
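A minimal sketch of the linear kernel as a plain function (the example vectors are illustrative):

```python
# Linear kernel: similarity = dot product in the original feature space.
import numpy as np

def linear_kernel(x, y):
    return np.dot(x, y)

x = np.array([2.0, 1.0, 0.5])
y = np.array([1.0, 3.0, 4.0])
print(linear_kernel(x, y))   # 2*1 + 1*3 + 0.5*4 = 7.0
```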
Polynomial Kernel
• It is a nonlinear kernel function that employs polynomial functions to map the input data into a higher-dimensional feature space:
• K(x, y) = (x · y + c)^d
• where x and y are the input feature vectors, c is a constant term, and d is the degree of the polynomial. The constant term is added to the dot product of the input vectors, and the result is raised to the degree of the polynomial.
• The decision boundary of an SVM with a polynomial kernel is a nonlinear hyperplane, so it can capture more intricate relationships between the input features.
• The degree of nonlinearity in the decision boundary is determined by the degree of the polynomial, as the sketch below shows.
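A minimal sketch of the polynomial kernel (the values of c, d, and the vectors are illustrative):

```python
# Polynomial kernel: K(x, y) = (x . y + c)^d, varying the degree d.
import numpy as np

def polynomial_kernel(x, y, c=1.0, d=2):
    return (np.dot(x, y) + c) ** d

x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])                 # x . y = 5, so base value is 6
for d in (1, 2, 3):                      # higher d => more nonlinearity
    print(f"d={d}:", polynomial_kernel(x, y, c=1.0, d=d))  # 6, 36, 216
```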
Gaussian (RBF) Kernel
• The Gaussian kernel, also known as the radial basis function (RBF) kernel, is a
popular kernel function used in machine learning, particularly in SVMs (Support
Vector Machines). It is a nonlinear kernel function that maps the input data into
a higher-dimensional feature space using a Gaussian function.
• K(x, y) = exp(-gamma * ||x - y||^2)
• Where x and y are the input feature vectors, gamma is a parameter that controls
the width of the Gaussian function, and ||x - y||^2 is the squared Euclidean
distance between the input vectors.
• When using a Gaussian kernel in an SVM, the decision boundary is a nonlinear hyperplane that can capture complex nonlinear relationships between the input features. The width of the Gaussian function, controlled by the gamma parameter, determines the degree of nonlinearity in the decision boundary, as the sketch below shows.
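A minimal sketch showing how gamma controls the decay of the RBF kernel (the points and gamma values are illustrative):

```python
# Gaussian (RBF) kernel: K(x, y) = exp(-gamma * ||x - y||^2).
import numpy as np

def rbf_kernel(x, y, gamma):
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([0.0, 0.0])
y = np.array([1.0, 1.0])                 # squared Euclidean distance = 2
for gamma in (0.1, 1.0, 10.0):           # larger gamma => faster decay
    print(f"gamma={gamma}:", rbf_kernel(x, y, gamma))
```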
Laplace Kernel
The Laplacian kernel, also known as the Laplace kernel or the exponential kernel, is
a type of kernel function used in machine learning, including in SVMs (Support
Vector Machines). It is a non-parametric kernel that can be used to measure the
similarity or distance between two input feature vectors.
K(x, y) = exp(-gamma * ||x - y||)
• Where x and y are the input feature vectors, gamma is a parameter that controls
the width of the Laplacian function, and ||x - y|| is the L1 norm or Manhattan
distance between the input vectors.
• When using a Laplacian kernel in an SVM, the decision boundary is a nonlinear
hyperplane that can capture complex relationships between the input features.
The width of the Laplacian function, controlled by the gamma parameter,
determines the degree of nonlinearity in the decision boundary.
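A minimal sketch of the Laplacian kernel (the points and gamma value are illustrative); note the L1 distance in place of the RBF kernel's squared Euclidean distance:

```python
# Laplacian kernel: K(x, y) = exp(-gamma * ||x - y||_1).
import numpy as np

def laplacian_kernel(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum(np.abs(x - y)))  # Manhattan (L1) distance

x = np.array([0.0, 0.0])
y = np.array([1.0, 1.0])                  # L1 distance = 2
print(laplacian_kernel(x, y, gamma=1.0))  # exp(-2) ~ 0.135
```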
Exercise:
Positively labelled data points: (4, 5), (7, 4)
Negatively labelled data point: (2, 2)
Find the best hyperplane using SVM.
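A minimal numerical check with scikit-learn (a large C approximates a hard margin; the expected values in the comments come from solving the margin conditions by hand):

```python
# Solve the exercise numerically with a (nearly) hard-margin linear SVM.
import numpy as np
from sklearn.svm import SVC

X = np.array([[4, 5], [7, 4], [2, 2]])   # the three given points
y = np.array([1, 1, -1])                 # +1 = positive, -1 = negative

clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, " b =", b)               # expect w ~ (4/13, 6/13), b ~ -33/13
print("Support vectors:\n", clf.support_vectors_)  # (4, 5) and (2, 2)
print("Margin width:", 2 / np.linalg.norm(w))      # sqrt(13) ~ 3.606
# The boundary simplifies to 2*x1 + 3*x2 = 16.5, the perpendicular
# bisector of the segment joining the two support vectors.
```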