
Support Vector Machine

Support Vector Machine (SVM) is one of the most popular supervised learning algorithms,
used for both classification and regression problems. However, it is primarily used for
classification problems in machine learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes, so that new data points can easily be placed in
the correct category in the future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed a Support Vector
Machine. Consider the diagram below, in which two different categories are classified
using a decision boundary, or hyperplane:

 Types of SVM

SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be
classified into two classes using a single straight line, it is termed linearly separable
data, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset
cannot be classified using a straight line, it is termed non-linear data, and the
classifier used is called a Non-linear SVM classifier. (A short sketch contrasting the
two follows this list.)
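
As a quick illustration, here is a minimal sketch of the two variants, assuming Python with scikit-learn's SVC class (an illustrative choice, not part of the original notes); the two classifiers differ only in the kernel argument:

from sklearn.svm import SVC

# Linear SVM: for linearly separable data
linear_clf = SVC(kernel="linear")

# Non-linear SVM: for data that cannot be split by a straight line
# (the RBF kernel is one common choice; others are listed later)
nonlinear_clf = SVC(kernel="rbf")
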
 Hyperplane and Support Vectors in the SVM algorithm:

Hyperplane: There can be multiple lines/decision boundaries that segregate the classes in
n-dimensional space, but we need to find the best decision boundary for classifying the
data points. This best boundary is known as the hyperplane of SVM.

The dimension of the hyperplane depends on the number of features present in the dataset:
if there are 2 features (as shown in the image), the hyperplane is a straight line, and if
there are 3 features, the hyperplane is a two-dimensional plane.

We always create the hyperplane that has the maximum margin, i.e., the maximum distance
between the hyperplane and the nearest data points of each class.
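
In standard notation (not spelled out in these notes, but consistent with the description above), a hyperplane is the set of points x satisfying

w . x + b = 0

where w is the weight vector normal to the hyperplane and b is the bias. The width of the margin is 2 / ||w||, so maximizing the margin is equivalent to minimizing the norm of w.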

Support Vectors:

The data points or vectors that are closest to the hyperplane and that affect the position
of the hyperplane are termed support vectors. Since these vectors support the hyperplane,
they are called support vectors.

 How does SVM work?

Linear SVM:

The working of the SVM algorithm can be understood using an example. Suppose we have a
dataset with two tags (green and blue) and two features, x1 and x2. We want a classifier
that can classify a pair (x1, x2) of coordinates as either green or blue. Consider the
image below:
Since this is a 2-D space, we can easily separate these two classes with a straight line.
But there can be multiple lines that separate these classes. Consider the image below:

Hence, the SVM algorithm helps to find the best line or decision boundary; this best
boundary or region is called a hyperplane. The SVM algorithm finds the points from both
classes that are closest to the line. These points are called support vectors. The distance
between these vectors and the hyperplane is called the margin, and the goal of SVM is to
maximize this margin. The hyperplane with the maximum margin is called the optimal
hyperplane.
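
A minimal sketch of this procedure, assuming scikit-learn and a synthetic two-feature dataset (both are illustrative assumptions, not part of the original notes):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two-feature, two-class dataset standing in for the (x1, x2) example
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=0)

# Linear SVM: finds the maximum-margin separating line
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w = clf.coef_[0]       # weight vector normal to the hyperplane
b = clf.intercept_[0]  # hyperplane is w . x + b = 0
print("support vectors:\n", clf.support_vectors_)
print("margin width:", 2 / np.linalg.norm(w))
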
Non-Linear SVM:

What do we do if the data are not linearly separable?

Say our data looks like the figure above. SVM solves this by creating a new variable using
a kernel. For a point xi on the line, we create a new variable yi as a function of its
distance from the origin o. If we plot this, we get something like the figure below.

In this case, the new variable y is created as a function of distance from the origin. A
non-linear function that creates a new variable in this way is referred to as a kernel.
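
A minimal sketch of this transformation with NumPy (the 1-D data points and the squared-distance feature are illustrative assumptions):

import numpy as np

# 1-D points on a line: the green class sits near the origin,
# so no single threshold on x can separate the two classes
x = np.array([-3.0, -2.5, -1.0, -0.5, 0.0, 0.5, 1.0, 2.5, 3.0])
labels = np.where(np.abs(x) < 1.5, "green", "blue")

# New variable yi as a function of distance from the origin
y_new = x ** 2

# In the (x, y_new) plane the classes are linearly separable:
# the horizontal line y_new = 2.25 splits them
for xi, yi, label in zip(x, y_new, labels):
    print(f"x={xi:+.1f}  y={yi:.2f}  label={label}")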

 SVM Kernel:

An SVM kernel is a function that takes a low-dimensional input space and transforms it
into a higher-dimensional space, i.e., it converts a non-separable problem into a
separable one. It is mostly useful in non-linear separation problems. Simply put, the
kernel performs some complex data transformations and then finds the process to separate
the data based on the labels or outputs that have been defined. The common kernels are
listed below, followed by a short comparison sketch.

 Linear Kernel: It is used when data is linearly separable.


 Gaussian Kernel (Radial Basis Function, RBF): It is used to perform transformations
when there is no prior knowledge about the data, and it uses the radial basis method to
improve the transformation.

[Figure: Gaussian kernel graph]
 Sigmoid Kernel: This function is equivalent to a two-layer perceptron model of a
neural network; the underlying function is also used as an activation function for
artificial neurons.

[Figure: Sigmoid kernel graph]

 Polynomial Kernel: It represents the similarity of vectors in the training set in a
feature space over polynomials of the original variables used in the kernel.

[Figure: Polynomial kernel graph]
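
The four kernels can be compared on the same non-linear problem. A sketch assuming scikit-learn (the concentric-circles dataset is an illustrative assumption; the formulas in the comments are the standard definitions used by SVC):

from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Concentric circles: not separable by a straight line
X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)

kernels = {
    "linear":  SVC(kernel="linear"),          # K(a, b) = a . b
    "rbf":     SVC(kernel="rbf"),             # K(a, b) = exp(-gamma * ||a - b||^2)
    "sigmoid": SVC(kernel="sigmoid"),         # K(a, b) = tanh(gamma * a . b + r)
    "poly":    SVC(kernel="poly", degree=3),  # K(a, b) = (gamma * a . b + r)^d
}

for name, clf in kernels.items():
    accuracy = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:8s} accuracy: {accuracy:.2f}")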


 Hyperparameters

 Cost parameter (C): The C parameter adds a penalty for each misclassified data point.
If C is small, the penalty for misclassified points is low, so a decision boundary with
a large margin is chosen at the expense of a greater number of misclassifications. If C
is large, SVM tries to minimize the number of misclassified examples because of the high
penalty, which results in a decision boundary with a smaller margin. The penalty is not
the same for all misclassified examples; it is directly proportional to the distance to
the decision boundary.

 Gamma: One of the commonly used kernel functions is the radial basis function (RBF).
The gamma parameter of the RBF controls the distance of influence of a single training
point. Low values of gamma indicate a large similarity radius, which results in more
points being grouped together. For high values of gamma, points need to be very close to
each other to be considered in the same group (or class). Therefore, models with very
large gamma values tend to overfit.

In the paper "A Practical Guide to Support Vector Classification," Hsu, Chang, and Lin
recommend searching for good values of C and gamma with a grid search over exponentially
growing sequences, as sketched below.
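
A sketch of that recommendation using scikit-learn's GridSearchCV (the synthetic dataset is an illustrative assumption; the grid bounds follow the guide):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Exponentially growing grids for C and gamma, as the guide suggests
param_grid = {
    "C":     2.0 ** np.arange(-5, 16, 2),   # 2^-5, 2^-3, ..., 2^15
    "gamma": 2.0 ** np.arange(-15, 4, 2),   # 2^-15, 2^-13, ..., 2^3
}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print("best parameters:", search.best_params_)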

 Advantages of SVM:

 Effective in high-dimensional cases
 Memory efficient, as it uses only a subset of the training points (the support
vectors) in the decision function
 Different kernel functions can be specified for the decision function, and it is
possible to specify custom kernels
