Machine Learning Unit-3.3

Support Vector Machine Algorithm

• Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms.

• It is used for Classification as well as Regression problems.

• However, primarily, it is used for Classification problems in Machine Learning.

• The goal of the SVM algorithm is to create the best line or decision
boundary that can segregate n-dimensional space into classes so that we
can easily put the new data point in the correct category in the future.

• This best decision boundary is called a hyperplane.

• SVM chooses the extreme points/vectors that help in creating the hyperplane.

• These extreme cases are called support vectors.


• In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate.

• Then, we perform classification by finding the hyper-plane that differentiates the two classes very well.

• The best hyperplane is the one whose distance to the nearest data point on each side is maximized.

• If such a hyperplane exists, it is known as the maximum-margin hyperplane / hard margin.

• The SVM algorithm can be used for face detection, image classification, text categorization, etc.
• Consider the below diagram in which there
are two different categories that are classified
using a decision boundary or hyperplane:
Example: SVM can be understood with the example that we used for the KNN classifier. Suppose we see a strange cat that also has some features of dogs; if we want a model that can accurately identify whether it is a cat or a dog, such a model can be created by using the SVM algorithm.
• We will first train our model with lots of images of cats and dogs so that it can learn about the different features of cats and dogs, and then we test it with this strange creature. Since the support vector machine creates a decision boundary between these two classes (cat and dog) and chooses extreme cases (support vectors), it will look at the extreme cases of cat and dog.
• On the basis of the support vectors, it will classify it as a cat.
• Consider the below diagram:
Types of SVM

• Linear SVM: Linear SVM is used for linearly separable data, which means if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called the Linear SVM classifier.

• Non-linear SVM: Non-Linear SVM is used for non-linearly separable data, which means if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called the Non-linear SVM classifier.
Hyperplane: There can be multiple lines/decision boundaries
to segregate the classes in n-dimensional space, but we
need to find out the best decision boundary that helps to
classify the data points. This best boundary is known as
the hyperplane of SVM.

• The dimensions of the hyperplane depend on the features present in the dataset, which means if there are 2 features (as shown in the image), then the hyperplane will be a straight line. And if there are 3 features, then the hyperplane will be a 2-dimensional plane.

• We always create a hyperplane that has the maximum margin, which means the maximum distance between the hyperplane and the nearest data points.
Support Vectors:
• The data points or vectors that are the closest
to the hyperplane and which affect the
position of the hyperplane are termed as
Support Vector.

• Since these vectors support the hyperplane, they are called Support Vectors.
How does SVM work?
• Linear SVM: Suppose we have a dataset that has two tags (green and blue), and the dataset has two features x1 and x2. We want a classifier that can classify the pair (x1, x2) of coordinates as either green or blue. Consider the below image:

• Since it is a 2-D space, just by using a straight line we can easily separate these two classes.
• But there can be multiple lines that can
separate these classes. Consider the below
image:
• Hence, the SVM algorithm helps to find the best line or decision boundary.
• This best boundary or region is called a hyperplane.
• The SVM algorithm finds the closest points of the lines from both classes; these points are the support vectors.
• The distance between the vectors and the hyperplane is called the margin.
• The hyperplane with the maximum margin is called the optimal hyperplane.
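A minimal code sketch of this idea, assuming Python with NumPy and scikit-learn (the tiny two-feature dataset below is made up purely for illustration):

# Fit a linear SVM and inspect the hyperplane and support vectors.
import numpy as np
from sklearn.svm import SVC

# Toy dataset: two tags (0 = blue, 1 = green) with two features x1 and x2.
X = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 1.0],
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1.0)    # linear kernel = straight-line boundary
clf.fit(X, y)

print(clf.support_vectors_)          # the extreme points that define the boundary
print(clf.coef_, clf.intercept_)     # hyperplane: coef_ . x + intercept_ = 0
print(clf.predict([[3.0, 4.0]]))     # classify a new data point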
• Non-Linear SVM: If data is linearly arranged, then we can
separate it by using a straight line, but for non-linear data, we
cannot draw a single straight line.

• Consider the below image:


• So to separate these data points, we need to add one more
dimension. For linear data, we have used two dimensions x and y.
• So for non-linear data, we will add a third dimension z. It can be
calculated as:
z = x² + y²

• By adding the third dimension, the sample space will become as shown in the below image:
• Since we are in 3-D space, it looks like a plane parallel to the x-axis. If we convert it back to 2-D space with z = 1, then it will become as shown below:
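A small sketch of this manual transformation, assuming Python with NumPy and scikit-learn (make_circles just generates ring-shaped toy data standing in for the non-linear example):

# Add z = x**2 + y**2 as a third feature and separate the data with a plane.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

z = (X[:, 0] ** 2 + X[:, 1] ** 2).reshape(-1, 1)   # squared distance from origin
X3 = np.hstack([X, z])                              # (x, y, z) space

clf = SVC(kernel='linear').fit(X3, y)               # a plane now separates the classes
print(clf.score(X3, y))                             # close to 1.0 on this toy data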
• How can we identify the right hyper-plane? Don’t worry, it’s not as hard as you think!

• Scenario-1: Identify the right hyper-plane: Here, we have three hyper-planes (A, B, and C). Now, identify the right hyper-plane to classify stars and circles.

• To identify the right hyper-plane: “Select the hyper-plane which segregates the two classes better”.

• In this scenario, hyper-plane “B” has performed this job excellently.
• Scenario-2: Identify the right hyper-plane: Here, we have three hyper-planes (A, B, and C) and all are segregating the classes well. Now, how can we identify the right hyper-plane?

Here, maximizing the distance between the nearest data point (of either class) and the hyper-plane will help us decide the right hyper-plane. This distance is called the Margin.
• Let’s look at the below snapshot:

• Above, you can see that the margin for hyper-plane C is high compared to both A and B. Hence, we name the right hyper-plane as C. Another important reason for selecting the hyper-plane with the higher margin is robustness: if we select a hyper-plane having a low margin, then there is a high chance of misclassification.
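For a fitted linear SVM, this margin can be computed directly. A short sketch, assuming the clf and np objects from the earlier linear-SVM example are still in scope:

# Margin width of a linear SVM = 2 / ||w||, where w is the hyperplane's normal vector.
w = clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)
print(margin_width)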
• Scenario-3: Identify the right hyper-plane:

Some of you may have selected hyper-plane B as it has a higher margin compared to A. But here is the catch: SVM selects the hyper-plane which classifies the classes accurately prior to maximizing the margin. Here, hyper-plane B has a classification error and A has classified everything correctly. Therefore, the right hyper-plane is A.
• Scenario-4: Can we classify the following two classes?

• Below, I am unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.
• As I have already mentioned, one star at the other end is like an outlier for the star class.

• The SVM algorithm has a feature to ignore outliers and find the hyper-plane that has the maximum margin.

• Hence, we can say, SVM classification is robust to outliers.


• Scenario-5: Find the hyper-plane to segregate two classes.
• In the figure given below, we can’t have a linear hyper-plane between the two classes, so how does SVM classify these two classes?
• Till now, we have only looked at the linear hyper-plane.
• SVM can solve this problem easily! It solves it by introducing an additional feature. Here, we will add a new feature z = x² + y². Now, let’s plot the data points on the x and z axes:

• In the above plot, the points to consider are:

• All values for z will always be positive, because z is the squared sum of both x and y.

• In the original plot, red circles appear close to the origin of the x and y axes, leading to a lower value of z, while the stars are relatively far from the origin, resulting in a higher value of z.
• In the SVM classifier, it is easy to have a linear
hyper-plane between these two classes.

• But another burning question arises: do we need to add this feature manually to get a hyper-plane?

• No, the SVM algorithm has a technique called the kernel trick.
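A minimal sketch of the kernel trick in code, assuming Python with scikit-learn (the ring-shaped toy data is the same kind of non-linear example as above; no manual z feature is added):

# Let the RBF kernel do the higher-dimensional mapping implicitly.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

clf = SVC(kernel='rbf', gamma='scale').fit(X, y)   # no extra feature added by hand
print(clf.score(X, y))                             # the classes are separated anyway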
SVM Kernel:

• The SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space.

• That is, it converts a non-separable problem into a separable problem (e.g., a 2-dimensional space into a 3-dimensional space).

• It is mostly useful in non-linear separation problems.

• Simply put, it does extremely complex data transformations, and then finds out the process to separate the data based on the labels or categories.
What are SVM Kernel Functions?
• Kernels or kernel methods (also called kernel functions) are sets of different types of algorithms that are used for classification.

• They are used to solve a non-linear classification problem by using a linear classifier.

• The SVM uses what is called a “Kernel Trick” where the data is
transformed and an optimal boundary is found for the possible
outputs.

• The kernel function’s job is to take data as input and transform it into any required form.

• In this article, we will be looking at various types of kernels.


What is Kernel?
• A kernel is a function used in SVM for helping to solve
problems.

• They provide shortcuts to avoid complex calculations.

• The amazing thing about kernels is that, with their help, we can go to higher dimensions and perform smooth calculations.

• A kernel helps to form the hyperplane in the higher dimension without raising the complexity.
• It is very difficult to solve this classification using a linear classifier, as there is no good straight line that can separate the red and the green dots, because the points are randomly distributed.

• Here comes the use of the kernel function, which transforms the points to higher dimensions, solves the problem there, and returns the output.

• Think of it this way: we can see that the squares are enclosed in some perimeter area while the circles lie outside it; likewise, there could be other scenarios where the green dots are distributed in a trapezoid-shaped area.

• So, what we do is convert the two-dimensional plane, which was first classified by a one-dimensional hyperplane (“or a straight line”), into a three-dimensional space; here our classifier, i.e. the hyperplane, will not be a straight line but a two-dimensional plane which cuts the area.
• Kernels are functions that solve non-linear problems with the help of linear classifiers.

• The kernel functions are used as parameters in SVM code.

• They help to determine the shape of the hyperplane and decision boundary.

• We can set the value of the kernel parameter in the SVM code, as sketched after this list.

• The value can be any type of kernel from linear to polynomial.

• If the value of the kernel is linear then the decision boundary would be
linear and two-dimensional.

• We do not need to do complex calculations.

• The kernel functions do all the hard work.

• We just have to give the input and use the appropriate kernel.
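A short sketch of how the kernel parameter is typically set, assuming Python with scikit-learn (X and y stand for whatever training features and labels you already have):

# Choosing the kernel is just a constructor argument.
from sklearn.svm import SVC

linear_svm = SVC(kernel='linear')           # linear decision boundary
poly_svm   = SVC(kernel='poly', degree=3)   # polynomial decision boundary
rbf_svm    = SVC(kernel='rbf', gamma=0.1)   # Gaussian RBF decision boundary

# Each model is then trained the same way, e.g. linear_svm.fit(X, y).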
• SVM algorithms use a set of mathematical functions that are defined as the
kernel.

• Different SVM algorithms use different types of kernel functions.

• These functions can be of different types, for example:

1. Linear,
2. Nonlinear,
3. Polynomial,
4. Radial basis function (RBF), and
5. Sigmoid.

• The most commonly used kernel function is RBF, because it has a localized and finite response along the entire x-axis.

• The kernel functions return the inner product between two points in a suitable feature space (see the sketch after this list).

• Simply put, it does some extremely complex data transformations, then finds out
the method to separate the data points based on the target classes you’ve
defined.
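A small sketch verifying the inner-product claim above, using only NumPy; the degree-2 polynomial kernel K(a, b) = (a · b)² and its explicit feature map are chosen here just for illustration:

# The kernel value equals an ordinary dot product in a higher-dimensional space.
import numpy as np

def phi(v):
    # Explicit feature map for the degree-2 polynomial kernel in 2-D:
    # phi(v) = (v1^2, sqrt(2)*v1*v2, v2^2)
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

k_direct = np.dot(a, b) ** 2        # kernel computed in the original 2-D space
k_mapped = np.dot(phi(a), phi(b))   # inner product in the 3-D feature space

print(k_direct, k_mapped)           # both are 121.0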
SVM for Non-Linear Data Sets
• An example of non-linear data is:

• In this case we cannot find a straight line to separate apples from lemons.

• So how can we solve this problem? We will use the Kernel Trick!
• The basic idea is that when a dataset is inseparable in the current dimensions, we add another dimension; maybe that way the data will become separable.

• Just think about it: the example above is in 2D and it is inseparable, but maybe in 3D there is a gap between the apples and the lemons; maybe there is a level difference, so the lemons are on level one and the apples are on level two.

• In this case, we can easily draw a separating hyperplane (in 3D a hyperplane is a plane) between level 1 and level 2.

Mapping to Higher Dimensions:

• To solve this problem we shouldn’t just blindly add another dimension; we should transform the space so that we generate this level difference intentionally.
Mapping from 2D to 3D:
• Let's assume that we add another dimension called x3. The important transformation is that in the new dimension the points are organized using the formula x3 = x1² + x2².

• If we plot the plane defined by the x1² + x2² formula, we will get
something like this:

• Now we have to map the apples and lemons (which are
just simple points) to this new space.

• Think about it carefully, what did we do?

• We just used a transformation in which we added levels based on distance.

• If you are at the origin, then the points will be on the lowest level.

• As we move away from the origin, it means that we are climbing the hill (moving from the center of the plane towards the margins), so the level of the points will be higher.
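A tiny NumPy sketch of this 2D-to-3D mapping; the sample points are invented only to show that nearby points get a low level and distant points a high one:

# Level (x3) grows with the squared distance from the origin.
import numpy as np

points = np.array([[0.2, 0.1],     # near the origin  -> low x3 ("lemon" level)
                   [0.5, -0.4],
                   [2.0, 1.5],     # far from origin  -> high x3 ("apple" level)
                   [-2.5, 2.0]])

x3 = points[:, 0] ** 2 + points[:, 1] ** 2
points_3d = np.column_stack([points, x3])
print(points_3d)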
• Now if we consider the lemon at the center as the origin, we will have something like this:

• Now we can easily separate these two classes.

• These transformations are called kernels.
• Popular kernels are: Linear Kernel, Non-Linear Kernel, Polynomial Kernel,
Gaussian Kernel, Radial Basis Function (RBF), Laplace RBF Kernel, Sigmoid
Kernel, etc.
Types of Kernel and methods in SVM

1. Linear Kernel: Let us say that we have two vectors named x1 and x2; then the linear kernel is defined by the dot product of these two vectors.

• Linear Kernel Formula:

F(xi, xj) = sum(xi · xj)

Here, xi and xj represent the data you are trying to classify.

2. Polynomial Kernel: A polynomial kernel is defined by the following equation:

K(x1, x2) = (x1 · x2 + 1)^d,

where d is the degree of the polynomial and x1 and x2 are vectors/data points.

• Polynomial Kernel Formula:

F(xi, xj) = (xi · xj + 1)^d

Here ‘·’ denotes the dot product of the two values, and d denotes the degree.

F(xi, xj) represents the decision boundary that separates the given classes.
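A minimal NumPy sketch of these two formulas; the vectors xi and xj below are toy values chosen only to show the calculation:

# Linear and polynomial kernel functions.
import numpy as np

def linear_kernel(xi, xj):
    # F(xi, xj) = sum(xi * xj), i.e. the dot product
    return np.dot(xi, xj)

def polynomial_kernel(xi, xj, d=2):
    # F(xi, xj) = (xi . xj + 1)^d
    return (np.dot(xi, xj) + 1) ** d

xi = np.array([1.0, 2.0])
xj = np.array([3.0, 4.0])
print(linear_kernel(xi, xj))          # 11.0
print(polynomial_kernel(xi, xj, 2))   # 144.0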
3. Gaussian RBF Kernel: It is a general-purpose kernel; used when there is no prior knowledge about the
data.

This kernel is an example of a radial basis function kernel. Its equation is:

F(x, xj) = exp(-gamma * ||x - xj||²)

• The value of gamma typically varies between 0 and 1. You have to provide the value of gamma in the code manually. A commonly preferred value for gamma is 0.1.

• The width parameter sigma plays a very important role in the performance of the Gaussian kernel; it should neither be overestimated nor underestimated, and it should be carefully tuned according to the problem.

• It is one of the most preferred and widely used kernel functions in SVM. It is usually chosen for non-linear data, and it helps to make a proper separation when there is no prior knowledge of the data.
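A short NumPy sketch of the RBF formula above, with gamma = 0.1 as suggested; the two points are arbitrary examples:

# Gaussian RBF kernel: response decays as the points move apart.
import numpy as np

def rbf_kernel(x, xj, gamma=0.1):
    # F(x, xj) = exp(-gamma * ||x - xj||^2)
    return np.exp(-gamma * np.sum((x - xj) ** 2))

x  = np.array([1.0, 2.0])
xj = np.array([3.0, 4.0])
print(rbf_kernel(x, xj))    # smaller value, since the points are apart
print(rbf_kernel(x, x))     # 1.0 when the two points coincide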
4. Hyperbolic or the Sigmoid Kernel
• This kernel is used in neural network areas of machine learning.

• The activation function for the sigmoid kernel is the bipolar sigmoid function.

• The equation for the hyperbolic kernel function is given below.

• This kernel function is similar to a two-layer perceptron model of a neural network, which works as an activation function for neurons.

• It can be shown as:

Sigmoid Kernel Function:
F(x, xj) = tanh(α · xᵀxj + c)
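A tiny NumPy sketch of this sigmoid kernel; alpha and c are free parameters and the values below are chosen only for illustration:

# Sigmoid (hyperbolic tangent) kernel.
import numpy as np

def sigmoid_kernel(x, xj, alpha=0.01, c=0.0):
    # F(x, xj) = tanh(alpha * x^T xj + c)
    return np.tanh(alpha * np.dot(x, xj) + c)

x  = np.array([1.0, 2.0])
xj = np.array([3.0, 4.0])
print(sigmoid_kernel(x, xj))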
Advantages of SVM Kernel Functions:

1. Effective in high-dimensional cases.

2. Memory efficient, as it uses only a subset of the training points in the decision function, called support vectors.

3. Different kernel functions can be specified for the decision function, and it is possible to specify custom kernels.
