Support Vector Machine
SVM is a supervised algorithm that can be used for both classification and regression problems. It separates data using the concept of hyperplanes.
Support Vector Machine
Support vectors are simply the co-ordinates of individual observations. They are the data points that lie closest to the hyperplane and influence its position and orientation.
The distance between these vectors and the hyperplane is called the margin.
The main objective is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
Support Vector Machine
SVM takes the training data as input and outputs a line/hyperplane that separates the classes, if possible.
For a space of n dimensions, the hyperplane has n-1 dimensions and separates the space into two parts.
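For example (standard notation, not from these slides), a hyperplane in n dimensions is the set of points satisfying one linear equation:
\[
\mathbf{w}^{\top}\mathbf{x} + b = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b = 0 ,
\]
which is a line for n = 2 and a plane for n = 3, and splits the space into the two half-spaces \(\mathbf{w}^{\top}\mathbf{x} + b > 0\) and \(\mathbf{w}^{\top}\mathbf{x} + b < 0\).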
Types of SVM
Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes using a single straight line, the data is termed linearly separable, and the classifier used is called the Linear SVM classifier.
Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified using a straight line, the data is termed non-linear, and the classifier used is called the Non-linear SVM classifier.
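In scikit-learn (a minimal sketch, not part of the original slides), the two types correspond to different kernel choices of the same SVC class:

from sklearn.svm import SVC

linear_clf = SVC(kernel='linear')    # Linear SVM classifier, for linearly separable data
nonlinear_clf = SVC(kernel='rbf')    # Non-linear SVM classifier (kernels are covered later)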
Support Vector Machine
This data is simple to classify, and one can see that it is clearly separated into two segments.
Any line that separates the red and blue items can be used to classify this data.
Support Vector Machine
Had this data been multi-dimensional, any separating plane could successfully classify the data.
However, we want to find the "most optimal" solution. What, then, are the characteristics of this optimal line?
Support Vector Machine
We have to remember that this is just the training data; more data points could lie anywhere in the subspace. If our line is too close to any of the data points, noisy test data is more likely to be classified in the wrong segment.
We have to choose the line that lies between these groups, at the farthest distance from each segment.
Support Vector Machine
To find this classifier line, we write it as y = ax + b and require it to be equidistant from the closest data points on either side of it. We then want to maximize the margin (m).
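In the more common vector notation (the standard formulation, not given explicitly in these slides), the separating line is written as \(\mathbf{w}^{\top}\mathbf{x} + b = 0\), and the margin to be maximized has a closed form:
\[
m = \frac{2}{\lVert \mathbf{w} \rVert},
\]
so maximizing the margin is equivalent to minimizing \(\tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}\) subject to \(y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1\) for every training point.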
Support Vector Machine
How does it work?
SVM segregates the two classes with a hyper-plane. The key question is: how can we identify the right hyper-plane?
Support Vector Machine
Identify the right hyper-plane (Scenario-1): Here, we have three hyper-planes (A, B and C). Now, identify the right hyper-plane to classify star and circle.
Support Vector Machine
Identify the right hyper-plane (Scenario-2): Here, we have three hyper-planes (A, B and C) and all are segregating the classes well. Now, how can we identify the right hyper-plane?
Support Vector Machine
Here, maximizing the distance between the nearest data point (of either class) and the hyper-plane will help us decide the right hyper-plane. This distance is called the margin. Let's look at the snapshot below:
Support Vector Machine
You can see that the margin for hyper-plane C is high compared to both A and B. Hence, we name hyper-plane C the best.
Another reason for selecting the hyper-plane with the higher margin is robustness. If we select a hyper-plane with a low margin, there is a high chance of misclassification.
Support Vector Machine
Identify the right hyper-plane (Scenario-3): Hint: use the rules discussed in the previous section to identify the right hyper-plane.
Support Vector Machine
Some of you may have selected hyper-plane B because it has a higher margin than A. But here is the catch: SVM selects the hyper-plane that classifies the classes accurately before maximizing the margin. Here, hyper-plane B has a classification error while A has classified everything correctly. Therefore, the right hyper-plane is A.
Support Vector Machine
Can we classify two classes (Scenario-4)? One star at the other end is like an outlier for the star class. SVM can ignore outliers and find the hyper-plane with the maximum margin. Hence, we can say that SVM is robust to outliers.
Support Vector Machine
Find the hyper-plane to segregate two classes (Scenario-5): In this scenario, we cannot have a linear hyper-plane between the two classes, so how does SVM classify them? (Until now, we have only looked at linear hyper-planes.)
Support Vector Machine
SVM solves this problem easily by introducing an additional feature. Here, we add a new feature z = x^2 + y^2. Now, let's plot the data points on the x and z axes.
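A minimal sketch of this trick (the data below is synthetic, generated only for illustration): we compute z = x^2 + y^2 by hand and fit an ordinary linear SVM on the (x, z) axes.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two classes arranged as concentric rings: circles near the origin, stars farther out
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# New feature z = x^2 + y^2: small for the inner ring, large for the outer ring
z = X[:, 0] ** 2 + X[:, 1] ** 2
X_xz = np.column_stack([X[:, 0], z])   # the (x, z) axes used in the plot

clf = SVC(kernel='linear').fit(X_xz, y)
print(clf.score(X_xz, y))              # close to 1.0: the data is now linearly separable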
Support Vector Machine
In this plot, the points to consider are:
All values of z are always positive, because z is the sum of the squares of x and y.
In the original plot, the red circles appear close to the origin of the x and y axes, giving lower values of z, while the stars lie relatively far from the origin, giving higher values of z.
Support Vector Machine
In SVM, the function that converts a non-separable problem into a separable problem is called a kernel. It is mostly useful for non-linear separation problems.
It performs complex data transformations to separate the data based on the labels or outputs.
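A minimal sketch of the kernel idea in scikit-learn (toy data, for illustration only): the RBF kernel performs the non-linear transformation implicitly, so the z feature from the previous slide does not have to be built by hand.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not separable by a straight line in the original (x, y) space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

print(SVC(kernel='linear').fit(X, y).score(X, y))  # poor, roughly chance level
print(SVC(kernel='rbf').fit(X, y).score(X, y))     # near 1.0: the kernel separates the rings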
SVM Implementation
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
bankdata = pd.read_csv("D:/Machine Learning/Lab/bill_authentication.csv")
print(bankdata.shape)
print(bankdata.head())
X = bankdata.drop('Class', axis=1)
y = bankdata['Class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
print(X_train.shape)
from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)
y_pred = svclassifier.predict(X_test)
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
SVM Kernels
1. Polynomial Kernel
In the case of the polynomial kernel, you also have to pass a value for the degree parameter of the SVC class. This is the degree of the polynomial.
svclassifier = SVC(kernel='poly', degree=8)
2. Gaussian Kernel
Take a look at how we can use the Gaussian (RBF) kernel to implement kernel SVM:
svclassifier = SVC(kernel='rbf')
RBF stands for radial basis function.
SVM Kernels
Sigmoid Kernel
svclassifier = SVC(kernel='sigmoid')
Parameters Tuning
Most machine learning and deep learning algorithms have some adjustable parameters, which are called hyperparameters.
We need to set hyperparameters before we train the models. Hyperparameters are very important for building robust and accurate models.
They help us find the balance between bias and variance and thus prevent the model from overfitting or underfitting.
Consider the data points in 2D space shown in the figure.
Result: the model is not generalized and overfits.
To overcome this issue, in 1995 Cortes and Vapnik came up with the idea of the "soft margin" SVM, which allows some examples to be misclassified or to lie on the wrong side of the decision boundary.
Soft margin SVM often results in a better generalized model.
When determining the decision boundary, a soft margin SVM tries to solve an optimization problem with the following goals:
Increase the distance of the decision boundary to the classes (or support vectors).
Maximize the number of points that are correctly classified in the training set.
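These two goals are usually written as a single objective (the standard soft-margin formulation; the slides do not state the formula explicitly):
\[
\min_{\mathbf{w},\,b,\,\xi} \;\; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0 ,
\]
where the slack variable \(\xi_i\) measures how far point i is on the wrong side of its margin, and C weights the two competing goals above.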
There is a trade-off between these two goals.
The decision boundary might have to be very close to one particular class to correctly label all data points in the training set. However, in that case, accuracy on the test dataset might be lower, because the decision boundary is too sensitive to noise and to small changes in the independent variables.
On the other hand, the decision boundary might be placed as far as possible from each class, at the expense of some misclassified exceptions. This trade-off is controlled by the C parameter.
The C parameter adds a penalty for each misclassified data point.
If C is small, the penalty for misclassified points is low, so a decision boundary with a large margin is chosen at the expense of a greater number of misclassifications.
If C is large, SVM tries to minimize the number of misclassified examples because of the high penalty, which results in a decision boundary with a smaller margin.
The penalty is not the same for all misclassified examples; it is directly proportional to the distance to the decision boundary.
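A small sketch of this effect (synthetic data; the specific C values are arbitrary choices for illustration): a small C tolerates misclassifications and keeps a wide margin, while a large C fits the training points more aggressively.

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so some misclassification is unavoidable
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 100):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # a smaller C typically keeps more support vectors and a wider, "softer" margin
    print(f"C={C}: train accuracy={clf.score(X, y):.2f}, "
          f"support vectors={clf.n_support_.sum()}")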
Parameters Tuning
Tuning parameter values for machine learning algorithms effectively improves model performance.
kernel: Here, we have various options available for the kernel, such as "linear", "rbf", "poly" and others (the default value is "rbf"). "rbf" and "poly" are useful for a non-linear hyper-plane.
Programming Example
from sklearn import datasets, svm

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features
y = iris.target

C = 1.0  # SVM regularization parameter
svc = svm.SVC(kernel='linear', C=C).fit(X, y)
Change the kernel type to rbf:
svc = svm.SVC(kernel='rbf', C=1, gamma='auto').fit(X, y)
gamma: Kernel coefficient for 'rbf', 'poly' and 'sigmoid'. The higher the value of gamma, the more the model tries to fit the training data set exactly, which hurts generalization and causes over-fitting.
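A rough sketch of the gamma effect (toy data; the gamma values are arbitrary choices for illustration): a very large gamma tends to memorize the training set and generalize poorly.

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for gamma in (0.1, 1, 100):
    clf = SVC(kernel='rbf', gamma=gamma).fit(X_train, y_train)
    # a very large gamma usually pushes the training score up and the test score down
    print(f"gamma={gamma}: train={clf.score(X_train, y_train):.2f}, "
          f"test={clf.score(X_test, y_test):.2f}")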
C: Penalty parameter of the error term. It controls the trade-off between a smooth decision boundary and classifying the training points correctly.
Gamma vs C parameter
For a linear kernel, we just need to optimize the C parameter.
However, if we want to use an RBF kernel, both the C and gamma parameters need to be optimized simultaneously.
If gamma is large, the effect of C becomes negligible. If gamma is small, C affects the model just as it affects a linear model.
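One common way to optimize C and gamma simultaneously is a grid search with cross-validation; a minimal sketch (the parameter grid values are arbitrary examples, not part of the original slides):

from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Candidate values for C and gamma (arbitrary example grid)
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1]}

search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)   # best (C, gamma) pair found by cross-validation
print(search.best_score_)    # corresponding cross-validated accuracy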