
Support Vector Machine

Support Vector Machine


 SVM is a supervised algorithm that can be
used for both classification and regression
problems. It separates data using the concept
of hyperplanes.
Support Vector Machine
Support vectors are simply the coordinates
of individual observations. They are the data
points closest to the hyperplane, and they
influence the position and orientation of the
hyperplane.

The distance between the support vectors and the
hyperplane is called the margin.

The main objective is to maximize this
margin. The hyperplane with the maximum
margin is called the optimal hyperplane.
Support Vector Machine

 SVM takes the training data as input and
outputs a line/hyperplane that separates
the classes, if possible.

 For a space of n dimensions, the separating
hyperplane has n-1 dimensions, dividing the
space into two parts.
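
A small illustrative sketch (using scikit-learn, which also appears later in these slides): a linear SVM fitted on n-dimensional data exposes one weight per dimension plus an intercept, which together define that (n-1)-dimensional hyperplane. The two-class reduction of the iris data here is only an assumption for the example.

from sklearn import datasets
from sklearn.svm import SVC

# Illustrative only: iris has n = 4 features, reduced here to a two-class problem
iris = datasets.load_iris()
X, y = iris.data, (iris.target == 0).astype(int)

clf = SVC(kernel='linear').fit(X, y)
print(clf.coef_)       # shape (1, 4): one weight per dimension of the space
print(clf.intercept_)  # offset b of the hyperplane w.x + b = 0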
Types of SVM

 Linear SVM: Linear SVM is used for linearly separable data.
If a dataset can be classified into two classes by a single
straight line, the data is termed linearly separable, and the
classifier used is called a Linear SVM classifier.

 Non-linear SVM: Non-linear SVM is used for non-linearly
separable data. If a dataset cannot be classified by a
straight line, the data is termed non-linear, and the
classifier used is called a Non-linear SVM classifier.
Support Vector Machine

 This data is simple to classify, and one can see
that it is clearly separated into two segments.
 Any line that separates the red and blue items
can be used to classify this data.
Support Vector Machine

 Had this been multi-dimensional data, any
separating plane could successfully classify it.
 However, we want to find the "most optimal"
solution. What, then, is the characteristic of
this most optimal line?
Support Vector Machine

 We have to remember that this is just the
training data, and we may get more data
points that can lie anywhere in the
subspace. If our line is too close to any of the
data points, noisy test data is more likely to
be classified in the wrong segment.
 We have to choose the line which lies
between these groups and is at the farthest
distance from each of the segments.
Support Vector Machine

 To find this classifier line, we write the line
as y = ax + b and make it equidistant from the
data points of each class that are closest to it.
We then want to maximize the margin (m).
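
A minimal sketch of the standard notation (the slides do not write this formula out): if the separating hyperplane is written as w·x + b = 0, with the closest points of each class lying on w·x + b = +1 and w·x + b = -1, then the margin is

m = 2 / ||w||

so maximizing the margin is the same as minimizing ||w||.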
Support Vector Machine

How does it work?

SVM segregates the two classes with a hyper-plane.
The key question is: "How can we identify the right hyper-plane?"
Support Vector Machine

Identify the right hyper-plane (Scenario 1): Here, we
have three hyper-planes (A, B and C). Now, identify
the right hyper-plane to classify the stars and circles.
Support Vector Machine

Identify the right hyper-plane (Scenario 2): Here, we
have three hyper-planes (A, B and C), and all are
segregating the classes well. Now, how can we
identify the right hyper-plane?
Support Vector Machine

Here, maximizing the distance between the nearest
data point (of either class) and the hyper-plane
will help us decide the right hyper-plane. This
distance is called the margin. Let's look at the
snapshot below:
Support Vector Machine

 You can see that the margin for hyper-plane C
is high compared to both A and B. Hence,
we name hyper-plane C as the best.
 Another important reason for selecting the
hyper-plane with the higher margin is robustness.
If we select a hyper-plane with a low margin,
there is a high chance of misclassification.
Support Vector Machine

Identify the right hyper-plane (Scenario 3):
Hint: use the rules discussed in the previous
scenarios to identify the right hyper-plane.
Support Vector Machine

 Some of you may have selected hyper-plane B
as it has a higher margin compared to A. But
here is the catch: SVM selects the hyper-plane
which classifies the classes accurately prior to
maximizing the margin. Here, hyper-plane B has a
classification error while A has classified everything
correctly. Therefore, the right hyper-plane is A.
Support Vector Machine

 One star at the other end is like an outlier for the
star class. SVM has the ability to ignore outliers
and find the hyper-plane that has the maximum
margin. Hence, we can say SVM is robust to
outliers.
Support Vector Machine

 Find the hyper-plane to segregate two
classes (Scenario 5): In this scenario, we
can't have a linear hyper-plane between the
two classes, so how does SVM classify these
two classes? (Till now, we have only looked at
linear hyper-planes.)
Support Vector Machine

 SVM can solve this problem easily! It does so
by introducing an additional feature.
Here, we will add a new feature z = x^2 + y^2.
Now, let's plot the data points on the x and z axes.
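
A minimal sketch of this idea in scikit-learn (the circular data below is generated synthetically for illustration; it is not the dataset from the figure):

import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for the figure: circles near the origin, stars far away
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 > 4).astype(int)   # not linearly separable in (x, y)

# Add the new feature z = x^2 + y^2 and fit a plain linear SVM on (x, z)
z = X[:, 0]**2 + X[:, 1]**2
X_xz = np.column_stack([X[:, 0], z])
clf = SVC(kernel='linear').fit(X_xz, y)
print(clf.score(X_xz, y))   # the classes are now separable by a straight line in (x, z)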
Support Vector Machine

In this plot, the points to consider are:

 All values of z will always be positive
because z is the sum of the squares of x and y.

 In the original plot, the red circles appear close to
the origin of the x and y axes, leading to lower
values of z, while the stars are relatively far from
the origin, resulting in higher values of z.
Support Vector Machine
 In SVM, the function which converts a non-separable
problem into a separable problem is called a kernel.
 It is mostly useful in non-linear separation
problems.
 It performs complex data transformations to
separate the data based on the labels or outputs.
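
As a sketch, the same circular problem from the previous example can be handled without adding z by hand: an RBF kernel lets the SVM perform the transformation implicitly (reusing the synthetic X and y defined above):

from sklearn.svm import SVC

# The kernel computes the non-linear transformation implicitly
clf_rbf = SVC(kernel='rbf').fit(X, y)
print(clf_rbf.score(X, y))   # separates the circular pattern with no manual feature engineering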
SVM Implementation
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Load the banknote authentication dataset
bankdata = pd.read_csv("D:/Machine Learning/Lab/bill_authentication.csv")
print(bankdata.shape)
print(bankdata.head())

# Separate features and labels
X = bankdata.drop('Class', axis=1)
y = bankdata['Class']

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
print(X_train.shape)

# Train an SVM classifier with a linear kernel
from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)

# Evaluate on the test set
y_pred = svclassifier.predict(X_test)
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
SVM Kernels

1. Polynomial Kernel
In the case of the polynomial kernel, you also
have to pass a value for the degree parameter
of the SVC class. This is simply the degree
of the polynomial.

svclassifier = SVC(kernel='poly', degree=8)
2. Gaussian Kernel
Take a look at how we can use the Gaussian
(radial basis function) kernel to implement kernel SVM:

svclassifier = SVC(kernel='rbf')
SVM Kernels

3. Sigmoid Kernel

svclassifier = SVC(kernel='sigmoid')
Parameters Tuning
Most machine learning and deep learning algorithms
have some adjustable parameters, which are
called hyperparameters.
 We need to set hyperparameters before we train the
models. Hyperparameters are very critical in building
robust and accurate models.
 They help us find the balance between bias and variance
and thus prevent the model from overfitting or
underfitting.
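
A small sketch of this point: hyperparameters such as the kernel, C and gamma are fixed when the estimator is created, before any training happens (the values below are illustrative only, not recommendations):

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)

# Hyperparameters are chosen here, at construction time
model = SVC(kernel='rbf', C=10, gamma=0.1)
model.fit(X_train, y_train)          # learning happens only after the choices are made
print(model.score(X_test, y_test))   # held-out accuracy reflects how well the choices generalize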
Consider the data points in 2D space shown in the figure.

Result: the model is not generalized and overfits.
To overcome this issue, in 1995, Cortes and Vapnik came up with the
idea of the "soft margin" SVM, which allows some examples to be
misclassified or to be on the wrong side of the decision boundary.

Soft margin SVM often results in a better generalized model.
When determining the decision boundary, a soft margin
SVM tries to solve an optimization problem with the
following goals (a sketch of the resulting objective follows below):
 Maximize the distance of the decision boundary from the
classes (or support vectors)
 Maximize the number of points that are correctly classified
in the training set
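
As a sketch of the standard soft-margin objective that combines both goals (the slides do not write it out; the slack-variable formulation below is the usual one): each training point (x_i, y_i) gets a slack variable ξ_i ≥ 0 measuring how far it violates the margin, and the SVM solves

minimize over w, b, ξ:   (1/2) ||w||^2 + C · Σ_i ξ_i
subject to:              y_i (w·x_i + b) ≥ 1 − ξ_i,   ξ_i ≥ 0

The first term prefers a wide margin; the second penalizes misclassified or margin-violating points, weighted by the constant C discussed on the following slides.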
There is a trade-off between these two goals.

 The decision boundary might have to be very close to one
particular class to correctly label all data points in the training set.
However, in this case, accuracy on the test dataset might be
lower because the decision boundary is too sensitive to noise and
to small changes in the independent variables.

 On the other hand, the decision boundary might be placed as
far as possible from each class, at the expense of some
misclassified exceptions. This trade-off is controlled by the C
parameter.
The C parameter adds a penalty for each misclassified data
point.
 If C is small, the penalty for misclassified points is low, so a
decision boundary with a large margin is chosen at the
expense of a greater number of misclassifications.

 If C is large, SVM tries to minimize the number of
misclassified examples due to the high penalty, which results in
a decision boundary with a smaller margin.

 The penalty is not the same for all misclassified examples. It is
directly proportional to the distance to the decision boundary.
(A small comparison is sketched below.)
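
A small comparison sketch of the effect of C (the two values below are only illustrative extremes):

from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

soft = SVC(kernel='linear', C=0.01).fit(X, y)   # small C: wide margin, more misclassifications tolerated
hard = SVC(kernel='linear', C=1000).fit(X, y)   # large C: narrow margin, few misclassifications tolerated
print(soft.score(X, y), hard.score(X, y))       # training accuracy typically increases with C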
Parameters Tuning
Tuning parameter values for machine learning
algorithms effectively improves model
performance.
 kernel: Here, we have various options
available for the kernel, such as "linear",
"rbf", "poly" and others (the default value is
"rbf"). "rbf" and "poly" are useful for a
non-linear hyper-plane.
Programming Example

from sklearn import datasets, svm

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features
y = iris.target

C = 1.0  # SVM regularization parameter
svc = svm.SVC(kernel='linear', C=C).fit(X, y)  # gamma is not used by the linear kernel
Change the kernel type to rbf:

svc = svm.SVC(kernel='rbf', C=1, gamma='scale').fit(X, y)  # gamma must be positive (or 'scale'/'auto')
gamma: Kernel coefficient for 'rbf', 'poly' and
'sigmoid'. The higher the value of gamma, the more
the model tries to exactly fit the training data set,
which hurts generalization and causes an
over-fitting problem.
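
A small sketch of the effect of gamma with the RBF kernel (the two values are illustrative extremes):

from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

smooth = SVC(kernel='rbf', gamma=0.1).fit(X, y)   # small gamma: smooth, more general boundary
wiggly = SVC(kernel='rbf', gamma=100).fit(X, y)   # large gamma: boundary hugs the training points
print(smooth.score(X, y), wiggly.score(X, y))     # the large-gamma model fits the training data more exactly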
C: Penalty parameter of the error term. It controls
the trade-off between a smooth decision boundary
and classifying the training points correctly.
Gamma vs C parameter
 For a linear kernel, we just need to optimize the C
parameter.
 However, if we want to use an RBF kernel, both the C and
gamma parameters need to be optimized simultaneously
(a grid-search sketch follows below).
 If gamma is large, the effect of C becomes negligible.
 If gamma is small, C affects the model just like it
affects a linear model.
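
A sketch of optimizing C and gamma simultaneously with a cross-validated grid search (the grid values are illustrative, not recommendations):

from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)   # best C / gamma combination found by cross-validation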
