SVM Set-2

Support Vector Machines (SVM) is a supervised learning algorithm used for classification and regression, aiming to find a hyperplane that maximizes the margin between classes. The document explains key concepts such as Hinge Loss, the dual formulation of SVM, the kernel trick, and different kernel types including Polynomial and RBF kernels. It also discusses the implications of the hyperparameter C on the model's bias-variance trade-off and the choice between primal and dual forms based on dataset size.


1) Can you explain SVM?

2) What is the geometric intuition behind SVM?


3) What is Hinge Loss?
4) Explain the Dual form of SVM formulation?
5) What’s the “kernel trick” and how is it useful?
6) What is a Polynomial kernel?
7) What is RBF-Kernel?
8) Should you use the primal or the dual form of the SVM problem to train a model
on a training set with millions of instances and hundreds of features?
9) Explain SVM Regression (SVR).
10) What is the role of C in SVM? How does it affect the bias/variance trade-off?

Solutions:

1) Explanation: The Support Vector Machine (SVM) is a supervised machine learning algorithm that works on both classification and regression problems. It classifies data by finding a hyperplane that maximizes the margin between the classes in the training data. Hence, SVM is an example of a large-margin classifier.

The basic idea of support vector machines:


● Find the optimal hyperplane for linearly separable patterns
● Extend to patterns that are not linearly separable by transforming the original data to map it into a new space (i.e., the kernel trick)

2) Explanation: Suppose you are asked to separate two classes. There can be multiple hyperplanes that separate them.

SVM chooses the hyperplane that separates the data points as widely as possible. It draws one hyperplane parallel to the separating hyperplane, passing through the closest point of class A (such closest points are called support vectors), and another parallel hyperplane passing through the closest point of class B. SVM tries to maximize the margin between these parallel hyperplanes. This margin maximization is what improves the model's accuracy on unseen data.
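
As an illustration (a minimal sketch assuming scikit-learn is available; the toy dataset and parameter values are not part of the original answer), a linear SVM can be fit to two well-separated blobs and its support vectors, the points lying on the margin hyperplanes, inspected directly:

# Minimal sketch: fit a linear SVM and inspect its support vectors.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated classes in 2-D.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# A large C keeps the margin essentially "hard" on this separable toy data.
clf = SVC(kernel="linear", C=1000.0)
clf.fit(X, y)

# Support vectors are the training points that lie on the margin hyperplanes.
print("Support vectors per class:", clf.n_support_)
print("Support vectors:\n", clf.support_vectors_)
# The separating hyperplane is w^T x + b = 0.
print("w =", clf.coef_[0], "b =", clf.intercept_[0])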

3) Explanation: Hinge loss is the loss function that penalises the SVM model for inaccurate predictions.
If y_i(w^T x_i + b) ≥ 1, the hinge loss is 0, i.e. the point is correctly classified with at least unit margin.

When y_i(w^T x_i + b) < 1, the hinge loss becomes positive.

As y_i(w^T x_i + b) decreases for a misclassified point, the loss 1 - y_i(w^T x_i + b) grows linearly, so points that lie farther on the wrong side of the decision margin receive a greater loss value and are penalized more heavily.

We can formulate the hinge loss as max[0, 1 - y_i(w^T x_i + b)].
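
For concreteness, a small NumPy sketch (not part of the original answer) can evaluate this hinge loss for a given weight vector and bias:

import numpy as np

def hinge_loss(w, b, X, y):
    """Average hinge loss max(0, 1 - y_i (w^T x_i + b)); labels y must be in {-1, +1}."""
    margins = y * (X @ w + b)
    return np.mean(np.maximum(0.0, 1.0 - margins))

# One point correctly classified with margin >= 1 (loss 0) and one
# misclassified point (loss 2), so the average hinge loss is 1.0.
X = np.array([[2.0, 0.0], [-1.0, 0.0]])
y = np.array([1, 1])
w = np.array([1.0, 0.0])
print(hinge_loss(w, 0.0, X, y))  # 1.0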

4) The aim of the soft-margin formulation is to minimize

(1/2)||w||² + C Σ_i ξ_i

subject to

y_i(w^T x_i + b) ≥ 1 - ξ_i and ξ_i ≥ 0 for all i.

This is also known as the primal form of SVM.
Duality theory provides a convenient way to deal with the constraints. The dual optimization problem can be written entirely in terms of dot products between training points, thereby making it possible to use kernel functions.
In other words, it is possible to express a different but closely related problem, called the dual problem. The solution of the dual problem typically gives a lower bound on the solution of the primal problem, but under some conditions it has exactly the same solution as the primal problem. Luckily, the SVM problem meets these conditions, so you can choose to solve either the primal problem or the dual problem; both will have the same solution.
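
As a hedged illustration (assuming scikit-learn; the original answer does not prescribe a library), both routes are available in practice: LinearSVC can solve the primal problem directly, while SVC always works with the dual and can therefore accept kernels. Note that LinearSVC uses a squared hinge loss by default, so the two solutions may differ slightly:

from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC, SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Primal form: only meaningful for a linear SVM.
primal_clf = LinearSVC(C=1.0, dual=False, max_iter=10000).fit(X, y)

# Dual form: expressed in terms of dot products, so kernels can be plugged in.
dual_clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(primal_clf.score(X, y), dual_clf.score(X, y))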

5) Earlier we discussed applying SVM to linearly separable data, but such data is rare in practice. This is where the kernel trick plays a huge role. The idea is to map the non-linearly separable data set into a higher-dimensional space in which we can find a hyperplane that separates the samples.

The kernel trick reduces the complexity of this step because the mapping function never has to be computed explicitly: the kernel function directly defines the inner product in the transformed space. Application of the kernel trick is not limited to the SVM algorithm; any computation involving dot products ⟨x, y⟩ can use it.
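
A small NumPy sketch (an illustration using the homogeneous degree-2 polynomial kernel, not taken from the original answer) shows the trick at work: the kernel value equals the dot product in the explicitly expanded feature space, but is computed without ever constructing that space:

import numpy as np

def phi(x):
    # Explicit feature map for the degree-2 polynomial kernel K(x, z) = (x^T z)^2 in 2-D.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

explicit = phi(x) @ phi(z)   # dot product in the transformed 3-D space
kernel = (x @ z) ** 2        # kernel trick: same value, no explicit mapping
print(explicit, kernel)      # both print 16.0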
6) The polynomial kernel is a kernel function commonly used with support vector machines (SVMs) and other kernelized models. It represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables, allowing non-linear models to be learned.

For degree-d polynomials, the polynomial kernel is defined as

K(x, y) = (x^T y + c)^d

where c ≥ 0 is a free parameter trading off the influence of higher-order versus lower-order terms of the polynomial.
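
For instance (an illustrative sketch assuming scikit-learn; the dataset and parameter values are arbitrary), an SVM with a polynomial kernel can be trained as follows, where degree corresponds to d and coef0 to the constant c:

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# Degree-3 polynomial kernel: K(x, y) = (gamma * x^T y + coef0)^3
poly_clf = SVC(kernel="poly", degree=3, coef0=1.0, C=1.0).fit(X, y)
print("Training accuracy:", poly_clf.score(X, y))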

7) The RBF kernel on two samples x and x', represented as feature vectors in some input space, is defined as

K(x, x') = exp(-||x - x'||² / (2σ²))

where ||x - x'||² is the squared Euclidean distance between the two feature vectors and σ (sigma) is a free parameter.
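
A short check (assuming NumPy and scikit-learn, which are not mentioned in the original answer) computes the RBF kernel by hand and compares it with scikit-learn's implementation, which parameterizes the kernel with gamma = 1/(2σ²):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[1.0, 2.0]])
x_prime = np.array([[2.0, 0.0]])
sigma = 1.5

manual = np.exp(-np.sum((x - x_prime) ** 2) / (2 * sigma ** 2))
library = rbf_kernel(x, x_prime, gamma=1.0 / (2 * sigma ** 2))[0, 0]
print(manual, library)  # the two values are identical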

8) This question applies only to linear SVMs, since kernelized SVMs can only use the dual form. The computational complexity of the primal form of the SVM problem is roughly proportional to the number of training instances m, while the computational complexity of the dual form is proportional to a number between m² and m³. So, if there are millions of instances, you should use the primal form, because the dual form would be far too slow.
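
In practice (a sketch assuming scikit-learn; the dataset is scaled down so the example runs quickly), a primal-style solver such as SGDClassifier with hinge loss, or LinearSVC with dual=False, is the usual choice at that scale:

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Stand-in for a very large dataset (kept small here so the sketch runs quickly).
X, y = make_classification(n_samples=100_000, n_features=100, random_state=0)

# SGD with hinge loss optimizes the (primal) linear SVM objective one sample
# at a time, so training cost grows roughly linearly with the number of instances.
clf = SGDClassifier(loss="hinge", alpha=1e-4, random_state=0).fit(X, y)
print("Training accuracy:", clf.score(X, y))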

9) Support Vector Regression (SVR) uses the same principles as the SVM for classification, with only a few minor differences. Because the output is a real number, it has infinitely many possible values, so an exact prediction cannot be expected for every point. In the case of regression, a margin of tolerance (epsilon) is therefore set around the prediction: points that fall within this epsilon-tube incur no loss, and the model is trained to fit as many instances as possible inside the tube.
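
A brief sketch (the toy data and parameter values are arbitrary assumptions) fits an epsilon-SVR with scikit-learn, where the epsilon parameter sets the width of the no-penalty tube around the prediction:

import numpy as np
from sklearn.svm import SVR

# Noisy 1-D regression problem.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 5.0, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# Points inside the epsilon-tube around the prediction incur no loss.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print("Number of support vectors:", len(svr.support_))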

10) In the soft-margin formulation of SVM given in answer 4, C is a hyperparameter.

The C hyperparameter adds a penalty for each margin violation (a misclassified point or a point falling inside the margin).

A large value of C implies a small margin, so the model tries to classify every training point correctly and tends to overfit the training data.
A small value of C implies a large margin, which tolerates more violations and might lead to underfitting of the model.
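
This effect can be observed directly (an illustrative sketch; the dataset and the values of C are arbitrary) by sweeping C and watching the training accuracy and the number of support vectors change:

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

# Small C -> wide margin, more violations tolerated (risk of underfitting).
# Large C -> narrow margin, fewer violations tolerated (risk of overfitting).
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=C).fit(X, y)
    print(f"C={C:>7}: train accuracy={clf.score(X, y):.3f}, "
          f"support vectors={clf.n_support_.sum()}")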
