SVM Set-2
Solutions:
2) Explanation: When asked to separate two different classes, there can be
multiple hyperplanes that could be drawn.
SVM chooses the hyperplane which separates the data points as widely as
possible. SVM draws a hyperplane parallel to the actual separating hyperplane
passing through the closest point of class A (such closest points are known as
Support Vectors) and another parallel hyperplane passing through the closest
point of class B. SVM tries to maximize the margin between these two
hyperplanes. Eventually, this margin maximization improves the model's
accuracy on unseen data.
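As a rough illustration (not part of the original solution), the sketch below fits a linear SVM with scikit-learn on a made-up toy dataset, then reads off the support vectors and the margin width 2/||W||; the data and the large C value are assumptions chosen only to mimic the hard-margin setting described above.

    import numpy as np
    from sklearn.svm import SVC

    # Toy 2-D data: two well-separated classes (assumed values for illustration).
    X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class A
                  [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])  # class B
    y = np.array([0, 0, 0, 1, 1, 1])

    # A large C approximates the hard-margin SVM described above.
    clf = SVC(kernel="linear", C=1e6).fit(X, y)

    w = clf.coef_[0]                      # normal vector of the separating hyperplane
    print("support vectors:\n", clf.support_vectors_)
    print("margin width (2/||W||):", 2 / np.linalg.norm(w))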
3) Explanation: Hinge Loss is a loss function which penalises the SVM model for
inaccurate predictions. It is defined as max(0, 1 − Yi(Wᵀ·Xi + b)).
If Yi(Wᵀ·Xi + b) ≥ 1, the hinge loss is 0, i.e. the point is correctly classified with
a sufficient margin; otherwise the loss grows linearly with the violation.
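A minimal sketch of this computation, assuming a hypothetical weight vector W, bias b, and a few labelled points (all values are made up for illustration):

    import numpy as np

    # Hypothetical model parameters and labelled points (assumed values).
    W = np.array([1.0, -1.0])
    b = 0.5
    X = np.array([[2.0, 0.0], [0.2, 0.1], [0.0, 2.0]])
    y = np.array([1, 1, -1])              # labels must be +1 / -1 for the hinge loss

    # Hinge loss: max(0, 1 - y_i * (W^T x_i + b)); zero when the margin condition holds.
    margins = y * (X @ W + b)
    losses = np.maximum(0.0, 1.0 - margins)
    print(losses)                         # 0 where Yi(W^T Xi + b) >= 1, positive otherwise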
4) The SVM training objective can be written as:
minimize (1/2)·||W||²
subject to Yi(Wᵀ·Xi + b) ≥ 1 for every training point i.
This is also known as the primal form of SVM.
The duality theory provides a convenient way to deal with the constraints. The
dual optimization problem can be written in terms of dot products, thereby
making it possible to use kernel functions.
It is possible to express a different but closely related problem, called its dual
problem. The solution to the dual problem typically gives a lower bound to the
solution of the primal problem, but under some conditions, it can even have the
same solutions as the primal problem. Luckily, the SVM problem happens to
meet these conditions, so you can choose to solve the primal problem or the dual
problem; both will have the same solution.
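For reference, a standard textbook statement of the hard-margin dual (reconstructed here, not quoted from the original answer) is:

    maximize over α:  Σi αi − (1/2)·Σi Σj αi·αj·Yi·Yj·(Xiᵀ·Xj)
    subject to:       αi ≥ 0 for all i, and Σi αi·Yi = 0

The training data enter only through the dot products Xiᵀ·Xj, which is exactly where a kernel function K(Xi, Xj) can be substituted.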
5) Earlier we discussed applying SVM to linearly separable data, but such data is
very rare in practice. Here, the kernel trick plays a huge role. The idea is to map
the non-linearly separable data-set into a higher dimensional space where we can
find a hyperplane that can separate the samples.
The kernel trick removes the need to compute this mapping function explicitly:
the kernel function directly gives the inner product in the transformed space.
Application of the kernel trick is not limited to the SVM algorithm. Any
computation involving dot products (x, y) can utilize the kernel trick, as the
sketch below illustrates.
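As a hedged illustration (not from the original text), the snippet below checks numerically that the simple kernel K(x, y) = (xᵀy)² equals the ordinary dot product after an explicit quadratic feature mapping, so the kernel yields the inner product in the transformed space without ever computing that mapping; the vectors and the feature map are assumptions for demonstration.

    import numpy as np

    def phi(v):
        # Explicit quadratic feature map for 2-D input: (v1^2, v2^2, sqrt(2)*v1*v2).
        return np.array([v[0]**2, v[1]**2, np.sqrt(2) * v[0] * v[1]])

    x = np.array([1.0, 2.0])
    yv = np.array([3.0, 4.0])

    explicit = phi(x) @ phi(yv)   # inner product in the higher-dimensional space
    kernel   = (x @ yv) ** 2      # same value, computed only from the original dot product
    print(explicit, kernel)       # both print 121.0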
6) The polynomial kernel is a kernel function commonly used with support vector
machines (SVMs) and other kernelized models. It represents the similarity of
vectors (training samples) in a feature space over polynomials of the original
variables, allowing the learning of non-linear models.
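A small sketch, assuming made-up vectors and hyperparameter values, showing the usual polynomial kernel form K(x, y) = (γ·xᵀy + r)^d computed both by hand and via scikit-learn:

    import numpy as np
    from sklearn.metrics.pairwise import polynomial_kernel

    x = np.array([[1.0, 2.0]])
    yv = np.array([[3.0, 4.0]])
    gamma, coef0, degree = 0.5, 1.0, 3      # assumed hyperparameter values

    manual = (gamma * (x @ yv.T) + coef0) ** degree
    library = polynomial_kernel(x, yv, degree=degree, gamma=gamma, coef0=coef0)
    print(manual, library)                  # both give the same kernel value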
7) The RBF kernel on two samples x and x', represented as feature vectors in some
input space, is defined as
K(x, x') = exp(−||x − x'||² / (2σ²)),
where ||x − x'||² is the squared Euclidean distance between the two feature
vectors and σ (sigma) is a free parameter.
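A minimal sketch with assumed values, computing the RBF kernel directly from the definition above and cross-checking with scikit-learn (which parameterizes it as gamma = 1/(2·σ²)):

    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    x = np.array([[1.0, 2.0]])
    x2 = np.array([[2.0, 4.0]])
    sigma = 1.5                                   # assumed value of the free parameter

    sq_dist = np.sum((x - x2) ** 2)               # squared Euclidean distance ||x - x'||^2
    manual = np.exp(-sq_dist / (2 * sigma ** 2))
    library = rbf_kernel(x, x2, gamma=1 / (2 * sigma ** 2))
    print(manual, library)                        # same value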
8) This question applies only to linear SVMs, since kernelized SVMs can only use the dual
form. The computational complexity of the primal form of the SVM problem is
proportional to the number of training instances m, while the computational
complexity of the dual form is proportional to a number between m² and m³. So,
if there are millions of instances, you should use the primal form, because the
dual form will be much too slow.
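In scikit-learn terms (an illustrative mapping, not part of the original answer), LinearSVC can solve the primal problem directly and scales to a large number of instances m, while SVC always solves the dual; the dataset below is synthetic and the sizes are assumptions.

    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC, SVC

    X, y = make_classification(n_samples=20000, n_features=20, random_state=0)

    # Primal solver: appropriate when the number of instances m is large.
    primal_clf = LinearSVC(dual=False, C=1.0).fit(X, y)

    # Dual (kernelized) solver: roughly O(m^2)..O(m^3), fine for smaller datasets
    # or when a non-linear kernel is required.
    dual_clf = SVC(kernel="rbf", C=1.0).fit(X[:2000], y[:2000])

    print(primal_clf.score(X, y), dual_clf.score(X[:2000], y[:2000]))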
9) Support Vector Regression (SVR) uses the same principles as the SVM for
classification, with only a few minor differences. First of all, because the output is
a real number, it becomes very difficult to predict the information at hand, which
has infinite possibilities. In the case of regression, a margin of tolerance (epsilon)
is therefore set around the prediction: errors smaller than epsilon are ignored, and
only points falling outside this epsilon-tube contribute to the loss.
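A hedged sketch of SVR with scikit-learn on a made-up 1-D regression problem; the epsilon parameter below is the margin of tolerance mentioned above, and the data and parameter values are assumptions for illustration only.

    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X = np.linspace(0, 5, 100).reshape(-1, 1)
    y = np.sin(X).ravel() + 0.1 * rng.standard_normal(100)   # noisy target (assumed data)

    # epsilon defines the tube around the prediction inside which errors are ignored.
    reg = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
    print(reg.predict(X[:5]))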
10)
hyperparameter.