
Ferhat Abbas University, Setif 1

Computer Science Department


Machine Learning, IDTW, M2

Kernel Theory and Advanced SVM

November 5, 2024


[email protected]
Contents

1 Introduction
2 Theory of SVM and Kernel Trick
3 Objective Function of SVM
4 Kernel Trick
5 SVM Dual Formulation with Kernel
6 Common Kernel Functions
  6.1 Linear Kernel
  6.2 Polynomial Kernel (degree p)
  6.3 Radial Basis Function (RBF) Kernel
  6.4 Sigmoid Kernel
7 Visualizing the Effect of Kernels
8 Plotting Different SVM Classifiers in the Iris Dataset
  8.1 SVM Classifiers and Kernels
9 Example: Plot classification boundaries with different SVM Kernels
  9.1 Creating a dataset
  9.2 Training SVC model and plotting decision boundaries
  9.3 Polynomial kernel
  9.4 RBF kernel
  9.5 Sigmoid kernel
10 Conclusion
1 Introduction
I'll explain advanced Support Vector Machines (SVM) and kernel theory in machine learning, particularly in the context of non-linearly separable data. This involves explaining the kernel trick, various kernel functions, and how SVMs use kernels for classification in high-dimensional spaces. I'll provide the formulas and graphs where possible.

2 Theory of SVM and Kernel Trick


Support Vector Machines (SVM) are supervised learning algorithms primarily used for classification and
regression tasks. The goal of an SVM classifier is to find the hyperplane that best separates data points
of different classes with the maximum margin.
In cases where the data is linearly inseparable in the input space, SVMs can use the kernel trick to
map data into a higher-dimensional space where it becomes linearly separable.

3 Objective Function of SVM


Given a set of labeled data points (x_i, y_i), where i = 1, ..., N, x_i ∈ R^d represents the feature vectors,
and y_i ∈ {−1, 1} the labels, the SVM seeks to solve the following optimization problem:

\[
\min_{w,b}\ \frac{1}{2}\,\|w\|^2
\]

subject to:

\[
y_i\,(w \cdot x_i + b) \ge 1, \qquad \forall i
\]

where w is the normal vector to the hyperplane and b is the bias term.

To handle non-linearly separable cases, we introduce the kernel function K(x, x′ ), which implicitly
maps data to a higher-dimensional feature space without explicitly computing the transformation.

4 Kernel Trick
The kernel trick allows us to compute the inner product of transformed features in high-dimensional
space efficiently. Instead of mapping each point explicitly, we use a function K(xi , xj ) = ϕ(xi ) · ϕ(xj ),
where ϕ is the mapping function.

This kernel function replaces the dot product in the SVM’s decision function, yielding a more flexible
classifier that can work in high-dimensional spaces.
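
As a small illustration (this sketch is not part of the original notes; the feature map and the toy vectors are chosen for demonstration), the following Python snippet verifies the identity K(x_i, x_j) = ϕ(x_i) · ϕ(x_j) for the homogeneous degree-2 polynomial kernel on 2-D inputs, where ϕ(x) = (x_1², √2 x_1 x_2, x_2²):

import numpy as np

def phi(x):
    # Explicit degree-2 feature map for a 2-D input (illustrative choice):
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly2_kernel(x, z):
    # Kernel trick: returns the same value as phi(x) . phi(z),
    # without ever constructing the mapped vectors.
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.5])

print(phi(x) @ phi(z))     # explicit mapping, then dot product: 6.25
print(poly2_kernel(x, z))  # kernel trick: identical result: 6.25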

5 SVM Dual Formulation with Kernel


The dual form of the SVM objective function using kernels K(x, x′ ) is:
\[
\max_{\alpha}\ \sum_{i=1}^{N} \alpha_i \;-\; \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j)
\]

subject to:

\[
0 \le \alpha_i \le C, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0
\]

where the α_i are the Lagrange multipliers, and C is the regularization parameter. The decision function
for a new data point x is:

\[
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{N} \alpha_i y_i\, K(x_i, x) + b \right)
\]

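As a sanity check (a sketch of my own, not from the notes; the toy dataset and the value γ = 0.5 are arbitrary), the decision function above can be recomputed from a fitted scikit-learn SVC: its dual_coef_ attribute stores the products α_i y_i for the support vectors, and intercept_ stores b.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Toy binary dataset (illustrative only)
X, y = make_blobs(n_samples=40, centers=2, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)

# Recompute f(x) = sum_i alpha_i * y_i * K(x_i, x) + b over the support vectors.
K = rbf_kernel(clf.support_vectors_, X, gamma=0.5)   # K(x_i, x) for all x
f_manual = clf.dual_coef_ @ K + clf.intercept_       # dual_coef_ holds alpha_i * y_i

print(np.allclose(f_manual.ravel(), clf.decision_function(X)))  # True
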
Figure 1: Linear SVC and SVC with kernels

6 Common Kernel Functions

6.1 Linear Kernel

\[
K(x_i, x_j) = x_i \cdot x_j
\]
6.2 Polynomial Kernel (degree p)

\[
K(x_i, x_j) = (x_i \cdot x_j + 1)^{p}
\]

6.3 Radial Basis Function (RBF) Kernel


\[
K(x_i, x_j) = \exp\!\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right)
\]

where σ is a free parameter that determines the spread of the kernel.

6.4 Sigmoid Kernel


\[
K(x_i, x_j) = \tanh(\kappa\, x_i \cdot x_j + \theta)
\]
where κ and θ are parameters.
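
For reference, the sketch below implements the four kernels directly in NumPy and cross-checks them against scikit-learn's pairwise kernel functions; the toy matrix and the parameter values (p = 2, σ = 1.5, κ = 0.5, θ = −1) are arbitrary choices for illustration, not values from these notes.

import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                 # 5 toy points in R^3

# Linear kernel: K = X X^T
K_lin = X @ X.T

# Polynomial kernel of degree p = 2: K = (X X^T + 1)^p
K_poly = (X @ X.T + 1) ** 2

# RBF kernel: K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)); scikit-learn's gamma = 1/(2 sigma^2)
sigma = 1.5
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_rbf = np.exp(-sq_dists / (2 * sigma ** 2))

# Sigmoid kernel: K_ij = tanh(kappa * x_i . x_j + theta)
kappa, theta = 0.5, -1.0
K_sig = np.tanh(kappa * (X @ X.T) + theta)

# Cross-check against scikit-learn
print(np.allclose(K_lin,  linear_kernel(X)))
print(np.allclose(K_poly, polynomial_kernel(X, degree=2, gamma=1.0, coef0=1.0)))
print(np.allclose(K_rbf,  rbf_kernel(X, gamma=1.0 / (2 * sigma ** 2))))
print(np.allclose(K_sig,  sigmoid_kernel(X, gamma=kappa, coef0=theta)))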

7 Visualizing the Effect of Kernels


The following diagrams illustrate the effect of different kernels in transforming non-linearly separable
data:

• Linear Kernel: A simple hyperplane separation (useful for linearly separable data).
• Polynomial Kernel: Creates complex decision boundaries that can capture curved relationships.
• RBF Kernel: Suitable for highly non-linear relationships, producing circular boundaries.

8 Plotting Different SVM Classifiers in the Iris Dataset
We provide a comparison of different linear SVM classifiers on a 2D projection of the Iris dataset. For
simplicity, we consider only the first two features of the dataset:
• Sepal length

• Sepal width
This example demonstrates how to plot the decision surface for four different SVM classifiers, each
with a distinct kernel type.

8.1 SVM Classifiers and Kernels


The linear models LinearSVC() and SVC(kernel='linear') produce slightly different decision boundaries. These differences may arise due to:
• Loss Function: LinearSVC minimizes the squared hinge loss, while SVC minimizes the regular
hinge loss.

• Multiclass Strategy: LinearSVC employs a One-vs-All (or One-vs-Rest) approach for multiclass
classification, whereas SVC uses a One-vs-One approach.
Both linear models create linear decision boundaries (intersecting hyperplanes). In contrast, non-linear kernel models, such as the polynomial or Gaussian RBF kernels, produce flexible, non-linear boundaries shaped by the kernel type and its parameters.

Note
Visualizing decision functions of classifiers on 2D datasets can provide insights into their expressive power. However, these intuitions may not generalize to high-dimensional problems.
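
A minimal sketch of the comparison described in this section is given below. It follows the usual scikit-learn layout; the specific γ values, the max_iter setting and the plotting details are assumptions of this sketch rather than values taken from these notes.

import matplotlib.pyplot as plt
from sklearn import datasets, svm
from sklearn.inspection import DecisionBoundaryDisplay

# First two features of the Iris dataset: sepal length and sepal width
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

C = 1.0  # regularization parameter
models = {
    "LinearSVC": svm.LinearSVC(C=C, max_iter=10000),
    "SVC with linear kernel": svm.SVC(kernel="linear", C=C),
    "SVC with RBF kernel": svm.SVC(kernel="rbf", gamma=0.7, C=C),
    "SVC with polynomial (degree 3) kernel": svm.SVC(kernel="poly", degree=3, gamma="auto", C=C),
}

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, (title, clf) in zip(axes.ravel(), models.items()):
    clf.fit(X, y)
    # Color the plane by predicted class for each fitted model
    DecisionBoundaryDisplay.from_estimator(
        clf, X, response_method="predict", alpha=0.6, ax=ax,
        xlabel="Sepal length", ylabel="Sepal width")
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
    ax.set_title(title)
plt.tight_layout()
plt.show()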

9 Example: Plot classification boundaries with different SVM Kernels
This example shows how different kernels in an SVC (Support Vector Classifier) influence the classification boundaries in a binary, two-dimensional classification problem.
SVCs aim to find a hyperplane that effectively separates the classes in their training data by maximizing the margin between the outermost data points of each class. This is achieved by finding the best weight vector that defines the decision boundary hyperplane and minimizes the sum of hinge losses for misclassified samples, as measured by the hinge loss function. By default, regularization is applied with the parameter C=1, which allows for a certain degree of misclassification tolerance.

If the data is not linearly separable in the original feature space, a non-linear kernel parameter can be set. Depending on the kernel, the process involves adding new features or transforming existing features to enrich and potentially add meaning to the data. When a kernel other than "linear" is set, the SVC applies the kernel trick, which computes the similarity between pairs of data points using the kernel function without explicitly transforming the entire dataset. The kernel trick thus avoids the otherwise necessary explicit transformation of the whole dataset by considering only the relations between pairs of data points. The kernel function maps two vectors (each pair of observations) to their similarity using their dot product.

The hyperplane can then be calculated using the kernel function as if the dataset were represented in a higher-dimensional space. Using a kernel function instead of an explicit matrix transformation improves performance, as the kernel function has a time complexity of O(n²), whereas the matrix transformation scales according to the specific transformation being applied.
In this example, we compare the most common kernel types of Support Vector Machines: the linear kernel ("linear"), the polynomial kernel ("poly"), the radial basis function kernel ("rbf") and the sigmoid kernel ("sigmoid").

9.1 Creating a dataset
We create a two-dimensional classification dataset with 16 samples and two classes. We plot the samples
with the colors matching their respective targets.


Figure 2: script of SVC
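
Since the script itself only appears as an image above, here is a minimal sketch of the dataset-creation step under the assumptions stated in the text (16 samples, two classes, two features). The exact coordinates used in the original script are not recoverable from the figure, so comparable data is generated instead.

import matplotlib.pyplot as plt
import numpy as np

# A small two-class, two-dimensional dataset with 16 samples (8 per class).
rng = np.random.default_rng(42)
X = np.concatenate([
    rng.normal(loc=[-1.0, 0.5], scale=0.8, size=(8, 2)),
    rng.normal(loc=[1.0, -0.5], scale=0.8, size=(8, 2)),
])
y = np.array([0] * 8 + [1] * 8)

# Plot the samples with colors matching their respective targets
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, edgecolors="k")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("Samples in two-dimensional feature space")
plt.show()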

The SVM script for the different kernels can be downloaded from the linked website.

Figure 3: Samples in the two-dimensional feature space

We can see that the samples are not clearly separable by a straight line.

9.2 Training SVC model and plotting decision boundaries
We define a function that fits an SVC classifier, taking the kernel parameter as an input, and then plots the decision boundaries learned by the model using DecisionBoundaryDisplay.
Notice that for the sake of simplicity, the C parameter is set to its default value (C = 1) in this
example and the γ parameter is set to γ = 2 across all kernels, although it is automatically ignored for
the linear kernel. In a real classification task, where performance matters, parameter tuning (by using
GridSearchCV for instance) is highly recommended to capture different structures within the data.

Setting response_method="predict" in DecisionBoundaryDisplay colors the areas based on their predicted class. Using response_method="decision_function" allows us to also plot the decision boundary and the margins on both sides of it. Finally, the support vectors used during training (which always lie on the margins) are identified by means of the support_vectors_ attribute of the trained SVCs, and plotted as well.
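
A possible sketch of such a plotting function is shown below; it assumes the X and y arrays from the dataset created above, and the figure size, colors and marker styling are arbitrary choices of this sketch.

import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.inspection import DecisionBoundaryDisplay

def plot_training_data_with_decision_boundary(kernel, X, y):
    # Fit an SVC with the requested kernel (C = 1 by default, gamma = 2 as in the text)
    clf = svm.SVC(kernel=kernel, gamma=2).fit(X, y)

    fig, ax = plt.subplots(figsize=(5, 4))

    # Shade the areas according to the predicted class
    DecisionBoundaryDisplay.from_estimator(
        clf, X, response_method="predict",
        plot_method="pcolormesh", alpha=0.3, ax=ax)

    # Draw the decision boundary (level 0) and the margins (levels -1 and +1)
    DecisionBoundaryDisplay.from_estimator(
        clf, X, response_method="decision_function",
        plot_method="contour", levels=[-1, 0, 1],
        colors=["k", "k", "k"], linestyles=["--", "-", "--"], ax=ax)

    # Highlight the support vectors, which lie on the margins
    ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
               s=150, facecolors="none", edgecolors="k")
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, edgecolors="k")
    ax.set_title(f"Decision boundaries of {kernel} kernel in SVC")
    plt.show()

# Usage, e.g. for the linear kernel:
# plot_training_data_with_decision_boundary("linear", X, y)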


Figure 4: Decision boundaries of linear kernel in SVC



Training an SVC with a linear kernel works in the untransformed feature space, where the hyperplane and the margins are straight lines. Due to the lack of expressivity of the linear kernel, the trained classifier does not perfectly capture the training data.

9.3 Polynomial kernel


The polynomial kernel changes the notion of similarity. The variable d is the degree of the polynomial (the degree parameter), γ controls the influence of each individual training sample on the decision boundary, and r is the bias term (coef0) that shifts the data up or down. Here, we use the default value for the degree of the polynomial in the kernel function (degree = 3). When coef0 = 0 (the default), the data is only transformed, but no additional dimension is added. Using a polynomial kernel is equivalent to creating PolynomialFeatures and then fitting an SVC with a linear kernel on the transformed data, although this alternative approach would be computationally expensive for most datasets.
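
The sketch below contrasts the two routes just mentioned; note that it is only an approximation of the equivalence, since the explicitly generated monomials are not weighted exactly as the kernel weights them, so the learned boundaries need not coincide perfectly. The parameter values are illustrative.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVC

# Kernelized version: degree-3 polynomial kernel, as used in this section
poly_kernel_svc = SVC(kernel="poly", degree=3, gamma=2, coef0=0)

# Explicit-feature version: expand the features first, then use a linear kernel.
# Both models work with degree-3 polynomial decision functions, but the explicit
# expansion is much more expensive for large datasets.
explicit_poly_svc = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    SVC(kernel="linear"),
)

# Usage, assuming X and y from the dataset created earlier:
# poly_kernel_svc.fit(X, y)
# explicit_poly_svc.fit(X, y)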

Figure 5: Decision boundaries of poly kernel in SVC

The polynomial kernel with γ = 2 adapts well to the training data, causing the margins on both sides
of the hyperplane to bend accordingly.

9.4 RBF kernel


The radial basis function (RBF) kernel, also known as the Gaussian kernel, is the default kernel for Support Vector Machines in scikit-learn. It measures similarity between two data points in infinite dimensions and then approaches classification by majority vote.

Figure 6: Decision boundaries of RBF kernel in SVC

The variable γ controls the influence of each individual training sample on the decision boundary.
The larger the Euclidean distance between two points, the closer the kernel function is to zero. This means that two points that are far apart are more likely to be considered dissimilar.
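
A quick numeric sketch of this decay, with γ = 2 as in the text (the sample distances are arbitrary):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Kernel value between the origin and points at increasing Euclidean distance.
# Larger distances push K towards zero, i.e. distant points count as dissimilar.
origin = np.zeros((1, 2))
distances = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
points = np.column_stack([distances, np.zeros_like(distances)])

for d, k in zip(distances, rbf_kernel(origin, points, gamma=2).ravel()):
    print(f"distance = {d:3.1f}  ->  K = {k:.6f}")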

9.5 Sigmoid kernel
The kernel coefficient γ controls the influence of each individual training sample on the decision boundary, and r is the bias term (coef0) that shifts the data up or down.
In the sigmoid kernel, the similarity between two data points is computed using the hyperbolic tangent function (tanh). The kernel function scales and possibly shifts the dot product of the two points (x_1 and x_2).

Figure 7: Decision boundaries of sigmoid kernel in SVC

We can see that the decision boundaries obtained with the sigmoid kernel appear curved and irregular. The decision boundary tries to separate the classes by fitting a sigmoid-shaped curve, resulting in a complex boundary that may not generalize well to unseen data.
From this example it becomes clear that the sigmoid kernel has very specific use cases, namely data that exhibits a sigmoidal shape. In this example, careful fine-tuning might find more generalizable decision boundaries. Because of its specificity, the sigmoid kernel is less commonly used in practice compared to other kernels.

10 Conclusion
In this example, we have visualized the decision boundaries trained with the provided dataset. The plots
serve as an intuitive demonstration of how different kernels utilize the training data to determine the
classification boundaries.
The hyperplanes and margins, although computed indirectly, can be imagined as planes in the transformed feature space. However, in the plots, they are represented relative to the original feature space, resulting in curved decision boundaries for the polynomial, RBF, and sigmoid kernels.
Note that the plots do not evaluate the individual kernels' accuracy or quality. They are intended to provide a visual understanding of how the different kernels use the training data.
For a comprehensive evaluation, fine-tuning of the SVC parameters using techniques such as GridSearchCV is recommended to capture the underlying structures within the data.
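
A minimal sketch of such a search over the four kernels is given below; the parameter ranges are assumptions made for illustration, not values recommended in these notes.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative parameter grid covering the four kernels compared above
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["poly"], "C": [0.1, 1, 10], "degree": [2, 3], "coef0": [0, 1]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [0.1, 1, 2]},
    {"kernel": ["sigmoid"], "C": [0.1, 1, 10], "gamma": [0.1, 1, 2], "coef0": [-1, 0, 1]},
]

search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")

# Usage, assuming X and y from the dataset created earlier:
# search.fit(X, y)
# print(search.best_params_, search.best_score_)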
