Module 10 - Support Vector Machine
Some of the points are allowed to be on the wrong side of the hyperplane
Support Vector Classifier
• Support vector classifier (SVC) is also known as soft margin classifier.
• This allows some observations to be on the incorrect side of the margin, which provides:
1) robustness to individual observations, and
2) better classification of most of the training observations.
• SVC is the solution to the following optimization problem:
$$\underset{\beta_0,\beta_1,\ldots,\beta_p,\,\epsilon_1,\ldots,\epsilon_n}{\text{maximize}}\; M$$
subject to
$$\sum_{j=1}^{p} \beta_j^2 = 1,$$
$$y_i\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}\right) \ge M(1 - \epsilon_i),$$
$$\epsilon_i \ge 0, \qquad \sum_{i=1}^{n} \epsilon_i \le C,$$
where $C$ is a nonnegative tuning parameter, $M$ is the width of the margin, and $\epsilon_1, \ldots, \epsilon_n$ are slack variables.
• $C$ is a budget for the number and severity of violations to the margin that will be tolerated.
• Observations that lie directly on the margin, or on the wrong side of the margin for their class, are known as support vectors.
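As a concrete illustration, here is a minimal sketch of fitting a soft margin classifier with scikit-learn; the toy data and the chosen parameter value are assumptions for illustration only. Note that scikit-learn's C penalizes margin violations, so it acts inversely to the budget $C$ in the formulation above.

```python
# Minimal sketch: fitting a soft margin classifier on toy data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian classes, so perfect separation is impossible
# and some points must be allowed on the wrong side of the margin.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Caution: scikit-learn's C is a penalty on margin violations, so it is
# inversely related to the budget C in the optimization above.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Points on the margin or on the wrong side of it are the support vectors.
print("support vectors per class:", clf.n_support_)
```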
Support Vector Classifier with Linear Kernel
Support Vector Classifier
• The constraint of maximizing the margin of the line that separates the classes must be relaxed. This is often called the soft margin classifier.
• An additional set of coefficients is introduced that allows the margin to move in each dimension. These coefficients are sometimes called slack variables ($\epsilon_n$).
• The tuning parameter $C$ defines the amount of violation of the margin that is allowed.
Value of $C$ | Physical significance | Variance | Bias
Small | Narrow margin that is rarely violated; the classifier is highly fit to the training data | High | Low
Large | Wider margin that permits more violations; the fit to the training data is looser | Low | High
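Since the right trade-off is data-dependent, $C$ is usually chosen by cross-validation. A minimal sketch, with synthetic data and an illustrative candidate grid:

```python
# Sketch: choosing C by 5-fold cross-validation on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# Each candidate value trades bias against variance; cross-validation
# picks the value that does best on held-out folds.
search = GridSearchCV(SVC(kernel="linear"),
                      {"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)
print("best C:", search.best_params_["C"])
```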
• The inner product of two observations is:
$$\langle x_i, x_{i'} \rangle = \sum_{j=1}^{p} x_{ij} x_{i'j}$$
• The linear support vector classifier can be represented as:
$$f(x) = \beta_0 + \sum_{i \in S} \alpha_i \langle x, x_i \rangle$$
• where $S$ is the collection of support vectors and $\alpha_i$, $i = 1, \ldots, n$, are parameters, one per training observation.
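This representation can be checked numerically: scikit-learn stores the fitted coefficients $\alpha_i$ (signed by class label) in dual_coef_ and the support vectors in support_vectors_, so $f(x)$ can be rebuilt by hand. A sketch on synthetic data:

```python
# Sketch: rebuilding f(x) = beta_0 + sum_{i in S} alpha_i <x, x_i> by hand.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

x_new = X[0]
# dual_coef_ holds alpha_i (signed by class) for each support vector in S;
# support_vectors_ @ x_new gives the inner product <x_new, x_i> for each.
f_manual = clf.intercept_[0] + np.sum(
    clf.dual_coef_[0] * (clf.support_vectors_ @ x_new))
f_builtin = clf.decision_function(x_new.reshape(1, -1))[0]
print(np.isclose(f_manual, f_builtin))  # True: both give the same f(x)
```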
Types of Kernel
• The inner product in the support vector classifier can be generalized to the form $K(x_i, x_{i'})$,
• where $K$ is a kernel function that quantifies the similarity of two observations.
• The linear kernel is written as:
$$K(x_i, x_{i'}) = \sum_{j=1}^{p} x_{ij} x_{i'j}$$
• Polynomial kernel of degree $d$:
$$K(x_i, x_{i'}) = \left(1 + \sum_{j=1}^{p} x_{ij} x_{i'j}\right)^d$$
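As a quick check, the polynomial kernel can be computed by hand and compared against scikit-learn's polynomial_kernel; the vectors are toy values, and setting gamma=1, coef0=1 recovers the form above.

```python
# Sketch: polynomial kernel of degree d computed two ways.
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])
d = 3

# K(x, z) = (1 + sum_j x_j z_j)^d
k_manual = (1 + x @ z) ** d
# gamma=1, coef0=1 matches the form (1 + <x, z>)^d above
k_sklearn = polynomial_kernel(x.reshape(1, -1), z.reshape(1, -1),
                              degree=d, gamma=1, coef0=1)[0, 0]
print(np.isclose(k_manual, k_sklearn))  # True
```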
• When the support vector classifier is combined with a non-linear kernel, the resulting classifier is known as a support vector machine:
$$f(x) = \beta_0 + \sum_{i \in S} \alpha_i K(x, x_i)$$
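The same hand reconstruction as in the linear case works with a non-linear kernel; the sketch below rebuilds $f(x)$ for an RBF-kernel SVM from its stored support vectors (synthetic data and $\gamma$ value are illustrative).

```python
# Sketch: rebuilding f(x) = beta_0 + sum_{i in S} alpha_i K(x, x_i)
# for an RBF-kernel SVM.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.2, random_state=0)
clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)

x_new = X[0].reshape(1, -1)
# K(x_new, x_i) for every support vector x_i in S
k = rbf_kernel(x_new, clf.support_vectors_, gamma=1.0)[0]
f_manual = clf.intercept_[0] + np.sum(clf.dual_coef_[0] * k)
print(np.isclose(f_manual, clf.decision_function(x_new)[0]))  # True
```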
• Radial kernel:
$$K(x_i, x_{i'}) = \exp\left(-\gamma \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2\right)$$
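A sketch evaluating the radial kernel directly, then using it to separate one class that encircles another, a boundary no linear classifier can draw (toy data and $\gamma$ value are illustrative):

```python
# Sketch: the radial kernel by hand, then an RBF SVM on ring-shaped data.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

x = np.array([1.0, 2.0])
z = np.array([0.0, 1.0])
gamma = 0.5
# K(x, z) = exp(-gamma * sum_j (x_j - z_j)^2)
print(np.exp(-gamma * np.sum((x - z) ** 2)))

# One class forms a ring around the other; a linear boundary cannot
# separate them, but the radial kernel can.
X, y = make_circles(n_samples=200, noise=0.1, factor=0.3, random_state=0)
clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
print("training accuracy:", clf.score(X, y))
```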
SVM
[Figure: SVM decision boundary compared with the Bayes decision boundary]