Session 19 - SVM
Functions, Derivatives & Convexity
[Figure: plots illustrating functions, their derivatives, and convex vs. non-convex shapes. Source: Internet]
Local & Global – Minimum, Maximum; Saddle point
Search techniques for locating optima include the following (a minimal sketch of one appears after the list):
• Brute force search
• Gradient Descent Search
• Genetic algorithm based approaches
• Evolutionary computation techniques
• Tabu Search
• Simulated Annealing
• Hill Climbing techniques
• Ant colony optimization
• Particle swarm optimization
• Random forest optimization
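To make one of these concrete, here is a minimal sketch of simulated annealing minimizing a one-dimensional function; the objective, neighborhood step, and cooling schedule are illustrative assumptions, not taken from the slides.

import math
import random

def f(x):
    # Toy objective with several local minima (illustrative assumption)
    return x ** 2 + 10 * math.sin(x)

def simulated_annealing(f, x0, temp=10.0, cooling=0.99, steps=5000):
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    for _ in range(steps):
        x_new = x + random.uniform(-1.0, 1.0)  # propose a random neighbor
        fx_new = f(x_new)
        delta = fx_new - fx
        # Always accept improvements; accept worse moves with probability
        # exp(-delta / temp), which lets the search escape local minima
        if delta < 0 or random.random() < math.exp(-delta / temp):
            x, fx = x_new, fx_new
            if fx < best_fx:
                best_x, best_fx = x, fx
        temp *= cooling  # cool down gradually
    return best_x, best_fx

print(simulated_annealing(f, x0=5.0))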
Notes on Gradient Descent Algorithm
• The error is summed over all inputs, and then the weights are updated
• A linear (pass-through) activation function is used → f(x) = x
• Assumes a convex error space
• If there are local minima, the search may get stuck and never come out
• Since the error is summed over all inputs, convergence may be slow
• Incremental (stochastic) gradient descent is a variation of the algorithm (contrasted in the sketch after this list)
✓ The weight update is done with each input
✓ This sometimes helps in overcoming local minima
• The same principle can be applied with other activation functions as well; however, a mathematical proof of convergence may be difficult
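The contrast between the batch update (error summed over all inputs) and the incremental/stochastic update (weights adjusted after each input) can be sketched as follows for a linear, pass-through unit; the toy data and learning rates are assumptions for illustration.

import numpy as np

def batch_gd(X, y, lr=0.1, epochs=200):
    # Batch mode: the gradient is summed over ALL inputs
    # before a single weight update
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def stochastic_gd(X, y, lr=0.01, epochs=50):
    # Incremental/stochastic mode: the weights are updated after EACH
    # input, which sometimes helps in overcoming local minima
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in np.random.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

# Toy data for a linear unit, y = 2*x1 - 3*x2 plus noise (assumed)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0]) + rng.normal(scale=0.1, size=200)
print(batch_gd(X, y))       # both should approach [2, -3]
print(stochastic_gd(X, y))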
Support Vector Machines
Classes & Boundaries
Linear binary classifier
[Figure: two classes of points in the $(x_1, x_2)$ plane, separated by the line $\bar{w} \cdot \bar{x} + b = 0$; the weight vector $\bar{w}$ is normal to the line]
The classifier maps an input $\bar{x}$ through $f$ to a predicted label $y'$:
$y' = f(\bar{x}; \bar{w}, b) = \mathrm{sign}(\bar{w} \cdot \bar{x} + b)$
$\bar{w} \cdot \bar{x} + b > 0$ denotes the +ve class
$\bar{w} \cdot \bar{x} + b < 0$ denotes the -ve class
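A minimal sketch of this decision rule in code; the weight vector and bias below are assumed values for illustration, not learned parameters.

import numpy as np

def predict(x, w, b):
    # Linear binary classifier: y' = sign(w . x + b)
    return np.sign(np.dot(w, x) + b)

# Assumed (not learned) parameters, purely for illustration
w = np.array([1.0, -2.0])
b = 0.5

print(predict(np.array([3.0, 1.0]), w, b))  # w.x + b = 1.5 > 0  -> +1.0
print(predict(np.array([0.0, 1.0]), w, b))  # w.x + b = -1.5 < 0 -> -1.0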
Support Vectors, Gutters, Separating Hyperplane & Margin
[Figure: two classes in the $(x_1, x_2)$ plane. The separating hyperplane lies midway between two parallel gutters; the gutters pass through the support vectors (point A among them), and the distance between the gutters is the margin]
Linear binary classifier
[Figure: the separating hyperplane $\bar{w} \cdot \bar{x} + b = 0$ with support vectors A and B on the +ve gutter and C on the -ve gutter; +ve and -ve classes on either side]
$y' = f(\bar{x}; \bar{w}, b) = \mathrm{sign}(\bar{w} \cdot \bar{x} + b)$
For the support vectors:
$\bar{w} \cdot \bar{x}_A + b = +1$
$\bar{w} \cdot \bar{x}_B + b = +1$
$\bar{w} \cdot \bar{x}_C + b = -1$
Generalization:
$\bar{w} \cdot \bar{x}_{+ve} + b \ge +1$
$\bar{w} \cdot \bar{x}_{-ve} + b \le -1$
Linear binary classifier
[Figure: the same setup as above, with the margin M measured between the gutters through support vectors A and C]
$y' = f(\bar{x}; \bar{w}, b) = \mathrm{sign}(\bar{w} \cdot \bar{x} + b)$
Subtracting the gutter equations for A and C:
$M \cdot |\bar{w}| = (\bar{w} \cdot \bar{x}_A + b) - (\bar{w} \cdot \bar{x}_C + b) = (+1) - (-1) = 2$
$M = \dfrac{2}{|\bar{w}|}$
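The relation $M = 2 / |\bar{w}|$ can be checked numerically with scikit-learn; the tiny two-blob dataset and the large C value (to approximate a hard margin) are assumptions for this sketch.

import numpy as np
from sklearn.svm import SVC

# Two linearly separable blobs (assumed toy data)
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]                    # learned weight vector w
margin = 2.0 / np.linalg.norm(w)    # M = 2 / |w|
print("w =", w, ", b =", clf.intercept_[0], ", margin =", margin)
print("support vectors:\n", clf.support_vectors_)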
Optimization for SVM
Maximizing the margin $M = \dfrac{2}{|\bar{w}|}$ is equivalent to minimizing $|\bar{w}|$. The class constraints $\bar{w} \cdot \bar{x}_{+ve} + b \ge +1$ and $\bar{w} \cdot \bar{x}_{-ve} + b \le -1$ combine into a single form: $(\bar{w} \cdot \bar{x}_i + b)\, y_i \ge +1$.
Formulate the optimization problem & constraints:
minimize $\varphi(\bar{w}) = \tfrac{1}{2}\, (\bar{w}^T \cdot \bar{w})$
subject to $(\bar{w} \cdot \bar{x}_i + b)\, y_i \ge +1$ for all $i$
The resulting classifier:
$y' = f(\bar{x}) = \mathrm{sign}\left(\sum_i \alpha_i y_i \, (\bar{x}_i^T \cdot \bar{x}) + b\right)$
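The dual-form classifier above can be evaluated by hand from a fitted model: scikit-learn's dual_coef_ stores the products $\alpha_i y_i$ for each support vector, so the sum $\sum_i \alpha_i y_i (\bar{x}_i^T \cdot \bar{x}) + b$ is directly computable. The small dataset below is an assumption for illustration.

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 0], [2, 3], [4, 4], [5, 3]], dtype=float)
y = np.array([-1, -1, 1, 1, 1])
clf = SVC(kernel="linear", C=10.0).fit(X, y)

x_test = np.array([3.0, 2.0])
dual = clf.dual_coef_[0]     # alpha_i * y_i, one per support vector
sv = clf.support_vectors_    # the support vectors x_i
decision = dual @ (sv @ x_test) + clf.intercept_[0]

# The hand-computed sign should match the library's prediction
print(np.sign(decision), clf.predict([x_test])[0])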
SVM
$y' = f(\bar{x}) = \mathrm{sign}\left(\sum_i \alpha_i y_i \, (\bar{x}_i^T \cdot \bar{x}) + b\right)$
A kernel function is a function that maintains the sanctity of the dot product in some expanded feature space: replacing $\bar{x}_i^T \cdot \bar{x}$ with a kernel $K(\bar{x}_i, \bar{x})$ evaluates the dot product in that expanded space without computing it explicitly, yielding a classifier that is non-linear in the original space.
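A quick way to see the kernel at work: data that no line can separate in the original space becomes separable once the dot product is replaced by an RBF kernel. The dataset and the specific kernels compared are assumptions for this sketch.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)   # dot product taken in an expanded space

print("linear kernel accuracy:", linear.score(X, y))
print("rbf kernel accuracy:", rbf.score(X, y))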
Multiclass Scenario
[Figure: points from more than two classes in the $(x_1, x_2)$ plane]
One against another approach
[Figure: three plots in the $(x_1, x_2)$ plane, one per pair of classes; a separate binary SVM is trained for each pair]
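A minimal one-against-another (one-vs-one) sketch with scikit-learn; the three-blob dataset is an assumption. With 3 classes, 3 pairwise SVMs are trained: (0 vs 1), (0 vs 2), (1 vs 2).

from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = make_blobs(n_samples=150, centers=3, random_state=0)
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)

print(len(ovo.estimators_), "pairwise models")  # 3 for 3 classes
print(ovo.predict(X[:5]))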
One vs all others approach
[Figure: five fruit classes (tomato, kiwi, apple, among others) in the $(x_1, x_2)$ plane]
• Consider each class separately against all other classes combined
✓ Tomato against all others
✓ Kiwi against all others
✓ Apples against all others
• As many SVM models are generated as there are classes (5 models in this example)
• The test pattern is run through each model
✓ Score of being an apple
✓ Score of being a tomato
• The class value is assigned based on the maximum membership score (see the sketch after this list)
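A minimal one-vs-all-others sketch with scikit-learn, using five synthetic blobs standing in for the five fruit classes (an assumption); each of the 5 models scores the test pattern, and the class with the maximum score wins.

from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = make_blobs(n_samples=250, centers=5, random_state=0)
ova = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

print(len(ova.estimators_), "one-vs-rest models")  # 5, one per class
scores = ova.decision_function(X[:1])              # one score per class
print(scores, "->", ova.predict(X[:1]))            # argmax wins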
Python Code
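Below is a minimal end-to-end example consistent with this session's material, training an SVM on a standard dataset; the dataset, kernel, and hyperparameters are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load a standard multiclass dataset and split it
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Train an SVM with an RBF kernel (assumed settings)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))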
Thank you !!!!!