Session 19 - SVM

1) The document reviews optimization techniques, such as gradient descent and genetic algorithms, for minimizing error functions, and then introduces support vector machines (SVMs) for classification. 2) SVMs find the separating hyperplane that maximizes the margin between two classes of data points; the support vectors are the data points closest to the hyperplane. 3) The SVM optimization problem is to maximize the width of the margin (equivalently, minimize the norm of the weight vector) while ensuring the data points are classified correctly. Kernels are introduced to map the data to higher dimensions, allowing nonlinear decision boundaries.


Machine Learning (19CSE305)

Error Surface, Parameter Optimization & SVM

Dr. Peeta Basa Pati


Ms. Priyanka V
Department of Computer Science & Engineering,
Amrita School of Engineering, Bengaluru
Topics
• Recap of Optimization
• Support Vectors

Functions, Derivatives & Convexity

[Figure: plots illustrating functions, their derivatives, and convexity. Source: Internet]
Local & Global – Minimum, Maximum; Saddle point
• Brute force search
• Gradient descent search
• Genetic algorithm based approaches
• Evolutionary computation techniques
• Tabu search
• Simulated annealing
• Hill climbing techniques
• Ant colony optimization
• Particle swarm optimization
• Random forest optimization

Source: Engineering Optimization, S. S. Rao
Derivation of Gradient Descent Algorithm

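The derivation on this slide appears in the original deck as an image that did not survive text extraction. Below is a reconstruction of the standard least-squares derivation, consistent with the notes on the next slide (error summed over all inputs, linear activation); the symbols $\eta$ (learning rate), $t_d$ (target for input $d$), and $o_d = \bar{w} \cdot \bar{x}_d$ (output) follow the usual convention and are my assumption, not read off the slide:

$$E(\bar{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$$

$$\frac{\partial E}{\partial w_i} = \frac{1}{2} \sum_{d \in D} 2\,(t_d - o_d)\,\frac{\partial}{\partial w_i}\big(t_d - \bar{w} \cdot \bar{x}_d\big) = \sum_{d \in D} (t_d - o_d)\,(-x_{i,d})$$

$$\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i} = \eta \sum_{d \in D} (t_d - o_d)\, x_{i,d}$$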
Notes on Gradient Descent Algorithm
• Error is summed over all inputs, and then the weights are updated
• A linear (pass-through) activation function is used → f(x) = x
• Assumes a convex error space
• If there are local minima, the search may get stuck and never come out
• Since the error is summed over all inputs, convergence may be slow
• Incremental or stochastic gradient descent is a variation of the algorithm (see the sketch below)
  ✓ Weights are updated with each input
  ✓ This sometimes helps in overcoming local minima
• The same principle can be applied with other activation functions as well; however, a mathematical proof of convergence may be difficult.
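To make the batch vs. stochastic distinction above concrete, here is a minimal NumPy sketch for a linear unit with squared error; the function names, learning rate lr, and epoch count n_epochs are illustrative choices, not taken from the slide:

import numpy as np

def batch_gradient_descent(X, t, lr=0.01, n_epochs=100):
    # X: (n_samples, n_features), t: targets. Linear activation: o = X @ w
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        o = X @ w
        grad = -(t - o) @ X          # error summed over ALL inputs
        w -= lr * grad               # one weight update per epoch
    return w

def stochastic_gradient_descent(X, t, lr=0.01, n_epochs=100):
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x_d, t_d in zip(X, t):   # weight update with EACH input
            o_d = x_d @ w
            w += lr * (t_d - o_d) * x_d
    return w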

Support Vector Machines

Classes & Boundaries

Linear binary classifier

[Figure: two classes of points in the $(x_1, x_2)$ plane, with $\bar{x} = (x_1, x_2)$, separated by the line $\bar{w} \cdot \bar{x} + b = 0$; the weight vector $\bar{w}$ is normal to the line. A block diagram shows the input $\bar{x}$ passing through $f$ to produce the label $y'$.]

$$y' = f(\bar{x}; \bar{w}, b) = \mathrm{sign}(\bar{w} \cdot \bar{x} + b)$$

Points with $\bar{w} \cdot \bar{x} + b > 0$ denote the +ve class; points with $\bar{w} \cdot \bar{x} + b < 0$ denote the -ve class.
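This decision rule is a single line of NumPy; a sketch with illustrative names:

import numpy as np

def predict(X, w, b):
    # y' = sign(w . x + b), applied to each row x of X
    return np.sign(X @ w + b)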
Support Vectors, Gutters, Separating Hyperplane & Margin

[Figure: two classes of points in the $(x_1, x_2)$ plane. The separating hyperplane runs midway between two parallel lines, the gutters, which pass through the points of each class closest to the boundary (A and B from the +ve class, C from the -ve class). These closest points are the support vectors, and the perpendicular distance between the gutters is the margin.]
Linear binary classifier

$$y' = f(\bar{x}; \bar{w}, b) = \mathrm{sign}(\bar{w} \cdot \bar{x} + b)$$

For the support vectors lying on the gutters:

$$\bar{w} \cdot \bar{x}_A + b = +1 \qquad \bar{w} \cdot \bar{x}_B + b = +1 \qquad \bar{w} \cdot \bar{x}_C + b = -1$$

Generalization to all training points:

$$\bar{w} \cdot \bar{x}_{+ve} + b \ge +1 \qquad\qquad \bar{w} \cdot \bar{x}_{-ve} + b \le -1$$
Linear binary classifier

With the gutter equations $\bar{w} \cdot \bar{x}_A + b = +1$, $\bar{w} \cdot \bar{x}_B + b = +1$, and $\bar{w} \cdot \bar{x}_C + b = -1$, projecting the gap between the gutters onto the unit normal $\bar{w}/|\bar{w}|$ gives the margin width $M$:

$$M\,|\bar{w}| = (\bar{w} \cdot \bar{x}_A + b) - (\bar{w} \cdot \bar{x}_C + b) = (+1) - (-1) = 2$$

$$M = \frac{2}{|\bar{w}|}$$
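As a quick numeric check (the numbers are illustrative, not from the slide): if training produced $\bar{w} = (3, 4)$, then $|\bar{w}| = \sqrt{3^2 + 4^2} = 5$, so the margin is $M = 2/5 = 0.4$.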
Optimization for SVM

Constraints from the gutters, and the margin from the previous slide:

$$\bar{w} \cdot \bar{x}_{+ve} + b \ge +1 \qquad \bar{w} \cdot \bar{x}_{-ve} + b \le -1 \qquad M = \frac{2}{|\bar{w}|}$$

Multiply each inequality by $y_i$, where $y_i = +1$ for all +ve class vectors and $y_i = -1$ for all -ve class vectors. Both inequalities collapse into one:

$$(\bar{w} \cdot \bar{x}_i + b)\, y_i \ge +1$$

We want to maximize the margin M:
⇒ Minimize $|\bar{w}|$
⇒ Minimize $\bar{w}^T \bar{w}$

Formulate the optimization problem & constraints:

$$\text{minimize } \varphi(\bar{w}) = \tfrac{1}{2}\,\bar{w}^T \bar{w} \quad \text{subject to } (\bar{w} \cdot \bar{x}_i + b)\, y_i \ge +1 \text{ for all } i$$

Its solution yields the classifier

$$y' = f(\bar{x}) = \mathrm{sign}\Big(\sum_i \alpha_i\, y_i\, \bar{x}_i^T \bar{x} + b\Big)$$
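The primal problem above can be handed to an off-the-shelf convex solver. A minimal sketch using cvxpy (my choice of library, not named in the deck), assuming linearly separable data so the hard-margin constraints are feasible:

import cvxpy as cp
import numpy as np

def fit_hard_margin_svm(X, y):
    # X: (n_samples, n_features), y: labels in {-1, +1}
    n, d = X.shape
    w = cp.Variable(d)
    b = cp.Variable()
    objective = cp.Minimize(0.5 * cp.sum_squares(w))   # ½ wᵀw
    constraints = [cp.multiply(y, X @ w + b) >= 1]     # (w·xᵢ + b) yᵢ ≥ 1
    cp.Problem(objective, constraints).solve()         # fails if data not separable
    return w.value, b.value

In practice one solves the dual instead (next slide), since that is where the $\alpha_i$ and the kernel trick come from.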
Optimization for SVM

Solving the constrained problem (via Lagrange multipliers, one $\alpha_i$ per constraint) yields the condition

$$\sum_i \alpha_i\, y_i = 0$$

The $y_i$ are all scalars; hence, the $\alpha_i$ are also scalars. The classifier is then expressed in terms of the training vectors:

$$y' = f(\bar{x}) = \mathrm{sign}\Big(\sum_i \alpha_i\, y_i\, \bar{x}_i^T \bar{x} + b\Big)$$
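For completeness — the dual problem that produces these $\alpha_i$ is standard in the SVM literature, though it is not written out on the slide:

$$\max_{\alpha}\; \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i\, \alpha_j\, y_i\, y_j\, \bar{x}_i^T \bar{x}_j \quad \text{subject to } \alpha_i \ge 0,\; \sum_i \alpha_i\, y_i = 0$$

Only the support vectors end up with $\alpha_i > 0$, which is why the sum in $f(\bar{x})$ effectively runs over the support vectors alone.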
SVM

$$y' = f(\bar{x}) = \mathrm{sign}\Big(\sum_i \alpha_i\, y_i\, \bar{x}_i^T \bar{x} + b\Big)$$

The training vectors enter only through dot products, which can be replaced by a kernel function. A kernel function is some function which maintains the sanctity of the dot product in some expanded space.

Kernel function: $K(\bar{x}_i, \bar{x}_j) = \bar{x}_i^T\, \bar{x}_j$

Kernel function generalized as: $K(\bar{x}_i, \bar{x}_j) = \varphi(\bar{x}_i)^T\, \varphi(\bar{x}_j)$

By mapping the vectors to some expanded space, the kernel function introduces linear separability which is absent in the original space.
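A small sketch making the identity $K(\bar{x}_i, \bar{x}_j) = \varphi(\bar{x}_i)^T \varphi(\bar{x}_j)$ concrete for the quadratic kernel $K(\bar{x}, \bar{z}) = (\bar{x}^T \bar{z})^2$ in two dimensions — the specific kernel and feature map are my illustrative choices; the slide states the property in general:

import numpy as np

def phi(x):
    # Explicit feature map for the 2-D quadratic kernel:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def quad_kernel(x, z):
    # K(x, z) = (x . z)^2, computed WITHOUT visiting the expanded space
    return np.dot(x, z) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(quad_kernel(x, z))           # 16.0
print(np.dot(phi(x), phi(z)))      # 16.0 -- same dot product, expanded space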
Multiclass Scenario
[Figure: points from five classes (the produce example of the next slides: tomatoes, brinjal, kiwi, green chilies, apples) scattered in the $(x_1, x_2)$ plane.]
One against another approach
[Figure: three of the pairwise two-class splits of the five-class data, each shown in its own $(x_1, x_2)$ plot.]

• Consider each pair of classes separately, ignoring the remaining classes:
  ✓ Tomato against brinjal
  ✓ Kiwi against green chilies
  ✓ Apples against tomatoes
• N * (N-1) / 2 SVM models are generated (10 models in this five-class example)
• The test pattern is run through each model
• The class value is assigned based on the maximum absolute score & sign (+/-ve) across all models
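scikit-learn's SVC implements exactly this pairwise scheme internally; a sketch on hypothetical five-class toy data (the library and dataset are my choices, not from the deck):

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical 5-class toy data standing in for the produce example
X, y = make_blobs(n_samples=200, centers=5, random_state=0)

clf = SVC(kernel='linear', decision_function_shape='ovo').fit(X, y)

scores = clf.decision_function(X[:1])
print(scores.shape)        # (1, 10): one score per pairwise model, 5*4/2 = 10
print(clf.predict(X[:1]))  # class chosen by aggregating the pairwise results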
One vs all others approach
[Figure: one class highlighted against the remaining classes in the $(x_1, x_2)$ plane.]

• Consider each class separately against all other classes combined:
  ✓ Tomato against all others
  ✓ Kiwi against all others
  ✓ Apples against all others
• As many SVM models are generated as there are classes (5 models in this example)
• The test pattern is run through each model:
  ✓ Score of being an apple
  ✓ Score of being a tomato
• The class value is assigned based on the maximum membership score
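The same scheme via scikit-learn's OneVsRestClassifier wrapper (again my choice of library; the data is the same hypothetical toy set):

from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=5, random_state=0)

# One binary SVM per class: 5 models for 5 classes
clf = OneVsRestClassifier(SVC(kernel='linear')).fit(X, y)

print(len(clf.estimators_))           # 5
print(clf.decision_function(X[:1]))   # one membership score per class
print(clf.predict(X[:1]))             # class with the maximum score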
Python Code

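The code listing on this slide did not survive text extraction. A minimal end-to-end sketch of the kind of example such a slide typically carries — scikit-learn, an RBF-kernel SVC, and a synthetic dataset are all my assumptions, not the original listing:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic binary data (stand-in for whatever dataset was used in class)
X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           n_informative=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# RBF kernel maps the data to an expanded space (cf. the kernel slide)
clf = SVC(kernel='rbf', C=1.0, gamma='scale')
clf.fit(X_train, y_train)

print("Support vectors:", clf.support_vectors_.shape[0])
print("Test accuracy:  ", accuracy_score(y_test, clf.predict(X_test)))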
Thank you !!!!!

