21 Support Vector Machines 03-10-2024
[Figure: training data from two classes, X and O, in the plane; repeated over several slides with candidate separating hyperplanes drawn between the classes]
The VC generalization bound relating true risk to empirical risk (m training examples, VC dimension h, holding with probability 1 - η):

$$ R[f] \le R_{\mathrm{emp}}[f] + \sqrt{\frac{1}{m}\left( h\left(\ln\frac{2m}{h} + 1\right) + \ln\frac{4}{\eta} \right)} $$
[Figure: X and O training data that are not separable by a single straight line]
Image from http://www.atrandomresearch.com/iclass/
[Figure: left, data plotted in the original coordinates (x1, x2); right, the same data plotted against (x1, x1²), where it becomes linearly separable]
Image from http://web.engr.oregonstate.edu/~afern/classes/cs534/
Copyright © 2001, 2003, Andrew W. Moore

“Given an algorithm which is formulated in terms of a positive definite kernel K1, one can construct an alternative algorithm by replacing K1 with another positive definite kernel K2”
• Plus-plane = { x : w . x + b = +1 }
• Minus-plane = { x : w . x + b = -1 }
• Claim: the vector w is perpendicular to the Plus Plane. Why? Let u and v be two vectors on the Plus Plane; what is w . ( u - v )?
• Let x- be any point on the minus plane (any location in R^m: not necessarily a datapoint)
• Let x+ be the closest plus-plane point to x-
• Claim: x+ = x- + λ w for some value of λ. Why?
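The claim that x+ = x- + λ w pins down the margin width. A short derivation, using only the plane definitions above:

```latex
% x^+ lies on the plus plane and x^- on the minus plane:
%   w \cdot x^+ + b = +1, \qquad w \cdot x^- + b = -1
% Substituting x^+ = x^- + \lambda w into the plus-plane equation:
w \cdot (x^- + \lambda w) + b = 1
\;\Rightarrow\; -1 + \lambda \lVert w \rVert^2 = 1
\;\Rightarrow\; \lambda = \frac{2}{\lVert w \rVert^2}
% The margin width is the distance between the two planes:
M = \lVert x^+ - x^- \rVert = \lambda \lVert w \rVert = \frac{2}{\lVert w \rVert}
```

This is why maximizing the margin is equivalent to minimizing ||w||².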
$$ L_p = \frac{1}{2}\lVert w \rVert^2 - \sum_{i=1}^{N} \alpha_i\, y_i\, (x_i \cdot w + b) + \sum_{i=1}^{N} \alpha_i $$
• The final classifier is computed using the support vectors and the weights:
$$ f(x) = \sum_{i=1}^{N} \alpha_i\, y_i\, (x_i \cdot x) + b $$
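The sum above can be sketched directly in code. A minimal sketch; the support vectors, labels, and multipliers below are invented for illustration, not the result of an actual QP solve:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def decision(x, svs, alphas, ys, b):
    # f(x) = sum_i alpha_i * y_i * (x_i . x) + b; the sign gives the class
    return sum(a * y * dot(sv, x) for sv, a, y in zip(svs, alphas, ys)) + b

# Hypothetical support vectors and multipliers, purely illustrative
svs    = [(1.0, 1.0), (-1.0, 0.0)]
ys     = [+1, -1]
alphas = [1.0, 1.0]
b      = 0.0

print(decision((2.0, 0.0), svs, alphas, ys, b))   # positive -> class +1
print(decision((-2.0, 0.0), svs, alphas, ys, b))  # negative -> class -1
```

Note that only the support vectors (points with α_i > 0) contribute to the sum, which is why the rest of the training set can be discarded after training.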
• Also, the system will be very sensitive to mislabeled training data or outliers.
$$ y_i\, (x_i \cdot w + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0 $$
This gives the system the ability to
ignore data points near the boundary,
and effectively pushes the margin
towards the centroid of the training data.
• The solution to this problem can still be found using Lagrange multipliers.
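The slack variables ξ_i in the relaxed constraint can be computed in closed form for any fixed hyperplane: ξ_i = max(0, 1 - y_i (w . x_i + b)). A small sketch with an assumed (not learned) w and b:

```python
# Slack for the soft-margin constraint y_i (w . x_i + b) >= 1 - xi_i.
# xi_i measures how far a point falls inside, or beyond, its margin plane;
# w and b here are assumed values for illustration, not trained ones.

def slack(x, y, w, b):
    margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)

w, b = (1.0, 0.0), 0.0
print(slack((2.0, 0.0), +1, w, b))   # 0.0 -- correct side, outside the margin
print(slack((0.5, 0.0), +1, w, b))   # 0.5 -- inside the margin
print(slack((-1.0, 0.0), +1, w, b))  # 2.0 -- misclassified
```

A point needs nonzero slack exactly when it violates the hard-margin constraint, which is what lets the soft-margin SVM tolerate outliers and mislabeled points.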
ECE 8443: Lecture 16, Slide 1
Nonlinear Decision Surfaces
• Thus far we have only considered linear decision surfaces. How do we
generalize this to a nonlinear surface?
[Figure: input-space points mapped by φ(·) into a feature space where they become linearly separable]
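The figure's φ(·) mapping can be made concrete with the (x1, x1²) example above. A toy sketch; the 1-D points and labels are invented for illustration:

```python
# phi(x) = (x, x**2) turns 1-D data that no single threshold can separate
# into 2-D points that a horizontal line can separate.

def phi(x):
    return (x, x * x)

# Invented toy set: the two X points flank the two O points on the line
data = [(-2.0, +1), (-0.5, -1), (0.5, -1), (2.0, +1)]

# In the mapped space, the line x2 = 1 separates the classes:
for x, y in data:
    x1, x2 = phi(x)
    predicted = +1 if x2 >= 1.0 else -1
    print(x, (x1, x2), predicted == y)  # True for every point
```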
• Generates a distance from the hyperplane, but this distance is often not a
good measure of our “confidence” in the classification
• Number of support vectors grows linearly with the size of the data set
[Figure: training-set error and open-loop (test) error versus model complexity; the optimum lies where the open-loop error is minimized]
• Computational complexity?
• Find a linear hyperplane (decision boundary) that will separate the data
10/11/2021 Introduction to Data Mining, 2nd Edition 2
Support Vector Machines
The separating hyperplane and the two margin planes:

$$ \vec{w} \cdot \vec{x} + b = 0, \qquad \vec{w} \cdot \vec{x} + b = +1, \qquad \vec{w} \cdot \vec{x} + b = -1 $$

$$ f(\vec{x}) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x} + b \ge 1 \\ -1 & \text{if } \vec{w} \cdot \vec{x} + b \le -1 \end{cases} \qquad \text{Margin} = \frac{2}{\lVert \vec{w} \rVert} $$
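The margin formula is easy to check numerically. A minimal sketch with assumed weight vectors:

```python
# Margin width 2 / ||w|| for an assumed weight vector; halving ||w||
# doubles the margin, which is why training minimizes ||w||^2.
import math

def margin(w):
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

print(margin((3.0, 4.0)))  # ||w|| = 5.0, margin = 0.4
print(margin((1.5, 2.0)))  # ||w|| = 2.5, margin = 0.8
```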
Linear SVM
• Linear model:
$$ f(\vec{x}) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x} + b \ge 1 \\ -1 & \text{if } \vec{w} \cdot \vec{x} + b \le -1 \end{cases} $$

Support vectors are the points lying on the two margin planes.

Decision boundary (in the mapped feature space):

$$ \vec{w} \cdot \Phi(\vec{x}) + b = 0 $$
Learning Nonlinear SVM
• Optimization problem:
• Issues:
  – What type of mapping function Φ should be used?
  – How to do the computation in high-dimensional space?
    ◦ Most computations involve the dot product Φ(x_i) · Φ(x_j)
    ◦ Curse of dimensionality?
• Kernel Trick:
  – Φ(x_i) · Φ(x_j) = K(x_i, x_j)
  – K(x_i, x_j) is a kernel function (expressed in terms of the coordinates in the original space)
    ◦ Examples:
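The trick can be verified directly for the degree-2 polynomial kernel. A sketch: for K(u, v) = (u · v)², the explicit feature map φ(x) = (x1², √2·x1·x2, x2²) satisfies φ(u) · φ(v) = K(u, v), so the feature-space dot product is computed without ever constructing φ:

```python
# Kernel trick check: (u . v)**2 equals the dot product of the explicit
# degree-2 feature maps, so the 3-D feature space is never materialized
# during kernel evaluation.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def K(u, v):
    return dot(u, v) ** 2

def phi(x):
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

u, v = (1.0, 2.0), (3.0, 0.5)
print(K(u, v), dot(phi(u), phi(v)))  # both equal (1*3 + 2*0.5)**2 = 16.0
```

For higher degrees and more dimensions the explicit map grows combinatorially while the kernel stays a single dot product, which is what sidesteps the curse of dimensionality noted above.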
• Robust to noise
• Overfitting is handled by maximizing the margin of the decision boundary
• SVM can handle irrelevant and redundant attributes better than many
other techniques
• The user needs to provide the type of kernel function and cost function
• Difficult to handle missing values