An Introduction to Support Vector Machines

Jinwei Gu
2008/10/16

Background: nonparametric methods such as the Parzen window and the kn-nearest-neighbor rule; the support vector machine is due to V. Vapnik.
Outline
Discriminant Function

Assign x to class i if

    g_i(x) > g_j(x)   for all j ≠ i

Minimum-Error-Rate Classifier

For two classes, the minimum-error-rate classifier uses the discriminant

    g(x) = p(ω1 | x) − p(ω2 | x)
Discriminant Function

The discriminant function can take many forms, for example:

- Nearest neighbor
- Decision tree
- Linear functions: g(x) = wᵀx + b
- Nonlinear functions
[Figure: a linear classifier in two dimensions. The hyperplane wᵀx + b = 0, with unit normal n = w/‖w‖, separates the half-plane where wᵀx + b > 0 from the half-plane where wᵀx + b < 0.]
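As a minimal sketch of how a linear discriminant classifies points (the weight vector and bias below are made up for illustration):

```python
import numpy as np

# Hypothetical weight vector and bias for a 2-D linear discriminant.
w = np.array([1.0, -1.0])
b = -0.5

def g(x):
    """Linear discriminant g(x) = w^T x + b."""
    return w @ x + b

# Points with g(x) > 0 fall on the +1 side, points with g(x) < 0 on the -1 side.
print(np.sign(g(np.array([2.0, 0.0]))))   # 1.0
print(np.sign(g(np.array([0.0, 2.0]))))   # -1.0
```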
[Figures: the same two-class data (points labeled +1 and −1) separated by several different candidate lines. Infinitely many lines separate the data; the preferred one leaves the largest "safe zone" (margin) around itself.]
Margin

[Figure: a large-margin linear classifier. The planes wᵀx + b = 1 and wᵀx + b = −1 bound the margin around the separating plane wᵀx + b = 0; the training points lying on them are the support vectors.]

We require:

    for yi = +1:  wᵀxi + b ≥ 1
    for yi = −1:  wᵀxi + b ≤ −1

We know that the two margin planes pass through points x⁺ and x⁻ (support vectors) with

    wᵀx⁺ + b = 1
    wᵀx⁻ + b = −1

so the margin width, measured along the unit normal n = w/‖w‖, is

    M = (x⁺ − x⁻) · n = (x⁺ − x⁻) · w/‖w‖ = 2/‖w‖
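The width formula M = 2/‖w‖ can be sanity-checked numerically; the weight vector below is arbitrary:

```python
import numpy as np

# Hypothetical weights: ||w|| = 5, so the margin should be 2/5 = 0.4.
w = np.array([3.0, 4.0])
b = 0.0

margin = 2.0 / np.linalg.norm(w)

# Cross-check: distance between the planes w^T x + b = +1 and w^T x + b = -1,
# measured along the unit normal n = w / ||w||.
n = w / np.linalg.norm(w)
x_plus = (1.0 - b) * w / (w @ w)    # a point on w^T x + b = +1
x_minus = (-1.0 - b) * w / (w @ w)  # a point on w^T x + b = -1
print(margin, (x_plus - x_minus) @ n)  # both 0.4
```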
Formulation (maximum-margin classifier):

    maximize   2/‖w‖

    such that
        for yi = +1:  wᵀxi + b ≥ 1
        for yi = −1:  wᵀxi + b ≤ −1
Formulation (equivalent problem):

    minimize   (1/2)‖w‖²

    such that
        for yi = +1:  wᵀxi + b ≥ 1
        for yi = −1:  wᵀxi + b ≤ −1
Formulation (constraints combined):

    minimize   (1/2)‖w‖²

    such that  yi(wᵀxi + b) ≥ 1
Solving the Optimization Problem

    minimize  (1/2)‖w‖²   s.t.  yi(wᵀxi + b) ≥ 1

Lagrangian function:

    minimize  Lp(w, b, αi) = (1/2)‖w‖² − Σ_{i=1..n} αi [ yi(wᵀxi + b) − 1 ]

    s.t.  αi ≥ 0

Setting the derivatives of Lp to zero:

    ∂Lp/∂w = 0  ⇒  w = Σ_{i=1..n} αi yi xi
    ∂Lp/∂b = 0  ⇒  Σ_{i=1..n} αi yi = 0
Lagrangian Dual Problem

    maximize  Σ_{i=1..n} αi − (1/2) Σ_{i=1..n} Σ_{j=1..n} αi αj yi yj xiᵀxj

    s.t.  αi ≥ 0  and  Σ_{i=1..n} αi yi = 0
From the KKT condition, the solution satisfies, for every training point,

    αi [ yi(wᵀxi + b) − 1 ] = 0

so any point with αi > 0 lies exactly on a margin plane wᵀx + b = ±1: these points are the support vectors.

[Figure: support vectors on the planes wᵀx + b = 1 and wᵀx + b = −1.]

    w = Σ_{i∈SV} αi yi xi

Get b from yi(wᵀxi + b) − 1 = 0, where xi is any support vector.
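To make the recovery of w and b concrete, here is a sketch on a two-point toy set; the dual solution α = (0.5, 0.5) for this data can be verified by hand, and all values are illustrative:

```python
import numpy as np

# Toy data: one point per class; both are support vectors.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])  # dual solution for this toy problem

# w = sum_{i in SV} alpha_i y_i x_i
w = (alpha * y) @ X

# From y_i (w^T x_i + b) = 1 and y_i in {+1, -1}: b = y_i - w^T x_i
b = y[0] - w @ X[0]

# KKT check: alpha_i [ y_i (w^T x_i + b) - 1 ] = 0 for every i
kkt = alpha * (y * (X @ w + b) - 1.0)
print(w, b, kkt)  # [1. 0.] 0.0 [0. 0.]
```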
The discriminant function is

    g(x) = wᵀx + b = Σ_{i∈SV} αi yi xiᵀx + b

Note that it depends on the data only through inner products xiᵀx between the test point and the support vectors.
With slack variables ξi to permit margin violations (soft margin):

    minimize   (1/2)‖w‖² + C Σ_{i=1..n} ξi

    such that
        yi(wᵀxi + b) ≥ 1 − ξi
        ξi ≥ 0

The corresponding dual problem:

    maximize  Σ_{i=1..n} αi − (1/2) Σ_{i=1..n} Σ_{j=1..n} αi αj yi yj xiᵀxj

    such that  0 ≤ αi ≤ C  and  Σ_{i=1..n} αi yi = 0
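To see what the slack variables measure, here is a sketch with a fixed, made-up classifier: ξi is zero for points outside the margin, between 0 and 1 for points inside the margin but correctly classified, and greater than 1 for misclassified points.

```python
import numpy as np

# Hypothetical fixed classifier w^T x + b (for illustration only).
w, b = np.array([1.0, 0.0]), 0.0

X = np.array([[2.0, 0.0],    # outside the margin
              [0.5, 0.0],    # inside the margin, still correct
              [-0.2, 0.0]])  # on the wrong side
y = np.array([1.0, 1.0, 1.0])

# xi_i = max(0, 1 - y_i (w^T x_i + b))
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
print(xi)  # [0.  0.5 1.2]
```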
Non-linear SVMs

Datasets that are linearly separable (with some noise) work out great. When the data are not linearly separable, map them to a higher-dimensional feature space:

    Φ: x → φ(x)

    g(x) = wᵀφ(x) + b = Σ_{i∈SV} αi yi φ(xi)ᵀφ(x) + b
Kernel function: the inner product in feature space,

    K(xi, xj) = φ(xi)ᵀφ(xj)

Linear kernel:         K(xi, xj) = xiᵀxj
Polynomial kernel:     K(xi, xj) = (1 + xiᵀxj)^p
Gaussian (RBF) kernel: K(xi, xj) = exp(−‖xi − xj‖² / (2σ²))
Sigmoid:               K(xi, xj) = tanh(β0 xiᵀxj + β1)
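The point of a kernel is that it evaluates φ(xi)ᵀφ(xj) without ever forming φ. A sketch checking this for the degree-2 polynomial kernel in 2-D, using one standard explicit feature map (written out here for illustration):

```python
import numpy as np

def phi(x):
    # Explicit feature map for K(x, z) = (1 + x^T z)^2 in 2-D.
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

K = (1.0 + x @ z) ** 2     # kernel evaluated directly
K_phi = phi(x) @ phi(z)    # same value via the explicit feature map
print(K, K_phi)  # 4.0 4.0
```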
The dual problem with a kernel:

    maximize  Σ_{i=1..n} αi − (1/2) Σ_{i=1..n} Σ_{j=1..n} αi αj yi yj K(xi, xj)

    such that  0 ≤ αi ≤ C  and  Σ_{i=1..n} αi yi = 0

The discriminant function becomes:

    g(x) = Σ_{i∈SV} αi yi K(xi, x) + b
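Classifying a new point then requires only kernel evaluations against the support vectors. A minimal sketch with the Gaussian kernel; the support vectors, coefficients, and bias are made up:

```python
import numpy as np

def rbf(xi, xj, sigma=1.0):
    """Gaussian kernel K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))."""
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2.0 * sigma ** 2))

# Hypothetical support vectors, labels, dual coefficients, and bias.
SV = np.array([[1.0, 0.0], [-1.0, 0.0]])
y_sv = np.array([1.0, -1.0])
alpha = np.array([1.0, 1.0])
b = 0.0

def g(x):
    """g(x) = sum_{i in SV} alpha_i y_i K(x_i, x) + b"""
    return sum(a * yi * rbf(xi, x) for a, yi, xi in zip(alpha, y_sv, SV)) + b

print(np.sign(g(np.array([0.8, 0.2]))))    # 1.0 (nearer the +1 support vector)
print(np.sign(g(np.array([-0.8, 0.2]))))   # -1.0
```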
Some Issues

Choice of kernel:
- A Gaussian or polynomial kernel is the default choice.
- If these prove ineffective, more elaborate kernels are needed.
- Domain experts can assist in formulating appropriate similarity measures.
Additional Resource
http://www.kernel-machines.org/
Demo of LibSVM
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
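As a usage sketch, scikit-learn's SVC class is built on top of LIBSVM and offers a quick way to experiment; the toy data below is made up:

```python
import numpy as np
from sklearn.svm import SVC  # scikit-learn's SVC wraps LIBSVM

# Tiny linearly separable toy set.
X = np.array([[1.0, 0.0], [2.0, 1.0], [-1.0, 0.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.predict([[3.0, 0.0], [-3.0, 0.0]]))  # [ 1 -1]
print(clf.support_)  # indices of the support vectors
```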