Support Vector Machine (SVM)
Slides for a guest lecture presented by Linda Sellie in Spring 2012 for
CS6923, Machine Learning, NYU-Poly,
with a few corrections...
http://www.svms.org/tutorials/Hearst-etal1998.pdf
http://www.cs.cornell.edu/courses/cs578/2003fa/slides_sigir03_tutorial-modified.v3.pdf
Which Hyperplane?
[Figure: + and − examples in the plane, with several candidate separating hyperplanes g(x).]
g(x) = w^T x + w0

If g(x) > 0 then f(x) = +1
If g(x) <= 0 then f(x) = -1

If w = (3, 4)^T and w0 = -10:
g(x) = (3, 4)^T x - 10
[Figure: the hyperplane g(x) with + examples, including (2, 2), on its positive side.]
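The decision rule above can be checked with a minimal Python sketch using the slide's weights (taking w0 = -10, the sign that makes the worked distances later in the deck come out right):

```python
import numpy as np

# Linear discriminant from the slide: g(x) = w^T x + w0
w = np.array([3.0, 4.0])
w0 = -10.0  # sign chosen to match the worked distances later in the deck

def f(x):
    # f(x) = +1 when g(x) > 0, otherwise -1
    return 1 if np.dot(w, x) + w0 > 0 else -1

print(f([2, 2]))  # g(2,2) = 6 + 8 - 10 = 4 > 0, so prints 1
print(f([1, 1]))  # g(1,1) = 3 + 4 - 10 = -3 <= 0, so prints -1
```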
(For comparison: a Gaussian model with shared variance for each feature, i.e. for each xi the estimated variances of the distributions p[xi|+] and p[xi|−] are required to be the same, also yields a linear decision boundary of this form.)
[Figure: two candidate hyperplanes g(x) through the same + and − examples; the margin is the distance from g(x) to the nearest examples, and one hyperplane has a much larger margin than the other.]
To find the distance r from a point x to the hyperplane, write x = xp + r · (w / ||w||), where xp is the projection of x onto the plane (so g(xp) = 0) and r is the signed distance from x to the plane. Then

g(x) = w^T x + w0
     = w^T (xp + r · w / ||w||) + w0
     = g(xp) + r · (w^T w) / ||w||
     = r ||w||                    (observe that g(xp) = 0 and w^T w = ||w||^2)

Thus

r = g(x) / ||w||

[Figure: axes x1, x2, x3; the point x, its projection xp onto the plane, and the offset r · w / ||w|| along the normal w.]
g(x) = w^T x + w0, with w = (3, 4)^T and w0 = -10

Distance formula: r = g(x) / ||w||

Rescaling g by 1/3 does not move the hyperplane or change the distances:
g'(x) = w'^T x + w'0 = (1/3) g(x), with w' = (1/3) w = (1, 4/3)^T and w'0 = -10/3.
Note that ||w'|| = ||(1, 4/3)|| = 5/3.

r(2, 2)  = g'(2, 2) / ||w'||  = ((1, 4/3) · (2, 2)  - 10/3) / (5/3) = ( 4/3) / (5/3) =  4/5
r(1, .5) = g'(1, .5) / ||w'|| = ((1, 4/3) · (1, .5) - 10/3) / (5/3) = (-5/3) / (5/3) = -1
r(3, 1)  = g'(3, 1) / ||w'||  = ((1, 4/3) · (3, 1)  - 10/3) / (5/3) =   1    / (5/3) =  3/5
r(1, 1)  = g'(1, 1) / ||w'||  = ((1, 4/3) · (1, 1)  - 10/3) / (5/3) =  -1    / (5/3) = -3/5

[Figure: the hyperplane g(x) with + examples (2, 2) and (3, 1) and − examples (1, 1) and (1, .5).]
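The distances above can be verified numerically; since r = g(x)/||w|| is invariant to rescaling (w, w0), the unscaled w = (3, 4), w0 = -10 gives the same answers as w' = (1, 4/3):

```python
import numpy as np

def signed_distance(w, w0, x):
    # Signed distance from x to the hyperplane w^T x + w0 = 0
    return (np.dot(w, x) + w0) / np.linalg.norm(w)

w = np.array([3.0, 4.0])   # ||w|| = 5
w0 = -10.0

print(signed_distance(w, w0, [2, 2]))    # 0.8  (= 4/5)
print(signed_distance(w, w0, [1, 0.5]))  # -1.0
print(signed_distance(w, w0, [3, 1]))    # 0.6  (= 3/5)
print(signed_distance(w, w0, [1, 1]))    # -0.6
```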
[Figure: the maximum-margin hyperplane g(x); the + and − examples closest to it (including (2, 2)) are the support vectors, and the margin is the distance from the hyperplane to them.]
If it is a maximum-margin hyperplane, the distance to the closest + example must equal the distance to the closest − example; otherwise the hyperplane could be shifted toward the farther class to enlarge the margin.

[Figure: the hyperplane g(x), labeled (0, 3.3), with + examples (2, 2), (3, 2), (3, 1) and − examples (1, 1), (1, .5).]
S = {< (1, 1), -1 >, < (2, 2), +1 >, < (1, 1/2), -1 >, < (3, 2), +1 >, ...}

Assuming a canonical hyperplane (w, b scaled so that g(x^(i)) = ±1 for the closest examples x^(i)), the margin is

r = |g(x^(i))| / ||w|| = 1 / ||w||

g(x) = w^T x + b = +1   (through the closest positive examples)
g(x) = w^T x + b =  0   (the separating hyperplane)
g(x) = w^T x + b = -1   (through the closest negative examples)

[Figure: the three parallel hyperplanes, with + examples (2, 2), (3, 2), (3, 1) and − examples (1, 1), (1, .5); the ±1 hyperplanes bound the negative and positive sides.]
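Continuing the running example, w' = (1, 4/3), b = -10/3 is canonical for this data: the closest examples (3, 1) and (1, 1) sit exactly on g(x) = ±1, and the margin is 1/||w'||. A quick numerical check:

```python
import numpy as np

# Canonical hyperplane from the running example
w = np.array([1.0, 4.0 / 3.0])
b = -10.0 / 3.0

# The closest (support-vector) examples and their labels
support_vectors = [(np.array([3.0, 1.0]), +1),
                   (np.array([1.0, 1.0]), -1)]

for x, y in support_vectors:
    # Canonical constraint: y * g(x) = 1 on the margin (up to float rounding)
    print(y * (np.dot(w, x) + b))

# Margin of a canonical hyperplane: 1 / ||w||
print(1.0 / np.linalg.norm(w))   # approx 0.6 (= 3/5, the distances found earlier)
```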
Linearly separable?

g(x) = w^T x + w0

[Figure: + examples such as (0, 1.25), (1.25, 0), (1, 1), (1, 0.75) surrounding − examples clustered near the center, such as (0.5, 0.5), (0.75, 0.25), (0.25, 0.25), (0, 0); no hyperplane g(x) separates the two classes.]
Linearly separable? Transform the feature space!

φ : x → φ(x), with φ(x) = (x1^2, x2^2)^T

g(x) = w^T φ(x) + w0, with w = (1, 1)^T and w0 = -1

so g(x) > 0 exactly when x1^2 + x2^2 > 1.

[Figure: in the original space, + examples such as (0, 1.25), (1.25, 0), (1, 1), (1, 0.75) surround − examples such as (0.5, 0.5), (0.75, 0.25); after the transform they map to (0, 1.5625), (1.5625, 0), (1, 1), (1, 0.56), (0.25, 0.25), (0.56, 0.0625), (0, 0.25), (0, 0), and a line separates the two classes.]
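The transform can be checked directly: under φ(x) = (x1², x2²) with w = (1, 1) and w0 = -1, g is positive exactly for points outside the unit circle. A sketch using clearly labeled points from the figure:

```python
import numpy as np

def phi(x):
    # Feature map from the slide: (x1, x2) -> (x1^2, x2^2)
    return np.array([x[0] ** 2, x[1] ** 2])

w = np.array([1.0, 1.0])
w0 = -1.0

def g(x):
    return np.dot(w, phi(x)) + w0   # equals x1^2 + x2^2 - 1

# + examples lie outside the unit circle, - examples inside it
positives = [(0.0, 1.25), (1.25, 0.0), (1.0, 1.0), (1.0, 0.75)]
negatives = [(0.5, 0.5), (0.75, 0.25)]

print([g(x) > 0 for x in positives])  # [True, True, True, True]
print([g(x) > 0 for x in negatives])  # [False, False]
```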
Linearly separable?

[Figure: one-dimensional data on the interval [0, 3]: − examples near both ends, + examples clustered in the middle; no single threshold separates + from −.]
Linearly separable? Yes, by transforming the feature space!

φ(x) = (x, x^2)^T

[Figure: the 1-D points mapped onto the parabola x2 = x1^2: + examples (5/4, 25/16), (3/2, 9/4), (7/4, 49/16) and − examples such as (3/4, 9/16); in the transformed plane a line separates + from −.]
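In the transformed (x, x²) plane, a line separates the middle + cluster from the outer − points. The particular line below, through the parabola at x = 1 and x = 2, is a hand-picked assumption for illustration, not one computed by an SVM:

```python
import numpy as np

def phi(x):
    # 1-D feature map from the slide: x -> (x, x^2)
    return np.array([x, x * x])

# Hand-picked separating line in the transformed space (an assumption):
# g(x) = 3*x - x^2 - 2 = -(x - 1)(x - 2), positive exactly for 1 < x < 2
w = np.array([3.0, -1.0])
w0 = -2.0

def g(x):
    return np.dot(w, phi(x)) + w0

print([g(x) > 0 for x in (5/4, 3/2, 7/4)])  # + examples: [True, True, True]
print([g(x) > 0 for x in (1/4, 3/4, 9/4)])  # - examples: [False, False, False]
```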
Kernel Function

K(x, z) = φ(x) · φ(z)

KERNEL TRICK: never compute φ(x); just compute K(x, z).

Why is this enough? If we work with the dual representation of the hyperplane (and the dual quadratic program), the only use of the new features is in inner products!
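For example (an illustration not in the slides), the degree-2 polynomial kernel K(x, z) = (x · z)² in two dimensions corresponds to the explicit map φ(x) = (x1², √2·x1·x2, x2²); computing K never forms φ:

```python
import numpy as np

def phi(x):
    # Explicit feature map for the degree-2 polynomial kernel in 2-D
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def K(x, z):
    # Kernel trick: evaluates phi(x).phi(z) without ever forming phi
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

print(K(x, z))                 # 16.0, since x.z = 4
print(np.dot(phi(x), phi(z)))  # also 16 (up to float rounding)
```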
Non-Separable Data

What if the data is not linearly separable for only a few points?

[Figure: + and − examples that are almost linearly separable, except for a few points near (3, 1) and (1, 1) that fall on the wrong side.]
What if a small number of points prevents the margin from being large?

[Figure, two slides: a hard margin kept small by a few − examples near (3, 1) and (1, 1), versus a large margin that those few examples are allowed to violate.]
What if C = ∞ (margin violations penalized infinitely, so none are allowed)?

[Figure: with no violations permitted, the few − examples near (3, 1) and (1, 1) force a small margin.]
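These slides motivate the soft-margin SVM. In the standard formulation one minimizes ½||w||² + C·Σᵢ ξᵢ, where the slack ξᵢ = max(0, 1 - yᵢ(w·xᵢ + b)) measures how badly example i violates the margin; a large C forbids violations (forcing a small margin), while a small C tolerates a few and keeps the margin wide. A minimal sketch computing slacks under the canonical hyperplane from the earlier slides, with one hypothetical − outlier:

```python
import numpy as np

# Canonical hyperplane from the earlier slides
w = np.array([1.0, 4.0 / 3.0])
b = -10.0 / 3.0

points = [(np.array([2.0, 2.0]), +1),   # safely on its own side: slack 0
          (np.array([3.0, 1.0]), +1),   # exactly on the margin: slack ~0
          (np.array([1.0, 1.0]), -1),   # exactly on the margin: slack ~0
          (np.array([2.0, 1.5]), -1)]   # hypothetical - outlier on the + side

C = 1.0  # penalty per unit of slack (an illustrative choice, not from the slides)

# Slack for each example: xi = max(0, 1 - y * (w.x + b))
slacks = [max(0.0, 1.0 - y * (np.dot(w, x) + b)) for x, y in points]
print(slacks)  # last entry is 5/3: only the outlier pays a penalty

# Soft-margin objective: 0.5 * ||w||^2 + C * sum of slacks
objective = 0.5 * np.dot(w, w) + C * sum(slacks)
print(objective)
```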