Support Vector Machines

1. Support vector machines (SVMs) are a type of supervised learning model used for classification and regression analysis. SVMs find a hyperplane in a multidimensional space that distinctly classifies data points.
2. SVMs aim to maximize the margin between the decision boundary and the nearest data points of each class. These nearest data points are called support vectors.
3. There are various techniques for applying SVMs to problems with more than two classes, such as one-versus-one and one-versus-all methods. Cross-validation is commonly used to estimate model performance.

Support Vector Machine

Shao-Chuan Wang
Support Vector Machine
1D classification problem: how would you separate these data? (H1, H2, or H3?)
[Figure: data points on a 1D x axis around 0, with three candidate separating thresholds H1, H2, H3]
Support Vector Machine
2D Classification Problem: which H is better?
Max-Margin Classifier
Training set: S = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m}, with labels y^{(i)} \in \{-1, +1\}, and a separating hyperplane w^T x + b = 0.

Functional margin:
\hat{\gamma}^{(i)} = y^{(i)} \left( w^T x^{(i)} + b \right)
We feel more confident when the functional margin is larger. Note that scaling w and b won't change the plane, but it rescales the functional margin arbitrarily.

Geometric margin (the signed distance from x^{(i)} to the hyperplane):
\gamma^{(i)} = y^{(i)} \left( \left( \frac{w}{\|w\|} \right)^T x^{(i)} + \frac{b}{\|w\|} \right)

[Figure: the hyperplane w^T x + b = 0 in the (x_1, x_2) plane, its normal vector w, and the geometric margin of a training point (x^{(i)}, y^{(i)})]
Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
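To make the two definitions concrete, here is a minimal numpy sketch; the toy data and the particular w, b below are made-up illustrations, not from the slides.

```python
import numpy as np

# Toy 2-D training set: rows are x^(i), labels y^(i) in {-1, +1} (made-up values)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])

# A candidate separating hyperplane w^T x + b = 0 (also made up)
w = np.array([1.0, 1.0])
b = -0.5

functional_margins = y * (X @ w + b)                          # \hat{gamma}^(i)
geometric_margins = functional_margins / np.linalg.norm(w)    # gamma^(i)

print(functional_margins)        # larger value -> more confident classification
print(geometric_margins)         # invariant to rescaling of (w, b)
print(geometric_margins.min())   # the (geometric) margin of this classifier
```

Rescaling w and b by any positive constant changes `functional_margins` but leaves `geometric_margins` untouched, which is why the optimization below is stated in terms of the geometric margin.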
Maximize margins
Optimization problem: maximize the minimal geometric margin under the constraints
\max_{\gamma, w, b} \ \gamma \quad \text{s.t.} \quad y^{(i)} (w^T x^{(i)} + b) \ge \gamma, \ i = 1, \dots, m, \quad \|w\| = 1.

Introduce a scaling factor such that the minimal functional margin equals 1; the problem then becomes the equivalent convex program
\min_{w, b} \ \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y^{(i)} (w^T x^{(i)} + b) \ge 1, \ i = 1, \dots, m.
Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
Optimization problem subject to constraints
Maximize f(x, y) subject to the constraint g(x, y) = c.

Lagrange multiplier method:
\Lambda(x, y, \lambda) = f(x, y) + \lambda \left( g(x, y) - c \right)
\frac{\partial \Lambda}{\partial x} = 0, \quad \frac{\partial \Lambda}{\partial y} = 0, \quad \frac{\partial \Lambda}{\partial \lambda} = 0

[Figure: contour lines of f(x, y) tangent to the constraint curve g(x, y) = c at the optimum]
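As a quick worked instance of this method (the objective and constraint here are my own illustrative choices, not from the slides): maximize f(x, y) = x y subject to x + y = 1. A short sympy sketch solves the stationarity conditions:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)

f = x * y      # objective to maximize (illustrative choice)
g = x + y      # constraint g(x, y) = c
c = 1

# Lagrangian: Lambda = f + lambda * (g - c)
Lagr = f + lam * (g - c)

# Stationarity: all partial derivatives of Lambda vanish
solutions = sp.solve(
    [sp.diff(Lagr, x), sp.diff(Lagr, y), sp.diff(Lagr, lam)],
    [x, y, lam], dict=True)

print(solutions)   # [{x: 1/2, y: 1/2, lambda: -1/2}] -> maximum value f = 1/4
```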
Lagrange duality
Primal optimization problem:
\min_{w} f(w) \quad \text{s.t.} \quad g_i(w) \le 0, \ h_i(w) = 0

Generalized Lagrangian method:
\mathcal{L}(w, \alpha, \beta) = f(w) + \sum_i \alpha_i g_i(w) + \sum_i \beta_i h_i(w)

Primal optimization problem (equivalent form):
p^* = \min_{w} \max_{\alpha \ge 0, \beta} \mathcal{L}(w, \alpha, \beta)

Dual optimization problem:
d^* = \max_{\alpha \ge 0, \beta} \min_{w} \mathcal{L}(w, \alpha, \beta)
Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
Dual Problem
In general d^* \le p^*. Equality holds when f and the g_i are convex and the h_i are affine; the optimum then satisfies the KKT conditions.
Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
Optimal margin classifiers
\min_{w, b} \ \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y^{(i)} (w^T x^{(i)} + b) \ge 1

Its Lagrangian:
\mathcal{L}(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y^{(i)} (w^T x^{(i)} + b) - 1 \right]

Its dual problem:
\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i, j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle
\quad \text{s.t.} \quad \alpha_i \ge 0, \ \sum_{i=1}^{m} \alpha_i y^{(i)} = 0,
with w = \sum_i \alpha_i y^{(i)} x^{(i)} at the optimum.
Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
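As a small illustration of the dual solution, the sketch below assumes scikit-learn's SVC, whose dual_coef_ attribute stores y^{(i)} \alpha_i for the support vectors, and recovers w = \sum_i \alpha_i y^{(i)} x^{(i)} after fitting a linear SVM; the dataset and the large C are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# A (roughly) linearly separable toy problem
X, y = make_blobs(n_samples=40, centers=2, random_state=0)

clf = SVC(kernel='linear', C=1e6)   # a very large C approximates the hard-margin case
clf.fit(X, y)

# dual_coef_ holds y^(i) * alpha_i for the support vectors,
# so w = sum_i alpha_i y^(i) x^(i) is recovered as:
w = clf.dual_coef_ @ clf.support_vectors_
b = clf.intercept_

print(w, b)
print(np.allclose(w, clf.coef_))    # matches sklearn's own coef_ for a linear kernel
```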
If not linearly separable, we can
find a nonlinear solution. Technically, it's a linear solution in a higher-order feature space.

Kernel Trick
Map the inputs into that higher-order space with a feature mapping \Phi and work only with inner products there:
K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)
(a concrete feature map is given on the next slide).
Kernel and feature mapping
Kernel: K(x, z) = \phi(x)^T \phi(z)
Positive semi-definite: z^T K z \ge 0 for all z \in \mathbb{R}^n, z \ne 0 (K is the kernel matrix)
Symmetric
Loose intuition: a kernel measures similarity between features.

For example, K(x, z) = (x^T z)^2 corresponds (for n = 3) to the feature map
\phi(x) = (x_1 x_1, \ x_1 x_2, \ x_1 x_3, \ x_2 x_1, \ x_2 x_2, \ x_2 x_3, \ x_3 x_1, \ x_3 x_2, \ x_3 x_3)^T
Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
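To make the correspondence concrete, here is a short numpy check (the vectors are chosen arbitrarily) that (x^T z)^2 equals \phi(x)^T \phi(z) for the feature map above:

```python
import numpy as np

def phi(v):
    """Explicit feature map for the kernel (x^T z)^2 with n = 3:
    all 9 ordered products v_i * v_j."""
    return np.outer(v, v).ravel()

x = np.array([1.0, 2.0, 3.0])    # arbitrary example vectors
z = np.array([0.5, -1.0, 2.0])

kernel_value = (x @ z) ** 2          # computed in the original 3-D space
feature_value = phi(x) @ phi(z)      # computed via the 9-D feature map

print(kernel_value, feature_value)   # both equal -> the kernel trick
assert np.isclose(kernel_value, feature_value)
```

The point of the trick is that the left-hand computation never builds the 9-dimensional vectors, which matters when the feature space is very high- or infinite-dimensional (as with the Gaussian kernel).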
Soft Margin (L1 regularization)
\min_{w, b, \xi} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad y^{(i)} (w^T x^{(i)} + b) \ge 1 - \xi_i, \ \xi_i \ge 0
C = \infty leads to the hard-margin SVM.
Rychetsky (2001)
Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
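A minimal scikit-learn sketch (the dataset and the candidate C values are illustrative) showing how the regularization parameter C trades margin width against training violations:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping classes, so a hard margin is impossible (illustrative data)
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=1)

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # Small C: wide margin, many margin violations (more support vectors).
    # Large C: approaches hard-margin behaviour, fewer violations tolerated.
    print(f"C={C:>6}: {clf.n_support_.sum()} support vectors, "
          f"train accuracy {clf.score(X, y):.2f}")
```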
Why doesn't my model fit well on test data?
Bias/variance tradeoff
Underfitting (high bias) vs. overfitting (high variance).

Training error:
\hat{\varepsilon}(h) = \frac{1}{m} \sum_{i=1}^{m} 1\{ h(x^{(i)}) \ne y^{(i)} \}

Generalization error:
\varepsilon(h) = P_{(x, y) \sim D}\big( h(x) \ne y \big)
Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
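A small sketch (synthetic data and deliberately flexible settings of my own choosing) that estimates both quantities empirically, using a held-out set as a stand-in for new draws from D:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary problem; the held-out half stands in for "new" data from D
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

clf = SVC(kernel='rbf', gamma=5.0, C=10.0).fit(X_tr, y_tr)   # deliberately flexible

train_error = 1 - clf.score(X_tr, y_tr)   # empirical error on the training sample
test_error = 1 - clf.score(X_te, y_te)    # estimate of the generalization error

print(f"training error {train_error:.3f}, held-out error {test_error:.3f}")
# A large gap between the two is the signature of overfitting (high variance).
```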
Bias/variance tradeoff
[Figure: in-sample (training) error and out-of-sample (test) error as a function of model complexity]
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer, New York, 2001.
Is training error a good estimator of generalization error?
Chernoff bound (|H| = finite)
Lemma: Assume Z_1, Z_2, \dots, Z_m are drawn iid from Bernoulli(\phi), let
\hat{\phi} = \frac{1}{m} \sum_{i=1}^{m} Z_i,
and let \gamma > 0 be fixed. Then
P(|\phi - \hat{\phi}| > \gamma) \le 2 \exp(-2 \gamma^2 m).

Based on this lemma, one can show that, with probability at least 1 - \delta (k = number of hypotheses in H),
\varepsilon(\hat{h}) \le \Big( \min_{h \in H} \varepsilon(h) \Big) + 2 \sqrt{ \frac{1}{2m} \log \frac{2k}{\delta} }.
Andrew Ng. Part VI Learning Theory. CS229 Lecture Notes (2008).
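As a numeric illustration (the values of k, delta, gamma below are my own), rearranging the union-bound/Hoeffding argument from the cited notes gives the sample size needed so that every one of the k hypotheses has |\varepsilon(h) - \hat{\varepsilon}(h)| \le \gamma with probability at least 1 - \delta, namely m \ge \frac{1}{2\gamma^2} \log \frac{2k}{\delta}:

```python
import math

def sample_complexity(k, delta, gamma):
    """Smallest m with 2k * exp(-2 * gamma^2 * m) <= delta,
    i.e. m >= (1 / (2 * gamma^2)) * log(2k / delta)."""
    return math.ceil(math.log(2 * k / delta) / (2 * gamma ** 2))

# Illustrative numbers: 10,000 hypotheses, 95% confidence, gamma = 0.05
print(sample_complexity(k=10_000, delta=0.05, gamma=0.05))   # about 2580 examples
```

Note that m grows only logarithmically in the number of hypotheses k, which motivates the VC-dimension version on the next slide for infinite H.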
Chernoff bound (|H| = infinite)
VC dimension d: the size of the largest set that H can shatter.
e.g. H = linear classifiers in 2-D has VC(H) = 3.

With probability at least 1 - \delta,
\varepsilon(\hat{h}) \le \varepsilon(h^*) + O\left( \sqrt{ \frac{d}{m} \log \frac{m}{d} + \frac{1}{m} \log \frac{1}{\delta} } \right).
Andrew Ng. Part VI Learning Theory. CS229 Lecture Notes (2008).
Model Selection
Cross validation: an estimator of generalization error.
K-fold: split the training data into k pieces; train on k-1 pieces and test on the remaining one (this yields one test-error estimate).
Average the k test-error estimates; if the average is, say, 2%, then 2% is the estimate of the generalization error of this learner. (A scikit-learn sketch of this procedure follows below.)
Leave-one-out cross validation: m-fold, where m = training sample size.
Fold layout for one round: train | train | validate | train | train | train
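A minimal scikit-learn sketch of k-fold cross validation; the 5 folds, the dataset, and the SVM parameters are illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# K-fold cross validation: each fold is held out once while the model
# is trained on the remaining k-1 folds; the k scores are then averaged.
scores = cross_val_score(SVC(kernel='rbf', C=1.0), X, y, cv=5)

print(scores)               # one accuracy estimate per fold
print(1 - scores.mean())    # averaged error: the CV estimate of generalization error
```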
Model Selection
Loop over candidate parameter values:
Pick one parameter setting, e.g. C = 2.0.
Do cross validation and get an error estimate.
Pick C_best (the setting with the minimal error estimate) as the final parameter. (A sketch of this loop follows below.)
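One way to write this loop, sketched with scikit-learn's GridSearchCV; the candidate C grid and the dataset are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Loop over candidate C values, score each by k-fold cross validation,
# and keep the value with the lowest estimated error (highest CV accuracy).
param_grid = {'C': [0.1, 0.5, 1.0, 2.0, 10.0, 100.0]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)        # the C_best chosen by cross validation
print(1 - search.best_score_)     # its cross-validated error estimate
```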
Multiclass SVM
One against one: there are \binom{k}{2} binary SVMs (1 vs 2, 1 vs 3, ...). To predict, each SVM votes between its two classes and the class with the most votes wins the poll. For example, with k = 6 classes and vote counts (1, 3, 5, 3, 2, 1) for classes 1-6, the prediction is class 3.
One against all: there are k binary SVMs (1 vs rest, 2 vs rest, ...). To predict, evaluate w^T x + b for each of the k machines and pick the largest.
Multiclass SVM by solving ONE optimization problem (Crammer & Singer, below). A scikit-learn sketch of the first two strategies follows the reference.
Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based
vector machines. JMLR, 2, 265-292.
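A brief sketch of both decompositions, assuming scikit-learn: SVC itself uses one-against-one internally, and the sklearn.multiclass wrappers make the two strategies explicit. The iris data and the C value are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # k = 3 classes

# One against one: k*(k-1)/2 binary SVMs, prediction by voting
ovo = OneVsOneClassifier(SVC(kernel='linear', C=1.0))
# One against all: k binary SVMs, prediction by the largest decision value w^T x + b
ova = OneVsRestClassifier(SVC(kernel='linear', C=1.0))

for name, clf in [('one-vs-one', ovo), ('one-vs-rest', ova)]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```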
Multiclass SVM (2/2)
DAGSVM (Directed Acyclic Graph SVM)
An Example: image classification
Process: raw images -> formatted feature vectors -> training data (3/4) and test data (1/4) -> K-fold cross validation on the training data (K = 6) to pick the best C -> SVM trained with the best C -> accuracy on the test data. (A sketch of this pipeline follows below.)
Each formatted vector is a sparse "label index:value ..." line, e.g.
1 0:49 1:25
1 0:49 1:25
...
2 0:49 1:25
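A rough end-to-end sketch of this pipeline, assuming the formatted vectors are stored in svmlight/libsvm format in a hypothetical file features.txt; the file name, the C grid, and all other parameters are illustrative:

```python
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Formatted vectors: one "label index:value ..." line per image (hypothetical file)
X, y = load_svmlight_file('features.txt')

# 3/4 training data, 1/4 test data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# 6-fold cross validation on the training data to pick the best C
search = GridSearchCV(SVC(kernel='linear'), {'C': [0.1, 1.0, 10.0, 100.0]}, cv=6)
search.fit(X_tr, y_tr)

# GridSearchCV refits on all training data with the best C; report test accuracy
print('best C:', search.best_params_['C'])
print('test accuracy:', search.score(X_te, y_te))
```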
An Example: image classification
Results: run the multi-class SVM 100 times for both kernels (linear and Gaussian).
[Figure: accuracy histogram over the 100 runs for each kernel]
