An Introduction to SVM
Support Vector Machines
Tingfan Wu
MPLAB, UCSD
Outline
Data Classification
High-level Concepts of SVM
Interpretation of SVM Model/Result
Use Case Study
Making predictions is fundamental to survival
Will that bear eat me?
Data Classification
Sensor data -> Preprocessing -> Features -> Classifier (SVM, AdaBoost, Neural Network) -> Prediction
Generalization
A Simple Dilemma
Who do I invite to my birthday party?
Problem Formulation
Training data as vectors: $x_i$
Binary labels: $y_i \in \{+1, -1\}$

Name   Gift? (class)   Income   Fondness
John   Yes             3k       3/5
Mary   No              5k       1/5

Gift? is the class; (Income, Fondness) is the feature vector.
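As a rough illustration of the formulation, the two rows of the table can be written as feature vectors and binary labels. This is only a sketch: the dictionary keys, the helper names, the choice of expressing income in thousands, and the mapping Yes -> +1 / No -> -1 are assumptions made for the example, not something the slides specify.

```python
# Hypothetical encoding of the two rows in the table.
# Assumption: income is expressed in thousands, fondness as a fraction.
people = [
    {"name": "John", "gift": "Yes", "income_k": 3.0, "fondness": 3 / 5},
    {"name": "Mary", "gift": "No", "income_k": 5.0, "fondness": 1 / 5},
]


def to_feature_vector(person):
    """Return x_i = (fondness, income), i.e. (x1, x2)."""
    return (person["fondness"], person["income_k"])


def to_label(person):
    """Assumed mapping of the class column: Yes -> +1, No -> -1."""
    return 1 if person["gift"] == "Yes" else -1


X = [to_feature_vector(p) for p in people]
y = [to_label(p) for p in people]

print(X)
print(y)
```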
Vector space
[Figure: scatter plot of the training points. Horizontal axis: x1 (Fondness). Vertical axis: x2 (Disposable Income). Legend: + Gift; the other marker, No Gift.]
A Line
The line: $w^T x + b = 0$
Normal: $w$
[Figure: scatter plot with a line and its normal vector. Horizontal axis: x1 (first feature). Vertical axis: x2 (second feature).]
Line: $w^T x + b = 0$
Side with the + points: $w^T x_i + b > 0$
Side with the - points: $w^T x_i + b < 0$
Model: $w$ and $b$
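The rule on this slide can be sketched in a few lines: a point is assigned to the + class when $w^T x + b > 0$ and to the - class otherwise. The values of `w` and `b` below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def decision_value(w, x, b):
    """Compute w^T x + b for vectors given as tuples."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict(w, x, b):
    """Return +1 on the positive side of the line, -1 otherwise."""
    return 1 if decision_value(w, x, b) > 0 else -1


# Hypothetical model parameters, not taken from the slides.
w = (1.0, -0.5)
b = 0.25

print(predict(w, (2.0, 1.0), b))  # 2.0 - 0.5 + 0.25 = 1.75, positive side
print(predict(w, (0.0, 2.0), b))  # 0.0 - 1.0 + 0.25 = -0.75, negative side
```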
Large Margin
Maximal Margin
Case 1
Case 2
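To make "large margin" concrete, the sketch below computes the geometric margin of a separating line, that is, the smallest value of $y_i (w^T x_i + b) / \|w\|$ over the training points, for two candidate lines. The data set and both lines are hypothetical and are not the Case 1 and Case 2 figures of the slides; they only show that two lines can separate the same data with very different margins.

```python
import math


def geometric_margin(w, b, X, y):
    """Smallest signed distance from any training point to the line."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return min(
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) / norm
        for xi, yi in zip(X, y)
    )


# Hypothetical, linearly separable toy data.
X = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
y = [-1, -1, 1, 1]

# Two hypothetical separating lines.
margin_a = geometric_margin((1.0, 0.0), -1.0, X, y)  # the line x1 = 1
margin_b = geometric_margin((1.0, 1.0), -1.5, X, y)  # a tilted line

print(margin_a, margin_b)
```

Both lines classify every point correctly, but the first keeps all points at distance 1 while the tilted one passes much closer to two of them. A maximal-margin classifier prefers the first.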
Trick 1: Soft-Margin
These points are usually outliers, so the hyperplane should not be biased too much by them.
Penalty for data that violate the margin
Soft-margin
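One common way to write the soft-margin idea is the objective $\frac{1}{2} w^T w + C \sum_i \xi_i$ with slack $\xi_i = \max(0, 1 - y_i (w^T x_i + b))$. The sketch below evaluates it for a fixed line; the data, `w`, `b` and the values of `C` are hypothetical and the code only evaluates the objective, it does not solve the optimization.

```python
def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slacks) for the soft-margin SVM primal."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    slacks = [
        max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
        for xi, yi in zip(X, y)
    ]
    return regularizer + C * sum(slacks), slacks


# Hypothetical data: the third point is a - outlier on the + side.
X = [(0.0, 0.0), (2.0, 0.0), (1.5, 0.0)]
y = [-1, 1, -1]
w, b = (1.0, 0.0), -1.0

objective_small_c, slacks = soft_margin_objective(w, b, X, y, C=1.0)
objective_large_c, _ = soft_margin_objective(w, b, X, y, C=10.0)

print(slacks)
print(objective_small_c, objective_large_c)
```

Only the outlier has nonzero slack. A larger `C` makes that violation more expensive, so the optimizer would bend the line further toward the outlier; a smaller `C` tolerates it.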
Support vectors
Mapping: $x_1 \to x_1^2$, giving a second feature $x_2 = x_1^2$
[Figure: axes x1 and x2, with the curve $x_2 = x_1^2$.]
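A small sketch of why such a mapping helps: the one-dimensional points below cannot be split by a single threshold on x1, but after adding the feature $x_2 = x_1^2$ a straight line separates them. The data points and the line `x2 = 2` are hypothetical choices for the example.

```python
def feature_map(x1):
    """Map a 1-D point to (x1, x2) with x2 = x1 ** 2."""
    return (x1, x1 * x1)


# Hypothetical 1-D data: the + points lie on both sides of the - points.
xs = [-2.0, -0.5, 0.5, 2.0]
ys = [1, -1, -1, 1]

mapped = [feature_map(x) for x in xs]

# Hypothetical line in the mapped space: x2 = 2, i.e. w = (0, 1), b = -2.
w, b = (0.0, 1.0), -2.0
predictions = [1 if w[0] * p[0] + w[1] * p[1] + b > 0 else -1 for p in mapped]

print(mapped)
print(predictions == ys)
```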
$\phi(x)^T \phi(y) = \mathrm{Kernel}(x, y)$
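The identity between a kernel and an inner product of mapped vectors can be checked numerically. The sketch below uses the degree-2 polynomial kernel $(x^T y)^2$ on 2-D inputs, whose explicit map is $\phi(x) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2)$. This particular kernel is picked for illustration and is not necessarily the one shown on the slide.

```python
import math


def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    """Kernel value computed without ever building phi(x)."""
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))
implicit = poly2_kernel(x, y)

print(explicit, implicit)
```

The kernel reaches the same number using only a 2-D dot product, which is the point of the trick when the mapped space is much larger.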
Dual Problem
Primal
Dual
finite calculation
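For reference, a standard way of writing the soft-margin primal and dual problems is given below. The notation is an assumption added here, since the slide text did not survive: $l$ is the number of training points, $\xi_i$ are slack variables, $\alpha_i$ are the dual variables, $\phi$ is the feature map and $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$.

```latex
% Primal (soft margin), with slack variables \xi_i
\min_{w,\,b,\,\xi} \quad \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i
\quad \text{subject to} \quad
y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0

% Dual, one variable \alpha_i per training point
\min_{\alpha} \quad \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l}
\alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{l} \alpha_i
\quad \text{subject to} \quad
0 \le \alpha_i \le C, \quad \sum_{i=1}^{l} y_i \alpha_i = 0
```

The dual has exactly $l$ variables and touches the data only through $K$, so it stays a finite computation even when $\phi(x)$ is very high dimensional. This is presumably what the "finite calculation" note refers to.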
Gaussian/RBF Kernel
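[Figure: decision boundaries of the Gaussian/RBF kernel for several parameter settings. Panel labels: "~ linear kernel", "Overfitting", "nearest neighbor?"]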
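The Gaussian/RBF kernel is usually written $K(x, y) = \exp(-\gamma \|x - y\|^2)$. The sketch below evaluates it for one pair of points and three values of $\gamma$; the points and the $\gamma$ values are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian/RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


x, y = (0.0, 0.0), (1.0, 1.0)
values = {gamma: rbf_kernel(x, y, gamma) for gamma in (0.01, 1.0, 100.0)}

for gamma, value in values.items():
    print(gamma, value)
```

With a small $\gamma$ the kernel changes slowly with distance and the resulting classifier tends to be smooth, close to a linear one. With a large $\gamma$ every point is similar only to itself, so the classifier behaves more like a nearest-neighbor rule and can overfit.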
Recap
Soft-ness
Nonlinearity
Cross Validation
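What is the best $(C, \gamma)$? It is data dependent.
It needs to be determined by testing performance.
Split the training data into pseudo-training and pseudo-testing sets.
[Figure: the training data is divided into "Split: training" and "Split: test"; the testing data stays separate.]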
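The splitting step can be sketched as k-fold cross validation: each fold takes a turn as the pseudo-testing set while the remaining folds form the pseudo-training set. The function name and the choice of 10 samples and 3 folds are assumptions for the example. No shuffling is done here; in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print(len(train), validation)
```

The cross validation accuracy for one parameter setting is then the average accuracy over the folds, each time training on `train` and evaluating on `validation`.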
(3) Weights as profiles
Fluorescent images of cells at various dosages of a certain drug
Various image-based features
The Software
SVM requires a constrained quadratic optimization solver,
which is not easy to implement.
Off-the-shelf Software
LIBSVM by Chih-Jen Lin et al.
SVMlight by Thorsten Joachims
Beginners may
1. Convert their data into the format of an SVM software package.
2. Skip scaling.
3. Try a few parameters at random, without cross validation.
4. Get good results on training data but poor results on testing data.
Data scaling
Without scaling, a feature with a large dynamic range may dominate the separating hyperplane.

X    Height   Gender   Label
x1   150      2        y1=0
x2   180      1        y2=1
x3   185      1        y3=1

[Figure: the three points plotted with axes Height and Gender, marked by label.]
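Scaling the two columns of the table to [0, 1] puts Height and Gender on the same footing. The sketch below uses plain min-max scaling; the function name is an assumption, and the values are the ones from the table.

```python
def min_max_scale(column):
    """Linearly rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


heights = [150.0, 180.0, 185.0]
genders = [2.0, 1.0, 1.0]

scaled_heights = min_max_scale(heights)
scaled_genders = min_max_scale(genders)

print(scaled_heights)
print(scaled_genders)
```

Before scaling, Height spans 35 units and Gender only 1, so distances are dominated by Height. After scaling both features span exactly one unit.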
Parameter Selection
Contours of cross validation accuracy.
[Figure: contour plot of cross validation accuracy, with a region marked "Good area".]
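Parameter selection is often done as a grid search: try every pair on a grid of exponentially spaced values and keep the pair with the best cross validation accuracy. The grid below, $C = 2^{-5}, 2^{-3}, \dots, 2^{15}$ and $\gamma = 2^{-15}, 2^{-13}, \dots, 2^{3}$, follows the suggestion in the practical guide by Hsu, Chang, and Lin. The scoring function `toy_cv_accuracy` is a purely hypothetical stand-in so the sketch runs on its own; in real use it would train an SVM and return its cross validation accuracy.

```python
import math


def grid_search(cv_accuracy, log2_c_values, log2_gamma_values):
    """Return (best_accuracy, best_C, best_gamma) over the grid."""
    best = None
    for log2_c in log2_c_values:
        for log2_gamma in log2_gamma_values:
            c, gamma = 2.0 ** log2_c, 2.0 ** log2_gamma
            accuracy = cv_accuracy(c, gamma)
            if best is None or accuracy > best[0]:
                best = (accuracy, c, gamma)
    return best


def toy_cv_accuracy(c, gamma):
    """Hypothetical score surface that peaks at C = 2**3, gamma = 2**-5."""
    return 1.0 - 0.01 * (abs(math.log2(c) - 3) + abs(math.log2(gamma) + 5))


best_accuracy, best_c, best_gamma = grid_search(
    toy_cv_accuracy, range(-5, 16, 2), range(-15, 4, 2)
)
print(best_accuracy, best_c, best_gamma)
```

A coarse grid like this locates the good area on the contour plot; a finer grid around the best pair can then refine it.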
Overfitting
Training
$./svm-train train.1 (default parameter used)
optimization finished, #iter = 6131
nSV = 3053, nBSV = 724
Total nSV = 3053
Training Accuracy
$./svm-predict train.1 train.1.model o
Accuracy = 99.7734% (3082/3089)
Testing Accuracy
$./svm-predict test.1 train.1.model test.1.out
Accuracy = 66.925% (2677/4000)
nSV and nBSV: number of SVs and bounded SVs ($\alpha_i = C$).
Without scaling, one feature may dominate the value, which leads to overfitting.
Suggested Procedure
Data pre-scaling
Scale each feature to the range [0, 1] or to unit variance
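For the unit-variance option, one detail is easy to get wrong: the scaling factors should be computed on the training data and then reused unchanged on the testing data. The sketch below illustrates this with hypothetical numbers; the function names are assumptions for the example.

```python
import math


def fit_standardizer(train_column):
    """Compute mean and standard deviation from the training data only."""
    mean = sum(train_column) / len(train_column)
    variance = sum((v - mean) ** 2 for v in train_column) / len(train_column)
    std = math.sqrt(variance) or 1.0  # guard against a constant feature
    return mean, std


def standardize(column, mean, std):
    """Apply previously fitted scaling factors to any data set."""
    return [(v - mean) / std for v in column]


# Hypothetical feature values.
train_feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test_feature = [5.0, 9.0]

mean, std = fit_standardizer(train_feature)
scaled_train = standardize(train_feature, mean, std)
scaled_test = standardize(test_feature, mean, std)

print(mean, std)
print(scaled_test)
```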
Resources
LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm
LIBSVM Tools: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools
Kernel Machines Forum: http://www.kernel-machines.org
Hsu, Chang, and Lin: A Practical Guide to Support Vector Classification
my email: [email protected]
Acknowledgement
Many slides from Dr. Chih-Jen Lin, NTU