An Introduction to Support Vector Machines

Jinwei Gu
2008/10/16

Background: nonparametric methods such as the Parzen window and the kn-nearest-neighbor rule; the support vector machine is due to V. Vapnik.
Outline
Discriminant Function

Assign x to class i if

    g_i(x) > g_j(x)   for all j ≠ i

Minimum-Error-Rate Classifier

For two classes, the minimum-error-rate classifier uses the discriminant

    g(x) = p(ω1 | x) − p(ω2 | x)
Discriminant Function

The discriminant function can take many forms, for example:

- Nearest neighbor
- Decision tree
- Linear functions: g(x) = wᵀx + b
- Nonlinear functions
[Figure: a linear classifier in two dimensions. The hyperplane wᵀx + b = 0, with unit normal n = w/‖w‖, separates the half-plane where wᵀx + b > 0 from the half-plane where wᵀx + b < 0.]
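As a minimal sketch of how a linear discriminant classifies points (the weight vector and bias below are made up for illustration):

```python
import numpy as np

# Hypothetical weight vector and bias for a 2-D linear discriminant.
w = np.array([1.0, -1.0])
b = -0.5

def g(x):
    """Linear discriminant g(x) = w^T x + b."""
    return w @ x + b

# Points with g(x) > 0 fall on the +1 side, points with g(x) < 0 on the -1 side.
print(np.sign(g(np.array([2.0, 0.0]))))   # 1.0
print(np.sign(g(np.array([0.0, 2.0]))))   # -1.0
```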
[Figures: the same two-class data (points labeled +1 and −1) separated by several different candidate lines. Infinitely many lines separate the data; the preferred one leaves the largest "safe zone" (margin) around itself.]
Margin

[Figure: a large-margin linear classifier. The planes wᵀx + b = 1 and wᵀx + b = −1 bound the margin around the separating plane wᵀx + b = 0; the training points lying on them are the support vectors.]

We require:

    for yi = +1:  wᵀxi + b ≥ 1
    for yi = −1:  wᵀxi + b ≤ −1

We know that the two margin planes pass through points x⁺ and x⁻ (support vectors) with

    wᵀx⁺ + b = 1
    wᵀx⁻ + b = −1

so the margin width, measured along the unit normal n = w/‖w‖, is

    M = (x⁺ − x⁻) · n = (x⁺ − x⁻) · w/‖w‖ = 2/‖w‖
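The width formula M = 2/‖w‖ can be sanity-checked numerically; the weight vector below is arbitrary:

```python
import numpy as np

# Hypothetical weights: ||w|| = 5, so the margin should be 2/5 = 0.4.
w = np.array([3.0, 4.0])
b = 0.0

margin = 2.0 / np.linalg.norm(w)

# Cross-check: distance between the planes w^T x + b = +1 and w^T x + b = -1,
# measured along the unit normal n = w / ||w||.
n = w / np.linalg.norm(w)
x_plus = (1.0 - b) * w / (w @ w)    # a point on w^T x + b = +1
x_minus = (-1.0 - b) * w / (w @ w)  # a point on w^T x + b = -1
print(margin, (x_plus - x_minus) @ n)  # both 0.4
```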
Formulation (maximum-margin classifier):

    maximize   2/‖w‖

    such that
        for yi = +1:  wᵀxi + b ≥ 1
        for yi = −1:  wᵀxi + b ≤ −1
Formulation (equivalent problem):

    minimize   (1/2)‖w‖²

    such that
        for yi = +1:  wᵀxi + b ≥ 1
        for yi = −1:  wᵀxi + b ≤ −1
Formulation (constraints combined):

    minimize   (1/2)‖w‖²

    such that  yi(wᵀxi + b) ≥ 1
Solving the Optimization Problem

    minimize  (1/2)‖w‖²   s.t.  yi(wᵀxi + b) ≥ 1

Lagrangian function:

    minimize  Lp(w, b, αi) = (1/2)‖w‖² − Σ_{i=1..n} αi [ yi(wᵀxi + b) − 1 ]

    s.t.  αi ≥ 0

Setting the derivatives of Lp to zero:

    ∂Lp/∂w = 0  ⇒  w = Σ_{i=1..n} αi yi xi
    ∂Lp/∂b = 0  ⇒  Σ_{i=1..n} αi yi = 0
Lagrangian Dual Problem

    maximize  Σ_{i=1..n} αi − (1/2) Σ_{i=1..n} Σ_{j=1..n} αi αj yi yj xiᵀxj

    s.t.  αi ≥ 0  and  Σ_{i=1..n} αi yi = 0
From the KKT condition, the solution satisfies, for every training point,

    αi [ yi(wᵀxi + b) − 1 ] = 0

so any point with αi > 0 lies exactly on a margin plane wᵀx + b = ±1: these points are the support vectors.

[Figure: support vectors on the planes wᵀx + b = 1 and wᵀx + b = −1.]

    w = Σ_{i∈SV} αi yi xi

Get b from yi(wᵀxi + b) − 1 = 0, where xi is any support vector.
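To make the recovery of w and b concrete, here is a sketch on a two-point toy set; the dual solution α = (0.5, 0.5) for this data can be verified by hand, and all values are illustrative:

```python
import numpy as np

# Toy data: one point per class; both are support vectors.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])  # dual solution for this toy problem

# w = sum_{i in SV} alpha_i y_i x_i
w = (alpha * y) @ X

# From y_i (w^T x_i + b) = 1 and y_i in {+1, -1}: b = y_i - w^T x_i
b = y[0] - w @ X[0]

# KKT check: alpha_i [ y_i (w^T x_i + b) - 1 ] = 0 for every i
kkt = alpha * (y * (X @ w + b) - 1.0)
print(w, b, kkt)  # [1. 0.] 0.0 [0. 0.]
```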
The discriminant function is

    g(x) = wᵀx + b = Σ_{i∈SV} αi yi xiᵀx + b

Note that it depends on the data only through inner products xiᵀx between the test point and the support vectors.
With slack variables ξi to permit margin violations (soft margin):

    minimize   (1/2)‖w‖² + C Σ_{i=1..n} ξi

    such that
        yi(wᵀxi + b) ≥ 1 − ξi
        ξi ≥ 0

The corresponding dual problem:

    maximize  Σ_{i=1..n} αi − (1/2) Σ_{i=1..n} Σ_{j=1..n} αi αj yi yj xiᵀxj

    such that  0 ≤ αi ≤ C  and  Σ_{i=1..n} αi yi = 0
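To see what the slack variables measure, here is a sketch with a fixed, made-up classifier: ξi is zero for points outside the margin, between 0 and 1 for points inside the margin but correctly classified, and greater than 1 for misclassified points.

```python
import numpy as np

# Hypothetical fixed classifier w^T x + b (for illustration only).
w, b = np.array([1.0, 0.0]), 0.0

X = np.array([[2.0, 0.0],    # outside the margin
              [0.5, 0.0],    # inside the margin, still correct
              [-0.2, 0.0]])  # on the wrong side
y = np.array([1.0, 1.0, 1.0])

# xi_i = max(0, 1 - y_i (w^T x_i + b))
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
print(xi)  # [0.  0.5 1.2]
```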
Non-linear SVMs

Datasets that are linearly separable (with some noise) work out great. When the data are not linearly separable, map them to a higher-dimensional feature space:

    Φ: x → φ(x)

    g(x) = wᵀφ(x) + b = Σ_{i∈SV} αi yi φ(xi)ᵀφ(x) + b
Kernel function: the inner product in feature space,

    K(xi, xj) = φ(xi)ᵀφ(xj)

Linear kernel:         K(xi, xj) = xiᵀxj
Polynomial kernel:     K(xi, xj) = (1 + xiᵀxj)^p
Gaussian (RBF) kernel: K(xi, xj) = exp(−‖xi − xj‖² / (2σ²))
Sigmoid:               K(xi, xj) = tanh(β0 xiᵀxj + β1)
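The point of a kernel is that it evaluates φ(xi)ᵀφ(xj) without ever forming φ. A sketch checking this for the degree-2 polynomial kernel in 2-D, using one standard explicit feature map (written out here for illustration):

```python
import numpy as np

def phi(x):
    # Explicit feature map for K(x, z) = (1 + x^T z)^2 in 2-D.
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

K = (1.0 + x @ z) ** 2     # kernel evaluated directly
K_phi = phi(x) @ phi(z)    # same value via the explicit feature map
print(K, K_phi)  # 4.0 4.0
```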
The dual problem with a kernel:

    maximize  Σ_{i=1..n} αi − (1/2) Σ_{i=1..n} Σ_{j=1..n} αi αj yi yj K(xi, xj)

    such that  0 ≤ αi ≤ C  and  Σ_{i=1..n} αi yi = 0

The discriminant function becomes:

    g(x) = Σ_{i∈SV} αi yi K(xi, x) + b
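Classifying a new point then requires only kernel evaluations against the support vectors. A minimal sketch with the Gaussian kernel; the support vectors, coefficients, and bias are made up:

```python
import numpy as np

def rbf(xi, xj, sigma=1.0):
    """Gaussian kernel K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))."""
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2.0 * sigma ** 2))

# Hypothetical support vectors, labels, dual coefficients, and bias.
SV = np.array([[1.0, 0.0], [-1.0, 0.0]])
y_sv = np.array([1.0, -1.0])
alpha = np.array([1.0, 1.0])
b = 0.0

def g(x):
    """g(x) = sum_{i in SV} alpha_i y_i K(x_i, x) + b"""
    return sum(a * yi * rbf(xi, x) for a, yi, xi in zip(alpha, y_sv, SV)) + b

print(np.sign(g(np.array([0.8, 0.2]))))    # 1.0 (nearer the +1 support vector)
print(np.sign(g(np.array([-0.8, 0.2]))))   # -1.0
```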
Some Issues

Choice of kernel:
- A Gaussian or polynomial kernel is the default choice.
- If these prove ineffective, more elaborate kernels are needed.
- Domain experts can assist in formulating appropriate similarity measures.
Additional Resource
http://www.kernel-machines.org/
Demo of LibSVM
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
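As a usage sketch, scikit-learn's SVC class is built on top of LIBSVM and offers a quick way to experiment; the toy data below is made up:

```python
import numpy as np
from sklearn.svm import SVC  # scikit-learn's SVC wraps LIBSVM

# Tiny linearly separable toy set.
X = np.array([[1.0, 0.0], [2.0, 1.0], [-1.0, 0.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.predict([[3.0, 0.0], [-3.0, 0.0]]))  # [ 1 -1]
print(clf.support_)  # indices of the support vectors
```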