21 Support Vector Machines 03-10-2024

The document discusses Support Vector Machines (SVMs), focusing on the concept of maximum margin classifiers and the role of support vectors in defining decision boundaries. It explains the mathematical formulation of SVMs, including the conditions for optimal separating hyperplanes and the computation of margin width. Additionally, the document addresses challenges in real-world applications, such as the presence of outliers, and introduces the concept of soft-margin classification to improve robustness.


[Figure: repeated scatter plots of two classes of points (marked X and O); most panels show linearly separable data, while the final panel shows the two classes overlapping.]
Generalization (VC) bound:

    R[f] ≤ R_emp[f] + sqrt( (1/m) · ( h (ln(2m/h) + 1) + ln(4/δ) ) )
[Figure: two-class scatter plots (X and O), including one in which a few points of one class fall among the points of the other. Image from http://www.atrandomresearch.com/iclass/]

[Figure: data arranged along x1 as X X O O O O X X cannot be split by a single threshold, but plotting the feature x1^2 against x1 makes the classes linearly separable (axes: x2 vs. x1, and x1^2 vs. x1). Image from http://web.engr.oregonstate.edu/~afern/classes/cs534/]
Copyright © 2001, 2003, Andrew W. Moore

"Given an algorithm which is formulated in terms of a positive definite kernel K1, one can construct an alternative algorithm by replacing K1 with another positive definite kernel K2."

• SVMs can use the kernel trick
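To make this substitution concrete, here is a minimal sketch (not part of the slides) that trains the same SVM twice, changing only the kernel; the toy data, the use of scikit-learn, and the parameter values are assumptions made for illustration.

# Minimal sketch: swapping one positive-definite kernel for another (K1 -> K2).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)   # toy ring-shaped labels

for kernel in ("linear", "rbf"):                    # only the kernel argument changes
    clf = SVC(kernel=kernel, C=1.0).fit(X, y)
    print(kernel, "training accuracy:", clf.score(X, y))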


Maximum Margin

f(x, w, b) = sign(w · x + b)        (input x → classifier f → estimated label y_est)

[Figure: two-class data (one marker denotes +1, the other denotes -1) with the maximum-margin separating line.]

The maximum margin linear classifier is the linear classifier with the, um, maximum
margin. This is the simplest kind of SVM (called an LSVM): the Linear SVM.

Support Vectors are those datapoints that the margin pushes up against.
Why Maximum Margin?

f(x, w, b) = sign(w · x + b)

[Same figure: the maximum-margin linear classifier and its support vectors.]

1. Intuitively this feels safest.
2. If we've made a small error in the location of the boundary (it's been jolted in
   its perpendicular direction), this gives us the least chance of causing a
   misclassification.
3. CV is easy, since the model is immune to removal of any non-support-vector
   datapoints.
4. There's some theory that this is a good thing.
5. Empirically it works very very well.
Specifying a line and margin

[Figure: the "Predict Class = +1" zone above the Plus-Plane, the Classifier
Boundary between the planes, and the "Predict Class = -1" zone below the
Minus-Plane.]

• How do we represent this mathematically?
• …in m input dimensions?


Specifying a line and margin

[Figure: the same zones, now labelled with the plane equations
w · x + b = +1 (Plus-Plane), w · x + b = 0 (Classifier Boundary), and
w · x + b = -1 (Minus-Plane).]

Conditions for optimal separating hyperplane for data points
(x1, y1), …, (xl, yl), where yi = ±1:

1. w · xi + b ≥ +1 if yi = +1 (points in plus class)
2. w · xi + b ≤ -1 if yi = -1 (points in minus class)
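To make these two conditions concrete, the following sketch (made-up weight vector, bias, and data, not taken from the slides) checks them in the equivalent combined form yi (w · xi + b) ≥ 1.

# Sketch: verify the separating-hyperplane conditions on made-up data.
import numpy as np

w = np.array([1.0, 1.0])      # assumed weight vector
b = -3.0                      # assumed bias
X = np.array([[3.0, 2.0],     # plus-class points (y = +1)
              [4.0, 3.0],
              [1.0, 0.5],     # minus-class points (y = -1)
              [0.0, 1.0]])
y = np.array([+1, +1, -1, -1])

margins = y * (X @ w + b)     # combines conditions 1 and 2: yi (w . xi + b)
print(margins)                # each entry should be >= 1 for a valid separating hyperplane
print("all constraints satisfied:", np.all(margins >= 1))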


Computing the margin width

[Figure: the planes w · x + b = +1 and w · x + b = -1, separated by margin width M.]

M = Margin Width. How do we compute M in terms of w and b?

• Plus-plane  = { x : w · x + b = +1 }
• Minus-plane = { x : w · x + b = -1 }

Claim: The vector w is perpendicular to the Plus-Plane. Why?


Computing the margin width

• Plus-plane  = { x : w · x + b = +1 }
• Minus-plane = { x : w · x + b = -1 }

Claim: The vector w is perpendicular to the Plus-Plane. Why?
Let u and v be two vectors on the Plus-Plane. What is w · (u − v)?
(It is zero: w · u = w · v = 1 − b, so w is orthogonal to every direction lying
within the plane.)
And so of course the vector w is also perpendicular to the Minus-Plane.
Computing the margin width

• Plus-plane  = { x : w · x + b = +1 }
• Minus-plane = { x : w · x + b = -1 }
• The vector w is perpendicular to the Plus-Plane.
• Let x- be any point on the minus-plane.
  (Any location in R^m: not necessarily a datapoint.)
• Let x+ be the closest plus-plane point to x-.


Computing the margin width

• Plus-plane  = { x : w · x + b = +1 }
• Minus-plane = { x : w · x + b = -1 }
• The vector w is perpendicular to the Plus-Plane.
• Let x- be any point on the minus-plane.
• Let x+ be the closest plus-plane point to x-.
• Claim: x+ = x- + λ w for some value of λ. Why?


Computing the margin width

• Plus-plane  = { x : w · x + b = +1 }
• Minus-plane = { x : w · x + b = -1 }
• The vector w is perpendicular to the Plus-Plane.
• Let x- be any point on the minus-plane.
• Let x+ be the closest plus-plane point to x-.
• Claim: x+ = x- + λ w for some value of λ. Why?

The line from x- to x+ is perpendicular to the planes.
So to get from x- to x+, travel some distance in direction w.


Computing the margin width

What we know:
• w · x+ + b = +1
• w · x- + b = -1
• x+ = x- + λ w
• | x+ - x- | = M

It's now easy to get M in terms of w and b.
Computing the margin width

What we know:
• w · x+ + b = +1
• w · x- + b = -1
• x+ = x- + λ w
• | x+ - x- | = M

Substituting x+ = x- + λ w into the first equation:

    w · (x- + λ w) + b = 1
 => w · x- + b + λ (w · w) = 1
 => -1 + λ (w · w) = 1
 => λ = 2 / (w · w)
Computing the margin width

M = Margin Width = 2 / sqrt(w · w)

M = | x+ - x- | = | λ w | = λ | w | = λ sqrt(w · w)
  = (2 / (w · w)) · sqrt(w · w)
  = 2 / sqrt(w · w)
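A quick numeric sanity check of this result, as a sketch with an arbitrarily chosen w and b: pick a point on the minus-plane, step to the plus-plane along w using λ = 2 / (w · w), and confirm the measured distance equals 2 / sqrt(w · w).

# Sketch: numerically confirm M = 2 / sqrt(w . w) for an arbitrary w, b.
import numpy as np

w = np.array([3.0, 4.0])               # assumed weight vector (|w| = 5)
b = 1.0                                # assumed bias

x_minus = np.array([0.0, -0.5])        # satisfies w . x + b = -1
lam = 2.0 / (w @ w)                    # lambda = 2 / (w . w)
x_plus = x_minus + lam * w             # step from the minus-plane to the plus-plane

print("on plus-plane:", np.isclose(w @ x_plus + b, 1.0))
print("measured margin:", np.linalg.norm(x_plus - x_minus))
print("2 / sqrt(w . w):", 2.0 / np.sqrt(w @ w))   # both distances should be 0.4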
Learning the Maximum Margin Classifier

M = Margin Width = 2 / sqrt(w · w)

Given a guess of w and b we can:
• Compute whether all data points are in the correct half-planes
• Compute the width of the margin

So now we just need to write a program to search the space of w's and b's to find
the widest margin that matches all the datapoints. How?
Gradient descent? Simulated annealing? Matrix inversion? EM? Newton's method?
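In practice this search is posed as a quadratic program rather than a generic search over w and b. As a hedged sketch (the toy data and the large C value used to approximate a hard margin are assumptions), scikit-learn's linear-kernel SVC recovers w and b, from which the margin width 2 / ||w|| can be read off.

# Sketch: fit a (near) hard-margin linear SVM and read off w, b, and the margin.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.5], [0.5, 2.0],     # toy plus-class points
              [4.0, 4.0], [5.0, 5.5], [4.5, 3.5]])    # toy minus-class points
y = np.array([+1, +1, +1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)           # very large C ~ hard margin
w = clf.coef_[0]
b = clf.intercept_[0]

print("w =", w, " b =", b)
print("margin width M = 2 / ||w|| =", 2.0 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)

Specialized decomposition solvers (SMO-style, as in libsvm) are what carry out this optimization at scale.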
ECE 8443: Lecture 16

Large-Margin Classification

[Figure: hyperplanes C0, C1, C2, all separating class 1 from class 2; the optimal
classifier C0 is shown with margin hyperplanes H1 and H2, the normal vector w,
and the origin.]

• Hyperplanes C0–C2 achieve perfect classification (zero empirical risk):

  § C0 is optimal in terms of generalization.

  § The data points that define the boundary are called support vectors.

  § A hyperplane can be defined by:  x · w + b.

  § We will impose the constraints:  yi (xi · w + b) - 1 ≥ 0.
    The data points that satisfy the equality are called support vectors.

• Support vectors are found using a constrained optimization:

    Lp = (1/2) ||w||^2 - Σ_{i=1}^{N} αi yi (xi · w + b) + Σ_{i=1}^{N} αi

• The final classifier is computed using the support vectors and the weights:

    f(x) = Σ_{i=1}^{N} αi yi (xi · x) + b
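To connect this final form to something runnable, the sketch below (toy data; attribute names taken from scikit-learn's documented SVC interface) recomputes f(x) = Σ αi yi (xi · x) + b from the fitted support vectors and compares it with the library's own decision function.

# Sketch: rebuild f(x) = sum_i alpha_i y_i (x_i . x) + b from the support vectors.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)            # toy linearly separable labels

clf = SVC(kernel="linear", C=10.0).fit(X, y)

x_test = np.array([0.3, -0.8])
# dual_coef_ holds alpha_i * y_i for each support vector (one row for binary problems)
f_manual = clf.dual_coef_[0] @ (clf.support_vectors_ @ x_test) + clf.intercept_[0]
f_library = clf.decision_function(x_test.reshape(1, -1))[0]

print(f_manual, f_library)                            # the two values should agree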


Soft-Margin Classification

• In practice, the number of support vectors will grow unacceptably large for
  real problems with large amounts of data.

• Also, the system will be very sensitive to mislabeled training data or outliers.

• Solution: introduce "slack variables" ξi, or a soft margin:

    yi (xi · w + b) - (1 - ξi) ≥ 0

  [Figure: Class 1 and Class 2 with a few points allowed inside, or on the wrong
  side of, the margin.]

  This gives the system the ability to ignore data points near the boundary,
  and effectively pushes the margin towards the centroid of the training data.

• This is now a constrained optimization with an additional constraint: ξi ≥ 0.

• The solution to this problem can still be found using Lagrange multipliers.
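As an illustration of the role of the soft margin (the toy data with one mislabeled point and the specific C values are assumptions, not from the lecture), a small penalty C lets the optimizer tolerate margin violations instead of contorting the boundary around the outlier.

# Sketch: effect of the soft-margin penalty C on an outlier-contaminated toy set.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.5, 1.0], [1.0, 0.5],     # class -1
              [3.0, 3.0], [3.5, 2.5], [2.5, 3.5],     # class +1
              [0.8, 0.8]])                            # outlier labeled +1 near class -1
y = np.array([-1, -1, -1, +1, +1, +1, +1])

for C in (0.1, 1000.0):                               # soft vs. nearly hard margin
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    print(f"C={C:g}: margin width = {2.0 / np.linalg.norm(w):.3f}, "
          f"#support vectors = {len(clf.support_vectors_)}")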
Nonlinear Decision Surfaces

• Thus far we have only considered linear decision surfaces. How do we
  generalize this to a nonlinear surface?

  [Figure: a mapping φ(·) carrying points from the input space to a feature
  space in which the two classes become linearly separable.]

• Our approach will be to transform the data to a higher dimensional space
  where the data can be separated by a linear surface.

• Define a kernel function:

    K(xi, xj) = φ(xi) · φ(xj)

  Examples of kernel functions include the polynomial kernel:

    K(xi, xj) = (xi^T xj + 1)^d
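The kernel identity can be checked directly for the degree-2 polynomial case. The sketch below uses one standard explicit feature map φ for d = 2 in two dimensions (an added worked example, not from the lecture) and confirms that φ(xi) · φ(xj) equals (xi · xj + 1)^2.

# Sketch: verify K(x, z) = (x . z + 1)^2 equals an explicit feature-map dot product.
import numpy as np

def phi(x):
    # One explicit feature map for the degree-2 polynomial kernel in 2-D.
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def poly_kernel(x, z, d=2):
    return (x @ z + 1.0) ** d

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(poly_kernel(x, z))          # 4.0, since (1*3 + 2*(-1) + 1)^2 = 2^2
print(phi(x) @ phi(z))            # same value, computed in the 6-D feature space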


Kernel Functions

Other popular kernels are a radial basis function (popular in neural networks):

    K(xi, xj) = exp( -||xi - xj||^2 / (2σ^2) )

and a sigmoid function:

    K(xi, xj) = tanh( κ xi^T xj + θ )

• Our optimization does not change significantly:

    max W(α) = Σ_{i=1}^{n} αi - (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} αi αj yi yj K(xi, xj)

    subject to  C ≥ αi ≥ 0  and  Σ_{i=1}^{n} αi yi = 0

• The final classifier has a similar form:

    f(x) = Σ_{i=1}^{N} αi yi K(xi, x) + b

• Let's work some examples.
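Because the optimization touches the data only through K(xi, xj), an SVM can be trained directly from a precomputed Gram matrix. A hedged sketch (toy data; σ and C chosen arbitrarily) using scikit-learn's precomputed-kernel mode:

# Sketch: train on a precomputed RBF Gram matrix K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2)).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)   # toy nonlinear labels
sigma = 1.0                                              # assumed kernel width

sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-sq_dists / (2.0 * sigma ** 2))               # n x n Gram matrix

clf = SVC(kernel="precomputed", C=1.0).fit(K, y)
print("training accuracy:", clf.score(K, y))             # scoring also takes the Gram matrix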


SVM Limitations

• Uses a binary (yes/no) decision rule.

• Generates a distance from the hyperplane, but this distance is often not a
  good measure of our "confidence" in the classification.

• Can produce a "probability" as a function of the distance (e.g., using
  sigmoid fits), but these are inadequate.

• The number of support vectors grows linearly with the size of the data set.

• Requires the estimation of the trade-off parameter, C, via held-out sets.

  [Figure: error vs. model complexity; the training-set error keeps decreasing
  while the open-loop (held-out) error reaches a minimum at the optimum and
  then rises.]


Summary

• Support Vector Machines are one example of a kernel-based learning machine
  that is trained in a discriminative fashion.
• Integrates notions of risk minimization, large-margin and soft-margin
  classification.
• Two fundamental innovations:
  § maximize the margin between the classes using actual data points,
  § map the data into a higher-dimensional space in which the data is linearly
    separable.
• Training can be computationally expensive but classification is very fast.
• Note that SVMs are inherently non-probabilistic (e.g., non-Bayesian).
• SVMs can be used to estimate posteriors by mapping the SVM output to a
  likelihood-like quantity using a nonlinear function (e.g., sigmoid).
• SVMs are not inherently suited to an N-way classification problem. Typical
  approaches include a pairwise comparison or "one vs. world" approach.


Summary

• Many alternate forms include Transductive SVMs, Sequential SVMs, Support
  Vector Regression, Relevance Vector Machines, and data-driven kernels.

• Key lesson learned: a linear algorithm in the feature space is equivalent to a
  nonlinear algorithm in the input space. Standard linear algorithms can be
  generalized (e.g., kernel principal component analysis, kernel independent
  component analysis, kernel canonical correlation analysis, kernel k-means).

• What we didn't discuss:
  § How do you train SVMs?
  § Computational complexity?
  § How to deal with large amounts of data?


Support Vector Machines

Introduction to Data Mining, 2nd Edition
by Tan, Steinbach, Karpatne, Kumar


Support Vector Machines

• Find a linear hyperplane (decision boundary) that will separate the data
Support Vector Machines

• One Possible Solution


Support Vector Machines

• Another possible solution


Support Vector Machines

• Other possible solutions


Support Vector Machines

• Which one is better? B1 or B2?


• How do you define better?
Support Vector Machines

• Find the hyperplane that maximizes the margin => B1 is better than B2
Support Vector Machines

The decision boundary is the hyperplane w · x + b = 0, with margin hyperplanes
w · x + b = +1 and w · x + b = -1.

    f(x) = +1 if w · x + b ≥ +1
           -1 if w · x + b ≤ -1

    Margin = 2 / ||w||
Linear SVM

• Linear model:

      f(x) = +1 if w · x + b ≥ +1
             -1 if w · x + b ≤ -1

• Learning the model is equivalent to determining the values of w and b
  – How to find w and b from the training data?


Learning Linear SVM

• Objective is to maximize:  Margin = 2 / ||w||

  – This is equivalent to minimizing:  L(w) = ||w||^2 / 2

  – Subject to the following constraints:

        yi = +1 if w · xi + b ≥ 1
             -1 if w · xi + b ≤ -1

    or, equivalently,

        yi (w · xi + b) ≥ 1,   i = 1, 2, ..., N

  ▪ This is a constrained optimization problem
    – Solve it using the Lagrange multiplier method
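The Lagrange-multiplier route can be sketched end to end on a tiny made-up dataset (everything below is illustrative; production solvers use specialized QP/SMO algorithms rather than a general-purpose optimizer): form the dual objective, maximize it subject to αi ≥ 0 and Σ αi yi = 0, then recover w and b from the multipliers.

# Sketch: solve the hard-margin SVM dual with a generic constrained optimizer.
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [0.0, 1.0],      # class -1
              [3.0, 3.0], [4.0, 2.5]])     # class +1
y = np.array([-1.0, -1.0, 1.0, 1.0])
G = (y[:, None] * y[None, :]) * (X @ X.T)  # G_ij = y_i y_j (x_i . x_j)

def neg_dual(a):                           # minimize the negative dual objective
    return 0.5 * a @ G @ a - a.sum()

n = len(y)
res = minimize(neg_dual, x0=np.zeros(n), method="SLSQP",
               bounds=[(0.0, None)] * n,                              # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])  # sum alpha_i y_i = 0

alpha = res.x
w = (alpha * y) @ X                        # w = sum_i alpha_i y_i x_i
sv = np.argmax(alpha)                      # a point with alpha_i > 0 is a support vector
b = y[sv] - w @ X[sv]                      # recover b from that support vector
print("alpha =", np.round(alpha, 3), " w =", w, " b =", b)
print("margins y_i (w.x_i + b):", y * (X @ w + b))   # should all be >= 1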


Example of Linear SVM

[Figure: a linearly separable two-class dataset with the maximum-margin decision
boundary; the circled points are the support vectors.]


Learning Linear SVM

• Decision boundary depends only on the support vectors
  – If you have a data set with the same support vectors, the decision
    boundary will not change
  – How to classify using SVM once w and b are found? Given a test record xi:

        f(xi) = +1 if w · xi + b ≥ 1
                -1 if w · xi + b ≤ -1

    (i.e., classify xi by the sign of w · xi + b)


Support Vector Machines

• What if the problem is not linearly separable?



Support Vector Machines

• What if the problem is not linearly separable?


– Introduce slack variables ξi
  ▪ Need to minimize:

        L(w) = ||w||^2 / 2 + C Σ_{i=1}^{N} (ξi)^k

  ▪ Subject to:

        yi = +1 if w · xi + b ≥ 1 - ξi
             -1 if w · xi + b ≤ -1 + ξi

  ▪ If k is 1 or 2, this leads to a similar objective function as the linear
    SVM but with different constraints (see textbook)
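As a concrete reading of this objective (a sketch with made-up w, b, C, and data, taking k = 1), each slack value is ξi = max(0, 1 - yi (w · xi + b)), which is zero for points on the correct side of their margin hyperplane.

# Sketch: evaluate the soft-margin objective ||w||^2 / 2 + C * sum(xi_i) with k = 1.
import numpy as np

w = np.array([1.0, -1.0])                 # assumed weight vector
b = 0.0                                   # assumed bias
C = 10.0                                  # assumed trade-off parameter
X = np.array([[2.0, 0.0], [0.5, 0.0], [0.0, 2.0], [0.2, 0.1]])
y = np.array([+1, +1, -1, -1])

xi = np.maximum(0.0, 1.0 - y * (X @ w + b))   # slack: 0 when y_i (w.x_i + b) >= 1
objective = 0.5 * (w @ w) + C * xi.sum()

print("slacks:", xi)
print("objective L(w) =", objective)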


Support Vector Machines

• Find the hyperplane that optimizes both factors


Nonlinear Support Vector Machines

• What if the decision boundary is not linear?


Nonlinear Support Vector Machines

• Transform data into higher dimensional space

  Decision boundary:  w · Φ(x) + b = 0
Learning Nonlinear SVM

• Optimization problem: minimize L(w) = ||w||^2 / 2 + C Σ ξi, subject to
  yi (w · Φ(xi) + b) ≥ 1 - ξi

• This leads to the same set of equations as before, but they involve Φ(x)
  instead of x


Learning Nonlinear SVM

• Issues:
  – What type of mapping function Φ should be used?
  – How to do the computation in high-dimensional space?
    ▪ Most computations involve the dot product Φ(xi) · Φ(xj)
    ▪ Curse of dimensionality?


Learning Nonlinear SVM

• Kernel Trick:
  – Φ(xi) · Φ(xj) = K(xi, xj)
  – K(xi, xj) is a kernel function (expressed in terms of the coordinates in
    the original space)
    ▪ Examples: the polynomial, radial basis function, and sigmoid kernels
      listed earlier


Example of Nonlinear SVM

[Figure: decision boundary of an SVM with a polynomial kernel of degree 2 on data
that is not linearly separable.]
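A comparable example can be sketched with assumed data: concentric rings generated with scikit-learn's make_circles, fit with a linear kernel and with a degree-2 polynomial kernel for contrast.

# Sketch: a degree-2 polynomial-kernel SVM on ring-shaped (non-linearly-separable) data.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear", C=1.0).fit(X, y)
poly2 = SVC(kernel="poly", degree=2, coef0=1.0, C=1.0).fit(X, y)

print("linear kernel accuracy: ", linear.score(X, y))   # poor: no separating line exists
print("degree-2 poly accuracy: ", poly2.score(X, y))    # near 1.0: separable after mapping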


Learning Nonlinear SVM

• Advantages of using a kernel:
  – Don't have to know the mapping function Φ
  – Computing the dot product Φ(xi) · Φ(xj) in the original space avoids the
    curse of dimensionality

• Not all functions can be kernels
  – Must make sure there is a corresponding Φ in some high-dimensional space
  – Mercer's theorem (see textbook)


Characteristics of SVM

• The learning problem is formulated as a convex optimization problem
  – Efficient algorithms are available to find the global minimum
  – Many of the other methods use greedy approaches and find locally
    optimal solutions
  – High computational complexity for building the model

• Robust to noise
• Overfitting is handled by maximizing the margin of the decision boundary
• SVM can handle irrelevant and redundant attributes better than many
  other techniques
• The user needs to provide the type of kernel function and cost function
• Difficult to handle missing values
• What about categorical variables?
