CIS5200: Machine Learning Spring 2024

Lecture 9: Support Vector Machines


Date: February 15, 2024

In this lecture, we'll cover our next machine learning algorithm: the Support Vector Machine (SVM). SVMs are one of the most successful linear models of all time, and there's a particular notion of "quality" under which they are in fact the optimal linear classifier.

Figure 1: Left: Intuitively, the purple hyperplane parameterized by w2 is a "better" classifier than the hyperplane parameterized by w1, because it seems to leave more room for error. Right: the margin, a concept we've briefly touched on before, seems to formalize this intuition nicely. The green hyperplane has a much smaller margin.

To begin, consider the dataset in the figure above. Intuitively, it seems like the purple hyperplane
parameterized by w2 is "better" than the green one parameterized by w1. The reason for this is
that there are orange circles very close to the green hyperplane – it’s not implausible to think that
there could be orange circle test points just to the left of the given training data that we would
classify incorrectly. Put another way, the purple hyperplane is better because it seems to “leave
more room for error.” On the right, we see that we can formalize this intuition using a quantity
we’ve already seen before: the margin. The purple hyperplane is “better” because it has a larger
margin. SVMs are linear models with the maximum possible margin given a dataset.
Note: Recall that the Perceptron algorithm is guaranteed to find a hyperplane that separates the data as long as one exists. In fact, since we assumed separation with margin, there are infinitely many hyperplanes that separate the data, and the Perceptron may find any one of them. In particular, the Perceptron algorithm did not provide any guarantees on the margin of the hyperplane it found, even though we assumed that the training dataset had margin γ with respect to the true separator. However, in Homework 1, we saw a simple modification of the Perceptron, the Margin Perceptron, which guaranteed an approximate margin of γ/3. The SVM will guarantee the maximum possible margin, which is at least γ.

1 The Margin Revisited

Recall from the perceptron that we defined the margin γ(w, S) of a hyperplane w and dataset S as
the smallest distance between any point in S and the hyperplane. Since the margin is going to be
such a critical quantity to our discussion of SVMs, let’s talk about how to derive it. This time, we
won’t assume that the hyperplane is defined by a normalized vector w – instead we’re given some
arbitrary vector w that defines a hyperplane. As we’ll see, we get the “right thing” for normalized
vectors anyways.
To derive the margin, we make the following observations about the distance from a point x to
some hyperplane parameterized by w:

1. For any point x, there is a point xp that is the perpendicular projection of x onto the hyperplane.

2. Because the vector pointing from xp to x, d = x − xp, is perpendicular to the hyperplane, it is parallel to w. As a result, it must be the case that d = αw for some scalar α.

3. Since xp is on the hyperplane, we know that w⊤xp = 0 by definition of the hyperplane parameterized by w.

These facts are enough to compute the vector pointing from the hyperplane to the point x:

    w⊤xp = 0                      (xp is on the hyperplane)
    w⊤(x − d) = 0                 (d = x − xp  =⇒  xp = x − d)
    w⊤(x − αw) = 0                (w ∥ d, so d = αw)
    w⊤x − α w⊤w = 0               (expand)
    α = (w⊤x) / (w⊤w)             (solve for α)
    d = αw = ((w⊤x) / (w⊤w)) w    (substitute α back into d = αw)
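As a quick sanity check on this derivation, here is a minimal numpy sketch (the vectors below are made up purely for illustration, not taken from the lecture) that computes α, d, and xp for a concrete w and x and verifies that xp indeed lies on the hyperplane:

    import numpy as np

    # Illustrative values (not from the lecture): a hyperplane through the
    # origin with normal vector w, and an arbitrary point x.
    w = np.array([2.0, 1.0])
    x = np.array([3.0, 4.0])

    alpha = (w @ x) / (w @ w)   # alpha = (w^T x) / (w^T w)
    d = alpha * w               # the vector pointing from x_p to x
    x_p = x - d                 # the perpendicular projection of x onto the hyperplane

    print(np.isclose(w @ x_p, 0.0))   # True: x_p lies on the hyperplane
    print(np.allclose(x, x_p + d))    # True: x decomposes as x_p + alpha * w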


Given this vector d, we simply need to compute its length √(d⊤d) to get the distance from the hyperplane to x:

    √(d⊤d) = √( ((w⊤x)² / (w⊤w)²) · w⊤w ) = √( (w⊤x)² / (w⊤w) ) = |w⊤x| / √(w⊤w)     (1)

Note that if w is a unit vector, i.e. ∥w∥₂² = w⊤w = 1, then this expression indeed just becomes |w⊤x|. Either way, we can now write down a general expression for the margin γ(w, S) of a hyperplane parameterized by w for a dataset S:

    γ(w, S) = min_i |w⊤xi| / √(w⊤w)     (2)
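Here is a minimal numpy sketch of equation (2); the toy dataset below is made up purely for illustration. Note that scaling w does not change the value, which foreshadows the scale invariance we exploit later.

    import numpy as np

    def margin(w, X):
        """Margin gamma(w, S) of the hyperplane w^T x = 0 on a dataset X
        whose rows are the points x_i, as in equation (2)."""
        return np.min(np.abs(X @ w)) / np.sqrt(w @ w)

    # Toy dataset (illustrative only).
    X = np.array([[1.0, 2.0],
                  [2.0, -1.0],
                  [-3.0, 0.5]])
    w = np.array([0.6, 0.8])   # happens to be a unit vector

    print(margin(w, X))        # smallest distance from any x_i to the hyperplane
    print(margin(3 * w, X))    # same value: the margin does not depend on the scale of w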

2 (Hard Margin) Support Vector Machines: Maximizing the Margin

Now that we've recapped our expression for the margin, along with the intuition that we'd like a linear classifier with a large margin, the first thing we might try with our nice numerical optimization tools is to simply maximize the margin:

DANGER: Bad Idea Zone

    w∗ = max_w γ(w, S)

The problem with this is, of course, that it says nothing about actually fitting the data! Indeed,
it is trivial to simply maximize the margin: just place the hyperplane as far away from all of your
data as you can get away with. What we need to do is not just find a maximum margin hyperplane,
but a maximum margin separating hyperplane (which we’ll assume exists for the moment).
Recall that a point (xi , yi ) is on the “right” side of the hyperplane if yi w⊤ xi > 0. What we can
do is modify the above optimization problem to be subject to the constraint that all training data
points are on the “right” side of the hyperplane:

A Better Idea

    w∗ = max_w γ(w, S)   s.t.   ∀i  yi w⊤xi > 0     (w needs to be a separating hyperplane)

Pros of this optimization problem: it probably gives us a really good linear model! Cons of this
optimization problem: it has constraints, so it doesn’t immediately look like we can apply gradient
descent.
The rest of what we need to do in this section is to fix that con. First, let’s plug in the definition
of the margin above to make things even worse for a moment:
    w∗ = max_w min_i |w⊤xi| / √(w⊤w)   s.t.   ∀i  yi w⊤xi > 0
Let's play a few tricks to clean up this optimization problem. First of all, remember that the hyperplane is scale invariant: for any hyperplane parameterized by w, cw for c ≠ 0 parameterizes the same hyperplane. This is because (cw)⊤x = 0 ⇐⇒ w⊤x = 0. So let's pick a really convenient scale c. Given the optimal solution w∗ to the above optimization problem, let's find c such that min_i |(cw∗)⊤xi| = 1. Since w∗ and cw∗ define the same hyperplane, why not just search for this scaled version directly by adding this as a constraint, so that we find cw∗ instead of w∗ in the first place?
    w∗ = max_w min_i |w⊤xi| / √(w⊤w)
         s.t.   ∀i  yi w⊤xi > 0
                min_i |w⊤xi| = 1

The claim is that the above optimization problem, which just has a new constraint, is equivalent
to the first one. Why does this help us? Well, let's first pull the factor 1/√(w⊤w) out of the minimization over i, since it doesn't depend on the data:
    w∗ = max_w (1 / √(w⊤w)) · min_i |w⊤xi|
         s.t.   ∀i  yi w⊤xi > 0
                min_i |w⊤xi| = 1

But at the minimum, we know by the constraint that min_i |w⊤xi| = 1, so we can drop it!

    w∗ = max_w 1 / √(w⊤w)
         s.t.   ∀i  yi w⊤xi > 0
                min_i |w⊤xi| = 1

Now, we'll make two observations. The first is simple; the second is more involved, and we'll prove it. First, the maximization problem can be converted to a simpler minimization problem:

    max_w 1 / √(w⊤w)   ⇐⇒   min_w w⊤w

This is because maximizing a fraction with a fixed numerator is the same as making the denominator as small as possible, and since the square root √(·) is monotonic we can just drop it.
Next, the big part. I claim that the following optimization problems are equivalent:
    min_w w⊤w   s.t.   ∀i yi w⊤xi > 0  and  min_i |w⊤xi| = 1
        ⇐⇒
    min_w w⊤w   s.t.   ∀i yi w⊤xi ≥ 1

Let’s be clear about what I mean by equivalent. I am claiming that the optimal solution w∗ is
the same for both optimization problems. Equivalently, w∗ satisfies the left two constraints if and
only if it satisfies the right constraint. Let’s prove this.

Left implies right. Let’s start by showing that if the left two constraints are satisfied by w∗ , then
the right one is. The first constraint implies that yi w⊤ xi is positive for every i. Since yi ∈ {+1, −1},
the multiplication by yi only changes the sign of w⊤xi; it cannot change the magnitude. Therefore,

yi w⊤ xi = |w⊤ xi |

But by the second constraint on the left, we know that the smallest value of |w⊤ xi | is 1. Therefore:

yi w⊤ xi = |w⊤ xi | ≥ 1

Thus, if the left two constraints are satisfied, the right constraint is also satisfied.

Right implies left. To finish the proof that these constraint sets are equivalent, we just need
to show that the right constraint implies both of the left two constraints. First of all, it is obvious
that the first constraint is implied:
yi w⊤ xi ≥ 1 =⇒ yi w⊤ xi > 0
Clearly, if for every i the quantity is at least 1, it's also greater than 0. Now we consider the
second constraint. Suppose for contradiction that we indeed solved the right optimization problem:
    w∗ = min_w w⊤w   s.t.   ∀i  yi w⊤xi ≥ 1
and found that the second left constraint was violated. In other words, we found that min_i |w∗⊤xi| = c for some c ≠ 1. Clearly, c cannot be less than 1, otherwise we would not have yi w∗⊤xi ≥ 1 for all i. Therefore, c > 1.
Set w∗∗ = w∗/c. I claim that w∗∗ is at least a feasible solution to the optimization problem above.
To see this, observe that, by definition,
    yi w∗∗⊤xi = (1/c) · yi w∗⊤xi
Since yi w∗⊤ xi and c are both positive, we certainly haven’t changed the sign – the above is still
positive. Therefore:
    (1/c) · yi w∗⊤xi = (1/c) · |w∗⊤xi|

But by definition of c, we know that min_i |w∗⊤xi| = c. Therefore,
    min_i (1/c) · |w∗⊤xi| = c/c = 1
Since the smallest value of yi w∗∗⊤xi is 1, certainly for all i this quantity is greater than or equal to 1. Therefore, w∗∗ is a feasible solution.
But what about the objective value?
    w∗∗⊤w∗∗ = ((1/c) w∗)⊤ ((1/c) w∗) = (1/c²) · w∗⊤w∗
Since c > 1, this is less than the objective value we achieved with w∗ . This is a contradiction,
because w∗ was supposed to be the optimal solution!
Therefore, reiterating things, we do indeed have that:
    min_w w⊤w   s.t.   ∀i yi w⊤xi > 0  and  min_i |w⊤xi| = 1
        ⇐⇒
    min_w w⊤w   s.t.   ∀i yi w⊤xi ≥ 1

This right optimization problem is exactly what we’ll call the hard margin SVM:

The Hard Margin SVM

We call the linear model specified by h(x) = w∗⊤ x parameterized by w∗ a “hard margin”
SVM if it is a solution to the optimization problem:

    w∗ = min_w w⊤w   s.t.   ∀i  yi w⊤xi ≥ 1

What happens if the data isn’t linearly separable? Then there is no feasible solution! We call this
a “hard margin” because it requires linear separability. What we’ll do next is relax this.
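The hard margin SVM is a constrained quadratic optimization problem (a quadratic program, as noted in the next section), so one way to solve it off the shelf is with a convex optimization library. The sketch below is only an illustration, not part of the lecture: it assumes the cvxpy library and a made-up, linearly separable toy dataset. If the data were not separable, the solver would report the problem as infeasible.

    import numpy as np
    import cvxpy as cp   # cvxpy is an assumption here; any QP solver would do

    # Made-up, linearly separable toy data: rows of X are points, y in {+1, -1}.
    X = np.array([[2.0, 2.0],
                  [3.0, 1.0],
                  [-2.0, -1.0],
                  [-1.0, -3.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])

    w = cp.Variable(X.shape[1])
    objective = cp.Minimize(cp.sum_squares(w))        # min_w w^T w
    constraints = [cp.multiply(y, X @ w) >= 1]        # y_i w^T x_i >= 1 for all i
    cp.Problem(objective, constraints).solve()

    w_star = w.value
    print(w_star)
    print(np.min(np.abs(X @ w_star)) / np.linalg.norm(w_star))   # the achieved margin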

3 Soft margin Support Vector Machines

The formulation above has two problems:

1. It’s still a constrained optimization problem. It’s a really nice kind of constrained optimization
problem called a quadratic program, so it’s actually solvable, but we still don’t like constraints.

2. The hard margin SVM only works if the data is linearly separable – otherwise, there is no
feasible solution to the optimization problem.

In this section, we’ll solve both of these problems with the same trick. To do this, we’ll introduce
the notion of slack variables to the constraint. In particular, we make the following observation:
for any w, if w is not a feasible solution above, it's because there's some i for which yi w⊤xi < 1.
The idea of slack variables is to explicitly allow this kind of violation, but make the optimizer pay
a penalty for it. The resulting optimization problem might look like this:
    w∗ = min_{w,ξ} w⊤w + C Σi ξi
         s.t.   ∀i  yi w⊤xi ≥ 1 − ξi     (ξi is the constraint violation for xi)
                ∀i  ξi ≥ 0

Here, ξi measures the amount by which the constraint for (xi, yi) is violated, i.e., the amount by which yi w⊤xi falls below 1. In English, what this new optimization problem does is allow each xi to violate the constraint by some ξi, but we pay a penalty of C·ξi (where C is a constant chosen by you) in the objective. Our goal is to minimize the total penalty.
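To make the role of the slack variables concrete, here is a hedged sketch of this slack formulation, again assuming cvxpy and made-up data (neither is part of the lecture). The last point is deliberately mislabeled so that no separating hyperplane through the origin exists, which forces at least one ξi to be strictly positive.

    import numpy as np
    import cvxpy as cp   # cvxpy is an assumption here

    # Toy data (illustrative only); the last point is labeled -1 on purpose even
    # though it lies on the +1 side, so the data is not linearly separable.
    X = np.array([[2.0, 2.0],
                  [3.0, 1.0],
                  [-2.0, -1.0],
                  [1.0, 1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    C = 1.0                                   # penalty constant chosen by you

    w = cp.Variable(X.shape[1])
    xi = cp.Variable(X.shape[0])              # one slack variable per training point
    objective = cp.Minimize(cp.sum_squares(w) + C * cp.sum(xi))
    constraints = [cp.multiply(y, X @ w) >= 1 - xi,   # violations are allowed...
                   xi >= 0]                           # ...but only non-negative ones
    cp.Problem(objective, constraints).solve()

    print(w.value)
    print(xi.value)   # at least one slack is strictly positive: the data is not separable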
So far, we seem to have solved the “hard margin” problem, but not the constrained optimization
problem. To achieve this final goal, first rearrange the above optimization problem:
    w∗ = min_{w,ξ} w⊤w + C Σi ξi
         s.t.   ∀i  ξi ≥ 1 − yi w⊤xi
                ∀i  ξi ≥ 0

Observe that at the optimum:

    ξi = 1 − yi w⊤xi   if yi w⊤xi < 1
    ξi = 0             if yi w⊤xi ≥ 1

Why is this true? Well, if yi w⊤ xi ≥ 1, clearly we can set ξi = 0. This satisfies both constraints, but
adds the minimum possible additional loss to the objective. On the other hand, if yi w⊤xi < 1, the minimum value of ξi we can choose to satisfy the constraint that yi w⊤xi ≥ 1 − ξi is just 1 − yi w⊤xi,
which we can see by rearranging the constraint itself.
Thus, for any choice of w, instead of treating ξi as a variable to be optimized over, we can just
compute it as above. A compact form for this is just:

ξi = max(0, 1 − yi w⊤ xi )

To recap, we know that (1) at the optimum, ξi takes exactly the value above, and (2) this definition
of ξi always satisfies both constraints. Thus, we can plug this definition of ξi into the loss and drop
both constraints!

    w∗ = min_w w⊤w + C Σi max(0, 1 − yi w⊤xi)
We’re essentially done – the above is a perfectly fine optimization problem, and we could stop here.
However, to make it look more like the optimization problems we’ve seen in class, let’s do three
“clean up” steps:

1. First, write w⊤w as ∥w∥₂².

2. Second, since C is a free parameter that you will have to specify anyway, let's instead define λ = 1/(Cm) and multiply the whole objective by 1/(Cm); this turns w⊤w + C Σi max(0, 1 − yi w⊤xi) into (1/m) Σi max(0, 1 − yi w⊤xi) + λ∥w∥₂².

3. Finally, let’s swap the order of the terms.

The result is the following:

The Soft Margin SVM

We call the linear model specified by h(x) = w∗⊤ x parameterized by w∗ a “soft margin”
SVM if w∗ is a solution to the optimization problem:
    w∗ = min_w (1/m) Σi max(0, 1 − yi w⊤xi) + λ∥w∥₂²
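Since this final objective is unconstrained, we can minimize it with the gradient-based tools mentioned earlier. Below is a minimal numpy sketch of (sub)gradient descent on the soft margin objective; the toy data, step size, iteration budget, and λ are arbitrary illustrative choices, not from the lecture. The hinge term is not differentiable where yi w⊤xi = 1, so a subgradient is used there.

    import numpy as np

    def soft_margin_objective(w, X, y, lam):
        """(1/m) * sum_i max(0, 1 - y_i w^T x_i) + lam * ||w||_2^2"""
        hinge = np.maximum(0.0, 1.0 - y * (X @ w))
        return hinge.mean() + lam * (w @ w)

    def soft_margin_subgradient(w, X, y, lam):
        """A subgradient of the objective above with respect to w."""
        active = (y * (X @ w) < 1).astype(float)        # points violating the margin of 1
        grad_hinge = -(active * y) @ X / X.shape[0]     # subgradient of the averaged hinge term
        return grad_hinge + 2.0 * lam * w               # plus the gradient of lam * w^T w

    # Toy data (illustrative only).
    X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])

    w = np.zeros(X.shape[1])
    lam, step = 0.1, 0.1
    for _ in range(500):                                # fixed step size and iteration budget
        w = w - step * soft_margin_subgradient(w, X, y, lam)

    print(w, soft_margin_objective(w, X, y, lam))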

4 The Empirical Risk Minimization View

Part of the reason we did those “clean up” steps is that now the above minimization problem looks
pretty familiar. We’re minimizing the following function of w:
    R̂(w) = (1/m) Σi max(0, 1 − yi w⊤xi) + λ∥w∥₂²

where the first term is the loss function and the second term is the ℓ2-regularizer.

Indeed, the soft margin SVM is just a plain old empirical risk minimization problem like linear or
logistic regression, just with a particular loss function and an L2 regularizer!
We call this specific loss the hinge loss:

    ℓhinge(h(x), y) = max(0, 1 − y · h(x))

A soft margin SVM therefore just uses the hinge loss plus L2 regularization. Here’s a plot comparing
the 0 − 1 loss, the hinge loss, and the logistic loss:

Figure 2: Three classification losses we’ve learned about so far (plus the exponential loss). These
are plotted as a function of yi h(xi ), because that quantity measures how “bad” the fit is – negative
values mean we are classifying incorrectly.

By looking at these loss functions, we can start to get a better sense of the behavior of these algo-
rithms. The hinge loss penalizes misclassification, but also penalizes the case where 0 < yi h(xi ) < 1.
The hinge loss has no penalty if points are classified correctly by at least a “margin” of 1. The ex-
ponential loss penalizes points the model is getting very wrong more heavily than the hinge/logistic
loss. What do you think this will mean if there are outliers?
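Since the plot itself may not survive in this copy of the notes, the following sketch tabulates the four losses from the figure as functions of z = y·h(x). The exact forms of the logistic and exponential losses here are common conventions and an assumption on my part, not definitions taken from the lecture.

    import numpy as np

    # Each loss as a function of z = y * h(x); see the assumption note above.
    losses = {
        "0-1":         lambda z: (z <= 0).astype(float),
        "hinge":       lambda z: np.maximum(0.0, 1.0 - z),
        "logistic":    lambda z: np.log(1.0 + np.exp(-z)),
        "exponential": lambda z: np.exp(-z),
    }

    z = np.array([-2.0, -0.5, 0.0, 0.5, 1.0, 2.0])
    for name, loss in losses.items():
        print(f"{name:>12}: {np.round(loss(z), 3)}")   # losses shrink as y * h(x) grows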
