
IAML: Support Vector Machines I

Nigel Goddard
School of Informatics

Semester 1

Outline

- Separating hyperplane with maximum margin
- Non-separable training data
- Expanding the input into a high-dimensional space
- Support vector regression
- Reading: W & F sec 6.3 (maximum margin hyperplane, nonlinear class boundaries), SVM handout. SV regression not examinable.
Overview

- Support vector machines are one of the most effective and widely used classification algorithms.
- SVMs are the combination of two ideas:
  - Maximum margin classification
  - The "kernel trick"
- SVMs are linear classifiers, like logistic regression and the perceptron.
Stuff You Need to Remember

w^T x is the length of the projection of x onto w (if w is a unit vector).

[Figure: a vector w and a point x, with b the length of the projection of x onto w, i.e., b = w^T x.]

(If you do not remember this, see the supplementary maths notes on the course Web site; a small numeric sketch is given below.)
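As a concrete illustration of the fact above, here is a minimal numpy sketch (the vectors are made-up examples, not from the slides):

```python
import numpy as np

# Made-up example: length of the projection of x onto a unit vector w.
w = np.array([3.0, 4.0])
w = w / np.linalg.norm(w)   # make w a unit vector
x = np.array([2.0, 1.0])

b = w @ x                   # b = w^T x = length of the projection of x onto w
print(b)                    # 2.0, i.e. (3*2 + 4*1) / 5
```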
Separating Hyperplane

For any linear classifier:
- Training instances (x_i, y_i), i = 1, ..., n, with y_i ∈ {−1, +1}
- Hyperplane w^T x + w_0 = 0 (the decision rule is sketched in code below)
- Notice that for this lecture we use −1 rather than 0 for the negative class. This will be convenient for the maths.

[Figure: two classes ("o" and "x") in the (x1, x2) plane, separated by a hyperplane with normal vector w.]
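For concreteness, a minimal sketch of the linear decision rule with the ±1 labelling used in this lecture (the weights below are arbitrary placeholders):

```python
import numpy as np

# Arbitrary placeholder hyperplane parameters.
w = np.array([1.0, -2.0])
w0 = 0.5

def predict(x):
    """Predict +1 if w^T x + w0 >= 0, else -1."""
    return 1 if w @ x + w0 >= 0 else -1

print(predict(np.array([3.0, 1.0])))   # +1  (3 - 2 + 0.5 = 1.5 >= 0)
print(predict(np.array([0.0, 2.0])))   # -1  (0 - 4 + 0.5 = -3.5 < 0)
```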
A Crap Decision Boundary

[Figure: two candidate separating hyperplanes (with normal vector w) for the same "o" vs "x" data in the (x1, x2) plane. Left panel: "Seems okay". Right panel: "This is crap".]
Idea: Maximize the Margin

The margin is the distance between the decision boundary (the hyperplane) and the closest training point.

[Figure: a separating hyperplane with normal vector w; the margin is the distance from the hyperplane to the closest training point.]
Computing the Margin

- The tricky part will be to get an equation for the margin.
- We'll start by getting the distance from the origin to the hyperplane.
- i.e., we want to compute the scalar b below.

[Figure: the hyperplane w^T x + w_0 = 0, its normal vector w, and the distance b from the origin to the hyperplane.]
Computing the Distance to Origin

- Define z as the point on the hyperplane closest to the origin.
- z must be proportional to w, because w is normal to the hyperplane.
- By definition of b, the norm of z is ||z|| = b.

So

    b (w / ||w||) = z

[Figure: the hyperplane w^T x + w_0 = 0, the normal vector w, and the closest point z on the hyperplane at distance b from the origin.]
Computing the Distance to Origin

- We know that (a) z is on the hyperplane, and (b) b (w / ||w||) = z.
- First, (a) means w^T z + w_0 = 0.
- Substituting (b) into (a) we get

    w^T (b w / ||w||) + w_0 = 0
    b (w^T w) / ||w|| + w_0 = 0
    b = − w_0 / ||w||

- Remember ||w|| = √(w^T w), so (w^T w) / ||w|| = ||w||.
- Now we have the distance from the origin to the hyperplane! (A numeric check follows below.)
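A quick numeric check of b = −w_0 / ||w||, using made-up values for w and w_0:

```python
import numpy as np

# Made-up hyperplane w^T x + w0 = 0.
w = np.array([3.0, 4.0])
w0 = -10.0

b = -w0 / np.linalg.norm(w)            # distance from the origin to the hyperplane
print(b)                               # 2.0

# The closest point z = b * w/||w|| does lie on the hyperplane.
z = b * w / np.linalg.norm(w)
print(np.isclose(w @ z + w0, 0.0))     # True
```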
Computing the Distance to Hyperplane

[Figure: a point x at distance c from the hyperplane; b is the distance from the origin to the hyperplane, and a is the length of the projection of x onto w.]

- Now we want c, the distance from x to the hyperplane.
- It's clear that c = |b − a|, where a is the length of the projection of x onto w. Quiz: What is a?
Computing the Distance to Hyperplane

[Figure: as before, the point x, its projection length a onto w, the origin distance b, and the distance c to the hyperplane.]

- Now we want c, the distance from x to the hyperplane.
- It's clear that c = |b − a|, where a is the length of the projection of x onto w. Quiz: What is a?

    a = w^T x / ||w||

(A numeric check of c follows below.)
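A small check (with made-up numbers) that c = |b − a| agrees with the single-formula distance given on the next slide, |w^T x + w_0| / ||w||:

```python
import numpy as np

# Made-up hyperplane and point.
w = np.array([3.0, 4.0])
w0 = -10.0
x = np.array([4.0, 7.0])

norm_w = np.linalg.norm(w)
b = -w0 / norm_w              # distance from the origin to the hyperplane
a = (w @ x) / norm_w          # length of the projection of x onto w

print(abs(b - a))                    # 6.0, the distance from x to the hyperplane
print(abs(w @ x + w0) / norm_w)      # 6.0, the same distance in one formula
```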
Equation for the Margin

- The perpendicular distance from a point x to the hyperplane w^T x + w_0 = 0 is

    (1 / ||w||) |w^T x + w_0|

- The margin is the distance from the closest training point to the hyperplane (implemented in the sketch below):

    min_i (1 / ||w||) |w^T x_i + w_0|
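A direct implementation of the margin formula on a made-up toy training set:

```python
import numpy as np

# Made-up hyperplane and toy training points.
w = np.array([1.0, 1.0])
w0 = -3.0
X = np.array([[1.0, 1.0],
              [4.0, 4.0],
              [0.0, 2.0],
              [5.0, 1.0]])

# Perpendicular distance of each point to the hyperplane w^T x + w0 = 0.
distances = np.abs(X @ w + w0) / np.linalg.norm(w)

print(distances)          # [0.707..., 3.535..., 0.707..., 2.121...]
print(distances.min())    # margin = 0.707..., set by the closest point(s)
```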
The Scaling

- Note that (w, w_0) and (c w, c w_0) define the same hyperplane. The scale is arbitrary.
- This is because we predict class y = 1 if w^T x + w_0 ≥ 0. For any c > 0, that's the same thing as saying c w^T x + c w_0 ≥ 0.
- To remove this freedom, we will put a constraint on (w, w_0):

    min_i |w^T x_i + w_0| = 1

- With this constraint, the margin is always 1 / ||w||. (Both facts are checked numerically below.)
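A quick numeric check of both claims (the scale freedom and the canonical rescaling), again with made-up numbers:

```python
import numpy as np

# Made-up hyperplane and toy training points.
w = np.array([1.0, 1.0])
w0 = -4.0
X = np.array([[1.0, 1.0],
              [4.0, 4.0],
              [0.0, 2.0]])

# (w, w0) and (c*w, c*w0) give identical predictions for any c > 0.
c = 7.0
same = np.array_equal(np.sign(X @ w + w0), np.sign(X @ (c * w) + c * w0))
print(same)                                   # True

# Rescale so that min_i |w^T x_i + w0| = 1; then the margin is 1/||w||.
s = np.min(np.abs(X @ w + w0))
w_hat, w0_hat = w / s, w0 / s
print(np.min(np.abs(X @ w_hat + w0_hat)))     # 1.0
print(1.0 / np.linalg.norm(w_hat))            # 1.414..., the margin
```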
First version of Max Margin Optimization Problem

- Here is a first version of an optimization problem to maximize the margin (we will simplify it):

    max_w  1 / ||w||
    subject to  w^T x_i + w_0 ≥ 0    for all i with y_i = 1
                w^T x_i + w_0 ≤ 0    for all i with y_i = −1
                min_i |w^T x_i + w_0| = 1

- The first two constraints are too loose. It's the same thing to say

    max_w  1 / ||w||
    subject to  w^T x_i + w_0 ≥ 1    for all i with y_i = 1
                w^T x_i + w_0 ≤ −1   for all i with y_i = −1
                min_i |w^T x_i + w_0| = 1

- Now the third constraint is redundant.
First version of Max Margin Optimization Problem

- That means we can simplify to

    max_w  1 / ||w||
    subject to  w^T x_i + w_0 ≥ 1    for all i with y_i = 1
                w^T x_i + w_0 ≤ −1   for all i with y_i = −1

- Here's a compact way to write those two constraints:

    max_w  1 / ||w||
    subject to  y_i (w^T x_i + w_0) ≥ 1   for all i

- Finally, note that maximizing 1 / ||w|| is the same thing as minimizing ||w||^2.
The SVM optimization problem

- So the SVM weights are determined by solving the optimization problem:

    min_w  ||w||^2
    s.t.   y_i (w^T x_i + w_0) ≥ +1   for all i

- Solving this will require maths that we don't have in this course. But I'll show the form of the solution next time. (A scikit-learn sketch that approximately solves it is shown below.)
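As a sketch only (not part of the lecture), scikit-learn's SVC with a linear kernel and a very large C behaves essentially like the hard-margin problem above; the toy data below are made up and linearly separable:

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable toy data.
X = np.array([[1.0, 1.0], [2.0, 0.5], [0.5, 2.0],     # class -1
              [4.0, 4.0], [5.0, 3.5], [3.5, 5.0]])    # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C makes the soft-margin SVM behave like the hard-margin
# problem: min ||w||^2  s.t.  y_i (w^T x_i + w_0) >= 1.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]           # learned weight vector w
w0 = clf.intercept_[0]     # learned bias w_0
print(np.all(y * (X @ w + w0) >= 1 - 1e-6))   # constraints hold (to tolerance)
print(1.0 / np.linalg.norm(w))                # the resulting margin, 1/||w||
```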
Fin (Part I)

