
IE643 Lecture4 2020aug25

The document outlines the training process of a perceptron and discusses its convergence properties. It recaps the perceptron algorithm and geometric intuition behind its training procedure. The linear separability assumption is introduced, which assumes that the data can be separated by a hyperplane with a margin. The document then discusses deriving a mistake bound, which bounds the number of mistakes made by the perceptron during training in terms of other quantities under the linear separability assumption.


Deep Learning - Theory and Practice

IE 643
Lecture 4

August 25, 2020.

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 1 / 71


Outline

1 Recap
Perceptron Training and Convergence

2 Perceptron Mistake Bound (Continued...)

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 2 / 71


Recap Perceptron Training and Convergence

Recap: Perceptron Training and Convergence

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 3 / 71


Recap Perceptron Training and Convergence

Perceptron

Input is of the form x = (x1, x2, . . . , xd) ∈ ℝ^d.

We associate weights w = (w1, w2, . . . , wd) ∈ ℝ^d to the connections.
Prediction Rule:
- ⟨w, x⟩ ≥ θ ⟹ predict 1.
- ⟨w, x⟩ < θ ⟹ predict −1.
P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 4 / 71
Recap Perceptron Training and Convergence

Perceptron - Geometric Idea

Prediction Rule
⟨w, x⟩ ≥ θ ⟹ predict 1.
⟨w, x⟩ < θ ⟹ predict −1.
Geometric Idea: To find a separating hyperplane (w, θ) such that samples
with class labels 1 and −1 lie on opposite sides of the hyperplane.

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 5 / 71



Recap Perceptron Training and Convergence

Perceptron

Input is of the form x̃ = (x, 1) = (x1, x2, . . . , xd, 1) ∈ ℝ^{d+1}.

We associate weights w̃ = (w, −θ) = (w1, w2, . . . , wd, −θ) ∈ ℝ^{d+1} to the connections.
Prediction Rule:
- ⟨w̃, x̃⟩ = ⟨w, x⟩ − θ ≥ 0 ⟹ predict 1.
- ⟨w̃, x̃⟩ = ⟨w, x⟩ − θ < 0 ⟹ predict −1.
P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 7 / 71
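The prediction rule above is easy to express in code. Below is a minimal Python sketch; the names augment and predict and the sample weights are illustrative, not from the lecture.

import numpy as np

def augment(x):
    """Append the constant 1 to x, giving x_tilde = (x, 1)."""
    return np.append(np.asarray(x, dtype=float), 1.0)

def predict(w_tilde, x):
    """Perceptron prediction: sign of <w_tilde, x_tilde>, with the tie at 0 mapped to +1."""
    return 1 if np.dot(w_tilde, augment(x)) >= 0 else -1

# Example with w = (2, -1) and theta = 0.5, i.e. w_tilde = (2, -1, -0.5).
w_tilde = np.array([2.0, -1.0, -0.5])
print(predict(w_tilde, [1.0, 1.0]))   # <w, x> - theta = 0.5  >= 0  ->  1
print(predict(w_tilde, [0.0, 1.0]))   # <w, x> - theta = -1.5 <  0  -> -1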
Recap Perceptron Training and Convergence

Perceptron - Data Perspective

Input: data point x = (x1, x2, . . . , xd), label y ∈ {+1, −1}.


P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 8 / 71
Recap Perceptron Training and Convergence

Perceptron - Training

Perceptron Training Procedure


1: w̃^1 = 0
2: for t ← 1, 2, 3, . . . do
3:     receive (x^t, y^t), x^t ∈ ℝ^d, y^t ∈ {+1, −1}.
4:     Transform x^t into x̃^t = (x^t, 1) ∈ ℝ^{d+1}.
5:     ŷ = Perceptron(x̃^t; w̃^t)
6:     if ŷ ≠ y^t then
7:         w̃^{t+1} = w̃^t + y^t x̃^t
8:     else
9:         w̃^{t+1} = w̃^t

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 9 / 71
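A minimal Python sketch of this training procedure on a stream of examples follows; the function and variable names are illustrative, and the stream here is a short finite list only to keep the example self-contained.

import numpy as np

def perceptron_train(stream, d):
    """Online perceptron training on (x, y) pairs with x in R^d and y in {+1, -1}."""
    w_tilde = np.zeros(d + 1)                              # step 1: w_tilde^1 = 0
    for x, y in stream:                                    # step 2: rounds t = 1, 2, 3, ...
        x_tilde = np.append(x, 1.0)                        # step 4: x_tilde^t = (x^t, 1)
        y_hat = 1 if np.dot(w_tilde, x_tilde) >= 0 else -1 # step 5: prediction
        if y_hat != y:                                     # step 6: mistake
            w_tilde = w_tilde + y * x_tilde                # step 7: update
        # step 9: otherwise w_tilde is left unchanged
    return w_tilde

stream = [(np.array([1.0, 1.0]), 1), (np.array([-1.0, -1.0]), -1),
          (np.array([2.0, 0.5]), 1), (np.array([-0.5, -2.0]), -1)]
print(perceptron_train(stream, d=2))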


Recap Perceptron Training and Convergence

Recap: Convergence of Perceptron Training

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 10 / 71


Recap Perceptron Training and Convergence

Perceptron Convergence - Geometric Intuition

Can the data be separated by a hyperplane?

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 11 / 71


Recap Perceptron Training and Convergence

Perceptron Convergence - Geometric Intuition

First assumption: At least, the data should be such that the samples
with label 1 can be separated by a hyperplane from the samples with
label −1.
Is this assumption sufficient?

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 12 / 71


Recap Perceptron Training and Convergence

Perceptron Convergence - Geometric Intuition

Refined assumption: We not only want the data to be separated,
but the separation should also be good enough!
P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 13 / 71
Recap Perceptron Training and Convergence

Perceptron Convergence - Separability Assumption

Linear Separability Assumption


Let D = {(x^t, y^t)}_{t=1}^∞ denote the training data, where x^t ∈ ℝ^d and
y^t ∈ {+1, −1}, ∀t = 1, 2, . . .. Then there exist w* ∈ ℝ^d, w* ≠ 0, and γ > 0, such that:

    ⟨w*, x^t⟩ > γ    where y^t = 1,
    ⟨w*, x^t⟩ < −γ   where y^t = −1.

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 14 / 71


Recap Perceptron Training and Convergence

Perceptron Convergence - Separability Assumption

Linear Separability Assumption


Let D = {(x^t, y^t)}_{t=1}^∞ denote the training data, where x^t ∈ ℝ^d and
y^t ∈ {+1, −1}, ∀t = 1, 2, . . .. Then there exist w* ∈ ℝ^d, w* ≠ 0, and γ > 0, such that:

    y^t ⟨w*, x^t⟩ > γ,   ∀t = 1, 2, . . .

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 15 / 71
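For a finite batch of data, the compact form of the assumption is straightforward to test numerically for a candidate pair (w*, γ). A small sketch, with illustrative names and toy data:

import numpy as np

def satisfies_margin(w_star, gamma, X, y):
    """Check y^t <w*, x^t> > gamma for every row x^t of X with label y^t."""
    margins = y * (X @ w_star)            # vector of y^t <w*, x^t>
    return bool(np.all(margins > gamma)), float(margins.min())

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])
ok, smallest = satisfies_margin(np.array([1.0, 1.0]), gamma=1.0, X=X, y=y)
print(ok, smallest)                       # True 3.0  (smallest margin over the batch)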


Recap Perceptron Training and Convergence

Perceptron Convergence - Mistake Bound

We will try to derive useful bounds on the number of mistakes that a perceptron can commit during its training.

Assumption on data: Linear Separability

Assume that T rounds of training have been completed in perceptron training. Assume T to be some large number.

Assume that M mistakes are made by the perceptron in these T rounds. (Obviously, M ≤ T.)

We ask if the number of mistakes M can be bounded by some suitable quantity.

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 16 / 71


Recap Perceptron Training and Convergence

Perceptron Convergence - Mistake Bound


We begin the analysis by considering an arbitrary round t ∈ {1, 2, . . . , T} where a mistake is made by the perceptron.
Recall: During this t-th round:
- Perceptron computes ŷ^t = sign(⟨w^t, x^t⟩).
- ŷ^t ≠ y^t.
- Perceptron update: w^{t+1} = w^t + y^t x^t.
Now from the linear separability assumption, we have w* ≠ 0 such that y^t ⟨w*, x^t⟩ > γ.
First step: To bound the difference ⟨w*, w^{t+1}⟩ − ⟨w*, w^t⟩.
This quantity helps us check whether w^{t+1}, obtained after the perceptron update, is closer in orientation to w*.
Using the update w^{t+1} = w^t + y^t x^t and the linearity of the inner product, we can write

    ⟨w*, w^{t+1}⟩ − ⟨w*, w^t⟩ = ⟨w*, w^t + y^t x^t⟩ − ⟨w*, w^t⟩ = y^t ⟨w*, x^t⟩ > γ

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 24 / 71


Recap Perceptron Training and Convergence

Perceptron Convergence - Mistake Bound

Now when no mistake is made in round t, we have w^{t+1} = w^t.

Hence ⟨w*, w^{t+1}⟩ − ⟨w*, w^t⟩ = 0.

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 25 / 71


Recap Perceptron Training and Convergence

Perceptron Convergence - Mistake Bound

    Σ_{t=1}^{T} [⟨w*, w^{t+1}⟩ − ⟨w*, w^t⟩]
        = Σ_{t ∈ {1,...,T}: mistake at round t} [⟨w*, w^{t+1}⟩ − ⟨w*, w^t⟩]
          + Σ_{t ∈ {1,...,T}: no mistake at round t} [⟨w*, w^{t+1}⟩ − ⟨w*, w^t⟩]
        = Σ_{t ∈ {1,...,T}: mistake at round t} [⟨w*, w^{t+1}⟩ − ⟨w*, w^t⟩]
        > Mγ

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 26 / 71


Recap Perceptron Training and Convergence

Perceptron Convergence - Mistake Bound

Also note (the sum telescopes and w^1 = 0):

    Σ_{t=1}^{T} [⟨w*, w^{t+1}⟩ − ⟨w*, w^t⟩] = ⟨w*, w^{T+1}⟩

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 27 / 71


Recap Perceptron Training and Convergence

Perceptron Convergence - Mistake Bound

Hence we have:

    Σ_{t=1}^{T} [⟨w*, w^{t+1}⟩ − ⟨w*, w^t⟩] > Mγ
    ⟹ ⟨w*, w^{T+1}⟩ > Mγ

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 28 / 71


Perceptron Mistake Bound (Continued...)

Perceptron Mistake Bound Continued...

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 29 / 71


Perceptron Mistake Bound (Continued...)

Perceptron Convergence - Mistake Bound

Now we will handle the inner product term:

    ⟨w*, w^{T+1}⟩ > Mγ

From the Cauchy-Schwarz inequality we have

    ⟨w*, w^{T+1}⟩ ≤ ‖w*‖₂ ‖w^{T+1}‖₂.   (Homework: Prove this inequality!)

Note: ‖w^{T+1}‖₂ denotes the Euclidean ℓ₂ norm of w^{T+1}.

We will now see how to bound ‖w^{T+1}‖₂.

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 32 / 71


Perceptron Mistake Bound (Continued...)

Perceptron Convergence - Mistake Bound


Again we consider an arbitrary round t ∈ {1, 2, . . . , T} where a mistake is made by the perceptron.
Recall: During this t-th round:
- Perceptron computes ŷ^t = sign(⟨w^t, x^t⟩).
- ŷ^t ≠ y^t.
- Perceptron update: w^{t+1} = w^t + y^t x^t.
Now, we have

    ‖w^{t+1}‖₂² = ‖w^t + y^t x^t‖₂²
                = ‖w^t‖₂² + ‖y^t x^t‖₂² + 2 ⟨w^t, y^t x^t⟩
                = ‖w^t‖₂² + ‖x^t‖₂² + 2 y^t ⟨w^t, x^t⟩
    ⟹ ‖w^{t+1}‖₂² ≤ ‖w^t‖₂² + ‖x^t‖₂²   (How? Hint: a mistake at round t means y^t ⟨w^t, x^t⟩ ≤ 0.)

Thus ‖w^{t+1}‖₂² − ‖w^t‖₂² ≤ ‖x^t‖₂².

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 41 / 71
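A tiny numeric illustration of the last implication (the values are chosen arbitrarily): on a mistake round y^t ⟨w^t, x^t⟩ ≤ 0, so the cross term can only decrease the squared norm.

import numpy as np

w_t = np.array([1.0, -2.0])
x_t = np.array([0.5, 1.0])
y_t = 1                                     # a mistake round: sign(<w_t, x_t>) = -1 != y_t
assert y_t * np.dot(w_t, x_t) <= 0          # the cross term is non-positive on a mistake

w_next = w_t + y_t * x_t
lhs = np.dot(w_next, w_next)                # ||w^{t+1}||_2^2
rhs = np.dot(w_t, w_t) + np.dot(x_t, x_t)   # ||w^t||_2^2 + ||x^t||_2^2
print(lhs, "<=", rhs)                       # 3.25 <= 6.25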


Perceptron Mistake Bound (Continued...)

Perceptron Convergence - Mistake Bound

Assumption on boundedness of ‖x^t‖₂
We shall assume further that ∀t = 1, 2, . . ., the ℓ₂ norm (or length) of x^t is bounded:

    ‖x^t‖₂ ≤ R,   ∀t = 1, 2, . . .

This is yet another assumption to help our analysis.

A bounded ‖x^t‖₂ is not very unrealistic; however, finding a suitable value for R might be difficult.
This is where normalizing all x^t might help, so that ‖x^t‖₂ ≤ 1 can be assumed.
Note: The set {x ∈ ℝ^d : ‖x‖₂ ≤ 1} is called the unit ball in ℝ^d.

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 44 / 71
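A small sketch of the rescaling mentioned above, assuming a finite batch of data: dividing every point by the largest norm in the batch places all points inside the unit ball, so R = 1 can be used (the function name is illustrative). Rescaling all x^t by 1/R also shrinks the attainable margin γ by the same factor, so the ratio R²/γ² that will appear in the final mistake bound is unchanged.

import numpy as np

def scale_to_unit_ball(X):
    """Rescale all rows of X by the largest row norm so that ||x^t||_2 <= 1 for all t."""
    R = np.linalg.norm(X, axis=1).max()
    return X / R if R > 0 else X

X = np.array([[3.0, 4.0], [0.5, 0.5], [-1.0, 2.0]])
X_scaled = scale_to_unit_ball(X)            # largest row norm is 5.0
print(np.linalg.norm(X_scaled, axis=1))     # every entry is <= 1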


Perceptron Mistake Bound (Continued...)

Perceptron Convergence - Mistake Bound


We thus have

kw t+1 k22 − kw t k22 ≤ kx t k22 =⇒ kw t+1 k22 − kw t k22 ≤ R 2 .

Again, summing kw t+1 k22 − kw t k22 over t = 1, . . . , T we get


T
X X
kw t+1 k22 − kw t k22 = kw t+1 k22 − kw t k22 +
i=1 t∈{1,...,T },
t:mistake is made
at round t
X
kw t+1 k22 − kw t k22
t∈{1,...,T },
t:no mistake is made
at round t
X
= kw t+1 k22 − kw t k22 ≤ MR 2 (How?)
t∈{1,...,T },
t:mistake is made
at round t

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 49 / 71


Perceptron Mistake Bound (Continued...)

Perceptron Convergence - Mistake Bound


Thus we have
T
X
kw t+1 k22 − kw t k22 ≤ MR 2 .
i=1

On the other hand we get:


T
X
kw t+1 k22 − kw t k22 = kw T +1 k22 . (Homework!)
i=1

Combining both, we get

kw T +1 k22 ≤ MR 2 .

Thus we have bounded kw T +1 k2 .

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 53 / 71


Perceptron Mistake Bound (Continued...)

Perceptron Convergence - Mistake Bound


Recall: We wanted to handle the inner product term:

hw ∗ , w T +1 i > Mγ

Then using Cauchy-Schwarz inequality we had

Mγ < hw ∗ , w T +1 i ≤ kw ∗ k2 kw T +1 k2
=⇒ M 2 γ 2 < kw ∗ k22 kw T +1 k22

Using the bound kw T +1 k22 ≤ MR 2 we obtain:

M 2 γ 2 < kw ∗ k22 MR 2
kw ∗ k22 R 2
=⇒ M <
γ2
Thus, assuming that kw ∗ k2 and R can be controlled, the number of
mistakes M is inversely proportional to γ, which determines the closeness
of the data points to the separating hyperplane.
P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 60 / 71
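The bound is easy to check empirically. The sketch below builds a synthetic linearly separable data set, runs the training loop, and compares the number of mistakes with ‖w*‖₂² R² / γ²; the data, the choice of w*, and all names are illustrative assumptions, not part of the lecture.

import numpy as np

rng = np.random.default_rng(0)
w_star = np.array([1.0, -1.0])
X = rng.uniform(-1.0, 1.0, size=(500, 2))
X = X[np.abs(X @ w_star) > 0.2]            # keep only points with a clear margin around <w*, x> = 0
y = np.sign(X @ w_star)

gamma = np.min(y * (X @ w_star))           # smallest observed margin over the data
R = np.linalg.norm(X, axis=1).max()

w, mistakes = np.zeros(2), 0
for _ in range(100):                       # several passes; separable data means finitely many mistakes
    for x_t, y_t in zip(X, y):
        if (1 if w @ x_t >= 0 else -1) != y_t:
            w = w + y_t * x_t
            mistakes += 1

bound = np.linalg.norm(w_star) ** 2 * R ** 2 / gamma ** 2
print(mistakes, "<=", bound)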
Perceptron Mistake Bound (Continued...)

Perceptron Convergence - Mistake Bound

References:
H.D. Block: The perceptron: A model for brain functioning.
Reviews of Modern Physics 34, 123-135 (1962).
A.B.J. Novikoff: On convergence proofs on perceptrons. In:
Proceedings of the Symposium on the Mathematical Theory of
Automata, vol. XII, pp. 615-622 (1962).

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 61 / 71


Perceptron Mistake Bound (Continued...)

Perceptron Convergence - Mistake Bound

Two questions remain:

- How do we compute w* and γ in the linear separability assumption?

- What is the intuition behind the Perceptron update rule?

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 64 / 71


Perceptron Mistake Bound (Continued...)

Perceptron Convergence - Mistake Bound

First question: How do we compute w* and γ in the linear separability assumption?

One possible way is to solve the following problem:

    w*, γ = argmin_{u, µ} 0
            s.t.  y^t ⟨u, x^t⟩ > µ,   ∀ t = 1, 2, . . .

This optimization problem is a linear program and is called a feasibility problem.

Caveat: This leads to infinitely many constraints.
Thus, we need a finite data set of training samples.

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 68 / 71
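For a finite data set, one way to attack such a feasibility problem is with an off-the-shelf LP solver. Strict inequalities cannot be handed to an LP solver directly, so the sketch below fixes µ = 1 (any positive margin can be rescaled to 1 by scaling u), which is an assumption added on top of the formulation above; the function name and the toy data are illustrative.

import numpy as np
from scipy.optimize import linprog

def find_separator(X, y):
    """Find u with y^t <u, x^t> >= 1 for all t (a feasibility LP with a zero objective)."""
    n, d = X.shape
    # y^t <u, x^t> >= 1  rewritten in the form  A_ub @ u <= b_ub.
    A_ub = -y[:, None] * X
    b_ub = -np.ones(n)
    res = linprog(c=np.zeros(d), A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * d)
    return res.x if res.success else None

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
u = find_separator(X, y)
print(u, None if u is None else y * (X @ u))   # if a separator exists, all reported margins are >= 1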


Perceptron Mistake Bound (Continued...)

Perceptron Convergence - Mistake Bound


Question: How do we adapt the perceptron training to finite data sets?
Perceptron Training Procedure For Finite Data
1: Input: D = {(x^i, y^i)}_{i=1}^N, x^i ∈ ℝ^d, y^i ∈ {+1, −1}.
2: w^1 = 0, t = 1.
3: while True do
4:     for i ← 1, 2, 3, . . . , N do
5:         receive (x^i, y^i) from D.
6:         (x^t, y^t) = (x^i, y^i).
7:         ŷ = Perceptron(x^t; w^t)
8:         if ŷ ≠ y^t then
9:             w^{t+1} = w^t + y^t x^t
10:        else
11:            w^{t+1} = w^t
12:        t = t + 1

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 70 / 71
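A minimal Python sketch of this finite-data variant. The pseudocode above loops forever; the sketch adds the usual stopping rule of ending once a full pass over D makes no mistakes (that stopping rule, the max_epochs safeguard, and all names are additions for illustration).

import numpy as np

def perceptron_train_finite(X, y, max_epochs=1000):
    """Cycle through the finite data set until a full pass makes no mistakes."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_epochs):
        mistakes_this_pass = 0
        for x_i, y_i in zip(X, y):
            y_hat = 1 if np.dot(w, x_i) >= 0 else -1
            if y_hat != y_i:
                w = w + y_i * x_i
                mistakes_this_pass += 1
        if mistakes_this_pass == 0:        # converged: w separates the data
            break
    return w

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])
print(perceptron_train_finite(X, y))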


Perceptron Mistake Bound (Continued...)

Perceptron Convergence - Mistake Bound

Second question:
What is the intuition behind the Perceptron update rule?

Will see later!

P. Balamurugan Deep Learning - Theory and Practice August 25, 2020. 71 / 71
