IE 643: Deep Learning - Theory and Practice
Lecture 4 (August 25, 2020)
P. Balamurugan
1 Recap
Perceptron Training and Convergence
Perceptron

Prediction Rule:
\[ \langle w, x \rangle \ge \theta \implies \text{predict } 1, \qquad \langle w, x \rangle < \theta \implies \text{predict } -1. \]

Geometric Idea: Find a separating hyperplane (w, θ) such that samples with class labels 1 and −1 lie on opposite sides of the hyperplane.
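A minimal sketch of this prediction rule in code (assuming NumPy; the weights w, threshold θ, and inputs below are illustrative, not from the lecture):

```python
import numpy as np

def perceptron_predict(w, theta, x):
    """Predict +1 if <w, x> >= theta, else -1."""
    return 1 if np.dot(w, x) >= theta else -1

# Illustrative weights, threshold and inputs.
w, theta = np.array([2.0, -1.0]), 0.5
print(perceptron_predict(w, theta, np.array([1.0, 0.0])))   # <w, x> = 2.0 >= 0.5  -> predict 1
print(perceptron_predict(w, theta, np.array([0.0, 1.0])))   # <w, x> = -1.0 < 0.5 -> predict -1
```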
Perceptron - Training
First assumption: The data should at least be such that the samples with label 1 can be separated by a hyperplane from the samples with label −1.

Is this assumption sufficient? For the convergence analysis we use a quantitative version of it: there exist w^* and a margin γ > 0 such that
\[ \langle w^*, x^t \rangle > \gamma \quad \text{whenever } y^t = 1, \]
\[ \langle w^*, x^t \rangle < -\gamma \quad \text{whenever } y^t = -1, \]
or equivalently,
\[ y^t \langle w^*, x^t \rangle > \gamma \quad \forall t. \]
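As a concrete illustration (a toy example, not from the lecture): for a given candidate w^* and labelled samples, any γ strictly below min_t y^t ⟨w^*, x^t⟩ satisfies this assumption, and this can be checked directly:

```python
import numpy as np

# Toy labelled samples and a candidate separator w_star (illustrative values).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w_star = np.array([1.0, 1.0])

scores = y * (X @ w_star)   # y_t * <w_star, x_t> for each sample
print(scores)               # all positive  =>  w_star separates the two classes
print(scores.min())         # any gamma strictly below this value satisfies the margin assumption
```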
Perceptron Mistake Bound

If a mistake is made at round t, the Perceptron update w^{t+1} = w^t + y^t x^t gives
\[ \langle w^*, w^{t+1} \rangle - \langle w^*, w^t \rangle = y^t \langle w^*, x^t \rangle > \gamma. \]
If no mistake is made at round t, then w^{t+1} = w^t, and hence
\[ \langle w^*, w^{t+1} \rangle - \langle w^*, w^t \rangle = 0. \]
Summing these differences over t = 1, . . . , T and splitting the sum according to whether a mistake was made at round t:
\[
\sum_{t=1}^{T} \Big( \langle w^*, w^{t+1} \rangle - \langle w^*, w^t \rangle \Big)
= \sum_{\substack{t \in \{1,\dots,T\}:\\ \text{mistake at round } t}} \Big( \langle w^*, w^{t+1} \rangle - \langle w^*, w^t \rangle \Big)
+ \sum_{\substack{t \in \{1,\dots,T\}:\\ \text{no mistake at round } t}} \Big( \langle w^*, w^{t+1} \rangle - \langle w^*, w^t \rangle \Big)
\]
\[
= \sum_{\substack{t \in \{1,\dots,T\}:\\ \text{mistake at round } t}} \Big( \langle w^*, w^{t+1} \rangle - \langle w^*, w^t \rangle \Big)
> M\gamma,
\]
where M denotes the number of rounds (among 1, . . . , T) at which a mistake is made.
Also note that the sum telescopes; using the initialization w^1 = 0 (so that ⟨w^*, w^1⟩ = 0),
\[ \sum_{t=1}^{T} \Big( \langle w^*, w^{t+1} \rangle - \langle w^*, w^t \rangle \Big) = \langle w^*, w^{T+1} \rangle. \]
Hence we have:
\[ \langle w^*, w^{T+1} \rangle > M\gamma. \]
Assumption on boundedness of ‖x^t‖_2

We shall further assume that for all t = 1, 2, . . ., the ℓ_2 norm (length) of x^t is bounded:
\[ \| x^t \|_2 \le R \quad \forall t = 1, 2, \dots \]

Under this assumption, one can also show that
\[ \| w^{T+1} \|_2^2 \le M R^2. \]
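A short sketch of why this bound holds (my filling-in of the standard argument, assuming the homogeneous update w^{t+1} = w^t + y^t x^t on mistake rounds and w^1 = 0): on a mistake round, y^t ⟨w^t, x^t⟩ ≤ 0, so
\[
\| w^{t+1} \|_2^2
= \| w^t + y^t x^t \|_2^2
= \| w^t \|_2^2 + 2\, y^t \langle w^t, x^t \rangle + \| x^t \|_2^2
\le \| w^t \|_2^2 + R^2 .
\]
On no-mistake rounds w^{t+1} = w^t, so the squared norm grows by at most R^2 exactly M times, giving ‖w^{T+1}‖_2^2 ≤ M R^2.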
Combining the two bounds and using the Cauchy-Schwarz inequality:
\[ M\gamma < \langle w^*, w^{T+1} \rangle \le \| w^* \|_2 \, \| w^{T+1} \|_2 \]
\[ \implies M^2 \gamma^2 < \| w^* \|_2^2 \, \| w^{T+1} \|_2^2 \le \| w^* \|_2^2 \, M R^2 \]
\[ \implies M < \frac{\| w^* \|_2^2 \, R^2}{\gamma^2}. \]
Thus, assuming that ‖w^*‖_2 and R can be controlled, the bound on the number of mistakes M is inversely proportional to γ^2, where the margin γ measures how close the data points can come to the separating hyperplane.
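A minimal simulation sketch of this bound (my own illustration, assuming a synthetic linearly separable dataset, the homogeneous update w ← w + y^t x^t on mistakes, and w^1 = 0; the helper name perceptron_mistakes is hypothetical):

```python
import numpy as np

def perceptron_mistakes(X, y, n_epochs=50):
    """Run the homogeneous Perceptron and count the mistakes made during training."""
    w = np.zeros(X.shape[1])
    mistakes = 0
    for _ in range(n_epochs):
        made_mistake = False
        for x_t, y_t in zip(X, y):
            if y_t * np.dot(w, x_t) <= 0:      # mistake: apply the update w <- w + y_t x_t
                w = w + y_t * x_t
                mistakes += 1
                made_mistake = True
        if not made_mistake:                   # a full pass with no mistakes: converged
            break
    return w, mistakes

rng = np.random.default_rng(0)
w_star = np.array([1.0, -1.0])                 # known separator used to generate labels
X = rng.uniform(-1.0, 1.0, size=(200, 2))
scores = X @ w_star
X, scores = X[np.abs(scores) > 0.2], scores[np.abs(scores) > 0.2]   # enforce a margin
y = np.sign(scores)

gamma = np.min(y * (X @ w_star))               # empirical margin with respect to w_star
R = np.max(np.linalg.norm(X, axis=1))          # bound on ||x_t||_2
w, M = perceptron_mistakes(X, y)
bound = np.linalg.norm(w_star) ** 2 * R ** 2 / gamma ** 2
print(f"mistakes M = {M}, bound ||w*||^2 R^2 / gamma^2 = {bound:.1f}")
```

The bound is a worst-case guarantee, so the observed M is typically much smaller than the computed value.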
Perceptron Mistake Bound (Continued...)
References:
H. D. Block: The perceptron: A model for brain functioning. Reviews of Modern Physics 34, 123-135 (1962).
A. B. J. Novikoff: On convergence proofs on perceptrons. In: Proceedings of the Symposium on the Mathematical Theory of Automata, vol. XII, pp. 615-622 (1962).
First question: how can such a w^* and γ be found? One way to view this is as a feasibility problem with a constant objective, in which any pair (u, µ) satisfying the margin condition y^t ⟨u, x^t⟩ > µ for all t is a solution:
\[ (w^*, \gamma) = \operatorname*{argmin}_{u,\,\mu} \; 0. \]
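Under this (reconstructed) feasibility reading, a minimal sketch is shown below; normalizing the margin to 1 is my assumption, not necessarily the lecture's formulation, and scipy.optimize.linprog is just one off-the-shelf way to solve such a problem:

```python
import numpy as np
from scipy.optimize import linprog

# Toy labelled samples (illustrative values).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, d = X.shape

# Feasibility LP with constant objective 0: find u such that y_t <u, x_t> >= 1 for all t.
# (Fixing the margin to 1 removes the trivial solution u = 0.)
c = np.zeros(d)                        # constant objective: any feasible u is optimal
A_ub = -(y[:, None] * X)               # y_t <u, x_t> >= 1  <=>  -(y_t x_t)^T u <= -1
b_ub = -np.ones(n)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * d)
print(res.status, res.x)               # status 0 => a separating u was found
```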
Second question:
What is the intuition behind the Perceptron update rule?
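To make the question concrete, here is a small sketch (my own illustration, not the lecture's answer) of a single update w^{t+1} = w^t + y^t x^t on a mistaken sample; it shows that the update increases the score y^t ⟨w, x^t⟩ by exactly ‖x^t‖_2^2, pushing w towards classifying that sample correctly:

```python
import numpy as np

# One Perceptron update on a mistaken sample (x_t, y_t): w <- w + y_t * x_t.
w = np.array([0.5, -1.0])
x_t, y_t = np.array([1.0, 2.0]), 1.0

score_before = y_t * np.dot(w, x_t)     # <= 0 here, i.e. a mistake
w_new = w + y_t * x_t
score_after = y_t * np.dot(w_new, x_t)

# The score increases by y_t^2 * ||x_t||^2 = ||x_t||^2 > 0.
print(score_before, score_after, score_after - score_before, np.dot(x_t, x_t))
```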