
Deep Learning - Theory and Practice

IE 643
Lecture 6

September 1, 2020.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 1 / 76


Outline

1 Moving on from Perceptron

2 Multi Layer Perceptron


MLP-Data Perspective

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 2 / 76


Moving on from Perceptron

Perceptron - Caveat

Not suitable when linear separability assumption fails


Example: Classical XOR problem

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 3 / 76


Moving on from Perceptron

Perceptron - Caveat

Not suitable when linear separability assumption fails


Example: Classical XOR problem

Heavily criticized by M. Minsky and S. Papert in their book: Perceptrons,


MIT Press, 1969.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 4 / 76


Moving on from Perceptron

Perceptron - Caveat
Not suitable when linear separability assumption fails
Example: Classical XOR problem

x_1   x_2   y = x_1 ⊕ x_2
 0     0        -1
 0     1         1
 1     0         1
 1     1        -1

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 5 / 76


Moving on from Perceptron

Perceptron - Caveat

Not suitable when linear separability assumption fails


Example: Classical XOR problem

x_1   x_2   y = x_1 ⊕ x_2   ŷ = sign(w_1 x_1 + w_2 x_2 − θ)
 0     0        -1          sign(−θ)
 0     1         1          sign(w_2 − θ)
 1     0         1          sign(w_1 − θ)
 1     1        -1          sign(w_1 + w_2 − θ)

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 6 / 76


Moving on from Perceptron

Perceptron - Caveat
Not suitable when linear separability assumption fails
Example: Classical XOR problem

sign(−θ) = −1 =⇒ θ > 0
sign(w_2 − θ) = 1 =⇒ w_2 − θ ≥ 0
sign(w_1 − θ) = 1 =⇒ w_1 − θ ≥ 0
sign(w_1 + w_2 − θ) = −1 =⇒ −w_1 − w_2 + θ > 0

Note: This system is inconsistent. (Homework!)

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 7 / 76


Moving on from Perceptron

Perceptron - Caveat
Not suitable when linear separability assumption fails
Example: Classical XOR problem

sign(−θ) = −1 =⇒ θ > 0
sign(w_2 − θ) = 1 =⇒ w_2 − θ ≥ 0
sign(w_1 − θ) = 1 =⇒ w_1 − θ ≥ 0
sign(w_1 + w_2 − θ) = −1 =⇒ −w_1 − w_2 + θ > 0

Note: This system is inconsistent. (Homework!)


Recall: We verified this using the code for the linear separability check; a small brute-force sketch is given below.
P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 8 / 76
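As a small hedged sketch in Python (not the exact code used earlier in the course), one can brute-force many candidate (w_1, w_2, θ) triples and observe that none of them reproduces the XOR labels, consistent with the inconsistent system above:

    # Illustrative brute-force check (assumed setup, not the course's original code):
    # no perceptron weights (w1, w2, theta) reproduce the XOR labels.
    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([-1, 1, 1, -1])            # XOR labels as in the table above

    rng = np.random.default_rng(0)
    found = False
    for _ in range(100_000):
        w1, w2, theta = rng.uniform(-5, 5, size=3)
        y_hat = np.sign(X @ np.array([w1, w2]) - theta)
        y_hat[y_hat == 0] = 1               # take sign(0) = +1, as in the inequalities above
        if np.array_equal(y_hat, y):
            found = True
            break

    print("separating weights found:", found)   # expected: False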
Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 9 / 76


Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

Assume that the sample features x ∈ Rd .

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 10 / 76


Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

Assume that the sample features x ∈ R^d.

Idea: Use a transformation φ : R^d → R^q, where q ≫ d, to lift the
data samples x ∈ R^d into φ(x) ∈ R^q, hoping to see a separating
hyperplane in the transformed space.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 11 / 76


Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

Assume that the sample features x ∈ R^d.

Idea: Use a transformation φ : R^d → R^q, where q ≫ d, to lift the
data samples x ∈ R^d into φ(x) ∈ R^q, hoping to see a separating
hyperplane in the transformed space.

Forms the core idea behind kernel methods. (Will not be pursued in
this course!)

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 12 / 76
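As a hedged illustration of this lifting idea (the feature map and the hyperplane below are my own choices, not given on the slides), mapping (x_1, x_2) to (x_1, x_2, x_1 x_2) already makes the XOR data linearly separable in R^3:

    # Lifting XOR from R^2 to R^3 with phi(x1, x2) = (x1, x2, x1 * x2);
    # the map and the separating hyperplane below are illustrative choices.
    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([-1, 1, 1, -1])

    phi_X = np.column_stack([X, X[:, 0] * X[:, 1]])   # phi: R^2 -> R^3

    w, theta = np.array([1.0, 1.0, -2.0]), 0.5        # one separating hyperplane in R^3
    print(np.sign(phi_X @ w - theta))                 # [-1.  1.  1. -1.], matching y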


Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 13 / 76


Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

Idea: The separating surface need not be linear and can be assumed
to take some non-linear form.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 14 / 76


Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

Idea: The separating surface need not be linear and can be assumed
to take some non-linear form.

Hence for an input space X and output space Y, the learned map
h : X → Y can take some non-linear form.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 15 / 76


Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

Idea: The separating surface need not be linear and can be assumed
to take some non-linear form.

Hence for an input space X and output space Y, the learned map
h : X → Y can take some non-linear form.

Forms the idea behind multi-layer perceptrons!

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 16 / 76


Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 17 / 76


Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 18 / 76


Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

Some notations
n_k^ℓ denotes k-th neuron at layer ℓ.
a_k^ℓ denotes the activation of the neuron n_k^ℓ.
P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 19 / 76
Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

Activation at neuron n_1^1:

a_1^1 = max{p x_1 + q x_2 + b_1, 0}.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 20 / 76


Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

Activation at neuron n_2^1:

a_2^1 = max{r x_1 + s x_2 + b_2, 0}.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 21 / 76


Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

Activation at neuron n_1^2:

a_1^2 = sign(t a_1^1 + u a_2^1 + b_3).

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 22 / 76


Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

Activation at neuron n_1^2:

a_1^2 = sign(t a_1^1 + u a_2^1 + b_3).

Note: The activation a_1^2 is the output of the network, denoted by ŷ.
P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 23 / 76
Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

x_1   x_2   a_1^1                  a_2^1                  ŷ                                y
 0     0    max{b_1, 0}            max{b_2, 0}            sign(t a_1^1 + u a_2^1 + b_3)    -1
 0     1    max{q + b_1, 0}        max{s + b_2, 0}        sign(t a_1^1 + u a_2^1 + b_3)    +1
 1     0    max{p + b_1, 0}        max{r + b_2, 0}        sign(t a_1^1 + u a_2^1 + b_3)    +1
 1     1    max{p + q + b_1, 0}    max{r + s + b_2, 0}    sign(t a_1^1 + u a_2^1 + b_3)    -1

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 24 / 76
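One concrete weight setting that makes the table above reproduce the XOR labels is the well-known two-ReLU construction; the particular values below are an illustrative assumption and need not match the figure on the slide: p = q = 1, b_1 = 0, r = s = 1, b_2 = −1, t = 1, u = −2, b_3 = −0.5. A minimal sketch:

    # Minimal sketch of the two-ReLU + sign network above, with one weight choice
    # (assumed for illustration) that realizes XOR.
    import numpy as np

    def mlp_xor(x1, x2, p=1, q=1, b1=0, r=1, s=1, b2=-1, t=1, u=-2, b3=-0.5):
        a11 = max(p * x1 + q * x2 + b1, 0)           # a_1^1, ReLU neuron n_1^1
        a21 = max(r * x1 + s * x2 + b2, 0)           # a_2^1, ReLU neuron n_2^1
        return int(np.sign(t * a11 + u * a21 + b3))  # output neuron n_1^2

    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print((x1, x2), mlp_xor(x1, x2))             # -1, +1, +1, -1 as required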


Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 25 / 76


Moving on from Perceptron

Moving away from perceptron - Dealing with XOR problem

A different Multi Layer Perceptron (MLP) architecture is given for the XOR
problem in:
David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams.
Learning Internal Representations by Error Propagation,
Technical Report, UCSD, 1985.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 26 / 76


Multi Layer Perceptron

Multi Layer Perceptron

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 27 / 76


Multi Layer Perceptron

Multi Layer Perceptron

Notable features:

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 28 / 76


Multi Layer Perceptron

Multi Layer Perceptron

Notable features:
Multiple layers stacked together.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 29 / 76


Multi Layer Perceptron

Multi Layer Perceptron

Notable features:
Multiple layers stacked together.
Zero-th layer usually called input layer.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 30 / 76


Multi Layer Perceptron

Multi Layer Perceptron

Notable features:
Multiple layers stacked together.
Zero-th layer usually called input layer.
Final layer usually called output layer.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 31 / 76


Multi Layer Perceptron

Multi Layer Perceptron

Notable features:
Multiple layers stacked together.
Zero-th layer usually called input layer.
Final layer usually called output layer.
Intermediate layers are called hidden layers.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 32 / 76


Multi Layer Perceptron

Multi Layer Perceptron

Notable features:
Each neuron in the hidden and output layer is like a perceptron.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 33 / 76


Multi Layer Perceptron

Multi Layer Perceptron

Notable features:
Each neuron in the hidden and output layer is like a perceptron.
However, unlike perceptron, different activation functions are used.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 34 / 76


Multi Layer Perceptron

Multi Layer Perceptron

Notable features:
Each neuron in the hidden and output layer is like a perceptron.
However, unlike perceptron, different activation functions are used.
max{x, 0} has a special name called ReLU (Rectified Linear Unit).

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 35 / 76
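For concreteness, a small sketch of the activations mentioned so far, sign (the perceptron) and ReLU max{x, 0}; the sigmoid is included only as a further common example and is not introduced on this slide:

    # Sketch of activation functions: sign (perceptron), ReLU (this slide),
    # and sigmoid (added here only as another common example).
    import numpy as np

    def sign(z):
        return np.where(z >= 0, 1.0, -1.0)   # sign(0) taken as +1

    def relu(z):
        return np.maximum(z, 0.0)            # max{z, 0}

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(sign(z), relu(z), sigmoid(z), sep="\n")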


Multi Layer Perceptron

Multi Layer Perceptron

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 36 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 37 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

This MLP contains an input layer L_0, 2 hidden layers denoted by
L_1, L_2, and output layer L_3.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 38 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

Recall:
n_k^ℓ denotes k-th neuron at ℓ-th layer.
a_k^ℓ denotes activation of neuron n_k^ℓ.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 39 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

w_{ij}^ℓ denotes the weight of the connection from neuron n_j^{ℓ−1} to neuron n_i^ℓ.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 40 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

w_{ij}^ℓ denotes the weight of the connection from neuron n_j^{ℓ−1} to neuron n_i^ℓ.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 41 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

w_{ij}^ℓ denotes the weight of the connection from neuron n_j^{ℓ−1} to neuron n_i^ℓ.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 42 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

In this particular case, the inputs are x_1 and x_2 at input layer L_0.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 43 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

At layer L_1:
  At neuron n_1^1:
    a_1^1 = φ(w_{11}^1 x_1 + w_{12}^1 x_2).

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 44 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

At layer L_1:
  At neuron n_1^1:
    a_1^1 = φ(w_{11}^1 x_1 + w_{12}^1 x_2) =: φ(z_1^1).

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 45 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

At layer L_1:
  At neuron n_2^1:
    a_2^1 = φ(w_{21}^1 x_1 + w_{22}^1 x_2).

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 46 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

At layer L_1:
  At neuron n_2^1:
    a_2^1 = φ(w_{21}^1 x_1 + w_{22}^1 x_2) =: φ(z_2^1).

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 47 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

At layer L_1:

  [a_1^1]   [φ(z_1^1)]   [φ(w_{11}^1 x_1 + w_{12}^1 x_2)]
  [a_2^1] = [φ(z_2^1)] = [φ(w_{21}^1 x_1 + w_{22}^1 x_2)]

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 48 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

Letting

  W^1 = [w_{11}^1  w_{12}^1]   and   x = [x_1]
        [w_{21}^1  w_{22}^1]             [x_2] ,

we have at layer L_1:

  [a_1^1]     ([z_1^1])     ([w_{11}^1 x_1 + w_{12}^1 x_2])
  [a_2^1] = φ ([z_2^1]) = φ ([w_{21}^1 x_1 + w_{22}^1 x_2]) = φ(W^1 x)

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 49 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

Letting a^1 = [a_1^1; a_2^1], we have at layer L_1:

  a^1 = [a_1^1] = φ(W^1 x)
        [a_2^1]

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 50 / 76
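A quick numerical check (with illustrative weight values and ReLU as an assumed choice of φ) that the vectorized form a^1 = φ(W^1 x) agrees with the per-neuron formulas above:

    # Check that a^1 = phi(W^1 x) matches a_k^1 = phi(w_{k1}^1 x_1 + w_{k2}^1 x_2);
    # the weights and the ReLU choice of phi are illustrative assumptions.
    import numpy as np

    phi = lambda z: np.maximum(z, 0)

    W1 = np.array([[0.5, -1.0],               # row k holds (w_{k1}^1, w_{k2}^1)
                   [2.0,  0.3]])
    x = np.array([1.0, 2.0])

    a1_vectorized = phi(W1 @ x)
    a1_per_neuron = np.array([phi(W1[0, 0] * x[0] + W1[0, 1] * x[1]),
                              phi(W1[1, 0] * x[0] + W1[1, 1] * x[1])])
    print(np.allclose(a1_vectorized, a1_per_neuron))   # True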


Multi Layer Perceptron

Multi Layer Perceptron - More notations

At layer L_2:
  At neuron n_1^2:
    a_1^2 = φ(w_{11}^2 a_1^1 + w_{12}^2 a_2^1).

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 51 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

At layer L_2:
  At neuron n_1^2:
    a_1^2 = φ(w_{11}^2 a_1^1 + w_{12}^2 a_2^1) =: φ(z_1^2).

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 52 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

At layer L_2:
  At neuron n_2^2:
    a_2^2 = φ(w_{21}^2 a_1^1 + w_{22}^2 a_2^1).

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 53 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

At layer L_2:
  At neuron n_2^2:
    a_2^2 = φ(w_{21}^2 a_1^1 + w_{22}^2 a_2^1) =: φ(z_2^2).

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 54 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

At layer L_2:

  a^2 = [a_1^2]   [φ(z_1^2)]   [φ(w_{11}^2 a_1^1 + w_{12}^2 a_2^1)]
        [a_2^2] = [φ(z_2^2)] = [φ(w_{21}^2 a_1^1 + w_{22}^2 a_2^1)]

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 55 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

Letting

  W^2 = [w_{11}^2  w_{12}^2]
        [w_{21}^2  w_{22}^2] ,

we have at layer L_2:

  a^2 = [a_1^2]     ([z_1^2])     ([w_{11}^2 a_1^1 + w_{12}^2 a_2^1])       ([a_1^1])
        [a_2^2] = φ ([z_2^2]) = φ ([w_{21}^2 a_1^1 + w_{22}^2 a_2^1]) = φ (W^2 [a_2^1])

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 56 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

We have at layer L_2:

  a^2 = [a_1^2]     ([z_1^2])       ([a_1^1])
        [a_2^2] = φ ([z_2^2]) = φ (W^2 [a_2^1]) = φ(W^2 a^1)

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 57 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

At layer L_3:
  At neuron n_1^3:
    a_1^3 = φ(w_{11}^3 a_1^2 + w_{12}^3 a_2^2).

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 58 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

At layer L_3:
  At neuron n_1^3:
    a_1^3 = φ(w_{11}^3 a_1^2 + w_{12}^3 a_2^2) =: φ(z_1^3).

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 59 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

At layer L_3:

  a^3 = a_1^3 = φ(z_1^3) = φ(w_{11}^3 a_1^2 + w_{12}^3 a_2^2)

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 60 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

Letting W^3 = [w_{11}^3  w_{12}^3], we have at layer L_3:

  a^3 = a_1^3 = φ(z_1^3) = φ(w_{11}^3 a_1^2 + w_{12}^3 a_2^2) = φ(W^3 [a_1^2; a_2^2])

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 61 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

Letting W^3 = [w_{11}^3  w_{12}^3], we have at layer L_3:

  a^3 = a_1^3 = φ(z_1^3) = φ(W^3 [a_1^2; a_2^2]) = φ(W^3 a^2)

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 62 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

a^3 = φ(W^3 a^2)

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 63 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

a^3 = φ(W^3 a^2) = φ(W^3 φ(W^2 a^1))

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 64 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

a^3 = φ(W^3 a^2) = φ(W^3 φ(W^2 a^1)) = φ(W^3 φ(W^2 φ(W^1 x)))

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 65 / 76


Multi Layer Perceptron

Multi Layer Perceptron - More notations

ŷ = a^3 = φ(W^3 a^2) = φ(W^3 φ(W^2 a^1)) = φ(W^3 φ(W^2 φ(W^1 x)))

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 66 / 76
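The composition above translates directly into a forward pass. The sketch below assumes the 2-2-2-1 architecture of the running example, random weights, and ReLU for φ (the slides leave φ generic):

    # Forward pass y_hat = phi(W^3 phi(W^2 phi(W^1 x))) for the 2-2-2-1 example;
    # the weights are random and phi = ReLU is an assumed choice.
    import numpy as np

    def phi(z):
        return np.maximum(z, 0)

    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((2, 2))   # layer L1 weights
    W2 = rng.standard_normal((2, 2))   # layer L2 weights
    W3 = rng.standard_normal((1, 2))   # layer L3 (output) weights

    def mlp(x):
        a1 = phi(W1 @ x)               # a^1 = phi(W^1 x)
        a2 = phi(W2 @ a1)              # a^2 = phi(W^2 a^1)
        a3 = phi(W3 @ a2)              # a^3 = phi(W^3 a^2) = y_hat
        return a3

    print(mlp(np.array([1.0, 0.0])))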


Multi Layer Perceptron MLP-Data Perspective

Multi Layer Perceptron - Data Perspective

Given data (x, y), the multi layer perceptron predicts:

ŷ = φ(W^3 φ(W^2 φ(W^1 x)))

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 67 / 76


Multi Layer Perceptron MLP-Data Perspective

Multi Layer Perceptron - Data Perspective

Given data (x, y), the multi layer perceptron predicts:

ŷ = φ(W^3 φ(W^2 φ(W^1 x))) =: MLP(x)

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 68 / 76


Multi Layer Perceptron MLP-Data Perspective

Multi Layer Perceptron - Data Perspective

Given data (x, y), the multi layer perceptron predicts:

ŷ = φ(W^3 φ(W^2 φ(W^1 x))) =: MLP(x)

Similar to the perceptron, if y ≠ ŷ, an error E(y, ŷ) is incurred.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 69 / 76


Multi Layer Perceptron MLP-Data Perspective

Multi Layer Perceptron - Data Perspective

Given data (x, y), the multi layer perceptron predicts:

ŷ = φ(W^3 φ(W^2 φ(W^1 x))) =: MLP(x)

Similar to the perceptron, if y ≠ ŷ, an error E(y, ŷ) is incurred.

Aim: To change the weights W^1, W^2, W^3 such that the error E(y, ŷ) is
minimized.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 70 / 76


Multi Layer Perceptron MLP-Data Perspective

Multi Layer Perceptron - Data Perspective

Given data (x, y), the multi layer perceptron predicts:

ŷ = φ(W^3 φ(W^2 φ(W^1 x))) =: MLP(x)

Similar to the perceptron, if y ≠ ŷ, an error E(y, ŷ) is incurred.

Aim: To change the weights W^1, W^2, W^3 such that the error E(y, ŷ) is
minimized.
Leads to an error minimization problem.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 71 / 76


Multi Layer Perceptron MLP-Data Perspective

Multi Layer Perceptron - Data Perspective

Input: Training Data D = {(x^s, y^s)}_{s=1}^S.

For each sample x^s the prediction ŷ^s = MLP(x^s).
Error: e^s = E(y^s, ŷ^s).
Aim: To minimize Σ_{s=1}^S e^s.

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 72 / 76


Multi Layer Perceptron MLP-Data Perspective

Multi Layer Perceptron - Data Perspective

Optimization perspective
Given training data D = {(x^s, y^s)}_{s=1}^S,

  min Σ_{s=1}^S e^s

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 73 / 76


Multi Layer Perceptron MLP-Data Perspective

Multi Layer Perceptron - Data Perspective

Optimization perspective
Given training data D = {(x^s, y^s)}_{s=1}^S,

  min Σ_{s=1}^S e^s = Σ_{s=1}^S E(y^s, ŷ^s)

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 74 / 76


Multi Layer Perceptron MLP-Data Perspective

Multi Layer Perceptron - Data Perspective

Optimization perspective
Given training data D = {(x^s, y^s)}_{s=1}^S,

  min Σ_{s=1}^S e^s = Σ_{s=1}^S E(y^s, ŷ^s) = Σ_{s=1}^S E(y^s, MLP(x^s))

P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 75 / 76


Multi Layer Perceptron MLP-Data Perspective

Multi Layer Perceptron - Data Perspective

Optimization perspective
Given training data D = {(x^s, y^s)}_{s=1}^S,

  min Σ_{s=1}^S e^s = Σ_{s=1}^S E(y^s, ŷ^s) = Σ_{s=1}^S E(y^s, MLP(x^s))

Note: The minimization is over the weights W^1, . . . , W^L of the MLP,
where L denotes the number of layers in the MLP.
P. Balamurugan Deep Learning - Theory and Practice September 1, 2020. 76 / 76
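The objective on this last slide can be written out as a short sketch; the squared-error choice of E and the random weights below are illustrative assumptions, since the slides do not yet fix a particular E or a training procedure:

    # Sketch of the training objective sum_s E(y^s, MLP(x^s)); the squared-error
    # choice of E and the random weights are illustrative assumptions.
    import numpy as np

    phi = lambda z: np.maximum(z, 0)

    def mlp(x, W1, W2, W3):
        return phi(W3 @ phi(W2 @ phi(W1 @ x)))[0]

    def total_error(D, predict, E=lambda y, y_hat: 0.5 * (y - y_hat) ** 2):
        return sum(E(y_s, predict(x_s)) for x_s, y_s in D)

    rng = np.random.default_rng(0)
    W1, W2 = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
    W3 = rng.standard_normal((1, 2))

    D = [(np.array([0.0, 0.0]), -1.0), (np.array([0.0, 1.0]), 1.0),
         (np.array([1.0, 0.0]),  1.0), (np.array([1.0, 1.0]), -1.0)]

    # This total error is what the minimization over the weights W^1, ..., W^L targets.
    print(total_error(D, lambda x: mlp(x, W1, W2, W3)))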
