Lecture 18. Backpropagation
$$
y = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} x_i \ge 0 \\ 0 & \text{if } \sum_{i=0}^{n} x_i < 0 \end{cases}
\qquad\qquad
y = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i \, x_i \ge 0 \\ 0 & \text{if } \sum_{i=0}^{n} w_i \, x_i < 0 \end{cases}
$$
➢ In other words, a single perceptron can only be used to implement linearly separable functions
➢ The weights (including threshold) can be learned and the inputs can be real valued
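A minimal NumPy sketch (not from the slides) of the weighted decision rule above; the AND example, the chosen weight values, and the leading 1 in x that multiplies the bias $w_0$ are illustrative assumptions.

```python
import numpy as np

def perceptron_output(w, x):
    """Perceptron decision rule: y = 1 if sum_i w_i * x_i >= 0, else 0.

    w : weight vector [w_0, w_1, ..., w_n]  (w_0 plays the role of the threshold)
    x : input vector  [1,  x_1, ..., x_n]   (leading 1 multiplies the bias w_0)
    """
    return 1 if np.dot(w, x) >= 0 else 0

# Example: a perceptron implementing the (linearly separable) AND function
w = np.array([-1.5, 1.0, 1.0])          # bias -1.5, both inputs weighted 1
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array([1, x1, x2])
    print(x1, x2, "->", perceptron_output(w, x))
```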
Credit: Mitesh Khapra, IITM
Perceptron Learning
$\mathbf{w} = [w_0, w_1, w_2, \ldots, w_n]$
$\mathbf{x} = [1, x_1, x_2, \ldots, x_n]$
▪ Parameters: In all the cases above, w is the parameter (vector) that needs to be learned from the data
▪ Learning algorithm: an algorithm for learning the parameters w of the model (for example, the perceptron learning algorithm sketched below, gradient descent, etc.)
▪ With some guesswork, we are able to find the optimal values for w and b
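As a hedged illustration of the learning-algorithm bullet above, here is a minimal sketch of the classic perceptron learning rule; the OR-function toy data, the epoch loop, and the update shown (add x on a false negative, subtract it on a false positive) are assumptions for illustration, not the course's reference implementation.

```python
import numpy as np

def perceptron_learning(X, y, max_epochs=100):
    """Learn w = [w_0, ..., w_n] for inputs X whose first column is the constant 1."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        converged = True
        for x_i, y_i in zip(X, y):
            pred = 1 if np.dot(w, x_i) >= 0 else 0
            if pred != y_i:                       # misclassified point
                w += x_i if y_i == 1 else -x_i    # move the boundary toward it
                converged = False
        if converged:
            break
    return w

# Toy linearly separable data: the OR function, with a leading bias column of 1s
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
y = np.array([0, 1, 1, 1])
print(perceptron_learning(X, y))
```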
➢ Geometric interpretation of our “guess work” algorithm in terms of the error surface
➢ The direction u that we intend to move in should be at 180° w.r.t. the gradient
➢ Algorithm: gradient descent, which repeatedly updates the parameters in the direction opposite to the gradient (see the sketch below)
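Since the algorithm box itself is not reproduced in this text, the following is a plausible sketch of standard gradient descent, moving the parameters opposite to the gradient as the previous bullet describes; the toy quadratic loss, the learning rate eta, and max_iterations are assumed values.

```python
import numpy as np

def gradient_descent(grad_fn, theta0, eta=0.1, max_iterations=1000):
    """Move theta opposite to the gradient: theta <- theta - eta * grad(theta)."""
    theta = np.array(theta0, dtype=float)
    for _ in range(max_iterations):
        theta -= eta * grad_fn(theta)
    return theta

# Toy example: minimise f(w, b) = (w - 3)^2 + (b + 1)^2, whose gradient is
# [2(w - 3), 2(b + 1)]; the minimum is at w = 3, b = -1.
grad = lambda t: np.array([2 * (t[0] - 3), 2 * (t[1] + 1)])
print(gradient_descent(grad, theta0=[0.0, 0.0]))   # ~[3, -1]
```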
➢ The input layer can be called the 0th layer and the output layer can be called the Lth layer
➢ Here, $y_j \in \mathbb{R}^3$
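A minimal sketch of a forward pass through such a network, with layers indexed from 0 (input) to L (output) and an output in $\mathbb{R}^3$ as in the bullet above; the sigmoid hidden activations, the random weights, and the layer sizes are assumptions for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, weights, biases):
    """Layer 0 is the input; layers 1..L compute a_i = W_i h_{i-1} + b_i."""
    h = x                                   # h_0: the 0th (input) layer
    for W, b in zip(weights[:-1], biases[:-1]):
        h = sigmoid(W @ h + b)              # hidden layers 1..L-1
    return weights[-1] @ h + biases[-1]     # pre-activation of the Lth (output) layer

rng = np.random.default_rng(0)
sizes = [4, 5, 3]                           # 4 inputs, one hidden layer, 3 outputs
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases  = [np.zeros(m) for m in sizes[1:]]
print(forward(rng.normal(size=4), weights, biases).shape)   # (3,): output in R^3
```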
➢ Cross-entropy loss:
$$\mathcal{L}(\theta) = -\sum_{c=1}^{k} y_c \log \hat{y}_c$$
➢ Minimizing $\mathcal{L}(\theta)$ is therefore equivalent to
$$\max_{\theta} \; -\mathcal{L}(\theta) = \max_{\theta} \; \sum_{c=1}^{k} y_c \log \hat{y}_c$$
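A minimal sketch of the cross-entropy computation above, assuming $y$ is a one-hot label over $k$ classes and $\hat{y}$ is a vector of predicted probabilities; the small eps added for numerical stability is an extra assumption.

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    """L(theta) = - sum_c y_c * log(y_hat_c); y is one-hot, y_hat sums to 1."""
    return -np.sum(y * np.log(y_hat + eps))

y     = np.array([0.0, 1.0, 0.0])          # true class is c = 2 (one-hot)
y_hat = np.array([0.2, 0.7, 0.1])          # predicted class probabilities
print(cross_entropy(y, y_hat))             # = -log(0.7) ~= 0.357
```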
➢ Of course, there could be other loss functions depending on the problem at hand, but the two loss functions that we just saw are encountered very often
➢ For the rest of this lecture, we will focus on the case where the output activation is a softmax function and the loss function is cross-entropy
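To make the softmax-plus-cross-entropy combination concrete, here is a short hedged sketch; the particular pre-activation values are arbitrary, and the final lines use the standard fact that for this combination the gradient of the loss with respect to the output pre-activation is $\hat{y} - y$, which is where backpropagation starts at the output layer.

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax: subtract the max before exponentiating."""
    e = np.exp(a - np.max(a))
    return e / np.sum(e)

def cross_entropy(y, y_hat, eps=1e-12):
    return -np.sum(y * np.log(y_hat + eps))

# Output pre-activation a_L of the last layer and the one-hot true label
a_L = np.array([1.0, 2.0, 0.5])
y   = np.array([0.0, 1.0, 0.0])

y_hat = softmax(a_L)
loss  = cross_entropy(y, y_hat)

# For softmax + cross-entropy, the gradient of the loss with respect to the
# output pre-activation simplifies to (y_hat - y); backpropagation then pushes
# this quantity backwards through the hidden layers.
grad_a_L = y_hat - y
print(loss, grad_a_L)
```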