Lecture 4
• The following data is a large regression dataset relating the savings of workers to their income and the minimum wage in their countries.
• It is required to design a FFNN (feed-forward neural network) to fit this problem.
Regression Example (forward direction)
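As a minimal sketch of the forward direction, assume a small FFNN with two inputs (income and minimum wage), one hidden layer, and a single linear output for the predicted savings. The layer sizes, the ReLU activation, and the random weights below are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Minimal sketch of the forward pass of a 2-hidden-unit regression FFNN.
# Layer sizes, ReLU activation, and random weights are assumptions.
rng = np.random.default_rng(0)

n_inputs, n_hidden = 2, 4                  # inputs: income, minimum wage
W1 = rng.normal(size=(n_hidden, n_inputs))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(1, n_hidden))
b2 = np.zeros(1)

def forward(x):
    """Forward direction: x = [income, minimum_wage] -> predicted savings."""
    h = np.maximum(0.0, W1 @ x + b1)       # hidden layer with ReLU
    y_hat = W2 @ h + b2                    # linear output for regression
    return y_hat

x = np.array([2500.0, 400.0])              # hypothetical (income, minimum wage)
print(forward(x))
```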
However, how can we calculate the gradients $\frac{\partial J(W)}{\partial W}$?
BACKPROPAGATION
Computing gradients (Backpropagation)
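As a hedged illustration of how the chain rule yields these gradients, consider a single sigmoid unit with squared-error loss; the choice of sigmoid activation and the 1/2 factor are assumptions made for this sketch, not necessarily the exact setup on the slides.

```latex
% Sketch: gradient of a squared-error loss w.r.t. one weight via the chain rule.
% Sigmoid activation and the 1/2 factor are assumptions for illustration.
\[
J(W) = \tfrac{1}{2}\,(t - \hat{y})^2, \qquad \hat{y} = \sigma(z), \qquad z = w\,x + b
\]
\[
\frac{\partial J}{\partial w}
  = \frac{\partial J}{\partial \hat{y}}\,
    \frac{\partial \hat{y}}{\partial z}\,
    \frac{\partial z}{\partial w}
  = -(t - \hat{y})\;\sigma(z)\bigl(1 - \sigma(z)\bigr)\;x
\]
```

Backpropagation applies this same chain-rule factorization layer by layer, reusing the intermediate terms computed for the layers closer to the output.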
Training FFNN
The training set is the XOR truth table:

x1   x2   target
0    0    0
0    1    1
1    0    1
1    1    0
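For concreteness, the table above can be written as numpy arrays; the variable names are hypothetical.

```python
import numpy as np

# XOR training set from the truth table above (variable names are hypothetical).
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]], dtype=float)      # inputs x1, x2
T = np.array([0, 1, 1, 0], dtype=float)  # targets
```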
Backpropagation example (random initialization)
Backpropagation example (Feedforward)
Backpropagation example (loss calculation)
n: number of samples; in this example n = 1, since one sample is processed at a time.
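A common choice for the loss in such examples is the mean squared error over the n samples; the exact form (including the 1/2 factor) is an assumption here, shown only to make the role of n explicit.

```latex
% Assumed mean-squared-error loss over n samples; with n = 1 it reduces
% to the error on the single sample being processed.
\[
J(W) \;=\; \frac{1}{2n} \sum_{i=1}^{n} \bigl(t_i - \hat{y}_i\bigr)^2
\]
\[
\text{for } n = 1:\qquad J(W) \;=\; \tfrac{1}{2}\,\bigl(t - \hat{y}\bigr)^2
\]
```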
Backpropagation example (backward)
Gradient descent:
$W_{\text{new}} = W_{\text{old}} - \eta\,\Delta W$, where $\Delta W = \frac{\partial J(W)}{\partial W}$
Backpropagation example (weight update)
Backpropagation example
Iteration 2:
• Repeat the above steps using the second sample (0, 1), whose desired output is 1; the sketch below puts the per-sample steps together in code.
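The per-sample loop below strings the steps together for a 2-2-1 sigmoid network on the XOR data. The architecture, sigmoid activations, learning rate, and random initialization are assumptions for illustration, not the exact numbers used on the slides.

```python
import numpy as np

# Sketch of per-sample backpropagation for a 2-2-1 sigmoid network on XOR.
# Architecture, activations, learning rate and initialization are assumptions.
rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# XOR data from the truth table above
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 0], dtype=float)

# Random initialization
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)   # input  -> hidden
W2, b2 = rng.normal(size=2), 0.0                # hidden -> output
eta = 0.5                                       # learning rate (assumed)

for epoch in range(5000):
    for x, t in zip(X, T):
        # Feedforward
        h = sigmoid(W1 @ x + b1)
        y = sigmoid(W2 @ h + b2)
        # Loss for this sample (n = 1): J = 1/2 (t - y)^2
        # Backward: delta terms are dJ/dz at each layer
        delta_out = (y - t) * y * (1 - y)
        delta_hid = delta_out * W2 * h * (1 - h)
        # Weight update: W_new = W_old - eta * dJ/dW
        W2 -= eta * delta_out * h
        b2 -= eta * delta_out
        W1 -= eta * np.outer(delta_hid, x)
        b1 -= eta * delta_hid

# Outputs after training; they should approach [0, 1, 1, 0] if training succeeds.
print(np.round([sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2) for x in X], 3))
```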
Backpropagation example (final result)
OPTIMIZATION: GRADIENT DESCENT ALTERNATIVES
Gradient descent
Mini-batch gradient descent performs an update for every mini-batch of training examples. This way, it:
a) reduces the variance of the parameter updates, which can lead to more stable convergence;
b) can make use of highly optimized matrix operations, which make computing the gradient over a mini-batch very efficient;
c) commonly uses mini-batch sizes between 50 and 256, though these can vary for different applications.
Mini-batch gradient descent is typically the algorithm of choice when training a neural network, and the term SGD is usually employed even when mini-batches are used.
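A minimal sketch of the mini-batch update loop is shown below. The batch size of 64 is one illustrative choice within the 50-256 range mentioned above, and `grad` is a hypothetical placeholder for the backpropagation step, not a real library call.

```python
import numpy as np

def minibatch_sgd(W, X, T, grad, eta=0.01, batch_size=64, epochs=10):
    """Mini-batch SGD sketch: grad(W, X_batch, T_batch) is a hypothetical
    callback returning dJ/dW computed by backpropagation on one mini-batch."""
    n = X.shape[0]
    for _ in range(epochs):
        order = np.random.permutation(n)            # shuffle samples each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            W = W - eta * grad(W, X[idx], T[idx])   # W_new = W_old - eta * dJ/dW
        # a) averaging over a batch reduces the variance of each update
        # b) the batched call to grad can exploit optimized matrix operations
    return W
```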
Challenges of mini-batch gradient descent