ppt3dl
Experiments
Lecture 15
22 January 2025 1
Normalizing Data Sets
Why normalization: it speeds up training
• Use the same normalizer on the test set, applied exactly as on the training set.
• If the features are on very different scales (e.g., one ranging over 1–1000 and another over 0–1), the weights end up taking very different values.
• More gradient-descent steps may then be needed to reach the optimum, and learning can be slow.
Normalizing Training Sets
Subtract the mean:
  μ = (1/m) Σᵢ₌₁ᵐ x⁽ⁱ⁾
  x := x − μ
Normalize the variance:
  σ² = (1/m) Σᵢ₌₁ᵐ (x⁽ⁱ⁾)²   (element-wise square)
  x /= σ
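The two steps above can be sketched in NumPy (a minimal sketch; the function and variable names are illustrative, not from the lecture):

```python
import numpy as np

def fit_normalizer(X_train):
    # Compute per-feature mean and standard deviation on the TRAINING set only.
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    return mu, sigma

def normalize(X, mu, sigma):
    # Apply the training-set statistics unchanged -- also to the test set.
    return (X - mu) / sigma

# Features on very different scales, as in the slide (roughly 0-1 vs 0-1000).
X_train = np.array([[ 1.0, 1000.0],
                    [ 0.0,  500.0],
                    [-1.0,    0.0]])
mu, sigma = fit_normalizer(X_train)
X_norm = normalize(X_train, mu, sigma)
```

After this, every feature has zero mean and unit variance, so gradient descent takes similarly sized steps in every direction.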
Vanishing/exploding gradients
Assume g(z) = z (a linear, identity activation) and b⁽ˡ⁾ = 0. Then
  ŷ = W⁽ᴸ⁾ W⁽ᴸ⁻¹⁾ … W⁽³⁾ W⁽²⁾ W⁽¹⁾ x
so if each weight matrix is slightly larger than the identity, activations grow exponentially with depth; if slightly smaller, they shrink exponentially.
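A quick numerical sketch of the point above, assuming hypothetical 4-unit layers with identity activation and zero bias:

```python
import numpy as np

L = 50                      # network depth
x = np.ones(4)              # input vector
results = {}

for w in (1.5, 0.5):        # each layer's weight matrix is w * I
    W = w * np.eye(4)
    y = x.copy()
    for _ in range(L):
        y = W @ y           # y_hat = W^[L] ... W^[1] x = (w ** L) * x
    results[w] = y[0]

# Weights slightly above 1 explode (1.5**50 is on the order of 1e8);
# weights slightly below 1 vanish (0.5**50 is on the order of 1e-15).
```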
Exploding/vanishing gradients
Gradients (slopes) become too small (vanishing) or too large (exploding).
If the input values to a unit are large, then its weights need to be very small.
Batch Norm
It is an extension of normalizing inputs and applies to every layer of the neural
network
Given some intermediate values z⁽¹⁾, …, z⁽ᵐ⁾ in a layer of the neural network:
• μ = (1/m) Σᵢ z⁽ⁱ⁾
• σ² = (1/m) Σᵢ (z⁽ⁱ⁾ − μ)²
• z_norm⁽ⁱ⁾ = (z⁽ⁱ⁾ − μ) / √(σ² + ε)
• z̃⁽ⁱ⁾ = γ z_norm⁽ⁱ⁾ + β, where γ and β are learnable parameters
• If γ = √(σ² + ε) and β = μ, then z̃⁽ⁱ⁾ = z⁽ⁱ⁾
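The four steps above, sketched in NumPy for one layer (the shapes and names are my own; Z holds one column per example in the mini-batch):

```python
import numpy as np

def batch_norm_forward(Z, gamma, beta, eps=1e-8):
    # Z: (n_units, m) -- intermediate values z^(i) for a mini-batch of m examples.
    mu = Z.mean(axis=1, keepdims=True)       # per-unit mean over the batch
    var = Z.var(axis=1, keepdims=True)       # per-unit variance over the batch
    Z_norm = (Z - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * Z_norm + beta             # learnable scale and shift

rng = np.random.default_rng(0)
Z = rng.normal(2.0, 5.0, size=(3, 32))       # raw activations, off-center and wide
gamma = np.ones((3, 1))
beta = np.zeros((3, 1))
Z_tilde = batch_norm_forward(Z, gamma, beta)
```

As the last bullet says, choosing gamma = √(σ² + ε) and beta = μ makes the transform the identity, so the network can always undo the normalization if that is optimal.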
Applying Batch Norm
• X →(w⁽¹⁾, b⁽¹⁾)→ z⁽¹⁾ →(β⁽¹⁾, γ⁽¹⁾)→ z̃⁽¹⁾ → a⁽¹⁾ = g⁽¹⁾(z̃⁽¹⁾) → …
• In TensorFlow: tf.nn.batch_normalization
• Each mini-batch is scaled by the mean/variance computed on just that mini-batch.
• This adds some noise to the values of z within that mini-batch. Similar to dropout, it has a slight regularization effect, because noise is added to the hidden-layer activations.
Why batch norm
Hyperparameters to tune:
• Learning rate
• Beta
• Number of hidden units
• Mini-batch size
• Number of layers
• Learning-rate decay
• Others
Random is better than a Grid
[Figure: points sampled over two hyperparameters, α (vertical axis) and β (horizontal axis). A regular grid tries only a few distinct values of each hyperparameter, while random sampling tries a new value of each on every trial.]
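The slide's point can be sketched numerically, assuming a hypothetical budget of 25 trials:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 25

# Grid search: a 5 x 5 grid spends 25 trials on only 5 distinct
# values of each individual hyperparameter.
grid_alpha = np.repeat(np.linspace(0.0, 1.0, 5), 5)

# Random search: 25 trials try 25 distinct values of each
# hyperparameter, which matters when only one of them is important.
rand_alpha = rng.uniform(0.0, 1.0, size=n)

n_grid = len(np.unique(grid_alpha))
n_rand = len(np.unique(rand_alpha))
```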
Coarse to fine: first sample hyperparameters coarsely over the whole space, then zoom into the region where the best results occur and sample more densely there.
Picking hyperparameters at random and on an appropriate scale
• Is it OK to use a linear scale for all hyperparameters, or do some require a log scale?
• Example: a learning rate searched over [0.0001, 1] should be sampled on a log scale; on a linear scale, about 90% of samples would fall in [0.1, 1].
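For the [0.0001, 1] range above, sampling the exponent uniformly gives the log scale (a minimal sketch; the range comes from the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample r uniformly in [-4, 0], then set alpha = 10**r.
# Each decade (1e-4..1e-3, 1e-3..1e-2, ...) now gets an equal
# share of the samples, instead of ~90% landing in [0.1, 1].
r = rng.uniform(-4.0, 0.0, size=1000)
alphas = 10.0 ** r
```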
Babysitting one model vs. training models in parallel
The "panda" vs. "caviar" approach: with limited compute, babysit a single model and tune it as it trains (panda); with ample compute, train many models in parallel with different hyperparameters and pick the best (caviar).
[email protected]
[email protected]
https://www.linkedin.com/in/gauravsingal789/
http://www.gauravsingal.in