Bayesian Optimization for Machine Learning Algorithms
Arturo Fernandez
CS 294
University of California, Berkeley
p(f | X) = N (f | µ, K)
Thus the covariance of the noisy observations is

Cov(y | X) = K + σy² I =: Ky

and the posterior predictive mean and covariance at a test input x∗ are

µ̂ = k∗ᵀ Ky⁻¹ y
Σ̂ = k∗∗ − k∗ᵀ Ky⁻¹ k∗

where k∗ is the vector of covariances k(x∗, xᵢ) between the test input and the training inputs, and k∗∗ = k(x∗, x∗).
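As a concrete reference, here is a minimal NumPy sketch of these predictive equations. The kernel function, training data, and noise variance are placeholders supplied by the caller; this is an illustration of the formulas above, not the implementation used in the cited work.

```python
import numpy as np

def gp_posterior(X, y, x_star, kernel, sigma_y2):
    """Posterior predictive mean and variance at a single test input x_star.

    X: (n, D) training inputs, y: (n,) noisy targets,
    kernel(a, b) -> scalar covariance, sigma_y2: observation noise variance.
    """
    n = X.shape[0]
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    K_y = K + sigma_y2 * np.eye(n)                    # Ky = K + sigma_y^2 I
    k_star = np.array([kernel(x_star, X[i]) for i in range(n)])
    k_ss = kernel(x_star, x_star)

    # Cholesky solve for Ky^{-1} y and Ky^{-1} k_star (numerically stable).
    L = np.linalg.cholesky(K_y)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, k_star)

    mu_hat = k_star @ alpha                           # k_*^T Ky^{-1} y
    var_hat = k_ss - v @ v                            # k_** - k_*^T Ky^{-1} k_*
    return mu_hat, var_hat
```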
The ARD SE kernel is too smooth; instead, use the ARD Matérn 5/2 kernel:
KM52(x, x′) = θ0 ( 1 + √(5 r²(x, x′)) + (5/3) r²(x, x′) ) exp{ −√(5 r²(x, x′)) }
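A small NumPy sketch of this kernel. The ARD form of the squared distance, r²(x, x′) = Σ_d (x_d − x′_d)² / θ_d², is an assumption here (the slide does not spell it out); the per-dimension scales are the θ1:D hyperparameters listed next.

```python
import numpy as np

def matern52_ard(x, xp, theta0, lengthscales):
    """ARD Matern 5/2 kernel KM52(x, x')."""
    # r^2 is the ARD-scaled squared distance: sum_d (x_d - x'_d)^2 / theta_d^2
    # (assumed form; the lengthscales are the D scale hyperparameters below).
    r2 = np.sum((x - xp) ** 2 / np.asarray(lengthscales) ** 2)
    s = np.sqrt(5.0 * r2)
    return theta0 * (1.0 + s + (5.0 / 3.0) * r2) * np.exp(-s)
```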
D + 3 Hyperparameters
- D scales θ1:D
- Amplitude θ0
- Observation noise ν
- Constant mean m
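To show where these D + 3 quantities enter, here is a hedged sketch of the GP log marginal likelihood as a function of (θ0, θ1:D, ν, m). Whether one maximizes this quantity or integrates the hyperparameters out is a separate modeling choice not shown here; all names below are illustrative.

```python
import numpy as np

def matern52_ard(x, xp, theta0, lengthscales):
    # Same ARD Matern 5/2 form as in the sketch above.
    r2 = np.sum((x - xp) ** 2 / lengthscales ** 2)
    s = np.sqrt(5.0 * r2)
    return theta0 * (1.0 + s + (5.0 / 3.0) * r2) * np.exp(-s)

def gp_log_marginal_likelihood(X, y, theta0, lengthscales, noise_var, mean_const):
    """log p(y | X, theta): all D + 3 hyperparameters enter here.

    lengthscales: the D scales theta_1..theta_D, theta0: amplitude,
    noise_var: observation noise, mean_const: constant mean m.
    """
    n = X.shape[0]
    lengthscales = np.asarray(lengthscales, dtype=float)
    K = np.array([[matern52_ard(X[i], X[j], theta0, lengthscales)
                   for j in range(n)] for i in range(n)])
    K_y = K + noise_var * np.eye(n)
    resid = y - mean_const                      # subtract the constant mean m
    L = np.linalg.cholesky(K_y)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))
    # 0.5 * log|K_y| equals the sum of log-diagonal entries of the Cholesky factor.
    return -0.5 * resid @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)
```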
Hyperparameters
- Learning rate ρt = (τ0 + t)^(−κ) → (τ0, κ)
- Minibatch size

The cited paper uses an exhaustive grid search of size 6 × 6 × 8 (288 settings); see the sketch below.
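A rough sketch of the learning-rate schedule and the size of the grid it is tuned over. The specific grid ranges below are hypothetical placeholders; only the 6 × 6 × 8 shape comes from the slide.

```python
import numpy as np
from itertools import product

def learning_rate(t, tau0, kappa):
    """Step-size schedule rho_t = (tau0 + t)^(-kappa)."""
    return (tau0 + t) ** (-kappa)

# Hypothetical grid ranges -- only the 6 x 6 x 8 shape comes from the slide.
tau0_grid = np.linspace(1.0, 1024.0, 6)
kappa_grid = np.linspace(0.5, 1.0, 6)
minibatch_grid = [2 ** i for i in range(2, 10)]       # 8 candidate minibatch sizes

grid = list(product(tau0_grid, kappa_grid, minibatch_grid))
print(len(grid))                                      # 6 * 6 * 8 = 288 settings
```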
Hyperparameters
- Epochs to run model
- Learning rate
- Four weight costs (one for each layer and the softmax output weights)
- Width, scale and power of the response normalization on the pooling layers
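Putting the pieces together, here is a rough, self-contained sketch of one Bayesian-optimization step over a box of hyperparameters like those above: fit a GP with the Matérn 5/2 kernel to the evaluations so far and pick the candidate maximizing expected improvement (a standard acquisition function). The bounds, random candidate sampling, and minimization convention are all illustrative assumptions, not the cited papers' setup.

```python
import numpy as np
from scipy.stats import norm

def matern52_ard(x, xp, theta0, ls):
    r2 = np.sum((x - xp) ** 2 / ls ** 2)
    s = np.sqrt(5.0 * r2)
    return theta0 * (1.0 + s + (5.0 / 3.0) * r2) * np.exp(-s)

def suggest_next(X, y, bounds, theta0=1.0, ls=None, noise_var=1e-6,
                 n_candidates=1000, rng=None):
    """Return the candidate setting with the largest expected improvement
    under a GP fit to the observations so far (minimizing, e.g., validation loss).

    X: (n, D) evaluated settings, y: (n,) objective values,
    bounds: (D, 2) array of [low, high] for each hyperparameter.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, D = X.shape
    ls = np.ones(D) if ls is None else np.asarray(ls, dtype=float)

    K = np.array([[matern52_ard(X[i], X[j], theta0, ls) for j in range(n)]
                  for i in range(n)]) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

    # Random candidates in the box; a real implementation would optimize EI directly.
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_candidates, D))
    best = y.min()
    ei = np.empty(n_candidates)
    for c in range(n_candidates):
        k_star = np.array([matern52_ard(cand[c], X[i], theta0, ls) for i in range(n)])
        v = np.linalg.solve(L, k_star)
        mu = k_star @ alpha
        var = max(theta0 - v @ v, 1e-12)          # k(x, x) = theta0 for this kernel
        sd = np.sqrt(var)
        z = (best - mu) / sd
        ei[c] = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
    return cand[np.argmax(ei)]
```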