
Bayesian Optimisation

FOR AUTOMATED MACHINE LEARNING


Outline
In this presentation, we introduce Bayesian Optimization (BO) for Automated Machine Learning (AutoML). The talk is structured as follows:
• An introduction to BO and AutoML
• Some motivations for applying BO to AutoML
• An overview of the theory behind BO, with a focus on AutoML
• A discussion of the state of the art in BO for AutoML
• Application areas, with a focus on Deep Reinforcement Learning (Deep RL)
• A look at tutorials, and some demos of BO and Deep RL in action
• Conclusions
Automated Machine Learning (AutoML)
• A few questions to ask yourself: What model should I use for my machine learning task? Which hyperparameters should I set, and to which values? What are hyperparameters, anyway?
• Applying machine learning models to problems requires expert knowledge for model selection and hyperparameter tuning.
• Finding the best model and the best hyperparameter configuration is non-trivial. This problem is known as Combined Algorithm Selection and Hyperparameter optimization (CASH).

Figure 1) The cycle of CASH
Theory

We will first explore the theory of Bayesian Optimization, with details in the following areas:
• Bayesian Optimization
• Methods for Bayesian Optimization
• Gaussian Processes
• Tree-Based Approaches
• Extensions for Large Datasets and Many Dimensions

Bayesian Optimization

• Cross-validation and manual tuning are expensive
• Bayesian Optimization to the rescue!
• The objective function is the performance of a model and hyperparameter configuration
• Exploration is guided by an acquisition function
Bayesian Optimization
Bayesian Optimization for AutoML is defined as follows:

Given hyperparameters $\lambda$ from a configuration space $\Lambda$, we wish to minimize the algorithm's loss function over k-fold cross-validation:

$\lambda^{*} \in \arg\min_{\lambda \in \Lambda} \; \frac{1}{k} \sum_{i=1}^{k} \mathcal{L}\big(A_{\lambda}, \mathcal{D}_{\mathrm{train}}^{(i)}, \mathcal{D}_{\mathrm{valid}}^{(i)}\big)$

Bayesian Optimization
Bayesian Optimization boils down to the following loop [1]:

For $t = 1, 2, \ldots$:
1. Select $x_{t} = \arg\max_{x} \alpha(x \mid \mathcal{D}_{1:t-1})$ by maximizing the acquisition function $\alpha$
2. Evaluate the objective, $y_{t} = f(x_{t})$
3. Augment the data, $\mathcal{D}_{1:t} = \mathcal{D}_{1:t-1} \cup \{(x_{t}, y_{t})\}$, and update the surrogate model

And acquisition functions, such as the Expected Improvement (as an example):

$\alpha_{\mathrm{EI}}(x) = \mathbb{E}\big[\max\big(0, f(x^{+}) - f(x)\big)\big]$

where $x^{+}$ is the best configuration observed so far. Under a GP posterior with mean $\mu(x)$ and standard deviation $\sigma(x)$, this has the closed form $\big(f(x^{+}) - \mu(x)\big)\Phi(Z) + \sigma(x)\phi(Z)$, with $Z = \big(f(x^{+}) - \mu(x)\big)/\sigma(x)$ (for minimization).
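As a package-agnostic illustration, the closed form above is only a few lines of NumPy/SciPy:

    import numpy as np
    from scipy.stats import norm

    def expected_improvement(mu, sigma, best_y):
        # Closed-form EI for minimization: mu and sigma are the surrogate's
        # posterior mean and standard deviation at candidate points; best_y
        # is the lowest objective value observed so far.
        sigma = np.maximum(sigma, 1e-12)  # guard against zero predictive variance
        z = (best_y - mu) / sigma
        return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)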


Methods for Bayesian Optimization

Various methods, modifications, and extensions exist for Bayesian Optimization, including:
• Gaussian Processes
• Tree-based approaches: TPE and SMAC
• REMBO, Fabolas, and MTBO


Gaussian Processes

A Gaussian Process (GP) is a Gaussian distribution over functions [1]. It comprises:
• A mean and a covariance, which are functions of x (x is a vector)
• The mean function encodes an a priori expectation of the objective
• The covariance is obtained using a kernel, which indicates similarity between points
• Inference proceeds by sampling at the points where the acquisition function is greatest
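A minimal sketch of a GP surrogate, here using scikit-learn (one of several possible choices; the data points are illustrative):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    X = np.array([[0.1], [0.4], [0.9]])   # hyperparameter values tried so far
    y = np.array([0.82, 0.55, 0.71])      # corresponding validation losses (made up)

    # A Matern kernel is a common covariance choice for BO surrogates.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)

    X_cand = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
    mu, sigma = gp.predict(X_cand, return_std=True)  # posterior mean and std
    # The acquisition function (e.g. expected_improvement above) is then
    # maximized over X_cand to pick the next configuration to evaluate.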
Gaussian Processes

• On the right, a GP is shown performing BO over a univariate function f(x)
• The second plot shows adding a new observation and making a new prediction

Figure 2) A GP during a single BO iteration
Tree-Based Approaches
Tree Parzen Estimators (TPE) [3]
• These model $p(\lambda \mid y)$ and $p(y)$ rather than $p(y \mid \lambda)$:
• $p(\lambda \mid y) = \ell(\lambda)$ if $y < y^{*}$, and $g(\lambda)$ if $y \geq y^{*}$, where a split is created at the threshold $y^{*}$ and the two new conditionals are modeled by tree-structured Parzen estimators.

Sequential Model-Based Algorithm Configuration (SMAC) [2]
• This uses random forests to model $p(y \mid \lambda)$ as a Gaussian distribution $\mathcal{N}(\hat{\mu}, \hat{\sigma}^{2})$ whose mean and variance are the empirical mean and variance over the predictions of the forest's trees
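TPE is available in Hyperopt (listed in the tools section below). A minimal usage sketch; the space and objective here are stand-ins, not a real tuning task:

    from hyperopt import fmin, tpe, hp, Trials

    # Illustrative search space; names and ranges are made up for this sketch.
    space = {
        "lr": hp.loguniform("lr", -10, 0),            # learning rate in [e^-10, 1]
        "n_layers": hp.choice("n_layers", [1, 2, 3]),
    }

    def objective(params):
        # In practice this would train a model and return a validation loss.
        return (params["lr"] - 0.01) ** 2

    trials = Trials()
    best = fmin(fn=objective, space=space, algo=tpe.suggest,
                max_evals=50, trials=trials)
    print(best)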
Extensions for Large Datasets and Many Dimensions
One way of extending Bayesian optimization is to make it more robust and efficient on large datasets and in high-dimensional spaces. Techniques to do this include:

• Multi-Task Bayesian Optimization (MTBO) by K. Swersky et al., which introduces dataset size as an additional task to optimize over [5]

• Random EMbedding Bayesian Optimization (REMBO), which handles problems with billions of dimensions by optimizing in a low-dimensional subspace and mapping candidates back into the full configuration space via a random embedding [4] (sketched below)

• Fast Bayesian Optimization on Large Datasets (Fabolas), which uses an acquisition function that models the loss and the function evaluation time as functions of dataset size, learning from cheap evaluations on small data subsets [6]
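A toy sketch of REMBO's core idea (assumptions: a box-bounded ambient space and a made-up effective dimensionality; the full method in [4] adds more machinery):

    import numpy as np

    D, d = 100_000, 5                 # ambient and (assumed) effective dimensionality
    rng = np.random.default_rng(0)
    A = rng.standard_normal((D, d))   # fixed random embedding matrix

    def to_ambient(z):
        # Map a low-dimensional candidate z in R^d back into the ambient
        # space, clipping to the original box bounds [-1, 1]^D.
        return np.clip(A @ z, -1.0, 1.0)

    # BO then runs over z in a small box in R^d, and each candidate is
    # evaluated as f(to_ambient(z)) in the original high-dimensional space.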
Applications of BO
Bayesian optimization is ideal for globally optimizing black-box functions that are expensive to evaluate. It has been successfully applied to:

• Improving Convolutional Neural Network (CNN) architectures for image classification
• Performing combined model selection and hyperparameter optimization for machine learning tasks
• Optimizing sensor set selection, for applications such as choosing weather sensors for forecasting, and robotics

Figure 3) Top: the ImageNet zoo, with many human-configured models. AutoML to the rescue? Left: a cute Boston Dynamics robot. Sensor optimization?
More Applications of BO
• Predicting new materials with novel properties, without extensive screening of candidate materials, for applications in materials science and engineering
• Predicting the training loss of machine learning algorithms when training on small subsets of a dataset
• Improving object detection and pose estimation
• Improving the performance of ML models in astronomy-related domains

Figure 4) Galaxy classification is a big-data problem in astronomy that is often tackled using ML. AutoML can yield better performance. See the Kaggle Galaxy Zoo Competition.


Tools, platforms and frameworks
There is wide community support for Bayesian Optimization for machine learning.

• There are a large number of tools available for the languages used in data science, such as Python, MATLAB, R, and Java.

• We list a few tools and frameworks implementing the algorithms mentioned above for Python, a very popular language for machine learning and BO research implementations. Tools also exist for Java, such as Auto-WEKA, and for R, such as H2O. However, we stick to Python for its prevalence in the research community.
Tools, platforms and frameworks
Below are some Python packages implementing the various algorithms:

Name       Tool       Link                            Description of Package
SMAC       SMACv3     github.com/automl/SMAC3         Sequential model-based algorithm configuration
Fabolas    RoBO       github.com/automl/RoBO          Fast BO on large datasets
Hyperband  RoBO       github.com/zygmuntz/hyperband   Successive Halving algorithm
TPE        Hyperopt   github.com/hyperopt             BO with TPE and random search framework
MTBO       RoBO       github.com/automl/RoBO          Multi-Task BO framework


Strengths
• ML experts and researchers are free to focus on developing novel architectures, algorithms, and models for a domain, rather than optimizing hyperparameters for a current model.
• Data-efficient approaches to BO can find high-quality models far faster than any brute-force approach, and more often than not can outperform human expert searches.
• Many application domains, such as astronomy, videogame development, and engineering, can now more easily benefit from machine learning, without needing experts well-versed in the theory and practice.
• Unlimited power!
Limitations
• Potentially hardware-intensive when optimizing expensive neural network architectures.
• Does not incorporate expert knowledge and reasoning into the optimization procedure.
• Implementation-wise, the packages and tools that exist for state-of-the-art BO techniques are often laden with dependencies, restricted to Linux, and not always production-ready.
• Unlimited power comes with unlimited responsibilities!
Application Area:
Deep Reinforcement Learning

• Deep reinforcement learning uses RL to train deep neural networks to perform complex tasks, such as playing videogames
• Choosing which neural network architecture, which network hyperparameters, and which RL algorithm to use is a tough problem for humans to tackle
• Deep RL finds many applications in robotics, game-playing agents, and adversarial systems
• Improving model selection and hyperparameter optimization for deep RL can create more efficient agents and accelerate research in certain application areas
Deep RL Tutorials and Videogame Agents
Deep RL is state-of-the-art for playing many adversarial games (labelled with figures alongside):
• a) Spinning pendulum
• b) Atari games such as Breakout, Pong
• c) Pole on a cart
• d) DOOM (!)

Future: GTA self-driving cars?
Experimental Details
Hardware:
• Intel Core i7-3930K 6-core processor
• 16GB RAM
• NVIDIA GTX 680 GPU with 3GB memory

Software:
• See our GitHub page! Link: github.com/BrutishGuy/Deep-RL-BO
• If you had to count our dependencies, you may run out of natural numbers to do it with.
• Also, sorry macOS and Windows users, this one is strictly Linux (blame the dependencies)
Experimental Details Cont...

• For each of the algorithms, the models to choose from were (below, "images" refers to preprocessed images):
• A Convolutional Neural Network (CNN) on images,
• A Long Short-Term Memory (LSTM) network on a short sequence of images,
• A Convolutional + LSTM layered network on a short sequence of images,
• And a CNN-LSTM merged network on a short sequence of images.


More Experimental Details...
• Choices for network hyperparameters, for each model or component of a model, are:
• Up to 10 layers allowed for LSTM and CNN networks,
• Up to 512 neurons per layer, increasing in powers of two starting at 16,
• ReLU, Sigmoid, Linear, Tanh, and Leaky ReLU activation functions for each layer, and
• Adam, SGD, and RMSProp as choices for neural network weight training.

• Choices of reinforcement learning algorithms are Q-learning, SARSA, A3C, CEM, and TD(λ). Models were allowed to play a total of 10 000 episodes for training and 10 test episodes for evaluation.
• This budget constraint makes good models a necessity. (A sketch of this search space follows below.)
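The search space above translates directly into Hyperopt's format; a sketch (the names are illustrative, not the exact ones used in our experiments):

    from hyperopt import hp

    space = {
        "model": hp.choice("model", ["cnn", "lstm", "conv_lstm", "cnn_lstm_merge"]),
        "n_layers": hp.quniform("n_layers", 1, 10, 1),              # up to 10 layers
        "units": hp.choice("units", [16, 32, 64, 128, 256, 512]),   # powers of two
        "activation": hp.choice("activation",
                                ["relu", "sigmoid", "linear", "tanh", "leaky_relu"]),
        "optimizer": hp.choice("optimizer", ["adam", "sgd", "rmsprop"]),
        "rl_algo": hp.choice("rl_algo",
                             ["q_learning", "sarsa", "a3c", "cem", "td_lambda"]),
    }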
Demo - Show Me the Money
This is the part where we show off all the things
Conclusions
• We have seen how Bayesian Optimization (BO) can be used as a technique for automated machine learning.
• We have learnt some of the theory and seen a few of the state-of-the-art techniques used to combat dimensionality issues and large datasets.
• We have seen BO applied to Deep Reinforcement Learning to play some simple and some more complex videogames at a better-than-human level of performance, while removing the need for model configuration from the experimenter.
References
[1] Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., & de Freitas, N. (2016). Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1), 148-175.

[2] Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2011). Sequential model-based optimization for general algorithm configuration. In Proceedings of the Conference on Learning and Intelligent Optimization (LION 5).

[3] Bergstra, J., Yamins, D., & Cox, D. D. (2013). Hyperopt: A Python library for optimizing the hyperparameters of machine learning algorithms. In Proceedings of the 12th Python in Science Conference (pp. 13-20).

[4] Wang, Z., Zoghi, M., Hutter, F., Matheson, D., & de Freitas, N. (2013). Bayesian optimization in high dimensions via random embeddings. In IJCAI (pp. 1778-1784).

[5] Swersky, K., Snoek, J., & Adams, R. P. (2013). Multi-task Bayesian optimization. In Advances in Neural Information Processing Systems (pp. 2004-2012).

[6] Klein, A., Falkner, S., Bartels, S., Hennig, P., & Hutter, F. (2016). Fast Bayesian optimization of machine learning hyperparameters on large datasets. arXiv preprint arXiv:1605.07079.
