
Bayesian Optimisation

FOR AUTOMATED MACHINE LEARNING


Outline
In this presentation, we introduce Bayesian Optimization (BO) for Automated Machine Learning (AutoML). The talk is structured as follows:
• An introduction to BO and AutoML
• Some motivations for applying BO to AutoML
• An overview of the theory behind BO, with a focus on AutoML
• A discussion of the state of the art in BO for AutoML
• Application areas, with a focus on Deep Reinforcement Learning (Deep RL)
• A look at tutorials, and some demos of BO and Deep RL in action
• Conclusions
Automated Machine Learning (AutoML)
• A few questions to ask yourself: What model should I use for my machine learning task? Which hyperparameters should I set, and to which values? What are hyperparameters, anyway?
• Applying machine learning models to problems requires expert knowledge for model selection and hyperparameter tuning.
• Finding the best model and the best hyperparameter configuration is non-trivial. This problem is known as Combined Algorithm Selection and Hyperparameter optimization (CASH).

Figure 1) The cycle of CASH
Theory

We will first explore the theory of Bayesian Optimization, with details in the following areas:
• Bayesian Optimization
• Methods for Bayesian Optimization
• Gaussian Processes
• Tree-Based Approaches
• Extensions for Large Datasets and Many Dimensions

Bayesian Optimization

• Cross-validation and manual tuning are expensive
• Bayesian Optimization to the rescue!
• The objective function is the performance of a model and hyperparameter configuration
• Exploration is guided by an acquisition function
Bayesian Optimization
Bayesian Optimization for AutoML is defined as follows:

Given hyperparameters $\lambda$ from a configuration space $\Lambda$, we wish to minimize the algorithm's loss function over k-fold cross-validation:

$\lambda^{*} \in \arg\min_{\lambda \in \Lambda} \; \frac{1}{k} \sum_{i=1}^{k} \mathcal{L}\big(A_{\lambda}, \mathcal{D}_{\mathrm{train}}^{(i)}, \mathcal{D}_{\mathrm{valid}}^{(i)}\big)$

Bayesian Optimization
Bayesian Optimization boils down to the following loop [1]:

For $t = 1, 2, \ldots$:
1. Select $x_{t} = \arg\max_{x} \alpha(x \mid \mathcal{D}_{1:t-1})$ by maximizing the acquisition function $\alpha$
2. Evaluate the objective, $y_{t} = f(x_{t})$
3. Augment the data, $\mathcal{D}_{1:t} = \mathcal{D}_{1:t-1} \cup \{(x_{t}, y_{t})\}$, and update the surrogate model

And acquisition functions, such as the Expected Improvement (as an example):

$\alpha_{\mathrm{EI}}(x) = \mathbb{E}\big[\max\big(0, f(x^{+}) - f(x)\big)\big]$

where $x^{+}$ is the best configuration observed so far. Under a GP posterior with mean $\mu(x)$ and standard deviation $\sigma(x)$, this has the closed form $\big(f(x^{+}) - \mu(x)\big)\Phi(Z) + \sigma(x)\phi(Z)$, with $Z = \big(f(x^{+}) - \mu(x)\big)/\sigma(x)$ (for minimization).
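As a package-agnostic illustration, the closed form above is only a few lines of NumPy/SciPy:

    import numpy as np
    from scipy.stats import norm

    def expected_improvement(mu, sigma, best_y):
        # Closed-form EI for minimization: mu and sigma are the surrogate's
        # posterior mean and standard deviation at candidate points; best_y
        # is the lowest objective value observed so far.
        sigma = np.maximum(sigma, 1e-12)  # guard against zero predictive variance
        z = (best_y - mu) / sigma
        return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)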


Methods for Bayesian Optimization

Various methods, modifications, and extensions exist for Bayesian Optimization, including:
• Gaussian Processes
• Tree-based approaches: TPE and SMAC
• REMBO, Fabolas, and MTBO


Gaussian Processes

A Gaussian Process (GP) is a Gaussian distribution over functions [1]. It comprises:
• A mean and a covariance, which are functions of x (x is a vector)
• The mean function encodes an a priori expectation of the objective
• The covariance is obtained using a kernel, which indicates similarity between points
• Inference proceeds by sampling at the points where the acquisition function is greatest
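A minimal sketch of a GP surrogate, here using scikit-learn (one of several possible choices; the data points are illustrative):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    X = np.array([[0.1], [0.4], [0.9]])   # hyperparameter values tried so far
    y = np.array([0.82, 0.55, 0.71])      # corresponding validation losses (made up)

    # A Matern kernel is a common covariance choice for BO surrogates.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)

    X_cand = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
    mu, sigma = gp.predict(X_cand, return_std=True)  # posterior mean and std
    # The acquisition function (e.g. expected_improvement above) is then
    # maximized over X_cand to pick the next configuration to evaluate.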
Gaussian Processes

• On the right, a GP is shown performing BO over a univariate function f(x)
• The second plot shows adding a new observation and making a new prediction

Figure 2) A GP during a single BO iteration
Tree-Based Approaches
Tree Parzen Estimators (TPE) [3]
• These model $p(\lambda \mid y)$ and $p(y)$ rather than $p(y \mid \lambda)$:
• $p(\lambda \mid y) = \ell(\lambda)$ if $y < y^{*}$, and $g(\lambda)$ if $y \geq y^{*}$, where a split is created at the threshold $y^{*}$ and the two new conditionals are modeled by tree-structured Parzen estimators.

Sequential Model-Based Algorithm Configuration (SMAC) [2]
• This uses random forests to model $p(y \mid \lambda)$ as a Gaussian distribution $\mathcal{N}(\hat{\mu}, \hat{\sigma}^{2})$ whose mean and variance are the empirical mean and variance over the predictions of the forest's trees
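TPE is available in Hyperopt (listed in the tools section below). A minimal usage sketch; the space and objective here are stand-ins, not a real tuning task:

    from hyperopt import fmin, tpe, hp, Trials

    # Illustrative search space; names and ranges are made up for this sketch.
    space = {
        "lr": hp.loguniform("lr", -10, 0),            # learning rate in [e^-10, 1]
        "n_layers": hp.choice("n_layers", [1, 2, 3]),
    }

    def objective(params):
        # In practice this would train a model and return a validation loss.
        return (params["lr"] - 0.01) ** 2

    trials = Trials()
    best = fmin(fn=objective, space=space, algo=tpe.suggest,
                max_evals=50, trials=trials)
    print(best)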
Extensions for Large Datasets and Many Dimensions
One way of extending Bayesian optimization is to make it more robust and efficient on large datasets and in high-dimensional spaces. Techniques to do this include:

• Multi-Task Bayesian Optimization (MTBO) by K. Swersky et al., which introduces dataset size as an additional task to optimize over [5]

• Random EMbedding Bayesian Optimization (REMBO), which handles problems with billions of dimensions by optimizing in a low-dimensional subspace and mapping candidates back into the full configuration space via a random embedding [4] (sketched below)

• Fast Bayesian Optimization on Large Datasets (Fabolas), which uses an acquisition function that models the loss and the function evaluation time as functions of dataset size, learning from cheap evaluations on small data subsets [6]
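A toy sketch of REMBO's core idea (assumptions: a box-bounded ambient space and a made-up effective dimensionality; the full method in [4] adds more machinery):

    import numpy as np

    D, d = 100_000, 5                 # ambient and (assumed) effective dimensionality
    rng = np.random.default_rng(0)
    A = rng.standard_normal((D, d))   # fixed random embedding matrix

    def to_ambient(z):
        # Map a low-dimensional candidate z in R^d back into the ambient
        # space, clipping to the original box bounds [-1, 1]^D.
        return np.clip(A @ z, -1.0, 1.0)

    # BO then runs over z in a small box in R^d, and each candidate is
    # evaluated as f(to_ambient(z)) in the original high-dimensional space.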
Applications of BO
Bayesian optimization is ideal for globally optimizing black-box functions that are expensive to evaluate. It has been successfully applied to:

• Improving Convolutional Neural Network (CNN) architectures for image classification
• Performing combined model selection and hyperparameter optimization for machine learning tasks
• Optimizing sensor set selection, for applications such as choosing weather sensors for forecasting, and robotics

Figure 3) Top: the ImageNet zoo, with many human-configured models. AutoML to the rescue? Left: a cute Boston Dynamics robot. Sensor optimization?
More Applications of BO
• Predicting new materials with novel properties, without extensive screening of candidate materials, for applications in materials science and engineering
• Predicting the training loss of machine learning algorithms when training on small subsets of a dataset
• Improving object detection and pose estimation
• Improving the performance of ML models in astronomy-related domains

Figure 4) Galaxy classification is a big-data problem in astronomy that is often tackled using ML. AutoML can yield better performance. See the Kaggle Galaxy Zoo Competition.


Tools, platforms and frameworks
There is wide community support for Bayesian Optimization for machine learning.

• There are a large number of tools available for the languages used in data science, such as Python, MATLAB, R, and Java.

• We list a few tools and frameworks implementing the algorithms mentioned above for Python, a very popular language for machine learning and BO research implementations. Tools also exist for Java, such as Auto-WEKA, and for R, such as H2O. However, we stick to Python for its prevalence in the research community.
Tools, platforms and frameworks
Below are some Python packages implementing the various algorithms:

Name       Tool       Link                            Description of Package
SMAC       SMACv3     github.com/automl/SMAC3         Sequential model-based algorithm configuration
Fabolas    RoBO       github.com/automl/RoBO          Fast BO on large datasets
Hyperband  RoBO       github.com/zygmuntz/hyperband   Successive Halving algorithm
TPE        Hyperopt   github.com/hyperopt             BO with TPE and random search framework
MTBO       RoBO       github.com/automl/RoBO          Multi-Task BO framework


Strengths
• ML experts and researchers are free to focus on developing novel architectures, algorithms, and models for a domain, rather than optimizing hyperparameters for a current model.
• Data-efficient approaches to BO can find high-quality models far faster than any brute-force approach, and more often than not can outperform human expert searches.
• Many application domains, such as astronomy, videogame development, and engineering, can now more easily benefit from machine learning, without needing experts well-versed in the theory and practice.
• Unlimited power!
Limitations
• Potentially hardware-intensive when optimizing expensive neural network architectures.
• Does not incorporate expert knowledge and reasoning into the optimization procedure.
• Implementation-wise, the packages and tools that exist for state-of-the-art BO techniques are often laden with dependencies, restricted to Linux, and not always production-ready.
• Unlimited power comes with unlimited responsibilities!
Application Area:
Deep Reinforcement Learning

• Deep reinforcement learning uses RL to train deep neural networks to perform complex tasks, such as playing videogames
• Choosing which neural network architecture, which network hyperparameters, and which RL algorithm to use is a tough problem for humans to tackle
• Deep RL finds many applications in robotics, game-playing agents, and adversarial systems
• Improving model selection and hyperparameter optimization for deep RL can create more efficient agents and accelerate research in certain application areas
Deep RL Tutorials and Videogame Agents
Deep RL is state-of-the-art for playing many adversarial games (labelled with figures alongside):
• a) Spinning pendulum
• b) Atari games such as Breakout, Pong
• c) Pole on a cart
• d) DOOM (!)

Future: GTA self-driving cars?
Experimental Details
Hardware:
• Intel Core i7-3930K 6-core processor
• 16GB RAM
• NVIDIA GTX 680 GPU with 3GB memory

Software:
• See our GitHub page! Link: github.com/BrutishGuy/Deep-RL-BO
• If you had to count our dependencies, you may run out of natural numbers to do it with.
• Also, sorry macOS and Windows users, this one is strictly Linux (blame the dependencies)
Experimental Details Cont...

• For each of the algorithms, the models to choose from were (below, "images" refers to preprocessed images):
• A Convolutional Neural Network (CNN) on images,
• A Long Short-Term Memory (LSTM) network on a short sequence of images,
• A Convolutional + LSTM layered network on a short sequence of images,
• And a CNN-LSTM merged network on a short sequence of images.


More Experimental Details...
• Choices for network hyperparameters, for each model or component of a model, are:
• Up to 10 layers allowed for LSTM and CNN networks,
• Up to 512 neurons per layer, increasing in powers of two starting at 16,
• ReLU, Sigmoid, Linear, Tanh, and Leaky ReLU activation functions for each layer, and
• Adam, SGD, and RMSProp as choices for neural network weight training.

• Choices of reinforcement learning algorithms are Q-learning, SARSA, A3C, CEM, and TD(λ). Models were allowed to play a total of 10 000 episodes for training and 10 test episodes for evaluation.
• This budget constraint makes good models a necessity. (A sketch of this search space follows below.)
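The search space above translates directly into Hyperopt's format; a sketch (the names are illustrative, not the exact ones used in our experiments):

    from hyperopt import hp

    space = {
        "model": hp.choice("model", ["cnn", "lstm", "conv_lstm", "cnn_lstm_merge"]),
        "n_layers": hp.quniform("n_layers", 1, 10, 1),              # up to 10 layers
        "units": hp.choice("units", [16, 32, 64, 128, 256, 512]),   # powers of two
        "activation": hp.choice("activation",
                                ["relu", "sigmoid", "linear", "tanh", "leaky_relu"]),
        "optimizer": hp.choice("optimizer", ["adam", "sgd", "rmsprop"]),
        "rl_algo": hp.choice("rl_algo",
                             ["q_learning", "sarsa", "a3c", "cem", "td_lambda"]),
    }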
Demo - Show Me the Money
This is the part where we show off all the things
Conclusions
• We have seen how Bayesian Optimization (BO) can be used as a technique for automated machine learning.
• We have learnt some of the theory and seen a few of the state-of-the-art techniques used to combat dimensionality issues and large datasets.
• We have seen BO applied to Deep Reinforcement Learning to play some simple and some more complex videogames at a better-than-human level of performance, while removing the need for model configuration from the experimenter.
References
[1] Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., & de Freitas, N. (2016). Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1), 148-175.

[2] Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2011). Sequential model-based optimization for general algorithm configuration. In Proceedings of the Conference on Learning and Intelligent Optimization (LION 5).

[3] Bergstra, J., Yamins, D., & Cox, D. D. (2013). Hyperopt: A Python library for optimizing the hyperparameters of machine learning algorithms. In Proceedings of the 12th Python in Science Conference (pp. 13-20).

[4] Wang, Z., Zoghi, M., Hutter, F., Matheson, D., & de Freitas, N. (2013). Bayesian optimization in high dimensions via random embeddings. In IJCAI (pp. 1778-1784).

[5] Swersky, K., Snoek, J., & Adams, R. P. (2013). Multi-task Bayesian optimization. In Advances in Neural Information Processing Systems (pp. 2004-2012).

[6] Klein, A., Falkner, S., Bartels, S., Hennig, P., & Hutter, F. (2016). Fast Bayesian optimization of machine learning hyperparameters on large datasets. arXiv preprint arXiv:1605.07079.
