A Review On Automated Machine Learning (AutoML) Systems
Abstract—Automated Machine Learning is a research area which has gained a lot of focus in the recent past. However, the various approaches followed by researchers and what has been disclosed by the available work are neither properly documented nor very clear, due to the differences in the approaches. If the existing work is analyzed and brought under a common evaluation criterion, it will assist continuing research. This paper presents an analysis of the existing work in the domains of autoML, hyperparameter tuning and meta-learning. The strengths and drawbacks of the various approaches are explored and reviewed in terms of the algorithms supported, features and implementations. This paper is a result of the initial phase of ongoing research, and in the future we hope to make use of this knowledge to create a design that will fill the gaps and missing links identified.

Keywords—autoML, hyperparameter, automation, AI

I. INTRODUCTION
This section introduces the domain ‘Automated Machine Learning’ with some background information, the identified problem in the domain and the necessity of addressing the problem.

A. Background
Machine learning is defined as the “field of study that gives computers the ability to learn without being explicitly programmed” [1]. Though machine learning makes it possible for computers to learn without human interaction, human effort is required when machine learning systems are being built [2]. At a low level, machine learning systems are just statistical modelling algorithms, or ensembles of algorithms, that are developed to work well in a specific use case. A machine learning model designed to tell cats from dogs will not work well at detecting human age from a picture. The statistical model built for one use case was developed by choosing an algorithm from many possibilities and was tweaked as necessary so that it would work well, but only on that particular dataset. So there is a need to create a separate statistical model for each and every dataset we come across, which makes the machine learning process inefficient.
According to a KDNuggets poll conducted in 2015 [3], most expert-level predictive data science tasks will be automated within the next ten years. This is supported by the fact that many domain-specific tools (e.g., Google Prediction API, Google CloudML, AzureML, BigML, Dataiku, DataRobot, KNIME, and RapidMiner) have surfaced in the recent past to automate some parts of the machine learning pipeline, while much research is underway to build tools that will completely automate the process. AutoML is a term coined in 2015 by ChaLearn [4] to refer to this set of tools and research. AutoML is defined as “software capable of being trained and tested without human intervention”.

B. Problem & Motivation
Designing an effective learning model is often tedious and only done well by experts with deep knowledge of machine learning algorithms and domain expertise. Most others just choose an algorithm by intuition and go ahead with whatever default values the learning tool assigns to its parameters. Sometimes they use brute-force techniques to find the best-performing model, but with infinite possibilities of algorithms and parameters, this becomes highly infeasible [5]. So even though the definition of machine learning makes it seem like no programming will be required, in reality it demands long hours of tedious programming to build a learning model.
Machine learning is arguably the center of Artificial Intelligence, with many disciplines relying on it [6]. But the lack of experts in machine learning makes the development of AI constrained and slow-paced. An off-the-shelf solution that even novice developers can use to build learning models for any given use case will be a huge turning point in the AI world.

II. STUDY SETUP
We started by gaining the necessary domain knowledge through a literature survey. We identified three domains that are linked with automated statistical modelling:
1. AutoML
2. Hyperparameter Tuning
3. Meta-learning

We came across 28 primary studies under these domains and identified the different types of approaches used, which are discussed in Section IV. We identified two main approaches, fully automated and semi-automated, and explored these further. We identified six main solutions developed by researchers, with different pros and cons, as discussed in Section VI.

III. AUTO-ML
The umbrella term AutoML, coined from ‘Automated Machine Learning’ [4], refers to the large-scale automation of a wide spectrum of the machine learning process beyond traditional model creation, such as data pre-processing, meta-learning, feature learning, model searching, hyperparameter optimization, classification/regression, workflow generation, data acquisition and reporting. These black-box learning machines gained popularity after
Authorized licensed use limited to: ANII. Downloaded on January 04,2021 at 09:02:03 UTC from IEEE Xplore. Restrictions apply.
ChaLearn initiated AutoML Competitions [4] in 2015. Started as a ‘benchmark for automated machine learning systems that can be operated without any human intervention’, the challenge focused on automating hyperparameter tuning and model selection for classification learning.
Even though many promising systems emerged from these competitions, and commercial-level AutoML systems have recently been introduced by Google [7] and H2O.ai [8], the majority of the concepts and research are at a very early stage. Researchers have used a variety of statistical theories, such as regularization, Bayesian priors, Minimum Description Length (MDL), Structural Risk Minimization (SRM), the bias/variance tradeoff and genetic programming, to tackle the autoML problem, but have ended up with very distinct and inconsistent results. Further research is required to find the best-suited techniques that are generic and work consistently.
Two main concepts in this research area, hyperparameter tuning and meta-learning, are analyzed below.

A. Hyper-parameter tuning
Machine learning algorithms are in fact statistical functions aimed at minimizing some variant of a cost function. The cost function depends on a set of parameters, called hyperparameters, that make up the algorithm. To get an efficient learning model for a specific dataset, the parameters need to be set to the optimal values specific to that dataset. Until recently, hand-tuning of these parameters by domain experts was the only way to find the parameter setting [9]. Since the hyperparameters can take any value, choosing the optimal one becomes extremely tedious. Recently, however, hyperparameter optimization techniques such as Regression Trees, Gaussian Processes and density estimation have been used to automate this tuning process.
The concept of hyperparameter optimization originated in neural networks, as there can be an overwhelming number of parameters in a neural network. Later these concepts were needed even in machine learning domains with few hyperparameters, as datasets tend to become too large to hand-tune. Bayesian optimization has emerged as a successful candidate for hyperparameter tuning. It builds a probabilistic model that captures the relationship between hyperparameter settings and their performance, and uses this model to iteratively choose the most promising settings, evaluate them and update the model. Techniques like random search and grid search [10] are used underneath these approaches.

B. Meta-learning
Meta-learning is yet another concept gaining popularity in recent times. Each model trained on a dataset contributes to the understanding of the data. Even if a model performed poorly, that says something about the dataset. These results, combined, can create a knowledge system that can be used on similar datasets. This is the concept behind meta-learning. Meta-learning uses attributes like dataset size, the number of features, and various aspects of the features, along with the performance data. It is used to find good instantiations of learning models from the knowledge of previous tasks [11]. When this data, generated by research all over the world, is stored and continuously updated in a public repository, it creates an invaluable knowledge ecosystem that can help build an effective machine learning workflow.

IV. EXISTING WORK
The existing work on autoML systems is presented in this section.

There has been substantial research interest in autoML systems in the recent past. Thornton et al. [12] were among the earliest researchers to propose automated hyperparameter tuning. They used Sequential Model-based Bayesian Optimization to achieve automation in parameter tuning. Several researchers followed this initial work and improved the concept further.

Fig. 1. Typical design of an autoML system

We can categorize the approaches used into two main types:

1) Automated approaches
2) Semi-automated approaches

A. Automated approaches
Automated approaches try to completely automate the machine learning process. Even though this is the ultimate goal of all this research, complete automation has proved to be a very challenging task even with current technological advancements. We identified four studies which have achieved varied success in complete automation and analyzed their work.

Auto-WEKA (2012) [12] is the earliest research involved in automating the learning process. Its authors formulated the problem as a formal study area dubbed the ‘Combined Algorithm Selection and Hyperparameter optimization’ (CASH) problem. They came up with a concept to optimize empirical performance by automatically choosing an algorithm from the WEKA Java package, together with its hyperparameters, for a given dataset. Auto-WEKA used Bayesian Optimization techniques, in particular Sequential Model-based Optimization, which can work with both categorical and continuous hyperparameters. It iteratively models the dependence of the cross-validation loss on the hyperparameter setting, uses this model to choose the best candidate configuration of hyperparameters, and updates the model with the new data point obtained. In 2017 Auto-WEKA 2.0 [13] was released, improving on the first package by adding support for regression algorithms and parallel runs. It also considered tree-based Bayesian optimization methods, as they yielded more promising results than the predecessor. In the comparisons below we consider Auto-WEKA and Auto-WEKA 2.0 to be the same.
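As an illustration only, the sequential model-based loop described for the CASH problem can be sketched in a few lines of Python. This is not Auto-WEKA's implementation: the two algorithm names, the toy `cv_loss` function standing in for a real cross-validation run, the mixed categorical/continuous `distance`, and the crude inverse-distance surrogate are all assumptions made for this sketch, chosen so that it runs with the standard library alone.

```python
import random

ALGORITHMS = ("svm", "forest")  # hypothetical algorithm names


def cv_loss(algorithm, x):
    """Toy stand-in for the cross-validation loss of one configuration."""
    if algorithm == "svm":
        return (x - 0.3) ** 2 + 0.10
    return (x - 0.7) ** 2 + 0.05


def distance(a, b):
    """Distance over a mixed space: categorical algorithm plus continuous x."""
    return (0.0 if a[0] == b[0] else 1.0) + abs(a[1] - b[1])


def surrogate(config, history):
    """Crude surrogate (an assumption, not a Bayesian model): returns an
    inverse-distance-weighted mean loss and, as an uncertainty proxy,
    the distance to the nearest evaluated configuration."""
    weight_sum = 0.0
    weighted_loss = 0.0
    nearest = float("inf")
    for past_config, loss in history:
        d = distance(config, past_config)
        nearest = min(nearest, d)
        w = 1.0 / (d + 1e-6)
        weight_sum += w
        weighted_loss += w * loss
    return weighted_loss / weight_sum, nearest


def smbo(n_init=4, n_iter=20, n_candidates=200, kappa=0.5, seed=0):
    rng = random.Random(seed)

    def sample():
        # One configuration = (algorithm choice, one hyperparameter in [0, 1)).
        return (rng.choice(ALGORITHMS), rng.random())

    # 1) Initial design: evaluate a few random configurations.
    history = []
    for _ in range(n_init):
        config = sample()
        history.append((config, cv_loss(*config)))

    for _ in range(n_iter):
        # 2) Score many cheap candidates with the surrogate.
        def acquisition(config):
            mean, uncertainty = surrogate(config, history)
            return mean - kappa * uncertainty  # lower is more promising

        candidates = [sample() for _ in range(n_candidates)]
        chosen = min(candidates, key=acquisition)
        # 3) Run the expensive evaluation and update the history.
        history.append((chosen, cv_loss(*chosen)))

    best_config, best_loss = min(history, key=lambda item: item[1])
    return best_config, best_loss, history


best_config, best_loss, history = smbo()
print(best_config, round(best_loss, 4))
```

The point of the sketch is the shape of the loop: only one expensive evaluation is spent per iteration, while the surrogate is queried many times to decide where that evaluation should go. A real system would replace `cv_loss` with model training and the surrogate with a proper probabilistic model.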
Fig. 4. AUTO-SKLEARN Configuration Space
1. Datatype Identifier & Data Splitter
2. Feature Selector & Stacker
3. Model & Hyper-parameter Selector

The system was even capable of automatically identifying the type of machine learning required (if a text dataset is given, it deploys Natural Language Processing algorithms) and of avoiding over-fitting. On the downside, it was too focused on a specific use case, making it hard to use as a generic solution, and it worked only on tabular datasets.

Fig. 6. AutoCompete Components

PennAI (2017) [18] is the first system to have commercial appeal in the autoML landscape. Because fully automated techniques are at a very early stage, building a proper product meant that the system could not be fully automated. PennAI was introduced as a learning assistant that will not replace data scientists, but rather help them find the best models. While continuing the exploration of genetic programming, it was solely focused on the healthcare and biomedical domains. PennAI also had a very systematic and defined workflow, with human involvement (dubbed the Human Engine) being an important part of the workflow. A new feature (dubbed the Knowledge Base), which stores the models created in previous operations by different users and recommends them for new operations, was introduced along with a user-friendly Graphical User Interface. Though the product had commercial appeal, it supported only a select few algorithms in the scikit-learn package.

V. KEY COMPONENTS REQUIRED
This section covers the key components involved in developing the system, regardless of the approach, as identified through the literature survey. A typical autoML system will contain 4 important components overseeing 4 important tasks of the automated workflow. A more commercially oriented system can have more than these 4 components. The important components that were identified are as follows.

A. Preprocessing Engine
Preprocessing the dataset is the very first operation done. It is important because the input dataset can vary and can have many discrepancies. The preprocessing engine takes care of tidying up the data and performing a few transformations so that the subsequent parts of the workflow can run smoothly. Normalization, feature standardization and missing-value patching are some common preprocessing steps done regardless of data type. More focused preprocessing can be done with dimensionality reduction, grouping of modalities for categorical variables, discretization and nonlinear transformation (e.g. log transformation). There can also be datatype-specific preprocessing, like punctuation removal in Natural Language Processing related learning.

B. Feature Engine
The next step is to identify and engineer the features of the dataset. Getting a proper feature set hugely influences the success of the learning model. Common operations under this engine are feature extraction, feature selection, dimensionality reduction, linear manifold transformations (e.g. Principal Component Analysis, ICA) and clustering (e.g. K-means) for unsupervised learning. More specific and customized feature engine operations can be done with the embedded feature learning of the algorithms and with non-linear dimensionality reduction techniques like KPCA, MDS, LLE and Laplacian Eigenmaps.

C. Predictor Engine
The predictor operation is the most important component of an autoML system. This creates the machine learning model
or the predictor function which will be trained and evaluated in the automated process. The ultimate goal of this engine is to find the best candidates of hyperparameters and learning algorithms to be passed to the next engine. In all the previous works a fully functional package was selected as the underlying layer of the predictor engine. For example, Auto-WEKA uses WEKA, a Java package, and AUTO-SKLEARN uses Scikit-Learn, a Python package. Neural nets and Naive Bayes optimizations as predictor models, with the logistic loss function as the predictor function, are the most used and most successful in the existing systems so far. Alternatively, researchers have also used genetic programming, ensembles of decision trees, linear methods, two-norm/one-norm regularization and nearest neighbors for classification learning.
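To make the role of the logistic loss concrete, the following minimal sketch ranks candidate predictors by their mean logistic loss on held-out labels, which is one plausible way for a predictor engine to decide which candidates to pass on. The function names, the candidate names and the small validation set are hypothetical and are not taken from any of the surveyed systems.

```python
import math


def logistic_loss(y_true, p_pred, eps=1e-12):
    """Mean logistic loss for binary labels in {0, 1} and predicted
    probabilities of the positive class."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def rank_candidates(y_true, predictions):
    """Return candidate names sorted from lowest to highest loss."""
    return sorted(
        predictions,
        key=lambda name: logistic_loss(y_true, predictions[name]),
    )


# Hypothetical validation labels and candidate outputs.
y_valid = [1, 0, 1]
predictions = {
    "candidate_a": [0.9, 0.1, 0.8],
    "candidate_b": [0.6, 0.4, 0.5],
}
ranking = rank_candidates(y_valid, predictions)
print(ranking)
```

Here `candidate_a` is ranked first because its probabilities are both correct and confident. A candidate that always predicts 0.5 would score log 2 regardless of the labels, which gives a convenient baseline.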
Fig. 7. PennAI Components

D. Model Selection and Ensemble Engine
Here the best-suited prediction algorithm is chosen from a pool of candidates from the Predictor Engine. These candidates can be infinitely many or can be
refined by processes like meta-learning to a handful of the best-suited ones. For model selection, techniques like cross-validation and leaderboards were used. In particular, K-fold and leave-one-out were used for cross-validation, and boosting, out-of-bag estimation and other bagging techniques were used for ensembling. In addition, bi-level optimization and knowledge transfer from one engine to the next were widely used to make an effective model selection.
In a commercially oriented autoML system there can be a few other components worth mentioning. For example, PennAI introduced engines like the Human Engine, Knowledge Base, Visualization Engine and GUI Engine. Though these are interesting concepts for making a user-friendly autoML system, they do not contribute much to the novel research of autoML.

VI. ANALYSIS OF EXISTING WORK
In this section we analyze the six main solutions under two categories: 1) the learning algorithms supported and 2) features and characteristics.

A. Analysis of Algorithms
Table I presents the comparison of the machine learning algorithms supported in the available work.

TABLE I. LEARNING ALGORITHMS SUPPORTED BY AUTO-ML SYSTEMS

Algorithm                      # of HP   Total
Regression Learners
Gradient Boosting              6         2
Linear Regression              3         2
Random Forest                  7         2
Decision Stump                 0         1
Decision Table                 4         1
Decision Tree                  4         1
ElasticNet                               1
Gaussian Processes             10        1
IBk                            5         1
k-Nearest Neighbors            3         1
KStar                          3         1
Lasso                                    1
Logistic Regression            1         1
Multilayer Perceptron          8         1
Random Tree                    11        1
REPTree                        6         1
Ridge                                    1
SGD                            5         1
Simple Linear Regression       0         1
SMOreg                         13        1
Support Vector Machine         4         1
Support Vector Regression                1
ZeroR                          0         1
Classification Learners
Random Forest                  7         6
k-Nearest Neighbors            3         5
Logistic                       1         4
Decision Tree                  4         3
Gradient Boosting              6         3
Naïve Bayes Multinomial        2         3
Support Vector Machine         4         3
Gaussian Processes             10        2
kernel SVM                     7         2
Linear Regression              3         2
Naïve Bayes                    2         2
SGD                            5         2
AdaBoost                       4         1
BayesNet                       2         1
Bernoulli Naïve Bayes          2         1
Decision Stump                 0         1
Decision Table                 4         1
ExtraTrees                     8         1
Extremely Random Trees         5         1
IBk                            5         1
J48                            9         1
JRip                           4         1
KStar                          3         1
LDA                            4         1
LMT                            9         1
M5P                            4         1
M5Rules                        4         1
Multilayer Perceptron          8         1
OneR                           1         1
PART                           4         1
Passive Aggressive             3         1
QDA                            2         1
Random Tree                    11        1
REPTree                        6         1
Ridge Classifier                         1
Simple Linear Regression       0         1
Simple Logistic                5         1
SMO                            11        1
SMOreg                         13        1
SVC                            23        1
Voted Perceptron               3         1
XGBoost                                  1
ZeroR                          0         1

Total Algorithms Supported, per system. Automated: Auto-WEKA 41, Hyperopt-Sklearn 6, AUTO-SKLEARN 15, TPOT 5. Semi-automated: Auto-Compete 14, PennAI 13.

Here ‘HP’ means hyperparameters, and ‘Total’ is the number of the six systems in which the algorithm is available.

B. Features and Behaviours
Table II presents the comparison of various features and behaviours found in the available work.
TABLE II. CHARACTERISTICS OF AUTO-ML SYSTEMS

Feature              Auto-WEKA  Hyperopt-Sklearn  AUTO-SKLEARN  TPOT                 AutoCompete  PennAI
Language             Java       Python            Python        Python               Python       Python
Algorithm Source     WEKA       Scikit-learn      Scikit-learn  Scikit-learn         -            Scikit-learn
Predictor Algorithm  Bayesian   Hyperopt          Bayesian      Genetic Programming  Grid-search  Genetic Programming
Ensembles            0          0                 1             1                    0            0
Meta Learning        0          0                 0             1                    0            1

VII. WHAT IS MISSING IN EXISTING WORK
In this section we look into the findings of the survey, focusing on what has been missing and on the common issues noted in the existing systems.
Functional end products: So far the work in this domain falls into two groups in terms of quality: fully functional research products, and partly functional but proper end products. Since the results from fully automated products show inconsistency, only the semi-automated products have been used widely by public users. A fully automated, industry-standard product with a user-friendly interface is still missing in this domain.
Accessible knowledge hub: Meta-learning techniques in autoML systems rely on knowledge of previous similar tasks, but there are no collaborative efforts to create such libraries of statistical models. Following the initiatives that made open datasets available, having a library of statistical models will help in advancing autoML systems.
Python-centric research: So far, except for Auto-WEKA, all research has been centered around Python. Though this is not a concern in itself, the R language has been rising in popularity in recent times for data science research. Research in R, a language designed solely for statistical computing, will help explore avenues otherwise unexplored.
Using Neural Networks: Deep Neural Networks and Deep Belief Networks have become feasible in recent times with increasing computational power. Tools like TensorFlow have been helping AI research, and the AutoML problem makes a good domain in which to try such advanced neural technologies.

VIII. CONCLUSION
Analyzing the results from the existing solutions, the following conclusions were derived. The research area of autoML was formulated very recently, and more initiatives will be needed before we get a fully automated, industry-standard system. Even though several promising systems have been developed, they are centered on a specific domain or use case and are not suitable for use as a generic solution. By using ensembling and meta-learning, the problem of automated hyperparameter tuning can be tackled efficiently. Technologies and statistical concepts not yet explored in autoML systems will make up the majority of future efforts, while the knowledge from previous efforts needs to be accumulated in knowledge hubs.
With this knowledge about the existing work, the drawbacks of the available systems, and how they can be improved, we hope to come up with an architectural style in the near future, towards an efficient automated machine learning system.

REFERENCES
[1] P. Simon, “Too Big to Ignore: The Business Case for Big Data,” p. 25, 2013.
[2] J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, “Algorithms for Hyper-Parameter Optimization,” p. 9.
[3] I. Guyon et al., “A brief Review of the ChaLearn AutoML Challenge,” p. 10, 2016.
[4] I. Guyon et al., “Design of the 2015 ChaLearn AutoML challenge,” 2015, pp. 1–8.
[5] F. Hutter, H. H. Hoos, and K. Leyton-Brown, “Sequential Model-Based Optimization for General Algorithm Configuration,” in Learning and Intelligent Optimization, vol. 6683, C. A. C. Coello, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 507–523.
[6] R. S. Olson et al., “A System for Accessible Artificial Intelligence,” arXiv:1705.00594 [cs], May 2017.
[7] Q. Le and B. Zoph, “Using Machine Learning to Explore Neural Network Architecture,” 17-May-2017.
[8] H2O.ai, “AutoML: Automatic Machine Learning.”
[9] R. Bardenet, M. Brendel, B. Kégl, and M. Sebag, “Collaborative hyperparameter tuning,” p. 9.
[10] K. Swersky, J. Snoek, and R. P. Adams, “Freeze-Thaw Bayesian Optimization,” arXiv:1406.3896 [cs, stat], Jun. 2014.
[11] M. Feurer, J. T. Springenberg, and F. Hutter, “Initializing Bayesian Hyperparameter Optimization via Meta-Learning,” p. 8.
[12] C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown, “Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms,” arXiv:1208.3719 [cs], Aug. 2012.
[13] L. Kotthoff, C. Thornton, H. H. Hoos, F. Hutter, and K. Leyton-Brown, “Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA,” p. 5.
[14] B. Komer, J. Bergstra, and C. Eliasmith, “Hyperopt-Sklearn: Automatic Hyperparameter Configuration for Scikit-Learn,” p. 7, 2014.
[15] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” arXiv:1201.0490 [cs], Jan. 2012.
[16] M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter, “Efficient and Robust Automated Machine Learning,” p. 9.
[17] R. S. Olson, “TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning,” p. 9.
[18] A. Thakur and A. Krohn-Grimberghe, “AutoCompete: A Framework for Machine Learning Competition,” arXiv:1507.02188 [cs, stat], Jul. 2015.