
protocol

A framework for parameter estimation and model


selection from experimental data in systems biology
using approximate Bayesian computation
Juliane Liepe1, Paul Kirk1, Sarah Filippi1, Tina Toni1, Chris P Barnes2 & Michael P H Stumpf1,3
1Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College, London, UK. 2Department of Cell and Developmental Biology,
University College, London, UK. 3Institute of Chemical Biology, Imperial College, London, UK. Correspondence should be addressed to M.P.H.S. ([email protected]).

Published online 23 January 2014; doi:10.1038/nprot.2014.025

As modeling becomes a more widespread practice in the life sciences and biomedical sciences, researchers need reliable tools
to calibrate models against ever more complex and detailed data. Here we present an approximate Bayesian computation (ABC)
framework and software environment, ABC-SysBio, which is a Python package that runs on Linux and Mac OS X systems and that
enables parameter estimation and model selection in the Bayesian formalism by using sequential Monte Carlo (SMC) approaches.
We outline the underlying rationale, discuss the computational and practical issues and provide detailed guidance as to how the
important tasks of parameter inference and model selection can be performed in practice. Unlike other available packages,
ABC-SysBio is highly suited for investigating, in particular, the challenging problem of fitting stochastic models to data. In order
to demonstrate the use of ABC-SysBio, in this protocol we postulate the existence of an imaginary reaction network composed
of seven interrelated biological reactions (involving a specific mRNA, the protein it encodes and a post-translationally modified
version of the protein), a network that is defined by two files containing ‘observed’ data that we provide as supplementary
information. In the first part of the PROCEDURE, ABC-SysBio is used to infer the parameters of this system, whereas in the second
part we use ABC-SysBio’s relevant functionality to discriminate between two different reaction network models, one of them being
the ‘true’ one. Although computationally expensive, the additional insights gained in the Bayesian formalism more than make up
for this cost, especially in complex problems.

INTRODUCTION
Experimental data and mathematical models are beginning to take equal billing in systems biology. Experimental observations without a framework in which to link them offer researchers only limited insights into how biological systems work. Equally, mathematical analysis without concrete grounding in, and immediate relevance to, experimental observations risks being biologically irrelevant. Here we adopt a very flexible notion of what constitutes a system, and we merely assume that we have quantitative (e.g., proteomics, transcriptomics or metabolomics) data concerning the change over time in the abundances or concentrations of a number of different molecular species; signal transduction and stress response pathways and gene expression regulatory circuits naturally fall under this loose definition, as do metabolic pathways and combinations thereof.

Models summarize our understanding of biological mechanisms in an equally convenient and precise form; they enable us to make predictions that test our understanding; and they model those aspects of a system that are not directly accessible to experimental observation. In the analysis of gene expression dynamics, for example, proteomic and transcriptomic data are rarely measured together and, if they are, not always at the same time points. Models thus provide the context in which data are best interpreted, and the function of biological systems is understood.

Deriving models from data
Linking models and data, however, remains a formidable challenge. Even when a plausible, perhaps even ‘almost correct’, model is available, researchers require numerical values for all the mathematical parameters that describe the behavior of the mathematical system. In addition, suitably parameterized models are few and far between.

Two schools of thought can be distinguished. The first traditional approach is to collect parameter values from the literature and plug these values into the mathematical equations making up the model. The second approach places the experimental data at the heart of the analysis and seeks to infer the parameters from the available observations1,2. A host of different approaches, or inferential procedures, have been proposed in the literature and used in practice3. Statistical inference typically tries to obtain the best estimates of the reaction rates, as well as their respective uncertainties (Fig. 1). Optimization-based frameworks contend with the best value. A common method is to specify an objective function that quantifies the discrepancy between the experimental data and the model’s predictions, and then to search through parameter combinations in order to minimize this discrepancy4. A broad range of optimization algorithms exists, which provide a variety of different (often heuristic) methods for performing this search and thereby identifying the best parameter set. If such an optimization approach is adopted, a key consideration is to avoid overfitting the data (i.e., fitting the noise). Another concern is the problem of local optima, which means that there will often be many parameter combinations that provide locally optimal fits, but determining whether or not they are truly the best parameters (or if, alternatively, we could have found better ones by performing a more thorough search) is typically very challenging. Finally, optimization approaches must always be concerned with the robustness of the parameter estimates and the confidence that is placed in them. Bootstrapping5 and data subsampling approaches provide a class of (computationally intensive) methods for robustness quantification by generating a collection of new data sets from the initial set of observations and then assessing the variability in the parameter estimates obtained across this collection.
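To make the optimization view concrete, the short Python sketch below fits a one-parameter exponential-decay model to simulated data by minimizing a sum-of-squares objective; the model, the synthetic data and the choice of optimizer are illustrative assumptions and are not part of the protocol.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Illustrative 'observed' data: a noisy exponential decay with true rate 0.8.
t_obs = np.linspace(0.0, 5.0, 20)
y_obs = 10.0 * np.exp(-0.8 * t_obs) + rng.normal(0.0, 0.3, t_obs.size)

def objective(k):
    # Sum-of-squares discrepancy between the model prediction and the data for decay rate k.
    y_model = 10.0 * np.exp(-k[0] * t_obs)
    return np.sum((y_model - y_obs) ** 2)

best = minimize(objective, x0=[0.1], method="Nelder-Mead")
print("best-fit decay rate:", best.x[0])

An optimizer of this kind returns a single point estimate; the Bayesian treatment discussed below instead characterizes all parameter values that are compatible with the data.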

Figure 1 | Data and posteriors. The aim of Bayesian inference is to infer parameters that have high or appreciable probability of having generated some observed data (red dots in the left image). If a model has two parameters, θ1 and θ2, then our aim is to obtain the joint distribution over both parameters, indicated by the contour diagram in the right image. Please note that in the right image, the darker the color of the contour, the higher the posterior probability density. The two simulated trajectories in the left image correspond to two different parameter combinations. The parameter combination associated with the thicker trajectory (which provides the better explanation of the observed data) is in a region of high posterior density, whereas the parameter combination of the thinner trajectory is located in a region of lower posterior density. Often, as here, the joint distribution will differ from the product of the (marginal) distributions of the individual parameters (histograms at the top and right of the contour plot): statistical dependence between the two parameters means that their joint posterior distribution is not simply the product of the individual or marginal parameter posteriors. Secrier et al.64 discuss a range of such examples. (Left panel axes: time versus model output; right panel axes: θ1 versus θ2.)

Bayesian inference for model calibration
Although the best parameter value is of obvious interest, so too is an assessment of how much uncertainty there is in the estimate. As an alternative to heuristic optimization approaches, Bayesian inference has gained attention in recent years as a flexible and formally coherent way in which to approach the problem of model calibration1,6,7. Bayesian approaches provide an opportunity to specify any prior beliefs or information that we have about the unknown parameters (which may, for example, have been obtained through previous experimentation), while also (i) automatically avoiding the problem of overfitting and (ii) providing assessments of confidence by assessing the uncertainty that remains in the unknown parameters. Although the problem of adequately exploring the space of parameter combinations remains, and it must be carefully considered, methods for Bayesian inference typically take great pains to address these concerns. The key object of interest when performing Bayesian parameter inference is the posterior distribution. This distribution describes the uncertainty that remains in the parameters after observing the data, and it is obtained via Bayes rule in a manner that combines our prior beliefs (the beliefs we had regarding the parameters before performing the current experiments) with an assessment of the fit provided to the observed data. Formally, we usually write this relationship as8

p(θ | D) ∝ L(θ | D) p(θ),

where θ denotes the vector of parameters, D is the observed data and L is the likelihood function, or, in words, as

posterior ∝ likelihood × prior.

Here, the prior is a distribution that formally expresses the information or beliefs that we have about the parameters before we performed the current experiment, whereas the likelihood is a function of the parameters that describes in a formal, probabilistic manner how well each parameter explains the observed data.

The prior distribution clearly has an important role in Bayesian inference, providing an opportunity to express the beliefs we have regarding the parameters before data set D is obtained. Exactly how researchers should elicit and specify priors is a highly debated issue that is largely beyond the scope of the present article, and we refer the interested reader to the literature9–12. As the use of objective priors (i.e., vague priors, such as maximum entropy and Jeffreys priors, specified according to mathematical principles, rather than according to the subjective prior belief of the investigator conducting the analysis) has received some criticism13, we would recommend biophysically motivated priors (i.e., priors that genuinely reflect the researcher’s knowledge of any biophysical constraints) wherever possible. When the use of biophysically motivated priors is not possible, it is advisable to explore the influence of prior choice explicitly, as done in, for instance, Toni et al.14.

The likelihood is typically defined by a parametric probability model, p(D|θ), for the data, such that L(θ|D) is given by considering p(D|θ) as a function of the parameters θ (with D, the observed data set, fixed). In contrast to maximum likelihood approaches, which treat the likelihood as an objective function and use optimization approaches to search for the single best parameter vector that maximizes p(D|θ), Bayesian approaches are concerned with elucidating (or, at least, obtaining samples from) the posterior distribution of parameter vectors p(θ|D). It is usually impossible to write down an expression for the posterior distribution analytically; in these cases, it is necessary to use computational approaches, such as Markov chain Monte Carlo (MCMC) techniques15.

It is worth reiterating that Bayesian inference attempts to assess the probability of a parameter to be the correct parameter given the data8; this naturally includes an assessment of the uncertainty of the inference, as all parameter values that have finite probability to have generated the data are the target of the inference procedure. This uncertainty, it has turned out, can have a pivotal role in the analysis of a system’s dynamics16–18, and appreciation of this uncertainty yields direct insights into the degree to which the behavior predicted by the model is robust to changes to the parameters, especially when the distribution over the different reaction rates is considered jointly (Fig. 1). Although they come at computational expense, the insights gained from considering this joint distribution over parameters may outweigh these costs.

When analyzing data in the context of a mathematical model, researchers always ought to calibrate the model against the available data (i.e., to estimate parameters from the data directly). Relying on parameter values obtained independently, such as from the literature, is fraught with potential problems, as biochemical reaction rates can vary between different conditions (e.g., as a function of temperature, ambient pH or changes due to factors not explicitly modeled).
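For readers who prefer to see the relationship posterior ∝ likelihood × prior in code, the toy calculation below evaluates an unnormalized posterior for a single Gaussian mean on a grid; the data, prior and likelihood are illustrative choices and have nothing to do with the protocol's reaction model.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=20)            # illustrative observations

theta = np.linspace(-5.0, 5.0, 1001)            # candidate values of the unknown mean
prior = norm.pdf(theta, loc=0.0, scale=3.0)     # prior belief about the mean
likelihood = np.array([norm.pdf(data, loc=t, scale=1.0).prod() for t in theta])
posterior = prior * likelihood                  # posterior is proportional to likelihood times prior
posterior /= np.trapz(posterior, theta)         # normalize so the density integrates to 1

print("posterior mean:", np.trapz(theta * posterior, theta))

In realistic settings such exhaustive grid evaluations are impossible, which is why sampling methods such as MCMC, or the likelihood-free approaches described below, are needed.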

Box 1 | The ABC-SMC algorithm

ABC-SMC as used by ABC-SysBio attempts to find an approximation to the true posterior in a sequential manner27. To this end, a set of intermediate distributions, also known as populations, is constructed, where for each population t all accepted particles give rise to simulated data D* that differ from the true experimental data D by at most a distance d(D*, D) < εt. This approach requires a sequence of decreasing thresholds or tolerances, ε1 > ε2 > ... > εT (see the accompanying figure), with the final tolerance εT setting the desired final agreement between real and simulated data. Successive populations are generated from the previous population (or from the prior, π(θ), if t = 1) by using a sequential importance sampling scheme, by perturbing particles using an appropriate so-called perturbation kernel, to ensure that the parameter space is explored sufficiently well. Each accepted particle has an associated weight, and in ABC-SysBio we require a fixed number of particles in each population. The choice of the kernel and the sequence for εt can affect the speed of the algorithm.

(Box figure: the sequence of intermediate distributions πt(θ | d(D*, D) < εt), t = 1, ..., T, starting from the prior π(θ). At each population t: (1) sample from the proposal ηt(θt) = ∫ πt−1(θt−1) Kt(θt−1, θt) dθt−1, where Kt(θt−1, θt) is a Markov perturbation kernel; (2) accept the particle if d(D*, D) < εt; (3) assign the particle the weight wt(θt) = πt(θt)/ηt(θt).)
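The following is a minimal, self-contained sketch of the ABC-SMC scheme in Box 1 for a toy one-parameter problem. It is not the ABC-SysBio implementation: the Gaussian toy model, the tolerance schedule and the fixed-width normal perturbation kernel are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(3.0, 1.0, size=50)            # toy 'observed' data with unknown mean

def simulate(mu):
    return rng.normal(mu, 1.0, size=data.size)  # forward model for a candidate parameter

def distance(sim, obs):
    return abs(sim.mean() - obs.mean())         # discrepancy between simulated and observed data

def prior_pdf(mu):
    return 1.0 / 20.0 if -10.0 <= mu <= 10.0 else 0.0   # uniform prior on [-10, 10]

n_particles = 200
epsilons = [2.0, 1.0, 0.5, 0.2, 0.1]            # decreasing tolerance schedule
sigma_kernel = 0.5                              # width of the normal perturbation kernel
particles, weights = None, None

for t, eps in enumerate(epsilons):
    new_particles, new_weights = [], []
    while len(new_particles) < n_particles:
        if t == 0:
            theta = rng.uniform(-10.0, 10.0)    # population 1 is sampled from the prior
        else:
            idx = rng.choice(n_particles, p=weights)                 # resample from previous population
            theta = particles[idx] + rng.normal(0.0, sigma_kernel)   # perturb the particle
        if prior_pdf(theta) == 0.0 or distance(simulate(theta), data) >= eps:
            continue                            # reject particles outside the prior or the tolerance
        if t == 0:
            w = 1.0
        else:
            # Importance weight: prior density divided by the mixture proposal density.
            # The kernel's normalizing constant is omitted; it cancels when weights are normalized.
            kernel = np.exp(-0.5 * ((theta - particles) / sigma_kernel) ** 2)
            w = prior_pdf(theta) / np.sum(weights * kernel)
        new_particles.append(theta)
        new_weights.append(w)
    particles = np.array(new_particles)
    weights = np.array(new_weights)
    weights /= weights.sum()
    print(f"population {t + 1}: eps = {eps}, weighted mean = {np.average(particles, weights=weights):.2f}")

ABC-SysBio adds to this skeleton model selection, adaptive tolerances and kernels, and simulators for ODE, SDE and Markov jump process models.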
Approximate Bayesian computation
A great deal of recent research has considered situations in which it is impossible to write down an expression for the likelihood L(θ|D), but it is nevertheless possible to simulate data from our model. Such ‘likelihood-free’ approaches have become known as ABC approaches19.

The simplest ABC approach, ABC rejection20,21, proceeds by: (i) sampling a parameter vector, θ*, from the prior distribution; (ii) plugging θ* into the model and running a simulation to generate a synthetic data set D*; (iii) using a distance function, d, to quantify the discrepancy between D* and the observed data D; and (iv) accepting θ* if the distance, d(D, D*), between D and D* is less than some threshold value ε. This process may be repeated many times in order to obtain a collection of accepted parameter vectors. It is important to note that if noise is present in the observed data set D then, to avoid introducing biases, it should also be present in the synthetic data set D*, which for known noise characteristics can be straightforwardly incorporated. In some contexts (e.g., when modeling data using ordinary differential equations (ODEs)), simply specifying a model for the measurement noise will imply a likelihood22, but here we are concerned with complex stochastic models for which this is not the case. In the models that we consider, there are components of output uncertainty that are typically much larger than the measurement noise (e.g., in the context of biochemical reaction networks, the times at which reactions occur), and therefore it has become common practice to assume that measurement noise is negligible compared with these other sources of stochasticity23–25.

In the limit, as the threshold value ε tends to zero, the accepted collection of parameter vectors will represent a sample from the posterior distribution p(θ|D). In practice, if ε is set to be too small, the acceptance rate (i.e., the proportion of times we have d(D, D*) < ε) will be unacceptably low. This consideration (and its computational implications) has motivated researchers to introduce a number of sequential approaches26–28, in which a decreasing schedule of ε values is used in such a way that these approaches gradually move from sampling from the prior (when ε is very large) toward sampling from the posterior (as ε tends toward zero). In this protocol, we focus on an ABC algorithm based on SMC approaches (ABC-SMC) introduced by Toni et al.27. An overview of the ABC-SMC algorithm is provided in Box 1.

In cases where the data set, D, is very high-dimensional or has a particularly complicated structure (e.g., if D is from a network), a number of authors have considered comparing summaries of the data, i.e., calculating vectors of statistics, ρ(D) and ρ(D*), for the observed and simulated data sets, and only accepting θ* if d(ρ(D), ρ(D*)) < ε. However, this approach will usually result in some loss of information (which can have negative theoretical and practical consequences), and hence considerable care must be taken to choose appropriate, informative summaries of the data29–33. ABC-SysBio is an efficient and very generally applicable software implementation for performing parameter estimation and model selection within the ABC framework. In ABC-SysBio, we only consider direct comparisons between the observed and simulated data sets, rather than using summaries of the data.

In addition to parameter estimation, ABC approaches can also be used for model ranking and selection34. In this case, we associate a model indicator, m, with each model under consideration, and seek samples from the joint posterior distribution over models and parameters, p(m, θ|D). From these samples, researchers may derive estimates of the marginal posterior probability of a model, p(m|D), which may be used to rank the models of interest. As we discuss in the Limitations section below, the issues mentioned above regarding the use of statistics to summarize the data (which we avoid in ABC-SysBio) are particularly problematic in the context of model selection.

The key strength of ABC approaches is that they can be applied to problems with intractable likelihoods35,36 (for example, complex stochastic models). However, ABC approaches are much more broadly applicable, as they can be used regardless of whether or not it is possible to write down a likelihood function37,38. The only requirement is that researchers must be able to simulate from the models under consideration. This property makes ABC an ideal methodology for software implementation, enabling it to be applied ‘out of the box’ to a broad range of problems.
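As a concrete illustration of steps (i)-(iv), the sketch below runs ABC rejection for a toy problem in which only the mean of the simulated and observed data is compared; the Gaussian model, the uniform prior and the tolerance are illustrative assumptions, not part of the protocol.

import numpy as np

rng = np.random.default_rng(0)
obs = rng.normal(3.0, 1.0, size=50)               # toy 'observed' data set D

def simulate(theta):
    return rng.normal(theta, 1.0, size=obs.size)  # (ii) generate a synthetic data set D*

eps = 0.2                                         # tolerance on the distance
accepted = []
while len(accepted) < 1000:
    theta = rng.uniform(-10.0, 10.0)              # (i) sample theta* from the prior
    d_star = simulate(theta)
    dist = abs(d_star.mean() - obs.mean())        # (iii) distance between D* and D
    if dist < eps:                                # (iv) accept theta* if the distance is below eps
        accepted.append(theta)

print("approximate posterior mean:", np.mean(accepted))

The accepted values approximate the posterior only for small eps, which is exactly the regime in which plain rejection becomes inefficient and sequential schemes such as ABC-SMC (Box 1) pay off.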

Applications and key papers for ABC-SMC and ABC-SysBio
Likelihood-free inference in the form of simple ABC rejection was first introduced in the area of population genetics21,39, and, owing to the size of the models and parameter space, the algorithm was soon extended to adopt more powerful MCMC40 and SMC26 samplers. Since then, there has been an explosion of papers advancing the ABC methodology and its applications (see ref. 41 for a review). As described above, ABC methods can be used for a wide range of applications for fitting models to different types of data; here we restrict the discussion to biological applications with data ranging from gene expression and proteomic time series data to imaging data and protein-protein interaction data. ABC-SysBio was conceived with the aim of solving precisely these types of problems. The parameter estimation algorithm was used to fit the deterministic and stochastic mechanistic models of the phage shock protein stress response in Escherichia coli, which then served to propose novel hypotheses about the stress response system dynamics42. A protein kinase B (Akt) signaling pathway model was among the largest models to which the parameter inference algorithm has been applied to date3. The obtained posterior distribution was used to study in detail the sensitivity and sloppiness of the kinetic parameters affecting Akt signaling. The parameter estimation algorithm was also used to find the parameter region for a model of hes family bHLH transcription factor 1 (Hes1) transcription dynamics, which captures the oscillatory behavior of Hes1 expression levels observed in mouse cell lines43. Oscillatory behavior poses considerable challenges to parameter estimation problems22, and in this study the parameter distribution obtained by ABC served as prior information for another powerful algorithm that can efficiently infer parameters giving rise to oscillatory behavior.

Toni et al.14 used the model selection algorithm to distinguish between several models of the phosphorylation dynamics of the ERK MAP kinase by fitting the models to time series proteomic data. The model selection algorithm was also used to study leukocyte migration in zebrafish embryos in response to injuries44. In this application, model selection was used to distinguish between different models of the chemokine stimulus gradient, and, based on migration trajectories obtained from live imaging data, the model was chosen that best describes the in vivo leukocyte dynamics. This study is a prime example of an application for which ABC is particularly appropriate: here the definition of a likelihood has thus far proved elusive, whereas simulating from these models is possible. Other applications of ABC-SMC (based on ABC-SysBio) have emerged in synthetic biology45, where researchers can use this framework to identify molecular reaction networks that have high (or appreciable) probability of fulfilling a given set of design objectives, such as different switch-like or sensor behaviors. In regenerative medicine and stem-cell biology, a related approach has been used to map out the behavior of hematopoietic stem cells and their progeny in the bone marrow stem cell niche46.

Comparison with other methods
ABC methods fill a gap in the apparatus of statistical inference. Their advantages are two-fold. First, they enable researchers to apply the whole Bayesian formalism, in approximation, to problems that defy conventional statistical inference47. Second, in their wake, we may be able to close such gaps in the applicability of conventional statistical inference either through computational advances or through new developments of, e.g., suitable approximations to the likelihood23,48–51.

The distinct applications and strengths of ABC methods complicate comparison with other methods. Pure ABC packages are typically targeted either at ABC cognoscenti and require the provision of, e.g., simulation routines (typically provided as R or C functions) or at population geneticists, as is the case with DIY-ABC52. In the latter realm, some packages have achieved a level of sophistication that enables non-expert users to study hard problems in population genetics, such as population subdivision and movement between different demes53,54. However, for the practicing systems biologist, packages such as easyABC (http://easyabc.r-forge.r-project.org) lack, for example, the ability to parse mathematical models provided in the Systems Biology Markup Language (SBML) exchange format or the ability to efficiently simulate (e.g., via GPU-support55,56) different models.

In the context of likelihood-based Bayesian inference, several packages exist (typically using MCMC algorithms) for systems modeled by ODEs. These include primarily BioBayes57. Stochastic dynamics, whether modeled using stochastic differential equations (SDEs) or chemical master equation formalisms, incur huge computational costs, and there is a distinct lack of general-purpose software aimed at the systems biology community. Here, however, we see the main use of ABC methods at present. For ODEs, it is possible, and indeed desirable, to use likelihood-based inference, but for many stochastic models ABC-based approaches enable researchers to address inference problems that simply cannot be tackled by conventional Bayesian approaches50,58.

Likelihood-based MCMC or SMC approaches and nested sampling are also emerging as inferential frameworks for stochastic dynamical systems. This development is particularly promising when dealing with cases where the likelihood of a set of stochastic (time-series) realizations of a system can be approximated in a computationally favorable way. One such way is to use, for instance, the linear noise approximation or generalizations thereof to model the time evolution of stochastic dynamical systems48. Such simulation routines may, of course, also be gainfully used in ABC frameworks.

Limitations
ABC methods are designed to work where other likelihood-based approaches cannot (perhaps, yet) be applied. Nevertheless, when they are used to address any challenging problem, ABC methods will also be computationally expensive, and, obviously, the curse of dimensionality still applies; thus, the more the parameters that we seek to infer, the more challenging the inference will become, and models with even only dozens of parameters will defy serious analysis by ABC, or, indeed, by any other Bayesian approach. There have been developments in computational aspects of ABC36,59,60, which promise to make inference more efficient and affordable, but these developments cannot overcome the more generic problems encountered by all inference algorithms.

One area in which limitations of ABC procedures have received widespread attention is model selection31,33,61. The limitations that have been highlighted in the literature are pertinent for cases where inferences are based on summary statistics of the data

instead of the data themselves, an approach that is conventionally adopted in population genetics applications. In these cases, model selection is notoriously dependent on arbitrary choices made in the setup of the ABC inference, and it can swing in favor of any plausible model, irrespective of which is the correct one. This tendency causes problems in any real-world application in which the correct model is obviously not known.

The type of inference problem considered in this protocol does not require the use of summary statistics of the data. In the context of the dynamical systems models considered here, ABC inference may be conducted using the whole time-course data set, rather than summary statistics thereof. Thus, ABC model selection is possible in the current setting, and it is implemented in ABC-SysBio.

Protocol overview
The ABC-SysBio software is designed for parameter inference and model selection. However, it can also be used to parse and simulate sbml models (models written in the format of the SBML standard). It enables researchers to perform simulations using ODE and SDE solvers, as well as the Gillespie algorithm.

In the ABC-SysBio software, parameter inference and model selection are performed in a sequential manner (as described in Box 1). After each iteration of the algorithm, a set of parameter vectors is constructed; these parameter vectors are called ‘particles’ and form a ‘population’. Each particle is a vector of length equal to the number of parameters to be estimated. In this protocol, we refer to the number of particles in a population as the population size. The populations are constructed so that the particles forming the population give rise to simulated data that differ from the observed data by at most a predetermined threshold. Therefore, each population is associated with a threshold; these thresholds decrease in consecutive populations, starting from a typically quite high threshold at population 1 and tending toward zero.

Box 2 | Algorithm setup and advanced options


Many of the settings of the algorithm affect convergence to the true posterior and may require careful consideration in new
applications of ABC-SysBio. We provide some basic guidance on the most important parameters below:
Particles. The number of particles has to be large enough in order to efficiently cover the entire parameter search space, and it should
increase with the number of parameters, or for model selection applications.
Epsilon. As an alternative to user-specified tolerance schedules, we can also choose automated tolerances, which are based on the
distributions of the recorded distances between simulated data from the previous population and the observed data. The next threshold
is the <alpha> quantile of this distribution, where <alpha> is a parameter of the algorithm that needs to be defined. For example,
<autoepsilon>
<finalepsilon> 0.0 </finalepsilon>
   <alpha> 0.1 </alpha>
</autoepsilon>
Restart. For a user-defined tolerance schedule, it can happen that a tolerance value is too strict, in which case the acceptance rate
drops drastically. The user can stop the algorithm and restart it from the last finished population with a new tolerance schedule by
setting in the input file: <restart> True </restart>. The algorithm will then apply the new tolerance schedule.
Prior distribution. Currently implemented prior distributions are as follows: constant x (constant parameter with value x), normal a b
(normal distribution with location a and variance b), uniform a b (uniform distribution on the interval [a, b]) and lognormal a b
(lognormal distribution with location a and variance b). If plausible parameter ranges are known, prior distributions should be
defined accordingly. In this case, the user can either trust these values (‘constant’ prior) or set a small prior range around this value
(for example, a normal distribution centered on a literature value).
Initial conditions. They can be inferred as parameters if priors are provided, e.g.,
<initial>
<ic1> uniform 0.0 100.0 </ic1>
<ic2> uniform 1.0 10.0 </ic2>
<ic3> constant 0.0 </ic3>
</initial>
This infers initial conditions for species 1 and 2 (with uniform priors), but starts from 0 for species 3.
Distance function. ABC-SysBio computes the sum of squares (Euclidean distance) between data and the simulated trajectories.
The user has the option to use a custom distance function (see section 5.3 of the ABC-SysBio manual). Adaptations of the distance
function can help avoid convergence problems60. Furthermore, a noise model can be incorporated into this custom distance function.
Because the distance function has to be written in Python syntax, any available Python function (including sampling random numbers
in order to generate a noise model) can be applied. The manuscript refers to this possibility.
Kernel. The implemented perturbation kernels are as follows: uniform (component-wise uniform kernels), normal (component-wise
normal kernels), multiVariateNormal (multivariate normal kernel whose covariance is based on the previous population),
multiVariateNormalKNeigh (multivariate normal kernel whose covariance is based on the K nearest neighbors of the particle) and
multiVariateNormalOCM (multivariate normal kernel whose covariance is the OCM).
dt. For SDE models, the user has to set the numerical time step ‘dt’. This time step needs to be reasonably small (for most systems
dt < 0.01) to avoid numerical errors, but smaller time steps result in longer simulation times.
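Section 5.3 of the ABC-SysBio manual describes the exact interface expected for a user-supplied distance function; the sketch below only illustrates the kind of logic such a function typically contains. The function name, the argument order, the noise standard deviation and the NA handling are assumptions made for illustration and do not reproduce the package's actual signature.

import numpy as np

def custom_distance(simulated, observed):
    # Hypothetical example: Euclidean distance that adds a Gaussian noise model to the
    # simulated trajectory and ignores missing observations (recorded as NA/NaN).
    sim = np.asarray(simulated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    sim_noisy = sim + np.random.normal(0.0, 0.5, size=sim.shape)  # assumed noise model (s.d. 0.5)
    mask = ~np.isnan(obs)                                         # skip NA time points
    return float(np.sqrt(np.sum((sim_noisy[mask] - obs[mask]) ** 2)))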

Figure 2 | Models and data. (a–e) The full mRNA self-regulation model is shown in a. mRNA (m) produces protein P1, which can be transformed into protein P2. P1 is required to produce mRNA, whereas P2 degrades mRNA into the empty set (Ø). P1 and P2 can also be degraded. The reactions that occur according to this model are shown in b. Fitting of the model to the data (c), which comprise mRNA measurements over time. The second model (d) is based on the first model, but it does not contain protein P2. The relevant reactions are shown in e. (Panels b and e list the reaction schemes with rate constants p0–p4; panel c plots the mRNA measurements against time.)

Selecting appropriate settings for the algorithm, such as the number of particles per population or the decreasing threshold schedule, involves some trial and error and experience. Some basic guidance is given in Box 2.

In this protocol, we demonstrate how to use ABC-SysBio to infer parameters of an example system given a data set and how to rank two candidate models. Two mRNA self-regulatory models have been created to serve as tutorials. One of them was used to generate an in silico data set, which will be used in the parameter inference and model selection scheme.

In the first example system, mRNA (m) is translated into a protein (P1) that regulates the production of its own mRNA, m. Furthermore, P1 can be modified (through an assumed post-translational modification) at some rate resulting in P2, which degrades m. All three molecular species are degraded at a constant rate. This system therefore contains seven reactions. A schematic of the system, together with the seven reactions, is shown in Figure 2a,b. The species, parameters and reactions are defined in an sbml model file, which is provided as Supplementary Data 1.

In the first part of the PROCEDURE (Steps 1–18), we illustrate how to infer parameters of this system (denoted by p0, p1, p2, p3 and p4 in Fig. 2) by using the in silico–generated data set. We explain how to use sbml models, guide the reader through the algorithm settings and explain the output of ABC-SysBio.

In the second part of the PROCEDURE (Steps 19–29), we illustrate the use of the model selection tools to discriminate between two models: the model described above and a simplified model of the mRNA self-regulation represented in Figure 2d,e. We use a similar data set as in the first part of the procedure. However, in this second part of the protocol, we assume that only the total protein measurements are available, although not for all time points.

For other models that are not part of this protocol, sbml model files can either be generated manually, by several pieces of software (Copasi, Mendel and ShorthandSBML), or, in the case of published models, files can be found in the BioModels database (http://www.ebi.ac.uk/biomodels-main/). An excellent tutorial on understanding and generating sbml files can be found in Wilkinson62.

Although this protocol contains a timing information section, the length of time required for the parameter inference and the model selection algorithm to run is highly dependent on the system hardware. The computational cost also depends on the size of the model, the complexity of the data, the dimension of the parameter space and all the algorithm settings (such as the number of particles, the perturbation kernel, and so on). A full list of all algorithm settings is provided within the documentation of the ABC-SysBio package.

MATERIALS
EQUIPMENT
• Data sets for the observables related to the postulated (imaginary) reaction network described in the Introduction are provided as supplementary information: Supplementary Data 1 contains the description of the models in sbml format, and Supplementary Data 2 contains the input files for the ABC-SysBio package.
• ABC-SysBio is a Python package, which runs on Linux and Mac OS X systems. (Windows is not currently supported, but we have successfully installed matplotlib, numpy, scipy, libsbml and ABC-SysBio on Windows Vista using WinPython.) Python can be downloaded from http://www.python.org. Necessary dependencies are as follows: Numpy (http://sourceforge.net/projects/numpy/files/), Scipy (http://sourceforge.net/projects/scipy/files/), Matplotlib (http://sourceforge.net/projects/matplotlib/files/). Optional dependencies are Swig (http://sourceforge.net/projects/swig/files/), libSBML (http://sourceforge.net/projects/sbml/files/libsbml/) (both necessary to follow this protocol) and cuda-sim (http://sourceforge.net/projects/cuda-sim/files/).
EQUIPMENT SETUP
Installation  Install Python and the relevant dependencies according to the procedure detailed in the Supplementary Methods. Install ABC-SysBio according to the instructions in Box 3.
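A quick way to confirm that the dependencies listed above are importable from the Python installation you intend to use is sketched below; the module names are the usual import names, and the list can be adjusted to match the optional dependencies you need.

import importlib

for module in ("numpy", "scipy", "matplotlib", "libsbml"):
    try:
        importlib.import_module(module)
        print(module, "OK")
    except ImportError:
        print(module, "MISSING")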


Box 3 | Installation of ABC-SysBio


1. Download the ABC-SysBio package from http://sourceforge.net/projects/abc-sysbio/files/ and unzip it.
In the following steps (2 and 3), replace <dir> with the full path to a location. This will be the location containing the lib and bin
directories (usually /usr/local by default, where Python is installed).
2. Open a terminal and type:
cd abc-sysbio-2.06
python setup.py install --prefix=<dir>
Please note that the --prefix=<dir> option is recommended, as it will guarantee that each package picks up the correct
dependencies. This places the ABC-SysBio package into
<dir>/lib/python2.6/site-packages/
and generates the scripts
<dir>/bin/abc-sysbio-sbml-sum
<dir>/bin/run-abc-sysbio
3. Add the script directory to the path (this must be done in each session or added to the shell configuration files, e.g. .bashrc or
.cshrc file).
export PATH=<dir>/bin:$PATH (bash shells)


setenv PATH <dir>/bin:$PATH (c shells)
4. Type the following command:
run-abc-sysbio -h
This should lead to the display of a list of options and put you in the position to run the examples.
 CRITICAL STEP Should any problem occur, refer to the ABC-SysBio manual, which is included in the package and can be downloaded
from sourceforge. In general, this manual includes many more examples and details than those covered in this protocol. In particular,
the advanced software settings and options will be presented in the manual in more detail.

PROCEDURE
Preparing the folder structure ● TIMING 2 min
1| In a terminal, go to the working directory and create a project folder ‘paramInference’:
mkdir paramInference
cd paramInference

Downloading the first sbml file ● TIMING 2 min


2| Download Supplementary Data 1. This is a zipped folder, which contains the files ‘mRNAselfReg1.sbml’ and
‘mRNAselfReg2.sbml’. Unzip this folder. The sbml model file ‘mRNAselfReg1.sbml’ is all you need to analyze the model.
Copy the file ‘mRNAselfReg1.sbml’ into the folder ‘paramInference’.

Parsing the sbml file ● TIMING 2 min


3| The ABC-SysBio package contains two main functions: abc-sysbio-sbml-sum and run-abc-sysbio. The first one reads
an sbml file and provides a model summary. It also creates a template file, which will be used as an input file in all further
steps. In the terminal type (as one line):
abc-sysbio-sbml-sum --files mRNAselfReg1.sbml --input_file_name input_file1.xml
which will print to the terminal:
input_files: [' mRNAselfReg1.sbml']
data: None
filename: input_file1.xml
sumname: model_summary.txt
? TROUBLESHOOTING

Figure 3 | Automatically generated model summary file. The function abc-sysbio-sbml-sum reads the sbml model file and extracts all model-specific information. The file contains the (always included) number of compartments and reactions. Some models also contain rules, functions and events. The file acts as a dictionary for all ABC-SysBio steps. The software renames parameters and species. In the second column are the original sbml identifiers, whereas the new names are in the third column. The numbers in brackets denote the default values as defined in the sbml model file. The summary file reads:

Model 1
name: Model 1
source: geneReg.sbml
number of compartments: 1
number of reactions: 7
number of rules: 0
number of functions: 0
number of events: 0
Species with initial values: 3
S1: species_m species1 (10.0)
S2: species_p1 species2 (5.0)
S3: species_p2 species3 (0.0)
Parameter: 6 (all of them are global parameters)
(0 parameter is treated as species)
P1: cell compartment1 (1.0)
P2: parameter_0 parameter1 (10.0)
P3: parameter_1 parameter2 (0.5)
P4: parameter_2 parameter3 (10.0)
P5: parameter_3 parameter4 (2.0)
P6: parameter_4 parameter5 (1.0)
############################################################

4| Type
ls -l
and all files that are now in the project folder will be listed:
mRNAselfReg1.sbml
input_file1.xml
model_summary.txt
Please note that the file model_summary.txt contains information about the provided sbml model file. The summary of this example is shown in Figure 3.

Modifying the input file ● TIMING 10 min


 CRITICAL The generated input file (/paramInference/input_file1.xml) is written in the xml standard, i.e., specific
tags—which correspond to machine and (arguably) human readable definitions—are written as <tag> … </tag>. It contains
all information about the settings specifying the algorithm setup, the parameters, the data and the model. The automatically
generated template file already has the right format, e.g., the number of parameters and species corresponds to the sbml
model file. In case no sbml model file is used, the input file has to be generated separately. We recommend using one of the
example input files as a template on which to base any customized files.
 CRITICAL The following subsection of the PROCEDURE (Steps 5–14) contains instructions on how to set up the input file.
Its implementation can be avoided by using an already prepared input file provided in Supplementary Data 2. To follow this
option, download Supplementary Data 2 and unzip this file. In the folder are the two files ‘input_file1.xml’ and ‘input_file2.xml’. Copy the file ‘input_file1.xml’ into the folder ‘paramInference’ and proceed with Step 15.

5| Define a tolerance schedule; this is one of the important parameters that control the rate at which the ABC-SMC
algorithm converges. The default option is an automatically generated schedule. In this example, we will use a fixed
user-defined schedule. Therefore replace
<autoepsilon>
<finalepsilon> 1.0 </finalepsilon>
<alpha> 0.9 </alpha>
</autoepsilon>
with
<epsilon>
<e1> 50 48 46 43 41 39 37 35 32 30 28 26 24 22 20 18 16 15 </e1>
</epsilon>

6| Set the number of accepted particles per ABC-SMC population by typing the following:
<particles> 100 </particles>
Please note that this command defines the population size, which is set to a low value here for demonstration purposes.
To obtain a good approximation of the posterior parameter distribution, the population size should be much larger (for this

example, around 1,000 particles will suffice), depending on how many parameters are to be estimated. As a rule of thumb,
the more parameters to be estimated, the larger the population size needs to be.

7| Set the numerical step size—a parameter used by the numerical solvers—to
<dt> 0.01 </dt>

8| Set the type of the parameter perturbation kernel; implementing this command means the sampled parameters are
perturbed uniformly in linear space.
<kernel> uniform </kernel>

9| Provide the data by typing the following lines in the input file:
<times> 0 0.1 0.2 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 6 7 8 9 10 11 12 13 14 15 </times>
<variables>
<var1> 10.000 8.861 12.241 26.408 21.474 13.776 10.038 8.127 7.264 6.716 6.725 7.244
7.830 8.772 9.076 8.941 8.539 8.246 8.543 8.780 8.666 8.736 8.505 </var1>
</variables>
This instruction sets the times at which observations are taken, as well as the measured values for all observed species
(here only var1 is observed). The data are shown in Figure 2c.

10| Provide all model information in the section <models>. To achieve this objective, type the lines:
<name> mRNAselfReg1 </name>
<source> mRNAselfReg1.sbml </source>
These define the name of the model and the sbml model file containing the relevant model description.

11| The ABC-SysBio package can simulate SDE and ODE models, as well as Markov jump processes. The algorithms used are
summarized in Table 1. We will analyze the system as an SDE model. For this purpose, type the lines:
<type> SDE </type>

12| As the data only describe the temporal behavior of the mRNA species, which is species1, set:
<fit> species1 </fit>

13| The initial conditions, i.e., the state of our model system at time 0 (in this example, the amount of each species before
any reaction takes place), are known, and thus they must be defined as constant by typing:
<initial>
<ic1> constant 10.0 </ic1>
<ic2> constant 5.0 </ic2>
<ic3> constant 0.0 </ic3>
</initial>

14| Define the parameters’ prior distributions. The first parameter describes the sbml model–specific parameter ‘compartment
size’, which in the majority of models is set to 1. All known model parameters must be set as constant. In this case,
parameter6 (mRNA and protein degradation rate) is assumed to be known and set to 1. For this protocol, define the


prior parameter distributions of the remaining parameters as follows:
<parameters>
<parameter1> constant 1.0 </parameter1>
<parameter2> uniform 0 50 </parameter2>
<parameter3> uniform 0 10 </parameter3>
<parameter4> uniform 0 50 </parameter4>
<parameter5> uniform 0 10 </parameter5>
<parameter6> constant 1.0 </parameter6>
</parameters>
This defines, for example, the prior distribution of parameter 2 as a uniform distribution between 0 and 50. Other implemented prior distributions are ‘normal’ and ‘lognormal’.

Table 1 | Implemented algorithms for numerical simulation of biological systems and references.
Type of model    Numerical algorithm    Ref.
ODE              LSODA                  67
SDE              Euler–Maruyama         68
MJP              Gillespie              69
All three algorithms are implemented in Python, C and PyCuda. The Python implementation is the default option, which is used in this protocol. The C routines are applied when adding the option ‘-c++’ to the command line in Step 15, whereas the cuda routines are used when using ‘-cu’.
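The role of the <dt> setting (Step 7 and Box 2) is easiest to see in a minimal Euler–Maruyama loop. The sketch below integrates a generic one-dimensional SDE, dX = −aX dt + σ dW; it is not the solver or the model used by ABC-SysBio, but it illustrates why too large a step size degrades accuracy while a very small one increases simulation time.

import numpy as np

rng = np.random.default_rng(0)

a, sigma = 1.0, 0.3        # illustrative drift and noise parameters
dt, t_end = 0.01, 15.0     # the step size plays the same role as <dt> in the input file
x = 10.0
for _ in range(int(t_end / dt)):
    dw = rng.normal(0.0, np.sqrt(dt))       # Wiener increment with variance dt
    x = x + (-a * x) * dt + sigma * dw      # drift term plus diffusion term
print("value at t =", t_end, ":", x)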

Running ABC-SysBio for parameter inference ● TIMING 20 min until population 12 and 3 h until population 16
15| Start the ABC-SysBio program by typing the following in the terminal:
run-abc-sysbio -i input_file1.xml -of=results -f -sd=2
Here the tag ‘-i’ defines the input file, ‘-of=’ defines the name of the folder that will contain all results and ‘-f’ results in
printing a full report to the terminal. The ABC-SysBio program will now import the sbml model file and translate it into
Python syntax, specific to the supplied SDE solver. This file mRNAselfReg1.py now becomes the project solver. The tag ‘-sd=2’
sets the seed of the random number generator in numpy. This tag is useful for debugging or comparison of results. It is not
generally needed to run the algorithm. As we set the population to only 100, we recommend the user to use this tag in order
to better compare the results with the results presented here.
? TROUBLESHOOTING

16| Carefully check all algorithm parameters that the program will print to ensure that the information is correct.
This information should correspond to the above-described instructions in the input file (for example, make sure that the
number of particles is set to 100). After around 1 min (depending on the computer on which ABC-SysBio is run), the first
ABC-SMC population will be finished and the summary of this population will be printed to the terminal:
### population 1
   sampling steps / acceptance rate : 1211 / 0.0825763831544
   model marginals    : [1.0000000000000007]
This output appears after each finished ABC-SMC population. A new folder will be created, in this case ‘results’, which will
contain all other outputs of the program.

17| The results folder is updated every time an ABC-SMC population is finished. Every time this happens, check the files
inside the results folder by typing the following:
cd results
ls –l
The output will comprise the following files:
copy
_data.png
distance_Population1.txt
rates.txt
results_mRNAselfReg1
traj_Population1.txt

The file ‘_data.png’ shows a plot of the data provided in the input file. The file ‘rates.txt’ contains in its first column the
population number, followed by the tolerance value ε, the number of sampled parameter combinations in order to obtain
a full ABC-SMC population and the achieved acceptance rate (i.e., the fraction of simulations that gave rise to simulated
data that was within the specified distance from the observed data). The last column shows the time it took to obtain this
population in seconds. This information is useful when redefining the tolerance schedule in order to increase the algorithm’s
performance. The files ‘distance_Population1.txt’ and ‘traj_Population1.txt’ contain the accepted simulations and their corre-
sponding distances from the provided data. The folder ‘results_mRNAselfReg1’ contains a folder for each finished population.
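The evolution of the acceptance rate is often the quickest way to judge whether the tolerance schedule is too aggressive. The sketch below reads ‘rates.txt’, assuming the whitespace-separated column order described above (population, tolerance, sampled combinations, acceptance rate, time in seconds); adjust the path and the column handling if your file differs.

import numpy as np

rates = np.atleast_2d(np.loadtxt("results/rates.txt"))
for population, eps, sampled, acceptance, seconds in rates:
    print(f"population {int(population)}: eps = {eps:g}, acceptance rate = {acceptance:.3f}, "
          f"{int(sampled)} samples, {seconds:.0f} s")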

18| To view the files generated after the first ABC-SMC population, type:
cd results_mRNAselfReg1/Population_1
ls –l
which will list the following files:
data_Population1.txt
data_Weights1.txt
ScatterPlots_Population1.png
Timeseries_Population1.png
weightedHistograms_Population1.png
The accepted parameter combinations will be saved in ‘data_Population1.txt’, where columns represent the parameter and
initial conditions. In this example, the initial conditions are known and set to be constant. However, it is possible to infer
them by defining a prior distribution. The statistical weights corresponding to the parameter combinations are stored in the
file ‘data_Weights1.txt’. The .png files show simulations for ten of the accepted particles, the marginal posterior distributions
as histograms and pairwise scatterplots providing an overview of the posterior parameter distributions. The scatter plots
show the most recent population plotted on top of all previous populations marked by different colors. An example of the
output is shown in Supplementary Figure 1. Furthermore, example trajectories are plotted (Fig. 4). These plots are useful
for monitoring purposes and enable the user to follow the progress of the algorithm. Please note that sometimes it is
advisable not to generate these diagnostic plots, for example, when analyzing models with a high-dimensional parameter
space (models with a large number of parameters to estimate). Generating these diagnostic plots is time-consuming,
and it slows down the algorithm; hence, it is advisable in these cases to run the algorithm as in Step 15 by adding the
command ‘--diagnostic’ at the end of the line.
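Once a population has converged, weighted posterior summaries can be computed directly from these files. The sketch below assumes that ‘data_Population<N>.txt’ holds one row per particle (one column per estimated parameter or initial condition, as described above) and that ‘data_Weights<N>.txt’ holds the matching weights; the population number and the path are placeholders that must be adjusted to your run.

import numpy as np

N = 1   # replace with the final population number to summarize the converged posterior
base = f"results/results_mRNAselfReg1/Population_{N}"
pop = np.atleast_2d(np.loadtxt(f"{base}/data_Population{N}.txt"))
w = np.atleast_1d(np.loadtxt(f"{base}/data_Weights{N}.txt"))
w = w / w.sum()
for j in range(pop.shape[1]):
    mean = np.average(pop[:, j], weights=w)
    sd = np.sqrt(np.average((pop[:, j] - mean) ** 2, weights=w))
    print(f"column {j}: weighted posterior mean {mean:.3f}, s.d. {sd:.3f}")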

Preparing a new project folder ● TIMING 2 min


19| As in Step 1, in a terminal, go to the working directory and create a new project folder ‘modelSelection’. Type:
mkdir modelSelection
cd modelSelection

Downloading the sbml files for model selection ● TIMING 2 min


20| In Step 2, Supplementary Data 1 was downloaded and unzipped. Copy the two sbml model files (mRNAselfReg1.sbml
and mRNAselfReg2.sbml) into the ‘modelSelection’ folder.

Parsing both sbml model files ● TIMING 2 min


21| In the terminal, type (as one line):
abc-sysbio-sbml-sum --files mRNAselfReg1.sbml,mRNAselfReg2.sbml
--input_file_name input_file2.xml
? TROUBLESHOOTING

22| Type the following to list all files:


ls –l


Figure 4 | Example trajectories of intermediate and final ABC-SMC populations. After each population, the software produces diagnostic plots, which enable the user to follow the progress of the algorithm implementation. These plots include ten example trajectories plotted in comparison with the data. (a,b) Shown are these trajectories for the first ABC-SMC population (a) and the last ABC-SMC population (b). (In both panels, the x axis is time.)

This command again generates the model_summary.txt, which now contains information about both models, and the input_file2.xml. The latter is automatically in the right format for the model selection algorithm.

Modifying the second input file ● TIMING 10 min


 CRITICAL Implementation of the following subsection of the PROCEDURE (Steps 23–28) can again be avoided by using the
already prepared input file provided in Supplementary Data 2. To follow this option, download Supplementary Data 2 and
unzip this file. In the folder are the two files ‘input_file1.xml’ and ‘input_file2.xml’. Copy the file ‘input_file2.xml’ into the
folder ‘modelSelection’ and proceed with Step 29.

23| Apply the same tolerance schedule as in Step 5; replace:


<autoepsilon>
<finalepsilon> 1.0 </finalepsilon>
<alpha> 0.9 </alpha>
</autoepsilon>
with
<epsilon>
<e1> 380 370 360 340 300 250 150 100 90 </e1>
</epsilon>

24| Set the number of accepted particles to 100 (note that this is a very low number, and it is only used for the purpose of
this tutorial example, but it should typically be much higher in real inference applications):
<particles> 100 </particles>

25| Set the numeric step size, the parameter perturbation kernel and the data as in Steps 7, 8 and 9, respectively.
Note that the data are now described by two time series, where the first (<var1>) is set as before. Furthermore, the second
time series includes missing values (NA) for some time points.
<dt> 0.01 </dt>
<kernel> uniform </kernel>
<times> 0 0.1 0.2 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 6 7 8 9 10 11 12 13 14 15 </times>
<variables>
<var1> 10.000 8.861 12.241 26.408 21.474 13.776 10.038 8.127 7.264 6.716 6.725 7.244
7.830 8.772 9.076 8.941 8.539 8.246 8.543 8.780 8.666 8.736 8.505 </var1>
<var2> NA NA NA NA 144.147 NA 140.720 NA 103.582 NA 82.268 NA 77.614 82.699 88.346
90.024 89.033 87.776 87.291 87.431 87.706 87.839 87.826 </var2>
</variables>


26| Provide the information about how many models are considered. At the top of the file, note the tag:
<modelnumber> 2 </modelnumber>
In the <model> section, there will now be two tags: <model1> and, further down in the file, <model2>.

27| Define all parameters for <model1> as done in Steps 10–14:


<name> mRNAselfReg1</name>
<source> mRNAselfReg1.sbml </source>
<type> SDE </type>
<fit> species1 species2+species3 </fit>
<initial>
<ic1> constant 10.0 </ic1>
<ic2> constant 5.0 </ic2>
<ic3> constant 0.0 </ic3>
</initial>
<parameters>
<parameter1> constant 1.0 </parameter1>
<parameter2> uniform 0 50 </parameter2>
<parameter3> uniform 0 10 </parameter3>
<parameter4> uniform 0 50 </parameter4>
<parameter5> uniform 0 10 </parameter5>
<parameter6> constant 1.0 </parameter6>
</parameters>
The fitting instruction <fit> now includes two expressions, one for each provided time series in <data>. The second time
series describes the total amount of measured protein, which is, in this first model, the sum of species2 and species3.
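To make the role of the fitting instructions concrete, the following Python sketch (with placeholder values; this is not ABC-SysBio code) shows how the two <fit> expressions of model 1 map a simulated trajectory with three species onto the two observed time series before any distance is computed.

import numpy as np

# Simulated trajectory: one row per time point, one column per species
# (species1, species2, species3); the values are placeholders.
trajectory = np.array([[10.0, 5.0, 0.0],
                       [9.2, 60.1, 70.4],
                       [8.5, 65.3, 75.0]])

# <fit> species1 species2+species3 </fit> for model 1:
observable1 = trajectory[:, 0]                     # compared with <var1>
observable2 = trajectory[:, 1] + trajectory[:, 2]  # compared with <var2>
print(observable1, observable2)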

28| For <model2>, set:


<name> mRNAselfReg2 </name>
<source> mRNAselfReg2.sbml </source>
<type> SDE </type>
<fit> species1 species2 </fit>
<initial>
<ic1> constant 10.0 </ic1>
<ic2> constant 5.0 </ic2>
</initial>
<parameters>
<parameter1> constant 1.0 </parameter1>
<parameter2> uniform 0 10 </parameter2>
<parameter3> uniform 0 10 </parameter3>
<parameter4> uniform 0 30 </parameter4>
<parameter5> uniform 0 30 </parameter5>
</parameters>

Note that in this second model we have only one protein species. For this reason, the fitting instruction for the second
time series is simply 'species2'. ABC-SysBio automatically switches to the model selection algorithm when more than one model
is provided. Parameter inference is also carried out as part of the model selection procedure. The final edited input file is
provided in Supplementary Data 2 (input_file2.xml).

Running ABC-SysBio for model selection ● TIMING 10 min until population 6 and 1 h until population 9
29| To start the model selection algorithm, type the same command in the terminal as in Step 16:
run-abc-sysbio -i input_file2.xml -of=results -f -sd=2
No further commands are required for model selection, because all necessary information is contained in the input file.
Once the first ABC-SMC population is finished (this should be in a few seconds), the algorithm prints to the terminal:
### population 1
sampling steps / acceptance rate : 1478 / 0.0676589986468
model marginals : [0.5900000000000003, 0.4100000000000002]
The model marginals represent the probability of the two models in light of the data, i.e., they indicate which of the models
describes the data best. Note that running the whole algorithm takes 3–4 h, but a clear tendency is already visible after a few
populations.
? TROUBLESHOOTING

30| Compared with the parameter inference run, the results folder now contains additional files. View them by typing
cd results
ls -l
The output will comprise the following files:
copy
_data.png
distance_Population1.txt
ModelDistribution_1.png
ModelDistribution.txt
rates.txt
results_ mRNAselfReg1
results_ mRNAselfReg2
traj_Population1.txt
The file 'ModelDistribution_1.png' shows a bar plot representing the model probabilities. This figure is updated after each
ABC-SMC population. The file 'ModelDistribution.txt' lists the model probabilities for each finished ABC-SMC population.
A results folder for each model is created, in which the ABC-SMC populations are listed (as for the parameter inference
algorithm). Figure 5 shows the model probabilities from population 1 to 9.

Figure 5 | Model probabilities after each ABC-SMC population. Each of the histograms in this figure is produced after each
ABC-SMC population. Shown are the model probabilities as bar plots. The numbers in parentheses represent the population
number; numbers in square brackets are the distance thresholds for each population (ε-schedule); and numbers below the
above-mentioned parentheticals are the acceptance rates. In population 1, both models have approximately the same probability
of representing the data. After population 9, model 1 has a much higher probability of representing the data best.

? TROUBLESHOOTING
Troubleshooting advice can be found in Table 2.

Table 2 | Troubleshooting table.

Steps 3 and 21
Problem: Error: 'can not parse sbml model file'
Possible reason: The sbml model file does not exist or contains errors
Solution: Make sure the model name provided in the input file (or command line) is exactly the same as the model file. If the sbml model file was manually generated, make sure all tags are correct and closed and that only standard sbml expressions and syntax are used

Steps 15 and 29
Problem: Error: 'Please do not give empty strings for model names!'
Possible reason: The model names contain invalid strings
Solution: Check the names of each provided model in the input file

Problem: Error: 'The number of given prior distributions for model X is not correct'
Possible reason: The model contains a different number of parameters than was defined in the input file
Solution: Provide one prior distribution per parameter defined in the model. If some parameters are known, they still need to be defined (as 'constant'). Note: when using an sbml model file, an additional parameter appears, which defines the compartment size. This parameter is always defined as 'parameter 1'. For the vast majority of systems this parameter is constant 1.0

Problem: Error: 'Please provide an initial value for each species in model X.'
Possible reason: The number of species in the model and that in the input file do not correspond to each other
Solution: Check the input file and make sure that the initial conditions are correctly defined. For each species in the model, one initial condition needs to be provided. This can either be 'constant' if the initial condition is known, or one of the following if the initial condition needs to be inferred: 'uniform', 'normal', 'lognormal'

Problem: Error: 'The prior distribution of parameter X is wrongly defined'
Possible reason: Invalid expression in the input file
Solution: Check the type of distribution for parameter X in the input file. Possible types are: 'uniform', 'normal', 'lognormal' and 'constant'

Problem: Error: 'The integration type for model X does not exist.'
Possible reason: Invalid expression in the input file
Solution: Check the integration type for model X. Allowed expressions are 'ODE', 'SDE' and 'Gillespie'

Problem: Error: 'The results folder already exists'
Possible reason: There is already a file/folder called 'results' in the working directory
Solution: Change the working directory, change the name of the existing folder, or remove the folder

Problem: Error: 'Please provide a fit instruction for each model'
Possible reason: Wrong or no fitting instruction is provided
Solution: Always provide the same number of fitting instructions as provided time series (if the number of species differs from the number of data series). Fitting instructions can be simple expressions such as 'species1', but also more advanced instructions such as 'species1+10*species3'. This is particularly useful when data need to be scaled or only combinations of species are observed

● TIMING
Step 1, preparing the folder structure: 2 min
Step 2, downloading the first sbml file: 2 min
Steps 3 and 4, parsing the sbml file: 2 min
Steps 5–14, modifying the input file: 10 min

Steps 15–18, running ABC-SysBio for parameter inference, until population 12: 20 min and until population 16: 3 h
Step 19, preparing a new project folder: 2 min
Step 20, downloading the sbml files for model selection: 2 min
Steps 21 and 22, parsing both sbml model files: 2 min
Steps 23–28, modifying the second input file: 10 min
Steps 29 and 30, running ABC-SysBio for model selection, until population 6: 10 min and until population 9: 1 h

ANTICIPATED RESULTS
The typical output after performing Bayesian parameter inference in ABC-SysBio consists of a set of weighted particles
that summarize the approximate posterior distribution. A particle is a parameter vector containing a value for each
of the reaction rates to be estimated. The weight associated with a particle is proportional to the probability that this
parameter vector can explain the observed data. In this section, we describe how to analyze and interpret the posterior
distribution obtained.
First, the marginal posterior distribution (i.e., the probability distribution of each reaction rate considered independently)
can be obtained by using a weighted histogram. ABC-SysBio provides these weighted histograms at each step of the
sequential algorithm. If the marginal distribution is very peaked around a parameter value, we say that the reaction rate is
well inferred (Fig. 6). In most biological systems, however, only a few reaction rates can be inferred given an observed data
set, and different parameter vectors can explain the observed data (almost) equally well16,18,63. Such issues are especially
obvious and important to consider when looking at the joint probability distribution over all reaction rates.
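Because the particles are weighted, the weights must be passed to the histogram routine when such marginals are plotted manually. The following is a minimal sketch, assuming the particles and weights have already been loaded into NumPy arrays (random placeholder values are used here; the ABC-SysBio output format is not reproduced).

import numpy as np
import matplotlib.pyplot as plt

# One row per accepted particle, one column per inferred parameter (placeholders)
particles = np.random.uniform(0, 10, size=(1000, 4))
weights = np.random.dirichlet(np.ones(1000))  # placeholder weights summing to 1

# Weighted histogram approximating the marginal posterior of parameter 2
counts, edges = np.histogram(particles[:, 1], bins=30, weights=weights)
plt.bar(edges[:-1], counts, width=np.diff(edges), align='edge')
plt.xlabel('parameter 2')
plt.ylabel('posterior weight per bin')
plt.show()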
In order to study the correlation between parameters, an investigator typically plots the joint posterior distribution of
pairs of reaction rates. Different examples of joint pairwise posterior distributions are shown in Figure 6a,c. Here we observe
that the correlation can be linear or highly nonlinear and that the posterior distribution can have several peaks, i.e., the
distribution is multimodal. Liepe et al.44 and Secrier et al.64 described how to analyze a posterior distribution and perform
sensitivity analysis. Such an analysis of inferred posterior distributions over parameters also enables researchers to consider
factors such as parameter identifiability and sloppiness49,63.
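A pairwise joint posterior can be visualized in the same way with a weighted two-dimensional histogram, for example as follows (again with placeholder arrays standing in for the loaded particles and weights).

import numpy as np
import matplotlib.pyplot as plt

particles = np.random.uniform(0, 10, size=(1000, 4))  # placeholder particles
weights = np.random.dirichlet(np.ones(1000))          # placeholder weights

# Weighted 2D histogram of parameters 1 and 2
H, xedges, yedges = np.histogram2d(particles[:, 0], particles[:, 1],
                                   bins=30, weights=weights)
xcenters = 0.5 * (xedges[:-1] + xedges[1:])
ycenters = 0.5 * (yedges[:-1] + yedges[1:])
plt.contourf(xcenters, ycenters, H.T, cmap='Oranges')  # H.T: rows = y, columns = x
plt.xlabel('parameter 1')
plt.ylabel('parameter 2')
plt.show()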
Parameter inference is not just an aim in its own right: the posterior distribution can also be exploited for predictive
purposes. For example, it is possible to study the evolution of some of the species that have not been measured, or to predict
the behavior of the biological system under different experimental conditions (Fig. 6b). This task is easily performed by
sampling a set of particles from the obtained posterior distribution and simulating the model (or the variation of the model)
for each of the particles65. Each simulated trajectory corresponds to a possible behavior. If all the simulated trajectories are
very similar, then this behavior is of high probability given the assumed mechanistic model, the prior distribution over the
parameters and the observed data. In contrast, if the simulated trajectories significantly vary from one particle to another,
then the behavior of the corresponding species cannot be accurately predicted. This analysis serves as a basis for the design
of experiments that could help improve such predictions65,66.
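A minimal sketch of this resampling-and-simulation step is given below. It assumes a user-supplied simulate(parameters, times) function returning one trajectory per call (a hypothetical placeholder; any ODE or SDE simulator of the model can play this role), together with previously loaded particles and weights.

import numpy as np

def posterior_predictive(particles, weights, simulate, times, n_samples=1000):
    # Resample parameter vectors in proportion to their posterior weights
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    idx = np.random.choice(len(particles), size=n_samples, replace=True, p=weights)
    # Simulate the model once per sampled parameter vector
    trajectories = np.array([simulate(particles[i], times) for i in idx])
    # Summarize the spread of the predicted behavior at each time point
    percentiles = np.percentile(trajectories, [5, 25, 50, 75, 95], axis=0)
    return trajectories, percentiles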
Analysis of marginal distributions provides an assessment of the probabilities of different candidate models—which
represent different mechanistic hypotheses—in light of data. By making use of these probabilities, we can, for example, rank
these models, or we can identify similarities among models that receive statistical support from the data38. If, for example,
all models that have appreciable posterior probability share certain types of interactions, then we might hypothesize that
these interactions are more likely to be real than interactions that receive little statistical support.
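For two models compared under equal prior model probabilities (an assumption made here; otherwise multiply by the prior odds), the posterior model probabilities reported by ABC-SysBio translate directly into a Bayes factor, for example:

# Posterior model probabilities, e.g. read from ModelDistribution.txt
p_model1, p_model2 = 0.94, 0.06

# With equal model priors the Bayes factor equals the posterior odds
bayes_factor = p_model1 / p_model2
print(bayes_factor)  # ~15.7, conventionally interpreted as strong evidence for model 1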
A frequent occurrence in inference is a lengthy delay while the computer evaluates the approximate posteriors.
ABC-SysBio provides access to advanced graphics processing unit (GPU) hardware, which, when available, will result in a
considerable acceleration of the simulation process. Alternatively, Python can be dropped in favor of C routines, which will
also increase the speed of simulation. In its simplest form, relying on Python as the primary language, ABC-SysBio is readily
usable and highly suited for preliminary analysis of models. As always in computing, there is a potential trade-off between
the time it takes to implement computational analyses and the computer run-time the analysis takes. Here, ABC-SysBio
provides the user with the flexibility gradually to scale up in computational sophistication as and when needed.
It is important to remember that ABC methods only provide an approximation of the posterior distribution. The ABC-SMC
algorithm has been tested for examples where the true posterior distribution is known, and it has been shown that the
obtained posterior distribution is similar to the true one27,43. For more realistic examples where the true posterior distribution
is unknown, a sensible and precautionary approach to check the quality of the obtained posterior distribution is to study the
predictive distribution by comparing the simulated data with the observed ones. Of course, even if the simulated data are
almost identical to the observed ones, there is no guarantee that the obtained posterior distribution is the true (but unknown)
one. In particular, some regions of the posterior distribution may not be covered owing to too few particles. We recommend
running the software repeatedly and comparing the posterior distributions obtained.
The accuracy of the obtained approximation of the posterior distribution is highly dependent on the last value of epsilon28,
but also on the number of particles per population, the tolerance schedule, the distance function and the perturbation kernels.
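As a concrete illustration of the last of these ingredients, a component-wise uniform perturbation kernel (cf. the <kernel> uniform </kernel> setting used in the input files) can be pictured as adding independent uniform noise to each parameter of a resampled particle. In the sketch below, deriving the kernel half-width from the spread of the previous population is an illustrative assumption, not the exact rule used by ABC-SysBio.

import numpy as np

def perturb_uniform(particle, previous_population, scale=0.5):
    # Half-width of the uniform noise: a fraction of each parameter's range
    # in the previous population (illustrative choice)
    previous_population = np.asarray(previous_population, dtype=float)
    half_width = scale * (previous_population.max(axis=0) - previous_population.min(axis=0))
    noise = np.random.uniform(-half_width, half_width)
    return np.asarray(particle, dtype=float) + noise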

Figure 6 | Analyzing the posterior distribution. (a) The marginal posterior density for each of the four reaction rates to be
estimated in the mRNA self-regulation model (diagonal), as well as the joint pairwise posterior distribution for each pair of
reaction rates. The 2D distributions are represented with orange contours in which the darker the color, the higher the
probability. (b) Exploiting the posterior distribution to predict the evolution of the three species (from left to right: mRNA,
P1 and P2) of the mRNA self-regulation model. We plot ten simulated trajectories for ten parameter vectors sampled from the
posterior distribution (top). To analyze the distribution of the evolution of the three species, we sample 1,000 parameter sets
from the posterior distribution and plot the mean (dark red), the 25th and 75th percentiles (orange) and the 5th and 95th
percentiles (yellow) of the simulated trajectories (bottom). (c) Examples of posterior distributions. From top to bottom: the
marginal posterior distribution for a well-inferred parameter; a bimodal posterior distribution; a posterior distribution over
two linearly correlated parameters; a posterior distribution over two parameters that are highly dependent, but in a nonlinear
manner; and a bimodal posterior distribution over two parameters.

Some of the computational aspects of ABC are still active areas of research, and ABC-SysBio will continue to incorporate these
developments. These improvements will come from two directions: there are nontrivial speed gains to be achieved by using modern
computer architectures or streamlined programming in low-level languages (ABC-SysBio allows for this, and we would recommend
that users make use of the GPU implementations or provide C rather than Python routines) and recent developments in simulating
stochastic dynamical systems more efficiently. The second type of improvement may result from research
into the underlying ABC foundations. ABC is increasingly considered as a distinct inferential formalism and not merely as an
approximation to conventional Bayesian inference.
In summary, however, ABC provides a pragmatic, rarely optimal but often applicable, framework in which cutting-edge
scientific problems can be addressed from a Bayesian perspective. ABC-SysBio makes this framework, as well as state-of-the-
art computational tools, available to computational and systems biologists.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

Acknowledgments J.L., T.T. and C.P.B. gratefully acknowledge funding from the Wellcome Trust through a PhD studentship, a Wellcome Trust-Massachusetts Institute of Technology (MIT) postdoctoral fellowship (no. 090433/B/09/Z) and a Research Career Development Fellowship (no. 097319/Z/11/Z), respectively. J.L. also acknowledges financial support from the NC3R through a David Sainsbury Fellowship; S.F. acknowledges financial support through a UK Medical Research Council Biocomputing Fellowship. P.K. and M.P.H.S. acknowledge support from a Human Frontier Science Program (HFSP) grant (no. RGP0061/2011). M.P.H.S. gratefully acknowledges support from the UK Biotechnology and Biological Sciences Research Council, The Leverhulme Trust and the Royal Society through a Wolfson Research Merit Award.

AUTHOR CONTRIBUTIONS J.L. designed and analyzed the examples, developed the protocol and wrote the paper. P.K. designed the analysis and wrote the paper; S.F. analyzed the examples, verified the protocols and wrote the paper. T.T. analyzed the examples, verified the protocols and wrote the paper. C.P.B. developed the protocols and wrote the paper; M.P.H.S. designed the examples and wrote the paper.

COMPETING FINANCIAL INTERESTS The authors declare no competing financial interests.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Kirk, P., Thorne, T. & Stumpf, M.P. Model selection in systems and synthetic biology. Curr. Opin. Biotechnol. 24, 767–774 (2013).
2. Xu, T.-R. et al. Inferring signaling pathway topologies from multiple perturbation measurements of specific biochemical species. Sci. Signal 3, ra20 (2010).
3. Stumpf, M.P.H., Balding, D.J. & Girolami, M. Handbook of Statistical Systems Biology (Wiley, 2011).
4. Balsa-Canto, E., Peifer, M., Banga, J.R., Timmer, J. & Fleck, C. Hybrid optimization method with general switching strategy for parameter estimation. BMC Syst. Biol. 2, 26 (2008).
5. Kirk, P.D.W. & Stumpf, M.P.H. Gaussian process regression bootstrapping: exploring the effects of uncertainty in time course data. Bioinformatics 25, 1300–1306 (2009).
6. Efron, B. Bayes' theorem in the 21st century. Science 340, 1177–1178 (2013).
7. Vyshemirsky, V. & Girolami, M.A. Bayesian ranking of biochemical system models. Bioinformatics 24, 833–839 (2008).
8. Robert, C. The Bayesian Choice (Springer, 2007).
9. Jeffreys, H. An invariant form for the prior probability in estimation problems. Proc. R. Soc. Lond. A Math. Phys. Sci. 186, 453–461 (1946).
10. Jaynes, E. Prior Probabilities. IEEE Trans. Syst. Sci. Cyber. 4, 227–241 (1968).
11. Bernardo, J.M. & Smith, A.F.M. Bayesian Theory (John Wiley & Sons, 2009).
12. Kass, R.E. & Wasserman, L. The selection of prior distributions by formal rules. J. Am. Statist. Assoc. 91, 1343–1370 (1996).
13. Cox, D. Principles of Statistical Inference (Cambridge University Press, 2006).
14. Toni, T., Ozaki, Y.-I., Kirk, P., Kuroda, S. & Stumpf, M.P.H. Elucidating the in vivo phosphorylation dynamics of the ERK MAP kinase using quantitative proteomics data and Bayesian model selection. Mol. Biosyst. 8, 1921–1929 (2012).

15. Gilks, W.R., Richardson, S. & Spiegelhalter, D.J. Markov Chain Monte Carlo in Practice (CRC Press, 1996).
16. Gutenkunst, R.N. et al. Universally sloppy parameter sensitivities in systems biology models. PLoS Comput. Biol. 3, 1871–1878 (2007).
17. Apgar, J.F., Witmer, D.K., White, F.M. & Tidor, B. Sloppy models, parameter uncertainty, and the role of experimental design. Mol. Biosyst. 6, 1890–1900 (2010).
18. Erguler, K. & Stumpf, M.P.H. Practical limits for reverse engineering of dynamical systems: a statistical analysis of sensitivity and parameter inferability in systems biology models. Mol. Biosyst. 7, 1593–1602 (2011).
19. Sunnåker, M. et al. Approximate Bayesian computation. PLoS Comput. Biol. 9, e1002803 (2013).
20. Tavaré, S., Balding, D.J., Griffiths, R.C. & Donnelly, P. Inferring coalescence times from DNA sequence data. Genetics 145, 505–518 (1997).
21. Beaumont, M.A., Zhang, W. & Balding, D.J. Approximate Bayesian computation in population genetics. Genetics 162, 2025–2035 (2002).
22. Kirk, P.D.W., Toni, T. & Stumpf, M.P. Parameter inference for biochemical systems that undergo a Hopf bifurcation. Biophys. J. 95, 540–549 (2008).
23. Golightly, A. & Wilkinson, D.J. Bayesian inference for stochastic kinetic models using a diffusion approximation. Biometrics 61, 781–788 (2005).
24. Bowsher, C.G. & Swain, P.S. Identifying sources of variation and the flow of information in biochemical networks. Proc. Natl. Acad. Sci. USA 109, E1320–E1328 (2012).
25. Hilfinger, A. & Paulsson, J. Separating intrinsic from extrinsic fluctuations in dynamic biological systems. Proc. Natl. Acad. Sci. USA 108, 12167–12172 (2011).
26. Sisson, S.A., Fan, Y. & Tanaka, M.M. Sequential Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA 104, 1760–1765 (2007).
27. Toni, T., Welch, D., Strelkowa, N., Ipsen, A. & Stumpf, M.P.H. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J. R. Soc. Interface 6, 187–202 (2009).
28. Beaumont, M.A., Cornuet, J.-M., Marin, J.-M. & Robert, C.P. Adaptive approximate Bayesian computation. Biometrika 96, 983–990 (2009).
29. Joyce, P. & Marjoram, P. Approximately sufficient statistics and Bayesian computation. Stat. Appl. Genet. Mol. Biol. 7, 26 (2008).
30. Nunes, M.A. & Balding, D.J. On optimal selection of summary statistics for approximate Bayesian computation. Stat. Appl. Genet. Mol. Biol. 9, 34 (2010).
31. Robert, C.P., Cornuet, J.-M., Marin, J.-M. & Pillai, N.S. Lack of confidence in approximate Bayesian computation model choice. Proc. Natl. Acad. Sci. USA 108, 15112–15117 (2011).
32. Fearnhead, P. & Prangle, D. Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. J. R. Stat. Soc. Ser. B Stat. Methodol. 74, 419–474 (2012).
33. Barnes, C.P., Filippi, S., Stumpf, M.P. & Thorne, T. Considerate approaches to constructing summary statistics for ABC model selection. Stat. Comput. 22, 1181–1197 (2012).
34. Toni, T. & Stumpf, M.P.H. Simulation-based model selection for dynamical systems in systems and population biology. Bioinformatics 26, 104–110 (2010).
35. Wilkinson, R.D. Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Stat. Appl. Genet. Mol. Biol. 12, 129–141 (2013).
36. Drovandi, C.C., Pettitt, A.N. & Faddy, M.J. Approximate Bayesian computation using indirect inference. J. R. Statist. Soc. Ser. C 60, 317–337 (2011).
37. Grelaud, A., Robert, C.P. & Marin, J.-M. ABC methods for model choice in Gibbs random fields. Comptes Rendus Mathematique 347, 205–210 (2009).
38. Thorne, T. & Stumpf, M.P.H. Graph spectral analysis of protein interaction network evolution. J. R. Soc. Interface 9, 2653–2666 (2012).
39. Pritchard, J.K., Seielstad, M.T., Perez-Lezaun, A. & Feldman, M.W. Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol. Biol. Evol. 16, 1791–1798 (1999).
40. Marjoram, P., Molitor, J., Plagnol, V. & Tavaré, S. Markov chain Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA 100, 15324–15328 (2003).
41. Lopes, J.S. & Beaumont, M.A. ABC: a useful Bayesian tool for the analysis of population data. Infect. Genet. Evol. 10, 826–833 (2010).
42. Toni, T., Jovanovic, G., Huvet, M., Buck, M. & Stumpf, M.P.H. From qualitative data to quantitative models: analysis of the phage shock protein stress response in Escherichia coli. BMC Syst. Biol. 5, 69 (2011).
43. Silk, D. et al. Designing attractive models via automated identification of chaotic and oscillatory dynamical regimes. Nat. Commun. 2, 489 (2011).
44. Liepe, J. et al. Calibrating spatio-temporal models of leukocyte dynamics against in vivo live-imaging data using approximate Bayesian computation. Integr. Biol. 4, 335–345 (2012).
45. Barnes, C.P., Silk, D., Sheng, X. & Stumpf, M.P.H. Bayesian design of synthetic biological systems. Proc. Natl. Acad. Sci. USA 108, 15190–15195 (2011).
46. Maclean, A.L., Lo Celso, C. & Stumpf, M.P.H. Population dynamics of normal and leukaemia stem cells in the haematopoietic stem cell niche show distinct regimes where leukaemia will be controlled. J. R. Soc. Interface 10, 20120968 (2013).
47. Csilléry, K., Blum, M.G.B., Gaggiotti, O.E. & Francois, O. Approximate Bayesian Computation (ABC) in practice. Trends Ecol. Evol. 25, 410–418 (2010).
48. Komorowski, M., Finkenstädt, B., Harper, C.V. & Rand, D.A. Bayesian inference of biochemical kinetic parameters using the linear noise approximation. BMC Bioinformatics 10, 343 (2009).
49. Komorowski, M., Costa, M.J., Rand, D.A. & Stumpf, M.P.H. Sensitivity, robustness, and identifiability in stochastic chemical kinetics models. Proc. Natl. Acad. Sci. USA 108, 8645–8650 (2011).
50. Golightly, A. & Wilkinson, D.J. Bayesian parameter inference for stochastic biochemical network models using particle Markov chain Monte Carlo. Interface Focus 1, 807–820 (2011).
51. Ale, A., Kirk, P. & Stumpf, M.P.H. A general moment expansion method for stochastic kinetic models. J. Chem. Phys. 138, 174101 (2013).
52. Cornuet, J.-M. et al. Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24, 2713–2719 (2008).
53. Cornuet, J.-M., Ravigné, V. & Estoup, A. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics 11, 401 (2010).
54. Bertorelle, G., Benazzo, A. & Mona, S. ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol. Ecol. 19, 2609–2625 (2010).
55. Dematté, L. & Prandi, D. GPU computing for systems biology. Brief. Bioinformatics 11, 323–333 (2010).
56. Zhou, Y., Liepe, J., Sheng, X., Stumpf, M.P.H. & Barnes, C. GPU accelerated biochemical network simulation. Bioinformatics 27, 874–876 (2011).
57. Vyshemirsky, V. & Girolami, M. BioBayes: a software package for Bayesian inference in systems biology. Bioinformatics 24, 1933–1934 (2008).
58. Golightly, A. & Wilkinson, D. Bayesian sequential inference for stochastic kinetic biochemical network models 13, 838–851 (2006).
59. Filippi, S., Barnes, C.P., Cornebise, J. & Stumpf, M.P.H. On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo. Stat. Appl. Genet. Mol. Biol. 12, 87–107 (2013).
60. Silk, D., Filippi, S. & Stumpf, M.P.H. Optimizing threshold-schedules for approximate Bayesian computation sequential Monte Carlo samplers: applications to molecular systems. Preprint at http://arxiv.org/abs/1210.3296 (2012).
61. Leuenberger, C. & Wegmann, D. Bayesian computation and model selection without likelihoods. Genetics 184, 243–252 (2010).
62. Wilkinson, D.J. Stochastic Modelling for Systems Biology (CRC Press, 2011).
63. Rand, D.A. Mapping global sensitivity of cellular network dynamics: sensitivity heat maps and a global summation law. J. R. Soc. Interface 5 (suppl. 1): S59–S69 (2008).
64. Secrier, M., Toni, T. & Stumpf, M.P.H. The ABC of reverse engineering biological signalling systems. Mol. Biosyst. 5, 1925–1935 (2009).
65. Liepe, J., Filippi, S., Komorowski, M. & Stumpf, M.P.H. Maximizing the information content of experiments in systems biology. PLoS Comput. Biol. 9, e1002888 (2013).
66. Vanlier, J., Tiemann, C.A., Hilbers, P.A.J. & van Riel, N.A.W. A Bayesian approach to targeted experiment design. Bioinformatics 28, 1136–1142 (2012).
67. Hindmarsh, A.C. ODEPACK, a systematized collection of ODE solvers, in Scientific Computing (eds. Stepleman, R.S. et al.) IMACS Transactions on Scientific Computation, Vol. 1, 55–64 (Elsevier, 1983).
68. Kloeden, P.E. & Platen, E. Numerical Solution of Stochastic Differential Equations (Springer, 1992).
69. Gillespie, D.T. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comput. Phys. 22, 403–434 (1976).
