Bio Articles2
Perspective https://doi.org/10.1038/s41592-023-01832-z
Major computational challenges exist in relation to the collection, curation,
processing and analysis of large genomic and imaging datasets, as well as the
simulation of larger and more realistic models in systems biology. Here we
discuss how a relative newcomer among programming languages—Julia—is
poised to meet the current and emerging demands in the computational
biosciences and beyond. Speed, flexibility, a thriving package ecosystem
and readability are major factors that make high-performance computing
and data analysis available to an unprecedented degree. We highlight how
Julia’s design is already enabling new ways of analyzing biological data and
systems, and we provide a list of resources that can facilitate the transition
into Julian computing.
Computers are tools. Like pipettes or centrifuges, they allow us to perform tasks more quickly or efficiently, and like microscopes, they give us new, more detailed insights into biological systems and data. Computers allow us to develop, simulate and test mathematical models of biology and compare models with complex datasets. As computational power evolved, solving biological problems computationally became possible, then popular and, eventually, necessary [1]. Entire fields such as computational biology and bioinformatics emerged. Without computers, the reconstruction of structures from X-ray crystallography, NMR or cryogenic electron microscopy methods would be impossible. The same goes for the 1000 Genomes Project [2], which used computer programs to assemble and analyze the DNA sequences generated. More recently, vaccine development has benefited from advances in algorithms and computer hardware [3].
Programming languages are also tools. They make it possible to instruct computers. Some languages are good at specific tasks (think Perl for string processing tasks or R for statistical analyses), whereas others, including C/C++ and Python, have been used with success across many different domains. In biomedical research, the prevailing languages have arguably been R [4] and Python [5]. Much of the high-performance backbone supporting computationally intensive research that is hidden from most users, however, continues to rely on C/C++ or Fortran. Computationally intensive studies are often initially designed and prototyped in R, Python or MATLAB and subsequently translated into C/C++ or Fortran for increased performance. This is known as the two-language problem [6].
This two-language approach has been successful but has limitations (Fig. 1a). When moving an implementation from one language to another, faster, programming language, verbatim translation may not be the optimal route: faster languages often provide the programmer with higher autonomy to choose how memory is accessed or allocated or to employ more flexible data structures [7]. Exploiting such features may involve a complete rewrite of the algorithm to ensure faster implementation or better scaling as datasets grow in size and complexity. This requires expertise across both languages, but also rigorous testing of the code in both languages.
Julia [8] is a relatively new programming language that overcomes the two-language problem. Users do not have to choose between ease of use and high performance. Julia has been designed to be easy to program in and fast to execute (Fig. 1b). This efficiency and the growing ecosystem of state-of-the-art application packages (Table 1 and Fig. 2) and introductions [7,9] make it an attractive choice for biologists.
1School of Mathematics and Statistics, University of Melbourne, Melbourne, Victoria, Australia. 2Melbourne Integrative Genomics, University of
Melbourne, Melbourne, Victoria, Australia. 3JuliaHub, Somerville, MA, USA. 4Medical Research Council Laboratory of Molecular Biology, Cambridge,
UK. 5Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA. 6RelationalAI, Berkeley, CA,
USA. 7Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA. 8Pumas-AI, Centreville, VA, USA. 9Departments of
Neuroscience and Biomedical Engineering, Washington University in St. Louis, St. Louis, MO, USA. 10School of BioSciences, The University of Melbourne,
Melbourne, Victoria, Australia. 11ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems, Melbourne, Victoria, Australia.
e-mail: [email protected]
Fig. 1 | Julia is a tool enabling biologists to discover new science. a, In the biological sciences, the most obvious alternatives to the programming language Julia are R, Python and MATLAB. Here we contrast the two potential pathways to new biology with a mountaineering analogy. The top of the mountain represents new biology [49]. There are two potential base camps for the ascent: base camp 1 (left, red) is R/Python/MATLAB. Base camp 2 (right, green) is Julia. To get to the top, the mountaineer, representing a researcher, needs to overcome certain obstacles, such as a glacier and a chasm. These represent research hurdles, such as large and diverse datasets or complex models. Starting at the Julia base camp, the mountaineer has access to efficient and effective tools, such as a bridge over the glacier and a rocket to simply fly over the chasm. These represent Julia’s top three language design features: abstraction, speed and metaprogramming. With these tools, the journey to the top of the mountain becomes much easier for the excursionist. Julia allows biologists to not be held back by the problems discussed in b and c. b, The two-language problem refers to having separate languages for algorithm development and prototyping (such as R or Python) and production runs (such as C/C++ or Fortran), respectively. Julia was designed to be good at both tasks, which can reduce programming efforts and software complexity. c, The expression problem refers to the effort required by users to define new (optimized) data types and functions that can be added to existing external code bases.
Biological systems and data are multifaceted by nature, and to describe them or model them mathematically requires a flexible programming language that can connect different types of highly structured data (Fig. 1c). Three hallmarks of the language make Julia particularly suitable for meeting current and emerging demands of biomedical science: speed, abstraction and metaprogramming.
In this article, we discuss each language feature and its relevance in the context of one concrete biological example per feature. An additional example per feature can be found in the Supplementary Information. Furthermore, in Supplementary Table 1, we provide a summary of why we believe Julia is a good programming language for biologists. Supporting online material is provided in a GitHub repository at
Package community | Topic | Packages
JuliaData | Data manipulation, storage, and input and output | DataFrames.jl, JuliaDB.jl, DataFramesMeta.jl and CSV.jl
JuliaPlots | Data visualization | Plots.jl, Makie.jl, StatsPlots.jl and PlotlyJS.jl
JuliaStats | Statistics and machine learning | Distributions.jl, GLM.jl, StatsBase.jl, Distances.jl, MixedModels.jl, TimeSeries.jl, Clustering.jl, MultivariateStats.jl and HypothesisTests.jl
BioJulia | Bioinformatics and computational biology | BioSequences.jl, BioStructures.jl, BioAlignments.jl, FASTX.jl and Microbiome.jl
JuliaImages | Image processing | Images.jl, ImageSegmentation.jl, ImageTransformations.jl and ImageView.jl
EcoJulia | Ecological research | SpatialEcology.jl, EcologicalNetworks.jl, Phylo.jl and Diversity.jl
SciML | Scientific machine learning | DifferentialEquations.jl, ModelingToolkit.jl, DiffEqFlux.jl and Catalyst.jl
FluxML | Machine learning | Flux.jl, Zygote.jl, MacroTools.jl, GeometricFlux.jl and Metalhead.jl
Related packages are organized in package communities. In this table, we present an overview of the package communities we consider to be most relevant to biologists.
Fig. 2 | Overview of Julia’s package ecosystem, presented by topic group. Julia consists of packages related to five main biological topics: bioinformatics, mathematical modeling, statistical and machine learning, data handling and visualization, and the integration of non-Julia code.
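As a small illustration of the last topic group, the sketch below calls the C standard library function `strlen` directly from Julia using the built-in `ccall`. It assumes a platform where the C runtime is already loaded into the Julia process (typical on Linux and macOS), and the wrapper name `c_strlen` is hypothetical. Packages such as RCall.jl or PyCall.jl provide higher-level bridges to other languages and are not shown here.

```julia
# Call the C standard library's `strlen` on a Julia string.
# `Cstring` passes the string as a NUL-terminated C string, and
# `Csize_t` is the Julia type that matches C's `size_t`.
function c_strlen(s::String)
    return Int(ccall(:strlen, Csize_t, (Cstring,), s))
end

n = c_strlen("crambin")   # number of bytes before the terminating NUL
```

No wrapper code or separate compilation step is needed here, which is the kind of low-friction interoperability the topic group refers to.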
[Figure 3 graphic omitted. Recoverable panel text: b, in Python, D = A × B + C involves 2 function calls and 2 allocations, whereas in Julia, D .= A .* B .+ C involves 1 function call and no allocation; c, function call costs are ~5 ns in Julia and ~150 ns in Numba.]
Fig. 3 | Julia’s speed feature. a, Speed-up examples relevant to biology. Left, comparison of the time required to calculate the mutual information for all possible pairs of genes of a single-cell dataset [13]. Right, benchmark of ODE solvers implemented in Julia, Fortran, C, MATLAB, Python and R for the Lotka–Volterra model (more systems are described in ref. 50). b, Schematic of the speed up of vectorizable code (as in a). c, Schematic of the speed up of nonvectorizable code (as in b).
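The vectorized case in Fig. 3b can be sketched in a few lines of Julia. The dotted operators fuse the multiplication and the addition into a single loop, and the in-place form `.=` writes the result into a preallocated array. The array names A, B, C and D follow the figure; the array length, the random values and the helper `fused_loop!` are arbitrary choices made for illustration.

```julia
# Element-wise D = A * B + C, written three ways.
n = 1_000
A = rand(n); B = rand(n); C = rand(n)

# 1. Fused broadcast that allocates a new result array.
D_new = A .* B .+ C

# 2. Fused broadcast that writes into an existing array,
#    so no temporary for A .* B and no new result array are created.
D = similar(A)
D .= A .* B .+ C

# 3. The explicit loop that the fused broadcast corresponds to.
function fused_loop!(D, A, B, C)
    @inbounds for i in eachindex(D, A, B, C)
        D[i] = A[i] * B[i] + C[i]
    end
    return D
end
D_loop = fused_loop!(similar(A), A, B, C)
```

All three variants compute the same values; they differ only in how many intermediate arrays are created along the way.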
shorter development times. Going from an initial idea to working code can be orders of magnitude faster than, for example, C/C++. This is in no small measure helped by the flexible Jupyter and Pluto.jl notebook user interfaces (which fulfill similar functions to, for example, R’s Shiny) and flexible software editing environments. Julia combines fast development with fast run-time performance and is therefore appropriate for both algorithm/method prototyping and time- and resource-intensive applications.
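To give a feel for code that reads like a prototype yet runs as a compiled loop, here is a minimal sketch of a Lotka–Volterra simulation using a plain forward Euler scheme. The function name, parameter values and step size are illustrative assumptions. Forward Euler is chosen only for brevity: it is not one of the solvers benchmarked in Fig. 3, and a real analysis would more likely use the solvers provided by DifferentialEquations.jl.

```julia
# Forward Euler integration of the Lotka–Volterra predator–prey model
#   dx/dt = α*x - β*x*y      (prey)
#   dy/dt = δ*x*y - γ*y      (predator)
function lotka_volterra_euler(x0, y0; α=1.5, β=1.0, γ=3.0, δ=1.0,
                              dt=1e-3, steps=10_000)
    x = Vector{Float64}(undef, steps + 1)
    y = Vector{Float64}(undef, steps + 1)
    x[1], y[1] = x0, y0
    for i in 1:steps
        dx = α * x[i] - β * x[i] * y[i]
        dy = δ * x[i] * y[i] - γ * y[i]
        x[i + 1] = x[i] + dt * dx
        y[i + 1] = y[i] + dt * dy
    end
    return x, y
end

prey, predator = lotka_volterra_euler(1.0, 1.0)
```

The explicit loop is written exactly as one would sketch the update rule on paper, and no separate translation into a compiled language is required.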
Fig. 5 | The abstraction feature in Julia. a, Abstract Julia code enables a flexible structural bioinformatics pipeline. The flow chart shows a pipeline that combines multiple Julia packages seamlessly together. This gives developers and users flexibility so that the effort and time required to generate new models and complex workflows is substantially reduced and collaboration is made easier. PDB, Protein Data Bank. b, An example pipeline showing the solving of the first part of the expression problem (an illustration of which is provided in Fig. 1) via the easy code base extension to new functions (step highlighted in blue). c, Left, an example pipeline showing the solving of another expression problem: extension to new types. The step highlighted in blue represents the point at which a new plot recipe is defined for a domain-specific type (that is, we demonstrate the extension of an existing code base to new types). Right, Julia code for defining a new type and a new plot recipe. This example is for the structure MyBioStruc, which captures the results of prediction algorithms of amino acid sequences based on data. It is defined with the fields predicted_AA (a vector of characters that represent the predicted AAs), certainty_AA (a vector of numbers quantifying the certainty for each predicted AA), study (a string naming the respective study that the prediction is based on) and alg (a string naming the respective prediction algorithm). With the macro @recipe, we can specify how the function plot(…) should work for our newly specified example type. Here we define that this should create a line plot of the predicted amino acids with the mean of the certainty of the prediction shown by the opacity of the line, specified by the Plots.jl package as α. More details on the selected example code are provided in the Supplementary Information.
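The type described in the caption, and the way multiple dispatch lets existing generic functions accept it, can be sketched without any plotting package. The field names follow the caption; the concrete field types, the helper `mean_certainty` and the example values are assumptions made for illustration. The `@recipe` definition shown in the figure is not reproduced here.

```julia
# A domain-specific type holding the output of an amino acid prediction.
struct MyBioStruc
    predicted_AA::Vector{Char}     # predicted amino acids (one-letter codes)
    certainty_AA::Vector{Float64}  # certainty of each prediction
    study::String                  # study the prediction is based on
    alg::String                    # prediction algorithm used
end

# A new function for the new type (hypothetical helper): the mean certainty,
# which the caption maps to the opacity of the plotted line.
mean_certainty(s::MyBioStruc) = sum(s.certainty_AA) / length(s.certainty_AA)

# Existing generic functions extended to the new type via multiple dispatch.
Base.length(s::MyBioStruc) = length(s.predicted_AA)
Base.getindex(s::MyBioStruc, i::Integer) = s.predicted_AA[i]
Base.eltype(::Type{MyBioStruc}) = Char
Base.iterate(s::MyBioStruc, state=1) =
    state > length(s) ? nothing : (s[state], state + 1)

pred = MyBioStruc(['T', 'T', 'C', 'C'], [0.9, 0.8, 0.6, 0.7],
                  "example study", "example algorithm")

# Generic code written without any knowledge of MyBioStruc now works on it.
sequence = join(pred)           # relies on `iterate`
n_cys = count(==('C'), pred)    # relies on `iterate`
```

Because `join` and `count` are written against the iteration interface rather than a concrete type, they accept `MyBioStruc` as soon as the few methods above exist; a plot recipe plays the same role for `plot`.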
and read the structure of the protein crambin from the Protein Data Bank. This can be done using the BioStructures.jl package [25] from the BioJulia organization, which provides the essential bioinformatics infrastructure. Protein structures can be viewed using Bio3DView.jl, which uses the 3Dmol.js JavaScript library [26] as Julia can easily connect to packages from other languages. We can show the distance map of the Cβ atoms using Plots.jl. While Plots.jl is not aware of this custom type, a Plots.jl recipe makes this straightforward. BioSequences.jl provides custom data types of sequences and allows us to represent the protein sequence efficiently. With this, BioAlignments.jl can be used to align our sequences of interest. This suite of packages can be used to carry out single-cell, full-length total RNA sequencing analysis [27] quickly and with ease. A few lines of code in BioStructures.jl allow us to define the residue contact graph using Graphs.jl, giving access to optimized graph operations implemented in Graphs.jl for further analysis, such as calculating the betweenness centrality of the nodes. If coding and analysis are performed in Pluto.jl, then updating one section updates the whole workflow, which assists exploratory analysis (Fig. 5b).
Packages can be combined to meet the specific needs of each study; for example, to generate protein ensembles and predict allosteric sites [28] or to carry out information theoretical comparisons using the MIToS.jl package [29]. In this example, we have used at least five different packages together seamlessly. Plots.jl, BioAlignments.jl and Graphs.jl do not depend on or know about BioStructures.jl, but can still be used productively alongside it (Fig. 5c). Abstraction means that the improvements in any of these packages will benefit users of BioStructures.jl, despite the packages not being developed with protein structures in mind.
Package composability is common across the Julia ecosystem and is enabled by abstract interfaces supported by multiple dispatch (that is, the ability to define multiple versions of the same function with different argument types). Programmers can define standard functions such as addition and multiplication for their own types. Abstraction means that functions in unrelated packages often just work despite knowing nothing about the custom types. This is rarely seen in languages such as Python, R and C/C++, where the behavior of an object
is tightly confined and combining classes and functions from different projects requires much more (of what is known as) boilerplate code.
For example, the Biopython project [30] has become a powerful package covering much of bioinformatics. However, extensions to Biopython objects are generally added to (an increasingly monolithic) Biopython, rather than to independent packages. This can lead to objects and algorithms that have the difficult task of fitting all use cases, including their dependencies, simultaneously [31]. In contrast, Julia’s composability facilitates writing generic code that can be used beyond its intended application domain. Tables.jl, for example, provides a common interface for tabular data, allowing generic code for common tasks on tables. Currently, some 131 distinct packages draw on this common core for purposes far beyond the initially conceived application scope. This is an example that showcases how abstraction ensures the interoperability and longevity of code.
The code for this example can be found at https://github.com/ElisabethRoesch/Perspective_Julia_for_Biologists/tree/main/examples/Abstraction/Example_Structural_bioinformatics_with_composable_packages.
Metaprogramming
As our knowledge of the complexity of biological systems increases, so does our need to construct and analyze mathematical models of these systems (Fig. 6). Currently, most modeling studies in biology rely on programming languages that treat source code as static. Once written, it can be processed into loaded and executing code, but it is never changed while running. We can compare this linear control process with the central dogma of biology: source code (DNA) is transformed into loaded code (RNA) and executing code (protein). We now know that this process (DNA⟶RNA⟶protein) is not linear and unidirectional. RNA and proteins can alter how and when DNA is expressed. Programming languages that support metaprogramming break the linear flow of the computer program in an analogous manner (Fig. 6a). With metaprogramming, source code can be written that is processed into loaded and executing code and that can be modified during run time. This shifts our perception from static software to code as a dynamic instance when the program can modify aspects of itself during run time.
Metaprogramming originated in the LISP programming language in the early days of artificial intelligence research. It enables a form of reflection and learning by the software, but the ability of a program to modify computer code needs to be channeled very carefully. In Julia, this is done via a feature called hygienic macros [32]. These are flexible code templates, specified in the program, that can be manipulated at execution time. They are called hygienic because they prohibit accidentally using variable names (and thus memory locations) that are defined and used elsewhere. These macros can be used to generate repetitive code efficiently and effectively.
However, there are other uses that can enable new research, and this includes the development of mathematical models of biological systems. Unlike in physics, first principles (the conservation of energy, momentum and so on) offer little guidance as to how we should construct models of biological processes and systems. For these notoriously complicated biological systems, trial and error, coupled with biological domain expertise and state-of-the-art statistical model selection, is required [33]. Great manual effort is spent on the formulation of mathematical models, the exploration of their behavior and their adaptation in light of comparisons with data. Metaprogramming (or the abilities of introspection and reflection during run time [32]) and the ability to automate parts of the modeling process open up enormous scope for new approaches to modeling biological systems (Fig. 6b), including whole cells (see Supplementary Information).
Example: biochemical reaction networks
Mathematical models of biochemical reaction networks allow us to analyze biological processes and make sense of the bewilderingly complex systems underlying cellular function [34,35]. However, the specification of mathematical models is challenging and requires us to specify all of our assumptions explicitly. We then have to solve these models based on these assumptions. Analyzing a given reaction network can involve the solution, for example, of ordinary differential equations, delay differential equations, stochastic differential equations (SDEs) or discrete-time stochastic processes. To create instances of each of these models would, in languages such as C/C++ or Python, typically require the writing of different snippets of code for each modeling framework. In Julia, via metaprogramming, different models can be generated automatically from a single block of code. This simplifies workflows and makes them more efficient, but also removes the possibility of errors due to model inconsistencies.
For example, we can consider the ERK phosphorylation process shown in Fig. 6b [36]. Here ERK is doubly phosphorylated (by its cognate kinase, MEK), upon which it can shuttle into the nucleus and initiate changes in gene expression. Its role and importance have made ERK a target of extensive further analysis, and modeling has helped to shed light on its function and role in cell fate decision-making systems [37]. This small system, albeit one of great importance and subtlety, forms building blocks for larger, more realistic biochemical reaction and signal transduction [38] models.
In Julia, using the package Catalyst.jl [39], this model can be written directly in terms of its reactions, with the corresponding rates. Source code is human readable and differs minimally from the conventional chemical reaction systems shown in Fig. 6c.
The science is encapsulated in this little snippet. Solving of the reaction systems then proceeds by calling the appropriate simulation tool from DifferentialEquations.jl. For a deterministic model, the reaction network is directly converted into a system of ordinary differential equations (via ODESystem). The same reaction network can be directly converted into a model that is specified by SDEs (via SDEProblem) or a discrete-time stochastic process model (via DiscreteProblem). Each of these cases leads to the creation of a distinct model that can be simulated or analyzed; yet, all of the models share the underlying structure of the same reaction network. To simulate one of the resulting models, the user needs to specify only the necessary assumptions required for a simulation (that is, the parameter values and initial conditions), as well as any further assumptions required that are specific to the model type (for example, the choice of noise model for a system of SDEs). Adapting the model to include nuclear shuttling [40] of ERK, as in Fig. 6c, or extrinsic noise upstream of ERK [36] is easily achieved using metaprogramming.
The fitting of models to data, or estimation of their parameters, is also supported by the Julia package ecosystem. Parameter estimation by evaluating the likelihood, the posterior distribution or a cost function is straightforward using the Optim.jl [41] or JuMP.jl [42] packages. Also, because of Julia’s speed, it has become much easier to deploy Bayesian inference methods. Here, too, metaprogramming helps tools such as the probabilistic programming tool Turing.jl [43]. Approximate Bayesian computation approaches [44] also benefit from Julia’s speed, abstraction and metaprogramming and are implemented in GpABC.jl [14].
The code for this example can be found at https://github.com/ElisabethRoesch/Perspective_Julia_for_Biologists/tree/main/examples/Metaprogramming/Example_Biochemical_reaction_networks.
Outlook
Computer languages, like human languages, are diverse and changing to meet new demands. When selecting a programming language, we have many choices, but often they reduce to essentially two options: using a widely used language that everybody else is using or using the best language for the problem. Traditional languages have an enviable track record of success in biological research. A frightening proportion of the Internet and modern information infrastructure probably depends on legacy software that would not pass modern quality control. However, it does the job, for the moment. Similarly, scientific
Fig. 6 | Julia’s metaprogramming feature. a, Illustration of metaprogramming and an analogy to the central dogma of molecular biology. Similar to how a transcription factor, initially encoded in DNA, can control gene expression and modify RNA levels of an organism, with metaprogramming we can create code with a feedback effect. b, An example application of metaprogramming in biology. Metaprogramming is especially helpful for large-scale, automated model development. We can write code that adapts the model definition automatically (for example, in light of new data or based on how it interacts with other submodels). For example, when constructing models of cellular systems V1, V2, ..., Vn, we can combine structurally similar models for the different MAP kinases present in human cells and build compartmental models by explicitly modeling the kinase dynamics in the nucleus and cytosol [40]. c, Example workflow of model construction. The adaptation process of models could, for example, start with a theoretically inferred mathematical description, captured via the @reaction_network syntax of the Julia package Catalyst.jl. Subsequently, given experimental data, we evaluate an objective function of the current model, capturing the descriptiveness of the model in light of the data. Depending on the outcome of this evaluation, the model will be updated (for example, by adding new reactions to the model via the macro @add_reactions). More details on the selected example code are provided in the Supplementary Information.
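The idea of turning reaction-like notation into model code can be illustrated with a toy macro written in plain Julia. This is a sketch of the general technique only: the names `Reaction`, `species`, `parse_reaction` and `@toy_reaction`, as well as the species and rate names in the usage line, are hypothetical, and the macro does not reproduce the syntax or internals of Catalyst.jl's `@reaction_network`. In this sketch the macro rewrites the code before it is evaluated. It also shows hygiene in practice: the rate expression is explicitly escaped so that it refers to the caller's variable.

```julia
# A minimal container for one reaction (hypothetical, for illustration only).
struct Reaction
    rate::Float64
    substrates::Vector{Symbol}
    products::Vector{Symbol}
end

# Collect species names from an expression such as `M` or `M + MKK`.
species(s::Symbol) = [s]
function species(ex::Expr)
    (ex.head == :call && ex.args[1] == :+) ||
        error("expected a species or a sum of species, got: $ex")
    return reduce(vcat, [species(a) for a in ex.args[2:end]])
end

# Split `lhs --> rhs` into substrate and product lists. Depending on how the
# parser represents `-->`, the arrow is either the expression head or the
# first argument of a call, so both forms are accepted.
function parse_reaction(ex)
    arrow = Symbol("-->")
    if ex isa Expr && ex.head == arrow
        lhs, rhs = ex.args
    elseif ex isa Expr && ex.head == :call && ex.args[1] == arrow
        lhs, rhs = ex.args[2], ex.args[3]
    else
        error("expected an expression of the form `A --> B`")
    end
    return species(lhs), species(rhs)
end

# The macro receives unevaluated code, inspects it and returns new code.
macro toy_reaction(rate, ex)
    subs, prods = parse_reaction(ex)
    return :(Reaction($(esc(rate)), $subs, $prods))
end

k_phos = 0.3
r = @toy_reaction(k_phos, M + MKK --> MMKK)
```

The species names on the last line are never defined as variables; the macro only reads them as symbols. A model-building package can apply the same mechanism to generate equations instead of a simple record.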
progress is possible with legacy software. Python and R are far from legacy and have plenty of life in them, and there are tools that allow us to overcome their intrinsic slowness [45].
Here we have tried to explain why we consider Julia a language for the next chapter in the quantitative and computational life sciences. Julia was designed to meet the current and future demands of scientific and data-intensive computing [46]. It is an unequivocally modern language and it does not have the ballast of a long track record going all the way to the pre-big data days. The deliberate choices made by the developers furthermore make it fast and give developers and users of the language a level of flexibility that is difficult to achieve in other common languages such as R and Python, but also C/C++ and Fortran. On top of all of that is a state-of-the-art package manager. All packages and Julia itself are maintained via Git, which makes installing and updating the Julia language, packages and their dependencies straightforward [6].
Julia has a smaller user base than R and Python, but it is growing. In some domains these languages have truly impressive package ecosystems. R and the associated Bioconductor project, in particular, have been instrumental in bringing sophisticated bioinformatics, data analysis and visualization methods to biologists. For many, they have also served as a gateway into programming. In other application areas (notably, the simulation of dynamical systems), Julia has leapfrogged the competition [47]. Many of the speed advantages of Julia come from
just-in-time compilation, which underlies and enables good run-time performance. This, however, takes time and causes what is known as latency. Latency can be a problem for applications with hard real-time constraints, such as being the embedded code on a medical device that requires strict accurate updates at 100-ms intervals.
Julia was designed to meet the current and future demands of scientific and data-intensive computing. The Julia alternative that arguably has the most traction is Rust. Rust is an emerging language that has syntactic similarity to C++ but is better at managing memory safely. It detects discrepancies of type assignments at compile time and not just at run time, as is the case for C/C++. For this reason, it is being used in, for example, the Linux kernel. In the biological domain, it could become a choice for medical devices (as we can control latency) or bioinformatics servers that would previously have been developed in Java or C/C++.
These advantages of a new language need to be balanced against the convenience of programmers who are able to tap into the collective knowledge of vast user communities. All languages have started small and had to develop user bases. The Julia community is growing, including in the biomedical sciences, and it appears to be acutely aware of the needs of newcomers to Julia (and under-represented minorities in the computational sciences more generally [48]; see, for example, https://julialang.org/diversity/ for details), which makes the switch to Julia easier [9].
We have described the three main language design features that make Julia interesting for scientific computing: speed, abstraction and metaprogramming. We have provided some intuition that fills these concepts with life, and we have illustrated how they can be exploited in different biological domains, and how speed, abstraction and metaprogramming together enable new ways of performing biological research. Even though we have introduced these features separately, they are deeply intertwined. For example, a lot of the speed-up opportunities of Julia derive from the language’s abstraction powers; abstraction in turn makes metaprogramming easier.

References
1. Tomlin, C. J. & Axelrod, J. D. Biology by numbers: mathematical modelling in developmental biology. Nat. Rev. Genet. 8, 331–340 (2007).
2. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
3. Robson, B. Computers and viral diseases. preliminary bioinformatics studies on the design of a synthetic vaccine and a preventative peptidomimetic antagonist against the SARS-CoV-2 (2019-nCoV, COVID-19) coronavirus. Comput. Biol. Med. 119, 103670 (2020).
4. Seefeld, K. & Linder, E. Statistics Using R with Biological Examples (K. Seefeld, 2007).
5. Ekmekci, B., McAnany, C. E. & Mura, C. An introduction to programming for bioscientists: a Python-based primer. PLoS Comput. Biol. 12, e1004867 (2016).
6. Sengupta, A. & Edelman, A. Julia High Performance (Packt Publishing, 2019).
7. Nazarathy, Y. & Klok, H. Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence (Springer, 2021).
8. Bezanson, J., Edelman, A., Karpinski, S. & Shah, V. B. Julia: a fresh approach to numerical computing. SIAM Rev. 59, 65–98 (2017).
9. Lauwens, B. & Downey, A. Think Julia: How to Think like a Computer Scientist (O’Reilly Media, 2021).
10. Marx, V. The big challenges of big data. Nature 498, 255–260
13. Chan, T. E., Stumpf, M. P. & Babtie, A. C. Gene regulatory network inference from single-cell data using multivariate information measures. Cell Syst. 5, 251–267.e3 (2017).
14. Tankhilevich, E. et al. GpABC: a Julia package for approximate Bayesian computation with Gaussian process emulation. Bioinformatics 36, 3286–3287 (2020).
15. Innes, M. Flux: elegant machine learning with Julia. J. Open Source Softw. 3, 602 (2018).
16. Rackauckas, C. & Nie, Q. DifferentialEquations.jl – a performant and feature-rich ecosystem for solving differential equations in Julia. J. Open Res. Softw. 5, 15 (2017).
17. Chen, J. et al. Spatial transcriptomic analysis of cryosectioned tissue samples with Geo-seq. Nat. Protoc. 12, 566–580 (2017).
18. Mahon, S. S. M. et al. Information theory and signal transduction systems: from molecular information processing to network inference. Semin. Cell Dev. Biol. 35, 98–108 (2014).
19. Meyer, P. E., Lafitte, F. & Bontempi, G. minet: a R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9, 461 (2008).
20. Bates, D. Julia MixedModels from R. https://rpubs.com/dmbates/377897 (2018).
21. Lange, K. Algorithms from the Book (SIAM, 2020).
22. Oliveira, S. & Stewart, D. E. Writing Scientific Software: a Guide to Good Style (Cambridge Univ. Press, 2006).
23. Alyass, A., Turcotte, M. & Meyre, D. From big data analysis to personalized medicine for all: challenges and opportunities. BMC Med. Genom. 8, 33 (2015).
24. Gomez-Cabrero, D. et al. Data integration in the era of omics: current and future challenges. BMC Syst. Biol. 8, I1 (2014).
25. Greener, J. G., Selvaraj, J. & Ward, B. J. BioStructures.jl: read, write and manipulate macromolecular structures in julia. Bioinformatics 36, 4206–4207 (2020).
26. Rego, N. & Koes, D. 3Dmol.js: molecular visualization with WebGL. Bioinformatics 31, 1322–1324 (2014).
27. Hayashi, T. et al. Single-cell full-length total RNA sequencing uncovers dynamics of recursive splicing and enhancer RNAs. Nat. Commun. 9, 619 (2018).
28. Greener, J. G., Filippis, I. & Sternberg, M. J. Predicting protein dynamics and allostery using multi-protein atomic distance constraints. Structure 25, 546–558 (2017).
29. Zea, D. J., Anfossi, D., Nielsen, M. & Marino-Buslje, C. MIToS.jl: mutual information tools for protein sequence analysis in the Julia language. Bioinformatics 33, 564–565 (2017).
30. Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
31. Kunzmann, P. & Hamacher, K. Biotite: a unifying open source computational biology framework in Python. BMC Bioinformatics 19, 346 (2018).
32. Perera, R. Programming languages for interactive computing. Electron. Notes Theor. Comput. Sci. 203, 35–52 (2008).
33. Kirk, P. D. W., Babtie, A. C. & Stumpf, M. P. H. Systems biology (un)certainties. Science 350, 386–388 (2015).
34. Kirk, P., Thorne, T. & Stumpf, M. P. Model selection in systems and synthetic biology. Curr. Opin. Biotechnol. 24, 767–774 (2013).
35. Warne, D. J., Baker, R. E. & Simpson, M. J. Simulation and inference algorithms for stochastic biochemical reaction networks:
(2013). from basic concepts to state-of-the-art. J. R. Soc. Interface 16,
11. Björnsson, B. et al. Digital twins to personalize medicine. Genome 20180943 (2019).
Med. 12, 4 (2019). 36. Filippi, S. et al. Robustness of MEK-ERK dynamics and origins of
12. Laubenbacher, R., Sluka, J. P. & Glazier, J. A. Using digital twins in cell-to-cell variability in MAPK signaling. Cell Rep. 15, 2524–2535
viral infection. Science 371, 1105–1106 (2021). (2016).
37. Michailovici, I. et al. Nuclear to cytoplasmic shuttling of ERK promotes differentiation of muscle stem/progenitor cells. Development 141, 2611–2620 (2014).
38. MacLean, A. L., Rosen, Z., Byrne, H. M. & Harrington, H. A. Parameter-free methods distinguish Wnt pathway models and guide design of experiments. Proc. Natl Acad. Sci. USA 112, 2652–2657 (2015).
39. Loman, T. E. et al. Catalyst: fast biochemical modeling with Julia. Preprint at bioRxiv https://doi.org/10.1101/2022.07.30.502135 (2022).
40. Harrington, H. A., Feliu, E., Wiuf, C. & Stumpf, M. P. Cellular compartments cause multistability and allow cells to process more information. Biophys. J. 104, 1824–1831 (2013).
41. Mogensen, P. K. & Riseth, A. N. Optim: a mathematical optimization package for Julia. J. Open Source Softw. 3, 615 (2018).
42. Dunning, I., Huchette, J. & Lubin, M. JuMP: a modeling language for mathematical optimization. SIAM Rev. 59, 295–320 (2017).
43. Ge, H., Xu, K. & Ghahramani, Z. Turing: a language for flexible probabilistic inference. In Proc. 21st International Conference on Artificial Intelligence and Statistics 1682–1690 (Proc. Machine Learning Res., 2018).
44. Liepe, J. et al. A framework for parameter estimation and model selection from experimental data in systems biology using approximate Bayesian computation. Nat. Protoc. 9, 439–456 (2014).
45. Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
46. Stanitzki, M. & Strube, J. Performance of Julia for high energy physics analyses. Comput. Softw. Big Sci. 5, 10 (2021).
47. Rackauckas, C. et al. Accelerated predictive healthcare analytics with Pumas, a high performance pharmaceutical modeling and simulation platform. Preprint at bioRxiv https://doi.org/10.1101/2020.11.28.402297 (2020).
48. Whitney, T. & Taylor, V. Increasing women and underrepresented minorities in computing: the landscape and what you can do. Computer 51, 24–31 (2018).
49. Sharpe, J. Computer modeling in developmental biology: growing today, essential tomorrow. Development 144, 4214–4225 (2017).
50. Rackauckas, C. Benchmark of ODE solvers in Julia. https://github.com/SciML/MATLABDiffEq.jl (2019).

Acknowledgements
We thank all attendees of the Birds of a Feather session Julia for Biologists at JuliaCon 2021; D. F. Gleich for allowing us to run an experiment on his servers; and R. Patro for discussions about Rust. E.R. acknowledges financial support through a University of Melbourne PhD scholarship. A.L.M. acknowledges support from the National Science Foundation (DMS 2045327). T.E.H. acknowledges NIH 1UF1NS108176. The information, data and work presented herein was funded in part by the Advanced Research Projects Agency—Energy under award numbers DE-AR0001222 and DE-AR0001211, as well as National Science Foundation award number IIP-1938400. The views and opinions of the authors expressed herein do not necessarily state or reflect those of the US Government or any agency thereof. M.P.H.S. acknowledges funding from the University of Melbourne Driving Research Momentum initiative and Volkswagen Foundation Life? program grant (grant number 93063), as well as support through an Australian Research Council Laureate Fellowship.

Author contributions
E.R. and M.P.H.S. conceived of the concept of the project and were in charge of the overall direction and planning. All authors contributed to writing the manuscript and have read and approved the final version.

Competing interests
E.R. is a Sales Engineer at JuliaHub. C.R. is the Vice President of Modeling and Simulation at JuliaHub and Director of Scientific Research at Pumas-AI. T.E.H. is a steward of the Julia project. H.N. is a Senior Computer Scientist at RelationalAI. J.G.G., A.L.M. and M.P.H.S. declare no competing interests.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41592-023-01832-z.

Correspondence should be addressed to Michael P. H. Stumpf.

Peer review information Nature Methods thanks Nico Stuurman and the other, anonymous, reviewers for their contribution to the peer review of this work. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team.

Reprints and permissions information is available at www.nature.com/reprints.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

© Springer Nature America, Inc. 2023, corrected publication 2023