Smith 2015
1093/bib/bbu030
Advance Access published on 1 September 2014
Buying in to bioinformatics: an introduction to commercial sequence analysis software
David Roy Smith
Submitted: 25th June 2014; Received (in revised form): 7th August 2014
Abstract
Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. Whether you are just beginning your foray into molecular sequence analysis or are an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry, detached world of bioinformatics.
Keywords: bioinformatics software; CLC bio; Geneious; genome assembly; nucleotide alignment; phylogenetics software
Corresponding author. David Roy Smith, University of Western Ontario, London, Ontario N6A 5B7, Canada. E-mail:
[email protected]
David Roy Smith is an assistant professor of biology at the University of Western Ontario, where he studies genome evolution of
eukaryotic microbes. He can be found online at www.arrogantgenome.com and @arrogantgenome.
Table 1: Examples, features and comparisons of some commonly used commercial bioinformatics software suites
[Table body not recoverable from the extracted text. Surviving column headings: Software; Evolutionary analyses (d); Database searching (e); Plug-ins; Workflows; Teaching suitability. Surviving software names: MacVector & Assembler, VectorNTI Advance, Sequencher, NextGENe.]
a Approximate price of a single-user academic license. Prices were taken directly from company websites (as of 1 June 2014) or were obtained by sales representatives sometime between January and June 2014. Many companies offer a range of pricing and licensing options, and frequently have promo deals.
b Runs on the following platforms: Mac (M), Windows (W) and Linux (L).
c Can store, organize and analyse (e.g. assemble or map to a reference sequence) next-generation sequencing data. In some cases, de novo assembly features are missing.
d Contains some tools for studying molecular evolution, such as those for performing multiple sequence alignments, phylogenetic analyses and/or repeat identification.
e Is able to connect and interact with online sequence databases, such as GenBank.
✓ = yes, ✗ = no

prices in US dollars) for a student license of Geneious, which allowed me to install the software on a single computer. As Geneious increases in popularity, so does its price tag. As of May 2014, a student license costs $395 (a standard academic one is $795), which still makes it among the least expensive all-in-one commercial suites on the market. In comparison, stand-alone academic licenses of the
USA) are around $6000 and $2500, respectively (Table 1).
matics platforms, which have varied widely in price, usability and quality. In several cases, the costs of
prepared to be pestered.
maintenance, upgrades and support. Shortly after I bought my student license for Geneious, the firm released a new version of the software. Because this occurred within 1 year of my purchasing the program, I was able to upgrade to the newest version for free. Geneious and other bioinformatics manufacturers have recently switched to ‘version-based licensing’, meaning that users receive free updates for their version of the software (e.g. switching from v1.1 to v1.2), no matter when they are released, but access to newer versions (e.g. switching from v1 to v2) requires an upgrade, which typically costs anywhere from 25 to 75% of the software list price.
Last year, for approximately $6000, I purchased as part of a package deal a single academic license of CLC Genomics Workbench and a genome finishing plug-in (more on plug-ins later). Enrollment in the maintenance, upgrade and support program for the first 12 months, which was mandatory, was an additional $1500, making the initial cost of the software $7500. Renewal of the maintenance program was 25% of the purchase price per year, and, most importantly, was automatic, ‘unless terminated in writing by one of the involved parties (CLC bio or the customer) not later than 3 months before the beginning of the next calendar year’. In other words, 9 months after buying the software, I was sent an invoice for $1500, with 2% interest per month.
Although costly, subscribing to the maintenance agreement can be wise. Commercial bioinformatics programs (Table 1), such as Geneious, CLC Genomics Workbench and Lasergene, frequently undergo major changes, which can significantly improve the software. In the past, I have regretted not renewing certain software, and more than once I have bought programs anew at full price because I let the maintenance period expire.
Before investing in a bioinformatics package, there are other important details to consider. I suggest asking about the rules on moving the software to another computer, in case, for example, you buy a new laptop or your old one breaks down. I have found that most companies allow users to transfer their software license to a different computer. But doing so normally requires contacting user support for a new software activation key, and if you have let your maintenance agreement expire, then you might have to renew it before being able to migrate the software. Similarly, if you update your computer operating system—from Apple OS X 10.8 to 10.9, for instance—your bioinformatics package might have to be upgraded as well. Most bioinformatics companies offer their software for both Windows and Apple platforms, and some, including Geneious and CLC bio, have Linux versions too, so in most cases, it is possible to switch operating systems completely and continue running the program.
Things get even more complicated when purchasing network (or ‘floating’) licenses of bioinformatics programs. Unlike a single computer license, which works only on one computer, a network/floating license allows multiple people to use a bioinformatics package simultaneously by logging on to a network computer (e.g. a powerful computer housed in the lab) and running the program from it. The number of people that can log on depends on the number of floating licenses that were purchased. Network/floating licenses are more expensive (typically twice the price) than their single-computer counterparts, but they can be more economical for big labs or classroom settings, where purchasing multiple single-user licenses makes less sense. Floating licenses can also be convenient for groups that have a high turnover—such as those with a lot of summer students and undergraduate volunteers—as they allow software key codes to be issued to individual lab members and then taken back once the member leaves. Sequencher (Table 1) offers a ‘hardkey’ option, whereby the user is sent a USB dongle after purchasing the software. Sequencher can then be loaded onto as many computers as the owner wants—all that is required to activate the software is plugging in the USB key. But, as I can attest, USB dongles are easy to misplace (and, if issued from Sequencher, expensive and inconvenient to replace).
Cloud computing has also arrived in bioinformatics [20]. Companies like DNAnexus, InterpretOmics, and others are selling bioinformatics as a service, whereby consumers buy online access to powerful computers and their associated software tools, analysis pipelines and data storage and sharing capabilities. The sequencing giant Illumina sells online access to their genomics cloud-computing infrastructure BaseSpace—10 terabytes of storage will run you $12 000 per year. Alternatively, the popular web-based platform Galaxy is a free, open-source, cloud-based bioinformatics tool. It is safe to assume that bioinformatics clouds will only grow larger and more popular over the next few years and are where the most innovative new software will be based.
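The maintenance figures quoted above for the CLC Genomics Workbench purchase lend themselves to a quick back-of-the-envelope calculation. The following Python sketch is a minimal illustration, not a vendor pricing tool: it takes the reported numbers ($6000 purchase price, $1500 mandatory first-year maintenance, renewal at 25% of the purchase price per year) and totals them over several years. The class name `License`, the function `late_invoice` and the choice to compound the 2% monthly interest are assumptions made for illustration; the text above does not say how the interest accrues.

```python
from dataclasses import dataclass


@dataclass
class License:
    """Hypothetical cost model for a single commercial software license."""
    purchase_price: float          # price paid up front
    first_year_maintenance: float  # mandatory first-year maintenance fee
    renewal_rate: float            # yearly renewal as a fraction of purchase price

    def cumulative_cost(self, years: int) -> float:
        """Total spend after `years` of ownership, renewing maintenance every year."""
        if years < 1:
            raise ValueError("years must be at least 1")
        renewals = (years - 1) * self.renewal_rate * self.purchase_price
        return self.purchase_price + self.first_year_maintenance + renewals


def late_invoice(amount: float, monthly_interest: float, months_late: int) -> float:
    """Amount owed on an overdue invoice, ASSUMING interest compounds monthly."""
    return amount * (1 + monthly_interest) ** months_late


# Figures reported in the text for the CLC Genomics Workbench package deal.
clc = License(purchase_price=6000, first_year_maintenance=1500, renewal_rate=0.25)

for year in range(1, 4):
    print(f"Year {year}: ${clc.cumulative_cost(year):,.0f} spent in total")
```

With these inputs the running total is $7500 after the first year, $9000 after the second and $10 500 after the third, which is one reason to ask about renewal terms before buying.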
Figure 1: The tools and features commonly found in commercial bioinformatics software packages, and what to
keep in mind when purchasing one.
But what does the software actually do?
You have paid your money and decided on the best maintenance and licensing options for your needs. Now what? Well, it is time to start examining molecular sequence data and making some big discoveries, of course. Commercial bioinformatics packages bring together, into a single browser-based platform, a diversity of nucleotide and protein analysis tools (Figure 1). These tools do everything from simple pairwise alignments to restriction site and gene predictions to whole genome and transcriptome assemblies. Given the prevalence of high-throughput sequencing in life science research, many of the tools are designed for analysing, visualizing and arranging NGS information.
One of the most sought after and marketed features of commercial bioinformatics software is its ability to perform fast, efficient and high-quality de novo assemblies of NGS data—taking millions, even billions, of single or paired-end sequencing reads and assembling them into contigs. Go to any of the big bioinformatics software websites and you will find statements like ‘Dominating the high-throughput sequencing data analysis challenge’, ‘Quick and accurate de novo assembly on a desktop computer’ and ‘Next-gen sequence assembly with a clear graphical interface’. These kinds of claims are often associated with a white paper describing the software’s de novo assembler, including its algorithm, speed and accuracy, how well it performs on standard datasets, such as the human genome, and how it stacks up against other brand-name and open-source assemblers. White papers, however, do tend to present commercial software in an overly positive light and—unlike open-source programs—only a few of the widely used proprietary tools have undergone peer review.
Commercial browser-based assemblers once had a reputation for being slow, memory-expensive and inferior to the free open-source alternatives. Early on, I admittedly struggled to generate quality assemblies, even of small genomes, using commercial programs. In recent years, however, proprietary assembly algorithms have improved immensely and are now used by some of the top academic and industrial research laboratories in the world. With software like CLC Genomics Workbench v7, I have been able to assemble draft genome and transcriptome sequences of microalgae from my laptop computer, which has 16 GB of memory and an Intel Core i7 processor. Many teams are using proprietary tools to assemble complex eukaryotic nuclear genomes, including those of land plants. But these kinds of assemblies
require large amounts of time, resources and computing power.
Commercial assemblers, unlike certain open-source ones, are also great at handling data from different sequencing platforms, such as assembling a mixture of Illumina, 454, PacBio and Sanger reads (Table 1); in fact, for many researchers, this is a key selling point. In March 2014, for example, Northwestern University purchased an organization-wide license of Lasergene, providing all faculty, staff and students with access to the software [21]. Similarly, the J. Craig Venter Institute has been using ‘CLC bio’s enterprise platform since 2009 and currently uses it on more than 30 research grants, including their work as part of the Human Microbiome Project’ [22].
Read mapping, which is when sequencing reads are aligned to a reference, such as an entire chromosome or genome, is another core feature of commercial bioinformatics packages. As with the de novo assemblers, bioinformatics companies regularly boast about their highly tuned, ultra-fast mapping algorithms for reference-guided alignments. CLC bio maintains that their ‘read mapper not only maps more than 1.3 billion Illumina reads (100 nt, paired-end) in less than 5 hours, but [that it] also achieves consistently high mapping accuracy even for complex read data, such [as those] originating from the PacBio RS system’ [23]. They go on to argue that the CLC ‘mapper consistently outperforms the market in all major disciplines’, including the open-source peer-reviewed mapping algorithms Bowtie 2 and BWA [23]. Geneious makes similar claims about their proprietary mapper: ‘Six read mapping algorithms were evaluated on Illumina HiSeq and Ion Torrent sequence data from an Escherichia coli—BWA (0.6.2-r126), Bowtie 1 (0.12.8), Bowtie 2 (2.0.0-beta7), SMALT (0.6.4), SOAP2 (2.20) and Geneious (6.0.3). The results demonstrate that the Geneious Read Mapper produces superior results to the other mapping algorithms on these data sets’ [24]. The claims can be overstated, but in my experience commercial read mappers are as good as or outperform many of the open-source alternatives.
The ultimate test for any assembler or read mapper is whether it is cited in peer-reviewed journals. There is no question that open-source programs are cited more than proprietary ones. The paper presenting the mapper Bowtie 2, for instance, has received 700 citations in just 2 years [25]. But citations for commercial software suites, especially their assembly and mapping algorithms, are on the rise and catching up to their open-source counterparts. A keyword search of ‘CLC Genomics’ in Google Scholar returns >2000 hits. Visit the Geneious blog (http://blog.geneious.com) and you will find a section called ‘Citation Sunday’, highlighting peer-reviewed research that used Geneious. Click the ‘publications’ link on the DNASTAR homepage (www.dnastar.com) and you will see a long list of papers and the following bold statement: ‘Every year for the last 28 years, more researchers have cited DNASTAR’s software in scientific journals than any other sequence analysis software’ (italics their own). Skimming through these publications, it is obvious that most papers citing proprietary programs reference a range of open-source ones as well, and that contemporary genomics research often involves a hodgepodge of commercial and free bioinformatics software. Lizzy Sollars, a PhD student at CLC bio, put it best when describing her work on the Ash Tree Genome Project: ‘Using CLC bio’s de novo assembler, along with the open-source scaffolding tool SSPACE, we produced our best de novo assembly so far’ [26]. Visit the Broad Institute Software Archive (www.broadinstitute.org/scientific-community/software) for a list of widely used open-source tools for analysing large genome-related datasets.

More than just browser-based assemblers and mappers
Commercial sequence analysis suites, in addition to assembling and mapping NGS data, are designed to carry out the day-to-day bioinformatics tasks involved in molecular, evolutionary and genome biology (Figure 1). Although it might sound trivial, one of the more useful features of commercial packages is visualizing, organizing and storing molecular sequence information. The intuitive graphical interfaces of commercial software allow users to easily build folder hierarchies and drop-down lists of sequence data, move or export these data to different folders and change file formats for use in other applications. In most cases, the software can connect to online resources, such as the National Center for Biotechnology Information (NCBI) and UniProt, providing quick direct access to vast amounts of nucleotide and protein sequence information, which can then be downloaded, interpreted and analysed through interactive sequence viewers. Many
commercial programs also give users the ability to BLAST [4] their data directly against NCBI and UniProt databases, or custom databases, and view and analyse the results through GUIs. My research on organelle DNA has benefited greatly from these types of search tools—in minutes, using commercial software, I can download all of the completely sequenced mitochondrial and chloroplast genomes from GenBank, extract their annotations, sort and search them based on a range of features and transfer them to subfolders for downstream analyses.
The applications within commercial bioinformatics suites that I tend to use most often are for evolutionary analyses and comparative genomics. Most packages come with software for aligning nucleotide and amino acid sequences (and entire chromosomes) as well as tools for inferring evolutionary relationships among sequences and constructing phylogenetic trees and distance matrices. Other useful tools include protein structure prediction, nucleotide repeat and motif finders and primer prediction software. An advantage to performing these kinds of analyses within commercial software is that the results—be they genome maps, alignments, nucleotide sequence dot plots or phylogenetic trees—are depicted in colourful and editable graphics, which can be exported and used for figures in lectures and publications. I regularly build genome maps with Geneious and then export them to a graphics-editing program for further polishing. All of the genome maps in Smith et al. [27, 28], for example, were constructed with Geneious. The interactive graphical visualization tools of commercial suites are excellent for exploring large genomic data sets (often depicted in stacked views) and allow for quick navigation to regions or contigs of interest. Many of these features parallel those of popular freely available NGS viewers, like the Integrative Genomics Viewer [29] and Tablet [30].
If you purchase a bioinformatics package and discover that a particular function is missing, do not panic because there is probably a ‘plug-in’ that can do the job. Plug-ins are downloadable applications that provide additional features to software packages—similar to apps for smartphones and tablets. For bioinformatics software, plug-ins add an array of new sequence analysis tools (ones that complement existing tools or that add novel functions), greatly improving the package. Companies are constantly designing new plug-ins for their software, which means that the repertoire of tools within bioinformatics packages is continually expanding. Plug-ins work in two ways: they allow users to add more features to the software, but they also allow developers to design their own apps for the software. Bioinformatics plug-ins can bring some of the most commonly used open-source software to proprietary programs, giving users the benefits of a user-friendly GUI and the power of peer-reviewed algorithms. A cursory scan through the plug-in list for Geneious reveals programs for phylogenetics (e.g. GARLI [31], MrBayes [16] and RAxML [32]), NGS assembly and mapping (e.g. Velvet [33], TopHat [34] and Bowtie [25]), sequence alignment (e.g. ClustalW [13], MAUVE [14] and Muscle [35]) and other molecular analysis procedures (e.g. Glimmer Gene Prediction [36], Phobos Tandem Repeat Finder (e.g. [37]) and DualBrothers Recombination Detection [38]). More plug-ins means more functions and sometimes more money. CLC bio provides a wide range of plug-ins for their Genomics Workbench package (www.clcbio.com/clc-plugin), many of which are free, but some can cost hundreds, even thousands, of dollars—the Shannon Human Splicing Pipeline plug-in is around $4000.
Once you have found the tools and plug-ins to suit your needs, you can start linking them together into ‘workflows’ and pipelines. As CLC bio puts it: ‘A workflow consists of a series of tools where the output of one tool is connected as the input to another tool. This way you can set up a workflow to go through (for example) read mapping, using the mapped reads as input for variant detection, and perform filtering of the variant track’. Workflows can save researchers huge amounts of time and are becoming more widespread among commercial bioinformatics packages. If you do not want to fork out the big bucks, check out The Galaxy Project (http://galaxyproject.org)—a free, web-based and user-friendly bioinformatics workflow management system, which provides access to a large number of data integration and analysis programs.

Bringing bioinformatics into the classroom
Students today are reared on a digital diet of smartphones, tablets and ultra-sleek retina-display laptops filled with intuitive software apps, which integrate seamlessly across platforms and devices. Thus, when these students are introduced to bioinformatics and molecular evolution, one would expect them to engage more easily and enthusiastically with
easy-to-use GUI software than with barebones command-line-driven tools.
Commercial bioinformatics suites, given their browser-based point-and-click interface, lend themselves to teaching and learning. From a lecturer’s perspective, the high-end graphics, visual aids and tutorials built into proprietary software are great for communicating bioinformatics topics, themes and procedures, from sequence alignments to contig assemblies to blasting proteins against GenBank. I regularly incorporate bioinformatics software suites into my undergraduate lectures and conference presentations. With my notebook computer connected to a projector, I can use a program like Geneious to effectively communicate to a large audience the procedures and output of various bioinformatics analyses. For example, using a bioinformatics package, it takes me 10 min to import a set of Illumina sequencing reads, download a reference genome from GenBank, map the reads to the reference and then zoom in to the resulting alignment, showing the class where the reads mapped onto the genome, the polymorphic sites, paired-end distances and an assortment of other statistics. With the same software, I can design, distribute and evaluate bioinformatics assignments to be completed inside or outside of the classroom. These assignments typically involve a range of sequence analysis tools where the results of one tool are used as input for another. I almost always receive positive feedback from students when using user-friendly bioinformatics—some students have even said that it has inspired them to pursue a career in bioinformatics.
Obviously, the biggest barrier to bringing commercial software into the classroom is the high financial cost of the programs. It is unreasonable to ask students to pay hundreds of dollars for proprietary software, and most undergraduate departments are unable or unwilling to invest thousands of dollars into bioinformatics teaching resources—although with institutes like Northwestern buying campus-wide access to proprietary programs, this might be changing.
One strategy for using commercial bioinformatics in a course is to get all of the students to apply for a free trial version of the software. Their access to the software will be limited to 30 days, but this should be long enough for them to complete a few assignments or workshops. Alternatively, some commercial bioinformatics packages can be downloaded and used for free in a ‘basic’ or ‘test’ mode, which means that certain operations are turned off (e.g. assemblies cannot be exported or saved). However, even with limited functions, the software can still provide enough processes for teaching and developing assignments [39]. Again, there is nothing preventing instructors from investing in a personal copy of the software and using it for lectures.

Give it a try and give us your feedback
Going forward, innovations in molecular sequencing techniques will result in ever more sophisticated bioinformatics programs, and it is crucial that these programs are accessible to a broad range of users. We might soon be at a point where walk-in medical clinics have genome sequencing and bioinformatics desks, where patients can play an active role in interpreting their gene sequences and contributing to genetic treatments, and where high-school students assemble and analyse genomes for homework. The increasingly integral role of bioinformatics in research, medicine and society also means that it will become an increasingly large and lucrative industry and one where users will have to pay for the best products.
My own experiences with proprietary bioinformatics software have been positive. The tools I have purchased have made my laboratory group and me more productive, and I certainly enjoy using stand-alone GUI-based programs more than command-line-driven ones. This productivity and ease of use, however, have come at a cost, both intellectually and financially. Although I use sequence analysis tools almost every day, my bioinformatics skills, in certain respects, have plateaued. Moreover, the licensing and upgrading costs of using commercial software represent a significant proportion of my laboratory’s operating budget. Another downside to commercial bioinformatics is that the user can lose touch with what the programs/algorithms are actually doing (they can be a ‘black box’), whereas it is simple to look ‘under the hood’ of open-source tools, which makes them easy to modify and develop. But as bioinformatics software and algorithms become increasingly complex, it might be unrealistic to expect students to have a strong grasp of the math, theory and computer science that underpin those processes.
If you are considering commercial programs, I recommend taking advantage of the free trials that most of the bioinformatics companies offer. You may find that these programs streamline your
research and invigorate your classroom, or that they are a waste of time and resources and you are better off using open-source and/or freeware alternatives. Wherever you stand on the topic, I urge you to share your opinions and experiences with others—and best of luck with all of your bioinformatics endeavours.

Key Points
Innovations in molecular sequencing techniques, and the popular use of these technologies, have given rise to a range of user-friendly commercial bioinformatics software suites.
Often marketed as one-stop bioinformatics toolkits, these software packages can be expensive, and it can be difficult for consumers to choose between the different programs.
This review explores some of the currently available proprietary bioinformatics packages, comparing their prices, usability, functions and suitability for teaching.
Some commercial bioinformatics programs are arguably overpriced and overhyped, but many are well designed, sophisticated and, in my opinion, worth the investment.
I encourage readers to explore commercial bioinformatics packages; they have the potential to streamline your research, increase your productivity and energize your classroom.

Acknowledgment
The author thanks four anonymous reviewers whose feedback greatly improved the manuscript.

Funding
This work was supported by a Discovery Grant to DRS from the Natural Sciences and Engineering Research Council (NSERC) of Canada.

References
1. Metzker ML. Sequencing technologies — the next generation. Nat Rev Genet 2010;11:31–46.
2. Kumar S, Dudley J. Bioinformatics software for biologists in the genomics era. Bioinformatics 2007;23:1713–17.
3. Moody G. Digital Code of Life: How Bioinformatics is Revolutionizing Science, Medicine, and Business. Hoboken: Wiley and Sons, Inc., 2004.
4. Altschul SF, Gish W, Miller W, et al. Basic local alignment search tool. J Mol Biol 1990;215:403–10.
5. Smith DR. The battle for user-friendly bioinformatics. Front Genet 2013;4:187.
6. Carver T, Harris SR, Berriman M, et al. Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data. Bioinformatics 2012;28:464–9.
7. Pabinger S, Dander A, Fischer M, et al. A survey of tools for variant analysis of next-generation genome sequencing data. Brief Bioinform 2013;15:256–78.
8. Tamura K, Stecher G, Peterson D, et al. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 2013;30:2725–9.
9. Okonechnikov K, Golosova O, Fursov M. Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics 2012;28:1166–7.
10. Vincent AT, Charette SJ. Freedom in bioinformatics. Front Genet 2014;5:259.
11. Ewing B, Green P. Basecalling of automated sequencer traces using phred. II. Error probabilities. Genome Res 1998;8:186–94.
12. Gordon D, Abajian C, Green P. Consed: a graphical tool for sequence finishing. Genome Res 1998;8:195–202.
13. Larkin MA, Blackshields G, Brown NP, et al. Clustal W and Clustal X version 2.0. Bioinformatics 2007;23:2947–8.
14. Darling AC, Mau B, Blattner FR, et al. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 2004;14:1394–403.
15. Yang Z. PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol 2007;24:1586–91.
16. Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 2003;19:1572–4.
17. Guindon S, Gascuel O. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 2003;52:696–704.
18. Swofford DL. PAUP* Phylogenetic Analysis Using Parsimony (*and other methods) Version 4.0b10a. Sunderland, MA: Sinauer Associates, 2002.
19. Maddison WP, Maddison DR. MacClade Version 3. Sunderland, MA: Sinauer Associates, 1992.
20. Stein LD. The case for cloud computing in genome informatics. Genome Biol 2010;11:207.
21. DNASTAR press release, 31 March 2014: Northwestern University adopts DNASTAR Lasergene software. http://www.dnastar.com/t-NorthwesternPress.aspx (1 June 2014, date last accessed).
22. CLC bio press release, 8 Jan 2013: J. Craig Venter Institute extends CLC bio site license through 2017. http://www.clcbio.com/news/jcvi-extends-site-license/ (1 June 2014, date last accessed).
23. CLC bio White Paper, Read Mapping. 2012. http://www.clcbio.com/files/whitepapers/whitepaper-on-CLC-read-mapper.pdf (1 June 2014, date last accessed).
24. Kearse M, Sturrock S, Meintjes P. The Geneious 6.0.3 read mapper. http://assets.geneious.com/documentation/geneious/GeneiousReadMapper.pdf (1 June 2014, date last accessed).
25. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods 2012;9:357–9.
26. CLC bio press release, 26 Sep 2013: CLC bio and UK scientists assemble ash tree genome. http://www.clcbio.com/news/clc-bio-and-uk-scientists-assemble-ash-tree-genome/ (1 June 2014, date last accessed).
27. Smith DR, Kayal E, Yanagihara AA, et al. First complete mitochondrial genome sequence from a box jellyfish reveals a highly fragmented linear architecture and insights into telomere evolution. Genome Biol Evol 2012;4:52–8.
28. Smith DR, Hua J, Archibald, et al. Palindromic genes in the linear mitochondrial genome of the nonphotosynthetic green alga Polytomella magna. Genome Biol Evol 2013;5:1661–7.
29. Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 2012;14:178–92.
30. Milne I, Bayer M, Cardle L, et al. Tablet—next generation sequence assembly visualization. Bioinformatics 2010;26:401–2.
31. Zwickl DJ. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. PhD diss., The University of Texas at Austin, 2006.
32. Stamatakis A. RAxML Version 8: a tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies. Bioinformatics 2014;30:1312–13.
33. Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 2008;18:821–9.
34. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 2009;25:1105–11.
35. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004;32:1792–7.
36. Delcher AL, Bratke KA, Powers EC, et al. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 2007;23:673–9.
37. Mayer C, Leese F, Tollrian R. Genome-wide analysis of tandem repeats in Daphnia pulex - a comparative approach. BMC Genomics 2010;11:277.
38. Minin VN, Dorman KS, Fang F, et al. Dual multiple change-point model leads to more accurate recombination detection. Bioinformatics 2005;21:3034–42.
39. Kearse M, Moir R, Wilson A, et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 2012;28:1647–9.