A Review Article On Bioinformatics Tools and Software
Abstract
This is an era of scientific research, discovery and invention. Biological science is growing
rapidly, and bioinformatics, a relatively young discipline, has contributed greatly to that
growth through its many applications. Large volumes of nucleotide and amino acid sequence data
are now available and continue to grow exponentially. Data on this scale are difficult to handle
and analyze by hand, so bioinformatics emerged to store, analyze and interpret them with the help
of computer software; the resulting fields of study are now known as genomics and proteomics.
Storing these data required the establishment of databases, which hold the data through computer
programs and make them publicly available worldwide over the internet. The best-known databases
and resources include GenBank, NCBI, EMBL, DDBJ, Swiss-Prot, UniProt, TAIR and GEO. These allow
researchers to submit and analyze biological data. Data that cannot be retrieved or analyzed are
of no use, so scientists have developed tools and software that enable researchers to retrieve,
analyze and interpret the data held in these databases. The data are then used for many
biological purposes, such as phylogenetic tree construction, drug development and other
real-life applications. The tools support comparative studies, structure and function
prediction, and expression analysis of biological molecules such as DNA and proteins. They
include BankIt, Entrez, GeneQuiz, GENSCAN, Readseq, Modeller and I-TASSER. This review describes
the introduction, specialty and functioning of these tools and software, most of which are
freely available and commonly used in bioinformatics for research and academic purposes.
Introduction
Bioinformatics, also called computational biology, is an interdisciplinary field that came into
existence through the merger of biology, statistics and computer science. It has deep roots:
work on genomics gathered pace after Watson and Crick discovered the structure of DNA. The first
gene sequence, that of the coat protein of bacteriophage MS2, was determined in 1972 by Walter
Fiers and colleagues at the Laboratory of Molecular Biology of the University of Ghent, Belgium
[1]. The same team went on to determine the complete nucleotide sequence of bacteriophage MS2
RNA, published in 1976 [2]. The first DNA genome, that of bacteriophage φX174, was sequenced by
Sanger in 1977 [3]. In 1995 the first genome of a free-living organism, Haemophilus influenzae,
was sequenced, and much faster than the earlier work. Since then the Sanger technique has been
modified and the genomes of many organisms have been sequenced at an increasing pace. The Human
Genome Project published a draft sequence in 2001, and the human genome is estimated to contain
around 25,000 genes [4]. The area then became a main interest of bioinformaticians and
biologists, who began sequencing the genomes of different organisms and predicting gene
structures and protein structures and functions. Computer software and the latest technologies
were used to build genomic and proteomic databases. To date, many prokaryotes and eukaryotes,
including plants and animals, have been sequenced and the data stored in databases. Tools and
software have also been developed to retrieve, analyze and interpret these data and to make
maximum use of them.
i. PIR
The Protein Information Resource (PIR) is an online protein database resource. It was
founded in 1984 by the National Biomedical Research Foundation (NBRF) to help
researchers identify and interpret protein sequence information. It is now
PIR-International, run in collaboration with MIPS and JIPID [5].
Specialty
It is a comprehensive resource for protein sequence information, in which information
about any amino acid sequence can easily be found. Its search tools cover annotated
protein sequences, domain search, combined global and domain search, and interactive
and text searches [6]. Its files are also available by FTP. The resource is freely
available on the web at http://pir.georgetown.edu/ .
It provides family and superfamily classification, search and analysis of sequences,
sequence similarity search, and sequence alignment [7].
Functioning
The link leads to the official PIR website, which also offers other resources such as
iProLink, iPTMnet, the PRO protein ontology and UniProtKB.
Input is provided as an amino acid sequence, for example [8]:
>P1;CRAB_ANAPL
ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN).
KLVQP MLAAVR DDKKLVQ PMRFTS
DRDPSV YYRNQSEL KRQA HEIPIS
KRSATV PQVLLS QKRPLTV
SDVPERSIPI TREEKPAIAG AQRK*
This is the NBRF format. After setting the required parameters, click the search
button to run the search. The output lists all annotated sequences and peptides that
match the query sequence.
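The layout of the NBRF record shown above can be made concrete with a short parser. The
following is a minimal sketch, not PIR's own code: the function name parse_nbrf and the
dictionary keys are illustrative assumptions, and it assumes that each record has a header
line of the form ">XX;identifier", one description line, and sequence lines ending in "*".

```python
def parse_nbrf(text):
    """Parse NBRF/PIR-formatted text into a list of record dictionaries."""
    records = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        if not line.startswith(">"):
            i += 1
            continue
        # Header: ">P1;CRAB_ANAPL" -> type code "P1", identifier "CRAB_ANAPL".
        type_code, _, identifier = line[1:].partition(";")
        # The line after the header is treated as a free-text description.
        description = lines[i + 1].strip() if i + 1 < len(lines) else ""
        i += 2
        chunks = []
        while i < len(lines) and not lines[i].lstrip().startswith(">"):
            chunks.append(lines[i])
            i += 1
        # Remove the spacing between residue groups and the terminating "*".
        sequence = "".join("".join(chunks).split()).rstrip("*")
        records.append({
            "type": type_code,
            "id": identifier,
            "description": description,
            "sequence": sequence,
        })
    return records


EXAMPLE_RECORD = """>P1;CRAB_ANAPL
ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN).
KLVQP MLAAVR DDKKLVQ PMRFTS
DRDPSV YYRNQSEL KRQA HEIPIS
KRSATV PQVLLS QKRPLTV
SDVPERSIPI TREEKPAIAG AQRK*
"""

records = parse_nbrf(EXAMPLE_RECORD)
print(records[0]["id"], len(records[0]["sequence"]))
```

The sequence is stored without spaces so that it can be pasted directly into a search form.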
ii. GenBank
GenBank is an online, web-based nucleotide sequence database and the largest of its
kind. It was first released in 1982, having been established by Walter Goad and
colleagues at Los Alamos National Laboratory. The database is now maintained by the
National Center for Biotechnology Information (NCBI) in the United States. It has
grown exponentially, doubling roughly every 18 months [9].
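To give a feel for what an 18-month doubling time implies, the growth can be written as
size(t) = size(0) * 2^(t/18), with t in months. The sketch below only illustrates this
arithmetic; the function name and the starting value of 100 units are arbitrary
assumptions, not GenBank statistics.

```python
def projected_size(current_size, months_ahead, doubling_months=18.0):
    """Project a size forward assuming a constant doubling time in months."""
    return current_size * 2 ** (months_ahead / doubling_months)


# Starting from an arbitrary 100 units, three years ahead is two doublings.
print(projected_size(100, 36))
```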
Specialty
GenBank holds nucleotide sequences from more than 300,000 organisms, amounting to
more than 150 billion nucleotides in more than 162 million sequences [10]. Matching
nucleotide sequences and their annotations can easily be found. It was built to
support research and for general use in finding matching nucleotide sequences.
Sequences can be submitted directly or through BankIt and other tools. NCBI places no
restrictions on who may submit, so the database contains the most up-to-date
nucleotide data. It can be accessed through several resources, in particular the
official website https://www.ncbi.nlm.nih.gov/genbank/ . It is primarily designed for
bulk submissions such as ESTs, genome survey sequences (GSS), NGS and WGS data, and
other high-throughput data from sequencing centers. It offers uniform and
comprehensive nucleotide sequence information, available free of cost on the internet
through web-based retrieval and analysis services and also via FTP [11].
Functioning
Sequences reach GenBank either by direct submission or through submission tools. Once
a sequence is submitted, the staff examine it thoroughly, including for originality;
an accession number is then assigned and the record is made publicly available
online. GenBank accepts simple mRNA or DNA nucleotide sequences, ESTs, STSs and GSSs,
including in bulk. BAC, YAC and cosmid genomic sequences are submitted through the
tools available for the purpose [12]. Complete microbial genomes and whole genome
shotgun (WGS) sequences of prokaryotes and eukaryotes can also be submitted.
Sequences can be stored in many formats, and different software is used to interpret
them. FASTA is the most widely recognized, accepted and frequently used format for
submission, for example:
>gi|568815581:c7687550-7668402 Homo sapiens chromosome 17, GRCh38 Primary Assembly
GATGGGATTGGGGTTTTCCCCTCCCATGTGCT
CATCTAGAGCCACCGTCCAGGGAGCAGGTAGC
TGCTGGGCTCTCCACGACGGTGACACGC--------
Data can be retrieved through the Entrez system, and sequence similarity searches can
be run with BLAST [13]. Data files can be obtained via FTP in FASTA and other
readable formats. Bulk retrieval is done with command-line applications that use FTP.
Perl and Python are well suited to biological data retrieval.
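As an illustration of how a retrieved FASTA file might be read in Python, here is a
minimal sketch that uses only the standard library. The function name read_fasta and the
toy records are assumptions made for the example; libraries such as Biopython offer more
complete readers.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


TOY_FASTA = ">seq1 toy record\nACGT\nTTGA\n>seq2\nGGCC\n"
for name, seq in read_fasta(TOY_FASTA):
    print(name, len(seq))
```

Yielding records one at a time keeps memory use low when a downloaded file holds many
sequences.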
iii. TAIR
The Arabidopsis Information Resource (TAIR) is a database dedicated to the model
organism Arabidopsis thaliana [14]. It provides a gateway to research by supplying
genome and proteome information about this model plant. It is a comprehensive and
widely used database for research in plant biology. It was established in 1999 and
funded by the National Science Foundation in the USA.
Specialty
Information on the genome, gene expression, proteome, variants, mutant alleles and
phenotypes of the plant Arabidopsis thaliana can be accessed from this
organism-specific database. The TAIR database also presents sequence
The search returns all the information available for the stress-responsive gene.
Clicking the desired link on the search results page displays the gene record as
output.
Search results can be downloaded in different formats, and genomes and annotations
can be visualized with tools such as SeqViewer and GBrowse. Research and analysis
needs can be met in this way [16].
iv. GEO
The Gene Expression Omnibus (GEO) is a database that stores high-throughput
expression data, including functional genomics data sets obtained with microarray and
sequence-based technologies [17]. It is a web-based online resource that primarily
stores gene expression data. It was created in 2000 to hold microarray gene
expression data for the research community that had begun producing them. Because
microarray expression data are frequently reused in research and analysis, GEO
provides easy access to the repository. It is maintained and provided by NCBI.
Specialty
It is an online resource for depositing and archiving expression data. Its objective
is to facilitate independent evaluation of results, reanalysis, and full access to
all parts of a study. Most of the data available in GEO are original data submitted
by researchers. Over 8,000 laboratories have submitted more than 0.5 million public
samples, and GEO contains expression data for over 1,300 different organisms. It can
be accessed online at https://www.ncbi.nlm.nih.gov/geo/ and is available free of cost
to researchers and the public. The data stored in GEO are MIAME compliant and can
easily be downloaded. GEO offers simple submission methods and formats that support
well-annotated data deposits by researchers.
Functioning
GEO mainly contains MIAME-compliant microarray expression data and ChIP-chip data. To
submit data, go to the home page and navigate to the required submission type, such
as array submission, RT-PCR submission or high-throughput submission. GEO organizes
its content into platforms (each describes a specific product set), series (each is a
set of related samples), samples (individual arrays, with accessions beginning with
GSM), data sets and profiles.
All the data in GEO can easily be downloaded in a number of ways: by FTP, through a
URL, through an Entrez GEO DataSets query download, or through a GEO BLAST query.
Bulk downloading is usually performed through the GEO FTP site. MINiML and SOFT
directories are also used for downloading data, and supplementary and annotation
directories are available as well.
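Files from the SOFT directories are plain text and can be read with a few lines of
Python. The sketch below assumes the usual SOFT line prefixes, "^" for the start of an
entity and "!" for an attribute of that entity, which should be checked against the
current GEO documentation. The function name, the accession GSM0000001 and the sample
text are hypothetical.

```python
def parse_soft_entities(text):
    """Collect entities ('^' lines) and their attributes ('!' lines) from SOFT text."""
    entities = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("^"):
            # Entity line such as "^SAMPLE = GSM0000001".
            kind, _, accession = line[1:].partition("=")
            current = {"type": kind.strip(), "accession": accession.strip(), "attributes": {}}
            entities.append(current)
        elif line.startswith("!") and current is not None and "=" in line:
            # Attribute line such as "!Sample_title = ..."; keys may repeat.
            key, _, value = line[1:].partition("=")
            current["attributes"].setdefault(key.strip(), []).append(value.strip())
    return entities


# Hypothetical SOFT fragment, for illustration only.
TOY_SOFT = """^SAMPLE = GSM0000001
!Sample_title = hypothetical control replicate 1
!Sample_organism_ch1 = Arabidopsis thaliana
"""

entities = parse_soft_entities(TOY_SOFT)
print(entities[0]["type"], entities[0]["accession"])
```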
v. Sequin
Sequin is a software tool for submitting sequence data to GenBank, DDBJ and EMBL. It
is developed and maintained by NCBI. It can submit both simple and complex data to
these databases [18].
Specialty
It is a sequence submission tool that can handle simple submissions, which may
contain only a short mRNA sequence, as well as complex submissions involving multiple
annotations, phylogenetic and population studies, gapped sequences and other
multiple-sequence sets. It is available as offline software that can easily be
installed on a PC. It performs well when the Sequin file contains fewer than 10,000
sequences and is rather slower with larger submissions. It can be downloaded from
https://www.ncbi.nlm.nih.gov/projects/Sequin/index.html or through FTP.
Functionality
Obtain the sequence, open Sequin and start a new submission, then set the parameters
and provide the other information requested. Sequin can be used in either of two
modes, stand-alone and network-aware. The Sequin macro send feature allows the user
to send larger files. Very large sequence files may be sent with tbl2asn, a
command-line program that accompanies Sequin [18].
vi. BankIt
BankIt is an online, web-based tool for submitting sequence data to GenBank. It is
developed and maintained by NCBI [19].
Specialty
A single sequence can easily be submitted through it, as can a few unrelated
sequences or sequences with differing source information. Smaller batches of
sequences can also be submitted with BankIt. It can be accessed at
https://www.ncbi.nlm.nih.gov/WebSub/?tool=genbank ; it is free of cost and readily
available to all researchers and the public.
Functioning
Prepare the sequence for submission and go to the “Sequence submission Category” page
of the official BankIt website. Before submitting the sequence you will be asked to
fill in several fields, including contact information, nucleotide sequence
information and general information. You must also state the submission category:
whether the sequence was derived from existing primary data or was sequenced directly
by the submitter. Source information, PCR primers and feature information such as
coding regions, exons and introns also need to be filled in. Once all the information
has been provided, BankIt displays the finished flat file so that corrections can be
made if needed. Provide the correct accession numbers of the gene. Complete the
submission as an original submission or as a third-party submission. WGS data count
as primary data, whereas contig data are not regarded as primary data in submission
queries [19].
vii. Readseq
Readseq is a tool that converts biosequences between popular formats such as FASTA,
GenBank and EMBL [20]. It was originally written in 1989 as a sequence analysis
program, but its simple command-line interface led to its use as a format conversion
program for bioinformatics [21]. A web version is maintained by EMBL-EBI.
Specialty
There is a wide variety of sequence formats, and it is often necessary to convert a
sequence from one format to another. Performing this conversion is the specialty of
Readseq. Readseq is also available offline: the software can be downloaded and
installed on a PC. It is freely available at https://www.ebi.ac.uk/Tools/sfc/readseq/ .
Functionality
Sequence Input: One or more sequences in any of the EMBL, FASTA, GenBank, GCG,
PHYLIP, Swiss-Prot, UniProt, NBRF and PIR formats can be given as input. Partially
formatted data are not accepted and the data limit is 2 MB. Only a sequence in a
valid file format can be uploaded [22].
Output Sequence: Many output formats are supported, including EMBL, GenBank, FASTA,
Clustal, ACEDB, DNA Strider, Flat Feat, GCG, GFF, XML, PIR, NBRF and MSF. The case of
the output letters can be set and gaps can be removed. After setting all parameters
and providing the required information, such as an email address or other contact
information, the file is submitted. The results are sent to the email address, or
are sometimes delivered to the browser window, ready to interpret.
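The kind of conversion Readseq performs can be illustrated with a small Python sketch
that turns FASTA text into NBRF/PIR text. This is not Readseq's code. The function name
is hypothetical, the type code "P1" (protein) and the line width of 60 are assumptions,
and the first word of the FASTA header is taken as the identifier.

```python
def fasta_to_nbrf(fasta_text, type_code="P1", width=60):
    """Convert FASTA-formatted text into NBRF/PIR-formatted text."""
    records = []
    header, chunks = None, []
    # The extra ">" line is a sentinel that flushes the final record.
    for line in fasta_text.splitlines() + [">"]:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    out = []
    for header, seq in records:
        # First word of the FASTA header becomes the ID, the rest the description.
        identifier, _, description = header.partition(" ")
        seq += "*"  # NBRF sequences end with an asterisk
        body = [seq[i:i + width] for i in range(0, len(seq), width)]
        out.append("\n".join([f">{type_code};{identifier}", description] + body))
    return "\n".join(out) + "\n"


converted = fasta_to_nbrf(">CRAB_ANAPL alpha crystallin B chain\nKLVQP\nMLAAVR\n")
print(converted)
```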
viii. Entrez
Entrez is a text-based data retrieval tool used across all the major databases, such
as Complete Genome, PubMed, Taxonomy, Protein Structure, Protein Sequence and many
others. It was established in 1991 and is distributed by NCBI [23]. Initially it
covered only PDB and GenBank nucleotide sequence data.
Specialty
It is a global, integrated query and retrieval system that gives access to all the
available databases simultaneously with a single query string. Data relating to
sequences, structures and references can be retrieved efficiently with Entrez. It
also provides views of proteins, genes and chromosome maps. It is a web-based online
tool, freely available to researchers and the public, and can be accessed on the
NCBI site at
https://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Entrez/index.html .
Functioning
The Entrez front page gives access to the global query system. It supports Boolean
operators, and the databases can be searched with a single query string. Search tags
limit the search to specified fields. The result is a unified page listing the hits
in each database, with links to the actual search results in those databases. The
Limits feature allows the user to narrow down the search results. The History feature
records previously performed queries, which can be referred to by number or combined
with Boolean operators. Results can also be saved to the Clipboard, and a MyNCBI
account allows queries to be saved indefinitely. Results can also be emailed. The
tool is widely used in biotechnology for research and study purposes [24].
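The combination of field tags and Boolean operators described above can be sketched in
Python. The example only builds a query string and a URL and does not contact NCBI. The
E-utilities ESearch endpoint is not discussed in this review and is included here as an
assumption about how such a query is commonly sent programmatically; the helper names
and the search terms are illustrative.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (an addition for illustration, not from the review).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(clauses, operator="AND"):
    """Join (text, field) clauses into one Entrez query string."""
    parts = [f"{text}[{field}]" if field else text for text, field in clauses]
    return f" {operator} ".join(parts)


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL for a database and query string."""
    return EUTILS_ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


term = build_term([("Arabidopsis thaliana", "Organism"), ("drought", "Title")])
url = esearch_url("nucleotide", term)
print(term)
print(url)
```

The field tag in square brackets restricts each clause to one field, which is the same
narrowing that the search tags provide on the web page.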
ix. GeneQuiz
GeneQuiz is a web-based tool for large-scale biological sequence analysis. It was
first launched in 1993 by EBI for automated genome analysis [25]. Because large
amounts of data are available in different databases, computational analysis has
become a challenge for researchers, and this tool helps with batch sequence analysis
and annotation.
Specialty
It can analyze long sequences and provides a single user-friendly interface that
hides the underlying complexity; it is a fully automated web-based tool. For each
query sequence it derives a single functional annotation, presented in an extensive
report that lets the user examine the various aspects of the query in detail. The
primary goal of GeneQuiz is sequence analysis, viewed as an automated functional
interface; the secondary goal is the presentation of supporting information
abstracted from the different sequence analyses. It can process large amounts of
sequence information quickly and consistently, and it uses regularly updated
sequence databases. Note that GeneQuiz works only with protein sequences: it does not
process DNA, so a DNA sequence must be translated into an amino acid sequence before
it is processed. It runs a set of analyses that provide functional annotation, but it
does not allow the user to compare results between different queries. It can be
accessed online at http://www.sander.ebi.ac.uk/gqsrv/submit and is available free of
cost to researchers for queries and study purposes.
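Because GeneQuiz accepts only protein sequences, a coding DNA sequence has to be
translated first. The following minimal sketch uses the standard genetic code and reads
a single frame starting at the first base; the function name and the choice to stop at
the first stop codon are assumptions made for the example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, stop_at_first_stop=True):
    """Translate a coding DNA (or RNA) sequence in frame 1 into a protein string."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons containing ambiguous bases are reported as "X".
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and stop_at_first_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGAAATGGTGA"))
```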
Functioning
GeneQuiz takes amino acid sequences as input, and more than one sequence can be
entered. Input sequences are usually in FASTA format, although some other formats
are also supported. After submission GeneQuiz runs an automated analysis, searching
the available protein databases such as UniProt and Swiss-Prot, and finally generates
a web-based report that summarizes the results for the user [26]. A query usually
takes five to fifteen minutes, but it can take longer if the queries are large or the
servers are heavily used. GeneQuiz has four modules: GQreason, GQupdate, GQbrowse and
GQsearch [27]. It combines heterogeneous information in its reasoning procedure and
uses different sequence similarity methods, of which FASTA and BLASTP are currently
the most widely used. The tool continues to be updated by its maintainers for better
and more efficient results.
x. GENSCAN
GENSCAN is gene prediction software, used to identify genes and their structure in
genomic DNA. It can locate the exons and introns of a gene and their boundaries, and
it is used to predict the complete structure of a gene. GENSCAN was developed by
Chris Burge in 1997 [28].
Specialty
It analyzes genomic DNA sequences from a variety of organisms, including vertebrates
(among them human), invertebrates and plants. Given a sequence, it determines the
most probable gene structure under a probabilistic model of the gene structure and
compositional properties of the genomic DNA of the given organism [29]. It then
produces a file listing the predicted exons and genes together with the predicted
peptide sequences. It is available both on the web and as an offline version that
can be downloaded and installed on a PC. Its general features include predicting
multiple genes in a sequence, predicting partial as well as complete genes, and
predicting sets of genes on both strands of the DNA. It uses distinct, explicit sets
of model parameters to capture differences in gene structure and composition,
particularly between regions of different G+C composition in the human genome.
Published data show a distinct improvement in GENSCAN's accuracy over other
available tools. The program searches sequence databases with BLASTP to detect
possible homologs [30]. The GENSCAN web server can be accessed at
http://genes.mit.edu/genomescan.html . It is available free of cost for research and
study purposes.
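Since the model parameters depend on G+C composition, it can be useful to check the G+C
content of a sequence before submitting it. The sketch below is a simple illustration,
not part of GENSCAN; the function name and the choice to ignore ambiguous bases such as
N are assumptions.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C among the unambiguous bases of a DNA sequence."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    return sum(base in "GC" for base in counted) / len(counted)


print(gc_fraction("GGCCAATT"))
```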
Functioning
GENSCAN predicts the structure of genes. Its probabilistic model accounts for many
essential properties of gene structure and composition, such as the number of exons
per gene, gene density, the reading frame, and composition-dependent initiation,
termination, TATA box and cap signals.
It takes a DNA sequence as input in one of the formats it supports. A FASTA-format
sequence is usually entered, but it also reads files in EMBL, GenBank, LOCUS and CDS
formats [31].
The output is printed to the screen and can be saved to a file such as SEQFILE.out.
The run time is typically about 0.8 s/kb for an average input file [31]. The program
works in the following steps: first the sequence and parameters are stored in
allocated arrays; then the sequence is scored using the probabilistic approach;
finally the predicted gene structure is displayed on the screen as a report. The
predicted exons and introns of the genes are also shown in graphical form [31].
xi. Modeller
Modeller is computer software used for homology modeling; it produces models of
protein secondary and tertiary structures [32]. It is made available online by the
Sali Lab and is a commonly used and very powerful tool for homology modeling. It was
initially released in 1989 [32]. It is written and maintained by Andrej Sali at the
University of California, San Francisco, USA [33].
Specialty
It predicts the three-dimensional structure of a protein, including secondary,
tertiary and quaternary structure, from a simple amino acid sequence. It works on
the principle of energy minimization. It relies on the target amino acid sequence
whose structure is to be resolved and on a template of known structure. It performs
comparative protein structure modeling by satisfaction of spatial restraints [33],
and it handles other tasks as well, such as loop modeling, optimization of protein
structure models with respect to a flexibly defined objective function, multiple
alignment of protein structures, clustering, and comparison of protein structures. It
is available as freeware to install on PC, Macintosh and Linux systems and can be
downloaded from the Sali Lab website https://salilab.org/modeller/ ; once installed
it is ready to use for homology modeling. It builds protein structure models
automatically. Owing to its popularity, several third-party GUIs are available for
Modeller, such as EasyModeller, which is also freeware and can be installed on a PC.
Functionality
Modeller requires three inputs: a Python script, a sequence alignment and a template
file from the PDB. The Python script, which is easy to learn to write, drives the
run. The alignment of the target and template sequences is supplied as an (.ali)
file [34]. Lastly, the template structure is supplied as a PDB file, which contains
the atoms of the template, including the alpha carbons, and their coordinates.
Modeller substitutes the side chains of the template with those of the target amino
acids to create an initial model in (.ini) format. It then minimizes the energy of
the structure by exploring the possible rotamer configurations, and the output is
the lowest-energy, most stable 3D structure of the protein. The output files include
a .log file (log output from the run), a .b file (the model generated, in PDB
format), a .d file (progress of optimization), an .ini file (the initial model
generated), a .v file (violation profile), an .rsr file (restraints in user format)
and a .sch file (schedule file for the optimization process) [35].
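The alignment input can be illustrated with a short sketch that writes entries in the
PIR-style layout used for (.ali) files. This is an illustration under assumptions rather
than Modeller's own code: the ten colon-separated fields on the second line follow the
layout described in the Modeller manual as understood here, and the exact field
requirements should be checked against that manual. The codes "tmpl" and "targ", the
residue range and the sequences are hypothetical.

```python
def pir_entry(code, kind, sequence, start="", start_chain="", end="", end_chain="", width=75):
    """Format one entry of a PIR-style alignment file.

    kind is the entry type, for example "structureX" for a template with a
    known structure or "sequence" for the target to be modeled.
    """
    # Ten fields: type, code, start residue, start chain, end residue, end chain,
    # then four descriptive fields that are left empty in this sketch.
    fields = [kind, code, str(start), start_chain, str(end), end_chain, "", "", "", ""]
    seq = sequence + "*"  # each sequence ends with an asterisk
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">P1;{code}", ":".join(fields)] + body)


# Hypothetical template and target with an already aligned pair of sequences.
template = pir_entry("tmpl", "structureX", "ACDEFGHIK-LMN", start=1, start_chain="A",
                     end=12, end_chain="A")
target = pir_entry("targ", "sequence", "ACDEYGHIKQLMN")
alignment_text = template + "\n\n" + target + "\n"
print(alignment_text)
```

The gap character "-" keeps the two sequences the same length, which is what makes the
file an alignment rather than a plain list of sequences.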
xii. I-TASSER
Iterative Threading ASSEmbly Refinement (I-TASSER) is software used to predict the
3D structure of a protein from its amino acid sequence [36]. The underlying process
of structure prediction is known as threading: the query sequence is matched against
structures in the PDB to find suitable templates. The tool is web-based, and an
offline downloadable version is also available. It is developed and maintained by
the Yang Zhang Lab at the University of Michigan.
Specialty
It is one of the most widely used and successful online tools for protein structure
and function prediction. It produces high-quality 3D structures and predicted
functions for the amino acid sequence provided, and it is used by researchers and
for academic purposes. In CASP7 and CASP8, I-TASSER was ranked the number one server
among the structure prediction servers [37]. It uses Monte Carlo simulations to
construct full protein structures by reassembling fragments taken from threading
templates. It identifies the best-matching templates for the amino acid sequence by
searching the PDB and ranking the matches by Z-score. Prediction proceeds from
sequence to structure to function. An estimated 374,891 sequences, submitted by
91,082 users from 136 countries, have been predicted by the I-TASSER server to date
[38]. The tool can be accessed on the official website of the Yang Zhang Lab at
https://zhanglab.ccmb.med.umich.edu/I-TASSER/ . The web service is free, and a
downloadable version is also available.
Functioning
I-TASSER requires three things as input: a library of folds (publicly available),
the amino acid sequence whose structure is to be predicted, and a scoring function
(developed by the I-TASSER team). The output is a predicted structure. The software
threads the sequence against the folds and assembles them into a tertiary structure
in an automated way.
Input: Provide an amino acid sequence, usually of 10 to 1500 residues, by pasting it
into the space provided on the tool. The sequence should be in FASTA format, and a
FASTA file can also be uploaded. Then set the other optional parameters and provide
contact information so that the predicted structure can be sent to the email address
later, possibly after a day. Finally click “Run I-TASSER” to run the software and
obtain the output [34].
Working: From the input, I-TASSER generates 3D atomic models from multiple threading
alignments and iterative structural assembly simulations. This is done automatically
on the server. The assembled models are then evaluated, and function is predicted by
matching the predicted 3D models against known protein structures and their
functions.
Output: I-TASSER outputs full-length secondary and tertiary structure predictions
together with functional annotations, including ligand-binding sites. An estimate of
accuracy is also given, expressed as the confidence score of the modeling.
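Before pasting a query into the input box described above, it can be checked locally
against the stated limits. The sketch below is a hypothetical pre-submission check, not
part of I-TASSER: the 10 to 1500 residue range comes from the text, while the function
name, the restriction to the 20 standard amino acid codes and the single-record rule are
assumptions.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def check_query(fasta_text, min_len=10, max_len=1500):
    """Return (ok, message) for a single-record protein query in FASTA format."""
    lines = [ln.strip() for ln in fasta_text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        return False, "missing FASTA header line"
    if any(ln.startswith(">") for ln in lines[1:]):
        return False, "more than one record"
    sequence = "".join(lines[1:]).upper()
    unknown = sorted(set(sequence) - STANDARD_RESIDUES)
    if unknown:
        return False, "non-standard residue codes: " + "".join(unknown)
    if not min_len <= len(sequence) <= max_len:
        return False, f"length {len(sequence)} outside {min_len}-{max_len}"
    return True, "ok"


print(check_query(">query\nACDEFGHIKLMN"))
print(check_query(">query\nACD"))
```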
Conclusion
The discussion above shows that bioinformatics has a wide range of applications. It draws on
large volumes of biological data that support functional and structural research, and tools and
software help us store and use these data. This review has discussed the specialty and
functioning of some important tools and software commonly used in bioinformatics. As
bioinformatics is a growing field, further advances are being made in the use of biological
data. Tools and software are being modified and updated, and their efficiency and accuracy are
improving with time. Bioinformatics is expected to play a leading role in science in the coming
years. It should become easier to diagnose and treat many diseases that are currently
incurable, with diagnosis and treatment based on genomic and proteomic studies. The trend
toward personalized medicine is expected to grow, and bioinformatics will assist biotechnology
in plant (crop) technology.
References
1. Min Jou W, Haegeman G, Ysebaert M, Fiers W (1972). "Nucleotide sequence of the gene
coding for the bacteriophage MS2 coat protein".
2. Fiers W, et al. (1976). "Complete nucleotide-sequence of bacteriophage MS2-RNA:
primary and secondary structure of replicase gene".
3. Sanger F, Air GM, Barrell BG, Brown NL, Coulson AR, Fiddes CA, Hutchison CA,
Slocombe PM, Smith M (1977). "Nucleotide sequence of bacteriophage φX174 DNA". Nature.
4. IHGSC (2004). "Finishing the euchromatic sequence of the human genome".
9. Benson D, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW, et al. (2009).
"GenBank". Nucleic Acids Research 37 (Database): D26–D31.
10. "GenBank release notes". NCBI.
23. NCBI Resource Coordinators (2012). "Database resources of the National Center for
Biotechnology Information". Nucleic Acids Research.
24. Fishel R, Lescoe MK, Rao MR, Copeland NG, Jenkins NA, Garber J, Kane M, Kolodner R
(1993). "The human mutator gene homolog MSH2 and its association with hereditary
nonpolyposis colon cancer".
25. Andrade MA, Brown NP, Leroy C, Hoersch S, de Daruvar A, Reich C, Franchini A,
Tamames J, Valencia A, Ouzounis C, Sander C (1999). "Automated genome sequence analysis
and annotation". Bioinformatics 15: 391–412.
26. Scharf M, et al. (1994). "GeneQuiz: a workbench for sequence analysis".
27. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990). "Basic local alignment
search tool". J. Mol. Biol. 215: 403–410.
28. Burge C, Karlin S (1997). "Prediction of complete gene structures in human genomic
DNA".
29. Burge CB (1998). "Modeling dependencies in pre-mRNA splicing signals". In Salzberg S,
Searls D, Kasif S, eds. Computational Methods in Molecular Biology. Elsevier Science,
Amsterdam.
30. Burge CB, Karlin S (1998). "Finding the genes in genomic DNA".
31. Burge C, Karlin S (1997). "Gene structure, exon prediction, and alternative splicing".
32. Fiser A, Sali A (2003). "Modeller: generation and refinement of homology-based protein
structure models". Meth. Enzymol. 374: 461–91.
33. Sali A, Blundell TL (1993). "Comparative protein modelling by satisfaction of spatial
restraints".
34. Fiser A, Do RK, Sali A (2000). "Modeling of loops in protein structures". Protein Sci.
35. Karlin S, Altschul SF (1990). Proc. Natl. Acad. Sci. USA.
36. Roy A, Kucukural A, Zhang Y (2010). "I-TASSER: a unified platform for automated
protein structure and function prediction".
37. Battey JN, et al. (2007). "Automated server predictions in CASP7". Proteins.
38. https://zhanglab.ccmb.med.umich.edu/I-TASSER/ (2018, Jan 23). Official website.