0% found this document useful (0 votes)
17 views17 pages

Ajol File Journals - 314 - Articles - 242956 - Submission - Proof - 242956 3745 584187 1 10 20230306

The document reviews the development and challenges of biological databases, emphasizing their importance in research related to human diseases and drug manufacturing. It categorizes databases into primary, secondary, and specialized types, providing examples and discussing issues such as data accuracy and integration. The paper highlights the rapid growth of databases like GenBank and the Protein Data Bank, while also addressing the need for improved data management and accessibility.

Uploaded by

Cappa Maampi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views17 pages

Ajol File Journals - 314 - Articles - 242956 - Submission - Proof - 242956 3745 584187 1 10 20230306

The document reviews the development and challenges of biological databases, emphasizing their importance in research related to human diseases and drug manufacturing. It categorizes databases into primary, secondary, and specialized types, providing examples and discussing issues such as data accuracy and integration. The paper highlights the rapid growth of databases like GenBank and the Protein Data Bank, while also addressing the need for improved data management and accessibility.

Uploaded by

Cappa Maampi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 17

https://ajbas.journals.ekb.

eg
Alfarama Journal of Basic & Applied Sciences
[email protected]

Faculty of Science Port Said University


http://sci.psu.edu.eg/en/

ISSN 2682-275X July 2022, Volume 3 Issue II DOI:10.21608/AJBAS.2022.105632.1078


Submitted: 17-01-2022
Accepted: 27-03-2022 Pages: 346 - 362

Perspectives review and Challenges in Biological Databases Integration

Monica Fawzy1,*, Noha E. EL-Attar2, 3Shady Y. El-mashad and 4Wael A. Awad


1
Department of Mathematics and Computer Sciencce, Faculty of Science, Port Said University, Egypt.
2
Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt.
3
Faculty of Engineering (at Shoubra), Benha University, Benha, Egypt.
4
Faculty of computer and information, Damietta University, Damietta El-Gadeeda, Egypt.
* Corresponding author: [email protected]
‫ــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــ‬

ABSTRACT
With the enormous and rapid increase in biological data, there has become a significant need in
developing biological databases, which help in various researches related to human diseases, drugs
Manufacturing, and others based on the stored data. The set prepay to convey back imply distance
from fundamental databases has been yoke of the most crucial issues in this field. In this paper, we
present a review of biological databases with its types: - primary databases, secondary databases and
specialized databases with examples of all of them and a brief description of them and we highlight
some perspective, similarities and differences of widely utilized biological databases. Also, we have
displayed what researchers have found in developing the biological databases. In addition, the paper
presents some of the main challenges that face the biological databases developers such as; big data
issues, data errors, and integration of data and state of art on how they can be overcome.
Keywords
B-number, Biological Database, Bioinformatics, DNA.
‫ــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــ‬
1. INTRODUCTION
Biological databases can be defined as libraries of life-science information that may be collected from
different types of resources, computational analysis, high-throughput experiment technologies, published
literature and experimental experiments are only a few examples. [1]. This biological information can be
in the form of raw data or measurements stored or exchanged in a digital form in files or databases [2].

Constructing the biological databases is considered one of the significant issues that is handled by the
bioinformatics science. Generally, bioinformatics can be defined as ―Bioinformatics is a branch of
computer science that includes storing, retrieving, processing, and disseminating data related to
biomolecules such as RNA, DNA, and proteins‖ [1].

The "core dogma" of biology governs the transmission of genetic information in general.; it presents
the process of transcribing DNA sequences are translated into RNA, which is ultimately translated into
proteins. as shown in (Fig. 1). In order to study the biological databases well, the features of the biological
raw data of DNA, RNA, and Proteins need to be defined.
Initially, DNA (Deoxyribonucleic Acid) is the genetic material in humans and almost all other
organisms. Most DNA is located in the cell nucleus. Ribonucleic acid (RNA) is a polymer molecule that
plays a necessary component role in different biological processes, gene coding, decoding, control, and

346
Fawzy, et al AJBAS Volume 3, Issue II, 2022

expression, to name a few. Nucleic acids, such as DNA and RNA, compose the four primary
macromolecules required for all known living forms, along with lipids, proteins, and carbohydrates.
Protein is large biomolecules or macromolecules. It contains a long chain or more of the essential
amino acid residue. Proteins have a wide range of roles in animals, including metabolic process
stimulation, DNA replication, response to stimuli, cell and organism production, and the movement of
molecules from one area to another. Proteins differ primarily in their amino acid sequence, which is
determined by the nucleotide sequence in their genes. Proteins normally fold into a precise 3D shape that
influences their activity. [4].

(Fig. 1): DNA to Protein Transaction Process

In general, bioinformatics science is interested in two significant fields; the first field is concerned
with developing the computational tools and biological databases, while the second one is about how to
apply and use these tools and databases to create biological knowledge to enhance the recognition
methods inside the living systems [1].
The biological computational tools include several types of developed software for interpreting DNA
sequence, structural, or functional analysis. In addition, it is interested in constructing and curating of
biological databases which support significant analysis types, such as: molecular structural analysis
molecular functional analysis and molecular sequence analysis. Sequence alignment, sequence database
searching and style finding, genetics and meadows construction, reconstructing evolutionary links, and
genome aggregation and comparison are all part of molecular sequence analysis. The examination,
comparison, categorization, and prediction of protein and nucleic acid structure are all part of molecular
structural analysis. Finally, the molecular functional analysis handles the methods of determining the gene
expression patterns, predicting the protein-to-protein interaction, predicting the subcellular protein
localization, rebuilding the metabolism pathways, and the simulation processes. Indeed, these three
analysis issues are not individual processes; it might be integrated together to accomplish perfect results
[1].

2. BIOLOGICAL DATABASES
Biological data is usually collected from various sources across different structural and functional
boundaries. Biological data do not restrict to limited types of information, there is a wide range of data as
declared in (Fig. 2). The biological data may appear in the form of genetic sequences, protein functions,
medical findings, ecological questions, and others. So, existing of biological databases helps the different
types of biological data stakeholders in dealing with the collected biological data which is well organized,
easily accessed, centrally managed, and flexibly updated [1].
Databases serve as a repository for information. It's used to store and arrange data so that it's easy to
find information using various search criteria. Generally, there is a wide range of databases as declared in
(Fig. 2) [5].

347
AJBAS Volume 3, Issue II, 2022 Fawzy, et al

(Fig. 2): Relation between types of databases and biological data (Databases :SQL database or Non-SQL
database…..Biological database :sequences, structures of biological molecules, gene expression profiles, biochemical pathways,
chromosomal mapping , phylogenetic data single and nucleotide polymorphisms)

The current point of view for developing the biological databases is based on dividing the functions of
databases: Primary, secondary, and specialized database into three groups. Before that, they divided the
databases into SQL database and Non-SQL database.
The major databases mostly contain experimentally produced data such as nucleotide sequences,
protein sequences, and macromolecular structures. The source of data for primary databases are directly
submitted from experiments that produce original biological data.
The primary databases are considered the raw materials for the other databases types. Most of the
Governments and community funding groups support primary databases financially. GenBank, European
Molecular Biology Laboratory (EMBL), and DNA Data Bank of Japan (DDBJ) are the three primary
public sequence databases that preserve and disseminate raw nucleic acid sequences [7,1, 8]. GenBank [7]
is an annotated repository of all publicly valid nucleotide sequences and their protein translations that is
free to the public. The National Center for Biotechnology Information (NCBI) produces and maintains it
as part of the International Nucleotide Sequence Database Collaboration (INSDC).
GenBank is growing at an exponential rate, doubling every 18 months. Furthermore, as stated in, its
data is viewed and cited by millions of scholars worldwide (Fig. 3).

348
Fawzy, et al AJBAS Volume 3, Issue II, 2022

(Fig .3):The Growth of GenBank (1982 - 2020)


The European Molecular Biology Laboratory (EMBL) [1] is a molecular biology research institute
financed by 27 member countries. One prospective member state and two associate member states [3].
EMBL was founded in 1974 as an international organisation supported by public research funds from its
member countries. Approximately 85 separate groups at EMBL conduct research throughout the gamut of
molecular biology. EBML is the name of nucleotide sequence database in Europe, but it is a bit confusing
with the name of the institute itself. Therefore, currently European Nucleotide Archive (ENA) is more
commonly used.
The DNA Data Bank of Japan (DDBJ) is a biological database that stores DNA sequences. It is housed
in the National Institute of Genetics (NIG) in Japan's Shizuoka prefecture. DDBJ is also a participant in
the International Nucleotide Sequence Database Collaboration (INSDC). It exchanges data on a daily
basis with the European Molecular Biology Laboratory at the European Bioinformatics Institute and
GenBank at the National Center for Biotechnology Information. As a result, these three databanks have
the identical information. They all accept nucleotide sequence submissions and then share fresh and
updated data on a regular basis in order to achieve optimal synchronization. Because they include original
sequencing data, these three databases are designated as primary databases [1].
Over the last decade, secondary databases have been regarded as the reference library for molecular
biologists, giving a plethora of (sometimes inaccurate) information regarding every genetic product or
genetic product under investigation by the scientific community. Data in secondary databases is the
product of data analysis, literature search and interpretation. Some individuals and companies financially
support secondary databases, besides some government bodies that support primary databases. Protein
Data Bank (PDB) is an example of a secondary database. PDB's growth is expanding year after year, and
its data is viewed and cited by millions of scholars worldwide, as stated in (Fig. 4).

349
AJBAS Volume 3, Issue II, 2022 Fawzy, et al

(Fig .4): The Growth of PDB (1976 - 2020)


Specialized databases usually cater to a certain scientific community or are focused on a single
organism. These databases may contain DNA or RNA sequences, as well as other types of information for
a given organ [1]. (Fig. 5) displays the relationship among the three categories of the biological databases.

(Fig .5):Relationships among the biological databases types

3. COMMON ISSUES IN PUBLIC BIOLOGICAL DATABASE


The ongoing enhancement of biological databases is a major objective of bioinformatics science. The
content of these repositories may contain some inaccuracies due to the quick nature of this improvement
and the massive volume of data output. We acknowledge that mistakes and omissions can occur at both
the sequence and metadata levels in open databases, but for the purposes of highlighting some of the
various data integrity difficulties that might arise, we will primarily focus on sequence and taxonomy data
concerns. The reassembly of a misassembled Francisella Tularensis genome and the detection of single
nucleotide faults in a reference Tobacco mosaic virus (TMV) genome are two high-profile instances of
sequence errors [6]. On the other hand, because databases of protein sequences are obtained from genome
sequencing, via genome annotation, and in silico translation, the integrity issues for the proteomics
database are often identical to those for genomics. A sequence database mistake is unlikely to result in the
incorrect detection of a protein present in the sample (false positive), but it might easily result in the
failure to identify a protein present in the sample (false negative) (false negative). This is especially
important in terms of identifying precise peptide signatures for use in focused experiments [6].
In addition to the mentioned data errors and integrity problems, there are other types of obstacles that
may face the biological databases developers such as, redundant data, a steady flux, spreading data from
several databases, insufficient information, incorrect links, and unclear naming terms and annotations. All
of these challenges make the retrieval and extraction of information more complex. As a result, in order to
350
Fawzy, et al AJBAS Volume 3, Issue II, 2022

make the data more complete and dependable, the major core databases must interchange data on a
regular basis over the internet. However, in many circumstances, most secondary databases are unable to
communicate data. Although, the data in the secondary databases is derived from the primary databases,
the rate of data update within secondary databases is relatively slow for major databases. This is due to
the fact that the amount of data in major databases is often enormous and continually updated [1].

4. POPULAR BIOLOGICAL DATABASES FOR HUMAN RESEARCH


The departed development in computer-based technologies has enabled bioinformaticists and
biologists in typical to cope with the information age. The variety in the biological Information sources
and structures, made various biological institutions tend to build, develop, and maintain their own
biological databases. Hereby, the molecular biology databases have become crucial tools for research.
However, the importance of the biological databases is not limited to keep the biological data, it
extends to how the data will be organized is such a way that keeps the search process efficient to find the
optimal output. Generally, the database types may vary in their content structure, format, and mode of
data accessing [7].
(Fig. 6) displays the classification of the popular biological databases according to their structure,
source, type of data, quality of data, and maintainer statue. Also, some of the existing biological databases
are presented and classified in table (1) according to their types (i.e. primary, secondary, and specialized)
with a brief description for each one.

(Fig .6):Schematic Representation of Biological Databases (the classification of the popular biological databases
according to their structure, source, type of data, quality of data, and maintainer statue.)

351
AJBAS Volume 3, Issue II, 2022 Fawzy, et al

Table 1: Description for Different Types of Biological Databases

Category Name Data modeling Brief description URL

A large public database of


nucleotide sequences that may
be used for bibliographical and
biological annotation. It https://www.ncbi.nlm.ni
GenBank [8] Flat file
contains genomic DNA, raw h.gov/genbank/
sequencing data from,sequence
polymorphisms and high
Primary Database

throughput.
An international organisation with
The European over 80 separate research groups
Molecular encompassing the gamut of
https://www.emb
Biology Nosql molecular biology and Europe's
l.de/
Laboratory flagship laboratory for the
(EMBL) [1] biological sciences — a DNA
sequence database.
Database and the
All publicly accessible nucleotide
DNA Data Bank https://www.ddbj.nig.ac
Flat file and protein sequences are
of Japan (DDBJ) .jp/index-e.html
annotated in this database.
[9]
It keeps track of the three-
dimensional structures of big
Protein Data Bank https://www.rcsb
Flat file (XML) biological molecules like DNA
(PDB) [10] .org/
and protein, as well as their
complexes.
The Protein
A comprehensive general
Information FASTA format https://proteininformatio
bioinformatics library that aids
Resource (PIR) nresource.org/
genetic and protein research.
[11]
It manually gathers annotated
UniProt Custom flat
data from literature and https://www.uniprot.org/
knowledgebase file, FASTA, G
computer analysis based on help/uniprotkb
[12] FF, RDF, XML.
values.
Secondary database

A knowledge platform for


human proteins that may be
accessed online. It aspires to
provide a complete resource for
https://www.nextprot.or
neXtProt[13] FTP knowledge on human proteins,
g/proteins/search
including their function,
subcellular localization,
expression, relationships, and
role in illnesses.
Predictive model resources, or
"signatures," are collected from
large protein signature databases https://www.ebi.
InterPro [14] FTP
and reflect protein domains, ac.uk/interpro/
families, regions, frequency, and
locations.
Database to represent the serial
alignment of multiple protein https://pfam.xfa
Pfam [15] Relational
families and hidden Markov m.org
models

352
Fawzy, et al AJBAS Volume 3, Issue II, 2022

Database of functional
ConsensusPathDB molecular interactions, fusion
[16] BioPAX ,PSI- of information about protein
http://consensusp
MI, SBML interactions, signs of genetic
athdb.org/
interactions, metabolism, gene
regulation, and drug-drug
interactions in humans
The largest research database
that can be reconnected with
information on genotypes and
patterns in the world. By inviting
23andMe [17] clients to take part in the
Relational https://research.23andme
research, a new research model
model .com/
has been created that helps speed
genetic discovery and provides
the ability to obtain new insights
more quickly about disease
treatments
A database built on collaboration
between university centres,
International nonprofit biomedical research https://www.genome.go
HapMap Flat file (XML) organisations, and private firms v/10001688/internation
Project [18] in Canada, China, Japan, al-hapmap-project
Nigeria, the United Kingdom,
and the United States.
A regularly updated database of
Online Mendelian human genetics, illnesses, and
Inheritance in Flat file genetic features, with a focus on https://omim.org/
Man [19] the genotype-phenotype link.
A biological database that
Relational http://www.mirbase.
miRBase [20] archives sequences and
model org/
comments of microRNA
With various integration
approaches, such as antibody-
based imaging, mass
Human Protein Relational spectrometer-based proteins, https://www.proteinatlas.
Atlas [21] model transcriptomics, and system org/
biology, it assigns all human
proteins in cells, tissues, and
organs.
Structural Provides a clear and https://www.ncbi.nlm.ni
Classification of comprehensive account of the h.gov/p-
Flat file
Protein (SCOP) links between known protein mc/articles/PMC10247
[22] structures. 9/
Class Architecture
Topology Relational Performs a semi-hierarchical
http://www.cathdb.info/
Homology model classification of protein domains.
(CATCH) [23]
A repository of global
neuroscience web resources,
containing experimental, clinical,
and infectious neuroscience
Neuroscience
Relational databases, knowledge databases,
Information https://neuinfo.org/
model atlases and genetic / genomic
Framework [24]
resources, and providing many
reliable links via the
Neuroscience Portal at
Wikipedia

353
AJBAS Volume 3, Issue II, 2022 Fawzy, et al

A long-term founding initiative


operating between DDBJ,
International
EMBL-EBI and NCBI. INSDC
Nucleotide
covers a range of raw data
Sequence Relational http://www.insdc.or
readings, through alignment and
Database model g/
aggregations to job annotations,
Collaboration
enriched with contextual
(INSDC) [25]
information related to samples
and experimental configurations.
A database of genomic
annotations for chordates and https://www.ensembl.or
Ensemble [26] FTP
model organisms that is kept up g/index.html
to date.
Is a complete, integrated, non-
redundant, well-annotated set of
sequences derived from sequence
Relational https://www.ncbi.nlm.ni
RefSeq [27 data accessible in the redundant
model h.gov/refseq/
archival database GenBank. It
includes genomic DNA,
transcripts, and proteins.
A database with a wide-ranging
bioactivity of biologically active
biologically active compounds
that contain medications for
Relational https://www.ebi.ac.u
ChEMBL [28] functional correlation,
model k/chembl/
Absorption, excretion,
metabolism ,distribution, and
toxicity in vivo are all factors to
consider.
From a high-level and genetic
viewpoint, this resource can help https://www.genome
KEGG [29] Flat file (XML)
you comprehend facility .jp/kegg/
operations, cells, and organisms.
It is a knowledge resource that
contains in-depth biological data
on model organisms that have
been thoroughly examined. This
model allows researchers to https://www.genome.go
Model organism FASTA
easily find basic information v/10001837/model-
database [30]
about large groups of genes, organism-databases
efficiently plan experiments,
integrate their data with existing
knowledge, and build new
hypotheses
One of the most comprehensive
databases for understanding the
transcriptional regulatory
network in any organism.
http://regulondb.ccg.un
RegulonDB [31] XML Initially, it was created as the
am.mx/
primary database source for
information on the Escherichia
coli k-12 clone regulatory
network.
A big, long-term group study
aimed at sequencing and
releasing the full genomes and
Personal Genome
XML medical information of 100,000 https://my.pgp-hms.org/
Project [32]
participants to allow personal
genomics and personal medicine
research.

354
Fawzy, et al AJBAS Volume 3, Issue II, 2022

A biological database containing


Pathogen-Host structured information about
XML, FASTA http://www.phi-
Interaction genes that have been shown in
base.org/
database [33] experiments to alter the outcome
of host-host interactions.
Is a free, collaborative online
encyclopaedia that aims to
chronicle all of the world's 1.9
Encyclopedia of million living species. It is based
Flat File https://eol.org/
Life[34] on existing databases as well as
contributions from professionals
and non-experts all across the
world.
Is a public repository for
functional genomics data that
accepts MIAME-compliant data
uploads. Data that is based on an
Flat file and https://www.ncbi.nlm.ni
GEO[35] array or a sequence is acceptable.
relational model h.gov/geo/
Users can utilise the tools to
search for and download
experiments and curated gene
expression profiles.
A molecular biology and
Saccharomyces G
FASTA genetics resource in leaven. https://www.yeastgeno
enome
Yeast or emergent leaven, me.org/
Database [36]
Saccharomyces cerevisiae
Human genetics is being
researched by a Wiki. This
https://www.snpedia.co
SNPedia [37] XML article discusses the impact of
m/index.php/SNPedia
genetic variations, referencing
peer-reviewed scientific articles.
UCSC Malaria
FTP, Relational A bioinformatics research tool to https://genome.ucsc.edu
Genome Browser
model study the malaria genome. /
[38]
On the reference genome
assembly, Flybase employs a
Relational
FlyBase [39] generic genome browser to show https://flybase.org/
model
Specialized database

genome annotations and


genome-aligned evidence.
a database that is open-source
and a collection of tools for
Protein-protein storing data, displaying and https://www.ncbi.nlm.ni
Relational
Interaction analyzing rich arranged h.gov/pu-
model
Databases [40] molecular reaction data in bmed/25859942
standard formats accepted by
members of the community
A collection of small genetic
variants from various species.
The search tool on the dbSNP
webpage allows you to query
variants by simple terms or
http://www1.biologie.u
complicated searches. Reference
Polymorphism nihamburg.de/bonline/li
Flat The SNP Cluster Report includes
and Mutation brary/genomeweb
file, FASTA an allele summary, mapping
Databases [41] /GenomeWeb/human-
information in HGVS
gen-db-mutation.html
nomenclature, a gene-centric
view, a map table with
chromosomal coordinates, a
variation view, and a link to the
1000 Genomes Browser.

355
AJBAS Volume 3, Issue II, 2022 Fawzy, et al

Pepteatlas is a statistically
validated technique and structure
Object- oriented http://www.peptideatlas
PeptideAtlas [42] for archiving proteomic data that
.org/
allows data interchange and
integration with genomic data.
A collection of mass-
spectrometry-based proteomics
data, including the identification
Relational https://www.ebi.ac.uk/p
PRIDE [43] of proteins, peptides, and post-
model ride/archive/
translational changes
documented in peer-reviewed
journals.
A database that contains
https://www.ebi.ac.uk/
MEROPS [44] Object- oriented information about peptides and
merops/
the proteins that bind to them.
A collection of community-
written Wikipedia articles https://en.wikipedia.org
Gene Wiki [45] Flat file (XML)
regarding human genes found in /wiki/Gene_Wiki
the NCBI Gene database.
In E. coli, a database is used to
record information on proteins
EcoCyc [46] HTML https://ecocyc.org/
and molecular interaction
pathways.
A constructed library of
generically classified RNA-seq
Sex-Associated datasets containing female and
Relational http://bioinfo.life.hust.e
Gene Database male biological copies of the
model du.cn/SAGD#!/
(SAGD) [47] same condition, with datasets
routinely re-analyzed using
established methodologies.
A wide range of active
Super-Enhancer
transcription enhancers with http://www.licpathway.
database (SEdb) sam format
enhanced chromatin properties net/sedb/
[48]
associated with the enhancer
VMH enables complicated
searches of its material, allowing
Virtual Metabolic
for quick analyses and
Human (VMH) COBRA format https://vmh.life/
interpretations of complex data
[49]
emerging from biological
investigations.
A method frequently used in
biology to study molecular
evolution and adaptive changes
in microbial groups over a long
Adaptive
period of time and under specific
Laboratory
Object- oriented growth conditions. With recent https://aledb.org/
Evolution
breakthroughs in next-generation
(AleDB) [50]
sequencing technologies and
lower sequencing-associated
costs, ALE has become more
popular as a biotechnology tool
Microsatellites are made up of
base pairs 1-6 that repeat side by
Catalog of all
side to create an array; there are
Germline Relational http://bidd2.nus.edu.sg/
about a million of them in the
microsatellites model CMAUP/
human genome, most of which
(CAGm) [51]
are found in gene introns, exons,
and regulatory regions.

356
Fawzy, et al AJBAS Volume 3, Issue II, 2022

An online biological resource


that provides information on the
WormBase [52] Flat file (XML) biology and genome of the worm https://wormbase.org//
Caenorhabditis elegans, as well
as other similar nematodes.
MOD is a model organism
database that includes
http://www.xenbase.org
Xenbase [53] FTP informatics tools as well as
/entry/
genetic and biological
information on Xenopus frogs.
Acedb is a database system
designed primarily for
processing genome and http://www.sanger.ac.u
Hierarchical
AceDB [54] bioinformatic data. It comes with k/
object
a number of useful tools for science/tools/acedb
manipulating, displaying, and
annotating genomic data.
A database of protein-protein
Relational
STRING [55] interactions that are known and https://string-db.org/
model
expected.
A database for analysing
http://fantom.gsc.
EdgeExpressDB Relational biological networks and
riken.jp/4/
[56] model comparing massive high-
edgeexpress/about/
throughput expression datasets.
A data repository for interactions
that has been gathered through
Relational extensive curation efforts. All
BioGRID[57] https://thebiogrid.org/
model data is publicly available through
a search index and may be
downloaded in specified forms.

5. APPROACHES FOR IMPROVING THE BIOLOGICAL DATABASES


In order to protect and secure biological databases against any intentional errors, the researchers have
resorted to enhance biological databases by several methods such as, constructing validation-specific
tools, database enablement specialised ontologies for self-validation which promoting internal control by
allowing only skilled curators and annotators to enter data [58].
The problem of biological database integration also have encouraged several researchers to solve it, for
instance:
The Biology Workbench [59]: It is the first online application to provide users with a combined
environment for tools and data. The BW offers web-based search interfaces through thirty-three (33)
databases, save the search results, and send the saved sequence to (66) sequence analysis and presentation
programmes [59]. For diverse biological datasets, this method is still the most used. [60].
NC Bioportal Project [61]: It is a bioinformatics portal that may be customised. Bioportal is a website
that connects static PISE interfaces with grid computing resources [62].
Thick Client Solutions [63]: Another design approach is smart client software (i.e., packages that must
be downloaded and installed locally). These tools allow you to create workflows by connecting tools in a
certain order and pipelining data through the process.
Anabench (Biology Workbench) [64]: It is a Web/CORBA-based workbench that allows for the flexible
integration of bioinformatics sequence analysis applications. This online application gives you access to
tools like the EMBOSS suite and a workflow pipelining system [65], but it doesn't provide you access to
data from public sources.
On the other hand, the importance of the biological databases motivated researchers to create and
develop new biological databases. Also, it has encouraged them to solve the issues existing in the current
databases such as data error and lack of integrity. Generally, there are no standards that can be used for

357
AJBAS Volume 3, Issue II, 2022 Fawzy, et al

biological databases development; thus, it is become essential to define a list of IDs genes for each
database. Currently, this list of gene identifiers is embedded in a separate file for each database. A class in
Java has been developed to read these files, connect to the online services, and then return all accessible
information concerning these genes. Following the data retrieval procedure, it is critical to examine each
file's structure and create an analyzer to transform the information into XML files. This enables users to
perform searches and navigation via the databases easily. This approach has facilitated the data
integration process. Also, the procedure of saving data in XML files is a viable replacement for relational
databases at the moment. It has become a popular way for representing data changes, publishing it
through web, and also building versatile syntax that permits the identical information to be represented in
numerous ways [66].
As the knowledge integration is considered one of the most significant obstacles in the science of
bioinformatics, recent research directions have concerned with how to apply the integration between the
current existing databases. Some databases have achieved the integration via hyperlinks connected to
other databases. For instance, more than 160 different databases are linked from the UniProt database.
However, the user is left with the decision of which of the offered links to pursue in order to uncover
meaningful relationships. Other databases have taken an orthogonal approach to overcome this problem,
instead of referring connections to other sources, they just transfer the appropriate material from other
sources into the database. There are certain disadvantages to this strategy as well. It creates unnecessary
information (which might result in significant storage space consumption). Most importantly, this strategy
may result in the utilisation of stale and out-of-date results. [65].
There are also additional issues to consider when considering data integration, such as discrepancies in
the amount of genes detected in various databases. Pereira, R. et al. 2014 [66] concentrated on creating a
framework for integrating four existing databases: NCBI, EcoCyc, KEGG, and RegulonDB. These
databases were chosen because they include information that might be useful in TRN-related research.
When comparing KEGG and NCBI, because they both contain the same amount of genes, there are
frequently a lot of coincidences. When comparing the EcoCyc database to RegulonDB, the same thing
happens. As a result, massive amounts of data had to be gathered in order to accomplish data integration
in TRNs. His suggested integration process is centred on producing a B-number, which is often used as a
gene identification, as a common feature for all databases. However, the B-number does not apply to all
genes. This entails gathering all information about TRNs that is available across different databases and
putting it in a single repository known as the KREN.
Other researchers have also noted how to integrate and access diverse biological datasets in an easy-to-
use manner that does not need knowledge of the location or source of the underlying data, as well as the
technical language used to get the data (e.g., SQL or SPARQL). This technology is inspired by the work
submitted by Malick et al. in 2013 [67], which explicitly focused on searching for keywords in relational
databases. This technique begins with a description of the layers that comprise an integrated data access
system from the bottom up. Base data, data model, integration, and presentation layer are the four basic
levels of a data integration access system. However, one disadvantage of this technique is that, in the
absence of a presentation layer, such as an easy-to-use query interface, the data contained in global
ontology can only be accessible using specialized query languages such as SPARQL. As a result, in order
to access or retrieve the data, users must first understand the applicable query language. Table 2
summarizes some of the existing biological database that handled the integration problems with
displaying their shortage points.

358
Fawzy, et al AJBAS Volume 3, Issue II, 2022

Table 2: Literature Reviews on Biological Databases and its drawbacks

Database Name Description Drawbacks

I compiled a list of Wikipedia


Have a collection of community-written Wikipedia
Gene Wiki Database pages regarding human genes
entries about human genes that aren't connected to
[68] that were created by members
other databases like Gene Wiki
of the public
Although they are not part of the INSDC, they are
The presented fixed database produced from INSDC sequences to provide non-
was created using sequence data redundant curated data that represents our current
RefSeq Database [69]
from the redundant archive understanding of known genes. Sequence data
database GenBank from several INSDC records is included in certain
instances. As a result, it's a complicated database
Class Architecture Protein domain categorization Without integrating it with another database, I
Topology Homology that is semi-automated and offered a database for small genetic variants from
(CATH)[70] hierarchical. other species
NCBI, EcoCyc,
Use B-numbers a gene
KEGG and Not all genes have a B-number
identifier
RegulonDB [29]
Use four layers: Base data The data contained in the global ontology is largely
layer, Data model layer, available through a technical query language, such
OMA and Bgee[71]
Integration layer and as SPARQL, due to the lack of a presentation
presentation layer layer, such as a user-friendly query interface

6. CONCLUSION
In this paper, we have presented a comprehensive review on the major biological databases. Also,
we have displayed what researchers have found in developing the biological databases and the
challenges they have faced in building new generation of biological databases. The traditional
approach to retrieve information from biological databases has been one of the most important topics
in this area. Currently, it is getting more difficult as there are a large number of new heterogeneous
databases, often sporting different structures, formats, and ways of storing information as well as
different ways of providing this information to users. These factors along with the dispersion of
information entail the fact that biological data integration process is a very hard task to perform.

REFERENCES
[1] [1] Xiong, J., Essential Bioinformatics, first edition, copyright by Cambridge University Press, New
York, 2006.doi: 10.1017/CBO9780511806087.
[2] [2] Phillips R., Communications of the ACM. USA,2014.doi: 10.1145/2398356.2398365.
[3] [3] Cabestany,Joan, the 10th International Work-Conference on Artificial Neural Networks: Part I:
Bio-Inspired Systems: Computational and Ambient Intelligence, LNCS 5517, pp. 820–828, 2009.
doi/10.5555/646370.759075.
[4] [4] Berg, J., Tymoczko, J. and Stryer, L., Biochemistry, chap3, copyright by W. H. Freeman, New York,
2002.doi: 10.1042/EBC20160094.
[5] [5] Sarinder K Dhillo, Biological Databases, Kuala Lumpur, Malaysia ,2018. doi:
10.1002/0471250953.bi0101s27.
[6] [6] Cooper, B., Genome Biol. 15: R67,2014. doi:10.1186/gb-2014-15-5-r67
[7] [7] Yusuf A., Sufyanu Z., Mamman K., Suleiman A., International Journal of Applied and Advanced
Scientific Research, Page Number 19-28, Volume 1, Issue 1, 2016. doi:10.5281/zenodo.154757
[8] [8] Agarwala R, et al. NCBI Resource Coordinators,2015. doi: 10.1093/nar/gkv1290.
[9] [9] Y. Tateno, T. Imanishi, S. Miyazaki, K. Fukami-Kobayashi, N. Saitou, H. Sugawara, T. Gojobori.
Nucleic Acids Res. 30 (1): 27–30,2002. doi: 10.1093/nar/30.1.27
[10] Berman H., Battistuz T., Bhat T., Bluhm W., Bourne P., Burkhardt K., Feng Z., Gilliland G., Iype
L., Jain S., Fagan P., Marvin J., Padilla D., Ravichandran V., Schneider B., Thanki N., Weissig

359
AJBAS Volume 3, Issue II, 2022 Fawzy, et al

H., Westbrook J., Zardecki C., ACTA Crystallographica Section D-Biological Crystallography
58,2002.doi: 10.1007/978-1-4020-6754-9_13596.
[11] W. C. Barker, J. Garavelli, Hongzhan Huang, P. McGarvey, B. C. Orcutt, G. Srinivasarao, C. Xiao, L.
Yeh, R. Ledley, Joseph F. Janda, F. Pfeiffer, H. Mewes, A. Tsugita, Cathy H. Wu, Nucleic
[12] Apweiler R., Bairoch A., Wu C., Barker W., Boeckmann B., Ferro S., Gasteiger E., Huang H., Lopez
R., Magrane M., Martin M., Natale D., O'Donovan C., Redaschi N., Yeh Y., Nucleic acids research
32. suppl_1, D115-D119,2004. doi: 10.1093/nar/gkaa1100.
[13] Lane L., Argoud-Puy G., Britan A., Cusin I., Duek P., Evalet O., Gateau A., Gaudet p., Gleizes
A., Masselot A., Zwahlen C., Bairoch A., Nucleic acids research 40.D1 D76-D83,2012. doi:
10.1093/nar/gkr1179.
[14] Mulder N., Apweiler R., Attwood T., Bairoch A., Barrell D., BatemanA., Binns D., Biswas
M., Bradley P., Bork P., Bucher P., Copley R., Courcelle E., Das U., Durbin R., Falquet
L., Fleischmann W., Sam Griffiths-Jones, Haft D., Harte N., Hulo N., Kahn D., Kanapin
A., Krestyaninova M., Lopez R., Letunic I., Lonsdale D., Silventoinen V., Sandra E. Orchard, Pagni
M., Peyruc D., Chris P. Ponting, Jeremy D. Selengut, Servant F., Christian J. A. Sigrist,4 Robert
Vaughan,1 and Evgueni M. Zdobnov, Nucleic acids research 31.1 ,2003. doi: 10.1093/nar/gkg046
[15] Finn R., Coggill P., Eberhardt R., Eddy S., Mistry J., Mitchell A., Potter S., Punta M., Qureshi M., V
egas AS., Salazar G., Tate J., Bateman A., Nucleic acids research 32. suppl_1 ,D138-D141,2004. doi:
10.1093/nar/gkm960.
[16] Kamburov, Atanas, Nucleic acids research 41. D1: D793-D800, 2013. doi: 10.1093/nar/gks1055.
[17] Servick, Kelly. Can 23andMe have it all?, Vol 349, Issue 6255 • pp. 1472-1477 • DOI:
10.1126/science.349.6255.1472
[18] International HapMap Consortium. The international HapMap project, Nature 426.6968 ,2003. doi:
10.1038/nature02168.
[19] Ada Hamosh, Alan F. Scott, Joanna Amberger, David Valle, and Victor A. McKusick. Nucleic acids
research 15.1: 57-61, 2000.doi: 10.1002/ajmg.a.62407.
[20] Kozomara, Ana, and Sam Griffiths-Jones ,Nucleic acids research 42.D1: D68-D73,2014. doi:
10.1093/nar/gkt1181.
[21] Uhlén M. , Björling E., Agaton C., Szigyarto C., Amini B., Andersen E., Andersson A., Angelidou
P., Asplund A., Asplund C., Berglund L., Bergström K., Brumer H., Cerjan D., Ekström M., Elobeid
A., Eriksson C., Fagerberg L., Falk R., Fall J., Forsberg M., Björklund M., Gumbel K., Halimi
A., Hallin I., Hamsten C, Hansson M., Hedhammar M., Hercules G., Kampf K., Larsson K., Lindskog
M., Lodewyckx W., Jan Lund, Joakim Lundeberg, Magnusson K., Malm E., Nilsson P., Odling
J., Oksvold P., OlssonI., Oster E., Jenny Ottosson, Linda Paavilainen, Anja Persson, Rimini
R., Rockberg J., Runeson M., Sivertsson A., Sköllermo A., Johanna Steen, Maria Stenvall, Fredrik
Sterky, Strömberg S., Sundberg M., Tegel H., Tourle S., Wahlund E., Waldén A., Wan J., Wernérus
H., Westberg J., Wester K., Wrethagen U., Lan Xu L., Hober S., Pontén F.,Molecular & cellular
proteomics 4.12: 1920-1932, 2005. doi: 10.1074/mcp.M500279-MCP200.
[22] A G Murzin , S E Brenner, T Hubbard, C Chothia., Journal of molecular biology 247.4: 536-540, 1995.
doi: 10.1093/nar/25.1.236.
[23] Røgen, Peter, Fain B., Proceedings of the National Academy of Sciences 100.1: 119-124, 2003. doi:
10.1073/pnas.2636460100.
[24] D. Gardner , D. H. Goldberg , B. Grafstein , A. Robert, Neuroinform (2008) 6:149–160, 2008. doi:
10.1007/s12021-008-9024-z.
[25] Arita M., Mizrachi I., Cochrane G., Nucleic acids research 44.D1: D48-D50, 2016. doi:
10.1093/nar/gkaa967.
[26] T Hubbard , D Barker, E Birney, G Cameron, Y Chen, L Clark, T Cox, J Cuff, V Curwen, T Down, R
Durbin, E Eyras, J Gilbert, M Hammond, L Huminiecki, A Kasprzyk, H Lehvaslaiho, P Lijnzaad, C
Melsopp, E Mongin, R Pettett, M Pocock, S Potter, A Rust, E Schmidt, S Searle, G Slater, J Smith, W
Spooner, A Stabenau, J Stalker, E Stupka, A Ureta-Vidal, I Vastrik, M Clamp., Nucleic acids research
30.1: 38-41, 2002. doi: 10.1093/nar/30.1.38.
[27] Pruitt K., Tatusova T., Ostell J., The NCBI Handbook 2 ,2002. doi: 10.1093/nar/gkv1189.
[28] Gaulton A., Bellis L., Bento A., Chambers J., Davies M., Hersey A., Light Y., McGlinchey S., Michalovich
D., Al-Lazikani B., John P Overington., Nucleic acids research 40.D1: D1100-D1107, 2012. doi:
10.1093/nar/gkv1189.
[29] Kanehisa, Minoru. The KEGG database, Novartis Foundation Symposium. Chichester; New York; John
Wiley;1999, 2002. doi:10.1002/0470857897.
[30] Rhee, Yon S.,Nucleic acids research 26.1: 55-59., 2003. doi: 10.1093/nar/gkg076.
360
Fawzy, et al AJBAS Volume 3, Issue II, 2022

[31] Angrist, Misha. US National Library of Medicine National Institutes of Health 6.6: 691-699, 2009. doi:
10.2217/pme.09.48.
[32] Winnenburg R., K. Baldwin T., Urban M., Rawlings C., Köhler J., Kim E. Hammond-Kosack. Nucleic
acids research 34. suppl_1: D459-D464, 2006. doi: 10.3389/fpls.2015.00605.
[33] J. Michael Cherry, Nucleic acids research 26.1: 73-79, 1998. doi: 10.1101/pdb.top083840.
[34] Parr, C.S., Wilson, N., Leary, P., Schulz, K.S., Lans, K., Walley, L.,Corrigan JR.,Biodiversity Data
Journal, (2), e1079. Advance online publication, 2014. doi:10.3897/BDJ.2.e1079.
[35] Barrett, T., Troup, D. B., Wilhite, S. E., Ledoux, P., Rudnev, D., Evangelista, C., Edgar, Biology,
Nucleic Acids Res, 37 (Database issue), D885-D890, 2009. doi: 10.1093/nar/gks1193.
[36] Cariaso, Mike, and G. Lennon, Biology, Nucleic Acids Res, 2010.doi: 10.1007/978-1-4419-9863-
7_1039.
[37] M. Mangan, Jennifer M. Williams, Lathe S., D. Karolchik, W. Lathe ,Biotechnology annual review 14:
63-108 , 2008.doi: 10.1016/S1387-2656(08)00003-3.
[38] FlyBase Consortium., Nucleic Acids Research 27.1: 85-88, 1999. doi: 10.1093/nar/27.1.85.
[39] Xenarios, Ioannis, and Eisenberg D., Current Opinion in Biotechnology 12.4: 334-339, 2001.doi:
10.1007/978-1-4419-9863-7_1046.
[40] Elizabeth M. Smigielski, K. Sirotkin, M. Ward, S. Sherry, Nucleic acids research 28.1: 352-355, 2000.
doi: 10.1093/nar/28.1.352.
[41] Deutsch EW, Eng JK., Zhang H., King NL., Nesvizhskii AI., Lin B., Lee H., Yi EC., Ossola
R., Aebersold R., Proteomics 5.13: 3497-3500, 2005. doi: 10.1002/pmic.200500160.
[42] Martens L., Hermjakob H., Jones P., Adamski M., Taylor C., States D., Gevaert K., Vandekerckhove
J., Apweiler R., Proteomics 5.13: 3537-3545,2005. doi: 10.1002/pmic.200401303.
[43] Rawlings, Neil D., Alan J., Barrett, Bateman A., Nucleic acids research 38. suppl_1: D227-D233,
2010. doi: 10.1093/nar/gkh071.
[44] Huss III, Jon W., PLoS biology 6.7 ,2008. doi: 10.1371/journal.pbio.0060175.
[45] Keseler I., Mackie M., Zavaleta SA., Billington R., Martínez C., Caspi R., Fulcher C.,
GamaCastro S., Kothari A., Krummenacker M., Latendresse M., Muñiz-
Rascado L., Ong Q., Paley S., Peralta-Gil M., Subhraveti P., David A Velázquez-
Ramírez , Weaver D., Collado-Vides J., Paulsen I., Peter D Karp., Nucleic acids research 30.1: 56-58,
2002. doi: 10.1128/ecosalplus.ESP-0009-2013.
[46] Shi M., Zhang N., ShiC., Liu C., Luo Z., Wang D., Guo A., Chen Z., Nucleic acids research 47.D1:
D835-D840, 2019. doi: 10.1093/nar/gky1040.
[47] Oldridge DA.,Wood AC., Weichert-Leahey N., Crimmins I., Sussman R.,Winter C., McDaniel LD.,
a. Diamond M., Hart L., Zhu S., Durbin A., Abraham B., Anders L., Tian L., Zhang S., Wei J.,
b. Khan J., Bramlett K., Rahman N., Capasso M., Iolascon A., Gerhard D., Auvil JG., Young R.,
Hakonarson H., Diskin S., Look aT., Maris JM., Nature, 528.7582: 418-421, 2015.
doi:10.1038/nature15540.
[48] Noronha A., Daníelsdóttir A., Gawron P., Jóhannsson F., Jónsdóttir S., Jarlsson S., Brynjólfsson S., Sch
neider R., Thiele I., Fleming R., Computer science, Bioinformatics 33.4 :605-607, 2017. doi:
10.1093/bioinformatics/btw667.
[49] Phaneuf PV., Gosting D., Palsson BO., Feist AM., Nucleic acids research 47.D1: D1164-D1171,
2019. doi: 10.1093/nar/gky983.
[50] Kinney K., Titus-Glover K., Wren JD., Varghese RT., Michalak P., Liao H., Anandakrishnan
R., Pulenthiran A., Kang L., Nucleic acids research 47.D1 : D39-D45,2019. doi: 10.1093/nar/gky969.
[51] Todd W. Harris, Lee R., Schwarz E., Bradnam K., LawsonD., Chen W., Blasier D., Kenny
E., Cunningham F., Kishore R., Chan J., Muller H., Petcherski A., Thorisson G., Day A., Bieri
T., Rogers A., Chen C., Spieth J., Sternberg P., Durbin R., Lincoln D. Stein, Nucleic acids research
31.1: 133-137, 2003. doi: 10.1093/nar/gkg053
[52] Bowes JB., Snyder KA., Segerdell E., Gibb R., Jarabek C., Noumen E., Pollet N., Peter D. Vize,
Nucleic acids research 36. suppl_1: D761-D767, 2007. doi: 10.1093/nar/gkm826.
[53] Kimball R, Ross M, Mundy J,. The kimball group reader. John Wiley & Sons, New York, NY,2015
[54] Severin, J., Waterhouse AM., Kawaji H., Lassmann T., van Nimwegen E., Balwierz PJ., Hoon MJ.,
Hume DA., Carninci P., Hayashizaki Y., Suzuki H., Daub CO., Forrest AR., Genome Biology, 10(4),
R39,2009. doi: 10.1186/gb-2009-10-4-r39
[55] Stark C., Breitkreutz B.J., Reguly T., Boucher L., Breitkreutz A., Tyers M., Nucleic acids research,
34(suppl_1), pp. D535–D539,2006. doi: 10.1093/nar/gkj109.

361
AJBAS Volume 3, Issue II, 2022 Fawzy, et al

[56] Parr C.S., Wilson N., Leary P., Schulz K.S., Lans K., Walley L.,Corrigan r., Biodiversity Data Journal,
(2), e1079. Advance online publication,2014. doi:10.3897/BDJ.2.e1079.
[57] Barrett T., Troup D.B., Wilhite S.E., Ledoux P., Rudnev D., Evangelista C., Edgar R., Nucleic Acids
Research, 37 (Database issue), D885-D890, 2009. doi: 10.1093/nar/gkn764.
[58] Caswell J., Gans JD., Generous N., Merkley E., Johnson C., Oehmen C., Omberg K., Purvine E., Tayl
or K., Ting CL., Wolinsky M., Xie G., Front. Bioeng. Biotechnol. 7:58, 2019.
doi:10.3389/fbioe.2019.00058
[59] Subramaniam S., PROTEINS: Structure, Function, and Genetics, 1998.doi: 10.1145/360262.360265.
[60] Kohler J., Drug Discovery Today: BIOSILICO 2, 61–69 ,2004.doi: 10.1016/S1741-8364(04)02392-3.
[61] RENCI, international conference on Digital government research, 2007.doi:
10.1145/1146598.1146708.
[62] Letondal C., Bioinformatics 17(1), 73–82 ,2001. doi:10.1093/bioinformatics/17.1.73.
[63] Rifaieh R., Third International Workshop, DILS 2006, Hinxton, UK, July 20-22, 2006. Proceedings ,
2007.doi: 10.1007/978-3-319-69751-2.
[64] Badidi, E., De Sousa, C., Lang, B.F., Burger, G.: United Arab Emirates University 4, 63–72 ,2003.doi:
10.1186/1471-2105-4-63
[65] Badidi, E., De Sousa, C., Lang, B.F., Burger, G., International Conference on Innovations in
Information Technology ,50(7), 785–793 ,2004.
[66] Pereira,R. and Mendes,R, International Journal of Bioscience, Biochemistry and Bioinformatics, Vol.
4, No. 5, September 2014.doi: 10.7763/IJBBB.2014.V4.368
[67] Malick R., Ussama M., and Azim MK., Journal of Information Technology & Software Engineering,
Issue 1, 3:115, 2013.doi: 10.4172/2165-7866.1000115.
[68] C Orengo , A Michie, S Jones, D T Jones, M B Swindells, J M Thornton., biology, Structure 5.8: 1093-
1109, 1997. doi: 10.1016/s0969-2126(97)00260-8.
[69] Nuala A., O’Leary, Mathew W., Wright, J. Rodney Brister, Stacy Ciufo, Diana Haddad, Rich McVeigh,
Bhanu Rajput, Barbara Robbertse, Brian Smith-White, Danso Ako-Adjei, Alexander Astashyn, Azat
Badretdin, Yiming Bao, Olga Blinkova, Vyacheslav Brover, Vyacheslav Chetvernin, Jinna Choi, Eric Cox,
Olga Ermolaeva, Catherine M. Farrell, Tamara Goldfarb, Tripti Gupta, Daniel Haft, Eneida Hatcher,
Wratko Hlavina, Vinita S. Joardar, Vamsi K., Kodali, Wenjun Li., Donna Maglott, Patrick Masterson,
Kelly M. McGarvey, Michael R. Murphy, Kathleen O’Neill, Shashikant Pujar, Sanjida H. Rangwala,
Daniel Rausch, Lillian D. Riddick, Conrad Schoch, Andrei Shkeda, Susan S. Storz, Hanzhen Sun,
Francoise Thibaud-Nissen, Igor Tolstoy, Raymond E. Tully, Anjana R. Vatsan, Craig Wallin, David Webb,
Wendy Wu, Melissa J. Landrum, Avi Kimchi, Tatiana Tatusova, Michael DiCuccio, Paul Kitts, Terence D.
Murphy and Kim D. Pruit, Nucleic Acids Res. 2016 Jan 4; 44(Database issue): D733–D745.Published
online 2015 Nov 8. doi: 10.1093/nar/gkv1189
[70] Cuff AL, Sillitoe I, Lewis T, Redfern OC, Garratt R, Thornton J, Nucleic Acids Research, Volume 39,
Issue suppl_2, 1 July 2011, Pages W546–W550, doi: 10.1093/nar/gkn877.
[71] Sima AC., Zbinden E., Anisimova M., Gil M., Stockinger H., Stockinger K., Rechavi MR., serveur
académique Lausannois, Volume 2019, 2019, baz106, 07 November 2019 .doi: 10.1093 /baz106

362

You might also like