Bioinformatics. CH 3 Databases (Summarized Notes)
3 Databases
A database is a computerized archive used to store and organize data in such a way that
information can be retrieved easily via a variety of search criteria.
Databases are composed of computer hardware and software for data management.
The chief objective in developing a database is to organize data into a set of
structured records that enable easy retrieval of information.
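The idea of structured records that can be retrieved by search criteria can be sketched with SQLite through Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table name entries, its columns, the helper find and the accession numbers are all hypothetical and are not taken from any real biological database.

```python
import sqlite3

# Hypothetical in-memory database: each row is one structured record.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    "accession TEXT PRIMARY KEY, organism TEXT, molecule TEXT, length INTEGER)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        ("ACC001", "Homo sapiens", "DNA", 1200),
        ("ACC002", "Drosophila melanogaster", "DNA", 850),
        ("ACC003", "Homo sapiens", "protein", 310),
    ],
)


def find(organism, molecule):
    """Return the accession numbers of records matching two search criteria."""
    rows = conn.execute(
        "SELECT accession FROM entries WHERE organism = ? AND molecule = ? "
        "ORDER BY accession",
        (organism, molecule),
    )
    return [row[0] for row in rows]


print(find("Homo sapiens", "DNA"))  # ['ACC001']
```

Because every record has the same fields, the same function can answer many different queries simply by changing the criteria passed to it.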
Types of databases
• Originally, databases all used a flat file format, which is a long text file that contains
many entries separated by a delimiter, a special character such as a vertical bar (|).
Within each entry are a number of fields separated by tabs or commas.
• To facilitate the access and retrieval of data, sophisticated computer software
programs for organizing, searching, and accessing data have been developed. They
are called database management systems.
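To show why the flat file format needs extra software for searching, here is a minimal Python sketch that parses a flat file whose entries are separated by a vertical bar and whose fields are separated by tabs. The three-field layout (accession, organism, sequence), the sample entries and the function name parse_flat_file are assumptions made for illustration.

```python
# Hypothetical flat file: entries separated by "|", fields within an entry by tabs.
FLAT_FILE = (
    "ACC001\tHomo sapiens\tATGGCC|"
    "ACC002\tDrosophila melanogaster\tATGAAA|"
    "ACC003\tHomo sapiens\tATGTTT"
)


def parse_flat_file(text, entry_sep="|", field_sep="\t"):
    """Split a flat file into entries, then split each entry into its fields."""
    records = []
    for entry in text.split(entry_sep):
        if not entry.strip():
            continue  # skip empty entries, e.g. from a trailing delimiter
        accession, organism, sequence = entry.split(field_sep)
        records.append(
            {"accession": accession, "organism": organism, "sequence": sequence}
        )
    return records


records = parse_flat_file(FLAT_FILE)
# Without a database management system, every query is a scan over all entries.
human = [r["accession"] for r in records if r["organism"] == "Homo sapiens"]
print(human)  # ['ACC001', 'ACC003']
```

Every search here re-reads the whole file; organizing, indexing and searching records efficiently is part of what a database management system adds over a plain flat file.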
Biological databases
Based on their contents, biological databases can be roughly divided into four categories:
primary databases, secondary databases, specialized databases and composite databases.
Examples. A few popular databases are GenBank from NCBI (National Center for
Biotechnology Information), Swiss-Prot from the Swiss Institute of Bioinformatics and PIR
from the Protein Information Resource.
1. Primary databases
Primary databases are also called archival databases.
They are populated with experimentally derived data such as nucleotide sequence,
protein sequence or macromolecular structure.
Experimental results are submitted directly into the database by researchers, and the
data are essentially archival in nature.
Once given a database accession number, the data in primary databases are never
changed: they form part of the scientific record.
Examples
Swiss-Prot and PIR for protein sequences
GenBank and DDBJ for nucleotide (including genome) sequences
Protein Data Bank (PDB) for protein structures
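The archival nature of a primary database (an accession number is issued on submission and the record is then never changed) can be modelled with a small Python class. The class name PrimaryArchive, the ACC prefix and the six-digit numbering are hypothetical and do not reproduce the accession format of GenBank, DDBJ or any other real database.

```python
class PrimaryArchive:
    """Hypothetical append-only store: once an accession is issued, the record is frozen."""

    def __init__(self, prefix="ACC"):
        self._records = {}
        self._prefix = prefix
        self._next_id = 1

    def submit(self, data):
        """Store a new submission and return the accession number assigned to it."""
        accession = f"{self._prefix}{self._next_id:06d}"
        self._next_id += 1
        self._records[accession] = data
        return accession

    def get(self, accession):
        """Retrieve a record by its accession number."""
        return self._records[accession]

    def update(self, accession, data):
        # Archived data form part of the scientific record, so overwriting is refused.
        raise PermissionError(f"{accession} is archived and cannot be modified")


archive = PrimaryArchive()
acc = archive.submit("ATGGCCATTGTA")
print(acc)               # ACC000001
print(archive.get(acc))  # ATGGCCATTGTA
```

Calling archive.update(acc, ...) raises PermissionError, mirroring the rule that archived data are never changed once an accession number has been given.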
2. Secondary databases
Secondary databases comprise data derived from the results of analysing primary
data.
Secondary databases often draw upon information from numerous sources,
including other databases (primary and secondary), controlled vocabularies and the
scientific literature.
They are highly curated, often using a complex combination of computational
algorithms and manual analysis and interpretation to derive new knowledge from
the public record of science.
Bioinformatics collected by Dr. Nawab Ali Assistant Prof. GDC Thana, District Malakand 1
Examples
InterPro (protein families, motifs and domains)
UniProt Knowledgebase (sequence and functional information on proteins)
Ensembl (variation, function, regulation and more layered onto whole genome
sequences)
Swiss-Prot and Protein Information Resource (PIR)
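Secondary databases derive new knowledge by analysing primary data. One small, purely computational piece of that process can be sketched as a motif scan: the Python code below searches protein sequences for the N-glycosylation sequon (N, then any residue except P, then S or T) and builds a derived annotation table. The accession names and sequences are hypothetical, and real secondary databases such as InterPro combine many signatures with manual curation, which this sketch does not attempt.

```python
import re

# N-glycosylation sequon: N, then any residue except P, then S or T.
# The lookahead (?=...) lets overlapping sites be reported as well.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sites(protein_seq):
    """Return 1-based start positions of the motif in a protein sequence."""
    return [m.start() + 1 for m in SEQUON.finditer(protein_seq.upper())]


# Hypothetical primary entries (accession -> sequence).
primary_entries = {"P_HYPO1": "MKNGSAANPTLNVT", "P_HYPO2": "MAAAAK"}

# The derived annotation is a new table computed from primary data.
annotations = {acc: find_sites(seq) for acc, seq in primary_entries.items()}
print(annotations)  # {'P_HYPO1': [3, 12], 'P_HYPO2': []}
```

In the first sequence, the N at position 8 is skipped because it is followed by P, which the motif excludes.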
3. Specialized databases
Specialized databases are those that cater to a particular research interest. For
example, FlyBase, the HIV Sequence Database and the Ribosomal Database Project are
databases that specialize in a particular organism or a particular type of data.
4. Composite databases
Composite databases combine a variety of primary databases, which eliminates the need to
search each one separately. Each composite database has its own search algorithms and data
structures. NCBI hosts databases of this kind, and links to Online Mendelian Inheritance in
Man (OMIM) can be found through them.
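The main convenience of a composite database, one query instead of a separate search in each member database, can be sketched in Python as below. The member database names, the identifiers, the function name composite_search and the simple substring matching are all hypothetical; this does not reproduce NCBI's actual search system.

```python
# Hypothetical member databases, each mapping an identifier to a description.
DATABASES = {
    "nucleotide": {"NM_HYPO1": "example gene mRNA", "NM_HYPO2": "another transcript"},
    "protein": {"NP_HYPO1": "example gene protein product"},
    "disease": {"OMIM_HYPO1": "disorder linked to example gene"},
}


def composite_search(term, databases=DATABASES):
    """Search every member database in one call; results are grouped by database."""
    term = term.lower()
    hits = {}
    for db_name, entries in databases.items():
        matches = sorted(acc for acc, text in entries.items() if term in text.lower())
        if matches:  # only report databases that returned something
            hits[db_name] = matches
    return hits


print(composite_search("example gene"))
# {'nucleotide': ['NM_HYPO1'], 'protein': ['NP_HYPO1'], 'disease': ['OMIM_HYPO1']}
```

A single query returns linked records from the sequence, protein and disease collections together, which is the kind of cross-database link described above.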
Importance of Databases
1. Databases act as a storehouse of information.
2. Databases are used to store and organize data in such a way that information
can be retrieved easily via a variety of search criteria.
3. They allow knowledge discovery, which refers to the identification of
connections between pieces of information that were not known when the
information was first entered. This facilitates the discovery of new biological
insights from raw data.
4. Secondary databases have become the molecular biologist’s reference library
over the past decade or so, providing a wealth of information on just about
any gene or gene product that has been investigated by the research
community.
5. They allow many users to access the same data entries at the same time.
6. They help to reduce data redundancy.
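Point 6 can be illustrated with a minimal Python sketch that collapses identical sequences into a single record while keeping every accession that pointed to it. Treating sequences as redundant only when they are identical (ignoring letter case), the function name remove_redundancy and the sample accessions are assumptions for illustration; real databases may use looser criteria.

```python
def remove_redundancy(entries):
    """Collapse identical sequences into one record that keeps all source accessions."""
    unique = {}  # sequence -> list of accessions that share it
    for accession, sequence in entries:
        unique.setdefault(sequence.upper(), []).append(accession)
    return unique


# Hypothetical submissions: A1 and B7 are the same sequence in different letter case.
submissions = [("A1", "MKTAYIAK"), ("B7", "mktayiak"), ("C3", "MGLSDGEW")]
nonredundant = remove_redundancy(submissions)
print(nonredundant)  # {'MKTAYIAK': ['A1', 'B7'], 'MGLSDGEW': ['C3']}
print(len(submissions), "->", len(nonredundant))  # 3 -> 2
```

No accession is lost: each one still points to its sequence, but the sequence itself is stored only once.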
Nucleic acid sequence databases
The NCBI Nucleotide database is a collection of sequences from several sources, including
GenBank, RefSeq, TPA and PDB. Genome, gene and transcript sequence data provide the
foundation for biomedical research and discovery.
Protein databases
The biological information of proteins is available as sequences and structures.
Sequences are represented in a single dimension, whereas a structure contains the
three-dimensional coordinates of the molecule.
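The difference between the one-dimensional sequence and the three-dimensional structure can be shown with a small Python data structure. The class name ProteinEntry, the use of one C-alpha coordinate per residue and the coordinate values themselves are made up for illustration; real structure entries such as those in the PDB store coordinates for individual atoms.

```python
import math
from dataclasses import dataclass


@dataclass
class ProteinEntry:
    sequence: str    # one-dimensional: one letter per residue
    ca_coords: list  # three-dimensional: one (x, y, z) tuple per residue (hypothetical C-alpha positions)

    def residue_distance(self, i, j):
        """Euclidean distance between residues i and j (0-based); needs the 3D data."""
        return math.dist(self.ca_coords[i], self.ca_coords[j])


# Made-up coordinates for a three-residue fragment.
entry = ProteinEntry(
    sequence="MKT",
    ca_coords=[(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.0, 4.0)],
)
print(len(entry.sequence))  # 3
print(entry.residue_distance(1, 2))
```

A spatial distance like this cannot be computed from the sequence string alone; that extra information is what a structure entry carries. math.dist requires Python 3.8 or later.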
A protein database is one or more datasets about proteins, which could include
a protein’s amino acid sequence, conformation, structure, and features such as active
sites.
Protein databases are compiled by the translation of DNA sequences from different
gene databases and include structural information. They are an important resource
because proteins mediate most biological functions.
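The translation of DNA into protein mentioned above can be sketched in Python with the standard genetic code. The function name translate and its to_stop option are assumptions for this sketch; it reads only the first reading frame, marks stop codons with "*" and builds the codon table by listing codons in TCAG order.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order ("TTT", "TTC", "TTA", ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, to_stop=False):
    """Translate a coding DNA sequence in reading frame 1 ("*" marks a stop codon)."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(dna) - 2, 3):  # an incomplete trailing codon is ignored
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" for codons with ambiguous bases
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(translate(seq))                # MAIVMGR*KGAR*
print(translate(seq, to_stop=True))  # MAIVMGR
```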
Importance of protein databases
Huge amounts of data on protein structures, functions and, particularly, sequences are being
generated. Searching databases is often the first step in the study of a new protein. Protein
databases have the following uses:
1. Comparison between proteins or between protein families provides information
about the relationship between proteins within a genome or across different species,
and hence offers much more information than can be obtained by studying an
isolated protein alone.
2. Secondary databases derived from experimental databases are also widely available.
These databases reorganize and annotate the data or provide predictions.
3. The use of multiple databases often helps researchers understand the structure and
function of a protein.
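Comparing proteins (point 1) usually starts with a similarity measure. Below is a minimal Python sketch of percent identity between two sequences that have already been aligned. The alignment itself, the choice to skip gap columns and the function name percent_identity are assumptions; other definitions of percent identity divide by a different length.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ("-") are left out of the count.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


print(percent_identity("ACDEF", "ACDEG"))                # 80.0
print(round(percent_identity("MKT-LLV", "MKTALIV"), 1))  # 83.3
```

In the second example, the gap column is skipped, leaving five identical positions out of six compared.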