
biologicaldatabase-190402034501

The document provides an overview of biological databases, which are collections of machine-readable records of biological data that can be accessed and modified. It classifies databases into primary, secondary, and composite types, detailing examples such as GenBank, DDBJ, and EMBL for nucleotide sequences, as well as Swiss-Prot and UniProt for protein sequences. Additionally, it discusses the significance of these databases in managing and sharing biological information for research purposes.

Uploaded by

benjaminkatiyo76

BIOLOGICAL DATABASE

Dr. Nusaifa Beevi.P


Associate Professor & HOD,
PG Department of Botany,
Iqbal College, Peringammala
INTRODUCTION
 BIOLOGICAL DATABASES
 Collection of files containing records of
biological data in machine readable form
 Can be accessed, added, retrieved,
manipulated and modified
 Store, manage, connect and distribute
data
 Data are arranged according to sets of rules
programmed into the software that manages
the data, called a Database Management
System (DBMS).
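The operations listed above (add, retrieve, modify) can be sketched with Python's standard-library sqlite3 module standing in for a DBMS. The table name, column names and the example record are made up for illustration; nothing here implies that the databases described in these slides run on SQLite.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema is the "set of rules" the DBMS enforces for every record.
conn.execute(
    "CREATE TABLE sequences ("
    " accession TEXT PRIMARY KEY,"
    " organism TEXT NOT NULL,"
    " sequence TEXT NOT NULL)"
)

# Add a record (hypothetical accession and sequence).
conn.execute(
    "INSERT INTO sequences VALUES (?, ?, ?)",
    ("DEMO0001", "Oryza sativa", "ATGGCC"),
)

# Modify the record.
conn.execute(
    "UPDATE sequences SET sequence = ? WHERE accession = ?",
    ("ATGGCCTAA", "DEMO0001"),
)

# Retrieve the record.
row = conn.execute(
    "SELECT organism, sequence FROM sequences WHERE accession = ?",
    ("DEMO0001",),
).fetchone()
print(row)
```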
Classification based on type of
data stored
 Primary Databases: Contain original data in
the form of primary sequence data or structural
data as submitted by the scientific community.
 Secondary Databases: Contain information
that has been processed and derived from the
raw data available in primary databases,
e.g. PROSITE, PRINTS and BLOCKS.
 Composite Databases: Collect data from
different primary databases, compare and
filter them, and present only the
non-redundant sequences.
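The filtering step of a composite database can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the identifiers and the sequences are hypothetical, and treating two entries as redundant only when their sequences match exactly is a simplification chosen for the example.

```python
def build_composite(*primary_databases):
    """Merge primary databases, keeping one entry per distinct sequence.

    Each primary database is a dict mapping an identifier to a sequence.
    The first identifier seen for a sequence is kept; later duplicates
    are recorded as cross-references instead of separate entries.
    """
    composite = {}  # sequence -> {"id": ..., "also_in": [...]}
    for database in primary_databases:
        for identifier, sequence in database.items():
            key = sequence.upper()  # compare sequences case-insensitively
            if key in composite:
                composite[key]["also_in"].append(identifier)
            else:
                composite[key] = {"id": identifier, "also_in": []}
    return composite


# Two made-up primary databases that share one sequence.
db_a = {"A1": "MKTAYIAK", "A2": "MSLLTEVE"}
db_b = {"B7": "mktayiak", "B9": "MGDVEKGK"}

nr = build_composite(db_a, db_b)
for seq, entry in nr.items():
    print(entry["id"], seq, entry["also_in"])
```

Four submitted entries collapse to three non-redundant ones, and the duplicate is kept as a cross-reference rather than discarded.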
PRIMARY DATABASES
 Nucleic acid databases: GenBank,
EMBL, DDBJ

 Protein sequence databases: PIR,
Swiss-Prot, UniProt

 Protein structure database: PDB

 Metabolic databases: KEGG


Nucleotide sequence
database
 Composed of a group of nucleotide sequence
entries.
 Data repositories that accept nucleic acid
sequence data and make it freely available to
the public.
 GenBank, EMBL and DDBJ are the principal
nucleotide databases.
 All three are members of the
International Nucleotide Sequence
Database Collaboration (INSDC) and
exchange data with one another.
GenBank of NCBI
 Hosted by the National Center for Biotechnology
Information (NCBI), located on the campus of the US National
Institutes of Health (NIH), USA.
 GenBank offers all publicly available nucleotide sequences,
their protein translations, and their annotations.
 It also facilitates direct submission of sequence data through a
user-friendly process.
 Researchers from anywhere can submit their data to
GenBank.
 Each submitted sequence is given an accession number
and is released to the public database after a quality
assurance check.
 This information can be retrieved using the Entrez
retrieval system.
 The data can be accessed over the internet through
the NCBI site, http://www.ncbi.nlm.nih.gov/genbank
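Retrieval by accession number through Entrez can also be done programmatically. The Python sketch below only builds a request URL and does not contact NCBI. It assumes the NCBI E-utilities efetch endpoint and its db, id, rettype and retmode parameters, none of which appear in the slides; the function name is hypothetical, and U49845 is used purely as an example accession.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities "efetch" (not named in the slides).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_record_url(accession):
    """Return an efetch URL for one GenBank flat-file record."""
    params = {
        "db": "nuccore",    # Entrez nucleotide database
        "id": accession,    # accession number of the record
        "rettype": "gb",    # GenBank flat-file format
        "retmode": "text",  # plain text rather than XML
    }
    return EFETCH + "?" + urlencode(params)


url = genbank_record_url("U49845")
print(url)
```

Opening the printed URL in a browser, or passing it to an HTTP client, should return the record as plain text.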
Home page of GenBank
DNA DATABANK OF JAPAN (DDBJ)
 Started in 1986, now hosted at the National Institute of
Genetics, Japan.
 Gathers data mainly from scientists in Japan, and also from
researchers all over the world.
 It exchanges nucleotide sequence data with GenBank
and EMBL.
 About 99% of the nucleotide data that Japanese researchers
contribute to INSDC are submitted through DDBJ, which
enhances the quality of INSDC.
 Each entry includes the sequence details, submitter details,
biological significance, and the scientific name and
taxonomy of the organism. In addition, features that
identify coding regions, transcription units, mutation sites
etc. are displayed in a feature table.
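To give a feel for what a feature table holds, here is a rough Python sketch that reads a small flat-file style fragment into a list of features. The fragment is invented for illustration, the function name is hypothetical, and the parser is deliberately simplified: it assumes one-line locations and one-line qualifiers, which real entries do not guarantee.

```python
# Invented fragment in the style of a flat-file feature table.
SAMPLE = """\
FEATURES             Location/Qualifiers
     source          1..120
                     /organism="Oryza sativa"
     CDS             10..99
                     /product="hypothetical protein"
"""


def parse_feature_table(text):
    """Return a list of features with key, location and qualifiers."""
    features = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or line.startswith("FEATURES"):
            continue  # skip blank lines and the table header
        if stripped.startswith("/"):
            # Qualifier line: attach it to the most recent feature.
            name, _, value = stripped[1:].partition("=")
            if features:
                features[-1]["qualifiers"][name] = value.strip('"')
        else:
            # Feature line: key (e.g. CDS) followed by its location.
            key, location = stripped.split(None, 1)
            features.append(
                {"key": key, "location": location, "qualifiers": {}}
            )
    return features


features = parse_feature_table(SAMPLE)
for feature in features:
    print(feature["key"], feature["location"], feature["qualifiers"])
```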
DDBJ (contd.)
 Major activities of the DDBJ include providing
internationally recognized accession
numbers for sequences, managing
bioinformatics databases, developing tools for
the analysis and visualization of biological
data, and conducting courses that help
beginners cope with the complexity of
biological data analysis.
 DDBJ can be accessed through its homepage,
http://www.ddbj.nig.ac.jp/.
DDBJ homepage
EMBL database
 European Molecular Biology Laboratory (EMBL) Nucleotide
Sequence Database, first established in 1980 (EMBL itself
was founded in 1974).
 Hosted in the UK by the EMBL European Bioinformatics
Institute (EMBL-EBI).
 EMBL is a non-profit research institution for molecular
biology, supported by 20 European countries and
Australia.
 The database collects nucleotide sequence data from individual
researchers, genome sequencing projects and patent
applications.
 Sequences are stored in this database as they would exist
in the biological state.
 The stored data correspond to wild-type sequences
without mutation or genetic manipulation.
 Accessed through the URL http://www.ebi.ac.uk/embl
EMBL homepage
PROTEIN SEQUENCE
DATABASES

 An array of amino acid sequence entries
arranged according to identification
number.
 Well-known protein sequence databases
available on the web are:
◦ Swiss-Prot
◦ PIR
◦ UniProt
Swiss-Prot
 Developed by the Swiss Institute of Bioinformatics (SIB)
and the European Bioinformatics Institute (EBI).
 High-quality, manually annotated protein sequence
database created in 1986.
 It provides high-level annotation, including protein
function and post-translational modifications.
 It provides all known relevant information about a
particular protein.
 It is now one of the two sections of UniProtKB:
UniProtKB/Swiss-Prot, which is manually annotated and
reviewed, and UniProtKB/TrEMBL, which is automatically
annotated and not reviewed.
 Available at http://www.expasy.ch/sprot
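The reviewed/unreviewed split is visible in UniProtKB FASTA headers, where the prefix sp marks a Swiss-Prot entry and tr marks a TrEMBL entry. The Python sketch below assumes that header layout (prefix, accession and entry name separated by vertical bars). The function name is hypothetical and the example header is shortened.

```python
# Section labels keyed by the database prefix of a UniProtKB FASTA header.
SECTIONS = {
    "sp": "UniProtKB/Swiss-Prot (reviewed)",
    "tr": "UniProtKB/TrEMBL (unreviewed)",
}


def parse_uniprot_header(header):
    """Split a FASTA header into (section, accession, entry name)."""
    prefix, accession, rest = header.lstrip(">").split("|", 2)
    entry_name = rest.split()[0]  # first token after the second bar
    return SECTIONS[prefix], accession, entry_name


# Shortened example header for human hemoglobin subunit alpha.
header = ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens"
parsed = parse_uniprot_header(header)
print(parsed)
```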
Swiss-Prot homepage
PIR
 Protein Information Resource database
 Established in 1984 by the National Biomedical Research
Foundation (NBRF).
 It is an integrated public bioinformatics resource that
supports genomic and proteomic research and scientific
studies.
 It assists researchers in the identification and
interpretation of protein sequence information.
 PIR can be searched by entry or by sequence
similarity.
 Data can be downloaded at
http://www.pir.georgetown.edu/.
 PIR offers a variety of resources mainly oriented to
assist the propagation and standardization of
protein annotation.
PIR homepage
UNIPROT
 It provides a comprehensive, high-quality and
freely accessible resource of protein
sequences.
 Entries are derived from genome
sequencing projects.
 The UniProt consortium comprises the
European Bioinformatics Institute (EBI), the
Swiss Institute of Bioinformatics (SIB) and
the Protein Information Resource (PIR).
 UniProt is composed of four components,
each optimized for different uses.


COMPONENTS OF UNIPROT
 1. UniProt Knowledgebase (UniProtKB):
extensive curated protein information
in two sections, UniProtKB/Swiss-Prot,
which is manually annotated and
reviewed, and UniProtKB/TrEMBL,
which is automatically annotated and not
reviewed.
 2. UniProt Reference Clusters (UniRef)
 3. UniProt Archive (UniParc)
 4. UniProt Metagenomic and
Environmental Sequences (UniMES)


UNIPROT homepage
PROTEIN STRUCTURE DATABASE
 Many proteins that share a common
evolutionary origin show structural
similarities.
 Dissimilar proteins exhibit differences in
primary, secondary, tertiary and
quaternary structure.
 Similar or dissimilar protein structures
can be predicted with the help of a
structure database.
 These databases store a collection of
three-dimensional structures of proteins.
PROTEIN DATA BANK (PDB)
 Understanding the shape of a molecule helps to
understand how it works.
 PDB is the main primary database for the
3D structures of proteins and nucleic
acids.
 The single worldwide archive of structural data.
 Maintained by the Research Collaboratory for
Structural Bioinformatics (RCSB).
 Structures determined by X-ray crystallography and
NMR spectroscopy are submitted to the PDB.
 These structures are then annotated as per the
depositor's specifications.
 Freely available and accessed through the URL
http://www.pdb.org/
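Structural data in the PDB are commonly distributed as fixed-width text records, with one ATOM line per atom. The Python sketch below assumes the column layout of the legacy PDB file format for ATOM records. The function name is hypothetical and the coordinates are invented; the sample line is assembled field by field so that the column widths stay explicit.

```python
def parse_atom_record(line):
    """Extract selected fixed-width fields from a PDB ATOM record."""
    return {
        "serial": int(line[6:11]),       # columns 7-11
        "atom": line[12:16].strip(),     # columns 13-16
        "residue": line[17:20].strip(),  # columns 18-20
        "chain": line[21],               # column 22
        "res_seq": int(line[22:26]),     # columns 23-26
        "x": float(line[30:38]),         # columns 31-38
        "y": float(line[38:46]),         # columns 39-46
        "z": float(line[46:54]),         # columns 47-54
    }


# Invented record, built from adjacent string literals.
sample = (
    "ATOM  "    # columns 1-6: record name
    "    1"     # columns 7-11: atom serial number
    " "         # column 12: blank
    " N  "      # columns 13-16: atom name
    " "         # column 17: alternate location indicator
    "MET"       # columns 18-20: residue name
    " "         # column 21: blank
    "A"         # column 22: chain identifier
    "   1"      # columns 23-26: residue sequence number
    "    "      # columns 27-30: insertion code and blanks
    "  27.340"  # columns 31-38: x coordinate
    "  24.430"  # columns 39-46: y coordinate
    "   2.614"  # columns 47-54: z coordinate
)

atom = parse_atom_record(sample)
print(atom)
```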
PDB homepage
MODEL ORGANISM
DATABASE
 MODs are also called organism-specific
databases. They describe the genome and other
information about well-studied experimental
organisms in the life sciences.
 They store large volumes of data and allow users
to analyse results and interpret datasets, including data
they generated themselves for an organism of their own interest.
 Examples:
 FlyBase: database of Drosophila melanogaster
 SGD: Saccharomyces Genome Database
 AGR: Arabidopsis Genome Resource
 HGP: Human Genome Project
 RGD: Rat Genome Database, etc.
BIODIVERSITY DATABASE
 Provide information on the biodiversity of a particular area or
group of living organisms.
 They may store genus-level information, species-level
information, information on nomenclature, or any
combination of the three.
 Species 2000
◦ Established in September 1994 by the International Union of
Biological Sciences (IUBS), in co-operation with the Committee on
Data for Science and Technology (CODATA) and the International
Union of Microbiological Societies (IUMS).
◦ It is a federation of database organizations working closely with
users, taxonomists and sponsoring agencies.
◦ It plans to create an array of participating global species databases
covering each of the major groups of organisms (plants, animals,
fungi and microbes).
◦ The goal of Species 2000 is to provide a uniform and validated
quality index of names of all known species for use as a practical
tool.
Thank You
