Lecture 5- DataBase

Uploaded by

aletimanaswini

ISC 211 Introduction to Bioinformatics
Lecture 5 – Bioinformatics DataBase
Dr. Athira B
Asst. Professor, CSE
IIIT Kottayam
Motivation
• A key concept in molecular biology is the flow of information:
DNA → RNA → Protein
• From a data point of view, this gives rise to multiple omics data types:
Genomics → Transcriptomics → Proteomics → Metabolomics
• This vast amount of data needs to be stored and organized so that it can be
accessed easily around the globe
Motivation-Human Genome Project
• A landmark global scientific effort whose signature goal was to
generate the first sequence of the human genome (covering almost all
human genes)
• Identified roughly 20,000 to 25,000 protein-coding genes, far fewer than
the ~1,00,000 estimated before the project
• Sequenced approximately 3 billion base pairs
• The goals were:
• Alert patients that are at risk of certain diseases
• Reliably predict course of disease
• Precise diagnosis and treatment
• Developing new treatments at molecular level
• Milestone in Biomedical Research
• https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genome-project
Motivation-Biological Big Data
• Advances in sequencing techniques have generated large amounts of
biological data
• As for humans, genetic data have also been generated for other model
organisms:
• Yeast (Saccharomyces cerevisiae)
• Fruit fly (Drosophila melanogaster)
• Nematode worm (Caenorhabditis elegans)
• Western clawed frog (Xenopus tropicalis)
• Mouse (Mus musculus)
• Zebrafish (Danio rerio)
• How should these data be stored so that researchers can retrieve them
easily and efficiently?
Databases
• A database stores and organizes related data for easy retrieval
E.g.: the contact book on your phone
• The most common form of database is the relational database (queried with SQL)
• There are many other kinds of databases: column databases, graph databases,
etc.
• Biological databases store biological data and the associated knowledge
• These knowledge bases are fundamental to scientific research
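The contact-book example can be sketched as a tiny relational database. This is a minimal illustration using Python's built-in sqlite3 module; the table name, column names, and contact entries are made up for the example and are not part of the lecture.

```python
import sqlite3


def build_contacts_db():
    """Create an in-memory relational database holding a toy contact book."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT NOT NULL)"
    )
    # Hypothetical entries, used only for illustration.
    conn.executemany(
        "INSERT INTO contacts (name, phone) VALUES (?, ?)",
        [("Asha", "555-0101"), ("Ravi", "555-0102"), ("Meera", "555-0103")],
    )
    conn.commit()
    return conn


def find_phone(conn, name):
    """Retrieve one phone number by name; return None if the name is absent."""
    row = conn.execute(
        "SELECT phone FROM contacts WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


conn = build_contacts_db()
print(find_phone(conn, "Ravi"))
```

The same idea (organize records in tables, retrieve them with a query) carries over to biological databases, only with far more records and richer fields.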
Biological Databases
• Store and handle the staggering volume of biological information
through the establishment and use of computer databases
• Current biological databases use all three types of database
structures: flat files, relational, and object-oriented
• Based on their contents, biological databases can be roughly divided
into three categories: primary databases, secondary databases, and
specialized databases.
Primary Databases
• Contain original biological data. They are archives of raw sequence or
structural data submitted by the scientific community
• Examples: GenBank, the European Molecular Biology Laboratory (EMBL)
database, the DNA Data Bank of Japan (DDBJ), and the Protein Data
Bank (PDB)
Secondary Databases
• Secondary databases contain computationally processed or manually
curated information, based on original information from primary
databases.
• Translated protein sequence databases containing functional
annotation belong to this category
• Example: SWISS-PROT
Specialized Databases
• Specialized databases normally serve a specific research community
or focus on a particular organism
• The content of these databases may be sequences or other types of
information
• Examples include FlyBase, WormBase, AceDB, microarray gene
expression databases, and TAIR
Composite Databases
• Combine data from a variety of primary databases
• Provide one place to search several primary databases
Information Retrieval from Biological Databases
• The most popular retrieval systems for biological databases are
Entrez and the Sequence Retrieval System (SRS)
• Entrez is the biological database retrieval system developed by NCBI
• For a complex search, a user can join a series of keywords with the
Boolean operators AND, OR, and NOT to indicate the relationships
between the keywords
• Online Mendelian Inheritance in Man (OMIM), a non-sequence-based
database of human disease genes and human genetic disorders, is
accessible from Entrez
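Combining keywords with AND, OR, and NOT can be sketched in Python as below. The helper names (all_of, any_of, exclude) and the search terms are hypothetical, chosen only for illustration. The URL targets the esearch endpoint of NCBI's E-utilities, the programmatic interface to Entrez, and "nuccore" is assumed as the nucleotide database name. The code only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# esearch endpoint of NCBI E-utilities (programmatic access to Entrez).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def all_of(*terms):
    """Require every term: join with AND."""
    return "(" + " AND ".join(terms) + ")"


def any_of(*terms):
    """Accept any of the terms: join with OR."""
    return "(" + " OR ".join(terms) + ")"


def exclude(query, term):
    """Drop records matching the term: append NOT."""
    return f"{query} NOT {term}"


# Illustrative search: human records mentioning insulin or glucagon,
# excluding those that mention "partial".
query = exclude(
    all_of("Homo sapiens[Organism]", any_of("insulin", "glucagon")),
    "partial",
)
url = ESEARCH_URL + "?" + urlencode({"db": "nuccore", "term": query})

print(query)
print(url)
```

The bracketed suffix, as in [Organism], restricts a keyword to one search field, which is how a search is limited to organism, author, and similar fields.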
GenBank
• GenBank is the most complete collection of annotated nucleic acid
sequence data for almost every organism.
• The content includes genomic DNA, mRNA, cDNA, ESTs, high-throughput
raw sequence data, and sequence polymorphisms
• There is also a GenPept database for protein sequences
GenBank: Sequence Format
Header
• origin of the sequence, identification of organism, unique identifiers
• Locus: unique database identifier
• Sequence length and molecule type (DNA or RNA)
• Three-letter division code, e.g. PLN for plants, BCT for bacteria…
• Definition : name of the sequence, name and source of organism,
whether sequence is partial or complete
• Accession number : number cited in publications
• Version number : to identify the current version, if the sequence is
revised at a later stage
• Organism: source of organism with the scientific name of the species
• Reference : author and title information, contact information
Gene information
• Features : annotation information
• Source: length of sequence, scientific name of organism
• Gene : nucleotide coding sequence and its name
• CDS : information about the boundaries of the sequence that can be
translated into amino acids. For eukaryotes, the locations of exons are
also given
DNA SEQUENCE
• ORIGIN: sequence itself; ends with two forward slashes (“//”)

• In retrieving the DNA sequence, search can be limited to “organism”,
“accession number”, “author”, “publication date”.
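The record layout described above (header fields, features, then the sequence after ORIGIN, closed by "//") can be read with a few lines of Python. This is a minimal sketch, not a full GenBank parser: the record below is an invented toy entry (TOY00001 is not a real accession), and multi-line fields and feature qualifiers are ignored.

```python
# Invented toy record, laid out like a GenBank flat-file entry.
TOY_RECORD = """\
LOCUS       TOY00001                  24 bp    DNA     linear   PLN 01-JAN-2000
DEFINITION  Hypothetical toy record for illustration, partial sequence.
ACCESSION   TOY00001
VERSION     TOY00001.1
SOURCE      Arabidopsis thaliana
  ORGANISM  Arabidopsis thaliana
FEATURES             Location/Qualifiers
     source          1..24
     CDS             1..24
ORIGIN
        1 atggcttcta ctgaagctta gtaa
//
"""


def parse_genbank(text):
    """Extract a few header fields and the sequence from one record."""
    record = {"sequence": ""}
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_origin:
            # Sequence lines: a position number, then blocks of bases.
            record["sequence"] += "".join(line.split()[1:])
            continue
        key, _, value = line.strip().partition(" ")
        if key == "ORIGIN":
            in_origin = True
        elif key == "LOCUS":
            fields = value.split()
            record["locus"] = fields[0]
            record["length"] = int(fields[1])
            record["molecule"] = fields[3]
            record["division"] = fields[-2]  # three-letter code, e.g. PLN
        elif key in ("DEFINITION", "ACCESSION", "VERSION", "ORGANISM"):
            record[key.lower()] = value.strip()
    return record


rec = parse_genbank(TOY_RECORD)
print(rec["accession"], rec["version"], rec["division"], rec["sequence"])
```

For real records, an established library is a safer choice than hand-written parsing; the sketch is only meant to make the field layout concrete.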
Fasta: Sequence Format
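A FASTA record consists of a header line beginning with ">" followed by one or more lines of sequence. A minimal sketch of reading the format in Python is given here; the two records are made up for illustration, and the function name parse_fasta is hypothetical.

```python
# Made-up FASTA text with two records; sequences may span several lines.
TOY_FASTA = """\
>toy_seq_1 hypothetical example record
ATGGCTTCTA
CTGAAGCTTA
>toy_seq_2 another hypothetical record
GATTACA
"""


def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines belonging to the current header are joined.
            records[header] += line
    return records


for name, seq in parse_fasta(TOY_FASTA).items():
    print(name, len(seq))
```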
Reading Assignment
• Read more on Biological Databases:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4411498/
[ZMYZ15]
• Practice: explore various databases
• Assessment
• Bring your laptops
• Explore Entrez: https://www.ncbi.nlm.nih.gov/search/
• Explore NCBI databases
• Read Chapter 2, Essential Bioinformatics by Jin Xiong [Xio06]
