Databases and Ontologies

This document discusses the main objectives in bioinformatics and computational biology, including databases and ontologies, sequence analysis, expression analysis, genetics and population analysis, and structural bioinformatics. It describes some key databases and tools used for tasks like gene annotation, sequence alignment, and analyzing genetic variation and protein structure.

Uploaded by Srinivas
Copyright © All Rights Reserved

Bioinformatics and Computational Biology

for the data mining community. Each of these problems requires different tools, computational techniques and machine learning methods. In the following section we briefly describe the main objectives in these areas:

1. Databases and ontologies. The overwhelming array of data produced by experimental projects is continually being added to a collection of databases. The primary databases typically hold raw data, and submission is often a requirement for publication. Primary databases include: a) sequence databases such as GenBank, EMBL and DDBJ, which hold nucleic acid sequence data (DNA, RNA); b) microarray databases such as ArrayExpress (Parkinson et al. 2005); c) literature databases containing links to published articles, such as PubMed (http://www.pubmed.com); and d) the PDB, containing protein structure data. Derived databases are created by analyzing the contents of primary databases to produce higher-order information, such as a) protein domains, families and functional sites (InterPro, http://www.ebi.ac.uk/interpro/, Mulder et al. 2003) and b) gene catalogs that integrate data from many different sources (GeneLynx, http://www.genelynx.org, Lenhard et al. 2001; GeneCards, http://www.genecards.org, Safran et al. 2003). An essential addition is the Gene Ontology project (http://www.geneontology.org; The Gene Ontology Consortium, 2000), which provides a controlled vocabulary to describe genes and gene product attributes.

2. Sequence analysis. The most fundamental aspect of bioinformatics is sequence analysis. This broad term can be thought of as the identification of biologically significant regions in DNA, RNA or protein sequences. Genomic sequence data is analyzed to identify genes that code for RNAs or proteins, as well as regulatory sequences involved in turning genes on and off. Protein sequence data is analyzed to identify signaling and structural information, such as the location of biologically active site(s). A comparison of genes within or between different species can reveal relationships between the genes (i.e. functional constraints). However, manual analysis of DNA sequences is infeasible given the huge amount of data. Database searching tools such as BLAST (http://www.ncbi.nlm.nih.gov/BLAST, Altschul et al. 1990) are used to search databases for similar sequences, using knowledge about protein evolution. In the context of genomics, genome annotation is the process of marking biological features in a sequence. A popular system is Ensembl, which produces and maintains automatic annotation on selected eukaryotic genomes (http://www.ensembl.org).

3. Expression analysis. The expression of genes can be determined by measuring mRNA levels with techniques including microarrays, expressed cDNA sequence tag (EST) sequencing, sequence tag reading (e.g., SAGE and CAGE), massively parallel signature sequencing (MPSS), or various applications of multiplexed in-situ hybridization. More recently, protein microarrays and high-throughput mass spectrometry have made it possible to take a snapshot of the proteins present in a biological sample. All of these techniques, while powerful, are noise-prone and/or subject to bias in the biological measurement. Thus, a major research area in computational biology involves developing statistical tools to separate signal from noise in high-throughput (HT) gene expression studies. Expression studies are often used as a first step in identifying genes involved in pathologies, by comparing the expression levels of genes between different tissue types (e.g. breast cancer cells vs. normal cells). Clustering algorithms can then be applied to the data to determine the properties of cancerous vs. normal cells, leading to classifiers that diagnose novel samples. For a review of microarray approaches in cancer, see Wang (2005).

4. Genetics and population analysis. Genetic variation in the population holds the key to identifying disease-associated genes. Common polymorphisms such as single nucleotide polymorphisms (SNPs) and insertions and deletions (indels) have been identified, and ~3 million records are in the HGVBase polymorphism database (Fredman et al. 2004). The International HapMap Project is a key resource for finding genes affecting health, disease, and drug response (The International HapMap Consortium, 2005).

5. Structural bioinformatics. A protein's amino acid sequence (primary structure) is determined from the sequence of the gene that encodes it. This primary structure uniquely determines its physical structure. Knowledge of structure is vital to
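
Item 1 notes that the Gene Ontology provides a controlled vocabulary for gene and gene product attributes. Such vocabularies are organized as a directed acyclic graph of terms, so a gene annotated with a specific term is implicitly annotated with every more general ancestor of that term. The minimal sketch below illustrates that propagation on a small, hypothetical ontology; the term names, `TOY_ONTOLOGY`, `ancestors` and `propagate` are illustrative assumptions, not real GO identifiers or the API of any GO library.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology: each term maps to its parent terms via "is_a"
# links. Names are illustrative only; they are not real GO identifiers.
TOY_ONTOLOGY: Dict[str, List[str]] = {
    "process": [],
    "metabolic_process": ["process"],
    "biosynthetic_process": ["metabolic_process"],
    "protein_metabolic_process": ["metabolic_process"],
    # A term may have several parents, so the ontology is a DAG, not a tree.
    "protein_biosynthetic_process": ["biosynthetic_process",
                                     "protein_metabolic_process"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every term reachable from `term` by following parent links."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def propagate(annotations: Dict[str, Set[str]],
              parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Extend each gene's direct annotations with all of their ancestor terms."""
    result: Dict[str, Set[str]] = {}
    for gene, terms in annotations.items():
        expanded = set(terms)
        for term in terms:
            expanded |= ancestors(term, parents)
        result[gene] = expanded
    return result


if __name__ == "__main__":
    # Hypothetical gene annotated directly with one specific term.
    print(propagate({"geneX": {"protein_biosynthetic_process"}}, TOY_ONTOLOGY))
```

Because the traversal tracks visited terms, a term reachable along two paths (as with the two parents above) is counted once.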
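Item 2 describes tools such as BLAST that search databases for similar sequences. BLAST uses heuristics to find high-scoring local alignments quickly. The sketch below instead computes the exact best local alignment score with the Smith-Waterman dynamic program and uses it to rank a tiny in-memory "database". The scoring values (match +2, mismatch -1, linear gap -2), the helper names and the toy sequences are assumptions for illustration; this is neither BLAST's algorithm nor its scoring scheme.

```python
from typing import Dict, List


def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of `a` and `b` with a linear gap penalty.

    The default scores are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere (local, not global).
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_database(query: str, database: Dict[str, str]) -> List[str]:
    """Return database entry names ordered from most to least similar."""
    scores = {name: smith_waterman(query, seq) for name, seq in database.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    toy_db = {"s1": "TTTT", "s2": "GGACGTAA"}  # hypothetical sequences
    print(rank_database("ACGT", toy_db))
```

Only two rows of the matrix are kept, so memory grows with the length of the database sequence rather than with the product of both lengths; the running time is still proportional to that product, which is why heuristic tools are used on large databases.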
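Item 3 describes statistical tools that separate signal from noise when comparing expression between tissue types. As a minimal sketch, assuming each gene already has normalized expression values for a few tumor and normal replicates, the code below computes a log2 fold change and Welch's t statistic and ranks genes by |t|. The pseudocount, function names and toy values are assumptions; real HT studies also require normalization and multiple-testing correction, which are omitted here.

```python
import math
from statistics import mean, variance
from typing import Dict, List


def log2_fold_change(tumor: List[float], normal: List[float],
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of mean expression; the pseudocount avoids dividing by zero."""
    return math.log2((mean(tumor) + pseudocount) / (mean(normal) + pseudocount))


def welch_t(x: List[float], y: List[float]) -> float:
    """Welch's t statistic for two groups with possibly unequal variances."""
    # statistics.variance is the sample variance (n - 1 denominator),
    # so each group needs at least two replicates.
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:
        raise ValueError("both groups have zero variance")
    return (mean(x) - mean(y)) / se


def rank_genes(tumor: Dict[str, List[float]],
               normal: Dict[str, List[float]]) -> List[str]:
    """Order genes by |t|, most differentially expressed first."""
    stats = {gene: abs(welch_t(tumor[gene], normal[gene])) for gene in tumor}
    return sorted(stats, key=stats.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical replicate measurements for two genes.
    tumor = {"g1": [7.0, 9.0, 8.0], "g2": [2.0, 3.0, 4.0]}
    normal = {"g1": [1.0, 3.0, 2.0], "g2": [2.5, 3.0, 3.5]}
    print(rank_genes(tumor, normal))
    print(log2_fold_change(tumor["g1"], normal["g1"]))
```

Welch's form is used here because tumor and normal groups often differ in variance; the ranked list could then feed a clustering or classification step of the kind item 3 mentions.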
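Item 4 concerns genetic variation such as SNPs. One elementary population-genetics calculation for a single biallelic SNP is to estimate the allele frequency from genotype counts and compare the counts with Hardy-Weinberg expectations, a common quality check before association analysis. The sketch below assumes the genotype counts are already available; it is a standard textbook calculation rather than a method from the text, and the function names are hypothetical.

```python
from typing import Tuple


def allele_frequency(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Frequency p of allele A estimated from genotype counts."""
    n = n_AA + n_Aa + n_aa
    return (2 * n_AA + n_Aa) / (2 * n)


def hwe_expected(n_AA: int, n_Aa: int, n_aa: int) -> Tuple[float, float, float]:
    """Expected AA, Aa, aa counts under Hardy-Weinberg equilibrium."""
    n = n_AA + n_Aa + n_aa
    p = allele_frequency(n_AA, n_Aa, n_aa)
    q = 1 - p
    return n * p * p, 2 * n * p * q, n * q * q


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom) for HWE."""
    p = allele_frequency(n_AA, n_Aa, n_aa)
    if p in (0.0, 1.0):
        raise ValueError("monomorphic SNP: the HWE test is undefined")
    observed = (n_AA, n_Aa, n_aa)
    expected = hwe_expected(n_AA, n_Aa, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # Hypothetical counts with no heterozygotes: a strong departure from HWE.
    print(hwe_chi_square(50, 0, 50))
```

With one degree of freedom, a statistic above about 3.84 corresponds to p < 0.05.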
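Item 5 states that a protein's primary structure is determined by the sequence of the gene that encodes it, and item 2 mentions identifying genes that code for proteins. The sketch below translates DNA with the standard genetic code (NCBI translation table 1) and finds the longest open reading frame (ATG to stop codon) on the forward strand. It assumes an intron-free sequence and ignores the reverse strand and alternative start codons, so it is a toy illustration rather than the kind of gene finder used by genome annotation systems; the function names are hypothetical.

```python
# Standard genetic code, indexed by codon in T, C, A, G order
# (first base varies slowest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def longest_orf(dna: str) -> str:
    """Longest protein from ATG to a stop codon in the three forward frames."""
    dna = dna.upper()
    best = ""
    for frame in range(3):
        protein = translate(dna[frame:])
        # Every piece except the last is followed by a stop codon in this frame.
        for piece in protein.split("*")[:-1]:
            start = piece.find("M")  # first ATG after the previous stop
            if start != -1 and len(piece) - start > len(best):
                best = piece[start:]
    return best


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
    print(longest_orf("CCATGGCCTAAGG"))
```

Building the 64-entry table from one 64-character string keeps the code compact and makes it easy to swap in another NCBI translation table.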


