Protein Databases

Chapter · September 2005


DOI: 10.1038/npg.els.0005251


Protein Databases
Advanced article

Vivienne Baillie Gerritsen, Swiss Institute of Bioinformatics, Geneva, Switzerland
Amos Bairoch, Swiss Institute of Bioinformatics, Geneva, Switzerland

ENCYCLOPEDIA OF LIFE SCIENCES © 2005, John Wiley & Sons, Ltd. www.els.net
doi: 10.1038/npg.els.0005251

Article contents
• Introduction
• Protein Sequence Databases
• Specialized Protein Sequence Databases
• Protein Domain and Family Databases
• Three-dimensional Structure Databases
• Other Types of Database
• Protein Information in Other Types of Database
• Conclusions

An abundance of protein databases is available, dealing with fields as diverse as protein sequences, protein domains, posttranslational modifications and protein–protein interactions. Such resources are crucial to proteomics research.

Introduction

Protein databases have been around for the best part of half a century. One of the very first protein databases was the Atlas of Protein Sequence and Structure, developed by the late Margaret Dayhoff, who founded the Protein Information Resource (PIR). A series of books was published from 1965 to 1978, until the quantity of data grew so much that an electronic form was made available to the scientific community, known as the PIR-International Protein Sequence Database. Swiss-Prot, the protein sequence knowledgebase founded in 1986 by Amos Bairoch, took its inspiration from PIR but strove to develop a database that was nonredundant and extremely well documented. In the past four decades, a great many diverse databases have sprouted: protein sequence databases, two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) databases, three-dimensional (3D) structure databases, posttranslational modification databases and metabolic databases, among others. It is now common knowledge that information on proteins is crucial to all biological research because of the functions these molecules carry out in cells and the roles they have in disease processes.

Improvements in deoxyribonucleic acid (DNA) sequencing technology are generating information on hundreds of thousands of novel proteins, and the number is growing exponentially with ongoing genomic projects. Today, with this increasing volume and variety of information on proteins, databases of databases are budding, and it is toward these central resources that scientists active in modern biological research should turn, particularly in the field of proteomics. It is impossible to list every database that exists, especially as their stories may often be compared with the birth and death of stars: some are long-lived, while others peter out within months of existence. The range of protein databases, however, can be divided into categories, which we have listed below, bearing in mind those relevant to research in the field of human genomics. A source of over a thousand databases of interest to proteomic research is available (see Web Links).

Protein Sequence Databases

The most comprehensive source of information on proteins is found in protein sequence databases, of which there are two types:
• universal databases, whose aim is to collect biological information on the widest possible range of species, and
• specialized databases, which cater for specific groups or families of proteins or specific organisms.

A universal protein knowledgebase: United Protein Databases or UniProt

The availability of an ever-increasing volume and variety of protein sequences and functional information has seen the creation of a number of protein sequence knowledgebases, namely Swiss-Prot and TrEMBL, operated by researchers from Switzerland and the European Molecular Biology Laboratory (EMBL), and the PIR at Georgetown University Medical Centre and the National Biomedical Research Foundation (US). These groups combined their strengths into a central public resource: the United Protein Databases or UniProt. The UniProt nonredundant database is built around Swiss-Prot and TrEMBL, and each UniProt entry is a central hub for data regarding a specific protein.

Swiss-Prot and TrEMBL (see Web Links)

Swiss-Prot (Boeckmann et al., 2003) is a nonredundant curated protein knowledge resource that provides a high level of annotation. Besides the bare protein sequence, a Swiss-Prot entry offers a wide variety of information, including a description of the function of the protein, its domain structure, posttranslational modifications, variants and numerous links to other databases.

Early on it became apparent that Swiss-Prot alone could not cope with the flow of information on hundreds of thousands of new proteins resulting from the never-ending improvements in DNA sequencing technology. In response, a supplementary database, Translation of EMBL nucleotide sequence database or TrEMBL, was created. TrEMBL (Boeckmann et al., 2003) consists of computer-annotated entries in Swiss-Prot format derived from the translation of coding sequences (CDS) in the EMBL nucleotide sequence database. Hence TrEMBL entries are preliminary Swiss-Prot entries that have not yet been manually annotated.

Specialized Protein Sequence Databases

There is a wide variety of databases dedicated to groups of proteins, or families of proteins. Some have no more than a handful of entries, while others are less modest and provide a far wider scope. It would be quite pointless and near impossible to give a list of all existing specialized protein sequence databases. What is more, the development of many is stunted after a short existence, while new ones sprout almost on a daily basis. A positive side to all this is the appearance of information systems that attempt to collect specific data into a central resource. In a feeble attempt to show the diversity, three specialized protein sequence databases are briefly described.

G-protein-coupled receptor database: GPCRDB (see Web Links)

GPCRDB (Horn et al., 1998) is an information system that collects and disseminates data related to G-protein-coupled receptors (GPCRs). It holds sequences, mutant data and ligand-binding constants as primary (experimental) data. Computationally derived data such as multiple sequence alignments, 3D models, phylogenetic trees and 2D visualization tools are added to enhance the database's usefulness.

A protease database: MEROPS (see Web Links)

MEROPS (Rawlings et al., 2002) provides a wealth of information on proteases. There are data on individual proteases, protease families and also clans into which the families are grouped. Hundreds of proteases can be found by name, identifier or the organism in which they occur.

International ImMunoGeneTics database: IMGT (see Web Links)

IMGT (Lefranc, 2001) is a high-quality integrated information system that specializes in immunoglobulins, T cell receptors and major histocompatibility complex molecules of all vertebrate species. It includes sequence databases, Web resources and interactive tools, and the IMGT server provides common access to data relative to the field of immunogenetics.

Protein Domain and Family Databases

The sequence of a new protein can be so distantly related to any other that similarity searches fail to detect any resemblance. However, proteins do have their fingerprints, otherwise known as patterns, motifs or signatures. These are particular clusters of residue types in the sequence, which reflect conserved regions important to the function of the protein. A popular way to identify such motifs between proteins is to perform a pairwise alignment. When the identity is higher than 40%, this method gives good results. However, the weakness of the pairwise alignment is that no distinction is made between an amino acid at a crucial position (such as an active site) and an amino acid with no critical role. A multiple sequence alignment gives a more general view of a conserved region by giving a better picture of the most conserved residues, which are also usually those essential for the protein's function. Several databases have developed their own methods based on multiple sequence alignment in order to identify conserved regions (Table 1). A search performed on these databases is very often more sensitive than a pairwise alignment and can help to identify very remote homology (less than 20% identity).

InterPro (see Web Links)

InterPro (Mulder et al., 2002) is an integrated documentation resource for protein families, domains and functional sites that was developed to rationalize the complementary efforts of the individual protein signature database projects. PRINTS, PROSITE, Pfam, ProDom, SMART and TIGRFAMs form the InterPro core. Each InterPro entry includes a unique accession number, functional descriptions and literature references, and links are made back to the relevant member databases.

InterPro is a useful resource for whole-genome analysis and has already been used for the proteome analysis of a number of completely sequenced organisms, including preliminary analyses of the human genome. Table 1 gives a list of the InterPro database members as well as a brief description and their URL addresses.


Table 1 InterPro database members

| Database (Ref) | Description | Method | URL |
|---|---|---|---|
| PROSITE (Falquet et al., 2002) | A well-documented database of protein families and domains | Regular expressions and generalized profiles | http://www.expasy.org/prosite/ |
| Pfam (Bateman et al., 2002) | A large collection of multiple sequence alignments and hidden Markov models covering protein domains and families | Profiles based on hidden Markov models | http://www.sanger.ac.uk/Software/Pfam/index.shtml |
| SMART (Letunic et al., 2002) | A collection of protein families and domains | Profiles based on hidden Markov models | http://smart.embl-heidelberg.de/ |
| TIGRFAMs (Haft et al., 2001) | A collection of protein families with an emphasis on microbial proteins | Profiles based on hidden Markov models | http://www.tigr.org/TIGRFAMs/index.shtml |
| PRINTS (Attwood et al., 2002) | A well-documented database of conserved motifs used to characterize protein families | Fingerprints | http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/ |
| ProDom (Corpet et al., 2000) | Protein domain database | Automated clustering of homologous domains based on iterative PSI-BLAST searches | http://prodes.toulouse.inra.fr/prodom/doc/prodom.html |
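Table 1 lists regular expressions as one of the PROSITE methods. As a rough illustration of how such a pattern can be scanned against a sequence, the sketch below translates a subset of the PROSITE pattern syntax into a Python regular expression. The function names and the test sequence are invented for this example, only the common elements are handled (single residues, `x`, `[..]`, `{..}`, repetition counts and the `<`/`>` anchors), and the example pattern is the well-known N-glycosylation site pattern N-{P}-[ST]-{P}.

```python
import re

# One pattern element: a residue, x, [allowed], or {forbidden},
# optionally followed by a repeat such as (3) or (2,4).
_ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex string.

    Simplified sketch: rarer constructs of the PROSITE syntax are rejected
    with ValueError rather than translated.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.lstrip("<").rstrip(">")
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list[int]:
    """Return 0-based start positions of non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.start() for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))            # N[^P][ST][^P]
    # Hypothetical sequence, invented for this example.
    print(find_sites("N-{P}-[ST]-{P}", "MKNGTAANPSQNLSA"))  # [2, 11]
```

Note that `finditer` reports non-overlapping matches only; a scanner meant to reproduce every possible site would need a lookahead-based search instead.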

Three-dimensional Structure Databases

The primary structure of a protein dictates its spatial or tertiary structure. Knowledge of the 3D structure of a protein is crucial for the understanding of its precise function, for drug design and for other biotechnological applications. Even today, the elucidation of a protein's 3D structure is a labor-intensive and technically demanding process. However, the combination of fast X-ray detectors, nuclear magnetic resonance (NMR) methods, modern biotechnological methods and access to synchrotron X-ray sources has accelerated the process from months to days.

PDB: Protein Data Bank (see Web Links)

PDB (Berman et al., 2000) is a collection of 3D structures of proteins, and also of nucleic acids and other biological macromolecules. It is in fact the sole worldwide archive of structural data on biological macromolecules, currently holding almost 20 000 entries. Naturally, this number is expected to grow rapidly once the genomics findings are combined with the structural findings. PDB remains a resource of tremendous and critical importance in the discovery of new pharmacological agents, new catalysts, new biomaterials and possibly even nanodevices.

Derived structural databases

All proteins tend to have structural similarities with others, echoing a common evolutionary origin. There are a number of derived databases that merge structural and sequence information directly from PDB. Two can be mentioned: the SCOP database and the CATH domain database (see Web Links). The SCOP database (Murzin et al., 1995) provides a detailed and comprehensive description of the structural and evolutionary relationships between proteins whose structure is known. CATH (Orengo et al., 1997) is a hierarchical domain classification of protein structures derived from PDB.

Other Types of Database

There is a never-ending variety of databases, which simply reflects the infinite variety of research that goes on worldwide. As for all databases, some die fast while others develop over the years to offer a wider and better range of tools for the biological sciences. Besides protein sequence databases, protein domain databases and 3D structure databases, the scientific community offers databases that collect information on posttranslational modifications, 2D-PAGE databases and protein–protein interaction databases, to name only three.

Posttranslational modification databases

Once synthesized, most proteins are the target of posttranslational modifications (PTMs). Indeed, without such modifications, these proteins cannot function. It is therefore of prime importance to characterize PTMs. There are few databases with information on PTMs because the vast majority of this kind of information is already held within the protein knowledgebase Swiss-Prot. However, the following is worth mentioning.


GlycoSuiteDB: a database of glycoprotein glycan structures (see Web Links)

GlycoSuiteDB (Cooper et al., 2001) is based on information derived from the scientific literature and provides detailed information on glycan structures. When the glycan structures are known to be attached to a specific protein, direct links are made to Swiss-Prot and TrEMBL.

Two-dimensional polyacrylamide gel electrophoresis databases

A growing number of databases dedicated to proteins identified on 2D-PAGE have appeared in recent years, and their impact on proteome research is already quite significant. 2D-PAGE databases contain two separate components: image data and text information. The former consists of one or more reference gel maps for a given biological sample (tissue, physiological fluid, or a cell in the case of free-living organisms); the latter gives detailed information on each of the spots on the maps, such as their apparent molecular weight and isoelectric point on the map, the name of the protein, the method of identification and cross-references to relevant databases.

Swiss-2Dpage: two-dimensional polyacrylamide gel electrophoresis database (see Web Links)

Swiss-2Dpage (Hoogland et al., 2000) was created and is maintained collaboratively by the Central Clinical Chemistry Laboratory of the Geneva University Hospital and the Swiss Institute of Bioinformatics (SIB). It contains 2D-PAGE and sodium dodecyl sulfate (SDS) PAGE reference maps and information on identified proteins from a variety of human biological samples, such as the liver, plasma, colon and platelets. The proteins in Swiss-2Dpage have been identified by microsequencing, immunoblotting, gel comparison, amino acid composition, peptide mass fingerprinting and/or tandem mass spectrometry.

World-2Dpage (see Web Links)

There is, as for most types of database, an increasing number of 2D gel databases. World-2Dpage (Hoogland et al., 1999) is a complete index of 2D-PAGE databases and services. It not only offers the list of databases and their web addresses, but also gives information on the organism and the tissue or fluid involved, as well as sites for laboratory services, 2D-PAGE training, image analysis and links to related meetings and societies.

Protein–protein interaction databases

Protein–protein interactions (PPIs) lie at the heart of most biological processes: signal transduction, metabolic pathways and immune response. PPI data are of crucial scientific and medical relevance; indeed, understanding the interactions between the encoded proteins of a given genome is a critical step in functional genomic analysis. Several PPI databases have been compiled to document and describe protein–protein interactions.

DIP: Database of Interacting Proteins (see Web Links)

The DIP database (Xenarios et al., 2002) is a collection of PPIs that have been determined experimentally. A consistent set of PPIs is the result of information gathered from a variety of sources. Data within DIP are curated both manually by expert curators and automatically. The database provides a comprehensive and integrated tool for browsing and extracting information on PPIs. Additional information can be found on the position of an interaction within a biological pathway and on specific posttranslational modifications.

BIND: Biomolecular Interaction Network Database

The BIND database (Bader et al., 2001) stores full descriptions of interactions, molecular complexes and pathways, among which are PPIs. BIND presents protein interactions from the molecular level to the pathway level and can be used, for instance, to study networks of interactions.

MINT: a Molecular INTeractions database

The MINT database (Zanzoni et al., 2002) stores functional interactions between biological molecules, i.e. proteins, RNA and DNA. Of interest to the protein specialist, MINT now focuses on experimentally verified protein–protein interactions. The database consists of entries extracted from the scientific literature where interaction information is found.

Protein Information in Other Types of Database

Many databases are not strictly dedicated to the world of proteins; however, important and varied information on proteins can be found within a great number of them, among which are the genomic databases and metabolic databases.

Genomic and genetic variation databases

Genomic databases offer a very wide scope of data resources that refer to a specific organism. The aim of a genomic database is to provide a maximum of information on the genetic organization of a given species. Information includes gene names, gene localization (i.e. the position on a chromosome) as well as numerous cross-references to nucleotide and protein sequence databases. A number of these databases hold information particularly useful for proteome studies by describing specific gene mutations and their effect on an organism's phenotype.

OMIM: Online Mendelian Inheritance in Man (see Web Links)

OMIM (McKusick, 1998) is a collection of human genes and genetic disorders maintained by the McKusick–Nathans Institute for Genetic Medicine, Johns Hopkins University (Baltimore, MD) and the National Center for Biotechnology Information, National Library of Medicine (Bethesda, MD). The database offers a wealth of textual information provided in each entry, some of which can be useful in the context of protein studies.

HGVS: Human Genome Variation Society (see Web Links)

Formerly the HUGO Mutation Database Initiative, the Human Genome Variation Society (HGVS; Auerbach, 2000) was created to promote the discovery and free publication of information on the variations in human genes by fostering a central repository for such variations. There are hundreds of genomic databases or databases that revolve around specific mutations. As for all types of database, each research group tends to create its own database and its own nomenclature system. The various databases are listed by category, their contents briefly described and their URL address given.

HGMD: Human Gene Mutation Database (see Web Links)

HGMD (Krawczak and Cooper, 1997) is a comprehensive database of gene lesions underlying human inherited disease. All submissions are linked to mutation sites that have been experimentally defined.

dbSNP: Single Nucleotide Polymorphism database (see Web Links)

Single nucleotide polymorphisms (SNPs) are the most common genetic variations. The SNP database (Sherry et al., 2001) is a repository of all the genetic variations that are being discovered as the human genome is being deciphered. Hence, a large number of artificial variations still have to be confirmed. Submissions to the database may be made directly, so information that has not yet been published in a peer-reviewed medium can be found here.

Metabolic and enzyme nomenclature databases

Metabolic databases are particularly heterogeneous data resources whose aim is to be comprehensive in the description of enzymes, biochemical reactions and metabolic pathways. Such resources can prove to be particularly helpful in proteome research. A number of these databases provide detailed descriptions of all known enzymatic reactions catalyzed by a specific organism, while others tend to specialize in a subset of biochemical pathways expressed in a variety of organisms.

BRENDA: A Comprehensive Enzyme Information System

BRENDA (Schomburg et al., 2002) is the main collection of enzyme functional data available to the scientific community. It is maintained and developed at the Institute of Biochemistry at the University of Cologne.

KEGG: Kyoto Encyclopedia of Genes and Genomes (see Web Links)

KEGG (Kanehisa et al., 2002) strives to computerize the current knowledge of molecular and cellular biology in terms of the information pathways that consist of interacting molecules or genes. It consists of four types of data: pathway maps, molecule tables, gene tables and genome maps. By its scope, KEGG is in fact more than just a metabolic database.

ENZYME: an enzyme nomenclature database (see Web Links)

The ENZYME database (Bairoch, 2000) is not a metabolic database in the strictest sense. It is in fact a repository of information relative to the nomenclature of enzymes and is based on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB). Data include the recommended name of a characterized enzyme, its alternative names (if any), its catalytic activity, cofactors (if any) as well as cross-references to a number of databases.

Conclusions

It is a particularly difficult task to give a comprehensive list of the databases that exist in the field of proteomics. Furthermore, what is valuable within one database for one scientist may be of no value for another. A list of the major databases, hence perhaps the most frequently updated, used and relevant to the field of proteomics, is given here. One must keep in mind that the essence of databases is to grow and develop constantly, and the only way to get a good idea of the worth of one or the other is to visit a given database and browse through it.

See also
Genetic Databases
Genome Databases
Protein Sequence Databases

References

Attwood TK, Blythe M, Flower DR, et al. (2002) PRINTS and PRINTS-S shed light on protein ancestry. Nucleic Acids Research 30: 239–241.
Auerbach AD (2000) Eighth International HUGO-Mutation Database Initiative Meeting, April 9, Vancouver, Canada. Human Mutation 16: 265–268.
Bader GD, Donaldson I, Wolting C, et al. (2001) BIND – The Biomolecular Interaction Network Database. Nucleic Acids Research 29: 242–245.
Bairoch A (2000) The ENZYME database in 2000. Nucleic Acids Research 28: 304–305.
Bateman A, Birney E, Cerruti L, et al. (2002) The Pfam Protein Families Database. Nucleic Acids Research 30: 276–280.
Berman HM, Westbrook J, Feng Z, et al. (2000) The Protein Data Bank. Nucleic Acids Research 28: 235–242.
Boeckmann B, Bairoch A, Apweiler R, et al. (2003) The Swiss-Prot protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research 31(1): 365–370.
Cooper CA, Harrison MJ, Wilkins MR and Packer NH (2001) GlycoSuiteDB: a new curated relational database of glycoprotein glycan structures and their biological sources. Nucleic Acids Research 29: 332–335.
Corpet F, Servant F, Gouzy J and Kahn D (2000) ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons. Nucleic Acids Research 28: 267–269.
Falquet L, Pagni M, Bucher P, et al. (2002) The PROSITE database, its status in 2002. Nucleic Acids Research 30: 235–238.
Haft DH, Loftus BJ, Richardson DL, et al. (2001) TIGRFAMs: a protein family resource for the functional identification of proteins. Nucleic Acids Research 29: 41–43.
Hoogland C, Sanchez J-C, Tonella L, et al. (2000) The 1999 SWISS-2DPAGE database update. Nucleic Acids Research 28: 286–288.
Hoogland C, Sanchez J-C, Walther D, et al. (1999) Two-dimensional electrophoresis resources available from ExPASy. Electrophoresis 20: 3568–3571.
Horn F, Weare J, Beukers MW, et al. (1998) GPCRDB: an information system for G protein-coupled receptors. Nucleic Acids Research 26: 277–281.
Kanehisa M, Goto S, Kawashima S and Nakaya A (2002) The KEGG databases at GenomeNet. Nucleic Acids Research 30: 42–46.
Krawczak M and Cooper DN (1997) The Human Gene Mutation Database. Trends in Genetics 13: 121–122.
Lefranc M-P (2001) IMGT, the international ImMunoGeneTics database. Nucleic Acids Research 29: 207–209.
Letunic I, Goodstadt L, Dickens NJ, et al. (2002) Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Research 30: 242–244.
McKusick VA (1998) Mendelian Inheritance in Man. Catalogs of Human Genes and Genetic Disorders, 12th edn. Baltimore, MD: Johns Hopkins University Press.
Mulder NJ, Apweiler R, Attwood TK, et al. (2002) The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Briefings in Bioinformatics 3: 225–235.
Murzin AG, Brenner SE, Hubbard T and Chothia C (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology 247: 536–540.
Orengo CA, Michie AD, Jones S, et al. (1997) CATH – a hierarchic classification of protein domain structures. Structure 5: 1093–1108.
Rawlings ND, O'Brien EA and Barrett AJ (2002) MEROPS: the protease database. Nucleic Acids Research 30: 343–346.
Schomburg I, Chang A, Hofmann O, et al. (2002) BRENDA, a resource for enzyme data and metabolic information. Trends in Biochemical Sciences 27: 54–56.
Sherry ST, Ward MH, Kholodov M, et al. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Research 29: 308–311.
Xenarios I, Salwinski L, Duan XJ, et al. (2002) DIP: the Database of Interacting Proteins. A research tool for studying cellular networks of protein interactions. Nucleic Acids Research 30: 303–305.
Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M and Cesareni G (2002) MINT: a Molecular INTeraction database. FEBS Letters 513(1): 135–140.

Web Links

The ExPASy list of Biomolecular servers. This site lists the major (over one thousand!) databases of interest relative to proteomic research
http://www.expasy.org/alinks.html
BIND. The Biomolecular Interaction Network Database stores full descriptions of interactions, molecular complexes and pathways, among which are protein–protein interactions
http://bind.mshri.on.ca/
BRENDA. The main collection of enzyme functional data available to the scientific community, maintained and developed at the Institute of Biochemistry at the University of Cologne
http://www.brenda.uni-koeln.de/
CATH. A hierarchical domain classification of protein structures derived from PDB (see below)
http://www.biochem.ucl.ac.uk/bsm/cath_new/index.html
dbSNP. The Single Nucleotide Polymorphism database is a repository of all the genetic variations that are discovered as the human genome is being deciphered
http://www.ncbi.nlm.nih.gov/SNP
DIP. The Database of Interacting Proteins is curated both manually by expert curators and automatically. The DIP database provides a comprehensive and integrated tool for browsing and extracting information on protein–protein interactions
http://dip.doe-mbi.ucla.edu/
ENZYME. An enzyme nomenclature database based on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB)
http://us.expasy.org/enzyme/
GlycoSuiteDB. A database of glycoprotein glycan structures derived from the scientific literature. When the glycan structures are known to be attached to a specific protein, direct links are made to the Swiss-Prot and TrEMBL databases (see below)
http://www.glycosuite.com/
GPCRDB. An information system that collects and disseminates data related to G-protein-coupled receptors
http://www.gpcr.org/7tm


HGMD. The Human Gene Mutation Database is a comprehensive database of gene lesions underlying human inherited disease
http://www.hgmd.org/
HGVS. The Human Genome Variation Society was created to promote the discovery and free publication of information on the variations in human genes by fostering a central repository for such variations
http://www.hgvs.org/
IMGT. The international ImMunoGeneTics database is a high-quality integrated information system that specializes in immunoglobulins, T cell receptors and major histocompatibility complex molecules of all vertebrate species
http://imgt.cines.fr:8104/
InterPro. An integrated documentation resource for protein families, domains and functional sites, which was developed to rationalize the complementary efforts of the individual protein signature database projects that form the InterPro core (see Table 1)
http://www.ebi.ac.uk/interpro/
KEGG. The Kyoto Encyclopedia of Genes and Genomes strives to computerize the current knowledge of molecular and cellular biology in terms of the information pathways that consist of interacting molecules or genes
http://www.genome.ad.jp/kegg/
MEROPS. Provides data on individual proteases, protease families and also clans into which the families are grouped
http://merops.iapc.bbsrc.ac.uk/
MINT. The Molecular Interactions database. Stores functional interactions between biological molecules
http://cbm.bio.uniroma2.it/mint/
OMIM. The Online Mendelian Inheritance in Man database is a collection of human genes and genetic disorders maintained by the McKusick–Nathans Institute for Genetic Medicine, Johns Hopkins University (Baltimore, MD) and the National Center for Biotechnology Information, National Library of Medicine (Bethesda, MD). The database offers a wealth of textual information provided in each entry, some of which can be useful in the context of protein studies
http://www.ncbi.nlm.nih.gov/omim/
PDB. The Protein Data Bank is a collection of 3D structures of proteins, nucleic acids and other biological macromolecules. PDB is a resource of critical importance in the discovery of new pharmacological agents, new catalysts, new biomaterials and possibly nanodevices
http://www.rcsb.org/pdb/
SCOP. Provides a detailed and comprehensive description of the structural and evolutionary relationships between proteins whose structure is known
http://scop.mrc-lmb.cam.ac.uk/scop/
Swiss-2Dpage. Contains 2D-PAGE and SDS PAGE reference maps and information on identified proteins from a variety of human biological samples. It is maintained collaboratively by the Central Clinical Chemistry Laboratory of the Geneva University Hospital and the Swiss Institute of Bioinformatics
http://www.expasy.org/ch2d/
Swiss-Prot. A nonredundant curated protein knowledge resource that provides a high level of annotation. Besides the bare protein sequence, a Swiss-Prot entry offers a description of the function of the protein, its domain structure, posttranslational modifications, variants and links to other databases
http://www.expasy.org/sprot
TrEMBL. Consists of computer-annotated entries in Swiss-Prot format derived from the translation of coding sequences in the European Molecular Biology Laboratory nucleotide sequence database. Hence TrEMBL entries are preliminary Swiss-Prot entries that have not yet been manually annotated
http://www.ebi.ac.uk/swissprot
World-2Dpage. A complete index of 2D-PAGE databases and services
http://www.expasy.org/ch2d/2d-index.html
