SCIENCE CHINA

Life Sciences
• REVIEWS • November 2011 Vol.54 No.11: 988–998
doi: 10.1007/s11427-011-4247-x

An overview of human protein databases and their application to functional proteomics in health and disease
ZHANG YanQiong1,2, ZHU YunPing2* & HE FuChu1,2*
1 Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China;
2 State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, Beijing 102206, China

Received March 13, 2011; accepted November 23, 2011

Functional proteomics can be defined as a strategy to couple proteomic information with biochemical and physiological analyses with the aim of understanding better the functions of proteins in normal and diseased organs. In recent years, a variety of publicly available bioinformatics databases have been developed to support protein-related information management and biological knowledge discovery. In addition to being used to annotate the proteome, these resources also offer the opportunity to develop global approaches to the study of the functional role of proteins both in health and disease. Here, we present a comprehensive review of the major human protein bioinformatics databases. We conclude this review by discussing a few examples that illustrate the importance of these databases in functional proteomics research.

human protein database, bioinformatics, functional proteomics, health, disease

Citation: Zhang Y Q, Zhu Y P, He F C. An overview of human protein databases and their application to functional proteomics in health and disease. Sci China Life Sci, 2011, 54: 988–998, doi: 10.1007/s11427-011-4247-x

The Human Genome Sequencing Project triggered a revolution in biology and medicine. Following this, proteomics has developed to study the proteome, that is, the proteins expressed in a genome, a cell or a tissue. Proteomics is an alternative to existing analytical methods that are used to describe life in molecular terms. Proteome investigations have focused on two main areas: expression proteomics and functional proteomics. Functional proteomics, in particular, can be defined as a tool that aims to couple proteomic information with biochemical and physiological analyses to understand the functions of proteins in normal and diseased organs [1]. The approaches of functional proteomics are focused mainly towards two major targets: the elucidation of the biological functions of unknown proteins and the definition of cellular mechanisms at the molecular level [2]. In cells, many proteins display their biological functions through a rapid and transient association within large protein complexes. Therefore, understanding protein functions as well as unraveling molecular mechanisms within the cell depends on the identification of the interacting protein partners [3]. Because most physiological and pathological processes are manifested at the protein level, biological scientists are becoming increasingly interested in applying functional proteomics strategies to achieve a better understanding of basic molecular biology and disease processes and to advance the discovery of novel diagnostic, prognostic and therapeutic targets for numerous diseases.
In recent years, a variety of publicly available bioinformatics databases have been developed to support protein-related information management and biological knowledge discovery. These databases provide and organize biological annotations for protein sequences, structures, functions and evolutionary analyses in the context of biological pathways, networks and systems. In addition to the
*Corresponding author (email: [email protected]; [email protected])

© The Author(s) 2011. This article is published with open access at Springerlink.com life.scichina.com www.springer.com/scp
Zhang Y Q, et al. Sci China Life Sci November (2011) Vol.54 No.11 989

annotations, these databases offer a global approach to the study of the functional role of proteins both in health and disease. Here, we present a comprehensive review of the major human protein bioinformatics databases, highlighting those that are recent, of high quality, publicly available, and that we judged to be of interest to researchers in functional proteomics. We also discuss the important roles that these databases can play in functional proteomics research.

1 An overview of human protein databases

Based on the topics and types of data that are stored, human protein databases can be classified primarily as sequence databases, structure databases, databases of protein-protein interactions and complexes, family databases and proteomics databases, as described in Table 1.

1.1 Protein sequence databases

Protein sequence databases serve as repositories for collections of protein sequences. In addition to the protein sequence, they also contain annotations that reflect the existing knowledge of the protein’s function and the residues that contribute to that function. A reliable sequence can form the basis of investigations into the biological role of the protein. Therefore, protein sequence databases are the foundation for medical and functional studies.

1.1.1 Reference sequences (RefSeq)
The National Center for Biotechnology Information Reference Sequence database (NCBI RefSeq) [4] is a comprehensive resource for curated non-redundant sequences of genomic regions, transcripts and proteins. RefSeq is one of the best sources for reliable nucleotide and protein sequences. The RefSeq collection is derived from the sequence data available in the redundant archival database GenBank. RefSeq sequences are annotated and include coding regions, conserved domains, variations, references, names, and database cross-references. The annotation is performed using a combination of automated prediction and manual curation. The RefSeq data can be accessed from NCBI web sites by Entrez query, BLAST, and FTP download. RefSeq sometimes stores more than one protein sequence per gene. In such a case, it may be useful to align all available RefSeq protein sequences for the gene of interest to see where they differ and to assess whether or not substantial differences require further investigation.
RefSeq release 46 (11 March 2011) includes 12167392 proteins from 11734 organisms.

1.1.2 UniProtKB/Swiss-Prot
The UniProt Knowledgebase (UniProtKB) [5] is a comprehensive, high-quality and freely accessible resource of protein sequence and functional information. UniProtKB has two sections: one that contains manually-annotated records with information extracted from literature and curator-evaluated computational analysis (UniProtKB/Swiss-Prot), and another that holds computationally analyzed records that await full manual curation (UniProtKB/TrEMBL). Swiss-Prot was established in 1986 and has been maintained collaboratively since 1987 by Amos Bairoch and his group, first at the Department of Medical Biochemistry of the University of Geneva and now at the Swiss Institute of Bioinformatics (SIB) and the EMBL Data Library. The UniProtKB/Swiss-Prot protein sequence database contains entries composed of different line types, each with its own format. For consistency, the format of UniProtKB/Swiss-Prot follows as closely as possible that of the EMBL Nucleotide Sequence Database. UniProtKB/Swiss-Prot entries include information that clearly describes the type of evidence available for the existence of a particular protein.
The UniProtKB/Swiss-Prot database differs from other protein sequence databases according to four distinct criteria:
(i) Annotation: The annotation consists of the following features: function(s) of the protein; post-translational modification(s); domains and sites; secondary structure; quaternary structure; similarities to other proteins; disease(s) associated with deficiencies in the protein; sequence conflicts; variants; and others.
(ii) Minimal redundancy: UniProtKB/Swiss-Prot merges all relevant data so as to minimize the redundancy of the database. If conflicts exist between various sequencing reports, they are indicated in the feature table of the corresponding entry.
(iii) Integration with other databases: UniProtKB/Swiss-Prot is currently cross-referenced with about 30 different databases. Cross-references are provided in the form of pointers to information related to a UniProtKB/Swiss-Prot entry that is available in other data collections. The extensive network of cross-references makes UniProtKB/Swiss-Prot a major focal point of biomolecular database interconnectivity.
(iv) Documentation: UniProtKB/Swiss-Prot is distributed with a large number of index files and specialized documentation files. Some of these files have existed for a long time; however, many have been created recently and new files are continuously being added. The release notes always contain an up-to-date descriptive list of all distributed document files.
UniProtKB/Swiss-Prot release 2011_04 (05 April 2011) contains 526969 sequence entries, comprising 186402391 amino acids abstracted from 196878 references.

1.1.3 Database of Protein Disorder (DisProt)
The Database of Protein Disorder (DisProt) [6] is a curated database that provides sequence, structure and function information for intrinsically disordered proteins (IDPs) that lack a fixed 3D structure in their putatively native state, either in their entirety or in part. Although they lack a fixed
Table 1 Human protein bioinformatics databases

| Category | Name | Content | URL | References |
|---|---|---|---|---|
| Protein sequence databases | Reference Sequence (RefSeq) | Containing many reliable protein sequences | http://www.ncbi.nlm.nih.gov/RefSeq/ | [4] |
| | Entrez Protein Database | Collection of protein sequences from a variety of sources, and translations from annotated coding regions in GenBank and RefSeq | http://www.ncbi.nlm.nih.gov/sites/ | [7] |
| | UniProtKB/Swiss-Prot | Containing the most reliable sequence and annotations | http://www.uniprot.org/ | [5] |
| | UniProt Archive (UniParc) | Containing most of the publicly available protein sequences in the world | http://www.uniprot.org/help/uniparc | [8] |
| | UniProt Reference Clusters (UniRef) | Clustered sets of sequences from UniProt Knowledgebase and selected UniParc records | http://www.uniprot.org/help/uniref | [9] |
| | Consensus CDS protein set (CCDS) | Containing human and mouse protein sequences | http://www.ncbi.nlm.nih.gov/CCDS/ | [10] |
| | Database of protein disorder (DisProt) | Experimentally verified disordered regions in proteins | http://www.disprot.org | [6] |
| | PhosphoSite | Extensive information on known phosphorylation sites | http://www.phosphosite.org | [11] |
| Protein structure databases | Molecular Modeling Database (MMDB) | Containing 3D structures of proteins and polynucleotides. | http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml | [12] |
| | Worldwide Protein Data Bank (wwPDB) | Containing the 3D structures of proteins, nucleic acids and large macromolecular complexes that have been determined using X-ray crystallography, NMR and electron microscopy techniques | http://www.wwpdb.org/ | [13] |
| | ModBase | Containing 3D protein models calculated by comparative modeling | http://www.modbase.compbio.ucsf.edu/modbase-cgi/index.cgi | [14] |
| | SWISS-MODEL | Containing 3D models of proteins | http://www.swissmodel.expasy.org/repository/ | [15] |
| | CATH | Hierarchical classification of protein domain structures in the Protein Data Bank | http://www.cathdb.info/ | [16] |
| | Structural Classification Of Proteins (SCOP) | Description of the evolutionary and structural relationships of the proteins with known structures | http://www.scop.mrc-lmb.cam.ac.uk/scop/ | [17] |
| | KineticDB | Containing the experimental data on protein folding kinetics | http://www.kineticdb.protres.ru/db/index.pl | [18] |
| | RESID | Containing the structures for protein pre-, co- and post-translational modifications | http://www.ebi.ac.uk/RESID/ | [19] |
| | Phospho3D | Containing the 3D structures of phosphorylation sites that stores information retrieved from the phospho.ELM database | http://www.cbm.bio.uniroma2.it/phospho3d/ | [20] |
| Databases of protein-protein interactions and complexes | Database of Interacting Proteins (DIP) | Containing the binary protein–protein interactions that were manually curated by experts. | http://www.dip.doe-mbi.ucla.edu/dip/Main.cgi | [21] |
| | Molecular INTeraction database (MINT) | Containing the experimentally verified protein-protein interactions, which are mined from the scientific literature by expert curators | http://mint.bio.uniroma2.it/mint/ | [22] |
| | IntAct | An open source database and software framework; Containing the interaction data that are manually extracted from public literature and annotated to a high level of detail through the extensive use of controlled vocabulary; Also containing a suite of tools that can be used to visualize and analyze the interaction data. | http://www.ebi.ac.uk/intact/main.xhtml | [23] |
| | Human Protein Reference Database (HPRD) | Containing an extensive list of interaction data of human proteins, which are manually extracted from public literature and curated by a team of trained biologists. | http://www.hprd.org/ | [24] |
| | STRING | Containing interaction data of human proteins from numerous sources, not only from experimental repositories, but also includes computational prediction methods, and automated text mining of public text collections such as PUBMED. | http://string-db.org/ | [25] |
| | Unified Human Interactome (UniHI) | Containing human protein interaction data from various sources including both computational and experimental repositories. | http://www.unihi.org/ | [26] |
| | Reactome | Containing curated knowledge of biological pathways | http://www.reactome.org/ | [27] |
| | Kyoto Encyclopedia of Genes and Genomes (KEGG) | Pathway maps on the molecular interaction and reaction networks for metabolism | http://www.genome.jp/kegg/pathway.html | [28] |
| Protein family databases | Pfam | A highly comprehensive resource providing an optimised set of Hidden Markov Model profiles for protein domain families | http://pfam.jouy.inra.fr/ | [29] |
| | Simple modular architecture research tool (SMART) | Resource for identification and annotation of protein domains and the analysis of domain architectures | http://www.smart.embl.de/ | [30] |
| | PRINTS | Containing conserved motifs used to characterize a protein family | http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php | [31] |
| | ProDom | Containing protein domain families automatically generated from the UniProtKB | http://www.prodom.prabi.fr/prodom/current/html/home.php | [32] |
| | InterPro | Integrated resource of protein families, domains and functional sites from Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF etc. | http://www.ebi.ac.uk/interpro/ | [33] |
| | TIGRFAMs | Containing the protein families which are built in a similar fashion to Pfam but also containing whole protein chains | http://www.tigr.org/TIGRFAMs/index.shtml | [34] |
| | PROSITE | Containing protein domains, families and functional sites as well as associated patterns and profiles to identify them | http://expasy.org/prosite/ | [35] |
| | Clusters of Orthologous Groups of proteins (COGs) | Phylogenetic classification of proteins encoded in complete genomes | http://www.ncbi.nlm.nih.gov/COG/ | [36] |
| Proteomics databases | WORLD-2DPAGE Constellation | List of World-2DPAGE database servers, World-2DPAGE Portal that queries simultaneously worldwide proteomics databases, and World-2DPAGE Repository | http://world-2dpage.expasy.org/ | [37] |
| | Global Proteome Machine Database (GPMDB) | Containing mass spectral library for data from a variety of organisms, the identified peptides are matched to the Ensembl genome database | http://www.thegpm.org/GPMDB/index.html | [38] |
| | PRoteomics IDEntifications database (PRIDE) | Containing protein and peptide identifications that have been described in the scientific literature together with the evidence supporting these identifications | http://www.ebi.ac.uk/pride/ | [39] |
| | PeptideAtlas | Containing peptides identified in a large set of LC–MS/MS proteomics experiments | http://www.peptideatlas.org/ | [40] |
| | Peptidome | Containing tandem mass spectrometry peptide and protein identification data | http://www.ncbi.nlm.nih.gov/peptidome/ | [41] |
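
Section 1.1.1 notes that RefSeq may store more than one protein sequence per gene and suggests aligning them to see where they differ. The following is a minimal Python sketch of that comparison, not a method taken from the review. It assumes the standard-library `difflib.SequenceMatcher` as a stand-in for a proper alignment tool; the function name `sequence_differences` and both toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def sequence_differences(seq_a: str, seq_b: str):
    """Return the regions where two protein sequences differ.

    Each item is (operation, a_start, a_end, b_start, b_end) with
    0-based, end-exclusive coordinates, as reported by difflib.
    """
    # autojunk=False disables difflib's heuristic that ignores very
    # frequent characters, which is unwanted for amino acid strings.
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


# Hypothetical toy sequences standing in for two RefSeq protein records
# of the same gene (not real database entries).
isoform_1 = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
isoform_2 = "MKTAYIAKQRQISFVKSHFSRQAPLGLIEVQ"

for tag, a0, a1, b0, b1 in sequence_differences(isoform_1, isoform_2):
    print(f"{tag}: isoform_1[{a0}:{a1}]={isoform_1[a0:a1]!r} "
          f"isoform_2[{b0}:{b1}]={isoform_2[b0:b1]!r}")
```

For real isoforms, a dedicated pairwise or multiple alignment program with a substitution matrix would likely be more appropriate; this sketch only locates where two strings diverge so that substantial differences can be flagged for closer inspection.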

structure, IDPs carry out important biological functions, being typically involved in regulation, signaling and control. They often carry out these functions through high-specificity low-affinity interactions involving the multiple binding of one protein to many partners or the multiple binding of many proteins to one partner. DisProt collects and organizes knowledge regarding the experimental characterization and the functional associations of IDPs. DisProt is a collaborative effort between the Center for Computational Biology and Bioinformatics at Indiana University School of Medicine and the Center for Information Science and Technology at Temple University.
DisProt release 5.7 (28 February 2011) includes 643 proteins and 1375 disordered regions.

1.1.4 PhosphoSite
PhosphoSite [11] is an online systems biology resource providing comprehensive information and tools for the study of protein post-translational modifications (PTMs). PhosphoSite provides information about phosphorylated residues and the surrounding sequences, orthologous sites in other species, the location of the site within known domains and motifs, and relevant literature references. Cross-references are provided to a number of external resources for protein sequences, structures, PTMs and signaling pathways, as well as to sources of phospho-specific antibodies and probes. As the knowledgebase expands, users will be able to retrieve information about the kinases, phosphatases, ligands, treatments, and receptors that have been shown to regulate the phosphorylation status of the sites and the pathways in which the phosphorylation sites function. PhosphoSite provides an extensive, manually curated phosphorylation site database and also includes other commonly studied PTMs. This resource provides an easily accessible overview of the role of different phosphorylation sites, the experimental evidence for the modification, and the cell types in which the modification was found.
The latest PhosphoSite release (03 November 2011) contains 100884 non-redundant phosphorylation sites, 19185 ubiquitination sites, 7849 acetylation sites, 379 di-methylation sites, 303 mono-methylation sites, 139 methylation sites, 622 sumoylation sites, and 602 O-GlcNAc sites.

1.2 Protein structure databases

Protein structure databases describe experimentally determined protein structures and provide useful links, analyses, and schematic diagrams that relate 3D structure to biological function. A growing body of experimental data supports the notion that the structure of a protein reflects the nature of its role and therefore determines its biological function. Some of the databases classify 3D structures by their folds because this can often reveal evolutionary relationships that may be hard to detect from sequence comparisons alone. A large number of databases for more specialized users, for example, those dealing with specific families, diseases, and structural features, are also available.

1.2.1 Worldwide Protein Data Bank (wwPDB)
Since 2003, the Protein Data Bank Archive (PDB Archive) [13] has been managed by an international consortium called the Worldwide Protein Data Bank (wwPDB), whose partners comprise the RCSB PDB, the Macromolecular Structure Database (MSD, now known as the PDBe) at the European Bioinformatics Institute (EBI), the Protein Data Bank Japan (PDBj) at Osaka University and, more recently, the BioMagResBank (BMRB) at the University of Wisconsin-Madison. The PDB Archive is a collection of flat files in three different formats: the legacy PDB file format; the PDB exchange format that follows the mmCIF syntax (http://www.deposit.pdb.org/mmcif/); and the PDBML/XML format. For many of the structures, the original experimental data is also available. Thus, for structural models solved by X-ray crystallography, the structure factors from which the model was derived can be downloaded, while for structures solved by nuclear magnetic resonance (NMR) spectroscopy, the original distance and angle restraints can be obtained. An important task of the wwPDB was to remediate the legacy PDB archive, mainly with respect to ligands and literature references, by correcting all the PDB data and making it consistent.
wwPDB (27 April 2011) contains 69177 structures, each of which is identified by a unique four-character reference code, the PDB identifier.

1.2.2 Structural Classification of Proteins (SCOP)
The Structural Classification of Proteins (SCOP) [17] database provides a comprehensive and detailed description of the evolutionary and structural relationships of proteins with known structures. The SCOP classification is constructed based on the domains in experimentally determined protein structures and includes family, superfamily, fold and class based on the secondary structure content and organization of the folds. SCOP also contains information on species and on groups with similar structures and similar functions.
SCOP release 1.75 (June 2009) includes 38221 PDB entries, 1195 folds, 1962 superfamilies and 3902 families.

1.2.3 Phospho3D
Phospho3D [20], a database of 3D structures of phosphorylation sites, stores information retrieved from Phospho.ELM (a database of S/T/Y phosphorylation sites) that is enriched with structural information and annotations at the residue level. Phospho3D also collects the results of large-scale structural comparisons of the 3D zones versus a representative dataset of structures, so that each P-site is associated with a number of structurally similar sites. Phospho3D has several additional features that include new structural descriptors, the possibility of selecting non-redundant sets of 3D structures and the availability for download of non-redundant

sets of structurally annotated P-sites. Users can browse the data, search the database using kinase name, PDB identification code or keywords, and submit a protein structure and scan it against the 3D zones in the Phospho3D database.
Phospho3D (August 2010) includes 1770 mapped Phospho.ELM instances, 2083 distinct PDB files, 2158 distinct PDB chains and 5387 phosphorylation sites mapped onto a PDB structure.

1.3 Databases of protein-protein interactions and complexes

To perform their functions, proteins work together through various forms of direct or indirect interaction mechanisms. For a variety of basic functions, many proteins form a large complex representing a molecular machine or a macromolecular super-structural building block. High-throughput techniques for the detection of protein-protein interactions have matured and protein interaction data is now available on a large scale. Curated databases of protein–protein interactions have become a necessity for efficient research into how proteins function. Databases of protein-protein interactions and complexes maintain information about inter-molecular interactions, metabolic pathways, regulatory pathways, and the complexes that underlie many biological processes. With the development of retrieval tools and integration into annotation pipelines, these databases will become important resources for the discovery of new biomolecular mechanisms.

1.3.1 IntAct
IntAct [23] is an open source database and software framework for the storage, presentation and analysis of protein interaction data. Most of the interaction data is from protein-protein interactions, but IntAct also captures data for non-protein molecular interactors such as DNA, RNA, and small molecules. IntAct uses a flexible data model that can accommodate high levels of experimental detail. Technical details about the experiment, binding sites, protein tags and mutations are annotated with the Molecular Interaction Ontology of the Proteomics Standards Initiative (PSI-MI). The IntAct website provides both textual and graphical views of protein interactions. The interactive viewer provides a number of unique features such as highlighting the node based on the molecule type, Gene Ontology, InterPro annotation, experimental and biological role, and species. Users can iteratively develop complex queries, exploiting the detailed annotation with hierarchical controlled vocabularies. Results can be obtained at any stage in a simplified, tabular view. A specialized view allows ‘zooming in’ on the full annotation of interactions, interactors and their properties.
IntAct version 2.0 contains 266855 binary interactions, 56486 proteins, 13103 experiments, and 1665 controlled vocabulary terms.

1.3.2 Human Protein Reference Database (HPRD)
HPRD [24] is a database of curated proteomic information pertaining to human proteins in healthy and diseased states. Although it is not a protein–protein interaction database, it contains an extensive list of interaction data of human proteins. All data in HPRD are manually extracted from public literature and curated by a team of trained biologists. The data are freely available for academic users and can be downloaded in either tab-delimited or XML formats. The whole database or only protein-protein interaction data without annotations can be downloaded in tab-delimited or PSI-MI format. HPRD is also linked to NetPath, a compendium of human signaling pathways, which currently contains annotations for several cancers and immune signaling pathways.
HPRD release 9 (13 April 2010) includes 30047 protein entries, 39194 protein-protein interactions, 93710 PTMs, 112158 protein expression annotations, 22490 subcellular localization annotations, 470 domains, and 453521 PubMed links.

1.3.3 Unified Human Interactome (UniHI)
Unified Human Interactome (UniHI) [26] is a comprehensive database of computationally and experimentally derived human protein interaction networks. The database is intended to integrate diverse maps, providing a flexible and direct entry gate into the human interactome.
A variety of protein identifiers are supported by UniHI. Users can submit a set of proteins to obtain their functional information and interacting partners. The results are returned as a list of matched proteins together with names of the original source databases. UniHI also provides an interactive viewer to visualize the interaction networks. To analyze the human interactome, the UniHI website provides two powerful tools, UniHI Express and UniHI Scanner. UniHI Express can be used to refine the interaction networks based on gene expression in selected tissues to construct a tissue-specific network. UniHI Scanner can be used to compare the extracted networks with the pathways from Kyoto Encyclopedia of Genes and Genomes (KEGG) to detect new components in existing pathways. Proteins involved in multiple pathways that might be useful for disease-related studies can also be identified by UniHI Scanner.
UniHI version 4.0 contains 253980 distinct interactions between 22307 unique human proteins.

1.3.4 Kyoto Encyclopedia of Genes and Genomes (KEGG)
KEGG [28] is an integrated database resource consisting of 16 main databases, broadly categorized into systems information, genomic information and chemical information. Systems information represents functional aspects of the biological systems, such as the cell and the organism, that are built from the building blocks. Genomic and chemical information represent the molecular building blocks of life in the genomic and chemical spaces, respectively. KEGG

has been widely used as a reference knowledge base for the biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies.
On 4 May 2011 KEGG contained 134607 pathway maps, 38205 functional hierarchies, 392 pathway modules and complexes, 375 human diseases, 9347 drugs, 835 crude drugs and other natural products, 14615 orthology groups, 1588 organisms, 6405661 genes in high-quality genomes, 372418 genes in draft genomes, 3792883 genes as EST contigs, 669846 genes in metagenomes, 17379 metabolites and other small molecules, 10978 glycans, 8451 biochemical reactions, 12547 reactant pair chemical transformations, 2324 reaction classes and 5391 enzyme nomenclatures.

1.4 Protein family databases

The databases storing information on the sequences and structures of proteins have been used to develop new resources with value-added information by classifying proteins into families according to their evolutionary relationships. These resources can provide extensive insights into evolution and, in particular, can support investigations into how proteins mutate and how function evolves over time. Such analyses have greatly assisted the transfer of functional annotations between experimentally characterized and uncharacterized genes.

1.4.1 Pfam
The Pfam database [29] is a large collection of multiple sequence alignments and hidden Markov models covering many common protein families. The database categorizes 75 percent of known proteins to form a library of protein families. The open access resource was established at the Wellcome Trust Sanger Institute in 1998. Its vision is to provide a tool which allows experimental, computational and evolutionary biologists to classify protein sequences and answer questions about what they do and how they have evolved. The Pfam website also provides information on domain compositions.
Pfam version 25.0 (March 2011) contains alignments and models for 12273 protein families, based on the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL protein sequence databases.

1.4.2 InterPro
InterPro [33] is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to new protein sequences to functionally characterize them. It classifies sequences at the superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites. The contents of InterPro are based on diagnostic signatures and the proteins that contain regions that significantly match these signatures. Each of the member databases of InterPro contributes towards a different niche, from very high-level, structure-based classifications (SUPERFAMILY and CATH-Gene3D) through to quite specific sub-family classifications (PRINTS and PANTHER). InterPro adds in-depth annotation, including Gene Ontology (GO) terms, to the protein signatures. Users can analyze an entire new genome using a downloadable version of InterProScan which can be incorporated into existing local pipelines. InterPro provides structural information from PDB, its classification in CATH and SCOP, as well as homology models from ModBase and SwissModel. Therefore, users can perform a direct comparison of the protein signatures with the available structural information.
InterPro release 32.0 (18 April 2011) contains 21516 entries, representing: 100 active sites, 66 binding sites, 628 conserved sites, 5974 domains, 14469 families, 16 PTMs and 263 repeats.

1.4.3 PROSITE
PROSITE [35] is a database of protein families and domains. It consists of entries describing the protein families, domains and functional sites as well as the amino acid patterns, signatures, and profiles in them. It is based on the observation that while there is a huge number of different proteins, most of them can be grouped, on the basis of similarities in their sequences, into a limited number of families. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor. The content of PROSITE is manually curated by a team at the Swiss Institute of Bioinformatics and tightly integrated into the UniProtKB/Swiss-Prot protein annotation. PROSITE was created in 1988 by Amos Bairoch who directed the group for more than 20 years. In July 2009, Ioannis Xenarios took over as the director of the PROSITE, UniProtKB/Swiss-Prot and bioinformatics competence center Vital-IT groups. PROSITE currently contains patterns and profiles specific to more than a thousand protein families or domains. Each of these signatures comes with documentation providing background information on the structure and function of these proteins. On the PROSITE website, users can perform keyword-based searches, and browse the motif entries, ProRule description, taxonomic scope, and number of positive hits. PROSITE provides the ScanProsite tool which can be used either to scan protein sequences for the occurrence of PROSITE motifs by entering UniProtKB or PDB identifier(s) or protein sequence(s), or to scan the UniProtKB or PDB databases for the occurrence of a pattern by entering the PROSITE identifier or the user’s own pattern(s). ScanProsite can also be accessed programmatically through a simple HTTP web service.
PROSITE release 20.72 (7 April 2011) contains 1609 documentation entries, 1308 patterns, 922 profiles and 917 ProRule containing functional and structural information on PROSITE profiles.
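To make the idea of scanning a sequence with a user-defined pattern more concrete, the following is a minimal local sketch in Python. It is not ScanProsite itself and does not call the ScanProsite web service. It assumes the commonly documented PROSITE pattern syntax (elements separated by '-', 'x' for any residue, '[..]' for allowed residues, '{..}' for excluded residues, '(n)' or '(n,m)' for repeats, '<' and '>' for terminal anchors) and leaves rarer constructs unsupported. The function names prosite_to_regex and find_matches and the toy sequence are hypothetical and chosen only for illustration.

```python
import re


def _is_residue_set(text: str) -> bool:
    """True if text is a non-empty run of upper-case one-letter residue codes."""
    return text.isalpha() and text.isupper()


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex_parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # '<' anchors the match at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # '>' anchors the match at the C-terminus
            suffix, element = "$", element[:-1]

        # Separate the residue specification from an optional repeat, e.g. x(2,4).
        parsed = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if parsed is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        core, low, high = parsed.groups()

        if core == "x":                                   # any residue
            token = "."
        elif len(core) == 1 and _is_residue_set(core):    # one specific residue
            token = core
        elif core[0] == "[" and core[-1] == "]" and _is_residue_set(core[1:-1]):
            token = f"[{core[1:-1]}]"                     # any of the listed residues
        elif core[0] == "{" and core[-1] == "}" and _is_residue_set(core[1:-1]):
            token = f"[^{core[1:-1]}]"                    # any residue except those listed
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")

        if low is not None:
            token += f"{{{low},{high}}}" if high is not None else f"{{{low}}}"
        regex_parts.append(prefix + token + suffix)
    return "".join(regex_parts)


def find_matches(pattern: str, sequence: str) -> list:
    """Return (1-based start position, matched residues) for every hit, overlaps included."""
    # A capturing group inside a lookahead lets overlapping hits be reported.
    regex = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # N-glycosylation site pattern scanned against a made-up toy sequence.
    print(find_matches("N-{P}-[ST]-{P}.", "MNGSANATP"))
```

Converting the pattern to a regular expression keeps the sketch short and dependency-free. Profiles and ProRule annotations are a different kind of signature and cannot be expressed this way.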
1.5 Proteomics databases

Protein function analysis has evolved from the careful design of assays that address specific questions to the high-throughput 2D-gel and mass spectrometry (MS) based proteomics technologies that yield proteome-wide maps of protein expression or interaction. Because these technologies depend heavily on information storage, representation and analysis, several proteomics databases are available while new resources are still emerging.

1.5.1 World-2DPAGE Constellation
The World-2DPAGE Constellation [37] is composed of the established WORLD-2DPAGE List of 2-D PAGE database servers, the World-2DPAGE Portal that simultaneously queries world-wide proteomics databases, and the recently created World-2DPAGE Repository. The WORLD-2DPAGE List is an index to known federated 2-D PAGE databases, services and related servers. Databases are grouped by species and classified in three categories according to their implementation of the rules defining a federated 2-DE database. WORLD-2DPAGE List currently lists up to 60 databases totalling nearly 400 gel images. The World-2DPAGE Portal is a dynamic portal that can be used to simultaneously query world-wide gel-based proteomics databases. The Portal can be seen as a virtual unique database with up to 133 reference maps for 20 species totalling nearly 18,000 identified spots. The World-2DPAGE Repository is a public standards-compliant repository for gel-based proteomics data linked to protein identification published in the literature.
The World-2DPAGE (15 March 2011) consists of 18 articles with 28 reference maps for 18 species and 5700 identified spots.

1.5.2 Proteomics Identifications Database (PRIDE)
PRIDE [39] is a prominent public data repository of MS-based proteomics data that is maintained by the European Bioinformatics Institute as part of the Proteomics Services Team. PRIDE stores three different kinds of information: peptide and protein identifications derived from MS or tandem MS (MS/MS) experiments, MS and MS/MS mass spectra as peak lists, and any or all associated metadata. PRIDE was established as a production service in 2005. Several other proteomics databases have been established over the past few years, including GPMDB [38], PeptideAtlas [40], and the NCBI Peptidome [41]. NCBI Peptidome and PRIDE are structured data repositories that store the original experimental data from the researchers and do not assume any editorial control over the submitted data. PRIDE contains data from about 60 species; the biggest fraction is from human samples, followed by the fruit fly Drosophila melanogaster and mouse. Users can submit data obtained using different MS-based proteomics technologies in PRIDE XML or mzData XML formats. PRIDE can be queried by experiment accession number, protein accession number, literature reference, and by sample parameters including species, tissue, sub-cellular location and disease state. The query results can be retrieved as PRIDE XML, mzData XML, or HTML. PRIDE also provides access to public PRIDE data from a query-optimized data warehouse as well as programmatic web service access in a BioMart [42] interface that allows complex queries to be constructed.
The PRIDE database (23 Jun 2011) contains 16476 experiments, 5024880 identified proteins, 24498871 identified peptides, 3299204 unique peptides and 146864975 spectra.

1.5.3 Global Proteome Machine Database (GPMDB)
The GPMDB [38] was constructed to use the information obtained by Global Proteome Machine (GPM) servers to aid in the difficult process of validating peptide MS/MS spectra and protein coverage patterns. This database has been integrated into the GPM server pages [38], allowing users to quickly compare their experimental results with the best results that have been observed previously by users of the machine. GPMDB does not hold complete records of proteomics experiments; rather it holds the minimum amount of information necessary for bioinformatics-related tasks such as sequence assignment validation. Most of the data is held in a set of XML files and the database serves as an index to those files, allowing for very rapid lookups and reduced database storage requirements.

2 Discussion

Human protein bioinformatics databases offer scientists the opportunity to access a wide variety of biologically relevant data, including protein sequences, structures and functions. However, the growing interest in functional proteomics is fuelled not only by the prospect of a true functional understanding but also by substantial improvements in technology and methodology. Advances in protein identification technologies, in particular MS, have made possible the establishment of proteomics databases and extended the sensitivity, accuracy and speed of analysis, making it possible to routinely identify several thousand proteins per experiment [43]. The introduction of MS methods for accurate relative and absolute protein quantification and the large-scale analysis of PTMs, such as phosphorylation and ubiquitylation, have allowed truly functional proteomics to be carried out [44]. MS is now joined by antibody and protein-protein interaction arrays, fluorescence- and flow cytometry-based detection of proteins and PTMs, and optical spectroscopic methods of proteome analysis [45]. These latter techniques are promoted by an ever increasing repertoire of specific antibodies against proteins and PTMs, and bring single-cell proteomics into reach. Therefore, the functional proteomics approaches, when integrated with the information from human protein bioinformatics databases, offer an effective
route to biomarker and drug discovery by pinpointing signaling pathways and components that are differentially regulated in particular diseases. Some examples of how protein databases have been used in functional proteomics are described in the following section.

2.1 Functional proteomics mapping of Smad signaling involved in several human pathologies

Access to the human genome functional proteomics databases has led to descriptions of large protein-protein interaction networks. An efficient strategy for the functional exploration of complex proteomes requires (i) the in-depth annotation of protein-protein interaction maps with all the available information on proteins, protein domains, and interactions; (ii) an exploration tool that allows easy navigation of complex databases; and (iii) streamlined functional assays to validate newly identified proteins. Colland et al. [46] have applied an integrated strategy of this kind for the identification of new factors implicated in the Smad signaling pathway involved in several human pathologies. They used two-hybrid screening to map Smad signaling protein-protein interactions and established a network of 755 interactions involving 591 proteins, 179 of which were poorly annotated or not annotated in the existing databases of protein-protein interactions and complexes. The exploration of the databases was improved by the use of PIMRider [47], a dedicated navigation tool accessible through the Web. In their study, they successfully illustrated the biological meaning of the network after identifying the presence of 18 known Smad-associated proteins. Colland et al. then performed functional assays including siRNA knock-down experiments in mammalian cells and identified eight novel proteins involved in Smad signaling. The success of this study demonstrates the validity of using an integrated functional proteomics approach.

2.2 Intrinsically disordered proteins (IDPs) and functional proteomics

The recent discovery of IDPs has significantly broadened the view of the scientific community and increased the number of groups systematically studying these intriguing proteins [48]. IDPs and ID regions are typically involved in regulation, signaling and control pathways where they complement the functional repertoire of the more ordered regions that typically carry out efficient catalysis [49]. The many IDP databases (for example, ProDDO [50] and DisProt [6]) are increasingly being used in individual and high-throughput experiments (i) to improve estimations of the commonness of ID regions and their functional repertoire; (ii) to aid or improve prediction of other protein features such as protein PTM sites or other types of binding regions; and (iii) to gain insight into structural and dynamic properties of the proteins of interest.

2.3 The use of human protein bioinformatics databases to retrieve biological information on the myc proto-oncogene protein (c-myc)

The UniProtKB/Swiss-Prot entry for human c-myc (accession number P01106) contains the following information: (i) the ‘Comments’ section provides an overview of its biological function; (ii) the list of keywords gives a very useful quick impression about the protein and can be used to find other proteins that have been annotated with the same keyword; and (iii) the ‘Features’ section contains annotations that are localized to particular residues or regions of the sequence including a helix-loop-helix motif, a potential leucine zipper, and a basic motif all towards the C-terminal end. Information about the 353 amino acids at the N-terminal end (about two-thirds of the protein) consists mostly of PTMs and areas with compositional bias. Therefore, what might be the function of this largely unannotated region, and which residues might be involved? The features section ends with a list of the positions of secondary structure elements that have been experimentally validated; in c-myc these are three alpha-helices at the C-terminal end. The c-myc RefSeq entry (accession number NP_002458.2) contains comments on a protein isoform, created using a downstream alternative start codon, which may have some role in the cell. The ‘FEATURES’ section of this entry contains details on the residues involved in particular functions, in this case DNA binding and dimerization; however, no new information on the N-terminal part of the sequence is available. To check if the lack of annotation might be due to a high degree of structural and/or functional flexibility in this region of the sequence, we used DisProt to see if the c-myc regions had been annotated with experimentally validated intrinsic disorder. A keyword search for ‘c-myc’ found the entry ‘DP00260’, which lists a series of experiments that show the propensity for intrinsic disorder in the N-terminal part of the protein including the region around the threonine at position 58 (T58); this residue was annotated to be sometimes phosphorylated and sometimes glycosylated in UniProtKB/Swiss-Prot and RefSeq. Because disordered regions often harbor PTM sites that modulate molecular interactions, and because phosphorylation is the best understood PTM, we consulted PhosphoSite. Using a keyword search for ‘T58’, we found that various functions have been associated with this site. To obtain more detailed information about the interactions or pathways that this protein might be involved in, we continued our search in protein structure, family, and interaction databases. Five PDB IDs (1A93, 1EE4, 1MV0, 1NKP and 2A93) associated with P01106 were found. The Pfam database has annotated protein-myc as belonging to three protein families: PF00010 (HLH family, helix-loop-helix DNA binding domain), PF01056 (Myc-N family, myc amino-terminal region) and PF02344 (Myc-LZ family, myc leucine zipper domain). The KEGG database revealed that P01106 is involved in 14 pathways, including
the MAPK signaling pathway, the ErbB signaling pathway, cell cycle and the Wnt signaling pathway, all of which play important roles in the progression of various human cancers.

3 Future perspectives

Functional proteomics strategies that use information from the human protein databases are expected to provide an integrated picture of the expression levels and properties of the thousands of protein components of organelles, pathways, and cytoskeletal systems, both in health and disease. Recent major developments in protein-complex purification, MS and bioinformatics databases will progress the analysis of the human proteome.
What are the major challenges and future goals? First, although a variety of protein bioinformatics databases have been developed to catalog and store different information about proteins, it is still important to develop new solutions to facilitate comparative analysis, data-driven hypothesis generation, and biological knowledge discovery. Second, before a complete human proteomic analysis can be applied to the study of diseases, many challenges need to be addressed. They include the heterogeneity of biopsy material, the need to develop better image analysis systems for supporting gel comparisons, quantitation and databasing, determination of interacting partners, functional aspects, and the lack of procedures for identifying and functionally characterizing target genes that lie in disease pathways.
We are confident that all these challenges will be addressed as increasing numbers of scientists begin to apply protein bioinformatics databases to functional proteomics research as well as to clinically relevant questions in biomedical research. We predict that medicine will profit enormously from the development of integrated functional proteomics strategies for the identification and characterization of biomarkers and drug targets for disease diagnosis and therapeutics.

This work was supported by the National Basic Research Program of China (Grant Nos. 2010CB912700 and 2011CB910601).

1 Godovac-Zimmermann J, Brown L R. Perspectives for mass spectrometry and functional proteomics. Mass Spectrom Rev, 2001, 20: 1–57
2 Gavin A C, Bosche M, Krause R, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 2002, 415: 141–147
3 Monti M, Orru S, Pagnozzi D, et al. Functional proteomics. Clinica Chimica Acta, 2005, 357: 140–150
4 Pruitt K D, Tatusova T, Maglott D R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res, 2007, 35: D61–D65
5 The UniProt Consortium. The universal protein resource (UniProt) in 2010. Nucleic Acids Res, 2010, 38: D142–D148
6 Sickmeier M, Hamilton J A, LeGall T, et al. DisProt: the database of disordered proteins. Nucleic Acids Res, 2007, 35: D786–D793
7 Sayers E W, Barrett T, Benson D A, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res, 2007, 35: D5–D12
8 Leinonen R, Diez F G, Binns D, et al. UniProt archive. Bioinformatics, 2004, 20: 3236–3237
9 Suzek B E, Huang H, McGarvey P, et al. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics, 2007, 23: 1282–1288
10 Rebhan M. Protein sequence databases. Methods Mol Biol, 2010, 609: 45–57
11 Hornbeck P V, Chabra I, Kornhauser J M, et al. PhosphoSite: a bioinformatics resource dedicated to physiological protein phosphorylation. Proteomics, 2004, 4: 1551–1561
12 Wang Y, Addess K J, Chen J, et al. MMDB: annotating protein sequences with Entrez’s 3D-structure database. Nucleic Acids Res, 2007, 35: D298–D300
13 Berman H, Henrick K, Nakamura H, et al. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res, 2007, 35: D301–D303
14 Pieper U, Eswar N, Webb B M, et al. MODBASE, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res, 2009, 37: D347–D354
15 Kiefer F, Arnold K, Künzli M, et al. The SWISS-MODEL repository and associated resources. Nucleic Acids Res, 2009, 37: D387–D392
16 Cuff A L, Sillitoe I, Lewis T, et al. The CATH classification revisited-architectures reviewed and new ways to characterize structural divergence in superfamilies. Nucleic Acids Res, 2009, 37: D310–D314
17 Andreeva A, Howorth D, Chandonia J M, et al. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res, 2008, 36: D419–D425
18 Bogatyreva N S, Osypov A A, Ivankov D N. KineticDB: a database of protein folding kinetics. Nucleic Acids Res, 2009, 37: D342–D346
19 Garavelli J S. The RESID database of protein modifications as a resource and annotation tool. Proteomics, 2004, 4: 1527–1533
20 Zanzoni A, Ausiello G, Via A, et al. Phospho3D: a database of three-dimensional structures of protein phosphorylation sites. Nucleic Acids Res, 2007, 35: D229–D231
21 Salwinski L, Miller C S, Smith A J, et al. The database of interacting proteins: 2004 update. Nucleic Acids Res, 2004, 32: D449–D451
22 Zanzoni A, Montecchi-Palazzi L, Quondam M, et al. MINT: a Molecular INTeraction database. FEBS Lett, 2002, 513: 135–140
23 Aranda B, Achuthan P, Alam-Faruque Y, et al. The IntAct molecular interaction database in 2010. Nucleic Acids Res, 2010, 38: D525–D531
24 Keshava Prasad T S, Goel R, Kandasamy K, et al. Human Protein Reference Database—2009 update. Nucleic Acids Res, 2009, 37: D767–D772
25 Snel B, Lehmann G, Bork P, et al. STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res, 2000, 28: 3442–3444
26 Chaurasia G, Malhotra S, Russ J, et al. UniHI 4: new tools for query, analysis and visualization of the human protein-protein interactome. Nucleic Acids Res, 2009, 37: D657–D660
27 Matthews L, Gopinath G, Gillespie M, et al. Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res, 2009, 37: D619–D622
28 Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res, 2000, 28: 27–30
29 Finn R D, Tate J, Mistry J, et al. The Pfam protein families database. Nucleic Acids Res, 2008, 36: D281–D288
30 Letunic I, Doerks T, Bork P. SMART 6: recent updates and new developments. Nucleic Acids Res, 2009, 37: D229–D232
31 Bru C, Courcelle E, Carrère S, et al. The ProDom database of protein domain families: more emphasis on 3D. Nucleic Acids Res, 2005, 33: D212–D215
32 Attwood T K. The PRINTS database: a resource for identification of protein families. Brief Bioinform, 2002, 3: 252–263
33 Hunter S, Apweiler R, Attwood T K, et al. InterPro: the integrative protein signature database. Nucleic Acids Res, 2009, 37: D211–D215
34 Haft D H, Selengut J D, White O. The TIGRFAMs database of protein families. Nucleic Acids Res, 2003, 31: 371–373
35 Hulo N, Bairoch A, Bulliard V, et al. The 20 years of PROSITE. Nucleic Acids Res, 2008, 36: D245–D249
36 Tatusov R L, Fedorova N D, Jackson J D, et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics, 2003, 4: 41–54
37 Hoogland C, Mostaguir K, Appel R D, et al. The World-2DPAGE Constellation to promote and publish gel-based proteomics data through the ExPASy server. J Proteomics, 2008, 71: 245–248
38 Craig R, Cortens J C, Fenyo D, et al. Using annotated peptide mass spectrum libraries for protein identification. J Proteome Res, 2006, 5: 1843–1849
39 Vizcaíno J A, Côté R, Reisinger F, et al. The proteomics identifications database: 2010 update. Nucleic Acids Res, 2009, 38: D736–D742
40 Deutsch E W, Lam H, Aebersold R. PeptideAtlas: a resource for target selection for emerging targeted proteomics workflows. EMBO Rep, 2008, 9: 429–434
41 Slotta D J, Barrett T, Edgar R. NCBI peptidome: a new public repository for mass spectrometry peptide identifications. Nat Biotechnol, 2009, 27: 600–601
42 Kasprzyk A. BioMart: driving a paradigm change in biological data management. Database (Oxford), 2011, bar049
43 Schulz K R, Danna E A, Krutzik P O, et al. Single-cell phospho-protein analysis by flow cytometry. Curr Protoc Immunol, 2007, Chapter 8: Unit 8.17
44 Fournier F, Guo R, Gardner E M, et al. Biological and biomedical applications of two-dimensional vibrational spectroscopy: proteomics, imaging, and structural analysis. Acc Chem Res, 2009, 42: 1322–1331
45 Faley S L, Copland M, Wlodkowic D, et al. Microfluidic single cell arrays to interrogate signalling dynamics of individual, patient derived hematopoietic stem cells. Lab Chip, 2009, 9: 2659–2664
46 Colland F, Jacq X, Trouplin V, et al. Functional proteomics mapping of a human signaling pathway. Genome Res, 2004, 14: 1324–1332
47 Formstecher E, Aresta S, Collura V, et al. Protein interaction mapping: a Drosophila case study. Genome Res, 2005, 15: 376–384
48 Dyson H J, Wright P E. According to current textbooks, a well-defined three-dimensional structure is a prerequisite for the function of a protein. Is this correct? IUBMB Life, 2006, 58: 107–109
49 Radivojac P, Iakoucheva L, Oldfield C, et al. Intrinsic disorder and functional proteomics. Biophys J, 2007, 92: 1439–1456
50 Sim K L, Uchida T, Miyano S. ProDDO: a database of disordered proteins from the Protein Data Bank (PDB). Bioinformatics, 2001, 17: 379–380

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction
in any medium, provided the original author(s) and source are credited.