More information about this series at http://www.springer.com/series/5584
Ákos Végvári
Editor

Proteogenomics
Editor
Ákos Végvári
Clinical Protein Science & Imaging, Department of Medical Bioengineering, Biomedical Center
Lund University
Lund, Sweden

Department of Pharmacology & Toxicology
University of Texas Medical Branch
Galveston, TX, USA

ISSN 0065-2598 ISSN 2214-8019 (electronic)

Advances in Experimental Medicine and Biology
ISBN 978-3-319-42314-2 ISBN 978-3-319-42316-6 (eBook)
DOI 10.1007/978-3-319-42316-6

Library of Congress Control Number: 2016951213

© Springer International Publishing Switzerland 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or
part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way,
and transmission or information storage and retrieval, electronic adaptation, computer software,
or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are
exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in
this book are believed to be true and accurate at the date of publication. Neither the publisher nor
the authors or the editors give a warranty, express or implied, with respect to the material
contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature

The registered company is Springer International Publishing AG Switzerland
Preface

The concept of proteogenomics, utilizing advances from the fields of proteomics and genomics, was introduced at around the time of the completion of the sequencing of the human genome. The emergence of proteogenomics is mainly due to the rapid development of two key technologies: high-throughput DNA sequencing and mass spectrometry-based proteomics. The ability to determine protein sequences by mass spectrometry has provided a unique tool for the identification and verification of novel genes, predicted exons, and open reading frames. Consequently, proteogenomics has been used for genome annotation, including the validation of known or annotated protein-coding genes; the improvement of gene annotations by assigning correct start sites; the mapping of signal peptides, proteolysis, and other posttranslational modifications (an important element of biological function that is not encoded directly in the genome); as well as the identification of splicing variants and mutant proteoforms often associated with disease progression.

Considering the rapid advancement in the field, it is perhaps appropriate to define proteogenomics as an intensive research area that investigates the correlations between proteomic data and their corresponding genomic and transcriptomic data, with the goal of improving our knowledge about life at the molecular level, which is a more complete view than was initially suggested. The interplay between the two data streams of genomics and proteomics certainly allows for a better understanding of biological functions and molecular mechanisms in health and disease. Today, genome sequencing provides nearly complete coverage, including transcriptome profiling, while targeted proteomics can be focused on specific regions of the proteome and determine predicted proteins.

The goal of this book is to display this extended view on proteogenomics, depicting research areas where proteogenomics is actively playing an essential role and also highlighting some emerging research arenas without pretending to cover all fields of application. The chapters of this book offer the readers a general insight into the integrative analyses of various types of omics data and present advances within specific principles, such as next-generation sequencing of DNA, mRNA sequencing, ribosome profiling, as well as mass spectrometry- and antibody-based proteomics. The applications are selected to exemplify the great potential of proteogenomics to contribute to human disease research, particularly to cancer and personalized medicine.

Importantly, this book attempts to identify some common features that integrate the various fields and areas where intensive efforts should be made to drive research more efficiently in the near future. One of these is certainly bioinformatics, which has shown amazing power and development during the last couple of years and which is anticipated to provide powerful approaches to improve our ability to work with and combine the large data sets that genomics, transcriptomics, and proteomics generate.

Finally, I would like to thank all the authors of this book for their exceptional contributions, sharing their expert views of the field, and presenting their original research. Their enthusiasm and timely delivery of their manuscripts helped me tremendously to realize this project. It is my sincere hope that the readers will enjoy this book as much as I enjoyed preparing it.

Galveston, TX, USA Ákos Végvári

March 1, 2016
Contents

1 Proteogenomic Tools and Approaches to Explore Protein Coding Landscapes of Eukaryotic Genomes
Dhirendra Kumar and Debasis Dash
2 Next Generation Sequencing Data and Proteogenomics
Kelly V. Ruggles and David Fenyö
3 Proteogenomics: Key Driver for Clinical Discovery and Personalized Medicine
Ruggero Barbieri, Victor Guryev, Corry-Anke Brandsma, Frank Suits, Rainer Bischoff, and Peter Horvatovich
4 Identification of Small Novel Coding Sequences, a Proteogenomics Endeavor
Volodimir Olexiouk and Gerben Menschaert
5 Using Proteomics Bioinformatics Tools and Resources in Proteogenomic Studies
Marc Vaudel, Harald Barsnes, Helge Ræder, and Frode S. Berven
6 Mutant Proteogenomics
Ákos Végvári
7 Proteogenomic Analysis of Single Amino Acid Polymorphisms in Cancer Research
Alba Garin-Muga, Fernando J. Corrales, and Victor Segura
8 Developments for Personalized Medicine of Lung Cancer Subtypes: Mass Spectrometry-Based Clinical Proteogenomic Analysis of Oncogenic Mutations
Toshihide Nishimura and Haruhiko Nakamura
9 Proteogenomics for the Study of Gastrointestinal Stromal Tumors
Tadashi Kondo
10 Proteogenomics for the Comprehensive Analysis of Human Cellular and Serum Antibody Repertoires
Paula Díez and Manuel Fuentes
11 Antibody-Based Proteomics
Christer Wingren

Index

1 Proteogenomic Tools and Approaches to Explore Protein Coding Landscapes of Eukaryotic Genomes

Dhirendra Kumar and Debasis Dash

Abstract
Proteogenomic strategies aim to refine genome-wide annotations of protein coding features by using actual protein level observations. Most of the currently applied proteogenomic approaches include integrative analysis of multiple types of high-throughput omics data, e.g., genomics, transcriptomics, proteomics, etc. Recent efforts towards creating a human proteome map were primarily targeted to experimentally detect at least one protein product for each gene in the genome and extensively utilized proteogenomic approaches. The 14-year-long wait for a draft human proteome map, after completion of similar efforts to sequence the genome, reflects the huge complexity and technical hurdles of such efforts. Further, the integrative analysis of large-scale multi-omics datasets inherent to these studies becomes a major bottleneck to their success. However, recent developments of various analysis tools and pipelines dedicated to proteogenomics reduce both the time and complexity of such analysis. Here, we summarize notable approaches, studies, software developments and their potential applications towards eukaryotic genome annotation and clinical proteogenomics.

Keywords
Shotgun proteomics • Peptide identification • RNA-Seq • HUPO • Genome annotation

D. Kumar • D. Dash (*)
G.N. Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Mathura Road, Delhi 110025, India
e-mail: [email protected]

Á. Végvári (ed.), Proteogenomics, Advances in Experimental Medicine and Biology 926,
DOI 10.1007/978-3-319-42316-6_1

1.1 Introduction

Biological systems are complex, self-replicating machineries whose major components are proteins. Understanding the dynamics of protein expression in these systems may lead to a better interpretation of the underlying mechanisms and the predictability of potential outcomes. However, the techniques for probing these proteome components are not completely unbiased, i.e., knowledge of each component of the proteome is a prerequisite for probing its expression. These proteomic techniques are largely dependent on mass spectrometry (MS) based shotgun proteomics. Mass spectra, containing mass to charge ratios and intensities for peptides and their fragments, are searched against a database of known proteins to identify the expressed proteins and their quantities (Eng et al. 2011). One of the limitations of this method lies in the database itself, against which the spectral data generated in MS are searched. A protein missing from the database cannot be probed for its expression, despite being present in the sample (Frank et al. 2007). Thus, for comprehensive proteome profiling, the search database should be complete. However, most of these databases are neither complete nor error free (Kumar et al. 2016b). Proteogenomic techniques address this problem by designing custom databases to identify the errors and achieve the completeness of the proteome definition for any organism (Castellana and Bafna 2010; Nesvizhskii 2014). Contrary to the routine proteomic searches, proteogenomic databases include proteins beyond the annotated proteome. Proteins from any organism are generally annotated by computationally predicting protein coding genes in the genome. While largely correct, these predictions also contain several inaccuracies. Proteogenomics relies on the detection of unique peptides from the MS data to correct these inaccuracies and refine the protein annotations on a genome wide scale (Jaffe et al. 2004; Yates et al. 1995).

Although very useful, these approaches are full of conceptual and technical challenges (Castellana and Bafna 2010). The order of complexity of proteogenomic approaches varies for different organisms. For example, for a prokaryotic genome, a six frame translated genome database should represent almost all possible protein coding genomic regions (Armengaud 2013; Kelkar et al. 2011; Kumar et al. 2013, 2014, 2016a). However, in the case of complex eukaryotes it would represent only a fraction of the possible protein species arising from the genome (Tanner et al. 2007). This is primarily due to alternative splicing of transcripts and only a tiny fraction of the eukaryotic genome being protein coding. Instead, proteogenomic databases for eukaryotes generally integrate high-throughput transcriptomic information to discover novel protein isoforms from MS data searches. The high error rate, a byproduct of searching an extremely large database, is one of the major concerns in most of these studies (Krug et al. 2013; Yadav et al. 2013). Another factor contributing to potential false positive identifications is genomic polymorphism between individual genomes and the reference genome. These individual polymorphisms may result in new peptides from known genes, which may be mapped incorrectly to other places in the genome, leading to incorrect assignment of novel translated genomic regions. Additionally, inferring the exact isoform expressed in a given biological state is a difficult task in eukaryotic proteogenomics. Since various proteogenomic studies utilize a translated transcriptome as the search database, which comprises sequences of several transcripts from the same gene, many of the peptide identifications are shared among multiple database entries. Inferring the expressed protein isoform/s from the identified peptide list then becomes a non-trivial exercise and, if incorrect, it may adversely affect the conclusions. In addition to these, proteogenomic approaches are compute resource intensive (Castellana and Bafna 2010). Modern day approaches integrate multiple layers of omics information to discover novel protein isoforms. Each of these omics datasets, for example genomics, transcriptomics, proteomics, etc., is difficult to analyze independently. Further, their integration requires multivariate analyses (Horvatovich et al. 2015; Zhang et al. 2014) and consideration of multiple possible explanations for the observation (Omenn et al. 2015).

The complexity of such an analysis is reflected in several of the recent studies. For example, more than a decade after the human genome was sequenced, the characterization of the human proteome was achieved only recently and only as a draft version (Kim et al. 2014; Wilhelm et al. 2014). Nearly 20 % of the defined human protein coding genes are yet to be characterized at the protein level. Several worldwide initiatives are underway to detect at least one protein product for each of the human protein coding genes (Deutsch et al. 2015; Kumar et al. 2015; Nilsson et al. 2015; Paik et al. 2015). A similarly incomplete proteome scenario exists for other model organisms, like mouse (Brosch et al. 2011), rat (Kumar et al. 2016b; Low et al. 2013), zebrafish (Kelkar et al. 2014), corn (Zea mays) (Castellana et al. 2014), etc. Despite various advances in MS instrumentation and analysis methods, defining the protein coding fraction for any genome remains incomplete. While the dynamics of protein expression is certainly one of the causes, the limited sensitivity of the method to detect low abundant proteins remains an open challenge and a primary cause of not detecting many proteins. Complexity of data analysis is another bottleneck in the detection of many proteins. Proteogenomic analyses directly address this point but are yet to be adopted in mainstream proteomic practice. Several of the recent tools and software packages that have been developed for use in proteogenomic analyses should make it an easy to implement approach and should expand its applications. Here, we describe various analysis tools and pipelines targeted at eukaryotic proteogenomic analyses.

1.2 Basics of Proteogenomics

Proteomics allows probing the expression of proteins from biological samples in a high-throughput manner (Steen and Mann 2004). Peptides are identified from mass spectra by searching against a protein sequence database using a search engine (Geer et al. 2004; Yadav et al. 2011) and identified peptides are mapped back to protein sequences to infer the expressed proteins (Eng et al. 2011). Proteogenomic approaches integrate these large-scale peptide discoveries with genomics and transcriptomics data to refine or enrich the annotation of protein coding genes (Armengaud 2009). Novel peptides, identified from proteogenomics, may reveal translation at the intergenic, intronic or annotated untranslated regions (UTRs), which may facilitate discovery of new genes, exons, splice variants and mutated proteins. However, such an analysis would require creation of custom search databases that maximize the representation of such novel proteoforms (isoforms of proteins). Figure 1.1 highlights various possible custom database approaches and associated potential discoveries. Recently, various software tools and pipelines have been developed which either create a custom database or provide an end to end solution for proteogenomic data analysis and conclusions. The most significant contribution of these software solutions is to expand the outreach of such approaches to a larger scientific community, in addition to reducing the technical complexity and potential errors.

1.3 Proteogenomics Software Tools and Pipelines

A typical proteogenomic analysis includes custom database creation, peptide identification, genomic mapping of identified peptides and inferring the corrected or new gene model. Several of the recently developed tools offer only a part of the proteogenomic analysis, whereas a few pipelines offer a complete proteogenomic workflow implementation. For example:

– CustomProDB (Wang and Zhang 2013), an R package that allows for the creation of custom proteogenomic databases by incorporating single nucleotide polymorphism information from a common variant call format (VCF) file or from RNA-Seq data
– SpliceDB (Burset et al. 2001) allows creation of a highly sensitive yet compact splice graph database in FASTA format which can be searched by any of the peptide identification tools
– MSProGene (Zickmann and Renard 2015) is another standalone application that allows creation of a sample specific search database from RNA-Seq data with network information of peptide sharing among the database entries

Fig. 1.1 Proteogenomic databases and refinement of genome annotations. ORF Open Reading Frame, CDS coding DNA sequences, TIS translation initiation site, UTR untranslated regions (Annotated). Blue rectangles for CDS and peptides correspond to genes on the positive strand, whereas red ones correspond to genes on the negative strand
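
As a rough illustration of the six frame translated genome database mentioned in the Introduction, and of the ORF-based custom databases summarized in Fig. 1.1, the following Python sketch translates a DNA sequence in all six reading frames and writes the stop-to-stop segments as FASTA entries. It is not taken from any of the tools listed in this section. The function names, the `min_len` cutoff and the FASTA header layout are assumptions made for the example, and the standard genetic code is assumed throughout.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_orfs(genome, min_len=7):
    """Yield (strand, frame, peptide) for every stop-to-stop segment.

    min_len is a hypothetical cutoff (in amino acids) used here only to
    keep very short segments out of the search database.
    """
    genome = genome.upper()
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            for segment in translate(seq[frame:]).split("*"):
                if len(segment) >= min_len:
                    yield strand, frame, segment


def to_fasta(orfs, prefix="sixframe"):
    """Format (strand, frame, peptide) tuples as FASTA text."""
    lines = []
    for n, (strand, frame, peptide) in enumerate(orfs, start=1):
        lines.append(f">{prefix}_{n} strand={strand} frame={frame}")
        lines.append(peptide)
    return "\n".join(lines)


if __name__ == "__main__":
    toy_genome = "ATGGCTTGGTAA"  # toy sequence, not real data
    print(to_fasta(six_frame_orfs(toy_genome, min_len=3)))
```

Even in this toy form the sketch shows why the approach scales poorly for eukaryotes: every position of the genome contributes to six candidate translations, most of which are never expressed, and peptides that span splice junctions are not represented at all.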

– The Proteogenomic Mapping Tool (Sanders et al. 2011) allows mapping of peptides back to the genome in a quick and effective manner
– SpliceVista (Zhu et al. 2014) is a Python package that maps identified peptides on all of the known splice-variants of proteins. It also allows integrated visualization of proteomics data with transcript information
– dasHPPboard (Tabas-Madrid et al. 2015) is a HUPO endorsed data integration platform which permits analysis and visualization of multiple omics datasets including proteomics
– VESPA (Peterson et al. 2012) is a Java based application that enables integrated visualization of transcriptomic and proteomics datasets in a proteogenomic context
– iPiG (Kuhring and Renard 2012) allows integration of peptide identifications into a genome browser and thus enables concurrent analysis of multiple omics information
– PGx (Askenazi et al. 2015), a recent tool, converts peptide identifications into Browser Extensible Data (BED) format, which contains genomic co-ordinates of features and can be visualized in genome browsers like UCSC
– Among the earliest proteogenomic pipelines, Genome Annotating Proteomic Pipeline (GAPP) (Shadforth et al. 2006) was designed specifically for the human genome. This web based application improved the annotation for various genes by analyzing publicly available proteomics data. However, this pipeline is no longer active for use
– PepLine (Ferro et al. 2008) is standalone software for genome annotation which is independent of the database search method. It rather relies on a hybrid tag based search to identify peptide tags and then maps and clusters these tags back to the genome to discover potential translated regions. Due to the suspected low sensitivity and high error rates of the tag based peptide detection and genome mapping approach, it has only seen limited application in proteogenomics research
– Peppy (Risk et al. 2013) is one of the earliest developed pipelines for proteogenomic analysis. It is a fast and automated framework for quickly searching MS data against the extremely large eukaryotic genome translated databases to discover novel translated regions. Use of advanced computational methods in this tool makes proteogenomic searches implementable on a simple desktop even for higher eukaryotic genomes, which generally necessitate higher memory and compute infrastructure. Additionally, it allows a blind modification search to account for novel post translational modifications which otherwise are very difficult to detect by regular proteomics searches. Despite these positive features, Peppy has limited application in eukaryotic analyses, as a large fraction of novel proteins in eukaryotes originate from alternate splicing of transcripts, which cannot be represented in a genome translated search database as implemented in this pipeline
– Enosi (Castellana et al. 2014) proteogenomic pipeline is comprised of two functionalities. First, the SpliceDB tool (Burset et al. 2001) is used to create a comprehensive yet compact database of splice junctions from RNA-Seq reads. This FASTA formatted splice graph database is then searched with MS data using the MS-GF+ search engine (Kim and Pevzner 2014), which is a sensitive tool to detect more peptides. To evaluate novel proteogenomic events including splice junctions, Enosi utilizes a probabilistic scoring which takes into account the number of spectra and peptides assigned to the locus, the quality of the assigned peptide spectral matches and the shared mapping of the peptide. The eventProb probabilistic score allows Enosi to rank and filter the proteogenomic findings according to their confidence. Further, the framework can utilize ab initio gene predictions and RNA-Seq information to estimate the boundaries of alternate gene models which accommodate the identified novel peptides. Additionally, the Enosi pipeline is fully automated software and utilizes multi-threading to speed up the MS data searches
– PGTools (Nagaraj et al. 2015) is an end to end solution which seamlessly integrates multiple components of proteogenomic analysis. It is an open source software suite which offers fully automated searches along with the meta-analysis and visualization of novel findings. It allows searches against multiple custom databases, e.g., databases containing translated entries from transcripts, non-coding genes, UTRs, six frame translated genome, splice junctions and somatic variations. By enabling searches against cancer specific variations from the COSMIC database and fusion proteins, PGTools also allows human cancer specific proteogenomic studies. Further, its multiple search engine approach adds sensitivity to the overall peptide detection process. However, due to differences in peptide detection confidence inherent to variable database sizes, result integration from these different databases presents new challenges. Additionally, the approach lacks the strength of individual or tissue specific proteogenomic searches such as those based on RNA-Seq data
– ProteoAnnotator (Ghali et al. 2014) is a recent, open source and powerful pipeline for proteogenomic discoveries from MS datasets. It addresses one of the common problems of proteomics and proteogenomics research: file format standards. The entire pipeline supports and exports Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) supported file formats like mzIdentML. ProteoAnnotator also allows multiple database searches but primarily relies on gene predictions. Searching MS data against gene predictions is an excellent approach for a newly sequenced genome, primarily due to the increased sensitivity of peptide detection attributable to a small search database compared to genomic or transcriptomic databases. The pipeline also introduces a “non-canonical gene model score” calculation which assigns confidence values to novel discoveries and thus allows automated assessment of the quality of novel findings. In addition to these new features, it also presents an automated framework which integrates multiple peptide search engines and a comprehensive statistical algorithm, FDRscore, for result integration. Although it is very effective for proteogenomically annotating new genomes, individual or sample based database searches are difficult to implement in this framework
– Integrated transcriptomic-proteomic pipeline (ITP) (Kumar et al. 2016b) is a recently published pipeline and comprises two analysis modules, one each for transcriptomics and proteomics data. The transcriptomic analysis module uses the Tuxedo suite of tools to align and assemble RNA-Seq reads into transcripts by utilizing the reference genome. The second module creates a translated transcriptome database from the assembled transcripts and then searches mass spectra against this database using multiple search engines. Although the pipeline lacks an entirely automated structure for public use, the approach has several advantages. For example, using a reference genome guided transcriptome assembly provides a definitive transcript model for the discovered novel peptides and thus proper reannotation of exon boundaries and coding splice variants is possible. Similarly, quantities of transcript isoforms may indicate the most probable protein coding isoform despite extensive peptide sharing among isoforms. It also allows creation of tissue or individual specific search databases specifically useful in clinical studies. In addition to these, multiple search engines and FDRscore (Jones et al. 2009; Kumar et al. 2013) based result integration within the second module, EuGenoSuite, maximize both the sensitivity and specificity of peptide detection. Identified peptides are also exported into gene transfer format (GTF), which can be easily integrated into most of the genome browsers, thus enabling easy visualization of novel regions
– PPLine (Krasnov et al. 2015) is a Python language based automated proteogenomic pipeline which integrates proteomics with exome sequencing and transcriptome sequencing technologies. Its major focus is to discover variant novel peptides resulting from single nucleotide polymorphisms (SNPs), insertions-deletions in the genomic DNA and alternative splicing. It integrates several tools to accurately call SNPs from exome sequencing reads, align RNA-Seq reads, assemble transcripts including splice junction isoforms from reads and then allows proteomics data searches against a variant peptide database. This comprehensive software enables sample/tissue specific database creation and thus facilitates clinical proteogenomic analysis
– GALAXY-P (Jagtap et al. 2014) is among the few web-based frameworks for proteogenomics. Despite its web based implementation, it allows extensive analysis for eukaryotic genomes with flexibility at every step of analysis. It extends the Galaxy bioinformatics framework for proteomics data analysis and allows the user to create custom integrative analysis workflows. Default workflows within Galaxy-P allow MS data format conversion, creation of proteogenomic databases from various web resources, two step database search and statistical assessment of identified peptides, sequence similarity searches of novel findings, evaluation of peptide-spectral matches by visualization and comprehensive genomic visualization of novel peptides. The Galaxy framework allows smooth integration of various genomics and transcriptomics data analyses and, with the Galaxy-P development, integration of proteomics with other omics datasets becomes easy to implement. For example, Sheynkman et al. (2014) developed three analysis workflows which enable proteomics data searching within the Galaxy-P framework against single amino acid polymorphism (SAP) and splice variant databases developed from RNA-Seq data
– QUILTS (Zhang et al. 2009) is software to create individual specific human proteogenomic search databases by integrating SAP variations, splice variants and gene fusions into canonical protein sequences. Individual specific genomic and transcriptomic variations have been attributed to different diseases, primarily cancers, and thus it should allow clinical proteogenomic studies focused on detecting disease specific variants. However, it is limited to human only and does not allow similar analysis for other model organisms used to study human diseases.

With so many alternatives, one compelling question still remains: Which one is the best? Although there have not been many studies which compare the various pipelines available for eukaryotic proteogenomics, our recent study suggests that many of these are actually complementary in their results (Kumar et al. 2016b). We concluded that, due to differences in their search database compositions, ITP, Enosi, ProteoAnnotator and Peppy bring complementary peptide detections. Although there are many technical challenges to running multiple proteogenomic pipelines on a large scale proteomic dataset, the strategy would help achieve a comprehensive catalogue of novel translation events across the genome.

1.4 Future Perspectives

Although these tools have reduced the technical complexity of proteogenomic searches, quality assessment of novel discoveries still remains a formidable challenge. Many studies indicate the necessity of manual inspection of identified peptide spectrum matches to ascertain true identifications (Omenn et al. 2015). However, it is not feasible to implement manual inspection on large scale studies. Tools like Enosi and ProteoAnnotator devised automated scoring systems to evaluate the novel identifications separately for their authenticity, but a comprehensive statistical framework dedicated to large scale proteogenomic studies is still needed. For example, both of the studies claiming to achieve a draft human proteome map have been heavily criticized for their high number of “low quality” identifications, adding up to false positives (Ezkurdia et al. 2014). A few approaches have been suggested to overcome these hurdles (Shanmugam and Nesvizhskii 2015; Zhang et al. 2015). However, these are yet to be implemented in automated pipelines. Other than statistical attributes, false positives may also arise due to incorrect genomic mapping of identified peptides. The genome of an individual can vary con- [...] consideration while evaluating a novel translated region.

Integration of other omics readouts in proteogenomic frameworks could also be extremely beneficial. In particular, using ribosome bound RNAs (ribosome profiling), rather than the entire transcriptome, to create a custom search database would allow for a better profiling of translated proteins and thus a better genome annotation. The recently developed PROTEOFORMER (Crappe et al. 2014) pipeline integrates ribosome profiling with MS based proteomics and proteogenomics analysis and could be extremely useful in eukaryotic genome annotations. However, a similar integration in other existing pipelines would expand the reach of such methods. These pipelines also need to include provisions for unsequenced genomes. Custom de novo assembled transcriptomes may provide templates for proteome profiling from MS data (Brinkman et al. 2015). Proteogenomic pipelines need to be extended to include genome independent database creation, to facilitate similar analysis for unsequenced or partially sequenced genomes.

Proteogenomic analyses hold promise for human disease related studies as well. Recent studies suggest the potential of proteogenomics in discovering novel candidates in different cancers (Alfaro et al. 2014; Rivers et al. 2014; Woo et al. 2014; Zhang et al. 2014). However, most of the existing pipelines do not consider disease related genetic components. Extending these analysis frameworks would not only benefit new studies but would also assist in revisiting previous datasets for proteogenomic reanalysis.

Acknowledgements Authors would like to thank CSIR-IGIB for compute infrastructure and project BSC0121 for publication charges.

References
Alfaro, J. A., Sinha, A., Kislinger, T., & Boutros, P. C.
siderably from the reference genomes at various (2014). Onco-proteogenomics: Cancer proteomics
places, characterized by genomic variations like joins forces with genomics. Nature Methods, 11(11),
SNPs, insertions and deletions. If these are not 1107–1113. Available from: PM:25357240.
Armengaud, J. (2009). A perfect genome annotation is
taken into account, many of the peptide identifi-
within reach with the proteomics and genomics alli-
cations may be incorrectly assigned to novel loci. ance. Current Opinion in Microbiology, 12(3), 292–
Proteogenomic pipelines need to include this 300. Available from: PM:19410500.
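
The concern raised in Sect. 1.4, that peptides carrying known SNPs may be wrongly assigned to novel loci, can be illustrated with a minimal sketch. It is not taken from any of the pipelines named above. The function names, the substitution record format and the toy sequences are all assumptions made for illustration, and only single amino acid substitutions are handled (insertions and deletions are left out).

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified variant record:
# (1-based position, reference residue, alternate residue)
Substitution = Tuple[int, str, str]


def variant_sequences(protein: str, substitutions: List[Substitution]) -> List[str]:
    """Return one sequence per known substitution, each carrying a single change."""
    sequences = []
    for position, ref, alt in substitutions:
        if protein[position - 1] != ref:
            raise ValueError(f"reference residue mismatch at position {position}")
        sequences.append(protein[: position - 1] + alt + protein[position:])
    return sequences


def classify_peptide(
    peptide: str,
    reference: Dict[str, str],
    known_substitutions: Dict[str, List[Substitution]],
) -> str:
    """Label a peptide as 'known', 'variant' or 'candidate_novel'."""
    # Step 1: an exact match to any reference protein is a known peptide.
    for sequence in reference.values():
        if peptide in sequence:
            return "known"
    # Step 2: a match to a reference protein carrying a known substitution
    # is a variant peptide and should not be reported as a novel locus.
    for name, sequence in reference.items():
        for variant in variant_sequences(sequence, known_substitutions.get(name, [])):
            if peptide in variant:
                return "variant"
    # Step 3: only what remains is a candidate for a novel translated region.
    return "candidate_novel"


# Toy data, made up for this example.
REFERENCE = {"protA": "MKTAYIAKQRQISFVK"}
KNOWN = {"protA": [(5, "Y", "H")]}  # Y5H substitution in protA

if __name__ == "__main__":
    for pep in ("TAYIAK", "TAHIAK", "GGGGGG"):
        print(pep, classify_peptide(pep, REFERENCE, KNOWN))
```

Checking the variant forms before calling a peptide novel keeps the evaluation of novel regions limited to peptides that neither the reference nor the known variation can explain; a real pipeline would also need to consider insertions, deletions and isobaric residues.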
