Samaras_2019
Received September 14, 2019; Revised October 11, 2019; Editorial Decision October 11, 2019; Accepted October 15, 2019
ABSTRACT

ProteomicsDB (https://www.ProteomicsDB.org) started as a protein-centric in-memory database for the exploration of large collections of quantitative mass spectrometry-based proteomics data. The data types and contents grew over time to include RNA-Seq expression data, drug-target interactions and cell line viability data. In this manuscript, we summarize new developments since the previous update that was published in Nucleic Acids Research in 2017. Over the past two years, we have enriched the data content by additional datasets and extended the platform to support protein turnover data. Another important new addition is that ProteomicsDB now supports the storage and visualization of data collected from other organisms, exemplified by Arabidopsis thaliana. Due to the generic design of ProteomicsDB, all analytical features available for the original human resource seamlessly transfer to other organisms. Furthermore, we introduce a new service in ProteomicsDB which allows users to upload their own expression datasets and analyze them alongside data stored in ProteomicsDB. Initially, users will be able to make use of this feature in the interactive heat map functionality as well as the drug sensitivity prediction, but ultimately will be able to use all analytical features of ProteomicsDB in this way.

INTRODUCTION

ProteomicsDB (https://www.ProteomicsDB.org) is an in-memory database initially developed for the exploration of large quantities of quantitative human mass spectrometry-based proteomics data including the first draft of the human proteome (1). Among many features, it allows the real-time exploration and retrieval of protein abundance values across different tissues, cell lines, and body fluids via interactive expression heat maps and body maps. Today, ProteomicsDB supports multiple use cases across different disciplines and covering a wide range of data (2). For instance, tandem mass spectra, peptide identifications and peptide proteotypicity values can be used as starting points to develop targeted mass spectrometry assays. Because of the recent incorporation of a large number of reference spectra from the ProteomeTools project (3,4) as well as spectra predicted by the artificial intelligence Prosit (5), both experimental and reference spectra can be used for assay development and to validate the identification of so far unobserved, or in fact any proteins. The integration of phenotypic data allows the exploration of the dose-dependent effect of drugs of interest (e.g. clinically approved drugs) on multiple cell lines (6–9). The dynamic identifier mapping in ProteomicsDB allows the integration of transcriptomics data from e.g. the Human Protein Atlas project (10) and Bgee (11), and thus facilitates the automated integration of different data sources within ProteomicsDB. This, in turn, allows the development of new tools. A wide range of drug-target interaction data can be visualized in ProteomicsDB as well, which enables the exploration of combination treatments in a dose-dependent protein-drug interaction graph in-silico.
* To whom correspondence should be addressed. Tel: +49 8161 71 4202; Fax: +49 8161 71 5931; Email: [email protected]
Correspondence may also be addressed to Bernhard Kuster. Email: [email protected]
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]
D1154 Nucleic Acids Research, 2020, Vol. 48, Database issue
ProteomicsDB is becoming an increasingly valuable resource in (proteomic) life science research, evidenced by the increasing number of external resources linking to ProteomicsDB, such as UniProt (12) and GeneCards (13), as well as resources making use of our application programming interface (API) to show e.g. protein expression information, as done by OmniPathDB (14) and Gene Info eXtension (GIX) (15).

In this version, we expanded the data content of ProteomicsDB by including additional publicly available as well as in-house generated proteomic and transcriptomic studies. Furthermore, we expanded the drug-target interaction data, now covering ∼1500 kinase inhibitors and tool

Prosit. The right section includes links that trigger the selection of the corresponding organism. To make organism selection available throughout the web interface, we additionally adjusted the left sidebar to show one icon per available organism. The ‘Feedback’ button that was previously located in that position was transferred to the right pane below the ‘Help’ button. In light of these changes, all internal procedures and endpoints (e.g. API) were adjusted to support the new data types and organisms.

Figure 2B depicts the data expansion in ProteomicsDB since 2017, grouped by categories. By re-analyzing and uploading more publicly available proteomics studies, we increased the tissue coverage of ProteomicsDB by ∼70 hu-
[Figure 1 graphic. Panel labels: External resources (OmniPathDB, GIX): cross-referencing, data reusability, machine learning. Data content: >10^6 dose-responses; >10^7 drug sensitivity measurements; >10^7 protein expression measurements; >10^8 transcript expression measurements; >10^8 reference spectra; >10^8 peptide identifications; >10^8 experimental spectra; >10^10 fragment ions. System information: main memory 125GB; CPU cores 64; local network 16 GBit. Data layers.]
Figure 1. The architecture of ProteomicsDB. The production unit hosts the SAP HANA in-memory database management system which involves three
of the presented layers: the data layers, data content and the calculation layers. Parts of the calculation layers are shared between the production unit and
the compute node, such as the clustering and correlation procedures for the interactive expression heat map which are calculated by the Rserver. Part
of the data content is stored in the network storage unit, so that data are always available throughout the network if needed. The entire infrastructure is
intra-connected via a 16 Gbit bandwidth local network that enables rapid communication and data transfer between units.
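As a rough illustration of the kind of correlation step the caption attributes to the Rserver, the following Python sketch computes pairwise Pearson correlations between protein expression profiles and turns them into distances that a clustering routine could consume. The protein names, the example values and the function names are assumptions made for illustration; the actual procedures run in R inside ProteomicsDB and are not reproduced here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile has no defined correlation")
    return sxy / sqrt(sxx * syy)

def correlation_distances(profiles):
    """Return {(a, b): 1 - r} for every unordered pair of profiles."""
    names = sorted(profiles)
    return {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Hypothetical log10 expression values across four tissues.
profiles = {
    "EGFR": [5.1, 6.0, 4.2, 5.5],
    "SRC": [5.0, 6.1, 4.0, 5.6],
    "KRT9": [7.0, 3.2, 6.8, 3.0],
}
distances = correlation_distances(profiles)
closest_pair = min(distances, key=distances.get)
```

A distance of `1 - r` places co-regulated proteins close together, which is one common way to order the rows of an expression heat map; other distance definitions would work equally well in this sketch.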
in ProteomicsDB (e.g. Inhibitor potency/selectivity analysis) can be used to discover new lead compounds for medicinal chemistry programs targeting a specific kinase of interest (20,21). The dose-response curves can be explored in the ‘Biochemical assay’ tab of the protein details view. This view allows users to filter the data by different properties, so that only compounds that fit the desired criteria will be displayed. For all curves, full experimental designs are stored for the users to browse and explore. For dose-response curves that belong to studies that are not published yet, the curve information is available but the experimental design, although fully imported, will only be shown when these studies are published. Second, the meltome data of ProteomicsDB was enriched with another study that covers the protein melting properties for many organisms (unpublished). Therefore, users can more thoroughly study the effect of temperature on selected proteins. We now cover the melting properties of ∼13 000 human proteins. ProteomicsDB thus provides an extensive resource and data-driven guidance on which temperature range should be used for e.g. a thermal shift assay or which temperature would be suitable for an isothermal dose response assay (ITDR). Third, we introduced a new assay type in the ‘Biochemical Assay’ tab which covers data from protein turnover measurements (synthesis and degradation). Users can obtain the half-life time of proteins of interest to assess their stability (22). This data can support the analysis of the mode of action of drugs (23) and might provide additional avenues
[Figure 2 graphic. (A) Front-page screenshot with navigation (Home, Proteins, Peptides, Chromosomes, Analytics, API, Projects, FAQ, About us, News), ‘Tools’ and ‘Organisms’ sections, and a status panel: human proteome coverage 79%; proteins 15,479 of 19,628; isoforms 11,061 of 86,726; unique peptides (isoform) 242,803; unique peptides (gene) 838,376; experimental spectra 98,045,108; synthetic reference spectra 5,317,466; predicted reference spectra 14,980,911; tissues 296; quantitative data points 39,533,195; projects 80; experiments 706. (B) Bar plot; y-axis: Increase in Data Points (%); legend: Year (2017, 2019). (C, D) Venn diagrams with sets Proteomics, Transcriptomics and Biochemical Assays (C) or Cell Viability Assays (D).]
Figure 2. Additions to ProteomicsDB. (A) The front page of ProteomicsDB has been adjusted to host new organisms as well as provide information about
the quantity of the different data types that are stored in the database. (B) Barplot depicting the proportion and absolute number of data points added
to ProteomicsDB (in blue) since the previous update manuscript in 2017 (green). (C) Venn diagram showing the number and overlap of genes for which
proteomics, transcriptomics or biochemical assay data is available in ProteomicsDB. (D) Venn diagram showing the number and overlap of tissues (as well
as cell lines and body fluids) for which the respective data types are available in ProteomicsDB.
into understanding the effectiveness of drugs in light of the stability of on- or off-target proteins (18). In total, ∼20 000 proteins (including isoforms) are covered by at least one and ∼3000 by all three biochemical assay types, providing potentially valuable insight into additional aspects of a protein’s life cycle. As ProteomicsDB visualizes every curve (accessible via the ‘Biochemical assay’ tab in the ‘Protein Details’ view), users can assess the quality of each individual curve and underlying data points themselves.

Upload and online analysis of user expression data

Uploading expression profiles. ProteomicsDB’s ability to interconnect and cross-reference data from various sources is one of its core features. However, this was so far only possible for data already stored in ProteomicsDB, limiting its usefulness for the interpretation of data acquired in a user laboratory. In order to fill this gap, we implemented a new feature called ‘Custom User Data Upload’ (Figure 4). Here, users can temporarily upload their expression profiles and optionally normalize them to the data stored in ProteomicsDB. On upload of a dataset, a temporary session is created in the database which can be accessed by a unique session ID. This session will automatically expire after 14 days, which will result in the permanent and not recoverable deletion of all corresponding data unless the user chooses to extend this period. Users can save and use their session ID to load their session to any other computer or browser. Data stored in such sessions are available via ODATA (https://www.odata.org) services within ProteomicsDB and will ultimately allow the integration into any existing analytical pipeline.

The first use case we highlight is the comparison of custom expression data to expression data stored in Pro-
[Figure 3 graphic. Pie chart (one slice labelled 59%) with an inner Venn diagram (region counts as extracted: 467; 2,086; 26; 2,814; 12,613; 139; 2,015). Example panels: Kinase inhibitors (Curves: ~930,000; Proteins: ~5,300; example curve with EC50: 352 nM; x-axis: Concentration [nM]; y-axis: Relative intensity) and Protein Turnover (Curves: ~30,000; Proteins: ~5,000; series: Synthesis, Degradation, Replicates 1–4; x-axis: Time [h]; y-axis: Relative intensity).]
Figure 3. New biochemical assay data. The pie chart on the left shows the distribution of biochemical assay data available for three different applications.
The Venn diagram inside the pie chart shows the overlap of proteins for which biochemical assay data of the respective type is available. The diagrams on
the right show exemplary fitted curves for each biochemical assay type, accompanied by the number of curves and proteins that each assay covers.
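To make the curve panels described in this caption more concrete, here is a minimal Python sketch of a four-parameter log-logistic model, a common choice for dose-response data, together with the half-life of a first-order decay. The source does not state which curve models ProteomicsDB fits, so both functional forms, all function and parameter names, and the example rate constant are assumptions; only the EC50 of 352 nM is taken from the example curve in Figure 3.

```python
import math

def four_param_logistic(conc_nm, top, bottom, ec50_nm, slope):
    """Relative response at a given concentration (assumed 4PL model)."""
    if conc_nm <= 0 or ec50_nm <= 0:
        raise ValueError("concentrations must be positive")
    return bottom + (top - bottom) / (1.0 + (conc_nm / ec50_nm) ** slope)

def half_life(rate_constant_per_h):
    """Half-life of a first-order process, t1/2 = ln(2) / k."""
    if rate_constant_per_h <= 0:
        raise ValueError("rate constant must be positive")
    return math.log(2) / rate_constant_per_h

# EC50 from the example curve in Figure 3; other parameters are illustrative.
response_at_ec50 = four_param_logistic(352.0, top=1.0, bottom=0.0,
                                       ec50_nm=352.0, slope=1.0)
# Hypothetical degradation rate constant of 0.05 per hour.
t_half_h = half_life(0.05)
```

By construction the response at the EC50 lies halfway between `top` and `bottom`, which is what makes the EC50 a convenient single-number summary of inhibitor potency.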
Figure 4. Custom data analysis area of ProteomicsDB. The ‘Custom Data Upload’ tab enables users to upload their own expression datasets temporarily
to ProteomicsDB. The datasets are session-specific so that no other user has access to this uploaded data.
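The upload format described in this section (gene name, tissue or cell line, sample name, log10 expression value with quantification and calculation method, and taxonomy code) could be checked before upload with a sketch like the following. The column names, their order, the example values and the helper names are hypothetical, since the text defers the exact format to the ‘Help’ page; only the five kinds of information and the three-sample minimum for normalization come from the text.

```python
import csv
import io

# Hypothetical column names; the real header is defined on the 'Help' page.
REQUIRED = ["gene_name", "tissue", "sample", "log10_expression",
            "quantification_method", "calculation_method", "taxcode"]

def read_upload(text):
    """Parse an upload CSV and return its rows, raising on malformed input."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if any(not (row[c] or "").strip() for c in REQUIRED):
            raise ValueError(f"line {line_no}: empty required field")
        row["log10_expression"] = float(row["log10_expression"])
        row["taxcode"] = int(row["taxcode"])
        rows.append(row)
    return rows

def can_normalize(rows):
    """Normalization needs at least three distinct samples (per the text)."""
    samples = {(r["tissue"], r["sample"]) for r in rows}
    return len(samples) >= 3

# Illustrative content only; 9606 is the NCBI taxonomy ID for human.
example = """gene_name,tissue,sample,log10_expression,quantification_method,calculation_method,taxcode
EGFR,lung,s1,5.2,MS1,iBAQ,9606
EGFR,lung,s2,5.0,MS1,iBAQ,9606
EGFR,liver,s1,4.1,MS1,iBAQ,9606
"""
rows = read_upload(example)
```

Counting distinct (tissue, sample) pairs mirrors the remark that rows sharing both names are aggregated, so duplicates do not count towards the three-sample minimum.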
teomicsDB. For this to be successful, we highly recommend making use of the normalization feature available upon upload. The uploaded expression profiles are normalized via MComBat (24) using the total sum normalized proteomics expression values of ProteomicsDB as a reference set. Because MComBat normalization depends on the calculation of a mean and variance for any given protein, only datasets with three or more samples can be normalized using this method. Every uploaded dataset has to adhere to a pre-defined comma-separated format (.csv files) where each row must provide the following information. (i) A gene name (HGNC symbol) as the identifier, which will help us associate the uploaded proteins to the ones stored in ProteomicsDB and enable cross-dataset comparisons. (ii) A tissue or cell line name representing the origin of the measured sample, which will be used for visualizations. (iii) A sample name, which is important to separate samples with the same tissue of origin especially for the normalization step, as samples with the same sample and tissue/cell line name will be automatically aggregated as there is no way to separate them. (iv) The expression value of the corresponding protein in the sample in log10 scale, accompanied by the quantification and calculation method that was used, which will help with further comparisons of matching in-ProteomicsDB data. (v) The taxonomy code of each sample, which will allow dataset separation based on the selected organism, a feature which is discussed below. Detailed documentation on how to use this functionality as well as on
the data upload format can be found by clicking the ‘Help’ button that accompanies every view in ProteomicsDB (Figure 4).

Use of analytical tools on uploaded datasets. By uploading an expression dataset, back-end procedures take care of the data modelling and transformation, so that they are compatible with existing tools with no major differences to the data available in ProteomicsDB. The first tool making use of this is the interactive expression heat map. The heat map allows interactive visualization of expression patterns of multiple groups of proteins. Upon upload, users can choose a data source and focus their analysis on either

ing a reference dataset so that all other datasets will be normalized based on the reference. Transcriptomics data are then transferred to the same scale of the proteomics expression data. Previous experiments showed that the correlation across all tissues between mRNA and protein expression data is higher with than without such an adjustment (27). Finally, we implemented the mRNA-guided missing value imputation method, described in (27). For this purpose, we train linear regression models and extrapolate protein abundance from transcriptomics abundance. To validate the performance of the generated models, we created artificial missing values in a random subset of the protein expression data that are stored in ProteomicsDB. We then
tivity as a function of quantitative protein expression profiles. This functionality can be used in the ‘Drug Sensitivity Prediction’ view (Figure 6). Here, users can select from a variety of tissues and cell lines whose proteomic profiles are stored in ProteomicsDB. Next, a drug or compound can be selected to check for its effect on the selected cell line (Figure 6A). Figure 6B shows the result of the prediction as bar plots, one for each predicted feature (area under the curve, pEC50, relative effect). Error bars show the range of the predictions of all bootstraps of the corresponding model. Each drug in ProteomicsDB might be accompanied by multiple models (multiple bars in each bar plot), because the drug may have been used in more than one drug sensitivity screen which was imported into ProteomicsDB (max. 4). It is important to point out that each model includes a certain set of predictor-proteins. If the sample on which a user wants to predict drug sensitivity does not contain some of the required proteins, prediction from some models is not possible. Selecting a bar of any bar plot generates a volcano plot (Figure 6C), which shows information for the interpretation of the trained model. The x-axis shows how strongly the expression of a particular protein is associated with drug sensitivity or resistance, analogous to a correlation. The y-axis shows the number of bootstrap models that contained the particular protein as a predictor when training the elastic net model. Proteins that appear in the top left and right areas of the volcano plot (Figure 6C) are frequently selected by the models as predictors, as they have a high positive or negative correlation with drug sensitivity or resistance and can, therefore, represent potential biomarkers. Instead of predicting drug sensitivity on tissues or cell lines from ProteomicsDB, users also have the option to use this functionality on their own datasets, uploaded using the ‘Custom User Data Upload’ tab. Predictions can be applied to all user datasets, although it is highly recommended to use normalization upon uploading, as the models were trained on data stored in ProteomicsDB and expect values from the same or similar expression distributions.

Real-time analytics and visualization for any organism

ProteomicsDB was initially developed for the exploration of the human proteome. As a result, every database view and endpoint was designed without explicit support for multiple organisms. In order to support the storage, handling and visualization of data from multiple organisms, all layers of ProteomicsDB (Figure 1) required modifications and extensive testing. In the new version presented here, we modified all backend procedures to support querying of data for a specific taxonomy. The API endpoints were modified to require a taxcode in order to respond with the desired data. With this functionality in place, we prepared the database and the data models to support and handle the protein sequence space of any organism. Similarly, the user interface was modified to support the visualization of data from a selected organism. Users can change the selected organisms by using the respective icons on the left hand side of each view, or directly on the front page of ProteomicsDB (Figure 2A). For the protein expression visualization, new interactive body maps for Arabidopsis thaliana and Mus musculus were generated (Figure 7A, Supplementary Figure S2) and function in the same way as the human body map.

To bring Arabidopsis thaliana into ProteomicsDB, we downloaded, processed and imported the protein sequence space from UniProt, following the same mechanism as for human proteins. Upon import, appropriate decoy sequences were created for every protease, to allow false discovery rate (FDR) estimation by the picked FDR approach already implemented in ProteomicsDB (31). We furthermore imported the Plant Ontology (PO) (32) to be able to make use of ontologies for the different plant tissues. This step was not necessary for Mus musculus, since the
[Figure 6 graphic. (A) Expression Dataset Selection panel: Data Source: User Data; OmicsType: Proteomics; User Dataset Selection: user_cell_lines; Tissue Selection: OVCAR3_NCI60; Drug Selection: PANOBINOSTAT; button ‘Predict Sensitivity’. (B) Predictions: bar plots of AUC, -log10(EC50) and Relative Effect. (C) Volcano plot, labelled -log10(EC50).]
Figure 6. Drug sensitivity prediction. (A) Prediction is enabled for both data stored in ProteomicsDB and user-uploaded datasets. (B) This view visualizes the predicted sensitivity of a chosen cell line to a chosen drug expressed by area under the curve (AUC, left bar), the negative log of the effective concentration of the drug (EC50, middle bars) and the relative (cell killing) effect (right bars). If more than one bar is shown, more than one training data set was available for the particular drug and either one or several predictions are shown. (C) Each dot in the volcano plot represents a protein that is associated with drug sensitivity or resistance on the basis of the elastic net model generated during training.
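The volcano plot described here can be thought of as an aggregation over bootstrap models. The sketch below assumes each bootstrap model is available as a mapping from protein name to coefficient, and derives for every protein the number of models that selected it (the y-axis in the text) and its mean coefficient as a stand-in for the association strength (the x-axis). The data structure, the use of the mean coefficient, the function names and the example values are assumptions; the source does not specify how the x-axis statistic is computed.

```python
from collections import defaultdict

def volcano_points(bootstrap_models):
    """Return {protein: (mean_coefficient, n_models_selecting_it)}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for coefficients in bootstrap_models:
        for protein, coef in coefficients.items():
            if coef != 0.0:  # elastic net drops predictors by zeroing them
                sums[protein] += coef
                counts[protein] += 1
    return {p: (sums[p] / counts[p], counts[p]) for p in counts}

def can_predict(model_predictors, sample_proteins):
    """A model is usable only if the sample covers all its predictors."""
    return set(model_predictors) <= set(sample_proteins)

# Hypothetical coefficients from three bootstrap fits.
models = [
    {"EGFR": 0.5, "SRC": -0.25},
    {"EGFR": 0.25, "KRT9": 0.0},
    {"EGFR": 0.75, "SRC": -0.75},
]
points = volcano_points(models)
```

The `can_predict` helper reflects the constraint stated in the text that a model cannot be applied to a sample lacking some of its predictor proteins.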
Brenda Tissue Ontology (BTO) (33) that was previously imported into ProteomicsDB to support the analysis of human proteins covers any mammalian tissue. To complete the protein information and meta-data panel, we downloaded and imported protein domain information from SMART (34) using their RESTful API and GO annotations using the QuickGo-API of the European Bioinformatics Institute (EBI). Protein-protein interactions and functional pathway information were downloaded from STRING (35) and KEGG (36), respectively. The latter data were processed and transformed for import into our triple-store data model, which allows the automatic mapping of the respective STRING and KEGG identifiers to the corresponding UniProt accessions and our internal protein identifiers. With the meta-data imported, the proteomics and transcriptomics expression profiles for Arabidopsis thaliana were imported. The project covers 30 different tissues, including a tissue-derived cell line that was derived from callus tissue. Because of the generic design of ProteomicsDB, any analytical view (e.g. heat map) will work without further modifications for any other organism. However, due to the limited datasets available for phenotypic drug responses (and the respective drug targets), other views do not show any A. thaliana or M. musculus data yet.

As mentioned before, we have imported >5 million reference spectra acquired from synthetic human peptides in the ProteomeTools project. As a next step, we imported more than 10 million Prosit-predicted peptide spectra, in three different charge states and three different collision energies. By chance, these spectra also represent 70 000 peptides from Arabidopsis thaliana because their sequences are identical in either organism. In addition, we added predicted spectra for all peptides present in the experimental data set. Thus, akin to the human case, these reference spectra can be used to validate peptide identifications in experimental data using the mirror spectrum viewer integrated in ProteomicsDB. First, these are directly accessible in the ‘Peptides/MSMS’ tab of the ‘Protein Details’ view, where users can validate or invalidate e.g. one-hit wonders (proteins which are only identified by a single peptide/spectrum), and more generally validate proteins/peptides in case the user wants confirmation that the protein is actually present in the sample of a project and consequently in a cell line or tissue in ProteomicsDB. Since ProteomicsDB contains up to 14 different types of reference spectra (11 fragmentation settings from ProteomeTools and 3 normalized collision energies from Prosit) as indicated in the list of available reference spectra, users can select the optimal match (37). Second, they are accessible in the ‘Reference Peptides’ tab, where users can browse ProteomeTools and Prosit spectra for e.g. designing targeted mass spectrometric assays. The two separate views exist because for some proteins, no experimental spectra of endogenous proteins might be available, while many reference spectra might be available because the ProteomeTools project synthesized all meaningful peptides for a hitherto unobserved protein. For proteins where experimental data from endogenous proteins is available, users can take experimental proteotypicity of peptides into account and thus rationalize which peptide to choose for an assay. Additionally, this view can be used to compare spectra created by different fragmentation methods and, more importantly, different collision energies to optimize their targeted assays for collision energies which generate desired fragment ions (e.g. highly intense and high m/z ions). Furthermore, spectra can now be downloaded in the mirrored spectrum viewer as msp-files. Finally, as mentioned above, ProteomicsDB is also ready to support Mus musculus data. However, the selection of mouse in ProteomicsDB will only be enabled once the data has been published.

FUTURE DIRECTIONS

The continuous updates introduced over the last years have transformed ProteomicsDB into a multi-omics resource for life science research covering proteomic and transcriptomic expression, pathway, protein-protein and protein-drug interactions, and cell viability data (Supplementary Figure S3). Many aspects of ProteomicsDB are already respecting the FAIR principles (38). For example, findability (F) is supported by unique identifiers, accessibility (A) via API endpoints including meta-data and reusability (R) by way of multiple online services taking advantage of ProteomicsDB’s API endpoints. However, more efforts are currently being made to transform ProteomicsDB into a fully FAIR resource, e.g. by extending the API to allow access to all data stored in ProteomicsDB. One particular strength of ProteomicsDB is its versatile mapping service allowing the seamless connection between different data types. This enables subsequent modelling and data mining to further evolve ProteomicsDB from an information database to a knowledge platform. Along these lines, we plan to extend our analytical toolbox such that scientists in life science research can directly benefit from the wealth of data stored in ProteomicsDB. Here, we show the first steps into this direction by extending the toolbox as well as enabling users to upload their own expression data. Combined with ProteomicsDB’s flexible infrastructure, this will provide ease of use for data analysis, interpretation and machine learning
capabilities not accessible to every laboratory or scientist. For this purpose, we are also planning to further extend the data content of ProteomicsDB to include, e.g. protein structures integrated with drug–target affinity data (20) or develop tools which allow the prediction of the target spaces of kinase inhibitors (39).

Two more extensions are planned that will allow the further integration and exploitation of reference spectra. The first one is to use synthetic or predicted reference spectra to systematically validate and assess the confidence of experimental data by evaluating their spectral similarity. As shown earlier, the integration of intensity information can lead to drastic improvements in either the number of iden-

field of proteomics. They have no operational role in the company. S.G., H.-C.E. and S.A. are employees of SAP SE. Neither company affiliation had any influence on the results presented in this study.

REFERENCES

1. Wilhelm,M., Schlegl,J., Hahne,H., Gholami,A.M., Lieberenz,M., Savitski,M.M., Ziegler,E., Butzmann,L., Gessulat,S., Marx,H. et al. (2014) Mass-spectrometry-based draft of the human proteome. Nature, 509, 582–587.
2. Schmidt,T., Samaras,P., Frejno,M., Gessulat,S., Barnert,M., Kienegger,H., Krcmar,H., Schlegl,J., Ehrlich,H.C., Aiche,S. et al. (2018) ProteomicsDB. Nucleic Acids Res., 46, D1271–D1281.
18. Klaeger,S., Heinzlmeir,S., Wilhelm,M., Polzer,H., Vick,B., Koenig,P.A., Reinecke,M., Ruprecht,B., Petzoldt,S., Meng,C. et al. (2017) The target landscape of clinical kinase drugs. Science, 358, eaan4368.
19. Koch,H., Busto,M.E., Kramer,K., Medard,G. and Kuster,B. (2015) Chemical proteomics uncovers EPHA2 as a mechanism of acquired resistance to small molecule EGFR kinase inhibition. J. Proteome Res., 14, 2617–2625.
20. Heinzlmeir,S., Kudlinzki,D., Sreeramulu,S., Klaeger,S., Gande,S.L., Linhard,V., Wilhelm,M., Qiao,H., Helm,D., Ruprecht,B. et al. (2016) Chemical proteomics and structural biology define EPHA2 inhibition by clinical kinase drugs. ACS Chem. Biol., 11, 3400–3411.
21. Heinzlmeir,S., Lohse,J., Treiber,T., Kudlinzki,D., Linhard,V., Gande,S.L., Sreeramulu,S., Saxena,K., Liu,X., Wilhelm,M. et al. (2017) Chemoproteomics-Aided medicinal chemistry for the discovery of EPHA2 inhibitors. Chem. Med. Chem, 12, 999–1011.
29. Lawrence,R.T., Perez,E.M., Hernandez,D., Miller,C.P., Haas,K.M., Irie,H.Y., Lee,S.I., Blau,C.A. and Villen,J. (2015) The proteomic landscape of triple-negative breast cancer. Cell Rep, 11, 630–644.
30. Zou,H. and Hastie,T. (2005) Regularization and variable selection via the elastic net. J. R. Stat. Soc.: B (Stat. Methodol.), 67, 301–320.
31. Savitski,M.M., Wilhelm,M., Hahne,H., Kuster,B. and Bantscheff,M. (2015) A scalable approach for protein false discovery rate estimation in large proteomic data sets. Mol. Cell. Proteomics, 14, 2394–2404.
32. Walls,R.L., Cooper,L., Elser,J., Gandolfo,M.A., Mungall,C.J., Smith,B., Stevenson,D.W. and Jaiswal,P. (2019) The plant ontology facilitates comparisons of plant development stages across species. Front. Plant Sci., 10, 631.
33. Gremse,M., Chang,A., Schomburg,I., Grote,A., Scheer,M., Ebeling,C. and Schomburg,D. (2011) The BRENDA Tissue Ontology (BTO): the first all-integrating ontology of all organisms for enzyme sources. Nucleic Acids Res., 39, D507–D513.