Published online 30 October 2019. Nucleic Acids Research, 2020, Vol. 48, Database issue D1153–D1163
doi: 10.1093/nar/gkz974

ProteomicsDB: a multi-omics and multi-organism resource for life science research
Patroklos Samaras1, Tobias Schmidt1, Martin Frejno1, Siegfried Gessulat1,2, Maria Reinecke1,3,4, Anna Jarzab1, Jana Zecha1, Julia Mergner1, Piero Giansanti1, Hans-Christian Ehrlich2, Stephan Aiche2, Johannes Rank5,6, Harald Kienegger5,6, Helmut Krcmar5,6, Bernhard Kuster1,7,* and Mathias Wilhelm1,*
1 Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, Bavaria, Germany, 2 Innovation Center Network, SAP SE, Potsdam, Germany, 3 German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany, 4 German Cancer Research Center (DKFZ), Heidelberg, Germany, 5 Chair for Information Systems, Technical University of Munich (TUM), Garching, Germany, 6 SAP University Competence Center, Technical University of Munich (TUM), Garching, Germany and 7 Bavarian Biomolecular Mass Spectrometry Center (BayBioMS), Technical University of Munich (TUM), Freising, Bavaria, Germany

Received September 14, 2019; Revised October 11, 2019; Editorial Decision October 11, 2019; Accepted October 15, 2019

ABSTRACT
ProteomicsDB (https://www.ProteomicsDB.org) started as a protein-centric in-memory database for the exploration of large collections of quantitative mass spectrometry-based proteomics data. The data types and contents grew over time to include RNA-Seq expression data, drug-target interactions and cell line viability data. In this manuscript, we summarize new developments since the previous update that was published in Nucleic Acids Research in 2017. Over the past two years, we have enriched the data content with additional datasets and extended the platform to support protein turnover data. Another important new addition is that ProteomicsDB now supports the storage and visualization of data collected from other organisms, exemplified by Arabidopsis thaliana. Due to the generic design of ProteomicsDB, all analytical features available for the original human resource seamlessly transfer to other organisms. Furthermore, we introduce a new service in ProteomicsDB which allows users to upload their own expression datasets and analyze them alongside data stored in ProteomicsDB. Initially, users will be able to make use of this feature in the interactive heat map functionality as well as the drug sensitivity prediction, but ultimately they will be able to use all analytical features of ProteomicsDB in this way.

INTRODUCTION
ProteomicsDB (https://www.ProteomicsDB.org) is an in-memory database initially developed for the exploration of large quantities of quantitative human mass spectrometry-based proteomics data, including the first draft of the human proteome (1). Among many features, it allows the real-time exploration and retrieval of protein abundance values across different tissues, cell lines and body fluids via interactive expression heat maps and body maps. Today, ProteomicsDB supports multiple use cases across different disciplines and covers a wide range of data (2). For instance, tandem mass spectra, peptide identifications and peptide proteotypicity values can be used as starting points to develop targeted mass spectrometry assays. Because of the recent incorporation of a large number of reference spectra from the ProteomeTools project (3,4) as well as spectra predicted by the artificial intelligence Prosit (5), both experimental and reference spectra can be used for assay development and to validate the identification of so far unobserved proteins, or indeed of any protein. The integration of phenotypic data allows the exploration of the dose-dependent effect of drugs of interest (e.g. clinically approved drugs) on multiple cell lines (6–9). The dynamic identifier mapping in ProteomicsDB allows the integration of transcriptomics data from e.g. the Human Protein Atlas project (10) and Bgee (11), and thus facilitates the automated integration of different data sources within ProteomicsDB. This, in turn, allows the development of new tools. A wide range of drug-target interaction data can be visualized in ProteomicsDB as well, which enables the in silico exploration of combination treatments in a dose-dependent protein-drug interaction graph.
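Dose-dependent drug effects of the kind mentioned above are commonly summarized by a sigmoidal dose-response model. As a minimal sketch, assuming the generic four-parameter log-logistic (4PL) form rather than the exact fitting procedure used in ProteomicsDB, the snippet below evaluates such a curve and converts an EC50 into pEC50, one of the features reported by the drug sensitivity prediction. The function names and curve parameters are hypothetical; only the 352 nM EC50 is taken from the example kinase inhibitor curve in Figure 3.

```python
import math


def four_pl(conc_nm: float, bottom: float, top: float,
            ec50_nm: float, hill: float) -> float:
    """Hypothetical helper: 4PL response at a drug concentration (nM).

    Assumed parameterization: the curve falls from `top` (no drug) to
    `bottom` (saturating drug) and passes through the midpoint
    (top + bottom) / 2 exactly at conc_nm == ec50_nm.
    """
    if conc_nm <= 0:
        return top  # no drug: response stays at the upper plateau
    return bottom + (top - bottom) / (1.0 + (conc_nm / ec50_nm) ** hill)


def pec50(ec50_nm: float) -> float:
    """pEC50 = -log10(EC50 in mol/L); the input is given in nM."""
    return -math.log10(ec50_nm * 1e-9)


if __name__ == "__main__":
    # Hypothetical curve: plateaus 1.0 and 0.0, Hill slope 1, EC50 352 nM.
    for c in (0.1, 10, 352, 10_000, 100_000):
        print(f"{c:>9} nM -> {four_pl(c, 0.0, 1.0, 352.0, 1.0):.3f}")
    print(f"pEC50 = {pec50(352.0):.2f}")
```

Running the script prints the relative response at each concentration followed by pEC50 = 6.45. Expressing potency on the negative log scale is what makes pEC50 values from different compounds directly comparable as regression targets.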

* To whom correspondence should be addressed. Tel: +49 8161 71 4202; Fax: +49 8161 71 5931; Email: [email protected]
Correspondence may also be addressed to Bernhard Kuster. Email: [email protected]

© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

ProteomicsDB is becoming an increasingly valuable resource in (proteomic) life science research, as evidenced by the increasing number of external resources linking to ProteomicsDB, such as UniProt (12) and GeneCards (13), as well as resources making use of our application programming interface (API) to show e.g. protein expression information, as done by OmniPathDB (14) and Gene Info eXtension (GIX) (15).

In this version, we expanded the data content of ProteomicsDB by including additional publicly available as well as in-house generated proteomic and transcriptomic studies. Furthermore, we expanded the drug-target interaction data, which now cover ∼1500 kinase inhibitors and tool compounds. The cell line viability data were enriched with an additional large dataset (16) and now cover >20 000 drugs against 1500 cell lines. We further increased the amount of protein property information stored in ProteomicsDB, such as 13 000 melting points of proteins obtained by thermal proteome profiling (17). In addition, we expanded the biochemical assays section to include protein turnover data with synthesis and degradation curves for >6000 proteins. We also increased the number of reference tandem mass spectra in ProteomicsDB to >5 million from synthetic peptides and 40 million from predictions, which, in total, are represented by 3 billion fragment ions.

RESULTS

Overview

ProteomicsDB aims to provide real-time analytical functions to users, including computationally challenging tasks. For this purpose, ProteomicsDB was carefully designed and organized (Figure 1). It consists of a production unit, a computing unit and a storage unit, all intra-connected via a 16 Gbit local network. The production unit hosts the production server as well as the entire development and testing environment. The computing unit is one machine with a fully dockerized environment which currently handles two main tasks. First, an R server handles R procedures from ProteomicsDB, such as the clustering available in the heat map. Second, a Docker container with various services handles requests to our deep learning tool Prosit and is connected to two NVIDIA P100 GPU cards.

Over the past two years, the user interface and data content of ProteomicsDB were updated to accommodate new requirements such as hosting data from other organisms. Figure 2A shows the changes that were made to the front page so that users can select the organism of interest. Parts of the webpage have been renamed to be more generic and cover every organism, such as the ‘Human Proteins’ tab, which was renamed to ‘Proteins’. The front page statistics list new information about the quantity of data available for the chosen organism, including tissue coverage, quantitative multi-omics expression values, biochemical assay measurements and cell viability measurements. The main pane of the front page was redesigned to show the main features of the platform and is now split into two sections. The left section provides direct links to the protein-centric visualizations, the analytics toolbox, the new feature to upload custom data and a link to Prosit. The right section includes links that trigger the selection of the corresponding organism. To make organism selection available throughout the web interface, we additionally adjusted the left sidebar to show one icon per available organism. The ‘Feedback’ button that was previously located in that position was moved to the right pane below the ‘Help’ button. In light of these changes, all internal procedures and endpoints (e.g. API) were adjusted to support the new data types and organisms.

Figure 2B depicts the data expansion in ProteomicsDB since 2017, grouped by category. By re-analyzing and uploading more publicly available proteomics studies, we increased the tissue coverage of ProteomicsDB by ∼70 human tissues and cell lines (+∼30%), to a total of almost 300 tissues and cell lines. The broader coverage of biological systems directly affects visualizations such as the human body map or the expression heat map. The wealth of data in ProteomicsDB not only allows further online exploration of the proteome and its properties but also enables the development of new tools that integrate different omics data sources. Currently, human proteomics and transcriptomics data are available for ∼17 000 genes and ∼60 tissues (Figure 2C, D). This large overlap enabled the implementation of a new missing value imputation approach which uses transcriptomics or proteomics data to estimate the presence and abundance of protein or RNA not covered in individual datasets. For ∼13 000 proteins, additional information derived from other biochemical assays, such as melting behavior or synthesis and degradation curves, is available. By integrating additional publicly available datasets, the overlap at the tissue and protein level will increase further over the next years and eventually cover all of the >1000 (cancer) cell lines for which we already have cell viability data. This, in turn, will aid a better understanding of the molecular factors that govern the life of a particular cell.

New biochemical assay data, covering more protein properties

In addition to importing further expression profile datasets, we extended our biochemical assay portal by integrating the results of three additional studies covering target information of small molecule kinase inhibitors, the melting (thermal aggregation) behavior of proteins and protein turnover. First, in order to extend knowledge on druggable protein kinases (18), we imported ∼500 000 kinase inhibitor dose-response curves (Figure 3) covering 243 kinase inhibitors that are either approved for use or in clinical trials (18) and ∼1300 tool compounds targeting kinases (unpublished). These data give users broader coverage and thus more options to select inhibitors to study a particular protein kinase. Various insights might arise from such analyses, such as assessing the repurposing potential of clinical kinase inhibitors. Moreover, users can find an appropriate molecule/inhibitor with respect to potency and selectivity to study the function of a particular kinase (19). Another use case is to identify inhibitors that share the same target(s) but have different off-targets, which can be used to identify and study the core signaling pathway of the shared target(s) or general on-target effects (18). In addition, the biochemical assay data and tools provided

[Figure 1 graphic: architecture diagram of ProteomicsDB. External resources (OmniPathDB, GIX; cross-referencing, data reusability) access the presentation layers (web interface and API at https://www.ProteomicsDB.org), illustrated by an annotated fragment spectrum of the peptide LHLVALVGTQGR and a drug-protein interaction graph for dasatinib, imatinib and bafetinib. Compute node: Rserver (clustering, normalization, machine learning) and GPUs (learning and prediction of intensities and RT, supporting TensorFlow Serving); main memory 125 GB, 64 CPU cores. Calculation layers: in-memory database engine, calculation engine, graph engine, integrated computation. Data content: dose-responses, drug sensitivity measurements, protein and transcript expression measurements, reference and experimental spectra, peptide identifications and fragment ions. Data layers: controlled vocabulary, metadata and annotations; repository and experiment design; peptide and protein identification and quantification data; reference spectra; quantitative omics data; experiment-specific fitted models (e.g. dose-dependent data); cell viability. Production unit: SAP HANA DBMS, main memory 6 TB, 320 CPU cores. Storage: SSD and HDD RAID, capacity 100 TB. Local network: 16 Gbit.]

Figure 1. The architecture of ProteomicsDB. The production unit hosts the SAP HANA in-memory database management system which involves three
of the presented layers: the data layers, data content and the calculation layers. Parts of the calculation layers are shared between the production unit and
the compute node, such as the clustering and correlation procedures for the interactive expression heat map which are calculated by the Rserver. Part
of the data content is stored in the network storage unit, so that data are always available throughout the network if needed. The entire infrastructure is
intra-connected via a 16 Gbit bandwidth local network that enables rapid communication and data transfer between units.

in ProteomicsDB (e.g. inhibitor potency/selectivity analysis) can be used to discover new lead compounds for medicinal chemistry programs targeting a specific kinase of interest (20,21). The dose-response curves can be explored in the ‘Biochemical assay’ tab of the protein details view. This view allows users to filter the data by different properties, so that only compounds that fit the desired criteria are displayed. For all curves, full experimental designs are stored for users to browse and explore. For dose-response curves that belong to studies that are not yet published, the curve information is available, but the experimental design, although fully imported, will only be shown once these studies are published. Second, the meltome data of ProteomicsDB were enriched with another study that covers the protein melting properties of many organisms (unpublished). Users can therefore study the effect of temperature on selected proteins more thoroughly. We now cover the melting properties of ∼13 000 human proteins. ProteomicsDB thus provides an extensive resource and data-driven guidance on which temperature range should be used for, for example, a thermal shift assay, or which temperature would be suitable for an isothermal dose response assay (ITDR). Third, we introduced a new assay type in the ‘Biochemical Assay’ tab which covers data from protein turnover measurements (synthesis and degradation). Users can obtain the half-life of proteins of interest to assess their stability (22). These data can support the analysis of the mode of action of drugs (23) and might provide additional avenues

[Figure 2 graphic: (A) the redesigned front page, with the main menu (Home, Proteins, Peptides, Chromosomes, Analytics, API, Projects, FAQ, About us, News), a status panel (human proteome coverage 79%, 15,479 of 19,628 proteins; 98,045,108 experimental, 5,317,466 synthetic and 14,980,911 predicted reference spectra; 296 tissues with 39,533,195 quantitative proteomics data points; 254 tissues with 170,919,548 quantitative transcriptomics data points; 634,983 viability and 1,591,907 biochemical assays; 80 projects and 706 experiments), tool tiles (Proteins, Analytics Toolbox, Upload your Datasets, Prosit) and organism tiles (Homo sapiens, Arabidopsis thaliana, Mus musculus, more organisms). (B) Bar plot of the increase in data points (%) per data category, 2017 versus 2019. (C) Gene overlap and (D) tissue overlap Venn diagrams for proteomics, transcriptomics, biochemical assay and cell viability assay data.]

Figure 2. Additions to ProteomicsDB. (A) The front page of ProteomicsDB has been adjusted to host new organisms as well as provide information about
the quantity of the different data types that are stored in the database. (B) Barplot depicting the proportion and absolute number of data points added
to ProteomicsDB (in blue) since the previous update manuscript in 2017 (green). (C) Venn diagram showing the number and overlap of genes for which
proteomics, transcriptomics or biochemical assay data is available in ProteomicsDB. (D) Venn diagram showing the number and overlap of tissues (as well
as cell lines and body fluids) for which the respective data types are available in ProteomicsDB.

into understanding the effectiveness of drugs in light of the stability of on- or off-target proteins (18). In total, ∼20 000 proteins (including isoforms) are covered by at least one and ∼3000 by all three biochemical assay types, potentially providing valuable insight into additional aspects of a protein’s life cycle. As ProteomicsDB visualizes every curve (accessible via the ‘Biochemical assay’ tab in the ‘Protein Details’ view), users can assess the quality of each individual curve and of the underlying data points themselves.

[Figure 3 graphic: pie chart of stored curves by assay type (kinase inhibitors 59%, ~930,000 curves, ~5,300 proteins; meltome 39%, ~620,000 curves, ~20,000 proteins; protein turnover, ~30,000 curves, ~5,000 proteins) with an inner Venn diagram of protein overlap. Example fitted curves: kinase inhibitor dose-response (relative intensity versus concentration [nM], EC50: 352 nM); protein turnover (relative intensity versus time [h], synthesis and degradation, replicates 1–4); meltome (relative intensity versus temperature [°C], DMSO 1 and DMSO 2).]

Figure 3. New biochemical assay data. The pie chart on the left shows the distribution of biochemical assay data available for three different applications. The Venn diagram inside the pie chart shows the overlap of proteins for which biochemical assay data of the respective type is available. The diagrams on the right show exemplary fitted curves for each biochemical assay type, accompanied by the number of curves and proteins that each assay covers.

Upload and online analysis of user expression data

Uploading expression profiles. ProteomicsDB’s ability to interconnect and cross-reference data from various sources is one of its core features. However, this was so far only possible for data already stored in ProteomicsDB, limiting its usefulness for the interpretation of data acquired in a user’s laboratory. In order to fill this gap, we implemented a new feature called ‘Custom User Data Upload’ (Figure 4). Here, users can temporarily upload their expression profiles and optionally normalize them to the data stored in ProteomicsDB. On upload of a dataset, a temporary session is created in the database, which can be accessed by a unique session ID. This session automatically expires after 14 days, which results in the permanent and irrecoverable deletion of all corresponding data unless the user chooses to extend this period. Users can save their session ID and use it to load their session on any other computer or browser. Data stored in such sessions are available via OData (https://www.odata.org) services within ProteomicsDB, which will ultimately allow their integration into any existing analytical pipeline.

Figure 4. Custom data analysis area of ProteomicsDB. The ‘Custom Data Upload’ tab enables users to upload their own expression datasets temporarily to ProteomicsDB. The datasets are session-specific so that no other user has access to this uploaded data.

The first use case we highlight is the comparison of custom expression data to expression data stored in ProteomicsDB. For this to be successful, we highly recommend making use of the normalization feature available upon upload. The uploaded expression profiles are normalized via MComBat (24) using the total sum normalized proteomics expression values of ProteomicsDB as a reference set. Because MComBat normalization depends on the calculation of a mean and variance for any given protein, only datasets with three or more samples can be normalized using this method. Every uploaded dataset has to adhere to a predefined comma-separated format (.csv files) in which each row must provide the following information. (i) A gene name, using the HGNC symbol as the identifier, which allows us to associate the uploaded proteins with those stored in ProteomicsDB and enables cross-dataset comparisons. (ii) A tissue or cell line name representing the origin of the measured sample, which will be used for visualizations. (iii) A sample name, which is important to separate samples with the same tissue of origin, especially for the normalization step, as samples with the same sample and tissue/cell line name will be automatically aggregated because there is no way to separate them. (iv) The expression value of the corresponding protein in the sample in log10 scale, accompanied by the quantification and calculation method that was used, which will help with further comparisons to matching data in ProteomicsDB. (v) The taxonomy code of each sample, which allows datasets to be separated based on the selected organism, a feature discussed below. Detailed documentation on how to use this functionality as well as on

the data upload format can be found by clicking the ‘Help’ button that accompanies every view in ProteomicsDB (Figure 4).

Use of analytical tools on uploaded datasets. When an expression dataset is uploaded, back-end procedures take care of data modelling and transformation, so that the data are compatible with existing tools and differ in no major way from the data available in ProteomicsDB. The first tool making use of this is the interactive expression heat map, which allows interactive visualization of expression patterns of multiple groups of proteins. Upon upload, users can choose a data source and focus their analysis on data from ProteomicsDB, on their own datasets (labelled ‘User Data’) or on the integration of both (labelled ‘Combined’). Because the heat map automatically aggregates tissues, duplicated tissue names provided in the custom dataset will appear as one column. The automatic mapping enables users to use all functionalities of the heat map, such as direct links to ProteomicsDB’s protein summary views and GO enrichment analysis of the selected proteins. The ‘Combined’ option allows users to compare their data to data stored in ProteomicsDB, and to compare some or all of the datasets they have uploaded to the in-database data. Users should expect uploaded datasets that were not normalized during upload to cluster together. If the normalization step was enabled, user samples should cluster with tissues or cell lines that have similar expression profiles in ProteomicsDB, ideally from the same origin. Figure 4 shows such an example, in which a custom dataset was co-clustered with data stored in ProteomicsDB. Some of the uploaded expression profiles of cell lines co-cluster with the respective cell lines stored in ProteomicsDB (here lung and liver samples). There are, however, cases (here ovary) that cluster with other tissues (here uterus). This feature enables users to find the closest cell lines for which ProteomicsDB contains, e.g. phenotypic information, and to explore compounds that may be effective in their own cell lines.

Extended heat map features: missing value imputation. ProteomicsDB stores a large collection of transcriptomics expression profiles alongside the respective proteomic profiles. Having access to expression data from both sources and to the automatic mapping provided by the built-in Resource Identifier Relation Model, ProteomicsDB is able to perform data-driven missing value imputation using either data type. Proteomics data in particular (depending on the depth of measurement) can show a large number of missing values. Data selected for imputation might come from different projects for both omics types, and even projects of the same omics type might differ in the distribution of their expression values. This phenomenon is commonly referred to as ‘batch effect’ and introduces additional variance when data are aggregated across multiple ‘batches’. Here, the term ‘batch’ refers to experiments processed in one laboratory over a short time period using the same technological platform (25). We performed intra-omics normalization and batch effect correction using ComBat (26). Next, we apply MComBat (24) to correct systematic differences between omics types. In contrast to ComBat, MComBat allows the selection of a reference dataset, so that all other datasets are normalized based on the reference. Transcriptomics data are then transferred to the same scale as the proteomics expression data. Previous experiments showed that the correlation between mRNA and protein expression data across all tissues is higher with than without such an adjustment (27). Finally, we implemented the mRNA-guided missing value imputation method described in (27). For this purpose, we train linear regression models and extrapolate protein abundance from transcript abundance. To validate the performance of the generated models, we created artificial missing values in a random subset of the protein expression data stored in ProteomicsDB. We then used our models to extrapolate the protein abundances and compared them to two other common missing value imputation strategies: (a) replacing missing values with the minimum protein abundance of the corresponding sample and (b) random sampling from the corresponding sample’s protein abundance distribution, as the created missing values originate from the whole abundance distribution. The mRNA-guided missing value imputation method showed the best correlation to the measured values (Supplementary Figure S1), which is why we implemented it. The entire procedure, from data normalization to training the regression model, is performed by the R server (Figure 1). This is possible because the SAP HANA in-memory database management system supports direct connections to the R server via appropriate adapters. Missing value imputation is available in the interactive heat map (Figure 5) and can be activated by the respective button. Once activated, and only if matching expression profiles are available, the model trained above and the adjusted transcriptomics expression data are used to fill in missing values in the protein expression matrix. The authors point out that missing value imputation can lead to issues and should therefore be carefully considered and evaluated on a case-by-case basis. In particular, mRNA-guided missing value imputation becomes less accurate if the RNA or protein expression dataset has a limited number of samples. Moreover, missing values cannot be imputed where matching RNA-Seq data are absent.

Figure 5. Combined interactive expression heat map. User datasets can be clustered along with data stored in ProteomicsDB for a combined analysis. User datasets (marked in orange) that were normalized using MComBat subsequent to upload cluster close to samples in ProteomicsDB (in blue) that were generated from the same or similar tissues or cell types.

Drug sensitivity prediction for proteomic profiles

ProteomicsDB already covers a large amount of phenotypic drug sensitivity information (Figure 2B) and, to the best of our knowledge, no other platform shows full dose-response curves across multiple resources with filters to the extent that ProteomicsDB’s cell viability viewer does. However, the list of cell lines for which these data are available is necessarily incomplete, and such data are likely entirely unavailable, or impossible to generate, for cell lines derived from, say, patient tissue in a particular laboratory. In order to estimate the susceptibility of such cell lines to drugs without performing an experiment, ProteomicsDB provides a tool to model and estimate drug sensitivity based on expression profiles. Recent proteome profiling of the NCI60 (28) and the CRC65 (27) cancer cell line panels, and of an additional panel of 20 breast cancer cell lines (29), showed that protein signatures can predict drug sensitivity or resistance. On this basis, we implemented elastic net regression (30) in ProteomicsDB to model drug sensitivity as a function of quantitative protein expression profiles. This functionality can be used in the ‘Drug Sensitivity Prediction’ view (Figure 6). Here, users can select from a variety of tissues and cell lines whose proteomic profiles are stored in ProteomicsDB. Next, a drug or compound can be selected to check its effect on the selected cell line (Figure 6A). Figure 6B shows the result of the prediction as bar plots, one for each predicted feature (area under the curve, pEC50, relative effect). Error bars show the range of the predictions of all bootstraps of the corresponding model. Each drug in ProteomicsDB might be accompanied by multiple models (multiple bars in each bar plot), because the drug may have been used in more than one drug sensitivity screen

on data stored in ProteomicsDB and expect values from the same or similar expression distributions.

Real-time analytics and visualization for any organism

ProteomicsDB was initially developed for the exploration of the human proteome. As a result, every database view and endpoint was designed without explicit support for multiple organisms. In order to support the storage, handling and visualization of data from multiple organisms, all layers of ProteomicsDB (Figure 1) required modifications and extensive testing. In the new version presented here, we modified all backend procedures to support querying of data for a
which was imported into ProteomicsDB (max. 4). It is im- specific taxonomy. The API endpoints were modified to re-
portant to point out that each model includes a certain set quire a taxcode in order to respond with the desired data.
of predictor-proteins. If the sample on which a user wants With this functionality in place, we prepared the database
to predict drug sensitivity does not contain some of the re- and the data models to support and handle the protein se-
quired proteins, prediction from some models is not pos- quence space of any organism. Similarly, the user interface
sible. Selecting a bar of any bar plot generates a volcano was modified to support the visualization of data from a se-
plot (Figure 6C), which shows information for the interpre- lected organism. Users can change the selected organisms
tation of the trained model. The x-axis shows how strong by using the respective icons on the left hand side of each
the expression of a particular protein is associated with drug view, or directly on the front page of ProteomicsDB (Figure
sensitivity or resistance, analogous to a correlation. The y- 2A). For the protein expression visualization, new interac-
axis shows the number of bootstrap models contained the tive body maps for Arabidopsis thaliana and Mus musculus
particular protein as a predictor, when training the elastic were generated (Figure 7A, Supplementary Figure S2) and
net model. Proteins that appear in the top left and right ar- function in the same way as the human body map.
eas of the volcano plot (Figure 6C) are frequently selected To bring Arabidopsis thaliana into ProteomicsDB, we
from the models as predictors, as they have a high positive downloaded, processed and imported the protein sequence
or negative correlation with drug sensitivity or resistance space from UniProt, following the same mechanism as
and can, therefore, represent potential biomarkers. Instead for human proteins. Upon import, appropriate decoy se-
of predicting drug sensitivity on tissues or cell lines from quences were created for every protease, to allow false dis-
ProteomicsDB, users also have the option to use this func- covery (FDR) estimation by the picked FDR approach
tionality on their own datasets, uploaded using the ‘Cus- already implemented in ProteomicsDB (31). We further-
tom User Data Upload’ tab. Predictions can be applied to more imported the Plant Ontology (PO) (32) to be able
all user datasets, although it is highly recommended to use to make use of ontologies for the different plant tissues.
normalization upon uploading, as the models were trained This step was not necessary for Mus musculus, since the
D1160 Nucleic Acids Research, 2020, Vol. 48, Database issue

PROTEOMICSDB ANALYTICS TOOLBOX DRUG SENSITIVITY PREDICTION

HELP
A Expression Dataset Selection B Predictions
Data Source User Data

FEEDBACK
0.50 0.9 1.0
OmicsType Proteomics
User Dataset Selection:
0.45 0.8 0.9
user_cell_lines
Tissue Selection: OVCAR3_NCI60 0.40 0.7 0.8
Drug Selection 0.35 0.7

Relative Effect
0.6

-log10(EC50)
Drug: PANOBINOSTAT
0.30 0.6
Predict Sensitivity 0.5

AUC
0.25 0.5
0.4
0.20 0.4
0.3
0.15 0.3
0.10 0.2 0.2

Downloaded from https://academic.oup.com/nar/article/48/D1/D1153/5609531 by guest on 09 August 2022


0.05 0.1 0.1
0.00 0.0 0.0

LE

CC P
LE

CT L E
RP
R
CC

CC
CT
C -log10(EC50)

Negative effect 0.62 Positive effect


0.58
IMPA1 0.54 THNSL2
RABEPK 0.50 CERKL
Selection frequency

MED17 0.46 HEPACAM


NRAS 0.42
NOB1 0.38
C20orf4 0.34
RSL24D1
INTS10 0.30
DHRS4 0.26
NUPL2 0.22
NFYA 0.18
POP7 0.14
SNRPB2 0.10
PUM2 0.06
INTS9 0.02
CRLF3
WRNIP1 -1.0e-3 -5.0e-4 0.0e+0 5.0e-4 1.0e-3
Mean effect size

Figure 6. Drug sensitivity prediction. (A) Prediction is enabled for both, data stored in ProteomicsDB or user uploaded datasets. (B) This view visualizes the
predicted sensitivity of a chosen cell line to a chosen drug expressed by area under the curve (AUC, left bar), the negative log of the effective concentration
of the drug (EC50, middle bars) and the relative (cell killing) effect (right bars). If more than one bar is shown, more than one training data set was available
for the particular drug and either one or several predictions are shown. (C). Each dot in the volcano plot, represents a protein that is associated to drug
sensitivity or resistance on the basis of the elastic net model generated during training.

Brenda Tissue Ontology (BTO) (33) that was previously im- As mentioned before, we have imported >5 million refer-
ported into ProteomicsDB to support the analysis of hu- ence spectra acquired from synthetic human peptides in the
man proteins covers any mammalian tissue. To complete the ProteomeTools project. As a next step, we imported more
protein information and meta-data panel, we downloaded than 10 million Prosit-predicted peptide spectra, in three
and imported protein domain information from SMART different charge states and 3 different collision energies. By
(34) using their RESTful API and GO annotations us- chance, these spectra also represent 70 000 peptides from
ing the QuickGo-API of the European Bioinformatics In- Arabidopsis thaliana because their sequences are identical
stitute (EBI). Protein-protein interactions and functional in either organism. In addition, we added predicted spectra
pathway information were downloaded from STRING (35) for all peptides present in the experimental data set. Thus,
and KEGG (36), respectively. The latter data were pro- akin to the human case, these reference spectra can be used
cessed and transformed for import into our triple-store to validate peptide identifications in experimental data us-
data model, which allows the automatic mapping of the ing the mirror spectrum viewer integrated in ProteomicsDB.
respective STRING and KEGG identifiers to the corre- First, these are directly accessible in the ‘Peptides/MSMS’
sponding UniProt accessions and our internal protein iden- tab of the ‘Protein Details’ view, where users can validate or
tifiers. With the meta-data imported, the proteomics and invalidate i.e. one hit wonders (proteins which are only iden-
transcriptomics expression profiles for Arabidopsis thaliana tified by a single peptide/spectrum), and more generally val-
were imported. The project covers 30 different tissues, in- idate proteins/peptides in case the user wants confirmation
cluding a tissue-derived cell line that was derived from cal- that the protein is actually present in the sample of a project
lus tissue. Because of the generic design of ProteomicsDB, and consequently in a cell line or tissue in ProteomicsDB.
any analytical view (e.g. heat map) will work without fur- Since ProteomicsDB contains up to 14 different types of ref-
ther modifications for any other organism. However, due to erence spectra (11 fragmentation settings from Proteome-
the limited datasets available for phenotypic drug responses Tools and 3 normalized collision energies from Prosit) as in-
(and the respective drug targets), other views do not show dicated in the list of available reference spectra, users can se-
any A. thaliana or M. musculus data yet. lect the optimal match (37). Second, in the ‘Reference Pep-
Nucleic Acids Research, 2020, Vol. 48, Database issue D1161

Downloaded from https://academic.oup.com/nar/article/48/D1/D1153/5609531 by guest on 09 August 2022


Figure 7. ProteomicsDB as a multi-organism and multi-omics platform. (A) Proteome or transcriptome expression data are visualized in the tissues of
a chosen organism (left) and numerical expression data (medians in case multiple samples of the same tissue are available) are shown on the right for
each tissue the protein was found in. Tissue bars selected by users turn orange and the respective tissue is highlighted on the body map on the left view
projects the tissue aggregated omics expression values to the corresponding organism’s body map. (B) Venn diagram is showing the overlap of gene-level
data available for proteomics and transcriptomics for Arabidopsis thaliana. (C) Venn diagram showing the overlap of tissues for which proteomics and
transcriptomics expression values are available in ProteomicsDB.

tides’ tab, where users can browse ProteomeTools and Prosit life science research covering proteomic and transcriptomic
spectra for e.g. designing targeted mass spectrometric as- expression, pathway, protein-protein and protein-drug in-
says. The two separate views exist because for some proteins, teractions, and cell viability data (Supplementary Figure
no experimental spectra of endogenous proteins might be S3). Many aspects of ProteomicsDB are already respect-
available, while many reference spectra might be available ing the FAIR principles (38). For example, e.g. findability
because the ProteomeTools synthesized all meaningful pep- (F) is supported by unique identifiers, accessibility (A) via
tides for a hitherto unobserved protein. For proteins where API endpoints including meta-data and reusability (R) by
experimental data from endogenous proteins is available, way of multiple online services taking advantage of Pro-
users can take experimental proteotypicity of peptides into teomicsDB’s API endpoints. However, more efforts are cur-
account and thus rationalize which peptide to choose for an rently made to transform ProteomicsDB into a fully FAIR
assay. Additionally, this view can be used to compare spec- resource, e.g. by extending the API to allow access to all
tra created by different fragmentation methods and, more data stored in ProteomicsDB. One particular strength of
importantly, different collision energies to optimize their ProteomicsDB is its versatile mapping service allowing the
targeted assays for collision energies which generate desired seamless connection between different data types. This en-
fragment ions (e.g. highly intense and high m/z ions). Fur- ables subsequent modelling and data mining to further
thermore, spectra can now be downloaded in the mirrored evolve ProteomicsDB from an information database to a
spectrum viewer as msp-files. Finally, as mentioned above, knowledge platform. Along these lines, we plan to extend
ProteomicsDB is also ready to support Mus musculus data. our analytical toolbox such that scientists in life science re-
However, the selection of mouse in ProteomicsDB will only search can directly benefit from the wealth of data stored
be enabled once the data has been published. in ProteomicsDB. Here, we show the first steps into this di-
rection by extending the toolbox as well as enabling users
FUTURE DIRECTIONS to upload their own expression data. Combined with Pro-
teomicsDB’s flexible infrastructure, this will provide ease of
The continuous updates introduced over the last years have use for data analysis, interpretation and machine learning
transformed ProteomicsDB into a multi-omics resource for
D1162 Nucleic Acids Research, 2020, Vol. 48, Database issue

capabilities not accessible to every laboratory or scientist. field of proteomics. They have no operational role in the
For this purpose, we are also planning to further extend company. S.G., H.-C.E. and S.A. are employees of SAP SE.
the data content of ProteomicsDB to include, e.g. protein Neither company affiliation had any influence on the results
structures integrated with drug–target affinity data (20) or presented in this study.
develop tools which allow the prediction of the target spaces
of kinase inhibitors (39).
Two more extensions are planned that will allow the fur- REFERENCES
ther integration and exploitation of reference spectra. The 1. Wilhelm,M., Schlegl,J., Hahne,H., Gholami,A.M., Lieberenz,M.,
first one is to use synthetic or predicted reference spectra Savitski,M.M., Ziegler,E., Butzmann,L., Gessulat,S., Marx,H. et al.
(2014) Mass-spectrometry-based draft of the human proteome.
to systematically validate and assess the confidence of ex- Nature, 509, 582–587.
perimental data by evaluating their spectral similarity. As 2. Schmidt,T., Samaras,P., Frejno,M., Gessulat,S., Barnert,M.,
shown earlier, the integration of intensity information can Kienegger,H., Krcmar,H., Schlegl,J., Ehrlich,H.C., Aiche,S. et al.
lead to drastic improvements in either the number of iden- (2018) ProteomicsDB. Nucleic Acids Res., 46, D1271–D1281.

Downloaded from https://academic.oup.com/nar/article/48/D1/D1153/5609531 by guest on 09 August 2022


3. Zolg,D.P., Wilhelm,M., Schmidt,T., Medard,G., Zerweck,J.,
tified peptides or the ability to differentiate correct from in- Knaute,T., Wenschuh,H., Reimer,U., Schnatbaum,K. and Kuster,B.
correct matches (5). Especially the latter will help to increase (2018) ProteomeTools: Systematic characterization of 21
the confidence of each peptide identification and thus also Post-translational protein modifications by liquid chromatography
increase the quality of identification and quantification re- tandem mass spectrometry (LC-MS/MS) using synthetic peptides.
sults stored in ProteomicsDB. The second extension is the Mol. Cell. Proteomics, 17, 1850–1863.
4. Zolg,D.P., Wilhelm,M., Schnatbaum,K., Zerweck,J., Knaute,T.,
implementation of a smart tool which will allow users to Delanghe,B., Bailey,D.J., Gessulat,S., Ehrlich,H.C., Weininger,M.
build targeted assays based on data stored in ProteomicsDB et al. (2017) Building ProteomeTools based on a complete synthetic
as described. human proteome. Nat. Methods, 14, 259–262.
Ultimately, the collected data and generated knowledge 5. Gessulat,S., Schmidt,T., Zolg,D.P., Samaras,P., Schnatbaum,K.,
Zerweck,J., Knaute,T., Rechenberger,J., Delanghe,B., Huhmer,A.
should culminate in actionable hypotheses. These may drive et al. (2019) Prosit: proteome-wide prediction of peptide tandem mass
the design of laboratory experiments or eventually aid de- spectra by deep learning. Nat. Methods, 16, 509–518.
cision making in patient care. One way how ProteomicsDB 6. Iorio,F., Knijnenburg,T.A., Vis,D.J., Bignell,G.R., Menden,M.P.,
could be used for the latter is by providing tools that as- Schubert,M., Aben,N., Goncalves,E., Barthorpe,S., Lightfoot,H.
sist molecular tumor boards. We plan to provide pipelines et al. (2016) A landscape of pharmacogenomic interactions in cancer.
Cell, 166, 740–754.
where researchers and clinicians will be able to upload the 7. Rees,M.G., Seashore-Ludlow,B., Cheah,J.H., Adams,D.J., Price,E.V.,
protein profiles of patient samples in a fully anonymized Gill,S., Javaid,S., Coletti,M.E., Jones,V.L., Bodycombe,N.E. et al.
fashion and have in-depth bioinformatic analysis reports (2016) Correlating chemical sensitivity and basal gene expression
returned, spiked with a wide range of information includ- reveals mechanism of action. Nat. Chem. Biol., 12, 109–116.
8. Medico,E., Russo,M., Picco,G., Cancelliere,C., Valtorta,E., Corti,G.,
ing, e.g. protein and RNA abundance levels, biomarkers Buscarino,M., Isella,C., Lamba,S., Martinoglio,B. et al. (2015) The
that predict sensitivity or resistance, potential off-label uses molecular landscape of colorectal cancer cell lines unveils clinically
based on approved kinase inhibitors as well as general sam- actionable kinase targets. Nat. Commun., 6, 7002.
ple characterization, classification or origin identification 9. Barretina,J., Caponigro,G., Stransky,N., Venkatesan,K.,
based on similarities of molecular fingerprints. Margolin,A.A., Kim,S., Wilson,C.J., Lehar,J., Kryukov,G.V.,
Sonkin,D. et al. (2012) The Cancer Cell Line Encyclopedia enables
predictive modelling of anticancer drug sensitivity. Nature, 483,
603–607.
DATA AVAILABILITY 10. Uhlen,M., Oksvold,P., Fagerberg,L., Lundberg,E., Jonasson,K.,
ProteomicsDB is available at https://www.ProteomicsDB. Forsberg,M., Zwahlen,M., Kampf,C., Wester,K., Hober,S. et al.
(2010) Towards a knowledge-based Human Protein Atlas. Nat.
org. Biotechnol., 28, 1248–1250.
11. Komljenovic,A., Roux,J., Wollbrett,J., Robinson-Rechavi,M. and
Bastian,F.B. (2018) BgeeDB, an R package for retrieval of curated
SUPPLEMENTARY DATA expression datasets and for gene list expression localization
Supplementary Data are available at NAR Online. enrichment tests [version 2; peer review: 2 approved, 1 approved with
reservations]. F1000Res, 5, 2748.
12. UniProt Consortium (2019) UniProt: a worldwide hub of protein
knowledge. Nucleic Acids Res., 47, D506–D515.
ACKNOWLEDGEMENTS 13. Stelzer,G., Rosen,N., Plaschkes,I., Zimmerman,S., Twik,M.,
The authors wish to thank all members of the Kuster labo- Fishilevich,S., Stein,T.I., Nudel,R., Lieder,I., Mazor,Y. et al. (2016)
The GeneCards Suite: From gene data mining to disease genome
ratory for fruitful discussions and technical assistance. sequence analyses. Curr. Protoc. Bioinformatics, 54, 1.30.1–1.30.33.
14. Turei,D., Korcsmaros,T. and Saez-Rodriguez,J. (2016) OmniPath:
guidelines and gateway for literature-curated signaling pathway
FUNDING resources. Nat. Methods, 13, 966–967.
German Science Foundation [SFB924, SFB1309, SFB132 15. Knight,J.D.R., Samavarchi-Tehrani,P., Tyers,M. and Gingras,A.C.
(2019) Gene Information eXtension (GIX): effortless retrieval of gene
1]; German Federal Ministry of Education and Research product information on any website. Nat. Methods, 16, 665–666.
(BMBF) [031L0008A, 031L0168]; SAP. Funding for open 16. Monga,M. and Sausville,E.A. (2002) Developmental therapeutics
access charge: BMBF [031L0168]. program at the NCI: molecular target and drug discovery process.
Conflict of interest statement. T.S., S.G. and M.F. are Leukemia, 16, 520–526.
17. Savitski,M.M., Reinhard,F.B., Franken,H., Werner,T., Savitski,M.F.,
founders and shareholders of msAId, which operates in the Eberhard,D., Martinez Molina,D., Jafari,R., Dovega,R.B.,
field of proteomics. M.W. and B.K. are founders and share- Klaeger,S. et al. (2014) Tracking cancer drugs in living cells by
holders of OmicScouts and msAId, which operate in the thermal profiling of the proteome. Science, 346, 1255784.
18. Klaeger,S., Heinzlmeir,S., Wilhelm,M., Polzer,H., Vick,B., Koenig,P.A., Reinecke,M., Ruprecht,B., Petzoldt,S., Meng,C. et al. (2017) The target landscape of clinical kinase drugs. Science, 358, eaan4368.
19. Koch,H., Busto,M.E., Kramer,K., Medard,G. and Kuster,B. (2015) Chemical proteomics uncovers EPHA2 as a mechanism of acquired resistance to small molecule EGFR kinase inhibition. J. Proteome Res., 14, 2617–2625.
20. Heinzlmeir,S., Kudlinzki,D., Sreeramulu,S., Klaeger,S., Gande,S.L., Linhard,V., Wilhelm,M., Qiao,H., Helm,D., Ruprecht,B. et al. (2016) Chemical proteomics and structural biology define EPHA2 inhibition by clinical kinase drugs. ACS Chem. Biol., 11, 3400–3411.
21. Heinzlmeir,S., Lohse,J., Treiber,T., Kudlinzki,D., Linhard,V., Gande,S.L., Sreeramulu,S., Saxena,K., Liu,X., Wilhelm,M. et al. (2017) Chemoproteomics-aided medicinal chemistry for the discovery of EPHA2 inhibitors. ChemMedChem, 12, 999–1011.
22. Zecha,J., Meng,C., Zolg,D.P., Samaras,P., Wilhelm,M. and Kuster,B. (2018) Peptide level turnover measurements enable the study of proteoform dynamics. Mol. Cell. Proteomics, 17, 974–992.
23. Savitski,M.M., Zinn,N., Faelth-Savitski,M., Poeckel,D., Gade,S., Becher,I., Muelbaier,M., Wagner,A.J., Strohmer,K., Werner,T. et al. (2018) Multiplexed proteome dynamics profiling reveals mechanisms controlling protein homeostasis. Cell, 173, 260–274.
24. Stein,C.K., Qu,P., Epstein,J., Buros,A., Rosenthal,A., Crowley,J., Morgan,G. and Barlogie,B. (2015) Removing batch effects from purified plasma cell gene expression microarrays with modified ComBat. BMC Bioinformatics, 16, 63.
25. Chen,C., Grennan,K., Badner,J., Zhang,D., Gershon,E., Jin,L. and Liu,C. (2011) Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods. PLoS One, 6, e17238.
26. Johnson,W.E., Li,C. and Rabinovic,A. (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8, 118–127.
27. Frejno,M., Zenezini Chiozzi,R., Wilhelm,M., Koch,H., Zheng,R., Klaeger,S., Ruprecht,B., Meng,C., Kramer,K., Jarzab,A. et al. (2017) Pharmacoproteomic characterisation of human colon and rectal cancer. Mol. Syst. Biol., 13, 951.
28. Gholami,A.M., Hahne,H., Wu,Z., Auer,F.J., Meng,C., Wilhelm,M. and Kuster,B. (2013) Global proteome analysis of the NCI-60 cell line panel. Cell Rep., 4, 609–620.
29. Lawrence,R.T., Perez,E.M., Hernandez,D., Miller,C.P., Haas,K.M., Irie,H.Y., Lee,S.I., Blau,C.A. and Villen,J. (2015) The proteomic landscape of triple-negative breast cancer. Cell Rep., 11, 630–644.
30. Zou,H. and Hastie,T. (2005) Regularization and variable selection via the elastic net. J. R. Stat. Soc.: B (Stat. Methodol.), 67, 301–320.
31. Savitski,M.M., Wilhelm,M., Hahne,H., Kuster,B. and Bantscheff,M. (2015) A scalable approach for protein false discovery rate estimation in large proteomic data sets. Mol. Cell. Proteomics, 14, 2394–2404.
32. Walls,R.L., Cooper,L., Elser,J., Gandolfo,M.A., Mungall,C.J., Smith,B., Stevenson,D.W. and Jaiswal,P. (2019) The plant ontology facilitates comparisons of plant development stages across species. Front. Plant Sci., 10, 631.
33. Gremse,M., Chang,A., Schomburg,I., Grote,A., Scheer,M., Ebeling,C. and Schomburg,D. (2011) The BRENDA Tissue Ontology (BTO): the first all-integrating ontology of all organisms for enzyme sources. Nucleic Acids Res., 39, D507–D513.
34. Letunic,I. and Bork,P. (2018) 20 years of the SMART protein domain annotation resource. Nucleic Acids Res., 46, D493–D496.
35. Szklarczyk,D., Gable,A.L., Lyon,D., Junge,A., Wyder,S., Huerta-Cepas,J., Simonovic,M., Doncheva,N.T., Morris,J.H., Bork,P. et al. (2019) STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res., 47, D607–D613.
36. Kanehisa,M. and Goto,S. (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res., 28, 27–30.
37. Zolg,D.P., Wilhelm,M., Yu,P., Knaute,T., Zerweck,J., Wenschuh,H., Reimer,U., Schnatbaum,K. and Kuster,B. (2017) PROCAL: a set of 40 peptide standards for retention time indexing, column performance monitoring, and collision energy calibration. Proteomics, 17, 1700263.
38. Wilkinson,M.D., Dumontier,M., Aalbersberg,I.J., Appleton,G., Axton,M., Baak,A., Blomberg,N., Boiten,J.W., da Silva Santos,L.B., Bourne,P.E. et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data, 3, 160018.
39. Li,X., Li,Z., Wu,X., Xiong,Z., Yang,T., Fu,Z., Liu,X., Tan,X., Zhong,F., Wan,X. et al. (2019) Deep learning enhancing kinome-wide polypharmacology profiling: model construction and experiment validation. J. Med. Chem., doi:10.1021/acs.jmedchem.9b00855.
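The batch-effect correction described earlier in this section has two steps. ComBat (26) handles intra-omics correction and MComBat (24) handles inter-omics correction. Both estimate batch parameters with empirical Bayes methods that do not fit into a short example.

The stdlib-only Python sketch below shows only the underlying location/scale idea, under a stated assumption: every batch is mapped onto the mean and standard deviation of one chosen reference batch. This assumption is based on the truncated remark that MComBat, unlike ComBat, lets the user select something. The sketch omits empirical Bayes shrinkage and covariates. `adjust_to_reference` is a hypothetical helper, not part of ProteomicsDB or of any ComBat package.

```python
import statistics


def adjust_to_reference(values, batches, reference):
    """Map one feature's values from every batch onto a reference batch.

    values    -- expression values of one protein or gene, one per sample
    batches   -- batch label of each sample (same order as values)
    reference -- label of the batch whose mean and standard deviation are kept
    Each batch needs at least two samples, because statistics.stdev requires them.
    """
    groups = {}
    for value, batch in zip(values, batches):
        groups.setdefault(batch, []).append(value)
    # location (mean) and scale (sample standard deviation) per batch
    params = {b: (statistics.fmean(g), statistics.stdev(g)) for b, g in groups.items()}
    ref_mean, ref_sd = params[reference]
    adjusted = []
    for value, batch in zip(values, batches):
        mean_b, sd_b = params[batch]
        # standardize within the batch, then rescale to the reference batch
        z = (value - mean_b) / sd_b if sd_b > 0 else 0.0
        adjusted.append(z * ref_sd + ref_mean)
    return adjusted
```

For example, take values [1, 2, 3] in the reference batch and [10, 12, 14] in a second batch. The second batch is shifted by its mean difference and rescaled by its standard-deviation ratio, so both batches end up as [1, 2, 3].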
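The drug sensitivity models described earlier combine three elements: elastic net regression (30), bootstrapped models, and a volcano plot of mean effect size versus selection frequency. A minimal stdlib-only Python sketch can illustrate how they fit together. It is not ProteomicsDB's implementation. The following choices are assumptions made for illustration:

- the coordinate-descent solver
- the glmnet-style penalty parametrization (`lam`, `alpha`)
- the default of 50 bootstraps
- the helper names `elastic_net` and `bootstrap_selection`

Here, selection frequency is the fraction of bootstrap models with a non-zero coefficient, and the mean effect is the coefficient averaged over all bootstrap models.

```python
import random
import statistics


def soft_threshold(z, gamma):
    """Soft-thresholding operator that produces the sparsity of the L1 penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=100):
    """Fit an elastic net by cyclic coordinate descent.

    Minimizes (1/2n)*||y - b0 - X.beta||^2 + lam*(alpha*|beta|_1 + (1-alpha)/2*|beta|_2^2)
    and returns (b0, beta). X is a list of samples, each a list of protein values.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    b0 = statistics.fmean(y)
    resid = [yi - b0 for yi in y]  # residuals for the current (b0, beta)
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero predictor carries no information
            # correlation of predictor j with the residual that excludes predictor j
            z = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(z, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
        shift = statistics.fmean(resid)  # re-fit the unpenalized intercept
        b0 += shift
        resid = [r - shift for r in resid]
    return b0, beta


def bootstrap_selection(X, y, n_boot=50, seed=0, **fit_args):
    """Refit on bootstrap resamples; return (selection_frequency, mean_effect) per predictor."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    selected = [0] * p
    totals = [0.0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        _, beta = elastic_net([X[i] for i in idx], [y[i] for i in idx], **fit_args)
        for j, b in enumerate(beta):
            if b != 0.0:
                selected[j] += 1
            totals[j] += b
    return [s / n_boot for s in selected], [t / n_boot for t in totals]
```

Consider a protein-by-sample matrix with, for instance, AUC values as `y`. Proteins with a high selection frequency and a large positive or negative mean effect would fall into the upper right or upper left corner of such a volcano plot. In practice, predictors are usually standardized first so that a single penalty treats all proteins alike.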
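The picked FDR approach (31), mentioned for the Arabidopsis thaliana import, can be sketched as follows. The text only states that appropriate decoys were created for every protease. The sketch therefore assumes simple sequence reversal as the decoy strategy and a single score per protein. `reverse_decoy` and `picked_fdr` are hypothetical helpers.

The core idea has three parts:

- Each target protein and its decoy form a pair.
- Only the better-scoring member of the pair is kept.
- The FDR is estimated as the number of decoys divided by the number of targets above a score threshold.

```python
def reverse_decoy(sequence):
    """Decoy protein made by reversing the target sequence (assumed strategy)."""
    return sequence[::-1]


def picked_fdr(target_scores, decoy_scores):
    """Picked target-decoy FDR at the protein level.

    target_scores / decoy_scores map a protein accession to the score of the
    target protein and of its decoy counterpart. Returns tuples
    (accession, is_decoy, score, q_value) sorted by decreasing score.
    """
    picked = []
    for acc in sorted(set(target_scores) | set(decoy_scores)):
        t = target_scores.get(acc, float("-inf"))
        d = decoy_scores.get(acc, float("-inf"))
        # keep only the better-scoring member of each target/decoy pair
        picked.append((acc, False, t) if t >= d else (acc, True, d))
    picked.sort(key=lambda item: item[2], reverse=True)

    fdr, n_target, n_decoy = [], 0, 0
    for _, is_decoy, _ in picked:
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        fdr.append(n_decoy / max(n_target, 1))  # decoys / targets above threshold

    # q-value: smallest FDR at which this entry would still be accepted
    q_values, best = [0.0] * len(fdr), float("inf")
    for i in range(len(fdr) - 1, -1, -1):
        best = min(best, fdr[i])
        q_values[i] = best
    return [(acc, dec, score, q) for (acc, dec, score), q in zip(picked, q_values)]
```

Take, for example, target scores {"A": 10, "B": 9, "C": 3, "D": 2} and decoy scores {"A": 1, "C": 5, "E": 4}. Protein C is represented by its better-scoring decoy. The ranking becomes A, B, decoy C, decoy E, D, with q-values 0, 0, 0.5, 2/3 and 2/3.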
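Validating identifications against reference spectra requires a similarity score. This applies both to the mirror spectrum viewer and to the planned systematic validation by spectral similarity. The text does not say which score ProteomicsDB uses. The sketch below assumes the normalized spectral contrast angle, a common choice for comparing predicted and experimental fragment intensities. It also assumes that both spectra have already been matched to the same list of fragment ions. `spectral_angle` is a hypothetical helper.

```python
import math


def spectral_angle(observed, predicted):
    """Normalized spectral contrast angle of two aligned intensity vectors.

    Both lists hold intensities of the same fragment ions in the same order
    (0.0 where an ion is missing). Returns 1.0 for proportional spectra and
    0.0 for spectra with no shared intensity.
    """
    if len(observed) != len(predicted):
        raise ValueError("intensity vectors must cover the same fragment ions")
    norm_o = math.sqrt(sum(v * v for v in observed))
    norm_p = math.sqrt(sum(v * v for v in predicted))
    if norm_o == 0.0 or norm_p == 0.0:
        return 0.0
    cosine = sum(o * p for o, p in zip(observed, predicted)) / (norm_o * norm_p)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding outside [-1, 1]
    return 1.0 - 2.0 * math.acos(cosine) / math.pi
```

Spectra with proportional intensities score 1.0 (for example [1, 2, 3] against [2, 4, 6]). Spectra without shared intensity score 0.0. Compared with a plain cosine, the angle-based score spreads out values near perfect similarity. A cosine of 0.99, for instance, maps to about 0.91.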
