
Reviews: Next-Generation Computational Tools For Interrogating Cancer Immunity

Cancer immunotherapy is revolutionizing oncology. New technologies like single-cell sequencing and mass cytometry generate large datasets that require computational analysis. This review discusses computational tools for analyzing cancer immunity using these datasets. It describes hallmarks of cancer immunity like antigen presentation and the cancer-immunity cycle. Tools are presented for neoantigen prediction, characterizing tumor-infiltrating immune cells from bulk and single-cell data, analyzing T and B cell repertoires, and examining cellular phenotypes from images and single-cell data.


REVIEWS

Next-generation computational tools for interrogating cancer immunity
Francesca Finotello, Dietmar Rieder, Hubert Hackl and Zlatko Trajanoski*
Abstract | The remarkable success of cancer therapies with immune checkpoint blockers is
revolutionizing oncology and has sparked intensive basic and translational research into the
mechanisms of cancer–immune cell interactions. In parallel, numerous novel cutting-edge
technologies for comprehensive molecular and cellular characterization of cancer immunity
have been developed, including single-cell sequencing, mass cytometry and multiplexed spatial
cellular phenotyping. In order to process, analyse and visualize multidimensional data sets
generated by these technologies, computational methods and software tools are required.
Here, we review computational tools for interrogating cancer immunity, discuss advantages
and limitations of the various methods and provide guidelines to assist in method selection.

Biocenter, Institute of Bioinformatics, Medical University of Innsbruck, Innsbruck, Austria.
*e-mail: zlatko.trajanoski@i-med.ac.at
https://doi.org/10.1038/s41576-019-0166-7

Cancer immunotherapy is revolutionizing oncology. Given the success in achieving long-term durable responses in numerous advanced and metastatic solid cancers, cancer immunotherapy sparked tremendous interest and research activities in basic, translational and clinical science. This is evident not only from the increasing number of publications but also from the sheer number of ongoing clinical trials and patients enrolled or to be recruited (2,250 active trials with blockers of programmed cell death protein 1 (PD1) or one of its ligands, programmed cell death 1 ligand 1 (PDL1), encompassing 380,900 patients, as of September 2018 (ref.1)). The research activities are likely to have a major impact in the field and provide novel mechanistic insights into the complex tumour–immune cell interactions. However, major challenges still remain, including the elucidation of mechanisms of intrinsic and acquired resistance to therapy with immune checkpoint blockers, the identification of predictive markers for response, the determination of mechanistic rationales for combination therapies with synergistic potential, the identification and selection of neoantigens for therapeutic cancer vaccination and the determination of targets for adoptive therapy with engineered T cells.

The intrinsic complexity of the interaction of the two interwoven systems, the tumour and the immune system, poses considerable challenges and requires comprehensive approaches to interrogate cancer immunity during tumour initiation and progression, and following therapeutic modulation thereof. Several established and novel high-throughput technologies enable the generation of the necessary data and thereby provide the basis for mechanistic understanding, and ultimately increase the number of patients that benefit from cancer immunotherapy. Next-generation sequencing (NGS) technologies have not only provided large data sets that can be mined for immunologically relevant parameters2,3 but are also increasingly used in a clinical setting to inform cancer therapy. Additionally, novel technologies such as single-cell RNA sequencing (scRNA-seq) and mass cytometry by time of flight (CyTOF) have matured and enable for the first time the precise characterization of molecular processes at the single-cell level. Obviously, the widespread use of NGS techniques and continuous development of novel medium-to-high-throughput technologies require an expanded computational toolbox for the analysis and visualization of heterogeneous data.

Here, we review computational tools for interrogating cancer immunity, discuss advantages and limitations of various methods and provide guidelines to assist in method selection. This Review is complementary to our previous work in which computational genomics tools for cancer immunology were described4. We first briefly describe the different hallmarks of cancer immunity and then give an overview of cutting-edge experimental techniques for single-cell analysis and spatial cellular phenotyping. This is followed by the main focus of the Review on computational methods for interrogating cancer immunity covering neoantigen prediction, characterization of tumour-infiltrating immune cells using bulk tissue and single-cell approaches, analysis of T cell and B cell repertoires, analysis of cellular phenotypes from histological images and single-cell data visualization.

Immune checkpoint blockers
Monoclonal antibodies that target immune checkpoints to elicit or boost anticancer immune responses. Immune checkpoints are receptors or their ligands expressed on either tumour cells or immune cells that modulate immune cell responses to self-proteins, chronic infections and tumour antigens.

Neoantigens
Short peptides generated from the expression of mutated or rearranged genes in cancer cells, but not in normal cells. Bound to HLA molecules on the surface of cancer cells, neoantigens are recognized by T cells through the interaction of the T cell receptor with the peptide–HLA complex.

Hallmarks of cancer immunity
The interactions between tumour and immune cells have been conceptualized as a series of events, referred to as the cancer-immunity cycle5, occurring at distinct anatomical sites6 (Fig. 1). Hence, it is likely that

Nature Reviews | Genetics


[Figure 1, panels a–d. a: priming and activation in the lymph node and effector response in the tumour microenvironment. b: response as a function of composition, localization and function. c: cell types of the tumour microenvironment. d: immune state of the host.]

Fig. 1 | Distinct hallmarks of cancer immunity. a | In the tumour microenvironment, neoantigens released by dying cancer
cells are captured by dendritic cells for processing. After homing to draining lymph nodes, dendritic cells present the
captured neoantigens to T cells, inducing their priming, activation and clonal expansion. Activated T cells migrate into the
tumour microenvironment, where they exert anticancer immune responses through secretion of molecules such as
granzyme B (GZMB) and interferon-γ (IFNγ). b | Three aspects of the tumour immune contexture determine the likelihood of
cancer patients to respond to immunotherapy: composition of the immune infiltrates in terms of effector and suppressor
cells; localization of immune cells, which can be infiltrating the tumour core (immune inflamed phenotype), confined at the
tumour margin (immune excluded phenotype) or absent from the tumour mass (immune desert phenotype); and function
of effector cells, which can be fully activated or dysfunctional. c | The tumour microenvironment is a complex ecosystem
composed of different cell types, which include cancer cells, epithelial cells, cancer-associated fibroblasts (CAFs) and
immune cells, such as cytotoxic T cells, regulatory T (Treg) cells and myeloid suppressor cells. d | Besides the tumour immune
contexture, different systemic parameters influence the patient’s outcome and response to therapy, including chronic viral
infections, local microbiota, immune state of the host, immunomodulating medications and circulating cytokines. ECM,
extracellular matrix; MDSC, myeloid-derived suppressor cell; MSC, mesenchymal stem cell; NK cell, natural killer cell; RBC,
red blood cell. Part a is adapted from ref.4, Springer Nature Limited. Part c is adapted from ref.204, Springer Nature Limited.

multiparametric assessment rather than the use of single parameters is necessary to dissect the complex tumour–immune cell interactions and inform cancer immunotherapy. Comprehensive characterization of cancer immunity requires the determination of the following broad characteristics: neoantigens, immune contexture, tumour microenvironment (TME), and host and environmental factors.

Neoantigens. Cytotoxic CD8+ T cells are at the core of immunological tumour control and response to anticancer therapies5,7,8. After priming and activation by

www.nature.com/nrg

dendritic cells in draining lymph nodes (Fig. 1a), CD8+ T cells can recognize tumour neoantigens (that is, short peptides generated from the expression of mutated or rearranged genes bound to class I HLA molecules of tumour cells) and induce anticancer immune responses4,9,10. Accumulating evidence suggests that neoantigens are major determinants of the response to immunotherapy with checkpoint blockers, and their computational or experimental characterization in cancer patients is the basis for personalized cancer vaccines and T cell-based immunotherapies9–11.

Immune contexture. Immune cells infiltrating into the tumour have profound effects on the clinical response to immunotherapy, and it has become clear that not only the composition but also their localization and their functional orientation (jointly referred to as the immune contexture) determine the efficacy of anticancer immune responses6,12. Patients with high densities of specific immune cell subpopulations in the tumour centre or invasive margin have better prognosis13, suggesting that the immune system is controlling the growth of the tumour. Beyond the prognostic value, the immune contexture has profound effects on the response to cancer immunotherapies (Fig. 1b): hot tumours are more amenable to checkpoint-blocker-based monotherapy or combination therapy than cold tumours14. Hence, the quantitation of the immune contexture in archived and prospective samples will provide valuable information for improving cancer immunotherapy.

Tumour microenvironment. The TME comprises not only cancer cells, normal epithelial cells and immune cells from the adaptive and the innate lineages but also cancer-associated fibroblasts (CAFs), endothelial cells, mesenchymal stem cells, the extracellular matrix and the basement membrane (Fig. 1c). CAFs provide physical support for epithelial cells, release various tumour-promoting cytokines and chemokines (which favour tumour growth and angiogenesis) and are major contributors to an immunosuppressive TME. Importantly, anticancer immunity can be therapeutically exploited by using combinations of immune checkpoint blockers and anti-angiogenic inhibitors15. Thus, it is necessary to determine the cellular components of the tumour environment and investigate their interactions. Finally, it might be necessary to also include data from measured physical properties because aberrant cell mechanics is crucial for altered cellular behaviour and the onset of cancer16.

Host and environmental factors. Several systemic factors including the host microbiota have been associated with response to cancer therapy with immune checkpoint blockers (see a recent review11), indicating that systemic factors play a major role (Fig. 1d). Obviously, global immunological competence of the patient, including external factors such as infections or immunomodulating medications6, determines the likelihood of obtaining clinical benefit. Additionally, commensal microbes influence immune responses, indicating that antitumour immunity can also be modulated by the gut microbiota.

Dendritic cells
Professional antigen-presenting cells that act as messengers between the innate and the adaptive immune system. Dendritic cells capture antigens, transport them into lymphoid organs and present them to naive T cells together with co-stimulatory signals to induce T cell priming and activation.

Hot tumours
Immunogenic tumours with high infiltration of T cells and high likelihood of response to immune checkpoint inhibitor therapy (as opposed to cold tumours).

Cold tumours
Poorly immunogenic tumours with low or no infiltration of T cells and low likelihood of response to immune checkpoint inhibitor therapy (as opposed to hot tumours).

Microbiota
The community of microorganisms, including bacteria, viruses and fungi, which are found within a specific environment (for example, the human gut).

Microbiome
The collection of all genomes from all of the microorganisms composing the microbiota.

Technologies for interrogating cancer immunity
The application of NGS for genomic, transcriptomic and epigenomic profiling of tumours is building the main source of data that enables the extraction of hallmark characteristics. The application of these techniques on bulk tissue as well as the computational tools that enable the NGS data to be leveraged have been previously reviewed by us4. Additionally, microbiome analysis methods (reviewed elsewhere17) have advanced rapidly and now provide the means to study the microbiota composition and function from 16S ribosomal RNA sequencing, metagenomics and metatranscriptomics data. Major achievements in recent years have been the development of techniques for single-cell analysis and for multiplexed spatial cellular phenotyping. These techniques are of particular relevance for cancer immunity as they enable for the first time a comprehensive interrogation of cancer immunity, including characterization of the cellular composition of cancerous and normal tissue, quantification of the immune contexture and pairing of α-chains and β-chains of individual T cell receptors (TCRs) and B cell receptors (BCRs). The data types, the intermediate analyses and the immunogenomic analyses are shown in Fig. 2. Appropriate analyses of the data require an understanding of the experimental steps that generated those data in order to understand the origins of the key features as well as possible biases of the resultant data. We therefore first describe recent technological developments and then review the associated computational tools.

Single-cell omics of isolated cells. Whereas bulk RNA sequencing (RNA-seq) data enable only reconstruction of an average transcriptome of mixed cell populations, new scRNA-seq technologies can be used to reconstruct the transcriptomes of individual cells, opening new avenues for the study of the heterogeneity, plasticity and functional diversity of the immune system18,19. Most of the techniques to capture single cells can be assigned to either plate- or microfluidics-based methods20. Plate-based approaches such as Smart-seq2 (ref.21) sort cells into separate wells via fluorescence-activated cell sorting (FACS). They generate full-length transcripts from single cells and have higher sensitivity than microfluidics-based methods (that is, higher number of detected genes per cell), but lower throughput in terms of sequenced cells due to the complexity of the single-cell isolation step22. Microfluidics-based platforms such as the 10X Chromium (ref.23) generate nanolitre-sized droplets containing a single cell each, together with a barcoded bead and the reagents needed for the downstream reactions. These platforms are more cost-efficient and therefore enable profiling of a larger number of cells compared with the plate-based systems.

An alternative method for the analysis of single cells is CyTOF24, which characterizes cells according to their cell-surface-expressed proteins. In this method, metal-isotope-conjugated antibodies are used to stain cells, which are then subjected to a quadrupole time-of-flight (TOF) mass spectrometer analysis. The major advantage compared with traditional fluorescence-based flow cytometry is that there are no spectral overlaps and therefore the number of assessed markers can be larger


(currently up to 50 markers). Thus, systems-level analyses can now be applied to study the immune system in health and disease.

[Figure 2. Columns: data types (bulk sequencing: WGS, WES, RNA-seq; single-cell profiling: CyTOF, scRNA-seq; multiplexed imaging: multiplex IF, CycIF, IMC, MIBI), intermediate analyses and immunogenomic analyses (neoantigens; cell types and phenotypes; TCR and BCR).]
Fig. 2 | Overview of technologies and analyses for interrogating cancer immunity. Technologies for bulk sequencing allow the prediction of candidate neoantigens and deconvolution of cell fractions or computation of abundance scores using marker-gene-based approaches. Technologies for single-cell profiling allow the characterization of cell types and states from mass cytometry by time of flight (CyTOF) or single-cell RNA sequencing (scRNA-seq) data, but also the reconstruction of B cell receptors (BCRs) and T cell receptors (TCRs) of the same cells. Recent multiplexed imaging techniques can interrogate several cellular markers, enabling the phenotyping of distinct cell types and the reconstruction of the spatial architecture of the tumour microenvironment. CycIF, cyclic immunofluorescence; IF, immunofluorescence; IMC, imaging mass cytometry; MIBI, multiplexed ion beam imaging; RNA-seq, RNA sequencing; WES, whole-exome sequencing; WGS, whole-genome sequencing.

In situ single-cell analyses to quantify the immune contexture. The quantification of the immune contexture of human cancers has been investigated mainly with immunohistochemistry (IHC)- and immunofluorescence (IF)-based techniques using formalin-fixed, paraffin-embedded (FFPE) samples as they can be visualized by microscopy at single-cell resolution to provide spatial information on the preserved tumour tissue. IHC- and IF-based methods are limited by the number of available channels and allow reliable quantification of only 2–4 markers on a single tissue slide. Hence, the low number of channels that can be simultaneously analysed results in an incomplete characterization of the immunophenotypes. Staining of multiple consecutive tissue slides, each with different antibodies, could in principle extend the number of markers. However, correctly realigning and combining single slices is error-prone, and important parameters such as cell–cell distances cannot be reliably calculated.

To overcome this limitation, new imaging methods that provide a higher number of detectable markers have been developed. Multiplexed IF for up to 7 markers based on spectral unmixing of fluorescence signals (reviewed in refs25–27) is already commercially available (Vectra/Opal, Perkin Elmer) and has been established in many laboratories. Alternatively, sequential staining, imaging and stripping of the individual antibodies can be used. Using this approach, simultaneous imaging of 12 distinct biomarkers for lymphoid and myeloid lineages, as well as for functional orientation of T cells, was carried out on a single FFPE slide28.

There are several other IF-based methods that use sequential staining and imaging approaches to enable highly multiplexed quantification of biomarkers, such as cyclic immunofluorescence (CycIF)29 and multiplexed immunofluorescence (MxIF)30, as well as multiepitope ligand cartography (MELC)31. All three of these methods use photobleaching to achieve dye cycling and can image 100 antigens in a single sample. All multiplexed IHC and IF techniques require extensive development and validation tests for specific panels of markers in order to provide robust readouts.

Recent technological developments and commercial availability of highly multiplexed imaging of tumour tissue slides using mass cytometry (imaging CyTOF or imaging mass cytometry (IMC))32 hold promise to provide an unprecedented detailed view of the tissue heterogeneity. IMC combines mass cytometry with immunocytochemistry, classical IHC and high-resolution tissue laser ablation to image dozens of proteins simultaneously at subcellular resolution (<1 µm). Important advantages of IMC in comparison with the above-mentioned multiplex imaging methods are the absence of sample autofluorescence and the wide dynamic range, which make the method highly quantitative.

Multiplexed ion beam imaging (MIBI) is a recently developed method that is related to IMC but uses different isotopes to label antibodies and an ion beam to generate secondary ions that are then detected by a TOF mass spectrometer (MIBI-TOF)33. MIBI-TOF has been used to study the spatial organization of the TME34 and, similarly to IMC, allows for highly multiplexed staining (up to 100 markers). However, the availability of IMC- or MIBI-compatible antibodies, required instrumentation and relatively long image registration time (~2 h per 1 mm²) presently impose limitations to the broad application of these TOF-based imaging techniques.

Finally, a promising, highly multiplexed tissue imaging technique named CODEX was recently developed35. It uses DNA-barcoded antibodies and cyclic detection by primer extension with fluorescent-labelled nucleotides. CODEX allows deep profiling of cellular phenotypes in a reasonably short time (3.5 h for 30 markers), requiring only a standard motorized epifluorescence microscope and a simple automated fluidic set-up.

Of note, in contrast to scRNA-seq, antibody-based methods are limited to known sets of markers for the identification of different cell types and do not provide information about the functional state of specific immune cells. As there is currently no consensus on single markers for the functional state (for example, for exhausted T cells), a viable approach would be an integration of data derived from image-based methods (to determine the cell type and the location) with


scRNA-seq data from the same sample (to assign functional states using a panel of genes)36. However, defining a panel of genes and assigning specific functions remains arbitrary and we expect that community-organized consortial projects using scRNA-seq data and expert annotation such as the Human Cell Atlas37 will provide this valuable information in the near future.

Apart from using multiplexed imaging of specific markers, several methods that aim to measure the expression of tens to thousands of genes in situ may be used to quantify the immune contexture in an antibody-free and spatially resolved manner. By detecting transcripts instead of proteins, cells that produce secreted factors for which antibodies are not available can also be identified. Two main classes of methods use either hybridization (seqFISH38, seqFISH+ (ref.39) and MERFISH40) or sequencing (FISSEQ41, Spatial Transcriptomics42 and Slide-seq43). Although these promising methods have the capability to better characterize the spatial and functional composition of the immune landscape, they have certain advantages and limitations. For example, seqFISH and MERFISH enable subcellular resolution but are time-consuming and require a high number of probes to provide complete coverage. Slide-seq and Spatial Transcriptomics, by contrast, can provide a more complete transcriptome, but achieving single-cell resolution remains a challenge (Slide-seq resolution >10 µm). For FISSEQ, subcellular resolution is possible, but the detection threshold is high (>200 mRNA molecules per cell) and only a small number of transcripts can be analysed.

Computational tools for predicting neoantigens
In silico prediction of putative neoantigens from mutated genes consists of three main computational steps4 (Fig. 3a): first, identification of somatic mutations using whole-genome sequencing (WGS) or whole-exome sequencing (WES) data from paired tumour and normal tissue and reconstruction of mutated peptides; second, genotyping of the patient’s HLA genes from tumour RNA-seq or WES data; and, third, prediction of peptides binding to the patient’s HLA molecules.

Mutated peptides arising from somatic mutations can be predicted by comparing tumour versus normal-tissue NGS data from the same patient. NGS data for neoantigen prediction are generated preferentially from WES, which provides the deepest mutation coverage by restricting the assay only to protein-coding regions of the genome. The computational analysis consists of data pre-processing and quality control, identification of somatic mutations using tools for variant detection, and prediction of the affected proteins and functional impact using public repositories of genomic, transcriptomic and proteomic sequences. For a review and critical discussion of these approaches, we refer to previous literature4,44,45.

State-of-the-art methods for HLA typing from NGS data (Table 1) are mature and widely used, and include OptiType46 and Polysolver47, which showed high accuracy in 4-digit HLA typing, as well as seq2HLA48, which can compute both HLA types and allele-specific expression. More recent methods include Kourami49, HLA*LA50, arcasHLA51, xHLA52, HLA-HD (ref.53) and HLAProfiler54, which also perform class II HLA typing. However, unbiased benchmarking of these recent tools is not available and would be extremely useful for characterizing their accuracy in class II typing, for which very limited validation has so far been carried out.

Tools for predicting peptides binding to HLA molecules use machine-learning methods trained on large in vitro peptide–HLA binding data sets. NetMHC55 and its pan-allele version NetMHCpan56 are based on artificial neural networks and are currently the most widely used methods due to their high performance. Both tools predict the binding affinity as the half-maximal inhibitory concentration (IC50) expressed in nanomolar units, as well as the rank of predicted affinity compared with a set of random natural peptides, to account for allele-specific bias. Strong binders are usually selected considering a binding affinity or rank lower than 500 nM or 0.5%, respectively. Recent advancements in the field of deep learning have fostered the development of new machine-learning methods based on deep convolutional neural networks, such as HLA-CNN57 and DeepSeqPan58. In parallel, a pan-allele method called PSSMHCpan has been developed to leverage binding motifs to predict peptide binding affinity also for currently under-represented HLA alleles59. The recently developed pVACtools suite for the prediction and prioritization of putative neoantigens60 includes an updated version of the pVACseq61 pipeline that can compute binding-affinity predictions with different state-of-the-art machine-learning methods, as well as quantify features linked to antigen pre-processing and recognition (see Box 1).

Notably, only ~1–5% of the class I binders predicted in silico using different computational tools have been experimentally validated9. One possible reason for the discrepancy between predicted and experimentally validated neoantigens is the low sensitivity of mass spectrometry (MS)-based methods to directly identify binding peptides. Despite this limitation, MS measurements of eluted HLA-binding peptides can be used to directly interrogate the human immunopeptidome, namely the set of peptides presented on HLA molecules, and to enable reconstruction of antigen profiles presented in vivo that could not be captured from previous in vitro affinity studies62. Novel methods such as MHCflurry 1.2.0 (ref.63), ForestMHC64, MixMHCpred65,66 and EDGE67 as well as the latest version of NetMHC68 were also trained on MS data from HLA-eluted peptides, and the increasing amount of HLA–ligand MS data available in databases like IEDB69, PRIDE70 or SysteMHC Atlas71 can provide rich training data sets for the next-generation predictors. However, MS measurements have two major limitations: first, the requirement for a large amount of starting material (~1 × 10⁸ cells (ref.62)); and, second, the dependence on protein sequence databases for data analysis, which limits the identification of peptides to the annotated human proteome. The latter issue can be overcome with computational approaches for updating reference databases incorporating predicted non-canonical neoantigens, such as those derived from non-exonic regions,

4-digit HLA typing
The standard nomenclature of HLA alleles is composed of the gene name, an asterisk and eight digits separated by colons, for example, HLA-A*02:01:01:05. HLA alleles that differ at 4-digit resolution (for example, HLA-A*02:02 and HLA-A*02:01) have similar serological specificity for a peptide, but have different protein sequences that can result in different T cell recognition of the peptide–HLA complex.
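
The nomenclature in the glossary entry on 4-digit HLA typing can be illustrated with a small parser. The following Python code is a minimal sketch and is not taken from the Review or from any HLA typing tool: the names `parse_hla_allele`, `to_four_digit` and `same_four_digit_type` are hypothetical, and the pattern is an assumption that only covers names shaped like the glossary example (gene name, asterisk, one to four colon-separated numeric fields, no expression suffix).

```python
import re
from typing import NamedTuple, Tuple

# Assumed pattern: "HLA-<gene>*<field>[:<field>]{0,3}", as in HLA-A*02:01:01:05.
# Suffix letters and other naming variants are deliberately not handled here.
_ALLELE_RE = re.compile(r"^HLA-([A-Z0-9]+)\*(\d+(?::\d+){0,3})$")


class HlaAllele(NamedTuple):
    gene: str                 # e.g. "A"
    fields: Tuple[str, ...]   # e.g. ("02", "01", "01", "05")


def parse_hla_allele(name: str) -> HlaAllele:
    """Split an allele name into its gene and its colon-separated fields."""
    match = _ALLELE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized HLA allele name: {name!r}")
    return HlaAllele(gene=match.group(1), fields=tuple(match.group(2).split(":")))


def to_four_digit(name: str) -> str:
    """Truncate an allele name to 4-digit resolution (the first two fields)."""
    allele = parse_hla_allele(name)
    if len(allele.fields) < 2:
        raise ValueError(f"{name!r} has fewer than two fields")
    return f"HLA-{allele.gene}*{allele.fields[0]}:{allele.fields[1]}"


def same_four_digit_type(first: str, second: str) -> bool:
    """True if two allele names agree at 4-digit resolution."""
    return to_four_digit(first) == to_four_digit(second)


if __name__ == "__main__":
    print(to_four_digit("HLA-A*02:01:01:05"))
    print(same_four_digit_type("HLA-A*02:02", "HLA-A*02:01"))
```

Under these assumptions, the glossary example HLA-A*02:01:01:05 reduces to HLA-A*02:01, while HLA-A*02:02 and HLA-A*02:01 are reported as different 4-digit types, which is the distinction the glossary entry draws.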


[Figure 3, panels a–d. a: normal WES/WGS, tumour WES/WGS and tumour RNA-seq feed prediction of mutated peptides and HLA typing, followed by prediction of peptide–HLA binding affinity, yielding putative neoantigens. b: bulk RNA-seq (cellular composition), scRNA-seq (single-cell phenotypes) and multiplexed imaging (cells in spatial context) together describe the tumour immune contexture. c: scRNA-seq pipeline: quality control, gene selection, normalization, cell annotation. d: data types and computational analyses for class I HLA (>16,000 class I HLA alleles; WES or RNA-seq; class I HLA typing), neoantigen (~10¹⁴ possible 8–11mers; WES plus RNA-seq; prediction of neoantigen–HLA binding) and TCR (~10¹⁶ αβ TCRs; scRNA-seq; reconstruction of αβ TCR pairs).]

Fig. 3 | Overview of computational tools for interrogating cancer immunity. a | Putative neoantigens arising from the expression of somatic mutations can be predicted in silico through three main computational steps: prediction of mutated peptides using whole-exome sequencing (WES) or whole-genome sequencing (WGS) data from paired tumour and normal samples; HLA typing from tumour sequencing data (preferentially RNA sequencing (RNA-seq)); and prediction of the binding affinity between HLA types and mutated peptides. b | The analysis of different types of data can reveal different facets of the tumour immune contexture depending on their pros and cons. Bulk RNA-seq data can be analysed with deconvolution methods to quantify the fractions of different cell subpopulations, but cannot be used to study the phenotypes of single cells. By contrast, single-cell RNA-seq (scRNA-seq) is currently not optimal to quantitatively assess the cellular composition of the tumour, but can be used to portray single-cell types and states. Multiplexed imaging allows the study of cells in a spatial context, but only reconstructs a restricted, 2D portion of the tumour microenvironment and is limited in the number of markers that can be phenotyped. c | A basic scRNA-seq analysis pipeline consists of quality control and removal of low-quality cell profiles; selection of informative genes; normalization of expression profiles; and annotation of cell types. d | Schematic representation of the interaction between a tumour cell and a cytotoxic T cell: the T-cell receptor (TCR), composed of an α-chain and β-chain, interacts with the neoantigen bound on the class I HLA molecule of the tumour cell. In humans, there are more than 16,000 class I HLA alleles and ~10^16 αβ TCRs, whereas all possible peptides 8–11 amino acids long (mutated or not) amount to ~10^14 8–11mers. Class I HLA typing can be performed in silico using WES or RNA-seq data, whereas the binding between class I HLA molecules and putative neoantigens can be predicted by integrating WES (or WGS) and RNA-seq data (details in part a). αβ TCRs of single cells can be reconstructed from scRNA-seq data, but there are currently no computational methods to predict neoantigen recognition by TCRs. β2M, β2-microglobulin.
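The first step in part a and the peptide count in part d can be illustrated with a short Python sketch. It is a toy illustration under stated assumptions, not the procedure of any tool in Table 1: the function names `mutant_peptides` and `peptide_space`, the 0-based position convention and the toy protein sequence are all introduced here for illustration, and real pipelines derive mutated peptides from annotated somatic variants rather than from a hand-edited protein string.

```python
def mutant_peptides(protein, position, alt_residue, lengths=range(8, 12)):
    """Enumerate peptides that overlap a single substituted residue.

    protein     -- wild-type amino-acid sequence (hypothetical input)
    position    -- 0-based index of the substituted residue (assumed convention)
    alt_residue -- amino acid introduced by the mutation
    lengths     -- peptide lengths to enumerate (8-11mers by default)
    """
    if not 0 <= position < len(protein):
        raise ValueError("position lies outside the protein sequence")
    mutated = protein[:position] + alt_residue + protein[position + 1:]
    peptides = set()
    for k in lengths:
        # Windows of length k that still contain the mutated position.
        first_start = max(0, position - k + 1)
        last_start = min(position, len(mutated) - k)
        for start in range(first_start, last_start + 1):
            peptides.add(mutated[start:start + k])
    return sorted(peptides)


def peptide_space(lengths=range(8, 12), alphabet_size=20):
    """Count all possible peptides of the given lengths over 20 amino acids."""
    return sum(alphabet_size ** k for k in lengths)


if __name__ == "__main__":
    toy_protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLI"  # made-up sequence
    candidates = mutant_peptides(toy_protein, position=15, alt_residue="E")
    print(len(candidates), "candidate peptides overlap the substitution")
    print(f"{peptide_space():.2e} possible 8-11mers")
```

Here `peptide_space()` evaluates 20^8 + 20^9 + 20^10 + 20^11, roughly 2.2 × 10^14, which matches the order of magnitude quoted in the caption. Each candidate peptide would then be passed, together with the patient's HLA types, to a binding or presentation predictor.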

insertions or deletions (indels), gene fusions, alternative splicing or post-translational modifications (see Box 1). However, when supplementing peptide databases with non-canonical peptides, care must be taken to avoid false positives. The potential relevance of non-canonical neoantigens was shown in a recent study on patients with head and neck cancer treated with immune checkpoint inhibitors, demonstrating that gene fusions are a source of immunogenic neoantigens that can mediate responses to immunotherapy in patients with low mutational load and low pretreatment immune infiltration72.

Despite the progress in predicting class I HLA neoantigens, the current tools for predicting class II HLA neoantigens recognized by CD4+ T cells have limited accuracy and are advancing slowly due to a lack of proper training data. Since our latest review4, the landscape of class II predictors has evolved little, with NetMHCII and NetMHCIIpan73 still representing the top performers74 and only one novel method proposed: MixMHC2pred75. MixMHC2pred was trained on MS-based, class II immunopeptidomics data and demonstrated higher accuracy compared with NetMHCIIpan75.

Overall, recent developments in deep learning algorithms and MS-based immunopeptidomics have created fertile ground for the development of next-generation predictors of HLA presentation.

www.nature.com/nrg

Table 1 | Next-generation computational tools to interrogate cancer immunity

Tool | Description | Notable features | Required bioinformatic expertise | Software | Ref.

HLA typing
OptiType | Class I HLA typing | Analysis of RNA-seq, WGS and WES; 4-digit resolution | +++ | https://github.com/FRED-2/OptiType | 46
Polysolver | Class I HLA typing | Analysis of WES data; HLA mutation calling; up to 8-digit resolution | +++ | https://software.broadinstitute.org/cancer/cga/polysolver | 47
seq2HLA | Class I and II HLA typing | Analysis of paired-end RNA-seq data; quantification of HLA allele expression; 4-digit resolution | +++ | https://bitbucket.org/sebastian_boegel/seq2hla | 48
Kourami | Class I and II HLA typing | Analysis of WGS data; discovery of novel alleles; up to 6-digit resolution | +++ | https://github.com/Kingsford-Group/kourami | 49
HLA*LA | Class I and II HLA typing | Analysis of WGS and WES data, long reads and assemblies; up to 6-digit resolution | +++ | https://github.com/AlexanderDilthey/MHC-PRG | 50
arcasHLA | Class I and II HLA typing | Analysis of WGS and WES data, long reads and assemblies; up to 6-digit resolution | +++ | https://github.com/RabadanLab/arcasHLA | 51
xHLA | Class I and II HLA typing | Analysis of WGS and WES data; 4-digit resolution | +++ | https://github.com/humanlongevity/HLA | 52
HLA-HD | Class I and II HLA typing | Analysis of WGS, WES and RNA-seq data; discovery of novel alleles; up to 6-digit resolution | +++ | https://www.genome.med.kyoto-u.ac.jp/HLA-HD | 53
HLAProfiler | Class I and II HLA typing | Analysis of RNA-seq data; up to 6-digit resolution | +++ | https://expressionanalysis.github.io/HLAProfiler | 54

Neoantigen prediction
NetMHC | Prediction of class I peptide–MHC binding | Based on neural networks; analysis of human and non-human HLA molecules | +^a | http://www.cbs.dtu.dk/services/NetMHC | 68
NetMHCpan | Pan-allele version of NetMHC | Based on neural networks; applicable to under-represented alleles | +^a | http://www.cbs.dtu.dk/services/NetMHCpan | 56
HLA-CNN | Prediction of class I peptide–HLA binding affinity | Based on deep convolutional networks | +++ | https://github.com/uci-cbcl/HLA-bind | 57
DeepSeqPan | Prediction of class I peptide–HLA binding affinity | Based on deep convolutional networks | +++ | https://github.com/pcpLiu/DeepSeqPan | 58
PSSMHCpan | Prediction of class I peptide–HLA binding affinity | Based on position-specific scoring matrices; applicable to under-represented alleles | +++ | https://github.com/BGI2016/PSSMHCpan | 59
MHCflurry | Prediction of class I peptide–MHC binding | Based on neural networks; fast analysis; trained also on mass spectrometry data | +++ | https://github.com/openvax/mhcflurry | 63
ForestMHC | Prediction of probabilities of peptide–HLA presentation | Based on random forest classifiers; trained also on mass spectrometry data | +++ | https://github.com/kmboehm/ForestMHC | 64
MixMHCpred | Prediction of class I peptide–HLA binding affinity | Based on position-specific scoring matrices; trained also on mass spectrometry data | +++ | https://github.com/GfellerLab/MixMHCpred | 65,66
MixMHC2pred | Prediction of class II peptide–HLA binding affinity | Based on position-specific scoring matrices; trained also on mass spectrometry data | +++ | https://github.com/GfellerLab/MixMHC2pred | 75
EDGE | Prediction of probabilities of peptide–HLA presentation | Based on neural networks; trained also on mass spectrometry and RNA-seq data | +++ | Supplementary software in the original publication | 67
NetMHCstab | Prediction of peptide–HLA (A and B) binding stability | Based on neural networks | +^a | http://www.cbs.dtu.dk/services/NetMHCstab | 186
NetMHCstabpan | Pan-allele version of NetMHCstab | Based on neural networks; applicable to under-represented alleles | +^a | http://www.cbs.dtu.dk/services/NetMHCstabpan | 187
pVACseq | Identification and prioritization of neoantigens from NGS data | Based on several tools for class I and II peptide–HLA binding, cleavage and stability prediction; part of the pVACtools suite for neoantigen vaccine design | +++ | https://pvactools.readthedocs.io | 61

Cell-type deconvolution and quantification
xCell | Quantification of cell types from bulk transcriptomic data | Abundance scores based on GSEA; 64 immune and non-immune cell types; inter-sample comparison | +^a | http://xcell.ucsf.edu | 83
TIminer | Computational framework to perform onco-immunogenomic analyses | Full pipeline, including also GSEA based on three gene-set compendia; inter-sample comparison | +++ | https://icbi.i-med.ac.at/software/timiner/doc | 84
MCP-counter | Quantification of cell types from bulk transcriptomic data | Abundance scores based on marker-gene expression; eight immune cell types, fibroblasts and endothelial cells; inter-sample comparison | ++ | https://github.com/ebecht/MCPcounter | 85
CIBERSORT | Deconvolution of cell types from bulk transcriptomic data | SVR-deconvolution of 22 immune cell phenotypes; web interface and standalone R script; cell fractions referred to total immune cells; intra-sample comparison | +^a | https://cibersort.stanford.edu | 87
CIBERSORTx | Building of custom signatures and deconvolution of cell types from bulk transcriptomic data and reconstruction of cell-specific transcriptional profiles | Deconvolution based on CIBERSORT; building of custom signature matrices from single-cell or bulk transcriptomic data; normalization for batch effect removal; extraction of cell-specific transcriptional profiles | +^a | https://cibersortx.stanford.edu | 88
TIMER | Deconvolution of cell types from bulk-tumour transcriptomic data | Abundance scores based on deconvolution; 6 immune cell types; 33 bulk-tumour types; inter-sample comparison | +^a | https://cistrome.shinyapps.io/timer | 89
EPIC | Deconvolution of cell types from bulk transcriptomic data | Deconvolution based on constrained least-squares regression; signatures from RNA-seq data; applicable to blood (six immune cell types) and tumour data (five immune cell types, fibroblasts, endothelial cells); inter- and intra-sample comparison | ++ | https://gfellerlab.shinyapps.io/EPIC_1-1 | 90
quanTIseq | Pipeline for bulk RNA-seq data pre-processing and deconvolution of absolute cell fractions | Deconvolution based on constrained least-squares regression; signatures from RNA-seq data; full pipeline to process raw RNA-seq data; applicable to blood and tumour data; ten immune cell types; inter- and intra-sample comparison | ++ | https://icbi.i-med.ac.at/quantiseq | 91
immunedeconv | Unified framework for applying different cell-type quantification methods to bulk RNA-seq data | R package providing access to CIBERSORT, quanTIseq, TIMER, EPIC and xCell | ++ | https://github.com/grst/immunedeconv | 92
linseed | Complete deconvolution of cell fractions and transcriptional profiles from transcriptomic data | Complete deconvolution based on SVD; estimation of the unknown number of cell types; working only for cell types with variable abundance in the input data set | ++ | https://github.com/ctlab/LinSeed | 93

scRNA-seq data analysis
Seurat | Pipeline for scRNA-seq expression data analysis | R-based | ++ | https://satijalab.org/seurat | 36,105
Scater | Pipeline for scRNA-seq expression data analysis | R-based | ++ | https://bioconductor.org/packages/release/bioc/html/scater.html | 106
SINCERA | Pipeline for scRNA-seq expression data analysis | R-based | ++ | https://research.cchmc.org/pbge/sincera.html | 107
Scran | Pipeline for scRNA-seq expression data analysis | R-based | ++ | https://bioconductor.org/packages/release/bioc/html/scran.html | 108
Scanpy | Pipeline for scRNA-seq expression data analysis | Python-based | ++ | https://github.com/theislab/Scanpy | 109
Granatum | Pipeline for scRNA-seq expression data analysis | Web interface | + | http://garmiregroup.org/granatum/app | 110
ASAP | Pipeline for scRNA-seq expression data analysis | Web interface | + | https://asap.epfl.ch | 111
RaceID3 | Identification of rare cell types from scRNA-seq data | R-based; rare cells identified as outliers of K-medoid clustering | ++ | https://github.com/dgrun/RaceID3_StemID2_package | 115
GiniClust | Identification of rare cell types from scRNA-seq data | Based on Python and R; Gini inequality index used for selection of genes specific to rare cell types | ++ | https://github.com/lanjiangboston/GiniClust | 116
scmap | Cell-type annotation from scRNA-seq data | R and cloud implementation; projection of query data onto cell types or individual cells from other experiments | +^a | https://github.com/hemberg-lab/scmap | 117
CellFishing.jl | Cell-type annotation from scRNA-seq data | Julia-based; mapping of query data to user-supplied, reference expression data sets | +++ | https://github.com/bicycle1885/CellFishing.jl | 206
CellAtlasSearch | Cell-type annotation from scRNA-seq data | Web interface; mapping of query data to an annotated database of single-cell and bulk expression profiles; input files limited to 20 MB | + | http://www.cellatlassearch.com | 207
SingleR | Cell-type annotation from scRNA-seq data | R-based; based on correlation analysis considering reference transcriptomes from human and mouse cell profiles with microarrays and RNA-seq | ++ | https://github.com/dviraran/SingleR | 118
Garnett | Cell-type classification from scRNA-seq data | R-based (builds on Monocle, ref. 157); based on cell-type markers, scRNA-seq reference data and hierarchy of cell subtypes; availability of pre-trained classifiers | ++ | https://cole-trapnell-lab.github.io/garnett | 119
scMatch | Cell-type annotation from scRNA-seq data | Python-based; based on correlation analysis considering reference transcriptomes or cell ontologies | ++ | https://github.com/forrest-lab/scMatch | 121

TCR and BCR repertoires
MiXCR | Extraction of BCR and TCR sequences | Analysis of raw data from bulk RNA-seq or targeted sequencing data, single or paired end; PCR-error correction and identification of germline hypermutations | +++ | https://mixcr.readthedocs.io | 124,125
TRUST | Extraction of BCR and TCR sequences | Analysis of mapped reads from bulk RNA-seq data | ++ | https://bitbucket.org/liulab/trust | 126,127
V’DJer | Extraction of BCR sequences | Analysis of mapped reads from bulk RNA-seq data, also short reads; clonotype quantification not included | +++ | https://github.com/mozack/vdjer | 129
TraCeR | Reconstruction of paired TCR sequences | Analysis of raw data from full-transcript, paired-end scRNA-seq data; inference of clonality and reconstruction of clonotype networks | +++ | https://github.com/Teichlab/tracer | 130
TRAPeS | Reconstruction of paired TCR sequences | Analysis of raw data from full-transcript scRNA-seq data, also from short reads | +++ | https://github.com/YosefLab/TRAPeS | 131
scTCRseq | Reconstruction of paired TCR sequences | Analysis of raw data from full-transcript, paired-end scRNA-seq data | +++ | https://github.com/ElementoLab/scTCRseq | 132
BASIC | Reconstruction of paired BCR sequences | Analysis of full-transcript scRNA-seq data, single or paired end | +++ | http://ttic.uchicago.edu/~aakhan/BASIC | 133
BraCeR | Extension of TraCeR for reconstruction of paired BCR sequences | Analysis of raw data from full-transcript, paired-end scRNA-seq data; inference of clonality and reconstruction of clonotype networks | +++ | https://github.com/teichlab/bracer | 134
BALDR | Reconstruction of paired BCR sequences | Analysis of raw data from full-transcript scRNA-seq data, single or paired end | +++ | https://github.com/BosingerLab/BALDR | 135
VDJPuzzle | Assembly of paired TCR and BCR sequences | Analysis of full-transcript scRNA-seq data, single or paired end | +++ | https://bitbucket.org/kirbyvisp/vdjpuzzle2 | 136,137

Multiplexed image analysis
inForm | Quantitative per-cell analysis for single and multiplex images | Unmixing of multispectral images, integration into Perkin Elmer Phenoptics workflow; handling of up to ten different markers; for IHC, IF and RNA in situ hybridization; commercial | + | http://www.perkinelmer.com/at/product/inform-cell-analysis-one-seat-cls135781 | Perkin Elmer
Halo | Image analysis, segmentation, classification | Analysis of entire tissue sections; unlimited number of markers; identification of cellular phenotypes characterized by multiple markers; interactive links between cell data and cell image; tissue classification using machine learning; commercial | + | http://www.indicalab.com/halo | Indica Labs
StrataQuest | Image-processing solution and analysis for brightfield and fluorescence images | Phenotypic characterization of immune cells in reference to detected metastructures (for example, tumours, glands); segmentation of morphological structures based on a morphological stain alone; >50 apps for scientific and clinical routine applications; commercial | + | http://www.tissuegnostics.com/en/products/analysing-software | Tissue Gnostics GmbH
ImageJ | Image analysis | Generic image analysis software extendible by numerous plugins and scripting; for Microsoft Windows, Linux and MacOSX; open source and community supported | + (++ scripting) | https://imagej.nih.gov/ij | 139
CellProfiler | Image analysis, imaging pipeline creation | Creation of high-throughput image analysis pipelines for quantitative and automatic measurement of biological phenotypes using interoperating modules; support of ImageJ plug-ins and scripting; for Microsoft Windows, Linux and MacOSX; open source, with community and commercial support | + (++ scripting) | https://cellprofiler.org | 140
Ilastik | Image classification and segmentation | Image segmentation and classification based on machine learning; batch mode for multi-image analysis; for Microsoft Windows, Linux and MacOSX; open source and community supported | + GUI mode (++ headless mode) | https://www.ilastik.org | 141
DeepCell | Image segmentation and classification | Segmentation of individual cells in microscopy images using deep learning; Docker image; cloud enabled; open source | +^a (+++ API mode) | https://deepcell.org | 144
imctools | IMC data processing | Conversion of IMC raw files into ome.tiff files for downstream analysis in CellProfiler, Ilastik, Fiji and so forth; open source | +++ | https://github.com/BodenmillerGroup/imctools |
MIBIAnalysis | MIBI-TOF data analysis | Low-level MIBI-TOF data analysis | ++ | https://github.com/lkeren/MIBIAnalysis | 34
CODEX | CODEX data processing | Processing of raw CODEX data; segmentation, visualization and analysis of processed and segmented data | + | https://github.com/nolanlab/CODEX | 35
MERFISH_analysis | MERFISH data processing | Series of Matlab functions for the analysis of MERFISH | +++ | https://github.com/ZhuangLab/MERFISH_analysis | 40
Slideseq | Slide-seq data processing | Collection of Matlab and Python functions for Slide-seq data processing and analysis (note: a full user-friendly software package will be available in the near future) | +++ | https://github.com/broadchenf/Slideseq | 43
ST Pipeline (+ tools) | Spatial transcriptomics analysis pipeline | Tools and scripts to process and analyse raw data; generate data sets for downstream analysis | ++ | https://github.com/SpatialTranscriptomicsResearch | 208
seqFISH+ | seqFISH+ data processing | Processing images and barcode calling of seqFISH+ experiments | +++ | https://github.com/CaiGroup/seqFISH-PLUS | 39
TIBCO Spotfire Analyst for Quantitative Pathology | Data analysis and visualization platform | High-level data analysis, including clustering, network maps and statistics; compatible with inForm (Perkin Elmer) output; interactive links to cell/object locations in images; commercial | + | https://www.cambridgesoft.com/ensemble/spotfire/SciStream/Default.aspx | Perkin Elmer
Phenomap | Analysis of high-dimensional image data | Tissue and cell segmentation; automated mapping of TME cell phenotypes; clustering; t-SNE plots; quantitative measurements of expression, (co-)localization, proximity, counts and neighbourhoods; commercial | + | https://www.visiopharm.com/phenomap-multiplexing | Visiopharm
CellProfiler Analyst | Exploration and analysis of high-dimensional image data | Phenotype classification using machine learning; basic plotting of numerical data; data tables with interactive links to images; open source | + | https://cellprofiler.org/cp-analyst | 142
histoCat | Analysis of phenotypes and interactions in multiplex image cytometry data | Visualization and analysis of high-dimensional (for example, IMC) data; t-SNE; grouping of cells into PhenoGraph clusters; neighbourhood analysis | + | https://github.com/BodenmillerGroup/histoCAT | 143

Single-cell data visualization
t-SNE | Non-linear dimensionality reduction | Most commonly used; dependency on hyperparameters such as perplexity; different implementations available | ++ | https://lvdmaaten.github.io/tsne | 145
viSNE | Dimensionality reduction | Barnes–Hut implementation of t-SNE; visualization for CyTOF data; implementation in Cytobank | ++ (+ in Cytobank) | http://www.c2b2.columbia.edu/danapeerlab/html/cyt.html | 146
SIMLR | Multikernel learning for dimensionality reduction | Analysis of a high number of samples without loss in accuracy for dissection of heterogeneity; based on R and Matlab | ++ | https://github.com/BatzoglouLabSU/SIMLR | 151
ZIFA | Zero inflated dimensionality reduction | Explicit modelling of dropout characteristics for accuracy gain; linear transformation; Python-based | ++ | https://github.com/epierson9/ZIFA | 152
scvis | Dimensionality reduction with deep generative models | Parametric mapping from high- to low-dimensional space; estimation of uncertainty; possibility of adding new points to an existing embedding; Python-based | +++ | https://bitbucket.org/jerry00/scvis-dev | 153
UMAP | Dimensionality reduction with uniform manifold approximation and projection | High reproducibility of clusters and preservation of distances; short run time; based on Python and scikit-learn | ++ | https://github.com/lmcinnes/umap | 154
Monocle | Pseudotime trajectory reconstruction and analysis toolkit | Minimum spanning tree on clusters; R-based | ++ | http://cole-trapnell-lab.github.io/monocle-release | 157
DPT | Diffusion pseudotime | Noise reduction by diffusion map; analysis of branching data; random-walk-based distance in diffusion map space; R-based | ++ | https://github.com/theislab/scanpy; https://bioconductor.org/packages/release/bioc/html/destiny.html | 158
PAGA | Partition-based graph abstraction | Connected and disconnected trees; cycles; maps preserve the global topology of data; analysis at different resolutions; Python-based | ++ | https://github.com/theislab/paga | 162
Palantir | Alignment of cells along differentiation trajectories | Markov processes; identification of differentiation potential and terminal states; Python-based | ++ | https://github.com/dpeerlab/Palantir | 163
velocyto | Prediction and visualization of RNA velocity of single cells | Analysis of mapped reads from scRNA-seq data; several visualization approaches based on dimensionality reduction; R and Python implementation | ++ | http://velocyto.org | 164
PhenoGraph | Subpopulation detection by clustering in high-dimensional single-cell CyTOF data | Partition of mass cytometry data into a graph; no manual gating; Python-based | ++ | https://github.com/jacoblevine/PhenoGraph | 167
SPADE | Spanning tree progression of density-normalized events from CyTOF data | Rapid visual analysis; limited quantitative analysis; R-based | ++ | https://github.com/nolanlab/spade | 168
X-shift | Mass cytometry weighted k-nearest-neighbour density estimation from CyTOF data | Optimized stratification; in-depth visual analysis; rapid quantitative analysis; Java-based | ++ | https://github.com/nolanlab/vortex | 169
ACCENSE | Non-linear dimensionality reduction and a k-means clustering of CyTOF data | Combination of t-SNE and density-based partitioning; Python-based | ++ | http://www.cellaccense.com | 170
FlowSOM | Self-organizing map clustering and minimal spanning trees from CyTOF data | Clustering; visualization of mean marker values by star charts or cell types by pie charts; R-based | ++ | http://bioconductor.org/packages/FlowSOM | 171
Citrus | Subpopulations in multidimensional CyTOF data sets | Stratification with statistical information; differential abundance analysis; limited visual analysis; R-based | ++ | https://zenodo.org/record/10310 | 172
Slingshot | Lineage and pseudotime inference in single-cell data | Simultaneous principal curves, orthogonal projection; multiple lineage inference; R-based | ++ | https://bioconductor.org/packages/release/bioc/html/slingshot.html | 209

Databases and other resources
TCIA | Results of comprehensive immunogenomic analyses of NGS data for 20 solid cancers from TCGA and patients treated with checkpoint inhibitors | Repository of gene expression data, HLA types (controlled access), neoantigens, immune cell fractions or scores from GSEA and deconvolution, genomic and clinical features; data download and dynamic visualization and analysis (for example, survival analysis) | + | https://tcia.at | 2
iAtlas | Interactive portal to access immunogenomic data from >10,000 TCGA tumour samples across 33 cancer types | Access to genomic, immunological and clinical features; simplified access to tools for data analysis and visualization | + | https://www.cri-iatlas.org | 3
IEDB | Immune epitope and analyses resource | Web-based access to tools for prediction of class I and II binding, peptide processing, peptide immunogenicity and B cell epitopes; access to experimental data on B cell and T cell epitopes from humans and other species | + | https://www.iedb.org | 69
PRIDE | Proteomics data repository | Access and download of proteomics data, including spectral data; tools for mass spectrometry data visualization, analysis and conversion | + | https://www.ebi.ac.uk/pride/archive | 70
SysteMHC Atlas | MHC immunopeptidomic resource | Access to class I and II MHC immunopeptidomic data generated by mass spectrometry, including raw data | + | https://systemhcatlas.org | 71
CyTOF Biosurf | Resource for mass cytometry analysis | Primers for cytometrists and data scientists; list of analysis tools; working examples in R | + | http://cytof.biosurf.org | 173
VDJdb | Curated database of TCR sequences with known antigen specificities | Web interface to browse and query the database; query of immune-repertoire sequencing data to assess antigen specificities; human, mouse and non-human primates | + | https://vdjdb.cdr3.net | 188
immuneXpresso | Immune intercellular communication | Based on text-mining engine; incoming/outgoing interactions; confidence score and link to disease | + | http://immunexpresso.org | 193
CellPhoneDB | Repository of curated receptors, ligands and their interactions | Accurate representation of heteromeric complexes; existing and new manually curated data sets | + | https://www.cellphonedb.org | 194
IMEx | Non-redundant set of physical molecular interactions | Deep curation model; single data set from deposited interaction data and publications | + | https://www.imexconsortium.org | 195
CCCExplorer | Predicts and visualizes the gene signalling network for crosstalk identification in the TME | Based on expression data; connection of ligand–receptor interaction with intracellular signalling and transcription factors; Java-based | + | https://github.com/methodistsmab/CCCExplorer | 198,199

The bioinformatic expertise required to use the tools is defined as follows: +, low expertise (for example, use of web interfaces); ++, medium expertise, requires experience with R or Python scripting; +++, high expertise, requires knowledge of the Linux command line and pre-processing of input data. BCR, B cell receptor; CODEX, co-detection by indexing; CyTOF, mass cytometry by time of flight; FISH, fluorescence in situ hybridization; GSEA, gene-set enrichment analysis; GUI, graphical user interface; IEDB, Immune Epitope Database; IHC, immunohistochemistry; IF, immunofluorescence; IMC, imaging mass cytometry; IMEx, International Molecular Exchange Consortium; MERFISH, multiplex error-robust FISH; MHC, major histocompatibility complex; MIBI, multiplexed ion beam imaging; NGS, next-generation sequencing; RNA-seq, RNA sequencing; scRNA-seq, single-cell RNA sequencing; seqFISH, sequential FISH; SVD, singular value decomposition; SVR, support vector regression; TCGA, The Cancer Genome Atlas; TCR, T cell receptor; TME, tumour microenvironment; TOF, time of flight mass spectrometry; t-SNE, t-distributed stochastic neighbour embedding; WES, whole-exome sequencing; WGS, whole-genome sequencing. ^a Low bioinformatic expertise accommodated if the web-based/interface version of the tool is used.

However, a comprehensive and unbiased evaluation on external data is currently lacking. Unfortunately, results from two recent studies based on public peptide–HLA binding data74 or de novo experimentally validated human papillomavirus (HPV) peptides76 provided little guidance on method selection. These studies identified either MHCflurry or NetMHC as top performers for class I binding prediction, and reported variable accuracy across HLA types and peptide lengths and low agreement between predicted and experimental binding affinities. This hampers a complete characterization of tools in terms of accuracy, positive predictive value and coverage of HLA alleles. In this context, the generation of the optimal validation data set is of paramount importance: it should cover a wide range of HLA alleles, effectively capture the rules of antigen presentation (which is not possible using in vitro assay data) and be based on unseen data not used for the training of the


algorithms. Moreover, even optimal models of neoantigen presentation cannot predict their immunogenicity (see Box 1). Large collaborative efforts such as the Tumor Neoantigen Selection Alliance (TESLA) are putting together large data sets for the validation of these in silico approaches77 and could help to identify the best predictors of mutated peptides that bind in vivo.

Despite the need for further optimization of tools for predicting neoantigens, they have already demonstrated preliminary clinical value. In two studies on personalized vaccinations in melanoma, predicted neoantigens elicited effective immune responses78,79. One study used synthetic peptides to immunize six melanoma patients79. Four out of six patients had no disease recurrence, whereas patients with metastatic disease obtained complete tumour regression with additional anti-PD1 therapy. The other study used RNA-based vaccines based on predicted neoantigens recognized by CD4+ and CD8+ T cells in 13 patients with melanoma78. Neoantigen-based vaccination reduced metastatic events and caused objective response in two of five patients with metastases and complete response in a third patient treated with the vaccine in combination with anti-PD1 immunotherapy.

Characterizing tumour-infiltrating immune cells

Bulk analysis. Bulk-tumour transcriptomics data from microarrays or RNA-seq can be used to quantify different cell types of the TME, as well as to identify gene signatures that can predict response to immunotherapy with checkpoint blockers (Fig. 3b). Given the complex and multifactorial nature of the anticancer immune response and the various mechanisms of immune escape80, it will be challenging to validate prospectively the proposed predictive signatures81,82. Quantifying different cell types of the TME can be done using either approaches based on marker genes or deconvolution-based methods (Table 1). Whereas tools based on marker genes such as xCell83, TIminer84 and MCP-counter85 assess only semi-quantitative abundance scores, deconvolution methods can quantitatively estimate the fractions of individual cell types in a heterocellular tissue by considering the bulk transcriptome as the ‘convolution’ or summation of cell-specific signatures86. In brief, deconvolution methods formulate the problem as a system of equations that describe the expression of each gene in a bulk sample as the weighted sum of the expression profiles of the admixed cell types. The mathematical problem (so-called inverse problem) of inferring the unknown cell-type fractions is solved using a signature matrix describing the expression fingerprints of sorted cell types86. In the following discussion, we briefly introduce user-friendly deconvolution tools that embed a signature matrix specifically tailored for the quantification of immune cells and that have been validated on tumour data.

CIBERSORT is a popular deconvolution algorithm based on a signature matrix describing the expression profiles of 22 immune cell phenotypes, which was derived from microarray data of sorted or enriched immune cell types87; it uses support vector regression to identify the solution. More recently, CIBERSORT has been integrated into CIBERSORTx, a deconvolution tool that further enables the building of custom signature matrices from single-cell or flow-sorted bulk transcriptomic data with removal of possible batch effects88. Another tool, TIMER, is a multistep computational approach for the quantification of 6 immune cell types in 32 different cancer types89. TIMER merges the input samples to be deconvoluted with immune cell reference profiles, performs normalization to remove batch effects, derives a cancer-specific signature matrix and quantifies cell abundance scores using linear least-squares deconvolution. Despite being estimated via deconvolution, TIMER scores cannot be interpreted as cell fractions or compared across different immune cell types and data sets89. The recently introduced method EPIC90 was developed using bulk and scRNA-seq data from circulating and tumour-infiltrating immune cells, CAFs and epithelial cells. EPIC provides two separate signature matrices for the analysis of blood or tumour data and, unlike previous methods, estimates cell fractions referring to the total cells in a sample, thus enabling intra- and inter-sample comparison. Finally, quanTIseq is a recent algorithm for deconvolution of blood and tumour data, which is based on a novel signature matrix built from a compendium of RNA-seq data sets for ten circulating immune cell types, including regulatory T cells and classically (M1) and alternatively (M2) activated macrophages91. quanTIseq is specifically tailored for RNA-seq data and implements a full analytical pipeline, from pre-processing of raw RNA-seq data until deconvolution of cell fractions. quanTIseq also estimates immune cell fractions that can be compared within and between samples91. Analysis of samples from two cohorts of patients with melanoma treated with kinase inhibitors or immune checkpoint

Box 1 | Immunogenicity of neoantigens

Identifying immunogenic peptides is currently an unsolved problem: most of the neoantigens predicted in silico fail to elicit an immune response in vivo185. This inevitably forces the experimental testing of neoantigens predicted computationally or derived from mass spectrometry data. Unfortunately, very limited data on T cell responses to tumour neoantigens have been generated, and the determinants of neoantigen immunogenicity remain elusive. Two features that have been linked to immunogenicity are the stability of the peptide–HLA complex and the functional avidity of the CD8+ T cell receptor (TCR) interacting with it. Although peptide–HLA stability can be predicted in silico using tools such as NetMHCstab186 or its pan-allele version NetMHCstabpan187, no high-throughput methods are currently available for predicting peptide–HLA recognition by TCRs. However, resources such as VDJdb, a constantly updated, curated database of TCR sequences from different species with known antigen specificities (with more than 20,000 human entries as of January 2019), can provide valuable information for training such computational predictors188.

Another feature linked to immunogenicity is neoantigen foreignness189. It was suggested that mutations that generate novel HLA-binding sites may be more immunogenic because they enable the presentation of peptides never seen before by the immune system, thus eliciting strong immune responses. Such neoantigens can be identified by comparing the binding affinity of the wild-type and mutant peptides. pVACseq61 integrates binding affinity predictions with additional features such as stability and cleavage scores, mutation coverage, variant allele frequency, gene expression and fold-change of binding affinity of wild-type versus mutant peptides. More recently, Łuksza et al.190 proposed a model of neoantigen fitness based on HLA-binding affinity of mutated and wild-type peptides and on sequence similarity of neoantigens to known immunogenic microbial epitopes. They used this model to predict the survival of patients treated with immune checkpoint blockers. However, the prognostic value of this approach remains to be validated in additional cancer types10,11.

Avidity
When pertaining to T cells, a biological measure that describes how well a T cell responds to a given antigen. T cells with high functional avidity respond to low antigen amounts, whereas T cells with low functional avidity require higher antigen amounts to mount an immune response comparable with that of high-avidity T cells.
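
The deconvolution formulation described in the bulk-analysis text (the bulk expression of each gene as the weighted sum of cell-type signature profiles) can be sketched as a small non-negative least-squares problem. This is a minimal illustration only and not the algorithm of any tool named in this section: the signature matrix, the mixing fractions and the function name `deconvolve` are invented for the example, and plain projected gradient descent stands in for the support vector regression or constrained least-squares solvers used by the published methods.

```python
def deconvolve(signature, bulk, iterations=2000):
    """Estimate non-negative cell-type fractions f with signature * f ~ bulk.

    signature: one row per gene, one column per reference cell type.
    bulk:      bulk expression value of each gene (same gene order).
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # 1 / squared Frobenius norm is a safe step size for this quadratic objective.
    step = 1.0 / sum(value * value for row in signature for value in row)
    fractions = [1.0 / n_types] * n_types
    for _ in range(iterations):
        # Residual of the weighted-sum model for every gene.
        residual = [
            sum(row[k] * fractions[k] for k in range(n_types)) - bulk[g]
            for g, row in enumerate(signature)
        ]
        # Gradient of 0.5 * ||signature * f - bulk||^2 with respect to f.
        gradient = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto f >= 0.
        fractions = [max(0.0, f - step * d) for f, d in zip(fractions, gradient)]
    total = sum(fractions)
    return [f / total for f in fractions] if total > 0 else fractions


# Hypothetical signature matrix: 4 genes x 3 cell types (values are made up).
SIGNATURE = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 1.0, 9.0],
    [2.0, 2.0, 2.0],
]
TRUE_FRACTIONS = [0.5, 0.3, 0.2]
# Simulated bulk sample: weighted sum of the cell-type profiles.
BULK = [sum(s * f for s, f in zip(row, TRUE_FRACTIONS)) for row in SIGNATURE]

estimated = deconvolve(SIGNATURE, BULK)
print([round(f, 3) for f in estimated])
```

On noise-free simulated input the estimate should recover the mixing fractions; real bulk data are noisy, which is one reason the published tools differ in solver, signature matrix and normalization.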


blockers demonstrated that deconvolution methods can be used both to monitor the immunological effects of targeted agents and to reveal immune cell composition in response to immune checkpoint blockers91.

Benchmarking cell-type quantification methods is difficult due to the differences in the cell types and estimated scores/fractions. A recent comparative benchmarking revealed high accuracy in the quantification of CD8+ T cells across different approaches, but limited performance for heterogeneous cell types such as dendritic cells92. The Tumor Deconvolution Challenge, organized by the Dialogue on Reverse Engineering Assessment and Methods (DREAM) initiative, has the potential to reveal the top performers and provide guidelines for the selection of the best method based on the cell type of interest. Importantly, besides simple enumeration of cell types, novel methods including CIBERSORTx88 and linseed93 can reconstruct cell- and sample-specific transcriptional profiles and, thus, have the potential to elucidate the functional state of cell subpopulations in the TME.

In summary, the selection of the method depends on the questions to be addressed and the type of information expected to be gained. EPIC and quanTIseq are the preferred methods to obtain cell fractions that can be compared both within and between samples, whereas MCP-counter and xCell provide higher signature specificity and lower background noise, respectively92. Specific methods can be also selected considering the cell type of interest (for example, CIBERSORT, xCell and quanTIseq for M1/M2 macrophages; xCell, EPIC and TIMER for epithelial cells).

Cell phenotypes from single-cell data. Compared with bulk approaches, single-cell technologies can provide complementary insights into cancer immunity (Fig. 3b) and have been used to study the TME of different cancer types (for example, refs19,94–99). Notably, scRNA-seq techniques open new avenues to study rare or unknown immune cell types100, and can shed light on the transcriptional programmes that underlie the plasticity and functionality of the immune cells. For example, scRNA-seq from tumour-infiltrating CD8+ T cells can provide valuable information about their activation state and the level of exhaustion101. Besides the investigation of gene expression in single immune cell types, single-cell technologies can uncover the genetic and transcriptomic makeup of tumour cells, allowing the study of rare cells such as circulating tumour cells, cancer stem cells and cells committed to epithelial-to-mesenchymal transition, as well as the detection of cell-specific genetic variants and estimation of tumour clonality and evolution102. However, care must be taken when using scRNA-seq techniques for quantifying the cellular composition of tumours due to the differences in single-cell dissociation efficiency relative to immune cells, which can bias cell-type proportions99.

Analysis of scRNA-seq data shares some analytical steps with bulk RNA-seq (for example, read mapping) but also poses additional challenges due to the peculiarities of these data: high data dimensionality, higher noise and absence of biological replicates per se, and data sparsity due to gene dropouts103,104. Although the computational strategies for scRNA-seq data are far less mature and standardized than those for bulk RNA-seq, well-implemented and documented frameworks tackling the main analytical steps are already available (Table 1) and include pipelines using R (for example, Seurat36,105, Scater106, SINCERA107 and Scran108), Python (for example, Scanpy109) or user-friendly graphical interfaces (for example, Granatum110 and ASAP111). The core pipeline usually consists of four main steps: first, quality control and removal of low-quality cell profiles (for example, stressed cells or doublets); second, selection of informative genes (for example, genes with highly variable expression among cells); third, normalization of expression profiles to allow cell comparison; and, fourth, annotation of cell types based on their transcriptional profiles (Fig. 3c). Seurat is currently the best developed and documented framework and allows single-cell, multi-omics data integration, data harmonization and cell-type identification36,105. For in-depth review of the computational tools, we refer readers elsewhere18,112–114.

Annotation of different cell types is a pivotal step in scRNA-seq data analysis. However, there is currently no consensus on how to systematically identify known and novel cell types (or cell states) based on their expression profiles. One common approach is to use unsupervised clustering to group cells with similar profiles and, assuming that each cluster represents one cell type or cell state, to identify the marker genes that are specific for each cluster. This approach has several limitations. First, clustering approaches may force the partitioning of the data into discrete clusters even when cells cover a continuum of states. Second, results strongly depend on the clustering strategy adopted (that is, computational method and parameter settings). Third, standard clustering methods might not identify small clusters or rare cells, and therefore dedicated approaches such as RaceID3 (ref.115) and GiniClust116 have to be used. Last, once cluster-specific marker genes are identified there is no standard strategy to assign cell identities to clusters. Most of the scRNA-seq studies published so far involve manual cell annotation based on marker genes and prior knowledge, an approach that is labour-intensive and has low reproducibility.

One alternative approach is to project scRNA-seq data onto reference expression profiles of previously annotated cell types. For example, the tool scmap117 maps single-cell profiles onto single cells or clusters of a reference data set. SingleR classifies single-cell transcriptomes by comparing them with expression profiles of sorted cell types using correlation analysis118. SingleR embeds reference data from human and mouse cell populations, including immune cells, but also accepts user-supplied references118. The recently developed Garnett tool uses a hierarchy of cell subtypes and relative gene markers together with a reference scRNA-seq data set to build a classifier for annotating cells in external scRNA-seq data sets119. Other tools leverage the integration and harmonization of scRNA-seq data sets across studies36,105, or the mapping of marker genes onto cell ontologies120. The recently developed scMatch method is based on correlation analysis, but can either use reference transcriptomes or cell ontologies to annotate cells121. When analysing

Data dimensionality
The high dimensionality of single-cell RNA sequencing data is due to the high number of genes measured (20,000–30,000 genes), although for many of those the expression in a certain cell would be zero due to dropouts. Due to the high dimensionality, cells become very similar and difficult to assign to different groups (for example, cell subpopulations). Dimensionality reduction techniques can ameliorate this issue, known as the curse of dimensionality, and decrease the computational time.

Data sparsity
A data set is sparse when it is mainly composed of zeros and the actual information is rare. In single-cell RNA sequencing data sets, data sparsity is mainly due to dropouts.

Dropouts
In single-cell RNA sequencing (scRNA-seq) data, when expressed genes result in null expression values due to the inefficiency of mRNA capture and/or to the stochasticity of mRNA expression. They are the main cause of data sparsity in scRNA-seq data sets.

Doublets
Pairs of cells that are captured and sequenced together in single-cell RNA sequencing experiments. As doublets have hybrid transcriptomes that might be falsely interpreted as intermediate cell phenotypes, they have to be identified and removed before running downstream analysis and data interpretation.

Unsupervised clustering
The objective of clustering is to find different groups within the elements in the data (usually samples or cells in transcriptomic data sets), assigning to the same cluster the elements that are more similar to each other. This process is called unsupervised because the real groups are not known a priori. By contrast, supervised clustering or classification is based on pre-labelled groups of samples, which are used to classify a new sample considering its similarity to the elements of each group.

Cell ontologies
Structured vocabularies of cell types.
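
Reference-based annotation, as described for correlation-based tools in the text, can be illustrated with a toy classifier that assigns each cell the label of the best-correlated reference profile. The sketch below is an assumption-laden illustration and not the implementation of SingleR, scmap or scMatch: the gene panel, the expression values, the `min_correlation` cut-off and the function names are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant vector (for example, an all-zero cell) has no defined correlation.
    return cov / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else 0.0


def annotate(cell, references, min_correlation=0.5):
    """Return (label, scores); label is 'unassigned' below the cut-off."""
    scores = {label: pearson(cell, profile) for label, profile in references.items()}
    best = max(scores, key=scores.get)
    label = best if scores[best] >= min_correlation else "unassigned"
    return label, scores


# Hypothetical reference profiles over the gene panel below (made-up values).
GENES = ["LEF1", "CCR7", "CX3CR1", "GZMB", "LAYN", "PDCD1"]
REFERENCES = {
    "naive CD8+": [8.0, 7.0, 0.0, 1.0, 0.0, 1.0],
    "effector CD8+": [1.0, 0.0, 9.0, 8.0, 0.0, 2.0],
    "exhausted CD8+": [0.0, 1.0, 2.0, 5.0, 8.0, 9.0],
}

naive_like, _ = annotate([7.0, 6.0, 1.0, 0.0, 0.0, 0.0], REFERENCES)
exhausted_like, _ = annotate([0.0, 0.0, 1.0, 4.0, 7.0, 8.0], REFERENCES)
empty_cell, _ = annotate([0.0] * len(GENES), REFERENCES)
print(naive_like, exhausted_like, empty_cell)
```

The explicit "unassigned" outcome reflects the caution raised in the text: a cell whose profile is dominated by dropouts carries too little signal to be labelled reliably.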


large scRNA-seq data sets, a combination of manual and automated cell annotation should be used, whereas for small data sets manual annotation can be sufficient114. However, we recommend caution when prior knowledge is integrated, specifically for classifying marker-negative cells as they might be affected by dropouts.

Independent evaluation of the computational tools for scRNA-seq data analysis has not yet been carried out, so the use of consensus approaches using diverse methods to validate the robustness of the results is recommended. In the near future, the availability of carefully designed, gold-standard data sets such as those recently generated using both droplet- and plate-based scRNA-seq122 will finally enable method benchmarking and definition of guidelines and best practices for data analysis.

Lymphocyte receptor repertoires. Interrogation of cancer immunity also requires the search for common clonotypes involved in the response to tumour antigens in order to identify shared BCR and TCR sequences. The specificity of B and T cell responses (that is, which antigens they recognize) depends on the repertoire of receptors they are equipped with. NGS has become a powerful tool to interrogate BCRs and TCRs, and different computational tools now provide simplified access to the analysis of BCR and TCR diversity123. Recently, the analysis of immune repertoires from sequencing data has seen two major advancements: the development of dedicated computational tools for the extraction of immune repertoires from bulk-tumour RNA-seq data, and the possibility to determine pairs of protein chains of individual TCRs and BCRs from single cells (Fig. 3d; Table 1).

Three tools have been recently developed for the analysis of TCR and BCR repertoires from bulk RNA-seq data. Originally developed for targeted sequencing of BCR and TCR repertoires124, MiXCR has been recently adapted to analyse bulk-tumour RNA-seq data with high accuracy and precision125. The TRUST algorithm, initially developed for TCR analysis of bulk-tumour RNA-seq data126, can now also extract BCR repertoires127. As this approach can produce incomplete CDR3 sequences mapping to different clonotypes128, data post-processing is advisable to decrease the number of false positives. Finally, V’DJer is a tool specifically designed to extract BCR repertoires from bulk RNA-seq data (as precomputed files of mapped reads), which can be then quantified in downstream analysis129. Although promising for its applicability to short-read data, the requirement of data pre- and post-processing might restrict the usage of V’DJer to bioinformaticians.

TCRs and BCRs consist of pairs of protein chains that, collectively, determine their antigen specificity. In bulk data sets, the pairing of the two chains is lost and cannot be tracked back by computational means. Single-cell approaches not only retain this information but further allow the joint analysis of transcriptomes and immune repertoires to link the latter to the cell state and functional orientation. Despite their still limited standardization and the lack of unbiased benchmarking, these approaches enable analyses that are inaccessible to bulk approaches. Several computational methods have been developed to extract TCRs (for example, TraCeR130, TRAPeS131 and scTCRseq132) and BCRs (for example, BASIC133, BraCeR134 and BALDR135) or both (for example, VDJPuzzle136,137) from full-transcript scRNA-seq data (Table 1). The relevance of this approach for immuno-oncology is increasingly being appreciated and demonstrated in numerous studies. For example, Zhang et al.138 performed full-transcript scRNA-seq from T cells isolated from tumour, normal mucosa and blood of 12 patients with colorectal cancer. Paired analysis of single-cell transcriptomes and TCRs revealed tumour exclusivity of the TCRs of exhausted CD8+ T cells, and association of this subtype with effector but not central memory CD8+ T cells, implicating a TCR-based fate decision of tumour-infiltrating memory CD8+ T cells. In another study using a 10X Genomics platform, which enables the direct and simultaneous characterization of cell phenotypes and immune receptors, Azizi et al.98 profiled T cells isolated from eight patients with breast cancer. Through integrative analysis of expression and TCR diversity it was shown that T cell phenotypes and activation states are shaped by a combination of antigenic TCR stimulation and environmental stimuli.

Spatial cellular phenotyping. The quantification of the immune contexture requires images from tissue slides in order to obtain cellular phenotypes and their spatial distribution. Individual cells are first detected by thresholding and segmenting the raw images, and then their individual phenotypes are identified and classified by detecting signals from the specific markers in the corresponding cellular compartment (such as nucleus, cytoplasm or cytoplasmic membrane) used in staining procedures. Besides commercial software packages such as inForm (Perkin Elmer), Halo (Indica Labs) or StrataQuest (TissueGnostics GmbH), a growing number of open-source and free software tools, including ImageJ139, CellProfiler140 and Ilastik141, are available for this purpose (Table 1). By combining and extending their core functionalities via plug-ins, macros or scripting, custom analysis pipelines have been created and adapted to fit the different multiplex imaging methods (for example, refs28,32,33,91). As an alternative to the use of fully developed imaging software packages, image analysis pipelines are often implemented using the image-processing routines and libraries from MATLAB (imaging toolbox) or Python (scikit-image and opencv)32,34. This is the case for novel multiplex imaging techniques including IMC, MIBI-TOF, MERFISH, CODEX, seqFISH, Spatial Transcriptomics or Slide-seq, which require specialized pre-processing, image restoration and post-processing tools (Table 1).

These primary analyses of raw images typically result in data sets that provide information about each individual detected cell, including spatial coordinates, expressed markers, staining intensities of the expressed markers, compartments and metastructures (that is, tumour or stroma). Different software packages that are either commercial (for example, TIBCO Spotfire and Phenomap) or freely available (CellProfiler Analyst142 and histoCAT143) implement methods (for example, t-distributed

Clonotypes
Populations of T cells that carry identical T cell receptors.

CDR3 sequences
Complementarity-determining region 3 (CDR3) is the region of the variable chain in B cell receptors and T cell receptors that binds to the cognate antigen, thus accounting for most of the variation of immune repertoires.
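
Once CDR3 sequences have been extracted, repertoire diversity and shared clonotypes can be summarized with a few lines of code. The sketch below is a hedged illustration rather than the output format or method of MiXCR, TRUST or any other tool mentioned in the text: the CDR3 strings are invented, and clonality is defined here as one minus the normalized Shannon entropy, which is one common convention among several.

```python
import math
from collections import Counter


def clonotype_frequencies(cdr3_sequences):
    """Relative frequency of each clonotype, keyed by its CDR3 sequence."""
    counts = Counter(cdr3_sequences)
    total = sum(counts.values())
    return {sequence: n / total for sequence, n in counts.items()}


def shannon_entropy(frequencies):
    """Shannon entropy (natural log) of a clonotype frequency table."""
    return -sum(p * math.log(p) for p in frequencies.values() if p > 0)


def clonality(frequencies):
    """1 - normalized entropy: near 0 for an even repertoire, 1 for a single clone."""
    n_clonotypes = len(frequencies)
    if n_clonotypes < 2:
        return 1.0
    return 1.0 - shannon_entropy(frequencies) / math.log(n_clonotypes)


def shared_clonotypes(sample_a, sample_b):
    """CDR3 sequences observed in both samples."""
    return set(sample_a) & set(sample_b)


# Invented CDR3 amino-acid sequences, for illustration only.
TUMOUR = ["CASSLGQAYEQYF"] * 6 + ["CASSPDRGNTIYF"] * 2 + ["CASSQETQYF", "CASRTGELFF"]
BLOOD = ["CASSLGQAYEQYF", "CASSPDRGNTIYF", "CASSFSTDTQYF", "CASSYEQYF"]

tumour_clonality = clonality(clonotype_frequencies(TUMOUR))
blood_clonality = clonality(clonotype_frequencies(BLOOD))
shared = shared_clonotypes(TUMOUR, BLOOD)
print(round(tumour_clonality, 3), round(blood_clonality, 3), sorted(shared))
```

In this toy input the tumour sample is dominated by one expanded clonotype and therefore scores a higher clonality than the evenly distributed blood sample.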


stochastic neighbour embedding (t-SNE), PhenoGraph clustering or SPADE) to explore and analyse these high-dimensional data (Table 1). By assessing the cellular phenotypes and their spatial relationship it is possible to further characterize the tumour immune landscape. For example, the distances between the cellular phenotypes can give information about the tumour immune architecture and reveal immune cell types that form local neighbourhoods or are dispersed throughout the tumour, indicating tumour–immune cell interactions. Furthermore, analysing the spatial neighbourhood can reveal enrichment or depletion in cell–cell interactions that are indicative of cellular organization and cell–cell communication (see also Box 2)143. Clustering multiplex images based on their phenotypic signatures and/or specific cell–cell interactions may reveal distinct groups defining unique disease states32,34,143.

Despite the functionality of the available tools, analyses of whole-slide images and extraction of information using computational algorithms still represent a major bottleneck as many pre-processing and intermediate steps depend on manual interaction and require in-depth knowledge of image processing. Importantly, the correct and reliable segmentation of cells, markers and other features often requires extensive training of machine-learning models (for example, using Ilastik or DeepCell144) or empirical determination of thresholds that are specific to the individual samples.

Visualization of single-cell data

Visualization of complex single-cell data is very challenging, and numerous algorithms have been developed for representation and exploration (Table 1). Although many tumour–immune cell analyses (including those discussed below) are focused on tumour-infiltrating immune cells, it is important to bear in mind that the cellular landscape of the TME is highly heterogeneous and includes various additional cell types such as CAFs and endothelial cells (Fig. 1c). Linear dimensionality reduction methods such as principal component analysis (PCA) are often not able to capture the complex structure of single-cell data, and therefore approaches for non-linear transformation in two dimensions such as t-SNE145 and its derivatives (viSNE146, Barnes–Hut-SNE147, Fourier-interpolated t-SNE148 and hierarchical SNE149) have been developed. In general, these methods are used to graphically represent functionally related groups of cells as clusters with similar gene expression profiles in 2D plots. However, although these methods are widely used, they need careful interpretation as the results are dependent on parameters balancing global and local aspects (see ref.150 for recommendations on selecting optimal parameters). Additional issues are the clustering performance, which can be improved by kernel-based similarity learning151, and accuracy, which can be increased by explicitly modelling the dropout characteristics152.

Secretome
All of the factors secreted by a cell into the extracellular space.

Dimensionality reduction
The process of reducing the number of features composing a data set (for example, genes in single-cell RNA sequencing data) to obtain a set of principal features. The original features can be filtered (a process called feature selection) or projected from the high-dimensional space of the original data set into a space composed of fewer dimensions, like in the case of t-distributed stochastic neighbour embedding.
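
Projection into fewer dimensions, as defined in the glossary entry on dimensionality reduction, can be illustrated with the linear case mentioned in the text, PCA. The sketch below finds only the first principal component by power iteration on the gene covariance matrix; it is a didactic assumption-based example (toy matrix, hypothetical function names) and not a substitute for the non-linear methods such as t-SNE, which use a different objective.

```python
import math


def center(cells):
    """Subtract the per-gene mean from a cells x genes matrix."""
    n_cells, n_genes = len(cells), len(cells[0])
    means = [sum(cell[j] for cell in cells) / n_cells for j in range(n_genes)]
    return [[cell[j] - means[j] for j in range(n_genes)] for cell in cells]


def covariance(centered):
    """Gene x gene sample covariance matrix of centred data."""
    n_cells, n_genes = len(centered), len(centered[0])
    return [
        [sum(c[i] * c[j] for c in centered) / (n_cells - 1) for j in range(n_genes)]
        for i in range(n_genes)
    ]


def first_principal_component(cov, iterations=200):
    """Dominant eigenvector of the covariance matrix via power iteration."""
    n = len(cov)
    vector = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0:
            break  # no variance in the data
        vector = [x / norm for x in product]
    return vector


def project(centered, component):
    """Coordinate of every cell along the given component."""
    return [sum(a * b for a, b in zip(cell, component)) for cell in centered]


# Toy cells x genes matrix: gene 2 always equals twice gene 1, gene 3 is silent.
CELLS = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0], [4.0, 8.0, 0.0]]

centered_cells = center(CELLS)
pc1 = first_principal_component(covariance(centered_cells))
scores = project(centered_cells, pc1)
print([round(x, 3) for x in pc1], [round(s, 3) for s in scores])
```

Here three gene dimensions collapse onto a single axis without loss because the toy data vary along one direction only; real single-cell data rarely have such simple linear structure, which motivates the non-linear embeddings discussed in the text.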

Box 2 | Cell–cell communication

The analysis of cellular communication in the tumour microenvironment is in its infancy, although recent technological advances such as in single-cell profiling allow a systems-wide view not only to follow individual cells but also to study their interactions and communication. This is of particular interest in heterocellular tissues, and there are various communication modes between cells, particularly receptor–ligand interaction such as chemokines and chemokine receptors191, cytokines and cytokine receptors or immune checkpoint ligand–receptor pairs, which are highly relevant in cancer immunology. In a seminal paper192 using quantitative proteomics and secretome analyses, the social network architecture of immune cells could be described by stimulating innate immune cells secreting ligands present in different cell types. A subsequent study using computational literature mining produced the immuneXpresso resource for system-level characterization of intercellular interactions in a disease context193. A recent study on single-cell analysis of the maternal–fetal interface has led to the identification of heterogeneous cell populations and to the development of CellPhoneDB, which is a publicly available repository of curated receptors, ligands and their interactions194. A focus was on the analysis of heteromeric complexes because communication between cells often relies on multi-subunit protein complexes. A non-redundant set of physical molecular interaction data by the International Molecular Exchange Consortium (IMEx)195 was used.

For cytokine-mediated communication, beyond the known importance of which cytokines and cytokine receptors are expressed, considerations of their spatial range are also key, as recently shown using mathematical modelling196. Three modes of cytokine-mediated communication can be observed: autocrine signalling, in which secreted cytokines are trapped by receptors from the same cell; paracrine signalling, in which secreted cytokines are trapped by receptors of other cells at the same tissue site; and endocrine signalling, in which secreted cytokines are trapped by receptors of other cells at other tissue sites197 (see the figure). An example of how computational tools can leverage RNA expression data was recently shown in a study addressing stroma–tumour crosstalk using a mouse lung cancer model198 and RNA sequencing data. Another study in ovarian cancer used a computational model for the detection of crosstalk signals based on ligand–receptor interactions and downstream signalling networks, and elucidated an interaction between a ligand secreted by epithelial cells and a receptor originating from a particular subpopulation of cancer-associated fibroblasts199.

[Box 2 figure: cell–cell signalling between ligand-secreting and receptor-bearing cells, showing autocrine and paracrine signalling within tissue site 1 and endocrine signalling between tissue site 1 and tissue site 2.]
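
The idea of inferring candidate crosstalk from expression data can be sketched by scoring ligand–receptor pairs between cell populations. The example below is a simplified assumption, not the statistical procedure of CellPhoneDB, CCCExplorer or any other resource in this box: the expression values and cluster names are invented, the score is simply the mean of ligand and receptor expression, and a multi-subunit receptor is taken to be limited by its least-expressed subunit.

```python
def mean_expression(cells, gene):
    """Mean expression of a gene over the cells of one population."""
    return sum(cell.get(gene, 0.0) for cell in cells) / len(cells)


def complex_expression(cells, subunits):
    """A heteromeric complex is limited by its least-expressed subunit."""
    return min(mean_expression(cells, gene) for gene in subunits)


def interaction_scores(clusters, pairs):
    """Score every sender -> receiver combination for each ligand-receptor pair.

    sender == receiver corresponds to candidate autocrine signalling.
    """
    scores = {}
    for sender, sender_cells in clusters.items():
        for receiver, receiver_cells in clusters.items():
            for ligand, receptor in pairs:
                ligand_level = complex_expression(sender_cells, ligand)
                receptor_level = complex_expression(receiver_cells, receptor)
                # Only keep pairs where both partners are detected.
                if ligand_level > 0 and receptor_level > 0:
                    key = (sender, receiver, "+".join(ligand), "+".join(receptor))
                    scores[key] = (ligand_level + receptor_level) / 2
    return scores


# Ligand and receptor are tuples of subunit genes; expression values are made up.
PAIRS = [
    (("CXCL12",), ("CXCR4",)),
    (("TGFB1",), ("TGFBR1", "TGFBR2")),
]
CLUSTERS = {
    "CAF": [{"CXCL12": 6.0, "TGFB1": 4.0}, {"CXCL12": 4.0, "TGFB1": 2.0}],
    "T cell": [
        {"CXCR4": 5.0, "TGFBR1": 2.0, "TGFBR2": 0.0},
        {"CXCR4": 3.0, "TGFBR1": 2.0, "TGFBR2": 0.0},
    ],
}

scores = interaction_scores(CLUSTERS, PAIRS)
for key, value in sorted(scores.items()):
    print(key, value)
```

In this toy input the TGFB1 interaction is not reported because one receptor subunit is undetected in the receiving population, which illustrates why heteromeric complexes need to be represented explicitly.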


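[Figure 4 panels. a: Expression matrix (genes × cells), Similarity matrix (cells × cells), Dimensionality reduction and clustering (Dimension 1 versus Dimension 2), Pseudo-temporal trajectories. b: t-SNE plot (t-SNE 1 versus t-SNE 2) with clusters labelled Naive CD8+LEF1+, Effector CD8+CX3CR1+ and Exhausted CD8+LAYN+. c: Trajectory of CD8+ T cells from Naive to Cytotoxic and from Non-exhausted to Exhausted.]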
Fig. 4 | Single-cell analysis and visualization of tumour-related T cells. a | Analytical steps for visualization of single-cell
data: starting with an expression matrix indicating normalized expression values for each gene (rows) and single cell
(columns), similarities of expression profiles between each pair of cells are calculated (for example, using Euclidean distance)
and can be represented as a similarity matrix. As many cells are studied, a simpler representation can be achieved
by (non-linear) dimensionality reduction and the projections of the most informative components are commonly
visualized in a 2D plot, thereby allowing grouping (clustering) of cells with similar expression profiles. Graph-based
approaches are used to infer linear and branched pseudotime trajectories along which the cells can be ordered.
b | Example t-distributed stochastic neighbour embedding (t-SNE) plot and clustering of single tumour-infiltrating T cells
in cancer. Functionally related marker genes can be assigned to clusters. Cell clusters and marker-gene expression can
shed light on novel, uncharacterized immune cell subtypes or subpopulations of cells with specific functional changes
within the tumour microenvironment, which may be prognostic or predictive for immunotherapy (that is, exhausted
versus cytotoxic CD8+ T cells). Indicated are clusters representing naive CD8+ T cells, exhausted CD8+ T cells and
effector CD8+ T cells, characterized by expression of the indicated marker genes. c | Beyond the useful insights from
clustering analysis in part b, the reconstruction of continuous (branched) cell transitions is only possible through the
computational analysis of pseudotime trajectories. An example pseudo-temporal ordering of CD8+ T cells is shown.
The branched trajectories of CD8+ T cells according to pseudo-temporal reconstruction underscore the functional
orientation of CD8+ T cells and their continuous transitions (from naive to cytotoxic cells, and from non-exhausted to
exhausted). Colour codes are according to clusters in the t-SNE plot and respective marker genes defining those types.
Data plotted in panels b and c are from single-cell analysis of non-small-cell lung cancer samples from 14 patients205.
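
The steps in panel a can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation of Monocle or any other cited tool. The cell names and two-gene expression values are hypothetical. The dimensionality reduction step is skipped, so distances are computed on the toy matrix directly. Pseudotime is taken to be the path length from a chosen root cell along a minimum spanning tree.

```python
import heapq
import math

# Hypothetical toy expression matrix: one entry per cell, one value per gene.
cells = {
    "naive": [0.0, 0.0],
    "early": [1.0, 0.0],
    "effector": [2.0, 0.0],
    "cytotoxic": [3.0, -1.0],
    "exhausted": [3.0, 1.0],
}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(cells):
    """Pairwise Euclidean distances between the expression profiles of cells."""
    names = list(cells)
    return {u: {v: euclidean(cells[u], cells[v]) for v in names} for u in names}


def minimum_spanning_tree(dist, root):
    """Prim's algorithm; returns the tree as an adjacency dict of edge lengths."""
    tree = {name: {} for name in dist}
    visited = {root}
    heap = [(d, root, v) for v, d in dist[root].items() if v != root]
    heapq.heapify(heap)
    while heap and len(visited) < len(dist):
        d, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        tree[u][v] = d
        tree[v][u] = d
        for w, dw in dist[v].items():
            if w not in visited:
                heapq.heappush(heap, (dw, v, w))
    return tree


def pseudotime(tree, root):
    """Path length from the root to every cell along the tree edges."""
    times = {root: 0.0}
    stack = [root]
    while stack:
        u = stack.pop()
        for v, d in tree[u].items():
            if v not in times:
                times[v] = times[u] + d
                stack.append(v)
    return times


dist = distance_matrix(cells)
tree = minimum_spanning_tree(dist, root="naive")
times = pseudotime(tree, root="naive")
for name, t in sorted(times.items(), key=lambda item: item[1]):
    print(f"{name}: {t:.2f}")
```

In this toy data set the tree branches at the effector cell into a cytotoxic and an exhausted arm, mirroring the branched trajectories in panel c. The choice of root cell is an assumption supplied by the analyst; the tree itself carries no direction.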

Pseudotime
Single-cell RNA sequencing (scRNA-seq) can capture different cell types and, when the throughput is sufficient, cell
transitions from one functional state to another. Algorithms for pseudotime ordering can extract from scRNA-seq data
the transcriptional profiles underlying dynamic changes of cells moving through successive states, thereby
reconstructing their overall trajectories in time. This estimated time reference is referred to as pseudotime.

Non-parametric methods including t-SNE have limitations such as loss of large-scale information and intercluster
relationships, but these limitations can be circumvented by interpretable dimensionality reduction153, as recently
demonstrated by clustering immune cell types in the TME using scRNA-seq data from patients with melanoma19. Similar
reproducibility and preservation of global distances can be achieved with uniform manifold approximation and projection
(UMAP)154. Notably, recent studies indicate that in some cases the state of a cell represents a continuum rather than
being assigned to several discrete states, which ensures the plasticity of the immune system to respond to pathogens
or to neoantigens released by the tumour155. The continuous nature of cell states can be represented by pseudo-temporal
trajectories (Fig. 4). In this approach it is assumed that cells with similar expression profiles arise from the same
lineage, and that cells with more similar expression profiles are more closely related156. Once the data have been
analysed (Fig. 4a) and cells have been projected into a low-dimensional space (Fig. 4b), a minimum spanning tree can be
used to build a backbone for cell state transitions, for example from naive to cytotoxic CD8+ T cells (Fig. 4c). This 1D
ordering is referred to as pseudotime.
One of the first developed tools for this pseudotime alignment is Monocle157. Another pioneering method to robustly
reconstruct lineage branching and to measure transitions between cell states is diffusion pseudotime (DPT)
analysis158. This method uses a non-linear
approach for recovering the low-dimensional structure underlying high-dimensional observations159. As of today, various
tools based on different methods have been developed, reviewed156,160 and benchmarked161. One of the top scoring methods
with respect to the analysis of the complexity of the trajectories and overall performance was partition-based graph
abstraction (PAGA). PAGA generates graph-like maps of cells that preserve both continuous and disconnected structure in
data at multiple resolutions162. A very recent and promising addition to the visualization toolbox is a method based on
Markov processes to characterize cell fate probabilities (Palantir)163. Finally, an interesting approach is implemented
in the tool velocyto, which uses exonic and intronic reads from scRNA-seq data to model the abundances of pre-mRNAs and
mature mRNAs to predict gene expression changes over time (that is, RNA velocity). This information is then used to
predict future cell states and to display cell kinetics in the form of a vector field overlaid onto a
dimensionality-reduced representation of the cell populations164.
The longitudinal single-cell analysis of samples is an exemplary application of the usefulness of such visualization
tools165. Using t-SNE clustering, the major monocyte/macrophage subpopulations that comprise the intratumoural myeloid
compartment could be identified, as well as their remodelling upon immune checkpoint blockers. However, there were
limited insights into the origins of the cells that populate the individual clusters, and only computational analyses
of the pseudotime-organized sequence of differentiation/activation events with Monocle2 (ref. 166) revealed that neither
CX3CR1+ macrophages nor iNOS+ macrophages are present in a tumour-induced early state and that there is obviously a
branching point in the fate of intratumoural myeloid cells.
In comparison with scRNA-seq data, visualization of CyTOF data is more advanced as many methods are further developments
of visualization methods for conventional flow cytometry data. Numerous cytometry data visualization and clustering
tools have been developed, such as viSNE146, PhenoGraph167, SPADE168, X-shift169, ACCENSE170, FlowSOM171 and Citrus172,
and these tools have been comprehensively reviewed173 and summarized in a web resource (Bridging Bench, Biology, and
Bioinformatics in the Field of Mass Cytometry). An interesting approach for visualization of single-cell CyTOF data is
scaffold maps, which are based on force-directed graphs and have been used to reveal immune organization in different
tissues174.
In general, the choice of a specific visualization tool for scRNA-seq or CyTOF data depends on the functionality, the
programming preferences (for example, R or Python), the size of the data sets to be analysed and the computational
requirements, and should be made in the context of the problem addressed.

Emerging methods and future trends
Within the past few years numerous methods and tools for interrogating cancer immunity have matured, and we do not
expect substantial improvements for some of them in the near future. Examples of such mature tools are those for HLA
typing and the tools for predicting class I HLA binding affinity from NGS data (at least for the common alleles). In
other areas, little progress has been achieved for various reasons. Accurate prediction of class II HLA binding affinity
is still challenging for both biological and technical reasons. First, the length of the binding peptides is variable
(between 13 and 25 amino acids) and the peptide-flanking regions on either side of the binding core affect peptide–HLA
binding. Additionally, there is a scarcity of both positive and negative training data sets. Thus, rather than
optimizing algorithms to improve their performance by a few per cent (for example, prediction of class I HLA binding
affinity), efforts should be directed towards generating training data and developing methods for applications that are
advancing slowly.
Another extremely challenging area is the prediction of immunogenicity of neoantigens, that is, which neoantigens will
induce a T cell response (Box 1). Understanding the TCR recognition rules for peptide–HLA complexes (pHLA) would
tremendously help for designing cancer vaccines and enabling T cell engineering for solid cancers. Recent studies
demonstrated that TCR sequences can be assigned to an antigen specificity by sequence analysis alone175,176. Although in
these studies only a few epitopes from common viruses were used, the findings suggest that the development of a
generalizable model of TCR–pHLA recognition might be possible, which would be an important step towards designing TCR
sequences with neoantigen specificity and, hence, rationally engineering T cell immunity against tumours.
The computational tools reviewed here are used to analyse single molecular entities such as RNA expression or protein
expression. Emerging NGS technologies enable simultaneous measurements of different molecular entities such as scRNA-seq
coupled with cell-surface protein expression, as in the two related methods of cellular indexing of transcriptomes and
epitopes (CITE-seq)177 and RNA expression and protein sequencing (REAP-seq)178. Similarly, other methods combine spatial
and molecular data like Spatial Transcriptomics42 and simultaneous detection of proteins and transcripts using IMC179
(see also Box 3). These and other upcoming assays will require innovative computational methods and tools for
integrative analyses of heterogeneous data in both bulk-tissue and single-cell settings. Specifically, integrating
information across different modalities associated with single-cell data sets such as transcriptomic, epigenomic,
proteomic and spatially resolved single-cell data will be necessary to gain deep biological understanding beyond listing
of cell clusters180. Additionally, transferring information from one data set to another will be tremendously helpful
for exploratory analysis and biological interpretation. Such original strategies using information integration and
transfer learning have been recently developed36,181. For example, a novel approach has been used to transfer scRNA-seq
annotations onto chromatin accessibility data (generated using single-cell assay for transposase accessible chromatin
(scATAC-seq)), thereby revealing finer distinctions among the cell types36 that were not possible by using solely
scATAC-seq data.
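
The idea of transferring annotations from one data set to another can be illustrated with a deliberately simple Python sketch. It does not reproduce the approach of the cited work (ref. 36). It substitutes a plain k-nearest-neighbour vote and assumes that reference and query cells have already been placed in a shared low-dimensional embedding, which is the hard part in practice. All coordinates, labels and cell names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical annotated reference cells (for example, from scRNA-seq),
# given as (coordinates in a shared embedding, cell-type label).
reference = [
    ((0.0, 0.1), "naive"),
    ((0.2, 0.0), "naive"),
    ((0.1, 0.3), "naive"),
    ((3.0, 3.1), "exhausted"),
    ((3.2, 2.9), "exhausted"),
    ((2.9, 3.3), "exhausted"),
]

# Hypothetical unlabelled query cells (for example, from scATAC-seq),
# assumed to be projected into the same embedding.
query = {"cell_a": (0.1, 0.1), "cell_b": (3.1, 3.0)}


def transfer_labels(reference, query, k=3):
    """Give each query cell the majority label of its k nearest reference cells."""
    predictions = {}
    for name, coords in query.items():
        neighbours = sorted(
            reference, key=lambda item: math.dist(item[0], coords)
        )[:k]
        votes = Counter(label for _, label in neighbours)
        label, count = votes.most_common(1)[0]
        # Store the label and the fraction of neighbours that agree with it.
        predictions[name] = (label, count / k)
    return predictions


predictions = transfer_labels(reference, query)
print(predictions)
```

The agreement fraction is a crude confidence score. Query cells with low agreement would be candidates for manual inspection rather than automatic annotation.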


Box 3 | Multimodal interrogation of cancer immunity

Despite their great potential, it is likely that the information content derived from sequencing-based assays will have
to be complemented with additional modalities in order to comprehensively dissect tumour–immune cell interactions and
inform therapy for individual patients. For example, longitudinal monitoring and assessment of the immunotherapy
response will probably be based on approaches combining radiological imaging, liquid biopsies and computational methods
to infer changes in tumour composition and heterogeneity. A promising radiomics approach addressing this issue was
recently presented200. Combining contrast-enhanced computer tomography images and RNA sequencing (RNA-seq) data enabled
the development of a radiomic signature for tumour-infiltrating CD8+ T cells. Such an approach could pave the way for
the non-invasive assessment of immune infiltration in tumours and, hence, enable longitudinal monitoring of the effects
of immunotherapy.
Additionally, it will also be necessary to dissect tumour cell signalling for several reasons. First, dysfunctional
signalling in tumours arises not only from gene mutations but also from epigenetic modifications and rewiring of
signalling networks201. Second, cell signalling determines several processes including cell growth, cell–cell
communications, nutrient responses, cell cycle and cell death. Last, as nearly all targeted drugs are directed against
signalling molecules, combination immunotherapies with these drugs will also require analyses of the pharmacological
signalling rewiring.
Signalling information from analyses of oncogenic signalling in tumours, signalling rewiring induced by drugs and the
crosstalk with immune-related pathways has not been used so far in clinical decision-making due to the lack of sensitive
and reproducible technologies for protein-based measurements. RNA-seq data as a surrogate for phosphoproteomic assays
are suboptimal because the regulation of signalling pathways is predominantly at the post-transcriptional level.
However, recent developments of (phospho)proteomics techniques202 that enable deep coverage and quantitative consistency
and accuracy provide for the first time the possibility to comprehensively probe signalling pathways and networks.
Notably, advances in organoid technologies203 enable the generation of personalized cancer models and thereby also the
possibility to study signalling rewiring for individual patients.

Complementary to the data-driven modelling as described in the previous sections, mathematical mechanistic modelling and
simulations hold promise to derive novel insights by providing quantitative predictions that can be experimentally
validated182. For instance, pioneering attempts were directed towards investigating the dynamics of tumorigenesis during
immunosurveillance183 or developing patient-specific models to simulate the effects of combination therapies184. We
expect that in the near future the computational toolbox will be enriched with such modelling approaches, often
integrated in an experimental–computational cycle.

Conclusions
Comprehensive and quantitative interrogation of cancer immunity requires the use of molecular and cellular tools, as
well as sophisticated computational methods to analyse complex and large data sets. Given the maturity and robustness of
NGS-based technologies and the availability of the associated computational tools reviewed here, we expect that enormous
amounts of data will be generated in the upcoming years. This will pose considerable challenges to, first, make these
data available and, second, extract information for immuno-oncology from the data sets. For example, there is currently
no centralized database that hosts genomic/immunogenomic data from published clinical studies with immune checkpoint
blockers and in many cases researchers have to request access from individual laboratories. Similarly, a centralized
database that enables queries across scRNA-seq data sets would be extremely helpful. Both challenges could be solved,
but they require community efforts to address and overcome ethical issues (for example, access to controlled data
including human sequence information) and technical issues (for example, the size of scRNA-seq data sets). Thus,
existing and future computational tools will be instrumental for the interrogation of cancer immunity in individual
patients and will ultimately enable precision immuno-oncology.

Published online xx xx xxxx

1. Tang, J. et al. Trial watch: the clinical trial landscape for PD1/PDL1 immune checkpoint inhibitors. Nat. Rev. Drug Discov. 17, 854–855 (2018).
2. Charoentong, P. et al. Pan-cancer immunogenomic analyses reveal genotype–immunophenotype relationships and predictors of response to checkpoint blockade. Cell Rep. 18, 248–262 (2017).
3. Thorsson, V. et al. The immune landscape of cancer. Immunity 48, 812–830.e14 (2018).
4. Hackl, H., Charoentong, P., Finotello, F. & Trajanoski, Z. Computational genomics tools for dissecting tumour–immune cell interactions. Nat. Rev. Genet. 17, 441 (2016).
5. Chen, D. S. & Mellman, I. Oncology meets immunology: the cancer-immunity cycle. Immunity 39, 1–10 (2013).
6. Galluzzi, L., Chan, T. A., Kroemer, G., Wolchok, J. D. & López-Soto, A. The hallmarks of successful anticancer immunotherapy. Sci. Transl Med. 10, eaat7807 (2018).
7. Fridman, W. H., Pagès, F., Sautès-Fridman, C. & Galon, J. The immune contexture in human tumours: impact on clinical outcome. Nat. Rev. Cancer 12, 298–306 (2012).
8. Galluzzi, L., Buqué, A., Kepp, O., Zitvogel, L. & Kroemer, G. Immunological effects of conventional chemotherapy and targeted anticancer agents. Cancer Cell 28, 690–714 (2015).
9. Lee, C.-H., Yelensky, R., Jooss, K. & Chan, T. A. Update on tumor neoantigens and their utility: why it is good to be different. Trends Immunol. 39, 536–548 (2018).
10. Schumacher, T. N., Scheper, W. & Kvistborg, P. Cancer neoantigens. Annu. Rev. Immunol. 37, 173–200 (2018).
11. Havel, J. J., Chowell, D. & Chan, T. A. The evolving landscape of biomarkers for checkpoint inhibitor immunotherapy. Nat. Rev. Cancer 19, 133–150 (2019).
12. Fridman, W. H., Zitvogel, L., Sautès-Fridman, C. & Kroemer, G. The immune contexture in cancer prognosis and treatment. Nat. Rev. Clin. Oncol. 14, 717–734 (2017).
13. Galon, J. et al. Type, density, and location of immune cells within human colorectal tumors predict clinical outcome. Science 313, 1960–1964 (2006).
14. Galon, J. & Bruni, D. Approaches to treat immune hot, altered and cold tumours with combination immunotherapies. Nat. Rev. Drug Discov. 18, 197–218 (2019).
15. Motzer, R. J. et al. Avelumab plus axitinib versus sunitinib for advanced renal-cell carcinoma. N. Engl. J. Med. 380, 1103–1115 (2019).
16. Panciera, T., Azzolin, L., Cordenonsi, M. & Piccolo, S. Mechanobiology of YAP and TAZ in physiology and disease. Nat. Rev. Mol. Cell Biol. 18, 758–770 (2017).
17. Knight, R. et al. Best practices for analysing microbiomes. Nat. Rev. Microbiol. 16, 410–422 (2018).
18. Giladi, A. & Amit, I. Single-cell genomics: a stepping stone for future immunology discoveries. Cell 172, 14–21 (2018).
19. Finotello, F. & Eduati, F. Multi-omics profiling of the tumor microenvironment: paving the way to precision immuno-oncology. Front. Oncol. 8, 430 (2018).
20. Kolodziejczyk, A. A., Kim, J. K., Svensson, V., Marioni, J. C. & Teichmann, S. A. The technology and biology of single-cell RNA sequencing. Mol. Cell 58, 610–620 (2015).
21. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).
22. Ziegenhain, C. et al. Comparative analysis of single-cell RNA sequencing methods. Mol. Cell 65, 631–643.e4 (2017).
23. Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
24. Bendall, S. C. & Nolan, G. P. From single cells to deep phenotypes in cancer. Nat. Biotechnol. 30, 639–647 (2012).
25. Mansfield, J. R., Hoyt, C. & Levenson, R. M. Visualization of microscopy-based spectral imaging data from multi-label tissue sections. Curr. Protoc. Mol. Biol. https://doi.org/10.1002/0471142727.mb1419s84 (2008).
26. Mansfield, J. R. Multispectral imaging: a review of its technical aspects and applications in anatomic pathology. Vet. Pathol. 51, 185–210 (2014).
27. Stack, E. C., Wang, C., Roman, K. A. & Hoyt, C. C. Multiplexed immunohistochemistry, imaging, and quantitation: a review, with an assessment of Tyramide signal amplification, multispectral imaging and multiplex analysis. Methods 70, 46–58 (2014).
28. Tsujikawa, T. et al. Quantitative multiplex immunohistochemistry reveals myeloid-inflamed tumor-immune complexity associated with poor prognosis. Cell Rep. 19, 203–217 (2017).
29. Lin, J.-R., Fallahi-Sichani, M. & Sorger, P. K. Highly multiplexed imaging of single cells using a high-throughput cyclic immunofluorescence method. Nat. Commun. 6, 8390 (2015).
30. Gerdes, M. J. et al. Highly multiplexed single-cell analysis of formalin-fixed, paraffin-embedded cancer tissue. Proc. Natl Acad. Sci. USA 110, 11982–11987 (2013).
31. Schubert, W. et al. Analyzing proteome topology and function by automated multidimensional fluorescence microscopy. Nat. Biotechnol. 24, 1270–1278 (2006).


32. Giesen, C. et al. Highly multiplexed imaging of tumor tissues with subcellular resolution by mass cytometry. Nat. Methods 11, 417–422 (2014).
33. Angelo, M. et al. Multiplexed ion beam imaging of human breast tumors. Nat. Med. 20, 436–442 (2014).
34. Keren, L. et al. A structured tumor-immune microenvironment in triple negative breast cancer revealed by multiplexed ion beam imaging. Cell 174, 1373–1387.e19 (2018).
35. Goltsev, Y. et al. Deep profiling of mouse splenic architecture with CODEX multiplexed imaging. Cell 174, 968–981.e15 (2018).
36. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).
37. Regev, A. et al. The human cell atlas. eLife 6, e27041 (2017).
38. Lubeck, E., Coskun, A. F., Zhiyentayev, T., Ahmad, M. & Cai, L. Single-cell in situ RNA profiling by sequential hybridization. Nat. Methods 11, 360–361 (2014).
39. Eng, C.-H. L. et al. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH. Nature 568, 235–239 (2019).
40. Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
41. Lee, J. H. et al. Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues. Nat. Protoc. 10, 442–458 (2015).
42. Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016).
43. Rodriques, S. G. et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019).
44. Ding, L., Wendl, M. C., McMichael, J. F. & Raphael, B. J. Expanding the computational toolbox for mining cancer genomes. Nat. Rev. Genet. 15, 556–570 (2014).
45. Xu, C. A review of somatic single nucleotide variant calling algorithms for next-generation sequencing data. Comput. Struct. Biotechnol. J. 16, 15–24 (2018).
46. Szolek, A. et al. OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics 30, 3310–3316 (2014).
47. Shukla, S. A. et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat. Biotechnol. 33, 1152–1158 (2015).
48. Boegel, S. et al. HLA typing from RNA-seq sequence reads. Genome Med. 4, 102 (2012).
49. Lee, H. & Kingsford, C. Kourami: graph-guided assembly for novel human leukocyte antigen allele discovery. Genome Biol. 19, 16 (2018).
50. Dilthey, A. T. et al. HLA*LA—HLA typing from linearly projected graph alignments. Bioinformatics https://doi.org/10.1093/bioinformatics/btz235 (2019).
51. Orenbuch, R. et al. arcasHLA: high resolution HLA typing from RNAseq. Bioinformatics https://doi.org/10.1093/bioinformatics/btz474 (2019).
52. Xie, C. et al. Fast and accurate HLA typing from short-read next-generation sequence data with xHLA. Proc. Natl Acad. Sci. USA 114, 8059–8064 (2017).
53. Kawaguchi, S., Higasa, K., Shimizu, M., Yamada, R. & Matsuda, F. HLA-HD: an accurate HLA typing algorithm for next-generation sequencing data. Hum. Mutat. 38, 788–797 (2017).
54. Buchkovich, M. L. et al. HLAProfiler utilizes k-mer profiles to improve HLA calling accuracy for rare and common alleles in RNA-seq data. Genome Med. 9, 86 (2017).
55. Nielsen, M. et al. Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci. 12, 1007–1017 (2003).
56. Jurtz, V. et al. NetMHCpan-4.0: improved peptide–MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data. J. Immunol. 199, 3360–3368 (2017).
This study and Nielsen et al. (2003) describe the original and the latest version of the popular tool NetMHCpan that predicts the binding affinity of peptides to class I MHC molecules and provides high-accuracy predictions for both well-annotated and novel alleles.
57. Han, Y. & Kim, D. Deep convolutional neural networks for pan-specific peptide–MHC class I binding prediction. BMC Bioinformatics 18, 585 (2017).
58. Liu, Z. et al. DeepSeqPan, a novel deep convolutional neural network model for pan-specific class I HLA–peptide binding affinity prediction. Sci. Rep. 9, 794 (2019).
59. Liu, G. et al. PSSMHCpan: a novel PSSM-based software for predicting class I peptide–HLA binding affinity. Gigascience 6, 1–11 (2017).
60. Hundal, J. et al. pVACtools: a computational toolkit to identify and visualize cancer neoantigens. Preprint at bioRxiv https://doi.org/10.1101/501817 (2019).
61. Hundal, J. et al. pVAC-Seq: a genome-guided in silico approach to identifying tumor neoantigens. Genome Med. 8, 11 (2016).
This study proposes a method for neoantigen vaccine design based on the prediction of peptide–MHC binding affinity and other features linked to neoantigen immunogenicity. Hundal et al. (2019) presents a suite for neoantigen predictions based on different machine-learning methods.
62. Gfeller, D. & Bassani-Sternberg, M. Predicting antigen presentation—what could we learn from a million peptides? Front. Immunol. 9, 1716 (2018).
63. O’Donnell, T. J. et al. MHCflurry: open-source class I MHC binding affinity prediction. Cell Syst. 7, 129–132.e4 (2018).
64. Boehm, K. M., Bhinder, B., Raja, V. J., Dephoure, N. & Elemento, O. Predicting peptide presentation by major histocompatibility complex class I: an improved machine learning approach to the immunopeptidome. BMC Bioinformatics 20, 7 (2019).
65. Bassani-Sternberg, M. et al. Deciphering HLA-I motifs across HLA peptidomes improves neo-antigen predictions and identifies allostery regulating HLA specificity. PLOS Comput. Biol. 13, e1005725 (2017).
66. Gfeller, D. et al. The length distribution and multiple specificity of naturally presented HLA-I ligands. J. Immunol. 201, 3705–3716 (2018).
67. Bulik-Sullivan, B. et al. Deep learning using tumor HLA peptide mass spectrometry datasets improves neoantigen identification. Nat. Biotechnol. 37, 55–63 (2019).
68. Andreatta, M. & Nielsen, M. Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Bioinformatics 32, 511–517 (2016).
69. Vita, R. et al. The Immune Epitope Database (IEDB) 3.0. Nucleic Acids Res. 43, D405–D412 (2015).
70. Vizcaíno, J. A. et al. 2016 update of the PRIDE database and its related tools. Nucleic Acids Res. 44, 11033 (2016).
71. Shao, W. et al. The SysteMHC Atlas project. Nucleic Acids Res. 46, D1237–D1247 (2017).
72. Yang, W. et al. Immunogenic neoantigens derived from gene fusions stimulate T cell responses. Nat. Med. 25, 767–775 (2019).
73. Jensen, K. K. et al. Improved methods for predicting peptide binding affinity to MHC class II molecules. Immunology 154, 394–406 (2018).
74. Zhao, W. & Sher, X. Systematically benchmarking peptide–MHC binding predictors: from synthetic to naturally processed epitopes. PLOS Comput. Biol. 14, e1006457 (2018).
75. Racle, J., Michaux, J., Rockinger, G. A. & Arnaud, M. Deep motif deconvolution of HLA-II peptidomes for robust class II epitope predictions. Preprint at bioRxiv https://doi.org/10.1101/539338 (2019).
76. Bonsack, M. et al. Performance evaluation of MHC class-I binding prediction tools based on an experimentally validated MHC–peptide binding data set. Cancer Immunol. Res. 7, 719–736 (2019).
77. [No authors listed] The problem with neoantigen prediction. Nat. Biotechnol. 35, 97 (2017).
78. Sahin, U. et al. Personalized RNA mutanome vaccines mobilize poly-specific therapeutic immunity against cancer. Nature 547, 222–226 (2017).
79. Ott, P. A. et al. An immunogenic personal neoantigen vaccine for patients with melanoma. Nature 547, 217–221 (2017).
80. Camidge, D. R., Doebele, R. C. & Kerr, K. M. Comparing and contrasting predictive biomarkers for immunotherapy and targeted therapy of NSCLC. Nat. Rev. Clin. Oncol. 16, 341–355 (2019).
81. Jiang, P. et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat. Med. 24, 1550–1558 (2018).
82. Auslander, N. et al. Robust prediction of response to immune checkpoint blockade therapy in metastatic melanoma. Nat. Med. 24, 1545–1549 (2018).
83. Aran, D., Hu, Z. & Butte, A. J. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 18, 220 (2017).
84. Tappeiner, E. et al. TIminer: NGS data mining pipeline for cancer immunology and immunotherapy. Bioinformatics 33, 3140–3141 (2017).
85. Becht, E., Giraldo, N. A. & Lacroix, L. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 17, 218 (2016).
86. Finotello, F. & Trajanoski, Z. Quantifying tumor-infiltrating immune cells from transcriptomics data. Cancer Immunol. Immunother. 67, 1031–1040 (2018).
87. Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457 (2015).
This study presents pioneering work on a computational method (CIBERSORT) for building immune cell-specific signatures, cell-type deconvolution from bulk transcriptomics data and extraction of transcriptional profiles.
88. Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37, 773–782 (2019).
89. Li, B. et al. Comprehensive analyses of tumor immunity: implications for cancer immunotherapy. Genome Biol. 17, 174 (2016).
90. Racle, J., de Jonge, K., Baumgaertner, P., Speiser, D. E. & Gfeller, D. Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data. eLife 6, e26476 (2017).
91. Finotello, F. et al. Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data. Genome Med. 11, 34 (2019).
92. Sturm, G. et al. Comprehensive evaluation of transcriptome-based cell-type quantification methods for immuno-oncology. Bioinformatics 35, i436–i445 (2019).
93. Zaitsev, K., Bambouskova, M., Swain, A. & Artyomov, M. N. Complete deconvolution of cellular mixtures based on linearity of transcriptional signatures. Nat. Commun. 10, 2209 (2019).
94. Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189–196 (2016).
95. Puram, S. V. et al. Single-cell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer. Cell 171, 1611–1624.e24 (2017).
96. Li, H. et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 49, 708–718 (2017).
97. Lavin, Y. et al. Innate immune landscape in early lung adenocarcinoma by paired single-cell analyses. Cell 169, 750–765.e17 (2017).
98. Azizi, E. et al. Single-cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell 174, 1293–1308.e36 (2018).
99. Lambrechts, D. et al. Phenotype molding of stromal cells in the lung tumor microenvironment. Nat. Med. 24, 1277–1289 (2018).
100. Villani, A.-C. et al. Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors. Science 356, eaah4573 (2017).
101. Zheng, C. et al. Landscape of infiltrating T cells in liver cancer revealed by single-cell sequencing. Cell 169, 1342–1356.e16 (2017).
102. Navin, N. E. The first five years of single-cell cancer genomics and beyond. Genome Res. 25, 1499–1507 (2015).
103. Stegle, O., Teichmann, S. A. & Marioni, J. C. Computational and analytical challenges in single-cell transcriptomics. Nat. Rev. Genet. 16, 133–145 (2015).
104. Yuan, G.-C. et al. Challenges and emerging directions in single-cell analysis. Genome Biol. 18, 84 (2017).
105. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411 (2018).
This article presents the R toolkit Seurat 3.0 for the analysis and integration of multimodal single-cell data.
106. McCarthy, D. J., Campbell, K. R., Lun, A. T. L. & Wills, Q. F. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33, 1179–1186 (2017).
107. Guo, M., Wang, H., Potter, S. S., Whitsett, J. A. & Xu, Y. SINCERA: a pipeline for single-cell RNA-seq profiling analysis. PLOS Comput. Biol. 11, e1004575 (2015).
108. Lun, A. T. L., McCarthy, D. J. & Marioni, J. C. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res. 5, 2122 (2016).


109. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: 133. Canzar, S., Neu, K. E., Tang, Q., Wilson, P. C. & 158. Haghverdi, L., Büttner, M., Wolf, F. A., Buettner, F. &
large-scale single-cell gene expression data analysis. Khan, A. A. BASIC: BCR assembly from single cells. Theis, F. J. Diffusion pseudotime robustly reconstructs
Genome Biol. 19, 15 (2018). Bioinformatics 33, 425–427 (2017). lineage branching. Nat. Methods 13, 845–848
110. Zhu, X. et al. Granatum: a graphical single-cell 134. Lindeman, I. et al. BraCeR: B-cell-receptor (2016).
RNA-seq analysis pipeline for genomics scientists. reconstruction and clonality inference from single-cell 159. Coifman, R. R. et al. Geometric diffusions as a tool for
Genome Med. 9, 108 (2017). RNA-seq. Nat. Methods 15, 563–565 (2018). harmonic analysis and structure definition of data:
111. Gardeux, V., David, F. P. A., Shajkofci, A., Schwalie, P. C. 135. Upadhyay, A. A. et al. BALDR: a computational diffusion maps. Proc. Natl Acad. Sci. USA 102,
& Deplancke, B. ASAP: a web-based platform for the pipeline for paired heavy and light chain 7426–7431 (2005).
analysis and interactive visualization of single-cell immunoglobulin reconstruction in single-cell RNA-seq 160. Cannoodt, R., Saelens, W. & Saeys, Y. Computational
RNA-seq data. Bioinformatics 33, 3123–3125 data. Genome Med. 10, 20 (2018). methods for trajectory inference from single-cell
(2017). 136. Eltahla, A. A. et al. Linking the T cell receptor transcriptomics. Eur. J. Immunol. 46, 2496–2506
112. Singer, M. & Anderson, A. C. Revolutionizing cancer to the single cell transcriptome in antigen-specific (2016).
immunology: the power of next-generation sequencing human T cells. Immunol. Cell Biol. 94, 604–611 161. Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A
technologies. Cancer Immunol Res 7, 168–173 (2016). comparison of single-cell trajectory inference methods.
(2019). 137. Rizzetto, S. et al. B-cell receptor reconstruction from Nat. Biotechnol. 37, 547–554 (2019).
113. Kiselev, V. Y., Andrews, T. S. & Hemberg, M. single-cell RNA-seq with VDJPuzzle. Bioinformatics This study presents a comprehensive benchmark
Challenges in unsupervised clustering of single-cell 34, 2846–2847 (2018). of many computational tools for single-cell
RNA-seq data. Nat. Rev. Genet. 20, 273–282 (2019). 138. Zhang, L. et al. Lineage tracking reveals dynamic pseudotime trajectory inference.
114. Luecken, M. D. & Theis, F. J. Current best practices in relationships of T cells in colorectal cancer. Nature 162. Wolf, F. A. et al. PAGA: graph abstraction reconciles
single-cell RNA-seq analysis: a tutorial. Molecular 564, 268–272 (2018). clustering with trajectory inference through a topology
Systems Biology 15, e8746 (2019). 139. Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. preserving map of single cells. Genome Biol. 20, 59
This study presents best-practice recommendations NIH Image to ImageJ: 25 years of image analysis. (2019).
covering the different steps of scRNA-seq analysis, Nat. Methods 9, 671–675 (2012). 163. Setty, M. et al. Characterization of cell fate probabilities
also documented in a bioinformatics workflow. 140. Carpenter, A. E. et al. CellProfiler: image analysis in single-cell data with Palantir. Nat. Biotechnol. https://
115. Sagar, Herman, J. S. & Grün, D. FateID infers cell fate software for identifying and quantifying cell doi.org/10.1038/s41587-019-0068-4 (2019).
bias in multipotent progenitors from single-cell phenotypes. Genome Biol. 7, R100 (2006). 164. La Manno, G. et al. RNA velocity of single cells. Nature
RNA-seq data. Nat. Methods 15, 379–386 (2018). This original publication describes a free, flexible, 560, 494–498 (2018).
116. Jiang, L., Chen, H., Pinello, L. & Yuan, G.-C. GiniClust: user-friendly and continuously maintained software 165. Gubin, M. M. et al. High-dimensional analysis
detecting rare cell types from single-cell gene package for developing image analysis and delineates myeloid and lymphoid compartment
expression data with Gini index. Genome Biol. 17, phenotyping pipelines. remodeling during successful immune-checkpoint
144 (2016). 141. Sommer, C., Straehle, C., Kothe, U. & Hamprecht, F. A. cancer therapy. Cell 175, 1014–1030.e19 (2018).
117. Kiselev, V. Y., Yiu, A. & Hemberg, M. scmap: projection in 2011 IEEE Int. Symp. on Biomed. Imaging: From 166. Qiu, X. et al. Reversed graph embedding resolves
of single-cell RNA-seq data across data sets. Nat. Nano to Macro https://doi.org/10.1109/ complex single-cell trajectories. Nat. Methods 14,
Methods 15, 359–362 (2018). isbi.2011.5872394 (IEEE, 2011). 979–982 (2017).
118. Aran, D. et al. Reference-based analysis of lung 142. Dao, D. et al. CellProfiler Analyst: interactive data 167. Levine, J. H. et al. Data-driven phenotypic dissection
single-cell sequencing reveals a transitional profibrotic exploration, analysis and classification of large of AML reveals progenitor-like cells that correlate with
macrophage. Nat. Immunol. 20, 163–172 (2019). biological image sets. Bioinformatics 32, 3210–3212 prognosis. Cell 162, 184–197 (2015).
119. Pliner, H. A., Shendure, J. & Trapnell, C. Supervised (2016). 168. Qiu, P. et al. Extracting a cellular hierarchy from
classification enables rapid annotation of cell atlases. 143. Schapiro, D. et al. histoCAT: analysis of cell phenotypes high-dimensional cytometry data with SPADE.
Preprint at bioRxiv https://doi.org/10.1101/538652 and interactions in multiplex image cytometry data. Nat. Biotechnol. 29, 886–891 (2011).
(2019). Nat. Methods 14, 873–876 (2017). 169. Samusik, N., Good, Z., Spitzer, M. H., Davis, K. L. &
120. Aevermann, B. D. et al. Cell type discovery using 144. Van Valen, D. A. et al. Deep learning automates the Nolan, G. P. Automated mapping of phenotype space
single-cell transcriptomics: implications for ontological quantitative analysis of individual cells in live-cell with single-cell data. Nat. Methods 13, 493–496
representation. Hum. Mol. Genet. 27, R40–R47 imaging experiments. PLOS Comput. Biol. 12, (2016).
(2018). e1005177 (2016). 170. Shekhar, K., Brodin, P., Davis, M. M. &
121. Hou, R., Denisenko, E. & Forrest, A. R. R. scMatch: 145. Maaten, L. vander & Hinton, G. Visualizing data using Chakraborty, A. K. Automatic classification of cellular
a single-cell gene expression profile annotation tool t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008). expression by nonlinear stochastic embedding
using reference datasets. Bioinformatics https:// This study is a pioneering work for visualizing (ACCENSE). Proc. Natl Acad. Sci. USA 111, 202–207
doi.org/10.1093/bioinformatics/btz292 (2019). high-dimensional data using non-linear (2014).
122. Tian, L. et al. Benchmarking single cell transformation in two dimensions (t-SNE). 171. Van Gassen, S. et al. FlowSOM: using self-organizing
RNA-sequencing analysis pipelines using mixture 146. Amir, E.-A. D. et al. viSNE enables visualization of high maps for visualization and interpretation of cytometry
control experiments. Nat. Methods 16, 479–487 dimensional single-cell data and reveals phenotypic data. Cytometry A 87, 636–645 (2015).
(2019). heterogeneity of leukemia. Nat. Biotechnol. 31, 172. Bruggner, R. V., Bodenmiller, B., Dill, D. L.,
123. Heather, J. M., Ismail, M., Oakes, T. & Chain, B. 545–552 (2013). Tibshirani, R. J. & Nolan, G. P. Automated identification
High-throughput sequencing of the T-cell receptor 147. Van Der Maaten, L. Accelerating t-SNE using of stratifying signatures in cellular subpopulations.
repertoire: pitfalls and opportunities. Brief. Bioinform. tree-based algorithms. J. Mach. Learn. Res. 15, Proc. Natl Acad. Sci. USA 111, E2770–E2777
19, 554–565 (2018). 3221–3245 (2014). (2014).
124. Bolotin, D. A. et al. MiXCR: software for comprehensive 148. Linderman, G. C., Rachh, M., Hoskins, J. G., 173. Olsen, L. R., Leipold, M. D., Pedersen, C. B. &
adaptive immunity profiling. Nat. Methods 12, Steinerberger, S. & Kluger, Y. Efficient algorithms for Maecker, H. T. The anatomy of single cell mass
380–381 (2015). t-distributed stochastic neighborhood embedding. cytometry data. Cytometry A 95, 156–172 (2019).
125. Bolotin, D. A. et al. Antigen receptor repertoire Preprint at arXiv https://arxiv.org/abs/1712.09005 174. Spitzer, M. H. et al. IMMUNOLOGY. An interactive
profiling from RNA-seq data. Nat. Biotechnol. 35, (2017). reference framework for modeling a dynamic immune
908–911 (2017). 149. van Unen, V. et al. Visual analysis of mass cytometry system. Science 349, 1259425 (2015).
126. Li, B. et al. Landscape of tumor-infiltrating T cell data by hierarchical stochastic neighbour embedding 175. Glanville, J. et al. Identifying specificity groups
repertoire of human cancers. Nat. Genet. 48, reveals rare cell types. Nat. Commun. 8, 1740 (2017). in the T cell receptor repertoire. Nature 547, 94–98
725–732 (2016). 150. Wattenberg, M., Viégas, F. & Johnson, I. How to use (2017).
127. Hu, X. et al. Landscape of B cell immunity and related t-SNE effectively. Distill https://doi.org/10.23915/ 176. Dash, P. et al. Quantifiable predictive features define
immune evasion in human cancers. Nat. Genet. 51, distill.00002 (2016). epitope-specific T cell receptor repertoires. Nature
560–567 (2019). 151. Wang, B., Zhu, J., Pierson, E., Ramazzotti, D. & 547, 89–93 (2017).
128. Bolotin, D. A., Poslavsky, S., Davydov, A. N. & Batzoglou, S. Visualization and analysis of single-cell 177. Stoeckius, M. et al. Simultaneous epitope and
Chudakov, D. M. Reply to ‘Evaluation of immune RNA-seq data by kernel-based similarity learning. Nat. transcriptome measurement in single cells.
repertoire inference methods from RNA-seq data’. Methods 14, 414–416 (2017). Nat. Methods 14, 865–868 (2017).
Nat. Biotechnol. 36, 1035–1036 (2018). 152. Pierson, E. & Yau, C. ZIFA: dimensionality reduction 178. Peterson, V. M. et al. Multiplexed quantification of
129. Mose, L. E. et al. Assembly-based inference of B-cell for zero-inflated single-cell gene expression analysis. proteins and transcripts in single cells. Nat. Biotechnol.
receptor repertoires from short read RNA sequencing Genome Biol. 16, 241 (2015). 35, 936–939 (2017).
data with V’DJer. Bioinformatics 32, 3729–3734 153. Ding, J., Condon, A. & Shah, S. P. Interpretable 179. Schulz, D. et al. Simultaneous multiplexed imaging
(2016). dimensionality reduction of single cell transcriptome of mRNA and proteins with subcellular resolution in
130. Stubbington, M. J. T. et al. T cell fate and clonality data with deep generative models. Nat. Commun. 9, breast cancer tissue samples by mass cytometry.
inference from single-cell transcriptomes. Nat. 2002 (2018). Cell Syst. 6, 531 (2018).
Methods 13, 329–332 (2016). 154. Becht, E. et al. Dimensionality reduction for visualizing 180. Stuart, T. & Satija, R. Integrative single-cell analysis.
This study presents TraCeR, a computational single-cell data using UMAP. Nat. Biotechnol. https:// Nat. Rev. Genet. 20, 257–272 (2019).
method for reconstruction of paired TCR chains doi.org/10.1038/nbt.4314 (2018). 181. Stein-O’Brien, G. L. et al. Decomposing cell identity
and inference of clonality and clonotype networks 155. Villani, A.-C., Sarkizova, S. & Hacohen, N. Systems for transfer learning across cellular measurements,
from full-transcript scRNA-seq data. immunology: learning the rules of the immune system. platforms, tissues, and species. Cell Syst. 8, 395–411.
131. Afik, S. et al. Targeted reconstruction of T cell receptor Annu. Rev. Immunol. 36, 813–842 (2018). e8 (2019).
sequence from single cell RNA-seq links CDR3 length 156. Kester, L. & van Oudenaarden, A. Single-cell 182. Altrock, P. M., Liu, L. L. & Michor, F. The mathematics
to T cell differentiation state. Nucleic Acids Res. 45, transcriptomics meets lineage tracing. Cell Stem Cell of cancer: integrating quantitative models. Nat. Rev.
e148 (2017). 23, 166–179 (2018). Cancer 15, 730–745 (2015).
132. Redmond, D., Poran, A. & Elemento, O. Single-cell 157. Trapnell, C. et al. The dynamics and regulators of cell 183. Iwami, S., Haeno, H. & Michor, F. A race between
TCRseq: paired recovery of entire T-cell α and β chain fate decisions are revealed by pseudotemporal tumor immunoescape and genome maintenance
transcripts in T-cell receptors from single-cell RNAseq. ordering of single cells. Nat. Biotechnol. 32, 381–386 selects for optimum levels of (epi)genetic instability.
Genome Med. 8, 80 (2016). (2014). PLOS Comput. Biol. 8, e1002370 (2012).


184. Kather, J. N. et al. High-throughput screening of combinatorial immunotherapies with patient-specific in silico models of metastatic colorectal cancer. Cancer Res. 78, 5155–5163 (2018).
185. Saini, S. K., Rekers, N. & Hadrup, S. R. Novel tools to assist neoepitope targeting in personalized cancer immunotherapy. Ann. Oncol. 28, xii3–xii10 (2017).
186. Jørgensen, K. W., Rasmussen, M. & Buus, S. NetMHCstab - predicting stability of peptide–MHC-I complexes; impacts for cytotoxic T lymphocyte epitope discovery. Immunology 141, 18–26 (2014).
187. Rasmussen, M. et al. Pan-specific prediction of peptide–MHC class I complex stability, a correlate of T cell immunogenicity. J. Immunol. 197, 1517–1524 (2016).
188. Shugay, M. et al. VDJdb: a curated database of T-cell receptor sequences with known antigen specificity. Nucleic Acids Res. 46, D419–D427 (2018).
189. Blank, C. U., Haanen, J. B., Ribas, A. & Schumacher, T. N. The ‘cancer immunogram’. Science 352, 658–660 (2016).
190. Łuksza, M. et al. A neoantigen fitness model predicts tumour response to checkpoint blockade immunotherapy. Nature 551, 517–520 (2017).
191. Balkwill, F. Cancer and the chemokine network. Nat. Rev. Cancer 4, 540–550 (2004).
192. Rieckmann, J. C. et al. Social network architecture of human immune cells unveiled by quantitative proteomics. Nat. Immunol. 18, 583–593 (2017).
193. Kveler, K. et al. Immune-centric network of cytokines and cells in disease context identified by computational mining of PubMed. Nat. Biotechnol. 36, 651–659 (2018).
194. Vento-Tormo, R. et al. Single-cell reconstruction of the early maternal–fetal interface in humans. Nature 563, 347–353 (2018).
195. Orchard, S. et al. Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nat. Methods 9, 345–350 (2012).
196. Thurley, K., Gerecht, D., Friedmann, E. & Höfer, T. Three-dimensional gradients of cytokine signaling between T cells. PLOS Comput. Biol. 11, e1004206 (2015).
197. Altan-Bonnet, G. & Mukherjee, R. Cytokine-mediated communication: a quantitative appraisal of immune complexity. Nat. Rev. Immunol. https://doi.org/10.1038/s41577-019-0131-x (2019).
198. Choi, H. et al. Transcriptome analysis of individual stromal cell populations identifies stroma–tumor crosstalk in mouse lung cancer model. Cell Rep. 10, 1187–1201 (2015).
199. Yeung, T.-L. et al. Systematic identification of druggable epithelial–stromal crosstalk signaling networks in ovarian cancer. J. Natl Cancer Inst. 111, 272–282 (2019).
200. Sun, R. et al. A radiomics approach to assess tumour-infiltrating CD8 cells and response to anti-PD-1 or anti-PD-L1 immunotherapy: an imaging biomarker, retrospective multicohort study. Lancet Oncol. 19, 1180–1191 (2018).
201. Yaffe, M. B. Why geneticists stole cancer research even though cancer is primarily a signaling disease. Sci. Signal. 12, eaaw3483 (2019).
202. Aebersold, R. & Mann, M. Mass-spectrometric exploration of proteome structure and function. Nature 537, 347–355 (2016).
203. Drost, J. & Clevers, H. Organoids in cancer research. Nat. Rev. Cancer 18, 407–418 (2018).
204. Kobayashi, H. et al. Cancer-associated fibroblasts in gastrointestinal cancer. Nat. Rev. Gastroenterol. Hepatol. 16, 282–295 (2019).
205. Guo, X. et al. Global characterization of T cells in non-small-cell lung cancer by single-cell sequencing. Nat. Med. 24, 978–985 (2018).
206. Sato, K., Tsuyuzaki, K., Shimizu, K. & Nikaido, I. CellFishing.jl: an ultrafast and scalable cell search method for single-cell RNA sequencing. Genome Biol. 20, 31 (2019).
207. Srivastava, D., Iyer, A., Kumar, V. & Sengupta, D. CellAtlasSearch: a scalable search engine for single cells. Nucleic Acids Res. 46, W141–W147 (2018).
208. Navarro, J. F., Sjöstrand, J., Salmén, F., Lundeberg, J. & Ståhl, P. L. ST Pipeline: an automated pipeline for spatial mapping of unique transcripts. Bioinformatics 33, 2591–2593 (2017).
This study presents a comprehensive analysis pipeline and software tools for spatial transcriptomics.
209. Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19, 477 (2018).

Acknowledgements
The authors thank S. Boegel for fruitful discussions on state-of-the-art computational methods. This work was supported by the European Research Council (grant agreement No. 786295 to Z.T.), the Austrian Cancer Aid/Tyrol (project No. 17003 to F.F.), the Austrian Science Fund (FWF) (project No. T 974-B30 to F.F. and projects I3291 and I3978 to Z.T.) and the Vienna Science and Technology Fund (Project LS16–025 to Z.T.). Z.T. is a member of the German Research Foundation (DFG) project TRR 241(INF).

Author contributions
All authors contributed to all aspects of the article.

Competing interests
The authors declare no competing interests.

Peer review information
Nature Reviews Genetics thanks T. Chan, A. Gentles and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links
Bridging Bench, Biology, and Bioinformatics in the Field of Mass Cytometry: http://cytof.biosurf.org
Tumor Deconvolution Challenge: https://www.synapse.org/#!Synapse:syn15589870/wiki/582446
