A Gentle Introduction To Deep Learning in Medical Image Processing
Abstract
This paper tries to give a gentle introduction to deep learning in medical image processing, proceeding from theoretical foundations to applications. We first discuss general reasons for the popularity of deep learning, including several major breakthroughs in computer science. Next, we review the fundamentals of the perceptron and neural networks, along with some basic theory that is often omitted. Doing so allows us to understand the reasons for the rise of deep learning in many application domains. Obviously, medical image processing is one of the areas that has been strongly affected by this rapid progress, in particular in image detection and recognition, image segmentation, image registration, and computer-aided diagnosis. There are also recent trends in physical simulation, modelling, and reconstruction that have led to astonishing results. Yet, some of these approaches neglect prior knowledge and hence bear the risk of producing implausible results. These apparent weaknesses highlight current limitations of deep learning. However, we also briefly discuss promising approaches that might be able to resolve these problems in the future.
Keywords: Introduction, Deep Learning, Machine Learning, Medical Imaging, Image Classification, Image Segmentation, Image Registration, Computer-aided Diagnosis, Physical Simulation, Image Reconstruction
1. Introduction
Over the recent years, Deep Learning (DL) [1] has had a tremendous impact on various fields in science. It has led to significant improvements in speech recognition [2] and image recognition [3], it is able to train artificial agents that beat human players in Go [4] and ATARI games [5], and it creates new artistic images [6, 7] and music [8]. Many of these tasks were considered impossible for computers to solve before the advent of deep learning, even in science fiction literature.
Obviously this technology is also highly relevant for medical imaging. Vari-
ous introductions to the topic can be found in the literature ranging from short
Figure 1: Schematic of the traditional pattern recognition pipeline used for automatic decision making. Sensor data is preprocessed and “hand-crafted” features are extracted in the training and test phases. During training, a classifier is trained that is later used in the test phase to decide the class automatically (after [27]).
Figure 2: Neurons are inspired by biological neurons shown on the left. The resulting computational neuron computes a weighted sum of its inputs which is then processed by an activation function h(x) to determine the output value (cf. Fig. 5). Doing so, we are able to model linear decision boundaries, as the weighted sum can be interpreted as a signed distance to the decision boundary, while the activation determines the actual class membership. On the right-hand side, the XOR problem is shown that cannot be solved by a single linear classifier. It typically requires either curved boundaries or multiple lines.
it takes a bias w0 and a weight vector w = (w1, . . . , wn) as parameters θ = (w0, . . . , wn) to model a decision

ŷ = h(w⊤x + w0)    (1)

using a non-linear activation function h(x). Hence, a single neuron itself can already be interpreted as a classifier, if the activation function is chosen such that it is monotonic, bounded, and continuous. In this case, the maximum and the minimum can be interpreted as a decision for one class or the other.
Typical representatives of such activation functions in the classical literature are the sign function sign(x), resulting in Rosenblatt's perceptron [28], the sigmoid function σ(x) = 1/(1 + e^{−x}), and the hyperbolic tangent tanh(x) = (e^x − e^{−x})/(e^x + e^{−x}) (cf. Fig. 5). A major disadvantage of individual neurons is that they can only model linear decision boundaries, which results in the well-known fact that they are not able to solve the XOR problem. Fig. 2 summarizes the considerations towards the computational neuron graphically.
In combination with other neurons, modelling capabilities increase dramatically. It can be shown that neurons arranged in a single layer already suffice to approximate any continuous function f(x) on a compact subset of ℝⁿ [29]. A single-layer network is conveniently summarized as a linear combination of N individual neurons:

f̂(x) = ∑_{i=0}^{N−1} v_i h(w_i⊤ x + w_{0,i}).    (2)
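As a toy illustration of Eq. (2), the following NumPy sketch approximates a 1-D function by such a linear combination of neurons. For simplicity, the hidden weights are drawn at random and only the output weights v_i are fitted by least squares; this is an assumption made for brevity, not the training scheme of Section 2.3.

```python
import numpy as np

# Eq. (2) as code: f_hat(x) = sum_i v_i * h(w_i^T x + w_{0,i}).
rng = np.random.default_rng(0)
N = 50                                   # number of hidden neurons
x = np.linspace(-np.pi, np.pi, 200)[:, None]
y = np.sin(x)                            # target function to approximate

w = rng.normal(scale=2.0, size=(1, N))   # hidden weights w_i (random)
w0 = rng.normal(scale=2.0, size=N)       # hidden biases w_{0,i}
H = np.tanh(x @ w + w0)                  # neuron outputs h(w_i x + w_{0,i})

v, *_ = np.linalg.lstsq(H, y, rcond=None)   # fit output weights v_i
print("max |f_hat - f| =", float(np.abs(H @ v - y).max()))
```

With enough neurons, the residual shrinks, in line with the universal approximation theorem.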
Figure 3: A decision tree allows describing any partition of space and can thus model any decision boundary. Mapping the tree into a one-layer network is possible. Yet, there still is significant residual error in the resulting function; in the center example, ε ≈ 0.7. In order to reduce this error further, a higher number of neurons would be required. If we construct a network with one node for every inner node in the first layer and one node for every leaf node in the second layer, we are able to construct a network that results in ε = 0.
A simple example is shown on the top left of the figure, and the associated par-
tition of a two-dimensional space is shown below, where black indicates class
y = 1 and white y = 0. According to the universal approximation theorem, we
should be able to map this function into a single layer network. In the center
column, we attempt to do so using the inner nodes of the tree and their inverses
to construct a six neuron basis. In the bottom of the column, we show the basis
functions that are constructed at every node projected into the input space, and
the resulting network’s approximation, also shown in the input space. Here, we
chose the output weights to minimize ||y − ŷ||2 . As can be seen in the result,
not all areas can be recovered correctly. In fact, the maximal error is close
to 0.7 for a function that is bounded by 0 and 1. In order to improve this ap-
proximation, we can choose to introduce a second layer. As shown in the right
column, we can choose the strategy to map all inner nodes to a first layer and
all leaf nodes of the tree to a second layer. Doing so effectively encodes every
partition that is described by the respective leaf node in the second layer. This
approach is able to map our tree correctly with ε = 0. In fact, this approach is
general, holds for all decision trees, and was already described by Ivanova et al.
in 1995 [32]. As such, we can now understand why deeper networks may have
more modelling capacity.
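To make the construction concrete, the following NumPy sketch maps a hypothetical two-node decision tree into a two-layer network of threshold neurons; thresholds, weights, and class assignments are made up for illustration and follow the inner-node/leaf-node scheme described above.

```python
import numpy as np

def step(z):
    # Heaviside activation: 1 where z > 0, else 0
    return (z > 0).astype(float)

# Toy tree on x = (x1, x2): the root tests x1 > 0.5, its left subtree
# tests x2 > 0.5. The y = 1 leaves are (x1 <= 0.5 and x2 > 0.5) and
# (x1 > 0.5). Layer 1 holds one step neuron per inner node:
W1 = np.array([[1.0, 0.0],    # fires iff x1 > 0.5
               [0.0, 1.0]])   # fires iff x2 > 0.5
b1 = np.array([-0.5, -0.5])

# Layer 2 holds one neuron per y = 1 leaf. Each row encodes the
# conjunction along the leaf's path: +1 expects the node to fire,
# -1 expects it not to; the bias thresholds at "all conditions met".
W2 = np.array([[-1.0, 1.0],   # leaf: x1 <= 0.5 and x2 > 0.5
               [ 1.0, 0.0]])  # leaf: x1 > 0.5
b2 = np.array([-0.5, -0.5])

def tree_net(x):
    h = step(x @ W1.T + b1)              # inner-node indicators
    return step(h @ W2.T + b2).max(-1)   # 1 if any y = 1 leaf fires

for x in np.array([[0.2, 0.8], [0.8, 0.2], [0.2, 0.2]]):
    print(x, tree_net(x))                # -> 1.0, 1.0, 0.0
```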
2.3. Network training
Having gained basic insights into neural networks and their topology, we still need to discuss how their parameters θ are actually determined. The
answer is fairly easy: gradient descent. In order to compute a gradient, we need
to define a function that measures the quality of our parameter set θ, the so-
called loss function L(θ). In the following, we will work with simple examples
for loss functions to introduce the concept of back-propagation, which is the
algorithm that is commonly used to efficiently compute gradients for neural
network training.
We can represent a single-layer fully connected network with linear activations simply as ŷ = f̂(x) = W x, i.e., a matrix multiplication. Note that the network's output is now multidimensional with ŷ, y ∈ ℝᵐ. Using an L2-loss, we end up with the following objective function:

L(θ) = ½ ||f̂(x) − y||₂² = ½ ||W x − y||₂².    (4)
In order to update the parameters θ = W in this example, we need to compute
∂L/∂W = (∂L/∂f̂)(∂f̂/∂W) = (W x − y)(x⊤)    (5)

using the chain rule, where ∂L/∂f̂ contributes the factor (W x − y)· and ∂f̂/∂W the factor ·(x⊤). Note that · indicates the operator's side, as matrix-vector
multiplications generally do not commute. The final weight update is then
obtained as
W^{j+1} = W^j − η (W^j x − y) x⊤,    (6)
where η is the so-called learning rate and j is used to index the iteration number.
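As a toy illustration of Eqs. (4) to (6), the following NumPy sketch runs this update rule for a single made-up training pair; dimensions and the learning rate are arbitrary choices.

```python
import numpy as np

# L(W) = 0.5 * ||W x - y||^2,  dL/dW = (W x - y) x^T  (Eq. 5)
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))        # input,  x in R^3
y = rng.normal(size=(2, 1))        # target, y in R^2
W = np.zeros((2, 3))               # parameters theta = W
eta = 0.1                          # learning rate

for j in range(100):
    grad = (W @ x - y) @ x.T       # gradient of Eq. (4)
    W = W - eta * grad             # descent update of Eq. (6)
print("final loss:", float(0.5 * np.sum((W @ x - y) ** 2)))
```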
Now, let us consider a slightly more complicated network structure with three layers, ŷ = f̂3(f̂2(f̂1(x))) = W3 W2 W1 x, again using linear activations. This yields the following objective function:

L(θ) = ½ ||W3 W2 W1 x − y||₂².    (7)
Note that this example is academic, as θ = {W1 , W2 , W3 } could simply be
collapsed to a single matrix. Yet, the concept that we use to derive this gradient
is generally applicable also to non-linear functions. Computing the gradient with
respect to the parameters of the last layer W3 follows the same recipe as in the
previous network:
∂L/∂W3 = (∂L/∂f̂3)(∂f̂3/∂W3) = (W3 W2 W1 x − y)(W2 W1 x)⊤,    (8)

where ∂L/∂f̂3 = (W3 W2 W1 x − y)· and ∂f̂3/∂W3 = ·(W2 W1 x)⊤.
Figure 4: Graphical overview of back-propagation using layer derivatives. During the forward
pass, the network is evaluated once and compared to the desired output using the loss function.
The back-propagation algorithm follows different paths through the layer graph in order to
compute the matrix derivatives efficiently.
For the computation of the gradient with respect to the second layer W2, we already need to apply the chain rule twice:

∂L/∂W2 = (∂L/∂f̂3)(∂f̂3/∂f̂2)(∂f̂2/∂W2) = W3⊤ (W3 W2 W1 x − y)(W1 x)⊤.    (9)
The matrix derivatives above are also visualized graphically in Fig. 4. Note
that many intermediate results can be reused during the computation of the
gradient, which is one of the reasons why back-propagation is efficient in com-
puting updates. Also note that the forward pass through the net is part of ∂L/∂f̂3,
which is contained in all gradients of the net. The other partial derivatives are
only partial derivatives either with respect to the input or the parameters of
the respective layer. Hence, back-propagation can be used if both operations
are known for every layer in the net. Having determined the gradients, each
parameter can now be updated analogous to Eq. 6.
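The following NumPy sketch spells out one forward and one backward pass for the three-layer linear net of Eq. (7). Each layer contributes exactly the two operations mentioned above: the derivative with respect to its input, to pass the error backwards, and the derivative with respect to its own parameters; the cached forward results a1 and a2 are reused, which is what makes the computation efficient.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 1)); y = rng.normal(size=(2, 1))
W1 = rng.normal(size=(3, 4))      # layer 1
W2 = rng.normal(size=(3, 3))      # layer 2
W3 = rng.normal(size=(2, 3))      # layer 3

# forward pass: cache every intermediate result
a1 = W1 @ x; a2 = W2 @ a1; y_hat = W3 @ a2

# backward pass: e carries dL/d(layer output) back through the net
e = y_hat - y                     # dL/dy_hat for the L2-loss
gW3 = e @ a2.T                    # Eq. (8)
e = W3.T @ e                      # pull the error through layer 3
gW2 = e @ a1.T                    # Eq. (9)
e = W2.T @ e                      # pull the error through layer 2
gW1 = e @ x.T                     # gradient of the first layer
```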
Figure 5: Overview of classical (sign(x), σ(x), and tanh(x)) and modern activation functions,
like the Rectified Linear Unit ReLU(x) and the leaky ReLU LReLU(x).
network with ReLUs [33]. Hence, several useful and desirable properties are
attained using such modern activation functions.
One disadvantage is, of course, that the ReLU is not differentiable over the entire domain of x. At x = 0, a kink prevents the determination of a
unique gradient. For optimization, an important property of the gradient of a
function is that it will point towards the direction of the steepest ascent. Hence,
following the negative direction will allow minimization of the function. For a
differentiable function, this direction is unique. If this constraint is relaxed to
allow multiple directions that lead to an extremum, we arrive at sub-gradient
theory [34]. It allows us to still use gradient descent algorithms to optimize such
problems, if it is possible to determine a sub-gradient, i.e., at least one instance
of a valid direction towards the optimum. For the ReLU, any value between 0 and 1 would be acceptable at x = 0 for the descent operation. If such a direction can be obtained, convergence is guaranteed for convex problems by application of specific optimization programs, such as gradient descent with a fixed step size [35]. This allows us to remain with back-propagation for
optimization, while using non-differentiable activation functions.
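A minimal sketch of the ReLU and one valid sub-gradient; taking the value 0 at the kink is the common convention, but by the argument above any value in [0, 1] would do.

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x)
    return np.maximum(0.0, x)

def relu_subgrad(x):
    # a valid sub-gradient: 0 at x = 0 (by convention), 1 for x > 0
    return (x > 0).astype(float)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), relu_subgrad(x))   # [0. 0. 3.] [0. 0. 1.]
```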
Another significant advance towards deep learning is the use of specialized
layers. In particular, the so-called convolution and pooling layers make it possible to model locality and abstraction (cf. Fig. 6). The major advantage of the convo-
lution layers is that they only consider a local neighborhood for each neuron, and
that all neurons of the same layer share the same weights, which dramatically reduces the number of parameters and therefore the memory required to store such
a layer. These restrictions are identical to limiting the matrix multiplication to
a matrix with circulant structure, which exactly models the operation of con-
volution. As the operation is generally of the form of a matrix multiplication,
the gradients introduced in Section 2.3 still apply. Pooling is an operation that
is used to reduce the scale of the input. For images, typically areas of 2 × 2 or
3 × 3 are analyzed and summarized to a single value. The average operation
can again be expressed as a matrix with hard-coded weights, and gradient com-
putation follows essentially the previous section. Non-linear operations, such
as maximum or median, however, require more attention. Again, we can ex-
ploit the sub-gradient approach. During the forward pass through the net, the
maximum or median can easily be determined. Once this is known, a matrix is
constructed that simply selects the correct elements that would also have been
selected by the non-linear methods. The transpose of the same matrix is then
employed during the backward pass to determine an appropriate sub-gradient
[36]. Fig. 6 shows both operations graphically and highlights an example for
a convolutional neural network (CNN). If we now compare this network with
Fig. 1, we see that the original interpretation as only a classifier is no longer
valid. Instead, the deep network now models all steps directly from the signal
up to the classification stage. Hence, many authors claim that feature “hand-
crafting” is no longer required because everything is learned by the network in
a data-driven manner.
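The following NumPy sketch illustrates both points for the 1-D case: a convolution expressed as multiplication with a circulant matrix, and max pooling realized by a selection matrix whose transpose routes the gradient in the backward pass. Kernel, signal, and window size are made-up toy choices.

```python
import numpy as np

def circulant(kernel, n):
    # n x n matrix whose rows are cyclic shifts of the kernel
    k = np.zeros(n); k[:len(kernel)] = kernel
    return np.stack([np.roll(k, i) for i in range(n)])

x = np.arange(8, dtype=float)
C = circulant([1.0, -1.0], 8)      # shared weights in every row
print(C @ x)                        # circular filtering of x

def max_pool_matrix(x, size=2):
    # selection matrix: one row per window, a single 1 at the argmax
    n = len(x); S = np.zeros((n // size, n))
    for i in range(n // size):
        j = i * size + np.argmax(x[i * size:(i + 1) * size])
        S[i, j] = 1.0
    return S

S = max_pool_matrix(x)
print(S @ x)                        # pooled values [1. 3. 5. 7.]
# backward pass: grad_x = S.T @ grad_pooled (the sub-gradient of [36])
```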
So far, deep learning seems quite easy. However, there are also important
practical issues that all users of deep learning need to be aware of. In particular,
a look at the loss over the training iterations is very important. If the loss
increases rapidly at the beginning of training, a typical cause is a learning rate η that is set too high. This is typically referred to as an exploding gradient. Setting η too low, however, can also result in stagnation of the loss over the iterations. In
this case, we observe again vanishing gradients. Hence, correct choice of η and
other training hyper-parameters is crucial for successful training [37].
In addition to the training set, a validation set is used to determine over-
fitting. In contrast to the training set, the validation set is never used to actually
update the parameter weights. Hence, the loss on the validation set provides an estimate of the error on unseen data. During optimization, the loss on the
training set will continuously fall. However, as the validation set is independent,
the loss on the validation set will increase at some point in training. This is
typically a good point to stop updating the model before it over-fits to the
training data.
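The following toy NumPy sketch shows this validation logic under strong simplifying assumptions, using gradient descent on an over-parameterized least-squares problem in place of a deep network: the training loss falls monotonically, the validation loss is only monitored, and the parameters from the best validation iteration are kept.

```python
import numpy as np

rng = np.random.default_rng(0)
A_tr = rng.normal(size=(20, 50)); A_val = rng.normal(size=(20, 50))
w_true = rng.normal(size=50)
y_tr = A_tr @ w_true + rng.normal(size=20)    # noisy training labels
y_val = A_val @ w_true + rng.normal(size=20)  # held-out validation set

w = np.zeros(50); eta = 0.005
best_val, best_w = np.inf, w
for epoch in range(500):
    w = w - eta * A_tr.T @ (A_tr @ w - y_tr)  # update: training data only
    val = np.mean((A_val @ w - y_val) ** 2)   # monitored, never trained on
    if val < best_val:
        best_val, best_w = val, w.copy()      # early-stopping snapshot
print("best validation loss:", best_val)
```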
Another common mistake is bias in training or test data. First of all, hyper-
parameter tuning has to be done on validation data before actual test data is
employed. In principle, test data should only be looked at once the architecture, parameters, and all other factors of influence have been set. Only then should the test data be used. Otherwise, repeated testing will lead to optimistic results [37]
and the system’s performance will be over-estimated. This is as forbidden as
including the test data in the training set. Furthermore, confounding factors
may influence the classification results. If, for example, all pathological data
was collected with Scanner A and all control data was collected with Scanner
B, then the network may simply learn to differentiate the two scanners instead of identifying the disease [38].
Due to the nature of gradient descent, training will stop once a minimum
is reached. However, due to the general non-convexity of the loss function,
this minimum is likely to be only a local minimum. Hence, it is advisable to
perform multiple training runs with different initialization techniques in order
to estimate a mean and a standard deviation for the model performance. Single
training runs may be biased towards a single more or less random initialization.
Furthermore, it is very common to use regularization terms on the parameters, as is done in other fields of medical imaging. Here,
L2- and L1-norms are common choices. In addition, regularization can also be
enforced by other techniques such as dropout, weight-sharing, and multi-task
learning. An excellent overview is given in [37].
Also note that the output of a neural network does not equal confidence, even if the outputs are scaled between 0 and 1 and appear like probabilities, e.g., when using the so-called softmax function. In order to get realistic estimates of confidence, other techniques have to be employed [39].
A final remark towards deep learning concerns the role of the availability of large amounts of data and labels or annotations that could be gathered over the
internet, the immense compute power that became available by using graphics
cards for general purpose computations, and, last but not least, the positive
trend towards open source software that enables users world-wide to download
and extend deep learning methods very quickly. All three elements were crucial
Figure 6: Convolutional layers only face a limited receptive field and all neurons share the same weights (cf. left side of the figure; adapted from [40]). Pooling layers reduce the total
input size. Both are typically combined in an alternating manner to construct convolutional
neural networks (CNNs). An example is shown on the right.
Ronneberger’s U-net is a breakthrough towards automatic image seg-
mentation [50] and has been applied successfully in many tasks that require
image-to-image transforms, for example, images to segmentation masks. Like
the autoencoder, it consists of a contracting and an expanding branch, and it
enables multi-resolution analysis. In addition, U-net features skip connections
that connect the matching resolution levels of the encoder and the decoder stage.
Doing so, the architecture is able to model general high-resolution multi-scale
image-to-image transforms. Originally proposed in 2-D, many extensions, such
as 3-D versions, exist [51, 52].
ResNets have been designed to enable training of very deep networks [53].
Even with the methods described earlier in this paper, networks will not ben-
efit from more than 30 to 50 layers, as the gradient flow becomes numerically
unstable in such deep networks. In order to alleviate the problem, a so-called
residual block is introduced, and layers take the form f̂(x) = x + f̂′(x), where f̂′(x) contains the actual network layer. Doing so has the advantage that the
addition introduces a second parallel branch into the network that lets the gra-
dient flow from end to end. ResNets also have other interesting properties, e.g.,
their residual blocks behave like ensembles of classifiers [54].
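A minimal sketch of such a residual block, assuming a small two-layer ReLU perceptron as the inner function f̂′(x); the weights are random placeholders.

```python
import numpy as np

def residual_block(x, W1, W2):
    h = np.maximum(0.0, W1 @ x)   # the actual layer f'(x) ...
    return x + W2 @ h             # ... plus the identity shortcut

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(8, 4))
W2 = rng.normal(scale=0.1, size=(4, 8))
x = rng.normal(size=4)
print(residual_block(x, W1, W2))  # close to x for small weights
```

The identity branch passes the gradient through unchanged, which is what keeps very deep stacks of such blocks trainable.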
Variational networks enable the conversion of an energy minimization
problem into a neural network structure [55]. We consider this type of network particularly interesting, as many problems in traditional medical image pro-
cessing are expressed as energy minimization problems. The main idea is as
follows: The energy function is typically minimized by optimization programs
such as gradient descent. Thus, we are able to use the gradient of the original
problem to construct a so-called variational unit that describes exactly one up-
date step of the optimization program. A succession of such units then describes the complete variational network. Two observations are noteworthy: First, this type of framework allows learning operators within one variational unit, such
as a sparsifying transform for compressed sensing problems. Second, the vari-
ational units generally form residual blocks, and thus variational networks are
always ResNets as well.
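The following NumPy sketch unrolls gradient descent on a toy energy function into a sequence of such variational units; the operators, the smoothed L1 term, and all constants are made-up placeholders, and in an actual variational network the per-unit step sizes and regularization weights would be learned.

```python
import numpy as np

# toy energy: E(x) = 0.5*||A x - y||^2 + lam*||D x||_1 (L1 smoothed)
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 20)); y = rng.normal(size=30)
D = np.eye(20, k=1)[:-1] - np.eye(20)[:-1]   # finite-difference operator

def variational_unit(x, eta, lam):
    # one gradient step on E: data term plus smoothed sparsity term
    grad = A.T @ (A @ x - y) + lam * D.T @ np.tanh(10.0 * (D @ x))
    return x - eta * grad                    # a residual update step

x = np.zeros(20)
etas = [0.01] * 10; lams = [0.1] * 10        # would be learned per unit
for eta, lam in zip(etas, lams):
    x = variational_unit(x, eta, lam)
print("data term:", float(0.5 * np.sum((A @ x - y) ** 2)))
```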
Recurrent neural networks (RNNs) enable the processing of sequences with long-term dependencies [56]. Furthermore, recurrent nets introduce state variables that allow the cells to carry memory and essentially model any finite state machine. Extensions are long short-term memory (LSTM) networks [57]
and gated recurrent units (GRU) [58] that can model explicit read and write
memory transactions similar to a computer.
In particular, the success of U-net is also related to very powerful augmentation techniques that include, for example, non-rigid deformations of the input images and the desired segmentation [50]. Recent literature also reports that GANs are useful for data augmentation [59].
Precision learning is a strategy to include known operators into the learn-
ing process [60]. While this idea is counter-intuitive for most recognition tasks,
where we want to learn the optimal representation, the approach is actually very
useful for signal processing tasks in which we know a priori that a certain op-
erator must be present in the processing chain. Embedding the operator in the
network reduces the maximal training error, reduces the number of unknowns
and therefore the number of required training samples, and enables mixing of
most signal processing methods with deep learning. The approach is applica-
ble to a broad range of operators. The main requirement is that a gradient or
sub-gradient must exist.
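A minimal sketch of this idea, assuming a fixed smoothing convolution K as the known operator in the chain ŷ = K W x: only W receives updates, while K still participates in back-propagation through K⊤ but is never changed.

```python
import numpy as np

n = 16
K = sum(np.eye(n, k=k) for k in (-1, 0, 1)) / 3.0  # known, fixed operator

rng = np.random.default_rng(0)
W = np.zeros((n, n))                   # trainable layer
x = rng.normal(size=(n, 1)); y = rng.normal(size=(n, 1))

for _ in range(300):
    e = K @ W @ x - y                  # forward pass and L2 error
    W -= 0.05 * (K.T @ e) @ x.T        # gradient w.r.t. W only
print("loss:", float(0.5 * np.sum((K @ W @ x - y) ** 2)))
```

Because K is fixed, it contributes no unknowns, which is exactly the reduction in parameters and required training samples mentioned above.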
Adversarial examples consider the input to a neural network as a possible
weak spot that could be exploited by an attacker [61]. Generally, attacks try
to find a perturbation e such that fˆ(x + e) indicates a different class than the
true y, while keeping the magnitude of e low, for example, by minimizing ||e||22 .
Using different objective functions allows forming different types of attacks.
Attacks range from generating noise that will mislead the network, but will
remain unnoticed by a human observer, to specialized patterns that will even
mislead networks after printing and re-digitization [62].
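The following NumPy sketch illustrates the principle for a linear classifier with a fast-gradient-style perturbation; the model and data are made up, and attacks on deep networks follow the same recipe with back-propagated gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=20)               # linear model f(x) = w^T x
x = rng.normal(size=20)
y = 1.0 if w @ x > 0 else -1.0        # true class of this sample

eps = 0.2                             # perturbation magnitude per entry
e = -eps * y * np.sign(w)             # step against the correct score
print("clean margin:   ", float(y * (w @ x)))        # positive: correct
print("attacked margin:", float(y * (w @ (x + e))))  # often negative
```

Each entry of e is tiny, yet the margin drops by eps times the L1-norm of w, which is often enough to flip the decision.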
Deep reinforcement learning is a technique that allows training an artificial agent to perform actions given inputs from an environment and expands on
traditional reinforcement learning theory [63]. In this context, deep networks
are often used as flexible function approximators representing value functions
and/or policies [4]. In order to enable time-series processing, sequences of envi-
ronmental observations can be employed [5].
3. Results
As can be seen in the last few paragraphs, deep learning now offers a large
set of new tools that are applicable to many problems in the world of medical
image processing. In fact, these tools have already been widely employed. In
particular, perceptual tasks are well suited for deep learning. We present some
highlights that are discussed later in this section in Fig. 7. At the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) in 2018, approximately 70 % of all accepted publications were related
to the topic of deep learning. Given this fast pace of progress, we are not able
to describe all relevant publications here. Hence, this overview is far from complete. Still, we want to highlight some publications that are representative of
the current developments in the field. In terms of structure and organization, we
follow [22] here, but add recent developments in physical simulation and image
reconstruction.
Figure 7: Deep learning excels in perceptual tasks such as detection and segmentation. The
left hand side shows the artificial agent-based landmark detection after Ghesu et al. [71]
and the X-ray transform-invariant landmark detection by Bier et al. [67] (projection image
courtesy of Dr. Unberath). The right hand side shows a U-net-based stent segmentation after
Breininger et al. [72]. Images are reproduced with permission by the authors.
on convolutional neural networks seem to dominate. Here, we only report Holger Roth's DeepOrgan [73], the brain MR segmentation using CNN by Moeskops
et al. [74], a fully convolutional multi-energy 3-D U-net presented by Chen et
al. [75], and a U-net-based stent segmentation in X-ray projection domain by
Breininger et al. [72] as representative examples. Obviously segmentation using
deep convolutional networks also works in 2-D as shown by Nirschl et al. for
histopathologic images [76].
Middleton et al. already experimented with the fusion of neural networks and active contour models in 2004, well before the advent of deep learning [77]. Yet, their approach uses neither deep nets nor end-to-end training, which
would be desirable for a state-of-the-art method. Hence, revisiting traditional
segmentation approaches and fusing them with deep learning in an end-to-end
fashion seems a promising scope of research. Fu et al. follow a similar idea by
mapping Frangi’s vesselness into a neural network [78]. They demonstrate that
they are able to adjust the convolution kernels in the first step of the algorithm
towards the specific task of vessel segmentation in ophthalmic fundus imaging.
Yet another interesting class of segmentation algorithms is the use of re-
current networks for medical image segmentation. Poudel et al. demonstrate
this for a recurrent fully convolutional neural network on multi-slice MRI car-
diac data [79], while Andermatt et al. show effectiveness of GRUs for brain
segmentation [80].
problems. Zhong et al. demonstrate this for intra-operative brain shift using
imitation learning [87].
Figure 8: Results from a deep learning image-to-image reconstruction based on U-net. The
reference image with a lesion embedded is shown on the left followed by the analytic recon-
struction result that is used as input to U-net. U-net does an excellent job when trained
and tested without noise. If unmatched noise is provided as input, an image is created that
appears artifact-free, yet not just the lesion is gone, but also the chest surface is shifted by
approximately 1 cm. On the right hand side, an undesirable result is shown that emerged at
some point during training of several different versions of U-net which shows organ-shaped
clouds in the air in the background of the image. Note that we omitted displaying multiple
versions of “Limited Angle” as all three inputs to the U-nets would appear identical given the figure's display window of [−1000, 1000] HU.
an energy function step by step, the concept of variational networks is useful.
Doing so allows mapping virtually all iterative reconstruction algorithms onto
deep networks, e.g., by using a fixed number of iterations. There are several
impressive works found in the literature, of which we only name the MRI re-
construction by Hammernik et al. [112] and the sound speed reconstruction
by Vishnevskiy et al. [113] at this point. The concept can be expanded even
further, as Adler et al. demonstrate by learning an entire primal-dual recon-
struction [114].
Würfl et al. also follow the idea of using prior operators [115, 116]. Their
network is inspired by the classical filtered back-projection that can be retrained
to better approximate limited angle geometries that typically cannot be solved
by classical analytic inversion models. Interestingly, as the approach is described
in an end-to-end fashion, errors in the discretization or initialization of the
filtering steps are intrinsically corrected by the learning process [117]. They also
show that their method is compatible with other approaches, such as variational
networks that are able to learn an additional de-streaking sparsifying transform
[118]. Syben et al. drive these efforts even further and demonstrate that the
concept of precision learning is able to mathematically derive a neural network
structure [119]. In their work, they postulate that an expensive matrix inverse is a circulant matrix and can hence be replaced by a convolution operation. Doing so leads to the derivation of a previously
unknown filtering, back-projection, re-projection-style rebinning algorithm that
intrinsically suffers less from resolution loss than traditional interpolation-based
rebinning methods.
As noted earlier, all networks are prone to adversarial attacks. Huang et al. demonstrate this in their work [108], showing that even incorrect noise modelling may distort the entire image. Yet, the networks reconstruct visually
pleasing results and artifacts cannot be as easily identified as in classical meth-
ods. One possible remedy is to follow the precision learning paradigm and fix
as much of the network as possible, such that it can be analyzed with classical
methods as demonstrated in [116]. Another promising approach is Bayesian
deep learning [39]. Here, the network output is two-fold: the reconstructed image plus a confidence map indicating how accurately the content of the reconstructed image was actually measured.
Obviously, deep learning also plays a role in suppression of artifacts. In [120],
Zhang et al. demonstrate this effectively for metal artifacts. As a last example,
we list Bier et al. here, as they show that deep learning-based motion tracking
is also feasible for motion compensated reconstruction [121].
4. Discussion
In this introduction, we reviewed the latest developments in deep learning for
medical imaging. In particular, detection, recognition, and segmentation tasks are well solved by deep learning algorithms. Those tasks are clearly linked to perception, and there is essentially no prior knowledge present. Hence, state-
of-the-art architectures from other fields, such as computer vision, can often be easily adapted to medical tasks. In order to gain a better understanding of the
black box, reinforcement learning and modelling of artificial agents seem well
suited.
In image registration, deep learning is not that broadly used. Yet, interesting
approaches already exist that are able to either predict deformations directly
from the image input, or take advantage of reinforcement learning-based techniques that model registration as an optimal control problem. Further benefits
are obtained using deep networks for learning representations, which are either
done in an unsupervised fashion or using the registration metric itself.
Computer-aided diagnosis is a hot topic that many recent publications address. We expect that simpler standard tasks that typically result in a high
workload for medical doctors will be solved first. For more complex diagnoses,
the current deep nets that immediately result in a decision are not that well
suited, as it is difficult to understand the evidence. Hence, approaches are
needed that link observations to evidence to construct a line of argument to-
wards a decision. It is the strong belief of the authors that the new methodology will only make a significant impact on computer-aided diagnosis once such evidence-based decision making is achieved.
Physical simulation can be accelerated dramatically with realistic outcomes
as shown in the field of computer games and graphics. Therefore, the meth-
ods are highly relevant, in particular for interventional applications, in which
real-time processing is mandatory. First approaches exist, yet there is considerable room for new developments. In particular, precision learning and
variational networks seem to be well suited for such tasks, as they provide some
guarantees to prediction outcomes. Hence, we believe that there are many new
developments to follow, in particular in radiation therapy and real-time inter-
ventional dose tracking.
Reconstruction based on data-driven methods yields impressive results. Yet, such methods may suffer from a “new kind” of deep learning artifacts. In particular, the work by Huang et al. [108] shows these effects in great detail. Both precision
learning as well as Bayesian approaches seem well suited to tackle the problem in
the future. Yet, it is unclear how to benefit best from the data-driven methods
while maintaining intuitive and safe image reading.
A great advantage of all the deep learning methods is that they are inherently compatible with each other and with many classical approaches. This fusion
will spark many new developments in the future. In particular, the fusion on
network-level using either the direct connection of networks or precision learning
allows end-to-end training of algorithms. The only requirement for this deep
fusion is that each operation in the hybrid net has a gradient or sub-gradient
for the optimization. In fact, there are already efforts to design whole program-
ming languages to be compatible with this kind of differentiable programming
[122]. With such integrated networks, multi-task learning is enabled, for ex-
ample, training of networks that deliver optimal reconstruction quality and the
best volumetric overlap of the resulting segmentation at the same time, as al-
ready conjectured in [123]. This point may even be expanded to computer-aided
diagnosis or patient benefit.
In general, we observe that the CNN architectures that emerge from deep
learning are computationally very efficient. Networks find solutions that are on
par with or better than many state-of-the-art algorithms. Moreover, their computational cost at inference time is often much lower than that of state-of-the-art algorithms
in typical domains of medical imaging in detection, segmentation, registration,
reconstruction, and physical simulation tasks. This benefit at run-time comes
at high computational cost during training that can take days even on GPU
clusters. Given an appropriate problem domain and training setup, we can thus
exploit this effect to save run-time at the cost of additional training time.
Deep learning is extremely data hungry. This is one of the main limitations
that the field is currently facing, and performance grows only logarithmically
with the amount of data used [124]. Approaches like weakly supervised training
[125] will only partially be able to close this gap. Hence, one hospital or one
group of researchers will not be able to gather a competitive amount of data in
the near future. As such, we welcome initiatives such as the grand challenges1 or
medical data donors2 , and hope that they will be successful with their mission.
5. Conclusion
Acknowledgements
1 https://grand-challenge.org
2 http://www.medicaldatadonors.org
but not least, we also express our gratitude to the participants of the course
“Computational Medical Imaging3 ”, who were essentially the test audience of
this article during the summer school “Ferienakademie 2018”.
References
[1] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436.
[10] D. Shen, G. Wu, H.-I. Suk, Deep learning in medical image analysis,
Annual review of biomedical engineering 19 (2017) 221–248.
[11] N. Pawlowski, S. I. Ktena, M. C. Lee, B. Kainz, D. Rueckert, B. Glocker,
M. Rajchl, DLTK: State of the art reference implementations for deep
learning on medical images, arXiv preprint arXiv:1711.06853 (2017).
3 https://www5.cs.fau.de/lectures/sarntal-2018/
[12] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoo-
rian, J. A. van der Laak, B. Van Ginneken, C. I. Sánchez, A survey on
deep learning in medical image analysis, Medical image analysis 42 (2017)
60–88.
[13] B. J. Erickson, P. Korfiatis, Z. Akkus, T. L. Kline, Machine learning for
medical imaging, Radiographics 37 (2017) 505–515.
[14] K. Suzuki, Survey of deep learning applications to medical image analysis,
Med Imaging Technol 35 (2017) 212–226.
[15] J. Hagerty, R. J. Stanley, W. V. Stoecker, Medical image processing in
the age of deep learning, in: Proceedings of the 12th International Joint
Conference on Computer Vision, Imaging and Computer Graphics Theory
and Applications (VISIGRAPP 2017), pp. 306–311.
[16] P. Lakhani, D. L. Gray, C. R. Pett, P. Nagy, G. Shih, Hello world deep
learning in medical imaging, Journal of digital imaging 31 (2018) 283–289.
[24] F. Chollet, Deep learning with python, Manning Publications Co., 2017.
[25] A. Géron, Hands-on machine learning with Scikit-Learn and TensorFlow:
concepts, tools, and techniques to build intelligent systems, O'Reilly Media, Inc., 2017.
[26] B. Sahiner, A. Pezeshk, L. M. Hadjiiski, X. Wang, K. Drukker, K. H.
Cha, R. M. Summers, M. L. Giger, Deep learning in medical imaging and
radiation therapy, Medical physics (2018).
[27] H. Niemann, Pattern analysis and understanding, volume 4, Springer Sci-
ence & Business Media, 2013.
[39] J. Schlemper, D. C. Castro, W. Bai, C. Qin, O. Oktay, J. Duan, A. N.
Price, J. Hajnal, D. Rueckert, Bayesian deep learning for accelerated mr
image reconstruction, in: F. Knoll, A. Maier, D. Rueckert (Eds.), Ma-
chine Learning for Medical Image Reconstruction, Springer International
Publishing, Cham, 2018, pp. 64–71.
[40] V. Dumoulin, F. Visin, A guide to convolution arithmetic for deep learn-
ing, ArXiv e-prints (2016).
[41] P. Vincent, H. Larochelle, Y. Bengio, P.-A. Manzagol, Extracting and
composing robust features with denoising autoencoders, in: Proceedings
of the 25th international conference on Machine learning, ACM, pp. 1096–
1103.
[42] D. Holden, J. Saito, T. Komura, T. Joyce, Learning motion manifolds
with convolutional autoencoders, in: SIGGRAPH Asia 2015 Technical
Briefs, ACM, p. 18.
[43] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.-A. Manzagol, Stacked
denoising autoencoders: Learning useful representations in a deep network
with a local denoising criterion, Journal of machine learning research 11
(2010) 3371–3408.
[44] F. J. Huang, Y.-L. Boureau, Y. LeCun, et al., Unsupervised learning
of invariant feature hierarchies with applications to object recognition,
in: Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE
Conference on, IEEE, pp. 1–8.
[45] I. Goodfellow, Nips 2016 tutorial: Generative adversarial networks, arXiv
preprint arXiv:1701.00160 (2016).
[46] M. Arjovsky, S. Chintala, L. Bottou, Wasserstein generative adversarial
networks, in: International Conference on Machine Learning, pp. 214–223.
[47] J. Gauthier, Conditional generative adversarial nets for convolutional face
generation, Class Project for Stanford CS231N: Convolutional Neural
Networks for Visual Recognition, Winter semester 2014 (2014) 2.
[48] J.-Y. Zhu, T. Park, P. Isola, A. A. Efros, Unpaired image-to-image trans-
lation using cycle-consistent adversarial networks, arXiv preprint (2017).
[49] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Er-
han, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in:
Proceedings of the IEEE conference on computer vision and pattern recog-
nition, pp. 1–9.
[50] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for
biomedical image segmentation, in: International Conference on Medi-
cal Image Computing and Computer-Assisted Intervention, Springer, pp.
234–241.
[51] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, O. Ronneberger, 3d u-
net: learning dense volumetric segmentation from sparse annotation, in:
International Conference on Medical Image Computing and Computer-
Assisted Intervention, Springer, pp. 424–432.
[52] F. Milletari, N. Navab, S.-A. Ahmadi, V-net: Fully convolutional neu-
ral networks for volumetric medical image segmentation, in: 3D Vision
(3DV), 2016 Fourth International Conference on, IEEE, pp. 565–571.
[53] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recog-
nition, in: Proceedings of the IEEE conference on computer vision and
pattern recognition, pp. 770–778.
[64] Y. Zheng, D. Comaniciu, Marginal space learning, in: Marginal Space
Learning for Medical Image Analysis, Springer, 2014, pp. 25–65.
[65] F. C. Ghesu, E. Krubasik, B. Georgescu, V. Singh, Y. Zheng, J. Horneg-
ger, D. Comaniciu, Marginal space deep learning: efficient architecture
for volumetric image parsing, IEEE transactions on medical imaging 35
(2016) 1217–1228.
[67] B. Bier, M. Unberath, J.-N. Zaech, J. Fotouhi, M. Armand, G. Osgood,
N. Navab, A. Maier, X-ray-transform invariant anatomical landmark de-
tection for pelvic trauma surgery, in: A. F. Frangi, J. A. Schnabel, C. Da-
vatzikos, C. Alberola-López, G. Fichtinger (Eds.), Medical Image Com-
puting and Computer Assisted Intervention – MICCAI 2018, Springer
International Publishing, Cham, 2018, pp. 55–63.
[68] A. Akselrod-Ballin, L. Karlinsky, S. Alpert, S. Hasoul, R. Ben-Ari,
E. Barkan, A region based convolutional network for tumor detection
and classification in breast mammography, in: Deep Learning and Data
Labeling for Medical Applications, Springer, 2016, pp. 197–205.
[69] M. Aubreville, M. Krappmann, C. Bertram, R. Klopfleisch, A. Maier, A
Guided Spatial Transformer Network for Histology Cell Differentiation, in:
T. E. Association (Ed.), Eurographics Workshop on Visual Computing for
Biology and Medicine, pp. 021–025.
[70] M. Aubreville, M. Stöve, N. Oetter, M. de Jesus Goncalves, C. Knipfer,
H. Neumann, C. Bohr, F. Stelzle, A. Maier, Deep learning-based detec-
tion of motion artifacts in probe-based confocal laser endomicroscopy im-
ages, International Journal of Computer Assisted Radiology and Surgery
(2018).
[71] F. C. Ghesu, B. Georgescu, Y. Zheng, S. Grbic, A. Maier, J. Hornegger,
D. Comaniciu, Multi-scale deep reinforcement learning for real-time 3d-
landmark detection in ct scans, IEEE Transactions on Pattern Analysis
and Machine Intelligence (2017).
[72] K. Breininger, S. Albarqouni, T. Kurzendorfer, M. Pfister, M. Kowarschik,
A. Maier, Intraoperative stent segmentation in X-ray fluoroscopy for en-
dovascular aortic repair, International Journal of Computer Assisted Ra-
diology and Surgery 13 (2018).
[73] H. R. Roth, L. Lu, A. Farag, H.-C. Shin, J. Liu, E. B. Turkbey, R. M. Sum-
mers, Deeporgan: Multi-level deep convolutional networks for automated
pancreas segmentation, in: International conference on medical image
computing and computer-assisted intervention, Springer, pp. 556–564.
[74] P. Moeskops, M. A. Viergever, A. M. Mendrik, L. S. de Vries, M. J.
Benders, I. Išgum, Automatic segmentation of mr brain images with a
convolutional neural network, IEEE transactions on medical imaging 35
(2016) 1252–1261.
[75] S. Chen, X. Zhong, S. Hu, S. Dorn, M. Kachelriess, M. Lell, A. Maier,
Automatic Multi-Organ Segmentation in Dual Energy CT using 3D Fully
Convolutional Network, in: B. van Ginneken, M. Welling (Eds.), MIDL.
[76] J. J. Nirschl, A. Janowczyk, E. G. Peyster, R. Frank, K. B. Margulies,
M. D. Feldman, A. Madabhushi, Deep learning tissue segmentation in
cardiac histopathology images, in: Deep Learning for Medical Image
Analysis, Elsevier, 2017, pp. 179–195.
[77] I. Middleton, R. I. Damper, Segmentation of magnetic resonance images
using a combination of neural networks and active contour models, Med-
ical engineering & physics 26 (2004) 71–86.
[78] W. Fu, K. Breininger, R. Schaffert, N. Ravikumar, T. Würfl, J. Fujimoto,
E. Moult, A. Maier, Frangi-Net: A Neural Network Approach to Vessel
Segmentation, in: A. Maier, T. M. Deserno, H. Handels, K. H. Maier-Hein, C. Palm, T. Tolxdorff (Eds.),
Bildverarbeitung für die Medizin 2018, pp. 341–346.
[79] R. P. Poudel, P. Lamata, G. Montana, Recurrent fully convolutional neu-
ral networks for multi-slice mri cardiac segmentation, in: Reconstruction,
Segmentation, and Analysis of Medical Images, Springer, 2016, pp. 83–94.
[80] S. Andermatt, S. Pezold, P. Cattin, Multi-dimensional gated recurrent
units for the segmentation of biomedical 3d-data, in: Deep Learning and
Data Labeling for Medical Applications, Springer, 2016, pp. 142–151.
[85] R. Liao, S. Miao, P. de Tournemire, S. Grbic, A. Kamen, T. Mansi, D. Co-
maniciu, An artificial agent for robust image registration., in: AAAI, pp.
4168–4175.
[86] J. Krebs, T. Mansi, H. Delingette, L. Zhang, F. C. Ghesu, S. Miao,
A. Maier, N. Ayache, R. Liao, A. Kamen, Robust Non-Rigid Registration
through Agent-Based Action Learning, in: Springer (Ed.), Medical Im-
age Computing and Computer-Assisted Intervention - MICCAI 2017, pp.
344–352.
[87] X. Zhong, S. Bayer, N. Ravikumar, N. Strobel, A. Birkhold,
M. Kowarschik, R. Fahrig, A. Maier, Resolve Intraoperative Brain Shift
as Imitation Game, in: MICCAI Challenge 2018 for Correction of Brainshift with Intra-Operative Ultrasound (CuRIOUS 2018).
[88] I. Diamant, Y. Bar, O. Geva, L. Wolf, G. Zimmerman, S. Lieberman,
E. Konen, H. Greenspan, Chest radiograph pathology categorization via
transfer learning, in: Deep Learning for Medical Image Analysis, Elsevier,
2017, pp. 299–320.
[89] J. De Fauw, J. R. Ledsam, B. Romera-Paredes, S. Nikolov, N. Tomasev,
S. Blackwell, H. Askham, X. Glorot, B. O'Donoghue, D. Visentin, et al.,
Clinically applicable deep learning for diagnosis and referral in retinal
disease, Nature medicine 24 (2018) 1342.
Medical Imaging meets NeurIPS Workshop at 32nd Conference on Neural
Information Processing Systems (NeurIPS), 2018.
[96] J. Maier, Y. Berker, S. Sawall, M. Kachelrieß, Deep scatter estimation
(dse): feasibility of using a deep convolutional neural network for real-
time x-ray scatter prediction in cone-beam ct, in: Medical Imaging 2018:
Physics of Medical Imaging, volume 10573, International Society for Op-
tics and Photonics, p. 105731L.
[97] M. Unberath, J.-N. Zaech, S. C. Lee, B. Bier, J. Fotouhi, M. Armand,
N. Navab, Deepdrr – a catalyst for machine learning in fluoroscopy-guided
procedures, in: A. F. Frangi, J. A. Schnabel, C. Davatzikos, C. Alberola-
López, G. Fichtinger (Eds.), Medical Image Computing and Computer
Assisted Intervention – MICCAI 2018, Springer International Publishing,
Cham, 2018, pp. 98–106.
[98] F. Horger, T. Würfl, V. Christlein, A. Maier, Towards arbitrary noise augmentation: deep learning for sampling from arbitrary probability dis-
tributions, in: International Workshop on Machine Learning for Medical
Image Reconstruction, Springer, pp. 129–137.
[99] X. Han, Mr-based synthetic ct generation using a deep convolutional
neural network method, Medical physics 44 (2017) 1408–1419.
[100] B. Stimpel, C. Syben, T. Würfl, K. Mentl, A. Dörfler, A. Maier, MR
to X-ray Projection Image Synthesis, in: F. Noo (Ed.), Proceedings of
the 5th International Conference on Image Formation in X-ray Computed
Tomography (CT-Meeting), pp. 435–438.
[101] F. Schiffers, Z. Yu, S. Arguin, A. Maier, Q. Ren, Synthetic fundus flu-
orescein angiography using deep neural networks, in: A. Maier, T. M.
Deserno, H. Handels, K. H. Maier-Hein, C. Palm, T. Tolxdorff (Eds.),
Bildverarbeitung für die Medizin 2018, Springer Berlin Heidelberg, Berlin,
Heidelberg, 2018, pp. 234–238.
[102] J. P. Cohen, M. Luck, S. Honari, Distribution matching losses can hal-
lucinate features in medical image translation, in: A. F. Frangi, J. A.
Schnabel, C. Davatzikos, C. Alberola-López, G. Fichtinger (Eds.), Med-
ical Image Computing and Computer Assisted Intervention – MICCAI
2018, Springer International Publishing, Cham, 2018, pp. 529–536.
[103] G. Wang, J. C. Ye, K. Mueller, J. A. Fessler, Image reconstruction is a
new frontier of machine learning., IEEE transactions on medical imaging
37 (2018) 1289–1296.
[104] M. T. McCann, K. H. Jin, M. Unser, A review of convolutional neural
networks for inverse problems in imaging, arXiv preprint arXiv:1710.04011
(2017).
[105] Z. Zhang, X. Liang, X. Dong, Y. Xie, G. Cao, A sparse-view ct recon-
struction method based on combination of densenet and deconvolution.,
IEEE transactions on medical imaging 37 (2018) 1407–1417.
[106] A. Kofler, M. Haltmeier, C. Kolbitsch, M. Kachelrieß, M. Dewey, A u-nets
cascade for sparse view computed tomography, in: International Work-
shop on Machine Learning for Medical Image Reconstruction, Springer,
pp. 91–99.
[107] B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen, M. S. Rosen, Image re-
construction by domain-transform manifold learning, Nature 555 (2018)
487.
[116] T. Würfl, M. Hoffmann, V. Christlein, K. Breininger, Y. Huang, M. Un-
berath, A. K. Maier, Deep learning computed tomography: Learning
projection-domain weights from image domain in limited angle problems,
IEEE transactions on medical imaging 37 (2018) 1454–1463.
[117] C. Syben, B. Stimpel, K. Breininger, T. Würfl, R. Fahrig, A. Dörfler,
A. Maier, Precision learning: Reconstruction filter kernel discretization,
in: Proceedings of the CT Meeting 2018. To appear.
[118] K. Hammernik, T. Würfl, T. Pock, A. Maier, A deep learning architecture
for limited-angle computed tomography reconstruction, in: K. H. Maier-
Hein, geb. Fritzsche, T. M. Deserno, geb. Lehmann, H. Handels, T. Tolx-
dorff (Eds.), Bildverarbeitung für die Medizin 2017, Springer Berlin Hei-
delberg, Berlin, Heidelberg, 2017, pp. 92–97.
[119] C. Syben, B. Stimpel, J. Lommen, T. Würfl, A. Dörfler, A. Maier, De-
riving neural network architectures using precision learning: Parallel-to-
fan beam conversion, in: German Conference on Pattern Recognition
(GCPR).
[120] Y. Zhang, H. Yu, Convolutional neural network based metal artifact
reduction in x-ray computed tomography, IEEE transactions on medical
imaging 37 (2018) 1370–1381.
[121] B. Bier, K. Aschoff, C. Syben, M. Unberath, M. Levenston, G. Gold,
R. Fahrig, A. Maier, Detecting anatomical landmarks for motion estima-
tion in weight-bearing imaging of knees, in: International Workshop on
Machine Learning for Medical Image Reconstruction, Springer, pp. 83–90.
[122] T.-M. Li, M. Gharbi, A. Adams, F. Durand, J. Ragan-Kelley, Differ-
entiable programming for image processing and deep learning in halide,
ACM Transactions on Graphics (TOG) 37 (2018) 139.
[123] G. Wang, A perspective on deep imaging, IEEE Access 4 (2016) 8914–
8924.
[124] C. Sun, A. Shrivastava, S. Singh, A. Gupta, Revisiting unreasonable
effectiveness of data in deep learning era, arXiv preprint arXiv:1707.02968 (2017).
[125] M. Oquab, L. Bottou, I. Laptev, J. Sivic, Is object localization for free? – Weakly-supervised learning with convolutional neural networks, in: Pro-
ceedings of the IEEE Conference on Computer Vision and Pattern Recog-
nition, pp. 685–694.