
A Gentle Introduction to Deep Learning in Medical Image Processing

Andreas Maier¹, Christopher Syben¹, Tobias Lasser², Christian Riess¹


¹ Friedrich-Alexander-University Erlangen-Nuremberg, Germany
² Technical University of Munich, Germany

arXiv:1810.05401v2 [cs.CV] 21 Dec 2018

Abstract
This paper tries to give a gentle introduction to deep learning in medical im-
age processing, proceeding from theoretical foundations to applications. We
first discuss general reasons for the popularity of deep learning, including sev-
eral major breakthroughs in computer science. Next, we start reviewing the
fundamental basics of the perceptron and neural networks, along with some
fundamental theory that is often omitted. Doing so allows us to understand the
reasons for the rise of deep learning in many application domains. Obviously
medical image processing is one of these areas which has been largely affected
by this rapid progress, in particular in image detection and recognition, image
segmentation, image registration, and computer-aided diagnosis. There are also
recent trends in physical simulation, modelling, and reconstruction that have led
to astonishing results. Yet, some of these approaches neglect prior knowledge
and hence bear the risk of producing implausible results. These apparent weak-
nesses highlight current limitations of deep learning. However, we also briefly
discuss promising approaches that might be able to resolve these problems in
the future.
Keywords: Introduction, Deep Learning, Machine Learning, Medical Imaging,
Image Classification, Image Segmentation, Image Registration,
Computer-aided Diagnosis, Physical Simulation, Image Reconstruction

1. Introduction

Over the recent years, Deep Learning (DL) [1] has had a tremendous impact
on various fields in science. It has led to significant improvements in speech
recognition [2] and image recognition [3], it is able to train artificial agents that
beat human players in Go [4] and ATARI games [5], and it creates novel artistic
images [6, 7] and music [8]. Many of these tasks were considered impossible
for computers to solve before the advent of deep learning, even in science
fiction literature.
Obviously this technology is also highly relevant for medical imaging. Vari-
ous introductions to the topic can be found in the literature ranging from short

tutorials and reviews [9, 10, 11, 12, 13, 14, 15, 16, 17, 18], over blog posts and
Jupyter notebooks [19, 20, 21], to entire books [22, 23, 24, 25]. All of them serve
a different purpose and offer a different view on this quickly evolving topic. A
very good review paper is, for example, found in the work of Litjens et al. [12],
who undertook the tremendous effort of reviewing more than 300 papers in their article.
Since then, however, many more noteworthy works have appeared, almost on
a daily basis, which makes it difficult to create a review paper that matches
the current pace in the field. The newest effort to summarize the entire field
was attempted in [26] listing more than 350 papers. Again, since its publication
several more noteworthy works appeared and others were missed. Hence, it is
important to select methods of significance and describe them in detail.
Zhou et al. [22] do so for the state of the art of deep learning in medical im-
age analysis and curated an excellent selection of topics. Still, deep learning is
being quickly adopted in other fields of medical image processing and the book
misses, for example, topics such as image reconstruction. While an overview
on important methods in the field is crucial, the actual implementation is as
important to move the field ahead. Hence, works like the short tutorial by
Breininger et al. [20] are highly relevant to introduce the topic also at the
code level. Their Jupyter notebook framework creates an interactive experience
in the web browser for implementing fundamental deep learning basics in Python.
In summary, we observe that the topic is too complex and evolves too quickly
to be summarized in a single document. Yet, over the past few months there
already have been so many exciting developments in the field of medical image
processing that we believe it is worthwhile to point them out and to connect
them to a single introduction.
Readers of this article do not have to be closely acquainted with deep learning
or its terminology. We will summarize the relevant theory and present it at a
level of detail that is sufficient to follow the major concepts in deep learning.
Furthermore, we connect these observations with traditional concepts in pattern
recognition and machine learning. In addition, we put these foundations into
the context of emerging approaches in medical image processing and analysis,
including applications in physical simulation and image reconstruction. As a
last aim of this introduction, we also clearly indicate potential weaknesses of
the current technology and outline potential remedies.

2. Materials and Methods

2.1. Introduction to machine learning and pattern recognition


Machine learning and pattern recognition essentially deal with the problem of
automatically finding a decision, for example, separating apples from pears. In
traditional literature [27], this process is outlined using the pattern recognition
system (cf. Fig. 1). During a training phase, the so-called training data set is
preprocessed and meaningful features are extracted. While the preprocessing is
understood to remain in the original space of the data and comprises operations
such as noise reduction and image rectification, feature extraction faces the

Figure 1: Schematic of the traditional pattern recognition pipeline used for automatic decision
making. Sensor data is preprocessed and “hand-crafted” features are extracted in training
and test phase. During training a classifier is trained that is later used in the test phase to
decide the class automatically (after [27]).

Figure 2: Neurons are inspired by biological neurons shown on the left. The resulting computa-
tional neuron computes a weighted sum of its inputs which is then processed by an activation
function h(x) to determine the output value (cf. Fig. 5). Doing so, we are able to model
linear decision boundaries, as the weighted sum can be interpreted as a signed distance to
the decision boundary, while the activation determines the actual class membership. On the
right-hand side, the XOR problem is shown that cannot be solved by a single linear classifier.
It typically requires either curved boundaries or multiple lines.

task of determining an algorithm that is able to extract a distinctive and


complete feature representation, for example, color or length of the semi-axes
of a surrounding ellipse for our apples and pears example. This task is truly
difficult to generalize, and it is necessary to design such features anew essentially
for every new application. In the deep learning literature, this process is often
also referred to as “hand-crafting” features. Based on the feature vector $x \in \mathbb{R}^n$,
the classifier has to predict the correct class $y$, which is typically estimated by
a function $\hat{y} = \hat{f}(x)$ that directly yields the classification result $\hat{y}$. The
classifier’s parameter vector θ is determined during the training phase and later
evaluated on an independent test data set.

2.2. Neural networks


In this context, we can now consider neural networks and associated methods in
their role as classifiers. The fundamental unit of a neural network is a neuron:
it takes a bias $w_0$ and a weight vector $w = (w_1, \dots, w_n)$ as parameters
$\theta = (w_0, \dots, w_n)$ to model a decision

$$\hat{f}(x) = h(w^\top x + w_0) \tag{1}$$

using a non-linear activation function h(x). Hence, a single neuron itself can
already be interpreted as a classifier, if the activation function is chosen such
that it is monotonic, bounded, and continuous. In this case, the maximum and
the minimum can be interpreted as a decision for the one or the other class.
Typical representatives of such activation functions in the classical literature are
the sign function $\text{sign}(x)$, resulting in Rosenblatt's perceptron [28], the sigmoid
function $\sigma(x) = \frac{1}{1 + e^{-x}}$, and the hyperbolic tangent $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ (cf.
Fig. 5). A major disadvantage of individual neurons is that they can only
model linear decision boundaries, resulting in the well-known fact that they
are not able to solve the XOR problem. Fig. 2 summarizes these considerations
towards the computational neuron graphically.
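To make Eq. (1) concrete, here is a minimal NumPy sketch (our own illustration, not code from a reference implementation) of a single neuron with the sign activation, evaluated on the XOR inputs; the hand-picked linear boundary is one of many that all necessarily fail on at least one of the four points:

    import numpy as np

    def neuron(x, w, w0, h=np.sign):
        # Single computational neuron, Eq. (1): activation of a weighted sum.
        return h(w @ x + w0)

    # Hand-picked linear decision boundary x1 + x2 - 0.5 = 0
    w, w0 = np.array([1.0, 1.0]), -0.5

    # XOR inputs with labels encoded as +/-1
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([-1, 1, 1, -1])

    for x, t in zip(X, y):
        print(x, "->", neuron(x, w, w0), "(true:", t, ")")
    # (1, 1) ends up on the same side as (0, 1) and (1, 0): a single
    # linear boundary cannot separate the XOR classes.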
In combination with other neurons, modelling capabilities increase dramatically.
It can be shown that a network with neurons arranged in a single layer can
already approximate any continuous function $f(x)$ on a compact subset of $\mathbb{R}^n$ [29].
A single-layer network is conveniently summarized as a linear combination of $N$
individual neurons

$$\hat{f}(x) = \sum_{i=0}^{N-1} v_i\, h(w_i^\top x + w_{0,i}) \tag{2}$$

using combination weights $v_i$. All trainable parameters of this network can be
summarized as

$$\theta = (v_0, w_{0,0}, w_0, \dots, v_{N-1}, w_{0,N-1}, w_{N-1})^\top.$$

The difference between the true function $f(x)$ and its approximation $\hat{f}(x)$ is
bounded by

$$|f(x) - \hat{f}(x)| < \epsilon, \tag{3}$$

where $\epsilon$ decreases with increasing $N$ for activation functions that satisfy the
criteria that we mentioned earlier (monotonicity, boundedness, continuity) [30].
Hence, given a large number of neurons, any continuous function can be approximated
using a single-layer network only.
samples that are drawn from the same compact set on which the network was
trained. As such, an additional practical requirement for an approximation is
that the training set is representative and future observations will be similar.
At first glance, this contradicts all recent developments in deep learning and
therefore requires additional attention.
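The following sketch illustrates this approximation property numerically (our own toy example with simplifying assumptions: the inner weights $w_i$ and biases $w_{0,i}$ are fixed at random values, and only the combination weights $v_i$ of Eq. (2) are fitted by least squares):

    import numpy as np

    # Approximate f(x) = sin(x) on a compact interval with Eq. (2), h = tanh
    rng = np.random.default_rng(0)
    N = 50
    w = rng.normal(scale=2.0, size=N)    # inner weights w_i (random, fixed)
    w0 = rng.uniform(-3, 3, size=N)      # biases w_{0,i}
    x = np.linspace(-3, 3, 200)
    H = np.tanh(np.outer(x, w) + w0)     # neuron outputs, shape (200, N)

    # Fit only the combination weights v_i by least squares
    v, *_ = np.linalg.lstsq(H, np.sin(x), rcond=None)
    print("max |f - f_hat|:", np.max(np.abs(H @ v - np.sin(x))))

Increasing N reduces the error $\epsilon$, but only on the compact set from which the samples were drawn, in line with Eq. (3).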
In the literature, many arguments are found as to why a deep structure has ben-
efits for feature representation, including the argument that by recombination
of the weights along the different paths through the network, features may be
re-used exponentially [31]. Instead of summarizing this long line of arguments,
we look into a slightly simpler example that is summarized graphically in Fig. 3.
Decision trees are also able to describe general decision boundaries in $\mathbb{R}^n$. A

Figure 3: A decision tree can describe any partition of space and can thus model any
decision boundary. Mapping the tree into a one-layer network is possible, yet there still is
significant residual error in the resulting function. In the center example, $\epsilon \approx 0.7$. In order
to reduce this error further, a higher number of neurons would be required. If we construct a
network with one node for every inner node in the first layer and one node for every leaf node
in the second layer, we are able to construct a network that results in $\epsilon = 0$.

simple example is shown on the top left of the figure, and the associated par-
tition of a two-dimensional space is shown below, where black indicates class
y = 1 and white y = 0. According to the universal approximation theorem, we
should be able to map this function into a single layer network. In the center
column, we attempt to do so using the inner nodes of the tree and their inverses
to construct a six neuron basis. In the bottom of the column, we show the basis
functions that are constructed at every node projected into the input space, and
the resulting network’s approximation, also shown in the input space. Here, we
chose the output weights to minimize $\|y - \hat{y}\|_2^2$. As can be seen in the result,
not all areas can be recovered correctly. In fact, the maximal error $\epsilon$ is close
to 0.7 for a function that is bounded by 0 and 1. In order to improve this ap-
proximation, we can choose to introduce a second layer. As shown in the right
column, we can choose the strategy to map all inner nodes to a first layer and
all leaf nodes of the tree to a second layer. Doing so effectively encodes every
partition that is described by the respective leaf node in the second layer. This
approach is able to map our tree correctly with $\epsilon = 0$. In fact, this approach is
general, holds for all decision trees, and was already described by Ivanova et al.
in 1995 [32]. As such, we can now understand why deeper networks may have
more modelling capacity.

2.3. Network training
Having gained first insights into neural networks and their topology,
we still need to discuss how their parameters $\theta$ are actually determined. The
answer is fairly easy: gradient descent. In order to compute a gradient, we need
to define a function that measures the quality of our parameter set θ, the so-
called loss function L(θ). In the following, we will work with simple examples
for loss functions to introduce the concept of back-propagation, which is the
algorithm that is commonly used to efficiently compute gradients for neural
network training.
We can represent a single-layer fully connected network with linear activations
simply as $\hat{y} = \hat{f}(x) = Wx$, i.e., a matrix multiplication. Note that the
network's output is now multidimensional with $\hat{y}, y \in \mathbb{R}^m$. Using an L2 loss,
we end up with the following objective function:

$$L(\theta) = \frac{1}{2}\|\hat{f}(x) - y\|_2^2 = \frac{1}{2}\|Wx - y\|_2^2. \tag{4}$$
In order to update the parameters $\theta = W$ in this example, we need to compute

$$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial \hat{f}} \frac{\partial \hat{f}}{\partial W} = (Wx - y)(x^\top) \tag{5}$$

using the chain rule. Note that the order of the two factors matters, as matrix-vector
multiplications generally do not commute: $(Wx - y)$ applies from the left and
$(x^\top)$ from the right. The final weight update is then obtained as

$$W^{j+1} = W^j - \eta (W^j x - y)x^\top, \tag{6}$$
where η is the so-called learning rate and j is used to index the iteration number.
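As a sanity check of Eqs. (4) to (6), the following NumPy sketch (our own example) applies this update rule to synthetic data generated by a known matrix; the loss decays towards zero as $W$ approaches the generating matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    W_true = rng.normal(size=(3, 5))   # ground-truth mapping to recover
    X = rng.normal(size=(5, 100))      # 100 training samples x
    Y = W_true @ X                     # targets y

    W = np.zeros((3, 5))               # parameters theta = W
    eta = 0.01                         # learning rate
    for j in range(500):
        grad = (W @ X - Y) @ X.T / X.shape[1]  # batch-averaged Eq. (5)
        W = W - eta * grad                     # update, Eq. (6)

    print("remaining loss:", 0.5 * np.mean(np.sum((W @ X - Y) ** 2, axis=0)))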
Now, let us consider a slightly more complicated network structure with
three layers $\hat{y} = \hat{f}_3(\hat{f}_2(\hat{f}_1(x))) = W_3 W_2 W_1 x$, again using linear activations.
This yields the following objective function:

$$L(\theta) = \frac{1}{2}\|W_3 W_2 W_1 x - y\|_2^2. \tag{7}$$
Note that this example is academic, as θ = {W1 , W2 , W3 } could simply be
collapsed to a single matrix. Yet, the concept that we use to derive this gradient
is generally applicable also to non-linear functions. Computing the gradient with
respect to the parameters of the last layer W3 follows the same recipe as in the
previous network:

$$\frac{\partial L}{\partial W_3} = \frac{\partial L}{\partial \hat{f}_3} \frac{\partial \hat{f}_3}{\partial W_3} = (W_3 W_2 W_1 x - y)(W_2 W_1 x)^\top. \tag{8}$$

Figure 4: Graphical overview of back-propagation using layer derivatives. During the forward
pass, the network is evaluated once and compared to the desired output using the loss function.
The back-propagation algorithm follows different paths through the layer graph in order to
compute the matrix derivatives efficiently.

For the computation of the gradient with respect to the second layer $W_2$, we
already need to apply the chain rule twice:

$$\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial \hat{f}_3} \frac{\partial \hat{f}_3}{\partial W_2} = \frac{\partial L}{\partial \hat{f}_3} \frac{\partial \hat{f}_3}{\partial \hat{f}_2} \frac{\partial \hat{f}_2}{\partial W_2} = W_3^\top (W_3 W_2 W_1 x - y)(W_1 x)^\top. \tag{9}$$

This leads us to the input layer gradient, which is determined as

$$\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial \hat{f}_3} \frac{\partial \hat{f}_3}{\partial W_1} = \frac{\partial L}{\partial \hat{f}_3} \frac{\partial \hat{f}_3}{\partial \hat{f}_2} \frac{\partial \hat{f}_2}{\partial W_1} = \frac{\partial L}{\partial \hat{f}_3} \frac{\partial \hat{f}_3}{\partial \hat{f}_2} \frac{\partial \hat{f}_2}{\partial \hat{f}_1} \frac{\partial \hat{f}_1}{\partial W_1} = W_2^\top W_3^\top (W_3 W_2 W_1 x - y)(x)^\top. \tag{10}$$

The matrix derivatives above are also visualized graphically in Fig. 4. Note
that many intermediate results can be reused during the computation of the
gradient, which is one of the reasons why back-propagation is efficient in com-
puting updates. Also note that the forward pass through the net is part of
$\partial L/\partial \hat{f}_3$, which is contained in all gradients of the net. The other partial derivatives are
only partial derivatives either with respect to the input or the parameters of
the respective layer. Hence, back-propagation can be used if both operations
are known for every layer in the net. Having determined the gradients, each
parameter can now be updated analogous to Eq. 6.
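The reuse of intermediate results becomes obvious when Eqs. (8) to (10) are written out as code. In this sketch of the three-layer linear example (our own illustration), the forward activations and the common factor $\partial L/\partial \hat{f}_3$ are each computed once and shared by all three gradients:

    import numpy as np

    rng = np.random.default_rng(1)
    W1, W2, W3 = (rng.normal(size=(4, 4)) for _ in range(3))
    x = rng.normal(size=(4, 1))
    y = rng.normal(size=(4, 1))

    # Forward pass: store the intermediate results for reuse
    f1 = W1 @ x                  # output of layer 1
    f2 = W2 @ f1                 # output of layer 2
    f3 = W3 @ f2                 # network output y_hat

    # Backward pass: dL/df3 appears in all three gradients
    dL_df3 = f3 - y              # common factor (W3 W2 W1 x - y)
    grad_W3 = dL_df3 @ f2.T      # Eq. (8)
    dL_df2 = W3.T @ dL_df3       # propagate one layer back
    grad_W2 = dL_df2 @ f1.T      # Eq. (9)
    dL_df1 = W2.T @ dL_df2       # propagate again
    grad_W1 = dL_df1 @ x.T       # Eq. (10)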

Figure 5: Overview of classical (sign(x), σ(x), and tanh(x)) and modern activation functions,
like the Rectified Linear Unit ReLU(x) and the leaky ReLU LReLU(x).

2.4. Deep learning


With the knowledge summarized in the previous sections, networks can be
constructed and trained. However, this alone does not yet enable deep learning. One impor-
tant element was the establishment of additional activation functions that are
displayed in Fig. 5. In contrast to classical bounded activations like $\text{sign}(x)$,
$\sigma(x)$, and $\tanh(x)$, new functions such as the Rectified Linear Unit

$$\text{ReLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ 0 & \text{else,} \end{cases}$$

and many others, of which we only mention the Leaky ReLU

$$\text{LReLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ \alpha x & \text{else,} \end{cases}$$

were identified to be useful to train deeper networks. Contrary to the classical
activation functions, many of the new activation functions are convex and
have large areas with non-zero derivatives. As can be seen in Eq. 10, the com-
putation of the gradient of deeper layers using the chain rule requires several
multiplications of partial derivatives. The deeper the net, the more multiplica-
tions are required. If several elements along this chain are smaller than 1, the
entire gradient decays exponentially with the number of layers. Hence, non-
saturating derivatives are important to resolve these numerical issues, which
previously caused vanishing gradients and did not allow training of networks
much deeper than about three layers. Also note that each neuron does not lose
its interpretation as a classifier, if we consider 0 as the classification boundary.
Furthermore, the universal approximation theorem still holds for a single-layer

network with ReLUs [33]. Hence, several useful and desirable properties are
attained using such modern activation functions.
One disadvantage is, of course, that the ReLU is not differentiable over the
entire domain of $x$. At $x = 0$, a kink is found that does not allow determining a
unique gradient. For optimization, an important property of the gradient of a
function is that it will point towards the direction of the steepest ascent. Hence,
following the negative direction will allow minimization of the function. For a
differentiable function, this direction is unique. If this constraint is relaxed to
allow multiple directions that lead to an extremum, we arrive at sub-gradient
theory [34]. It allows us to still use gradient descent algorithms to optimize such
problems, if it is possible to determine a sub-gradient, i.e., at least one instance
of a valid direction towards the optimum. For the ReLU, any value between
0 and -1 would be acceptable at x = 0 for the descent operation. If such a
direction can be obtained, convergence is guaranteed for convex problems by
application of specific optimization programs, such as using a fixed step size in
the gradient descent [35]. This allows us to remain with back-propagation for
optimization, while using non-differentiable activation functions.
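Both activations and a valid sub-gradient convention take only a few lines of NumPy (our own sketch); at the kink $x = 0$ we pick the sub-gradient 0, one of the admissible choices discussed above:

    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def lrelu(x, alpha=0.01):            # alpha is the slope for x < 0
        return np.where(x >= 0, x, alpha * x)

    def relu_subgradient(x):
        # Derivative is 1 for x > 0 and 0 for x < 0; at x = 0 any value
        # in [0, 1] is a valid sub-gradient, and we simply pick 0.
        return (x > 0).astype(float)

    x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    print(relu(x), lrelu(x), relu_subgradient(x), sep="\n")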
Another significant advance towards deep learning is the use of specialized
layers. In particular, the so-called convolution and pooling layers enable
modelling of locality and abstraction (cf. Fig. 6). The major advantage of the convo-
lution layers is that they only consider a local neighborhood for each neuron, and
that all neurons of the same layer share the same weights, which dramatically
reduces the amount of parameters and therefore memory required to store such
a layer. These restrictions are identical to limiting the matrix multiplication to
a matrix with circulant structure, which exactly models the operation of con-
volution. As the operation is generally of the form of a matrix multiplication,
the gradients introduced in Section 2.3 still apply. Pooling is an operation that
is used to reduce the scale of the input. For images, typically areas of 2 × 2 or
3 × 3 are analyzed and summarized to a single value. The average operation
can again be expressed as a matrix with hard-coded weights, and gradient com-
putation follows essentially the previous section. Non-linear operations, such
as maximum or median, however, require more attention. Again, we can ex-
ploit the sub-gradient approach. During the forward pass through the net, the
maximum or median can easily be determined. Once this is known, a matrix is
constructed that simply selects the correct elements that would also have been
selected by the non-linear methods. The transpose of the same matrix is then
employed during the backward pass to determine an appropriate sub-gradient
[36]. Fig. 6 shows both operations graphically and highlights an example for
a convolutional neural network (CNN). If we now compare this network with
Fig. 1, we see that the original interpretation as only a classifier is no longer
valid. Instead, the deep network now models all steps directly from the signal
up to the classification stage. Hence, many authors claim that feature “hand-
crafting” is no longer required because everything is learned by the network in
a data-driven manner.
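The selection-based sub-gradient for max-pooling can be sketched as follows (our own NumPy illustration of the idea, not an optimized implementation): the forward pass records which inputs were maximal, and the backward pass routes the incoming gradient only to those positions (ties would route to all maxima in this simple version):

    import numpy as np

    def maxpool2x2_forward(img):
        # Analyze non-overlapping 2x2 areas and keep their maximum.
        h, w = img.shape
        out = img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
        # The mask plays the role of the matrix that selects the elements
        # chosen by the non-linear operation.
        mask = img == np.kron(out, np.ones((2, 2)))
        return out, mask

    def maxpool2x2_backward(grad_out, mask):
        # Transpose of the selection: distribute the gradient to the winners.
        return mask * np.kron(grad_out, np.ones((2, 2)))

    img = np.arange(16, dtype=float).reshape(4, 4)
    out, mask = maxpool2x2_forward(img)
    grad_img = maxpool2x2_backward(np.ones_like(out), mask)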
So far, deep learning seems quite easy. However, there are also important
practical issues that all users of deep learning need to be aware of. In particular,

a look at the loss over the training iterations is very important. If the loss
increases quickly after the beginning, a typical problem is that the learning rate
η is set too high. This is typically referred to as exploding gradient. Setting η
too low, however, can also result in a stagnation of the loss over iterations. In
this case, we observe again vanishing gradients. Hence, correct choice of η and
other training hyper-parameters is crucial for successful training [37].
In addition to the training set, a validation set is used to determine over-
fitting. In contrast to the training set, the validation set is never used to actually
update the parameter weights. Hence, the loss of the validation set allows an
estimate for the error on unseen data. During optimization, the loss on the
training set will continuously fall. However, as the validation set is independent,
the loss on the validation set will increase at some point in training. This is
typically a good point to stop updating the model before it over-fits to the
training data.
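A minimal, self-contained sketch of this early-stopping strategy (our own toy example: gradient descent on an over-parameterized polynomial fit with an odd/even split into training and validation points) looks as follows:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 20)
    y = np.sin(3 * x) + 0.1 * rng.normal(size=20)   # noisy observations
    A = np.vander(x, 10)                            # degree-9 polynomial basis
    train, val = np.arange(0, 20, 2), np.arange(1, 20, 2)

    theta = np.zeros(10)
    eta, best_val, best_theta, bad, patience = 0.05, np.inf, theta, 0, 50
    for epoch in range(5000):
        grad = A[train].T @ (A[train] @ theta - y[train]) / len(train)
        theta = theta - eta * grad              # update on training split only
        val_loss = np.mean((A[val] @ theta - y[val]) ** 2)
        if val_loss < best_val:                 # validation loss still falls
            best_val, best_theta, bad = val_loss, theta.copy(), 0
        else:                                   # validation loss increases
            bad += 1
            if bad >= patience:
                break                           # stop before over-fitting
    print(f"stopped after {epoch + 1} epochs, val loss {best_val:.4f}")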
Another common mistake is bias in training or test data. First of all, hyper-
parameter tuning has to be done on validation data before actual test data is
employed. In principle, test data should only be looked at once the architecture,
parameters, and all other factors of influence are set; only then should the test
data be used. Otherwise, repeated testing will lead to overly optimistic results [37]
and the system's performance will be over-estimated. This is as forbidden as
including the test data in the training set. Furthermore, confounding factors
may influence the classification results. If, for example, all pathological data
was collected with Scanner A and all control data was collected with Scanner
B, then the network may simply learn to differentiate the two scanners instead
of identifying the disease [38].
Due to the nature of gradient descent, training will stop once a minimum
is reached. However, due to the general non-convexity of the loss function,
this minimum is likely to be only a local minimum. Hence, it is advisable to
perform multiple training runs with different initialization techniques in order
to estimate a mean and a standard deviation for the model performance. Single
training runs may be biased towards a single more or less random initialization.
Furthermore, it is very common to use regularization terms on pa-
rameters, as is done in other fields of medical imaging. Here,
L2- and L1-norms are common choices. In addition, regularization can also be
enforced by other techniques such as dropout, weight-sharing, and multi-task
learning. An excellent overview is given in [37].
Also note that the output of a neural network does not equal confidence,
even if it is scaled between 0 and 1 and appears like a probability, e.g., when
using the so-called softmax function. In order to get realistic estimates of con-
fidence, other techniques have to be employed [39].
The last missing remark towards deep learning concerns the role of the availability of
large amounts of data and labels or annotations that could be gathered over the
internet, the immense compute power that became available through the use of graphics
cards for general-purpose computations, and, last but not least, the positive
trend towards open-source software that enables users world-wide to download
and extend deep learning methods very quickly. All three elements were crucial

Figure 6: Convolutional layers only face a limited receptive field and all neurons share the
same weights (cf. left side of the figure; adapted from [40]). Pooling layers reduce the total
input size. Both are typically combined in an alternating manner to construct convolutional
neural networks (CNNs). An example is shown on the right.

to enable this extremely fast rise of deep learning.

2.5. Important Architectures in Deep Learning


With the developments of the previous section, much progress was made
towards improved signal, image, video, and audio processing, as already detailed
earlier. In this introduction, we are not able to highlight all developments,
because this would go well beyond the scope of this document, and there are
other sources that are more suited for this purpose [31, 37, 12]. Instead, we will
only shortly discuss some advanced network architectures that we believe had,
or will have, an impact on medical image processing.
Autoencoders use a contracting and an expanding branch to find repre-
sentations of the input of a lower dimensionality [41]. They do not require
annotations, as the network is trained to predict the original input using loss
functions such as $L(\theta) = \|\hat{f}(x) - x\|_2^2$. Variants use convolutional networks
[42], add noise to the input [43], or aim at finding sparse representations [44].
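As a minimal sketch (our own, with illustrative layer sizes; PyTorch is used here as one of several possible frameworks), an autoencoder reduces to a contracting and an expanding branch trained against the input itself:

    import torch
    import torch.nn as nn

    autoencoder = nn.Sequential(
        nn.Linear(784, 32), nn.ReLU(),   # contracting branch: compress to 32-D
        nn.Linear(32, 784),              # expanding branch: back to input size
    )
    opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

    x = torch.rand(64, 784)              # stand-in batch of flattened images
    for step in range(100):
        # No annotations needed: L(theta) = ||f_hat(x) - x||_2^2
        loss = ((autoencoder(x) - x) ** 2).sum(dim=1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()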
Generative adversarial networks (GANs) employ two networks to learn
a representative distribution from the training data [45]. A generator network
creates new images from a noise input, while a discriminator network tries
to differentiate real images from generated images. Both are trained in an
alternating manner such that both gradually improve for their respective tasks.
GANs are known to generate plausible and realistic-looking images. So-
called Wasserstein GANs can reduce instability in training [46]. Conditional
GANs [47] allow encoding states in the process such that images with desired
properties can be generated. CycleGANs [48] drive this even further, as they
allow converting an image from one domain to another, for example from day
to night, without directly corresponding images in the training data.
Google’s inception network is an advanced and deep architecture that
was applied successfully for several tasks [49]. Its main highlight is the in-
troduction of the so-called inception block that essentially allows computing
convolutions and pooling operations in parallel. By repeating this block in a
network, the network can select by itself in which sequence convolution and
pooling layers should be combined in order to solve the task at hand effectively.

Ronneberger’s U-net is a breakthrough towards automatic image seg-
mentation [50] and has been applied successfully in many tasks that require
image-to-image transforms, for example, images to segmentation masks. Like
the autoencoder, it consists of a contracting and an expanding branch, and it
enables multi-resolution analysis. In addition, U-net features skip connections
that connect the matching resolution levels of the encoder and the decoder stage.
Doing so, the architecture is able to model general high-resolution multi-scale
image-to-image transforms. Originally proposed in 2-D, many extensions, such
as 3-D versions, exist [51, 52].
ResNets have been designed to enable training of very deep networks [53].
Even with the methods described earlier in this paper, networks will not ben-
efit from more than 30 to 50 layers, as the gradient flow becomes numerically
unstable in such deep networks. In order to alleviate the problem, a so-called
residual block is introduced, and layers take the form $\hat{f}(x) = x + \hat{f}'(x)$, where
$\hat{f}'(x)$ contains the actual network layer. Doing so has the advantage that the
addition introduces a second parallel branch into the network that lets the gra-
dient flow from end to end. ResNets also have other interesting properties, e.g.,
their residual blocks behave like ensembles of classifiers [54].
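In code, the residual form $\hat{f}(x) = x + \hat{f}'(x)$ is a one-line addition. A sketch of a residual block follows (our own; the two-convolution body is one common choice and not prescribed by the text):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        # f(x) = x + f'(x): the identity branch lets the gradient flow end to end.
        def __init__(self, channels=64):
            super().__init__()
            self.body = nn.Sequential(   # f'(x), the actual network layers
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1),
            )

        def forward(self, x):
            return x + self.body(x)      # addition creates the parallel branch

    block = ResidualBlock()
    out = block(torch.rand(1, 64, 32, 32))   # shape is preserved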
Variational networks enable the conversion of an energy minimization
problem into a neural network structure [55]. We consider this type of network
particularly interesting, as many problems in traditional medical image pro-
cessing are expressed as energy minimization problems. The main idea is as
follows: The energy function is typically minimized by optimization programs
such as gradient descent. Thus, we are able to use the gradient of the original
problem to construct a so-called variational unit that describes exactly one up-
date step of the optimization program. A succession of such units then describes
the complete variational network. Two observations are noteworthy: First, this
type of framework allows to learn operators within one variational unit, such
as a sparsifying transform for compressed sensing problems. Second, the vari-
ational units generally form residual blocks, and thus variational networks are
always ResNets as well.
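To make the idea tangible, the sketch below (a strongly simplified illustration of our own, not the formulation of [55]) unrolls gradient steps on an energy $E(x) = \frac{1}{2}\|Ax - b\|_2^2 + R(x)$; the gradient of the regularizer $R$ is replaced by a learnable convolution as a stand-in for a learned operator, and the chained units share their weights here for brevity:

    import torch
    import torch.nn as nn

    class VariationalUnit(nn.Module):
        # One unrolled update step of gradient descent on E(x).
        def __init__(self):
            super().__init__()
            self.step = nn.Parameter(torch.tensor(0.1))           # learned step size
            self.reg = nn.Conv1d(1, 1, 5, padding=2, bias=False)  # learned operator

        def forward(self, x, A, b):
            data_grad = A.T @ (A @ x - b)      # gradient of the data term
            reg_grad = self.reg(x[None, None, :]).squeeze()
            return x - self.step * (data_grad + reg_grad)  # residual-style update

    unit = VariationalUnit()
    A, b, x = torch.rand(8, 16), torch.rand(8), torch.zeros(16)
    for _ in range(10):   # the succession of units forms the variational network
        x = unit(x, A, b)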
Recurrent neural networks (RNNs) enable the processing of sequences
with long term dependencies [56]. Furthermore, recurrent nets introduce state
variables that allow the cells to carry memory and essentially model any finite
state machine. Extensions are long short-term memory (LSTM) networks [57]
and gated recurrent units (GRU) [58] that can model explicit read and write
memory transactions similar to a computer.

2.6. Advanced deep learning concepts


In addition to the above-mentioned architectures, there are also useful con-
cepts that allow building more robust and versatile networks. Again, the list of
methods presented here is incomplete. Still, we aimed at including the most useful ones.
Data augmentation In data augmentation, common sources of variation
are explicitly added to training samples. These models of variation typically
include noise, changes in contrast, and rotations and translations. In biased
data, it can be used to improve the numbers of infrequent observations. In

particular, the success of U-net is also related to very powerful augmentation
techniques that include, for example, non-rigid deformations of input images
and the desired segmentation [50]. In most recent literature, reports are found
that also GANs are useful for data augmentation [59].
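A basic augmentation function along these lines (our own sketch; the variation models and parameter ranges are purely illustrative) could look like this:

    import numpy as np
    from scipy.ndimage import rotate, shift

    def augment(image, rng):
        # Apply common variation models: noise, contrast, rotation, translation.
        out = image + rng.normal(scale=0.02, size=image.shape)        # noise
        out = (0.8 + 0.4 * rng.random()) * out                        # contrast
        out = rotate(out, angle=rng.uniform(-15, 15), reshape=False)  # rotation
        out = shift(out, shift=rng.uniform(-3, 3, size=2))            # translation
        return out

    rng = np.random.default_rng(0)
    augmented = [augment(np.zeros((64, 64)), rng) for _ in range(8)]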
Precision learning is a strategy to include known operators into the learn-
ing process [60]. While this idea is counter-intuitive for most recognition tasks,
where we want to learn the optimal representation, the approach is actually very
useful for signal processing tasks in which we know a priori that a certain op-
erator must be present in the processing chain. Embedding the operator in the
network reduces the maximal training error, reduces the number of unknowns
and therefore the number of required training samples, and enables mixing of
most signal processing methods with deep learning. The approach is applica-
ble to a broad range of operators. The main requirement is that a gradient or
sub-gradient must exist.
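As a sketch of the idea (our own, with an arbitrary fixed operation standing in for the known prior operator), a parameter-free layer can simply be placed between trainable layers; gradients still flow through it because the operation is differentiable:

    import torch
    import torch.nn as nn

    class KnownOperator(nn.Module):
        # A fixed, parameter-free layer: here a frequency-domain ramp
        # weighting as a stand-in for an a-priori known operator.
        def forward(self, x):
            X = torch.fft.fft(x)
            ramp = torch.abs(torch.fft.fftfreq(x.shape[-1]))
            return torch.fft.ifft(X * ramp).real

    # Only the surrounding layers contribute unknowns to be trained.
    net = nn.Sequential(nn.Linear(64, 64), KnownOperator(), nn.Linear(64, 64))
    out = net(torch.rand(2, 64))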
Adversarial examples consider the input to a neural network as a possible
weak spot that could be exploited by an attacker [61]. Generally, attacks try
to find a perturbation $e$ such that $\hat{f}(x + e)$ indicates a different class than the
true $y$, while keeping the magnitude of $e$ low, for example, by minimizing $\|e\|_2^2$.
Using different objective functions allows forming different types of attacks.
Attacks range from generating noise that will mislead the network, but will
remain unnoticed by a human observer, to specialized patterns that will even
mislead networks after printing and re-digitization [62].
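A gradient-based attack in the spirit of this description can be sketched in a few lines (our own illustration on a stand-in model; the sign-of-gradient step is the well-known fast gradient sign method, only one of many attack variants):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 2))      # stand-in classifier f_hat
    x, y = torch.rand(1, 10), torch.tensor([0])  # input and its true class

    x_adv = x.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()                    # gradient of the loss w.r.t. the input
    e = 0.05 * x_adv.grad.sign()       # small perturbation with bounded norm
    print(model(x).argmax().item(), "->", model(x + e).argmax().item())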
Deep reinforcement learning is a technique that allows training an artifi-
cial agent to perform actions given inputs from an environment and expands on
traditional reinforcement learning theory [63]. In this context, deep networks
are often used as flexible function approximators representing value functions
and/or policies [4]. In order to enable time-series processing, sequences of envi-
ronmental observations can be employed [5].

3. Results

As can be seen in the last few paragraphs, deep learning now offers a large
set of new tools that are applicable to many problems in the world of medical
image processing. In fact, these tools have already been widely employed. In
particular, perceptual tasks are well suited for deep learning. We present some
highlights that are discussed later in this section in Fig. 7. At the International
Conference on Medical Image Computing and Computer-Assisted Intervention
(MICCAI) in 2018, approximately 70 % of all accepted publications were related
to the topic of deep learning. Given this fast pace of progress, we are not able
to describe all relevant publications here. Hence, this overview is far from
complete. Still, we want to highlight some publications that are representative of
the current developments in the field. In terms of structure and organization, we
follow [22] here, but add recent developments in physical simulation and image
reconstruction.

Figure 7: Deep learning excels in perceptual tasks such as detection and segmentation. The
left hand side shows the artificial agent-based landmark detection after Ghesu et al. [71]
and the X-ray transform-invariant landmark detection by Bier et al. [67] (projection image
courtesy of Dr. Unberath). The right hand side shows a U-net-based stent segmentation after
Breininger et al. [72]. Images are reproduced with permission by the authors.

3.1. Image detection and recognition


Image detection and recognition deals with the problem of detecting a certain
element in a medical image. In many cases, the images are volumetric. Therefore,
efficient parsing is a must. A popular strategy to do so is marginal space learning
[64], as it is efficient and allows detecting organs robustly. Its deep learning
counterpart [65] is even more efficient, as its probabilistic boosting trees are
replaced by a neural network-based boosting cascade. Still, the entire volume
has to be processed to detect anatomical structures reliably. The work in [66] drives
efficiency even further by replacing the search process with an artificial agent that follows
anatomy to detect anatomical landmarks using deep reinforcement learning.
The method is able to detect hundreds of landmarks in a complete CT volume
in a few seconds.
Bier et al. proposed an interesting method in which they detect anatomical
landmarks in 2-D X-ray projection images [67]. In their method, they train
projection-invariant feature descriptors from 3-D annotated landmarks using a
deep network. Yet another popular method for detection is the so-called region
proposal convolutional neural network. In [68], such networks are applied to robustly
detect tumors in mammographic images.
Detection and recognition are obviously also applied in many other modal-
ities and a great body of literature exists. Here, we only report two more
applications. In histology, cell detection and classification is an important task,
which is tackled by Aubreville et al. using guided spatial transformer networks
[69] that allow refinement of the detection before the actual classification is done.
The task of mitosis classification benefits from this procedure. Convolutional
neural networks are also very effective for other image classification tasks. In [70]
they are employed to automatically detect images containing motion artifacts
in confocal laser-endoscopy images.

3.2. Image segmentation


Image segmentation also greatly benefited from the recent developments in
deep learning. In image segmentation, we aim to determine the outline of an or-
gan or anatomical structure as accurately as possible. Again, approaches based

on convolutional neural networks seem to dominate. Here, we only report Hol-
ger Roth's DeepOrgan [73], the brain MR segmentation using CNN by Moeskops
et al. [74], a fully convolutional multi-energy 3-D U-net presented by Chen et
al. [75], and a U-net-based stent segmentation in X-ray projection domain by
Breininger et al. [72] as representative examples. Obviously segmentation using
deep convolutional networks also works in 2-D as shown by Nirschl et al. for
histopathologic images [76].
Middleton et al. already experimented with the fusion of neural networks
and active contour models in 2004, well before the advent of deep learning [77].
Yet, their approach uses neither deep nets nor end-to-end training, which
would be desirable for a state-of-the-art method. Hence, revisiting traditional
segmentation approaches and fusing them with deep learning in an end-to-end
fashion seems a promising scope of research. Fu et al. follow a similar idea by
mapping Frangi’s vesselness into a neural network [78]. They demonstrate that
they are able to adjust the convolution kernels in the first step of the algorithm
towards the specific task of vessel segmentation in ophthalmic fundus imaging.
Yet another interesting class of segmentation algorithms is the use of re-
current networks for medical image segmentation. Poudel et al. demonstrate
this for a recurrent fully convolutional neural network on multi-slice MRI car-
diac data [79], while Andermatt et al. show effectiveness of GRUs for brain
segmentation [80].

3.3. Image registration


While the perceptual tasks of image detection and classification have been
receiving a lot of attention with respect to applications of deep learning, im-
age registration has not seen this large boost yet. However, there are several
promising works found in the literature that clearly indicate that there are also
a lot of opportunities.
One typical problem in point-based registration is to find good feature de-
scriptors that allow correct identification of corresponding points. Wu et al.
propose to do so using autoencoders to mine good features in an unsupervised
way [81]. Schaffert et al. drive this even further and use the registration metric
itself as loss function for learning good feature representations [82]. Another op-
tion to solve 2-D/3-D registration problems is to estimate the 3-D pose directly
from the 2-D point features [83].
For full volumetric registration, examples of deep learning-based approaches
are also found. The quicksilver algorithm is able to model a deformable reg-
istration and uses a patch-wise prediction directly from the image appearance
[84]. Another approach is to model the registration problem as a control prob-
lem that is dealt with using an agent and reinforcement learning. Liao et al.
propose to do so for rigid registration predicting the next optimal movement
in order to align both volumes [85]. This approach can also be applied to non-
rigid registration using a statistical deformation model [86]. In this case, the
actions are movements in the vector space of the deformation model. Obvi-
ously, agent-based approaches are also applicable for point-based registration

problems. Zhong et al. demonstrate this for intra-operative brain shift using
imitation learning [87].

3.4. Computer-aided diagnosis


Computer-aided diagnosis is regarded as one of the most challenging prob-
lems in the field of medical image processing. Here, we are not only acting in a
supportive role quantifying evidence towards the diagnosis. Instead, the diagno-
sis itself is to be predicted. Hence, decisions have to be made with utmost care
and have to be reliable.
The analysis of chest radiographs comprises a significant amount of work
for radiologists and is performed routinely. Hence, reliable support to prevent
human error is highly desirable. An example to do so is given in [88] by Diamant
et al. using transfer learning techniques.
A similar workload is imposed on ophthalmologists in the reading of volu-
metric optical coherence tomography data. Google's DeepMind just recently
proposed to support this process in terms of referral decision support [89].
There are many other studies found in this line, for example, automatic
cancer assessment in confocal laser endoscopy in different tissues of the head
and neck [90], deep learning for mammogram analysis [91], and classification of
skin cancer [92].

3.5. Physical simulation


A new field of deep learning is the support of physical modelling. So far this
has been exploited in the gaming industry to compute realistically appearing
physics engines [93], or for smoke simulation [94] in real-time. A first attempt
to bring deep learning to bio-medical modelling was done by Meister et al. [95].
Based on such observations, researchers started to bring such methods into
the field of medical imaging. One example to do so is the deep scatter estimation
by Maier et al. [96]. Unberath et al. drive this even further to emulate the
complete X-ray formation process in their DeepDRR [97]. In [98] Horger et al.
demonstrate that even noise of unknown distributions can be learned, leading
to an efficient generative noise model for realistic physical simulations.
Also other physical processes have been investigated using deep learning.
In [60] a material decomposition using deep learning embedding prior physical
operators using precision learning is proposed. Physically less plausible
interrelations have also been attempted. In [99], Han et al. attempt to convert MR vol-
umes to CT volumes. Stimpel et al. drive this even further, predicting X-ray
projections from MR projection images [100]. While these observations seem
promising, one has to follow such endeavors with care. Schiffers et al. demon-
strate that CycleGANs may create realistically appearing fluorescence images from
fundus images in ophthalmology [101]. Yet, undesired effects appear, as occa-
sionally drusen are mapped onto microaneurysms in this process. Cohen et
al. demonstrate even worse effects [102]. In their study, cancers disappeared or
were created during the modality-to-modality mapping. Hence, such approaches
have to be handled with care.

Figure 8: Results from a deep learning image-to-image reconstruction based on U-net. The
reference image with a lesion embedded is shown on the left followed by the analytic recon-
struction result that is used as input to U-net. U-net does an excellent job when trained
and tested without noise. If unmatched noise is provided as input, an image is created that
appears artifact-free, yet not just the lesion is gone, but also the chest surface is shifted by
approximately 1 cm. On the right hand side, an undesirable result is shown that emerged at
some point during training of several different versions of U-net which shows organ-shaped
clouds in the air in the background of the image. Note that we omitted displaying multiple
versions of “Limited Angle” as all three inputs to the U-Nets would appear identically given
the display window of the figure of [-1000, 1000] HU.

3.6. Image Reconstruction


The field of medical image reconstruction has also been affected by deep
learning and was just recently the topic of a special issue in the IEEE Transac-
tions on Medical Imaging. The editorial actually gives an excellent overview on
the latest developments [103] that we will summarize in the next few lines.
One group of deep learning algorithms omits the actual problem of recon-
struction and formulates the inverse problem as an image-to-image transform with different
initialization techniques before processing with a neural network. Recent devel-
opments in this image-to-image reconstruction are summarized in [104]. Still,
there is continuous progress in the field, e.g. by application of the latest network
architectures [105] or cascading of U-nets [106].
A recent paper by Zhu et al. proposes to learn the entire reconstruction
operation only from raw data and corresponding images [107]. The basic idea is
to model an autoencoder-like dimensionality reduction in raw data and recon-
struction domain. Then both are linked using a non-linear correlation model.
The entire model can then be converted into a single network and trained in an
end-to-end manner. In the paper, they show that this is possible for 2-D MR
and PET imaging and largely outperforms traditional approaches.
Learning operators in a completely data-driven manner carries the risk that undesired
effects may occur [108], as is shown in Fig. 8. Hence, integration of prior knowl-
edge and the structure of the operators seems beneficial, as already described
in the concept of precision learning in the previous section. Ye et al. embed
a multi-scale transform into the encoder and decoder of a U-net-like network,
which gives rise to the concept of deep convolutional framelets [109]. Using
wavelets for the multi-scale transform has been successfully applied in many
applications ranging from denoising [110] to sparse view computed tomography
[111].
If we design a neural network inspired by iterative algorithms that minimize

an energy function step by step, the concept of variational networks is useful.
Doing so allows to map virtually all iterative reconstruction algorithms onto
deep networks, e.g., by using a fixed number of iterations. There are several
impressive works found in the literature, of which we only name the MRI re-
construction by Hammernik et al. [112] and the sound speed reconstruction
by Vishnevskiy et al. [113] at this point. The concept can be expanded even
further, as Adler et al. demonstrate by learning an entire primal-dual recon-
struction [114].
Würfl et al. also follow the idea of using prior operators [115, 116]. Their
network is inspired by the classical filtered back-projection that can be retrained
to better approximate limited angle geometries that typically cannot be solved
by classical analytic inversion models. Interestingly, as the approach is described
in an end-to-end fashion, errors in the discretization or initialization of the
filtering steps are intrinsically corrected by the learning process [117]. They also
show that their method is compatible with other approaches, such as variational
networks that are able to learn an additional de-streaking sparsifying transform
[118]. Syben et al. drive these efforts even further and demonstrate that the
concept of precision learning is able to mathematically derive a neural network
structure [119]. In their work, they demonstrate that they are able to postulate
that an expensive matrix inverse is a circulant matrix and hence can be replaced
by a convolution operation. Doing so leads to the derivation of a previously
unknown filtering, back-projection, re-projection-style rebinning algorithm that
intrinsically suffers less from resolution loss than traditional interpolation-based
rebinning methods.
As noted earlier, all networks are prone to adversarial attacks. Huang et
al. demonstrate this [108] in their work, showing that incorrect noise
modelling alone may distort the entire image. Yet, the networks reconstruct visually
pleasing results and artifacts cannot be as easily identified as in classical meth-
ods. One possible remedy is to follow the precision learning paradigm and fix
as much of the network as possible, such that it can be analyzed with classical
methods as demonstrated in [116]. Another promising approach is Bayesian
deep learning [39]. Here, the network output is two-fold: the reconstructed im-
age plus a confidence map indicating how accurately the content of the reconstructed
image was actually measured.
Obviously, deep learning also plays a role in suppression of artifacts. In [120],
Zhang et al. demonstrate this effectively for metal artifacts. As a last example,
we list Bier et al. here, as they show that deep learning-based motion tracking
is also feasible for motion compensated reconstruction [121].

4. Discussion
In this introduction, we reviewed the latest developments in deep learning for
medical imaging. In particular detection, recognition, and segmentation tasks
are well solved by the deep learning algorithms. Those tasks are clearly linked
to perception and there is essentially no prior knowledge present. Hence, state-
of-the-art architectures from other fields, such as computer vision, can often be

easily adapted to medical tasks. In order to gain a better understanding of the
black box, reinforcement learning and modelling of artificial agents seem well
suited.
In image registration, deep learning is not that broadly used. Yet, interesting
approaches already exist that are able to either predict deformations directly
from the image input, or take advantage of reinforcement learning-based tech-
niques that model registration as an optimal control problem. Further benefits
are obtained using deep networks for learning representations, which are either
done in an unsupervised fashion or using the registration metric itself.
Computer-aided diagnosis is a hot topic that many recent publications ad-
dress. We expect that simpler standard tasks that typically result in a high
workload for medical doctors will be solved first. For more complex diagnoses,
the current deep nets that immediately result in a decision are not that well
suited, as it is difficult to understand the evidence. Hence, approaches are
needed that link observations to evidence to construct a line of argument to-
wards a decision. It is the strong belief of the authors that the new methodology
will only make a significant impact on computer-aided diagnosis once such
evidence-based decision making is achieved.
Physical simulation can be accelerated dramatically with realistic outcomes
as shown in the field of computer games and graphics. Therefore, the meth-
ods are highly relevant, in particular for interventional applications, in which
real-time processing is mandatory. First approaches exist, yet there is consid-
erable room for more new developments. In particular, precision learning and
variational networks seem to be well suited for such tasks, as they provide some
guarantees to prediction outcomes. Hence, we believe that there are many new
developments to follow, in particular in radiation therapy and real-time inter-
ventional dose tracking.
Reconstruction based on data-driven methods yields impressive results. Yet,
such methods may suffer from a “new kind” of deep learning artifacts. In particular, the
work by Huang et al. [108] shows these effects in great detail. Both precision
learning as well as Bayesian approaches seem well suited to tackle the problem in
the future. Yet, it is unclear how to benefit best from the data-driven methods
while maintaining intuitive and safe image reading.
A great advantage of all the deep learning methods is that they are inher-
ently compatible with each other and with many classical approaches. This fusion
will spark many new developments in the future. In particular, the fusion on
network-level using either the direct connection of networks or precision learning
allows end-to-end training of algorithms. The only requirement for this deep
fusion is that each operation in the hybrid net has a gradient or sub-gradient
for the optimization. In fact, there are already efforts to design whole program-
ming languages to be compatible with this kind of differential programming
[122]. With such integrated networks, multi-task learning is enabled, for ex-
ample, training of networks that deliver optimal reconstruction quality and the
best volumetric overlap of the resulting segmentation at the same time, as al-
ready conjectured in [123]. This point may even be expanded to computer-aided
diagnosis or patient benefit.

In general, we observe that the CNN architectures that emerge from deep
learning are computationally very efficient. Networks find solutions that are on
par with or better than many state-of-the-art algorithms. At the same time, their computa-
tional cost at inference time is often much lower than that of state-of-the-art algorithms
in typical domains of medical imaging in detection, segmentation, registration,
reconstruction, and physical simulation tasks. This benefit at run-time comes
at high computational cost during training that can take days even on GPU
clusters. Given an appropriate problem domain and training setup, we can thus
exploit this effect to save run-time at the cost of additional training time.
Deep learning is extremely data hungry. This is one of the main limitations
that the field is currently facing, and performance grows only logarithmically
with the amount of data used [124]. Approaches like weakly supervised training
[125] will only partially be able to close this gap. Hence, one hospital or one
group of researchers will not be able to gather a competitive amount of data in
the near future. As such, we welcome initiatives such as the grand challenges1 or
medical data donors2 , and hope that they will be successful with their mission.

5. Conclusion

In this short introduction to deep learning in medical image processing, we
were aiming at two objectives at the same time. On the one hand, we wanted to
introduce the field of deep learning and the associated theory. On the other
hand, we wanted to provide a general overview on the field and potential future
applications. In particular, perceptual tasks have been studied most so far.
However, with the set of tools presented here, we believe many more problems
can be tackled. So far, many problems could be solved better than by the classical
state-of-the-art alone, which also sparked significant interest in the public
media. Generally, safety and understanding of networks is still a large concern,
but methods to deal with this are currently being developed. Hence, we believe
that deep learning will probably remain an active research field for the coming
years.
If you enjoyed this introduction, we recommend that you have a look at
our video lecture that is available at https://www.video.uni-erlangen.de/course/id/662.

Acknowledgements

We express our thanks to Katharina Breininger, Tobias Würfl, and Vincent
Christlein, who did a tremendous job when we created the deep learning course
at the University of Erlangen-Nuremberg. Furthermore, we would like to thank
Florin Ghesu, Bastian Bier, Yixing Huang, and again Katharina Breininger for
the permission to highlight their work and images in this introduction. Last

1 https://grand-challenge.org
2 http://www.medicaldatadonors.org

but not least, we also express our gratitude to the participants of the course
“Computational Medical Imaging3 ”, who were essentially the test audience of
this article during the summer school “Ferienakademie 2018”.

References

[1] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436.

[2] G. E. Dahl, D. Yu, L. Deng, A. Acero, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Transactions on Audio, Speech, and Language Processing 20 (2012) 30–42.
[3] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in neural information processing systems, pp. 1097–1105.
[4] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al., Mastering the game of Go with deep neural networks and tree search, Nature 529 (2016) 484.
[5] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Belle-
mare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al.,
Human-level control through deep reinforcement learning, Nature 518
(2015) 529.

[6] A. Mordvintsev, C. Olah, M. Tyka, Inceptionism: Going deeper into neural networks, Google Research Blog. Retrieved June 20 (2015) 5.
[7] W. R. Tan, C. S. Chan, H. E. Aguirre, K. Tanaka, ArtGAN: Artwork synthesis with conditional categorical GANs, in: Image Processing (ICIP), 2017 IEEE International Conference on, IEEE, pp. 3760–3764.

[8] J. Briot, G. Hadjeres, F. Pachet, Deep learning techniques for music generation - A survey, CoRR abs/1709.01620 (2017).
[9] P. Seeböck, Deep learning in medical image analysis, Master’s thesis, Vienna University of Technology, Faculty of Informatics (2015).

[10] D. Shen, G. Wu, H.-I. Suk, Deep learning in medical image analysis,
Annual review of biomedical engineering 19 (2017) 221–248.
[11] N. Pawlowski, S. I. Ktena, M. C. Lee, B. Kainz, D. Rueckert, B. Glocker, M. Rajchl, DLTK: State of the art reference implementations for deep learning on medical images, arXiv preprint arXiv:1711.06853 (2017).

3 https://www5.cs.fau.de/lectures/sarntal-2018/

[12] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoo-
rian, J. A. van der Laak, B. Van Ginneken, C. I. Sánchez, A survey on
deep learning in medical image analysis, Medical image analysis 42 (2017)
60–88.
[13] B. J. Erickson, P. Korfiatis, Z. Akkus, T. L. Kline, Machine learning for
medical imaging, Radiographics 37 (2017) 505–515.
[14] K. Suzuki, Survey of deep learning applications to medical image analysis,
Med Imaging Technol 35 (2017) 212–226.
[15] J. Hagerty, R. J. Stanley, W. V. Stoecker, Medical image processing in
the age of deep learning, in: Proceedings of the 12th International Joint
Conference on Computer Vision, Imaging and Computer Graphics Theory
and Applications (VISIGRAPP 2017), pp. 306–311.
[16] P. Lakhani, D. L. Gray, C. R. Pett, P. Nagy, G. Shih, Hello world deep
learning in medical imaging, Journal of digital imaging 31 (2018) 283–289.

[17] J. Kim, J. Hong, H. Park, Prospects of deep learning for medical imaging, Precision and Future Medicine 2 (2018) 37–52.
[18] J. Ker, L. Wang, J. Rao, T. Lim, Deep learning applications in medical
image analysis, IEEE Access 6 (2018) 9375–9389.

[19] M. Rajchl, S. I. Ktena, N. Pawlowski, An Introduction to Biomedical Image Analysis with TensorFlow and DLTK, 2018. https://medium.com/tensorflow/an-introduction-to-biomedical-image-analysis-with-tensorflow-and-dltk-2c25304e7c13.

[20] K. Breininger, T. Würfl, Tutorial: How to Build a Deep Learning Framework, 2018. https://github.com/kbreininger/tutorial-dlframework.
[21] D. Cornelisse, An intuitive guide to Convolutional Neural Networks, 2018. https://medium.freecodecamp.org/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050.

[22] S. K. Zhou, H. Greenspan, D. Shen, Deep learning for medical image analysis, Academic Press, 2017.
[23] L. Lu, Y. Zheng, G. Carneiro, L. Yang, Deep Learning and Convolutional
Neural Networks for Medical Image Computing, Springer, 2017.

[24] F. Chollet, Deep learning with Python, Manning Publications Co., 2017.
[25] A. Géron, Hands-on machine learning with Scikit-Learn and TensorFlow: concepts, tools, and techniques to build intelligent systems, O’Reilly Media, Inc., 2017.

[26] B. Sahiner, A. Pezeshk, L. M. Hadjiiski, X. Wang, K. Drukker, K. H.
Cha, R. M. Summers, M. L. Giger, Deep learning in medical imaging and
radiation therapy, Medical physics (2018).
[27] H. Niemann, Pattern analysis and understanding, volume 4, Springer Sci-
ence & Business Media, 2013.

[28] F. Rosenblatt, The perceptron, a perceiving and recognizing automaton (Project Para), Cornell Aeronautical Laboratory, 1957.
[29] G. Cybenko, Approximation by superpositions of a sigmoidal function,
Mathematics of control, signals and systems 2 (1989) 303–314.

[30] K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural networks 4 (1991) 251–257.
[31] Y. Bengio, A. Courville, P. Vincent, Representation learning: A review
and new perspectives, IEEE transactions on pattern analysis and machine
intelligence 35 (2013) 1798–1828.

[32] I. Ivanova, M. Kubat, Initialization of neural networks by means of decision trees, Knowledge-Based Systems 8 (1995) 333–344.
[33] S. Sonoda, N. Murata, Neural network with unbounded activation func-
tions is universal approximator, Applied and Computational Harmonic
Analysis 43 (2017) 233–268.
[34] R. Rockafellar, Convex Analysis, Princeton landmarks in mathematics
and physics, Princeton University Press, 1970.
[35] D. P. Bertsekas, Convex optimization algorithms, Athena Scientific, Belmont, 2015.

[36] F. Schirrmacher, T. Köhler, L. Husvogt, J. G. Fujimoto, J. Hornegger, A. K. Maier, QuaSI: Quantile Sparse Image Prior for Spatio-Temporal Denoising of Retinal OCT Data, in: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2017: 20th International Conference, Quebec City, QC, Canada, September 11-13, 2017, Proceedings, volume 10434, Springer, p. 83.
[37] I. Goodfellow, Y. Bengio, A. Courville, Deep learning, volume 1, MIT Press, Cambridge, 2016.
[38] A. Maier, M. Schuster, U. Eysholdt, T. Haderlein, T. Cincarek, S. Steidl, A. Batliner, S. Wenhardt, E. Nöth, QMOS - a robust visualization method for speaker dependencies with different microphones, Journal of Pattern Recognition Research 4 (2009) 32–51.

[39] J. Schlemper, D. C. Castro, W. Bai, C. Qin, O. Oktay, J. Duan, A. N.
Price, J. Hajnal, D. Rueckert, Bayesian deep learning for accelerated mr
image reconstruction, in: F. Knoll, A. Maier, D. Rueckert (Eds.), Ma-
chine Learning for Medical Image Reconstruction, Springer International
Publishing, Cham, 2018, pp. 64–71.
[40] V. Dumoulin, F. Visin, A guide to convolution arithmetic for deep learn-
ing, ArXiv e-prints (2016).
[41] P. Vincent, H. Larochelle, Y. Bengio, P.-A. Manzagol, Extracting and
composing robust features with denoising autoencoders, in: Proceedings
of the 25th international conference on Machine learning, ACM, pp. 1096–
1103.
[42] D. Holden, J. Saito, T. Komura, T. Joyce, Learning motion manifolds
with convolutional autoencoders, in: SIGGRAPH Asia 2015 Technical
Briefs, ACM, p. 18.
[43] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.-A. Manzagol, Stacked
denoising autoencoders: Learning useful representations in a deep network
with a local denoising criterion, Journal of machine learning research 11
(2010) 3371–3408.
[44] F. J. Huang, Y.-L. Boureau, Y. LeCun, et al., Unsupervised learning
of invariant feature hierarchies with applications to object recognition,
in: Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE
Conference on, IEEE, pp. 1–8.
[45] I. Goodfellow, NIPS 2016 tutorial: Generative adversarial networks, arXiv preprint arXiv:1701.00160 (2016).
[46] M. Arjovsky, S. Chintala, L. Bottou, Wasserstein generative adversarial
networks, in: International Conference on Machine Learning, pp. 214–223.
[47] J. Gauthier, Conditional generative adversarial nets for convolutional face
generation, Class Project for Stanford CS231N: Convolutional Neural
Networks for Visual Recognition, Winter semester 2014 (2014) 2.
[48] J.-Y. Zhu, T. Park, P. Isola, A. A. Efros, Unpaired image-to-image trans-
lation using cycle-consistent adversarial networks, arXiv preprint (2017).
[49] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Er-
han, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in:
Proceedings of the IEEE conference on computer vision and pattern recog-
nition, pp. 1–9.
[50] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for
biomedical image segmentation, in: International Conference on Medi-
cal Image Computing and Computer-Assisted Intervention, Springer, pp.
234–241.

[51] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, O. Ronneberger, 3d u-
net: learning dense volumetric segmentation from sparse annotation, in:
International Conference on Medical Image Computing and Computer-
Assisted Intervention, Springer, pp. 424–432.
[52] F. Milletari, N. Navab, S.-A. Ahmadi, V-net: Fully convolutional neu-
ral networks for volumetric medical image segmentation, in: 3D Vision
(3DV), 2016 Fourth International Conference on, IEEE, pp. 565–571.
[53] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recog-
nition, in: Proceedings of the IEEE conference on computer vision and
pattern recognition, pp. 770–778.

[54] A. Veit, M. J. Wilber, S. Belongie, Residual networks behave like ensembles of relatively shallow networks, in: Advances in Neural Information Processing Systems, pp. 550–558.
[55] E. Kobler, T. Klatzer, K. Hammernik, T. Pock, Variational networks:
connecting variational methods and deep learning, in: German Conference
on Pattern Recognition, Springer, pp. 281–293.
[56] D. P. Mandic, J. Chambers, Recurrent neural networks for prediction:
learning algorithms, architectures and stability, John Wiley & Sons, Inc.,
2001.

[57] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural computation 9 (1997) 1735–1780.
[58] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of
gated recurrent neural networks on sequence modeling, arXiv preprint
arXiv:1412.3555 (2014).

[59] M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, H. Greenspan, GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification, CoRR abs/1803.01229 (2018).
[60] A. Maier, F. Schebesch, C. Syben, T. Würfl, S. Steidl, J.-H. Choi, R. Fahrig, Precision Learning: Towards Use of Known Operators in Neural Networks, in: J. K. T. Tan (Ed.), 2018 24th International Conference on Pattern Recognition (ICPR), pp. 183–188.
[61] X. Yuan, P. He, Q. Zhu, R. R. Bhat, X. Li, Adversarial examples: Attacks
and defenses for deep learning, arXiv preprint arXiv:1712.07107 (2017).

[62] T. B. Brown, D. Mané, A. Roy, M. Abadi, J. Gilmer, Adversarial patch, arXiv preprint arXiv:1712.09665 (2017).
[63] R. S. Sutton, A. G. Barto, F. Bach, et al., Reinforcement learning: An
introduction, MIT press, 1998.

[64] Y. Zheng, D. Comaniciu, Marginal space learning, in: Marginal Space
Learning for Medical Image Analysis, Springer, 2014, pp. 25–65.
[65] F. C. Ghesu, E. Krubasik, B. Georgescu, V. Singh, Y. Zheng, J. Horneg-
ger, D. Comaniciu, Marginal space deep learning: efficient architecture
for volumetric image parsing, IEEE transactions on medical imaging 35
(2016) 1217–1228.
[66] F. C. Ghesu, E. Krubasik, B. Georgescu, V. Singh, Y. Zheng, J. Horneg-
ger, D. Comaniciu, Marginal space deep learning: efficient architecture
for volumetric image parsing, IEEE transactions on medical imaging 35
(2016) 1217–1228.
[67] B. Bier, M. Unberath, J.-N. Zaech, J. Fotouhi, M. Armand, G. Osgood,
N. Navab, A. Maier, X-ray-transform invariant anatomical landmark de-
tection for pelvic trauma surgery, in: A. F. Frangi, J. A. Schnabel, C. Da-
vatzikos, C. Alberola-López, G. Fichtinger (Eds.), Medical Image Com-
puting and Computer Assisted Intervention – MICCAI 2018, Springer
International Publishing, Cham, 2018, pp. 55–63.
[68] A. Akselrod-Ballin, L. Karlinsky, S. Alpert, S. Hasoul, R. Ben-Ari,
E. Barkan, A region based convolutional network for tumor detection
and classification in breast mammography, in: Deep Learning and Data
Labeling for Medical Applications, Springer, 2016, pp. 197–205.
[69] M. Aubreville, M. Krappmann, C. Bertram, R. Klopfleisch, A. Maier, A
Guided Spatial Transformer Network for Histology Cell Differentiation, in:
T. E. Association (Ed.), Eurographics Workshop on Visual Computing for
Biology and Medicine, pp. 021–025.
[70] M. Aubreville, M. Stöve, N. Oetter, M. de Jesus Goncalves, C. Knipfer,
H. Neumann, C. Bohr, F. Stelzle, A. Maier, Deep learning-based detec-
tion of motion artifacts in probe-based confocal laser endomicroscopy im-
ages, International Journal of Computer Assisted Radiology and Surgery
(2018).
[71] F. C. Ghesu, B. Georgescu, Y. Zheng, S. Grbic, A. Maier, J. Hornegger,
D. Comaniciu, Multi-scale deep reinforcement learning for real-time 3d-
landmark detection in ct scans, IEEE Transactions on Pattern Analysis
and Machine Intelligence (2017).
[72] K. Breininger, S. Albarqouni, T. Kurzendorfer, M. Pfister, M. Kowarschik,
A. Maier, Intraoperative stent segmentation in X-ray fluoroscopy for en-
dovascular aortic repair, International Journal of Computer Assisted Ra-
diology and Surgery 13 (2018).
[73] H. R. Roth, L. Lu, A. Farag, H.-C. Shin, J. Liu, E. B. Turkbey, R. M. Summers, DeepOrgan: Multi-level deep convolutional networks for automated pancreas segmentation, in: International conference on medical image computing and computer-assisted intervention, Springer, pp. 556–564.

[74] P. Moeskops, M. A. Viergever, A. M. Mendrik, L. S. de Vries, M. J.
Benders, I. Išgum, Automatic segmentation of mr brain images with a
convolutional neural network, IEEE transactions on medical imaging 35
(2016) 1252–1261.
[75] S. Chen, X. Zhong, S. Hu, S. Dorn, M. Kachelriess, M. Lell, A. Maier,
Automatic Multi-Organ Segmentation in Dual Energy CT using 3D Fully
Convolutional Network, in: B. van Ginneken, M. Welling (Eds.), MIDL.
[76] J. J. Nirschl, A. Janowczyk, E. G. Peyster, R. Frank, K. B. Margulies,
M. D. Feldman, A. Madabhushi, Deep learning tissue segmentation in
cardiac histopathology images, in: Deep Learning for Medical Image
Analysis, Elsevier, 2017, pp. 179–195.
[77] I. Middleton, R. I. Damper, Segmentation of magnetic resonance images
using a combination of neural networks and active contour models, Med-
ical engineering & physics 26 (2004) 71–86.
[78] W. Fu, K. Breininger, R. Schaffert, N. Ravikumar, T. Würfl, J. Fujimoto, E. Moult, A. Maier, Frangi-Net: A Neural Network Approach to Vessel Segmentation, in: A. Maier, T. M. Deserno, H. Handels, K. H. Maier-Hein, C. Palm, T. Tolxdorff (Eds.), Bildverarbeitung für die Medizin 2018, pp. 341–346.
[79] R. P. Poudel, P. Lamata, G. Montana, Recurrent fully convolutional neu-
ral networks for multi-slice mri cardiac segmentation, in: Reconstruction,
Segmentation, and Analysis of Medical Images, Springer, 2016, pp. 83–94.
[80] S. Andermatt, S. Pezold, P. Cattin, Multi-dimensional gated recurrent
units for the segmentation of biomedical 3d-data, in: Deep Learning and
Data Labeling for Medical Applications, Springer, 2016, pp. 142–151.

[81] G. Wu, M. Kim, Q. Wang, B. C. Munsell, D. Shen, Scalable high-performance image registration framework by unsupervised deep feature representations learning, IEEE Transactions on Biomedical Engineering 63 (2016) 1505–1516.
[82] R. Schaffert, J. Wang, P. Fischer, A. Borsdorf, A. Maier, Metric-driven
learning of correspondence weighting for 2-d/3-d image registration, in:
German Conference on Pattern Recognition (GCPR).
[83] S. Miao, J. Z. Wang, R. Liao, Convolutional neural networks for robust
and real-time 2-d/3-d registration, in: Deep Learning for Medical Image
Analysis, Elsevier, 2017, pp. 271–296.

[84] X. Yang, R. Kwitt, M. Styner, M. Niethammer, Quicksilver: Fast predictive image registration - a deep learning approach, NeuroImage 158 (2017) 378–396.

[85] R. Liao, S. Miao, P. de Tournemire, S. Grbic, A. Kamen, T. Mansi, D. Co-
maniciu, An artificial agent for robust image registration., in: AAAI, pp.
4168–4175.
[86] J. Krebs, T. Mansi, H. Delingette, L. Zhang, F. C. Ghesu, S. Miao,
A. Maier, N. Ayache, R. Liao, A. Kamen, Robust Non-Rigid Registration
through Agent-Based Action Learning, in: Springer (Ed.), Medical Im-
age Computing and Computer-Assisted Intervention - MICCAI 2017, pp.
344–352.
[87] X. Zhong, S. Bayer, N. Ravikumar, N. Strobel, A. Birkhold,
M. Kowarschik, R. Fahrig, A. Maier, Resolve Intraoperative Brain Shift
as Imitation Game, in: M. 2018 (Ed.), MICCAI Challenge 2018 for Cor-
rection of Brainshift with Intra-Operative Ultrasound (CuRIOUS 2018).
[88] I. Diamant, Y. Bar, O. Geva, L. Wolf, G. Zimmerman, S. Lieberman,
E. Konen, H. Greenspan, Chest radiograph pathology categorization via
transfer learning, in: Deep Learning for Medical Image Analysis, Elsevier,
2017, pp. 299–320.
[89] J. De Fauw, J. R. Ledsam, B. Romera-Paredes, S. Nikolov, N. Tomasev, S. Blackwell, H. Askham, X. Glorot, B. O’Donoghue, D. Visentin, et al., Clinically applicable deep learning for diagnosis and referral in retinal disease, Nature medicine 24 (2018) 1342.

[90] M. Aubreville, C. Knipfer, N. Oetter, C. Jaremenko, E. Rodner, J. Denzler, C. Bohr, H. Neumann, F. Stelzle, A. Maier, Automatic Classification of Cancerous Tissue in Laserendomicroscopy Images of the Oral Cavity using Deep Learning, Scientific Reports 7 (2017) 41598–017.
[91] G. Carneiro, J. Nascimento, A. P. Bradley, Deep learning models for
classifying mammogram exams containing unregistered multi-view images
and segmentation maps of lesions, in: Deep Learning for Medical Image
Analysis, Elsevier, 2017, pp. 321–339.
[92] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau,
S. Thrun, Dermatologist-level classification of skin cancer with deep neural
networks, Nature 542 (2017) 115.
[93] J. Wu, I. Yildirim, J. J. Lim, B. Freeman, J. Tenenbaum, Galileo: Per-
ceiving physical object properties by integrating a physics engine with
deep learning, in: Advances in neural information processing systems, pp.
127–135.

[94] M. Chu, N. Thuerey, Data-driven synthesis of smoke flows with CNN-based feature descriptors, ACM Transactions on Graphics (TOG) 36 (2017) 69.
[95] F. Meister, T. Passerini, V. Mihalef, A. Tuysuzoglu, A. Maier, T. Mansi, Towards fast biomechanical modeling of soft tissue using neural networks, Medical Imaging meets NeurIPS Workshop at 32nd Conference on Neural Information Processing Systems (NeurIPS), 2018.
[96] J. Maier, Y. Berker, S. Sawall, M. Kachelrieß, Deep scatter estimation
(dse): feasibility of using a deep convolutional neural network for real-
time x-ray scatter prediction in cone-beam ct, in: Medical Imaging 2018:
Physics of Medical Imaging, volume 10573, International Society for Op-
tics and Photonics, p. 105731L.
[97] M. Unberath, J.-N. Zaech, S. C. Lee, B. Bier, J. Fotouhi, M. Armand, N. Navab, DeepDRR – a catalyst for machine learning in fluoroscopy-guided procedures, in: A. F. Frangi, J. A. Schnabel, C. Davatzikos, C. Alberola-López, G. Fichtinger (Eds.), Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, Springer International Publishing, Cham, 2018, pp. 98–106.
[98] F. Horger, T. Würfl, V. Christlein, A. Maier, Towards arbitrary noise augmentation - deep learning for sampling from arbitrary probability distributions, in: International Workshop on Machine Learning for Medical Image Reconstruction, Springer, pp. 129–137.
[99] X. Han, MR-based synthetic CT generation using a deep convolutional neural network method, Medical physics 44 (2017) 1408–1419.
[100] B. Stimpel, C. Syben, T. Würfl, K. Mentl, A. Dörfler, A. Maier, MR
to X-ray Projection Image Synthesis, in: F. Noo (Ed.), Proceedings of
the 5th International Conference on Image Formation in X-ray Computed
Tomography (CT-Meeting), pp. 435–438.
[101] F. Schiffers, Z. Yu, S. Arguin, A. Maier, Q. Ren, Synthetic fundus flu-
orescein angiography using deep neural networks, in: A. Maier, T. M.
Deserno, H. Handels, K. H. Maier-Hein, C. Palm, T. Tolxdorff (Eds.),
Bildverarbeitung für die Medizin 2018, Springer Berlin Heidelberg, Berlin,
Heidelberg, 2018, pp. 234–238.
[102] J. P. Cohen, M. Luck, S. Honari, Distribution matching losses can hal-
lucinate features in medical image translation, in: A. F. Frangi, J. A.
Schnabel, C. Davatzikos, C. Alberola-López, G. Fichtinger (Eds.), Med-
ical Image Computing and Computer Assisted Intervention – MICCAI
2018, Springer International Publishing, Cham, 2018, pp. 529–536.
[103] G. Wang, J. C. Ye, K. Mueller, J. A. Fessler, Image reconstruction is a
new frontier of machine learning., IEEE transactions on medical imaging
37 (2018) 1289–1296.
[104] M. T. McCann, K. H. Jin, M. Unser, A review of convolutional neural
networks for inverse problems in imaging, arXiv preprint arXiv:1710.04011
(2017).

[105] Z. Zhang, X. Liang, X. Dong, Y. Xie, G. Cao, A sparse-view ct recon-
struction method based on combination of densenet and deconvolution.,
IEEE transactions on medical imaging 37 (2018) 1407–1417.
[106] A. Kofler, M. Haltmeier, C. Kolbitsch, M. Kachelrieß, M. Dewey, A u-nets
cascade for sparse view computed tomography, in: International Work-
shop on Machine Learning for Medical Image Reconstruction, Springer,
pp. 91–99.
[107] B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen, M. S. Rosen, Image re-
construction by domain-transform manifold learning, Nature 555 (2018)
487.

[108] Y. Huang, T. Würfl, K. Breininger, L. Liu, G. Lauritsch, A. Maier, Some investigations on robustness of deep learning in limited angle tomography, in: A. F. Frangi, J. A. Schnabel, C. Davatzikos, C. Alberola-López, G. Fichtinger (Eds.), Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, Springer International Publishing, Cham, 2018, pp. 145–153.
[109] J. C. Ye, Y. Han, E. Cha, Deep convolutional framelets: A general deep
learning framework for inverse problems, SIAM Journal on Imaging Sci-
ences 11 (2018) 991–1048.
[110] E. Kang, W. Chang, J. Yoo, J. C. Ye, Deep convolutional framelet denoising for low-dose ct via wavelet residual network, IEEE transactions on medical imaging 37 (2018) 1358–1369.
[111] Y. Han, J. C. Ye, Framing u-net via deep convolutional framelets: Appli-
cation to sparse-view ct, IEEE transactions on medical imaging 37 (2018)
1418–1429.

[112] K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock, F. Knoll, Learning a variational network for reconstruction of accelerated MRI data, Magnetic resonance in medicine 79 (2018) 3055–3071.

[113] V. Vishnevskiy, S. J. Sanabria, O. Goksel, Image reconstruction via variational network for real-time hand-held sound-speed imaging, in: International Workshop on Machine Learning for Medical Image Reconstruction, Springer, pp. 120–128.
[114] J. Adler, O. Öktem, Learned primal-dual reconstruction, IEEE transac-
tions on medical imaging 37 (2018) 1322–1332.

[115] T. Würfl, F. C. Ghesu, V. Christlein, A. Maier, Deep learning computed tomography, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, pp. 432–440.

[116] T. Würfl, M. Hoffmann, V. Christlein, K. Breininger, Y. Huang, M. Un-
berath, A. K. Maier, Deep learning computed tomography: Learning
projection-domain weights from image domain in limited angle problems,
IEEE transactions on medical imaging 37 (2018) 1454–1463.
[117] C. Syben, B. Stimpel, K. Breininger, T. Würfl, R. Fahrig, A. Dörfler,
A. Maier, Precision learning: Reconstruction filter kernel discretization,
in: Proceedings of the CT Meeting 2018. To appear.
[118] K. Hammernik, T. Würfl, T. Pock, A. Maier, A deep learning architecture
for limited-angle computed tomography reconstruction, in: K. H. Maier-
Hein, geb. Fritzsche, T. M. Deserno, geb. Lehmann, H. Handels, T. Tolx-
dorff (Eds.), Bildverarbeitung für die Medizin 2017, Springer Berlin Hei-
delberg, Berlin, Heidelberg, 2017, pp. 92–97.
[119] C. Syben, B. Stimpel, J. Lommen, T. Würfl, A. Dörfler, A. Maier, De-
riving neural network architectures using precision learning: Parallel-to-
fan beam conversion, in: German Conference on Pattern Recognition
(GCPR).
[120] Y. Zhang, H. Yu, Convolutional neural network based metal artifact
reduction in x-ray computed tomography, IEEE transactions on medical
imaging 37 (2018) 1370–1381.
[121] B. Bier, K. Aschoff, C. Syben, M. Unberath, M. Levenston, G. Gold,
R. Fahrig, A. Maier, Detecting anatomical landmarks for motion estima-
tion in weight-bearing imaging of knees, in: International Workshop on
Machine Learning for Medical Image Reconstruction, Springer, pp. 83–90.
[122] T.-M. Li, M. Gharbi, A. Adams, F. Durand, J. Ragan-Kelley, Differ-
entiable programming for image processing and deep learning in halide,
ACM Transactions on Graphics (TOG) 37 (2018) 139.
[123] G. Wang, A perspective on deep imaging, IEEE Access 4 (2016) 8914–
8924.
[124] C. Sun, A. Shrivastava, S. Singh, A. Gupta, Revisiting unreasonable effectiveness of data in deep learning era, arXiv preprint arXiv:1707.02968 (2017).
[125] M. Oquab, L. Bottou, I. Laptev, J. Sivic, Is object localization for free? - Weakly-supervised learning with convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 685–694.
