Article
Conditional Invertible Neural Networks for Medical Imaging
Alexander Denker * , Maximilian Schmidt , Johannes Leuschner and Peter Maass
Center for Industrial Mathematics, University of Bremen, Bibliothekstr. 5, 28359 Bremen, Germany;
[email protected] (M.S.); [email protected] (J.L.); [email protected] (P.M.)
* Correspondence: [email protected]
Abstract: Over recent years, deep learning methods have become an increasingly popular choice
for solving tasks from the field of inverse problems. Many of these new data-driven methods
have produced impressive results, although most only give point estimates for the reconstruction.
However, especially in the analysis of ill-posed inverse problems, the study of uncertainties is
essential. In our work, we apply generative flow-based models based on invertible neural networks
to two challenging medical imaging tasks, i.e., low-dose computed tomography and accelerated
magnetic resonance imaging. We test different architectures of invertible neural networks and provide
extensive ablation studies. In most applications, a standard Gaussian is used as the base distribution
for a flow-based model. Our results show that the choice of a radial distribution can improve the
quality of reconstructions.
Academic Editors: Fabiana Zama and Elena Loli Piccolomini

Received: 31 August 2021; Accepted: 13 November 2021; Published: 17 November 2021

1. Introduction

In inverse problems, one aims to recover an unknown image x† ∈ X from noisy measurements y^δ = Ax† + e obtained through a forward operator A : X → Y, where e ∈ Y describes the noise. Research in inverse problems has mainly focused on developing algorithms for obtaining stable reconstructions of the true image x† in the presence of noise. In recent years, data-driven methods have been increasingly used in research and applications to solve inverse problems [1]. The choice of methods ranges from post-processing approaches [2], unrolling iterative schemes as neural network layers [3,4], and learned regularization terms [5] to complete learning of an inversion model from data [6]. However, many data-driven methods only give a point estimate of the solution as output, and especially for ill-posed inverse problems, an estimation of the uncertainties is essential. In order to incorporate uncertainties arising in the inversion process, the reconstruction can be interpreted in a statistical way as a quest for information [7,8]. Instead of approximating a single point estimate, we are interested in the entire conditional distribution p(x|y^δ) of the image given the noisy measurement data. Traditionally, methods such as Markov chain Monte Carlo [9] or approximate Bayesian computation [10] have been used to estimate the unknown conditional distribution. However, these methods are often computationally expensive and unfeasible for large-scale imaging problems. A newer approach is the application of deep generative models for this task. In general, the goal of a deep generative model is to learn a surrogate model for the unknown distribution based on samples. Well-known approaches from the field of generative networks are variational autoencoders (VAEs) [11,12] and generative adversarial networks (GANs) [13]. Recently, flow-based generative models [14] were introduced, which use an invertible transformation to learn a continuous probability density. One of their advantages is that flow-based models allow exact likelihood computation, thus allowing for maximum likelihood training.
1.2. Contributions
Prior work on cINNs for inverse problems dealt mainly with image-to-image prob-
lems [18,19] or lower-dimensional applications [17]. These cINNs are implemented using
two components: an invertible neural network used for the normalizing flows and a con-
ditioning network used to extract features from the conditional input. This conditioning
network does not have to be invertible and is often implemented as a convolutional neural
network (CNN). In our work, we expand these concepts to inverse problems in medical
imaging, where the topology of the measurement space and the image space differ signifi-
cantly. In CT reconstruction, the measurements are line integrals over the image domain.
In MR imaging, the measurements can be interpreted in the frequency domain. This
creates an additional challenge, as CNNs are built to take advantage of local relationships
and often fail when there are global relationships in the measurements. We address this
problem by integrating a traditional reconstruction operator into the conditioning network
of the cINN. For the problem of CT reconstruction, we use the filtered back-projection
(FBP) operator, and for MRI, we use the zero-filled inverse Fourier transform. Further, we
experiment with two different invertible neural network architectures found in the literature:
the multi-scale architecture popularized in the Real NVP framework [25] and an invertible
UNet, as proposed by Etmann et al. [26]. Additionally, we propose the use of a different
base distribution, a radial Gaussian distribution, instead of the widely used standard
normal distribution.
Deep generative models can be broadly divided into latent-variable models, autoregressive models [27,28], and normalizing flows (NFs) [29].
The latent-variable models include implicit models, such as generative adversarial net-
works (GANs) [13] and variational autoencoders (VAEs) [11,12]. These latent-variable
models work by specifying a lower-dimensional latent space and learning a conditional
distribution to sample from the image space. GANs are trained using a critic or discrimina-
tor network in an adversarial scheme. It was recently shown that GANs have the ability to
produce realistic-looking images [30]. However, it is not possible to compute the likelihood
with a GAN. VAEs induce a noisy observation model and utilize a lower bound on the exact
likelihood for training, so only an approximation to the exact likelihood can be
evaluated. Additionally, the noisy observation model often leads to blurry-looking
images. For autoregressive models (ARMs), the joint distribution is factorized into a prod-
uct of conditional distributions using the product rule. Using this factorization, neural
networks are used to model the dependencies. In this way, the likelihood of an ARM can
be computed exactly, but sampling from such a model can be slow. Recently, score-based
generative models were proposed [31], which are trained to approximate the gradient of the
density and rely on Langevin dynamics for sampling. Models based on the concept of NFs
have the advantage of allowing exact likelihood calculation, thus offering the possibility
to use maximum likelihood training and a fast sampling procedure. In contrast to
VAEs, they are invertible by design and have no reconstruction loss. Recently, stochastic
NFs [32] were introduced, which interweave the deterministic invertible transformations
of an NF with stochastic sampling, promising more expressive transformations. For more
information, we refer to the recent review article by Ruthotto and Haber [33].
For a given noise model, the likelihood p(yδ | x ) can be evaluated using the forward
model A : X → Y [34]. The prior p( x ) encodes information about the image. Deep
generative models are usually incorporated in two ways: learning a model for the prior
p( x ) [35] or learning a model for the full posterior distribution p( x |yδ ) [19,22]. To explore
the posterior distribution, other point estimates can be used. Commonly, the maximum a
posteriori (MAP) estimate

$$\hat{x}_{\mathrm{MAP}} \in \arg\max_x\, p(x|y^\delta) \qquad (3)$$

or the pointwise conditional mean E[x|y^δ] is used as a reconstruction, and the pointwise
conditional variance Var[x|y^δ] is used to assess the uncertainty. As computing the conditional
mean and the conditional variance would require solving a high-dimensional
integral, we use an approximation to estimate both moments as
$$\widehat{\mathbb{E}}[x|y^\delta] = \frac{1}{N}\sum_{i=1}^{N} x_i \quad \text{and} \quad \widehat{\operatorname{Var}}[x|y^\delta] = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \widehat{\mathbb{E}}[x|y^\delta]\right)^2, \qquad (4)$$
with N i.i.d. samples { xi } drawn from the trained model. In our experiments, we focus on
directly learning a model for the full posterior p( x |yδ ).
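To make Equation (4) concrete, the following minimal sketch estimates both moments from samples of a trained conditional model; the `model.sample(y)` interface is a hypothetical stand-in, not the API of our published code.

```python
# Sketch: Monte Carlo estimate of the conditional mean and pointwise
# variance (Equation (4)) from N i.i.d. posterior samples.
import torch

def posterior_moments(model, y, num_samples=100):
    with torch.no_grad():
        # x_i ~ p_theta(x | y); `model.sample` is a hypothetical interface
        samples = torch.stack([model.sample(y) for _ in range(num_samples)])
    cond_mean = samples.mean(dim=0)
    cond_var = samples.var(dim=0, unbiased=False)  # 1/N normalization as in Eq. (4)
    return cond_mean, cond_var
```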
This exact formulation of the probability density offers the possibility to fit the parameters θ of the NF using maximum likelihood estimation [36]. Assume that we have a dataset of i.i.d. samples $\{x^{(i)}\}_{i=1}^{N}$ from an unknown target distribution; then, this objective is used for training the NF:
$$\max_\theta\; \mathcal{L}(\theta) = \sum_{i=1}^{N} \log\left(p_\theta(x^{(i)})\right) = \sum_{i=1}^{N}\left[\log p_z\left(T_\theta^{-1}(x^{(i)})\right) + \log\left|\det J_{T_\theta^{-1}}(x^{(i)})\right|\right]. \qquad (7)$$
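As an illustration, the sketch below evaluates the (negative) objective (7) for a mini-batch; it assumes a hypothetical `flow.inverse` that returns the latent code $z = T_\theta^{-1}(x)$ together with the accumulated log-determinant of the Jacobian.

```python
# Sketch: maximum likelihood training objective (7) for a normalizing flow.
import torch

def nll_loss(flow, base_dist, x):
    z, logdet = flow.inverse(x)                        # z = T^-1(x), log|det J_{T^-1}(x)|
    log_pz = base_dist.log_prob(z.flatten(1)).sum(-1)  # log p_z(T^-1(x))
    return -(log_pz + logdet).mean()                   # minimize the negative log-likelihood

# usage: loss = nll_loss(flow, torch.distributions.Normal(0.0, 1.0), x_batch)
```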
Invertible transformations can be composed: inverses compose in reverse order, and the Jacobian determinants multiply,

$$(T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1} \quad \text{and} \quad \det J_{T_2 \circ T_1}(z) = \det J_{T_2}(T_1(z)) \cdot \det J_{T_1}(z). \qquad (8)$$
Hence, a flow $T_\theta = T_K \circ \cdots \circ T_1$ built from $K$ simple transformations has the density

$$p_\theta(z_K) = p_{z_0}\left(T_\theta^{-1}(z_K)\right) \prod_{k=1}^{K} \left|\det J_{T_k}\left(T_k^{-1}(z_k)\right)\right|^{-1}, \qquad (9)$$

with $z_{k-1} = T_k^{-1}(z_k)$. This composition of transformations leads to the name normalizing flow [29]. The transformations $T_k$ are a critical part of this formulation. We need
transformations that:
transformations that:
• are easily invertible,
• offer an efficient calculation of the logarithm of the Jacobian determinant,
and are still expressive enough to approximate complex distributions. Several differ-
ent models offer invertibility and tractable determinants, e.g., planar flows [37], residual
flows [38,39], or Sylvester flows [40]. We focus on a class of models that are based on
so-called coupling layers [36,41]. Besides the invertibility of the transformations, the sta-
bility of the inverse pass must also be taken into account. Behrmann et al. [42] showed
that typical normalizing flow building blocks can become highly unstable and, therefore,
numerically non-invertible.
A coupling layer splits the input $x \in \mathbb{R}^n$ into two parts $x_{I_1} \in \mathbb{R}^d$ and $x_{I_2} \in \mathbb{R}^{n-d}$ and computes

$$y_{I_1} = x_{I_1}, \qquad y_{I_2} = G\left(x_{I_2}, M(x_{I_1})\right), \qquad (10)$$
where G : Rn−d × Rn−d → Rn−d is called the coupling law, which has to be invertible with
respect to the first argument. The function M : Rd → Rn−d is the coupling function, which
does not need to be invertible and can be implemented as an arbitrary neural network. Two
main types of coupling functions have been studied in the literature: additive coupling
functions and affine coupling functions. Additive coupling, as used in [36], follows this
design:
$$y_{I_1} = x_{I_1}, \quad y_{I_2} = x_{I_2} + M(x_{I_1}) \qquad \Leftrightarrow \qquad x_{I_1} = y_{I_1}, \quad x_{I_2} = y_{I_2} - M(y_{I_1}). \qquad (11)$$
A more flexible type of coupling is affine coupling [25]. Affine coupling layers intro-
duce an additional scaling function to the translation of the additive coupling layer. In this
way, a scale s( x ) and a translation t( x ) are learned, i.e., M ( x ) = [s( x ), t( x )]:
$$y_{I_1} = x_{I_1}, \quad y_{I_2} = x_{I_2} \odot \exp(s(x_{I_1})) + t(x_{I_1}) \qquad \Leftrightarrow \qquad x_{I_1} = y_{I_1}, \quad x_{I_2} = \exp(-s(y_{I_1})) \odot (y_{I_2} - t(y_{I_1})). \qquad (12)$$
Instead of choosing exp(·), sometimes other functions that are non-zero everywhere
are used. Because one part of the input is unchanged during the forward pass of a coupling
layer, we get a lower block triangular structure for the Jacobian matrix:
$$\frac{\partial y}{\partial x} = \begin{pmatrix} I_d & 0 \\[2pt] \dfrac{\partial y_{I_2}}{\partial x_{I_1}} & \dfrac{\partial y_{I_2}}{\partial x_{I_2}} \end{pmatrix}. \qquad (13)$$
This allows us to compute the determinant as $\det \frac{\partial y}{\partial x} = \det \frac{\partial y_{I_2}}{\partial x_{I_2}}$, which drastically reduces the computational complexity. For additive coupling layers, this block further reduces to the identity matrix, i.e., they have a unit determinant. Affine coupling layers have a diagonal structure in the block:

$$\det\left(\frac{\partial y_{I_2}}{\partial x_{I_2}}\right) = \exp\left(\sum_{i} s(x_{I_1})_i\right). \qquad (14)$$
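The following is a minimal sketch of an affine coupling layer implementing Equations (12) and (14); it assumes a channel-wise split of an even number of channels, and the subnetwork architecture is purely illustrative.

```python
# Sketch: affine coupling layer with analytic inverse and log-determinant.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, channels, hidden=64):          # `channels` must be even
        super().__init__()
        self.net = nn.Sequential(                     # M(x_I1) = [s, t]
            nn.Conv2d(channels // 2, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, channels, 3, padding=1))

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        s, t = self.net(x1).chunk(2, dim=1)
        y2 = x2 * torch.exp(s) + t                    # Eq. (12), forward
        logdet = s.flatten(1).sum(-1)                 # Eq. (14): log det = sum_i s_i
        return torch.cat([x1, y2], dim=1), logdet

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        s, t = self.net(y1).chunk(2, dim=1)
        x2 = torch.exp(-s) * (y2 - t)                 # Eq. (12), inverse
        return torch.cat([y1, x2], dim=1)
```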
Figure 1. Input image (a) and output of checkerboard downsampling (b) and Haar downsampling (c). Inspired by [26].
For a standard Gaussian base distribution, the log-likelihood is

$$\log(p_z(z)) = -\frac{1}{2}\|z\|_2^2 - \frac{n}{2}\log(2\pi). \qquad (15)$$
The second term is constant with respect to z and can be dropped during training. It
has been observed that the likelihood of flow-based models sometimes exhibits artifacts,
i.e., out-of-distribution data are often assigned a higher likelihood than training data [47].
In [48], the authors suggested that this behavior is due to the difference between the
high-likelihood set and the typical set in high-dimensional Gaussian distributions. For a
standard Gaussian, the region of highest density is at its mean, but the typical set is
at a distance of $\sqrt{d}$ away from the mean. In [49], the authors addressed this problem for
Bayesian neural networks and chose a radial Gaussian distribution where the typical set
and high-density region coincided. This radial Gaussian was formulated in hyperspherical
coordinates, where the radius is distributed according to a half-normal distribution, i.e.,
r = |r̂ | with r̂ ∼ N (0, 1), and all angular coordinates follow a uniform distribution over the
hypersphere. We use this radial distribution as a base distribution for training flow-based
models. This radial distribution leads to the following log-likelihood:
$$\ln p_z(z) = \ln\left(\frac{\sqrt{2}}{\sqrt{\pi}\, S_n}\right) - (n-1)\ln(\|z\|_2) - \frac{\|z\|_2^2}{2}, \qquad (16)$$
where Sn is the surface of the n-dimensional unit sphere. The derivation can be found in
Appendix A.1. Sampling is nearly as efficient as for the standard Gaussian distribution.
First, a point x ∼ N (0, In ) is sampled and normalized. This point is then scaled using a
radius r = |r̂ | with r̂ ∼ N (0, 1).
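The sketch below implements the log-likelihood (16) and the sampling procedure just described; it is a simplified stand-alone version under the stated conventions, not the implementation used in our experiments.

```python
# Sketch: radial base distribution, log-density (16) and sampling.
import math
import torch

def log_prob_radial(z):                       # z of shape (batch, n)
    n = z.shape[-1]
    # log S_n with S_n = 2 * pi^(n/2) / Gamma(n/2)
    log_sn = math.log(2.0) + (n / 2) * math.log(math.pi) - math.lgamma(n / 2)
    r = z.norm(dim=-1)
    return (0.5 * math.log(2.0) - 0.5 * math.log(math.pi) - log_sn
            - (n - 1) * torch.log(r) - 0.5 * r ** 2)

def sample_radial(n, num_samples=1):
    x = torch.randn(num_samples, n)
    v = x / x.norm(dim=-1, keepdim=True)      # uniform direction on the unit sphere
    r = torch.randn(num_samples, 1).abs()     # half-normal radius r = |r_hat|
    return r * v
```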
Other base distributions have also been considered in the literature. Hagemann and
Neumayer used a Gaussian mixture model as a base distribution, which led to higher-
quality samples, especially in multi-modal applications [50].
A conditional normalizing flow (CNF) uses a transformation $T_\theta^{-1}(\,\cdot\,; y)$ that depends on the conditioning input $y$, modeling the conditional density

$$p_\theta(x|y) = p_z\left(T_\theta^{-1}(x; y)\right)\left|\det\frac{\partial T_\theta^{-1}(x; y)}{\partial x}\right|. \qquad (17)$$

We use $J_{T_\theta^{-1}}(x; y) = \frac{\partial T_\theta^{-1}(x; y)}{\partial x}$ as a shorthand notation for the Jacobian matrix. Fitting
the parameters θ of the CNF can be done using a maximum likelihood loss:
$$\max_\theta\; \mathcal{L}(\theta) = \sum_{i=1}^{N} \log\left(p_\theta(x^{(i)}|y^{(i)})\right) = \sum_{i=1}^{N}\left[\log p_z\left(T_\theta^{-1}(x^{(i)}; y^{(i)})\right) + \log\left|\det J_{T_\theta^{-1}}(x^{(i)}; y^{(i)})\right|\right]. \qquad (18)$$
We use the same trick as for the NF and implement the CNF as a concatenation of
simple invertible building blocks.
A conditional coupling layer additionally feeds the measurements into the coupling function:

$$y_{I_1} = x_{I_1}, \qquad y_{I_2} = G\left(x_{I_2}, M(x_{I_1}, y^\delta)\right), \qquad (19)$$

where $G : \mathbb{R}^{n-d} \times \mathbb{R}^{n-d} \to \mathbb{R}^{n-d}$ is called the coupling law, which has to be invertible with
respect to the first argument. The function $M : \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}^{n-d}$ is the coupling function.
Conditional coupling layers offer the same advantages as regular coupling layers, i.e.,
a block triangular Jacobian and analytical invertibility. In our experiments, we mainly
use conditional affine coupling layers, i.e., replacing s( x I1 ) and t( x I1 ) with s( x I1 , yδ ) and
t( x I1 , yδ ). For any fixed conditional input yδ , the conditional normalizing flow is invertible.
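A conditional affine coupling layer then differs from the unconditional one only in the input to the subnetwork, as in the sketch below; `cond` stands for features extracted from y^δ by the conditioning network, and the subnet layout is again illustrative.

```python
# Sketch: conditional affine coupling in the sense of Equation (19).
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    def __init__(self, channels, cond_channels, hidden=64):
        super().__init__()
        self.net = nn.Sequential(             # M(x_I1, y) = [s, t]
            nn.Conv2d(channels // 2 + cond_channels, hidden, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, channels, 3, padding=1))

    def forward(self, x, cond):
        x1, x2 = x.chunk(2, dim=1)
        s, t = self.net(torch.cat([x1, cond], dim=1)).chunk(2, dim=1)
        y2 = x2 * torch.exp(s) + t            # invertible in x for any fixed cond
        return torch.cat([x1, y2], dim=1), s.flatten(1).sum(-1)
```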
Another way of introducing the conditional input yδ into the model is to use a
conditional base distribution [51]. In this approach, the base distribution can be mod-
eled as a normal distribution where the mean and variance are functions of yδ , i.e.,
p(z|yδ ) = N (z; µ(yδ ), σ2 (yδ )). Both the mean and variance function can be parametrized
as a neural network and trained in parallel to the flow-based model.
In the multi-scale architecture, each conditional invertible block splits off part of its output as latent variables:

$$x_0 = x, \qquad (z_{i+1}, x_{i+1}) = f_{i+1}\left(x_i, H_i(y^\delta)\right), \quad i = 0, \ldots, L-2, \qquad z_L = f_L\left(x_{L-1}, H_{L-1}(y^\delta)\right), \qquad z = (z_1, \ldots, z_L).$$
Figure 2. Multi-scale architecture with conditioning network H. The conditioning network processes
the conditioning input yδ and outputs this to the respective conditional coupling layer.
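The forward pass of this architecture can be sketched as follows; the split convention, the `blocks` interface, and a `cond_net` returning one feature map per scale are illustrative assumptions, and the squeeze/downsampling steps between scales are omitted.

```python
# Sketch: multi-scale forward pass, splitting off latents z_1, ..., z_L.
import torch

def multiscale_forward(blocks, cond_net, x, y):
    feats = cond_net(y)                    # [H_0(y), ..., H_{L-1}(y)]
    zs = []
    for i, block in enumerate(blocks[:-1]):
        h = block(x, feats[i])             # conditional invertible block f_{i+1}
        z_i, x = h.chunk(2, dim=1)         # split off latent part, keep the rest
        zs.append(z_i.flatten(1))
    zs.append(blocks[-1](x, feats[-1]).flatten(1))  # z_L = f_L(x_{L-1}, H_{L-1}(y))
    return torch.cat(zs, dim=1)            # z = (z_1, ..., z_L)
```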
$$\begin{aligned}
x_d^0 &= x \\
(c^{i+1}, x_d^{i+1}) &= f_d^{i+1}\left(x_d^i, H_u^i(y^\delta)\right), \quad i = 0, \ldots, L-2 \\
x_d^L &= f_d^L\left(x_d^{L-1}, H_u^{L-1}(y^\delta)\right) \\
x_u^L &= x_d^L \\
x_u^{i-1} &= f_u^i\left((x_u^i, c^i), H_d^i(y^\delta)\right), \quad i = L, \ldots, 1 \\
z &= x_u^0,
\end{aligned}$$
where indices d, u denote the down- and upsampling paths, respectively. Block f di consists
of coupling → downsampling → split, f dL is just coupling, and f ui is upsampling → concat
→ coupling. Compared with the multi-scale architecture, the iUNet concatenates the splits
step-by-step in the upsampling path and not all together in the last layer.
The conditioning UNet H creates outputs in the image domain X. Therefore, we can
introduce an additional conditioning loss, as proposed in Section 2.7. Specifically, we penalize
the discrepancy between the output of the conditioning UNet and the ground-truth image,
where α ≥ 0 is a weighting factor. Note that one can also use a pre-trained UNet with fixed
parameters as conditioning and benefit from the advantages of the CNF in comparison to a
simple post-processing approach.
Figure 3. End-to-end invertible UNet with conditioning network H. The conditioning network
processes the conditioning input yδ and outputs this to the respective conditional coupling layer.
3. Experimental Setup
In this section, we present three different applications used to evaluate different ar-
chitectures for conditional flow-based models. In the first example, we study compressed
sensing with Gaussian measurements on the popular MNIST dataset [53]. The other two
applications cover essential aspects of medical imaging: accelerated magnetic resonance
imaging and low-dose computed tomography. In these two medical imaging scenarios,
different sources introduce uncertainty into the reconstruction process. We have an un-
dersampling case in accelerated MRI, i.e., we have fewer measurements than necessary
according to the Nyquist–Shannon sampling theorem. So, a strong prior is needed for a
good reconstruction. The challenge in low-dose CT is that the lower radiation dose leads to
a worse signal-to-noise ratio. Although we are in an oversampling case, the reconstruction
is complicated by a significantly higher noise level.
Our source code is publicly available at https://github.com/jleuschn/cinn_for_imaging (last accessed: 16 November 2021).
The measurements in CT are given by the Radon transform [55] of the image, i.e., line integrals over the image domain, where x is the spatially varying mass absorption coefficient, which depends on tissue type and density. The Radon transform corresponds to the log-ratio between the source intensity and the measured intensity.
For continuous, noise-free measurements, the filtered back-projection (FBP) in combi-
nation with the Ram-Lak filter gives the exact inversion formula [56]. In general, recovering
the image is a mildly ill-posed problem in the sense of Nashed [57,58]. This means that
slight deviations in the measurement, e.g., noise, can lead to significant changes in the
reconstruction. The influence of the noise can be reduced by choosing an adequate filter for
the FBP. Another challenge arises from the discretization of real measurements, which can
lead to artifacts in the FBP reconstruction. Over the years, a number of different reconstruction
methods, such as algebraic reconstruction techniques (ART) [59] and total variation
(TV) regularization [60], were introduced to compensate for the drawbacks of the FBP.
Recently, deep learning approaches extended the choice of methods to push the boundaries
on image quality for low-dose, sparse-angle, and limited-angle measurements [2,3,23,61].
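For illustration, the following sketch simulates a noisy low-dose measurement and the FBP reconstruction that serves as conditioning input. It uses scikit-image (the `filter_name` argument follows recent versions), and the photon count `n0` is an illustrative value, not the LoDoPaB-CT setting.

```python
# Sketch: low-dose CT simulation and FBP as model-based conditioning input.
import numpy as np
from skimage.transform import radon, iradon

def fbp_conditioning(image, num_angles=1000, n0=4096):
    theta = np.linspace(0.0, 180.0, num_angles, endpoint=False)
    sinogram = radon(image, theta=theta)                # line integrals over the image
    counts = np.random.poisson(n0 * np.exp(-sinogram))  # Poisson noise model
    noisy_sino = -np.log(np.maximum(counts, 1) / n0)    # log-ratio of intensities
    return iradon(noisy_sino, theta=theta, filter_name='ramp')  # FBP, Ram-Lak filter
```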
In our experiments, we use the LoDoPaB-CT dataset [62] to replicate the challenges
that arise from low-dose CT measurements. The dataset contains over 40,000 normal-dose
medical CT images of the human thorax from around 800 patients. Poisson noise is used
to simulate the corresponding low-dose measurements. See Figure 4 for an example of
a simulated low-dose measurement, an FBP reconstruction, and the ground-truth image.
LoDoPaB-CT has a dedicated test set that we use for the evaluation and comparison of our
models. In addition, there is a special challenge set with undisclosed ground-truth data.
We evaluate the best model from our experiments on this set to allow for a comparison
with other reconstruction approaches. The challenge results can be found on the online
leaderboard (https://lodopab.grand-challenge.org/evaluation/challenge/leaderboard/,
last accessed: 16 November 2021).
In MRI, one measures the radio-frequency (RF) responses of nuclei (e.g., protons) to
RF pulses while applying different external magnetic fields in order to obtain a density
image. A strong static magnetic field is applied, which causes the resonance frequency
of the nuclei to be within the RF range. Pulses at this frequency are emitted using an RF
transmitting coil, triggering RF response signals detected by an RF receiving coil. For
spatial encoding, configurable magnetic gradient fields G = ( Gx , Gy , Gz ) are applied that
change the applied magnetic field and thereby the resonance frequency depending on the
location. During a scan, different gradient fields G are selected for each repetition of a
pulse sequence.
A simple model for the measured receive coil signal in each repetition is given by

$$y(t) = \int x(r) \exp(-2\pi i\, k(t) \cdot r)\, \mathrm{d}r, \qquad k(t) = \gamma \int_0^t G(\tau)\, \mathrm{d}\tau,$$
where x is the spatial signal density (i.e., the image) and k specifies a position in the so-
called k-space, which coincides with the Fourier space. The choice of G determines the
trajectory of k for this repetition. By collecting samples from multiple repetitions, one can
obtain a complete Cartesian sampling of the k-space that satisfies the Nyquist–Shannon
sampling theorem. This enables (approximate) reconstruction via the inverse fast Fourier
transform (IFFT).
A major limiting factor is the time-consuming measurement process, which directly
depends on the number of repetitions required to obtain a full sampling of the k-space.
While using fewer repetitions accelerates the process, it leads to an underdetermined
reconstruction problem and can introduce artifacts due to the missing frequencies. In
order to reconstruct from undersampled measurement data, prior information needs to
be incorporated. Additionally, measurements are noisy in practice, further increasing
reconstruction ambiguity, since all solutions matching the measured data within the noise
level would be plausible. This strengthens the requirement of prior information.
In our experiments, we used the emulated single-coil measurements from the NYU
fastMRI database [64,65]. The fully sampled measurements were retrospectively sub-
sampled to simulate accelerated MRI data. See Figure 5 for an example of a subsampled
measurement, a zero-filled IFFT reconstruction, and the ground truth obtained from the full
measurement. We used an acceleration factor of 4, i.e., only 25% of frequencies were kept.
Undersampling was performed by selecting 8% of the lowest frequencies and randomly
adding higher frequencies until the acceleration factor was reached. The public dataset
consists of a training part and a validation part. In total, the training dataset includes
973 volumes (34,742 slices) and the validation dataset includes 199 volumes (7135 slices).
Additionally, there is a private test set that consists of 108 volumes (3903 slices). For this
private test set, only the undersampled measurements are available, and the models can
only be evaluated on the official fastMRI website (https://fastmri.org/, last accessed: 16
November 2021). Our best model can be found on the public leaderboard for “Single-Coil
Knee”, allowing for comparison with other approaches (our submission is named “cINN
v2”). The fastMRI dataset includes scans from two different pulse sequences: coronal
proton-density weighting with (PDFS) and without (PD) fat suppression. We trained our
models on the full dataset, but used the distinction into PD and PDFS for evaluation on the
validation set.
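The sketch below reproduces this undersampling pattern and the zero-filled IFFT reconstruction under simplifying assumptions (a centered single-coil k-space and a column-wise Cartesian mask); it is not the fastMRI reference implementation.

```python
# Sketch: 4x Cartesian undersampling (8% center fraction) and zero-filled IFFT.
import numpy as np

def zero_filled_recon(kspace, center_fraction=0.08, acceleration=4):
    num_cols = kspace.shape[-1]
    num_low = int(round(num_cols * center_fraction))
    mask = np.zeros(num_cols, dtype=bool)
    pad = (num_cols - num_low) // 2
    mask[pad:pad + num_low] = True                      # fully sampled center
    extra = num_cols // acceleration - num_low          # random higher frequencies
    mask[np.random.choice(np.where(~mask)[0], extra, replace=False)] = True
    return np.abs(np.fft.ifft2(np.fft.ifftshift(kspace * mask)))
```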
4. Results
In this section, we present the results of the three different experimental setups.
The focus here is on LoDoPaB-CT and fastMRI. For these use cases, we compare different
architectures and ablations during training. To assess the performance, we evaluate the
peak-signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) [66]
on the datasets. The PSNR is strongly related to the mean squared error and expresses the
ratio of the maximum possible value to the reconstruction error. In general, a higher PSNR
corresponds to a better reconstruction. The SSIM compares the overall image structure,
including luminance and contrast, of the reconstruction and the ground-truth image.
A detailed definition of the evaluation metrics can be found in Appendix A.3.
Table 1. Mean and standard deviation of the PSNR and SSIM for compressed sensing on the MNIST
test dataset. The conditioned mean was computed with 100 samples.
Figure 6. Conditioned mean and standard deviation for the different inversion layers.
Figure 7. Samples from the posterior learned by the cINN. The ground-truth sample is shown in
the upper-left corner. In (a), we used the conditioning based on the TV-regularized reconstruction,
and in (b), the conditioning was chosen as the generalized inverse. It can be seen that individual
samples from the generalized inverse conditioning do not look realistic.
Table 2. Influence of the type of conditioning network for the multi-scale cINN. The mean and
standard deviation of the PSNR and SSIM were evaluated on the full LoDoPaB test set using 1000
samples for the cond. mean.
LoDoPaB-CT
Model Cond. Network PSNR SSIM
Multi-scale Average Pooling 33.15 ± 3.64 0.806 ± 0.156
CNN 34.64 ± 4.18 0.826 ± 0.160
ResNet 35.07 ± 4.34 0.831 ± 0.160
Based on these results, we chose the ResNet conditioning for the following experi-
ments. Note that we reduced the number of parameters of the multi-scale cINN in the other
experiments to be comparable with the iUNet model and shorten the time for training.
Overall, this has only a minor effect on the reconstruction quality.
Figure 8. Cond. mean and point-wise standard deviation for the iUNet and the multi-scale architec-
ture on the LoDoPaB-CT data.
Table 3. Mean and standard deviation of the PSNR and SSIM for the LoDoPaB-CT test set. Condi-
tioned mean computed with 100 samples. Unless stated otherwise, training noise was applied and
no cond. loss was used.
LoDoPaB-CT

Model         Base Distribution   Train Noise   PSNR           SSIM
Multi-scale   Normal              Yes           34.94 ± 4.24   0.829 ± 0.157
                                  No            34.92 ± 4.26   0.829 ± 0.158
              Radial              Yes           34.89 ± 4.29   0.823 ± 0.161
                                  No            34.65 ± 4.25   0.829 ± 0.161
iUNet         Normal              Yes           34.65 ± 4.11   0.805 ± 0.151
                                  No            34.48 ± 3.96   0.824 ± 0.153
              Radial              Yes           34.58 ± 4.40   0.830 ± 0.158
                                  No            34.57 ± 4.40   0.830 ± 0.158

Model         Base Distribution   Cond. Loss    PSNR           SSIM
iUNet         Normal              Yes           34.88 ± 4.17   0.809 ± 0.148
                                  No            34.65 ± 4.11   0.805 ± 0.151
              Radial              Yes           34.99 ± 4.39   0.825 ± 0.157
                                  No            34.58 ± 4.40   0.830 ± 0.158
The results for the iUNet are given in the lower part of Table 3. The network benefits
from the additional loss on the output of the conditioning network. However, like for all
regularization terms, putting too much weight on the conditioning loss interferes with the
primary objective of the cINN model. The performance deteriorates in this case. The loss
also has a direct impact on the intermediate representations of the conditioning UNet. They
shift from feature selection to the reproduction of complete reconstructions. An example is
shown in Figure A1 in Appendix A.4.
We solve for x̂ using an iterative scheme, initialized with a sample $T_\theta^{-1}(z, y^\delta)$
from the cINN. In our experiments, using only the maximum a posteriori solution as a reconstruction
often results in artifacts in the reconstructed image. Therefore, we transitioned to
the penalized version in Equation (22). An important topic is the choice of the parameter
λ. In Table 4, the results for both the iUNet and the multi-scale architecture are given.
Increasing the weighting factor λ from 0 to 1.0 leads to an improvement in terms of PSNR
and SSIM for both the multi-scale architecture and the iUNet. However, further increasing
the factor λ leads again to a deterioration in most cases.
In total, the reconstruction quality with the sample refinement is worse than for the
conditional mean approach. Therefore, we stick to the conditional mean reconstruction
technique for the following experiments on the fastMRI dataset.
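As an illustration of the refinement step, the sketch below minimizes a penalized objective starting from a flow sample. Since Equation (22) is not reproduced here, we assume a negative conditional log-likelihood plus a data-fidelity term weighted by λ, and Adam as one possible optimizer; both are assumptions for illustration.

```python
# Sketch: sample refinement by penalized maximum a posteriori optimization.
import torch

def refine_sample(x0, y, forward_op, log_prob, lam=1.0, steps=100, lr=1e-4):
    x = x0.clone().requires_grad_(True)         # start from a cINN sample
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # assumed objective: -log p_theta(x|y) + lam * ||A(x) - y||^2
        loss = -log_prob(x, y) + lam * (forward_op(x) - y).pow(2).sum()
        loss.backward()
        opt.step()
    return x.detach()
```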
Table 4. Mean and standard deviation for sample refinement on LoDoPaB for the first 100 samples
of the test set. Equation (22) was minimized for 100 iterations with a learning rate of 1 × 10−4. The
initial value was one sample from our model, x0 = Tθ−1 (z, yδ ).
LoDoPaB-CT
Model λ PSNR SSIM
0 32.02 ± 3.18 0.742 ± 0.135
0.01 32.10 ± 3.21 0.749 ± 0.137
Multi-scale 0.1 32.56 ± 3.40 0.766 ± 0.142
1.0 33.03 ± 3.58 0.783 ± 0.148
10.0 32.97 ± 3.56 0.784 ± 0.149
0 32.16 ± 3.12 0.731 ± 0.126
0.01 32.31 ± 3.19 0.737 ± 0.128
iUNet 0.1 32.83 ± 3.41 0.759 ± 0.135
1.0 32.98 ± 3.45 0.765 ± 0.136
10.0 32.88 ± 3.40 0.756 ± 0.133
We report the results on the fastMRI validation data in Table 5. Further, following the evaluation in [65], we present the results
subdivided into PD and PDFS.
Figure 9. Cond. mean and point-wise standard deviation for the best-performing multi-scale
architecture and iUNet on the fastMRI data (top row: PD; bottom row: PDFS). Both networks use
the radial base distribution and no additional training noise, and the iUNet is trained with the
conditional loss.
Both networks were implemented such that the number of parameters was comparable
(2.5 million for the iUNet and 2.6 million for the multi-scale network). We used five scales for
all networks. For the iUNet, additive coupling layers were used, and for the multi-scale
architecture, affine coupling layers. Channel permutation after each coupling layer was
implemented using fixed 1 × 1 convolutions [45]. The conditioning network for the iUNet
was based on a UNet architecture. For the multi-scale network, we used an architecture
based on a ResNet. Both used the zero-filled IFFT as model-based inversion layer.
Table 5. Mean and standard deviation for the fastMRI dataset. Conditioned mean computed with 100 samples. Unless oth-
erwise specified, no additional training noise and no cond. loss were used.
fastMRI

                                              PSNR                           SSIM
Model        Base Distribution  Train Noise   PD             PDFS            PD              PDFS
Multi-scale  Normal             Yes           29.15 ± 6.25   23.18 ± 8.20    0.777 ± 0.086   0.536 ± 0.105
                                No            28.54 ± 6.52   20.92 ± 9.87    0.776 ± 0.086   0.536 ± 0.105
             Radial             Yes           31.84 ± 3.56   25.76 ± 5.92    0.760 ± 0.092   0.515 ± 0.107
                                No            32.07 ± 2.34   26.54 ± 2.73    0.764 ± 0.090   0.522 ± 0.103
iUNet        Normal             No            27.85 ± 1.38   25.76 ± 2.10    0.622 ± 0.052   0.474 ± 0.096
             Radial             No            31.89 ± 2.43   25.94 ± 2.86    0.732 ± 0.107   0.432 ± 0.126

Model        Base Distribution  Cond. Loss    PD             PDFS            PD              PDFS
iUNet        Normal             Yes           27.91 ± 1.35   25.83 ± 2.12    0.628 ± 0.054   0.474 ± 0.096
                                No            27.85 ± 1.38   25.76 ± 2.10    0.622 ± 0.052   0.474 ± 0.096
             Radial             Yes           31.62 ± 2.26   26.04 ± 2.81    0.730 ± 0.096   0.469 ± 0.110
                                No            31.89 ± 2.43   25.94 ± 2.86    0.732 ± 0.107   0.432 ± 0.126
5. Discussion
In this work, we studied various configurations of conditioned flow-based models on
different datasets. The focus of our research was to determine best practices for the use of
cINN models for reconstruction tasks in CT and MRI. The two networks used, multi-scale
and iUNet, showed comparable performance in many cases. The results demonstrate that
a crucial part of the cINN models is the design of the conditioning network. A precise
model-based inversion layer and a subsequent extensive neural network can provide
diverse features for the CNF. In particular, the model-based layer forms an interesting basis
for combining mathematical modeling and data-driven learning. This can go much further
than the FBP and Fourier models used here.
The choice of the base distribution also has a significant impact on the model’s perfor-
mance. The radial Gaussian proved to be a valuable alternative to the normal Gaussian
distribution, primarily in reducing the reconstruction time by needing fewer samples for
the conditioned mean and avoiding common problems with high-dimensional distribu-
tions. For the noising during training and the additional conditioning loss, on the other
hand, there is no clear recommendation. The additional noise might help on small datasets,
where it acts as a data augmentation step. The conditioning loss requires extra tuning
of the weighting factor. More promising, therefore, might be the use of a pre-trained
reconstruction network whose parameters are frozen for use in the cINN.
The experiments also indicated that the training of cINN models does not always
run without problems. Although invertible neural networks are analytically invertible,
it is possible to encounter instabilities in some situations, and the networks may become
numerically non-invertible. Furthermore, in this work, we used the conditional mean as a
reconstruction method for most of the experiments. However, other choices are possible.
In the following, we will address these topics in more detail.
5.1. Stability
Recently, it was noted that due to stability issues, an extensive invertible neural network
can become numerically non-invertible at test time due to rounding errors [42]. We
observed this problem when evaluating iUNets with affine coupling layers. In Figure 10,
we show the loss during training and an example reconstruction after training. It can
be observed that even when the training looks stable, one can get severe artifacts on
unknown test images. We did not observe this problem for the multi-scale architecture.
Affine coupling layers can have arbitrarily large singular values in the inverse Jacobian
matrix, which leads to an unstable inverse pass. This effect is known as exploding
inverse [42]. To increase stability in the iUNets, we suggest using additive coupling blocks
in this architecture.
In addition, the inclusion of additional training noise led to severe instability in our
experiments with the iUNet on the fastMRI data. We did not obtain any meaningful
reconstructions for this case. In contrast, these issues occurred with neither the multi-scale
architecture on fastMRI nor the iUNet on LoDoPaB-CT.
Figure 10. (Left) Moving average of the loss (in bits per dimension) over training iterations. (Right) Ground-truth image from the LoDoPaB test dataset and the corresponding iUNet reconstruction. The pixels in white visualize exploding values in the reconstruction.
6. Conclusions
This work explored different architectures and best practices for applying conditional
flow-based methods to medical image reconstruction problems in CT and MRI. Our experiments
included two popular invertible network designs. The iUNet [26] architecture is
inspired by the UNet [52], which is used extensively in imaging applications. The multi-
scale architecture is used in all major normalizing flow frameworks, such as Glow [45]
or NICE [36]. The invertible architectures were combined with a conditioning network,
which extracts various features from the measurements for the reconstruction process.
This cINN framework combines the advantages of memory-efficient invertible networks
and normalizing flows for uncertainty estimation with a versatile reconstruction model.
Additionally, it provides a direct way to combine model-based and data-driven approaches
in a single model.
The use of cINN models for medical image reconstruction is in its beginning stages,
and many possible improvements should be explored. We investigated the radial Gaussian
distribution as an alternative to the normal Gaussian base distribution. Our experiments
J. Imaging 2021, 7, 243 21 of 27
show that it can be beneficial in many cases. A promising next direction is the development
of novel invertible network architectures from existing approaches. For applications in
medical image reconstruction, state-of-the-art deep learning methods are based on unrolled
iterative methods [4]. In [23], an extensive evaluation on the LoDoPaB-CT dataset was
performed, and the best-scoring deep learning method was an unrolled learned primal–
dual algorithm [3]. These unrolled iterative methods can be made invertible [43,44], but
are currently only used for memory-efficient backpropagation. In further work, we want
to evaluate whether invertible iterative architectures can be integrated into flow-based
models.
Author Contributions: Conceptualization, A.D., M.S., and J.L.; Software, A.D., M.S., and J.L.; super-
vision, P.M.; project administration, P.M.; writing—original draft preparation, A.D., M.S., and J.L.;
writing—review and editing, A.D., M.S., and J.L.; visualization, A.D.; funding acquisition, P.M. All
authors reviewed, finalized, and approved the manuscript. All authors have read and agreed to the
published version of the manuscript.
Funding: J.L., M.S., A.D., and P.M. were funded by the German Research Foundation (DFG; GRK
2224/1). J.L. and M.S. additionally acknowledge support from the DELETO project funded by
the Federal Ministry of Education and Research (BMBF, project number 05M20LBB). A.D. further
acknowledges support from the Klaus Tschira Stiftung via the project MALDISTAR (project number
00.010.2019).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
Appendix A
Appendix A.1. Radial Density
High-dimensional normal distributions do not behave as intuitively as in low-dimensional
settings. Sampling from a normal distribution gives mostly samples in the typical set.
For an n-dimensional normal distribution $\mathcal{N}(\mu, \sigma^2)$, this typical set has a
distance of $\sigma\sqrt{n}$ from the expected value µ. In [49], the authors proposed an n-dimensional
radial Gaussian density in hyperspherical coordinates where:
• The radius r is distributed according to a half-normal distribution,
• All angular components ϕ1 , . . . , ϕn−2 ∈ [0, π ], ϕn−1 ∈ [0, 2π ] are uniformly dis-
tributed, yielding equal probability density at every point on the surface of the
n-dimensional sphere.
Our derivation of the likelihood closely follows that of [49]. We assume that all
dimensions are independently distributed. For the radius r, we get the density:

$$p(r; \sigma) = \frac{\sqrt{2}}{\sqrt{\pi}\,\sigma} \exp\left(-\frac{r^2}{2\sigma^2}\right) \quad \text{for } r \geq 0. \qquad \text{(A1)}$$
Let v be a point on the unit sphere. We want every point on the unit sphere to be
equally likely, i.e.,

$$p(v) = \frac{1}{S_n} \quad \text{with} \quad S_n = \frac{2\pi^{n/2}}{\Gamma(n/2)}, \qquad \text{(A2)}$$

where $S_n$ is the surface area of the n-dimensional unit sphere. We can get the density for the
angular components $p(\varphi_1, \ldots, \varphi_{n-1})$ by solving

$$p(v)\, \mathrm{d}A = \frac{1}{S_n}\, \mathrm{d}A = p(\varphi_1, \ldots, \varphi_{n-1})\, \mathrm{d}\varphi_1 \cdots \mathrm{d}\varphi_{n-1}. \qquad \text{(A3)}$$
With the surface element of the n-sphere, $\mathrm{d}A = \prod_{i=1}^{n-2} \sin(\varphi_i)^{n-1-i}\, \mathrm{d}\varphi_1 \cdots \mathrm{d}\varphi_{n-1}$, this gives

$$p(\varphi_1, \varphi_2, \ldots, \varphi_{n-1}) = \frac{1}{S_n} \prod_{i=1}^{n-2} \sin(\varphi_i)^{n-1-i}. \qquad \text{(A5)}$$
Setting σ = 1 for the radial component gives us the full density in hyperspherical
coordinates:

$$p_e\left(e = (r, \varphi_1, \varphi_2, \ldots, \varphi_{n-1})\right) = \frac{\sqrt{2}}{\sqrt{\pi}} \exp(-r^2/2)\, \frac{1}{S_n} \prod_{i=1}^{n-2} \sin(\varphi_i)^{n-1-i}. \qquad \text{(A6)}$$
For our experiments, we are always working in Cartesian coordinates, so one has to
apply a final transformation x = f(e) and use the change-of-variables theorem. The Jacobian
of the n-dimensional spherical coordinate transformation is known:

$$\left|\det \frac{\partial f(e)}{\partial e}\right| = r^{n-1} \prod_{i=1}^{n-2} \sin(\varphi_i)^{n-1-i}. \qquad \text{(A7)}$$
Finally, we get

$$\begin{aligned} p_x(x) &= \frac{\sqrt{2}}{\sqrt{\pi}\, S_n} \exp(-r^2/2) \prod_{i=1}^{n-2} \sin(\varphi_i)^{n-1-i} \left( r^{n-1} \prod_{i=1}^{n-2} \sin(\varphi_i)^{n-1-i} \right)^{-1} \\ &= \frac{\sqrt{2}}{\sqrt{\pi}\, S_n} \frac{\exp(-r^2/2)}{r^{n-1}} \\ &= \frac{\sqrt{2}}{\sqrt{\pi}\, S_n} \frac{\exp(-\|x\|_2^2/2)}{\|x\|_2^{n-1}}. \end{aligned} \qquad \text{(A8)}$$
In the conditional coupling section, we use affine coupling layers and implement
the scale s and translation t using a small convolutional neural network with either 1 × 1
or 3 × 3 kernels:

Conditional section:
• Affine coupling (CNN subnet with 1 × 1 kernel)
• 1 × 1 convolution
• 8× { Affine coupling (CNN subnet with 3 × 3 kernel); 1 × 1 convolution }
Appendix A.3. Evaluation Metrics

$$\operatorname{PSNR}(\hat{x}, x^\dagger) := 10 \log_{10}\left(\frac{L^2}{\operatorname{MSE}(\hat{x}, x^\dagger)}\right), \qquad \operatorname{MSE}(\hat{x}, x^\dagger) := \frac{1}{n} \sum_{i=1}^{n} \left(\hat{x}_i - x_i^\dagger\right)^2.$$
In general, higher PSNR values are an indication of a better reconstruction. The maxi-
mum image value L can be chosen in different ways. For the MNIST and the LoDoPaB-CT
datasets, we compute the value per slice as L = max(x†) − min(x†). For evaluation on the
fastMRI dataset, we choose L as the maximum value per patient, i.e., per 3D volume.
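A minimal sketch of the PSNR computation with the per-slice data range described above:

```python
# Sketch: PSNR with data range L = max(x_true) - min(x_true).
import numpy as np

def psnr(x_hat, x_true):
    mse = np.mean((x_hat - x_true) ** 2)
    data_range = x_true.max() - x_true.min()
    return 10.0 * np.log10(data_range ** 2 / mse)
```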
The SSIM is computed over local windows and averaged; in the standard form of [66],

$$\operatorname{SSIM}(\hat{x}, x^\dagger) = \frac{1}{J} \sum_{j=1}^{J} \frac{(2\hat{\mu}_j \mu_j + C_1)(2\Sigma_j + C_2)}{(\hat{\mu}_j^2 + \mu_j^2 + C_1)(\hat{\sigma}_j + \sigma_j + C_2)}.$$

Here, $\hat{\mu}_j$ and $\mu_j$ are the average pixel intensities, $\hat{\sigma}_j$ and $\sigma_j$ are the variances, and
$\Sigma_j$ is the covariance of x̂ and x† at the j-th local window. Constants $C_1 = (K_1 L)^2$ and
$C_2 = (K_2 L)^2$ stabilize the division. Just as with the PSNR metric, the maximum image
value L can be chosen in different ways. We use the same choices as specified in the
previous section.
Figure A1. Intermediate activations from a single layer in a conditioning UNet model. (Top) cINN
Model trained with just the log-likelihood. (Bottom) cINN model trained with an additional loss for
the conditioning UNet. One can observe that the conditioning network focuses on different parts
of the image if no special loss is used. Otherwise, it produces activations that are close to the final
reconstruction. In addition, there are many empty activations.
Table A1. Mean and standard deviation of the PSNR and SSIM for the LoDoPaB-CT test set. Condi-
tioned mean computed with 1000 samples. Unless stated otherwise, training noise was applied and
no cond. loss was used.
LoDoPaB-CT

Model         Base Distribution   Train Noise   PSNR           SSIM
Multi-scale   Normal              Yes           34.99 ± 4.26   0.830 ± 0.158
                                  No            34.97 ± 4.28   0.830 ± 0.157
              Radial              Yes           34.89 ± 4.29   0.823 ± 0.161
                                  No            34.65 ± 4.25   0.829 ± 0.161
iUNet         Normal              Yes           34.69 ± 4.13   0.806 ± 0.151
                                  No            34.98 ± 4.19   0.823 ± 0.148
              Radial              Yes           34.75 ± 4.23   0.819 ± 0.153
                                  No            34.57 ± 4.40   0.830 ± 0.158

Model         Base Distribution   Cond. Loss    PSNR           SSIM
iUNet         Normal              Yes           34.92 ± 4.19   0.810 ± 0.148
                                  No            34.69 ± 4.13   0.806 ± 0.151
              Radial              Yes           34.99 ± 4.39   0.825 ± 0.159
                                  No            34.75 ± 4.23   0.819 ± 0.153
References
1. Arridge, S.; Maass, P.; Öktem, O.; Schönlieb, C.B. Solving inverse problems using data-driven models. Acta Numer. 2019, 28, 1–174.
[CrossRef]
2. Jin, K.H.; McCann, M.T.; Froustey, E.; Unser, M. Deep convolutional neural network for inverse problems in imaging. IEEE Trans.
Image Process. 2017, 26, 4509–4522. [CrossRef] [PubMed]
3. Adler, J.; Öktem, O. Learned Primal-Dual Reconstruction. IEEE Trans. Med. Imaging 2018, 37, 1322–1332. [CrossRef]
4. Adler, J.; Öktem, O. Solving ill-posed inverse problems using iterative deep neural networks. Inverse Probl. 2017, 33, 124007.
[CrossRef]
5. Lunz, S.; Schönlieb, C.; Öktem, O. Adversarial Regularizers in Inverse Problems. In Proceedings of the Advances in Neural
Information Processing Systems 31: Annual Conference on Neural Information Processing Systems (NeurIPS 2018), Montreal,
QC, Canada, 3–8 December 2018.
6. Zhu, B.; Liu, J.Z.; Cauley, S.F.; Rosen, B.R.; Rosen, M.S. Image reconstruction by domain-transform manifold learning. Nature
2018, 555, 487–492. [CrossRef] [PubMed]
7. Tarantola, A.; Valette, B. Inverse problems = quest for information. J. Geophys. 1982, 50, 159–170.
8. Kaipio, J.; Somersalo, E. Statistical and Computational Inverse Problems; Springer: New York, NY, USA, 2005; Volume 160. [CrossRef]
9. Martin, J.; Wilcox, L.C.; Burstedde, C.; Ghattas, O. A stochastic Newton MCMC method for large-scale statistical inverse problems
with application to seismic inversion. SIAM J. Sci. Comput. 2012, 34, A1460–A1487. [CrossRef]
10. Sunnåker, M.; Busetto, A.G.; Numminen, E.; Corander, J.; Foll, M.; Dessimoz, C. Approximate Bayesian Computation. PLoS
Comput. Biol. 2013, 9, e1002803. [CrossRef]
11. Rezende, D.J.; Mohamed, S.; Wierstra, D. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. In
Proceedings of the 31th International Conference on Machine Learning (ICML 2014), Beijing, China, 21–26 June 2014; Xing, E.P.,
Jebara, T., Eds.; PMLR: Cambridge, MA, USA, 2014; Volume 32, pp. 1278–1286.
12. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. In Proceedings of the 2nd International Conference on Learning
Representations (ICLR 2014), Banff, AB, Canada, 14–16 April 2014.
13. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial
Nets. In Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information
Processing Systems 2014, Montreal, QC, Canada, 8–13 December 2014.
14. Tabak, E.G.; Turner, C.V. A family of nonparametric density estimation algorithms. Commun. Pure Appl. Math. 2013, 66, 145–164.
[CrossRef]
15. Barbano, R.; Zhang, C.; Arridge, S.; Jin, B. Quantifying model uncertainty in inverse problems via bayesian deep gradient descent.
In Proceedings of the 2020 IEEE 25th International Conference on Pattern Recognition (ICPR), Virtual Event, 10–15 January 2021;
pp. 1392–1399. [CrossRef]
16. Adler, J.; Öktem, O. Deep Bayesian Inversion. arXiv 2018, arXiv:1811.05910.
17. Ardizzone, L.; Kruse, J.; Rother, C.; Köthe, U. Analyzing Inverse Problems with Invertible Neural Networks. In Proceedings of
the 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, 6–9 May 2019.
18. Ardizzone, L.; Lüth, C.; Kruse, J.; Rother, C.; Köthe, U. Guided Image Generation with Conditional Invertible Neural Networks.
arXiv 2019, arXiv:1907.02392.
19. Ardizzone, L.; Kruse, J.; Lüth, C.; Bracher, N.; Rother, C.; Köthe, U. Conditional Invertible Neural Networks for Diverse
Image-to-Image Translation. In Proceedings of the Pattern Recognition (DAGM GCPR 2020), Tübingen, Germany, 28 September–1
October 2020; Akata, Z., Geiger, A., Sattler, T., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 373–387.
[CrossRef]
20. Andrle, A.; Farchmin, N.; Hagemann, P.; Heidenreich, S.; Soltwisch, V.; Steidl, G. Invertible Neural Networks Versus MCMC for
Posterior Reconstruction in Grazing Incidence X-Ray Fluorescence. In Proceedings of the Scale Space and Variational Methods in
Computer Vision—8th International Conference (SSVM 2021), Virtual Event, 16–20 May 2021; Elmoataz, A., Fadili, J., Quéau,
Y., Rabin, J., Simon, L., Eds.; Volume 12679 Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2021;
pp. 528–539. [CrossRef]
21. Anantha Padmanabha, G.; Zabaras, N. Solving inverse problems using conditional invertible neural networks. J. Comput. Phys.
2021, 433, 110194. [CrossRef]
22. Denker, A.; Schmidt, M.; Leuschner, J.; Maass, P.; Behrmann, J. Conditional Normalizing Flows for Low-Dose Computed
Tomography Image Reconstruction. In Proceedings of the ICML Workshop on Invertible Neural Networks, Normalizing Flows,
and Explicit Likelihood Models, Vienna, Austria, 18 July 2020.
23. Leuschner, J.; Schmidt, M.; Ganguly, P.S.; Andriiashen, V.; Coban, S.B.; Denker, A.; Bauer, D.; Hadjifaradji, A.; Batenburg,
K.J.; Maass, P.; et al. Quantitative Comparison of Deep Learning-Based Image Reconstruction Methods for Low-Dose and
Sparse-Angle CT Applications. J. Imaging 2021, 7, 44. [CrossRef] [PubMed]
24. Hagemann, P.; Hertrich, J.; Steidl, G. Stochastic Normalizing Flows for Inverse Problems: A Markov Chains Viewpoint. arXiv
2021, arXiv:2109.11375.
25. Dinh, L.; Sohl-Dickstein, J.; Bengio, S. Density estimation using Real NVP. In Proceedings of the 5th International Conference on
Learning Representations (ICLR 2017), Toulon, France, 24–26 April 2017.
26. Etmann, C.; Ke, R.; Schönlieb, C.B. iUNets: Learnable invertible up-and downsampling for large-scale inverse problems. In
Proceedings of the 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP), Espoo, Finland,
21–24 September 2020; pp. 1–6. [CrossRef]
27. Van den Oord, A.; Kalchbrenner, N.; Kavukcuoglu, K. Pixel Recurrent Neural Networks. In Proceedings of the 33nd International
Conference on Machine Learning (ICML 2016), New York City, NY, USA, 19–24 June 2016; Balcan, M., Weinberger, K.Q., Eds.;
PMLR: Cambridge, MA, USA, 2016; Volume 48, pp. 1747–1756.
28. Van den Oord, A.; Kalchbrenner, N.; Espeholt, L.; Kavukcuoglu, K.; Vinyals, O.; Graves, A. Conditional Image Generation with
PixelCNN Decoders. In Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on
Neural Information Processing Systems 2016, Barcelona, Spain, 5–10 December 2016.
29. Papamakarios, G.; Nalisnick, E.T.; Rezende, D.J.; Mohamed, S.; Lakshminarayanan, B. Normalizing Flows for Probabilistic
Modeling and Inference. arXiv 2019, arXiv:1912.02762.
30. Brock, A.; Donahue, J.; Simonyan, K. Large Scale GAN Training for High Fidelity Natural Image Synthesis. In Proceedings of the
7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, 6–9 May 2019.
31. Song, Y.; Sohl-Dickstein, J.; Kingma, D.P.; Kumar, A.; Ermon, S.; Poole, B. Score-Based Generative Modeling through Stochastic
Differential Equations. In Proceedings of the 9th International Conference on Learning Representations (ICLR 2021), Virtual
Event, 3–7 May 2021.
32. Wu, H.; Köhler, J.; Noé, F. Stochastic Normalizing Flows. In Proceedings of the Advances in Neural Information Processing
Systems 33: Annual Conference on Neural Information Processing Systems 2020 (NeurIPS 2020), Virtual, 6–12 December 2020.
33. Ruthotto, L.; Haber, E. An introduction to deep generative modeling. GAMM-Mitteilungen 2021, 44, e202100008. [CrossRef]
34. Dashti, M.; Stuart, A.M. The Bayesian Approach to Inverse Problems. In Handbook of Uncertainty Quantification; Springer
International Publishing: Cham, Switzerland, 2017; pp. 311–428. [CrossRef]
35. Asim, M.; Daniels, M.; Leong, O.; Ahmed, A.; Hand, P. Invertible generative models for inverse problems: Mitigating
representation error and dataset bias. In Proceedings of the 37th International Conference on Machine Learning (ICML 2020),
Virtual Event, 13–18 July 2020; Daumé, H., III, Singh, A., Eds.; PMLR: Cambridge, MA, USA, 2020; Volume 119, pp. 399–409.
36. Dinh, L.; Krueger, D.; Bengio, Y. NICE: Non-linear Independent Components Estimation. In Proceedings of the 3rd International
Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015.
37. Rezende, D.J.; Mohamed, S. Variational Inference with Normalizing Flows. In Proceedings of the 32nd International Conference
on Machine Learning (ICML 2015), Lille, France, 6–11 July 2015; Bach, F.R., Blei, D.M., Eds.; PMLR: Cambridge, MA, USA, 2015;
Volume 37, pp. 1530–1538.
38. Behrmann, J.; Grathwohl, W.; Chen, R.T.Q.; Duvenaud, D.; Jacobsen, J. Invertible Residual Networks. In Proceedings of the 36th
International Conference on Machine Learning (ICML 2019), Long Beach, CA, USA, 9–15 June 2019; Chaudhuri, K., Salakhutdinov,
R., Eds.; PMLR: Cambridge, MA, USA, 2019; Volume 97, pp. 573–582.
39. Chen, T.Q.; Behrmann, J.; Duvenaud, D.; Jacobsen, J. Residual Flows for Invertible Generative Modeling. In Proceedings of the
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems (NeurIPS
2019), Vancouver, BC, Canada, 8–14 December 2019.
40. Van den Berg, R.; Hasenclever, L.; Tomczak, J.M.; Welling, M. Sylvester Normalizing Flows for Variational Inference. In
Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI 2018), Monterey, CA, USA, 6–10
August 2018; Globerson, A., Silva, R., Eds.; AUAI Press: Arlington, VA, USA, 2018; pp. 393–402.
41. Gomez, A.N.; Ren, M.; Urtasun, R.; Grosse, R.B. The Reversible Residual Network: Backpropagation Without Storing Activations.
In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information
Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017.
42. Behrmann, J.; Vicol, P.; Wang, K.; Grosse, R.B.; Jacobsen, J. Understanding and Mitigating Exploding Inverses in Invertible Neural
Networks. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021), Virtual
Event, 13–15 April 2021; Banerjee, A., Fukumizu, K., Eds.; PMLR: Cambridge, MA, USA, 2021; Volume 130, pp. 1792–1800.
43. Rudzusika, J.; Bajic, B.; Öktem, O.; Schönlieb, C.B.; Etmann, C. Invertible Learned Primal-Dual. In Proceedings of the NeurIPS
2021 Workshop on Deep Learning and Inverse Problems, Online, 13 December 2021.
44. Putzky, P.; Welling, M. Invert to Learn to Invert. In Proceedings of the Advances in Neural Information Processing Systems 32:
Annual Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019.
45. Kingma, D.P.; Dhariwal, P. Glow: Generative Flow with Invertible 1x1 Convolutions. In Proceedings of the Advances in Neural
Information Processing Systems 31: Annual Conference on Neural Information Processing Systems (NeurIPS 2018), Montreal,
QC, Canada, 3–8 December 2018.
46. Jacobsen, J.; Smeulders, A.W.M.; Oyallon, E. i-RevNet: Deep Invertible Networks. In Proceedings of the 6th International
Conference on Learning Representations (ICLR 2018), Vancouver, BC, Canada, 30 April–3 May 2018.
47. Nalisnick, E.; Matsukawa, A.; Teh, Y.W.; Gorur, D.; Lakshminarayanan, B. Do Deep Generative Models Know What They Don’t
Know? In Proceedings of the 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, 6–9
May 2019.
48. Nalisnick, E.T.; Matsukawa, A.; Teh, Y.W.; Lakshminarayanan, B. Detecting Out-of-Distribution Inputs to Deep Generative
Models Using a Test for Typicality. arXiv 2019, arXiv:1906.02994.
49. Farquhar, S.; Osborne, M.A.; Gal, Y. Radial Bayesian neural networks: Beyond discrete support in large-scale Bayesian deep
learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS 2020), Virtual Event,
26–28 August 2020; PMLR: Cambridge, MA, USA, 2020; Volume 108, pp. 1352–1362.
50. Hagemann, P.; Neumayer, S. Stabilizing Invertible Neural Networks Using Mixture Models. arXiv 2020, arXiv:2009.02994.
51. Winkler, C.; Worrall, D.E.; Hoogeboom, E.; Welling, M. Learning Likelihoods with Conditional Normalizing Flows. arXiv 2019,
arXiv:1912.00042.
52. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of
the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich,
Germany, 5–9 October 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham,
Switzerland, 2015; pp. 234–241.
53. LeCun, Y.; Cortes, C.; Burges, C.C.J. The MNIST Handwritten Digit Database. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 30 April 2020).
54. Genzel, M.; Macdonald, J.; März, M. Solving Inverse Problems with Deep Neural Networks—Robustness Included? arXiv 2020,
arXiv:2011.04268.
55. Radon, J. On the determination of functions from their integral values along certain manifolds. IEEE Trans. Med. Imaging 1986,
5, 170–176. [CrossRef] [PubMed]
56. Buzug, T.M. Computed Tomography: From Photon Statistics to Modern Cone-Beam CT; Springer: Berlin/Heidelberg, Germany, 2008.
[CrossRef]
57. Nashed, M.Z. A new approach to classification and regularization of ill-posed operator equations. In Inverse and Ill-Posed Problems;
Engl, H.W., Groetsch, C.W., Eds.; Academic Press: Cambridge, MA, USA, 1987; pp. 53–75. [CrossRef]
58. Natterer, F. The Mathematics of Computerized Tomography; SIAM: Philadelphia, PA, USA, 2001. [CrossRef]
59. Gordon, R.; Bender, R.; Herman, G.T. Algebraic Reconstruction Techniques (ART) for three-dimensional electron microscopy and
X-ray photography. J. Theor. Biol. 1970, 29, 471–481. [CrossRef]
60. Sidky, E.Y.; Pan, X. Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimiza-
tion. Phys. Med. Biol. 2008, 53, 4777. [CrossRef] [PubMed]
61. Bubba, T.A.; Galinier, M.; Lassas, M.; Prato, M.; Ratti, L.; Siltanen, S. Deep Neural Networks for Inverse Problems with
Pseudodifferential Operators: An Application to Limited-Angle Tomography. SIAM J. Imaging Sci. 2021, 14, 470–505. [CrossRef]
62. Leuschner, J.; Schmidt, M.; Baguer, D.O.; Maass, P. LoDoPaB-CT, a benchmark dataset for low-dose computed tomography
reconstruction. Sci. Data 2021, 8, 109. [CrossRef]
63. Doneva, M. Mathematical models for magnetic resonance imaging reconstruction: An overview of the approaches, problems,
and future research areas. IEEE Signal Process. Mag. 2020, 37, 24–32. [CrossRef]
64. Knoll, F.; Zbontar, J.; Sriram, A.; Muckley, M.J.; Bruno, M.; Defazio, A.; Parente, M.; Geras, K.J.; Katsnelson, J.; Chandarana, H.; et
al. fastMRI: A Publicly Available Raw k-Space and DICOM Dataset of Knee Images for Accelerated MR Image Reconstruction
Using Machine Learning. Radiol. Artif. Intell. 2020, 2, e190007. [CrossRef]
65. Zbontar, J.; Knoll, F.; Sriram, A.; Murrell, T.; Huang, Z.; Muckley, M.J.; Defazio, A.; Stern, R.; Johnson, P.; Bruno, M.; et al. fastMRI:
An Open Dataset and Benchmarks for Accelerated MRI. arXiv 2019, arXiv:1811.08839.
66. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE
Trans. Image Process. 2004, 13, 600–612. [CrossRef]
67. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on
Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015.
68. Uria, B.; Murray, I.; Larochelle, H. RNADE: The real-valued neural autoregressive density-estimator. In Proceedings of the
Advances in Neural Information Processing Systems 26: Annual Conference on Neural Information Processing Systems 2013,
Lake Tahoe, NV, USA, 5–8 December 2013.
69. Kobyzev, I.; Prince, S.; Brubaker, M. Normalizing flows: An introduction and review of current methods. IEEE Trans. Pattern
Anal. Mach. Intell. 2020, 43, 3964–3979. [CrossRef]