Medical Image Analysis (2024)

Contents lists available at ScienceDirect

Medical Image Analysis


journal homepage: www.elsevier.com/locate/media

Cascaded Multi-path Shortcut Diffusion Model for Medical Image Translation

Yinchi Zhou^a,∗, Tianqi Chen^d, Jun Hou^d, Huidong Xie^a, Nicha C. Dvornek^a,b, S. Kevin Zhou^e, David L. Wilson^f, James S. Duncan^a,b,c, Chi Liu^a,b, Bo Zhou^g,∗

arXiv:2405.12223v3 [eess.IV] 14 Aug 2024


a Department of Biomedical Engineering, Yale University, New Haven, CT, USA
b Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
c Department of Electrical Engineering, Yale University, New Haven, CT, USA.
d Department of Computer Science, University of California Irvine, Irvine, CA, USA
e School of Biomedical Engineering & Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, China
f Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH, USA.
g Department of Radiology, Northwestern University, Chicago, IL, USA

ARTICLE INFO

2000 MSC: 41A05, 41A10, 65D05, 65D17
Keywords: Image Translation, Diffusion Model, Uncertainty, Cascade Framework

ABSTRACT

Image-to-image translation is a vital component in medical image processing, with many uses in a wide range of imaging modalities and clinical scenarios. Previous methods include Generative Adversarial Networks (GANs) and Diffusion Models (DMs), which offer realism but suffer from instability and lack uncertainty estimation. Even though both GAN and DM methods have individually exhibited their capability in medical image translation tasks, the potential of combining a GAN and DM to further improve translation performance and to enable uncertainty estimation remains largely unexplored. In this work, we address these challenges by proposing a Cascade Multi-path Shortcut Diffusion Model (CMDM) for high-quality medical image translation and uncertainty estimation. To reduce the required number of iterations and ensure robust performance, our method first obtains a conditional GAN-generated prior image that is used for efficient reverse translation with a DM in the subsequent step. Additionally, a multi-path shortcut diffusion strategy is employed to refine translation results and estimate uncertainty. A cascaded pipeline further enhances translation quality, incorporating residual averaging between cascades. We collected three different medical image datasets, with two sub-tasks for each dataset, to test the generalizability of our approach. Our experimental results found that CMDM can produce high-quality translations comparable to state-of-the-art methods while providing reasonable uncertainty estimations that correlate well with the translation error.

© 2024 Elsevier B.V. All rights reserved.

1. Introduction

Image-to-image translation (I2I) plays an important role in medical imaging, with wide applications in different medical imaging modalities, such as Digital Radiography (DR), Computed Tomography (CT), and Magnetic Resonance Imaging (MRI). The applications can be summarized into both intra-modality I2I and inter-modality I2I in medical imaging. In the applications of medical X-ray, intra-modality I2I can achieve the high-quality reconstruction of images under radiation dose reduction scenarios. For example, CT radiation dose reduction can be accomplished by translating the sparse-view CT, i.e. acquired with a reduced number of projection views, into the full-view CT (Zhou et al., 2021; Zhang et al., 2018; Wu et al., 2021). Dual-energy (DE) DR radiation dose can be reduced by nearly half by translating the standard single-shot DR into two-shot DE images, i.e. soft-tissue and bone images (Zhou et al., 2019; Yang et al., 2017; Liu et al., 2023b). In MRI applications, intra-modality I2I can be used for image acquisition acceleration. For example, one can use T1 to assist the synthesis/reconstruction of T2 and FLAIR with no or undersampled k-space data (Yang et al., 2020; Zhou and Zhou, 2020). In the application of CT-free PET or SPECT attenuation correction, inter-modality I2I that translates PET or SPECT into CT also helps remove the need for a CT acquisition, thus reducing the overall radiation dose (Zhou et al., 2024; Chen et al., 2022b,a). Therefore, building an accurate and robust I2I method that is generalizable to a wide range of medical imaging applications is important.

∗ Corresponding author. e-mail: [email protected] (Yinchi Zhou), [email protected] (Bo Zhou)
With the recent advancements in deep learning (DL), many DL-based I2I methods have been proposed and adapted to the medical imaging field, demonstrating promising performance. In general, prior I2I methods can be summarized into two classes: Generative Adversarial Networks (GANs) and Diffusion Models (DMs).

Fig. 1. Illustration of previous I2I diffusion model generation process. Starting the reverse process with different noise initialization leads to divergent translation results.
With paired training data available for I2I, one of the most widely used I2I GANs is the conditional GAN (cGAN (Isola et al., 2017)), which consists of 1) a generator that aims to translate an input image into a target image, and 2) a discriminator that conditions on the initial input and the translation for adversarial training. A large number of cGAN variants have been developed for various medical imaging applications. For example, Huang et al. (2021) proposed a GAN with dual discriminators on both image and gradient domains for low-dose CT (LDCT) to full-dose CT (FDCT) translation. Denck et al. (2021) proposed a cGAN with an additional input of MRI acquisition information for intra-MRI-modality translations. Nie et al. (2018) proposed to modify the cGAN with the addition of a gradient-based loss function, and showed successful applications in MRI-to-CT translation and 3T-MRI-to-7T-MRI translation. Based on this, Zhou et al. (2019) further designed a multi-scale cGAN for single-shot DR image to DE image translation. In PET, Gong et al. (2020) also proposed a GAN with parameter transferring for low-dose PET (LDPET) to full-dose PET (FDPET) translation. Even though reasonable translation performance can be achieved with simple and fast one-step inference from the generator, training GANs can be challenging due to the need to balance the optimization of the generator and discriminator (e.g. finding the saddle point of the min-max objective). The training is therefore susceptible to non-convergence and mode collapse.

On the other hand, I2I diffusion models have recently been developed and have shown superior performance to GANs. For general-purpose I2I with DM, Saharia et al. (2022) proposed a unified framework, Palette, which adds conditional image inputs to the previously developed Denoising Diffusion Probabilistic Model (DDPM (Ho et al., 2020)), thus enabling the I2I functionality of DDPM. To reduce the randomized initialization process and improve the stability of I2I DMs, direct bridging diffusion methods have been investigated. Notably, Li et al. (2023) developed a Brownian Bridge Diffusion Model (BBDM) that learns the translation between two domains directly through the bidirectional diffusion process, i.e. Brownian Bridge, rather than a conditional generation process. Similarly, Liu et al. (2023a) proposed a Schrodinger Bridge I2I Diffusion Model (I2SB) that directly learns the nonlinear diffusion processes between two domains. Both have shown improved I2I performance in natural image translation tasks. Similar to I2I GANs, these DM methods have been applied in medical imaging. For example, Moghadam et al. (2023) utilized DDPM to synthesize artificial histopathology images with rare cancer subtypes to mitigate data imbalance problems for medical data. Lyu and Wang (2022) proposed to translate CT into MRI with conditional DDPM and score-matching models, where the forward and backward diffusion processes are guided by T2 MRI. Gong et al. (2023) proposed to perform brain PET image denoising with MRI as prior information to improve image quality. Gao et al. (2023) utilized a contextual contained network in the DM to improve LDCT denoising. Furthermore, 2D DMs have also been employed for 3D translation tasks, including low-count PET image denoising (Xie et al., 2023), CT reconstruction (Chung et al., 2023), and MRI super-resolution and reconstruction (Lee et al., 2023). Direct extensions to 3D DMs were also explored (Pan et al., 2023). However, there are several unique challenges of DM for I2I. First, those methods require iterating over a large number of steps in the reverse process, and most methods start the generation with pure random noise (Saharia et al., 2022; Lyu and Wang, 2022; Gong et al., 2023; Xie et al., 2023; Chung et al., 2023; Lee et al., 2023). This protocol not only significantly slows down the translation speed, but could also lead to diverged and sub-optimal translation results if different random noise initializations are used when running multiple reverse runs (Figure 1). Even though direct bridging methods (Li et al., 2023; Liu et al., 2023a) are translation-deterministic given that no random noise input is used, they still require a large number of reverse iteration steps.
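The divergence issue above comes from stochasticity accumulated along the reverse trajectory. The following toy NumPy sketch (not the paper's model; the random-walk "sampler", step counts, and noise scale are illustrative assumptions) shows why runs started from independent pure-noise images at T = 1000 drift further apart than runs started from one shared prior image at a shortcut point t_s = 250:

```python
import numpy as np

def reverse_walk(start, steps, seed, noise_scale=0.05):
    """Toy stand-in for a stochastic reverse process: each step injects
    fresh Gaussian noise, so run-to-run spread accumulates with steps."""
    r = np.random.default_rng(seed)
    y = start.copy()
    for _ in range(steps):
        y = y + noise_scale * r.normal(size=y.shape)
    return y

d = 256  # toy "image" as a flat vector
# (a) five runs, each from its own pure-noise initialization, T = 1000 steps
runs_noise = [reverse_walk(np.random.default_rng(100 + s).normal(size=d), 1000, s)
              for s in range(5)]
# (b) five runs from one shared prior image, shortcut start with t_s = 250 steps
prior = np.zeros(d)
runs_prior = [reverse_walk(prior, 250, s) for s in range(5)]

spread_noise = np.std(np.stack(runs_noise), axis=0).mean()
spread_prior = np.std(np.stack(runs_prior), axis=0).mean()
assert spread_prior < spread_noise  # shortcut runs stay far more consistent
```

Averaging the shared-prior runs further shrinks the residual spread by roughly 1/√N_p, which is the intuition behind multi-path averaging and the standard-deviation uncertainty map exploited by CMDM.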
Fig. 2. The overall workflow of our proposed Cascade Multi-path Shortcut Diffusion Model (CMDM). CMDM consists of a one-step inference model
(green) and cascades of MPD block (grey). Each MPD block consists of multiple shortcut reverse paths starting with a prior image with different noise.
The cascades are connected with residual averaging operations.

Another challenge of deterministic translation is that such methods also cannot generate translation uncertainty maps, which are crucial for medical images, since the model's prediction error can be used to pinpoint problem areas or give clinicians more information (Shi et al., 2021; Jungo and Reyes, 2019; Wolleb et al., 2022). It is then a unique advantage of the stochastic sampling process of the conditional DDPM (Saharia et al., 2022) to obtain the uncertainty map by running the DM repeatedly with multiple random noises (Wolleb et al., 2022). Therefore, it is desirable to develop an I2I DM method that can generate high-quality converged translation results with a reduced number of required iterations, while also being able to provide translation uncertainty estimation.

Looking into prior works, even though both GAN and DM methods have individually exhibited their capability in medical image translation tasks, the potential of combining GAN and DM for further improving translation performance remains largely unexplored. With this, and to address the aforementioned challenges in DM, we proposed a Cascade Multi-path Shortcut Diffusion Model (CMDM) for medical image-to-image translation in this work. Specifically, CMDM consists of three key components. Firstly, we proposed to utilize a cGAN-generated prior image with diffusion (i.e. noise addition) for providing an arbitrary time point's input in the reverse process. With this shortcut strategy, 1) we need a smaller number of iterations, thus reducing the processing time, and 2) the reverse process starts with prior information from the cGAN instead of pure noise, thus leading to more consistent and robust performance. Second, we proposed to perform this shortcut reverse process multiple times with different noise additions to the cGAN-generated prior. Then, a refined translation can be obtained by averaging the multi-path shortcut diffusion results. Meanwhile, the translation uncertainty can also be estimated by computing the standard deviation of the multi-path shortcut diffusion results. Lastly, to further refine the translation, we devised a cascade pipeline with the multi-path shortcut diffusion embedded in each cascade. Between each cascade, we used a residual averaging strategy where each cascade's prior image is perturbed by averaging the last cascade's output and the previous prior image. We collected three datasets in different medical imaging modalities with different image translation applications. Our experimental results on these datasets demonstrated that we can generate high-quality translation images, competitive with the prior state-of-the-art I2I methods. We also show our method can generate reasonable uncertainty estimation that correlates well with the translation error.

2. Methods

2.1. Cascaded Multi-path Shortcut Diffusion Model

The overall architecture of the Cascaded Multi-path Shortcut Diffusion Model (CMDM) is illustrated in Figure 2. The CMDM consists of (1) a one-step inference model, i.e. cGAN (Isola et al., 2017), for generating a prior translation image, and (2) a conditional denoising diffusion probabilistic model (cDDPM) to further refine the prior translation image in a cascade
and multi-path fashion. The training and inference details are as follows.

Training: Let us denote the input image as x and the translation target as y_0. For the prior image generation part, we utilized a generative network, i.e. UNet (Ronneberger et al., 2015), that aims to predict y_0 from x. The network can be trained in a conditional adversarial fashion (Isola et al., 2017) using both a pixel-wise L2 loss

L_gen = ||f_prior(x) − y_0||²₂,  (1)

and a conditional adversarial loss

L_adv = −log(f_adv(y_0|x)) − log(1 − f_adv(f_prior(x)|x)),  (2)

where f_prior(·) is the generative network for generating the prior image and f_adv(·) is the discriminator network.

On the other hand, the diffusion model consists of a forward diffusion process and a reverse denoising process. The forward diffusion process is a Markovian process that gradually adds Gaussian noise to the target image y_0 over T iterations, and can be defined as:

q(y_{1:T}|y_0) = ∏_{t=1}^{T} q(y_t|y_{t−1}),  (3)

where q(y_t|y_{t−1}) = N(y_t; √α_t y_{t−1}, (1 − α_t)I), and α_t are the noise schedule parameters. T is empirically set to 1000 here such that y_T is visually indistinguishable from Gaussian noise. Then, the forward process can be further marginalized at each step as:

q(y_t|y_0) = N(y_t; √γ_t y_0, (1 − γ_t)I),  (4)

where γ_t = ∏_{s=0}^{t} α_s. Then, the posterior distribution of y_{t−1} given (y_0, y_t) can be derived as:

q(y_{t−1}|y_0, y_t) = N(y_{t−1}|µ, σ²I),  (5)

where µ = (√γ_{t−1}(1 − α_t))/(1 − γ_t) · y_0 + (√α_t(1 − γ_{t−1}))/(1 − γ_t) · y_t and σ² = ((1 − γ_{t−1})(1 − α_t))/(1 − γ_t). With this, the noisy image during the forward process can thus be written as

ŷ_t = √γ_t y_0 + √(1 − γ_t) ϵ,  (6)

where ϵ ∼ N(0, I). Here, the goal is to estimate the noise and thus gradually remove it during the reverse process to recover the target image y_0. In our conditional diffusion model, we utilized another generative network f_dm(·) to estimate the noise with another pixel-wise L2 loss

L_dm = ||f_dm(x, ŷ_t, γ_t) − ϵ||²₂,  (7)

where x is the input image that is also used as conditional input here, ŷ_t is the noisy image, and γ_t is the current noise level. The prior image generation network and the diffusion model network were trained separately.

Inference: Once the prior image generation network f_prior(·) of cGAN and the conditional diffusion network f_dm(·) have converged from training, we can use them in CMDM for image translation. The overall inference pipeline of CMDM is illustrated in Figure 2. Instead of starting the reverse process from a standard normal distribution N(y_T|0, I) at T, the reverse process starts at a pre-defined time point t_s ∈ [0, T] with

ŷ_{t_s} = √γ_{t_s} y_prior + √(1 − γ_{t_s}) ϵ_prior,  (8)

where y_prior = f_prior(x) and ϵ_prior ∼ N(0, I). t_s is empirically set to 250, depending on the translation application. By rearranging equation 6, we can approximate the target image y_0 as

y_0 = (y_t − √(1 − γ_t) f_dm(x, ŷ_t, γ_t)) / √γ_t.  (9)

Then, by substituting this estimation of y_0 into the posterior distribution q(y_{t−1}|y_0, y_t) in equation 5, each iteration of the reverse process can be formulated as

y_{t−1} = (1/√α_t) (y_t − ((1 − α_t)/√(1 − γ_t)) f_dm(x, y_t, γ_t)) + √(1 − α_t) ϵ_t,  (10)

where ϵ_t ∼ N(0, I). By starting the reverse process at the shortcut time point t = t_s with guidance from the prior image, the conditional diffusion model is closer to the endpoint, i.e. t = 0, thus providing less diverged predictions across multiple runs. To further improve the robustness, instead of only performing a single shortcut reverse path, we perform multiple shortcut reverse paths at t_s with different noise initializations of ϵ_prior in equation 8, and ensemble these multi-path predictions by averaging

y_0^avg = (1/N_p) ∑_{p=1}^{N_p} y_0^p,  (11)

where y_0^p is the prediction from a single shortcut reverse path and N_p is the number of shortcut paths. To further refine the translation prediction, we perform this operation in a cascade style. To avoid over-fitting in the reverse process, we designed a residual averaging strategy for new prior image generation in the next cascade. Specifically, the new prior image is the averaged image of the previous prior image and the translated image from the last cascade. The full algorithm is summarized in Algorithm 1.

2.2. Dataset Preparation

We collected three medical image datasets with different medical image translation applications to validate our method. The first application is the image translation of conventional single-exposure chest radiography images into two-shot-based dual-energy (DE) images, which aims to reduce the expensive system cost of the DE system and the higher radiation dose of two X-ray shots. Specifically, we collected 210 posterior-anterior DE chest radiographs with a two-shot DE digital radiography system (Zhou et al., 2019; Wen et al., 2018). The data was acquired using a 60 kVp exposure followed by a 120 kVp exposure with 100 ms between exposures. The size of the images is 1024 × 1024 pixels. Based on this dataset, we further divide this task into two sub-tasks: the translation of standard chest radiography into the soft-tissue image, and the translation of standard chest radiography into the bone image. The second application is image translation across MRI
Algorithm 1: Inference Process - Cascaded Multi-path Shortcut Diffusion Model (CMDM)

Input: x ∈ N^{d1×d2}
Initialize #1: t_s ∈ [0, T]: the start timestep of the denoising process
Initialize #2: N_c: the number of cascades; N_p: the number of shortcut paths
Initialize #3: f_prior(·): prior image generation network; f_dm(·): conditional diffusion network
for c = 1, 2, 3, ..., N_c do
    if c = 1 then
        y_0^c = f_prior(x) ;  ▷ Prior image generation by one-step CNN inference
    else
        y_0^c = ½(y_0^avg + y_0^{c−1}) ;  ▷ Subsequent prior image generation by residual averaging
    for p = 1, 2, 3, ..., N_p do
        y_{t_s}^p = √γ_{t_s} y_0^c + √(1 − γ_{t_s}) ϵ_p, ϵ_p ∼ N(0, I) ;  ▷ Adding noise to the prior image for shortcut at t_s
        for t = t_s, t_s − 1, t_s − 2, ..., 1 do
            ϵ_t ∼ N(0, I) ;  ▷ Sampling noise in the reverse process
            y_{t−1}^p = (1/√α_t) (y_t^p − ((1 − α_t)/√(1 − γ_t)) f_dm(x, y_t^p, γ_t)) + √(1 − α_t) ϵ_t ;  ▷ Iterative reverse process in a single path
    y_0^avg = (1/N_p) ∑_{p=1}^{N_p} y_0^p ;  ▷ Averaging the multiple shortcut paths' outputs
return y_0^avg ;  ▷ Outputting the last cascade's multi-path averaging result
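Algorithm 1 can be transcribed almost line-for-line in NumPy. The sketch below is an illustration, not the authors' code: `f_prior` and `f_dm` are stand-ins (an imperfect prior and an "oracle" noise predictor built from a known ground truth, so the refinement can be checked), and the linear schedule follows the T = 1000 setting from the text.

```python
import numpy as np

def cmdm_inference(x, f_prior, f_dm, alpha, gamma, t_s, n_cascades, n_paths, rng):
    """Sketch of Algorithm 1: cascaded multi-path shortcut diffusion inference."""
    y_avg = unc = y_prev = None
    for c in range(1, n_cascades + 1):
        # prior image: one-step inference, then residual averaging between cascades
        y_c = f_prior(x) if c == 1 else 0.5 * (y_avg + y_prev)
        paths = []
        for _ in range(n_paths):
            # shortcut: diffuse the prior image to timestep t_s (eq. 8)
            y_t = (np.sqrt(gamma[t_s]) * y_c
                   + np.sqrt(1 - gamma[t_s]) * rng.normal(size=y_c.shape))
            for t in range(t_s, 0, -1):
                eps_t = rng.normal(size=y_t.shape)
                # single reverse step (eq. 10)
                y_t = ((y_t - (1 - alpha[t]) / np.sqrt(1 - gamma[t])
                        * f_dm(x, y_t, gamma[t])) / np.sqrt(alpha[t])
                       + np.sqrt(1 - alpha[t]) * eps_t)
            paths.append(y_t)
        y_prev = y_c
        y_avg = np.mean(paths, axis=0)   # multi-path averaging (eq. 11)
        unc = np.std(paths, axis=0)      # pixel-wise uncertainty map
    return y_avg, unc

# Toy check with an oracle noise predictor that knows the true target:
T = 1000
alpha = np.concatenate([[1.0], 1.0 - np.linspace(1e-4, 0.02, T)])  # alpha[1..T]
gamma = np.cumprod(alpha)                                          # γ_t = ∏ α_s

rng = np.random.default_rng(0)
y_true = rng.uniform(size=(8, 8))
f_prior = lambda x: y_true + 0.3 * rng.normal(size=y_true.shape)   # imperfect prior
f_dm = lambda x, y_t, g: (y_t - np.sqrt(g) * y_true) / np.sqrt(1 - g)

y_hat, unc = cmdm_inference(None, f_prior, f_dm, alpha, gamma,
                            t_s=250, n_cascades=2, n_paths=4, rng=rng)
err = np.mean(np.abs(y_hat - y_true))
assert err < 0.1 and unc.shape == y_true.shape
```

With the oracle predictor, the multi-path average lands very close to the ground truth even though the one-step prior carries visible error, which is the refinement behaviour the cascade targets; in the real model, f_dm is the trained conditional network of Eq. (7).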

Table 1. Quantitative comparisons of translation results from different methods. I2I applications include DE X-ray image generation (soft-tissue and bone
image), Sparse-view CT reconstruction (1/6 projection under-sampling and 1/4 projection under-sampling), and MRI inter-modality synthesis (T1-to-T2
and T1-to-FLAIR). The best results are marked in bold. "†" means the differences between CMDM and all the previous baseline methods are significant
at p < 0.002. The averaged inference time of each method is reported in the right column.
DE X-ray Soft-Tissue Bone Average
Evaluation PSNR SSIM MAE PSNR SSIM MAE Time (Sec)
UNet 39.76 ± 2.36 0.984 ± 0.003 0.606 ± 0.071 41.33 ± 3.18 0.988 ± 0.003 0.571 ± 0.066 0.013
cGAN 39.82 ± 2.37 0.985 ± 0.003 0.603 ± 0.072 41.36 ± 3.17 0.988 ± 0.003 0.572 ± 0.065 0.013
Palette v1 42.89 ± 2.34 0.987 ± 0.002 0.390 ± 0.047 43.06 ± 3.16 0.989 ± 0.002 0.373 ± 0.042 13.670
Palette v2 43.11 ± 2.34 0.988 ± 0.002 0.382 ± 0.045 43.47 ± 3.13 0.990 ± 0.002 0.363 ± 0.043 273.420
I2SB 43.18 ± 2.35 0.988 ± 0.002 0.381 ± 0.045 43.49 ± 3.14 0.990 ± 0.002 0.367 ± 0.043 14.551
BBDM 43.08 ± 2.35 0.988 ± 0.002 0.382 ± 0.044 43.52 ± 3.13 0.989 ± 0.002 0.359 ± 0.043 15.121
Ours 44.27 ± 2.33† 0.991 ± 0.002† 0.369 ± 0.041† 44.58 ± 3.16† 0.992 ± 0.003† 0.348 ± 0.038† 154.663
CT 1/6 Sparse-view 1/4 Sparse-view Average
Evaluation PSNR SSIM MAE PSNR SSIM MAE Time (Sec)
UNet 44.11 ± 1.38 0.977 ± 0.004 0.372 ± 0.047 46.32 ± 1.27 0.981 ± 0.004 0.315 ± 0.040 0.006
cGAN 44.13 ± 1.39 0.978 ± 0.004 0.370 ± 0.047 46.35 ± 1.28 0.981 ± 0.004 0.314 ± 0.040 0.006
Palette v1 44.96 ± 1.24 0.980 ± 0.003 0.321 ± 0.041 46.75 ± 1.26 0.987 ± 0.004 0.310 ± 0.039 8.863
Palette v2 45.56 ± 1.24 0.981 ± 0.003 0.318 ± 0.040 46.95 ± 1.25 0.988 ± 0.003 0.308 ± 0.038 177.202
I2SB 45.86 ± 1.26 0.982 ± 0.003 0.317 ± 0.039 46.91 ± 1.26 0.989 ± 0.003 0.309 ± 0.039 9.561
BBDM 45.73 ± 1.24 0.981 ± 0.003 0.318 ± 0.040 46.96 ± 1.26 0.989 ± 0.003 0.309 ± 0.038 9.987
Ours 46.42 ± 1.22† 0.986 ± 0.003† 0.302 ± 0.039† 47.02 ± 1.25† 0.990 ± 0.003† 0.299 ± 0.038† 108.821
MRI T1 → T2 T1 → FLAIR Average
Evaluation PSNR SSIM MAE PSNR SSIM MAE Time (Sec)
UNet 27.17 ± 1.56 0.885 ± 0.042 0.222 ± 0.051 27.38 ± 1.59 0.891 ± 0.046 0.216 ± 0.052 0.006
cGAN 27.19 ± 1.58 0.887 ± 0.044 0.220 ± 0.052 27.41 ± 1.58 0.891 ± 0.047 0.217 ± 0.053 0.006
Palette v1 27.52 ± 1.57 0.890 ± 0.044 0.218 ± 0.051 27.68 ± 1.54 0.897 ± 0.046 0.210 ± 0.052 8.863
Palette v2 27.68 ± 1.55 0.891 ± 0.043 0.211 ± 0.051 27.79 ± 1.52 0.899 ± 0.044 0.206 ± 0.051 177.202
I2SB 27.85 ± 1.56 0.892 ± 0.043 0.208 ± 0.051 27.89 ± 1.54 0.898 ± 0.043 0.208 ± 0.052 9.561
BBDM 27.88 ± 1.56 0.892 ± 0.043 0.207 ± 0.051 27.86 ± 1.53 0.899 ± 0.045 0.207 ± 0.051 9.987
Ours 27.93 ± 1.54† 0.898 ± 0.042† 0.202 ± 0.051† 27.98 ± 1.54† 0.901 ± 0.044† 0.201 ± 0.051† 108.821
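For reference, the PSNR and MAE columns in Table 1 can be computed as in the minimal NumPy sketch below (an illustration only: the exact normalization/data range used by the authors is not stated, so `data_range` here is an assumption, and SSIM is omitted):

```python
import numpy as np

def psnr(pred, gt, data_range=1.0):
    # Peak Signal-to-Noise Ratio (dB) against the paired ground truth.
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def mae(pred, gt):
    # Mean Absolute Error.
    return np.mean(np.abs(pred - gt))

rng = np.random.default_rng(0)
gt = rng.uniform(size=(256, 256))
good = np.clip(gt + 0.01 * rng.normal(size=gt.shape), 0, 1)  # small residual error
bad = np.clip(gt + 0.10 * rng.normal(size=gt.shape), 0, 1)   # 10x larger error

assert psnr(good, gt) > psnr(bad, gt)   # higher PSNR = better translation
assert mae(good, gt) < mae(bad, gt)     # lower MAE = better translation
```

SSIM, the third metric, is commonly computed with an existing implementation such as scikit-image's `structural_similarity`; whether the authors used that implementation is not stated in the text.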

modalities, which aims to speed up the MRI acquisition that requires multiple protocols (Zhou and Zhou, 2020). Specifically, we collected an in-house MRI dataset consisting of 20 patients. We scanned each patient using three protocols, including T1, T2, and FLAIR, resulting in three 3D volumes of 320 × 230 × 18 for each patient, which were resized to 256 × 256 × 18. 360 2D axial images are generated for each protocol. We further sub-divided this task into two components: translating the T1 image into the T2 image, and translating the T1 image into the FLAIR image. The third application is the image translation of sparse-view
Fig. 3. Qualitative comparison of translation results and corresponding error map from different methods. Examples from DE X-ray soft-tissue generation
(Left), Sparse-view CT reconstruction (Middle), and MRI T1-to-T2 synthesis are shown. The image quality metrics of each sample are indicated at the
bottom left of the images.

CT (SVCT) images into full-view CT images, which aims to reduce the radiation dose in CT acquisition (Zhou et al., 2021, 2022b). We collected 10 whole-body CT scans from the AAPM Low-Dose CT Grand Challenge (McCollough, 2016). Each 3D scan contains 318 ∼ 856 axial slices covering a wide range of anatomical regions from chest to abdomen to pelvis, resulting in a total of 3397 axial 2D images. Using the CT projection simulator, the fully sampled sinogram data was generated via 360 projection views uniformly spaced between 0 and 360 degrees. Then, we uniformly sampled 90 and 60 projection views from the 360 projection views, mimicking 4- and 6-fold projection view/radiation dose reduction. The paired full-view and sparse-view CT images were then reconstructed using Filtered Back Projection (FBP) based on these sinograms, with a size of 256 × 256. For all three applications/datasets, we performed 5-fold cross-validation for evaluation considering their moderate scale.

2.3. Evaluation Metrics and Baseline Comparisons

To evaluate the translated image quality for the above-mentioned applications, we used the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Mean Absolute Error (MAE), computed against the corresponding paired ground truth. For baseline comparisons, we compared our method's results against previous one-step CNN-based and diffusion-based image-to-image translation methods, including cGAN (Isola et al., 2017), Palette (Saharia et al., 2022), the Schrodinger Bridge Diffusion Model (I2SB) (Liu et al., 2023a), and the Brownian Bridge Diffusion Model (BBDM) (Li et al., 2023). Given that Palette utilizes random Gaussian noise as the initial input, we also compared two versions of Palette: Palette with 1 sampling run (Palette v1) and Palette with 20 sampling runs with results averaging (Palette v2). I2SB and BBDM only have the 1-sampling-run version given that there is no randomized input during sampling. Furthermore, we also conducted ablative studies on the hyper-parameters of CMDM, including the shortcut time point, number of shortcut
Fig. 4. Ablative studies on the reverse starting time (Left), the number of paths (Middle), and the number of cascades (Right). DE X-ray soft-tissue image
generation and 1/6 SVCT reconstruction were utilized for these studies. Peak performances were annotated on the plots with the corresponding image
quality metric, i.e. PSNR.

paths, and number of cascades.

2.4. Implementation Details

We implemented our method in PyTorch and performed experiments using an NVIDIA H100 GPU. We trained all models with a batch size of 8 for 500k training steps. The Adam solver was used to optimize our models with lr = 1 × 10⁻⁴, β₁ = 0.9, and β₂ = 0.99. We used an EMA rate of 0.9999. A 10k-step linear learning rate warmup schedule was implemented. We used a linear noise schedule with 1000 time steps.

3. Experimental Results

Figure 3 shows qualitative comparisons between previous state-of-the-art methods and ours. Examples from the DE X-ray dataset, SVCT reconstruction dataset, and MRI translation dataset are illustrated. For the DE X-ray example (left two columns), we can see all the previous translation methods can generate reasonable soft-tissue images, i.e. rib-suppression images, from the standard X-ray image. While cGAN could generate visually plausible results with a PSNR of 44.74dB, the translated images still suffer from relatively inaccurate quantification, as indicated by the error map. On the other hand, the previous diffusion-based methods, e.g. Palette and BBDM, both achieved significantly better translation as compared to cGAN, with PSNR improving to 46.05dB and much fewer pixel-wise errors indicated by the error maps. In the last row, we can find that our CMDM further improved over the previous diffusion-based methods, with PSNR reaching 47.34dB and further reduced pixel-wise error in the cardiac and lung regions. Similarly, for the SVCT example (the middle two columns), cGAN can reasonably suppress the streak artifact in the input FBP SVCT image. However, significant residual errors can be found in the femoral head and pelvic bone regions. On the other hand, we observe that the previous diffusion-based methods can suppress these errors, with PSNR reaching close to 46dB. Furthermore, with CMDM combining cGAN and diffusion, the overall error of our translation results is reduced even more, and the image quality is enhanced to a PSNR of 46.23dB. Similar observations can be found for the T1-to-T2 translation example in the last two columns.

The quantitative comparisons are summarized in Table 1. Similar to the observations from the visualizations, the traditional CNN-based approaches generally under-performed the diffusion-based approaches. For example, cGAN only achieved an average PSNR of 39.82dB and MAE of 0.603 for the soft-tissue image translation, while the single-reverse-path Palette, i.e. Palette v1, significantly outperformed it with a PSNR of 42.89dB and MAE of 0.390. Running multiple reverse paths of Palette and averaging the outputs, i.e. Palette v2, led to improved performance that reached similar performance to I2SB and BBDM, with a PSNR of 43.11dB and MAE of 0.382. In the last row, our CMDM achieved an average PSNR of 44.27dB and MAE of 0.369, significantly outperforming all the previous baseline methods. Comparing the soft-tissue image translation task to the bone image translation task, CMDM had slightly higher performance on the latter task, since the bone image, without complex soft-tissue texture, can be relatively easier to generate as compared to the soft-tissue image. For the inference speed in DE X-ray applications, I2SB and BBDM with a single reverse path took an average of 14.55 and 15.12 seconds, respectively. CMDM with the best performance took an average of 154.66 seconds per inference
since multiple shortcut reverse paths are needed. Similar to the quantitative results for the DE X-ray, we found our CMDM consistently outperformed previous CNN- and diffusion-based baseline methods for both the SVCT reconstruction applications and the MRI inter-modality translation applications.

Table 2. Quantitative comparison of CMDM with different prior strategies. Analyses with the DE X-ray soft-tissue generation task, 1/6 SVCT reconstruction task, and T1-to-T2 MRI synthesis task are reported.
MAE DE X-Ray SVCT MRI
w/o prior 0.379 ± 0.043 0.316 ± 0.042 0.210 ± 0.051
UNet prior 0.370 ± 0.041 0.306 ± 0.041 0.203 ± 0.051
UFNet prior 0.366 ± 0.041 0.303 ± 0.040 0.201 ± 0.050
cGAN prior 0.369 ± 0.041 0.302 ± 0.039 0.201 ± 0.051

We conducted ablative studies for the hyper-parameters in CMDM, including the reverse starting time, the number of shortcut paths, and the number of cascades. The results for the DE X-ray and SVCT are summarized in Figure 4. First, for the reverse starting time, we can see that setting t_s to around 200 yields the best performance, and the performance starts to degrade if we further increase it. It is worth noting that using t_s = 200 here not only yields the best performance but also allows us to reduce the inference time by about 5 times as compared to the previous diffusion methods that start at t = 1000 or beyond. Second, for the number of shortcut paths, we can see that the performance increases as we use an increasing number of paths. The performance started to converge when 20 paths were used. Because the inference time increases linearly as we increase the number of paths, we chose the converging point N_p = 20 in our method. Thirdly, for the number of cascades, we found that the performance gradually boosted as the number of cascades increased. However, peak performance was reached when N_c = 3, and the inference started to overfit, leading to degraded translation performance. Lastly, we investigated the impact on CMDM when different prior image generations were used, including priors from UNet (Ronneberger

ablative studies on CMDM's uncertainty estimation. Two examples of DE X-ray and MRI T1-to-T2 translation are shown in Figure 5. On the bottom, both the pixel-wise absolute error and the pixel-wise uncertainty (i.e. computed by the standard deviation of multiple shortcut path predictions) are visualized. The corresponding scatter plot of their pixel-wise relationship is also shown on the right. We found that the pixel-wise uncertainty and the absolute error have a good correlation. For the DE X-ray example and the MRI example here, we have correlation coefficients equal to 0.76 and 0.81, respectively. This is particularly useful when ground truth is unavailable to compute the translation error, where uncertainty can indicate the potential error distributions. The correlation of the pixel-wise uncertainty and the absolute error for the whole test set is summarized in Table 4. By running multiple sampling runs of Palette (Saharia et al., 2022), i.e. Palette v2, it can also produce the pixel-wise standard deviation for uncertainty estimation. In Table 4, we can see CMDM achieving a better averaged correlation across all three translation applications.

Table 4. Averaged correlation of the pixel-wise absolute error and the pixel-wise uncertainty, i.e. computed by the standard deviation of multiple paths' predictions. The DE X-ray soft-tissue generation task, 1/6 SVCT reconstruction task, and T1-to-T2 MRI synthesis task are reported.
Correlation DE X-Ray SVCT MRI
Palette v2 0.678 ± 0.162 0.702 ± 0.137 0.676 ± 0.108
CMDM 0.695 ± 0.142 0.718 ± 0.127 0.687 ± 0.089

4. Discussion

In this work, we developed a novel image translation method, called CMDM, that efficiently integrates GAN and DM to enable high-quality medical image-to-image translation. There are several key advantages of this method. First, we utilized a previous CNN-based translation method to generate a virtual "t = 0" image for the diffusion model. This image is added
et al., 2015), Under-to-fully-complete Network (UFNet (Zhou with the scheduled noise, so we can start the diffusion reverse
et al., 2022a)), and cGAN (Isola et al., 2017). As we can process at a scheduled shortcut time point. As illustrated in
see from Table 2, using CMDM with prior always outperforms Figure 1, initializing the reverse process with pure noise may
CMDM without prior. Among all the prior generated, CMDM lead to sub-optimal results, while here, starting the reverse pro-
with priors generated from cGAN and UFNet yields the best cess with a roughly estimated image (e.g. cGAN’s prediction)
performance. Moreover, we also studied CMDM with or with- with the scheduled noise not only can help stabilize the reverse
out the conditional input for the diffusion part. As we can see sampling process, but also reduce the required number of re-
from Table 3, CMDM without conditional input can still gen- verse iterations, i.e shorten the inference time. Second, instead
erate a reasonable translation guided by the prior image. How- of adding one noise schedule (Chung et al., 2022; Gao et al.,
ever, CMDM with condition input with more translation guid- 2023), we added different noises to this ”t = 0” image and per-
ance still yields the best performance. formed the same reverse process multiple times in each cas-
cade. The corresponding cascade output is simply the averaged
outputs from these paths. This averaging operation inherently
Table 3. Quantitative comparison of CMDM with or without images to be
reduces the randomness from the different noises and thus im-
translated as conditional inputs in the diffusion part. Analysis with DE
X-ray soft-tissue generation task, 1/6 SVCT reconstruction task, and T1- proves the translation robustness. Based on results from mul-
to-T2 MRI synthesis task are reported. tiple reverse runs, we can generate pixel-wise uncertainty esti-
MAE DE X-Ray SVCT MRI mation for the translation results, which is also a key advantage.
w/o condition 0.517 ± 0.059 0.339 ± 0.043 0.219 ± 0.053 Lastly, we also devised a cascade framework with a residual
w condition 0.369 ± 0.041 0.302 ± 0.039 0.202 ± 0.051 averaging strategy. This design helps us enhance performance
without training additional models, but may come at the cost of
In addition to the translation performance, we also conducted additional inference time. It is worth noticing that our CMDM
Fig. 5. Examples of CMDM's uncertainty estimation for DE X-ray soft-tissue image generation (left) and MRI T1-to-T2 synthesis (right). The relationship plots between the absolute error (bottom left) and the uncertainty (bottom right) are shown as well. Positive correlations with R > 0.75 were found for both cases.
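The uncertainty map discussed above is simply the pixel-wise standard deviation across the N_p shortcut-path outputs, and the reported correlations are Pearson coefficients between this map and the absolute-error map. The computation can be sketched as follows; note that `reverse_fn` is a hypothetical stand-in for the learned reverse diffusion process (not the actual network), and the value of `alpha_bar_ts` is an illustrative assumption, not the paper's schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def shortcut_uncertainty(prior, reverse_fn, alpha_bar_ts, n_paths=20):
    """Multi-path shortcut sampling (a sketch of the CMDM idea).

    prior        : CNN/cGAN prediction used as the virtual t=0 image.
    reverse_fn   : stand-in for the reverse process from t_s back to 0.
    alpha_bar_ts : cumulative noise-schedule value at the shortcut time t_s.
    Returns the averaged translation and the pixel-wise uncertainty (std).
    """
    outputs = []
    for _ in range(n_paths):
        eps = rng.standard_normal(prior.shape)
        # Forward-diffuse the prior to t_s: x_ts = sqrt(a)*x0 + sqrt(1-a)*eps
        x_ts = np.sqrt(alpha_bar_ts) * prior + np.sqrt(1.0 - alpha_bar_ts) * eps
        outputs.append(reverse_fn(x_ts))
    outputs = np.stack(outputs)
    # Mean over paths = translation output; std over paths = uncertainty map
    return outputs.mean(axis=0), outputs.std(axis=0)

# Toy demo with an identity "reverse process" (hypothetical stand-in):
prior = rng.random((8, 8))
mean_img, unc = shortcut_uncertainty(prior, lambda x: x, alpha_bar_ts=0.4)

# Pearson correlation between uncertainty and absolute error, as in Table 4
# (here against a toy target; in practice the held-out ground truth):
target = prior
err = np.abs(mean_img - target).ravel()
r = np.corrcoef(err, unc.ravel())[0, 1]
```

With a real reverse network, `r` is the per-case quantity averaged in Table 4; the toy identity mapping only illustrates the bookkeeping, not the correlation values.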
can be viewed as a plug-and-play module that improves the performance of cGAN, i.e., the one-step inference model used in CMDM, as shown in Table 1. Ideally, our approach could also be added as a plug-and-play module to other previous translation methods for potential performance improvements.

We collected three medical image datasets covering a total of six different medical image translation tasks to validate our method. Our experimental results demonstrated that our method can generate high-quality translated images that consistently outperformed previous baseline methods (Figure 3 and Table 1). For example, CMDM achieved PSNR > 44 dB for both DE soft-tissue image generation and DE bone image generation, while all the previous methods remained below 44 dB. Although CMDM achieves the best performance, it requires a relatively longer inference time compared to previous methods that need only a single reverse run. For example, CMDM needs 154.66 seconds on average for the DE X-ray application, whereas Palette v1, I2SB, and BBDM need only about 13 seconds. However, we can reduce either the number of cascades or the number of shortcut paths in CMDM to balance computation time against performance. The default settings in our CMDM are N_c = 3 and N_p = 20. According to the studies reported in Figure 4, we could reduce the number of cascades (N_c) to 1 to shorten the inference time by nearly three times, which would result in PSNR = 43.75 dB; this result still outperforms all the previous baseline methods (Table 1). Similarly, we could reduce the number of shortcut paths (N_p) to 10 to cut the inference time by nearly half and still outperform all the previous baselines. On the other hand, we believe these hyper-parameters need to be tuned for each translation application to find the optimal balance between performance and computation/time budgets. Besides the translation itself, CMDM also generates a pixel-wise uncertainty estimate. As shown in Figure 5 and Table 4, CMDM's uncertainty estimate demonstrated good correlations with the absolute error, which can only be computed when the ground truth is available. Since ground truth is commonly unavailable for estimating the error when deployed in clinical scenarios, we believe our uncertainty estimation is potentially useful for deciding which regions are trustworthy for downstream applications, such as diagnosis and treatment planning.

The presented work also has limitations, with several potential improvements that are important subjects of our future studies. Firstly, we only validated our method on three different modalities, and evaluations on more diverse applications could be included. Even though we framed CMDM as an image-space post-processing tool here, we believe it can be further tailored to specific translation problems. For example, we could include physics-informed modules, such as data consistency (Schlemper et al., 2017; Song et al., 2021), in CMDM, which may further improve its applications in medical image reconstruction (Zbontar et al., 2018; Sidky and Pan, 2022). Secondly, the current CMDM is implemented in a 2D fashion, while 3D is desirable in many medical image translation tasks. Theoretically, we could directly change all the networks in CMDM into 3D networks to enable 3D applications, but this may be infeasible with current computational resources. For example, we attempted to employ a 3D CMDM with an input size of 256 × 256 × 128 on an 80 GB H100 GPU; however, it could not fit into memory even with a batch size of one. Alternatively, we could utilize multi-view, 2.5D, or memory-efficient diffusion strategies to scale CMDM to 3D (Chung et al., 2023; Xie et al., 2023; Bieder et al., 2024; Chen et al., 2024), which will be extensively investigated in our future work. Thirdly, the inference speed is still relatively slow compared to previous methods, especially classic CNN-based methods. While we discussed the trade-off between performance and speed in the previous paragraph, it is also desirable to maintain optimal performance with increased inference speed. Utilizing accelerated diffusion models, such as DDIM and ResShift (Song et al., 2020; Yue et al., 2024), in CMDM could potentially help achieve this goal. To accelerate the inference speed for time-critical clinical scenarios, such as real-time translation in interventional radiology, one could also consider alternative solutions. For example, we could consider distilling the
diffusion model knowledge into the one-step inference GAN model (Kang et al., 2024), such that a GAN with diffusion-model-level performance and real-time capability can be realized. Fourthly, in the current implementation of CMDM, we did not implement ways to monitor the first step's image generation. If unsatisfactory results are generated in the first step, the error could propagate to the next step. However, this should be reflected in the CMDM final uncertainty map, where an increased uncertainty value, i.e., pixel-wise standard deviation, should be observed. On the other hand, we could also include uncertainty estimation techniques, e.g., Monte Carlo Dropout (Gal and Ghahramani, 2016), in the first-step cGAN to monitor the prior image generation. Lastly, CMDM requires data with paired images for training, but such data may not always be available in certain applications. Unpaired translation diffusion model strategies (Sasaki et al., 2021; Özbey et al., 2023) could potentially be deployed here to mitigate this challenge. For example, one could use CycleGAN to generate the prior image and then use a multi-path version of UNIT-DDPM (Sasaki et al., 2021) to further refine it. This is an interesting direction to be investigated in our future work. Moreover, future work also includes evaluating how CMDM impacts downstream clinical applications. For example, we will investigate whether CMDM-translated images provide similar lesion detection capability or radiomic features compared to the ground truth images, thus validating the clinical value of our method.

5. Conclusion

Our work proposes the Cascaded Multi-path Shortcut Diffusion Model (CMDM), a simple and novel strategy for high-quality medical image-to-image translation. The proposed method first utilizes a classic CNN-based translation method to generate a prior image. By adding different noises to this image, we then run multiple reverse samplings starting from the noisy images, i.e., shortcuts. With this process in each cascade, the translation output is obtained by averaging the path outputs, and the uncertainty estimate is obtained by calculating their standard deviation. Based on this, a cascade framework with residual averaging is further proposed to gradually refine the translation. For validation, we utilized three medical image datasets across X-ray, CT, and MRI. Our experimental results showed that CMDM provides high-quality translation results, better than previous translation baselines on the different sub-tasks. In parallel, CMDM also provides reasonable uncertainty estimates that correlate well with the translation error maps. We believe CMDM could potentially be adapted to other applications where both high-quality translation and uncertainty estimation are required.

Acknowledgments

This work was supported by the National Institutes of Health (NIH) grants R01EB025468 and R01CA275188.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Yinchi Zhou: Conceptualization, Methodology, Software, Visualization, Validation, Formal analysis, Writing - original draft. Tianqi Chen: Results analysis, Writing - review and editing. Jun Hou: Results analysis, Writing - review and editing. Huidong Xie: Conceptualization, Methodology, Software, Writing - review and editing. Nicha C. Dvornek: Writing - review and editing. S. Kevin Zhou: Data preparation, Writing - review and editing. David L. Wilson: Data preparation, Writing - review and editing. James S. Duncan: Writing - review and editing. Chi Liu: Writing - review and editing. Bo Zhou: Conceptualization, Methodology, Software, Visualization, Validation, Formal analysis, Writing - original draft, Supervision.

References

Bieder, F., Wolleb, J., Durrer, A., Sandkuehler, R., Cattin, P.C., 2024. Denoising diffusion models for memory-efficient processing of 3d medical images, in: Medical Imaging with Deep Learning, PMLR. pp. 552–567.
Chen, T., Hou, J., Zhou, Y., Xie, H., Chen, X., Liu, Q., Guo, X., Xia, M., Duncan, J.S., Liu, C., et al., 2024. 2.5d multi-view averaging diffusion model for 3d medical image translation: Application to low-count pet reconstruction with ct-less attenuation correction. arXiv preprint arXiv:2406.08374.
Chen, X., Pretorius, P.H., Zhou, B., Liu, H., Johnson, K., Liu, Y.H., King, M.A., Liu, C., 2022a. Cross-vender, cross-tracer, and cross-protocol deep transfer learning for attenuation map generation of cardiac spect. Journal of Nuclear Cardiology 29, 3379–3391.
Chen, X., Zhou, B., Xie, H., Shi, L., Liu, H., Holler, W., Lin, M., Liu, Y.H., Miller, E.J., Sinusas, A.J., et al., 2022b. Direct and indirect strategies of deep-learning-based attenuation correction for general purpose and dedicated cardiac spect. European Journal of Nuclear Medicine and Molecular Imaging 49, 3046–3060.
Chung, H., Ryu, D., McCann, M.T., Klasky, M.L., Ye, J.C., 2023. Solving 3d inverse problems using pre-trained 2d diffusion models, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22542–22551.
Chung, H., Sim, B., Ye, J.C., 2022. Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12413–12422.
Denck, J., Guehring, J., Maier, A., Rothgang, E., 2021. Mr-contrast-aware image-to-image translations with generative adversarial networks. International Journal of Computer Assisted Radiology and Surgery 16, 2069–2078.
Gal, Y., Ghahramani, Z., 2016. Dropout as a bayesian approximation: Representing model uncertainty in deep learning, in: International Conference on Machine Learning, PMLR. pp. 1050–1059.
Gao, Q., Li, Z., Zhang, J., Zhang, Y., Shan, H., 2023. Corediff: Contextual error-modulated generalized diffusion model for low-dose ct denoising and generalization. IEEE Transactions on Medical Imaging.
Gong, K., Johnson, K., El Fakhri, G., Li, Q., Pan, T., 2023. Pet image denoising based on denoising diffusion probabilistic model. European Journal of Nuclear Medicine and Molecular Imaging, 1–11.
Gong, Y., Shan, H., Teng, Y., Tu, N., Li, M., Liang, G., Wang, G., Wang, S., 2020. Parameter-transferred wasserstein generative adversarial network (pt-wgan) for low-dose pet image denoising. IEEE Transactions on Radiation and Plasma Medical Sciences 5, 213–223.
Ho, J., Jain, A., Abbeel, P., 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33, 6840–6851.
Huang, Z., Zhang, J., Zhang, Y., Shan, H., 2021. Du-gan: Generative adversarial networks with dual-domain u-net-based discriminators for low-dose ct denoising. IEEE Transactions on Instrumentation and Measurement 71, 1–12.
Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A., 2017. Image-to-image translation with conditional adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134.
Jungo, A., Reyes, M., 2019. Assessing reliability and challenges of uncertainty estimations for medical image segmentation, in: Medical Image Computing and Computer Assisted Intervention–MICCAI 2019, Springer. pp. 48–56.
Kang, M., Zhang, R., Barnes, C., Paris, S., Kwak, S., Park, J., Shechtman, E., Zhu, J.Y., Park, T., 2024. Distilling diffusion models into conditional gans. arXiv preprint arXiv:2405.05967.
Lee, S., Chung, H., Park, M., Park, J., Ryu, W.S., Ye, J.C., 2023. Improving 3d imaging with pre-trained perpendicular 2d diffusion models. arXiv preprint arXiv:2303.08440.
Li, B., Xue, K., Liu, B., Lai, Y.K., 2023. Bbdm: Image-to-image translation with brownian bridge diffusion models, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1952–1961.
Liu, G.H., Vahdat, A., Huang, D.A., Theodorou, E.A., Nie, W., Anandkumar, A., 2023a. I2SB: Image-to-image schrödinger bridge. arXiv preprint arXiv:2302.05872.
Liu, Y., Zeng, F., Ma, M., Zheng, B., Yun, Z., Qin, G., Yang, W., Feng, Q., 2023b. Bone suppression of lateral chest x-rays with imperfect and limited dual-energy subtraction images. Computerized Medical Imaging and Graphics 105, 102186.
Lyu, Q., Wang, G., 2022. Conversion between ct and mri images using diffusion and score-matching models. arXiv preprint arXiv:2209.12104.
McCollough, C., 2016. Tu-fg-207a-04: Overview of the low dose ct grand challenge. Medical Physics 43, 3759–3760.
Moghadam, P.A., Van Dalen, S., Martin, K.C., Lennerz, J., Yip, S., Farahani, H., Bashashati, A., 2023. A morphology focused diffusion probabilistic model for synthesis of histopathology images, in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 2000–2009.
Nie, D., Trullo, R., Lian, J., Wang, L., Petitjean, C., Ruan, S., Wang, Q., Shen, D., 2018. Medical image synthesis with deep convolutional adversarial networks. IEEE Transactions on Biomedical Engineering 65, 2720–2730.
Özbey, M., Dalmaz, O., Dar, S.U., Bedel, H.A., Özturk, Ş., Güngör, A., Çukur, T., 2023. Unsupervised medical image translation with adversarial diffusion models. IEEE Transactions on Medical Imaging.
Pan, S., Abouei, E., Wynne, J., Chang, C.W., Wang, T., Qiu, R.L., Li, Y., Peng, J., Roper, J., Patel, P., et al., 2023. Synthetic ct generation from mri using 3d transformer-based denoising diffusion model. Medical Physics.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015, Springer. pp. 234–241.
Saharia, C., Chan, W., Chang, H., Lee, C., Ho, J., Salimans, T., Fleet, D., Norouzi, M., 2022. Palette: Image-to-image diffusion models, in: ACM SIGGRAPH 2022 Conference Proceedings, pp. 1–10.
Sasaki, H., Willcocks, C.G., Breckon, T.P., 2021. Unit-ddpm: Unpaired image translation with denoising diffusion probabilistic models. arXiv preprint arXiv:2104.05358.
Schlemper, J., Caballero, J., Hajnal, J.V., Price, A.N., Rueckert, D., 2017. A deep cascade of convolutional neural networks for dynamic mr image reconstruction. IEEE Transactions on Medical Imaging 37, 491–503.
Shi, Y., Zhang, J., Ling, T., Lu, J., Zheng, Y., Yu, Q., Qi, L., Gao, Y., 2021. Inconsistency-aware uncertainty estimation for semi-supervised medical image segmentation. IEEE Transactions on Medical Imaging 41, 608–620.
Sidky, E.Y., Pan, X., 2022. Report on the aapm deep-learning sparse-view ct grand challenge. Medical Physics 49, 4935–4943.
Song, J., Meng, C., Ermon, S., 2020. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502.
Song, Y., Shen, L., Xing, L., Ermon, S., 2021. Solving inverse problems in medical imaging with score-based generative models. arXiv preprint arXiv:2111.08005.
Wen, D., Nye, K., Zhou, B., Gilkeson, R.C., Gupta, A., Ranim, S., Couturier, S., Wilson, D.L., 2018. Enhanced coronary calcium visualization and detection from dual energy chest x-rays with sliding organ registration. Computerized Medical Imaging and Graphics 64, 12–21.
Wolleb, J., Sandkühler, R., Bieder, F., Valmaggia, P., Cattin, P.C., 2022. Diffusion models for implicit image segmentation ensembles, in: International Conference on Medical Imaging with Deep Learning, PMLR. pp. 1336–1348.
Wu, W., Hu, D., Niu, C., Yu, H., Vardhanabhuti, V., Wang, G., 2021. Drone: Dual-domain residual-based optimization network for sparse-view ct reconstruction. IEEE Transactions on Medical Imaging 40, 3002–3014.
Xie, H., Gan, W., Zhou, B., Chen, X., Liu, Q., Guo, X., Guo, L., An, H., Kamilov, U.S., Wang, G., et al., 2023. Dose-aware diffusion model for 3d ultra low-dose pet imaging. arXiv preprint arXiv:2311.04248.
Yang, Q., Li, N., Zhao, Z., Fan, X., Chang, E.I.C., Xu, Y., 2020. Mri cross-modality image-to-image translation. Scientific Reports 10, 3753.
Yang, W., Chen, Y., Liu, Y., Zhong, L., Qin, G., Lu, Z., Feng, Q., Chen, W., 2017. Cascade of multi-scale convolutional neural networks for bone suppression of chest radiographs in gradient domain. Medical Image Analysis 35, 421–433.
Yue, Z., Wang, J., Loy, C.C., 2024. Resshift: Efficient diffusion model for image super-resolution by residual shifting. Advances in Neural Information Processing Systems 36.
Zbontar, J., Knoll, F., Sriram, A., Murrell, T., Huang, Z., Muckley, M.J., Defazio, A., Stern, R., Johnson, P., Bruno, M., et al., 2018. fastmri: An open dataset and benchmarks for accelerated mri. arXiv preprint arXiv:1811.08839.
Zhang, Z., Liang, X., Dong, X., Xie, Y., Cao, G., 2018. A sparse-view ct reconstruction method based on combination of densenet and deconvolution. IEEE Transactions on Medical Imaging 37, 1407–1417.
Zhou, B., Chen, X., Xie, H., Zhou, S.K., Duncan, J.S., Liu, C., 2022a. Dudoufnet: Dual-domain under-to-fully-complete progressive restoration network for simultaneous metal artifact reduction and low-dose ct reconstruction. IEEE Transactions on Medical Imaging 41, 3587–3599.
Zhou, B., Chen, X., Zhou, S.K., Duncan, J.S., Liu, C., 2022b. Dudodr-net: Dual-domain data consistent recurrent network for simultaneous sparse view and metal artifact reduction in computed tomography. Medical Image Analysis 75, 102289.
Zhou, B., Hou, J., Chen, T., Zhou, Y., Chen, X., Xie, H., Liu, Q., Guo, X., Tsai, Y.J., Panin, V.Y., et al., 2024. Pour-net: A population-prior-aided over-under-representation network for low-count pet attenuation map generation. arXiv preprint arXiv:2401.14285.
Zhou, B., Lin, X., Eck, B., Hou, J., Wilson, D., 2019. Generation of virtual dual energy images from standard single-shot radiographs using multi-scale and conditional adversarial network, in: Computer Vision–ACCV 2018, Springer. pp. 298–313.
Zhou, B., Zhou, S.K., 2020. Dudornet: Learning a dual-domain recurrent network for fast mri reconstruction with deep t1 prior, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4273–4282.
Zhou, B., Zhou, S.K., Duncan, J.S., Liu, C., 2021. Limited view tomographic reconstruction using a cascaded residual dense spatial-channel attention network with projection data fidelity layer. IEEE Transactions on Medical Imaging 40, 1792–1804.