CycleGAN Models for MRI Image Translation
Cassandra Czobit and Reza Samavi
Abstract
Image-to-image translation has gained popularity in the medical field to trans-
form images from one domain to another. Medical image synthesis via domain
transformation is advantageous in its ability to augment an image dataset where
images for a given class are limited. From the learning perspective, this process
contributes to data-oriented robustness of the model by inherently broadening
the model’s exposure to more diverse visual data and enabling it to learn more
generalized features. In the case of generating additional neuroimages, it is advan-
tageous to obtain unidentifiable medical data and augment smaller annotated
datasets. This study proposes the development of a CycleGAN model for trans-
lating neuroimages from one field strength to another (e.g., 3 Tesla to 1.5 Tesla). This
model was compared to a model based on DCGAN architecture. CycleGAN was
able to generate the synthetic and reconstructed images with reasonable accu-
racy. The mapping function from the source (3 Tesla) to target domain (1.5 Tesla)
performed optimally with an average PSNR value of 25.69 ± 2.49 dB and an
MAE value of 2106.27 ± 1218.37.
1 Introduction
The field of image-to-image translation is increasingly prominent in enhancing the
data-oriented robustness of machine learning methods, particularly in the realm of
medical imaging where it facilitates the translation of images across different domains.
This technique proves invaluable in medical image synthesis through domain conver-
sion, especially useful when there is a scarcity of images for certain classes, which
also aids in safeguarding patient privacy. From the learning perspective, such translation processes improve model robustness through data augmentation, exposing the model to a wider array of visual data. Thus, the approach helps foster the development of more generalized feature recognition capabilities. From a medical imaging perspective, the technique is particularly instrumental for mild traumatic brain injuries (mTBIs), which are routinely difficult to diagnose due to a lack of quantitative assessments Aoki, Inokuchi, Gunshin, Yahagi, and Suwa (2012).
Magnetic resonance imaging (MRI) is universally recognized as a non-invasive technology for obtaining images for disease diagnosis and prognosis You et al.
(2022). Diffusion tensor imaging (DTI) has become a promising diagnostic tool as an
MRI-based technology to evaluate the organization of white matter in neuroimaging
Aoki et al. (2012). These tools, with machine learning, can be used to characterize,
monitor and generate predictions on disease progression, while increasing efficacy and
efficiency in patient care Hosny, Parmar, Quackenbush, Schwartz, and Aerts (2018).
The optimal field strength for assessing DTI images has been frequently debated.
The magnetic field strength of DTI neuroimages ranges from 0.2-0.5 Tesla (T) for low
field images to 7T and above for high field imaging You et al. (2022). A field strength
of 1.5T or 3T is predominantly selected for clinical MRI procedures, due to the high signal-to-noise ratio (SNR) and resolution Campbell-Washburn et al. (2019), Marques, Simonis, and Webb (2019). Higher field strengths have traditionally been viewed as advantageous due to an increase in image contrast and resolution Bahrami et al. (2016), Hori, Hagiwara, Goto, Wada, and Aoki (2021). However, DTI at a field
strength of 0.5T is considered advantageous for assessing head trauma, due to the
reduction of geometric distortions and susceptibility artifacts Campbell-Washburn et al. (2019), Wiens, Harris, Curtis, Beatty, and Stainsby (2020).
Generative Adversarial Network (GAN) models have gained an increasing amount
of popularity for medical image generation, reconstruction, and classification Yi, Walia,
and Babyn (2019), Kazeminia et al. (2020). GAN models, such as Deep Convolutional
GAN, are primarily used as a method for augmenting datasets. A GAN is composed of a generator and a discriminator: the generator creates synthetic images from a learned probability distribution, whereas the discriminator classifies samples as real or fake Goodfellow et al. (2014). The generator draws an input vector z from a Gaussian distribution to create synthetic samples, G(z), mapping the noise to a pseudo-sample distribution Lan et al. (2020). The ideal discriminator maximizes the probability of assigning the correct label to each sample Goodfellow et al. (2014). Translation GAN models,
such as CycleGAN, function similarly by using two GAN networks to generate target
images from a source image. The creation of realistic synthetic medical images offers
a solution to alleviate privacy concerns relating to diagnostic imaging and usage of
individuals’ medical data Yi et al. (2019).
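For reference, the original adversarial objective of Goodfellow et al. (2014) can be written as the two-player minimax game

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))],$$

where the discriminator D maximizes the probability of labeling real and synthetic samples correctly while the generator G minimizes it.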
The primary purpose of this investigation is to demonstrate a proof-of-concept for
translating neuroimages between two field strengths in the MRI image domain. We
assess two models: one for data transformation between two field strengths, and one for data augmentation of images within the same field strength. We develop a CycleGAN model for data
transformation from a source domain to the target domain (i.e., 3T to 1.5T). Our sec-
ond model aims to augment the 3T and 1.5T datasets by generating synthetic images
from the same field. Our empirical evaluation shows that the CycleGAN model is able
to translate images within the same modality, whereas the DCGAN model produces
synthetic images of poor quality.
2 Related Work
GANs were proposed as a mechanism to generate images that approximate the distribution of the input data Goodfellow et al. (2014). Thus, GANs have emerged
as a powerful tool for generating synthetic data, which can be particularly useful
for data augmentation in training deep neural networks (DNNs). The usage of GANs for this purpose has
been explored in various research studies and applications. For example, Boursalie,
Samavi, and Doyle (2021) investigated GAN-based imputation models to address data
imbalance and metrics to evaluate such models in contrast to the statistical imputation
models.
In image-to-image translation, adaptations of the original GAN model have
addressed issues surrounding convergence of the models and mode collapse Yi
et al. (2019). A deep convolutional GAN model (DCGAN) was created to over-
come issues regarding the training stability by implementing fully convolutional
upsampling/downsampling layers Yi et al. (2019), Radford, Metz, and Chintala (2015).
Paired image translation requires data from both modalities Yi et al. (2019). A
study by Emami et al., explored paired image translation from T1-weighted input MRI
images to generate synthetic CT scans Emami, Dong, Nejad-Davarani, and Glide-
Hurst (2018). The conditional GAN model involved a residual network as the generator
and a CNN for the discriminator. A model by Bahrami et al., constructed a Canonical
Correlation Analysis space from paired 7T and 3T MRI images, where the extracted
patches were mapped to a common space to show greater structural detail in the
segmentation of white and gray brain matter Bahrami et al. (2016). Unpaired image translation offers more flexibility than paired translation in instances where paired data or an abundance of labelled data is not available. A CycleGAN model was developed to
utilize unpaired images, without a task-specific similarity function, for image transla-
tion Zhu, Park, Isola, and Efros (2017). The proposed mapping approach coupled the
adversarial loss with inverse mapping, along with a cycle consistency loss function, to
translate the images from domain X to Y, and vice versa Zhu et al. (2017). Hiasa et
al., implemented a CycleGAN model for cross-modality image translation between CT
and MRI scans and adapted the loss function to capture variation of intensity pairs
in the images Hiasa et al. (2018).
The proposed study differs from prior work in that the input and output images are within the same MRI modality, but at varying field strengths. Previous investiga-
tions have translated images across modalities (e.g., MRI to CT images), but have not
addressed the translation of field strengths within a single modality. This translation aims to visualize the MRI image at a different level of anatomical granularity.
3 CycleGAN Image Translation Model
The models discussed in the following sections were developed for the purpose of image-to-image translation from 3T to 1.5T to generate synthetic neuroimages. The CycleGAN consists of two mapping functions, G and F, trained together with discriminators $D_X$ and $D_Y$ through the objective

$$G^*, F^* = \arg\min_{G, F}\, \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y)$$

Zhu et al. (2017). Section 3.1 details how the data was
obtained, section 3.2 describes the architecture of the CycleGAN model, and section
3.3 describes the DCGAN network for data augmentation.
3.1 Data
Two field strengths, 1.5T and 3T, were selected as part of this study for DTI image
translation. For the purpose of this study, these field strengths were selected to demon-
strate a proof of concept for the CycleGAN and DCGAN models. DTI images at a
field strength of 0.5T are not widely available for public use. For both datasets, 70%
of the images were assigned for training purposes, with the remaining 30% for test-
ing. The scale of both datasets is small due to limited open-source resources for the
required field strengths.
The 3T images were collected by the University of North Carolina-CH and were
made available on the Kitware Data open-source platform [1]. The scans were acquired
on a 3T Siemens MR unit. The DTI images were obtained using 6 directions with 2mm
x 2mm x 2mm isotropic resolution (voxel size). The 35 axial cross-sectional images
were obtained from individuals in age groups from 18-29, 30-39, 40-49, 50-59, and 60+
years. From the 35 patient scans, ten slices were obtained from each brain volume, for
a total of 350 slices.
The 1.5T scans were obtained on the Philips Intera MRI scanner from the SCA2
Diffusion Tensor Imaging dataset on the OpenNeuro platform [2]. Diffusion-weighted images were collected for 16 individuals over 2 sessions, one year apart. Diffusion gradients were applied in 15 directions across 50 axial slices, with a slice thickness of 3 mm. From each scan, ten slices were obtained to represent the 2-dimensional DTI data, for a total of 16 images and 160 slices.
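As an illustration, the slice-extraction step might be implemented as follows; the file layout, NIfTI format, use of nibabel, and the evenly spaced slice indices are assumptions standing in for the study's exact procedure.

```python
# A hedged sketch of the slice-extraction step described above. Paths and
# slice-selection rule are hypothetical stand-ins, not the study's exact code.
import glob

import nibabel as nib
import numpy as np

def extract_slices(volume_path, n_slices=10):
    """Load one volume (assumed 3D) and return evenly spaced axial slices."""
    volume = nib.load(volume_path).get_fdata()
    z_indices = np.linspace(0, volume.shape[2] - 1, n_slices).astype(int)
    return [volume[:, :, z] for z in z_indices]

# Ten 2D slices per brain volume (e.g., 35 volumes x 10 slices = 350 for 3T).
slices = [s for path in sorted(glob.glob("data/3T/*.nii.gz"))  # hypothetical path
          for s in extract_slices(path)]

n_train = int(0.7 * len(slices))  # 70% training / 30% testing split
train_slices, test_slices = slices[:n_train], slices[n_train:]
```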
3.2 CycleGAN
The proposed architecture for the CycleGAN model consists of two sequential models
that operate in a forward and backward cycle for images in the source and target
domains. Each model contains a generator and a discriminator component. The complete CycleGAN network translates an image in the source domain to the target domain, where the source represents the 3T images and the target contains the 1.5T images.
Within this network, the mapping functions between the two domains are trained
for the two data distributions. Two mapping functions are defined as G: X → Y
and F: Y → X, where X is the training sample in the source domain and Y is the
training sample in the target domain Zhu et al. (2017). The full objective couples an adversarial loss for each mapping function's discriminator with a cycle consistency loss, as shown below.
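Concretely, the complete objective from Zhu et al. (2017) combines the two adversarial terms with a weighted cycle consistency term:

$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X) + \lambda\, \mathcal{L}_{\text{cyc}}(G, F),$$

where the cycle consistency loss penalizes the reconstruction error of both cycles:

$$\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big].$$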
[1] https://data.kitware.com/#collection/591086ee8d777f16d01e0724/folder/58a372fa8d777f0721a64dfa
[2] https://openneuro.org/datasets/ds001378/versions/00003
In the discriminator model, the CNN model architecture classifies the images and
is identical for each of the mapping functions, G and F. The discriminator receives
images of size 256 x 256 from the generator and source domain as the input. We
use a 70 x 70 PatchGAN method for patch predictions. The PatchGAN classifies each N x N patch of pixels as real or fake and averages the scores across the entire image Zhu et al. (2017). This approach reduces the number of parameters compared to a full-image
discriminator Zhu et al. (2017). The discriminator consists of 5 convolutional layers with stride 2; the number of filters per layer is 64, 128, 256, 512 and 1 for layers 1 to 5, respectively. Layers 1-4 use a 4x4 kernel, each followed by a LeakyReLU activation function with a slope of 0.2. Instance normalization is applied to layers 2, 3 and 4.
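A minimal sketch of this discriminator is given below, assuming a TensorFlow/Keras implementation (the paper does not name its framework) and single-channel 256 x 256 inputs; the padding choice and the use of GroupNormalization with groups=-1 (which is equivalent to instance normalization) are assumptions.

```python
# Hedged Keras sketch of the 70x70 PatchGAN discriminator described above.
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(input_shape=(256, 256, 1)):
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    # Layers 1-4: 4x4 convolutions with stride 2 and LeakyReLU (slope 0.2).
    for i, filters in enumerate([64, 128, 256, 512]):
        x = layers.Conv2D(filters, kernel_size=4, strides=2, padding="same")(x)
        if i > 0:
            # GroupNormalization with groups=-1 normalizes each channel
            # independently, i.e., instance normalization (layers 2-4).
            x = layers.GroupNormalization(groups=-1)(x)
        x = layers.LeakyReLU(0.2)(x)
    # Layer 5: a single-filter convolution produces the patch-wise real/fake
    # score map; the mean over patches scores the whole image.
    patch_scores = layers.Conv2D(1, kernel_size=4, strides=2, padding="same")(x)
    return tf.keras.Model(inputs, patch_scores, name="patchgan_discriminator")
```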
The generator architecture is constructed as an encoder-decoder network with a
residual network, ResNet, to transform the input images. The encoder portion of
the network downsamples the image, and consists of 3 convolutional layers, with a
ReLU activation function, followed by instance normalization. The ResNet receives
the output of the encoder as its input. The ResNet contains 9 residual blocks, which create skip connections between layers: each block feeds a layer's output both to the immediately succeeding layer and to a layer 2-3 steps ahead He, Zhang, Ren, and Sun (2015). From the ResNet, the output is passed to the decoder, where upsampling is performed using two deconvolution blocks with fractional stride values to restore the output to the original input size.
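Under the same assumptions as the discriminator sketch, the encoder-ResNet-decoder generator might look as follows; the 7x7 and 3x3 kernel sizes and the tanh output activation follow the standard CycleGAN design of Zhu et al. (2017) rather than details stated in the text.

```python
# Hedged Keras sketch of the CycleGAN generator described above.
import tensorflow as tf
from tensorflow.keras import layers

def resnet_block(x, filters=256):
    """Residual block: two 3x3 convolutions plus a skip connection."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.GroupNormalization(groups=-1)(y)  # instance normalization
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.GroupNormalization(groups=-1)(y)
    return layers.Add()([shortcut, y])  # skip connection two layers ahead

def build_generator(input_shape=(256, 256, 1), n_resnet=9):
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    # Encoder: three convolutional layers downsample the input image.
    for filters, kernel, stride in [(64, 7, 1), (128, 3, 2), (256, 3, 2)]:
        x = layers.Conv2D(filters, kernel, strides=stride, padding="same")(x)
        x = layers.GroupNormalization(groups=-1)(x)
        x = layers.ReLU()(x)
    # Transformer: nine residual blocks.
    for _ in range(n_resnet):
        x = resnet_block(x)
    # Decoder: two fractionally strided (transposed) convolutions restore
    # the original 256 x 256 resolution.
    for filters in [128, 64]:
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same")(x)
        x = layers.GroupNormalization(groups=-1)(x)
        x = layers.ReLU()(x)
    outputs = layers.Conv2D(input_shape[-1], 7, padding="same",
                            activation="tanh")(x)
    return tf.keras.Model(inputs, outputs, name="cyclegan_generator")
```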
The complete composite model of the generator and discriminator is compiled for
both mapping functions, and trained for 50 epochs, with a batch size of 4. The Adam
optimizer is selected with a learning rate of 0.0002 and a momentum value of 0.5 Zhu
et al. (2017). We trained the complete model from scratch.
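A possible configuration of this optimizer is sketched below, interpreting the momentum value as Adam's beta_1 parameter, a common convention; the framework choice is again an assumption.

```python
# Hedged sketch of the optimizer configuration for both composite models.
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5)
```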
3.3 DCGAN
The DCGAN model consists of a generator and discriminator model for each field
strength. This model generates synthetic images from a singular source domain for the
respective field strength (i.e., a separate model for 3T and 1.5T). For both DCGAN
models, the generator and discriminator architectures are constructed in the same
manner. The DCGAN architecture was adapted from Radford et al. Radford et al.
(2015).
The discriminator architecture contains 3 convolutional layers with a 4x4 kernel
and stride 2 in each layer for downsampling. For each layer, the LeakyReLU activa-
tion function is used, followed by a dropout rate of 0.4. The convolutional layers are
flattened, before being passed to a Dense layer, with the sigmoid activation function.
The generator is composed of an initial Dense layer, followed by 3 transpose convolutional layers. The Dense layer projects its input to a low-resolution representation of the image and is followed by the LeakyReLU activation function with alpha equal to 0.2. The trans-
pose convolution layers have filters of 256, 128 and 64 respectively with a 4x4 kernel
size. The LeakyReLU activation function with an alpha value of 0.2 is applied to each
layer. The final layer is a convolutional layer with a 3x3 kernel and uses the sigmoid
activation function.
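A minimal Keras sketch of this generator is given below. The latent dimension, the 8x8 starting resolution, and the 64x64 output size are assumptions; the "low-resolution representation" is read here as the standard DCGAN projection of a noise vector to a small feature map.

```python
# Hedged Keras sketch of the DCGAN generator described above.
import tensorflow as tf
from tensorflow.keras import layers

def build_dcgan_generator(latent_dim=100):
    model = tf.keras.Sequential(name="dcgan_generator")
    model.add(tf.keras.Input(shape=(latent_dim,)))
    # Dense layer projects the latent vector to a low-resolution feature map.
    model.add(layers.Dense(256 * 8 * 8))
    model.add(layers.LeakyReLU(0.2))
    model.add(layers.Reshape((8, 8, 256)))
    # Three transposed convolutions (256, 128, 64 filters, 4x4 kernels)
    # upsample 8x8 -> 64x64, each followed by LeakyReLU with alpha 0.2.
    for filters in [256, 128, 64]:
        model.add(layers.Conv2DTranspose(filters, kernel_size=4, strides=2,
                                         padding="same"))
        model.add(layers.LeakyReLU(0.2))
    # Final 3x3 convolution with a sigmoid produces the synthetic image.
    model.add(layers.Conv2D(1, kernel_size=3, padding="same",
                            activation="sigmoid"))
    return model
```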
The complete model is compiled and trained for 50 epochs, with a batch size of 4
and uses the binary cross-entropy loss function. The model is trained with the Adam
optimizer, using a learning rate of 0.0002 and momentum of 0.05.
3.4 Evaluation
Several statistical measures were used to evaluate the performance of the GAN variant models. Evaluation measures include mean absolute error (MAE), mean
squared error (MSE) and peak signal-to-noise ratio (PSNR) Wolterink et al. (2017),Li
et al. (2021). To evaluate the image quality, we compared the synthesized 3T and 1.5T
neuroimages to the real, ground truth images.
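A small NumPy sketch of these metrics is shown below; the intensity range assumed for PSNR (data_range) is an assumption and must match the image encoding used.

```python
# Hedged sketch of the evaluation metrics MAE, MSE, and PSNR.
import numpy as np

def mae(real, fake):
    return np.mean(np.abs(real.astype(np.float64) - fake.astype(np.float64)))

def mse(real, fake):
    return np.mean((real.astype(np.float64) - fake.astype(np.float64)) ** 2)

def psnr(real, fake, data_range=255.0):
    err = mse(real, fake)
    if err == 0:
        return np.inf  # identical images
    return 10.0 * np.log10((data_range ** 2) / err)
```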
4.2 DCGAN
The DCGAN model was evaluated for the 1.5T image synthesis. The synthesized
images are shown in Figure A1 in Appendix A and are evaluated based on a visual
inspection and image quality metrics. Visually, it appears that the model was unable
to generate a diverse set of examples. From 1000 randomly generated synthetic images,
the average PSNR and MAE values were 6.20 ± 0.11 dB and 31907.44 ± 415.85,
respectively. The MSE was calculated as 0.24 ± 0.0059. This quantitative assessment
demonstrates that the model was not able to generate high-quality synthetic images.
This type of instability is an indicator of mode collapse, wherein the generator cannot
create a diverse set of images Goodfellow (2016), Kodali, Abernethy, Hays, and Kira
(2017). As a result, the discriminator only learns from a small subset of images Kodali
et al. (2017). Increasing the size of the training set could be helpful to mitigate low
accuracy results Lan et al. (2020).
We then evaluated the quality of synthesized 3T scans generated by the DCGAN
model. This was performed for comparison purposes with the backward cycle consis-
tency loss function of the CycleGAN model. From a qualitative perspective, the images
are able to replicate some details, however, they lack clarity overall. These images are
shown in Figure A2 of Appendix A. Quantitatively, evaluation metrics were calculated
from 1000 synthesized images. The 3T DCGAN model generated images with a higher
PSNR value than the 1.5T counterpart, with an average of 8.86 ± 0.76 dB.
4.4 Research Limitations
The findings of this investigation are limited by the small dataset for each domain. As
opposed to selecting a fixed range of ten image slices from each DTI scan, a future study could input slices of several different sizes from each patient scan to increase the variability
of training images for the CycleGAN and DCGAN models.
5 Conclusions
The purpose of this investigation was to determine if the CycleGAN model could be
applied within a single modality to translate MRI images between two field strengths.
This investigation demonstrated that the application of this model has potential, bar-
ring some modifications to the dataset. In contrast, the DCGAN model failed to
produce realistic synthetic images for both domains.
Future investigations could implement an additional model to review the quality
of the images. The creation of a classification model could play a role in determining
if the synthesized images can be correctly assigned to the desired category Yi
et al. (2019). This addition would provide more concrete evidence for the performance
of the CycleGAN model. Expanding the experiment to include additional datasets and larger volumes of data would also strengthen the current findings.
Acknowledgments. We would like to thank Dr. Michael Noseworthy and Nicholas
Simard for the initial discussions on current trends and limitations in medical imaging
and motivating this problem.
References
Aoki, Y., Inokuchi, R., Gunshin, M., Yahagi, N., Suwa, H. (2012, 9).
Diffusion Tensor Imaging Studies of Mild Traumatic Brain Injury: a
Meta-Analysis. Journal of Neurology, Neurosurgery & Psychiatry, 83 (9),
870–876, https://doi.org/10.1136/JNNP-2012-302742 Retrieved from
https://jnnp.bmj.com/content/83/9/870
Bahrami, K., Shi, F., Zong, X., Shin, H.W., An, H., Shen, D. (2016, 9). Reconstruction
of 7T-Like Images from 3T MRI. IEEE Transactions on Medical Imaging, 35 (9),
2085–2097, https://doi.org/10.1109/TMI.2016.2549918
Boursalie, O., Samavi, R., Doyle, T.E. (2021). Evaluation Metrics for Deep Learning
Imputation Models. 35th AAAI Conference on Artificial Intelligence (AAAI-21) - 5th International Workshop on Health Intelligence (pp. 309–322). Springer.
Emami, H., Dong, M., Nejad-Davarani, S.P., Glide-Hurst, C.K. (2018, 8).
Generating Synthetic CTs from Magnetic Resonance Images Using Genera-
tive Adversarial Networks. Medical Physics, 45(8), 3627–3636, https://doi.org/10.1002/MP.13047
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., . . .
Bengio, Y. (2014, 6). Generative Adversarial Networks. Communications of
the ACM , 63 (11), 139–144, https://doi.org/10.1145/3422622 Retrieved from
https://arxiv.org/abs/1406.2661v1
He, K., Zhang, X., Ren, S., Sun, J. (2015, 12). Deep Residual Learn-
ing for Image Recognition. Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, 2016-December ,
770–778, https://doi.org/10.48550/arxiv.1512.03385 Retrieved from
https://arxiv.org/abs/1512.03385v1
Hiasa, Y., Otake, Y., Takao, M., Matsuoka, T., Takashima, K., Carass, A.,
. . . Sato, Y. (2018). Cross-modality Image Synthesis from Unpaired
Data Using CycleGAN: Effects of Gradient Consistency Loss and Train-
ing Data Size. Lecture Notes in Computer Science (including subseries
Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformat-
ics), 11037 LNCS, 31–41, https://doi.org/10.1007/978-3-030-00536-8_4
Hori, M., Hagiwara, A., Goto, M., Wada, A., Aoki, S. (2021, 11). Low-Field Magnetic
Resonance Imaging: Its History and Renaissance. Investigative radiology, 56 (11),
669–679, https://doi.org/10.1097/RLI.0000000000000810
Hosny, A., Parmar, C., Quackenbush, J., Schwartz, L.H., Aerts, H.J. (2018). Artificial
Intelligence in Radiology (Vol. 18) (No. 8). Retrieved from www.nature.com/nrc
Kazeminia, S., Baur, C., Kuijper, A., van Ginneken, B., Navab, N., Albarqouni, S.,
Mukhopadhyay, A. (2020, 9). GANs for Medical Image Analysis. Artificial
Intelligence in Medicine, 109, 101938, https://doi.org/10.1016/J.ARTMED.2020.101938
Kodali, N., Abernethy, J., Hays, J., Kira, Z. (2017, 5). On Convergence and
Stability of GANs. arXiv preprint, https://doi.org/10.48550/arxiv.1705.07215 Retrieved from https://arxiv.org/abs/1705.07215v5
Lan, L., You, L., Zhang, Z., Fan, Z., Zhao, W., Zeng, N., . . . Zhou, X. (2020, 5).
Generative Adversarial Networks and Its Applications in Biomedical Informat-
ics. Frontiers in Public Health, 8, 164, https://doi.org/10.3389/FPUBH.2020.00164
Li, X., Jiang, Y., Rodriguez-Andina, J.J., Luo, H., Yin, S., Kaynak, O. (2021,
9). When Medical Images Meet Generative Adversarial Network: Recent
Development and Research Opportunities. Discover Artificial Intelligence,
1 (1), 1–20, https://doi.org/10.1007/s44163-021-00006-0 Retrieved from
https://link.springer.com/article/10.1007/s44163-021-00006-0
Marques, J.P., Simonis, F.F., Webb, A.G. (2019, 6). Low-field MRI:
An MR Physics Perspective. Journal of Magnetic Resonance
Imaging, 49(6), 1528–1542, https://doi.org/10.1002/JMRI.26637
Wiens, C.N., Harris, C.T., Curtis, A.T., Beatty, P.J., Stainsby, J.A. (2020). Fea-
sibility of Diffusion Tensor Imaging at 0.5T. Toronto. Retrieved from
https://archive.ismrm.org/2020/0303.html
Wolterink, J.M., Dinkla, A.M., Savenije, M.H., Seevinck, P.R., van den Berg,
C.A., Išgum, I. (2017, 8). Deep MR to CT Synthesis Using Unpaired
Data. International Workshop on Simulation and Synthesis in Medical Imaging (Vol. 10557 LNCS, pp. 14–23). Springer Verlag. Retrieved from
https://arxiv.org/abs/1708.01155v1
Yi, X., Walia, E., Babyn, P. (2019, 12). Generative Adversarial Network in Medical
Imaging: A Review. Medical Image Analysis, 58, 101552, https://doi.org/10.1016/j.media.2019.101552
You, S., Lei, B., Wang, S., Chui, C.K., Cheung, A.C., Liu, Y., . . . Shen, Y. (2022). Fine
Perceptive GANs for Brain MR Image Super-Resolution in Wavelet Domain.
IEEE Transactions on Neural Networks and Learning Systems, https://doi.org/10.1109/TNNLS.2022.3153088
Zhu, J.Y., Park, T., Isola, P., Efros, A.A. (2017). Unpaired Image-to-Image Trans-
lation Using Cycle-Consistent Adversarial Networks. Proceedings of the IEEE International Conference on Computer Vision (pp. 2242–2251). Retrieved from
https://github.com/junyanz/CycleGAN.
Appendix A
Fig. A1: Generation of Synthetic 1.5T DTI Scans After 50 Epochs