
A Data Augmentation Pipeline to Generate Synthetic Labeled Datasets of
3D Echocardiography Images using a GAN
Cristiana Tiago
GE Vingmed Ultrasound – GE Healthcare, Horten, Norway
[email protected]

Andrew Gilbert
GE Vingmed Ultrasound – GE Healthcare, Horten, Norway

Ahmed S. Beela
CARIM School for Cardiovascular Diseases, Maastricht University Medical Center, Maastricht, The Netherlands

Svein Arne Aase
GE Vingmed Ultrasound – GE Healthcare, Horten, Norway

Sten Roar Snare
GE Vingmed Ultrasound – GE Healthcare, Horten, Norway

Jurica Sprem
GE Vingmed Ultrasound – GE Healthcare, Horten, Norway

Kristin McLeod
GE Vingmed Ultrasound – GE Healthcare, Horten, Norway

ABSTRACT
Due to privacy issues and the limited amount of publicly available labeled datasets in the
domain of medical imaging, we propose an image generation pipeline to synthesize 3D
echocardiographic images with corresponding ground truth labels, to alleviate the need
for data collection and for laborious and error-prone human labeling of images for
subsequent Deep Learning (DL) tasks. The proposed method utilizes detailed anatomical
segmentations of the heart as ground truth label sources. This initial dataset is combined
with a second dataset made up of real 3D echocardiographic images to train a Generative
Adversarial Network (GAN) to synthesize realistic 3D cardiovascular Ultrasound images
paired with ground truth labels. To generate the synthetic 3D dataset, the trained GAN
uses high resolution anatomical models from Computed Tomography (CT) as input. A
qualitative analysis of the synthesized images showed that the main structures of the heart
are well delineated and closely follow the labels obtained from the anatomical models. To
assess the usability of these synthetic images for DL tasks, segmentation algorithms were
trained to delineate the left ventricle, left atrium, and myocardium. A quantitative
analysis of the 3D segmentations given by the models trained with the synthetic images
indicated the potential use of this GAN approach to generate 3D synthetic data, use the
data to train DL models for different clinical tasks, and therefore tackle the problem of
scarcity of 3D labeled echocardiography datasets.

Keywords: 3D image generation ∙ data augmentation ∙ deep learning ∙ echocardiography ∙ generative adversarial networks ∙ segmentation

1 Introduction

Medical imaging plays a crucial role in optimizing treatment pathways. Saving time in diagnosis and treatment planning enables clinicians to focus on more complicated cases.
Many modalities are used to image the heart, such as Computed Tomography (CT), Magnetic
Resonance (MR), and Ultrasound imaging, enabling several structural and functional parameters
related to the organ’s performance to be estimated. Such parameters are the basis of clinical
guidelines for diagnosis and treatment planning.
Echocardiography is the specific use of ultrasound to image the heart. This imaging modality is
widely used given its advantages of being portable, relatively low-cost, and the fact that the use
of ionizing radiation is not required.
Deep Learning (DL), and specifically Convolutional Neural Networks (CNNs), have become
extensively applied in medical image analysis because they facilitate the automation of many
tedious clinical tasks and workflows such as estimation of ejection fraction, for example. These
algorithms are capable of approaching human-level performance (Asch et al., 2019), thus
potentially saving clinicians’ time without decreasing the quality of care for patients. In fact,
clinicians agree that using DL algorithms in the clinical workflow also improves patient access
to disease diagnoses, increasing the final diagnosis confidence levels (Scheetz et al., 2021). DL
models can be developed to perform numerous medical tasks such as image classification, segmentation, and even region/structure detection (Aljuaid and Anwar, 2022).
Echocardiography images can be acquired both in 2D and 3D. Time can also be taken into
account, generating videos. 3D echocardiography images can be more difficult to assess than 2D
images. However, for some specific application cases, 3D image acquisition brings great
advantages since it can offer more accurate and reproducible measurements. One such case is
ventricle and atrium volumes (Pérez de Isla et al., 2009). Among the causes of the lack of annotated 3D echocardiography datasets are the greater complexity of acquiring 3D echocardiography images and the fact that 3D is still not part of all routine echocardiography exams. Moreover, even when 3D images are recorded, delineating the structures in them is a challenging, time-consuming, and observer-dependent task. Combined with increasingly strict privacy regulations governing access to medical data, these factors explain the scarcity of publicly available datasets of this type; an approach able to address this scarcity is therefore needed. This lack of 3D medical data, together with the large amount of high-quality annotated data that DL models require, hinders the development of such
algorithms and therefore the scientific and technological development of the 3D medical imaging
field. Synthetic generation of labeled 3D echocardiography images is a DL based approach that
provides a solution for this problem.
Synthetic data can help in the development of DL models for image analysis (Shin et al., 2018)
and accurate labeling of these images. Furthermore, this approach works as a data augmentation
strategy by generating additional data. Creating datasets that combine real and synthetic images and using them to train algorithms that tackle medical challenges is known to be a successful solution to the image scarcity problem (Chen et al., 2021). Synthetic images also increase the heterogeneity of these datasets, improving the performance of the trained models by exposing them to a larger variety of images.
Generative Adversarial Networks (GANs) are specific DL architectures that create models
capable of generating medical images closely resembling real images acquired from patients.
These deep generative models rely on a generator and a discriminator. While the straightforward
GAN discriminator distinguishes between real and fake, i.e., generated, images, the generator not
only attempts to deceive the discriminator but also tries to minimize the difference between the
generated image and the ground truth.
The generated synthetic images can even be paired with labels, facilitating the acquisition of large labeled datasets and eliminating the need for manual annotation, and therefore the observer-associated variability (Chuquicusma et al., 2018) that largely influences the final output (Thorstensen et al., 2010). 3D heart models are a great source of anatomical labels since
they capture accurate information about the organ’s structures (Banerjee et al., 2016). Different
types of models can be used for this purpose, such as animated models, biophysical models, or
even anatomical models obtained from different imaging modalities (Segars et al., 2010), (Kainz
et al., 2019). Recently, CT models were used as label sources to generate 2D echocardiography
(Gilbert et al., 2021) and cardiac MR images (Roy et al., 2019), proving the utility of GANs for
this task.
Developing a pipeline to generate synthetic data using GANs to create labeled datasets addresses
the immense need for the large volume of data that DL algorithms require during training to
perform an image analysis task, eliminates the need to acquire the images from subjects, and
saves experienced professionals the time they would otherwise spend annotating them, as the anatomical labels can be extracted from anatomical models. Usually, when developing such generative models, imaging artifacts are present and visible in the synthetically generated images. This common GAN drawback is addressed here by applying image post-processing operations (Perperidis, 2016) to the synthetically generated 3D echocardiography images.
In practice, synthetic images can be used to train DL models because they represent a good data augmentation strategy (Chai et al., 2021). For instance, 3D medical image segmentation is one of the most common medical tasks to which DL is applied.
Labeled datasets made of real images combined with synthetic ones, which even include the
respective anatomical labels, become the training dataset for 3D DL models, addressing the
problem of sparse 3D medical data availability (Lustermans et al., 2022).

1.1 State of the Art

DL has become widely used in medical imaging due to its potential in image segmentation,
classification, reconstruction, and synthesis across all imaging modalities. Image synthesis has
been a research topic for a few decades now; some of the more conventional approaches use human-defined rules and assumptions, such as shape priors (Pedrosa et al., 2017). These conventional techniques are also tied to the specific imaging modality being considered. To overcome these shortcomings, CNNs are now becoming a widely used approach for image synthesis across many medical imaging modalities.
Many reasons motivate medical image generation, both 2D and 3D. Generative algorithms can
perform domain translation, with a large applicability when converting images from one imaging
modality to a different one, as (Uzunova et al., 2020) showed in their work converting 3D MR
and CT brain images. GANs can also be used to generate a ground truth for a given input, as
these DL models can be trained in a cyclic way, as is the case of the CycleGAN (Zhu et al.,
2017), for example. Additionally, the generation of synthetic training data for DL algorithms motivates the application and development of GAN architectures. Several research groups have generated medical images using this methodology as a data augmentation tool, although most of this work was developed in a 2D scenario and focused on a few imaging modalities, mainly MRI and CT. These modalities raise fewer challenges than Ultrasound due to the physics behind the acquisition process.
Ultrasound images have an inherent and characteristic speckle pattern and their quality is largely
influenced by the scanner, the sonographer, and the patient anatomy. When it comes to
generating 3D Ultrasound images a few more challenges arise, with the speckle pattern having to
be consistent throughout the whole volume being the main one. The anatomical information
present in the generated volume also has to hold this consistency feature.
(Huo et al., 2019) trained a 2D GAN model, SynSegNet, on CT images and unpaired MR labels
using a CycleGAN. Similarly, (Gilbert et al., 2021) proposed an approach to synthesize labeled
2D echocardiography images, using anatomical models and a CycleGAN as well. The
CycleGAN was proposed by (Zhu et al., 2017) and works under an unpaired scenario: the images
from one training domain do not have to be related with the images belonging to the other
domain. This GAN learns how to map images from one domain to the other and vice-versa. The paired counterpart of this GAN is Pix2pix, proposed by (Isola et al., 2017): it also translates images between the two domains, but the images belonging to the training domains are paired.
As mentioned, 3D echocardiographic data is sparser, but these images can be generated using
GANs, and then used to train new algorithms. Both (Gilbert et al., 2021) and (Amirrajab et al.,
2020) investigated the potential use of GAN synthesized datasets to train CNNs to segment
different cardiac structures on different imaging modalities, but these methods were limited to
2D.
(Hu et al., 2017) attempted to generate 2D fetal Ultrasound scan images at certain 3D spatial
locations. They concluded that common GAN training problems such as mode collapse occur.
(Abbasi-Sureshjani et al., 2020) developed a method to generate 3D labeled Cardiac MR images
relying on CT anatomical models to obtain labels for the synthesized images, using a SPADE
GAN (Park et al., 2019). More recently, (Cirillo et al., 2020) adapted the original Pix2pix model
to generate 3D brain tumor segmentations.

When dealing with medical images, U-Net (Ronneberger et al., 2015) is a widely used CNN
model to perform image segmentation, for example, since it provides accurate delineation of
several structures on these images. More recently, (Isensee et al., 2021) proposed nnU-Net (“no
new net”), which automatically adapts to any new datasets and enables accurate segmentations.
nnU-Net can be trained on a 3D scenario and optimizes its performance to new unseen datasets
and different segmentation tasks, requiring no human intervention.
Existing work to address the challenges of automatic image recognition, segmentation, and
tracking in echocardiography has been mostly focused on 2D imaging. In particular, recent work
indicates the potential for applying DL approaches to accurately perform measurements in
echocardiography images. (Alsharqi et al., 2018) and (Østvik et al., 2018) used a DL algorithm
to segment the myocardium in 2D echocardiographic images, from which the regional motion,
and from this the strain, were measured. They showed that motion estimation using CNNs is
applicable to echocardiography, even when the networks are trained with synthetic data. This
work supports the hypothesis that similar approaches could also work for 3D synthetic data.
A large amount of work has been carried out on medical image generation, and it still represents a challenge for the research community. To the best of our knowledge, no reproducible results exist for synthesizing 3D echocardiography images using GANs; we therefore propose a framework able to address this need.

1.2 Contributions
We propose an approach for synthesizing 3D echocardiography images paired with
corresponding anatomical labels suitable as input for training DL image analysis tasks. Thus, the
main contributions of the proposed pipeline beyond the state of the art include:
1. The extension of the work of (Gilbert et al., 2021) from 2D to 3D, adapting it from an unpaired
to a paired framework (3D Pix2pix) and proposing an automatic pipeline to generate any
number of 3D echocardiography images, tackling the lack of public 3D echocardiography
datasets and corresponding labels.
2. The creation of a blueprint of heart models and post-processing methods for optimal
generation of 3D synthetic data, creating a generic data augmentation tool and thereby
addressing the lack of 3D data generation work in echocardiography, which differs
significantly from the 2D case.
3. The demonstration of the usability of these synthetic datasets for training segmentation
models that achieve high performance when applied to real images.

2 Methodology

The proposed pipeline is summarized in Fig. 1 and described in the following sections. Section 2.1 describes the preprocessing stage, in which the GAN training images were annotated to create anatomical labels. The training and inference stages are addressed in Section 2.2, which describes how the GAN model was trained and used to synthesize 3D echocardiography images from CT-based anatomical labels, and how different post-processing approaches, described in Section 2.3, were applied to these synthetic images. Next, in Section 2.4, details regarding the
creation of several synthetic datasets used to train 3D segmentation models are given, followed by Section 2.5, where the influence of adding real images to the synthetic training datasets for segmentation models is assessed.

Figure 1: 3D echocardiography image generation pipeline and inference results. Step 1: during the preprocessing stage, a set of
15 3D heart volumes were labeled by a cardiologist and anatomical labels for the LV, LA and MYO were generated. To train the
3D Pix2pix GAN model, the anatomical labels are paired together with the corresponding real 3D images. Step 2: at inference
time, the GAN model generates one 3D image. An example obtained during this stage is shown. The proposed method is able to
generate physiologically realistic images, giving correct structural features and image details. Step 3: to show the utility of the
synthetic datasets, 3D segmentation models were trained using these GAN generated images (black arrow), but other DL tasks
can be addressed.

2.1 Data Collection


To train the 3D image synthesis model, an annotated dataset was needed, since this GAN setup works in a supervised scenario where two sets of images are used for training: a set containing real 3D echocardiography images and a second set with the corresponding anatomical labels manually created by a cardiologist (see Fig. 1, training stage).
To create the dataset of real 3D echocardiography images, images were acquired from normal subjects at one time point of the cardiac cycle, end-diastole in this work, when the left ventricle (LV) volume is at its largest. Fifteen heart volumes were acquired using GE Vivid Ultrasound scanners.
The second set of images was made up of the anatomical labels corresponding to each of the 3D
real images included in the set previously described. Each anatomical label image contains the
label for the LV, left atrium (LA), and the myocardium (MYO).
To annotate the 3D echocardiography images, a cardiologist certified by the American National Board of Echocardiography, with more than 10 years of experience, used the V7 annotation tool (“V7 | The AI Data Engine for Computer Vision & Generative AI,” n.d.) and
contoured the three aforementioned structures (Fig. 1, preprocessing stage) on each of the
volumes. These contours were then post-processed, applying a spline function to the contour points and resampling them, in order to generate gray-scale labeled images. All the 3D images in each training dataset were sized to 256 x 256 x 32.
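The contour-to-label post-processing step can be sketched as follows. This is a minimal illustration assuming `scipy.interpolate` for the spline fit; the authors' exact spline settings and resampling density are not specified in the text:

```python
import numpy as np
from scipy.interpolate import splev, splprep

def resample_contour(points, n_samples=200):
    """Fit a periodic cubic spline to a closed contour and resample it densely.

    `points` is an (N, 2) array of annotated contour points; the dense contour
    can then be rasterized into a gray-scale label image.
    """
    closed = np.vstack([points, points[:1]])  # close the contour explicitly
    # s=0 interpolates through the points; per=True makes the spline periodic.
    tck, _ = splprep([closed[:, 0], closed[:, 1]], s=0, per=True)
    u = np.linspace(0.0, 1.0, n_samples, endpoint=False)
    xs, ys = splev(u, tck)
    return np.stack([xs, ys], axis=1)

# Example: a coarse 8-point circle resampled to 200 smooth points.
theta = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
coarse = np.stack([np.cos(theta), np.sin(theta)], axis=1)
dense = resample_contour(coarse)
print(dense.shape)  # (200, 2)
```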

2.2 3D GAN Training


The Pix2pix model was proposed by (Isola et al., 2017) as a solution to image-to-image translation across different imaging domains. This model generates an output image for each input image by learning a mapping function between the two training domains. The Pix2pix model works as a conditional paired GAN: given two training domains containing paired images, it learns how to generate new instances of each domain. The loss function was kept the same as presented in the original work – a combination of the conditional GAN loss and the L1 distance. This conditions the GAN, ensuring that the information in the generated output image matches the information provided by the input.
The original work was constructed under a 2D scenario; in this work, an extension to 3D was performed by changing the original architecture of the Pix2pix model.
We considered different architectures for the GAN generator, and a 3D U-Net (Çiçek et al., 2016) was used to create a 3D version of the Pix2pix model. The discriminator architecture was kept the same, replacing only the 2D layers with the corresponding 3D ones. During training of the GAN, data augmentation operations, including blurring and rotation, were performed on the fly, increasing the number of 3D volumes used without the memory burden of storing them.
The 3D Pix2pix model used here was built using PyTorch (Paszke et al., n.d.) and was trained for 200 epochs, accounting for image size and computational memory constraints, with an initial learning rate of 0.0002 and the Adam optimizer.
At inference time, a common problem in image synthesis is the presence of checkerboard artifacts in the generated images. To tackle this problem, which decreases the quality of the synthesized images, we changed the generator architecture as suggested in (Odena et al., 2016), replacing the transposed convolutions in the upsampling layers of the 3D U-Net with linear upsampling layers.
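The architectural change can be sketched in PyTorch: a transposed-convolution upsampling block versus the resize-then-convolve block suggested by (Odena et al., 2016). The channel counts and kernel sizes below are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

# Transposed convolution: prone to checkerboard artifacts when the kernel
# size is not evenly divided by the stride.
deconv_up = nn.ConvTranspose3d(64, 32, kernel_size=4, stride=2, padding=1)

# Artifact-resistant alternative: resize first, then apply a plain convolution.
resize_up = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False),
    nn.Conv3d(64, 32, kernel_size=3, padding=1),
)

x = torch.randn(1, 64, 8, 16, 16)  # (batch, channels, depth, height, width)
print(deconv_up(x).shape)   # torch.Size([1, 32, 16, 32, 32])
print(resize_up(x).shape)   # torch.Size([1, 32, 16, 32, 32])
```

Both blocks double every spatial dimension, so one can be swapped for the other inside the 3D U-Net decoder without changing the surrounding layers.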
To generate a synthetic echocardiography image for each of the inference cases, i.e., 3D CT-based heart models, the generator part of the GAN, which translates images from the anatomical label domain to the echocardiography-looking image domain, was used.
Anatomical models of the heart (Rodero et al., 2021) obtained from CT were used to create the
inference gray scale labeled images, containing anatomical information about the LV, LA, and
MYO. The main objective of this work was then accomplished by using the GAN as a data
augmentation tool to generate synthetic datasets of 3D echocardiography images of size 256 x
256 x 32 from these inference images, augmenting the quantity of 3D echocardiographic image
data.

2.3 Synthetic Data Post-processing

During the post-processing stage, two different algorithms were evaluated on the synthetic images generated by the GAN. The synthesized images were (a) filtered using the discrete wavelet transform, following the work of (Yadav et al., 2015), and (b) masked with an Ultrasound cone. The wavelet denoising operation uses wavelets that localize features in the data, preserving important image features while removing unwanted noise, such as checkerboard artifacts. An image mask representing the Ultrasound cone shape was applied to all synthesized images in order to match true Ultrasound data.
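Both post-processing steps can be sketched as follows, assuming `scikit-image` for the wavelet denoising; the cone half-angle and the filter defaults are illustrative assumptions, since the paper does not specify them:

```python
import numpy as np
from skimage.restoration import denoise_wavelet

def ultrasound_cone_mask(height, width, half_angle_deg=37.5):
    """Binary sector mask with its apex at the top-centre of each slice.

    The half-angle is an illustrative value, not taken from the paper.
    """
    rows, cols = np.mgrid[0:height, 0:width]
    dy = rows.astype(float)                 # depth below the apex
    dx = cols.astype(float) - width / 2.0   # lateral offset from the centreline
    angle = np.degrees(np.arctan2(np.abs(dx), dy + 1e-9))
    return angle <= half_angle_deg

def postprocess_volume(volume):
    """Wavelet-denoise a synthetic volume, then mask it with an Ultrasound cone."""
    denoised = denoise_wavelet(volume)      # suppresses checkerboard-like noise
    mask = ultrasound_cone_mask(*volume.shape[:2])
    return denoised * mask[:, :, None]      # broadcast the 2D mask over slices

vol = np.random.default_rng(0).random((64, 64, 8))
out = postprocess_volume(vol)
print(out.shape)  # (64, 64, 8)
```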

2.4 3D Segmentation
The GAN pipeline was able to generate labeled instances of 3D echocardiography images, as the model is capable of performing paired domain translation operations. To investigate the utility of the synthetic images, four 3D segmentation models were trained using the generated synthetic images as the training set.
The trained model architecture for the 3D segmentation task was the 3D nnU-Net (Isensee et al.,
2021). This network architecture was proposed as a self-adapting framework for medical image
segmentation. This DL model adapts its training scheme, such as the loss function or slight
variations on the model architecture, to the dataset being used and to the segmentation task being
performed. It automates necessary adaptations to the dataset, such as preprocessing, patch and batch size, and inference settings, without the need for user intervention.
To train the first of four 3D segmentation models, MSynthetic, described in this section, a labeled
dataset made of 27 synthetically generated 3D echocardiography images (256 x 256 x 32),
DSynthetic, was used. This dataset was obtained from the proposed 3D GAN pipeline at inference
time, using anatomical labels from 27 CT 3D anatomical models.
To evaluate the effect of the post-processing operations on the synthesized images, three other
datasets were created – DWavelet, DCone, and DWaveletCone – and three additional segmentation
models were trained using these – MWavelet, MCone, and MWaveletCone, respectively (Fig. 2). DWavelet consisted of the original synthetic images from the DSynthetic dataset after applying the wavelet denoising post-processing algorithm, and DCone consisted of the original synthetic images after the cone reshaping post-processing operation. Finally, a fourth dataset, DWaveletCone, was created by applying both post-processing transformations – wavelet denoising and cone reshaping – to the original synthetic images. All four datasets contained 27
3D echocardiography images with corresponding anatomical labels for the LV, LA and MYO.
All four 3D segmentation models, MSynthetic, MWavelet, MCone, and MWaveletCone, using nnU-Net, were trained in a 5-fold cross-validation scenario for 800 epochs. The initial learning rate was 0.01, and the segmentation models were also built using PyTorch (Paszke et al., n.d.). The loss function was a combination of Dice and cross-entropy losses, as described in the original work by (Isensee et al., 2021).
Dice scores were used to assess the quality of the segmentations. This score measures the
overlap between the predicted segmentation and the ground truth label extracted from the CT
anatomical models. For each segmented structure the Dice score obtained at validation time is a
value between 0 and 1, where the latter represents a perfect overlap between the prediction and
the ground truth.
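The Dice score described above reduces to a few lines over binary masks, 2|A∩B| / (|A| + |B|):

```python
import numpy as np

def dice_score(pred, truth):
    """Dice overlap between two binary masks: 1.0 means a perfect match."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom

a = np.zeros((4, 4, 4), dtype=bool); a[:2] = True   # 32 voxels
b = np.zeros((4, 4, 4), dtype=bool); b[1:3] = True  # 32 voxels, 16 shared
print(dice_score(a, b))  # 0.5
```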

2.5 Real Data Combined with Synthetic – Data Augmentation


In their work, (Lustermans et al., 2022) showed that adding real data to GAN-generated synthetic datasets can help improve the training of DL models.
To facilitate a clearer analysis of the influence of using synthetic data to train DL models and the
utility of this GAN as a data augmentation tool, three other segmentation models were trained on
the datasets DReal, D17Real10Augmented, and D17Real20Augmented. DReal contained 17 real 3D
echocardiography volumes acquired with GE Vivid Ultrasound scanners and labeled by a
cardiologist.
D17Real10Augmented and D17Real20Augmented were made up of the same 17 real volumes just described, together with 10 and 20 synthetic GAN-generated 3D echocardiography images, respectively, allowing the influence of using such images during DL model training to be assessed (Fig. 2).
The 3D segmentation models trained on these datasets were MReal, M17Real10Augmented, and M17Real20Augmented, respectively. All models used the nnU-Net architecture implemented with PyTorch. Similarly to the models described in Section 2.4, they were trained for 800 epochs in a 5-fold cross-validation scenario, with the same learning rate and loss function.
At inference time, a test set including real 3D echocardiography images was segmented by the
three aforementioned models. To compare the segmentation results with the ones obtained from
a cardiologist, Dice scores and Volume Similarity (VS) were calculated and used as comparison
metrics. VS compares the sizes of the segmented structures and is highly relevant in a 3D scenario, where the Dice score alone has some limitations. Like the Dice score, this evaluation metric takes values between 0 and 1, but it is not overlap-based. Instead, it is a volume-based parameter in which the absolute volume of a region in one segmentation is compared with the volume of the corresponding region in the other segmentation (Taha and Hanbury, 2015).
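Following (Taha and Hanbury, 2015), VS can be computed from the segmented volumes alone, VS = 1 − |V_pred − V_truth| / (V_pred + V_truth); the sketch below also shows why it complements Dice, since two equal-sized masks score VS = 1 even with no overlap:

```python
import numpy as np

def volume_similarity(pred, truth):
    """Volume Similarity: compares only the sizes of the two segmentations."""
    v_pred = pred.astype(bool).sum()
    v_truth = truth.astype(bool).sum()
    if v_pred + v_truth == 0:
        return 1.0  # both masks empty: identical volumes
    return 1.0 - abs(int(v_pred) - int(v_truth)) / (v_pred + v_truth)

a = np.zeros((4, 4, 4), dtype=bool); a[:2] = True  # 32 voxels
b = np.zeros((4, 4, 4), dtype=bool); b[2:] = True  # 32 voxels, zero overlap
print(volume_similarity(a, b))  # 1.0 (equal volumes despite no overlap)
```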

Figure 2: Overview of all the datasets created and models trained in this work. The generative model, 3D Pix2pix, was trained in order to be used to generate synthetic 3D echocardiography datasets. The resulting dataset, DSynthetic, was post-processed by applying different transformations, creating 3 other datasets – DWavelet, DCone, and DWaveletCone. A fifth dataset made entirely of real images, DReal, was created, and synthetic images from DSynthetic were added to it, creating D17Real10Augmented and D17Real20Augmented. All
these 7 datasets were used to train 7 3D segmentation models – MSynthetic, MWavelet, MCone, MWaveletCone, MReal, M17Real10Augmented, and
M17Real20Augmented.

3 Results

This work’s results are presented as follows: Section 3.1 focuses on the GAN training, the architectural modifications performed on the 3D Pix2pix model, and their influence on the synthesized images. Section 3.2 shows the influence of post-processing the synthetic images. Finally, Sections 3.3 and 3.4 show the segmentation predictions from the several models trained on the different 3D echocardiography datasets (Fig. 2), as described in Sections 2.4 and 2.5.

3.1 GAN Architecture and Training


The chosen GAN architecture influenced the final results. The 3D U-Net was chosen as the generator architecture due to its good performance in the medical image domain. The model was trained on an NVIDIA GeForce RTX 2080 Ti GPU and training took five days.
After applying the architectural changes described in Section 2.2 to remove the checkerboard artifacts, the artifacts became less visible or even disappeared. However, this correction introduced unwanted blurring in the generated images (Fig. 3); therefore, the deconvolution layers were used instead of upsampling, and the synthesized images were post-processed to remove the checkerboard artifacts.

Figure 3: Influence of architectural changes on the GAN generator to remove checkerboard artifacts. At inference time, a 3D
anatomical model was used to extract the anatomical labels. The first column shows 2 different slices of this volume at different
rotation angles. The middle column shows that synthesizing images using a GAN with upsampling layers smoothens the
checkerboard artifacts but introduces blurring, which is not visible on the images when using a GAN with deconvolution layers
(right column). Deconvolution layers are preferred to upsampling ones.

3.2 Synthetic Data Post-processing

After training the 3D GAN model and generating the synthetic images corresponding to the input anatomical models, as described in Section II-C, the obtained 3D echocardiography images were post-processed to remove the aforementioned checkerboard artifacts. In some cases, the cone edges were also slightly wavy. The post-processing experiment, in which different transformations were applied to the synthesized images, showed that these operations give the GAN-generated images a more realistic appearance while keeping the anatomical information intact (Fig. 4).
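The specific wavelet denoising transformation applied here is not detailed in this section (Yadav et al., 2015, for instance, use symlet wavelets and filters). Purely as a sketch of the underlying idea, a single-level 3D Haar transform with soft thresholding of the detail subbands could look like the following, where the threshold value is an assumed tuning parameter:

```python
import numpy as np

def _haar_fwd(x, axis):
    """Single-level orthonormal Haar split along one axis."""
    x = np.moveaxis(x, axis, 0)
    a, b = x[0::2], x[1::2]
    lo = (a + b) / np.sqrt(2)   # approximation coefficients
    hi = (a - b) / np.sqrt(2)   # detail coefficients
    return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

def _haar_inv(lo, hi, axis):
    """Inverse of _haar_fwd: interleave reconstructed samples."""
    lo = np.moveaxis(lo, axis, 0)
    hi = np.moveaxis(hi, axis, 0)
    out = np.empty((2 * lo.shape[0],) + lo.shape[1:])
    out[0::2] = (lo + hi) / np.sqrt(2)
    out[1::2] = (lo - hi) / np.sqrt(2)
    return np.moveaxis(out, 0, axis)

def wavelet_denoise3d(vol, thresh):
    """Single-level 3D Haar denoising: soft-threshold every detail
    subband, keep the pure approximation subband (LLL) untouched."""
    bands = {"": vol.astype(float)}
    for ax in range(3):                       # forward transform, axis by axis
        bands = {k + t: s
                 for k, v in bands.items()
                 for t, s in zip("LH", _haar_fwd(v, ax))}
    for k, v in bands.items():                # shrink detail coefficients
        if k != "LLL":
            bands[k] = np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)
    for ax in (2, 1, 0):                      # inverse transform
        bands = {k: _haar_inv(bands[k + "L"], bands[k + "H"], ax)
                 for k in {key[:-1] for key in bands}}
    return bands[""]
```

The approximation subband is left untouched so the anatomy is preserved, while the high-frequency subbands carrying the checkerboard pattern are shrunk; this sketch requires even volume dimensions along each axis.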

Figure 4: 3D Pix2pix model inference results and post-processing steps. At inference time, the anatomical labels were extracted from a 3D heart model. The first column shows 3 different cutting planes of this volume at different rotation angles. After generating the corresponding synthetic ultrasound image (second column) for this inference case, it was post-processed by applying a wavelet denoising transformation to eliminate the checkerboard artifacts (third column) and a cone reshaping step to smooth the wavy edges of the ultrasound cone (fourth column). The post-processing operations give a more realistic look to the synthesized images, as indicated by the enlarged areas framed in red and green (wavelet denoise) and the white arrows (cone reshape).

3.3 Segmentation from Synthetic Datasets


Anatomical models were used to synthesize 27 3D echocardiography images. These were then used to create the synthetic datasets used to train 3D segmentation algorithms, as described in Section II-D. Post-processing operations were performed on these images to create the DWavelet, DCone, and DWaveletCone datasets. Table 1 shows the average Dice scores (average ± standard deviation) of each segmented structure (LV, LA, and MYO) for each trained model – MSynthetic, MWavelet, MCone, and MWaveletCone – obtained from the validation dataset. Training took around five days for each fold, over all epochs, using an NVIDIA GeForce RTX 2080 GPU. The complete table with all the Dice scores obtained for each training fold of each model can be found in Appendix A – Table 5.


In addition to the Dice scores, and to support the usability of synthetic images for training segmentation algorithms, Fig. 5 shows the 3D segmentation of an inference 3D echocardiography image acquired from a real subject. Each trained segmentation model was tested on real cases at inference time.
Table 1: Average validation dice scores (average ± standard deviation) of each segmented structure (LV, LA, and MYO) for each
trained model on completely synthetic datasets – MSynthetic, MWavelet, MCone, and MWaveletCone. The best scores are highlighted.

Structure | MSynthetic    | MWavelet      | MCone         | MWaveletCone
LV        | 0.926 ± 0.006 | 0.927 ± 0.005 | 0.926 ± 0.006 | 0.924 ± 0.008
LA        | 0.818 ± 0.011 | 0.816 ± 0.010 | 0.816 ± 0.021 | 0.814 ± 0.016
MYO       | 0.808 ± 0.016 | 0.808 ± 0.017 | 0.803 ± 0.018 | 0.801 ± 0.023

Figure 5: Inference segmentation results from each model trained on synthetic datasets. On the left, a schematic representation of the heart is shown with 2 cutting planes corresponding to a real 3D echocardiography image from the test set: the 4-chamber (CH) view, with blue frame, and the 2-CH view, with red frame. On the right follow the LV, LA, and MYO segmentation results provided by each of the 4 segmentation models: a) MSynthetic, b) MWavelet, c) MCone, and d) MWaveletCone. A qualitative analysis of the segmentation results from each of the models shows that the one whose training data was not post-processed, MSynthetic, gives the best output due to a smoother segmentation of the relevant structures.

3.4 Segmentation from Combined Datasets


Table 2 shows the average Dice scores (average ± standard deviation), obtained at validation time, of each segmented structure (LV, LA, and MYO) for each model trained on the combined datasets: MReal, M17Real10Augmented, and M17Real20Augmented. The complete table with all the Dice scores for each training fold of all three models can be found in Appendix A – Table 4.
Table 2: Average validation dice scores (average ± standard deviation) of each segmented structure (LV, LA, and MYO) for each
trained model on combined datasets – MReal, M17Real10Augmented, and M17Real20Augmented. The best scores are highlighted.

Structure | MReal         | M17Real10Augmented | M17Real20Augmented
LV        | 0.938 ± 0.008 | 0.928 ± 0.006      | 0.927 ± 0.007
LA        | 0.862 ± 0.023 | 0.830 ± 0.016      | 0.826 ± 0.017
MYO       | 0.724 ± 0.028 | 0.767 ± 0.027      | 0.763 ± 0.025

Fig. 6 shows the segmentations predicted by these trained models, next to the ground truth segmentation provided by a cardiologist. The models were tested on a test set made of 3D echocardiography images from real subjects. To compare the output segmentations of the DL models, the Dice scores and VS were calculated between the predicted segmentations and the anatomical labels from a cardiologist; the results are in Table 3.
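Both comparison metrics can be computed directly from binary masks; a minimal sketch using the standard definitions (Volume Similarity as in Taha and Hanbury, 2015):

```python
import numpy as np

def dice_score(pred, gt):
    """Dice = 2|A ∩ B| / (|A| + |B|) for boolean masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum())

def volume_similarity(pred, gt):
    """VS = 1 - |V_pred - V_gt| / (V_pred + V_gt): compares segmented
    volumes only, ignoring where the overlap actually lies."""
    vp, vg = pred.astype(bool).sum(), gt.astype(bool).sum()
    return 1.0 - abs(int(vp) - int(vg)) / (vp + vg)

# Toy masks: equal volumes (32 voxels each) but only half overlap.
gt = np.zeros((4, 4, 4), dtype=bool)
gt[:, :2, :] = True
pred = np.zeros_like(gt)
pred[:, 1:3, :] = True
# dice_score(pred, gt) -> 0.5 ; volume_similarity(pred, gt) -> 1.0
```

The toy example shows why VS alone is not sufficient: the two masks have identical volumes (VS = 1.0) yet only half overlap (Dice = 0.5), so VS is used here strictly as a complementary metric.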

Figure 6: Inference segmentation results from the models trained on datasets augmented with synthetic images. On the left, a schematic representation of the heart is shown with 2 cutting planes corresponding to a real 3D echocardiography image from the test set: the 4-CH view, with blue frame, and the 2-CH view, with red frame. On the right follow the LV, LA, and MYO segmentation results provided by the 3 segmentation models: a) MReal, b) M17Real10Augmented, and c) M17Real20Augmented. To allow comparison and measurement of the Dice score and VS, d) shows the ground truth segmentation performed by a cardiologist. A qualitative analysis of the segmentation results from each of the models shows that combining synthetic with real data improves the segmentation output due to a more accurate segmentation of the relevant structures.


Table 3: Average test set dice scores (average ± standard deviation) of each segmented structure (LV, LA, and MYO) and Volume
Similarity of the segmented volume for the MReal, M17Real10Augmented, and M17Real20Augmented models. The best scores are highlighted.

                  | MReal         | M17Real10Augmented | M17Real20Augmented
Dice score
LV                | 0.924 ± 0.019 | 0.929 ± 0.020      | 0.922 ± 0.017
LA                | 0.876 ± 0.023 | 0.874 ± 0.020      | 0.867 ± 0.022
MYO               | 0.666 ± 0.041 | 0.708 ± 0.053      | 0.680 ± 0.063
Volume Similarity
Heart Volume      | 0.831 ± 0.038 | 0.844 ± 0.047      | 0.836 ± 0.041

4 Discussion

In this work we built a pipeline to generate synthetic 3D labeled echocardiography images using a GAN model. These realistic-looking synthetic datasets were used to train 3D DL models to segment the LV, LA, and MYO.
Moreover, combined datasets including synthetic and real 3D images were created, with the VS metric supporting the use of generated 3D echocardiography images as data augmentation when training DL models. Segmentation tasks were considered to exemplify the utility of the synthesized data; however, the pipeline is generic and could be applied to generate other imaging data and to train any DL task that takes anatomical labels as input, as further discussed in this section. A brief discussion of future applications and modifications of this approach is also presented.

4.1 3D Pix2pix GAN - Qualitative Analysis


The pipeline synthesizes 3D echocardiographic datasets with corresponding labels delineating
different structures in the images.
After training the 3D Pix2pix GAN model, a qualitative analysis of the synthesized images
indicated that the main structures of the heart were well delineated in the generated images (Fig.
1, inference stage). Moreover, image details such as the cone, noise, and speckle patterns are also
present and are continuous throughout each volume.

4.2 Post-processing and 3D Segmentation - Synthetic Datasets


To evaluate the utilization of synthetic images for research purposes and the extent to which the
post-processing transformations affected the final results, four segmentation models were trained
using four different datasets, as described earlier in Section III-C.
Despite the very small differences in the Dice scores shown in Table 1 and in Appendix A – Table 5, the inference segmentations (Fig. 5) support the idea that the model trained on the dataset whose images were not post-processed, MSynthetic, provided the best segmentation prediction.
The results regarding the influence of the post-processing step on the synthetically generated images indicate that applying a wavelet denoising transformation, a cone reshaping step, or both to these images, in order to make them look even more realistic, does not necessarily lead to better results when segmenting the LV, LA, and MYO (Fig. 5). This outcome depends on the DL task being performed: the segmented structures occupy large portions of the 3D volume relative to its whole content, so the subtle voxel-intensity differences that create the checkerboard artifacts do not seem to affect the predictions of the segmentation model.
To create the synthetic datasets, CT-acquired 3D anatomical models of the heart were used to extract the anatomical labels and create the input cases for the 3D GAN. The segmentation results and the echocardiography-like appearance of the synthetic images point towards the generalization of this pipeline, as it can synthesize 3D echocardiography images using different types of 3D heart models as the label source. The methodology to generate synthetic datasets can be generalized to other modalities, diseases, and organs, as well as to structures within the same organ (sub-regions of the heart, for example).
(Shin et al., 2018) and (Shorten and Khoshgoftaar, 2019) showed that GANs can be widely used to perform data augmentation of medical image datasets. The work from these authors, together with the presented results, supports the main contribution of this work: GANs can be used to generate synthetic images with labels, acting as a data augmentation strategy and tackling the scarcity of labeled 3D echocardiography datasets, especially when some data samples are underrepresented in the available real datasets.

4.3 3D Segmentation - Combined Datasets


Further results on the usage of synthetic datasets were explored and presented in Section III-D. Here, three datasets made of GAN-generated and real 3D images were used to train additional segmentation models and to further evaluate the influence of real data in these datasets, as demonstrated in (Lustermans et al., 2022).
Fig. 6 a), b), and c) show the anatomical segmentations of the LV, LA, and MYO predicted by the best trained fold of each model – MReal, M17Real10Augmented, and M17Real20Augmented. From the qualitative analysis, the segmentations delineate the anatomical structures well throughout the whole volume. At the same time, and similarly to what was discussed in Section IV-B, the average Dice scores presented in Table 2 led to the conclusion that combining real images with synthetic ones leads to more accurate final segmentations.
From the obtained results it is also possible to assess the influence of using combined datasets with different percentages of synthetic data. Table 2 and Table 3 show that adding synthetic data to the initial real dataset improves the 3D segmentation of real 3D echocardiography images. They also show that adding larger amounts of synthetic data does not improve the results to a large extent.


Fig. 6 d) shows the ground truth segmentation of the inference case, performed by a cardiologist. From the ground truth segmentations available for all the cases in the test set, the Dice scores and the VS in Table 3 were calculated.
Given the 3D nature of the task and the limitations of the Dice metric, the VS was additionally calculated and used as a comparison metric. In particular, M17Real10Augmented performed best when the Dice score was considered as the performance metric, and both models trained on combined datasets surpassed MReal in terms of VS. These results show that the models trained on the combined datasets, i.e., with real and synthetic images, provided more accurate segmentation outputs (the 3D volume) relative to the model trained with only real data, MReal. The results support the previous work by (Lustermans et al., 2022), confirming that adding synthetic data to datasets made of real data improves the final outcome of the DL models.
Additionally, this result reinforces that the proposed pipeline, relying on a 3D GAN model, can be used as a data augmentation tool. This framework arises as a solution to the lack of publicly available labeled medical datasets.

4.4 Further Applications


The presented pipeline has the potential to be further explored. As the demand for medical images increases, the proposed approach can be extended to synthesize images from imaging modalities other than ultrasound, such as MR or CT. It can also generate images depicting other organs, or even fetuses (Roy et al., 2019).
Another extension of this work would be to use different types of 3D models from which ground truth anatomical labels could be extracted. Besides anatomical models, animated or biophysical models are other options that can be considered. The usage of anatomical models of pathological hearts is another possible extension, in order to generate pathological 3D echocardiography scans. Depending on the type of 3D model being considered, different annotations can be extracted, increasing the number of clinically relevant tasks where these synthetic datasets can be used.
The generated 3D echocardiography images illustrate a heart volume during one time step of the cardiac cycle (end-diastole). It would be of great interest to generate 3D images of the heart during other cardiac cycle events, and even to generate a beating volume over time, as high temporal resolution is one of the main strengths of ultrasound imaging. On the other hand, a limitation of ultrasound image generation is that different scanner and probe combinations lead to the acquisition of images with different quality levels. This large variability makes the GAN learning process more complex.
In this work we explored, to an extent, the effects that architectural changes to the GAN model have on the final synthesized images. We used different architectures for the GAN generator, but many more 3D CNNs exist and new ones appear regularly. These can be used to train the generative models, since DL strategies are becoming extremely common as medical image synthesis and analysis tools. Once the images were synthesized, we used wavelet denoising and an in-house developed algorithm to fix the ultrasound cone edges. However, other denoising transformations and cone reshaping algorithms could be tried to post-process the images.
We trained several DL models to perform 3D segmentation to show that synthesized images can
be used as input to train DL models. Nevertheless, the pipeline is generic and could be applied to
other DL tasks that automatically assign anatomical labels to images, e.g., structure/feature
recognition or automatic structural measurements. Furthermore, the GAN-generated labeled
datasets are not only useful as input to train DL models but also could be used to train
researchers and clinicians on image analysis.
Finally, during the development of this pipeline, computational memory constraints were present, mainly due to the large size of the 3D volumes, complicating the development of a framework adapted to them. Future work will include studying strategies to overcome these limitations.

5 Conclusion

An automatic data augmentation pipeline to create 3D echocardiography images and corresponding anatomical labels using a 3D GAN model was proposed. DL models are becoming widely used in clinical workflows, and large volumes of medical data are a fundamental requirement to develop such algorithms with high accuracy. Generating synthetic data that can be used to train DL models is of utmost importance, since this generative model can become a widely used tool to address the existing lack of publicly available data and the increasing challenges of moving data due to privacy regulations. Furthermore, the proposed methodology not only generates synthetic 3D echocardiography images but also associates labels with these synthetic images, eliminating the need for experienced professionals to do so and without adding potential bias to the labels.
The proposed GAN model shows a generalization component, since it can generate synthetic echocardiography images using 3D anatomical models of the heart obtained from imaging modalities other than ultrasound.
The results obtained in this work indicate that synthetic datasets made up of GAN-generated 3D echocardiography images and their respective labels are a good data augmentation resource to train and develop DL models for different medical tasks in the cardiac imaging domain, such as heart segmentation, where real patients’ data are analyzed.

Acknowledgements

This work was supported by the European Union’s Horizon 2020 research and innovation
programme under the Marie – Skłodowska – Curie grant agreement No 860745.

Appendix A

See Table 4 and Table 5.


Table 4: Validation dice scores of each segmented structure (LV, LA, and MYO) for each trained model on combined datasets -
MReal, M17Real10Augmented, and M17Real20Augmented. The higher the score, the better the agreement between the model prediction and the
ground truth segmentation. The best training fold of each model is highlighted.

MReal (folds 1–5):
  LV   0.933  0.932  0.950  0.930  0.943
  LA   0.837  0.869  0.873  0.837  0.896
  MYO  0.710  0.699  0.766  0.697  0.750

M17Real10Augmented (folds 1–5):
  LV   0.924  0.929  0.919  0.938  0.930
  LA   0.830  0.841  0.820  0.808  0.853
  MYO  0.745  0.745  0.779  0.815  0.735

M17Real20Augmented (folds 1–5):
  LV   0.917  0.928  0.935  0.934  0.923
  LA   0.831  0.841  0.838  0.829  0.793
  MYO  0.715  0.771  0.766  0.780  0.785

Table 5: Validation dice scores of each segmented structure (LV, LA, and MYO) for each trained model on completely synthetic
datasets – MSynthetic, MWavelet, MCone, and MWaveletCone. The higher the score, the better the agreement between the model prediction
and the ground truth segmentation. The best training fold of each model is highlighted.

MSynthetic (folds 1–5):
  LV   0.924  0.930  0.924  0.918  0.934
  LA   0.837  0.831  0.809  0.810  0.807
  MYO  0.824  0.822  0.794  0.784  0.816

MWavelet (folds 1–5):
  LV   0.927  0.930  0.923  0.919  0.934
  LA   0.821  0.831  0.806  0.815  0.805
  MYO  0.829  0.822  0.791  0.785  0.814

MCone (folds 1–5):
  LV   0.926  0.930  0.921  0.918  0.933
  LA   0.837  0.842  0.805  0.811  0.787
  MYO  0.824  0.819  0.783  0.780  0.809

MWaveletCone (folds 1–5):
  LV   0.928  0.928  0.914  0.914  0.935
  LA   0.834  0.832  0.803  0.793  0.807
  MYO  0.828  0.813  0.775  0.773  0.816

References

Abbasi-Sureshjani, S., Amirrajab, S., Lorenz, C., Weese, J., Pluim, J., Breeuwer, M., 2020. 4D Semantic Cardiac Magnetic Resonance Image Synthesis on XCAT Anatomical Model, in: Proceedings of the Third Conference on Medical Imaging with Deep Learning. PMLR, pp. 6–18.
Aljuaid, A., Anwar, M., 2022. Survey of Supervised Learning for Medical Image Processing. SN Comput. Sci. 3, 292. https://doi.org/10.1007/s42979-022-01166-1
Alsharqi, M., Woodward, W.J., Mumith, J.A., Markham, D.C., Upton, R., Leeson, P., 2018. Artificial intelligence and echocardiography. Echo Res. Pract. 5, R115–R125. https://doi.org/10.1530/ERP-18-0056
Amirrajab, S., Abbasi-Sureshjani, S., Al Khalil, Y., Lorenz, C., Weese, J., Pluim, J., Breeuwer, M., 2020. XCAT-GAN for Synthesizing 3D Consistent Labeled Cardiac MR Images on Anatomically Variable XCAT Phantoms, in: Martel, A.L., Abolmaesumi, P., Stoyanov, D., Mateus, D., Zuluaga, M.A., Zhou, S.K., Racoceanu, D., Joskowicz, L. (Eds.), Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp. 128–137. https://doi.org/10.1007/978-3-030-59719-1_13
Asch, F.M., Poilvert, N., Abraham, T., Jankowski, M., Cleve, J., Adams, M., Romano, N., Hong, H., Mor-Avi, V., Martin, R.P., Lang, R.M., 2019. Automated Echocardiographic Quantification of Left Ventricular Ejection Fraction Without Volume Measurements Using a Machine Learning Algorithm Mimicking a Human Expert. Circ. Cardiovasc. Imaging 12, e009303. https://doi.org/10.1161/CIRCIMAGING.119.009303
Banerjee, I., Catalano, C.E., Patané, G., Spagnuolo, M., 2016. Semantic annotation of 3D anatomical models to support diagnosis and follow-up analysis of musculoskeletal pathologies. Int. J. Comput. Assist. Radiol. Surg. 11, 707–720. https://doi.org/10.1007/s11548-015-1327-6
Chai, L., Zhu, J.-Y., Shechtman, E., Isola, P., Zhang, R., 2021. Ensembling with Deep Generative Views, in: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14992–15002. https://doi.org/10.1109/CVPR46437.2021.01475
Chen, R.J., Lu, M.Y., Chen, T.Y., Williamson, D.F.K., Mahmood, F., 2021. Synthetic data in machine learning for medicine and healthcare. Nat. Biomed. Eng. 5, 493–497. https://doi.org/10.1038/s41551-021-00751-8
Chuquicusma, M.J.M., Hussein, S., Burt, J., Bagci, U., 2018. How to fool radiologists with generative adversarial networks? A visual turing test for lung cancer diagnosis, in: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pp. 240–244. https://doi.org/10.1109/ISBI.2018.8363564
Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O., 2016. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation, in: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (Eds.), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp. 424–432. https://doi.org/10.1007/978-3-319-46723-8_49
Cirillo, M.D., Abramian, D., Eklund, A., 2020. Vox2Vox: 3D-GAN for Brain Tumour Segmentation. https://doi.org/10.48550/arXiv.2003.13653
Gilbert, A., Marciniak, M., Rodero, C., Lamata, P., Samset, E., Mcleod, K., 2021. Generating Synthetic Labeled Data From Existing Anatomical Models: An Example With Echocardiography Segmentation. IEEE Trans. Med. Imaging 40, 2783–2794. https://doi.org/10.1109/TMI.2021.3051806
Hu, Y., Gibson, E., Lee, L.-L., Xie, W., Barratt, D.C., Vercauteren, T., Noble, J.A., 2017. Freehand Ultrasound Image Simulation with Spatially-Conditioned Generative Adversarial Networks, in: Cardoso, M.J., Arbel, T., Gao, F., Kainz, B., van Walsum, T., Shi, K., Bhatia, K.K., Peter, R., Vercauteren, T., Reyes, M., Dalca, A., Wiest, R., Niessen, W., Emmer, B.J. (Eds.), Molecular Imaging, Reconstruction and Analysis of Moving Body Organs, and Stroke Imaging and Treatment, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp. 105–115. https://doi.org/10.1007/978-3-319-67564-0_11
Huo, Y., Xu, Z., Moon, H., Bao, S., Assad, A., Moyo, T.K., Savona, M.R., Abramson, R.G., Landman, B.A., 2019. SynSeg-Net: Synthetic Segmentation Without Target Modality Ground Truth. IEEE Trans. Med. Imaging 38, 1016–1025. https://doi.org/10.1109/TMI.2018.2876633
Isensee, F., Jaeger, P.F., Kohl, S.A.A., Petersen, J., Maier-Hein, K.H., 2021. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 203–211. https://doi.org/10.1038/s41592-020-01008-z
Isola, P., Zhu, J.-Y., Zhou, T., Efros, A.A., 2017. Image-to-Image Translation with Conditional Adversarial Networks, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, pp. 5967–5976. https://doi.org/10.1109/CVPR.2017.632
Kainz, W., Neufeld, E., Bolch, W.E., Graff, C.G., Kim, C.H., Kuster, N., Lloyd, B., Morrison, T., Segars, P., Yeom, Y.S., Zankl, M., Xu, X.G., Tsui, B.M.W., 2019. Advances in Computational Human Phantoms and Their Applications in Biomedical Engineering – A Topical Review. IEEE Trans. Radiat. Plasma Med. Sci. 3, 1–23. https://doi.org/10.1109/TRPMS.2018.2883437
Lustermans, D.R.P.R.M., Amirrajab, S., Veta, M., Breeuwer, M., Scannell, C.M., 2022. Optimized automated cardiac MR scar quantification with GAN-based data augmentation. Comput. Methods Programs Biomed. 226, 107116. https://doi.org/10.1016/j.cmpb.2022.107116
Odena, A., Dumoulin, V., Olah, C., 2016. Deconvolution and Checkerboard Artifacts. Distill 1, e3. https://doi.org/10.23915/distill.00003
Østvik, A., Smistad, E., Espeland, T., Berg, E.A.R., Lovstakken, L., 2018. Automatic Myocardial Strain Imaging in Echocardiography Using Deep Learning, in: Stoyanov, D., Taylor, Z., Carneiro, G., Syeda-Mahmood, T., Martel, A., Maier-Hein, L., Tavares, J.M.R.S., Bradley, A., Papa, J.P., Belagiannis, V., Nascimento, J.C., Lu, Z., Conjeti, S., Moradi, M., Greenspan, H., Madabhushi, A. (Eds.), Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp. 309–316. https://doi.org/10.1007/978-3-030-00889-5_35
Park, T., Liu, M.-Y., Wang, T.-C., Zhu, J.-Y., 2019. Semantic Image Synthesis With Spatially-Adaptive Normalization, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2332–2341. https://doi.org/10.1109/CVPR.2019.00244
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S., n.d. PyTorch: An Imperative Style, High-Performance Deep Learning Library.
Pedrosa, J., Queiros, S., Bernard, O., Engvall, J., Edvardsen, T., Nagel, E., D’hooge, J., 2017. Fast and Fully Automatic Left Ventricular Segmentation and Tracking in Echocardiography Using Shape-Based B-Spline Explicit Active Surfaces. IEEE Trans. Med. Imaging 36, 2287–2296. https://doi.org/10.1109/TMI.2017.2734959
Pérez de Isla, L., Balcones, D.V., Fernández-Golfín, C., Marcos-Alberca, P., Almería, C., Rodrigo, J.L., Macaya, C., Zamorano, J., 2009. Three-dimensional-wall motion tracking: a new and faster tool for myocardial strain assessment: comparison with two-dimensional-wall motion tracking. J. Am. Soc. Echocardiogr. 22, 325–330. https://doi.org/10.1016/j.echo.2009.01.001
Perperidis, A., 2016. Postprocessing Approaches for the Improvement of Cardiac Ultrasound B-Mode Images: A Review. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 63, 470–485. https://doi.org/10.1109/TUFFC.2016.2526670
Rodero, C., Strocchi, M., Marciniak, M., Longobardi, S., Whitaker, J., O’Neill, M.D., Gillette, K., Augustin, C., Plank, G., Vigmond, E.J., Lamata, P., Niederer, S.A., 2021. Linking statistical shape models and simulated function in the healthy adult human heart. PLoS Comput. Biol. 17, e1008851. https://doi.org/10.1371/journal.pcbi.1008851
Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation, in: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (Eds.), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp. 234–241. https://doi.org/10.1007/978-3-319-24574-4_28
Roy, C.W., Marini, D., Segars, W.P., Seed, M., Macgowan, C.K., 2019. Fetal XCMR: a numerical phantom for fetal cardiovascular magnetic resonance imaging. J. Cardiovasc. Magn. Reson. 21, 29. https://doi.org/10.1186/s12968-019-0539-2
Scheetz, J., Rothschild, P., McGuinness, M., Hadoux, X., Soyer, H.P., Janda, M., Condon, J.J.J., Oakden-Rayner, L., Palmer, L.J., Keel, S., van Wijngaarden, P., 2021. A survey of clinicians on the use of artificial intelligence in ophthalmology, dermatology, radiology and radiation oncology. Sci. Rep. 11, 5193. https://doi.org/10.1038/s41598-021-84698-5
Segars, W.P., Sturgeon, G., Mendonca, S., Grimes, J., Tsui, B.M.W., 2010. 4D XCAT phantom for multimodality imaging research. Med. Phys. 37, 4902–4915. https://doi.org/10.1118/1.3480985
Shin, H.-C., Tenenholtz, N.A., Rogers, J.K., Schwarz, C.G., Senjem, M.L., Gunter, J.L., Andriole, K.P., Michalski, M., 2018. Medical Image Synthesis for Data Augmentation and Anonymization Using Generative Adversarial Networks, in: Gooya, A., Goksel, O., Oguz, I., Burgos, N. (Eds.), Simulation and Synthesis in Medical Imaging, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp. 1–11. https://doi.org/10.1007/978-3-030-00536-8_1
Shorten, C., Khoshgoftaar, T.M., 2019. A survey on Image Data Augmentation for Deep Learning. J. Big Data 6, 60. https://doi.org/10.1186/s40537-019-0197-0
Taha, A.A., Hanbury, A., 2015. Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool. BMC Med. Imaging 15, 29. https://doi.org/10.1186/s12880-015-0068-x
Thorstensen, A., Dalen, H., Amundsen, B.H., Aase, S.A., Stoylen, A., 2010. Reproducibility in echocardiographic assessment of the left ventricular global and regional function, the HUNT study. Eur. J. Echocardiogr. 11, 149–156. https://doi.org/10.1093/ejechocard/jep188
Uzunova, H., Ehrhardt, J., Handels, H., 2020. Memory-efficient GAN-based domain translation of high resolution 3D medical images. Comput. Med. Imaging Graph. 86, 101801. https://doi.org/10.1016/j.compmedimag.2020.101801
V7 | The AI Data Engine for Computer Vision & Generative AI [WWW Document], n.d. URL https://www.v7labs.com (accessed 3.1.24).
Yadav, A.K., Roy, R., Kumar, A.P., Kumar, Ch.S., Dhakad, S.Kr., 2015. De-noising of ultrasound image using discrete wavelet transform by symlet wavelet and filters, in: 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 1204–1208. https://doi.org/10.1109/ICACCI.2015.7275776
Zhu, J.-Y., Park, T., Isola, P., Efros, A.A., 2017. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks, in: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2242–2251. https://doi.org/10.1109/ICCV.2017.244
