A Data Augmentation Pipeline to Generate Synthetic Labeled Datasets of 3D Echocardiography Images using a GAN
Kristin McLeod
GE Vingmed Ultrasound - GE Healthcare
Horten, Norway
ABSTRACT
Due to privacy issues and the limited number of publicly available labeled datasets in the
domain of medical imaging, we propose an image generation pipeline to synthesize 3D
echocardiographic images with corresponding ground truth labels, to alleviate the need
for data collection and for laborious and error-prone human labeling of images for
subsequent Deep Learning (DL) tasks. The proposed method utilizes detailed anatomical
segmentations of the heart as ground truth label sources. This initial dataset is combined
with a second dataset made up of real 3D echocardiographic images to train a Generative
Adversarial Network (GAN) to synthesize realistic 3D cardiovascular Ultrasound images
paired with ground truth labels. To generate the synthetic 3D dataset, the trained GAN
uses high resolution anatomical models from Computed Tomography (CT) as input. A
qualitative analysis of the synthesized images showed that the main structures of the heart
are well delineated and closely follow the labels obtained from the anatomical models. To
assess the usability of these synthetic images for DL tasks, segmentation algorithms were
trained to delineate the left ventricle, left atrium, and myocardium. A quantitative
analysis of the 3D segmentations given by the models trained with the synthetic images
Tiago et al., A Data Augmentation Pipeline to Generate Synthetic Labeled Datasets of 3D Echocardiography Images using a GAN
indicated the potential use of this GAN approach to generate 3D synthetic data, use the
data to train DL models for different clinical tasks, and therefore tackle the problem of
scarcity of 3D labeled echocardiography datasets.
1 Introduction
Medical imaging plays a crucial role in optimizing treatment pathways. Saving time when it
comes to diagnosis and treatment planning enables the clinicians to focus on more complicated
cases.
Many modalities are used to image the heart, such as Computed Tomography (CT), Magnetic
Resonance (MR), and Ultrasound imaging, enabling several structural and functional parameters
related to the organ’s performance to be estimated. Such parameters are the basis of clinical
guidelines for diagnosis and treatment planning.
Echocardiography is the specific use of ultrasound to image the heart. This imaging modality is
widely used given its advantages: it is portable, relatively low-cost, and does not require ionizing radiation.
Deep Learning (DL), and specifically Convolutional Neural Networks (CNNs), have become
extensively applied in medical image analysis because they facilitate the automation of many
tedious clinical tasks and workflows such as estimation of ejection fraction, for example. These
algorithms are capable of approaching human-level performance (Asch et al., 2019), thus
potentially saving clinicians’ time without decreasing the quality of care for patients. In fact,
clinicians agree that using DL algorithms in the clinical workflow also improves patient access
to disease diagnoses, increasing the final diagnosis confidence levels (Scheetz et al., 2021). DL
models can be developed to perform numerous medical tasks such as image classification,
segmentation, and even region/structure detection (Aljuaid and Anwar, 2022).
Echocardiography images can be acquired both in 2D and 3D. Time can also be taken into
account, generating videos. 3D echocardiography images can be more difficult to assess than 2D
images. However, for some specific application cases, 3D image acquisition brings great
advantages since it can offer more accurate and reproducible measurements. One such case is
ventricle and atrium volumes (Pérez de Isla et al., 2009). Among the causes of the lack of
annotated 3D echocardiography datasets are the greater complexity of acquiring 3D
echocardiography images and the fact that 3D is still not part of all routine echocardiography
exams. Moreover, even when 3D images are recorded, delineating the structures in them is a
challenging, time-consuming, and user-dependent task. Together with increasingly strict privacy
regulations governing access to medical data, these factors explain why publicly available
datasets of such images are scarce. Therefore, an approach able to
address this image scarcity is necessary. This current lack of 3D medical data and the great need
of high quality annotated data required by the DL models impacts the development of such
algorithms and therefore the scientific and technological development of the 3D medical imaging
field. Synthetic generation of labeled 3D echocardiography images is a DL based approach that
provides a solution for this problem.
Synthetic data can help in the development of DL models for image analysis (Shin et al., 2018)
and accurate labeling of these images. Furthermore, this approach works as a data augmentation
strategy by generating additional data. It is known that creating datasets combining real and
synthetic images and using them to train algorithms that tackle medical challenges is a
successful solution to the image scarcity problem (Chen et al., 2021). Such synthetic images
even increase the heterogeneity of these datasets, enabling the trained models to perform more
efficiently as they are exposed to a larger variety of images.
Generative Adversarial Networks (GANs) are specific DL architectures that create models
capable of generating medical images closely resembling real images acquired from patients.
These deep generative models rely on a generator and a discriminator. While the straightforward
GAN discriminator distinguishes between real and fake, i.e., generated, images, the generator not
only attempts to deceive the discriminator but also tries to minimize the difference between the
generated image and the ground truth.
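The adversarial-plus-reconstruction objective described above can be sketched as follows. This is an illustrative numpy version, not the authors' implementation; the L1 weighting of 100 follows the original Pix2pix paper (Isola et al., 2017) and is an assumption here:

```python
import numpy as np

def generator_loss(disc_fake_logits, generated, target, l1_weight=100.0):
    """Pix2pix-style generator objective (illustrative sketch).

    The generator is trained to (a) make the discriminator output
    'real' for generated images (adversarial term) and (b) stay close
    to the paired ground-truth image (L1 reconstruction term).
    """
    # Binary cross-entropy against an all-ones target ("fool the discriminator")
    probs = 1.0 / (1.0 + np.exp(-disc_fake_logits))
    adv = -np.mean(np.log(probs + 1e-12))
    # L1 distance between the generated image and the paired ground truth
    rec = np.mean(np.abs(generated - target))
    return adv + l1_weight * rec
```

With a paired dataset, the discriminator is trained in alternation using a standard real/fake binary cross-entropy objective.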
The generated synthetic images can even be associated with labels, facilitating the acquisition of
large labeled datasets, eliminating the need for manual annotation, and therefore the inter-observer
variability (Chuquicusma et al., 2018) that largely influences the final
output (Thorstensen et al., 2010). 3D heart models are a great source of anatomical labels since
they capture accurate information about the organ’s structures (Banerjee et al., 2016). Different
types of models can be used for this purpose, such as animated models, biophysical models, or
even anatomical models obtained from different imaging modalities (Segars et al., 2010), (Kainz
et al., 2019). Recently, CT models were used as label sources to generate 2D echocardiography
(Gilbert et al., 2021) and cardiac MR images (Roy et al., 2019), proving the utility of GANs for
this task.
Developing a pipeline to generate synthetic data using GANs to create labeled datasets addresses
the immense need for the large volume of data that DL algorithms require during training to
perform an image analysis task, eliminates the need to acquire the images from subjects, and
saves experienced professionals the time of annotating them, as the anatomical labels can be
extracted from anatomical models. However, when developing such generative models, imaging
artifacts are often visible on the synthetically generated images. This common drawback of GANs
is addressed here by applying image post-processing operations
(Perperidis, 2016) to the synthesized 3D echocardiography images.
In practice, synthetic images can be used to train DL models because they represent a good data
augmentation strategy (Chai et al., 2021). 3D medical image segmentation, for instance, is among
the most common medical tasks to which DL is successfully applied.
Labeled datasets made of real images combined with synthetic ones, which even include the
respective anatomical labels, become the training dataset for 3D DL models, addressing the
problem of sparse 3D medical data availability (Lustermans et al., 2022).
DL has become widely used in medical imaging due to its potential in image segmentation,
classification, reconstruction, and synthesis across all imaging modalities. Image synthesis has
been a research topic for a few decades now, where some of the more conventional approaches
use human-defined rules and assumptions like shape priors, for example (Pedrosa et al., 2017).
Also, these image synthesis techniques depend on the imaging modality being considered to
perform certain tasks. To tackle these shortcomings, CNNs are now becoming a widely used
approach for image synthesis across many medical imaging modalities.
Many reasons motivate medical image generation, both 2D and 3D. Generative algorithms can
perform domain translation, with a large applicability when converting images from one imaging
modality to a different one, as (Uzunova et al., 2020) showed in their work converting 3D MR
and CT brain images. GANs can also be used to generate a ground truth for a given input, as
these DL models can be trained in a cyclic way, as is the case of the CycleGAN (Zhu et al.,
2017), for example. Additionally, generation of synthetic data used for DL algorithms also
motivates the application and development of GAN architectures. Several research groups were
able to generate medical images using this methodology as a data augmentation tool, even
though most of them were developed under a 2D scenario and focused on a few imaging
modalities, mainly MRI and CT. These imaging modalities pose fewer challenges than Ultrasound
due to the nature of the physics behind the acquisition process.
Ultrasound images have an inherent and characteristic speckle pattern and their quality is largely
influenced by the scanner, the sonographer, and the patient anatomy. When it comes to
generating 3D Ultrasound images, a few more challenges arise, the main one being that the
speckle pattern must be consistent throughout the whole volume. The anatomical information
present in the generated volume must also hold this consistency property.
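The speckle-consistency requirement can be made concrete with a toy model. Speckle is commonly approximated as multiplicative noise; the sketch below (a simplification, not the paper's method) draws a single noise field for the whole volume so the pattern stays consistent across slices:

```python
import numpy as np

def add_speckle(volume, sigma=0.3, rng=None):
    """Apply multiplicative speckle-like noise to a 3D volume.

    Drawing one noise field per volume (rather than per slice) keeps
    the pattern consistent across the whole 3D stack. Gaussian
    multiplicative noise is a coarse stand-in for true speckle statistics.
    """
    rng = np.random.default_rng(rng)
    noise = 1.0 + sigma * rng.standard_normal(volume.shape)
    return volume * noise
```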
(Huo et al., 2019) trained a 2D GAN model, SynSegNet, on CT images and unpaired MR labels
using a CycleGAN. Similarly, (Gilbert et al., 2021) proposed an approach to synthesize labeled
2D echocardiography images, using anatomical models and a CycleGAN as well. The
CycleGAN was proposed by (Zhu et al., 2017) and works under an unpaired scenario: the images
from one training domain do not have to be related to the images belonging to the other
domain. This GAN learns how to map the images from one to another and vice-versa. The paired
version of this GAN is called Pix2pix. (Isola et al., 2017) proposed this image synthesis method
which generates images from one domain to the other, and vice-versa, however the images
belonging to the training domains are paired.
As mentioned, 3D echocardiographic data is sparser, but these images can be generated using
GANs, and then used to train new algorithms. Both (Gilbert et al., 2021) and (Amirrajab et al.,
2020) investigated the potential use of GAN synthesized datasets to train CNNs to segment
different cardiac structures on different imaging modalities, but these methods were limited to
2D.
(Hu et al., 2017) attempted to generate 2D fetal Ultrasound scan images at certain 3D spatial
locations. They concluded that common GAN training problems such as mode collapse occur.
(Abbasi-Sureshjani et al., 2020) developed a method to generate 3D labeled Cardiac MR images
relying on CT anatomical models to obtain labels for the synthesized images, using a SPADE
GAN (Park et al., 2019). More recently, (Cirillo et al., 2020) adapted the original Pix2pix model
to generate 3D brain tumor segmentations.
When dealing with medical images, U-Net (Ronneberger et al., 2015) is a widely used CNN
model to perform image segmentation, for example, since it provides accurate delineation of
several structures on these images. More recently, (Isensee et al., 2021) proposed nnU-Net (“no
new net”), which automatically adapts to any new datasets and enables accurate segmentations.
nnU-Net can be trained on a 3D scenario and optimizes its performance to new unseen datasets
and different segmentation tasks, requiring no human intervention.
Existing work to address the challenges of automatic image recognition, segmentation, and
tracking in echocardiography has been mostly focused on 2D imaging. In particular, recent work
indicates the potential for applying DL approaches to accurately perform measurements in
echocardiography images. (Alsharqi et al., 2018) and (Østvik et al., 2018) used a DL algorithm
to segment the myocardium in 2D echocardiographic images, from which the regional motion,
and from this the strain, were measured. They showed that motion estimation using CNNs is
applicable to echocardiography, even when the networks are trained with synthetic data. This
work supports the hypothesis that similar approaches could also work for 3D synthetic data.
A large amount of work has been carried out on medical image generation, and it still
represents a challenge for the research community. To the best of our knowledge, no reproducible
results have been reported for synthesizing 3D echocardiography images using GANs; we therefore
propose a framework to address this need.
1.2 Contributions
We propose an approach for synthesizing 3D echocardiography images paired with
corresponding anatomical labels suitable as input for training DL image analysis tasks. Thus, the
main contributions of the proposed pipeline beyond the state of the art include:
1. The extension of (Gilbert et al., 2021) work from 2D to 3D, adapting it from an unpaired
to a paired framework (3D Pix2pix) and proposing an automatic pipeline to generate any
number of 3D echocardiography images, tackling the lack of public 3D echocardiography
datasets and corresponding labels.
2. The creation of a blueprint of heart models and post-processing methods for optimal
generation of 3D synthetic data, creating a generic data augmentation tool and thereby
addressing the lack of 3D data generation work in echocardiography, which differs
significantly from the 2D case.
3. The demonstration of the usability of these synthetic datasets for training segmentation
models that achieve high performance when applied to real images.
2 Methodology
The proposed pipeline is summarized in Fig. 1 and described in the following sections. Section
II-A describes the preprocessing stage of annotation of the GAN training images to create
anatomical labels for these. The training and inference stages are addressed in Section II-B
describing how the GAN model was trained and used to synthesize 3D echocardiography images
from CT-based anatomical labels and how different post-processing approaches, as described in
Section II-C, were applied to these synthetic images. Next, in Section II-D, details regarding the
creation of several synthetic datasets used to train 3D segmentation models are given, followed
by Section II-E where the influence of adding real images to the synthetic datasets to train
segmentation models is assessed.
Figure 1: 3D echocardiography image generation pipeline and inference results. Step 1: during the preprocessing stage, a set of
15 3D heart volumes were labeled by a cardiologist and anatomical labels for the LV, LA and MYO were generated. To train the
3D Pix2pix GAN model, the anatomical labels are paired together with the corresponding real 3D images. Step 2: at inference
time, the GAN model generates one 3D image. An example obtained during this stage is shown. The proposed method is able to
generate physiologically realistic images, giving correct structural features and image details. Step 3: to show the utility of the
synthetic datasets, 3D segmentation models were trained using these GAN generated images (black arrow), but other DL tasks
can be addressed.
volumes. These contours were then postprocessed, applying a spline function to the contour
points and resampling it, in order to generate gray-scale labeled images. All the 3D images
in each training dataset were sized to 256 × 256 × 32 voxels.
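The contour post-processing step can be sketched as follows. The paper fits a spline to the annotated contour points; as a lighter numpy-only stand-in, these hypothetical helpers use arc-length parameterized linear interpolation before painting the resampled points into a gray-scale label image:

```python
import numpy as np

def resample_contour(points, n_samples=128):
    """Resample a closed 2D contour to evenly spaced points.

    `points` is an (N, 2) array of (x, y) contour vertices. A spline
    fit (as used in the paper) would smooth between vertices; here we
    interpolate linearly along the cumulative arc length.
    """
    pts = np.vstack([points, points[:1]])          # close the contour
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])    # cumulative arc length
    t_new = np.linspace(0.0, t[-1], n_samples, endpoint=False)
    x = np.interp(t_new, t, pts[:, 0])
    y = np.interp(t_new, t, pts[:, 1])
    return np.stack([x, y], axis=1)

def rasterize_labels(contour, shape=(256, 256), value=255):
    """Paint resampled contour points into a gray-scale label image."""
    img = np.zeros(shape, dtype=np.uint8)
    idx = np.round(contour).astype(int)
    idx = idx[(idx[:, 0] >= 0) & (idx[:, 0] < shape[1])
              & (idx[:, 1] >= 0) & (idx[:, 1] < shape[0])]
    img[idx[:, 1], idx[:, 0]] = value
    return img
```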
During the post-processing stage of the synthetic images generated by the GAN, two different
algorithms were evaluated. The synthesized images were (a) filtered using the discrete
wavelet transform, following the work of (Yadav et al., 2015), and (b) masked with an Ultrasound cone.
The wavelet denoising operation uses wavelets that localize features in the data, preserving
important image features while removing unwanted noise, such as checkerboard artifacts. An
image mask representing the Ultrasound cone shape was applied to all synthesized images in
order to match true Ultrasound data.
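A minimal sketch of the two post-processing operations, assuming a single-level Haar transform as a stand-in for the wavelet filter of (Yadav et al., 2015) and a simple angular sector as the ultrasound cone; both helpers are illustrative, not the pipeline's code:

```python
import numpy as np

def haar_denoise_2d(img, threshold):
    """One-level 2D Haar wavelet soft-threshold denoise (illustrative).

    Soft-thresholding the detail sub-bands suppresses high-frequency
    artifacts such as checkerboarding while preserving the low-frequency
    anatomy in the approximation band. Assumes even image dimensions.
    """
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0                     # approximation band
    lh = (a - b + c - d) / 4.0                     # detail sub-bands
    hl = (a + b - c - d) / 4.0
    hh = (a - b - c + d) / 4.0
    soft = lambda x: np.sign(x) * np.maximum(np.abs(x) - threshold, 0.0)
    lh, hl, hh = soft(lh), soft(hl), soft(hh)
    out = np.empty_like(img, dtype=float)          # inverse Haar transform
    out[0::2, 0::2] = ll + lh + hl + hh
    out[0::2, 1::2] = ll - lh + hl - hh
    out[1::2, 0::2] = ll + lh - hl - hh
    out[1::2, 1::2] = ll - lh - hl + hh
    return out

def cone_mask(shape, half_angle_deg=45.0):
    """Binary ultrasound-cone mask with the apex at the top-center pixel."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - w / 2.0, ys.astype(float)
    angle = np.degrees(np.arctan2(np.abs(dx), dy + 1e-9))
    radius = np.hypot(dx, dy)
    return (angle <= half_angle_deg) & (radius <= h)
```

With `threshold=0` the transform reconstructs the input exactly, which makes the decomposition easy to verify.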
2.4 3D Segmentation
The GAN pipeline was able to generate labeled instances of 3D echocardiography images, as the
model is capable of performing paired domain translation operations. To investigate the utility of
the synthetic images, four 3D segmentation models were trained using the generated synthetic
images as training set.
The trained model architecture for the 3D segmentation task was the 3D nnU-Net (Isensee et al.,
2021). This network architecture was proposed as a self-adapting framework for medical image
segmentation. This DL model adapts its training scheme, such as the loss function or slight
variations on the model architecture, to the dataset being used and to the segmentation task being
performed. It automates necessary adaptations to the dataset such as preprocessing, patch and
batch size, and inference settings without the need for user intervention.
To train the first of four 3D segmentation models, MSynthetic, described in this section, a labeled
dataset made of 27 synthetically generated 3D echocardiography images (256 x 256 x 32),
DSynthetic, was used. This dataset was obtained from the proposed 3D GAN pipeline at inference
time, using anatomical labels from 27 CT 3D anatomical models.
To evaluate the effect of the post-processing operations on the synthesized images, three other
datasets were created – DWavelet, DCone, and DWaveletCone – and three additional segmentation
models were trained using these – MWavelet, MCone, and MWaveletCone, respectively (Fig. 2). DWavelet
was made of the original synthetic images from the DSynthetic dataset but where the wavelet
denoising post-processing algorithm was applied, and DCone was composed of the original
synthetic images with the cone reshaping post-processing operation applied. Finally, a fourth dataset
where both post-processing transformations – wavelet denoising and cone reshaping – were
applied to the original synthetic images, DWaveletCone, was created. All four datasets contained 27
3D echocardiography images with corresponding anatomical labels for the LV, LA and MYO.
All four 3D segmentation models, MSynthetic, MWavelet, MCone, and MWaveletCone, using nnU-Net,
were trained with 5-fold cross-validation for 800 epochs. The initial learning rate
was 0.01 and the segmentation models were built using PyTorch (Paszke et al., n.d.). The
loss function was a combination of dice and cross-entropy losses, as described in the original
work by (Isensee et al., 2021).
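The combined loss can be sketched in numpy (nnU-Net implements it in PyTorch; this illustrative version assumes per-class probability maps and one-hot targets of shape `(C, *spatial)`):

```python
import numpy as np

def dice_ce_loss(probs, target_onehot, eps=1e-6):
    """Combined soft-Dice + cross-entropy loss (illustrative sketch).

    The soft Dice term is averaged over classes; the cross-entropy term
    is averaged over voxels. A perfect prediction drives both terms to
    approximately zero.
    """
    axes = tuple(range(1, probs.ndim))
    inter = np.sum(probs * target_onehot, axis=axes)
    denom = np.sum(probs, axis=axes) + np.sum(target_onehot, axis=axes)
    soft_dice = np.mean((2.0 * inter + eps) / (denom + eps))
    ce = -np.mean(np.sum(target_onehot * np.log(probs + eps), axis=0))
    return (1.0 - soft_dice) + ce
```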
Dice scores were used to assess the quality of the segmentations. This score measures the
overlap between the predicted segmentation and the ground truth label extracted from the CT
anatomical models. For each segmented structure the Dice score obtained at validation time is a
value between 0 and 1, where the latter represents a perfect overlap between the prediction and
the ground truth.
Figure 2: Overview of all the created datasets and trained models in this work. The generative model, 3D Pix2pix, was trained in
order to be used to generate synthetic 3D echocardiography datasets. This dataset, DSynthetic, was postprocessed applying different
transformations and 3 other datasets were created – DWavelet, DCone, and DWaveletCone. A fifth dataset completely made of real
images, DReal, was created and to it, synthetic images from DSynthetic were added creating D17Real10Augmented and D17Real20Augmented. All
these 7 datasets were used to train 7 3D segmentation models – MSynthetic, MWavelet, MCone, MWaveletCone, MReal, M17Real10Augmented, and
M17Real20Augmented.
3 Results
This work’s results are presented as follows: Section III-A focuses on the GAN training,
architectural modifications performed on the 3D Pix2pix model and their influence on the
synthesized images. In Section III-B the influence of post-processing the synthetic images is
shown. Finally, Sections III-C and III-D show the segmentation predictions from several models
trained on different 3D echocardiography datasets (Fig. 2), as described in Sections II-C and II-
D.
Figure 3: Influence of architectural changes on the GAN generator to remove checkerboard artifacts. At inference time, a 3D
anatomical model was used to extract the anatomical labels. The first column shows 2 different slices of this volume at different
rotation angles. The middle column shows that synthesizing images using a GAN with upsampling layers smoothens the
checkerboard artifacts but introduces blurring, which is not visible on the images when using a GAN with deconvolution layers
(right column). Deconvolution layers are preferred to upsampling ones.
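The origin of the checkerboard artifacts discussed above can be illustrated by counting kernel contributions per output position of a transposed (deconvolution) layer, shown here in 1D for clarity. This is a didactic sketch, not part of the pipeline: when the kernel size is not divisible by the stride, the contribution count alternates, producing a periodic intensity pattern.

```python
import numpy as np

def transposed_conv_overlap(n_in, kernel=3, stride=2):
    """Count kernel contributions per output position of a 1D transposed conv.

    Uneven contribution counts (kernel size not divisible by stride) are
    a well-known source of checkerboard artifacts; upsampling followed by
    a regular convolution avoids the unevenness at the cost of blurring.
    """
    n_out = (n_in - 1) * stride + kernel
    counts = np.zeros(n_out, dtype=int)
    for i in range(n_in):                 # each input pixel stamps a kernel
        counts[i * stride: i * stride + kernel] += 1
    return counts
```

For `kernel=3, stride=2` the interior counts alternate 1, 2, 1, 2 (checkerboard); for `kernel=4, stride=2` they are uniform.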
After training the 3D GAN model and generating synthetic images corresponding to the input
anatomical models, as described in Section II-C, the obtained 3D echocardiography images were
post-processed in order to remove the aforementioned checkerboard artifacts.
The cone edges were slightly wavy in some cases and checkerboard artifacts were sometimes
present. The post-processing experiment, in which different transformations were applied to the
synthesized images, showed that these transformations can give a more realistic aspect to the
GAN-generated images while keeping the anatomical information intact (Fig. 4).
Figure 4: 3D Pix2pix model inference results and post-processing step. At inference time, the anatomical labels were extracted
from a 3D heart model. The first column shows 3 different rotation planes of this volume at different rotation angles. After
generating the corresponding synthetic ultrasound image (second column) for this inference case, it was post-processed by applying
a wavelet denoising transformation to eliminate the checkerboard artifacts (third column) and a cone reshaping step to
smooth the wavy edges of the ultrasound cone (fourth column). Post-processing operations give a more realistic look to the
synthesized images as indicated by the enlarged areas framed in red and green (wavelet denoise) and the white arrows (cone
reshape).
In addition to the Dice scores, and to further support the usability of synthetic images for
training segmentation algorithms, Fig. 5 shows the 3D segmentation of an inference 3D
echocardiography image acquired from a real subject. Each trained segmentation model was tested
on real cases at inference time.
Table 1: Average validation dice scores (average ± standard deviation) of each segmented structure (LV, LA, and MYO) for each
trained model on completely synthetic datasets – MSynthetic, MWavelet, MCone, and MWaveletCone. The best scores are highlighted.
Models
MSynthetic MWavelet MCone MWaveletCone
LV 0.926 ± 0.006 0.927 ± 0.005 0.926 ± 0.006 0.924 ± 0.008
LA 0.818 ± 0.011 0.816 ± 0.010 0.816 ± 0.021 0.814 ± 0.016
MYO 0.808 ± 0.016 0.808 ± 0.017 0.803 ± 0.018 0.801 ± 0.023
Figure 5: Inference segmentation results from each trained model on synthetic datasets. On the left is shown a schematic
representation of the heart and 2 cutting planes corresponding to a real 3D echocardiography image from the test set: the 4-
chamber (CH), with blue frame, and the 2-CH, with red frame. On the right, the LV, LA, and MYO segmentation results provided
by each of the 4 segmentation models: a) MSynthetic, b) MWavelet, c) MCone, and d) MWaveletCone. A qualitative analysis of the
segmentation results from each of the models shows that the one whose training data was not post-processed, MSynthetic, gives
the best output due to a smoother segmentation of the relevant structures.
Models
MReal M17Real10Augmented M17Real20Augmented
LV 0.938 ± 0.008 0.928 ± 0.006 0.927 ± 0.007
LA 0.862 ± 0.023 0.830 ± 0.016 0.826 ± 0.017
MYO 0.724 ± 0.028 0.767 ± 0.027 0.763 ± 0.025
Fig. 6 shows the predicted segmentations given by these trained models, next to the ground truth
segmentation provided by a cardiologist. The models were tested on a test set made of 3D
echocardiography images from real subjects. To compare the output segmentation from the DL
models, the Dice scores and Volume Similarity (VS) were calculated based on the predicted segmentations and the
anatomical labels from a cardiologist and the results are in Table 3.
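The two metrics can be sketched for binary masks as follows (illustrative numpy versions; VS uses the common volume-only definition, so two equal-sized but shifted masks still score VS = 1 while Dice penalizes the misalignment):

```python
import numpy as np

def dice_score(pred, gt):
    """Dice overlap between two binary masks (1.0 = perfect overlap)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

def volume_similarity(pred, gt):
    """Volume Similarity: VS = 1 - |V_pred - V_gt| / (V_pred + V_gt).

    Unlike Dice, VS compares only the segmented volumes, not their
    spatial overlap, which is why it is reported alongside Dice for
    the 3D task.
    """
    vp, vg = float(pred.sum()), float(gt.sum())
    return 1.0 - abs(vp - vg) / (vp + vg) if (vp + vg) else 1.0
```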
Figure 6: Inference segmentation results from the trained models on augmented datasets with synthetic images. On the left is
shown a schematic representation of the heart and 2 cutting planes corresponding to a real 3D echocardiography image from the
test set: the 4-CH, with blue frame, and the 2-CH, with red frame. On the right, the LV, LA, and MYO segmentation results
provided by the following 3 segmentation models: a) MReal, b) M17Real10Augmented, and c) M17Real20Augmented. To allow
comparison and to measure the Dice score and VS, d) shows the ground truth segmentation performed by a cardiologist. A
qualitative analysis of the segmentation results from each of the models shows that combining synthetic with real data improves
the segmentation output due to a more accurate segmentation of the relevant structures.
Table 3: Average test set dice scores (average ± standard deviation) of each segmented structure (LV, LA, and MYO) and Volume
Similarity of the segmented volume for the MReal, M17Real10Augmented, and M17Real20Augmented models. The best scores are highlighted.
Models
MReal M17Real10Augmented M17Real20Augmented
Dice score
LV 0.924 ± 0.019 0.929 ± 0.020 0.922 ± 0.017
LA 0.876 ± 0.023 0.874 ± 0.020 0.867 ± 0.022
MYO 0.666 ± 0.041 0.708 ± 0.053 0.680 ± 0.063
Volume Similarity
Heart Volume 0.831 ± 0.038 0.844 ± 0.047 0.836 ± 0.041
4 Discussion
In this work we built a pipeline to generate synthetic 3D labeled echocardiography images using
a GAN model. These realistic-looking synthetic datasets were used to train 3D DL models to
segment the LV, LA, and MYO.
Moreover, combined datasets including synthetic and real 3D images were created, with the VS
metric supporting that generated 3D echocardiography images can be used to train DL models,
as data augmentation. Segmentation tasks were considered to exemplify the utility of the
synthesized data; however, the pipeline is generic and could be applied to generate other imaging
data and to train any DL task with anatomical labels as input, as further discussed in this section.
A brief discussion on future applications and modifications of this approach is also presented.
dataset whose images were not post-processed, MSynthetic, provided the best segmentation
prediction.
The results regarding the influence of the post-processing step on the synthetically generated
images indicated that applying a wavelet denoising transformation, cone reshaping, or both
together to make the synthetic images look more realistic does not necessarily lead to better
results when segmenting the LV, LA, and MYO (Fig. 5). This result shows some dependence on
the DL task being performed: the segmented structures occupy a large portion of the 3D image
relative to its whole content, so the subtle differences in voxel intensities that create the
checkerboard artifacts do not seem to affect the predictions of the segmentation models.
To create the synthetic datasets, CT-acquired 3D anatomical models of the heart were used to
extract the anatomical labels and create the input cases for the 3D GAN. The segmentation
results and the echocardiography-like appearance of the synthetic images point towards the
generalizability of this pipeline, since it can synthesize 3D echocardiography images using
different types of 3D heart models as label sources. The methodology to generate synthetic
datasets can be generalized to other modalities, diseases, and organs, as well as structures
within the same organ (sub-regions of the heart, for example).
(Shin et al., 2018) and (Shorten and Khoshgoftaar, 2019) showed that GANs can be widely used
to perform data augmentation of medical image datasets. The work of these authors, together
with the presented results, supports the main contributions of this work: GANs can be used to
generate synthetic images with labels, working as a data augmentation strategy and tackling
the scarcity of labeled 3D echocardiography datasets, especially when data samples are
underrepresented within the available real datasets.
Fig. 6 d) showed the ground truth inference case segmentation performed by a cardiologist. From
these ground truth segmentations available for all the cases in the test set, the Dice scores and the
VS in Table 3 were calculated.
Given the 3D nature of the task and the limitations of the Dice metric, the VS was additionally
calculated and used as a comparison metric. In particular, M17Real10Augmented performed
better at segmentation when the Dice score was considered as the performance metric, whereas
M17Real20Augmented performed better in terms of VS. These results showed that the
models trained on the combined datasets, i.e., with real and synthetic images, provided more
accurate segmentations of the 3D volume relative to the model trained with only real
data, MReal. The results support the previous work by (Lustermans et al., 2022), confirming
that adding synthetic data to datasets made of real data improves the final outcome
of the DL models.
Additionally, this result reinforces that the proposed pipeline, relying on a 3D GAN model, can
be used as a data augmentation tool. This framework offers a solution to the lack of publicly
available labeled medical datasets.
transformations and cone reshaping algorithms that can be explored to post-process the
images.
We trained several DL models to perform 3D segmentation to show that the synthesized images
can be used as input to train DL models. Nevertheless, the pipeline is generic and could be
applied to other DL tasks that automatically assign anatomical labels to images, e.g.,
structure/feature recognition or automatic structural measurements. Furthermore, the
GAN-generated labeled datasets are not only useful as input for training DL models but could
also be used to train researchers and clinicians in image analysis.
Finally, computational memory constraints were encountered during the development of this
pipeline, mainly due to the large size of the 3D volumes, which complicated the process of
adapting the framework to them. Future work will include studying strategies to overcome
these limitations.
5 Conclusion
Acknowledgements
This work was supported by the European Union’s Horizon 2020 research and innovation
programme under the Marie Skłodowska-Curie grant agreement No 860745.
Appendix A
Table 4: Validation Dice scores of each segmented structure (LV, LA, and MYO) for each model trained on the combined datasets
(MReal, M17Real10Augmented, and M17Real20Augmented). The higher the score, the better the agreement between the model
prediction and the ground truth segmentation.
Fold                   1      2      3      4      5
MReal
  LV                 0.933  0.932  0.950  0.930  0.943
  LA                 0.837  0.869  0.873  0.837  0.896
  MYO                0.710  0.699  0.766  0.697  0.750
M17Real10Augmented
  LV                 0.924  0.929  0.919  0.938  0.930
  LA                 0.830  0.841  0.820  0.808  0.853
  MYO                0.745  0.745  0.779  0.815  0.735
M17Real20Augmented
  LV                 0.917  0.928  0.935  0.934  0.923
  LA                 0.831  0.841  0.838  0.829  0.793
  MYO                0.715  0.771  0.766  0.780  0.785
Table 5: Validation Dice scores of each segmented structure (LV, LA, and MYO) for each model trained on completely synthetic
datasets (MSynthetic, MWavelet, MCone, and MWaveletCone). The higher the score, the better the agreement between the model
prediction and the ground truth segmentation.
Fold                   1      2      3      4      5
MSynthetic
  LV                 0.924  0.930  0.924  0.918  0.934
  LA                 0.837  0.831  0.809  0.810  0.807
  MYO                0.824  0.822  0.794  0.784  0.816
MWavelet
  LV                 0.927  0.930  0.923  0.919  0.934
  LA                 0.821  0.831  0.806  0.815  0.805
  MYO                0.829  0.822  0.791  0.785  0.814
MCone
  LV                 0.926  0.930  0.921  0.918  0.933
  LA                 0.837  0.842  0.805  0.811  0.787
  MYO                0.824  0.819  0.783  0.780  0.809
MWaveletCone
  LV                 0.928  0.928  0.914  0.914  0.935
  LA                 0.834  0.832  0.803  0.793  0.807
  MYO                0.828  0.813  0.775  0.773  0.816
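The per-fold scores in Tables 4 and 5 can be summarized with simple cross-validation statistics. The small NumPy sketch below uses the MSynthetic values from Table 5; it is an illustrative aid, not part of the authors' pipeline:

```python
import numpy as np

# Validation Dice scores per fold for MSynthetic (Table 5).
folds = {
    "LV":  [0.924, 0.930, 0.924, 0.918, 0.934],
    "LA":  [0.837, 0.831, 0.809, 0.810, 0.807],
    "MYO": [0.824, 0.822, 0.794, 0.784, 0.816],
}
for structure, scores in folds.items():
    s = np.asarray(scores)
    # Cross-validation mean, sample standard deviation, best fold (1-indexed).
    print(f"{structure}: mean={s.mean():.3f}, std={s.std(ddof=1):.3f}, "
          f"best fold={int(s.argmax()) + 1}")
```

Reporting the mean and spread across folds, rather than a single fold, gives a fairer picture of how stable each model's segmentation quality is.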
References
Abbasi-Sureshjani, S., Amirrajab, S., Lorenz, C., Weese, J., Pluim, J., Breeuwer, M., 2020. 4D
Semantic Cardiac Magnetic Resonance Image Synthesis on XCAT Anatomical Model, in:
Proceedings of the Third Conference on Medical Imaging with Deep Learning. Presented
at the Medical Imaging with Deep Learning, PMLR, pp. 6–18.
Aljuaid, A., Anwar, M., 2022. Survey of Supervised Learning for Medical Image Processing. SN
Comput. Sci. 3, 292. https://doi.org/10.1007/s42979-022-01166-1
Alsharqi, M., Woodward, W.J., Mumith, J.A., Markham, D.C., Upton, R., Leeson, P., 2018.
Artificial intelligence and echocardiography. Echo Res. Pract. 5, R115–R125.
https://doi.org/10.1530/ERP-18-0056
Amirrajab, S., Abbasi-Sureshjani, S., Al Khalil, Y., Lorenz, C., Weese, J., Pluim, J., Breeuwer,
M., 2020. XCAT-GAN for Synthesizing 3D Consistent Labeled Cardiac MR Images on
Anatomically Variable XCAT Phantoms, in: Martel, A.L., Abolmaesumi, P., Stoyanov, D.,
Mateus, D., Zuluaga, M.A., Zhou, S.K., Racoceanu, D., Joskowicz, L. (Eds.), Medical
Image Computing and Computer Assisted Intervention – MICCAI 2020, Lecture Notes in
Computer Science. Springer International Publishing, Cham, pp. 128–137.
https://doi.org/10.1007/978-3-030-59719-1_13
Asch, F.M., Poilvert, N., Abraham, T., Jankowski, M., Cleve, J., Adams, M., Romano, N., Hong,
H., Mor-Avi, V., Martin, R.P., Lang, R.M., 2019. Automated Echocardiographic
Quantification of Left Ventricular Ejection Fraction Without Volume Measurements
Using a Machine Learning Algorithm Mimicking a Human Expert. Circ. Cardiovasc.
Imaging 12, e009303. https://doi.org/10.1161/CIRCIMAGING.119.009303
Banerjee, I., Catalano, C.E., Patané, G., Spagnuolo, M., 2016. Semantic annotation of 3D
anatomical models to support diagnosis and follow-up analysis of musculoskeletal
Kainz, W., Neufeld, E., Bolch, W.E., Graff, C.G., Kim, C.H., Kuster, N., Lloyd, B., Morrison, T.,
Segars, P., Yeom, Y.S., Zankl, M., Xu, X.G., Tsui, B.M.W., 2019. Advances in
Computational Human Phantoms and Their Applications in Biomedical Engineering – A
Topical Review. IEEE Trans. Radiat. Plasma Med. Sci. 3, 1–23.
https://doi.org/10.1109/TRPMS.2018.2883437
Lustermans, D.R.P.R.M., Amirrajab, S., Veta, M., Breeuwer, M., Scannell, C.M., 2022.
Optimized automated cardiac MR scar quantification with GAN‐based data
augmentation. Comput. Methods Programs Biomed. 226, 107116.
https://doi.org/10.1016/j.cmpb.2022.107116
Odena, A., Dumoulin, V., Olah, C., 2016. Deconvolution and Checkerboard Artifacts. Distill 1,
e3. https://doi.org/10.23915/distill.00003
Østvik, A., Smistad, E., Espeland, T., Berg, E.A.R., Lovstakken, L., 2018. Automatic Myocardial
Strain Imaging in Echocardiography Using Deep Learning, in: Stoyanov, D., Taylor, Z.,
Carneiro, G., Syeda-Mahmood, T., Martel, A., Maier-Hein, L., Tavares, J.M.R.S.,
Bradley, A., Papa, J.P., Belagiannis, V., Nascimento, J.C., Lu, Z., Conjeti, S., Moradi, M.,
Greenspan, H., Madabhushi, A. (Eds.), Deep Learning in Medical Image Analysis and
Multimodal Learning for Clinical Decision Support, Lecture Notes in Computer Science.
Springer International Publishing, Cham, pp. 309–316.
https://doi.org/10.1007/978-3-030-00889-5_35
Park, T., Liu, M.-Y., Wang, T.-C., Zhu, J.-Y., 2019. Semantic Image Synthesis With Spatially-
Adaptive Normalization, in: 2019 IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR). Presented at the 2019 IEEE/CVF Conference on Computer
Vision and Pattern Recognition (CVPR), pp. 2332–2341.
https://doi.org/10.1109/CVPR.2019.00244
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z.,
Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M.,
Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S., 2019. PyTorch: An
Imperative Style, High-Performance Deep Learning Library, in: Advances in Neural
Information Processing Systems 32.
Pedrosa, J., Queiros, S., Bernard, O., Engvall, J., Edvardsen, T., Nagel, E., D’hooge, J., 2017.
Fast and Fully Automatic Left Ventricular Segmentation and Tracking in
Echocardiography Using Shape-Based B-Spline Explicit Active Surfaces. IEEE Trans.
Med. Imaging 36, 2287–2296. https://doi.org/10.1109/TMI.2017.2734959
Pérez de Isla, L., Balcones, D.V., Fernández-Golfín, C., Marcos-Alberca, P., Almería, C.,
Rodrigo, J.L., Macaya, C., Zamorano, J., 2009. Three-dimensional-wall motion tracking:
a new and faster tool for myocardial strain assessment: comparison with two-
dimensional-wall motion tracking. J. Am. Soc. Echocardiogr. Off. Publ. Am. Soc.
Echocardiogr. 22, 325–330. https://doi.org/10.1016/j.echo.2009.01.001
Perperidis, A., 2016. Postprocessing Approaches for the Improvement of Cardiac Ultrasound B-
Mode Images: A Review. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 63, 470–485.
https://doi.org/10.1109/TUFFC.2016.2526670
Rodero, C., Strocchi, M., Marciniak, M., Longobardi, S., Whitaker, J., O’Neill, M.D., Gillette,
K., Augustin, C., Plank, G., Vigmond, E.J., Lamata, P., Niederer, S.A., 2021. Linking
statistical shape models and simulated function in the healthy adult human heart. PLoS
Comput. Biol. 17, e1008851. https://doi.org/10.1371/journal.pcbi.1008851
Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: Convolutional Networks for Biomedical
Image Segmentation, in: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (Eds.),