Sciadv Abb7973
segmentors and GANs was boosted by shape consistency and online augmentation, respectively. Nevertheless, image-based conditioning always carries the risk of leaking patient-sensitive data to the generator during the training process.
Here, we propose to use generative models (GMs) on the basis of convolutional GANs (18) to break the boundary of sharing medical images and to enable merging of disparate databases without the limitations that are now restricting the collection of radiographs in a public database (see Fig. 1). To demonstrate the performance of our concept, we show that fully synthetic and thus anonymous images can be generated, which look deceivingly real, even to the expert’s eye, and that these images can be used in the medical data sharing process. Our concept proposes how medical images or data can be shared in the future.

eration of synthesized radiographs, was much faster with a rate of 67,925, 41,379, and 4511 generated radiographs per hour at the three spatial resolution stages. Sample images are shown in fig. S4A for a spatial resolution of 256 × 256. Further images for spatial resolutions of 512 × 512 and 1024 × 1024 are given in the Supplementary Materials.
We have chosen the multiscale structural similarity (MS-SSIM) as a metric (19) to detect a possible mode collapse of our GAN (i.e., missing diversity in the images). The MS-SSIM has been successfully used in predicting perceptual similarity judgments of humans. A lower MS-SSIM reflects perceptually distinct samples and indicates a higher diversity of a dataset. In fig. S2, we depict the MS-SSIM of 1000 randomly selected pairs of samples within a given pathology class. As can be seen, the overall MS-SSIM among synthesized pairs is comparable to that of real sample pairs.
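The pairwise similarity check described above can be illustrated with a small sketch. It is a simplification and not the MS-SSIM used in the study: it computes a single-scale SSIM over the whole image (no sliding window, no multiscale pyramid), and the function names, the flat-list image representation, and the default of 1000 pairs are assumptions made for illustration only.

```python
import random
from statistics import fmean, pstdev

def global_ssim(x, y, data_range=1.0):
    """Single-scale SSIM over two whole images given as flat pixel sequences.

    Simplified stand-in for MS-SSIM: one global window, one scale.
    """
    if len(x) != len(y):
        raise ValueError("images must have the same number of pixels")
    c1 = (0.01 * data_range) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizer for the contrast/structure term
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov_xy = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

def mean_pairwise_ssim(images, n_pairs=1000, seed=0):
    """Mean and SD of SSIM over randomly drawn pairs of distinct images."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(images)), 2)
        scores.append(global_ssim(images[i], images[j]))
    return fmean(scores), pstdev(scores)
```

Under this reading, a set of generated images whose mean pairwise score sits close to 1 would hint at mode collapse, whereas a score comparable to that of real image pairs would suggest similar diversity.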
Fig. 1. Concept of constructing a public database without disclosing patient-sensitive data. The GAN in each hospital consists of a generator G and a discriminator
D. During training, patient-sensitive data (shown in red) are never exhibited to the generators G directly. Patient-sensitive data are only exhibited to discriminator D while
it is trying to differentiate between real and synthesized radiographs. After training is completed, only the generators G are transferred to a public database and can be
used to generate synthesized radiographs.
more easily at spatial resolutions of (512 × 512)/(1024 × 1024) with accuracies of 67 ± 17%/82 ± 4% for group 1 and 65 ± 5%/77 ± 13% for group 2. Thus, radiologists and CV experts performed similarly when identifying synthesized radiographs at high resolutions when judged by their accuracy. As shown in table S2, the sensitivity, i.e., the correct identification of a synthesized radiograph, was higher than the specificity, i.e., the correct identification of a real radiograph. This is probably attributable to the fact that some synthesized radiographs show telltale signs of synthesis (see fig. S4E) and thus allow for a more reliable identification. While radiologists predominantly detected errors in anatomical details such as bone shape or rib cage morphology, CV experts tended to focus more on tiny details such as wave-like patterns (see fig. S4E). There was no interreader agreement between the readers at a spatial resolution of 256 × 256, underlining the fact that identification of synthesized radiographs at this spatial resolution stage is hardly possible (see Table 1). At higher spatial resolutions, the interreader agreement was consistently higher, in line with the higher accuracy in identifying synthesized radiographs. These results were observed under restrictions: The radiographs were assessed on conventional 24-inch computer

renchyma, which reflect the network’s difficulty to generate fine details (see fig. S4E).

Ensuring non-transference of private information
To exclude the possibility that the GAN memorizes and subsequently merely reproduces the given training examples, 1000 randomly synthesized radiographs were generated, and their nearest neighbors in the database of real radiographs were sought according to the structural similarity index (SSIM). All 1000 radiographs along with their respective three nearest neighbors were then plotted, and a board-certified radiologist assessed whether an entity from the database of real radiographs had been duplicated.
In this set of 1000 randomly drawn GAN images, we did not find a single instance in which the synthesized radiograph looked identical to its closest neighbor in the real dataset (fig. S4B). When assessing similarity in terms of the SSIM, we did not find a single case in a set of 105 randomly drawn synthesized radiographs, in which a digital twin was found in any of the real radiographs. In addition, the reader was asked to examine the synthesized radiograph for local information that might lead to the identification of a specific
Table 1. Real/synthesized radiographs test. Accuracy and interreader agreement for the group of three CV experts, three radiologists, and all readers when differentiating whether the presented radiograph is real or synthesized.

              256 × 256                   512 × 512                   1024 × 1024
              Accuracy, %  Fleiss’ kappa  Accuracy, %  Fleiss’ kappa  Accuracy, %  Fleiss’ kappa
CV experts    60 ± 5       −0.03          67 ± 17      0.07           82 ± 4       0.46
Radiologists  51 ± 5       0.10           65 ± 5       0.18           77 ± 13      0.39
All readers   55 ± 7       0.00           67 ± 14      0.07           80 ± 10      0.37
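Fleiss’ kappa, as reported in Table 1, measures agreement among several readers beyond what chance would produce. The following is a minimal sketch of the standard formula, assuming the ratings are tabulated as one row per radiograph with the number of readers who voted for each category (for example "real" and "synthesized"). The function name and the toy inputs are illustrative and are not taken from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table counts[i][j] = number of raters who
    assigned subject i to category j. Every subject needs the same
    number of raters (at least two)."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject must be rated by the same n >= 2 raters")
    n_categories = len(counts[0])

    # Observed agreement per subject, then averaged over subjects.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall share of votes per category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)

    # Note: undefined (division by zero) if all votes fall in one category.
    return (p_bar - p_e) / (1 - p_e)
```

A value of 1 corresponds to perfect agreement, values near 0 to agreement at chance level (as for the 256 × 256 stage in Table 1), and negative values to agreement below chance.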
laws. We simulated a local dataset with a limited diversity by using a subset of the RSNA dataset with 1000 real x-rays for training, of which only 5% exhibited signs of pneumonia. The resulting classifier achieved an area under the curve (AUC) of 0.74 on a test set of 6000 previously unseen x-rays from the RSNA dataset (see Fig. 2B). To alleviate the limiting scarcity of pathological images and improve the classifier performance, we used our public database of generated images [generators trained on the National Institutes of Health (NIH) and the Stanford dataset]. From this database, we randomly sampled a total of 500 synthesized x-rays: 100 that exhibited signs of pneumonia and 400 that exhibited no signs of pneumonia. These were then joined with 500 real x-rays from the RSNA subset (450 healthy and 50 pneumonia), which resulted in a set of 1000 x-rays for training of the classifier (healthy: 450 real and 400 synthesized; pneumonia: 50 real and 100 synthesized). When trained on this artificially enriched set of x-rays, the performance of the classifier increased to an overall AUC of 0.81. We hypothesize that the improvement in performance was due to the greater diversity of pathological cases as produced by the generator: As reflected by the lower MS-SSIM in Fig. 2A, the GAN-augmented dataset (MS-SSIM, 0.18 ± 0.09) achieved

potential influence that the size of the training set might have had on the performance of the classifiers. Similarly, improved performance measures were found for sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), and F1 score (see Fig. 2C). This experiment thus demonstrated that our pooled dataset approach is capable of improving deep learning classifiers by enriching scarce datasets.
To simulate the data merging scenario as outlined in Fig. 1, we compared the results of a comprehensive pathology classification, i.e., cardiomegaly, effusion, pneumothorax, atelectasis, edema, consolidation, and pneumonia, with a classifier trained solely on images from the NIH-GAN versus a classifier that was trained on merged synthesized images from different sources. Generated samples of our Stanford-GAN can be found in fig. S5. The average values of the AUC, accuracy, sensitivity, and specificity all increased significantly after integration of the synthesized external dataset (see Fig. 3). This demonstrated that the merging of multiple databases of generated radiographs can boost the performance of classifier networks and can alleviate the performance bottleneck due to insufficient amounts of training data. The performance improvements have been achieved without
Fig. 2. Pooled GAN training can improve pneumonia detection by enriching the diversity of the dataset. (A) Distributions of MS-SSIM of 2450 randomly selected
pneumonia-positive pairs. Higher diversity of pneumonia cases in the GAN-augmented dataset is confirmed by a lower MS-SSIM (GAN-augmented MS-SSIM: 0.18 ± 0.09
versus RSNA subset MS-SSIM: 0.24 ± 0.12). (B) The performance of the classifier when trained on 1000 x-rays from the GAN-enriched dataset (healthy: 450 real and
400 synthesized; pneumonia: 50 real and 100 synthesized) reaches an AUC of 0.81 in pneumonia detection, outperforming that of a classifier trained on 1000 real x-rays
(healthy, 950; pneumonia, 50) that reaches an AUC of 0.74. (C) Similarly, improved performance measures were found for sensitivity (Sens), specificity (Spec), accuracy
(Accu), PPV, NPV, and F1 score. We used a test set of 6000 x-rays randomly sampled from the RSNA dataset to calculate those scores. The GANs used to generate the
synthesized x-rays were trained based on the NIH and Stanford datasets.
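The threshold-based measures named in Fig. 2C all follow from the four cells of a binary confusion matrix. The sketch below shows these standard definitions, assuming labels and predictions are coded as 1 (pneumonia) and 0 (healthy). The function name and the example data are illustrative, and the decision threshold used in the study is not specified here.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, PPV, NPV, and F1 from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return NaN instead of raising when a measure is undefined.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }
```

With a pneumonia prevalence as low as 5%, accuracy alone can look favorable for a classifier that rarely predicts pneumonia, which is one reason to read PPV, NPV, and F1 alongside it.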
further performance improvements through domain adaptation, now an active area of research (21), but would also most likely make the classifier network more robust to deployments in different environments. This is an important aspect in the translation of CV algorithms from the workbench to clinics.

Federated averaging facilitates the training of local GANs
Large amounts of data are required to obtain robust results from GANs that are trained locally. This potentially limits the sites at which a GAN can be trained to large hospitals. Federated learning algorithms (22) offer a remedy to this limitation as the GAN can be trained without the original images leaving the protected space of the hospital. One possible reservation is that, because of the uncontrollable gradient/model updates, it is difficult to detect adversarial attacks and protect against them (23). However, GAN-based federated learning has the security advantage of offering an additional degree of freedom for screening of databases by using confidence calibrated checking (24) or manual inspection (25). We therefore investigated the use of federated learning in training one central GAN as an alternative to the pooled GAN approach.
To simulate hospitals with a limited amount of training data, we randomly sampled 20,000 patient radiographs from the Stanford CheXpert dataset and then partitioned them into 20 local clients each receiving 1000 patient radiographs. We trained and compared the following models: a centralized “20k model,” which was trained on 20,000 patient radiographs, a centralized “1k model,” which was
solely trained on 1000 local radiographs, and a federated “20 × 1k model,” which was trained in a federated manner (22) on 20 distributed datasets consisting of 1000 radiographs each (see Fig. 4A).
An important property of Wasserstein GANs is that their discriminator loss directly reflects the quality of generated samples (26). We therefore visualized the negative discriminator loss in Fig. 4B. As can be seen in Fig. 4B, because of insufficient training images, the centralized 1k GAN overfitted quickly, which led to unstable training. However, as indicated by a lower loss and Fréchet inception distance (FID) in Fig. 4 (B and C), the GAN trained by federated learning (federated 20 × 1k) overcame this local overfitting issue and significantly outperformed the locally trained GAN. The loss curve of the federated GAN was smoother because it represented an average over local iteration losses.

Generated images as a visualization of what neural networks see
The images generated by the generator could be specifically controlled: By changing the part of the input vector signifying the disease, radiographs with specific pathologies could be generated. We

Exemplary frames visualizing the transition are given in fig. S4C for cardiomegaly and effusion. With cardiomegaly, we observed an enlargement of the projected heart shape, reflecting the expected radiological change. Similarly, effusion showed the typical opacification of the lower lung field mirroring the collection of fluid there. Animations for all of the 14 disease states are given in the Supplementary Materials.
Second, the pixel-wise difference image between the fully diseased and the healthy radiograph was calculated and superimposed on the healthy radiograph as a colormap (see Fig. 5A for a schematic of the process). Examples of such visualizations are given in Fig. 5B for all 14 pathologies.
One advantage of having full control over the disease state of the GAN radiographs is that any combination of diseases in a single radiograph can be generated by changing the corresponding entries in the input vector simultaneously. We found that the disease state as represented by the GAN transition reflected the underlying disease and was in good agreement with radiological expertise if many marked examples of this disease were present in the original dataset and if the disease-related changes occurred on a large scale of the
Fig. 4. Federated learning facilitates GAN training when facing insufficient amounts of local data. Hospitals can use federated learning algorithms to train a global
GAN, and the central GAN deposit can serve as a hub. (A) Illustration of the GAN-related federated learning system. After local model initialization, local hospitals B and C
(in red frames) were selected to update their local models. The global generator and discriminator were updated by the weights (w) transferred to the aggregation server
(red arrows). All local models were subsequently redefined by the updated global GAN (blue arrows). The exchange of local and global weights continued until the global
GAN converged. (B) Discriminator loss curves for three trained Wasserstein GANs. The Wasserstein GAN trained by federated averaging algorithm (federated 20 × 1k)
outperformed the centralized GAN trained on only 1000 x-rays (centralized 1k) and performed comparably to the centralized 20k GAN. (C) FID evaluations of the GAN
training process.
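The aggregation step illustrated in Fig. 4A can be sketched as follows. This is a generic federated averaging step in the sense of McMahan et al. (22), not the authors' algorithm S1: the names `select_clients` and `federated_average`, the representation of weights as dictionaries of flat lists, and the weighting by local dataset size are assumptions for illustration. With 20 clients of 1000 radiographs each, the size-weighted mean reduces to a plain mean.

```python
import random

def select_clients(n_clients, fraction, rng):
    """Pick the clients that run local updates in this round."""
    k = max(1, round(fraction * n_clients))
    return rng.sample(range(n_clients), k)

def federated_average(client_weights, client_sizes):
    """Average client weights, weighted by local dataset size.

    client_weights: list of dicts mapping a parameter name to a flat
                    list of floats (same names and lengths per client).
    client_sizes:   number of local training samples per client.
    """
    total = sum(client_sizes)
    averaged = {}
    for name, reference in client_weights[0].items():
        averaged[name] = [
            sum(w[name][k] * n for w, n in zip(client_weights, client_sizes)) / total
            for k in range(len(reference))
        ]
    return averaged

# One hypothetical communication round: 10% of 20 clients are selected,
# their (toy) generator weights are averaged, and the result would then
# be sent back to all clients as the new global model.
rng = random.Random(0)
selected = select_clients(n_clients=20, fraction=0.1, rng=rng)
local = [{"generator": [float(c), float(c) + 1.0]} for c in selected]
global_weights = federated_average(local, [1000] * len(selected))
```

In a GAN setting the same averaging would be applied to both the generator and the discriminator weights, as the caption describes.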
Fig. 5. Learned pathological features. (A) Generation of the disease-specific pixel map. A randomly chosen vector with 512 Gaussian distributed entries characterizes
one specific patient. The GAN was tasked with generating a healthy and a diseased radiograph of that patient (cardiomegaly in this example). A subtraction map was
generated to denote the changes brought about by the disease and was superimposed as a colormap over the generated healthy radiograph. (B) Disease-specific patterns
generated by the generator for an exemplary randomly drawn pseudopatient. Red denotes higher signal intensity in the pathological radiograph, while blue denotes
lower signal intensity. Note that for some diseases such as cardiomegaly and edema, the pattern looks realistic, while the GAN struggled with diseases that have a
variable appearance and where ground truth data are limited, e.g., pneumonia. (C) Revealing correlations within generated pathological radiographs by the classifier
trained on the real dataset. For each pathology, 5000 random synthesized radiographs with a pathology label drawn from a uniform distribution between 0.0 and 1.0
were generated. The images were then rated by the classifier network, and Pearson’s correlation coefficient was calculated for each pairing of pathologies [shown in (C)
with the GAN cardiomegaly label on the x axis and the cardiomegaly and fibrosis classifier output on the y axis in red and blue, respectively]. (D) Resulting correlation
coefficients for all 14 × 14 pairings are displayed and color coded in (B). Clustering on the x axis (i.e., the GAN label axis) was performed to group related diseases. The
obtained clustered blocks are marked with white-bordered boxes.
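The correlation analysis summarized in panels (C) and (D) pairs the pathology label fed to the generator with the classifier's output on the resulting images. A minimal sketch of that computation follows; the nested-list data layout and the function names are assumptions for illustration, and the clustering step is omitted.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either sequence is constant.
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(gan_labels, classifier_outputs):
    """Correlation of each GAN pathology label with each classifier output.

    gan_labels[g]            label values (0.0 to 1.0) used to generate the
                             images for pathology g.
    classifier_outputs[g][c] classifier scores for pathology c on those
                             same images.
    Returns a matrix with one row per generated pathology g.
    """
    return [
        [pearson_r(gan_labels[g], scores) for scores in classifier_outputs[g]]
        for g in range(len(gan_labels))
    ]
```

A large diagonal entry then indicates that raising the label for a pathology reliably raises the classifier's score for that same pathology, while off-diagonal entries expose pathologies that the generator or the classifier treats as related.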
and calculated the Pearson’s correlation coefficient (Fig. 5C). We found that the clustering of related pathologies based on the correlation coefficient agreed well with clinical intuition: Infiltration, pneumonia, consolidation, effusion, and edema (all pathologies where lung opacity increases) were related to each other while being distant from, e.g., emphysema or pneumothorax, diseases that are associated with increased radiolucency. The magnitude of diagonal elements in the 14 × 14 matrix in Fig. 5D directly reflects the quality of the generation of pathological images with our method. Diseases such as cardiomegaly (corr = 0.8) and effusion (corr = 0.7) could be reliably generated due to their localized and predictable pathological features. However, the GAN trained on these particular datasets performed less reliably in generating infiltration (corr = 0.5), emphysema (corr = 0.5), and pneumonia (corr = 0.5). This might be due to the limited number of those cases in the datasets. For example, for pneumonia-labeled cases in the NIH dataset, only 1431 cases are positive and 110,689 are negative. As can be seen in Fig. 5D, the NIH generator cannot reliably generate pneumonia-related features. The nature of pneumonia was better captured by the Stanford-GAN in Fig. 5D, which shows the challenging cases from this particular dataset. This GAN was trained with a much higher number of 20,656 pneumonia cases in the Stanford CheXpert dataset.

DISCUSSION
In this study, we demonstrated that GANs can be used to generate deceivingly real-looking radiographs and to merge databases of
radiological images without disclosing patient-sensitive information. This helps to build large radiological image databases for the training of CV algorithms. While radiographs may, in principle, be abundantly available, universal access is, in general, severely restricted due to data protection laws: privacy concerns restrict the export of sensitive patient information to extramural institutions, and often, only a small fraction of the available data can be used (e.g., a patient consent form may not be universally available). In these cases, GANs that have been trained in-house may serve as a means to distribute the information contained within the database without actually providing a real snapshot of patient-sensitive data: only the weight distribution of the GAN needs to be transferred, and a representative synthesized dataset of millions of radiographs may be generated in reasonable computational time at a peripheral site. This is in contrast to previous works of Shin and co-workers, in which lower–spatial resolution synthesized images could be produced but always required recourse to the original patient images as inputs to an image-to-image translation network (13). Another group has previously demonstrated that synthesized chest radiographs can be used to augment training, see, e.g., Salehinejad et al. (27). However, they used a less advanced

approach of pooling locally trained GANs is to apply a quality criterion such as the FID or the inception score (28) assessment. Locally trained generators can thus be rejected from inclusion in the central GAN repository. Second, federated learning allows for training of a single global GAN with several smaller distributed databases as demonstrated by Fig. 4A. In this way, several smaller databases can be combined and act as one large database without actually sharing the underlying patient information.
Attention needs to be paid to adversarial attacks on distributed learning systems. Models might be affected by poisoning attacks (29). Local gradients can be easily manipulated and distorted before being transmitted to central servers, and adversarial attacks might not be detected in the federated learning approach. Our GAN-based distributed learning approach offers passive and active robustness against adversarial attacks. The posterior distribution could be estimated (30), and the confidence threshold (24) of any given example in the local training set could be deduced. Such confidence thresholds could be used to detect and filter the suspicious training examples to secure GAN training from dataset poisoning (29). In addition, adversarial training (31) is an efficient method to increase model
set was kept separately until the final testing of the algorithms. Detailed label statistics for the ChestX-ray14 dataset can be found in the “Preprocessing steps in CheXpert dataset” section in the Supplementary Materials and in table S3.
The second dataset used in this study is the CheXpert dataset, which was released by Irvin et al. (33) in January 2019. It contains 224,316 chest radiographs of 65,240 patients. This dataset was used to train a second GAN to demonstrate the feasibility of the proposed data sharing approach (see Fig. 1). A detailed explanation of the label preparation and statistics for the CheXpert dataset is given in table S3. Classification algorithms were tested on the NIH test set. Therefore, no subdivision of the CheXpert dataset into test, training, and validation sets was needed, and all available frontal radiographs of the CheXpert dataset (n = 191,027) were used for training of the GAN.
The third dataset used in this study is a dataset of x-rays released by the RSNA to host a challenge about pneumonia detection. We used this dataset to train a classifier for pneumonia detection and to test whether the inclusion of synthesized x-rays could improve the performance of said classifier.

Dedicated explanations about techniques used in our GANs can be found in the “Network training details” section in the Supplementary Materials.
Second, a densely connected CNN with 121 layers (DenseNet-121) was used as a classifier. It was pretrained on 14 million natural images [ImageNet database (2)] and subsequently trained on the radiographs in this study. The architecture has previously been shown to achieve state-of-the-art performance on the ChestX-ray dataset (35). Implementations were done using TensorFlow 1.9.0 and PyTorch 1.1.0.

Training of the GANs
We trained two GANs on the basis of two separate datasets in a progressive growing strategy: on the NIH ChestX-ray14 dataset and the Stanford CheXpert dataset. Note that weights were initialized randomly. Training proceeded in repetitive stages: once training of one spatial resolution stage stabilized after being presented a total of 600,000 real radiographs (with repetitions), the layers responsible for the next spatial resolution stage were gradually faded in and training continued with another 600,000 radiographs during this fade-in stage (again with repetitions). In total, discriminators of GANs were
between a variety of diseases. As not all of the 14 pathologies labeled in the NIH dataset had been labeled in the Stanford dataset, we only classified those pathologies that were present in both datasets’ labels, namely, cardiomegaly, effusion, pneumothorax, atelectasis, edema, consolidation, and pneumonia. We trained a classifier to differentiate between these classes with three different training sets: (i) synthesized x-rays generated by the NIH-GAN, (ii) synthesized x-rays generated by both the NIH-GAN and the Stanford-GAN, and (iii) real x-rays from the NIH dataset.
In addition, an experiment was carried out, in which the generated images were evaluated by the trained DenseNet to discover correlations between different pathologies. For each pairing of pathology as generated by the generator and pathology as classified by the classifier, we calculated Pearson’s correlation coefficient and performed clustering on the resulting correlation matrix.

Federated averaging GAN
The pseudocode of our federated averaging GAN is given in algorithm S1. Specifically, we controlled our federated learning experiment by setting 10% of local clients that ran local GAN updates
a local batch size of 32 (b = 32). Following Gulrajani et al. (26), pa

To determine the number of needed samples for the performed experiments, we used power analyses according to (36). In general, all of our performed experiments followed a binomial distribution, because each decision for a radiograph was binary: either yes (e.g., was real for the case of deciding between real and synthesized radiograph or disease was present for the case of the classifiers) or no (was not real or disease not present). We could thus use the binomial formula for the SD of absolute numbers:

$$\mathrm{SD}_{\text{absolute numbers}} = \sqrt{n \times p \times q},$$

or equally well the SD of percentages:

$$\mathrm{SD}_{\text{percentages}} = \sqrt{\frac{p \times q}{n}},$$

where q = 1 − p.
The difference of metrics, such as AUC, sensitivity, and specificity, was defined as a metric (see table S4). For a total of n = 1000 bootstrapping runs, models were built after randomly permuting predictions of two classifiers, and metric differences metric_i were computed from their respective scores. We obtained the P value of individual metrics by counting all metric_i above the threshold metric. Statistical significance was defined as P < 0.001.

SUPPLEMENTARY MATERIALS
Supplementary material for this article is available at http://advances.sciencemag.org/cgi/
View/request a protocol for this paper from Bio-protocol.
15. J. M. Wolterink, A. M. Dinkla, M. H. F. Savenije, P. R. Seevinck, C. A. T. van den Berg, I. Išgum, Deep MR to CT synthesis using unpaired data, in Simulation and Synthesis in Medical Imaging (Springer, 2017), pp. 14–23.
16. A. Chartsias, T. Joyce, R. Dharmakumar, S. A. Tsaftaris, Adversarial image synthesis for unpaired multi-modal cardiac data, in Simulation and Synthesis in Medical Imaging (Springer, 2017), pp. 3–13.
17. Z. Zhang, L. Yang, Y. Zheng, Translating and segmenting multimodal medical volumes with cycle- and shape-consistency generative adversarial network, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, USA, 18 to 22 June 2018.
18. A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks (2015); arXiv:1511.06434.
19. A. Odena, C. Olah, J. Shlens, Conditional image synthesis with auxiliary classifier GANs, Proceedings of the 34th International Conference on Machine Learning-Volume 70 (JMLR.org, 2017), pp. 2642–2651.
20. P. Rui, K. Kang, National Hospital Ambulatory Medical Care Survey: 2015 Emergency Department Summary Tables (2015). https://www.cdc.gov/nchs/data/nhamcs/web_tables/2015_ed_web_tables.pdf [accessed 16 January 2020].
21. S. Conjeti, A. Katouzian, A. G. Roy, L. Peter, D. Sheet, S. Carlier, A. Laine, N. Navab, Supervised domain adaptation of decision forests: Transfer of models trained in vitro for in vivo intravascular ultrasound tissue characterization. Med. Image Anal. 32, 1–17 (2016).
22. B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. y Arcas, Communication-Efficient Learning of Deep Networks from Decentralized Data, in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20 to 22 April 2017.
localization of common thorax diseases, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 2097–2106.
33. J. Irvin, P. Rajpurkar, M. Ko, Y. Yu, S. Ciurea-Ilcus, C. Chute, H. Marklund, B. Haghgoo, R. Ball, K. Shpanskaya, J. Seekins, D. A. Mong, S. S. Halabi, J. K. Sandberg, R. Jones, D. B. Larson, C. P. Langlotz, B. N. Patel, M. P. Lungren, A. Y. Ng, CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison. arXiv:1901.07031 (2019).
34. E. Perez, F. Strub, H. de Vries, V. Dumoulin, A. Courville, FiLM: Visual reasoning with a general conditioning layer. arXiv:1709.07871 (2017).
35. P. Rajpurkar, J. Irvin, R. L. Ball, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. P. Langlotz, B. N. Patel, K. W. Yeom, K. Shpanskaya, F. G. Blankenberg, J. Seekins, T. J. Amrhein, D. A. Mong, S. S. Halabi, E. J. Zucker, A. Y. Ng, M. P. Lungren, Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLOS Med. 15, e1002686 (2018).
36. B. Rosner, Fundamentals of Biostatistics (Nelson Education, 2015).
37. M. Arjovsky, S. Chintala, L. Bottou, Wasserstein GAN. arXiv:1701.07875 (2017).
38. D. P. Kingma, P. Dhariwal, Glow: Generative flow with invertible 1x1 convolutions, in Advances in Neural Information Processing Systems (2018), pp. 10215–10224.
39. T. Salimans, A. Karpathy, X. Chen, D. P. Kingma, PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications. arXiv:1701.05517 (2017).
40. J. Hoffman, E. Tzeng, T. Park, J.-Y. Zhu, P. Isola, K. Saenko, A. A. Efros, T. Darrell, CyCADA: Cycle-consistent adversarial domain adaptation. arXiv:1711.03213 (2017).
Science Advances (ISSN 2375-2548) is published by the American Association for the Advancement of Science. 1200 New York Avenue
NW, Washington, DC 20005. The title Science Advances is a registered trademark of AAAS.
Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim
to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).