Wang_Rethinking_Bayesian_Deep_Learning_Methods_for_Semi-Supervised_Volumetric_Medical_Image_CVPR_2022_paper


Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation

Jianfeng Wang, Thomas Lukasiewicz
Department of Computer Science, University of Oxford
[email protected] [email protected]

Abstract

Recently, several Bayesian deep learning methods have been proposed for semi-supervised medical image segmentation. Although they have achieved promising results on medical benchmarks, some problems remain. Firstly, their overall architectures belong to the discriminative models, and hence, in the early stage of training, they only use labeled data for training, which might make them overfit to the labeled data. Secondly, they are in fact only partially based on Bayesian deep learning, as their overall architectures are not designed under the Bayesian framework. However, unifying the overall architecture under the Bayesian perspective gives the architecture a rigorous theoretical basis, so that each part of the architecture has a clear probabilistic interpretation. Therefore, to solve these problems, we propose a new generative Bayesian deep learning (GBDL) architecture. GBDL belongs to the generative models, whose target is to estimate the joint distribution of input medical volumes and their corresponding labels. Estimating the joint distribution implicitly involves the distribution of data, so both labeled and unlabeled data can be utilized in the early stage of training, which alleviates the potential overfitting problem. Besides, GBDL is completely designed under the Bayesian framework, and thus we give its full Bayesian formulation, which lays a theoretical probabilistic foundation for our architecture. Extensive experiments show that our GBDL outperforms previous state-of-the-art methods in terms of four commonly used evaluation indicators on three public medical datasets.

1. Introduction

Deep neural networks are a powerful tool for visual learning, and they have outperformed previous methods that are based on hand-crafted features in many classical computer vision tasks. Due to the power of deep neural networks, researchers also try to use them to segment pathology regions from medical images. Training segmentation networks usually relies on large labeled medical image datasets, but they are scarce and difficult to obtain, since not everyone is qualified to annotate medical data, and there is only a limited number of medical experts. Therefore, semi-supervised segmentation has become an important research direction in the area of medical computer vision.

Several methods [1, 2, 5, 6, 9–11, 14, 15, 17–19, 21, 22] have been proposed in recent years.1 Among these works, Bayesian deep-learning-based methods [14, 15, 17, 18, 21] are closely related to this work, as they are based on Monte Carlo (MC) dropout [4], which is an approximation of Bayesian neural networks (BNNs). These methods are mainly built with the teacher-student architecture, which can be regarded as a discriminative model. Specifically, they initially build the model $P(Y|X)$ only from labeled data, and then apply the model to generate pseudo labels for the unlabeled data, which are refined or rectified based on the epistemic uncertainty [7] provided by MC dropout. Then, the pseudo labels are combined with unlabeled and labeled data to further train the overall architecture.

The works [14, 15, 17, 18, 21] mainly have two issues. Firstly, their models are built only with labeled data in the early stage of training and may overfit to them, since the quantity of labeled data is limited. Consequently, the new pseudo labels generated by the models might be inaccurate, which adversely impacts the subsequent training process. Secondly, their models are only partially based on Bayesian deep learning and are not designed under the Bayesian framework. Hence, they are unable to give a Bayesian formulation with regard to their models, lacking a solid theoretical basis. As a result, some modules in their models are empirically designed, and the functions of those modules remain unclear.

1 Note that we only consider previous methods that build their architectures with 3D CNNs, since our architecture is also based on 3D CNNs. In most cases, using 3D CNNs to process medical volumetric data performs better than using 2D CNNs, as the relationships among neighboring slices are preserved in 3D CNNs. Thus, comparing our method with 2D-CNN-based methods is unfair to them.

In this work, we aim to fix these two problems by proposing a new generative model, named generative Bayesian deep learning (GBDL) architecture, as well as its corresponding full Bayesian formulation. Unlike teacher-student-based architectures, GBDL estimates the joint probability distribution $P(X, Y)$ via both labeled and unlabeled data, based on which more reliable pseudo labels can be generated, since the generative model is more faithful about the overall data distribution $P(X)$ and the intrinsic relationship between inputs and their corresponding labels (modeled by $P(X, Y)$), alleviating the overfitting problem. Moreover, GBDL is entirely constructed under the Bayesian framework, and its related full Bayesian formulation lays a solid theoretical foundation, so that every part of GBDL has its corresponding probabilistic formulation and interpretation.

The main contributions of this paper are as follows:

• We propose a new generative Bayesian deep learning (GBDL) architecture. GBDL aims to capture the joint distribution of inputs and labels, which moderates the potential overfitting problem of the teacher-student architecture in previous works [14, 15, 17, 18, 21].
• Compared with previous methods [14, 15, 17, 18, 21], GBDL is completely designed under the Bayesian framework, and its full Bayesian formulation is also given here. Thus, it has a theoretical probabilistic foundation, and the choice of each module in GBDL and the construction of the loss functions are more mathematically grounded.
• In extensive experiments, GBDL outperforms previous methods with a healthy margin in terms of four evaluation indicators on three public medical datasets, illustrating that GBDL is a superior architecture for semi-supervised volumetric medical image segmentation.

The rest of this paper is organized as follows. In Section 2, we review related methods. Section 3 presents our GBDL architecture and its Bayesian formulation, followed by the experimental settings and results in Section 4. In Section 5, we give a summary and an outlook on future research.

The source code is available at https://github.com/Jianf-Wang/GBDL.

2. Related Work

In this section, we briefly review related Bayesian and other deep learning methods for semi-supervised volumetric medical image segmentation in the past few years.

2.1. Bayesian Deep Learning Methods

Previous Bayesian deep learning methods simply use MC dropout as a tool to detect when and where deep models might make false predictions [14, 15, 17, 18, 21]. Specifically, an uncertainty-aware self-ensembling model [21] was proposed based on the teacher-student model and MC dropout. In particular, a teacher model and a student model were built, where the latter learnt from the former by minimizing the segmentation loss on the labeled data and the consistency loss on all the input data. MC dropout was leveraged to filter out the unreliable predictions and to preserve the reliable ones given by the teacher model, and the refined predictions can help the student model learn from unlabeled data. Based on this uncertainty-aware teacher-student model, a double-uncertainty weighted method [18] was proposed, which takes both the prediction and the feature uncertainty into consideration for refining the predictions of the teacher model during training. Sedai et al. [14] used MC dropout for training the teacher model as well, but they designed a novel loss function to guide the student model by adaptively weighting regions with unreliable soft labels to improve the final segmentation performance, rather than simply removing the unreliable predictions. Shi et al. [15] improved the way of uncertainty estimation by designing two additional models, which are called object conservation model and object-radical model, respectively. Based on these two models, certain region masks and uncertain region masks can be obtained to improve pseudo labels and to prevent the possible error propagation during training. Since multi-task learning can boost the performance of segmentation, and it had not been considered in previous works, Wang et al. [17] combined it into their architecture, and imposed the uncertainty estimation on all tasks to get a tripled-uncertainty that is further used to guide the training of their student model.

2.2. Other Deep Learning Methods

Other recently proposed deep learning methods can be mainly classified into three categories: new training and learning strategies [1, 22], shape-aware or structure-aware based methods [5, 6, 9], and consistency regularization based methods [2, 10, 11, 19].

The new training and learning strategies aim to gradually improve the quality of pseudo labels during training. For example, Bai et al. [1] proposed an iterative training strategy, in which the network is first trained on labeled data, and then it predicts pseudo labels for unlabeled data, which are further combined with the labeled data to train the network again. During the iterative training process, a conditional random field (CRF) [8] was used to refine the pseudo labels. Zeng et al. [22] proposed a reciprocal learning strategy for their teacher-student architecture. The strategy contains a feedback mechanism for the teacher network via observing how pseudo labels would affect the student, which is omitted in previous teacher-student based models.

The shape-aware or structure-aware based methods encourage the network to explore complex geometric information in medical images. For instance, Hang et al. [5] proposed a local and global structure-aware entropy-regularized mean teacher (LG-ER-MT) architecture for semi-supervised

segmentation. It extracts local spatial structural information by calculating inter-voxel similarities within small volumes and global geometric structure information by utilizing weighted self-information. Li et al. [9] introduced a shape-aware semi-supervised segmentation strategy that integrates a more flexible geometric representation into the network for boosting the performance. Huang et al. [6] designed a 3D Graph Shape-aware Self-ensembling Network (3D Graph-S$^2$Net), which is composed of a multi-task learning network and a graph-based module. The former performs the semantic segmentation task and predicts the signed distance map that encodes rich features of object shape and surface, while the latter explores co-occurrence relations and diffuses information between these two tasks.

The consistency-regularization-based methods try to add different consistency constraints for unlabeled data. In particular, Bortsova et al. [2] proposed a model that is implemented by a Siamese architecture with two identical branches to receive differently transformed versions of the same images. Its target is to learn the consistency under different transformations of the same input. Luo et al. [10] designed a novel dual-task-consistency model, whose basic idea is to encourage consistent predictions of the same input under different tasks. Xie et al. [19] proposed a pairwise relation-based semi-supervised (PRS$^2$) model for gland segmentation on histology tissue images. In this model, a supervised segmentation network (S-Net) and an unsupervised pairwise relation network (PR-Net) are built. The PR-Net learns both semantic consistency and image representations from each pair of images in the feature space for improving the segmentation performance of S-Net. Luo et al. [11] proposed a pyramid (i.e., multi-scale) architecture that encourages the predictions of an unlabeled input at multiple scales to be consistent, which serves as a regularization for unlabeled data.

3. Methodology

In this section, we start by giving the Bayesian formulation of our GBDL architecture. Thereafter, the architecture and related loss functions are introduced in detail.

3.1. Bayesian Formulation

Learning Procedure. The objective of semi-supervised volumetric medical image segmentation is to learn a segmentation network, whose weights are denoted by $W$, with partially labeled data, and the Bayesian treatment of this target is to learn the posterior distribution, namely, $P(W|X, Y_L)$. Now, we use $X$ to denote the input volumes, which contain labeled input volumes ($X_L$) and unlabeled input volumes ($X_U$), i.e., $X = \{X_L, X_U\}$, and we use $Y = \{Y_L, Y_U\}$ to denote the ground-truth labels with respect to $X$. Each label in $Y$ is a voxel-wise segmentation label map that has the same shape as its corresponding input volume. Note that $Y_U$ represents the ground-truth labels of $X_U$, which are not observed in the training data.

The whole learning procedure of $P(W|X, Y_L)$ can be written as:

$$P(W|X, Y_L) = \iiint P(W|X, Y)\, P(X, Y|Z)\, P(Z|X, Y_L)\, dZ\, dX\, dY, \tag{1}$$

where $Z$ is the latent representation that governs the joint distribution of $X$ and $Y$, denoted by $P(X, Y|Z)$. To estimate the joint distribution, one should learn $Z$ from $X$ and $Y_L$. Since Eq. (1) is intractable, we take the MC approximation of it as follows:

$$P(W|X, Y_L) = \frac{1}{MN} \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} P(W|X_{(i,j)}, Y_{(i,j)}). \tag{2}$$

In this approximation, $M$ latent representations $Z$ are drawn from $P(Z|X, Y_L)$. Then, $N$ pairs of input volumes and labels are obtained from $P(X, Y|Z)$, which are further used to get the posterior distribution. Thus, to generate $X$ and $Y$, one should obtain the joint distribution $P(X, Y)$, i.e., estimate the distribution of $Z$. Overall, the learning procedure contains two steps:

• Learning the distribution of $Z$ that governs the joint distribution $P(X, Y)$.
• Learning the posterior distribution $P(W|X, Y)$ based on $X$ and $Y$ sampled from $P(X, Y)$.

Inference Procedure. The inference procedure can be formulated as follows:

$$P(Y_{pred}|X_{test}, X, Y_L) = \int P(Y_{pred}|X_{test}, W)\, P(W|X, Y_L)\, dW, \tag{3}$$

where $X_{test}$ and $Y_{pred}$ denote the test input and the predicted result, respectively. The posterior distribution $P(W|X, Y_L)$ is learnt based on $X$ and $Y$ sampled from $P(X, Y)$, and thus the posterior can also be replaced by $P(W|X, Y)$. Since Eq. (3) is intractable as well, its MC approximation can be written as:

$$P(Y_{pred}|X_{test}, X, Y_L) = \frac{1}{T} \sum_{i=0}^{T-1} P(Y_{pred}|X_{test}, W_i), \tag{4}$$

where $T$ models are drawn from $P(W|X, Y)$, which is usually implemented via MC dropout [4] with $T$ feedforward passes. For each model ($W_i$) sampled from $P(W|X, Y)$, a prediction result can be obtained, and the final prediction of $X_{test}$ can be calculated by averaging the $T$ results. In addition, we get the epistemic uncertainty by calculating the entropy of these predictions.
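The averaging in Eq. (4) and the entropy-based uncertainty can be sketched in NumPy as follows. This is a minimal illustration, not the released implementation: `toy_stochastic_forward` is a hypothetical stand-in for one dropout-enabled pass of the segmentation network, and taking the entropy of the averaged class probabilities is one plausible reading of "the entropy of these predictions".

```python
import numpy as np

def softmax(logits, axis=0):
    shifted = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def toy_stochastic_forward(volume, class_weights, rng, drop_prob=0.5):
    """Hypothetical stand-in for one forward pass with dropout kept active."""
    keep = rng.random(class_weights.shape) >= drop_prob   # random dropout mask
    w = class_weights * keep / (1.0 - drop_prob)          # inverted-dropout scaling
    logits = w[:, None, None, None] * volume[None]        # (C, D, H, W)
    return softmax(logits, axis=0)

def mc_dropout_predict(volume, class_weights, T=8, seed=0, eps=1e-12):
    """Average T stochastic passes (Eq. (4)) and compute voxel-wise entropy."""
    rng = np.random.default_rng(seed)
    passes = np.stack(
        [toy_stochastic_forward(volume, class_weights, rng) for _ in range(T)]
    )                                                     # (T, C, D, H, W)
    mean_probs = passes.mean(axis=0)                      # average of the T results
    # Assumption: uncertainty is the base-2 entropy of the averaged probabilities.
    entropy = -(mean_probs * np.log2(mean_probs + eps)).sum(axis=0)
    return mean_probs, entropy

volume = np.random.default_rng(1).normal(size=(4, 8, 8))  # toy (D, H, W) volume
mean_probs, entropy = mc_dropout_predict(volume, np.array([0.5, 1.0, 2.0]))
```

The predicted label map would then be `mean_probs.argmax(axis=0)`, and voxels with high `entropy` are those on which the $T$ sampled models disagree or are individually unsure.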

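Returning to the learning procedure, the nested sampling structure of Eq. (2) can be made concrete with a toy example. The sketch below is only an analogy: the posterior term $P(W|X_{(i,j)}, Y_{(i,j)})$ is replaced by a scalar test function so that the estimate can be compared with a known value, and the Gaussian choices for the latent variable and the data, as well as the name `nested_mc_estimate`, are illustrative assumptions that are not part of GBDL.

```python
import numpy as np

def nested_mc_estimate(f, M=2000, N=50, seed=0):
    """Toy nested Monte Carlo average with the same shape as Eq. (2).

    Assumed toy model: z ~ N(0, 1) plays the role of the latent
    representation, and x | z ~ N(z, 1) plays the role of the data
    generated from it.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, 1.0, size=M)               # M draws of the latent variable
    x = rng.normal(z[:, None], 1.0, size=(M, N))   # N draws of x for every z
    return f(x).sum() / (M * N)                    # (1 / MN) * double sum

# For this toy model, E[x^2] = Var(z) + Var(x | z) = 2.
estimate = nested_mc_estimate(lambda x: x ** 2)
```

With enough samples, `estimate` approaches 2, which illustrates why drawing latent representations first and data second yields an average over the joint distribution.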
[Figure 1: architecture diagram. Labels recoverable from the figure: Input Volume; 3D-UNet Encoder; 3D-UNet Decoder; Mean vector of each slice; Covariance matrix of each slice; For every slice, sample from N(0, I); Latent representation of each slice; Reconstruct the input volume, i.e., P(X|Z); Generate labels based on Z and X, i.e., P(Y_L|X_L, Z); Input volumes and labels are drawn from P(X, Y|Z) to train the regular 3D-UNet with MC dropout; Regular 3D-UNet with MC dropout, i.e., P(W|X, Y); losses L_KL[Q(Z|X)||P(Z)], L_MSE, and L_CE + L_Dice.]

Figure 1. GBDL for semi-supervised volumetric medical image segmentation, including a latent representation learning (LRL) architecture
(in the green dotted box) and a regular 3D-UNet with MC dropout (in the red dotted box). Only the regular 3D-UNet with MC dropout is
used during testing. For simplicity, the shortcut connections between the paired 3D-UNet encoder and decoder are omitted.
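The per-slice sampling step drawn in Figure 1 can be illustrated with a short NumPy sketch. It assumes the common reparameterization $z_i = \mu_i + L_i \epsilon_i$ with $\epsilon_i \sim N(0, I)$, where $L_i$ is the Cholesky factor of the covariance matrix of slice $i$. The sampling routine is not spelled out in the text, so the function name `sample_slice_latents` and the use of a Cholesky factor are assumptions.

```python
import numpy as np

def sample_slice_latents(means, covs, rng):
    """Draw one latent vector per slice.

    means: (n, D) mean vectors; covs: (n, D, D) symmetric positive-definite
    covariance matrices (hypothetical encoder outputs, one pair per slice).
    """
    chol = np.linalg.cholesky(covs)            # batched L_i with L_i L_i^T = cov_i
    eps = rng.standard_normal(means.shape)     # eps_i ~ N(0, I), one per slice
    return means + np.einsum("nij,nj->ni", chol, eps)

rng = np.random.default_rng(0)
n_slices, dim = 4, 3
means = rng.normal(size=(n_slices, dim))
a = rng.normal(size=(n_slices, dim, dim))
covs = a @ a.transpose(0, 2, 1) + 0.5 * np.eye(dim)   # SPD by construction
z = sample_slice_latents(means, covs, rng)             # one latent per slice
```

Because the randomness enters only through `eps`, gradients could flow to the mean and covariance outputs in an automatic-differentiation framework, which is the usual motivation for this form of sampling.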

3.2. GBDL and Loss Functions

Under the guidance of the given Bayesian formulation, we now introduce the details of GBDL and its loss functions. Concerning the two steps mentioned in the learning procedure above, GBDL has a latent representation learning (LRL) architecture and a regular 3D-UNet with MC dropout, which are shown in the green dotted box and the red dotted box in Figure 1, respectively. LRL is designed to learn the distribution of $Z$ and to capture the joint distribution $P(X, Y)$, while the regular 3D-UNet with MC dropout is the Bayesian deep model used for parameterizing the posterior distribution $P(W|X, Y)$.

As for LRL, we first assume a variational probability distribution $Q(Z)$ to represent the distribution of the latent representation $Z$. In this work, we follow the conditional variational auto-encoder (cVAE) [16] to use the input to modulate $Z$, and thus, $Q(Z)$ can be rewritten as $Q(Z|X)$. Note that this work focuses on using 3D CNNs to process volumetric medical data, and therefore, each input volume contains several slices, so $Q(Z|X)$ is actually the joint distribution of all these slices, i.e., $Q(Z|X) = q(z_0, z_1, z_2, \ldots, z_{n-1} | x_0, x_1, x_2, \ldots, x_{n-1})$, where $n$ is the number of slices in an input volume, and $x_i$ and $z_i$ are a slice and its latent representation, respectively. To obtain the latent representation of each slice $z_i$, a 3D-UNet encoder2 is used in LRL. As the downsampling layers of the 3D-UNet encoder do not operate on the depth dimension, and the 3D convolutional layers used in the encoder have kernel size 3, padding size 1, and stride 1 (shown in the supplementary material), each $z_i$ is actually conditioned on $n_{rf}$ slices, where $n_{rf}$ is the total receptive field along the depth dimension, which is determined by the number of 3D convolutional layers in the encoder. For ease of calculation, we assume that the distribution of the latent representation of each slice follows a multivariate Gaussian distribution and that the latent representations of slices are independent from each other. It is worth noting that such an independence assumption on the latent representations of slices is reasonable, as in this way, the latent representation of each slice just rests on $n_{rf}$ slices, which is consistent with the fact that each slice is closely relevant to its neighboring slices and distant slices may have no contribution to the slice. Therefore, based on the independence assumption, $Q(Z|X)$ can be decomposed into $\prod_{i=0}^{n-1} q(z_i | x_{(n_{rf})})$, where $x_{(n_{rf})}$ denotes the input slices that contribute to $z_i$. After obtaining the latent representation $Z$, we can write the evidence lower bound (ELBO) as follows (with proof in the supplementary material):

$$\log P(X, Y) \geq E_Q[\log P(Y|X, Z) + \log P(X|Z)] - E_Q\left[\log\left(\frac{Q(Z|X)}{P(Z)}\right)\right], \tag{5}$$

where $E_Q$ denotes the expectation over $Q(Z|X)$. Thus, the learning objective is to maximize the ELBO (Eq. (5)), which is achieved by maximizing $E_Q[\log P(Y|X, Z) + \log P(X|Z)]$ and minimizing $E_Q[\log(Q(Z|X)/P(Z))]$.

Firstly, to maximize $P(X|Z)$, another 3D-UNet decoder is used to take $z_i$ as input and the mean square error ($L_{MSE}$) is used as the loss function, which aims to reconstruct each slice of the input volume. Secondly, to maximize the probability $P(Y|X, Z)$, another branch can be built, which contains a 3D-UNet encoder and a 3D-UNet decoder as well. Since only parts of the data have labels, $P(Y|X, Z)$ can also be rewritten as $P(Y_L|X_L, Z)$. In this branch, the 3D-UNet encoder receives $X_L$ to extract features that are further combined with their corresponding $Z$ to be input to the 3D-UNet decoder. Then, $Y_L$ is leveraged to calculate the dice loss $L_{Dice}$ [12] and the cross-entropy loss $L_{CE}$ with $X_L$ to maximize the probability $P(Y_L|X_L, Z)$. Thirdly, minimizing $E_Q[\log(Q(Z|X)/P(Z))]$ is to use $Q(Z|X)$ to approximate the prior distribution of $Z$, and in most cases, assuming that $Q(Z|X)$ and $P(Z)$ follow the same probability distribution can make the computation convenient. The following theorem (with proof in the supplementary material) is useful for constructing the distribution of $Q(Z|X)$.

Theorem 1. The product of any number $n \geq 1$ of multivariate Gaussian probability density functions with precision $\Lambda_i$ and mean $\mu_i$ results in an unnormalized Gaussian curve with precision $\Lambda_* = \sum_{i=0}^{n-1} \Lambda_i$ and mean $\mu_* = \Lambda_*^{-1} \left(\sum_{i=0}^{n-1} \Lambda_i \mu_i\right)$.

Although an unnormalized Gaussian curve (denoted $f(x)$) is obtained by multiplying several multivariate Gaussian PDFs, a Gaussian distribution can still be obtained by simply normalizing $f(x)$ with $\int f(x)\,dx$, which does not contain $x$. Therefore, based on Theorem 1, we can suppose that $Q(Z|X)$ follows a multivariate Gaussian distribution. In addition, we also assume that the prior distribution $P(Z)$ follows a multivariate standard normal distribution. To minimize $E_Q[\log(Q(Z|X)/P(Z))]$, we propose a new loss function as follows (with proof in the supplementary material).

Corollary 1. Minimizing $E_Q[\log(Q(Z|X)/P(Z))]$ is equivalent to minimizing

$$L_{KL[Q(Z|X)||P(Z)]} = \frac{1}{2} \left( -\log\det\left[\left(\sum_{i=0}^{n-1} \Lambda_i\right)^{-1}\right] + \mathrm{tr}\left[\left(\sum_{i=0}^{n-1} \Lambda_i\right)^{-1}\right] + \left(\sum_{i=0}^{n-1} \Lambda_i \mu_i\right)^{T} \left(\sum_{i=0}^{n-1} \Lambda_i\right)^{-2} \left(\sum_{i=0}^{n-1} \Lambda_i \mu_i\right) - D \right), \tag{6}$$

where $n$ denotes the number of slices, and $D$ denotes the number of dimensions of the mean vectors.

To summarize, maximizing the ELBO (Eq. (5)) is equivalent to optimizing LRL and minimizing the loss function:

$$L_{ELBO} = \lambda_1 L_{CE} + \lambda_2 L_{Dice} + \lambda_3 L_{MSE} + \lambda_4 L_{KL[Q(Z|X)||P(Z)]}, \tag{7}$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are coefficients. Notably, cVAE [16] has some similar properties when compared with our LRL architecture, and it is important to emphasize the differences between cVAE-based models and our LRL, which mainly concern three aspects:

• Different ELBO: cVAE is mainly used for generation tasks, in which the data distribution $P(X)$ is considered, and labels $Y$ are not involved in the ELBO of cVAE. In contrast, our LRL concerns the joint distribution of $X$ and $Y$, leading to a different ELBO (Eq. (5)).
• Independent Components Assumption: cVAE assumes that the components in the latent representation of an input image are independent from each other. As a result, the encoder of cVAE only gives a mean vector and a variance vector for a latent representation3. However, our LRL does not follow this assumption for each slice, and the encoder gives a mean vector and a covariance matrix for the latent representation of each slice4.
• Joint Variational Distribution: cVAE is mainly used for 2D inputs. In contrast, our LRL is specifically designed for 3D-CNN-based architectures for processing volumetric inputs. Therefore, since each input volume contains several slices, the variational distribution becomes a joint distribution over their latent representations, resulting in a different formulation of the Kullback–Leibler (KL) divergence between the joint variational distribution and the prior distribution (Eq. (6)).

Once LRL is trained well, the joint distribution of $X$ and $Y$ will be captured, and the second step is to learn the posterior distribution $P(W|X, Y)$. Therefore, input volumes and corresponding labels are generated from the fixed LRL to train the regular 3D-UNet with MC dropout with two loss functions that are widely used in medical image segmentation, namely, the dice loss $L_{Dice}$ and the cross-entropy loss $L_{CE}$. In this paper, we use $L_{Seg}$ to denote their weighted sum:

$$L_{Seg} = \beta_1 L_{CE} + \beta_2 L_{Dice}, \tag{8}$$

where $\beta_1$ and $\beta_2$ are the coefficients of the two loss terms in $L_{Seg}$. In the real implementation, the newly generated pseudo labels will be combined with the unlabeled data and labeled data to train the regular 3D-UNet with MC dropout. According to Eq. (4), only the regular 3D-UNet with MC dropout is used in the test phase. In addition, as we do a full Bayesian inference during testing, it is convenient to obtain the voxel-wise epistemic uncertainty for each predicted result by calculating the entropy ($-\sum_{c=0}^{C-1} p_c \log_2 p_c$) voxel-by-voxel, where $C$ is the number of classes.

2 In this paper, all 3D upsampling and downsampling layers used in the 3D-UNet encoder and the 3D-UNet decoder only operate on the spatial size, and therefore the depth of the input volumes remains unchanged, and we can obtain the latent representation of each slice. More details about the network configuration are given in the supplementary material.
3 In practice, the encoder of cVAE gives log-variance vectors instead of variance vectors.
4 Although the independent components assumption is widely used, it could be too strong and may degrade the performance.

4. Experiments

We now report on experiments of the proposed GBDL on three public medical benchmarks. To save space, implementation details are shown in the supplementary material.

4.1. Datasets

The KiTS19 dataset5 is a kidney tumor segmentation dataset, which has 210 labeled 3D computed tomography (CT) scans for training and validation. We followed the settings in previous works [3, 18] to use 160 CT scans for training and 50 CT scans for testing. The 3D scans centering at the kidney region were used, and we used the soft tissue CT window range of [-100, 250] HU for the scans.

The Atrial Segmentation Challenge dataset [20] includes 100 3D gadolinium-enhanced magnetic resonance imaging scans (GE-MRIs) with labels. We followed previous works [6, 9, 10, 18, 20] to use 80 samples for training and the other 20 samples for testing.

The Liver Segmentation dataset has been released by the MICCAI 2018 medical segmentation decathlon challenge.6 The dataset has 131 training and 70 testing points, and the ground-truth labels of the testing data are not released. Therefore, we used those 131 CTs in our experiments, in which 100 CT scans and 31 CT scans were used for training and testing, respectively. The 3D scans centering at liver regions were utilized, and we used the soft tissue CT window range of [-100, 250] HU for the scans.

5 https://kits19.grand-challenge.org/
6 http://medicaldecathlon.com/index.html

4.2. Evaluation Metrics

Four evaluation metrics are used as our evaluation indicators, namely, Dice Score, Jaccard Score, 95% Hausdorff Distance (95HD), and Average Surface Distance (ASD). Dice Score and Jaccard Score mainly compute the percentage of overlap between two object regions. ASD computes the average distance between the boundaries of two object regions, while 95HD measures the closest point distance between two object regions.

4.3. Ablation Studies

We conducted ablation studies on the KiTS19 and the Atrial Segmentation Challenge dataset. We comprehensively searched the hyperparameters ($\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$, $\beta_1$, and $\beta_2$), and the number of sampled latent representations ($M$) for each feedforward pass. Due to the space limit, the search process is shown in the supplementary material. In all the following experiments, we fixed $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$, $\beta_1$, $\beta_2$, and $M$ to 1.0, 2.0, 1.0, 0.005, 1.0, 2.0, and 5, respectively.

Firstly, we evaluated the baseline model, i.e., the regular 3D-UNet with MC dropout that is trained with a limited number of labeled data, and we also evaluated the upper bound of the performance by utilizing all data to train the model. As Table 1 shows, when the number of labeled data is decreased, the performance of the regular 3D-UNet drops drastically.

KiTS19
Method | Labeled | Unlabeled | Dice ↑ | Jaccard ↑ | 95HD ↓ | ASD ↓
3D-UNet w/ MC dropout | 160 | 0 | 0.940 (±0.004) | 0.888 (±0.005) | 3.66 (±0.26) | 0.78 (±0.11)
3D-UNet w/ MC dropout | 16 | 0 | 0.838 (±0.044) | 0.745 (±0.051) | 11.36 (±1.89) | 3.30 (±0.78)
3D-UNet w/ MC dropout | 4 | 0 | 0.710 (±0.051) | 0.578 (±0.065) | 19.56 (±1.95) | 6.11 (±0.92)
Volume-based LRL | 16 | 144 | 0.892 (±0.016) | 0.823 (±0.021) | 7.47 (±0.39) | 1.88 (±0.27)
IC-LRL | 16 | 144 | 0.900 (±0.011) | 0.828 (±0.012) | 6.99 (±0.26) | 1.75 (±0.22)
GBDL (LRL) | 16 | 144 | 0.911 (±0.010) | 0.840 (±0.013) | 6.38 (±0.37) | 1.51 (±0.22)
Volume-based LRL | 4 | 156 | 0.883 (±0.011) | 0.810 (±0.014) | 8.32 (±0.42) | 1.99 (±0.24)
IC-LRL | 4 | 156 | 0.889 (±0.014) | 0.814 (±0.014) | 8.01 (±0.31) | 2.01 (±0.12)
GBDL (LRL) | 4 | 156 | 0.898 (±0.008) | 0.821 (±0.011) | 6.85 (±0.44) | 1.78 (±0.23)

Atrial Segmentation Challenge
Method | Labeled | Unlabeled | Dice ↑ | Jaccard ↑ | 95HD ↓ | ASD ↓
3D-UNet w/ MC dropout | 80 | 0 | 0.915 (±0.006) | 0.832 (±0.007) | 3.89 (±0.17) | 1.25 (±0.14)
3D-UNet w/ MC dropout | 16 | 0 | 0.834 (±0.044) | 0.720 (±0.049) | 8.97 (±1.27) | 2.48 (±0.64)
3D-UNet w/ MC dropout | 8 | 0 | 0.801 (±0.057) | 0.676 (±0.051) | 11.40 (±1.53) | 3.27 (±0.71)
Volume-based LRL | 16 | 64 | 0.871 (±0.010) | 0.784 (±0.017) | 5.28 (±0.37) | 1.89 (±0.16)
IC-LRL | 16 | 64 | 0.882 (±0.014) | 0.796 (±0.026) | 4.69 (±0.22) | 1.66 (±0.12)
GBDL (LRL) | 16 | 64 | 0.894 (±0.008) | 0.822 (±0.011) | 4.03 (±0.41) | 1.48 (±0.14)
Volume-based LRL | 8 | 72 | 0.865 (±0.017) | 0.773 (±0.013) | 6.81 (±0.33) | 2.49 (±0.13)
IC-LRL | 8 | 72 | 0.871 (±0.009) | 0.779 (±0.024) | 5.96 (±0.22) | 1.97 (±0.17)
GBDL (LRL) | 8 | 72 | 0.884 (±0.007) | 0.792 (±0.012) | 5.89 (±0.31) | 1.60 (±0.15)

Table 1. Ablation studies of different LRL variants on the KiTS19 dataset and the Atrial Segmentation Challenge dataset. For each of our results, the “mean (±std)” is reported. The first row shows the upper bound performance, i.e., all training data with their labels are used.

KiTS19
Method | Labeled | Unlabeled | Dice ↑ | Jaccard ↑ | 95HD ↓ | ASD ↓
w/o L_MSE | 16 | 144 | 0.877 (±0.014) | 0.789 (±0.019) | 8.89 (±0.33) | 1.96 (±0.19)
w/o L_KL | 16 | 144 | 0.868 (±0.013) | 0.776 (±0.009) | 8.99 (±0.29) | 2.03 (±0.17)
Shared Encoder | 16 | 144 | 0.854 (±0.019) | 0.742 (±0.021) | 10.04 (±0.67) | 2.56 (±0.27)
GBDL | 16 | 144 | 0.911 (±0.010) | 0.840 (±0.013) | 6.38 (±0.37) | 1.51 (±0.22)
w/o L_MSE | 4 | 156 | 0.864 (±0.016) | 0.769 (±0.014) | 9.03 (±0.39) | 2.05 (±0.18)
w/o L_KL | 4 | 156 | 0.860 (±0.017) | 0.769 (±0.019) | 9.32 (±0.46) | 2.23 (±0.31)
Shared Encoder | 4 | 156 | 0.843 (±0.021) | 0.734 (±0.024) | 11.08 (±0.67) | 3.46 (±0.44)
GBDL | 4 | 156 | 0.898 (±0.008) | 0.821 (±0.011) | 6.85 (±0.44) | 1.78 (±0.23)

Atrial Segmentation Challenge
Method | Labeled | Unlabeled | Dice ↑ | Jaccard ↑ | 95HD ↓ | ASD ↓
w/o L_MSE | 16 | 64 | 0.873 (±0.012) | 0.781 (±0.011) | 4.76 (±0.46) | 1.66 (±0.17)
w/o L_KL | 16 | 64 | 0.876 (±0.014) | 0.775 (±0.016) | 5.12 (±0.55) | 1.87 (±0.15)
Shared Encoder | 16 | 64 | 0.862 (±0.021) | 0.749 (±0.025) | 6.33 (±0.73) | 2.21 (±0.31)
GBDL | 16 | 64 | 0.894 (±0.008) | 0.822 (±0.011) | 4.03 (±0.41) | 1.48 (±0.14)
w/o L_MSE | 8 | 72 | 0.867 (±0.011) | 0.771 (±0.017) | 4.96 (±0.34) | 1.85 (±0.20)
w/o L_KL | 8 | 72 | 0.854 (±0.015) | 0.762 (±0.018) | 5.33 (±0.42) | 2.12 (±0.33)
Shared Encoder | 8 | 72 | 0.831 (±0.019) | 0.724 (±0.012) | 7.19 (±0.68) | 2.86 (±0.41)
GBDL | 8 | 72 | 0.884 (±0.007) | 0.792 (±0.012) | 5.89 (±0.31) | 1.60 (±0.15)

Table 2. Ablation studies of different parts in GBDL on the KiTS19 dataset and the Atrial Segmentation Challenge dataset. For each of our results, the “mean (±std)” is reported.

Secondly, as for an input volume, LRL assumes that the latent representation of each slice follows a multivariate Gaussian distribution, and therefore, the latent representation of the whole volume is sampled from a joint Gaussian distribution that is obtained by multiplying the Gaussian PDFs of those slices. However, LRL can also be designed to learn the latent representation of the whole volume, i.e., estimating the joint Gaussian distribution directly, and we denote this variant as “volume-based LRL”. Compared to the original LRL, this variant is much closer to a 3D-CNN-based cVAE, since each input volume is treated as a whole, and the volume-based LRL represents the whole volume with a latent representation, similar to the original cVAE that represents the whole input image with a latent representation. In this case, $L_{KL[Q(Z|X)||P(Z)]}$ degenerates to $\frac{1}{2}\left(-\log\det[\Lambda_v^{-1}] + \mathrm{tr}[\Lambda_v^{-1}] + \mu_v^T \mu_v - D\right)$, where $\mu_v$ and $\Lambda_v$ are the mean vector and the precision matrix of the joint Gaussian distribution, respectively. We evaluated such volume-based LRL, and the results are displayed in

Table 1. For a fair comparison, $\mu_v$ and $\Lambda_v$ have the same number of dimensions as $\mu_i$ and $\Lambda_i$ in the original LRL. The results show that the volume-based LRL performs worse than the original LRL that is presented in Section 3.2, demonstrating that the original LRL can better model the joint distribution. Based on this observation, we hypothesize that keeping a difference among slices is important, since slices and their corresponding segmentation masks are different from each other. The original LRL adds perturbation (random noise) to each slice, which is beneficial to keeping such a difference. However, the volume-based LRL only adds random noise to the latent representation of the whole volume, which works against keeping such a difference.

Third, considering that the independent components assumption is widely used in cVAE, we also evaluated an LRL variant with this assumption, and we denote this variant as “IC-LRL”. In this case, $\mathcal{L}_{KL[Q(Z|X)||P(Z)]}$ degenerates to

$$\frac{1}{2}\sum_{D}\left[-\log\frac{1}{\sum_{i=0}^{n-1}\frac{1}{\sigma_i^2}} + \frac{1}{\sum_{i=0}^{n-1}\frac{1}{\sigma_i^2}} + \left(\frac{1}{\sum_{i=0}^{n-1}\frac{1}{\sigma_i^2}}\sum_{i=0}^{n-1}\frac{\mu_i}{\sigma_i^2}\right)^2 - 1\right],$$

where $\mu_i$ and $\sigma_i$ are the mean vector and the variance vector of the Gaussian distribution for each slice in the input volume, respectively. The results shown in Table 1 indicate that the IC-LRL performs worse than our original LRL, verifying that the independent components assumption harms the performance of LRL.

Furthermore, we analyzed the two important parts of the proposed LRL, i.e., the reconstruction loss $\mathcal{L}_{MSE}$ and the term $\mathcal{L}_{KL[Q(Z|X)||P(Z)]}$. We removed these two loss terms,⁷ respectively, to study their impact on GBDL. As Table 2 shows, when $\mathcal{L}_{MSE}$ or $\mathcal{L}_{KL[Q(Z|X)||P(Z)]}$ is removed, a performance decline can be observed, demonstrating the importance of these two terms and the correctness of our Bayesian formulation.

Besides, as the two 3D-UNet encoders in LRL are of the same architecture (see the supplementary material), and both aim at extracting features from input volumes, we evaluated a case where the two encoders share their weights. Nevertheless, sharing the weights degrades the performance, as shown in Table 2. Similarly, although some layers of the two 3D-UNet decoders in the LRL also have the same

⁷ Removing $\mathcal{L}_{MSE}$ is equivalent to removing the reconstruction path, i.e., when we did an ablation study about removing the reconstruction loss $\mathcal{L}_{MSE}$, we also removed the upper 3D-UNet decoder of LRL in Figure 1.

| Method | Labeled | Unlabeled | Dice ↑ | Jaccard ↑ | 95HD ↓ | ASD ↓ |
|---|---|---|---|---|---|---|
| UA-MT [21] | 16 | 144 | 0.883 | 0.802 | 9.46 | 2.89 |
| SASSNet [9] | 16 | 144 | 0.891 | 0.822 | 7.54 | 2.41 |
| Double-UA [18] | 16 | 144 | 0.895 | 0.828 | 7.42 | 2.16 |
| Tripled-UA [17] | 16 | 144 | 0.887 | 0.815 | 7.55 | 2.12 |
| CoraNet [15] | 16 | 144 | 0.898 | 0.820 | 7.23 | 1.89 |
| UA-MT* [21] | 16 | 144 | 0.878 | 0.797 | 8.93 | 2.76 |
| Double-UA* [18] | 16 | 144 | 0.899 | 0.823 | 7.55 | 2.21 |
| Tripled-UA* [17] | 16 | 144 | 0.886 | 0.819 | 7.43 | 2.26 |
| CoraNet* [15] | 16 | 144 | 0.894 | 0.820 | 7.37 | 1.92 |
| GBDL | 16 | 144 | 0.911 | 0.840 | 6.38 | 1.51 |
| UA-MT [21] | 4 | 156 | 0.871 | 0.787 | 11.74 | 3.56 |
| SASSNet [9] | 4 | 156 | 0.888 | 0.816 | 8.32 | 2.44 |
| Double-UA [18] | 4 | 156 | 0.887 | 0.817 | 8.04 | 2.34 |
| Tripled-UA [17] | 4 | 156 | 0.878 | 0.813 | 7.94 | 2.42 |
| CoraNet [15] | 4 | 156 | 0.882 | 0.814 | 8.21 | 2.44 |
| Double-UA* [18] | 4 | 156 | 0.890 | 0.819 | 7.93 | 2.33 |
| UA-MT* [21] | 4 | 156 | 0.874 | 0.790 | 11.33 | 3.21 |
| Tripled-UA* [17] | 4 | 156 | 0.882 | 0.810 | 7.81 | 2.47 |
| CoraNet* [15] | 4 | 156 | 0.886 | 0.812 | 8.43 | 2.39 |
| GBDL | 4 | 156 | 0.898 | 0.821 | 6.85 | 1.78 |

Table 3. Comparison with state-of-the-art semi-supervised segmentation methods on the KiTS19 dataset. “*” denotes previous methods based on the regular 3D-UNet with MC dropout.

| Method | Labeled | Unlabeled | Dice ↑ | Jaccard ↑ | 95HD ↓ | ASD ↓ |
|---|---|---|---|---|---|---|
| UA-MT [21] | 16 | 64 | 0.889 | 0.802 | 7.32 | 2.26 |
| SASSNet [9] | 16 | 64 | 0.895 | 0.812 | 8.24 | 2.20 |
| Double-UA [18] | 16 | 64 | 0.897 | 0.814 | 7.04 | 2.03 |
| Tripled-UA [17] | 16 | 64 | 0.893 | 0.810 | 7.42 | 2.21 |
| CoraNet [15] | 16 | 64 | 0.887 | 0.811 | 7.55 | 2.45 |
| Reciprocal Learning [22] | 16 | 64 | 0.901 | 0.820 | 6.70 | 2.13 |
| DTC [10] | 16 | 64 | 0.894 | 0.810 | 7.32 | 2.10 |
| 3D Graph-S2 Net [6] | 16 | 64 | 0.898 | 0.817 | 6.68 | 2.12 |
| LG-ER-MT [5] | 16 | 64 | 0.896 | 0.813 | 7.16 | 2.06 |
| Double-UA* [18] | 16 | 64 | 0.894 | 0.809 | 6.16 | 2.28 |
| UA-MT* [21] | 16 | 64 | 0.891 | 0.793 | 6.44 | 2.39 |
| Tripled-UA* [17] | 16 | 64 | 0.889 | 0.809 | 6.88 | 2.48 |
| CoraNet* [15] | 16 | 64 | 0.883 | 0.805 | 6.73 | 2.67 |
| GBDL | 16 | 64 | 0.894 | 0.822 | 4.03 | 1.48 |
| UA-MT [21] | 8 | 72 | 0.843 | 0.735 | 13.83 | 3.36 |
| SASSNet [9] | 8 | 72 | 0.873 | 0.777 | 9.62 | 2.55 |
| Double-UA [18] | 8 | 72 | 0.859 | 0.758 | 12.67 | 3.31 |
| Tripled-UA [17] | 8 | 72 | 0.868 | 0.768 | 10.42 | 2.98 |
| CoraNet [15] | 8 | 72 | 0.866 | 0.781 | 12.11 | 2.40 |
| Reciprocal Learning [22] | 8 | 72 | 0.862 | 0.760 | 11.23 | 2.66 |
| DTC [10] | 8 | 72 | 0.875 | 0.782 | 8.23 | 2.36 |
| LG-ER-MT [5] | 8 | 72 | 0.855 | 0.751 | 13.29 | 3.77 |
| 3D Graph-S2 Net [6] | 8 | 72 | 0.879 | 0.789 | 8.99 | 2.32 |
| Double-UA* [18] | 8 | 72 | 0.864 | 0.767 | 10.99 | 3.02 |
| UA-MT* [21] | 8 | 72 | 0.847 | 0.744 | 12.32 | 3.20 |
| Tripled-UA* [17] | 8 | 72 | 0.868 | 0.760 | 9.73 | 3.31 |
| CoraNet* [15] | 8 | 72 | 0.861 | 0.770 | 11.32 | 2.46 |
| GBDL | 8 | 72 | 0.884 | 0.792 | 5.89 | 1.60 |

Table 4. Comparison with state-of-the-art semi-supervised segmentation methods on the Atrial Segmentation Challenge. “*” denotes previous methods based on the regular 3D-UNet with MC dropout.

| Method | Labeled | Unlabeled | Dice ↑ | Jaccard ↑ | 95HD ↓ | ASD ↓ |
|---|---|---|---|---|---|---|
| 3D-UNet w/ MC dropout | 100 | 0 | 0.950 | 0.899 | 6.04 | 1.53 |
| UA-MT [21] | 5 | 95 | 0.920 | 0.867 | 13.21 | 4.54 |
| Double-UA [18] | 5 | 95 | 0.927 | 0.878 | 12.11 | 4.19 |
| Tripled-UA [17] | 5 | 95 | 0.921 | 0.869 | 11.77 | 3.62 |
| CoraNet [15] | 5 | 95 | 0.923 | 0.877 | 10.84 | 4.28 |
| GBDL | 5 | 95 | 0.935 | 0.884 | 7.89 | 2.42 |

Table 5. Comparison with state-of-the-art semi-supervised segmentation methods on the Liver Segmentation dataset. Previous methods in this table are also based on the regular 3D-UNet with MC dropout.

| Method | KiTS19 (16 labeled) | KiTS19 (4 labeled) | Atrial (16 labeled) | Atrial (8 labeled) |
|---|---|---|---|---|
| UA-MT [21] | 0.868 | 0.841 | 0.851 | 0.836 |
| Double-UA [18] | 0.881 | 0.864 | 0.861 | 0.842 |
| Tripled-UA [17] | 0.874 | 0.852 | 0.864 | 0.847 |
| CoraNet [15] | 0.882 | 0.866 | 0.867 | 0.855 |
| GBDL | 0.897 | 0.877 | 0.882 | 0.864 |

Table 6. PAvPU of different Bayesian deep learning methods.

[Figure 2: three line plots, each with the number of feedforward passes T (5 to 25) on the horizontal axis. Panels: (a) Influence on time consumption; (b) Influence on Dice score; (c) Influence on PAvPU metric. Legend entries: Time Cost, 3D-UNet with MC dropout, KiTS19 (16 labeled), KiTS19 (4 labeled), Atrial (16 labeled), Atrial (8 labeled).]

Figure 2. The influence of the number of feedforward passes on: (a) time costs (ms) for processing a 128 × 128 × 32 volume input on a single GeForce GTX 1080 Ti; (b) Dice score; and (c) PAvPU metric. The horizontal axis refers to the number of feedforward passes, namely, T.
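The role of the T feedforward passes studied in Figure 2 can be sketched as follows. This is a minimal NumPy illustration of Monte Carlo prediction in general, not the GBDL implementation: `make_toy_forward` is a hypothetical stand-in for a segmentation network evaluated with dropout kept active, and `mc_predict` averages T stochastic passes and derives a per-voxel predictive entropy as the uncertainty map. Because every pass is a full forward pass, the cost of this loop grows roughly linearly with T.

```python
import numpy as np


def make_toy_forward(seed=0, drop_p=0.5):
    """Hypothetical stochastic model: a two-class voxel-wise classifier
    whose input is randomly masked on every call, mimicking MC dropout."""
    rng = np.random.default_rng(seed)

    def forward(volume):
        mask = rng.random(volume.shape) > drop_p
        logit = (volume * mask) / (1.0 - drop_p)
        fg = 1.0 / (1.0 + np.exp(-logit))          # foreground probability
        return np.stack([1.0 - fg, fg], axis=0)    # shape (2, D, H, W)

    return forward


def mc_predict(stochastic_forward, volume, T=5):
    """Average T stochastic passes; return mean probabilities and entropy."""
    probs = np.stack([stochastic_forward(volume) for _ in range(T)], axis=0)
    mean_prob = probs.mean(axis=0)                 # shape (C, D, H, W)
    # Clip only inside the log so that zero probabilities contribute zero.
    log_p = np.log(np.clip(mean_prob, 1e-12, 1.0))
    entropy = -(mean_prob * log_p).sum(axis=0)     # shape (D, H, W)
    return mean_prob, entropy


if __name__ == "__main__":
    volume = np.random.default_rng(1).normal(size=(4, 8, 8))
    mean_prob, entropy = mc_predict(make_toy_forward(), volume, T=5)
    print(mean_prob.shape, entropy.shape)
```

The entropy map produced this way is one possible per-voxel uncertainty that could feed a patch-level metric such as PAvPU.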

configuration, sharing their weights would clearly lead to a lower performance, as their respective objectives are different, leading to different gradient descent directions.

Finally, when compared with the baseline model, the proposed GBDL improves the performance by a large margin with respect to the four assessment metrics for both datasets. So, the proposed method is a good solution for semi-supervised volumetric medical image segmentation.

4.4. Comparison with State-of-the-Art Methods

The proposed GBDL is compared with previous state-of-the-art methods, and since previous works provide no standard deviation, we also only report the mean values of the four evaluation metrics in Tables 3, 4, and 5. Since most previous methods are based on VNet, for fair comparison, our regular 3D-UNet with MC dropout is designed to have a similar number of parameters. Furthermore, we also re-implement previous Bayesian deep learning methods based on the regular 3D-UNet with MC dropout, and according to the results in Tables 3 and 4, their performance is similar to that of their VNet counterparts, meaning that the comparison between our method and previous methods is relatively fair.

By Tables 3, 4, and 5, the performance of GBDL is higher than previous state-of-the-art results on the three public medical datasets. When the number of labeled data decreases, the performance gap between our GBDL and previous state-of-the-art methods becomes larger, which indicates that our method is more robust when fewer labeled data are used for training. More importantly, GBDL performs better than all previous Bayesian deep-learning-based methods [15, 17, 18, 21], which verifies that GBDL is better than the teacher-student architecture used in them and that the generative learning paradigm is more suitable for pseudo-label generation and solving the semi-supervised segmentation task than the discriminative counterpart.

Furthermore, we compare the patch accuracy vs. patch uncertainty (PAvPU) metric [13] for GBDL and previous Bayesian-based methods to evaluate their uncertainty estimation. By Table 6, GBDL can output more reliable and well-calibrated uncertainty estimates than other methods, so that clinicians can post-process and refine the results in practice based on the uncertainty given by GBDL.

Finally, since full Bayesian inference is time-consuming, we show the relationship among the number of feedforward passes T, the time consumption, and the performance in Figure 2. By Figure 2(a), the time costs rise with increasing T, but by Figure 2(b) and (c), the Dice score and the PAvPU metric are not greatly impacted by T. Thus, T = 5 can be chosen to save time when obtaining the uncertainty in practice, with only a minor performance degradation.

5. Summary and Outlook

In this paper, we rethink the issues in previous Bayesian deep-learning-based methods for semi-supervised volumetric medical image segmentation, and have designed a new generative Bayesian deep learning (GBDL) architecture to solve them. The proposed method outperforms previous state-of-the-art methods on three public medical benchmarks, showing its effectiveness for handling data with only a very limited number of annotations.

However, in GBDL, there is still a limitation with regard to the time consumption for full Bayesian inference and uncertainty estimation. By Figure 2(a), a larger number of feedforward passes leads to a high cost in terms of time. Since Bayesian deep learning is important for some safety-critical areas, such as medical diagnosis and autonomous driving, future research can try to find better methods to improve the inference speed for GBDL. Moreover, since GBDL is a newly proposed architecture, future research can also consider combining other previous deep-learning-based methods (such as different training and learning strategies) with GBDL to further improve the performance.

Acknowledgments. This work was partially supported by the Alan Turing Institute under the EPSRC grant EP/N510129/1, by the AXA Research Fund, by the EPSRC grant EP/R013667/1, and by the EU TAILOR grant. We also acknowledge the use of the EPSRC-funded Tier 2 facility JADE (EP/P020275/1) and GPU computing support by Scan Computers International Ltd.

References

[1] Wenjia Bai, Ozan Oktay, Matthew Sinclair, Hideaki Suzuki, Martin Rajchl, Giacomo Tarroni, Ben Glocker, Andrew King, Paul M Matthews, and Daniel Rueckert. Semi-supervised learning for network-based cardiac MR image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 253–260. Springer, 2017. 1, 2

[2] Gerda Bortsova, Florian Dubost, Laurens Hogeweg, Ioannis Katramados, and Marleen de Bruijne. Semi-supervised medical image segmentation via learning consistency under transformations. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 810–818. Springer, 2019. 1, 2, 3

[3] Kang Fang and Wu-Jun Li. DMNet: Difference minimization network for semi-supervised segmentation in medical images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 532–541. Springer, 2020. 5

[4] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050–1059. PMLR, 2016. 1, 3

[5] Wenlong Hang, Wei Feng, Shuang Liang, Lequan Yu, Qiong Wang, Kup-Sze Choi, and Jing Qin. Local and global structure-aware entropy regularized mean teacher model for 3D left atrium segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 562–571. Springer, 2020. 1, 2, 7

[6] Huimin Huang, Nan Zhou, Lanfen Lin, Hongjie Hu, Yutaro Iwamoto, Xian-Hua Han, Yen-Wei Chen, and Ruofeng Tong. 3D Graph-S2 Net: Shape-aware self-ensembling network for semi-supervised segmentation with bilateral graph convolution. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 416–427. Springer, 2021. 1, 2, 3, 6, 7

[7] Alex Kendall and Yarin Gal. What uncertainties do we need in Bayesian deep learning for computer vision? In Advances in Neural Information Processing Systems, 2017. 1

[8] Philipp Krähenbühl and Vladlen Koltun. Efficient inference in fully connected CRFs with Gaussian edge potentials. volume 24, pages 109–117, 2011. 2

[9] Shuailin Li, Chuyu Zhang, and Xuming He. Shape-aware semi-supervised 3D semantic segmentation for medical images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 552–561. Springer, 2020. 1, 2, 3, 6, 7

[10] Xiangde Luo, Jieneng Chen, Tao Song, and Guotai Wang. Semi-supervised medical image segmentation through dual-task consistency. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 8801–8809, 2021. 1, 2, 3, 6, 7

[11] Xiangde Luo, Wenjun Liao, Jieneng Chen, Tao Song, Yinan Chen, Shichuan Zhang, Nianyong Chen, Guotai Wang, and Shaoting Zhang. Efficient semi-supervised gross target volume of nasopharyngeal carcinoma segmentation via uncertainty rectified pyramid consistency. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 318–329. Springer, 2021. 1, 2, 3

[12] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In International Conference on 3D Vision (3DV), pages 565–571. IEEE, 2016. 5

[13] Jishnu Mukhoti and Yarin Gal. Evaluating Bayesian deep learning methods for semantic segmentation. arXiv:1811.12709, 2018. 8

[14] Suman Sedai, Bhavna Antony, Ravneet Rai, Katie Jones, Hiroshi Ishikawa, Joel Schuman, Wollstein Gadi, and Rahil Garnavi. Uncertainty guided semi-supervised segmentation of retinal layers in OCT images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 282–290. Springer, 2019. 1, 2

[15] Yinghuan Shi, Jian Zhang, Tong Ling, Jiwen Lu, Yefeng Zheng, Qian Yu, Lei Qi, and Yang Gao. Inconsistency-aware uncertainty estimation for semi-supervised medical image segmentation. IEEE Transactions on Medical Imaging, 2021. 1, 2, 7, 8

[16] Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning structured output representation using deep conditional generative models. volume 28, pages 3483–3491, 2015. 4, 5

[17] Kaiping Wang, Bo Zhan, Chen Zu, Xi Wu, Jiliu Zhou, Luping Zhou, and Yan Wang. Tripled-uncertainty guided mean teacher model for semi-supervised medical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 450–460. Springer, 2021. 1, 2, 7, 8

[18] Yixin Wang, Yao Zhang, Jiang Tian, Cheng Zhong, Zhongchao Shi, Yang Zhang, and Zhiqiang He. Double-uncertainty weighted method for semi-supervised learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 542–551. Springer, 2020. 1, 2, 5, 6, 7, 8

[19] Yutong Xie, Jianpeng Zhang, Zhibin Liao, Johan Verjans, Chunhua Shen, and Yong Xia. Pairwise relation learning for semi-supervised gland segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 417–427. Springer, 2020. 1, 2, 3

[20] Zhaohan Xiong, Qing Xia, Zhiqiang Hu, Ning Huang, Cheng Bian, Yefeng Zheng, Sulaiman Vesal, Nishant Ravikumar, Andreas Maier, Xin Yang, et al. A global benchmark of algorithms for segmenting the left atrium from late gadolinium-enhanced cardiac magnetic resonance imaging. Medical Image Analysis, 67:101832, 2020. 6

[21] Lequan Yu, Shujun Wang, Xiaomeng Li, Chi-Wing Fu, and Pheng-Ann Heng. Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 605–613. Springer, 2019. 1, 2, 7, 8

[22] Xiangyun Zeng, Rian Huang, Yuming Zhong, Dong Sun, Chu Han, Di Lin, Dong Ni, and Yi Wang. Reciprocal learning for semi-supervised segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 352–361. Springer, 2021. 1, 2, 7
