MBANet: A 3D convolutional neural network with multi-branch attention for brain tumor segmentation from
ARTICLE INFO
Keywords: MBANet; Brain tumor segmentation; 3D multi-branch attention; 3D convolutional neural network

ABSTRACT
More than half of brain tumors are malignant, so fast and accurate segmentation of tumor regions in brain Magnetic Resonance Imaging (MRI) images is needed. Traditional 2D brain tumor segmentation methods largely ignore the spatial context features of brain tumor MRI images. The main problem is therefore how to accurately segment brain tumor regions from multiple MRI modalities. This paper proposes MBANet, a 3D convolutional neural network with 3D multi-branch attention.

First, an optimized shuffle unit is used to form the basic unit (BU) module of MBANet. In the BU module, the input channels are split and group convolution is applied to them. After fusion, a channel shuffle rearranges the convolutional channels.

Then, MBANet uses a novel multi-branch 3D Shuffle Attention (SA) module as the attention layer in the encoder. The 3D SA module groups the feature maps along the channel dimension into small sub-features. For each sub-feature it builds both channel attention and spatial attention while adopting the BU module. In addition, a 3D SA module is used in the skip connections of MBANet to better recover the resolution of the upsampled semantic features.

Experiments were run on BraTS 2018 and BraTS 2019. The Dice scores for the enhancing tumor (ET), whole tumor (WT) and tumor core (TC) reach 80.18%, 89.80% and 85.47% on BraTS 2018, and 78.21%, 89.79% and 83.04% on BraTS 2019. These results show that MBANet compares favorably with other state-of-the-art methods.
* Corresponding authors at: College of Information Science and Technology, School of Data Science, Qingdao University of Science and Technology, Qingdao
266061, China; School of Data Science, University of Science and Technology of China, Hefei 230027, China (B. Yu).
E-mail addresses: [email protected] (Y. Feng), [email protected] (B. Yu).
https://doi.org/10.1016/j.bspc.2022.104296
Received 7 June 2022; Received in revised form 27 September 2022; Accepted 8 October 2022
Available online 18 October 2022
1746-8094/© 2022 Elsevier Ltd. All rights reserved.
Y. Cao et al. Biomedical Signal Processing and Control 80 (2023) 104296
perform pixel-level classification of images. UNet [12] is a network for medical image segmentation, proposed on the basis of FCN. It consists of an encoder, a decoder, and skip connections between the encoder and the decoder. UNet is mainly used for pixel-level image classification. It extracts more detail from the feature maps and improves the accuracy of deep convolutional neural networks in image segmentation. Other baseline networks for semantic segmentation of medical images include Deeplabv3 [13].

These baselines differ in how they handle context:
- FCN cannot obtain global information because its receptive field is too small.
- Deeplabv3 obtains a larger receptive field by using different dilation rates, but this ignores local information to a certain extent.
- In UNet, the shallow layers focus more on local information and the deep layers focus more on global information. The skip connections of UNet are also effective in recovering the fine-grained details of the target object.

Therefore, UNet is widely used in semantic segmentation of medical images. At the same time, many lightweight networks [14–18] have appeared. They use operations such as group convolution and depthwise separable convolution, so network training requires fewer resources and the models have fewer parameters.

Attention mechanisms have become a top priority in areas such as image processing, speech recognition, and natural language processing [19]. In a patient's brain MRI images the tumor area is small, and an attention mechanism can focus on the tumor area and extract more useful features. Lightweight attention modules commonly used for semantic segmentation of medical images include BAM, CBAM, Feature Pyramid Attention (FPA), and Squeeze-and-Excitation (SE).
- BAM [20] uses atrous convolution to obtain a larger receptive field and incorporates channel attention and global max-pooling on feature maps.
- CBAM [21] uses both spatial attention and channel attention; unlike BAM, it carries out channel attention and spatial attention separately.
- FPA [22] is adapted from spatial pyramid pooling. Unlike spatial pyramid pooling, FPA builds pyramids through convolutions of different sizes and adds channel attention on this basis.
- SE [23] addresses the loss caused by different feature maps during convolution and pooling, and focuses mainly on channel attention.

Spatial attention and channel attention capture pairwise pixel-level relationships and channel-to-channel dependencies, respectively. Using both at the same time can achieve better results, but inevitably increases the amount of computation.

The multi-branch Shuffle Attention (SA) [24], however, can effectively combine spatial attention and channel attention at the same time. It avoids the shortcomings of a single attention type and also constructs a kind of global attention. Compared with BAM and FPA, SA requires fewer computing resources, and it captures more spatial features than SE. Unlike CBAM, SA fuses spatial attention and channel attention in the channel dimension instead of applying them separately, which effectively reduces feature loss. In addition, SA attention greatly reduces the amount of computation.

The development of deep learning and the use of attention mechanisms in the semantic segmentation of medical images have eased the burden on radiologists. They have also highlighted the important role that artificial intelligence can play in image segmentation. However, two problems remain:
- MRI images are 3D, yet many researchers crop them into patches and segment tumors with 2D networks. This harms the continuity of tumor-region segmentation across the MRI volume and ignores some spatial context features.
- The tumor area is small and severely imbalanced.

To solve these problems, this paper proposes MBANet, a multi-branch attention-based 3D lightweight network for brain tumor segmentation. The main contributions of this paper are as follows:

1. A 3D lightweight segmentation network for multimodal brain tumor MRI images, MBANet, is proposed. It uses an improved Basic Unit (BU) module based on the shuffle unit and applies group convolution to greatly reduce the network's computation and parameters. It also achieves the best results in multimodal brain tumor segmentation.
2. To obtain more accurate segmentation results, MBANet uses a multi-branch 3D Shuffle Attention (SA) module as an attention layer in brain tumor MRI image segmentation. The 3D SA module applies spatial attention and channel attention and concatenates the generated feature maps in the channel dimension. The combination of the 3D SA module and the BU module also makes effective use of channel shuffle.
3. MBANet also adds a 3D SA attention module to the skip connections. With attention on the skip connections, MBANet concatenates the features obtained by the encoder and decoder and also effectively avoids feature loss.

The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 introduces the methods and details of MBANet. Section 4 presents the implementation details of MBANet and the ablation study. Section 5 gives the conclusions.

Fig. 1. Example of data from a training case in BraTS 2018. The first four images show the different MRI modalities, and the fifth is the ground truth (GT). In GT, red represents the tumor core, yellow the enhancing tumor, green the edema region, and everything else is background.

2. Related work

2.1. Brain tumor segmentation in artificial intelligence

Machine learning was proposed in the 1950s and has had a great impact on artificial intelligence (AI). Several machine learning algorithms are popular for computer vision tasks on medical images. Researchers have used machine-learning feature extraction algorithms to extract features from brain MRI images, then used random forest classifiers to classify brain tumor voxels and so segment brain tumors [25–27]. Deep learning grew out of the development of machine learning. The emergence of convolutional neural networks has brought great progress in the semantic segmentation of medical images. UNet [12] is the baseline of most models for semantic segmentation of medical images, and many researchers have begun to explore U-shaped networks for semantic segmentation. UNet uses a fully convolutional neural network, which contains encoder
and decoder structures: the encoder extracts features, and the decoder performs feature fusion. Skip connections also pass more global feature information while mitigating the semantic gap between the encoder and decoder.

Many UNet-style architectures have been proposed for brain tumor segmentation:
- Cinar et al. [28] proposed a hybrid DenseNet121-UNet architecture to improve Dice similarity coefficient (DSC) accuracy on small regions such as the tumor core and enhancing tumor.
- Wu et al. [29] proposed DE-ResUNet, which extracts texture features from T1 images and connects encoder and decoder features through a channel attention mechanism.
- Chen et al. [30] proposed a lightweight network for real-time dense voxel brain tumor segmentation. They showed that 3D shuffle units can reduce the number of parameters, and also used atrous convolution to obtain a larger receptive field.
- Zhou et al. [31] proposed ERVNet, which greatly improves efficiency but has a large number of parameters, increasing the GPU workload.
- Liu et al. [32] proposed a scale-adaptive MetricUNet, which learns more contextual information while reducing computation.
- Jiang et al. [33] proposed DDU-net, a dual-branch decoding network in which one branch handles semantic flow information and the other handles edge flow information alone.
- Zhang et al. [34] proposed a multi-encoder architecture in which each MRI modality has its own encoder. The features from the four encoders are fused into the final feature map and fed into the decoder, which reduces the difficulty of feature extraction and improves model performance.
- Building on UNet, Abdollahi et al. [35] proposed the VNet structure, which was also applied to medical image segmentation with excellent results.
- Huang et al. [36] improved the traditional V-Net framework with two parallel decoder branches. This structure performs both the segmentation task and a distance estimation task, making the segmentation result more accurate.

Spatial pyramids [37] combined with UNet are widely used in semantic image segmentation. Atrous convolution obtains a larger receptive field through its different atrous rates.
- ESPNet [38] and ESPNetV2 [39] fuse multi-scale features using an atrous convolutional feature pyramid without increasing the number of parameters. The depthwise separable atrous convolution pyramid works better than depthwise separable convolution alone.
- Wang et al. [40] proposed an automatic brain tumor segmentation network with spatially expanded feature pyramid modules. These modules extract multi-scale image features and deepen the network while using a residual architecture.
- Khan et al. [41] proposed a pyramid-based small encoder-decoder network for multi-scale brain tumor segmentation, whose pyramid uses coarse-to-fine prediction to extract features at different scales.
- Zhou et al. [42] used atrous convolution instead of pooling to build a 3D atrous convolution feature pyramid, and post-processed the output with a 3D fully-connected conditional random field (CRF).
- Zhou et al. [43] used 3D atrous convolution to design a new feature pyramid module and 3D dense connections for feature reuse, fusing more multi-scale contextual information.

2.2. Attention mechanisms in lightweight networks

In a patient's brain MRI images the tumor tissue area is small, and the boundaries between regions are not clear across the various modalities. To focus on tumor regions, many researchers have introduced attention mechanisms from NLP into semantic segmentation of medical images [19]:
- Kong et al. [44] proposed a 3D fully convolutional network with global and local attention mechanisms. The global attention mainly segments large tumor regions accurately, and the local attention corrects the features obtained from the global attention.
- Huang et al. [45] proposed GCAUNet, which fully uses the low-level detail information of tumor regions and introduces a parallel network, significantly improving brain tumor segmentation.
- Akbar et al. [46] replaced processing blocks with a sequence of atrous convolutions and proposed a UNet architecture that adds attention to the skip connections.
- Wang et al. [47] proposed an improved U-Net with residual connections, channel attention blocks and hybrid dilated attention convolutional layers for semantic segmentation of medical images.
- Ma et al. [48] proposed an attention pre-activation residual module in their 2D end-to-end attention network for multi-task deep supervision and multi-scale informative feature fusion.
- Zhou et al. [49] used an initial 3D U-Net to generate an additional contextual constraint for each tumor region. Under these constraints, an attention mechanism fuses multi-sequence MRI to segment three single tumor regions.
- Xu et al. [50] segmented brain tumors slice by slice and proposed a corner attention module. It captures complementary information between slices and improves the network's representation capability.
- Guan et al. [51] proposed AGSE-VNet, which adds SE attention to the encoder and an AG module to the decoder. This increases useful channel information, suppresses useless information such as noise, and accurately extracts the features of the tumor area.

Besides attention mechanisms, generative adversarial networks (GANs) [52] are also widely used in medical image segmentation [53–58]. A GAN is composed of a generator and a discriminator, and has some disadvantages:
- A Nash equilibrium is not easy to reach.
- Training may collapse, causing the generator to degenerate.
- Uncontrollable factors in the GAN model can lead to inaccurate results.

An attention mechanism, by contrast, focuses on relevant information and ignores irrelevant information. Its computation can also be parallelized, and the resulting model is simpler with fewer parameters, which suits lightweight networks. Therefore, this paper uses an attention mechanism to segment brain tumor regions.

3. Method

This section details MBANet, a multi-branch attention network architecture for brain tumor MRI images. It first introduces the overall network architecture, then the multi-branch lightweight 3D SA attention module. It then describes the encoder and decoder architectures in detail, and finally the skip connections that use the 3D SA module.

Fig. 2. The proposed MBANet architecture. Cuboids with different colors represent different modules. The information above each cuboid describes the corresponding feature maps.

3.1. Proposed MBANet architecture

MBANet is a lightweight network, as shown in Fig. 2. Like the traditional UNet, it consists of an encoder on the left, a decoder on the right, and skip connections with added attention. The network proceeds as follows:
1. A 3 × 3 × 3 convolution converts the input image from 4 channels to 32 channels.
2. The BU module, which uses group convolution, serves as the basic building block of MBANet and is a key component of the encoder.
3. During encoding, a 3D SA attention layer is added to each layer to capture more useful information on each channel.
4. In the upsampling stage, MBANet uses transposed convolution, which adjusts and restores the number of network channels. Transposed convolution can also reduce the number of hyperparameters in the network.
5. A 1 × 1 × 1 convolution maps the resulting feature map back to 4 channels, and a Softmax function produces the segmentation map.

Overall, MBANet has superior performance. During training, MBANet converges at a
very fast speed. Grouped convolution reduces the number of parameters and the running time of the network. Used appropriately, it also reduces memory access cost and improves parallelism in MBANet.

In each layer of the network, constraints can be imposed by enforcing separately identifiable representations for each MRI modality, followed by convolution and attention mechanisms for feature extraction. In this way, the network can identify key information from each MRI modality, which enables accurate segmentation and pixel-level classification of tumor regions.

A convolutional neural network extracts features from low level to high level. The convolution kernels of the early layers first extract low-level features. As the layers deepen, edge features are converted into local features, and multiple sub-features are then combined into an overall feature. From the point of view of explainable artificial intelligence, adding an attention mechanism can improve interpretability and reveal the relationships between feature maps of different channels [59,60].

3.2. 3D SA module

In the development of Convolutional Neural Networks (CNN), the

respective feature maps through the channel attention and the spatial attention.

For the channel attention part, channel statistics are first generated by global average pooling (GAP). GAP embeds global information $s \in \mathbb{R}^{C/2G \times 1 \times 1 \times 1}$ by shrinking $X_{a1}$ through the spatial dimensions $H \times W \times D$:

$$ s = F_{gp}(X_{a1}) = \frac{1}{H \times W \times D} \sum_{i=1}^{H} \sum_{j=1}^{W} \sum_{k=1}^{D} X_{a1}(i, j, k) \qquad (1) $$

The channel attention result is then given by Eq. (2):

$$ X'_{a1} = \sigma(F_c(s)) \cdot X_{a1} = \sigma(W_1 s + b_1) \cdot X_{a1} \qquad (2) $$

Here $W_1 \in \mathbb{R}^{C/2G \times 1 \times 1 \times 1}$ and $b_1 \in \mathbb{R}^{C/2G \times 1 \times 1 \times 1}$ scale and shift $s$. $\sigma(\cdot)$ is a simple gating mechanism with sigmoid activation, which accomplishes feature condensation.

In the spatial attention part, group normalization (GN) is first applied to $X_{a2}$ to obtain spatial statistics. Then $F(\cdot)$, the feature prediction obtained on each channel, is used to enhance $\hat{X}_{a2}$. The output is $X'_{a2} = \sigma(W_2 \cdot GN(X_{a2}) + b_2) \cdot X_{a2}$, where $W_2, b_2 \in \mathbb{R}^{C/2G \times 1 \times 1 \times 1}$ are the parameters.

The two branches are then concatenated into $X'_a = [X'_{a1}, X'_{a2}] \in \mathbb{R}^{C/G \times H \times W \times D}$. All generated sub-features will
Fig. 3. The 3D SA module. The channels are first split into two parts: a channel attention branch and a spatial attention branch. The sub-features generated by the two branches are then fused. Finally, a "channel shuffle" operation maintains information exchange between the different sub-features. Cuboids represent sub-features generated by different operations.
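The channel branch (Eqs. (1)–(2)), the spatial branch, the concatenation and the final channel shuffle can be illustrated with a dependency-free Python sketch. It is a minimal sketch under stated assumptions, not the authors' PyTorch implementation:
- A feature map is a list of channels, each a flat list of its H × W × D voxel values.
- $W_1, b_1, W_2, b_2$ are plain lists of length C/2G, shared across the G groups.
- Group normalization uses one group per channel and omits its learnable affine parameters.
- The final shuffle uses two groups.

All function names are hypothetical.

```python
import math

# Assumed layout (not from the paper): a feature map is a list of channels,
# each channel a flat list of its H*W*D voxel values. GAP and the per-channel
# normalization below do not depend on how the voxels are arranged.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def channel_branch(x_a1, w1, b1):
    """Eqs. (1)-(2): one sigmoid gate per channel from its global average."""
    out = []
    for c, channel in enumerate(x_a1):
        s = sum(channel) / len(channel)          # Eq. (1): global average pooling
        gate = sigmoid(w1[c] * s + b1[c])        # Eq. (2): scale, shift, sigmoid
        out.append([gate * v for v in channel])
    return out

def group_norm_per_channel(channel, eps=1e-5):
    """GN with one group per channel, no learnable affine (simplification)."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return [(v - mean) / math.sqrt(var + eps) for v in channel]

def spatial_branch(x_a2, w2, b2):
    """X'_a2 = sigmoid(W2 * GN(X_a2) + b2) * X_a2: one gate per voxel."""
    out = []
    for c, channel in enumerate(x_a2):
        normed = group_norm_per_channel(channel)
        out.append([sigmoid(w2[c] * n + b2[c]) * v
                    for n, v in zip(normed, channel)])
    return out

def channel_shuffle(channels, groups):
    """Reshape to (groups, C/groups), transpose, flatten back to C channels."""
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group) for g in range(groups)]

def shuffle_attention(x, groups, w1, b1, w2, b2):
    """Split C channels into G sub-features; each gets both branches."""
    if len(x) % (2 * groups):
        raise ValueError("channel count must be divisible by 2 * groups")
    per_group = len(x) // groups          # C/G channels per sub-feature
    half = per_group // 2                 # C/2G channels per branch
    out = []
    for g in range(groups):
        x_a = x[g * per_group:(g + 1) * per_group]
        x_a1, x_a2 = x_a[:half], x_a[half:]
        # X'_a = [X'_a1, X'_a2]: concatenate the two branches along channels
        out.extend(channel_branch(x_a1, w1, b1) + spatial_branch(x_a2, w2, b2))
    return channel_shuffle(out, 2)        # mix information across sub-features

if __name__ == "__main__":
    x = [[float(c + v) for v in range(4)] for c in range(8)]  # C = 8, 4 voxels
    z = [0.0, 0.0]                                             # C/2G = 2 with G = 2
    print(shuffle_attention(x, 2, z, z, z, z)[:2])
```

With all weights and biases set to zero, every gate equals σ(0) = 0.5, so the output is half of the input reordered by the shuffle. Nonzero $W_1, b_1$ let the channel branch rescale whole channels, while nonzero $W_2, b_2$ let the spatial branch weight individual voxels.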
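Grouped convolution is what makes the BU module cheap. With g groups, each output channel only sees C_in/g input channels, so a k × k × k 3D convolution needs k³ · C_in · C_out / g weights instead of k³ · C_in · C_out. This is the (out, in/groups, k, k, k) weight layout of PyTorch's `nn.Conv3d`.

The sketch below counts weights for a hypothetical BU-style branch of 1 × 1 × 1, 3 × 3 × 3 and 1 × 1 × 1 convolutions. The width of 32 channels, the choice of 4 groups, and grouping all three convolutions are illustrative assumptions, not values reported for MBANet.

```python
def conv3d_weight_count(c_in, c_out, k, groups=1, bias=False):
    """Number of parameters of a k x k x k 3D convolution.

    Each output channel only sees c_in // groups input channels, so the
    weight tensor has shape (c_out, c_in // groups, k, k, k).
    """
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    count = c_out * (c_in // groups) * k ** 3
    return count + (c_out if bias else 0)

def bu_branch_weights(channels, groups):
    """Hypothetical BU-style branch: 1x1x1 -> 3x3x3 -> 1x1x1, constant width."""
    return sum(conv3d_weight_count(channels, channels, k, groups)
               for k in (1, 3, 1))

if __name__ == "__main__":
    print(conv3d_weight_count(4, 32, 3))             # 4 -> 32 stem from Section 3.1
    print(bu_branch_weights(32, 1), bu_branch_weights(32, 4))
```

For the 4 → 32 channel 3 × 3 × 3 stem described in Section 3.1, `conv3d_weight_count(4, 32, 3)` gives 3456 weights (bias excluded). For the hypothetical branch, `bu_branch_weights(32, 1)` gives 29696 and `bu_branch_weights(32, 4)` gives 7424, a fourfold reduction.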
convolution to all convolution operations. After that, a 3D SA module is added as the attention layer in the downsampling of each layer. This reduces the influence of unimportant background tissue on the semantic segmentation of the image.

In recent years, many lightweight networks have appeared. Group convolution divides the feature maps of the input layer into groups and convolves each group with its own convolution kernels. The BU module in MBANet implements the following process:
1. The feature map channels are split, and the split channels pass through three consecutive convolutions with kernel sizes 1 × 1 × 1, 3 × 3 × 3 and 1 × 1 × 1. Batch normalization follows each convolution, and a ReLU activation follows the 1 × 1 × 1 convolutions.
2. The split channels are concatenated, and the feature map is expanded (reshaped) by channel.
3. The expanded feature map is transposed.
4. The transposed feature maps are folded back (flattened) into new feature maps, completing the "Channel Shuffle" operation.

Channel shuffle works by rearranging the channel order of the original feature maps. It requires choosing the number of input channels and the number of groups they are divided into. Here, the input channels are used as the number of groups, and the grouped channels serve as the input of the next convolution. This fully fuses information across the channels without increasing the amount of computation.

3.4. Decoder architecture

MBANet uses an upsampling module in its decoding stage. Each upsampling module works as follows:
1. The feature map passes through two standard 3 × 3 × 3 convolutions.
2. A transposed convolution follows.
3. A 1 × 1 × 1 convolution fuses the concatenation of the features from the standard convolutions with the feature channels from the transposed convolution.

All conventional convolutions are followed by batch normalization and a ReLU activation. This speeds up model learning, simplifies parameter tuning, and alleviates the vanishing-gradient problem. Transposed convolution reduces feature loss as the network widens while achieving dimensionality reduction. After the upsampling module, the BU module performs the channel shuffle operation to obtain better and richer semantic features. Finally, as shown in Fig. 2, a standard convolution at the end of the network restores the number of channels to the 4 input channels, and the Softmax function produces the segmentation result.

3.5. Skip connections with attention

A skip connection with attention is added between the encoder and the decoder. The 3D SA module serves as the attention in the skip connection and organizes the channel attention and the spatial attention effectively. Combined with the network of BU modules, it achieves the best segmentation results. This study also tried other attention modules; the next section compares their impact on the segmentation results.

4. Experiments

4.1. Dataset and processing

This experiment uses the BraTS 2018 and BraTS 2019 datasets from the Center for Biomedical Image Computing and Analytics (CBICA)¹ at the University of Pennsylvania [61].

BraTS 2018:
- The training set contains 285 patients, of which 210 are HGG and 75 are LGG.
- The validation set contains 66 cases.
- Each patient's data contains 4 modalities (T1, T1ce, T2 and FLAIR) and GT, and each MRI image is 240 × 240 × 155.

The tumor regions are defined as follows:
- The whole tumor region contains all tumor parts, i.e. everything except background.
- The tumor core region includes the necrotic, enhancing and non-enhancing tumor parts.
- The enhancing tumor region contains only the enhancing tumor part.

BraTS 2019:
- The training set includes 335 glioma cases, of which 259 are HGG and 76 are LGG.
- The validation set has 125 cases. Like the BraTS 2018 validation set, it has four different modalities and ground truth labels.

All training data are preprocessed. Each brain tumor image of size 240 × 240 × 155 is randomly cropped to 128 × 128 × 128, randomly flipped along different axes, and randomly rotated within the range [−10°, 10°]. This preprocessing helps avoid overfitting caused by the limited number of training samples.

¹ https://ipp.cbica.upenn.edu/

4.2. Implementation detail

MBANet is implemented in PyTorch [62]. The model input size is 240 × 240 × 155 and the batch size is 12. After training for 400 epochs on 6 NVIDIA RTX A5000 GPUs the model essentially converged, and training took only 4 h. The experiment uses the Adam optimizer with an initial learning rate of 0.0001, the L2 norm is applied for model regularization, and
the weight decay rate is $10^{-5}$. MBANet is evaluated on the unlabeled validation set provided by the BraTS challenge. The segmentation results are submitted to the CBICA online evaluation platform, which returns the evaluation results.

4.2.1. Loss function

The loss function measures how far the model's segmentation predictions deviate from GT. If the task has only one target, the Dice loss function can often be used directly. In a brain tumor MRI image, however, the tumor occupies only a small fraction of the image, so it is a small target, and there are serious differences

represents the weight of each category, and n represents the total number of voxels in the feature maps.

4.2.2. Post processing

The labels of the BraTS validation set are imbalanced. For example, testing showed that some labels are 0 in some cases, which leads to a large number of false positive samples. Post-processing therefore uses test-time augmentation (TTA) [63], which improves accuracy by several percentage points.

In TTA, the original image is augmented at the prediction stage, for example by horizontal flip, vertical flip, diagonal flip and rotation. Inference is run on each augmented input, and the results are combined into the final output. Averaging the predictions of several augmented versions of the input compensates for important features missing from some test samples and improves test results.

4.3. Evaluation metrics

This paper evaluates MBANet using the Dice coefficient, sensitivity, specificity and Hausdorff distance, the evaluation criteria used by most researchers.

4.3.1. Dice coefficient

The Dice coefficient is commonly used to measure similarity. Here it measures the overlap between the predicted tumor region and GT, as expressed by Eq. (4):

$$ \mathrm{Dice} = \frac{2TP}{2TP + FP + FN} \qquad (4) $$

Here TP denotes true positives, FN false negatives and FP false positives. The larger the Dice coefficient, the more similar the predicted tumor is to the real tumor area.

4.3.2. Sensitivity and Specificity

Sensitivity measures the ability of the model to segment regions of interest. It is expressed by Eq. (5):

$$ \mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (5) $$

Specificity is the true negative (TN) rate. It measures the ability of the model to correctly identify pixels that are not in the region of interest, and is expressed by Eq. (6):

$$ \mathrm{Specificity} = \frac{TN}{TN + FP} \qquad (6) $$

4.3.3. Hausdorff distance

In image semantic segmentation, the Hausdorff distance mainly measures the similarity between the boundaries of the segmentation results and those of the labels. This work uses Hausdorff95, which replaces the maximum in the Hausdorff distance with the 95th percentile of the boundary distances, making it less sensitive to outliers. The Hausdorff distance is expressed by Eq. (7):

$$ d_H(X, Y) = \max\left\{ \max_{x \in X} \min_{y \in Y} d(x, y),\ \max_{y \in Y} \min_{x \in X} d(x, y) \right\} \qquad (7) $$

beled validation set is used for segmentation of multimodal brain tumors, and the segmentation results are then uploaded to the CBICA platform for model evaluation.

4.4.1. Performance on BraTS 2018 validation set

This section compares MBANet with other state-of-the-art methods on the BraTS 2018 validation set and visualizes MBANet's segmentation results against the labeled data. On the BraTS 2018 validation set, MBANet achieves Dice scores of 80.18%, 89.80% and 85.47% on the enhancing tumor, whole tumor and tumor core, respectively. Table 1 compares MBANet with other leading methods:
- Chandra et al. [64] learn the axial, sagittal and coronal views with a 3D ResNet-18 residual network and fuse the features of each view. They then train a pixel linear classifier for each of the whole tumor, tumor core and enhancing tumor categories.
- Hua et al. [65] proposed a cascaded V-Nets approach to segment multimodal brain tumors, using three multimodal image preprocessing pipelines and training different models to improve segmentation performance.
- Kermi et al. [66] proposed an improved UNet that uses deep convolutional neural networks (DNNs) for brain tumor segmentation.
- DMF-Net, proposed by Chen et al. [30], combines group convolution and atrous convolution and achieves the best result on the whole tumor.

MBANet uses the BU module and SA attention to extract features and adds attention to the skip connections. Unlike the other methods, it uses only lightweight modules without complex convolution operations. As Table 1 shows, MBANet achieves the best Dice scores and the lowest Hausdorff95 distances for the enhancing tumor and tumor core on the BraTS 2018 validation set among the compared convolutional neural networks.

To compare the predictions of MBANet with the ground truth (GT), the labeled training set is also randomly divided into a training split and a test split, and GT is withheld from the test split. MBANet is trained on the training split, and predictions are obtained on the test split. Fig. 4 shows a visual comparison between the segmentation results of MBANet and GT.

4.4.2. Performance on BraTS 2019 validation set

This paper also conducts experiments on the BraTS 2019 validation set. There MBANet achieves Dice scores of 78.21%, 89.79% and 83.04% on the enhancing tumor, whole tumor and tumor core, respectively. Table 2 compares MBANet with other leading methods; it shows that MBANet achieves the best enhancing tumor and tumor core Dice scores on the BraTS 2019 validation set. Among the compared methods, Li et al. [67]
6
Y. Cao et al. Biomedical Signal Processing and Control 80 (2023) 104296
proposed a coarse-to-fine multi-step cascaded network that uses the result of each step as prior information to guide the finer segmentation of the next step. Wang et al. [68] used a patch strategy to segment brain tumor images, but patches cannot fully capture the global information of MRI images. Islam et al. [69] combined channel and spatial attention with a decoder network to build an attention-based convolutional neural network (CNN) for segmenting brain tumors in magnetic resonance images (MRI). Cheng et al. [70] proposed a multi-task learning method that employs a two-scale lightweight network to segment the different tumor regions. Compared with these methods, MBANet achieves the highest Dice scores on ET and TC, and thus the best overall results on the BraTS 2019 validation set.

On the BraTS 2019 dataset, MBANet is trained and tested with the same random split as used for BraTS 2018, to examine the difference between its predictions and the GT. Fig. 5 shows a visual comparison of the MBANet segmentation results with the GT. The segmentation results of MBANet are close to the GT, which further indicates that MBANet not only performs well on the BraTS 2018 dataset but also generalizes to the BraTS 2019 dataset.

Table 1
Comparisons with the state-of-the-art methods on the BraTS 2018 validation set.

| Methods | Dice ET (%) | Dice WT (%) | Dice TC (%) | Hausdorff95 ET (mm) | Hausdorff95 WT (mm) | Hausdorff95 TC (mm) | Sensitivity ET (%) | Sensitivity WT (%) | Sensitivity TC (%) | Specificity ET (%) | Specificity WT (%) | Specificity TC (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Chandra et al. [64] | 74.06 | 87.19 | 79.90 | 5.58 | 5.04 | 9.59 | 79.47 | 83.00 | 78.88 | 99.70 | 99.64 | 99.73 |
| UNet3D [46] | 77.71 | 89.59 | 79.77 | 3.90 | 9.13 | 8.67 | 81.37 | 92.57 | 80.45 | 99.79 | 99.27 | 99.76 |
| Hua et al. [65] | 77.68 | 90.48 | 83.64 | 3.51 | 5.18 | 6.29 | 81.66 | 91.46 | 84.53 | 99.77 | 99.45 | 99.71 |
| Kermi et al. [66] | 78.30 | 86.80 | 80.50 | 3.73 | 8.13 | 9.84 | 83.60 | 89.50 | 80.70 | 99.70 | 99.10 | 99.70 |
| DMFNet [30] | 80.12 | 90.62 | 84.54 | 3.06 | 4.66 | 6.44 | 83.37 | 92.60 | 83.52 | 99.78 | 99.34 | 99.78 |
| MBANet | 80.18 | 89.80 | 85.47 | 2.47 | 5.13 | 5.65 | 84.12 | 93.44 | 83.67 | 99.76 | 99.24 | 99.81 |

Table 2
Comparisons with the state-of-the-art methods on the BraTS 2019 validation set.

| Methods | Dice ET (%) | Dice WT (%) | Dice TC (%) | Hausdorff95 ET (mm) | Hausdorff95 WT (mm) | Hausdorff95 TC (mm) | Sensitivity ET (%) | Sensitivity WT (%) | Sensitivity TC (%) | Specificity ET (%) | Specificity WT (%) | Specificity TC (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| UNet3D [46] | 74.20 | 88.48 | 80.98 | 6.67 | 10.83 | 10.25 | 77.21 | 92.25 | 82.71 | 99.82 | 99.18 | 99.63 |
| Li et al. [67] | 77.10 | 88.60 | 81.30 | 6.03 | 6.23 | 7.41 | 80.20 | 92.10 | 81.90 | 99.80 | 99.20 | 99.70 |
| Wang et al. [68] | 73.70 | 89.40 | 80.70 | 5.99 | 7.36 | 5.68 | 77.60 | 82.60 | 89.70 | 99.80 | 99.60 | 99.50 |
| Islam et al. [69] | 70.40 | 89.80 | 79.20 | 7.05 | 6.29 | 8.76 | 75.10 | 90.00 | 81.60 | 99.80 | 99.40 | 99.60 |
| Cheng et al. [70] | 76.40 | 90.50 | 82.00 | 3.40 | 5.41 | 7.38 | 77.10 | 91.00 | 82.50 | 99.80 | 99.40 | 99.70 |
| MBANet | 78.21 | 89.79 | 83.04 | 3.08 | 5.88 | 5.09 | 79.44 | 93.02 | 82.58 | 99.80 | 99.27 | 99.68 |

4.5. Ablation experiments on MBANet
Different modules in the network model have different effects. This section uses the labeled training set as well as the unlabeled
validation set of the BraTS 2018 dataset for ablation experiments. It first evaluates how different attention modules, different upsampling strategies, and different attention in the skip connections affect the segmentation results on the validation set, and then evaluates the effect of adding Gaussian blur and noise on the validation set segmentation results. Finally, a fivefold cross-validation experiment is conducted on the training set.

4.5.1. Effects of different attention modules on MBANet
In deep convolutional neural networks, different attention modules affect the network differently. The SE [23] module models the relationships between channels, so that the model can automatically learn to weight different channel features. CBAM [21] computes attention along the channel dimension and the spatial dimension of the feature maps independently, and then multiplies the attention maps with the input feature map. The 3D SA module is introduced in detail in Section 3.2 of this paper. In this ablation study, the rest of the network is kept unchanged, and the 3D SA module, the SE module and the CBAM module are each used in turn as the attention layer of the encoder. The results are shown in Table 3. SE attention gives the worst results, because the SE module considers only the channel information inside the feature maps and ignores location information. CBAM adds spatial attention on top of SE-style channel attention, that is, CBAM is composed of channel attention and spatial attention. However, CBAM can only capture local information and cannot capture global dependencies. SA attention combines spatial attention and channel attention into an ultra-lightweight attention mechanism.

Table 3
Dice results of different attention module strategies on the BraTS 2018 validation set.

| Methods | Dice ET (%) | Dice WT (%) | Dice TC (%) |
|---|---|---|---|
| MBANet + 3D SE | 78.28 | 89.34 | 80.18 |
| MBANet + 3D CBAM | 79.00 | 89.32 | 81.18 |
| MBANet + 3D SA | 80.18 | 89.80 | 85.47 |

4.5.2. Effects of different upsampling strategies on MBANet
This section compares different upsampling strategies in MBANet. In lightweight networks, the common upsampling strategies are linear interpolation and transposed convolution. Linear interpolation estimates new values from neighboring values of the feature maps in the previous layer and passes the upsampled maps on to the next layer; it has no learnable parameters. Transposed convolution learns its weights and restores the spatial size reduced by downsampling, with the output size determined by the kernel size, stride and padding. Since MBANet is a 3D network, trilinear interpolation is used as the interpolation variant here. In terms of network lightness, interpolation reduces the number of parameters and the computation, but it yields lower Dice scores. With the rest of the network unchanged, trilinear interpolation and transposed convolution are each used for upsampling in MBANet. The results are shown in Table 4: transposed convolution gives better results than trilinear interpolation on all three tumor regions.

Table 4
Dice results of different upsampling strategies on the BraTS 2018 validation set.

| Method | Dice ET (%) | Dice WT (%) | Dice TC (%) |
|---|---|---|---|
| MBANet + Trilinear Interpolation | 77.50 | 88.82 | 83.27 |
| MBANet + Transposed Convolution | 80.18 | 89.80 | 85.47 |

4.5.3. Effects of different attention in skip connections on MBANet
In previous studies, researchers have tended to add spatial attention to skip connections. Therefore, with the other structures of the network unchanged, 3D SA attention and the widely used spatial attention are each added to the skip connections of MBANet. The resulting Dice scores are shown in Table 5. As can be seen from Table 5, the Dice obtained with spatial attention on
WT is only 0.21% higher than that obtained with 3D SA attention, but its Dice on ET and TC is 1.63% and 3.88% lower, respectively. Therefore, this ablation experiment indicates that 3D SA attention is more suitable for use in skip connections.

Table 5
Dice results of different attention in skip connection strategies on the BraTS 2018 validation set.

| Methods | Dice ET (%) | Dice WT (%) | Dice TC (%) |
|---|---|---|---|
| MBANet + Spatial Attention | 78.55 | 90.01 | 81.59 |
| MBANet + 3D SA | 80.18 | 89.80 | 85.47 |

Table 7
Dice results of different models on the BraTS 2018 and BraTS 2019 training sets.

| Datasets | Models | Dice ET (%) | Dice WT (%) | Dice TC (%) |
|---|---|---|---|---|
| BraTS 2018 training set | Kao et al. [71] | 73.50 | 90.20 | 81.30 |
| BraTS 2018 training set | Chen et al. [72] | 73.95 | 88.81 | 84.42 |
| BraTS 2018 training set | Zhou et al. [73] | 75.00 | 88.70 | 79.60 |
| BraTS 2018 training set | MBANet | 77.49 | 89.95 | 85.96 |
| BraTS 2019 training set | Sheng et al. [74] | 70.70 | 88.10 | 79.60 |
| BraTS 2019 training set | Li et al. [67] | 77.10 | 88.60 | 81.30 |
| BraTS 2019 training set | Xue et al. [75] | 75.00 | 90.00 | 84.00 |
| BraTS 2019 training set | MBANet | 78.92 | 89.91 | 85.12 |
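As a concrete reference for the Dice, sensitivity and specificity values in Tables 1–7, the sketch below computes Eqs. (4)–(6) of Section 4.3 from flattened masks in plain Python. It assumes the standard BraTS 2018/2019 label convention (1 = necrotic/non-enhancing tumor core, 2 = peritumoral edema, 4 = enhancing tumor), so that WT = {1, 2, 4}, TC = {1, 4} and ET = {4}. That convention, the function names, and returning 1.0 when a denominator is zero are assumptions for illustration, not the authors' evaluation code.

```python
from typing import List, Sequence, Tuple

# Assumed BraTS 2018/2019 labels: 1 = NCR/NET, 2 = edema, 4 = enhancing tumor.
REGIONS = {"WT": {1, 2, 4}, "TC": {1, 4}, "ET": {4}}


def region_mask(labels: Sequence[int], region: str) -> List[int]:
    """Binarize a flattened label volume for one evaluation region (WT, TC or ET)."""
    keep = REGIONS[region]
    return [1 if v in keep else 0 for v in labels]


def confusion_counts(pred: Sequence[int], gt: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for two flattened binary masks of equal length."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of voxels")
    tp = fp = fn = tn = 0
    for p, g in zip(pred, gt):
        if p and g:
            tp += 1
        elif p:
            fp += 1
        elif g:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def dice(tp: int, fp: int, fn: int) -> float:
    """Eq. (4): 2TP / (2TP + FP + FN); 1.0 if both masks are empty (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def sensitivity(tp: int, fn: int) -> float:
    """Eq. (5): TP / (TP + FN)."""
    return 1.0 if tp + fn == 0 else tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: TN / (TN + FP)."""
    return 1.0 if tn + fp == 0 else tn / (tn + fp)


if __name__ == "__main__":
    # Toy 8-voxel volumes; real BraTS volumes would be flattened 3D arrays.
    gt_labels = [0, 2, 2, 1, 4, 4, 0, 0]
    pred_labels = [0, 2, 0, 1, 4, 1, 2, 0]
    for name in ("ET", "WT", "TC"):
        tp, fp, fn, tn = confusion_counts(region_mask(pred_labels, name),
                                          region_mask(gt_labels, name))
        print(name, round(dice(tp, fp, fn), 3),
              round(sensitivity(tp, fn), 3), round(specificity(tn, fp), 3))
```

Because the three regions are nested (ET within TC within WT), one label map yields three separate binary masks, which is why every table reports ET, WT and TC columns independently.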
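Section 4.5.1 attributes part of the advantage of the 3D SA module to its channel shuffle operation. A minimal sketch of the channel bookkeeping involved is shown below, assuming the ShuffleNet-style definition used in [15], [16] and [24]: split the channels, process the branches, concatenate them, then reshape the C channels into (groups, C / groups), transpose, and flatten. It operates on a plain list of channel labels, where in the network each entry would be a 3D feature map; the function names and the two-way split are illustrative assumptions, not MBANet's code.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def channel_split(channels: Sequence[T]) -> Tuple[List[T], List[T]]:
    """Split the channel axis into two equal halves (ShuffleNetV2-style channel split)."""
    if len(channels) % 2:
        raise ValueError("channel count must be even")
    half = len(channels) // 2
    return list(channels[:half]), list(channels[half:])


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Reshape to (groups, C // groups), transpose, and flatten back to C channels."""
    c = len(channels)
    if groups <= 0 or c % groups:
        raise ValueError("channel count must be divisible by a positive group count")
    per_group = c // groups
    # Output position i * groups + j takes input channel j * per_group + i.
    return [channels[j * per_group + i] for i in range(per_group) for j in range(groups)]


if __name__ == "__main__":
    feats = [f"c{k}" for k in range(8)]
    left, right = channel_split(feats)        # two branches, processed separately
    fused = left + right                      # channel fusion by concatenation
    print(channel_shuffle(fused, groups=2))   # interleaves channels of the two branches
```

After the shuffle, every pair of neighboring channels contains one channel from each branch, so a following group convolution sees information from both branches; this cross-group information flow is the reason ShuffleNet introduced the operation.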
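The parameter trade-off between the two upsampling strategies in Section 4.5.2 can be made concrete with simple arithmetic. The sketch below counts the learnable weights of a dense (groups = 1) 3D transposed convolution with a cubic kernel and computes its output length with the formula given in the PyTorch `ConvTranspose3d` documentation; trilinear interpolation, by contrast, only resamples and has no learnable parameters. The channel counts, kernel size, stride and feature-map size in the example are hypothetical values chosen for illustration, not MBANet's actual configuration.

```python
def transposed_conv3d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Learnable parameters of a groups=1 3D transposed convolution with a k x k x k kernel."""
    return c_in * c_out * kernel ** 3 + (c_out if bias else 0)


def transposed_conv_out_size(size: int, kernel: int, stride: int, padding: int = 0,
                             output_padding: int = 0, dilation: int = 1) -> int:
    """Output length along one spatial axis, following PyTorch's ConvTranspose3d formula."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def trilinear_out_size(size: int, scale: int) -> int:
    """Trilinear interpolation only resamples: output length = size * scale, no weights."""
    return size * scale


if __name__ == "__main__":
    # Hypothetical decoder stage: 64 -> 32 channels, 16^3 feature map upsampled by 2.
    print("transposed conv params:", transposed_conv3d_params(64, 32, kernel=2))
    print("transposed conv output:", transposed_conv_out_size(16, kernel=2, stride=2))
    print("trilinear params: 0, output:", trilinear_out_size(16, scale=2))
```

With these hypothetical numbers, both strategies double each spatial axis from 16 to 32, but the transposed convolution adds 16,416 learnable weights to the stage. If a network pairs interpolation with a separate convolution to change the channel count, that convolution's weights would have to be counted as well.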
4.5.4. Effects of Gaussian blur or noise on MBANet
To further verify the robustness of the model, Gaussian blur or noise is added to the brain tumor MRI images. Table 6 compares the Dice results of MBANet with added Gaussian blur or noise. The standard deviation, denoted by σ, is set to 0.01 and 0.1 for both the Gaussian blur and the noise. As Table 6 shows, with either kind of perturbation and either standard deviation, the ET and WT Dice scores change by less than 0.6 percentage points, while the TC Dice decreases by 3.26 to 5.45 points. Since the Dice results do not change drastically, MBANet can be considered relatively stable.

Table 6
Dice results of adding Gaussian blur and noise in MBANet.

| Methods | Dice ET (%) | Dice WT (%) | Dice TC (%) |
|---|---|---|---|
| MBANet + Gaussian Blur (σ = 0.01) | 80.00 | 89.38 | 82.08 |
| MBANet + Gaussian Blur (σ = 0.1) | 79.61 | 90.04 | 80.02 |
| MBANet + Noise (σ = 0.01) | 80.00 | 90.09 | 82.21 |
| MBANet + Noise (σ = 0.1) | 79.79 | 89.91 | 80.87 |
| MBANet | 80.18 | 89.80 | 85.47 |

4.5.5. Experiments on the BraTS 2018 and BraTS 2019 training sets
Finally, to verify the advantages of the model, experiments are conducted on the training sets of BraTS 2018 and BraTS 2019. These experiments adopt fivefold cross-validation, splitting the data at a ratio of 4:1: 228 cases of the BraTS 2018 training set are used for training and 57 for testing, and 268 cases of the BraTS 2019 training set are used for training and 67 for testing. Table 7 compares the Dice of MBANet with other methods under the same cross-validation protocol on the same datasets, which again demonstrates the superiority of MBANet.

4.5.6. Discussion
This study proposes a novel lightweight network, MBANet. As Sections 4.4.1 and 4.4.2 show, MBANet outperforms most of the compared state-of-the-art methods. MBANet uses the BU module as its baseline, and then applies the 3D SA module as an attention layer to the feature maps on each channel. Since the brain tumor region is small relative to the entire MRI image, adding attention can effectively improve the segmentation results. Section 4.5.1 shows that the channel shuffle operation in 3D SA attention makes it more suitable for MBANet, as it combines better with the BU module. For upsampling, transposed convolution is selected; as Section 4.5.2 shows, transposed convolution improves the Dice score for multimodal brain tumor segmentation. MBANet also incorporates 3D SA attention in its skip connections, which the other compared networks do not. In general, this paper proposes a feature extraction strategy that combines convolution and attention operations, using “channel split”, “channel shuffle”, and “channel fusion” in both the convolution operations and the attention. This strategy enables the convolution and attention to extract features accurately. Compared with other, more complex network structures, MBANet shows the best performance without redundant and complex convolution operations. Fig. 6 shows the visualization results of some of the ablation experiments; it indicates that the components chosen for MBANet perform better than the alternatives. In addition, Section 4.5.4 shows that the model remains stable when noise or blur is added. Finally, the experiments on the BraTS 2018 and BraTS 2019 training sets further demonstrate the superiority of MBANet. Overall, comparisons with other leading methods and extensive ablation experiments show that MBANet is the best-performing network for brain tumor segmentation among those compared.

5. Conclusion

In medical image semantic segmentation, using attention mechanisms greatly improves segmentation performance. This paper proposes MBANet, a lightweight network for brain tumor MRI image segmentation based on the 3D SA module. Specifically, MBANet extends SA attention to 3D as the 3D SA module, which is placed in the encoder layers and in the skip connections of MBANet. It can effectively focus on the brain tumor region in the feature maps and extract more effective image features for more accurate segmentation. The encoder and decoder are composed of the BU module and the 3D SA module, which enables better feature fusion without increasing the amount of computation. In addition, MBANet uses transposed convolution for upsampling, which helps to recover the resolution of the feature maps. During training, a generalized dice loss function is used, which enables more accurate pixel-level classification. Finally, test-time augmentation (TTA) is used during testing, which can effectively compensate for some important features missing from the test samples and improves test performance. By comparing MBANet with other leading methods on the unlabeled validation sets of the BraTS 2018 and BraTS 2019 datasets, this work shows that MBANet achieves the best results on both, including the highest ET and TC Dice scores. In addition, the experiments explore the impact of different attention modules in MBANet, different upsampling strategies, different attention in the skip connections, and added Gaussian blur or noise, and further evaluate MBANet on the training sets. The visual comparison of MBANet segmentation results with the training set labels also shows intuitively that MBANet performs well.

This series of experiments shows that MBANet has good segmentation performance and a certain degree of stability against noise and blurring. However, MBANet also has a limitation: it has only been evaluated on multimodal public datasets. We are currently negotiating a cooperation with a hospital to obtain more clinical data, so in future work we will use clinical data in order to apply MBANet in clinical medical research.
Fig. 6. Visualization results of three ablation experiments on the BraTS 2018 validation set: (a) performance of different attention module strategies; (b) performance of different upsampling strategies; (c) performance of different attention in skip connection strategies.
CRediT authorship contribution statement

Yuan Cao: Data curation, Formal analysis, Investigation, Methodology, Software, Writing – original draft, Validation, Visualization. Weifeng Zhou: Formal analysis, Investigation, Methodology, Writing – original draft. Min Zang: Formal analysis, Investigation, Methodology, Validation. Dianlong An: Formal analysis, Investigation, Methodology, Validation. Yan Feng: Formal analysis, Investigation, Methodology, Validation, Writing – review & editing. Bin Yu: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing – original draft, Validation, Writing – review & editing, Supervision, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data that has been used is confidential.

Acknowledgements

We thank the anonymous reviewers for their valuable suggestions and comments. This work was supported by the National Natural Science Foundation of China (No. 62172248), and the Natural Science Foundation of Shandong Province of China (No. ZR2021MF098).

References

[1] H. Wang, T. Xu, Q. Huang, W. Jin, J. Chen, Immunotherapy for malignant glioma: current status and future directions, Trends Pharmacol. Sci. 41 (2) (2020) 123–138.
[2] A. Wadhwa, A. Bhardwaj, V.S. Verma, A review on brain tumor segmentation of MRI images, Magn. Reson. Imaging 61 (2019) 247–259.
[3] Y. Nan, J.D. Ser, S. Walsh, et al., Data harmonisation for information fusion in digital healthcare: A state-of-the-art systematic review, meta-analysis and future research directions, Inform. Fusion 82 (2022) 99–122.
[4] F. Raschke, T.R. Barrick, T.L. Jones, G. Yang, X. Ye, F.A. Howe, Tissue-type mapping of gliomas, NeuroImage: Clinical 21 (2019) 101648.
[5] S. Dash, S. Verma, Kavita, S. Bevinakoppa, M. Woźniak, J. Shafi, M.F. Ijaz, Guidance image-based enhanced matched filter with modified thresholding for blood vessel extraction, Symmetry 14 (2) (2022) 194.
[6] M. Wieczorek, J. Siłka, M. Woźniak, S. Garg, M.M. Hassan, Lightweight convolutional neural network model for human face detection in risk situations, IEEE Trans. Industr. Inform. 18 (7) (2021) 4820–4829.
[7] W. Dong, M. Woźniak, J. Wu, W. Li, Z. Bai, De-noising aggregation of graph neural networks by using principal component analysis, IEEE Trans. Industr. Inform. (2022), https://doi.org/10.1109/TII.2022.3156658.
[8] M. Woźniak, J. Siłka, M. Wieczorek, Deep neural network correlation learning mechanism for CT brain tumor detection, Neural Comput. Applic. (2021) 1–16, https://doi.org/10.1007/s00521-021-05841-x.
[9] Y. Zhang, Y. Lu, W. Chen, Y. Chang, H. Gu, B. Yu, MSMANet: A multi-scale mesh aggregation network for brain tumor segmentation, Appl. Soft Comput. 110 (2021), 107733.
[10] W. Chen, W. Zhou, L. Zhu, Y. Cao, H. Gu, B. Yu, MTDCNet: A 3D multi-threading dilated convolutional network for brain tumor automatic segmentation, J. Biomed. Inform. 133 (2022), 104173.
[11] E. Shelhamer, J. Long, T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 39 (4) (2017) 640–651.
[12] O. Ronneberger, P. Fischer, T. Brox, U-Net: convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, Cham, 2015, pp. 234–241.
[13] L. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).
[14] A. Howard, M. Sandler, G. Chu, L.C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, Q.V. Le, H. Adam, Searching for MobileNetV3, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1314–1324.
[15] X. Zhang, X. Zhou, M. Lin, J. Sun, ShuffleNet: an extremely efficient convolutional neural network for mobile devices, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6848–6856.
[16] N. Ma, X. Zhang, H.T. Zheng, J. Sun, ShuffleNetV2: practical guidelines for efficient CNN architecture design, in: Proceedings of the European Conference on Computer Vision, 2018, pp. 116–131.
[17] K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu, C. Xu, GhostNet: more features from cheap operations, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1580–1589.
[18] S. Mehta, M. Rastegari, L. Shapiro, H. Hajishirzi, ESPNetv2: a light-weight, power efficient, and general purpose convolutional neural network, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9190–9200.
[19] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, arXiv preprint arXiv:1706.03762 (2017).
[20] J. Park, S. Woo, J.Y. Lee, I.S. Kweon, BAM: bottleneck attention module, arXiv preprint arXiv:1807.06514 (2018).
[21] S. Woo, J. Park, J.Y. Lee, I.S. Kweon, CBAM: convolutional block attention module, in: Proceedings of the European Conference on Computer Vision, 2018, pp. 3–19.
[22] H. Li, P. Xiong, J. An, L. Wang, Pyramid attention network for semantic segmentation, arXiv preprint arXiv:1805.10180 (2018).
[23] J. Hu, L. Shen, G. Sun, Squeeze-and-Excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
[24] Q. Zhang, Y. Yang, SA-Net: shuffle attention for deep convolutional neural networks, in: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing, 2021, pp. 2235–2239.
[25] M. Soltaninejad, G. Yang, T. Lambrou, N. Allinson, T.L. Jones, T.R. Barrick, F.A. Howe, X. Ye, Automated brain tumour detection and segmentation using superpixel-based extremely randomized trees in FLAIR MRI, Int. J. Comput. Assisted Radiol. Surg. 12 (2017) 183–203.
[26] M. Soltaninejad, L. Zhang, T. Lambrou, G. Yang, N. Allinson, X. Ye, MRI brain tumor segmentation and patient survival prediction using random forests and fully convolutional networks, in: International MICCAI Brainlesion Workshop, Springer, Cham, 2017.
[27] M. Soltaninejad, G. Yang, T. Lambrou, N. Allinson, T.L. Jones, T.R. Barrick, F.A. Howe, X. Ye, Supervised learning based multimodal MRI brain tumour segmentation using texture features from supervoxels, Comput. Meth. Prog. Bio. 157 (2018) 69–84.
[28] N. Cinar, A. Ozcan, M. Kaya, A hybrid DenseNet121-UNet model for brain tumor segmentation from MR Images, Biomed. Signal Process. Control 76 (2022), 103647.
[29] L. Wu, S. Hu, C. Liu, MR brain segmentation based on DE-ResUNet combining texture features and background knowledge, Biomed. Signal Process. Control 75 (2022), 103541.
[30] C. Chen, X. Liu, M. Ding, J. Zheng, J. Li, 3D dilated multi-fiber network for real-time brain tumor segmentation in MRI, in: International MICCAI Brainlesion Workshop, Springer, 2019, pp. 184–192.
[31] X. Zhou, X. Li, K. Hu, Y. Zhang, Z. Chen, X. Gao, ERV-Net: an efficient 3D residual neural network for brain tumor segmentation, Expert Syst. Appl. 170 (2021), 114566.
[32] Y. Liu, J. Du, C.M. Vong, G. Yue, J. Yu, Y. Wang, B. Lei, T. Wang, Scale-adaptive super-feature based MetricUNet for brain tumor segmentation, Biomed. Signal Process. Control 73 (2022), 103442.
[33] M. Jiang, F. Zhai, J. Kong, A novel deep learning model DDU-net using edge features to enhance brain tumor segmentation on MR images, Artif. Intell. Med. 121 (2021), 102180.
[34] W. Zhang, G. Yang, H. Huang, W. Yang, X. Xu, Y. Liu, X. Lai, ME-Net: Multi-encoder net framework for brain tumor segmentation, Int. J. Imaging Syst. Technol. 31 (4) (2021) 1834–1848.
[35] A. Abdollahi, B. Pradhan, A. Alamri, VNet: An end-to-end fully convolutional neural network for road extraction from high-resolution remote sensing data, IEEE Access 8 (2020) 179424–179436.
[36] H. Huang, G. Yang, W. Zhang, X. Xu, W. Yang, W. Jiang, X. Lai, A deep multi-task learning framework for brain tumor segmentation, Front. Oncol. 11 (2021), 690244.
[37] K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell. 37 (9) (2015) 1904–1916.
[38] S. Mehta, M. Rastegari, A. Caspi, L. Shapiro, H. Hajishirzi, ESPNet: efficient spatial pyramid of dilated convolutions for semantic segmentation, in: Proceedings of the European Conference on Computer Vision, Springer, 2018, pp. 552–568.
[39] S. Mehta, M. Rastegari, L. Shapiro, H. Hajishirzi, ESPNetv2: a light-weight, power efficient, and general purpose convolutional neural network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 9190–9200.
[40] J. Wang, J. Gao, J. Ren, Z. Luan, Z. Yu, Y. Zhao, Y. Zhao, DFP-ResUNet: convolutional neural network with a dilated convolutional feature pyramid for multimodal brain tumor segmentation, Comput. Methods Programs Biomed. 208 (2021), 106208.
[41] A. Khan, H. Kim, L. Chua, PMED-Net: pyramid based multi-scale encoder-decoder network for medical image segmentation, IEEE Access 9 (2021) 55988–55998.
[42] Z. Zhou, Z. He, Y. Jia, AFPNet: a 3D fully convolutional neural network with atrous-convolution feature pyramid for brain tumor segmentation via MRI images, Neurocomputing 402 (2020) 235–244.
[43] Z. Zhou, Z. He, M. Shi, J. Du, D. Chen, 3D dense connectivity network with atrous convolutional feature pyramid for brain tumor segmentation in magnetic resonance imaging of human heads, Comput. Biol. Med. 121 (2020), 103766.
[44] D. Kong, X. Liu, Y. Wang, D. Li, J. Xue, 3D hierarchical dual-attention fully convolutional networks with hybrid losses for diverse glioma segmentation, Knowl. Based Syst. 237 (2022), 107692.
[45] Z. Huang, Y. Zhao, Y. Liu, G. Song, GCAUNet: a group cross-channel attention residual UNet for slice based brain tumor segmentation, Biomed. Signal Process. Control 70 (2021), 102958.
[46] A.S. Akbar, C. Fatichah, N. Suciati, Single level UNet3D with multipath residual attention block for brain tumor segmentation, Journal of King Saud University – Computer and Information Sciences 34 (6) (2022) 3247–3258.
[47] Z. Wang, Y. Zou, P.X. Liu, Hybrid dilation and attention residual U-Net for medical image segmentation, Comput. Biol. Med. 134 (2021), 104449.
[48] S. Ma, J. Tang, F. Guo, Multi-task deep supervision on attention R2U-Net for brain tumor segmentation, Front. Oncol. 11 (2021), 704850.
[49] T. Zhou, S. Canu, S. Ruan, Fusion based on attention mechanism and context constraint for multi-modal brain tumor segmentation, Comput. Med. Imaging Graph. 86 (2020), 101811.
[50] W. Xu, H. Yang, M. Zhang, Z. Cao, X. Pan, W. Liu, Brain tumor segmentation with corner attention and high-dimensional perceptual loss, Biomed. Signal Process. Control 73 (2022), 103438.
[51] X. Guan, G. Yang, J. Ye, W. Yang, X. Xu, W. Jiang, X. Lai, 3D AGSE-VNet: an automatic brain tumor MRI data segmentation framework, BMC Med. Imaging (2021), arXiv preprint arXiv:2107.12046.
[52] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial networks, Commun. ACM 63 (11) (2020) 139–144.
[53] J. Chen, G. Yang, H. Khan, H. Zhang, Y. Zhang, S. Zhao, R. Mohiaddin, T. Wong, D. Firmin, J. Keegan, JAS-GAN: Generative adversarial network based joint atrium and scar segmentations on unbalanced atrial targets, IEEE J. Biomed. Health Inform. 26 (1) (2021) 103–114.
[54] Y. Wu, S. Hatipoglu, D. Alonso-Álvarez, P. Gatehouse, B. Li, Y. Gao, D. Firmin, J. Keegan, G. Yang, Fast and automated segmentation for the three-directional multi-slice cine myocardial velocity mapping, Diagnostics 11 (2) (2021) 346.
[55] Y. Jin, G. Yang, Y. Fang, R. Li, X. Xu, Y. Liu, X. Lai, 3D PBV-Net: An automated prostate MRI data segmentation method, Comput. Biol. Med. 128 (2021), 104160.
[56] Y. Liu, G. Yang, M. Hosseiny, A. Azadikhah, S. Mirak, Q. Miao, S. Raman, K. Sung, Exploring uncertainty measures in Bayesian deep attentive neural networks for prostate zonal segmentation, IEEE Access 8 (2020) 151817–151828.
[57] G. Yang, J. Chen, Z. Gao, S. Li, H. Ni, E. Angelini, T. Wong, et al., Simultaneous left atrium anatomy and scar segmentations via deep learning in multiview information with attention, Future Gener. Comp. Sy. 107 (2020) 215–228.
[58] Y. Liu, G. Yang, S. Mirak, M. Hosseiny, X. Zhong, R. Reiter, Y. Lee, S. Raman, K. Sung, Automatic prostate zonal segmentation using fully convolutional network with feature pyramid attention, IEEE Access 7 (2019) 163626–163632.
[59] Q. Ye, J. Xia, G. Yang, Explainable AI for COVID-19 CT classifiers: An initial comparison study, in: the IEEE 34th International Symposium on Computer-Based Medical Systems (CBMS), 2021, pp. 521–526.
[60] G. Yang, Q. Ye, J. Xia, Unbox the black-box for the medical explainable AI via multi-modal and multi-centre data fusion: A mini-review, two showcases and beyond, Inform. Fusion 77 (2022) 29–52.
[61] B.H. Menze, A. Jakab, S. Bauer, et al., The multimodal brain tumor image segmentation benchmark (BRATS), IEEE Trans. Med. Imag. 34 (10) (2015) 1993–2024.
[62] N. Ketkar, J. Moolayil, Introduction to PyTorch (2021), https://doi.org/10.1007/978-1-4842-5364-9_2.
[63] N. Moshkov, B. Mathe, A. Kertész-Farkas, R. Hollandi, P. Horváth, Test-time augmentation for deep learning-based cell segmentation on microscopy images, Sci. Rep. 10 (1) (2020) 5068.
[64] S. Chandra, M. Vakalopoulou, L. Fidon, E. Battistella, T. Estienne, R. Sun, C. Robert, E. Deutch, N. Paragios, Context aware 3-d residual networks for brain tumor segmentation, in: International MICCAI Brainlesion Workshop, Springer, Cham, 2018, pp. 74–82.
[65] R. Hua, Q. Huo, Y. Gao, H. Sui, B. Zhang, Y. Sun, Z. Mo, F. Shi, Segmenting brain tumor using cascaded V-Nets in multimodal MR images, in: International MICCAI Brainlesion Workshop, Springer, Cham, 2018, pp. 49–60.
[66] A. Kermi, I. Mahmoudi, M.T. Khadir, Deep convolutional neural networks using u-net for automatic brain tumor segmentation in multimodal MRI volumes, in: International MICCAI Brainlesion Workshop, Springer, Cham, 2018, pp. 37–48.
[67] X. Li, G. Luo, K. Wang, Multi-step cascaded networks for brain tumor segmentation, in: International MICCAI Brainlesion Workshop, Springer, Cham, 2019, pp. 163–173.
[68] F. Wang, R. Jiang, L. Zheng, C. Meng, B. Biswal, 3D U-Net based brain tumor segmentation and survival days prediction, in: International MICCAI Brainlesion Workshop, Springer, Cham, 2019, pp. 131–141.
[69] M. Islam, V.S. Vibashan, V.J.M. Jose, N. Wijethilake, U. Utkarsh, H. Ren, Brain tumor segmentation and survival prediction using 3D attention Unet, in: International MICCAI Brainlesion Workshop, Springer, Cham, 2019, pp. 262–272.
[70] G. Cheng, J. Cheng, M. Luo, L. He, Y. Tian, R. Wang, Effective and efficient multitask learning for brain tumor segmentation, J. Real-Time Image Pr. 17 (6) (2020) 1951–1960.
[71] P. Kao, T. Ngo, A. Zhang, J.W. Chen, B.S. Manjunath, Brain tumor segmentation and tractographic feature extraction from structural MR images for overall survival prediction, in: International MICCAI Brainlesion Workshop, Springer, Cham, 2018, pp. 128–141.
[72] W. Chen, B. Liu, S. Peng, J. Sun, X. Qiao, S3D-UNet: separable 3D U-Net for brain tumor segmentation, in: International MICCAI Brainlesion Workshop, Springer, Cham, 2018, pp. 358–368.
[73] T. Zhou, S. Ruan, P. Vera, S. Canu, A Tri-Attention fusion guided multi-modal segmentation network, Pattern Recogn. 124 (2022), 108417.
[74] N. Sheng, D. Liu, J. Zhang, C. Che, J. Zhang, Second-order ResU-Net for automatic MRI brain tumor segmentation, Math. Biosci. Eng. 18 (5) (2021) 4943–4960.
[75] Y. Xue, M. Xie, F.G. Farhat, O. Boukrina, A.M. Barrett, J.R. Binder, U.W. Roshan, W.W. Graves, A multi-path decoder network for brain tumor segmentation, in: International MICCAI Brainlesion Workshop, Springer, Cham, 2019, pp. 255–265.