
Connection Sensitive Attention U-Net for Accurate Retinal Vessel Segmentation

Ruirui Li, Mingming Li, Jiacheng Li∗, Yating Zhou


Beijing University of Chemical Technology
North Third Ring Road 15, Chaoyang District
Beijing, China, 100029
{ilydouble,lmming0429}@gmail.com, [email protected], [email protected]
arXiv:1903.05558v2 [cs.CV] 23 Apr 2019

Abstract

We develop a connection sensitive attention U-Net (CSAU) for accurate retinal vessel segmentation. This method improves the recent attention U-Net for semantic segmentation with four key improvements: (1) a connection sensitive loss that models structural properties to improve the accuracy of pixel-wise segmentation; (2) an attention gate with a novel neural network structure and a concatenating DOWN-Link to effectively learn better attention weights on fine vessels; (3) integration of the connection sensitive loss and the attention gate, which further improves the accuracy on detailed vessels by additionally concatenating attention weights to the features before the output; (4) connection sensitive accuracy metrics that reflect segmentation performance on boundaries and thin vessels.
Our method effectively improves on state-of-the-art vessel segmentation methods that suffer from difficulties in the presence of abnormalities, bifurcations and microvasculature. The connection sensitive loss integrates tightly with the proposed attention U-Net to (i) segment retinal vessels accurately and (ii) preserve the connectivity of thin vessels by modeling their structural properties. Our method achieves the leading position on the DRIVE, STARE and HRF datasets among the state-of-the-art methods.

Figure 1. Challenges in retinal vessel segmentation: (a) an example on STARE, from left to right: input, GT, VGAN and our method; (b) our results for bifurcation, intersection, tortuosity and microvascular cases.

1. Introduction

The retinal vasculature carries important information and helps the ophthalmologist detect and diagnose a variety of retinal pathologies such as Retinopathy of Prematurity (RoP), Diabetic Retinopathy (DR), glaucoma, hypertension, and Age-related Macular Degeneration (AMD), which are leading causes of blindness. The segmentation of retinal vessels is particularly important for diagnosis assistance, treatment and surgery planning of retinal diseases. Changes in vessel morphology such as shape, tortuosity, branching pattern and width provide an accurate early indication of many retinal diseases.
Over the past two decades, a tremendous amount of research has been devoted to segmenting vessels from retinal fundus images. Numerous fully automated methods [24, 14, 17] have been proposed in the literature and have been quite successful in achieving segmentation accuracy on par with trained human annotators. Despite this, there is considerable room for further improvement due to the challenges posed by the complex nature of vascular structures. Open problems include segmentation in the presence of abnormalities, segmentation of thin vessel structures, and segmentation near bifurcation and crossover regions.
Comprehensive and detailed surveys of retinal vessel segmentation methods can be found in [21, 1, 5]. The works most relevant to this paper are deep learning based methods for accurate retinal vessel segmentation. Liskowski et al. [10] proposed a deep neural network model, achieving an area under the curve (ROC AUC) of 0.97 on the DRIVE dataset; their method also performs reasonably well on pathological images. A novel CNN architecture was proposed in [12] to solve both the retinal vessel and optic disc segmentation problems.

∗ The corresponding author.
Fu et al. [6] formulated vessel segmentation as a boundary detection problem using a fully connected CNN model. In the semantic segmentation field, U-Net [18] is a fully convolutional network for biomedical image segmentation. Though many deep learning based approaches have been proposed, existing methods tend to miss fine vessel structures or produce false positives at terminal branches. Attention U-Net [15] automatically learns to focus on target structures of varying shapes and sizes. Mosinska et al. [13] found that pixel-wise losses are unsuitable for retinal vessel segmentation because of their inability to reflect the topological impact of mistakes in the final prediction. The work in [25] added a coefficient to the cross-entropy loss and designed a way of estimating connectivity that depends on the Euclidean distance between the focused pixel and the nearest pixel belonging to the class. Ventura et al. [23] defined a new way to evaluate connectivity on a patch. The most recent approach, by Son et al. [20], generates a precise map of retinal vessels using generative adversarial training (GAN). Unfortunately, with limited data, generative models are considered much harder to train than discriminative models.
For thin vessel segmentation, this paper proposes an efficient topology-aware loss and a novel attention mechanism based on the U-Net to improve accuracy. The proposed loss is called the connection sensitive loss (CS loss) because it considers the probability of connectivity in the neighboring region when designing the loss function. Moreover, the network adds new attention gates and learns a better matrix of attention weights before the output. The proposed method works in an end-to-end fashion without any intervention in learning. With the well-designed attention U-Net architecture, the proposed connection sensitive loss achieves the highest F1-score on all three datasets, DRIVE [22], STARE [8] and HRF [4]. It also extracts thin vessel structures better than the state-of-the-art methods. In summary, the paper makes the following contributions:

1. For vessel segmentation, the paper proposes a connection sensitive loss. It is designed for simultaneous region-wise structure extraction and pixel-wise semantic segmentation. It helps achieve accurate results, even for thin vessel structures in crossover regions.

2. A new attention mechanism is designed based on the standard U-Net. The proposed attention gates improve the quality and the effectiveness of the features and thus take better advantage of them during segmentation.

3. The paper proposes the connection sensitive attention U-Net (CSAU), which combines the connection sensitive loss and the attention gates. In the experiments, CSAU achieves the highest F1-score on all three datasets compared with the state-of-the-art methods.

4. In order to better reflect the quality of the segmentation details, this paper introduces a new metric to evaluate the segmentation of boundaries and thin vessel structures. We name it connection sensitive accuracy.

Section 2 introduces the proposed method. Section 3 gives the implementation details, including data preprocessing and the training process. Section 4 discusses the experimental setup and analyzes the results. The last section concludes the paper.
2. Proposed methodology

In this section, we present the architecture of the connection sensitive attention U-Net (CSAU). The main framework is shown in Fig. 2. Its structure resembles the original attention U-Net except for the connections and the design of the attention gates. Moreover, the framework uses a new connection sensitive loss with which the attention gates learn better attention weights and help improve the accuracy of details.

Figure 2. The proposed framework.

The parameters of the convolutional layers are listed in Table 1. The network contains four encoder blocks and four decoder blocks, connected by skip connections. Each encoder block consists of two successive 3×3 convolutional layers and a max pooling layer. Every convolutional layer is followed by a batch normalization layer and a ReLU layer. The decoder block is the same as the encoder block except that it uses a transposed convolutional layer instead of the pooling layer.

Table 1. The parameters of the convolutional neural layers.

Block name        Layer name         Layer configuration       Remark
Encoder Block(1)  conv1_1            3×3, 32
                  conv1_2            3×3, 32
                                     2×2 max pool, stride 2
Encoder Block(2)  conv2_1            3×3, 64
                  conv2_2            3×3, 64
                                     2×2 max pool, stride 2    Down-sampling path
Encoder Block(3)  conv3_1            3×3, 128
                  conv3_2            3×3, 128
                                     2×2 max pool, stride 2
Encoder Block(4)  conv4_1            3×3, 256
                  conv4_2            3×3, 256
                                     2×2 max pool, stride 2
Decoder Block(5)  conv5_1            3×3, 512
                  conv5_2            3×3, 512
                  convTranspose5_1   2×2, 256
Decoder Block(6)  conv6_1            3×3, 256
                  conv6_2            3×3, 256
                  convTranspose6_1   2×2, 128                  Up-sampling path
Decoder Block(7)  conv7_1            3×3, 128
                  conv7_2            3×3, 128
                  convTranspose7_1   2×2, 64
Decoder Block(8)  conv8_1            3×3, 64
                  conv8_2            3×3, 64
                  convTranspose8_1   2×2, 32
                  conv9_1            3×3, 32
                  conv9_2            3×3, 1
                  conv10_1           3×3, 32
                  conv10_2           3×3, 32
                  conv10_3           3×3, 1
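As a concrete illustration of the blocks listed in Table 1, the following is a minimal PyTorch sketch of one encoder block and one decoder block: two 3×3 convolutions, each followed by batch normalization and ReLU, then a 2×2 max pool (encoder) or a 2×2 transposed convolution (decoder). The names double_conv, EncoderBlock and DecoderBlock are ours, not from the authors' released code, and the sketch omits the skip connections and attention gates of the full framework.

import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two successive 3x3 convolutions, each followed by BatchNorm and ReLU,
    # matching the conv*_1 / conv*_2 rows of Table 1.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class EncoderBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = double_conv(in_ch, out_ch)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        skip = self.conv(x)              # feature map passed to the skip connection
        return self.pool(skip), skip

class DecoderBlock(nn.Module):
    # conv*_1 / conv*_2 followed by a 2x2 transposed convolution (Table 1).
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.conv = double_conv(in_ch, mid_ch)
        self.up = nn.ConvTranspose2d(mid_ch, out_ch, kernel_size=2, stride=2)

    def forward(self, x):
        return self.up(self.conv(x))

Under this reading of Table 1, Encoder Block(1) would correspond to EncoderBlock(3, 32) for an RGB input, and Decoder Block(5) to DecoderBlock(256, 512, 256).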
2.1. Connection sensitive loss

The parameters of the model are learned with a training objective, using Adam stochastic gradient descent. In this paper, we build a new training objective on top of the proposed attention U-Net architecture. In the following discussion, let x ∈ R^(H×W) be the H × W input image, and let y ∈ {0, 1}^(H×W) be the corresponding ground-truth labeling, with 1 indicating vessel pixels and 0 indicating background pixels. Let f be the proposed neural network parameterized by weights v. The output of the network is an image ŷ = f(x, v) ∈ {0, 1}^(H×W). Every element of ŷ is interpreted as the probability of pixel i having label 1: ŷ_i ≡ p(Y_i = 1 | x, v), where Y_i is a random Bernoulli variable Y_i ∼ Ber(ŷ_i).
Cross entropy is widely used as the loss function in deep learning networks for binary classification problems, where it measures the probability of belonging to one specific class or not. The proposed loss function is therefore built on the cross-entropy loss L_ce, defined by

L_ce = − Σ_i ( y_i · log(f_i(x, v)) + (1 − y_i) · log(1 − f_i(x, v)) )    (1)

From the definition of L_ce in (1), we can see that the cross-entropy loss assigns equal weight to the loss of every pixel and therefore fails to account for fine object structures. As a result, the cross-entropy loss is not well suited to segmenting connected vascular structures. Fig. 3 shows segmentation results produced by the U-Net with the cross-entropy loss; the colored pixels are false negatives. It is apparent that the cross-entropy loss tends to produce broken vessels at terminal branches, which are critical for diagnosis.

Figure 3. Results trained by binary cross entropy, in which red pixels are false negatives.

The connection sensitive loss is designed for training tasks in which the structural connectivity of the segmented objects matters. To address this problem, we take connectivity into consideration by encoding two coefficients into the cross-entropy loss, as shown in (2). L_cs is the connection sensitive loss. θ_1 and θ_2 represent local structural properties in the labeled ground truth and in the predicted map respectively, while w_i is a weighting parameter multiplied with the encoded loss at every pixel, which will be explained later.

L_cs = − Σ_i w_i · ( θ_1 y_i log(f_i(x, v)) + θ_2 (1 − y_i) log(1 − f_i(x, v)) )    (2)

To model the structural properties, an exponential function is constructed as shown in the following equation:

θ_1 = e^(1 − C_i² · y_i),   θ_2 = e^(1 − C_i² · f_i(x, v))    (3)

in which C_i represents the probability of connectivity in a local region. It can be computed by the following function, with upper bound 1 and lower bound 0. z_i is a variable representing whether the pixel values are taken from the ground truth (z_i = y_i) or from the predicted map (z_i = f_i(x, v)).

C_{i, i∈Ω(m,n,r;z)} = max( min( α · ( Σ_{j∈Ω(m,n,r;z)} z_j / r² )^β − γ, 1 ), 0 )    (4)

It is observed that C_i is strongly correlated with the local density. To estimate C_i, the function uses a polynomial model and computes the local density by averaging the values in the region. α, β, γ are constant coefficients. Ω(m, n, r; z) represents a square region in the map z with side length r, centered at the coordinate (m, n). The region can be written as the r×r matrix

Ω(m,n,r;z) = [ z_(m−(r−1)/2, n−(r−1)/2)  ···  z_(m−(r−1)/2, n+(r−1)/2) ;  ···  ;  z_(m+(r−1)/2, n−(r−1)/2)  ···  z_(m+(r−1)/2, n+(r−1)/2) ]    (5)

To obtain the values of the constant coefficients α, β and γ, we draw N sampling points on an r × r region for different densities through Monte Carlo importance sampling. Inspired by the definition of connectivity in [23], on each sampled patch we decide whether the region is connected or not by checking whether there exist two paths from the center point (m, n) to the boundary of the region according to the eight-connected domain algorithm. Fig. 4 shows some samples when the density is 0.2 in a 5×5 resolution area. Fig. 5 shows the fitted curve when α = 10.3180, β = 1.9808, γ = −0.0254 and r = 5; the sampled blue curve is very close to the modeled red curve.
Figure 4. Some samples when adding 5 points in a 5×5 resolution area.

Figure 5. Curves of connectivity probability against density in a 5×5 region (experimental curve vs. ideal curve).

Figure 6. (a) is a local region of the label image; the purple region is the background and the yellow region is the vessels. (b) is the corresponding region of the connectivity feature map, where pixels with dark values have a higher probability of connectivity than those with bright colors.

It is recommended to choose r = 5 for the local connectivity estimation, which simplifies the computation without sacrificing too much accuracy. In fact, a 5 × 5 area can be seen as a local pattern, and images with complex contents and other resolutions can be mapped to this local pattern. Fig. 6 illustrates the connectivity feature map in which pixels are computed as (y_i − C_i² × y_i) on the ground truth. The larger the value, the more attention should be paid to the pixel; a large value is assumed to carry a high risk of being less connected.
The factor w_i is proposed to further decrease the false negatives and is formulated as:

w_i = 1 + ( max(Ω_{j,i,λ;o}) − f_i(x, v) ) · y_i    (6)

If the output f_i(x, v) is expected to connect to other vessel pixels but is predicted with a small probability, the value w_i becomes higher and puts more penalty on the false negative pixels by increasing their losses. The penalty is region-aware. The term max(Ω_{j,i,λ;o}) indicates, to some extent, the probability of classifying the pixel as the vessel class: the larger the value, the easier the pixel is to recognize, and vice versa. The term (max(Ω_{j,i,λ;o}) − f_i(x, v)) measures the difference between this probability and the predicted value, and it is expected to become smaller during the training process.
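Putting Eqs. (2), (3) and (6) together, a minimal sketch of the connection sensitive loss could look as follows. It reuses the connectivity_probability sketch above, takes the neighborhood maximum with a 7×7 max pool as mentioned in Section 3.1, and adds a small epsilon for numerical stability; these are illustrative choices under our reading of the equations, not the authors' exact implementation.

import torch
import torch.nn.functional as F

def connection_sensitive_loss(pred, target, lam=7, eps=1e-7):
    """Sketch of the connection sensitive loss of Eqs. (2), (3) and (6).

    pred:   predicted vessel probabilities f(x, v), shape (N, 1, H, W), in (0, 1).
    target: binary ground truth y, same shape.
    """
    c_gt = connectivity_probability(target)            # C_i on the ground truth
    c_pr = connectivity_probability(pred)               # C_i on the predicted map
    theta1 = torch.exp(1.0 - c_gt.pow(2) * target)      # Eq. (3), ground-truth term
    theta2 = torch.exp(1.0 - c_pr.pow(2) * pred)        # Eq. (3), prediction term
    # Region-aware weight w_i of Eq. (6): maximum predicted probability in a
    # lam x lam neighbourhood of the output (Section 3.1 mentions a 7x7 max pool).
    local_max = F.max_pool2d(pred, kernel_size=lam, stride=1, padding=lam // 2)
    w = 1.0 + (local_max - pred) * target
    loss = -(w * (theta1 * target * torch.log(pred + eps)
                  + theta2 * (1.0 - target) * torch.log(1.0 - pred + eps)))
    return loss.sum()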
2.2. Attention gates

The proposed attention gates are incorporated into the standard U-Net architecture to highlight salient features that are passed through the skip connections, see Fig. 2. The attention gate has two input signals. One is the feature map transported by the skip connection. The other is the coarse feature obtained from the output of the previous neural layer. Information extracted at the coarse scale is used in the gating to disambiguate irrelevant and noisy responses in the skip connections. The output of the attention gate is connected to the next decoder. The gating signal for each skip connection aggregates information from multiple imaging scales, which increases the resolution of the attention weights and helps achieve better performance.
The proposed attention gate is shown in Fig. 7. It is in effect a sub-network with a simple encoder-decoder pattern. The attention gate consists of five 3×3 convolutional layers, five batch normalization layers, five ReLUs, two max pooling layers and a transposed convolutional layer. The feature maps X and G are first transformed into an intermediate space; their sum is then up-sampled by the transposed convolution. We use additive attention [2] to obtain the gating coefficient. Additive attention is formulated as follows:

α_i = σ( q_att(x_i, g_i; θ_att) )    (7)

where σ(x_i) = 1 / (1 + exp(−x_i)) is the sigmoid activation function. The attention gate is characterized by a set of parameters θ_att containing linear transformations, non-linear transformations and bias terms. q_att defines the operations on x_i and g_i with the parameters θ_att.

Figure 7. The proposed attention gate.

We tried two connection modes when designing the attention gates, which we call the UP-Link and the DOWN-Link. In the UP-Link, there is a connection between the input G and the output of the attention gate, as shown in Fig. 7. The DOWN-Link instead has a connection between the input X and the output. CSAU chooses the UP-Link mode, since this mechanism improves the quality and influence of detailed features during training. Updating the parameters of the attention gates then depends on the gradient passed not only from the decoder layers but also from the encoder layers, which experimentally results in better attention weights for the segmentation model. Examples of intermediate attention weights are converted and visualized in Fig. 8, in which (c) illustrates the last attention weights obtained with the UP-Link and (d) those obtained with the DOWN-Link in the same situation. The UP-Link mode provides sufficient detailed information as well as strengthened salient features for the following decoders in the feed-forward propagation. As a result, both the vessels and their structures are well preserved.

Figure 8. Visualizations of attention weights. (a) is a fundus image, (b) is the ground truth, and (c), (d) are attention weights visualized for the UP-Link and the DOWN-Link respectively.

At the end of the network shown in Fig. 2, the last attention weights are extracted and concatenated to the output features, which further emphasizes attentive pixels. Experiments in Section 4.3 validate the proposed attention mechanism for thin vessel segmentation and its connectivity preservation.
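The text does not fully pin down how the five convolutions, two max pooling layers and the transposed convolution of Fig. 7 are wired, so the following PyTorch sketch shows only one plausible encoder-decoder arrangement of the stated layers. It assumes X and G have already been brought to the same shape and channel count, and the class name AttentionGate is ours.

import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """One plausible reading of the attention gate of Section 2.2 / Fig. 7."""
    def __init__(self, ch):
        super().__init__()
        def conv_bn_relu(in_ch, out_ch):
            return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                                 nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.theta_x = conv_bn_relu(ch, ch)                 # transform skip feature X
        self.phi_g = conv_bn_relu(ch, ch)                   # transform gating feature G
        self.enc1 = nn.Sequential(conv_bn_relu(ch, ch), nn.MaxPool2d(2))
        self.enc2 = nn.Sequential(conv_bn_relu(ch, ch), nn.MaxPool2d(2))
        self.up = nn.ConvTranspose2d(ch, ch, kernel_size=4, stride=4)
        self.psi = nn.Sequential(nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, x, g):
        # Additive attention: transform X and G, add them, pass the sum through a
        # small encoder-decoder, and squash to [0, 1] with a sigmoid (Eq. (7)).
        s = self.theta_x(x) + self.phi_g(g)
        s = self.enc2(self.enc1(s))
        alpha = self.psi(self.up(s))
        return x * alpha, alpha                              # gated skip feature and weights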
2.3. Metrics of connection sensitive accuracy

General metrics for image segmentation can judge how well the main vessels are segmented, but they cannot clearly distinguish the minor changes in boundaries and fine vessel structures that are critical for early diagnosis. To address this, the paper presents a new evaluation metric for segmentation performance on boundaries and thin structures. Based on a factor of the CS loss, we define ACC_cs as follows:

ACC_cs = Σ_i ( m_i × δ_1(f_i(x, v)) ) / Σ_i m_i    (8)

m_i = ( δ_2( (1 − C_i²) × y_i ) ) ∪ DOG(y)    (9)

in which δ_1 and δ_2 are binary threshold functions and f_i is the predicted result for input x and weights v. m_i constructs a mask map, computed as the union of two sets. The first set, δ_2((1 − C_i²) × y_i), represents the pixels belonging to fine vessel structures that are hard to segment. The second set is the boundary of the ground truth, extracted with the DoG edge detection algorithm. Fig. 9 presents two examples of the mask maps, for image No. 3 and image No. 11 of DRIVE. In effect, ACC_cs computes the proportion of correctly segmented pixels among the pixels covered by the mask.

Figure 9. Two examples of mask maps on DRIVE.
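A minimal sketch of ACC_cs, following the prose reading of Eqs. (8) and (9) as the proportion of correctly classified pixels inside the mask. The threshold values t1 and t2 and the blur-and-subtract approximation of the DoG boundary are our assumptions, and connectivity_probability refers to the earlier sketch.

import torch
import torch.nn.functional as F

def connection_sensitive_accuracy(pred, target, t1=0.5, t2=0.5):
    """Sketch of ACC_cs (Eqs. (8) and (9)).

    pred, target: tensors of shape (N, 1, H, W); target is binary.
    t1, t2 are assumed thresholds for the binary functions delta_1, delta_2.
    """
    c = connectivity_probability(target)                     # C_i on the ground truth
    hard_thin = ((1.0 - c.pow(2)) * target) > t2             # fine, weakly connected vessel pixels
    # Boundary of the ground truth, approximated here by subtracting a blurred
    # mask from the mask and thresholding (a stand-in for the DoG edge step).
    blur = F.avg_pool2d(target, kernel_size=3, stride=1, padding=1)
    boundary = (target - blur).abs() > 1e-3
    mask = (hard_thin | boundary).float()                    # m_i, union of the two sets
    pred_bin = (pred > t1).float()                           # delta_1 applied to the prediction
    correct = mask * (pred_bin == target).float()            # correctly classified masked pixels
    return correct.sum() / mask.sum().clamp(min=1.0)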
3. Implementation and Experiments setup

3.1. Implementation details

In this part, we briefly introduce the implementation of the connection sensitive attention U-Net. The experiments are carried out on a laboratory computer whose configuration is shown in Table 2. The operating system is Ubuntu 16.04, and the main required packages are Python 3.6, CUDA 8.0, cuDNN 7.0 and PyTorch 0.4.0.

Table 2. Experimental environments
CPU        Intel(R) Core(TM) i7-4790K 4.00GHz
GPU        GeForce GTX 1080 Ti
RAM        20GB
Hard disk  Toshiba SSD 512G
System     Ubuntu 16.04

To avoid complex CUDA coding, we make full use of the functions provided by PyTorch, mainly nn.functional.conv2d and nn.MaxPool2d. Specifically, to calculate the summation of the probabilities in the region centered at a focused pixel, we use nn.functional.conv2d with a 5×5 kernel and perform the convolution on the whole image, except for the padding part, which remains zero. To get the maximum probability of that region, we use nn.MaxPool2d with a kernel size of 7.

3.2. Datasets and preparation

Our approach is evaluated on three widely used benchmarks provided by different organizations: DRIVE [22], STARE [8] and HRF [4]. All photographs in these benchmarks are RGB images, while the annotated images are binary. DRIVE contains 20 training images and 20 testing images, each of size 584×565. STARE contains 20 fundus images, each of size 605×700; we manually divide it into training and testing images in a 10/10 ratio. For DRIVE and STARE, we use only one image from the training set for validation. The HRF dataset comprises 45 images organized into 15 subsets, each containing one healthy fundus image, one image of a patient with diabetic retinopathy and one glaucoma image. We use the first 5 subsets as our training set and the rest as the testing set; five validation images are randomly selected from the training set.
For DRIVE, we resize each image to 640×640 by zero-padding the four margins. For STARE, we resize the images to 720×720 in the same way. Each image in HRF is digitized to 2336×3504 pixels. Because of the high resolution of HRF and the limitation of GPU memory, we crop each image into 640×640 tiles and test the tiles one by one from the bottom left to the top right in a sliding-window manner. To predict the pixels in the border region of an image, the missing context is extrapolated by mirroring the input image. We use the overlap strategy described in [9]: for each tile, we compute the weights of the overlapped pixels with a Gaussian function; through this weighted summation, we composite the overlapped tiles and seamlessly stitch the whole segmentation image.
To augment the data, we rotate each image every 4 degrees over the whole circle and then further flip the results horizontally and vertically, so that 270 images are generated from a single image.

3.3. Training methodology

The model is trained with AdamW [11] with parameters β1 = 0.9, β2 = 0.999 and a learning rate of 0.002. We propose a new learning-rate strategy for the experiments. According to the strategy, we test the latest model on the validation set every fifty batches and use its loss as the metric to adjust the following learning rate. If the loss does not decrease for five consecutive groups of validations, the learning rate is set to the maximum of 0.0001 and 0.1 times the current learning rate. If the loss does not decrease for twenty consecutive groups of validations, the learning rate is reset to the initial value 0.002.
We use a mini-batch size of 2 images for DRIVE, STARE and HRF. The model with the minimal validation loss is chosen as the final model for testing. According to the experiments, the validation loss tends to converge within the 20th training epoch, so we set the maximum number of training epochs to 25.

4. Results and Analysis

4.1. Evaluation metrics

We use F1-score, PR AUC, ROC AUC, Accuracy and Sensitivity to evaluate the performance of the binary segmentation model. False Negatives (FN), True Positives (TP), True Negatives (TN) and False Positives (FP) are the four basic elements used to compute the metrics. We also introduce the connection sensitive accuracy (ACC_cs) to measure the segmentation performance on terminal thin vessels.
The F1-score considers both Recall and Precision, which are defined as:

Recall = Sensitivity = TP / (TP + FN)    (10)

Precision = TP / (TP + FP)    (11)

F1 = 2 · Precision · Recall / (Precision + Recall)    (12)

The F1-score is positively related to the performance of the model.
Accuracy is the proportion of correctly segmented pixels among all pixels:

Accuracy = (TP + TN) / (TP + FN + TN + FP)    (13)

PR AUC and ROC AUC. A Precision-Recall (PR) curve plots Precision against Recall, while a Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (Recall) against the False Positive Rate (FPR). FPR is defined as:

FPR = FP / (FP + TN)    (14)

AUC is the area under the curve, and the performance of the model is positively related to the value of the area.
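For reference, a small sketch of the confusion-matrix based metrics of Eqs. (10)-(14). It assumes binary tensors in which both classes are present, and the function name is ours.

import torch

def segmentation_metrics(pred_bin, target):
    """Compute the metrics of Eqs. (10)-(14) for binary masks."""
    pred_bin, target = pred_bin.bool(), target.bool()
    tp = (pred_bin & target).sum().float()
    fp = (pred_bin & ~target).sum().float()
    fn = (~pred_bin & target).sum().float()
    tn = (~pred_bin & ~target).sum().float()
    recall = tp / (tp + fn)                               # Eq. (10), the Sensitivity
    precision = tp / (tp + fp)                            # Eq. (11)
    f1 = 2 * precision * recall / (precision + recall)    # Eq. (12)
    accuracy = (tp + tn) / (tp + fn + tn + fp)            # Eq. (13)
    fpr = fp / (fp + tn)                                  # Eq. (14)
    return {"recall": recall, "precision": precision, "f1": f1,
            "accuracy": accuracy, "fpr": fpr}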
4.2. Overall performance

We trained the CSAU model on DRIVE, STARE and HRF respectively and compared it with the state-of-the-art methods. The results for comparison on DRIVE and STARE are obtained from the web site of VGAN [20]; we directly use the segmented images to compute the metrics. The results for comparison on HRF are taken from [17]; since they provide neither source code nor result images, we simply copy the metrics reported in their paper. To guarantee fairness, we choose the training, validation and testing sets in the same way, as described in Section 3.2.
Tables 3-5 show the comparison metrics. As observed, CSAU obtains the highest F1-score, Sensitivity and ROC AUC on all the benchmarks. On DRIVE, the proposed method leads the board on all evaluation metrics. The compared methods include K-Boost [3], HED [26], Wavelets [19], N4-Fields [7], DRIU [12], CRFs [16] and VGAN [20]. Among them, HED, DRIU and VGAN are deep learning based methods and show superior performance compared with the non-deep-learning methods. Fig. 10 displays the PR curves and the ROC curves. The performance of VGAN is also good and ranks second. Compared with VGAN, the F1-score of CSAU is 0.2% higher and the Sensitivity is improved by 0.6%; CSAU also improves the PR AUC by 0.2% and the ROC AUC by 0.4%. In fact, most deep learning based networks can segment the main vessels well; the really challenging task is to segment thin vessel structures. In fundus images, thin-vessel pixels take a much smaller proportion than the other pixels, so even when the improvement on thin vessel segmentation is obvious, the gain is slight when evaluated by general metrics over the whole image. The last column in Table 3 shows the connection sensitive accuracy of the different methods. The ACC_cs of CSAU is 2.8% higher than that of VGAN and 3.8% higher than that of DRIU, which means that CSAU performs better at segmenting the boundaries and the thin vessels. Fig. 11 shows a group of examples on DRIVE. The segmented results of VGAN and CSAU look similar from an overall perspective; zooming into the areas surrounded by red rectangles, it is clear that VGAN tends to produce inaccurate boundaries and broken thin vessels, while CSAU obtains more accurate boundaries and more complete vessel structures.

Table 3. Comparison of different methods on DRIVE.

Methods        ROC AUC   PR AUC   F1-score   Sensitivity   Accuracy   ACC_cs
K-Boost[3]     0.9307    0.8464   0.7797     0.7563        0.9456     0.6739
HED[26]        0.9696    0.8773   0.7938     0.7943        0.9475     0.7016
Wavelets[19]   0.9436    0.8149   0.7601     0.7628        0.9387     0.6839
N4-Fields[7]   0.9686    0.8851   0.8021     0.7994        0.9498     0.7178
DRIU[12]       0.9793    0.9064   0.8210     0.8261        0.9541     0.7470
CRFs[16]       –         –        0.7799     0.7829        0.9438     0.6785
VGAN[20]       0.9803    0.9142   0.8277     0.8300        0.9560     0.7537
CSAU           0.9807    0.9157   0.8294     0.8349        0.9563     0.7751

Figure 10. Precision-Recall curves and Receiver Operating Characteristic curves for different methods on DRIVE.

Figure 11. Comparison of details between VGAN and CSAU.

On STARE, a similar phenomenon to that on DRIVE can be observed. CSAU wins the first place on all the metrics except the PR AUC. It obtains 0.34% higher ROC AUC, 0.1% higher F1-score and 0.6% higher Sensitivity than VGAN. Several zoomed-in images are displayed in Fig. 1, indicating that CSAU obtains good vessel structures on STARE as well.

Table 4. Comparison of different methods on STARE.

Methods        ROC AUC   PR AUC   F1-score   Sensitivity   Accuracy   ACC_cs
HED[26]        0.9764    0.8888   0.8057     0.8200        0.9588     0.7257
Wavelets[19]   0.9694    0.8433   0.7756     0.7817        0.9529     0.7226
DRIU[12]       0.9772    0.9101   0.8323     0.8380        0.9648     0.7667
VGAN[20]       0.9777    0.9159   0.8353     0.8350        0.9657     0.7694
CSAU           0.9834    0.9206   0.8435     0.8465        0.9673     0.7878

On HRF, CSAU is compared with Odstrcilik, Vostatek (Soares), Vostatek (Sofka) and Orlando. The results of the different methods differ considerably on the general segmentation metrics, so we did not compute ACC_cs for further analysis. CSAU gets the highest scores in this group of experiments; compared with Orlando, the F1-score is enhanced by more than 14 percent.

Table 5. Comparison of different methods on HRF.

Methods                 ROC AUC   PR AUC   F1-score   Precision   Sensitivity
Odstrcilik[14]          0.967     –        0.7316     0.6950      0.7772
Vostatek (Soares)[24]   0.97      –        –          –           0.7340
Vostatek (Sofka)[24]    0.937     –        –          –           0.5830
Orlando[17]             –         –        0.7168     0.7199      0.7201
CSAU                    0.9867    0.9047   0.8171     0.8043      0.8303
4.3. Experiment Analysis

To explore the reason why CSAU achieves good performance, we carried out extra experiments on the datasets. We tried four different combinations: U-Net with CE loss (UCE), U-Net with CS loss (UCS), attention U-Net with CE loss (AUCE) and attention U-Net with CS loss (CSAU). Tables 6 and 7 display the results of the different combinations on DRIVE and STARE respectively; the table for HRF is provided in the supplementary materials. From the results, we find that either the proposed attention mechanism or the CS loss alone improves the performance, and with both techniques CSAU gets the best results in the group. Fig. 12 visually compares UCE and CSAU on an image of DRIVE: the proposed CSAU segments fine vessels more correctly while preserving the topology structures well.
For quantitative analysis, on DRIVE the result of CSAU is 0.6% higher in F1-score, 0.2% higher in ROC AUC and 0.6% higher in Sensitivity than that of UCE. As previously discussed, the results on general metrics do not improve much, but in Fig. 12 the enhancement is noticeable. To further analyze the source of the contributions, we calculate ACC_cs on the results of the different combinations. It can be seen that CSAU enhances the segmentation accuracy mainly by improving the performance on boundaries and thin vessels. The other groups of experiments, on STARE and HRF, confirm the effectiveness of the proposed method. Full experimental results can be found in the supplementary materials.

Figure 12. Comparison between UCE and CSAU.

Table 6. Comparison of different combinations on DRIVE.

Methods   F1-score   PR AUC   ROC AUC   Sensitivity   Accuracy   ACC_cs
UCE       0.8243     0.9084   0.9776    0.8318        0.9549     0.7435
UCS       0.8255     0.9101   0.9802    0.8307        0.9554     0.7523
AUCE      0.8258     0.9111   0.9777    0.8303        0.9553     0.7524
CSAU      0.8294     0.9157   0.9807    0.8349        0.9563     0.7751

Table 7. Comparison of different combinations on STARE.

Methods   F1-score   PR AUC   ROC AUC   Sensitivity   Accuracy   ACC_cs
UCE       0.8310     0.9096   0.9789    0.8350        0.9646     0.7513
UCS       0.8372     0.9155   0.9796    0.8492        0.9656     0.7702
AUCE      0.8393     0.9202   0.9842    0.8455        0.9663     0.7619
CSAU      0.8435     0.9206   0.9834    0.8465        0.9673     0.7878

5. Conclusions

In this paper, we proposed an elegant symmetric neural network named the connection sensitive attention U-Net for retinal vessel segmentation. Unlike other end-to-end semantic segmentation networks, the proposed CSAU is concerned not only with pixel-level accuracy but also with topology structures, through a novel connection sensitive loss and a new attention gate. The network also learns attention weights and concatenates them at the end of the network, which further improves the accuracy.
We verify the validity of CSAU on three public datasets: DRIVE, STARE and HRF. CSAU not only obtains the highest F1-score, ROC AUC and Sensitivity on all three datasets, but also performs well in segmenting thin vessel structures compared with the state-of-the-art methods. We also propose a new metric, named connection sensitive accuracy, to evaluate the improvement on thin vessel segmentation. Based on it, we conclude that CSAU can segment thin vessels with high accuracy, which is important for clinical diagnosis.
In the future, we intend to try multiscale techniques and semi-supervised learning techniques to further enhance accuracy and efficiency.
References

[1] J. Almotiri, K. Elleithy, and A. Elleithy. Retinal vessels segmentation techniques and algorithms: A survey. Applied Sciences, 8(2):155, 2018.
[2] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
[3] C. Becker, R. Rigamonti, V. Lepetit, and P. Fua. Supervised feature learning for curvilinear structure segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 526–533. Springer, 2013.
[4] A. Budai, R. Bock, A. Maier, J. Hornegger, and G. Michelson. Robust vessel segmentation in fundus images. International Journal of Biomedical Imaging, 2013, 2013.
[5] M. M. Fraz, P. Remagnino, A. Hoppe, B. Uyyanonvara, A. R. Rudnicka, C. G. Owen, and S. A. Barman. Blood vessel segmentation methodologies in retinal images – a survey. Computer Methods and Programs in Biomedicine, 108(1):407–433, 2012.
[6] H. Fu, Y. Xu, D. W. K. Wong, and J. Liu. Retinal vessel segmentation via deep learning network and fully-connected conditional random fields. In Biomedical Imaging (ISBI), 2016 IEEE 13th International Symposium on, pages 698–701. IEEE, 2016.
[7] Y. Ganin and V. Lempitsky. N4-fields: Neural network nearest neighbor fields for image transforms. In Asian Conference on Computer Vision, pages 536–551. Springer, 2014.
[8] A. Hoover, V. Kouznetsova, and M. Goldbaum. Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Transactions on Medical Imaging, 19(3):203–210, 2000.
[9] R. Li, W. Liu, L. Yang, S. Sun, W. Hu, F. Zhang, and W. Li. DeepUNet: A deep fully convolutional network for pixel-level sea-land segmentation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, PP(99):1–9, 2017.
[10] P. Liskowski and K. Krawiec. Segmenting retinal blood vessels with deep neural networks. IEEE Transactions on Medical Imaging, 35(11):2369–2380, 2016.
[11] I. Loshchilov and F. Hutter. Fixing weight decay regularization in Adam. arXiv preprint arXiv:1711.05101, 2017.
[12] K.-K. Maninis, J. Pont-Tuset, P. Arbeláez, and L. Van Gool. Deep retinal image understanding. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 140–148. Springer, 2016.
[13] A. Mosinska, P. Marquez-Neila, M. Kozinski, and P. Fua. Beyond the pixel-wise loss for topology-aware delineation. In Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[14] J. Odstrcilik, R. Kolar, A. Budai, J. Hornegger, J. Jan, J. Gazarek, T. Kubena, P. Cernosek, O. Svoboda, and E. Angelopoulou. Retinal vessel segmentation by improved matched filtering: evaluation on a new high-resolution fundus image database. IET Image Processing, 7(4):373–383, 2013.
[15] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz, et al. Attention U-Net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999, 2018.
[16] J. I. Orlando and M. Blaschko. Learning fully-connected CRFs for blood vessel segmentation in retinal images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 634–641. Springer, 2014.
[17] J. I. Orlando, M. Fracchia, V. del Río, and M. del Fresno. Retinal blood vessel segmentation in high resolution fundus photographs using automated feature parameter estimation. In 13th International Conference on Medical Information Processing and Analysis, volume 10572, page 1057210. International Society for Optics and Photonics, 2017.
[18] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
[19] J. V. Soares, J. J. Leandro, R. M. Cesar, H. F. Jelinek, and M. J. Cree. Retinal vessel segmentation using the 2-D Gabor wavelet and supervised classification. IEEE Transactions on Medical Imaging, 25(9):1214–1222, 2006.
[20] J. Son, S. J. Park, and K.-H. Jung. Retinal vessel segmentation in fundoscopic images with generative adversarial networks. arXiv preprint arXiv:1706.09318, 2017.
[21] C. L. Srinidhi, P. Aparna, and J. Rajan. Recent advancements in retinal vessel segmentation. Journal of Medical Systems, 41(4):70, 2017.
[22] J. Staal, M. D. Abràmoff, M. Niemeijer, M. A. Viergever, and B. van Ginneken. Ridge-based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging, 23(4):501–509, 2004.
[23] C. Ventura, J. Pont-Tuset, S. Caelles, K.-K. Maninis, and L. Van Gool. Iterative deep learning for network topology extraction. arXiv preprint arXiv:1712.01217, 2017.
[24] P. Vostatek, E. Claridge, H. Uusitalo, M. Hauta-Kasari, P. Fält, and L. Lensu. Performance comparison of publicly available retinal blood vessel segmentation methods. Computerized Medical Imaging and Graphics, 55:2–12, 2017.
[25] Y. Wei, Z. Wang, and M. Xu. Road structure refined CNN for road extraction in aerial image. IEEE Geoscience and Remote Sensing Letters, 14(5):709–713, 2017.
[26] S. Xie and Z. Tu. Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 1395–1403, 2015.
