Connection Sensitive Attention U-Net for Accurate Retinal Vessel Segmentation
Abstract

We develop a connection sensitive attention U-Net (CSAU) for accurate retinal vessel segmentation. This method improves the recent attention U-Net for semantic segmentation with four key improvements: (1) a connection sensitive loss that models structural properties to improve the accuracy of pixel-wise segmentation; (2) an attention gate with a novel neural network structure and a concatenating DOWN-Link to effectively learn better attention weights on fine vessels; (3) integration of the connection sensitive loss and the attention gate, which further improves the accuracy on detailed vessels by additionally concatenating the attention weights to the features before output; (4) a connection sensitive accuracy metric that reflects the segmentation performance on boundaries and thin vessels.

Our method effectively improves on state-of-the-art vessel segmentation methods, which suffer from difficulties in the presence of abnormalities, bifurcations and microvasculature. This connection sensitive loss tightly integrates with the proposed attention U-Net to accurately (i) segment retinal

∗ The corresponding author.

Figure 1. Challenges on retinal vessel segmentation: (a) an example on STARE, from left to right: input, GT, VGAN and our method; (b) our results for bifurcation, intersection, tortuosity and microvascular cases.

1. Introduction

The retinal vasculature carries important information and helps ophthalmologists detect and diagnose a variety of retinal pathologies such as Retinopathy of Prematurity (RoP), Diabetic Retinopathy (DR), glaucoma, hypertension, and Age-related Macular Degeneration (AMD), which are leading causes of blindness. The segmentation of retinal vessels is particularly important for diagnosis assistance, treatment and surgery planning of retinal diseases. Changes in vessel morphology such as shape, tortuosity, branching pattern and width enable accurate early detection of many retinal diseases.

Over the past two decades, a tremendous amount of research has been devoted to segmenting vessels in retinal fundus images. Numerous fully automated methods [24, 14, 17] have been proposed in the literature, and they have been quite successful in achieving segmentation accuracy on par with trained human annotators. Despite this, there is considerable room for further improvement due to the various challenges posed by the complex nature of vascular structures. Some of the open problems include segmentation in the presence of abnormalities, segmentation of thin vessel structures, and segmentation near bifurcation and crossover regions.

Comprehensive and detailed surveys of retinal vessel segmentation methods can be found in [21, 1, 5]. The works most relevant to this paper are deep-learning-based methods for accurate retinal vessel segmentation. Liskowski et al. [10] proposed a deep neural network model, achieving an area under the curve (ROC AUC) of 0.97 on the DRIVE dataset. Their method performs reasonably well on pathological images. A novel CNN architecture was proposed in [12] to solve both the retinal vessel and optic disc segmentation
problem. Fu et al. [6] formulated vessel segmentation as a boundary detection problem using a fully connected CNN model. In the semantic segmentation field, U-Net [18] is a fully convolutional network for biomedical image segmentation. Though many deep-learning-based approaches have been proposed, existing methods tend to miss fine vessel structures or allow false positives at terminal branches. Attention U-Net [15] automatically learns to focus on target structures of varying shapes and sizes. Mosinska et al. [13] found that pixel-wise losses are unsuitable for retinal vessel segmentation because of their inability to reflect the topological impact of mistakes in the final prediction. The work in [25] added a coefficient to the cross-entropy loss; it estimates connectivity from the Euclidean distance between the pixel in question and the nearest pixel belonging to the same class. Ventura et al. [23] defined a new way to evaluate connectivity on a patch. The most recent approach, by Son et al. [20], generates a precise map of the retinal vessels using generative adversarial training (GAN). Unfortunately, with limited data, generative models are considered much harder to train than discriminative models.

For thin vessel segmentation, this paper proposes an efficient topology-aware loss and a novel attention mechanism based on the U-Net to improve accuracy. The proposed loss is called the connection sensitive loss (CS loss) because it considers the probability of connectivity in the neighboring region. Moreover, the network adds new attention gates and learns a better matrix of attention weights before the output. The proposed method is end-to-end, without any intervention during learning. With the well-designed attention U-Net architecture, the proposed connection sensitive loss achieves the highest F1-score on all three datasets: DRIVE [22], STARE [8] and HRF [4]. It also extracts thin vessel structures better than the state-of-the-art methods. In summary, the paper makes the following contributions:

1. For vessel segmentation, the paper proposes a connection sensitive loss. It is designed for simultaneous region-wise structure extraction and pixel-wise semantic segmentation. It helps achieve accurate results, even for thin vessel structures in crossover regions.

2. A new attention mechanism is designed based on the standard U-Net. The proposed attention gates improve the quality and the effectiveness of the features and thus take better advantage of them during segmentation.

3. The paper proposes the connection sensitive attention U-Net (CSAU), which combines the connection sensitive loss and the attention gates. In the experiments, CSAU achieves the highest F1-score on all three datasets compared with the state-of-the-art methods.

4. In order to better reflect the quality of the segmentation details, this paper introduces a new metric, called connection sensitive accuracy, to evaluate the segmentation of boundaries and thin vessel structures.

In Section 2, we introduce the proposed method. Section 3 gives implementation details, including data preprocessing and the training process. Section 4 describes the experiments and analyzes the results. The last section concludes the paper.

2. Proposed methodology

In this section, we present the architecture of the connection sensitive attention U-Net (CSAU). The main framework is shown in Fig. 2. Its structure resembles the original attention U-Net except for the connections and the design of the attention gates. Moreover, the framework uses a new connection sensitive loss, with which the attention gates learn better attention weights and help improve the accuracy on details.

Figure 2. The proposed framework.

The parameters of the convolutional layers are listed in Table 1. The network contains four encoder blocks and four decoder blocks, connected by skip connections. Each encoder block consists of two successive 3×3 convolutional layers and a max pooling layer. Every convolutional layer is followed by a batch normalization layer and a ReLU layer. The decoder block is the same as the encoder block except that it uses a transposed convolutional layer instead of the pooling layer.

2.1. Connection sensitive loss

The parameters of the model are learnt by a training objective, using Adam stochastic gradient descent. In this paper, we build a new training objective on top of the proposed attention U-Net architecture. In the following discussion, let x ∈ R^(H×W) be the H×W input image, and let y ∈ {0, 1}^(H×W) be the corresponding ground-truth labeling, with 1 indicating vessel pixels and 0 indicating background pixels. Let f be the proposed neural network parameterized by weights v. The output of the network is an image ŷ = f(x, v) ∈ [0, 1]^(H×W). Every element of ŷ is interpreted as the probability of pixel i having label 1: ŷi ≡ p(Yi = 1 | x, v), where Yi is a Bernoulli random variable, Yi ∼ Ber(ŷi).
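As a sanity check on Table 1, the spatial bookkeeping of the four pooling and four up-sampling steps can be traced with a short sketch. The helper trace_shapes is illustrative and not part of the paper; it assumes the 3×3 convolutions preserve spatial size via padding.

```python
# Trace (stage, channels, height, width) through the Table 1 layout.
def trace_shapes(h, w):
    shapes = []
    # Encoder Blocks (1)-(4): two 3x3 convs, then 2x2 max pool, stride 2.
    for c in (32, 64, 128, 256):
        shapes.append(('enc conv', c, h, w))
        h, w = h // 2, w // 2                  # pooling halves the size
        shapes.append(('enc pool', c, h, w))
    # conv5_x at the bottleneck, then Decoder Blocks (5)-(8):
    # each 2x2 transposed conv with stride 2 doubles the spatial size.
    shapes.append(('bottleneck', 512, h, w))
    for c in (256, 128, 64, 32):
        h, w = h * 2, w * 2
        shapes.append(('dec up', c, h, w))
    return shapes
```

For a 256×256 input this gives a deepest feature map of 512 channels at 16×16, and the decoder returns to 32 channels at 256×256 before the final 3×3, 1 output layers.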
Table 1. The parameters of the convolutional layers.

Block name        | Layer name       | Layer configuration | Remark
Encoder Block(1)  | conv1_1          | 3×3, 32             |
                  | conv1_2          | 3×3, 32             |
                  | max pool         | 2×2, stride 2       |
Encoder Block(2)  | conv2_1          | 3×3, 64             | Down-sampling
                  | conv2_2          | 3×3, 64             | path
                  | max pool         | 2×2, stride 2       |
Encoder Block(3)  | conv3_1          | 3×3, 128            |
                  | conv3_2          | 3×3, 128            |
                  | max pool         | 2×2, stride 2       |
Encoder Block(4)  | conv4_1          | 3×3, 256            |
                  | conv4_2          | 3×3, 256            |
                  | max pool         | 2×2, stride 2       |
Decoder Block(5)  | conv5_1          | 3×3, 512            |
                  | conv5_2          | 3×3, 512            |
                  | convTranspose5_1 | 2×2, 256            |
Decoder Block(6)  | conv6_1          | 3×3, 256            | Up-sampling
                  | conv6_2          | 3×3, 256            | path
                  | convTranspose6_1 | 2×2, 128            |
Decoder Block(7)  | conv7_1          | 3×3, 128            |
                  | conv7_2          | 3×3, 128            |
                  | convTranspose7_1 | 2×2, 64             |
Decoder Block(8)  | conv8_1          | 3×3, 64             |
                  | conv8_2          | 3×3, 64             |
                  | convTranspose8_1 | 2×2, 32             |
                  | conv9_1          | 3×3, 32             |
                  | conv9_2          | 3×3, 1              |
                  | conv10_1         | 3×3, 32             |
                  | conv10_2         | 3×3, 32             |
                  | conv10_3         | 3×3, 1              |

Figure 3. Results trained by binary cross entropy, in which red pixels are false negatives.

Cross entropy is widely used as the loss function in deep learning networks for binary classification problems; it measures the probability of belonging to one specific class or not. Thus, the proposed loss function is also built on the cross-entropy loss Lce, defined by

    Lce = − Σ_i ( yi log(fi(x, v)) + (1 − yi) log(1 − fi(x, v)) )                      (1)

From the definition of Lce in (1), we can see that the cross-entropy loss assigns equal weight to the loss of every pixel, failing to account for fine object structures. Therefore, the cross-entropy loss is not well suited to segmenting thin vessel structures (Fig. 3 shows the false negatives produced by training with binary cross entropy). This paper addresses the problem by introducing two coefficients into the cross-entropy loss, as shown in (2). Lcs is the connection sensitive loss. θ1 and θ2 represent local structural properties in the labeled ground truth and the predicted map respectively, while wi is a weighting parameter that multiplies the encoded loss on every pixel and will be explained later.

    Lcs = − Σ_i wi ( θ1 yi log(fi(x, v)) + θ2 (1 − yi) log(1 − fi(x, v)) )             (2)

To model the structural properties, an exponential function is constructed as shown in the following equation:

    θ1 = e^((1 − Ci)² · yi),   θ2 = e^((1 − Ci)² · fi(x, v))                           (3)

in which Ci represents the probability of connectivity in a local region. It can be computed by the following function, with upper bound 1 and lower bound 0. zi is a variable representing whether the pixel belongs to the ground truth (zi = yi) or the predicted map (zi = fi(x, v)):

    C_i, i∈Ω(m,n,r;z) = max( min( α · ( Σ_{i∈Ω(m,n,r;z)} zi / r² )^β − γ, 1 ), 0 )     (4)

It is observed that Ci is strongly correlated with the local density. To estimate Ci, the function uses a polynomial model and computes the local density by averaging the values in the region. α, β, γ are constant coefficients. Ω(m, n, r; z) represents a square region in the map z with side length r centered at the coordinate (m, n). The region can be defined with the matrix in the equation:

    Z = [ z(m−(r−1)/2, n−(r−1)/2)  ···  z(m−(r−1)/2, n+(r−1)/2)
                     ⋮              ⋱              ⋮
          z(m+(r−1)/2, n−(r−1)/2)  ···  z(m+(r−1)/2, n+(r−1)/2) ]
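Equations (2)–(4) can be sketched in NumPy. This is a minimal, unoptimized illustration rather than the authors' implementation: the window loop computes the local density of eq. (4), and the values of alpha, beta, gamma and the per-pixel weight w are placeholder assumptions, since the paper's actual coefficients are not given in this excerpt.

```python
import numpy as np

def connectivity(z, r=5, alpha=1.5, beta=1.0, gamma=0.1):
    """Eq. (4): connectivity probability from the local density in an
    r x r window around each pixel, clipped to [0, 1].
    alpha, beta, gamma are illustrative placeholder values."""
    h, w = z.shape
    pad = r // 2
    zp = np.pad(z, pad)                       # zero-pad the borders
    dens = np.empty_like(z, dtype=float)
    for m in range(h):
        for n in range(w):
            dens[m, n] = zp[m:m + r, n:n + r].sum() / (r * r)
    return np.clip(alpha * dens ** beta - gamma, 0.0, 1.0)

def cs_loss(y, p, w=None, r=5, eps=1e-7):
    """Eqs. (2)-(3): connection sensitive loss for a ground truth y
    and a predicted probability map p."""
    theta1 = np.exp((1.0 - connectivity(y, r)) ** 2 * y)   # C_i from ground truth
    theta2 = np.exp((1.0 - connectivity(p, r)) ** 2 * p)   # C_i from prediction
    w = np.ones_like(y, dtype=float) if w is None else w
    p = np.clip(p, eps, 1.0 - eps)
    return -np.sum(w * (theta1 * y * np.log(p)
                        + theta2 * (1.0 - y) * np.log(1.0 - p)))
```

Since the exponents in eq. (3) are non-negative, θ1, θ2 ≥ 1, so this loss upper-bounds the plain cross-entropy of eq. (1) and penalizes mistakes most where local connectivity is low, i.e. on thin, sparsely connected vessels.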
Figure 4. [...] is the background and the yellow region is the vessels. (b) is the corresponding region of the connectivity feature map, where pixels with dark values have a higher probability of connectivity than those with bright values.

Figure 5. Curves of connectivity probability with different densities in a 5×5 region (experimental curve vs. ideal curve; x-axis: density).
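The clipping behaviour plotted in Fig. 5 can be reproduced by sweeping the local density of a 5×5 region through eq. (4). The coefficient values below are illustrative assumptions chosen only to expose the [0, 1] bounds, not the values used in the paper.

```python
import numpy as np

def conn_prob(density, alpha=1.5, beta=1.0, gamma=0.1):
    # Eq. (4) as a scalar function of the local density;
    # alpha, beta, gamma are illustrative placeholder values.
    return float(np.clip(alpha * density ** beta - gamma, 0.0, 1.0))

# 0, 1, ..., 25 vessel pixels in a 5x5 region -> densities 0.0 .. 1.0
densities = [k / 25 for k in range(26)]
curve = [conn_prob(d) for d in densities]
```

The resulting curve is non-decreasing in the density, saturates at 1 for dense vessel regions, and is exactly 0 for empty regions, matching the bounds stated for eq. (4).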