
BAYESIAN DEEP LEARNING WITH MONTE CARLO DROPOUT FOR QUALIFICATION OF SEMANTIC SEGMENTATION

Clément Dechesne, Pierre Lassalle
CNES
18 avenue Edouard Belin, 31401 Toulouse Cedex 9, France

Sébastien Lefèvre
Univ. Bretagne Sud / IRISA
Campus de Tohannic, 56000 Vannes, France

IGARSS 2021 - 2021 IEEE International Geoscience and Remote Sensing Symposium | 978-1-6654-0369-6/21/$31.00 ©2021 IEEE | DOI: 10.1109/IGARSS47720.2021.9555043

ABSTRACT

Despite the intense development of deep neural networks for computer vision, and especially semantic segmentation, their application to Earth Observation data usually remains below the accuracy requirements of real-life scenarios. Even if well-known deep learning methods produce excellent results, they tend to be over-confident and cannot assess how relevant their predictions are. In this work, a Bayesian deep learning method, based on Monte Carlo Dropout, is proposed to tackle semantic segmentation of aerial and satellite images. Bayesian deep learning can provide both a semantic segmentation and uncertainty maps. Based on the popular U-Net architecture, our model achieves semantic segmentation with high accuracy, e.g. F1-score and overall accuracy respectively reaching 90.84% and 93.22% on a public standard dataset. Uncertainty maps, also derived from our model, prove valuable for the qualitative evaluation of the segmentation and for the improvement of the database.

Index Terms: Deep learning, Semantic segmentation, Bayesian network, Optical imagery, Uncertainty estimation

1. INTRODUCTION

Deep learning methods have been widely used for semantic segmentation of optical images. Among the earliest works, [1] and [2] used several Fully Convolutional Neural Networks (FCNN) for semantic segmentation on aerial orthophotos with three spectral bands (red, green, near-infrared), plus a digital surface model (DSM) of the same resolution. They both report excellent results for a 5-class classification task (roads, buildings, low vegetation, tree, car) with an overall accuracy greater than 88%, and also with an efficient detection of small objects (such as individual cars). In [3], in addition to the FCNN, a boundary detection CNN module is added, increasing the accuracy of the model. [4] used a refinement module in their FCNN trained on multispectral images for an 18-class classification task, achieving excellent results (overall accuracy greater than 93% and average accuracy of 59.8%). They also showed that data augmentation was meaningful for semantic segmentation.

Although deep neural network architectures achieve state-of-the-art results in almost all classification tasks, they still make over-confident decisions. Indeed, on the one hand, it is easy to produce images (not recognizable to humans) that existing networks believe to be recognizable with high confidence [5]. On the other hand, a small change in the input image can lead to a very different prediction, still with a high confidence [6]. Current network architectures provide no measure of the uncertainty of their predictions. Some works have been proposed for generating relevant probability estimates from a deep neural network [7] as a measure of model confidence. However, these metrics are based on softmax probabilities, which cannot fully capture uncertainty.

Bayesian deep learning has been proposed for semantic segmentation to provide some measure of uncertainty in the prediction. It can be seen as an ensemble or forest of deep neural networks, each providing a single prediction. [8] showed that dropout (initially designed to avoid overfitting) can be used as a Bayesian approximation. [9] applied this method, called Monte Carlo Dropout (MCD), for the semantic segmentation of the Cityscapes dataset. They designed a DeepLab model with MCD and achieved great results with an overall accuracy of 95.3% and Intersection over Union (IoU) of 78%. They also provided, along with the semantic segmentation output, several uncertainty maps (namely predictive entropy and mutual information), showing that the model was uncertain of its prediction on pixels where the prediction was erroneous. [10] also applied MCD to a SegNet architecture. The model was trained on the CamVid Road Scenes and SUN RGB-D Indoor Scene Understanding datasets. It achieved better results than state-of-the-art methods and also provided uncertainty maps (for all classes and per class, based on output variability). [11] compared MCD to another Bayesian deep model, where weights were sampled from a distribution. In this case, the model learns the parameters of the distribution instead of the weights. They showed that such models produce better results and more interpretable uncertainty maps. However, some specific training strategies were needed.

To our knowledge, Bayesian deep learning has not yet been applied to remote sensing images. In this paper, we apply it using Monte Carlo Dropout on aerial images. In addition to semantic segmentation, we also provide confidence maps, indicating how confident the network is in its prediction. Qualification maps, which combine segmentation accuracy and uncertainty, are also derived.
Our paper is structured as follows: we first describe our method in Sec. 2. We then present the dataset and our results in Sec. 3. We finally draw some conclusions in Sec. 4.

2. METHOD

We briefly recall here the principles of Bayesian learning, highlight its relevance w.r.t. traditional deep learning and explain how it is applied to neural networks. The proposed network architecture, inspired by U-NET [12] (see Figure 1a) but including Bayesian layers, is then introduced.

Bayesian learning for CNN has been recently proposed [13] and is based on Bayes by Backprop [14]. It produces results similar to traditional deep learning methods. However, the weights of the network are no longer point estimates but are sampled from a distribution whose parameters are learned. Therefore, each prediction differs from the others. With a large number of predictions, the average behavior produces relevant results, while the variability of the predictions allows us to assess the confidence of the model.

A simple way of implementing Bayesian Deep Learning is to use Monte Carlo Dropout (MCD). [8] demonstrated that MCD can be interpreted as an approximation of traditional Bayesian Deep Learning. A layer with weights $M_i$ followed by a dropout layer active in both training and prediction is equivalent to a Bayesian layer with weights $W_i$ defined as:

\[ W_i = M_i \cdot \mathrm{diag}(z_i), \quad \text{with } z_i \sim \mathrm{Bernoulli}(p_i) \tag{1} \]

with $z_i$ the random (in)activation coefficients and $M_i$ the weight matrix before dropout is applied. $p_i$ is the activation probability for layer $i$ and can be learned or set manually.

The model is composed of several convolution blocks (made of convolution layers with ELU activation), each followed by a pooling layer (when downsampling) or a deconvolution layer (when upsampling). After a pooling (resp. a deconvolution), the number of filters of the convolution layers is multiplied (resp. divided) by 2. Each upsampled output is concatenated with the output of the convolution block of the same size. A model therefore has three parameters: the block size (i.e. the number of convolutions in a block), the number of poolings and the number of filters in the convolution layers of the first convolution block. In order to produce uncertainty maps, we exploit the MCD strategy. This is done by adding a dropout layer at the end of a convolution block. The dropout is active in both training and prediction. The last convolution layer of the last convolution block has a number of filters corresponding to the number of classes and a softmax activation. The architecture of the proposed model is presented in Figure 1b.

Fig. 1: Traditional U-NET (a) and our derived MCD model (b). Panel (b) shows the architecture of an MCD model with a block size of 2 and 3 poolings. Legend: convolution + activation, dropout, max pooling, deconvolution (upsampling), softmax, skip input (concatenation).

3. EXPERIMENTS

We report here some preliminary experiments conducted on the ISPRS Vaihingen dataset. It is composed of 38 images at a spatial resolution of 9 cm. The three bands of the images correspond to the near infrared, red and green bands delivered by the camera. 80% of the images are used for training/validation; the remaining 20% are kept for testing. Patches of size 128 × 128 were extracted in order to train the network. Data augmentation was applied by randomly flipping extracted patches.

The network was trained using the Adam optimizer [15] with a batch size of 64 and an initial learning rate of 0.001. The learning rate is reduced on plateau (learning rate divided by 10 if no decrease in the validation loss is observed in the last 10 epochs) and we also perform early stopping (training stops if no decrease in the validation loss is observed in the last 20 epochs). These are standard parameters, allowing us to achieve the best results while avoiding over-fitting.

A trained Bayesian model produces different predictions for the same input data since its weights are sampled from a distribution. Therefore, several predictions need to be performed. For each iteration, the model produces pixel-wise class probabilities. The final semantic segmentation is obtained through a majority vote over all these predictions. From this semantic segmentation, one can derive confusion matrices and several metrics, e.g. precision, recall, accuracy, F1-score and kappa coefficient (κ).

Since this segmentation is not sufficient to assess the reliability of the model, other metrics able to evaluate the uncertainty of the network were also computed. We investigate here two types of uncertainty measures among those reviewed in [13]. The epistemic uncertainty (or model uncertainty) represents what the model does not know due to insufficient training data. The aleatoric uncertainty is related to the measurement noise of the sensor. Combined, these two uncertainties form the predictive uncertainty of the network. In this work, two metrics were derived, namely the entropy of the predictive distribution (a.k.a. predictive entropy) and the mutual information between the predictive distribution and the posterior over network weights [9]. These metrics are of particular interest since mutual information mostly captures epistemic (or model) uncertainty whereas predictive entropy captures predictive uncertainty, which combines both epistemic and aleatoric uncertainties. The predictive entropy is computed as follows:

\[ \hat{H} = -\sum_{c} \left( \frac{1}{T} \sum_{t} p_{c,\hat{w}_t}(y|x) \right) \log \left( \frac{1}{T} \sum_{t} p_{c,\hat{w}_t}(y|x) \right) \tag{2} \]

where $c$ ranges over all the classes, $T$ is the number of Monte Carlo samples, $p_{c,\hat{w}_t}(y|x)$ is the softmax probability of input $x$ being in class $c$, and $\hat{w}_t$ are the model parameters on the $t$-th Monte Carlo sample. The mutual information is computed as follows:

\[ \hat{I} = \hat{H} + \frac{1}{T} \sum_{c,t} p_{c,\hat{w}_t}(y|x) \log \left( p_{c,\hat{w}_t}(y|x) \right) \tag{3} \]

In order to evaluate more precisely the impact of uncertainty metrics, qualification maps were computed. A qualification map combines the validity of the majority vote (whether the network predicted the right or the wrong label) and the uncertainty of the majority vote (how confident the network is in its prediction). For the sake of visual understanding, we compute two different color gradients describing the uncertainty of the prediction, for the well-labelled and wrongly-labelled pixels respectively.

The results obtained on the ISPRS Vaihingen dataset are presented in Figure 2 and Table 1. We achieved very high scores, with an overall accuracy of 93.22% and an F1-score of 90.84%. This is slightly better than other results reported on this dataset (with overall accuracy usually ranging from 80% to 91%). Figure 2d shows that wrongly predicted pixels are indeed predicted with a low confidence (blue pixels). It is also interesting to note that similar classes (e.g. low vegetation and tree) are well predicted but again with a low confidence. This shows that our network, and more generally Bayesian deep learning, is able to provide both a high-quality semantic segmentation and some associated uncertainty metrics.

Fig. 2: Results on the ISPRS dataset using the MCD model. (a) False-color infrared (NIR, red, green) input image. (b) Ground truth. Color code: impervious surface, building, low vegetation, tree, car, clutter/background. (c) Results of the semantic segmentation. Same color code as in (b). (d) Qualification map using the predictive entropy as uncertainty metric; two color gradients (right label, wrong label) encode the confidence.

Method                     Measure    Overall  Impervious surface  Building  Low vegetation  Tree   Car    Clutter/background
Proposed (Bayesian U-Net)  F1-Score   90.84    97.15               92.85     90.53           87.49  83.12  93.90
Proposed (Bayesian U-Net)  Accuracy   93.22    98.78               95.00     94.84           99.96  99.76  98.10
Proposed (Bayesian U-Net)  Precision  88.79    97.36               94.04     89.24           78.07  80.45  93.58
Proposed (Bayesian U-Net)  Recall     93.27    96.94               91.70     91.86           99.49  85.98  94.22
Proposed (Bayesian U-Net)  κ × 100    93.06    96.38               89.01     86.99           87.47  83.00  92.72
Baseline (U-Net)           F1-Score   89.85    96.79               91.94     89.02           86.38  82.24  92.75
Baseline (U-Net)           Accuracy   92.23    98.63               94.37     94.02           99.95  99.75  97.74

Table 1: Comparative evaluation of our Bayesian U-Net with MCD and a standard U-Net architecture considered as baseline.

4. CONCLUSION

In this work, we consider semantic segmentation within a Bayesian deep learning context in order to both improve segmentation accuracy and provide some measures of prediction uncertainty. Our model integrates the Monte Carlo Dropout method into the popular U-Net architecture, leading to an original Bayesian model. The proposed network performs well on the well-established ISPRS Vaihingen dataset, with results comparable to or better than existing methods (overall accuracy of 93.22% and F1-score of 90.84%). More importantly, our Bayesian deep network is able to extract uncertainty maps that are very useful for assessing the output segmentation. Qualification of land cover maps is a strong requirement for delivering AI-driven EO products. Furthermore, one can analyse such maps, together with initial ground truth, to spot areas where the ground truth might be erroneous, before conducting some automatic or manual correction. In this context, the uncertainty maps can be exploited to generate reference data with higher accuracy.

We now plan to evaluate how Bayesian deep learning can help to improve the quality of reference data. First, the areas where the network predicted the wrong label with high confidence need to be re-inspected and corrected if needed. It will also require us to assess whether the network had appropriate reasons to be confident or not. Then we can re-train the network using the updated ground truth, and experimentally assess the possible gain in prediction quality. We would also like to investigate Bayesian neural networks based on another variational inference scheme. This would allow us to use different distributions for the network weights (such as a normal distribution), since this tends to produce more meaningful uncertainty maps [11]. The main challenge here is the increase in the number of parameters, leading to training issues that need to be addressed. Finally, as with every ensemble method, one notable issue is also the inference time, since multiple predictions need to be performed.

5. REFERENCES

[1] D. Marmanis, J. D. Wegner, S. Galliani, K. Schindler, M. Datcu, and U. Stilla, “Semantic segmentation of aerial images with an ensemble of CNNs,” ISPRS Annals, vol. 3, pp. 473–480, 2016.

[2] N. Audebert, B. Le Saux, and S. Lefèvre, “Semantic segmentation of earth observation data using multimodal and multi-scale deep networks,” in ACCV. Springer, 2016, pp. 180–196.

[3] D. Marmanis, K. Schindler, J. D. Wegner, S. Galliani, M. Datcu, and U. Stilla, “Classification with an edge: Improving semantic image segmentation with boundary detection,” P&RS, vol. 135, pp. 158–172, 2018.

[4] R. Kemker, C. Salvaggio, and C. Kanan, “Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning,” P&RS, vol. 145, pp. 60–77, 2018.

[5] A. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” in CVPR, 2015, pp. 427–436.

[6] J. Su, D. V. Vargas, and K. Sakurai, “One pixel attack for fooling deep neural networks,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 5, pp. 828–841, 2019.

[7] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” arXiv preprint arXiv:1706.04599, 2017.

[8] Y. Gal and Z. Ghahramani, “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning,” in ICML, 2016, pp. 1050–1059.

[9] J. Mukhoti and Y. Gal, “Evaluating Bayesian deep learning methods for semantic segmentation,” arXiv preprint arXiv:1811.12709, 2018.

[10] A. Kendall, V. Badrinarayanan, and R. Cipolla, “Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding,” arXiv preprint arXiv:1511.02680, 2015.

[11] T. M. LaBonte, C. Martinez, and S. A. Roberts, “We know where we don’t know: 3D Bayesian CNNs for credible geometric uncertainty,” Tech. Rep., Sandia National Lab, Albuquerque, NM, USA, 2020.

[12] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in MICCAI. Springer, 2015, pp. 234–241.

[13] K. Shridhar, F. Laumann, and M. Liwicki, “A comprehensive guide to Bayesian convolutional neural network with variational inference,” arXiv preprint arXiv:1901.02731, 2019.

[14] A. Graves, “Practical variational inference for neural networks,” in NIPS, 2011, pp. 2348–2356.

[15] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
