Bayesian Deep Learning with Monte Carlo Dropout for Qualification of Semantic Segmentation
…applied to remote sensing images yet. In this paper, we apply it using Monte Carlo Dropout on aerial images. In addition to semantic segmentation, we also provide confidence maps, indicating how confident the network is in its prediction. Qualification maps, which combine both segmentation accuracy and uncertainty, are also derived.

Our paper is structured as follows: we first describe our method in Sec. 2. We then present the dataset and our results in Sec. 3. We finally draw some conclusions in Sec. 4.
A trained Bayesian model produces different predictions for the same input data, since its weights are sampled from a distribution. Therefore, several predictions need to be performed. For each iteration, the model produces a pixel-wise probability. The final semantic segmentation is obtained through a majority vote over all these predictions. From this semantic segmentation, one can derive confusion matrices and several metrics, e.g. precision, recall, accuracy, F1-score and kappa coefficient (κ).

…indeed predicted with a low confidence (blue pixels). It is also interesting to note that similar classes (e.g. low vegetation and tree) are well predicted, but again with a low confidence. This shows that our network, and more generally Bayesian deep learning, is relevant to provide both a high-quality semantic segmentation and some associated uncertainty metrics.
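The stochastic prediction and per-pixel majority vote described above can be sketched as follows; this is a minimal NumPy illustration, not the paper's actual implementation, and `predict_fn`, `T`, and the six-class setting are placeholder assumptions:

```python
import numpy as np

def mc_dropout_predict(predict_fn, x, T=20, num_classes=6):
    """Run T stochastic forward passes (dropout kept active) and
    combine them into one segmentation by per-pixel majority vote.

    predict_fn(x) -> (H, W, num_classes) softmax map for one pass.
    """
    # Stack the T per-pixel probability maps: (T, H, W, C)
    probs = np.stack([predict_fn(x) for _ in range(T)])
    # Hard label of each stochastic pass: (T, H, W)
    votes = probs.argmax(axis=-1)
    # Count, for each pixel, how many passes chose each class
    counts = np.stack(
        [(votes == c).sum(axis=0) for c in range(num_classes)], axis=-1
    )
    segmentation = counts.argmax(axis=-1)  # (H, W) majority-vote labels
    mean_probs = probs.mean(axis=0)        # kept for the uncertainty metrics
    return segmentation, mean_probs
```

In a framework such as PyTorch, `predict_fn` would be a forward pass with the dropout layers forced into training mode so that each call samples a different set of weights.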
Since this segmentation is not sufficient to assess the reliability of the model, other metrics able to evaluate the uncertainty of the network were also computed. We investigate here two types of uncertainty measures among those reviewed in [13]. Epistemic uncertainty (or model uncertainty) represents what the model does not know due to insufficient training data. Aleatoric uncertainty is related to the measurement noise of the sensor. Combined, these two uncertainties form the predictive uncertainty of the network. In this work, two metrics were derived, namely the entropy of the predictive distribution (a.k.a. predictive entropy) and the mutual information between the predictive distribution and the posterior over the network weights [9]. These metrics are very interesting since mutual information mostly captures epistemic (or model) uncertainty, whereas predictive entropy captures predictive uncertainty, which combines both epistemic and aleatoric uncertainties.

[Figure: (a) False-color infrared (NIR, red, green) input image. (b) Ground truth. Color code: impervious surface, building, low vegetation, tree, car, clutter/background.]

The predictive entropy is computed as follows:

    \hat{H} = -\sum_{c} \left( \frac{1}{T} \sum_{t} p_{c,\hat{w}_t}(y|x) \right) \log \left( \frac{1}{T} \sum_{t} p_{c,\hat{w}_t}(y|x) \right)    (2)
where c ranges over all the classes, T is the number of Monte Carlo samples, p_{c,\hat{w}_t}(y|x) is the softmax probability of input x being in class c, and \hat{w}_t are the model parameters at the t-th Monte Carlo sample. The mutual information is computed as follows:

    I(\hat{w}; y|x) = \hat{H} + \frac{1}{T} \sum_{t} \sum_{c} p_{c,\hat{w}_t}(y|x) \log p_{c,\hat{w}_t}(y|x)    (3)

[Figure legend: right label / wrong label / confidence.]
| Method                    | Measure   | Overall | impervious surface | building | low vegetation | tree  | car   | clutter/background |
|---------------------------|-----------|---------|--------------------|----------|----------------|-------|-------|--------------------|
| Proposed (Bayesian U-Net) | F1-Score  | 90.84   | 97.15              | 92.85    | 90.53          | 87.49 | 83.12 | 93.90              |
|                           | Accuracy  | 93.22   | 98.78              | 95.00    | 94.84          | 99.96 | 99.76 | 98.10              |
|                           | Precision | 88.79   | 97.36              | 94.04    | 89.24          | 78.07 | 80.45 | 93.58              |
|                           | Recall    | 93.27   | 96.94              | 91.70    | 91.86          | 99.49 | 85.98 | 94.22              |
|                           | κ × 100   | 93.06   | 96.38              | 89.01    | 86.99          | 87.47 | 83.00 | 92.72              |
| Baseline (U-Net)          | F1-Score  | 89.85   | 96.79              | 91.94    | 89.02          | 86.38 | 82.24 | 92.75              |
|                           | Accuracy  | 92.23   | 98.63              | 94.37    | 94.02          | 99.95 | 99.75 | 97.74              |

Table 1: Comparative evaluation of our Bayesian U-Net with MCD and a standard U-Net architecture considered as baseline.
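The metrics reported in Table 1 all derive from the confusion matrix in the standard way. As an illustration only (the 2×2 matrix in the test below is made up, not the paper's data):

```python
import numpy as np

def scores_from_confusion(cm):
    """Per-class precision, recall, F1 and Cohen's kappa from a
    confusion matrix cm[i, j] = pixels of true class i predicted as j."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # correct / predicted per class
    recall = tp / cm.sum(axis=1)      # correct / actual per class
    f1 = 2 * precision * recall / (precision + recall)
    total = cm.sum()
    po = tp.sum() / total                                     # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2   # chance agreement
    kappa = (po - pe) / (1 - pe)
    return precision, recall, f1, kappa
```

Per-class accuracy (counting true negatives) is computed separately, which is why rare classes such as car and clutter show near-100% accuracy in Table 1 despite lower F1 scores.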
…accuracy of 93.22% and F1-score of 90.84%). More importantly, our Bayesian deep network is able to extract uncertainty maps that are very useful for assessing the output segmentation. Qualification of land cover maps is a strong requirement for delivering AI-driven EO products. Furthermore, one can analyse such maps, together with the initial ground truth, to spot areas where the ground truth might be erroneous, before conducting some automatic or manual correction. In this context, the uncertainty maps can be exploited to generate reference data with higher accuracy.

We now plan to evaluate how Bayesian deep learning can help to improve the quality of reference data. First, the areas where the network predicted the wrong label with high confidence need to be re-inspected and corrected if needed. This will also require us to assess whether the network had appropriate reasons to be confident or not. We can then re-train the network using the updated ground truth and experimentally assess the possible gain in prediction quality. We would also like to investigate Bayesian neural networks with another variational inference scheme. This would allow us to use different distributions for the network weights (such as a normal distribution), which tends to produce more meaningful uncertainty maps [11]. The main challenge here is the increase in the number of parameters, leading to training issues that need to be addressed. Finally, as with every ensemble method, one notable issue is the inference time, since multiple predictions need to be performed.

5. REFERENCES

[1] D. Marmanis, J. D. Wegner, S. Galliani, K. Schindler, M. Datcu, and U. Stilla, “Semantic segmentation of aerial images with an ensemble of CNNs,” ISPRS Annals, vol. 3, pp. 473–480, 2016.

[2] N. Audebert, B. Le Saux, and S. Lefèvre, “Semantic segmentation of earth observation data using multimodal and multi-scale deep networks,” in ACCV. Springer, 2016, pp. 180–196.

[3] D. Marmanis, K. Schindler, J. D. Wegner, S. Galliani, M. Datcu, and U. Stilla, “Classification with an edge: Improving semantic image segmentation with boundary detection,” P&RS, vol. 135, pp. 158–172, 2018.

[4] R. Kemker, C. Salvaggio, and C. Kanan, “Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning,” P&RS, vol. 145, pp. 60–77, 2018.

[5] A. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” in CVPR, 2015, pp. 427–436.

[6] J. Su, D. V. Vargas, and K. Sakurai, “One pixel attack for fooling deep neural networks,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 5, pp. 828–841, 2019.

[7] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” arXiv preprint arXiv:1706.04599, 2017.

[8] Y. Gal and Z. Ghahramani, “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning,” in ICML, 2016, pp. 1050–1059.

[9] J. Mukhoti and Y. Gal, “Evaluating Bayesian deep learning methods for semantic segmentation,” arXiv preprint arXiv:1811.12709, 2018.

[10] A. Kendall, V. Badrinarayanan, and R. Cipolla, “Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding,” arXiv preprint arXiv:1511.02680, 2015.

[11] T. M. LaBonte, C. Martinez, and S. A. Roberts, “We know where we don’t know: 3D Bayesian CNNs for credible geometric uncertainty,” Tech. Rep., Sandia National Lab, Albuquerque, NM, USA, 2020.

[12] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in MICCAI. Springer, 2015, pp. 234–241.

[13] K. Shridhar, F. Laumann, and M. Liwicki, “A comprehensive guide to Bayesian convolutional neural network with variational inference,” arXiv preprint arXiv:1901.02731, 2019.

[14] A. Graves, “Practical variational inference for neural networks,” in NIPS, 2011, pp. 2348–2356.

[15] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.