Enhanced Shadow Removal for Surveillance Systems*

Jishnu P and Rajathilagam B

* Supported by Amrita Vishwa Vidyapeetham
1 Introduction
Removing shadows from images has long been considered a challenging task in the field of computer vision. Shadows are formed when opaque objects block the path of sunlight, and their shape and position depend on factors such as the altitude of the sun and the location of the object. For example, consider a bike and a bus in traffic, with the bike standing to the left of the bus. If the sunlight comes from the right side of the bus, the shadow of the bus may cover the bike. A shadow can thus distort two different objects into a single apparent object; this is called occlusion. In such a situation, different objects cannot be detected efficiently. In this example, it becomes difficult to distinguish between the bike and the bus: the bus and its shadow merge together and form a shape far different from the shape of the bus alone.
Fig. 1 illustrates the expected outcome of the shadow removal process. In traditional approaches, a common method consists of first detecting shadows and then using the detected shadow masks as a cue for removing them. Shadow detection predicts the location of the shadowed region in the image and separates the shadowed and non-shadowed regions of the original image at the pixel level. Classifying shadows in an image is considered challenging because shadows have varied properties: depending on the degree of occlusion by the object, the brightness of the shadow varies between the umbra and the penumbra. The dark core of the shadow is called the umbra, and the slightly lighter outer part is called the penumbra.
Since the introduction of Generative Adversarial Networks (GANs) in 2014 [1], the computer vision domain has taken a leap forward on various tasks. Shadow removal is an important task that can be considered an invaluable preprocessing step for higher-level computer vision tasks in surveillance systems, such as road accident identification and severity determination from CCTV surveillance [2] and plant leaf recognition using machine learning techniques [3].
The challenge is to ensure high quality in the shadow-less image through an efficient evaluation metric and an enhanced architecture. Shadow removal is also difficult because the shadows must be removed and the information in the shadowed region restored according to the degree of occlusion.
2 Background
3 Methodology
Fig. 2 illustrates the proposed architecture for the shadow removal task. The shadowed image is given to the generator module, which produces the shadow-less (generated) version of the input image. The discriminator module takes the paired image containing the generated shadow-less image and the real shadow-less image, and its duty is to check whether the paired image is real or fake. The generator module is trained to minimize the loss between the expected target image and the generated image and thereby fool the discriminator module.
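As a rough sketch of this training loop, assuming a Keras-style pix2pix setup in which the discriminator is conditioned on the shadowed input (the helper names generator, discriminator, and gan are illustrative, not the paper's code):

import numpy as np

def train_step(generator, discriminator, gan, shadowed, shadow_free):
    # The generator proposes a shadow-less version of the shadowed input.
    fake = generator.predict(shadowed)
    real_lbl = np.ones((len(shadowed), 1))   # label shapes assume a scalar
    fake_lbl = np.zeros((len(shadowed), 1))  # discriminator output per image
    # The discriminator learns to tell real pairs from generated pairs.
    d_real = discriminator.train_on_batch([shadowed, shadow_free], real_lbl)
    d_fake = discriminator.train_on_batch([shadowed, fake], fake_lbl)
    # The generator is updated through the combined model: it tries to fool
    # the discriminator while staying close to the target shadow-less image.
    g_loss = gan.train_on_batch(shadowed, [real_lbl, shadow_free])
    return d_real, d_fake, g_loss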
Since we are focused on the shadow removal task, the ISTD [4] shadow removal benchmark dataset is used. The dataset contains shadowed images, shadow masks, and shadow-less images: 1330 training images and 540 test images, each of 640 × 480 pixels.
During the data preprocessing phase, the images are loaded, re-scaled to 256 × 256 pixels for processing convenience, and converted to numpy arrays.
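A minimal sketch of this preprocessing step follows; the file path and the [-1, 1] normalization (commonly paired with a tanh generator output) are assumptions, since the paper does not specify them:

import numpy as np
from PIL import Image

def load_image(path, size=(256, 256)):
    # Load the ISTD image, force RGB, and rescale to 256 x 256 pixels.
    img = Image.open(path).convert("RGB").resize(size)
    # Convert to a float numpy array scaled from [0, 255] to [-1, 1].
    arr = np.asarray(img, dtype=np.float32)
    return arr / 127.5 - 1.0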
Fig. 4 shows the overall abstract architecture of the generator. The generator is an encoder-decoder model using a U-Net architecture [13]. The encoder and decoder of the generator contain convolutional, batch normalization, dropout, and activation layers. The encoder encodes the information in the given input image, and the context information produced at the bottleneck is used by the decoder block to reconstruct the image. Skip connections between corresponding encoder and decoder layers improve the quality of the image generated by the generator.
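An abbreviated Keras sketch of such a U-Net-style generator is given below; the depth, filter counts, and placement of batch normalization and dropout are assumptions for illustration, not the paper's exact configuration:

from tensorflow.keras import layers, Model

def build_generator(shape=(256, 256, 3)):
    inp = layers.Input(shape=shape)
    # Encoder: convolutions downsample the image and capture context.
    e1 = layers.Conv2D(64, 4, strides=2, padding="same", activation="relu")(inp)
    e2 = layers.Conv2D(128, 4, strides=2, padding="same", activation="relu")(e1)
    e2 = layers.BatchNormalization()(e2)
    b = layers.Conv2D(256, 4, strides=2, padding="same", activation="relu")(e2)
    # Decoder: transposed convolutions reconstruct the image from the
    # bottleneck; skip connections concatenate matching encoder features.
    d1 = layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation="relu")(b)
    d1 = layers.Dropout(0.5)(d1)
    d1 = layers.Concatenate()([d1, e2])
    d2 = layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu")(d1)
    d2 = layers.Concatenate()([d2, e1])
    out = layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh")(d2)
    return Model(inp, out)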
Adversarial training is used for training the discriminator model. The generator trains much more slowly than the discriminator, which leads to GAN issues such as vanishing gradients, mode collapse, and non-convergence. The obvious solution is to balance generator and discriminator training to avoid overfitting.
A combined loss (1) is introduced to balance the training of the generator and the discriminator. Giving more importance to the reconstruction loss (2) than to the adversarial loss (3) during generator training reduces the effect of the discriminator's faster training on the generator.

L_{combined} = \lambda_1 L_{reconstruction} + \lambda_2 L_{adversarial} (1)

where \lambda_1 = 100 and \lambda_2 = 1.
The loss between the generated fake image and the shadow-less image is called the reconstruction loss:

L_{reconstruction} = \sum_{i=1}^{n} |Y_{true,i} - Y_{pred,i}| (2)

where Y_{pred,i} is the predicted value of the i-th pixel, Y_{true,i} is the observed (actual) value of the i-th pixel, and n is the total number of pixels.
The adversarial loss is the binary cross-entropy loss of the conditional GAN:

L_{adversarial} = E_{x,y}[\log D(x, y)] + E_{y}[\log(1 - D(G(y), y))] (3)

where G is the generator, D is the discriminator, x is the shadow-less image, and y is the shadowed image.
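As an illustration, a minimal sketch of how the combined loss in Eq. (1) can be computed, assuming a TensorFlow/Keras implementation with a sigmoid discriminator output; the mean (rather than the sum in Eq. (2)) is used for the reconstruction term, and all names here are illustrative rather than the paper's code:

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def combined_loss(disc_on_fake, generated, target, lam1=100.0, lam2=1.0):
    # Reconstruction term (Eq. 2, averaged over pixels): L1 distance between
    # the generated image and the ground-truth shadow-less image.
    l_rec = tf.reduce_mean(tf.abs(target - generated))
    # Adversarial term (Eq. 3, generator side): the generator is rewarded
    # when the discriminator scores its output as real (label 1).
    l_adv = bce(tf.ones_like(disc_on_fake), disc_on_fake)
    # Weighted sum (Eq. 1) with lambda1 = 100 and lambda2 = 1 as in the text.
    return lam1 * l_rec + lam2 * l_adv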
4 Results

In this project, the ISTD dataset is used to train our model, and the RMSE and PSNR metrics are calculated. The state-of-the-art models RIS-GAN [5] and Stacked Conditional GAN (ST-CGAN) [4] are used as the base models for comparing the performance of our model (Table 1). The training curves of the generator loss and discriminator loss are shown in Fig. 6.
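For reference, a minimal sketch of how RMSE and PSNR can be computed between a generated image and its ground truth, assuming float arrays scaled to [0, 1] (the paper does not state its exact implementation):

import numpy as np

def rmse(pred, target):
    # Root-mean-square error over all pixels and channels.
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def psnr(pred, target, max_val=1.0):
    # Peak signal-to-noise ratio in decibels, derived from the MSE.
    mse = np.mean((pred - target) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))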
The loss curves shown in Fig. 6 make it clear that generator and discriminator training is a min-max game: the generator tries to fool the discriminator, and at the same time the discriminator tries not to be fooled by the generator's fake images. For a good GAN [1] model, the generator and discriminator losses never become equal.
Fig. 7. Sample output. Source, generated, and expected images are arranged row-wise.
From Fig. 7 we can see that our model performs well for both indoor and outdoor images. The first and third columns correspond to outdoor images, for which RMSE = 0.045 and PSNR = 78.76 dB. The second column corresponds to an indoor image of a blackboard in a classroom; the shadow is successfully removed by our model, with RMSE = 0.08 and PSNR = 74.58 dB for this particular indoor shadowed sample.
Fig. 8 illustrates the output of the proposed model on sample images that are entirely different from the training dataset. The first row corresponds to an indoor shadowed image and the corresponding output of our model. The second row corresponds to an outdoor shadowed image from a real-life scene and the corresponding shadow-less image produced by our model. The proposed model clearly performs well on shadowed images outside the training dataset, as is evident from the shadow-less images it produces.
5 Conclusion
In this project, we proposed a GAN-based shadow removal model for generating enhanced shadow-less images. Initially, we used a basic conditional GAN model on the ISTD dataset and analyzed the areas for improvement. We then modified the architecture and introduced a combined loss for training the model. The parameters were tuned through repeated experiments to identify an appropriate set of parameters for the model. The experiments showed that the proposed model performs better than the existing models. The model also showed promising results on real-world shadowed images (outdoor and indoor) collected by us.
As a future enhancement, the shadow-less images generated by the proposed model could be improved further using image super-resolution techniques. Parameter tuning is a time-consuming task that requires considerable domain knowledge; neuroevolution techniques [14] could be used to tune the hyper-parameters and identify the best set of parameters for the model, thereby improving its performance. The enhanced model can be used to improve the efficiency of object detection and tracking applications, especially applications such as wild animal detection and recognition from aerial videos using computer vision techniques [15], in which the presence of shadows is unavoidable.
References
1. Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, "Generative Adversarial Networks", International Conference on Neural Information Processing Systems, 2014. https://arxiv.org/abs/1406.2661
2. S. Veni, R. Anand, B. Santosh, "Road Accident Detection and Severity Determination from CCTV Surveillance", Advances in Distributed Computing and Machine Learning, Singapore, 2021.
3. R. Sujee, S. Kumar Thangavel, "Plant Leaf Recognition Using Machine Learning Techniques", New Trends in Computational Vision and Bio-inspired Computing: Selected Works Presented at ICCVBIC 2018, Coimbatore, India, Springer International Publishing, Cham, pp. 1433-1444, 2020.
4. Jifeng Wang, Xiang Li, Jian Yang, "Stacked Conditional Generative Adversarial Networks for Jointly Learning Shadow Detection and Shadow Removal", CVPR 2018. https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00192
5. Ling Zhang, Chengjiang Long, Xiaolong Zhang, Chunxia Xiao, "RIS-GAN: Explore Residual and Illumination with Generative Adversarial Networks for Shadow Removal", AAAI 2020. https://doi.org/10.1609/aaai.v34i07.6979
6. L. Qu, J. Tian, S. He, Y. Tang, R. W. H. Lau, "DeshadowNet: A Multi-context Embedding Deep Network for Shadow Removal", IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017. https://ieeexplore.ieee.org/document/8099731
7. Oleksii Sidorov, "Conditional GANs for Multi-Illuminant Color Constancy: Revolution or Yet Another Approach?", IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018. https://arxiv.org/abs/1811.06604
8. Wang, H. Xu, Z. Zhou, L. Deng, M. Yang, "Shadow Detection and Removal for Illumination Consistency on the Road", IEEE Transactions on Intelligent Vehicles, 2020. https://ieeexplore.ieee.org/document/9068460
9. Wang, L. Deng, Z. Zhou, M. Yang, B. Wang, "Shadow Detection and Removal for Illumination Consistency on the Road". https://ieeexplore.ieee.org/document/8304275
10. Heejin Yun, Kang Jik Kim, Jun-Chul Chun, "Shadow Detection and Removal From Photo-Realistic Synthetic Urban Image Using Deep Learning", Computers, Materials & Continua, 2019. https://www.techscience.com/cmc/v62n1/38123
11. Xiaodong Cun, Chi-Man Pun, Cheng Shi, "Towards Ghost-free Shadow Removal via Dual Hierarchical Aggregation Network and Shadow Matting GAN", AAAI 2020. https://arxiv.org/abs/1911.08718
12. M. Mirza, S. Osindero, "Conditional Generative Adversarial Nets", arXiv preprint, 2014. http://arxiv.org/abs/1411.1784
13. Olaf Ronneberger, Philipp Fischer, Thomas Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation", MICCAI 2015. https://arxiv.org/abs/1505.04597
14. K. Sree, G. Jeyakumar, "An Evolutionary Computing Approach to Solve Object Identification Problem for Fall Detection in Computer Vision-Based Video Surveillance Applications", Journal of Computational and Theoretical Nanoscience, vol. 17, no. 1, pp. 1-18, 2020. https://doi.org/10.1166/jctn.2020.8687
15. M. K., S. Padmavathi, "Wild Animal Detection and Recognition from Aerial Videos Using Computer Vision Technique", International Journal of Emerging Trends in Engineering Research, vol. 7, no. 5, pp. 21-24, 2019.