Multiscale Object Detection in Remote Sensing Images Using Deep Learning
Abstract
Key words: Remote Sensing Images, Feature Extraction, Multi Scale Object
Detection
I. INTRODUCTION
Remote sensing object detection (RSOD) is one of the most researched topics in Remote
Sensing Images (RSI). It locates the regions of interest and classifies the multiple
objects present in them. RSOD remains a challenge because of complex scenes and
variations in the scales of the objects [1]. Remote sensing images are captured from
satellites with wide fields of view, which leads to large scale variations and complex
backgrounds. These are the main obstacles for object detection in remote sensing
images. Remote sensing images have many applications, which include
12 Ch. Radhika et al
hazard response, urban monitoring, traffic control and many more [2]. Many techniques
exist for removing noise from grayscale and color photographs [4].
The algorithms that have been effective on natural scene images are not adapted to aerial
images taken with a wide view. Convolutional Neural Networks are used because of their
performance on natural images. Object detection algorithms can be classified into one
stage and two stage detectors. A one stage detector predicts boxes and classes in a
single step, whereas a two stage detector first extracts candidate regions and then
classifies and refines their bounding boxes.
Faster RCNN, a two stage detector, introduces a region proposal network. One stage
detectors include YOLO [11] and RetinaNet [14]. YOLO divides the image into a grid of
cells and makes the predictions for all cells with a single network.
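As a rough illustration of the grid idea described above, the following sketch assigns the centre point of an object to a cell of an S x S grid. The grid size `s=7`, the 640x640 image size, and the function name `grid_cell` are illustrative assumptions and are not taken from this paper.

```python
def grid_cell(cx, cy, img_w, img_h, s=7):
    """Return (row, col) of the grid cell that contains the point (cx, cy).

    cx, cy are pixel coordinates of an object's centre; s is the
    (assumed) number of grid cells per side.
    """
    col = min(int(cx / img_w * s), s - 1)  # clamp points on the right edge
    row = min(int(cy / img_h * s), s - 1)  # clamp points on the bottom edge
    return row, col


# Hypothetical 640x640 image with an object centred at (320, 100).
print(grid_cell(320, 100, 640, 640))  # -> (1, 3)
```

In a grid-based detector of this kind, the cell returned here would be the one responsible for predicting that object.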
Feature Pyramid Networks (FPN) have been incorporated for multiscale object detection,
but they can only address the imbalances present at the feature level. To address the
above issues, improve the detection accuracy, and reduce computation time, an adaptive
network is proposed. It consists of feature extraction techniques that capture
additional information about the object and an object detection algorithm that gives
superior accuracy.
by the anchor boxes so as to avoid repetitive feature computation and rapidly
increase the detection speed [17].
Faster RCNN introduced the design of anchor boxes, which has since been widely used in
modern detectors, including one stage detectors such as YOLOV2.
Object detection remains challenging in the field of computer vision because the
detector must predict a bounding box together with the class label and confidence
score for each object in the image [18].
III. METHODOLOGY
The following step by step procedure implements the proposed model for detecting the
objects in remote sensing images.
A. Data collection:
Data was collected from an aerial satellite image dataset.
B. Preprocessing:
The images are preprocessed, and data augmentation is applied to increase the amount
of training data for custom object detection.
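The paper does not state which augmentations were applied. As one possible example, the sketch below performs a horizontal flip, a common augmentation for detection data, and mirrors the bounding boxes so that they stay aligned with the flipped image. The function name `hflip`, the tiny list-based image, and the box format `(x_min, y_min, x_max, y_max)` are assumptions made for illustration.

```python
def hflip(image, boxes):
    """Flip an image (a list of pixel rows) left-to-right and mirror its boxes.

    boxes are (x_min, y_min, x_max, y_max) in pixel-edge coordinates.
    """
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # Mirroring swaps the roles of x_min and x_max.
    new_boxes = [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped, new_boxes


# Hypothetical 4x2 image with one box covering the left-most column.
image = [[0, 1, 2, 3],
         [4, 5, 6, 7]]
boxes = [(0, 0, 1, 2)]
flipped, new_boxes = hflip(image, boxes)
print(flipped)    # -> [[3, 2, 1, 0], [7, 6, 5, 4]]
print(new_boxes)  # -> [(3, 0, 4, 2)]
```

Each flipped copy adds a new labelled training sample without further manual annotation.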
C. Feature Extraction:
The features are extracted from the images using ResNet101 (Residual Network 101) and
ZFNet (Zeiler and Fergus Net).
F. Object detection:
After training, the YOLOV5 model is used to detect single scale and multi scale
objects.
IV. IMPLEMENTATION
A. Dataset
The satellite images dataset is used to implement the proposed system. The classes are
farmlands, factories, playgrounds, residential areas, airplanes, and parking lots.
There are 1272 images in the entire dataset. Manual annotation is done using the
Roboflow tool, and each object is labelled with a bounding box.
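To make the labelling step concrete, the sketch below converts one pixel-coordinate bounding box into the text label format that YOLOV5 reads: one line per object, holding the class index followed by the box centre, width, and height, all normalised by the image size. The class index `4`, the box, and the 640x640 image size are hypothetical values, and `to_yolo_label` is an illustrative helper rather than part of any tool used in the paper.

```python
def to_yolo_label(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2 / img_w   # normalised centre x
    cy = (y_min + y_max) / 2 / img_h   # normalised centre y
    w = (x_max - x_min) / img_w        # normalised width
    h = (y_max - y_min) / img_h        # normalised height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Hypothetical annotation: class index 4, box in a 640x640 image.
label = to_yolo_label(4, (100, 200, 300, 400), 640, 640)
print(label)  # -> 4 0.312500 0.468750 0.312500 0.312500
```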
B. Algorithms:
YOLOV5: This single stage object detector is used to detect objects of different
scales in remote sensing images from the custom dataset, and it outputs a bounding box
around each object together with a confidence score. CSPNet is employed to extract the
features of the image. YOLOV5 is implemented in PyTorch, which overcomes the challenges
of the Darknet framework.
Faster RCNN: This model is used for comparison with YOLOV5. It uses a deep
convolutional neural network and a region of interest (RoI) pooling layer to extract
fixed-size feature vectors. It is a single, unified network that outputs bounding
boxes with class probabilities.
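The region of interest pooling mentioned above can be sketched in a few lines: a region of arbitrary size on the feature map is divided into a fixed grid of bins, and the maximum of each bin is kept, so that every region yields a feature block of the same size. This is a simplified, single-channel sketch in plain Python. The bin-splitting rule (floor for the start, ceiling for the end), the output size of 2x2, and the name `roi_max_pool` are assumptions for illustration and not details taken from the paper.

```python
def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool the region roi = (x1, y1, x2, y2) into out_size x out_size bins.

    Coordinates are feature-map cell indices; x2 and y2 are exclusive.
    """
    x1, y1, x2, y2 = roi
    height, width = y2 - y1, x2 - x1

    def bin_edges(i, length):
        start = (i * length) // out_size            # floor of the bin start
        end = -(-((i + 1) * length) // out_size)    # ceiling of the bin end
        return start, end

    pooled = []
    for i in range(out_size):
        r0, r1 = bin_edges(i, height)
        row = []
        for j in range(out_size):
            c0, c1 = bin_edges(j, width)
            row.append(max(feature_map[y1 + r][x1 + c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 single-channel feature map.
fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(roi_max_pool(fmap, (0, 0, 4, 4)))  # -> [[6, 8], [14, 16]]
print(roi_max_pool(fmap, (1, 1, 4, 4)))  # -> [[11, 12], [15, 16]]
```

Both regions, although of different sizes, produce a 2x2 output, which is what allows the later fully connected layers to accept proposals of any shape.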
V. RESULTS
The YOLOV5 model is trained and tested on the images, and the final result is a
bounding box around each detected object together with the corresponding confidence
score.
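Detectors of this kind usually produce several overlapping candidate boxes per object, which are reduced to the final boxes by confidence filtering and non-maximum suppression (NMS). The paper does not describe this post-processing step, so the following is only a minimal sketch of the common technique. The thresholds `conf_thres=0.25` and `iou_thres=0.45`, the tuple layout `(box, score, label)`, and the sample detections are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(detections, conf_thres=0.25, iou_thres=0.45):
    """Keep the highest-scoring box among same-class boxes that overlap."""
    candidates = sorted((d for d in detections if d[1] >= conf_thres),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score, label in candidates:
        # Keep the box unless a kept box of the same class overlaps it strongly.
        if all(label != k[2] or iou(box, k[0]) < iou_thres for k in kept):
            kept.append((box, score, label))
    return kept


# Hypothetical raw detections: (box, confidence score, class label).
raw = [((10, 10, 50, 50), 0.90, "airplane"),
       ((12, 12, 52, 52), 0.60, "airplane"),      # overlaps the first box
       ((100, 100, 140, 140), 0.80, "playground"),
       ((200, 200, 220, 220), 0.10, "airplane")]  # below the threshold
final = nms(raw)
print([(label, score) for _, score, label in final])
# -> [('airplane', 0.9), ('playground', 0.8)]
```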
A. Performance Measures:
To analyze the YOLOV5 and Faster RCNN models, the Mean Average Precision (mAP) and
accuracy are used to evaluate how well each model works.
AP is defined as
$$AP = \int_{0}^{1} P(R)\,dR \qquad \text{(Eq. 5.1)}$$
mAP is defined as
$$mAP = \frac{1}{N_{cls}} \sum_{i=1}^{N_{cls}} AP_i \qquad \text{(Eq. 5.2)}$$
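Eq. 5.1 and Eq. 5.2 can be evaluated numerically from a finite set of measurements. The sketch below reads P(R) as precision as a function of recall and N_cls as the number of classes, and approximates the integral with a simple step-wise sum over the recall increments. Evaluation toolkits often use interpolated variants instead, so this is one possible convention and not necessarily the one used in the paper. All numbers are hypothetical and are not results from the paper.

```python
def average_precision(recalls, precisions):
    """Approximate AP = integral of P(R) dR with a step-wise (rectangle) sum.

    recalls must be sorted in increasing order; precisions[i] is the
    precision measured at recalls[i].
    """
    ap = 0.0
    prev_recall = 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p   # width of the recall step times precision
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP = (1 / N_cls) * sum of the per-class AP values (Eq. 5.2)."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical precision-recall points for one class.
ap_airplane = average_precision([0.25, 0.5, 0.75, 1.0], [1.0, 0.8, 0.6, 0.4])
# Hypothetical per-class AP values for two classes.
m_ap = mean_average_precision({"airplane": ap_airplane, "farmland": 0.5})
print(round(ap_airplane, 3), round(m_ap, 3))
```

A higher AP means that precision stays high as recall increases, and the mAP averages this behaviour over all classes.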
Fig 3 shows that the YOLOV5 model achieved a mean average precision (mAP) of 0.75,
which indicates that the model is able to detect objects with a high level of
accuracy.
Fig 4 shows the Faster RCNN result. The model achieved an accuracy of 70%.
The performance of deep learning models for remote sensing object detection has been
analyzed. Two classification algorithms were trained, which achieved accuracies of 83%
and 62%. For object detection, YOLOV5 and Faster RCNN were used, which achieved a
mean average precision (mAP) of 75% and 70%, respectively. The results presented in
the tables below indicate the superior performance of YOLOV5 over Faster RCNN.
MultiScale Object Detection in Remote Sensing Images using Deep Learning 17
Fig 5: Detection results of YOLOV5 on aerial images dataset. (a) Residential Area.
(b) Farmlands. (c) Aeroplanes. (d) Forest Area. (e) Trees and Storage Tanks.
(f) Playground and Trees.
VI. CONCLUSION
The proposed system uses a customised dataset to train object detection algorithms,
namely Faster Regions with Convolutional Neural Networks (Faster RCNN) and You Only
Look Once (YOLOV5), and also examines the efficiency of deep learning techniques such
as ResNet101 and ZFNet for Remote Sensing Object Detection. The system detects objects
at different scales in aerial images using the above mentioned algorithms. The
ResNet101 feature extraction technique achieved a higher accuracy of 88%. The Mean
Average Precision obtained by YOLOV5 is 75% and by Faster RCNN is 70%, so YOLOV5 has
given better results than Faster RCNN. Future enhancements of the project would be the
effective detection of small objects and the use of transfer learning for the
algorithms.
REFERENCES
[1] G. Cheng, J. Han, P. Zhou, and L. Guo, “Multi-class geospatial object detection
and geographic image classification based on collection of part detectors,” ISPRS
J. Photogramm. Remote Sens., vol. 98, pp. 119–132, Dec. 2014.
[2] G. Ganci, A. Cappello, G. Bilotta, and C. Del Negro, “How the variety of satellite
remote sensing data over volcanoes can assist hazard monitoring efforts: The
2011 eruption of Nabro volcano,” Remote Sens. Environ., vol. 236, Jan. 2020,
Art. no. 111426.
[3] J. Ding et al., “Object detection in aerial images: A large-scale benchmark and
challenges,” IEEE Trans. Pattern Anal. Mach. Intell., early access, Oct. 6, 2021,
doi: 10.1109/TPAMI.2021.3117983.
[4] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in Proc. IEEE
Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 6517–6525.
[5] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” 2018,
arXiv:1804.02767.
[6] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once:
Unified, real-time object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit. (CVPR), Jun. 2016, pp. 779–788.
[7] K. Li, G. Wan, G. Cheng, L. Meng, and J. Han, “Object detection in optical
remote sensing images: A survey and a new benchmark,” ISPRS J. Photogramm.
Remote Sens., vol. 159, pp. 296–307, Jan. 2020.
[8] Q. Wang, J. Gao, and Y. Yuan, “A joint convolutional neural networks and
context transfer for street scenes labeling,” IEEE Trans. Intell. Transp. Syst., vol.
19, no. 5, pp. 1457–1470, May 2018.
[9] Q. Wang, J. Gao, and Y. Yuan, “Embedding structured contour and location prior
in siamesed fully convolutional networks for road detection,” IEEE Trans. Intell.
Transp. Syst., vol. 19, no. 1, pp. 230–241, Jan. 2017.
[10] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object
detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach.
Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[11] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal loss for dense object
detection,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2999–3007.
[12] T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature
pyramid networks for object detection,” in Proc. IEEE Conf. Comput. Vis.
Pattern Recognit. (CVPR), Jul. 2017, pp. 2117–2125.
[13] T. Kong, F. Sun, C. Tan, H. Liu, and W. Huang, “Deep feature pyramid
reconfiguration for object detection,” in Proc. Eur. Conf. Comput. Vis. (ECCV),
Sep. 2018, pp. 169–185.
[14] W. Xie, J. Lei, S. Fang, Y. Li, X. Jia, and M. Li, “Dual feature extraction network
for hyperspectral image analysis,” Pattern Recognit., vol. 118, Apr. 2021, Art.
no. 107992.
[15] W. Xie, J. Lei, Y. Cui, Y. Li, and Q. Du, “Hyperspectral pansharpening with deep
priors,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 5, pp. 1529–1543,
May 2020.
[16] W. Xie, X. Zhang, Y. Li, J. Lei, J. Li, and Q. Du, “Weakly supervised low-rank
representation for hyperspectral anomaly detection,” IEEE Trans. Cybern., vol.
51, no. 8, pp. 3889–3900, Aug. 2021.