A Data-Related Patch Proposal For Semantic Segmentation of Aerial Images
Abstract— Large-size images cannot be fed directly into a GPU for training and must be cropped into patches because of GPU memory limitations. The commonly used cropping methods are random cropping and sequential cropping, which are crude and highly inefficient. First, the categories of datasets are often imbalanced, and simple cropping misses an excellent opportunity to balance the data distribution. Second, training requires cropping a large number of patches to cover all patterns, which greatly increases the training time. This problem has serious practical consequences but is often overlooked by previous works. The optimal solution is to generate valuable patches. Here, value means value to network training, i.e., the contribution of a patch to the convergence of the network and to the improvement of accuracy. To this end, we propose a data-related patch proposal strategy to sample highly valuable patches. The core idea is to score each patch according to the accuracy of each category, so as to perform balanced sampling. Compared with random or sequential cropping, our method improves segmentation accuracy and greatly accelerates training. Moreover, our method also shows clear advantages over loss-based balancing approaches. Experiments on Deepglobe and Potsdam demonstrate the effectiveness of our method.
Index Terms— Large-size images, patch proposal, semantic segmentation.
Manuscript received 24 February 2023; revised 30 May 2023; accepted 4 July 2023. Date of publication 25 October 2023; date of current version 8 November 2023. This work was supported by the Science and Technology Innovation (STI) 2030-Major Projects under Grant 2021ZD0201404. (Corresponding author: Zhepeng Wang.)
Lianlei Shan and Guiqin Zhao are with the School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China.
Jun Xie and Zhepeng Wang are with Lenovo Research, Beijing 100085, China.
Peirui Cheng and Xiaobin Li are with the Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China.
Digital Object Identifier 10.1109/LGRS.2023.3327390
I. INTRODUCTION
WITH the advancement of photography and sensor technologies, more and more large-size (high-resolution) images are available, and the demand for efficient and effective processing of such images in remote sensing analysis has increased. However, due to GPU memory limitations, large-size images need to be cropped into patches during training [1]. The two main cropping approaches are random cropping and sequential cropping, which only execute the "crop" operation and apply no further filtering. These rough operations miss a great opportunity to balance the training data. Moreover, random and sequential sampling are blind (they cannot make different decisions based on the data distribution), so a large number of patches must be cropped to cover all the patterns, which greatly increases the training time and lowers efficiency. Our work is motivated by these two problems: imbalance and inefficiency.
A. Imbalance
Most segmentation datasets are imbalanced [2], [3]. In Deepglobe [2], “agriculture” accounts for 56.76%, while “water” accounts for only 3.74%. In ISPRS Potsdam, the largest category, impervious surface, is 32 times larger than the smallest class. There are two strategies to address the imbalance problem: resampling and reweighting. At present, most attention is paid to the latter, with works such as focal loss [4], the gradient harmonizing mechanism (GHM) [5], and online hard example mining (OHEM) [6]. The main idea of these works is to reweight the losses calculated from different samples. However, compared with indirect loss-based reweighting methods, resampling directly changes the data distribution and is therefore simple and effective. Random or sequential cropping cannot seize this opportunity to adjust the data, whereas our method is data-related and can therefore balance the data distribution easily.
B. Inefficiency
Because the patches must cover all kinds of textures and colors in the original image, the number of cropped patches is often very large. Without enough training patches, the results of the same network drop considerably. Even state-of-the-art networks such as HRNet with object-contextual representations (OCRs) [7] cannot avoid this decline, indicating that improving the network architecture cannot ease or solve the problem. Therefore, filtering out recurring patches (those with low training value) is an effective way to improve efficiency.
According to the above analysis, the key to solving the problem is to generate patches with high training value. High training value means that the cropped patches allow the network to reach competitive accuracy on all categories and to converge in the shortest possible time. Specifically, the proposed patches cover all the patterns, with no redundant or repeated patches. Moreover, the patch proposal strategy should place specific emphasis on difficult cases.
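The requirements above (favor categories the network handles poorly, and cover the patterns without redundant or repeated patches) can be illustrated with a small proposal routine. The sketch below is not the letter's own scoring rule, whose method section lies outside this excerpt. It assumes, hypothetically, that each class weight is 1 minus that class's current accuracy, that a candidate window's score is the mean class weight over its label pixels, and that redundant windows are removed with greedy non-maximum suppression (NMS), the filter the experiments mention. The helper names `class_weights`, `patch_score`, `window_iou`, and `propose_patches`, and the parameters `stride` and `iou_thr`, are illustrative inventions; `k` plays the role of the patch count K.

```python
from typing import List, Sequence, Tuple

# A label map is a 2-D grid of integer class indices (row-major).
LabelMap = List[List[int]]
# A proposal is (score, top, left) for a square window of side `size`.
Proposal = Tuple[float, int, int]


def class_weights(per_class_acc: Sequence[float]) -> List[float]:
    """Hypothetical weighting: a class the network segments poorly gets a large weight."""
    return [1.0 - acc for acc in per_class_acc]


def patch_score(label: LabelMap, top: int, left: int, size: int,
                weights: Sequence[float]) -> float:
    """Mean class weight over the label pixels inside one square window."""
    total = 0.0
    for r in range(top, top + size):
        for c in range(left, left + size):
            total += weights[label[r][c]]
    return total / (size * size)


def window_iou(a: Tuple[int, int], b: Tuple[int, int], size: int) -> float:
    """Intersection over union of two equal-size square windows given by (top, left)."""
    inter_h = max(0, size - abs(a[0] - b[0]))
    inter_w = max(0, size - abs(a[1] - b[1]))
    inter = inter_h * inter_w
    return inter / (2 * size * size - inter)


def propose_patches(label: LabelMap, size: int, stride: int,
                    weights: Sequence[float], k: int,
                    iou_thr: float = 0.5) -> List[Proposal]:
    """Score every sliding window, then keep up to k windows with greedy NMS."""
    height, width = len(label), len(label[0])
    candidates: List[Proposal] = [
        (patch_score(label, top, left, size, weights), top, left)
        for top in range(0, height - size + 1, stride)
        for left in range(0, width - size + 1, stride)
    ]
    # Highest score first; the sort is stable, so ties keep raster-scan order.
    candidates.sort(key=lambda p: p[0], reverse=True)
    kept: List[Proposal] = []
    for score, top, left in candidates:
        # Reject a window that overlaps any already kept window too much.
        if all(window_iou((top, left), (kt, kl), size) < iou_thr
               for _, kt, kl in kept):
            kept.append((score, top, left))
            if len(kept) == k:
                break
    return kept


if __name__ == "__main__":
    # Toy 8x8 map: class 0 everywhere, plus a 2x2 block of rare class 1 at the top right.
    label = [[0] * 8 for _ in range(8)]
    for r in (0, 1):
        for c in (6, 7):
            label[r][c] = 1
    weights = class_weights([0.95, 0.40])  # hypothetical per-class accuracies
    for score, top, left in propose_patches(label, size=4, stride=2,
                                            weights=weights, k=3, iou_thr=0.3):
        print(f"top={top} left={left} score={score:.4f}")
```

In the toy map, the only window containing the rare class (top=0, left=4) is ranked first. With `iou_thr=0.3`, the window at (0, 2) is then rejected because its IoU with both kept windows is 1/3, so the third proposal comes from (2, 2). Raising `k` past the number of windows that survive NMS adds nothing, a small-scale analogue of the saturation the experiments attribute to NMS.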
1558-0571 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: CHINA UNIVERSITY OF MINING AND TECHNOLOGY. Downloaded on April 16,2024 at 07:29:57 UTC from IEEE Xplore. Restrictions apply.
6011905 IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 20, 2023
SHAN et al.: DATA-RELATED PATCH PROPOSAL FOR SEMANTIC SEGMENTATION OF AERIAL IMAGES 6011905
TABLE I
RESULTS ON DIFFERENT SEGMENTATION NETWORKS
Fig. 5. High-resolution results. Cyan for urban, yellow for agriculture, purple for rangeland, green for forest, blue for water, white for barren land, and black for fog, which is a neglected category. Large-area segmentation errors occur in the results of previous work, whereas our results are very close to the labels.
TABLE III
EFFECTS OF DIFFERENT RATIOS OF λ1 AND λ2
TABLE II
RESULTS ON DIFFERENT SEGMENTATION NETWORKS ON THE DEEPGLOBE DATASET
TABLE IV
COMPARISON BETWEEN SAMPLE-BASED BALANCE AND LOSS-BASED BALANCE
The experimental results are shown in Table IV. The hyperparameters of focal loss, GHM, and OHEM are consistent with those in the original papers. It can be observed that, compared with loss reweighting, our method is more direct and yields clearer improvements.
3) Results With Different K (Number of Patches): After the score map is obtained, the number of high-scoring patches (K in Section III-C) is selected as a hyperparameter. The experimental results are shown in Table V. It can be observed that 100 is a threshold: when K exceeds 100, most of the newly added patch proposals are filtered out by NMS, so the further improvement is negligible.
TABLE V
EXPERIMENTAL RESULTS WITH DIFFERENT PATCH NUMBERS
V. CONCLUSION
In this letter, we propose a data-related patch proposal strategy that directly changes the data distribution to avoid the loss of accuracy caused by imbalance. In addition, filtering out patches that are meaningless for training accelerates network convergence and thus shortens training time. More importantly, the proposed method can be applied to any segmentation network and therefore has a wide range of applications. Finally, because of our sampling method, multiple categories are present in each patch, which makes it well suited to contrastive learning methods.
REFERENCES
[1] W. Chen, Z. Jiang, Z. Wang, K. Cui, and X. Qian, “Collaborative global-local networks for memory-efficient segmentation of ultra-high resolution images,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 8916–8925.
[2] I. Demir et al., “DeepGlobe 2018: A challenge to parse the earth through satellite images,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2018, pp. 172–17209.
[3] M. Cordts et al., “The cityscapes dataset for semantic urban scene understanding,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 3213–3223.
[4] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2999–3007.
[5] B. Li, Y. Liu, and X. Wang, “Gradient harmonized single-stage detector,” in Proc. AAAI Conf. Artif. Intell., vol. 33, 2019, pp. 8577–8584.
[6] A. Shrivastava, A. Gupta, and R. Girshick, “Training region-based object detectors with online hard example mining,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 761–769.
[7] Y. Yuan, X. Chen, X. Chen, and J. Wang, “Segmentation transformer: Object-contextual representations for semantic segmentation,” 2019, arXiv:1909.11065.
[8] Y. Li, J. Wu, and Q. Wu, “Classification of breast cancer histology images using multi-size and discriminative patches based on deep learning,” IEEE Access, vol. 7, pp. 21400–21408, 2019.
[9] F. Isensee, P. F. Jaeger, S. A. A. Kohl, J. Petersen, and K. H. Maier-Hein, “nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation,” Nature Methods, vol. 18, no. 2, pp. 203–211, Feb. 2021.
[10] H. Yang and K. Min, “A saliency-based patch sampling approach for deep artistic media recognition,” Electronics, vol. 10, no. 9, p. 1053, Apr. 2021.
[11] B. Zhou, Q. Cui, X.-S. Wei, and Z.-M. Chen, “BBN: Bilateral-branch network with cumulative learning for long-tailed visual recognition,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 9716–9725.
[12] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 91–99.
[13] A. Paszke et al., “Automatic differentiation in PyTorch,” in Proc. NIPS, 2017, pp. 1–4.
[14] K. Chen et al., “MMSegmentation: Open MMLAB segmentation toolbox and benchmark,” 2019, arXiv:1906.07155.
[15] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[16] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980.
[17] I. Loshchilov and F. Hutter, “SGDR: Stochastic gradient descent with warm restarts,” 2016, arXiv:1608.03983.
[18] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1026–1034.
[19] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder–decoder with atrous separable convolution for semantic image segmentation,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 801–818.
[20] Z. Sun, Z. Zhang, M. Chen, Z. Qian, M. Cao, and Y. Wen, “Improving the performance of automated rooftop extraction through geospatial stratified and optimized sampling,” Remote Sens., vol. 14, no. 19, p. 4961, Oct. 2022.
[21] Y. Yuan, X. Chen, and J. Wang, “Object-contextual representations for semantic segmentation,” in Computer Vision—ECCV. Glasgow, U.K.: Springer, 2020, pp. 173–190.
[22] E. Xie et al., “SegFormer: Simple and efficient design for semantic segmentation with transformers,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 34, Dec. 2021, pp. 12077–12090.
[23] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 6230–6239.
[24] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Cham, Switzerland: Springer, 2015, pp. 234–241.
[25] F. Milletari, N. Navab, and S.-A. Ahmadi, “V-Net: Fully convolutional neural networks for volumetric medical image segmentation,” in Proc. 4th Int. Conf. 3D Vis., Oct. 2016, pp. 565–571.