Extraction of Cropland Field Parcels With High Resolution Remote Sensing Using Multi-Task Learning


European Journal of Remote Sensing

ISSN: (Print) (Online) Journal homepage: www.tandfonline.com/journals/tejr20

Extraction of cropland field parcels with high resolution remote sensing using multi-task learning

Leilei Xu, Peng Yang, Juanjuan Yu, Fei Peng, Jia Xu, Shiran Song & Yongxing
Wu

To cite this article: Leilei Xu, Peng Yang, Juanjuan Yu, Fei Peng, Jia Xu, Shiran Song &
Yongxing Wu (2023) Extraction of cropland field parcels with high resolution remote
sensing using multi-task learning, European Journal of Remote Sensing, 56:1, 2181874, DOI:
10.1080/22797254.2023.2181874

To link to this article: https://doi.org/10.1080/22797254.2023.2181874

© 2023 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.

Published online: 15 Mar 2023.

EUROPEAN JOURNAL OF REMOTE SENSING
2023, VOL. 56, NO. 1, 2181874
https://doi.org/10.1080/22797254.2023.2181874

Extraction of cropland field parcels with high resolution remote sensing using
multi-task learning
Leilei Xuᵃ, Peng Yangᵃ, Juanjuan Yuᵃ, Fei Pengᵇ, Jia Xuᵃ, Shiran Songᶜ and Yongxing Wuᵈ
ᵃSchool of Earth Sciences and Engineering, Hohai University, Nanjing, China; ᵇInstitute of Atmospheric and Environmental Sciences, University of Edinburgh, Edinburgh, UK; ᶜState Key Laboratory of Desert and Oasis Ecology, Chinese Academy of Sciences, Xinjiang Institute of Ecology and Geography, Urumqi, China; ᵈUrban Renewal Technology Research Institute, SITRI, Suzhou, China

ABSTRACT
Parcel-level farmland information contains rich spatial distribution and boundary details, which is crucial for digital agriculture and agricultural resource surveys. However, the spatial complexity and heterogeneity of features resulting from high resolution makes it difficult to obtain parcel-level information quickly and accurately. In addition, existing methods do not sufficiently take into account the spatial topological information, particularly for blurred boundaries. Here, we develop a multi-task network model to extract plot-level cropland information. Specifically, the model consists of a cascaded multi-task network with integrated semantic and edge detection, a refinement network with fixed edge local connectivity, and an integrated fusion model. To validate the performance of the model, two typical tests were conducted in Denmark (Europe) and Chongqing (Asia) with high-resolution remote sensing images provided by Sentinel-2 (10 m) and Google Earth (0.53 m) as data sources. The results show that our proposed model outperforms other baseline models and exhibits higher performance. This study is expected to provide important support for the design of new global agricultural information management systems in the future.

ARTICLE HISTORY
Received 18 July 2022
Revised 24 November 2022
Accepted 14 February 2023

KEYWORDS
High-resolution image; semantic segmentation; edge detection and repair; cropland-parcel extraction; multi-task learning

Introduction
Timely and accurate parcel-level farmland information is important for policy-makers, agricultural information surveys and global food security (Fritz et al., 2015), and it depends largely on accurately quantifying the spatial distribution of agricultural land and individual parcel-level fields (Kuemmerle et al., 2009). Early information on field boundaries was mainly collected through manual interpretation, which was costly and inefficient. As more remotely sensed data have become available through open data policies, automated and semi-automated agricultural field extraction has gradually been adopted in modern agricultural information management (Garcia-Pedrero et al., 2019; Wagner & Oppelt, 2020; Waldner et al., 2015; Xiong et al., 2017; Yan & Roy, 2014).

In remotely sensed images, although agricultural boundaries form destructive connections between plots through ditches and roads, they are easily distinguished by different ground materials. However, boundaries that are adjacent to natural features are often ambiguous and highly heterogeneous in remotely sensed imagery. Moreover, the variety of global cropping patterns makes it difficult to extract field boundaries with a widely applicable method. Existing studies on this problem can be divided into handcrafted features-based (HFB) and convolutional neural network (CNN)-based methods. HFB methods mainly use object-inherent characteristics, such as topological and brightness characteristics, to aggregate homogenized regions. For complex agricultural boundaries, a growing snakes active contour model was usually employed to extract agricultural fields at the sub-pixel level. In addition, Hong et al. (2021) developed a parcel-level boundary extraction algorithm using different traditional methods, such as the Suzuki85 algorithm and Canny edge detection. However, this algorithm can only extract the boundaries of regularly arranged agricultural areas. For this issue, Torre and Radeva (2000) designed a region competition technique that integrates region growing and deformable models to segment agricultural fields, bypassing the extraction of boundaries. Belgiu and Csillik (2018) evaluated the performance of a time-weighted dynamic time warping (TWDTW) method that uses Sentinel-2 time series in pixel-based and object-based classifications of various crop types in three different case areas. Similarly, Graesser and Ramankutty (2017) proposed a combined method that consists of multi-spectral image edge extraction, multi-scale contrast limited adaptive histogram equalization, and adaptive thresholding, and used Landsat imagery to estimate cropland changes.

CONTACT Fei Peng [email protected] Institute of Atmospheric and Environmental Sciences, University of Edinburgh, Edinburgh, UK;
Yongxing Wu [email protected] Urban Renewal Technology Research Institute, SITRI, Suzhou, China
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

However, the above methods have two issues: they are sensitive to intra-plot variability, which can produce too many segments, and they rely heavily on a correct selection of parameters (e.g. the similarity measure used to group the pixels of the image), which requires prior knowledge of the scene or trial-and-error tuning (Meyer & Beucher, 1990; Mueller et al., 2004).

Fortunately, CNNs, which have been successfully applied to image classification (Cheng et al., 2018; He et al., 2016), semantic segmentation (Ronneberger et al., 2015), and object detection (Redmon & Farhadi, 2018), are promising for automatic agricultural field extraction from remote sensing images. A technique based on a fully convolutional network (FCN) (Long et al., 2015) and combinatorial grouping was proposed to delineate agricultural fields in smallholder farms (Persello et al., 2019). For mountainous areas that lack continuous optical images, W. Liu et al. (2020) applied a CNN to obtain an accurate image object distribution map from very high resolution (VHR) optical images and then adopted Long Short-Term Memory (LSTM) networks to identify farmland parcels in time-series synthetic aperture radar (SAR) data. Masoud et al. (2020) combined a dilated fully convolutional network and super-resolution mapping techniques to produce refined agricultural field boundaries. To further improve model performance, Garcia-Pedrero et al. (2019) proposed a methodology based on U-Net to delineate agricultural boundaries automatically, and achieved good results.

A neural network constructed for a single task, however, cannot be expected to perform well across different agri-landscapes. Multi-task learning (MTL) (Jou & Chang, 2016; Misra et al., 2016; Ruder, 2017; S. Liu et al., 2019) is a machine learning technique that solves multiple tasks simultaneously and overcomes this limitation to some extent. It uses information retrieved from related tasks to improve generalization and transfers knowledge as an inductive bias (Ruder, 2017). For example, a multi-task model has been applied to the extraction of building footprints (Bischke et al., 2019). Furthermore, some studies have introduced MTL into automatic road extraction from remote sensing images. A cascaded end-to-end model called CasNet (Cheng et al., 2017) was first proposed, in which the first network predicts the road segmentation map and the second network extracts the centerline based on the first one. RoadNet (Y. Liu et al., 2019) was subsequently developed as an updated three-task-branch network. The interrelated tasks of these two models can promote complementarity, and road segmentation provides an ideal initialization with a small amount of complex background for the other tasks. Although the two models achieve good results, they do not fully consider the spatial connectivity and structural characteristics of the road. To address these two issues, Wei et al. (2020) integrated a multi-point tracking method into a multi-stage framework, which forms a centerline graph based on the road segmentation prediction map and then fuses it with the segmentation map to obtain the final segmentation map and centerline. As a result, learning the road network topology indeed improves road connectivity. Further, Ding and Bruzzone (2021) proposed a direction-aware residual network that adopts multiple loss functions to supervise the learning of the road surface, the structure of the road, and the directionality of the road to optimize the road extraction results. Meanwhile, the MTL method has also been employed for farmland plot extraction. Waldner and Diakogiannis (2020) proposed a multi-task semantic segmentation model using ResUNet-a (Diakogiannis et al., 2020). In this model, the decoder is divided into three branches to predict the extent of the fields, their boundaries, and the distance to the closest boundary in satellite images. The generalization ability of this model is further improved by averaging the model predictions from time-series data and adjusting the parameters of the watershed post-processing method.

Although recent deep learning-based approaches have made remarkable improvements in cropland extraction, boundary information has not yet been extracted for more complex and fragmented farmland. Boundary detection models have been widely used in the field of remote sensing (Cheng et al., 2022; Lee et al., 2021). When an edge detection model such as that of H. Liu et al. (2019) is simply applied to extract farmland plots, unclosed edges containing isolated segments or missing information make it impossible to obtain real cropland parcels. The segmentation strategy proposed by Xia et al. (2018) includes two edge tasks and one semantic task to extract soft boundaries, hard boundaries and the cropland surface, and merges the three results to construct plots. However, the shortcoming of this strategy is that the semantic segmentation model and the edge detection model are trained independently. As a result, the strong correlation between the two tasks is not reflected in the training phase.

The problem to be addressed at this point is how to combine cropland features extracted by segmentation and edge detection models to realize their complementary strengths. The framework presented in this study adopts a cascaded CNN to extract cropland surfaces and edges simultaneously and to construct cropland parcels with better accuracy and completeness. Meanwhile, in the edge detection task, a point regression network is used to detect potentially connectable points with the local center, and the final break-point pair is confirmed through a series of post-processing steps for the broken edge line to

make up for its lack of topology information learning. This study is expected to provide technical support for agricultural information surveys and cropland management.

The rest of the paper is organized as follows. Section II elaborates on the overall framework of our plot extraction method, including the cascade network SGENet and the patch refinement model. Section III describes the specific conditions of farmland in the two case areas. Section IV presents ablation experiments to verify the effectiveness and generalization of our method on diverse data sets. Sections V and VI present the discussion and conclusion, respectively.

Methodology

Overall framework
Farmland plot extraction aims to classify the pixels of an image into two categories, farmland and background. To accurately classify heterogeneous farmland, we propose a hybrid framework that combines edge and semantic information obtained from a multi-task cascade network. We find that 1) semantic information can guide the extraction of edge information for plots; 2) implicit edge information, where edge extraction results are often disconnected, can be refined using local features; and 3) objects can be segmented more accurately by combining both kinds of information. These observations lead us to divide the final cropland extraction task into three stages, which are illustrated in Figure 1. First, the coarse semantic map (classification map) and imperfect boundaries are output sequentially by the cascaded network. Then, we use a patch-level network to optimize the connectivity (patches with the breakpoint phenomenon) of the edge results given by the previous model. Finally, the optimized edges (edge map) and coarse semantic results are merged through a voting process to produce fine segmentation maps (final prediction).

Multi-task learning model (MTL)
The cascade model architecture is presented in Figure 2(a). The proposed MTL model creates three tasks related to the problem of agricultural land boundary detection to enhance performance: semantic segmentation, Gaussian extraction, and edge detection, which correspond to SSNet, GENet, and EDNet, respectively. As shown in Figure 2(b), the function of SSNet is to simplify the complex cropland background and obtain the initial classification map (Figure 1). The semantic segmentation net (SSNet) is an encoder-decoder structure that takes the RGB channels of a 512 × 512 image as input I^{H×W×3}. ResNet-101 pretrained on ImageNet is used as the encoder network. Its five residual modules extract high-level and low-level feature maps through downsampling convolutional layers. In the decoder stage, the feature maps are upsampled and combined through skip connections with the encoder feature maps of the same level until they return to the size of the input image I. A final softmax layer outputs the probability of each pixel belonging to the background, surface and boundary classes. The three-channel feature map S^{H×W×3} represents the probability maps of the three categories. The background map is filtered out, and the remaining two feature maps S^{H×W×2} and the original image I^{H×W×3} are concatenated to generate the feature map SI^{H×W×5}, which is the input of the Gaussian extraction task.

Gaussian Extraction Net (GENet) uses the output SI of the previous task as input (Figure 2(a)). Its network structure is the same as that of the semantic task (Figure 2(b)); it differs from SSNet in its ground truth masks and output. The ground truth masks are produced by a distance transformation of the vector reference data of agricultural land boundaries. Inspired by Road Connectivity (Batra et al., 2019), the Gaussian distribution is equally divided into eight intervals. Each interval is used as a category, and the original

Figure 1. Flowchart of the proposed framework for plot extraction.



Figure 2. Cascade model.
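
As a minimal sketch of the data flow in the cascade, the following snippet only tracks tensor shapes (height, width, channels) as given in the text: SSNet turns the image I into three probability maps, the background map is dropped, and the concatenations SI and GI feed GENet and EDNet. The function name `cascade_shapes` and the dictionary keys are hypothetical, and no network is built.

```python
def cascade_shapes(height=512, width=512):
    """Track (H, W, C) shapes through the SSNet -> GENet -> EDNet cascade.

    Illustrative bookkeeping only; names and keys are assumptions.
    """
    image = (height, width, 3)                    # RGB input I
    s_full = (height, width, 3)                   # SSNet softmax: background, surface, boundary
    s_kept = (height, width, s_full[2] - 1)       # background map filtered out
    si = (height, width, s_kept[2] + image[2])    # S concatenated with I, input of GENet
    g_classes = (height, width, 8)                # GENet softmax over eight Gaussian intervals
    g = (height, width, 1)                        # 1 x 1 convolution reduces to one channel
    gi = (height, width, g[2] + image[2])         # G concatenated with I, input of EDNet
    e = (height, width, 2)                        # EDNet output: boundary and non-boundary
    return {"I": image, "S": s_kept, "SI": si,
            "G8": g_classes, "G": g, "GI": gi, "E": e}
```

Calling `cascade_shapes()` for the 512 × 512 chips used in the paper gives five channels for SI and four for GI, matching the dimensions stated in the text.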

regression process is converted into a classification task, which helps the multi-task loss converge (learning edge buffer information). It improves semantic segmentation accuracy at locations where the agricultural land boundaries are ambiguous (e.g. where the inside of the boundary contains ditches, bridges, roads, and other features). The GENet output is an eight-channel feature map after softmax activation. A 1 × 1 convolution is performed to reduce the number of channels to 1 so that the final features are merged. The final output G^{H×W×1} is concatenated with I^{H×W×3} to obtain GI^{H×W×4}, which is fed to the edge detection task (Figure 2(a)).

The function of the Edge Detection Net (EDNet) task is to produce the final geometric distribution of farmland edges, in combination with the semantic feature cues obtained from the above two tasks, as shown in Figure 2(c). It also uses ResNet-101 as the backbone to extract different feature layers. Since the edge information always comes from low-level feature maps, the stage 5 residual blocks of the backbone, which contain higher-level information, are discarded. The network structure of EDNet is a simplified version of the original HED (Xie & Tu, 2015). The output depth of each convolutional layer in stages 1 to 4 is halved, so the network has fewer parameters, which helps to avoid overfitting. A 1 × 1 − 1 conv layer follows each residual block of every stage. Upsampling and sigmoid activation layers are used jointly to generate side outputs at different scales in the four stages. The four side outputs are concatenated along the channel dimension and then passed through a 1 × 1 − 1 conv layer to obtain the final edge result E^{H×W×2}, which represents the intensity distribution map of boundary and non-boundary.

There are three different prediction results at each cascade stage, namely S, G and E, which are the predictions of SSNet, GENet, and EDNet, respectively (Figure 2). A joint loss function is used here to

supervise, using two segmentation losses and one edge loss, which are all cross-entropy losses.

Patch-level model for connectivity
A patch-level model is introduced to estimate the connectivity between adjacent breakpoints using local features. Given an image patch centered on a breakpoint, our proposed patch refinement network infers whether there are other locations at the patch border that are connected to the center point. The network architecture is shown in Figure 3. After the network predicts the possible connection orientations between the center and the border of the patch, the best combination of breakpoints is selected for breakpoint connection.

As shown in Figure 3, our patch refinement network is mainly composed of a preprocessing network and hourglass modules. The preprocessing network uses large convolution kernels for initial feature extraction. Then, two hourglass modules (Newell et al., 2016) are stacked together. The symmetric design allows for bottom-up processing (from high to low resolution) in the first half of the module, and top-down processing (from low to high resolution) in the second half. To capture the local information of the points on the edge line at different scales, the bottom-up and top-down structures conduct intermediate supervision to make optional predictions for each hourglass module. Meanwhile, intermediate supervision is re-conducted after the first module. As a result, the heatmap of the second module represents the final output of the network, which is used to extract the connectivity from the center of the image.

Connectivity is not a property of a single point but of pairs of pixels. The local network is designed to estimate the points on the border of a patch that are connected to a given input point. Therefore, given a patch, the positions of the input and output points need to be encoded and updated. Specifically, this study breaks the edge line into points in the design of the models, and each patch is cut out based on the position of a breakpoint; these breakpoints are then used as the center points of the patches. The label of each patch is the Gaussian distribution map generated by the breakpoints in the patch. The network is trained with a set of k × k pixel patches from the training set. As a result, the output is a heatmap that reflects the probability of each location being connected to the central point of the patch.

As shown in Eq. 1 and Figure 4, our post-processing method is mainly divided into two stages: locating the position of the breakpoints and matching the relevance of the breakpoints. This study first converts the predicted intensity map E^{H×W×2} into a binary image, extracts the skeleton line from the binary image, and slides a window along the skeleton line to obtain the breakpoints. Meanwhile, a 3 × 3 all-ones matrix (the matrix shown in Eq. 1) is used as a filter that traverses all pixels in the binarization map, centered on each pixel in turn, to detect the breakpoints of the binarization map. Whether each pixel is a breakpoint is determined by Eqs. 1 and 2:

$$
M_{(i,j)} =
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\cdot
\begin{bmatrix}
Pixel_{((i-1),(j-1))} & Pixel_{(i,(j-1))} & Pixel_{((i+1),(j-1))} \\
Pixel_{((i-1),j)} & Pixel_{(i,j)} & Pixel_{((i+1),j)} \\
Pixel_{((i-1),(j+1))} & Pixel_{(i,(j+1))} & Pixel_{((i+1),(j+1))}
\end{bmatrix},
\quad 0 < i < (w-2),\ 0 < j < (h-2)
\tag{1}
$$

where M_{(i,j)} is the dot-product sum of the 3 × 3 pixel matrix and the all-ones matrix at image position (i, j), Pixel_{(i,j)} is the pixel value at (i, j) after binarization, and w and h are the width and height of the predicted image.

$$
Bool_{breakpoint(i,j)} =
\begin{cases}
1, & \text{if } M_{(i,j)} = 2 \text{ and } Pixel_{(i,j)} = 1 \\
0, & \text{else}
\end{cases}
\tag{2}
$$

where Bool_{breakpoint(i,j)} determines whether the pixel (i, j) is a breakpoint or not.

The breakpoints of the agricultural land line are detected to obtain a set of breakpoints. Then, the slices with one of the breakpoints as the center are

Figure 3. The structure of the simplified HED model. Stages 1–4 indicate single-stream deep convolutional networks for learning
multi-scale features and different levels of visual perception, a 1 × 1 − 1 conv layer follows the last conv layer of each stage. Then,
the feature map is up-sampled with the up-sampling layer to obtain the corresponding lateral output; Fusion indicates all
upsampling layers are connected by a concatenated layer. Then a 1 × 1 − 1 conv layer is applied to fuse the feature maps obtained
from each side output. Finally, the output of the fusion loss is obtained through a loss layer. K×K in K×K-s represents the size of the
kernel, s in K×K-s represents the stride size and N is the multiple of upsampling.

Figure 4. Overview of the patch refinement network architecture.
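
A patch label of the kind described in the text, a Gaussian distribution map generated by the breakpoints inside a k × k patch, can be sketched as follows. The spread `sigma`, the use of the per-pixel maximum where several Gaussians overlap, and the function name are assumptions made for illustration; the paper does not state these details.

```python
import math

def gaussian_patch_label(k, points, sigma=2.0):
    """Return a k x k heatmap with a Gaussian peak at each (row, col) in points.

    sigma and the max-combination rule are illustrative assumptions.
    """
    heatmap = [[0.0] * k for _ in range(k)]
    for row in range(k):
        for col in range(k):
            for p_row, p_col in points:
                dist_sq = (row - p_row) ** 2 + (col - p_col) ** 2
                value = math.exp(-dist_sq / (2.0 * sigma ** 2))
                # Keep the strongest response where Gaussians overlap.
                heatmap[row][col] = max(heatmap[row][col], value)
    return heatmap
```

For example, `gaussian_patch_label(9, [(0, 4), (8, 2)])` produces a 9 × 9 label whose value is 1.0 at the two border points and decays with distance from them.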

generated. Next, the proposed patch refinement network predicts the connectivity from the center point of the sliced image to the surrounding points, as shown in the green circle in Figure 4a. The central point is connected to the breakpoint that is predicted to be connectable, as shown in Figure 4b. Subsequently, line segment matching is performed on the values with a certain angle tolerance, and the results are shown in Figure 4c. According to prior knowledge of agricultural fields in high-resolution remote sensing images, the field boundaries are obvious and contrast sharply with the texture of the surrounding environment on a local scale. In particular, in the scenario of this paper, the orientation of the agricultural field boundary is estimated only at the locations of breakpoint seeds based on a small patch, in which the edges are much clearer with less interference. Thus, the characteristics of the agricultural field boundary in these local areas are more abundant and clear, and the main orientation can be estimated more robustly. The regularity and periodicity of regular texture make it possible for texture primitives to embody some characteristics on the whole, such as orientation. Thus, the orientation of the texture is predicted by Canny edge detection to represent the possible orientation of the edges. Finally, the orientation of the broken line at the center point is estimated, and the line segment is retained for repair when the orientation angles are complementary. This integrated process is illustrated in Figure 4d. After the prediction is finished, the paired breakpoints are removed from the breakpoint set, and the above process is repeated until the breakpoint set is empty. This effectively ensures that the algorithm is iterative and convergent when performing detection and prediction for all breakpoints. In this way, we optimize the edge results of the cascaded network. Finally, the obtained edge maps (Figure 1) are prepared for voting fusion in the next stage.

Voting strategy
Considering spatial similarity and heterogeneity (Wu et al., 2013) (i.e. the same objects show similar characteristics in neighboring areas, and different features in the same area show high heterogeneity), farmland generally has uncertain texture features. For this reason, our study defines the visual features of cultivated land parcels as a combination of edges (boundary transformation) and textures (a mixture of artificial construction and natural growth, which is semantic information). The edges and textures are fused by the voting mechanism, which is illustrated in Figure 5.

It should be noted that the segmentation result only provides inaccurate pixel-wise output, and different kinds of pixels may be involved within the predicted edge. Therefore, this study utilizes a majority voting strategy (Ma et al., 2019) to obtain the final cropland parcels so that the various segmentation and boundary information are combined to obtain the best result. Let G denote the ground truth, and let $\{r^b_j\}_{j=i}^{N}$ denote the corresponding area of $\{r^b_j\}_{j=i}^{N}$ in G. There are M pixels in the parcel, i.e. $M = |r^b_j|$, and the category of the m-th pixel is represented as $l_m$, m = 1, ..., M. Then, the number of pixels in $r^b_j$ belonging to each category is

Figure 5. Details of the breakpoint connection process.
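
The breakpoint test of Eqs. 1 and 2 can be sketched in plain Python as below. The sketch assumes the skeleton has already been binarized into a list of rows of 0/1 values; the function name, the row-major indexing `binary[j][i]`, and the returned (i, j) tuples are illustrative choices, and the index ranges follow Eq. 1 as printed.

```python
def find_breakpoints(binary):
    """Return (i, j) positions that satisfy Eq. 2 on a binarized skeleton map.

    binary is a list of rows with 0/1 values, indexed as binary[j][i]
    (row j, column i). Ranges follow Eq. 1: 0 < i < w - 2, 0 < j < h - 2.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    breakpoints = []
    for j in range(1, height - 2):
        for i in range(1, width - 2):
            # Eq. 1: sum of the 3 x 3 neighborhood under the all-ones filter.
            m = sum(binary[j + dj][i + di]
                    for dj in (-1, 0, 1) for di in (-1, 0, 1))
            # Eq. 2: an edge pixel with exactly one edge neighbor ends a line.
            if m == 2 and binary[j][i] == 1:
                breakpoints.append((i, j))
    return breakpoints
```

On a straight horizontal segment, only its two end pixels have a neighborhood sum of 2, so only they are reported; interior pixels sum to 3.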

counted, and the most frequent class is selected as the category of $r_j$. This process is represented as:

$$
\underset{r = \{1,\ldots,k\}}{\arg\max} \sum_{m=1}^{M} \mathrm{sign}(l_m = r)
\tag{3}
$$

where sign(·) denotes an indicator function, with sign(true) = 1 and sign(false) = 0, and r is a possible class. The voting mechanism is applied within all predicted boundaries to obtain the parcels. In the multi-task learning model, the semantic results S^{H×W×2} are merged with the optimized edge results. This paper superimposes the semantic segmentation map and the refined edge results for analysis: within a single contour, if half of the pixels inside the boundary belong to the farmland plot category, the contour is classified as a plot; otherwise, it is classified as background.

Accuracy evaluation
In this study, the completeness, correctness, and quality introduced by Wiedemann et al. (1998) are used to evaluate the performance of the line extraction algorithm. Completeness (Comp) is a variant of recall, which indicates the percentage of the reference line that lies within a buffer of width ρ around the extracted lines. Correctness (Corr) is a variant of precision, which is the percentage of the extracted lines that lie within a buffer of width ρ around the reference lines. Quality (Qual) is an overall metric that combines Comp and Corr. These evaluation metrics are defined as follows:

$$
Comp = \frac{\text{length of matched reference}}{\text{length of reference}}
\tag{4}
$$

$$
Corr = \frac{\text{length of matched extraction}}{\text{length of extraction}}
\tag{5}
$$

$$
Qual = \frac{\text{length of matched extraction}}{\text{length of extraction} + \text{length of unmatched reference}}
\tag{6}
$$

The intersection over union (IOU) metric, also known as the Jaccard index, evaluates the overlap between reference and classified objects. In this study, it is adopted to describe the agricultural land boundary surface divided by the total amount of pixels that describe the agricultural land boundary in both images (refer to Section III). The calculation formula of IOU is:

$$
IOU = \frac{\text{prediction} \cap \text{ground truth}}{\text{prediction} \cup \text{ground truth}}
\tag{7}
$$

Study areas and dataset

Denmark
The first study area is located in central Denmark (Figure 6, left), which follows the typical European agricultural pattern. It is defined by the extent of the Sentinel-2 satellite image mosaic. The corresponding digitized agricultural parcels are used as the training set, validation set and test set. The mosaic is made up of two cloud-free Top of Atmosphere (TOA) Sentinel-2 images with a spatial resolution of 10 m, captured at the beginning of the growing period in May 2016. Agricultural field polygons are obtained from the 2016 Denmark “Marker” dataset (https://collections.eurodatacube.com/denmark-lpis/), which is part of the European Union Land Parcel Identification System (LPIS) initiative. The dataset contains nearly 600,000 field parcel polygons for Denmark, and each polygon is assigned a unique identification number, one of 293 crop classes and the field area. A total of 249,298 parcels are contained in this study area, of which 159,042 with a mean area of 52,451 m² are used in this study (some classes are removed during preprocessing). The parcel classes provided by the original dataset are reclassified into five classes, namely spring cereal, winter cereal, maize, grass fields, and other fields. Parcels of classes not directly related to agricultural usage, e.g. natural or permanent environments, forests, recreational areas or wastelands, are removed.

As shown in Figure 7a, there are heterogeneous geographic phenomena in the farmland areas, and the shape and size of the farmland are different. The special cultivated method results in strips in

Figure 6. Fusion process of edge results and semantic results based on voting mechanism. a indicates the original cropland field;
b indicates superposition of outputs from edge detection stage and semantic segmentation stage; c indicates the final cropland
parcels using the voting mechanism.
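
The majority vote of Eq. 3 and the two-class rule applied inside each closed contour can be sketched as follows. The function names are hypothetical, the input is assumed to be a non-empty list of per-pixel class labels for one region, and reading "half of the pixels" as "at least half" is an assumption, since the text does not say how an exact tie is resolved.

```python
from collections import Counter

def vote_parcel_class(pixel_labels):
    """Eq. 3: return the most frequent class among the pixels of one region.

    Assumes pixel_labels is non-empty.
    """
    counts = Counter(pixel_labels)
    return counts.most_common(1)[0][0]

def is_farmland_parcel(pixel_labels, farmland=1):
    """Two-class rule: a contour is a plot if at least half of its pixels
    are farmland (the tie handling is an assumption)."""
    farmland_count = sum(1 for label in pixel_labels if label == farmland)
    return 2 * farmland_count >= len(pixel_labels)
```

For a contour whose pixels are labeled `[1, 1, 0, 1, 0]`, the vote returns class 1 and the contour is kept as a farmland plot.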

Figure 7. The study area (left) and the detailed views (right) showing the geographical distribution of the training set, validation
set, and test set.
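
The accuracy measures of Eqs. 4 to 7 reduce to a few ratios once the matched lengths are known. In the sketch below, the buffer matching with width ρ is assumed to have been done beforehand, the unmatched reference length is taken as the reference length minus its matched part, and IOU is computed on sets of pixel coordinates; the function names and argument layout are illustrative.

```python
def line_metrics(matched_reference, reference, matched_extraction, extraction):
    """Completeness, correctness and quality (Eqs. 4-6) from line lengths.

    All arguments are lengths in the same unit; buffer matching is assumed
    to have been performed already.
    """
    comp = matched_reference / reference
    corr = matched_extraction / extraction
    # Reference length that no extracted line accounts for.
    unmatched_reference = reference - matched_reference
    qual = matched_extraction / (extraction + unmatched_reference)
    return comp, corr, qual

def iou(prediction, ground_truth):
    """Eq. 7 for two collections of pixel coordinates."""
    prediction, ground_truth = set(prediction), set(ground_truth)
    union = prediction | ground_truth
    return len(prediction & ground_truth) / len(union) if union else 0.0
```

With 80 of 100 reference units matched and 90 of 120 extracted units matched, the sketch gives Comp = 0.8, Corr = 0.75 and Qual = 90 / 140.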

the interior of the field objects (Figure 7b), which can easily be confused with the actual farmland edge during interpretation. In the case of borders with forests and grasses or land parcels, the natural disruptions partition the locations where a change of crop type is not significant (Figure 7c). Due to

such local variations, it is difficult to identify several field objects accurately.

This paper cuts the land area of the Sentinel-2 image mosaic into 1100 chips of 512 × 512 pixels. The agricultural field dataset is intersected with the same grid and transformed into the pixel coordinates of the 512 × 512 pixel chips as ground truth. Due to the large difference in the shape and size of the farmland in remote sensing images, the cropped model input data may not cover the narrow and long cultivated land in the image during model training and inference. Thus, this paper performs overlapping-window cropping on the original and labeled images to generate the training and test data. Considering that the training data must contain as many different examples as possible to avoid over-fitting the generated model, data augmentations, such as flipping and rotation, are applied. As shown in Figure 8, each piece of the original image data will generate different labels as the ground truth of the training set. This is because our cascaded network is divided into multiple tasks, which causes our model to generate different forms of ground truth corresponding to different tasks during training to backpropagate the loss function calculation.

Chongqing

The second area in Chongqing, China, is selected as an experimental case, which follows the typical Asian agricultural pattern. Unlike European ones, this cultivated land is slender and narrow and has serious fragmentation. It is a representative area of the hilly area in southwest China.

As shown in Figure 9, the cultivated land in this area has the following characteristics: (1) Regular cultivated land is relatively neat in spatial distribution, and it has clear edges and relatively uniform internal texture (Figure 9a); (2) the narrow and long cultivated land has clear boundaries and uniform internal texture (Figure 9b); (3) the edge characteristics of the cultivated land are relatively fuzzy and have different shapes, which can be told by the texture characteristic of the crop fields inside the plot (Figure 9c).

The satellite images of the study area were obtained from the Google Earth Platform (https://earth.google.com/), with RGB bands and a spatial resolution of 0.53 m. The southwestern region is cloudy and rainy, and most of the images have a high cloud cover, making it difficult to obtain single-period cloudless

Figure 8. Example 512 × 512 image chips with different agricultural fields. a is a long and narrow farmland plot; b is a farmland
plot with a complex planting structure, and c is a farmland plot with fuzzy boundaries.
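
Chips such as those in Figure 8 result from overlapping-window cropping of the image mosaic. The following Python sketch shows one way the window positions could be computed. The stride of 384 pixels and the function names `window_origins` and `chip_boxes` are illustrative assumptions, since the overlap used in the paper is not stated in this section.

```python
from typing import List, Tuple


def window_origins(length: int, chip: int = 512, stride: int = 384) -> List[int]:
    """Start offsets of overlapping windows covering [0, length) along one axis.

    The stride (assumed here, smaller than the chip size) controls the overlap.
    """
    if length <= chip:
        # A single window; the caller would need to pad the image if length < chip.
        return [0]
    origins = list(range(0, length - chip + 1, stride))
    # Add a last window flush with the far edge so that no pixels are left uncovered.
    if origins[-1] != length - chip:
        origins.append(length - chip)
    return origins


def chip_boxes(height: int, width: int, chip: int = 512,
               stride: int = 384) -> List[Tuple[int, int, int, int]]:
    """Return (row, col, row_end, col_end) boxes for every overlapping chip."""
    return [
        (r, c, r + chip, c + chip)
        for r in window_origins(height, chip, stride)
        for c in window_origins(width, chip, stride)
    ]
```

Applying the same boxes to the image and to its label raster keeps the two aligned, so that narrow parcels cut by one window edge are still fully contained in a neighbouring window.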

Figure 9. Example of data input feeding the cascade network model. a, original image; b, semantic mask; c, gaussian mask and d,
boundary mask.

image data. To address this issue, the images used in this experiment are cropped and mosaicked from multiple phases of images, and the time span is from March 2018 to April 2019. Generally, in high-resolution remote sensing images, cultivated land, forests, and grasses are easily confused. Therefore, the images selected in this experiment contain a large number of forests and grasses to verify the effectiveness of the proposed method.

The experimental data of cultivated land samples are collected, and the ArcGIS 10.2 software (https://www.esri.com/) is used for drawing samples. In this way, a total of 1000 image slices with a size of 512 × 512 pixels are obtained. This paper randomly selects 600 of the images as the training set, 200 as the validation set and 200 as the test set. Similarly, the images are flipped and rotated for data augmentation.

Experiment and results

To investigate the capabilities of the method proposed in this paper, two experiments were conducted in the study areas of Denmark and Chongqing. The experimental setup, results, and analysis are shown in the following sections.

Experiment setup and deployments

All network models were trained using the Adam optimizer. The learning rate was set to 0.001, and it was decayed by a factor of 0.1 when the performance stopped improving after two consecutive epochs. The models were trained with a batch size of 8, the largest number of images the available GPU could handle. The cascade model fully embodies the relationship between the area features and the line features of the farmland. To reflect how well the model learns representative features, the evaluation indicators, including both area indicators and line indicators, were computed on the test set. Existing studies suggest that transfer learning could accelerate network convergence and improve network performance (Oquab et al., 2014). In this study, all experiments were performed on clusters with strong processing power and memory, running on the Ubuntu operating system (Linux OS). More specifically, all models were trained on an NVIDIA GeForce RTX 3090 GPU paired with 24 GB memory. In addition, the PyTorch deep learning framework (https://pytorch.org/features/) was used in this study.

The performance of the multi-task cascade model

Our model was compared with two edge detection networks, DexiNed (Poma et al., 2020) and RCF (Y. Liu et al., 2017). Combined with the semantic segmentation network, the plots of cultivated land were given semantic information. To facilitate experimental comparison, U-Net was used as the semantic segmentation network, which was the same as the cascaded semantic segmentation network in this study. Referring to the literature (instance segmentation), MaskRCNN (He et al., 2017) and PointRend (Kirillov et al., 2020) were also selected for comparisons. Similarly, for cascaded networks, Bsinet is also used as a comparison model (Long et al., 2022).

It can be seen from Figures 10 and 11 that the proposed method can avoid false classification effectively and achieve an excellent extraction performance in remote sensing scenes, especially for complex landscape backgrounds with weak edges and narrow plots (Figure 10). With a single-task edge semantic segmentation model, the edge features of farmland are not captured well in complex scenes, while our cascade network can distinguish the inner and outer parts of the farmland well without the interference of the surrounding background. The corresponding final processed farmland extraction results are also significantly

Figure 10. The 512 × 512 image chips with different agricultural fields in Chongqing.

Figure 11. Comparison of the edge detection results by our cascaded network with other edge detection models. The first column
presents the original image; the second column presents the ground truth; the third column presents the segmentation map
predicted by the RCF model; the fourth column presents the edge detection result predicted by the DexiNed model; the fifth
column presents the boundary result predicted by our cascaded network.

better than the single-task model. It can also be seen from Figure 11 that our proposed method extracts the largest number of correct plots. The farmland plots extracted by our method have better visual performance, and the plot edges are more consistent with their real spatial distribution.

Table 1 and Table 2 provide detailed information about the comparisons of the line and area evaluation indicators, respectively. It can be seen from the quantitative results that the cultivated land extraction method proposed in this article is superior to the existing methods in terms of line and area evaluation indicators. Our method exceeds DexiNed by 11.31 and 11.72 percentage points in terms of IOU and the Qua line indicator, respectively. However, considering the complex internal structure characteristics of farmland at high resolution, the recognition ability of general segmentation models is not enough to deal with this kind of scene. The cultivated land extraction methods based on the literature (XIA) present a large number of false classifications of the edges of non-cultivated land plots.

Post-processing refinement

The farmland edge lines predicted by the above cascaded network may not be closed. At the end of our framework, some useful closed lines are filtered to vote the fused edge lines and semantic segmentation results. Meanwhile, to repair the connection of the farmland boundary, a locally optimized post-

Table 1. Comparison of different line indicators of different methods.

Model name                    Com (%)   Cor (%)   Qua (%)
RCF                           67.99     90.30     57.28
DexiNed                       73.43     94.92     64.35
MaskRcnn                      67.19     92.18     61.58
PointRend                     67.51     92.32     62.58
Bsinet                        83.00     86.30     73.06
Ours                          84.68     92.89     76.07
RCF + patch refinement        72.01     88.25     60.53
DexiNed + patch refinement    75.27     94.41     66.15
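
The line indicators reported in Table 1 follow Equations (4) to (6) together with buffer-based matching. The sketch below illustrates the computation under stated assumptions: boundaries are given as sets of (row, col) pixel coordinates, length is approximated by pixel count, and a pixel counts as matched when the other boundary has a pixel inside a square buffer of half-width `rho`. These choices and the names `within_buffer` and `line_metrics` are illustrative and may differ from the authors' implementation.

```python
def within_buffer(point, others, rho):
    """True if any pixel of `others` lies within a square buffer of half-width rho."""
    r, c = point
    return any(
        (r + dr, c + dc) in others
        for dr in range(-rho, rho + 1)
        for dc in range(-rho, rho + 1)
    )


def line_metrics(extracted, reference, rho=1):
    """Return (Comp, Corr, Qual) for two non-empty sets of boundary pixels."""
    extracted, reference = set(extracted), set(reference)
    matched_ext = sum(within_buffer(p, reference, rho) for p in extracted)
    matched_ref = sum(within_buffer(p, extracted, rho) for p in reference)
    unmatched_ref = len(reference) - matched_ref

    comp = matched_ref / len(reference)                    # Equation (4)
    corr = matched_ext / len(extracted)                    # Equation (5)
    qual = matched_ext / (len(extracted) + unmatched_ref)  # Equation (6)
    return comp, corr, qual
```

Increasing `rho` relaxes the matching constraint, so all three indicators can only stay equal or rise as the buffer widens.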

Table 2. Comparison of different area indicators of different methods.

Methods\metrics    IOU (%)   F1 (%)   Acc (%)
RCF                64.47     78.40    85.74
DexiNed            65.19     78.80    86.30
MaskRcnn           59.31     74.46    85.29
PointRend          58.89     74.13    85.20
Bsinet             68.91     81.59    88.49
Ours               76.50     86.69    90.74

processing method is adopted to further improve the farmland boundary extraction results.

To train the connectivity of the patch-level model, this paper randomly selected the slices with a size of 48 × 48 pixels from each image in the training set. The pixel center of all slices is at the farmland boundary. Figure 12 shows some results of applying the slice-level model of local connections. It can be seen that despite the complex planting structure of the plot and the fuzzy information on the edge of the plot, the points that our model can predict are basically located on the border of the farmland.

Case analysis

In the previous section, it has been proved that the multi-task model achieves better extraction results than the single-task method. In this section, it will be shown that our SGENet (S: semantic segmentation task, G: Gaussian segmentation task, E: edge detection task) is the optimal combination. Meanwhile, the qualitative and quantitative results will be analyzed and discussed.

The study area in Denmark

The visual results of the farmland boundary (Figure 13) indicate that the SGENet model can be more prominent in the areas where the weak boundary is not obvious. It also performs well in segmenting the farmland connecting with artificial features, which illustrates the necessity of the Gaussian segmentation task. The addition of this task improves the ability to extract different types of farmland boundaries, and the topological structure of the extracted farmland boundaries becomes clearer, which is conducive to the subsequent local optimization.

The indicator comparison is shown in Tables 3 and 4 (all index calculations are based on plot data, that is, the line and area of non-plots are not included). It can be seen from Table 3 that the three-task SGENet model performs 9.72% better than the two-task SENet model in terms of Qua. Also, the SGENet

Figure 12. The extraction results of the cultivated land plots. The five columns present the original image, the ground truth, the
instantiation result inferred by RCF, DexiNed, MaskRcnn, PointRend, and the final segmentation result obtained by our method
framework (from left to right).

Figure 13. Visual results of the patch refinement model for connectivity applied to the patches of the Denmark dataset. The red
dots represent the detection results.

Table 3. Comparison of line indicators.

Com (%)   Cor (%)   Qua (%)   Semantic   Gaussian   Edge   Patch refinement
71.02     94.50     61.76     √          -          √      -
79.77     94.24     71.48     √          √          √      -
84.68     92.89     76.07     √          √          √      √

Table 4. Comparison of area indicators.

IOU (%)   F1 (%)   Acc (%)   Semantic   Gaussian   Edge   Patch refinement
63.93     78.00    86.36     √          -          √      -
74.43     85.34    89.98     √          √          √      -
76.50     86.69    90.74     √          √          √      √
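
The area indicators in Table 4 can be computed from pixel-wise confusion counts between a predicted and a reference cropland mask. The sketch below assumes the standard binary definitions of IOU, F1, and accuracy (the formulas for F1 and Acc are not reproduced in this section), and `area_metrics` is a hypothetical helper name.

```python
def area_metrics(pred, truth):
    """Return (IOU, F1, Acc) for two binary masks given as nested lists of 0/1."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # cropland predicted and present
            elif p and not t:
                fp += 1      # cropland predicted but absent
            elif t:
                fn += 1      # cropland missed
            else:
                tn += 1      # background correctly rejected
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    acc = (tp + tn) / (tp + fp + fn + tn)
    return iou, f1, acc
```

Under these definitions F1 = 2·IOU / (1 + IOU), so an IOU of 76.50% corresponds to an F1 of about 86.69%, which agrees with the last row of Table 4.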

Figure 14. Ablation experiment results of two cascade methods. SENet refers to the two-task cascade model with semantic
segmentation and edge detection, and SGENet refers to the three-task cascade model with semantic segmentation, gaussian
extraction, and edge detection.

Figure 15. Farmland boundary phenomenon.

Figure 16. Comparison of plot extraction using the voting fusion strategy. The first row presents the ground truth; the second row
presents the fusion result of the semantic output and the edge output predicted by the SENet model; the third row presents the
fusion result by the SGENet model; the fourth row presents the fusion result by SGENet with patch refinement.
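
Figure 16 compares parcels obtained with the voting fusion strategy. As a rough illustration of how such a fusion could work, the sketch below treats every connected region enclosed by edge pixels as a candidate parcel and keeps it when at least half of its pixels are cropland in the semantic mask. The majority rule, the 0.5 threshold, the 4-connectivity, and the function name `vote_parcels` are assumptions made for illustration and are not taken from the paper.

```python
from collections import deque


def vote_parcels(edge, semantic, min_share=0.5):
    """Label regions separated by edge pixels that the semantic mask votes as cropland.

    edge, semantic: nested lists of 0/1 with the same shape.
    Returns a label map: 0 for background or edges, 1..n for accepted parcels.
    """
    h, w = len(edge), len(edge[0])
    labels = [[0] * w for _ in range(h)]   # 0 = unvisited, -1 = visited but rejected
    next_id = 1
    for r0 in range(h):
        for c0 in range(w):
            if edge[r0][c0] or labels[r0][c0] != 0:
                continue
            # Flood-fill one region of non-edge pixels (4-connectivity).
            region, queue = [], deque([(r0, c0)])
            labels[r0][c0] = -1
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < h and 0 <= nc < w
                            and not edge[nr][nc] and labels[nr][nc] == 0):
                        labels[nr][nc] = -1
                        queue.append((nr, nc))
            # Each pixel of the region casts one vote via the semantic mask.
            votes = sum(semantic[r][c] for r, c in region)
            if votes / len(region) >= min_share:
                for r, c in region:
                    labels[r][c] = next_id
                next_id += 1
    # Rejected regions are reported as background.
    return [[max(v, 0) for v in row] for row in labels]
```

A region only becomes a separate parcel if the edge pixels around it form a closed line, which is why unclosed boundaries lead to merged or missing plots in this kind of fusion.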

model performs 4.59 and 2.07 percentage points better than the three-task model without patch refinement in terms of Qua and IOU, respectively.

After our model was optimized, the accuracy of the farmland boundary line decreased a little in the line indicator. As shown in Figure 14, because the boundaries of some adjacent farmland plots are blurred, they were merged into the same plot. In model extraction, these adjacent farmland boundaries were extracted, especially after post-processing. As a result of the increase in the overall index, the accuracy decreased.

As can be seen from Figure 15, compared with the other ablation experiments, our final framework optimizes the edge detection results through the Gaussian segmentation task and local-repair post-processing methods. After the edge detection results are fused with the intermediate semantic output, parcel omission is considerably reduced.

Figure 16 shows the intermediate output of our cascade model, including the area probability map, the line probability map, and the final instance segmentation result map. From the binarized area probability map, the strong semantic information of the farmland can be observed, but the probability of the extracted farmland boundary information is small. In the line probability map, the farmland boundary is obvious, which complements the surface probability map. The final plot segmentation map is formed by combining the advantages of the two maps and adding a breakpoint connection method to make the broken line form closed contours. It can also be seen from the image map that instance segmentation can be achieved by rationally filtering out incorrect small patches from the process of combining surface elements and line elements and paying attention to the accuracy of farmland boundary extraction.

Figure 17 visualizes the cropland boundaries extracted by the SGENet model (green line) and SGENet + patch refinement (red line). We found that most of the farmland boundaries in the large remote sensing image area are extracted by our cascade network. However, some adjacent farmland borders are blurred, and the boundary extraction results with weaker features are intermittent and imperfect. Due to the inability to connect and close the edge lines wrapping farmland, some plots were missed. Our patch refinement strategy uses the local structured information of farmland to connect some unclosed edge lines and increases the number of correct plots

Figure 17. Field extraction examples in the study area of Denmark.

Figure 18. Extraction result of the study area.



Figure 19. Edge intensity graph. The four columns respectively represent the original image, the ground truth, the edge output
inferred by the DexiNed model and our SGENet model from left to right. (the edge strength predicted by the model is binarized by
the Otsu method to obtain the edge line).
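
The caption above notes that the predicted edge strength is binarized with the Otsu method. A minimal pure-Python sketch of Otsu's threshold selection is given below, assuming the edge strength has already been quantized to integers in the range 0 to 255. The quantization step and the function names `otsu_threshold` and `binarize` are assumptions for illustration.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Values <= t form the background class, values > t the edge class.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * n for i, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0     # pixel count of the background class so far
    sum0 = 0   # intensity sum of the background class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue          # no background pixels yet
        if w1 == 0:
            break             # no edge pixels left above t
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(values, threshold):
    """Mark values above the threshold as edge (1), the rest as background (0)."""
    return [1 if v > threshold else 0 for v in values]
```

Because the threshold is derived from the histogram of each prediction, no fixed cut-off on the edge strength has to be chosen by hand.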

to be extracted. Moreover, the results also show that our method achieves excellent extraction results in the overall field areas in Denmark (Figure 18).

The study area of Chongqing

In this hilly area, much of the farmland is fragmented into plots of different shapes, and many farmland plots are separated by mountain roads or surrounding forests and grasses. As shown in Figure 19, compared with the DexiNed model, our model has obvious advantages in the case of the narrow and long borders of cultivated land, the junction of forests and grasses, and cultivated land. SGENet can refine and highlight the borders of farmland under complex terrain, and the strength of the extracted borders is obvious.

The major boundaries in the study area of Chongqing are correctly detected (Figure 20) by our proposed method, while a few are missing because the closed line cannot be formed without patch refinement. The farmland landscapes in mountainous areas are mostly composed of farmland boundaries with complex structures, and many adjacent farmland boundaries are interdependent. Because the texture feature is not obvious, the strength of some boundary extraction is weak. Patch refinement can make up for this deficiency and enhance the detection of farmland contours (Figure 20, a2, b2, a3, b3), thus effectively improving the extraction of plots.

Our method also significantly outperforms DexiNed in the study area of Chongqing, and it provides more accurate results. For example, our method exceeds DexiNed by 31.55, 11.38, and 21.47 percentage points in Com, Qua, and IOU, respectively (Table 5). The value of the Cor indicator has declined, and this is because the farmland boundaries in mountainous areas are as ambiguous as those in the study area of Denmark and are not marked. With patch refinement, the indicator values improve, except for the Cor indicator.

Discussion

Monitoring of farmland plots is an important component in global digital agriculture practices. This study proposes a novel model to extract cropland parcels from high-resolution remote sensing images. The main results of this study suggest that our proposed SGENet model has a clear advantage in extracting plots of cropland fields, especially in areas with serious and complex fragmentation.

Datasets and their uncertainty

High-resolution remote sensing images with rich spectral information will undoubtedly help our models to learn more and make accurate judgments. As the data used in this study contain only RGB bands and do not involve multi-band satellite data, the recognition ability of the model may be affected. Nevertheless, as satellite data processing techniques

Figure 20. Comparative experimental results before and after boundary patch refinement in the study area of Chongqing. From
top to bottom, the first row presents the extracted results of the farmland plots in the entire study area. (red: SGENet, yellow:
SGENet+patch refinement); the following two rows present the partial extraction results; the blue boxes exhibit the area where
the broken line is particularly obvious.

Table 5. Comparisons of different evaluation indicators.

Methods\metrics                   Com (%)   Cor (%)   Qua (%)   IOU (%)   F1 (%)   Acc (%)
DexiNed                           57.83     86.49     53.52     59.03     74.24    82.14
Ours (without patch refinement)   84.27     72.99     63.84     76.23     86.51    89.86
Ours                              89.38     70.66     64.90     80.50     89.19    91.56

improve, future high-resolution satellite products will greatly reduce these uncertainties, and also the proposed model in this study will benefit and further expand the application scenarios spanning from RGB bands to multispectral satellite data in the future.

Considering the chromatic aberration of images at different periods, we used multi-period images for mosaicking to obtain cloud-free images, so there may be some minor mismatches between the reference data and the image data. The obvious errors were corrected

through manual inspection. Nevertheless, our model can still tolerate some small labeling errors in accuracy evaluation.

Topology information and its benefits to model

Due to the high resolution of the images used in this study, there is a large amount of heterogeneity and fragmentation in the landscape, especially in mountainous areas. Field boundaries are often indistinct or vaguely visible because the textural patterns are often less significant. In this case, parcels cannot be extracted well by a semantic segmentation model or an edge model alone. Compared with single-task semantic segmentation, our model achieves improved performance in cropland field extraction. In addition, compared with the instance segmentation network, our model also has significant advantages, especially in the IOU indicator of the final extracted plot. The main reason is that the parcel object at the border of the slice is incomplete, which affects the overall segmentation result of the instance segmentation model. The boundaries of farmland are connected, and this feature is more obvious in local areas. In our method design, based on the topology of the edge lines of the plot, intermittent farmland boundaries are connected using local features to repair more parcel results, so as to maximize the recall rate of the plot.

Limitations of edge detection task and optimized strategy

Our model is used in the case where edge features are trained under the guidance of semantic features. This effectively filters the boundaries of non-cultivated land. Since the farmland boundary may be composed of different feature materials, it is difficult to define the specific type of the unified farmland boundary. The edge detection task cannot learn to cover many types of complex edge information. The Gaussian extraction task is added in this study as the transition between the semantic task and the edge task. The structured location information of the Gaussian extraction task is beneficial for the learning of the edge detection task. In addition, the patch refinement model further strengthens the extraction of weak boundaries. Additionally, the attribute categories provided by the semantic task and the geometric information provided by the boundary detection task are merged to extract the final farmland parcels.

Breakpoint refinement strategy and its robustness

It should be noted that there are certain defects in the post-processing stage of our method. When inferring the correct connection direction, our refinement strategy cannot always repair the connection of the breakpoints on the farmland boundary (Figure 21). When there is only a single breakpoint in the local area, no matching point can be found. This kind of connection failure can only be compensated by adding logical judgments for such scenarios. Considering that such scenarios are rare, traversing all scenarios in a loop would greatly increase the computational cost. More intelligent point connectivity models require further exploration.

Model generalization ability in regional tests

Unlike the case in Denmark, the second study area, in Chongqing in southwest China, features small fields of irregular shapes, and most of the fields are cultivated with mixed crops. We reclassified the metadata to unify all agricultural land into the plots that need to be extracted. Despite the complexity of the task, the proposed deep cascade network obtained better performance than other methods in both cases. This at least indicates that our model is more robust to different farm geometries and crop types.

Figure 21. The performance variation of different boundary extraction methods with different buffer widths on the Denmark data
set.

Figure 22. Results of agricultural field extraction using our method in Germany.

To further verify the generalization ability of the model in other regions, we used a model trained on the Denmark Sentinel-2 imagery to directly predict the Germany Sentinel-2 image. As shown in Figure 22, our finding is that the transferability of our cascade model needs to be further improved. However, the refinement model can still significantly improve the detection rate of the plot, which reflects the good applicability of the post-processing method. Further, we used a model pre-trained on the source image data to directly predict new data from the downsampled Denmark Sentinel-2 imagery. It was found that the robustness of the model decreases as the spatial resolution downsampling factor increases (Figure 23). We observe that our model capability appears sensitive to changes in the spatial resolution of the data. Different regions, different spatial resolutions of images, or different sensors often lead to very different data distributions and feature representations for the same category. Such differences often induce CNNs to produce less satisfactory segmentation results (Farahani et al., 2021). Heterogeneous domain adaptation (Tasar et al., 2020; Yao et al., 2021) can solve the learning of such cross-domain samples with different probability distributions and feature representations (Dou et al., 2019; Pizzati et al., 2020; Voreiter et al., 2020; Vu et al., 2019; Yang & Soatto, 2020).

Impacts of model parameters on evaluation indicators

This paper introduces a buffer-based evaluation for the boundaries because it is difficult to directly compare the pixel difference between the extracted boundaries and the ground truth. Based on this, the influence of different buffer widths ρ on the performance of boundary extraction methods was investigated. Figure 24 shows the performance variation of different boundary extraction methods with buffer widths (1, 3, 5, and 7 pixels) on the Denmark data set. The results

Figure 23. Extraction of farmland in Denmark under different spatial resolutions.



Figure 24. The performance variation of different boundary extraction methods with different buffer widths on the Denmark data set.

indicate that as the buffer width grew, the extraction accuracy was significantly improved. Our method performed better than RCF and DexiNed on Qual, and it achieved a 20% improvement on Qual compared to RCF. Also, when the buffer width ρ changed from 5 to 7, the fusion performance slightly improved. Therefore, in this study, we set widths up to 7, which corresponds to a more relaxed constraint for the indicators.

Implications for agricultural info-survey systems

Global agricultural land is widely distributed across different eco-climatic zones, and planting patterns and crop types vary considerably. Findability, Accessibility, Interoperability, and Reusability (FAIR) are crucial for agricultural land parcel data. Remote sensing and Geographic Information Systems (GIS) are playing an increasingly important role in the management of agricultural parcel information (Kocur-Bera, 2020; Lin & Zhang, 2021). However, traditional parcel surveys were associated with high economic costs. The combination of high-resolution remote sensing data and intelligent algorithms would make it feasible to manage parcel information from regional to global scale easily and efficiently (e.g. convergence of multiple sources of satellite data, high performance cloud-computing platforms, online real-time data processing and improving the reliability of the model in the face of large amounts of data). Due to the advantages of low cost and global spatial and temporal consistency of open policy-based satellite data, the multitasking model we proposed is more promising for constructing global-scale agricultural parcel information, which addresses the weak interoperability of previous parcel databases obtained based on multi-source satellite data.

Concluding remarks

In this study, we summarize the shortcomings of previous studies, recognize the importance of spatially local topological information, local repair and enhanced Gaussian-based optimization strategies. We propose a multi-task cascade network model for cropland parcel extraction. This multitask model can automatically learn multi-scale and multi-level features and perform the overall learning through a specially designed network for complex cropping scenarios and multi-scale cropland parcels. To assess the robustness of the model, experiments were conducted in the study areas of Denmark and Chongqing, considering differences between global agricultural land characteristics. The results show that this model can extract accurate information about cropland parcels and achieve better consistency with the satellite datasets in terms of spatio-temporal characteristics. Our proposed model is expected to be integrated with practical agricultural observation systems, and thus to contribute to the development of a global agricultural information management platform.

Acknowledgments

We thank the Danish Agency for Agriculture for providing Danish Land Parcel Identification System (LPIS) data (https://collections.eurodatacube.com/denmark-lpis/). This study is funded by the Key Laboratory of Land Satellite Remote Sensing Application, Ministry of Natural Resources of the People’s Republic of China under grant KLSMNR-202106. We thank the anonymous reviewers for their valuable comments and suggestions.

Disclosure statement

The authors declared that they have no conflicts of interest to this work.

Funding

The work was supported by the Key Laboratory of Land Satellite Remote Sensing Application, Ministry of Natural Resources of the People’s Republic of China [KLSMNR-202106]. This work was funded by Suzhou Urban Renewal Research Institute.

Data availability statement

The code of our model is available from the link below (https://github.com/SonwYang/SLP-cropland-parcel-extraction), and the training dataset also is available from the authors upon reasonable request.

Authorship contribution statement

Leilei Xu, Fei Peng, Yongxing Wu, Peng Yang and Jia Xu: conceptualization and methodology; Leilei Xu, Peng Yang, Juanjuan Yu, Shiran Song and Hao Chen: visualization; Fei Peng, Leilei Xu: formal analysis; Leilei Xu, Peng Yang and Fei Peng: original draft; Yongxing Wu and Fei Peng: review and editing, supervision; All authors discussed the manuscript equally.

References

Batra, A., Singh, S., Pang, G., Basu, S., Jawahar, C., & Paluri, M. 2019. “Improved road connectivity by joint learning of orientation and segmentation.” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10385–10393. https://doi.org/10.1109/CVPR.2019.01063
Belgiu, M., & Csillik, O. (2018). Sentinel-2 cropland mapping using pixel-based and object-based time-weighted dynamic time warping analysis. Remote Sensing of Environment, 204, 509–523. https://doi.org/10.1016/j.rse.2017.10.005
Bischke, B., Helber, P., Folz, J., Borth, D., & Dengel, A. 2019. “Multi-task learning for segmentation of building footprints with deep neural networks.” in 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, pp. 1480–1484. https://doi.org/10.1109/ICIP.2019.8803050
Cheng, G., Wang, G., & Han, J. (2022). ISNet: Towards improving separability for remote sensing image change detection. IEEE Transactions on Geoscience and Remote Sensing, 60, 5623811. https://doi.org/10.1109/TGRS.2022.3174276
Cheng, G., Wang, Y., Xu, S., Wang, H., Xiang, S., & Pan, C. (2017). Automatic road detection and centerline extraction via cascaded end-to-end convolutional neural network. IEEE Transactions on Geoscience and Remote

Plug-and-play adversarial domain adaptation network at unpaired cross-modality cardiac segmentation. IEEE Access, 7, 99065–99076. https://doi.org/10.1109/ACCESS.2019.2929258
Farahani, A., Voghoei, S., Rasheed, K., & Arabnia, H. R. (2021). A brief review of domain adaptation. In R. Stahlbock, G.M. Weiss, M. Abou-Nasr, C.Y. Yang, H. R. Arabnia, & L. Deligiannidis (Eds.), Advances in Data Science and Information Engineering (pp. 877–894). Springer. https://doi.org/10.1007/978-3-030-71704-9_65
Fritz, S., See, L., McCallum, I., You, L., Bun, A., Moltchanova, E., Duerauer, M., Albrecht, F., Schill, C., Perger, C., Havlik, P., Mosnier, A., Thornton, P., Wood-Sichra, U., Herrero, M., Becker-Reshef, I., Justice, C., Hansen, M., Gong, P. . . . Obersteiner, M. (2015). Mapping global cropland and field size. Global Change Biology, 21(5), 1980–1992. https://doi.org/10.1111/gcb.12838
Garcia-Pedrero, A., Lillo-Saavedra, M., Rodriguez-Esparragon, D., & Gonzalo-Martin, C. (2019). Deep learning for automatic outlining agricultural parcels: Exploiting the land parcel identification system. IEEE Access, 7, 158223–158236. https://doi.org/10.1109/ACCESS.2019.2950371
Graesser, J., & Ramankutty, N. (2017). Detection of cropland field parcels from Landsat imagery. Remote Sensing of Environment, 201, 165–180. https://doi.org/10.1016/j.rse.2017.08.027
He, K., Gkioxari, G., Dollár, P., & Girshick, R. 2017. “Mask R-CNN.” in Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969. https://arxiv.org/abs/1703.06870
He, K., Zhang, X., Ren, S., & Sun, J. 2016. “Deep residual learning for image recognition.” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. https://doi.org/10.48550/arXiv.1512.03385
Hong, R., Park, J., Jang, S., Shin, H., Kim, H., & Song, I. (2021). Development of a parcel-level land boundary extraction algorithm for aerial imagery of regularly arranged agricultural areas. Remote Sensing, 13(6), 1167. https://doi.org/10.3390/rs13061167
Jou, B., & Chang, S. F. 2016. “Deep cross residual learning for multitask visual recognition.” in Proceedings of the 24th ACM international Conference on Multimedia, Amsterdam, The Netherlands, pp. 998–1007. https://doi.org/10.1145/2964284.2964309.
Sensing, 55(6), 3322–3337. https://doi.org/10.1109/TGRS. Kirillov, A., Wu, Y., He, K., & Girshick, R. 2020. “Pointrend:
2017.2669341 Image segmentation as rendering.” in Proceedings of the
Cheng, G., Yang, C., Yao, X., Guo, L., & Han, J. (2018). IEEE/CVF Conference on Computer Vision and Pattern
When deep learning meets metric learning: Remote sen­ Recognition, pp. 9799–9808. https://doi.org/10.1109/
sing image scene classification via learning discriminative CVPR42600.2020.00982
CNNs. IEEE Transactions on Geoscience and Remote Kocur-Bera, K. (2020). Understanding information about
Sensing, 56(5), 2811–2821. https://doi.org/10.1109/ agricultural land. An evaluation of the extent of data
TGRS.2017.2783902 modification in the land parcel identification system for
Diakogiannis, F. I., Waldner, F., Caccetta, P., & Wu, C. the needs of area-based payments–a case study. Land Use
(2020). ResUNet-a: A deep learning framework for Policy, 94, 104527. https://doi.org/10.1016/j.landusepol.
semantic segmentation of remotely sensed data. Isprs 2020.104527
Journal of Photogrammetry and Remote Sensing, 162, Kuemmerle, T., Hostert, P., St-Louis, V., & Radeloff, V. C.
94–114. https://doi.org/10.1016/j.isprsjprs.2020.01.013 (2009). Using image texture to map farmland field size:
Ding, L., & Bruzzone, L. (2021). Diresnet: Direction-aware A case study in Eastern Europe. Journal of Land Use
residual network for road extraction in vhr remote sen­ Science, 4(1–2), 85–107. https://doi.org/10.1080/
sing images. IEEE Transactions on Geoscience and Remote 17474230802648786
Sensing, 59(12), 10243–10254. https://doi.org/10.1109/ Lee, K., Kim, J. H., Lee, H., Park, J., Choi, J. P., &
TGRS.2020.3034011 Hwang, J. Y. (2021). Boundary-oriented binary building
Dou, Q., Ouyang, C., Chen, C., Chen, H., Glocker, B., segmentation model with two scheme learning for aerial
Zhuang, X., & Heng, P. A. (2019). PnP-AdaNet: images. IEEE Transactions on Geoscience and Remote
22 L. XU ET AL.

Sensing, 60, 5604517. https://doi.org/10.1109/TGRS.2021. imagery. Pattern Recognition, 37(8), 1619–1628. https://
3089623 doi.org/10.1016/j.patcog.2004.03.001
Lin, L., & Zhang, C. (2021). Land parcel identification. In Newell, A., Yang, K., & Deng, J. 2016. “Stacked hourglass
L. Di & B. Üstündağ (Eds.), Agro-Geoinformatics (pp. networks for human pose estimation.” in: B. Leibe,
163–174). Springer. https://doi.org/10.1007/978-3-030- J. Matas, N. Sebe, & M. Welling (Eds.), Computer
66387-2_9 Vision – European Conference on Computer Vision.
Liu, Y., Cheng, M. M., Hu, X., Wang, K., & Bai, X. 2017. Springer, Cham, pp. 483–499. https://doi.org/10.1007/
“Richer convolutional features for edge detection.” in 978-3-319-46484-8_29.
Proceedings of the IEEE Conference on Computer Oquab, M., Bottou, L., Laptev, I., & Sivic, J. 2014. “Learning
Vision and Pattern Recognition, pp. 3000–3009. https:// and transferring mid-level image representations using
doi.org/10.1109/TPAMI.2018.2878849 convolutional neural networks.” in Proceedings of the
Liu, S., Johns, E., & Davison, A. J. 2019. “End-to-end IEEE Conference on Computer Vision and Pattern
multi-task learning with attention.” in Proceedings of Recognition, pp. 1717–1724. https://doi.org/10.1109/
the IEEE/CVF Conference on Computer Vision and CVPR.2014.222
Pattern Recognition, pp. 1871–1880. https://doi.org/10. Persello, C., Tolpekin, V. A., Bergado, J. R., & de by, R. A.
1109/CVPR.2019.00197 (2019). Delineation of agricultural fields in smallholder
Liu, H., Luo, J., Sun, Y., Xia, L., Wu, W., Yang, H., Hu, X., & farms from satellite images using fully convolutional net­
Gao, L. 2019. “Contour-oriented cropland extraction works and combinatorial grouping. Remote Sensing of
from high resolution remote sensing imagery using richer Environment, 231, 111253. https://doi.org/10.1016/j.rse.
convolution features network.” in 2019 8th International 2019.111253
Conference on Agro-Geoinformatics (Agro- Pizzati, F., Charette, R. D., Zaccaria, M., & Cerri, P. 2020.
Geoinformatics), Istanbul, Turkey, pp. 1–6. https://doi. “Domain bridge for unpaired image-to-image translation
org/10.1109/Agro-Geoinformatics.2019.8820430. and unsupervised domain adaptation.” in Proceedings of
Liu, W., Wang, J., Luo, J., Wu, Z., Chen, J., Zhou, Y., Sun, Y., the IEEE/CVF Winter Conference on Applications of
Shen, Z., Xu, N., & Yang, Y. (2020). Farmland parcel Computer Vision, pp. 2990–2998. https://doi.org/10.
mapping in mountain areas using time-series SAR data 48550/arXiv.1910.10563
and VHR optical images. Remote Sensing, 12(22), 3733. Poma, X. S., Riba, E., & Sappa, A. 2020. “Dense extreme
https://doi.org/10.3390/rs12223733 inception network: Towards a robust cnn model for edge
detection.” in Proceedings of the IEEE/CVF Winter
Liu, Y., Yao, J., Lu, X., Xia, M., Wang, X., & Liu, Y. (2019).
Conference on Applications of Computer Vision, pp.
RoadNet: Learning to comprehensively analyze road net­
1923–1932. https://doi.org/10.48550/arXiv.2112.02250
works in complex urban scenes from high-resolution
Redmon, J., & Farhadi, A. (2018). Yolov3: An incremental
remotely sensed images. IEEE Transactions on
improvement. Computer Vision and Pattern Recognition,
Geoscience and Remote Sensing, 57(4), 2043–2056.
Accessed 12 5 2022. https://doi.org/10.48550/arXiv.1804.
https://doi.org/10.1109/tgrs.2018.2870871
02767
Long, J., Li, M., Wang, X., & Stein, A. (2022). Delineation of
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net:
agricultural fields using multi-task BsiNet from
Convolutional networks for biomedical image segmenta­
high-resolution satellite images. International Journal of
tion. In N. Navab, J. Hornegger, W. Wells, & A. Frangi
Applied Earth Observation and Geoinformation, 112,
(Eds.), Medical Image Computing and Computer-Assisted
102871. https://doi.org/10.1016/j.jag.2022.102871 Intervention – MICCAI 2015. MICCAI 2015 (pp.
Long, J., Shelhamer, E., & Darrell, T. 2015. “Fully convolu­ 234–241). Springer. https://doi.org/10.1007/978-3-319-
tional networks for semantic segmentation.” in 24574-4_28
Proceedings of the IEEE Conference on Computer Ruder, S. (2017). An overview of multi-task learning in deep
Vision and Pattern Recognition, pp. 3431–3440 https:// neural networks. https://doi.org/10.48550/arXiv.1706.
doi.org/10.48550/arXiv.1411.4038 05098
Ma, F., Gao, F., Sun, J., Zhou, H., & Hussain, A. (2019). Tasar, O., Tarabalka, Y., Giros, A., Alliez, P., & Clerc, S.
Attention graph convolution network for image segmen­ 2020. “StandardGAN: Multi-source domain adaptation
tation in big SAR imagery data. Remote Sensing, 11(21), for semantic segmentation of very high resolution satel­
2586. https://doi.org/10.3390/rs11212586 lite images by data standardization.” in Proceedings of the
Masoud, K. M., Persello, C., & Tolpekin, V. A. (2020). IEEE/CVF Conference on Computer Vision and Pattern
Delineation of agricultural field boundaries from Recognition Workshops, pp. 192–193. https://doi.org/10.
Sentinel-2 images using a novel super-resolution con­ 1109/CVPRW50498.2020.00104
tour detector based on fully convolutional networks. Torre, M., & Radeva, P. 2000. “Agricultural-field extraction
Remote Sensing, 12(1), 59. https://doi.org/10.3390/ on aerial images by region competition algorithm.” in
rs12010059 Proceedings 15th International Conference on Pattern
Meyer, F., & Beucher, S. (1990). Morphological Recognition. ICPR-2000, Barcelona, Spain, pp. 313–316.
segmentation. Journal of Visual Communication and https://doi.org/10.1109/ICPR.2000.905337.
Image Representation, 1(1), 21–46. https://doi.org/10. Voreiter, C., Burnel, J. C., Lassalle, P., Spigai, M.,
1016/1047-3203(90)90014-M Hugues, R., & Courty, N. 2020. “A cycle GAN approach
Misra, I., Shrivastava, A., Gupta, A., & Hebert, M. 2016. for heterogeneous domain adaptation in land use
“Cross-stitch networks for multi-task learning.” in classification.” in IGARSS 2020-2020 IEEE International
Proceedings of the IEEE Conference on Computer Geoscience and Remote Sensing Symposium, pp.
Vision and Pattern Recognition, pp. 3994–4003. https:// 1961–1964. https://doi.org/10.1109/IGARSS39084.2020.
doi.org/10.1109/CVPR.2016.433 9324264.
Mueller, M., Segl, K., & Kaufmann, H. (2004). Edge- and Vu, T. H., Jain, H., Bucher, M., Cord, M., & Pérez, P. 2019.
region-based segmentation technique for the extraction “ADVENT: Adversarial entropy minimization for
of large, man-made objects in high-resolution satellite domain adaptation in semantic segmentation.” in
EUROPEAN JOURNAL OF REMOTE SENSING 23

Proceedings of the IEEE/CVF Conference on Computer tion/287488637_On_combining_spectral_textural_


Vision and Pattern Recognition, pp. 2517–2526. https:// and_shape_features_for_remote_sensing_image_
doi.org/10.48550/arXiv.1811.12833 segmentation
Wagner, M. P., & Oppelt, N. (2020). Extracting agricultural Xia, L., Luo, J., Sun, Y., & Yang, H. 2018. “Deep extraction of
fields from remote sensing imagery using graph-based cropland parcels from very high-resolution remotely
growing contours. Remote Sensing, 12(7), 1205. https:// sensed imagery.” in 2018 7th International Conference
doi.org/10.3390/rs12071205 on Agro-geoinformatics (Agro-geoinformatics),
Waldner, F., Canto, G. S., & Defourny, P. (2015). Hangzhou, China, pp. 1–5. https://doi.org/10.1109/Agro-
Automated annual cropland mapping using Geoinformatics.2018.8476002.
knowledge-based temporal features. Isprs Journal of Xie, S., & Tu, Z. 2015. “Holistically-nested edge detection.”
Photogrammetry and Remote Sensing, 110, 1–13. https:// in Proceedings of the IEEE International Conference on
doi.org/10.1016/j.isprsjprs.2015.09.013 Computer Vision, pp. 1395–1403. https://doi.org/10.
Waldner, F., & Diakogiannis, F. I. (2020). Deep learning on 48550/arXiv.1504.06375
edge: Extracting field boundaries from satellite images Xiong, J., Thenkabail, P. S., Gumma, M. K., Teluguntla, P.,
with a convolutional neural network. Remote Sensing of Poehnelt, J., Congalton, R. G., Yadav, K., & Thau, D.
Environment, 245, 111741. https://doi.org/10.1016/j.rse. (2017). Automated cropland mapping of continental
2020.111741 Africa using google earth engine cloud computing. Isprs
Wei, Y., Zhang, K., & Ji, S. (2020). Simultaneous road sur­ Journal of Photogrammetry and Remote Sensing, 126,
face and centerline extraction from large-scale remote 225–244. https://doi.org/10.1016/j.isprsjprs.2017.01.019
sensing images using CNN-based segmentation and Yang, Y., & Soatto, S. 2020. “FDA: Fourier domain adapta­
tracing. IEEE Transactions on Geoscience and Remote tion for semantic segmentation.” In Proceedings of the
Sensing, 58(12), 8919–8931. https://doi.org/10.1109/tgrs. IEEE/CVF Conference on Computer Vision and Pattern
2020.2991733 Recognition, pp. 4085–4095. https://doi.org/10.1109/
Wiedemann, C., Heipke, C., Mayer, H., & Jamet, O. 1998. CVPR42600.2020.00414
“Empirical evaluation of automatically extracted road Yan, L., & Roy, D. P. (2014). Automated crop field extrac­
axes.” in: K. Bowyer & P.J. Phillips (Eds.), Empirical tion from multi-temporal web enabled Landsat data.
Evaluation Techniques in Computer Vision, IEEE Remote Sensing of Environment, 144, 42–64. https://doi.
Computer Society Press, Washington, DC, United org/10.1016/j.rse.2014.01.006
States, pp. 172–187. Yao, Y., Li, X., Zhang, Y., & Ye, Y. (2021). Multisource
Wu, Z., Hu, Z., Zhang, Q., & Cui, W. (2013). On comb­ heterogeneous domain adaptation with
ing spectral, textural and shape features for remote conditional weighting adversarial network. IEEE
sensing image segmentation. Acta Geod Cartogr Sin, Transactions on Neural Networks and Learning Systems,
42(111), 44–50. https://www.researchgate.net/publica 1–14. https://doi.org/10.1109/TNNLS.2021.3105868

You might also like