
Highlights

Weakly Supervised Framework Considering Multi-temporal Information for Large-scale Cropland Mapping with Satellite Imagery
Yuze Wang, Aoran Hu, Ji Qi, Yang Liu, Chao Tao

• A weakly-supervised framework based on SITS for large-scale cropland mapping.


• Encode intrinsic features to optimize the utilization of labels from GLC products.
• Experiments on three agricultural areas showed the advantage of the proposed method.
• Uncover the benefits of using multi-temporal information in cropland extraction.
• The method exhibits robustness in data-deficiency scenarios.


Weakly Supervised Framework Considering Multi-temporal Information for Large-scale Cropland Mapping with Satellite Imagery⋆
Yuze Wang a,1, Aoran Hu a, Ji Qi a, Yang Liu a,b and Chao Tao a,∗
a School of Geosciences and Info-Physics, Central South University, No. 932, Lushan Nan Road, Changsha, 410083, China
b The 27th Research Institute, China Electronic Technology Group Corporation, Zhengzhou, 450047, China

arXiv:2411.18475v1 [cs.CV] 27 Nov 2024

ARTICLE INFO

Keywords: Weakly supervised; Multi-temporal information; Large-scale cropland mapping

ABSTRACT

Accurately mapping large-scale cropland is crucial for agricultural production management and planning. Currently, the combination of remote sensing data and deep learning techniques has shown outstanding performance in cropland mapping. However, those approaches require massive precise labels, which are labor-intensive. To reduce the label cost, this study presented a weakly supervised framework considering multi-temporal information for large-scale cropland mapping. Specifically, we extract high-quality labels according to their consistency among global land cover (GLC) products to construct the supervised learning signal. On the one hand, to alleviate the over-fitting problem caused by the model’s over-trust of remaining errors in high-quality labels, we encode the similarity/aggregation of cropland in the visual/spatial domain to construct the unsupervised learning signal, and take it as the regularization term to constrain the supervised part. On the other hand, to sufficiently leverage the plentiful information in the samples without high-quality labels, we also incorporate the unsupervised learning signal in these samples, enriching the diversity of the feature space. After that, to capture the phenological features of croplands, we introduce dense satellite image time series (SITS) to extend the proposed framework in the temporal dimension. We also visualized the high-dimensional phenological features to uncover how multi-temporal information benefits cropland extraction, and assessed the method’s robustness under conditions of data scarcity. The proposed framework has been experimentally validated for strong adaptability across three study areas (Hunan Province, Southwest France, and Kansas) in large-scale cropland mapping, and the internal mechanism and temporal generalizability are also investigated. The source code is available at https://github.com/wangyuze-csu/WSFCMI.

1. Introduction
Over the past decades, remote sensing observation has played significant roles in large-scale cropland mapping
and monitoring (Huang et al., 2018; Defourny et al., 2019). By offering timely and comprehensive images of nearly
every part of the Earth’s surface (Chi et al., 2016), it served as a reliable information source for identifying cropland
spatial distribution on a large scale (Weiss et al., 2020). Moreover, it provides valuable support for various agricultural
applications, such as land-use planning (Yin et al., 2021), food security (Karthikeyan et al., 2020; Calvao and Pessoa,
2015), and sustainable agroecology (Prince, 2019).
With the accumulation of remote sensing data, several data-driven methods based on machine learning have been
widely applied to large-scale cropland mapping (Zanaga et al., 2022a; Do Nascimento Bendini et al., 2019; Pelletier
et al., 2016; Gong et al., 2013; Yue et al., 2013; North et al., 2019; Belgiu and Csillik, 2018; Xu
et al., 2017). The widely adopted approaches are employing traditional machine learning methods to analyze and
interpret the remote sensing (RS) images. These methods often rely on a combination of low-level and middle-level
visual features, such as texture (Yue et al., 2013), spectral (North et al., 2019), and shape (Wagner and Oppelt, 2020)
features. However, due to the influence of various factors on cropland, such as climate, geography, and topography,
the methods that rely on handcrafted features typically encounter issues of restricted generalization performance and
low accuracy (Nanni et al., 2017). In recent years, Deep Convolutional Neural Networks (DCNNs) (LeCun et al.,

∗ Corresponding author: Chao Tao, [email protected] (C. Tao). ORCID(s): 0000-0003-0071-310X (C. Tao)


2015) attracted great attention in cropland mapping (Singh et al., 2022; Brown et al., 2022a; Zhang et al., 2020; Sun
et al., 2019; Karra et al., 2021a; Persello et al., 2019), because they can extract high-level visual features that are more
representative and distinguishable. However, the performance of DCNNs shows a strong positive correlation with
the number and diversity of high-quality labeled samples, leading to high labeling costs (Zhu et al., 2017; Li et al.,
2019). Although various methods (Hua et al., 2021; Lenczner et al., 2022) designed for sparse labeling conditions can
significantly decrease the demand for labeling, the remaining label requirement in large-scale cropland mapping still
implies a considerable manual cost. So, reducing the labeling cost while maintaining cropland mapping accuracy is
still a great challenge.
Some methods use existing global land cover (GLC) products, such as GFSAD 30 (Oliphant et al., 2019), CCI-LC
(Copernicus Climate Change Service, 2019), FROMGLC (Yu et al., 2013), MCD12Q1 (Friedl and Sulla-Menashe,
2019), Esri (Karra et al., 2021a), ESA (Zanaga et al., 2022a), and DyWorld (Brown et al., 2022a), as references to train
the model at a low cost and obtain accurate cropland mapping results (Li et al., 2021; Zhu et al., 2016). Those methods
are known as Automatic Training Sample Generation (ATSG) (Liu et al., 2022). However, the labels obtained from
GLC products inevitably contain some errors, due to factors like the diversity of cropland scenes, imaging conditions,
and classifier performance. Directly using those labels may lead to instability and over-fitting in the model learning
process, causing low-quality cropland mapping. Therefore, identifying the errors of the labels generated from GLC
products and preventing their negative impacts on the model training process is one of the significant research topics
of ATSG methods.
According to how they post-process erroneous labels, ATSG methods based on existing GLC products can be
divided into two categories: discard and re-correct.
Discard methods establish quality criteria to rate the labels from the GLC products, and exclude low-quality labels
before training. Based on the MCD12Q1 product, Zhang and Roy (2017) utilized temporal invariance as a quality
criterion to discard the labels that have changes within three years, and excluded the labels with low classification
confidence according to the quality assessment layer provided in the product auxiliary information. With the continuous
release of GLC products, Li and Xu (2020) identified high-quality labels by considering their consistent performance
across four GLC products (GFSAD 30, CCI-LC, FROMGLC, and MCD12Q1), and discarded the outlier labels based
on the spectral distribution of all corresponding pixels. Considering the spectral mixing problem among vegetation
cover, Hermosilla et al. (2022) integrated the 3D information from LiDAR data into the quality assessment of the
labels from three products (NFI photo plot (Stinson et al., 2016), EOSD (Wulder et al., 2003) and NWS (Wulder et al.,
2018)). By doing so, they could identify and remove the low-quality labels that mismatch the vegetation height.
Re-correct methods construct filter strategies to obtain high-quality labels based on criteria such as classification
consistency across multiple products (Hermosilla et al., 2022) and spectral consistency among the same land covers (Li
and Xu, 2020). High-quality labels are then obtained and used as references to correct the remaining labels. Considering
the spatial continuity and texture consistency within the same land cover, Chen et al. (2023) took the sub-regions
obtained by Simple Non-Iterative Clustering (SNIC) as the correction unit, and reclassified each unit through voting
among filtered high-quality labels, thereby extending high-quality labels into regions with low-quality labels. Zhang
et al. (2023) considered the phenological attribute of vegetation cover, and used high-quality labels and multi-temporal
images to train the classifier to correct low-quality labels. Naboureh et al. (2023) selected pure pixels with high-quality
labels as reference, and rectified the remaining pixels containing low-quality labels according to their spectral distance
from the selected pixels.
The above methods still encountered difficulties in achieving high-precision cropland extraction, thus falling short
of meeting the localized needs of agricultural applications. The main reasons are as follows:

1) Insufficient use of samples with low-quality labels: Although discard methods can mitigate the model’s over-fitting
problem on incorrect labels, they also eliminate the samples with diverse features but low-quality labels before training.
Furthermore, Discard methods may lead to the imbalance of intra-class distribution for training samples (Wang
et al., 2022), so the model can only learn typical features and has limited generalization capability. For example, in
cropland mapping, these methods often constrain the trained classifier to only extract continuous and large block
plain croplands, making it challenging to recognize cropland in complex environments, such as those scattered
across hills and mountains.
2) Over-trust of samples with high-quality labels: Although re-correct methods allow diverse samples to participate
in the model training process, they potentially introduce substantial label noise, misleading the model during the
learning process. This occurs for two reasons. First, due to the inherent limitations of the products, the filtering


Table 1
The extent, climate, and main crops of the three study areas.
Study areas | Extent (km²) | Climate | Main crops
Hunan Province, China | 211,800 | Continental subtropical monsoonal humid climate | Rice, Rapeseed, Cotton, Tea
Southwest France | 195,910 | Temperate maritime climate | Spring wheat, Soybeans, Olives
Kansas State, USA | 213,096 | Temperate continental climate | Winter wheat, Corn

strategies struggle to completely eliminate all errors in the selected high-quality labels. Second, re-correct methods
use samples with high-quality labels as references to correct the samples with low-quality labels, which may further
amplify the remaining label errors.

In summary, discard methods can ensure the overall quality of training samples but may decrease the diversity of
the feature space due to the insufficient use of samples with low-quality labels. On the other hand, re-correct methods
effectively utilize samples with low-quality labels, but place excessive trust in high-quality labels, which can amplify
label errors and mislead the model optimization process.
To balance these two issues, we proposed a weakly-supervised framework that uses the labels from existing
GLC products for large-scale cropland mapping. Specifically, to avoid the model over-trusting high-quality labels,
we encoded the intrinsic feature distribution of the image to construct the unsupervised part of the learning signal,
which was then used to constrain the supervised part directly constructed by the high-quality labels. It can prompt
the model to assess and question the reliability of the supervised part of the learning signal, avoiding over-fitting the
remaining errors in high-quality labels. Meanwhile, we applied the unsupervised part of the learning signal to the
samples with low-quality labels to use the information contained to enhance the diversity of the model’s feature space.
Additionally, we enhanced this framework by extending it into the temporal dimension, allowing the model to fully
extract the phenological feature and change patterns of croplands from the high-density Satellite Image Time Series
(SITS). The contributions of this paper are as follows:
1) Given the high costs of large-scale cropland mapping, we proposed a weakly-supervised framework that leverages
existing GLC data without manual labels. It further alleviates the impact of residual errors in filtered high-quality
labeled samples, while effectively utilizing the plentiful information contained in low-quality labeled samples.
2) In the framework, we flexibly incorporated multi-temporal DCNNs with SITS to capture the phenological features
of croplands. We further visualized these high-dimensional features to uncover how multi-temporal information
enhances cropland extraction, and assessed the method’s robustness under conditions of data scarcity to validate
its practical applicability.
3) We conducted experiments in three study areas to test the feasibility and stability of our method for large-scale
cropland mapping. We also investigated the internal working mechanisms and temporal generalizability of the
proposed framework.
The remainder of this paper is organized as follows: Section 2 presents the materials and process employed, and
Section 3 describes the workflow of the proposed framework and the details of the applied network architecture. Section 4
and Section 5 present experimental results and discussion, and Section 6 summarizes and concludes the paper.

2. Materials and process


2.1. Study areas and satellite imagery
We chose three study areas across Asia, Europe, and North America, each representing distinct terrain landscapes,
climatic regions, and agriculture systems, including different crop types and phenological attributes. Collectively, these
regions encompass a vast area exceeding 620,806 km² (Figure 1). The general information of the study areas is as
follows (Table 1) :

• Hunan Province in China: Hunan province is situated in the central-southern part of China, covering a total area of
211,800 km². As one of China’s largest rice-planting bases, Hunan province ranks among the top ten in national
grain production. The province’s diverse terrain, ranging from semi-alpine to low mountains, hills, basins, and


Figure 1: The location and extent of the three study areas. The red regions are used to validate the accuracy of the
cropland mapping result.

plains, presents significant challenges for cropland monitoring. The area has a continental subtropical monsoonal
humid climate, offering abundant light, heat, and water resources, which is suitable for cultivating a variety of crops
including rice, rapeseed, cotton, and tea. The rice types grown here include double-season rice, medium-season rice,
and late-season rice, each with distinct sowing and harvesting periods from April to November.
• Southwest France: We chose the southwest part of France, encompassing about 195,910 km², which represents 2/5
of the entire country, as our study area. This region has several major grain-producing areas such as Aquitaine Basin
and Centre-Val de Loire. The landscape is dominated by basins and valleys. It has a temperate oceanic climate,
suitable for cultivating a variety of crops, such as spring wheat, soybeans, and olives. Spring wheat and soybeans
are typically sown in spring and harvested in late summer, while the olives are planted in early summer and harvested
in mid-autumn.
• Kansas State in the USA: Kansas state is located in the middle of the USA, covering a total area of 213,096 km². It
is a major grain-producing region in the United States, with the highest wheat production in the country. Kansas has
large plains, suitable for extensive mechanized agricultural activities. The area has a temperate continental climate,
favoring growing winter wheat and corn. Winter wheat is sown in mid-September and harvested in late June to early
July of the following year. Corn is sown in mid-April and harvested in mid-October.

We collected Sentinel-2A and Sentinel-2B Level-2A Bottom of Atmosphere reflectance images (S2 L2A) from
Google Cloud, encompassing the three study areas for the period from January 2020 to December 2020. For cropland
extraction, we selectively utilized the Blue, Green, Red, and Near-Infrared (NIR) bands (10 m spatial resolution), along
with Narrow NIR, Red Edge (RE), and Short-Wave Infrared (SWIR) bands (20 m spatial resolution). Furthermore, we
used the Quality Assurance (QA) bands provided by the ESA on Google Earth Engine (GEE) as a reference to select
the images with cloud coverage less than 20% (Amani et al., 2020). The QA bands were also used to identify cloudy
regions in the remaining images, which were filled using the cloud-free images from adjacent time phases. Finally, all
SITS were composed of monthly images, where each month’s image was obtained by averaging all available images
during the month.
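The cloud filtering and monthly compositing described above can be expressed compactly on GEE. The following Python sketch is illustrative only: the collection ID, band and property names, and the bounding-box geometry are assumptions based on the public Sentinel-2 L2A catalog, and the adjacent-phase gap filling mentioned above is omitted.

```python
# Hedged GEE (Python API) sketch of the monthly Sentinel-2 L2A compositing described above.
import ee

ee.Initialize()

aoi = ee.Geometry.Rectangle([108.8, 24.6, 114.3, 30.2])   # rough Hunan bounding box (assumed)
bands = ['B2', 'B3', 'B4', 'B8', 'B8A', 'B5', 'B11']      # Blue, Green, Red, NIR, Narrow NIR, RE, SWIR

def mask_clouds(img):
    # Mask cloudy pixels with the QA60 bitmask (bit 10: opaque clouds, bit 11: cirrus).
    qa = img.select('QA60')
    clear = qa.bitwiseAnd(1 << 10).eq(0).And(qa.bitwiseAnd(1 << 11).eq(0))
    return img.updateMask(clear)

s2 = (ee.ImageCollection('COPERNICUS/S2_SR')
      .filterBounds(aoi)
      .filterDate('2020-01-01', '2021-01-01')
      .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))  # keep scenes with <20% cloud cover
      .map(mask_clouds)
      .select(bands))

def monthly_mean(m):
    # Average all cloud-masked images acquired in month m of 2020.
    start = ee.Date.fromYMD(2020, ee.Number(m), 1)
    return s2.filterDate(start, start.advance(1, 'month')).mean().set('month', m)

monthly_sits = ee.ImageCollection.fromImages(ee.List.sequence(1, 12).map(monthly_mean))
```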


2.2. Global Cropland layer and Validation datasets


To generate training labels, we utilized the cropland layers from three GLC products with 10 m spatial resolution:
ESA World Cover, Esri Land Cover, and Dynamic World. First, we established a definition of ’cropland’ by uniformly
mapping from the related categories in various GLC products. Specifically, cropland is defined as fields that are covered
by annual crops that are sown or planted and are capable of being harvested at least once within the 12 months after
the date of sowing or planting. This type of annual cropland generates an herbaceous canopy, and may occasionally
be intermixed with trees or shrubby vegetation (Food and Agriculture Organization of the United Nations, 2005).
Secondly, we collected their data through the GEE platform, and projected their geographic coordinates to the World
Geodetic System 1984 (WGS 84), consistent with that of SITS. The details of the three GLC products are as follows:

• ESA World Cover (ESA) : It was developed as a part of the ESA WorldCover project, under the 5th Earth
Observation Envelope Program (EOEP-5) of the European Space Agency. The product was generated by SITS
from Sentinel-1 and Sentinel-2 with multiple random forest classifiers from 2020 to 2021. It contains 11 first-level
categories, and we used the ’Cultivated areas’ category from the 2020 product to obtain the labels. This category is
defined as land covered with annual cropland that is sowed/planted and harvestable at least once within the 12 months
after the sowing/planting date. The annual cropland produces an herbaceous cover and is sometimes combined with
some tree or woody vegetation (Zanaga et al., 2022b).
• Esri Land Cover (Esri) : It is a 10-m resolution map of the Earth’s land surface, published by the Environmental
Systems Research Institute. This map was annually generated from 2017 to 2022 using the composite Sentinel-2
satellite images by deep learning models. The product divides the land cover into 9 categories, and we used the
’cropland’ category from the 2020 product to obtain the labels. This category includes crops, human-planted cereals,
and other non-tree-height cultivated plants (Karra et al., 2021b).
• Dynamic World (DyWorld) : It was developed by Google and the World Resources Institute. It is a near real-time
global land use and cover dataset, updated in sync with the revisit cycle of the Sentinel-2 satellite. The product was
generated by deep learning models using available images from 2015 to 2023. We collated all available observations from
2020 and determined the land cover category of each pixel as the most frequent (mode) class across the year (a minimal
sketch of this compositing is given after this list). The product divides land cover into 9 categories, and we used the ’crop’
category to obtain the labels. This category is defined as human-planted/plotted cereals and other crops (Brown et al., 2022b).
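As a hedged illustration of the mode compositing referenced in the Dynamic World item above, the GEE sketch below reduces the 2020 ’label’ observations to their per-pixel mode and thresholds the crops class (index 4 in the public Dynamic World V1 catalog); the collection ID, class index, and geometry are assumptions.

```python
# Hedged GEE sketch: per-pixel mode of Dynamic World labels over 2020, then a binary crop layer.
import ee

ee.Initialize()

aoi = ee.Geometry.Rectangle([108.8, 24.6, 114.3, 30.2])    # illustrative study-area bounds

dw_2020 = (ee.ImageCollection('GOOGLE/DYNAMICWORLD/V1')
           .filterBounds(aoi)
           .filterDate('2020-01-01', '2021-01-01')
           .select('label'))

mode_label = dw_2020.reduce(ee.Reducer.mode())            # most frequent class per pixel over the year
dyworld_crop = mode_label.eq(4).rename('dyworld_crop')    # class 4 = 'crops' in Dynamic World V1
```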
To assess the accuracy and completeness of the cropland mapping obtained by the proposed method, we established
validation datasets for the three study areas. Random sampling alone may skew the validation datasets
toward the particular cropland types prevalent in each study area, compromising the comprehensiveness and
objectivity of the cropland extraction assessment in diverse environments. To address this issue, we introduced manual
intervention in the selection process for the validation area. Specifically, we classified the entire study area based on the
topographical features, and then utilized this result as a foundation to manually refine the sub-areas that were initially
selected through random sampling.
For the Hunan study area, which is predominantly covered by hills and mountains, we manually excluded certain
mountainous and hilly sub-areas and included sub-areas dominated by plains. For the Kansas study area, where
plains are the dominant landscape, we excluded some sub-areas of plains, and added sub-areas that include hills and
mountains. The validation labels of Hunan and Kansas study areas were obtained by visually interpreting the RGB
bands of Sentinel-2 time series images, assisted by high-resolution Google images. In the Southwest France study
area, we utilized the S4A crop classification datasets (Sykas et al., 2022) as a basis and modified them for validation
purposes. We initially checked all the labels, and selected sub-areas with relatively comprehensive cropland annotations
for further supplementation and adjustment. Subsequently, we manually selected the sub-areas to ensure a balance
among various types of cropland scenes. The sampling results of the final validation areas are shown in Figure 1.
Finally, in the Hunan study area, we labeled a total of 978,388 cropland fields, accounting for 18.00% of the
validation area. In the Southwest France study area, we identified 520,177 cropland fields, which covered 40.48% of
the validation area. In the Kansas study area, we identified 185,250 cropland fields, accounting for 48.24% of the
validation area. Note that, ’cropland fields’ refer to a piece of cropland separated from one another by identifiable
boundaries (FAO., 2010; Xu et al., 2024). Each image-level sample may include several cropland fields. Some typical
samples of each study area are shown in Figure 2.


Figure 2: Examples of croplands in the study areas.

3. Methodology
The proposed framework aims to effectively use the prior information from existing GLC products for large-scale
cropland mapping. By using this prior information, the model can learn the cropland phenology features from SITS
without manual labeling. As shown in Figure 3, the framework consists of three parts. (1) Labels collecting and
quality rating: We collect labels from GLC products, and evaluate their quality. These labels are then categorized
into high-quality and low-quality parts; (2) Construction of weakly supervised learning signal: we construct the
supervised part of the learning signal using high-quality labels, and encode the image intrinsic feature distribution to
construct the unsupervised part of the learning signal. By constructing the unsupervised part, we not only incorporate
the abundant information contained in the samples with low-quality labels into the model learning process, but also
prompt the model to assess and question the reliability of the high-quality labels. Additionally, we extend the weakly
supervised signal in the temporal dimension to sufficiently extract the phenology features of cropland. (3) Accuracy
assessment: we utilize the well-trained models for large-scale cropland mapping, and establish validation datasets to
evaluate their performance.

3.1. Labels collecting and quality rating


The cropland layers from GLC products are cross-referenced and quality-rated. Consistent parts are considered
high-quality labels, and divergent parts are considered low-quality labels. The quality of a given spatial position $(i, j)$
is determined by whether the results of the GLC products are the same. Finally, we obtained the high-quality sample mask
$M_{high}(i, j)$ and the high-quality labels $\mathcal{Y}(i, j)$:

$$M_{high}(i, j) = \begin{cases} 1, & \text{if } P_1(i, j) = P_2(i, j) = \dots = P_M(i, j) \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

$$\mathcal{Y}(i, j) = M_{high}(i, j) \cdot \frac{1}{M} \sum_{m=1}^{M} P_m(i, j) \tag{2}$$

where $P_m(i, j)$ is the cropland label of the $m$-th product at position $(i, j)$, and $M$ represents the total number of products.
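For clarity, the following NumPy sketch mirrors Eq. (1)-(2) for binary cropland layers; the stacked-array input format is an assumption about how the co-registered product rasters are loaded.

```python
# Minimal NumPy sketch of Eq. (1)-(2): the high-quality mask marks pixels where all products agree.
import numpy as np

def high_quality_labels(products: np.ndarray):
    """products: (M, H, W) binary cropland layers (1 = cropland, 0 = non-cropland)."""
    agree_crop = np.all(products == 1, axis=0)
    agree_noncrop = np.all(products == 0, axis=0)
    m_high = (agree_crop | agree_noncrop).astype(np.uint8)   # Eq. (1)
    y_high = m_high * products.mean(axis=0)                  # Eq. (2); 0 or 1 wherever the mask is 1
    return m_high, y_high

# Usage with three co-registered 10 m product layers (ESA, Esri, DyWorld):
# m_high, y_high = high_quality_labels(np.stack([esa, esri, dyworld]))
```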


Figure 3: The workflow of the proposed framework. (a) Labels collecting and quality rating, (b) Weakly supervised learning
signal construction, (c) Accuracy assessment.

3.2. The construction of weakly supervised learning signal


The weakly supervised learning signal consists of the supervised part and the unsupervised part. The supervised
part is constructed based on high-quality labels, and it can effectively guide the model to learn the cropland features
without manual labeling. The unsupervised part is constructed by encoding the visual similarity and spatial aggregation
of the same land cover based on the feature space extracted from images. It serves as the regularization term to avoid
over-fitting the remaining errors in high-quality labels. In addition, the unsupervised part integrates the abundant
information contained in the low-quality labeled samples to enhance the diversity of the model’s feature space in
the optimization process.
Given an image sequence $X$ of size $T \times C \times H \times W$, we use it as the input of the multi-temporal network to
obtain the prediction result $\mathcal{P}$, which is used to construct the supervised part of the learning signal. The intermediate
feature maps extracted by the multi-temporal network are fused to form the feature space $\mathcal{Z}$, which is used to construct
the unsupervised part of the learning signal. Both the supervised and unsupervised parts are employed for samples
with high-quality labels, but only the unsupervised part is employed for samples with low-quality labels. The specific
construction process is as follows:
We use the high-quality sample mask and prediction result to generate the masked prediction, which is combined
with the high-quality labels to construct the supervised part of the learning signal (Figure 4). Given a pixel $(i, j)$, with
$0 \le i < H$ and $0 \le j < W$, the masked prediction $M_{high}(i, j) \cdot \mathcal{P}(i, j)$ and the high-quality label $\mathcal{Y}(i, j)$ are used to
calculate the supervised cross-entropy loss $Loss_{SL}$ as follows:

$$Loss_{SL} = -\sum_{i=0}^{H}\sum_{j=0}^{W} \Big( \mathcal{Y}(i, j) \cdot \log\big(M_{high}(i, j) \cdot \mathcal{P}(i, j)\big) \Big) \tag{3}$$
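A minimal PyTorch sketch of this masked supervised term is given below; the shapes, variable names, and mean reduction over valid pixels are illustrative assumptions rather than the exact implementation.

```python
# Hedged sketch of Eq. (3): cross-entropy evaluated only where the high-quality mask is 1.
import torch
import torch.nn.functional as F

def supervised_loss(logits, y_high, m_high):
    """logits: (B, 2, H, W) network outputs; y_high: (B, H, W) integer labels;
    m_high: (B, H, W) binary high-quality mask."""
    per_pixel = F.cross_entropy(logits, y_high.long(), reduction='none')  # (B, H, W)
    masked = per_pixel * m_high.float()
    return masked.sum() / m_high.float().sum().clamp(min=1.0)
```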

Figure 4: The flowchart to construct the supervised part of the learning signal.

Figure 5: The flowchart to construct the unsupervised part of the learning signal.

Due to the quality limitation of the products, the high-quality labels may contain some errors. To avoid over-fitting
these errors, inspired by Hua et al. (2021) and Sabokrou et al. (2019), we construct the unsupervised part of the learning
signal as the regularization term to constrain the model. The construction is based on two assumptions: (1) In the visual
domain, two visually similar samples have a higher probability of belonging to the same semantical concept (Sabokrou
et al., 2019), which means these samples are adjacent in the model’s feature space. (2) In the spatial domain, the land
covers are continuous and aggregated (Jiang, 2015), which means the adjacent samples with the most similarities
should belong to the same category. The regularization term allows the model to enhance the stability of the feature
space during the optimization process, which can alleviate cognitive bias caused by the over-fitting of the remaining
errors.
To enhance the model’s generalization ability in large-scale cropland mapping, we also employ the unsupervised
part of the learning signal in the low-quality samples. Thus, the information from these samples can be involved in the
model optimization process, balancing the intra-class feature distribution of the training samples and improving the
diversity of the feature space.
In Figure 5, the feature space $\mathcal{Z}$ is represented by the fused intermediate feature maps obtained from the multi-temporal
network. Given a sample $x_n$, i.e., the $n$-th pixel of an image mapped into the high-dimensional feature space $\mathcal{Z}$,
where $N = W \times H$ and $0 \le n < N$, in the visual domain the samples $x^{s}_{n}$ and $x^{d}_{n}$ are identified as the samples
with the highest similarity and difference, respectively, to $x_n$; these are determined by searching within the same image
using the Sorensen-Dice index. We then encourage the feature distance between $x_n$ and $x^{s}_{n}$/$x^{d}_{n}$ to be as small/large
as possible during model optimization. In the spatial domain, given $x^{sn}_{n}$ as the sample with the highest similarity to
$x_n$ among its eight-neighborhood samples, we encourage the model to minimize their feature distance. Ultimately, the
constraints from both the visual and spatial domains are integrated to yield the unsupervised loss $Loss_{USL}$:

$$Loss_{USL} = \alpha \sum_{n=0}^{N} D_{KL}\big[\mathcal{Z}(x_n), \mathcal{Z}(x^{s}_{n})\big] \;-\; \beta \sum_{n=0}^{N} D_{KL}\big[\mathcal{Z}(x_n), \mathcal{Z}(x^{d}_{n})\big] \;+\; \gamma \sum_{n=0}^{N} D_{KL}\big[\mathcal{Z}(x_n), \mathcal{Z}(x^{sn}_{n})\big] \tag{4}$$


where $\alpha$, $\beta$, and $\gamma$ are a priori parameters that weight the importance of the different terms, and $D_{KL}$ denotes the
Kullback-Leibler divergence. Finally, the entire model is optimized by minimizing the weakly supervised loss $Loss_{WS}$ that
combines the aforementioned supervised and unsupervised parts:

$$Loss_{WS} = Loss_{SL} + Loss_{USL} \tag{5}$$
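The sketch below illustrates one hedged way to realize Eq. (4)-(5) in PyTorch, assuming the most similar, most dissimilar, and neighbouring sample indices have already been found (e.g., with a Sorensen-Dice style similarity); the softmax normalization of the features and the default weights are simplifying assumptions.

```python
# Hedged sketch of the unsupervised term in Eq. (4) over per-pixel feature vectors of one image.
import torch
import torch.nn.functional as F

def kl(p_log, q):
    # KL divergence between two softmax-normalized feature vectors, summed over channels.
    return F.kl_div(p_log, q, reduction='none').sum(-1)

def unsupervised_loss(z, idx_sim, idx_dif, idx_nbr, alpha=1.0, beta=1.0, gamma=1.0):
    """z: (N, C) fused per-pixel features; idx_*: (N,) indices of the most similar,
    most different, and most similar 8-neighbour samples (precomputed)."""
    q = F.softmax(z, dim=-1)            # treat features as distributions (simplification)
    log_q = F.log_softmax(z, dim=-1)
    loss = (alpha * kl(log_q, q[idx_sim])
            - beta * kl(log_q, q[idx_dif])
            + gamma * kl(log_q, q[idx_nbr]))
    return loss.mean()

# Loss_WS = supervised_loss(...) + unsupervised_loss(...)   # cf. Eq. (5)
```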

3.3. The structure of multi-temporal network


In this paper, considering the temporal features of cropland, we use a multi-temporal network for phenological
feature extraction to enhance the separability of cropland in the feature space. Note that the multi-temporal network
is interchangeable, allowing different network architectures to be employed or substituted as needed. We select U-Net
with Temporal Attention Encoder (U-TAE) (Fare Garnot and Landrieu, 2021) to extract multi-scale spatio-temporal
features, which enhances robustness to temporal anomalies and the ability to capture long-term dependencies in the
temporal features. The U-TAE contains three parts: a spatial encoder, a temporal encoder, and a spatial-temporal decoder.
In the spatial encoder, each image in the temporal sequence is embedded by a shared multi-level convolutional
spatial encoder $\mathbb{E}^l$. In Equation (6), the image sequence $X$ of size $T \times C \times H \times W$ is used as input, and the multi-scale
spatial feature sequence $e^{l}$ of the temporal images is obtained by continuous down-sampling with sliding convolutions at
each layer:

$$e^{l} = \begin{cases} X, & l = 0 \\ \big[\mathbb{E}^{l}(e^{l-1}_{t})\big]_{t=0}^{T}, & \text{for } l \in [1, L] \end{cases} \tag{6}$$

where $l$ is the layer index and the size of $e^{l}$ is $T \times C^{l} \times H^{l} \times W^{l}$.


In the temporal encoder, the Lightweight Temporal Attention Encoder (L-TAE) (Garnot and Landrieu, 2020) is used
for processing the temporal dimension, which helps the model capture long-term dependencies and adapt to the
dynamic temporal features of cropland. In this process, the model obtains the temporal attention $a^{l}$ of the image
sequence from the lowest-scale spatial features, and resizes it to $H^{l} \times W^{l}$ for compressing the sequences
of spatial features at the other scales. Eventually, the fused feature maps $f^{l}$ of size $C^{l} \times H^{l} \times W^{l}$ are obtained from the
multi-scale spatial features $e^{l}$ and the temporal attention $a^{l}$:

$$a^{l} = \begin{cases} \mathrm{LTAE}(e^{L}), & \text{if } l = L \\ \mathrm{resize}\big[\mathrm{LTAE}(e^{L})\big], & \text{for } l \in [0, L-1] \end{cases} \tag{7}$$

$$f^{l} = \mathrm{Conv}^{l}_{1\times1}\Big[\sum_{t=0}^{T} a^{l}_{t} \odot e^{l}_{t}\Big], \quad \text{for } l \in [0, L] \tag{8}$$

where $\mathrm{Conv}^{l}_{1\times1}$ is a shared $1 \times 1$ convolution layer of width $C^{l}$ and $\odot$ is the term-wise multiplication with channel
broadcasting.
In the spatial-temporal decoding part, a multi-level convolutional decoder $\mathbb{D}^{l}$ is used to generate the spatial-temporal
feature maps $d^{l}$ at different scales. In detail, the up-sampled decoder feature map (obtained by the transposed convolution
$\mathbb{D}^{l}_{up}$) is concatenated channel-wise with the compressed features $f^{l}$ to get the multi-scale spatial-temporal feature maps:

$$d^{l} = \begin{cases} f^{L}, & \text{if } l = L \\ \mathbb{D}^{l}\big(\big[\mathbb{D}^{l}_{up}(d^{l+1}),\ f^{l}\big]\big), & \text{for } l \in [0, L-1] \end{cases} \tag{9}$$

$$\mathcal{P} = \mathrm{softmax}\big(\mathrm{Conv}(d^{0})\big) \tag{10}$$

where $[\,\cdot\,]$ denotes the channel-wise concatenation. The finest feature map $d^{0}$, which has the same size as the original image
($H \times W$), is processed by the convolution layer $\mathrm{Conv}(\cdot)$ and the activation function $\mathrm{softmax}(\cdot)$ to get the predictions $\mathcal{P}$.
Ultimately, the predictions $\mathcal{P}$ and the multi-scale spatial-temporal feature maps $d^{l}$ are used as $\mathcal{Z}$ for constructing the
supervised and unsupervised parts of the learning signal, respectively.
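As a simplified stand-in for the temporal encoder above (not the actual U-TAE/L-TAE code, which is available from its authors), the sketch below shows attention-based pooling over the temporal axis that collapses a (T, C, H, W) feature sequence into a single (C, H, W) map, in the spirit of Eq. (8).

```python
# Minimal, hedged PyTorch sketch of attention-based temporal pooling.
import torch
import torch.nn as nn

class TemporalAttentionPool(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-pixel attention logits

    def forward(self, e):                       # e: (B, T, C, H, W)
        b, t, c, h, w = e.shape
        logits = self.score(e.reshape(b * t, c, h, w)).reshape(b, t, 1, h, w)
        a = torch.softmax(logits, dim=1)        # attention over the temporal axis
        return (a * e).sum(dim=1)               # (B, C, H, W), cf. Eq. (8)

# f = TemporalAttentionPool(channels=64)(torch.randn(2, 12, 64, 32, 32))
```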


3.4. Mapping and accuracy assessment


Given that the proposed method does not rely on manual annotations for model training, we are not constrained
by the typical requirement that the training and validation sets be independent. Although we refer to the information from GLC
products as labels, they more closely resemble pseudo-labels, so overlap between the training and validation regions does
not affect the accuracy assessment process. Consequently, we trained a model for each study area, and employed
the well-trained models in the three study areas (Hunan Province in China, Southwest France, and Kansas State in the
USA) to map the cropland (Figure 3). Meanwhile, we carefully sampled representative regions of each study area
for manual labeling, and constructed the validation set, which lies within the region covered by the training set. To ensure
a comprehensive and objective assessment, the selection process for the validation area was guided by topography to
balance different cropland scenes (plains, hills, and mountains). Details regarding the selection and labeling of the
validation area are described in Section 2.2.
To evaluate the accuracy of our cropland mapping results, we employed a set of assessment metrics (Olofsson
et al., 2014; Li et al., 2023), including Overall Accuracy (OA), Producer’s Accuracy (PA), User’s Accuracy (UA),
Mean Intersection over Union (mIoU) (Deng et al., 2021), the macro-average of F1-scores across all classes (Avg. F1-score),
and the F1-scores of cropland (Crop. F1-score) and non-cropland (Non-crop. F1-score) (Zhong et al., 2019). The
F1-score is the harmonic mean of PA and UA, and thus jointly measures the model’s precision and recall capabilities.
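All of these metrics can be derived from a binary confusion matrix; the NumPy sketch below assumes a 2 × 2 matrix with rows as reference and columns as prediction, and treats cropland as the positive class.

```python
# Minimal sketch of the assessment metrics from a binary confusion matrix (layout assumed).
import numpy as np

def metrics(cm: np.ndarray):
    tp, fn = cm[1, 1], cm[1, 0]     # cropland as the positive class
    fp, tn = cm[0, 1], cm[0, 0]
    oa = (tp + tn) / cm.sum()
    pa = tp / (tp + fn)             # producer's accuracy (recall) for cropland
    ua = tp / (tp + fp)             # user's accuracy (precision) for cropland
    f1_crop = 2 * pa * ua / (pa + ua)
    f1_non = 2 * tn / (2 * tn + fp + fn)
    iou_crop = tp / (tp + fp + fn)
    iou_non = tn / (tn + fp + fn)
    return dict(OA=oa, PA=pa, UA=ua,
                mIoU=(iou_crop + iou_non) / 2,
                AvgF1=(f1_crop + f1_non) / 2,
                CropF1=f1_crop, NonCropF1=f1_non)
```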

4. Experiment
4.1. Experiment setting
Experiments used the datasets collected from Hunan province in China, Southeast France, and Kansas state in
the USA. We cropped all the images into the size of 256*256 pixels. For the training dataset, we produced 32,318
patches in the Hunan study area, 29,893 in Southeast France, and 32,515 in Kansas. For the validation dataset, we
obtained 15,484 patches within the Hunan study area, 6,152 in Southeast France, and 7,697 in Kansas. In the process
of cropland mapping, we segmented all the images into multiple patches with a sliding length of 128 pixels, and utilized
probabilistic prediction results to integrate the final result.
All models were trained using PyTorch on the Ubuntu 16.04 operation system with an NVIDIA GTX3080 GPU
(11-GB memory). Each model was trained using the Adam optimizer, with a batch size of 8 and 100 epochs. The
learning rate was initially set to 1 × 10−3 and decrease it to 1 × 10−4 for the last 50 epochs.
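The tiled inference described above (256-pixel patches, 128-pixel stride, probability averaging in the overlaps) can be sketched as follows; the model interface, tensor layout, and the handling of tile borders are assumptions.

```python
# Hedged sketch of sliding-window inference with overlap averaging.
import numpy as np
import torch

def sliding_window_map(model, sits, patch=256, stride=128):
    """sits: (T, C, H, W) float32 tensor of one tile's monthly time series."""
    _, _, H, W = sits.shape
    prob = np.zeros((2, H, W), dtype=np.float32)
    count = np.zeros((1, H, W), dtype=np.float32)
    model.eval()
    with torch.no_grad():
        for y in range(0, H - patch + 1, stride):
            for x in range(0, W - patch + 1, stride):
                chip = sits[:, :, y:y + patch, x:x + patch].unsqueeze(0)
                p = torch.softmax(model(chip), dim=1)[0].cpu().numpy()  # (2, patch, patch)
                prob[:, y:y + patch, x:x + patch] += p
                count[:, y:y + patch, x:x + patch] += 1.0
    # Average overlapping predictions, then take the argmax; border remainders are ignored here.
    return (prob / np.maximum(count, 1.0)).argmax(axis=0)
```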

Figure 6: Confusion matrices represented by area ratios for Hunan, Southwest France, and Kansas study areas

4.2. Cropland mapping results


To demonstrate the effectiveness of our framework, we present the confusion matrix and all accuracy evaluation
metrics for our cropland mapping results across the three study regions in Figure 6 and Table 2. In the study areas
of Hunan, Southwest France, and Kansas, we attained an Avg.F1-score of 77.91%, 80.50%, and 88.36%, respectively.


Table 2
The accuracy evaluation of our cropland mapping results across the three study areas.
Study Areas | OA(%) | mIoU(%) | Avg.F1-score(%) | Non-cropland PA(%) | Non-cropland UA(%) | Cropland PA(%) | Cropland UA(%)
Hunan, China | 86.26 | 65.86 | 77.91 | 90.09 | 92.94 | 68.82 | 60.38
Southwest France | 80.95 | 67.47 | 80.50 | 81.40 | 74.09 | 80.63 | 86.44
Kansas, USA | 88.43 | 79.15 | 88.36 | 92.81 | 85.95 | 83.72 | 91.56

Table 3
Cropland mapping accuracy of our framework and the cropland layers from the three GLC products.
Study Areas | Products | OA(%) | mIoU(%) | Avg.F1-score(%) | Crop.F1-score(%) | Non-crop.F1-score(%)
Hunan Province, China | ESA | 83.46 | 60.85 | 73.60 | 57.47 | 89.73
Hunan Province, China | Esri | 80.26 | 55.37 | 68.44 | 49.13 | 87.75
Hunan Province, China | DyWorld | 85.07 | 59.95 | 72.15 | 53.17 | 91.12
Hunan Province, China | Ours | 86.26 | 65.86 | 77.91 | 64.32 | 91.49
Southwest France | ESA | 71.50 | 55.61 | 71.47 | 70.56 | 72.37
Southwest France | Esri | 80.91 | 66.99 | 80.10 | 83.13 | 77.06
Southwest France | DyWorld | 73.68 | 58.32 | 73.67 | 74.21 | 73.12
Southwest France | Ours | 80.95 | 67.47 | 80.50 | 83.44 | 77.57
Kansas, USA | ESA | 87.32 | 77.24 | 87.13 | 85.60 | 88.67
Kansas, USA | Esri | 80.86 | 67.86 | 80.85 | 81.18 | 80.52
Kansas, USA | DyWorld | 78.96 | 65.23 | 78.95 | 79.31 | 78.60
Kansas, USA | Ours | 88.43 | 79.15 | 88.36 | 87.47 | 89.25

It is observable that, compared to the other study areas, the accuracy in Hunan is relatively low. This is attributed
to the fact that the farming pattern in Hunan is predominantly smallholder, which is distinct from the large-scale
agricultural operations common in Europe and America. Meanwhile, the prevalence of hilly and mountainous terrain
in the Hunan region leads to smaller average field sizes, a more fragmented spatial distribution, and a greater diversity
of cropland types. These factors collectively render the classification of croplands in this area more challenging. We
also visualized the cropland mapping results in Figure 7(a), and selected typical samples of three terrain types (plains,
hills, and mountains) in each study area to demonstrate the details (Figure 7(b)). Specifically, we determined the terrain
types by analyzing the slope of each pixel using the Digital Elevation Model (DEM) from SRTM; all samples were
categorized into plains (0° to 2°), hills (2° to 6°), and mountains (greater than 6°). As shown in Figure 7, our framework
successfully extracts plain croplands, which tend to have relatively large average field sizes and a clear distinction
from built-up areas and rivers. Hill croplands and mountain croplands have relatively small average field sizes and
typically exhibit a fragmented distribution mixed with other types of vegetation cover, yet our framework is
also capable of accurately identifying their boundaries.
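A minimal sketch of the slope-based terrain stratification used here is shown below; it assumes an SRTM DEM loaded as a metre-unit array with roughly 30 m pixels and a simple finite-difference slope, rather than the exact GIS workflow.

```python
# Minimal sketch: slope from a DEM array, binned into plains (0-2°), hills (2-6°), mountains (>6°).
import numpy as np

def terrain_classes(dem: np.ndarray, pixel_size: float = 30.0):
    dzdy, dzdx = np.gradient(dem.astype(np.float32), pixel_size)
    slope_deg = np.degrees(np.arctan(np.hypot(dzdx, dzdy)))
    classes = np.digitize(slope_deg, bins=[2.0, 6.0])  # 0: plains, 1: hills, 2: mountains
    return slope_deg, classes
```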
Furthermore, we analyzed the results using key evaluation metrics for the plain croplands (PC), hill croplands
(HC), and mountain croplands (MC) in each study area (Figure 8). In the Hunan study area, the Avg.F1-score is 74.85%,
76.37%, and 72.84% for plain, hill, and mountain croplands, respectively, and the variability is lower for hill croplands
than for the other two. In the Southwest France study area, the Avg.F1-score is 81.41%, 71.00%, and 68.76% for
plain, hill, and mountain croplands, respectively, with smaller fluctuations for plain croplands than for hill and
mountain croplands. In the Kansas study area, the Avg.F1-score is 79.75% for plain croplands, 82.29%
for hill croplands, and 83.17% for mountain croplands. This counterintuitive phenomenon can be attributed to the
incorporation of temporal information: under SITS observations, other types of vegetation in hilly and mountainous
areas exhibit more distinct phenological differences from croplands than in flat regions. Conversely, in the plains,
some croplands demonstrate phenological patterns similar to other vegetation covers, such as shrubs or grasses.


Figure 7: (a) Overall cropland mapping results for Hunan, Southwest France, and Kansas in 2020, (b) the image and
classification results for plain cropland (S1), hill cropland (S2), and mountain cropland (S3), respectively.


Figure 8: (a) Classification map of plain croplands (PC), hill croplands (HC), and mountain croplands (MC) in the validation
regions of the three study areas. (b) the boxplots of main evaluation metrics in different types of croplands. “×” denotes
the location of the average value.


Figure 9: Cropland mapping results of our method and the cropland layers of other GLC products.

4.3. Comparison with other GLC products


A great challenge for large-scale cropland mapping is to ensure the framework’s generalizability across diverse
scenarios while maintaining low labeling costs. While our framework provides a promising solution to reduce labeling
costs through the proposed weakly supervised learning signals, its generalizability in practical scenarios needs objective
assessment. Therefore, we compared the proposed approach with three public GLC products (ESA, Esri, and Dyworld)
by a series of comparative experiments across diverse agrosystems, including Hunan, Southwest France, and Kansas.
In quantitative terms, the results in Table 3 demonstrate that the proposed framework outperformed the three products.
Compared with the highest accuracy of GLC products, our framework achieved improvements in the Avg.F1-score
by 5.84%, 0.51%, and 1.40% in the corresponding study area, while achieving improvements in the Crop.F1-score by
11.91%, 0.37%, and 2.18%. The qualitative comparison results are shown in Figure 9. For the Hunan study area, our
results are more comprehensive and provide better extraction of fragmented plain and hill croplands. For the Southwest
France study area, our results exhibit greater detail. In the Kansas study area, our method is capable of extracting fields
with unusual phenological attributes.
Moreover, the stability of the model’s performance is crucial for large-scale cropland mapping in practical
application scenarios. However, as shown in Table 3, the accuracy of the three products varies greatly across these
study areas. For instance, the ESA performs exceptionally well in Kansas, but it showed the poorest performance in
Southwest France. To further evaluate the reliability, we calculated the average accuracy of our proposed framework
and the three products in the three study areas (Figure 10). Our framework demonstrates notable improvements in terms
of OA, mIoU, and Avg.F1-score, surpassing the best-performing products by 5.52%, 9.70%, and 6.27%, respectively.
These results clearly indicated the superior reliability and stability of our proposed framework.

4.4. Comparison with other methods


To demonstrate the superiority of the proposed method, we compared it with the following three types of methods
based on Automatic Training Sample Generation (ATSG):
Discard methods: We used classical single- and multi-temporal networks, including Deeplabv3+ (Chen et al.,
2018), Unet-3D (Rustowicz et al., 2019) and LSTM (Shi et al., 2015), and U-TAE as classifiers, relying solely on


Figure 10: The average accuracy of our framework and the cropland layers from ESA, Esri, and Dyworld in three study
areas.

supervised signals derived from high-quality labels to guide their learning process. We took the year-round composite
images as the base data for the single-temporal network, and used the dense SITS for the multi-temporal networks.
Re-correct method: We utilized the strategy from the RRE framework (Zhang et al., 2023) to construct the
comparison method, which is an automated solution for extracting high-resolution cropland through cross-scale sample
transfer. First, we trained a label corrector using only high-quality labels and multi-temporal images to obtain the labels
of low-quality samples. Then the corrected labels were used to generate supervised signals to guide the model learning
process.
Re-correct method with weakly supervised learning: We chose the WESUP-LCP (Chen et al., 2023) as the
comparison method, which is a weakly supervised semantic segmentation network for product resolution enhancement
based on the re-correct method. This method is also applicable to our task. Specifically, we used the same filtering
strategy as WESUP-LCP to identify high-quality sample points, which were then extended to pixels with low-quality
labels by the super-pixel method. Then, we took the super-pixel labels to construct the supervised learning signals and
used the deep dynamic label propagation mechanism to generate pseudo-labels for constructing weakly supervised
signals.
As the results in Table 4 show, our method achieved the best accuracy across most assessment metrics in all three
study areas. Our method outperformed other approaches by achieving improvements of 3.38%, 5.05%, and 0.58%
in mIoU, and 7.15%, 4.05%, and 0.33% in crop.F1-score for the Hunan, Southwest France, and Kansas study areas,
respectively. The qualitative visual results in Figure 11 also demonstrate that our method outperformed the others in
terms of completeness and the ability to capture detailed information across all three study areas.
In the Hunan study area, our method (Figure 11(a) and Figure 11(b)) outperformed both the discard and re-
correct methods in mapping fragmented croplands with small average field sizes. The discard methods limited the
model’s ability to learn the abundant information from the regions with low-quality labels, resulting in overfitting
to cropland features with large average field sizes and a failure to recognize fragmented croplands with small average
field sizes. The re-correct methods introduced errors into the high-quality labels, which were then amplified during label
propagation, leading the model to misclassify other land covers as cropland. WESUP-LCP was designed for
increasing label resolution and is based on the re-correct method; although it captured more detail than the other
methods, it was still unable to accurately identify fragmented croplands with small average field sizes.


Table 4
The classification accuracy of DeeplabV3+, Unet-3D, LSTM, U-TAE, RRE, WESUP-LCP, and our method in the study areas of Hunan, Southwest France, and Kansas.
Study Areas | Methods | OA(%) | mIoU(%) | Avg.F1-score(%) | Crop.F1-score(%) | Non-crop.F1-score(%)
Hunan Province, China | DeeplabV3+ | 85.56 | 61.78 | 73.98 | 56.63 | 91.34
Hunan Province, China | Unet-3D | 85.57 | 62.44 | 74.68 | 58.07 | 91.29
Hunan Province, China | LSTM | 86.02 | 63.15 | 75.28 | 58.99 | 91.58
Hunan Province, China | U-TAE | 86.38 | 63.63 | 75.68 | 59.55 | 91.81
Hunan Province, China | RRE | 85.93 | 62.98 | 75.13 | 58.74 | 91.52
Hunan Province, China | WESUP-LCP | 86.14 | 63.71 | 75.82 | 60.03 | 60.03
Hunan Province, China | Ours | 86.26 | 65.86 | 77.91 | 64.32 | 91.49
Southwest France | DeeplabV3+ | 75.84 | 60.97 | 75.73 | 77.39 | 74.06
Southwest France | Unet-3D | 78.34 | 64.23 | 78.19 | 80.02 | 76.36
Southwest France | LSTM | 78.32 | 64.14 | 78.12 | 80.19 | 76.05
Southwest France | U-TAE | 77.70 | 63.37 | 77.55 | 79.36 | 75.75
Southwest France | RRE | 77.30 | 62.83 | 77.14 | 79.02 | 75.26
Southwest France | WESUP-LCP | 76.96 | 62.40 | 76.82 | 78.62 | 75.03
Southwest France | Ours | 80.95 | 67.42 | 80.47 | 83.44 | 77.57
Kansas State, USA | DeeplabV3+ | 86.82 | 76.63 | 86.76 | 85.85 | 87.66
Kansas State, USA | Unet-3D | 88.14 | 78.70 | 88.07 | 87.18 | 88.96
Kansas State, USA | LSTM | 88.05 | 78.56 | 87.98 | 87.09 | 88.87
Kansas State, USA | U-TAE | 87.40 | 77.56 | 87.36 | 86.64 | 88.07
Kansas State, USA | RRE | 87.41 | 77.53 | 87.33 | 86.33 | 88.34
Kansas State, USA | WESUP-LCP | 87.35 | 77.33 | 87.20 | 85.81 | 88.59
Kansas State, USA | Ours | 88.43 | 79.15 | 88.36 | 87.47 | 89.25

In the Southwest France study area, our method (Figure 11(c) and Figure 11(d)) extracted croplands with
significantly different features. Other methods only identified the croplands that were not planted in a specific period
(showing bare yellow soil), but omitted the planted croplands in the same period. This is because different GLC products
present great inconsistency in the cropland regions with unusual phenological attributes, which makes it difficult to
obtain high-quality labels for these samples. For the discard methods, the absence of labels caused the model to solely
focus on typical cropland features, but lose the ability to recognize diverse cropland features. In addition, due to a lack
of reliable references, the re-correct methods may generate a lot of incorrect labels for cropland areas with unusual
phenological attributes. This consequently leads to the model overfitting inaccurate information.
In the Kansas study area (Figure 11(e) and Figure 11(f)), the croplands have large average field sizes and share
similar features, facilitating the generation of a large number of high-quality labels. Therefore, both the discard and
re-correct methods performed well in these areas. Nevertheless, there is still the risk of misclassifying other vegetation
cover as cropland. In contrast, the proposed method was capable of accurately detecting the boundary between farmland
and other vegetation covers, such as lawns and shrubs.
Additionally, as shown in Table 4, the networks that incorporate multi-temporal information exhibit significant
advantages over the single-temporal network, indicating that using time-series information enables the model to
enhance its ability to distinguish croplands from other land covers. We further discussed the necessity of using temporal
information in Section 5.2.

5. Discussion
In this section, we conducted ablation experiments across all three study areas to comprehensively analyze the
input setting, and further discussed the limitations and effects of using temporal information under our framework.
We discussed four questions: the impacts of using different GLC product combinations as inputs, the temporal


Figure 11: The classification results of DeeplabV3+, Unet-3D, LSTM, U-TAE, RRE, WESUP-LCP, and our method in the
three study areas. (a) and (b) for Hunan, (c) and (d) for Southwest France, (e) and (f) for Kansas.

generalizability of our framework, the benefit of expanding the framework in the temporal dimension, and the
robustness of the proposed framework in real-world scenarios.

5.1. Analysis of Employing Varied GLC Product Combinations as Inputs


In the proposed framework, the use of different GLC products as inputs results in diverse high-quality labels, which
directly affect the supervised part of the learning signal. To analyze the impacts of these inputs, we used different
combinations of ESA, Esri, and Dyworld as inputs, and calculated the number and accuracy of the obtained high-
quality labels. The results in Table 5 demonstrate that the combination of all three products yields the best performance
across all the study areas, with the label accuracy showing the closest correlation with the final prediction outcome.
In the Hunan study area, although the combination of DyWorld and Esri yields the highest label ratio of 72.02%,
its prediction F1-score is the lowest at 72.99%, influenced by its lowest label accuracy of 75.33%. In the Southwest France
study area, the combination of DyWorld and ESA gives the highest label ratio of 90.66%, but with
the lowest label accuracy of 77.28%, which leads to the lowest prediction F1-score of 73.49%. In the Kansas study
area, the combination of all three GLC products has a label ratio of only 72.85%, but it achieves higher prediction accuracy than
the other combinations because of its label accuracy of 91.65%. The reason is that our framework uses unsupervised
learning signals to incorporate the samples without high-quality labels into the model learning process, reducing the
model’s dependence on the number of labels.

5.2. Assessment of temporal generalizability


A key challenge in large-scale cropland mapping is the generalization capacity of the framework. Influenced by
different agrosystems and climates, the cropland exhibits various phenological attributes on SITS for different spatial
and temporal coverage. Section 4.3 provides a comprehensive analysis of the model’s spatial generalization capability


Table 5
The number/accuracy of high-quality labels and the accuracy of the final prediction results obtained by using different combinations of GLC products as inputs.
Study Areas | Combination of Products | Label Avg.F1-score(%) | Label Ratio(%) | Prediction Avg.F1-score(%)
Hunan Province, China | ESA + Esri | 78.87 | 67.99 | 74.69
Hunan Province, China | DyWorld + Esri | 75.33 | 72.02 | 72.99
Hunan Province, China | DyWorld + ESA | 79.10 | 70.50 | 75.80
Hunan Province, China | DyWorld + ESA + Esri | 80.70 | 64.93 | 77.91
Southwest France | ESA + Esri | 84.68 | 75.77 | 79.37
Southwest France | DyWorld + Esri | 85.73 | 81.48 | 79.89
Southwest France | DyWorld + ESA | 77.28 | 90.66 | 73.49
Southwest France | DyWorld + ESA + Esri | 87.56 | 73.99 | 80.50
Kansas, USA | ESA + Esri | 91.49 | 82.30 | 88.13
Kansas, USA | DyWorld + Esri | 86.87 | 81.12 | 85.48
Kansas, USA | DyWorld + ESA | 90.40 | 82.26 | 88.21
Kansas, USA | DyWorld + ESA + Esri | 91.65 | 72.85 | 88.36

Figure 12: The average accuracy of our framework following Direct Transfer (DT), Continue Training (CT), and the
accuracy of other GLC products in 2021 across three study areas.

in the three study areas, but the model’s temporal generalization ability remains unclear. To address this, we randomly
selected 1,000 samples of changed cropland from each of the three study areas, each with a size of 256 × 256
pixels. For those samples, we collected the corresponding SITS and GLC products for the year 2021, and manually
labeled them with the assistance of Google Earth images. Furthermore, we designed two sets of experiments to
demonstrate the temporal generalizability of our method: (1) Direct Transfer (DT): the model trained on the 2020
data was directly employed on the 2021 data without any modification; and (2) Continue Training (CT): the model
trained on the 2020 data was further trained using the 2021 data and then applied to the 2021 data.


Figure 13: The classification accuracy of using seasonal composite images and whole year SITS in the three study areas.

As shown in Figure 12, the average accuracy of our framework after the DT operation does not exceed that of
the best GLC products. This is due to the changing phenological features of cropland, influenced by varying climatic
conditions and planting patterns between years, which makes the models trained on the 2020 data unsuitable
for the 2021 data. However, our method does not require any manual labeling cost, which facilitates the incorporation
of new data for continued model training. This allows the model to progressively enhance its ability to recognize
croplands with different phenological features. Therefore, our framework after the CT operation achieved superior and
more stable performance compared to the other products. It outperformed the best-performing GLC product by 1.82%,
2.87%, and 2.10% in OA, mIoU, and F1-score, respectively, across all study areas.
This result illustrates the limitations of our method regarding temporal generalizability when directly transferred.
However, these limitations can be addressed by further training the model using available data without labeling costs.

5.3. Exploring the Imperative Role of Time-series Information


To analyze the necessity of incorporating temporal information, we used the temporal data from each season to generate seasonal composite images. These images were used as input to train the model, following the proposed framework but excluding the temporal encoding part. As shown in Figure 13, compared with the highest accuracy obtained using only seasonal composite images, the extraction results from the whole-year SITS improve by 3.47%, 5.22%, and 4.26% in Hunan, Southwest France, and Kansas, respectively.
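As a concrete illustration of this baseline, the sketch below builds per-season composites from a yearly SITS; the month-to-season grouping and the per-pixel median rule are assumptions made here for illustration, not necessarily the exact compositing procedure of the experiment.

```python
import numpy as np

# Assumed month-to-season grouping (Northern Hemisphere meteorological seasons).
SEASONS = {
    "spring": (3, 4, 5), "summer": (6, 7, 8),
    "autumn": (9, 10, 11), "winter": (12, 1, 2),
}

def seasonal_composites(sits, acquisition_months):
    """Collapse a yearly SITS of shape (T, C, H, W) into per-season composites.

    acquisition_months -- length-T sequence giving the calendar month of each image.
    Returns a dict mapping season name to a (C, H, W) per-pixel median composite.
    """
    sits = np.asarray(sits)
    months = np.asarray(acquisition_months)
    composites = {}
    for name, season_months in SEASONS.items():
        idx = np.where(np.isin(months, season_months))[0]
        if idx.size == 0:
            continue                              # no acquisition available in this season
        composites[name] = np.median(sits[idx], axis=0)
    return composites
```

Each seasonal composite is then used as the model input in place of the full time series, with the temporal encoding part of the framework removed.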
The main reasons for this improvement are as follows. Considering cropland as a whole, it displays significant visual variation across periods because of its planting status. This makes it challenging to delineate the extent of cropland accurately, as cropland can resemble other land covers during certain periods. For instance, in the Hunan area, where rice cultivation is prevalent, planting takes place in early summer and harvesting in autumn; the cropland therefore exhibits visual features similar to bare land in spring and winter, leading to relatively low extraction accuracy in these two seasons.
Within a single cropland parcel, there may be multiple crop types with different phenological patterns, which leads to large intra-class diversity in any specific period. In this case, models relying only on the textural, spectral, and spatial features of a single time phase can hardly recognize every part of the cropland. For instance, in Hunan, oilseed rape is often planted in autumn, coinciding with the rice harvest. Although a harvested rice field can be easily distinguished by its unique textural features, oilseed rape may visually resemble other vegetation cover, such as shrubs or lawns, during that period. This similarity can lead to the misidentification of oilseed rape as other vegetation types. As illustrated in Figure 13, the extraction accuracy in the Hunan area accordingly remains low in autumn.
Therefore, we extracted phenological features by integrating dense SITS and used them to enhance the separability of cropland within the model’s feature space, thereby improving the completeness of cropland extraction.

Figure 14: The t-SNE visualization results of seasonal composite images and whole-year SITS in the three study areas.

To further explore the benefit of incorporating multi-temporal information, we employed the t-distributed stochastic neighbor embedding (t-SNE) method to visualize the intermediate feature maps of the models trained with seasonal composite images and whole-year SITS. As shown in Figure 14, the intermediate feature maps derived from models using whole-year SITS demonstrated superior separability across all three study areas compared to those using
seasonal composite images. Specifically, when cropland is considered as a whole, the inclusion of multi-temporal information helps the model better distinguish cropland from other land covers, reducing the feature confusion related to planting status in specific periods. Within individual croplands, multi-temporal information reduces intra-class feature dissimilarity, enabling the model to recognize croplands encompassing various crop types simultaneously.
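A visualization in the spirit of Figure 14 can be produced with the sketch below. It assumes access to a per-pixel intermediate feature map from a trained model and uses scikit-learn's t-SNE; the pixel sampling size and plotting details are illustrative choices, not the exact settings behind the figure.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def tsne_of_features(feature_map, label_map, n_pixels=2000, seed=0):
    """Embed per-pixel intermediate features into 2-D with t-SNE.

    feature_map -- (D, H, W) array of intermediate features from the trained model
    label_map   -- (H, W) binary reference map (1 = cropland, 0 = other)
    """
    feature_map, label_map = np.asarray(feature_map), np.asarray(label_map)
    d, h, w = feature_map.shape
    feats = feature_map.reshape(d, h * w).T       # one row of D features per pixel
    labels = label_map.reshape(-1)

    # Subsample pixels to keep the embedding tractable.
    rng = np.random.default_rng(seed)
    idx = rng.choice(h * w, size=min(n_pixels, h * w), replace=False)
    embedding = TSNE(n_components=2, init="pca", random_state=seed).fit_transform(feats[idx])

    plt.scatter(embedding[:, 0], embedding[:, 1], c=labels[idx], s=3, cmap="coolwarm")
    plt.title("t-SNE of intermediate features (cropland vs. other)")
    plt.show()
```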

5.4. Robustness Analysis of Cloud Cover Scenarios


In practical applications, SITS often suffers from information loss caused by cloud cover, which limits the model’s ability to extract complete spatial and temporal features. To assess the robustness of the proposed framework under cloud cover, we conducted experiments simulating its effects in both the spatial and temporal dimensions. Following real-world conditions (Coluzzi et al., 2018), the experiment was designed as follows: in the spatial dimension, we added cloud masks covering from 0.00% to 40.00% of each image in increments of 10.00%; in the temporal dimension, we simulated missing data by randomly dropping images, with the dropping rate set from 0.00% to 83.33% at intervals of 8.33%, simulating situations in which images from one to ten months are missing. Since spatial and temporal data loss often occur together, the masking and dropping operations were imposed simultaneously in the simulation experiment.
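One way to implement this degradation is sketched below; the square mask shape, zero filling, and uniformly random placement are simplifying assumptions rather than a description of the study's exact simulation code.

```python
import numpy as np

def degrade_sits(sits, spatial_rate=0.2, temporal_rate=0.5, seed=0):
    """Simulate cloud-induced data loss on a yearly SITS of shape (T, C, H, W).

    spatial_rate  -- approximate fraction of each remaining image hidden by a square mask
    temporal_rate -- fraction of acquisition dates dropped entirely (missing months)
    """
    rng = np.random.default_rng(seed)
    sits = np.asarray(sits)
    t, c, h, w = sits.shape

    # Temporal dimension: randomly drop whole acquisitions.
    n_keep = max(1, int(round(t * (1 - temporal_rate))))
    keep = np.sort(rng.choice(t, size=n_keep, replace=False))
    degraded = sits[keep].copy()

    # Spatial dimension: mask a randomly placed square covering ~spatial_rate of each image.
    side = int(np.sqrt(spatial_rate) * min(h, w))
    for img in degraded:                          # img is a (C, H, W) view into `degraded`
        top = rng.integers(0, h - side + 1)
        left = rng.integers(0, w - side + 1)
        img[:, top:top + side, left:left + side] = 0
    return degraded
```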

Figure 15: The accuracy of the proposed framework under various spatial and temporal missing rates. The red line in the color bar indicates the average accuracy of the three GLC products in the corresponding study area.

The results in Figure 15 show that our framework is considerably robust to the data loss caused by cloud cover in all three study areas. In the Hunan study area, our framework still outperforms the Avg.F1-score of the multiple GLC products (71.51%) even at a spatial missing rate of 30.00% and a temporal missing rate of 66.67%. In the Southwest France study area, it remains better than the Avg.F1-score of the multiple GLC products (75.09%) at a 10.00% spatial missing rate and a 33.33% temporal missing rate. In the Kansas study area, it remains better than the corresponding Avg.F1-score of 82.31% at a 20.00% spatial missing rate and a 50.00% temporal missing rate. These results demonstrate the feasibility of our framework in real-world data-missing scenarios. The exceptional performance in the Hunan study area may be attributed to its training data, which the distinctive local climate already leaves partially affected by missing observations; the model was thus exposed to data-missing situations during training and gained better adaptability to this case.

6. Conclusion
In this study, we proposed a weakly supervised framework for large-scale cropland mapping using multi-temporal
information. The framework uses the labels from existing GLC products and dense SITS to capture the diverse temporal
features of cropland influenced by crop phenology and human agricultural activities, without the need for manual
labeling. The approach enables the model to effectively utilize the information in low-quality labeled samples, while
avoiding over-fitting the residual errors in high-quality labeled samples. In the experiments across three study areas,
the proposed framework demonstrated superiority over GLC products and outperformed traditional approaches that rely on discarding or re-correcting noisy labels. Furthermore, we investigated the effect of the input settings and the temporal generalizability of the proposed framework, and explored both the necessity of multi-temporal information and the robustness of the framework under cloud cover. Future work will aim to improve the efficiency of the proposed framework and to reinforce its robustness in real-world scenarios with missing data.

Acknowledgements
The work presented in this paper was supported by the National Natural Science Foundation of China (No. 42171376); the Distinguished Young Scholars under Grant 2022JJ10072; and the Open Fund of Xiangjiang Laboratory under Grant 22XJ03007.


CRediT authorship contribution statement


Yuze Wang: Conceptualization, Data curation, Formal analysis, Investigation, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. Aoran Hu: Conceptualization, Data curation, Methodology, Resources, Software, Writing – original draft. Ji Qi: Investigation, Methodology, Resources, Visualization. Yang Liu: Funding acquisition, Project administration, Supervision. Chao Tao: Funding acquisition, Project administration, Resources, Supervision, Writing – original draft, Writing – review & editing.

References
Amani, M., Ghorbanian, A., Ahmadi, S.A., Kakooei, M., Moghimi, A., Mirmazloumi, S.M., Moghaddam, S.H.A., Mahdavi, S., Ghahremanloo,
M., Parsian, S., Wu, Q., Brisco, B., 2020. Google Earth Engine Cloud Computing Platform for Remote Sensing Big Data Applications: A
Comprehensive Review. IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 13, 5326–5350. https://ieeexplore.ieee.org/
document/9184118/.
Belgiu, M., Csillik, O., 2018. Sentinel-2 cropland mapping using pixel-based and object-based time-weighted dynamic time warping analysis.
Remote Sensing of Environment 204, 509–523. https://linkinghub.elsevier.com/retrieve/pii/S0034425717304686.
Brown, C.F., Brumby, S.P., Guzder-Williams, B., Birch, T., Hyde, S.B., Mazzariello, J., Czerwinski, W., Pasquarella, V.J., Haertel, R., Ilyushchenko,
S., Schwehr, K., Weisse, M., Stolle, F., Hanson, C., Guinan, O., Moore, R., Tait, A.M., 2022a. Dynamic World, Near real-time global 10 m land
use land cover mapping. Sci Data 9, 251. https://www.nature.com/articles/s41597-022-01307-4.
Brown, C.F., Brumby, S.P., Guzder-Williams, B., Birch, T., Hyde, S.B., Mazzariello, J., Czerwinski, W., Pasquarella, V.J., Haertel, R., Ilyushchenko,
S., et al., 2022b. Dynamic world, near real-time global 10 m land use land cover mapping. Scientific Data 9, 251.
Calvao, T., Pessoa, M., 2015. Remote sensing in food production-a review. Emir. J. Food Agric 27, 138. http://www.ejfa.me/index.php/
journal/article/view/652.
Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H., 2018. Encoder-Decoder with Atrous Separable Convolution for Semantic Image
Segmentation, in: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (Eds.), Computer Vision – ECCV 2018. Springer International Publishing,
Cham. volume 11211, pp. 833–851. https://link.springer.com/10.1007/978-3-030-01234-2_49.
Chen, Y., Zhang, G., Cui, H., Li, X., Hou, S., Ma, J., Li, Z., Li, H., Wang, H., 2023. A novel weakly supervised semantic segmentation
framework to improve the resolution of land cover product. ISPRS Journal of Photogrammetry and Remote Sensing 196, 73–92. https:
//linkinghub.elsevier.com/retrieve/pii/S0924271622003422.
Chi, M., Plaza, A., Benediktsson, J.A., Sun, Z., Shen, J., Zhu, Y., 2016. Big Data for Remote Sensing: Challenges and Opportunities. Proc. IEEE
104, 2207–2219. https://ieeexplore.ieee.org/document/7565634/.
Coluzzi, R., Imbrenda, V., Lanfredi, M., Simoniello, T., 2018. A first assessment of the Sentinel-2 Level 1-C cloud mask product to support
informed surface analyses. Remote Sensing of Environment 217, 426–443. https://linkinghub.elsevier.com/retrieve/pii/
S0034425718303742.
Copernicus Climate Change Service, 2019. Land cover classification gridded maps from 1992 to present derived from satellite observations.
https://cds.climate.copernicus.eu/doi/10.24381/cds.006f2c9a.
Defourny, P., Bontemps, S., Bellemans, N., Cara, C., Dedieu, G., Guzzonato, E., Hagolle, O., Inglada, J., Nicola, L., Rabaute, T., Savinaud, M.,
Udroiu, C., Valero, S., Bégué, A., Dejoux, J.F., El Harti, A., Ezzahar, J., Kussul, N., Labbassi, K., Lebourgeois, V., Miao, Z., Newby, T.,
Nyamugama, A., Salh, N., Shelestov, A., Simonneaux, V., Traore, P.S., Traore, S.S., Koetz, B., 2019. Near real-time agriculture monitoring
at national scale at parcel resolution: Performance assessment of the Sen2-Agri automated system in various cropping systems around the world.
Remote Sensing of Environment 221, 551–568. https://linkinghub.elsevier.com/retrieve/pii/S0034425718305145.
Deng, P., Xu, K., Huang, H., 2021. When cnns meet vision transformer: A joint framework for remote sensing scene classification. IEEE Geoscience
and Remote Sensing Letters 19, 1–5.
Do Nascimento Bendini, H., Garcia Fonseca, L.M., Schwieder, M., Sehn Körting, T., Rufin, P., Del Arco Sanches, I., Leitão, P.J., Hostert, P.,
2019. Detailed agricultural land classification in the Brazilian cerrado based on phenological information from dense satellite image time series.
International Journal of Applied Earth Observation and Geoinformation 82, 101872. https://linkinghub.elsevier.com/retrieve/pii/
S0303243418308961.
FAO., 2010. World programme for the census of agriculture.
Fare Garnot, V.S., Landrieu, L., 2021. Panoptic Segmentation of Satellite Image Time Series with Convolutional Temporal Attention Networks, in:
2021 IEEECVF Int. Conf. Comput. Vis. ICCV, IEEE, Montreal, QC, Canada. pp. 4852–4861. https://ieeexplore.ieee.org/document/
9711189/.
Food and Agriculture Organization of the United Nations (Ed.), 2005. A System of Integrated Agricultural Censuses and Surveys. Number 11 in
FAO Statistical Development Series, Food and Agriculture Organization of the United Nations, Rome.
Friedl, M., Sulla-Menashe, D., 2019. MCD12Q1 MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 500m SIN Grid V006. https:
//lpdaac.usgs.gov/products/mcd12q1v006/.
Garnot, V.S.F., Landrieu, L., 2020. Lightweight Temporal Self-attention for Classifying Satellite Images Time Series, in: Lemaire, V., Malinowski,
S., Bagnall, A., Guyet, T., Tavenard, R., Ifrim, G. (Eds.), Advanced Analytics and Learning on Temporal Data. Springer International Publishing,
Cham. volume 12588, pp. 171–181. http://link.springer.com/10.1007/978-3-030-65742-0_12.
Gong, P., Wang, J., Yu, L., Zhao, Y., Zhao, Y., Liang, L., Niu, Z., Huang, X., Fu, H., Liu, S., Li, C., Li, X., Fu, W., Liu, C., Xu, Y., Wang, X., Cheng,
Q., Hu, L., Yao, W., Zhang, H., Zhu, P., Zhao, Z., Zhang, H., Zheng, Y., Ji, L., Zhang, Y., Chen, H., Yan, A., Guo, J., Yu, L., Wang, L., Liu,
X., Shi, T., Zhu, M., Chen, Y., Yang, G., Tang, P., Xu, B., Giri, C., Clinton, N., Zhu, Z., Chen, J., Chen, J., 2013. Finer resolution observation
and monitoring of global land cover: First mapping results with Landsat TM and ETM+ data. International Journal of Remote Sensing 34,
2607–2654. https://www.tandfonline.com/doi/full/10.1080/01431161.2012.748992.
Hermosilla, T., Wulder, M.A., White, J.C., Coops, N.C., 2022. Land cover classification in an era of big and open data: Optimizing localized
implementation and training data selection to improve mapping outcomes. Remote Sensing of Environment 268, 112780. https://
linkinghub.elsevier.com/retrieve/pii/S0034425721005009.
Hua, Y., Marcos, D., Mou, L., Zhu, X.X., Tuia, D., 2021. Semantic segmentation of remote sensing images with sparse annotations. IEEE Geoscience
and Remote Sensing Letters 19, 1–5.
Huang, Y., Chen, Z.x., Yu, T., Huang, X.z., Gu, X.f., 2018. Agricultural remote sensing big data: Management and applications. Journal of
Integrative Agriculture 17, 1915–1931. https://linkinghub.elsevier.com/retrieve/pii/S2095311917618598.
Jiang, B., 2015. Geospatial analysis requires a different way of thinking: The problem of spatial heterogeneity. GeoJournal 80, 1–13. http:
//link.springer.com/10.1007/s10708-014-9537-y.
Karra, K., Kontgis, C., Statman-Weil, Z., Mazzariello, J.C., Mathis, M., Brumby, S.P., 2021a. Global land use / land cover with Sentinel 2 and deep
learning, in: 2021 IEEE Int. Geosci. Remote Sens. Symp. IGARSS, IEEE, Brussels, Belgium. pp. 4704–4707. https://ieeexplore.ieee.
org/document/9553499/.
Karra, K., Kontgis, C., Statman-Weil, Z., Mazzariello, J.C., Mathis, M., Brumby, S.P., 2021b. Global land use/land cover with sentinel 2 and deep
learning, in: 2021 IEEE international geoscience and remote sensing symposium IGARSS, IEEE. pp. 4704–4707.
Karthikeyan, L., Chawla, I., Mishra, A.K., 2020. A review of remote sensing applications in agriculture for food security: Crop growth and yield, irri-
gation, and crop losses. Journal of Hydrology 586, 124905. https://linkinghub.elsevier.com/retrieve/pii/S0022169420303656.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436–444. https://www.nature.com/articles/nature14539.
Lenczner, G., Chan-Hon-Tong, A., Le Saux, B., Luminari, N., Le Besnerais, G., 2022. Dial: Deep interactive and active learning for semantic
segmentation in remote sensing. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15, 3376–3389.
Li, C., Xian, G., Zhou, Q., Pengra, B.W., 2021. A novel automatic phenology learning (APL) method of training sample selection using
multiple datasets for time-series land cover mapping. Remote Sensing of Environment 266, 112670. https://linkinghub.elsevier.
com/retrieve/pii/S0034425721003904.
Li, H., Song, X.P., Hansen, M.C., Becker-Reshef, I., Adusei, B., Pickering, J., Wang, L., Wang, L., Lin, Z., Zalles, V., et al., 2023. Development of
a 10-m resolution maize and soybean map over china: Matching satellite-based crop classification with sample-based area estimation. Remote
Sensing of Environment 294, 113623.
Li, J., Huang, X., Gong, J., 2019. Deep neural network for remote-sensing image interpretation: Status and perspectives. Natl. Sci. Rev. 6, 1082–1086.
https://academic.oup.com/nsr/article/6/6/1082/5484863.
Li, K., Xu, E., 2020. Cropland data fusion and correction using spatial analysis techniques and the Google Earth Engine. GIScience & Remote
Sensing 57, 1026–1045. https://www.tandfonline.com/doi/full/10.1080/15481603.2020.1841489.
Liu, Y., Wu, Y., Chen, Z., Huang, M., Du, W., Chen, N., Xiao, C., 2022. A Novel Impervious Surface Extraction Method Based on Automatically
Generating Training Samples From Multisource Remote Sensing Products: A Case Study of Wuhan City, China. IEEE J. Sel. Top. Appl. Earth
Observations Remote Sensing 15, 6766–6780. https://ieeexplore.ieee.org/document/9854083/.
Naboureh, A., Li, A., Bian, J., Lei, G., 2023. National Scale Land Cover Classification Using the Semiautomatic High-Quality Reference Sample
Generation (HRSG) Method and an Adaptive Supervised Classification Scheme. IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing
16, 1858–1870. https://ieeexplore.ieee.org/document/10035401/.
Nanni, L., Ghidoni, S., Brahnam, S., 2017. Handcrafted vs. non-handcrafted features for computer vision classification. Pattern Recognition 71,
158–172. https://linkinghub.elsevier.com/retrieve/pii/S0031320317302224.
North, H.C., Pairman, D., Belliss, S.E., 2019. Boundary Delineation of Agricultural Fields in Multitemporal Satellite Imagery. IEEE J. Sel. Top.
Appl. Earth Observations Remote Sensing 12, 237–251. https://ieeexplore.ieee.org/document/8584043/.
Oliphant, A.J., Thenkabail, P.S., Teluguntla, P., Xiong, J., Gumma, M.K., Congalton, R.G., Yadav, K., 2019. Mapping cropland extent of Southeast
and Northeast Asia using multi-year time-series Landsat 30-m data using a random forest classifier on the Google Earth Engine Cloud.
International Journal of Applied Earth Observation and Geoinformation 81, 110–124. https://linkinghub.elsevier.com/retrieve/
pii/S0303243418307414.
Olofsson, P., Foody, G.M., Herold, M., Stehman, S.V., Woodcock, C.E., Wulder, M.A., 2014. Good practices for estimating area and assessing
accuracy of land change. Remote sensing of Environment 148, 42–57.
Pelletier, C., Valero, S., Inglada, J., Champion, N., Dedieu, G., 2016. Assessing the robustness of Random Forests to map land cover with high
resolution satellite image time series over large areas. Remote Sensing of Environment 187, 156–168. https://linkinghub.elsevier.
com/retrieve/pii/S0034425716303820.
Persello, C., Tolpekin, V., Bergado, J., de By, R., 2019. Delineation of agricultural fields in smallholder farms from satellite images using fully
convolutional networks and combinatorial grouping. Remote Sensing of Environment 231, 111253. https://linkinghub.elsevier.com/
retrieve/pii/S003442571930272X.
Prince, S.D., 2019. Challenges for remote sensing of the Sustainable Development Goal SDG 15.3.1 productivity indicator. Remote Sensing of
Environment 234, 111428. https://linkinghub.elsevier.com/retrieve/pii/S003442571930447X.
Rustowicz, R., Cheong, R., Wang, L., Ermon, S., Burke, M., Lobell, D., 2019. Semantic segmentation of crop type in africa: A novel dataset and
analysis of deep learning methods, in: CVPR Workshops, pp. 75–82. https://api.semanticscholar.org/CorpusID:198180478.
Sabokrou, M., Khalooei, M., Adeli, E., 2019. Self-Supervised Representation Learning via Neighborhood-Relational Encoding, in: 2019 IEEECVF
Int. Conf. Comput. Vis. ICCV, IEEE, Seoul, Korea (South). pp. 8009–8018. https://ieeexplore.ieee.org/document/9010354/.
Shi, X., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., Woo, W.c., 2015. Convolutional lstm network: A machine learning approach for precipitation
nowcasting. Advances in neural information processing systems 28. https://proceedings.neurips.cc/paper_files/paper/2015/
file/07563a3fe3bbe7e3ba84431ad9d055af-Paper.pdf.
Singh, G., Singh, S., Sethi, G., Sood, V., 2022. Deep Learning in the Mapping of Agricultural Land Use Using Sentinel-2 Satellite Data. Geographies
2, 691–700. https://www.mdpi.com/2673-7086/2/4/42.
Stinson, G., Magnussen, S., Boudewyn, P., Eichel, F., Russo, G., Cranny, M., Song, A., 2016. Canada, in: Vidal, C., Alberdi, I.A., Hernández Mateo,
L., Redmond, J.J. (Eds.), National Forest Inventories. Springer International Publishing, Cham, pp. 233–247. http://link.springer.com/
10.1007/978-3-319-44015-6_12.
Sun, Z., Di, L., Fang, H., 2019. Using long short-term memory recurrent neural network in land cover classification on Landsat and Cropland data
layer time series. International Journal of Remote Sensing 40, 593–614. https://www.tandfonline.com/doi/full/10.1080/01431161.
2018.1516313.
Sykas, D., Sdraka, M., Zografakis, D., Papoutsis, I., 2022. A Sentinel-2 Multiyear, Multicountry Benchmark Dataset for Crop Classification and
Segmentation With Deep Learning. IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 15, 3323–3339. https://ieeexplore.
ieee.org/document/9749916/.
Wagner, M.P., Oppelt, N., 2020. Extracting Agricultural Fields from Remote Sensing Imagery Using Graph-Based Growing Contours. Remote
Sensing 12, 1205. https://www.mdpi.com/2072-4292/12/7/1205.
Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X., 2022. Semi-Supervised Semantic Segmentation Using
Unreliable Pseudo-Labels, in: 2022 IEEECVF Conf. Comput. Vis. Pattern Recognit. CVPR, IEEE, New Orleans, LA, USA. pp. 4238–4247.
https://ieeexplore.ieee.org/document/9879387/.
Weiss, M., Jacob, F., Duveiller, G., 2020. Remote sensing for agricultural applications: A meta-review. Remote Sensing of Environment 236,
111402. https://linkinghub.elsevier.com/retrieve/pii/S0034425719304213.
Wulder, M., Li, Z., Campbell, E., White, J., Hobart, G., Hermosilla, T., Coops, N., 2018. A National Assessment of Wetland Status and Trends
for Canada’s Forested Ecosystems Using 33 Years of Earth Observation Satellite Data. Remote Sensing 10, 1623. http://www.mdpi.com/
2072-4292/10/10/1623.
Wulder, M.A., Dechka, J.A., Gillis, M.A., Luther, J.E., Hall, R.J., Beaudoin, A., Franklin, S.E., 2003. Operational mapping of the land cover of the
forested area of Canada with Landsat data: EOSD land cover program. The Forestry Chronicle 79, 1075–1083. http://pubs.cif-ifc.org/
doi/10.5558/tfc791075-6.
Xu, F., Yao, X., Zhang, K., Yang, H., Feng, Q., Li, Y., Yan, S., Gao, B., Li, S., Yang, J., et al., 2024. Deep learning in cropland field identification:
A review. Computers and Electronics in Agriculture 222, 109042.
Xu, Y., Yu, L., Zhao, Y., Feng, D., Cheng, Y., Cai, X., Gong, P., 2017. Monitoring cropland changes along the Nile River in Egypt over past three
decades (1984–2015) using remote sensing. International Journal of Remote Sensing 38, 4459–4480. https://www.tandfonline.com/doi/
full/10.1080/01431161.2017.1323285.
Yin, J., Dong, J., Hamm, N.A., Li, Z., Wang, J., Xing, H., Fu, P., 2021. Integrating remote sensing and geospatial big data for urban land use
mapping: A review. International Journal of Applied Earth Observation and Geoinformation 103, 102514. https://linkinghub.elsevier.
com/retrieve/pii/S030324342100221X.
Yu, L., Wang, J., Gong, P., 2013. Improving 30 m global land-cover map FROM-GLC with time series MODIS and auxiliary data sets: A
segmentation-based approach. International Journal of Remote Sensing 34, 5851–5867. https://www.tandfonline.com/doi/full/10.
1080/01431161.2013.798055.
Yue, A., Zhang, C., Yang, J., Su, W., Yun, W., Zhu, D., 2013. Texture extraction for object-oriented classification of high spatial resolution remotely
sensed images using a semivariogram. International Journal of Remote Sensing 34, 3736–3759. https://www.tandfonline.com/doi/
full/10.1080/01431161.2012.759298.
Zanaga, D., Van De Kerchove, R., Daems, D., De Keersmaecker, W., Brockmann, C., Kirches, G., Wevers, J., Cartus, O., Santoro, M., Fritz, S.,
Lesiv, M., Herold, M., Tsendbazar, N.E., Xu, P., Ramoino, F., Arino, O., 2022a. ESA WorldCover 10 m 2021 v200. https://zenodo.org/
record/7254220.
Zanaga, D., Van De Kerchove, R., Daems, D., De Keersmaecker, W., Brockmann, C., Kirches, G., Wevers, J., Cartus, O., Santoro, M., Fritz, S.,
et al., 2022b. Esa worldcover 10 m 2021 v200 .
Zhang, D., Pan, Y., Zhang, J., Hu, T., Zhao, J., Li, N., Chen, Q., 2020. A generalized approach based on convolutional neural networks for large area
cropland mapping at very high resolution. Remote Sensing of Environment 247, 111912. https://linkinghub.elsevier.com/retrieve/
pii/S0034425720302820.
Zhang, H.K., Roy, D.P., 2017. Using the 500 m MODIS land cover product to derive a consistent continental scale 30 m Landsat land cover
classification. Remote Sensing of Environment 197, 15–34. https://linkinghub.elsevier.com/retrieve/pii/S0034425717302249.
Zhang, W., Guo, S., Zhang, P., Xia, Z., Zhang, X., Lin, C., Tang, P., Fang, H., Du, P., 2023. A Novel Knowledge-Driven Automated Solution for
High-Resolution Cropland Extraction by Cross-Scale Sample Transfer. IEEE Trans. Geosci. Remote Sensing 61, 1–16. https://ieeexplore.
ieee.org/document/10197441/.
Zhong, L., Hu, L., Zhou, H., 2019. Deep learning based multi-temporal crop classification. Remote sensing of environment 221, 430–443.
Zhu, X.X., Tuia, D., Mou, L., Xia, G.S., Zhang, L., Xu, F., Fraundorfer, F., 2017. Deep Learning in Remote Sensing: A Comprehensive Review
and List of Resources. IEEE Geosci. Remote Sens. Mag. 5, 8–36. doi:10.1109/MGRS.2017.2762307.
Zhu, Z., Gallant, A.L., Woodcock, C.E., Pengra, B., Olofsson, P., Loveland, T.R., Jin, S., Dahal, D., Yang, L., Auch, R.F., 2016. Optimizing selection
of training and auxiliary data for operational land cover classification for the LCMAP initiative. ISPRS Journal of Photogrammetry and Remote
Sensing 122, 206–221. https://linkinghub.elsevier.com/retrieve/pii/S0924271616302829.
