Weakly Supervised Framework Considering Multi-Temporal Information For Large-Scale Cropland Mapping
• Encode intrinsic features to optimize the utilization of labels from GLC products
• Experiments on three agricultural areas showed the advantage of the proposed method.
• Uncover the benefits of using multi-temporal information in cropland extraction.
1. Introduction
Over the past decades, remote sensing observation has played significant roles in large-scale cropland mapping
and monitoring (Huang et al., 2018; Defourny et al., 2019). By offering timely and comprehensive images of nearly
every part of the Earth’s surface (Chi et al., 2016), it serves as a reliable information source for identifying cropland
spatial distribution on a large scale (Weiss et al., 2020). Moreover, it provides valuable support for various agricultural
applications, such as land-use planning (Yin et al., 2021), food security (Karthikeyan et al., 2020; Calvao and Pessoa,
2015), and sustainable agroecology (Prince, 2019).
With the accumulation of remote sensing data, data-driven methods based on machine learning have been widely applied to large-scale cropland mapping (Zanaga et al., 2022a; Do Nascimento Bendini et al., 2019; Pelletier et al., 2016; Gong et al., 2013; Yue et al., 2013; North et al., 2019; Belgiu and Csillik, 2018; Xu et al., 2017). The widely adopted approaches employ traditional machine learning methods to analyze and interpret remote sensing (RS) images. These methods often rely on a combination of low-level and middle-level visual features, such as texture (Yue et al., 2013), spectral (North et al., 2019), and shape (Wagner and Oppelt, 2020) features. However, because cropland is influenced by various factors such as climate, geography, and topography, methods that rely on handcrafted features typically suffer from restricted generalization performance and low accuracy (Nanni et al., 2017). In recent years, Deep Convolutional Neural Networks (DCNNs) (LeCun et al.,
2015) have attracted great attention in cropland mapping (Singh et al., 2022; Brown et al., 2022a; Zhang et al., 2020; Sun et al., 2019; Karra et al., 2021a; Persello et al., 2019), because they can extract high-level visual features that are more representative and distinguishable. However, the performance of DCNNs shows a strong positive correlation with
representative and distinguishable. However, the performance of DCNNs shows a strong positive correlation with
the number and diversity of high-quality labeled samples, leading to high labeling costs (Zhu et al., 2017; Li et al.,
2019). Although various methods (Hua et al., 2021; Lenczner et al., 2022) designed for sparse labeling conditions can
significantly decrease the demand for labeling, the remaining label requirement in large-scale cropland mapping still
implies a considerable manual cost. Thus, reducing the labeling cost while maintaining cropland mapping accuracy remains a great challenge.
Some methods use existing global land cover (GLC) products, such as GFSAD 30 (Oliphant et al., 2019), CCI-LC
(Copernicus Climate Change Service, 2019), FROMGLC (Yu et al., 2013), MCD12Q1 (Friedl and Sulla-Menashe,
2019), Esri (Karra et al., 2021a), ESA (Zanaga et al., 2022a), DyWorld (Brown et al., 2022a) as reference to train the
model at a low cost and obtain accurate cropland mapping results (Li et al., 2021; Zhu et al., 2016). Those methods
are known as Automatic Training Sample Generation (ATSG) (Liu et al., 2022). However, the labels obtained from GLC products inevitably contain some errors, due to factors such as the diversity of cropland scenes, imaging conditions,
and classifier performance. Directly using those labels may lead to instability and over-fitting in the model learning
process, causing low-quality cropland mapping. Therefore, identifying the errors of the labels generated from GLC
products and preventing their negative impacts on the model training process is one of the significant research topics
of ATSG methods.
According to how erroneous labels are post-processed, the ATSG methods based on existing GLC products can be divided into two categories: discard and re-correct.
Discard methods establish quality criteria to rate the labels from the GLC products, and exclude low-quality labels
before training. Based on the MCD12Q1 product, Zhang and Roy (2017) utilized temporal invariance as a quality
criterion to discard the labels that have changes within three years, and excluded the labels with low classification
confidence according to the quality assessment layer provided in the product auxiliary information. With the continuous
release of GLC products, Li and Xu (2020) identified high-quality labels by considering their consistent performance
across four GLC products (GFSAD 30, CCI-LC, FROMGLC, and MCD12Q1), and discarded the outlier labels based
on the spectral distribution of all corresponding pixels. Considering the spectral mixing problem among vegetation
cover, Hermosilla et al. (2022) integrated the 3D information from LiDAR data into the quality assessment of the
labels from three products (NFI photo plot (Stinson et al., 2016), EOSD (Wulder et al., 2003) and NWS (Wulder et al.,
2018)). By doing so, they could identify and remove the low-quality labels that mismatch the vegetation height.
Re-correct methods construct filter strategies to obtain high-quality labels based on criteria such as classification
consistency across multiple products (Hermosilla et al., 2022) and spectral consistency among the same land covers (Li
and Xu, 2020). High-quality labels are then obtained and used as references to correct the remaining labels. Considering
the spatial continuity and texture consistency within the same land cover, Chen et al. (2023) took the sub-regions
obtained by Simple Non-Iterative Clustering (SNIC) as the correction unit, and reclassified each unit through voting
among filtered high-quality labels, thereby extending high-quality labels into regions with low-quality labels. Zhang
et al. (2023) considered the phenological attribute of vegetation cover, and used high-quality labels and multi-temporal
images to train the classifier to correct low-quality labels. Naboureh et al. (2023) selected pure pixels with high-quality
labels as reference, and rectified the remaining pixels containing low-quality labels according to their spectral distance
from the selected pixels.
The above methods still encountered difficulties in achieving high-precision cropland extraction, thus falling short
of meeting the localized needs of agricultural applications. The main reasons are as follows:
1) Insufficient use of samples with low-quality labels: Although discard methods can mitigate the model’s over-fitting to incorrect labels, they also eliminate the samples with diverse features but low-quality labels before training.
Furthermore, Discard methods may lead to the imbalance of intra-class distribution for training samples (Wang
et al., 2022), so the model can only learn typical features and has limited generalization capability. For example, in
cropland mapping, these methods often constrain the trained classifier to only extract continuous and large block
plain croplands, making it challenging to recognize cropland in complex environments, such as those scattered
across hills and mountains.
2) Over-trusting the samples with high-quality labels: Although re-correct methods allow diverse samples to participate
in the model training process, they potentially introduce a lot of label noise, misleading the model during the
learning process. This occurs for two reasons. First, due to the inherent limitations of the products, the filtering
Table 1
The extent, climate, and main crops of the three study areas.
Study areas | Extent (km²) | Climate | Main crops
Hunan Province, China | 211,800 | Continental subtropical monsoonal humid climate | Rice, Rapeseed, Cotton, Tea
Southwest France | 195,910 | Temperate maritime climate | Spring wheat, Soybeans, Olives
Kansas State, USA | 213,096 | Temperate continental climate | Winter wheat, Corn
strategies struggle to completely eliminate all errors in the selected high-quality labels. Second, re-correct methods
use samples with high-quality labels as references to correct the samples with low-quality labels, which may further
amplify the remaining label errors.
In summary, discard methods can ensure the overall quality of training samples but may decrease the diversity of
the feature space due to the insufficient use of samples with low-quality labels. On the other hand, re-correct methods
effectively utilize samples with low-quality labels, but place excessive trust in high-quality labels, which can amplify
label errors and mislead the model optimization process.
To balance these two issues, we proposed a weakly-supervised framework that uses the labels from existing
GLC products for large-scale cropland mapping. Specifically, to avoid the model over-trusting high-quality labels,
we encoded the intrinsic feature distribution of the image to construct the unsupervised part of the learning signal,
which was then used to constrain the supervised part directly constructed by the high-quality labels. It can prompt
the model to assess and question the reliability of the supervised part of the learning signal, avoiding over-fitting the
remaining errors in high-quality labels. Meanwhile, we applied the unsupervised part of the learning signal to the
samples with low-quality labels, so that the information they contain enhances the diversity of the model’s feature space.
Additionally, we enhanced this framework by extending it into the temporal dimension, allowing the model to fully
extract the phenological feature and change patterns of croplands from the high-density Satellite Image Time Series
(SITS). The contributions of this paper are as follows:
1) Given the high costs of large-scale cropland mapping, we proposed a weakly-supervised framework that leverages
existing GLC data without manual labels. It further alleviates the impact of residual errors in filtered high-quality
labeled samples, while effectively utilizing the plentiful information contained in low-quality labeled samples.
2) In the framework, we flexibly incorporated multi-temporal DCNNs with SITS to capture the phenological features
of croplands. We further visualized these high-dimensional features to uncover how multi-temporal information
enhances cropland extraction, and assessed the method’s robustness under conditions of data scarcity to validate
its practical applicability.
3) We conducted experiments in three study areas to test the feasibility and stability of our method for large-scale
cropland mapping. We also investigated the internal working mechanisms and temporal generalizability of the
proposed framework.
The remainder of this paper is organized as follows: Section 2 presents the materials and process employed, and
Section 3 describes the workflow of the proposed framework and the details of the applied network architecture. Section 4
and Section 5 present experimental results and discussion, and Section 6 summarizes and concludes the paper.
• Hunan Province in China: Hunan province is situated in the central-southern part of China, covering a total area of
211,800 km2 . As one of China’s largest rice-planting bases, Hunan province ranks among the top ten in national
grain production. The province’s diverse terrain, ranging from semi-alpine to low mountains, hills, basins, and plains, presents significant challenges for cropland monitoring.
Figure 1: The location and extent of the three study areas. The red regions are used to validate the accuracy of the cropland mapping result.
The area has a continental subtropical monsoonal
humid climate, offering abundant light, heat, and water resources, which is suitable for cultivating a variety of crops
including rice, rapeseed, cotton, and tea. The rice types grown here include double-season rice, medium-season rice,
and late-season rice, each with distinct sowing and harvesting periods from April to November.
• Southwest France: We chose the southwest part of France, encompassing about 195,910 km2 , which represents 2/5
of the entire country, as our study area. This region has several major grain-producing areas such as Aquitaine Basin
and Centre-Val de Loire. The landscape is dominated by basins and valleys. It has a temperate oceanic climate,
suitable for cultivating a variety of crops, such as spring wheat, soybeans, and olives. Spring wheat and soybeans
are typically sown in spring and harvested in late summer, while the olives are planted in early summer and harvested
in mid-autumn.
• Kansas State in the USA: Kansas state is located in the middle of the USA, covering a total area of 213,096 km2 . It
is a major grain-producing region in the United States, with the highest wheat production in the country. Kansas has
large plains, suitable for extensive mechanized agricultural activities. The area has a temperate continental climate,
favoring growing winter wheat and corn. Winter wheat is sown in mid-September and harvested in late June to early
July of the following year. Corn is sown in mid-April and harvested in mid-October.
We collected Sentinel-2A and Sentinel-2B Level-2A Bottom of Atmosphere reflectance images (S2 L2A) from
Google Cloud, encompassing the three study areas for the period from January 2020 to December 2020. For cropland
extraction, we selectively utilized the Blue, Green, Red, and Near-Infrared (NIR) bands (10 m spatial resolution), along
with Narrow NIR, Red Edge (RE), and Short-Wave Infrared (SWIR) bands (20 m spatial resolution). Furthermore, we
used the Quality Assurance (QA) bands provided by the ESA on Google Earth Engine (GEE) as a reference to select
the images with cloud coverage less than 20% (Amani et al., 2020). The QA bands were also used to identify cloudy
regions in the remaining images, which were filled using the cloud-free images from adjacent time phases. Finally, all
SITS were composed of monthly images, where each month’s image was obtained by averaging all available images
during the month.
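For illustration, such monthly compositing could be sketched as follows, assuming the cloudy pixels have already been flagged with the QA bands; the function name and the (T, C, H, W) array layout are our assumptions, and the filling from adjacent time phases is left out.

import numpy as np

def monthly_composite(images, cloud_masks, months):
    """Average all available cloud-free observations within each month.

    images:      float array (T, C, H, W), all acquisitions of one year
    cloud_masks: bool array (T, H, W), True where a pixel is flagged cloudy (QA bands)
    months:      int array (T,), month index (1..12) of each acquisition
    Returns a (12, C, H, W) array; pixels with no clear observation in a month
    are left as NaN so they can later be filled from adjacent time phases.
    """
    T, C, H, W = images.shape
    composite = np.full((12, C, H, W), np.nan, dtype=np.float32)
    for m in range(1, 13):
        idx = np.where(months == m)[0]
        if idx.size == 0:
            continue
        imgs = images[idx]                     # (t_m, C, H, W)
        clear = ~cloud_masks[idx][:, None]     # (t_m, 1, H, W)
        counts = clear.sum(axis=0)             # clear observations per pixel
        sums = np.where(clear, imgs, 0.0).sum(axis=0)
        composite[m - 1] = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
    return composite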
• ESA World Cover (ESA) : It was developed as a part of the ESA WorldCover project, under the 5th Earth
Observation Envelope Program (EOEP-5) of the European Space Agency. The product was generated by SITS
from Sentinel-1 and Sentinel-2 with multiple random forest classifiers from 2020 to 2021. It contains 11 first-level
categories, and we used the ’Cultivated areas’ category from the 2020 product to obtain the labels. This category is
defined as land covered with annual cropland that is sown/planted and harvestable at least once within the 12 months
after the sowing/planting date. The annual cropland produces an herbaceous cover and is sometimes combined with
some tree or woody vegetation (Zanaga et al., 2022b).
• Esri Land Cover (Esri) : It is a 10-m resolution map of the Earth’s land surface, published by the Environmental
Systems Research Institute. This map was annually generated from 2017 to 2022 using the composite Sentinel-2
satellite images by deep learning models. The product divides the land cover into 9 categories, and we used the
’cropland’ category from the 2020 product to obtain the labels. This category includes crops, human-planted cereals, and other non-tree-height cultivated plants (Karra et al., 2021b).
• Dynamic World (DyWorld) : It was developed by Google and the World Resources Institute. It is a near real-time
global land use and cover dataset, updated in sync with the revisit cycle of the Sentinel-2 satellite. The product was
generated by deep learning models using available images from 2015 to 2023. We collated all available data from the various time phases in 2020 and assigned each pixel the land cover category that occurred most frequently across them (the mode). The product divides land cover into 9 categories, and we used the ’crop’ category to obtain the labels. This category is defined as human-planted/plotted cereals and other crops (Brown et al., 2022b).
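As an illustration of this mode compositing, a minimal sketch is given below; the function name, the class count, and the integer class codes are hypothetical placeholders for the DyWorld encoding.

import numpy as np

def yearly_mode_label(stack, num_classes=9):
    """Collapse a (T, H, W) stack of per-time-phase class maps into one label map
    by taking, for each pixel, the most frequently occurring class (the mode)."""
    counts = np.stack([(stack == c).sum(axis=0) for c in range(num_classes)])
    return counts.argmax(axis=0)   # (H, W) map of the modal class per pixel

# e.g. crop_mask = (yearly_mode_label(dyworld_2020) == CROP_CLASS_ID), where
# CROP_CLASS_ID stands for the integer code of the 'crop' category.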
To assess the accuracy and completeness of the cropland mapping obtained by the proposed method, we established
validation datasets for the three study areas. Random sampling alone may skew the validation datasets toward the particular cropland types prevalent in the study areas, potentially compromising the comprehensiveness and objectivity of the cropland extraction assessment in diverse environments. To address this issue, we introduced manual
intervention in the selection process for the validation area. Specifically, we classified the entire study area based on the
topographical features, and then utilized this result as a foundation to manually refine the sub-areas that were initially
selected through random sampling.
For the Hunan study area, which is predominantly covered by hills and mountains, we manually excluded certain
mountainous and hilly sub-areas and included sub-areas dominated by plains. For the Kansas study area, where
plains are the dominant landscape, we excluded some sub-areas of plains, and added sub-areas that include hills and
mountains. The validation labels of Hunan and Kansas study areas were obtained by visually interpreting the RGB
bands of Sentinel-2 time series images, assisted by high-resolution Google images. In the Southwest France study
area, we utilized the S4A crop classification datasets (Sykas et al., 2022) as a basis and modified them for validation
purposes. We initially checked all the labels, and selected sub-areas with relatively comprehensive cropland annotations
for further supplementation and adjustment. Subsequently, we manually selected the sub-areas to ensure a balance
among various types of cropland scenes. The sampling results of the final validation areas are shown in Figure 1.
Finally, in the Hunan study area, we labeled a total of 978,388 cropland fields, accounting for 18.00% of the
validation area. In the Southwest France study area, we identified 520,177 cropland fields, which covered 40.48% of
the validation area. In the Kansas study area, we identified 185,250 cropland fields, accounting for 48.24% of the
validation area. Note that, ’cropland fields’ refer to a piece of cropland separated from one another by identifiable
boundaries (FAO., 2010; Xu et al., 2024). Each image-level sample may include several cropland fields. Some typical
samples of each study area are shown in Figure 2.
3. Methodology
The proposed framework aims to effectively use the prior information from existing GLC products for large-scale
cropland mapping. By using this prior information, the model can learn the cropland phenology features from SITS
without manual labeling. As shown in Figure 3, the framework consists of three parts. (1) Labels collecting and
quality rating: We collect labels from GLC products, and evaluate their quality. These labels are then categorized
into high-quality and low-quality parts; (2) Construction of weakly supervised learning signal: we construct the
supervised part of the learning signal using high-quality labels, and encode the image intrinsic feature distribution to
construct the unsupervised part of the learning signal. By constructing the unsupervised part, we not only incorporate
the abundant information contained in the samples with low-quality labels into the model learning process, but also
prompt the model to assess and question the reliability of the high-quality labels. Additionally, we extend the weakly
supervised signal in the temporal dimension to sufficiently extract the phenology features of cropland. (3) Accuracy
assessment: we utilize the well-trained models for large-scale cropland mapping, and establish validation datasets to
evaluate their performance.
Specifically, the quality of each label is rated by checking the consistency of the cropland layers across the collected GLC products:

M_{high}(i,j) = \begin{cases} 1, & \text{if } P_1(i,j) = P_2(i,j) = \dots = P_M(i,j) \\ 0, & \text{otherwise} \end{cases}    (1)

Y(i,j) = M_{high}(i,j) * \frac{1}{M} \sum_{m=1}^{M} P_m(i,j)    (2)

where P_m(i,j) is the cropland label of the m-th product at position (i,j), M represents the total number of products, and Y(i,j) is the resulting high-quality reference label.
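A minimal sketch of Eqs. (1)-(2) is given below, assuming the cropland layers of the M products have been rasterized to a common binary grid; the function name is ours.

import numpy as np

def rate_label_quality(product_labels):
    """Eqs. (1)-(2): a pixel is rated 'high quality' only when all M GLC products
    agree on its cropland label; the reference label Y is the masked agreement.

    product_labels: binary array (M, H, W), 1 = cropland, 0 = non-cropland.
    Returns (m_high, y): the agreement mask and the resulting reference label.
    """
    m_high = (product_labels == product_labels[0]).all(axis=0).astype(np.float32)
    y = m_high * product_labels.mean(axis=0)   # 0 or 1 wherever m_high == 1
    return m_high, y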
Figure 3: The workflow of the proposed framework. (a) Labels collecting and quality rating, (b) Weakly supervised learning
signal construction, (c) Accuracy assessment.
The supervised part of the learning signal is then constructed from the high-quality labels and the model predictions:

Loss_{SL} = - \sum_{i=0}^{H} \sum_{j=0}^{W} \Big( Y(i,j) * \log\big( M_{high}(i,j) * \hat{Y}(i,j) \big) \Big)    (3)

where \hat{Y}(i,j) denotes the cropland probability predicted by the model at position (i,j).
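For illustration, a sketch of this masked supervised loss in PyTorch follows; the tensor layout is assumed, and the small epsilon is added only for numerical stability.

import torch

def supervised_loss(pred_prob, y, m_high, eps=1e-7):
    """Eq. (3): a cross-entropy-style penalty applied only where the multi-product
    agreement mask M_high is 1 (elsewhere y = 0, so the term vanishes).

    pred_prob: (B, H, W) predicted cropland probability
    y, m_high: (B, H, W) label and agreement mask from Eqs. (1)-(2)
    """
    log_p = torch.log(m_high * pred_prob + eps)
    return -(y * log_p).sum()   # in practice one may normalize by m_high.sum()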
Due to the quality limitation of the products, the high-quality labels may contain some errors. To avoid over-fitting
these errors, inspired by Hua et al. (2021) and Sabokrou et al. (2019), we construct the unsupervised part of the learning
signal as the regularization term to constrain the model. The construction is based on two assumptions: (1) In the visual
domain, two visually similar samples have a higher probability of belonging to the same semantical concept (Sabokrou et al., 2019), which means these samples are adjacent in the model’s feature space.
Figure 4: The flowchart to construct the supervised part of the learning signal.
Figure 5: The flowchart to construct the unsupervised part of the learning signal.
(2) In the spatial domain, the land
covers are continuous and aggregated (Jiang, 2015), which means the adjacent samples with the most similarities
should belong to the same category. The regularization term allows the model to enhance the stability of the feature
space during the optimization process, which can alleviate cognitive bias caused by the over-fitting of the remaining
errors.
To enhance the model’s generalization ability in large-scale cropland mapping, we also employ the unsupervised
part of the learning signal in the low-quality samples. Thus, the information from these samples can be involved in the
model optimization process, balancing the intra-class feature distribution of the training samples and improving the
diversity of the feature space.
In Figure 5, the feature space is represented by the fused intermediate feature maps obtained from the multi-temporal network. The sample x_n denotes the n-th pixel of an image mapped into this high-dimensional feature space, where N = W × H and n ∈ ℕ. In the visual domain, the samples x_n^s and x_n^d are identified as the samples with the highest similarity and the highest difference, respectively, to x_n; they are determined by searching within the same image using the Sorensen-Dice index. We then encourage the feature distance between x_n and x_n^s / x_n^d to be as small/large as possible during model optimization. In the spatial domain, given x_n^{sn} as the sample with the highest similarity to x_n among its eight-neighborhood samples, we encourage the model to minimize their feature distance. Ultimately, the constraints from both the visual and spatial domains are integrated to yield the unsupervised loss Loss_{USL}:
Loss_{USL} = \alpha \sum_{n=0}^{N} D_{KL}\big[F(x_n), F(x_n^s)\big] - \beta \sum_{n=0}^{N} D_{KL}\big[F(x_n), F(x_n^d)\big] + \gamma \sum_{n=0}^{N} D_{KL}\big[F(x_n), F(x_n^{sn})\big]    (4)
where α, β, and γ are a priori parameters that weight the importance of the different terms, F(·) denotes the representation of a sample in the model’s feature space, and D_{KL} represents the Kullback-Leibler divergence. Finally, the entire model is optimized by minimizing the weakly supervised loss Loss_{WS}, which combines the aforementioned supervised part (Loss_{SL}) and unsupervised part (Loss_{USL}).
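A sketch of Eq. (4) in PyTorch follows; it assumes the per-pixel feature vectors have already been normalized into distributions and that the indices of the most similar, most different, and most similar eight-neighborhood samples have been found beforehand (e.g., with the Sorensen-Dice index), so only the loss itself is shown.

import torch

def kl(p, q, eps=1e-7):
    # Kullback-Leibler divergence between per-pixel distributions, summed over channels
    return (p * ((p + eps) / (q + eps)).log()).sum(dim=-1)

def unsupervised_loss(f, idx_sim, idx_dif, idx_spatial, alpha=1.0, beta=1.0, gamma=1.0):
    """Eq. (4): pull each pixel towards its most similar samples (visual and spatial
    domains) and push it away from its most different sample in feature space.

    f: (N, C) normalized feature vectors of the N = H*W pixels of one image
    idx_sim, idx_dif, idx_spatial: (N,) long tensors indexing x_n^s, x_n^d, x_n^{sn}
    """
    return (alpha * kl(f, f[idx_sim])
            - beta * kl(f, f[idx_dif])
            + gamma * kl(f, f[idx_spatial])).sum()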
In the spatial-temporal encoding part, a multi-level convolutional encoder E^l maps the input SITS X into temporal feature maps e^l at each level:

e^l = \begin{cases} X, & \text{if } l = 0 \\ \big( E^l(e^{l-1}_t) \big)_{t=0}^{T}, & \text{for } l \in [1, L] \end{cases}    (6)

Temporal attention masks a^l are produced by an LTAE module at the deepest level and resized to the resolution of the other levels:

a^l = \begin{cases} \mathrm{LTAE}(e^L), & \text{if } l = L \\ \mathrm{resize}\big[\mathrm{LTAE}(e^L)\big], & \text{for } l \in [0, L-1] \end{cases}    (7)

The temporal dimension at each level is then collapsed into a fused feature map f^l by summing over time the attention-weighted features processed by a shared 1 × 1 convolution:

f^l = \sum_{t=0}^{T} \mathrm{Conv}^l_{1\times1}\big[a^l_t \odot e^l_t\big], \quad \text{for } l \in [0, L]    (8)
where Conv^l_{1×1} is a shared 1 × 1 convolution layer of width C^l and ⊙ is the term-wise multiplication with channel broadcasting.
In the Spatial-Temporal decoding part, a multi-level convolutional decoder D^l is used to generate single spatial-temporal feature maps at different scales. In detail, the decoded feature map of the deeper level is up-sampled by the transposed convolution D^l_up and concatenated channel-wise with the temporally compressed features f^l to obtain the multi-scale spatial-temporal feature maps d^l:

d^l = \begin{cases} f^L, & \text{if } l = L \\ D^l\big(D^l_{up}[d^{l+1}], f^l\big), & \text{for } l \in [0, L-1] \end{cases}    (9)
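To make the attention-based temporal fusion of Eqs. (7)-(8) concrete, a simplified PyTorch sketch is given below; the module name is ours, the LTAE itself is assumed to be computed elsewhere, and single-head attention with equal input/output widths is used for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalFusion(nn.Module):
    """Collapse the temporal dimension of one encoder level (Eq. 8) using attention
    masks computed at the deepest level and resized to this level (Eq. 7)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1x1 = nn.Conv2d(channels, channels, kernel_size=1)  # shared 1x1 conv

    def forward(self, e_l, a_low):
        # e_l:   (B, T, C, H, W) temporal feature maps of level l
        # a_low: (B, T, h, w)    attention masks from the deepest level L
        B, T, C, H, W = e_l.shape
        a_l = F.interpolate(a_low, size=(H, W), mode="bilinear", align_corners=False)
        a_l = a_l.unsqueeze(2)                           # (B, T, 1, H, W), broadcast over channels
        weighted = (a_l * e_l).reshape(B * T, C, H, W)   # a_t ⊙ e_t for every date
        out = self.conv1x1(weighted).reshape(B, T, C, H, W)
        return out.sum(dim=1)                            # Σ_t Conv1x1[a_t ⊙ e_t] -> (B, C, H, W)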
4. Experiment
4.1. Experiment setting
Experiments used the datasets collected from Hunan Province in China, Southwest France, and Kansas State in the USA. We cropped all the images into patches of 256 × 256 pixels. For the training dataset, we produced 32,318 patches in the Hunan study area, 29,893 in Southwest France, and 32,515 in Kansas. For the validation dataset, we obtained 15,484 patches within the Hunan study area, 6,152 in Southwest France, and 7,697 in Kansas. In the process of cropland mapping, we segmented all the images into multiple patches with a sliding stride of 128 pixels, and integrated the final result from the probabilistic predictions of the overlapping patches.
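The sliding-window mapping step could be sketched as follows; the averaging of overlapping probabilities and the predict_fn signature are our assumptions, and edge remainders are ignored for brevity.

import numpy as np

def sliding_window_predict(predict_fn, image, patch=256, stride=128, n_classes=2):
    """Tile a scene into overlapping patches, accumulate per-class probabilities,
    and integrate the final map from the averaged predictions.

    predict_fn: callable returning an (n_classes, patch, patch) probability array
    image:      (C, H, W) array; returns an (H, W) class map.
    """
    _, H, W = image.shape
    prob = np.zeros((n_classes, H, W), dtype=np.float32)
    count = np.zeros((H, W), dtype=np.float32)
    for top in range(0, max(H - patch, 0) + 1, stride):
        for left in range(0, max(W - patch, 0) + 1, stride):
            prob[:, top:top + patch, left:left + patch] += predict_fn(
                image[:, top:top + patch, left:left + patch])
            count[top:top + patch, left:left + patch] += 1
    # a full implementation would pad the scene so the right/bottom edges are covered
    return (prob / np.maximum(count, 1)).argmax(axis=0)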
All models were trained using PyTorch on the Ubuntu 16.04 operating system with an NVIDIA GTX3080 GPU (11-GB memory). Each model was trained using the Adam optimizer, with a batch size of 8 for 100 epochs. The learning rate was initially set to 1 × 10−3 and decreased to 1 × 10−4 for the last 50 epochs.
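A minimal training loop matching this configuration might look as follows; model, loader, and loss_fn are placeholders for the networks, data pipeline, and weakly supervised loss described above.

import torch

def train(model, loader, loss_fn, epochs=100):
    """Adam, initial learning rate 1e-3, dropped to 1e-4 for the last 50 epochs;
    the batch size (8) is assumed to be set in the data loader."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[epochs // 2], gamma=0.1)
    for _ in range(epochs):
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()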
Figure 6: Confusion matrices represented by area ratios for Hunan, Southwest France, and Kansas study areas
Table 2
The accuracy evaluation of our cropland mapping results across three study areas
Study Areas | OA(%) | mIoU(%) | Avg.F1-score(%) | Non-cropland PA(%) | Non-cropland UA(%) | Cropland PA(%) | Cropland UA(%)
Hunan, China | 86.26 | 65.86 | 77.91 | 90.09 | 92.94 | 68.82 | 60.38
Southwest France | 80.95 | 67.47 | 80.50 | 81.40 | 74.09 | 80.63 | 86.44
Kansas, USA | 88.43 | 79.15 | 88.36 | 92.81 | 85.95 | 83.72 | 91.56
Table 3
Cropland mapping accuracy of our framework and the cropland layers across the three GLC products.
Study Areas Products OA(%) mIoU(%) Avg.F1-score(%) Crop.F1-score(%) Non-crop.F1-score(%)
It is observable that, compared to the other study areas, the accuracy in the Hunan study area is relatively low. This is attributed
to the fact that the farming pattern in Hunan is predominantly smallholder, which is distinct from the large-scale
agricultural operations common in Europe and America. Meanwhile, the prevalence of hilly and mountainous terrain
in the Hunan region leads to smaller average field sizes, a more fragmented spatial distribution, and a greater diversity
of cropland types. These factors collectively render the classification of croplands in this area more challenging. We
also visualized the cropland mapping result in Figure 7(a), and selected typical samples of three terrain types (plains,
hills, and mountains) in each study area to demonstrate the details in Figure 7(b). Specifically, we determined the terrain types by analyzing the slope of each pixel using the Digital Elevation Model (DEM) from SRTM. All samples were
categorized into plains (0° to 2°), hills (2° to 6°), and mountains (greater than 6°). As shown in Figure 7, our framework
successfully extracts plain croplands, which tend to have relatively large average field size, exhibiting a clear distinction
between built-up areas and rivers. Hills croplands and mountain croplands exhibit relatively small average field sizes,
which typically exhibit a fragmented distribution mixed with other types of vegetation cover, and our framework is
also capable of accurately identifying their boundaries.
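For reference, the slope binning described above (plains 0° to 2°, hills 2° to 6°, mountains above 6°) can be expressed in a few lines; the function name is ours, and boundary handling follows numpy's digitize convention.

import numpy as np

def terrain_class(slope_deg):
    """Bin per-pixel slope in degrees (e.g., derived from the SRTM DEM) into
    0 = plain (0-2 deg), 1 = hill (2-6 deg), 2 = mountain (> 6 deg)."""
    return np.digitize(slope_deg, bins=[2.0, 6.0])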
Furthermore, we analyzed the results using key evaluation metrics for the plain croplands (PC), hill croplands
(HC), and mountain croplands (MC) in each study area (Figure 8). In the Hunan study area, the Avg.F1-score is 74.85%, 76.37%, and 72.84% for plain, hill, and mountain croplands, respectively; the variability is lower for hill croplands than for the other two. In the Southwest France study area, the Avg.F1-score is 81.41%, 71.00%, and 68.76% for plain, hill, and mountain croplands, respectively; the fluctuations are smaller for plain croplands than for hill and mountain croplands. In the Kansas study area, the Avg.F1-score is 79.75% for plain croplands, 82.29%
for hill croplands, and 83.17% for mountainous croplands. This counterintuitive phenomenon can be attributed to the
incorporation of temporal information. Under the SITS observations, in hilly and mountainous areas, other types of
vegetation exhibit more distinct phenological differences from croplands than in flat regions. Conversely, in the plains,
some croplands demonstrate similar phenological patterns to other vegetation covers, such as shrubs or grasses.
Figure 7: (a) Overall cropland mapping results for Hunan, Southwest France, and Kansas in 2020, (b) the image and
classification results for plain cropland (S1), hill cropland (S2), and mountain cropland (S3), respectively.
Figure 8: (a) Classification map of plain croplands (PC), hill croplands (HC), and mountain croplands (MC) in the validation
regions of the three study areas. (b) the boxplots of main evaluation metrics in different types of croplands. “×” denotes
the location of the average value.
Figure 9: Cropland mapping results of our method and the cropland layers of other GLC products.
Figure 10: The average accuracy of our framework and the cropland layers from ESA, Esri, and Dyworld in three study
areas.
supervised signals derived from high-quality labels to guide their learning process. We took the year-round composite
images as the base data for single-temporal network, and used the dense SITS for multi-temporal networks.
Re-correct method: We utilized the strategy from the RRE framework (Zhang et al., 2023) to construct the
comparison method, which is an automated solution for extracting high-resolution cropland through cross-scale sample
transfer. First, we trained a label corrector using only high-quality labels and multi-temporal images to obtain the labels
of low-quality samples. Then the corrected labels were used to generate supervised signals to guide the model learning
process.
Re-correct method with weakly supervised learning: We chose the WESUP-LCP (Chen et al., 2023) as the
comparison method, which is a weakly supervised semantic segmentation network for product resolution enhancement
based on the re-correct method. This method is also available for our task. Specifically, we used the same filtering
strategy as WESUP-LCP to identify high-quality sample points, which were then extended to pixels with low-quality
labels by the super-pixel method. Then, we took the super-pixel labels to construct the supervised learning signals and
used the deep dynamic label propagation mechanism to generate pseudo-labels for constructing weakly supervised
signals.
As the results in Table 4 show, our method achieved the best accuracy across most assessment metrics in all three
study areas. Our method outperformed other approaches by achieving improvements of 3.38%, 5.05%, and 0.58%
in mIoU, and 7.15%, 4.05%, and 0.33% in crop.F1-score for the Hunan, Southwest France, and Kansas study areas,
respectively. The qualitative visual results in Figure 11 also demonstrate that our method outperformed the others in
terms of completeness and the ability to capture detailed information across all three study areas.
In the Hunan study areas, our method (Figure 11. (a) and Figure 11. (b)) outperformed both the discard and re-
correct methods in mapping fragmented croplands with small average field sizes. The discard method limited the
model’s ability to learn the abundance of information from the region with low-quality labels, resulting in overfitting
to cropland features with large average field sizes and a failure to recognize fragmented croplands with small average
field sizes. The re-correct methods introduced errors into high-quality labels, which were then amplified during label
propagation, leading the model to misclassify other land covers as cropland. The WESUP-LCP was designed for
increasing label resolution and it was based on the re-correct method. Although it grasped more details than the other
methods, it was still unable to accurately identify fragmented croplands with small average field sizes.
Table 4
The classification accuracy of DeeplabV3+, Unet-3D, LSTM, U-TAE, RRE, WESUP-LCP and our method in the study
areas of Hunan, Southwest France, and Kansas.
In the Southwest France study area, our method (Figure 11. (c) and Figure 11. (d)) extracted croplands with
significantly different features. Other methods only identified the croplands that were not planted in a specific period
(showing yellow soil), but omitted the planted croplands in the same period. This is because different GLC products
present great inconsistency in the cropland regions with unusual phenological attributes, which makes it difficult to
obtain high-quality labels for these samples. For the discard methods, the absence of labels caused the model to solely
focus on typical cropland features, but lose the ability to recognize diverse cropland features. In addition, due to a lack
of reliable references, the re-correct methods may generate a lot of incorrect labels for cropland areas with unusual
phenological attributes. This consequently leads to the model overfitting inaccurate information.
In the Kansas study areas (Figure 11. (e) and Figure 11. (f)), the croplands have large average field sizes and share
similar features, facilitating the generation of a large number of high-quality labels. Therefore, both the discard and
re-correct methods performed well in these areas. Nevertheless, there is still the risk of misclassifying other vegetation
cover as cropland. In contrast, the proposed method was capable of accurately detecting the boundary between farmland
and other vegetation covers, such as lawns and shrubs.
Additionally, as shown in Table 4, the networks that incorporate multi-temporal information exhibit significant
advantages over the single-temporal network, indicating that using time-series information enables the model to
enhance its ability to distinguish croplands from other land covers. We further discussed the necessity of using temporal
information in Section 5.2.
5. Discussion
In this section, we conducted ablation experiments across all three study areas to comprehensively analyze the
input setting, and further discussed the limitations and effects of using temporal information under our framework.
We discussed four questions: the impacts of using different GLC product combinations as inputs, the temporal
Figure 11: The classification results of DeeplabV3+, Unet-3D, LSTM, U-TAE, RRE, WESUP-LCP, and our method in the
three study areas. (a) and (b) for Hunan, (c) and (d) from Southwest France, (e) and (f) for Kansas.
generalizability of our framework, the benefit of expanding the framework in the temporal dimension, and the
robustness of the proposed framework in real-world scenarios.
Table 5
The number/accuracy of high-quality labels and the accuracy of the final prediction results obtained by using different
combinations of GLC products as inputs.
Study Areas | Input products (of DyWorld, ESA, Esri) | Label Avg.F1-score | Label Ratio | Prediction Avg.F1-score
Hunan Province, China | two products | 78.87% | 67.99% | 74.69%
Hunan Province, China | two products | 75.33% | 72.02% | 72.99%
Hunan Province, China | two products | 79.10% | 70.50% | 75.80%
Hunan Province, China | all three products | 80.70% | 64.93% | 77.91%
Southwest France | two products | 84.68% | 75.77% | 79.37%
Southwest France | two products | 85.73% | 81.48% | 79.89%
Southwest France | two products | 77.28% | 90.66% | 73.49%
Southwest France | all three products | 87.56% | 73.99% | 80.50%
Kansas, USA | two products | 91.49% | 82.30% | 88.13%
Kansas, USA | two products | 86.87% | 81.12% | 85.48%
Kansas, USA | two products | 90.40% | 82.26% | 88.21%
Kansas, USA | all three products | 91.65% | 72.85% | 88.36%
Figure 12: The average accuracy of our framework following Direct Transfer (DT), Continue Training (CT), and the
accuracy of other GLC products in 2021 across three study areas.
in the three study areas, but the model’s temporal generalization ability remains unclear. To address this, we randomly
selected 1,000 samples of changed cropland from each of the three study areas, each with a size of 256 × 256
pixels. For those samples, we collected the corresponding SITS and GLC products for the year 2021, while manually
labeling them with the assistance of Google Earth images. Furthermore, we designed two sets of experiments to
demonstrate the temporal generalizability of our method: (1) Direct Transfer (DT): The model trained on the 2020
data was directly employed on the 2021 data without any modification; and (2) Continue Training (CT): The model
trained on the 2020 data was further trained using 2021 data and then applied to the 2021 data.
Figure 13: The classification accuracy of using seasonal composite images and whole year SITS in the three study areas.
As shown in Figure 12, the average accuracy of our framework after the DT operation does not exceed that of
the best GLC products. This is due to the changing phenological feature of cropland influenced by varying climatic
conditions and planting patterns between years, which results in the models trained on the 2020 data being unsuitable
for the 2021 data. However, our method does not require any manual labeling cost, which facilitates the incorporation
of new data for continued model training. This allows the model to progressively enhance the ability to recognize the
croplands with different phenological features. Therefore, our framework after the CT operation achieved superior and
more stable performance compared to other products. It outperformed the best-performing GLC product by 1.82%,
2.87%, and 2.10% in OA, mIoU, and F1-scores, respectively, across all study areas.
This result illustrates the limitations of our method regarding temporal generalizability when directly transferred.
However, these limitations can be addressed by further training the model using available data without labeling costs.
Figure 14: The t-SNE visualization results of seasonal composite images and whole year SITS in the three study areas.
that period. This similarity can lead to misidentification of oilseed rape as other vegetation types. As illustrated in
Figure 13, the extraction accuracy in the Hunan area remains relatively low during the autumn season.
Therefore, we extracted phenological features by integrating dense SITS and employed them to enhance the
separability of cropland within the model’s feature space, thereby augmenting the completeness of cropland extraction.
To further explore the benefit of the incorporation of multi-temporal information, we employed the t-distributed
stochastic neighbor embedding (t-SNE) method to visualize intermediate feature maps of the models trained with
seasonal composite images and whole-year SITS. As shown in Figure 14, the intermediate feature maps derived from
models using whole-year SITS demonstrated superior separability across all three study areas compared to those using
seasonal composite images. Specifically, regarding the entirety of cropland, the inclusion of multi-temporal information
can help the model to better distinguish between cropland and other land covers, thereby reducing feature confusion
related to planting status during specific periods. In terms of intra-cropland, multi-temporal information enables the
model to reduce intra-class feature dissimilarity within the croplands, facilitating the recognition of diverse croplands
encompassing various crop types simultaneously.
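The t-SNE inspection described above could be reproduced with a short script like the one below; the sampling size, feature layout, and plotting choices are ours and not from the original experiments.

import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_feature_separability(features, labels, sample=5000, seed=0):
    """Project per-pixel intermediate feature vectors to 2-D with t-SNE and color
    them by class to compare cropland/non-cropland separability (as in Figure 14).

    features: (N, D) array of feature vectors; labels: (N,) array of 0/1 classes.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(features), size=min(sample, len(features)), replace=False)
    emb = TSNE(n_components=2, init="pca", random_state=seed).fit_transform(features[idx])
    plt.scatter(emb[:, 0], emb[:, 1], c=labels[idx], s=2, cmap="coolwarm")
    plt.title("t-SNE of intermediate feature maps")
    plt.show()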
Figure 15: The accuracy of the proposed framework in situations of various spatial and temporal missing rates. The red
line in the color bar indicates the average accuracy of three GLC products in the corresponding study area.
The results in Figure 15 show that our framework exhibits considerable robustness to the data loss caused by
cloud cover in all three study areas. In the Hunan study area, compared to the Avg.F1-score of multiple GLC products
at 71.51%, our framework still exhibits better performance even with a spatial missing rate of 30% and a temporal
missing rate of 66.67%. In the Southwest France study area, compared with the Avg.F1-score of 75.09% for multiple
GLC products, our framework shows better performance under the situation of 10.00% spatial missing rate and 33.33%
temporal missing rate. In the Kansas study area, compared with the Avg.F1-score of 82.31% for multiple GLC products,
our framework has better performance under the situation of 20.00% spatial missing rate and 50.00% temporal missing
rate. These results demonstrate the strong feasibility of our framework in real-world data-missing scenarios. The
exceptional performance in the Hunan study area may be attributed to the training data being affected by the distinctive local climatic conditions: the model had already been exposed to data-missing situations during training, which allowed it to gain better adaptability in this case.
6. Conclusion
In this study, we proposed a weakly supervised framework for large-scale cropland mapping using multi-temporal
information. The framework uses the labels from existing GLC products and dense SITS to capture the diverse temporal
features of cropland influenced by crop phenology and human agricultural activities, without the need for manual
labeling. The approach enables the model to effectively utilize the information in low-quality labeled samples, while
avoiding over-fitting the residual errors in high-quality labeled samples. In the experiments across three study areas,
the proposed framework demonstrated superiority over the GLC products and outperformed traditional methods that rely on discard and re-correct strategies. Furthermore, we investigated the effects of the input setting and the temporal
generalizability of the proposed framework, while exploring the necessity of using multi-temporal information and the
robustness of our framework in a cloud cover scenario. Further efforts can be made to enhance the efficiency of the
proposed framework and reinforce the robustness of the model in real-world scenarios with missing information.
Acknowledgements
The work presented in this paper was supported by the National Natural Science Foundation of China (No.
42171376); the Distinguished Young Scholars under Grant 2022JJ10072; and the Open Fund of Xiangjiang Laboratory under Grant 22XJ03007.
References
Amani, M., Ghorbanian, A., Ahmadi, S.A., Kakooei, M., Moghimi, A., Mirmazloumi, S.M., Moghaddam, S.H.A., Mahdavi, S., Ghahremanloo,
M., Parsian, S., Wu, Q., Brisco, B., 2020. Google Earth Engine Cloud Computing Platform for Remote Sensing Big Data Applications: A
Comprehensive Review. IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 13, 5326–5350. https://ieeexplore.ieee.org/
document/9184118/.
Belgiu, M., Csillik, O., 2018. Sentinel-2 cropland mapping using pixel-based and object-based time-weighted dynamic time warping analysis.
Remote Sensing of Environment 204, 509–523. https://linkinghub.elsevier.com/retrieve/pii/S0034425717304686.
Brown, C.F., Brumby, S.P., Guzder-Williams, B., Birch, T., Hyde, S.B., Mazzariello, J., Czerwinski, W., Pasquarella, V.J., Haertel, R., Ilyushchenko,
S., Schwehr, K., Weisse, M., Stolle, F., Hanson, C., Guinan, O., Moore, R., Tait, A.M., 2022a. Dynamic World, Near real-time global 10 m land
use land cover mapping. Sci Data 9, 251. https://www.nature.com/articles/s41597-022-01307-4.
Brown, C.F., Brumby, S.P., Guzder-Williams, B., Birch, T., Hyde, S.B., Mazzariello, J., Czerwinski, W., Pasquarella, V.J., Haertel, R., Ilyushchenko,
S., et al., 2022b. Dynamic world, near real-time global 10 m land use land cover mapping. Scientific Data 9, 251.
Calvao, T., Pessoa, M., 2015. Remote sensing in food production-a review. Emir. J. Food Agric 27, 138. http://www.ejfa.me/index.php/
journal/article/view/652.
Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H., 2018. Encoder-Decoder with Atrous Separable Convolution for Semantic Image
Segmentation, in: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (Eds.), Computer Vision – ECCV 2018. Springer International Publishing,
Cham. volume 11211, pp. 833–851. https://link.springer.com/10.1007/978-3-030-01234-2_49.
Chen, Y., Zhang, G., Cui, H., Li, X., Hou, S., Ma, J., Li, Z., Li, H., Wang, H., 2023. A novel weakly supervised semantic segmentation
framework to improve the resolution of land cover product. ISPRS Journal of Photogrammetry and Remote Sensing 196, 73–92. https:
//linkinghub.elsevier.com/retrieve/pii/S0924271622003422.
Chi, M., Plaza, A., Benediktsson, J.A., Sun, Z., Shen, J., Zhu, Y., 2016. Big Data for Remote Sensing: Challenges and Opportunities. Proc. IEEE
104, 2207–2219. https://ieeexplore.ieee.org/document/7565634/.
Coluzzi, R., Imbrenda, V., Lanfredi, M., Simoniello, T., 2018. A first assessment of the Sentinel-2 Level 1-C cloud mask product to support
informed surface analyses. Remote Sensing of Environment 217, 426–443. https://linkinghub.elsevier.com/retrieve/pii/
S0034425718303742.
Copernicus Climate Change Service, 2019. Land cover classification gridded maps from 1992 to present derived from satellite observations.
https://cds.climate.copernicus.eu/doi/10.24381/cds.006f2c9a.
Defourny, P., Bontemps, S., Bellemans, N., Cara, C., Dedieu, G., Guzzonato, E., Hagolle, O., Inglada, J., Nicola, L., Rabaute, T., Savinaud, M.,
Udroiu, C., Valero, S., Bégué, A., Dejoux, J.F., El Harti, A., Ezzahar, J., Kussul, N., Labbassi, K., Lebourgeois, V., Miao, Z., Newby, T.,
Nyamugama, A., Salh, N., Shelestov, A., Simonneaux, V., Traore, P.S., Traore, S.S., Koetz, B., 2019. Near real-time agriculture monitoring
at national scale at parcel resolution: Performance assessment of the Sen2-Agri automated system in various cropping systems around the world.
Remote Sensing of Environment 221, 551–568. https://linkinghub.elsevier.com/retrieve/pii/S0034425718305145.
Deng, P., Xu, K., Huang, H., 2021. When cnns meet vision transformer: A joint framework for remote sensing scene classification. IEEE Geoscience
and Remote Sensing Letters 19, 1–5.
Do Nascimento Bendini, H., Garcia Fonseca, L.M., Schwieder, M., Sehn Körting, T., Rufin, P., Del Arco Sanches, I., Leitão, P.J., Hostert, P.,
2019. Detailed agricultural land classification in the Brazilian cerrado based on phenological information from dense satellite image time series.
International Journal of Applied Earth Observation and Geoinformation 82, 101872. https://linkinghub.elsevier.com/retrieve/pii/
S0303243418308961.
FAO., 2010. World programme for the census of agriculture.
Fare Garnot, V.S., Landrieu, L., 2021. Panoptic Segmentation of Satellite Image Time Series with Convolutional Temporal Attention Networks, in:
2021 IEEECVF Int. Conf. Comput. Vis. ICCV, IEEE, Montreal, QC, Canada. pp. 4852–4861. https://ieeexplore.ieee.org/document/
9711189/.
Food and Agriculture Organization of the United Nations (Ed.), 2005. A System of Integrated Agricultural Censuses and Surveys. Number 11 in
FAO Statistical Development Series, Food and Agriculture Organization of the United Nations, Rome.
Friedl, M., Sulla-Menashe, D., 2019. MCD12Q1 MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 500m SIN Grid V006. https:
//lpdaac.usgs.gov/products/mcd12q1v006/.
Garnot, V.S.F., Landrieu, L., 2020. Lightweight Temporal Self-attention for Classifying Satellite Images Time Series, in: Lemaire, V., Malinowski,
S., Bagnall, A., Guyet, T., Tavenard, R., Ifrim, G. (Eds.), Advanced Analytics and Learning on Temporal Data. Springer International Publishing,
Cham. volume 12588, pp. 171–181. http://link.springer.com/10.1007/978-3-030-65742-0_12.
Gong, P., Wang, J., Yu, L., Zhao, Y., Zhao, Y., Liang, L., Niu, Z., Huang, X., Fu, H., Liu, S., Li, C., Li, X., Fu, W., Liu, C., Xu, Y., Wang, X., Cheng,
Q., Hu, L., Yao, W., Zhang, H., Zhu, P., Zhao, Z., Zhang, H., Zheng, Y., Ji, L., Zhang, Y., Chen, H., Yan, A., Guo, J., Yu, L., Wang, L., Liu,
X., Shi, T., Zhu, M., Chen, Y., Yang, G., Tang, P., Xu, B., Giri, C., Clinton, N., Zhu, Z., Chen, J., Chen, J., 2013. Finer resolution observation
and monitoring of global land cover: First mapping results with Landsat TM and ETM+ data. International Journal of Remote Sensing 34,
2607–2654. https://www.tandfonline.com/doi/full/10.1080/01431161.2012.748992.
Hermosilla, T., Wulder, M.A., White, J.C., Coops, N.C., 2022. Land cover classification in an era of big and open data: Optimizing localized
implementation and training data selection to improve mapping outcomes. Remote Sensing of Environment 268, 112780. https://
linkinghub.elsevier.com/retrieve/pii/S0034425721005009.
Hua, Y., Marcos, D., Mou, L., Zhu, X.X., Tuia, D., 2021. Semantic segmentation of remote sensing images with sparse annotations. IEEE Geoscience
and Remote Sensing Letters 19, 1–5.
Huang, Y., Chen, Z.x., Yu, T., Huang, X.z., Gu, X.f., 2018. Agricultural remote sensing big data: Management and applications. Journal of
Integrative Agriculture 17, 1915–1931. https://linkinghub.elsevier.com/retrieve/pii/S2095311917618598.
Jiang, B., 2015. Geospatial analysis requires a different way of thinking: The problem of spatial heterogeneity. GeoJournal 80, 1–13. http:
//link.springer.com/10.1007/s10708-014-9537-y.
Karra, K., Kontgis, C., Statman-Weil, Z., Mazzariello, J.C., Mathis, M., Brumby, S.P., 2021a. Global land use / land cover with Sentinel 2 and deep
learning, in: 2021 IEEE Int. Geosci. Remote Sens. Symp. IGARSS, IEEE, Brussels, Belgium. pp. 4704–4707. https://ieeexplore.ieee.
org/document/9553499/.
Karra, K., Kontgis, C., Statman-Weil, Z., Mazzariello, J.C., Mathis, M., Brumby, S.P., 2021b. Global land use/land cover with sentinel 2 and deep
learning, in: 2021 IEEE international geoscience and remote sensing symposium IGARSS, IEEE. pp. 4704–4707.
Karthikeyan, L., Chawla, I., Mishra, A.K., 2020. A review of remote sensing applications in agriculture for food security: Crop growth and yield, irri-
gation, and crop losses. Journal of Hydrology 586, 124905. https://linkinghub.elsevier.com/retrieve/pii/S0022169420303656.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436–444. https://www.nature.com/articles/nature14539.
Lenczner, G., Chan-Hon-Tong, A., Le Saux, B., Luminari, N., Le Besnerais, G., 2022. Dial: Deep interactive and active learning for semantic
segmentation in remote sensing. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15, 3376–3389.
Li, C., Xian, G., Zhou, Q., Pengra, B.W., 2021. A novel automatic phenology learning (APL) method of training sample selection using
multiple datasets for time-series land cover mapping. Remote Sensing of Environment 266, 112670. https://linkinghub.elsevier.
com/retrieve/pii/S0034425721003904.
Li, H., Song, X.P., Hansen, M.C., Becker-Reshef, I., Adusei, B., Pickering, J., Wang, L., Wang, L., Lin, Z., Zalles, V., et al., 2023. Development of
a 10-m resolution maize and soybean map over china: Matching satellite-based crop classification with sample-based area estimation. Remote
Sensing of Environment 294, 113623.
Li, J., Huang, X., Gong, J., 2019. Deep neural network for remote-sensing image interpretation: Status and perspectives. Natl. Sci. Rev. 6, 1082–1086.
https://academic.oup.com/nsr/article/6/6/1082/5484863.
Li, K., Xu, E., 2020. Cropland data fusion and correction using spatial analysis techniques and the Google Earth Engine. GIScience & Remote
Sensing 57, 1026–1045. https://www.tandfonline.com/doi/full/10.1080/15481603.2020.1841489.
Liu, Y., Wu, Y., Chen, Z., Huang, M., Du, W., Chen, N., Xiao, C., 2022. A Novel Impervious Surface Extraction Method Based on Automatically
Generating Training Samples From Multisource Remote Sensing Products: A Case Study of Wuhan City, China. IEEE J. Sel. Top. Appl. Earth
Observations Remote Sensing 15, 6766–6780. https://ieeexplore.ieee.org/document/9854083/.
Naboureh, A., Li, A., Bian, J., Lei, G., 2023. National Scale Land Cover Classification Using the Semiautomatic High-Quality Reference Sample
Generation (HRSG) Method and an Adaptive Supervised Classification Scheme. IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing
16, 1858–1870. https://ieeexplore.ieee.org/document/10035401/.
Nanni, L., Ghidoni, S., Brahnam, S., 2017. Handcrafted vs. non-handcrafted features for computer vision classification. Pattern Recognition 71,
158–172. https://linkinghub.elsevier.com/retrieve/pii/S0031320317302224.
North, H.C., Pairman, D., Belliss, S.E., 2019. Boundary Delineation of Agricultural Fields in Multitemporal Satellite Imagery. IEEE J. Sel. Top.
Appl. Earth Observations Remote Sensing 12, 237–251. https://ieeexplore.ieee.org/document/8584043/.
Oliphant, A.J., Thenkabail, P.S., Teluguntla, P., Xiong, J., Gumma, M.K., Congalton, R.G., Yadav, K., 2019. Mapping cropland extent of Southeast
and Northeast Asia using multi-year time-series Landsat 30-m data using a random forest classifier on the Google Earth Engine Cloud.
International Journal of Applied Earth Observation and Geoinformation 81, 110–124. https://linkinghub.elsevier.com/retrieve/
pii/S0303243418307414.
Olofsson, P., Foody, G.M., Herold, M., Stehman, S.V., Woodcock, C.E., Wulder, M.A., 2014. Good practices for estimating area and assessing
accuracy of land change. Remote sensing of Environment 148, 42–57.
Pelletier, C., Valero, S., Inglada, J., Champion, N., Dedieu, G., 2016. Assessing the robustness of Random Forests to map land cover with high
resolution satellite image time series over large areas. Remote Sensing of Environment 187, 156–168. https://linkinghub.elsevier.
com/retrieve/pii/S0034425716303820.
Persello, C., Tolpekin, V., Bergado, J., de By, R., 2019. Delineation of agricultural fields in smallholder farms from satellite images using fully
convolutional networks and combinatorial grouping. Remote Sensing of Environment 231, 111253. https://linkinghub.elsevier.com/
retrieve/pii/S003442571930272X.
Prince, S.D., 2019. Challenges for remote sensing of the Sustainable Development Goal SDG 15.3.1 productivity indicator. Remote Sensing of
Environment 234, 111428. https://linkinghub.elsevier.com/retrieve/pii/S003442571930447X.
Rustowicz, R., Cheong, R., Wang, L., Ermon, S., Burke, M., Lobell, D., 2019. Semantic segmentation of crop type in africa: A novel dataset and
analysis of deep learning methods, in: CVPR Workshops, pp. 75–82. https://api.semanticscholar.org/CorpusID:198180478.
Sabokrou, M., Khalooei, M., Adeli, E., 2019. Self-Supervised Representation Learning via Neighborhood-Relational Encoding, in: 2019 IEEECVF
Int. Conf. Comput. Vis. ICCV, IEEE, Seoul, Korea (South). pp. 8009–8018. https://ieeexplore.ieee.org/document/9010354/.
Shi, X., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., Woo, W.c., 2015. Convolutional lstm network: A machine learning approach for precipitation
nowcasting. Advances in neural information processing systems 28. https://proceedings.neurips.cc/paper_files/paper/2015/
file/07563a3fe3bbe7e3ba84431ad9d055af-Paper.pdf.
Singh, G., Singh, S., Sethi, G., Sood, V., 2022. Deep Learning in the Mapping of Agricultural Land Use Using Sentinel-2 Satellite Data. Geographies
2, 691–700. https://www.mdpi.com/2673-7086/2/4/42.
Stinson, G., Magnussen, S., Boudewyn, P., Eichel, F., Russo, G., Cranny, M., Song, A., 2016. Canada, in: Vidal, C., Alberdi, I.A., Hernández Mateo,
L., Redmond, J.J. (Eds.), National Forest Inventories. Springer International Publishing, Cham, pp. 233–247. http://link.springer.com/
10.1007/978-3-319-44015-6_12.
Sun, Z., Di, L., Fang, H., 2019. Using long short-term memory recurrent neural network in land cover classification on Landsat and Cropland data
layer time series. International Journal of Remote Sensing 40, 593–614. https://www.tandfonline.com/doi/full/10.1080/01431161.
2018.1516313.
Sykas, D., Sdraka, M., Zografakis, D., Papoutsis, I., 2022. A Sentinel-2 Multiyear, Multicountry Benchmark Dataset for Crop Classification and
Segmentation With Deep Learning. IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 15, 3323–3339. https://ieeexplore.
ieee.org/document/9749916/.
Wagner, M.P., Oppelt, N., 2020. Extracting Agricultural Fields from Remote Sensing Imagery Using Graph-Based Growing Contours. Remote
Sensing 12, 1205. https://www.mdpi.com/2072-4292/12/7/1205.
Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X., 2022. Semi-Supervised Semantic Segmentation Using
Unreliable Pseudo-Labels, in: 2022 IEEECVF Conf. Comput. Vis. Pattern Recognit. CVPR, IEEE, New Orleans, LA, USA. pp. 4238–4247.
https://ieeexplore.ieee.org/document/9879387/.
Weiss, M., Jacob, F., Duveiller, G., 2020. Remote sensing for agricultural applications: A meta-review. Remote Sensing of Environment 236,
111402. https://linkinghub.elsevier.com/retrieve/pii/S0034425719304213.
Wulder, M., Li, Z., Campbell, E., White, J., Hobart, G., Hermosilla, T., Coops, N., 2018. A National Assessment of Wetland Status and Trends
for Canada’s Forested Ecosystems Using 33 Years of Earth Observation Satellite Data. Remote Sensing 10, 1623. http://www.mdpi.com/
2072-4292/10/10/1623.
Wulder, M.A., Dechka, J.A., Gillis, M.A., Luther, J.E., Hall, R.J., Beaudoin, A., Franklin, S.E., 2003. Operational mapping of the land cover of the
forested area of Canada with Landsat data: EOSD land cover program. The Forestry Chronicle 79, 1075–1083. http://pubs.cif-ifc.org/
doi/10.5558/tfc791075-6.
Xu, F., Yao, X., Zhang, K., Yang, H., Feng, Q., Li, Y., Yan, S., Gao, B., Li, S., Yang, J., et al., 2024. Deep learning in cropland field identification:
A review. Computers and Electronics in Agriculture 222, 109042.
Xu, Y., Yu, L., Zhao, Y., Feng, D., Cheng, Y., Cai, X., Gong, P., 2017. Monitoring cropland changes along the Nile River in Egypt over past three
decades (1984–2015) using remote sensing. International Journal of Remote Sensing 38, 4459–4480. https://www.tandfonline.com/doi/
full/10.1080/01431161.2017.1323285.
Yin, J., Dong, J., Hamm, N.A., Li, Z., Wang, J., Xing, H., Fu, P., 2021. Integrating remote sensing and geospatial big data for urban land use
mapping: A review. International Journal of Applied Earth Observation and Geoinformation 103, 102514. https://linkinghub.elsevier.
com/retrieve/pii/S030324342100221X.
Yu, L., Wang, J., Gong, P., 2013. Improving 30 m global land-cover map FROM-GLC with time series MODIS and auxiliary data sets: A
segmentation-based approach. International Journal of Remote Sensing 34, 5851–5867. https://www.tandfonline.com/doi/full/10.
1080/01431161.2013.798055.
Yue, A., Zhang, C., Yang, J., Su, W., Yun, W., Zhu, D., 2013. Texture extraction for object-oriented classification of high spatial resolution remotely
sensed images using a semivariogram. International Journal of Remote Sensing 34, 3736–3759. https://www.tandfonline.com/doi/
full/10.1080/01431161.2012.759298.
Zanaga, D., Van De Kerchove, R., Daems, D., De Keersmaecker, W., Brockmann, C., Kirches, G., Wevers, J., Cartus, O., Santoro, M., Fritz, S.,
Lesiv, M., Herold, M., Tsendbazar, N.E., Xu, P., Ramoino, F., Arino, O., 2022a. ESA WorldCover 10 m 2021 v200. https://zenodo.org/
record/7254220.
Zanaga, D., Van De Kerchove, R., Daems, D., De Keersmaecker, W., Brockmann, C., Kirches, G., Wevers, J., Cartus, O., Santoro, M., Fritz, S.,
et al., 2022b. Esa worldcover 10 m 2021 v200 .
Zhang, D., Pan, Y., Zhang, J., Hu, T., Zhao, J., Li, N., Chen, Q., 2020. A generalized approach based on convolutional neural networks for large area
cropland mapping at very high resolution. Remote Sensing of Environment 247, 111912. https://linkinghub.elsevier.com/retrieve/
pii/S0034425720302820.
Zhang, H.K., Roy, D.P., 2017. Using the 500 m MODIS land cover product to derive a consistent continental scale 30 m Landsat land cover
classification. Remote Sensing of Environment 197, 15–34. https://linkinghub.elsevier.com/retrieve/pii/S0034425717302249.
Zhang, W., Guo, S., Zhang, P., Xia, Z., Zhang, X., Lin, C., Tang, P., Fang, H., Du, P., 2023. A Novel Knowledge-Driven Automated Solution for
High-Resolution Cropland Extraction by Cross-Scale Sample Transfer. IEEE Trans. Geosci. Remote Sensing 61, 1–16. https://ieeexplore.
ieee.org/document/10197441/.
Zhong, L., Hu, L., Zhou, H., 2019. Deep learning based multi-temporal crop classification. Remote sensing of environment 221, 430–443.
Zhu, X.X., Tuia, D., Mou, L., Xia, G.S., Zhang, L., Xu, F., Fraundorfer, F., 2017. Deep Learning in Remote Sensing: A Comprehensive Review
and List of Resources. IEEE Geosci. Remote Sens. Mag. 5, 8–36. doi:10.1109/MGRS.2017.2762307.
Zhu, Z., Gallant, A.L., Woodcock, C.E., Pengra, B., Olofsson, P., Loveland, T.R., Jin, S., Dahal, D., Yang, L., Auch, R.F., 2016. Optimizing selection
of training and auxiliary data for operational land cover classification for the LCMAP initiative. ISPRS Journal of Photogrammetry and Remote
Sensing 122, 206–221. https://linkinghub.elsevier.com/retrieve/pii/S0924271616302829.