Review
Deep-Learning for Change Detection Using Multi-Modal Fusion
of Remote Sensing Images: A Review
Souad Saidi 1, Soufiane Idbraim 1, Younes Karmoude 1, Antoine Masse 2 and Manuel Arbelo 3,*
Abstract: Remote sensing images provide a valuable way to observe the Earth’s surface and iden-
tify objects from a satellite or airborne perspective. Researchers can gain a more comprehensive
understanding of the Earth’s surface by using a variety of heterogeneous data sources, including
multispectral, hyperspectral, radar, and multitemporal imagery. This abundance of different informa-
tion over a specified area offers an opportunity to significantly improve change detection tasks by
merging or fusing these sources. This review explores the application of deep learning for change
detection in remote sensing imagery, encompassing both homogeneous and heterogeneous scenes.
It delves into publicly available datasets specifically designed for this task, analyzes selected deep
learning models employed for change detection, and explores current challenges and trends in the
field, concluding with a look towards potential future developments.
Keywords: change detection; deep learning; remote sensing images; data fusion; multi-source;
multi-sensor; multi-modal
1. Introduction

Remote sensing captures Earth's surface data without direct contact. It employs sensors on satellites, airplanes, drones, or ground-based devices [1]. This non-invasive technique has significantly influenced geography, geology, agriculture, and environmental management [2]. It aids in investigating natural resources, the environment, and weather patterns, enabling informed decision-making for long-term growth [3].

With advancements in remote sensing technology, collecting diverse images using various sensors is now feasible, which improves our ability to analyze the Earth's surface. Thus, our review focuses on exploring deep learning (DL) methods for change detection through the fusion of multi-source remote sensing data, emphasizing their role in integrating information from different sensors. Optical remote sensing systems, including very-high-resolution (VHR) images and multispectral data, provide detailed views and valuable information for applications such as urban planning [4] and land cover mapping. VHR optical imagery offers excellent spatial resolution, while multispectral images enable analysis of vegetation health [5], water quality [6], and mineral exploration [7]. On the other hand, microwave remote sensing systems, particularly synthetic aperture radar (SAR), offer several unique advantages: penetrating cloud cover and providing data regardless of weather conditions or daylight [8].

Each system has its limitations. VHR optical images, despite their high spatial resolution, have a limited range of spectral bands, restricting their applicability in studies that require a broader spectrum of wavelengths [9]. Multispectral images are sensitive to
atmospheric interference and cloud cover, which can significantly impact data accuracy.
Moreover, due to the coherent structure of radar waves, SAR data frequently experience
speckle noise [10]. Relying on a single data type can result in incomplete or biased insights,
as each sensor captures different aspects of the observed environment. Also, in long-term studies, such as monitoring deforestation over decades, depending on a single dataset becomes impractical: a single archive rarely provides the necessary temporal depth, so multi-source data are required, for example combining Landsat (30 m, 1989) with Sentinel-2 (10 m, 2024).
To overcome these limitations, multi-source data fusion has emerged as a vital tech-
nique. It combines complementary information from multiple sensors to create a more
complete and reliable representation of the target area. Multi-source fusion improves
robustness by combining data from optical, SAR, LiDAR, and hyperspectral images, pro-
viding more detailed feature sets. This can improve the accuracy of land classification and
object detection [11].
One of the applications of multi-source data fusion in remote sensing is change
detection, which is the process of identifying and analyzing differences in the state of
an object or phenomenon by comparing images at different times. This technique is
essential for monitoring transformations in various fields, including urban planning [12],
environmental monitoring [13], and disaster management [14]. Multi-source data fusion
provides a robust approach to change detection, where detecting and analyzing changes
between images taken at different times is crucial. By combining data from diverse remote
sensing modalities, we can improve the adaptability and precision of change detection,
facilitating the discrimination of diverse change patterns [15].
Over the last several decades, researchers have developed numerous change detection
methods. Before deep learning, pixel-based classification methods [16–23] progressed
significantly. Most traditional approaches focused on identifying changed pixels and classi-
fying them to create change maps. Despite achieving notable performance on certain image
types, these methods frequently encountered limitations regarding accuracy and gener-
alization. Furthermore, their performance was dependent on the classifier and threshold
parameters used [24]. Few studies have focused on applying multi-source data fusion for
the change detection task [16,20–23].
In recent years, deep learning has revolutionized change detection tasks, primar-
ily when used for homogeneous data. For images acquired from the same sensor type
(e.g., optical-to-optical or SAR-to-SAR), DL models like convolutional neural networks
(CNNs) [25], recurrent neural networks (RNNs) [26], and generative adversarial networks
(GANs) [27] have significantly outperformed traditional methods. These models excel
at automatically extracting hierarchical features from raw data, eliminating the need for
manual feature engineering.
Moreover, deep learning techniques have made substantial strides in change detection
using multi-modal data fusion, enabling the effective integration of diverse remote sensing
data [28] and allowing researchers to gain a comprehensive and accurate understanding
of land cover, environmental changes, and natural disasters. By seamlessly integrating
diverse data sources, DL simplifies the creation of detailed depictions of what is happening
on Earth. It also enhances the efficiency and precision of fusing remote sensing data, con-
tributing to improved decision-making, environmental monitoring, and land management
practices. The capability to automatically extract meaningful insights from heterogeneous
data sources is a notable advancement in remote sensing data fusion.
Our state-of-the-art review distinguishes itself from those carried out so far (between 2022 and 2023) [29–32]. These studies organized the literature by the type of classification (supervised, unsupervised, or semi-supervised) [29], the type of deep learning model used (CNN, RNN, GAN, transformer, etc.) [30], the level of analysis (scene, region, super-pixel) [31], or the class of the model (UNet and non-UNet) [32]. Our approach, in contrast, is based on the nature of the satellite data available to the user. We consider the difference between homogeneous and heterogeneous data, such as optical
data at different scales or multi-modal data combining optical and SAR (Synthetic Aperture
Radar). This review guides users towards the most suitable approaches for their data. It offers
specific recommendations for multi-scale optical data (e.g., Landsat, Sentinel, WorldView-2)
and multi-modal optical-SAR data (e.g., Sentinel-1A, Sentinel-2A).
This review is structured as follows: Section 2 outlines the literature review method-
ology used to collect articles for this review, including the search strategy and selection
criteria. Section 3 presents the key findings obtained through statistical analysis of the data.
Section 4 explores various multi-modal datasets used in remote sensing change detection,
discussing their quality and limitations. Section 5 examines approaches and techniques
used for multi-modal data fusion to enhance change detection accuracy. Section 6 raises
future research trends and discussions. Finally, Section 7 concludes this survey. By adopting
this review structure, we aim to provide a comprehensive and accessible resource illus-
trating the transformative potential of data fusion in remote sensing. It presents valuable
insights for researchers and practitioners alike.
After the screening process, 120 studies were left for analysis in this survey. The complete selection process flow is shown in Figure 1.
4. Multi-modal Datasets
Datasets are crucial for the performance of DL models. They influence accuracy, which
measures prediction success. They also affect efficiency, reflecting speed and resource usage.
High-quality datasets improve the model’s reliability and enhance its capacity to generalize
to new data [34]. This section delves into three critical categories of remote sensing datasets
used in DL applications: single-source, multi-source, and multi-sensor data. Each of these
dataset categories presents its own unique challenges and opportunities. A summary of
available datasets for each category is provided in Table 2.
the target area or phenomenon [57]. For instance, the Bastrop dataset [44] focuses on observing
the effects of forest fires in Bastrop County, Texas, USA, using pre-event images from Landsat-
5 and post-event images from EO-1 ALI. Another example is the S2Looking dataset [41], a
collection of data spanning the years 2017 to 2020 from various satellites, including GaoFen
(GF), SuperView (SV), and Beijing-2 (BJ-2), specifically designed for satellite-side-looking
change detection.
Figure 4. Feature extraction strategy. (a) Early fusion; (b) Late fusion; (c) Multiple fusion.
Early fusion combines features at the input layer, leading to a unified input representation
before the DL model processes it, as shown in Figure 4a. Several studies, including [60–62],
explore this method. However, early fusion methods may only utilize partial information
for the change detection task, potentially impacting the detection performance [63].
Late fusion [64] combines features at the output layer, making a final decision using the
outputs of individual models trained on each modality as shown in Figure 4b. This method
allows each data source to be processed in a manner suited to its unique characteristics
before combining the extracted features. It is especially effective when the data sources
are heterogeneous.
Multiple fusion combines features from different stages of the DL model, allowing for
deeper information fusion as shown in Figure 4c. Recent studies, such as [12,63,65], have
demonstrated that multi-level fusion outperforms both early and late fusion by leveraging
the strengths of each. However, it is computationally complex and requires careful tuning.
This method is ideal for highly complex change detection tasks, where both broad and
detailed changes need to be captured.
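To make the distinction concrete, the following minimal PyTorch sketch contrasts an early-fusion model (the bi-temporal images are stacked along the channel axis before a single encoder) with a late-fusion model (each image is encoded separately and the features are combined at the decision stage). It is an illustration under assumed layer sizes, not a reproduction of any cited architecture.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Small convolutional feature extractor reused by both variants.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
    )

class EarlyFusionCD(nn.Module):
    """Early fusion: stack the two images along the channel axis, then encode once."""
    def __init__(self, in_ch=3, feat=32):
        super().__init__()
        self.encoder = conv_block(2 * in_ch, feat)
        self.head = nn.Conv2d(feat, 1, kernel_size=1)   # per-pixel change logit

    def forward(self, t1, t2):
        x = torch.cat([t1, t2], dim=1)                  # unified input representation
        return self.head(self.encoder(x))

class LateFusionCD(nn.Module):
    """Late fusion: encode each image separately, then combine the features."""
    def __init__(self, in_ch=3, feat=32):
        super().__init__()
        self.encoder = conv_block(in_ch, feat)          # shared-weight branch
        self.head = nn.Conv2d(2 * feat, 1, kernel_size=1)

    def forward(self, t1, t2):
        f1, f2 = self.encoder(t1), self.encoder(t2)     # independent feature extraction
        return self.head(torch.cat([f1, f2], dim=1))    # fuse at the decision stage

t1, t2 = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
print(EarlyFusionCD()(t1, t2).shape, LateFusionCD()(t1, t2).shape)
```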
Deep learning methods have adapted these fusion strategies through various archi-
tectural designs. Early fusion methods, also known as single-stream networks, often apply CNN architectures such as encoder–decoder models, as illustrated in Figure 5a. While they excel at
capturing the overall context, they may overlook subtle or minor changes. Furthermore, they
may struggle when dealing with noisy or irrelevant variations in the input images. Late or
multiple fusion usually utilizes Siamese network architectures (Figure 5b,c) [66]. This architec-
ture uses separate feature extraction branches with shared or unshared weights, extracting features independently from the input images. The branches merge after the convolutional layers have processed the inputs. Extracted features are fused using techniques like concatenation
or addition in some cases. In other cases, an attention mechanism is employed to focus on
informative elements. The fused features are fed into an algorithm that compares them and
produces a change map.
Siamese networks have a flexible general structure (Figure 5b) that can accommodate
a variety of models, including a Siamese feature extractor, feature fusion, and a decision-
making module. This adaptable architecture allows for diverse applications and task
flexibility. An alternative approach is a double-stream UNet structure (Figure 5c), where the encoder
processes each image separately, and fusion occurs through skip connections. This architec-
ture is more adept at managing multi-scale features than a traditional Siamese network.
However, it requires more computational resources and memory.
Figure 5. Structures of models. (a) Single Stream Network. (b) General Siamese network structure.
(c) Double-Stream UNet.
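A hedged sketch of the general Siamese structure just described: two feature extraction branches with shared (or optionally unshared) weights, fusion by absolute difference, and a small decision-making module that outputs the change map. All module names and sizes are illustrative assumptions, not taken from any specific published model.

```python
import torch
import torch.nn as nn

class SiameseCD(nn.Module):
    def __init__(self, in_ch=3, feat=64, share_weights=True):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(),
                nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(),
            )
        self.branch_a = branch()
        # Shared weights reuse the same module; unshared weights get a second copy
        # (useful when the two inputs have different characteristics).
        self.branch_b = self.branch_a if share_weights else branch()
        self.decision = nn.Sequential(                   # decision-making module
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, 1, kernel_size=1),           # change / no-change logit
        )

    def forward(self, t1, t2):
        f1, f2 = self.branch_a(t1), self.branch_b(t2)    # independent feature extraction
        fused = torch.abs(f1 - f2)                        # fusion by absolute difference
        return self.decision(fused)                       # dense change map

model = SiameseCD(share_weights=False)
change_logits = model(torch.randn(2, 3, 128, 128), torch.randn(2, 3, 128, 128))
print(change_logits.shape)  # (2, 1, 128, 128)
```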
5.2. Homogeneous-RSCD
Homogeneous RSCD (Hom-RSCD) involves analyzing data acquired by a single type of sensor. These data could be optical imagery captured by satellites, providing valuable
insights into changes in land cover over time. Hom-RSCD has various real-world appli-
cations across different fields. For example, it is utilized to monitor deforestation [67–71]
to identify areas with reduced forest cover caused by illegal logging or fires. In rapidly
urbanizing countries like China, Hom-RSCD is used to track urban expansion and the
conversion of agricultural land into urban areas [12,22,72–77]. Additionally, in Bangladesh,
it is employed for flood monitoring [78,79] by comparing pre- and post-event imagery
to assess flood impacts. These applications highlight the versatility and importance of
Hom-RSCD in addressing critical environmental issues.
Several DL approaches are making a powerful impact on Hom-RSCD:
5.2.1. CNN-Based
Standard CNNs
In recent years, CNNs have established themselves as a versatile approach for ex-
tracting information from remote sensing images for CD. Recent research has focused on
pushing the boundaries even further.
(MBSSCA) module refines these features by leveraging details from pre- and post-images.
The T-UNet outperforms other approaches like early fusion and Siamese networks.
5.2.3. RNN-Based
Change detection tasks are a good fit for RNNs, especially Long Short-Term Memory (LSTM) networks, since they can examine data sequences from various periods. Each RNN
cell considers both the current data and information about the past stored in its hidden
state, allowing the network to learn how data evolves over time. This makes them ef-
fective in identifying changes between multiple data periods. Various change detection
methods [72,114–119] employed LSTM as a temporal module. In [73], the authors combined
a UNet with a bidirectional LSTM (BiLSTM), an extension of the LSTM. The UNet extracts spatial features from input images with different capture times, and the BiLSTM then analyzes them to examine the temporal change pattern. Similarly, [120,121] also integrated
LSTM networks with a fully convolutional neural network (FCN).
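The spatial-then-temporal pattern described above can be sketched as follows: a small convolutional extractor (standing in for a UNet) produces per-date feature maps, and a bidirectional LSTM then scans each pixel's feature sequence across acquisition dates. This is only an illustration of the idea in [73] under assumed layer sizes, not a reproduction of that model.

```python
import torch
import torch.nn as nn

class SpatialTemporalCD(nn.Module):
    def __init__(self, in_ch=3, feat=32, hidden=32):
        super().__init__()
        # Stand-in for a UNet encoder: any per-image spatial feature extractor fits here.
        self.spatial = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(),
        )
        self.temporal = nn.LSTM(feat, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)   # change logit per pixel

    def forward(self, images):                 # images: (B, T, C, H, W), T acquisition dates
        b, t, c, h, w = images.shape
        feats = self.spatial(images.view(b * t, c, h, w)).view(b, t, -1, h, w)
        # Treat every pixel as an independent temporal sequence of feature vectors.
        seq = feats.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, -1)   # (B*H*W, T, feat)
        out, _ = self.temporal(seq)
        logits = self.head(out[:, -1])                                  # last time step
        return logits.view(b, h, w)

maps = SpatialTemporalCD()(torch.randn(1, 4, 3, 32, 32))
print(maps.shape)  # (1, 32, 32)
```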
5.2.4. Transformers
Building on the success of attention mechanisms in understanding relationships be-
tween images, researchers are now exploring transformers for even more powerful results.
Unlike attention mechanisms, which focus on specific image regions, transformers can ana-
lyze the entire image. This capability allows them to capture complex relationships between
pixels across different time points. When using ViTs for CD of VHR RSIs, there are two
strategies. In the first, temporal features are extracted by substituting ViTs for CNN backbones,
such as ChangeFormer [122], Pyramid-SCDFormer [123], FTN [124], SwinSUNet [125],
M-Swin [126], MGCDT [77,127], TCIANet [128], and EATDer [129]. In the second, ViTs are used not only for feature extraction but also for modeling temporal dependencies. BiT [130] leverages a transformer encoder to pinpoint changes and employs two Siamese decoders to create the change maps. The authors of [131] incorporated a token sampling strategy into the BiT framework to concentrate the model on the most beneficial areas. CTD-Former [132] pro-
poses a novel cross-temporal transformer to analyze interactions between images from
different times. Additionally, SCanFormer [133] offers a joint approach, modeling both
the semantic information and change information in a single model. Zhou et al. [134]
introduced the Dual Cross-Attention transformer (DCAT) method. This innovation lies in a
novel dual cross-attention block that leverages a dual branch that combines convolution
and transformer. Noman et al. [135] replaced conventional self-attention with a shuffled
sparse-attention mechanism, focusing on selective, informative regions to capture CD data
characteristics better. Additionally, they introduce a change-enhanced feature fusion (CEFF)
module, which fuses features from input image pairs through per-channel re-weighting,
enhancing relevant semantic changes and reducing noise.
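As a compact illustration of how transformers can model the temporal dependencies discussed above, the sketch below turns each date's feature map into a token sequence and lets tokens from one date attend to the other through cross-attention, loosely in the spirit of cross-temporal designs such as [132]. Dimensions and layer choices are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class CrossTemporalAttention(nn.Module):
    """Tokens from one date query the other date's tokens to expose changes."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat_t1, feat_t2):           # (B, C, H, W) feature maps
        b, c, h, w = feat_t1.shape
        tok1 = feat_t1.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence, date 1
        tok2 = feat_t2.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence, date 2
        # Queries come from date 1, keys/values from date 2 (symmetric use is possible).
        out, _ = self.attn(query=tok1, key=tok2, value=tok2)
        tok1 = self.norm(tok1 + out)                # residual + norm, transformer style
        return tok1.transpose(1, 2).reshape(b, c, h, w)

f1, f2 = torch.randn(1, 64, 16, 16), torch.randn(1, 64, 16, 16)
print(CrossTemporalAttention()(f1, f2).shape)  # (1, 64, 16, 16)
```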
Hybrid CNN-transformer models combine a strong ability to learn both local and global features within data, which makes them well suited to change detection. CNNs are experts at identifying specific details within images, while
transformers excel at determining how these details interconnect across the whole scene.
By merging these capabilities, we can more accurately detect and analyze changes in re-
mote sensing images over time. A lot of research uses this hybrid approach [75,136,137].
Wang et al. [138] introduce UVACD, which combines CNNs and transformers for change
detection. A CNN backbone is used to extract high-level semantic features, while trans-
formers are employed to capture the temporal information interaction for generating better
change features. The work of [139] employs a hybrid architecture (TransUNetCD). The
encoder in this architecture utilizes features extracted from CNNs and augments them with
global contextual information. These enhanced features are then upsampled and merged
with multi-scale features to generate global-local features for precise localization. Similarly,
to collect and aggregate multiscale context information from features of various sizes, the
CNN-transformer network MSCANet [140] presents a Multiscale Context Aggregator with
token encoders and decoders. In addition, several methods have begun to include attention
mechanisms in hybrid CNN-transformer networks. Authors in [141–143] integrate CBAM
to bridge the gap between different types of features extracted from the data. In [144], a
gated attention module (GAM) is employed in a layer-by-layer fashion. The work in [145]
incorporates multiple attention mechanisms at different levels. On the other hand, some
research employs transformer and CNN structures in parallel [146,147]. Tang et al. [148]
proposed WNet, which combines features from a Siamese CNN and a Siamese transformer
in the decoder. Furthermore, ACAHNet [149] combines CNN and transformer models
in a series-parallel manner to create an asymmetric cross-attention hierarchical network.
This reduces computational complexity and enhances interaction between the two models’
features. To try to capture multiscale local and global features, Feng et al. [150] use a
dual-branch CNN and transformer structure. They then employ cross-attention to fuse
the features. To dynamically integrate the interaction between the CNN and transformer branches, Fu et al. [151] built a semantic information aggregation module. One alternative approach
involves combining CNNs with Graph Neural Networks [152].
5.3. Heterogeneous-RSCD
Heterogeneous RSCD (Het-RSCD) breaks free from the limitations of a single sensor.
It can combine optical data from different resolutions or leverage the strengths of both
optical and SAR data. By combining diverse sources, Het-RSCD creates a more complete
view of Earth’s surface changes, resulting in better accuracy and robustness in change
detection tasks.
CNN-Based Methods
Remote sensing data are mostly image-based, and CNNs have shown impressive
success. In addition to their application to individual data sources, CNNs find application
in multi-scale optical change detection in several recent publications. As an early attempt,
Lv et al. [51] introduced a multi-scale convolutional module within the UNet model to
enhance change detection in heterogeneous images. Shao et al. [47] introduced a novel
approach called SUNet, which employs two distinct feature extractors to generate feature
maps from the two heterogeneous images. These extracted feature maps are then combined
and fed into the decoder. Additionally, SUNet [47] utilizes a Canny edge detector and
Hough transforms to extract edge auxiliary information from the heterogeneous two-phase
images. The study conducted by Wang et al. [43] proposes a novel Siamese network architecture with a hybrid convolutional feature extraction module for change detection based on multi-sensor remote sensing images.
GAN-Based Methods
GANs have emerged as a powerful tool in deep learning. These fascinating archi-
tectures consist of two separate neural networks: a generator and a discriminator. The
generator always aims to create realistic data samples, while the discriminator attempts
to differentiate real data from the generator's creations. This ongoing competition leads to the
generator learning to produce increasingly high-quality outputs that closely resemble
real data.
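Formally, this two-player game corresponds to the standard minimax objective of [27], in which the discriminator D is trained to assign high scores to real samples while the generator G tries to fool it:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]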
The ability of GANs to generate high-resolution (HR) images from lower-resolution
(LR) inputs holds immense potential for Het-RSCD. As Het-RSCD depends on data from
multiple sensors, these sensors may have varying resolutions. LR data can lack important
details for accurate change detection. GANs can help by employing super-resolution
strategies, as shown in Figure 6.
Super-resolution (SR) plays a crucial role in multi-modal change detection (CD) by
enhancing the resolution of low-resolution (LR) images. This enhancement allows for more
accurate and detailed analysis of changes. SR techniques lend themselves to both individual
data modalities and fused images that combine information from multiple modalities.
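A minimal sketch of the super-resolve-then-compare idea: a small learned upsampler (a sub-pixel convolution here) brings the low-resolution image onto the high-resolution grid before a standard comparison step. The fixed 4x ratio, layer sizes, and the crude difference map are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class SimpleSR(nn.Module):
    """Learned x4 upsampler (sub-pixel convolution); a stand-in for a full SR network."""
    def __init__(self, in_ch=3, scale=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, in_ch * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),                 # rearranges channels into spatial detail
        )

    def forward(self, lr):
        return self.body(lr)

sr = SimpleSR()
lr_t1 = torch.randn(1, 3, 32, 32)    # low-resolution pre-change image
hr_t2 = torch.randn(1, 3, 128, 128)  # high-resolution post-change image
sr_t1 = sr(lr_t1)                    # both images now share the 128 x 128 grid
diff = (sr_t1 - hr_t2).abs().mean(dim=1, keepdim=True)  # crude change magnitude map
print(sr_t1.shape, diff.shape)
```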
Prexl et al. [157] proposed an unsupervised CD approach
that extends the DCVA framework to handle pre- and post-change imagery with different
spatial resolutions and spectral bands. The approach employs a self-supervised SR method
to enhance lower-resolution images and a set of trainable convolution layers to address
spectral differences. The SR stage proposed in MF-SRCDNet [158] comprises an image transformation network and a loss network module. This method leverages the strengths of residual networks and UNet, using Res-UNet for image transformation and VGG-16 for the loss computation. It is followed by a multi-feature fusion strategy that extracts Harris-LSD visual features, morphological building index (MBI) features, and non-maximum suppressed Sobel (NMS-Sobel) features. Finally, a change detection module uses a modified
STANet-PAM model with a Siamese structure, enhancing the detection of building changes
using spatial attention mechanisms.
Transformers
Transformers have become increasingly popular in computer vision [159], including
change detection. This rise in popularity follows their success in natural language process-
ing [160]. In 2022, many new transformer-based models were published, especially for handling heterogeneous data sources.
The MM-Trans [161] involves a multi-modal transformer framework. It initially
extracts features from bi-temporal images of varying resolutions using a Siamese feature
extractor (ResNet18) with unshared weights. Next, with the help of a token loss, a spatial-
aligned transformer (sp-Trans or SPT) is utilized to learn and shrink these bi-temporal
features to a constant size. To enhance interaction and alignment, a semantic-aligned transformer is then applied to the high-level bi-temporal features. Ultimately, a prediction head is used to produce the final change map.
The STCD-Former [162] is a pure transformer model consisting of a spectral token
transformer and a spectral token guidance spatial transformer. It encodes bi-temporal
images, generates spectral tokens, and learns change rules. It includes a difference amplifi-
cation module for discriminative features and an MLP for binary CD results.
Lastly, SILI [163] is an object-based method that utilizes a ResNet-18 Siamese CNN
backbone to extract multilevel features from bi-temporal images. Local window self-
attention establishes a feature interaction at different levels, capturing spatial-temporal
correlations rather than encoding images independently. This process improves feature
alignment by considering local texture variances. The refined features, obtained through
a transformer encoder, contribute to enhanced feature extraction. The decoder utilizes
implicit neural representation (INR) and coordinate information to generate a change map.
Multi-Model Combinations
The use of multi-model deep learning networks for multi-scale optical CD remains
limited, potentially due to the challenges of data fusion and network architecture design.
Moreover, convolutional multiple-layer recurrent neural networks have also been proposed for CD with multi-sensor images. Chen et al. [164] proposed an innovative and universal deep Siamese convolutional multiple-layer recurrent neural network (SiamCRNN), which combines the benefits of RNNs and CNNs. Its overall structure consists of three highly connected sub-networks with a clear division of labor, used to extract image features, mine change information, and predict change probability. The M3 Fusion [165]
uses a two-branch network. The CNN branch extracts patch-based features from a SPOT
6/7 image, and the RNN branch extracts temporal information from Sentinel-2 time-series
images. The extracted features are the input for three classifiers, with two independent
classifiers and a third applied to the fused features.
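The two-branch idea can be sketched as follows: a CNN branch summarizes a high-resolution patch while a recurrent branch (a GRU here) summarizes the coarser time series for the same location, and the two summaries are concatenated before classification. This follows the spirit of [165], but every layer choice and dimension is an illustrative assumption.

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    def __init__(self, hr_ch=3, ts_bands=10, feat=64, n_classes=8):
        super().__init__()
        self.cnn = nn.Sequential(                        # patch branch (e.g., VHR image)
            nn.Conv2d(hr_ch, feat, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.rnn = nn.GRU(ts_bands, feat, batch_first=True)   # time-series branch
        self.classifier = nn.Linear(2 * feat, n_classes)

    def forward(self, patch, series):                    # patch: (B,C,H,W); series: (B,T,bands)
        f_patch = self.cnn(patch)                        # (B, feat) patch summary
        _, h = self.rnn(series)                          # h: (1, B, feat) final hidden state
        f_series = h.squeeze(0)
        return self.classifier(torch.cat([f_patch, f_series], dim=1))

logits = TwoBranchFusion()(torch.randn(4, 3, 25, 25), torch.randn(4, 12, 10))
print(logits.shape)  # (4, 8)
```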
CNN-Based Methods
Encoder–decoder architectures, leveraging the power of CNNs, extract features from
multi-source data at various resolutions. These features are then compressed into a latent
representation, effectively capturing the core of the changes. The decoder utilizes this latent
representation to reconstruct an image, highlighting the areas where changes have occurred.
Early fusion methods, like M-UNet [51], employ multiscale convolutional modules
within the UNet architecture to enhance change detection in heterogeneous images contain-
ing data from multiple sensors. More recent advancements include multi-modal Siamese
architectures, such as the one proposed by Ebel et al. [166]. In this approach, two separate
encoder branches process SAR and optical data individually. A multi-scale decoder then
combines the extracted features from these branches to create a more comprehensive un-
derstanding of the changes. Similar to this, research by Hafner et al. [167] utilizes separate
UNet models for SAR and optical data before fusing the extracted features at the final stage.
In contrast to other research, which primarily employed pseudo-Siamese networks to ex-
tract features, [168] utilized two distinct encoder networks. Specifically, ResNet50 was used
for optical data, while EfficientNet-B2 was used for SAR data. Finally, the MSCDUNet [169]
architecture utilizes a pseudo-Siamese UNet++ structure. Each branch independently
processes SAR and multispectral optical data using a UNet++ network to extract features.
These features are then fused, and a deep supervision module leverages information from
both branches to generate accurate change maps.
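A hedged sketch of this dual-branch design: one encoder per modality (e.g., a two-band SAR input and a four-band optical input), with the extracted features concatenated and passed to a shared decoder that predicts the change map. It is illustrative only and not a reproduction of [166] or [167]; channel counts and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

def encoder(in_ch, feat):
    return nn.Sequential(
        nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(feat, feat * 2, 3, padding=1), nn.ReLU(),
    )

class SarOpticalFusionCD(nn.Module):
    """Separate encoders per modality; features are fused before a shared decoder."""
    def __init__(self, sar_ch=2, opt_ch=4, feat=32):
        super().__init__()
        self.sar_enc = encoder(sar_ch, feat)    # e.g., SAR backscatter bands
        self.opt_enc = encoder(opt_ch, feat)    # e.g., multispectral optical bands
        self.decoder = nn.Sequential(
            nn.Conv2d(4 * feat, feat, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(feat, 1, 1),              # change logit per pixel
        )

    def forward(self, sar, opt):
        fused = torch.cat([self.sar_enc(sar), self.opt_enc(opt)], dim=1)
        return self.decoder(fused)

out = SarOpticalFusionCD()(torch.randn(1, 2, 64, 64), torch.randn(1, 4, 64, 64))
print(out.shape)  # (1, 1, 64, 64)
```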
Alternatively, autoencoders significantly improve change detection (CD) with multi-
source data by learning a unified latent space representation for data from different sources.
Autoencoders handle differences between data sources (like sensor types) by finding com-
mon patterns. This lets the model identify changes regardless of the source and works
well even with entirely new data sources. This makes them ideal for unsupervised change
detection tasks, which are a perfect fit for domain adaptation methods that improve per-
formance across different data distributions. DSDANet [170] stands as the first method to
introduce unsupervised domain adaptation into change detection. The DAMSCDNet [171]
suggests a domain adaptation-based network to handle optical and SAR images, which employs feature-level transformation to align unstable deep feature spaces. To align similar pixels from input images and minimize the impact of changed pixels, the authors in [172] combined autoencoders and domain-specific affinity matrices. CAE [173] proposes an unsupervised change detection method that contains only a convolutional autoencoder for feature extraction and a commonality autoencoder for exploring commonalities.
Farahani et al. [174] propose an autoencoder-based technique to achieve fusion of features
from SAR and optical data. This method aligns multi-temporal images by reducing spectral
and radiometric differences, making features more similar, and improving accuracy in CD.
Additionally, domain adaptation with an unsupervised autoencoder (LEAE) helps discover
a shared feature space between heterogeneous images, further enhancing the fusion process.
The DHFF method [175] is an unsupervised CD approach that utilizes image style transfer (IST) to achieve homogeneous transformation. The model separates semantic content and style features extracted from the images using a VGG network. An iterative IST (IIST) strategy is employed, which iteratively minimizes a cost function to achieve feature homogeneity.
A novel topology-coupling-based heterogeneous network called TSCNet [36] introduces
wavelet transform, channel, and spatial attention methods in addition to transforming
the feature space of heterogeneous images utilizing an encoder–decoder structure. Touati
et al. [176] introduced a novel approach for detecting anomalies in image pairs using a
stacked sparse autoencoder. The method works by encoding the input image into a latent
space, computing reconstruction errors based on the L2 norm. It then generates a classification map indicating changed and unchanged regions by grouping the reconstruction errors using a Gaussian mixture model. Zheng et al. [177] introduced a cross-resolution differ-
ence to detect changes in images with distinct resolutions. They segmented images into
homogeneous regions and used a CDNN with two autoencoders to extract deep features.
They defined a distance to assess semantic links, computed pixel-wise difference maps, and
merged them to generate a final change map.
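A deliberately simplified sketch of the shared-latent-space idea discussed above: each modality gets its own convolutional autoencoder, a reconstruction loss keeps the codes informative, and an alignment term pulls the two latent spaces together so that latent distances can serve as an unsupervised change score. Methods such as [172,173] additionally restrict the alignment to pixels that are likely unchanged (e.g., via affinity matrices), which this toy example omits; the loss weight and layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    def __init__(self, in_ch, latent=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, latent, 3, padding=1))
        self.dec = nn.Sequential(nn.ReLU(), nn.Conv2d(latent, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, in_ch, 3, padding=1))

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

# One autoencoder per modality; training pushes their latent spaces together.
ae_opt, ae_sar = ConvAE(in_ch=4), ConvAE(in_ch=2)
opt, sar = torch.randn(1, 4, 64, 64), torch.randn(1, 2, 64, 64)
z_opt, rec_opt = ae_opt(opt)
z_sar, rec_sar = ae_sar(sar)

recon_loss = nn.functional.mse_loss(rec_opt, opt) + nn.functional.mse_loss(rec_sar, sar)
align_loss = nn.functional.mse_loss(z_opt, z_sar)   # crude latent-space alignment term
loss = recon_loss + 0.1 * align_loss                # weight is an arbitrary choice

# After training, per-pixel latent distance serves as an unsupervised change score.
change_score = (z_opt - z_sar).pow(2).sum(dim=1).sqrt()
print(change_score.shape)  # (1, 64, 64)
```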
Transformers
CNNs have historically been used for CD across optical and SAR images by mapping
both images into a common domain for comparison. CNNs, however, have difficulty
identifying long-range dependencies in the data. A recent study by Wei et al. [178] suggests
a solution to this issue by utilizing transformers. Even though the features acquired from
each type of image are derived from distinct sensors, their Cross-Mapping Network (CM-
Net) uses transformers to discover correlations between them. As a result, CM-Net can
build a common representation space that is stronger and more reliable, enabling more
precise change detection. Another approach is mSwinUNet [179], which utilizes a Swin
transformer-based architecture to directly capture global semantic information from SAR
and optical images. This method splits images into patches, encodes them with positional
information, and employs a self-attention mechanism to learn global dependencies.
GAN-Based Methods
In remote sensing applications, GANs have become an effective tool for utilizing the
complementary information of optical and synthetic aperture radar (SAR) data. Studies
like [42,45,180–182] have successfully employed GAN-based image translation to enable
the use of established optical CD methods on SAR data. For instance, Saha et al. [45] utilize
a CycleGAN model for transcoding between different data domains. Deep features are ex-
tracted using an encoder–transformer–decoder architecture. In the same way, DTCDN [55]
employs a cyclic structure to map images from one domain to another, effectively translat-
ing them into a shared feature space. The translated images are then fed into a supervised
CD network. It leverages deep context features to identify and classify changes across
different sensor modalities. Research by [180] translated SAR images into “optical-like”
representations, enabling the use of established burn detection methods on post-fire SAR
imagery. Similarly, [182] proposed a Deep Adaptation-based Change Detection Technique
(DACDT) that utilizes image translation via an optimized UNet++ model to improve CD in
challenging weather conditions. However, limitations exist with separate image translation
and CD steps. Works like [183,184] address this by proposing frameworks that integrate
both tasks within a single deep-learning architecture. Du et al. [183] introduced a Multitask
Change Detection Network (MTCDN) that utilizes a concatenated GAN structure with
separate generators and discriminators for optical and SAR domains. In contrast, [184]
presented a Twin-Depthwise Separable Convolution Connect Network (TDSCCNet) that
employs CycleGAN for front-end image domain transformation. Additionally, it uses a
single-branch encoder–decoder for change feature extraction in the back-end. Recently,
EO-GAN [185] employed edge information for indirect image translation via a cGAN. It
extracts edges and reconstructs the corresponding optical image from a SAR image based
on those edges. To further improve the learning process, a super-pixel method helps the
network build a link between edge changes and actual content changes.
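To make the translate-then-compare pipeline concrete, the sketch below pairs a toy SAR-to-optical generator with the cycle-consistency constraint used by CycleGAN-style methods [42,45,55], and then compares the translated image with the real optical one. The adversarial discriminators and the training loop are omitted for brevity, and all shapes and layer choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Translator(nn.Module):
    """Tiny generator mapping one modality into the value range of the other."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_ch, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)

g_s2o = Translator(2, 4)   # SAR -> optical-like
g_o2s = Translator(4, 2)   # optical -> SAR-like (needed for the cycle constraint)

opt_t1 = torch.randn(1, 4, 64, 64)   # pre-change optical image
sar_t2 = torch.randn(1, 2, 64, 64)   # post-change SAR image

fake_opt = g_s2o(sar_t2)
cycle = g_o2s(fake_opt)
cycle_loss = nn.functional.l1_loss(cycle, sar_t2)   # cyclic consistency term only

# Once translation is learned, a standard homogeneous CD step can be applied.
change_magnitude = (fake_opt - opt_t1).abs().mean(dim=1)
print(cycle_loss.item() >= 0, change_magnitude.shape)
```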
6. Discussion
The growing variety of remote sensing images has brought new challenges to RSCD,
including analyzing changes between images of different resolutions and sources. Due
to the limited availability of data in many CD scenarios, the occurrence of different-resolution change detection (DRCD) tasks is becoming increasingly unavoidable. For example, in regions that experience regular
rainfall, floods, or storms, generating images with the same spatial resolutions over a long
period poses considerable difficulties for annual land cover change monitoring. These
scenarios show the inefficiency of the typical CD method built for bi-temporal images with
similar spatial resolution.
Deep learning’s ability to learn autonomously from complex data has made it a popular
choice for CD. However, the type of imagery used represents a major challenge. In its
early stages, the field has focused on scenarios with homogeneous images. This simplifies
CD, as the focus is only on identifying changes within the same data type. However, this
approach has its limitations, as real-world scenarios often involve heterogeneous images.
These images come from a variety of sources, such as optical and radar sensors, and have
distinct characteristics.
Table 3. Performance of homogeneous RSCD methods on different datasets.

Method Name/Ref | Network Structure | DataSet | F1 (%) | Precision (%) | OA (%)
DSMS-FCN [82] | Siamese UNet | SZTAKI-Szada | 52.78 | 57.72 | 94.57
DSMS-FCN [82] | Siamese UNet | SZTAKI-Tiszadob | 89.18 | 88.86 | 96.20
ESCNet [85] | Siamese UNet | SZTAKI-Tiszadob | 76.33 | 74.56 | 93.95
ESCNet [85] | Siamese UNet | SZTAKI-Szada | 48.89 | 53.73 | 94.07
RFNet [86] | Siamese CNN | WHU-CD | 95.72 | 92.49 | -
SMD-Net [87] | Siamese UNet | CDD | 96.6 | 97 | 99.3
SMD-Net [87] | Siamese UNet | BCDD | 94.80 | 94.33 | 99.48
SMD-Net [87] | Siamese UNet | OSCD | 96.6 | 97.0 | 99.3
SSCFNet [89] | Siamese UNet | LEVIR-CD | 93.71 | 95.31 | -
SSCFNet [89] | Siamese UNet | SZTAKI | 96.54 | 96.58 | -
Siam-FAUNet [88] | Siamese UNet | CDD | 95.62 | 94.58 | 98.14
Siam-FAUNet [88] | Siamese UNet | WHU-CD | 44.47 | 55.50 | 94.95
DASNet [104] | Siamese UNet + Attention | CDD | 92.2 | 92.7 | 98.2
DifUNet++ [92] | Siamese UNet++ | SVCD | 92.15 | 92.37 | -
DifUNet++ [92] | Siamese UNet++ | LEVIR-CD | 92.15 | 89.6 | -
SNUNet-CD [93] | Siamese UNet++ | CDD | 96.3 | 96.2 | -
TCDNet [94] | Siamese CNN | Google Earth | 71.18 | - | -
SSJLN [95] | Siamese CNN | GF-1 Data | - | 94.94 | -
SSJLN [95] | Siamese CNN | EMT+ Data | - | 98.75 | -
SAM-CD [98] | Siamese CNN | LEVIR-CD | 95.87 | 95.50 | 99.14
SAM-CD [98] | Siamese CNN | CLCD | 88.25 | 86.89 | 96.26
SAM-CD [98] | Siamese CNN | WHU-CD | 97.97 | 97.58 | 99.60
SAM-CD [98] | Siamese CNN | S2Looking | 72.80 | 65.13 | -
NestNet [91] | Siamese UNet++, Attention | CDD | 88.26 | 88.62 | -
NestNet [91] | Siamese UNet++, Attention | OSCD | 49.01 | 49.32 | -
HARNU-Net [103] | Siamese UNet, Attention | CDD | 97.10 | 97.20 | 99.34
AFSNet [101] | Siamese UNet, Attention | CDD | 98.44 | 95.56 | 98.94
CANet [105] | Siamese UNet, Attention | CDD | 93.2 | 93.2 | 98.4
PGA-SiamNet [46] | Siamese UNet, Attention | EV-CD building | 94.01 | 91.74 | 99.68
MFPNet [186] | Siamese UNet, Attention | SVCD | 97.54 | - |
MFPNet [186] | Siamese UNet, Attention | Zhang dataset | 68.45 | - |
MAFF-Net [107] | Siamese UNet, Attention | CDD | 96.5 | 99.2 | -
MAFF-Net [107] | Siamese UNet, Attention | LEVIR-CD | 89.7 | 98.9 | -
MAFF-Net [107] | Siamese UNet, Attention | WHU-CD | 92.4 | 99.4 | -
MSF-Net [108] | Siamese UNet, Attention | LEVIR-CD | 90 | 88.66 | -
FERA-Net [109] | Siamese UNet, Attention | LEVIR-CD | 91.57 | 89.58 | -
FERA-Net [109] | Siamese UNet, Attention | WHU-CD | 93.51 | 92.48 | -
T-UNet [110] | Triple UNet, Attention | LEVIR-CD | 92.60 | 91.63 | 99.16
T-UNet [110] | Triple UNet, Attention | WHU-CD | 95.44 | 91.77 | 99.42
T-UNet [110] | Triple UNet, Attention | DSIFN | 70.86 | 69.52 | 89.83
ChangeFormer [122] | Siamese Transformer | LEVIR-CD | 92.05 | 90.40 | 99.04
ChangeFormer [122] | Siamese Transformer | DSIFN | 88.48 | 86.67 | 95.56
SwinSUNet [125] | Siamese Transformer | CDD | 95.7 | 94.0 | 98.5
SwinSUNet [125] | Siamese Transformer | OSCD | 55.0 | 54.5 | 95.3
SwinSUNet [125] | Siamese Transformer | WHU | 95.0 | 93.8 | 99.4
BiT [130] | Siamese Transformer | LEVIR-CD | 89.24 | 89.31 | 98.92
BiT [130] | Siamese Transformer | DSIFN | 68.36 | 69.26 | 89.41
EATDer [129] | Siamese Transformer | LEVIR-CD | 91.74 | 91.20 | 98.75
EATDer [129] | Siamese Transformer | CDD | 96.83 | 95.97 | 98.97
EATDer [129] | Siamese Transformer | WHU-CD | 91.32 | 90 | 98.58
CTD-Former [132] | Siamese Transformer | LEVIR-CD | 91.85 | 92.71 | 98.62
CTD-Former [132] | Siamese Transformer | WHU-CD | 96.74 | 96.86 | 99.5
CTD-Former [132] | Siamese Transformer | CLCD | 87.29 | 85.08 | 96.11
SCanFormer [133] | Siamese Transformer | SECOND | - | 63.66 | 87.86
SCanFormer [133] | Siamese Transformer | Landsat-SCD | 89.27 | 96.26 |
TransUNetCD [139] | Siamese UNet + Transformer | CDD | 93.2 | 93.2 | 98.4
TransUNetCD [139] | Siamese UNet + Transformer | S2Looking | 93.2 | 93.2 | 98.4
CTCANet [141] | Siamese CNN + Transformer | LEVIR-CD | 92.19 | 91.21 | 99.11
CTCANet [141] | Siamese CNN + Transformer | SYSU-CD | 80.50 | 81.23 | 91.40
DCAT [134] | Siamese (CNN + Transformer) | LEVIR-CD+ | 84.72 | 84.02 | -
DCAT [134] | Siamese (CNN + Transformer) | SYSU-CD | 87.00 | 79.63 | -
DCAT [134] | Siamese (CNN + Transformer) | WHU-CD | 91.53 | 88.19 | -
SMART [145] | Siamese (CNN + Transformer) | LEVIR-CD | 94.29 | 93.04 | 98.69
SMART [145] | Siamese (CNN + Transformer) | SYSU-CD | 86.17 | 84.80 | 89.42
SMART [145] | Siamese (CNN + Transformer) | WHU-CD | 89.9 | 91.57 | 98.70
SMART [145] | Siamese (CNN + Transformer) | DSIFN | 76.89 | 78.7 | 87
WNet [148] | Siamese CNN + Siamese Transformer | LEVIR-CD | 91.16 | 90.67 | 99.06
WNet [148] | Siamese CNN + Siamese Transformer | WHU-CD | 92.37 | 91.25 | 99.31
WNet [148] | Siamese CNN + Siamese Transformer | SYSU-CD | 81.71 | 80.64 | 90.98
WNet [148] | Siamese CNN + Siamese Transformer | SVCD | 97.71 | 97.56 | 99.42
ACAHNet [149] | Siamese (CNN + Transformer) | CDD | 97.5 | 97.72 | 99.48
ACAHNet [149] | Siamese (CNN + Transformer) | LEVIR-CD | 92.36 | 91.51 | 99.14
ACAHNet [149] | Siamese (CNN + Transformer) | SYSU-CD | 83.96 | 82.73 | 91.97
ICIF-Net [150] | Siamese (CNN + Transformer) | LEVIR-CD+ | 87.79 | 83.65 | 98.73
ICIF-Net [150] | Siamese (CNN + Transformer) | WHU-CD | 92.98 | 88.32 | 98.96
ICIF-Net [150] | Siamese (CNN + Transformer) | SYSU-CD | 83.37 | 80.74 | 91.24
Slddnet [151] | Siamese (CNN + Transformer) | LEVIR-CD | - | 91.75 | -
Slddnet [151] | Siamese (CNN + Transformer) | WHU-CD | - | 92.76 | -
Slddnet [151] | Siamese (CNN + Transformer) | GZ-CD | - | 86.61 | -
limits its ability to generalize to more diverse scenarios with varying sensors or image
properties. To achieve semantic alignment across resolutions (i.e., difference ratio, e.g., 4, 8),
a recent study [161] used CNN-based siamese feature extraction and transformers to learn
correlations between the upsampled LR features and the original HR ones, which verifies
the effectiveness of the feature-wise alignment strategy. The methods mentioned are
effective for fixed resolution differences but may not be suitable for situations with other
resolution differences, limiting their practical applications. To fill this gap, SILI [163] offers
a single model adjusted to different ratios between bi-temporal images by using local
window self-attention to establish a feature interaction at different levels and capturing
spatial-temporal correlations rather than encoding images independently. The decoder
utilizes implicit neural representation (INR) to generate a change map.
Data fusion is also used for classification tasks, with many methods integrating LiDAR
and hyperspectral images through various applications. For instance, Siamese networks
are often employed, as seen in studies [190,191]. Techniques include the Squeeze-and-
Excitation module for weighted feature fusion [192]. FusAtNet’s cross-attention allows
each modality's feature learning to benefit from the other [193]. Additionally, SepDGConv [81] applies a single-stream network with dynamic group convolution. AMM-FuseNet [194] en-
hances performance using channel attention and densely connected atrous spatial pyramid
pooling. Additionally, [165] fuses Sentinel-2 Time Series and Spot7 images using a GRU
with Attention and a CNN branch, aided by auxiliary classifiers.
However, a notable limitation of supervised methods is that models necessitate large
amounts of labeled data, which are costly and time-consuming to create, especially for
change detection tasks. Interest in unsupervised networks is growing as they aim to re-
duce reliance on labeled datasets. Domain adaptation is a popular method that aims to
project pre-change and post-change images into a shared feature space to allow for com-
parison. Image-to-image (I2I) translation via a conditional generative adversarial network
(cGAN) [195] is a powerful technique for mapping data across domains. Particularly, the
CycleGAN [42,45] approach utilizes cGANs and enforces cyclic consistency to accomplish
even more powerful results. However, censoring changed pixels is important when applying this method to heterogeneous CD because their presence perturbs training and promotes irrelevant object transformations. Despite their capability, high training requirements, imbalanced training dynamics, and the possibility of mode collapse or unstable loss functions can limit their real-world applicability. In addition, these methods [175,196] applied
homogeneous transformation, which refers to transforming the heterogeneous images into
a homogeneous domain based on image translation and immediately comparing them
at the pixel level. Nevertheless, homogeneous transformation relies on low-level information such as pixel values, which can distort the semantic meaning of the transformed products, particularly in regions with many objects and complex environments.
Nowadays, several research papers have begun to concentrate on self-supervised multi-
modal learning. It motivates the network to acquire more meaningful and accessible feature
representations. Luppino et al. [172] effectively aligned related pixels from multi-modal images through domain-specific affinity matrices and autoencoders. Wu et al. [173] suggested a commonality autoencoder capable of discovering common features within heterogeneous image representations. Nevertheless, its sensitivity to hyperparameters requires careful tuning for optimal performance. Touati et al. [176] proposed an unsupervised stacked sparse autoencoder method for anomaly detection in image pairs. Most current methods focus on extracting deep features for the full image transformation, neglecting the image's topological structure, which includes direction, edge, and texture information.
Thus, TSCNet [36] proposes a new topology-coupling algorithm by introducing wavelet
transform, channel, and spatial attention mechanisms. Table 4 shows the performance of
heterogeneous RSCD methods on different datasets.
Table 4. Performance of heterogeneous RSCD methods on different datasets.

Method Name/Reference | Network Structure | DataSet | F1 (%) | Precision (%) | OA (%)
M-UNet [51] | Single UNet | Shuguang | - | 84.73 | 98.69
M-UNet [51] | Single UNet | Sardinia | - | 67 | 98.01
M-UNet [51] | Single UNet | California | - | 61.33 | 96.66
OB-DSCNH [43] | Siamese CNN | Mengxi Liu [43] | - | - | 97.92
SepDGConv [81] | Single CNN | Houston2018 | 56.55 | - | 63.74
SepDGConv [81] | Single CNN | Berlin | 54.23 | - | 68.21
SepDGConv [81] | Single CNN | MUUFL | 72.75 | - | 83.23
MM-Trans [161] | Siamese CNN + Transformer | CCD (8×/11×) | 95.48/95.17 | 90.44/90.07 | -
MM-Trans [161] | Siamese CNN + Transformer | S2Looking (5×/8×) | 58.62/56.99 | 65.37/64.57 | -
MM-Trans [161] | Siamese CNN + Transformer | HTCD (8×) | 82.13 | 74.99 | -
MSCDUNet [169] | Siamese UNet++ | MSBC Dataset | - | 64.21 | -
MSCDUNet [169] | Siamese UNet++ | MSOSCD Dataset | - | 92.81 | -
RACDNet [155] | GAN + Siamese UNet | MRCDD Dataset | 91.18 | 96.79 |
SUNet [47] | Siamese UNet | HTCD dataset | 97.3 | 91 | 99.6
Ebel et al. [166] | Siamese UNet | ONERA CD data | 60.2 | 58.1 | -
STCD-Former [162] | Siamese Transformer | Bastrop data | - | 99.25 | -
M3 Fusion [165] | Siamese CNN + RNN | Reunion Island | 90.09 | 89.96 | -
AMM-FuseNet [194] | Siamese UNet + Attention | Hunan | | 59.13 | 79.06
AMM-FuseNet [194] | Siamese UNet + Attention | DFC2020 | - | 90.33 | 94.56
AMM-FuseNet [194] | Siamese UNet + Attention | Potsdam | | 79.31 | 85.28
MFT [197] | Siamese CNN + Transformer | Houston2013 | 90.56 | - | 89.15
MFT [197] | Siamese CNN + Transformer | MUUFL | 81 | - | 94.18
MFT [197] | Siamese CNN + Transformer | Trento | 95.91 | - | 97.76
Chen et al. [191] | Siamese CNN | Houston2013 | 98.57 | - | 98.61
Chen et al. [191] | Siamese CNN | Bayview Park | 99.75 | - | 99.41
Chen et al. [191] | Siamese CNN | Recology | 98.90 | - | 98.15
MBFNet [106] | Siamese CNN + Attention | PoDelta | - | - | 82.61
MBFNet [106] | Siamese CNN + Attention | CHONGMING | - | - | 93.61
TWINNS [188] | Siamese CNN, GRU | Reunion Island | 89.87 | 89.88 | -
SiamCRNN [164] | Siamese CNN + LSTM | LiDAR-Opt | 87.38 | 82.15 | 82.15
MF-SRCDNet [158] | GAN + Siamese UNet | WXCD | 84.5 | 88.1 | 95.3
MF-SRCDNet [158] | GAN + Siamese UNet | BCDD | 96.4 | 96.4 | 98.5
SiamGAN [156] | Siamese GAN | Guangzhou | 69.5 | 76.06 | -
SRCDNet [154] | GAN + Siamese UNet, Attention | BCDD (4×/8×) | 84.44/81.61 | 85.66/81.69 | -
SRCDNet [154] | GAN + Siamese UNet, Attention | CDD (4×/8×) | 92.07/91.95 | - | -
SILI [163] | Siamese CNN + Transformer | LEVIR-CD (4×) | 90 | 88 | 98
SILI [163] | Siamese CNN + Transformer | SV-CD (8×) | 95 | 94 | 98
SILI [163] | Siamese CNN + Transformer | DE-CD (3.3×) | 61 | 50 | -
DAMSCDNet [171] | Siamese CNN | Data1 | 78.89 | 82.17 | -
DAMSCDNet [171] | Siamese CNN | Data2 | 92.04 | 93.86 | -
DAMSCDNet [171] | Siamese CNN | Data3 | 71.51 | - | 71.71
CA_AE [172] | Autoencoders | Lake overflow | - | - | 92.2
CA_AE [172] | Autoencoders | Constructions | - | - | 85.9
CAE [173] | Autoencoders | Yellow River | - | - | 97.74
CAE [173] | Autoencoders | Sardinia | - | - | 97.47
CAE [173] | Autoencoders | Farmland | - | - | 97.91
Farahani et al. [174] | Autoencoders | San Francisco | - | 96.44 | 72/68
DHFF [175] | Siamese VGG (IST) | Tōhoku | 84.66 | - | 98.63
DHFF [175] | Siamese VGG (IST) | Haiti | 58.19 | - | 98.23
TSCNet [36] | Autoencoders + Attention | Flood California [198] | 49.4 | 5.74 | 93.9
Niu et al. [195] | Autoencoders | Yellow River | - | - | 97.7
Niu et al. [195] | Autoencoders | Farmland | - | - | 98.26
CM-Net [178] | Autoencoder + Transformer | SARDINA | 90.55 | | 97.52
CM-Net [178] | Autoencoder + Transformer | Shuguang | 95.00 | - | 98.57
CM-Net [178] | Autoencoder + Transformer | GLOUCESTERSHIRE | 93.51 | - | 96.92
DTCDN [55] | CycleGAN | Gloucester I | 89.96 | 89.95 | 97.98
DTCDN [55] | CycleGAN | Gloucester II | 90.78 | 88.67 | 96.33
DTCDN [55] | CycleGAN | California | 66.73 | 72.03 | 97.61
DTCDN [55] | CycleGAN | Shuguang | 92.92 | 91.56 | 99.75
DACDT [182] | CycleGAN | Gloucester I | - | - | 98.67
DACDT [182] | CycleGAN | Gloucester II | - | - | 97.68
DACDT [182] | CycleGAN | California | - | - | 98.87
MTCDN [183] | CycleGAN | Gloucester I | 88.86 | 88.22 | 97.65
MTCDN [183] | CycleGAN | Gloucester II | 89.49 | 88.87 | 96.34
MTCDN [183] | CycleGAN | California | 55.20 | 61.54 | 95.83
TDSCCNet [184] | CycleGAN | Italy | 85.64 | 81.07 | 97.62
TDSCCNet [184] | CycleGAN | WV-3 | 91.34 | 91.37 | 98.01
TDSCCNet [184] | CycleGAN | Gloucester | 93.29 | 93.75 | 97.36
TDSCCNet [184] | CycleGAN | Shuguang | 82.58 | 88.58 | 97.01
EO-GAN [185] | CGAN | Yellow River | | | 98.01
EO-GAN [185] | CGAN | Shuguang | - | - | 98.16
7. Conclusions
In many real-world remote sensing applications, change detection is an essential
component. Deep learning has gained increasing traction for accomplishing this task.
This study delves into the deployment of deep learning techniques for change detec-
tion in remote sensing, particularly utilizing multi-modal imagery. It provides a summary
of available datasets suitable for change detection and analyzes the effectiveness of various
deep-learning models. There are two categories of models: those tailored for homogeneous
change detection and those suitable for diverse data types (heterogeneous). Additionally,
the paper illustrates the strengths, challenges, and possible avenues for future research in
this field.
A large amount of research in change detection has focused on homogeneous scenarios. In contrast, heterogeneous change detection presents a more challenging problem. Managing
discrepancies in data types, specifically when dealing with varying resolutions in multi-
sensor data, significantly complicates the detection process. Consequently, many research
efforts try to deal with change detection problems using multi-source data with similar or
near-identical resolutions, such as combining SAR and optical data.
Author Contributions: All authors contributed in a substantial way to the manuscript. S.S. and
S.I. conceived the review. S.S. and S.I. designed the overall structure of the review. S.S. wrote
the manuscript. All authors discussed the basic structure of the manuscript. S.S., S.I. and A.M.
contributed to the discussion of the review. S.S., S.I., A.M. and Y.K. made contribution to the review
of related literature. M.A. reviewed the manuscript and supervised the study for all the stages. All
authors have read and agreed to the published version of the manuscript.
Funding: This work was supported by the Ministry of Higher Education, Scientific Research and Innova-
tion, the Digital Development Agency (DDA), and the CNRST of Morocco (ALKHAWARIZMI/2020/29).
Data Availability Statement: No new data were created in this manuscript.
Acknowledgments: The authors are grateful to the reviewers for their constructive comments and
valuable assistance in improving the manuscript.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Aplin, P. Remote sensing: Land cover. Prog. Phys. Geogr. 2004, 28, 283–293. [CrossRef]
2. Rees, G. Physical Principles of Remote Sensing; Cambridge University Press: Cambridge, UK, 2013.
3. Pettorelli, N. Satellite Remote Sensing and the Management of Natural Resources; Oxford University Press: Oxford, UK, 2019.
4. Yin, J.; Dong, J.; Hamm, N.A.; Li, Z.; Wang, J.; Xing, H.; Fu, P. Integrating remote sensing and geospatial big data for urban land
use mapping: A review. Int. J. Appl. Earth Obs. Geoinf. 2021, 103, 102514. [CrossRef]
5. Dash, J.P.; Pearse, G.D.; Watt, M.S. UAV multispectral imagery can complement satellite data for monitoring forest health. Remote
Sens. 2018, 10, 1216. [CrossRef]
6. Cillero Castro, C.; Domínguez Gómez, J.A.; Delgado Martín, J.; Hinojo Sánchez, B.A.; Cereijo Arango, J.L.; Cheda Tuya, F.A.;
Díaz-Varela, R. An UAV and satellite multispectral data approach to monitor water quality in small reservoirs. Remote Sens. 2020,
12, 1514. [CrossRef]
7. Shirmard, H.; Farahbakhsh, E.; Müller, R.D.; Chandra, R. A review of machine learning in processing remote sensing data for
mineral exploration. Remote Sens. Environ. 2022, 268, 112750. [CrossRef]
8. Demchev, D.; Eriksson, L.; Smolanitsky, V. SAR image texture entropy analysis for applicability assessment of area-based and
feature-based sea ice tracking approaches. In Proceedings of the EUSAR 2021; 13th European Conference on Synthetic Aperture
Radar, VDE, Online, 29 March–1 April 2021; pp. 1–3.
9. Wen, D.; Huang, X.; Bovolo, F.; Li, J.; Ke, X.; Zhang, A.; Benediktsson, J.A. Change detection from very-high-spatial-resolution
optical remote sensing images: Methods, applications, and future directions. IEEE Geosci. Remote Sens. Mag. 2021, 9, 68–101.
[CrossRef]
10. Moreira, A.; Prats-Iraola, P.; Younis, M.; Krieger, G.; Hajnsek, I.; Papathanassiou, K.P. A tutorial on synthetic aperture radar. IEEE
Geosci. Remote Sens. Mag. 2013, 1, 6–43. [CrossRef]
11. Zhang, J. Multi-source remote sensing data fusion: Status and trends. Int. J. Image Data Fusion 2010, 1, 5–24. [CrossRef]
12. Liu, Y.; Pang, C.; Zhan, Z.; Zhang, X.; Yang, X. Building change detection for remote sensing images using a dual-task constrained
deep siamese convolutional network model. IEEE Geosci. Remote Sens. Lett. 2020, 18, 811–815. [CrossRef]
13. Shi, S.; Zhong, Y.; Zhao, J.; Lv, P.; Liu, Y.; Zhang, L. Land-use/land-cover change detection based on class-prior object-oriented
conditional random field framework for high spatial resolution remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2020,
60, 1–16. [CrossRef]
14. Brunner, D.; Bruzzone, L.; Lemoine, G. Change detection for earthquake damage assessment in built-up areas using very high
resolution optical and SAR imagery. In Proceedings of the 2010 IEEE International Geoscience and Remote Sensing Symposium,
IEEE, Honolulu, HI, USA, 25–30 July 2010; pp. 3210–3213.
15. You, Y.; Cao, J.; Zhou, W. A survey of change detection methods based on remote sensing images for multi-source and
multi-objective scenarios. Remote Sens. 2020, 12, 2460. [CrossRef]
16. Deng, J.; Wang, K.; Deng, Y.; Qi, G. PCA-based land-use change detection and analysis using multitemporal and multisensor
satellite data. Int. J. Remote Sens. 2008, 29, 4823–4838. [CrossRef]
17. Bovolo, F.; Bruzzone, L.; Marconcini, M. A novel approach to unsupervised change detection based on a semisupervised SVM
and a similarity measure. IEEE Trans. Geosci. Remote Sens. 2008, 46, 2070–2082. [CrossRef]
18. Hao, M.; Zhou, M.; Jin, J.; Shi, W. An advanced superpixel-based Markov random field model for unsupervised change detection.
IEEE Geosci. Remote Sens. Lett. 2019, 17, 1401–1405. [CrossRef]
19. Zhou, L.; Cao, G.; Li, Y.; Shang, Y. Change detection based on conditional random field with region connection constraints in
high-resolution remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 3478–3488. [CrossRef]
20. Tan, K.; Jin, X.; Plaza, A.; Wang, X.; Xiao, L.; Du, P. Automatic change detection in high-resolution remote sensing images by
using a multiple classifier system and spectral–spatial features. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 3439–3451.
[CrossRef]
21. Seo, D.K.; Kim, Y.H.; Eo, Y.D.; Lee, M.H.; Park, W.Y. Fusion of SAR and multispectral images using random forest regression for
change detection. ISPRS Int. J. Geo-Inf. 2018, 7, 401. [CrossRef]
22. Wang, C.; Wang, X. Building change detection from multi-source remote sensing images based on multi-feature fusion and
extreme learning machine. Int. J. Remote Sens. 2021, 42, 2246–2257. [CrossRef]
23. Touati, R.; Mignotte, M.; Dahmane, M. Multimodal change detection in remote sensing images using an unsupervised pixel
pairwise-based Markov random field model. IEEE Trans. Image Process. 2019, 29, 757–767. [CrossRef]
24. Cheng, G.; Huang, Y.; Li, X.; Lyu, S.; Xu, Z.; Zhao, H.; Zhao, Q.; Xiang, S. Change detection methods for remote sensing in the last
decade: A comprehensive review. Remote Sens. 2024, 16, 2355. [CrossRef]
25. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
26. Schmidt, R.M. Recurrent neural networks (rnns): A gentle introduction and overview. arXiv 2019, arXiv:1912.05911.
27. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial
nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
28. Li, J.; Hong, D.; Gao, L.; Yao, J.; Zheng, K.; Zhang, B.; Chanussot, J. Deep learning in multimodal remote sensing data fusion: A
comprehensive review. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102926. [CrossRef]
29. Shafique, A.; Cao, G.; Khan, Z.; Asad, M.; Aslam, M. Deep learning-based change detection in remote sensing images: A review.
Remote Sens. 2022, 14, 871. [CrossRef]
30. Jiang, H.; Peng, M.; Zhong, Y.; Xie, H.; Hao, Z.; Lin, J.; Ma, X.; Hu, X. A survey on deep learning-based change detection from
high-resolution remote sensing images. Remote Sens. 2022, 14, 1552. [CrossRef]
31. Bai, T.; Wang, L.; Yin, D.; Sun, K.; Chen, Y.; Li, W.; Li, D. Deep learning for change detection in remote sensing: A review. Geo-Spat.
Inf. Sci. 2023, 26, 262–288. [CrossRef]
32. Parelius, E.J. A review of deep-learning methods for change detection in multispectral remote sensing images. Remote Sens. 2023,
15, 2092. [CrossRef]
33. Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G.; PRISMA Group. Preferred reporting items for systematic reviews and
meta-analyses: The PRISMA statement. Ann. Intern. Med. 2009, 151, 264–269. [CrossRef]
34. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef]
35. Daudt, R.C.; Le Saux, B.; Boulch, A.; Gousseau, Y. Urban change detection for multispectral earth observation using convolutional
neural networks. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium,
Valencia, Spain, 22–27 July 2018; pp. 2115–2118.
36. Wang, X.; Cheng, W.; Feng, Y.; Song, R. TSCNet: Topological structure coupling network for change detection of heterogeneous
remote sensing images. Remote Sens. 2023, 15, 621. [CrossRef]
37. Chen, H.; Yokoya, N.; Wu, C.; Du, B. Unsupervised multimodal change detection based on structural relationship graph
representation learning. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–18. [CrossRef]
38. Chen, H.; Shi, Z. A spatial-temporal attention-based method and a new dataset for remote sensing image change detection.
Remote Sens. 2020, 12, 1662. [CrossRef]
39. Ji, S.; Wei, S.; Lu, M. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery
data set. IEEE Trans. Geosci. Remote Sens. 2018, 57, 574–586. [CrossRef]
40. Feng, S.; Fan, Y.; Tang, Y.; Cheng, H.; Zhao, C.; Zhu, Y.; Cheng, C. A change detection method based on multi-scale adaptive
convolution kernel network and multimodal conditional random field for multi-temporal multispectral images. Remote Sens.
2022, 14, 5368. [CrossRef]
41. Shen, L.; Lu, Y.; Chen, H.; Wei, H.; Xie, D.; Yue, J.; Chen, R.; Lv, S.; Jiang, B. S2Looking: A satellite side-looking dataset for
building change detection. Remote Sens. 2021, 13, 5094. [CrossRef]
42. Lebedev, M.; Vizilter, Y.V.; Vygolov, O.; Knyaz, V.A.; Rubis, A.Y. Change detection in remote sensing images using conditional
adversarial networks. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, 42, 565–571. [CrossRef]
43. Wang, M.; Tan, K.; Jia, X.; Wang, X.; Chen, Y. A deep siamese network with hybrid convolutional feature extraction module for
change detection based on multi-sensor remote sensing images. Remote Sens. 2020, 12, 205. [CrossRef]
44. Volpi, M.; Camps-Valls, G.; Tuia, D. Spectral alignment of multi-temporal cross-sensor images with automated kernel canonical
correlation analysis. ISPRS J. Photogramm. Remote Sens. 2015, 107, 50–63. [CrossRef]
45. Saha, S.; Bovolo, F.; Bruzzone, L. Unsupervised multiple-change detection in VHR multisensor images via deep-learning based
adaptation. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama,
Japan, 28 July–2 August 2019; pp. 5033–5036.
46. Jiang, H.; Hu, X.; Li, K.; Zhang, J.; Gong, J.; Zhang, M. PGA-SiamNet: Pyramid feature-based attention-guided siamese network
for remote sensing orthoimagery building change detection. Remote Sens. 2020, 12, 484. [CrossRef]
47. Shao, R.; Du, C.; Chen, H.; Li, J. SUNet: Change detection for heterogeneous remote sensing images from satellite and UAV using
a dual-channel fully convolution network. Remote Sens. 2021, 13, 3750. [CrossRef]
48. Li, Y.; Zhou, Y.; Zhang, Y.; Zhong, L.; Wang, J.; Chen, J. DKDFN: Domain knowledge-guided deep collaborative fusion network
for multimodal unitemporal remote sensing land cover classification. ISPRS J. Photogramm. Remote Sens. 2022, 186, 170–189.
[CrossRef]
49. Robinson, C.; Malkin, K.; Jojic, N.; Chen, H.; Qin, R.; Xiao, C.; Schmitt, M.; Ghamisi, P.; Hänsch, R.; Yokoya, N. Global land-cover
mapping with weak supervision: Outcome of the 2020 IEEE GRSS data fusion contest. IEEE J. Sel. Top. Appl. Earth Obs. Remote
Sens. 2021, 14, 3185–3199. [CrossRef]
50. Rottensteiner, F.; Sohn, G.; Jung, J.; Gerke, M.; Baillard, C.; Benitez, S.; Breitkopf, U. The ISPRS benchmark on urban object
classification and 3D building reconstruction. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, I-3, 293–298. [CrossRef]
51. Lv, Z.; Huang, H.; Gao, L.; Benediktsson, J.A.; Zhao, M.; Shi, C. Simple multiscale UNet for change detection with heterogeneous
remote sensing images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [CrossRef]
52. Xu, Y.; Du, B.; Zhang, L.; Cerra, D.; Pato, M.; Carmona, E.; Prasad, S.; Yokoya, N.; Hänsch, R.; Le Saux, B. Advanced multi-sensor
optical remote sensing for urban land use and land cover classification: Outcome of the 2018 IEEE GRSS data fusion contest.
IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 1709–1724. [CrossRef]
53. Hong, D.; Hu, J.; Yao, J.; Chanussot, J.; Zhu, X.X. Multimodal remote sensing benchmark datasets for land cover classification
with a shared and specific feature learning model. ISPRS J. Photogramm. Remote Sens. 2021, 178, 68–80. [CrossRef]
54. Gader, P.; Zare, A.; Close, R.; Aitken, J.; Tuell, G. MUUFL Gulfport Hyperspectral and LiDAR Airborne Data Set; University of Florida:
Gainesville, FL, USA, 2013.
55. Li, X.; Du, Z.; Huang, Y.; Tan, Z. A deep translation (GAN) based change detection network for optical and SAR remote sensing
images. ISPRS J. Photogramm. Remote Sens. 2021, 179, 14–34. [CrossRef]
56. Huang, C.; Chen, Y.; Zhang, S.; Wu, J. Detecting, extracting, and monitoring surface water from space using optical sensors: A
review. Rev. Geophys. 2018, 56, 333–360. [CrossRef]
57. Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep learning in remote sensing: A comprehensive
review and list of resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36. [CrossRef]
58. Ghamisi, P.; Rasti, B.; Yokoya, N.; Wang, Q.; Hofle, B.; Bruzzone, L.; Bovolo, F.; Chi, M.; Anders, K.; Gloaguen, R.; et al. Multisource
and multitemporal data fusion in remote sensing: A comprehensive review of the state of the art. IEEE Geosci. Remote Sens. Mag.
2019, 7, 6–39. [CrossRef]
59. Gómez-Chova, L.; Tuia, D.; Moser, G.; Camps-Valls, G. Multimodal classification of remote sensing images: A review and future
directions. Proc. IEEE 2015, 103, 1560–1584. [CrossRef]
60. Daudt, R.C.; Le Saux, B.; Boulch, A.; Gousseau, Y. Multitask learning for large-scale semantic change detection. Comput. Vis.
Image Underst. 2019, 187, 102783. [CrossRef]
61. Peng, D.; Zhang, Y.; Guan, H. End-to-end change detection for high resolution satellite images using improved UNet++. Remote
Sens. 2019, 11, 1382. [CrossRef]
62. Zheng, Z.; Wan, Y.; Zhang, Y.; Xiang, S.; Peng, D.; Zhang, B. CLNet: Cross-layer convolutional neural network for change
detection in optical remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2021, 175, 247–267. [CrossRef]
63. Lei, Y.; Peng, D.; Zhang, P.; Ke, Q.; Li, H. Hierarchical paired channel fusion network for street scene change detection. IEEE
Trans. Image Process. 2020, 30, 55–67. [CrossRef] [PubMed]
64. Zhang, M.; Xu, G.; Chen, K.; Yan, M.; Sun, X. Triplet-based semantic relation learning for aerial remote sensing image change
detection. IEEE Geosci. Remote Sens. Lett. 2018, 16, 266–270. [CrossRef]
65. Zhang, C.; Yue, P.; Tapete, D.; Jiang, L.; Shangguan, B.; Huang, L.; Liu, G. A deeply supervised image fusion network for change
detection in high resolution bi-temporal remote sensing images. ISPRS J. Photogramm. Remote Sens. 2020, 166, 183–200. [CrossRef]
66. Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H. Fully-convolutional siamese networks for object tracking.
In Proceedings of the Computer Vision–ECCV 2016 Workshops, Amsterdam, The Netherlands, 8–10 and 15–16 October 2016;
Proceedings, Part II; Springer: Berlin/Heidelberg, Germany, 2016; pp. 850–865.
67. Adarme, M.O.; Feitosa, R.Q.; Happ, P.N.; De Almeida, C.A.; Gomes, A.R. Evaluation of deep learning techniques for deforestation detection in the Brazilian Amazon and Cerrado biomes from remote sensing imagery. Remote Sens. 2020, 12, 910.
[CrossRef]
68. Zhang, J.; Wang, Z.; Bai, L.; Song, G.; Tao, J.; Chen, L. Deforestation detection based on U-Net and LSTM in optical satellite remote sensing images. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 3753–3756.
69. John, D.; Zhang, C. An attention-based U-Net for detecting deforestation within satellite sensor imagery. Int. J. Appl. Earth Obs.
Geoinf. 2022, 107, 102685. [CrossRef]
70. Alshehri, M.; Ouadou, A.; Scott, G.J. Deep transformer-based network deforestation detection in the Brazilian Amazon using Sentinel-2 imagery. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5. [CrossRef]
71. Bidari, I.; Chickerur, S. Deep recurrent residual U-Net with semi-supervised learning for deforestation change detection. SN Comput. Sci. 2024, 5, 893. [CrossRef]
72. Papadomanolaki, M.; Verma, S.; Vakalopoulou, M.; Gupta, S.; Karantzalos, K. Detecting urban changes with recurrent neural
networks from multitemporal Sentinel-2 data. In Proceedings of the IGARSS 2019-2019 IEEE International Geoscience and
Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 214–217.
73. Khusni, U.; Dewangkoro, H.I.; Arymurthy, A.M. Urban area change detection with combining CNN and RNN from Sentinel-2
multispectral remote sensing data. In Proceedings of the 2020 3rd International Conference on Computer and Informatics
Engineering (IC2IE), Yogyakarta, Indonesia, 15–16 September 2020; pp. 171–175.
74. Huang, F.; Shen, G.; Hong, H.; Wei, L. Change detection of buildings with the utilization of a deep belief network and
high-resolution remote sensing images. Fractals 2022, 30, 2240255. [CrossRef]
75. Pang, L.; Sun, J.; Chi, Y.; Yang, Y.; Zhang, F.; Zhang, L. CD-TransUNet: A hybrid transformer network for the change detection of
urban buildings using L-band SAR images. Sustainability 2022, 14, 9847. [CrossRef]
76. Shafique, A.; Seydi, S.T.; Cao, G. BCD-Net: Building change detection based on fully scale connected U-Net and subpixel
convolution. Int. J. Remote Sens. 2023, 44, 7416–7438. [CrossRef]
77. Xiong, J.; Liu, F.; Wang, X.; Yang, C. Siamese transformer-based building change detection in remote sensing images. Sensors
2024, 24, 1268. [CrossRef]
78. Ahmed, N.; Hoque, M.A.A.; Arabameri, A.; Pal, S.C.; Chakrabortty, R.; Jui, J. Flood susceptibility mapping in Brahmaputra
floodplain of Bangladesh using deep boost, deep learning neural network, and artificial neural network. Geocarto Int. 2022,
37, 8770–8791. [CrossRef]
79. Lemenkova, P. Deep learning methods of satellite image processing for monitoring of flood dynamics in the Ganges Delta,
Bangladesh. Water 2024, 16, 1141. [CrossRef]
80. Daudt, R.C.; Le Saux, B.; Boulch, A. Fully convolutional siamese networks for change detection. In Proceedings of the 25th IEEE
International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 4063–4067.
81. Yang, Y.; Zhu, D.; Qu, T.; Wang, Q.; Ren, F.; Cheng, C. Single-stream CNN with learnable architecture for multisource remote
sensing data. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–18. [CrossRef]
82. Chen, H.; Wu, C.; Du, B.; Zhang, L. Deep siamese multi-scale convolutional network for change detection in multi-temporal
VHR images. In Proceedings of the 2019 10th International Workshop on the Analysis of Multitemporal Remote Sensing Images
(MultiTemp), Shanghai, China, 5–7 August 2019; pp. 1–4.
83. Zhang, M.; Shi, W. A feature difference convolutional neural network-based change detection method. IEEE Trans. Geosci. Remote
Sens. 2020, 58, 7232–7246. [CrossRef]
84. Iftene, M.; Larabi, M.E.A.; Karoui, M.S. End-to-end change detection in satellite remote sensing imagery. In Proceedings of the
2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 4356–4359.
85. Zhang, H.; Lin, M.; Yang, G.; Zhang, L. ESCNet: An end-to-end superpixel-enhanced change detection network for very-high-
resolution remote sensing images. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 28–42. [CrossRef]
86. Chen, P.; Li, C.; Zhang, B.; Chen, Z.; Yang, X.; Lu, K.; Zhuang, L. A region-based feature fusion network for VHR image change
detection. Remote Sens. 2022, 14, 5577. [CrossRef]
87. Zhang, X.; He, L.; Qin, K.; Dang, Q.; Si, H.; Tang, X.; Jiao, L. SMD-Net: Siamese multi-scale difference-enhancement network for
change detection in remote sensing. Remote Sens. 2022, 14, 1580. [CrossRef]
88. Wang, Q.; Li, M.; Li, G.; Zhang, J.; Yan, S.; Chen, Z.; Zhang, X.; Chen, G. High-resolution remote sensing image change detection
method based on improved siamese U-Net. Remote Sens. 2023, 15, 3517. [CrossRef]
89. Wang, J.; Liu, F.; Jiao, L.; Wang, H.; Yang, H.; Liu, X.; Li, L.; Chen, P. SSCFNet: A spatial-spectral cross fusion network for remote
sensing change detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 4000–4012. [CrossRef]
90. Zhang, W.; Zhang, Y.; Su, L.; Mei, C.; Lu, X. Difference-enhancement triplet network for change detection in multispectral images.
IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [CrossRef]
91. Yu, X.; Fan, J.; Chen, J.; Zhang, P.; Zhou, Y.; Han, L. NestNet: A multiscale convolutional neural network for remote sensing
image change detection. Int. J. Remote Sens. 2021, 42, 4898–4921. [CrossRef]
92. Zhang, X.; Yue, Y.; Gao, W.; Yun, S.; Su, Q.; Yin, H.; Zhang, Y. DifUnet++: A satellite images change detection network based on
UNet++ and differential pyramid. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5. [CrossRef]
93. Fang, S.; Li, K.; Shao, J.; Li, Z. SNUNet-CD: A densely connected siamese network for change detection of VHR images. IEEE
Geosci. Remote Sens. Lett. 2021, 19, 1–5. [CrossRef]
94. Qian, J.; Xia, M.; Zhang, Y.; Liu, J.; Xu, Y. TCDNet: Trilateral change detection network for Google Earth image. Remote Sens. 2020,
12, 2669. [CrossRef]
95. Zhang, W.; Lu, X. The spectral-spatial joint learning for change detection in multispectral imagery. Remote Sens. 2019, 11, 240.
[CrossRef]
96. Ye, Y.; Zhou, L.; Zhu, B.; Yang, C.; Sun, M.; Fan, J.; Fu, Z. Feature decomposition-optimization-reorganization network for
building change detection in remote sensing images. Remote Sens. 2022, 14, 722. [CrossRef]
97. Lei, J.; Gu, Y.; Xie, W.; Li, Y.; Du, Q. Boundary extraction constrained siamese network for remote sensing image change detection.
IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13. [CrossRef]
98. Ding, L.; Zhu, K.; Peng, D.; Tang, H.; Yang, K.; Bruzzone, L. Adapting segment anything model for change detection in VHR
remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–11. [CrossRef]
99. Zhao, X.; Ding, W.; An, Y.; Du, Y.; Yu, T.; Li, M.; Tang, M.; Wang, J. Fast segment anything. arXiv 2023, arXiv:2306.12156.
100. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In
Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December
2017; Volume 30.
101. Jiang, M.; Zhang, X.; Sun, Y.; Feng, W.; Gan, Q.; Ruan, Y. AFSNet: Attention-guided full-scale feature aggregation network for
high-resolution remote sensing image change detection. GIScience Remote Sens. 2022, 59, 1882–1900. [CrossRef]
102. Adriano, B.; Yokoya, N.; Xia, J.; Miura, H.; Liu, W.; Matsuoka, M.; Koshimura, S. Learning from multimodal and multitemporal
earth observation data for building damage mapping. ISPRS J. Photogramm. Remote Sens. 2021, 175, 132–143. [CrossRef]
103. Li, H.; Wang, L.; Cheng, S. HARNU-Net: Hierarchical attention residual nested U-Net for change detection in remote sensing
images. Sensors 2022, 22, 4626. [CrossRef]
104. Chen, J.; Yuan, Z.; Peng, J.; Chen, L.; Huang, H.; Zhu, J.; Liu, Y.; Li, H. DASNet: Dual attentive fully convolutional siamese
networks for change detection in high-resolution satellite images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020,
14, 1194–1206. [CrossRef]
105. Lu, D.; Wang, L.; Cheng, S.; Li, Y.; Du, A. CANet: A combined attention network for remote sensing image change detection.
Information 2021, 12, 364. [CrossRef]
106. Li, X.; Lei, L.; Sun, Y.; Li, M.; Kuang, G. Multimodal bilinear fusion network with second-order attention-based channel selection
for land cover classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 1011–1026. [CrossRef]
107. Ma, J.; Shi, G.; Li, Y.; Zhao, Z. MAFF-Net: Multi-attention guided feature fusion network for change detection in remote sensing
images. Sensors 2022, 22, 888. [CrossRef] [PubMed]
108. Chen, J.; Fan, J.; Zhang, M.; Zhou, Y.; Shen, C. MSF-Net: A multiscale supervised fusion network for building change detection in
high-resolution remote sensing images. IEEE Access 2022, 10, 30925–30938. [CrossRef]
109. Xu, X.; Zhou, Y.; Lu, X.; Chen, Z. FERA-Net: A building change detection method for high-resolution remote sensing imagery
based on residual attention and high-frequency features. Remote Sens. 2023, 15, 395. [CrossRef]
110. Zhong, H.; Wu, C. T-UNet: Triplet UNet for change detection in high-resolution remote sensing images. arXiv 2023,
arXiv:2308.02356. [CrossRef]
111. Sivasankari, A.; Jayalakshmi, S. Land cover clustering for change detection using deep belief network. In Proceedings of the 2022
International Conference on Electronics and Renewable Systems (ICEARS), Tuticorin, India, 16–18 March 2022; pp. 815–822.
112. Jia, M.; Zhao, Z. Change detection in synthetic aperture radar images based on a generalized gamma deep belief networks.
Sensors 2021, 21, 8290. [CrossRef]
113. Samadi, F.; Akbarizadeh, G.; Kaabi, H. Change detection in SAR images using deep belief network: A new training approach
based on morphological images. IET Image Process. 2019, 13, 2255–2264. [CrossRef]
114. Mou, L.; Zhu, X.X. A recurrent convolutional neural network for land cover change detection in multispectral images. In
Proceedings of the IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27
July 2018; pp. 4363–4366.
115. Mou, L.; Bruzzone, L.; Zhu, X.X. Learning spectral-spatial-temporal features via a recurrent convolutional neural network for
change detection in multispectral imagery. IEEE Trans. Geosci. Remote Sens. 2018, 57, 924–935. [CrossRef]
116. Lyu, H.; Lu, H.; Mou, L.; Li, W.; Wright, J.; Li, X.; Li, X.; Zhu, X.X.; Wang, J.; Yu, L.; et al. Long-term annual mapping of four cities
on different continents by applying a deep information learning method to Landsat data. Remote Sens. 2018, 10, 471. [CrossRef]
117. Sun, S.; Mu, L.; Wang, L.; Liu, P. L-UNet: An LSTM network for remote sensing image change detection. IEEE Geosci. Remote
Sens. Lett. 2020, 19, 1–5. [CrossRef]
118. Zhao, Y.; Chen, P.; Chen, Z.; Bai, Y.; Zhao, Z.; Yang, X. A triple-stream network with cross-stage feature fusion for high-resolution
image change detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–17. [CrossRef]
119. Zhu, Y.; Lv, K.; Yu, Y.; Xu, W. Edge-guided parallel network for VHR remote sensing image change detection. IEEE J. Sel. Top.
Appl. Earth Obs. Remote Sens. 2023, 16, 7791–7803. [CrossRef]
120. Sefrin, O.; Riese, F.M.; Keller, S. Deep learning for land cover change detection. Remote Sens. 2020, 13, 78. [CrossRef]
121. Jing, R.; Liu, S.; Gong, Z.; Wang, Z.; Guan, H.; Gautam, A.; Zhao, W. Object-Based change detection for VHR remote sensing
images based on a trisiamese-LSTM. Int. J. Remote Sens. 2020, 41, 6209–6231. [CrossRef]
122. Bandara, W.G.C.; Patel, V.M. A transformer-based siamese network for change detection. In Proceedings of the IGARSS
2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; pp.
207–210.
123. Yuan, P.; Zhao, Q.; Zhao, X.; Wang, X.; Long, X.; Zheng, Y. A transformer-based siamese network and an open optical dataset for
semantic change detection of remote sensing images. Int. J. Digit. Earth 2022, 15, 1506–1525. [CrossRef]
124. Yan, T.; Wan, Z.; Zhang, P. Fully transformer network for change detection of remote sensing images. In Proceedings of the Asian
Conference on Computer Vision, Macao, China, 4–8 December 2022; pp. 1691–1708.
125. Zhang, C.; Wang, L.; Cheng, S.; Li, Y. SwinSUNet: Pure transformer network for remote sensing image change detection. IEEE
Trans. Geosci. Remote Sens. 2022, 60, 1–13. [CrossRef]
126. Pan, J.; Bai, Y.; Shu, Q.; Zhang, Z.; Hu, J.; Wang, M. M-Swin: Transformer-based multi-scale feature fusion change detection network within cropland for remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–16. [CrossRef]
127. Song, L.; Xia, M.; Xu, Y.; Weng, L.; Hu, K.; Lin, H.; Qian, M. Multi-granularity siamese transformer-based change detection in
remote sensing imagery. Eng. Appl. Artif. Intell. 2024, 136, 108960. [CrossRef]
128. Xu, X.; Li, J.; Chen, Z. TCIANet: Transformer-based context information aggregation network for remote sensing image change
detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 1951–1971. [CrossRef]
129. Ma, J.; Duan, J.; Tang, X.; Zhang, X.; Jiao, L. EATDer: Edge-assisted adaptive transformer detector for remote sensing change
detection. IEEE Trans. Geosci. Remote Sens. 2023, 62, 1–15. [CrossRef]
130. Chen, H.; Qi, Z.; Shi, Z. Remote sensing image change detection with transformers. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–14.
[CrossRef]
131. Song, X.; Hua, Z.; Li, J. PSTNet: Progressive sampling transformer network for remote sensing image change detection. IEEE J.
Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 8442–8455. [CrossRef]
132. Zhang, K.; Zhao, X.; Zhang, F.; Ding, L.; Sun, J.; Bruzzone, L. Relation changes matter: Cross-temporal difference transformer for
change detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–5. [CrossRef]
133. Ding, L.; Zhang, J.; Guo, H.; Zhang, K.; Liu, B.; Bruzzone, L. Joint spatio-temporal modeling for semantic change detection in
remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–14. [CrossRef]
134. Zhou, Y.; Huo, C.; Zhu, J.; Huo, L.; Pan, C. DCAT: Dual cross-attention-based transformer for change detection. Remote Sens.
2023, 15, 2395. [CrossRef]
135. Noman, M.; Fiaz, M.; Cholakkal, H.; Narayan, S.; Anwer, R.M.; Khan, S.; Khan, F.S. Remote sensing change detection with
transformers trained from scratch. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4704214. [CrossRef]
136. Yuan, J.; Wang, L.; Cheng, S. STransUNet: A siamese transUNet-based remote sensing image change detection network. IEEE J.
Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 9241–9253. [CrossRef]
137. Deng, Y.; Meng, Y.; Chen, J.; Yue, A.; Liu, D.; Chen, J. TChange: A hybrid transformer-CNN change detection network. Remote
Sens. 2023, 15, 1219. [CrossRef]
138. Wang, G.; Li, B.; Zhang, T.; Zhang, S. A network combining a transformer and a convolutional neural network for remote sensing
image change detection. Remote Sens. 2022, 14, 2228. [CrossRef]
139. Li, Q.; Zhong, R.; Du, X.; Du, Y. TransUNetCD: A hybrid transformer network for change detection in optical remote-sensing
images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–19. [CrossRef]
140. Liu, M.; Chai, Z.; Deng, H.; Liu, R. A CNN-transformer network with multiscale context aggregation for fine-grained cropland
change detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 4297–4306. [CrossRef]
141. Yin, M.; Chen, Z.; Zhang, C. A CNN-transformer network combining CBAM for change detection in high-resolution remote
sensing images. Remote Sens. 2023, 15, 2406. [CrossRef]
142. Wang, W.; Tan, X.; Zhang, P.; Wang, X. A CBAM based multiscale transformer fusion approach for remote sensing image change
detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 6817–6825. [CrossRef]
143. Song, X.; Hua, Z.; Li, J. LHDACT: Lightweight hybrid dual attention CNN and transformer network for remote sensing image
change detection. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [CrossRef]
144. Jiang, M.; Chen, Y.; Dong, Z.; Liu, X.; Zhang, X.; Zhang, H. Multiscale fusion CNN-transformer network for high-resolution
remote sensing image change detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 5280–5293. [CrossRef]
145. Tang, W.; Wu, K.; Zhang, Y.; Zhan, Y. A siamese network based on multiple attention and multilayer transformers for change
detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5219015. [CrossRef]
146. Niu, Y.; Guo, H.; Lu, J.; Ding, L.; Yu, D. SMNet: Symmetric multi-task network for semantic change detection in remote sensing
images based on CNN and transformer. Remote Sens. 2023, 15, 949. [CrossRef]
147. Li, W.; Xue, L.; Wang, X.; Li, G. Mctnet: A multi-scale cnn-transformer network for change detection in optical remote sensing
images. In Proceedings of the 2023 26th International Conference on Information Fusion (FUSION), Charleston, SC, USA, 27–30
July 2023; pp. 1–5.
148. Tang, X.; Zhang, T.; Ma, J.; Zhang, X.; Liu, F.; Jiao, L. Wnet: W-shaped hierarchical network for remote sensing image change
detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5615814. [CrossRef]
149. Zhang, X.; Cheng, S.; Wang, L.; Li, H. Asymmetric cross-attention hierarchical network based on CNN and transformer for
bitemporal remote sensing images change detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–15. [CrossRef]
150. Feng, Y.; Xu, H.; Jiang, J.; Liu, H.; Zheng, J. ICIF-Net: Intra-scale cross-interaction and inter-scale feature fusion network for
bitemporal remote sensing images change detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13. [CrossRef]
151. Fu, Z.; Li, J.; Ren, L.; Chen, Z. Slddnet: Stage-wise short and long distance dependency network for remote sensing change
detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–19. [CrossRef]
152. Zhang, C.; Wang, L.; Cheng, S. HCGNet: A hybrid change detection network based on CNN and GNN. IEEE Trans. Geosci.
Remote Sens. 2024, 62, 1–12. [CrossRef]
153. Zhu, Y.; Li, Q.; Lv, Z.; Falco, N. Novel land cover change detection deep learning framework with very small initial samples
using heterogeneous remote sensing images. Remote Sens. 2023, 15, 4609. [CrossRef]
154. Liu, M.; Shi, Q.; Marinoni, A.; He, D.; Liu, X.; Zhang, L. Super-resolution-based change detection network with stacked attention
module for images with different resolutions. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–18. [CrossRef]
155. Tian, J.; Peng, D.; Guan, H.; Ding, H. RACDNet: Resolution-and alignment-aware change detection network for optical remote
sensing imagery. Remote Sens. 2022, 14, 4527. [CrossRef]
156. Liu, M.; Shi, Q.; Liu, P.; Wan, C. Siamese generative adversarial network for change detection under different scales. In
Proceedings of the IGARSS 2020—2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26
September–2 October 2020; pp. 2543–2546.
157. Prexl, J.; Saha, S.; Zhu, X.X. Mitigating spatial and spectral differences for change detection using super-resolution and
unsupervised learning. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS,
Brussels, Belgium, 11–16 July 2021; pp. 3113–3116.
158. Li, S.; Wang, Y.; Cai, H.; Lin, Y.; Wang, M.; Teng, F. MF-SRCDNet: Multi-feature fusion super-resolution building change detection
framework for multi-sensor high-resolution remote sensing imagery. Int. J. Appl. Earth Obs. Geoinf. 2023, 119, 103303. [CrossRef]
159. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.;
Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
160. Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. Transformers:
State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language
Processing: System Demonstrations, Online, 16–20 November 2020; pp. 38–45.
161. Liu, M.; Shi, Q.; Li, J.; Chai, Z. Learning token-aligned representations with multimodel transformers for different-resolution
change detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13. [CrossRef]
162. Sun, B.; Liu, Q.; Yuan, N.; Tan, J.; Gao, X.; Yu, T. Spectral token guidance transformer for multisource images change detection.
IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 2559–2572. [CrossRef]
163. Chen, H.; Zhang, H.; Chen, K.; Zhou, C.; Chen, S.; Zou, Z.; Shi, Z. Continuous cross-resolution remote sensing image change
detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5623320. [CrossRef]
164. Chen, H.; Wu, C.; Du, B.; Zhang, L.; Wang, L. Change detection in multisource VHR images via deep siamese convolutional
multiple-layers recurrent neural network. IEEE Trans. Geosci. Remote Sens. 2019, 58, 2848–2864. [CrossRef]
165. Benedetti, P.; Ienco, D.; Gaetano, R.; Ose, K.; Pensa, R.G.; Dupuy, S. M3Fusion: A deep learning architecture for multiscale
multimodal multitemporal satellite data fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 4939–4949. [CrossRef]
166. Ebel, P.; Saha, S.; Zhu, X.X. Fusing multi-modal data for supervised change detection. Int. Arch. Photogramm. Remote Sens. Spat.
Inf. Sci. 2021, 43, 243–249. [CrossRef]
167. Hafner, S.; Nascetti, A.; Azizpour, H.; Ban, Y. Sentinel-1 and Sentinel-2 data fusion for urban change detection using a dual stream
u-net. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5. [CrossRef]
168. He, X.; Zhang, S.; Xue, B.; Zhao, T.; Wu, T. Cross-modal change detection flood extraction based on convolutional neural network.
Int. J. Appl. Earth Obs. Geoinf. 2023, 117, 103197. [CrossRef]
169. Li, H.; Zhu, F.; Zheng, X.; Liu, M.; Chen, G. MSCDUNet: A deep learning framework for built-Up area change detection
integrating multispectral, SAR, and VHR data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 5163–5176. [CrossRef]
170. Chen, H.; Wu, C.; Du, B.; Zhang, L. DSDANet: Deep siamese domain adaptation convolutional neural network for cross-domain
change detection. arXiv 2020, arXiv:2006.09225.
171. Zhang, C.; Feng, Y.; Hu, L.; Tapete, D.; Pan, L.; Liang, Z.; Cigna, F.; Yue, P. A domain adaptation neural network for change
detection with heterogeneous optical and SAR remote sensing images. Int. J. Appl. Earth Obs. Geoinf. 2022, 109, 102769. [CrossRef]
172. Luppino, L.T.; Hansen, M.A.; Kampffmeyer, M.; Bianchi, F.M.; Moser, G.; Jenssen, R.; Anfinsen, S.N. Code-aligned autoencoders
for unsupervised change detection in multimodal remote sensing images. IEEE Trans. Neural Netw. Learn. Syst. 2022, 5, 60–72.
[CrossRef]
173. Wu, Y.; Li, J.; Yuan, Y.; Qin, A.; Miao, Q.G.; Gong, M.G. Commonality autoencoder: Learning common features for change
detection from heterogeneous images. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 4257–4270. [CrossRef]
174. Farahani, M.; Mohammadzadeh, A. Domain adaptation for unsupervised change detection of multisensor multitemporal
remote-sensing images. Int. J. Remote Sens. 2020, 41, 3902–3923. [CrossRef]
175. Jiang, X.; Li, G.; Liu, Y.; Zhang, X.P.; He, Y. Change detection in heterogeneous optical and SAR remote sensing images via deep
homogeneous feature fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 1551–1566. [CrossRef]
176. Touati, R.; Mignotte, M.; Dahmane, M. Anomaly feature learning for unsupervised change detection in heterogeneous images: A
deep sparse residual model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 588–600. [CrossRef]
177. Zheng, X.; Chen, X.; Lu, X.; Sun, B. Unsupervised change detection by cross-resolution difference learning. IEEE Trans. Geosci.
Remote Sens. 2021, 60, 1–16. [CrossRef]
178. Wei, L.; Chen, G.; Zhou, Q.; Liu, C.; Cai, C. Cross-mapping net: Unsupervised change detection from heterogeneous remote
sensing images using a transformer network. In Proceedings of the 2023 8th International Conference on Computer and
Communication Systems (ICCCS), Guangzhou, China, 21–24 April 2023; pp. 1021–1026.
179. Lu, T.; Zhong, X.; Zhong, L. mSwinUNet: A multi-modal U-shaped swin transformer for supervised change detection. J. Intell.
Fuzzy Syst. 2024. Preprint.
180. Hu, X.; Zhang, P.; Ban, Y.; Rahnemoonfar, M. GAN-based SAR and optical image translation for wildfire impact assessment using
multi-source remote sensing data. Remote Sens. Environ. 2023, 289, 113522. [CrossRef]
181. Zhao, T.; Wang, L.; Zhao, C.; Liu, T.; Ohtsuki, T. Heterogeneous image change detection based on deep image translation and
feature refinement-aggregation. In Proceedings of the 2023 IEEE International Conference on Image Processing (ICIP), Kuala
Lumpur, Malaysia, 8–11 October 2023; pp. 1705–1709.
182. Manocha, A.; Afaq, Y. Optical and SAR images-based image translation for change detection using generative adversarial
network (GAN). Multimed. Tools Appl. 2023, 82, 26289–26315. [CrossRef]
183. Du, Z.; Li, X.; Miao, J.; Huang, Y.; Shen, H.; Zhang, L. Concatenated deep learning framework for multi-task change detection of
optical and SAR images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 17, 719–731. [CrossRef]
184. Wang, M.; Huang, L.; Tang, B.H.; Le, W.; Tian, Q. TDSCCNet: Twin-depthwise separable convolution connect network for change
detection with heterogeneous images. Geocarto Int. 2024, 39, 2329673. [CrossRef]
185. Su, Z.; Wan, G.; Zhang, W.; Wei, Z.; Wu, Y.; Liu, J.; Jia, Y.; Cong, D.; Yuan, L. Edge-bound change detection in multisource remote
sensing images. Electronics 2024, 13, 867. [CrossRef]
186. Xu, J.; Luo, C.; Chen, X.; Wei, S.; Luo, Y. Remote sensing change detection based on multidirectional adaptive feature fusion and
perceptual similarity. Remote Sens. 2021, 13, 3053. [CrossRef]
187. Peng, X.; Zhong, R.; Li, Z.; Li, Q. Optical remote sensing image change detection based on attention mechanism and image
difference. IEEE Trans. Geosci. Remote Sens. 2020, 59, 7296–7307. [CrossRef]
188. Ienco, D.; Interdonato, R.; Gaetano, R.; Minh, D.H.T. Combining Sentinel-1 and Sentinel-2 satellite image time series for land
cover mapping via a multi-source deep learning architecture. ISPRS J. Photogramm. Remote Sens. 2019, 158, 11–22. [CrossRef]
189. Wang, L.; Wang, L.; Wang, H.; Wang, X.; Bruzzone, L. SPCNet: A subpixel convolution-based change detection network for
hyperspectral images with different spatial resolutions. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14. [CrossRef]
190. Xu, X.; Li, W.; Ran, Q.; Du, Q.; Gao, L.; Zhang, B. Multisource remote sensing data classification based on convolutional neural
network. IEEE Trans. Geosci. Remote Sens. 2017, 56, 937–949. [CrossRef]
191. Chen, Y.; Li, C.; Ghamisi, P.; Jia, X.; Gu, Y. Deep fusion of remote sensing data for accurate classification. IEEE Geosci. Remote Sens.
Lett. 2017, 14, 1253–1257. [CrossRef]
192. Feng, Q.; Zhu, D.; Yang, J.; Li, B. Multisource hyperspectral and LiDAR data fusion for urban land-use mapping based on a
modified two-branch convolutional neural network. ISPRS Int. J. Geo-Inf. 2019, 8, 28. [CrossRef]
193. Mohla, S.; Pande, S.; Banerjee, B.; Chaudhuri, S. Fusatnet: Dual attention based spectrospatial multimodal fusion network for
hyperspectral and lidar classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 92–93.
194. Ma, W.; Karakuş, O.; Rosin, P.L. AMM-FuseNet: Attention-based multi-modal image fusion network for land cover mapping.
Remote Sens. 2022, 14, 4458. [CrossRef]
195. Liu, J.; Gong, M.; Qin, K.; Zhang, P. A deep convolutional coupling network for change detection based on heterogeneous optical
and radar images. IEEE Trans. Neural Netw. Learn. Syst. 2016, 29, 545–559. [CrossRef]
196. Liu, Z.; Li, G.; Mercier, G.; He, Y.; Pan, Q. Change detection in heterogenous remote sensing images via homogeneous pixel
transformation. IEEE Trans. Image Process. 2017, 27, 1822–1834. [CrossRef]
197. Roy, S.K.; Deria, A.; Hong, D.; Rasti, B.; Plaza, A.; Chanussot, J. Multimodal fusion transformer for remote sensing image
classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–20. [CrossRef]
198. Luppino, L.T.; Bianchi, F.M.; Moser, G.; Anfinsen, S.N. Unsupervised image regression for heterogeneous change detection. arXiv
2019, arXiv:1909.05948. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.