
Surveying Image Segmentation Approaches in Astronomy

Duo Xu^a, Ye Zhu^b

^a Department of Astronomy, University of Virginia, 530 McCormick Rd, Charlottesville, 22904-4235, VA, USA
^b Department of Computer Science, Princeton University, 35 Olden St, Princeton, 08540, NJ, USA

arXiv:2405.14238v1 [astro-ph.IM] 23 May 2024

Abstract
Image segmentation plays a critical role in unlocking the mysteries of the universe, providing astronomers with a
clearer perspective on celestial objects within complex astronomical images and data cubes. Manual segmentation,
while traditional, is not only time-consuming but also susceptible to biases introduced by human intervention. As a
result, automated segmentation methods have become essential for achieving robust and consistent results in astro-
nomical studies. This review begins by summarizing traditional and classical segmentation methods widely used in
astronomical tasks. Despite the significant improvements these methods have brought to segmentation outcomes, they
fail to meet astronomers’ expectations, requiring additional human correction, further intensifying the labor-intensive
nature of the segmentation process. The review then focuses on the transformative impact of machine learning, partic-
ularly deep learning, on segmentation tasks in astronomy. It introduces state-of-the-art machine learning approaches,
highlighting their applications and the remarkable advancements they bring to segmentation accuracy in both as-
tronomical images and data cubes. As the field of machine learning continues to evolve rapidly, it is anticipated
that astronomers will increasingly leverage these sophisticated techniques to enhance segmentation tasks in their re-
search projects. In essence, this review serves as a comprehensive guide to the evolution of segmentation methods
in astronomy, emphasizing the transition from classical approaches to cutting-edge machine learning methodologies.
We encourage astronomers to embrace these advancements, fostering a more streamlined and accurate segmentation
process that aligns with the ever-expanding frontiers of astronomical exploration.
Keywords: Segmentation, Artificial Intelligence, Machine Learning, Neural Network, Vision Transformer,
Generative Model, Astronomy image processing (2306)

1. Introduction

Astronomy is currently experiencing a data-driven revolution, fueled by an exponential increase in data acquisition. Numerous comprehensive sky surveys have already been completed, with many more planned (Ricker et al., 2015; Bellm et al., 2019; Ivezić et al., 2019; Dey et al., 2019; Flewelling et al., 2020; Ahumada et al., 2020; Gaia Collaboration et al., 2021). This deluge of data poses significant challenges to astronomers, particularly in the complex analysis of these voluminous datasets. A fundamental and recurring task in astronomy is segmentation, the process of accurately identifying celestial objects or features (Bertin and Arnouts, 1996; Berry, 2015; Robitaille et al., 2019b).

Image segmentation is an essential first step in analyzing images or three-dimensional data cubes. Its primary goal is to partition the data into distinct segments or regions, enabling the differentiation of boundaries between individual celestial objects. This provides a robust foundation for further scrutiny, enabling a comprehensive examination of the unique attributes associated with each distinct component. Astronomical images and data cubes inherently encompass a diverse array of celestial objects and phenomena. Segmentation is instrumental in separating these multifarious elements, enabling an in-depth examination of individual components.

The ability to scrutinize the dimensions, contours, and assorted characteristics of these segmented regions equips astronomers with a heightened level of precision in their research pursuits and analytical comparisons. For example, segmentation of galaxies delineates their inherent components, such as spiral arms, bulges, and bars, thereby elucidating the complex processes underlying their formation and evolution (Shen and Zheng, 2020; Bekki, 2021; Rey Deutsch et al., 2023).

Furthermore, the application of segmentation extends to the analysis of multi-wavelength data, contributing to a holistic understanding of targeted astronomical subjects. This approach involves exploring a broad spectrum of wavelengths, ranging from radio waves and infrared to optical, UV, X-ray, and gamma-rays (Men'shchikov et al., 2012; Verbeeck et al., 2013; Robitaille et al., 2019a; van der Zwaard et al., 2021; Adithya et al., 2021; Huertas-Company and Lanusse, 2023). Additionally, segmentation is employed in the analysis of gravitational wave data (Covas and Prix, 2022) and neutrino observations (Belavin et al., 2021). The diverse range of wavelengths and messengers provides distinct insights into various physical processes, elevating the depth of the comprehensive analysis of celestial objects.

Additionally, segmentation facilitates the study of transient or time-dependent astronomical phenomena, a domain that encompasses celestial events like supernovae and variable stars (Olmedo et al., 2008; Verbeeck et al., 2014). In this context, segmentation becomes the conduit through which astronomers can trace the temporal evolution and transformations of these celestial bodies, thereby facilitating comprehensive investigations into the nature of these transient occurrences.

As the volume of observational data swells, the importance of automation through segmentation is underscored. Leveraging the capabilities of machine learning and deep learning techniques, astronomers can effectively automate the process of identifying and categorizing objects within expansive datasets, thus enhancing the efficiency and efficacy of astronomical analyses.

In this review paper, we embark on a two-fold journey: an exploration of classical segmentation methods in Section 2, followed by an investigation into machine learning-based approaches to segmentation in Section 3.

2. Classic Segmentation Methods

In this section, we introduce several classical segmentation methods used in astronomy.

2.1. Thresholding

Thresholding is a fundamental and well-established image segmentation technique, widely used in image analysis and processing, and it holds particular significance in astronomy. It plays a pivotal role in partitioning astronomical images into discrete regions or objects by distinguishing between pixel values above and below a predetermined threshold. Thresholding's essence lies in its elementary nature, serving as the gatekeeper that demarcates an image into two or more segments based on the intrinsic pixel intensity values.

Determining a suitable threshold poses challenges, given factors like noise, background variations, or the diffuse boundaries of objects. Selecting a threshold involves a trade-off: minimizing false negatives may lead to an increase in false positives, and vice versa. Striking a balance between these errors becomes challenging, as extreme threshold variations tend to minimize one error type at the expense of the other. Consequently, the task lies in finding a threshold that minimizes both types of errors simultaneously.

Various strategies exist for setting the threshold, often adopting arbitrary methods. Examples include adjusting it based on sky background and noise levels (e.g., Irwin, 1985; Starck et al., 1999) or deriving it from modeled distributions, such as χ² distributions. In the latter case, the threshold is determined at the intersection point between the theoretical and actual data distributions (Szalay et al., 1999). Alternatively, if the emission distribution of real sources is known, one might opt for a threshold set at three times the deviation of the peak distribution (Slezak et al., 1988) or choose a threshold that minimizes the fraction of false detections (Lazzati et al., 1999; Hopkins et al., 2002). However, it is crucial to note that these methods are not fully automated, and the selection of the threshold may involve some level of arbitrariness.

The majority of source extraction algorithms in astronomy, such as PyBDSF (Python Blob Detection and Source Finder, Mohan and Rafferty, 2015), Selavy (Whiting and Humphreys, 2012), Aegean (Hancock et al., 2012), and Caesar (Compact And Extended Source Automated Recognition, Riggi, 2018), rely on basic thresholding as their fundamental approach, with some variation in the treatment of threshold selection among different algorithms. Additional algorithms employing direct thresholding on optical, X-ray, and multiband images have been developed, as demonstrated in studies by Herzog and Illingworth (1977); Newell and O'Neil (1977); Buonanno et al. (1983); Vikhlinin et al. (1995).

One noteworthy instance within the realm of thresholding is Otsu's method, proposed by Nobuyuki Otsu in 1979 (Otsu, 1979). This method is exceptionally proficient in the automated selection of an optimal threshold for image segmentation. Otsu's method optimizes the between-class variance of pixel intensities, with the primary objective of demarcating the object of interest from the background. Notably, Otsu's method excels when confronted with an image whose pixel intensity histogram exhibits a bimodal or multimodal distribution, a common scenario in image segmentation.

The Otsu thresholding algorithm can be summarized in the following steps:
1. Histogram computation: Compute the histogram
of the input image, revealing the distribution of
pixel intensities.
2. Probability distribution: Normalize the histogram
to obtain a probability distribution of pixel intensi-
ties.
3. Threshold initialization: Initialize a threshold
value, typically set to zero.
4. Between-class variance calculation: For each fea-
sible threshold value, calculate the between-class
variance, which quantifies the dispersion of pixel
intensities between the object of interest and the
background.
5. Within-class variance calculation: Concurrently,
for each threshold value, calculate the within-class
variance for both the background and the object of
interest, which characterizes the spread of pixel in-
tensities within these classes.
6. Weighted sum of variances: Compute a weighted sum of variances for each threshold. This measure captures the amalgamation of the between-class and within-class variances.
7. Threshold selection: Iterate through steps 4-6, identifying the threshold value that maximizes the between-class variance (equivalently, minimizes the within-class variance). This optimal threshold signifies the zenith of contrast and separation between the object of interest and the background in the image.
8. Binarization: Apply the selected threshold to the original image, rendering a binary image in which the object of interest and the background are distinctly differentiated, setting the stage for further image processing or analysis.
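As a concrete illustration of these steps, the short Python sketch below applies Otsu's method to a single-band image with scikit-image; the synthetic image and variable names are placeholders rather than part of any pipeline described in this review (real data could be loaded with astropy.io.fits).

```python
import numpy as np
from skimage.filters import threshold_otsu

# A toy bimodal image standing in for a single-band astronomical cutout.
rng = np.random.default_rng(0)
image = rng.normal(0.0, 1.0, size=(256, 256))   # background noise
image[100:140, 100:140] += 5.0                  # a bright "source"

# Steps 1-7: scikit-image builds the intensity histogram and returns the
# threshold that maximizes the between-class variance.
t_otsu = threshold_otsu(image)

# Step 8: binarization.
mask = image > t_otsu
print(f"Otsu threshold = {t_otsu:.2f}, source pixels = {mask.sum()}")
```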
Otsu's method is a robust and versatile thresholding algorithm that has been adopted in a diverse range of scientific and engineering fields, such as astronomical image segmentation for the identification of stars and galaxies (Zheng et al., 2015). Its simplicity and efficiency render it a valuable tool for image segmentation and other image processing tasks. It is noteworthy that SExtractor (Bertin and Arnouts, 1996), an extensively utilized algorithm for source segmentation, also employs the thresholding concept during the process of source identification. Illustrated in Figure 1 is an instance of source segmentation/detection utilizing the improved Otsu's method introduced by Zheng et al. (2015).

Figure 1: Captured from Zheng et al. (2015), squares indicate accurately known object positions. Ellipses represent objects identified using the enhanced Otsu's method outlined in Zheng et al. (2015), while triangles denote true, faint objects exclusively detected by this method but not by SExtractor.

2.2. Edge Detection

Edge detection is a fundamental image processing technique that identifies discontinuities in pixel intensity values, corresponding to the boundaries of objects and regions. It plays a pivotal role in image segmentation, providing the foundation for subsequent analysis. In the realm of astronomy, where precision and accuracy are paramount, various edge detection algorithms are applied, each offering distinct advantages and trade-offs. Notably, the Sobel, Prewitt, and Canny methods are three prominent gradient-based techniques widely adopted for astronomical image analysis.

The Sobel and Prewitt operators are gradient-based edge detection algorithms that operate by convolving the image with a pair of 3 × 3 kernels, one for horizontal edges and one for vertical edges. The Sobel and Prewitt kernels are as follows:

\[
\text{Sobel: } G_x = \begin{pmatrix} +1 & 0 & -1 \\ +2 & 0 & -2 \\ +1 & 0 & -1 \end{pmatrix}, \quad
G_y = \begin{pmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix},
\]

\[
\text{Prewitt: } G_x = \begin{pmatrix} +1 & 0 & -1 \\ +1 & 0 & -1 \\ +1 & 0 & -1 \end{pmatrix}, \quad
G_y = \begin{pmatrix} +1 & +1 & +1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{pmatrix}.
\]

The convolution operation computes the weighted sum of the neighboring pixels, where the weights are given by the kernel. The resulting images, G_x and G_y, represent the horizontal and vertical gradients of the input image, respectively. The edge magnitude and direction can then be calculated from the gradient images:

\[
\mathrm{magnitude} = \sqrt{G_x^2 + G_y^2}, \qquad
\mathrm{direction} = \operatorname{atan2}(G_y, G_x).
\]
Pixels with a high magnitude and a consistent direction
are considered to be edges. The Sobel and Prewitt oper-
ators are simple and efficient, but they can be sensitive
to noise in the image. To improve robustness to noise,
it is common to pre-smooth the image with a Gaussian
filter before applying the edge detection algorithm.
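As a minimal sketch of this workflow, the snippet below pre-smooths an image with a Gaussian filter and then computes Sobel gradients with scikit-image; the input array and the smoothing scale are illustrative assumptions only.

```python
import numpy as np
from skimage import filters

# `image` is assumed to be a 2D float array (e.g., loaded from a FITS file).
image = np.random.default_rng(1).normal(size=(256, 256))  # placeholder data

# Pre-smooth with a Gaussian filter to suppress pixel-scale noise.
smoothed = filters.gaussian(image, sigma=2.0)

# Sobel responses along the two image axes, then edge magnitude and direction.
g1 = filters.sobel_h(smoothed)
g2 = filters.sobel_v(smoothed)
magnitude = np.hypot(g1, g2)
direction = np.arctan2(g2, g1)

edges = magnitude > 3.0 * magnitude.std()  # crude edge mask for illustration
```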
In contrast to the Sobel and Prewitt operators, the Canny edge detector distinguishes itself as a multi-stage edge detection approach celebrated for its robustness and precision. The Canny method incorporates a sequence of stages:

1. Gaussian smoothing: The image is smoothed with a Gaussian filter to reduce noise and create a continuous gradient.
2. Gradient calculation: The gradient magnitude and direction are calculated at each pixel location.
3. Non-maximum suppression: Only the local maxima along the gradient direction are preserved, resulting in single-pixel-wide edges.
4. Edge tracking by hysteresis: A two-level thresholding strategy is used to link strong edges together and discard weak edges.

The Canny edge detector is more computationally expensive than other edge detection algorithms, such as the Sobel and Prewitt operators, but it produces more accurate edge maps. It is also more robust to noise, making it suitable for a wide range of applications in computer vision, image processing, and image analysis.

Edge detection is a fundamental image processing technique that is widely used in astronomy to identify structures. For example, the Sobel operator is used to detect filamentary structures in molecular clouds, which can be used to study their alignment with the magnetic field direction (Green et al., 2017). The Sobel edge detection technique is also used to measure the magnitude of the tip of the red giant branch (TRGB), which is a key parameter for determining distances to nearby galaxies (Mouhcine et al., 2005). Additionally, the Canny edge detector is used to detect transient events such as coronal mass ejections from the Sun (Boursier et al., 2005), and to search for the signatures of cosmic string networks on cosmic microwave background (CMB) anisotropies (Vafaei Sadr et al., 2018).

The Sobel, Prewitt, and Canny methods are all fundamental edge detection techniques, each with its own strengths and weaknesses. The choice of which method to use depends on factors such as the level of image noise, the required precision, and the computational constraints. Figure 2 illustrates the detected edges in Cassini Astronomy images employing different methods such as Canny, Sobel, and Prewitt (Yang et al., 2018).

Figure 2: Detected edges using various methods on Cassini Astronomy images, as detailed in Yang et al. (2018). (a) Depicts the Original Image. (b), (c), and (d) illustrate the edges detected by Canny, Sobel, and Prewitt methods, respectively.

2.3. Watershed

Watershed segmentation is a widely used image segmentation algorithm that partitions an image into distinct regions based on pixel intensity and spatial relationships. The algorithm works by treating the image as a topographic surface, with pixel intensity values representing elevations. Water is flooded onto the surface from markers, which are typically identified by the user or generated using image processing techniques. The water flows into valleys and basins, and the watershed lines (i.e., the boundaries between the basins) define the segmented regions. Here is a step-by-step overview of the watershed segmentation process:

1. Preprocessing: Denoise and enhance the image, if needed.
2. Gradient computation: Calculate the image gradient magnitude to emphasize areas of rapid change in intensity.
3. Marker selection: Identify seed pixels, either manually or using image processing techniques.
4. Distance transform: Calculate the distance transform of the markers to define their potential regions of influence.
5. Watershed flooding: Simulate water rising from the markers and flooding the image, separating basins with watershed lines.
6. Region merging: Merge adjacent basins as the water level rises.
7. Result visualization: Label each pixel in the segmented image according to its basin.
8. Post-processing: Refine the segmentation results as needed, e.g., by removing small regions or merging adjacent regions.
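The sketch below illustrates marker-based watershed segmentation with scikit-image and SciPy; the smoothing scale, peak separation, and array names are illustrative assumptions rather than recommended settings.

```python
import numpy as np
from skimage.filters import gaussian, threshold_otsu
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

# `image` is assumed to be a 2D array with bright sources on a dark background.
image = gaussian(np.random.default_rng(2).random((256, 256)), sigma=3.0)  # placeholder

# Foreground mask from a simple threshold (any thresholding scheme could be used).
mask = image > threshold_otsu(image)

# Markers: local maxima of the image, labeled as individual seeds.
peaks = peak_local_max(image, min_distance=10, labels=mask)
markers = np.zeros(image.shape, dtype=int)
markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)

# Flood the inverted image from the markers; basins are separated by watershed lines.
labels = watershed(-image, markers=markers, mask=mask)
```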
Watershed segmentation is a powerful image segmentation algorithm, but it is sensitive to the selection of markers and initial conditions. Careful marker selection and post-processing are essential for accurate and meaningful results.

Figure 3: From Berry (2015), on the left: an artificial field containing Gaussian clumps. On the right: the array depicting the clump assignments generated by FellWalker.

The watershed algorithm is widely used in astronomy to segment a variety of objects, including filamentary structures in molecular clouds (e.g., the FELLWALKER method, Berry, 2015; Rani et al., 2023), stars, galaxies (Zheng et al., 2015; Hausen and Robertson, 2020), and large-scale structures such as voids (Platen et al., 2007). It is worth mentioning that the general source extraction code ProFound (Robotham et al., 2018; Hale et al., 2019) also incorporates the watershed algorithm. Figure 3 illustrates an instance of Gaussian clump segmentation using FellWalker (Berry, 2015).
2.4. Active Contours

Active contours, also known as snakes, are a powerful image segmentation technique that uses an iterative optimization algorithm to minimize an energy functional. The energy functional consists of two terms: an internal energy term that penalizes the curvature and elasticity of the contour, and an external energy term that attracts the contour to image features such as edges and lines. Active contour segmentation is typically performed as follows:

1. Initialize the active contour, a process that can be performed either manually by the user or automatically through an image processing algorithm. One straightforward approach is to apply image thresholding.
2. Compute the energy functional. The energy functional is computed based on the current position of the active contour and the image features.
3. Deform the active contour. The active contour is deformed to minimize the energy functional. This can be done using a variety of optimization algorithms.
4. Repeat steps 2 and 3 until the active contour converges. Convergence occurs when the energy functional cannot be further reduced.
5. The final position of the active contour represents the segmented object.
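As a rough illustration of steps 1-5, the snippet below initializes a circular contour and lets scikit-image's snake implementation relax it onto a bright object; the initialization radius and the weighting parameters are illustrative guesses, not tuned values.

```python
import numpy as np
from skimage.filters import gaussian
from skimage.segmentation import active_contour

# Placeholder image: a single bright, roughly circular object near the center.
yy, xx = np.mgrid[0:200, 0:200]
image = np.exp(-((xx - 100) ** 2 + (yy - 100) ** 2) / (2 * 30**2))

# Step 1: initialize the contour as a circle around the expected object.
theta = np.linspace(0, 2 * np.pi, 200)
init = np.column_stack([100 + 60 * np.sin(theta), 100 + 60 * np.cos(theta)])

# Steps 2-4: iteratively deform the contour to minimize the energy functional,
# balancing internal smoothness (alpha, beta) against the image term.
snake = active_contour(gaussian(image, sigma=3.0), init,
                       alpha=0.015, beta=10.0, gamma=0.001)

# Step 5: `snake` holds the final coordinates of the segmented boundary.
```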
Figure 4: Tracking ribbon positions using active contours on a solar-flare UV and EUV image from Gill et al. (2010).

Active contour segmentation is a powerful and versatile technique for segmenting objects with a wide range of shapes and appearances. However, it is important to note that the accuracy of the segmentation results is sensitive to the choice of initial contour and the parameters of the energy functional. Active contour segmentation is employed in solar astrophysics for various applications. Its usage spans the segmentation of elongated bright ribbons in solar-flare UV and EUV images, contributing to the investigation of magnetic field line reconnection (Gill et al., 2010). Furthermore, active contour segmentation is employed to track and determine the differential rotation of sunspots and coronal bright points (Dorotovic et al., 2014). In the detection of coronal holes, it integrates a confidence map for enhanced accuracy (Grajeda et al., 2023). Additionally, active contour segmentation facilitates real-time detection and extraction of coronal holes from solar disk images (Bandyopadhyay et al., 2020). It is also utilized to segment diffuse objects within coronal holes (Tajik and Rahebi, 2013) and extract as well as characterize coronal holes, playing a crucial role in predicting the fast solar wind (Boucheron et al., 2016). Figure 4 depicts the utilization of the active contour method for tracking ribbon positions on a solar-flare UV and EUV image (Gill et al., 2010).
2.5. Wavelet Transform

Wavelet transform, originally designed for signal processing, has become a versatile tool in image processing and segmentation. It excels at decomposing images into frequency components, enabling simultaneous analysis in spatial and frequency domains. In image segmentation, wavelet transform represents features at different scales through multiresolution analysis. Commonly, wavelet-based segmentation uses adaptive thresholding to separate regions of interest from background noise. This adaptability ensures nuanced segmentation, making wavelets robust for diverse image structures, including those encountered in medical imaging, remote sensing, and astronomy.

The "à trous" algorithm, also known as the "with holes" algorithm or Stationary Wavelet Transform (SWT), is a pivotal wavelet-based method in image processing and segmentation. It facilitates multiresolution analysis by decomposing an image into frequency components at various scales without explicit downsampling, preserving the original sampling grid and leaving "holes" in the transformation process. This unique approach allows for a detailed analysis of images at different scales without loss of information. In image segmentation, the algorithm accurately represents image features, capturing both global and local characteristics. Its adaptability makes it well-suited for diverse segmentation tasks, including those in astronomy with varying intensities in astronomical images. The "à trous" algorithm's distinctive approach enhances the precision of image analysis, contributing to the broader field of image processing and analysis. We utilize the mathematical framework described in Starck et al. (1998); Starck et al. (1999). The "à trous" algorithm decomposes an image into different levels (typically J levels). For each decomposition level (j = 0 to J-1), the process involves convolving the original image f(x, y) with a scaling function h to obtain smoothed data c_j(k, l) at resolution level j and position (k, l). This is mathematically represented as:

\[
c_0(k, l) = f(k, l),
\]
\[
c_j(k, l) = \sum_{m,n} h(m, n)\, c_{j-1}\!\left(k + 2^{j-1} m,\ l + 2^{j-1} n\right).
\]

Here, (m, n) ranges over the filter size. The difference signal w_j between two consecutive resolutions is computed as:

\[
w_j(k, l) = c_{j-1}(k, l) - c_j(k, l).
\]

This algorithm constructs the sequence by performing successive convolutions with a filter derived from an auxiliary function known as the scaling function h. A commonly employed linear profile scaling function h is typically represented as follows:

\[
h = \frac{1}{16}
\begin{pmatrix}
1 & 2 & 1 \\
2 & 4 & 2 \\
1 & 2 & 1
\end{pmatrix}.
\]

A widely used B3 cubic spline profile scaling function h takes the following form:

\[
h = \frac{1}{256}
\begin{pmatrix}
1 & 4 & 6 & 4 & 1 \\
4 & 16 & 24 & 16 & 4 \\
6 & 24 & 36 & 24 & 6 \\
4 & 16 & 24 & 16 & 4 \\
1 & 4 & 6 & 4 & 1
\end{pmatrix}.
\]

Below are the steps involved in using the "à trous" algorithm for image segmentation:

1. Wavelet Decomposition: Apply the "à trous" algorithm to decompose the input astronomical image into frequency components at multiple scales through multiresolution analysis.
2. Wavelet Coefficient Thresholding: Threshold the obtained wavelet coefficients at different scales to distinguish significant features from background noise. Adaptive thresholding ensures nuanced segmentation, accommodating varying complexities in different astronomical structures, including objects with diverse intensities.
3. Segmentation: Utilize the thresholded wavelet coefficients to perform precise segmentation on the astronomical image. The algorithm's capability to capture both global and local features makes it well-suited for accurately representing complex structures during the segmentation process in astronomical data.

The "à trous" algorithm excels in multiresolution analysis, decomposing images into frequency components at various scales to capture information at different levels of detail. Unlike traditional wavelet methods, it avoids explicit downsampling, preserving the original sampling grid for a detailed analysis of images without information loss. Adaptive thresholding of wavelet coefficients enables nuanced segmentation, adapting to
complexities in different image structures, particularly
beneficial for astronomical images with varying inten-
sities. The algorithm accurately represents global and
local characteristics, making it suitable for diverse seg-
mentation tasks. It demonstrates robustness in handling
irregularities, noise, and contrast variations. However,
computational intensity arises with large datasets or im-
ages, escalating with the number of scales in multireso-
lution analysis. Sensitivity to parameter choices, such
as scaling function and scale count, may necessitate
experimentation for optimal performance. Challenges
exist in precisely localizing features, especially when
boundaries are ambiguous. The choice of scaling func-
tion influences performance, with certain functions bet-
ter suited for specific image types or structures.
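The sketch below implements the decomposition equations above with SciPy, dilating the B3 spline kernel by inserting zeros ("holes") at each scale; the number of scales and the significance cut are arbitrary choices for illustration.

```python
import numpy as np
from scipy.ndimage import convolve

def a_trous_planes(image, n_scales=4):
    """Return the wavelet planes w_1..w_J and the final smoothed plane c_J."""
    b3 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    h0 = np.outer(b3, b3)                       # 2D B3 spline scaling function
    c_prev, planes = image.astype(float), []
    for j in range(1, n_scales + 1):
        # Dilate the kernel: taps separated by 2^(j-1) pixels, grid unchanged.
        step = 2 ** (j - 1)
        h = np.zeros((4 * step + 1, 4 * step + 1))
        h[::step, ::step] = h0
        c_next = convolve(c_prev, h, mode="reflect")
        planes.append(c_prev - c_next)          # w_j = c_{j-1} - c_j
        c_prev = c_next
    return planes, c_prev

# Example: keep only coefficients above a (hypothetical) 3-sigma significance cut.
image = np.random.default_rng(3).normal(size=(128, 128))   # placeholder data
planes, residual = a_trous_planes(image, n_scales=3)
masks = [np.abs(w) > 3.0 * np.std(w) for w in planes]
```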
In the field of astronomy, the ”à trous” algorithm
has found application in image segmentation (Rue and
Bijaoui, 1996; Barra et al., 2009; Xavier et al., 2012;
Chen et al., 2023b; Ellien et al., 2021), particularly in
tasks such as segmenting various galaxy components
(Bijaoui and Rué, 1995; Núnez and Llacer, 2003). Fig-
ure 5 provides an illustrative example of the ”à trous”
algorithm’s effectiveness in segmenting different com-
ponents of NGC 4321. Moreover, the algorithm has
been widely employed in segmenting extended struc-
tures, including supernova remnants, HII regions, and
bow shocks in radio images (Peracaula et al., 2011).

Figure 5: NGC 4321 image (upper left), segmentation of the NGC 4321 image (upper right), wavelet decomposition planes W1 (middle left), W2 (middle right), W3 (lower left), and residual image ar (lower right), as presented in Núnez and Llacer (2003).

2.6. Clustering-based Methods

Clustering-based methods are a widely adopted and versatile approach to image segmentation, and their significance extends into the domain of astronomy. These methods are prized for their versatility, serving as foundational tools for partitioning astronomical images into meaningful regions and objects. The core premise revolves around the grouping of pixels with akin characteristics into clusters, thereby facilitating the creation of distinctive image segments. This process hinges on the underlying principle that pixels sharing common attributes, be it intensity values or color characteristics,
exhibit greater similarity within their designated cluster
than to pixels belonging to other clusters. Notable clus-
tering techniques deployed in this pursuit encompass:

• K-Means clustering: K-Means clustering is a sim-


ple and efficient clustering algorithm that is widely
used in image segmentation. K-Means clustering
works by first partitioning the image pixels into a
predefined number of clusters. Then, each pixel
is assigned to the cluster with the nearest centroid.
The centroids are then updated to reflect the new
cluster assignments. This process is repeated until no further changes in the cluster assignments occur. K-Means clustering is a relatively simple algorithm to implement and is computationally efficient, but it can be sensitive to the choice of the initial centroids and the number of clusters.

• Fuzzy C-Means (FCM) clustering: FCM clustering is a more flexible clustering algorithm that allows pixels to belong to multiple clusters with different degrees of membership. This flexibility allows FCM clustering to produce more nuanced and accurate segmentation results than K-Means clustering, especially for images with complex or overlapping objects. However, FCM clustering is computationally more expensive than K-Means clustering.

• Density-Based Spatial Clustering of Applications with Noise (DBSCAN): DBSCAN is a clustering algorithm that is robust to noise and outliers. DBSCAN works by grouping pixels based on their density and spatial proximity. A pixel is considered to be a core point if it has a minimum number of neighboring pixels within a specified distance threshold. Core points are then assigned to clusters, and the clusters are expanded to include all neighboring pixels within the distance threshold. This process continues until all pixels have been assigned to a cluster or marked as noise.

• Hierarchical clustering: Hierarchical clustering is a clustering algorithm that organizes pixels into a hierarchical tree structure. Hierarchical clustering works by iteratively merging or splitting clusters based on the similarity of their constituent pixels. The termination condition for hierarchical clustering can be a predefined number of clusters or a specific stopping criterion. Hierarchical clustering allows for multi-level analysis of complex images, but it can be computationally expensive for large images.

Figure 6: From Colombo et al. (2015), top left: the Orion–Monoceros complex. Top right: SCIMES cloud decomposition using spectral clustering on the dendrogram of emission (Colombo et al., 2015). Bottom left: structures identified by CPROPS' island method based on dendrogram (Rosolowsky and Leroy, 2006). Bottom right: CLUMPFIND cloud decomposition using a "friends-of-friends" algorithm (Williams et al., 1994).

Clustering-based methodologies play a pivotal role in image segmentation, facilitating the meticulous identification and delineation of regions and objects in images. These techniques are employed in astronomy for diverse segmentation tasks. For example, FCM clustering has been instrumental in partitioning EUV solar images into well-defined regions, such as active regions, coronal holes, and the quiet sun (Barra et al., 2009; Verbeeck et al., 2014). Additionally, DBSCAN has proven highly effective in segmenting molecular cloud emissions while robustly mitigating noise contamination in astronomical data (Yan et al., 2020). Notably, Johnston et al. (2014) skillfully harnessed hierarchical clustering, exemplified through the implementation of Astrodendro (Robitaille et al., 2019b), to segment molecular cloud emissions in astronomical investigations. Figure 6 illustrates the results of emission segmentation using various algorithms on the Orion–Monoceros complex.
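As a minimal sketch of pixel-level clustering, the snippet below segments an image by running K-Means on per-pixel features with scikit-learn; the number of clusters and the choice of features (intensity plus scaled pixel coordinates) are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# `image` is assumed to be a 2D intensity array.
image = np.random.default_rng(4).random((128, 128))  # placeholder data

# Per-pixel feature vectors: intensity plus (scaled) spatial coordinates,
# so that clusters remain spatially coherent.
rows, cols = np.indices(image.shape)
features = np.column_stack([
    image.ravel(),
    0.01 * rows.ravel(),
    0.01 * cols.ravel(),
])

# Partition the pixels into a predefined number of clusters.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(features)
segmentation = kmeans.labels_.reshape(image.shape)
```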
2.7. Multiband Segmentation

It is noteworthy that several classical segmentation methods mentioned earlier are predominantly designed for single-band images rather than multiband ones. Although one can apply these methods individually to each single-band image and subsequently merge the results based on specific criteria for a comprehensive multiband segmentation, there are techniques specifically crafted to handle multiband images. Examples of such methods include the Hierarchical Hidden Markov Model (HHMM) (Collet and Murtagh, 2004) and the Connected Component Tree (cc-trees) (Slezak et al., 2010), both falling under the hierarchical clustering category discussed earlier.

The Hierarchical Hidden Markov Model (HHMM) serves as a potent Bayesian estimation framework for image segmentation, offering probabilistic sequences of observations. Its distinguishing characteristic lies in the "hidden" nature of the underlying stochastic process generating observations, governed by the Markov
property wherein the state at any time depends solely on
the preceding state. Through its hierarchical structure,
the HHMM adeptly captures intricate dependencies and
patterns in data, making it ideal for image segmentation
tasks. This model excels in unraveling complex visual
structures, leveraging Markovian principles across var-
ious image levels to discern dependencies and relation-
ships. By modeling both global and local information,
HHMMs effectively handle complex image structures,
with transitions between states enhancing segmentation
precision. The hierarchical design allows for adaptabil-
ity in revealing varying detail levels. Notably, computa-
tional demands for HHMM training and inference, es-
pecially with large images, can be substantial. Addi-
tionally, optimizing the hierarchical structure and hid-
den state count poses challenges. Figure 7 illustrates
HHMM application in segmenting galaxy components
(Collet and Murtagh, 2004).
The Connected Component Tree (cc-tree) is a hier-
archical data structure employed in image segmenta-
tion for delineating connected components or regions
within an image. Each node within the cc-tree corre-
sponds to a distinct connected component, capturing the
relationships and hierarchy among these components.
The construction process begins with identifying con-
nected components based on criteria such as pixel inten-
sity or color similarity, followed by iterative hierarchi-
cal merging or splitting. The resulting cc-tree provides
a comprehensive representation of the image’s struc-
ture, aiding the understanding of spatial organization
and relationships among different regions. CC-Trees are
widely used in image segmentation tasks, especially in
scenarios with hierarchical or detailed object organiza-
tion. They offer a versatile tool for applications like ob-
ject recognition and feature extraction, efficiently repre-
senting segmented objects and their relationships. Fig-
ure 8 exemplifies the application of cc-trees in segment-
ing HII regions in a 5-band galaxy observation (Slezak
et al., 2010). However, the efficacy of cc-trees depends
on the quality of the initial image segmentation; errors
may occur if segmentation leads to over-segmentation
or smaller, disconnected components, affecting the ac-
curacy of the cc-trees in capturing true object relation-
ships.
Figure 7: From Collet and Murtagh (2004), presentation of 6-spectral band images depicting the M82 starburst galaxy (rows 1-3), alongside the HHMM segmentation results applied to the astronomical 6-spectral band images from the M82 region (row 4).

It is important to highlight that there is no distinction in deep learning approaches between single-band segmentation, multiband segmentation, or even 3D data cube segmentation. This lack of difference arises from the flexibility of input data during deep learning model training, a topic that will be explored in the subsequent section.
Figure 8: From Slezak et al. (2010), detection of HII regions using cc-trees in two 5-band astronomical images.
3. Segmentation Based on Deep Learning

In this section, we introduce various deep learning approaches to image segmentation in astronomy. Given the rapid evolution of the deep learning field, some state-of-the-art algorithms have limited applications or are yet to be explored in astronomy. The purpose of this section is to draw attention to applicable deep learning methods, informing astronomers about efficient and high-performance approaches that can enhance their tasks.

The general steps involved in training and applying deep learning methods for astronomy segmentation tasks include the following:
1. Data Preparation: Assemble a diverse dataset with labeled images for training and validation. Clean and preprocess the data by resizing, normalizing, and augmenting to improve model generalization.
2. Model Selection: Select an appropriate deep learning architecture (e.g., U-Net, Mask R-CNN) based on task requirements and available resources. Choose between building a custom structure or adopting pretrained models for transfer learning.
3. Model Configuration: Fine-tune parameters like learning rate, batch size, and regularization for optimal performance. Define a suitable loss function aligned with segmentation objectives (e.g., cross-entropy loss, Dice loss, as discussed in Section 3.9).
4. Training: Feed input images into the network and optimize the model to minimize the defined loss function. Monitor performance on a validation set to prevent overfitting.
5. Evaluation: Assess the model's performance using evaluation metrics such as IoU, Dice coefficient, or pixel-wise accuracy.
6. Fine-Tuning (Optional): If necessary, iteratively fine-tune the model based on evaluation results.
7. Inference: Evaluate the model on an independent testing set to measure real-world performance. Apply the trained model to segment objects in new, unseen images.
8. Post-Processing (Optional): Implement post-processing techniques to improve segmentation results and address artifacts.

These steps offer a general framework, and specific details may vary based on the chosen model architecture, dataset characteristics, and segmentation task requirements. Further discussion on these aspects will be provided in the subsequent section.
pixel-wise accuracy. the extracted features from the backbone network
3.1. Mask R-CNN (Region-Based Convolutional Neural Network)

Figure 9: Overview of the Mask R-CNN framework, as presented in He et al. (2017).

Mask R-CNN (He et al., 2017), a deep learning extension of Faster R-CNN (Girshick, 2015), has revolutionized image segmentation, particularly in astronomy. The precise delineation of celestial objects in astronomical images is crucial, and Mask R-CNN excels in this domain by enabling pixel-level object segmentation. Mask R-CNN is a region-based convolutional neural network that is finely tuned for detecting and segmenting objects in images. Building on the success of Faster R-CNN, a leading object detection model, Mask R-CNN has become instrumental in image segmentation tasks.

Figure 9, taken from the work by He et al. (2017), illustrates the workflow of Mask R-CNN. In addition to the general steps introduced at the beginning of this section, we elaborate on the essential elements and central processes of Mask R-CNN as follows:

1. Region Proposal Network (RPN): The RPN takes the extracted features from the backbone network
and generates a set of proposal boxes, each associated with a score indicating the likelihood of an object being present within the box.
2. Region of Interest (RoI) Align: The RoI Align operation takes the proposal boxes and extracts corresponding feature maps from the backbone network, preserving spatial information and aligning the feature maps with the regions.
3. Classification and Bounding Box Refinement: For each proposal box, Mask R-CNN performs two tasks: classification and bounding box refinement. It first classifies the object within the box and refines the box's position if necessary.
4. Mask Generation: Mask R-CNN generates a segmentation mask for each object within the proposal boxes using a separate neural network that predicts pixel-wise masks for each object.
5. Final Prediction: The final output of Mask R-CNN includes:
   • Classification results indicating the object class for each box
   • Refined bounding box coordinates
   • Segmentation masks representing the object's precise shape at the pixel level
6. Post-Processing: The final results can undergo post-processing steps to filter out low-confidence detections, eliminate duplicate detections, and fine-tune the masks.
7. Output: The output of the Mask R-CNN algorithm is a set of bounding boxes, each associated with a class label and a high-resolution binary mask that accurately delineates the object.
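For orientation, the snippet below runs instance segmentation with the pretrained Mask R-CNN shipped in torchvision; the confidence cut and input image are placeholders, and real astronomical use would require retraining or fine-tuning on labeled astronomical data.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Mask R-CNN pretrained on COCO (weights="DEFAULT" requires torchvision >= 0.13).
model = maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# A placeholder RGB image tensor in [0, 1]; shape (channels, height, width).
image = torch.rand(3, 512, 512)

with torch.no_grad():
    output = model([image])[0]   # dict with "boxes", "labels", "scores", "masks"

# Keep detections above an arbitrary confidence threshold.
keep = output["scores"] > 0.5
boxes = output["boxes"][keep]
masks = output["masks"][keep] > 0.5   # (N, 1, H, W) boolean instance masks
print(f"kept {int(keep.sum())} detections")
```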
Mask R-CNN demonstrates resilience to noise and has maintained a track record of excellence in numerous image segmentation challenges and competitions. However, it is important to note that the computational demands of training and deploying a Mask R-CNN model can be substantial, necessitating the availability of robust GPUs. Additionally, this model often mandates a substantial volume of meticulously annotated data for effective training, a process that can be labor-intensive and costly. In certain scenarios, Mask R-CNN has the potential to yield an abundance of smaller segments, contributing to the challenge of over-segmentation.

In the domain of astronomy, Mask R-CNN has found widespread application. It has been utilized for tasks such as segmenting solar filaments in H-α images of the Sun (Ahmadzadeh et al., 2019), distinguishing different types of galaxies from SDSS data (Farias et al., 2020), and efficiently detecting and segmenting stars and galaxies (Burke et al., 2019). Figure 10 provides an illustration of how Mask R-CNN identifies and segments stars and galaxies in real observational data.

Figure 10: From Burke et al. (2019), illustration of detection inference in an actual DECaLS image of ACO 1689. Galaxy masks are depicted in light blue, while star masks are presented in green. The confidence of the detection, indicating the likelihood that the object belongs to a specific class, is displayed above each mask.

As a side note, there were earlier Fully Convolutional Networks (FCN) based methods that predated Mask R-CNN, although Mask R-CNN has largely replaced these simpler FCN-based methods in image segmentation. These earlier FCN-based approaches employed straightforward Convolutional Neural Networks (CNNs) like AlexNet (Krizhevsky et al., 2012), VGGNet (Simonyan and Zisserman, 2014), GoogLeNet (Szegedy et al., 2015), and ResNet (He et al., 2016) as backbones, predicting class labels for each pixel in an image. For instance, Bhambra et al. (2022) utilized popular image classification model architectures such as VGG16 (Simonyan and Zisserman, 2014), ResNet50v2 (He et al., 2016), and Xception (Chollet, 2017) as backbones, incorporating saliency maps to highlight significant regions for segmentation of various components of galaxies. Building upon this eXplainable Artificial Intelligence (XAI) technique, Tang et al. (2023) further developed and enhanced it for the classification and segmentation of different types of radio galaxies. Additionally, Richards et al. (2023) combined both semantic segmentation and instance segmentation, fusing Mask R-CNN and FCN, to create a panoptic segmentation model for segmenting galactic structures.
3.2. Self-Organizing Map

Self-Organizing Map (SOM), also known as Kohonen maps, is a neural network-based technique rooted in the principles of unsupervised autonomous learning inspired by neurobiological studies (Kohonen, 1990). The SOM algorithm adapts to data through synaptic plasticity, mirroring the organized mapping of sensory inputs in the cerebral cortex. This topographic map maintains two critical properties: firstly, it retains incoming information in its proper context at each processing stage, and secondly, it ensures that neurons handling closely related information are positioned near one another, fostering interaction through short synaptic connections.

In the realm of image segmentation, SOM stands out as a potent unsupervised learning algorithm, adept at discerning intricate patterns and structures within complex datasets. Its ability to map high-dimensional input data onto a lower-dimensional grid ensures the preservation of topology and relationships among data points, making it particularly well-suited for tasks such as image segmentation, where understanding spatial relationships is crucial. SOM excels in grouping similar pixels, facilitating the identification of distinct regions or objects. Its departure from conventional methods, which often rely on predefined criteria, is notable, as SOM learns patterns directly from the input data, offering an adaptive and data-driven approach to segmentation. The algorithm's versatility in capturing both global and local features positions it as an effective tool across a spectrum of image segmentation tasks.

Figure 11: From Qian et al. (2019), an illustration depicting the training of SOM with red dots representing training data in high-dimensional space, and black dots alongside the grid indicating the trained SOM.

Figure 11 visually illustrates the training process of SOM. In addition to the introductory steps provided earlier in this section, we delve into the fundamental components and central procedures of SOM to provide a comprehensive understanding:
1. SOM Initialization: Configure the SOM grid by determining its size and dimensions. Initialize the SOM nodes with random weights.
2. Competition: Present each input image to the SOM and identify the winning node, which possesses weights most similar to the input. Update the weights of the winning node and its neighboring nodes using a learning rate and a neighborhood function. Repeat this process for multiple iterations, allowing the SOM to adapt to the input data.
3. Quantization: Assign each input image to the node with the closest weights, effectively mapping the high-dimensional data onto the lower-dimensional SOM grid.
4. Clustering and Labeling: Group similar nodes together to form clusters, representing distinct regions or structures in astronomical images. Analyze the clustered map and assign labels to different regions or clusters based on the characteristics of the astronomical objects they represent.
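A compact illustration of the four steps above, using the third-party minisom package (an assumption about tooling; any SOM implementation would serve) to cluster per-pixel feature vectors from a multi-band image:

```python
import numpy as np
from minisom import MiniSom  # third-party package, assumed to be installed

# `cube` is assumed to be a (bands, height, width) multi-band image.
cube = np.random.default_rng(5).random((5, 64, 64))  # placeholder data
n_bands, ny, nx = cube.shape
pixels = cube.reshape(n_bands, -1).T          # one feature vector per pixel

# Step 1: a small 4x4 SOM grid with random initial weights.
som = MiniSom(4, 4, input_len=n_bands, sigma=1.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(pixels)

# Step 2: competitive training over many randomly drawn samples.
som.train_random(pixels, num_iteration=5000)

# Step 3: quantization, mapping each pixel to its best-matching node.
winners = np.array([som.winner(p) for p in pixels])
segmentation = (winners[:, 0] * 4 + winners[:, 1]).reshape(ny, nx)
# Step 4 would group or label these 16 node classes into physical regions.
```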
SOM's inherent strength lies in its adept handling of complex, high-dimensional data, such as multi-band or hyperspectral images prevalent in remote sensing and astronomical observations. Its robustness against noise and variations in image intensity further enhances its efficacy in real-world applications. The SOM technique offers a unique and adaptive approach to image segmentation, proving to be a valuable tool for extracting meaningful structures and patterns from various types of imagery. This versatility and effectiveness extend across different domains, including astronomy, medical imaging, and remote sensing, showcasing its applicability to a wide range of segmentation challenges. As an illustration, Schilliro and Romano (2021) employ SOM in astronomy to analyze a high spatial and spectral resolution Hα line image of the sun, identifying several features corresponding to the main structures of the solar photosphere and chromosphere. Figure 12 provides an example of the segmented output generated by SOM for solar images.

3.3. Encoder-Decoder Architectures

Encoder-decoder architectures, a powerful paradigm in image segmentation, have revolutionized astronomy by enabling precise delineation of celestial objects and regions within astronomical images, a crucial step in understanding and analyzing astronomical structures. Encoder-decoder architectures comprise two key components:

Figure 12: From Schilliro and Romano (2021), an illustration of the segmented output using SOM for a solar image. (A): A selection of spectral
images obtained along the Hα line. (B): The feature lattice generated by a 4 × 4 SOM, facilitating the segmentation of 16 distinct regions. (C): The
feature lattice produced by a 3 × 3 SOM, enabling the segmentation of nine different regions.

• Encoder: The encoder extracts informative features from the input image using a sequence of convolutional layers that reduce spatial dimensions while capturing high-level features. In astronomy, the encoder learns to discern and learn from the intricate structures, objects, and phenomena present in astronomical images, even in noisy or complex backgrounds.

• Decoder: The decoder takes the encoded features and generates a segmentation mask using transposed convolutions to upsample the spatial dimensions and produce a high-resolution mask. In astronomy, the decoder translates the learned features into precise pixel-level delineation of astronomical objects, providing insights into their size, shape, and spatial distribution.
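The toy PyTorch module below makes the two components explicit: a convolutional encoder that downsamples and a transposed-convolution decoder that upsamples back to a per-pixel mask. It is a deliberately minimal sketch (a real U-Net adds skip connections and many more layers), and all layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Minimal encoder-decoder mapping single-band images to per-pixel logits."""
    def __init__(self, in_channels=1, n_classes=1):
        super().__init__()
        # Encoder: two strided stages that halve the spatial resolution each time.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: transposed convolutions that upsample back to the input size.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, n_classes, 2, stride=2),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Example: a batch of four 128x128 single-band cutouts -> segmentation logits.
model = TinyEncoderDecoder()
logits = model(torch.rand(4, 1, 128, 128))
print(logits.shape)  # torch.Size([4, 1, 128, 128])
```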

Encoder-decoder architectures have evolved to include a diverse range of models, each with unique features and adaptations. These architectures, including U-Net (Ronneberger et al., 2015), U-Net++ (Zhou et al., 2018), TransUNet (Chen et al., 2021), and Swin-Unet (Cao et al., 2022), share a common overarching structure while diverging in their specific encoder and/or decoder implementations. The fundamental architecture provides a robust foundation, while the variations in implementation cater to specific segmentation needs and challenges. In subsequent sections, we explore the specific details of these encoder-decoder architectures, unraveling their unique attributes and contributions to image segmentation in astronomy and related domains.

3.3.1. U-Net

Figure 13: From Ronneberger et al. (2015), a schematic representation of the U-Net architecture.
U-Net (Ronneberger et al., 2015), depicted in Figure 13, is a pivotal encoder-decoder architecture that has significantly transformed image segmentation, including in astronomy. Its symmetrical structure with skip connections enables precise object delineation, even in complex and noisy backgrounds. This adaptability has found some applications in astronomy, playing a pivotal role in tasks such as segmenting large-scale structures in simulations (Aragon-Calvo, 2019), outlining stellar wind-driven bubble structures in simulations (Van Oort et al., 2019), pinpointing stellar feedback structures in observational data (Xu et al., 2020b,a), precisely capturing the intricate details of spiral arms in disk galaxies (Bekki, 2021), and delineating galactic spiral arms and bars (Walmsley and Spindler, 2023). Moreover, it has proven effective in segmenting individual galaxies within cosmological surveys (Boucaud et al., 2020; Bretonnière et al., 2021), as well as in segmenting galaxy-galaxy strong lensing systems (Ostdiek et al., 2022b) and locating subhalos from strong lens images (Ostdiek et al., 2022a). Figure 14 visually depicts the 3D output of the Convolutional Approach to Structure Identification - 3D (casi-3d) prediction (Xu et al., 2020a). Employing a U-net architecture, this method identifies the positions of protostellar outflows within a real 3D position-position-velocity data cube of the Perseus molecular cloud.

Figure 14: From Xu et al. (2020a). 3D visualization of the casi-3d prediction on the location of outflows (position-position-velocity axes; the rendering shows the 12CO emission and the CASI prediction).
tation. Its versatility enables it to handle structures of Net++ has been successfully utilized to segment fila-
diverse sizes and complexities, while its symmetrical ar- mentary structures within the interstellar medium with
chitecture and effective use of skip connections preserve high precision in Zavagno et al. (2023), as well as in
fine spatial details and facilitate efficient segmentation detecting and segmenting moon impact craters on the
of objects of varying scales. This adaptability is crucial lunar surface (Jia et al., 2021). Figure 16 depicts exem-
for working with astronomical images, which often fea- plar outcomes of moon crater detection achieved using
ture astronomical objects spanning a wide spectrum of U-net and U-Net++ architectures.
sizes and shapes. When juxtaposed with its predecessor, U-Net, the ad-
However, U-Net also presents some limitations, pri- vantages of U-Net++ become more pronounced. The
marily associated with its computational demands. inclusion of nested dense convolutional blocks within
Training and inference with U-Net models can be com- U-Net++ marks a substantial advancement. These
14
Figure 16: From Jia et al. (2021), moon crater detection results of different networks (columns: ground truth, U-Net, U-Net++). Newly predicted craters are indicated by green circles, accurately recognized craters are represented by blue circles, and red circles signify craters not recognized by the network.

When juxtaposed with its predecessor, U-Net, the advantages of U-Net++ become more pronounced. The inclusion of nested dense convolutional blocks within U-Net++ marks a substantial advancement. These blocks equip the model with the ability to access and integrate a more extensive set of contextual information, thereby fostering a deeper understanding of complex astronomical structures. This feature enables U-Net++ to attain more precise and accurate segmentation, which is of paramount importance in the realm of astronomy, where objects and structures can be highly complex and exhibit considerable variation in size and shape. U-Net++ has demonstrated its dominance by consistently surpassing U-Net in various medical image segmentation challenges, thereby cementing its status as the optimal choice for these tasks.

3.3.3. TransUNet

Figure 18: From Yang et al. (2023), segmentation results with and
without the Transformer branch. The first column corresponds to the
input image, the second column showcases the segmentation result
without the Transformer branch, and the third column exhibits the
segmentation result obtained using the Transformer branch. The red
boxes highlight examples where the utilization of a dual-branch net-
work enhances feature extraction precision and improves MBP seg-
mentation. The yellow boxes point out instances where the dual-
branch network leads to fewer misidentifications of MBPs.
Figure 17: Overview of the TransUNet framework, as presented in
Chen et al. (2021).

TransUNet (Chen et al., 2021), depicted in Figure 17,


is a novel combination of the transformer architec-
ture and the encoder-decoder structure inherent to U-
Net. This integration harnesses the powerful atten-
tion mechanism intrinsic to transformers, equipping the
model with the ability to capture extensive dependen-
cies across the entire dataset. This feature is espe-
cially beneficial when dealing with complex structures
or objects that may cover large areas within an image.
By combining the advantages of both transformers and
U-Net, TransUNet presents a robust solution for tack-
ling image segmentation challenges in astronomy, en-
suring accurate delineation of celestial objects. Jia et al.
(2023b) utilize TransUNet to extract small impact crater
features on the moon with superior accuracy compared
to other models. Additionally, Yang et al. (2023) apply
TransUNet to segment magnetic bright points (MBPs) in
the solar photosphere, achieving elevated accuracy. Fig-
ure 18 illustrates an instance of applying TransUNet to
segment MBPs in the solar photosphere, comparing the
results with and without the Transformer branch.
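To make the hybrid design concrete, the sketch below (a simplified illustration with assumed layer sizes, not the published TransUNet implementation) runs a small CNN stem to extract feature maps and then applies a standard transformer encoder over the flattened feature tokens, mirroring the CNN-plus-Transformer encoder idea described above.

```python
import torch
import torch.nn as nn

class TinyHybridEncoder(nn.Module):
    """CNN stem + transformer encoder over flattened feature tokens (TransUNet-like idea, simplified)."""
    def __init__(self, in_ch=1, dim=64, n_heads=4, n_layers=2):
        super().__init__()
        # CNN stem: downsample by 4 and produce `dim`-channel local feature maps.
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):
        f = self.stem(x)                        # (N, dim, H/4, W/4) local CNN features
        n, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)   # (N, H/4*W/4, dim): one token per spatial location
        tokens = self.transformer(tokens)       # global self-attention across all locations
        return tokens.transpose(1, 2).reshape(n, c, h, w)  # back to a feature map for a U-Net-style decoder

feat = TinyHybridEncoder()(torch.randn(1, 1, 64, 64))  # -> torch.Size([1, 64, 16, 16])
```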
Choosing between U-Net, U-Net++, and TransUNet
hinges on the unique characteristics of the segmentation
task and the inherent nature of the input data. While
TransUNet presents distinct advantages in certain as-
pects, the comparison of performance is contingent on
the complexity of the dataset and the specific demands
of the segmentation task at hand.

3.3.4. Swin-UNet
Swin-UNet (Cao et al., 2022), depicted in Figure 19,
represents an innovative amalgamation that integrates
the hierarchical architecture of the Swin Transformer
with the encoder-decoder framework of U-Net. Swin
Transformers are renowned for their ability to process
images with diverse structures and scales efficiently.
Figure 19: Overview of the Swin-UNet framework, adapted from Cao et al. (2022).
Swin-UNet leverages this efficiency to skillfully capture
complex structures that may vary significantly in size.
In situations where a single-scale approach may fail to
accurately segment objects within astronomical images,
Swin-UNet provides an effective solution.
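The core Swin ingredient is self-attention computed within small, non-overlapping windows that are shifted between layers. A minimal sketch of the window-partitioning step is shown below (the window size is an assumed example, and the attention and window-shifting logic are omitted).

```python
import torch

def window_partition(x, window_size=4):
    """Split a feature map of shape (N, H, W, C) into non-overlapping windows of shape
    (num_windows*N, window_size*window_size, C); self-attention is then computed per window."""
    n, h, w, c = x.shape
    x = x.view(n, h // window_size, window_size, w // window_size, window_size, c)
    x = x.permute(0, 1, 3, 2, 4, 5).contiguous()
    return x.view(-1, window_size * window_size, c)

windows = window_partition(torch.randn(1, 16, 16, 96))  # -> torch.Size([16, 16, 96])
```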
Swin-UNet has proven its efficacy in effectively han-
dling complex and extensive medical images, substan-
tially improving the precision of image segmentation
tasks within the medical domain. While the utiliza-
tion of Swin-UNet in astronomical contexts is some-
what limited, there are instances where it has been ap-
plied. For example, it has been employed in segment-
ing clouds from remote sensing images (Gong et al.,
2023) and detecting astronomical targets from multi-
color photometry sky surveys (Jia et al., 2023a). Figure 20 illustrates Swin-UNet's proficiency in accurately segmenting clouds from a remote sensing image.

Figure 20: From Gong et al. (2023), prediction outcomes of various models on the AIR-CD remote sensing dataset (columns: RGB image, label, UNet, Swin-Unet, Gong23). The Gong23 model integrates both Swin-UNet and a traditional CNN in its architecture.

3.4. Vision Transformers (ViT)

The Vision Transformer (ViT) framework, illustrated in Figure 21 and introduced by Dosovitskiy et al. (2020), represents an innovative deep learning architecture that has transformed the landscape of image segmentation. ViTs leverage the power of transformers (Vaswani et al., 2017), originally designed for natural language processing (NLP) tasks, to capture long-range dependencies and intricate spatial relationships within images. At the heart of transformers is the concept of attention, which allows them to focus on specific regions of an image, leading to highly effective feature extraction. The integration of ViTs with image segmentation has ushered in a groundbreaking paradigm for comprehending and delineating objects within scientific imagery, notably in the context of medical images. The potential for applying this innovative approach to astronomy images in the future holds great promise, offering the prospect of achieving even greater precision and adaptability in segmentation results.

Figure 21: Overview of the Vision Transformer (ViT) framework as presented in (Dosovitskiy et al., 2020).

To understand the capabilities of ViTs in image segmentation, it is essential to explore the foundations of transformers and attention mechanisms. Transformers are a class of deep learning models that have revolutionized various domains, including NLP and computer vision. These models rely on a mechanism called "attention," which enables them to process sequential or spatial data effectively. In the context of computer vision, transformers break down images into smaller patches, treating them as sequences of data. This approach allows them to capture global context and intricate spatial relationships, which are vital for tasks like image segmentation.

Attention, the key mechanism within transformers, enables the model to assign varying levels of importance to different parts of the input data. By learning these importance weights, the model can focus on relevant information while filtering out noise or irrelevant details. This mechanism's ability to capture long-range dependencies in the data has made transformers, and subsequently ViTs, exceptionally effective in image segmentation tasks. In astronomy, where celestial objects can vary in size, shape, and distribution, the power of transformers and their attention mechanisms can significantly enhance the accuracy and robustness of segmentation processes.

In addition to the previously outlined general steps in this section, we explore the fundamental components and central procedures of ViTs to offer a comprehensive understanding (a toy code sketch illustrating these steps follows the list):

1. Image Patching: ViTs divide input images into smaller, non-overlapping patches. This process transforms the 2D image data into a sequence of 2D patches. These patches serve as the input to the ViT model. Patch size is a crucial hyperparameter to consider, as it affects the trade-off between computational efficiency and capturing fine-grained details.
2. Embedding the Patches: Each patch is flattened into a 1D vector and linearly projected to create embeddings. These embeddings carry information about the patches and are the input data for the ViT model. They allow ViTs to work with sequences of data, which is a fundamental concept in NLP and now applied to images.
3. Positional Encoding: Since ViTs lack the inherent spatial understanding of CNNs, they incorporate positional encoding to provide spatial information to the model. This encoding informs the model about the relative locations of patches within the image. Various positional encoding techniques, including sinusoidal encoding or learned positional encodings, can be used.
4. Design Model Architecture: Design a ViT architecture for image segmentation. ViTs, unlike CNNs, use a transformer architecture. A typical ViT model comprises several key components:
   • Embedding Layer: This layer converts image patches into embedding vectors.
   • Transformer Encoder Blocks: These blocks process the embeddings and capture spatial relationships and context information across patches.
   • Class Token: ViTs add a class token to the embeddings to perform classification.
   • Positional Encodings: These encodings help the model understand the spatial position of patches.
   • Linear Projection: This projection maps the transformer output to the segmentation mask space.
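The following toy sketch (an illustrative configuration we assume here, not a published astronomy pipeline) ties the steps above together: it embeds non-overlapping patches with a strided convolution, adds learned positional embeddings, runs a transformer encoder, and maps each patch token to per-patch class logits.

```python
import torch
import torch.nn as nn

class TinyViTSeg(nn.Module):
    """Toy ViT for segmentation: patch embedding + positional encoding + transformer + per-patch head."""
    def __init__(self, img=64, patch=8, in_ch=1, dim=128, n_heads=4, n_layers=2, n_classes=2):
        super().__init__()
        self.n_patches = (img // patch) ** 2
        # A strided convolution flattens and linearly projects each patch in one step.
        self.patch_embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.pos_embed = nn.Parameter(torch.zeros(1, self.n_patches, dim))  # learned positional encoding
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(dim, n_classes)  # per-patch class logits

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (N, n_patches, dim)
        tokens = self.encoder(tokens + self.pos_embed)           # attention over all patches
        return self.head(tokens)                                  # (N, n_patches, n_classes)

out = TinyViTSeg()(torch.randn(2, 1, 64, 64))  # -> torch.Size([2, 64, 2])
```

In a full segmentation model, the per-patch logits would be upsampled or decoded back to pixel resolution to obtain a dense mask.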
ViTs have become increasingly prominent in the field of computer vision, demonstrating remarkable proficiency, particularly in scientific image segmentation. In domains like medical image segmentation, including X-ray, CT, and MRI datasets, ViTs have showcased SOTA performance and remarkable accuracy, as summarized by Henry et al. (2022).

Figure 22: From Merz et al. (2023): Left: Image showing artifacts like blooming and optical ghosts around the bright star in the upper right and large ghosts in the lower middle. Right: Inference results from an MViTv2 Lupton-scaled network.

Figure 23: From Dai et al. (2022), a comparative analysis of terrain segmentation on Mars using ResNet and ViTs (columns: image, ground truth, ResNet, ViT).

There are pioneering direct applications of ViTs in astronomy for segmentation tasks, as demonstrated by Merz et al. (2023) using MViTv2 (Multiscale Vision Transformers) for segmenting astronomical objects, and by Dai et al. (2022) applying ViTs for terrain segmentation on Mars. Figure 22 presents an example of MViTv2 used to segment astronomical objects in an image containing artifacts like blooming and optical ghosts, with the model effectively ignoring these artifacts in its predictions. For a visual comparison across different models, Figure 23 illustrates the performance of ViTs in segmenting terrain on Mars. Beyond segmentation, ViTs have also ventured into various astronomical applications. For instance, they have been employed in tasks such as classifying transient astronomical sources (Chen et al., 2023) and estimating strong gravitational lensing parameters (Huang et al., 2022). These initial explorations into the application of ViTs in astronomy represent an exciting frontier. ViTs bring the transformative capabilities of transformers and attention mechanisms to this domain, potentially reshaping the way we analyze and comprehend astronomical structures in the vast expanse of astronomical images in the future.

3.5. Mamba

Figure 24: Overview of the Mamba framework as presented in (Gu and Dao, 2023).

Mamba (Gu and Dao, 2023), depicted in Figure 24, emerged as a recent model architecture in the machine learning community, showcasing strong performance across various downstream sequence modeling tasks. Originally designed for large-scale natural language processing tasks with extended sequence lengths akin to Transformers, Mamba introduces structured state space models (SSMs) and a hardware-aware parallel algorithm in recurrent mode. This design simplifies the traditional Transformer architecture, which heavily relies on attention mechanisms and MLP blocks, resulting in significantly faster inference speeds (5x higher throughput) and linear scalability with data sequence length.

One pivotal component of Mamba is its selective state space model (SSM) layer, which assigns different weights to inputs, enabling the model to prioritize predictive data for specific tasks. This adaptability allows Mamba to excel in various sequence modeling tasks, spanning languages, images, audio, and genomics.

While Mamba has not yet been applied to astronomical image segmentation, its proficiency in modeling variable and lengthy sequences suggests potential benefits for this community with straightforward adaptation. The data pre-processing pipeline for Mamba resembles that of Transformers but omits the requirement for positional encoding.
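To give a flavor of the state space mechanism underlying Mamba, the heavily simplified NumPy sketch below runs a linear state-space recurrence in which the input and output maps are modulated by the current input, which is the essence of the "selective" behavior described above. The shapes, initialization, and scalar output are illustrative assumptions; the real model additionally learns an input-dependent discretization step and uses a hardware-aware parallel scan instead of a Python loop.

```python
import numpy as np

def selective_ssm_scan(x, A, W_B, W_C):
    """Toy selective state-space scan over a 1-D sequence.
    x: (T, d_in) inputs; A: (d_state,) per-dimension decay;
    W_B, W_C: projections making the input/output maps depend on the current input."""
    T, _ = x.shape
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(T):
        B_t = W_B @ x[t]            # input-dependent "write" vector (selectivity)
        C_t = W_C @ x[t]            # input-dependent "read" vector
        h = A * h + B_t             # linear recurrence: h_t = A * h_{t-1} + B_t
        ys.append(C_t @ h)          # one scalar output per step
    return np.array(ys)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 8))                      # sequence of 100 steps, 8 features
y = selective_ssm_scan(x, A=np.full(16, 0.9),
                       W_B=rng.normal(size=(16, 8)) * 0.1,
                       W_C=rng.normal(size=(16, 8)) * 0.1)
print(y.shape)  # (100,)
```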
3.6. Generative Models

Figure 25: Overview of the general frameworks for three generative models, figure adapted from (Zhu et al., 2022b). Panels: (a) Variational Auto-Encoder; (b) Generative Adversarial Networks; (c) Diffusion Probabilistic Models.

Generative models constitute an intriguing subset of deep learning that transcends the confines of conventional image segmentation. They not only identify and outline objects within images, but also generate new data based on discerned patterns and structures. These models are engineered to grasp the inherent distribution of data, enabling them to generate novel, data-coherent content. In the realm of image segmentation, generative models offer a unique proficiency: they can fabricate detailed and lifelike segmentations of objects, a capability that can prove invaluable in various fields, including astronomy. Generative models, utilizing techniques like Variational Autoencoders (VAEs) (Kingma and Welling, 2014), Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), and Denoising Diffusion Probabilistic Models (DDPMs) (Sohl-Dickstein et al., 2015; Ho et al., 2020), are capable of simultaneously identifying inherent patterns in an image and generating segmentations that mirror these patterns accurately. This fusion of image interpretation and generation paves the way for new possibilities in object identification and precise image outlining. The schematic representations of these generative models are depicted in Figure 25.

In contrast to conventional deep learning models for segmentation, which rely on the log-likelihood of a conditional probability (i.e., the classification probability of image pixels), generative segmentation models introduce an auxiliary latent variable distribution. This represents a notable departure from the conventional discriminative segmentation deep learning paradigm. The conditional generative model exhibits minimal requirements for task-specific architecture and loss function modifications, fully capitalizing on the capabilities of off-the-shelf generative models (Chen et al., 2023a). This characteristic positions them as a promising method in image segmentation. A schematic comparison between conventional discriminative learning and a generative learning-based model for segmentation is illustrated in Figure 26.

Figure 26: From Chen et al. (2023a), a schematic comparison illustrating (a) conventional discriminative learning and (b) a generative learning-based model for segmentation.

3.6.1. Variational Autoencoders (VAEs)

Variational Autoencoders (VAEs) (Kingma and Welling, 2014) represent a powerful class of generative models used in various domains, including computer vision and image processing. These models excel at learning the underlying structure of data and, in the context of image segmentation, play a pivotal role in understanding image patterns and generating coherent segmentations. VAEs offer a unique approach to both data compression and generation, making them invaluable in tasks like image reconstruction and synthesis. VAE training usually seeks to minimize the Kullback–Leibler (KL) divergence (a statistical metric that measures the similarity between two distributions) via the Gaussian reparametrization in the bottleneck layer as part of the overall loss function. Additional image perception losses such as L1 are further introduced to
ensure the reconstruction ability of the neural networks. The overall learning objective can be formulated as:

\log p(x) \geq \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{KL}[q(z|x) \,\|\, q(z)], \qquad (1)

where p represents the decoder, q is the encoder, x and


z denote the original raw data and the learned latent em-
bedding, respectively.
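A minimal sketch of this objective, using a toy fully connected encoder/decoder, the Gaussian reparametrization mentioned above, and a pixel-wise reconstruction term, is given below; all layer sizes are illustrative assumptions, and the binary cross-entropy stands in for the reconstruction term (L1 or L2 losses are common alternatives).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Toy VAE on flattened images: Gaussian encoder q(z|x), decoder p(x|z)."""
    def __init__(self, d_in=64 * 64, d_z=16):
        super().__init__()
        self.enc = nn.Linear(d_in, 2 * d_z)   # outputs [mu, log_var] of q(z|x)
        self.dec = nn.Linear(d_z, d_in)

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # reparametrization trick
        return torch.sigmoid(self.dec(z)), mu, log_var

def vae_loss(x, x_rec, mu, log_var):
    # Reconstruction term (here binary cross-entropy) plus the analytic KL divergence
    # between the Gaussian q(z|x) and a standard normal prior.
    rec = F.binary_cross_entropy(x_rec, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return rec + kl

x = torch.rand(8, 64 * 64)                 # batch of flattened, [0, 1]-scaled images
x_rec, mu, log_var = TinyVAE()(x)
print(vae_loss(x, x_rec, mu, log_var))
```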
In addition to the general training steps outlined at the
start of this section, we also provide a breakdown of key
components and steps involved in training and operating
VAE:
1. Model architecture design: A VAE consists of
an encoder and a decoder. The encoder com-
presses the input image into a lower-dimensional
latent space representation, while the decoder re-
constructs the image from the latent representation.
2. Objective function: The VAE objective function
consists of two parts: the reconstruction loss,
which measures how well the VAE can reconstruct
the input image, and the regularization term, which
ensures that the latent space follows a desired dis-
tribution, such as a Gaussian distribution.
3. Training: The VAE is trained on the prepared dataset by minimizing the objective function. During training, the encoder learns to map input images to a structured latent space, while the decoder learns to generate accurate segmentations.

Figure 27: From Karmakar et al. (2018), comparison between clusters detected using VAEs (highlighted in red) and detection results from Tej et al. (2006) (depicted by black circles) around IRAS 06055+2039. The red cross indicates the detected center as per VAE, while the plus sign denotes the position of the IRAS point source.

VAEs offer several unique advantages for image segmentation. They are capable of both learning and generating segmentations effectively, due to their ability to create a structured latent space where images are represented in a continuous and meaningful way. This feature facilitates the capture of underlying patterns and structures within images. Additionally, VAEs can generate segmentations for previously unseen data, making them valuable for tasks involving new or unobserved data, such as in medical or astronomical imaging applications.

However, VAEs come with their set of limitations. Their complex architecture, with encoder, decoder, and regularization components, can be challenging to train and fine-tune, especially for beginners in deep learning. The regularization terms can also oversmooth segmentations, losing fine details. This can make VAEs unsuitable for tasks that require preserving intricate image elements, such as medical image segmentation. Training and running VAEs can be computationally expensive, especially for large or high-dimensional images. This can limit their practicality for users with limited computational resources. Additionally, although VAEs are relatively data-efficient, they still require a certain amount of labeled data for training. Acquiring a sufficiently large and diverse dataset for segmentation tasks can be challenging, especially in some domains.

Although VAEs have found some success in applications in medical image segmentation, addressing challenges such as segmenting ambiguous medical images (Kohl et al., 2018) and detecting incorrect segmentations in medical images (Sandfort et al., 2021), their application in segmentation within astronomy studies is relatively limited. Notably, Karmakar et al. (2018) applied VAEs and Gaussian Mixture Models in tandem to detect and segment stellar clusters from near-infrared data. Figure 27 illustrates the performance of VAEs in segmenting stellar clusters in a near-infrared image.

3.6.2. Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) are a type of deep learning model that can generate new data, such as images or text, that is indistinguishable from real data. GANs are composed of two neural networks: a generator and a discriminator. The generator creates new data, while the discriminator attempts to distinguish between real and generated data.

GAN training operates on the premise that the discriminator should adeptly capture the characteristics of the target distribution. If a generator manages to deceive
a well-trained discriminator, it signifies the generator's effectiveness as an approximator for the target distribution. The training process involves both the generator and discriminator working in tandem in an adversarial manner. Here, the generator strives to deceive the discriminator by producing more realistic data, while the discriminator endeavors to enhance its ability to distinguish between real and generated data. This adversarial competition propels both networks to refine their performance, leading to the generation of progressively more realistic data. In terms of loss functions, while the discriminator and generator are trained together, they are optimized using different loss functions in an adversarial way:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_Z(z)}[\log(1 - D(G(z)))], \qquad (2)

where G and D are the generator and discriminator, respectively, and x and z denote the real data and the latent input drawn from the prior p_Z(z), respectively.
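As a concrete, deliberately simplified illustration of Eq. (2), the sketch below alternates one discriminator update and one generator update using binary cross-entropy losses; the network definitions, sizes, and data are placeholder assumptions, and the generator step uses the common non-saturating form (maximizing log D(G(z))) rather than the literal minimax term.

```python
import torch
import torch.nn as nn

# Placeholder networks for illustration: a generator mapping noise -> flattened image
# and a discriminator mapping a flattened image -> real/fake logit.
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64 * 64), nn.Tanh())
D = nn.Sequential(nn.Linear(64 * 64, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def gan_step(real):                       # real: (N, 64*64) batch of training images in [-1, 1]
    n = real.size(0)
    z = torch.randn(n, 64)
    # Discriminator update: push D(real) -> 1 and D(G(z)) -> 0 (generator held fixed via detach).
    fake = G(z).detach()
    loss_d = bce(D(real), torch.ones(n, 1)) + bce(D(fake), torch.zeros(n, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator update: push D(G(z)) -> 1 (discriminator parameters are not stepped here).
    loss_g = bce(D(G(z)), torch.ones(n, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

print(gan_step(torch.rand(16, 64 * 64) * 2 - 1))  # one adversarial step on a random batch
```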
GANs have revolutionized many fields, including computer vision, image processing, and natural language processing. They are used in a wide range of applications, including image generation, style transfer, and image segmentation.

In the context of image segmentation, GANs can be used to generate labeled images, which can be used to train machine learning models to perform segmentation tasks more accurately. GANs can also be used to create synthetic but realistic data for segmentation tasks in challenging domains, such as astronomy, where acquiring real labeled data can be difficult or expensive.

Apart from the general training procedures introduced earlier in this section, we offer an in-depth exploration of the essential elements and processes required for training and utilizing GANs:

1. Designing the Network: GANs consist of two primary components: the generator and the discriminator. The generator employs random noise as input to generate images, while the discriminator's function is to differentiate between real and generated images.
2. Defining Loss Functions: The effectiveness of GAN training depends on carefully defining loss functions for the generator and discriminator. The generator seeks to minimize its loss function, which quantifies its ability to deceive the discriminator. Conversely, the discriminator aims to minimize its loss function, which measures its capability to distinguish between real and synthetic data.
3. Training Process: GAN training is an iterative adversarial process where the generator and discriminator compete to enhance their performances. Initially, the generator produces synthetic data evaluated by the discriminator. Subsequently, the generator updates its parameters to create more realistic data, while the discriminator adjusts its parameters to better discern between real and synthetic data. This iterative process continues until both networks converge to optimal states. Ideally, the generator's loss decreases while the discriminator's loss stabilizes. Notably, during discriminator training, the generator remains constant to enable the discriminator to distinguish real from synthetic data effectively. Conversely, during generator training, the discriminator remains constant to prevent the generator from chasing a moving target, thereby facilitating convergence and iterative improvement of both networks.

GANs can create high-quality synthetic data that is similar to real data. This is useful for image synthesis and segmentation. GANs can also augment existing datasets, reducing the need for manual data labeling. This can improve model performance in image segmentation. Additionally, GANs can generate realistic images, making them useful for image-to-image translation and image segmentation. GANs are versatile and can be adapted for various applications, including image segmentation in astronomy and medical imaging.

However, GANs can be challenging to train. It can be difficult to achieve convergence and stability, and training often requires significant computational resources and expertise. GANs can also suffer from mode collapse, where they generate limited variations of data. GANs are sensitive to hyperparameter settings, and it can be time-consuming to select the right settings for learning rate, architecture, and batch size. Additionally, GANs can reproduce unrealistic artifacts or biases present in the training data. GANs also require a large amount of data for effective training. Finally, GANs are complex models, requiring expertise in deep learning for successful implementation and management.

Despite the inherent challenges, GANs showcase remarkable capabilities in data generation and refinement, particularly within the realm of image segmentation, presenting a potential revolution in the field. Notably, Liu et al. (2021) harnessed Conditional GANs to effectively segment filaments in low-quality solar images, surpassing the performance of traditional UNet methods. Additionally, Reiman and Göhre (2019) utilized GANs to tackle the issue of deblending overlap-
ping galaxies, a task comparable to galaxy segmen-
tation in managing scenarios with overlapping galax-
ies. In the domain of radio data, Vos et al. (2019)
utilized GANs for segmenting radio frequency interfer-
ence (RFI). Figure 28 illustrates an application of GANs
in segmenting radio signals and RFIs from noisy radio
data. These instances underscore the promising contri-
butions of GANs to image segmentation, signaling their
potential to reshape the landscape of this field signifi-
cantly.

3.6.3. Denoising Diffusion Probabilistic Models (DDPMs)
Denoising Diffusion Probabilistic Models
(DDPMs) (Sohl-Dickstein et al., 2015; Ho et al.,
2020) are a class of advanced generative models that
have emerged as a powerful approach in machine
learning and image processing. They are designed
to tackle challenging problems related to image gen-
eration, denoising, and restoration, including image
segmentation.
DDPMs are inspired by the concept of diffusion,
which is the process of gradually spreading noise
throughout an image. In contrast, DDPMs work by it-
eratively removing noise from an image, gradually re-
vealing the underlying structure. This iterative process
enables DDPMs to effectively model the complex rela-
tionships between noisy and clean images, making them
proficient in both denoising and generating realistic im-
ages. While there has been fast development regard-
ing the diffusion training techniques tailored for vari-
ous generative tasks in machine learning and computer
vision (Austin et al., 2021; Rombach et al., 2022; Zhu
et al., 2023), vanilla DDPMs are trained on a variational
lower bound defined as follows:
L_{vb} = \mathbb{E}_q \Big[ \underbrace{D_{KL}(q(x_T|x_0) \,\|\, p(x_T))}_{L_T} + \sum_{t>1} \underbrace{D_{KL}(q(x_{t-1}|x_t, x_0) \,\|\, p_\theta(x_{t-1}|x_t))}_{L_{t-1}} \underbrace{- \log p_\theta(x_0|x_1)}_{L_0} \Big], \qquad (3)

where q and p represent the diffusion and denoising processes, respectively, x_t denotes the data at diffusion step t, and \theta stands for the learnable model parameters.

Figure 28: From Vos et al. (2019), illustrations of the input mixture and the ground truth signal and RFI components (rows 1-2). Additionally, the separated components by GANs are presented, accompanied by the corresponding absolute error maps (rows 3-4).
t. θ stands for learnable model parameters.
While they are quite advanced and require a deep un-
derstanding of probabilistic modeling and neural net-
works, here is a simplified overview of how they work:
1. Forward Diffusion Process: The forward diffusion
process starts with a clean image and gradually
adds noise to it, until it reaches a desired level of
noise.
2. Reverse Diffusion Process: The reverse diffusion process starts with a noisy image from the forward diffusion process and gradually removes noise from it, until it is as close to the original image as possible. This is done by reversing the steps of the forward diffusion process using a well-trained DDPM model.
3. Image Generation: A trained DDPM model can be used to generate realistic images. To do this, a noisy or corrupted image is fed to the model. The model then denoises and rejuvenates the image to approximate the original image.
4. Image Segmentation: In the context of image segmentation, conditional DDPMs are often used. In this case, the original image is the conditional input, and the corresponding segmentation mask is the target. The goal is to recover the mask as accurately as possible through the reverse diffusion process.

DDPMs offer several advantages for image segmentation. First, they can generate high-quality and realistic images by effectively removing noise, which is essential for image segmentation. Second, DDPMs are designed to preserve the underlying structure and features of an image during denoising, which is important for accurate segmentation of object boundaries and intricate details. Third, DDPMs are based on a probabilistic framework, which allows them to model the complex relationships between noisy and clean images.

Figure 29: From Xu et al. (2023), evaluation of CASI-2D (a U-Net architecture) and DDPM in segmenting CMR filaments in dust emission (columns: dust emission, ground truth, CASI-2D, DDPM).

However, DDPMs require significant computational resources to train and deploy. Despite this challenge, DDPMs represent a promising new approach to image segmentation, with the potential to overcome the limitations of traditional methods, especially in challenging conditions. For example, a recent study by Xu et al. (2023) showed that DDPMs can be used to achieve highly accurate segmentation of filamentary structures in astronomical images. Figure 29 presents an illustration of DDPM's application in segmenting a distinct type of filament formed through the collision-induced magnetic reconnection (CMR) mechanism in dust emission. As machine learning and image processing continue to evolve, DDPMs are poised to play an increasingly important role in image segmentation and other image-related tasks.

3.7. Transfer Learning

Transfer learning is a potent approach to enhance the efficacy of image segmentation tasks by capitalizing on the knowledge acquired from pre-trained neural networks. This methodology entails taking a pre-trained model, which has undergone training on an extensive dataset for a related task, and fine-tuning it for a new image segmentation objective. It is essential to recognize that transfer learning is a technique, not a fixed architecture in deep learning. Consequently, various architectures, including U-Net and ViTs, can be employed in transfer learning as long as pre-trained models are accessible for use.

Transfer learning is particularly beneficial when working with limited labeled data for the specific segmentation problem at hand. In this case, the pre-trained model, often referred to as the "base model," provides a foundation of feature extraction and pattern recognition. By transferring this knowledge, the model can learn to understand and recognize common visual patterns, edges, and textures, which are essential for segmentation tasks.

The transfer learning process then fine-tunes the base model on the target dataset, which contains the images and corresponding segmentation labels specific to the new task. This fine-tuning refines the model's learned representations and adapts them to the nuances of the segmentation problem in question.

Transfer learning offers several benefits for image segmentation. First, it significantly reduces the need for a large annotated dataset, as the knowledge from the pre-trained model jumpstarts the learning process. This can be particularly advantageous in domains like medical imaging and astronomy, where obtaining labeled data can be labor-intensive and time-consuming. Second, transfer learning accelerates the training process, allowing the model to reach a desired level of accuracy with fewer iterations.
Third, transfer learning enhances the model's generalization capacity. The features extracted by the base model have been learned from extensive data, making them robust and transferable across related tasks. This adaptability proves beneficial when working with diverse images and complex backgrounds, common in fields like astronomy.

Figure 30: From Gu et al. (2023), an illustration of Mask R-CNN implementing transfer learning by retraining with a pre-trained model for the segmentation of various types of galaxies.

Transfer learning offers significant advantages for image segmentation, but it is crucial to carefully select an appropriate pre-trained model, ensure compatibility between the source and target tasks, and meticulously fine-tune the model. In the domain of image segmentation, transfer learning has the potential to enhance the precision, efficiency, and practical utility of segmentation tasks across diverse domains. Notable examples include the work of Mackovjak et al. (2021), where a pre-trained model and transfer learning were employed to segment solar corona structures on the sun, and the study by Latorre et al. (2023), which utilized a transfer learning approach to detect asteroid craters on the moon. In another application, Gu et al. (2023) applied transfer learning with a Mask R-CNN architecture, starting with a model pre-trained on the Microsoft Common Objects in Context (MS COCO) dataset, and retrained it to segment galaxies. Domínguez Sánchez et al. (2019) also demonstrate the effectiveness of transfer learning by applying a pre-existing training model that classifies galaxies in one survey to new survey data with minimal retraining using a small sample. Figure 30 provides an example of the transfer learning application of Mask R-CNN in segmenting different types of galaxies. These instances underscore the versatility and effectiveness of transfer learning in addressing segmentation challenges across various scientific domains.

3.8. Large Foundation Models

Large foundation models have become important in image segmentation, transforming how we identify and delineate objects in images. Among large foundation models, the Segment Anything Model (SAM) represents a groundbreaking advancement, promising unparalleled flexibility and the ability to generalize to new objects and images without additional training.

Traditionally, segmentation has been divided into interactive and automatic approaches. Interactive segmentation requires user input to refine masks, while automatic segmentation requires predefined object categories and substantial training on labeled data. Both approaches have limitations, making SAM's unique promptable interface a significant advantage. SAM users can provide a variety of prompts, such as foreground/background points, bounding boxes, and masks, to perform segmentation without additional training. This flexibility makes SAM suitable for a wide range of tasks, including multimodal understanding and integration with AI systems.

SAM's remarkable adaptability also sets it apart. It can seamlessly transition between interactive and automatic segmentation, offering the best of both worlds. Moreover, SAM's training on a massive dataset of over one billion masks enables it to generalize to new objects and images with unprecedented precision.

SAM's architecture consists of three key components: a ViT-H (Vision Transformer-Huge) image encoder, a prompt encoder, and a lightweight transformer-
based mask decoder. The image encoder generates an image embedding from each input image, while the prompt encoder embeds various input prompts, including interaction cues like clicks or bounding boxes. The mask decoder then predicts object masks using the image embedding and prompt embedding.

Figure 31: From Zarin Era et al. (2023), a demonstration of SAM's application in segmenting defects on a metal surface.

SAM represents a groundbreaking large-scale model with the potential to revolutionize image segmentation, offering unparalleled flexibility and adaptability that can reshape our perception and interaction with the visual world. Its versatility is demonstrated in several pioneering applications in scientific images, such as segmenting crater structures on material surfaces (Gonzalez-Sanchez et al., 2024) and defects on metal surfaces (Zarin Era et al., 2023). Figure 31 provides an illustrative example of SAM's application in segmenting defects on a metal surface, suggesting its potential use in similar solar images for segmenting sunspots, coronal holes, and solar flares in solar observational images. Although there have not been any direct applications of SAM published in astronomy studies yet, one notable application involves using SAM on remote sensing data observed from space to segment different terrains (Osco et al., 2023). Figure 32 illustrates an example of SAM applied to segment different terrains in remote sensing data. This adaptability holds significant promise for astronomy, suggesting SAM's potential to segment celestial structures with unprecedented precision.

Figure 32: From Osco et al. (2023), depictions of images segmented by SAM utilizing point prompts. The initial column displays the RGB image, followed by the second column illustrating the treatment of the point prompt. The third column presents the ground-truth mask, and the fourth column exhibits the prediction result from SAM. The last column emphasizes the false-positive (FP) pixels identified in the prediction.

3.9. Metrics and Evaluations

Various metrics in deep learning-based segmentation methods serve specific roles in either training or evaluation. Here, we present several widely adopted metrics in the computer vision community for tasks related to image segmentation.

• Intersection over Union (IoU): IoU is a widely employed metric for evaluating segmentation accuracy. It is calculated as the area of intersection between the predicted segmentation mask (denoted as A) and the ground truth mask (denoted as B), divided by the area of their union. Mathematically, IoU is defined as:

  \mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}.

  A variant of the IoU metric is the mean IoU, which represents the average IoU computed across all target classes. Another form of generalized intersection over union, proposed by Rezatofighi et al. (2019), can serve as the objective function to optimize in scenarios involving non-overlapping bounding boxes, where traditional IoU methods may be ineffective.

• Dice Coefficient: The Dice Coefficient, a prominent metric in image segmentation, particularly in medical image analysis, measures the similarity between two sets. It is commonly used to assess the agreement between a predicted segmentation mask and the ground truth mask. The Dice Coefficient is defined as:

  \mathrm{Dice} = \frac{2|A \cap B|}{|A| + |B|}.
  The Dice Coefficient ranges from 0 to 1, with 0 indicating complete dissimilarity and 1 indicating a perfect match. This metric is advantageous in tasks where the class of interest is a small portion of the overall image, effectively addressing class imbalance. Similar to IoU in definition, the Dice Coefficient is useful for both training and performance evaluation.

• Pixel Accuracy (PA): PA is a metric used in image segmentation tasks to assess the overall accuracy of pixel-wise classification. It is calculated as the ratio of correctly classified pixels to the total number of pixels in the image. Mean Pixel Accuracy (mPA) extends this concept by averaging PA over each segmentation class. Unlike metrics such as IoU and Dice, PA and mPA are primarily used for evaluation purposes. Pixel Accuracy offers a straightforward assessment of the model's ability to correctly classify individual pixels, regardless of class labels. However, it may not be the most suitable metric for tasks with imbalanced class distributions. This is because PA treats all classes equally and may not provide a comprehensive assessment of the model's performance, especially for minority classes. In such scenarios, metrics like Intersection over Union (IoU) or Dice Coefficient are often preferred for a more nuanced evaluation.

• In addition to the commonly used metrics and losses discussed earlier, there are other metrics specifically designed for object segmentation in the computer vision field. Metrics like Precision, Recall, and F1 scores are frequently used for evaluation, while the cross-entropy loss is a common choice for training. These metrics are typically defined and calculated based on object class annotations, making them more suitable for multi-object semantic segmentation tasks.

It is important to note that differences may exist between traditional discriminative machine learning models and generative models in their training and evaluation processes. Deep generative models aim to estimate an unknown real-world distribution and establish an effective mapping between an easy-to-sample prior (e.g., Gaussian) and the target implicit distribution. While the training goals of generative models may align, the evaluation protocols are diverse and tailored to specific application scenarios. In image generation, where fidelity matters, Fréchet Inception Distance (FID) (Heusel et al., 2017) and Inception Score (IS) (Salimans et al., 2016) serve as widely adopted automatic metrics to assess image quality according to human perceptions. For video generation, Fréchet Video Distance (FVD) (Unterthiner et al., 2019), an extension of FID, evaluates video quality. In audio generation tasks, emphasis is placed on beats and rhythms for the generated audio signals (Zhu et al., 2022a). In applications like image segmentation, evaluations adhere to established norms specific to the relevant downstream field of research. Despite the diversity of automated quantitative assessment methods, human inspection remains a universal benchmark for a thorough performance evaluation. Given the diversity of perspectives, individuals are likely to have varying opinions on segmentation results, and even when providing training data, different individuals may produce different masks, contributing to subjectivity in user opinions and model training. While methods like Mean Opinion Score exist to collect user opinions and rate model performance, they require substantial participation and can be labor-intensive. Only with a large and diverse group of respondents can a more converged opinion be obtained. We acknowledge that both models and individuals may exhibit bias in segmentation tasks. Consequently, it is crucial to emphasize that all outputs from machine learning methods in image segmentation, including those from traditional segmentation methods listed in Section 2, require careful human inspection before proceeding to further analysis.

4. Conclusions

In conclusion, our review of segmentation methods in astronomy has encompassed both classical techniques and the state-of-the-art machine learning approaches. These methods play a vital role in diverse scientific tasks, providing valuable insights into the complex structures present in astronomical images and data cubes.

The emergence of advanced machine learning techniques in the broader computer science community has heralded a new era of segmentation methodologies. These innovative approaches offer enhanced capabilities for precisely delineating objects in astronomical data, marking a significant advancement in the field.

As we navigate the evolving landscape of segmentation methods, we envision a future where the astronomy community fully embraces and integrates these advanced techniques into routine research endeavors. By harnessing the power of these cutting-edge segmentation methods, astronomers can benefit from more precise, reliable, and efficient segmentation outcomes. This, in turn, has the potential to alleviate astronomers
from laborious manual efforts, allowing them to fo- Krishnarao, D., Lacerna, I., Lan, T.W., Lane, R.R., Law, D.R.,
cus more on interpreting and understanding the intri- Le Goff, J.M., Leung, H.W., Lewis, H., Li, C., Lian, J., Lin, L.,
Long, D., Longa-Peña, P., Lundgren, B., Lyke, B.W., Mackereth,
cate physical processes captured in their datasets. The J.T., MacLeod, C.L., Majewski, S.R., Manchado, A., Maraston,
synergy between advanced segmentation methods and C., Martini, P., Masseron, T., Masters, K.L., Mathur, S., McDer-
astronomical research holds the promise of unlocking mid, R.M., Merloni, A., Merrifield, M., Mészáros, S., Miglio, A.,
deeper insights into the mysteries of the universe. Minniti, D., Minsley, R., Miyaji, T., Mohammad, F.G., Mosser, B.,
Mueller, E.M., Muna, D., Muñoz-Gutiérrez, A., Myers, A.D., Na-
We express our gratitude to the anonymous referees dathur, S., Nair, P., Nandra, K., Correa do Nascimento, J., Nevin,
for their valuable comments and suggestions, particu- R.J., Newman, J.A., Nidever, D.L., Nitschelm, C., Noterdaeme, P.,
larly the references, which have significantly enhanced O’Connell, J.E., Olmstead, M.D., Oravetz, D., Oravetz, A., Oso-
the quality of this review. DX acknowledges support rio, Y., Pace, Z.J., Padilla, N., Palanque-Delabrouille, N., Palicio,
P.A., Pan, H.A., Pan, K., Parker, J., Paviot, R., Peirani, S., Ramŕez,
from the Virginia Initiative on Cosmic Origins (VICO). K.P., Penny, S., Percival, W.J., Perez-Fournon, I., Pérez-Ràfols, I.,
YZ acknowledges support from the VisualAI lab of Petitjean, P., Pieri, M.M., Pinsonneault, M., Poovelil, V.J., Povick,
Princeton University. We recognize the utilization of J.T., Prakash, A., Price-Whelan, A.M., Raddick, M.J., Raichoor,
A., Ray, A., Rembold, S.B., Rezaie, M., Riffel, R.A., Riffel, R.,
ChatGPT, a language model created by OpenAI using
Rix, H.W., Robin, A.C., Roman-Lopes, A., Román-Zúñiga, C.,
the GPT-3.5 architecture, for grammar checking in our Rose, B., Ross, A.J., Rossi, G., Rowlands, K., Rubin, K.H.R.,
review paper. Salvato, M., Sánchez, A.G., Sánchez-Menguiano, L., Sánchez-
Gallego, J.R., Sayres, C., Schaefer, A., Schiavon, R.P., Schimoia,
J.S., Schlafly, E., Schlegel, D., Schneider, D.P., Schultheis, M.,
References Schwope, A., Seo, H.J., Serenelli, A., Shafieloo, A., Shamsi, S.J.,
Shao, Z., Shen, S., Shetrone, M., Shirley, R., Silva Aguirre, V.,
Adithya, H.N., Kariyappa, R., Shinsuke, I., Kanya, K., Zender, J., Simon, J.D., Skrutskie, M.F., Slosar, A., Smethurst, R., Sobeck,
Damé, L., Gabriel, G., DeLuca, E., Weber, M., 2021. Solar Soft J., Sodi, B.C., Souto, D., Stark, D.V., Stassun, K.G., Steinmetz,
X-ray Irradiance Variability, I: Segmentation of Hinode/XRT Full- M., Stello, D., Stermer, J., Storchi-Bergmann, T., Streblyanska,
Disk Images and Comparison with GOES (1 - 8 Å) X-Ray Flux. A., Stringfellow, G.S., Stutz, A., Suárez, G., Sun, J., Taghizadeh-
SoPh 296, 71. doi:10.1007/s11207-021-01785-6. Popp, M., Talbot, M.S., Tayar, J., Thakar, A.R., Theriault, R.,
Ahmadzadeh, A., Mahajan, S.S., Kempton, D.J., Angryk, R.A., Ji, S., Thomas, D., Thomas, Z.C., Tinker, J., Tojeiro, R., Toledo, H.H.,
2019. Toward filament segmentation using deep neural networks, Tremonti, C.A., Troup, N.W., Tuttle, S., Unda-Sanzana, E., Valen-
in: 2019 IEEE International Conference on Big Data (Big Data), tini, M., Vargas-González, J., Vargas-Magaña, M., Vázquez-Mata,
IEEE. pp. 4932–4941. J.A., Vivek, M., Wake, D., Wang, Y., Weaver, B.A., Weijmans,
Ahumada, R., Allende Prieto, C., Almeida, A., Anders, F., Anderson, A.M., Wild, V., Wilson, J.C., Wilson, R.F., Wolthuis, N., Wood-
S.F., Andrews, B.H., Anguiano, B., Arcodia, R., Armengaud, E., Vasey, W.M., Yan, R., Yang, M., Yèche, C., Zamora, O., Zarrouk,
Aubert, M., Avila, S., Avila-Reese, V., Badenes, C., Balland, C., P., Zasowski, G., Zhang, K., Zhao, C., Zhao, G., Zheng, Z., Zheng,
Barger, K., Barrera-Ballesteros, J.K., Basu, S., Bautista, J., Beaton, Z., Zhu, G., Zou, H., 2020. The 16th Data Release of the Sloan
R.L., Beers, T.C., Benavides, B.I.T., Bender, C.F., Bernardi, M., Digital Sky Surveys: First Release from the APOGEE-2 South-
Bershady, M., Beutler, F., Bidin, C.M., Bird, J., Bizyaev, D., ern Survey and Full Release of eBOSS Spectra. ApJS 249, 3.
Blanc, G.A., Blanton, M.R., Boquien, M., Borissova, J., Bovy, doi:10.3847/1538-4365/ab929e, arXiv:1912.02905.
J., Brandt, W.N., Brinkmann, J., Brownstein, J.R., Bundy, K., Bu- Aragon-Calvo, M.A., 2019. Classifying the large-scale structure of
reau, M., Burgasser, A., Burtin, E., Cano-Dı́az, M., Capasso, R., the universe with deep neural networks. MNRAS 484, 5771–5784.
Cappellari, M., Carrera, R., Chabanier, S., Chaplin, W., Chapman, doi:10.1093/mnras/stz393, arXiv:1804.00816.
M., Cherinka, B., Chiappini, C., Doohyun Choi, P., Chojnowski, Austin, J., Johnson, D., Ho, J., Tarlow, D., van den Berg, R., 2021.
S.D., Chung, H., Clerc, N., Coffey, D., Comerford, J.M., Com- Structured denoising diffusion models in discrete state-spaces, in:
parat, J., da Costa, L., Cousinou, M.C., Covey, K., Crane, J.D., Conference on Neural Information Processing Systems (NeurIPS).
Cunha, K., Ilha, G.d.S., Dai, Y.S., Damsted, S.B., Darling, J., Bandyopadhyay, S., Das, S., Datta, A., 2020. Detection of coro-
Davidson, James W., J., Davies, R., Dawson, K., De, N., de la Ma- nal holes using hough simulated parameterized online region-
corra, A., De Lee, N., Queiroz, A.B.d.A., Deconto Machado, A., based active contour method, in: 2020 URSI Regional Confer-
de la Torre, S., Dell’Agli, F., du Mas des Bourboux, H., Diamond- ence on Radio Science ( URSI-RCRS), pp. 1–4. doi:10.23919/
Stanic, A.M., Dillon, S., Donor, J., Drory, N., Duckworth, C., URSIRCRS49211.2020.9113512.
Dwelly, T., Ebelke, G., Eftekharzadeh, S., Davis Eigenbrot, A., Barra, V., Delouille, V., Kretzschmar, M., Hochedez, J.F., 2009. Fast
Elsworth, Y.P., Eracleous, M., Erfanianfar, G., Escoffier, S., Fan, and robust segmentation of solar euv images: algorithm and results
X., Farr, E., Fernández-Trincado, J.G., Feuillet, D., Finoguenov, for solar cycle 23. Astronomy & Astrophysics 505, 361–371.
A., Fofie, P., Fraser-McKelvie, A., Frinchaboy, P.M., Fromenteau, Barra, V., Delouille, V., Kretzschmar, M., Hochedez, J.F., 2009. Fast
S., Fu, H., Galbany, L., Garcia, R.A., Garcı́a-Hernández, D.A., and robust segmentation of solar EUV images: algorithm and re-
Garma Oehmichen, L.A., Ge, J., Geimba Maia, M.A., Geisler, sults for solar cycle 23. A&A 505, 361–371. doi:10.1051/
D., Gelfand, J., Goddy, J., Gonzalez-Perez, V., Grabowski, K., 0004-6361/200811416.
Green, P., Grier, C.J., Guo, H., Guy, J., Harding, P., Hasselquist, Bekki, K., 2021. Quantifying the fine structures of disk galaxies with
S., Hawken, A.J., Hayes, C.R., Hearty, F., Hekker, S., Hogg, deep learning: Segmentation of spiral arms in different Hubble
D.W., Holtzman, J.A., Horta, D., Hou, J., Hsieh, B.C., Huber, types. A&A 647, A120. doi:10.1051/0004-6361/202039797,
D., Hunt, J.A.S., Ider Chitham, J., Imig, J., Jaber, M., Jimenez arXiv:2103.08127.
Angel, C.E., Johnson, J.A., Jones, A.M., Jönsson, H., Jullo, E., Belavin, V., Trofimova, E., Ustyuzhanin, A., 2021. Segmen-
Kim, Y., Kinemuchi, K., Kirkpatrick, Charles C., I., Kite, G.W., tation of em showers for neutrino experiments with deep
Klaene, M., Kneib, J.P., Kollmeier, J.A., Kong, H., Kounkel, M., graph neural networks. Journal of Instrumentation 16,

27
P12035. URL: https://dx.doi.org/10.1088/1748-0221/ Sembroski, G.H., Lin, J.Y.Y., 2019. Deblending and clas-
16/12/P12035, doi:10.1088/1748-0221/16/12/P12035. sifying astronomical sources with Mask R-CNN deep learn-
Bellm, E.C., Kulkarni, S.R., Graham, M.J., Dekany, R., Smith, R.M., ing. MNRAS 490, 3952–3965. doi:10.1093/mnras/stz2845,
Riddle, R., Masci, F.J., Helou, G., Prince, T.A., Adams, S.M., Bar- arXiv:1908.02748.
barino, C., Barlow, T., Bauer, J., Beck, R., Belicki, J., Biswas, R., Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang,
Blagorodnova, N., Bodewits, D., Bolin, B., Brinnel, V., Brooke, M., 2022. Swin-unet: Unet-like pure transformer for medical im-
T., Bue, B., Bulla, M., Burruss, R., Cenko, S.B., Chang, C.K., age segmentation, in: European conference on computer vision,
Connolly, A., Coughlin, M., Cromer, J., Cunningham, V., De, K., Springer. pp. 205–218.
Delacroix, A., Desai, V., Duev, D.A., Eadie, G., Farnham, T.L., Chen, J., Lu, J., Zhu, X., Zhang, L., 2023a. Generative semantic
Feeney, M., Feindt, U., Flynn, D., Franckowiak, A., Frederick, segmentation, in: Proceedings of the IEEE/CVF Conference on
S., Fremling, C., Gal-Yam, A., Gezari, S., Giomi, M., Goldstein, Computer Vision and Pattern Recognition, pp. 7111–7120.
D.A., Golkhou, V.Z., Goobar, A., Groom, S., Hacopians, E., Hale, Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L.,
D., Henning, J., Ho, A.Y.Q., Hover, D., Howell, J., Hung, T., Hup- Yuille, A.L., Zhou, Y., 2021. Transunet: Transformers make
penkothen, D., Imel, D., Ip, W.H., Ivezić, Ž., Jackson, E., Jones, strong encoders for medical image segmentation. arXiv preprint
L., Juric, M., Kasliwal, M.M., Kaspi, S., Kaye, S., Kelley, M.S.P., arXiv:2102.04306 .
Kowalski, M., Kramer, E., Kupfer, T., Landry, W., Laher, R.R., Chen, Z., Zheng, Y., Li, C., Zhan, Y., 2023b. Fast and robust star
Lee, C.D., Lin, H.W., Lin, Z.Y., Lunnan, R., Giomi, M., Maha- detection algorithm based on the dyadic wavelet transform. IET
bal, A., Mao, P., Miller, A.A., Monkewitz, S., Murphy, P., Ngeow, Image Processing 17, 944–955.
C.C., Nordin, J., Nugent, P., Ofek, E., Patterson, M.T., Penprase, Chen, Z., Zhou, W., Sun, G., Zhang, M., Ruan, J., Zhao, J.,
B., Porter, M., Rauch, L., Rebbapragada, U., Reiley, D., Rigault, 2023. TransientViT: A novel CNN - Vision Transformer hybrid
M., Rodriguez, H., van Roestel, J., Rusholme, B., van Santen, J., real/bogus transient classifier for the Kilodegree Automatic Tran-
Schulze, S., Shupe, D.L., Singer, L.P., Soumagnac, M.T., Stein, sient Survey. arXiv e-prints , arXiv:2309.09937doi:10.48550/
R., Surace, J., Sollerman, J., Szkody, P., Taddia, F., Terek, S., arXiv.2309.09937, arXiv:2309.09937.
Van Sistine, A., van Velzen, S., Vestrand, W.T., Walters, R., Ward, Chollet, F., 2017. Xception: Deep learning with depthwise separable
C., Ye, Q.Z., Yu, P.C., Yan, L., Zolkower, J., 2019. The Zwicky convolutions, in: Proceedings of the IEEE conference on computer
Transient Facility: System Overview, Performance, and First Re- vision and pattern recognition, pp. 1251–1258.
sults. PASP 131, 018002. doi:10.1088/1538-3873/aaecbe, Collet, C., Murtagh, F., 2004. Multiband segmentation based on
arXiv:1902.01932.
Berry, D.S., 2015. FellWalker-A clump identification algorithm. Astronomy and Computing 10, 22–31. doi:10.1016/j.ascom.2014.11.004, arXiv:1411.6267.
Bertin, E., Arnouts, S., 1996. SExtractor: Software for source extraction. A&AS 117, 393–404. doi:10.1051/aas:1996164.
Bhambra, P., Joachimi, B., Lahav, O., 2022. Explaining deep learning of galaxy morphology with saliency mapping. MNRAS 511, 5032–5041. doi:10.1093/mnras/stac368, arXiv:2110.08288.
Bijaoui, A., Rué, F., 1995. A multiscale vision model adapted to the astronomical images. Signal Processing 46, 345–362. doi:10.1016/0165-1684(95)00093-4.
Boucaud, A., Huertas-Company, M., Heneka, C., Ishida, E.E.O., Sedaghat, N., de Souza, R.S., Moews, B., Dole, H., Castellano, M., Merlin, E., Roscani, V., Tramacere, A., Killedar, M., Trindade, A.M.M., Collaboration COIN, 2020. Photometry of high-redshift blended galaxies using deep learning. MNRAS 491, 2481–2495. doi:10.1093/mnras/stz3056, arXiv:1905.01324.
Boucheron, L.E., Valluri, M., McAteer, R.T.J., 2016. Segmentation of Coronal Holes Using Active Contours Without Edges. SoPh 291, 2353–2372. doi:10.1007/s11207-016-0985-z, arXiv:1610.01023.
Boursier, Y., Llebaria, A., Goudail, F., Lamy, P., Robelus, S., 2005. Automatic detection of coronal mass ejections on LASCO-C2 synoptic maps, in: Fineschi, S., Viereck, R.A. (Eds.), Solar Physics and Space Weather Instrumentation, pp. 13–24. doi:10.1117/12.616011.
Bretonnière, H., Boucaud, A., Huertas-Company, M., 2021. Probabilistic segmentation of overlapping galaxies for large cosmological surveys. arXiv e-prints, arXiv:2111.15455. doi:10.48550/arXiv.2111.15455.
Buonanno, R., Buscema, G., Corsi, C.E., Ferraro, I., Iannicola, G., 1983. Automated photographic photometry of stars in globular clusters. A&A 126, 278–282.
Burke, C.J., Aleo, P.D., Chen, Y.C., Liu, X., Peterson, J.R., Sembroski, G.H., Lin, J.Y.Y., 2019. Deblending and classifying astronomical sources with Mask R-CNN deep learning. MNRAS 490, 3952–3965.
a hierarchical markov model. Pattern Recognition 37, 2337–2347. doi:10.1016/j.patcog.2004.03.017.
Colombo, D., Rosolowsky, E., Ginsburg, A., Duarte-Cabral, A., Hughes, A., 2015. Graph-based interpretation of the molecular interstellar medium segmentation. MNRAS 454, 2067–2091. doi:10.1093/mnras/stv2063, arXiv:1510.04253.
Covas, P.B., Prix, R., 2022. Improved short-segment detection statistic for continuous gravitational waves. Phys. Rev. D 105, 124007. doi:10.1103/PhysRevD.105.124007.
Dai, Y., Zheng, T., Xue, C., Zhou, L., 2022. Segmarsvit: Lightweight mars terrain segmentation network for autonomous driving in planetary exploration. Remote Sensing 14. doi:10.3390/rs14246297.
Dey, A., Schlegel, D.J., Lang, D., Blum, R., Burleigh, K., Fan, X., et al., 2019. Overview of the DESI Legacy Imaging Surveys. AJ 157, 168. doi:10.3847/1538-3881/ab089d, arXiv:1804.08657.
Domínguez Sánchez, H., Huertas-Company, M., Bernardi, M., Kaviraj, S., Fischer, J.L., et al., 2019. Transfer learning for galaxy morphology from one survey to another. MNRAS 484, 93–100. doi:10.1093/mnras/sty3497, arXiv:1807.00807.
Dorotovic, I., Shahamatnia, E., Lorenc, M., Rybansky, M., Ribeiro, R.A., Fonseca, J.M., 2014. Sunspots and Coronal Bright Points Tracking using a Hybrid Algorithm of PSO and Active Contour Model. Sun and Geosphere 9, 81–84.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al., 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
Ellien, A., Slezak, E., Martinet, N., Durret, F., Adami, C., Gavazzi, R., Rabaça, C., Da Rocha, C., Pereira, D.E., 2021. Dawis: a detection algorithm with wavelets for intracluster light studies. Astronomy & Astrophysics 649, A38.
Farias, H., Ortiz, D., Damke, G., Jaque Arancibia, M., Solar, M., 2020. Mask galaxy: Morphological segmentation of galaxies. Astronomy and Computing 33, 100420. doi:10.1016/j.ascom.2020.100420.
Flewelling, H.A., Magnier, E.A., Chambers, K.C., Heasley, J.N., et al., 2020. The Pan-STARRS1 Database and Data Products. ApJS 251, 7. doi:10.3847/1538-4365/abb82d, arXiv:1612.05243.
Gaia Collaboration, Brown, A.G.A., Vallenari, A., Prusti, T., de Bruijne, J.H.J., Babusiaux, C., Biermann, M., et al., 2021. Gaia Early Data Release 3. Summary of the contents and survey properties. A&A 649, A1. doi:10.1051/0004-6361/202039657, arXiv:2012.01533.
Gill, C.D., Fletcher, L., Marshall, S., 2010. Using Active Contours for Semi-Automated Tracking of UV and EUV Solar Flare Ribbons. SoPh 262, 355–371. doi:10.1007/s11207-010-9508-5.
Girshick, R., 2015. Fast r-cnn, in: Proceedings of the IEEE international conference on computer vision, pp. 1440–1448.
Gong, C., Long, T., Yin, R., Jiao, W., Wang, G., 2023. A hybrid algorithm with swin transformer and convolution for cloud detection. Remote Sensing 15. doi:10.3390/rs15215264.
Gonzalez-Sanchez, E., Saccardo, D., Esteves, P.B., Kuffa, M., Wegener, K., 2024. Automatic characterization of wedm single craters through ai based object detection. International Journal of Automation Technology 18, 265–275.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversarial nets, in: NeurIPS.
Grajeda, J.A., Boucheron, L.E., Kirk, M.S., Leisner, A., Arge, C.N., 2023. Quantifying the consistency and characterizing the confidence of coronal holes detected by active contours without edges (acwe). Solar Physics 298, 133.
Green, C.E., Dawson, J.R., Cunningham, M.R., Jones, P.A., Novak, G., Fissel, L.M., 2017. Measuring Filament Orientation: A New Quantitative, Local Approach. ApJS 232, 6. doi:10.3847/1538-4365/aa8507, arXiv:1708.01953.
Gu, A., Dao, T., 2023. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752.
Gu, M., Wang, F., Hu, T., Yu, S., 2023. Localization and segmentation of galaxy morphology based on mask r-cnn, in: 2023 4th International Conference on Computer Engineering and Application (ICCEA), pp. 512–515. doi:10.1109/ICCEA58433.2023.10135337.
Hale, C.L., Robotham, A.S.G., Davies, L.J.M., Jarvis, M.J., Driver, S.P., Heywood, I., 2019. Radio source extraction with PROFOUND. MNRAS 487, 3971–3989. doi:10.1093/mnras/stz1462, arXiv:1902.01440.
Hancock, P.J., Murphy, T., Gaensler, B.M., Hopkins, A., Curran, J.R., 2012. Compact continuum source finding for next generation radio surveys. MNRAS 422, 1812–1824. doi:10.1111/j.1365-2966.2012.20768.x, arXiv:1202.4500.
Hausen, R., Robertson, B.E., 2020. Morpheus: A Deep Learning Framework for the Pixel-level Analysis of Astronomical Image Data. ApJS 248, 20. doi:10.3847/1538-4365/ab8868, arXiv:1906.11248.
He, K., Gkioxari, G., Dollár, P., Girshick, R., 2017. Mask r-cnn, in: Proceedings of the IEEE international conference on computer vision, pp. 2961–2969.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778.
Henry, E.U., Emebob, O., Omonhinmin, C.A., 2022. Vision transformers in medical imaging: A review. arXiv preprint arXiv:2211.10043.
Herzog, A.D., Illingworth, G., 1977. The Structure of Globular Clusters. I. Direct Plane Automated Reduction Techniques. ApJS 33, 55. doi:10.1086/190418.
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S., 2017. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Conference on Neural Information Processing Systems (NeurIPS) 30.
Ho, J., Jain, A., Abbeel, P., 2020. Denoising diffusion probabilistic models, in: NeurIPS.
Hopkins, A.M., Miller, C.J., Connolly, A.J., Genovese, C., Nichol, R.C., Wasserman, L., 2002. A New Source Detection Algorithm Using the False-Discovery Rate. AJ 123, 1086–1094. doi:10.1086/338316, arXiv:astro-ph/0110570.
Huang, K.W., Chih-Fan Chen, G., Chang, P.W., Lin, S.C., Hsu, C.J., Thengane, V., Yao-Yu Lin, J., 2022. Strong Gravitational Lensing Parameter Estimation with Vision Transformer. arXiv e-prints, arXiv:2210.04143. doi:10.48550/arXiv.2210.04143.
Huertas-Company, M., Lanusse, F., 2023. The Dawes Review 10: The impact of deep learning for the analysis of galaxy surveys. PASA 40, e001. doi:10.1017/pasa.2022.55, arXiv:2210.01813.
Irwin, M.J., 1985. Automatic analysis of crowded fields. MNRAS 214, 575–604. doi:10.1093/mnras/214.4.575.
Ivezić, Ž., Kahn, S.M., Tyson, J.A., Abel, B., et al., 2019. LSST: From Science Drivers to Reference Design and Anticipated Data Products. ApJ 873, 111. doi:10.3847/1538-4357/ab042c, arXiv:0805.2366.
Jia, P., Zheng, Y., Wang, M., Yang, Z., 2023a. A deep learning based astronomical target detection framework for multi-colour photometry sky survey projects. Astronomy and Computing 42, 100687. doi:10.1016/j.ascom.2023.100687.
Jia, Y., Liu, L., Zhang, C., 2021. Moon impact crater detection using nested attention mechanism based unet++. IEEE Access 9, 44107–44116. doi:10.1109/ACCESS.2021.3066445.
Jia, Y., Su, Z., Wan, G., Liu, L., Liu, J., 2023b. Ae-transunet+: An enhanced hybrid transformer network for detection of lunar south small craters in lro nac images. IEEE Geoscience and Remote Sensing Letters 20, 1–5. doi:10.1109/LGRS.2023.3294500.
Johnston, K.G., Beuther, H., Linz, H., Schmiedeke, A., Ragan, S.E., Henning, T., 2014. The dynamics and star-forming potential of the massive Galactic centre cloud G0.253+0.016. A&A 568, A56. doi:10.1051/0004-6361/201423943, arXiv:1404.1372.
Karmakar, A., Mishra, D., Tej, A., 2018. Stellar cluster detection using gmm with deep variational autoencoder, in: 2018 IEEE Recent Advances in Intelligent Computational Systems (RAICS), pp. 122–126. doi:10.1109/RAICS.2018.8634903.
Kingma, D.P., Welling, M., 2014. Auto-encoding variational bayes, in: ICLR.
Kohl, S., Romera-Paredes, B., Meyer, C., De Fauw, J., Ledsam, J.R., Maier-Hein, K., Eslami, S., Jimenez Rezende, D., Ronneberger, O., 2018. A probabilistic u-net for segmentation of ambiguous images. Advances in neural information processing systems 31.
Kohonen, T., 1990. The self-organizing map. Proceedings of the IEEE 78, 1464–1480.
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks, in: Pereira, F., Burges, C., Bottou, L., Weinberger, K. (Eds.), Advances in Neural Information Processing Systems, Curran Associates, Inc.
Latorre, F., Spiller, D., Sasidharan, S.T., Basheer, S., Curti, F., 2023. Transfer learning for real-time crater detection on asteroids using a Fully Convolutional Neural Network. Icarus 394, 115434. doi:10.1016/j.icarus.2023.115434, arXiv:2204.00477.
Lazzati, D., Campana, S., Rosati, P., Panzera, M.R., Tagliaferri, G., 1999. The Brera Multiscale Wavelet ROSAT HRI Source Catalog. I. The Algorithm. ApJ 524, 414–422. doi:10.1086/307788, arXiv:astro-ph/9904374.
Liu, D., Song, W., Lin, G., Wang, H., 2021. Solar Filament Segmentation Based on Improved U-Nets. SoPh 296, 176. doi:10.1007/s11207-021-01920-3.
Mackovjak, Š., Harman, M., Maslej-Krešňáková, V., Butka, P., 2021. SCSS-Net: solar corona structures segmentation by deep learning. MNRAS 508, 3111–3124. doi:10.1093/mnras/stab2536, arXiv:2109.10834.
Men’shchikov, A., André, P., Didelon, P., Motte, F., Hennemann, M., Schneider, N., 2012. A multi-scale, multi-wavelength source extraction method: getsources. A&A 542, A81. doi:10.1051/0004-6361/201218797, arXiv:1204.4508.
Merz, G., Liu, Y., Burke, C.J., Aleo, P.D., Liu, X., Carrasco Kind, M., Kindratenko, V., Liu, Y., 2023. Detection, instance segmentation, and classification for astronomical surveys with deep learning (DEEPDISC): DETECTRON2 implementation and demonstration with Hyper Suprime-Cam data. MNRAS 526, 1122–1137. doi:10.1093/mnras/stad2785, arXiv:2307.05826.
Mohan, N., Rafferty, D., 2015. PyBDSF: Python Blob Detection and Source Finder. Astrophysics Source Code Library, record ascl:1502.007.
Mouhcine, M., Ferguson, H.C., Rich, R.M., Brown, T.M., Smith, T.E., 2005. Halos of Spiral Galaxies. I. The Tip of the Red Giant Branch as a Distance Indicator. ApJ 633, 810–820. doi:10.1086/468177, arXiv:astro-ph/0510253.
Newell, B., O’Neil, Earl J., J., 1977. The Reduction of Panoramic Photometry 1. Two Search Algorithms. PASP 89, 925. doi:10.1086/130248.
Núnez, J., Llacer, J., 2003. Astronomical image segmentation by self-organizing neural networks and wavelets. Neural Networks 16, 411–417.
Olmedo, O., Zhang, J., Wechsler, H., Poland, A., Borne, K., 2008. Automatic Detection and Tracking of Coronal Mass Ejections in Coronagraph Time Series. SoPh 248, 485–499. doi:10.1007/s11207-007-9104-5.
Osco, L.P., Wu, Q., de Lemos, E.L., Gonçalves, W.N., Ramos, A.P.M., Li, J., Marcato, J., 2023. The segment anything model (sam) for remote sensing applications: From zero to one shot. International Journal of Applied Earth Observation and Geoinformation 124, 103540. doi:10.1016/j.jag.2023.103540.
Ostdiek, B., Diaz Rivero, A., Dvorkin, C., 2022a. Extracting the Subhalo Mass Function from Strong Lens Images with Image Segmentation. ApJ 927, 83. doi:10.3847/1538-4357/ac2d8d.
Ostdiek, B., Diaz Rivero, A., Dvorkin, C., 2022b. Image segmentation for analyzing galaxy-galaxy strong lensing systems. A&A 657, L14. doi:10.1051/0004-6361/202142030, arXiv:2009.06663.
Otsu, N., 1979. A threshold selection method from gray-level histograms. IEEE transactions on systems, man, and cybernetics 9, 62–66.
Peracaula, M., Oliver, A., Torrent, A., Lladó, X., Freixenet, J., Martí, J., 2011. Segmenting extended structures in radio astronomical images by filtering bright compact sources and using wavelets decomposition, in: 2011 18th IEEE International Conference on Image Processing, pp. 2805–2808. doi:10.1109/ICIP.2011.6116254.
Platen, E., van de Weygaert, R., Jones, B.J.T., 2007. A cosmic watershed: the WVF void detection technique. MNRAS 380, 551–570. doi:10.1111/j.1365-2966.2007.12125.x, arXiv:0706.2788.
Qian, J., Nguyen, N.P., Oya, Y., Kikugawa, G., Okabe, T., Huang, Y., Ohuchi, F.S., 2019. Introducing self-organized maps (som) as a visualization tool for materials research and education. Results in Materials 4, 100020.
Rani, R., Moore, T.J.T., Eden, D.J., Rigby, A.J., Duarte-Cabral, A., Lee, Y.N., 2023. Identification of molecular clouds in emission maps: a comparison between methods in the 13CO/C18O (J = 3-2) Heterodyne Inner Milky Way Plane Survey. MNRAS 523, 1832–1852. doi:10.1093/mnras/stad1507, arXiv:2305.07874.
Reiman, D.M., Göhre, B.E., 2019. Deblending galaxy superpositions with branched generative adversarial networks. MNRAS 485, 2617–2627. doi:10.1093/mnras/stz575, arXiv:1810.10098.
Rey Deutsch, T., Bignone, L.A., Pedrosa, S.E., 2023. Galaxy segmentation using U-Net deep-learning algorithm. Boletin de la Asociacion Argentina de Astronomia La Plata Argentina 64, 253–255.
Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S., 2019. Generalized intersection over union: A metric and a loss for bounding box regression, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 658–666.
Richards, F., Paiement, A., Xie, X., Sola, E., Duc, P.A., 2023. Panoptic segmentation of galactic structures in lsb images, in: 2023 18th International Conference on Machine Vision and Applications (MVA), IEEE. pp. 1–6.
Ricker, G.R., Winn, J.N., Vanderspek, R., Latham, D.W., et al., 2015. Transiting Exoplanet Survey Satellite (TESS). Journal of Astronomical Telescopes, Instruments, and Systems 1, 014003. doi:10.1117/1.JATIS.1.1.014003.
Riggi, S., 2018. CAESAR: Compact And Extended Source Automated Recognition. Astrophysics Source Code Library, record ascl:1807.015.
Robitaille, J.F., Motte, F., Schneider, N., Elia, D., Bontemps, S., 2019a. Exposing the plural nature of molecular clouds. Extracting filaments and the cosmic infrared background against the true scale-free interstellar medium. A&A 628, A33. doi:10.1051/0004-6361/201935545, arXiv:1905.11492.
Robitaille, T., Rice, T., Beaumont, C., Ginsburg, A., MacDonald, B., Rosolowsky, E., 2019b. astrodendro: Astronomical data dendrogram creator. Astrophysics Source Code Library, record ascl:1907.016.
Robotham, A.S.G., Davies, L.J.M., Driver, S.P., Koushan, S., Taranu, D.S., Casura, S., Liske, J., 2018. ProFound: Source Extraction and Application to Modern Survey Data. MNRAS 476, 3137–3159. doi:10.1093/mnras/sty440, arXiv:1802.00937.
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B., 2022. High-resolution image synthesis with latent diffusion models, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 10684–10695.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, Springer. pp. 234–241.
Rosolowsky, E., Leroy, A., 2006. Bias-free Measurement of Giant Molecular Cloud Properties. PASP 118, 590–610. doi:10.1086/502982, arXiv:astro-ph/0601706.
Rue, F., Bijaoui, A., 1996. Pyramidal vision model applied to astronomical images, in: Wavelet Applications in Signal and Image Processing IV, SPIE. pp. 373–383.
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X., 2016. Improved techniques for training gans. Conference on Neural Information Processing Systems (NeurIPS) 29.
Sandfort, V., Yan, K., Graffy, P.M., Pickhardt, P.J., Summers, R.M., 2021. Use of variational autoencoders with unsupervised learning to detect incorrect organ segmentations at ct. Radiology: Artificial Intelligence 3, e200218.
Schilliro, F., Romano, P., 2021. Segmentation of spectroscopic images of the low solar atmosphere by the self-organizing map technique. MNRAS 503, 2676–2687. doi:10.1093/mnras/stab507, arXiv:2102.11595.
Shen, J., Zheng, X.W., 2020. The bar and spiral arms in the Milky Way: structure and kinematics. Research in Astronomy and Astrophysics 20, 159. doi:10.1088/1674-4527/20/10/159, arXiv:2012.10130.
Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Slezak, E., Bijaoui, A., Mars, G., 1988. Galaxy counts in the Coma supercluster field. II. Automated image detection and classification. A&A 201, 9–20.
Slezak, E., Lefèvre, S., Collet, C., Perret, B., 2010. Connected component trees for multivariate image processing and applications in astronomy, in: Pattern Recognition, International Conference on, IEEE Computer Society, Los Alamitos, CA, USA. pp. 4089–4092. doi:10.1109/ICPR.2010.994.
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S., 2015. Deep unsupervised learning using nonequilibrium thermodynamics, in: ICML, PMLR.
Starck, J.L., Aussel, H., Elbaz, D., Fadda, D., Cesarsky, C., 1999. Faint source detection in ISOCAM images. A&AS 138, 365–379. doi:10.1051/aas:1999281.
Starck, J.L., Murtagh, F.D., Bijaoui, A., 1998. Image processing and data analysis: the multiscale approach. Cambridge University Press.
Szalay, A.S., Connolly, A.J., Szokoly, G.P., 1999. Simultaneous Multicolor Detection of Faint Galaxies in the Hubble Deep Field. AJ 117, 68–74. doi:10.1086/300689, arXiv:astro-ph/9811086.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., 2015. Going deeper with convolutions, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9.
Tajik, H.R., Rahebi, J., 2013. Diffuse objects extraction in coronal holes using active contour means model. ACSIJ Adv. Comput. Sci 2, 55–61.
Tang, H., Yue, S., Wang, Z., Lai, J., Wei, L., Luo, Y., Liang, C., Chu, J., Xu, D., 2023. A model local interpretation routine for deep learning based radio galaxy classification, in: 2023 XXXVth General Assembly and Scientific Symposium of the International Union of Radio Science (URSI GASS), IEEE. pp. 1–4.
Tej, A., Ojha, D.K., Ghosh, S.K., Kulkarni, V.K., Verma, R.P., Vig, S., Prabhu, T.P., 2006. A multiwavelength study of the massive star-forming region IRAS 06055+2039 (RAFGL 5179). A&A 452, 203–215. doi:10.1051/0004-6361:20054687, arXiv:astro-ph/0601535.
Unterthiner, T., van Steenkiste, S., Kurach, K., Marinier, R., Michalski, M., Gelly, S., 2019. Fvd: A new metric for video generation.
Vafaei Sadr, A., Movahed, S.M.S., Farhang, M., Ringeval, C., Bouchet, F.R., 2018. A Multiscale pipeline for the search of string-induced CMB anisotropies. MNRAS 475, 1010–1022. doi:10.1093/mnras/stx3126, arXiv:1710.00173.
van der Zwaard, R., Bergmann, M., Zender, J., Kariyappa, R., Giono, G., Damé, L., 2021. Segmentation of Coronal Features to Understand the Solar EUV and UV Irradiance Variability III. Inclusion and Analysis of Bright Points. SoPh 296, 138. doi:10.1007/s11207-021-01863-9.
Van Oort, C.M., Xu, D., Offner, S.S.R., Gutermuth, R.A., 2019. CASI: A Convolutional Neural Network Approach for Shell Identification. ApJ 880, 83. doi:10.3847/1538-4357/ab275e, arXiv:1905.09310.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I., 2017. Attention is all you need. Advances in neural information processing systems 30.
Verbeeck, C., Delouille, V., Mampaey, B., De Visscher, R., 2014. The SPoCA-suite: Software for extraction, characterization, and tracking of active regions and coronal holes on EUV images. A&A 561, A29. doi:10.1051/0004-6361/201321243.
Verbeeck, C., Higgins, P.A., Colak, T., Watson, F.T., Delouille, V., Mampaey, B., Qahwaji, R., 2013. A Multi-wavelength Analysis of Active Regions and Sunspots by Comparison of Automatic Detection Algorithms. SoPh 283, 67–95. doi:10.1007/s11207-011-9859-6.
Vikhlinin, A., Forman, W., Jones, C., Murray, S., 1995. Matched Filter Source Detection Applied to the ROSAT PSPC and the Determination of the Number-Flux Relation. ApJ 451, 542. doi:10.1086/176242.
Vos, E.E., Francois Luus, P.S., Finlay, C.J., Bassett, B.A., 2019. A generative machine learning approach to rfi mitigation for radio astronomy, in: 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6. doi:10.1109/MLSP.2019.8918820.
Walmsley, M., Spindler, A., 2023. Deep Learning Segmentation of Spiral Arms and Bars. arXiv e-prints, arXiv:2312.02908. doi:10.48550/arXiv.2312.02908.
Whiting, M., Humphreys, B., 2012. Source-Finding for the Australian Square Kilometre Array Pathfinder. PASA 29, 371–381. doi:10.1071/AS12028, arXiv:1208.2479.
Williams, J.P., de Geus, E.J., Blitz, L., 1994. Determining Structure in Molecular Clouds. ApJ 428, 693. doi:10.1086/174279.
Xavier, G., Soman, K.P., Tvn, D., Philip, T.E., 2012. An efficient algorithm for the segmentation of astronomical images. IOSR Journal of Computer Engineering 6, 21–29. URL: https://api.semanticscholar.org/CorpusID:54658228.
Xu, D., Kong, S., Kaul, A., Arce, H.G., Ossenkopf-Okada, V., 2023. CMR Exploration. II. Filament Identification with Machine Learning. ApJ 955, 113. doi:10.3847/1538-4357/acefce, arXiv:2308.06641.
Xu, D., Offner, S.S.R., Gutermuth, R., Oort, C.V., 2020a. Application of Convolutional Neural Networks to Identify Protostellar Outflows in CO Emission. ApJ 905, 172. doi:10.3847/1538-4357/abc7bf, arXiv:2010.12525.
Xu, D., Offner, S.S.R., Gutermuth, R., Oort, C.V., 2020b. Application of Convolutional Neural Networks to Identify Stellar Feedback Bubbles in CO Emission. ApJ 890, 64. doi:10.3847/1538-4357/ab6607, arXiv:2001.04506.
Yan, Q.Z., Yang, J., Su, Y., Sun, Y., Wang, C., 2020. Distances and Statistics of Local Molecular Clouds in the First Galactic Quadrant. ApJ 898, 80. doi:10.3847/1538-4357/ab9f9c, arXiv:2006.13654.
Yang, P., Bai, H., Zhao, L., Gong, X., Zhong, L., Yang, Y., Rao, C., 2023. A deep learning approach for automated segmentation of magnetic bright points in the solar photosphere. A&A 677, A121. doi:10.1051/0004-6361/202346914.
Yang, X., Zhang, Q., Yang, X., Peng, Q., Li, Z., Wang, N., 2018. Edge detection in cassini astronomy image using extreme learning machine, in: MATEC Web of Conferences, EDP Sciences. p. 06007.
Zarin Era, I., Ahmed, I., Liu, Z., Das, S., 2023. An unsupervised approach towards promptable defect segmentation in laser-based additive manufacturing by Segment Anything. arXiv e-prints, arXiv:2312.04063. doi:10.48550/arXiv.2312.04063.
Zavagno, A., Dupé, F.X., Bensaid, S., Schisano, E., Li Causi, G., Gray, M., Molinari, S., Elia, D., Lambert, J.C., Brescia, M., Arzoumanian, D., Russeil, D., Riccio, G., Cavuoti, S., 2023. Supervised machine learning on Galactic filaments. Revealing the filamentary structure of the Galactic interstellar medium. A&A 669, A120. doi:10.1051/0004-6361/202244103, arXiv:2212.00463.
Zheng, C., Pulido, J., Thorman, P., Hamann, B., 2015. An improved method for object detection in astronomical images. MNRAS 451, 4445–4459. doi:10.1093/mnras/stv1237.
Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., Liang, J., 2018. Unet++: A nested u-net architecture for medical image segmentation, in: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings 4, Springer. pp. 3–11.
Zhu, Y., Olszewski, K., Wu, Y., Achlioptas, P., Chai, M., Yan, Y., Tulyakov, S., 2022a. Quantized gan for complex music generation from dance videos, in: European Conference on Computer Vision (ECCV), Springer. pp. 182–199.
Zhu, Y., Wu, Y., Olszewski, K., Ren, J., Tulyakov, S., Yan, Y., 2023. Discrete contrastive diffusion for cross-modal music and image generation, in: The Eleventh International Conference on Learning Representations (ICLR).
Zhu, Y., Wu, Y., Sebe, N., Yan, Y., 2022b. Vision+ x: A survey on multimodal learning in the light of data. arXiv preprint arXiv:2210.02884.