SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage
Abstract

The image collage task aims to create an informative and visually aesthetic summarization of an image collection. While several recent works exploit tree-based algorithms to preserve image content better, all of them resort to hand-crafted adjustment rules to optimize the collage tree structure, leading to a failure to fully explore the structure space of the collage tree. Our key idea is to soften the discrete tree structure into the probability space.

Figure 1. Overview of SoftCollage. Instead of applying hand-crafted adjustment rules to a single collage tree, the tree generator maps the image collection to a tree probability distribution τθ, samples a set of collage trees, maps each sample to a collage, evaluates the criterion loss, and updates the generator via back propagation.
Figure 2. An example of the mapping from a standard collage tree (with horizontal "H" and vertical "V" cuts) to the tree-based collage.

Figure 3. Due to the failure to fully explore the structure space of the collage tree, the collage generated by the state-of-the-art method (a) still contains images suffering severe aspect ratio distortion (red dotted rectangle) and fails to place similar images together (blue dotted ellipse). Our result (b) preserves aspect ratio and content correlation better.

... discrete operations that prevent back propagation. Although recent tree-based advances [16, 23] utilized learning strategies, they only applied them to yield semantic features in the first stage so that images with similar features clustered together. These works achieved considerable improvement because placing correlated images together can facilitate collage informativeness [18, 38, 41]. However, these methods still employed hand-crafted schemes to refine the tree structure and failed to fully explore the solution space (Fig. 3). Although Pan et al. [23] recently introduced back propagation for the first time to fine-tune the aspect ratio and splitting ratio, they still failed to propagate the gradients back to optimize the collage tree structure due to the undifferentiable characteristic of the tree-based process.

In this paper, we attack the key problem of differentiating the overall two-stage tree-based collage generation process (Fig. 1). Specifically, we first propose a novel neural-based differentiable probabilistic tree generator to model the first stage of the tree-based procedure. Our tree generator exploits deep image features and embedded information, including aspect ratio and canvas size, to construct a correlation-preserving probabilistic collage tree (PCtree), which builds a probability space by modeling the node type distribution (the cut type of a node is horizontal ("H") or vertical ("V")) and the edge connection distribution (a child node is on the left ("L") or right ("R")) (Fig. 5). Secondly, we formulate the tree generator optimization as an end-to-end framework resorting to the policy gradient technique [30], which naturally overcomes the differentiation difficulty in the second stage of the tree-based procedure. Instead of a hand-crafted adjustment scheme at the instance level, our optimization paradigm directly utilizes the gradient of the collage criterion loss to optimize the collage tree structure at the level of the probability space, which facilitates the exploration of the optimal collage structure.

Furthermore, this field lacks a benchmark dataset with sufficient labels for quantitative evaluation. To facilitate image collage research, we propose AIC, a large-scale publicly available annotated dataset for image collage evaluation.

The major contributions can be summarized as follows.

• We propose a novel neural-based probabilistic tree generator which constructs a "soft" probabilistic tree structure to build a probability space of correlation-preserving collage trees conditioned on the deep image feature, aspect ratio and canvas size.
• We formulate the tree-based collage generation procedure as a differentiable process for the first time, and introduce an end-to-end learning strategy to perform gradient-based structure optimization.
• We provide a large-scale publicly available annotated benchmark dataset for the evaluation of image collage methods.
• We conduct extensive experiments and a user study, and show that our model outperforms the state-of-the-art methods.

2. Related Work

Previous works on image collage mainly fall into two categories, i.e. parametric methods and partitioning-based methods. Our tree-based method belongs to the latter.

Parametric methods parameterize a collage with variables including the position, scale, orientation and layer index of each image and design well-defined objective functions to solve for the optimal variables directly [4, 9, 12, 19, 25–27, 33, 36, 40]. These works either modeled the problem via a probabilistic graphical framework [19, 25, 26, 33, 36, 40] or solved the collage parameters in a heuristic manner [4, 9, 12, 27]. To preserve correlation among images, some methods exploited a feature space to acquire the correlation and projected the images into a visualization space [1, 13, 20, 21, 29, 39]. However, these methods introduce image overlapping and artifact problems.

Partitioning-based methods partition the canvas and assign each image a corresponding region to compose a collage [3, 8, 10, 16, 18, 23, 28, 31, 37, 38, 41]. Some works utilized Voronoi tessellation [31] and packing algorithms [18, 41] to allocate canvas space for the irregular salient region of each image, which brought about image artifacts when blending image boundaries. Hence, tree-based collage was developed to preserve image content better [3, 8, 16, 23, 28, 37, 38]. Atkins [3] first introduced tree-based collage and solved the tree structure in a beam-search manner.
Figure 4. The pipeline of our tree generator. Here the image collection size is four, and our feature extractor initially extracts the feature of each image. Subsequently the NNP and fusion module iteratively select child nodes to yield a parent feature node in a bottom-up manner until the root feature node of the probabilistic collage tree is acquired. Finally, the edge classifier and node classifier generate pe and pn respectively. σ is the softmax activation.
These tree-based methods all adjusted the tree structure according to hand-crafted rules such as a distortion threshold, and thus failed to fully explore the solution space. Recently, Pan et al. [23] utilized back propagation to refine the aspect ratio and splitting ratio of the region boxes in [38]. However, the gradient in [23] still fails to flow back to optimize the tree structure due to the undifferentiable characteristic of the tree-based collage generation process. Different from the prior work, we attack the key problem of differentiating the process via softening the discrete structure of the collage tree, and hence our gradient can directly update all the structural details of the collage tree.

Figure 5. Our probabilistic collage tree softens the standard collage tree structure via modeling the node type distribution ("H"/"V") as pn and the edge connection distribution ("L"/"R") as pe.

3. Approach

Problem formulation. According to the literature, a high-quality collage should satisfy the following criteria: 1) Compact. The collage should fully utilize the canvas space by minimizing blank space. 2) Ratio-preserving. Images in the collage should suffer low aspect ratio distortion to retain aesthetics. 3) Content-preserving. Image content, especially the salient region, should not be occluded; image overlapping decreases the representativeness and aesthetics of the collage [23]. 4) Correlation-preserving. Recent works show that placing correlated images together facilitates the informativeness of the collage [18, 23, 38, 41]. Therefore, given an image collection {I_i}, canvas width w and height h, we aim to design a tree generator G. This generator constructs a collage tree τ in the first stage, and the tree is mapped to the final collage C via a mapping function g in the second stage. Supposing we integrate the above four criteria into one criterion function F, our goal is to solve for the optimal tree generator G^* = \arg\max_G F(g(G(w, h, \{I_i\}))).

Overview. To solve the above two-stage problem in an end-to-end manner, we first propose a "soft" probabilistic collage tree (PCtree) and design a differentiable tree generator to construct the PCtree. Secondly, we approximate the gradient of the criterion loss to optimize our generator via back propagation. These two steps tackle the differentiation problem of the two stages respectively. In the following parts, we first present the PCtree, our tree generator and the tree generation algorithm in Sec. 3.1. Afterwards we introduce the model architecture of our neural generator in Sec. 3.2. Finally we present our gradient-based optimization paradigm in Sec. 3.3.
3.1. Probabilistic Collage Tree Generation

Probabilistic collage tree. A standard collage tree represents the collage layout using discrete structural parameters including edge connection and node type [3], while the proposed probabilistic collage tree (PCtree) softens these parameters via modeling the node type distribution (the cut type of a node is designated as horizontal ("H") or vertical ("V")) as pn and the edge connection distribution (the first child node in the child list is designated as the left ("L") or right ("R") child node) as pe, as shown in Fig. 5. The nodes in a PCtree and a standard collage tree are in one-to-one correspondence. Thus, given an interior node ñ in a PCtree with child nodes ñi and ñj, and the nodes n, ni and nj (corresponding to ñ, ñi and ñj respectively) in a standard collage tree, we define pn, pe ∈ R² as

p_n^{(0)}(\tilde{n}) = p(c_n = \text{``H''} \mid \tau_\theta(\tilde{n}_i), \tau_\theta(\tilde{n}_j))    (1)
p_n^{(1)}(\tilde{n}) = p(c_n = \text{``V''} \mid \tau_\theta(\tilde{n}_i), \tau_\theta(\tilde{n}_j))    (2)
p_e^{(0)}(\tilde{n}_i, \tilde{n}_j) = p(l_n = n_i, r_n = n_j \mid \tau_\theta(\tilde{n}_i), \tau_\theta(\tilde{n}_j))    (3)
p_e^{(1)}(\tilde{n}_i, \tilde{n}_j) = p(l_n = n_j, r_n = n_i \mid \tau_\theta(\tilde{n}_i), \tau_\theta(\tilde{n}_j))    (4)

where p_n^{(i)} and p_e^{(i)} denote the i-th (i ∈ {0, 1}) component of pn and pe respectively, c_n is the cut type of n, l_n is the left child node of n, r_n is the right child node of n, and τθ(x) denotes the subtree of the PCtree τθ rooted at node x. Through softening the parameters, we build a probability space for the collage tree, and the likelihood of a standard collage tree τ given the PCtree τθ can be calculated as

p(\tau \mid \tau_\theta) = \prod_{n \in N(\tau)} p_n^{(\mathbb{1}\{c_n = \text{``V''}\})}(\tilde{n}) \times p_e^{(0)}(\tilde{l}_n, \tilde{r}_n)    (5)

where N(τ) is the interior node set of τ; ñ, l̃n and r̃n denote the nodes in the PCtree corresponding to n, l_n and r_n respectively, and 1{·} is the indicator function (its value is 1 when the condition is true, otherwise 0).
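To make Eq. (5) concrete, the sketch below (with illustrative field names, not the authors' released code) accumulates the log-likelihood of one sampled standard collage tree from the softened distributions stored in the PCtree.

```python
import math

class Node:
    """Minimal PCtree node sketch: interior nodes carry the softened distributions
    p_n (cut type) and p_e (child order); leaves carry neither."""
    def __init__(self, p_n=None, p_e=None, left=None, right=None):
        self.p_n, self.p_e = p_n, p_e          # each a length-2 probability vector
        self.left, self.right = left, right    # None for leaf nodes

def log_likelihood(node, sample):
    """log p(tau | tau_theta) following Eq. (5): for every interior node, multiply the
    probability of the sampled cut type (0 = "H", 1 = "V") by the probability of the
    sampled child order (0 = first child placed on the left), accumulated in log space."""
    if node.left is None:                      # leaf: contributes a factor of 1
        return 0.0
    cut, order = sample[node]                  # discrete choices of the sampled tree
    return (math.log(node.p_n[cut]) + math.log(node.p_e[order])
            + log_likelihood(node.left, sample)
            + log_likelihood(node.right, sample))
```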
Generator components. To generate the PCtree, we design four learnable components, i.e. a feature extractor, a fusion module, an edge classifier and a node classifier, as shown in Fig. 4. The feature extractor extracts image semantic features to learn the correlation among images and embeds aspect ratio and canvas information to learn layout adjustment. The fusion module fuses the features of child nodes to yield the parent node feature for the bottom-up tree construction. The edge classifier determines the edge connection distribution between child nodes and the parent node. The node classifier predicts the cut type distribution of interior nodes.

Tree construction algorithm. To preserve correlation among images, we adopt a nearest neighbor policy (NNP) to conduct the tree construction in a greedy manner. Given a list of features, our NNP finds the pair of features with the closest Euclidean distance. The tree construction process is described in Algo. 1, where f_n denotes the feature of node n. The time complexity of this algorithm is O(N² log N) with the use of a priority queue and a hash table, where N is the size of the image collection.

Algorithm 1: Tree construction process
Input: w, h, {I_i}
1: {f_i} ← {FeatureExtractor(I_i)}
2: N ← size({f_i})
3: repeat
4:   f_{n_x}, f_{n_y} ← NNP({f_i})
5:   f_{n_z} ← FusionModule(f_{n_x}, f_{n_y})
6:   p_e(n_x, n_y) ← EdgeClassifier(f_{n_x}, f_{n_y})
7:   p_n(n_z) ← NodeClassifier(f_{n_z})
8:   remove f_{n_x}, f_{n_y} from {f_i} and add f_{n_z} into {f_i}
9:   N ← N − 1
10: until N = 1
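As an illustration of Algo. 1, the following is a simplified sketch of the greedy NNP merge under assumed interfaces (fuse, edge_probs and node_probs are stand-ins for the learned modules, not the paper's implementation). For clarity it re-scans all pairs at every merge, which is O(N³) overall; the O(N² log N) bound quoted above assumes the pairwise distances are kept in a priority queue with a hash table marking merged nodes.

```python
import numpy as np

def build_tree_greedy(leaf_features, fuse, edge_probs, node_probs):
    """Simplified Algo. 1: greedily merge the nearest pair of active features bottom-up.
    leaf_features: list of 1-D numpy vectors, one per image.
    Returns a nested dict / leaf-index structure describing the constructed PCtree."""
    active = [(f, i) for i, f in enumerate(leaf_features)]       # (feature, subtree)
    while len(active) > 1:
        # nearest neighbor policy: pick the pair with the smallest Euclidean distance
        x, y = min(
            ((a, b) for a in range(len(active)) for b in range(a + 1, len(active))),
            key=lambda ab: np.linalg.norm(active[ab[0]][0] - active[ab[1]][0]),
        )
        (f_x, t_x), (f_y, t_y) = active[x], active[y]
        f_z = fuse(f_x, f_y)                                     # parent node feature
        parent = {"p_e": edge_probs(f_x, f_y),                   # edge connection dist.
                  "p_n": node_probs(f_z),                        # cut type distribution
                  "children": (t_x, t_y)}
        active = [active[i] for i in range(len(active)) if i not in (x, y)]
        active.append((f_z, parent))                             # N <- N - 1
    return active[0][1]
```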
3.2. Model Architecture

In this section, we elaborate on the network architecture of our four generator components.

Feature extractor. This component is composed of two-path feature extractors, as shown in Fig. 4. One path employs a pre-trained backbone network to extract the content feature f_bb^(i)(θ_bb) from each image I_i, and the network parameter θ_bb is fine-tuned during training. The other path introduces information embeddings ed_w, ed_h, ed_ar to inject the canvas size and image aspect ratio signals, which are fused via a fully connected layer and the ReLU activation function [11]:

f_{inf}^{(i)} = \mathrm{ReLU}(W_1 [\, w \cdot ed_w,\ h \cdot ed_h,\ ar_i \cdot ed_{ar} \,]^T + b_1)    (6)

Here, ar_i is the aspect ratio of image I_i, and we denote the dimensions of f_inf^(i) and f_bb^(i)(θ_bb) as d_inf and d_bb respectively. The elements of the embedding row vectors ed_w, ed_h and ed_ar are all initialized to one and are fine-tuned during training. W_1 and b_1 are also learnable parameters. d_w, d_h, d_ar, d_bb and d_inf are hyperparameters. Because the signals from these two paths are independent, the leaf node feature f_{n_i} of image I_i is obtained by concatenating the two feature vectors:

f_{n_i} = \mathrm{concat}(f_{bb}^{(i)}(\theta_{bb}),\ f_{inf}^{(i)})    (7)
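A minimal PyTorch sketch of the information-embedding path of Eqs. (6)-(7); the dimensions and the assumption that the backbone feature f_bb is computed separately are illustrative choices, not the released configuration.

```python
import torch
import torch.nn as nn

class InfoEmbedLeafFeature(nn.Module):
    """Sketch of the two-path leaf feature of Eqs. (6)-(7). Only the information-embedding
    path is modeled here; the backbone feature f_bb is assumed to come from elsewhere."""
    def __init__(self, d_w=8, d_h=8, d_ar=8, d_inf=32):
        super().__init__()
        # embedding row vectors ed_w, ed_h, ed_ar: initialized to one and fine-tuned
        self.ed_w = nn.Parameter(torch.ones(d_w))
        self.ed_h = nn.Parameter(torch.ones(d_h))
        self.ed_ar = nn.Parameter(torch.ones(d_ar))
        self.fc = nn.Linear(d_w + d_h + d_ar, d_inf)   # W_1 and b_1
        self.relu = nn.ReLU()

    def forward(self, f_bb, w, h, ar):
        # Eq. (6): scale each embedding by the corresponding scalar signal and fuse
        x = torch.cat([w * self.ed_w, h * self.ed_h, ar * self.ed_ar], dim=-1)
        f_inf = self.relu(self.fc(x))
        # Eq. (7): the leaf node feature concatenates the two independent paths
        return torch.cat([f_bb, f_inf], dim=-1)
```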
Fusion module. This module should obtain the parent node feature via symmetry-invariant transforms of the two given child nodes, i.e. f_fus(f_{n_i}, f_{n_j}) = f_fus(f_{n_j}, f_{n_i}), where f_fus denotes the fusion module. Our idea is to use a self-attentive weighted sum of the two child features to satisfy symmetry invariance. To obtain the weight vectors, we utilize the self-attentive embedding technique [17] to design Eq. (10), which injects an additive operation into the aspect ratio information fusion process. Moreover, we utilize the self-attention mechanism [32] to pre-process the input features to inject a multiplicative signal (Eq. (9)). Benefiting from the two-stage transformation, the fusion module is able to memorize a variety of subtree structure schemes, which boosts the learning ability of the model.

f_{(i,j)} = [\, f_{n_i}, f_{n_j} \,]^T    (8)
f'_{(i,j)} = \mathrm{Attention}(f_{(i,j)} W_Q,\ f_{(i,j)} W_K,\ f_{(i,j)} W_V)    (9)
A = \mathrm{softmax}(W_{s2} \tanh(W_{s1} f'^{\,T}_{(i,j)}))    (10)
f_{n_p} = W_{s3}\, \mathrm{flatten}(A f'_{(i,j)}) + b_2    (11)

Here, W_Q ∈ R^{d×d_Q}, W_K ∈ R^{d×d_K}, W_V ∈ R^{d×d_V}, W_{s1} ∈ R^{d_1×d_V}, W_{s2} ∈ R^{d_2×d_1}, W_{s3} ∈ R^{d×d_2 d_V} and b_2 are all learnable parameters, where d is the dimension of the node feature. d_Q, d_K, d_V, d_1 and d_2 are all hyperparameters. Eq. (9) is the scaled dot-product attention parameterized by d_K [32].
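The PyTorch sketch below implements Eqs. (8)-(11) under assumed dimensions (illustrative, not the authors' code). Swapping the two child features permutes the rows of f_(i,j), the rows of f'_(i,j) and the columns of A consistently, so the fused output is unchanged, which is exactly the required symmetry invariance.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionModule(nn.Module):
    """Sketch of the symmetry-invariant fusion of Eqs. (8)-(11)."""
    def __init__(self, d, d_qk=64, d_v=64, d1=64, d2=4):
        super().__init__()
        self.W_Q = nn.Linear(d, d_qk, bias=False)
        self.W_K = nn.Linear(d, d_qk, bias=False)
        self.W_V = nn.Linear(d, d_v, bias=False)
        self.W_s1 = nn.Linear(d_v, d1, bias=False)
        self.W_s2 = nn.Linear(d1, d2, bias=False)
        self.W_s3 = nn.Linear(d2 * d_v, d)            # the bias here plays the role of b_2

    def forward(self, f_i, f_j):
        f = torch.stack([f_i, f_j], dim=0)                         # Eq. (8): (2, d)
        q, k, v = self.W_Q(f), self.W_K(f), self.W_V(f)
        attn = F.softmax(q @ k.t() / k.shape[-1] ** 0.5, dim=-1)   # scaled dot-product
        f_prime = attn @ v                                         # Eq. (9): (2, d_v)
        # Eq. (10): self-attentive weights over the two child tokens, shape (d2, 2)
        a = F.softmax(self.W_s2(torch.tanh(self.W_s1(f_prime))).t(), dim=-1)
        # Eq. (11): weighted sum, flatten and project back to the node feature size d
        return self.W_s3((a @ f_prime).reshape(-1))
```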
Node classifier. A fully connected layer is utilized to model this component as

p_n(n) = \mathrm{softmax}(W_2 f_n + b_3)    (12)

where W_2 and b_3 are learnable parameters.

Edge classifier. Different from p_n, the binary function p_e has the property that p_e^{(0)}(n_i, n_j) + p_e^{(0)}(n_j, n_i) = p_e^{(0)}(n_i, n_j) + p_e^{(1)}(n_i, n_j) = 1, as shown in Fig. 4. Thus, a siamese network architecture [5] is employed to model this component as

f''_{(i,j)} = W_3\, \mathrm{concat}(f_{n_i}, f_{n_j}) + b_4    (13)
f''_{(j,i)} = W_3\, \mathrm{concat}(f_{n_j}, f_{n_i}) + b_4    (14)
p_e(n_i, n_j) = \mathrm{softmax}([\, f''_{(i,j)}, f''_{(j,i)} \,])    (15)

where W_3 and b_4 are learnable parameters.
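A sketch of the two classifiers (Eqs. (12)-(15)); the layer names and the scalar score dimension are assumptions. Because both concatenation orders pass through the same weights, the complementarity property stated above holds by construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NodeClassifier(nn.Module):
    """Eq. (12): a single fully connected layer over the parent node feature."""
    def __init__(self, d):
        super().__init__()
        self.fc = nn.Linear(d, 2)               # W_2 and b_3
    def forward(self, f_n):
        return F.softmax(self.fc(f_n), dim=-1)  # p_n = [p("H"), p("V")]

class EdgeClassifier(nn.Module):
    """Eqs. (13)-(15): a siamese scorer applied to both concatenation orders."""
    def __init__(self, d):
        super().__init__()
        self.score = nn.Linear(2 * d, 1)        # shared W_3 and b_4
    def forward(self, f_i, f_j):
        s_ij = self.score(torch.cat([f_i, f_j], dim=-1))            # Eq. (13)
        s_ji = self.score(torch.cat([f_j, f_i], dim=-1))            # Eq. (14)
        return F.softmax(torch.cat([s_ij, s_ji], dim=-1), dim=-1)   # Eq. (15)
```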
3.3. Gradient-Based Optimization Paradigm

Through building the probability space of the collage tree, the tree-based collage generation problem can be reformulated as solving for θ* subject to G_{\theta^*} = \arg\max_\theta \mathbb{E}_{\tau \sim L(\tau; \theta, \pi)}[F(g(\tau))], where π denotes our NNP, L(τ; θ, π) = p(τ | w, h, {I_i}; θ, π) = p(τ | τθ), and θ is the parameter of the tree generator.

Loss function. We define \mathbb{E}_{\tau \sim L(\tau; \theta, \pi)}[F(g(\tau))] as F_θ(τ; π) and approximate its gradient as

\nabla_\theta F_\theta(\tau; \pi) \approx \frac{1}{M} \sum_{i=1}^{M} \nabla_\theta \big[ F(g(\tau_i)) \log p(\tau_i \mid \tau_\theta) \big]    (16)

where M is the number of sampled trees τ_i. Therefore, we define the loss function as

L(\theta) = -\frac{1}{M} \sum_{i=1}^{M} F(g(\tau_i)) \log p(\tau_i \mid \tau_\theta)    (17)
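In code, Eq. (17) is a standard REINFORCE-style surrogate loss. A minimal sketch, assuming each log-likelihood is computed with autograd through the PCtree distributions (e.g. as in the earlier log_likelihood sketch) while the criterion score of the rendered collage is treated as a constant reward:

```python
import torch

def collage_loss(log_probs, criterion_scores):
    """Surrogate loss of Eq. (17).
    log_probs: list of M scalar tensors log p(tau_i | tau_theta), differentiable w.r.t. theta.
    criterion_scores: list of M floats F(g(tau_i)) for the rendered collages (no gradient)."""
    rewards = torch.tensor(criterion_scores, dtype=torch.float32)
    log_probs = torch.stack(log_probs)             # shape (M,)
    return -(rewards * log_probs).mean()           # its gradient matches Eq. (16)
```

Calling backward() on this loss and stepping an optimizer corresponds to the parameter update in line 7 of Algo. 2.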
In terms of the mapping function g, we initially utilize an efficient mapping algorithm [8] to generate a collage with canvas blank loss r_b, i.e. the canvas blank space ratio, and we stretch the overall collage to fit the canvas in a post-processing step. Our approach avoids canvas blank space by introducing a little aspect ratio distortion. The reason is that canvas blank loss has a significantly worse impact on the user's visual experience than aspect ratio loss, provided that the magnitudes of both losses are similarly small. Moreover, our mapping function benefits from [8] in preventing image content occlusion.

With respect to the criterion F, we mainly focus on the ratio-preservation criterion because our NNP and mapping function already consider the other three criteria. For this part, we design a reward shaping function R for the canvas blank loss r_b as

R(r_b) = \begin{cases} -R_0, & r_3 < r_b \\ \frac{R_0 (r_b - r_2)}{r_2 - r_3}, & r_2 < r_b \le r_3 \\ \frac{R_0 (\log_{10} r_b - \log_{10} r_2)}{\log_{10} r_1 - \log_{10} r_2}, & r_1 < r_b \le r_2 \\ R_0, & r_b \le r_1 \end{cases}    (18)

where R_0 is the bound of the reward value and r_1, r_2 and r_3 are specific blank loss values. The shape of Eq. (18) is based on the observation that the difficulty of decreasing r_b may be linear in the interval from r_2 to r_3 and may increase exponentially when r_b is below r_2. R_0, r_1, r_2 and r_3 are hyperparameters. The aesthetics property F_aes proposed in [23] is also included in F. Moreover, we design an area penalty F_p to prevent the model from shrinking some images too much:

F_p(C) = -R_0 \times \mathbb{1}\{\exists I \in C,\ \min(h_I, w_I) \le s_p\}    (19)

where I is an image in the collage C and s_p is a hyperparameter. Therefore, the criterion F is defined as

F(C) = \lambda_r R(r_b(C)) + \lambda_a F_{aes}(C) + \lambda_p F_p(C)    (20)

where λ_r, λ_a and λ_p are hyperparameters.
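A direct transcription of the reward shaping of Eq. (18) and the criterion of Eqs. (19)-(20); the numeric defaults below are placeholders for illustration, not the paper's hyperparameter settings.

```python
import math

def reward_shaping(r_b, r1=0.01, r2=0.05, r3=0.2, R0=1.0):
    """Eq. (18): piecewise reward for the canvas blank space ratio r_b (r1 < r2 < r3)."""
    if r_b > r3:
        return -R0
    if r_b > r2:        # linear part on (r2, r3], from 0 down to -R0
        return R0 * (r_b - r2) / (r2 - r3)
    if r_b > r1:        # logarithmic part on (r1, r2], from R0 down to 0
        return R0 * (math.log10(r_b) - math.log10(r2)) / (math.log10(r1) - math.log10(r2))
    return R0           # r_b <= r1

def criterion(r_b, f_aes, min_image_side, s_p=20, R0=1.0, lam_r=1.0, lam_a=1.0, lam_p=1.0):
    """Eqs. (19)-(20): reward-shaped blank loss, aesthetics term, and the area penalty
    that fires when any image side in the collage is shrunk to s_p or fewer pixels."""
    f_p = -R0 if min_image_side <= s_p else 0.0                       # Eq. (19)
    return lam_r * reward_shaping(r_b, R0=R0) + lam_a * f_aes + lam_p * f_p   # Eq. (20)
```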
Optimization. Different from a hand-crafted adjustment scheme, our optimization paradigm exploits ∇θ L(θ) to optimize the collage tree probability distribution p(τ | τθ) in an end-to-end manner. Algo. 2 shows the optimization paradigm of our model, where T_m is the maximum number of iterations and α is the learning rate. At the inference stage, the optimal collage tree τ* is determined with maximum likelihood.

Algorithm 2: Optimization procedure of our model
Input: w, h, {I_i}
1: Initialize θ randomly
2: t ← 0
3: repeat
4:   Construct the probabilistic collage tree τθ via θ and π in accordance with Algo. 1
5:   Sample {τ_i}, i = 1, ..., M, from p(τ | τθ)
6:   Compute L(θ) via Eq. (17)
7:   θ ← θ − α ∇θ L(θ)
8:   t ← t + 1
9: until t ≥ T_m
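For intuition about the second stage, the sketch below shows one simple way an "H"/"V" cut tree can be mapped to canvas rectangles (cf. Fig. 2). It is only a hypothetical area-proportional slicing layout, not the mapping algorithm of [8] adopted by the paper, which additionally handles blank space and content occlusion.

```python
def subtree_area(tree, areas):
    """Total requested area of all leaves under `tree` (a leaf is an integer image index)."""
    if not isinstance(tree, dict):
        return areas[tree]
    left, right = tree["children"]
    return subtree_area(left, areas) + subtree_area(right, areas)

def layout(tree, x, y, w, h, areas):
    """Map a cut tree to axis-aligned rectangles: each interior node {"cut": "H"|"V",
    "children": (left, right)} splits its rectangle at a position proportional to the
    total image area requested by each subtree. Returns {leaf index: (x, y, w, h)}."""
    if not isinstance(tree, dict):                 # leaf: assign the whole rectangle
        return {tree: (x, y, w, h)}
    left, right = tree["children"]
    t = subtree_area(left, areas) / (subtree_area(left, areas) + subtree_area(right, areas))
    if tree["cut"] == "H":                         # horizontal cut: top / bottom strips
        boxes = layout(left, x, y, w, h * t, areas)
        boxes.update(layout(right, x, y + h * t, w, h * (1 - t), areas))
    else:                                          # vertical cut: left / right strips
        boxes = layout(left, x, y, w * t, h, areas)
        boxes.update(layout(right, x + w * t, y, w * (1 - t), h, areas))
    return boxes
```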
Theme           Animals  Food   Fruits  Transportation  Sports  Office  Baby  Clothes  Houseware  Instrument  Makeup
Percentage (%)    3.85   11.73   23.22      12.76         4.94    6.44   4.29   18.34      9.38       1.79      3.26

Table 1. The percentage of image number under each theme of ICSS.
5-scale            Excellent (4)  Good (3)  Borderline (2)  Poor (1)  Bad (0)  Score  Kappa
SHP [6]               17.5%        50.8%        27.5%         4.2%     0.0%    2.816   0.82
CLT [2]               16.7%        51.2%        28.8%         3.3%     0.0%    2.813   0.80
VSM [23]              29.2%        52.9%        15.4%         2.5%     0.0%    3.088   0.76
Ours                  34.2%        51.7%        12.0%         2.1%     0.0%    3.180   0.80

Side-by-side        Wins   Equally Good  Equally Borderline  Equally Poor  Losses    ∆    Kappa
Ours vs. SHP [6]    60.6%      28.1%           11.3%             0.0%       0.0%   60.6%   0.75
Ours vs. CLT [2]    63.1%      26.3%           10.6%             0.0%       0.0%   63.1%   0.71
Ours vs. VSM [23]   26.9%      57.5%            9.4%             0.0%       6.2%   20.7%   0.67

Table 5. 5-scale human evaluation along with side-by-side human evaluation of collage results on the AIC. The score in the 5-scale evaluation is the weighted average. ∆ in the side-by-side evaluation denotes the gap between the win rate and the lose rate.
Figure 7. Comparison of the collage results generated by different methods (SHP [6], CLT [2], VSM [23], ours without F_p, and ours) on the AIC. SHP [6] and CLT [2] both introduce content occlusion (red dotted rectangle) into the images in the collage. Although VSM [23] circumvents this defect, its results still contain images suffering high aspect ratio distortion (red dotted rectangle), particularly when the image collection size is large. In contrast, our method takes advantage of the probability space to produce results closer to the global optimum. Notably, employing a loss function without F_p of Eq. (19) to train our model leads to a drastic imbalance in image area assignment in the collages.
The results, illustrated in Tab. 5, suggest that our method is substantially superior to all baselines in producing high-quality collages from the human perspective. The high Kappa scores imply that a major agreement prevails among the evaluators.

Information conveying test. We further validate the effectiveness of our NNP via the information conveying test according to [22, 23]. Twenty subjects participated in the test and they were equally divided into four groups. Each group corresponds to one collage method. For each image collection, we showed participants the corresponding collage for 20 s and then asked them to perform a binary classification test, namely selecting the images that they had seen in the collage, from an image set including five groundtruth images and five negative samples (sharing the identical theme with the groundtruths). Tab. 6 shows the test results. Our collage benefits from the NNP and thus outperforms the other baselines. We find that the images selected by participants account for approximately 72%, which implies that participants are inclined to choose more images as remembered, leading to a higher recall than precision.

5. Conclusion

In this paper, we present SoftCollage, a novel tree-based collage method. Our key idea is to soften the discrete tree structure into the probability space. By modeling the conditional probability distribution of the collage tree via the proposed tree generator, we can formulate collage generation as a differentiable process and optimize the layout with the gradient of the criterion loss instead of a hand-crafted adjustment scheme. We demonstrate the effectiveness of our method via extensive experiments on the proposed large-scale dataset AIC. Currently, the GPU memory consumption of our model is high when the size of the image collection is large. Because of the extensibility of our method in model architecture design, in the future we will explore lightweight designs and knowledge distillation of our model.
References

[1] Similarity preserving snippet-based visualization of web search results. IEEE Transactions on Visualization and Computer Graphics, 20(3):457–470, 2014.
[2] CollageIt. Online, 2019. https://www.collageitfree.com/.
[3] C. Brian Atkins. Blocked recursive image composition. In Proceedings of the 16th ACM International Conference on Multimedia, pages 821–824, 2008.
[4] Simone Bianco and Gianluigi Ciocca. User preferences modeling and learning for pleasing photo collage generation. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 12(1):1–23, 2015.
[5] Jane Bromley, James W. Bentz, Léon Bottou, Isabelle Guyon, Yann LeCun, Cliff Moore, Eduard Säckinger, and Roopak Shah. Signature verification using a "siamese" time delay neural network. International Journal of Pattern Recognition and Artificial Intelligence, 7(04):669–688, 1993.
[6] V. Cheung. Shape Collage. Online, 2013. http://www.shapecollage.com/.
[7] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[8] Jian Fan. Photo layout with a fast evaluation method and genetic algorithm. In 2012 IEEE International Conference on Multimedia and Expo Workshops, pages 308–313. IEEE, 2012.
[9] Yuan Gan, Yan Zhang, Zhengxing Sun, and Hao Zhang. Qualitative photo collage by quartet analysis and active learning. Computers & Graphics, 88:35–44, 2020.
[10] J. Geigel, A. Loui, and E. Loui. Automatic page layout using genetic algorithms for electronic albuming. Proceedings of SPIE – The International Society for Optical Engineering, pages 79–90, 2001.
[11] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. Journal of Machine Learning Research, 15:315–323, 2011.
[12] Stas Goferman, Ayellet Tal, and Lihi Zelnik-Manor. Puzzle-like collage. In Computer Graphics Forum, volume 29, pages 459–468. Wiley Online Library, 2010.
[13] E. Gomez-Nieto, W. Casaca, D. Motta, I. Hartmann, G. Taubin, and L. G. Nonato. Dealing with multiple requirements in geometric arrangements. IEEE Transactions on Visualization & Computer Graphics, 22(3):1223–1235, 2016.
[14] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[15] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
[16] Yuan Liang, Xiting Wang, Song-Hai Zhang, Shi-Min Hu, and Shixia Liu. PhotoRecomposer: Interactive photo recomposition by cropping. IEEE Transactions on Visualization and Computer Graphics, 24(10):2728–2742, 2017.
[17] Zhouhan Lin, Minwei Feng, Cícero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. A structured self-attentive sentence embedding. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017.
[18] Lingjie Liu, Hongjie Zhang, Guangmei Jing, Yanwen Guo, Zhonggui Chen, and Wenping Wang. Correlation-preserving photo collage. IEEE Transactions on Visualization and Computer Graphics, 24(6):1956–1968, 2017.
[19] Tie Liu, Jingdong Wang, Jian Sun, Nanning Zheng, Xiaoou Tang, and Heung-Yeung Shum. Picture collage. IEEE Transactions on Multimedia, 11(7):1225–1239, 2009.
[20] G. P. Nguyen and M. Worring. Interactive access to large image collections using similarity-based visualization. Journal of Visual Languages & Computing, 19(2):203–224, 2008.
[21] E. G. Nieto, W. Casaca, L. G. Nonato, and G. Taubin. Mixed integer optimization for layout arrangement. In Graphics, Patterns & Images, 2013.
[22] Aude Oliva and Antonio Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145–175, 2001.
[23] Xingjia Pan, Fan Tang, Weiming Dong, Chongyang Ma, Yiping Meng, Feiyue Huang, Tong-Yee Lee, and Changsheng Xu. Content-based visual summarization for image collections. IEEE Transactions on Visualization and Computer Graphics, 2019.
[24] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32:8026–8037, 2019.
[25] Carsten Rother, Lucas Bordeaux, Youssef Hamadi, and Andrew Blake. AutoCollage. ACM Transactions on Graphics (TOG), 25(3):847–852, 2006.
[26] Carsten Rother, Sanjiv Kumar, Vladimir Kolmogorov, and Andrew Blake. Digital tapestry [automatic image synthesis]. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 1, pages 589–596. IEEE, 2005.
[27] M. Shuang and W. C. Chang. Automatic creation of magazine-page-like social media visual summary for mobile browsing. In 2016 IEEE International Conference on Image Processing (ICIP), 2016.
[28] Yu Song, Fan Tang, Weiming Dong, Feiyue Huang, Tong-Yee Lee, and Changsheng Xu. Balance-aware grid collage for small image collections. IEEE Transactions on Visualization and Computer Graphics, 2021.
[29] Hendrik Strobelt, Marc Spicker, Andreas Stoffel, Daniel Keim, and Oliver Deussen. Rolled-out wordles: A heuristic method for overlap removal of 2D data representatives. In Computer Graphics Forum, volume 31, pages 1135–1144. Wiley Online Library, 2012.
[30] Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, pages 1057–1063, 2000.
[31] Li Tan, Yangqiu Song, Shixia Liu, and Lexing Xie. ImageHive: Interactive content-aware image summarization. IEEE Computer Graphics and Applications, 32(1):46–55, 2011.
[32] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.
[33] Jingdong Wang, Long Quan, Jian Sun, Xiaoou Tang, and Heung-Yeung Shum. Picture collage. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 1, pages 347–354. IEEE, 2006.
[34] Lijun Wang, Huchuan Lu, Yifan Wang, Mengyang Feng, Dong Wang, Baocai Yin, and Xiang Ruan. Learning to detect salient objects with image-level supervision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 136–145, 2017.
[35] Wenguan Wang, Qiuxia Lai, Huazhu Fu, Jianbing Shen, Haibin Ling, and Ruigang Yang. Salient object detection in the deep learning era: An in-depth survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
[36] Yichen Wei, Yasuyuki Matsushita, and Yingzhen Yang. Efficient optimization of photo collage. Microsoft Research, Redmond, WA, USA, MSR-TR-2009-59, 2009.
[37] Zhipeng Wu and Kiyoharu Aizawa. PicWall: Photo collage on-the-fly. In 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pages 1–10. IEEE, 2013.
[38] Zhipeng Wu and Kiyoharu Aizawa. Very fast generation of content-preserved photo collage under canvas size constraint. Multimedia Tools and Applications, 75(4):1813–1841, 2016.
[39] Xintong Han, Chongyang Zhang, Weiyao Lin, Mingliang Xu, and Bin Sheng. Tree-based visualization and optimization for image collection. IEEE Transactions on Cybernetics, 46(6):1286–1300, 2016.
[40] Yingzhen Yang, Yichen Wei, Chunxiao Liu, Qunsheng Peng, and Yasuyuki Matsushita. An improved belief propagation method for dynamic collage. The Visual Computer, 25(5):431–439, 2009.
[41] Zongqiao Yu, Lin Lu, Yanwen Guo, Rongfei Fan, Mingming Liu, and Wenping Wang. Content-aware photo collage using circle packing. IEEE Transactions on Visualization and Computer Graphics, 20(2):182–195, 2013.