Arcimboldo-Like Collage Using Internet Images: Hua Huang Lei Zhang Hong-Chao Zhang Xi'an Jiaotong University, China
Hua Huang
e-mail: [email protected]
1 Introduction
"Our body is overcoat of the soul inside" (Bhagwad Gita).
Psychologists and artists have long discussed the relation between an
essence and its physical representation. An essence can remain
recognizable even after marked changes to its appearance, while
conveying more information in a compelling way (see Figure 1). This
principle is realized artistically in the paintings of Giuseppe
Arcimboldo, an Italian Renaissance painter honored as a vanguard of
Surrealism. In his paintings, portrait heads are represented by a
variety of symbolic elements such as fruits and books (see Figure 2).
Although such hodgepodge images look rather eccentric, they display
their content in an extravagant yet lucid style [Maiorino 1983], which
remains fashionable today. This paper focuses on creating collages in
this Arcimboldo-like style.
Since their origin in China around 200 BC, collages have gained growing
popularity as an artistic and collective form of photo assemblage
[Wikipedia 2011]. However, creating a collage manually is labor
intensive and time consuming: it requires delicate cutouts as materials
and mutual adjustment during their assembly. Many approaches therefore
automate collage construction [Rother et al. 2006; Wang et al. 2006;
Goferman et al. 2010]. These approaches often produce commendable
collages, but confine themselves to a regular canvas, which precludes
the Arcimboldo-like effect shown in Figure 2. In [Gal et al. 2007], 3D
models instead of image cutouts are assembled to approximate a 3D shape.
Although it exhibits the Arcimboldo-like style, such a collage needs a
3D model as the template, which prevents the stylization of 2D images
that have no 3D representation.
The visual mechanism by which we recognize an Arcimboldo-like collage is
the so-called apophenia, the experience of meaningfulness arising from
specific connections within a coherent representation [Maiorino 1983].
Hence, there are two challenges in creating effective Arcimboldo-like
collages: selecting competent image cutouts, and assembling them
consistently so that the result resembles the input. The Internet
provides large, easily accessible image databases such as Flickr or
Picasa Web Albums, where a wide variety of image cutouts can be found.
In this paper, we use appropriate Internet image cutouts as the elements
from which to produce Arcimboldo-like collages.
Figure 2: Expressive Arcimboldo-like collages: Summer (1563,
left) and Librarian (1566, middle) painted by Giuseppe Arcim-
boldo; The Pirate (right) from a textbook.
The main contribution of our paper is an Internet-image-based algorithm
that creates Arcimboldo-like collages from input images. As the first
attempt to create such 2D collages with Internet images, our algorithm
effectively produces a plausible Arcimboldo-like expression of the input
images.
2 Related Work
Much research has gone into constructing collages that turn a photo
album into an informative assembled photo. The cutout region in each
photo can be interactively specified [Agarwala et al. 2004] or
automatically detected [Rother et al. 2006; Wang et al. 2006], and the
regions are subsequently stitched together with seamless transitions.
These approaches employ rectangular cutouts as primitives, while in
[Goferman et al. 2010] cutouts of arbitrary shapes are used to create
collages with a puzzle-like style. However, all of the resulting
collages above are assembled in a rectangle to form a new photo.
ShapeCollage [Cheung 2011] can arrange photos into an arbitrary shape,
but requires regular cutouts as primitives. Our approach assembles
cutouts of arbitrary shapes into a target image of arbitrary shape.
Image mosaic is another stylization similar to collage, which arranges a
set of icon-sized tile images in a container image. The tiles are
regular squares [Hausner 2001] or arbitrarily shaped images [Kim and
Pellacini 2002], and can be seen as tiny elements. However, such
stylized images mainly rely on orienting the tiles along feature edges
to represent the target shape, ignoring the shape consistency between
individual tiles and the regions they occupy. Hence, an image mosaic
usually needs a large number of tile images to achieve the desired
representation. Our approach pursues the Arcimboldo-like effect by using
consistent cutouts to represent the meaningful components of the target
image.
Gal et al. [2007] present a method that produces collages in a style
like ours, but using 3D models from a database. Their method needs a 3D
shape as the template, and is therefore unsuitable for the many 2D
images that have no 3D representation. Besides, 3D databases hold far
less data than Internet image databases, which reduces the potential
range of artistic expression. Also using 3D models, Mitra et al. [2009]
design a system that assembles texture splats instead of 3D geometry to
produce the so-called emerging images. Emerging images are perceived as
a whole, like an Arcimboldo-like collage, but do not locally present any
meaningful components. Chu et al. [2010] immerse and hide foreground
images in a background to produce camouflage images. Although their
visual style is similar to collage, camouflage images are less concerned
with shape and color consistency between the inset foreground and the
background.
Internet-image-based approaches have been gaining remarkable ground as
the method of choice for a series of classical image processing
problems, such as image-based 3D touring [Snavely et al. 2006] and image
completion [Hays and Efros 2007]. Chen et al. [2009] make realistic
photos by blending cutouts from appropriate Internet images. They
propose a set of filters to retrieve the desired online images, and the
cutouts within them, as scene items for photo synthesis. In this paper,
we present a novel use of Internet images: creating artforms in the
style of Arcimboldo-like collages.
3 Competent Cutouts Selection
The essence of an Arcimboldo-like collage is a metaphor: a composite
representation of the input image. The cutouts should therefore be
diversified yet consistent in form, so that both the cutouts and the
collage remain recognizable after assembly.
We use Internet images to ensure the diversity of cutouts. Since the
space of Internet images is effectively infinite, searching all online
images for candidate cutouts is impossible. Additionally, an assembly of
cutouts belonging to closely related themes is more recognizable than
one drawn from contrasting themes [Maiorino 1983], so we select cutouts
with related themes. We use text-based image search and let the user
enter descriptive keywords to collect the relevant Internet images
belonging to the theme. Typically, we suggest collective or material
nouns such as fruit or vegetable, or combinations of such words, as
keywords. Then, the saliency and content-consistency image filters
[Chen et al. 2009] are applied to classify the retrieved images, and
cutouts are obtained by GrabCut segmentation [Rother et al. 2004]; the
resulting cutout database is denoted by $\mathcal{C} = \{C_i\}$ (see
Figure 3 (a)). As stated in their report, these image filters prefer
images with a salient foreground and a simple background, and may
generate inaccurate cutouts for complex images, which then need offline
manual effort to improve the segmentation quality of the cutout
database. However, owing to the large number of online images,
sufficient data can always be found to feed the image filters and yield
usable cutouts for collage construction.
Figure 3: (a) Cutouts from the relevant Internet images. (b) Each cutout
is encoded with a color-shape descriptor.
3.1 Cutouts Encoding
To establish the database, we tag the cutouts with descriptors that are
used to select the competent ones in the following sections. For each
cutout $C_i \in \mathcal{C}$, we assign a descriptor

$$\Phi(C_i) = (H_i, G_i) \tag{1}$$

where $H_i = [h_{i1}, \ldots, h_{iN}]_{C_i}$ denotes the $N$-bin
histogram in YUV color space, and $G_i = [g_{i1}, g_{i2}, \ldots]_{C_i}$
are affine moment invariants (AMIs) [Flusser and Suk 1994]. This
descriptor encodes the cutout with both its color distribution and a
shape feature that is invariant under affine transformation, which
approximates the imaging mechanism in photography (see Section 3.2). For
two cutouts $C_i$ and $C_j$, we define their color and shape disparity
as:
$$d_c(C_i, C_j) = \sum_{k=1}^{N} \frac{(h_{ik} - h_{jk})^2}{h_{ik} + h_{jk}} \tag{2}$$

$$d_s(C_i, C_j) = \sum_{k} |g_{ik} - g_{jk}| \tag{3}$$
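As an illustration, the two disparities of Equations (2) and (3) can be
sketched in a few lines of Python. This is a minimal sketch rather than
our implementation: the function names are hypothetical, and skipping
bins that are empty in both histograms is an assumption made to avoid
division by zero.

```python
def color_disparity(h_i, h_j):
    """Chi-square-style histogram distance of Equation (2)."""
    if len(h_i) != len(h_j):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(h_i, h_j):
        denom = a + b
        if denom > 0:  # assumption: a bin empty in both histograms adds 0
            total += (a - b) ** 2 / denom
    return total


def shape_disparity(g_i, g_j):
    """Sum of absolute AMI differences of Equation (3)."""
    return sum(abs(a - b) for a, b in zip(g_i, g_j))
```

Both functions return 0 for identical descriptors and grow as the
descriptors diverge.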
For efficient computation, we set $N = 12^3$ and only use the AMI of the
lowest order, i.e.,
$G_i = g_{i1} = (\mu_{20}\mu_{02} - \mu_{11}^2)/\mu_{00}^4$, where
$\mu_{pq}$ is the shape moment defined as

$$\mu_{pq} = \iint_{O} (x - x_t)^p (y - y_t)^q \, dx \, dy \tag{4}$$

with $(x_t, y_t)$ as the center of the binary image $O$ of the cutout
(see Figure 3 (b)). Then, the K-means method is used to classify
$\mathcal{C}$ into clusters by the color and shape descriptors
respectively. The K-means method is initialized with 40 clusters in our
experiments. The set of color cluster centers is denoted by
$\{\mathcal{K}^c_k\}$, and the set of shape cluster centers by
$\{\mathcal{K}^s_k\}$. Next, we proceed to the selection of the
competent cutouts.
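The lowest-order AMI built from the moments of Equation (4) can be
illustrated on a discrete binary mask. The sketch below is a hedged
example, not our implementation: it assumes the mask is given as a set
of (x, y) pixel coordinates, replaces the integral by a sum over pixels,
and uses hypothetical function names.

```python
def central_moment(pixels, p, q):
    """Discrete version of Equation (4): sum over the mask pixels."""
    n = len(pixels)
    xt = sum(x for x, _ in pixels) / n  # centroid x_t
    yt = sum(y for _, y in pixels) / n  # centroid y_t
    return sum((x - xt) ** p * (y - yt) ** q for x, y in pixels)


def first_ami(pixels):
    """Lowest-order affine moment invariant (mu20*mu02 - mu11^2) / mu00^4."""
    mu00 = central_moment(pixels, 0, 0)  # equals the pixel count
    mu20 = central_moment(pixels, 2, 0)
    mu02 = central_moment(pixels, 0, 2)
    mu11 = central_moment(pixels, 1, 1)
    return (mu20 * mu02 - mu11 ** 2) / mu00 ** 4


# A 4x4 square and its sheared copy share the same invariant.
square = {(x, y) for x in range(4) for y in range(4)}
sheared = {(x + y, y) for (x, y) in square}
print(first_ami(square), first_ami(sheared))
```

The shear used here has unit determinant and maps pixels one-to-one, so
the two values agree up to floating-point rounding; for resampled masks
the invariance holds only approximately.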
3.2 Component-aware Cutouts Matching
The distinctive feature of our Arcimboldo-like collage is that it
disguises an image of arbitrary shape with multiple cutouts of arbitrary
shapes, matched in a visually consistent manner. This consistency has
two aspects: the collage of cutouts should resemble the input image,
while each individual cutout should remain recognizable in the collage.
This requires the cutouts to match structural components of the input
image, thus forming a plausible Arcimboldo-like representation.
Assuming the input image is segmented into a set of components
$\mathcal{S} = \{S_i\}$ (see Section 3.3), matching assigns each
component a label $L$ indicating its cutout. The matching energy for a
competent cutout $L(S_i) \in \mathcal{C}$ comprises three terms:

$$E(S_i; L, T) = E_{cms}(L, T) + w_{col} E_{col}(L) + w_{dev} E_{dev}(T) \tag{5}$$

where $T$ is the induced transformation for consistent matching. The
first term $E_{cms}$ tends to select the cutout that most resembles the
component in shape, possibly after an appropriate transformation. Next,
the $E_{col}$ term measures the color similarity between the component
and its cutout. Finally, $E_{dev}$ accounts for the identifiability of
individual cutouts in the collage by penalizing severe shape change
under the transformation. Each of these energy terms is discussed in
detail below, together with the parameter settings.
Shape Matching. The selected cutout is required to match the shape of
the corresponding component well. However, a photographed image with
exactly the same shape as the component is rare, even on the Internet.
Thus, shape deformation is inevitable, and $E_{cms}$ should favor shape
consistency under such deformation. Formally, this term is defined as

$$E_{cms}(L, T) = E_{cms}(L, A_i) = \mathrm{area}(S_i \,\triangle\, A_i L(S_i)) / \mathrm{area}(S_i) \tag{6}$$

where $\mathrm{area}(\cdot)$ is the region area, $\triangle$ is the
symmetric difference, i.e., $X \triangle Y = (X \cup Y) \setminus (X \cap Y)$,
and $A_i$ is the best affine transformation matrix for shape matching as
described below.
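To make Equation (6) concrete, the following sketch evaluates the
normalized symmetric difference on pixel sets. It is an illustrative
simplification under stated assumptions: regions are sets of integer
(x, y) pixels, the affine map is given as a hypothetical 2x2 matrix `A`
plus translation `t`, and the cutout is warped by forward-mapping its
pixels, which can leave holes when the map enlarges the cutout.

```python
def apply_affine(pixels, A, t):
    """Forward-map pixel coordinates through x -> A x + t, rounded to the grid."""
    (a, b), (c, d) = A
    return {(round(a * x + b * y + t[0]), round(c * x + d * y + t[1]))
            for x, y in pixels}


def shape_energy(component, cutout, A, t):
    """Area of the symmetric difference divided by the component area."""
    warped = apply_affine(cutout, A, t)
    return len(component ^ warped) / len(component)


component = {(x, y) for x in range(2, 6) for y in range(3, 7)}
cutout = {(x, y) for x in range(4) for y in range(4)}
identity = ((1, 0), (0, 1))
print(shape_energy(component, cutout, identity, (2, 3)))  # perfect overlap
print(shape_energy(component, cutout, identity, (3, 3)))  # shifted by one column
```

An energy of 0 means the warped cutout covers the component exactly;
larger values indicate more area that is uncovered or spilled over.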
There are many published approaches to shape matching under a
prescribed transformation. Here, we set $T$ to an affine transformation
to follow the projective photography model of a pin-hole camera. The
implied projection can be well approximated by an affine transformation,
so that the deformed cutout changes projectively with little distortion
(see Figure 4) and remains recognizable in the collage. Hence,
optimizing Equation (6) amounts to finding the cutout that best matches
the component under affine transformation. We use the affine
registration approach of [Ho et al. 2009] to compute the matching
transformation $A_i$ between the outlines of $L(S_i)$ and $S_i$. This
approach does not need explicit pairwise correspondence and is
computationally fast.
Color Matching. This term enables color imitation when the cutout
$L(S_i)$ is used to represent the component $S_i$, and is defined as

$$E_{col}(L) = d_c(S_i, L(S_i)) \tag{7}$$

where $d_c$ is the color difference defined by the histogram distance in
Equation (2).
Matching Deviation. To keep the cutouts recognizable, we should prevent
them from deviating severely when they are matched to components (see
Figure 4). Since the deviation comes from the affine transformation in
the shape matching, this term is defined as

$$E_{dev}(T = A_i) = |\sigma_{i1}/\sigma_{i2} - 1| \tag{8}$$

where $\sigma_{i1}$ and $\sigma_{i2}$ are the two singular values of
$A_i$. The ratio $\sigma_{i1}/\sigma_{i2}$ measures how far $A_i$ departs
from a conformal mapping, which only admits locally isotropic
transformation.
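For a 2x2 linear part, the deviation term of Equation (8) has a closed
form. The sketch below is a minimal illustration with hypothetical
function names; it assumes the singular values are ordered so that the
first is the larger one, and it returns infinity for a singular matrix.

```python
import math


def singular_values_2x2(A):
    """Closed-form singular values (largest first) of a 2x2 matrix."""
    (a, b), (c, d) = A
    s = a * a + b * b + c * c + d * d        # squared Frobenius norm
    det = a * d - b * c
    disc = math.sqrt(max(0.0, s * s - 4.0 * det * det))
    return math.sqrt((s + disc) / 2.0), math.sqrt(max(0.0, (s - disc) / 2.0))


def deviation_energy(A):
    """|sigma_1 / sigma_2 - 1|: zero for rotations and uniform scalings."""
    s1, s2 = singular_values_2x2(A)
    if s2 == 0.0:
        return math.inf  # assumption: a degenerate map is maximally penalized
    return abs(s1 / s2 - 1.0)


print(deviation_energy(((3.0, 0.0), (0.0, 3.0))))  # uniform scaling
print(deviation_energy(((2.0, 0.0), (0.0, 1.0))))  # anisotropic stretch
```

Rotations and uniform scalings leave circles circular and cost nothing,
whereas stretches and shears are penalized in proportion to how much
they flatten a circle into an ellipse.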
Parameters. The parameters $w_{col}$ and $w_{dev}$ tune the fidelity of
color and shape in the Arcimboldo-like representation. Sometimes the
cutout database is in biased supply and cannot satisfy both the color
and the shape requirements of the components, e.g., there are few blue
instruments, or few fruits with a rectangular shape. Hence, we set the
weighting parameters adaptively instead of using constant values:

$$w_{col}(S_i) = \exp(-\alpha \, d_c(S_i, \mathcal{C})) \tag{9}$$

$$w_{dev}(S_i) = \exp(-\beta \, d_s(S_i, \mathcal{C})) \tag{10}$$

where $\alpha$ and $\beta$ are constant coefficients for scale
adjustment, set as $\alpha = \beta = 10$ in our experiments. The
disparities $d_c(S_i, \mathcal{C}) = \min_k \{d_c(S_i, \mathcal{K}^c_k)\}$
and $d_s(S_i, \mathcal{C}) = \min_k \{d_s(S_i, \mathcal{K}^s_k)\}$
measure the color and shape similarity of the component to the database.

Figure 4: For each component, a cutout is selected by measuring the
consistency of color (histogram), shape under affine transformation, and
the matching deviation (shown by the circle distortion).

Figure 5: (a) Segmentation by the mean-shift clustering method. (b)
Distribution of cutouts (diamonds) and segmented patches (dots) embedded
in the 2D space based on the metric $d_s$. (c) Unqualified patches. (d)
Patch merging and splitting. (e) Segmentation result by merging and
splitting all the unqualified patches in (c) after the first iteration.
(f) Final segmented components.
The energy of Equation (5) can be minimized by sequentially evaluating
the energy of each cutout in $\mathcal{C}$ and keeping the one with the
minimum value as the competent cutout for the corresponding component.
Since the optimization is independent across components, it can be
efficiently implemented in parallel.
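The exhaustive minimization described above can be sketched as follows.
This is a hedged illustration rather than our implementation: the three
energy terms are assumed to be precomputed per candidate, the function
and parameter names are hypothetical, and the negative exponent in the
adaptive weights is an assumption chosen so that a component far from
the database in color or shape relaxes the corresponding constraint.

```python
import math


def select_cutout(candidates, d_color_db, d_shape_db, alpha=10.0, beta=10.0):
    """Pick the candidate minimizing E_cms + w_col * E_col + w_dev * E_dev.

    candidates: iterable of (label, e_cms, e_col, e_dev) tuples.
    d_color_db, d_shape_db: distances of the component to the nearest
    color and shape cluster centers of the cutout database.
    """
    w_col = math.exp(-alpha * d_color_db)  # assumed sign, see lead-in
    w_dev = math.exp(-beta * d_shape_db)

    def energy(c):
        return c[1] + w_col * c[2] + w_dev * c[3]

    best = min(candidates, key=energy)
    return best[0], energy(best)


cands = [("apple", 0.20, 0.5, 0.1), ("pear", 0.25, 0.0, 0.0)]
print(select_cutout(cands, 0.0, 0.0)[0])  # database covers the component well
print(select_cutout(cands, 1.0, 1.0)[0])  # database is biased: shape dominates
```

Because each component is handled by an independent call, the loop over
components can be distributed across workers without coordination.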
3.3 Cutouts Guided Component Segmentation
The selection of competent cutouts by component-aware matching clearly
depends on the segmentation $\{S_i\}$. However, unsupervised
segmentation of semantic components remains a great challenge in image
processing, and we do not solve this general problem in our system.
Since components are represented by cutouts, it is preferable to obtain
segments whose shapes are close to those of the cutouts. We therefore
use the database $\mathcal{C}$ as a reference to iteratively refine the
automatic segmentation result into suitable components.
Let $\mathcal{P} = \{P_i\}$ be the patches automatically segmented by
the mean-shift clustering approach [Comaniciu and Meer 2002] (see
Figure 5 (a)), with a uniform kernel of radius 10 in our experiments,
and let $\mathcal{U} = \{P_i : d_s(P_i, \mathcal{C}) > \epsilon\}$ be
the unqualified patches, whose shapes are far from the database (see
Figure 5 (b-c)). Then, for each patch $P_i \in \mathcal{U}$, we apply a
merging or splitting step to improve its segmentation as follows.
Merging. Given the patch $P_i$, we find the patch $Q_i$ such that
$Q_i = \arg\min_{Q_j} \{ d_s(P_i \cup Q_j, \mathcal{C}) < d_s(P_i, \mathcal{C}) : Q_j \in \mathcal{N}(P_i) \}$,
where $\mathcal{N}(\cdot)$ is the neighborhood, and then merge $Q_i$
into $P_i$ to increase the shape similarity of $P_i$ (see top of
Figure 5 (d)). The merging step reduces the number of patches by
combining neighboring patches for better shape matching. If no such
$Q_i$ exists, we turn to the splitting step.
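The merging rule above reduces to a constrained argmin over the
neighbors of a patch. The sketch below is a minimal illustration with
hypothetical names: patches are sets of pixels, and `dist_to_db` stands
for any callable returning the shape disparity of a region to the
cutout database.

```python
def merge_step(patch, neighbors, dist_to_db):
    """Return patch merged with its best neighbor, or None if no merge helps.

    A neighbor qualifies only if the union is strictly closer to the
    database than the patch alone; among those, the closest union wins.
    """
    best_union, best_d = None, dist_to_db(patch)
    for q in neighbors:
        union = patch | q
        d = dist_to_db(union)
        if d < best_d:
            best_union, best_d = union, d
    return best_union


# Toy metric (an assumption for the demo): the database "prefers" 8-pixel regions.
toy = lambda region: abs(len(region) - 8) / 8
patch = frozenset({(0, 0), (1, 0), (0, 1), (1, 1)})
small = frozenset({(2, 0), (2, 1)})
good = frozenset({(0, 2), (1, 2), (0, 3), (1, 3)})
print(merge_step(patch, [small, good], toy) == patch | good)
```

A return value of None corresponds to the case in which no qualifying
neighbor exists and the splitting step is tried instead.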
Splitting. Let $(u_j, v_j) \in \partial P_i$ denote a pair of points on
the outline of the patch $P_i$, and let
$\mathcal{T} = \{(u_j, v_j) : \|u_j v_j\| / \|\widehat{u_j v_j}\| < \delta\}$
be the constraint set of point pairs, where $\|u_j v_j\|$ is the inner
distance measured within the interior of $P_i$, and
$\|\widehat{u_j v_j}\|$ [...]
$\big(\sum_{k=1}^{K} d_s(E_{ik}, \mathcal{C})\big)/K$, and split $P_i$
into $K$ new patches, which have shapes closer to the database on
average. Although the splitting step increases the number of patches, it
provides a segmentation that conforms better to the cutout database,
which leads to consistent cutout matching.
Merging and splitting are performed on all the patches in
$\mathcal{U}$, after which the segmentation $\mathcal{P}$ and the set
$\mathcal{U}$ are updated. This procedure is iterated until
$\mathcal{U} = \emptyset$. In our experiments, we set the shape
threshold $\epsilon = 0.1 \max_{i,j} \{d_s(\mathcal{K}^s_i, \mathcal{K}^s_j)\}$
and the splitting ratio bound $\delta = 0.3$. Finally, we obtain the
components $\{S_i\}$, which are more suitable for cutout matching (see
Figure 5 (f)). However, this guided segmentation relies on the database
$\mathcal{C}$ and cannot always generate accurate semantic components.
We therefore also provide interactive merging and splitting tools to
further improve the segmentation (see Section 5).
4 Collage Assembly
Since the goal is to represent the components with cutouts in a shape-
and color-consistent way, we use the affine transformation $A_i$
associated with $L(S_i)$ in Equation (6) to assemble the cutouts. Thus,
each component $S_i$ is replaced with the transformed cutout
$\tilde{S}_i = A_i L(S_i)$ (Figure 6). However, there is no evident
order in which to stack the transformed cutouts in the assembly, so we
must infer their layer ordering from the input image.
Figure 6: Selected cutouts are assembled according to the affine
transformations $A_i$, $A_j$ in unorganized layer ordering (gray image).
4.1 Layer Ordering of Cutouts
Recovering depth information from a single image is almost impossible
owing to the lack of sufficient spatial cues. Here, the image is
represented by its segmented components, so the problem can be cast as
estimating a reasonable layer ordering of the components. However, even
reasoning about the layer ordering of components is severely ill-posed.
Inspired by work on occlusion recovery from a single image [Hoiem et al.
2007], we employ perceptually motivated cues to infer the layer
ordering. In the following, we use inequalities to describe the order of
components, e.g., $S_i < S_j$ implies that $S_i$ lies in front of $S_j$.
Each cue casts inequality votes on the ordering of components, as
discussed below.
Region coverage. If the boundary of component $S_a$ is completely
surrounded by component $S_b$, we have the ordering $S_a < S_b$ (see
Figure 7 (a)). Hence, complete components are more likely to be
displayed in front. This is a rather intuitive cue arising from the
common physical interpretation of the scene.
T-junction area. A T-junction occurs when one boundary ends on another
one, so that more than two boundaries intersect (see Figure 7 (b)).
Assuming components $\{S_k \mid k = 1, \ldots, m\}$ touch at T-junction
$t$, and $B_r$ is a disc centered at $t$ with radius $r$, the ordering
is determined by the local areas near $t$, i.e., $S_l < S_k$ if
$\mathrm{area}(S_k \cap B_r) < \mathrm{area}(S_l \cap B_r)$. T-junctions
have long been used as evidence for occlusion detection [Hoiem et al.
2007]. To counter the ambiguity caused by noisy segmentation, we compute
the local areas over a range of disc radii, e.g.,
$r \in \{\tfrac{1}{n} D(S_k), \ldots, \tfrac{z}{n} D(S_k) \mid z < n\}$,
where $D(S_k)$ is the diameter of the union of the components
intersecting at $t$. For each disc, we then obtain a layer ordering of
the components.
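The T-junction cue can be illustrated on a small label map. The sketch
below is a simplified example under stated assumptions: the segmentation
is a 2D list of component labels, area is counted in whole pixels, the
radii are supplied by the caller, equal areas cast no vote, and each
disc contributes one vote divided by the number of discs. All names are
hypothetical.

```python
def tjunction_votes(label_map, t, radii):
    """Return {(front, back): votes} from local areas in discs around t."""
    tx, ty = t
    votes = {}
    for r in radii:
        areas = {}
        for y, row in enumerate(label_map):
            for x, label in enumerate(row):
                if (x - tx) ** 2 + (y - ty) ** 2 <= r * r:
                    areas[label] = areas.get(label, 0) + 1
        labels = sorted(areas)
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                if areas[a] == areas[b]:
                    continue  # assumption: ties cast no vote
                front, back = (a, b) if areas[a] > areas[b] else (b, a)
                votes[(front, back)] = votes.get((front, back), 0.0) + 1.0 / len(radii)
    return votes


# "A" fills the top half; "B" and "C" meet below it, forming a T-junction.
label_map = [list("AAAAAA")] * 3 + [list("BBBCCC")] * 3
print(tjunction_votes(label_map, t=(2.5, 2.5), radii=[1, 2]))
```

In this toy layout the component spanning the bar of the T occupies the
larger local area in every disc, so it collects the votes for lying in
front of the other two.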
Because the cues are heuristic, they may produce conflicting
inequalities (Figure 7 (b)). To make a fair inference, we keep a voting
table VT that records the inequalities proposed by all the cues
(Figure 7 (c)). VT is a square matrix whose dimension is the number of
components, and each entry $n_{ij}$ is the number of votes for
$S_i < S_j$. For the region coverage cue, each inequality counts as one
vote, while for the T-junction area cue, each inequality contributes
$1/z$, where $z$ is the number of discs. All the votes are then summed
in VT. After a majority election over all the cues, we obtain the
inequality set V, which can be represented by a directed graph with edge
weights $n_{ij}$. We then apply topological sorting to V [Kahn 1962] and
remove loops in the graph by deleting the edges with minimum weight,
which results in a consistent layer ordering of all the components.
Accordingly, we reshuffle the cutouts in the assembly (Figure 7 (d)).
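The voting and sorting procedure can be sketched as follows. This is one
possible reading of the steps above, not our implementation: votes are
stored in a dictionary keyed by (front, back) pairs, a pair keeps an
edge only when it has strictly more votes than its reverse, and whenever
Kahn-style sorting stalls, the weakest edge of one detected cycle is
deleted. All names are hypothetical.

```python
def layer_order(names, votes):
    """Return component names sorted from front to back."""
    # Majority election: keep (a, b) only if it outvotes (b, a).
    edges = {e: w for e, w in votes.items() if w > votes.get((e[1], e[0]), 0.0)}
    remaining, order = set(names), []
    while remaining:
        indeg = {n: 0 for n in remaining}
        for (_, back) in edges:
            indeg[back] += 1
        free = sorted(n for n in remaining if indeg[n] == 0)
        if free:
            node = free[0]            # nothing is voted in front of it
            order.append(node)
            remaining.remove(node)
            edges = {e: w for e, w in edges.items() if e[0] != node}
            continue
        # Stalled: every node has a predecessor, so walking backwards
        # along predecessors must eventually close a cycle.
        pred = {}
        for (front, back) in sorted(edges):
            pred.setdefault(back, front)
        seen, path, node = {}, [], min(remaining)
        while node not in seen:
            seen[node] = len(path)
            path.append(node)
            node = pred[node]
        cycle = [(pred[n], n) for n in path[seen[node]:]]
        del edges[min(cycle, key=lambda e: (edges[e], e))]
    return order


votes = {("A", "B"): 3.0, ("B", "C"): 2.0, ("C", "A"): 1.0, ("C", "D"): 0.5}
print(layer_order("ABCD", votes))
```

Restricting the deletion to edges on a detected cycle keeps weak but
consistent inequalities, such as the C before D vote in the example,
from being discarded.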
[Figure 7: (a) region coverage of $S_a$ by $S_b$; (b) T-junction $t$
among components $S_b$, $S_j$, $S_k$; (c) voting table; (d) reordered
cutouts.]
[Figure 11 plots: Score (0 to 10) for the examples Calabash, Excavator,
Kiss, Shin-chan, Parrot, Popeye, Cock, Kangaroo, Elephant, Panda and
Prime; Score (0 to 10) against Participant number (0 to 20) for "Our
method" and "Photoshop".]
Figure 11: Statistics of user study and collages created by an artist.
7 Conclusion
We have presented an efficient algorithm for creating Arcimboldo-like
collages, which represent an input image with thematically related
cutouts from Internet images. The user study shows that our system can
produce collages that render the input images in a plausible
Arcimboldo-like style with recognizable cutouts.

Despite minor limitations, we hope this paper opens a new direction in
computational aesthetics based on Internet images. We believe that the
massive database of Internet images provides valuable material for a
wide range of image processing tasks. As future work, we plan to apply
more stylization techniques [Wang et al. 2010] to enhance the collages.
Another promising scenario is to combine Arcimboldo-like collage and
mosaic in a unified Internet image framework. By controlling the element
size, we would like to produce a spectrum of artworks with varied
assembly styles.
Acknowledgements
We would like to thank the anonymous reviewers for their helpful
comments. We are also grateful to Hasbro International Inc., King
Features Syn., Animation International Ltd., and Shanghai Animation Film
Studio for granting permission to use the pictures of Prime, Popeye,
Shin-chan and Calabash. We thank Nerina Patane and Meng Ding for sharing
their artworks in Figures 2 and 11, and Yu Zang and Hong Liu for helping
us with the figures and video. This work was partly supported by the
Program for New Century Excellent Talents in University (No.
NCET-09-0635) and the National Natural Science Foundation of China (No.
61133008, 61103159).
References
AGARWALA, A., DONTCHEVA, M., AGRAWALA, M., DRUCKER, S., COLBURN, A.,
CURLESS, B., SALESIN, D., AND COHEN, M. 2004. Interactive digital
photomontage. ACM Trans. Graph. 23 (August), 294–302.
CHEN, T., CHENG, M.-M., TAN, P., SHAMIR, A., AND HU, S.-M. 2009.
Sketch2photo: Internet image montage. ACM Trans. Graph. 28 (December),
124:1–124:10.
CHEUNG, V., 2011. Shape collage. http://www.shapecollage.com.
CHU, H.-K., HSU, W.-H., MITRA, N. J., COHEN-OR, D., WONG, T.-T., AND
LEE, T.-Y. 2010. Camouflage images. ACM Trans. Graph. 29 (July),
51:1–51:8.
COMANICIU, D., AND MEER, P. 2002. Mean shift: A robust approach toward
feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24
(May), 603–619.
CONG, L., TONG, R., AND DONG, J. 2011. Selective image abstraction. The
Visual Computer 27 (March), 187–198.
FLUSSER, J., AND SUK, T. 1994. Affine moment invariants: A new tool for
character recognition. Pattern Recognition Letters 15 (April), 433–436.
GAL, R., SORKINE, O., POPA, T., SHEFFER, A., AND COHEN-OR, D. 2007. 3D
collage: Expressive non-realistic modeling. In Proc. NPAR, 7–14.
GOFERMAN, S., TAL, A., AND ZELNIK-MANOR, L. 2010. Puzzle-like collage.
Comput. Graph. Forum 29 (May), 459–468.
HAUSNER, A. 2001. Simulating decorative mosaics. In ACM SIGGRAPH 2001,
573–580.
HAYS, J., AND EFROS, A. A. 2007. Scene completion using millions of
photographs. ACM Trans. Graph. 26 (July).
HO, J., PETER, A., RANGARAJAN, A., AND YANG, M.-H. 2009. An algebraic
approach to affine registration of point sets. In Proc. ICCV, 1335–1340.
HOIEM, D., EFROS, A. A., AND HEBERT, M. 2007. Recovering occlusion
boundaries from a single image. In Proc. ICCV, 1–8.
KAHN, A. B. 1962. Topological sorting of large networks. Commun. ACM 5
(November), 558–562.
KIM, J., AND PELLACINI, F. 2002. Jigsaw image mosaics. ACM Trans. Graph.
21 (July), 657–664.
LOY, G., AND EKLUNDH, J.-O. 2006. Detecting symmetry and symmetric
constellations of features. In Proc. ECCV, 508–521.
MAIORINO, G. 1983. The Portrait of Eccentricity: Arcimboldo and the
Mannerist Grotesque. Pennsylvania State University Press.
MITRA, N. J., CHU, H.-K., LEE, T.-Y., WOLF, L., YESHURUN, H., AND
COHEN-OR, D. 2009. Emerging images. ACM Trans. Graph. 28 (December),
163:1–163:8.
ROTHER, C., KOLMOGOROV, V., AND BLAKE, A. 2004. GrabCut: Interactive
foreground extraction using iterated graph cuts. ACM Trans. Graph. 23
(August), 309–314.
ROTHER, C., BORDEAUX, L., HAMADI, Y., AND BLAKE, A. 2006. AutoCollage.
ACM Trans. Graph. 25 (July), 847–852.
SNAVELY, N., SEITZ, S. M., AND SZELISKI, R. 2006. Photo tourism:
Exploring photo collections in 3D. ACM Trans. Graph. 25 (July), 835–846.
WANG, J., SUN, J., QUAN, L., TANG, X., AND SHUM, H. 2006. Picture
collage. In Proc. CVPR, 347–354.
WANG, S., CAI, K., LU, J., LIU, X., AND WU, E. 2010. Real-time coherent
stylization for augmented reality. The Visual Computer 26 (June),
445–455.
WIKIPEDIA, 2011. Collage. http://en.wikipedia.org/wiki/Collage.