"GrabCut" - Interactive Foreground Extraction Using Iterated Graph Cuts
"GrabCut" - Interactive Foreground Extraction Using Iterated Graph Cuts
Carsten Rother
Vladimir Kolmogorov
Andrew Blake
Microsoft Research Cambridge, UK
Figure 1: Three examples of GrabCut. The user drags a rectangle loosely around an object. The object is then extracted automatically.
Abstract
The problem of efficient, interactive foreground/background segmentation in still images is of great practical importance in image editing. Classical image segmentation tools use either texture
(colour) information, e.g. Magic Wand, or edge (contrast) information, e.g. Intelligent Scissors. Recently, an approach based on
optimization by graph-cut has been developed which successfully
combines both types of information. In this paper we extend the
graph-cut approach in three respects. First, we have developed a
more powerful, iterative version of the optimisation. Secondly, the
power of the iterative algorithm is used to simplify substantially the
user interaction needed for a given quality of result. Thirdly, a robust algorithm for border matting has been developed to estimate
simultaneously the alpha-matte around an object boundary and the
colours of foreground pixels. We show that for moderately difficult
examples the proposed method outperforms competitive tools.
CR Categories: I.3.3 [Computer Graphics]: Picture/Image Generation - Display algorithms; I.3.6 [Computer Graphics]: Methodology and Techniques - Interaction techniques; I.4.6 [Image Processing and Computer Vision]: Segmentation - Pixel classification; partitioning
Keywords: Interactive Image Segmentation, Graph Cuts, Image
Editing, Foreground extraction, Alpha Matting
1 Introduction
This paper addresses the problem of efficient, interactive extraction of a foreground object in a complex environment whose background cannot be trivially subtracted. The resulting foreground object is an alpha-matte which reflects the proportion of foreground
and background. The aim is to achieve high performance at the
cost of only modest interactive effort on the part of the user. High
performance in this task includes: accurate segmentation of object
from background; subjectively convincing alpha values, in response
to blur, mixed pixels and transparency; and clean foreground colour, free of colour bleeding from the source background.
1.1 Previous approaches to interactive matting
Graph Cut [Boykov and Jolly 2001; Greig et al. 1989] is a powerful optimisation technique that can be used in a setting similar
to Bayes Matting, including trimaps and probabilistic colour models, to achieve robust segmentation even in camouflage, when foreground and background colour distributions are not well separated.
The system is explained in detail in section 2. Graph Cut techniques
can also be used for image synthesis, as in [Kwatra et al. 2003],
where a cut corresponds to the optimal smooth seam between two
images, e.g. source and target image.
Level sets [Caselles et al. 1995] are a standard approach to image and texture segmentation. The method propagates a front by solving a corresponding partial differential equation, and is often used as an energy minimisation tool. Its advantage is that almost any energy can be used. However, it computes only a local minimum which may depend on initialization. Therefore, in cases where
the energy function can be minimized exactly via graph cuts, the
latter method should be preferable. One such case was identified
by [Boykov and Kolmogorov 2003] for computing geodesics and
minimal surfaces in Riemannian space.
1.2 Proposed system: GrabCut
First, the segmentation approach of Boykov and Jolly [2001], the foundation on which GrabCut is built, is described in some detail.

2 Image segmentation by graph cut
2.1 Image segmentation
The image is an array z = (z_1, ..., z_N) of grey values, and the segmentation is expressed as an array of opacity values α = (α_1, ..., α_N), with α_n ∈ {0, 1} for hard segmentation. Foreground and background grey-level distributions are modelled by a pair of histograms

$\theta = \{\, h(z; \alpha),\ \alpha = 0, 1 \,\}$,   (1)

one for background and one for foreground. The histograms are assembled directly from labelled pixels in the respective trimap regions T_B, T_F. (Histograms are normalised to sum to 1 over the grey-level range: $\int_z h(z; \alpha)\,dz = 1$.)

The segmentation task is to infer the unknown opacity variables α from the given image data z and the model θ.
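For concreteness, the histogram model of eq. (1) and the per-pixel cost it induces can be written in a few lines of Python. This is a minimal sketch assuming NumPy; grey_histogram and neg_log_cost are hypothetical helper names, not part of the paper.

```python
import numpy as np

def grey_histogram(samples, bins=64):
    """Normalised grey-level histogram h(z; alpha) built from the labelled
    pixels of one trimap region (T_B or T_F); it sums to 1 as in eq. (1)."""
    counts, edges = np.histogram(samples, bins=bins, range=(0, 256))
    h = counts.astype(np.float64)
    return h / h.sum(), edges

def neg_log_cost(h, edges, z):
    """-log h(z_n; alpha_n): the cost a pixel of grey value z pays for being
    assigned the class whose histogram is h (used by the data term later)."""
    idx = np.clip(np.digitize(z, edges) - 1, 0, len(h) - 1)
    return -np.log(h[idx] + 1e-8)       # small epsilon guards against empty bins
```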
2.2 Segmentation by energy minimisation
An energy function E is defined so that its minimum should correspond to a good segmentation, in the sense that it is guided both
by the observed foreground and background grey-level histograms
and that the opacity is coherent, reflecting a tendency to solidity
of objects. This is captured by a Gibbs energy of the form:
$E(\alpha, \theta, z) = U(\alpha, \theta, z) + V(\alpha, z)$.   (2)

The data term U evaluates the fit of the opacity distribution α to the data z, given the histogram model θ, and is defined to be

$U(\alpha, \theta, z) = \sum_n -\log h(z_n; \alpha_n)$.   (3)
The smoothness term V sums over pairs of neighbouring pixels (m, n) ∈ C and penalises opacity discontinuities except where the image contrast is high; its full form, for colour data, is given in equation (11) below. The segmentation is then estimated as a global minimum

$\hat{\alpha} = \arg\min_{\alpha} E(\alpha, \theta)$,   (6)

and the minimisation is done using a standard minimum cut algorithm [Boykov and Jolly 2001].
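One way to realise the minimisation of eq. (6) in code is with an off-the-shelf max-flow/min-cut library. The sketch below assumes the PyMaxflow package and, for brevity, a 4-connected grid with a constant pairwise weight standing in for the contrast-sensitive term V; it illustrates the graph construction, not the authors' implementation.

```python
import numpy as np
import maxflow   # PyMaxflow wrapper around the Boykov-Kolmogorov max-flow code

def segment(cost_bgd, cost_fgd, pairwise_weight=1.0):
    """Hard segmentation by a single graph cut, as in eq. (6).
    cost_bgd[n] = -log h(z_n; 0), cost_fgd[n] = -log h(z_n; 1), both (H, W)."""
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(cost_bgd.shape)
    # Pairwise (smoothness) edges between 4-connected neighbours; a constant
    # weight here, where the paper uses a contrast-sensitive term.
    g.add_grid_edges(nodes, pairwise_weight)
    # Terminal (data) edges. Taking the source side as foreground, a node that
    # ends up on the source side pays its sink capacity, so the sink
    # capacities carry the foreground costs and the source capacities the
    # background costs.
    g.add_grid_tedges(nodes, cost_bgd, cost_fgd)
    g.maxflow()
    sink_side = g.get_grid_segments(nodes)   # True for nodes on the sink side
    return np.logical_not(sink_side)         # source side = foreground here
```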
[Figure 2 panels, left to right: (a) Magic Wand, (b) Intelligent Scissors, (c) Bayes Matte, (d) Knockout 2, (e) Graph Cut, (f) GrabCut.]
Figure 2: Comparison of some matting and segmentation tools. The top row shows the user interaction required to complete the segmentation or matting process: white brush/lasso (foreground), red brush/lasso (background), yellow crosses (boundary). The bottom row illustrates
the resulting segmentation. GrabCut appears to outperform the other approaches both in terms of the simplicity of user input and the quality
of results. Original images on the top row are displayed with reduced intensity to facilitate overlay; see fig. 1 for the originals. Note that our
implementation of Graph Cut [Boykov and Jolly 2001] uses colour mixture models instead of grey value histograms.
3 The GrabCut segmentation algorithm
This section describes the novel parts of the GrabCut hard segmentation algorithm: iterative estimation and incomplete labelling.
3.1 Colour data modelling
The image is now taken to consist of pixels z_n in colour (RGB) space, and in place of grey-level histograms two Gaussian mixture models (GMMs) are used, one for the background and one for the foreground, each a full-covariance mixture with K components (typically K = 5). In order to deal with the GMMs tractably, an additional vector k = (k_1, ..., k_N), with k_n ∈ {1, ..., K}, assigns to each pixel a unique GMM component, taken from the background or the foreground model according to α_n = 0 or 1. The Gibbs energy for segmentation becomes

$E(\alpha, k, \theta, z) = U(\alpha, k, \theta, z) + V(\alpha, z)$,   (7)

where the data term, taking account of the colour GMM models, is

$U(\alpha, k, \theta, z) = \sum_n D(\alpha_n, k_n, \theta, z_n)$,   (8)

with $D(\alpha_n, k_n, \theta, z_n) = -\log p(z_n \mid \alpha_n, k_n, \theta) - \log \pi(\alpha_n, k_n)$, where p(·) is a Gaussian probability distribution and π(·) are mixture weighting coefficients, so that (up to a constant)

$D(\alpha_n, k_n, \theta, z_n) = -\log \pi(\alpha_n, k_n) + \tfrac{1}{2} \log \det \Sigma(\alpha_n, k_n) + \tfrac{1}{2} [z_n - \mu(\alpha_n, k_n)]^{\top} \Sigma(\alpha_n, k_n)^{-1} [z_n - \mu(\alpha_n, k_n)]$.   (9)

Therefore, the parameters of the model are now

$\theta = \{\, \pi(\alpha, k), \mu(\alpha, k), \Sigma(\alpha, k);\ \alpha = 0, 1;\ k = 1, \dots, K \,\}$,   (10)

i.e. the weights π, means μ and covariances Σ of the 2K Gaussian components of the background and foreground distributions. The smoothness term V is unchanged except that the contrast term is computed in colour space:

$V(\alpha, z) = \gamma \sum_{(m,n) \in C} [\alpha_n \neq \alpha_m] \exp\!\left(-\beta \, \| z_m - z_n \|^2\right)$,   (11)

where γ is a constant weighting the smoothness term and β is a contrast parameter computed from the image statistics.
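To make eqs. (8)-(10) concrete, the following sketch evaluates D(α_n, k_n, θ, z_n) for every pixel and every component of one GMM, so that assigning k_n amounts to an argmin over the last axis. It assumes scikit-learn's GaussianMixture for fitting θ; the function names are illustrative, not the authors' code.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_colour_gmm(pixels, K=5):
    """Fit one full-covariance colour GMM (background or foreground) to an
    (N, 3) array of RGB samples; its weights_, means_ and covariances_
    play the role of pi, mu and Sigma in eq. (10)."""
    return GaussianMixture(n_components=K, covariance_type="full").fit(pixels)

def data_term(gmm, z):
    """D(alpha_n, k_n, theta, z_n) of eq. (9) for every pixel n and component k.
    z is (N, 3); the result is (N, K), so k_n is the argmin over the last axis."""
    N, K = z.shape[0], gmm.n_components
    D = np.empty((N, K))
    for k in range(K):
        diff = z - gmm.means_[k]                    # (N, 3)
        cov = gmm.covariances_[k]                   # (3, 3)
        prec = np.linalg.inv(cov)
        mahal = np.einsum("ni,ij,nj->n", diff, prec, diff)
        D[:, k] = (-np.log(gmm.weights_[k])
                   + 0.5 * np.log(np.linalg.det(cov))
                   + 0.5 * mahal)
    return D
```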
3.2 Segmentation by iterative energy minimisation
Initialisation
The user initialises the trimap T by supplying only T_B. The foreground is set to T_F = ∅; T_U is the complement of the background region.
Initialise α_n = 0 for n ∈ T_B and α_n = 1 for n ∈ T_U.
Background and foreground GMMs are initialised from the sets α_n = 0 and α_n = 1 respectively.
Iterative minimisation then alternates between assigning GMM components k_n to the pixels of T_U, re-learning the GMM parameters θ from the data, and re-estimating the segmentation α by minimum cut, repeating until convergence (fig. 3).
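For readers who wish to experiment, OpenCV provides an implementation of this algorithm as cv2.grabCut. A minimal usage sketch of the rectangle-only initialisation and automatic iterative minimisation, with a hypothetical image path and rectangle, is:

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg")                 # hypothetical input image
mask = np.zeros(img.shape[:2], np.uint8)      # per-pixel labels, updated in place
rect = (50, 50, 300, 200)                     # user rectangle (x, y, w, h); outside is T_B

bgd_model = np.zeros((1, 65), np.float64)     # internal GMM state (K = 5 components)
fgd_model = np.zeros((1, 65), np.float64)

# Initialise from the rectangle only (incomplete labelling) and run the
# iterative minimisation for 5 rounds.
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Hard segmentation: definite or probable foreground.
alpha = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
foreground = img * alpha[:, :, None]
```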
Figure 5: User editing. After the initial user interaction and segmentation (top row), further user edits (fig. 3) are necessary. Marking roughly with a foreground brush (white) and a background
brush (red) is sufficient to obtain the desired result (bottom row).
[Figure 4: (a) energy E versus iteration of the minimisation; (b), (c) colour distributions plotted in the RED-GREEN plane.]
3.3 User interaction and incomplete trimaps
Further user editing. The initial, incomplete user-labelling is often sufficient to allow the entire segmentation to be completed automatically, but by no means always. If not, further user editing
is needed [Boykov and Jolly 2001], as shown in fig. 5. It takes the
form of brushing pixels, constraining them either to be firm foreground or firm background; then the minimisation step 3. in fig. 3
is applied. Note that it is sufficient to brush, roughly, just part of a
wrongly labelled area. In addition, the optional refine operation of fig. 3 updates the colour models following user edits. This propagates the effect of edit operations, which is frequently beneficial.
Note that for efficiency the optimal flow, computed by Graph Cut,
can be re-used during user edits.
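Continuing the OpenCV sketch from section 3.2, such edits can be expressed by writing the brushed pixels into the mask as firm labels and re-running the minimisation in mask mode; the brush coordinates below are, of course, placeholders.

```python
import cv2
import numpy as np

# `img`, `mask`, `bgd_model`, `fgd_model` as left by the initial cv2.grabCut call.
# Brush strokes constrain pixels to firm foreground / firm background.
mask[120:140, 60:200] = cv2.GC_FGD      # white (foreground) brush stroke
mask[10:30, 10:310] = cv2.GC_BGD        # red (background) brush stroke

# Re-run the minimisation from the edited mask; no rectangle is needed now.
cv2.grabCut(img, mask, None, bgd_model, fgd_model, 1, cv2.GC_INIT_WITH_MASK)

alpha = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
```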
Incomplete trimaps. The iterative minimisation algorithm allows increased versatility of user interaction. In particular, incomplete labelling becomes feasible where, in place of the full trimap
T, the user need only specify, say, the background region T_B, leaving T_F = ∅. No hard foreground labelling is done at all. Iterative
minimisation (fig. 3) deals with this incompleteness by allowing
provisional labels on some pixels (in the foreground) which can
subsequently be retracted; only the background labels T_B are taken to be firm, guaranteed not to be retracted later. (Of course a complementary scheme, with firm labels for the foreground only, is also a possibility.) In our implementation, the initial T_B is determined by the user as a strip of pixels around the outside of the marked rectangle (marked in red in fig. 2f).
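For illustration, such a strip could be constructed as follows. This is a sketch; the strip width and the (x0, y0, x1, y1) rectangle format are assumptions, not values from the paper.

```python
import numpy as np

def initial_trimap(shape, rect, strip=20):
    """Build the initial labelling from a rectangle (x0, y0, x1, y1):
    a strip of width `strip` pixels just outside the rectangle is T_B (0),
    the rectangle interior is the provisional foreground T_U (1), and the
    remaining pixels are marked 2 ('unused' in this sketch)."""
    trimap = np.full(shape, 2, np.uint8)
    x0, y0, x1, y1 = rect
    trimap[max(y0 - strip, 0):y1 + strip, max(x0 - strip, 0):x1 + strip] = 0   # T_B strip
    trimap[y0:y1, x0:x1] = 1                                                    # T_U
    return trimap
```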
4 Transparency
Given that a matting tool should be able to produce continuous alpha values, we now describe a mechanism by which hard segmentation, as described above, can be augmented by border matting, in
which full transparency is allowed in a narrow strip around the hard
segmentation boundary. This is sufficient to deal with the problem
of matting in the presence of blur and mixed pixels along smooth
object boundaries. The technical issues are: estimating an alpha-map for the strip without generating artefacts, and recovering the
foreground colour, free of colour bleeding from the background.
4.1 Border Matting
[Figure 6 diagram labels: regions T_F, T_B, T_U, contour distance r_n, strip width w. Comparison panels: Knockout 2, Bayes Matte, GrabCut.]
Figure 6: Border matting. (a) Original image with trimap overlaid. (b) Notation for contour parameterisation and distance map.
Contour C (yellow) is obtained from hard segmentation. Each pixel
in T_U is assigned values (integer) of contour parameter t and distance r_n from C. Pixels shown share the same value of t. (c) Soft step-function for the α-profile g, with centre Δ and width σ.
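The contour C and the per-pixel quantities t and r_n of fig. 6(b) can be computed from the hard segmentation mask, for instance, as in the sketch below. It assumes OpenCV (version 4 return convention) and SciPy; the signed-distance convention is a choice of this sketch rather than something prescribed by the paper.

```python
import cv2
import numpy as np
from scipy.spatial import cKDTree

def contour_parameterisation(alpha, w=6):
    """For a binary hard segmentation `alpha`, return the contour C as an
    ordered array of points plus, for every pixel, its contour index t and
    signed distance r from C; `strip` marks the unknown region T_U of
    half-width w around the contour."""
    contours, _ = cv2.findContours(alpha.astype(np.uint8), cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    C = max(contours, key=cv2.contourArea).reshape(-1, 2)       # (T, 2), (x, y)

    # Distance of every pixel to its nearest contour point, via a KD-tree lookup.
    ys, xs = np.mgrid[0:alpha.shape[0], 0:alpha.shape[1]]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1)
    dist, t = cKDTree(C).query(pts)
    dist = dist.reshape(alpha.shape)
    t = t.reshape(alpha.shape)

    r = np.where(alpha > 0, dist, -dist)      # sign: positive on the foreground side
    strip = dist <= w                         # the unknown region T_U
    return C, t, r, strip
```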
α-profile. For each pixel n in T_U the matte value is taken from a soft step-function g (fig. 6c), α_n = g(r_n; Δ_{t_n}, σ_{t_n}), where Δ is the centre and σ the width of the transition. It is assumed that all pixels with the same index t share values of the parameters Δ_t, σ_t.
Parameter values Δ_1, σ_1, ..., Δ_T, σ_T are estimated by minimizing the following energy function, using DP (dynamic programming) over t:

$E = \sum_{n \in T_U} \hat{D}_n(\alpha_n) + \sum_{t=1}^{T} \tilde{V}(\Delta_t, \sigma_t, \Delta_{t+1}, \sigma_{t+1})$,   (12)
where Ṽ is a smoothing regularizer that penalises abrupt changes of the profile parameters between neighbouring sections of the contour:

$\tilde{V}(\Delta, \sigma, \Delta', \sigma') = \lambda_1 (\Delta - \Delta')^2 + \lambda_2 (\sigma - \sigma')^2$.   (13)
The data term D̂_n(α_n) measures how well the matte value α_n explains the observed colour z_n, using a Gaussian whose mean and covariance blend the foreground and background colour statistics gathered for each contour section t:

$\mu_t(\alpha) = (1 - \alpha)\,\mu_t(0) + \alpha\,\mu_t(1)$,
$\Sigma_t(\alpha) = (1 - \alpha)^2\,\Sigma_t(0) + \alpha^2\,\Sigma_t(1)$.   (15)
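The paper leaves the exact shape of the soft step g open beyond fig. 6(c); a logistic sigmoid is one reasonable stand-in, sketched below using the signed distance convention of the previous snippet.

```python
import numpy as np

def g(r, delta, sigma):
    """Soft step alpha-profile with centre `delta` and width `sigma`:
    close to 0 deep on the background side of the strip, close to 1 deep
    on the foreground side (a logistic sigmoid stands in for the profile
    of fig. 6(c))."""
    return 1.0 / (1.0 + np.exp(-(r - delta) / max(sigma, 1e-6)))

# For every strip pixel n with contour section t_n and distance r_n:
#     alpha_n = g(r_n, Delta[t_n], sigma[t_n])
# where (Delta_t, sigma_t) are the parameters estimated by the DP of eq. (12).
```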
4.2 Foreground estimation
www.research.microsoft.com/vision/cambridge/segmentation/
Figure 8: Results using GrabCut. The first row shows the original images with superimposed user input (red rectangle). The second row
displays all user interactions: red (background brush), white (foreground brush) and yellow (matting brush). The degree of user interaction
increases from left to right. The results obtained by GrabCut are visualized in the third row. The last row shows zoomed portions of the
respective results, documenting that the recovered alpha mattes are smooth and free of background bleeding.
References
MORTENSEN, E., AND BARRETT, W. 1999. Toboggan-based intelligent scissors with a four parameter edge model. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, 452-458.