Mean Shift: A Robust Approach Toward Feature Space Analysis
Abstract: A general nonparametric technique is proposed for the analysis of a complex multimodal feature space and to delineate
arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure, the mean
shift. We prove for discrete data the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying
density function and, thus, its utility in detecting the modes of the density. The relation of the mean shift procedure to the Nadaraya-
Watson estimator from kernel regression and the robust M-estimators of location is also established. Algorithms for two low-level vision
tasks, discontinuity preserving smoothing and image segmentation, are described as applications. In these algorithms, the only user-set parameter is the resolution of the analysis and either gray level or color images are accepted as input. Extensive experimental
results illustrate their excellent performance.
Index Terms: Mean shift, clustering, image segmentation, image smoothing, feature space, low-level vision.
1 INTRODUCTION
Fig. 1. Example of a feature space. (a) A 400 × 276 color image. (b) Corresponding L*u*v* color space with 110,400 data points.
some proximity measure. See [28, Section 3.2] for a survey of hierarchical clustering methods. The hierarchical methods tend to be computationally expensive and the definition of a meaningful stopping criterion for the fusion (or division) of the data is not straightforward.

The rationale behind the density estimation-based nonparametric clustering approach is that the feature space can be regarded as the empirical probability density function (p.d.f.) of the represented parameter. Dense regions in the feature space thus correspond to local maxima of the p.d.f., that is, to the modes of the unknown density. Once the location of a mode is determined, the cluster associated with it is delineated based on the local structure of the feature space [25], [60], [63].

Our approach to mode detection and clustering is based on the mean shift procedure, proposed in 1975 by Fukunaga and Hostetler [21] and largely forgotten until Cheng's paper [7] rekindled interest in it. In spite of its excellent qualities, the mean shift procedure does not seem to be known in the statistical literature. While the book [54, Section 6.2.2] discusses [21], the advantages of employing a mean shift type procedure in density estimation were only recently rediscovered [8].

As will be proven in the sequel, a computational module based on the mean shift procedure is an extremely versatile tool for feature space analysis and can provide reliable solutions for many vision tasks. In Section 2, the mean shift procedure is defined and its properties are analyzed. In Section 3, the procedure is used as the computational module for robust feature space analysis and implementational issues are discussed. In Section 4, the feature space analysis technique is applied to two low-level vision tasks: discontinuity preserving filtering and image segmentation. Both algorithms can have as input either gray level or color images and the only parameter to be tuned by the user is the resolution of the analysis. The applicability of the mean shift procedure is not restricted to the presented examples. In Section 5, other applications are mentioned and the procedure is put into a more general context.

2 THE MEAN SHIFT PROCEDURE

Kernel density estimation (known as the Parzen window technique in the pattern recognition literature [17, Section 4.3]) is the most popular density estimation method. Given n data points $x_i$, $i = 1, \ldots, n$ in the d-dimensional space $R^d$, the multivariate kernel density estimator with kernel $K(x)$ and a symmetric positive definite $d \times d$ bandwidth matrix $H$, computed in the point $x$, is given by

$$\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} K_H(x - x_i), \qquad (1)$$

where

$$K_H(x) = |H|^{-1/2} K\left(H^{-1/2} x\right). \qquad (2)$$

The d-variate kernel $K(x)$ is a bounded function with compact support satisfying [62, p. 95]

$$\int_{R^d} K(x)\, dx = 1, \qquad \lim_{\|x\| \to \infty} \|x\|^d K(x) = 0,$$
$$\int_{R^d} x\, K(x)\, dx = 0, \qquad \int_{R^d} x x^\top K(x)\, dx = c_K I, \qquad (3)$$

where $c_K$ is a constant. The multivariate kernel can be generated from a symmetric univariate kernel $K_1(x)$ in two different ways

$$K^P(x) = \prod_{i=1}^{d} K_1(x_i), \qquad K^S(x) = a_{k,d}\, K_1(\|x\|), \qquad (4)$$

where $K^P(x)$ is obtained from the product of the univariate kernels and $K^S(x)$ from rotating $K_1(x)$ in $R^d$, i.e., $K^S(x)$ is radially symmetric. The constant $a_{k,d}^{-1} = \int_{R^d} K_1(\|x\|)\, dx$ assures that $K^S(x)$ integrates to one, though this condition can be relaxed in our context. Either type of multivariate kernel obeys (3), but, for our purposes, the radially symmetric kernels are often more suitable.

We are interested only in a special class of radially symmetric kernels satisfying

$$K(x) = c_{k,d}\, k\left(\|x\|^2\right), \qquad (5)$$

in which case it suffices to define the function $k(x)$, called the profile of the kernel, only for $x \geq 0$. The normalization constant $c_{k,d}$, which makes $K(x)$ integrate to one, is assumed strictly positive.

Using a fully parameterized $H$ increases the complexity of the estimation [62, p. 106] and, in practice, the bandwidth matrix $H$ is chosen either as diagonal, $H = \mathrm{diag}[h_1^2, \ldots, h_d^2]$, or proportional to the identity matrix, $H = h^2 I$.
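To make (1) and (5) concrete, the following Python sketch evaluates the estimator for the case $H = h^2 I$ with the Epanechnikov profile; the function names and the normalization constant computed for this particular profile are our own illustrative choices, not code from the paper.

```python
import numpy as np
from math import gamma, pi

def epanechnikov_profile(u):
    # Profile k(u) of the Epanechnikov kernel: k(u) = 1 - u for 0 <= u <= 1, 0 otherwise.
    return np.where((u >= 0) & (u <= 1), 1.0 - u, 0.0)

def kde(x, data, h):
    """Evaluate f_hat(x) = c_{k,d} / (n h^d) * sum_i k(||(x - x_i)/h||^2), i.e., (1) with (5)
    and the bandwidth matrix reduced to H = h^2 I."""
    n, d = data.shape
    u = np.sum(((x - data) / h) ** 2, axis=1)        # squared normalized distances
    unit_ball_volume = pi ** (d / 2) / gamma(d / 2 + 1)
    c_kd = (d + 2) / (2 * unit_ball_volume)          # normalization for this profile
    return c_kd / (n * h ** d) * np.sum(epanechnikov_profile(u))

# Example: estimate the density of a 2D normal sample at the origin.
rng = np.random.default_rng(0)
sample = rng.normal(size=(500, 2))
print(kde(np.zeros(2), sample, h=0.5))
```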
points reside. Since the mean shift vector is aligned with the local gradient estimate, it can define a path leading to a stationary point of the estimated density. The modes of the density are such stationary points. The mean shift procedure, obtained by successive

. computation of the mean shift vector $m_{h,G}(x)$,
. translation of the kernel (window) $G(x)$ by $m_{h,G}(x)$,

is guaranteed to converge at a nearby point where the estimate (11) has zero gradient, as will be shown in the next section. The presence of the normalization by the density estimate is a desirable feature. The regions of low-density values are of no interest for the feature space analysis and, in such regions, the mean shift steps are large. Similarly, near local maxima the steps are small and the analysis more refined. The mean shift procedure thus is an adaptive gradient ascent method.

2.2 Sufficient Condition for Convergence

Denote by $\{y_j\}_{j=1,2,\ldots}$ the sequence of successive locations of the kernel G, where, from (17),

$$y_{j+1} = \frac{\sum_{i=1}^{n} x_i\, g\left(\left\|\frac{y_j - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\left(\left\|\frac{y_j - x_i}{h}\right\|^2\right)}, \qquad j = 1, 2, \ldots \qquad (20)$$

is the weighted mean at $y_j$ computed with kernel G. Convergence does not require additional procedures to choose the adequate step sizes; this is a major advantage over the traditional gradient-based methods.

For discrete data, the number of steps to convergence depends on the employed kernel. When G is the uniform kernel, convergence is achieved in a finite number of steps since the number of locations generating distinct mean values is finite. However, when the kernel G imposes a weighting on the data points (according to the distance from its center), the mean shift procedure is infinitely convergent. The practical way to stop the iterations is to set a lower bound for the magnitude of the mean shift vector.

2.3 Mean Shift-Based Mode Detection

Let us denote by $y_c$ and $\hat{f}_{h,K}^c = \hat{f}_{h,K}(y_c)$ the convergence points of the sequences $\{y_j\}_{j=1,2,\ldots}$ and $\{\hat{f}_{h,K}(j)\}_{j=1,2,\ldots}$, respectively. The implications of Theorem 1 are the following. First, the magnitude of the mean shift vector converges to zero. Indeed, from (17) and (20), the jth mean shift vector is

$$m_{h,G}(y_j) = y_{j+1} - y_j \qquad (22)$$

and, at the limit, $m_{h,G}(y_c) = y_c - y_c = 0$.
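A minimal sketch of the iteration in (20), stopped by the lower bound on the mean shift magnitude described above; the two profile choices shown here and all helper names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def g_uniform(u):
    # Shadow of the Epanechnikov profile: uniform weights inside the unit ball.
    return (u <= 1.0).astype(float)

def g_normal(u):
    # Weights derived from the normal profile; multiplicative constants cancel in (20).
    return np.exp(-0.5 * u)

def mean_shift_mode(y0, data, h, g=g_uniform, tol=1e-3, max_iter=500):
    """Iterate y_{j+1} = sum_i x_i g(||(y_j - x_i)/h||^2) / sum_i g(||(y_j - x_i)/h||^2), eq. (20),
    until the mean shift vector (22) falls below the threshold tol."""
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        w = g(np.sum(((y - data) / h) ** 2, axis=1))
        if w.sum() == 0:                       # empty window, nothing to average
            break
        y_next = (w[:, None] * data).sum(axis=0) / w.sum()
        if np.linalg.norm(y_next - y) < tol:
            return y_next
        y = y_next
    return y

# Example: started between two Gaussian clouds, the procedure climbs to the nearer mode.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.3, (200, 2)), rng.normal(3.0, 0.3, (200, 2))])
print(mean_shift_mode([2.5, 2.5], data, h=1.0))
```

With the uniform weights the procedure stops after a finite number of distinct window positions, while with the normal weights the tolerance check is what terminates it, matching the behavior discussed above.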
Theorem 2. The cosine of the angle between two consecutive mean shift vectors is strictly positive when a normal kernel is employed, i.e.,

$$\frac{m_{h,N}(y_j)^\top m_{h,N}(y_{j+1})}{\|m_{h,N}(y_j)\|\, \|m_{h,N}(y_{j+1})\|} > 0. \qquad (25)$$

As a consequence of Theorem 2, the normal kernel appears to be the optimal one for the mean shift procedure. The smooth trajectory of the mean shift procedure is in contrast with the standard steepest ascent method [4, p. 21] (local gradient evaluation followed by line maximization) whose convergence rate on surfaces with deep narrow valleys is slow due to its zigzagging trajectory.

In practice, the convergence of the mean shift procedure based on the normal kernel requires a large number of steps, as was discussed at the end of Section 2.2. Therefore, in most of our experiments, we have used the uniform kernel, for which the convergence is finite, and not the normal kernel. Note, however, that the quality of the results almost always improves when the normal kernel is employed.

2.5 Relation to Kernel Regression

Important insight can be gained when (19) is obtained by approaching the problem differently. Considering the univariate case suffices for this purpose.

Kernel regression is a nonparametric method to estimate complex trends from noisy data. See [62, chapter 5] for an introduction to the topic, [24] for a more in-depth treatment. Let n measured data points be $(X_i, Z_i)$ and assume that the values $X_i$ are the outcomes of a random variable x with probability density function $f(x)$, $x_i = X_i$, $i = 1, \ldots, n$, while the relation between $Z_i$ and $X_i$ is

$$Z_i = m(X_i) + \epsilon_i, \qquad i = 1, \ldots, n, \qquad (26)$$

where $m(x)$ is called the regression function and $\epsilon_i$ is an independently distributed, zero-mean error, $E[\epsilon_i] = 0$.

A natural way to estimate the regression function is by locally fitting a degree p polynomial to the data. For a window centered at x, the polynomial coefficients then can be obtained by weighted least squares, the weights being computed from a symmetric function $g(x)$. The size of the window is controlled by the parameter h, $g_h(x) = h^{-1} g(x/h)$. The simplest case is that of fitting a constant to the data in the window, i.e., $p = 0$. It can be shown, [24, Section 3.1], [62, Section 5.2], that the estimated constant is the value of the Nadaraya-Watson estimator,

$$\hat{m}(x; h) = \frac{\sum_{i=1}^{n} g_h(x - X_i)\, Z_i}{\sum_{i=1}^{n} g_h(x - X_i)}, \qquad (27)$$

introduced in the statistical literature 35 years ago. The asymptotic conditional bias of the estimator has the expression [24, p. 109], [62, p. 125],

$$E\left[\hat{m}(x; h) - m(x) \mid X_1, \ldots, X_n\right] \approx h^2\, \frac{m''(x) f(x) + 2 m'(x) f'(x)}{2 f(x)}\, \mu_2(g), \qquad (28)$$

where $\mu_2(g) = \int u^2 g(u)\, du$. Defining $m(x) \equiv x$ reduces the Nadaraya-Watson estimator to (20) (in the univariate case), while (28) becomes

$$E\left[\hat{x} - x \mid X_1, \ldots, X_n\right] \approx h^2\, \frac{f'(x)}{f(x)}\, \mu_2(g), \qquad (29)$$

which is similar to (19). The mean shift procedure thus exploits to its advantage the inherent bias of the zero-order kernel regression.

The connection to the kernel regression literature opens many interesting issues; however, most of these are more of theoretical than practical importance.

2.6 Relation to Location M-Estimators

The M-estimators are a family of robust techniques which can handle data in the presence of severe contaminations, i.e., outliers. See [26], [32] for introductory surveys. In our context, only the problem of location estimation has to be considered.

Given the data $x_i$, $i = 1, \ldots, n$, and the scale h, we will define $\hat{\theta}$, the location estimator, as

$$\hat{\theta} = \arg\min_{\theta} J(\theta) = \arg\min_{\theta} \sum_{i=1}^{n} \rho\left(\left\|\frac{\theta - x_i}{h}\right\|^2\right), \qquad (30)$$

where $\rho(u)$ is a symmetric, nonnegative valued function, with a unique minimum at the origin and nondecreasing for $u \geq 0$. The estimator is obtained from the normal equations

$$\nabla_{\theta} J(\hat{\theta}) = 2 h^{-2} \sum_{i=1}^{n} (\hat{\theta} - x_i)\, w\left(\left\|\frac{\hat{\theta} - x_i}{h}\right\|^2\right) = 0, \qquad (31)$$

where

$$w(u) = \frac{d \rho(u)}{d u}.$$

Therefore, the iterations to find the location M-estimate are based on

$$\hat{\theta} = \frac{\sum_{i=1}^{n} x_i\, w\left(\left\|\frac{\hat{\theta} - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} w\left(\left\|\frac{\hat{\theta} - x_i}{h}\right\|^2\right)}, \qquad (32)$$

which is identical to (20) when $w(u) \equiv g(u)$. Taking into account (13), the minimization (30) becomes

$$\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} k\left(\left\|\frac{\theta - x_i}{h}\right\|^2\right), \qquad (33)$$

which can also be interpreted as

$$\hat{\theta} = \arg\max_{\theta} \hat{f}_{h,K}(\theta \mid x_1, \ldots, x_n). \qquad (34)$$

That is, the location estimator is the mode of the density estimated with the kernel K from the available data. Note that the convexity of the $k(x)$ profile, the sufficient condition for the convergence of the mean shift procedure (Section 2.2), is in accordance with the requirements to be satisfied by the objective function $\rho(u)$.

The relation between location M-estimators and kernel density estimation is not well investigated in the statistical literature; only [9] discusses it in the context of an edge preserving smoothing technique.
3 ROBUST ANALYSIS OF FEATURE SPACES

Multimodality and arbitrarily shaped clusters are the defining properties of a real feature space. The quality of the mean shift procedure to move toward the mode (peak) of the hill on which it was initiated makes it the ideal computational module to analyze such spaces. To detect all the significant modes, the basic algorithm given in Section 2.3 should be run multiple times (evolving in principle in parallel) with initializations that cover the entire feature space.

Before the analysis is performed, two important (and somewhat related) issues should be addressed: the metric of the feature space and the shape of the kernel. The mapping from the input domain into a feature space often associates a non-Euclidean metric to the space. The problem of color representation will be discussed in Section 4, but the employed parameterization has to be carefully examined even in a simple case like the Hough space of lines, e.g., [48], [61].

The presence of a Mahalanobis metric can be accommodated by an adequate choice of the bandwidth matrix (2). In practice, however, it is preferable to have assured that the metric of the feature space is Euclidean and, thus, the bandwidth matrix is controlled by a single parameter, $H = h^2 I$. To be able to use the same kernel size for all the mean shift procedures in the feature space, the necessary condition is that local density variations near a significant mode are not as large as the entire support of a significant mode somewhere else.

The starting points of the mean shift procedures should be chosen to have the entire feature space (except the very sparse regions) tessellated by the kernels (windows). Regular tessellations are not required. As the windows evolve toward the modes, almost all the data points are visited and, thus, all the information captured in the feature space is exploited. Note that the convergence to a given mode may yield slightly different locations due to the threshold that terminates the iterations. Similarly, on flat plateaus, the value of the gradient is close to zero and the mean shift procedure could stop.

These artifacts are easy to eliminate through postprocessing. Mode candidates at a distance less than the kernel bandwidth are fused, the one corresponding to the highest density being chosen. The global structure of the feature space can be confirmed by measuring the significance of the valleys defined along a cut through the density in the direction determined by two modes.

The delineation of the clusters is a natural outcome of the mode seeking process. After convergence, the basin of attraction of a mode, i.e., the data points visited by all the mean shift procedures converging to that mode, automatically delineates a cluster of arbitrary shape. Close to the boundaries, where a data point could have been visited by several diverging procedures, majority logic can be employed. It is important to notice that, in computer vision, most often we are not dealing with an abstract clustering problem. The input domain almost always provides an independent test for the validity of local decisions in the feature space. That is, while it is less likely that one can recover from a severe clustering error, allocation of a few uncertain data points can be reliably supported by input domain information.

The multimodal feature space analysis technique was discussed in detail in [12]. It was shown experimentally that, for a synthetic, bimodal normal distribution, the technique achieves a classification error similar to the optimal Bayesian classifier. The behavior of this feature space analysis technique is illustrated in Fig. 2. A two-dimensional data set of 110,400 points (Fig. 2a) is decomposed into seven clusters represented with different colors in Fig. 2b. A number of 159 mean shift procedures with uniform kernel were employed. Their trajectories are shown in Fig. 2c, overlapped over the density estimate computed with the Epanechnikov kernel. The pruning of the mode candidates produced seven peaks. Observe that some of the trajectories are prematurely stopped by local plateaus.

3.1 Bandwidth Selection

The influence of the bandwidth parameter h was assessed empirically in [12] through a simple image segmentation task. In a more rigorous approach, however, four different techniques for bandwidth selection can be considered.

. The first one has a statistical motivation. The optimal bandwidth associated with the kernel density estimator (6) is defined as the bandwidth that achieves the best compromise between the bias and variance of the estimator, over all $x \in R^d$, i.e., minimizes AMISE. In the multivariate case, the resulting bandwidth formula [54, p. 85], [62, p. 99] is of little practical use, since it depends on the Laplacian of the unknown density being estimated, and its performance is not well understood [62, p. 108]. For the univariate case, a reliable method for bandwidth selection is the plug-in rule [53], which was proven to be superior to least-squares cross-validation and biased cross-validation [42], [55, p. 46]. Its only assumption is the smoothness of the underlying density.

. The second bandwidth selection technique is related to the stability of the decomposition. The bandwidth is taken as the center of the largest operating range over which the same number of clusters are obtained for the given data [20, p. 541].

. For the third technique, the best bandwidth maximizes an objective function that expresses the quality of the decomposition (i.e., the index of cluster validity). The objective function typically compares the inter- versus intra-cluster variability [30], [28] or evaluates the isolation and connectivity of the delineated clusters [43].

. Finally, since in most of the cases the decomposition is task dependent, top-down information provided by the user or by an upper-level module can be used to control the kernel bandwidth.

We present in [15] a detailed analysis of the bandwidth selection problem. To solve the difficulties generated by the narrow peaks and the tails of the underlying density, two locally adaptive solutions are proposed. One is nonparametric, being based on a newly defined adaptive mean shift procedure, which exploits the plug-in rule and the sample point density estimator. The other is semiparametric, imposing a local structure on the data to extract reliable scale information. We show that the local bandwidth should maximize the magnitude of the normalized mean shift vector. The adaptation of the bandwidth provides superior results when compared to the fixed bandwidth procedure. For more details, see [15].
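A minimal sketch of the decomposition described in this section: mean shift procedures are started from the data points themselves, mode candidates closer than the bandwidth are fused, and each point is labeled by the basin it converges to. It reuses the hypothetical mean_shift_mode helper from the earlier sketch and simplifies the pruning rule, so it only mirrors the steps of Section 3 rather than reproducing the authors' implementation.

```python
import numpy as np

def cluster_by_modes(data, h, tol=1e-3):
    """Feature space decomposition: one mean shift procedure per data point,
    mode candidates fused when closer than the bandwidth, labels from the basins."""
    # 1. Run the mean shift procedure (previous sketch) started at every data point.
    converged = np.array([mean_shift_mode(x, data, h, tol=tol) for x in data])

    # 2. Prune mode candidates at a distance less than the kernel bandwidth.
    #    (For brevity the first candidate found is kept; the paper keeps the
    #    candidate of highest estimated density.)
    modes = []
    for y in converged:
        if all(np.linalg.norm(y - m) >= h for m in modes):
            modes.append(y)
    modes = np.array(modes)

    # 3. The basin of attraction of a mode delineates a cluster: label each point
    #    by the pruned mode closest to its own convergence location.
    labels = np.argmin(np.linalg.norm(converged[:, None, :] - modes[None, :, :], axis=2), axis=1)
    return modes, labels

# Example: two well separated clouds are decomposed into two clusters.
rng = np.random.default_rng(3)
data = np.vstack([rng.normal(0.0, 0.3, (150, 2)), rng.normal(3.0, 0.3, (150, 2))])
modes, labels = cluster_by_modes(data, h=1.0)
print(len(modes), np.bincount(labels))
```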
Fig. 2. Example of a 2D feature space analysis. (a) Two-dimensional data set of 110,400 points representing the first two components of the L*u*v* space shown in Fig. 1b. (b) Decomposition obtained by running 159 mean shift procedures with different initializations. (c) Trajectories of the mean shift procedures drawn over the Epanechnikov density estimate computed for the same data set. The peaks retained for the final classification are marked with red dots.
the context of feature representation for image segmentation in [16]. In practice, there is no clear advantage between using L*u*v* or L*a*b*; in the proposed algorithms, we employed L*u*v* motivated by a linear mapping property [65, p. 166].

Our first image segmentation algorithm was a straightforward application of the feature space analysis technique to an L*u*v* representation of the color image [11]. The modularity of the segmentation algorithm enabled its integration by other groups to a large variety of applications like image retrieval [1], face tracking [6], object-based video coding for MPEG-4 [22], shape detection and recognition [33], and texture analysis [47], to mention only a few. However, since the feature space analysis can be applied unchanged to moderately higher dimensional spaces (see Section 5), we subsequently also incorporated the spatial coordinates of a pixel into its feature space representation. This joint domain representation is employed in the two algorithms described here.

An image is typically represented as a two-dimensional lattice of p-dimensional vectors (pixels), where p = 1 in the gray-level case, three for color images, and p > 3 in the multispectral case. The space of the lattice is known as the spatial domain, while the gray level, color, or spectral information is represented in the range domain. For both domains, Euclidean metric is assumed. When the location and range vectors are concatenated in the joint spatial-range domain of dimension d = p + 2, their different nature has to be compensated by proper normalization. Thus, the multivariate kernel is defined as the product of two radially symmetric kernels and the Euclidean metric allows a single bandwidth parameter for each domain

$$K_{h_s, h_r}(x) = \frac{C}{h_s^2\, h_r^p}\; k\left(\left\|\frac{x^s}{h_s}\right\|^2\right) k\left(\left\|\frac{x^r}{h_r}\right\|^2\right), \qquad (35)$$

where $x^s$ is the spatial part, $x^r$ is the range part of a feature vector, $k(x)$ the common profile used in both domains, $h_s$ and $h_r$ the employed kernel bandwidths, and C the corresponding normalization constant. In practice, an Epanechnikov or a (truncated) normal kernel always provides satisfactory performance, so the user only has to set the bandwidth parameter $h = (h_s, h_r)$, which, by controlling the size of the kernel, determines the resolution of the mode detection.

4.1 Discontinuity Preserving Smoothing

Smoothing through replacing the pixel in the center of a window by the (weighted) average of the pixels in the window indiscriminately blurs the image, removing not only the noise but also salient information. Discontinuity preserving smoothing techniques, on the other hand, adaptively reduce the amount of smoothing near abrupt changes in the local structure, i.e., edges.

There are a large variety of approaches to achieve this goal, from adaptive Wiener filtering [31], to implementing isotropic [50] and anisotropic [44] local diffusion processes, a topic which recently received renewed interest [19], [37], [56]. The diffusion-based techniques, however, do not have a straightforward stopping criterion and, after a sufficiently large number of iterations, the processed image collapses into a flat surface. The connection between anisotropic diffusion and M-estimators is analyzed in [5].

A recently proposed noniterative discontinuity preserving smoothing technique is the bilateral filtering [59]. The relation between bilateral filtering and diffusion-based techniques was analyzed in [3]. The bilateral filters also work in the joint spatial-range domain. The data is independently weighted in the two domains and the center pixel is computed as the weighted average of the window. The fundamental difference between the bilateral filtering and the mean shift-based smoothing algorithm is in the use of local information.

4.1.1 Mean Shift Filtering

Let $x_i$ and $z_i$, $i = 1, \ldots, n$, be the d-dimensional input and filtered image pixels in the joint spatial-range domain. For each pixel,

1. Initialize $j = 1$ and $y_{i,1} = x_i$.
2. Compute $y_{i,j+1}$ according to (20) until convergence, $y = y_{i,c}$.
3. Assign $z_i = (x_i^s, y_{i,c}^r)$.

The superscripts s and r denote the spatial and range components of a vector, respectively. The assignment specifies that the filtered data at the spatial location $x_i^s$ will have the range component of the point of convergence $y_{i,c}^r$. The kernel (window) in the mean shift procedure moves in the direction of the maximum increase in the joint density gradient, while the bilateral filtering uses a fixed, static window. In the image smoothed by mean shift filtering, information beyond the individual windows is also taken into account.

An important connection between filtering in the joint domain and robust M-estimation should be mentioned. The improved performance of the generalized M-estimators (GM or bounded-influence estimators) is due to the presence of a second weight function which offsets the influence of leverage points, i.e., outliers in the input domain [32, Section 8E]. A similar (at least in spirit) twofold weighting is employed in the bilateral and mean shift-based filterings, which is the main reason for their excellent smoothing performance.

Mean shift filtering with uniform kernel having $(h_s, h_r) = (8, 4)$ has been applied to the often used 256 × 256 gray-level cameraman image (Fig. 3a), the result being shown in Fig. 3b. The regions containing the grass field have been almost completely smoothed, while details such as the tripod and the buildings in the background were preserved. The processing required fractions of a second on a standard PC (600 MHz Pentium III) using an optimized C++ implementation of the algorithm. On the average, 3.06 iterations were necessary until the filtered value of a pixel was defined, i.e., its mean shift procedure converged.

To better visualize the filtering process, the 40 × 20 window marked in Fig. 3a is represented in three dimensions in Fig. 4a. Note that the data was reflected over the horizontal axis of the window for a more informative display. In Fig. 4b, the mean shift paths associated with every other pixel (in both coordinates) from the plateau and the line are shown. Note that convergence points (black dots) are situated in the center of the plateau, away from the discontinuities delineating it. Similarly, the mean shift trajectories on the line remain on it. As a result, the filtered data (Fig. 4c) shows clean quasi-homogeneous regions.
Fig. 3. Cameraman image. (a) Original. (b) Mean shift filtered (h_s, h_r) = (8, 4).
The physical interpretation of the mean shift-based filtering is easy to see by examining Fig. 4a, which, in fact, displays the three dimensions of the joint domain of a gray-level image. Take a pixel on the line. The uniform kernel defines a parallelepiped centered on this pixel and the computation of the mean shift vector takes into account only those pixels which have both their spatial coordinates and gray-level values inside the parallelepiped. Thus, if the parallelepiped is not too large, only pixels on the line are averaged and the new location of the window is guaranteed to remain on it.

A second filtering example is shown in Fig. 5. The 512 × 512 color image baboon was processed with mean shift filters employing normal kernels defined using various spatial and range resolutions, (h_s, h_r) = (8-32, 4-16). While the texture of the fur has been removed, the details of the eyes and the whiskers remained crisp (up to a certain resolution). One can see that the spatial bandwidth has a distinct effect on the output when compared to the range (color) bandwidth. Only features with large spatial support are represented in the filtered image when h_s increases. On the other hand, only features with high color contrast survive when h_r is large. Similar behavior was also reported for the bilateral filter [59, Fig. 3].

4.2 Image Segmentation

Image segmentation, decomposition of a gray level or color image into homogeneous tiles, is arguably the most important low-level vision task. Homogeneity is usually defined as similarity in pixel values, i.e., a piecewise constant model is enforced over the image. From the diversity of image segmentation methods proposed in the literature, we will mention only some whose basic processing relies on the joint domain. In each case, a vector field is defined over the sampling lattice of the image.
Fig. 4. Visualization of mean shift-based filtering and segmentation for gray-level data. (a) Input. (b) Mean shift paths for the pixels on the plateau and on the line. The black dots are the points of convergence. (c) Filtering result (h_s, h_r) = (8, 4). (d) Segmentation result.
The attraction force field defined in [57] is computed at each pixel as a vector sum of pairwise affinities between the current pixel and all other pixels, with similarity measured in both spatial and range domains. The region boundaries are then identified as loci where the force vectors diverge. It is interesting to note that, for a given pixel, the magnitude and orientation of the force field are similar to those of the joint domain mean shift vector computed at that pixel and projected into the spatial domain. However, in contrast to [57], the mean shift procedure moves in the direction of this vector, away from the boundaries.

The edge flow in [34] is obtained at each location for a given set of directions as the magnitude of the gradient of a smoothed image. The boundaries are detected at image locations which encounter two opposite directions of flow. The quantization of the edge flow direction, however, may introduce artifacts. Recall that the direction of the mean shift is dictated solely by the data.

The mean shift procedure-based image segmentation is a straightforward extension of the discontinuity preserving smoothing algorithm. Each pixel is associated with a significant mode of the joint domain density located in its neighborhood, after nearby modes were pruned as in the generic feature space analysis technique (Section 3).

4.2.1 Mean Shift Segmentation

Let $x_i$ and $z_i$, $i = 1, \ldots, n$, be the d-dimensional input and filtered image pixels in the joint spatial-range domain and $L_i$ the label of the ith pixel in the segmented image.

1. Run the mean shift filtering procedure for the image and store all the information about the d-dimensional convergence point in $z_i$, i.e., $z_i = y_{i,c}$.
2. Delineate in the joint domain the clusters $\{C_p\}_{p=1,\ldots,m}$ by grouping together all $z_i$ which are closer than $h_s$ in the spatial domain and $h_r$ in the range domain, i.e., concatenate the basins of attraction of the corresponding convergence points.
3. For each $i = 1, \ldots, n$, assign $L_i = \{p \mid z_i \in C_p\}$.
4. Optional: Eliminate spatial regions containing less than M pixels.

The cluster delineation step can be refined according to a priori information and, thus, physics-based segmentation algorithms, e.g., [2], [35], can be incorporated. Since this process is performed on region adjacency graphs, hierarchical techniques like [36] can provide significant speed-up.
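The grouping in steps 2-4 above can be sketched as follows, assuming the convergence points z_i produced by the filtering step are available as an n x d array; the transitive linking via a small union-find helper is our simplification of the basin-of-attraction concatenation, not the authors' region adjacency graph implementation.

```python
import numpy as np

def segment_from_convergence(Z, shape, hs, hr, M=0):
    """Group converged joint-domain points into clusters (step 2), assign labels (step 3),
    and optionally mark regions smaller than M pixels with -1 (step 4)."""
    n = Z.shape[0]
    parent = np.arange(n)                    # union-find over pixels

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path compression
            i = parent[i]
        return i

    # Step 2: link pixel pairs whose convergence points are closer than hs in the
    # spatial domain and hr in the range domain (brute-force O(n^2), for clarity only).
    for i in range(n):
        for j in range(i + 1, n):
            close_s = np.linalg.norm(Z[i, :2] - Z[j, :2]) < hs
            close_r = np.linalg.norm(Z[i, 2:] - Z[j, 2:]) < hr
            if close_s and close_r:
                parent[find(i)] = find(j)

    roots = np.array([find(i) for i in range(n)])
    _, labels = np.unique(roots, return_inverse=True)        # step 3: labels L_i

    if M > 0:                                                 # step 4: small regions
        counts = np.bincount(labels)
        labels = np.where(counts[labels] >= M, labels, -1)
    return labels.reshape(shape)

# Example with hand-made convergence points for a 2 x 3 image (two flat regions).
Z = np.array([[0, 0, 50], [0, 1, 51], [0, 2, 200],
              [1, 0, 50], [1, 1, 50], [1, 2, 201]], dtype=float)
print(segment_from_convergence(Z, (2, 3), hs=1.5, hr=5))
```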
Fig. 6. MIT image. (a) Original. (b) Segmented (h_s, h_r, M) = (8, 7, 20). (c) Region boundaries.
Fig. 7. Room image. (a) Original. (b) Region boundaries delineated with (h_s, h_r, M) = (8, 5, 20), drawn over the input.
The effect of the cluster delineation step is shown in Fig. 4d. Note the fusion into larger homogeneous regions of the result of filtering shown in Fig. 4c. The segmentation step does not add a significant overhead to the filtering process.

The region representation used by the mean shift segmentation is similar to the blob representation employed in [64]. However, while the blob has a parametric description (multivariate Gaussians in both spatial and color domain), the partition generated by the mean shift is characterized by a nonparametric model. An image region is defined by all the pixels associated with the same mode in the joint domain.

In [43], a nonparametric clustering method is described in which, after kernel density estimation with a small bandwidth, the clusters are delineated through concatenation of the detected modes' neighborhoods. The merging process is based on two intuitive measures capturing the variations in the local density. Being a hierarchical clustering technique, the method is computationally expensive; it takes several minutes in MATLAB to analyze a 2,000 pixel subsample of the feature space. The method is not recommended to be used in the joint domain since the measures employed in the merging process become ineffective. Comparing the results for arbitrarily shaped synthetic data [43, Fig. 6] with a similarly challenging example processed with the mean shift method [12, Fig. 1] shows that the use of a hierarchical approach can be successfully avoided in the nonparametric clustering paradigm.

All the segmentation experiments were performed using uniform kernels. The improvement due to joint space analysis can be seen in Fig. 6, where the 256 × 256 gray-level image MIT was processed with (h_s, h_r, M) = (8, 7, 20). A number of 225 homogeneous regions were identified in fractions of a second, most of them delineating semantically meaningful regions like walls, sky, steps, inscription on the building, etc. Compare the results with the segmentation obtained by one-dimensional clustering of the gray-level values in [11, Fig. 4] or by using a Gibbs random fields-based approach [40, Fig. 7].

The joint domain segmentation of the color 256 × 256 room image presented in Fig. 7 is also satisfactory. Compare this result with the segmentation presented in [38, Figs. 3e and 5c] obtained by recursive thresholding. In both these examples, one can notice that regions in which a small gradient of illumination exists (like the sky in the MIT or the carpet in the room image) were delineated as a single region. Thus, the joint domain mean shift-based segmentation succeeds in overcoming the inherent limitations of methods based only on gray-level or color clustering, which typically oversegment small gradient regions.

The segmentation with (h_s, h_r, M) = (16, 7, 40) of the 512 × 512 color image lake is shown in Fig. 8. Compare this result with that of the multiscale approach in [57, Fig. 11]. Finally, one can compare the contours of the color image hand, (h_s, h_r, M) = (16, 19, 40), presented in Fig. 9 with those from [66, Fig. 15], obtained through a complex global optimization, and from [41, Fig. 4a], obtained with geodesic active contours.
Fig. 8. Lake image. (a) Original. (b) Segmented with (h_s, h_r, M) = (16, 7, 40).
Fig. 9. Hand image. (a) Original. (b) Region boundaries delineated with (h_s, h_r, M) = (16, 19, 40) drawn over the input.
The segmentation is not very sensitive to the choice of the resolution parameters h_s and h_r. Note that all 256 × 256 images used the same h_s = 8, corresponding to a 17 × 17 spatial window, while all 512 × 512 images used h_s = 16, corresponding to a 31 × 31 window. The range parameter h_r and the smallest significant feature size M control the number of regions in the segmented image. The more an image deviates from the assumed piecewise constant model, the larger the values that have to be used for h_r and M to discard the effect of small local variations in the feature space. For example, the heavily textured background in the hand image is compensated by using h_r = 19 and M = 40, values which are much larger than those used for the room image (h_r = 5, M = 20) since the latter better obeys the model. As with any low-level vision algorithm, the quality of the segmentation output can be assessed only in the context of the whole vision task and, thus, the resolution parameters should be chosen according to that criterion. An important advantage of mean shift-based segmentation is its modularity, which makes the control of the segmentation output very simple.

Other segmentation examples in which the original image has the region boundaries superposed are shown in Fig. 10 and in which the original and labeled images are compared in Fig. 11.

As a potential application of the segmentation, we return to the cameraman image. Fig. 12a shows the reconstructed image after the regions corresponding to the sky and grass were manually replaced with white. The mean shift segmentation has been applied with (h_s, h_r, M) = (8, 4, 10). Observe the preservation of the details, which suggests that the algorithm can also be used for image editing, as shown in Fig. 12b.

The code for the discontinuity preserving smoothing and image segmentation algorithms integrated into a single system with graphical interface is available at http://www.caip.rutgers.edu/riul/research/code.html.

5 DISCUSSION

The mean shift-based feature space analysis technique introduced in this paper is a general tool which is not restricted to the two applications discussed here. Since the quality of the output is controlled only by the kernel bandwidth, i.e., the resolution of the analysis, the technique should be also easily integrable into complex vision systems where the control is relinquished to a closed loop process. Additional insights on the bandwidth selection can be obtained by testing the stability of the mean shift direction across the different bandwidths, as investigated in [57] in the case of the force field. The nonparametric toolbox developed in this paper is suitable for a large variety of computer vision tasks where parametric models are less adequate, for example, modeling the background in visual surveillance [18].

The complete solution toward autonomous image segmentation is to combine a bandwidth selection technique (like the ones discussed in Section 3.1) with top-down task-related high-level information. In this case, each mean shift process is associated with a kernel best suited to the local structure of the joint domain.
Fig. 10. Landscape images. All the region boundaries were delineated with (h_s, h_r, M) = (8, 7, 100) and are drawn over the original image.
Several interesting theoretical issues have to be addressed, though, before the benefits of such a data driven approach can be fully exploited. We are currently investigating these issues.

The ability of the mean shift procedure to be attracted by the modes (local maxima) of an underlying density function can be exploited in an optimization framework. Cheng [7] already discusses a simple example. However, by introducing adequate objective functions, the optimization problem can acquire physical meaning in the context of a computer vision task. For example, in [14], by defining the distance between the distributions of the model and a candidate of the target, nonrigid objects were tracked in an image sequence under severe distortions. The distance was defined at every pixel in the region of interest of the new frame and the mean shift procedure was used to find the mode of this measure nearest to the previous location of the target.

The above-mentioned tracking algorithm can be regarded as an example of computer vision techniques which are based on in situ optimization. Under this paradigm, the solution is obtained by using the input domain to define the optimization problem. The in situ optimization is a very powerful method. In [23] and [58], each input data point was associated with a local field (voting kernel) to produce a more dense structure from where the sought information (salient features, the hyperplane representing the fundamental matrix) can be reliably extracted.

The mean shift procedure is not computationally expensive. Careful C++ implementation of the tracking algorithm allowed real time (30 frames/second) processing of the video stream. While it is not clear if the segmentation algorithm described in this paper can be made so fast, given the quality of the region boundaries it provides, it can be used to support edge detection without significant overhead in time.

Kernel density estimation, in particular, and nonparametric techniques, in general, do not scale well with the dimension of the space. This is mostly due to the empty space phenomenon [20, p. 70], [54, p. 93] by which most of the mass in a high-dimensional space is concentrated in a small region of the space. Thus, whenever the feature space has more than (say) six dimensions, the analysis should be approached carefully. Employing projection pursuit, in which the density is analyzed along lower dimensional cuts, e.g., [27], is a possibility.

To conclude, the mean shift procedure is a valuable computational module whose versatility can make it an important component of any computer vision toolbox.

APPENDIX

Proof of Theorem 1. If the kernel K has a convex and monotonically decreasing profile, the sequences $\{y_j\}_{j=1,2,\ldots}$ and $\{\hat{f}_{h,K}(j)\}_{j=1,2,\ldots}$ converge, and $\{\hat{f}_{h,K}(j)\}_{j=1,2,\ldots}$ is monotonically increasing.

Since n is finite, the sequence $\hat{f}_{h,K}$ (21) is bounded; therefore, it is sufficient to show that $\hat{f}_{h,K}$ is strictly monotonic increasing, i.e., if $y_j \neq y_{j+1}$, then

$$\hat{f}_{h,K}(j) < \hat{f}_{h,K}(j+1),$$

for $j = 1, 2, \ldots$. Without loss of generality, it can be assumed that $y_j = 0$ and, thus, from (16) and (21)

$$\hat{f}_{h,K}(j+1) - \hat{f}_{h,K}(j) = \frac{c_{k,d}}{n h^d} \sum_{i=1}^{n} \left[ k\left(\left\|\frac{y_{j+1} - x_i}{h}\right\|^2\right) - k\left(\left\|\frac{x_i}{h}\right\|^2\right) \right]. \qquad (A.1)$$

The convexity of the profile $k(x)$ implies that

$$k(x_2) \geq k(x_1) + k'(x_1)(x_2 - x_1) \qquad (A.2)$$

for all $x_1, x_2 \in [0, \infty)$, $x_1 \neq x_2$, and since $g(x) = -k'(x)$, (A.2) becomes

$$k(x_2) - k(x_1) \geq g(x_1)(x_1 - x_2). \qquad (A.3)$$
Fig. 11. Some other segmentation examples with (h_s, h_r, M) = (8, 7, 20). Left: original. Right: segmented.
Now, using (A.1) and (A.3), we obtain

$$\hat{f}_{h,K}(j+1) - \hat{f}_{h,K}(j) \geq \frac{c_{k,d}}{n h^{d+2}} \sum_{i=1}^{n} g\left(\left\|\frac{x_i}{h}\right\|^2\right) \left[ \|x_i\|^2 - \|y_{j+1} - x_i\|^2 \right]$$
$$= \frac{c_{k,d}}{n h^{d+2}} \sum_{i=1}^{n} g\left(\left\|\frac{x_i}{h}\right\|^2\right) \left[ 2\, y_{j+1}^\top x_i - \|y_{j+1}\|^2 \right]$$
$$= \frac{c_{k,d}}{n h^{d+2}} \left[ 2\, y_{j+1}^\top \sum_{i=1}^{n} x_i\, g\left(\left\|\frac{x_i}{h}\right\|^2\right) - \|y_{j+1}\|^2 \sum_{i=1}^{n} g\left(\left\|\frac{x_i}{h}\right\|^2\right) \right] \qquad (A.4)$$

and, recalling (20), yields

$$\hat{f}_{h,K}(j+1) - \hat{f}_{h,K}(j) \geq \frac{c_{k,d}}{n h^{d+2}}\, \|y_{j+1}\|^2 \sum_{i=1}^{n} g\left(\left\|\frac{x_i}{h}\right\|^2\right). \qquad (A.5)$$

The profile $k(x)$ being monotonically decreasing for all $x \geq 0$, the sum $\sum_{i=1}^{n} g\left(\left\|\frac{x_i}{h}\right\|^2\right)$ is strictly positive. Thus, as long as $y_{j+1} \neq y_j = 0$, the right term of (A.5) is strictly positive, i.e., $\hat{f}_{h,K}(j+1) > \hat{f}_{h,K}(j)$. Consequently, the sequence $\{\hat{f}_{h,K}(j)\}_{j=1,2,\ldots}$ is convergent.

To prove the convergence of the sequence $\{y_j\}_{j=1,2,\ldots}$, (A.5) is rewritten for an arbitrary kernel location $y_j \neq 0$.
Fig. 12. Cameraman image. (a) Segmentation with (h_s, h_r, M) = (8, 4, 10) and reconstruction after the elimination of regions representing sky and grass. (b) Supervised texture insertion.
After some algebra, we have

$$\hat{f}_{h,K}(j+1) - \hat{f}_{h,K}(j) \geq \frac{c_{k,d}}{n h^{d+2}}\, \|y_{j+1} - y_j\|^2 \sum_{i=1}^{n} g\left(\left\|\frac{y_j - x_i}{h}\right\|^2\right). \qquad (A.6)$$

Now, summing the two terms of (A.6) for indices $j, j+1, \ldots, j+m-1$, it results that

$$\hat{f}_{h,K}(j+m) - \hat{f}_{h,K}(j) \geq \frac{c_{k,d}}{n h^{d+2}} \left[ \|y_{j+m} - y_{j+m-1}\|^2 \sum_{i=1}^{n} g\left(\left\|\frac{y_{j+m-1} - x_i}{h}\right\|^2\right) + \ldots + \|y_{j+1} - y_j\|^2 \sum_{i=1}^{n} g\left(\left\|\frac{y_j - x_i}{h}\right\|^2\right) \right]$$
$$\geq \frac{c_{k,d}}{n h^{d+2}} \left[ \|y_{j+m} - y_{j+m-1}\|^2 + \ldots + \|y_{j+1} - y_j\|^2 \right] M$$
$$\geq \frac{c_{k,d}}{n h^{d+2}}\, \|y_{j+m} - y_j\|^2\, M, \qquad (A.7)$$

where M represents the minimum (always strictly positive) of the sum $\sum_{i=1}^{n} g\left(\left\|\frac{y_j - x_i}{h}\right\|^2\right)$ for all $\{y_j\}_{j=1,2,\ldots}$.

Since $\{\hat{f}_{h,K}(j)\}_{j=1,2,\ldots}$ is convergent, it is also a Cauchy sequence. This property in conjunction with (A.7) implies that $\{y_j\}_{j=1,2,\ldots}$ is a Cauchy sequence, hence, it is convergent in the Euclidean space.

Proof of Theorem 2. The cosine of the angle between two consecutive mean shift vectors is strictly positive when a normal kernel is employed.

We can assume, without loss of generality, that $y_j = 0$ and $y_{j+1} \neq y_{j+2} \neq 0$ since, otherwise, convergence has already been achieved. Therefore, the mean shift vector $m_{h,N}(0)$ is

$$m_{h,N}(0) = y_{j+1} = \frac{\sum_{i=1}^{n} x_i \exp\left(-\left\|\frac{x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} \exp\left(-\left\|\frac{x_i}{h}\right\|^2\right)}. \qquad (B.1)$$

We will show first that, when the weights are given by a normal kernel centered at $y_{j+1}$, the weighted sum of the projections of $y_{j+1} - x_i$ onto $y_{j+1}$ is strictly negative, i.e.,

$$\sum_{i=1}^{n} \left( \|y_{j+1}\|^2 - y_{j+1}^\top x_i \right) \exp\left(-\left\|\frac{y_{j+1} - x_i}{h}\right\|^2\right) < 0. \qquad (B.2)$$

The space $R^d$ can be decomposed into the following three domains:

$$D_1 = \left\{ x \in R^d \mid y_{j+1}^\top x \leq \tfrac{1}{2} \|y_{j+1}\|^2 \right\}$$
$$D_2 = \left\{ x \in R^d \mid \tfrac{1}{2} \|y_{j+1}\|^2 < y_{j+1}^\top x \leq \|y_{j+1}\|^2 \right\} \qquad (B.3)$$
$$D_3 = \left\{ x \in R^d \mid \|y_{j+1}\|^2 < y_{j+1}^\top x \right\}$$

and after some simple manipulations from (B.1), we can derive the equality

$$\sum_{x_i \in D_2} \left( \|y_{j+1}\|^2 - y_{j+1}^\top x_i \right) \exp\left(-\left\|\frac{x_i}{h}\right\|^2\right) = \sum_{x_i \in D_1 \cup D_3} \left( y_{j+1}^\top x_i - \|y_{j+1}\|^2 \right) \exp\left(-\left\|\frac{x_i}{h}\right\|^2\right). \qquad (B.4)$$

In addition, for $x \in D_2$, we have $\|y_{j+1}\|^2 - y_{j+1}^\top x \geq 0$, which implies

$$\|y_{j+1} - x_i\|^2 = \|y_{j+1}\|^2 + \|x_i\|^2 - 2\, y_{j+1}^\top x_i \geq \|x_i\|^2 - \|y_{j+1}\|^2, \qquad (B.5)$$

from where

$$\sum_{x_i \in D_2} \left( \|y_{j+1}\|^2 - y_{j+1}^\top x_i \right) \exp\left(-\left\|\frac{y_{j+1} - x_i}{h}\right\|^2\right) \leq \exp\left(\left\|\frac{y_{j+1}}{h}\right\|^2\right) \sum_{x_i \in D_2} \left( \|y_{j+1}\|^2 - y_{j+1}^\top x_i \right) \exp\left(-\left\|\frac{x_i}{h}\right\|^2\right). \qquad (B.6)$$

Now, introducing (B.4) in (B.6), we have

$$\sum_{x_i \in D_2} \left( \|y_{j+1}\|^2 - y_{j+1}^\top x_i \right) \exp\left(-\left\|\frac{y_{j+1} - x_i}{h}\right\|^2\right) \leq \exp\left(\left\|\frac{y_{j+1}}{h}\right\|^2\right) \sum_{x_i \in D_1 \cup D_3} \left( y_{j+1}^\top x_i - \|y_{j+1}\|^2 \right) \exp\left(-\left\|\frac{x_i}{h}\right\|^2\right) \qquad (B.7)$$
and, by adding to both sides of (B.7) the quantity

$$\sum_{x_i \in D_1 \cup D_3} \left( \|y_{j+1}\|^2 - y_{j+1}^\top x_i \right) \exp\left(-\left\|\frac{y_{j+1} - x_i}{h}\right\|^2\right),$$

after some algebra, it results that

$$\sum_{i=1}^{n} \left( \|y_{j+1}\|^2 - y_{j+1}^\top x_i \right) \exp\left(-\left\|\frac{y_{j+1} - x_i}{h}\right\|^2\right) \leq \exp\left(\left\|\frac{y_{j+1}}{h}\right\|^2\right) \sum_{x_i \in D_1 \cup D_3} \left( \|y_{j+1}\|^2 - y_{j+1}^\top x_i \right) \exp\left(-\left\|\frac{x_i}{h}\right\|^2\right) \left[ \exp\left(-\frac{2}{h^2}\left( \|y_{j+1}\|^2 - y_{j+1}^\top x_i \right)\right) - 1 \right]. \qquad (B.8)$$

The right side of (B.8) is negative because $\left( \|y_{j+1}\|^2 - y_{j+1}^\top x_i \right)$ and the last product term have opposite signs in both the $D_1$ and $D_3$ domains. Therefore, the left side of (B.8) is also negative, which proves (B.2).

We can use now (B.2) to write

$$\|y_{j+1}\|^2 < y_{j+1}^\top \frac{\sum_{i=1}^{n} x_i \exp\left(-\left\|\frac{y_{j+1} - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} \exp\left(-\left\|\frac{y_{j+1} - x_i}{h}\right\|^2\right)} = y_{j+1}^\top y_{j+2}, \qquad (B.9)$$

from where

$$\frac{y_{j+1}^\top (y_{j+2} - y_{j+1})}{\|y_{j+1}\|\, \|y_{j+2} - y_{j+1}\|} > 0 \qquad (B.10)$$

or, by taking into account (24),

$$\frac{m_{h,N}(y_j)^\top m_{h,N}(y_{j+1})}{\|m_{h,N}(y_j)\|\, \|m_{h,N}(y_{j+1})\|} > 0.$$

ACKNOWLEDGMENTS

The support of the US National Science Foundation under grants IRI 95-30546 and IRI 99-87695 is gratefully acknowledged. Preliminary versions for some parts of the material were presented in [13] and [14]. The authors would like to thank John Kent from the University of Leeds and David Tyler of Rutgers for discussions about the relation between the mean shift procedure and M-estimators.

REFERENCES

[1] G. Aggarwal, S. Ghosal, and P. Dubey, "Efficient Query Modification for Image Retrieval," Proc. 2000 IEEE Conf. Computer Vision and Pattern Recognition, vol. II, pp. 255-261, June 2000.
[2] R. Bajcsy, S.W. Lee, and A. Leonardis, "Detection of Diffuse and Specular Interface Reflections and Inter-Reflections by Color Image Segmentation," Int'l J. Computer Vision, vol. 17, pp. 241-272, 1996.
[3] D. Barash, "Bilateral Filtering and Anisotropic Diffusion: Towards a Unified Viewpoint," IEEE Trans. Pattern Analysis and Machine Intelligence, to appear.
[4] D.P. Bertsekas, Nonlinear Programming. Athena Scientific, 1995.
[5] M.J. Black, G. Sapiro, D.H. Marimont, and D. Heeger, "Robust Anisotropic Diffusion," IEEE Trans. Image Processing, vol. 7, pp. 421-432, 1998.
[6] G.R. Bradski, "Computer Vision Face Tracking as a Component of a Perceptual User Interface," Proc. IEEE Workshop Applications of Computer Vision, pp. 214-219, Oct. 1998.
[7] Y. Cheng, "Mean Shift, Mode Seeking, and Clustering," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, no. 8, pp. 790-799, Aug. 1995.
[8] E. Choi and P. Hall, "Data Sharpening as a Prelude to Density Estimation," Biometrika, vol. 86, pp. 941-947, 1999.
[9] C.K. Chu, I.K. Glad, F. Godtliebsen, and J.S. Maron, "Edge-Preserving Smoothers for Image Processing," J. Am. Statistical Assoc., vol. 93, pp. 526-541, 1998.
[10] D. Comaniciu, "Nonparametric Robust Methods for Computer Vision," PhD thesis, Dept. of Electrical and Computer Eng., Rutgers Univ., 1999. Available at http://www.caip.rutgers.edu/riul/research/theses.html.
[11] D. Comaniciu and P. Meer, "Robust Analysis of Feature Spaces: Color Image Segmentation," Proc. 1997 IEEE Conf. Computer Vision and Pattern Recognition, pp. 750-755, June 1997.
[12] D. Comaniciu and P. Meer, "Distribution Free Decomposition of Multivariate Data," Pattern Analysis and Applications, vol. 2, pp. 22-30, 1999.
[13] D. Comaniciu and P. Meer, "Mean Shift Analysis and Applications," Proc. Seventh Int'l Conf. Computer Vision, pp. 1197-1203, Sept. 1999.
[14] D. Comaniciu, V. Ramesh, and P. Meer, "Real-Time Tracking of Non-Rigid Objects Using Mean Shift," Proc. 2000 IEEE Conf. Computer Vision and Pattern Recognition, vol. II, pp. 142-149, June 2000.
[15] D. Comaniciu, V. Ramesh, and P. Meer, "The Variable Bandwidth Mean Shift and Data-Driven Scale Selection," Proc. Eighth Int'l Conf. Computer Vision, vol. I, pp. 438-445, July 2001.
[16] C. Connolly, "The Relationship between Colour Metrics and the Appearance of Three-Dimensional Coloured Objects," Color Research and Applications, vol. 21, pp. 331-337, 1996.
[17] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis. Wiley, 1973.
[18] A. Elgammal, D. Harwood, and L. Davis, "Non-Parametric Model for Background Subtraction," Proc. Sixth European Conf. Computer Vision, vol. II, pp. 751-767, June 2000.
[19] B. Fischl and E.L. Schwartz, "Adaptive Nonlocal Filtering: A Fast Alternative to Anisotropic Diffusion for Image Enhancement," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 21, no. 1, pp. 42-48, Jan. 1999.
[20] K. Fukunaga, Introduction to Statistical Pattern Recognition, second ed. Academic Press, 1990.
[21] K. Fukunaga and L.D. Hostetler, "The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition," IEEE Trans. Information Theory, vol. 21, pp. 32-40, 1975.
[22] J. Guo, J. Kim, and C. Kuo, "Fast and Accurate Moving Object Extraction Technique for MPEG-4 Object Based Video Coding," Proc. SPIE Visual Comm. and Image Processing, vol. 3653, pp. 1210-1221, 1999.
[23] G. Guy and G. Medioni, "Inference of Surfaces, 3D Curves, and Junctions from Sparse, Noisy, 3D Data," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, pp. 1265-1277, 1997.
[24] W. Härdle, Applied Nonparametric Regression. Cambridge Univ. Press, 1991.
[25] M. Herbin, N. Bonnet, and P. Vautrot, "A Clustering Method Based on the Estimation of the Probability Density Function and on the Skeleton by Influence Zones," Pattern Recognition Letters, vol. 17, pp. 1141-1150, 1996.
[26] P.J. Huber, Robust Statistical Procedures, second ed. SIAM, 1996.
[27] J.N. Hwang, S.R. Lay, and A. Lippman, "Nonparametric Multivariate Density Estimation: A Comparative Study," IEEE Trans. Signal Processing, vol. 42, pp. 2795-2810, 1994.
[28] A.K. Jain and R.C. Dubes, Algorithms for Clustering Data. Prentice Hall, 1988.
[29] A.K. Jain, R.P.W. Duin, and J. Mao, "Statistical Pattern Recognition: A Review," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, Jan. 2000.
[30] L. Kauffman and P. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. J. Wiley & Sons, 1990.
[31] D.T. Kuan, A.A. Sawchuk, T.C. Strand, and P. Chavel, "Adaptive Noise Smoothing Filter for Images with Signal Dependent Noise," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 7, no. 2, pp. 165-177, Mar. 1985.
[32] G. Li, "Robust Regression," Exploring Data Tables, Trends, and Shapes, D.C. Hoaglin, F. Mosteller, and J.W. Tukey, eds., pp. 281-343. Wiley, 1985.
[33] L. Liu and S. Sclaroff, "Deformable Shape Detection and Description via Model-Based Region Grouping," Proc. 1999 IEEE Conf. Computer Vision and Pattern Recognition, vol. II, pp. 21-27, June 1999.
[34] W.Y. Ma and B.S. Manjunath, "Edge Flow: A Framework of Boundary Detection and Image Segmentation," IEEE Trans. Image Processing, vol. 9, pp. 1375-1388, 2000.
[35] B.A. Maxwell and S.S. Shafer, "Segmentation and Interpretation of Multicolored Objects with Highlights," Computer Vision and Image Understanding, vol. 77, pp. 1-24, 2000.
[36] A. Montanvert, P. Meer, and A. Rosenfeld, "Hierarchical Image Analysis Using Irregular Tessellation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, no. 4, pp. 307-316, Apr. 1991.
[37] J. Monteil and A. Beghdadi, "A New Interpretation and Improvement of Nonlinear Anisotropic Diffusion for Image Enhancement," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 21, no. 9, pp. 940-946, Sept. 1999.
[38] Y. Ohta, T. Kanade, and T. Sakai, "Color Information for Region Segmentation," Computer Graphics and Image Processing, vol. 13, pp. 222-241, 1980.
[39] J. Pan, F. McInnes, and M. Jack, "Fast Clustering Algorithms for Vector Quantization," Pattern Recognition, vol. 29, pp. 511-518, 1996.
[40] T.N. Pappas, "An Adaptive Clustering Algorithm for Image Segmentation," IEEE Trans. Signal Processing, vol. 40, pp. 901-914, 1992.
[41] N. Paragios and R. Deriche, "Geodesic Active Contours for Supervised Texture Segmentation," Proc. 1999 IEEE Conf. Computer Vision and Pattern Recognition, vol. II, pp. 422-427, June 1999.
[42] B. Park and J. Marron, "Comparison of Data-Driven Bandwidth Selectors," J. Am. Statistical Assoc., vol. 85, pp. 66-72, 1990.
[43] E.J. Pauwels and G. Frederix, "Finding Salient Regions in Images," Computer Vision and Image Understanding, vol. 75, pp. 73-85, 1999.
[44] P. Perona and J. Malik, "Scale-Space and Edge Detection Using Anisotropic Diffusion," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, no. 7, pp. 629-639, July 1990.
[45] K. Popat and R.W. Picard, "Cluster-Based Probability Model and Its Application to Image and Texture Processing," IEEE Trans. Image Processing, vol. 6, pp. 268-284, 1997.
[46] W.K. Pratt, Digital Image Processing, second ed. Wiley, 1991.
[47] D. Ridder, J. Kittler, O. Lemmers, and R. Duin, "The Adaptive Subspace Map for Texture Segmentation," Proc. 2000 Int'l Conf. Pattern Recognition, pp. 216-220, Sept. 2000.
[48] T. Risse, "Hough Transform for Line Recognition: Complexity of Evidence Accumulation and Cluster Detection," Computer Vision Graphics and Image Processing, vol. 46, pp. 327-345, 1989.
[49] S.J. Roberts, "Parametric and Non-Parametric Unsupervised Cluster Analysis," Pattern Recognition, vol. 30, pp. 261-272, 1997.
[50] P. Saint-Marc, J.S. Chen, and G. Medioni, "Adaptive Smoothing: A General Tool for Early Vision," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, no. 6, pp. 514-529, June 1991.
[51] D.W. Scott, Multivariate Density Estimation. Wiley, 1992.
[52] R. Sedgewick, Algorithms in C++. Addison-Wesley, 1992.
[53] S. Sheather and M. Jones, "A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation," J. Royal Statistics Soc. B, vol. 53, pp. 683-690, 1991.
[54] B.W. Silverman, Density Estimation for Statistics and Data Analysis. Chapman and Hall, 1986.
[55] J. Simonoff, Smoothing Methods in Statistics. Springer-Verlag, 1996.
[56] Special Issue on Partial Differential Equations and Geometry-Driven Diffusion in Image Processing and Analysis, IEEE Trans. Image Processing, vol. 7, Mar. 1998.
[57] M. Tabb and N. Ahuja, "Multiscale Image Segmentation by Integrated Edge and Region Detection," IEEE Trans. Image Processing, vol. 6, pp. 642-655, 1997.
[58] C.K. Tang, G. Medioni, and M.S. Lee, "Epipolar Geometry Estimation by Tensor Voting in 8D," Proc. Seventh Int'l Conf. Computer Vision, vol. I, pp. 502-509, Sept. 1999.
[59] C. Tomasi and R. Manduchi, "Bilateral Filtering for Gray and Color Images," Proc. Sixth Int'l Conf. Computer Vision, pp. 839-846, Jan. 1998.
[60] A. Touzani and J.G. Postaire, "Clustering by Mode Boundary Detection," Pattern Recognition Letters, vol. 9, pp. 1-12, 1989.
[61] T. Tuytelaars, L. Van Gool, M. Proesmans, and T. Moons, "The Cascaded Hough Transform as an Aid in Aerial Image Interpretation," Proc. Sixth Int'l Conf. Computer Vision, pp. 67-72, Jan. 1998.
[62] M.P. Wand and M. Jones, Kernel Smoothing. Chapman and Hall, 1995.
[63] R. Wilson and M. Spann, "A New Approach to Clustering," Pattern Recognition, vol. 23, pp. 1413-1425, 1990.
[64] C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland, "Pfinder: Real-Time Tracking of the Human Body," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 780-785, July 1997.
[65] G. Wyszecki and W.S. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae, second ed. Wiley, 1982.
[66] S.C. Zhu and A. Yuille, "Region Competition: Unifying Snakes, Region Growing, and Bayes/MDL for Multiband Image Segmentation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 9, pp. 884-900, Sept. 1996.
[67] X. Zhuang, Y. Huang, K. Palaniappan, and Y. Zhao, "Gaussian Mixture Density Modeling, Decomposition, and Applications," IEEE Trans. Image Processing, vol. 5, pp. 1293-1302, 1996.

Dorin Comaniciu received the Dipl. Engn. and PhD degrees in electronics from the Polytechnic University of Bucharest in 1988 and 1995 and the PhD degree in electrical engineering from Rutgers University in 1999. From 1988 to 1990, he was with ICE Felix Computers in Bucharest. In 1991, he joined the Department of Electronics and Telecommunications at the Polytechnic University of Bucharest and he held research appointments in Germany and France. From 1996 to 1999, he was with the Center for Advanced Information Processing associated with Rutgers University. Since 1999, he has been a member of the technical staff at Siemens Corporate Research in Princeton, New Jersey. His research interests include robust methods for autonomous computer vision, nonparametric analysis, real-time vision systems, video surveillance, content-based access to visual data, and data compression. He has coauthored numerous papers, conference papers, and book chapters in the area of visual information processing. He received the Best Paper Award at the IEEE Conference Computer Vision and Pattern Recognition 2000. He is a member of the IEEE.

Peter Meer received the Dipl. Engn. degree from the Bucharest Polytechnic Institute, Bucharest, Romania, in 1971, and the DSc degree from the Technion, Israel Institute of Technology, Haifa, Israel, in 1986, both in electrical engineering. From 1971 to 1979, he was with the Computer Research Institute, Cluj, Romania, working on research and development of digital hardware. From 1986 to 1990, he was an assistant research scientist at the Center for Automation Research, University of Maryland at College Park. In 1991, he joined the Department of Electrical and Computer Engineering, Rutgers University, Piscataway, New Jersey, where he is currently an associate professor. He has held visiting appointments in Japan, Korea, Sweden, Israel, and France and was on the organizing committees of numerous international workshops and conferences. He is an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence, a member of the editorial board of Pattern Recognition, and he was a guest editor of Computer Vision and Image Understanding for the April 2000 special issue on "Robust Statistical Techniques in Image Understanding." He is coauthor of an award winning paper in Pattern Recognition in 1989, the best student paper in 1999, and the best paper in the 2000 IEEE Conference Computer Vision and Pattern Recognition. His research interest is in application of modern statistical methods to image understanding problems. He is a senior member of the IEEE.

For more information on this or any other computing topic, please visit our Digital Library at http://computer.org/publications/dlib.