A Variational Approach To Joint Denoising, Edge Detection and Motion Estimation
1 Introduction
The task of motion estimation is a fundamental problem in computer vision. In
low-level image processing, the accurate computation of object motion in scenes
is a long standing problem, which has been addressed extensively. In particular,
global variational approaches initiated by the work of Horn and Schunck [1] are
increasingly popular. Initial problems such as the smoothing over discontinuities
or the high computational cost have been resolved successfully [2,3,4]. Motion
also poses an important cue for object detection and recognition. While a number
of techniques first estimate the motion field and segment objects later in a second
phase [5], an approach that computes motion and segments objects
simultaneously is much more appealing. First advances in this direction were
investigated in [6,7,8,9,10,11].
The idea of combining different image processing tasks into a single model
in order to cope with interdependencies has drawn attention in several different
fields. In image registration, for instance, a joint discontinuity approach for si-
multaneous registration, segmentation and image restoration has been proposed
by Droske & Ring [12] and extended in [13] incorporating phase field approxi-
mations. Yezzi, Zöllei and Kapur [14] and Unal et al. [15] have combined seg-
mentation and registration applying geodesic active contours described by level
sets in both images. Vemuri et al. have also used a level set technique to exploit
a reference segmentation in an atlas [16]. We refer to [17] for further references.
Cremers and Soatto [18,19] presented an approach for joint motion estima-
tion and motion segmentation with one functional. Incorporating results from
Bayesian inference, they derived an energy functional, which can be seen as an
extension to the well-known Mumford–Shah [20] approach. Their functional in-
volves the length of boundaries separating regions of different motion as well as
a “fidelity-term” for the optical-flow assumption. Our approach is particularly
motivated by their investigations: it resolves the drawback of detecting edges
within a parametric model by adopting a non-parametric approach.
Recently, highly accurate motion estimation [21] has been extended to contour-
based segmentation [22] following a well known segmentation scheme [23]. The
authors demonstrate that extending the motion estimator to edge detection in a
variational framework leads to an increase in accuracy. However, as opposed to
our framework, theirs does not include image denoising. Including a denoising
functional together with motion estimation in a variational framework has been
achieved by [24]. They report significant increases in the accuracy of motion
estimation, particularly on noisy image sequences. However, edges are not
detected; instead, errors from smoothing over discontinuities are lessened by
formulating the smoothness constraint in an L1 metric.
We present the first approach combining motion estimation, image denoising
and edge detection in a single variational framework. This step allows us to
produce more accurate motion estimates while detecting edges at the same time
and preventing any smoothing across them.
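For reference, the motion fidelity term rests on the classical brightness-constancy assumption; in the space-time notation used for the optical flow term later in the paper, and with our own sign and ordering conventions, it reads:

```latex
% Brightness constancy: image intensity is conserved along motion trajectories.
% With w = (1, v) collecting the temporal and spatial velocity components,
% the linearized optical-flow constraint is
\partial_t u + v \cdot \nabla_x u \;=\; \nabla_{(t,x)} u \cdot w \;=\; 0 .
```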
The combination of denoising and edge detection with the estimation of mo-
tion results in an energy functional incorporating fidelity- and smoothness-terms
for both the image and the flow field. Moreover, we incorporate an anisotropic
enhancement of the flow along the edges of the image in the sense of Nagel
and Enkelmann [2]. The model is implemented using the phase-field approximation
in the spirit of Ambrosio and Tortorelli's [25] approach for the original
Mumford–Shah functional. The identification of edges is phrased in terms of a
phase field function; no a-priori knowledge of objects is required, as opposed to
formulations with explicit contours. In contrast to a level set approach, the built-in
multi-scale enables a robust and efficient computation and no initial guess for
the edge set is required. We present here a truly d + 1 dimensional algorithm,
considering time as an additional dimension to the d-dimensional image data.
This fully demonstrates the conceptual advantages of the joint approach. The
characteristics of our approach are:
The first and second terms of the energy are fidelity terms with respect to the
image intensity and the regular part of the optical-flow constraint, respectively.
The third and fourth terms encode the smoothness requirements on u and w.
Finally, the last term represents the area of the edge surfaces S, parameterized
by the phase field φ. The projection operator P [φ] couples the smoothness of the
motion field w to the image geometry:
\[
P[\phi] \;=\; \alpha(\phi^2)\,\mathbb{1} \;-\; \beta(\phi^2)\,
\frac{\nabla_{(t,x)}\phi}{|\nabla_{(t,x)}\phi|} \otimes
\frac{\nabla_{(t,x)}\phi}{|\nabla_{(t,x)}\phi|}\,.
\]
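To illustrate, evaluating such a projection at a single grid point can be sketched as follows (a minimal sketch under our own naming; the guard against vanishing gradients is our addition, and α and β are passed as already-evaluated scalars rather than as functions of φ²):

```python
import numpy as np

def projection(grad_phi, alpha, beta, eps=1e-12):
    """Anisotropic projection P[phi] = alpha * I - beta * n (x) n,
    where n is the normalized space-time gradient of the phase field."""
    g = np.asarray(grad_phi, dtype=float)
    norm = np.linalg.norm(g)
    # Guard: where the phase field is flat there is no edge direction,
    # and the projection degenerates to isotropic smoothing alpha * I.
    n = g / norm if norm > eps else np.zeros_like(g)
    d = g.size
    return alpha * np.eye(d) - beta * np.outer(n, n)

# Along the edge normal the smoothing weight drops to alpha - beta,
# while tangentially it stays alpha: smoothing is damped across edges
# but preserved along them.
P = projection([0.0, 1.0, 0.0], alpha=1.0, beta=0.9)
```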
hexahedral grid. In the following, the spatial and temporal grid cell sizes are de-
noted by h and τ respectively, i.e. image frames are at a distance of τ and pixels
of each frame are sampled on a regular mesh with cell size h. To avoid tri-linear
interpolation problems, we subdivide each hexahedral cell into 6 tetrahedra. On
this tetrahedral grid, we consider the space of piecewise affine, continuous func-
tions V and ask for discrete functions U, Φ ∈ V and V ∈ V 2 , such that the
discrete and weak counterparts of the Euler Lagrange equations (5), (6) and (7)
are fulfilled. This leads to solving systems of linear equations for the vectors of
the nodal values of the unknowns U, Φ, V . Using an efficient custom-designed
compressed row sparse matrix storage, we can treat datasets of up to K = 10
frames of N = 500, M = 320 pixels in less than 1GB memory. The linear systems
of equations are solved applying a classical conjugate gradient method. For the
pedestrian sequence (Fig. 5), one such iteration takes 47 seconds on a Pentium
IV PC at 1.8 GHz running Linux. The complete method converges after 2 or
3 such iterations. Large video sequences are computed by shifting a window of
K = 6 frames successively in time. Thus temporal boundary effects are avoided.
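The inner linear solves can be sketched with a textbook conjugate gradient iteration (a generic sketch under our own naming, not the paper's implementation; A stands in for the assembled sparse finite-element system matrix):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-8, max_iter=1000):
    """Classical CG for a symmetric positive definite system A x = b.
    A may be any object supporting the @ operator, e.g. a sparse matrix."""
    x = np.zeros_like(b) if x0 is None else x0.astype(float)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)      # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:  # residual small enough: converged
            break
        p = r + (rs_new / rs) * p  # new conjugate direction
        rs = rs_new
    return x
```

In practice one would use a preconditioned variant and a compressed-row sparse matrix, as the text describes, but the iteration structure is the same.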
We present here several results of the proposed method for two dimensional
image sequences. In the considered examples, the parameter setting ε = h/4,
µ_u = h^{-2}, µ_w = λ_u = 1, λ_w = 10^5 h^{-2}, C(ε) = ε and δ = ε has proven
to give good results.
We first consider a simple example of a white disk moving with constant speed
v = (1, 1) on a black background (Fig. 1). A small amount of smoothing results
from the regularization energy Ereg,u (Fig. 1(b)), which is desirable to ensure
robustness in the resulting optical flow term ∇(t,x) u·w and removes noisy artifacts
in real-world videos, e.g. Fig. 4 and Fig. 5. The phase field clearly captures the
moving object’s contour. The optical flow is depicted in Fig. 1(c) by color coding
Fig. 2. Noisy circle sequence: From top to bottom, frames 3 and 9–11 are
shown. (a) original image sequence, (b) smoothed images, (c) phase field, (d)
estimated motion (color coded)
the vector directions as shown by the lower-right color wheel. Clearly, the method
is able to extract the uniform motion of the disc. The optical flow information,
available only on the motion edges (black in Fig. 1(c)), is propagated into the
information-less area inside the moving disk, yielding the final result.
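The color coding used in these figures can be sketched as follows (a minimal illustration with hypothetical function names; direction maps to hue as on the color wheel, magnitude to saturation):

```python
import colorsys
import math

def flow_to_rgb(wx, wy, max_mag=1.0):
    """Map a 2D flow vector to an RGB color: the vector's direction
    selects the hue, its magnitude the saturation."""
    angle = math.atan2(wy, wx)               # direction in [-pi, pi]
    hue = (angle / (2.0 * math.pi)) % 1.0    # wrap onto the color wheel [0, 1)
    mag = min(math.hypot(wx, wy) / max_mag, 1.0)
    return colorsys.hsv_to_rgb(hue, mag, 1.0)
```

Zero flow thus renders as white, and vectors of equal direction share a hue regardless of position, which is what makes the uniform motion of the disk appear as one solid color.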
In the next example, we revisit the simple moving circle sequence, but add
noise to it. We also completely destroy the information of frame 10 in the se-
quence (Fig. 2). Figure 2 shows the results for frames 3 and 9–11. We see that
the phase field detects the missing circle in the destroyed frame as a temporal
edge surface in the sequence, i.e. φ drops to zero in the temporal vicinity of
the destroyed frame. This is still visible in the previous and next frames, shown
in the second and third row. However, this does not hamper the restoration of
the correct optical flow field, shown in the fourth column. This result is due to
the anisotropic smoothing of information from the frames close to the destroyed
frame. For this example, we used ε = 0.4h.
A second synthetic example is shown in Fig. 3, using data from the publicly
available collection at [29]. Here, a textured sphere spins on a textured back-
ground (Fig. 3(a)). Again, our method is able to clearly segment the moving
Fig. 3. Rotating sphere: smoothed image (a), phase field (b), optical flow (color
coded) (c), optical flow (vector plot, color coded magnitude) (d)
Fig. 4. Taxi sequence: smoothed image (a), phase field (b), and flow field (c)
object from the background, even though the object doesn’t change position.
We used a phase field parameter ε = 0.15h. The extracted optical flow clearly
shows the spinning motion (Fig. 3(d)) and the discontinuous motion field.
We next consider a well-known real video sequence, the so-called Hamburg taxi
sequence. Figure 4 shows the smoothed image (u), phase field φ and color-coded
optical flow field (w). Our method detects the image edges well (Fig. 4 b).
Also, the upper-left rotating motion of the central car is extracted accurately
(Fig. 4 c). As it should be, the edges of the stationary objects, clearly visible
in the phase field, do not contribute to the optical flow. Moreover, the moving
car is segmented as one single object in the optical flow field: the motion
information is extended from the moving edges, i.e. the car and windscreen
contours, to the whole moving shape.
Finally, we consider a complex video sequence, taken under outdoor condi-
tions by a monochrome video camera. The sequence shows a group of walking
pedestrians (Fig. 5 (top)). The human silhouettes are well extracted and cap-
tured by the phase field (Fig. 5(middle)). We do not display a vector plot of the
optical flow, as it is hard to interpret it visually at the video sequence resolution
of 640 by 480 pixels. However, the color-coded optical flow plot (Fig. 5(bottom))
shows how the method is able to extract the moving limbs of the pedestrians.
The overall red and blue color corresponds to the walking directions of the
pedestrians. The estimated motion is smooth inside the areas of the individual
pedestrians and not smeared across the motion boundaries. In addition, the
algorithm nicely segments the different moving persons. Neither the cluttered
background nor the edges of occluding and overlapping pedestrians, who move
at almost the same speed, pose a problem for the segmentation.
Fig. 5. Pedestrian video: frames from original sequence (top); phase field (mid-
dle); optical flow, color coded (bottom)
References
1. Horn, B.K.P., Schunck, B.G.: Determining optical flow. Artificial Intelligence 17
(1981) 185–204
2. Nagel, H.H., Enkelmann, W.: An investigation of smoothness constraints for the
estimation of displacement vector fields from image sequences. IEEE Trans. on
PAMI 8(5) (1986) 565–593
3. Weickert, J., Schnörr, C.: A theoretical framework for convex regularizers in pde-
based computation of image motion. Int. J. of Comp. Vision 45(3) (2001) 245–264
4. Bruhn, A., Weickert, J., Feddern, C., Kohlberger, T., Schnörr, C.: Real-time optical
flow computation with variational methods. In Petkov, N., Westenberg, M.A., eds.:
CAIP 2003. Volume 2756 of LNCS., Springer (2003) 222–229
5. Wang, J.Y.A., Adelson, E.H.: Representing moving images with layers. IEEE
Trans. on Im. Proc. 3(5) (1994) 625–638
6. Schnörr, C.: Segmentation of visual motion by minimizing convex non-quadratic
functionals. In: 12th ICPR. (1994)
7. Odobez, J.M., Bouthemy, P.: Robust multiresolution estimation of parametric
motion models. J. of Vis. Comm. and Image Rep. 6(4) (1995) 348–365
8. Odobez, J.M., Bouthemy, P.: Direct incremental model-based image motion seg-
mentation for video analysis. Sig. Proc. 66 (1998) 143–155
9. Caselles, V., Coll, B.: Snakes in movement. SIAM J. Num. An. 33 (1996) 2445–
2456
10. Memin, E., Perez, P.: A multigrid approach for hierarchical motion estimation. In:
ICCV. (1998) 933–938
11. Paragios, N., Deriche, R.: Geodesic active contours and level sets for the detection
and tracking of moving objects. IEEE Trans. on PAMI 22(3) (2000) 266–280
12. Droske, M., Ring, W.: A Mumford-Shah level-set approach for geometric image
registration. SIAM Appl. Math. (2005) to appear.
13. Authors: Mumford-Shah based registration. Computing and Visualization in Sci-
ence (2005) submitted.
14. Yezzi, A., Zöllei, L., Kapur, T.: A variational framework for joint segmentation
and registration. IEEE CVPR (2001) 44–51
15. Unal, G., Slabaugh, G., Yezzi, A., Tyan, J.: Joint segmentation and non-rigid
registration without shape priors. (2004)
16. Vemuri, B., Ye, J., Chen, Y., Leonard, C.: Image registration via level-set motion:
Applications to atlas-based segmentation. Med. Im. Analysis 7 (2003) 1–20
17. Davatzikos, C.A., Bryan, R.N., Prince, J.L.: Image registration based on boundary
mapping. IEEE Trans. Med. Imaging 15(1) (1996) 112–115
18. Cremers, D., Soatto, S.: Motion competition: A variational framework for piecewise
parametric motion segmentation. Int. J. of Comp. Vision 62(3) (2005) 249–265
19. Cremers, D., Kohlberger, T., Schnörr, C.: Nonlinear shape statistics in Mumford-
Shah based segmentation. In: 7th ECCV. Volume 2351 of LNCS. (2002) 93–108
20. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and
associated variational problems. Comm. Pure Appl. Math. 42 (1989) 577–685
21. Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow
estimation based on a theory for warping. In Pajdla, T., Matas, J., eds.: Proc. of
the 8th ECCV. Volume 3024 of LNCS. (2004) 25–36
22. Amiaz, T., Kiryati, N.: Dense discontinuous optical flow via contour-based seg-
mentation. In: Proc. ICIP 2005. Volume III. (2005) 1264–1267
23. Vese, L., Chan, T.: A multiphase level set framework for image segmentation using
the Mumford and Shah model. Int. J. Computer Vision 50 (2002) 271–293
24. Nir, T., Kimmel, R., Bruckstein, A.: Variational approach for joint optic-flow
computation and video restoration. Technical report, Dep. of C. S. - Israel Inst. of
Tech., Haifa, Israel (2005)
25. Ambrosio, L., Tortorelli, V.M.: On the approximation of free discontinuity prob-
lems. Boll. Un. Mat. Ital. B 6(7) (1992) 105–123
26. Ambrosio, L., Fusco, N., Pallara, D.: Functions of bounded variation and free
discontinuity problems. Oxford University Press (2000)
27. Bourdin, B.: Image segmentation with a Finite Element method. ESAIM: Math.
Modelling and Num. Analysis 33(2) (1999) 229–244
28. Bourdin, B., Chambolle, A.: Implementation of an adaptive Finite-Element ap-
proximation of the Mumford-Shah functional. Numer. Math. 85(4) (2000) 609–646
29. Computer Vision Research Group: Optical flow datasets. Univ. of Otago, New
Zealand, www.cs.otago.ac.nz/research/vision (2005)