Hierarchical Model-Based Motion Estimation
David Sarnoff Research Center, Princeton NJ 08544, USA
1 Introduction
A large body of work in computer vision over the last 10 or 15 years has been
concerned with the extraction of motion information from image sequences. The motivation
of this work is quite diverse, with intended applications ranging from data
compression to pattern recognition (alignment strategies) to robotics and vehicle navigation.
In tandem with this diversity of motivation is a diversity of representation of motion
information: from optical flow, to affine or other parametric transformations, to 3-d
egomotion plus range or other structure. The purpose of this paper is to describe a common
framework within which all of these computations can be represented.
This unification is possible because all of these problems can be viewed from the
perspective of image registration. That is, given an image sequence, compute a
representation of motion that best aligns pixels in one frame of the sequence with those in
the next. The differences among the various approaches mentioned above can then be
expressed as different parametric representations of the alignment process. In all cases
the function minimized is the same; the difference lies in the fact that it is minimized
with respect to different parameters.
The key features of the resulting framework (or family of algorithms) are a global
model that constrains the overall structure of the motion estimated, a local model that is
used in the estimation process, and a coarse-fine refinement strategy. An example of a
global model is the rigidity constraint; an example of a local model is that displacement
is constant over a patch. Coarse-fine refinement or hierarchical estimation is included in
this framework for reasons that go well beyond the conventional ones of computational
efficiency. Its utility derives from the nature of the objective function common to the
various motion models.
[8] and the advantages of using parametric models within such a framework have also
been discussed in [5].
Arguments for use of hierarchical (i.e. pyramid based) estimation techniques for
motion estimation have usually focused on issues of computational efficiency. A matching
process that must accommodate large displacements can be very expensive to compute.
Simple intuition suggests that if large displacements can be computed using low
resolution image information great savings in computation will be achieved. Higher resolution
information can then be used to improve the accuracy of displacement estimation by
incrementally estimating small displacements (see, for example, [2]). However, it can also
be argued that it is not only efficient to ignore high resolution image information when
computing large displacements, in a sense it is necessary to do so. This is because of
aliasing of high spatial frequency components undergoing large motion. Aliasing is the
source of false matches in correspondence solutions or (equivalently) local minima in the
objective function used for minimization. Minimization or matching in a multiresolution
framework helps to eliminate problems of this type. Another way of expressing this is
to say that many sources of non-convexity that complicate the matching process are not
stable with respect to scale.
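The efficiency half of this argument can be illustrated with a minimal sketch, which is not the implementation used in this paper. It assumes a 1-D signal, a pure translation, and a crude pair-averaging filter standing in for a proper pyramid filter; all function names are hypothetical. A displacement of 8 samples is recovered while searching only three candidate shifts per level.

```python
import math

def reduce_level(signal):
    """Halve the resolution by averaging adjacent pairs (crude low-pass + subsample)."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]

def build_pyramid(signal, levels):
    """Return [finest, ..., coarsest] after `levels` reductions."""
    pyramid = [list(signal)]
    for _ in range(levels):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid

def mean_sq_diff(a, b, shift):
    """Mean of (a[i] - b[i + shift])**2 over the samples where both exist."""
    pairs = [(a[i], b[i + shift]) for i in range(len(a)) if 0 <= i + shift < len(b)]
    return sum((p - q) ** 2 for p, q in pairs) / len(pairs)

def coarse_to_fine_shift(a, b, levels, radius=1):
    """Estimate the shift d with b[i + d] ~ a[i], refining from coarse to fine."""
    pyr_a, pyr_b = build_pyramid(a, levels), build_pyramid(b, levels)
    shift = 0
    for level in range(levels, -1, -1):
        shift *= 2  # a shift of s samples at one level is 2*s at the next finer level
        candidates = range(shift - radius, shift + radius + 1)
        shift = min(candidates,
                    key=lambda s: mean_sq_diff(pyr_a[level], pyr_b[level], s))
    return shift

# A smooth bump and a copy of it delayed by 8 samples.
a = [math.exp(-((i - 24) / 6.0) ** 2) for i in range(64)]
b = [math.exp(-((i - 32) / 6.0) ** 2) for i in range(64)]

# 4 levels x 3 candidates = 12 evaluations, versus 17 for an exhaustive +/-8 search.
estimated = coarse_to_fine_shift(a, b, levels=3)
```

The sketch covers only the cost argument. The aliasing argument concerns signals with high-frequency content, for which a fine-level search alone can lock onto a false match.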
With only a few exceptions ([5, 9]), much of this work has concentrated on using a
small family of "generic" motion models within the hierarchical estimation framework.
Such models involve the use of some type of a smoothness constraint (sometimes
allowing for discontinuities) to constrain the estimation process at image locations containing
little or no image structure. However, as noted above, the arguments for use of a
multiresolution, hierarchical approach apply equally to more structured models of image
motion.
In this paper, we describe a variety of motion models used within the same
hierarchical framework. These models provide powerful constraints on the estimation process
and their use within the hierarchical estimation framework leads to increased accuracy,
robustness and efficiency. We outline the implementation of four new models and present
results using real images.
exact value of the flow vector at each pixel. By non-parametric models, we mean those
such as are commonly used in optical flow computation, i.e. those involving the use of
some type of a smoothness or uniformity constraint.
A parallel taxonomy of motion models can be constructed by considering local models
that constrain the motion in the neighborhood of a pixel and global models that describe
the motion over the entire visual field. This distinction becomes especially useful in
analyzing hierarchical approaches where the meaning of "local" changes as the computation
moves through the multiresolution hierarchy. In this scheme fully parametric models are
global models, non-parametric models such as smoothness or uniformity of displacement
are local models, and quasi-parametric models involve both a global and a local model.
The reason for describing motion models in this way is that it clarifies the relationship
between different approaches and allows consideration of the range of possibilities in
choosing a model appropriate to a given situation. Purely global (or fully parametric)
models in essence trivially imply a local model so no choice is possible. However, in the
case of quasi- or non-parametric models, the local model can be more or less complex.
Also, it makes clear that by varying the size of local neighborhoods, it is possible to move
continuously from a partially or purely local model to a purely global one.
The reasons for choosing one model or another are generally quite intuitive, though
the exact choice of model is not always easy to make in a rigorous way. In general,
parametric models constrain the local motion more strongly than the less parametric
ones. A small number of parameters (e.g., six in the case of affine flow) are sufficient
to completely specify the flow vector at every point within their region of applicability.
However, they tend to be applicable only within local regions, and in many cases are
approximations to the actual flow field within those regions (although they may be very
good approximations). From the point of view of motion estimation, such models allow
the precise estimation of motion at locations containing no image structure, provided the
region contains at least a few locations with significant image structure.
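As a concrete illustration of how few numbers a fully parametric model needs, here is a minimal sketch of an affine flow field. The function name and the ordering of the six parameters are assumptions made for this example; once the parameters are known, the flow is defined at every pixel of the region, textured or not.

```python
def affine_flow(params, x, y):
    """Flow vector (u, v) at pixel (x, y) for six affine parameters.

    Assumed ordering: u = a1 + a2*x + a3*y, v = a4 + a5*x + a6*y.
    """
    a1, a2, a3, a4, a5, a6 = params
    return (a1 + a2 * x + a3 * y, a4 + a5 * x + a6 * y)

# A 1% expansion about the origin plus a half-pixel shift to the right.
params = (0.5, 0.01, 0.0, 0.0, 0.0, 0.01)
flow_at_origin = affine_flow(params, 0, 0)
flow_at_corner = affine_flow(params, 100, 50)
```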
Quasi-parametric models constrain the flow field less, but nevertheless constrain it
to some degree. For instance, for rigidly moving objects under perspective projection,
the rigid motion parameters (same as the egomotion parameters in the case of observer
motion), constrain the flow vector at each point to lie along a line in the velocity space.
One dimensional image structure (e.g., an edge) is generally sufficient to precisely
estimate the motion of that point. These models tend to be applicable over a wide region
in the image, perhaps even the entire image. If the local structure of the scene can be
further parametrized (e.g., planar surfaces under rigid motion), the model becomes fully
parametric within the region.
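The "line in velocity space" can be checked numerically. The sketch below assumes the instantaneous rigid-motion flow has the form u = (1/Z) A(x) t + B(x) ω with A(x) = [[-f, 0, x], [0, -f, y]], and treats the rotational term B(x) ω as a given 2-vector, so no particular form of B is assumed. Function names are hypothetical. With the motion fixed, changing the unknown inverse depth only slides the flow vector along one line.

```python
def translational_part(x, y, f, t):
    """A(x) t, using the assumed form A(x) = [[-f, 0, x], [0, -f, y]]."""
    tx, ty, tz = t
    return (-f * tx + x * tz, -f * ty + y * tz)

def rigid_flow(x, y, f, t, rotational_flow, inv_depth):
    """Flow at (x, y): inv_depth * A(x) t plus a depth-independent rotational term."""
    dx, dy = translational_part(x, y, f, t)
    return (inv_depth * dx + rotational_flow[0],
            inv_depth * dy + rotational_flow[1])

def collinear(p, q, r, tol=1e-9):
    """True if the 2-D points p, q, r lie on one line (zero cross product)."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(cross) < tol

# Same pixel and same motion, three different inverse depths.
flows = [rigid_flow(40.0, -25.0, 500.0, (0.1, 0.0, 1.0), (0.3, -0.2), d)
         for d in (0.0, 0.01, 0.05)]
on_one_line = collinear(*flows)
```

Because only the position along this line is unknown, one component of image structure (an edge) is enough to fix it.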
Non-parametric models require local image structure that is two-dimensional (e.g.,
corner points, textured areas). However, with the use of a smoothness constraint it is
usually possible to "fill-in" where there is inadequate local information. The estimation
process is typically more computationally expensive than the other two cases. These
models are more generally applicable (not requiring parametrizable scene structure or
motion) than the other two classes.
where the sum is computed over all the points within the region and $\{\mathbf{u}\}$ is used to denote
the entire flow field within that region. In general this error (which is actually the sum
of individual errors) is not quadratic in terms of the unknown quantities $\{\mathbf{u}\}$, because
of the complex pattern of intensity variations. Hence, we typically have a non-linear
minimization problem at hand.
Note that the basic structure of the problem is independent of the choice of a motion
model. The model is in essence a statement about the function $\mathbf{u}(\mathbf{x})$. To make this
explicit, we can write,
$$\mathbf{u}(\mathbf{x}) = \mathbf{u}(\mathbf{x}; \mathbf{p}_m), \qquad (2)$$
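The separation between the error measure and the motion model can be sketched in code. This is an illustrative sketch, not the paper's implementation. It assumes the registration convention that the current frame at x is compared with the previous frame at x - u(x), it uses nearest-neighbour sampling, and all function names are hypothetical. The same error function is evaluated under two different parametrizations of u(x; p).

```python
def ssd_error(prev, cur, flow, params):
    """Sum of (cur(x, y) - prev(x - u, y - v))**2 over pixels that map inside the frame."""
    height, width = len(cur), len(cur[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            u, v = flow(x, y, params)
            xs, ys = int(round(x - u)), int(round(y - v))  # nearest-neighbour sampling
            if 0 <= xs < width and 0 <= ys < height:
                total += (cur[y][x] - prev[ys][xs]) ** 2
    return total

def translation_model(x, y, p):
    """Two parameters: the same displacement everywhere."""
    return (p[0], p[1])

def affine_model(x, y, p):
    """Six parameters: displacement varies linearly with position."""
    return (p[0] + p[1] * x + p[2] * y, p[3] + p[4] * x + p[5] * y)

# Toy frames: `cur` is `prev` moved one pixel to the right.
prev = [[x * x + 3 * y for x in range(8)] for y in range(6)]
cur = [[prev[y][x - 1] if x >= 1 else 0 for x in range(8)] for y in range(6)]

err_static = ssd_error(prev, cur, translation_model, (0, 0))
err_shift = ssd_error(prev, cur, translation_model, (1, 0))
err_affine = ssd_error(prev, cur, affine_model, (1, 0, 0, 0, 0, 0))
```

Only the `flow` argument changes between models. The minimization strategy would differ only in which parameter vector it searches over.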
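$$\mathbf{u}(\mathbf{x}) = \frac{1}{Z(\mathbf{x})} A(\mathbf{x})\,\mathbf{t} + B(\mathbf{x})\,\boldsymbol{\omega} \qquad (8)$$
where $Z(\mathbf{x})$ is the distance from the camera of the point (i.e., depth) whose image position
is $\mathbf{x}$, and
$$A(\mathbf{x}) = \begin{bmatrix} -f & 0 & x \\ 0 & -f & y \end{bmatrix}$$
$$k_1 X + k_2 Y + k_3 Z = 1.$$
Using $\mathbf{k}$ to denote the vector $(k_1, k_2, k_3)$ and $\mathbf{r}$ to denote the vector $(x/f, y/f, 1)$,
we obtain
$$\frac{1}{Z(\mathbf{x})} = \mathbf{r}(\mathbf{x})^T \mathbf{k}.$$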
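The relation between inverse depth and the plane parameters can be verified numerically. The sketch below assumes the plane is written k1 X + k2 Y + k3 Z = 1 and that perspective projection is x = f X / Z, y = f Y / Z. Both conventions are stated here as assumptions, and the function name is hypothetical.

```python
def inverse_depth(k, x, y, f):
    """1/Z at image point (x, y), computed as r(x)^T k with r = (x/f, y/f, 1)."""
    r = (x / f, y / f, 1.0)
    return sum(ri * ki for ri, ki in zip(r, k))

# Pick a 3-D point that lies on the plane, project it, and compare.
k = (0.1, -0.2, 0.05)
X, Y = 1.0, 2.0
Z = (1.0 - k[0] * X - k[1] * Y) / k[2]   # solve the plane equation for Z
f = 500.0
x, y = f * X / Z, f * Y / Z              # perspective projection (assumed convention)
predicted = inverse_depth(k, x, y, f)
```

Dividing the plane equation by Z gives k1 (X/Z) + k2 (Y/Z) + k3 = 1/Z. Under the assumed projection this is the same inner product the function evaluates.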
Substituting this into Equation 8 gives
$$\mathbf{u}(\mathbf{x}) = \left(A(\mathbf{x})\,\mathbf{t}\right)\left(\mathbf{r}(\mathbf{x})^T \mathbf{k}\right) + B(\mathbf{x})\,\boldsymbol{\omega}.$$
This flow field is quadratic in $\mathbf{x}$ and can be written also as
$$u(\mathbf{x}) = a_1 + a_2 x + a_3 y + a_7 x^2 + a_8 xy$$
$$v(\mathbf{x}) = a_4 + a_5 x + a_6 y + a_7 xy + a_8 y^2 \qquad (11)$$
where the 8 coefficients $(a_1, \ldots, a_8)$ are functions of the motion parameters $\mathbf{t}$, $\boldsymbol{\omega}$ and the
surface parameters $\mathbf{k}$. Since this 8-parameter form is rather well-known (e.g., see [15]) we
omit its details.
If the egomotion parameters are known, then the three parameter vector $\mathbf{k}$ can be used
to represent the motion of the planar surface. Otherwise the 8-parameter representation
can be used. In either case, the flow field is linear in the unknown parameters.
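The linearity in the unknown parameters can be made explicit for the 8-parameter form. In this sketch the function names are hypothetical and the coefficient ordering follows Equation 11. The flow is written as a 2 x 8 matrix of monomials in (x, y) times the coefficient vector, which is what makes each incremental estimate a linear least-squares problem.

```python
def planar_flow(a, x, y):
    """Quadratic flow of Equation 11 for coefficients a = (a1, ..., a8)."""
    a1, a2, a3, a4, a5, a6, a7, a8 = a
    u = a1 + a2 * x + a3 * y + a7 * x * x + a8 * x * y
    v = a4 + a5 * x + a6 * y + a7 * x * y + a8 * y * y
    return (u, v)

def basis_rows(x, y):
    """The two rows of the 2 x 8 matrix X(x, y) such that (u, v) = X(x, y) a."""
    return ([1, x, y, 0, 0, 0, x * x, x * y],
            [0, 0, 0, 1, x, y, x * y, y * y])

def flow_from_basis(a, x, y):
    """Same flow as planar_flow, evaluated as a matrix-vector product."""
    row_u, row_v = basis_rows(x, y)
    return (sum(c * p for c, p in zip(row_u, a)),
            sum(c * p for c, p in zip(row_v, a)))

a = (0.5, 0.01, -0.02, -0.3, 0.015, 0.01, 1e-4, -2e-4)
direct = planar_flow(a, 12.0, -7.0)
via_matrix = flow_from_basis(a, 12.0, -7.0)
```

Setting a7 = a8 = 0 leaves the six-parameter affine model.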
The problem of estimating planar surface motion has been extensively studied
before [21, 1, 23]. In particular, Negahdaripour and Horn [21] suggest iterative methods
for estimating the motion and the surface parameters, as well as a method of estimating
the 8 parameters and then decomposing them into the five rigid motion parameters
and the three surface parameters in closed form. Besides the embedding of these computations
within the hierarchical estimation framework, we also take a slightly different approach
to the problem.
We assume that the rigid motion parameters are already known or can be estimated
(e.g., see Section 3.3 below). Then, the problem reduces to that of estimating the three
surface parameters $\mathbf{k}$. There are several practical reasons to prefer this approach: First, in
many situations the rigid motion model may be more globally applicable than the planar
surface model, and can be estimated using information from all the surfaces undergoing
the same rigid motion. Second, unless the region of interest subtends a significant field
of view, the second order components of the flow field will be small, and hence the
estimation of the eight parameters will be inaccurate and the process may be unstable.
On the other hand, the information concerning the three parameters $\mathbf{k}$ is contained in the
first order components of the flow field, and (if the rigid motion parameters are known)
their estimation will be more accurate and stable.
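The Estimation Algorithm: Let $\mathbf{k}_0$ denote the current estimate of the surface
parameters, and let $\mathbf{t}$ and $\boldsymbol{\omega}$ denote the motion parameters. These parameters are used to
construct an initial flow field that is used in the warping step. The residual information
is then used to determine an incremental estimate $\delta\mathbf{k}$.
By substituting the parametric form of $\delta\mathbf{u}$
$$\begin{aligned}
\delta\mathbf{u} &= \mathbf{u} - \mathbf{u}_0 \\
&= \left(A(\mathbf{x})\,\mathbf{t}\right)\left(\mathbf{r}(\mathbf{x})^T(\mathbf{k}_0 + \delta\mathbf{k})\right) + B(\mathbf{x})\,\boldsymbol{\omega} - \left[\left(A(\mathbf{x})\,\mathbf{t}\right)\left(\mathbf{r}(\mathbf{x})^T\mathbf{k}_0\right) + B(\mathbf{x})\,\boldsymbol{\omega}\right] \\
&= \left(A(\mathbf{x})\,\mathbf{t}\right)\mathbf{r}(\mathbf{x})^T\,\delta\mathbf{k} \qquad (12)
\end{aligned}$$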
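Because Equation 12 is linear in the increment, a linearized error measure becomes an ordinary three-unknown least-squares problem. The sketch below is not the paper's implementation. It assumes the error has the form of a sum over pixels of (ΔI + ∇I · δu)^2, where ΔI is the residual temporal difference after warping, and that A(x) = [[-f, 0, x], [0, -f, y]]. All function names are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def planar_increment(samples, f, t):
    """Least-squares delta_k from per-pixel samples (x, y, Ix, Iy, dI)."""
    tx, ty, tz = t
    normal = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for x, y, ix, iy, d_i in samples:
        # g = grad(I)^T A(x) t, with the assumed A(x) = [[-f, 0, x], [0, -f, y]]
        g = ix * (-f * tx + x * tz) + iy * (-f * ty + y * tz)
        row = (g * x / f, g * y / f, g)  # g * r(x): derivative of the residual w.r.t. delta_k
        for i in range(3):
            rhs[i] -= row[i] * d_i
            for j in range(3):
                normal[i][j] += row[i] * row[j]
    return solve_linear(normal, rhs)

# Synthetic check: build dI from a known delta_k, then recover it.
f, t = 100.0, (0.2, 0.1, 1.0)
true_dk = (0.001, -0.002, 0.01)
samples = []
for x in (-20.0, -10.0, 0.0, 10.0, 20.0):
    for y in (-20.0, -10.0, 0.0, 10.0, 20.0):
        ix, iy = 1.0 + 0.05 * x, 0.5 - 0.02 * y   # made-up image gradients
        g = ix * (-f * t[0] + x * t[2]) + iy * (-f * t[1] + y * t[2])
        d_i = -g * (true_dk[0] * x / f + true_dk[1] * y / f + true_dk[2])
        samples.append((x, y, ix, iy, d_i))
estimated_dk = planar_increment(samples, f, t)
```

In a hierarchical scheme, a step of this kind would be repeated after re-warping with the updated parameters, at each pyramid level in turn.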
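refine the parameters of the local and global models. We now show how these models are
refined.
We begin by writing equation 15 in an incremental form so that
$$\delta\mathbf{u}(\mathbf{x}) = \frac{1}{Z(\mathbf{x})} A(\mathbf{x})\,\mathbf{t} + B(\mathbf{x})\,\boldsymbol{\omega} - \frac{1}{Z_i(\mathbf{x})} A(\mathbf{x})\,\mathbf{t}_i - B(\mathbf{x})\,\boldsymbol{\omega}_i \qquad (16)$$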
Inserting the parametric form of $\delta\mathbf{u}$ into Equation 3 we obtain the pixel-wise error as
(17)
To refine the local models, we assume that $1/Z(\mathbf{x})$ is constant over 5 × 5 image patches
centered on each image pixel. We then algebraically solve for this $Z$ both in order to
estimate its current value, and to eliminate it from the global error measure. Consider
the local component of the error measure,
We insert the expression for $1/Z(\mathbf{x})$ given in Equation 19 (not the current numerical
value of the local parameter) into Equation 20. The result is an expression for $E_{global}$
that is non-quadratic in $\mathbf{t}$ but quadratic in $\boldsymbol{\omega}$. We recover refined estimates of $\mathbf{t}$ and $\boldsymbol{\omega}$
by performing one Gauss-Newton minimization step using the previous estimates of the
global parameters, $\mathbf{t}_i$ and $\boldsymbol{\omega}_i$, as starting values. Expressions are evaluated numerically
at $\mathbf{t} = \mathbf{t}_i$ and $\boldsymbol{\omega} = \boldsymbol{\omega}_i$.
We then repeat the estimation algorithm several times at each image resolution.
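A single Gauss-Newton step of the kind described above can be sketched generically. This is not the paper's implementation: it uses a forward-difference Jacobian instead of analytic derivatives, and it is shown on a small toy curve-fitting problem in place of the global error measure, whose equations are not reproduced here. All names are hypothetical. In the rigid-body setting the parameter vector would stack t and ω, and the starting values would be the previous estimates.

```python
import math

def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def gauss_newton_step(residuals, params, eps=1e-6):
    """One update p <- p - (J^T J)^{-1} J^T r for the cost sum(r_j(p)**2)."""
    r0 = residuals(params)
    columns = []  # columns[i][j] approximates d r_j / d p_i
    for i in range(len(params)):
        bumped = list(params)
        bumped[i] += eps
        ri = residuals(bumped)
        columns.append([(a - b) / eps for a, b in zip(ri, r0)])
    jtj = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    jtr = [sum(p * q for p, q in zip(ci, r0)) for ci in columns]
    step = solve_linear(jtj, [-v for v in jtr])
    return [p + s for p, s in zip(params, step)]

def cost(residuals, params):
    return sum(r * r for r in residuals(params))

# Toy problem: fit y = p0 * exp(p1 * x) to noise-free data generated with (2.0, 0.5).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

def fit_residuals(p):
    return [p[0] * math.exp(p[1] * x) - y for x, y in zip(xs, ys)]

start = [1.8, 0.45]                      # plays the role of the previous estimate
refined = gauss_newton_step(fit_residuals, start)
```

For a cost that is exactly quadratic in the parameters, one such step lands on the minimizer. For a non-quadratic cost it only improves the estimate, which is one reason to repeat the step several times at each resolution.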
Experiments with the rigid body motion model: We have chosen an outdoor scene
to demonstrate the rigid body motion model. Figure 4a shows one of the input images,
and Figure 4b shows the difference between the two input images. The algorithm was
run beginning at level 3 (subsampled by a factor of 8) of a Laplacian pyramid. The
local surface parameters $1/Z(\mathbf{x})$ were all initialized to zero, and the rigid-body motion
parameters were initialized to $\mathbf{t}_0 = (0, 0, 1)^T$ and $\boldsymbol{\omega} = (0, 0, 0)^T$. The model parameters
were refined 10 times at each image resolution. Figure 4c shows the difference image
between the second image and the first image after being warped using the final estimates
of the rigid-body motion parameters and the local surface parameters. Figure 4d shows
an image of the recovered local surface parameters $1/Z(\mathbf{x})$ such that bright points are
nearer the camera than dark points. The recovered inverse ranges are plausible almost
everywhere, except at the image border and near the recovered focus of expansion. The
bright dot at the bottom right hand side of the inverse range map corresponds to a leaf
in the original image that is blowing across the ground towards the camera. Figure 4e
shows a table of rigid-body motion parameters that were recovered at the end of each
resolution of analysis.
More experimental results and a detailed discussion of the algorithm's performance
on various types of scenes can be found in [12].
The Estimation Algorithm: Assume that we have an approximate flow field from
previous levels (or previous iterations at the same level). Assuming that the incremental
flow vector $\delta\mathbf{u}$ is constant within the 5 × 5 window, Equation 3 can be written as
Experiments with the general flow model: We demonstrate the general flow
algorithm on an image sequence containing several independently moving objects, a case for
which the other motion models described here are not applicable. Figure 5a shows one
image of the original sequence. Figure 5b shows the difference between the two frames
that were used to compute image flow. Figure 5c shows little difference between the
compensated image and the other original image. Figure 5d shows the horizontal component
of the computed flow field, and Figure 5e shows the vertical component. In local image
regions where image structure is well-defined, and where the local image motion is
simple, the recovered motion estimates appear plausible. Errors predictably occur however
at motion boundaries. Errors also occur in image regions where the local image structure
is not well-defined (like some parts of the road), but for the same reason, such errors do
not appear as intensity errors in the compensated difference image.
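The local estimation step of the general flow model (an incremental flow vector assumed constant over a small window) reduces to a 2 × 2 linear system per pixel. The sketch below is not the paper's implementation. It assumes the linearized error has the form of a sum over the window of (ΔI + Ix δu + Iy δv)^2, and the function name and the determinant threshold are hypothetical. It also shows why errors appear where image structure is poorly defined: the system becomes singular.

```python
def window_flow_increment(ix, iy, d_i, min_det=1e-6):
    """Least-squares (du, dv), constant over one window.

    ix, iy, d_i: equal-length sequences holding the spatial gradients and the
    residual temporal difference at each pixel of the window (25 values for 5 x 5).
    Returns None when the window lacks two-dimensional structure.
    """
    sxx = sum(a * a for a in ix)
    sxy = sum(a * b for a, b in zip(ix, iy))
    syy = sum(b * b for b in iy)
    sxt = sum(a * d for a, d in zip(ix, d_i))
    syt = sum(b * d for b, d in zip(iy, d_i))
    det = sxx * syy - sxy * sxy
    if abs(det) < min_det:
        return None  # aperture problem: gradients are all (nearly) parallel
    # Solve [[sxx, sxy], [sxy, syy]] (du, dv)^T = (-sxt, -syt)^T in closed form.
    du = (-syy * sxt + sxy * syt) / det
    dv = (sxy * sxt - sxx * syt) / det
    return (du, dv)

# Synthetic 5 x 5 window with gradients in varied directions and a known increment.
ix = [1.0 + 0.1 * k for k in range(25)]
iy = [0.5 - 0.07 * (k % 5) for k in range(25)]
true_du = (0.3, -0.2)
d_i = [-(a * true_du[0] + b * true_du[1]) for a, b in zip(ix, iy)]
estimate = window_flow_increment(ix, iy, d_i)

# A window containing only a vertical edge: Iy is zero everywhere.
edge_only = window_flow_increment(ix, [0.0] * 25, d_i)
```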
4 Discussion
Thus far, we have described a hierarchical framework for the estimation of image motion
between two images using various models. Our motivation was to generalize the notion
of direct estimation to model-based estimation and unify a diverse set of model-based
estimation algorithms into a single framework. The framework also supports the combined
use of parametric global models and local models which typically represent some type of
a smoothness or local uniformity assumption.
One of the unifying aspects of the framework is that the same objective function
(SSD) is used for all models, but the minimization is performed with respect to different
parameters. As noted in the introduction, this is enabled by viewing all these problems
from the perspective of image registration.
It is interesting to contrast this perspective (of model-based image registration) with
some of the more traditional approaches to motion analysis. One such approach is to
compute image flow fields, which involves combining the local brightness constraint with
some sort of a global smoothness assumption, and then interpret them using appropriate
motion models. In contrast, the approach taken here is to use the motion models to
constrain the flow field computation. The obvious benefit of this is that the resulting
flow fields may generally be expected to be more consistent with models than general
smooth flow fields. Note, however, that the framework also includes general smooth flow
field techniques, which can be used if the motion model is unknown.
In the case of models that are not fully parametric, local image information is used to
determine local image/scene properties (e.g., the local range value). However, the
accuracy of these can only be as good as the available local image information. For example,
in homogeneous areas of the scene, it may be possible to achieve perfect registration even
if the surface range estimates (and the corresponding local flow vectors) are incorrect.
However, in the presence of significant image structures, these local estimates may be
expected to be accurate. On the other hand, the accuracy of the global parameters (e.g.,
the rigid motion parameters) depends only on having sufficient and sufficiently diverse
local information across the entire region. Hence, it may be possible to obtain reliable
estimates of these global parameters, even though estimated local information may not
be reliable everywhere within the region. For fully parametric models, this problem does
not exist.
The image registration problem addressed in this paper occurs in a wide range of
image processing applications, far beyond the usual ones considered in computer vision
(e.g., navigation and image understanding). These include image compression via motion
compensated encoding, spatiotemporal analysis of remote sensing type of images, image
database indexing and retrieval, and possibly object recognition. One way to state this
general problem is as that of recovering the coordinate system that relates two images of
a scene taken from two different viewpoints. In this sense, the framework proposed here
unifies motion analysis across these different applications as well.
Acknowledgements: Many individuals have contributed to the ideas and results
presented here. These include Peter Burt and Leonid Oliker from the David Sarnoff Research
Center, and Shmuel Peleg from Hebrew University.
References
1. G. Adiv. Determining three-dimensional motion and structure from optical flow generated
by several moving objects. IEEE Trans. on Pattern Analysis and Machine Intelligence,
7(4):384-401, July 1985.
2. P. Anandan. A unified perspective on computational techniques for the measurement of
visual motion. In International Conference on Computer Vision, pages 219-230, London,
May 1987.
3. P. Anandan. A computational framework and an algorithm for the measurement of visual
motion. International Journal of Computer Vision, 2:283-310, 1989.
4. J. R. Bergen and E. H. Adelson. Hierarchical, computationally efficient motion estimation
algorithm. J. Opt. Soc. Am. A., 4:35, 1987.
5. J. R. Bergen, P. J. Burt, R. Hingorani, and S. Peleg. Computing two motions from three
frames. In International Conference on Computer Vision, Osaka, Japan, December 1990.
6. P. J. Burt and E. H. Adelson. The Laplacian pyramid as a compact image code. IEEE
Transactions on Communication, 31:532-540, 1983.
7. P. J. Burt et al. Object tracking with a moving camera, an application of dynamic motion
analysis.
13. J. Heel. Direct estimation of structure and motion from multiple frames. A.I. Memo
1190, MIT AI LAB, Cambridge, MA, 1990.
14. E. C. Hildreth. The Measurement of Visual Motion. The MIT Press, 1983.
15. B. K. P. Horn. Robot Vision. MIT Press, Cambridge, MA, 1986.
16. B. K. P. Horn and B. G. Schunck. Determining optical flow. Artificial Intelligence,
17:185-203, 1981.
17. B. K. P. Horn and E. J. Weldon. Direct methods for recovering motion. International
Journal of Computer Vision, 2(1):51-76, June 1988.
18. B. D. Lucas and T. Kanade. An iterative image registration technique with an application
to stereo vision. In Image Understanding Workshop, pages 121-130, 1981.
19. L. Matthies, R. Szeliski, and T. Kanade. Kalman filter-based algorithms for estimating
depth from image sequences. In International Conference on Computer Vision, pages 199-
Fig. 1. Diagram of the hierarchical motion estimation framework.
Resolution   ω                          t
(initial)    (.0000, .0000, .0000)      (.0000, .0000, 1.0000)
32 x 30      (.0027, .0039, -.0001)     (-.3379, -.1352, .9314)
64 x 60      (.0038, .0041, .00??)      (-.3319, -.0561, ?)
128 x 120    (.003?, .0012, .0008)      (-.0???, -.0383, .9971)
256 x 240    (.0029, .0006, .0013)      (-.0255, -.0899, .9956)