Dynamic 2D/3D Registration
Sofien Bouaziz, Andrea Tagliasacchi, Mark Pauly
École Polytechnique Fédérale de Lausanne
Abstract
About the lecturers
Sofien Bouaziz is a PhD student in the Computer Graphics and Geometry Laboratory
at the École Polytechnique Fédérale de Lausanne (EPFL) under the supervision of Prof.
Mark Pauly. He received his MSc degree in Computer Science from EPFL in 2009. His
research interests include computer graphics, computer vision, and machine learning.
Sofien co-developed the facial motion capture software faceshift studio.
e-mail: [email protected]
website: http://lgg.epfl.ch/~bouaziz
e-mail: [email protected]
website: http://drtaglia.github.io
e-mail: [email protected]
website: http://lgg.epfl.ch
1 Introduction
Recent technological advances in RGB-D sensing devices, such as the Microsoft Kinect,
facilitate numerous new and exciting applications, for example in 3D scanning [24] and
human motion tracking [26, 19, 6]. While affordable and accessible, consumer-level RGB-
D devices typically exhibit high noise levels in the acquired data. Moreover, difficult
lighting situations and geometric occlusions commonly occur in many application set-
tings, potentially leading to a severe degradation in data quality. This necessitates a
particular emphasis on the robustness of image and geometry processing algorithms.
The combination of 2D and 3D registration is one important aspect in the design of ro-
bust applications based on RGB-D devices. This lecture introduces the main concepts
of 2D and 3D registration and explains how to combine them efficiently. An up-to-
date version of these course notes as well as slides and source code can be found at
http://lgg.epfl.ch/2d3dRegistration.
2 2D/3D Registration
In the first part of the course we introduce the theory of 2D/3D registration algorithms
suitable for processing RGB-D data. We focus on pairwise registration to compute the
alignment of a source model onto a target model. This alignment can be rigid or non-
rigid, depending on the type of object being scanned. We formulate the registration as
the minimization of an energy
Ereg = Ematch + Eprior . (1)
The matching energy Ematch defines a measure of how close the source is to the target.
The prior energy Eprior quantifies the deviation from the type of transformation or defor-
mation that the source is allowed to undergo during the registration, for example, a rigid
motion or an elastic deformation. The goal of registration is to find a transformation of
the source model that minimizes Ereg to bring the source into alignment with the target.
For data acquired with RGB-D devices, registration can utilize both the geometric infor-
mation encoded in the 3D depth map, as well as the color information provided by the
recorded 2D images. We show that Equation 1 provides a unified way to formulate both
2D and 3D registration, which simplifies their integration.
2.1 3D Registration
2.1.1 Matching energy
The matching energy measures how close the surface Z is to the surface Y and is defined
as
$$E_{\mathrm{match}}(Z) = \int_{Z} \varphi(z, Y)\, dz, \tag{2}$$
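where ϕ(z, Y) measures the distance from the point z to the surface Y. For a surface Z
sampled at points zi , i = 1, ..., n, this becomes the sum of squared closest-point distances
$$E_{\mathrm{match}}(Z) = \sum_{i=1}^{n} \| z_i - P_Y(z_i) \|_2^2, \tag{3}$$
where PY (zi ) : R3 → R3 returns the closest point (using Euclidean distance) on the
surface Y from zi . PY (zi ) can also be seen as the orthogonal projection of zi onto Y.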
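As a minimal sketch of the closest-point projection, assume the target surface Y is only available as a finite point sample, so that PY reduces to a brute-force nearest-neighbor search. The names `closest_point` and `matching_energy` and the toy data are illustrative assumptions, not part of the course code; the energy accumulates the same squared closest-point term that appears in Equation 21.

```python
import math

def closest_point(z, target_points):
    """Return the point of target_points nearest to z (Euclidean distance)."""
    return min(target_points, key=lambda y: math.dist(z, y))

def matching_energy(source_points, target_points):
    """Sum of squared distances from each source point to its closest target point."""
    return sum(math.dist(z, closest_point(z, target_points)) ** 2
               for z in source_points)

# Hypothetical toy data: a target sampled on the plane z = 0.
Y = [(float(x), float(y), 0.0) for x in range(3) for y in range(3)]
Z = [(0.1, 0.0, 0.5), (2.0, 1.9, -0.5)]

print(closest_point(Z[0], Y))
print(matching_energy(Z, Y))
```

A practical system would typically replace the linear scan with a spatial data structure such as a k-d tree, or exploit the image-grid structure of the depth map.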
2.1.2 Prior energy
In this section we present several prior energies that can be used for registration. These
energies can also be combined to build more sophisticated priors. Priors encode proper-
ties of the scanned objects. For example, when scanning rigid objects, a global rigidity
prior can be used to limit the allowed transformations to rotations and translations. For
deforming objects, for example a human body, geometric priors are often employed that
try to mimic physical behavior such as an elastic deformation. We describe a simple
local rigidity prior that approximates elastic deformations and facilitates efficient imple-
mentations. More complex deformation behavior can be captured using a data-driven
approach. One popular method is based on a collection of sample shapes that represents
the space of allowed deformations. Using dimensionality reduction, for example
principal component analysis, efficient linear models can be derived that are suitable for
realtime registration algorithms.
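Global rigidity. The global rigidity energy can be expressed as
$$E_{\mathrm{rigid}}(Z, R, t) = \sum_{i=1}^{n} \| z_i - (R x_i + t) \|_2^2, \tag{4}$$
where R ∈ R3×3 is a rotation matrix and t ∈ R3 a translation vector. In this case, the
deformed surface Z tries to follow a rigid transformation of the original surface X .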
Local rigidity. The local rigidity energy, following [22, 4], can be expressed as
$$E_{\mathrm{arap}}(Z, R_i|_{i=1}^{n}) = \sum_{i=1}^{n} \sum_{j \in N_i} \| (z_j - z_i) - R_i (x_j - x_i) \|_2^2, \tag{5}$$
where the Ri ∈ R3×3 are rotation matrices and Ni is the set of indices of the neighboring
points of xi . In this case, each local neighborhood on the surface Z tries to follow a rigid
transformation of its corresponding local neighborhood on the surface X . Other local
rigidity energies can also be used as prior, see for example [3, 23].
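The local rigidity energy of Equation 5 can be evaluated directly from its definition. The following is a minimal sketch under stated assumptions: points are plain Python lists, `neighbors[i]` plays the role of Ni, and the helper names and the toy three-point mesh are hypothetical. It only evaluates the energy for given rotations and does not optimize them.

```python
import math

def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[r][c] * v[c] for c in range(3)) for r in range(3)]

def sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def local_rigidity_energy(X, Z, rotations, neighbors):
    """Evaluate sum_i sum_{j in N_i} ||(z_j - z_i) - R_i (x_j - x_i)||^2."""
    energy = 0.0
    for i, N_i in enumerate(neighbors):
        for j in N_i:
            rotated_edge = mat_vec(rotations[i], sub(X[j], X[i]))
            diff = sub(sub(Z[j], Z[i]), rotated_edge)
            energy += sum(d * d for d in diff)
    return energy

def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical toy mesh: three points, each neighboring the other two.
X = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
neighbors = [[1, 2], [0, 2], [0, 1]]
R = rot_z(0.3)

# A rigidly rotated copy has zero energy when every R_i equals that rotation.
Z_rigid = [mat_vec(R, x) for x in X]
e_rigid = local_rigidity_energy(X, Z_rigid, [R] * 3, neighbors)

# Stretching one point is not rigid, so the energy becomes positive
# (the R_i are kept fixed here rather than re-optimized).
Z_stretched = [Z_rigid[0], [2.0 * c for c in Z_rigid[1]], Z_rigid[2]]
e_stretched = local_rigidity_energy(X, Z_stretched, [R] * 3, neighbors)
print(e_rigid, e_stretched)
```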
Linear model. A 3D linear shape model can be defined using a matrix P containing
the shape model basis, and a mean shape vector m [10]. A new shape s can be defined
as
s = Pd + m, (6)
where d is a vector containing the basis coefficients. A linear model prior energy can be
formulated as the deviation of the vertices from the linear model
$$E_{\mathrm{prior}}(Z, d) = \sum_{i=1}^{n} \| z_i - (P_i d + m_i) \|_2^2, \tag{7}$$
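To make Equations 6 and 7 concrete, the sketch below evaluates a linear shape model and its prior energy. It assumes that all vertex coordinates are stacked into one flat vector, so the per-vertex blocks Pi and mi become consecutive rows; the function names and the tiny two-vertex model are hypothetical.

```python
def linear_shape(P, d, m):
    """Evaluate s = P d + m for a basis P stored as a list of rows."""
    return [sum(p_rc * d_c for p_rc, d_c in zip(row, d)) + m_r
            for row, m_r in zip(P, m)]

def linear_prior_energy(z, P, d, m):
    """Squared deviation of the stacked vertex coordinates z from P d + m."""
    s = linear_shape(P, d, m)
    return sum((z_r - s_r) ** 2 for z_r, s_r in zip(z, s))

# Hypothetical model: 2 vertices (6 stacked coordinates), 2 basis vectors.
m = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
P = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0],
     [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
d = [0.5, -0.25]

s = linear_shape(P, d, m)
z = [0.5, 0.0, 0.1, 1.0, -0.25, 0.0]  # differs from s in one coordinate
energy = linear_prior_energy(z, P, d, m)
print(s, energy)
```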
2.1.3 Optimization
How to best optimize the registration energy depends on the prior energy. In this section
we show, as an example, how to optimize a registration energy for two applications: rigid
scanning and non-rigid modeling.
In-hand rigid scanning. Since single depth maps acquired with the RGB-D sensor
exhibit high noise levels and do not cover the whole surface of the 3D object, an aggrega-
tion procedure is typically applied to obtain a complete model with reduced noise level.
In order to aggregate multiple scans over time, different methods can be used [28, 29, 18].
The classical approach is to perform a 3D rigid registration of the currently acquired scan
of the object with the already accumulated 3D data. The pairwise 3D alignment can be
formulated as
where the matching energy is combined with a global rigidity prior. To optimize E(Z, R, t)
we linearize the rotation matrix [20] approximating cos θ by 1 and sin θ by θ
$$R \approx \tilde{R} = \begin{bmatrix} 1 & -\gamma & \beta \\ \gamma & 1 & -\alpha \\ -\beta & \alpha & 1 \end{bmatrix}. \tag{9}$$
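The following sketch illustrates the linearized rotation of Equation 9. It checks two properties: applying the linearized matrix to a point x equals x plus the cross product of (α, β, γ) with x, which is why the unknowns enter linearly, and the approximation error shrinks quickly as the angle decreases. The helper names and the sample values are illustrative assumptions.

```python
import math

def linearized_rotation(alpha, beta, gamma):
    """Small-angle approximation (cos ~ 1, sin ~ angle) of a rotation matrix."""
    return [[1.0, -gamma, beta],
            [gamma, 1.0, -alpha],
            [-beta, alpha, 1.0]]

def mat_vec(R, v):
    return [sum(R[r][c] * v[c] for c in range(3)) for r in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def rot_z(theta):
    """Exact rotation about the z axis, used only for comparison."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

x = [1.0, 2.0, 3.0]
r = [0.01, -0.02, 0.03]  # (alpha, beta, gamma), hypothetical small angles

# The linearized rotation acts as x + cross(r, x), which is linear in r.
lhs = mat_vec(linearized_rotation(*r), x)
rhs = [x_k + c_k for x_k, c_k in zip(x, cross(r, x))]
identity_gap = max(abs(a - b) for a, b in zip(lhs, rhs))

def approximation_error(theta):
    """Largest coordinate difference between exact and linearized rotation of x."""
    exact = mat_vec(rot_z(theta), x)
    approx = mat_vec(linearized_rotation(0.0, 0.0, theta), x)
    return max(abs(a - b) for a, b in zip(exact, approx))

print(identity_gap, approximation_error(0.1), approximation_error(0.01))
```

Because the error grows with the angle, the linearization is only reliable for small incremental motions, which is the case between consecutive iterations.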
where t is the iteration number and $z_i^0 = x_i$. As $P_Y(\cdot)$ is a nonlinear function that is
difficult to optimize with, we use in the optimization the previous estimate $P_Y(z_i^t)$. This
corresponds to the point-to-point matching error [1]. To speed up the convergence of the
optimization one can linearize $\| z_i^{t+1} - P_Y(z_i^t) \|_2$ at $P_Y(z_i^t)$, which gives $n_i^T (z_i^{t+1} - P_Y(z_i^t))$,
where $n_i$ is the normal of the surface Y at $P_Y(z_i^t)$. This leads to the point-to-plane
matching error [8]. The optimization can be reformulated as
$$\arg\min_{Z^{t+1},\, \tilde{R},\, \tilde{t}} \sum_{i=1}^{n} \Big[ w_1 \big( n_i^T (z_i^{t+1} - P_Y(z_i^t)) \big)^2 + w_2 \| z_i^{t+1} - (\tilde{R}(R^t x_i + t^t) + \tilde{t}) \|_2^2 \Big]. \tag{11}$$
Both Equation 10 and Equation 11 are quadratic and can therefore be optimized by
setting the partial derivatives to zero, which amounts to solving a linear system. During the
optimization, it can be advantageous to apply a Tikhonov regularization to the parameters of the rigid
motion, as linearizing the rotation matrix assumes that the angles are small.
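As a worked sketch of why the linearized problem reduces to a linear system, consider the simplified case in which each point follows the rigid motion exactly (the limit of a large w2), so that only the six rigid parameters remain unknown. The symbols y_i, r, θ, a_i, b_i and the Tikhonov weight λ are notation introduced here for illustration and do not appear in the original text.

```latex
% Assumption: z_i^{t+1} = \tilde{R} y_i + \tilde{t} with y_i = R^t x_i + t^t.
\begin{align*}
\tilde{R}\, y_i &= y_i + r \times y_i, \qquad r = (\alpha, \beta, \gamma)^T, \\
n_i^T \big( \tilde{R} y_i + \tilde{t} - P_Y(z_i^t) \big)
  &= (y_i \times n_i)^T r + n_i^T \tilde{t} - n_i^T \big( P_Y(z_i^t) - y_i \big)
   = a_i^T \theta - b_i, \\
a_i &= \begin{bmatrix} y_i \times n_i \\ n_i \end{bmatrix}, \qquad
\theta = \begin{bmatrix} r \\ \tilde{t} \end{bmatrix}, \qquad
b_i = n_i^T \big( P_Y(z_i^t) - y_i \big), \\
\Big( \sum_{i=1}^{n} a_i a_i^T + \lambda I \Big)\, \theta &= \sum_{i=1}^{n} a_i b_i .
\end{align*}
```

The last line is the 6 × 6 system obtained by setting the gradient of the regularized sum of squared residuals to zero; the term λI is the Tikhonov regularization that keeps the estimated angles small.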
This energy can be minimized in a similar spirit by linearizing the rotation matrix and
iteratively solving a linear system. Other approaches can be found in [11].
[Figure: panel labels Shape Model, Fitting]
scan of a face. Non-rigid modeling using a morphable model can be formulated as
A local rigidity energy is added to the optimization in order to get an accurate result, as
the morphable model represents the large-scale variability but might not capture small
scale details. As previously, we solve iteratively
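$$\arg\min_{Z^{t+1},\, d,\, \tilde{R}_i|_{i=1}^{n},\, \tilde{R},\, \tilde{t}} \sum_{i=1}^{n} \Big[ w_1 \big( n_i^T (z_i^{t+1} - P_Y(z_i^t)) \big)^2 + w_2 \| z_i^{t+1} - (\tilde{R}(R^t x_i + t^t) + \tilde{t}) \|_2^2 + w_3 \| z_i^{t+1} - (P_i d + m_i) \|_2^2 + w_4 \sum_{j \in N_i} \| (z_j^{t+1} - z_i^{t+1}) - \tilde{R}_i R_i^t (x_j - x_i) \|_2^2 \Big], \tag{15}$$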
2.2 2D Registration
We define I(x) as the pixel value of the image I located at the position x. The matching
energy measures the color similarity between the source image and the target image
warped onto the deformed grid Z .
$$E_{\mathrm{match}}(Z) = \sum_{i=1}^{n} \| I(x_i) - J(z_i) \|_2^2. \tag{16}$$
2.2.2 Prior energy
Similarly to 3D geometry registration, we can use different prior energies that can be
combined to build more complex priors.
Horn-Schunck. In the Horn-Schunck algorithm [14] the smoothness of the flow is de-
fined using a Laplacian operator
$$E_{\mathrm{HK}}(Z) = \sum_{i=1}^{n} \Big\| (z_i - x_i) - |N_i|^{-1} \sum_{j \in N_i} (z_j - x_j) \Big\|_2^2, \tag{18}$$
where |Ni | is the cardinality of Ni . This energy measures for each grid vertex the deviation
of its deformation from the mean deformation of its neighbors.
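The smoothness energy of Equation 18 can be sketched as follows. The grid is assumed to be given as lists of 2D vertex positions with explicit neighbor index lists; the function name and the three-vertex strip are hypothetical.

```python
def laplacian_smoothness_energy(X, Z, neighbors):
    """Deviation of each vertex displacement from the mean displacement of its neighbors."""
    flow = [[z_k - x_k for z_k, x_k in zip(z, x)] for z, x in zip(Z, X)]
    energy = 0.0
    for i, N_i in enumerate(neighbors):
        mean = [sum(flow[j][k] for j in N_i) / len(N_i) for k in range(2)]
        energy += sum((flow[i][k] - mean[k]) ** 2 for k in range(2))
    return energy

# Hypothetical 1 x 3 strip of grid vertices with chain connectivity.
X = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
neighbors = [[1], [0, 2], [1]]

# A uniform translation is perfectly smooth.
Z_shift = [[x + 0.5, y - 0.25] for x, y in X]
# Moving only the middle vertex produces a non-smooth flow.
Z_bump = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]

e_shift = laplacian_smoothness_energy(X, Z_shift, neighbors)
e_bump = laplacian_smoothness_energy(X, Z_bump, neighbors)
print(e_shift, e_bump)
```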
2.2.3 Optimization
In this section we show, as an example, how to optimize the matching energy combined
with the Laplacian smoothness energy. This is similar to the method presented in [14].
Our optimization energy is
To solve this optimization we linearize J(.) at the current estimate and solve iteratively
$$\arg\min_{Z^{t+1}} \sum_{i=1}^{n} \Big[ w_1 \| I(x_i) - J(z_i^t) - \nabla J(z_i^t)^T (z_i^{t+1} - z_i^t) \|_2^2 + w_2 \Big\| (z_i^{t+1} - x_i) - |N_i|^{-1} \sum_{j \in N_i} (z_j^{t+1} - x_j) \Big\|_2^2 \Big]. \tag{20}$$
where $\nabla J = [\nabla J_x \;\; \nabla J_y]^T$ is the image gradient, with $\nabla J_x$ the image gradient in the x
direction and $\nabla J_y$ the image gradient in the y direction. As previously, the minimization
can be computed by setting the partial derivatives to zero, which corresponds to solving
a linear system.
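The effect of linearizing J at the current estimate can be seen in a one-dimensional sketch with the smoothness term switched off (w2 = 0). The analytic "image" J(z) = z², the target value, and the function names are hypothetical choices made so that the result is easy to verify; real images require sampled gradients and interpolation.

```python
def J(z):
    """Hypothetical smooth 1D 'image' with a known derivative."""
    return z * z

def dJ(z):
    return 2.0 * z

def align(target_value, z0, iterations=6):
    """Iteratively minimize (I - J(z_t) - J'(z_t) (z_next - z_t))^2 over z_next."""
    z = z0
    for _ in range(iterations):
        residual = target_value - J(z)
        # Setting the derivative of the linearized energy to zero gives
        # J'(z)^2 (z_next - z) = J'(z) * residual.
        z = z + residual / dJ(z)
    return z

z_star = align(target_value=4.0, z0=1.5)
print(z_star)
```

Each iteration solves the linearized problem exactly and then re-linearizes at the new estimate, so the position converges to the point where J matches the target value.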
We formulate the energy measuring the quality of the 2D and 3D alignment as follows
$$E_{\mathrm{match}}(Z) = \sum_{i=1}^{n} \Big[ w_1 \| z_i - P_Y(z_i) \|_2^2 + w_2 \| I(x_i) - J(f(z_i)) \|_2^2 \Big]. \tag{21}$$
The first term is the matching energy presented in Section 2.1. The second term is similar
to the 2D matching energy presented in Section 2.2. The only difference is the additional
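function f : R3 → R2 that projects a 3D point zi to the 2D image J. For example, this
function could be a perspective projection of the form
$$f(z_i) = \begin{bmatrix} \dfrac{f\, z_{i,x}}{z_{i,z}} & \dfrac{f\, z_{i,y}}{z_{i,z}} \end{bmatrix}^T,$$
where the scalar f on the right-hand side denotes the focal length.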
2.3.2 Optimization
We illustrate 2D/3D registration in the context of a face tracking system that combines
the 2D/3D matching energy with a 3D blendshape prior. A blendshape representation
is a linear model defined as a set of blendshape meshes B = [b0 , ..., bn ] where b0 is the
rest pose and bi , i > 0 are different expressions. A new expression can be generated as
T = b0 + Bd, where B = [b1 − b0 , ..., bn − b0 ]. The blendshape model shown below is
inspired by Ekman's Facial Action Coding System [12]. Realtime face tracking using
[Figure: blendshape model; panel label: Neutral]
To solve this optimization we linearize J(f (.)) at the current estimate
$$\sum_{i=1}^{n} \| I(x_i) - J(f(z_i^{t+1})) \|_2^2 \approx \sum_{i=1}^{n} \Big\| I(x_i) - J(f(z_i^t)) - \nabla J(f(z_i^t))^T \frac{\partial f(z_i^t)}{\partial z_i} (z_i^{t+1} - z_i^t) \Big\|_2^2. \tag{23}$$
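For a perspective projection $f(z_i) = [\, f z_{i,x} / z_{i,z} \;\;\; f z_{i,y} / z_{i,z} \,]^T$ we have
$$\frac{\partial f(z_i)}{\partial z_i} = \begin{bmatrix} \dfrac{f}{z_{i,z}} & 0 & -\dfrac{f\, z_{i,x}}{z_{i,z}^2} \\ 0 & \dfrac{f}{z_{i,z}} & -\dfrac{f\, z_{i,y}}{z_{i,z}^2} \end{bmatrix}. \tag{24}$$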
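The perspective projection and its Jacobian from Equation 24 can be checked numerically. In this sketch the focal length is passed as `focal` to avoid the clash with the projection function f; the function names, the sample point, and the focal length value are assumptions made for illustration.

```python
def project(z, focal):
    """Perspective projection [focal * z_x / z_z, focal * z_y / z_z]."""
    return [focal * z[0] / z[2], focal * z[1] / z[2]]

def projection_jacobian(z, focal):
    """Analytic 2 x 3 Jacobian of project with respect to z."""
    zx, zy, zz = z
    return [[focal / zz, 0.0, -focal * zx / zz ** 2],
            [0.0, focal / zz, -focal * zy / zz ** 2]]

def numeric_jacobian(z, focal, h=1e-6):
    """Central finite differences, used only to check the analytic Jacobian."""
    jac = [[0.0] * 3 for _ in range(2)]
    for c in range(3):
        plus, minus = list(z), list(z)
        plus[c] += h
        minus[c] -= h
        f_plus, f_minus = project(plus, focal), project(minus, focal)
        for r in range(2):
            jac[r][c] = (f_plus[r] - f_minus[r]) / (2.0 * h)
    return jac

z = [0.2, -0.1, 2.0]   # hypothetical camera-space point
focal = 500.0          # hypothetical focal length in pixels
analytic = projection_jacobian(z, focal)
numeric = numeric_jacobian(z, focal)
max_gap = max(abs(a - n) for ra, rn in zip(analytic, numeric)
              for a, n in zip(ra, rn))
print(project(z, focal), analytic, max_gap)
```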
In [27], the global rigidity is decoupled, leading to a two-step optimization procedure. In
a first step, a 2D/3D alignment of the blendshape model is computed
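$$\arg\min_{Z^{t+1},\, d^{t+1}} \sum_{i=1}^{n} \Big[ w_1 \big( n_i^T (z_i^{t+1} - P_Y(z_i^t)) \big)^2 + w_2 \Big\| I(x_i) - J(f(z_i^t)) - \nabla J(f(z_i^t))^T \frac{\partial f(z_i^t)}{\partial z_i} (z_i^{t+1} - z_i^t) \Big\|_2^2 + w_3 \| z_i^{t+1} - (R^t (B_i d^{t+1} + b_{0i}) + t^t) \|_2^2 \Big], \tag{25}$$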
These two steps are repeated alternately until convergence. The first step can be
computed by solving a linear system. The second step can be solved using [11] or by
linearizing the rotation matrix. For tracking, another 2D matching energy can be added
to the system:
$$E_{\mathrm{match}}(Z^{t+1}) = \sum_{i=1}^{n} \| J^{t}(f(z_i^{t})) - J^{t+1}(f(z_i^{t+1})) \|_2^2. \tag{27}$$
This optical flow energy enforces color consistency over time by measuring the variation
of color from the previous image frame Jt to the current frame Jt+1 for each zi .
3 Robust Registration
In registration, outliers are not only introduced by corrupted sensor measurements, but
also by partial overlaps - many samples on the source simply do not have an ideal cor-
responding point on the target shape. To address this problem, various techniques rely
on a set of heuristics to either prune or downweight low-quality correspondences. Typical
criteria include discarding correspondences that are too far from each other, have dissim-
ilar normals, or involve points on the boundary of the geometry; see [21] for details. As
we will see next, these heuristics are related to the optimization of robust functions. In
this section we will consider robust functions as alternatives to the Euclidean metric and
introduce a suitable optimization technique to use them efficiently.
[Figure 3 plots. Recoverable labels: penalty functions including x^2 and |x|^p, with thresholds τ = 0.32, 0.48, 0.64, 0.80 and exponents p = 0.3, 0.5, 0.7, 0.9; weight functions w(x) = 1 if |x| ≤ τ and 0 otherwise, and w(x) = p|x|^(p−2).]
Figure 3: (top) The robust norms ϕ. (bottom) The associated weight functions w.
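Two of the weight functions named in Figure 3 can be sketched in code. The pairing of the 0/1 threshold weight with a truncated quadratic penalty is an assumption made here, chosen because it satisfies the relation w(x) = ϕ'(x)/x; the function names, the guard `eps`, and the numerical check are likewise illustrative.

```python
def truncated_quadratic(x, tau):
    """Assumed penalty: 0.5 * x^2 inside the threshold, constant outside."""
    return 0.5 * x * x if abs(x) <= tau else 0.5 * tau * tau

def truncated_weight(x, tau):
    """Weight 1 if |x| <= tau, 0 otherwise."""
    return 1.0 if abs(x) <= tau else 0.0

def lp_penalty(x, p):
    """Penalty |x|^p; values of p below 1 penalize large residuals less."""
    return abs(x) ** p

def lp_weight(x, p, eps=1e-8):
    """Weight p |x|^(p-2); eps is an assumed guard against division by zero."""
    return p * max(abs(x), eps) ** (p - 2.0)

def weight_from_penalty(phi, x, h=1e-6):
    """Numerical phi'(x) / x, used to cross-check the closed-form weights."""
    return (phi(x + h) - phi(x - h)) / (2.0 * h) / x

print(truncated_weight(0.5, 0.8), truncated_weight(0.9, 0.8))
print(lp_weight(0.25, 0.5))
print(weight_from_penalty(lambda x: lp_penalty(x, 0.5), 0.25))
```

The threshold weight reproduces the common heuristic of discarding correspondences that are too far apart, while the |x|^p weight reduces the influence of large residuals smoothly.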
In Fig. 3 we show a few examples of commonly used penalty functions; note how these
all possess properties such as radial monotonicity and symmetry [13]. The optimization
problem in Equation 28 can be solved using Iteratively Re-Weighted Least Squares (IRLS)
by solving a sequence of problems of the form
$$\arg\min_{p} \sum_{i=1}^{n} \alpha_i\, \epsilon_i(p)^2. \tag{29}$$
To understand how to compute the weights $\alpha_i$, first notice that the optima of Eq. 28 can
be obtained by setting its gradient to zero. The gradient can be computed by a simple
application of the chain rule (note we only look at one element of the sum)
$$\frac{\partial \varphi(\epsilon(p))}{\partial p} = \psi(\epsilon(p)) \frac{\partial \epsilon(p)}{\partial p} = w(\epsilon(p))\, \epsilon(p) \frac{\partial \epsilon(p)}{\partial p}, \tag{30}$$
where ψ(x) = ∂ϕ(x)/∂x for compactness of notation and w(x) = ψ(x)/x is the so-called
weighting function. Interestingly, the gradient of Eq. 29 is
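$$\frac{\partial\, \alpha\, \epsilon(p)^2}{\partial p} = 2 \alpha\, \epsilon(p) \frac{\partial \epsilon(p)}{\partial p}. \tag{31}$$
We can now see that by setting $\alpha = w(\epsilon(p))$ the two gradients become equal up to a
constant factor of 2, which does not change the minimizer. However,
as the optimal weights $\alpha_i^* = w(\epsilon_i(p^*))$ are not available, we use an iterative approach
where at each iteration the weights are computed using the previous iteration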
$$\arg\min_{p^{t+1}} \sum_{i=1}^{n} w(\epsilon_i(p^{t}))\, \epsilon_i(p^{t+1})^2. \tag{32}$$
This scheme is known as Iteratively Re-Weighted Least Squares (IRLS) and is related to
majorization-minimization. The basic idea of majorization-minimization is to iteratively
minimize a surrogate function that is everywhere greater than or equal to the objective
function and shares at least one point with it. If these requirements are fulfilled, the
algorithm converges to a minimum [25].
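The reweighting scheme can be illustrated on the simplest possible problem: estimating a single scalar from measurements that contain outliers. This is a minimal sketch under stated assumptions, not a registration algorithm: the residuals are taken to be the differences between the estimate and each sample, the weight is p|x|^(p−2) with a small guard `eps`, and all names and data are hypothetical.

```python
def lp_weight(residual, p, eps=1e-6):
    """IRLS weight p |x|^(p-2); eps is an assumed guard for residuals near 0."""
    return p * max(abs(residual), eps) ** (p - 2.0)

def irls_location(samples, p=0.5, iterations=50):
    """Robustly estimate a scalar from samples with residuals estimate - y_i."""
    estimate = sum(samples) / len(samples)  # plain least-squares start
    for _ in range(iterations):
        # Weights are computed from the residuals of the previous iterate.
        weights = [lp_weight(estimate - y, p) for y in samples]
        # The weighted least-squares problem has a closed-form solution here.
        estimate = sum(w * y for w, y in zip(weights, samples)) / sum(weights)
    return estimate

# Hypothetical measurements: inliers near 1.0 plus two gross outliers.
samples = [0.98, 1.01, 1.0, 0.99, 1.02, 8.0, 9.0]
mean = sum(samples) / len(samples)
robust = irls_location(samples)
print(mean, robust)
```

The plain mean is pulled far toward the outliers, whereas the reweighted estimate stays close to the inliers because large residuals receive small weights. In registration, the closed-form weighted mean is replaced by the weighted linear system of the corresponding alignment step.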
4 Conclusion
In this course, we introduced 2D/3D registration algorithms and showed their applications
for data captured with RGB-D devices, such as the Microsoft Kinect or Asus Xtion Live.
Image and geometry registration algorithms are an essential component of many computer
graphics and computer vision systems. With recent technological advances in RGB-D
sensors, robust algorithms that combine 2D image and 3D geometry registration have
become an active area of research. The goal of this course was to introduce the basics
of 2D/3D registration algorithms and to provide theoretical explanations and practical
tools to design robust computer vision and computer graphics systems based on RGB-D
devices. We have shown that 2D and 3D registration can be expressed and combined in
a common framework. Numerous applications based on RGB-D devices can benefit from
this formulation, which makes it easy to combine different priors. To illustrate
the theory and demonstrate practical relevance, we briefly discussed three applications:
rigid scanning, non-rigid modeling, and realtime face tracking.
References
[1] P. Besl and H. McKay. A method for registration of 3d shapes. PAMI, 1992.
[2] V. Blanz and T. Vetter. A morphable model for the synthesis of 3d faces. Proc. of
ACM SIGGRAPH, 1999.
[3] M. Botsch, M. Pauly, M. Gross, and L. Kobbelt. Primo: coupled prisms for intuitive
surface modeling. SGP, 2006.
[5] S. Bouaziz, A. Tagliasacchi, and M. Pauly. Sparse iterative closest point. SGP, 2013.
[6] S. Bouaziz, Y. Wang, and M. Pauly. Online modeling for realtime facial animation.
ACM Trans. Graph., 2013.
[8] Y. Chen and G. Medioni. Object modeling by registration of multiple range images.
In ICRA, 1991.
[9] D. Chetverikov, D. Svirko, D. Stepanov, and P. Krsek. The trimmed iterative clos-
est point algorithm. In Pattern Recognition, 2002. Proceedings. 16th International
Conference on, volume 3, pages 545–548. IEEE, 2002.
[10] T. Cootes and C. Taylor. Statistical models of appearance for computer vision, 2000.
[11] D. W. Eggert, A. Lorusso, and R. B. Fisher. Estimating 3-d rigid body transfor-
mations: a comparison of four major algorithms. Machine Vision and Applications,
1997.
[12] P. Ekman and W. Friesen. Facial Action Coding System: A Technique for the
Measurement of Facial Movement. Consulting Psychologists Press, 1978.
[13] J. Fox. An R and S-Plus companion to applied regression. Sage, 2002.
http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-robust-regression.pdf.
[14] B. K. P. Horn and B. G. Schunck. Determining optical flow. Artif. Intell., 1981.
[15] H. Li, B. Adams, L. J. Guibas, and M. Pauly. Robust single-view geometry and
motion reconstruction. ACM Trans. Graph., 2009.
[18] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison,
P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. Kinectfusion: Real-time dense
surface mapping and tracking. ISMAR, 2011.
[19] I. Oikonomidis, N. Kyriazis, and A. Argyros. Tracking the articulated motion of two
strongly interacting hands. CVPR, 2012.
[21] S. Rusinkiewicz and M. Levoy. Efficient variants of the icp algorithm. 3DIM, 2001.
[23] R. W. Sumner, J. Schmid, and M. Pauly. Embedded deformation for shape manip-
ulation. ACM Trans. Graph., 2007.
[24] J. Tong, J. Zhou, L. Liu, Z. Pan, and H. Yan. Scanning 3d full human bodies using
kinects. TVCG, 2012.
[25] P. Verboon. Majorization with iteratively reweighted least squares: A general ap-
proach to optimize a class of resistant loss functions.
[26] X. Wei, P. Zhang, and J. Chai. Accurate realtime full-body motion capture using a
single depth camera. ACM Trans. Graph., 2012.
[28] T. Weise, B. Leibe, and L. V. Gool. Accurate and robust registration for in-hand
modeling. CVPR, 2008.
[29] T. Weise, T. Wismer, B. Leibe, and L. Van Gool. In-hand scanning with online loop
closure. 3DIM, 2009.