Dynamic 2D/3D Registration

Sofien Bouaziz Andrea Tagliasacchi Mark Pauly


École Polytechnique Fédérale de Lausanne

Abstract

Image and geometry registration algorithms are an essential component of many
computer graphics and computer vision systems. With recent technological advances
in RGB-D sensors, such as the Microsoft Kinect or Asus Xtion Live, robust algorithms
that combine 2D image and 3D geometry registration have become an active area of
research. The goal of this course is to introduce the basics of 2D/3D registration
algorithms and to provide theoretical explanations and practical tools to design
computer vision and computer graphics systems based on RGB-D devices. To illustrate
the theory and demonstrate practical relevance, we briefly discuss three applications:
rigid scanning, non-rigid modeling, and realtime face tracking. Our course targets
researchers and computer graphics practitioners with a background in computer
graphics and/or computer vision. An up-to-date version of the course notes as well as
slides and source code can be found at http://lgg.epfl.ch/2d3dRegistration.

About the lecturers

Sofien Bouaziz is a PhD student in the Computer Graphics and Geometry Laboratory
at the École Polytechnique Fédérale de Lausanne (EPFL) under the supervision of Prof.
Mark Pauly. He received his MSc degree in Computer Science from EPFL in 2009. His
research interests include computer graphics, computer vision, and machine learning.
Sofien co-developed the facial motion capture software faceshift studio.

e-mail: [email protected]

website: http://lgg.epfl.ch/~bouaziz

Andrea Tagliasacchi is a post-doctoral scholar in the Computer Graphics and Geometry
Laboratory at the École Polytechnique Fédérale de Lausanne (EPFL). He received his
MSc from Politecnico di Milano and a PhD from Simon Fraser University (SFU) under
the joint supervision of Prof. Richard Zhang and Prof. Daniel Cohen-Or. His research
interests include computer graphics, geometry processing and computer vision with a
focus on geometry tracking.

e-mail: [email protected]

website: http://drtaglia.github.io

Mark Pauly is an associate professor of computer science at EPFL in Lausanne,
Switzerland, where he directs the Computer Graphics and Geometry Laboratory. Prior to
joining EPFL he was an assistant professor at ETH Zurich and a postdoctoral scholar at
Stanford University. He received his Ph.D. degree in 2003 from ETH Zurich. His research
interests include computer graphics and animation, shape analysis, geometry processing,
and architectural design.

e-mail: [email protected]

website: http://lgg.epfl.ch

Sofien and Mark are co-founders of faceshift AG (www.faceshift.com), an EPFL spin-off
that brings high-quality markerless facial motion capture to the consumer market.

1 Introduction

Recent technological advances in RGB-D sensing devices, such as the Microsoft Kinect,
facilitate numerous new and exciting applications, for example in 3D scanning [24] and
human motion tracking [26, 19, 6]. While affordable and accessible, consumer-level RGB-
D devices typically exhibit high noise levels in the acquired data. Moreover, difficult
lighting situations and geometric occlusions commonly occur in many application set-
tings, potentially leading to a severe degradation in data quality. This necessitates a
particular emphasis on the robustness of image and geometry processing algorithms.
The combination of 2D and 3D registration is one important aspect in the design of ro-
bust applications based on RGB-D devices. This lecture introduces the main concepts
of 2D and 3D registration and explains how to combine them efficiently. An up-to-
date version of these course notes as well as slides and source code can be found at
http://lgg.epfl.ch/2d3dRegistration.

2 2D/3D Registration

In the first part of the course we introduce the theory of 2D/3D registration algorithms
suitable for processing RGB-D data. We focus on pairwise registration to compute the
alignment of a source model onto a target model. This alignment can be rigid or
non-rigid, depending on the type of object being scanned. We formulate the registration
as the minimization of an energy

$$E_{\text{reg}} = E_{\text{match}} + E_{\text{prior}}. \tag{1}$$

The matching energy $E_{\text{match}}$ measures how close the source is to the target.
The prior energy $E_{\text{prior}}$ quantifies the deviation from the type of
transformation or deformation that the source is allowed to undergo during the
registration, for example, a rigid motion or an elastic deformation. The goal of
registration is to find a transformation of the source model that minimizes
$E_{\text{reg}}$ and thereby brings the source into alignment with the target.
For data acquired with RGB-D devices, registration can utilize both the geometric
information encoded in the 3D depth map and the color information provided by the
recorded 2D images. We show that Equation 1 provides a unified way to formulate both
2D and 3D registration, which simplifies their integration.

2.1 3D Registration

In 3D registration we want to align a source surface $\mathcal{X}$ embedded in
$\mathbb{R}^3$ to a target surface $\mathcal{Y}$ in $\mathbb{R}^3$. To formalize this
problem, we introduce a surface $\mathcal{Z}$ that is a transformed or deformed version
of $\mathcal{X}$ that eventually aligns with $\mathcal{Y}$. To solve the registration
problem numerically, we represent the continuous surface $\mathcal{X}$ by a set of
points $X = \{x_i \in \mathcal{X},\ i = 1 \dots n\}$ and define their corresponding
points on the deformed surface $\mathcal{Z}$ as
$Z = \{z_i \in \mathcal{Z},\ i = 1 \dots n\}$. Different sampling strategies have been
presented by Rusinkiewicz and Levoy [21].

2.1.1 Matching energy

The matching energy measures how close the surface $\mathcal{Z}$ is to the surface
$\mathcal{Y}$ and is defined as

$$E_{\text{match}}(\mathcal{Z}) = \int_{\mathcal{Z}} \varphi(z, \mathcal{Y})\, dz, \tag{2}$$

where $z \in \mathbb{R}^3$ is a point on $\mathcal{Z}$. The accuracy of the registration
is evaluated by the metric $\varphi$ that measures the distance to $\mathcal{Y}$. For
simplicity, we first use the squared Euclidean distance as metric. Robust metrics [17],
which increase the robustness of the registration to noise and outliers, can be used
instead and are presented in Section 3. Using the set of points $Z$, we can discretize
the matching energy as

$$E_{\text{match}}(Z) = \sum_{i=1}^{n} \| z_i - P_{\mathcal{Y}}(z_i) \|_2^2, \tag{3}$$

where $P_{\mathcal{Y}} : \mathbb{R}^3 \to \mathbb{R}^3$ returns the closest point (in
Euclidean distance) on the surface $\mathcal{Y}$ to its argument. $P_{\mathcal{Y}}(z_i)$
can also be seen as the orthogonal projection of $z_i$ onto $\mathcal{Y}$.
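
The discrete matching energy of Equation 3 can be illustrated with a minimal sketch. It assumes that the target surface is only available as a point sample, so the projection $P_{\mathcal{Y}}$ is approximated by a brute-force nearest-neighbor search; the function names `closest_points` and `matching_energy` are hypothetical and not part of the course code.

```python
import numpy as np

def closest_points(Z, Y):
    """Return, for each row of Z, the nearest row of Y (brute-force search)."""
    # Pairwise squared distances between all z_i and all target samples, shape (n, m).
    d2 = ((Z[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return Y[d2.argmin(axis=1)]

def matching_energy(Z, Y):
    """Point-to-point matching energy of Equation 3 for a sampled target."""
    return float(((Z - closest_points(Z, Y)) ** 2).sum())

# Toy data: three target samples and two deformed source points.
Y = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
Z = np.array([[0.1, 0.0, 0.0], [1.0, 0.2, 0.0]])
energy = matching_energy(Z, Y)
print(energy)
```

For larger point sets a spatial search structure such as a k-d tree would likely replace the quadratic distance table used here.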

2.1.2 Prior energy

In this section we present several prior energies that can be used for registration. These
energies can also be combined to build more sophisticated priors. Priors encode
properties of the scanned objects. For example, when scanning rigid objects, a global
rigidity prior can be used to limit the allowed transformations to rotations and
translations. For deforming objects, for example a human body, geometric priors that
mimic physical behavior such as elastic deformation are often employed. We describe a
simple local rigidity prior that approximates elastic deformations and facilitates
efficient implementations. More complex deformation behavior can be captured using a
data-driven approach. One popular method is based on a collection of sample shapes that
represents the space of allowed deformations. Using dimensionality reduction, for example
principal component analysis, efficient linear models can be derived that are suitable for
realtime registration algorithms.

Global rigidity. The global rigidity of the 3D registration can be measured as

$$E_{\text{rigid}}(Z, R, t) = \sum_{i=1}^{n} \| z_i - (R x_i + t) \|_2^2, \tag{4}$$

where $R \in \mathbb{R}^{3 \times 3}$ is a rotation matrix and $t \in \mathbb{R}^3$ a
translation vector. In this case, the deformed surface $\mathcal{Z}$ tries to follow a
rigid transformation of the original surface $\mathcal{X}$.

Local rigidity. The local rigidity energy, following [22, 4], can be expressed as

$$E_{\text{arap}}(Z, R_i|_{i=1}^{n}) = \sum_{i=1}^{n} \sum_{j \in N_i} \| (z_j - z_i) - R_i (x_j - x_i) \|_2^2, \tag{5}$$

where the $R_i \in \mathbb{R}^{3 \times 3}$ are rotation matrices and $N_i$ is the set of
indices of the neighboring points of $x_i$. In this case, each local neighborhood on the
surface $\mathcal{Z}$ tries to follow a rigid transformation of its corresponding local
neighborhood on the surface $\mathcal{X}$. Other local rigidity energies can also be used
as prior, see for example [3, 23].
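
To make Equation 5 concrete, the following sketch evaluates the local rigidity energy for a tiny point set. The neighborhoods are given as plain index lists and the per-point rotations as a stacked array; both representations, and the name `arap_energy`, are assumptions made for illustration. The example checks the defining property that a global rigid motion with matching rotations has zero energy.

```python
import numpy as np

def arap_energy(X, Z, R, neighbors):
    """Local rigidity energy of Equation 5.

    X, Z: (n, 3) source and deformed points; R: (n, 3, 3) per-point rotations;
    neighbors: list where neighbors[i] holds the indices in N_i.
    """
    energy = 0.0
    for i, N_i in enumerate(neighbors):
        for j in N_i:
            # Deformed edge minus the rotated source edge.
            r = (Z[j] - Z[i]) - R[i] @ (X[j] - X[i])
            energy += float(r @ r)
    return energy

X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
neighbors = [[1, 2], [0, 2], [0, 1]]

# Rotate by 90 degrees about the z axis and translate: a rigid motion.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Z = X @ Rz.T + np.array([0.5, -1.0, 2.0])

rigid_energy = arap_energy(X, Z, np.stack([Rz] * 3), neighbors)
identity_energy = arap_energy(X, Z, np.stack([np.eye(3)] * 3), neighbors)
print(rigid_energy, identity_energy)
```

With identity rotations the same configuration is penalized, which is why the rotations $R_i$ are optimized together with $Z$.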

Linear model. A 3D linear shape model can be defined using a matrix $P$ containing
the shape model basis, and a mean shape vector $m$ [10]. A new shape $s$ can be defined
as

$$s = P d + m, \tag{6}$$

where $d$ is a vector containing the basis coefficients. A linear model prior energy can be
formulated as the deviation of the vertices from the linear model

$$E_{\text{prior}}(Z, d) = \sum_{i=1}^{n} \| z_i - (P_i d + m_i) \|_2^2, \tag{7}$$

where $P_i$ and $m_i$ are the parts of $P$ and $m$ corresponding to the vertex $z_i$.
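
As a sketch of how such a linear model might be obtained, the code below derives $P$ and $m$ from a set of sample shapes by principal component analysis (computed with an SVD) and evaluates Equations 6 and 7. The random sample shapes, the stacking of vertex coordinates into one vector per shape, and the function names are assumptions for illustration only.

```python
import numpy as np

def build_linear_model(shapes, k):
    """PCA shape model from samples.

    shapes: (num_samples, 3n) array, one stacked vertex vector per sample.
    Returns the basis P of shape (3n, k) and the mean shape m of shape (3n,).
    """
    m = shapes.mean(axis=0)
    # Rows of Vt are the principal directions of the centered samples.
    _, _, Vt = np.linalg.svd(shapes - m, full_matrices=False)
    return Vt[:k].T, m

def model_prior_energy(Z, P, m, d):
    """Equation 7: squared deviation of the vertices Z (n, 3) from s = P d + m."""
    s = P @ d + m                      # Equation 6
    return float(((Z.reshape(-1) - s) ** 2).sum())

rng = np.random.default_rng(0)
shapes = rng.normal(size=(5, 12))      # 5 hypothetical samples with 4 vertices each
P, m = build_linear_model(shapes, k=2)

d = np.array([0.3, -0.1])
Z = (P @ d + m).reshape(4, 3)          # a shape that lies exactly in the model
print(model_prior_energy(Z, P, m, d))
```

Because the columns of $P$ are orthonormal here, projecting a shape onto the model reduces to a matrix-vector product, which is one reason such models suit realtime use.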

2.1.3 Optimization

How to best optimize the registration energy depends on the prior energy. In this section
we show, as an example, how to optimize a registration energy for two applications: rigid
scanning and non-rigid modeling.

In-hand rigid scanning. Since single depth maps acquired with the RGB-D sensor
exhibit high noise levels and do not cover the whole surface of the 3D object, an
aggregation procedure is typically applied to obtain a complete model with reduced
noise level. In order to aggregate multiple scans over time, different methods can be
used [28, 29, 18]. The classical approach is to perform a 3D rigid registration of the
currently acquired scan of the object with the already accumulated 3D data. The pairwise
3D alignment can be formulated as

$$E(Z, R, t) = w_1 E_{\text{match}} + w_2 E_{\text{rigid}}, \tag{8}$$

$$E_{\text{match}} = \sum_{i=1}^{n} \| z_i - P_{\mathcal{Y}}(z_i) \|_2^2, \qquad
E_{\text{rigid}} = \sum_{i=1}^{n} \| z_i - (R x_i + t) \|_2^2,$$

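where the matching energy is combined with a global rigidity prior. To optimize
$E(Z, R, t)$ we linearize the rotation matrix [20], approximating $\cos\theta$ by 1 and
$\sin\theta$ by $\theta$:

$$R \approx \tilde{R} = \begin{bmatrix} 1 & -\gamma & \beta \\ \gamma & 1 & -\alpha \\ -\beta & \alpha & 1 \end{bmatrix}. \tag{9}$$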
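
The quality of the linearization in Equation 9 can be checked numerically. The sketch below compares $\tilde{R}$ with an exact rotation built from the angles $\alpha$, $\beta$, $\gamma$ about the x, y and z axes; the composition order `Rz @ Ry @ Rx` is an assumed convention (any order agrees with Equation 9 to first order).

```python
import numpy as np

def linearized_rotation(alpha, beta, gamma):
    """First-order rotation matrix of Equation 9."""
    return np.array([[1.0, -gamma, beta],
                     [gamma, 1.0, -alpha],
                     [-beta, alpha, 1.0]])

def exact_rotation(alpha, beta, gamma):
    """Exact rotation Rz(gamma) @ Ry(beta) @ Rx(alpha) (assumed angle convention)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def linearization_error(angle):
    """Largest entry-wise gap between the exact and the linearized rotation."""
    return np.abs(exact_rotation(angle, angle, angle)
                  - linearized_rotation(angle, angle, angle)).max()

small_error = linearization_error(1e-3)   # second-order small
large_error = linearization_error(0.5)    # clearly no longer a rotation
print(small_error, large_error)
```

The error grows quadratically with the angle, which is why the approximation is only used for the small incremental rotation estimated in each iteration.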

The alignment is computed by solving iteratively

$$\arg\min_{Z^{t+1}, \tilde{R}, \tilde{t}} \sum_{i=1}^{n} w_1 \| z_i^{t+1} - P_{\mathcal{Y}}(z_i^{t}) \|_2^2 + w_2 \| z_i^{t+1} - (\tilde{R}(R^{t} x_i + t^{t}) + \tilde{t}) \|_2^2, \tag{10}$$

where $t$ is the iteration number and $z_i^{0} = x_i$. As $P_{\mathcal{Y}}(\cdot)$ is a
nonlinear function that is difficult to optimize directly, we use the previous estimate
$P_{\mathcal{Y}}(z_i^{t})$ in the optimization. This corresponds to the point-to-point
matching error [1]. To speed up the convergence of the optimization one can linearize
$\| z_i^{t+1} - P_{\mathcal{Y}}(z_i^{t}) \|_2$ at $P_{\mathcal{Y}}(z_i^{t})$, which gives
$n_i^T (z_i^{t+1} - P_{\mathcal{Y}}(z_i^{t}))$, where $n_i$ is the normal of the surface
$\mathcal{Y}$ at $P_{\mathcal{Y}}(z_i^{t})$. This leads to the point-to-plane matching
error [8]. The optimization can be reformulated as

$$\arg\min_{Z^{t+1}, \tilde{R}, \tilde{t}} \sum_{i=1}^{n} w_1 \left( n_i^T (z_i^{t+1} - P_{\mathcal{Y}}(z_i^{t})) \right)^2 + w_2 \| z_i^{t+1} - (\tilde{R}(R^{t} x_i + t^{t}) + \tilde{t}) \|_2^2. \tag{11}$$

Both Equation 10 and Equation 11 are quadratic and can therefore be minimized by
setting the partial derivatives to zero, which amounts to solving a linear system. During
the optimization, it can be advantageous to apply a Tikhonov regularization to the
parameters of the rigid motion, since linearizing the rotation matrix assumes that the
angles are small.

Rigidity as a hard constraint. It is interesting to note that when $w_2 = +\infty$, $z_i$
can be replaced in the matching energy by $R x_i + t$, leading to the registration energy

$$E(R, t) = \sum_{i=1}^{n} \| (R x_i + t) - P_{\mathcal{Y}}(R x_i + t) \|_2^2. \tag{12}$$

This energy can be minimized in a similar spirit by linearizing the rotation matrix and
iteratively solving a linear system. Other approaches can be found in [11].
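
One iteration of this hard-constraint scheme with the point-to-plane error can be sketched as follows. With the linearized rotation, $\tilde{R} p \approx p + \omega \times p$ for $\omega = (\alpha, \beta, \gamma)$, so each residual $n_i^T(\tilde{R} p_i + \tilde{t} - q_i)$ is linear in the six unknowns. The sketch assumes that the correspondences $q_i = P_{\mathcal{Y}}(p_i)$ and normals $n_i$ are already given, and it maps $\omega$ back to a proper rotation with Rodrigues' formula; that last step, the `damping` parameter standing in for the Tikhonov regularization, and all names are illustrative choices rather than the authors' implementation.

```python
import numpy as np

def rodrigues(omega):
    """Rotation matrix for the axis-angle vector omega."""
    theta = np.linalg.norm(omega)
    K = np.array([[0.0, -omega[2], omega[1]],
                  [omega[2], 0.0, -omega[0]],
                  [-omega[1], omega[0], 0.0]])
    if theta < 1e-12:
        return np.eye(3) + K
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))

def point_to_plane_step(P, Q, N, damping=0.0):
    """Solve for the incremental rigid motion of one linearized iteration.

    P: (n, 3) current source points, Q: (n, 3) matched target points,
    N: (n, 3) unit normals at Q. Returns (R_step, t_step).
    """
    # Row i is [p_i x n_i, n_i], since n.(omega x p) = omega.(p x n).
    A = np.hstack([np.cross(P, N), N])
    b = np.einsum("ij,ij->i", N, Q - P)
    # Normal equations with optional Tikhonov damping on the six parameters.
    H = A.T @ A + damping * np.eye(6)
    x = np.linalg.solve(H, A.T @ b)
    return rodrigues(x[:3]), x[3:]

# Synthetic check: the target is the source shifted by a small translation.
rng = np.random.default_rng(0)
P = rng.normal(size=(50, 3))
N = rng.normal(size=(50, 3))
N /= np.linalg.norm(N, axis=1, keepdims=True)
t_true = np.array([0.05, -0.02, 0.03])
Q = P + t_true
R_step, t_step = point_to_plane_step(P, Q, N)
print(t_step)
```

In a full registration loop the correspondences would be recomputed after every step and the incremental motion composed with the current estimate.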

[Figure 1 panel labels: Accumulated Scans, Shape Model, Fitting, 3D Mesh]

Figure 1: Registration of a morphable model towards the scanned face.

Non-rigid registration. Registering a shape template towards a scanned 3D object
yields a complete and clean 3D mesh [15]. An example is given in Figure 1 in the
context of face modeling. In this case, the morphable model of Blanz and Vetter [2] that
represents the variations of different human faces in neutral expression is registered to a
scan of a face. Non-rigid modeling using a morphable model can be formulated as

$$E(Z, d, R_i|_{i=1}^{n}, R, t) = w_1 E_{\text{match}} + w_2 E_{\text{rigid}} + w_3 E_{\text{model}} + w_4 E_{\text{arap}}, \tag{13}$$

$$\begin{aligned}
E_{\text{match}} &= \sum_{i=1}^{n} \| z_i - P_{\mathcal{Y}}(z_i) \|_2^2, \\
E_{\text{rigid}} &= \sum_{i=1}^{n} \| z_i - (R x_i + t) \|_2^2, \\
E_{\text{model}} &= \sum_{i=1}^{n} \| z_i - (P_i d + m_i) \|_2^2, \\
E_{\text{arap}} &= \sum_{i=1}^{n} \sum_{j \in N_i} \| (z_j - z_i) - R_i (x_j - x_i) \|_2^2.
\end{aligned} \tag{14}$$

A local rigidity energy is added to the optimization in order to get an accurate result, as
the morphable model represents the large-scale variability but might not capture
small-scale details. As previously, we solve iteratively

$$\begin{aligned}
\arg\min_{Z^{t+1}, d, \tilde{R}_i|_{i=1}^{n}, \tilde{R}, \tilde{t}} \sum_{i=1}^{n} \Big[\, & w_1 \left( n_i^T (z_i^{t+1} - P_{\mathcal{Y}}(z_i^{t})) \right)^2 + w_2 \| z_i^{t+1} - (\tilde{R}(R^{t} x_i + t^{t}) + \tilde{t}) \|_2^2 \\
& + w_3 \| z_i^{t+1} - (P_i d + m_i) \|_2^2 + w_4 \sum_{j \in N_i} \| (z_j^{t+1} - z_i^{t+1}) - \tilde{R}_i R_i^{t} (x_j - x_i) \|_2^2 \,\Big],
\end{aligned} \tag{15}$$

which corresponds to solving a linear system.

2.2 2D Registration

In 2D registration we want to register a source image $I$ to a target image $J$. During the
registration process, the 2D pixel grid of the source image
$X = \{x_i \in \mathbb{R}^2,\ i = 1 \dots n\}$ is deformed to
$Z = \{z_i \in \mathbb{R}^2,\ i = 1 \dots n\}$ to match the target image.

2.2.1 Matching energy

We define $I(x)$ as the pixel value of the image $I$ at the position $x$. The matching
energy measures the color similarity between the source image and the target image
warped onto the deformed grid $Z$:

$$E_{\text{match}}(Z) = \sum_{i=1}^{n} \| I(x_i) - J(z_i) \|_2^2. \tag{16}$$

2.2.2 Prior energy

Similarly to 3D geometry registration, we can use different prior energies that can be
combined to build more complex priors.

Lucas-Kanade. In the Lucas-Kanade algorithm [16] the deformation is assumed to be
constant within a patch around each pixel. This corresponds to the prior energy

$$E_{\text{LK}}(Z) = \sum_{i=1}^{n} \sum_{j \in N_i} \| (z_j - x_j) - (z_i - x_i) \|_2^2, \tag{17}$$

where $N_i$ is the set of indices of the neighbors of $x_i$.

Horn-Schunck. In the Horn-Schunck algorithm [14] the smoothness of the flow is
defined using a Laplacian operator

$$E_{\text{HK}}(Z) = \sum_{i=1}^{n} \Big\| (z_i - x_i) - |N_i|^{-1} \sum_{j \in N_i} (z_j - x_j) \Big\|_2^2, \tag{18}$$

where $|N_i|$ is the cardinality of $N_i$. This energy measures for each grid vertex the
deviation of its deformation from the mean deformation of its neighbors.
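
The smoothness energy of Equation 18 can be evaluated directly on a displacement field. The sketch below stores the displacement $z_i - x_i$ of every pixel in an array and takes $N_i$ to be the 4-connected neighbors inside the image; the choice of neighborhood, the array layout and the name `horn_schunck_energy` are assumptions, and the grid is assumed to have at least two rows and two columns.

```python
import numpy as np

def horn_schunck_energy(flow):
    """Equation 18 for a flow field of shape (H, W, 2), 4-connected neighbors."""
    total = np.zeros_like(flow)                   # sum of neighbor displacements
    count = np.zeros(flow.shape[:2] + (1,))       # |N_i| per pixel
    total[1:] += flow[:-1];        count[1:] += 1       # neighbor above
    total[:-1] += flow[1:];        count[:-1] += 1      # neighbor below
    total[:, 1:] += flow[:, :-1];  count[:, 1:] += 1    # neighbor to the left
    total[:, :-1] += flow[:, 1:];  count[:, :-1] += 1   # neighbor to the right
    residual = flow - total / count               # deviation from neighbor mean
    return float((residual ** 2).sum())

flow = np.full((4, 5, 2), 0.5)                    # constant translation of the grid
smooth_energy = horn_schunck_energy(flow)

flow[2, 2] += 1.0                                 # one pixel moves differently
rough_energy = horn_schunck_energy(flow)
print(smooth_energy, rough_energy)
```

A constant displacement has zero energy, while a single deviating pixel is penalized both at that pixel and at its neighbors.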

2.2.3 Optimization

In this section we show, as an example, how to optimize the matching energy combined
with the Laplacian smoothness energy. This is similar to the method presented in [14].
Our optimization energy is

$$E(Z) = w_1 E_{\text{match}} + w_2 E_{\text{HK}}, \tag{19}$$

$$E_{\text{match}} = \sum_{i=1}^{n} \| I(x_i) - J(z_i) \|_2^2, \qquad
E_{\text{HK}} = \sum_{i=1}^{n} \Big\| (z_i - x_i) - |N_i|^{-1} \sum_{j \in N_i} (z_j - x_j) \Big\|_2^2.$$

To solve this optimization we linearize $J(\cdot)$ at the current estimate and solve
iteratively

$$\arg\min_{Z^{t+1}} \sum_{i=1}^{n} w_1 \| I(x_i) - J(z_i^{t}) - \nabla J(z_i^{t})^T (z_i^{t+1} - z_i^{t}) \|_2^2 + w_2 \Big\| (z_i^{t+1} - x_i) - |N_i|^{-1} \sum_{j \in N_i} (z_j^{t+1} - x_j) \Big\|_2^2, \tag{20}$$

where $\nabla J = [\nabla J_x \ \nabla J_y]^T$ is the image gradient, with $\nabla J_x$
the image gradient in x direction and $\nabla J_y$ the image gradient in y direction. As
previously, the minimization can be computed by setting the partial derivatives to zero,
which corresponds to solving a linear system.

2.3 2D/3D Registration

We show how to combine 2D image registration and 3D geometry registration to best
utilize the data provided by the RGB-D sensor. More specifically, we want to register a
surface $\mathcal{X} \subset \mathbb{R}^3$ with color information $I$, i.e. a
texture-mapped surface, to a 3D surface $\mathcal{Y}$ with corresponding color image $J$.
As previously, the source $\mathcal{X}$ is deformed to a surface $\mathcal{Z}$. We sample
the continuous surface $\mathcal{X}$ to obtain a set of points
$X = \{x_i \in \mathcal{X},\ i = 1 \dots n\}$. We define their corresponding points on the
deformed surface $\mathcal{Z}$ as $Z = \{z_i \in \mathcal{Z},\ i = 1 \dots n\}$. The color
information of sample point $x_i$ is given by $I(x_i)$.

2.3.1 Matching energy

We formulate the energy measuring the quality of the 2D and 3D alignment as follows:

$$E_{\text{match}}(Z) = \sum_{i=1}^{n} w_1 \| z_i - P_{\mathcal{Y}}(z_i) \|_2^2 + w_2 \| I(x_i) - J(f(z_i)) \|_2^2. \tag{21}$$

The first term is the matching energy presented in Section 2.1. The second term is similar
to the 2D matching energy presented in Section 2.2. The only difference is the additional
function $f : \mathbb{R}^3 \to \mathbb{R}^2$ that projects a 3D point $z_i$ to the 2D
image $J$. For example, this function could be a perspective projection of the form
$f(z_i) = \left[ \frac{f z_{i,x}}{z_{i,z}} \ \ \frac{f z_{i,y}}{z_{i,z}} \right]^T$, where
the scalar $f$ inside the brackets denotes the focal length.

2.3.2 Optimization

We illustrate 2D/3D registration in the context of a face tracking system that combines
the 2D/3D matching energy with a 3D blendshape prior. A blendshape representation
is a linear model defined as a set of blendshape meshes
$\mathcal{B} = [b_0, \dots, b_n]$, where $b_0$ is the rest pose and $b_i$, $i > 0$, are
different expressions. A new expression can be generated as $T = b_0 + B d$, where
$B = [b_1 - b_0, \dots, b_n - b_0]$. The blendshape model shown in Figure 2 is inspired by
Ekman's Facial Action Coding System [12].

Figure 2: A blendshape model composed of 48 expressions (panel label: Neutral).

Realtime face tracking using an RGB-D device can be formulated as a 2D/3D registration
of the blendshape model to the 2D and 3D data [27]. The registration energy can be
formulated as

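$$E(Z, d, R, t) = w_1 E_{\text{match geometry}} + w_2 E_{\text{match color}} + w_3 E_{\text{model+rigid}}, \tag{22}$$

$$\begin{aligned}
E_{\text{match geometry}} &= \sum_{i=1}^{n} \| z_i - P_{\mathcal{Y}}(z_i) \|_2^2, \\
E_{\text{match color}} &= \sum_{i=1}^{n} \| I(x_i) - J(f(z_i)) \|_2^2, \\
E_{\text{model+rigid}} &= \sum_{i=1}^{n} \| z_i - (R(B_i d + b_{0i}) + t) \|_2^2.
\end{aligned}$$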
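
The blendshape model that enters the term $E_{\text{model+rigid}}$ can be sketched in a few lines. The example builds the delta basis $B = [b_1 - b_0, \dots]$ from a rest pose and expression meshes stored as stacked coordinate vectors, and evaluates $T = b_0 + B d$. The two-vertex meshes and the function name `blendshape_expression` are made up for illustration.

```python
import numpy as np

def blendshape_expression(b0, blendshapes, d):
    """Evaluate T = b0 + B d.

    b0: (3n,) rest pose; blendshapes: (k, 3n) expression meshes b_1..b_k;
    d: (k,) blendshape weights.
    """
    B = (blendshapes - b0).T          # (3n, k) basis of offsets from the rest pose
    return b0 + B @ d

b0 = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0])             # rest pose, two vertices
blendshapes = np.array([[0.0, 0.5, 0.0, 1.0, 0.0, 0.0],   # b_1 moves vertex 0
                        [0.0, 0.0, 0.0, 1.0, 0.0, 0.3]])  # b_2 moves vertex 1

rest = blendshape_expression(b0, blendshapes, np.zeros(2))
full = blendshape_expression(b0, blendshapes, np.array([1.0, 0.0]))
half = blendshape_expression(b0, blendshapes, np.array([0.5, 0.0]))
print(half)
```

Zero weights reproduce the rest pose and a unit weight reproduces the corresponding expression, so $d$ interpolates between meshes.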

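To solve this optimization we linearize $J(f(\cdot))$ at the current estimate

$$\sum_{i=1}^{n} \| I(x_i) - J(f(z_i^{t+1})) \|_2^2 \approx \sum_{i=1}^{n} \Big\| I(x_i) - J(f(z_i^{t})) - \nabla J(f(z_i^{t}))^T \frac{\partial f(z_i^{t})}{\partial z_i} (z_i^{t+1} - z_i^{t}) \Big\|_2^2. \tag{23}$$

For a perspective projection
$f(z_i) = \left[ \frac{f z_{i,x}}{z_{i,z}} \ \ \frac{f z_{i,y}}{z_{i,z}} \right]^T$ we have

$$\frac{\partial f(z_i)}{\partial z_i} = \begin{bmatrix} \dfrac{f}{z_{i,z}} & 0 & -\dfrac{f z_{i,x}}{z_{i,z}^2} \\[2ex] 0 & \dfrac{f}{z_{i,z}} & -\dfrac{f z_{i,y}}{z_{i,z}^2} \end{bmatrix}. \tag{24}$$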
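
The Jacobian of Equation 24 can be verified against central finite differences. In the sketch below the focal length value and the test point are arbitrary illustrative numbers, and the function names are hypothetical.

```python
import numpy as np

def project(z, f):
    """Perspective projection of a 3D point z = (x, y, depth) with focal length f."""
    return np.array([f * z[0] / z[2], f * z[1] / z[2]])

def project_jacobian(z, f):
    """Analytic 2x3 Jacobian of the projection, Equation 24."""
    x, y, depth = z
    return np.array([[f / depth, 0.0, -f * x / depth**2],
                     [0.0, f / depth, -f * y / depth**2]])

z = np.array([0.2, -0.1, 1.5])
f = 525.0                                  # hypothetical focal length in pixels

J_analytic = project_jacobian(z, f)

# Central differences along each coordinate axis, one column per axis.
eps = 1e-6
J_numeric = np.column_stack([
    (project(z + eps * e, f) - project(z - eps * e, f)) / (2 * eps)
    for e in np.eye(3)
])
print(np.abs(J_analytic - J_numeric).max())
```

A check of this kind is a cheap safeguard when the Jacobian is later chained with the image gradient as in Equation 23.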

In [27], the global rigidity is decoupled, leading to a two-step optimization procedure. In
a first step, a 2D/3D alignment of the blendshape model is computed

$$\begin{aligned}
\arg\min_{Z^{t+1}, d^{t+1}} \sum_{i=1}^{n} \, & w_1 \left( n_i^T (z_i^{t+1} - P_{\mathcal{Y}}(z_i^{t})) \right)^2 \\
& + w_2 \Big\| I(x_i) - J(f(z_i^{t})) - \nabla J(f(z_i^{t}))^T \frac{\partial f(z_i^{t})}{\partial z_i} (z_i^{t+1} - z_i^{t}) \Big\|_2^2 \\
& + w_3 \| z_i^{t+1} - (R^{t}(B_i d^{t+1} + b_{0i}) + t^{t}) \|_2^2,
\end{aligned} \tag{25}$$

and in a second step, a 3D rigid alignment is performed

$$\arg\min_{R^{t+1}, t^{t+1}} \sum_{i=1}^{n} \| z_i^{t+1} - (R^{t+1}(B_i d^{t+1} + b_{0i}) + t^{t+1}) \|_2^2. \tag{26}$$

These two steps are repeated alternately until convergence. The first step can be
computed by solving a linear system. The second step can be solved using [11] or by
linearizing the rotation matrix. For tracking, another 2D matching energy can be added
to the system:

$$E_{\text{match}}(Z^{t+1}) = \sum_{i=1}^{n} \| J^{t}(f(z_i^{t})) - J^{t+1}(f(z_i^{t+1})) \|_2^2. \tag{27}$$

This optical flow energy enforces color consistency over time by measuring the variation
of color from the previous image frame $J^{t}$ to the current frame $J^{t+1}$ for each
$z_i$.
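
The rigid alignment of the second step (Equation 26) is a least-squares fit of a rotation and translation between two corresponded point sets, for which closed-form solutions exist. The sketch below uses the well-known SVD-based construction, which is one of the families of methods of the kind compared in [11]; it is an illustration under that assumption, not the implementation used in [27]. Here `X` plays the role of the model points $B_i d^{t+1} + b_{0i}$ and `Z` the role of $z_i^{t+1}$.

```python
import numpy as np

def rigid_align(X, Z):
    """Return (R, t) minimizing sum_i ||z_i - (R x_i + t)||^2 for (n, 3) arrays."""
    cx, cz = X.mean(axis=0), Z.mean(axis=0)
    H = (X - cx).T @ (Z - cz)              # 3x3 cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    s = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, s]) @ U.T
    t = cz - R @ cx
    return R, t

# Synthetic check with a known rotation about the z axis and a translation.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
angle = 0.7
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.3, -1.2, 2.0])
Z = X @ R_true.T + t_true

R_est, t_est = rigid_align(X, Z)
print(np.abs(R_est - R_true).max(), np.abs(t_est - t_true).max())
```

Unlike the linearized rotation, this solution does not require the rotation between the two point sets to be small.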

3 Robust Registration

In registration, outliers are not only introduced by corrupted sensor measurements, but
also by partial overlaps: many samples on the source simply do not have an ideal
corresponding point on the target shape. To address this problem, various techniques rely
on a set of heuristics to either prune or downweight low-quality correspondences. Typical
criteria include discarding correspondences that are too far from each other, have
dissimilar normals, or involve points on the boundary of the geometry; see [21] for
details. As we will see next, these heuristics are related to the optimization of robust
functions. In this section we consider robust functions as alternatives to the Euclidean
metric and introduce a suitable optimization technique to use them efficiently.

In previous sections, we always considered an energy composed of terms of the form
$\varphi(\epsilon(p))$, where $\varphi(\epsilon) = \epsilon^2$ and $\epsilon(p)$ is the
Euclidean norm of the residual vector with parameters $p$. This squared Euclidean distance
metric is ideal for data corrupted by Gaussian noise, as it yields the maximum-likelihood
solution of the problem [7, Sec. 7.1.1]. However, it is not robust to outliers, which are
common in real-world data acquired by RGB-D devices.
[Figure 3 plots. Top row: the penalty functions $\frac{1}{2}x^2$; the truncated quadratic,
equal to $\frac{1}{2}x^2$ if $|x| \le \tau$ and $\frac{1}{2}\tau^2$ otherwise, for
$\tau$ = 0.80, 0.64, 0.48, 0.32; and $|x|^p$ for $p$ = 0.9, 0.7, 0.5, 0.3. Bottom row: the
associated weights $1$; $1$ if $|x| \le \tau$ and $0$ otherwise; and $p|x|^{p-2}$.]

Figure 3: (top) The robust norms ϕ. (bottom) The associated weight functions w.

In registration, robustness can be obtained by exploiting robust functions [17]. In this
framework, $\varphi(\epsilon)$ acts as a "penalty" function, that is, a function measuring
the influence that a certain residual has in the optimization. Given one of these
functions, our robust optimization can be expressed as

$$\arg\min_{p} \sum_{i=1}^{n} \varphi(\epsilon_i(p)). \tag{28}$$

Fig. 3 shows a few commonly used penalty functions; note that they all possess properties
such as radial monotonicity and symmetry [13]. The optimization problem in Equation 28
can be solved using Iteratively Re-Weighted Least Squares (IRLS) by solving a sequence of
problems of the form

$$\arg\min_{p} \sum_{i=1}^{n} \alpha_i\, \epsilon_i(p)^2. \tag{29}$$

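To understand how to compute the weights $\alpha_i$, first notice that the optima of
Eq. 28 can be obtained by setting its gradient to zero. The gradient can be computed by a
simple application of the chain rule (we only look at one element of the sum):

$$\frac{\partial \varphi(\epsilon(p))}{\partial p} = \psi(\epsilon(p)) \frac{\partial \epsilon(p)}{\partial p} = w(\epsilon(p))\, \epsilon(p) \frac{\partial \epsilon(p)}{\partial p}, \tag{30}$$

where $\psi(x) = \partial \varphi(x) / \partial x$ for compactness of notation and
$w(x) = \psi(x)/x$ is the so-called weighting function. Interestingly, the gradient of one
element of Eq. 29 is

$$\frac{\partial\, \alpha\, \epsilon(p)^2}{\partial p} = 2 \alpha\, \epsilon(p) \frac{\partial \epsilon(p)}{\partial p}. \tag{31}$$

We can now see that by setting $\alpha = w(\epsilon(p))$ the two gradients agree up to a
constant factor of 2, which does not change where they vanish. However, as the optimal
weights $\alpha_i^{*} = w(\epsilon_i(p^{*}))$ are not available, we use an iterative
approach where at each iteration the weights are computed using the previous iteration:

$$\arg\min_{p^{t+1}} \sum_{i=1}^{n} w(\epsilon_i(p^{t}))\, \epsilon_i(p^{t+1})^2. \tag{32}$$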

This scheme is known as Iteratively Re-Weighted Least Squares (IRLS) and is related to
majorization-minimization. The basic idea of majorization-minimization is to iteratively
minimize a function that is always larger than or equal to the objective function and has
at least one point in common with it. If these requirements are fulfilled, the algorithm
converges to a minimum [25].
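
The IRLS iteration of Equation 32 can be sketched on a small robust line-fitting problem, where the residual of sample $i$ is $\epsilon_i(p) = |a x_i + b - y_i|$ with parameters $p = (a, b)$. The sketch uses the weight $w(\epsilon) = p_{\text{norm}} |\epsilon|^{p_{\text{norm}} - 2}$ of the penalty $|\epsilon|^{p_{\text{norm}}}$ and clamps residuals from below by a small `eps` to keep the weights finite; the clamp, the line-fitting setting and all names are assumptions made for illustration.

```python
import numpy as np

def irls_line_fit(x, y, p_norm=1.0, iters=50, eps=1e-6):
    """Fit y ~ a x + b by IRLS with the penalty |residual|^p_norm."""
    A = np.column_stack([x, np.ones_like(x)])
    params = np.linalg.lstsq(A, y, rcond=None)[0]       # ordinary least-squares start
    for _ in range(iters):
        r = np.abs(A @ params - y)                      # residuals of previous iterate
        w = p_norm * np.maximum(r, eps) ** (p_norm - 2) # weights w(r) = p |r|^(p-2)
        # Weighted least squares: minimize sum_i w_i (a x_i + b - y_i)^2.
        Aw = A * w[:, None]
        params = np.linalg.solve(A.T @ Aw, A.T @ (w * y))
    return params

# Points on the line y = 2x + 1 with three gross outliers.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0
y[3] += 5.0
y[10] -= 4.0
y[17] += 6.0

plain = np.linalg.lstsq(np.column_stack([x, np.ones_like(x)]), y, rcond=None)[0]
robust = irls_line_fit(x, y, p_norm=1.0)
print(plain, robust)
```

The ordinary least-squares fit is pulled towards the outliers, whereas the reweighted fit stays close to the line supported by the majority of the samples.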

Trimmed Metrics. Discarding unreliable correspondences is undoubtedly the simplest
and most common way of dealing with outliers [21]. This can also be formulated using
Eq. 28, as it corresponds to a weight function like the one in Fig. 3 (bottom-middle),
whose corresponding penalty function is a truncated squared Euclidean norm, Fig. 3
(top-middle). Even though this is trivial to implement, the local support of the weight
function is problematic: if the source surface is too far from the target surface, the
registration process will not proceed, as all the weights would be zero. A possible
solution is to dynamically adapt the threshold value by analyzing the distribution of
residuals. For example, when the ratio of outliers versus inliers is known a priori, then
the threshold can be readily estimated [9].
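
When the fraction of inliers is known, the threshold $\tau$ can be picked as the corresponding order statistic of the residuals, so that exactly that fraction of correspondences keeps a nonzero weight. The sketch below illustrates this idea in the spirit of [9]; the function name, the rounding rule and the toy residuals are assumptions.

```python
import numpy as np

def trimmed_weights(residuals, inlier_fraction):
    """Binary weights that keep the given fraction of smallest residuals.

    Returns (weights, tau) where tau is the largest residual that is kept.
    """
    r = np.abs(residuals)
    k = max(1, int(np.ceil(inlier_fraction * r.size)))   # number of kept samples
    tau = np.partition(r, k - 1)[k - 1]                   # k-th smallest residual
    return (r <= tau).astype(float), tau

residuals = np.array([0.1, 0.2, 5.0, 0.05, 3.0, 0.15])
weights, tau = trimmed_weights(residuals, inlier_fraction=0.5)
print(weights, tau)
```

Because the threshold follows the residual distribution, some correspondences always remain active, which avoids the stall that a fixed threshold causes when source and target are far apart.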

Sparse Metrics. The shortcomings of trimmed metrics can be overcome by considering
sparse metrics. The penalty functions for sparse metrics take the form
$\varphi(\epsilon) = |\epsilon|^p$, see Fig. 3 (top-right). An important observation is
that the weight functions of $p$-norms, Fig. 3 (bottom-right), tend to infinity as we
approach zero, giving a very large reward to inliers. Moreover, contrary to trimmed
metrics, $p$-norms weakly penalize outliers, leading to a more stable approach when
target and source are far apart. This metric has been shown to be successful in [5].

4 Conclusion

In this course, we introduced 2D/3D registration algorithms and showed their applications
for data captured with RGB-D devices, such as the Microsoft Kinect or Asus Xtion Live.
Image and geometry registration algorithms are an essential component of many computer
graphics and computer vision systems. With recent technological advances in RGB-D
sensors, robust algorithms that combine 2D image and 3D geometry registration have
become an active area of research. The goal of this course was to introduce the basics
of 2D/3D registration algorithms and to provide theoretical explanations and practical
tools to design robust computer vision and computer graphics systems based on RGB-D
devices. We have shown that 2D and 3D registration can be expressed and combined in
a common framework. Numerous applications based on RGB-D devices can benefit from
this formulation, which makes it easy to combine different priors. To illustrate
the theory and demonstrate practical relevance, we briefly discussed three applications:
rigid scanning, non-rigid modeling, and realtime face tracking.

References
[1] P. Besl and H. McKay. A method for registration of 3d shapes. PAMI, 1992.

[2] V. Blanz and T. Vetter. A morphable model for the synthesis of 3d faces. Proc. of
ACM SIGGRAPH, 1999.

[3] M. Botsch, M. Pauly, M. Gross, and L. Kobbelt. Primo: coupled prisms for intuitive
surface modeling. SGP, 2006.

[4] S. Bouaziz, M. Deuss, Y. Schwartzburg, T. Weise, and M. Pauly. Shape-up: Shaping
discrete geometry with projections. Comput. Graph. Forum, 2012.

[5] S. Bouaziz, A. Tagliasacchi, and M. Pauly. Sparse iterative closest point. SGP, 2013.

[6] S. Bouaziz, Y. Wang, and M. Pauly. Online modeling for realtime facial animation.
ACM Trans. Graph., 2013.

[7] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press,
2004.

[8] Y. Chen and G. Medioni. Object modeling by registration of multiple range images.
In ICRA, 1991.

[9] D. Chetverikov, D. Svirko, D. Stepanov, and P. Krsek. The trimmed iterative clos-
est point algorithm. In Pattern Recognition, 2002. Proceedings. 16th International
Conference on, volume 3, pages 545–548. IEEE, 2002.

[10] T. Cootes and C. Taylor. Statistical models of appearance for computer vision, 2000.

[11] D. W. Eggert, A. Lorusso, and R. B. Fisher. Estimating 3-d rigid body
transformations: a comparison of four major algorithms. Machine Vision and Applications,
1997.

[12] P. Ekman and W. Friesen. Facial Action Coding System: A Technique for the
Measurement of Facial Movement. Consulting Psychologists Press, 1978.

[13] J. Fox. An R and S-Plus companion to applied regression. Sage, 2002.
http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-robust-regression.pdf.

[14] B. K. P. Horn and B. G. Schunck. Determining optical flow. Artif. Intell., 1981.

[15] H. Li, B. Adams, L. J. Guibas, and M. Pauly. Robust single-view geometry and
motion reconstruction. ACM Trans. Graph., 2009.

[16] B. D. Lucas and T. Kanade. An iterative image registration technique with an
application to stereo vision. IJCAI, 1981.

[17] M. Mirza and K. Boyer. Performance evaluation of a class of m-estimators for
surface parameter estimation in noisy range data. IEEE Transactions on Robotics
and Automation, 9:75–85, 1993.

[18] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison,
P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. Kinectfusion: Real-time dense
surface mapping and tracking. ISMAR, 2011.

[19] I. Oikonomidis, N. Kyriazis, and A. Argyros. Tracking the articulated motion of two
strongly interacting hands. CVPR, 2012.

[20] S. Rusinkiewicz. Derivation of point to plane minimization, 2013.
http://www.cs.princeton.edu/~smr/papers/icpstability.pdf.

[21] S. Rusinkiewicz and M. Levoy. Efficient variants of the icp algorithm. 3DIM, 2001.

[22] O. Sorkine and M. Alexa. As-rigid-as-possible surface modeling. SGP, 2007.

[23] R. W. Sumner, J. Schmid, and M. Pauly. Embedded deformation for shape manip-
ulation. ACM Trans. Graph., 2007.

[24] J. Tong, J. Zhou, L. Liu, Z. Pan, and H. Yan. Scanning 3d full human bodies using
kinects. TVCG, 2012.

[25] P. Verboon. Majorization with iteratively reweighted least squares: A general
approach to optimize a class of resistant loss functions.

[26] X. Wei, P. Zhang, and J. Chai. Accurate realtime full-body motion capture using a
single depth camera. ACM Trans. Graph., 2012.

[27] T. Weise, S. Bouaziz, H. Li, and M. Pauly. Realtime performance-based facial
animation. ACM Trans. Graph., 2011.

[28] T. Weise, B. Leibe, and L. V. Gool. Accurate and robust registration for in-hand
modeling. CVPR, 2008.

[29] T. Weise, T. Wismer, B. Leibe, and L. Van Gool. In-hand scanning with online loop
closure. 3DIM, 2009.
