Multiple View Geometry in Robotics
Lecture 08
Multiple View Geometry 2
Davide Scaramuzza
Lab Exercise 6 - Today
Implement the 8-point algorithm
2-view Structure From Motion
• Assumptions: none (K, R, and T are unknown).
• Goal: recover simultaneously the 3D scene structure 𝑃𝑖 and the camera parameters (𝐾1, 𝑅1, 𝑇1), (𝐾2, 𝑅2, 𝑇2), up to scale, from two images.
Structure from Motion (SFM)
Problem formulation: Given a set of 𝑛 point correspondences between two images, {𝑝𝑖1 = (𝑢𝑖1, 𝑣𝑖1), 𝑝𝑖2 = (𝑢𝑖2, 𝑣𝑖2)}, where 𝑖 = 1 … 𝑛, the goal is to simultaneously estimate
• the 3D points 𝑷𝑖,
• the camera relative-motion parameters (𝑹, 𝑻),
• and the camera intrinsics 𝑲1, 𝑲2 that satisfy:

$$\lambda_1^i \begin{bmatrix} u_1^i \\ v_1^i \\ 1 \end{bmatrix} = K_1 \,[\,I \mid 0\,] \begin{bmatrix} X_w^i \\ Y_w^i \\ Z_w^i \\ 1 \end{bmatrix}, \qquad \lambda_2^i \begin{bmatrix} u_2^i \\ v_2^i \\ 1 \end{bmatrix} = K_2 \,[\,R \mid T\,] \begin{bmatrix} X_w^i \\ Y_w^i \\ Z_w^i \\ 1 \end{bmatrix}$$

(Figure: cameras 𝐶1, 𝐶2 with unknown relative motion 𝑅, 𝑇 = ? and unknown 3D points 𝑷𝑖 = ?)
Structure from Motion (SFM)
Two variants exist:
• Calibrated camera(s) ⇒ 𝑲𝟏, 𝑲𝟐 are known
• Uncalibrated camera(s) ⇒ 𝑲𝟏, 𝑲𝟐 are unknown

(Figure: cameras 𝐶1, 𝐶2 with 𝑅, 𝑇 = ? and 𝑷𝑖 = ?)
Structure from Motion (SFM)
• Let’s study the case in which the cameras are calibrated.
• For convenience, let’s use normalized image coordinates:

$$\begin{bmatrix} \bar{u} \\ \bar{v} \\ 1 \end{bmatrix} = K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$

• Thus, we want to find 𝑹, 𝑻, 𝑷𝑖 that satisfy:

$$\lambda_1^i \begin{bmatrix} \bar{u}_1^i \\ \bar{v}_1^i \\ 1 \end{bmatrix} = [\,I \mid 0\,] \begin{bmatrix} X_w^i \\ Y_w^i \\ Z_w^i \\ 1 \end{bmatrix}, \qquad \lambda_2^i \begin{bmatrix} \bar{u}_2^i \\ \bar{v}_2^i \\ 1 \end{bmatrix} = [\,R \mid T\,] \begin{bmatrix} X_w^i \\ Y_w^i \\ Z_w^i \\ 1 \end{bmatrix}$$

(Figure: cameras 𝐶1, 𝐶2 with 𝑅, 𝑇 = ? and 𝑷𝑖 = ?)
Scale Ambiguity
If we rescale the entire scene and the camera poses by a constant factor (i.e., a similarity transformation), the projections (in pixels) of the scene points in both images remain exactly the same.
Scale Ambiguity
• In Structure from Motion, it is therefore not possible to recover the absolute scale of the scene!
• What about stereo vision? Is it possible? Why? (Yes: in stereo the baseline between the two cameras is known from calibration, so the absolute scale can be recovered.)
• Thus, only 5 degrees of freedom are measurable:
  • 3 parameters to describe the rotation
  • 2 parameters for the translation up to a scale (we can only compute the direction of translation but not its length)
Structure From Motion (SFM)
• How many knowns and unknowns?
• 𝟒𝒏 knowns:
  • 𝑛 correspondences; each one gives 4 numbers: (𝑢𝑖1, 𝑣𝑖1) and (𝑢𝑖2, 𝑣𝑖2), 𝑖 = 1 … 𝑛
• 𝟓 + 𝟑𝒏 unknowns:
  • 5 for the motion up to a scale (3 for rotation, 2 for translation)
  • 3𝑛 = number of coordinates of the 𝑛 3D points
• A solution requires 4𝑛 ≥ 5 + 3𝑛, i.e., at least 𝑛 = 5 correspondences (the minimal case identified by Kruppa).

E. Kruppa, Zur Ermittlung eines Objektes aus zwei Perspektiven mit Innerer Orientierung, Sitz.-Ber. Akad. Wiss., Wien, Math. Naturw. Kl., Abt. IIa., 1913. English translation plus original paper by Guillermo Gallego, arXiv, 2017.
Structure From Motion (SFM)
• Can we solve the estimation of the relative motion (𝑅, 𝑇) independently of the estimation of the 3D points? Yes! The next couple of slides prove that this is possible.
• Once (𝑅, 𝑇) is known, the 3D points can be triangulated using the triangulation algorithm from Lecture 7 (i.e., least-squares approximation plus reprojection-error minimization).
The Epipolar Constraint: Recap from Lecture 07
• The camera centers 𝐶1, 𝐶2 and one image point 𝑝1 (or 𝑝2) determine the so-called epipolar plane
• The intersections of the epipolar plane with the two image planes are called epipolar lines
• Corresponding points must therefore lie along the epipolar lines: this constraint is called the epipolar constraint
• An alternative way to formulate the epipolar constraint is to notice that the two corresponding image vectors plus the baseline must be coplanar

(Figure: epipolar plane through 𝐶1, 𝐶2, with points 𝑝1, 𝑝2 and the epipolar lines in both images)
Epipolar Geometry
$$p_1 = \begin{bmatrix} u_1 \\ v_1 \\ 1 \end{bmatrix}, \qquad p_2 = \begin{bmatrix} u_2 \\ v_2 \\ 1 \end{bmatrix}$$

Since 𝑝2, the baseline 𝑇, and the rotated vector 𝑝′1 = 𝑅𝑝1 must be coplanar (their triple product is zero), we obtain the epipolar constraint:

$$p_2^\top E\, p_1 = 0, \qquad E = [T_\times]\, R \quad \text{(essential matrix)}$$

(Figure: 3D point 𝑃, vectors 𝑝1, 𝑝2, 𝑝′1 = 𝑅𝑝1, baseline 𝑇, and normal 𝑛 of the epipolar plane)
Epipolar Geometry
(The same constraint, with 𝑝1, 𝑝2 expressed in normalized image coordinates.)

H. Christopher Longuet-Higgins, A computer algorithm for reconstructing a scene from two projections, Nature, 1981.
Example: Essential Matrix of a Camera Translating along 𝑥
With the camera translating along the 𝑥-axis: 𝑅 = 𝐼3×3, 𝑇 = (−𝑏, 0, 0)⊤, so

$$[T_\times] = \begin{bmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & b \\ 0 & -b & 0 \end{bmatrix}$$

$$\Rightarrow \quad E = [T_\times]\, R = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & b \\ 0 & -b & 0 \end{bmatrix}$$
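As a sanity check (not part of the slides), this worked example can be verified numerically. A Python/NumPy sketch, with a hypothetical baseline b = 0.2:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [T]x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Camera 2 translated along x: R = I, T = (-b, 0, 0)^T (b = 0.2 is arbitrary).
b = 0.2
R = np.eye(3)
T = np.array([-b, 0.0, 0.0])
E = skew(T) @ R          # equals [[0,0,0],[0,0,b],[0,-b,0]], as on the slide

# The epipolar constraint p2^T E p1 = 0 holds for any 3D point:
P1 = np.array([1.0, 0.5, 4.0])       # point in the camera-1 frame
P2 = R @ P1 + T                      # same point in the camera-2 frame
p1, p2 = P1 / P1[2], P2 / P2[2]      # normalized image coordinates
print(abs(p2 @ E @ p1))              # ~0 (up to floating-point error)
```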
How to compute the Essential Matrix?
• If we don’t know (𝑅, 𝑇), can we estimate 𝐸 from two images?
• Yes, given at least 5 correspondences

(Figure: matched feature points between Image 1 and Image 2)
A Note of History
• Kruppa showed in 1913 that 5 image correspondences is the minimal case and that there can be up to 11 solutions
• However, in 1988, Demazure showed that there are actually at most 10 distinct solutions
• In 1996, Philip proposed an iterative algorithm to find these solutions
• In 2004, Nister proposed the first efficient and non-iterative solution. It uses Groebner-basis decomposition
• The first popular solution uses 8 points and is called the 8-point algorithm or Longuet-Higgins algorithm (1981). Because of its ease of implementation, it is still used today (e.g., NASA rovers)

[1] E. Kruppa, Zur Ermittlung eines Objektes aus zwei Perspektiven mit Innerer Orientierung, Sitz.-Ber. Akad. Wiss., Wien, Math. Naturw. Kl., Abt. IIa., 1913. English translation plus original paper by Guillermo Gallego, arXiv, 2017.
[2] H. Christopher Longuet-Higgins, A computer algorithm for reconstructing a scene from two projections, Nature, 1981.
[3] D. Nister, An Efficient Solution to the Five-Point Relative Pose Problem, PAMI, 2004.
The 8-point algorithm
• Each pair of point correspondences 𝑝1 = (𝑢1, 𝑣1, 1)⊤, 𝑝2 = (𝑢2, 𝑣2, 1)⊤ provides a linear equation in the entries of 𝐸:

$$p_2^\top E\, p_1 = 0, \qquad E = \begin{bmatrix} e_{11} & e_{12} & e_{13} \\ e_{21} & e_{22} & e_{23} \\ e_{31} & e_{32} & e_{33} \end{bmatrix}$$

NB: The 8-point algorithm assumes that the entries of 𝐸 are all independent (which is not true since, for the calibrated case, they depend on 5 parameters (𝑅 and 𝑇)). The 5-point algorithm uses the epipolar constraint considering the dependencies among all entries.

H. Christopher Longuet-Higgins, A computer algorithm for reconstructing a scene from two projections, Nature, 1981.
The 8-point algorithm
• For 𝑛 points, we can stack the 𝑛 epipolar constraints into a linear system 𝑄𝐸ത = 0:

$$\underbrace{\begin{bmatrix}
u_2^1 u_1^1 & u_2^1 v_1^1 & u_2^1 & v_2^1 u_1^1 & v_2^1 v_1^1 & v_2^1 & u_1^1 & v_1^1 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
u_2^n u_1^n & u_2^n v_1^n & u_2^n & v_2^n u_1^n & v_2^n v_1^n & v_2^n & u_1^n & v_1^n & 1
\end{bmatrix}}_{Q \text{ (this matrix is known)}}
\begin{bmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{21} \\ e_{22} \\ e_{23} \\ e_{31} \\ e_{32} \\ e_{33} \end{bmatrix} = 0$$

Over-determined solution (𝑛 > 8 points)
• A solution is to minimize ‖𝑄𝐸ത‖² subject to the constraint ‖𝐸ത‖² = 1. The solution is the eigenvector corresponding to the smallest eigenvalue of the matrix 𝑄⊤𝑄 (because it is the unit vector 𝑥 that minimizes ‖𝑄𝑥‖² = 𝑥⊤𝑄⊤𝑄𝑥).
• It can be computed through Singular Value Decomposition (SVD). Matlab instructions:

[U,S,V] = svd(Q);
Ev = V(:,9);
E = reshape(Ev,3,3)';
Degenerate Configurations
• The solution of the 8-point algorithm is degenerate when the 3D points are coplanar.
• Conversely, the 5-point algorithm also works for coplanar points.
8-point algorithm: Matlab code
A few lines of code. In today’s exercise you will learn how to implement it.

function E = calibrated_eightpoint(p1, p2)
% p1, p2: n x 3 matrices of homogeneous normalized coordinates [u v 1]
% Each row of Q is the Kronecker product of the corresponding rows of p2 and p1
Q = [p1(:,1).*p2(:,1) , ...
     p1(:,2).*p2(:,1) , ...
     p1(:,3).*p2(:,1) , ...
     p1(:,1).*p2(:,2) , ...
     p1(:,2).*p2(:,2) , ...
     p1(:,3).*p2(:,2) , ...
     p1(:,1).*p2(:,3) , ...
     p1(:,2).*p2(:,3) , ...
     p1(:,3).*p2(:,3) ];
% Singular vector associated with the smallest singular value of Q
[U,S,V] = svd(Q);
Eh = V(:,9);
E = reshape(Eh,3,3)';   % transpose because Matlab reshapes column-wise
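For readers not working in Matlab, the same algorithm is a few lines of NumPy as well (a sketch, not the official exercise solution; the function name and the (n, 3) array layout are my own choices):

```python
import numpy as np

def eight_point(p1, p2):
    """Linear 8-point estimate of E from n >= 8 correspondences.
    p1, p2: (n, 3) arrays of homogeneous normalized coordinates [u, v, 1]."""
    # Each row of Q is the Kronecker product of p2_i and p1_i, matching the
    # column order [u2u1, u2v1, u2, v2u1, v2v1, v2, u1, v1, 1] of the slides.
    Q = np.einsum('ni,nj->nij', p2, p1).reshape(-1, 9)
    # The minimizer of ||Q e|| subject to ||e|| = 1 is the right singular
    # vector associated with the smallest singular value of Q.
    _, _, Vt = np.linalg.svd(Q)
    # NumPy reshapes row-major, so no transpose is needed (unlike Matlab).
    return Vt[-1].reshape(3, 3)
```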
Extract R and T from E
(Won’t be asked at the exam ☺)

• Singular Value Decomposition: $E = U \Sigma V^\top$
• Enforce the rank-2 constraint by setting the smallest singular value of $\Sigma$ to 0:

$$\Sigma = \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \end{bmatrix} \;\rightarrow\; \tilde{\Sigma} = \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

• Extract the translation (up to sign and scale) and the rotation:

$$[\hat{T}_\times] = U \begin{bmatrix} 0 & \pm 1 & 0 \\ \mp 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} U^\top = \begin{bmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{bmatrix}, \qquad \hat{t} = \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix}$$

$$\hat{R} = U \begin{bmatrix} 0 & \mp 1 & 0 \\ \pm 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} V^\top$$

• In the uncalibrated case (decomposing $F$ instead of $E$): $T = K_2 \hat{t}$, $R = K_2 \hat{R} K_1^{-1}$
4 possible solutions of R and T
There exists only one solution where the points are in front of both cameras.

(Figure: the four candidate configurations; two of the views are flipped by 180° around the optical axis)
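The decomposition into the four candidate pairs can be sketched as follows (a Python/NumPy sketch of the standard SVD recipe; selecting the correct pair by triangulating and checking that depths are positive is left out):

```python
import numpy as np

def decompose_essential(E):
    """Return the four candidate (R, t) pairs encoded by an essential matrix.
    t is the translation direction only (unit norm): the scale is lost."""
    U, _, Vt = np.linalg.svd(E)
    # Force proper rotations: U and V must have det = +1.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    t = U[:, 2]                 # left null-space of E, since [t]x t = 0
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    # Four combinations: two rotations ("twisted pair") x two signs of t.
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
```

The unique physically valid pair is then found by triangulating one or more points with each candidate and keeping the one with positive depth in both cameras.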
Structure from Motion (SFM)
Two variants exist:
• Calibrated camera(s) ⇒ 𝑲𝟏, 𝑲𝟐 are known → uses the Essential matrix
• Uncalibrated camera(s) ⇒ 𝑲𝟏, 𝑲𝟐 are unknown → uses the Fundamental matrix

(Figure: cameras 𝐶1, 𝐶2 with 𝑅, 𝑇 = ? and 𝑷𝑖 = ?)
The Fundamental Matrix
So far, we have assumed the camera intrinsic parameters to be known and we have used normalized image coordinates to get the epipolar constraint for calibrated cameras:

$$\bar{p}_2^\top E\, \bar{p}_1 = 0 \qquad\Longleftrightarrow\qquad \begin{bmatrix} \bar{u}_2^i & \bar{v}_2^i & 1 \end{bmatrix} E \begin{bmatrix} \bar{u}_1^i \\ \bar{v}_1^i \\ 1 \end{bmatrix} = 0$$

Substituting the normalized coordinates $\bar{p} = K^{-1} p$ (with $p$ now in pixel coordinates):

$$\begin{bmatrix} u_2^i & v_2^i & 1 \end{bmatrix} K_2^{-\top} E\, K_1^{-1} \begin{bmatrix} u_1^i \\ v_1^i \\ 1 \end{bmatrix} = 0$$

Defining the Fundamental Matrix $F = K_2^{-\top} E\, K_1^{-1}$, the epipolar constraint in pixel coordinates becomes:

$$\begin{bmatrix} u_2^i & v_2^i & 1 \end{bmatrix} F \begin{bmatrix} u_1^i \\ v_1^i \\ 1 \end{bmatrix} = 0$$

Fun thing: check out the Fundamental Matrix song, [Link] :-)
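The conversion between the two matrices is a one-liner; a small sketch (the intrinsic values below are made up for illustration):

```python
import numpy as np

def fundamental_from_essential(E, K1, K2):
    """F = K2^{-T} E K1^{-1}: the epipolar constraint in pixel coordinates."""
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
```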
The 8-point Algorithm for the Fundamental Matrix
• The same 8-point algorithm used to compute the essential matrix from a set of normalized image coordinates can also be used to determine the Fundamental matrix:

$$\begin{bmatrix} u_2^i & v_2^i & 1 \end{bmatrix} F \begin{bmatrix} u_1^i \\ v_1^i \\ 1 \end{bmatrix} = 0$$

• However, now the key advantage is that we work directly in pixel coordinates
Problem with 8-point algorithm
$$\begin{bmatrix}
u_2^1 u_1^1 & u_2^1 v_1^1 & u_2^1 & v_2^1 u_1^1 & v_2^1 v_1^1 & v_2^1 & u_1^1 & v_1^1 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
u_2^n u_1^n & u_2^n v_1^n & u_2^n & v_2^n u_1^n & v_2^n v_1^n & v_2^n & u_1^n & v_1^n & 1
\end{bmatrix}
\begin{bmatrix} f_{11} \\ f_{12} \\ f_{13} \\ f_{21} \\ f_{22} \\ f_{23} \\ f_{31} \\ f_{32} \\ f_{33} \end{bmatrix} = 0$$
Problem with 8-point algorithm
• Poor numerical conditioning, which makes the results very sensitive to noise
• Can be fixed by rescaling the data: Normalized 8-point algorithm
• In pixel coordinates, the columns of the data matrix differ by orders of magnitude (roughly ~10000 for the quadratic terms 𝑢2𝑢1, 𝑢2𝑣1, 𝑣2𝑢1, 𝑣2𝑣1; ~100 for the linear terms 𝑢2, 𝑣2, 𝑢1, 𝑣1; and 1 for the constant), so least-squares yields poor results
Normalized 8-point algorithm (1/3)
• This can be fixed using the normalized 8-point algorithm [Hartley, 1997], which estimates the Fundamental matrix from a set of normalized correspondences (with better numerical properties) and then unnormalizes the result to obtain the fundamental matrix for the given (unnormalized) correspondences
• Idea: transform the image coordinates so that they are in the range ~[−1, 1] × [−1, 1]
• One way is to apply the following rescaling and shift

Hartley, In defense of the eight-point algorithm, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997.
Normalized 8-point algorithm (3/3)
The Normalized 8-point algorithm can be summarized in three steps:
1. Normalize the point correspondences: $\hat{p}_1 = B_1 p_1$, $\hat{p}_2 = B_2 p_2$
2. Estimate the normalized $\hat{F}$ with the 8-point algorithm using the normalized coordinates $\hat{p}_1, \hat{p}_2$
3. Compute the unnormalized $F$ from $\hat{F}$: since

$$\hat{p}_2^\top \hat{F}\, \hat{p}_1 = p_2^\top B_2^\top \hat{F} B_1\, p_1 = 0,$$

it follows that

$$F = B_2^\top \hat{F} B_1$$
Normalized 8-point algorithm (2/3)
• In the original 1997 paper, Hartley proposed to rescale the two point sets such that the centroid of each set is 0 and the mean standard deviation is $\sqrt{2}$ (equivalent to having the points distributed around a circle passing through the four corners of the $[-1,1] \times [-1,1]$ square).
• This can be done for every point as follows:

$$\hat{p}^i = \frac{\sqrt{2}}{\sigma}\,(p^i - \mu)$$

where $\mu = (\mu_x, \mu_y) = \frac{1}{N}\sum_{i=1}^{N} p^i$ is the centroid and $\sigma = \frac{1}{N}\sum_{i=1}^{N} \lVert p^i - \mu \rVert$ is the mean standard deviation of the point set.
• This transformation can be expressed in matrix form using homogeneous coordinates:

$$\hat{p}^i = \begin{bmatrix} \dfrac{\sqrt{2}}{\sigma} & 0 & -\dfrac{\sqrt{2}}{\sigma}\mu_x \\ 0 & \dfrac{\sqrt{2}}{\sigma} & -\dfrac{\sqrt{2}}{\sigma}\mu_y \\ 0 & 0 & 1 \end{bmatrix} p^i$$

Hartley, In defense of the eight-point algorithm, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997.
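The normalization matrix above is straightforward to build; a sketch (assuming the points are the rows of an (n, 3) homogeneous array):

```python
import numpy as np

def normalization_matrix(p):
    """Return B such that the points B @ p_i have centroid 0 and mean
    distance sqrt(2) from the origin (Hartley's normalization).
    p: (n, 3) array of homogeneous pixel coordinates [u, v, 1]."""
    mu = p[:, :2].mean(axis=0)                               # centroid
    sigma = np.linalg.norm(p[:, :2] - mu, axis=1).mean()     # mean distance
    s = np.sqrt(2.0) / sigma
    return np.array([[s, 0.0, -s * mu[0]],
                     [0.0, s, -s * mu[1]],
                     [0.0, 0.0, 1.0]])
```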
Can 𝑅, 𝑇, 𝐾1 , 𝐾2 be extracted from F?
• In general, no: infinitely many solutions exist
• However, if the coordinates of the principal points of each camera are known and the two cameras have the same focal length 𝑓 in pixels, then 𝑅, 𝑇, and 𝑓 can be determined uniquely
Comparison between Normalized and non-normalized algorithm
(Figure: epipolar lines estimated with the non-normalized vs. normalized 8-point algorithm; average epipolar-line distances: 2.33 pixels, 0.92 pixel, 0.86 pixel)
Error Measures
• The quality of the estimated Essential or Fundamental matrix can be measured using different error metrics:
  • Algebraic error
  • Directional error
  • Epipolar line distance
  • Reprojection error
• When is the error 0?
  • These errors will be exactly 0 only if 𝑬 (or 𝑭) is computed from just 8 points (because in this case a non-overdetermined solution exists)
  • For more than 8 points, the error will only be 0 if there is no noise and there are no outliers in the data (with image noise or outliers, the system becomes overdetermined and no exact solution exists)

(Figure: epipolar plane with points 𝑝1, 𝑝2, normal 𝒏, and camera centers 𝐶1, 𝐶2)
Algebraic Error
• It follows directly from the 8-point algorithm, which seeks to minimize the algebraic error:

$$err = \lVert Q\bar{E} \rVert^2 = \sum_{i=1}^{N} \left( \bar{p}_2^{i\top} E\, \bar{p}_1^i \right)^2$$

• From the proof of the epipolar constraint and using the definition of the dot product, it can be observed that:

$$\bar{p}_2^\top E\, \bar{p}_1 = \bar{p}_2 \cdot (E \bar{p}_1) = \lVert \bar{p}_2 \rVert \, \lVert E \bar{p}_1 \rVert \cos(\theta) = \lVert \bar{p}_2 \rVert \, \lVert [T_\times] R\, \bar{p}_1 \rVert \cos(\theta)$$

• We can see that this product depends on the angle $\theta$ between $\bar{p}_2$ and the normal $n = E\bar{p}_1$ to the epipolar plane. It is nonzero when $\bar{p}_1$, $\bar{p}_2$, and $T$ are not coplanar.
• What is the drawback of this error measure? (It also depends on the magnitudes $\lVert \bar{p}_2 \rVert$ and $\lVert E \bar{p}_1 \rVert$, not only on the geometric misalignment.)

(Figure: epipolar plane with 𝑝1, 𝑝2, normal 𝑛, and camera centers 𝐶1, 𝐶2)
Directional Error
• Sum of squared cosines of the angle from the epipolar plane:

$$err = \sum_{i=1}^{N} \left( \cos(\theta^i) \right)^2$$

• It is obtained by normalizing the algebraic error:

$$\cos(\theta) = \frac{\bar{p}_2^\top E\, \bar{p}_1}{\lVert \bar{p}_2 \rVert \, \lVert E \bar{p}_1 \rVert}$$

(Figure: epipolar plane with 𝑝1, 𝑝2, normal 𝑛, and camera centers 𝐶1, 𝐶2)
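Both error measures are direct to evaluate once 𝐸 and the correspondences are given; a Python/NumPy sketch (points as rows of (n, 3) arrays):

```python
import numpy as np

def algebraic_error(E, p1, p2):
    """Sum of squared epipolar residuals (p2_i^T E p1_i)^2."""
    r = np.einsum('ni,ij,nj->n', p2, E, p1)
    return float(np.sum(r ** 2))

def directional_error(E, p1, p2):
    """Sum of squared cosines of the angle between each p2_i and the
    epipolar-plane normal n_i = E p1_i (the algebraic residual, normalized)."""
    r = np.einsum('ni,ij,nj->n', p2, E, p1)
    scale = np.linalg.norm(p2, axis=1) * np.linalg.norm(p1 @ E.T, axis=1)
    return float(np.sum((r / scale) ** 2))
```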
Epipolar Line Distance
• Sum of squared epipolar-line-to-point distances:

$$err = \sum_{i=1}^{N} \left( d^2(p_1^i, l_1^i) + d^2(p_2^i, l_2^i) \right), \qquad l_1 = F^\top p_2, \quad l_2 = F\, p_1$$

• Cheaper than the reprojection error because it does not require point triangulation

(Figure: epipolar plane and epipolar lines 𝑙1, 𝑙2 in the two images)
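A sketch of this metric (it works identically for 𝐸 with normalized coordinates or 𝐹 with pixel coordinates):

```python
import numpy as np

def epipolar_line_distance(F, p1, p2):
    """Sum of squared point-to-epipolar-line distances in both images.
    The distance of point p to line l = (a, b, c) is |l . p| / sqrt(a^2 + b^2).
    p1, p2: (n, 3) arrays of homogeneous points."""
    l2 = p1 @ F.T                  # epipolar lines in image 2: l2_i = F p1_i
    l1 = p2 @ F                    # epipolar lines in image 1: l1_i = F^T p2_i
    d2 = np.einsum('ni,ni->n', l2, p2) / np.linalg.norm(l2[:, :2], axis=1)
    d1 = np.einsum('ni,ni->n', l1, p1) / np.linalg.norm(l1[:, :2], axis=1)
    return float(np.sum(d1 ** 2 + d2 ** 2))
```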
Reprojection Error
• Sum of the squared reprojection errors:

$$err = \sum_{i=1}^{N} \left( \lVert p_1^i - \pi(P^i, K_1, I, 0) \rVert^2 + \lVert p_2^i - \pi(P^i, K_2, R, T) \rVert^2 \right)$$

• More expensive than the previous three errors because it requires first triangulating the 3D points!
• However, it is the most popular because it is more accurate: the error is computed directly with respect to the raw input data, which are the image points

(Figure: reprojected points 𝜋(𝑃, 𝐾1, 𝐼, 0) and 𝜋(𝑃, 𝐾2, 𝑅, 𝑇), with the camera-1 and camera-2 reprojection errors ‖𝑝1 − 𝜋(𝑃, 𝐾1, 𝐼, 0)‖ and ‖𝑝2 − 𝜋(𝑃, 𝐾2, 𝑅, 𝑇)‖)
Things to remember
• SFM from 2 views
• Calibrated and uncalibrated case
• Proof of the epipolar constraint
• 8-point algorithm and algebraic error
• Normalized 8-point algorithm
• Algebraic, directional, epipolar-line-distance, and reprojection errors
Readings
• Ch. 11.3 of Szeliski book, 2nd edition
• Ch. 14.2 of Corke book
Understanding Check
Are you able to answer the following questions?
• What's the minimum number of correspondences required for calibrated SFM and why?
• Are you able to derive the epipolar constraint?
• Are you able to define the essential matrix?
• Are you able to derive the 8-point algorithm?
• How many rotation-translation combinations can the essential matrix be decomposed into?
• Are you able to provide a geometrical interpretation of the epipolar constraint?
• Are you able to describe the relation between the essential and the fundamental matrix?
• Why is it important to normalize the point coordinates in the 8-point algorithm?
• Describe one or more possible ways to achieve this normalization.
• Are you able to describe the normalized 8-point algorithm?
• Are you able to provide quality metrics and their interpretation for the essential and fundamental matrix estimation?