
Vision Algorithms for Mobile Robotics

Lecture 08
Multiple View Geometry 2

Davide Scaramuzza
[Link] 1
Lab Exercise 6 - Today
Implement the 8-point algorithm

Estimated poses and 3D structure


2
2-View Geometry: recap
Depth from stereo (i.e., stereo vision):
• Assumptions: K, R, and T are known.
• Goal: recover the 3D structure from two images.

2-view Structure from Motion:
• Assumptions: none (K, R, and T are unknown).
• Goal: recover simultaneously the 3D scene structure and the camera poses (up to scale) from two images.

[Figure: two camera pairs observing scene points P_i = ?; camera parameters K1, R1, T1 and K2, R2, T2 (known for stereo, unknown for SFM)]
3
Structure from Motion (SFM)
Problem formulation: Given a set of n point correspondences between two images, {p_1^i = (u_1^i, v_1^i), p_2^i = (u_2^i, v_2^i)}, with i = 1 … n, the goal is to simultaneously estimate
• the 3D points P^i,
• the camera relative-motion parameters (R, T),
• and the camera intrinsics K_1, K_2 that satisfy:

$$\lambda_1^i \begin{bmatrix} u_1^i \\ v_1^i \\ 1 \end{bmatrix} = K_1 \,[I \mid 0] \begin{bmatrix} X_w^i \\ Y_w^i \\ Z_w^i \\ 1 \end{bmatrix}
\qquad
\lambda_2^i \begin{bmatrix} u_2^i \\ v_2^i \\ 1 \end{bmatrix} = K_2 \,[R \mid T] \begin{bmatrix} X_w^i \\ Y_w^i \\ Z_w^i \\ 1 \end{bmatrix}$$
4
Structure from Motion (SFM)
Two variants exist:
• Calibrated camera(s) ⇒ K_1, K_2 are known
• Uncalibrated camera(s) ⇒ K_1, K_2 are unknown

[Figure: cameras C1 and C2 with unknown relative pose (R, T) observing unknown points P_i]
5
Structure from Motion (SFM)
• Let’s study the case in which the cameras are calibrated.
• For convenience, let’s use normalized image coordinates:

$$\begin{bmatrix} \bar{u} \\ \bar{v} \\ 1 \end{bmatrix} = K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$

• Thus, we want to find R, T, P^i that satisfy:

$$\lambda_1^i \begin{bmatrix} \bar{u}_1^i \\ \bar{v}_1^i \\ 1 \end{bmatrix} = [I \mid 0] \begin{bmatrix} X_w^i \\ Y_w^i \\ Z_w^i \\ 1 \end{bmatrix}
\qquad
\lambda_2^i \begin{bmatrix} \bar{u}_2^i \\ \bar{v}_2^i \\ 1 \end{bmatrix} = [R \mid T] \begin{bmatrix} X_w^i \\ Y_w^i \\ Z_w^i \\ 1 \end{bmatrix}$$
6
Scale Ambiguity
If we rescale the entire scene and the camera views by a constant factor (i.e., apply a similarity transformation), the projections (in pixels) of the scene points in both images remain exactly the same.

7
Scale Ambiguity
• In Structure from Motion, it is therefore not possible to recover the absolute scale of the
scene!
• What about stereo vision? Is it possible? Why?
• Thus, only 5 degrees of freedom are measurable:
• 3 parameters to describe the rotation
• 2 parameters for the translation up to a scale (we can only compute the direction of translation but
not its length)

8
Structure From Motion (SFM)
• How many knowns and unknowns?
• 𝟒𝒏 knowns:
• 𝑛 correspondences, each one (u_1^i, v_1^i) and (u_2^i, v_2^i), i = 1 … n
• 𝟓 + 𝟑𝒏 unknowns
• 5 for the motion up to a scale (3 for rotation, 2 for translation)
• 3𝑛 = number of coordinates of the 𝑛 3D points

• Does a solution exist?


• If and only if the number of independent equations ≥ number of unknowns
⇒ 4n ≥ 5 + 3n ⇒ n ≥ 5
• First attempt to identify the solutions by Kruppa in 1913 (see historical note on slide 16).

E. Kruppa, Zur Ermittlung eines Objektes aus zwei Perspektiven mit Innerer Orientierung, Sitz.-Ber. Akad. Wiss., Wien, Math. Naturw. Kl., Abt. IIa., 1913. – English translation plus original paper by Guillermo Gallego, arXiv, 2017.
9
Structure From Motion (SFM)
• Can we solve the estimation of relative motion (𝑅, 𝑇) independently of the
estimation of the 3D points? Yes! The next couple of slides prove that this is
possible.
• Once (𝑅, 𝑇) are known, the 3D points can be triangulated using the triangulation algorithm from Lecture 7 (i.e., least-squares approximation plus reprojection error minimization)

10
The Epipolar Constraint: Recap from Lecture 07
• The camera centers 𝐶1, 𝐶2 and one image point 𝑝1 (or 𝑝2) determine the so-called epipolar plane
• The intersections of the epipolar plane with the two image planes are called epipolar lines
• Corresponding points must therefore lie along the epipolar lines: this constraint is called the epipolar constraint
• An alternative way to formulate the epipolar constraint is to notice that the two corresponding image vectors plus the baseline must be coplanar

[Figure: epipolar plane through C1, C2, and the 3D point; epipolar lines and corresponding points p1, p2 in the two images]
11
Epipolar Geometry
$$\bar{p}_1 = \begin{bmatrix} \bar{u}_1 \\ \bar{v}_1 \\ 1 \end{bmatrix} \qquad \bar{p}_2 = \begin{bmatrix} \bar{u}_2 \\ \bar{v}_2 \\ 1 \end{bmatrix}$$

Let $\bar{p}'_1 = R\,\bar{p}_1$ be $\bar{p}_1$ rotated into the frame of camera 2, and let $n = T \times \bar{p}'_1$ be the normal to the epipolar plane.

$\bar{p}_1$, $\bar{p}_2$, $T$ are coplanar:

$$\bar{p}_2^{\top} \cdot n = 0 \;\Rightarrow\; \bar{p}_2^{\top} (T \times \bar{p}'_1) = 0 \;\Rightarrow\; \bar{p}_2^{\top} \big(T \times (R\,\bar{p}_1)\big) = 0 \;\Rightarrow\; \bar{p}_2^{\top} [T]_{\times} R \,\bar{p}_1 = 0 \;\Rightarrow\; \bar{p}_2^{\top} E \,\bar{p}_1 = 0$$

epipolar constraint, with $E = [T]_{\times} R$ the essential matrix
12
Epipolar Geometry
$$\bar{p}_1 = \begin{bmatrix} \bar{u}_1 \\ \bar{v}_1 \\ 1 \end{bmatrix} \qquad \bar{p}_2 = \begin{bmatrix} \bar{u}_2 \\ \bar{v}_2 \\ 1 \end{bmatrix} \qquad \text{(normalized image coordinates)}$$

$$\bar{p}_2^{\top} E \,\bar{p}_1 = 0 \qquad \text{Epipolar constraint or Longuet-Higgins equation (1981)}$$

$$E = [T]_{\times} R \qquad \text{Essential matrix}$$

𝑅 and 𝑇 can be computed from 𝐸 recalling that $E = [T]_{\times} R$.

H. Christopher Longuet-Higgins, A computer algorithm for reconstructing a scene from two projections, Nature, 1981, PDF.
13
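As a quick sanity check, the epipolar constraint can be verified numerically. Below is a minimal Matlab sketch (not part of the lecture material; the rotation, translation, and 3D point are arbitrary example values): it builds E = [T]×R from a chosen relative pose and checks that p̄2ᵀ E p̄1 ≈ 0 for a point observed by both cameras.

theta = 0.1;                                 % arbitrary rotation angle about the z-axis
R = [cos(theta) -sin(theta) 0; sin(theta) cos(theta) 0; 0 0 1];
T = [0.5; 0.1; 0];                           % arbitrary baseline (only its direction matters)
Tx = [  0   -T(3)  T(2);
       T(3)   0   -T(1);
      -T(2)  T(1)   0  ];                    % skew-symmetric matrix [T]x
E = Tx * R;                                  % essential matrix

P1 = [1; 0.5; 4];                            % a 3D point expressed in the frame of camera 1
P2 = R * P1 + T;                             % the same point expressed in the frame of camera 2
p1 = P1 / P1(3);                             % normalized image coordinates in camera 1
p2 = P2 / P2(3);                             % normalized image coordinates in camera 2

residual = p2' * E * p1                      % ~0 up to round-off error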
Example: Essential Matrix of a Camera Translating along 𝑥

$$E = [T]_{\times} R$$

For a pure translation along the x-axis, $T = \begin{bmatrix} -b \\ 0 \\ 0 \end{bmatrix}$ and $R = I_{3\times 3}$, so

$$[T]_{\times} = \begin{bmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & b \\ 0 & -b & 0 \end{bmatrix}
\;\Rightarrow\; E = [T]_{\times} R = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & b \\ 0 & -b & 0 \end{bmatrix}$$
14
How to compute the Essential Matrix?
• If we don’t know (𝑅, 𝑇), can we estimate 𝐸 from two images?
• Yes, given at least 5 correspondences

[Figure: corresponding points in Image 1 and Image 2]
15
A Note of History
• Kruppa showed in 1913 that 5 image correspondences is the minimal case and that there can be up to
11 solutions.
• However, in 1988, Demazure showed that there are actually at most 10 distinct solutions.
• In 1996, Philip proposed an iterative algorithm to find these solutions.
• In 2004, Nister proposed the first efficient, non-iterative solution. It uses Groebner basis
decomposition.
• The first popular solution uses 8 points and is called the 8-point algorithm or Longuet-Higgins algorithm
(1981). Because of its ease of implementation, it is still used today (e.g., on NASA rovers).

[1] E. Kruppa, Zur Ermittlung eines Objektes aus zwei Perspektiven mit Innerer Orientierung, Sitz.-Ber. Akad. Wiss., Wien, Math. Naturw. Kl., Abt. IIa., 1913. –
English Translation plus original paper by Guillermo Gallego, Arxiv, 2017
[2] H. Christopher Longuet-Higgins, A computer algorithm for reconstructing a scene from two projections, Nature, 1981, PDF.
[3] D. Nister, An Efficient Solution to the Five-Point Relative Pose Problem, PAMI, 2004, PDF
16
The 8-point algorithm
• Each pair of point correspondences $\bar{p}_1 = (\bar{u}_1, \bar{v}_1, 1)^{\top}$, $\bar{p}_2 = (\bar{u}_2, \bar{v}_2, 1)^{\top}$ provides a linear equation:

$$\bar{p}_2^{\top} E \,\bar{p}_1 = 0, \qquad E = \begin{bmatrix} e_{11} & e_{12} & e_{13} \\ e_{21} & e_{22} & e_{23} \\ e_{31} & e_{32} & e_{33} \end{bmatrix}$$

$$u_2 u_1 e_{11} + u_2 v_1 e_{12} + u_2 e_{13} + v_2 u_1 e_{21} + v_2 v_1 e_{22} + v_2 e_{23} + u_1 e_{31} + v_1 e_{32} + e_{33} = 0$$

NB: The 8-point algorithm assumes that the entries of E are all independent, which is not true: for the calibrated case, they depend on only 5 parameters (R and T).
The 5-point algorithm uses the epipolar constraint while considering the dependencies among all entries.
H. Christopher Longuet-Higgins, A computer algorithm for reconstructing a scene from two projections, Nature, 1981, PDF.
17
The 8-point algorithm
• For 𝑛 points, we can write

$$\underbrace{\begin{bmatrix}
u_2^1 u_1^1 & u_2^1 v_1^1 & u_2^1 & v_2^1 u_1^1 & v_2^1 v_1^1 & v_2^1 & u_1^1 & v_1^1 & 1 \\
u_2^2 u_1^2 & u_2^2 v_1^2 & u_2^2 & v_2^2 u_1^2 & v_2^2 v_1^2 & v_2^2 & u_1^2 & v_1^2 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
u_2^n u_1^n & u_2^n v_1^n & u_2^n & v_2^n u_1^n & v_2^n v_1^n & v_2^n & u_1^n & v_1^n & 1
\end{bmatrix}}_{Q \;\text{(this matrix is known)}}
\underbrace{\begin{bmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{21} \\ e_{22} \\ e_{23} \\ e_{31} \\ e_{32} \\ e_{33} \end{bmatrix}}_{\bar{E} \;\text{(this vector of unknowns)}} = 0$$

18


The 8-point algorithm
QE = 0
Minimal solution
• 𝑄(𝑛×9) should have rank 8 to have a unique (up to a scale) non-trivial solution 𝐸ത
• Each point correspondence provides 1 independent equation
• Thus, 8 point correspondences are needed

Over-determined solution
• n > 8 points
• A solution is to minimize | 𝑄𝐸ത |2 subject to the constraint | 𝐸ത |2 = 1.
The solution is the eigenvector corresponding to the smallest eigenvalue of the matrix 𝑄𝑇 𝑄 (because it is the unit vector 𝑥 that
minimizes | 𝑄𝑥 |2 = 𝑥 𝑇 𝑄𝑇 𝑄𝑥).
• It can be solved through Singular Value Decomposition (SVD). Matlab instructions:
[U,S,V] = svd(Q);
Ev = V(:,9);
E = reshape(Ev,3,3)';
Degenerate Configurations
• The solution of the 8-point algorithm is degenerate when the 3D points are coplanar.
• Conversely, the 5-point algorithm works also for coplanar points

19
8-point algorithm: Matlab code
A few lines of code. In today’s exercise you will learn how to implement it.

function E = calibrated_eightpoint( p1, p2 )
% p1, p2: 3xN matrices of normalized homogeneous coordinates,
% each column = [u; v; 1]

p1 = p1'; % now Nx3, each row = [u v 1]
p2 = p2';

% Each row of Q is [u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1]
Q = [p1(:,1).*p2(:,1) , ...
     p1(:,2).*p2(:,1) , ...
     p1(:,3).*p2(:,1) , ...
     p1(:,1).*p2(:,2) , ...
     p1(:,2).*p2(:,2) , ...
     p1(:,3).*p2(:,2) , ...
     p1(:,1).*p2(:,3) , ...
     p1(:,2).*p2(:,3) , ...
     p1(:,3).*p2(:,3) ];

% The solution is the right singular vector associated with the
% smallest singular value of Q
[U,S,V] = svd(Q);
Eh = V(:,9);

E = reshape(Eh,3,3)';
20
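For reference, a possible way to call it (a sketch, not lecture material; p1_px, p2_px and the intrinsics K1, K2 are assumed to be given): since the function expects normalized coordinates, the pixel coordinates are first multiplied by the inverse intrinsics.

% p1_px, p2_px: 3xN homogeneous pixel coordinates (each column [u; v; 1]), N >= 8
p1 = K1 \ p1_px;                    % normalized image coordinates, image 1
p2 = K2 \ p2_px;                    % normalized image coordinates, image 2
E  = calibrated_eightpoint(p1, p2);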
Extract R and T from E
(Won’t be asked at the exam)

• Singular Value Decomposition: $E = U \Sigma V^{\top}$

• Enforcing the rank-2 constraint: set the smallest singular value of $\Sigma$ to 0:

$$\Sigma = \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \end{bmatrix} \;\rightarrow\; \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

$$[\hat{T}]_{\times} = U \begin{bmatrix} 0 & \pm 1 & 0 \\ \mp 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} V^{\top}, \qquad
[\hat{T}]_{\times} = \begin{bmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{bmatrix} \;\Rightarrow\; \hat{t} = \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix}$$

$$\hat{R} = U \begin{bmatrix} 0 & \mp 1 & 0 \\ \pm 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} V^{\top}$$

$$T = K_2 \,\hat{t}, \qquad R = K_2 \,\hat{R}\, K_1^{-1}$$
21
4 possible solutions of R and T
There exists only one solution where points are in front of both cameras

These two views are flipped by 180° around the optical axis
22
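A possible Matlab sketch of the decomposition of the essential matrix E and of the disambiguation is shown below (an illustration, not the lecture's reference code; the cheirality check is only described in the comments):

[U, ~, V] = svd(E);
W = [0 -1 0; 1 0 0; 0 0 1];

R1 = U * W  * V';  if det(R1) < 0, R1 = -R1; end   % enforce det(R) = +1
R2 = U * W' * V';  if det(R2) < 0, R2 = -R2; end
u3 = U(:,3);                                        % translation direction, up to sign and scale

candidates = {R1, u3; R1, -u3; R2, u3; R2, -u3};    % the four (R, T) combinations
% Keep the candidate for which the triangulated points (triangulation algorithm
% from Lecture 7) have positive depth in both camera frames, i.e., lie in front
% of both cameras.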
Structure from Motion (SFM)
Two variants exist:
• Calibrated camera(s) ⇒ K_1, K_2 are known
  • Uses the Essential matrix
• Uncalibrated camera(s) ⇒ K_1, K_2 are unknown
  • Uses the Fundamental matrix

[Figure: cameras C1 and C2 with unknown relative pose (R, T) observing unknown points P_i]
23
The Fundamental Matrix
So far, we have assumed that the camera intrinsic parameters are known, and we have used normalized image
coordinates to get the epipolar constraint for calibrated cameras:

$$\begin{bmatrix} \bar{u}_1^i \\ \bar{v}_1^i \\ 1 \end{bmatrix} = K_1^{-1} \begin{bmatrix} u_1^i \\ v_1^i \\ 1 \end{bmatrix}, \qquad
\begin{bmatrix} \bar{u}_2^i \\ \bar{v}_2^i \\ 1 \end{bmatrix} = K_2^{-1} \begin{bmatrix} u_2^i \\ v_2^i \\ 1 \end{bmatrix}, \qquad
\bar{p}_2^{\top} E \,\bar{p}_1 = 0$$

Substituting the normalized coordinates expresses the epipolar constraint directly in pixel coordinates:

$$\begin{bmatrix} u_2^i \\ v_2^i \\ 1 \end{bmatrix}^{\top} K_2^{-\top} E \, K_1^{-1} \begin{bmatrix} u_1^i \\ v_1^i \\ 1 \end{bmatrix} = 0$$

This defines the Fundamental Matrix $F = K_2^{-\top} E \, K_1^{-1}$, so that

$$\begin{bmatrix} u_2^i \\ v_2^i \\ 1 \end{bmatrix}^{\top} F \begin{bmatrix} u_1^i \\ v_1^i \\ 1 \end{bmatrix} = 0$$

Fun thing: check out the Fundamental Matrix song, [Link] :-)
26
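In code, the relation between the two matrices is a one-liner (a sketch; K1, K2 and E are assumed to be known):

F = (K2') \ E / K1;        % F = K2^(-T) * E * K1^(-1)
E_back = K2' * F * K1;     % and conversely E = K2' * F * K1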
The 8-point Algorithm for the Fundamental Matrix
• The same 8-point algorithm to compute the essential matrix from a set of normalized
image coordinates can also be used to determine the Fundamental matrix:

$$\begin{bmatrix} u_2^i \\ v_2^i \\ 1 \end{bmatrix}^{\top} F \begin{bmatrix} u_1^i \\ v_1^i \\ 1 \end{bmatrix} = 0$$

• However, now the key advantage is that we work directly in pixel coordinates

27
Problem with 8-point algorithm

$$\begin{bmatrix}
u_2^1 u_1^1 & u_2^1 v_1^1 & u_2^1 & v_2^1 u_1^1 & v_2^1 v_1^1 & v_2^1 & u_1^1 & v_1^1 & 1 \\
u_2^2 u_1^2 & u_2^2 v_1^2 & u_2^2 & v_2^2 u_1^2 & v_2^2 v_1^2 & v_2^2 & u_1^2 & v_1^2 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
u_2^n u_1^n & u_2^n v_1^n & u_2^n & v_2^n u_1^n & v_2^n v_1^n & v_2^n & u_1^n & v_1^n & 1
\end{bmatrix}
\begin{bmatrix} f_{11} \\ f_{12} \\ f_{13} \\ f_{21} \\ f_{22} \\ f_{23} \\ f_{31} \\ f_{32} \\ f_{33} \end{bmatrix} = 0$$
28
Problem with 8-point algorithm
• Poor numerical conditioning, which makes the results very sensitive to noise
• Can be fixed by rescaling the data: Normalized 8-point algorithm

In pixel coordinates, the columns of the data matrix differ by orders of magnitude:

$$\begin{bmatrix} {\sim}10000 & {\sim}10000 & {\sim}100 & {\sim}10000 & {\sim}10000 & {\sim}100 & {\sim}100 & {\sim}100 & 1 \end{bmatrix}
\begin{bmatrix} f_{11} \\ f_{12} \\ f_{13} \\ f_{21} \\ f_{22} \\ f_{23} \\ f_{31} \\ f_{32} \\ f_{33} \end{bmatrix} = 0$$

Orders of magnitude difference between the columns of the data matrix
→ least-squares yields poor results
29
Normalized 8-point algorithm (1/3)
• This can be fixed using a normalized 8-point algorithm [Hartley, 1997], which estimates the Fundamental
matrix on a set of Normalized correspondences (with better numerical properties) and then unnormalizes
the result to obtain the fundamental matrix for the given (unnormalized) correspondences
• Idea: Transform image coordinates so that they are in the range ~[−1,1] × [−1,1]
• One way is to apply the following rescaling and shift

$$\begin{bmatrix} \dfrac{2}{700} & 0 & -1 \\[4pt] 0 & \dfrac{2}{500} & -1 \\[4pt] 0 & 0 & 1 \end{bmatrix}$$

For a 700 × 500 image, this maps the corners (0,0), (700,0), (0,500), (700,500) to (−1,−1), (1,−1), (−1,1), (1,1), with the origin moved to the image center.
Hartley, In defense of the eight-point algorithm, IEEE Transactions on Pattern Analysis and Machine Intelligence, PDF
30
Normalized 8-point algorithm (3/3)
The Normalized 8-point algorithm can be summarized in three steps:
1. Normalize the point correspondences: $\hat{p}_1 = B_1 p_1$, $\hat{p}_2 = B_2 p_2$
2. Estimate the normalized $\hat{F}$ with the 8-point algorithm using the normalized coordinates $\hat{p}_1$, $\hat{p}_2$
3. Compute the unnormalized F from $\hat{F}$: since

$$\hat{p}_2^{\top} \hat{F} \,\hat{p}_1 = 0 \;\Leftrightarrow\; p_2^{\top} B_2^{\top} \hat{F} B_1 \, p_1 = 0,$$

it follows that

$$F = B_2^{\top} \hat{F} B_1$$
31
Normalized 8-point algorithm (2/3)
• In the original 1997 paper, Hartley proposed to rescale the two point sets such that the centroid of each set
is 0 and the mean standard deviation is √2 (equivalent to having the points distributed around a circle
passing through the four corners of the [−1,1] × [−1,1] square).

• This can be done for every point as follows:

$$\hat{p}^i = \frac{\sqrt{2}}{\sigma}\,(p^i - \mu)$$

where $\mu = (\mu_x, \mu_y) = \frac{1}{N}\sum_{i=1}^{N} p^i$ is the centroid and $\sigma = \frac{1}{N}\sum_{i=1}^{N} \lVert p^i - \mu \rVert^2$ is the mean standard deviation of the point set.

• This transformation can be expressed in matrix form using homogeneous coordinates:

$$\hat{p}^i = \begin{bmatrix} \dfrac{\sqrt{2}}{\sigma} & 0 & -\dfrac{\sqrt{2}}{\sigma}\,\mu_x \\[6pt] 0 & \dfrac{\sqrt{2}}{\sigma} & -\dfrac{\sqrt{2}}{\sigma}\,\mu_y \\[6pt] 0 & 0 & 1 \end{bmatrix} p^i$$

Hartley, In defense of the eight-point algorithm, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997. PDF
32
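A Matlab sketch of this normalization is given below (an illustration, not the lecture's reference code). It rescales each point set so that the centroid is at the origin and the mean distance from the centroid is √2, one common variant of Hartley's normalization; p is assumed to be a 3xN matrix of homogeneous pixel coordinates, one column per point.

function [p_hat, B] = normalize_points(p)
% Returns the normalized points p_hat = B*p (centroid at the origin,
% mean distance sqrt(2) from it) and the normalization matrix B.
mu    = mean(p(1:2,:), 2);                        % centroid (mu_x; mu_y)
sigma = mean(sqrt(sum((p(1:2,:) - mu).^2, 1)));   % mean distance to the centroid
s     = sqrt(2) / sigma;
B     = [s 0 -s*mu(1);
         0 s -s*mu(2);
         0 0  1];
p_hat = B * p;
end

The fundamental matrix is then estimated from the normalized correspondences with the 8-point algorithm and unnormalized as F = B2' * F_hat * B1.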
Can 𝑅, 𝑇, 𝐾1 , 𝐾2 be extracted from F?
• In general, no: infinitely many solutions exist
• However, if the coordinates of the principal points of each camera are known
and the two cameras have the same focal length 𝑓 in pixels, then 𝑅, 𝑇, 𝑓 can
be determined uniquely

33
Comparison between Normalized and non-normalized algorithm

                            8-point       Normalized 8-point   Nonlinear refinement
Avg. epipolar line distance 2.33 pixels   0.92 pixels          0.86 pixels

34
Error Measures
• The quality of the estimated Essential or Fundamental matrix can be measured using different error
metrics:
• Algebraic error
• Directional Error
• Epipolar Line Distance
• Reprojection Error
• When is the error 0?
• These errors will be exactly 0 only if 𝑬 (or 𝑭) is computed from just 8 points (because in this
case a non-overdetermined solution exists).
• For more than 8 points, the error will only be 0 if there is no noise and there are no outliers in the data
(with image noise or outliers, the system becomes overdetermined).

[Figure: epipolar plane through C1, C2, with image points p1, p2 and normal n]
35
Algebraic Error
• It follows directly from the 8-point algorithm, which seeks to minimize the algebraic error:

$$err = \lVert Q\bar{E} \rVert^2 = \sum_{i=1}^{N} \left( \bar{p}_2^{i\,\top} E \,\bar{p}_1^i \right)^2$$

• From the proof of the epipolar constraint and using the definition of the dot product, it can be observed that:

$$\bar{p}_2^{\top} E \,\bar{p}_1 = \bar{p}_2^{\top} \cdot (E\bar{p}_1)
= \lVert \bar{p}_2 \rVert \,\lVert E\bar{p}_1 \rVert \cos(\theta)
= \lVert \bar{p}_2 \rVert \,\lVert [T]_{\times} R \,\bar{p}_1 \rVert \cos(\theta)$$

• We can see that this product depends on the angle 𝜃 between $\bar{p}_2$ and the normal $n = E\bar{p}_1$ to
the epipolar plane. It is nonzero when $\bar{p}_1$, $\bar{p}_2$, and 𝑻 are not coplanar.
• What is the drawback of this error measure?
36
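In code, for correspondences stored column-wise, the algebraic error is two lines of Matlab (a sketch; p1, p2 are 3xN normalized homogeneous coordinates and E the estimated essential matrix):

residuals = sum(p2 .* (E * p1), 1);   % residuals(i) = p2(:,i)' * E * p1(:,i)
err_alg   = sum(residuals.^2);        % algebraic error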
Directional Error
• Sum of squared cosines of the angle from the epipolar plane:

$$err = \sum_{i=1}^{N} \big( \cos(\theta_i) \big)^2$$

• It is obtained by normalizing the algebraic error:

$$\cos(\theta) = \frac{\bar{p}_2^{\top} E \,\bar{p}_1}{\lVert \bar{p}_2 \rVert \,\lVert E\bar{p}_1 \rVert}$$

37
Epipolar Line Distance
• Sum of squared epipolar-line-to-point distances:

$$err = \sum_{i=1}^{N} \Big( d^2\!\left(p_1^i, l_1^i\right) + d^2\!\left(p_2^i, l_2^i\right) \Big)$$

where $l_1 = F^{\top} p_2$ and $l_2 = F p_1$ are the epipolar lines.

• Cheaper than the reprojection error because it does not require point triangulation

38
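A Matlab sketch of this error (an illustration; p1, p2 are 3xN homogeneous pixel coordinates and F the estimated fundamental matrix):

l2 = F  * p1;                                            % epipolar lines in image 2
l1 = F' * p2;                                            % epipolar lines in image 1
d1 = sum(l1 .* p1, 1) ./ sqrt(l1(1,:).^2 + l1(2,:).^2);  % point-to-line distances in image 1
d2 = sum(l2 .* p2, 1) ./ sqrt(l2(1,:).^2 + l2(2,:).^2);  % point-to-line distances in image 2
err_epi = sum(d1.^2 + d2.^2);                            % sum of squared distances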
Reprojection Error
• Sum of squared reprojection errors:

$$err = \sum_{i=1}^{N} \Big( \big\lVert p_1^i - \pi(P^i, K_1, I, 0) \big\rVert^2 + \big\lVert p_2^i - \pi(P^i, K_2, R, T) \big\rVert^2 \Big)$$

• More expensive than the previous three errors because it requires first triangulating the 3D points!
• However, it is the most popular error measure because it is the most accurate: the error is computed directly
with respect to the raw input data, which are the image points.

[Figure: the triangulated point P is reprojected into both images; the camera-1 reprojection error is ‖p1 − π(P, K1, I, 0)‖ and the camera-2 reprojection error is ‖p2 − π(P, K2, R, T)‖]
39
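A sketch of the computation for one correspondence (triangulate and reproject are hypothetical helper functions, e.g. the triangulation and projection routines from earlier lectures; p1, p2 are the measured 2x1 pixel coordinates):

P     = triangulate(p1, p2, K1, K2, R, T);        % hypothetical helper: triangulated 3D point
p1_re = reproject(P, K1, eye(3), zeros(3,1));     % hypothetical helper: pi(P, K1, I, 0)
p2_re = reproject(P, K2, R, T);                   % hypothetical helper: pi(P, K2, R, T)
err_i = norm(p1 - p1_re)^2 + norm(p2 - p2_re)^2;  % squared reprojection error for this point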
Things to remember
• SFM from 2 views
• Calibrated and uncalibrated case
• Proof of the epipolar constraint
• 8-point algorithm and algebraic error
• Normalized 8-point algorithm
• Error measures: algebraic error, directional error, epipolar line distance, reprojection error

40
Readings
• Ch. 11.3 of Szeliski book, 2nd edition
• Ch. 14.2 of Corke book

41
Understanding Check
Are you able to answer the following questions?
• What's the minimum number of correspondences required for calibrated SFM and why?
• Are you able to derive the epipolar constraint?
• Are you able to define the essential matrix?
• Are you able to derive the 8-point algorithm?
• How many rotation-translation combinations can the essential matrix be decomposed into?
• Are you able to provide a geometrical interpretation of the epipolar constraint?
• Are you able to describe the relation between the essential and the fundamental matrix?
• Why is it important to normalize the point coordinates in the 8-point algorithm?
• Describe one or more possible ways to achieve this normalization.
• Are you able to describe the normalized 8-point algorithm?
• Are you able to provide quality metrics and their interpretation for the essential and fundamental matrix estimation?

42
