Computer Vision Solution Manual
Cameras
PROBLEMS
1.1. Derive the perspective projection equations for a virtual image located at a distance
f' in front of the pinhole.
Solution We write again the vector relation OP' = λ OP, but this time impose z' = −f' (since
the image plane is in front of the pinhole and therefore has negative depth). The
perspective projection equations become

    x' = −f' x/z,
    y' = −f' y/z.

Note that the magnification is positive in this case since z is always negative.
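A quick numerical sketch of these projection equations (assuming NumPy; the helper name project_virtual is ours, not the book's):

    import numpy as np

    def project_virtual(P, f_prime):
        """Project scene points P (N x 3, with negative depths z) onto a virtual image
        plane at depth -f': x' = -f' x/z, y' = -f' y/z."""
        P = np.asarray(P, dtype=float)
        z = P[:, 2]
        return np.stack([-f_prime * P[:, 0] / z, -f_prime * P[:, 1] / z], axis=1)

    # Doubling the distance to the pinhole halves the image coordinates,
    # and the magnification -f'/z stays positive for z < 0.
    print(project_virtual([[1.0, 2.0, -10.0], [1.0, 2.0, -20.0]], f_prime=0.05))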
1.2. Prove geometrically that the projections of two parallel lines lying in some plane
Π appear to converge on a horizon line H formed by the intersection of the image
plane with the plane parallel to Π and passing through the pinhole.
Solution Let us consider two parallel lines ∆1 and ∆2 lying in the plane Π and
define ∆0 as the line passing through the pinhole that is parallel to ∆1 and ∆2 .
The lines ∆0 and ∆1 define a plane Π1 , and the lines ∆0 and ∆2 define a second
plane Π2 . Clearly, ∆1 and ∆2 project onto the lines δ1 and δ2 where Π1 and Π2
intersect the image plane Π0 . These two lines intersect at the point p0 where ∆0
intersects Π0 . This point is the vanishing point associated with the family of lines
parallel to ∆0 , and the projection of any line in the family appears to converge on
it. (This is true even for lines parallel to ∆0 that do not lie in Π.)
Now let us consider two other parallel lines ∆'1 and ∆'2 in Π and define as before
the corresponding line ∆'0 and vanishing point p'0. The lines ∆0 and ∆'0 lie in a
plane parallel to Π that intersects the image plane along a line H passing through
p0 and p'0. This is the horizon line, and any two parallel lines in Π appear to
intersect on it. They appear to converge there since any image point above the
horizon is associated with a ray issued from the pinhole and pointing away from Π.
Horizon points correspond to rays parallel to Π and points in that plane located at
an infinite distance from the pinhole.
1.3. Prove the same result algebraically using the perspective projection Eq. (1.1). You
can assume for simplicity that the plane Π is orthogonal to the image plane.
Solution Let us define the plane Π by y = c and consider a line ∆ in this plane
with equation ax + bz = d. According to Eq. (1.1), a point on this line projects
onto the image point defined by
    x' = f' x/z = f' (d − bz)/(az),
    y' = f' y/z = f' c/z.

As |z| → ∞, this image point tends to (−f' b/a, 0): all the lines of Π parallel to ∆
converge to the same vanishing point, and every such vanishing point lies on the line
y' = 0, the horizon associated with Π.
Before constructing the image of P, let us first determine the image P'0 of P0 on
the optical axis: after refraction at the right circular boundary of the lens, r0 is
transformed into a new ray r1 intersecting the optical axis at the point P1 whose
depth z1 verifies, according to (1.5),

    1/(−z0) + n/z1 = (n − 1)/R.

The ray r1 is immediately refracted at the left boundary of the lens, yielding a new
ray r'0 that intersects the optical axis in P'0. The paraxial refraction equation can
be rewritten in this case as

    n/(−z1) + 1/z'0 = (1 − n)/(−R).

Adding these two equations yields

    1/z'0 − 1/z0 = 1/f,   where   f = R/(2(n − 1)).   (1.1)
Let r denote the ray passing through P and the center O of the lens, and let P'
denote the intersection of r and r'0, located at depth z' and at a distance −y' from the
optical axis. We have the following relations among the sides of similar triangles:

    y/h = (z − z0)/(−z0) = 1 − z/z0,
    −y'/h = (z' − z'0)/z'0 = −(1 − z'/z'0),   (1.2)
    y'/z' = y/z.
Eliminating y, y', and h from Eq. (1.2) and combining the result with Eq. (1.1) yields
the thin lens equation

    1/z' − 1/z = 1/f.
1.5. Consider a camera equipped with a thin lens, with its image plane at position z'
and the plane of scene points in focus at position z. Now suppose that the image
plane is moved to ẑ'. Show that the diameter of the corresponding blur circle is

    d |z' − ẑ'| / z',

where d is the lens diameter. Use this result to show that the depth of field (i.e.,
the distance between the near and far planes that will keep the diameter of the
blur circles below some threshold ε) is given by

    D = 2εf z(z + f) d / (f²d² − ε²z²),
and conclude that, for a fixed focal length, the depth of field increases as the lens
diameter decreases, and thus the f number increases.
Hint: Solve for the depth ẑ of a point whose image is focused on the image plane at
position ẑ 0 , considering both the case where ẑ 0 is larger than z 0 and the case where
it is smaller.
Solution If ε denotes the diameter of the blur circle, using similar triangles im-
mediately shows that
    ε = d |z' − ẑ'| / z'.

Now let us assume that z' > ẑ'. Using the thin lens equation to solve for the depth
ẑ of a point focused on the plane ẑ' yields

    ẑ = f z (d − ε) / (df + εz).
    ∂D/∂d = −k (f²d² + ε²z²) / (f²d² − ε²z²)² < 0,   where k = 2εf z(z + f),
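A small numerical sketch of the two formulas above (assuming NumPy; the function names and the sample focal length, depth, and blur threshold are ours and purely illustrative):

    import numpy as np

    def blur_diameter(d, z_prime, z_hat_prime):
        """Diameter of the blur circle when the image plane moves from z' to ẑ'."""
        return d * abs(z_prime - z_hat_prime) / z_prime

    def depth_of_field(eps, f, z, d):
        """D = 2 eps f z (z + f) d / (f^2 d^2 - eps^2 z^2), as derived above."""
        return 2 * eps * f * z * (z + f) * d / (f**2 * d**2 - eps**2 * z**2)

    # Shrinking the aperture d (raising the f-number) increases the depth of field.
    f, z, eps = 0.05, -2.0, 1e-5          # focal length, scene depth, blur threshold (m)
    for d in (0.025, 0.0125, 0.00625):
        print(d, depth_of_field(eps, f, z, d))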
1.6. Give a geometric construction of the image P 0 of a point P given the two focal
points F and F 0 of a thin lens.
Solution Let us assume that the point P is off the optical axis of the lens. Draw
the ray r passing through P and F. After being refracted by the lens, r emerges
parallel to the optical axis. Now draw the ray r' passing through P and parallel to
the optical axis. After being refracted by the lens, r' must pass through F'. Draw
the two refracted rays. They intersect at the image P' of P. For a point P on the
optical axis, just construct the image of a point off the axis with the same depth to
determine the depth of the image P' of P. It is easy to derive the thin lens equation
from this geometric construction.
1.7. Derive the thick lens equations in the case where both spherical boundaries of the
lens have the same radius.
Solution The diagram below will help set up the notation. The thickness of the
lens is denoted by t. All distances here are taken positive; if some of the points
changed side in a different setting, all formulas derived below would still be valid,
with possibly negative distance values for the points having changed side.
(Figure: the points C, A, and B lie on the optical axis; a, b, and c denote their distances to the lens boundaries, and t is the thickness of the lens.)
Let us consider a point A located on the optical axis of the thick lens at a distance
a from its right boundary. A ray passing through A is refracted at the boundary,
and the secondary ray intersects the optical axis in a point B located a distance
b from the boundary (here B is on the right of the boundary; recall that we take
b > 0, and if B were on the left, we would use −b for the distance). The secondary
ray is then refracted at the left boundary of the lens, and the ternary ray finally
intersects the optical axis in a point C located at (positive) distance c from the left
lens boundary.
Applying the paraxial refraction Eq. (1.5) yields
    1/a − n/b = (n − 1)/R,
    n/(t + b) + 1/c = (n − 1)/R.
To establish the thick lens equation, let us first postulate the existence of the
focal and principal points of the lens, and compute their positions. Consider the
diagram below. This time A is the right focal point F of the lens, and any ray
passing through this point emerges from the lens parallel to its optical axis.
(Figure: the point A now coincides with the right focal point F; H is the right principal point, h and h' are the ray heights at the two lens boundaries, a is measured from the right boundary to F, d from H to the right boundary, and f = d + a is the focal length.)
This corresponds to 1/c = 0, or

    b = nR/(n − 1) − t = (nR − (n − 1)t)/(n − 1).

Substituting the values of a and b obtained earlier in this equation shows that

    d = Rt / (2Rn − (n − 1)t).

The focal length is the distance between H and F, and it is thus given by

    f = d + a = nR² / ((n − 1)[2Rn − (n − 1)t]),

or

    1/f = 2(n − 1)/R − (n − 1)²t/(nR²).
For these values of a, b, c, d, and f , it is now clear that any ray passing through F
emerges parallel to the optical axis, and that the emerging ray can be constructed
by pretending that the primary ray goes undeflected until it intersects the principal
plane passing through H and perpendicular to the optical axis, where refraction
turns it into a secondary ray parallel to the axis.
An identical argument allows the construction of the left focal and principal planes.
By symmetry, these are located at distances a and d from the left boundary, and
the left focal length is the same as the right one.
We gave in Ex. 1.6 a geometric construction of the image P' of a point P given
the two focal points F and F' of a thin lens. The same procedure can be used for a
thick lens, except for the fact that the ray going through the points P and F (resp.
P' and F') is “refracted” into a ray parallel to the optical axis when it crosses
the right (resp. left) principal plane instead of the right (resp. left) boundary of
the lens (Figure 1.11). It follows immediately that the thin lens equation holds for
thick lenses as well, i.e.,

    1/z' − 1/z = 1/f,

where the origin used to measure z is in the right principal plane instead of at the
optical center, and the origin used to measure z' is in the left principal plane.
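A numerical sketch of the symmetric thick-lens formulas derived above (assuming NumPy; the function thick_lens and the sample glass index and radius are ours):

    import numpy as np

    def thick_lens(n, R, t):
        """Focal length f and principal-point offset d of a symmetric thick lens
        (both boundaries of radius R, index n, thickness t), per the formulas above."""
        d = R * t / (2 * R * n - (n - 1) * t)
        f = n * R**2 / ((n - 1) * (2 * R * n - (n - 1) * t))
        return f, d

    # As t -> 0 the focal length tends to the thin-lens value R / (2 (n - 1)).
    n, R = 1.5, 0.02
    for t in (0.0, 0.002, 0.005):
        print(t, thick_lens(n, R, t))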
C H A P T E R 2
Geometric Camera Models
PROBLEMS

    ^A_B R = [ cos θ   −sin θ   0 ]
             [ sin θ    cos θ   0 ]
             [   0        0     1 ].

Note that ^A_B R is of course the inverse of the matrix ^B_A R given by Eq. (2.4).
When (B) is deduced from (A) by a rotation of angle θ about the axis i_A, we have

    ^A_B R = [ 1     0        0    ]
             [ 0   cos θ   −sin θ  ]
             [ 0   sin θ    cos θ  ].
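A small sketch of these two rotation matrices (assuming NumPy; the helper names rot_z and rot_x are ours), which also verifies the two properties discussed in the next problem:

    import numpy as np

    def rot_z(theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

    def rot_x(theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

    # The inverse of a rotation matrix is its transpose, and its determinant is 1.
    R = rot_z(0.3) @ rot_x(-0.8)
    print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))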
2.2. Show that rotation matrices are characterized by the following properties: (a) the
inverse of a rotation matrix is its transpose and (b) its determinant is 1.
Solution Let us first show that the two properties are necessary. Consider the
rotation matrix
    ^B_A R = ( ^B i_A   ^B j_A   ^B k_A ).

Clearly,

    R^T R = [ ^B i_A · ^B i_A   ^B i_A · ^B j_A   ^B i_A · ^B k_A ]
            [ ^B j_A · ^B i_A   ^B j_A · ^B j_A   ^B j_A · ^B k_A ] = Id,
            [ ^B k_A · ^B i_A   ^B k_A · ^B j_A   ^B k_A · ^B k_A ]
    T = [ R    t ]        T' = [ R'   t' ]
        [ 0^T  1 ],            [ 0^T   1 ],

so that

    T'' = T T' = [ RR'   Rt' + t ]
                 [ 0^T      1    ].

The matrix R'' = RR' is a rotation matrix since R''^T R'' = R'^T R^T R R' = R'^T R' = Id
and Det(R'') = Det(R) Det(R') = 1. Thus T'' is a rigid transformation matrix.
To prove (b), note that T 00 = Id when R0 = RT and t0 = −RT t. This shows that
any rigid transformation admits an inverse and that this inverse is given by these
two equations.
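A quick check of the composition and inverse formulas above (assuming NumPy; the helper names rigid and rigid_inverse are ours):

    import numpy as np

    def rigid(R, t):
        """4x4 matrix (R t; 0 1) of a rigid transformation."""
        T = np.eye(4)
        T[:3, :3], T[:3, 3] = R, t
        return T

    def rigid_inverse(T):
        """Inverse via R' = R^T, t' = -R^T t, as in the solution above."""
        R, t = T[:3, :3], T[:3, 3]
        return rigid(R.T, -R.T @ t)

    Rz = lambda a: np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1.0]])
    T1 = rigid(Rz(0.4), np.array([1.0, 2.0, 3.0]))
    T2 = rigid(Rz(-1.1), np.array([0.5, 0.0, -2.0]))
    # The product of two rigid transformations is rigid, and T @ rigid_inverse(T) = Id.
    print(np.allclose((T1 @ T2)[3], [0, 0, 0, 1]), np.allclose(T1 @ rigid_inverse(T1), np.eye(4)))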
2.4. Let ^A T denote the matrix associated with a rigid transformation T in the coordinate
system (A), with

    ^A T = [ ^A R   ^A t ]
           [ 0^T     1   ].

What is the matrix ^B T associated with T in the coordinate system (B)?

Solution For any point P and its image P' = T(P), writing the change of coordinates
^A P = ^A_B T ^B P and ^A P' = ^A_B T ^B P' in ^A P' = ^A T ^A P yields

    ^A_B T ^B P' = ^A T ^A_B T ^B P.

In turn, this can be rewritten as

    ^B P' = (^A_B T)^{-1} ^A T ^A_B T ^B P = ^B T ^B P,

or, since (^A_B T)^{-1} = ^B_A T,

    ^B T = ^B_A T ^A T ^A_B T.

It follows that an explicit expression for ^B T is

    ^B T = [ ^B_A R ^A R ^A_B R    ^B_A R ^A R ^A O_B + ^B_A R ^A t + ^B O_A ]
           [         0^T                              1                       ].
2.5. Show that if the coordinate system (B) is obtained by applying to the coordinate
system (A) the rigid transformation T, then ^B P = ^A T^{-1} ^A P, where ^A T denotes
the matrix representing T in the coordinate frame (A).

Solution We write

    ^A_B T = [ ^A i_B   ^A j_B   ^A k_B   ^A O_B ] = ^A T [ ^A i_A   ^A j_A   ^A k_A   ^A O_A ] = ^A T,
             [   0        0        0        1   ]         [   0        0        0        1   ]

since ^A i_A = (1, 0, 0)^T, ^A j_A = (0, 1, 0)^T, ^A k_A = (0, 0, 1)^T, and ^A O_A = 0, so the
rightmost matrix is the 4 × 4 identity. The result follows since ^B P = (^A_B T)^{-1} ^A P.
Solution Let us write ^F P = (x, y, z)^T and ^F P' = (x', y', z')^T. Obviously we
must have z = z', the angle between the two vectors (x, y) and (x', y') must be
equal to θ, and the norms of these two vectors must be equal. Note that the vector
(x, y) is mapped onto the vector (−y, x) by a 90° counterclockwise rotation. Thus
we have

    cos θ = (xx' + yy') / (√(x² + y²) √(x'² + y'²)) = (xx' + yy') / (x² + y²),
    sin θ = (−yx' + xy') / (√(x² + y²) √(x'² + y'²)) = (−yx' + xy') / (x² + y²).

Solving this system of linear equations in x' and y' immediately yields x' = x cos θ −
y sin θ and y' = x sin θ + y cos θ, which proves that we have indeed ^F P' = R ^F P.
2.7. Show that the change of coordinates associated with a rigid transformation pre-
serves distances and angles.
Solution Let us consider a fixed coordinate system and identify points of E3
with their coordinate vectors. Let us also consider three points A, B, and C and
their images A', B', and C' under the rigid transformation defined by the rotation
matrix R and the translation vector t. The squared distance between A' and B' is

    |B' − A'|² = |R(B − A)|² = (B − A)^T R^T R (B − A) = |B − A|²,
and it follows that rigid transformations preserve distances. Likewise, if θ' denotes
the angle between the vectors joining the point A' to the points B' and C', we have

    cos θ' = (B' − A') · (C' − A') / (|B' − A'| |C' − A'|) = [R(B − A)] · [R(C − A)] / (|B − A| |C − A|)
           = (B − A)^T R^T R (C − A) / (|B − A| |C − A|) = (B − A) · (C − A) / (|B − A| |C − A|) = cos θ,

where θ is the angle between the vectors joining the point A to the points B and
C. It follows that rigid transformations also preserve angles (to be rigorous, we
should also show that the sine of θ is preserved). Note that the translation
part of the rigid transformation is irrelevant in both cases.
2.8. Show that when the camera coordinate system is skewed and the angle θ between
the two image axes is not equal to 90 degrees, then Eq. (2.11) transforms into Eq.
(2.12).
Solution Let us denote by (û, v̂) the normalized coordinate system for the image
plane centered in the projection C0 of the optical center (see Figure 2.8). Let us
also denote by (ũ, ṽ) a skew coordinate system centered in C0 with unit basis vectors
and a skew angle equal to θ. Overlaying the orthogonal and skew coordinate systems
immediately reveals that v̂/ṽ = sin θ and û = ũ + v̂ cot θ, or

    ũ = û − v̂ cot θ = x/z − (y/z) cot θ,
    ṽ = (1/sin θ) v̂ = (1/sin θ) (y/z).

Taking into account the actual position of the image center and the camera magnifications yields

    u = α ũ + u0 = α x/z − α cot θ (y/z) + u0,
    v = β ṽ + v0 = (β/sin θ) (y/z) + v0,

which is precisely Eq. (2.12).
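A small sketch of the corresponding calibration matrix and of the projection it defines (assuming NumPy; the function intrinsic_matrix and the sample parameter values are ours):

    import numpy as np

    def intrinsic_matrix(alpha, beta, theta, u0, v0):
        """Calibration matrix of a skewed camera: u = alpha x/z - alpha cot(theta) y/z + u0,
        v = (beta / sin(theta)) y/z + v0."""
        return np.array([[alpha, -alpha / np.tan(theta), u0],
                         [0.0, beta / np.sin(theta), v0],
                         [0.0, 0.0, 1.0]])

    # Image of P = (x, y, z): homogeneous image point K @ (x/z, y/z, 1).
    K = intrinsic_matrix(800.0, 790.0, np.deg2rad(89.0), 320.0, 240.0)
    P = np.array([0.1, -0.2, 2.0])
    print(K @ np.array([P[0] / P[2], P[1] / P[2], 1.0]))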
Solution As shown by Eq. (2.15), the most general form of the perspective
projection matrix in some world coordinate system (W) is M = K ( ^C_W R   ^C O_W ).
Here, ^C O_W is the non-homogeneous coordinate vector of the origin of (W) in the
normalized coordinate system (C) attached to the camera. On the other hand, O
is by definition the homogeneous coordinate vector of the origin of (C)—that is,
the camera’s optical center—in the world coordinate system, so O^T = (^W O_C^T, 1).
Thus

    M O = K ( ^C_W R   ^C O_W ) O = K ( ^C_W R ^W O_C + ^C O_W ) = K ^C O_C = K 0 = 0.
Solution As noted in the chapter itself, according to Eq. (2.15) we have A = KR,
thus the determinants of A and K are the same and A is nonsingular. Now,
according to Eq. (2.17), we have

    a1 = α r1 − α cot θ r2 + u0 r3,
    a2 = (β/sin θ) r2 + v0 r3,
    a3 = r3,

where r1^T, r2^T, and r3^T denote the row vectors of the rotation matrix R. It follows
that

    (a1 × a3) · (a2 × a3) = (−α r2 − α cot θ r1) · ((β/sin θ) r1) = −αβ cos θ / sin²θ,

thus (a1 × a3) · (a2 × a3) = 0 implies that cos θ = 0 and the camera has zero skew.
Finally, we have

    |a1 × a3|² − |a2 × a3|² = |−α r2 − α cot θ r1|² − |(β/sin θ) r1|²
                            = α²(1 + cot²θ) − β²/sin²θ = (α² − β²)/sin²θ.

Thus |a1 × a3|² = |a2 × a3|² implies that α² = β², i.e., that the camera has unit
aspect ratio.
2.11. Show that the conditions of Theorem 1 are sufficient. Note that the statement of
this theorem is a bit different from the corresponding theorems in Faugeras (1993)
and Heyden (1995), where the condition Det(A) ≠ 0 is replaced by a3 ≠ 0. Of
course, Det(A) ≠ 0 implies a3 ≠ 0.
Solution We follow here the procedure for the recovery of a camera’s intrinsic and
extrinsic parameters given in Section 3.2.1 of the next chapter. The conditions of
Theorem 1 ensure via Eq. (3.13) that this procedure succeeds and yields the correct
intrinsic parameters. In particular, when the determinant of A is nonzero, the
vectors ai are linearly independent, and their pairwise cross-products are nonzero,
ensuring that all terms in Eq. (3.13) are well defined. Adding the condition (a1 ×
a3) · (a2 × a3) = 0 gives cos θ a value of zero in that equation, yielding a
zero-skew camera. Finally, adding the condition |a1 × a3|² = |a2 × a3|² gives the
two magnifications equal values, corresponding to a unit aspect ratio.
2.12. If ^A Π denotes the homogeneous coordinate vector of a plane Π in the coordinate
frame (A), what is the homogeneous coordinate vector ^B Π of Π in the frame (B)?
Solution Let ^A_B T denote the matrix representing the change of coordinates between
the frames (B) and (A), such that ^A P = ^A_B T ^B P. We have, for any point P
in the plane Π,

    0 = ^A Π^T ^A P = ^A Π^T ^A_B T ^B P = [^A_B T^T ^A Π]^T ^B P = ^B Π^T ^B P.

Thus ^B Π = ^A_B T^T ^A Π.
2.13. If ^A Q denotes the symmetric matrix associated with a quadric surface in the coordinate
frame (A), what is the symmetric matrix ^B Q associated with this surface
in the frame (B)?
Solution Let ^A_B T denote the matrix representing the change of coordinates between
the frames (B) and (A), such that ^A P = ^A_B T ^B P. We have, for any point P
on the quadric surface,

    0 = ^A P^T ^A Q ^A P = (^A_B T ^B P)^T ^A Q (^A_B T ^B P) = ^B P^T ^B Q ^B P.

Thus ^B Q = ^A_B T^T ^A Q ^A_B T. This matrix is symmetric by construction.
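A quick numerical check of the two change-of-coordinates formulas of Problems 2.12 and 2.13 (assuming NumPy; the helper names and the random test data are ours):

    import numpy as np

    def transform_plane(T_AB, Pi_A):
        """B_Pi = (A_B_T)^T A_Pi, per Problem 2.12."""
        return T_AB.T @ Pi_A

    def transform_quadric(T_AB, Q_A):
        """B_Q = (A_B_T)^T A_Q A_B_T, per Problem 2.13."""
        return T_AB.T @ Q_A @ T_AB

    rng = np.random.default_rng(0)
    Q3, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    Q3 *= np.sign(np.linalg.det(Q3))                       # a random rotation
    T = np.eye(4); T[:3, :3], T[:3, 3] = Q3, rng.normal(size=3)   # A_B_T
    Pi_A = np.array([0.0, 0.0, 1.0, -2.0])                 # plane z = 2 in (A)
    P_B = np.array([0.3, -1.0, 0.0, 1.0])                  # a point, in (B) coordinates
    P_A = T @ P_B
    # Incidence values are preserved by the change of coordinates.
    print(np.isclose(Pi_A @ P_A, transform_plane(T, Pi_A) @ P_B))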
    p = (1/zr) ( K2   p0 ) [ 1   0   −xr/zr   xr ] [ R     t ] ( P )
                           [ 0   1   −yr/zr   yr ] [ 0^T   1 ] ( 1 ),
                           [ 0   0      0     zr ]

which can indeed be written as an instance of the general affine projection equation
(2.19) with

    A = (1/zr) K2 [ 1   0   −xr/zr ] R,
                  [ 0   1   −yr/zr ]

    b = (1/zr) K2 [ t2 + (1 − tz/zr) (xr, yr)^T ] + p0.
The translation parameters are coupled to the camera intrinsic parameters and the
position of the reference point in the expression of b. As in the weak-perspective
case, we are free to change the position of the camera relative to the origin of the
world coordinate system to simplify this expression. In particular, we can choose
tz = zr and reset the value of t2 to t2 − zr K2^{-1} p0, so the value of b becomes (1/zr) K2 t2.
Now, observing that the value of A does not change when K2 and (xr, yr, zr) are
respectively replaced by λK2 and λ(xr, yr, zr) allows us to rewrite the projection
matrix M = ( A   b ) as

    M = (1/zr) [ k  s ; 0  1 ] ( [ 1  0  −xr/zr ; 0  1  −yr/zr ] R    t2 ).
Let us now show that any affine projection matrix can be written in this form. We
can rewrite explicitly the entries of the two matrices of interest as

    [ a1^T   b1 ]  =  (1/zr) [ r1^T − (xr/zr) r3^T    tx ]
    [ a2^T   b2 ]            [ r2^T − (yr/zr) r3^T    ty ].
Since r1^T, r2^T, and r3^T are the rows of a rotation matrix, these vectors are orthogonal
and have unit norm, and it follows, with λ = xr/zr and µ = yr/zr, that

    1 + λ² = zr² |a1|²,
    1 + µ² = zr² |a2|²,
    λµ = zr² (a1 · a2).

Let c1 = (a1 · a2)/|a1|² and c2 = (a1 · a2)/|a2|². It follows from the first and third
equations that µ = c1(1 + λ²)/λ, and substituting in the second equation yields,
after some simple algebraic manipulation,

    (1 − c1c2) λ⁴ + [1 − c1c2 − (c2/c1)(1 + c1²)] λ² − c1c2 = 0.
This quadratic equation in λ² always admits two real roots with opposite
signs. The positive root is the only root of interest, and it yields two opposite
values for λ. Once λ is known, µ is of course determined uniquely, and so is zr2 . It
follows that there are four possible solutions for the triple (xr , yr , zr ). For each of
these solutions, we have tx = zr b1 and ty = zr b2 .
We finally determine r1, r2, and r3 by defining a3 = a1 × a2 and noting that
λ r1 + µ r2 + r3 = zr² a3. In particular, we can write

    zr ( a1   a2   zr a3 ) = ( r1   r2   r3 ) [  1    0   λ ]
                                              [  0    1   µ ]
                                              [ −λ   −µ   1 ].

Multiplying both sides of this equation on the right by the inverse of the rightmost
matrix yields

    ( r1   r2   r3 ) = zr / (1 + λ² + µ²) ( a1   a2   zr a3 ) [ 1 + µ²    −λµ    −λ ]
                                                              [  −λµ     1 + λ²  −µ ]
                                                              [   λ        µ      1 ],

or

    r1 = zr / (1 + λ² + µ²) [ (1 + µ²) a1 − λµ a2 + λ zr a3 ],
    r2 = zr / (1 + λ² + µ²) [ −λµ a1 + (1 + λ²) a2 + µ zr a3 ],
    r3 = zr / (1 + λ² + µ²) [ −λ a1 − µ a2 + zr a3 ].
2.15. Line Plücker coordinates. The exterior product of two vectors u and v in R 4
is defined by
    u ∧ v = ( u1v2 − u2v1,  u1v3 − u3v1,  u1v4 − u4v1,  u2v3 − u3v2,  u2v4 − u4v2,  u3v4 − u4v3 )^T.

Given a fixed coordinate system and the (homogeneous) coordinate vectors A and
B associated with two points A and B in E³, the vector L = A ∧ B is called the
vector of Plücker coordinates of the line joining A to B.
(a) Let us write L = (L1, L2, L3, L4, L5, L6)^T and denote by O the origin of the
coordinate system and by H its projection onto L. Let us also identify the
vectors OA and OB with their non-homogeneous coordinate vectors. Show
that AB = −(L3, L5, L6)^T and OA × OB = OH × AB = (L4, −L2, L1)^T.
Conclude that the Plücker coordinates of a line obey the quadratic constraint

    L1L6 − L2L5 + L3L4 = 0.

(b) Show that changing the position of the points A and B along the line L only
changes the overall scale of the vector L. Conclude that Plücker coordinates
are homogeneous coordinates.
(c) Prove that the following identity holds for any vectors x, y, z, and t in R⁴:
(x ∧ y) · (z ∧ t) = (x · z)(y · t) − (x · t)(y · z).
(d) Use this identity to show that the mapping between a line with Plücker
coordinate vector L and its image δ with homogeneous coordinates l can be
represented by

    ρ l = M̃ L,   where   M̃ = [ (m2 ∧ m3)^T ]
                               [ (m3 ∧ m1)^T ]      (2.1)
                               [ (m1 ∧ m2)^T ],

and m1^T, m2^T, and m3^T denote as before the rows of M and ρ is an appropriate
scale factor.
Hint: Consider a line L joining two points A and B and denote by a and b
the projections of these two points, with homogeneous coordinates a and b.
Use the fact that the points a and b lie on the image of L; thus, if l denotes the
homogeneous coordinate vector of this line, we must have l · a = l · b = 0.
(e) Given a line L with Plücker coordinate vector L = (L1, L2, L3, L4, L5, L6)^T
and a point P with homogeneous coordinate vector P, show that a necessary
and sufficient condition for P to lie on L is that

    ℒ P = 0,   where   ℒ = [   0    L6   −L5    L4 ]
                           [ −L6     0    L3   −L2 ]
                           [  L5   −L3     0    L1 ]
                           [ −L4    L2   −L1     0 ].
(f) Show that a necessary and sufficient condition for the line L to lie in the plane
Π with homogeneous coordinate vector Π is that

    ℒ* Π = 0,   where   ℒ* = [   0    L1    L2    L3 ]
                             [ −L1     0    L4    L5 ]
                             [ −L2   −L4     0    L6 ]
                             [ −L3   −L5   −L6     0 ].
Solution
(a) If A = (a1, a2, a3, 1)^T and B = (b1, b2, b3, 1)^T, we have

    L = ( L1, L2, L3, L4, L5, L6 )^T = A ∧ B
      = ( a1b2 − a2b1,  a1b3 − a3b1,  a1 − b1,  a2b3 − a3b2,  a2 − b2,  a3 − b3 )^T,
thus AB = −(L3, L5, L6)^T and OA × OB = (L4, −L2, L1)^T. In addition, we
have

    OA × OB = (OH + HA) × (OH + HB) = OH × AB

since HA and HB are parallel.
Since the vectors AB and OA × OB are orthogonal, it follows immediately
that their dot product L1L6 − L2L5 + L3L4 is equal to zero.
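A small numerical sketch of the exterior product and of the quadratic constraint just derived (assuming NumPy; the function name wedge and the sample points are ours):

    import numpy as np

    def wedge(u, v):
        """Exterior product of two vectors of R^4; gives the Plücker coordinates of the
        line joining two homogeneous points."""
        u1, u2, u3, u4 = u
        v1, v2, v3, v4 = v
        return np.array([u1*v2 - u2*v1, u1*v3 - u3*v1, u1*v4 - u4*v1,
                         u2*v3 - u3*v2, u2*v4 - u4*v2, u3*v4 - u4*v3])

    A = np.array([1.0, 2.0, 3.0, 1.0])
    B = np.array([-0.5, 4.0, 1.0, 1.0])
    L = wedge(A, B)
    # The quadratic constraint L1 L6 - L2 L5 + L3 L4 = 0 holds.
    print(L, L[0]*L[5] - L[1]*L[4] + L[2]*L[3])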
(b) Replacing A and B by any other two points C and D on the same line only
changes the overall scale of the Plücker coordinates of this line, since we can always
write CD = λ AB for some λ ≠ 0.
    M̃ L = [ (m2 ∧ m3)^T ; (m3 ∧ m1)^T ; (m1 ∧ m2)^T ] (A ∧ B)
         = ( (m2 ∧ m3) · (A ∧ B),  (m3 ∧ m1) · (A ∧ B),  (m1 ∧ m2) · (A ∧ B) )^T
         = ( (m2 · A)(m3 · B) − (m2 · B)(m3 · A),
             (m3 · A)(m1 · B) − (m3 · B)(m1 · A),
             (m1 · A)(m2 · B) − (m1 · B)(m2 · A) )^T
         = a × b.
and a necessary and sufficient condition for ℒ* Π to be the zero vector is that
the vector AB lie in a plane parallel to Π (condition n · AB = 0) located at a
distance d from the origin (condition (n · AB) OH + (d − n · OH) AB = 0 ⟹
d = n · OH), or equivalently, that L lie in the plane Π.
C H A P T E R 3
PROBLEMS
3.1. Show that the vector x that minimizes |Ux|² under the constraint |Vx|² = 1 is
the (appropriately scaled) generalized eigenvector associated with the minimum
generalized eigenvalue of the symmetric matrices U^T U and V^T V.
Hint: First show that the minimum sought is reached at x = x0, where x0 is the
(unconstrained) minimum of the error E(x) = |Ux|²/|Vx|² such that |Vx0|² = 1.
(Note that since E(x) is obviously invariant under scale changes, so are its extrema,
and we are free to fix the value of |Vx0|² arbitrarily. Note also that the minimum
must be taken over all values of x such that Vx ≠ 0.)
Solution Since x0 is the unconstrained minimum of E and |Vx0|² = 1, we have

    ∀x, Vx ≠ 0 ⟹ |Ux|²/|Vx|² ≥ |Ux0|²/|Vx0|² = |Ux0|².

In particular,

    ∀x, |Vx|² = 1 ⟹ |Ux|² ≥ |Ux0|².

Now,

    ∇E = (1/|Vx|²) ∂(x^T U^T U x)/∂x − (|Ux|²/|Vx|⁴) ∂(x^T V^T V x)/∂x
       = (2/|Vx|²) [ U^T U − (|Ux|²/|Vx|²) V^T V ] x,

and setting ∇E(x0) = 0 yields U^T U x0 = E(x0) V^T V x0: x0 is a generalized eigenvector
of U^T U and V^T V, associated (since E(x0) is the minimum of E) with the minimum
generalized eigenvalue.
    U^T U = [ Σ (xi − x̄)²            Σ (xi − x̄)(yi − ȳ) ]
            [ Σ (xi − x̄)(yi − ȳ)    Σ (yi − ȳ)²          ]

          = [ Σ (xi² − 2x̄xi + x̄²)            Σ (xiyi − x̄yi − ȳxi + x̄ȳ) ]
            [ Σ (xiyi − x̄yi − ȳxi + x̄ȳ)     Σ (yi² − 2ȳyi + ȳ²)          ]

          = [ Σ xi² − 2x̄ Σ xi + nx̄²                 Σ xiyi − x̄ Σ yi − ȳ Σ xi + nx̄ȳ ]
            [ Σ xiyi − x̄ Σ yi − ȳ Σ xi + nx̄ȳ       Σ yi² − 2ȳ Σ yi + nȳ²             ]

          = [ Σ xi² − nx̄²        Σ xiyi − nx̄ȳ ]
            [ Σ xiyi − nx̄ȳ      Σ yi² − nȳ²    ],
which is, indeed, the matrix of second moments of inertia of the points pi .
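A small sketch of the corresponding total least-squares line fit, solved with a singular value decomposition (assuming NumPy; the function fit_line and the synthetic data are ours):

    import numpy as np

    def fit_line(points):
        """Minimize sum (a x_i + b y_i - d)^2 with a^2 + b^2 = 1: the normal (a, b) is the
        eigenvector of the second-moment matrix with the smallest eigenvalue."""
        p = np.asarray(points, dtype=float)
        centroid = p.mean(axis=0)
        _, _, Vt = np.linalg.svd(p - centroid, full_matrices=False)
        a, b = Vt[-1]                      # unit normal
        return a, b, a * centroid[0] + b * centroid[1]

    # Points close to the line x + 2y = 4, i.e., normal ~ (1, 2)/sqrt(5), d ~ 4/sqrt(5).
    rng = np.random.default_rng(1)
    t = np.linspace(-3.0, 3.0, 50)
    pts = np.stack([4.0 - 2.0 * t, t], axis=1) + 0.01 * rng.normal(size=(50, 2))
    print(fit_line(pts))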
3.3. Extend the line-fitting method presented in Section 3.1.1 to the problem of fitting
a plane to n points in E3 .
Solution We consider n points Pi (i = 1, . . . , n) with coordinates (xi, yi, zi)^T in
some fixed coordinate system, and find the plane with equation ax + by + cz − d = 0
and unit normal n = (a, b, c)^T that best fits these points. This amounts to
minimizing

    E(a, b, c, d) = Σ_{i=1}^n (axi + byi + czi − d)²

with respect to a, b, c, and d. Setting ∂E/∂d = 0 yields

    d = ax̄ + bȳ + cz̄,   where   x̄ = (1/n) Σ xi,   ȳ = (1/n) Σ yi,   z̄ = (1/n) Σ zi.   (3.1)

Substituting this value of d back into E shows that E = |Un|², where

    U = [ x1 − x̄   y1 − ȳ   z1 − z̄ ]
        [   ...       ...       ...   ]
        [ xn − x̄   yn − ȳ   zn − z̄ ],
and our original problem finally reduces to minimizing |Un|2 with respect to n
under the constraint |n|2 = 1. We recognize a homogeneous linear least-squares
problem, whose solution is the unit eigenvector associated with the minimum eigen-
value of the 3 × 3 matrix U T U. Once a, b, and c have been computed, the value of
d is immediately obtained from Eq. (3.1). Similar to the line-fitting case, we have
    U^T U = [ Σ xi² − nx̄²        Σ xiyi − nx̄ȳ     Σ xizi − nx̄z̄ ]
            [ Σ xiyi − nx̄ȳ      Σ yi² − nȳ²       Σ yizi − nȳz̄ ]
            [ Σ xizi − nx̄z̄      Σ yizi − nȳz̄     Σ zi² − nz̄²   ],

where all sums run over i = 1, . . . , n.
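A small numerical sketch of this plane-fitting procedure (assuming NumPy; the function fit_plane and the synthetic data are ours):

    import numpy as np

    def fit_plane(points):
        """Fit ax + by + cz = d to n 3D points: d = a x̄ + b ȳ + c z̄ (Eq. (3.1)) and (a, b, c)
        is the unit eigenvector of U^T U associated with its smallest eigenvalue."""
        p = np.asarray(points, dtype=float)
        centroid = p.mean(axis=0)
        U = p - centroid
        eigvals, eigvecs = np.linalg.eigh(U.T @ U)        # ascending eigenvalues
        normal = eigvecs[:, 0]
        return normal, float(normal @ centroid)

    # Noisy samples of the plane z = 1 - x + 2y, i.e., normal proportional to (1, -2, 1).
    rng = np.random.default_rng(2)
    xy = rng.uniform(-1, 1, size=(100, 2))
    pts = np.column_stack([xy, 1.0 - xy[:, 0] + 2.0 * xy[:, 1]]) + 0.01 * rng.normal(size=(100, 3))
    print(fit_plane(pts))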
3.4. Derive an expression for the Hessian of the functions f2i−1 (ξ) = ũi (ξ) − ui and
f2i (ξ) = ṽi (ξ) − vi (i = 1, . . . , n) introduced in Section 3.4.
Solution We have

    ∂²f_{2i−1}/∂ξj∂ξk
      = ∂/∂ξk [ (1/z̃i) ( Pi^T   0^T   −ũi Pi^T ) ∂m/∂ξj ]
      = [ −(1/z̃i²) (∂z̃i/∂ξk) ( Pi^T   0^T   −ũi Pi^T ) + (1/z̃i) ( 0^T   0^T   −(∂ũi/∂ξk) Pi^T ) ] ∂m/∂ξj
        + (1/z̃i) ( Pi^T   0^T   −ũi Pi^T ) ∂²m/∂ξj∂ξk
      = [ −(1/z̃i²) ( Pi^T ∂m3/∂ξk ) ( Pi^T   0^T   −ũi Pi^T )
          + (1/z̃i) ( 0^T   0^T   −(1/z̃i) [ ( Pi^T   0^T   −ũi Pi^T ) ∂m/∂ξk ] Pi^T ) ] ∂m/∂ξj
        + (1/z̃i) ( Pi^T   0^T   −ũi Pi^T ) ∂²m/∂ξj∂ξk.
3.5. Euler angles. Show that the rotation obtained by first rotating about the z axis of
some coordinate frame by an angle α, then rotating about the y axis of the new
coordinate frame by an angle β and finally rotating about the z axis of the resulting
frame by an angle γ can be represented in the original coordinate system by
    [ cos α cos β cos γ − sin α sin γ    −cos α cos β sin γ − sin α cos γ    cos α sin β ]
    [ sin α cos β cos γ + cos α sin γ    −sin α cos β sin γ + cos α cos γ    sin α sin β ]
    [        −sin β cos γ                        sin β sin γ                    cos β    ].
Solution Let us denote by (A), (B), (C), and (D) the consecutive coordinate
systems. If Rx(θ), Ry(θ), and Rz(θ) denote the rotation matrices about the axes x,
y, and z, we have ^A_B R = Rz(α), ^B_C R = Ry(β), and ^C_D R = Rz(γ). Thus

    ^A_D R = ^A_B R ^B_C R ^C_D R

          = [ cos α  −sin α  0 ] [ cos β   0   sin β ] [ cos γ  −sin γ  0 ]
            [ sin α   cos α  0 ] [   0     1     0   ] [ sin γ   cos γ  0 ]
            [   0       0    1 ] [ −sin β  0   cos β ] [   0       0   1 ]

          = [ cos α cos β   −sin α   cos α sin β ] [ cos γ  −sin γ  0 ]
            [ sin α cos β    cos α   sin α sin β ] [ sin γ   cos γ  0 ]
            [   −sin β         0        cos β    ] [   0       0   1 ]

          = [ cos α cos β cos γ − sin α sin γ    −cos α cos β sin γ − sin α cos γ    cos α sin β ]
            [ sin α cos β cos γ + cos α sin γ    −sin α cos β sin γ + cos α cos γ    sin α sin β ]
            [        −sin β cos γ                        sin β sin γ                    cos β    ].
Now the rotation that maps (A) onto (D) maps a point P with position ^A P in the
coordinate system (A) onto the point P' with the same position ^D P' = ^A P in the
coordinate system (D). We have ^A P' = ^A_D R ^D P' = ^A_D R ^A P, which proves the desired
result.
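A quick numerical check of this ZYZ Euler-angle matrix (assuming NumPy; the function euler_zyz and the sample angles are ours):

    import numpy as np

    def euler_zyz(alpha, beta, gamma):
        """Rotate about z by alpha, about the new y by beta, about the new z by gamma:
        R = Rz(alpha) Ry(beta) Rz(gamma), as derived above."""
        Rz = lambda a: np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1.0]])
        Ry = lambda a: np.array([[np.cos(a), 0, np.sin(a)], [0, 1.0, 0], [-np.sin(a), 0, np.cos(a)]])
        return Rz(alpha) @ Ry(beta) @ Rz(gamma)

    # Compare two entries with the closed-form expressions above.
    a, b, g = 0.3, 1.1, -0.7
    R = euler_zyz(a, b, g)
    print(np.isclose(R[2, 2], np.cos(b)), np.isclose(R[0, 2], np.cos(a) * np.sin(b)))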
3.6. The Rodrigues formula. Consider a rotation R of angle θ about the axis u (a unit
vector). Show that Rx = cos θx + sin θ u × x + (1 − cos θ)(u · x)u.
Hint: A rotation does not change the projection of a vector x onto the direction u
of its axis and applies a planar rotation of angle θ to the projection of x into the
plane orthogonal to u.
Solution Let a denote the orthogonal projection of x onto u, b = x − a denote
its orthogonal projection onto the plane perpendicular to u, and c = u × b. By
construction c is perpendicular to both u and b, and, according to the property of
rotation matrices mentioned in the hint, we must have Rx = a + cos θb + sin θc.
Obviously, we also have

    a = (u · x) u,
    b = x − (u · x) u,
    c = u × x,
and it follows that Rx = cos θx + sin θ u × x + (1 − cos θ)(u · x)u.
3.7. Use the Rodrigues formula to show that the matrix R associated with a rotation
of angle θ about the unit vector u = (u, v, w)^T is (with c = cos θ and s = sin θ)

    [ u²(1 − c) + c      uv(1 − c) − ws     uw(1 − c) + vs ]
    [ uv(1 − c) + ws     v²(1 − c) + c      vw(1 − c) − us ]
    [ uw(1 − c) − vs     vw(1 − c) + us     w²(1 − c) + c  ].

Solution According to the Rodrigues formula, Rx = c x + s u × x + (1 − c)(u · x) u
for every vector x, i.e.,

    R = c [ 1  0  0 ]  +  s [  0  −w   v ]  +  (1 − c) [ u²   uv   uw ]
          [ 0  1  0 ]       [  w   0  −u ]             [ uv   v²   vw ]
          [ 0  0  1 ]       [ −v   u   0 ]             [ uw   vw   w² ],

and expanding this sum yields the desired matrix.
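A small numerical sketch of this decomposition (assuming NumPy; the function name rodrigues and the sample axis and angle are ours):

    import numpy as np

    def rodrigues(u, theta):
        """Rotation of angle theta about the unit axis u, via
        R = c Id + s [u_x] + (1 - c) u u^T with c = cos(theta), s = sin(theta)."""
        u = np.asarray(u, dtype=float)
        c, s = np.cos(theta), np.sin(theta)
        ux = np.array([[0, -u[2], u[1]], [u[2], 0, -u[0]], [-u[1], u[0], 0.0]])
        return c * np.eye(3) + s * ux + (1 - c) * np.outer(u, u)

    # The axis is fixed by R, and R is orthogonal.
    u = np.array([1.0, 2.0, 2.0]) / 3.0
    R = rodrigues(u, 0.8)
    print(np.allclose(R @ u, u), np.allclose(R.T @ R, np.eye(3)))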
3.8. Assuming that the intrinsic parameters of a camera are known, show how to com-
pute its extrinsic parameters once the vector n0 defined in Section 3.5 is known.
Hint: Use the fact that the rows of a rotation matrix form an orthonormal family.
Solution Recall that the vector n0 = (m11, m12, m14, m21, m22, m24)^T can only
be recovered up to scale. With the intrinsic parameters known, this means that we
can write the projection matrix as M = ( A   b ) = ρ ( R   t ), where R and t are
the rotation matrix and translation vector associated with the camera’s extrinsic
parameters.
Let a1^T and a2^T denote as usual the two rows of the matrix A. Since the rows
of a rotation matrix have unit norm and are orthogonal to each other, we have
|a1|² = |a2|² = ρ² and a1 · a2 = 0. These two constraints can be seen as quadratic
equations in the unknowns m13 and m23, namely

    m23² − m13² = |b1|² − |b2|²,
    m13 m23 = −b1 · b2,
where b1 = (m11, m12)^T and b2 = (m21, m22)^T. Squaring the second equation
and substituting the value of m23² from the first equation into it yields

    m13² (m13² + |b1|² − |b2|²) = (b1 · b2)²,

or equivalently

    m13⁴ + (|b1|² − |b2|²) m13² − (b1 · b2)² = 0.
This is a quadratic equation in m13². Since the constant term and the quadratic
term have opposite signs, it always admits two real solutions with opposite signs.
Only the positive one is valid of course, and it yields two opposite solutions for
m13 . The remaining unknown is then determined as m23 = −(b1 · b2 )/m13 .
At this point, there are four valid values for the triple (a1 , a2 , ρ) since m13 and
m23 are determined up to a single sign ambiguity, and the value of ρ is determined
up to a second sign ambiguity by ρ² = |a1|². In turn, this determines four valid
values for the rows r1^T and r2^T of R and the coordinates tx and ty of t. For each
of these solutions, the last row of R is computed as r 3 = r 1 × r 2 , which gives in
turn a3 = ρr 3 . Finally, an initial value of tz = m14 /ρ can be computed using
linear least squares by setting λ = 1 in Eq. (3.23). The correct solution among
the four found can be identified by (a) using the sign of tz (when it is known) to
discard obviously incorrect solutions, and (b) picking among the remaining ones the
solution that yields the smallest residual in the least-squares estimation process.
3.9. Assume that n fiducial lines with known Plücker coordinates are observed by a
camera.
(a) Show that the line projection matrix M̃ introduced in the exercises of chapter
2 can be recovered using linear least squares when n ≥ 9.
(b) Show that once M̃ is known, the projection matrix M can also be recovered
using linear least squares.
Hint: Consider the rows mi of M as the coordinate vectors of three planes Πi
and the rows m̃i of M̃ as the coordinate vectors of three lines, and use the
incidence relationships between these planes and these lines to derive linear
constraints on the vectors mi .
Solution
(a) We saw in Exercise 2.15 that the Plücker coordinate vector ∆ of a line and the
homogeneous coordinate vector δ of its image satisfy
    ρ δ = M̃ ∆,   where   M̃ = [ (m2 ∧ m3)^T ]
                               [ (m3 ∧ m1)^T ]
                               [ (m1 ∧ m2)^T ].
We can eliminate the unknown scale factor ρ by using the fact that the cross
product of two parallel vectors is zero, thus δ × M̃∆ = 0. This linear vector
equation in the components of M̃ is equivalent to two independent scalar
equations. Since the 3 × 6 matrix M̃ is only defined up to scale, its 17 inde-
pendent coefficients can thus be estimated as before via linear least squares
(ignoring the non-linear constraints imposed by the fact that the rows of M̃
are Plücker coordinate vectors) when n ≥ 9.
(b) Once M̃ is known, we can recover M as well through linear least squares.
Indeed, the vectors mi (i = 1, 2, 3) can be thought of as the homogeneous
coordinate vectors of three projection planes Πi (see diagram below). These
planes intersect at the optical center O of the camera since the homogeneous
coordinate vector of this point satisfies the equation MO = 0. Likewise, it is
easy to show that Π3 is parallel to the image plane, that Π3 and Π1 intersect
along a line L31 parallel to the u = 0 coordinate axis of the image plane, that
Π2 and Π3 intersect along a line L23 parallel to its v = 0 coordinate axis,
and that the line L12 formed by the intersection of Π1 and Π2 is simply the
optical axis.
(Figure: the three projection planes Π1, Π2, and Π3 intersect at the optical center O; L31 = Π3 ∩ Π1, L23 = Π2 ∩ Π3, and L12 = Π1 ∩ Π2 is the optical axis.)
via linear least squares (at most 11 of the 30 equations are independent in
the noise-free case). Once M is known, the intrinsic and extrinsic parameters
can be computed as before. We leave to the reader the task of characterizing
the degenerate line configurations for which the proposed method fails.
Programming Assignments
3.10. Use linear least-squares to fit a plane to n points (xi , yi , zi )T (i = 1, . . . , n) in R3 .
3.11. Use linear least-squares to fit a conic section defined by ax² + bxy + cy² + dx + ey + f = 0
to n points (xi, yi)^T (i = 1, . . . , n) in R².
3.12. Implement the linear calibration algorithm presented in Section 3.2.
3.13. Implement the calibration algorithm that takes into account radial distortion and
that was presented in Section 3.3.
3.14. Implement the nonlinear calibration algorithm from Section 3.4.
C H A P T E R 4
Radiometry—Measuring Light
PROBLEMS
4.1. How many steradians in a hemisphere?
Solution 2π.
4.2. We have proved that radiance does not go down along a straight line in a non-
absorbing medium, which makes it a useful unit. Show that if we were to use power
per square meter of foreshortened area (which is irradiance), the unit must change
with distance along a straight line. How significant is this difference?
Solution Assume we have a source and two receivers that look exactly the same
from the source. One is large and far away, the other is small and nearby. Because
they look the same from the source, exactly the same rays leaving the source pass
through each receiver. If our unit is power per square meter of foreshortened area,
the amount of power arriving at a receiver is given by integrating this over the
area of the receiver. But the distant one is bigger, and so if the value of power
per square meter of foreshortened area didn’t go down with distance, then it would
receive more power than the nearby receiver, which is impossible (how does the
source know which one should get more power?).
4.3. An absorbing medium: Assume that the world is filled with an isotropic absorb-
ing medium. A good, simple model of such a medium is obtained by considering
a line along which radiance travels. If the radiance along the line is N at x, it is
N − (αdx)N at x + dx.
(a) Write an expression for the radiance transferred from one surface patch to
another in the presence of this medium.
(b) Now qualitatively describe the distribution of light in a room filled with this
medium for α small and large positive numbers. The room is a cube, and the
light is a single small patch in the center of the ceiling. Keep in mind that if
α is large and positive, little light actually reaches the walls of the room.
Solution
(a) dN/dx = −αN, so that N(x) = N(0) e^{−αx} along the line of travel.
(b) Radiance goes down exponentially with distance. Assume the largest distance
in the room is d. If α is small enough (much less than 1/d), the room looks as
usual. As α gets bigger, interreflections are quenched; for large α, only objects
that view the light directly and are close to the light will be bright.
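A tiny numerical sketch of the exponential attenuation above (assuming NumPy; the function name and room size are ours):

    import numpy as np

    def attenuated_radiance(N0, alpha, x):
        """dN/dx = -alpha N  gives  N(x) = N(0) exp(-alpha x) along a line."""
        return N0 * np.exp(-alpha * x)

    # For alpha much smaller than 1/d the loss over the room is negligible;
    # for large alpha almost nothing reaches the far wall.
    d = 5.0                                  # largest distance in the room (m)
    for alpha in (0.001, 0.1, 2.0):
        print(alpha, attenuated_radiance(1.0, alpha, d))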
4.4. Identify common surfaces that are neither Lambertian nor specular using the un-
derside of a CD as a working example. There are a variety of important biological
examples, which are often blue in color. Give at least two different reasons that it
could be advantageous to an organism to have a non-Lambertian surface.
Solution There are lots. Many possible advantages; for example, an animal that
looks small to a predator approaching in one direction (because it looks dark from
this direction) could turn quickly and look big (because it looks bright from this
direction).
4.5. Show that for an ideal diffuse surface the directional hemispheric reflectance is con-
stant; now show that if a surface has constant directional hemispheric reflectance,
it is ideal diffuse.
Solution In an ideal diffuse surface, the BRDF is constant; DHR is an integral
of the BRDF over the outgoing angles, and so must be constant too. The other
direction is false (sorry!—DAF).
4.6. Show that the BRDF of an ideal specular surface is
C H A P T E R 5
Sources, Shadows and Shading
PROBLEMS
5.1. What shapes can the shadow of a sphere take if it is cast on a plane and the source
is a point source?
Solution Any conic section. The vertex of the cone is the point source; the rays
tangent to the sphere form a right circular cone, and this cone is sliced by a plane.
It’s not possible to get both parts of a hyperbola.
5.2. We have a square area source and a square occluder, both parallel to a plane. The
source is the same size as the occluder, and they are vertically above one another
with their centers aligned.
(a) What is the shape of the umbra?
(b) What is the shape of the outside boundary of the penumbra?
Solution
(a) Square.
(b) Construct with a drawing, to get an eight-sided polygon.
5.3. We have a square area source and a square occluder, both parallel to a plane. The
edge length of the source is now twice that of the occluder, and they are vertically
above one another with their centers aligned.
(a) What is the shape of the umbra?
(b) What is the shape of the outside boundary of the penumbra?
Solution
(a) Depends how far the source is above the occluder; it could be absent, or
square.
(b) Construct with a drawing, to get an eight sided polygon.
5.4. We have a square area source and a square occluder, both parallel to a plane. The
edge length of the source is now half that of the occluder, and they are vertically
above one another with their centers aligned.
(a) What is the shape of the umbra?
(b) What is the shape of the outside boundary of the penumbra?
Solution (a) Square. (b) Construct with a drawing, to get an eight sided poly-
gon.
5.5. A small sphere casts a shadow on a larger sphere. Describe the possible shadow
boundaries that occur.
Solution Very complex, given by the intersection of a right circular cone and a
sphere. In the simplest case, the two centers are aligned with the point source, and
the shadow is a circle.
5.6. Explain why it is difficult to use shadow boundaries to infer shape, particularly if
the shadow is cast onto a curved surface.
Solution
(a) Obvious symmetry in the geometry.
(b) This integral is in “Shading primitives”, J. Haddon and D.A. Forsyth, Proc.
Int. Conf. Computer Vision, 1997.
5.9. If one looks across a large bay in the daytime, it is often hard to distinguish the
mountains on the opposite side; near sunset, they are clearly visible. This phe-
nomenon has to do with scattering of light by air — a large volume of air is actually
a source. Explain what is happening. We have modeled air as a vacuum and assert-
ed that no energy is lost along a straight line in a vacuum. Use your explanation
to give an estimate of the kind of scales over which that model is acceptable.
Solution In the day, the air between you and the other side is illuminated by the
sun; some light scatters toward your eye. This has the effect of reducing contrast,
meaning that the other side of the bay is hard to see because it’s about as bright
as the air. By evening, the air is less strongly illuminated and the contrast goes
up. This suggests that assuming air doesn’t interact with light is probably dubious
at scales of multiple kilometers (considerably less close to a city).
5.10. Read the book Colour and Light in Nature, by Lynch and Livingstone, published
by Cambridge University Press, 1995.
Programming Assignments
5.11. An area source can be approximated as a grid of point sources. The weakness of
this approximation is that the penumbra contains quantization errors, which can
be quite offensive to the eye.
(a) Explain.
(b) Render this effect for a square source and a single occluder casting a shadow
onto an infinite plane. For a fixed geometry, you should find that as the number
of point sources goes up, the quantization error goes down.
(c) This approximation has the unpleasant property that it is possible to produce
arbitrarily large quantization errors with any finite grid by changing the ge-
ometry. This is because there are configurations of source and occluder that
produce large penumbrae. Use a square source and a single occluder, casting
a shadow onto an infinite plane, to explain this effect.
5.12. Make a world of black objects and another of white objects (paper, glue and spray-
paint are useful here) and observe the effects of interreflections. Can you come up
with a criterion that reliably tells, from an image, which is which? (If you can,
publish it; the problem looks easy, but isn’t).
5.13. (This exercise requires some knowledge of numerical analysis.) Do the numerical
integrals required to reproduce Figure 5.17. These integrals aren’t particularly easy:
If one uses coordinates on the infinite plane, the size of the domain is a nuisance;
if one converts to coordinates on the view hemisphere of the patch, the frequency
of the radiance becomes infinite at the boundary of the hemisphere. The best way
to estimate these integrals is using a Monte Carlo method on the hemisphere. You
should use importance sampling because the boundary contributes rather less to
the integral than the top.
5.14. Set up and solve the linear equations for an interreflection solution for the interior
of a cube with a small square source in the center of the ceiling.
5.15. Implement a photometric stereo system.
(a) How accurate are its measurements (i.e., how well do they compare with known
shape information)? Do interreflections affect the accuracy?
(b) How repeatable are its measurements (i.e., if you obtain another set of images,
perhaps under different illuminants, and recover shape from those, how does
the new shape compare with the old)?
(c) Compare the minimization approach to reconstruction with the integration
approach; which is more accurate or more repeatable and why? Does this
difference appear in experiment?
(d) One possible way to improve the integration approach is to obtain depths by
integrating over many different paths and then average these depths (you need
to be a little careful about constants here). Does this improve the accuracy or
repeatability of the method?
C H A P T E R 6
Color
PROBLEMS
6.1. Sit down with a friend and a packet of colored papers, and compare the color names
that you use. You need a large packet of papers — one can very often get collections
of colored swatches for paint, or for the Pantone color system very cheaply. The
best names to try are basic color names — the terms red, pink, orange, yellow,
green, blue, purple, brown, white, gray and black, which (with a small number of
other terms) have remarkable canonical properties that apply widely across different
languages (the papers in ?) give a good summary of current thought on this issue).
You will find it surprisingly easy to disagree on which colors should be called blue
and which green, for example.
Solution Students should do the experiment; there’s no right answer, but if two
people agree on all color names for all papers with a large range of colour, something
funny is going on.
6.2. Derive the equations for transforming from RGB to CIE XYZ and back. This is a
linear transformation. It is sufficient to write out the expressions for the elements
of the linear transformation — you don’t have to look up the actual numerical
values of the color matching functions.
Solution Write the RGB primaries as pr(λ), pg(λ), pb(λ). If a colour has RGB
coordinates (a, b, c), that means it matches a pr(λ) + b pg(λ) + c pb(λ). What are the XYZ
coordinates (d, e, f) of this colour? We compute them with the XYZ colour matching
functions x(λ), y(λ), and z(λ), to get

    ( d )   [ ∫ x(λ)pr(λ)dλ   ∫ x(λ)pg(λ)dλ   ∫ x(λ)pb(λ)dλ ] ( a )
    ( e ) = [ ∫ y(λ)pr(λ)dλ   ∫ y(λ)pg(λ)dλ   ∫ y(λ)pb(λ)dλ ] ( b )
    ( f )   [ ∫ z(λ)pr(λ)dλ   ∫ z(λ)pg(λ)dλ   ∫ z(λ)pb(λ)dλ ] ( c ).
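A numerical sketch of this matrix (assuming NumPy). The Gaussian curves below are made-up stand-ins for the primaries and for the XYZ matching functions, used only to show the structure of the computation; in practice both would be the tabulated CIE data.

    import numpy as np

    lam = np.linspace(380.0, 780.0, 401)                  # wavelength samples (nm)
    gauss = lambda mu, sig: np.exp(-0.5 * ((lam - mu) / sig) ** 2)

    # Stand-ins for p_r, p_g, p_b and for the x, y, z matching functions (assumed shapes).
    p = np.stack([gauss(610, 20), gauss(540, 20), gauss(465, 20)])
    xyz_bar = np.stack([gauss(595, 35) + 0.35 * gauss(445, 20), gauss(555, 40), gauss(450, 25)])

    # M[i, j] = integral of (i-th matching function) * (j-th primary); Riemann-sum approximation.
    M = (xyz_bar[:, None, :] * p[None, :, :]).sum(axis=-1) * (lam[1] - lam[0])
    rgb = np.array([0.2, 0.5, 0.3])
    print(M @ rgb)                                        # XYZ coordinates of this colour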
6.3. Linear color spaces are obtained by choosing primaries and then constructing color
matching functions for those primaries. Show that there is a linear transformation
that takes the coordinates of a color in one linear color space to those in another;
the easiest way to do this is to write out the transformation in terms of the color
matching functions.
Solution Look at the previous answer.
6.4. Exercise 6.3 means that, in setting up a linear color space, it is possible to choose
primaries arbitrarily, but there are constraints on the choice of color matching
functions. Why? What are these constraints?
Solution Assume I have some linear color space and know its color matching
functions. Then the color matching functions for any other linear color space are
a linear combination of the color matching functions for this space. Arbitrary
functions don’t have this property, so we can’t choose color matching functions
arbitrarily.
6.5. Two surfaces that have the same color under one light and different colors under
another are often referred to as metamers. An optimal color is a spectral reflectance
or radiance that has value 0 at some wavelengths and 1 at others. Although op-
timal colors don’t occur in practice, they are a useful device (due to Ostwald) for
explaining various effects.
(a) Use optimal colors to explain how metamerism occurs.
(b) Given a particular spectral albedo, show that there are an infinite number of
metameric spectral albedoes.
(c) Use optimal colors to construct an example of surfaces that look different under
one light (say, red and green) and the same under another.
(d) Use optimal colors to construct an example of surfaces that swop apparent
color when the light is changed (i.e., surface one looks red and surface two
looks green under light one, and surface one looks green and surface two looks
red under light two).
Solution
(a) See Figure 6.1.
(b) You can either do this graphically by extending the reasoning of Figure 6.1,
or analytically.
(c,d) This follows directly from (a) and (b).
6.6. You have to map the gamut for a printer to that of a monitor. There are colors
in each gamut that do not appear in the other. Given a monitor color that can’t
be reproduced exactly, you could choose the printer color that is closest. Why is
this a bad idea for reproducing images? Would it work for reproducing “business
graphics” (bar charts, pie charts, and the like, which all consist of many different
large blocks of a single color)?
Solution Some regions that, in the original picture had smooth gradients will
now have a constant color. Yes.
6.7. Volume color is a phenomenon associated with translucent materials that are col-
ored — the most attractive example is a glass of wine. The coloring comes from
different absorption coefficients at different wavelengths. Explain (a) why a small
glass of sufficiently deeply colored red wine (a good Cahors or Gigondas) looks
black (b) why a big glass of lightly colored red wine also looks black. Experimental
work is optional.
(Plot: the r, g, and b colour matching functions against wavelength in nm; see the caption of Figure 6.1.)
FIGURE 6.1: The figure shows the RGB colormatching functions. The reflectance
given by the two narrow bars is metameric to that given by the single, slightly
thicker bar under uniform illumination, because under uniform illumination either
reflectance will cause no response in B and about the same response in R and G.
However, if the illuminant has high energy at about the center wavelength of the
thicker bar and no energy elsewhere, the surface with this reflectance will look the
same as it does under a uniform illuminant but the other one will be dark. It’s worth
trying to do a few other examples with this sort of graphical reasoning because it will
give you a more visceral sense of what is going on than mere algebraic manipulation.
Determine the values of A and b, and show how to solve this general problem.
You will need to keep in mind that A does not have full rank, so you can’t go
inverting it.
Solution
(a) Straightforward detail.
(b) The difficulty is the constant of integration. The problem is to minimize |Al + b|²
with respect to l or, equivalently, to choose l so that (A^T A) l = −A^T b,

C H A P T E R 7
Linear Filters
PROBLEMS
7.1. Show that forming unweighted local averages, which yields an operation of the form

    R_ij = (1/(2k + 1)²) Σ_{u=i−k}^{i+k} Σ_{v=j−k}^{j+k} F_uv,

is a convolution.
Solution Convolving this image with any kernel reproduces the kernel; the cur-
rent kernel is a circularly symmetric fuzzy blob.
7.3. Show that convolving an image with a discrete, separable 2D filter kernel is equivalent
to convolving with two 1D filter kernels. Estimate the number of operations
saved for an N × N image and a (2k + 1) × (2k + 1) kernel.
Solution
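A small numerical check of the separability claim (assuming NumPy and SciPy's convolve2d; the image size and kernels are arbitrary). For an N × N image, the full 2D convolution costs roughly N²(2k + 1)² multiplications, while the two 1D passes cost roughly 2N²(2k + 1).

    import numpy as np
    from scipy.signal import convolve2d

    rng = np.random.default_rng(0)
    image = rng.normal(size=(64, 64))
    g = np.array([1.0, 4.0, 6.0, 4.0, 1.0])      # column factor
    h = np.array([1.0, 2.0, 1.0])                # row factor
    G = np.outer(g, h)                           # separable 2D kernel G = g h^T

    full = convolve2d(image, G, mode="same")
    separable = convolve2d(convolve2d(image, h[None, :], mode="same"), g[:, None], mode="same")
    # The two results agree (up to floating-point error).
    print(np.allclose(full, separable))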
7.4. Show that convolving a function with a δ function simply reproduces the original
function. Now show that convolving a function with a shifted δ function shifts the
function.
Solution
(a) The wheel has a symmetry, and has rotated just enough to look like itself,
and so is stationary. It moves the wrong way when it rotates just too little
to look like itself.
(b) Typically, color images are obtained by using three different sites on the same
imaging grid, each sensitive to a different range of wavelengths. If the blue
site in the camera falls on a stripe and the nearby red and green sites fall on
the shirt, the pixel reports yellow; but a small movement may mean (say) the
green sees the stripe and the red and blue see the shirt, and we get purple.
(c) The source has been subdivided into a grid with point sources at the vertices;
each block boundary occurs when one of these elements disappears behind,
or reappears from behind, an occluder.
Programming Assignments
7.7. One way to obtain a Gaussian kernel is to convolve a constant kernel with itself
many times. Compare this strategy with evaluating a Gaussian kernel.
(a) How many repeated convolutions do you need to get a reasonable approxima-
tion? (You need to establish what a reasonable approximation is; you might
plot the quality of the approximation against the number of repeated convo-
lutions).
(b) Are there any benefits that can be obtained like this? (Hint: Not every com-
puter comes with an FPU.)
7.8. Write a program that produces a Gaussian pyramid from an image.
7.9. A sampled Gaussian kernel must alias because the kernel contains components at
arbitrarily high spatial frequencies. Assume that the kernel is sampled on an infinite
grid. As the standard deviation gets smaller, the aliased energy must increase. Plot
the energy that aliases against the standard deviation of the Gaussian kernel in
pixels. Now assume that the Gaussian kernel is given on a 7x7 grid. If the aliased
energy must be of the same order of magnitude as the error due to truncating the
Gaussian, what is the smallest standard deviation that can be expressed on this
grid?
C H A P T E R 8
Edge Detection
PROBLEMS
8.1. Each pixel value in a 500 × 500 pixel image I is an independent, normally distributed
random variable with zero mean and standard deviation one. Estimate the number
of pixels where the absolute value of the x derivative, estimated by forward
differences (i.e., |I_{i+1,j} − I_{i,j}|), is greater than 3.
Solution The signed difference has mean 0 and standard deviation √2. There
are 500 rows and 499 differences per row, so a total of 500 × 499 differences. The
probability that the absolute value of a difference is larger than 3 is

    P(|diff| > 3) = ∫_3^∞ (1/(√2 √(2π))) e^{−x²/4} dx + ∫_{−∞}^{−3} (1/(√2 √(2π))) e^{−x²/4} dx.
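A quick numerical evaluation of this two-sided tail probability and of the expected count (a sketch using the standard-library complementary error function; no other assumptions):

    import math

    sigma = math.sqrt(2.0)                                # std of the signed difference
    p = math.erfc(3.0 / (sigma * math.sqrt(2.0)))         # P(|difference| > 3)
    n_diff = 500 * 499
    print(p, p * n_diff)                                  # roughly 0.034 and 8.5e3 pixels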
Now some elements of these sums are shared, and it is the shared values that produce
covariance. In particular, the shared terms occur when i − l = u − s and j − m = v − t.
The covariance will be the variance times the weights with which these shared terms
appear. Hence

    E(R_ij R_uv) = Σ_{i−l=u−s, j−m=v−t} G_lm G_st.
8.3. We have a camera that can produce output values that are integers in the range
from 0 to 255. Its spatial resolution is 1024 by 768 pixels, and it produces 30 frames
a second. We point it at a scene that, in the absence of noise, would produce the
constant value 128. The output of the camera is subject to noise that we model as
zero mean stationary additive Gaussian noise with a standard deviation of 1. How
long must we wait before the noise model predicts that we should see a pixel with
a negative value? (Hint: You may find it helpful to use logarithms to compute the
answer as a straightforward evaluation of exp(−1282 /2) will yield 0; the trick is to
get the large positive and large negative logarithms to cancel.)
Solution The hint is unhelpful; DAF apologizes. The most important issue here is
P(value of noise < −128). This is

    ∫_{−∞}^{−128} (1/√(2π)) e^{−x²/2} dx,
which can be looked up in tables for the complementary error function, as above.
There are 30 × 1024 × 768 samples per second, each of which has probability
8.7. Obtain an implementation of Canny’s edge detector (you could try the vision home
page; MATLAB has an implementation in the image processing toolbox, too) and
make a series of images indicating the effects of scale and contrast thresholds on
the edges that are detected. How easy is it to set up the edge detector to mark
only object boundaries? Can you think of applications where this would be easy?
8.8. It is quite easy to defeat hysteresis in edge detectors that implement it — essentially,
one sets the lower and higher thresholds to have the same value. Use this trick to
compare the behavior of an edge detector with and without hysteresis. There are
a variety of issues to look at:
(a) What are you trying to do with the edge detector output? It is sometimes
helpful to have linked chains of edge points. Does hysteresis help significantly
here?
(b) Noise suppression: We often wish to force edge detectors to ignore some edge
points and mark others. One diagnostic that an edge is useful is high contrast
(it is by no means reliable). How reliably can you use hysteresis to suppress
low-contrast edges without breaking high-contrast edges?
C H A P T E R 9
Texture
PROBLEMS
9.1. Show that a circle appears as an ellipse in an orthographic view, and that the minor
axis of this ellipse is the tilt direction. What is the aspect ratio of this ellipse?
Solution The circle lies on a plane. An orthographic view of the plane is obtained
by projecting along some family of parallel rays onto another plane. Now on the
image plane there will be some direction that is parallel to the object plane — call
this T . Choose another direction on the image plane that is perpendicular to this
one, and call it B. Now I can rotate the coordinate system on the object plane
without problems (it’s a circle!) so I rotate it so that the x direction is parallel
to T . The y-coordinate projects onto the B direction (because the image plane
is rotated about T with respect to the object plane) but is foreshortened. This
means that the point (x, y) in the object plane projects to the point (x, αy) in the
T , B coordinate system on the image plane (0 ≤ α ≤ 1 is a constant to do with
the relative orientation of the planes). This means that the curve (cos θ, sin θ) on
the object plane goes to (cos θ, α sin θ) on the image plane, which is an ellipse. The
minor axis of this ellipse lies along B, the tilt direction, and its aspect ratio is α,
the cosine of the angle between the object and image planes.
9.2. We will study measuring the orientation of a plane in an orthographic view, given
the texture consists of points laid down by a homogenous Poisson point process.
Recall that one way to generate points according to such a process is to sample
the x and y coordinate of the point uniformly and at random. We assume that the
points from our process lie within a unit square.
(a) Show that the probability that a point will land in a particular set is propor-
tional to the area of that set.
(b) Assume we partition the area into disjoint sets. Show that the number of
points in each set has a multinomial probability distribution.
We will now use these observations to recover the orientation of the plane. We
partition the image texture into a collection of disjoint sets.
(c) Show that the area of each set, backprojected onto the textured plane, is a
function of the orientation of the plane.
(d) Use this function to suggest a method for obtaining the plane’s orientation.
Solution The answer to (d) is no. The rest is straightforward.
Programming Assignments
9.3. Texture synthesis: Implement the non-parametric texture synthesis algorithm
of Section 9.3.2. Use your implementation to study:
(a) the effect of window size on the synthesized texture;
(b) the effect of window shape on the synthesized texture;
(c) the effect of the matching criterion on the synthesized texture (i.e., using
weighted sum of squares instead of sum of squares, etc.).
9.4. Texture representation: Implement a texture classifier that can distinguish be-
tween at least six types of texture; use the scale selection mechanism of Section
9.1.2, and compute statistics of filter outputs. We recommend that you use at
least the mean and covariance of the outputs of about six oriented bar filters and
a spot filter. You may need to read up on classification in chapter 22; use a simple
classifier (nearest neighbor using Mahalanobis distance should do the trick).
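A minimal sketch of the classification step suggested above, assuming the texture feature vectors (e.g., means and covariances of filter outputs) have already been computed elsewhere; the function names are hypothetical.

# Sketch for Exercise 9.4: nearest-neighbor texture classification with the
# Mahalanobis distance, using one pooled covariance estimate.
import numpy as np

def mahalanobis_nn(train_features, train_labels, query):
    X = np.asarray(train_features, dtype=float)
    cov = np.cov(X, rowvar=False)
    cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(X.shape[1]))  # regularized inverse
    diffs = X - query
    d2 = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)       # squared distances
    return train_labels[int(np.argmin(d2))]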
C H A P T E R 10
The Geometry of Multiple Views
It follows that all non-zero singular values of E must be equal. Note that the
singular values of E cannot all be zero since this matrix has rank 2.
10.2. Exponential representation of rotation matrices. The matrix associated with the
rotation whose axis is the unit vector a and whose angle is θ can be shown to be
equal to e^{θ[a×]} ≝ Σ_{i=0}^{+∞} (1/i!) (θ[a×])^i. Use this representation to derive Eq. (10.3).
Solution Let us consider a small motion with translational velocity v and ro-
tational velocity ω. If the two camera frames are separated by the small time
interval δt, the translation separating them is obviously (to first order) t = δtv.
The corresponding rotation is a rotation of angle δt|ω| about the axis (1/|ω|)ω,
i.e.,
R = e^{δt[ω×]} = Σ_{i=0}^{+∞} (1/i!) (δt[ω×])^i = Id + δt [ω×] + higher-order terms.
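A quick numerical check of this expansion, assuming SciPy's matrix exponential; for a small angle the exponential and its first-order approximation agree to second order.

# Sketch: expm(theta [a x]) is close to Id + theta [a x] for small theta.
import numpy as np
from scipy.linalg import expm

def cross_matrix(a):
    return np.array([[0, -a[2], a[1]],
                     [a[2], 0, -a[0]],
                     [-a[1], a[0], 0]])

a = np.array([0.0, 0.0, 1.0])          # unit rotation axis
theta = 1e-3                            # small angle, playing the role of dt*|omega|
R = expm(theta * cross_matrix(a))
approx = np.eye(3) + theta * cross_matrix(a)
print(np.max(np.abs(R - approx)))       # O(theta^2), i.e. about 1e-6 here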
Solution Let us consider first a moving camera and a static scene, use the coor-
dinate system attached to the camera in its initial position as the world coordinate
system, and identify scene points with their positions in this coordinate system and
image points with their position in the corresponding camera coordinate system.
We have seen that
¡ the ¢projection matrix associated with ¡this cameraT can¢ be taken
equal to M = Id 0 before the motion and to M = RT c −Rc tc after the
camera has undergone a rotation Rc and a translation tc . Using non-homogeneous
coordinates for scene points and homogeneous ones for image points, the two images
of a point P are thus p = P and p0 = RT T
c P − R c tc .
Let us now consider a static camera and a moving object. Suppose this object
undergoes the (finite) motion defined by P 0 = Ro P + to¡ in the¢ coordinate system
attached to this camera. Since the projection matrix is Id 0 in this coordinate
system, the image of P before the object displacement is p = P . The image
after the displacement is p0 = Ro P + to , and it follows immediately that taking
Ro = RT and to = −RT tc yields the same motion field as before.
For small motions, we have
R_o = Id + δt [ω_{o×}] = R_c^T = Id − δt [ω_{c×}],
t_o = δt v_o = −R_c^T t_c = −(Id − δt [ω_{c×}])(δt v_c) = −δt v_c
when second-order terms are neglected. Thus v_c = −v_o and ω_c = −ω_o. Recall that Eq. (10.4)
can be written as
p^T ([v_{c×}][ω_{c×}]) p − (p × ṗ) · v_c = 0.
Substituting v_c = −v_o and ω_c = −ω_o in this equation finally yields
p^T ([v_{o×}][ω_{o×}]) p + (p × ṗ) · v_o = 0.
10.4. Show that when the 8×8 matrix associated with the eight-point algorithm is singu-
lar, the eight points and the two optical centers lie on a quadric surface (Faugeras,
1993).
Hint: Use the fact that when a matrix is singular, there exists some nontrivial
linear combination of its columns that is equal to zero. Also take advantage of the
fact that the matrices representing the two projections in the coordinate system of
the first camera are in this case (Id 0) and (RT − RT t).
Solution We follow the proof in Faugeras (1993): Each row of the 8 × 8 matrix
associated with the eight-point algorithm can be written as
(uu′, uv′, u, vu′, vv′, v, u′, v′) = (1/(zz′)) (xx′, xy′, xz′, yx′, yy′, yz′, zx′, zy′),
where P = (x, y, z)T and P 0 = (x0 , y 0 , z 0 )T denote the positions of the scene point
projecting onto (u, v)T and (u0 , v 0 )T in the corresponding camera coordinate sys-
tems (C) and (C 0 ). For the matrix to be singular, there must exist some nontrivial
linear combination of its columns that is equal to zero—that is, there must exist
eight scalars λi (i = 1, . . . , 8) such that
P^T Q P′ = 0,   where   Q = ( λ1  λ2  λ3
                              λ4  λ5  λ6
                              λ7  λ8  0 ).
P T QRT (P − t) = 0.
10.5. Show that the conditions expressing that three of the 3 × 3 minors of the matrix
L = ( l_1^T        0
      l_2^T R_2    l_2^T t_2
      l_3^T R_3    l_3^T t_3 )
are zero can be written as
l_1 × ( l_2^T G_1^1 l_3 , l_2^T G_1^2 l_3 , l_2^T G_1^3 l_3 )^T = 0.
Show that the fourth determinant can be written as a linear combination of these.
In particular,
which, except for a change in sign, is identical to the expression derived earlier.
Thus the fact that three of the 3 × 3 minors of L are zero can indeed be expressed
by the trifocal tensor.
Let us conclude by showing that the fourth determinant is a linear combination of
the other three. This determinant is
D = | a  b  c | = (a × b) · c.
D = (1/e) [ e(a × b) − d(a × c) ] · c = (c_1/e) D_23 + (c_2/e) D_31 + (c_3/e) D_12,
which shows that D can indeed be written as a linear combination of D12 , D23 ,
and D31 .
10.6. Show that Eq. (10.18) reduces to Eq. (10.2) when M1 = (Id 0) and M2 =
(RT − RT t).
Solution Recall that Eq. (10.18) expresses the bilinear constraints associated
with two cameras as D = 0, where D is the determinant
D = det ( u_1 M_1^3 − M_1^1
          v_1 M_1^3 − M_1^2
          u_2 M_2^3 − M_2^1
          v_2 M_2^3 − M_2^2 ).
Writing
M_1 = ( Id  0 ) = ( 1  0  0  0
                    0  1  0  0
                    0  0  1  0 )   and   M_2 = ( R^T  −R^T t ) = ( c_1^T  −c_1 · t
                                                                   c_2^T  −c_2 · t
                                                                   c_3^T  −c_3 · t ),
where c1 , c2 , and c3 denote the three columns of R, we can rewrite the determinant
as
D = det ( (−1, 0, u_1)          0
          (0, −1, v_1)          0
          u_2 c_3^T − c_1^T     −(u_2 c_3 − c_1) · t
          v_2 c_3^T − c_2^T     −(v_2 c_3 − c_2) · t )
  = det( (−1, 0, u_1) ; (0, −1, v_1) ; v_2 c_3^T − c_2^T ) (u_2 c_3 − c_1) · t
    − det( (−1, 0, u_1) ; (0, −1, v_1) ; u_2 c_3^T − c_1^T ) (v_2 c_3 − c_2) · t
  = p_1^T ( (c_2 · t) c_3 − (c_3 · t) c_2    (c_3 · t) c_1 − (c_1 · t) c_3    (c_1 · t) c_2 − (c_2 · t) c_1 ) p_2,
so that
D = p_1^T ( t_2 c_3 − t_3 c_2    t_3 c_1 − t_1 c_3    t_1 c_2 − t_2 c_1 ) p_2 = p_1^T ( t × c_1   t × c_2   t × c_3 ) p_2 = p_1^T [t×] R p_2,
and D = 0 reduces to the epipolar constraint of Eq. (10.2).
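A small numerical sanity check of this reduction, with synthetic data and the camera convention used above (first camera at the origin, second camera described by R and t); all values are made up for the example.

# Sketch: p1^T [t x] R p2 vanishes for corresponding points.
import numpy as np

def cross_matrix(t):
    return np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])

rng = np.random.default_rng(0)
c, s = np.cos(0.17), np.sin(0.17)
R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])   # rotation about z
t = np.array([0.5, 0.1, 0.2])

P = rng.uniform(1.0, 3.0, size=3)     # scene point in the first camera frame
p1 = P / P[2]                          # homogeneous image point, camera 1
P2 = R.T @ (P - t)                     # same point in the second camera frame
p2 = P2 / P2[2]                        # homogeneous image point, camera 2

E = cross_matrix(t) @ R                # essential matrix
print(p1 @ E @ p2)                     # ~0 up to rounding error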
Solution Recall that Eq. (10.18) expresses the trilinear constraints associated
with three cameras as D = 0, where D is the determinant
D = det ( u_1 M_1^3 − M_1^1
          v_1 M_1^3 − M_1^2
          u_2 M_2^3 − M_2^1
          v_3 M_3^3 − M_3^2 ).
Since M_1 = ( Id  0 ), this determinant can be written as
D = det ( (−1, 0, u_1, 0)
          (0, −1, v_1, 0)
          u_2 M_2^3 − M_2^1
          v_3 M_3^3 − M_3^2 ),
or, since the determinant of a matrix is equal to the determinant of its transpose,
D = (−1, 0, u_1) (D_23, D_31, D_12)^T,
where we use the same notation as in Ex. 10.5. According to that exercise, we thus
have
D = −(−1, 0, u_1) [ l_1 × ( l_2^T G_1^1 l_3 , l_2^T G_1^2 l_3 , l_2^T G_1^3 l_3 )^T ]
  = −(−1, 0, u_1) [l_{1×}] ( l_2^T G_1^1 l_3 , l_2^T G_1^2 l_3 , l_2^T G_1^3 l_3 )^T
  = −( (−1, 0, u_1)^T × (0, −1, v_1)^T ) · ( l_2^T G_1^1 l_3 , l_2^T G_1^2 l_3 , l_2^T G_1^3 l_3 )^T
  = −p_1^T ( l_2^T G_1^1 l_3 , l_2^T G_1^2 l_3 , l_2^T G_1^3 l_3 )^T.
10.8. Develop Eq. (10.20) with respect to the image coordinates, and verify that the
coefficients can indeed be written in the form of Eq. (10.21).
Solution This follows directly from the multilinear nature of determinants. In-
deed, Eq. (10.20) can be written as
It is thus clear that all the coefficients of the quadrilinear tensor can be written in
the form of Eq. (10.21).
10.9. Use Eq. (10.23) to calculate the unknowns zi , λi , and z1i in terms of p1 , pi , Ri , and
ti (i = 2, 3). Show that the value of λi is directly related to the epipolar constraint,
and characterize the degree of the dependency of z12 − z13 on the data points.
Solution We rewrite Eq. (10.23) as
z_{1i} p_1 = z_i q_i + λ_i r_i + t_i,
where q_i = R_i p_i, r_i = p_1 × q_i, and i = 2, 3.
Forming the dot product of this equation with r_i yields
t_i · (p_1 × R_i p_i) + λ_i |r_i|² = 0,   or   λ_i |r_i|² = p_1 · (t_i × R_i p_i) = p_1^T E_i p_i,
where E_i = [t_{i×}] R_i; the value of λ_i thus vanishes exactly when the epipolar
constraint between images 1 and i is satisfied.
Forming the dot product with q_i × r_i yields
z_{1i} p_1 · (q_i × r_i) = t_i · (q_i × r_i),   or   z_{1i} = (r_i^T E_i p_i) / |r_i|².
Finally, forming the dot product with p_1 × r_i yields
[t_i, p_1, r_i] + z_i [q_i, p_1, r_i] = 0,   or   z_i = [t_i, p_1, r_i] / |r_i|².
The condition z_{12} = z_{13} can thus be written as
|r_2|² (r_3^T E_3 p_3) = |r_3|² (r_2^T E_2 p_2).
C H A P T E R 11
Stereopsis
PROBLEMS
11.1. Show that, in the case of a rectified pair of images, the depth of a point P in the
normalized coordinate system attached to the first camera is z = −B/d, where B
is the baseline and d is the disparity.
Solution Note that for rectified cameras, the v and v 0 axis of the two image
coordinate systems are parallel to each other and to the y axis of the coordinate
system attached to the first camera. In addition, the images q and q 0 of any point
Q in the plane y = 0 verify v = v 0 = 0. As shown by the diagram below, if H, C0
and C00 denote respectively the orthogonal projection of Q onto the baseline and
the principal points of the two cameras, the triangles OHQ and qC0 O are similar,
thus −b/z = −u/1. Likewise, the triangles HO 0 Q and C00 q 0 O0 are similar, thus
−b0 /z = u0 /1, where b and b0 denote respectively the lengths of the line segments
OH and HO 0 . It follows that u0 − u = −B/z or z = −B/d.
[Figure: rectified stereo pair seen from above, showing the scene point Q, its projection H onto the baseline, the optical centers O and O′, the principal points C_0 and C_0′, and the image points q, q′ and p, p′.]
Let us now consider a point P with nonzero y coordinate and its orthogonal pro-
jection Q onto the plane y = 0. The points P and Q have the same depth since
the line P Q joining them is parallel to the y axis. The lines pq and p0 q 0 joining
the projections of the two points in the two images are also obviously parallel to
P Q and to the v and v 0 axis. It follows that the u coordinates of p and q are the
same, and that the u0 coordinates of p0 and q 0 are also the same. In other words,
the disparity and depths for the points P and Q are the same, and the formula
z = −B/d holds in general.
11.2. Use the definition of disparity to characterize the accuracy of stereo reconstruction
as a function of baseline and depth.
Solution Let us assume that the cameras have been rectified. In this case, as in
Ex. 11.1, we have z = −B/d. Let us assume the disparity has been measured with
some error ε. Using a first-order Taylor expansion of the depth shows that
z(d + ε) − z(d) ≈ ε z′(d) = ε B/d² = (ε/B) z².
In other words, the error is proportional to the squared depth and inversely pro-
portional to the baseline.
11.3. Give reconstruction formulas for verging eyes in the plane.
Solution Let us define a Cyclopean coordinate system with origin at the mid-
point between the two eyes, x axis in the direction of the baseline, and z axis
oriented so that z > 0 for points in front of the two eyes (note that this contradicts
our usual conventions, but allows us to use a right-handed (x, z) coordinate system).
Now consider a point P with coordinates (x, z). As shown by the diagram below, if
the corresponding projection rays make angles θl and θr with the z axis, we must
have
(x + B/2)/z = cot θ_l   and   (x − B/2)/z = cot θ_r,
or, equivalently,
x = (B/2) (cot θ_l + cot θ_r)/(cot θ_l − cot θ_r)   and   z = B/(cot θ_l − cot θ_r).
[Figure: verging eyes separated by the baseline B along the x axis, with the fixation point F and a scene point P; the angles θ_l and θ_r are measured from the z axis, and φ_l and φ_r are the angles of the fixation point.]
Now, if the fixated point has angular coordinates (φl , φr ) and some other point P
has absolute angular coordinates (θl , θr ), Cartesian coordinates (x, z), and retinal
angular coordinates (ψl , ψr ), we must have θl = φl + ψl and θr = φr + ψr , which
gives reconstruction formulas for given values of (φl , φr ) and (ψl , ψr ).
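The reconstruction formulas above translate directly into a few lines of Python; the example point and baseline below are made up for illustration.

# Sketch for Exercise 11.3: recover (x, z) from the two ray angles.
import math

def reconstruct_verging(theta_l, theta_r, B):
    cl, cr = 1.0 / math.tan(theta_l), 1.0 / math.tan(theta_r)
    z = B / (cl - cr)
    x = 0.5 * B * (cl + cr) / (cl - cr)
    return x, z

B, x, z = 0.6, 0.2, 2.0
theta_l = math.atan2(z, x + B / 2)     # so that cot(theta_l) = (x + B/2)/z
theta_r = math.atan2(z, x - B / 2)     # so that cot(theta_r) = (x - B/2)/z
print(reconstruct_verging(theta_l, theta_r, B))   # ~(0.2, 2.0)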
11.4. Give an algorithm for generating an ambiguous random dot stereogram that can
depict two different planes hovering over a third one.
Solution We display two squares hovering at different heights over a larger background
square. The background images can be synthesized by spraying random
black dots on a white background plate after (virtually) covering the area corre-
sponding to the hovering squares. For the other two squares, the dots are generated
as follows: On a given scanline, intersect a ray issued from the left eye with the
first plane and the second one, and paint black the resulting dots P1 and P2 . Then
paint a black dot on the first plane at the point P3 where the ray joining the right
eye to P2 intersects the first plane. Now intersect the ray joining the left eye to
P3 with the second plane. Continue this process as long as desired. It is clear that
this will generate a deterministic, but completely ambiguous pattern. Limiting this
process to a few iterations and repeating it at many random locations will achieve
the desired random effect.
11.5. Show that the correlation function reaches its maximum value of 1 when the image
brightnesses of the two windows are related by the affine transform I 0 = λI + µ for
some constants λ and µ with λ > 0.
Solution Let us consider two images represented by the vectors w = (w1 , . . . , wp )T
and w0 = (w10 , . . . , wp0 )T of Rp (typically, p = (2m + 1) × (2n + 1) for some positive
values of m and n). As noted earlier, the corresponding normalized correlation
value is the cosine of the angle θ between the vectors w − w̄ and w0 − w̄0 , where ā
denotes the vector whose coordinates are all equal to the mean ā of the coordinates
of a.
The correlation function reaches its maximum value of 1 when the angle θ is zero.
In this case, we must have w′ − w̄′ = λ(w − w̄) for some λ > 0, or, for i = 1, . . . , p,
w_i′ = λ w_i + µ, where µ = w̄′ − λ w̄.
Conversely, suppose that wi0 = λwi + µ for some λ, µ with λ > 0. Clearly, w 0 =
λw + µ̄ and w̄0 = λw̄ + µ̄, where µ̄ denotes this time the vector with all coordinates
equal to µ. Thus w 0 − w̄0 = λ(w − w̄), and the angle θ is equal to zero, yielding
the maximum possible value of the correlation function.
11.6. Prove the equivalence of correlation and sum of squared differences for images with
zero mean and unit Frobenius norm.
Solution Let w and w 0 denote the vectors associated with two image windows.
If these windows have zero mean and unit Frobenius norm, we have by definition
|w|² = |w′|² = 1 and w̄ = w̄′ = 0. In this case, the sum of squared differences is
|w − w′|² = |w|² + |w′|² − 2 w · w′ = 2(1 − C),
where C is the normalized correlation of the two windows. Thus minimizing the
sum of squared differences is equivalent to maximizing the normalized correlation.
11.7. Recursive computation of the correlation function.
(a) Show that (w − w̄) · (w′ − w̄′) = w · w′ − (2m + 1)(2n + 1) Ī Ī′.
(b) Show that the average intensity I¯ can be computed recursively, and estimate
the cost of the incremental computation.
(c) Generalize the prior calculations to all elements involved in the construction
of the correlation function, and estimate the overall cost of correlation over a
pair of images.
Solution
(a) First note that for any two vectors of size p, we have ā · b = pāb̄, where ā and
b̄ denote respectively the average values of the coordinates of a and b. For
vectors w and w 0 representing images of size (2m + 1) × (2n + 1) with average
intensities Ī and Ī′, we have therefore
(w − w̄) · (w′ − w̄′) = w · w′ − w̄ · w′ − w · w̄′ + w̄ · w̄′ = w · w′ − (2m + 1)(2n + 1) Ī Ī′.
(b) Let Ī(i, j) and Ī(i + 1, j) denote the average intensities computed for windows
respectively centered in (i, j) and (i + 1, j). If p = (2m + 1) × (2n + 1), we
have
Ī(i + 1, j) = (1/p) Σ_{k=−m}^{m} Σ_{l=−n}^{n} I(i + k + 1, j + l)
            = (1/p) Σ_{k=−m}^{m−1} Σ_{l=−n}^{n} I(i + k + 1, j + l) + (1/p) Σ_{l=−n}^{n} I(i + m + 1, j + l)
            = (1/p) Σ_{k′=−m+1}^{m} Σ_{l=−n}^{n} I(i + k′, j + l) + (1/p) Σ_{l=−n}^{n} I(i + m + 1, j + l)
            = Ī(i, j) − (1/p) Σ_{l=−n}^{n} I(i − m, j + l) + (1/p) Σ_{l=−n}^{n} I(i + m + 1, j + l).
Thus the average intensity can be updated in 4(n+1) operations when moving
from one pixel to the one below it. The update for moving one column to the
right costs 4(m + 1) operations. This is to compare to the (2m + 1)(2n + 1)
operations necessary to compute the average from scratch.
(c) It is not possible to compute the dot product incrementally during column
shifts associated with successive disparities. However, it is possible to compute
the dot product associated with elementary row shifts since 2m of the rows
are shared by consecutive windows. Indeed, let w(i, j) and w′(i, j) denote
the vectors w and w′ associated with windows of size (2m + 1) × (2n + 1)
centered in (i, j). We have
w(i + 1, j) · w′(i + 1, j) = Σ_{k=−m}^{m} Σ_{l=−n}^{n} I(i + k + 1, j + l) I′(i + k + 1, j + l),
and the exact same line of reasoning as in (b) can be used to show that
w(i + 1, j) · w′(i + 1, j) = w(i, j) · w′(i, j) − Σ_{l=−n}^{n} I(i − m, j + l) I′(i − m, j + l) + Σ_{l=−n}^{n} I(i + m + 1, j + l) I′(i + m + 1, j + l).
Thus the dot product can be updated in 4(2n + 1) operations when moving
from one pixel to the one below it. This is to compare to the 2(2m + 1)(2n +
1) − 1 operations necessary to compute the dot product from scratch.
To complete the computation of the correlation function, one must also com-
pute the norms |w − w̄| and |w0 − w̄0 |. This computation also reduces to the
evaluation of a dot product and an average, but it can be done recursively for
both rows and column shifts.
Suppose that images are matched by searching for each pixel in the left image
its match in the same scanline of the right image, within some disparity range
[−D, D]. Suppose also that the two images have size M × N and that the
windows being compared have, as before, size (2m+1)×(2n+1). By assuming
if necessary that the two images have been obtained by removing the outer
layer of a (M + 2m + 2D) × (N + 2n + 2D) image, we can ignore boundary
effects.
Processing the first scan line requires computing and storing (a) 2N dot products
of the form w · w or w′ · w′, (b) 2N averages of the form Ī or Ī′, and (c)
(2D + 1)N dot products of the form w · w′. The total storage required is
(2D + 5)N, which is certainly reasonable for, say, 1000 × 1000 images, and
disparity ranges of [−100, 100]. The computation is dominated by the w · w′
dot products, and its cost is on the order of 2(2m + 1)(2n + 1)(2D + 1)N .
The incremental computations for the next scan line amount to updating all
averages and dot products, with a total cost of 4(2n + 1)(2D + 1)N . Assum-
ing M ≫ m, the overall cost of the correlation is therefore, after M updates,
4(2n + 1)(2D + 1)M N operations. Note that a naive implementation would
require instead 2(2m + 1)(2n + 1)(2D + 1)M N operations.
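The row-shift update in (c) is easy to verify numerically; the sketch below, with made-up window sizes and random images, checks that the incremental update matches a from-scratch computation.

# Sketch for Exercise 11.7(c): incremental update of a window dot product.
import numpy as np

def window_product_sum(I, Ip, i, j, m, n):
    return float(np.sum(I[i - m:i + m + 1, j - n:j + n + 1] *
                        Ip[i - m:i + m + 1, j - n:j + n + 1]))

def shift_down(prev, I, Ip, i, j, m, n):
    # prev is the sum for the window centered at (i, j); return the sum at (i+1, j).
    leaving = np.sum(I[i - m, j - n:j + n + 1] * Ip[i - m, j - n:j + n + 1])
    entering = np.sum(I[i + m + 1, j - n:j + n + 1] * Ip[i + m + 1, j - n:j + n + 1])
    return prev - float(leaving) + float(entering)

rng = np.random.default_rng(1)
I, Ip = rng.random((50, 50)), rng.random((50, 50))
m, n, i, j = 3, 3, 10, 20
s = window_product_sum(I, Ip, i, j, m, n)
print(abs(shift_down(s, I, Ip, i, j, m, n) - window_product_sum(I, Ip, i + 1, j, m, n)))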
11.8. Show how a first-order expansion of the disparity function for rectified images can
be used to warp the window of the right image corresponding to a rectangular region
of the left one. Show how to compute correlation in this case using interpolation
to estimate right-image values at the locations corresponding to the centers of the
left window’s pixels.
Solution Let us set up local coordinate systems whose origins are at the two
points of interest—that is, the two matched points have coordinates (0, 0) in these
coordinate systems. If d(u, v) denotes the disparity function in the neighborhood
of the first point, and α and β denote its partial derivatives at (0, 0), we can write the
coordinates of a match (u′, v′) for the point (u, v) in the first image as
(u′, v′) = (u + d(u, v), v) ≈ ((1 + α)u + βv, v).
It follows that, to first-order, a small rectangular region in the first image maps
onto a parallelogram in the second image, and that the corresponding affine trans-
formation is completely determined by the derivatives of the disparity function.
To exploit this property in stereo matching, one can map the centers of the pixels
contained in the left window onto their right images, calculate the corresponding
intensity values via bilinear interpolation of neighboring pixels in the right image,
and finally compute the correlation function from these values. This is essentially
the method described in Devernay and Faugeras (1994).
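A minimal sketch of the warping and resampling step, assuming the disparity value d0 and its derivatives alpha, beta have already been estimated; the image indexing convention (u horizontal, v vertical) is one possible choice.

# Sketch for Exercise 11.8: sample the right image at the warped locations.
import numpy as np

def bilinear(img, x, y):
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * img[y0, x0] + dx * (1 - dy) * img[y0, x0 + 1]
            + (1 - dx) * dy * img[y0 + 1, x0] + dx * dy * img[y0 + 1, x0 + 1])

def warped_window(right, u0, v0, d0, alpha, beta, m, n):
    # Right-image values matching the pixels of the (2m+1) x (2n+1) left window
    # centered at (u0, v0), under the first-order disparity model.
    out = np.empty((2 * m + 1, 2 * n + 1))
    for r, v in enumerate(range(-m, m + 1)):
        for c, u in enumerate(range(-n, n + 1)):
            up = u + d0 + alpha * u + beta * v        # warped column offset
            out[r, c] = bilinear(right, u0 + up, v0 + v)
    return out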
11.9. Show how to use the trifocal tensor to predict the tangent line along an image curve
from tangent line measurements in two other pictures.
Solution Let us assume we have estimated the trifocal tensor associated with
three images of a curve Γ. Let us denote by pi the projection of a point P of Γ
onto image number i (i = 1, 2, 3). The tangent line T to Γ in P projects onto the
tangent line t_i to γ_i. Given the coordinate vectors t_2 and t_3 of t_2 and t_3, we can
predict the coordinate vector t_1 of t_1 as t_1 = (t_2^T G_1^1 t_3, t_2^T G_1^2 t_3, t_2^T G_1^3 t_3)^T. This is
another method for transfer, this time applied to lines instead of points.
Programming Assignments
11.10. Implement the rectification process.
11.11. Implement a correlation-based approach to stereopsis.
11.12. Implement a multiscale approach to stereopsis.
C H A P T E R 12
Affine Structure from Motion
PROBLEMS
12.1. Explain why any definition of the “addition” of two points or of the “multiplication”
of a point by a scalar necessarily depends on the choice of some origin.
Σ_{i=0}^{m} α_i A_i  ≝  A_j + Σ_{i=0, i≠j}^{m} α_i (A_i − A_j).
Noting that the summation defining P can be taken over the whole 0..m range
without changing its result and using the fact that Σ_{i=0}^{m} α_i = 1, we can write
P = A_k + Σ_{i=0}^{m} α_i [ (A_i − A_j) + (A_j − A_k) ] = A_k + Σ_{i=0}^{m} α_i (A_i − A_k).
As before, omitting the term corresponding to i = k does not change the result of
the summation, which proves the result.
12.3. Given the two affine coordinate systems (A) = (OA , uA , v A , wA ) and (B) =
(OB , uB , v B , wB ) for the affine space E3 , let us define the 3 × 3 matrix
{}^B_A C = ( {}^B u_A   {}^B v_A   {}^B w_A ),
where {}^B a denotes the coordinate vector of a vector a in the (vector) coordinate
system (u_B, v_B, w_B). Show that
{}^B P = {}^B_A C {}^A P + {}^B O_A   or, equivalently,   ( {}^B P ; 1 ) = ( {}^B_A C   {}^B O_A ; 0^T   1 ) ( {}^A P ; 1 ).
Solution The proof follows the derivation of the Euclidean change of coordinates
in chapter 2. We write
O_B P = ( u_B  v_B  w_B ) {}^B P = O_B O_A + ( u_A  v_A  w_A ) {}^A P.
Expressing both sides of this equation in the coordinate system (B) yields
{}^B P = {}^B_A C {}^A P + {}^B O_A
since {}^B_B C is obviously the identity. The homogeneous form of this expression follows
immediately, exactly as in the Euclidean case.
12.4. Show that the set of barycentric combinations of m + 1 points A0 , . . . , Am in X is
indeed an affine subspace of X, and show that its dimension is at most m.
Solution We equip R3 with a fixed affine coordinate system and identify points
with their (non-homogeneous) coordinate vectors. According to Section 12.1.2, a
necessary and sufficient condition for the three points P_1 = (x_1, y_1, z_1)^T, P_2 = (x_2, y_2, z_2)^T,
and P = (x, y, z)T to define a line (i.e., a one-dimensional affine space) is that the
matrix
x1 x2 x
y1 y2 y
z z2 z
1
1 1 1
have rank 2, or equivalently, that all its 3 × 3 minors have zero determinant (we
assume that the three points are distinct so the matrix has at least rank 2). Note
that three of these minors are
| y_1  y_2  y ;  z_1  z_2  z ;  1  1  1 | = y(z_1 − z_2) − z(y_1 − y_2) + y_1 z_2 − y_2 z_1,
| z_1  z_2  z ;  x_1  x_2  x ;  1  1  1 | = z(x_1 − x_2) − x(z_1 − z_2) + z_1 x_2 − z_2 x_1,
| x_1  x_2  x ;  y_1  y_2  y ;  1  1  1 | = x(y_1 − y_2) − y(x_1 − x_2) + x_1 y_2 − x_2 y_1,
which are the three coordinates of
P × (P_1 − P_2) + P_1 × P_2 = (P − P_2) × (P_1 − P_2).
As could have been expected, writing that these three coordinates are zero is equivalent
to writing that P_1, P_2, and P are collinear. Only two of the equations
associated with the three coordinates of the cross product are independent. It is easy
to see that the fourth minor is a linear combination of the other three, so the line
is defined by any two of the above equations.
12.6. Show that the intersection of a plane with two parallel planes consists of two parallel
lines.
Solution Consider the plane A + U and the two parallel planes B + V and C + V
in some affine space X. Here A, B, and C are points in X, and U and V are vector
planes in the vector space associated with X, and we will assume from now on that U and V
are distinct (otherwise the three planes are parallel). As shown in Example 12.2, the intersection of two
affine subspaces A + U and B + V is an affine subspace associated with the vector
subspace W = U ∩ V . The intersection of two distinct planes in a vector space is a
line, thus the intersection of A + U and B + V is a line. The same reasoning shows
that the intersection of A + U and C + V is also a line associated with W . The two
lines are parallel since they are associated with the same vector subspace W .
12.7. Show that an affine transformation ψ : X → Y between two affine subspaces
X and Y associated with the vector spaces X⃗ and Y⃗ can be written as ψ(P) =
ψ(O) + ψ⃗(P − O), where O is some arbitrarily chosen origin, and ψ⃗ : X⃗ → Y⃗ is a
linear mapping from X⃗ onto Y⃗ that is independent of the choice of O.
Solution Define ψ⃗(u) = ψ(O + u) − ψ(O) for u in X⃗, and given u and v, let
A = O + u and B = O + v. Since ψ preserves barycentric combinations, we have
ψ⃗(λu + µv) = ψ(O + λu + µv) − ψ(O)
            = ψ(O + λ(A − O) + µ(B − O)) − ψ(O)
            = ψ((1 − λ − µ)O + λA + µB) − ψ(O)
            = (1 − λ − µ)ψ(O) + λψ(A) + µψ(B) − ψ(O)
            = ψ(O) + λ(ψ(A) − ψ(O)) + µ(ψ(B) − ψ(O)) − ψ(O)
            = λψ⃗(u) + µψ⃗(v).
M_{λ,µ} = K ( λ  0  0 ; 0  λ  0 ; 0  0  1 ) ( r_1^T  t_x ; r_2^T  t_y ; r_3^T  µt_z ) = K ( λr_1^T  λt_x ; λr_2^T  λt_y ; r_3^T  µt_z ).
Now if we choose µ = λ we can write
M_{λ,λ} = λK ( r_1^T  t_x ; r_2^T  t_y ; (1/λ) r_3^T  t_z ).
When λ → +∞, the projection becomes affine, with affine projection matrix
(1/t_z) ( K_2 R_2   K_2 t_2 ) + p_0,
where we follow the notation used in Eq. (2.19) of chapter 2.
Note that picking µ = λ ensures that the magnification remains constant for the
fronto-parallel plane Π0 that contains OW . Indeed, let us denote by (iC , j C , kC )
the camera coordinate system, and consider a point A = OW + xiC + yj C in Π0 .
Since R = {}^C_W R = {}^W_C R^T, we have {}^W i_C = r_1^T, {}^W j_C = r_2^T, and {}^W k_C = r_3^T. It
follows that
M_{λ,λ} {}^W A = λK ( r_1^T (x r_1 + y r_2) + t_x ; r_2^T (x r_1 + y r_2) + t_y ; (1/λ) r_3^T (x r_1 + y r_2) + t_z ) = λK ( x + t_x ; y + t_y ; t_z ).
12.9. Generalize the notion of multilinearities introduced in chapter 10 to the affine case.
Solution As in chapter 10, the projection equations associated with one image can be written as
( u M^3 − M^1
  v M^3 − M^2 ) P = 0,
where this time M3 = (0, 0, 0, 1). We can thus construct as in that chapter the
8×4 matrix Q, and all its 4×4 minors must, as before, have zero determinant. This
yields multi-image constraints involving two, three, or four images, but since image
coordinates only occur in the fourth column of Q, these constraints are now linear
in these coordinates (note the similarity with the affine fundamental matrix). On
the other hand, the multi-image relations between lines remain multilinear in the
affine case. For example, the derivation of the trifocal tensor for lines in Section
10.2.1 remains unchanged (except for the fact the third row of M is now equal
to (0, 0, 0, 1)), and yields trilinear relationships among the three lines’ coordinate
vectors. Likewise, the interpretation of the quadrifocal tensor in terms of lines
remains valid in the affine case.
12.10. Prove Theorem 3.
Solution Let us write the singular value decomposition of A as A = UWV^T.
Since U is column-orthogonal, we have
A^T A = VW^T U^T UWV^T = VW^T WV^T.
Denoting by c_i the i-th column of V and using the fact that V is orthogonal (so that
V^T c_i is the vector with a 1 in position i and zeros elsewhere), we obtain
A^T A c_i = VW^T W V^T c_i = V diag(w_1², . . . , w_n²) (0, . . . , 0, 1, 0, . . . , 0)^T
          = ( c_1  . . .  c_n ) (0, . . . , 0, w_i², 0, . . . , 0)^T = w_i² c_i.
It follows that the vectors ci are indeed eigenvectors of AT A, and that the singular
values are the nonnegative square roots of the corresponding eigenvalues.
12.11. Show that a calibrated paraperspective camera is an affine camera that satisfies
the constraints
a · b = [ u_r v_r / (2(1 + u_r²)) ] |a|² + [ u_r v_r / (2(1 + v_r²)) ] |b|²   and   (1 + v_r²)|a|² = (1 + u_r²)|b|²,
where (ur , vr ) denote the coordinates of the perspective projection of the point R.
Solution Recall from chapter 2 that the paraperspective projection matrix can
be written as
M = (1/z_r) ( k  s  u_0 − u_r ; 0  1  v_0 − v_r ) ( R  t ).
In the calibrated case (k = 1, s = 0, u_0 = v_0 = 0), the vectors formed by the first three
entries of the two rows of M are
a = (1/z_r)(r_1 − u_r r_3)   and   b = (1/z_r)(r_2 − v_r r_3).
In particular, we have |a|² = (1 + u_r²)/z_r², |b|² = (1 + v_r²)/z_r², and a · b = u_r v_r / z_r².
The result immediately follows.
12.12. What do you expect the RREF of an m × n matrix with random entries to be
when m ≥ n? What do you expect it to be when m < n? Why?
Solution A random m × n matrix A usually has maximal rank. When m > n,
this rank is n, all columns are base columns, and the m − n bottom rows of the
RREF of A are zero. When m < n, the rank is m, and the first m columns of A are
normally independent. It follows that the base columns of the RREF are its first
m columns; the n − m rightmost columns of the RREF contain the coordinates of
the corresponding columns of A in the basis formed by its first m columns. There
are no zero rows in the RREF in this case.
Programming Assignments
12.13. Implement the Koenderink–Van Doorn approach to affine shape from motion.
12.14. Implement the estimation of affine epipolar geometry from image correspondences
and the estimation of scene structure from the corresponding projection matrices.
12.15. Implement the Tomasi–Kanade approach to affine shape from motion.
12.16. Add random numbers uniformly distributed in the [0, 0.0001] range to the entries
of the matrix U used to illustrate the RREF and compute its RREF (using, e.g., the
rref routine in MATLAB); then compute again the RREF using a “robustified”
version of the reduction algorithm (using, e.g., rref with a nonzero tolerance).
Comment on the results.
C H A P T E R 13
Projective Structure from Motion
PROBLEMS
v = λ({}^A x_0 a_0 + {}^A x_1 a_1 + {}^A x_2 a_2 + {}^A x_3 a_3) = µ({}^B x_0 b_0 + {}^B x_1 b_1 + {}^B x_2 b_2 + {}^B x_3 b_3),
hence
ρ {}^B P = {}^B_A T {}^A P,   where   {}^B_A T = ( {}^B a_0   {}^B a_1   {}^B a_2   {}^B a_3 )
and ρ = µ/λ, which proves the desired result. Note that the columns of {}^B_A T
are related to the coordinate vectors {}^B A_i by a priori unknown scale factors. A
are related to the coordinate vectors B Ai by a priori unknown scale factors. A
technique for computing these scale factors is given in Section 13.1.
13.3. Show that any two distinct lines in a projective plane intersect in exactly one point
and that two parallel lines ∆ and ∆0 in an affine plane intersect at the point at
infinity associated with their common direction v in the projective completion of
this plane.
Hint: Use J_A to embed the affine plane in its projective closure, and write the vector
of Π⃗ × R associated with any point in J_A(∆) (resp. J_A(∆′)) as a linear combination
of the vectors (AB, 1) and (AB + v, 1) (resp. (AB′, 1) and (AB′ + v, 1)), where B
and B′ are arbitrary points on ∆ and ∆′.
Solution Consider two distinct lines ∆ and ∆0 in a projective plane, and let
(e1 , e2 ) and (e01 , e02 ) denote two bases for the associated two-dimensional vector
spaces. The intersection of ∆ and ∆0 is the set of points p(u), where u = λe1 +
µe2 = λ0 e01 + µ0 e02 for some value of the scalars λ, µ, λ0 , µ0 . When e01 can be
written as a linear combination of the vectors e1 and e2 , we must have µ0 = 0 since
otherwise e02 would also be a (non trivial) linear combination of e1 and e2 and the
two lines would be the same. In this case, p(e01 ) is the unique intersection point of ∆
and ∆0 . Otherwise, the three vectors e1 , e2 , and e01 are linearly independent, and
the vector e02 can be written in a unique manner as a linear combination of these
vectors, yielding a unique solution (defined up to scale) for the scalars λ, µ, λ0 , µ0 ,
and therefore a unique intersection for the lines ∆ and ∆0 .
Now let us consider two parallel (and thus distinct) lines ∆ and ∆0 with direction
v in the affine plane. The intersection of their images J_A(∆) and J_A(∆′) is determined
by the solutions of the equation λ(AB, 1) + µ(AB + v, 1) = λ′(AB′, 1) + µ′(AB′ + v, 1).
This equation can be rewritten as
λ + µ = λ′ + µ′   and   (λ + µ) BB′ + (µ′ − µ) v = 0.
Since the lines are not the same, the vectors BB′ and v are not proportional to
each other, thus we must have µ = µ′ and λ + µ = λ′ + µ′ = 0. Thus the two lines
J_A(∆) and J_A(∆′) intersect at the point associated with the vector
((λ + µ) AB + µ v, λ + µ) = (µ v, 0),
that is, the point at infinity associated with their common direction v.
13.6. In this exercise, you will show that the cross-ratio of four collinear points A, B, C,
and D is equal to
{A, B; C, D} = [ sin(α + β) sin(β + γ) ] / [ sin(α + β + γ) sin β ],
where the angles α, β, and γ are defined as in Figure 13.2.
(a) Show that the area of a triangle P QR is
A(P, Q, R) = (1/2) PQ × RH = (1/2) PQ × PR sin θ,
where P Q denotes the distance between the two points P and Q, H is the
projection of R onto the line passing through P and Q, and θ is the angle
between the lines joining the point P to the points Q and R.
(b) Define the ratio of three collinear points A, B, C as
R(A, B, C) = AB / BC
for some orientation of the line supporting the three points. Show that R(A, B, C) =
A(A, B, O)/A(B, C, O), where O is some point not lying on this line.
(c) Conclude that the cross-ratio {A, B; C, D} is indeed given by the formula
above.
Solution
(a) The distance between the points H and R is by construction HR = P R sin θ.
It is possible to construct a rectangle of dimensions P Q × RH by adding to
the triangles P HR and RHQ their mirror images relative to the lines P R and
RQ respectively. The area A(P, Q, R) of the triangle P QR is half the area of
the rectangle, i.e.,
A(P, Q, R) = (1/2) PQ × RH = (1/2) PQ × PR sin θ.
(b) Let H denote the orthogonal projection of the point O onto the line passing
through the points A, B, and C. According to (a), we have A(A, B, O) = (1/2) AB × OH
and A(B, C, O) = (1/2) BC × OH. Thus
R(A, B, C) = AB / BC = ε A(A, B, O) / A(B, C, O),
where ε = ∓1. Taking the convention that the area A(P, Q, R) is negative
when the points P , Q, and R are in clockwise order yields the desired result.
(c) By definition of the cross-ratio,
{A, B; C, D} = (CA/CB) · (DB/DA) = [ −R(A, C, B) ] / [ −R(A, D, B) ] = R(A, C, B) / R(A, D, B).
Now, according to (a) and (b), we have, with the same sign convention as
before
R(A, C, B) = A(A, C, O) / A(C, B, O) = [ OA × OC sin(α + β) ] / [ −OB × OC sin β ] = − OA sin(α + β) / (OB sin β)
and, likewise,
R(A, D, B) = A(A, D, O) / A(D, B, O) = [ OA × OD sin(α + β + γ) ] / [ −OB × OD sin(β + γ) ] = − OA sin(α + β + γ) / (OB sin(β + γ)),
thus
{A, B; C, D} = [ sin(α + β) sin(β + γ) ] / [ sin(α + β + γ) sin β ].
13.7. Show that the homography between two epipolar pencils of lines can be written as
τ → τ′ = (aτ + b) / (cτ + d),
Now the slope of the line l = (λ, µ, −λα − µβ)T is τ = −λ/µ, and the slope of the
line l0 is τ 0 = −λ0 /µ0 . It follows that
τ′ = − (Aλ + Bµ) / (Cλ + Dµ) = − (−Aτ + B) / (−Cτ + D) = (aτ + b) / (cτ + d),
Solution The following diagram will help articulate the successive steps of the
solution.
[Figure: a projective frame with the labeled points A = (1, 0, 0, 0), B = (0, 1, 0, 0), C = (0, 0, 1, 0), E = (1, 1, 1, 0), D = (x, y, z, w), D′ = (x′, y′, z′, 0), D″ = (x″, y″, z″, 0), O′ = (0, 0, 0, 1), and O″ = (1, 1, 1, 1).]
(a) Obviously, the coordinates of the points D 0 and D00 are simply (x0 , y 0 , z 0 , 0)
and (x00 , y 00 , z 00 , 0). The coordinates of the point E are (1, 1, 1, 0).
(b) Since D lies on both lines O′D′ and O″D″, we can write D = λ′O′ + µ′D′ = λ″O″ + µ″D″,
which yields
(x, y, z, w) = (µ′x′, µ′y′, µ′z′, λ′),    (13.1)
with µ′x′ = λ″ + µ″x″, µ′y′ = λ″ + µ″y″, µ′z′ = λ″ + µ″z″, and λ′ = λ″.
(c) The values of µ0 , µ00 , λ00 are found (up to some scale factor) by solving the
following homogeneous system:
( −x′  x″  1
  −y′  y″  1
  −z′  z″  1 ) ( µ′ ; µ″ ; λ″ ) = 0.    (13.2)
Note that the determinant of this equation must be zero, which corresponds
to D0 , D00 , and E being collinear. In practice, (13.2) is solved through linear
least-squares, and the values of x, y, z, w are then computed using (13.1).
13.9. Show that if M̃ = (A b) and M̃0 = (Id 0) are two projection matrices, and if
F denotes the corresponding fundamental matrix, then [b× ]A is proportional to F
whenever F T b = 0 and
A = −λ[b× ]F + ( µb νb τ b ).
This shows that [b×]A is indeed proportional to F and there exists a four-parameter
family of solutions for the matrix M̃ defined (up to scale) by the parameters λ, µ,
ν, and τ .
13.10. We derive in this exercise a method for computing a minimal parameterization
of the fundamental matrix and estimating the corresponding projection matrices.
This is similar in spirit to the technique presented in Section 12.2.2 of chapter 12
in the affine case.
(a) Show that two projection matrices M and M0 can always be reduced to the
following canonical forms by an appropriate projective transformation:
M̃ = ( 1  0  0  0
      0  1  0  0
      0  0  1  0 )   and   M̃′ = ( a_1^T  b_1
                                   a_2^T  b_2
                                   0^T    1 ).
Note: For simplicity, you can assume that all the matrices involved in your
solution are nonsingular.
(b) Note that applying this transformation to the projection matrices amounts to
applying the inverse transformation to every scene point P . Let us denote by
P̃ = (x, y, z)T the position of the transformed point P̃ in the world coordinate
system and by p = (u, v, 1)T and p0 = (u0 , v 0 , 1)T the homogeneous coordinate
vectors of its images. Show that
N = ( m_1^T
      m_2^T
      m_3^T
      m_3′^T ),
and
u′ = z a_1 · p + b_1,
v′ = z a_2 · p + b_2.    (13.3)
(c) The above equation is easily rewritten in the familiar form p^T F p′ = 0 of the
epipolar constraint, the fundamental matrix being written in this case as
F = ( a_2   −a_1   b_2 a_1 − b_1 a_2 ).
13.11. We show in this exercise that when two cameras are (internally) calibrated so the
essential matrix E can be estimated from point correspondences, it is possible to
recover the rotation R and translation t such that E = [t×]R without solving first the
projective structure-from-motion problem. (This exercise is courtesy of Andrew
Zisserman.)
(a) Since the structure of a scene can only be determined up to a similitude, the
translation t can only be recovered up to scale. Use this and the fact that
E^T t = 0 to show that the SVD of the essential matrix can be written as
E = U diag(1, 1, 0) V^T, and conclude that t can be taken equal to the third column vector of U.
(b) Show that the two matrices R_1 = UWV^T and R_2 = UW^T V^T, where
W = ( 0  −1  0
      1   0  0
      0   0  1 ),
are the two possible solutions for the rotation.
Solution
(a) Since an essential matrix is singular with two equal nonzero singular values
(see chapter 10), and E and t are only defined up to scale, we can always take
the two nonzero singular values equal to 1, and write the SVD of E as
E = U diag(1, 1, 0) V^T. Since E^T t = 0, we have
0 = V diag(1, 1, 0) U^T t = V ( u_1 · t ; u_2 · t ; 0 ),
so u_1 · t = u_2 · t = 0, and t can be taken equal to u_3, the third column of U.
(b) With t = u_3 and R_1 = UWV^T, we have
[t×] R_1 = ( u_2   −u_1   0 ) W V^T = −( u_1   u_2   0 ) V^T = −U diag(1, 1, 0) V^T = −E.
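The recipe above translates directly into a short routine; this is a sketch rather than the book's implementation, and the sign conventions (t and E are only defined up to scale and sign) are handled loosely.

# Sketch for Exercise 13.11: candidate (t, R) factorizations of an essential matrix.
import numpy as np

def decompose_essential(E):
    U, _, Vt = np.linalg.svd(E)
    # Flip signs if needed so the candidates below are proper rotations.
    if np.linalg.det(U) < 0: U = -U
    if np.linalg.det(Vt) < 0: Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    t = U[:, 2]                       # translation direction, up to scale and sign
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    return t, R1, R2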
C H A P T E R 14
Segmentation by Clustering
PROBLEMS
14.1. We wish to cluster a set of pixels using color and texture differences. The objective
function
Φ(clusters, data) = Σ_{i ∈ clusters} Σ_{j ∈ i-th cluster} (x_j − c_i)^T (x_j − c_i)
used in Section 14.4.2 may be inappropriate — for example, color differences could
be too strongly weighted if color and texture are measured on different scales.
(a) Extend the description of the k-means algorithm to deal with the case of an
objective function of the form
Φ(clusters, data) = Σ_{i ∈ clusters} Σ_{j ∈ i-th cluster} (x_j − c_i)^T S (x_j − c_i),
14.3. Show that choosing a real vector that maximises the expression
[ y^T (D − W) y ] / [ y^T D y ]
is the same as solving the eigenvalue problem
where z = D^{−1/2} y.
Solution DAF suggests not setting this as an exercise, because he got it wrong
(sorry!). The correct form would be: Show that choosing a real vector that maximises
the expression
[ y^T (D − W) y ] / [ y^T D y ]
is the same as solving the eigenvalue problem
D^{−1/2} (D − W) D^{−1/2} z = λ z,
where z = D^{1/2} y. Of course, this requires that D have full rank, in which case one
could also solve
D^{−1} W y = λ y
or simply the generalized eigenvalue problem
W y − λ D y = 0.
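In practice the generalized form is the easiest to use; the sketch below assumes SciPy and a small made-up affinity matrix W, with D its degree matrix.

# Sketch: solve (D - W) y = lambda D y as a generalized symmetric eigenproblem.
import numpy as np
from scipy.linalg import eigh

W = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.1, 0.0],
              [0.1, 0.1, 0.0, 0.8],
              [0.0, 0.0, 0.8, 0.0]])
D = np.diag(W.sum(axis=1))
eigvals, eigvecs = eigh(D - W, D)      # generalized problem, D positive definite
print(eigvals)                          # small eigenvalues indicate good cuts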
Programming Assignments
14.5. Build a background subtraction algorithm using a moving average and experiment
with the filter.
14.6. Build a shot boundary detection system using any two techniques that appeal, and
compare performance on different runs of video.
14.7. Implement a segmenter that uses k-means to form segments based on color and
position. Describe the effect of different choices of the number of segments and
investigate the effects of different local minima.
C H A P T E R 15
Segmentation by Fitting a Model
which means that ((u − x), (v − y)) is parallel to the line’s normal, so (u, v) =
(x, y) + λ(a, b). Now if a² + b² = 1, |λ| would be the distance, because (a, b) is
a unit vector. But ax + by + c = 0, so au + bv + c = −λ(a2 + b2 ) = −λ and we are
done.
15.2. Derive the eigenvalue problem
( mean(x²) − mean(x)²           mean(xy) − mean(x) mean(y)
  mean(xy) − mean(x) mean(y)    mean(y²) − mean(y)²         ) ( a ; b ) = µ ( a ; b )
from the generative model for total least squares. This is a simple exercise —
maximum likelihood and a little manipulation will do it — but worth doing right
and remembering; the technique is extremely useful.
Solution We wish to minimise Σ_i (a x_i + b y_i + c)² subject to a² + b² = 1. This
yields
( mean(x²)   mean(xy)   mean(x)
  mean(xy)   mean(y²)   mean(y)
  mean(x)    mean(y)    1       ) ( a ; b ; c ) + λ ( a ; b ; 0 ) = 0,
where λ is the Lagrange multiplier. Now substitute back the third row (which is
mean(x) a + mean(y) b + c = 0) to get the result.
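A minimal sketch of the resulting total least squares line fit, assuming NumPy; the synthetic data are made up for the example.

# Sketch for Exercise 15.2: TLS line fit via the eigenvector of the scatter matrix.
import numpy as np

def fit_line_tls(x, y):
    mx, my = x.mean(), y.mean()
    S = np.array([[np.mean(x * x) - mx * mx, np.mean(x * y) - mx * my],
                  [np.mean(x * y) - mx * my, np.mean(y * y) - my * my]])
    vals, vecs = np.linalg.eigh(S)
    a, b = vecs[:, 0]                  # eigenvector of the smallest eigenvalue
    c = -(a * mx + b * my)             # from the substituted third row
    return a, b, c                     # line a*x + b*y + c = 0 with a^2 + b^2 = 1

x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + 0.01 * np.random.default_rng(0).standard_normal(50)
print(fit_line_tls(x, y))              # roughly proportional to (2, -1, 1)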
15.3. How do we get a curve of edge points from an edge detector that returns orientation?
Give a recursive algorithm.
15.4. A slightly more stable variation of incremental fitting cuts the first few pixels and
the last few pixels from the line point list when fitting the line because these pixels
may have come from a corner
(a) Why would this lead to an improvement?
(b) How should one decide how many pixels to omit?
Solution
(a) The first and last few are respectively the end of one corner and the beginning
of the next, and tend to bias the fit.
(b) Experiment, though if you knew a lot about the edge detector and the lens
you might be able to derive an estimate.
au2 + buv + cv 2 + du + ev + f = 0
and
2(a − c)uv − (2ady + e)u + (2cdx + d)v + (edx − ddy ) = 0.
(b) These are two quadratic equations. Write u for the vector (u, v, 1). Now show
that we can write these equations as uT M1 u = 0 and uT M2 u = 0, for M1
and M2 symmetric matrices.
(c) Show that there is a transformation T , such that T T M1 T = Id and T T M2 T
is diagonal.
(d) Now show how to use this transformation to obtain a set of solutions to the
equations; in particular, show that there can be up to four real solutions.
(e) Show that there are four, two, or zero real solutions to these equations.
(f ) Sketch an ellipse and indicate the points for which there are four or two solu-
tions.
Solution All this is straightforward algebra, except for (c) which gives a lot of
people trouble. M1 is symmetric, so can be reduced to a diagonal form by the
eigenvector matrix and to the identity using the square roots of the eigenvalues.
Now any rotation matrix fixes the identity; so I can use the eigenvector matrix of
M2 to diagonalize M2 while fixing M1 at the identity.
15.6. Show that the curve
( (1 − t²)/(1 + t²) , 2t/(1 + t²) )
is a circular arc (the length of the arc depending on the interval for which the
parameter is defined).
(a) Write out the equation in t for the closest point on this arc to some data point
(dx , dy ). What is the degree of this equation? How many solutions in t could
there be?
(b) Now substitute s3 = t in the parametric equation, and write out the equation
for the closest point on this arc to the same data point. What is the degree of
the equation? Why is it so high? What conclusions can you draw?
Solution Do this by showing that
( (1 − t²)/(1 + t²) )² + ( 2t/(1 + t²) )² = 1.
(a) The closest point satisfies
( x − (1 − t²)/(1 + t²) ) ( 2t/(1 + t²) ) + ( y − 2t/(1 + t²) ) ( −(1 − t²)/(1 + t²) ) = 0,
and if we clear denominators by multiplying both sides by (1+t2 )2 , the highest
degree term in t will have degree 4, so the answer is in principle 4. But if you
expand the sum out, you’ll find that the degree 4 and degree 3 terms cancel,
and you’ll have a polynomial of degree 2.
(b) It will have degree 6 in s; this is because the parametrisation allows each
point on the curve to have three different parameter values (the s value for
each t is t^{1/3}, and every number has three cube roots; it is very difficult in
practice to limit this sort of calculation to real values only).
15.7. Show that the viewing cone for a cone is a family of planes, all of which pass
through the focal point and the vertex of the cone. Now show the outline of a cone
consists of a set of lines passing through a vertex. You should be able to do this
by a simple argument without any need for calculations.
Solution The viewing cone for a surface consists of all rays through the focal
point and tangent to the surface. Construct a line through the focal point and the
vertex of the cone. Now construct any plane through the focal point that does not
pass through the vertex of the cone. This second plane slices the cone in some
curve. Construct the set of tangents to this curve that pass through the focal point
(which is on the plane by construction). Any plane that contains the first line and
one of these tangents is tangent to the cone, and the set of such planes exhausts
the planes tangent to the cone and passing through the focal point. The outline is
obtained by slicing this set of planes with another plane not lying on their shared
line, and so must be a set of lines passing through some common point.
Programming Assignments
15.8. Implement an incremental line fitter. Determine how significant a difference results
if you leave out the first few pixels and the last few pixels from the line point list
(put some care into building this, as it’s a useful piece of software to have lying
around in our experience).
15.9. Implement a Hough transform line finder.
15.10. Count lines with an HT line finder - how well does it work?
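A minimal sketch of a Hough line finder for these assignments, assuming NumPy and a binary edge mask as input; the accumulator resolution and the number of reported lines are arbitrary choices.

# Sketch for Exercises 15.9-15.10: vote in a (theta, rho) accumulator.
import numpy as np

def hough_lines(edge_mask, n_theta=180, n_rho=200, n_lines=5):
    ys, xs = np.nonzero(edge_mask)
    diag = np.hypot(*edge_mask.shape)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in zip(xs, ys):
        rho = x * np.cos(thetas) + y * np.sin(thetas)        # one rho per theta
        bins = np.round((rho + diag) / (2 * diag) * (n_rho - 1)).astype(int)
        acc[np.arange(n_theta), bins] += 1
    best = np.argsort(acc, axis=None)[::-1][:n_lines]
    t_idx, r_idx = np.unravel_index(best, acc.shape)
    rhos = r_idx / (n_rho - 1) * 2 * diag - diag
    return list(zip(thetas[t_idx], rhos, acc[t_idx, r_idx]))  # (theta, rho, votes)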
C H A P T E R 16
C H A P T E R 17
Tracking with Linear Dynamic Models
(M_i^T x_i, M_{i+1}^T x_{i+1}, . . . , M_{i+k−1}^T x_{i+k−1}).
(M_i^T x_i, M_{i+1}^T D_i x_i, . . . , M_{i+k−1}^T D_{i+k−2} D_{i+k−3} · · · D_i x_i), which is
( M_i^T
  M_{i+1}^T D_i
  . . .
  M_{i+k−1}^T D_{i+k−2} D_{i+k−3} · · · D_i ) x_i,
and if the rank of this matrix (which is the transpose of the one given, except for
the index typo) is k, we are ok. The rest are calculations.
17.2. A point on the line is moving under the drift dynamic model. In particular, we
have xi ∼ N (xi−1 , 1). It starts at x0 = 0.
(a) What is its average velocity? (Remember, velocity is signed.)
Solution 0.
(b) What is its average speed? (Remember, speed is unsigned.)
Solution This depends (a) on the timestep and (b) on the number of steps
you allow before measuring the speed (sorry - DAF). But the average of 1-step
speeds is 1 if the timestep is 1.
(c) How many steps, on average, before its distance from the start point is greater
than two (i.e., what is the expected number of steps, etc.?)
Solution This is finicky and should not have been set — don’t use it; sorry
– DAF.
(d) How many steps, on average, before its distance from the start point is greater
than ten (i.e., what is the expected number of steps, etc.)?
Solution This is finicky and should not have been set — don’t use it; sorry
– DAF.
(e) (This one requires some thought.) Assume we have two nonintersecting inter-
vals, one of length 1 and one of length 2; what is the limit of the ratio (average
percentage of time spent in interval one)/ (average percentage of time spent
in interval two) as the number of steps becomes infinite?
Solution This is finicky and should not have been set — don’t use it; sorry
– DAF.
(f ) You probably guessed the ratio in the previous question; now run a simulation
and see how long it takes for this ratio to look like the right answer.
Solution This is finicky and should not have been set — sorry, DAF. The
answer is 1/2, and a simulation will produce it, but will take quite a long time
to do so.
17.3. We said that
g(x; a, b) g(x; c, d) = g(x; (ad + cb)/(b + d), bd/(b + d)) f(a, b, c, d).
Show that this is true. The easiest way to do this is to take logs and rearrange the
fractions.
17.4. Assume that we have the dynamics
x_i ∼ N(d_i x_{i−1}, σ_{d_i}²),
y_i ∼ N(m_i x_i, σ_{m_i}²).
(a) P (xi |xi−1 ) is a normal density with mean di xi−1 and variance σd2i . What is
P (xi−1 |xi )?
(b) Now show how we can obtain a representation of P (xi |y i+1 , . . . , y N ) using a
Kalman filter.
Solution
(a) We have xi = di xi−1 + ζ, where ζ is Gaussian noise with zero mean and
variance σd2i . This means that xi−1 = (1/di )(xi − ζ) = xi /di + ξ, where ξ is
Gaussian noise with zero mean and variance σd2i /d2i .
(b) Run time backwards.
Programming Assignments
17.5. Implement a 2D Kalman filter tracker to track something in a simple video se-
quence. We suggest that you use a background subtraction process and track the
foreground blob. The state space should probably involve the position of the blob,
its velocity, its orientation — which you can get by computing the matrix of second
moments — and its angular velocity.
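A minimal sketch of the filtering core of such a tracker, assuming NumPy; the state here is only position and velocity, and the noise covariances are placeholder values.

# Sketch for Exercise 17.5: constant-velocity Kalman filter on blob positions.
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
Q = 0.01 * np.eye(4)           # process noise (assumed)
R = 1.0 * np.eye(2)            # measurement noise (assumed)

def kalman_step(x, P, z):
    # Predict.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the measured blob position z.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.zeros(4), np.eye(4)
for z in [np.array([1.0, 2.0]), np.array([2.1, 4.05]), np.array([3.0, 6.1])]:
    x, P = kalman_step(x, P, z)
print(x)                        # position and velocity estimates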
17.6. If one has an estimate of the background, a Kalman filter can improve background
subtraction by tracking illumination variations and camera gain changes. Imple-
ment a Kalman filter that does this; how substantial an improvement does this
offer? Notice that a reasonable model of illumination variation has the background
multiplied by a noise term that is near one — you can turn this into linear dynamics
by taking logs.
C H A P T E R 18
Model-Based Vision
PROBLEMS
18.1. Assume that we are viewing objects in a calibrated perspective camera and wish
to use a pose consistency algorithm for recognition.
(a) Show that three points is a frame group.
(b) Show that a line and a point is not a frame group.
(c) Explain why it is a good idea to have frame groups composed of different types
of feature.
(d) Is a circle and a point not on its axis a frame group?
Solution
(a) We have a calibrated perspective camera, so in the camera frame we can
construct the three rays through the focal point corresponding to each image
point. We must now slice these three rays with some plane to get a prescribed
triangle (the three points on the object). If there is only a discrete set of ways
of doing this, we have a frame group, because we can recover the rotation and
translation of the camera from any such plane. Now choose a point along ray
1 to be the first object point. There are at most two possible points on ray
2 that could be the second object point — see this by thinking about the 12
edge of the triangle as a link of fixed length, and swinging this around the
first point; it forms a sphere, which can intersect a line in at most two points.
Choose one of these points. Now we have fixed one edge of our triangle in
space — can we get the third point on the object triangle to intersect the third
image ray? In general, no, because we can only rotate the triangle about the
12 edge, which means the third point describes a circle; but a circle will not in
general intersect a ray in space, so we have to choose a special point along
ray 1 to be the object point. It follows that only a discrete set of choices are
possible, and we are done.
(b) We use a version of the previous argument. We have a calibrated perspective
camera, and so can construct in the camera frame the plane and ray corre-
sponding respectively to the image line and image point. Now choose a line
on the plane to be the object line. Can we find a solution for the object point?
The object point could lie anywhere on a cylinder whose axis is the chosen
line and whose radius is the distance from line to point. There are now two
general cases — either there is no solution, or there are two (where the ray
intersects the cylinder). But for most lines where there are two solutions,
slightly moving the line results in another line for which there are two solu-
tions, so there is a continuous family of available solutions, meaning it can’t
be a frame group.
(c) Correspondence search is easier.
(d) Yes, for a calibrated perspective camera.
18.2. We have a set of plane points P j ; these are subject to a plane affine transformation.
Show that
det[ P_i  P_j  P_k ] / det[ P_i  P_j  P_l ]
is an affine invariant (as long as no two of i, j, k, and l are the same and no three
of these points are collinear).
Solution Write Qi = MP i for the affine transform of point P i . Now
det[ Q_i  Q_j  Q_k ] / det[ Q_i  Q_j  Q_l ] = det(M [ P_i  P_j  P_k ]) / det(M [ P_i  P_j  P_l ]) = ( det(M) det[ P_i  P_j  P_k ] ) / ( det(M) det[ P_i  P_j  P_l ] ) = det[ P_i  P_j  P_k ] / det[ P_i  P_j  P_l ].
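A quick numerical check of this invariance, with random points and a random affine map (all values synthetic).

# Sketch: the determinant ratio is unchanged by an affine transformation.
import numpy as np

rng = np.random.default_rng(0)
pts = rng.random((4, 2))                    # P_i, P_j, P_k, P_l
A = rng.random((2, 2)) + np.eye(2)          # affine map x -> Ax + b
b = rng.random(2)
qts = pts @ A.T + b

def hom(p):                                 # points as columns, homogeneous coords
    return np.vstack([p.T, np.ones(p.shape[0])])

def ratio(p):
    return np.linalg.det(hom(p[[0, 1, 2]])) / np.linalg.det(hom(p[[0, 1, 3]]))

print(ratio(pts), ratio(qts))               # equal up to rounding error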
18.3. Use the result of the previous exercise to construct an affine invariant for:
(a) four lines,
(b) three coplanar points,
(c) a line and two points (these last two will take some thought).
Solution
(a) Take the intersection of lines 1 and 2 as P i , etc.
(b) Typo! can’t be done; sorry - DAF.
(c) Construct the line joining the two points; these points, with the intersection
between the lines, give three collinear points. The ratio of their lengths is an
affine invariant. Easiest proof: an affine transformation of the plane restricted
to this line is an affine transformation of the line. But this involves only scaling
and translation, and the ratio of lengths is invariant to both.
18.4. In chamfer matching at any step, a pixel can be updated if the distances from some
or all of its neighbors to an edge are known. Borgefors counts the distance from a
pixel to a vertical or horizontal neighbor as 3 and to a diagonal neighbor as 4 to
ensure the pixel values are integers. Why does this mean √2 is approximated as
4/3? Would a better approximation be a good idea?
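For experimenting with this question, here is a minimal sketch of Borgefors' two-pass 3-4 chamfer distance transform (not an optimized implementation); it returns approximately three times the Euclidean distance to the nearest edge pixel.

# Sketch for Exercise 18.4: 3-4 chamfer distance transform.
import numpy as np

def chamfer_34(edges):
    BIG = 10**6
    d = np.where(edges, 0, BIG).astype(np.int64)
    h, w = d.shape
    # Forward pass (top-left to bottom-right).
    for i in range(h):
        for j in range(w):
            if i > 0:
                d[i, j] = min(d[i, j], d[i - 1, j] + 3)
                if j > 0: d[i, j] = min(d[i, j], d[i - 1, j - 1] + 4)
                if j < w - 1: d[i, j] = min(d[i, j], d[i - 1, j + 1] + 4)
            if j > 0:
                d[i, j] = min(d[i, j], d[i, j - 1] + 3)
    # Backward pass (bottom-right to top-left).
    for i in range(h - 1, -1, -1):
        for j in range(w - 1, -1, -1):
            if i < h - 1:
                d[i, j] = min(d[i, j], d[i + 1, j] + 3)
                if j > 0: d[i, j] = min(d[i, j], d[i + 1, j - 1] + 4)
                if j < w - 1: d[i, j] = min(d[i, j], d[i + 1, j + 1] + 4)
            if j < w - 1:
                d[i, j] = min(d[i, j], d[i, j + 1] + 3)
    return d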
18.5. One way to improve pose estimates is to take a verification score and then optimize
it as a function of pose. We said that this optimization could be hard particularly
if the test to tell whether a backprojected curve was close to an edge point was a
threshold on distance. Why would this lead to a hard optimization problem?
Solution Because the error would not be differentiable — as the backprojected
outline moved, some points would start or stop contributing.
18.6. We said that for an uncalibrated affine camera viewing a set of plane points, the
effect of the camera can be written as an unknown plane affine transformation.
Prove this. What if the camera is an uncalibrated perspective camera viewing a
set of plane points?
18.7. Prepare a summary of methods for registration in medical imaging other than the
geometric hashing idea we discussed. You should keep practical constraints in mind,
and you should indicate which methods you favor, and why.
18.8. Prepare a summary of nonmedical applications of registration and pose consistency.
Programming Assignments
18.9. Representing an object as a linear combination of models is often described as
abstraction because we can regard adjusting the coefficients as obtaining the same
view of different models. Furthermore, we could get a parametric family of models
by adding a basis element to the space. Explore these ideas by building a system for
matching rectangular buildings where the width, height, and depth of the building
are unknown parameters. You should extend the linear combinations idea to han-
dle orthographic cameras; this involves constraining the coefficients to represent
rotations.
C H A P T E R 19
Smooth Surfaces and Their Outlines
PROBLEMS
19.1. What is (in general) the shape of the silhouette of a sphere observed by a perspective
camera?
κ = |x′ × x″| / |x′|³,    (19.1)
where x0 and x00 denote, respectively, the first and second derivatives of x with
respect to the parameter t defining it.
Hint: Reparameterize x by its arc length and reflect the change of parameters in
the differentiation.
Solution Denoting by s the arc length of the curve and by t its unit tangent, we have
x′ = (d/dt) x = (ds/dt) (d/ds) x = (ds/dt) t,
and
x″ = (d/dt) x′ = (d²s/dt²) t + (ds/dt)² (d/ds) t = (d²s/dt²) t + κ (ds/dt)² n.
It follows that
x′ × x″ = κ (ds/dt)³ b,
and since t and b have unit norm, we have indeed
κ = |x′ × x″| / |x′|³.
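A quick numerical check of Eq. (19.1) on a made-up example, the ellipse x(t) = (2 cos t, sin t, 0), whose curvature at t = 0 is 2.

# Sketch: verify the curvature formula with central differences.
import numpy as np

def curvature(xp, xpp):
    return np.linalg.norm(np.cross(xp, xpp)) / np.linalg.norm(xp) ** 3

t, h = 0.0, 1e-5
x = lambda t: np.array([2 * np.cos(t), np.sin(t), 0.0])
xp = (x(t + h) - x(t - h)) / (2 * h)               # central-difference x'
xpp = (x(t + h) - 2 * x(t) + x(t - h)) / h ** 2    # central-difference x''
print(curvature(xp, xpp))                           # ~2.0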
19.5. Prove that, unless the normal curvature is constant over all possible directions, the
principal directions are orthogonal to each other.
Solution According to Ex. 19.6 below, the second fundamental form is symmet-
ric. It follows that the tangent plane admits an orthonormal basis formed by the
eigenvectors of the associated linear map dN , and that the corresponding eigen-
values are real (this is a general property of symmetric operators). Unless they
are equal, the orthonormal basis is essentially unique (except for swapping the two
eigenvectors or changing their orientation), and the two eigenvalues are the maxi-
mum and minimum values of the second fundamental form (this is another general
property of quadratic forms, see chapter 3 for a proof that the maximum value of
a quadratic form is the maximum eigenvalue of the corresponding linear map). It
follows that the principal curvatures are the two eigenvalues, and the principal di-
rections are the corresponding eigenvectors, that are uniquely defined (in the sense
used above) and orthogonal to each other unless the eigenvalues are equal, in which
case the normal curvature is constant.
19.6. Prove that the second fundamental form is bilinear and symmetric.
Solution The bilinearity of the second fundamental form follows immediately
from the fact that the differential of the Gauss map is linear. We remain quite
informal in our proof of its symmetry. Given two directions u and v in the tangent
plane of a surface S at some point P0 , we pick a parameterization P : U ×V ⊂ R2 →
W ⊂ S of S in some neighborhood W of P0 such that P (0, 0) = P0 , and the tangents
to the two surface curves α and β respectively defined by P (u, 0) for u ∈ I and
P (0, v) for v ∈ J are respectively u and v. We assume that this parameterization
is differentiable as many times as desired and abstain from justifying its existence.
We omit the parameters from now on and assume that all functions are evaluated
in (0, 0). We use subscripts to denote partial derivatives, e.g., Puv denotes the
second partial derivative of P with respect to u and v. The partial derivatives Pu
and Pv lie in the tangent plane at any point in W . Differentiating N · P_u = 0 with
respect to v yields
N v · Pu + N · Puv = 0.
Likewise, we have
N_u · P_v + N · P_{vu} = 0.
Since P_{uv} = P_{vu}, subtracting these two equations yields
N_u · P_v = N_v · P_u,
or equivalently
v · dN u = u · dN v,
which shows that the second fundamental form is indeed symmetric.
19.7. Let us denote by α the angle between the plane Π and the tangent to a curve Γ
and by β the angle between the normal to Π and the binormal to Γ, and by κ the
curvature at some point on Γ. Prove that if κa denotes the apparent curvature of
the image of Γ at the corresponding point, then
cos β
κa = κ .
cos3 α
y = (x · i)i + (x · j)j,
(
y 0 = (t · i)i + (t · j)j,
y 00 = κ[(n · i)i + (n · j)j].
|y 0 × y 00 |
κa =
|y 0 |3
| cos β|
κa = ,
| cos3 α|
but α can always be taken positive (just pick the appropriate orientation for i),
and β can also be taken positive by choosing the orientation of Γ appropriately.
The result follows.
19.8. Let κu and κv denote the normal curvatures in conjugated directions u and v at
a point P , and let K denote the Gaussian curvature; prove that
K sin2 θ = κu κv ,
Hint: Relate the expressions obtained for the second fundamental form in the
bases of the tangent plane respectively formed by the conjugated directions and
the principal directions.
Solution Let us assume that u and v are unit vectors and write them in the basis
of the tangent plane formed by the (unit) principal directions as u = u1 e1 + u2 e2
and v = v1 e1 + v2 e2 . We have
\[
\begin{aligned}
\mathrm{II}(u, u) &= \kappa_u = \kappa_1 u_1^2 + \kappa_2 u_2^2, \\
\mathrm{II}(v, v) &= \kappa_v = \kappa_1 v_1^2 + \kappa_2 v_2^2, \\
\mathrm{II}(u, v) &= 0 = \kappa_1 u_1 v_1 + \kappa_2 u_2 v_2.
\end{aligned}
\]
Using the third equation to eliminate κ_2 u_2 from the first one yields
\[ \kappa_u = \kappa_1 \frac{u_1}{v_2}\,(u_1 v_2 - u_2 v_1) = \kappa_1 \frac{u_1}{v_2}\,\sin\theta \]
since e_1 and e_2 form an orthonormal basis of the tangent plane and u and v have unit
norm. A similar line of reasoning shows that
\[ \kappa_v = \kappa_2 \frac{v_2}{u_1}\,(u_1 v_2 - u_2 v_1) = \kappa_2 \frac{v_2}{u_1}\,\sin\theta, \]
and multiplying these two expressions finally gives κ_u κ_v = κ_1 κ_2 sin²θ = K sin²θ.
\[ \kappa_a = \frac{\kappa_t}{\cos^2\alpha}, \]
Solution Let us denote by Γ the surface curve and by γ its projection. We assume
of course that the point P that we are considering lies on the occluding contour
(even though the curve under consideration may not be the occluding contour).
Since κt is a signed quantity, it will be necessary to give κa a meaningful sign to
establish the desired result. Let us first show that, whatever that meaning may be,
we have indeed
\[ |\kappa_a| = \frac{|\kappa_t|}{\cos^2\alpha}. \]
We follow the notation of Ex. 19.7 and use the same coordinate system. Since P
is on the occluding contour, the surface normal N in P is also the normal to γ,
and we must have N = ∓j. Let φ denote the angle between N and the principal
normal n to Γ. We must therefore have | cos φ| = cos β / cos α (since we have
chosen our coordinate system so cos α ≥ 0 and cos β ≥ 0), and it follows, according
to Meusnier’s theorem and Ex. 19.7, that
\[ |\kappa_a| = \kappa\,\frac{|\cos\beta|}{\cos^3\alpha} = \frac{\kappa\,|\cos\varphi|}{\cos^2\alpha} = \frac{|\kappa_t|}{\cos^2\alpha}. \]
Let us now turn to giving a meaningful sign to κa and determining this sign. By
convention, we take κa positive when the principal normal n0 to γ is equal to −N ,
and negative when n0 = N .
It is easy to show that with our choice of coordinate system, we always have n' = j:
Briefly, let us reparameterize y by its arc length s', noting that, because of the
foreshortening induced by the projection, ds' = ds cos α. Using a line of reasoning
similar to Ex. 19.7 but differentiating y with respect to s', it is easy to show that
the cross product of the tangent t' = i and (principal) normal n' to γ verifies
\[ t' \times (\kappa' n') = \kappa\,\frac{\cos\beta}{\cos^3\alpha}\,k, \]
\[ \kappa_a = \frac{\kappa_t}{\cos^2\alpha}. \]
Note that when Γ is the occluding contour, the convention we have chosen for
the sign of the apparent curvature yields the expected result: κa is positive when
the contour point is convex (i.e., its principal normal is (locally) inside the region
bounded by the image contour), and κa is negative when the point is concave.
C H A P T E R 20
Aspect Graphs
PROBLEMS
20.1. Draw the orthographic and spherical perspective aspect graphs of the transparent
Flatland object below along with the corresponding aspects.
Solution The visual events for the transparent object, along with the various
cells of the perspective and orthographic aspect graph, are shown below.
[Figure: visual event rays and the numbered cells of the perspective and orthographic aspect graphs for the transparent object.]
Note that cells of the perspective aspect graph created by the intersection of visual
event rays outside of the box are not shown. Three of the perspective aspects are
shown below. Note the change in the order of the contour points between aspects 1
and 13, and the addition of two contour points as one goes from aspect 1 to aspect
12.
20.2. Draw the orthographic and spherical perspective aspect graphs of the opaque object
along with the corresponding aspects.
Solution The visual events for the opaque object, along with the various cells of
the perspective and orthographic aspect graph, are shown below.
[Figure: visual event rays and the numbered cells of the perspective and orthographic aspect graphs for the opaque object.]
Note that cells of the perspective aspect graph created by the intersection of visual
event rays outside of the box are not shown. Three of the perspective aspects are
shown below. Note the change in the order of the contour points between aspects
1 and 13, and the addition of a single contour point as one goes from aspect 1 to
aspect 12.
20.3. Is it possible for an object with a single parabolic curve (such as a banana) to have
no cusp of Gauss at all? Why (or why not)?
[Figure: image layers k, k + 1, and k + 2 on either side of the fold curves.]
If there is no cusp, the fold is a smooth closed curve that forms (globally) the
boundary between layers k + 1 and k + 2 (the change in multiplicity cannot change
along the curve). But layer k + 1 must be connected to layer k by another fold
curve. Thus either the surface of a nonconvex compact solid admits cusps of Gauss,
or it has at least two distinct parabolic curves.
20.4. Use an equation-counting argument to justify the fact that contact of order six
or greater between lines and surfaces does not occur for generic surfaces. (Hint:
Count the parameters that define contact.)
Solution A line has contact of order n with a surface when all derivatives of
order less than or equal to n − 1 of the surface are zero in the direction of the
line. Ordinary tangents have order-two contact with the surface, and there is
a three-parameter family of those (all tangent lines in the tangent planes of all
surface points); asymptotic tangents have order-three contact and there is a two-
parameter family of those (the two asymptotic tangents at each saddle-shaped
point); order-four contact occurs for the asymptotic tangents along flecnodal and
parabolic curves; there are a finite number of order-five tangents at isolated points
of the surface (including gutterpoints and cusps of Gauss); and finally there is in
general no order-six tangent.
20.5. We saw that the asymptotic curve and its spherical image have perpendicular tan-
gents. Lines of curvature are the integral curves of the field of principal directions.
Show that these curves and their Gaussian image have parallel tangents.
Solution Lip and beak-to-beak events occur when the Gaussian image of the oc-
cluding contour becomes tangent to the fold associated with a parabolic point. Let
us assume that the fold is convex at this point (a similar reasoning applies when the
fold is concave, but the situation becomes more complicated at inflections). There
exists some neighborhood of the tangency point such that any great circle inter-
secting the fold in this neighborhood will intersect it exactly twice. As illustrated
by the diagram below, two of the asymptotic curve branches tangent to the fold at
the intersections admit a great circle bitangent to them.
[Figure: great circles intersecting the fold of the Gauss map near the tangency point.]
This great circle also intersects the fold exactly twice, and since it is tangent to the
asymptotic curves, it is orthogonal to the corresponding asymptotic direction. In
other words, the viewing direction is an asymptotic direction at the corresponding
points of the occluding contour, yielding two cusps of the image contour.
20.7. Lip and beak-to-beak events of implicit surfaces. It can be shown (Pae and Ponce,
2001) that the parabolic curves of a surface defined implicitly as the zero set of some
density function F (x, y, z) = 0 are characterized by this equation and P (x, y, z) = 0,
where P \overset{\mathrm{def}}{=} \nabla F^T A \nabla F, ∇F is the gradient of F , and A is the symmetric matrix
\[
A \overset{\mathrm{def}}{=}
\begin{pmatrix}
F_{yy}F_{zz} - F_{yz}^2 & F_{xz}F_{yz} - F_{zz}F_{xy} & F_{xy}F_{yz} - F_{yy}F_{xz} \\
F_{xz}F_{yz} - F_{zz}F_{xy} & F_{zz}F_{xx} - F_{xz}^2 & F_{xy}F_{xz} - F_{xx}F_{yz} \\
F_{xy}F_{yz} - F_{yy}F_{xz} & F_{xy}F_{xz} - F_{xx}F_{yz} & F_{xx}F_{yy} - F_{xy}^2
\end{pmatrix}.
\]
It can also be shown that the asymptotic direction at a parabolic point is A∇F .
(a) Show that AH = Det(H)Id, where H denotes the Hessian of F .
(b) Show that cusps of Gauss are parabolic points that satisfy the equation ∇P T A∇F =
0. Hint: Use the fact that the asymptotic direction at a cusp of Gauss is tan-
gent to the parabolic curve, and that the vector ∇F is normal to the tangent
plane of the surface defined by F = 0.
(c) Sketch an algorithm for tracing the lip and beak-to-beak events of an implicit
surface.
Solution
(a) Note that if h_1, h_2, and h_3 denote the columns of the Hessian, then A is the matrix whose rows are (h_2 × h_3)^T, (h_3 × h_1)^T, and (h_1 × h_2)^T, so
\[
AH = \begin{pmatrix} (h_2 \times h_3)^T \\ (h_3 \times h_1)^T \\ (h_1 \times h_2)^T \end{pmatrix}
\begin{pmatrix} h_1 & h_2 & h_3 \end{pmatrix} = \mathrm{Det}(H)\,\mathrm{Id}
\]
since the determinant of the matrix formed by three vectors is the dot product
of the first vector with the cross product of the other two vectors.
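Part (a) says that A is the adjugate of the Hessian. The short numpy sketch below (not part of the text) builds A from cross products of the columns of a random symmetric matrix H and checks AH = Det(H) Id numerically.

# Numerical check that the cross-product construction of A satisfies A H = det(H) I.
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
H = M + M.T                                   # a random symmetric "Hessian"
h1, h2, h3 = H[:, 0], H[:, 1], H[:, 2]
A = np.stack([np.cross(h2, h3), np.cross(h3, h1), np.cross(h1, h2)])  # rows of A
assert np.allclose(A @ H, np.linalg.det(H) * np.eye(3))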
(b) The parabolic curve can be thought of as the intersection of the two surfaces
defined by F (x, y, z) = 0 and P (x, y, z) = 0. Its tangent lies in the intersection
of the tangent planes of these two surfaces and is therefore orthogonal to
the normals ∇F and ∇P . For a point to be a cusp of Gauss, this tangent
must be along the asymptotic direction A∇F , and we must therefore have
∇F T A∇F = 0, which is automatically satisfied at a parabolic point, and
∇P T A∇F = 0, which is the desired condition.
(c) To trace the lip and beak-to-beak events, simply use Algorithm 20.2 to trace
the parabolic curve defined in R3 by the equations F (x, y, z) = 0 and P (x, y, z) =
0, computing for each point along this curve the vector A∇F as the corre-
sponding asymptotic direction. The cusps of Gauss can be found by adding
∇P T A∇F = 0 to these two equations and solving the corresponding system
of three polynomial equations in three unknowns using homotopy continua-
tion.
20.8. Swallowtail events of implicit surfaces. It can be shown that the asymptotic direc-
tions a at a hyperbolic point satisfy the two equations ∇F · a = 0 and aT Ha = 0,
where H denotes the Hessian of F . These two equations simply indicate that the
order of contact between a surface and its asymptotic tangents is at least equal
to three. Asymptotic tangents along flecnodal curves have order-four contact with
the surface, and this is characterized by a third equation, namely
\[ \begin{pmatrix} a^T H_x a \\ a^T H_y a \\ a^T H_z a \end{pmatrix} \cdot a = 0. \]
Solution Triple points are characterized by the following equations in the posi-
tions of the contact points xi = (xi , yi , zi )T (i = 1, 2, 3):
\[
\begin{cases}
F(x_i) = 0, & i = 1, 2, 3, \\
(x_1 - x_2) \cdot \nabla F(x_i) = 0, & i = 1, 2, 3, \\
(x_2 - x_1) \times (x_3 - x_1) = 0.
\end{cases}
\]
Note that the vector equation involving the cross product is equivalent to two
independent scalar equations, thus triple points correspond to curves defined in R9
by eight equations in nine unknowns.
Tangent crossings correspond to curves defined in R6 by the following five equations
in the positions of the contact points x1 and x2 :
\[
\begin{cases}
F(x_i) = 0, & i = 1, 2, \\
(x_1 - x_2) \cdot \nabla F(x_i) = 0, & i = 1, 2, \\
(\nabla F(x_1) \times \nabla F(x_2)) \cdot (x_2 - x_1) = 0,
\end{cases}
\]
where the last equation simply expresses the fact that the surface normals at x1 and x2 and
the viewing direction x2 − x1 are coplanar, so that the two image contour branches are tangent to each other.
Programming Assignments
20.10. Write a program to explore multilocal visual events: Consider two spheres with
different radii and assume orthographic projection. The program should allow you
to change viewpoint interactively as well as explore the tangent crossings associated
with the limiting bitangent developable.
20.11. Write a similar program to explore cusp points and their projections. You have
to trace a plane curve.
C H A P T E R 21
Range Data
PROBLEMS
21.1. Use Eq. (21.1) to show that a necessary and sufficient condition for the coordinate
curves of a parameterized surface to be lines of curvature is that f = F = 0.
Solution The principal directions satisfy the differential equation (21.1) repro-
duced here for completeness:
\[
0 = \begin{vmatrix} v'^2 & -u'v' & u'^2 \\ E & F & G \\ e & f & g \end{vmatrix}
= (Fg - fG)\,v'^2 - u'v'\,(Ge - gE) + u'^2\,(Ef - eF).
\]
The coordinate curves are lines of curvature when their tangents are along principal
directions. Equivalently, the solutions of Eq. (21.1) must be u0 = 0 and v 0 = 0,
which is in turn equivalent to F g − f G = Ef − eF = 0. Clearly, this condition
is satisfied when f = F = 0. Conversely, when F g − f G = Ef − eF = 0, either
f = F = 0, or we can write e = λE, f = λF , and g = λG for some scalar λ 6= 0.
In the latter case, the normal curvature in any direction t of the tangent plane is
given by
\[ \kappa_t = \frac{\mathrm{II}(t, t)}{\mathrm{I}(t, t)} = \frac{e\,u'^2 + 2f\,u'v' + g\,v'^2}{E\,u'^2 + 2F\,u'v' + G\,v'^2} = \lambda, \]
i.e., the normal curvature is independent of the direction in which it is measured
(we say that such a point is an umbilic). In this case, the principal directions
are of course ill defined. It follows that a necessary and sufficient condition for
the coordinate curves of a parameterized surface to be lines of curvature is that
f = F = 0.
21.2. Show that the lines of curvature of a surface of revolution are its meridians and
parallels.
Solution Let us consider a surface of revolution parameterized by x(θ, z) =
(r(z) cos θ, r(z) sin θ, z)T , where r(z0 ) denotes the radius of the circle formed by the
intersection of the surface with the plane z = z0 . We have xθ = (−r sin θ, r cos θ, 0)T
and x_z = (r' cos θ, r' sin θ, 1)^T , thus F = x_θ · x_z = 0. Now, normalizing the cross
product of x_θ and x_z shows that N = (1/√(1 + r'^2))(cos θ, sin θ, −r')^T . Finally, we
have x_{θz} = (−r' sin θ, r' cos θ, 0)^T , and it follows that f = −N · x_{θz} = 0. According
to the previous exercise, the lines of curvature of a surface of revolution are thus
its coordinate curves—that is, its meridians and its parallels.
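This computation is easy to reproduce symbolically. The sympy sketch below (an illustration, not from the text) verifies that F and f vanish identically for a surface of revolution with an arbitrary radius profile r(z).

# Symbolic check that F = f = 0 for a surface of revolution.
import sympy as sp

theta, z = sp.symbols('theta z', real=True)
r = sp.Function('r')(z)                         # arbitrary radius profile r(z)
x = sp.Matrix([r*sp.cos(theta), r*sp.sin(theta), z])
xt, xz = x.diff(theta), x.diff(z)
n = xt.cross(xz)
N = n / sp.sqrt(n.dot(n))                       # unit normal
F = sp.simplify(xt.dot(xz))                     # first fundamental form coefficient F
f = sp.simplify(-N.dot(x.diff(theta, z)))       # second fundamental form coefficient f
assert F == 0 and f == 0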
21.3. Step model: Compute zσ (x) = Gσ ∗ z(x), where z(x) is given by Eq. (21.2). Show
that z''_σ is given by Eq. (21.3). Conclude that κ''_σ/κ'_σ = −2δ/h at the point xσ
where z''_σ and κσ vanish.
Solution Recall that the step model is defined by
\[ z = \begin{cases} k_1 x + c & \text{when } x < 0, \\ k_2 x + c + h & \text{when } x > 0. \end{cases} \]
and
\[ z''_\sigma = \frac{1}{\sigma\sqrt{2\pi}}\Bigl(\delta - \frac{hx}{\sigma^2}\Bigr)\exp\Bigl(-\frac{x^2}{2\sigma^2}\Bigr). \]
The latter is indeed Eq. (21.3). Now, we have
Since zσ00 (xσ ) = 0, the second term in the expression above vanishes in xσ . Now, the
derivatives of the numerator and denominator of this term obviously also vanish
in xσ since all the terms making them up contain zσ00 as a factor. Likewise, the
derivative of the denominator of the first term vanishes in xσ, and it follows that
κ''_σ/κ'_σ = z''''_σ/z'''_σ at this point. Now, we can write z''_σ = a exp(−x²/2σ²), with
a = (δ − xh/σ²)/(σ√2π). Therefore,
\[ z'''_\sigma = \Bigl(a' - \frac{ax}{\sigma^2}\Bigr)\exp\Bigl(-\frac{x^2}{2\sigma^2}\Bigr) \]
and
\[ z''''_\sigma = \Bigl(a'' - \frac{a}{\sigma^2}\Bigl(1 - \frac{x^2}{\sigma^2}\Bigr) - \frac{2a'x}{\sigma^2}\Bigr)\exp\Bigl(-\frac{x^2}{2\sigma^2}\Bigr). \]
Now, a'' is identically zero, and a is zero in xσ. It follows that κ''_σ/κ'_σ = −2xσ/σ² = −2δ/h at this point.
21.4. Roof model: Show that κσ is given by Eq. (21.4).
Solution Plugging the value h = 0 in the expressions for zσ0 and zσ00 derived in
the previous exercise shows immediately that
\[
\kappa_\sigma = \frac{\dfrac{\delta}{\sigma\sqrt{2\pi}}\,\exp(-x^2/2\sigma^2)}
{\Bigl[1 + \Bigl(k + \dfrac{\delta}{\sigma\sqrt{2\pi}}\int_0^x \exp(-t^2/2\sigma^2)\,dt\Bigr)^2\Bigr]^{3/2}},
\]
and using the change of variable u = t/σ in the integral finally shows that
\[
\kappa_\sigma = \frac{\dfrac{\delta}{\sigma\sqrt{2\pi}}\,\exp\Bigl(-\dfrac{x^2}{2\sigma^2}\Bigr)}
{\Bigl[1 + \Bigl(k + \dfrac{\delta}{\sqrt{2\pi}}\int_0^{x/\sigma} \exp\Bigl(-\dfrac{u^2}{2}\Bigr)\,du\Bigr)^2\Bigr]^{3/2}}.
\]
21.5. Show that the quaternion q = cos(θ/2) + sin(θ/2) u represents the rotation R of angle θ
about the unit vector u in the sense of Eq. (21.5).
Hint: Use the Rodrigues formula derived in the exercises of chapter 3.
Solution Let us consider some vector α in R 3 , define β = qαq̄, and show that
β is the vector of R3 obtained by rotating α about the vector u by an angle θ.
Recall that the quaternion product is defined by
\[ (a + \alpha)(b + \beta) \overset{\mathrm{def}}{=} (ab - \alpha \cdot \beta) + (a\beta + b\alpha + \alpha \times \beta). \]
Thus,
which is indeed the Rodrigues formula for a rotation of angle θ about the unit
vector u.
21.6. Show that the rotation matrix R associated with a given unit quaternion q = a + α
with α = (b, c, d)T is given by Eq. (21.6).
Solution First, note that any unit quaternion can be written as q = a + α where
a = cos(θ/2) and α = sin(θ/2) u for some angle θ and unit vector u (this is because
a² + |α|² = 1). Now, to derive Eq. (21.6), i.e.,
\[
R = \begin{pmatrix}
a^2 + b^2 - c^2 - d^2 & 2(bc - ad) & 2(bd + ac) \\
2(bc + ad) & a^2 - b^2 + c^2 - d^2 & 2(cd - ab) \\
2(bd - ac) & 2(cd + ab) & a^2 - b^2 - c^2 + d^2
\end{pmatrix},
\]
all we need to do is combine the result established in the previous exercise with
that obtained in Ex. 3.7, that states that if u = (u, v, w)T , then
\[
R = \begin{pmatrix}
u^2(1-\cos\theta) + \cos\theta & uv(1-\cos\theta) - w\sin\theta & uw(1-\cos\theta) + v\sin\theta \\
uv(1-\cos\theta) + w\sin\theta & v^2(1-\cos\theta) + \cos\theta & vw(1-\cos\theta) - u\sin\theta \\
uw(1-\cos\theta) - v\sin\theta & vw(1-\cos\theta) + u\sin\theta & w^2(1-\cos\theta) + \cos\theta
\end{pmatrix}.
\]
Showing that the entries of both matrices are the same is a (slightly tedious) exercise
in algebra and trigonometry.
Let us first show that the two top left entries are the same. Note that |α|² =
b² + c² + d² = sin²(θ/2)|u|² = sin²(θ/2); it follows that
\[ a^2 + b^2 - c^2 - d^2 = a^2 + 2b^2 - \sin^2\tfrac{\theta}{2}
= \cos^2\tfrac{\theta}{2} - \sin^2\tfrac{\theta}{2} + 2\sin^2\tfrac{\theta}{2}\,u^2
= \cos\theta + u^2(1 - \cos\theta). \]
The exact same type of reasoning applies to the other diagonal entries.
Let us now consider the entries corresponding to the first row and the second column
of the two matrices. We have
\[ 2(bc - ad) = 2uv\sin^2\tfrac{\theta}{2} - 2\cos\tfrac{\theta}{2}\sin\tfrac{\theta}{2}\,w = uv(1 - \cos\theta) - w\sin\theta, \]
so the two entries do indeed coincide. The exact same type of reasoning applies to
the other non-diagonal entries.
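The equality of the two matrices can also be confirmed numerically. The sketch below (not part of the text) builds R from the quaternion entries via Eq. (21.6) and compares it with the Rodrigues form I + sin θ K + (1 − cos θ)K², for an arbitrary angle and axis.

# Compare the quaternion rotation matrix with the Rodrigues formula.
import numpy as np

theta = 0.9
u = np.array([1.0, 2.0, 2.0]) / 3.0                      # unit axis
a, (b, c, d) = np.cos(theta/2), np.sin(theta/2) * u      # unit quaternion components

R_quat = np.array([
    [a*a + b*b - c*c - d*d, 2*(b*c - a*d),         2*(b*d + a*c)],
    [2*(b*c + a*d),         a*a - b*b + c*c - d*d, 2*(c*d - a*b)],
    [2*(b*d - a*c),         2*(c*d + a*b),         a*a - b*b - c*c + d*d]])

K = np.array([[0, -u[2], u[1]], [u[2], 0, -u[0]], [-u[1], u[0], 0]])  # cross-product matrix
R_rodrigues = np.eye(3) + np.sin(theta)*K + (1 - np.cos(theta))*(K @ K)
assert np.allclose(R_quat, R_rodrigues)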
21.7. Show that the matrix Ai constructed in Section 21.3.2 is equal to
\[ A_i = \begin{pmatrix} 0 & (y_i - y'_i)^T \\ y'_i - y_i & [\,y_i + y'_i\,]_\times \end{pmatrix}. \]
Solution We have
\[
\begin{aligned}
y'_i q - q y_i &= (-y'_i \cdot \alpha + a y'_i + y'_i \times \alpha) - (-y_i \cdot \alpha + a y_i + \alpha \times y_i) \\
&= (y_i - y'_i) \cdot \alpha + a\,(y'_i - y_i) + (y_i + y'_i) \times \alpha \\
&= \begin{pmatrix} 0 & (y_i - y'_i)^T \\ y'_i - y_i & [\,y_i + y'_i\,]_\times \end{pmatrix}
\begin{pmatrix} a \\ \alpha \end{pmatrix} = A_i q,
\end{aligned}
\]
where q has been identified with the 4-vector whose first coordinate is a and the
remaining coordinates are those of α. In particular, we have
\[ E = \sum_{i=1}^n |y'_i q - q y_i|^2 = \sum_{i=1}^n |A_i q|^2 = q^T B q, \quad\text{where}\quad B = \sum_{i=1}^n A_i^T A_i. \]
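In practice the rotation is then recovered as the unit eigenvector of B associated with its smallest eigenvalue, which minimizes q^T B q over unit quaternions. The sketch below (a minimal illustration assuming known, centered correspondences, not the book's code) builds B and extracts that eigenvector.

# Quaternion-based rotation estimation from matched point sets.
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

def rotation_quaternion(Y, Yp):
    """Y, Yp: (n, 3) arrays of matched (centered) points; returns q = (a, b, c, d)."""
    B = np.zeros((4, 4))
    for y, yp in zip(Y, Yp):
        A = np.zeros((4, 4))
        A[0, 1:] = y - yp                # top row: (y_i - y'_i)^T
        A[1:, 0] = yp - y                # left column: y'_i - y_i
        A[1:, 1:] = skew(y + yp)         # [y_i + y'_i]_x
        B += A.T @ A
    w, V = np.linalg.eigh(B)             # symmetric eigendecomposition, ascending eigenvalues
    return V[:, 0]                       # eigenvector of the smallest eigenvalue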
21.8. As mentioned earlier, the ICP method can be extended to various types of geometric
models. We consider here the case of polyhedral models and piecewise parametric
patches.
(a) Sketch a method for computing the point Q in a polygon that is closest to
some point P .
(b) Sketch a method for computing the point Q in the parametric patch x : I ×J →
R3 that is closest to some point P . Hint: Use Newton iterations.
Solution
(a) Let A denote the polygon. Construct the orthogonal projection Q of P into
the plane that contains A. Test whether Q lies inside A. When A is convex,
this can be done by testing whether Q is on the “right” side of all the edges
of A. When A is not convex, one can accumulate the angles between Q and
the successive vertices of A. The point will be inside the polygon when these
angles add to 2π, on the boundary when they add to π, and outside when
they add to 0. Both methods take linear time. If Q is inside A, it is the closest
point to P . If it is outside, the closest point must either lie in the interior
of one of the edges (this is checked by projecting P onto the line supporting
each edge) or be one of the vertices, and it can be found in linear time as well.
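A rough sketch of this closest-point query for a convex polygon is given below (an illustration, not the book's implementation); the function name and the assumption that the vertices are listed in a consistent order around the polygon are mine.

# Closest point on a convex, planar polygon to a 3-D query point.
import numpy as np

def closest_point_on_convex_polygon(P, verts):
    """verts: (k, 3) array of coplanar vertices in order; P: (3,) query point."""
    n = np.cross(verts[1] - verts[0], verts[2] - verts[0])
    n = n / np.linalg.norm(n)                       # unit normal of the supporting plane
    Q = P - np.dot(P - verts[0], n) * n             # orthogonal projection onto the plane
    inside, best, best_d = True, Q, np.inf
    for a, b in zip(verts, np.roll(verts, -1, axis=0)):
        edge = b - a
        if np.dot(np.cross(edge, Q - a), n) < 0:    # Q on the "wrong" side of this edge?
            inside = False
        t = np.clip(np.dot(P - a, edge) / np.dot(edge, edge), 0.0, 1.0)
        cand = a + t * edge                         # closest point on this edge segment
        d = np.linalg.norm(P - cand)
        if d < best_d:
            best, best_d = cand, d
    return Q if inside else best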
(b) Just as in the polygonal case, the shortest distance is reached either in the
interior of the patch or on its boundary. We only detail the case where
the closest point lies inside the patch, since the case where it lies in the
interior of a boundary curve is similar, and the case where it is a vertex is
straightforward. As suggested, starting from some point x(u, v) inside the
patch, we can use Newton iterations to find the closest point, which is the
orthogonal projection of P onto the patch. Thus we seek a zero of the (vector)
function f (u, v) = N (u, v) × (x(u, v) − P ), where P denotes the coordinate
vector of the point P . The Jacobian J of f is easily computed as a function
and q = (a_{200}, a_{110}, a_{020}, a_{011}, a_{002}, a_{101}, a_{100}, a_{010}, a_{001}, a_{000})^T.
This is a homogeneous linear least-squares problem that can be solved using the
eigenvalue/eigenvector methods described in chapter 3.
21.10. Show that a surface triangle maps onto a patch with hyperbolic edges in α, β
space.
Solution Consider a triangle edge with extremities Q1 and Q2 . Any point along
this edge can be written as Q = (1 − t)Q1 + tQ2 . If the corresponding spin
coordinates are α and β, we have
\[ \beta = \overrightarrow{PQ} \cdot n = [(1 - t)\,\overrightarrow{PQ_1} + t\,\overrightarrow{PQ_2}] \cdot n = (1 - t)\,\beta_1 + t\,\beta_2 \]
and
\[ \alpha = |\overrightarrow{PQ} \times n| = |(1 - t)\,a_1 + t\,a_2|, \]
where a_1 = \overrightarrow{PQ_1} \times n and a_2 = \overrightarrow{PQ_2} \times n. Computing t as a function of β and
substituting in this expression for α yields
β2 − β β − β1
α=| a1 + a2 |.
β2 − β1 β2 − β1
\[ (\beta_2 - \beta_1)^2\,\alpha^2 - |a_1 - a_2|^2\,\beta^2 + \lambda\beta + \mu = 0, \]
(b) Why are we ignoring the boundary of D (which is the same as the boundary
of S − D) in computing the total risk?
Solution
(a) Straightforward.
(b) Boundary has measure zero.
22.2. In Section 22.2, we said that if each class-conditional density had the same covari-
ance, the classifier of Algorithm 22.2 boiled down to comparing two expressions
that are linear in x.
(a) Show that this is true.
(b) Show that if there are only two classes, we need only test the sign of a linear
expression in x.
22.3. In Section 22.3.1, we set up a feature u, where the value of u on the ith data point
is given by ui = v · (xi − µ). Show that u has zero mean.
Solution The mean of ui is the mean of v · (xi − µ), which is the mean of v · xi
minus v · µ; but the mean of v · xi is v dotted with the mean of the xi, which is µ,
so the two terms cancel and u has zero mean.
22.4. In Section 22.3.1, we set up a series of features u, where the value of u on the
ith data point is given by ui = v · (xi − µ). We then said that the v would be
eigenvectors of Σ, the covariance matrix of the data items. Show that the different
features are independent using the fact that the eigenvectors of a symmetric matrix
are orthogonal.
22.5. In Section 22.2.1, we said that the ROC was invariant to choice of prior. Prove
this.
Programming Assignments
22.6. Build a program that marks likely skin pixels on an image; you should compare at
least two different kinds of classifier for this purpose. It is worth doing this carefully
because many people have found skin filters useful.
22.7. Build one of the many face finders described in the text.
C H A P T E R 23
C H A P T E R 24
Geometric Templates from Spatial Relations
We now write that the gradient is an eigenvector of the Hessian, or if “×” denotes
the operator associating with two vectors in R2 the determinant of their coordi-
nates,
\[
0 = (H\nabla h) \times \nabla h
= l^2 \begin{pmatrix} (1 - l\kappa_1\sin\theta_1)\cos\theta_1 + \cos\theta_2\cos(\theta_1 - \theta_2) \\ -\cos\theta_1\cos(\theta_1 - \theta_2) - (1 + l\kappa_2\sin\theta_2)\cos\theta_2 \end{pmatrix}
\times \begin{pmatrix} \cos\theta_1 \\ -\cos\theta_2 \end{pmatrix},
\]
24.2. Generalized cylinders: The definition of a valley given in the previous exercise
is valid for height surfaces defined over n-dimensional domains and valleys form
curves in any dimension. Briefly explain how to extend the definition of ribbons
given in that exercise to a new definition for generalized cylinders. Are difficulties
not encountered in the two-dimensional case to be expected?
Solution Following the ideas from the previous exercise, it is possible to define
the generalized cylinder associated with a volume V by the valleys of a height
function defined over a three-dimensional domain: for example, we can pick some
parameterization Π of the three-dimensional set of all planes by three parameters
(s1 , s2 , s3 ), and define h(s1 , s2 , s3 ) as the area of the region where V and the plane
Π(s1 , s2 , s3 ) intersect. The valleys (and ridges) of this height function are charac-
terized as before by (H∇h) × ∇h = 0, where “×” denotes this time the operator
associating with two vectors their cross product. They form a one-dimensional set
of cross-sections of V that can be taken as the generalized cylinder description of
this volume.
There are some difficulties with this definition that are not encountered in the two-
dimensional case: In particular, there is no natural parameterization of the cross-
sections of a volume by the points on its boundary, and the valleys found using
a plane parameterization depend on the choice of this parameterization. More-
over, the cross-section of a volume by a plane may consist of several connected
components. See Ponce et al. (1999) for a discussion.
24.3. Skewed symmetries: A skewed symmetry is a Brooks ribbon with a straight ax-
is and generators at a fixed angle θ from the axis. Skewed symmetries play an
important role in line-drawing analysis because it can be shown that a bilaterally
symmetric planar figure projects onto a skewed symmetry under orthographic pro-
jection (Kanade, 1981). Show that two contour points P1 and P2 forming a skewed
symmetry verify the equation
\[ \frac{\kappa_2}{\kappa_1} = -\Bigl[\frac{\sin\alpha_2}{\sin\alpha_1}\Bigr]^3, \]
where u is the generator direction, v is the skew axis direction, and x1 and x2
denote the two endpoints of the ribbon generators. Differentiating x1 and x2 with
respect to s yields
\[ \begin{cases} x'_1 = v - r'u, \\ x'_2 = v + r'u, \end{cases} \]
and
\[ \begin{cases} x''_1 = -r''u, \\ x''_2 = r''u. \end{cases} \]
Let us define αi as the (unsigned) angle between the normal in xi (i = 1, 2) and
the line joining the two points x1 and x2 . We have
\[ \sin\alpha_1 = \frac{1}{|x'_1|}\,|u \times x'_1| = \frac{\sin\theta}{\sqrt{1 - 2r'\cos\theta + r'^2}}, \qquad
\sin\alpha_2 = \frac{1}{|x'_2|}\,|u \times x'_2| = \frac{\sin\theta}{\sqrt{1 + 2r'\cos\theta + r'^2}}, \]
where “×” denotes the operator associating with two vectors in R2 the determinant
of their coordinates.
Now remember from Ex. 19.4 that the curvature of a parametric curve is κ =
|x0 × x00 |/|x0 |3 . Using the convention that the curvature is positive when the
ribbon boundary is convex, we obtain
\[ \kappa_1 = \frac{-r''\sin\theta}{(1 - 2r'\cos\theta + r'^2)^{3/2}}, \qquad
\kappa_2 = \frac{r''\sin\theta}{(1 + 2r'\cos\theta + r'^2)^{3/2}}, \]
\[
\begin{array}{ccc}
0 & 0 & 0 \\
* & 1 & * \\
1 & 1 & 1
\end{array}
\qquad\qquad
\begin{array}{ccc}
0 & 0 & * \\
0 & 1 & 1 \\
* & 1 & 1
\end{array}
\]
The auxiliary picture is then copied into the input image, and the process is re-
peated with the right pattern. The remaining steps of each iteration are similar
and use the six patterns obtained by consecutive 90-degree rotations of the original
ones. The output of the program is the 4-connected skeleton of the original region
(Serra, 1982).
24.5. Implement the FORMS approach to skeleton detection.
24.6. Implement the Brooks transform.
24.7. Write a program for finding skewed symmetries. You can implement either (a)
a naive O(n2 ) algorithm comparing all pairs of contour points, or (b) the O(kn)
projection algorithm proposed by Nevatia and Binford (1977). The latter method
can be summarized as follows: Discretize the possible orientations of local ribbon
axes; for each of these k directions, project all contour points into buckets and
verify the local skewed symmetry condition for points within the same bucket only;
finally, group the resulting ribbon pairs into ribbons.
C H A P T E R 25
C H A P T E R 26
Application: Image-Based Rendering
We show in this exercise that P_0^n(t) is the Bézier curve of degree n associated with
the n + 1 points P_0, . . . , P_n. This construction of a Bézier curve is called the de
Casteljau algorithm.
(a) Show that Bernstein polynomials satisfy the recursion
\[ b_i^{(n)}(t) = (1 - t)\,b_i^{(n-1)}(t) + t\,b_{i-1}^{(n-1)}(t) \]
with b_0^{(0)}(t) = 1 and, by convention, b_j^{(n)}(t) = 0 when j < 0 or j > n.
(b) Use induction to show that
\[ P_i^k(t) = \sum_{j=0}^k b_j^{(k)}(t)\,P_{i+j} \quad\text{for } k = 0, \dots, n \text{ and } i = 0, \dots, n - k. \]
Solution Let us recall that the Bernstein polynomials of degree n are defined by
\[ b_i^{(n)}(t) \overset{\mathrm{def}}{=} \binom{n}{i}\,t^i (1 - t)^{n-i} \quad (i = 0, \dots, n). \]
(a) Writing
\[
(1 - t)\,b_i^{(n-1)}(t) + t\,b_{i-1}^{(n-1)}(t)
= \binom{n-1}{i} t^i (1 - t)^{n-i} + \binom{n-1}{i-1} t^i (1 - t)^{n-i}
= \frac{(n-1)!}{(n-i)!\,i!}\,[(n - i) + i]\,t^i (1 - t)^{n-i} = b_i^{(n)}(t)
\]
shows that the recursion is satisfied when i > 0 and i < n. It also holds
when i = 0 since, by definition, b_0^{(n)}(t) = (1 − t)^n = (1 − t) b_0^{(n−1)}(t) and,
by convention, b_{−1}^{(n−1)}(t) = 0. Likewise, the recursion is satisfied when i = n
since, by definition, b_n^{(n)}(t) = t^n = t b_{n−1}^{(n−1)}(t) and, by convention, b_n^{(n−1)}(t) =
0.
(b) The induction hypothesis is obviously true for k = 0 since, by definition,
P_i^0(t) = P_i for i = 0, . . . , n. Suppose it is true for k = l − 1. We have, by
definition,
\[ P_i^l(t) = (1 - t)\,P_i^{l-1}(t) + t\,P_{i+1}^{l-1}(t). \]
Thus, according to the induction hypothesis,
\[
P_i^l(t) = (1 - t)\sum_{j=0}^{l-1} b_j^{(l-1)}(t)\,P_{i+j} + t\sum_{j=0}^{l-1} b_j^{(l-1)}(t)\,P_{i+1+j}
= (1 - t)\sum_{j=0}^{l-1} b_j^{(l-1)}(t)\,P_{i+j} + t\sum_{m=1}^{l} b_{m-1}^{(l-1)}(t)\,P_{i+m},
\]
To equate the polynomial coefficients of both expressions for P (t), we multiply each
Bernstein polynomial in the first expression by t + 1 − t = 1. With the usual change
of variables in the second line below, this yields
\[
\begin{aligned}
P(t) &= \sum_{j=0}^n \binom{n}{j} t^{j+1}(1 - t)^{n-j} P_j + \sum_{j=0}^n \binom{n}{j} t^j (1 - t)^{n+1-j} P_j \\
&= \sum_{k=1}^{n+1} \binom{n}{k-1} t^k (1 - t)^{n+1-k} P_{k-1} + \sum_{j=0}^n \binom{n}{j} t^j (1 - t)^{n+1-j} P_j \\
&= (1 - t)^{n+1} P_0 + t^{n+1} P_n + \sum_{j=1}^n \bigl[t^j (1 - t)^{n+1-j}\bigr]\Bigl[\binom{n}{j-1} P_{j-1} + \binom{n}{j} P_j\Bigr].
\end{aligned}
\]
Note that P (0) is the first control point of both arcs, so P0 = Q0 . Likewise,
P (1) is the last control point, so Pn = Qn+1 . This is confirmed by examining the
polynomial coefficients corresponding to j = 0 and j = n + 1 in the two expressions
of P (t). Equating the remaining coefficients yields
\[ \binom{n+1}{j}\,Q_j = \binom{n}{j-1}\,P_{j-1} + \binom{n}{j}\,P_j \quad\text{for } j = 1, \dots, n, \]
or
\[ \frac{(n+1)!}{(n+1-j)!\,j!}\,Q_j = \frac{n!}{(n+1-j)!\,(j-1)!}\,P_{j-1} + \frac{n!}{(n-j)!\,j!}\,P_j. \]
This can finally be rewritten as
\[ Q_j = \frac{j}{n+1}\,P_{j-1} + \frac{n+1-j}{n+1}\,P_j = \frac{j}{n+1}\,P_{j-1} + \Bigl(1 - \frac{j}{n+1}\Bigr)P_j. \]
Note that this is indeed a barycentric combination, which justifies our calculations,
and that this expression is valid for j > 0 and j < n + 1. It is in fact also valid for
j = n + 1 since, as noted before, we have Pn = Qn+1 .
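The degree-elevation formula just derived is easy to exercise numerically. The sketch below (not from the text) elevates an arbitrary cubic arc and checks that the original and elevated control polygons describe the same curve at sampled parameter values.

# Degree elevation of a Bezier arc: Q_0 = P_0, Q_{n+1} = P_n,
# and Q_j = j/(n+1) P_{j-1} + (1 - j/(n+1)) P_j for j = 1, ..., n.
import numpy as np
from math import comb

def bezier(control, t):
    n = len(control) - 1
    return sum(comb(n, i) * t**i * (1 - t)**(n - i) * control[i] for i in range(n + 1))

def elevate(P):
    n = len(P) - 1
    Q = [P[0]]
    Q += [(j / (n + 1)) * P[j - 1] + (1 - j / (n + 1)) * P[j] for j in range(1, n + 1)]
    return np.array(Q + [P[n]])

P = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 3.0], [4.0, 0.0]])   # a cubic arc
Q = elevate(P)                                                    # its quartic control polygon
for t in np.linspace(0.0, 1.0, 11):
    assert np.allclose(bezier(P, t), bezier(Q, t))                # same curve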
26.3. Show that the tangent to the Bézier curve P (t) defined by the n + 1 control points
P0 , . . . , Pn is
\[ P'(t) = n \sum_{j=0}^{n-1} b_j^{(n-1)}(t)\,(P_{j+1} - P_j). \]
Conclude that the tangents at the endpoints of a Bézier arc are along the first and
last line segments of its control polygon.
Solution Writing
\[
\begin{aligned}
P'(t) &= \sum_{j=0}^n b_j^{(n)\prime}(t)\,P_j \\
&= \sum_{j=1}^n \binom{n}{j} j\,t^{j-1}(1 - t)^{n-j} P_j - \sum_{j=0}^{n-1} \binom{n}{j}(n - j)\,t^j(1 - t)^{n-j-1} P_j \\
&= n\sum_{j=1}^n \binom{n-1}{j-1} t^{j-1}(1 - t)^{n-j} P_j - n\sum_{j=0}^{n-1} \binom{n-1}{j} t^j(1 - t)^{n-j-1} P_j \\
&= n\sum_{k=0}^{n-1} \binom{n-1}{k} t^k(1 - t)^{n-k-1} P_{k+1} - n\sum_{j=0}^{n-1} \binom{n-1}{j} t^j(1 - t)^{n-j-1} P_j \\
&= n\sum_{j=0}^{n-1} \binom{n-1}{j} t^j(1 - t)^{n-1-j}\,(P_{j+1} - P_j) \\
&= n\sum_{j=0}^{n-1} b_j^{(n-1)}(t)\,(P_{j+1} - P_j)
\end{aligned}
\]
proves the result (note the change of variable k = j − 1 in the fourth line of the
equation).
The tangents at the endpoints of the arc correspond to t = 0 and t = 1. Since
b_j^{(n−1)}(0) = 0 for j > 0, b_j^{(n−1)}(1) = 0 for j < n − 1, b_0^{(n−1)}(0) = 1, and b_{n−1}^{(n−1)}(1) =
1, we conclude that
\[ P'(0) = n\,(P_1 - P_0) \quad\text{and}\quad P'(1) = n\,(P_n - P_{n-1}), \]
which shows that the tangents at the endpoints of the Bézier arc are along the first
and last line segments of its control polygon.
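A small numerical confirmation (not part of the text) of this derivative formula and of the endpoint tangents is sketched below; the control points are an arbitrary choice.

# The derivative of a degree-n Bezier arc is the degree-(n-1) arc with control
# points n (P_{j+1} - P_j); check against a central finite difference.
import numpy as np
from math import comb

def bezier(control, t):
    n = len(control) - 1
    return sum(comb(n, j) * t**j * (1 - t)**(n - j) * control[j] for j in range(n + 1))

P = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 3.0], [4.0, 0.0]])   # cubic (n = 3)
n, t, h = len(P) - 1, 0.3, 1e-6
deriv = n * bezier(np.diff(P, axis=0), t)                        # formula above
finite = (bezier(P, t + h) - bezier(P, t - h)) / (2 * h)         # finite difference
assert np.allclose(deriv, finite, atol=1e-4)
assert np.allclose(n * bezier(np.diff(P, axis=0), 0.0), n * (P[1] - P[0]))  # P'(0)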
26.4. Show that the construction of the points Qi in Section 26.1.1 places these points
in a plane that passes through the centroid O of the points Ci
Solution First it is easy to show that the points Qi are indeed barycentric com-
binations of the points Cj : This follows immediately from the fact that, due to the
regular and symmetric sampling of angles in the linear combination defining Qi ,
the sum of the cosine terms is zero. Now let us shows that the points Qi are copla-
nar, and more precisely, that any of these points can be written as a barycentric
combination of the points
\[
\begin{aligned}
Q_1 &= \sum_{j=1}^p \frac{1}{p}\Bigl\{1 + \cos\frac{\pi}{p}\,\cos\Bigl([2(j-1) - 1]\frac{\pi}{p}\Bigr)\Bigr\}\,C_j, \\
Q_{p-1} &= \sum_{j=1}^p \frac{1}{p}\Bigl\{1 + \cos\frac{\pi}{p}\,\cos\Bigl([2(j+1) - 1]\frac{\pi}{p}\Bigr)\Bigr\}\,C_j, \\
Q_p &= \sum_{j=1}^p \frac{1}{p}\Bigl\{1 + \cos\frac{\pi}{p}\,\cos\Bigl([2j - 1]\frac{\pi}{p}\Bigr)\Bigr\}\,C_j.
\end{aligned}
\]
We write
\[ Q_i = \sum_{j=1}^p \frac{1}{p}\Bigl\{1 + \cos\frac{\pi}{p}\,\cos\Bigl([2(j-i) - 1]\frac{\pi}{p}\Bigr)\Bigr\}\,C_j = a\,Q_1 + b\,Q_{p-1} + c\,Q_p, \]
where, setting θ_{ij} = [2(j − i) − 1]π/p, the coefficients a, b, and c must satisfy, for all j,
\[
\begin{aligned}
\cos\theta_{ij} &= a\cos\Bigl([2(j-1) - 1]\frac{\pi}{p}\Bigr) + b\cos\Bigl([2(j+1) - 1]\frac{\pi}{p}\Bigr) + c\cos\Bigl([2j - 1]\frac{\pi}{p}\Bigr) \\
&= a\cos\Bigl(\theta_{ij} + 2(i-1)\frac{\pi}{p}\Bigr) + b\cos\Bigl(\theta_{ij} + 2(i+1)\frac{\pi}{p}\Bigr) + c\cos\Bigl(\theta_{ij} + 2i\frac{\pi}{p}\Bigr) \\
&= \cos\theta_{ij}\Bigl\{a\cos\Bigl(2(i-1)\frac{\pi}{p}\Bigr) + b\cos\Bigl(2(i+1)\frac{\pi}{p}\Bigr) + c\cos\Bigl(2i\frac{\pi}{p}\Bigr)\Bigr\} \\
&\quad- \sin\theta_{ij}\Bigl\{a\sin\Bigl(2(i-1)\frac{\pi}{p}\Bigr) + b\sin\Bigl(2(i+1)\frac{\pi}{p}\Bigr) + c\sin\Bigl(2i\frac{\pi}{p}\Bigr)\Bigr\}.
\end{aligned}
\]
Since this must hold for all values of θ_{ij}, and since the combination must be barycentric (a + b + c = 1), the coefficients satisfy
\[
\begin{pmatrix}
\cos\bigl(2(i-1)\frac{\pi}{p}\bigr) & \cos\bigl(2(i+1)\frac{\pi}{p}\bigr) & \cos\bigl(2i\frac{\pi}{p}\bigr) \\
\sin\bigl(2(i-1)\frac{\pi}{p}\bigr) & \sin\bigl(2(i+1)\frac{\pi}{p}\bigr) & \sin\bigl(2i\frac{\pi}{p}\bigr) \\
1 & 1 & 1
\end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
=
\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.
\]
1 1 1
This system of three equations in three unknowns admits (in general) a unique
solution, which shows that any Qi can be written as a barycentric combination
of the points Q1 , Qp−1 , and Qp , and that the points Q1 , Q2 , . . ., Qp are indeed
coplanar.
Now, it is easy to see that the points Qi can be written as
\[
\begin{aligned}
Q_1 &= \lambda_1 C_1 + \lambda_2 C_2 + \dots + \lambda_p C_p, \\
Q_2 &= \lambda_p C_1 + \lambda_1 C_2 + \dots + \lambda_{p-1} C_p, \\
&\;\;\vdots \\
Q_p &= \lambda_2 C_1 + \lambda_3 C_2 + \dots + \lambda_1 C_p,
\end{aligned}
\]
with λ1 +. . .+λp = 1, and it follows immediately that the centroid of the points Qi
(which obviously belongs to the plane spanned by these points) is also the centroid
of the points Ci .
26.5. Façade’s photogrammetric module. We saw in the exercises of chapter 3 that the
mapping between a line δ with Plücker coordinate vector ∆ and its image δ with
homogeneous coordinates δ can be represented by ρδ = M̃∆. Here, ∆ is a function
of the model parameters, and M̃ depends on the corresponding camera position
and orientation.
(a) Assuming that the line δ has been matched with an image edge e of length
l, a convenient measure of the discrepancy between predicted and observed
data is obtained by multiplying by l the mean squared distance separating the
points of e from δ. Defining d(t) as the signed distance between the edge point
p = (1 − t)p0 + tp1 and the line δ, show that
\[ E = \int_0^1 d^2(t)\,dt = \frac{1}{3}\bigl(d(0)^2 + d(0)\,d(1) + d(1)^2\bigr), \]
where d0 and d1 denote the (signed) distances between the endpoints of e and
δ.
(b) If p0 and p1 denote the homogeneous coordinate vectors of these points, show
that
\[ d_0 = \frac{1}{|[\tilde M\Delta]_2|}\,p_0^T \tilde M\Delta \quad\text{and}\quad d_1 = \frac{1}{|[\tilde M\Delta]_2|}\,p_1^T \tilde M\Delta, \]
where [a]2 denotes the vector formed by the first two coordinates of the vector
a in R3
(c) Formulate the recovery of the camera and model parameters as a non-linear
least-squares problem.
Solution
(a) Let us write the equation of δ as n · p = D, where n is a unit vector and D
is the distance between the origin and δ. The (signed) distance between the
point p = (1 − t)p0 + tp1 and δ is
\[ d(t) = n \cdot p - D = (1 - t)\,d(0) + t\,d(1). \]
We have therefore
\[
\begin{aligned}
E &= \int_0^1 d^2(t)\,dt = \int_0^1 \bigl[(1-t)^2 d(0)^2 + 2(1-t)t\,d(0)d(1) + t^2 d(1)^2\bigr]\,dt \\
&= \Bigl[-\tfrac{1}{3}(1-t)^3\Bigr]_0^1 d(0)^2 + \Bigl[t^2 - \tfrac{2}{3}t^3\Bigr]_0^1 d(0)d(1) + \Bigl[\tfrac{1}{3}t^3\Bigr]_0^1 d(1)^2 \\
&= \tfrac{1}{3}\bigl(d(0)^2 + d(0)d(1) + d(1)^2\bigr).
\end{aligned}
\]
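The closed form is easy to check numerically; the sketch below (not from the text) compares it with a midpoint-rule approximation of the integral for arbitrary signed distances d0 and d1.

# Check E = (d0^2 + d0*d1 + d1^2)/3 for d(t) = (1-t) d0 + t d1.
import numpy as np

d0, d1 = 1.7, -0.4                                     # arbitrary signed endpoint distances
t = (np.arange(100000) + 0.5) / 100000                 # midpoints of a uniform partition of [0, 1]
E_numeric = np.mean(((1 - t) * d0 + t * d1) ** 2)
assert np.isclose(E_numeric, (d0**2 + d0*d1 + d1**2) / 3.0)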
(b) With the same notation as before, we can write δ = (n^T , −D)^T . Since ρδ =
M̃∆ and n is a unit vector, it follows immediately that
\[ d(0) = n \cdot p_0 - D = \frac{1}{|[\tilde M\Delta]_2|}\,p_0^T \tilde M\Delta
\quad\text{and}\quad
d(1) = n \cdot p_1 - D = \frac{1}{|[\tilde M\Delta]_2|}\,p_1^T \tilde M\Delta. \]
(c) The camera positions and model parameters can be recovered by minimizing the sum of the squared residuals
\[ f_{ij} \overset{\mathrm{def}}{=} \frac{1}{|[\tilde M_i\Delta_j]_2|}\sqrt{(p_{j0}^T \tilde M_i\Delta_j)^2 + (p_{j0}^T \tilde M_i\Delta_j)(p_{j1}^T \tilde M_i\Delta_j) + (p_{j1}^T \tilde M_i\Delta_j)^2}, \]
with respect to the unknown parameters (note that the term under the radical
is positive since it is equal—up to a positive constant—to the integral of d2 ).
It follows that the recovery of these parameters can be expressed as a (non-
linear) least-squares problem.
26.6. Show that a basis for the eight-dimensional vector space V formed by all affine
images of a fixed set of points P0 , . . . , Pn−1 can be constructed from at least two
images of these points when n ≥ 4.
Hint: Use the matrix
\[
\begin{pmatrix}
u_0^{(1)} & v_0^{(1)} & \dots & u_0^{(m)} & v_0^{(m)} \\
\vdots & \vdots & & \vdots & \vdots \\
u_{n-1}^{(1)} & v_{n-1}^{(1)} & \dots & u_{n-1}^{(m)} & v_{n-1}^{(m)}
\end{pmatrix},
\]
where (u_i^{(j)}, v_i^{(j)}) are the coordinates of the projection of the point P_i into image
number j.
Solution The matrix introduced here is simply the transpose of the data matrix
used in the Tomasi-Kanade factorization approach of chapter 12. With at least two
views of n ≥ 4 points, the singular value decomposition of this matrix can be used
to estimate the points P i (i = 0, . . . , n − 1) and construct the matrix
\[
\begin{pmatrix}
P_0^T & 0^T & 1 & 0 \\
0^T & P_0^T & 0 & 1 \\
\vdots & \vdots & \vdots & \vdots \\
P_{n-1}^T & 0^T & 1 & 0 \\
0^T & P_{n-1}^T & 0 & 1
\end{pmatrix}
\]
whose columns span the eight-dimensional vector space V formed by all images of
these n points.
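The span claim is easy to test numerically. The sketch below (an illustration, not from the text) builds the 2n × 8 matrix from arbitrary points P_i and checks that the stacked coordinates of a random affine image of those points lie in its column space.

# Every affine image of the P_i, stacked as (u_0, v_0, ..., u_{n-1}, v_{n-1}),
# lies in the column space of the 2n x 8 matrix built from the P_i.
import numpy as np

rng = np.random.default_rng(1)
P = rng.standard_normal((5, 3))                       # n = 5 points in general position
B = np.zeros((2 * len(P), 8))
for i, Pi in enumerate(P):
    B[2*i, :3], B[2*i, 6] = Pi, 1.0                   # row (P_i^T, 0^T, 1, 0)
    B[2*i + 1, 3:6], B[2*i + 1, 7] = Pi, 1.0          # row (0^T, P_i^T, 0, 1)

M, t = rng.standard_normal((2, 3)), rng.standard_normal(2)   # a random affine camera
img = (P @ M.T + t).ravel()                           # stacked (u_i, v_i) coordinates
coeffs = np.linalg.lstsq(B, img, rcond=None)[0]
assert np.allclose(B @ coeffs, img)                   # the image lies in span(B)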
26.7. Show that the set of all projective images of a fixed scene is an eleven-dimensional
variety.
Solution Writing
\[
\begin{pmatrix} p_0 \\ \vdots \\ p_{n-1} \end{pmatrix}
=
\begin{pmatrix}
\dfrac{m_1 \cdot P_0}{m_3 \cdot P_0} \\
\dfrac{m_2 \cdot P_0}{m_3 \cdot P_0} \\
\vdots \\
\dfrac{m_1 \cdot P_{n-1}}{m_3 \cdot P_{n-1}} \\
\dfrac{m_2 \cdot P_{n-1}}{m_3 \cdot P_{n-1}}
\end{pmatrix}
\]
shows that the set of all images of a fixed scene forms a surface embedded in
R2n and defined by rational equations in the row vectors m1 , m2 and m3 of the
projection matrix. Rational parametric surfaces are varieties whose dimension is
given by the number of independent parameters. Since projection matrices are only
defined up to scale, the dimension of the variety formed by all projective images of
a fixed scene is 11.
26.8. Show that the set of all perspective images of a fixed scene (for a camera with
constant intrinsic parameters) is a six-dimensional variety.
Solution A perspective projection matrix can always be written as ρM = K ( R  t ),
where K is the matrix of intrinsic parameters, R is a rotation matrix, and ρ is a
scalar accounting for the fact that M is only defined up to a scale factor. The
matrix A formed by the three leftmost columns of M must therefore satisfy the
five polynomial constraints associated with the fact that the columns of the matrix
K−1 A are orthogonal to each other and have the same length. Since the set of all
projective images of a fixed scene is a variety of dimension 11, it follows that the
set of all perspective images is a sub-variety of dimension 11 − 5 = 6.
26.9. In this exercise, we show that Eq. (26.7) only admits two solutions.
(a) Show that Eq. (26.6) can be rewritten as
\[ \begin{cases} X^2 - Y^2 + e_1 - e_2 = 0, \\ 2XY + e = 0, \end{cases} \tag{26.1} \]
where
\[ \begin{cases} X = u + \alpha u_1 + \beta u_2, \\ Y = v + \alpha v_1 + \beta v_2, \end{cases} \]
and e, e1 , and e2 are coefficients depending on u1 , v1 , u2 , v2 and the structure
parameters.
(a) Let us define the vector α = (α, β, 1)^T and note that X = α · u and Y = α · v.
This allows us to write
\[ R = \alpha\alpha^T + z^2 \begin{pmatrix} L & 0 \\ 0^T & 0 \end{pmatrix}, \quad\text{where}\quad L = \begin{pmatrix} 1 + \lambda^2 & \lambda\mu \\ \lambda\mu & \mu^2 \end{pmatrix}, \]
where e_1 = z² u_2^T L u_2, e_2 = z² v_2^T L v_2, and e = 2z² u_2^T L v_2.
(b) We can always write X = a cos θ and Y = a sin θ for some a > 0 and θ ∈
[0, 2π]. This allows us to rewrite Eq. (26.8) as
\[ \begin{cases} a^2\cos 2\theta = e_2 - e_1, \\ a^2\sin 2\theta = -e, \end{cases} \]