
Computer Vision
CS-E4850, 5 study credits
Lecturer: Juho Kannala

Lecture 5: Image alignment and object instance recognition
• Given two images of a planar scene (or from a rotating camera),
find the parameters of a global geometric transformation that
accounts for most true point correspondences between the images
• Given a query image and a database of object images, detect
whether the objects are visible in the query image
• Reading:
– Szeliski’s book, Sections 8.1.1 - 8.1.4 in 2nd edition
– Chapter 4 in Hartley & Zisserman
– Lowe’s SIFT paper: http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf

Acknowledgement: many slides from Svetlana Lazebnik, Derek Hoiem, Kristen Grauman, and others (detailed credits on individual slides)
Source: S. Lazebnik

Image alignment

Image from http://graphics.cs.cmu.edu/courses/15-463/2010_fall/


Source: S. Lazebnik

Alignment applications
• A look into the past
Source: S. Lazebnik

Alignment applications

Panorama stitching
Source: S. Lazebnik

Alignment applications

Recognition of object instances
Source: S. Lazebnik

Alignment

Alignment: find the parameters of a model that maps one set of points to another.

Typically we want to solve for a global transformation that accounts for most true correspondences.

Difficulties
• Noise (typically 1-3 pixels)
• Outliers (often 30-50%)
• Many-to-one matches or multiple objects
Source: S. Lazebnik

Alignment challenges
• Small degree of overlap
• Intensity changes
• Occlusion, clutter
Source: S. Lazebnik

Feature-based alignment
• Search for sets of feature matches that agree in terms of:
  a) Local appearance
  b) Geometric configuration
Source: S. Lazebnik

Feature-based alignment: Overview


• Alignment as fitting
• Affine transformations
• Homographies

• Robust alignment
• Descriptor-based feature matching
• RANSAC
Source: S. Lazebnik

Alignment as fitting
• Previous lectures: fitting a model to features in one image

  Find the model $M$ that minimizes $\sum_i \mathrm{residual}(x_i, M)$

• Alignment: fitting a model to a transformation between pairs of features (matches) in two images

  Find the transformation $T$ that minimizes $\sum_i \mathrm{residual}(T(x_i), x'_i)$

Source: S. Lazebnik

2D transformation models
• Similarity (translation, scale, rotation)
• Affine
• Projective (homography)
Source: S. Lazebnik

Let's start with affine transformations
• Simple fitting procedure (linear least squares)
• Approximates viewpoint changes for roughly planar objects and roughly orthographic cameras
• Can be used to initialize fitting for more complex models
Source: S. Lazebnik

Fitting an affine transformation
• Assume we know the correspondences $(x_i, y_i) \leftrightarrow (x'_i, y'_i)$; how do we get the transformation?

$$\mathbf{x}'_i = M\mathbf{x}_i + \mathbf{t}, \qquad \begin{bmatrix} x'_i \\ y'_i \end{bmatrix} = \begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix} + \begin{bmatrix} t_1 \\ t_2 \end{bmatrix}$$

• Want to find $M$, $\mathbf{t}$ to minimize $\sum_{i=1}^{n} \| \mathbf{x}'_i - M\mathbf{x}_i - \mathbf{t} \|^2$

Source: S. Lazebnik

Fitting an affine transformation
• Rewriting as a linear system in the six unknowns, each match contributes two rows:

$$\begin{bmatrix} & & \vdots & & & \\ x_i & y_i & 0 & 0 & 1 & 0 \\ 0 & 0 & x_i & y_i & 0 & 1 \\ & & \vdots & & & \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_1 \\ t_2 \end{bmatrix} = \begin{bmatrix} \vdots \\ x'_i \\ y'_i \\ \vdots \end{bmatrix}$$

• Linear system with six unknowns
• Each match gives us two linearly independent equations: we need at least three matches to solve for the transformation parameters
Source: S. Lazebnik
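As a minimal sketch of this solve (not from the slides; numpy is assumed and the helper name `fit_affine` is illustrative), the stacked system can be handed directly to linear least squares:

```python
# A minimal sketch, assuming numpy; fit_affine is an illustrative name.
import numpy as np

def fit_affine(pts, pts_prime):
    """Estimate M (2x2) and t (2,) from n >= 3 matches so that
    pts_prime ~= pts @ M.T + t. Unknown order: m1, m2, m3, m4, t1, t2."""
    n = pts.shape[0]
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = pts         # rows [x_i, y_i, 0, 0, 1, 0] for x'_i
    A[0::2, 4] = 1
    A[1::2, 2:4] = pts         # rows [0, 0, x_i, y_i, 0, 1] for y'_i
    A[1::2, 5] = 1
    b = pts_prime.reshape(-1)  # interleaved [x'_1, y'_1, x'_2, y'_2, ...]
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.array([[p[0], p[1]], [p[2], p[3]]]), p[4:6]
```

With exactly three non-collinear matches the system has a unique solution; with more matches, least squares averages out noise.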

Fitting a plane projective transformation
• Homography: plane projective transformation (a transformation taking a quad to another arbitrary quad)
Source: S. Lazebnik

Homography
• The transformation between two views of a planar surface
• The transformation between images from two cameras that share the same center

Application: Panorama stitching (image source: Hartley & Zisserman)
Source: S. Lazebnik

Fitting a homography
• Recall: homogeneous coordinates
  – Converting to homogeneous image coordinates: $(x, y) \mapsto (x, y, 1)$
  – Converting from homogeneous image coordinates: $(x, y, w) \mapsto (x/w, y/w)$

• Equation for homography:

$$\lambda \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

Source: S. Lazebnik
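In code, the homogeneous round trip is short; this is a small sketch (numpy assumed; `apply_homography` is an illustrative name, and $H$ is any 3x3 homography):

```python
# A small sketch: lift (x, y) to (x, y, 1), multiply by H, then divide
# by the last coordinate to return to image coordinates.
import numpy as np

def apply_homography(H, pts):
    """Map (n, 2) image points through a 3x3 homography H."""
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])  # to homogeneous
    mapped = pts_h @ H.T                                  # lambda * (x', y', 1)
    return mapped[:, :2] / mapped[:, 2:3]                 # from homogeneous
```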

Fitting a homography
• Equation for homography: $\lambda \mathbf{x}'_i = H\mathbf{x}_i$

$$\lambda \begin{bmatrix} x'_i \\ y'_i \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}$$

• Since $\mathbf{x}'_i$ and $H\mathbf{x}_i$ are parallel, their cross product vanishes: $\mathbf{x}'_i \times H\mathbf{x}_i = \mathbf{0}$

$$\begin{bmatrix} x'_i \\ y'_i \\ 1 \end{bmatrix} \times \begin{bmatrix} \mathbf{h}_1^T \mathbf{x}_i \\ \mathbf{h}_2^T \mathbf{x}_i \\ \mathbf{h}_3^T \mathbf{x}_i \end{bmatrix} = \begin{bmatrix} y'_i\, \mathbf{h}_3^T \mathbf{x}_i - \mathbf{h}_2^T \mathbf{x}_i \\ \mathbf{h}_1^T \mathbf{x}_i - x'_i\, \mathbf{h}_3^T \mathbf{x}_i \\ x'_i\, \mathbf{h}_2^T \mathbf{x}_i - y'_i\, \mathbf{h}_1^T \mathbf{x}_i \end{bmatrix}$$

$$\begin{bmatrix} \mathbf{0}^T & -\mathbf{x}_i^T & y'_i\, \mathbf{x}_i^T \\ \mathbf{x}_i^T & \mathbf{0}^T & -x'_i\, \mathbf{x}_i^T \\ -y'_i\, \mathbf{x}_i^T & x'_i\, \mathbf{x}_i^T & \mathbf{0}^T \end{bmatrix} \begin{pmatrix} \mathbf{h}_1 \\ \mathbf{h}_2 \\ \mathbf{h}_3 \end{pmatrix} = \mathbf{0} \qquad \text{3 equations, only 2 linearly independent}$$

Source: S. Lazebnik

Fitting a homography: DLT algorithm

$$\begin{bmatrix} \mathbf{0}^T & \mathbf{x}_1^T & -y'_1\, \mathbf{x}_1^T \\ \mathbf{x}_1^T & \mathbf{0}^T & -x'_1\, \mathbf{x}_1^T \\ \vdots & \vdots & \vdots \\ \mathbf{0}^T & \mathbf{x}_n^T & -y'_n\, \mathbf{x}_n^T \\ \mathbf{x}_n^T & \mathbf{0}^T & -x'_n\, \mathbf{x}_n^T \end{bmatrix} \begin{pmatrix} \mathbf{h}_1 \\ \mathbf{h}_2 \\ \mathbf{h}_3 \end{pmatrix} = \mathbf{0} \qquad A\mathbf{h} = \mathbf{0}$$

• H has 8 degrees of freedom (9 parameters, but scale is arbitrary)
• One match gives us two linearly independent equations
• Homogeneous least squares: find $\mathbf{h}$ minimizing $\|A\mathbf{h}\|^2$ subject to $\|\mathbf{h}\| = 1$
  – Solution: eigenvector of $A^T A$ corresponding to the smallest eigenvalue
• Four matches needed for a minimal solution
• For more info, see Sec. 4.1 in Hartley & Zisserman
Source: S. Lazebnik
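A minimal DLT sketch along these lines (numpy assumed; `fit_homography_dlt` is an illustrative name; the point normalization recommended in Hartley & Zisserman, Sec. 4.4, is omitted for brevity but matters in practice):

```python
# Stack two equations per match into A, then take the right singular
# vector of A with the smallest singular value (equivalently, the
# eigenvector of A^T A with the smallest eigenvalue).
import numpy as np

def fit_homography_dlt(pts, pts_prime):
    """Estimate H (3x3, up to scale) from n >= 4 matches, (n, 2) arrays."""
    A = []
    for (x, y), (xp, yp) in zip(pts, pts_prime):
        X = np.array([x, y, 1.0])
        A.append([0, 0, 0, *X, *(-yp * X)])   # row [0^T, x^T, -y' x^T]
        A.append([*X, 0, 0, 0, *(-xp * X)])   # row [x^T, 0^T, -x' x^T]
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 3)               # direction minimizing ||Ah||
```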

Robust feature-based alignment
• So far, we've assumed that we are given a set of "ground-truth" correspondences between the two images we want to align
• What if we don't know the correspondences?
Source: S. Lazebnik

Robust feature-based alignment
• Extract features
• Compute putative matches
• Loop:
  • Hypothesize transformation T
  • Verify transformation (search for other matches consistent with T)
Source: S. Lazebnik

Generating putative correspondences
• Need to compare feature descriptors of the local patches surrounding interest points
Source: S. Lazebnik

Feature descriptors
• Recall: feature detection and description
Source: S. Lazebnik

Feature descriptors
• Simplest descriptor: vector of raw intensity values
• How to compare two such vectors?
• Sum of squared differences (SSD)

$$\mathrm{SSD}(\mathbf{u}, \mathbf{v}) = \sum_i (u_i - v_i)^2$$

  – Not invariant to intensity change

• Normalized correlation

$$\rho(\mathbf{u}, \mathbf{v}) = \frac{(\mathbf{u} - \bar{u})}{\|\mathbf{u} - \bar{u}\|} \cdot \frac{(\mathbf{v} - \bar{v})}{\|\mathbf{v} - \bar{v}\|} = \frac{\sum_i (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\left(\sum_j (u_j - \bar{u})^2\right)\left(\sum_j (v_j - \bar{v})^2\right)}}$$

  – Invariant to affine intensity change

Source: S. Lazebnik
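Both comparisons are a few lines each; this sketch (numpy assumed, illustrative function names) makes the invariance difference easy to test:

```python
import numpy as np

def ssd(u, v):
    # Changes under any intensity shift or scaling of u or v.
    return np.sum((u - v) ** 2)

def normalized_correlation(u, v):
    # Invariant to affine intensity changes u -> a*u + b (a > 0):
    # subtracting the mean removes b, normalizing removes a.
    u0 = u - u.mean()
    v0 = v - v.mean()
    return np.dot(u0, v0) / (np.linalg.norm(u0) * np.linalg.norm(v0))
```

Replacing `v` with `2 * v + 10` changes `ssd` but leaves `normalized_correlation` untouched.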

Disadvantage of intensity vectors as descriptors
• Small deformations can affect the matching score a lot
Source: S. Lazebnik

Feature descriptors: SIFT
• Descriptor computation:
  • Divide patch into 4x4 sub-patches
  • Compute a histogram of gradient orientations (8 reference angles) inside each sub-patch
  • Resulting descriptor: 4x4x8 = 128 dimensions

• Advantages over raw vectors of pixel values:
  • Gradients are less sensitive to illumination change
  • Pooling of gradients over the sub-patches achieves robustness to small shifts, but still preserves some spatial information

David G. Lowe. "Distinctive image features from scale-invariant keypoints." IJCV 60(2), pp. 91-110, 2004.
Source: S. Lazebnik
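The following is a simplified, illustrative sketch of that layout (numpy assumed; not Lowe's implementation, which adds Gaussian weighting, trilinear interpolation into the bins, re-normalization, and clipping):

```python
# Simplified SIFT-like descriptor: 4x4 sub-patches, 8 orientation bins
# per sub-patch, 128 dimensions total.
import numpy as np

def sift_like_descriptor(patch):
    """patch: (16, 16) grayscale array around a normalized keypoint."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ori = np.arctan2(gy, gx)                        # orientation in [-pi, pi)
    bins = ((ori + np.pi) / (2 * np.pi) * 8).astype(int) % 8
    desc = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):                         # pool into 4x4 sub-patches
            desc[i // 4, j // 4, bins[i, j]] += mag[i, j]
    desc = desc.ravel()                             # 4*4*8 = 128 dimensions
    return desc / (np.linalg.norm(desc) + 1e-12)    # unit-normalize
```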

Feature matching
• Generating putative matches: for each patch in one image, find a short list of patches in the other image that could match it based solely on appearance
Source: S. Lazebnik

Problem: Ambiguous putative matches

Source: Y. Furukawa
Source: S. Lazebnik

Rejection of unreliable matches
• How can we tell which putative matches are more reliable?
• Heuristic: compare the distance of the nearest neighbor to that of the second nearest neighbor
  – The ratio of the closest to the second-closest distance will be high for features that are not distinctive
  – A threshold of 0.8 provides good separation

David G. Lowe. "Distinctive image features from scale-invariant keypoints." IJCV 60(2), pp. 91-110, 2004.
Source: S. Lazebnik
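A minimal sketch of the ratio test (numpy assumed, brute-force distances; `ratio_test_matches` is an illustrative name):

```python
# Accept a match only if the nearest descriptor is clearly closer
# than the second nearest (Lowe's ratio test).
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """desc_a: (n, d), desc_b: (m, d) with m >= 2; returns (i, j) pairs."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]     # nearest and second nearest
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches
```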

RANSAC
• The set of putative matches contains a very high percentage of outliers

RANSAC loop:
1. Randomly select a seed group of matches
2. Compute transformation from seed group
3. Find inliers to this transformation
4. If the number of inliers is sufficiently large, re-compute least-squares estimate of transformation on all of the inliers

Keep the transformation with the largest number of inliers
Source: S. Lazebnik
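A minimal sketch of this loop for the simplest model, a 2D translation (numpy assumed; illustrative names; one match hypothesizes a translation, whereas affine would need three and a homography four):

```python
import numpy as np

def ransac_translation(src, dst, n_iters=500, thresh=3.0):
    """src, dst: (n, 2) matched points; returns (t, inlier_mask)."""
    rng = np.random.default_rng(0)
    best_t, best_inliers = None, np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        i = rng.integers(len(src))                  # 1. random seed match
        t = dst[i] - src[i]                         # 2. hypothesize model
        residuals = np.linalg.norm(src + t - dst, axis=1)
        inliers = residuals < thresh                # 3. find inliers
        if inliers.sum() > best_inliers.sum():
            best_t, best_inliers = t, inliers
    if best_inliers.any():                          # 4. refit on all inliers
        best_t = (dst[best_inliers] - src[best_inliers]).mean(axis=0)
    return best_t, best_inliers
```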

RANSAC example: Translation

Putative matches
Source: S. Lazebnik

RANSAC example: Translation

Select one match, count inliers


Source: S. Lazebnik

RANSAC example: Translation

Select translation with the most inliers


Object Instance Recognition

Slide credit: David Lowe
Source: D. Hoiem

Object Instance Recognition
1. Match keypoints to the object model
2. Solve for affine transformation parameters
3. Score by inliers and choose solutions with score above threshold

Choose the hypothesis with the maximum score above threshold.
(Figure: matched keypoints A1-A3 and B1-B3 feed affine parameter fitting and inlier counting.)
Source: D. Hoiem

Overview of Keypoint Matching
1. Find a set of distinctive keypoints
2. Define a region (N x N pixels) around each keypoint
3. Extract and normalize the region content
4. Compute a local descriptor (e.g., color) from the normalized region: $f_A$, $f_B$
5. Match local descriptors: accept if $d(f_A, f_B) < T$

K. Grauman, B. Leibe
Source: D. Hoiem
Finding the objects (overview)
1. Match interest points from the input image to the stored database image
2. Matched points vote for the rough position/orientation/scale of the object
3. Find position/orientation/scales that have at least three votes
4. Compute affine registration and matches using iterative least squares with outlier check
5. Report the object if there are at least T matched points
Source: D. Hoiem

Matching Keypoints

Want to match keypoints between:
1. Query image
2. Stored image containing the object

Given descriptor $x_0$, find the two nearest neighbors $x_1$, $x_2$ with distances $d_1$, $d_2$.

$x_1$ matches $x_0$ if $d_1 / d_2 < 0.8$
• In Lowe's study, this discards about 90% of false matches while losing only about 5% of true matches
Source: D. Hoiem

Affine Object Model

Accounts for 3D rotation of a surface under orthographic projection
Source: D. Hoiem

Affine Object Model

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

Stacking two rows per match gives a linear system in the six parameters:

$$\begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_1 & y_1 & 1 \\ x_2 & y_2 & 1 & 0 & 0 & 0 \\ & & \vdots & & & \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix} = \begin{bmatrix} x'_1 \\ y'_1 \\ x'_2 \\ \vdots \end{bmatrix}$$

Source: D. Hoiem
Finding the objects (in detail)
1. Match interest points from the input image to the database image
2. Get location/scale/orientation using Hough voting
   • In training, each point has a known position/scale/orientation w.r.t. the whole object
   • Matched points vote for the position, scale, and orientation of the entire object
   • Bins for x, y, scale, orientation
     – Wide bins (0.25 object length in position, 2x scale, 30 degrees orientation)
     – Vote for the two closest bin centers in each dimension (16 votes total)
3. Geometric verification
   • For each bin with at least 3 keypoints
   • Iterate between least-squares fitting and checking for inliers and outliers
4. Report the object if > T inliers (T is typically 3, and can be computed to match some probabilistic threshold)
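As a rough sketch of step 2 (numpy assumed; illustrative names and bin conventions, not Lowe's exact implementation; orientation wrap-around is ignored for brevity):

```python
# Each match predicts a pose (x, y, scale, orientation) for the whole
# object and votes for the two closest bin centers in each of the four
# dimensions: 2^4 = 16 votes per match.
from collections import defaultdict
import numpy as np

def hough_votes(match_poses, pos_bin, ori_bin_deg=30.0):
    """match_poses: list of (x, y, scale, orientation_deg) predictions,
    one per matched keypoint. pos_bin: position bin width in pixels
    (0.25 * object length on the slide); scale is binned in octaves."""
    votes = defaultdict(list)
    for k, (x, y, s, o) in enumerate(match_poses):
        coords = (x / pos_bin, y / pos_bin, np.log2(s), o / ori_bin_deg)
        choices = []
        for c in coords:
            i0 = int(np.floor(c))
            # the second-closest bin lies on the side of the fractional part
            i1 = i0 + 1 if (c - i0) > 0.5 else i0 - 1
            choices.append((i0, i1))
        for corner in range(16):                  # 2 options x 4 dimensions
            key = tuple(choices[d][(corner >> d) & 1] for d in range(4))
            votes[key].append(k)
    return votes  # bins with >= 3 votes go on to geometric verification
```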
Examples of recognized objects

Slide credit: David Lowe


Location Recognition

Training

[Lowe04]
Slide credit: David Lowe
Key concepts

Alignment as robust fitting
• Affine transformations
• Homographies
• Descriptor-based feature matching
• RANSAC

Object instance recognition
• Find keypoints, compute descriptors
• Match descriptors
• Vote for / fit affine parameters
• Return object if # inliers > T

Adapted from D. Hoiem
