Digital Video

Uploaded by

anuraagnandi9
Copyright
© © All Rights Reserved
Digital Video

EE6310: Image and Video Processing, Spring 2024

March 5, 2024
Digital Video

I Generic Video Codec


I Block Motion Estimation
I Interframe Coding and Motion Estimation
I Video Compression Standards: MPEG-x/H.264
I Optical Flow
Induced Motion Effects
Generic Video Encoder Diagram
Generic Video Decoder Diagram
Lossless Coding

I Lossless techniques achieve compression with no loss of
information
I The true image can be reconstructed exactly from the coded
image
I Lossless coding doesn’t usually achieve high compression but
has applications such as:
I In combination with lossy compression, where the compression gains multiply
I In applications where information loss is unacceptable
I Lossless compression ratios are usually in the range
$2:1 \le CR \le 3:1$, but can vary from image to image
Methods for Lossless Coding

I Basically amounts to clever rearrangement of data


I This can be done in many ways and many domains (DFT,
DCT, wavelet etc)
I The most popular methods use variable length coding
(VLC)
Variable Length Coding (VLC)

I Idea: use variable length codewords to encode gray levels


I Assign short lengths to gray levels that occur frequently
I Assign long lengths to gray levels that occur infrequently
I On average, the bits per pixel (BPP) will be reduced
Image Histogram and BPP

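I Recall the image histogram $H_I$: $H_I(k)$ is the number of pixels of the
$N \times M$ image $I$ that have gray level $k$
I If $B(k)$ is the number of bits used to code gray-level $k$, then
$$\mathrm{BPP}(I) = \frac{1}{NM} \sum_{k=0}^{K-1} B(k) H_I(k)$$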
I BPP is the common measure for VLC
Image Entropy

I Recall the normalized histogram values
$$p_I(k) = \frac{1}{NM} H_I(k); \quad k = 0, \ldots, K-1$$
I $p_I(k)$ is the probability of gray level $k$
I The entropy of image $I$ is then:
$$E[I] = -\sum_{k=0}^{K-1} p_I(k) \log_2 p_I(k)$$
I Entropy is a measure of information
I Provides a lower bound on VLCs:
$$\mathrm{BPP}(I) \ge E[I]$$
Optimal VLC

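I Recall
$$\mathrm{BPP}(I) = \frac{1}{NM} \sum_{k=0}^{K-1} H_I(k) B(k) = \sum_{k=0}^{K-1} p_I(k) B(k)$$
$$E[I] = -\sum_{k=0}^{K-1} p_I(k) \log_2 p_I(k)$$
I Comparing the equations: if $B(k) = -\log_2 p_I(k)$, the optimum code is
found and the lower bound is attained! (This requires every $-\log_2 p_I(k)$ to be an integer.)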
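The comparison between BPP and entropy can be checked numerically. The sketch below is only an illustration: the 16-pixel histogram and the two codeword-length tables are hypothetical, chosen so that every probability is a power of 1/2.

```python
import numpy as np

def entropy_bits(hist):
    """Entropy E[I] in bits per pixel of a gray-level histogram H_I(k)."""
    p = np.asarray(hist, dtype=float)
    p = p / p.sum()              # normalized histogram p_I(k)
    p = p[p > 0]                 # terms with p = 0 contribute nothing
    return float(-(p * np.log2(p)).sum())

def bpp(hist, lengths):
    """Average bits per pixel when gray level k costs lengths[k] bits."""
    h = np.asarray(hist, dtype=float)
    return float((h * np.asarray(lengths, dtype=float)).sum() / h.sum())

hist = [8, 4, 2, 2]              # hypothetical image: K = 4 gray levels, 16 pixels
ideal_lengths = [1, 2, 3, 3]     # B(k) = -log2 p_I(k) for p = 1/2, 1/4, 1/8, 1/8
fixed_lengths = [2, 2, 2, 2]     # plain 2-bit fixed-length code

print(entropy_bits(hist), bpp(hist, ideal_lengths), bpp(hist, fixed_lengths))
```

With these numbers the variable-length table meets the entropy exactly, while the fixed-length code spends 2 bits per pixel.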
The Huffman Code

I The Huffman algorithm yields an optimum code


I For a set of gray levels $\{0, \ldots, K-1\}$ it gives a set of code
words $c(k)$ such that
$$\mathrm{BPP}(I) = \sum_{k=0}^{K-1} p_I(k) L(c(k))$$
is the smallest possible, where $L(c(k))$ is the length of $c(k)$ in bits
The Huffman Algorithm

I Form a binary tree with branches labeled by the gray-levels $k_m$
and their probabilities $p_I(k_m)$:
1. Eliminate any $k_m$ where $p_I(k_m) = 0$
2. Find the 2 smallest probabilities $p_m = p_I(k_m)$, $p_n = p_I(k_n)$
3. Replace them by $p_{mn} = p_m + p_n$ to form a node; this reduces the list by 1
4. Label the branch for $k_m$ with (e.g.) ’1’ and for $k_n$ with ’0’
5. Until the list has only 1 element (root reached), return to (2)
I In step (4), values ’1’ and ’0’ are assigned to element pairs
$(k_m, k_n)$, element triples, etc. as the process progresses
Huffman Example

I There are $K = 8$ values $0, \ldots, 7$ to be assigned codewords:
$p_I(0) = 0.5$, $p_I(1) = p_I(2) = p_I(3) = 0.125$, $p_I(4) = 0.0625$,
$p_I(5) = p_I(6) = 0.03125$, $p_I(7) = 0$
I The process creates a binary tree with values ’1’ and ’0’
placed on the top and bottom branches at each stage
I Solution on board - make note
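The five steps of the algorithm can be sketched in a few lines using a priority queue. This is a minimal illustration, not the board solution: the function name `huffman_code` is made up here, and which branch gets ’1’ or ’0’ is arbitrary, so the individual bit patterns may differ from the ones worked out in class even though the codeword lengths agree.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Return {symbol: bitstring} for a dict {symbol: probability}."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    # Step 1: eliminate symbols with zero probability.
    heap = [(p, next(tie), {k: ""}) for k, p in probs.items() if p > 0]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate one-symbol source
        return {k: "0" for k in heap[0][2]}
    while len(heap) > 1:                     # Step 5: repeat until one node remains
        pm, _, cm = heapq.heappop(heap)      # Step 2: the two smallest probabilities
        pn, _, cn = heapq.heappop(heap)
        # Step 4: label one branch '1' and the other '0' (prefixes the codewords).
        merged = {k: "1" + c for k, c in cm.items()}
        merged.update({k: "0" + c for k, c in cn.items()})
        # Step 3: replace the pair by a node with probability pm + pn.
        heapq.heappush(heap, (pm + pn, next(tie), merged))
    return heap[0][2]

# Probabilities from the example above.
probs = {0: 0.5, 1: 0.125, 2: 0.125, 3: 0.125,
         4: 0.0625, 5: 0.03125, 6: 0.03125, 7: 0.0}
code = huffman_code(probs)
avg_len = sum(probs[k] * len(c) for k, c in code.items())
print(code, avg_len)
```

Because every nonzero probability in this example is a power of 1/2, the average length equals the entropy, 2.1875 bits per pixel, and gray level 7 receives no codeword.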
Huffman Decoding

I The Huffman code is a uniquely decodable code. There is
only one interpretation for a series of codewords (bits)
I Decoding progresses as follows:
I Starting at tree root, traverse tree using coded bits until a leaf
is found. The symbol at the leaf is output
I Return to tree root and repeat above step until all bits are
exhausted
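The root-to-leaf traversal can be sketched with a nested dictionary standing in for the tree. The code table below is a hypothetical prefix-free table with Huffman-like lengths; it is an assumption for illustration and not necessarily the table derived on the board.

```python
def build_tree(code):
    """Turn {symbol: bitstring} into a nested dict; leaves hold the symbols."""
    root = {}
    for sym, bits in code.items():
        node = root
        for b in bits[:-1]:
            node = node.setdefault(b, {})
        node[bits[-1]] = sym
    return root

def decode(bits, code):
    root = build_tree(code)
    out, node = [], root
    for b in bits:
        node = node[b]                    # follow the branch labeled by this bit
        if not isinstance(node, dict):    # leaf reached: output symbol, restart at root
            out.append(node)
            node = root
    if node is not root:
        raise ValueError("bitstream ended in the middle of a codeword")
    return out

# Hypothetical code table (prefix-free, lengths 1, 3, 3, 3, 4, 5, 5).
code = {0: "1", 1: "011", 2: "010", 3: "001", 4: "0001", 5: "00001", 6: "00000"}
message = [0, 3, 0, 6, 1]
bitstream = "".join(code[s] for s in message)
print(bitstream, decode(bitstream, code))
```

No separators are needed between codewords: because no codeword is a prefix of another, the first leaf reached is always the intended symbol.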
Arithmetic Coding

I Assigns a single arithmetic code word to an entire sequence
of source symbols - creates a mapping between the source
symbol sequence and real numbers in the interval [0, 1)
I Achieves higher compression efficiency than Huffman codes
- no need to map each source symbol to an integral number of code
symbols
I Approaches Shannon’s noiseless source coding bound as the sequence length grows
Arithmetic Coding Algorithm

I Divide the interval [0, 1) according to the PMF
I E.g., if $p(a) = p(e) = 0.25$, $p(i) = p(o) = 0.2$, $p(u) = 0.1$:

Symbol   a           e             i            o            u
Range    [0, 0.25)   [0.25, 0.5)   [0.5, 0.7)   [0.7, 0.9)   [0.9, 1)

I Initialize $h = 1$, $l = 0$
I Loop over all source symbols $s$, where $[l_s, h_s)$ is the range assigned to $s$:
I $r = h - l$
I $h = l + r \cdot h_s$
I $l = l + r \cdot l_s$
I Final output is any real number in the interval $[l, h)$
Arithmetic Decoding Algorithm

I Initialize $r$ to the encoded number
I Loop over the sequence length or until an EOF symbol is decoded
I Find the interval $[l_s, h_s)$ in which $r$ lands - output the corresponding source
symbol $s$
I $r = \frac{r - l_s}{h_s - l_s} = \frac{r - l_s}{p(s)}$
I Example on board . . .
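A round trip through the encoder and decoder can be sketched with ordinary floating-point numbers. This is a toy illustration under stated assumptions: the message "eai" is hypothetical, the decoder is told the message length instead of using an EOF symbol, and floats only have enough precision for short messages (practical coders use integer arithmetic with renormalization).

```python
# Symbol ranges [l_s, h_s) from the PMF table: p(a)=p(e)=0.25, p(i)=p(o)=0.2, p(u)=0.1
RANGES = {"a": (0.0, 0.25), "e": (0.25, 0.5), "i": (0.5, 0.7),
          "o": (0.7, 0.9), "u": (0.9, 1.0)}

def encode(message):
    low, high = 0.0, 1.0
    for s in message:
        ls, hs = RANGES[s]
        r = high - low
        high = low + r * hs      # uses the old value of low
        low = low + r * ls
    return low, high             # any number in [low, high) identifies the message

def decode(x, n):
    out = []
    for _ in range(n):
        for s, (ls, hs) in RANGES.items():
            if ls <= x < hs:     # interval in which the number lands
                out.append(s)
                x = (x - ls) / (hs - ls)
                break
    return "".join(out)

low, high = encode("eai")
tag = (low + high) / 2           # pick the midpoint of the final interval
print(low, high, decode(tag, 3))
```

For "eai" the interval shrinks from [0, 1) to [0.25, 0.5), then [0.25, 0.3125), then [0.28125, 0.29375); its final width, 0.0125, is the product of the three symbol probabilities.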
Lossy Coding – Goals

To optimize and balance the following:


I Compression achieved by coding
I Computation required to code and decode
I Quality of the decompressed data
Lossy Coding – Broad Methodology

I Many methods proposed


I The successful methods broadly and loosely follow:
I Transform the image to another domain and/or extract
features
I Quantize in this image or those features
I Efficiently organize and/or entropy code the quantized data
Lossy Coding – Block Coding

I Most lossy methods begin by partitioning the image/frame
into sub-blocks that are individually coded
I Wavelet methods are an exception
Lossy Coding – Why Block Coding?

I Reason: Images are highly non-stationary: different areas
of an image may have different properties
I E.g., more high or low frequencies, more or less detail, etc.
I Local coding is thus more efficient
I Wavelet methods provide localization without blocks
I Typical block sizes: 4 × 4, 8 × 8, 16 × 16
Lossy Coding – Karhunen-Loeve Expansion
I Thm 8.5-1 in Stark and Woods, stated here for completeness
I Optimal decorrelating transform in a probabilistic framework
I Theorem: Let $X(t)$ be a zero-mean, second-order random
process defined over $[-T/2, T/2]$ with continuous covariance
function $K_{XX}(t_1, t_2) = R_{XX}(t_1, t_2)$. Then
$$X(t) = \sum_{n=1}^{\infty} X_n \phi_n(t) \quad \text{for } |t| \le T/2$$
I $X_n \equiv \int_{-T/2}^{+T/2} X(t) \phi_n^*(t)\, dt$
I The set of functions $\{\phi_n(t)\}$ is a complete orthonormal set of
solutions to the integral equation
$$\int_{-T/2}^{+T/2} K_{XX}(t_1, t_2) \phi_n(t_2)\, dt_2 = \lambda_n \phi_n(t_1) \quad \text{for } |t_1| \le T/2$$
I The coefficients $X_n$ are statistically orthogonal:
$$E[X_n X_m^*] = \lambda_n \delta_{mn}$$
Lossy Coding – Principal Components Analysis (PCA)

I Optimal decorrelating transform in a deterministic
framework
I Given an $m \times n$ matrix $X$ of observations of an unknown
system or a physical process
I Find a linear transform matrix $P$ of size $m \times m$ such that
$Y = PX$
I The transform should be such that:
I The off-diagonal elements of the covariance matrix $C_Y$ must be 0
I Successive dimensions in $Y$ must be rank-ordered according to
variance
I Note: $C_X \equiv \frac{1}{n} X X^T$
I Proof outline on board . . .
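One way to obtain such a $P$ is from the eigenvectors of $C_X$. The sketch below assumes that the rows of $X$ are the $m$ variables and the columns are the $n$ observations, and that the row means are removed first; the synthetic data and the name `pca_transform` are hypothetical.

```python
import numpy as np

def pca_transform(X):
    """Rows of X: m measured variables; columns: n observations."""
    Xc = X - X.mean(axis=1, keepdims=True)   # make every row zero-mean
    C_X = Xc @ Xc.T / Xc.shape[1]            # C_X = (1/n) X X^T
    eigvals, eigvecs = np.linalg.eigh(C_X)   # symmetric matrix: real eigen-decomposition
    order = np.argsort(eigvals)[::-1]        # largest variance first
    P = eigvecs[:, order].T                  # rows of P are the principal directions
    return P, P @ Xc                         # Y = P X

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 500))                # hypothetical data: m = 3, n = 500
X[1] += 0.8 * X[0]                           # introduce correlation between two rows
P, Y = pca_transform(X)
C_Y = Y @ Y.T / Y.shape[1]
print(np.round(C_Y, 6))
```

The printed $C_Y$ is diagonal up to rounding error, with the variances in decreasing order down the diagonal, which are the two properties required of the transform.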
Lossy Coding – Discrete Cosine Transform

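I The DCT of an $N \times M$ image or sub-image is defined as:
$$\tilde{I}(u, v) = \frac{4 C_N(u) C_M(v)}{NM} \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} I(i, j) \cos\left[\frac{(2i+1)u\pi}{2N}\right] \cos\left[\frac{(2j+1)v\pi}{2M}\right]$$
I The inverse DCT is defined as:
$$I(i, j) = \sum_{u=0}^{N-1} \sum_{v=0}^{M-1} C_N(u) C_M(v) \tilde{I}(u, v) \cos\left[\frac{(2i+1)u\pi}{2N}\right] \cos\left[\frac{(2j+1)v\pi}{2M}\right]$$
I $C_N(u) = \begin{cases} \frac{1}{\sqrt{2}}, & u = 0 \\ 1, & u = 1, \ldots, N-1 \end{cases}$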
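The forward and inverse definitions can be checked against each other directly. The sketch below is a literal matrix-product implementation of the two formulas with this slide's scaling (not a fast algorithm, and not the normalization used by any particular library); the helper names are hypothetical.

```python
import numpy as np

def _cos_matrix(N):
    """N x N matrix with entry [u, i] = cos((2i + 1) u pi / (2N))."""
    u = np.arange(N)[:, None]
    i = np.arange(N)[None, :]
    return np.cos((2 * i + 1) * u * np.pi / (2 * N))

def _c(N):
    c = np.ones(N)
    c[0] = 1 / np.sqrt(2)        # C_N(0) = 1/sqrt(2), C_N(u) = 1 otherwise
    return c

def dct2(I):
    N, M = I.shape
    scale = 4 * np.outer(_c(N), _c(M)) / (N * M)
    return scale * (_cos_matrix(N) @ I @ _cos_matrix(M).T)

def idct2(I_t):
    N, M = I_t.shape
    weighted = np.outer(_c(N), _c(M)) * I_t
    return _cos_matrix(N).T @ weighted @ _cos_matrix(M)

block = np.arange(64, dtype=float).reshape(8, 8)   # hypothetical 8 x 8 block
flat = np.full((8, 8), 10.0)                       # constant block
print(np.allclose(idct2(dct2(block)), block), np.round(dct2(flat)[0, 0], 6))
```

The double sums factor into two matrix products, which is the separability that makes fast implementations possible. A constant block puts all of its energy into the single coefficient $\tilde{I}(0, 0)$.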
Lossy Coding – Discrete Cosine Transform

I Good decorrelating properties


I Non-adaptive orthonormal basis
I Separable, fast implementation
I Adopted by JPEG and MPEG standards
Lossy Coding – Overview of JPEG

I The commercial industry standard - formulated by the CCITT
Joint Photographic Experts Group (JPEG)
I Uses DCT as the central transform
I Standard is quite complex - will only discuss outline here
Lossy Coding – JPEG Block Diagram
Lossy Coding – JPEG Baseline Algorithm

I Partition the image into 8 × 8 blocks and apply the DCT to each
block to get $\tilde{I}_k(u, v)$
I Pointwise divide each block by an 8 × 8 user-defined
normalization array $Q(u, v)$
I $Q(u, v)$ is designed using sensitivity properties of human vision
I Uniformly quantize the result (the quantized block is again written $\tilde{I}_k$):
$$\tilde{I}_k(u, v) \leftarrow \mathrm{INT}\left[\frac{\tilde{I}_k(u, v)}{Q(u, v)} + 0.5\right]$$
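The divide-and-round step, and the multiplication by $Q$ that undoes it at the decoder, can be sketched as follows. Two assumptions are made here: INT is taken to be the floor function, so that INT[x + 0.5] rounds to the nearest integer, and both the normalization array and the coefficients are hypothetical stand-ins rather than values from the JPEG standard.

```python
import numpy as np

def quantize(dct_block, Q):
    """INT[ I~_k / Q + 0.5 ], with INT assumed to be floor (round to nearest)."""
    return np.floor(dct_block / Q + 0.5).astype(int)

def dequantize(q_block, Q):
    """Decoder side: pointwise multiplication by the normalization array."""
    return q_block * Q

u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
Q = 16 + 4 * (u + v)                              # hypothetical array: coarser at high frequencies
rng = np.random.default_rng(0)
coeffs = rng.normal(scale=200.0 / (1 + u + v))    # hypothetical coefficients that decay with frequency

q = quantize(coeffs, Q)
rec = dequantize(q, Q)
print(q)
print(np.abs(rec - coeffs).max())
```

Each reconstructed coefficient is within $Q(u, v)/2$ of its original value, so a larger entry in $Q$ means a coarser approximation and more coefficients rounded to zero.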
Lossy Coding – JPEG Baseline Example

I A block DCT (integer only algorithm) I˜k =

I JPEG Normalization Array Q =


Lossy Coding – JPEG Baseline Example

I A block DCT (integer only algorithm) I˜k =

I Notice all the zeros - due to the DCT’s good energy
compaction
Lossy Coding – JPEG Data Rearrangement

I Rearrange quantized AC coefficients


I This array contains mostly zeros, especially at high
frequencies
I So, rearrange into a 1-D array using zig-zag ordering

Reordered quantized block is:
[79, 0, −2, −1, −1, −1, 0, 0, −1, (55 zeros)]
Lossy Coding – JPEG DC Coefficient Handling

I Simple DPCM applied to DC values $\tilde{I}_k(0, 0)$ between
adjacent blocks to reduce entropy
I Difference between current block and left-adjacent block is
found:
$$e(k) = \tilde{I}_k(0, 0) - \tilde{I}_{k-1}(0, 0)$$
I $e(k)$ is losslessly encoded with a Huffman coder
I First column of DC values retained to allow reconstruction
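For one row of blocks, the DPCM step and its inverse can be sketched as below. The DC values are hypothetical; the point is only that keeping the first value plus the differences $e(k)$ allows exact reconstruction.

```python
import numpy as np

def dpcm_encode_row(dc):
    """Keep the first DC value of the row and e(k) = dc[k] - dc[k-1] for the rest."""
    dc = np.asarray(dc)
    return dc[0], np.diff(dc)

def dpcm_decode_row(first, e):
    """Rebuild the DC values by accumulating the differences."""
    return np.concatenate(([first], first + np.cumsum(e)))

dc_row = [79, 81, 80, 80, 76]          # hypothetical quantized DC values of adjacent blocks
first, e = dpcm_encode_row(dc_row)
print(first, e, dpcm_decode_row(first, e))
```

The differences cluster around zero even when the DC values themselves are large, which is what lowers the entropy seen by the Huffman coder.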
Lossy Coding – JPEG AC Coefficient Handling

I AC vector contains many zeros


I Using Run Length Coding (RLC) results in considerable
compression
I The AC vector is converted into 2-tuples (skip, value) where
I Skip = number of zeros preceding a non-zero value
I Value = the following non-zero value
I The AC pairs are then Huffman coded
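The zig-zag ordering and the (skip, value) conversion can be sketched as follows. The `"EOB"` marker that stands for the trailing run of zeros is an assumption of this sketch, since the slides do not say how the final zeros are signalled; the function names are also hypothetical.

```python
def zigzag_indices(n=8):
    """(row, col) pairs of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):                     # s = row + col picks one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # odd diagonals are walked downwards, even diagonals upwards
        order.extend((r, s - r) for r in (rows if s % 2 else reversed(rows)))
    return order

def run_length_encode(ac):
    """Convert the AC vector into (skip, value) pairs."""
    pairs, skip = [], 0
    for value in ac:
        if value == 0:
            skip += 1                              # count zeros preceding the next non-zero value
        else:
            pairs.append((skip, value))
            skip = 0
    pairs.append("EOB")                            # assumed end-of-block marker for trailing zeros
    return pairs

# The reordered quantized block from the rearrangement slide.
reordered = [79, 0, -2, -1, -1, -1, 0, 0, -1] + [0] * 55
dc, ac = reordered[0], reordered[1:]
print(zigzag_indices()[:6])
print(run_length_encode(ac))
```

The 63 AC coefficients collapse to five pairs plus the marker: (1, −2), (0, −1), (0, −1), (0, −1), (2, −1).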
Lossy Coding – JPEG Decoding

I Decoding is achieved by reversing Huffman coding, RLC and
DPCM to recreate $\tilde{I}_k$
I Then multiply pointwise by the normalization array to create the lossy DCT
$$\tilde{I}_k^{\text{lossy}} = Q \otimes \tilde{I}_k$$
I The decoded image block is the IDCT of the result
$$I_k^{\text{lossy}} = \mathrm{IDCT}[\tilde{I}_k^{\text{lossy}}]$$
I The overall decoded image is recreated by putting together
the decoded 8 × 8 pieces:
$$I^{\text{lossy}} = [I_k^{\text{lossy}}]$$
Lossy Coding – JPEG Example

Figure: JPEG example - (a) Original, (b) 16:1, (c) 32:1, (d) 64:1


Block Motion Estimation

I Image motion estimation, as in optical flow, is important
for many video applications including video filtering, motion
compensation, and video compression
I Practical systems usually use block motion estimation for
ease of implementation
I This approach assumes that the video consists of images
containing moving blocks
I The blocks are assumed to have simple translational motion.
We can again use windows and windowed sets to express
this
Video Notation

I A video sequence $\mathbf{I}$ is a 3-D array or signal:
$$\mathbf{I} = [I(i, j, k);\ 0 \le i \le N-1,\ 0 \le j \le M-1,\ 0 \le k \le K-1]$$
I Here, we will regard still images as single images taken from
a video. Thus a video consists of a sequence of still images:
$$\mathbf{I} = [\ldots\ I_{k-1}\ I_k\ I_{k+1}\ \ldots]$$
Video Windows

I A window $B$ is a set of 2-D coordinate shifts $B_i = (m_i, n_i)$:
$$B = \{B_1, \ldots, B_{2M+1}\} = \{(m_1, n_1), \ldots, (m_{2M+1}, n_{2M+1})\}$$
I Given an image $I_k$ and a window $B$, the windowed set at
$(i, j, k)$ is
$$B \circ \mathbf{I}(i, j, k) = B \circ I_k(i, j) = \{I(i - m, j - n, k);\ (m, n) \in B\}$$
the set of image pixels covered by $B$ at coordinate $(i, j)$ at
time $k$
I The windows used are SQUARE and non-overlapping for
ease of implementation
I Standards typically use 16 × 16
Translational Block Motion

I This assumes that at some later time $k + r$, each block or
windowed set at time $k$ has translated in the $i$ and $j$
directions:
$$B \circ \mathbf{I}(i, j, k) = B \circ \mathbf{I}(i + d_1, j + d_2, k + r)$$
for integer displacement $(d_1, d_2)$ and time shift $r$, as depicted.
Translational Block Motion

I Advantages of translational block models:


I Only one motion vector needed per block
I Ease of hardware implementation
I Disadvantages of translational block models:
I Inaccurate for other motion types: zoom, rotation, bending
etc.
I Leads to visual “blocking artifacts” at low bitrates
I It is possible to estimate other motion types, but at much
higher cost. Standards use the simple model
Block Matching

I Goal: Estimate $(d_1, d_2)$ by block matching - the simplest
method for estimating block motion
I Involves a simple search to find the best-fitting translational
motion for each block
I Method: for each block $B \circ \mathbf{I}(i, j, k)$ in video signal $\mathbf{I}$ at time
$k$, search for the best-fitting block of the same size at
time $k + 1$
I The blocks that are found at time k + 1 may overlap
Search Space

The search is conducted over a neighborhood centered around the
location $(i, j)$ of the original block:
Block Match Measures

I Goal: Find the block with the minimum error with respect
to the original block:
$$\text{FIND: } \min_{(d_1, d_2)} \| B \circ \mathbf{I}(i, j, k) - B \circ \mathbf{I}(i + d_1, j + d_2, k + 1) \|$$
where $\|\cdot\|$ is an error metric such as (assume $P \times Q$ blocks):
$$\mathrm{MSE}(d_1, d_2) = \frac{1}{PQ} \sum_{(m,n) \in B} [I(i - m, j - n, k) - I(i - m + d_1, j - n + d_2, k + 1)]^2$$
$$\mathrm{MAD}(d_1, d_2) = \frac{1}{PQ} \sum_{(m,n) \in B} |I(i - m, j - n, k) - I(i - m + d_1, j - n + d_2, k + 1)|$$
I MAD is commonly used in practice - no computation of
squares:
$$(d_1^*, d_2^*) = \arg\min_{(d_1, d_2)} \mathrm{MAD}(d_1, d_2)$$
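An exhaustive search that minimizes MAD can be sketched as below. For simplicity the block is addressed by its top-left corner rather than through the window notation, and the 16 × 16 block size, the ±7 search range and the synthetic frames are all assumptions made for the illustration.

```python
import numpy as np

def mad(a, b):
    """Mean absolute difference between two equally sized blocks."""
    return float(np.mean(np.abs(a.astype(float) - b.astype(float))))

def full_search(frame_k, frame_k1, top, left, size=16, max_disp=7):
    """Return ((d1, d2), error) minimizing MAD over all displacements in range."""
    block = frame_k[top:top + size, left:left + size]
    H, W = frame_k1.shape
    best_d, best_err = (0, 0), np.inf
    for d1 in range(-max_disp, max_disp + 1):
        for d2 in range(-max_disp, max_disp + 1):
            r, c = top + d1, left + d2
            if r < 0 or c < 0 or r + size > H or c + size > W:
                continue                             # candidate falls outside the frame
            err = mad(block, frame_k1[r:r + size, c:c + size])
            if err < best_err:
                best_d, best_err = (d1, d2), err
    return best_d, best_err

rng = np.random.default_rng(1)
frame_k = rng.integers(0, 256, size=(64, 64))         # hypothetical frame at time k
frame_k1 = np.roll(frame_k, (3, -2), axis=(0, 1))     # frame k + 1: everything moved by (3, -2)
print(full_search(frame_k, frame_k1, top=24, left=24))
```

With a ±7 range this evaluates up to 225 candidate blocks per block, which is the cost that faster search strategies try to avoid.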
Block Searching

I In practice it is far too time consuming to check all
possible matches. Instead, a subset of matches is checked
I First, the amount of translation is always limited:
$-M \le d_1, d_2 \le M$
I Three-step search is a typical strategy that is used. It
involves narrowing down the best location using a directed
search
I However, the result may be sub-optimal
Three-Step Search
I Step 1: Compute the error at 9 equally-spaced candidate positions
centered at $d_1 = d_2 = 0$:

I Step 2: Localize the search near the best match from Step 1:
Three-Step Search
I Step 3: Localize the search near the best match from Step 2:

I The motion estimate is then simply the displacement
between the current block (time $k$) and the best match (time
$k + 1$)
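The three steps can be sketched as a loop that halves the spacing of the 9 candidates each time. The spacings 4, 2, 1 (covering displacements up to ±7), the 16 × 16 block and the smooth synthetic frames are assumptions of this sketch; the search relies on the error decreasing towards the true displacement, which holds for the smooth blob used here but not for arbitrary content.

```python
import numpy as np

def three_step_search(frame_k, frame_k1, top, left, size=16, step=4):
    """Directed search: 9 candidates around the current best, spacing 4, then 2, then 1."""
    H, W = frame_k1.shape
    block = frame_k[top:top + size, left:left + size].astype(float)

    def cost(d1, d2):
        r, c = top + d1, left + d2
        if r < 0 or c < 0 or r + size > H or c + size > W:
            return np.inf                           # candidate falls outside the frame
        return np.mean(np.abs(block - frame_k1[r:r + size, c:c + size]))  # MAD

    best = (0, 0)
    while step >= 1:
        candidates = [(best[0] + a * step, best[1] + b * step)
                      for a in (-1, 0, 1) for b in (-1, 0, 1)]
        best = min(candidates, key=lambda d: cost(*d))
        step //= 2
    return best

# Hypothetical smooth test pattern: a Gaussian blob that moves by (3, -2).
y, x = np.mgrid[0:64, 0:64]
frame_k = np.exp(-((y - 31.5) ** 2 + (x - 31.5) ** 2) / (2 * 3.0 ** 2))
frame_k1 = np.roll(frame_k, (3, -2), axis=(0, 1))
print(three_step_search(frame_k, frame_k1, top=24, left=24))
```

Only 25 distinct candidates are evaluated instead of the 225 of an exhaustive ±7 search; the price is that the search can settle on a local minimum of the error surface.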
Comments on Match Search

I Discussed forward motion estimation between current frame
$k$ and next frame $k + 1$
I Backward motion estimation between current frame k and
previous frame k − 1 common
I Block-based motion estimation is really a form of optical flow
I Current (H.264, H.265) video coding standards use
block-based motion estimation
Motion-Compensated Transform Coding

I Compute block motion displacement vectors using loopback
from frame $I_k$ to frame $I_{k-1}$. Usually 16×16 blocks. Blocks
are non-overlapping in frame $I_k$. This is referred to as
interframe coding
I Compute motion-compensated difference image Dk by
differencing each 16×16 block in Ik with its corresponding
displaced block in Ik−1
I Subdivide difference image Dk into sub-blocks (usually 8×8)
and code using JPEG-like algorithm
I The first frame is coded like an image. This is referred to as
intraframe coding
Comments on Motion Compensation

I Motion compensation (MC) is highly effective for:


I Increasing compression efficiency
I Reducing “ghosting” artifacts in compressed video. Very fast
movements result in large differences between blocks. Without
MC, leads to “motion ghosts”
I Accommodating compression to temporal aliasing
I Current standards (H.264, H.265) use MC
Practical Video Codec – The Basics

I Earlier, defined video as time-indexed images


I Can sample in all three dimensions, yielding discrete video.
In practice the samples are also quantized, which gives digital video
I In principle, analog video is continuous in all three dimensions
I In practice, analog video is sampled along one spatial
dimension and along time dimension
Practical Video Codec – Analog Video

I An optical analog video signal is a function $I_c(x, y, t)$ of
space and time
I Practical video systems, such as television and monitors,
represent analog video as a one-dimensional electrical
signal V (t)
I A 1-D signal V (t) is obtained by sampling Ic (x, y , t) along
the vertical (y ) direction and along time (t) direction. This is
called scanning and the result is a series of scan lines
Analog Video Sampling

I Progressive Analog Video involves sampling row after row
at intervals $\Delta y$ and each frame at intervals $\Delta t$

I Interlaced Analog Video involves sampling even and odd
rows alternately
Digital Video Sampling

I Digital video is obtained either by sampling an analog video
signal or by directly sampling the 3-D intensity distribution
I If progressive analog video is sampled, or if digital video is
directly sampled, then the sampling is rectangular and
properly indexed
I If interlaced analog video is sampled, then the digital video is
also interlaced and must be re-indexed
Practical Video Codec – TV Standards

I NTSC (National Television System Committee)


I 2:1 interlaced
I 525 lines per frame (262.5 / refresh) - 485 active, 40 blank
I 60 refreshes / second
I Used heavily in Japan and North America
I PAL (Phase Alternation Line)
I 2:1 interlaced
I 625 lines per frame
I 50 refreshes / second
I Used heavily in Europe and Asia (including India)
I Older tube TVs used this format
Practical Video Codec – Aspect Ratio

Aspect ratio: The ratio of the width of a video frame to its height

Figure: Analog formats used 4:3 aspect ratio


Practical Video Codec – Color Basics
I Any color can be represented as a combination of Red (R),
Blue (B), and Green (G)
I RGB representation codes a color video as three separate
signals
I The YIQ representation combines the information according
to perceptual criteria:
Y = 0.299R + 0.587G + 0.114B (luminance)
I = 0.596R − 0.275G − 0.321B (chrominance)
Q = 0.212R − 0.523G + 0.311B (chrominance)
I Alternately, YCbCr chrominance representation:
Cr = R − Y (chrominance)
Cb = B − Y (chrominance)
I Why bother? Compression! Chrominance information can
be sent using a fraction of the bandwidth required for luminance
information
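The YIQ conversion is a 3 × 3 matrix product, sketched below. One assumption is made explicit: the G coefficient of the I channel is taken as negative (the usual NTSC convention), so that both chrominance rows sum to zero and a gray pixel carries no chrominance. The function names are hypothetical.

```python
import numpy as np

# Rows: Y, I, Q. The sign of the G term in the I row is assumed negative.
RGB_TO_YIQ = np.array([[0.299,  0.587,  0.114],
                       [0.596, -0.275, -0.321],
                       [0.212, -0.523,  0.311]])

def rgb_to_yiq(rgb):
    """Convert an (..., 3) array of RGB values to YIQ."""
    return np.asarray(rgb, dtype=float) @ RGB_TO_YIQ.T

def yiq_to_rgb(yiq):
    """Invert the conversion with the matrix inverse."""
    return np.asarray(yiq, dtype=float) @ np.linalg.inv(RGB_TO_YIQ).T

gray = [0.5, 0.5, 0.5]
red = [1.0, 0.0, 0.0]
print(rgb_to_yiq(gray), rgb_to_yiq(red))
```

Gray maps to Y = 0.5 with I = Q = 0, while pure red has luminance 0.299 and non-zero chrominance; the color information is concentrated in the two channels that can be transmitted more coarsely.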
Practical Video Codec – Resolution

Figure: Typical modern video resolutions


Practical Video Codec – HDTV Resolutions

The HDTV format:


I Has interlaced and progressive modes
I 720p: progressive, 1280×720 pixels, 60 frames per second
(fps). Raw BW (24 bits/pixel): 1.3 Gbps
I 1080i: interlaced, 1920×1080 pixels, 50 fields (25 frames) per
second (fps). Raw BW (24 bits/pixel): 1.2 Gbps
I 1080p: progressive, 1920×1080 pixels, 59.94 fps. Raw BW
(24 bits/pixel): 2.98 Gbps
I Aspect ratio: 16:9
I Typically compressed: MPEG-2 or H.264
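The raw bandwidth figures follow from multiplying pixels per frame, frames per second and bits per pixel. A short check, with a hypothetical helper name:

```python
def raw_bitrate_gbps(width, height, frames_per_second, bits_per_pixel=24):
    """Uncompressed bit rate in Gbps (10^9 bits per second)."""
    return width * height * frames_per_second * bits_per_pixel / 1e9

rates = {
    "720p":  raw_bitrate_gbps(1280, 720, 60),
    "1080i": raw_bitrate_gbps(1920, 1080, 25),     # 50 fields = 25 full frames per second
    "1080p": raw_bitrate_gbps(1920, 1080, 59.94),
}
for name, gbps in rates.items():
    print(name, round(gbps, 2))
```

These come out at about 1.33, 1.24 and 2.98 Gbps, matching the rounded values quoted above and showing why the signal is typically compressed.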
Practical Video Codec – Group of Pictures (GOP)

I Specifies the order in which Intra (I) and Inter (P, B) frames
are arranged in a video sequence

Figure: Traditional GOP structure


Video Compression Standard - H.264

I Standardized in 2003
I A large, complex video standard
I High-level overview here
I Good reference: “The H.264 Advanced Video Compression
Standard” by Iain E. Richardson, Wiley, 2010
H.264 - The Highlights

I Variable macroblock sizes - better motion estimation


I In-loop filtering - reduces blocking artifacts
I Integer transform - efficient implementation
I Improved lossless coding - CABAC
I Network Abstraction Layer (NAL) units - facilitates network
transport
Optical Flow

I Fundamental to the concept of motion, but nevertheless
different, is optical flow
I Optical flow is the instantaneous motion of image
intensities. This is not the same as the motion of the objects
being imaged: image motion is not object motion
I Examples:
I An “off-camera” variable light source illuminating a stationary
object. A case of image motion without object motion
I A mirrored sphere that is spinning. A case of object motion
without image motion
I Still, optical flow is all the motion information that the image
supplies! So, most methods of motion estimation, motion
compensation, etc. depend on it
Optical Flow – Continuous Formulation

I The image intensity at a point in space and time is $I(x, y, t)$
I After a sufficiently small time interval $\Delta t$, the intensity at
$(x, y)$ will move to a point $(x + \Delta x, y + \Delta y)$. In other words:
$$I(x + \Delta x, y + \Delta y, t + \Delta t) = I(x, y, t)$$
I Illustration on board . . .
I This assumes that the intensity does not change, just its
position
Optical Flow – Taylor Expansion

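I Expanding the LHS in a Taylor series:
$$I(x + \Delta x, y + \Delta y, t + \Delta t) = I(x, y, t) + \Delta x \frac{\partial I}{\partial x} + \Delta y \frac{\partial I}{\partial y} + \Delta t \frac{\partial I}{\partial t} + \text{higher order terms}$$
I So that
$$I(x, y, t) + \Delta x \frac{\partial I}{\partial x} + \Delta y \frac{\partial I}{\partial y} + \Delta t \frac{\partial I}{\partial t} + \text{higher order terms} = I(x, y, t)$$
I Setting the higher order terms to 0 (assuming small time and
motion), cancelling $I(x, y, t)$ and dividing by $\Delta t$:
$$\frac{\Delta x}{\Delta t} \frac{\partial I}{\partial x} + \frac{\Delta y}{\Delta t} \frac{\partial I}{\partial y} + \frac{\partial I}{\partial t} = 0$$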
Optical Flow Constraint Equation

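I Taking the limit $\Delta t \to 0$ yields:
$$\frac{\partial x}{\partial t} \frac{\partial I}{\partial x} + \frac{\partial y}{\partial t} \frac{\partial I}{\partial y} + \frac{\partial I}{\partial t} = 0$$
I The optical flow components are:
$$u(x, y, t) = \frac{\partial x}{\partial t}(x, y, t), \quad v(x, y, t) = \frac{\partial y}{\partial t}(x, y, t)$$
I Putting this together gives the optical flow constraint
equation or OFCE:
$$I_x u + I_y v + I_t = 0$$
I So-called since it does not solve for optical flow - it only
constrains the optical flow vector $(u, v)$ to lie on a line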
Optical Flow – The Aperture Problem

I Even knowing $I_x$, $I_y$, $I_t$ does not solve optical flow
I This is the aperture problem
I Imagine being able to view only a small region of the image
that is in motion:

Figure: Aperture problem

I If the edge is sensed to be moving “up”, the true motion
could actually be in any of the shown directions
I In order to solve for optical flow, some other physically
meaningful constraints must be found or assumed
Smooth Optical Flow

I The assumption that is usually made is that optical flow is
smooth. By smooth is meant that the derivatives of $u$ and $v$
have small magnitudes
I Solution involves minimizing the overall departure from
smoothness:
$$E_{\text{smooth}} = E_s = \iint_{\text{image}} [u_x^2 + u_y^2 + v_x^2 + v_y^2]\, dx\, dy$$
I We also want the overall OFCE error to be small:
$$E_c = \iint_{\text{image}} [I_x u + I_y v + I_t]^2\, dx\, dy$$
A Minimization Problem

I Minimize the weighted sum:


E = Es + λEc
I A solution will always exist and be unique
I For larger λ, the solution will track OFCE closely
I For smaller λ, the solution will be forced smoother
I Picking λ is a hard problem – not discussed here
I We will not try to solve the continuous problem
Discrete Optical Flow

I Approximations of derivatives of flow:
$$u_x \approx [u(i+1, j) - u(i, j)]/2, \quad u_y \approx [u(i, j+1) - u(i, j)]/2$$
$$v_x \approx [v(i+1, j) - v(i, j)]/2, \quad v_y \approx [v(i, j+1) - v(i, j)]/2$$
I Then, squaring these approximations (hence the factor $\frac{1}{4}$),
$$E_s = \frac{1}{4} \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} \{[u(i+1, j) - u(i, j)]^2 + [u(i, j+1) - u(i, j)]^2 + [v(i+1, j) - v(i, j)]^2 + [v(i, j+1) - v(i, j)]^2\},$$
$$E_c = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} [I_x(i, j) u(i, j) + I_y(i, j) v(i, j) + I_t(i, j)]^2$$
I The estimates $I_x(i, j)$, $I_y(i, j)$, $I_t(i, j)$ will be discussed soon . . .
Discrete Optimization

I The goal is to minimize: $E = E_s + \lambda E_c$
I Take derivatives w.r.t. $u(i, j)$, $v(i, j)$ for
$0 \le i \le N-1$, $0 \le j \le M-1$:
$$\frac{\partial E}{\partial u(i,j)} = 2[u(i, j) - u_{ave}(i, j)] + 2\lambda [I_x u(i, j) + I_y v(i, j) + I_t] I_x$$
$$\frac{\partial E}{\partial v(i,j)} = 2[v(i, j) - v_{ave}(i, j)] + 2\lambda [I_x u(i, j) + I_y v(i, j) + I_t] I_y$$
I The local 4-averages are:
$$u_{ave}(i, j) = \frac{1}{4}[u(i+1, j) + u(i-1, j) + u(i, j+1) + u(i, j-1)]$$
$$v_{ave}(i, j) = \frac{1}{4}[v(i+1, j) + v(i-1, j) + v(i, j+1) + v(i, j-1)]$$
Discrete Solution

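I The minima occur when the derivatives are zero:
$$\frac{\partial E}{\partial u(i,j)} = \frac{\partial E}{\partial v(i,j)} = 0$$
I This results in:
$$(1 + \lambda I_x^2) u(i, j) + \lambda I_x I_y v(i, j) = u_{ave}(i, j) - \lambda I_x I_t$$
$$(1 + \lambda I_y^2) v(i, j) + \lambda I_x I_y u(i, j) = v_{ave}(i, j) - \lambda I_y I_t$$
I Solving for $u(i, j)$, $v(i, j)$ yields:
$$u(i, j) = u_{ave}(i, j) - \lambda \frac{I_x u_{ave}(i, j) + I_y v_{ave}(i, j) + I_t}{1 + \lambda (I_x^2 + I_y^2)} I_x$$
$$v(i, j) = v_{ave}(i, j) - \lambda \frac{I_x u_{ave}(i, j) + I_y v_{ave}(i, j) + I_t}{1 + \lambda (I_x^2 + I_y^2)} I_y$$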
Intensity Gradient Estimation

I The derivatives $I_x$, $I_y$, $I_t$ can also be estimated as
differences-of-averages across a 2 × 2 × 2 data cube:
I $I_x \approx \frac{1}{4}\{[I(i+1, j, k) + I(i+1, j, k+1) + I(i+1, j+1, k) + I(i+1, j+1, k+1)] - [I(i, j, k) + I(i, j, k+1) + I(i, j+1, k) + I(i, j+1, k+1)]\}$
I $I_y \approx \frac{1}{4}\{[I(i, j+1, k) + I(i, j+1, k+1) + I(i+1, j+1, k) + I(i+1, j+1, k+1)] - [I(i, j, k) + I(i, j, k+1) + I(i+1, j, k) + I(i+1, j, k+1)]\}$
I $I_t \approx \frac{1}{4}\{[I(i, j, k+1) + I(i, j+1, k+1) + I(i+1, j, k+1) + I(i+1, j+1, k+1)] - [I(i, j, k) + I(i, j+1, k) + I(i+1, j, k) + I(i+1, j+1, k)]\}$
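The three differences-of-averages can be computed for a whole frame pair with array slicing. In this sketch the first array axis is taken to be $i$ and the second $j$, the outputs are one sample smaller than the frames in each direction (only pixels with a complete 2 × 2 × 2 cube get an estimate), and the linear test ramp is hypothetical.

```python
import numpy as np

def corner(a, di, dj):
    """a[i + di, j + dj] for every (i, j) that has a full 2 x 2 neighbourhood."""
    n, m = a.shape
    return a[di:n - 1 + di, dj:m - 1 + dj]

def gradients(f0, f1):
    """f0 = I(., ., k), f1 = I(., ., k + 1). Returns Ix, Iy, It of shape (N-1, M-1)."""
    f0, f1 = np.asarray(f0, float), np.asarray(f1, float)
    Ix = 0.25 * sum(corner(f, 1, dj) - corner(f, 0, dj) for f in (f0, f1) for dj in (0, 1))
    Iy = 0.25 * sum(corner(f, di, 1) - corner(f, di, 0) for f in (f0, f1) for di in (0, 1))
    It = 0.25 * sum(corner(f1, di, dj) - corner(f0, di, dj) for di in (0, 1) for dj in (0, 1))
    return Ix, Iy, It

# Hypothetical linear ramp I(i, j, k) = 2i + 3j + 5k.
i, j = np.mgrid[0:6, 0:7]
f0 = 2 * i + 3 * j
f1 = f0 + 5
Ix, Iy, It = gradients(f0, f1)
print(Ix[0, 0], Iy[0, 0], It[0, 0])
```

On a linear ramp the estimates recover the slopes exactly (2, 3 and 5); on real images the averaging over the cube also suppresses some noise.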
Iterative Solution

I The solution for $u(i, j)$ and $v(i, j)$ suggests a numerical
algorithm for actually computing them. The relaxation
algorithm is:
$$u^{(p+1)}(i, j) = u_{ave}^{(p)}(i, j) - \lambda \frac{I_x u_{ave}^{(p)}(i, j) + I_y v_{ave}^{(p)}(i, j) + I_t}{1 + \lambda (I_x^2 + I_y^2)} I_x$$
$$v^{(p+1)}(i, j) = v_{ave}^{(p)}(i, j) - \lambda \frac{I_x u_{ave}^{(p)}(i, j) + I_y v_{ave}^{(p)}(i, j) + I_t}{1 + \lambda (I_x^2 + I_y^2)} I_y$$
I This technique of computing a “new” estimate from “old”
estimates is a common technique in numerical analysis called
successive refinement
Initial Estimates

I The initial estimates $u^{(0)}(i, j)$, $v^{(0)}(i, j)$ might be taken from
some independent estimate of $u$, $v$ or simply by taking
$u^{(0)}(i, j) = v^{(0)}(i, j) = 0$, which gives
$$u^{(1)}(i, j) = -\lambda \frac{I_t I_x}{1 + \lambda (I_x^2 + I_y^2)}$$
$$v^{(1)}(i, j) = -\lambda \frac{I_t I_y}{1 + \lambda (I_x^2 + I_y^2)}$$
Iteration Limit

I The iterations are continued either:
I for a prescribed number $P$ of iterations
I until iterating doesn’t change the solution much, e.g.,
$\max_{(i,j)} |u^{(p+1)}(i, j) - u^{(p)}(i, j)| < \epsilon$ where $\epsilon$ is a tolerance
threshold
I Although in principle it could take $N$ iterations for the
constraints to propagate across the image domain, in practice
it takes just a few iterations due to the localness of image
motion
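The relaxation update and both stopping rules can be sketched together. This is a minimal illustration under stated assumptions: the gradients are passed in as arrays, border pixels reuse their own value when forming the 4-averages (the slides do not specify border handling), and the values of λ, the iteration cap and the tolerance are hypothetical.

```python
import numpy as np

def local_average(f):
    """4-neighbour average; the border is handled by edge replication (an assumption)."""
    p = np.pad(f, 1, mode="edge")
    return 0.25 * (p[2:, 1:-1] + p[:-2, 1:-1] + p[1:-1, 2:] + p[1:-1, :-2])

def horn_schunck(Ix, Iy, It, lam=1.0, max_iter=100, tol=1e-4):
    u = np.zeros_like(Ix, dtype=float)     # u^(0) = v^(0) = 0
    v = np.zeros_like(Ix, dtype=float)
    denom = 1.0 + lam * (Ix ** 2 + Iy ** 2)
    for _ in range(max_iter):              # prescribed maximum number of iterations
        u_ave, v_ave = local_average(u), local_average(v)
        common = lam * (Ix * u_ave + Iy * v_ave + It) / denom
        u_new, v_new = u_ave - common * Ix, v_ave - common * Iy
        change = max(np.abs(u_new - u).max(), np.abs(v_new - v).max())
        u, v = u_new, v_new
        if change < tol:                   # solution no longer changes much
            break
    return u, v

# Hypothetical gradients of a pattern moving one pixel per frame along x: Ix*1 + It = 0.
Ix = np.ones((5, 5))
Iy = np.zeros((5, 5))
It = -np.ones((5, 5))
u1, v1 = horn_schunck(Ix, Iy, It, max_iter=1)
u, v = horn_schunck(Ix, Iy, It)
print(u1[0, 0], u[0, 0], v[0, 0])
```

After one iteration $u = 0.5$, which is the value $-\lambda I_t I_x / (1 + \lambda(I_x^2 + I_y^2))$ given for $u^{(1)}$; further iterations move $u$ towards 1 while $v$ stays at 0.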
Optical Flow Example

Figure: (a) Original flow, (b) Estimated flow

Needle diagram: Arrow direction indicates flow direction and its
length indicates flow magnitude
Optical Flow Example

I Results are accurate in most places but errors occur near
the sphere boundary
I Errors occur near flow discontinuities - the smoothness
condition is inaccurate there
I This is the Horn-Schunck Algorithm, the first and still classic
approach
I Many sophisticated techniques exist, e.g., attempting to find
flow discontinuities, then disabling the smoothness constraint
there
