Unit 5

Motion estimation refers to calculating the movement of objects or features in a sequence of images over time. It estimates the motion field, which describes how pixels or features move between frames. There are two main types of motion estimation: global, which estimates uniform motion of the entire image, and local, which focuses on smaller regions. Popular algorithms include block matching, Lucas-Kanade, and deep learning approaches. Dense motion estimation calculates a motion vector for every pixel to determine detailed motion across frames and is used for video compression, image stabilization, and action recognition.


UNIT-4

MOTION ESTIMATION
Chap 6,7,8
Introduction
• Refers to the process of calculating the motion or movement of
objects or features within a sequence of images or frames over time.
• The primary goal of motion estimation is to estimate the motion
field, which describes how pixels or features move from one frame to
another.
• This information is crucial for understanding the dynamics of a scene and can
be used to make predictions about future movements.
• In summary, motion estimation is a crucial aspect of computer vision,
enabling the understanding and analysis of the dynamics and movements
within a sequence of images or video frames. It has broad applications in
fields like video processing, surveillance, robotics, and autonomous
vehicles. This technique is fundamental for various applications,
including video compression, object tracking, optical flow estimation,
image stabilization, action recognition, and more.
Types
• Global Motion Estimation:
• Global motion models assume that the entire image or a large portion of it undergoes a
uniform motion. Common global motion models include translations, rotations, scaling,
and affine transformations. These models are useful for estimating camera motion or
large rigid motions in a scene.
• Local Motion Estimation:
• Local motion estimation focuses on estimating motion at a smaller scale, typically
within small regions or patches of an image. This is often achieved by estimating the
optical flow, which provides the motion vector for each pixel or feature between
consecutive frames. Optical flow can be dense (computed for all pixels) or sparse
(computed for selected feature points).

ALGORITHMS USED
• Block matching, Lucas-Kanade method, Horn-Schunck method, Farnebäck
method
• Deep learning-based approaches like CNNs
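
To make the simplest of these concrete, here is a brief illustrative sketch (not from the slides) of exhaustive block matching with a sum-of-absolute-differences (SAD) cost; the frames and block location are synthetic placeholders.

```python
import numpy as np

def block_match(prev, curr, top, left, block=16, search=8):
    """Exhaustive block matching: find the (dy, dx) displacement of one block
    of `prev` inside a +/- `search` pixel window of `curr`, minimizing SAD."""
    ref = prev[top:top + block, left:left + block].astype(np.int32)
    best_sad, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > curr.shape[0] or x + block > curr.shape[1]:
                continue                                   # candidate block out of bounds
            cand = curr[y:y + block, x:x + block].astype(np.int32)
            sad = np.abs(ref - cand).sum()                 # sum of absolute differences
            if best_sad is None or sad < best_sad:
                best_sad, best_vec = sad, (dy, dx)
    return best_vec

# Synthetic check: the second frame is the first shifted right by 3 pixels.
prev = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
curr = np.roll(prev, 3, axis=1)
print(block_match(prev, curr, top=24, left=24))            # expected (0, 3)
```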
Feature Based Alignment (Chap-6)
• 2D and 3D feature-based alignment
• Pose estimation
• Geometric intrinsic calibration
SfM Techniques (Chap-7)
• Introduction & Concepts - SfM ppt
• Triangulation
• Factorization
• Bundle Adjustment
• Constrained structure and motion
Triangulation
• The problem of determining a point’s 3D position from a set of
corresponding image locations and known camera positions is known as
triangulation.
• Triangulation is fundamental in computer vision and 3D reconstruction, enabling
the reconstruction of a 3D scene or object from multiple 2D views.

If the points xL and xR are known, their projection lines are also known. If the
two image points correspond to the same 3D point X, the projection lines must
intersect precisely at X. This means that X can be calculated from the
coordinates of the two image points, a process called triangulation.
• A 3D point x in 3D space is projected onto each image plane along a line (green) which goes through the camera's focal point, O1 or O2, resulting in two image points y1 and y2.
• If y1 and y2 are given and the geometry of the two cameras is known, the two projection lines (green) can be determined, and they must intersect at the 3D point x.
• Using basic linear algebra, that intersection point can be determined in a straightforward way.
• Due to effects such as lens distortion and imperfectly known camera positions, the measured image points are y1' and y2'. Their projection lines (blue) do not have to intersect in 3D space or come close to x.
• Triangulation solves this problem: which 3D point xest is the best estimate of x given y1', y2', and the geometry of the cameras?
• The answer is often found by defining an error measure which depends on xest and then minimizing this error.

[Figure: The ideal case of epipolar geometry. A 3D point x is projected onto two camera images through lines (green) which intersect with each camera's focal point, O1 and O2. The resulting image points are y1 and y2; the green lines intersect at x.]

[Figure: In practice, the image points cannot be measured with arbitrary accuracy. Instead, points y1' and y2' are detected and used for the triangulation. The corresponding projection lines (blue) do not, in general, intersect in 3D space and may not pass through point x.]
• All triangulation methods produce xest = x in the case that y1' = y1 and y2' = y2, that is,
when the epipolar constraint is satisfied.
• If the relative position of the two cameras is known, this leads to two important
observations:
• Assume the projection point xL is known. Then the epipolar line eR–xR is also known, and the
point X projects into the right image onto a point xR which must lie on this particular
epipolar line.
• This means that for each point observed in one image, the same point must be observed
in the other image on a known epipolar line. This provides an epipolar constraint: the
projection of X on the right camera plane xR must be contained in the eR–xR epipolar
line.

• Epipolar constraints can also be described by the essential matrix or the fundamental matrix between the two cameras.
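
As an illustrative sketch (not part of the original slides), the linear (DLT) formulation below estimates xest from two views by stacking the projection equations and taking the SVD; the camera matrices P1, P2 and the 3D test point are hypothetical placeholders.

```python
import numpy as np

def triangulate_point(P1, P2, y1, y2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices.
    y1, y2: (x, y) image coordinates of the same point in each view.
    Returns x_est, the algebraic least-squares estimate of the 3D point.
    """
    A = np.array([
        y1[0] * P1[2] - P1[0],
        y1[1] * P1[2] - P1[1],
        y2[0] * P2[2] - P2[0],
        y2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector belonging to the
    # smallest singular value of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]                       # de-homogenize

# Hypothetical cameras: camera 1 at the origin, camera 2 shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
x_true = np.array([0.5, 0.2, 4.0, 1.0])       # homogeneous ground-truth point
y1 = (P1 @ x_true)[:2] / (P1 @ x_true)[2]
y2 = (P2 @ x_true)[:2] / (P2 @ x_true)[2]
print(triangulate_point(P1, P2, y1, y2))      # ~ [0.5, 0.2, 4.0]
```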
Factorization
Bundle Adjustment
Dense Motion Estimation(Chap-8)
• Dense motion estimation involves calculating motion vectors for every
pixel or a dense set of pixels in an image or frame sequence.
• The objective is to determine how each pixel has moved between consecutive
frames, providing a detailed motion field across the entire image.
Dense motion estimation is essential for various computer vision tasks:
1. Video Compression: Understanding motion between frames is
crucial for efficient video compression. Motion compensation can be
used to predict the content of one frame based on the motion
observed in previous frames.
2. Image Stabilization: Analyzing and compensating for camera shake
or unwanted motion in videos to stabilize the frames.
3. Action Recognition: Recognizing human actions or movements in
videos requires accurate estimation of dense motion fields to
understand how objects or body parts move over time.
4. Object Tracking: Continuously tracking the movement of objects or
features in a scene by estimating their motion from frame to frame.
• In this technique, motion is typically represented as a 2D vector
field, often referred to as optical flow. Each vector in the optical
flow field corresponds to the estimated motion (displacement) of
a pixel from one frame to another.
• The optical flow vectors can convey information about the
direction and magnitude of pixel movements.
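
As a minimal illustration of this representation (not from the slides), the snippet below converts a dense flow field, one 2D vector per pixel, into per-pixel magnitude and direction; the flow array is a synthetic placeholder.

```python
import numpy as np

# Hypothetical dense flow field of shape (H, W, 2): flow[..., 0] is the
# horizontal displacement u and flow[..., 1] the vertical displacement v,
# in pixels per frame.
flow = np.zeros((240, 320, 2), dtype=np.float32)
flow[..., 0] = 2.0    # every pixel moves 2 px to the right
flow[..., 1] = -1.0   # and 1 px up (image y axis points down)

magnitude = np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)   # speed per pixel
direction = np.arctan2(flow[..., 1], flow[..., 0])           # angle in radians

print(magnitude.mean(), np.degrees(direction.mean()))
```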
Dense Motion Estimation
• Translational alignment
• Parametric motion
• Spline-based motion
• Optical flow
• Layered motion
Translational alignment
• Algorithms for aligning images and estimating motion in video sequences
are among the most widely used in computer vision.
• A widely used image registration algorithm is the patch-based translational
alignment (optical flow) technique developed by Lucas and Kanade (1981).
• Many other motion estimation techniques exist:
• Parametric motion models: consider global transformations such as rotation, shear, etc.
• Learning the dynamics or statistics of motion: e.g., gait analysis
• Spline-based motion models: for complex motions
• Translational alignment (TA) in motion estimation refers to the estimation of
translational motion, which involves movement along a straight line without
rotation or scaling.
• Specifically, it involves determining the horizontal and vertical
displacements (translations) of objects or features in an image or video
sequence between consecutive frames.
• When applying translation alignment in motion estimation, the
assumption is made that the movement between frames can be
accurately modeled as a pure translation.
• This is a simplifying assumption that holds true for scenarios where the
motion is relatively small or when the objects being tracked or analyzed
exhibit primarily translational movement.
• The goal of translation alignment is to calculate the offset or
shift required to align the corresponding features or pixels in
one frame with those in another frame.
• This alignment information is then used to estimate the translation
motion vector, which specifies the horizontal and vertical shifts needed
to match the features or objects.
• Algorithms for translation alignment often involve optimization
techniques to find the best translation parameters that minimize
the difference or error between corresponding pixels or features
in the frames.
• Common methods include cross-correlation, mean square error (MSE)
minimization, normalized cross-correlation, and phase correlation,
among others.
• Translational alignment is best suited to scenarios where translational motion is
the dominant or primary form of movement (see the sketch below).
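
A minimal sketch of one Fourier-based variant, assuming pure translation: phase correlation locates the shift as the peak of the inverse FFT of the normalized cross-power spectrum. The frames below are synthetic placeholders.

```python
import numpy as np

def phase_correlation_shift(frame1, frame2):
    """Estimate the integer (dy, dx) translation of frame2 relative to frame1
    using phase correlation (a Fourier-based alignment method)."""
    F1 = np.fft.fft2(frame1)
    F2 = np.fft.fft2(frame2)
    cross_power = np.conj(F1) * F2
    cross_power /= np.abs(cross_power) + 1e-12            # keep phase only
    correlation = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Map shifts larger than half the image size to negative displacements.
    if dy > frame1.shape[0] // 2:
        dy -= frame1.shape[0]
    if dx > frame1.shape[1] // 2:
        dx -= frame1.shape[1]
    return dy, dx

# Synthetic check: frame2 is frame1 translated by (dy, dx) = (3, 5).
frame1 = np.random.rand(64, 64)
frame2 = np.roll(frame1, shift=(3, 5), axis=(0, 1))
print(phase_correlation_shift(frame1, frame2))             # expected (3, 5)
```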
How is it done?
• Error Metrics
• Optimization problem of displacement error
• Methods
• Hierarchical motion estimation
• Fourier Based Alignment
• Incremental Refinement
Parametric Motion Models
• Parametric motion in motion estimation refers to the representation
of motion using a specific mathematical or parametric model.
• In this approach, the motion between frames is assumed to follow a
predefined mathematical model with a set of parameters that describe the
motion.
• Estimating these parameters allows for the characterization and prediction
of motion in the sequence of images or frames.
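
As a sketch of fitting one such parametric model, the snippet below estimates a 2D affine motion model from matched point coordinates with OpenCV's estimateAffine2D (RANSAC by default); the matches are synthetic placeholders standing in for real feature correspondences between two frames.

```python
import cv2
import numpy as np

# Hypothetical matched feature coordinates in frame t (pts_prev); the matches
# in frame t+1 are synthesized with a known small rotation plus translation.
pts_prev = np.array([[10, 10], [200, 40], [50, 180], [220, 200], [120, 90]],
                    dtype=np.float32)
angle = np.deg2rad(2.0)
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]], dtype=np.float32)
pts_next = pts_prev @ R.T + np.float32([5.0, -3.0])        # rotate, then shift

# Fit a 2x3 affine motion model [A | t]; the default RANSAC scheme rejects
# outlier matches when they are present.
M, inliers = cv2.estimateAffine2D(pts_prev, pts_next)
print(M)            # recovered parametric (affine) motion between the frames
```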
Optical Flow
• It involves tracking the movement of pixels or features in
successive frames of a video sequence to understand how
objects are moving and where they are located in the scene.
• Optical flow is a technique used to describe image motion. It is
usually applied to a series of images that have a small time step
between them, for example, video frames.
• Optical flow calculates a velocity for points within the images and provides
an estimate of where those points will be in the next image of the sequence.
• Hence, the optical flow is a vector field describing the displacement of each
pixel between two consecutive frames; it thus allows us to determine how the
objects in the scene move.
What is it?
• This motion can also tell you how close you are to the different objects
you see. Distant objects like clouds, and mountains move so slowly
they appear still. The objects that are closer, such as buildings and
trees, appear to move backwards, with the closer objects moving faster
than the distant objects. Very close objects, such as grass or small
signs by the road, move so fast they whiz right by you.
• In addition to detecting obstacles, optical flow can be used to measure or
estimate one's own motion (ego-motion).
• Basic Principle:
• Optical flow is based on the assumption that neighboring pixels in an
image will have similar motion.
• It tries to estimate the velocity or displacement of each pixel or a set of
feature points between consecutive frames.
• Algorithms:
• There are different methods to calculate optical flow, such as Lucas-Kanade,
Horn-Schunck, and Farneback. Optical flow is widely used in fields such as
video compression, object tracking, action recognition, autonomous vehicles,
augmented reality, and so forth.
• The application of optical flow includes the problem of inferring not only the motion
of the observer and objects in the scene, but also the structure of objects and the
environment.
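
As an illustrative sketch (not from the slides), dense optical flow between two frames can be computed with OpenCV's Farneback implementation; the frames below are synthetic stand-ins for consecutive video frames.

```python
import cv2
import numpy as np

# Synthetic stand-ins for two consecutive grayscale frames: a textured
# pattern and the same pattern shifted 3 pixels to the right.
x = np.arange(320, dtype=np.float32)
y = np.arange(240, dtype=np.float32)
prev_gray = ((np.sin(x / 8)[None, :] + np.cos(y / 8)[:, None]) * 60 + 128).astype(np.uint8)
next_gray = np.roll(prev_gray, 3, axis=1)

# Farneback dense optical flow: one (u, v) vector per pixel.
# Positional parameters: flow init, pyr_scale, levels, winsize, iterations,
# poly_n, poly_sigma, flags.
flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

print(flow.shape)                 # (240, 320, 2)
print(flow[..., 0].mean())        # average horizontal displacement, roughly 3
```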
Lucas-Kanade Method
• Based on the brightness constancy assumption: the key idea is that
pixel-level brightness does not change much from one frame to the next.
• Assumes that the optical flow is constant in a small
neighborhood around a pixel.
• The optical flow is generally computed using the brightness constancy
constraint. This assumption states that a point at location (x, y) and time t
will have moved by (Δx, Δy) in the short time interval Δt while its
brightness I remains the same, i.e.:
I(x, y, t) = I(x + Δx, y + Δy, t + Δt)
A first-order Taylor expansion of the right-hand side yields the optical flow
equation Ix·u + Iy·v + It = 0, where:
• Ix and Iy are the spatial gradients of the image intensity in the x
and y directions, respectively.
• u and v are the components of the optical flow vector in the x
and y directions, respectively.
• The term It is the temporal gradient of the intensity, representing the
change in intensity between frames.
• The Lucas-Kanade method assumes that the optical flow (u, v) is constant
within a small window W containing N pixels (for example, an n×n
neighborhood). Hence, the optical flow equation holds for all pixels of
coordinates q = (k, l) within the window W:
Ix(q1)·u + Iy(q1)·v = -It(q1)
Ix(q2)·u + Iy(q2)·v = -It(q2)
...
Ix(qN)·u + Iy(qN)·v = -It(qN)
• These equations form a linear system that can be written in matrix form as
A v = b, where A is a matrix of size N×2 containing the image gradient
components [Ix(qi), Iy(qi)] evaluated for each pixel of the window W, b is the
vector of negated temporal gradients -It(qi), and v is the vector representing
the optical flow of the window W that we are going to estimate.
• The least-squares solution is v = (AᵀA)⁻¹ Aᵀ b.
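
A minimal NumPy sketch of this per-window least-squares solution, assuming the spatial and temporal gradients have already been computed (e.g., by finite differences); the gradient arrays below are synthetic and chosen so the true flow is known.

```python
import numpy as np

def lucas_kanade_window(Ix, Iy, It):
    """Least-squares flow (u, v) for one window W.

    Ix, Iy, It: spatial and temporal intensity gradients sampled at the N
    pixels of the window (any shape; they are flattened).
    """
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)   # N x 2 gradient matrix
    b = -It.ravel()                                  # right-hand side -It(q_i)
    # Equivalent to v = (A^T A)^-1 A^T b when A^T A is invertible.
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v

# Synthetic check: gradients consistent with a true flow of (u, v) = (1.0, 0.5),
# so Ix*u + Iy*v + It = 0 holds exactly at every pixel of a 5x5 window.
rng = np.random.default_rng(0)
Ix = rng.standard_normal((5, 5))
Iy = rng.standard_normal((5, 5))
It = -(Ix * 1.0 + Iy * 0.5)
print(lucas_kanade_window(Ix, Iy, It))               # ~ [1.0, 0.5]
```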

Challenges and Limitations
• Ambiguity: Optical flow estimation can be ambiguous, especially in areas with low texture or occlusions.
• Aperture Problem: It is challenging to determine the true motion direction when only a limited portion of the object is visible in the image.
• Illumination Changes: Changes in lighting conditions can violate the brightness constancy assumption.
OF for Object Tracking using OpenCV
• https://mpolinowski.github.io/docs/IoT-and-Machine-Learning/ML/2021-12-10--opencv-optical-flow-tracking/2021-12-10/
• Shi-Tomasi corner detection is used to get key points (corners of objects),
which are then passed to the LK method, as sketched below.
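
A short sketch of that pipeline following the common OpenCV pattern (Shi-Tomasi corners fed into pyramidal Lucas-Kanade); the video path and parameter values are placeholders.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("video.mp4")                      # placeholder path
ok, old_frame = cap.read()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)

# Shi-Tomasi corner detection: key points worth tracking.
p0 = cv2.goodFeaturesToTrack(old_gray, maxCorners=100, qualityLevel=0.3,
                             minDistance=7, blockSize=7)

lk_params = dict(winSize=(15, 15), maxLevel=2,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Pyramidal Lucas-Kanade: track the corners into the new frame.
    p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)
    good_new = p1[st == 1]
    good_old = p0[st == 1]

    # Draw the per-feature motion vectors.
    for (x1, y1), (x0, y0) in zip(good_new, good_old):
        cv2.line(frame, (int(x0), int(y0)), (int(x1), int(y1)), (0, 255, 0), 2)

    cv2.imshow("LK tracking", frame)
    if cv2.waitKey(30) & 0xFF == 27:                     # Esc quits
        break

    old_gray = frame_gray
    p0 = good_new.reshape(-1, 1, 2)

cap.release()
cv2.destroyAllWindows()
```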
