
Unit-5

Introduction to video processing

Dr. Y. N. Thakare
CSE(AIML)
RCOEM, Nagpur
Video
• Analog video is represented as a continuous (time-varying) signal.

• Digital video is represented as a sequence of digital images.
Persistence of vision
• A video camera is a device which captures a series of images over a period of time. This series of images is played back on the TV monitor at a rate fast enough to create a perception of continuous motion for the viewer.
• This is where persistence of vision comes into play. The human
eye retains an image presented in front of it for a short time after
the image is no longer visible.
• This tendency of retaining an image in memory is called
persistence of vision.
Analog video
• We begin by trying to understand how scanning takes place in analog television.
• In TV sets, the electron beam scans the picture from top left corner to
bottom right corner and then travels back from the bottom edge of the
screen to the top edge to start scanning the second frame.
• Hence the scene is simultaneously scanned in the horizontal as well as the
vertical direction.
• The screen is blanked out during the time the beam moves from the bottom
of the screen to the top of the screen.
• This time is also known as flyback time.
• Persistence of vision ensures that the picture looks continuous. Hence the
scene is scanned rapidly both in horizontal and vertical directions
simultaneously to give enough number of complete pictures per second so
that it gives the illusion of continuous motion.
Scanning process
Horizontal Scanning
The horizontal deflection coils deflect the electron beam from left to right. This deflection is produced by a linear increase of current in the coils. As the trace of the sawtooth wave rises, the horizontal deflection moves to the right. After the peak, the current rapidly returns to its initial value and the beam flies back to the left.
Vertical Scanning
• The job of the horizontal deflection coils is to scan the beam from the
left to the right. Vertical deflection coils are required to make the
beam move in the vertical direction. This is necessary otherwise the
same horizontal line will be scanned again and again. Vertical
deflection coils, like the horizontal deflection coils also produce
sawtooth current required for moving the electron beam from top to
bottom of the raster. After coming down to the bottom of the raster,
the rapid vertical retrace moves the beam again to the top.
Example of analog video
• NTSC Video
• NTSC stands for National Television System Committee (of the U.S.A.).
• The NTSC TV standard is mostly used in North America and
Japan.
• It uses a familiar 4:3 aspect ratio (i.e., the ratio of picture
width to height) and 525 (interlaced) scan lines per frame at
30 fps.
PAL Video
• PAL (Phase Alternating Line) is a TV standard originally invented
by German scientists.
• This important standard is widely used in Western Europe, China,
India, and many other parts of the world.
• Because it has higher resolution than NTSC, the visual quality of
its pictures is generally better.
Digital Video
• Digital video is audio/visual content in a binary format, with information presented as a sequence of digital data.
• The advantages of digital representation for video:
• Storing video on digital devices or in memory, ready to
be processed (noise removal, cut and paste, and so on)
and integrated into various multimedia applications.
• Direct access, which makes nonlinear video editing
simple.
• Repeated recording without degradation of image quality.
• Ease of encryption and better tolerance to channel noise.
• High-Definition TV
• Ultra High Definition TV (UHDTV)
Analog Display Interfaces
• Analog video signals are often transmitted over one of three different interfaces:
• Component video,
• Composite video, and
• S-video.

• Figure 5.7 shows the typical connectors for them


Digital Display Interfaces
• Given the rise of digital video processing and the monitors that directly accept digital video signals, there is great demand for video display interfaces that transmit digital video signals.
• Such interfaces emerged in the 1980s (e.g., the Color Graphics Adapter (CGA)).
• Today, the most widely used digital video interfaces include Digital Visual Interface (DVI), High-Definition Multimedia Interface (HDMI), and DisplayPort, as shown in Fig. 5.8
Digital Video Processing
• Digital video processing refers to manipulation of the digital video
bitstream.
• All known applications of digital video today require digital processing for data compression. In addition, some applications may benefit from additional processing for motion analysis, standards conversion, enhancement, and restoration.
• Digital processing of still images has found use in military,
commercial, and consumer applications since the early 1960s.
Space missions, surveillance imaging, night vision, computed
tomography, magnetic resonance imaging, and fax machines are
just some examples.
TIME-VARYING IMAGE FORMATION
MODEL
• We represent a time-varying image by a function of three continuous variables, $s_c(x_1, x_2, t)$, which is formed by projecting a time-varying three-dimensional (3-D) spatial scene into the two-dimensional (2-D) image plane.
• The temporal variations in the 3-D scene are usually due to
movements of objects in the scene. Thus, time-varying images
reflect a projection of 3-D moving objects into the 2-D image
plane as a function of time.
Three-Dimensional Motion Models
• In this section, we address modeling of the relative 3-D motion
between the camera and the objects in the scene. This includes 3-D
motion of the objects in the scene, such as translation and rotation,
as well as the 3-D motion of the camera, such as zooming and
panning.
• In the following, models are presented to describe the relative
motion of a set of 3-D object points and the camera, in the
Cartesian coordinate system $(X_1, X_2, X_3)$ and in the homogeneous coordinate system $(kX_1, kX_2, kX_3, k)$, respectively.
Rigid Motion in the Cartesian Coordinates

The rigid motion of an object point can be modeled as a rotation followed by a translation,

$$\mathbf{X}' = \mathbf{R}\,\mathbf{X} + \mathbf{T}$$

where $\mathbf{R}$ is the $3 \times 3$ rotation matrix, $\mathbf{T}$ is the 3-D translation vector, and $\mathbf{X} = (X_1, X_2, X_3)^T$ and $\mathbf{X}' = (X_1', X_2', X_3')^T$ denote the coordinates of an object point at times t and t' with respect to the center of rotation, respectively.
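In the homogeneous coordinates introduced above, rotation and translation combine into a single matrix; the standard formulation is:

$$\begin{pmatrix} \mathbf{X}' \\ 1 \end{pmatrix} = \begin{pmatrix} \mathbf{R} & \mathbf{T} \\ \mathbf{0}^T & 1 \end{pmatrix} \begin{pmatrix} \mathbf{X} \\ 1 \end{pmatrix}$$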
Geometric Image Formation
Perspective Projection
• Perspective projection reflects image formation using an ideal
pinhole camera according to the principles of geometrical optics.
Thus, all the rays from the object pass through the center of
projection, which corresponds to the center of the lens. For this
reason, it is also known as “central projection.”
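With the center of projection at the origin and the image plane at distance $f$ (the focal length) along the optical axis, the standard perspective mapping of a scene point $(X_1, X_2, X_3)$ to an image point $(x_1, x_2)$ is:

$$x_1 = \frac{f\,X_1}{X_3}, \qquad x_2 = \frac{f\,X_2}{X_3}$$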
Orthographic Projection
• Orthographic projection is an approximation of the actual imaging
process where it is assumed that all the rays from the 3-D object
(scene) to the image plane travel parallel to each other. For this
reason it is sometimes called the “parallel projection.”
Motion Estimation
• Motion estimation is a process used in video processing to
determine the motion of objects within a video sequence. It
involves analyzing the changes in pixel values between frames of
a video to estimate the motion of objects within the scene.
• Motion estimation is an important step in many video processing
applications such as video compression, video stabilization, and
object tracking.
• By accurately estimating the motion of objects within a video, it
becomes possible to identify and track moving objects, remove
camera motion, and compress video data by only transmitting the
changes between frames instead of transmitting the entire frame.
• There are various techniques used for motion estimation, including
block matching, optical flow, and phase correlation. These
methods involve comparing the pixel values in different frames to
estimate the motion of objects between them. The accuracy of the
motion estimation depends on the quality of the video data and the
complexity of the motion within the scene.
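As an illustration of the block-matching approach named above, here is a minimal NumPy sketch that finds, for one block of the current frame, the best-matching block in the previous frame using the sum of absolute differences (SAD); the function name, block size, and search range are illustrative assumptions.

```python
import numpy as np

def block_match(prev_frame, cur_frame, top, left, block=16, search=8):
    """Exhaustive SAD search for the block at (top, left) of cur_frame."""
    target = cur_frame[top:top + block, left:left + block].astype(np.int32)
    best_sad, best_mv = np.inf, (0, 0)
    h, w = prev_frame.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue  # candidate window falls outside the frame
            cand = prev_frame[y:y + block, x:x + block].astype(np.int32)
            sad = int(np.abs(target - cand).sum())  # sum of absolute differences
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv  # displacement (rows, cols) of the best match
```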
Optical Flow
• Optical flow is a popular technique used for motion estimation in
video processing. It involves estimating the motion of objects by
analyzing the changes in pixel intensity values between
consecutive frames in a video sequence.
• The basic principle behind optical flow is that each point in an
image moves in a particular direction between consecutive frames.
By calculating the direction and magnitude of this movement for
each pixel, it is possible to estimate the overall motion of objects
in the scene.
OPTICAL FLOW AND DIRECT METHODS
• Optical flow is the apparent motion of objects between consecutive frames of a sequence, caused by the relative movement between the object and the camera. The problem of optical flow may be expressed through the brightness constancy assumption,

$$I(x, y, t) = I(x + dx,\; y + dy,\; t + dt)$$

whose first-order Taylor expansion yields the optical flow constraint equation $I_x u + I_y v + I_t = 0$, where $(u, v)$ is the flow vector and $I_x$, $I_y$, $I_t$ are the partial derivatives of the image intensity.
Sparse vs Dense Optical Flow
• Sparse optical flow estimates flow vectors only at a selected set of feature points (e.g., corners), while dense optical flow estimates a flow vector at every pixel; sparse methods are faster, and dense methods yield a complete motion field.
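As an illustration of the dense case, here is a minimal sketch using OpenCV's Farnebäck method, one common dense-flow algorithm; the frame file names and parameter values are assumptions for the example.

```python
import cv2

# Two consecutive grayscale frames (file names assumed for this sketch).
prev_gray = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
gray = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

# Dense flow: one (dx, dy) displacement per pixel, shape (H, W, 2).
flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2,
                                    flags=0)

# Convert each vector to magnitude/angle for inspection or visualization.
mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
print("mean flow magnitude:", mag.mean())
```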
Lucas Kanade Algorithm
• The Lucas-Kanade algorithm is a widely used method for optical flow
estimation in video processing.
• Optical flow refers to the motion of objects in a video stream, and the Lucas-
Kanade algorithm is used to estimate the optical flow between successive
frames.
• The Lucas-Kanade algorithm works by assuming that the motion of each
pixel in the image can be described by a small motion vector. The algorithm
estimates the motion vector for each pixel by solving a system of linear
equations based on the intensity values of the pixels in the two frames.
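For a window $W$ of pixels around the point, this least-squares system has the standard closed form

$$\begin{pmatrix} \sum_{W} I_x^2 & \sum_{W} I_x I_y \\ \sum_{W} I_x I_y & \sum_{W} I_y^2 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = - \begin{pmatrix} \sum_{W} I_x I_t \\ \sum_{W} I_y I_t \end{pmatrix}$$

where $I_x$, $I_y$, $I_t$ are the spatial and temporal intensity derivatives defined earlier and $(u, v)$ is the motion vector being estimated.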
• The algorithm is best explained with an example.
• Consider a simple scenario where a camera is capturing a video of
a car moving on a road. The Lucas-Kanade algorithm can be used
to estimate the optical flow of the car between two consecutive
frames.
• In the first frame, the car appears as a collection of pixels with
certain intensity values. In the second frame, the car has moved
slightly, and the intensity values of the pixels have changed. The
Lucas-Kanade algorithm estimates the motion of each pixel by
assuming that the motion can be described by a small motion
vector.
Car example
• The algorithm first selects a small window around each pixel in
the first frame. The size of the window is typically around 3x3 or
5x5 pixels. The algorithm then searches for the corresponding window in the second frame, which should contain the same scene point as in the first frame but shifted in position and with changed intensity values due to motion.
• The Lucas-Kanade algorithm then solves a system of linear
equations based on the intensity values in the two windows to
estimate the motion vector. The system of equations is based on
the assumption that the pixel intensities in the two windows are
related by a linear function of the motion vector.
• The Lucas-Kanade algorithm repeats this process for all the pixels in
the image to estimate the optical flow of the car between the two
frames. The estimated optical flow can be visualized as a set of motion
vectors, with each vector representing the estimated motion of a single
pixel.

• In summary, the Lucas-Kanade algorithm is a method for estimating optical flow in video processing. It works by assuming that the motion
of each pixel can be described by a small motion vector, and estimating
the motion vector by solving a system of linear equations based on the
intensity values of the pixels in two consecutive frames. The algorithm
is widely used in applications such as object tracking and motion
analysis.
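A minimal sketch of sparse Lucas-Kanade tracking using OpenCV's pyramidal implementation; the video file name and all parameter values are assumptions for the example, not part of the original material.

```python
import cv2

cap = cv2.VideoCapture("car.mp4")  # input video (file name assumed)
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Select corner-like features worth tracking in the first frame.
p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100,
                             qualityLevel=0.3, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok or p0 is None or len(p0) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Estimate where each feature moved; st flags points found in the new frame.
    p1, st, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, p0, None, winSize=(15, 15), maxLevel=2,
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))
    good_new, good_old = p1[st.flatten() == 1], p0[st.flatten() == 1]
    # Draw the per-feature motion vectors between the two frames.
    for new, old in zip(good_new, good_old):
        x1, y1 = new.ravel()
        x0, y0 = old.ravel()
        cv2.line(frame, (int(x0), int(y0)), (int(x1), int(y1)), (0, 255, 0), 2)
    cv2.imshow("Lucas-Kanade flow", frame)
    if cv2.waitKey(30) & 0xFF == 27:  # Esc quits
        break
    prev_gray, p0 = gray, good_new.reshape(-1, 1, 2)

cap.release()
cv2.destroyAllWindows()
```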
Pixel Based Motion Estimation
• Pixel-based motion detection is a technique used to detect motion
in a video stream by analyzing the pixel values in successive
frames.
• It works by comparing the pixel values of corresponding pixels in
two consecutive frames of the video stream and detecting any
regions where the pixel values have changed significantly.
• If the absolute difference between the pixel values is greater than a
certain threshold, then the pixels are considered to be part of a
moving object.
• There are different types of pixel-based methods for motion
detection, including frame difference, background subtraction, and
temporal differencing. Each method has its own advantages and
disadvantages, but they all use the same basic principle of
comparing pixel values in successive frames.
• Frame Difference Method: The frame difference method involves
subtracting the pixel values of two consecutive frames to obtain
the difference image. The difference image represents the regions
where the pixel values have changed between the two frames. A
threshold value is applied to the difference image to remove small
changes due to noise and to detect only significant changes. The
remaining pixels are considered to be part of a moving object.
Frame difference method
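A minimal OpenCV sketch of the frame difference method just described; the video file name and threshold value are assumptions for the example.

```python
import cv2

cap = cv2.VideoCapture("traffic.mp4")  # input video (file name assumed)
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray)  # difference image between frames
    # Threshold suppresses small noise-induced changes (value assumed).
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    cv2.imshow("motion mask", mask)
    if cv2.waitKey(30) & 0xFF == 27:
        break
    prev_gray = gray
```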
• Background Subtraction Method: The background subtraction
method involves creating a background model that represents the
static parts of the scene. The background model is created by
averaging the pixel values over a period of time. The current frame
is then compared to the background model to detect any changes.
The difference between the current frame and the background
model represents the moving objects in the scene.
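A sketch of the background subtraction method using a running average as the background model, as described above; the blending factor and threshold are assumptions.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("traffic.mp4")  # input video (file name assumed)
ok, frame = cap.read()
# Background model: running average of past frames, kept in float precision.
background = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Slowly blend the current frame into the background model (alpha assumed).
    cv2.accumulateWeighted(gray, background, alpha=0.01)
    # Pixels that differ from the background model are foreground.
    diff = cv2.absdiff(gray, cv2.convertScaleAbs(background))
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    cv2.imshow("foreground", mask)
    if cv2.waitKey(30) & 0xFF == 27:
        break
```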
• Temporal Differencing Method: The temporal differencing method involves
comparing the pixel values of consecutive frames and storing the difference
in a buffer. The buffer stores the differences between the current frame and
the frame n frames ago. If the difference between the current frame and the
frame n frames ago is greater than a certain threshold, then the pixels are
considered to be part of a moving object.
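A sketch of temporal differencing with a small frame buffer; the buffer length n and the threshold are assumptions for the example.

```python
from collections import deque

import cv2

n = 5  # compare against the frame n frames ago (value assumed)
buffer = deque(maxlen=n)
cap = cv2.VideoCapture("traffic.mp4")  # input video (file name assumed)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if len(buffer) == n:
        # buffer[0] holds the frame from n frames ago.
        diff = cv2.absdiff(gray, buffer[0])
        _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        cv2.imshow("temporal difference", mask)
        if cv2.waitKey(30) & 0xFF == 27:
            break
    buffer.append(gray)
```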
• To reduce false detections due to noise or lighting changes, a
background model is often used to represent the static parts of the
scene. The background model is typically computed using a
running average of the pixel values over time.

• Once the moving objects have been detected, various techniques can be used to track them over time, such as using centroid
tracking or Kalman filtering. Centroid tracking involves
computing the centroid of the moving object in each frame and
using it to track the object over time. Kalman filtering is a more
sophisticated technique that uses a probabilistic model to track the
object and predict its future position.
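A small sketch of the centroid computation step, assuming `mask` is a binary motion mask produced by one of the detection methods above; the minimum blob area is an assumption.

```python
import cv2

# mask: binary motion mask from one of the detection methods above (assumed).
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    if cv2.contourArea(c) < 500:  # skip small noise blobs (area assumed)
        continue
    m = cv2.moments(c)
    cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])  # blob centroid
    print("object centroid:", (cx, cy))
```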
• One of the advantages of pixel-based motion detection is that it is
computationally efficient and can be implemented in real-time.
However, it can be sensitive to changes in lighting or shadows,
and may not be as accurate as more sophisticated methods that use
machine learning algorithms.
Kalman Filter in Video Processing
• The Kalman filter is a widely used algorithm in video processing for object tracking and motion analysis. It is a mathematical tool for estimating the state of a system based on a series of measurements. In video processing, the Kalman filter is used to estimate the position and velocity of an object in an image sequence.
• The Kalman filter works by maintaining a model of the object's motion,
and updating this model based on new measurements from the video
stream. The model consists of two parts: the state vector and the state
transition matrix. The state vector contains the estimated position and
velocity of the object, while the state transition matrix describes how
the state changes over time.
• The Kalman filter algorithm has two main steps: prediction and update.
• Prediction: In the prediction step, the algorithm predicts the new state
of the object based on the previous state and the state transition
matrix. The state transition matrix describes how the state of the
object changes over time, based on the laws of physics. For example, if
the object is moving at a constant velocity, the state transition matrix
would be a simple linear equation that updates the position and
velocity of the object.
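For this constant-velocity case, with state vector $\mathbf{x} = (x, y, \dot{x}, \dot{y})^T$ and frame interval $\Delta t$, the standard transition matrix is:

$$\mathbf{x}_{k+1} = F\,\mathbf{x}_k, \qquad F = \begin{pmatrix} 1 & 0 & \Delta t & 0 \\ 0 & 1 & 0 & \Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$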
• The prediction step also estimates the uncertainty of the predicted
state, based on the uncertainty of the previous state and the
uncertainty in the state transition. The uncertainty is represented by a
covariance matrix, which describes the amount of error in the
prediction.
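In standard notation, the predicted covariance combines these two sources of uncertainty:

$$P_{k|k-1} = F\,P_{k-1|k-1}\,F^T + Q$$

where $Q$ is the process noise covariance.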
• Update
In the update step, the algorithm compares the predicted state with
the actual measurements from the video stream. The
measurements are typically noisy and contain errors due to
factors such as lighting conditions, occlusions, and camera motion.
The Kalman filter uses the difference between the predicted state
and the actual measurements to update the state estimate and the
covariance matrix. The update step involves two main calculations:
the Kalman gain and the innovation.
• The Kalman gain is a matrix that determines how much weight to
give to the predicted state and the actual measurements, based on
their respective uncertainties. The innovation is the difference
between the actual measurements and the predicted state, and
represents the error in the prediction.
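In standard notation, with measurement matrix $H$, measurement noise covariance $R$, and measurement $\mathbf{z}_k$, these two quantities are:

$$K = P_{k|k-1} H^T \left( H\,P_{k|k-1}\,H^T + R \right)^{-1}, \qquad \mathbf{y} = \mathbf{z}_k - H\,\hat{\mathbf{x}}_{k|k-1}$$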
• The Kalman filter updates the state estimate and the covariance
matrix based on the Kalman gain and the innovation. The updated
state estimate becomes the new predicted state for the next
iteration of the algorithm.
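In the same notation, the update computes

$$\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + K\,\mathbf{y}, \qquad P_{k|k} = (I - K\,H)\,P_{k|k-1}.$$

The sketch below wires the prediction and update steps together using OpenCV's built-in cv2.KalmanFilter for constant-velocity 2-D tracking; the time step, noise covariances, and measured centroids are assumptions for the example (in practice the measurements would come from a detector such as the pixel-based methods above).

```python
import cv2
import numpy as np

dt = 1.0  # time step of one frame (assumed)

# State (x, y, vx, vy); measurement (x, y).
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, dt, 0],
                                [0, 1, 0, dt],
                                [0, 0, 1,  0],
                                [0, 0, 0,  1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2     # Q (assumed)
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1.0  # R (assumed)
kf.errorCovPost = np.eye(4, dtype=np.float32)               # initial P (assumed)

# Per-frame object centroids from a detector (values assumed for the sketch).
measurements = [(100, 100), (104, 101), (108, 103), (113, 104)]
for mx, my in measurements:
    prediction = kf.predict()                     # prediction step
    z = np.array([[mx], [my]], dtype=np.float32)  # noisy measurement
    estimate = kf.correct(z)                      # update step
    print("predicted:", prediction[:2].ravel(),
          "corrected:", estimate[:2].ravel())
```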
• Thank You
