Video Compression
R.K. Singh
Director (Engg.)
STI(T),AIR & DD.
Digital Revolution
Who is behind its success?
Digital Revolution
Is it feasible without compression?
Technically- Yes,
Economically & Practically
No
How?
Status of Video Signal after simple
Digitization
Luminance (Y) - Sampling at 13.5 MHz
Chrominance (Cr) - Sampling at 6.75 MHz
Chrominance (Cb) - Sampling at 6.75 MHz
Quantization - 8 Bit/ sample
Status of Video Signal after
simple Digitization (contd.)
Total bit rate -
(13.5 + 6.75 + 6.75) x 8
= 216 Mbit/s
Bandwidth required -
108 MHz
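The arithmetic on this slide can be checked with a short sketch. The function and variable names are illustrative, and the bandwidth figure assumes binary signalling at the Nyquist limit of 2 bit/s per Hz, which is what the 216 to 108 ratio on the slide implies.

```python
# Sampling rates (MHz) and word length taken from the slide above.
SAMPLING_MHZ = {"Y": 13.5, "Cr": 6.75, "Cb": 6.75}
BITS_PER_SAMPLE = 8


def raw_bit_rate_mbps(sampling_mhz, bits_per_sample):
    """Bit rate in Mbit/s = total samples per second x bits per sample."""
    return sum(sampling_mhz.values()) * bits_per_sample


bit_rate = raw_bit_rate_mbps(SAMPLING_MHZ, BITS_PER_SAMPLE)
# Assumption: binary baseband signalling at the Nyquist limit (2 bit/s per Hz).
bandwidth_mhz = bit_rate / 2

print(f"{bit_rate} Mbit/s needs about {bandwidth_mhz} MHz")
```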
Video Compression: Orange Juice Analogy
Compression v/s Orange Juice
VIDEO SIGNAL
1. Filtering of video
signal to remove
noise etc. otherwise
it may corrupt the
video signal.
2. Removing unwanted
information from the
video signal.
ORANGES
1. Washing of oranges
to remove dust etc.
otherwise it may
spoil the orange
juice.
2. Removing peel,
skin, seed from
oranges
Compression v/s Orange Juice (contd.)
VIDEO SIGNAL
3. By digitization
process converting
analog video into
digital video.
ORANGES
3. By crushing process
converting orange
into orange juice.
Compression v/s Orange Juice (contd.)
VIDEO SIGNAL
4. A significant amount of
information content in
the video signal is
redundant. An
expensive encoder is
used to remove the
various types of
redundant information
from the digitized video
signal without
destroying its quality.
ORANGES
4. Water content in the
orange juice is
redundant. An
expensive plant is used
to remove the
redundant water
content from the orange
juice without destroying
its quality
Compression v/s Orange Juice (contd.)
VIDEO SIGNAL
5. The output of the
encoder is compressed
digital video whose
number of bits are
much less than the bits
of digitized video fed to
the encoder.
ORANGES
5. The output of the plant
is pulp of orange juice
whose physical
quantity is much less
than the orange juice
quantity which is fed to
the plant.
Compression v/s Orange Juice (contd.)
VIDEO SIGNAL
6. The bits of compressed
digital video are converted
into small packets. Header
of each packet contains
necessary information such
as name, number of bits,
and direction for decoding
to convert it into almost like
original video signal.
ORANGES
6. The orange pulp is packed
in small tins. Necessary
information such as name
of company, details of
ingredients, weight,
directions for handling and
converting pulp into almost
like fresh orange juice.
Compression v/s Orange Juice (contd.)
VIDEO SIGNAL
7. Packets of bits are
multiplexed to form
group of pictures
(GOP). These are now
ready for storage in
hard disc or
transmission by
modulating on carrier.
ORANGES
7. These small sealed
tins are packed into big
cartons for storage/
transportation. These
tins in the carton are
transported by some
carriers.
Compression v/s Orange Juice (contd.)
VIDEO SIGNAL
8. The decoder on
receiving the packets
converts the
compressed bits into
almost like original
video signal as per the
information given in the
header.
ORANGES
8. Consumer follows the
instructions written on
each tin to prepare
almost like fresh
orange juice from the
pulp. For this he is
required to add correct
quantity of water and
ice etc.
Compression v/s Orange Juice (contd.)
VIDEO SIGNAL
9. The decoding process
of converting
compressed bits into
video signal is much
simpler than encoding
process. Therefore
decoders are cheaper
and less complicated
than encoder.
ORANGES
9. The process of making
orange juice from the
pulp is very simple as
compared to the
process of converting
orange juice into pulp.
Compression v/s Orange Juice (contd.)
VIDEO SIGNAL
10. The encoder is like a
black box. Manufacturers
do not specify how
coding is done. It is
their trade secret. Only
the bit stream and decoding
format are described.
ORANGES
10. Manufacturers do not
specify how exactly
pulp is prepared. It is
their trade secret. They
only specify its
ingredients and
instructions about how
to prepare orange
juice from the pulp.
Compression v/s Orange Juice (contd.)
VIDEO SIGNAL
11. The compression
process must follow
international standards
like JPEG, MPEG,
H.261 etc.
ORANGES
11. The orange pulp must
be prepared as per
standard specifications
like ISI etc.
Understanding Compression
Bit Rate Reduction
Bit Rate is product of
Sampling Rate & Coding bits
per sample.
Some of the information is
redundant
Types of redundancy in video signal
1. Spectrum Redundancy.
2. Spatial Redundancy.
3. Temporal Redundancy.
4. Entropy Redundancy.
5. Psycho-visual redundancy
Lossless Compression
Decoder output is exactly identical to the
source data.
Variable output data rate.
Suitable for computer data.
Low compression factor, typically about 2:1.
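A minimal sketch of the lossless property, using Python's standard zlib module as a stand-in for any lossless coder. The test data is synthetic and highly repetitive, so the ratio it achieves is far better than the 2:1 typical of real material; the point is only that the decoder output equals the source exactly.

```python
import zlib

# Synthetic, highly repetitive data (an assumption made for illustration).
source = b"sky sky sky grass grass grass " * 200

packed = zlib.compress(source)
restored = zlib.decompress(packed)

print(len(source), "->", len(packed), "bytes")
print("identical to source:", restored == source)
```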
Lossy Compression
Decoder output is not identical to the
source data.
Based on psycho-acoustic and psycho-
visual perception.
Reduces the bandwidth required.
Not suitable for computer data but
suitable for Audio & Video.
Coding Techniques for Compression
Spectral Coding
Spatial coding
Temporal Coding
Spectral Coding
The RGB signals from video cameras are highly
correlated and take on large bandwidth of 15 MHz.
To decrease the amount of video sample data based
on human perception, the RGB color space is
converted to Y Cr Cb color space.
The Y component keeps the full bandwidth because the
human eye is very sensitive to luminance detail.
The Cr and Cb components have a narrower
bandwidth because these are less discernible by
human eye. The chrominance components are
usually decimated by two, both horizontally and
vertically resulting in a reduced number of samples.
Also applicable to analog video.
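The two steps of spectral coding, colour space conversion and chroma decimation, can be sketched as follows. The conversion coefficients are the commonly used BT.601-style values for 8-bit full-range data and are an assumption here, as is the choice of simple 2 x 2 averaging for the decimation filter.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to Y, Cb, Cr (assumed BT.601-style weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + 0.564 * (b - y)
    cr = 128 + 0.713 * (r - y)
    return y, cb, cr


def decimate_2x2(plane):
    """Halve a plane horizontally and vertically by averaging 2 x 2 groups.

    The plane must have an even number of rows and columns.
    """
    rows, cols = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
            for j in range(0, cols, 2)
        ]
        for i in range(0, rows, 2)
    ]


# A hypothetical 4 x 4 chroma plane shrinks to 2 x 2 (one quarter of the samples).
cb_plane = [
    [100, 102, 110, 112],
    [104, 106, 114, 116],
    [120, 122, 130, 132],
    [124, 126, 134, 136],
]
small = decimate_2x2(cb_plane)
print(small)
```

With both chroma planes decimated this way, a 4 x 4 area carries 16 + 4 + 4 = 24 samples instead of 48.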
Spatial Coding
Intraframe or spatial coding uses only
the spatial information existing in each
video frame.
Also applicable for still-image coding,
where real-time implementation does
not matter.
Spatial Coding (contd.)
A video signal exists in four dimensions:
i) the magnitude of the sample
ii) the horizontal spatial axis
iii) the vertical spatial axis
iv) the time axis.
Since spatial coding is confined to an individual
frame, it works in three dimensions only, that is,
on the horizontal and vertical spatial axes and on the
sample values.
Spatial Coding (contd.)
In a picture, where there is a high spatial frequency
content due to detailed areas of the picture, there is a
relatively small amount of energy at such
frequencies.
A picture may also contain sizeable areas in which the
same or similar pixel values exist.
This gives rise to low spatial frequency content, e.g.
areas of sky or grassland.
The average brightness of the picture results in a
substantial zero frequency (DC) component.
Spatial Coding (contd.)
Simply omitting the high
frequency components is
unacceptable as this causes
an obvious softening of the
picture.
Human eye's sensitivity to
noise in high spatial
frequencies is less.
This is used for coding gain.
Spatial Coding (contd.)
Coding gain is obtained by taking advantage of the fact that the
amplitude of the spatial components falls with frequency.
If the spatial spectrum is divided into frequency bands the high
frequency bands can be described by fewer bits not only because
their amplitudes are smaller but also because more noise can be
tolerated.
Predictive coding:
A Spatial Coding Category
Prediction errors are very small when the present
pixel is predicted from neighbouring pixels.
The DPCM technique encodes the quantized value of
the difference between present pixel value and
predicted value (i.e. prediction error).
The use of a great number of neighbouring pixels for
prediction can decrease prediction error and raise
performance.
As using a large number of pixels for prediction
increases complexity, the number of neighbouring pixels
used for prediction is generally not more than four.
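A minimal DPCM sketch under stated assumptions: the predictor uses only the previous pixel on the same line (the slide allows up to four neighbours), the quantizer is uniform with a hypothetical step size, and both sides start from an agreed value of 128. The encoder predicts from the reconstructed value, not the original, so that it stays in step with the decoder.

```python
def dpcm_encode(pixels, step=4, start=128):
    """Encode one scan line as quantized prediction errors."""
    codes = []
    prediction = start
    for pixel in pixels:
        error = pixel - prediction          # prediction error
        level = round(error / step)         # quantized error that is sent
        codes.append(level)
        # Track what the decoder will reconstruct, clamped to 8-bit range.
        prediction = max(0, min(255, prediction + level * step))
    return codes


def dpcm_decode(codes, step=4, start=128):
    """Rebuild the scan line by accumulating the quantized errors."""
    pixels = []
    prediction = start
    for level in codes:
        prediction = max(0, min(255, prediction + level * step))
        pixels.append(prediction)
    return pixels


line = [128, 130, 133, 135, 140, 140, 139]
codes = dpcm_encode(line)
decoded = dpcm_decode(codes)
print(codes)
print(decoded)
```

The codes are small numbers clustered around zero, which is what makes them cheap to entropy code afterwards.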
Transform coding:
A Spatial Coding Category
Employed in world standards such as JPEG,
H.261, and MPEG for still-image and moving-image
compression.
The basic concept is to obtain a high compression
ratio by eliminating redundancies through orthogonal
transforms.
Transform coding:
A Spatial Coding Category (Contd.)
In the DCT, the input image is divided into blocks of
N x N pixels.
The block size is chosen by considering the requisite
compression efficiency and picture quality.
The bigger the block size, the greater the
compression ratio, since more pixels are used to
reduce redundancies.
But design complexity also increases with block size.
As per experimental results, the 8 x 8 block size is
known to be the most effective.
Transform coding:
A Spatial Coding Category (Contd.)
Image degradations due to the DCT coding scheme
result from the ringing effects of blocks, including
abrupt boundaries and traces of block boundaries in
flat regions.
These visual degradations can be reduced using a
DCT coefficient quantizer that is based on human
visual characteristics.
A larger quantizer step size is used for the high
frequency DCT coefficients, to which the human eye
is less sensitive, and a smaller step size for the low
frequency coefficients, to which the eye is more
sensitive.
Transform coding:
A Spatial Coding Category (Contd.)
The thresholding processing and quantization in the
basic DCT coding algorithm can apply the same
technique to all the blocks regardless of the contents
of the image data.
Fine thresholding and quantization step size are
required for restoration of high quality pictures but
then the requisite bit rate increases too much for
complex images.
Hence an adaptive coding scheme is considered for
a trade-off between quality and bit rate.
Transform coding:
A Spatial Coding Category (Contd.)
DCT - Adaptive Coding Scheme
Some of the DCT Coding techniques categorize
blocks into several characteristic models according to
block characteristics and handle them according to
the properties of each model.
Transform coding uses the orthogonal transform to
reduce redundancies by eliminating data correlation,
and its performance improves when there is less
activity in a block. Using this fact, an adaptation
method is derived that divides image data into blocks
of variable sizes according to the degree of data
activity.
Transform coding:
A Spatial Coding Category (Contd.)
DCT - Adaptive Coding Scheme
An adaptive scheme divides highly active parts into
very small blocks and low-activity parts into large
blocks, and is thus suitable for the coding of text,
drawings and graphic images.
Subband Coding :
A Spatial Coding Category
Subband coding is composed of two major
steps:
A subband filtering step, which splits an image
signal into its constituent frequency
components.
A coding step, which compresses each
frequency band according to its respective
characteristics.
Subband Coding :
A Spatial Coding Category (Contd.)
Subband coding uses an analysis filter
bank at the encoder and a synthesis filter bank at the
decoder.
The analysis filter bank splits the input into several
different bands using a different sampling rate for
each band.
The synthesis filter bank combines several band
signals of different rate to synthesize the desired
signal.
Subband coding requires less processing time for
each band, but requires many processors, say one
for each band.
Subband Coding :
A Spatial Coding Category (Contd.)
After decomposing an image into several bands
using the analysis filter bank, a different coding
scheme can be applied to each band that is most
appropriate for the given band.
Since data characteristics of each band vary widely
and human visual sensitivity to degradation also
varies from band to band, better performance is
obtained when each of the bands is processed
according to its own set of characteristics.
Spatial Coding Process
The spatial coding process consists of the following
steps:
1. DCT
2. Quantization
3. Entropy Coding
4. Multiplexing
5. Buffer
6. Rate Control
7. Quantizing Tables
Spatial Coding Process- DCT
The first step in spatial coding is to perform an
analysis of spatial frequency using a transform.
A transform is simply a way of expressing a
waveform in a different domain, in this case, the
frequency domain.
The basic concept of DCT coding is to obtain a high
compression ratio by eliminating redundancies
through orthogonal transforms
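The energy compaction that makes the DCT useful can be seen with a small pure-Python sketch of the orthonormal 8 x 8 DCT-II and its inverse. This is a direct, slow implementation written for clarity; real encoders use fast algorithms, and the function names are illustrative only.

```python
import math

N = 8  # block size


def _scale(k):
    """Normalisation factor that makes the transform orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(x, u):
    """Cosine basis function of frequency u sampled at position x."""
    return math.cos((2 * x + 1) * u * math.pi / (2 * N))


def dct_2d(block):
    """Forward 2-D DCT of an N x N block of sample values."""
    return [
        [
            _scale(u) * _scale(v) * sum(
                block[x][y] * _basis(x, u) * _basis(y, v)
                for x in range(N) for y in range(N)
            )
            for v in range(N)
        ]
        for u in range(N)
    ]


def idct_2d(coeffs):
    """Inverse 2-D DCT: rebuilds the block from its coefficients."""
    return [
        [
            sum(
                _scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                for u in range(N) for v in range(N)
            )
            for y in range(N)
        ]
        for x in range(N)
    ]


# A flat block (e.g. a patch of sky): all energy lands in the DC coefficient.
flat = [[100] * N for _ in range(N)]
flat_coeffs = dct_2d(flat)
print(round(flat_coeffs[0][0], 6))
```

For the flat block, the DC coefficient carries the whole block and every AC coefficient is zero to within rounding error, so 64 samples reduce to one significant number.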
Spatial Coding Process- DCT
(Contd.)
The size of the DCT is chosen as 8 x 8 as a trade-off
between energy-compaction efficiency and the
complexity of the transform.
The efficiency of the DCT is high for intraframe-coded
blocks, because of the high correlation among image
samples, but low for interframe-coded blocks, because
of the low correlation among samples in the
motion-compensated prediction error signal.
The DCT can be applied in either frame mode or field
mode. The field DCT is more effective in interlaced
video with rich motion.
Spatial Coding Process-
Quantization
Quantization is a process of representing the
real-valued DCT coefficients by a finite number of bits.
The DC and AC components are quantized
separately.
The DC component is more important in image
reconstruction and is thus quantized using 8 to 11
bits.
The AC coefficients are quantized using the
quantization matrix and the quantizer scale factor.
The quantization matrix incorporates the human
visual system.
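A sketch of how a quantization matrix and a scale factor work together. The matrix below is hypothetical (its step size simply grows with frequency) and is not the table from any standard; the DC step of 8 is likewise an assumed value. What it illustrates is that the same amplitude survives at a low frequency but is rounded to zero at a high one.

```python
def make_matrix(n=8):
    """Hypothetical matrix: step size grows with horizontal + vertical frequency."""
    return [[8 + 2 * (u + v) for v in range(n)] for u in range(n)]


def quantize(coeffs, matrix, scale, dc_step=8):
    """Divide each AC coefficient by matrix entry x scale; DC uses its own step."""
    n = len(coeffs)
    levels = [
        [round(coeffs[u][v] / (matrix[u][v] * scale)) for v in range(n)]
        for u in range(n)
    ]
    levels[0][0] = round(coeffs[0][0] / dc_step)  # DC is quantized separately
    return levels


def count_nonzero(levels):
    return sum(1 for row in levels for value in row if value != 0)


# Illustrative coefficient block: a DC term plus two AC terms of equal amplitude.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = 800.0
coeffs[0][1] = 12.0   # low spatial frequency
coeffs[7][7] = 12.0   # highest spatial frequency

levels = quantize(coeffs, make_matrix(), scale=1)
print(levels[0][0], levels[0][1], levels[7][7])
```

Raising the scale factor makes every AC step coarser, so more coefficients round to zero and fewer bits are generated.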
Spatial Coding Process-
Quantization (Contd.)
The scale factor controls the quantizer step size and
resulting number of bits generated.
The output of a transform is a set of coefficients that
describe how much of a given frequency is present in
the signal. An inverse transform will reproduce the
original waveform.
The most well known transform is the Fourier
transform. This transform finds each frequency in the
input signal.
It is done by multiplying the input wave form by a
sample of a target frequency called a basis function
and integrating the product.
Spatial Coding Process-
Quantization (Contd.)
When the input waveform does not contain the target
frequency, the integral will be zero but when it does,
the integral will be a coefficient describing the
amplitude of that component frequency.
Spatial Coding Process-
Zig-Zag Scanning
Spatial Coding Process-
Entropy Coding
After quantization the entropy coding
may be performed in two ways :
Variable length coding (VLC)
Run Length Coding
Spatial Coding Process-
Entropy Coding (contd.)
Variable length coding (VLC)
This type of coding reduces the average code
length by assigning a longer code word to a
less frequent symbol and a shorter code word
to a more frequent one.
Huffman coding is a typical variable length
coding method whose objective is to bring the
average code length as close as possible to the
theoretical limit, called entropy.
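A compact sketch of Huffman code construction using only the Python standard library. The symbol frequencies are made up for illustration; MPEG itself uses predefined code tables instead of building a code per picture, so this shows the principle, not the standard's tables.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each symbol to its bit string."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate single-symbol source
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie-breaker, partial code table for that subtree).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count1, _, table1 = heapq.heappop(heap)   # two least frequent subtrees
        count2, _, table2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in table1.items()}
        merged.update({sym: "1" + code for sym, code in table2.items()})
        heapq.heappush(heap, (count1 + count2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical quantized levels: 0 is most frequent, 3 is rare.
data = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
code = huffman_code(data)
total_bits = sum(len(code[sym]) for sym in data)
print(code, total_bits, "bits instead of", 2 * len(data))
```

For these frequencies the average length is 1.75 bits per symbol, which equals the entropy of the source, against 2 bits for a fixed-length code.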
Spatial Coding Process-
Entropy Coding (contd.)
Variable length coding (VLC) (Contd.)
Huffman coding is used for variable length coding of
quantized DCT coefficients, differential motion
vectors, macroblock type, coded block patterns,
macroblock address, and so on.
In MPEG-2 image compression, the run-length
coding and Huffman coding are combined to variable
length code the quantized DCT coefficients.
In coding the macroblock information, the macroblock
address (MBA), macroblock type (Mbtype) and coded
block pattern (CBP) are Huffman coded.
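Zig-zag scanning and run-length coding of the quantized coefficients can be sketched together. The scan below is the classic zig-zag order; the (run, level) pairs and the "EOB" end-of-block marker follow the usual convention but are simplified here, and the variable length code that would then be assigned to each pair is left out.

```python
def zigzag_order(n=8):
    """Return (row, col) positions in zig-zag order, low frequencies first."""
    positions = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals; direction alternates from one diagonal to the next.
    return sorted(
        positions,
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def run_length(levels):
    """Return the DC level and a list of (zero run, level) pairs for the AC part."""
    scanned = [levels[r][c] for r, c in zigzag_order(len(levels))]
    dc, ac = scanned[0], scanned[1:]
    pairs, run = [], 0
    for value in ac:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")                      # trailing zeros are not sent
    return dc, pairs


# Hypothetical quantized block with only three non-zero levels.
block = [[0] * 8 for _ in range(8)]
block[0][0] = 100
block[0][1] = 5
block[1][1] = -2

dc, pairs = run_length(block)
print(dc, pairs)
```

The scan groups the many zero-valued high frequency levels at the end of the sequence, where a single end-of-block marker replaces them.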
Spatial Coding Process-
Multiplexing
After encoding, the video and audio data of each
channel are repackaged into a single data stream in
a process called multiplexing.
This data stream is eventually transmitted to the
home and de-multiplexed in the digital set-top box
which reforms the original individual video and audio
components.
Multiplexing simplifies the transmission system and
reduces capital costs.
Only one set of communications equipment is needed to
carry a potentially large number of services, and
bandwidth can be better managed.
Spatial Coding Process-
Multiplexing (contd.)
The digital multiplex is more efficient & flexible.
Digital multiplexing allows for the possibility of
offering regional services through re-packaging.
Because the individual channels in the digital
multiplex can be identified then re-packaged by a
process called re-multiplexing (or remux), regional
programming can be inserted into a national digital
multiplex on a region by region basis.
Spatial Coding Process-
Buffer
The Video Buffering Verifier (VBV) is a hypothetical
decoder connected at the output of the encoder.
The encoder monitors the buffer status of the
hypothetical decoder and controls the bit rate such
that the overflow or underflow can be avoided at the
encoder/decoder.
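A simplified model of the hypothetical decoder buffer, assuming a constant-rate channel and instantaneous removal of each picture's bits at its decode time. The buffer size, channel rate and picture sizes are made-up numbers; the real verifier is defined more precisely in the standard.

```python
def simulate_vbv(picture_bits, bits_per_interval, buffer_size, start_fullness):
    """Return a list of (picture index, event) for every overflow or underflow.

    Each picture interval: the decoder removes one picture's bits, then the
    channel delivers bits_per_interval new bits into the buffer.
    """
    fullness = start_fullness
    events = []
    for index, bits in enumerate(picture_bits):
        if bits > fullness:
            events.append((index, "underflow"))   # picture not fully received
            fullness = 0
        else:
            fullness -= bits
        fullness += bits_per_interval
        if fullness > buffer_size:
            events.append((index, "overflow"))    # channel delivers too fast
            fullness = buffer_size
    return events


# Hypothetical units: 100 bits arrive per picture interval, buffer holds 400.
print(simulate_vbv([250, 50, 50, 50], 100, 400, 300))   # large I picture, then small ones
print(simulate_vbv([350], 100, 400, 300))               # picture too large
print(simulate_vbv([10, 10, 10], 100, 400, 300))        # pictures too small
```

An encoder running such a model can raise the quantizer step size when underflow threatens and lower it, or add stuffing, when overflow threatens.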
Spatial Coding Process-
Rate Control
In MPEG-2 compression, the three different picture
types produce quite a fluctuating number of bits.
Even in the same type of pictures, the number of
generated bits can be significantly different
depending on the variance of motion and scene
complexity.
Therefore, an appropriate bit allocation is needed that
can determine the target bit budget of a GOP and
assign bits to each frame within the GOP.
Even in a frame, one may incorporate the human
visual system in deciding the quantizer step size.
Spatial Coding Process-
Rate Control (contd.)
For example, a complex scene such as a forest, in
which the human eye is insensitive to quantization
noise, can be quantized using a coarser step size,
whereas a smooth scene such as a human face, in
which the eye is very sensitive to quantization
noise, should be quantized more finely.
The MPEG-2 standard does not specify how to control the
number of bits generated from the encoder. The
encoder should be carefully designed so that no
overflow or underflow occurs at the output buffer and
the picture quality is globally constant to human
perception.
Spatial Coding Process-
Quantization Tables
Temporal Coding
Temporal coding or inter-coding takes advantage of
the similarities between successive pictures in real
material.
Instead of sending information for each picture
separately, inter-coders will send the difference
between the previous picture and the current picture
in a form of differential coding.
A picture store is required at the coder to allow
comparison to be made between successive pictures
and a similar store is required at the decoder to make
the previous picture available.
The difference data may be treated as a picture itself and
subjected to some form of transform-based compression.
Temporal Coding (Contd.)
Motion Compensation
Motion reduces the similarities between pictures and
increases the data needed to create the difference
picture.
Motion compensation is a process which effectively
measures motion of objects from one picture to the
next so that it can allow for that motion when looking
for redundancy between pictures.
Motion compensation is used to increase the
similarities between pictures.
Temporal Coding (Contd.)
When an object moves across the TV screen, it may
appear in a different place in each picture, but it does
not change in appearance very much.
The picture difference can be reduced by measuring
the motion at the encoder.
This is sent to the decoder as a vector.
The decoder uses the vector to shift part of the previous
picture to a more appropriate place in the new
picture.
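Measuring motion at the encoder is commonly done by block matching. The sketch below assumes a full search over a small window using the sum of absolute differences (SAD) as the cost; the block size, search range and test frames are hypothetical, and the standard does not mandate any particular search method.

```python
def sad(prev, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a shifted block in prev."""
    return sum(
        abs(cur[by + y][bx + x] - prev[by + y + dy][bx + x + dx])
        for y in range(size) for x in range(size)
    )


def find_vector(prev, cur, bx, by, size=4, search=2):
    """Full search: return ((dx, dy), cost) of the best match in the previous picture."""
    height, width = len(prev), len(prev[0])
    best_vec, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the previous picture.
            if not (0 <= by + dy and by + dy + size <= height
                    and 0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(prev, cur, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_vec, best_cost = (dx, dy), cost
    return best_vec, best_cost


def frame_with_square(x0, y0, width=8, height=8):
    """Black frame containing a bright 2 x 2 object at (x0, y0)."""
    frame = [[0] * width for _ in range(height)]
    for y in range(y0, y0 + 2):
        for x in range(x0, x0 + 2):
            frame[y][x] = 200
    return frame


prev_frame = frame_with_square(2, 3)
cur_frame = frame_with_square(3, 3)     # object moved one pixel to the right

vector, cost = find_vector(prev_frame, cur_frame, bx=2, by=2)
print(vector, cost)
print(sad(prev_frame, cur_frame, 2, 2, 0, 0, 4))   # cost without compensation
```

With the vector applied, the difference block is all zeros; without it, the same block leaves a residual that would have to be transform coded.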
Temporal Coding (Contd.)
Temporal Coding is achieved by creating three types of pictures.
I Picture
P Picture
B Picture
Temporal Coding (Contd.)
(Intra) I-Picture
This type of picture is intraframe coded and needs no
additional information for decoding.
I pictures require a lot of data compared with other picture
types, and therefore they are not transmitted more
frequently than necessary.
They consist primarily of transform coefficients and have
no vectors.
Temporal Coding (Contd.)
(Intra) I-Picture (Contd.)
The I pictures are inserted periodically for the purpose of
blocking propagation of errors caused by inter frame
coding and for the purpose of realising random access in
broadcasting or storage media environments.
The I picture affects the total image quality substantially,
so it must be quantized more finely than the P and B
pictures. The I pictures allow the viewer to switch
channels, and they arrest error propagation.
Temporal Coding (Contd.)
Predictive (P) picture
Pictures of this type are forward predicted from an earlier
picture, which could be an I picture or a P picture.
Picture data consists of vectors describing where, in the
previous picture, each macroblock should be taken from,
and transform coefficients which describe the correction
or difference data which must be added to that
macroblock.
A P picture propagates coding errors to the following P
and B pictures, so it needs to be quantized more finely
than B pictures.
P pictures require roughly half the data of an I picture.
Temporal Coding (Contd.)
Bi-directionally predictive (B) picture
B pictures are bi-directionally predicted from earlier and/or
later I or P pictures.
B picture data consists of vectors describing where in
earlier or later pictures data should be taken from.
It also contains the transform coefficients that provide
the correction.
A coding error in this type of picture does not
propagate, so the quantization step size is controlled
to be relatively larger.
B pictures require roughly one quarter the data of an I picture.
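The rough size ratios given on these slides (P about half of I, B about one quarter of I) allow a back-of-envelope estimate of what temporal coding saves. The 12-picture GOP pattern and the I picture size below are hypothetical; actual picture sizes vary widely with content.

```python
# Relative picture sizes taken from the slides: I = 1, P = 1/2, B = 1/4.
RELATIVE_SIZE = {"I": 1.0, "P": 0.5, "B": 0.25}


def gop_bits(pattern, i_picture_bits):
    """Estimated bits for one GOP given its picture-type pattern."""
    return sum(RELATIVE_SIZE[kind] * i_picture_bits for kind in pattern)


pattern = "IBBPBBPBBPBB"        # hypothetical 12-picture GOP
i_bits = 400_000                # hypothetical size of one I picture

with_temporal = gop_bits(pattern, i_bits)
all_intra = gop_bits("I" * len(pattern), i_bits)
print(with_temporal, all_intra, all_intra / with_temporal)
```

Under these assumptions the GOP needs 4.5 I-picture equivalents instead of 12, a saving of roughly 2.7 to 1 over coding every picture as an I picture.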
Bit Stream
An MPEG program refers to a set of video, audio and
data information that is multiplexed into a single data
stream for storage and playback or for transmission
across a network.
The individual video, audio and data streams of a
program, which are called elementary streams, are
packetized into large, variable-size packets to form
packetized elementary streams (PES).
Combining one or more programs that have one or more
independent time bases into a single stream produces
a transport stream (TS).
Bit Stream (Contd.)
The TS is designed for use in error-prone
environments, such as storage or transmission in
lossy or noisy media, and in cases where multiple
programs need to be multiplexed into a single data
stream. The TS is segmented into fixed-size 188-byte
packets, and TSs can be multiplexed into a new TS.
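A simplified sketch of cutting a payload into 188-byte transport packets. The 4-byte header layout (sync byte 0x47, 13-bit packet identifier, 4-bit continuity counter) follows the MPEG-2 systems layer as commonly documented, but the sketch omits adaptation fields and simply pads the last packet with 0xFF bytes, which a real multiplexer would not do. The PID value is arbitrary.

```python
TS_PACKET_SIZE = 188
HEADER_SIZE = 4
SYNC_BYTE = 0x47


def packetize(payload: bytes, pid: int):
    """Split payload into fixed-size transport packets (simplified header)."""
    chunk_size = TS_PACKET_SIZE - HEADER_SIZE        # 184 payload bytes per packet
    packets = []
    for count, start in enumerate(range(0, len(payload), chunk_size)):
        start_flag = 1 if count == 0 else 0          # marks the first packet of the unit
        header = bytes([
            SYNC_BYTE,
            (start_flag << 6) | ((pid >> 8) & 0x1F), # top 5 bits of the 13-bit PID
            pid & 0xFF,                              # low 8 bits of the PID
            0x10 | (count & 0x0F),                   # payload only + continuity counter
        ])
        body = payload[start:start + chunk_size]
        # Simplification: pad the final packet instead of using an adaptation field.
        packets.append(header + body.ljust(chunk_size, b"\xff"))
    return packets


def packet_pid(packet: bytes) -> int:
    """Read the 13-bit PID back out of a packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


packets = packetize(bytes(400), pid=0x100)           # 400 bytes of dummy PES data
print(len(packets), [len(p) for p in packets])
```

Because every packet has the same length and starts with the same sync byte, a decoder can regain packet alignment quickly after an error burst.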
Streams: Synchronization
The MPEG-2 system assumes a constant delay
model in which the time delay of the entire system,
from the input of the encoder to the output of the
decoder is constant.
This time delay includes all of the time segments
required for coding, coder buffering, multiplexing,
storage or transmission, decoder buffering, decoding
and presentation.
Streams: Synchronization (contd.)
Since the end-to-end delay through the entire system
is constant, the audio and video presentations can be
exactly synchronized.
For this the decoder needs to have a system clock
whose frequency and instantaneous value match
those of the encoder.
The time information necessary to convey the
encoder system's clock to the decoder is the program
clock reference (PCR).
MPEG Bit Stream
MPEG Video Compression
Algorithms
The MPEG-2 video compression algorithm is composed
of motion estimation/motion compensation (ME/MC), DCT,
scalar quantization, run-length coding (RLC) and
Huffman coding functions.
It is a hybrid algorithm that combines the intra-frame
and inter-frame coding schemes.
The MPEG-2 video syntax has a hierarchical
structure comprising the sequence layer, group of
pictures (GOP) layer, picture layer, slice layer,
macroblock layer, and block layer.
MPEG Video Compression
Algorithms (Contd.)
Among these, the GOP layer is designed for random
access and recovery from transmission errors.
The slice layer is for resynchronization at the decoder
in the case of transmission errors.
A macroblock is the unit for ME/MC
A block is the unit for DCT.
MPEG Video Compression
Algorithms (Contd.)
When a slice is lost at the decoder due to
transmission errors in the channel, the next slice can
be received after resynchronization at the start of the
slice.
When the decoder is turned on during the
transmissions of the bit stream or when a channel
change occurs in the broadcasting environment, the
pictures are reconstructed and presented from the
next I picture.
Drawbacks of Compression
By definition, compression removes redundancy from
signals. Redundancy is, however, essential to
making data resistant to errors. As a result the
compressed data are more sensitive to errors than
uncompressed data.
Techniques for recovering original uncompressed
data depend on the probabilistic characterization of
the signal. However, signal statistics are not
stationary. Unlike statistical redundancy, the removal
of information based on the limitations of the human
perception is irreversible. The original data cannot
be recovered following such a removal.
Drawbacks of Compression
(Contd.)
Compression techniques using tables, such as
Lempel-Ziv-Welch codecs, are very sensitive to bit
errors, as an error in the transmission of a table value
results in bit errors every time that table location is
accessed. This is known as error propagation.
Variable length coding techniques such as Huffman
coding are sensitive to bit errors.
As perceptive coders introduce noise, it will be clear
that in a cascaded system the second codec could be
confused by the noise due to the first. If the codecs are
identical then each may well make the same decisions
when they are in tandem, but if the codecs are not
identical the results could be disappointing.
Drawbacks of Compression
(Contd.)
Signal manipulation between codecs can also result in artifacts
which were previously undetectable being revealed because the
signal which was making them is no longer present.
One practical drawback of compression systems is that they are
largely generic in structure and the same hardware can be
operated at a variety of compression factors. Clearly the higher
the compression factor, the cheaper the system will be to
operate, so there will be economic pressure to use high
compression factors. Naturally the risk of artifacts is increased,
and so there is counter-pressure from those with engineering
skills to moderate the compression.
Drawbacks of Compression
(Contd.)
Cascaded compression systems cause loss
of quality, and the lower the bit rates the
worse this gets. Quality loss increases if any
post production steps are performed between
compressions.
Compression systems cause delay.
Compression systems work best with clean
source material. Noisy signals or poorly
decoded composite video give poor results.
Drawbacks of Compression
(Contd.)
In motion-compensated compression systems, the use
of periodic intra-fields means that the coding noise
varies from picture to picture and this may be visible
as noise pumping. Noise pumping may also be
visible where the amount of motion changes.
Input video noise destroys inter-coding as there is
less redundancy between pictures and the difference
data become larger, requiring coarse quantizing and
adding to the existing noise.
Excess compression may also result in color bleed
where fringing has taken place in the chroma or
where high frequency chroma coefficients have been
discarded.
Precautions with
Compression techniques
The transmission systems using compressed
data must incorporate more powerful error
correction strategies and avoid compression
techniques which are notoriously sensitive.
Compression should not be used unless it is
necessary.
If compression is to be used, the degree of
compression should be as small as possible.
In other words highest practical bit rate
should be used.
Precautions with
Compression techniques
(contd.)
Low bit rate coders should only be used
for the final delivery of post produced
signal to the end user.
Compression quality should only be
assessed subjectively.
Precautions with
Compression techniques
(contd.)
Compression quality varies widely with source
material. One should be careful about demonstrations
which may use selected material to achieve low bit
rates.
While assessing the performance of a codec one
should not hesitate in criticizing artifacts. Eyes and
ears are good to assess the quality of the output of
the codec.
Use still frames to distinguish spatial artifacts from
temporal artifacts while assessing the performance.