Chapter 10 - Compression Principles
Introduction
Compared to text files, other media files such as images, audio and video take up a huge amount of disk space.
After an analog quantity has been digitized, it is stored on the
disk as a digital file. These files are referred to as raw or
uncompressed media data.
Whichever type of compression is applied, it leads to a reduction in file size, but the actual amount of reduction depends on a large number of factors involving both the media data and the CODEC used.
Types of Compression
• Lossless compression - In this case the original data is not changed permanently during compression. Therefore, after decompression the original data can be retrieved exactly.
• Lossy compression - In this case some of the original data is permanently discarded during compression, so after decompression only an approximation of the original data can be retrieved.
Redundancies
• Statistical redundancy owes its origin to some kind of statistical
relationship existing within the media data.
• In the Human Visual System (HVS), visual information is not perceived equally; some information may be more important than other information.
Lossless Techniques
• Lossless compression techniques are also known as Entropy Coding. Entropy coding is a generic term that refers to compression techniques which do not take into account the nature of the information to be compressed.
Run Length Encoding (RLE)
• To indicate that the count number has a special meaning and is not part of the normal text, a special character is used as a flag.
• Less frequent patterns will be coded with more bits whereas the
most frequent patterns will use shorter codes.
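• A minimal Python sketch of the run-length idea described above; the flag character '!' and the single-digit counts are illustrative assumptions of the sketch, not part of any particular standard.

def run_length_encode(text, flag="!"):
    # Replace each run of 3 or more identical characters by <flag><count><char>;
    # the flag marks the count as special, not part of the normal text.
    # Counts are kept to a single digit here, so runs longer than 9 are split.
    out, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i] and j - i < 9:
            j += 1
        run = j - i
        out.append(f"{flag}{run}{text[i]}" if run >= 3 else text[i] * run)
        i = j
    return "".join(out)

# run_length_encode("AAAAAABBBCD") -> "!6A!3BCD"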
Huffman Coding
• Transmitting this string without compression using 7-bit ASCII code for
each character requires a total of 7 × 16 = 112 bits.
• In the next step, the leaf node containing BCD is again divided into two,
into B with probability 1/8 and CD with probability 1/8. In the final step
the leaf node containing CD is divided into C with probability 1/16 and D
with probability 1/16.
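• A hedged Python sketch of Huffman code construction for a character string: it uses the standard bottom-up procedure (repeatedly merging the two least-frequent nodes) rather than the splitting description above, and any input string is arbitrary, not the one referred to in the slides.

import heapq
from collections import Counter

def huffman_codes(text):
    # Build a heap of (frequency, tie-break id, symbols) leaf nodes, then
    # repeatedly merge the two least-frequent nodes; every merge prefixes a
    # '0' to the codes of one group and a '1' to the other, so frequent
    # characters end up with short codes and rare ones with longer codes.
    # (Assumes at least two distinct characters in the input.)
    heap = [[freq, i, ch] for i, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    codes = {node[2]: "" for node in heap}
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, group1 = heapq.heappop(heap)
        f2, _, group2 = heapq.heappop(heap)
        for ch in group1:
            codes[ch] = "0" + codes[ch]
        for ch in group2:
            codes[ch] = "1" + codes[ch]
        heapq.heappush(heap, [f1 + f2, next_id, group1 + group2])
        next_id += 1
    return codes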
Lempel Ziv (LZ) Coding
• As each word occurs in the text, the encoder stores the index of
the word in the table instead of the actual word.
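• A minimal Python sketch of this table-lookup idea, assuming a word table shared in advance by encoder and decoder (every word of the text is assumed to be present in the table); practical LZ coders such as LZW build the table adaptively from substrings instead.

def lz_word_encode(text, table):
    # Replace each word of the text by its index in the shared word table.
    index = {word: i for i, word in enumerate(table)}
    return [index[word] for word in text.split()]

def lz_word_decode(indices, table):
    # Reverse lookup: recover the original words from their indices.
    return " ".join(table[i] for i in indices)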
Differential Pulse Code Modulation (DPCM)
Delta Modulation (DM)
• Since a single bit can take up only two values, the difference signal here only specifies whether a sample value is greater than (positive difference) or less than (negative difference) the previous sample value.
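• A minimal Python sketch of this single-bit scheme; the fixed step size is an assumption of the sketch.

def delta_modulate(samples, step=1):
    # Emit one bit per sample: 1 if the sample lies above the running
    # approximation (positive difference), 0 if it lies below (negative
    # difference); the approximation then moves up or down by one step.
    bits, approx = [], 0
    for s in samples:
        if s > approx:
            bits.append(1)
            approx += step
        else:
            bits.append(0)
            approx -= step
    return bits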
Adaptive Differential Pulse Code Modulation (ADPCM)
Graphics Interchange Format (GIF)
• The GIF format uses a Color Lookup Table (CLUT) to store the
color values of pixels. The table consists of 256 rows and 3
columns.
• Each row holds one color value, with its R, G and B components in the 3 columns. When an image is converted to the GIF format, the algorithm selects the 256 most representative colors from the image and stores all these values in the 256-row table.
• Next it uses the 8-bit index number of the row to represent the color of each pixel, so 3 bytes of RGB data are replaced by 1 byte of index, resulting in a compression ratio of 3:1.
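• A rough Python sketch of the CLUT indexing step (vectorised with NumPy for clarity rather than memory efficiency); the nearest-color matching shown here stands in for whatever palette-selection method an actual GIF encoder uses, and the resulting index stream would then be LZW coded.

import numpy as np

def clut_index(image_rgb, palette):
    # image_rgb: (H, W, 3) uint8 array; palette: (256, 3) table of the chosen
    # representative colors.  Each pixel is replaced by the 8-bit index of the
    # nearest palette entry, so 3 bytes of RGB become 1 byte of index (3:1).
    pixels = image_rgb.reshape(-1, 1, 3).astype(int)
    dist = ((pixels - palette[np.newaxis, :, :].astype(int)) ** 2).sum(axis=2)
    return dist.argmin(axis=1).astype(np.uint8).reshape(image_rgb.shape[:2])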
Lossy Techniques
Transform Coding
• The step called transform coding converts data into a form which
is more suited for identifying redundancies.
Psycho-analysis
Interframe Co-relation
• The unchanging pixels for a set of frames are coded only once and are simply repeated over the next frames. The changing pixels, however, are encoded for every frame.
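• A minimal Python sketch of this pixel-difference idea for two grayscale frames; the threshold parameter is an assumption of the sketch.

import numpy as np

def changed_pixels(prev_frame, cur_frame, threshold=0):
    # Pixels whose value is (nearly) unchanged from the previous frame need
    # not be re-sent; only the positions and new values of changed pixels
    # would be coded for this frame.
    changed = np.abs(cur_frame.astype(int) - prev_frame.astype(int)) > threshold
    return np.argwhere(changed), cur_frame[changed]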
• In reality both the pixel difference method and the motion vector method are used.
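• A hedged Python sketch of the motion-vector idea using exhaustive block matching over a small search window; the window size and the sum-of-absolute-differences cost are assumptions of the sketch, not taken from the text.

import numpy as np

def best_motion_vector(prev_frame, block, top, left, search=8):
    # Find the displacement (dy, dx) at which the given block best matches
    # the previous (grayscale) frame; only this vector plus the small
    # prediction error would then need to be coded.
    h, w = prev_frame.shape
    n = block.shape[0]
    best, best_err = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= h - n and 0 <= x <= w - n:
                err = np.abs(prev_frame[y:y + n, x:x + n].astype(int)
                             - block.astype(int)).sum()
                if best_err is None or err < best_err:
                    best, best_err = (dy, dx), err
    return best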
JPEG
• The block preparation step breaks each 2D array of the image into individual blocks of 8 X 8 pixels.
• For an image 640 X 480 pixels in RGB format, this step prepares 4800
blocks each for R, G and B information.
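• A minimal Python sketch of block preparation for one color component; the component's dimensions are assumed to be multiples of 8.

import numpy as np

def prepare_blocks(channel, size=8):
    # Split a 2D component array (e.g. 480 rows x 640 columns) into 8 x 8
    # blocks: (640 / 8) * (480 / 8) = 80 * 60 = 4800 blocks per component.
    h, w = channel.shape
    return [channel[r:r + size, c:c + size]
            for r in range(0, h, size)
            for c in range(0, w, size)]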
Discrete Cosine Transform (DCT)
• The objective of this step is to transform each block from the spatial
domain to the frequency domain.
• We may call this function a = f(x,y) where x and y are the two spatial
dimensions and a represents the amplitude of the signal (or pixel) at the
sampled position (x,y).
• After the DCT this function is turned into another function c = g(Fx,Fy)
where c is a coefficient and Fx and Fy are the respective spatial
frequencies for each direction.
• From the DFT, ignoring the imaginary component, we get the equation of the forward DCT as:
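A standard form of the 8 X 8 forward DCT, written in the P[x,y] / F[i,j] notation of the bullets that follow (the usual JPEG normalization is assumed here), is:

\[
F[i,j] = \frac{1}{4}\,C(i)\,C(j)\sum_{x=0}^{7}\sum_{y=0}^{7} P[x,y]\,
\cos\frac{(2x+1)i\pi}{16}\,\cos\frac{(2y+1)j\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}
\]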
• All 64 values in the input array P[x,y] contribute to each entry in the
transformed frequency domain array F[i,j].
• For i = j = 0 the horizontal and vertical frequency terms are both zero, so each cosine term becomes cos(0) = 1. The value in the location F[0,0] of the transformed array is therefore a function of the summation of all the values in the input array. This term is known as the DC coefficient.
• Since the values in the other locations of the transformed array have a
frequency coefficient associated with them, either horizontal or vertical or
both, they are known as AC coefficients.
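With the normalization assumed above, putting i = j = 0 shows explicitly that the DC coefficient is just a scaled sum of the 64 pixel values:

\[
F[0,0] = \frac{1}{4}\cdot\frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}}
\sum_{x=0}^{7}\sum_{y=0}^{7} P[x,y]
= \frac{1}{8}\sum_{x=0}^{7}\sum_{y=0}^{7} P[x,y]
= 8\,\bar{P},
\]

where \(\bar{P}\) is the mean pixel value of the block.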
Quantization
• The human eye responds primarily to the DC coefficients and lower spatial
frequency coefficients. Thus if the magnitude of a higher frequency
coefficient is below a certain threshold the eye will not detect it.
• This property is exploited in the quantization phase by dropping (i.e. setting
to zero) the higher spatial frequency coefficients in the transformed array
whose amplitudes are less than a pre-defined threshold value.
• Instead of simply comparing each coefficient with the corresponding
threshold value, a division operation is performed using the defined
threshold value as the divisor.
• If the resulting quotient (rounded) is zero, the coefficient is less than the threshold value, while if it is non-zero it indicates approximately how many times larger than the threshold the coefficient is.
• The threshold values are stored in a square 8 X 8 table called the
quantization table. Each element of the table can be any integer value from 1
to 255.
• The threshold values used in general increase in magnitude with increasing
spatial frequency. Hence many of the higher frequency coefficients are
scaled to zero. Since some of the components are neglected, this step leads
to data loss.
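• A minimal Python sketch of the divide-and-round quantization step; the quantization table shown is purely illustrative (its thresholds simply grow with spatial frequency) and is not a table defined by the JPEG standard.

import numpy as np

# Illustrative 8 x 8 table: threshold grows with spatial frequency (i + j).
Q = 1 + 4 * np.add.outer(np.arange(8), np.arange(8))

def quantize(F, Q):
    # Divide each DCT coefficient by its threshold and round; coefficients
    # smaller than their threshold are thereby set to zero.
    return np.rint(F / Q).astype(int)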
Zig-zag Scan
• After the DCT stage, the remaining stages involve entropy encoding. The entropy coding algorithms operate on a one-dimensional string of values.
• The output of the quantization stage is, however, a 2D array. Hence, to apply an entropy scheme, the array has to be converted to a 1D vector.
• To cluster the zero and non-zero values together, a zig-zag scan of the array is performed. The scanning starts from the top-left (DC) value and proceeds diagonally towards the bottom-right, so that the low-frequency coefficients come first and the high-frequency coefficients (mostly zeros) are grouped together at the end.
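• A minimal Python sketch of the zig-zag scan of an 8 x 8 array of quantized coefficients.

def zigzag_order(n=8):
    # Visit the n x n array along its anti-diagonals (constant i + j),
    # alternating direction, starting at the top-left (DC) position.
    order = []
    for s in range(2 * n - 1):
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diagonal.reverse()
        order.extend(diagonal)
    return order

def zigzag_scan(block):
    # Flatten the 2D block into a 1D vector with low-frequency coefficients
    # first, so that the high-frequency zeros cluster together at the end.
    return [block[i][j] for i, j in zigzag_order(len(block))]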
DPCM coding
• There is one DC coefficient per block. It is the largest coefficient and, because of its importance, its precision is kept as high as possible during the quantization phase.
• Because of the small physical area covered by each block, the DC
coefficient varies slowly from one block to the next.
• To exploit this similarity, the sequence of DC coefficients is encoded in DPCM mode. This means that the difference between the DC coefficient of each block and that of the adjacent block is computed and stored.
• This scheme helps to reduce the number of bits required to encode the relatively large magnitudes of the DC coefficients.
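• A minimal Python sketch of DPCM coding of the DC coefficients of successive blocks.

def dpcm_encode_dc(dc_values):
    # Store the first DC coefficient as it is, then only the (small)
    # difference between each block's DC coefficient and the previous one.
    diffs = [dc_values[0]]
    for prev, cur in zip(dc_values, dc_values[1:]):
        diffs.append(cur - prev)
    return diffs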
RLE coding
• After the quantization step only some of the coefficients have survived while
others have been reduced to almost zero values. The surviving values of
each block are to be stored.
• However, many of the values might be the same, and to take advantage of this they are run length encoded.
• Due to the zig-zag scan the AC coefficients of each block have been
grouped in such a way that the zero values have been clustered together.
• To exploit this arrangement, for each string of repeated zero values, a single value is stored along with a count of how many times the value is to be repeated.
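• A simplified Python sketch of run-length coding of the zig-zag ordered AC coefficients as (zero-run, value) pairs; the (0, 0) end-of-block marker is a simplification of the real JPEG run/level scheme.

def rle_ac(ac_coeffs):
    # Each pair gives the number of zeros preceding the next non-zero value.
    pairs, zero_run = [], 0
    for c in ac_coeffs:
        if c == 0:
            zero_run += 1
        else:
            pairs.append((zero_run, c))
            zero_run = 0
    pairs.append((0, 0))  # end of block: all remaining coefficients are zero
    return pairs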
Huffman coding and Frame packing
• The final step consists of applying a Huffman encoding scheme which allocates variable length codes and requires a code table. This is applied to both the differentially encoded DC coefficients of the different blocks as well as the AC coefficients within each block.
• The frame packing block does the final assembling of the data and adds
additional error-checking codes before sending the data to the output as an
encoded data stream.
JPEG Encoder Block Diagram
JPEG Decoder Block Diagram
MPEG
• The primary uses for the MPEG-4 standard are web (streaming media)
and CD distribution, conversational (videophone), and broadcast
television.
• Layer I is the basic mode; Layer II and Layer III have increasing levels of complexity associated with them, which in turn produces a corresponding increase in compression for the same perceived audio quality.
MPEG-1 Audio
• In 1993 MP2 (MPEG-1 Layer II) files first appeared on the Internet and were often played back using the Xing MPEG Player.
• The bitrate for Layer II is about 128 kbps/channel, the compression ratio ranges from 6:1 to 8:1, and the quality is the same as that of digital audio broadcasting.
• Beginning in the first half of 1995 MP3 (MPEG-1 Layer III) files began
flourishing on the Internet. Most listeners accept the MP3 bitrate of 64
kbps/channel as near CD quality, which provides a compression ratio of
approximately 10:1.
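As a rough check of these figures, assuming a CD-quality stereo source of 2 channels X 44,100 samples/s X 16 bits, i.e. about 1411 kbps:

\[
\frac{1411.2}{2 \times 128} \approx 5.5 \;(\text{Layer II, about } 6{:}1),
\qquad
\frac{1411.2}{2 \times 64} \approx 11 \;(\text{Layer III / MP3, about } 10{:}1).
\]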
MPEG-1 Video
• The first I frame must be transmitted first followed by the next P frame
and then by the B frames. Thereafter the second I frame must be
transmitted.
• The decoder first uses the motion vector and then the prediction
error to create the new macroblock.
Display order: IBBPBBPBBI...
Transmission order: IPBBPBBIBBPBB...
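• A minimal Python sketch of this reordering: each I or P (anchor) frame is moved ahead of the B frames that precede it in display order, because those B frames are predicted from it.

def transmission_order(display_order):
    out, pending_b = [], []
    for frame in display_order:
        if frame == "B":
            pending_b.append(frame)   # hold back until its anchor frames are sent
        else:                         # "I" or "P" anchor frame
            out.append(frame)
            out.extend(pending_b)
            pending_b = []
    return out

# transmission_order(list("IBBPBBPBBI")) -> ['I','P','B','B','P','B','B','I','B','B']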
MPEG-2 Audio
• The MPEG-2 audio standard was designed for applications ranging from digital HDTV transmission to Internet downloading.
• The MPEG-2 AAC (Advanced Audio Coding) format codes stereo or multichannel sound at 64 kbps/channel. It is specified in the ISO/IEC 13818-7 standard, which was finalized in April 1997.
MPEG-2 Video
• MPEG-2 is formally referred to as ISO/IEC specification 13818 and was completed in 1994. Specifically, MPEG-2 was developed to provide video quality not lower than conventional television (NTSC/PAL) quality and up to HDTV quality.
• Additional features include support for interlaced video, adaptive selection of whether the DCT is applied at frame level or field level, provision for a panning window within a frame, support for encoding at multiple quality levels, etc.
MPEG-4
• The most important feature of MPEG-4 is content based coding. It is the first standard that supports content based coding of audio-visual objects. The contents such as audio, video and data are represented in the form of primitive audio-visual objects (AVO).
• Each audio and video object is described by an object descriptor which
enables a user to manipulate the objects.
• MPEG-4 Part 3 (formally ISO/IEC 14496-3) is, as the name suggests,
the third part of the ISO/IEC MPEG-4 international standard.
• aacPlus was standardized by the MPEG under the High Efficiency AAC
(HE-AAC) name. The codec can operate at very low bitrates and is
good for Internet radio streaming.
• MPEG-4 Part 14, or *.mp4, is a file format (a so-called container) specified as a part of the ISO/IEC MPEG-4 international standard.
• MPEG-4 Part 2 is a video coding technology developed by MPEG. Like
the Audio part for MPEG-4, the video part is divided into several profiles
that are aimed for use in several different standards.
• Simple Profile is mostly aimed for use in cell phones and other devices that can't handle features in MPEG-4 that require a lot of CPU power.
• MPEG-4 Part 10 is a high compression digital video codec
standard written by the ITU-T Video Coding Experts Group
(VCEG) together with the ISO/IEC Moving Picture Experts Group
(MPEG)
• The ITU-T H.264 standard and the ISO/IEC MPEG-4 Part 10 standard
(formally, ISO/IEC 14496-10) are technically identical, and the
technology is also known as Advanced Video Coding (AVC).
• This must be done without such an increase in complexity as would make the design impractical (too expensive) to implement.
MPEG-7
• The MPEG-7 standard is targeted towards making the identification of
various accessible audio / video resources easier.
• The main elements of the MPEG-7 standard are the following:
• Descriptors (D), which define the syntax and the semantics of each feature (metadata element); and Description Schemes (DS), which specify the structure and semantics of the relationships between their components, which may be both Descriptors and Description Schemes.