
Multimedia Principles

Chapter 10: Compression
Introduction
• Compared to text files, other media files like image, audio and video take up a huge amount of disk space.

• Because of their huge sizes, storage requirements increase rapidly with multiple such media files, thereby increasing cost. During a multimedia development project, one can quickly run out of disk space.

• Even if there is adequate storage space, such files require a large data transfer rate that may be beyond the capabilities of both the processor and the hard disk. For example, to play back a CD-quality audio file, the disk and processor have to handle about 1.4 million bits of data per second (44,100 samples/s × 16 bits/sample × 2 channels ≈ 1.41 Mbps).

• After an analog quantity has been digitized, it is stored on the disk as a digital file. These files are referred to as raw or uncompressed media data.

• To compress a file and reduce its size, it needs to be filtered through specialized software called a CODEC, which is short for Compression/Decompression or Coder/Decoder.

• The algorithms work by trying to find redundant information within the media files. Redundant information is information that can either be discarded without appreciably affecting media quality, or be expressed in a more compact form.

• In either case this leads to a reduction in file size, but the actual amount of reduction depends on a large number of factors involving both the media data and the CODEC used.

• The compressed media is then stored again on the disk or other storage media, frequently as a different type of file with another extension.

• When the media needs to be played back, either stand-alone or as part of a presentation, the same CODEC is used again to decompress the data. Hence the compression process needs to be essentially reversible.
• However, depending on the type of CODEC used, the decompressed data may or may not be exactly identical to the original uncompressed data.

• The decompression step is necessary because playback devices like the monitor or speakers cannot directly handle compressed data.

Types of Compression
• Lossless compression - In this case the original data is not changed permanently during compression. After decompression, therefore, the original data can be retrieved exactly.

• CODECs used for lossless compression are called lossless CODECs. The algorithms within these CODECs attempt to represent the existing information in a more compact form without actually discarding any data.

• The advantage of lossless compression is that the original data stays intact and can be reused. The disadvantage is that the compression achieved is not very high, typically of the order of 2 to 5 times.
• While lossless compression is always desirable, images with very little redundancy do not produce acceptable results with this technique.

• In lossy compression, parts of the original data are discarded permanently to reduce file size. After decompression the original data cannot be recovered, which leads to a degradation of media quality.

• However, if done correctly, the quality loss may not be immediately apparent because of the limitations of the human eyes and ears, and also because of the tendency of the human senses to bridge gaps in information.

• Compression ratios of the order of 10 to 50 times may be achieved by this method.

Redundancies
• Statistical redundancy owes its origin to some kind of statistical relationship existing within the media data.

• It may be related either to the actual information content, e.g. the intensity values of adjacent similar pixels in an image, or to the representation of the information, e.g. frequently occurring data may be represented by a smaller number of bits.

• While interpixel redundancy resides in the image, audio and video data themselves, psycho-visual redundancy originates from the characteristics of the human visual system (HVS).

• In the HVS, visual information is not perceived equally; some information may be more important than other information.

• This implies that if we use less data to represent the less important visual information, perception will not be affected. Eliminating this type of redundancy leads to data compression.

Lossless Techniques
• Lossless compression techniques are also known as Entropy Coding. Entropy coding is a generic term referring to compression techniques which do not take into account the nature of the information to be compressed.

• Entropy-based techniques treat all data as sequences of bits and try to achieve compression by statistical manipulation or rearrangement of the bits.

Run Length Encoding (RLE)

• In this method, any sequence of repetitive characters may be replaced by a shorter form.

• A series of n successive identical characters may be replaced by a single instance of the character together with the number of occurrences.

• To indicate that the number has a special meaning and is not part of the normal text, a special character is used as a flag.

• Using an exclamation mark as the flag to indicate RLE, the following example shows how data can be compressed by replacing the sequence of six "N" characters by "!6N":

Uncompressed data : UNNNNNNIMANNHEIM
Compressed data   : U!6NIMANNHEIM
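
A minimal sketch of this flag-based scheme in Python (the "!" flag and the rule of only encoding runs of four or more characters are illustrative choices, not part of any particular standard; the input is assumed not to contain the flag character):

FLAG = "!"  # marks a run; assumed absent from the input text

def rle_encode(text, threshold=4):
    """Replace runs of identical characters with '!<count><char>'."""
    out, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1                       # extend the run
        run = j - i
        if run >= threshold:             # only encode runs that save space
            out.append(f"{FLAG}{run}{text[i]}")
        else:
            out.append(text[i] * run)
        i = j
    return "".join(out)

def rle_decode(data):
    out, i = [], 0
    while i < len(data):
        if data[i] == FLAG:
            j = i + 1
            while data[j].isdigit():     # the count may span several digits
                j += 1
            out.append(data[j] * int(data[i + 1:j]))
            i = j + 1
        else:
            out.append(data[i])
            i += 1
    return "".join(out)

print(rle_encode("UNNNNNNIMANNHEIM"))   # U!6NIMANNHEIM
print(rle_decode("U!6NIMANNHEIM"))      # UNNNNNNIMANNHEIM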
Huffman Coding

• This method consists of identifying the most frequent bit or byte patterns in a given file and coding these patterns with fewer bits than initially used.

• Less frequent patterns are coded with more bits, whereas the most frequent patterns use the shortest codes.

• A table of correspondence between the initial patterns and their new representations, called the code-book, must be available at both the encoding and the decoding ends.


• This table is generated by analyzing the probabilities of occurrence of each character in the file during the encoding process. The actual codewords are generated using a binary tree whose branches are assigned values of 0 or 1.

• Let us consider a document containing the following string: AAAAAAAAAAAABBCD, i.e. 12 A's, 2 B's, 1 C and 1 D. The probabilities of the four characters are: p(A) = 12/16 = 3/4, p(B) = 2/16 = 1/8, p(C) = p(D) = 1/16.


• Transmitting this string without compression using a 7-bit ASCII code for each character requires a total of 7 × 16 = 112 bits.

• To use Huffman coding, a binary tree is constructed whose root node has two branches. The first branch ends in a leaf node A with probability 3/4, while the other branch leads to a node representing the combined characters BCD with probability 1/4.

• In the next step, the node containing BCD is again divided into two: B with probability 1/8 and CD with probability 1/8. In the final step, the node containing CD is divided into C with probability 1/16 and D with probability 1/16.

[Figure: Huffman binary tree for the string AAAAAAAAAAAABBCD]

• As each branch divides, a binary value of 0 or 1 is assigned to each new branch: a binary 0 for the left branch and 1 for the right branch.

• The codeword for each character is determined by tracing the path from the root node to that character's leaf node. The codewords are as follows:

A : 1
B : 01
C : 001
D : 000

• Thus we see that the character occurring most frequently is represented by the shortest code, and vice versa.

• After compression, the total number of bits required to represent the document is 12 × 1 + 2 × 2 + 1 × 3 + 1 × 3 = 22 bits, a compression ratio of about 5:1 (112/22 ≈ 5.1).

• Obviously, the greater the repeated occurrence of characters, the greater the compression achieved.
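
The same code-book can be generated with a priority queue, as in this Python sketch (tie-breaking is arbitrary, so the 0/1 labels may differ from the tree above, but the codeword lengths, and hence the 22-bit total, come out the same):

import heapq
from collections import Counter

def huffman_codebook(text):
    """Build {char: codeword} by repeatedly merging the two least
    frequent nodes, mirroring the tree construction above."""
    heap = [[freq, n, {ch: ""}]
            for n, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)   # two least frequent subtrees
        f1, _, c1 = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in c0.items()}
        merged.update({ch: "1" + code for ch, code in c1.items()})
        heapq.heappush(heap, [f0 + f1, n, merged])
        n += 1
    return heap[0][2]

book = huffman_codebook("AAAAAAAAAAAABBCD")
print(book)   # codeword lengths: A=1, B=2, C=3, D=3
print(sum(len(book[ch]) for ch in "AAAAAAAAAAAABBCD"))   # 22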

Lempel Ziv (LZ) Coding

• Unlike Huffman or arithmetic coding, Lempel-Ziv (LZ) coding codes documents by considering a string of characters at a time instead of single characters.

• For text compression, both the encoder and the decoder hold a table containing all possible words in the document.

• As each word occurs in the text, the encoder stores the index of the word in the table instead of the actual word.

• The decoder uses the index to access the corresponding word from the table in order to reconstruct the document.
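
A toy version of this shared-table scheme (practical LZ coders such as LZW build the dictionary adaptively while scanning the input; the fixed word list here is purely illustrative):

# Shared word table, assumed known to both encoder and decoder.
TABLE = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
INDEX = {word: n for n, word in enumerate(TABLE)}

def lz_encode(words):
    """Replace each word by its index in the shared table."""
    return [INDEX[w] for w in words]

def lz_decode(indices):
    return [TABLE[n] for n in indices]

codes = lz_encode("the quick brown fox jumps over the lazy dog".split())
print(codes)                        # [0, 1, 2, 3, 4, 5, 0, 6, 7]
print(" ".join(lz_decode(codes)))   # the quick brown fox jumps over the lazy dog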

Differential Pulse Code Modulation (DPCM)

• Differential coding techniques do not code a sample value directly but instead code the difference between the actual value of the sample and a prediction of that value.

• Hence it is also called the predictive coding technique. The predicted value is derived from the previous sample value.

• The difference between the predicted and the actual value is called the prediction error. The differential coding method is particularly well suited for signals in which successive values do not differ much from each other, e.g. audio samples.
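
A minimal DPCM sketch using the simplest predictor, "the next sample equals the previous one" (a sketch only: practical coders quantize the prediction errors and may use more elaborate predictors):

def dpcm_encode(samples):
    """Code each sample as its difference from the previous sample."""
    prediction, errors = 0, []          # initial prediction agreed with decoder
    for s in samples:
        errors.append(s - prediction)   # prediction error
        prediction = s                  # predictor = previous sample
    return errors

def dpcm_decode(errors):
    prediction, samples = 0, []
    for e in errors:
        prediction += e
        samples.append(prediction)
    return samples

audio = [100, 102, 105, 104, 104, 101]
enc = dpcm_encode(audio)
print(enc)                # [100, 2, 3, -1, 0, -3] -- small values, cheap to code
print(dpcm_decode(enc))   # [100, 102, 105, 104, 104, 101]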
Delta Modulation (DM)

• Delta modulation is a special case of DPCM. Here the difference between the predicted value and the current value is coded with a single bit.

• Since a single bit can take only two values, the difference signal only specifies whether a sample value is greater than (positive difference) or less than (negative difference) the previous sample value.

• When applied to signals which change too rapidly, DM leads to a kind of distortion called slope overload distortion. On the other hand, for signals which change too slowly, DM leads to another type of distortion called granular distortion.
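
A sketch of 1-bit delta modulation with a fixed step size (the step size of 2 is an arbitrary illustrative choice; the two test signals below reproduce the two distortions just described):

STEP = 2  # fixed step size: choosing it is the central DM trade-off

def dm_encode(samples):
    """Emit one bit per sample: 1 = step up, 0 = step down."""
    approx, bits = 0, []
    for s in samples:
        bit = 1 if s > approx else 0
        approx += STEP if bit else -STEP   # decoder tracks the same value
        bits.append(bit)
    return bits

def dm_decode(bits):
    approx, out = 0, []
    for bit in bits:
        approx += STEP if bit else -STEP
        out.append(approx)
    return out

ramp = [0, 10, 20, 30]                # rises faster than STEP can follow
print(dm_decode(dm_encode(ramp)))     # [-2, 0, 2, 4] -- slope overload
flat = [3, 3, 3, 3]                   # nearly constant signal
print(dm_decode(dm_encode(flat)))     # [2, 4, 2, 4] -- granular 'hunting'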

Adaptive Differential Pulse Code Modulation (ADPCM)

• When applying DPCM to a changing signal, it is assumed that the differences between sample values will be small compared to the sample values themselves.

• However, for signals which change too rapidly, the difference values may occasionally be too large to be encoded within 4 bits. In those cases DPCM would produce distortion.

• To minimize this distortion, an adaptive DPCM scheme is followed. In this method the number of bits is varied depending on the amplitude of the differential signal: fewer bits are used to encode smaller difference values than larger ones.
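
One way to picture the adaptation is the toy scheme below, which switches between 4-bit and 8-bit difference codes per sample (this is not the actual ITU ADPCM algorithm, which adapts a quantizer step size instead, and a real bitstream would also have to spend bits signalling the chosen width):

def adpcm_like_encode(samples):
    """Code each difference with 4 bits when it fits, else 8 bits."""
    prediction, out = 0, []
    for s in samples:
        diff = s - prediction
        width = 4 if -8 <= diff <= 7 else 8   # 4-bit two's-complement range
        out.append((width, diff))
        prediction = s
    return out

enc = adpcm_like_encode([100, 103, 101, 150, 152])
print(enc)                              # [(8, 100), (4, 3), (4, -2), (8, 49), (4, 2)]
print(sum(w for w, _ in enc), "bits")   # 28 bits vs 40 at a fixed 8 bits/sample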

Graphics Interchange Format (GIF)

• The Graphics Interchange Format (GIF) is used extensively to compress images for use on the Internet. It is an 8-bit format, which means that it can represent a maximum of 256 colors.

• When images of higher color depth are converted to GIF, the algorithm chooses those 256 colors from the original set which most closely represent the colors of the image.

• The GIF format uses a Color Lookup Table (CLUT) to store the color values of pixels. The table consists of 256 rows and 3 columns.
[Figure: GIF Color Lookup Table (CLUT) with 256 rows of R, G, B values]

• Each row holds one color value, with its R, G and B components in the 3 columns. When an image is converted to the GIF format, the algorithm selects the 256 most representative colors from the image and stores their values in the 256-row table.

• Next it uses the 8-bit index number of the row, instead of the 24-bit RGB value, to represent each pixel's color, resulting in a compression ratio of 3:1.

• This compression can be both lossy and lossless. Since each of the 256 stored colors is represented exactly by its RGB value, the process itself is essentially lossless.
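
A sketch of the index-per-pixel idea (nearest-color matching by squared RGB distance; the 4-entry palette below stands in for GIF's 256-row CLUT):

# Toy palette: 4 entries standing in for GIF's 256-row CLUT.
CLUT = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (255, 255, 255)]

def nearest_index(rgb):
    """Index of the palette color closest to rgb (squared distance)."""
    return min(range(len(CLUT)),
               key=lambda n: sum((a - b) ** 2 for a, b in zip(CLUT[n], rgb)))

pixels = [(250, 10, 5), (3, 2, 0), (10, 240, 20)]
indexed = [nearest_index(p) for p in pixels]
print(indexed)                      # [1, 0, 2] -- one byte per pixel, not three
print([CLUT[n] for n in indexed])   # decoded (approximate) colors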

• However, if the number of colors in the original image is more than 256, the process of selecting the 256 most representative colors leads to loss of information.

• Further compression may be achieved by applying the LZW compression algorithm to the stream of index values.

Lossy Techniques
Transform Coding

• The step called transform coding converts data into a form which is better suited for identifying redundancies.

• It is based on the idea that there may be redundancies in image or audio data, but those redundancies may not be readily apparent.

• This phase usually involves conversion from the spatial (for images) or temporal (for audio) domain to the frequency domain, using mathematical transforms like the Discrete Fourier Transform or the Discrete Cosine Transform.

• Once converted to the frequency domain, it becomes relatively easier to identify portions of the original data which may be considered irrelevant because they do not contribute much to the understanding or perception of the original image or audio.

• It is to be noted that the transform coding step by itself does not lead to any compression or loss of data; the data is simply expressed in another form, which can be readily reverted to its original form using an inverse transform operation.

Psycho-analysis

• This phase is responsible for analysing the transformed data and identifying which portions may be irrelevant with respect to the human visual or acoustic system.

• A quantization procedure following the analysis then actually discards the redundant data and leads to the reduction in file size.

• The input audio or visual data is fed to a psycho-analytic block which has knowledge of the limitations of the human auditory and visual systems, and can therefore find the redundant portions of the input data.

Interframe Correlation

• This involves the exploitation of the interframe correlation that exists between successive frames in a video sequence, in addition to the intraframe correlation that exists within each frame.

• Interframe correlation is also called temporal redundancy, while intraframe correlation is referred to as spatial redundancy.

• In the first technique, called frame replenishment, each pixel in a frame is classified into a changing or an unchanging area depending on whether or not the intensity difference between its present value and its value in the previous frame exceeds a threshold.


• The unchanging pixels of a set of frames are coded only once and are simply repeated over the following frames. The changing pixels, however, are encoded for every frame.

• Since the technique only encodes those pixels whose intensity values change, its coding efficiency is much higher than that of coding techniques which encode every pixel of every frame.
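
A sketch of the frame-replenishment classification (frames as 2D lists of intensities; the threshold of 10 levels is an arbitrary illustrative value):

THRESHOLD = 10  # intensity change below this counts as 'unchanging'

def replenishment(prev_frame, cur_frame):
    """Return (position, new_value) only for pixels that changed enough."""
    updates = []
    for y, (prev_row, cur_row) in enumerate(zip(prev_frame, cur_frame)):
        for x, (p, c) in enumerate(zip(prev_row, cur_row)):
            if abs(c - p) > THRESHOLD:
                updates.append(((x, y), c))
    return updates

prev = [[100, 100], [50, 50]]
cur  = [[100, 180], [52, 50]]    # only one pixel changed significantly
print(replenishment(prev, cur))  # [((1, 0), 180)]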


• Side by side with frame-difference predictive coding, another method, called displacement-based predictive coding, was also being developed.

• Here the changes between successive frames are considered to be due to the translation of moving objects in the image plane.

• Motion vectors of objects are computed and encoded instead of pixel values.

• The drawback of this model is that it cannot accommodate motions other than translation, such as rotation or zooming.


• In reality, both the pixel-difference method and the motion-vector method are used together.

• First, motion vectors are computed from the present frame to arrive at an intermediate (predicted) frame.

• Next, pixel differences are computed between the intermediate frame and the actual next frame.

JPEG

• JPEG (Joint Photographic Experts Group) is the ISO/IEC international standard 10918-1, developed in collaboration with the International Telecommunication Union (ITU). It became an international standard in 1992. The essential characteristics of JPEG are summarized below.

• Compression involves a number of steps: block preparation, forward DCT, quantization, zig-zag scan, DPCM, RLE and Huffman coding.

• Decompression simply involves the reverse procedures in the proper sequence.
Block Preparation
• An image is represented by one or more 2D arrays of pixel values.

• For color images there will be three 2D arrays of pixel values, corresponding to the R, G and B components.

• The block preparation step breaks each 2D array of the image into individual blocks of 8 × 8 pixels.

• For an image of 640 × 480 pixels in RGB format, this step prepares (640/8) × (480/8) = 4800 blocks each for the R, G and B information.

Discrete Cosine Transform (DCT)

• The objective of this step is to transform each block from the spatial domain to the frequency domain.

• Each block is composed of 64 values which represent the amplitudes of the sampled signal.

• We may call this function a = f(x,y), where x and y are the two spatial dimensions and a represents the amplitude of the signal (or pixel) at the sampled position (x,y).

• After the DCT this function is turned into another function c = g(Fx,Fy), where c is a coefficient and Fx and Fy are the respective spatial frequencies in each direction.

• For an image, these frequencies determine how rapidly the luminance and color change in each direction.

• The coefficient g(0,0) corresponds to zero frequency in both directions and is called the DC coefficient.

• For a continuous-tone image, sample values vary only very slightly in luminance and color from point to point. Thus the coefficients of the lowest frequencies will be high, while the coefficients of the higher frequencies will be low.

• From the DFT, ignoring the imaginary component, we get the equation of the forward DCT (up to a normalization factor) as:

  ReX[k] = \sum_{n=0}^{N-1} x[n] \cos(2\pi kn/N)

  where ReX[ ] is the real component value in the frequency domain, N is the total number of samples, and x[ ] is the time-domain value.

• Since we are considering an image, which is a two-dimensional entity, we need a 2-dimensional DCT. Here both the spatial domain array and the frequency domain array are two-dimensional arrays.
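
For reference, the 8 × 8 two-dimensional forward DCT in the form commonly used in JPEG, where P[x,y] are the pixel values of a block and F[i,j] are the transformed coefficients:

F[i,j] = \frac{1}{4} C(i)\,C(j) \sum_{x=0}^{7} \sum_{y=0}^{7} P[x,y]
         \cos\frac{(2x+1)i\pi}{16} \cos\frac{(2y+1)j\pi}{16},
\qquad C(u) = \begin{cases} 1/\sqrt{2}, & u = 0 \\ 1, & u > 0 \end{cases}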

• All 64 values in the input array P[x,y] contribute to each entry in the transformed frequency-domain array F[i,j].

• For i = j = 0, the horizontal and vertical frequencies are zero and both cosine terms become cos(0) = 1, so the value in location F[0,0] of the transformed array is a function of the summation of all the values in the input array. This term is known as the DC coefficient.

• Since the values in the other locations of the transformed array have a frequency coefficient associated with them, either horizontal or vertical or both, they are known as AC coefficients.
Quantization
• The human eye responds primarily to the DC coefficient and the lower spatial frequency coefficients. Thus if the magnitude of a higher-frequency coefficient is below a certain threshold, the eye will not detect it.

• This property is exploited in the quantization phase by dropping (i.e. setting to zero) those higher spatial frequency coefficients in the transformed array whose amplitudes are less than a pre-defined threshold value.

• Instead of simply comparing each coefficient with the corresponding threshold value, a division operation is performed using the defined threshold value as the divisor.

• If the resulting (rounded) quotient is zero, the coefficient is less than the threshold value; if it is non-zero, it indicates the number of times the coefficient exceeds the threshold.

• The threshold values are stored in a square 8 × 8 table called the quantization table. Each element of the table can be any integer value from 1 to 255.

• The threshold values used in general increase in magnitude with increasing spatial frequency. Hence many of the higher-frequency coefficients are scaled to zero. Since some of the components are discarded, this step leads to data loss.
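
The quantize/dequantize round trip, sketched for one tiny block (2 × 2 arrays stand in for the full 8 × 8 block and quantization table; actual table values come from the JPEG standard or the encoder's quality setting):

import numpy as np

def quantize(coeffs, qtable):
    """Divide each DCT coefficient by its threshold and round."""
    return np.rint(coeffs / qtable).astype(int)

def dequantize(q, qtable):
    return q * qtable   # the rounding error is the information lost

coeffs = np.array([[920.0, -63.0], [14.0, 3.0]])   # toy block of coefficients
qtable = np.array([[16.0, 11.0], [12.0, 35.0]])    # thresholds grow with frequency
q = quantize(coeffs, qtable)
print(q)                       # [[58 -6] [ 1  0]] -- the small 3.0 is dropped to 0
print(dequantize(q, qtable))   # [[928. -66.] [ 12.   0.]] -- close, but not exact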

Zig-zag Scan
• After the DCT stage, the remaining stages involve entropy encoding. The entropy coding algorithms operate on a one-dimensional string of values.

• The output of the quantization stage is, however, a 2D array. Hence, to apply an entropy scheme, the array has to be converted to a 1D vector.

• To cluster the zero and non-zero values together, a zig-zag scan of the array is performed. The scan starts from the top-left value and proceeds diagonally, as sketched below.
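
The scan order can be generated by walking the anti-diagonals (constant x + y) in alternating directions, as in this sketch (shown for a 4 × 4 block to keep the output short; JPEG applies the same pattern to 8 × 8 blocks):

def zigzag_order(n):
    """Visit an n x n block along anti-diagonals, alternating direction."""
    order = []
    for s in range(2 * n - 1):          # s = x + y is constant on a diagonal
        diag = [(x, s - x) for x in range(n) if 0 <= s - x < n]
        order.extend(diag if s % 2 else reversed(diag))
    return order

print(zigzag_order(4))
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2), ...]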
DPCM coding
• There is one DC coefficient per block. It is the largest coefficient, and because of its importance it is preserved as accurately as possible during the quantization phase.

• Because of the small physical area covered by each block, the DC coefficient varies slowly from one block to the next.

• To exploit this similarity, the sequence of DC coefficients is encoded in DPCM mode. This means that the difference between the DC coefficient of each block and that of the previous block is computed and stored.

• This scheme helps to reduce the number of bits required to encode the relatively large magnitudes of the DC coefficients.
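
Applied to the sequence of per-block DC coefficients this is just the DPCM idea from earlier; a two-line illustration with made-up coefficient values:

dc = [612, 615, 610, 608, 611]   # DC coefficients of successive blocks
diffs = [dc[0]] + [b - a for a, b in zip(dc, dc[1:])]
print(diffs)   # [612, 3, -5, -2, 3] -- small differences need few bits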

RLE coding
• After the quantization step only some of the coefficients survive, while the others have been reduced to zero. The surviving values of each block have to be stored.

• However, many of the values may be the same, and to take advantage of this they are run-length encoded.

• Due to the zig-zag scan, the AC coefficients of each block have been ordered in such a way that the zero values are clustered together.

• To exploit this arrangement, each run of repeated zero values is stored as a single value along with a count of how many times it is to be repeated.

Huffman coding and Frame packing
• The final step consists of applying a Huffman encoding scheme, which allocates variable-length codes and requires a code-table. This is applied both to the differentially encoded DC coefficients of the different blocks and to the AC coefficients within each block.

• The frame packing block does the final assembly of the data and adds error-checking codes before sending the data to the output as an encoded data stream.

[Figure: JPEG encoder block diagram]

[Figure: JPEG decoder block diagram]
MPEG

• The Moving Picture Experts Group (MPEG) is a working group under ISO/IEC (International Organization for Standardization / International Electrotechnical Commission) set up to formulate a set of standards for a range of multimedia applications involving primarily audio and video.

• It first devised the ISO/IEC International Standard 11172 for reduced-data-rate coding of video and audio signals. The standard was finalized in November 1992 and is commonly known as MPEG-1.


• MPEG-2 defines video and audio storage and transmission standards for broadcast-quality television.

• It is used for digital TV transmission, digital satellite TV services, digital cable TV signals and DVD applications.

• It is defined in a series of documents which are all parts of ISO/IEC Recommendation 13818.


• MPEG-4, introduced in late 1998, is the designation for a group of audio and video coding standards and related technology agreed upon by the ISO/IEC MPEG.

• The primary uses of the MPEG-4 standard are web (streaming media) and CD distribution, conversational applications (videophone), and broadcast television.

• MPEG-4 absorbs many of the features of MPEG-1, MPEG-2 and other related standards, adding new features such as (extended) VRML (Virtual Reality Modeling Language) support for 3D rendering, object-oriented composite files (including audio, video and VRML objects), support for externally specified Digital Rights Management, and various types of interactivity.


• In February 2002 the ISO subcommittee SC29, working group WG11 (MPEG, the Moving Picture Experts Group), published another standard called the "Multimedia Content Description Interface", in short MPEG-7.

• This provides a set of descriptors (D), i.e. quantitative measures of audio-visual features, and description schemes (DS), i.e. structures of descriptors and their relationships.

• A language called the Description Definition Language (DDL) can be used to specify the schemes.

• It is currently the most complete description standard for multimedia.


• Formally defined as ISO/IEC 21000-N, MPEG-21 aims at defining an open framework for multimedia delivery and consumption.

• MPEG-21 is based on two essential concepts: the definition of a fundamental unit of distribution and transaction (the Digital Item) and the concept of Users interacting with Digital Items.

• MPEG-21 identifies and defines the mechanisms and elements needed to support the multimedia delivery chain described above, as well as the relationships between them and the operations they support.
MPEG-1 Audio

• The MPEG-1 Audio standard, defined in ISO Recommendation 11172-3, provides compression algorithms for digital audio.

• There are three different layers pertaining to three levels of complexity of the algorithms. All of them are based on the principle of the perceptual coder.

• Layer I is the basic mode; Layer II and Layer III have increasing levels of complexity associated with them, which in turn produce a corresponding increase in compression for the same perceived audio quality.


• Layer I describes the least sophisticated method and requires relatively high data rates of about 192 kbps/channel. The compression ratio is about 4:1 and the quality is comparable to that of a digital audio cassette.

• The MPEG-1 Layer II encoding project was initiated by Fraunhofer IIS (http://www.iis.fraunhofer.de/). The project was financed by the European Union as part of the EUREKA research program, commonly known as EU-147.

• In 1993, MP2 (MPEG-1 Layer II) files first appeared on the Internet and were often played back using the Xing MPEG Player.

• The bitrate for Layer II is about 128 kbps/channel, the compression ratio ranges from 6:1 to 8:1, and the quality is comparable to that of digital audio broadcasting.

• The MPEG Layer III data reduction algorithm is widely used to compress files prior to electronic distribution. The Layer III algorithm is widely known as MP3.

• Beginning in the first half of 1995, MP3 (MPEG-1 Layer III) files began flourishing on the Internet. Most listeners accept the MP3 bitrate of 64 kbps/channel as near CD quality, which corresponds to a compression ratio of approximately 10:1.

• Most encoders allow encoding profiles with different levels of compression. For example, bit rates of 28.8, 64, 112, 128, 192 and 320 kbps may be offered. Higher bit rates may provide stereo playback at a 44.1 kHz sampling frequency, whereas lower rates will not.

MPEG-1 Video

• The MPEG-1 video standard was completed in 1991 with the development of the ISO/IEC specification 11172, the standard for coding of moving pictures and associated audio for digital storage at about 1.5 Mbps.

• This provides VHS-quality video and audio. It was developed for CD-ROM applications.

• Important features provided by MPEG-1 include frame-based random access to video, fast-forward/fast-reverse searches, reverse playback of video, and editability of the compressed bitstream. To achieve a high compression ratio, both intra-frame and inter-frame redundancy are exploited.


• MPEG-1 Video uses 3 different types of frames: I-frames, P-frames and B-frames.

• I-frames - These are coded without any reference to other images. MPEG makes use of JPEG-style coding for I-frames. They can be used as a reference for other frames.

• In a video stream, I-frames are present at regular intervals so that decoding can restart periodically, for example if the contents of a frame are corrupted during transmission.

• Typical compression ratios range from 10:1 to 20:1.



• P-frames - These require information from the previous I- and/or P-frame for encoding and decoding.

• By exploiting temporal redundancy, the achievable compression ratio is higher than that of I-frames. P-frames can be decoded only after the referenced I- or P-frame has been decoded.

• Encoding of P-frames is done using a combination of frame replenishment and motion compensation.

• Typical compression ratios range from 20:1 to 30:1.



• B-frames - These require information from both the previous and the following I- and/or P-frame for encoding and decoding. The highest compression ratio is attainable by using these frames.

• B-frames are never used as a reference for other frames.

• Reference frames must be transmitted first. Thus the transmission order and the display order may differ.

• The first I-frame must be transmitted first, followed by the next P-frame and then by the B-frames between them; thereafter the second I-frame is transmitted, and so on.

• Typical compression ratios range from 30:1 to 50:1.


• During the encoding phase, the digitized contents of the Y array associated with each frame are first divided into two-dimensional blocks of 16 × 16 pixels known as macroblocks.

• Assuming a 4:1:1 chroma sub-sampling scheme (chroma halved in each direction, i.e. what is now usually called 4:2:0), the related Cb and Cr arrays of a macroblock are each 8 × 8 pixels in size.

• A macroblock therefore consists of four 8 × 8 DCT blocks for luminance and one DCT block for each of the two chrominance signals, i.e. a total of 6 DCT blocks.


• To encode a P-frame, the contents of each macroblock are compared on a pixel-by-pixel basis with the contents of the corresponding macroblock in the preceding I- or P-frame.

• If a close match is found, then only the address of the current macroblock is encoded, without the contents (the contents are assumed to be the same as those of the macroblock of the previous frame).

• If a match is not found, the search is extended to cover a number of macroblocks in the neighbouring area.

• If a neighbouring macroblock is found to match the current one, a motion vector encoding the offset between the two macroblocks is created.


• The shifted macroblock may not be identical to the original one; there may be differences.

• The differences in pixel values between these two macroblocks are encoded in a set of arrays (for Y, Cb and Cr) called the prediction error.

• The decoder first uses the motion vector and then the prediction error to create the new macroblock.

• Finally, if no match can be found at all (for example, when a moving object has moved out of the frame), the macroblock is encoded independently, in the same way as the macroblocks of an I-frame.
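
A sketch of the block-matching search described above (an exhaustive search over a ±8 pixel window with a sum-of-absolute-differences criterion; real encoders use far faster search strategies). The residual at the winning offset is the prediction error that then gets DCT-coded:

import numpy as np

def sad(a, b):
    """Sum of absolute differences: the block-match criterion."""
    return np.abs(a.astype(int) - b.astype(int)).sum()

def motion_vector(prev, cur, bx, by, bs=16, window=8):
    """Offset (dx, dy) into `prev` best matching the block at (bx, by) in `cur`."""
    block = cur[by:by + bs, bx:bx + bs]
    best, best_cost = (0, 0), sad(prev[by:by + bs, bx:bx + bs], block)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= prev.shape[0] - bs and 0 <= x <= prev.shape[1] - bs:
                cost = sad(prev[y:y + bs, x:x + bs], block)
                if cost < best_cost:
                    best, best_cost = (dx, dy), cost
    return best, best_cost   # a zero vector with zero cost is the 'address only' case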


• A typical sequence of stored (display-order) frames is as follows:

I B B P B B P B B I ...

• When these frames are transmitted from one place to another, it can easily be seen that the first two B-frames cannot be fully decoded until the first P-frame arrives. Hence for transmission purposes the order of the frames is changed to:

I P B B P B B I B B P B B ...

• Hence the transmission order of frames is generally different from the storage order.

MPEG-2 Audio
• The MPEG-2 audio standard was designed for applications ranging from digital HDTV television transmission to Internet downloading.

• It uses lower sampling frequencies (16 kHz, 22.05 kHz, 24 kHz), providing better sound quality at low bit rates (below 64 kbps/channel).

• The MPEG-2 standard encompasses the MPEG-1 standard, using the same coding and decoding principles as MPEG-1.

• It is thus backward compatible with MPEG-1 and is also known as MPEG-2 BC (backward compatible).

• The MPEG-2 audio standard was approved by the MPEG committee in November 1994 and is specified in ISO/IEC 13818-3.

• The MPEG-2 AAC (Advanced Audio Coding) format codes stereo or multichannel sound at 64 kbps/channel. It is specified in the ISO/IEC 13818-7 standard, finalized in April 1997.

• MPEG-2 AAC is not backward compatible with MPEG-1. By lifting the constraint of compatibility, better performance is achieved compared to MPEG-2 BC.

• MPEG-2 AAC supports the standard sampling frequencies of 32, 44.1 and 48 kHz as well as other rates from 8 to 96 kHz, yielding maximum data rates of 48 kbps and 576 kbps, depending on the sampling rate.

MPEG-2 Video
• MPEG-2 is formally referred to as ISO/IEC specification 13818 and was completed in 1994. Specifically, MPEG-2 was developed to provide video quality not lower than, and up to, HDTV quality.

• Its target bit rates are 2 to 15 Mbps and it is optimized at around 4 Mbps.

• The basic coding structure of MPEG-2 video is the same as that of MPEG-1, that is, interframe and intraframe compression with I-, P- and B-frames.
• Additional features include support for interlaced video, adaptive selection of whether the DCT is applied at frame level or field level, provision for a panning window within a frame, and support for encoding at multiple quality levels.

• It provides four levels of video resolution: low, main, high-1440 and high.

• It provides 5 profiles: simple, main, spatially scalable (scalability in spatial resolution), SNR scalable (scalability in quantization accuracy) and high.

• At a given level, the decoders for a given profile are able to decode all the lower profiles defined for that level. This provides scalability in the standard.

MPEG-4
• The most important feature of MPEG-4 is content-based coding. It is the first standard that supports content-based coding of audio-visual objects. Content such as audio, video and data is represented in the form of primitive audio-visual objects (AVOs).

• These AVOs can be composed together to create compound AVOs. Each AVO is in turn composed of one or more audio objects and/or video objects.

• The data associated with the AVOs can be multiplexed and synchronized so that they can be transported over network channels with certain quality requirements.

• Examples of AVOs: a walking figure, the (fixed) background behind the figure, the speech associated with the figure, etc.

• Each audio and video object is described by an object descriptor which enables a user to manipulate the object.

• The language used to describe and manipulate the objects is called Binary Format for Scenes (BIFS). It has commands to change the shape, size and location of an object.

• While individual objects are defined by AVOs, at a higher level a scene is described by a scene descriptor. This defines the various inter-relationships between the AVOs in the context of the complete scene, e.g. their relative positions.

• Each video scene is segmented into a number of video object planes (VOPs), each of which corresponds to an AVO of interest and is composed from the AVO using the minimum number of macroblocks.

• MPEG-4 Part 3 (formally ISO/IEC 14496-3) is, as the name suggests, the third part of the ISO/IEC MPEG-4 international standard.

• It specifies audio coding methods. The Advanced Audio Coding in MPEG-4 Part 3 was enhanced relative to what was previously specified in MPEG-2 Part 7, in order to provide better sound quality for a given encoding bit rate.

• aacPlus is a lossy data compression scheme for audio streams. It is based on MPEG-4 and combines three techniques: Advanced Audio Coding (AAC), Spectral Band Replication (SBR) and Parametric Stereo (PS).

• aacPlus was standardized by MPEG under the High Efficiency AAC (HE-AAC) name. The codec can operate at very low bitrates and is well suited to Internet radio streaming.
• MPEG-4 Part 14, or *.mp4, is a file format (a so-called container) specified as a part of the ISO/IEC MPEG-4 international standard.

• It is used to store media types defined by the ISO/IEC Moving Picture Experts Group, and can be used to store other media types as well.

• *.mp4 files allow streaming over the Internet as well as multiplexing of multiple video and audio streams in one file, variable frame and bit rates, subtitles and still images.

• MPEG-4 Part 2 is a video coding technology developed by MPEG. Like the audio part of MPEG-4, the video part is divided into several profiles aimed at use in several different applications.

• Simple Profile is mostly aimed at use in cell phones and other devices that cannot handle the features of MPEG-4 that require a lot of CPU power.

• Advanced Simple Profile (ASP) is a profile defined in the digital compression codec standard MPEG-4 Part 2 (Visual). It is used for video compression in numerous products like XviD, DivX, 3ivx, FFmpeg and Nero Digital.

• MPEG-4 Part 10 is a high-compression digital video codec standard written by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG).

• The ITU-T H.264 standard and the ISO/IEC MPEG-4 Part 10 standard (formally ISO/IEC 14496-10) are technically identical, and the technology is also known as Advanced Video Coding (AVC).

• The intent of the H.264/AVC project has been to create a standard capable of providing good video quality at bit rates substantially lower (e.g. half or less) than what previous standards would need (e.g. relative to MPEG-2, H.263 or MPEG-4 Part 2).

• This had to be done without such an increase in complexity as would make the design impractical (expensive) to implement.

• An additional goal was to do this in a flexible way that would allow the standard to be applied to a very wide variety of applications (e.g. both low and high bit rates, and low and high resolution video) and to work well on a very wide variety of networks and systems.

MPEG-7
• The MPEG-7 standard is targeted towards making the identification of the various accessible audio/video resources easier.

• Although users physically have access to a large repository of multimedia resources, either locally or over the Internet, until now there was no standardized scheme to search for or filter these resources as required, analogous to string-based search for text. MPEG-7 provides a scheme to tackle this problem.

• The MPEG-7 standard, formally named "Multimedia Content Description Interface", provides a rich set of standardized tools to describe multimedia content. Both human users and automatic systems that process audiovisual information are within the scope of MPEG-7.

• The main elements of the MPEG-7 standard are the following.

• Descriptors (D), which define the syntax and the semantics of each feature (metadata element), and Description Schemes (DS), which specify the structure and semantics of the relationships between their components; these components may be both Descriptors and Description Schemes.

• A Description Definition Language (DDL) to define the syntax of the MPEG-7 Description Tools, to allow the creation of new Description Schemes and, possibly, Descriptors, and to allow the extension and modification of existing Description Schemes.

• System tools to support binary coded representation for efficient storage and transmission, transmission mechanisms (both for textual and binary formats), multiplexing of descriptions, synchronization of descriptions with content, management and protection of intellectual property in MPEG-7 descriptions, etc.