
Audio, video, image basics

(1) Digitization means transferring an analog signal into a digital representation. A good example is the information stored on old audio cassette tapes. The sound pressure waveform is contained on the tape as a corresponding function of magnetization strength. This analog representation has to be converted to a series of numbers, since digital computers can only work with discrete numbers stored at discrete time moments. The process works as follows: at equidistant times, the current value of the analog waveform is taken and stored as a number. This process, where analog signals are turned into a series of digital numbers, is called sampling.
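A minimal Python sketch of this idea (the sine waveform, the 440 Hz tone, and the 8,000 samples/second rate are made-up placeholders; real digitization happens in an A/D converter, not in software):

```python
import math

def sample_waveform(f_signal_hz, f_sample_hz, duration_s):
    """Sample an analog waveform (modeled here as a sine) at equidistant times."""
    n_samples = int(f_sample_hz * duration_s)
    # At each equidistant time t = k / f_sample_hz, take the current value
    # of the waveform and store it as a number.
    return [math.sin(2 * math.pi * f_signal_hz * k / f_sample_hz)
            for k in range(n_samples)]

# A 440 Hz tone sampled at 8,000 samples per second for 10 ms -> 80 numbers.
samples = sample_waveform(440, 8000, 0.010)
print(len(samples), samples[:4])
```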

(2) Capturing means transferring data that is already in a digital representation into a digital representation on computers. Many recording devices already produce a digital representation, such as DAT recorders for audio or DV camcorders for video. In this case the processing is even simpler, since one only has to ensure that the digital packaging (format) is converted. There is, however, equipment where capturing is not possible, because the manufacturers don't offer the digital information in a proper format at an external plug. Earlier MiniDisc recorders only allowed the user to copy the stored data via an analog line. Currently there are no analog recorders left in the video sector, i.e., all cameras carry out a direct digitization.

Digital signals have a number of advantages compared to analog signals:

a) copying can in general be done without loss of information, which is not true for copying analog signals (3 dB degradation);
b) digital circuitry is much more resistant to external influences (electromagnetic radiation, etc.);
c) once data is digitized, all processing is exact and reproducible.
Before audio or video signals can be sent on the Internet, they need to be digitized, as discussed below.

Digitizing Audio
When sound is fed into a microphone, an electronic analog signal is generated that represents the sound
amplitude as a function of time. The signal is called an analog audio signal. An analog signal, such as audio,
can be digitized to produce a digital signal. According to the Nyquist theorem, if the highest frequency of the
signal is f, we need to sample the signal 2f times per second. There are other methods for digitizing an audio
signal, but the principle is the same. Voice is sampled at 8,000 samples per second with 8 bits per sample.
This results in a digital signal of 64 kbps. Music is sampled at 44,100 samples per second with 16 bits per
sample. This results in a digital signal of 705.6 kbps for monaural and 1.411 Mbps for stereo.
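These figures follow directly from samples/second × bits/sample × channels; a quick check of the numbers above:

```python
# Digital audio bit rate = samples/second x bits/sample x channels.
def audio_bit_rate(sample_rate, bits_per_sample, channels=1):
    return sample_rate * bits_per_sample * channels

print(audio_bit_rate(8_000, 8))        # voice: 64,000 bps = 64 kbps
print(audio_bit_rate(44_100, 16))      # mono music: 705,600 bps = 705.6 kbps
print(audio_bit_rate(44_100, 16, 2))   # stereo music: 1,411,200 bps ~ 1.411 Mbps
```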

Digitizing Video
A video consists of a sequence of frames. If the frames are displayed on the screen fast enough, we get an impression of motion; our eyes cannot distinguish the rapidly flashing frames as individual ones. There is no single standard number of frames per second: 25 frames per second is common in Europe (PAL), while North America (NTSC) uses about 30. However, to avoid a condition known as flickering, each frame needs to be refreshed, so the TV industry repaints each frame twice. At 25 frames per second this means 50 frames need to be sent, or, if there is memory at the receiving site, 25 frames with each frame repainted from memory. Each frame is divided into small grids, called picture elements or pixels. For black-and-white TV, each 8-bit pixel represents one of 256 different gray levels. For a color TV, each pixel is 24 bits, with 8 bits for each primary color (red, green, and blue). We can calculate the number of bits per second for a specific resolution. For a color frame of 1,024 × 768 pixels we need

2 × 25 × 1024 × 768 × 24 = 944 Mbps.

This data rate requires a very high-rate transmission technology such as SONET. To send video using lower-rate technologies, we need to compress the video.
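The 944 Mbps figure can be verified the same way (frames/second × repaint factor × pixels × bits/pixel):

```python
# Uncompressed video bit rate: fps x repaint factor x pixels x bits per pixel.
def video_bit_rate(width, height, fps, bits_per_pixel=24, repaint=2):
    return fps * repaint * width * height * bits_per_pixel

rate = video_bit_rate(1024, 768, 25)
print(f"{rate:,} bps = {rate / 1e6:.0f} Mbps")   # 943,718,400 bps ~ 944 Mbps
```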

Audio Compression
Audio compression can be used for speech or music. For speech, we need to compress a 64-kbps digitized signal; for music, we need to compress a 1.411-Mbps digitized signal. Two categories of techniques are used for audio compression: predictive encoding and perceptual encoding.

• Predictive Encoding
In predictive encoding, the differences between the samples are encoded instead of encoding all the sampled values. This type of compression is normally used for speech. Several standards have been defined, such as GSM (13 kbps), G.729 (8 kbps), and G.723.1 (6.3 or 5.3 kbps).
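A minimal sketch of the underlying idea, plain difference (delta) encoding; the real standards named above add adaptive prediction and quantization on top of this:

```python
# Differences between consecutive samples are typically small, so they can be
# represented in fewer bits than the raw sample values.
def predictive_encode(samples):
    prev, diffs = 0, []
    for s in samples:
        diffs.append(s - prev)   # encode the difference, not the value
        prev = s
    return diffs

def predictive_decode(diffs):
    out, prev = [], 0
    for d in diffs:
        prev += d
        out.append(prev)
    return out

samples = [100, 102, 103, 103, 101, 98]
diffs = predictive_encode(samples)
print(diffs)                                  # [100, 2, 1, 0, -2, -3]
assert predictive_decode(diffs) == samples
```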
• Perceptual Encoding: MP3
The most common compression technique used to create CD-quality audio is based on the perceptual encoding technique. As mentioned before, this type of audio needs at least 1.411 Mbps, which cannot be sent over the Internet without compression. MP3 (MPEG audio layer 3), a part of the MPEG standard, uses this technique.
Perceptual encoding is based on the science of psychoacoustics, which is the study of how people perceive sound. The idea is based on flaws in our auditory system: some sounds can mask other sounds. Masking can happen in frequency and in time. In frequency masking, a loud sound in one frequency range can partially or totally mask a softer sound in another frequency range. For example, we cannot hear a soft voice in a room where loud music is playing. In temporal masking, a loud sound can numb our ears for a short time even after the sound has stopped. MP3 uses these two phenomena, frequency masking and temporal masking, to compress audio signals. The technique analyzes the spectrum and divides it into several groups. Zero bits are allocated to the frequency ranges that are totally masked. A small number of bits are allocated to the frequency ranges that are partially masked. A larger number of bits are allocated to the frequency ranges that are not masked. MP3 produces three data rates: 96 kbps, 128 kbps, and 160 kbps. The rate is chosen based on the range of frequencies in the original analog audio.
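A toy illustration of masking-driven bit allocation (the per-band levels, the 12 dB threshold, and the bit counts are invented for the example; a real MP3 encoder derives the masking curve from a psychoacoustic model):

```python
def allocate_bits(signal_db, mask_db):
    """Return bits per band: 0 for totally masked bands, more for audible ones.

    signal_db / mask_db are hypothetical per-band levels for illustration.
    """
    bits = []
    for s, m in zip(signal_db, mask_db):
        audible = s - m                 # how far the band rises above the mask
        if audible <= 0:
            bits.append(0)              # totally masked: zero bits
        elif audible < 12:
            bits.append(2)              # partially masked: a few bits
        else:
            bits.append(6)              # clearly audible: more bits
    return bits

print(allocate_bits([60, 40, 55, 20], [45, 50, 50, 35]))  # [6, 0, 2, 0]
```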

Adaptive DPCM: This variant of DPCM is commonly used for audio compression. In ADPCM the quantization step size adapts to the changing input signal. The predictor also has to adapt itself and recalculate its weights according to changes in the input. Several versions of ADPCM exist. A popular version is the IMA ADPCM standard (Section 7.6), which specifies the compression of PCM from 16 down to 4 bits per sample. ADPCM is fast, but it introduces noticeable quantization noise and achieves unimpressive compression factors of about 4.

Video Compression
As we mentioned before, video is composed of multiple frames. Each frame is one image. We can compress
video by first compressing images. Two standards are prevalent in the market. Joint Photographic Experts
Group (JPEG) is used to compress images. Moving Picture Experts Group (MPEG) is used to compress
video.

Image Compression: JPEG

As discussed previously, if the picture is not in color (grayscale), each pixel can be represented by an 8-bit integer (256 levels). If the picture is in color, each pixel can be represented by 24 bits (3 × 8 bits), with each 8 bits representing red, green, or blue (RGB). To simplify the discussion, we concentrate on a grayscale picture. In JPEG, a grayscale picture is divided into blocks of 8 × 8 pixels. The purpose of dividing the picture into blocks is to decrease the number of calculations, because the number of mathematical operations for each picture is the square of the number of units. The whole idea of JPEG is to change the picture into a linear (vector) set of numbers that reveals the redundancies. The redundancies (lack of changes) can then be removed by using one of the text compression methods.

Video Compression with Motion Compensation

• Consecutive frames in a video are similar: temporal redundancy exists.
• Temporal redundancy is exploited so that not every frame of the video needs to be coded independently as a new image.
• Instead, the difference between the current frame and other frame(s) in the sequence is coded; the differences have small values and low entropy, which is good for compression.
• Steps of video compression based on Motion Compensation (MC):
1. Motion estimation (motion vector search).
2. MC-based prediction.
3. Derivation of the prediction error, i.e., the difference.

Motion Compensation
• Each image is divided into macroblocks of size N × N.
– By default, N = 16 for luminance images. For chrominance images, N = 8 if 4:2:0 chroma subsampling is adopted.
• Motion compensation is performed at the macroblock level.
– The current image frame is referred to as the Target Frame.
– A match is sought between the macroblock in the Target Frame and the most similar macroblock in previous and/or future frame(s) (referred to as Reference Frame(s)).
– The displacement of the reference macroblock relative to the target macroblock is called a motion vector (MV).
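A minimal full-search block-matching sketch (the frame contents, the block size N = 4, and the ±2-pixel search window are made-up toy values; real codecs use N = 16 and much larger windows):

```python
import itertools

def sad(target, ref, tx, ty, rx, ry, n):
    """Sum of absolute differences between an N x N target block and a
    candidate reference block."""
    return sum(abs(target[ty + j][tx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))

def motion_vector(target, ref, tx, ty, n=4, search=2):
    """Full search over a small window; returns the (dx, dy) minimizing SAD."""
    best, best_cost = (0, 0), float("inf")
    h, w = len(ref), len(ref[0])
    for dy, dx in itertools.product(range(-search, search + 1), repeat=2):
        rx, ry = tx + dx, ty + dy
        if 0 <= rx <= w - n and 0 <= ry <= h - n:
            cost = sad(target, ref, tx, ty, rx, ry, n)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best, best_cost

# Reference frame: a bright 4x4 patch at (1, 1); target frame: same patch at (3, 2).
ref    = [[10] * 8 for _ in range(8)]
target = [[10] * 8 for _ in range(8)]
for j in range(4):
    for i in range(4):
        ref[1 + j][1 + i] = 200
        target[2 + j][3 + i] = 200

print(motion_vector(target, ref, tx=3, ty=2))   # ((-2, -1), 0): block came from (1, 1)
```

Full search is exhaustive and therefore expensive; practical encoders use faster suboptimal searches (logarithmic, three-step, and similar strategies).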

Image compression
A digital image is a rectangular array of dots, or picture elements, arranged in m rows and n columns. The
expression m×n is called the resolution of the image, and the dots are called pixels (except in the cases of
fax images and video compression, where they are referred to as pels). The term “resolution” is sometimes
also used to indicate the number of pixels per unit length of the image. Thus, dpi stands for dots per inch.

1. A bi-level (or monochromatic) image. This is an image where the pixels can have one of two values,
normally referred to as black and white. Each pixel in such an image is represented by one bit, making this
the simplest type of image.
2. A grayscale image. A pixel in such an image can have one of the 2^n values 0 through 2^n − 1, indicating one of 2^n shades of gray (or shades of some other color). The value of n is normally compatible with a byte size; i.e., it is 4, 8, 12, 16, 24, or some other convenient multiple of 4 or of 8. The set of the most-significant bits of all the pixels is the most-significant bitplane. Thus, a grayscale image has n bitplanes.
DPCM
The DPCM compression method is a member of the family of differential encoding compression methods,
which itself is a generalization of the simple concept of relative encoding.
It is based on the well-known fact that neighboring pixels in an image are correlated.

3. A continuous-tone image. This type of image can have many similar colors (or grayscales). When
adjacent pixels differ by just one unit, it is hard or even impossible for the eye to distinguish their colors. As a
result, such an image may contain areas
with colors that seem to vary continuously as the eye moves along the area. A pixel in such an image is
represented by either a single large number (in the case of many grayscales) or three components (in the
case of a color image).
4. A discrete-tone image (also called a graphical image or a synthetic image). This is normally an
artificial image. It may have a few colors or many colors, but it does not have the noise and blurring of a
natural image. Examples are an artificial object or machine, a page of text, a chart, a cartoon, or the contents
of a computer screen. (Not every artificial image is discrete-tone. A computer-generated image that’s meant
to look natural is a continuous-tone image in spite of its being artificially generated.)
5. A cartoon-like image. This is a color image that consists of uniform areas. Each area has a uniform color
but adjacent areas may have very different colors. This feature may be exploited to obtain excellent
compression. Whether an image is treated as discrete or continuous is usually dictated by the depth of the data. However, it is possible to force an image to be continuous even if it would fit in the discrete category. It is intuitively clear that each type of image may feature redundancy, but they are redundant in different ways. This is why any given compression method may not perform well for all images, and why different methods are needed to compress the different image types.
______________________________________________________________________________________
Q. Explain motion compensation with respect to video compression____________________________

Motion Compensation:
i. If the encoder discovers that a part P of the preceding frame has been rigidly moved to a different location in the current frame, then P can be compressed by writing three items on the compressed stream: its previous location, its current location, and information identifying the boundaries of P.
ii. In principle, such a part can have any shape.
iii. The encoder scans the current frame block by block. For each block B it searches the preceding frame for an identical block C (if compression is to be lossless) or for a similar one (if it can be lossy).
iv. Having found such a block, the encoder writes the difference between its past and present locations on the output. This difference is of the form (Cx − Bx, Cy − By) = (Δx, Δy), so it is called a motion vector.
v. Figure 6.10a,b shows a simple example where the sun and trees are moved rigidly to the right (because of camera movement) while the child moves a different distance to the left (this is scene movement).

Motion compensation is effective if objects are just translated, not scaled or rotated. Drastic changes in illumination
from frame to frame also reduce the effectiveness of this method. In general, motion compensation is lossy.

[Figure: flow of information in the motion compensation process]


______________________________________________________________________________________
Q. Golomb codes________________________________________________________________________
Intro:
i. The Golomb-Rice codes belong to a family of codes designed to encode integers with the
assumption that the larger an integer, the lower its probability of occurrence.
ii. The simplest code for this situation is the unary code.
iii. The unary code for a positive integer n is simply n 1s followed by a 0. Thus, the code for 4 is 11110,
and the code for 7 is 11111110. One step higher in complexity are a number of coding schemes that
split the integer into two parts, representing one part with a unary code and the other part with a
different code. An example of such a code is the Golomb code.
Definition:
The Golomb code is actually a family of codes parameterized by an integer m > 0.
In the Golomb code with parameter m, we represent an integer n > 0 using two numbers q and r, where

q = ⌊n/m⌋ and r = n − qm.

In other words, q is the quotient and r is the remainder when n is divided by m. The quotient q can take on the values 0, 1, 2, ... and is represented by the unary code of q. The remainder r can take on the values 0, 1, 2, ..., m − 1. If m is a power of two, we use the log2(m)-bit binary representation of r. If m is not a power of two, we could still use ⌈log2(m)⌉ bits.
It can be shown that the Golomb code is optimal for the probability model

P(n) = p^(n−1)(1 − p),

i.e., when the integers follow a geometric distribution.
Encoding:
i. The Golomb code for nonnegative integers n depends on the choice of a parameter m.
ii. The first step in computing the Golomb code of the nonnegative integer n is to compute the three quantities q (quotient), r (remainder), and c by

q = ⌊n/m⌋, r = n − qm, c = ⌈log2(m)⌉,

following which the code is constructed in two parts: the first is the value of q, coded in unary; the second is the binary value of r, coded in a special way.
iii. The first 2^c − m values of r are coded, as unsigned integers, in c − 1 bits each, and the rest are coded in c bits each (ending with the biggest c-bit number, which consists of c 1's).
iv. The case where m is a power of 2 (m = 2^c) is special because it requires no (c − 1)-bit codes. We know that n = r + qm, so once a Golomb code is decoded, the values of q and r can be used to easily reconstruct n.
Example:
Choosing m = 3 produces c = 2 and the three remainders 0, 1, and 2. We compute 2^2 − 3 = 1, so the first remainder is coded in c − 1 = 1 bit to become 0, and the remaining two are coded in two bits each, ending with 11₂, to become 10 and 11.
Selecting m = 5 results in c = 3 and produces the five remainders 0 through 4. The first three (2^3 − 5 = 3) are coded in c − 1 = 2 bits each, and the remaining two are each coded in three bits, ending with 111₂. Thus: 00, 01, 10, 110, and 111.
Decoding:
The Golomb codes are designed in this special way to facilitate their decoding.

Case 1. m = 16 (m is a power of 2).

To decode, start at the left end of the code and count the number A of 1's preceding the first 0. The length of the code is A + c + 1 bits (for m = 16, this is A + 5 bits). If we denote the rightmost four bits of the code by R, then the value of the code is 16A + R. This simple decoding reflects the way the code was constructed. To encode n with m = 16, start by dividing it by 16 to get n = 16A + R, then write A 1's, followed by a single zero, followed by the 4-bit representation of R.
Case 2. m values that are not powers of 2.
Assuming again that a code begins with A 1's, start by removing them and the zero immediately following them. Denote the c − 1 bits that follow by R. If R < 2^c − m, then the total length of the code is A + 1 + (c − 1) (the A 1's, the zero following them, and the c − 1 bits that follow) and its value is m×A + R. If R ≥ 2^c − m, then the total length of the code is A + 1 + c and its value is m×A + R′ − (2^c − m), where R′ is the c-bit integer consisting of R and the bit that follows R. An example is the code 0001xxx, for m = 14 (which gives c = 4). There are no leading 1's, so A is 0. After removing the leading zero, the c − 1 = 3 bits that follow are R = 001 = 1. Since R < 2^c − m = 2, we conclude that the length of the code is 0 + 1 + (4 − 1) = 4 and its value is 14×0 + 1 = 1.
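The construction and decoding rules above can be collected into a short sketch (assuming m ≥ 2; the bit strings it produces match the m = 3 and m = 5 examples):

```python
import math

def golomb_encode(n, m):
    """Golomb code of a nonnegative integer n with parameter m >= 2,
    following the q / r / c construction described above."""
    c = math.ceil(math.log2(m))
    q, r = divmod(n, m)
    code = "1" * q + "0"                      # unary part for the quotient
    cutoff = (1 << c) - m                     # first 2^c - m remainders: c-1 bits
    if r < cutoff:
        code += format(r, f"0{c - 1}b") if c > 1 else ""
    else:
        code += format(r + cutoff, f"0{c}b")  # remaining remainders: c bits
    return code

def golomb_decode(bits, m):
    c = math.ceil(math.log2(m))
    a = 0
    while bits[a] == "1":                     # count the leading 1's
        a += 1
    pos = a + 1
    cutoff = (1 << c) - m
    r = int(bits[pos:pos + c - 1] or "0", 2) if c > 1 else 0
    if r < cutoff:
        pos += c - 1
    else:
        r = int(bits[pos:pos + c], 2) - cutoff
        pos += c
    return m * a + r, pos                     # value, number of bits consumed

for n in range(6):
    print(n, golomb_encode(n, 3))   # m = 3: remainders code to 0, 10, 11 as above
assert golomb_decode(golomb_encode(22, 5), 5)[0] == 22
```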
______________________________________________________________________________________
Q. Short note on scalar quantization._______________________________________________________
In many lossy compression applications we are required to represent each source output using one of a
small number of codewords. The number of possible distinct source output values is generally much larger
than the number of codewords available to represent them. The process of representing a large—possibly
infinite—set of values with a much smaller set is called quantization.
1. Quantization is the process of mapping a continuous or discrete scalar or vector, produced by a
source, into a set of digital symbols that can be transmitted or stored using a finite number of bits.
In the field of data compression, quantization is used in two ways:
(a) If the data to be compressed is in the form of large numbers, quantization is used to convert it to
small numbers. Small numbers take less space than large ones, so quantization generates
compression. On the other hand, small numbers generally contain less information than large
ones, so quantization results in lossy compression.
(b) If the data to be compressed is analog (i.e., a voltage that changes with time) quantization is
used to digitize it into small numbers. The smaller the numbers the better the compression, but
also the greater the loss of information.
2. The set of inputs and outputs of a quantizer can be scalars or vectors. If they are scalars, we call the
quantizers scalar quantizers. If they are vectors, we call the quantizers vector quantizers.

Procedure
i. In practice, the quantizer consists of two mappings: an encoder mapping and a decoder mapping.
ii. The encoder divides the range of values that the source generates into a number of intervals.
iii. Each interval is represented by a distinct codeword. The encoder represents all the source outputs
that fall into a particular interval by the codeword representing that interval.
iv. As there could be many—possibly infinitely many—distinct sample values that can fall in any given
interval, the encoder mapping is irreversible. Knowing the code only tells us the interval to which the
sample value belongs. It does not tell us which of the many values in the interval is the actual sample
value. When the sample value comes from an analog source, the encoder is called an analog-to-
digital (A/D) converter.
v. For every codeword generated by the encoder, the decoder generates a reconstruction value.
vi. Because a codeword represents an entire interval, and there is no way of knowing which value in the
interval was actually generated by the source, the decoder puts out a value that, in some sense, best
represents all the values in the interval. If the reconstruction is analog, the decoder is often referred
to as a digital-to-analog (D/A) converter.
This aspect of quantization is used by several audio compression methods.
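A minimal uniform scalar quantizer sketch illustrating the two mappings (the range, the bit count, and the midpoint reconstruction rule are illustrative choices; other reconstruction values are possible):

```python
def make_uniform_quantizer(lo, hi, bits):
    """Encoder maps a value to the index of its interval; decoder outputs
    the interval's midpoint as the reconstruction value."""
    levels = 1 << bits
    step = (hi - lo) / levels
    def encode(x):
        idx = int((x - lo) / step)
        return max(0, min(levels - 1, idx))    # clamp to the valid index range
    def decode(idx):
        return lo + (idx + 0.5) * step         # representative (midpoint) value
    return encode, decode

enc, dec = make_uniform_quantizer(-1.0, 1.0, bits=3)   # 8 intervals of width 0.25
for x in (-0.9, -0.1, 0.3, 0.97):
    print(x, "->", enc(x), "->", dec(enc(x)))
# Many different inputs map to the same codeword: the mapping is irreversible.
```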

Drawbacks:
i. Scalar quantization is an example of a lossy compression method, where it is easy to control the
trade-off between compression ratio and the amount of loss. However, because it is so simple, its
use is limited to cases where much loss can be tolerated.
ii. Scalar quantization is not suitable for image compression because it creates annoying artifacts in the
decompressed image. Imagine an image with an almost uniform area where all pixels have values
127 or 128. If 127 is quantized to 111 and 128 is quantized to 144, then the result, after
decompression,
may resemble a checkerboard where adjacent pixels alternate between 111 and 144. This is why
practical algorithms use vector quantization, instead of scalar quantization, for lossy (and sometimes
lossless) compression of images and sound.

Q. Distinguish between scalar and vector quantization (5mks)


____________________________________________________________________________________
Q. Explain vector quantization______________________________________________________________
Need:
Scalar quantization is an intuitive, lossy method where the information lost is not necessarily the least important. Vector quantization can obtain better results. Vector quantization is a method where the compression ratio is known in advance.

Principle:
The image is partitioned into equal-size blocks (called vectors) of pixels, and the encoder has a list (called a codebook) of blocks of the same size. Each image block B is compared to all the blocks of the codebook and is matched with the "closest" one.

Procedure:
i. In vector quantization we group the source output into blocks or vectors. For example, we can treat L consecutive samples of speech as the components of an L-dimensional vector.
ii. This vector of source outputs forms the input to the vector quantizer.
iii. At both the encoder and decoder of the vector quantizer, there is a set of L-dimensional vectors called the codebook.
iv. The vectors in this codebook, known as code-vectors, are selected to be representative of the vectors generated from the source output.
v. Each code-vector is assigned a binary index.
vi. At the encoder, the input vector is compared to each code-vector in order to find the code-vector closest to the input vector. The elements of this code-vector are the quantized values of the source output.
vii. In order to inform the decoder about which code-vector was found to be the closest to the input vector, the binary index of the code-vector is transmitted or stored. Because the decoder has exactly the same codebook, it can retrieve the code-vector given its binary index.
A pictorial representation of this process is shown in Figure 10.1.
viii. Although the encoder may have to perform a considerable amount of computation in order to find the closest reproduction vector to the vector of source outputs, the decoding consists of a table lookup. This makes vector quantization a very attractive encoding scheme for applications in which the resources available for decoding are considerably less than the resources available for encoding.
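A minimal sketch of the encoder's nearest-code-vector search and the decoder's table lookup (the 2-dimensional codebook here is an invented toy; real codebooks are trained on representative source data):

```python
def nearest_codevector(codebook, vector):
    """Encoder: return the index of the code-vector closest (in squared
    Euclidean distance) to the input vector."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i], vector))

# Hypothetical 2-dimensional codebook with four code-vectors (2-bit indices).
codebook = [(0, 0), (0, 10), (10, 0), (10, 10)]

index = nearest_codevector(codebook, (8.7, 1.2))
print(index)               # 2: only this index is transmitted or stored
print(codebook[index])     # decoder: a plain table lookup gives (10, 0)
```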

Advantages of Vector Quantization over Scalar Quantization


i. For a given rate (in bits per sample), use of vector quantization results in a lower distortion than when
scalar quantization is used at the same rate, for several reasons.
ii. If the source output is correlated, vectors of source output values will tend to fall in clusters. By
selecting the quantizer output points to lie in these clusters, a more accurate representation of the
source output can be obtained.
______________________________________________________________________________________
Q. What is RLE? How can it be used for text compression? _________________________________
Run-Length Encoding
The idea behind this approach to data compression is this: if a data item d occurs n consecutive times in the input stream, replace the n occurrences with the single pair nd. The n consecutive occurrences of a data item are called a run length of n, and this approach to data compression is called run-length encoding or RLE. Each compressed run is written in the format (escape character, run length, data character), as described below.
RLE Text Compression
i. Just replacing "2._all_is_too_well" with "2._a2l_is_t2o_we2l" will not work.
ii. Clearly, the decompressor should have a way to tell that the first 2 is part of the text while the others are repetition factors for the letters o and l.
iii. One way to solve this problem is to precede each repetition with a special escape character. If we use the character @ as the escape character, then the string "2._a@2l_is_t@2o_we@2l" can be decompressed unambiguously.
iv. However, this string is longer than the original string, because it replaces two consecutive letters with three characters.
v. We therefore adopt the convention that only three or more repetitions of the same character are replaced with a repetition factor. Figure 1.6a is a flowchart for such a simple run-length text compressor.

Compression
After reading the first character, the repeat-count is 1 and the character is saved. Each subsequent character is compared with the one already saved; if they are identical, the repeat-count is incremented. When a different character is read, the operation depends on the value of the repeat-count. If it is small, the saved character is written on the compressed file and the newly read character is saved. Otherwise, an @ is written, followed by the repeat-count and the saved character.
Decompression: Decompression is also straightforward. It is shown in Figure 1.6b. When an @ is read, the
repetition count n and the actual character are immediately read, and the character is written n times on the
output stream.
Drawbacks: In plain English text there are not many repetitions; the most repetitive character is the space. The character @ may be part of the text in the input stream, in which case a different escape character must be chosen. Sometimes the input stream may contain every possible character in the alphabet. Since the repetition count is written on the output stream as a byte, it is limited to counts of up to 255.
Compression ratio: To get an idea of the compression ratios produced by RLE, we assume a string of N characters that needs to be compressed. We assume that the string contains M repetitions of average length L each. Each of the M repetitions is replaced by 3 characters (escape, count, and data), so the size of the compressed string is N − M × L + M × 3 = N − M(L − 3) and the compression factor is N / (N − M(L − 3)).
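A short sketch tying the pieces together (the @ escape, a one-digit count, and a minimum run of four are simplifying assumptions; a real coder would also handle @ appearing in the text and counts up to 255):

```python
ESCAPE = "@"

def rle_compress(text, min_run=4):
    """Replace runs of min_run or more identical characters by
    ESCAPE + count + character."""
    out, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1
        run = j - i
        out.append(f"{ESCAPE}{run}{text[i]}" if run >= min_run else text[i] * run)
        i = j
    return "".join(out)

def rle_decompress(data):
    out, i = [], 0
    while i < len(data):
        if data[i] == ESCAPE:
            out.append(data[i + 2] * int(data[i + 1]))   # count is one digit here
            i += 3
        else:
            out.append(data[i])
            i += 1
    return "".join(out)

s = "aaaaabbbcdddddd"
c = rle_compress(s)
print(c)                     # @5abbbc@6d
assert rle_decompress(c) == s
```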
______________________________________________________________________________________
Q. What is RLE? How can it be used for image compression?__________________________________
RLE Image Compression
i. RLE is a natural candidate for compressing graphical data.
ii. A digital image consists of small dots called pixels. Each pixel can be either one bit, indicating a black
or a white dot, or several bits, indicating one of several colors or shades of gray.
iii. We assume that the pixels are stored in an array called a bitmap in memory, so the bitmap is the
input stream for the image.
iv. Pixels are normally arranged in the bitmap in scan lines, so the first bitmap pixel is the dot at the top
left corner of the image, and the last pixel is the one at the bottom right corner.

Principle: Compressing an image using RLE is based on the observation that if we select a
pixel in the image at random, there is a good chance that its neighbors will have the same color.
Compression:
i. The compressor scans the bitmap row by row, looking for runs of pixels of the same color. If the bitmap starts, e.g., with 17 white pixels, followed by 1 black pixel, and so on, then only the numbers 17, 1, ... need be written on the output stream.
ii. The compressor assumes that the bitmap starts with white pixels. If this is not true, then the bitmap
starts with zero white pixels, and the output stream should start with the run length 0. The resolution
of the bitmap should also be saved at the start of the output stream.
iii. The size of the compressed stream depends on the complexity of the image.
iv. The more detail, the worse the compression.
v. RLE can also be used to compress grayscale images.
vi. Each run of pixels of the same intensity (gray level) is encoded as a pair (run length, pixel value). The
run length usually occupies one byte, allowing for runs of up to 255 pixels. The pixel value occupies
several bits, depending on the number of gray levels (typically between 4 and 8 bits).
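A sketch of run extraction for one bi-level scan line, following the white-first convention above:

```python
def row_run_lengths(row):
    """Encode one scan line of a bi-level bitmap as run lengths, assuming the
    row starts with white (0); if it starts black, the first run length is 0."""
    runs, current, count = [], 0, 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current, count = pixel, 1
    runs.append(count)
    return runs

# 17 white pixels, 1 black pixel, 6 white pixels
row = [0] * 17 + [1] + [0] * 6
print(row_run_lengths(row))          # [17, 1, 6]
print(row_run_lengths([1, 1, 0]))    # [0, 2, 1]: starts with zero white pixels
```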

Disadvantage of image RLE:


i. When the image is modified, the run lengths normally have to be completely redone.
ii. The RLE output can sometimes be bigger than pixel-by-pixel storage (i.e., an uncompressed image, a raw dump of the bitmap) for complex pictures. Imagine a picture with many vertical lines. When it is scanned horizontally, it produces very short runs, resulting in very bad compression, or even in expansion.
iii. A good, practical RLE image compressor should be able to scan the bitmap by rows, columns, or
in zigzag (Figure 1.8a,b) and it may automatically try all three ways on every bitmap compressed
to achieve the best compression.
______________________________________________________________________________________
Q. Why statistical or dictionary-based methods can't be used for image compression?_____________

Digitizing an image involves two steps: sampling and quantization. Sampling an image is the process of
dividing the two-dimensional original image into small regions: pixels. Quantization is the process of
assigning an integer value to each pixel. Notice that digitizing sound involves the same two steps, with the
difference that sound is one-dimensional.

Statistical methods work best when the symbols being compressed have different probabilities.
i. An input stream where all symbols have the same probability will not compress, even though it may not be random. It turns out that in a continuous-tone color or grayscale image, the different colors or shades of gray may often have roughly the same probabilities. This is why statistical methods are not a good choice for compressing such images, and why new approaches are needed.
ii. Images with color discontinuities, where adjacent pixels have widely different colors, compress better
with statistical methods, but it is not easy to predict, just by looking at an image, whether it has
enough color discontinuities.
Dictionary-based compression methods also tend to be unsuccessful in dealing with continuous-tone
images. Such an image typically contains adjacent pixels with similar colors, but does not contain repeating
patterns.
i. An image that contains repeated patterns such as vertical lines may lose them when digitized. So the
pixels in a scan row may end up having slightly different colors from those in adjacent rows, resulting
in a dictionary with short strings. (This problem may also affect curved edges.)
ii. Another problem with dictionary compression of images is that such methods scan the image row by
row, and therefore may miss vertical correlations between pixels.
______________________________________________________________________________________
The Principle of Image Compression. If we select a pixel in the image at random, there is a good chance that
its neighbors will have the same color or very similar colors. Image compression is therefore based on the fact that
neighboring pixels are highly correlated. This correlation is also called spatial redundancy.
______________________________________________________________________________________
Q. Discuss the various approaches of image compression. Explain any one of them.____(5mks)__________

Approach 1: This is appropriate for bi-level images.


Explanation:
A pixel in such an image is represented by one bit. Applying the principle of image compression to a bi-level
image therefore means that the immediate neighbors of a pixel P tend to be identical to P. Thus, it makes
sense to use run-length encoding (RLE) to compress such an image. A compression method for such an
image may scan it in raster order (row by row) and compute the lengths of runs of black and white pixels. The
lengths are encoded by variable-size (prefix) codes and are written on the compressed stream.
Example: facsimile compression
Approach 2: Also for bi-level images.
Explanation:
The principle of image compression tells us that the neighbors of a pixel tend to be similar to the pixel. We
can extend this principle and conclude that if the current pixel has color c (where c is either black or white),
then pixels of the same color seen in the past (and also those that will be found in the future) tend to have the
same immediate neighbors. This approach looks at n of the near neighbors of the current pixel and considers
them an n-bit number. This number is the context of the pixel. In principle there can be 2^n contexts, but because of image redundancy we expect them to be distributed in a nonuniform way. Some contexts should be common while others will be rare.
Example: This approach is used by JBIG.
Approach 3: Separate the grayscale image into n bi-level images and compress each with RLE and prefix
codes.
Explanation:
The principle of image compression seems to imply intuitively that two adjacent pixels that are similar in the
grayscale image will be identical in most of the n bi-level images. This, however, is not true.
Example: Reflected gray code.
Approach 4: Use the context of a pixel to predict its value.
Explanation:
The context of a pixel is the values of some of its neighbors. We can examine some neighbors of a pixel P,
compute an average A of their values, and predict that P will have the value A. The principle of image
compression tells us that our prediction will be correct in most cases, almost correct in many cases, and
completely wrong in a few cases.
Example: This is the principle of the MLP method.
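A minimal sketch of this approach (the left/above/above-left context is one simple choice of neighbors, not necessarily MLP's own):

```python
def predict_pixel(img, x, y):
    """Predict P from the average of neighbors already seen in raster order."""
    neighbors = [img[y][x - 1], img[y - 1][x], img[y - 1][x - 1]]
    return round(sum(neighbors) / len(neighbors))

img = [
    [100, 101, 102],
    [101, 103, 104],
]
x, y = 1, 1
prediction = predict_pixel(img, x, y)
error = img[y][x] - prediction
print(prediction, error)   # 101, 2: small prediction errors compress well
```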
Approach 5: Transform the values of the pixels and encode the transformed values.
Explanation:
Compression is achieved by reducing or removing redundancy caused by the correlation between pixels, so
transforming the pixels to a representation where they are decorrelated eliminates the redundancy. Also in a
highly correlated image, the pixels tend to have equiprobable values, which results in maximum entropy. If
the transformed pixels are decorrelated, certain pixel values become common, thereby having large
probabilities, while others are rare. This results in small entropy. Quantizing the transformed values can
produce efficient lossy image compression.
Approach 6: The principle of this approach is to separate a continuous-tone color image into three grayscale
images and compress each of the three separately, using approaches 3, 4, or 5.
Explanation:
i. For a continuous-tone image, the principle of image compression implies that adjacent pixels have
similar, although perhaps not identical, colors. However, similar colors do not mean similar pixel
values.
ii. An important feature of this approach is to use a luminance/chrominance color representation instead of the more common RGB.
iii. The advantage of the luminance/chrominance representation is that the eye is sensitive to small changes in luminance but not in chrominance. This allows the loss of considerable data in the chrominance components, while making it possible to decode the image without a significant visible loss of quality.
Approach 7: This approach is for discrete-tone images.
Explanation:
Discrete-tone images contain uniform regions, and a region may appear several times in the image. A good
example is a screen dump. Such an image consists of text and icons. Each character of text and each icon is
a region, and any region may appear several times in the image. A possible way to compress such an image
is to scan it, identify regions, and find repeating regions. If a region B is identical to an already found region
A, then B can be compressed by writing a pointer to A on the compressed stream. The block decomposition
method (FABD) is an example of how this approach can be implemented.
Approach 8: Partition the image into parts (overlapping or not) and compress it by processing the parts one
by one.
Explanation:
Suppose that the next unprocessed image part is part number 15. Try to match it with parts 1–14, which have already been processed. If part 15 can be expressed, for example, as a combination of parts 5 (scaled) and 11 (rotated), then only the few numbers that specify the combination need be saved, and part 15 can be discarded. If part 15 cannot be expressed as a combination of already-processed parts, it is declared processed and is saved in raw format. This approach is the basis of the various fractal methods for image compression. It applies the principle of image compression to image parts instead of to individual pixels. Applied this way, the principle tells us that "interesting" images (i.e., those that are being compressed in practice) have a certain amount of self-similarity: parts of the image are identical or similar to the entire image or to other parts.

Q. Explain with example the significance of Gray codes for image compression.______________________

Gray Codes
Gray code is the binary representation of integers where consecutive integers differ only by one bit.
Need for Gray code
Any method for compressing bi-level images, for example, can be used to compress grayscale images by
separating the bit planes and compressing each individually, as if it were a bi-level image. Imagine, for
example, an image with 16 grayscale values. Each pixel is defined by four bits, so the image can be
separated into four bi-level images. The trouble with this approach is that it violates the general principle of
image compression. Imagine two adjacent 4-bit pixels with values 7 = 0111₂ and 8 = 1000₂. These pixels have close values, but when separated into four bit planes, the resulting 1-bit pixels are different in every bit
plane! This is because the binary representations of the consecutive integers 7 and 8 differ in all four bit
positions. In order to apply any bi-level compression method to grayscale images, a binary representation of
the integers is needed where consecutive integers have codes differing by one bit only. Such a
representation exists and is called reflected Gray code (RGC).
Encoding
This code is easy to generate with the following recursive construction:
i. Start with the two 1-bit codes (0, 1).
ii. Construct two sets of 2-bit codes by duplicating (0, 1) and appending, either on the left or on the right, first a zero, then a one, to the original set. The result is (00, 01) and (10, 11).
iii. Now reverse (reflect) the second set, and concatenate the two.
iv. The result is the 2-bit RGC (00, 01, 11, 10); a binary code of the integers 0 through 3 where
consecutive codes differ by exactly one bit.
v. Applying the rule again produces the two sets (000, 001, 011, 010) and (110, 111, 101, 100), which
are concatenated to form the 3-bit RGC.
Note that the first and last codes of any RGC also differ by one bit. Here are the steps for computing the 3-bit RGC from the 2-bit RGC:

Example:
2-bit list: 00, 01, 11, 10

Reflected: 10, 11, 01, 00


Prefix old entries with 0: 000, 001, 011, 010,
Prefix new entries with 1: 110, 111, 101, 100
Concatenated: 000, 001, 011, 010, 110, 111, 101, 100
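The same construction in code (a direct transcription of the reflect-and-prefix steps above):

```python
def reflected_gray_code(bits):
    """Build the n-bit RGC by repeatedly reflecting and prefixing."""
    codes = ["0", "1"]
    for _ in range(bits - 1):
        reflected = codes[::-1]                              # reverse (reflect)
        codes = ["0" + c for c in codes] + ["1" + c for c in reflected]
    return codes

print(reflected_gray_code(2))   # ['00', '01', '11', '10']
print(reflected_gray_code(3))   # ['000', '001', '011', '010', '110', '111', '101', '100']
```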
Explanation:
i. The conclusion is that the most-significant bit planes of an image obey the principle of image
compression more than the least-significant ones.
ii. When adjacent pixels have values that differ by one unit (such as p and p+1), chances are that the
least-significant bits are different and the most-significant ones are identical.
iii. Any image compression method that compresses bit planes individually should therefore treat the
least-significant bit planes differently from the most-significant ones, or should use RGC instead of
the binary code to represent pixels.
iv. Color images provide another example of using the same compression method across image types.
Any compression method for grayscale images can be used to compress color images.
v. In a color image, each pixel is represented by three color components (such as RGB). Imagine a
color image where each color component is represented by one byte.
vi. A pixel is represented by three bytes, or 24 bits, but these bits should not be considered a single
number.
vii. The two pixels 118|206|12 and 117|206|12 differ by just one unit in the first component, so they have
very similar colors. Considered as 24-bit numbers, however, these pixels are very different, since
they differ in one of their most significant bits.
viii. Any compression method that treats these pixels as 24-bit numbers would consider these pixels very
different, and its performance would suffer as a result.
ix. A compression method for grayscale images can be applied to compressing color images, but the
color image should first be separated into three color components, and each component compressed
individually as a grayscale image.
______________________________________________________________________________________
Q. Explain various steps in JPEG image compression. Why is DCT more popular than other transforms for image compression?___________________________________________________

i. The name JPEG is an acronym that stands for Joint Photographic Experts Group.
ii. JPEG is a sophisticated lossy/lossless compression method for color or grayscale still images (not
videos). It does not handle bi-level (black and white) images very well. It also works best on
continuous-tone images, where adjacent pixels have similar colors.
iii. An important feature of JPEG is its use of many parameters, allowing the user to adjust the amount
of the data lost (and thus also the compression ratio) over a very wide range.
iv. JPEG has been designed as a compression method for continuous-tone images.
The whole idea of JPEG is to change the picture into a linear (vector) set of numbers that reveals the
redundancies. The redundancies (lack of changes) can then be removed by using one of the text
compression methods.

The main goals of JPEG compression are the following:

1. High compression ratios, especially in cases where image quality is judged as very good to excellent.
2. The use of many parameters, allowing knowledgeable users to experiment and achieve the desired compression/quality trade-off.
3. Obtaining good results with any kind of continuous-tone image, regardless of image dimensions, color spaces, pixel aspect ratios, or other image features.
4. A sophisticated, but not too complex compression method, allowing software and hardware implementations on many platforms.
5. Several modes of operation:
(a) A sequential mode where each image component is compressed in a single left-to-right, top-to-bottom scan;
(b) A progressive mode where the image is compressed in multiple blocks (known as "scans") to be viewed from coarse to fine detail;
(c) A lossless mode that is important in cases where the user decides that no pixels should be lost (the trade-off is a low compression ratio compared to the lossy modes); and
(d) A hierarchical mode where the image is compressed at multiple resolutions, allowing lower-resolution blocks to be viewed without first having to decompress higher-resolution blocks.

The main JPEG compression steps:


Step 1: Color images are transformed from RGB into a luminance/chrominance color space.
The eye is sensitive to small changes in luminance but not in chrominance, so the chrominance part can later
lose
much data, and thus be highly compressed, without visually impairing the overall image quality much.

Step 2: Color images are downsampled by creating low-resolution pixels from the original ones (this step is used only when hierarchical compression is selected). Downsampling is done either at a ratio of 2:1 both horizontally and vertically (which reduces the image to 1/2 of its original size) or at a ratio of 2:1 horizontally and 1:1 vertically (which reduces it to 2/3 of its original size). Since this is done only on the chrominance components and not on the luminance component, there is no noticeable loss of image quality.

Step 3: The pixels of each color component are organized in groups of 8×8 pixels called data units, and each
data unit is compressed separately. The bottom row and rightmost column can be duplicated in case the
number of rows or columns is not a multiple of 8.

Step 4: The discrete cosine transform (DCT, Section 4.6) is then applied to each data unit to create an 8×8
map of frequency components.
They represent the average pixel value and successive higher-frequency changes within the group. This
prepares the image data for the crucial step of losing information.

Step 5: Each of the 64 frequency components in a data unit is divided by a separate number called its quantization coefficient (QC), and then rounded to an integer. This is where information is irretrievably lost. A large QC causes more loss, so high-frequency components typically have larger QCs. In practice, QC tables recommended by the JPEG standard are used for the chrominance and luminance components.

Step 6: The 64 quantized frequency coefficients (which are now integers) of each data unit are encoded
using a combination of RLE and Huffman coding.

Step 7: The last step adds headers and all the required JPEG parameters, and outputs the result.
The compressed file may be in one of three formats (1) the interchange format, (2) the abbreviated format for
compressed image data, and (3) the abbreviated format for table-specification data.

The JPEG decoder performs the reverse steps. (Thus, JPEG is a symmetric compression method.)
Why is DCT used in JPEG? (5 mks)
The JPEG committee elected to use the DCT because of
i. its good performance,
ii. the fact that it does not assume anything about the structure of the data (the DFT, for example, assumes that the data to be transformed is periodic), and
iii. the existence of ways to speed it up.
The JPEG standard calls for applying the DCT not to the entire image but to data units (blocks) of 8×8 pixels. The reasons for this are:
(1) Applying DCT to large blocks involves many arithmetic operations and is therefore slow. Applying DCT to small data units is faster.
(2) Experience shows that, in a continuous-tone image, correlations between pixels are short range. A pixel in such an image has a value (color component or shade of gray) that's close to those of its near neighbors, but has nothing to do with the values of far neighbors.
The JPEG DCT is therefore executed, for n = 8, by

G(i,j) = (1/4) C(i) C(j) Σx Σy p(x,y) cos[(2x+1)iπ/16] cos[(2y+1)jπ/16], x, y = 0, ..., 7,

where C(f) = 1/√2 for f = 0 and C(f) = 1 otherwise.
(3) The DCT is JPEG's key to lossy compression.

(4) The unimportant image information is reduced or removed by quantizing the 64 DCT coefficients, especially the ones located toward the lower-right.
(5) If the pixels of the image are correlated, quantization does not degrade the image quality much. For best results,
each of the 64 coefficients is quantized by dividing it by a different quantization coefficient (QC). All 64 QCs are
parameters that can be controlled, in principle, by the user.

The JPEG decoder works by computing the inverse DCT (IDCT), given for n = 8 by

p(x,y) = (1/4) Σi Σj C(i) C(j) G(i,j) cos[(2x+1)iπ/16] cos[(2y+1)jπ/16], i, j = 0, ..., 7.

It takes the 64 quantized DCT coefficients and calculates 64 pixels p(x,y). If the QCs are the right ones, the new 64 pixels will be very similar to the original ones. Mathematically, the DCT is a one-to-one mapping of 64-point vectors from the image domain to the frequency domain. The IDCT is the reverse mapping. If the DCT and IDCT could be calculated with infinite precision and if the DCT coefficients were not quantized, the original 64 pixels would be exactly reconstructed.
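A sketch of the forward DCT and of the simple Qij = 1 + (i + j) × R quantization table described later in this section (a direct, pure-Python transcription of the n = 8 formula; production codecs use fast factored DCTs):

```python
import math

def dct_2d(block):
    """8x8 forward DCT, computed directly from the n = 8 formula above."""
    n = 8
    def c(f):
        return 1 / math.sqrt(2) if f == 0 else 1.0
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = sum(block[y][x]
                    * math.cos((2 * x + 1) * i * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * j * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            G[i][j] = 0.25 * c(i) * c(j) * s
    return G

def quantize(G, R=2):
    """Quantize with the simple table Q_ij = 1 + (i + j) * R."""
    return [[round(G[i][j] / (1 + (i + j) * R)) for j in range(8)]
            for i in range(8)]

# A nearly uniform block: after DCT + quantization almost everything is zero.
block = [[128 + (x % 2) for x in range(8)] for _ in range(8)]
q = quantize(dct_2d(block))
print(q[0][:4])   # [1028, 0, 0, 0]: the DC term dominates, ACs quantize to zero
```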

Quantization in JPEG
After each 8×8 data unit of DCT coefficients Gij is computed, it is quantized. This is the step where information is lost.
Each number in the DCT coefficients matrix is divided by the corresponding number from the particular "quantization table" used, and the result is rounded to the nearest integer. Three such tables are needed, for the three color components.
The JPEG standard allows for up to four tables, and the user can select any of the four for quantizing each color component.
The 64 numbers that constitute each quantization table are all JPEG parameters.
In principle, they can all be specified and fine-tuned by the user for maximum compression. Instead of dealing with 64 parameters, JPEG software normally uses the following two approaches:
1. Default quantization tables. Two such tables, for the luminance (grayscale) and the chrominance
components are developed by the JPEG committee as a means to reduce the DCT coefficients with high
spatial frequencies.
2. A simple quantization table Q is computed, based on one parameter R specified by the user. A simple expression such as Qij = 1 + (i + j) × R guarantees that QCs start small at the upper-left corner and get bigger toward the lower-right corner of the table.
If the quantization is done correctly, very few nonzero numbers will be left in the DCT coefficients matrix, and
they are the output of JPEG, but they are further compressed before being written on the output stream.
In the JPEG literature this compression is called “entropy coding,”
Three techniques are used by entropy coding to compress the 8 × 8 matrix of integers:
1. The 64 numbers are collected by scanning the matrix in zigzags. This produces a string of 64 numbers
that starts with some nonzeros and typically ends with many consecutive zeros. Only the nonzero numbers
are output and are followed by a special end-of block (EOB) code. This way there is no need to output the
trailing zeros (we can say that the EOB is the run-length encoding of all the trailing zeros).
2. The nonzero numbers are compressed using Huffman coding .
3. The first of those numbers (the DC coefficient) is treated differently from the others (the AC coefficients).
Why zigzag scan?
1. It groups the low-frequency coefficients at the start of the vector.
2. It maps the 8×8 block to a 1×64 vector.
3. It is effective because the quantized high-frequency coefficients, which are mostly zeros, end up in one long trailing run.
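A sketch of how the zigzag visiting order can be generated (one common way to express it; JPEG implementations usually just hard-code the 64-entry table):

```python
def zigzag_order(n=8):
    """Return the (row, col) visiting order of the zigzag scan, which collects
    low-frequency coefficients first and maps the 8x8 block to a 1x64 vector."""
    order = []
    for s in range(2 * n - 1):             # s = i + j is constant on anti-diagonals
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

print(zigzag_order()[:10])
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2), (2, 1), (3, 0)]
```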

Coding in JPEG
Each 8×8 matrix of quantized DCT coefficients contains one DC coefficient [at position (0, 0), the top left
corner] and 63 AC coefficients.
The DC coefficient is a measure of the average value of the 64 original pixels, constituting the data unit. The
DC coefficients of adjacent data units don’t differ much. JPEG outputs the first one (encoded), followed by
differences (also encoded) of the DC coefficients of consecutive data units.

Example: If the first three 8×8 data units of an image have quantized DC coefficients of 1118, 1114, and 1119, then the JPEG output for the first data unit is 1118 (Huffman encoded), followed by the 63 (encoded) AC coefficients of that data unit. The output for the second data unit will be 1114 − 1118 = −4 (also Huffman encoded), followed by the 63 (encoded) AC coefficients of that data unit, and the output for the third data unit will be 1119 − 1114 = 5 (also Huffman encoded), again followed by the 63 (encoded) AC coefficients of that data unit. This way of handling the DC coefficients is worth the extra trouble, because the differences are small.

JPEG file format


i. A JPEG encoder outputs a compressed file that includes parameters, markers, and the compressed
data units.
ii. The parameters are either four bits (these always come in pairs), one byte, or two bytes long.
iii. The markers serve to identify the various parts of the file. Each is two bytes long, where the first byte
is X’FF’ and the second one is not 0 or X’FF’. A marker may be preceded by a number of bytes with
X’FF’.
iv. The compressed data units are combined into MCUs (minimal coded units), where an MCU is either a single data unit (in the non-interleaved mode) or three data units from the three image components (in the interleaved mode).

Figure 4.68 shows the main parts of the JPEG compressed file (parts in square brackets are optional).
i. The file starts with the SOI marker and ends with the EOI marker. In between these markers, the
compressed image is organized in frames.
ii. In the hierarchical mode there are several frames, and in all other modes there is only one frame. In
each frame the image information is contained in one or more scans, but the frame also contains a
header and optional tables (which, in turn, may include markers).
iii. The first scan may be followed by an optional DNL segment (define number of lines), which starts
with the DNL marker and contains the number of lines in the image that’s represented by the frame.
iv. A scan starts with optional tables, followed by the scan header, followed by several entropy-coded
segments (ECS), which are separated by (optional) restart markers (RST). Each ECS contains one
or more MCUs, where an MCU is, as explained earlier, either a single data unit or three such units.
Q. List various methods of lossless image compression. Explain any one of them._________________

The lossless methods of image compression are:


1. GIF (graphic interchange format)
2. JPEG-LS
3. PNG

1. GIF

GIF, the graphics interchange format, was developed by CompuServe Information Services in 1987 as an efficient, compressed graphics file format that allows images to be sent between different computers. The original version of GIF is known as GIF 87a.
Explanation
i. GIF is not a data compression method; it is a graphics file format that uses a variant of LZW to
compress the graphics data.
ii. In compressing data, GIF is very similar to the UNIX compress utility and uses a dynamic, growing dictionary. It starts with the number of bits per pixel b as a parameter. For a monochromatic image, b = 2; for an image with 256 colors or shades of gray, b = 8.
iii. The dictionary starts with 2^(b+1) entries and is doubled in size each time it fills up, until it reaches a size of 2^12 = 4,096 entries, where it remains static. At that point the encoder monitors the compression and may decide to discard the dictionary and start with a new, empty one.
iv. The pointers, which get longer by 1 bit from one dictionary size to the next, are accumulated and are output in blocks of 8-bit bytes. Each block is preceded by a header that contains the block size (255 bytes maximum) and is terminated by a byte of eight zeros. The last block contains, just before the terminator, the eof value, which is 2^b + 1.
v. An interesting feature is that the pointers are stored with their least-significant bit on the left.

Example
Consider the following 3-bit pointers: 3, 7, 4, 1, 6, 2, and 5. Their binary values are 011, 111, 100, 001, 110, 010, and 101, so they are packed in 3 bytes: |10101001|11000011|11110...|.

Application & drawback


The GIF format is commonly used today by web browsers, but it is not an efficient image compressor. GIF
scans the image row by row, so it discovers pixel correlations within a row, but not between rows. We can
say that GIF compression is inefficient because GIF is one-dimensional while an image is two-dimensional.

2. PNG

The portable network graphics (PNG) file format was developed in the mid-1990s by a group (the
PNG development group [PNG 03]) headed by Thomas Boutell. The project was started in response to the
legal issues surrounding the GIF file format.

Aim
The aim of this project was to develop a sophisticated graphics file format that is flexible, supports
many different types of images, is easy to transmit over the Internet, and is unencumbered by
patents.

Features: Its main features are as follows:


a) It supports images with 1, 2, 4, 8, and 16 bitplanes.
b) Sophisticated color matching.
c) A transparency feature with very fine control provided by an alpha channel.
d) Lossless compression by means of Deflate combined with pixel prediction.
e) Extensibility: New types of meta-information can be added to an image file without creating
incompatibility with existing applications.
Explanation
i. A PNG file consists of chunks that can be of various types and sizes. The chunks which contain
information that’s essential for displaying the image are referred to as “critical chunks.” Other chunks
are ancillary.
ii. A chunk consists of the following parts: (1) the size of the data field, (2) the chunk name, (3) the data
field, and (4) a 32-bit cyclic redundancy check (CRC).
iii. Each chunk has a 4-letter name of which
 the first letter is uppercase for a critical chunk and lowercase for an ancillary chunk
 the second letter is uppercase for standard chunks (those defined by or registered with the
PNG group) and lowercase for a private chunk (an extension of PNG)
 the third letter is always uppercase, and
 the fourth letter is uppercase if the chunk is “unsafe to copy” and lowercase if it is “safe to
copy.”
iv. Any PNG-aware application will process all the chunks it recognizes. Examples of “safe to copy” are
chunks with text comments or those indicating the physical size of a pixel. Examples of “unsafe to
copy” are chunks with gamma/color correction data or palette histograms.
v. The four critical chunks defined by the PNG standard are IHDR (the image header),PLTE (the color
palette), IDAT (the image data), and IEND (the image trailer).
vi. The PNG file format uses a 32-bit CRC as defined by the ISO 3309 standard. The CRC polynomial is
x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1.
vii. A PNG file starts with an 8-byte signature that helps software to identify it as PNG. This is immediately
followed by an IHDR chunk with the image dimensions, number of bitplanes, color type, and data
filtering and interlacing. The remaining chunks must include a PLTE chunk if the color type is palette,
and one or more adjacent IDAT chunks with the compressed pixels. The file must end with an IEND
chunk. The PNG standard defines the order of the public chunks, whereas private chunks may have
their own ordering constraints.
viii. An image in PNG format may have one of the following five color types: RGB with 8 or 16 bit planes,
palette with 1, 2, 4, or 8 bitplanes, grayscale with 1, 2, 4, 8, or 16 bitplanes, RGB with alpha channel
(with 8 or 16 bitplanes), and grayscale with alpha channel (also with 8 or 16 bitplanes).
ix. The most intriguing feature of the PNG format is the way it handles interlacing. Interlacing makes it
possible to display a rough version of the image on the screen, then improve it gradually until it reaches
its final, high-resolution form. The special interlacing used in PNG is called Adam 7.
x. PNG divides the image into blocks of 8×8 pixels each, and displays each block in seven steps. In step
1, the entire block is filled up with copies of the top-left pixel. In each subsequent step, the number of
pixels displayed at their true values is doubled, until after step 7 all 64 pixels of the block are shown.
xi. PNG compression is lossless and is performed in two steps. The first step, termed delta filtering (or
just filtering), converts pixel values to difference values by subtracting a prediction, computed from
neighboring pixels, from each pixel. Filtering does not compress the data but transforms the pixel data
to a format where it is more compressible. The second step employs Deflate to encode the
differences.
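
As an illustration of the filtering step, here is the Paeth predictor (PNG filter type 4, as defined in the PNG specification) in Python; a, b, and c are the left, above, and upper-left neighbors of the byte being filtered:

# The Paeth predictor from the PNG specification (filter type 4).
# Each byte is predicted from its left (a), above (b), and upper-left (c)
# neighbors; only the difference from the prediction is passed to Deflate.
def paeth_predictor(a, b, c):
    p = a + b - c                      # initial estimate
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:          # ties are broken in the order a, b, c
        return a
    elif pb <= pc:
        return b
    else:
        return c

# Filtering a byte x: emit (x - paeth_predictor(a, b, c)) mod 256.
# The decoder reverses this by adding the same prediction back.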

Advantage:
The PNG (Portable Network Graphics) format is more space-efficient in the case of images with many pixels
of the same color, such as diagrams, and supports special compression features that JPEG 2000 does not.
______________________________________________________________________________________
Q. Short note on JPEG-LS____(10mks)_____________________________________________________

JPEG-LS is a newer standard for the lossless (or near-lossless) compression of continuous-tone images.

Principle
JPEG-LS examines several of the previously seen neighbors of the current pixel, uses them as the context of
the pixel, uses the context to predict the pixel and to select a probability distribution out of several such
distributions, and uses that distribution to encode the prediction error with a special Golomb code.
The context used to predict the current pixel x is shown in the figure below.

Encoder
The encoder examines the context pixels and decides whether to encode the current pixel x in the run mode
or in the regular mode. If the context suggests that the pixels y, z,. . . following the current pixel are likely to
be identical, the encoder selects the run mode. Otherwise, it selects the regular mode. In the near-lossless
mode the decision is slightly different. If the context suggests that the pixels following the current pixel are
likely to be almost identical (within the tolerance parameter NEAR), the encoder selects the run mode.
Otherwise, it selects the regular mode. The rest of the encoding process depends on the mode selected.

1. In the regular mode, the encoder uses the values of context pixels a, b, and c to predict pixel x, and
subtracts the prediction from x to obtain the prediction error, denoted by Errval. This error is then corrected
by a term that depends on the context (this correction is done to compensate for systematic biases in the
prediction), and encoded with a Golomb code (a small sketch of such a code follows the two modes below).
The Golomb coding depends on all four pixels of the context and also on prediction errors that were
previously encoded for the same context (this information is stored in arrays A and N). If near-lossless
compression is used, the error is quantized before it is encoded.

2. In the run mode, the encoder starts at the current pixel x and finds the longest run of pixels that are
identical to context pixel a. The encoder does not extend this run beyond the end of the current image row.
Since all the pixels in the run are identical to a (and a is already known to the decoder), only the length of the
run needs to be encoded, and this is done with a 32-entry array denoted by J (Section 4.9.1). If near-lossless
compression is used, the encoder selects a run of pixels that are close to a within the tolerance parameter
NEAR.
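
As a rough illustration of the Golomb coding mentioned in the regular mode, the sketch below implements a Golomb code with a power-of-two parameter m = 2^k (the special case used by JPEG-LS, sometimes called a Golomb-Rice code). The real codec additionally maps signed prediction errors to non-negative integers and adapts k per context via the A and N arrays; none of that is shown here.

# Golomb code with power-of-two parameter m = 2**k (assumes k >= 1).
def golomb_rice_encode(n, k):
    """Encode non-negative integer n: the quotient n >> k in unary (ones
    terminated by a zero), followed by the k low-order remainder bits."""
    q = n >> k                 # unary part
    r = n & ((1 << k) - 1)     # k-bit remainder
    return "1" * q + "0" + f"{r:0{k}b}"

for n in range(6):
    print(n, golomb_rice_encode(n, k=2))
# -> 0:000  1:001  2:010  3:011  4:1000  5:1001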
Decoder
The decoder is not substantially different from the encoder, so JPEG-LS is a nearly symmetric compression
method. The compressed stream contains data segments (with the Golomb codes and the encoded run
lengths), marker segments (with information needed by the decoder), and markers (some of the reserved
markers of JPEG are used). A marker is a byte of all ones followed by a special code, signaling the start of a
new segment. If a marker is followed by a byte whose most significant bit is 0, that byte is the start of a
marker segment. Otherwise, that byte starts a data segment.
Advantages:
i. JPEG-LS is capable of lossless compression.
ii. JPEG-LS has very low computational complexity.
____________________________________________________________________________________
Q. Short note on JPEG 2000____(read in detail from txt 622 onwards)_____________________________
Need:
The current JPEG standard provides excellent performance at rates above 0.25 bits per pixel. However, at
lower rates there is a sharp degradation in the quality of the reconstructed image. To correct this and other
shortcomings, the JPEG committee initiated work on another standard, commonly known as JPEG 2000. It
introduced the “compress once, decompress many ways” paradigm.
Principle:
JPEG 2000 is a standard based on wavelet decomposition using the Discrete Wavelet Transform (DWT).
This transform decomposes the image using basis functions called wavelets.

Following is a list of areas where this new standard is expected to improve on existing methods:
i. High compression efficiency. Bitrates of less than 0.25 bpp are expected for highly detailed grayscale
images.
ii. The ability to handle large images, up to 2^32 × 2^32 pixels (the original JPEG can handle up to
2^16 × 2^16 pixels).
iii. Progressive image transmission.
iv. Easy, fast access to various points in the compressed stream.
v. The decoder can pan/zoom the image while decompressing only parts of it.
vi. The decoder can rotate and crop the image while decompressing it.
vii. Error resilience. Error-correcting codes can be included in the compressed stream, to improve
transmission reliability in noisy environments.
Block diagram

At the encoder, the discrete wavelet transform is first applied to the source image data. The transform
coefficients are then quantized and entropy coded before forming the output code stream.
The decoder is the reverse of the encoder. The code stream is first entropy decoded, dequantized and inverse
discrete transformed, thus resulting in the reconstructed image data.
Steps
i. The source image is decomposed into components.
ii. The image components are (optionally) decomposed into rectangular tiles. The tile component is the
basic unit of the original or reconstructed image.
iii. A wavelet transform is applied on each tile, decomposing it into different resolution levels (a toy
one-level subband decomposition is sketched after this list).
iv. Each resolution level is made up of subbands of coefficients that describe the frequency
characteristics of local areas of the tile component, rather than of the entire image component.
v. The subbands of coefficients are quantized and collected into rectangular arrays of “code blocks”.
vi. The bitplanes of the coefficients in a code block are entropy coded.
vii. The encoding can be done in such a way that Regions of Interest (ROI) can be encoded at a higher
quality than the background.
viii. Markers are added to the bit stream to allow for error resilience.
ix. The code stream has a main header at the beginning that describes the original image and the
various decomposition and coding styles that are used to locate, extract, decode and reconstruct the
image with the desired resolution, fidelity, region of interest (ROI) or other characteristics.
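
To make the subband decomposition in the steps above concrete, here is a toy one-level 2-D Haar transform in Python. This is an illustration only: JPEG 2000 itself specifies the CDF 5/3 (reversible) and 9/7 (irreversible) wavelets, not Haar, and applies several decomposition levels.

import numpy as np

# One level of a 2-D Haar wavelet transform: the tile splits into four
# quarter-size subbands (LL holds the coarse approximation, the others
# hold horizontal, vertical and diagonal detail).
def haar2d_level(tile):
    t = tile.astype(float)
    # horizontal filtering: averages -> low half, differences -> high half
    lo = (t[:, 0::2] + t[:, 1::2]) / 2
    hi = (t[:, 0::2] - t[:, 1::2]) / 2
    # vertical filtering of both halves yields the four subbands
    return {
        "LL": (lo[0::2, :] + lo[1::2, :]) / 2,   # coarse approximation
        "LH": (lo[0::2, :] - lo[1::2, :]) / 2,   # detail
        "HL": (hi[0::2, :] + hi[1::2, :]) / 2,   # detail
        "HH": (hi[0::2, :] - hi[1::2, :]) / 2,   # diagonal detail
    }

tile = np.arange(64, dtype=float).reshape(8, 8)
bands = haar2d_level(tile)
print({name: band.shape for name, band in bands.items()})   # each subband is (4, 4)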
Advantages
i. Better image quality than JPEG at the same file size, or alternatively 25-35% smaller file size at
the same quality.
ii. Good image quality at low bit rates.
iii. Scalable image files, i.e., no decompression is needed for reformatting.
iv. JPEG 2000 is more suitable for web graphics than baseline JPEG because it supports an alpha
channel (transparency component).
v. ROI: one can define more interesting parts of the image, which are coded with more bits than the
surrounding areas.
______________________________________________________________________________________
Q. Discuss the application of JPEG 2000 (5mks)_____________________________________________

Some markets and applications intended to be served by this standard are listed below:

i. Consumer applications such as multimedia devices (e.g., digital cameras, personal digital assistants,
3G mobile phones, color facsimile, printers, scanners, etc.)
ii. Client/server communication (e.g., the Internet, Image database, Video streaming, video server, etc.)
iii. Military/surveillance (e.g., HD satellite images, Motion detection, network distribution and storage, etc.)
iv. Medical imagery, esp. the DICOM specifications for medical data interchange.
v. Remote sensing
vi. High-quality frame-based video recording, editing and storage.
vii. Live HDTV feed contribution (I-frame only video compression with low transmission latency), such as
live HDTV feed of a sport event linked to the TV station studio
viii. Digital cinema
ix. JPEG 2000 has many design commonalities with the ICER image compression format that is used to
send images back from the Mars rovers.
x. Digitized Audio-visual contents and Images for Long term digital preservation
xi. The World Meteorological Organization has built JPEG 2000 compression into the new GRIB2 file
format. The GRIB file structure is designed for global distribution of meteorological data. The
implementation of JPEG 2000 compression in GRIB2 has reduced file sizes by up to 80%.

Q. Distinguish between JPEG & JPEG-LS___________________________________________________

i. JPEG is a very well-known ISO/ITU-T standard created in the late 1980s, whereas JPEG-LS is a
more recent ISO/ITU-T standard for lossless coding of still images.
ii. JPEG compression is usually lossy, whereas JPEG-LS provides support for “near-lossless”
compression.
iii. Due to lossy compression, JPEG reduces image quality; JPEG-LS, being lossless (or
near-lossless), preserves it.
iv. JPEG compression is generally slower than JPEG-LS, which is much faster and compresses much
better than the original JPEG standard in lossless mode.
v. In JPEG, the image is divided into 8×8 blocks of pixels, and then the Discrete Cosine Transform and
Huffman entropy coding are employed. JPEG-LS is based on the LOCO-I (Low Complexity Lossless
Compression for Images) algorithm, using adaptive prediction, context modeling, and Golomb coding.
vi. JPEG offers progressive bit-stream functionality; JPEG-LS does not.
vii. In JPEG, the choice of quantization coefficients specifies the error introduced during compression;
in JPEG-LS, the near-lossless mode allows the user to specify a bound on the error introduced by the
compression algorithm.
____________________________________________________________________________________
Q. Difference between image and video compression?____________(5mks)______________________

Main differences between still image and video compression:

• Still image compression is simple and easy to work with.
• It’s difficult to obtain a single image from a video stream using video compression.
• Less data is used to store/transmit video in compressed form compared to still images.
• It’s not possible to reduce the frame rate when using video compression.
• Still image compression is more suitable when using a modem, or other media that only offer limited
bandwidth.

With the exception of Motion-JPEG, all video compression standards mix still images with partially complete
images. By storing only the changes from one full image to another, these ‘partially complete’ images reduce
the file size of the compressed video sequence. Scenes containing little or no variation can be compressed
quite dramatically.

Compression standards for still images:


1.JPEG
This is short for Joint Photographic Experts Group, an international standard in which each image is divided
into blocks of 8×8 pixels. Each block is then individually compressed using the Discrete Cosine Transform (DCT).
2.Wavelet
A popular standard optimized for images containing small amounts of data. Wavelet is not standardized and
requires special software for viewing, but it is used by many manufacturers.
3.JPEG 2000
Based on Wavelet (not JPEG) technology, this is a new but still relatively little-used standard.
4. GIF
This is short for Graphics Interchange Format -a bit-mapped graphics file format used widely on the Internet.
Limited to 256 colors, GIF is a good standard for images that are not too complex.

Compression Standards for video:

1.Motion-JPEG
With Motion-JPEG, each frame within the video is stored as a complete image in the JPEG format, and the
still images are displayed at a high frame rate to produce very high-quality video.
2. H.261, H.263, H.321, H.324, etc.
Designed for video conferencing, but sometimes used for network cameras, these standards offer a high
frame rate. However, the quality of the images is low.
3. MPEG 1
Stands for Moving Picture Experts Group, international standard ISO/IEC 11172. This gives a performance
of up to 352×288 pixels at 30 frames per second (fps), at a maximum of 1.86 Mbit/s. Audio according to
MPEG 1 Layers 1, 2 and 3 is also included in the standard.
4.MPEG 2
It is a popular standard that offers high-quality video suitable for installations where TV-quality is needed.
All MPEG formats use a “differential method”: the first frame is a complete picture similar to a JPEG frame,
and the following frames are only an update of the difference between that frame and how the picture looks
now. Depending on how the GOP (Group of Pictures) is set, there can be different numbers of updated
frames (P- and B-frames) between the frames containing a complete picture (I-frames).
5.MPEG 3:
A cancelled standard aimed at HDTV (High Definition TV).
6. MPEG 4
Moving Picture Experts Group international standard ISO/IEC 14496. This standard covers a wide variety
of applications ranging from the video displayed in cellular phones, to full feature-length movies shown in a
cinema.

Questions in Terna notes (Xerox attached):(Q1)Explain motion compensation wrt to video compression, (Q
2)Basic structure of MPEG-1 video standard.(Q3) Explain the algorithm for video conferencing.(Q 4)Loop filter.(Q
5)Explain the concept of packet video? What are the compression issues in ATM network?(Q 6)Explain with
example progressive image compression. (Q 7) Fascimile encoding.(Q 8)Run length encoding.( Q 9) Short note on
JPEG. (Q10)MPEG video standards. (Q11)Compression algorithm for packet video.(Q12)What are improvements
in JBIG-2 as compared to JBIG and how it is used for the encoding and decoding?(Q13)Describe the various
models used for lossy compression algorithm.
______________________________________________________________________________________
Q.Short note on Facsimile Encoding___(5mks)_________________________________________________
i. It is one of the earliest applications of lossless compression in the modern era. It is also known
simply as fax.
ii. In facsimile transmission, a page is scanned and converted into a sequence of black or white pixels.
The requirements of how fast the facsimile of an A4 document (210×297 mm) must be transmitted
have changed over the last two decades.
iii. The CCITT (now ITU-T) has issued a number of recommendations based on the speed requirements
at a given time.
iv. The CCITT classifies the apparatus for facsimile transmission into four groups. (This classification is
done for A4 size document)
• Group 1: This apparatus is capable of transmitting an A4-size document in about six minutes over phone
lines using an analog scheme. The apparatus is standardized in recommendation T.2.
• Group 2: This apparatus is capable of transmitting an A4-size document over phone lines in about three
minutes. A Group 2 apparatus also uses an analog scheme and, therefore, does not use data compression.
The apparatus is standardized in recommendation T.3.
• Group 3: This apparatus uses a digitized binary representation of the facsimile. Because it is a digital
scheme, it can and does use data compression, and it is capable of transmitting an A4-size document in
about a minute. The apparatus is standardized in recommendation T.4.
• Group 4: This apparatus has the same speed requirement as Group 3. The apparatus is standardized in
recommendations T.6, T.503, T.521, and T.563.
v. With the arrival of the Internet, facsimile transmission has changed as well. Given the wide range of
rates and “apparatus” used for digital communication, it makes sense to focus more on protocols than
on apparatus. The newer recommendations from the ITU provide standards for compression that are
more or less independent of apparatus.
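
The compression used by the digital groups starts from run lengths. The sketch below (Python; an illustration of the idea, not the standardized T.4 coder) reduces a scan line of white/black pixels to alternating run lengths; Group 3 then Huffman-codes these runs with its terminating and make-up code tables, which are omitted here.

# Reduce a scan line of white (0) / black (1) pixels to alternating run
# lengths.  By convention a line starts with a (possibly empty) white run.
def run_lengths(line):
    runs, current, count = [], 0, 0
    for pixel in line:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current, count = pixel, 1
    runs.append(count)
    return runs

print(run_lengths([0, 0, 0, 1, 1, 0, 0, 0, 0, 1]))   # -> [3, 2, 4, 1]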
______________________________________________________________________________________
Q. Short note on fractal image compression_____(10mks)_____________________________________
Need for fractal image compression OR
Fractal image compression: How fractal compression differs from other techniques :
i. The underlying compression technique in the JPEG standard is the DCT (Discrete Cosine Transform). DCT
breaks an image into small blocks, usually 8 pixels by 8 pixels. JPEG achieves compression by discarding
image data (high-frequency cosine terms) that the human eye does not require to perceive the image,
resulting in a poorer-quality image with a pixelized (blocky) appearance.
Fractal images are not based on a map of pixels, nor is the encoding weighted to the visual characteristics
of the human eye. Instead, bitmap data is discarded when it is required to create a best-fit fractal pattern.
Greater compression ratios are achieved using more computationally intensive transforms that may
degrade the image, but the distortion appears much more natural due to the fractal components.
ii. DCT-based JPEG compression is quite effective at low or moderate compression ratios, up to about 20:1
or 25:1. Beyond this point, the image becomes very “blocky” as the compression increases, and the
image quality becomes too poor for practical use. JPEG obtains high compression ratios by cutting off the
high-frequency components of the image. This can also introduce very visible artifacts, in particular at
sharp edges in the image; this is known as the Gibbs phenomenon.
iii. Furthermore, the JPEG method is resolution dependent. Higher-resolution graphics have more pixels than
the same images at lower resolution. In order to “zoom in” on a portion of an image and enlarge it, it is
necessary to replicate pixels. The enlarged image will exhibit a certain level of “blockiness” which soon
becomes unacceptable as the expansion factor increases.
iv. Most other lossy methods are also symmetrical in nature. That is, a particular sequence of steps is used to
compress an image, and the reverse of those steps is used to decompress it. Compression and
decompression will take about the same amount of time as well. Fractal compression is an asymmetrical
process, taking much longer to compress an image than to decompress it.

Explanation
Fractal compression is a lossy compression method for digital images, based on fractals. The method is best
suited for textures and natural images.
A fractal is a structure that is made up of similar forms and patterns that occur in many different sizes.
The decoded image is formed from transformed (and reduced) copies of itself, and hence it must have
detail at every scale. That is, the images are fractals.
The term fractal was first used by Benoit Mandelbrot to describe repeating patterns that he observed
occurring in many different structures. These patterns appeared nearly identical in form at any size and
occurred naturally in all things.

Principle:
It is based on the fact that parts of an image often resemble other parts of the same image.(self-similarity
property) .Fractal algorithms convert these parts into mathematical data called "fractal codes" which are
used to recreate the encoded image. Fractal encoding is largely used to convert bitmap images to fractal
codes. Fractal decoding is just the reverse, in which a set of fractal codes are converted to a bitmap.

Currently, the most popular method of fractal encoding is a process called the Fractal Transform.

Compression
Real-world images are often rich in affine redundancy. This observation and the Collage Theorem were
the motivations for the fractal image compression algorithm.
i. The fractal image compression algorithm first partitions the original image into non-overlapping domain
blocks/regions (they can be any size or shape).
ii. Then a collection of possible range blocks/ regions is defined. The range regions can overlap and
need not cover the entire image, but must be larger than the domain regions.
iii. For each domain region the algorithm then searches for a suitable range region that, when applied
with an appropriate affine transformation, very closely resembles the domain region.
iv. Afterward, a FIF (Fractal Image Format) file is generated for the image. This file contains information
on the choice of domain regions, and the list of affine coefficients (i.e. the entries of the
transformation matrix) of all associated affine transformations.
v. So all the pixels' data in a given region are compressed into a small set of entries of the transform
matrix, with each entry, corresponding to an integer between 0 and 255, taking up one byte.

This process is independent of the resolution of the original image. The output graphic will look like the
original at any resolution, since the compressor has found an IFS whose attractor replicates the original one
(i.e. a set of equations describing the original image).
Decompression
i. To decompress an image, the decompressor first allocates two memory buffers of equal size, with
arbitrary initial content.
ii. The iterations then begin, with buffer 1 the range image and buffer 2 the domain image.
iii. The domain image is partitioned into domain regions as specified in the FIF file.
iv. For each domain region, its associated range region is located in the range image.
v. Then the corresponding affine map is applied to the content of the range region, pulling the content
toward the map's attractor. Since each of the affine maps is contractive, the range region is
contracted by the transformation. This is the reason that the range regions are required to be larger
than the domain regions during compression.
vi. For the next iteration, the roles of the domain image and range image are switched. The process of
mapping the range regions (now in buffer 2) to their respective domain regions (in buffer 1) is
repeated, using the prescribed affine transformations.
vii. Then the entire step is repeated again and again, with the content of buffer 1 mapped to buffer 2,
then vice versa.
viii. At every step, the content is pulled ever closer to the attractor of the IFS which forms a collage of the
original image. Eventually the differences between the two images become very small, and the
content of the first buffer is the output decompressed image.
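
The reason the initial buffer contents can be arbitrary is that each map is contractive. The toy Python sketch below shows the same effect on single numbers instead of image blocks: iterating x → ax + b with |a| < 1 reaches the same attractor from any starting point (the constants a and b here are arbitrary choices for the demonstration; fractal decoding iterates many such maps, one per region).

# Iterating a contractive affine map converges to its fixed point
# b / (1 - a) regardless of the starting value.
def iterate_affine(x0, a=0.5, b=3.0, steps=20):
    x = x0
    for _ in range(steps):
        x = a * x + b
    return x

print(iterate_affine(0.0))      # ~6.0
print(iterate_affine(1000.0))   # also ~6.0 == b / (1 - a)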
Advantages:
i. Two tremendous benefits are immediately realized by converting conventional bitmap images to
fractal data.
a. The first is the ability to scale any fractal image up or down in size without the introduction of
image artifacts or the loss of detail that occurs in bitmap images. This process of “fractal
zooming” is independent of the resolution of the original bitmap image, and the zooming is
limited only by the amount of available memory in the computer.
b. The second benefit is the fact that the size of the physical data used to store fractal codes is
much smaller than the size of the original bitmap data. It is this aspect of fractal technology
that is called fractal compression.
ii. The fractal method has the benefit of faster decompression speed, having done most of the
computation during the compression step, while giving equal or better compression ratio.
Disadvantages:
i. The asymmetric characteristic limits the usefulness of fractally compressed data to applications
where image data is constantly decompressed but never recompressed. Fractal compression is
therefore highly suited for use in image databases and CD-ROM applications.
ii. Fractal compression is lossy. The process of matching fractals does not involve looking for exact
matches, but instead looking for "best fit" matches based on the compression parameters (encoding
time, image quality, and size of output).
iii. The process of fractal compression is by no means in the public domain. There are many patents
claiming a method of image data compression based on fractal transforms. Also, the exact process
used by some fractal packages--including Barnsley's Fractal Transform--is considered proprietary.

DISCRETE COSINE TRANSFORM


The predictive coding techniques operate directly on the pixels of an image and are therefore called spatial
domain methods. There are other compression techniques that are based on modifying the transform of an
image. In transform coding, a reversible linear transform (such as the Fourier transform) is used to map the
image into a set of transform coefficients, which are then quantized and coded.
The transform coding system encoder performs the following steps.
An N×N input image is first divided into subimages of size n×n, which are then transformed to generate
(N/n)^2 subimage transform arrays, each of size n×n. The goal of the transformation process is to decorrelate
the pixels of each subimage, or to pack as much information as possible into the smallest number of transform
coefficients. The quantization stage then selectively eliminates (or more coarsely quantizes) the coefficients
that carry the least information; these coefficients have the smallest impact on the reconstructed subimage
quality. The encoding process terminates by encoding the quantized coefficients.
There are various transform coding techniques, such as the DCT, DFT, WHT, etc. The choice of a particular
transform in a given application depends on the amount of reconstruction error that can be tolerated and the
computational resources available.
DCT stands for discrete cosine transform.
The 1-D DCT is defined as

    C(u) = α(u) Σ_{x=0}^{N−1} f(x) cos[ (2x+1)uπ / 2N ],    for u = 0, 1, 2, ..., N−1

where

    α(u) = √(1/N) for u = 0,  and  α(u) = √(2/N) for u = 1, 2, ..., N−1.

The corresponding 2-D DCT pair is

    C(u,v) = α(u) α(v) Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x,y) cos[ (2x+1)uπ / 2N ] cos[ (2y+1)vπ / 2N ]

for u, v = 0, 1, 2, ..., N−1, and the inverse

    f(x,y) = Σ_{u=0}^{N−1} Σ_{v=0}^{N−1} α(u) α(v) C(u,v) cos[ (2x+1)uπ / 2N ] cos[ (2y+1)vπ / 2N ]

for x, y = 0, 1, 2, ..., N−1, with α(u) and α(v) as defined above.

The DCT is closely related to the DFT. It can be obtained from the DFT by mirroring the original N-point
sequence to obtain a 2N-point sequence; the DCT is then the first N points of the resulting 2N-point DFT. The
DCT is substantially better at energy compaction for most correlated sources when compared to the DFT.
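
A direct (unoptimized) implementation of the 1-D DCT defined above, in Python with NumPy, may help make the formula concrete; the sample input values below are arbitrary:

import numpy as np

# Direct implementation of the 1-D DCT formula (clarity over speed;
# real codecs use fast O(N log N) algorithms).
def dct_1d(f):
    N = len(f)
    x = np.arange(N)
    C = np.empty(N)
    for u in range(N):
        alpha = np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)
        C[u] = alpha * np.sum(f * np.cos((2 * x + 1) * u * np.pi / (2 * N)))
    return C

f = np.array([52.0, 55, 61, 66, 70, 61, 64, 73])   # arbitrary sample row
print(dct_1d(f))    # C[0] (the DC term) dominates: the DCT packs the energy
                    # of correlated samples into a few low-order coefficients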
______________________________________________________________________________________
Q. MPEG video compression______(10mks)_________________________________________________
The Moving Picture Experts Group method is used to compress video. In principle, a motion picture is a rapid
flow of a set of frames, where each frame is an image. In other words, a frame is a spatial combination of
pixels, and a video is a temporal combination of frames that are sent one after another. Compressing video,
then, means spatially compressing each frame and temporally compressing a set of frames.
Video compression is based on two principles.
i. The first is the spatial redundancy that exists in each frame.
ii. The second is the fact that most of the time, a video frame is very similar to its immediate neighbors. This
is called temporal redundancy.

Spatial Compression: The spatial compression of each frame is done with JPEG (or a modification of it).
Each frame is a picture that can be independently compressed.

Temporal Compression: In temporal compression, redundant frames are removed. When we watch
television, we receive 50 frames per second. However, most of the consecutive frames are almost the same.
For example, when someone is talking, most of the frame is the same as the previous one except for the
segment of the frame around the lips, which changes from one frame to another.
To temporally compress data, the MPEG method first divides frames into three categories: I-frames, P-
frames, and B-frames.
A frame that is coded using its predecessor is called an inter frame (or just inter), while a frame that is coded
independently is called an intra frame (or just intra). An intra frame is labeled I, and an inter frame is labeled P
(for predictive). A frame that is encoded based on both past and future frames is labeled B (for bidirectional).
i. An I frame is decoded independently of any other frame.
ii. A P frame is decoded using the preceding I or P frame.
iii. A B frame is decoded using the preceding and following I or P frames
1. I-frames. An intracoded frame (I-frame) is an independent frame that is not related to any other frame (not
to the frame sent before or to the frame sent after). They are present at regular intervals (e.g., every ninth
frame is an I-frame). An I-frame must appear periodically to handle some sudden change in the frame that
the previous and following frames cannot show. Also, when a video is broadcast, a viewer may tune in at any
time. If there is only one I-frame at the beginning of the broadcast, the viewer who tunes in late will not
receive a complete picture. I-frames are independent of other frames and cannot be constructed from other
frames.
2. P-frames. A predicted frame (P-frame) is related to the preceding I-frame or P-frame. In other words,
each P-frame contains only the changes from the preceding frame. The changes, however, cannot cover a
big segment. For example, for a fast-moving object, the new changes may not be recorded in a P-frame.
P-frames can be constructed only from previous I- or P-frames. P-frames carry much less information than
other frame types and carry even fewer bits after compression.
3. B-frames. A bidirectional frame (B-frame) is related to the preceding and following I-frame or P-frame. In
other words, each B-frame is relative to the past and the future. Note that a B-frame is never related to
another B-frame. Fig. 4.8 shows a sample sequence of frames.

Fig. 4.8 MPEG frames


Fig. 4.9 MPEG frame construction
Fig. 4.9 shows how I-, P-, and B-frames are constructed from a series of seven frames.
MPEG has gone through two versions. MPEG1 was designed for a CD-ROM with a data rate of 1.5 Mbps.
MPEG2 was designed for high-quality DVD with a data rate of 3 to 6 Mbps.
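
A bare-bones sketch of the temporal idea (Python/NumPy; an illustration only, since real MPEG adds motion compensation, DCT, quantization, and entropy coding on top of this) shows why a P-frame is cheap to code when consecutive frames are nearly identical:

import numpy as np

# A "P-frame" stored as the difference from the previous reconstructed frame.
def encode_p_frame(current, reference):
    return current.astype(int) - reference.astype(int)   # residual

def decode_p_frame(residual, reference):
    return reference.astype(int) + residual

frame1 = np.full((4, 4), 100)          # stands in for a decoded I-frame
frame2 = frame1.copy()
frame2[1, 1] = 130                     # only one pixel changed (e.g., around the lips)

residual = encode_p_frame(frame2, frame1)
print(np.count_nonzero(residual))      # 1 -> almost nothing to transmit
assert (decode_p_frame(residual, frame1) == frame2).all()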
