CH 2.
(1) Digitization means transferring an analogue signal into a digital representation. A good example of this is
the information stored on the old audio cassette tape. The sound pressure waveform is contained on the tape
in the form of a corresponding pattern of magnetization strength. This analog representation has to be
converted into a series of numbers, since digital computers can only work with discrete numbers stored at
discrete moments in time. The process works as follows: at equidistant times the actual value of the analog
waveform is taken and stored as a number. This process, in which an analog signal is turned into a series of
digital numbers, is called sampling.
(2) Capturing means transferring data that is already in a digital representation onto a computer. Many
recording devices already produce a digital representation, such as DAT recorders for audio or DV
camcorders for video. In this case the processing is even simpler, since one only has to ensure that the
digital packaging (format) is converted. There is, however, equipment where capturing is not possible,
because the manufacturer does not offer the digital information in a proper format at an external plug. Early
MiniDisc recorders, for example, only allowed the user to copy the stored data via an analog line. In the video
sector there are no analog recorders anymore, i.e. all current cameras carry out a direct digitization.
Digitizing Audio
When sound is fed into a microphone, an electronic analog signal is generated that represents the sound
amplitude as a function of time. The signal is called an analog audio signal. An analog signal, such as audio,
can be digitized to produce a digital signal. According to the Nyquist theorem, if the highest frequency of the
signal is f, we need to sample the signal 2f times per second. There are other methods for digitizing an audio
signal, but the principle is the same. Voice is sampled at 8,000 samples per second with 8 bits per sample.
This results in a digital signal of 64 kbps. Music is sampled at 44,100 samples per second with 16 bits per
sample. This results in a digital signal of 705.6 kbps for monaural and 1.411 Mbps for stereo.
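The bit-rate arithmetic above can be checked with a short sketch in Python (the function name is ours, not from any standard):

```python
# Sketch: bit rate of uncompressed (PCM) digital audio.
def audio_bit_rate(samples_per_sec, bits_per_sample, channels=1):
    """Bits per second = sampling rate x sample size x number of channels."""
    return samples_per_sec * bits_per_sample * channels

print(audio_bit_rate(8_000, 8))        # voice: 64,000 bps = 64 kbps
print(audio_bit_rate(44_100, 16))      # music, mono: 705,600 bps = 705.6 kbps
print(audio_bit_rate(44_100, 16, 2))   # music, stereo: 1,411,200 bps = 1.411 Mbps
```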
Digitizing Video
A video consists of a sequence of frames. If the frames are displayed on the screen fast enough, we get an
impression of motion. The reason is that our eyes cannot distinguish the rapidly flashing frames as individual
ones. There is no single standard number of frames per second: 25 frames per second is common in Europe
(PAL), and about 30 frames per second (NTSC) in North America. However, to avoid a condition known as
flickering, each frame needs to be refreshed. The TV industry repaints each frame twice. At 25 frames per
second this means 50 frames need to be sent, or, if there is memory at the receiver site, 25 frames with each
frame repainted from memory. Each frame is divided into small grids, called picture elements or pixels. For
black-and-white TV, each 8-bit pixel represents one of 256 different gray levels. For color TV, each pixel is 24
bits, with 8 bits for each primary color (red, green, and blue). We can calculate the number of bits sent per
second for a specific resolution. For example, a color frame of 1,024 × 768 pixels, repainted twice at 25
frames per second, needs 2 × 25 × 1,024 × 768 × 24 ≈ 944 Mbps. Such a data rate requires a very
high-data-rate technology such as SONET. To send video using lower-rate technologies, we need to
compress the video.
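The same calculation for uncompressed video, as a sketch (25 frames per second, each frame repainted twice):

```python
# Sketch: bit rate of uncompressed digital video.
def video_bit_rate(fps, width, height, bits_per_pixel, repaint=2):
    """Each frame is repainted `repaint` times to avoid flickering."""
    return repaint * fps * width * height * bits_per_pixel

print(video_bit_rate(25, 1024, 768, 24))   # 943,718,400 bps, roughly 944 Mbps
```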
Audio Compression
Audio compression can be used for speech or music. For speech we need to compress a 64-kbps digitized
signal; for music, a 1.411-Mbps digitized signal. Two categories of techniques are used for audio
compression: predictive encoding and perceptual encoding.
• Predictive Encoding
In predictive encoding, the differences between samples are encoded instead of all the sampled values. This
type of compression is normally used for speech. Several standards have been defined, such as GSM (13
kbps), G.729 (8 kbps), and G.723.1 (6.3 or 5.3 kbps).
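A minimal sketch of the idea (plain difference coding, not any of the standards named above):

```python
# Sketch: predictive (difference) encoding of a sample stream.
def encode_differences(samples):
    """Store each sample as its difference from the previous one."""
    prev, diffs = 0, []
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs

def decode_differences(diffs):
    """Rebuild the samples by accumulating the differences."""
    prev, samples = 0, []
    for d in diffs:
        prev += d
        samples.append(prev)
    return samples

speech = [100, 102, 103, 103, 101, 98]
print(encode_differences(speech))   # [100, 2, 1, 0, -2, -3] - small values, cheaper to code
```

Because neighboring speech samples are close, the differences are small numbers that can be coded with fewer bits than the raw values.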
• Perceptual Encoding: MP3
The most common compression technique used to create CD-quality audio is based on the perceptual
encoding technique. As mentioned before, this type of audio needs at least 1.411 Mbps, which cannot be
sent over the Internet without compression. MP3 (MPEG audio layer 3), a part of the MPEG standard, uses
this technique.
Perceptual encoding is based on the science of psychoacoustics, the study of how people perceive sound.
The idea relies on flaws in our auditory system: some sounds can mask other sounds. Masking can happen
in frequency and in time. In frequency masking, a loud sound in one frequency range can partially or totally
mask a softer sound in another frequency range. For example, we cannot hear what is said in a room where
loud music is playing. In temporal masking, a loud sound can numb our ears for a short time even after the
sound has stopped. MP3 uses these two phenomena, frequency and temporal masking, to compress audio
signals. The technique analyzes the spectrum and divides it into several groups. Zero bits are allocated to
the frequency ranges that are totally masked, a small number of bits to the frequency ranges that are
partially masked, and a larger number of bits to the frequency ranges that are not masked. MP3 produces
three common data rates: 96 kbps, 128 kbps, and 160 kbps. The rate is chosen based on the range of the
frequencies in the original analog audio.
Adaptive DPCM: This variant of DPCM is commonly used for audio compression. In ADPCM the
quantization step size adapts to the changing frequency of the sound being compressed. The predictor also
has to adapt itself and recalculate the weights according to changes in the input. Several versions of ADPCM
exist. A popular version is the IMA ADPCM standard (Section 7.6), which specifies the compression of PCM
from 16 down to four bits per sample. ADPCM is fast, but it introduces noticeable quantization noise and
achieves unimpressive compression factors of about four.
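The step-size adaptation can be sketched as follows. This is a toy coder illustrating the idea only, not the IMA ADPCM standard; the 5-level quantizer and the grow/shrink factors are our own choices:

```python
# Toy ADPCM-style coder: quantize prediction differences, adapt the step size.
def adpcm_encode(samples, step=1.0):
    codes, recon, pred = [], [], 0.0
    for s in samples:
        code = max(-2, min(2, round((s - pred) / step)))  # crude 5-level quantizer
        codes.append(code)
        pred += code * step            # predictor tracks the decoder's state
        recon.append(pred)
        # adapt: grow the step on saturated codes, shrink it otherwise
        step = max(step * (1.5 if abs(code) == 2 else 0.9), 0.01)
    return codes, recon

def adpcm_decode(codes, step=1.0):
    out, pred = [], 0.0
    for code in codes:
        pred += code * step
        out.append(pred)
        step = max(step * (1.5 if abs(code) == 2 else 0.9), 0.01)
    return out
```

Because encoder and decoder apply identical update rules, the decoder reproduces the encoder's reconstruction exactly; the loss comes only from the quantizer.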
Video Compression
As we mentioned before, video is composed of multiple frames. Each frame is one image. We can compress
video by first compressing images. Two standards are prevalent in the market. Joint Photographic Experts
Group (JPEG) is used to compress images. Moving Picture Experts Group (MPEG) is used to compress
video.
Motion Compensation
• Each image is divided into macroblocks of size N × N .
– By default, N = 16 for luminance images. For chrominance images, N = 8 if 4:2:0 chroma subsampling is
adopted.
• Motion compensation is performed at the macroblock level.
– The current image frame is referred to as Target Frame.
– A match is sought between the macroblock in the Target Frame and the most similar macroblock in
previous and/or future frame(s) (referred to as Reference frame(s)).
– The displacement of the reference macroblock to the target macroblock is called a motion vector MV.
Image Compression
A digital image is a rectangular array of dots, or picture elements, arranged in m rows and n columns. The
expression m×n is called the resolution of the image, and the dots are called pixels (except in the cases of
fax images and video compression, where they are referred to as pels). The term “resolution” is sometimes
also used to indicate the number of pixels per unit length of the image. Thus, dpi stands for dots per inch.
1. A bi-level (or monochromatic) image. This is an image where the pixels can have one of two values,
normally referred to as black and white. Each pixel in such an image is represented by one bit, making this
the simplest type of image.
2. A grayscale image. A pixel in such an image can have one of the 2^n values 0 through 2^n − 1, indicating
one of 2^n shades of gray (or shades of some other color). The value of n is normally compatible with a byte
size; i.e., it is 4, 8, 12, 16, 24, or some other convenient multiple of 4 or of 8. The set of the most-significant
bits of all the pixels is the most-significant bitplane. Thus, a grayscale image with n bits per pixel has n
bitplanes.
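Separating a grayscale image into bitplanes can be sketched as follows (flat pixel list; the helper name is ours):

```python
# Sketch: split an n-bit grayscale image into n bi-level bitplanes.
def bitplanes(pixels, n_bits):
    """Return the bitplanes, most-significant plane first."""
    return [[(p >> b) & 1 for p in pixels] for b in range(n_bits - 1, -1, -1)]

print(bitplanes([7, 8], 4))   # 7 = 0111 and 8 = 1000 differ in every plane
```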
DPCM
The DPCM compression method is a member of the family of differential encoding compression methods,
which itself is a generalization of the simple concept of relative encoding.
It is based on the well-known fact that neighboring pixels in an image are correlated.
3. A continuous-tone image. This type of image can have many similar colors (or grayscales). When
adjacent pixels differ by just one unit, it is hard or even impossible for the eye to distinguish their colors. As a
result, such an image may contain areas
with colors that seem to vary continuously as the eye moves along the area. A pixel in such an image is
represented by either a single large number (in the case of many grayscales) or three components (in the
case of a color image).
4. A discrete-tone image (also called a graphical image or a synthetic image). This is normally an
artificial image. It may have a few colors or many colors, but it does not have the noise and blurring of a
natural image. Examples are an artificial object or machine, a page of text, a chart, a cartoon, or the contents
of a computer screen. (Not every artificial image is discrete-tone. A computer-generated image that’s meant
to look natural is a continuous-tone image in spite of its being artificially generated.)
5. A cartoon-like image. This is a color image that consists of uniform areas. Each area has a uniform color
but adjacent areas may have very different colors. This feature may be exploited to obtain excellent
compression. Whether an image is treated as discrete or continuous is usually dictated by the depth of the
data. However, it is possible to force an image to be continuous even if it would fit in the discrete category. It
is intuitively clear that each type of image may feature redundancy, but they are redundant in different ways.
This is why any given compression method may not perform well for all images, and why different methods
are needed to compress the different image types.
______________________________________________________________________________________
Q. Explain motion compensation with respect to video compression____________________________________
Motion Compensation:
i. If the encoder discovers that a part P of the preceding frame has been rigidly moved to a different location
in the current frame, then P can be compressed by writing the following three items on the compressed
stream: its previous location, its current location, and information identifying the boundaries of P.
ii. In principle, such a part can have any shape.
iii. The encoder scans the current frame block by block. For each block B it searches the preceding frame for
an identical block C (if compression is to be lossless) or for a similar one (if it can be lossy).
iv. Finding such a block, the encoder writes the difference between its past and present locations on the
output. This difference is of the form (Cx − Bx, Cy − By) = (Δx, Δy), so it is called a motion vector.
v. Figure 6.10a,b shows a simple example where the sun and trees are moved rigidly to the right (because of
camera movement) while the child moves a different distance to the left (this is scene movement).
Motion compensation is effective if objects are just translated, not scaled or rotated. Drastic changes in illumination
from frame to frame also reduce the effectiveness of this method. In general, motion compensation is lossy.
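The block search described in steps iii–v can be sketched as an exhaustive search minimizing the sum of absolute differences (SAD). The function and the search window are illustrative, not taken from any particular codec:

```python
# Sketch: exhaustive block-matching motion search.
def best_motion_vector(ref, tgt, bx, by, n, search=4):
    """Find (dx, dy) = (Cx - Bx, Cy - By) of the reference block that best
    matches the n x n target block at (bx, by); frames are lists of rows."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if not (0 <= x <= w - n and 0 <= y <= h - n):
                continue          # candidate block would fall outside the frame
            sad = sum(abs(ref[y + j][x + i] - tgt[by + j][bx + i])
                      for j in range(n) for i in range(n))
            if best is None or sad < best[0]:
                best = (sad, dx, dy)
    return best[1], best[2]       # the motion vector
```

The encoder then transmits the motion vector plus (for lossy coding) the residual differences between the matched blocks.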
In other words, q is the quotient and r is the remainder when n is divided by m. The quotient q can take on
the values 0, 1, 2, . . . and is represented by the unary code of q. The remainder r can take on the values
0, 1, 2, . . . , m − 1. If m is a power of two, we use the log2(m)-bit binary representation of r. If m is not a
power of two, we can still use ⌈log2(m)⌉ bits.
It can be shown that the Golomb code is optimal when the integers follow a geometric probability model,
i.e., when P(n) decays exponentially with n, with m chosen to match the decay rate.
Encoding:
i. The Golomb code of a nonnegative integer n depends on the choice of a parameter m.
ii. The first step in computing the Golomb code of the nonnegative integer n is to compute the three
quantities q (quotient), r (remainder), and c by
q = ⌊n/m⌋, r = n − qm, c = ⌈log2 m⌉,
following which the code is constructed in two parts; the first is the value of q, coded in unary, and the
second is the binary value of r, coded in a special way.
iii. The first 2^c − m values of r are coded, as unsigned integers, in c − 1 bits each, and the rest are
coded in c bits each (ending with the biggest c-bit number, which consists of c 1's).
iv. The case where m is a power of 2 (m = 2^c) is special because it requires no (c − 1)-bit codes. We
know that n = r + qm, so once a Golomb code is decoded, the values of q and r can be used to easily
reconstruct n.
Example.
Choosing m = 3 produces c = 2 and the three remainders 0, 1, and 2. We compute 2^2 − 3 = 1, so the first
remainder is coded in c − 1 = 1 bit to become 0, and the remaining two are coded in two bits each, ending
with 11 (binary), to become 10 and 11.
Selecting m = 5 results in c = 3 and produces the five remainders 0 through 4. The first three (2^3 − 5 = 3)
are coded in c − 1 = 2 bits each, and the remaining two are each coded in three bits, ending with 111
(binary). Thus, the remainder codes are 00, 01, 10, 110, and 111.
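The encoding rules above can be sketched directly. The remainder coding follows steps ii–iii; the unary convention used here (q ones followed by a zero) is one common choice:

```python
import math

# Sketch: Golomb code of a nonnegative integer n with parameter m.
def golomb_encode(n, m):
    q, r = divmod(n, m)                   # q = n // m, r = n - q*m
    c = math.ceil(math.log2(m))
    unary = '1' * q + '0'                 # unary code of the quotient
    cutoff = (1 << c) - m                 # first 2^c - m remainders get c-1 bits
    if r < cutoff:
        binary = format(r, 'b').zfill(c - 1) if c > 1 else ''
    else:
        binary = format(r + cutoff, 'b').zfill(c)
    return unary + binary

print(golomb_encode(8, 5))   # q=1, r=3 -> '10' + '110' = '10110'
```

For m = 5 this reproduces the remainder codes 00, 01, 10, 110, 111 from the example above.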
Decoding:
The Golomb codes are designed in this special way to facilitate their decoding.
Scalar Quantization
Procedure
i. In practice, the quantizer consists of two mappings: an encoder mapping and a decoder mapping.
ii. The encoder divides the range of values that the source generates into a number of intervals.
iii. Each interval is represented by a distinct codeword. The encoder represents all the source outputs
that fall into a particular interval by the codeword representing that interval.
iv. As there could be many—possibly infinitely many—distinct sample values that can fall in any given
interval, the encoder mapping is irreversible. Knowing the code only tells us the interval to which the
sample value belongs. It does not tell us which of the many values in the interval is the actual sample
value. When the sample value comes from an analog source, the encoder is called an analog-to-
digital (A/D) converter.
v. For every codeword generated by the encoder, the decoder generates a reconstruction value.
vi. Because a codeword represents an entire interval, and there is no way of knowing which value in the
interval was actually generated by the source, the decoder puts out a value that, in some sense, best
represents all the values in the interval. If the reconstruction is analog, the decoder is often referred
to as a digital-to-analog (D/A) converter.
This aspect of quantization is used by several audio compression methods.
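A uniform scalar quantizer with this encoder/decoder split can be sketched as follows (the interval layout and midpoint reconstruction are our own choices):

```python
# Sketch: uniform scalar quantizer as an encoder/decoder mapping pair.
def make_quantizer(lo, hi, levels):
    width = (hi - lo) / levels
    def encode(x):
        """Map a sample to the index of its interval (the codeword)."""
        return max(0, min(levels - 1, int((x - lo) / width)))
    def decode(i):
        """Reconstruct with the interval midpoint - one value stands for all."""
        return lo + (i + 0.5) * width
    return encode, decode

enc, dec = make_quantizer(0.0, 8.0, 4)   # intervals [0,2), [2,4), [4,6), [6,8)
print(enc(5.3), dec(enc(5.3)))           # 2 5.0 - the exact value 5.3 is lost
```

Note that `encode` is irreversible: every sample in [4, 6) maps to codeword 2, and the decoder can only return the representative value 5.0.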
Drawbacks:
i. Scalar quantization is an example of a lossy compression method, where it is easy to control the
trade-off between compression ratio and the amount of loss. However, because it is so simple, its
use is limited to cases where much loss can be tolerated.
ii. Scalar quantization is not suitable for image compression because it creates annoying artifacts in the
decompressed image. Imagine an image with an almost uniform area where all pixels have values
127 or 128. If 127 is quantized to 111 and 128 is quantized to 144, then the result, after
decompression,
may resemble a checkerboard where adjacent pixels alternate between 111 and 144. This is why
practical algorithms use vector quantization, instead of scalar quantization, for lossy (and sometimes
lossless) compression of images and sound.
Vector Quantization
Principle:
The image is partitioned into equal-size blocks (called vectors) of pixels, and the encoder has a list (called a
codebook) of blocks of the same size. Each image block B is compared to all the blocks of the codebook and
is matched with the “closest” one.
Procedure:
i. In vector quantization we group the source output into blocks or vectors. For example, we can treat L
consecutive samples of speech as the components of an L-dimensional vector.
ii. This vector of source outputs forms the input to the vector quantizer.
iii. At both the encoder and decoder of the vector quantizer, there is a set of L-dimensional vectors
called the codebook.
iv. The vectors in this codebook, known as code-vectors, are selected to be representative of the
vectors generated from the source output.
v. Each code-vector is assigned a binary index.
vi. At the encoder, the input vector is compared to each code-vector in order to find the code-vector
closest to the input vector. The elements of this code-vector are the quantized values of the source
output.
vii. In order to inform the decoder which code-vector was found to be closest to the input vector, the
binary index of the code-vector is transmitted or stored. Because the decoder has exactly the same
codebook, it can retrieve the code-vector given its binary index.
A pictorial representation of this process is shown in Figure 10.1.
viii. Although the encoder may have to perform a considerable amount of computation in order to
find the closest reproduction vector to the vector of source outputs, the decoding consists of a
table lookup. This makes vector quantization a very attractive encoding scheme for applications
in which the resources available for decoding are considerably less than the resources available
for encoding.
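The steps above can be sketched as follows (squared Euclidean distance as the "closeness" measure; the names are ours):

```python
# Sketch: vector quantization - the encoder searches, the decoder looks up.
def vq_encode(vectors, codebook):
    """Return, for each input vector, the index of the nearest code-vector."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(range(len(codebook)), key=lambda k: dist2(v, codebook[k]))
            for v in vectors]

def vq_decode(indices, codebook):
    """Decoding is just a table lookup."""
    return [codebook[k] for k in indices]

codebook = [(0, 0), (10, 10), (0, 10)]
print(vq_encode([(1, 2), (9, 8)], codebook))   # [0, 1]
```

The asymmetry is visible in the code: `vq_encode` loops over the whole codebook per input vector, while `vq_decode` is a single indexing operation.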
RLE Text Compression
After reading the first character, the repeat-count is 1 and the character is saved. Subsequent characters are
compared with the one already saved, and if they are identical to it, the repeat-count is incremented. When a
different character is read, the operation depends on the value of the repeat-count. If it is small, the saved
character is written on the compressed file and the newly-read character is saved. Otherwise, an @ is written,
followed by the repeat-count and the saved character.
Decompression: Decompression is also straightforward. It is shown in Figure 1.6b. When an @ is read, the
repetition count n and the actual character are immediately read, and the character is written n times on the
output stream.
Drawbacks: In plain English text there are not many repetitions. The most repetitive character is the space.
The character “@” may be part of the text in the input stream, in which case a different escape character
must be chosen. Sometimes the input stream may contain every possible character in the alphabet. Since
the repetition count is written on the output stream as a byte, it is limited to counts of up to 255.
Compression ratio: To get an idea of the compression ratios produced by RLE, assume a string of N
characters that needs to be compressed, containing M repetitions of average length L each. Each of the M
repetitions is replaced by 3 characters (escape, count, and data), so the size of the compressed string is
N − M×L + M×3 = N − M(L − 3), and the compression factor is N / (N − M(L − 3)).
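The scheme can be sketched as follows. The run threshold and the one-byte count follow the description above; storing the count as `chr(run)` is our shortcut:

```python
# Sketch: character RLE with an '@' escape, as described above.
def rle_compress(text, escape='@', threshold=3):
    out, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i] and j - i < 255:
            j += 1                       # count is one byte, so cap runs at 255
        run = j - i
        if run > threshold:
            out.append(escape + chr(run) + text[i])   # escape, count, character
        else:
            out.append(text[i] * run)    # short runs are written literally
        i = j
    return ''.join(out)

def rle_decompress(data, escape='@'):
    out, i = [], 0
    while i < len(data):
        if data[i] == escape:
            out.append(data[i + 2] * ord(data[i + 1]))
            i += 3
        else:
            out.append(data[i])
            i += 1
    return ''.join(out)
```

Note the drawback mentioned above: an input that itself contains '@' would need a different escape character.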
______________________________________________________________________________________
Q .What is RLE? How can it be used for audio compression?__________________________________
RLE Image Compression
i. RLE is a natural candidate for compressing graphical data.
ii. A digital image consists of small dots called pixels. Each pixel can be either one bit, indicating a black
or a white dot, or several bits, indicating one of several colors or shades of gray.
iii. We assume that the pixels are stored in an array called a bitmap in memory, so the bitmap is the
input stream for the image.
iv. Pixels are normally arranged in the bitmap in scan lines, so the first bitmap pixel is the dot at the top
left corner of the image, and the last pixel is the one at the bottom right corner.
Principle: Compressing an image using RLE is based on the observation that if we select a
pixel in the image at random, there is a good chance that its neighbors will have the same color.
Compression:
i. The compressor scans the bitmap row by row, looking for runs of pixels of the same color. If the
bitmap starts, e.g., with 17 white pixels, followed by 1 black pixel, etc., then only the numbers 17, 1, .
. need be written on the output stream.
ii. The compressor assumes that the bitmap starts with white pixels. If this is not true, then the bitmap
starts with zero white pixels, and the output stream should start with the run length 0. The resolution
of the bitmap should also be saved at the start of the output stream.
iii. The size of the compressed stream depends on the complexity of the image.
iv. The more detail, the worse the compression.
v. RLE can also be used to compress grayscale images.
vi. Each run of pixels of the same intensity (gray level) is encoded as a pair (run length, pixel value). The
run length usually occupies one byte, allowing for runs of up to 255 pixels. The pixel value occupies
several bits, depending on the number of gray levels (typically between 4 and 8 bits).
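Grayscale RLE of one scan line can be sketched as follows (run lengths capped at 255 so each fits in one byte):

```python
# Sketch: grayscale RLE - encode a scan line as (run length, gray level) pairs.
def rle_gray(scanline):
    pairs = []
    for p in scanline:
        if pairs and pairs[-1][1] == p and pairs[-1][0] < 255:
            pairs[-1][0] += 1            # extend the current run
        else:
            pairs.append([1, p])         # start a new run
    return [tuple(pair) for pair in pairs]

print(rle_gray([12, 12, 12, 12, 90, 90, 3]))   # [(4, 12), (2, 90), (1, 3)]
```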
Digitizing an image involves two steps: sampling and quantization. Sampling an image is the process of
dividing the two-dimensional original image into small regions: pixels. Quantization is the process of
assigning an integer value to each pixel. Notice that digitizing sound involves the same two steps, with the
difference that sound is one-dimensional.
Statistical methods work best when the symbols being compressed have different probabilities.
i. An input stream where all symbols have the same probability will not compress, even though it may
not be random. It turns out that in a continuous-tone color or grayscale image, the different colors or
shades of gray may often have roughly the same probabilities. This is why statistical methods are not
a good choice for compressing such images, and why new approaches are needed.
ii. Images with color discontinuities, where adjacent pixels have widely different colors, compress better
with statistical methods, but it is not easy to predict, just by looking at an image, whether it has
enough color discontinuities.
Dictionary-based compression methods also tend to be unsuccessful in dealing with continuous-tone
images. Such an image typically contains adjacent pixels with similar colors, but does not contain repeating
patterns.
i. An image that contains repeated patterns such as vertical lines may lose them when digitized. So the
pixels in a scan row may end up having slightly different colors from those in adjacent rows, resulting
in a dictionary with short strings. (This problem may also affect curved edges.)
ii. Another problem with dictionary compression of images is that such methods scan the image row by
row, and therefore may miss vertical correlations between pixels.
______________________________________________________________________________________
The Principle of Image Compression. If we select a pixel in the image at random, there is a good chance that
its neighbors will have the same color or very similar colors. Image compression is therefore based on the fact that
neighboring pixels are highly correlated. This correlation is also called spatial redundancy.
______________________________________________________________________________________
Q. Discuss the various approaches of image compression. Explain any one of them.____(5mks)__________
Q . Explain with example the significance of Gray codes for image compression.______________________
Gray Codes
Gray code is the binary representation of integers where consecutive integers differ only by one bit.
Need for Gray code
Any method for compressing bi-level images, for example, can be used to compress grayscale images by
separating the bit planes and compressing each individually, as if it were a bi-level image. Imagine, for
example, an image with 16 grayscale values. Each pixel is defined by four bits, so the image can be
separated into four bi-level images. The trouble with this approach is that it violates the general principle of
image compression. Imagine two adjacent 4-bit pixels with values 7 = 0111 and 8 = 1000 in binary. These
pixels have close values, but when separated into four bit planes, the resulting 1-bit pixels are different in
every bit plane! This is because the binary representations of the consecutive integers 7 and 8 differ in all four bit
positions. In order to apply any bi-level compression method to grayscale images, a binary representation of
the integers is needed where consecutive integers have codes differing by one bit only. Such a
representation exists and is called reflected Gray code (RGC).
Encoding
This code is easy to generate with the following recursive construction:
i. Start with the two 1-bit codes (0, 1).
ii. Construct two sets of 2-bit codes by duplicating (0, 1) and appending, either on the left or on the right,
first a zero, then a one, to the original set. The result is (00, 01) and (10, 11).
iii. Now reverse (reflect) the second set, and concatenate the two.
iv. The result is the 2-bit RGC (00, 01, 11, 10); a binary code of the integers 0 through 3 where
consecutive codes differ by exactly one bit.
v. Applying the rule again produces the two sets (000, 001, 011, 010) and (110, 111, 101, 100), which
are concatenated to form the 3-bit RGC.
Note that the first and last codes of any RGC also differ by one bit. Here are the first three steps for
computing the 4-bit RGC:
1-bit list: 0, 1
2-bit list: 00, 01, 11, 10
3-bit list: 000, 001, 011, 010, 110, 111, 101, 100
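The recursive construction in steps i–v can be sketched as:

```python
# Sketch: reflected Gray code (RGC) by the prefix-and-reflect construction.
def rgc(bits):
    codes = ['0', '1']                               # the two 1-bit codes
    for _ in range(bits - 1):
        codes = (['0' + c for c in codes] +          # prefix the original with 0
                 ['1' + c for c in reversed(codes)]) # prefix the reflection with 1
    return codes

print(rgc(2))   # ['00', '01', '11', '10']
```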
JPEG
i. The name JPEG is an acronym that stands for Joint Photographic Experts Group.
ii. JPEG is a sophisticated lossy/lossless compression method for color or grayscale still images (not
videos). It does not handle bi-level (black and white) images very well. It also works best on
continuous-tone images, where adjacent pixels have similar colors.
iii. An important feature of JPEG is its use of many parameters, allowing the user to adjust the amount
of the data lost (and thus also the compression ratio) over a very wide range.
iv. JPEG has been designed as a compression method for continuous-tone images.
The whole idea of JPEG is to change the picture into a linear (vector) set of numbers that reveals the
redundancies. The redundancies (lack of changes) can then be removed by using one of the text
compression methods.
Step 1: Color images are transformed from RGB into a luminance/chrominance color space (this step is
skipped for grayscale images).
Step 2: Color images are downsampled by creating low-resolution pixels from the original ones
(this step is used only when hierarchical compression is selected). Downsampling is done either at a ratio of
2:1 both horizontally and vertically (reducing the image to 1/2 its original size) or at a ratio of 2:1 horizontally
and 1:1 vertically (reducing it to 2/3 of its original size). Since this is done only on the chrominance
components and not on the luminance component, there is little noticeable loss of image quality.
Step 3: The pixels of each color component are organized in groups of 8×8 pixels called data units, and each
data unit is compressed separately. The bottom row and rightmost column can be duplicated in case the
number of rows or columns is not a multiple of 8.
Step 4: The discrete cosine transform (DCT, Section 4.6) is then applied to each data unit to create an 8×8
map of frequency components.
They represent the average pixel value and successive higher-frequency changes within the group. This
prepares the image data for the crucial step of losing information.
Step 5: Each of the 64 frequency components in a data unit is divided by a separate number called its
quantization coefficient (QC), and then rounded to an integer.
The information lost here is not retrievable; a large QC causes more loss, so the high-frequency components
typically have large QCs. In practice, QC tables for luminance and chrominance are recommended by the
JPEG standard.
Step 6: The 64 quantized frequency coefficients (which are now integers) of each data unit are encoded
using a combination of RLE and Huffman coding.
Step 7: The last step adds headers and all the required JPEG parameters, and outputs the result.
The compressed file may be in one of three formats (1) the interchange format, (2) the abbreviated format for
compressed image data, and (3) the abbreviated format for table-specification data.
The JPEG decoder performs the reverse steps. (Thus, JPEG is a symmetric compression method.)
Q. Why is the DCT used in JPEG? (5mks)
The JPEG committee elected to use the DCT because of:
i. its good performance,
ii. the fact that it does not assume anything about the structure of the data (the DFT, for example, assumes
that the data to be transformed is periodic), and
iii. the existence of ways to speed it up.
The JPEG standard calls for applying the DCT not to the entire image but to data units (blocks) of 8×8 pixels. The
reasons for this are:
(1) Applying DCT to large blocks involves many arithmetic operations and is therefore slow. Applying DCT to small data
units is faster.
(2) Experience shows that, in a continuous-tone image, correlations between pixels are short range. A pixel in such an
image has a value (color component or shade of gray) that’s close to those of its near neighbors, but has nothing to do
with the values of far neighbors.
The JPEG DCT is therefore given, for n = 8, by
G(i,j) = (1/4) C(i) C(j) Σx=0..7 Σy=0..7 p(x,y) cos[(2x+1)iπ/16] cos[(2y+1)jπ/16],
where C(f) = 1/√2 for f = 0 and C(f) = 1 otherwise.
The JPEG decoder works by computing the inverse DCT (IDCT), which for n = 8 is
p(x,y) = (1/4) Σi=0..7 Σj=0..7 C(i) C(j) G(i,j) cos[(2x+1)iπ/16] cos[(2y+1)jπ/16].
It takes the 64 quantized DCT coefficients and calculates 64 pixels p(x,y). If the QCs are the right ones, the new 64 pixels
will be very similar to the original ones. Mathematically, the DCT is a one-to-one mapping of 64-point vectors from the
image domain to the frequency domain. The IDCT is the reverse mapping. If the DCT and IDCT could be calculated with
infinite precision and if the DCT coefficients were not quantized, the original 64 pixels would be exactly reconstructed.
Quantization in JPEG
After each 8×8 data unit of DCT coefficients Gij is computed, it is quantized. This is the step where
information is lost.
Each number in the DCT coefficients matrix is divided by the corresponding number from the particular
“quantization table” used, and the result is rounded to the nearest integer. Three such tables are needed, for
the three color components.
The JPEG standard allows for up to four tables, and the user can select any of the four for quantizing each
color component.
The 64 numbers that constitute each quantization table are all JPEG parameters.
In principle, they can all be specified and fine-tuned by the user for maximum compression. Instead of
dealing with 64 parameters, JPEG software normally uses the following two approaches:
1. Default quantization tables. Two such tables, for the luminance (grayscale) and the chrominance
components, were developed by the JPEG committee as a means to reduce the DCT coefficients with high
spatial frequencies.
2. A simple quantization table Q is computed, based on one parameter R specified by the user. A simple
expression such as Qij = 1+(i + j) × R guarantees that QCs start small at the upper-left corner and get bigger
toward the lower-right corner of the table.
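The two-step recipe (build Q from R, then divide and round) can be sketched as follows; the coefficient values are illustrative:

```python
# Sketch: quantize an n x n block of DCT coefficients with Qij = 1 + (i + j) * R.
def quantize(G, R=2):
    n = len(G)
    return [[round(G[i][j] / (1 + (i + j) * R)) for j in range(n)]
            for i in range(n)]

print(quantize([[100, 30], [20, 4]], R=2))   # [[100, 10], [7, 1]]
```

As the formula promises, the QC grows toward the lower-right (high-frequency) corner, so those coefficients are crushed toward zero.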
If the quantization is done correctly, very few nonzero numbers will be left in the DCT coefficients matrix, and
they are the output of JPEG, but they are further compressed before being written on the output stream.
In the JPEG literature this compression is called “entropy coding,”
Three techniques are used by entropy coding to compress the 8 × 8 matrix of integers:
1. The 64 numbers are collected by scanning the matrix in zigzags. This produces a string of 64 numbers
that starts with some nonzeros and typically ends with many consecutive zeros. Only the nonzero numbers
are output and are followed by a special end-of block (EOB) code. This way there is no need to output the
trailing zeros (we can say that the EOB is the run-length encoding of all the trailing zeros).
2. The nonzero numbers are compressed using Huffman coding .
3. The first of those numbers (the DC coefficient) is treated differently from the others (the AC coefficients).
Why zig-zag scan?
1. It groups the low-frequency coefficients at the top of the vector.
2. It maps the 8×8 block to a 1×64 vector.
3. It produces long runs of trailing zeros, which makes the run-length encoding step more effective.
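The zigzag scan plus the EOB step can be sketched as follows (the sort-key trick for the scan order is our own shortcut):

```python
# Sketch: zigzag-scan an n x n block, then replace trailing zeros with an EOB mark.
def zigzag(block):
    n = len(block)
    # sort positions by anti-diagonal; alternate the direction on each diagonal
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[i][j] for i, j in order]

def drop_trailing_zeros(zz):
    """Keep the leading coefficients; EOB stands for all the trailing zeros."""
    k = len(zz)
    while k > 0 and zz[k - 1] == 0:
        k -= 1
    return zz[:k] + ['EOB']

print(zigzag([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # [1, 2, 4, 7, 5, 3, 6, 8, 9]
```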
Coding in JPEG
Each 8×8 matrix of quantized DCT coefficients contains one DC coefficient [at position (0, 0), the top left
corner] and 63 AC coefficients.
The DC coefficient is a measure of the average value of the 64 original pixels, constituting the data unit. The
DC coefficients of adjacent data units don’t differ much. JPEG outputs the first one (encoded), followed by
differences (also encoded) of the DC coefficients of consecutive data units.
Example: If the first three 8×8 data units of an image have quantized DC coefficients of 1118, 1114, and
1119, then the JPEG output for the first data unit is 1118 (Huffman encoded), followed by the 63 (encoded) AC
coefficients of that data unit. The output for the second data unit will be 1114 − 1118 = −4 (also Huffman
encoded), followed by the 63 (encoded) AC coefficients of that data unit, and the output for the third data unit
will be 1119 − 1114 = 5 (also Huffman encoded), again followed by the 63 (encoded) AC coefficients of that
data unit. This way of handling the DC coefficients is worth the extra trouble, because the differences are
small.
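The differential handling of DC coefficients in the example above can be sketched directly. A minimal Python illustration (the function name is mine, not from the standard):

```python
def dc_differences(dc_coeffs):
    """Encode the first DC coefficient as-is and every following one
    as the difference from its predecessor; the resulting small
    values are then Huffman coded."""
    return [dc_coeffs[0]] + [b - a for a, b in zip(dc_coeffs, dc_coeffs[1:])]
```

Applied to the example, `dc_differences([1118, 1114, 1119])` yields `[1118, -4, 5]`, matching the values worked out in the text.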
Figure 4.68 shows the main parts of the JPEG compressed file (parts in square brackets are optional).
i. The file starts with the SOI marker and ends with the EOI marker. In between these markers, the
compressed image is organized in frames.
ii. In the hierarchical mode there are several frames, and in all other modes there is only one frame. In
each frame the image information is contained in one or more scans, but the frame also contains a
header and optional tables (which, in turn, may include markers).
iii. The first scan may be followed by an optional DNL segment (define number of lines), which starts
with the DNL marker and contains the number of lines in the image that’s represented by the frame.
iv. A scan starts with optional tables, followed by the scan header, followed by several entropy-coded
segments (ECS), which are separated by (optional) restart markers (RST). Each ECS contains one
or more MCUs, where an MCU is, as explained earlier, either a single data unit or three such units.
Q. List various methods of lossless image compression. Explain any one of them._________________
1. GIF
Example
Consider the following 3-bit pointers: 3, 7, 4, 1, 6, 2, and 5. Their binary values are 011, 111, 100, 001, 110,
010, and 101, so they are packed in 3 bytes: |10101001|11000011|11110...|.
2. PNG
The portable network graphics (PNG) file format was developed in the mid-1990s by a group (the
PNG development group [PNG 03]) headed by Thomas Boutell. The project was started in response to the
legal issues surrounding the GIF file format.
Aim
The aim of this project was to develop a sophisticated graphics file format that would be flexible, would
support many different types of images, would be easy to transmit over the Internet, and would be
unencumbered by patents.
Advantage:
The PNG (Portable Network Graphics) format is more space-efficient for images with many pixels
of the same color, such as diagrams, and supports special compression features that JPEG 2000 does not.
______________________________________________________________________________________
Q. Short note on JPEG-LS____(10mks)_____________________________________________________
JPEG-LS is a new standard for the lossless (or near-lossless) compression of continuous tone images.
Principle
JPEG-LS examines several of the previously-seen neighbors of the current pixel, uses them as the context of
the pixel, uses the context to predict the pixel and to select a probability distribution out of several such
distributions, and uses that distribution to encode the prediction error with a special Golomb code.
The context used to predict the current pixel x is shown in Figure below.
Encoder
The encoder examines the context pixels and decides whether to encode the current pixel x in the run mode
or in the regular mode. If the context suggests that the pixels y, z,. . . following the current pixel are likely to
be identical, the encoder selects the run mode. Otherwise, it selects the regular mode. In the near-lossless
mode the decision is slightly different. If the context suggests that the pixels following the current pixel are
likely to be almost identical (within the tolerance parameter NEAR), the encoder selects the run mode.
Otherwise, it selects the regular mode. The rest of the encoding process depends on the mode selected.
1.In the regular mode, the encoder uses the values of context pixels a, b, and c to predict pixel x, and
subtracts the prediction from x to obtain the prediction error, denoted by Errval. This error is then corrected
by a term that depends on the context (this correction is done to compensate for systematic biases in the
prediction), and encoded with a Golomb code. The Golomb coding depends on all four pixels of the context
and also on prediction errors that were previously encoded for the same context (this information is stored in
arrays A and N). If near-lossless compression is used, the error is quantized before it is encoded.
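The text does not spell out the prediction rule itself; JPEG-LS (LOCO-I) uses the median edge detector (MED) over the context pixels a (left), b (above), and c (upper-left). A minimal Python sketch of that predictor:

```python
def med_predict(a, b, c):
    """JPEG-LS (LOCO-I) median edge detector.
    a = left, b = above, c = upper-left neighbor of the current pixel x."""
    if c >= max(a, b):
        return min(a, b)   # c suggests an edge: predict the smaller neighbor
    if c <= min(a, b):
        return max(a, b)   # edge in the other direction
    return a + b - c       # smooth region: planar (gradient) prediction
```

The prediction error Errval = x - med_predict(a, b, c) is what gets bias-corrected and Golomb coded in the regular mode.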
2.In the run mode, the encoder starts at the current pixel x and finds the longest run of pixels that are
identical to context pixel a. The encoder does not extend this run beyond the end of the current image row.
Since all the pixels in the run are identical to a (and a is already known to the decoder), only the length of the
run needs to be encoded, and this is done with a 32-entry array denoted by J (Section 4.9.1). If near-lossless
compression is used, the encoder selects a run of pixels that are close to a within the tolerance parameter
NEAR.
Decoder
The decoder is not substantially different from the encoder, so JPEG-LS is a nearly symmetric compression
method. The compressed stream contains data segments (with the Golomb codes and the encoded run
lengths), marker segments (with information needed by the decoder), and markers (some of the reserved
markers of JPEG are used). A marker is a byte of all ones followed by a special code, signaling the start of a
new segment. If a marker is followed by a byte whose most significant bit is 0, that byte is the start of a
marker segment. Otherwise, that byte starts a data segment.
Advantages:
i. JPEG-LS is capable of lossless compression.
ii. JPEG-LS has very low computational complexity.
____________________________________________________________________________________
Q. Short note on JPEG 2000____(read in detail from txt 622 onwards)_____________________________
Need:
The current JPEG standard provides excellent performance at rates above 0.25 bits per pixel. However, at
lower rates there is a sharp degradation in the quality of the reconstructed image. To correct this and other
shortcomings, the JPEG committee initiated work on another standard, commonly known as JPEG 2000. It
introduced the “compress once, decompress many ways” paradigm.
Principle:
JPEG 2000 is a standard based on wavelet decomposition using the Discrete Wavelet Transform (DWT).
This transform decomposes the image using functions called wavelets.
Following is a list of areas where this new standard is expected to improve on existing methods:
i. High compression efficiency. Bitrates of less than 0.25 bpp are expected for highly detailed grayscale
images.
ii. The ability to handle large images, up to 2^32 × 2^32 pixels (original JPEG can handle up to 2^16 × 2^16 pixels).
iii. Progressive image transmission.
iv. Easy, fast access to various points in the compressed stream.
v. The decoder can pan/zoom the image while decompressing only parts of it.
vi. The decoder can rotate and crop the image while decompressing it.
vii. Error resilience. Error-correcting codes can be included in the compressed stream, to improve
transmission reliability in noisy environments.
Block diagram
At the encoder, the discrete transform is first applied on the source image data. The transform coefficients are then
quantized and entropy coded before forming the output code stream .
The decoder is the reverse of the encoder. The code stream is first entropy decoded, dequantized and inverse
discrete transformed, thus resulting in the reconstructed image data.
Steps
i. The source image is decomposed into components.
ii. The image components are (optionally) decomposed into rectangular tiles. The tile component is the
basic unit of the original or reconstructed image.
iii. A wavelet transform is applied on each tile. The tile is decomposed into different resolution levels.
iv. Each resolution level is made up of subbands of coefficients that describe the frequency
characteristics of local areas of the tile components, rather than of the entire image component.
v. The subbands of coefficients are quantized and collected into rectangular arrays of “code blocks”.
vi. The bit planes of the coefficients in each code block are entropy coded.
vii. The encoding can be done in such a way that Regions of Interest (ROI) can be encoded at a higher
quality than the background.
viii. Markers are added to the bit stream to allow for error resilience.
ix. The code stream has a main header at the beginning that describes the original image and the
various decomposition and coding styles used to locate, extract, decode and reconstruct the
image with the desired resolution, fidelity, region of interest (ROI) or other characteristics.
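JPEG 2000 itself uses the 5/3 (lossless) and 9/7 (lossy) wavelet filters; as a simplified stand-in for the wavelet step in the list above, here is a one-level 2D Haar decomposition in Python. It is an illustration of subband splitting only, not the actual JPEG 2000 filter:

```python
def haar_1d(row):
    """One Haar level: pairwise averages (low-pass half) followed by
    pairwise half-differences (high-pass half)."""
    avg = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    dif = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
    return avg + dif

def haar_2d(img):
    """Apply the 1D transform to every row, then to every column,
    producing the LL, HL, LH, and HH subbands of one resolution level."""
    rows = [haar_1d(r) for r in img]
    cols = [haar_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

For a 2x2 tile, the top-left (LL) output is the average of the four pixels, i.e., the coarse approximation, while the other three entries hold the detail (high-frequency) coefficients that are quantized and coded per code block.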
Advantages
i. Better image quality than JPEG at the same file size; or alternatively, 25-35% smaller file size at
the same quality.
ii. Good image quality at low bit rates.
iii. Scalable image files i.e. no decompression needed for reformatting.
iv. JPEG 2000 is more suitable for web graphics than baseline JPEG because it supports an alpha
channel (transparency component).
v. ROI: one can define some more interesting parts of the image, which are coded with more bits than
surrounding areas.
______________________________________________________________________________________
Q. Discuss the application of JPEG 2000 (5mks)_____________________________________________
Some markets and applications intended to be served by this standard are listed below:
i. Consumer applications such as multimedia devices (e.g., digital cameras, personal digital assistants,
3G mobile phones, color facsimile, printers, scanners, etc.)
ii. Client/server communication (e.g., the Internet, Image database, Video streaming, video server, etc.)
iii. Military/surveillance (e.g., HD satellite images, Motion detection, network distribution and storage, etc.)
iv. Medical imagery, esp. the DICOM specifications for medical data interchange.
v. Remote sensing
vi. High-quality frame-based video recording, editing and storage.
vii. Live HDTV feed contribution (I-frame only video compression with low transmission latency), such as
live HDTV feed of a sport event linked to the TV station studio
viii. Digital cinema
ix. JPEG 2000 has many design commonalities with the ICER image compression format that is used to
send images back from the Mars rovers.
x. Digitized Audio-visual contents and Images for Long term digital preservation
xi. The World Meteorological Organization has built JPEG 2000 compression into the new GRIB2 file
format. The GRIB file structure is designed for global distribution of meteorological data. The
implementation of JPEG 2000 compression in GRIB2 has reduced file sizes by up to 80%.
Comparison: JPEG vs. JPEG-LS
i. JPEG is a very well-known ISO/ITU-T standard created in the late 1980s, while JPEG-LS is a recent
ISO/ITU-T standard for lossless coding of still images.
ii. JPEG compression is usually lossy; JPEG-LS provides support for “near-lossless compression”.
iii. Due to lossy compression, JPEG image quality is degraded, whereas JPEG-LS can preserve the
image exactly.
iv. JPEG compression is generally slower than JPEG-LS, which is much faster and compresses much
better than the original JPEG standard.
v. In JPEG the image is divided into 8x8 blocks of pixels, and the Discrete Cosine Transform and
Huffman entropy coding are employed; JPEG-LS is based on the LOCO-I (Low Complexity Lossless
Compression for Images) algorithm using adaptive prediction, context modeling and Golomb coding.
vi. JPEG offers progressive bit stream functionality; JPEG-LS does not.
vii. In JPEG the choice of quantization coefficients specifies the error introduced during compression; in
JPEG-LS the near-lossless mode allows the user to specify a bound on the error introduced by the
compression algorithm.
____________________________________________________________________________________
Q. Difference between image and video compression?____________(5mks)______________________
With the exception of Motion-JPEG, all video compression standards mix still images with partially complete
images. By storing only the changes from one full image to another, these ‘partially complete’ images reduce
the file size of the compressed video sequence. Scenes containing little or no variation can be compressed
quite dramatically.
1. Motion-JPEG
With Motion-JPEG, each frame within the video is stored as a complete image in the JPEG format, and these
still images are displayed at a high frame rate to produce very high-quality video.
2. H.261, H.263, H.321, H.324, etc.
Designed for video conferencing, but sometimes used for network cameras, these standards offer a high
frame rate. However, the quality of the images is low.
3. MPEG-1
Stands for the Moving Picture Experts Group international standard ISO/IEC 11172. This gives a performance
of up to 352x288 pixels, 30 frames per second (fps), at a maximum of 1.86 Mbit/s. Audio according to MPEG-1
Layers 1, 2 and 3 is also included in the standard.
4. MPEG-2
It is a popular standard that offers high-quality video suitable for installations where TV quality is needed.
All MPEG formats use a “differential method”: the first frame is a complete picture similar to a JPEG frame,
and the following frames are only an update of the difference between the previous frame and the current
one. Depending on how the GOP (Group of Pictures) is set, you can have different numbers of updated
frames (P- and B-frames) between the frames containing a complete picture (I-frames).
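The differential method can be sketched in Python as storing only the pixels that changed since the previous frame; a toy illustration of the idea behind updated frames (function names are mine, and real codecs work on motion-compensated blocks, not individual pixels):

```python
def frame_delta(prev, curr):
    """Store only the pixels that changed since the previous frame,
    as (index, new_value) pairs -- a toy model of an update frame."""
    return [(i, c) for i, (p, c) in enumerate(zip(prev, curr)) if p != c]

def apply_delta(prev, delta):
    """Reconstruct the current frame from the previous frame plus
    the recorded changes -- what the decoder does."""
    frame = list(prev)
    for i, v in delta:
        frame[i] = v
    return frame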
5. MPEG-3
A cancelled standard aimed at HDTV (High Definition TV).
6. MPEG-4
The Moving Picture Experts Group international standard ISO/IEC 14496. This standard covers a wide variety
of applications ranging from the video displayed in cellular phones, to full feature-length movies shown in a
cinema.
Questions in Terna notes (Xerox attached): (Q1) Explain motion compensation w.r.t. video compression.
(Q2) Basic structure of the MPEG-1 video standard. (Q3) Explain the algorithm for video conferencing.
(Q4) Loop filter. (Q5) Explain the concept of packet video. What are the compression issues in an ATM
network? (Q6) Explain with example progressive image compression. (Q7) Facsimile encoding. (Q8) Run-length
encoding. (Q9) Short note on JPEG. (Q10) MPEG video standards. (Q11) Compression algorithm for packet
video. (Q12) What are the improvements in JBIG-2 as compared to JBIG, and how is it used for encoding and
decoding? (Q13) Describe the various models used for lossy compression algorithms.
______________________________________________________________________________________
Q.Short note on Facsimile Encoding___(5mks)_________________________________________________
i. It is one of the earliest applications of lossless compression in the modern era. It is also known
simply as fax.
ii. In facsimile transmission, a page is scanned and converted into a sequence of black or white pixels.
The requirements of how fast the facsimile of an A4 document (210×297 mm) must be transmitted
have changed over the last two decades.
iii. The CCITT (now ITU-T) has issued a number of recommendations based on the speed requirements
at a given time.
iv. The CCITT classifies the apparatus for facsimile transmission into four groups. (This classification is
done for A4 size document)
_ Group 1: This apparatus is capable of transmitting an A4-size document in about six minutes over phone
lines using an analog scheme. The apparatus is standardized in recommendation T.2.
_ Group 2: This apparatus is capable of transmitting an A4-size document over phone lines in about three
minutes. A Group 2 apparatus also uses an analog scheme and, therefore, does not use data compression.
The apparatus is standardized in recommendation T.3.
_ Group 3: This apparatus uses a digitized binary representation of the facsimile. Because it is a digital
scheme, it can and does use data compression and is capable of transmitting an A4-size document in about
a minute. The apparatus is standardized in recommendation T.4.
_ Group 4: This apparatus has the same speed requirement as Group 3. The apparatus is standardized in
recommendations T.6, T.503, T.521, and T.563.
v. With the arrival of the Internet, facsimile transmission has changed as well. Given the wide range of
rates and “apparatus” used for digital communication, it makes sense to focus more on protocols than
on apparatus. The newer recommendations from the ITU provide standards for compression that are
more or less independent of apparatus.
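The core of Group 3 compression is run-length coding of each scanned line: the alternating white and black runs are extracted and each run length is then replaced by a modified Huffman codeword. A minimal Python sketch of the run-extraction step (the codeword tables themselves are omitted):

```python
def run_lengths(scan_line):
    """Convert one scan line of 0/1 pixels (0 = white, 1 = black) into
    (color, run_length) pairs; Group 3 fax then replaces each run
    length with a modified Huffman codeword."""
    runs = []
    for pixel in scan_line:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1
        else:
            runs.append([pixel, 1])
    return [tuple(r) for r in runs]
```

Since typical document lines are long runs of white broken by short runs of black, the run-length representation is far shorter than the raw pixel sequence.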
______________________________________________________________________________________
Q. Short note on fractal image compression_____(10mks)_____________________________________
Need for fractal image compression OR
Fractal image compression: How fractal compression differs from other techniques :
i. The underlying compression technique in the JPEG standard is the DCT (Discrete Cosine Transform). The
DCT breaks an image into small blocks, usually 8 pixels by 8 pixels. JPEG achieves compression by
discarding image data (high-frequency cosine terms) that is not required for the human eye to perceive the
image, resulting in a poorer-quality image with a pixelized (blocky) appearance.
Fractal images are not based on a map of pixels, nor is the encoding weighted to the visual characteristics
of the human eye. Instead, bitmap data is discarded when it is required to create a best-fit fractal pattern.
Greater compression ratios are achieved using more computationally intensive transforms that may
degrade the image, but the distortion appears much more natural due to the fractal components.
ii. DCT-based JPEG compression is quite effective at low or moderate compression ratios, up to ratios of 20
or 25 to 1. Beyond this point, the image becomes very “blocky” as the compression increases and the
image quality becomes too poor for practical use. JPEG obtains high compression ratios by cutting off the
high-frequency components of the image. This can also introduce very visible artifacts, in particular near
sharp edges in the image. This is known as the Gibbs phenomenon.
iii. Furthermore, the JPEG method is resolution dependent. Higher-resolution graphics have more pixels than
the same images at lower resolution. In order to “zoom in” on a portion of an image and enlarge it, it is
necessary to replicate pixels. The enlarged image will exhibit a certain level of “blockiness” which soon
becomes unacceptable as the expansion factor increases.
iv. Most other lossy methods are also symmetrical in nature. That is, a particular sequence of steps is used to
compress an image, and the reverse of those steps is used to decompress it. Compression and
decompression will take about the same amount of time as well. Fractal compression is an asymmetrical
process, taking much longer to compress an image than to decompress it.
Explanation
Fractal compression is a lossy compression method for digital images, based on fractals. The method is best
suited for textures and natural images.
A fractal is a structure that is made up of similar forms and patterns that occur in many different sizes.
For a given initial image each image is formed from a transformed (and reduced) copy of itself, and hence it
must have detail at every scale. That is, the images are fractals.
The term fractal was first used by Benoit Mandelbrot to describe repeating patterns that he observed
occurring in many different structures. These patterns appeared nearly identical in form at any size and
occurred naturally in all things.
Principle:
It is based on the fact that parts of an image often resemble other parts of the same image (the self-similarity
property). Fractal algorithms convert these parts into mathematical data called "fractal codes", which are
used to recreate the encoded image. Fractal encoding is largely used to convert bitmap images to fractal
codes. Fractal decoding is just the reverse, in which a set of fractal codes is converted to a bitmap.
Currently, the most popular method of fractal encoding is a process called the Fractal Transform.
Compression
Real-world images are often rich in affine redundancy. This observation and the Collage Theorem were
the motivations for the fractal image compression algorithm.
i. The fractal image compression first partitions the original image into non-overlapping domain
blocks/regions (they can be any size or shape).
ii. Then a collection of possible range blocks/ regions is defined. The range regions can overlap and
need not cover the entire image, but must be larger than the domain regions.
iii. For each domain region the algorithm then searches for a suitable range region that, when applied
with an appropriate affine transformation, very closely resembles the domain region.
iv. Afterward, a FIF (Fractal Image Format) file is generated for the image. This file contains information
on the choice of domain regions, and the list of affine coefficients (i.e. the entries of the
transformation matrix) of all associated affine transformations.
v. So all the pixels' data in a given region are compressed into a small set of entries of the transform
matrix, with each entry, corresponding to an integer between 0 and 255, taking up one byte.
This process is independent of the resolution of the original image. The output graphic will look like the
original at any resolution, since the compressor has found an IFS whose attractor replicates the original one
(i.e. a set of equations describing the original image).
Decompression
i. To decompress an image, the compressor first allocates two memory buffers of equal size, with
arbitrary initial content.
ii. The iterations then begin, with buffer 1 the range image and buffer 2 the domain image.
iii. The domain image is partitioned into domain regions as specified in the FIF file.
iv. For each domain region, its associated range region is located in the range image.
v. Then the corresponding affine map is applied to the content of the range region, pulling the content
toward the map's attractor. Since each of the affine maps is contractive, the range region is
contracted by the transformation. This is the reason that the range regions are required to be larger
than the domain regions during compression.
vi. For the next iteration, the roles of the domain image and range image are switched. The process of
mapping the range regions (now in buffer 2) to their respective domain regions (in buffer 1) is
repeated, using the prescribed affine transformations.
vii. Then the entire step is repeated again and again, with the content of buffer 1 mapped to buffer 2,
then vice versa.
viii. At every step, the content is pulled ever closer to the attractor of the IFS which forms a collage of the
original image. Eventually the differences between the two images become very small, and the
content of the first buffer is the output decompressed image.
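The reason the iteration above converges from an arbitrary initial buffer is that every map is contractive, so repeated application pulls any starting content toward the map's fixed point (the attractor). A toy one-dimensional Python illustration with assumed coefficients (a real decoder applies many 2D affine maps per iteration):

```python
def iterate_contractive(x0, a=0.5, b=30.0, steps=40):
    """Iterate the contractive affine map x -> a*x + b (with |a| < 1).
    Whatever the starting value x0, the sequence converges to the
    map's fixed point b / (1 - a), i.e., the attractor."""
    x = x0
    for _ in range(steps):
        x = a * x + b
    return x
```

With a = 0.5 and b = 30, both a buffer initialized to 0 and one initialized to 255 converge to the same fixed point 60, mirroring how the two decompression buffers end up holding the same decoded image regardless of their arbitrary initial content.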
Advantages:
i. Two tremendous benefits are immediately realized by converting conventional bitmap images to
fractal data.
a. The first is the ability to scale any fractal image up or down in size without introducing the
image artifacts or loss of detail that occur in bitmap images. This process of "fractal
zooming" is independent of the resolution of the original bitmap image, and the zooming is
limited only by the amount of available memory in the computer.
b. The second benefit is the fact that the size of the physical data used to store fractal codes is
much smaller than the size of the original bitmap data. It is this aspect of fractal technology
that is called fractal compression.
ii. The fractal method has the benefit of faster decompression speed, having done most of the
computation during the compression step, while giving an equal or better compression ratio.
Disadvantages:
i. The asymmetric characteristic limits the usefulness of fractally compressed data to applications
where image data is constantly decompressed but never recompressed. Fractal compression is
therefore highly suited for use in image databases and CD-ROM applications.
ii. Fractal compression is lossy. The process of matching fractals does not involve looking for exact
matches, but instead looking for "best fit" matches based on the compression parameters (encoding
time, image quality, and size of output).
iii. The process of fractal compression is by no means in the public domain. There are many patents
claiming a method of image data compression based on fractal transforms. Also, the exact process
used by some fractal packages--including Barnsley's Fractal Transform--is considered proprietary.
The DCT is closely related to the DFT. It can be obtained from the DFT by mirroring the original N-point
sequence to obtain a 2N-point sequence; the DCT is the first N points of the resulting 2N-point DFT. The
DCT is substantially better at energy compaction for most correlated sources when compared to the DFT.
______________________________________________________________________________________
Q. MPEG video compression______(10mks)_________________________________________________
The Moving Picture Experts Group method is used to compress video. In principle, a motion picture is a rapid
flow of a set of frames, where each frame is an image. In other words, a frame is a spatial combination of
pixels, and a video is a temporal combination of frames that are sent one after another. Compressing video,
then, means spatially compressing each frame and temporally compressing a set of frames.
Video compression is based on two principles.
i. The first is the spatial redundancy that exists in each frame.
ii. The second is the fact that most of the time, a video frame is very similar to its immediate neighbors. This
is called temporal redundancy
Spatial Compression: The spatial compression of each frame is done with JPEG (or a modification of it).
Each frame is a picture that can be independently compressed.
Temporal Compression: In temporal compression, redundant frames are removed. When we watch
television, we receive 50 frames per second. However, most of the consecutive frames are almost the same.
For example, when someone is talking, most of the frame is the same as the previous one except for the
segment of the frame around the lips, which changes from one frame to another.
To temporally compress data, the MPEG method first divides frames into three categories: I-frames, P-
frames, and B-frames.
A frame that is coded using its predecessor is called an inter frame (or just inter), while a frame that is coded
independently is called an intra frame (or just intra). An intra frame is labeled I, and an inter frame is labeled P
(for predictive). A frame that is encoded based on both past and future frames is labeled B (for bidirectional).
i. An I frame is decoded independently of any other frame.
ii. A P frame is decoded using the preceding I or P frame.
iii. A B frame is decoded using the preceding and following I or P frames
1. I-frames. An intracoded frame (I-frame) is an independent frame that is not related to any other frame (not
to the frame sent before or to the frame sent after). They are present at regular intervals (e.g., every ninth
frame is an I-frame). An I-frame must appear periodically to handle some sudden change in the frame that
the previous and following frames cannot show. Also, when a video is broadcast, a viewer may tune in at any
time. If there is only one I-frame at the beginning of the broadcast, the viewer who tunes in late will not
receive a complete picture. I-frames are independent of other frames and cannot be constructed from other
frames.
2. P-frames. A predicted frame (P-frame) is related to the preceding I-frame or P-frame. In other words,
each P-frame contains only the changes from the preceding frame. The changes, however, cannot cover a
big segment. For example, for a fast-moving object, the new changes may not be recorded in a P-frame.
P-frames can be constructed only from previous I- or P-frames. P-frames carry much less information than
other frame types and carry even fewer bits after compression.
3. B-frames. A bidirectional frame (B-frame) is related to the preceding and following I-frame or P-frame. In
other words, each B-frame is relative to the past and the future. Note that a B-frame is never related to
another B-frame. Fig. 4.8 shows a sample sequence of frames.
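The decode dependencies of the three frame types can be sketched in Python. This is a simplified model (names are mine): a P-frame references the nearest previous I- or P-frame, and a B-frame references the surrounding pair of I/P anchors; it assumes every B-frame has an anchor on each side:

```python
def reference_frames(seq):
    """For each frame in a GOP pattern string (e.g., "IBBP"), list the
    indices of the frames it is decoded from: I needs none, P needs the
    previous I/P anchor, B needs the anchors on both sides."""
    anchors = [i for i, t in enumerate(seq) if t in "IP"]
    refs = []
    for i, t in enumerate(seq):
        if t == "I":
            refs.append([])
        elif t == "P":
            refs.append([max(a for a in anchors if a < i)])
        else:  # B-frame: preceding and following I/P frame
            prev = max(a for a in anchors if a < i)
            nxt = min(a for a in anchors if a > i)
            refs.append([prev, nxt])
    return refs
```

For the pattern "IBBP", the two B-frames each depend on frames 0 (I) and 3 (P), which is why real encoders transmit the P-frame before the B-frames that reference it.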