Unit 4 Compression 11 Oct 22
Image compression addresses the problem of reducing the amount of data required to represent a
digital image. It is a process intended to yield a compact representation of an image, thereby
reducing the image storage/transmission requirements.
Need for Compression
Storage: The storage requirements of imaging applications are very high. The goal of data
compression is to reduce the amount of memory needed by reducing the number of bits, while at
the same time retaining enough data to reconstruct the image. Reducing the data lowers the
memory requirement and hence the money spent on storage.
Transmission: The transmission time of an image is directly proportional to its size. Image
compression aims to reduce the transmission time by reducing the size of the image. The
reduction of data leads to easier and faster transportation of data.
Faster computation: Reduced data often simplifies algorithm design and facilitates faster
execution of the algorithms.
Image Compression Model
The encoder consists of three stages. In the first stage, the mapper transforms the input image
into a format designed to reduce interpixel redundancies. In the second stage, the quantizer
reduces the accuracy of the mapper's output in accordance with a predefined criterion. In the
third and final stage, a symbol encoder creates a code for the quantizer output and maps the
output in accordance with that code. The decoder contains a symbol decoder and an inverse
mapper; these blocks perform, in reverse order, the inverse operations of the encoder's symbol
encoder and mapper blocks. As quantization is irreversible, an inverse quantizer is not included.
Compression Measures
Compression ratio = N1 / N2
where N1 is the amount of data (number of bits) in the original image and N2 is the amount of
data in the compressed image. This is expressed explicitly as N1:N2. A compression ratio of 4:1,
for example, means that four units of data in the original image are represented, on average, by
one unit of data in the compressed image.
Saving percentage = 1 − (N2 / N1)
In transmission, bit rate describes the rate at which bits are transferred from the sender to the
receiver and indicates the efficiency of the compression algorithm by specifying how much data
is transmitted in a given amount of time. It is often given as bits per second (bps), kilobits per
second (Kbps), or megabits per second (Mbps). For a stored image, the bit rate specifies the
average number of bits per pixel and is given as
Bit rate = N2 / N (bits per pixel)
where N is the total number of pixels in the image.
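These measures can be computed directly from the data sizes. The short Python sketch below is illustrative (the function name and example sizes are assumptions, not from the notes): it takes the original size N1 in bits, the compressed size N2 in bits, and the pixel count, and returns the three measures defined above.

```python
def compression_measures(n1_bits, n2_bits, num_pixels):
    """Basic compression measures for an image.

    n1_bits    -- number of bits in the original image (N1)
    n2_bits    -- number of bits in the compressed representation (N2)
    num_pixels -- total number of pixels in the image
    """
    compression_ratio = n1_bits / n2_bits        # reported as N1:N2
    saving_percentage = 1 - n2_bits / n1_bits    # fraction of data saved
    bit_rate = n2_bits / num_pixels              # bits per pixel
    return compression_ratio, saving_percentage, bit_rate

# Example: a 256 x 256, 8-bit image (524288 bits) compressed to 131072 bits
cr, sp, br = compression_measures(524288, 131072, 256 * 256)
print(cr, sp, br)   # 4.0 (i.e. 4:1), 0.75 (75% saving), 2.0 bits per pixel
```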
1.1 Redundancy
Compression is achieved by the removal
of one or more of the following basic data redundancies:
1. Coding Redundancy
2. Interpixel Redundancy
3. Psychovisual Redundancy
4. Chromatic Redundancy
Coding redundancy is present when less than optimal code words are used.
Interpixel redundancy results from correlations between the pixels of an image.
Psychovisual redundancy is due to data that is ignored by the human visual system (i.e. visually
non-essential information).
Chromatic redundancy refers to the presence of unnecessary colours in an image.
Image compression techniques reduce the number of bits required to represent an image by
taking advantage of these redundancies. An inverse process called decompression (decoding) is
applied to the compressed data to get the reconstructed image. The objective of compression is to
reduce the number of bits as much as possible, while keeping the resolution and the visual
quality of the reconstructed image as close to the original image as possible.
Coding Redundancy:
• Coding redundancy is associated with the representation of information.
• The information is represented in the form of codes.
• If the gray levels of an image are coded in a way that uses more code symbols than
absolutely necessary to represent each gray level then the resulting image is said to
contain coding redundancy.
Psychovisual Redundancy:
• The Psychovisual redundancies exist because human perception does not involve
quantitative analysis of every pixel or luminance value in the image.
• Its elimination of certain visual information is possible only because the information itself
is not essential for normal visual processing.
• Psychovisual redundancy is related to visual information. Its elimination results in a loss of
quantitative information, but the psychovisual loss is negligible. Removing this type of
redundancy is a lossy process and the lost information cannot be recovered.
Chromatic redundancy
• Chromatic redundancy refers to the presence of unnecessary colours in an image. The
colour channels of a colour image are highly correlated, and the human visual system
cannot perceive millions of colours. Hence the colours that are not perceived by the
human visual system can be removed without affecting the quality of the image.
BENEFITS OF COMPRESSION
• It provides potential cost savings when sending data over a switched telephone network,
where the cost of a call is usually based on its duration.
• It not only reduces storage requirements but also overall execution time.
• It also reduces the probability of transmission errors since fewer bits are transferred.
• It also provides a level of security against illicit monitoring.
Compression techniques are broadly classified as:
1. Lossless techniques
2. Lossy techniques
Measuring Information: The entropy of an 8-bit image segment whose gray levels rk occur with
probabilities p(rk) can be calculated as H = −Σ p(rk) log2 p(rk) bits per pixel.
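As a minimal sketch, the entropy of a gray-scale segment can be estimated from its normalized histogram with NumPy; the 8-bit segment below is a hypothetical stand-in, not the segment used in the original notes.

```python
import numpy as np

def entropy_bits_per_pixel(image):
    """Estimate H = -sum p(rk) * log2 p(rk) from the image histogram."""
    _, counts = np.unique(image, return_counts=True)
    p = counts / counts.sum()                 # probabilities p(rk)
    return float(-np.sum(p * np.log2(p)))     # bits per pixel

segment = np.array([21, 21, 21, 95, 169, 243, 243, 243], dtype=np.uint8)
print(entropy_bits_per_pixel(segment))        # approximately 1.81 bits per pixel
```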
Run-Length Encoding (RLE)
Consider the following hypothetical scan line:
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
If we apply the run-length encoding (RLE) data compression algorithm to this scan line, we get
the following: 12W1B12W3B24W1B14W
This is a very simple compression method used for sequential data and is very useful in the case
of repetitive data. The technique replaces sequences of identical symbols (pixels), called runs, by
shorter symbols. The run-length code for a gray-scale image is represented by a sequence {Vi,
Ri}, where Vi is the intensity of a pixel and Ri is the number of consecutive pixels with intensity
Vi, as shown in the figure. If both Vi and Ri are represented by one byte, a span of 12 pixels is
coded using eight bytes, yielding a compression ratio of 12:8 = 1.5:1.
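A minimal run-length encoder for the {Vi, Ri} representation described above; this sketch works on a 1-D sequence of symbols and does not pack the pairs into bytes.

```python
def run_length_encode(sequence):
    """Encode a sequence of symbols as (value Vi, run length Ri) pairs."""
    runs = []
    for symbol in sequence:
        if runs and runs[-1][0] == symbol:
            runs[-1][1] += 1            # extend the current run
        else:
            runs.append([symbol, 1])    # start a new run
    return [(v, r) for v, r in runs]

# The hypothetical scan line from the text
line = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14
print(run_length_encode(line))
# [('W', 12), ('B', 1), ('W', 12), ('B', 3), ('W', 24), ('B', 1), ('W', 14)]
```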
Huffman Coding
The simplest construction algorithm uses a priority queue in which the node with the lowest
probability is given the highest priority:
1. Create a leaf node for each symbol and add it to the priority queue.
2. While there is more than one node in the queue, remove the two nodes of highest priority
(lowest probability).
3. Create a new internal node with these two nodes as children and with probability equal to
the sum of the two nodes' probabilities. Add the new node to the queue.
4. Repeat steps 2 and 3 until only one node remains; this node is the root and the tree is
complete.
For example, a source generates six different symbols {a1, a2, a3, a4, a5, a6} with
probabilities {0.1, 0.4, 0.06, 0.1, 0.04, 0.3}.
The first code assignment is made for a2, which has the highest probability, and the last
assignments are made for a3 and a5, which have the lowest probabilities.
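The construction can be sketched with Python's heapq module acting as the priority queue; the example uses the six symbols above. Tie-breaking between equal probabilities may produce a different but equally optimal set of code lengths.

```python
import heapq, itertools

def huffman_codes(probabilities):
    """Build Huffman codes from a {symbol: probability} mapping."""
    tie = itertools.count()   # tie-breaker so the heap never compares dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)     # two lowest-probability nodes
        p2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))  # internal node
    return heap[0][2]

probs = {"a1": 0.1, "a2": 0.4, "a3": 0.06, "a4": 0.1, "a5": 0.04, "a6": 0.3}
print(huffman_codes(probs))   # a2 gets the shortest code; a3 and a5 the longest
```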
Shannon–Fano Coding
Bit plane coding (class notes)
Arithmetic coding
Arithmetic coding is a form of entropy encoding used in lossless data compression.
Normally, a string of characters such as the words "hello there" is represented using a fixed
number of bits per character, as in the ASCII code. When a string is converted to arithmetic
encoding, frequently used characters will be stored with fewer bits and not-so-frequently
occurring characters will be stored with more bits, resulting in fewer bits used in total.
Arithmetic coding differs from other forms of entropy encoding, such as Huffman coding, in
that rather than separating the input into component symbols and replacing each with a code,
arithmetic coding encodes the entire message into a single number: an arbitrary-precision
fraction n with 0.0 ≤ n < 1.0.
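The interval-narrowing idea can be sketched as follows with ordinary floating point (so it only works for short messages; practical coders use renormalized integer arithmetic). The symbol probabilities are a made-up model for illustration.

```python
def arithmetic_encode(message, probabilities):
    """Return one fraction in [0, 1) that identifies the whole message."""
    # Cumulative ranges per symbol, e.g. {'l': (0.4, 0.8), ...}
    ranges, cumulative = {}, 0.0
    for symbol, p in probabilities.items():
        ranges[symbol] = (cumulative, cumulative + p)
        cumulative += p

    low, high = 0.0, 1.0
    for symbol in message:
        span = high - low
        sym_low, sym_high = ranges[symbol]
        high = low + span * sym_high   # shrink the interval to this
        low = low + span * sym_low     # symbol's sub-range
    return (low + high) / 2            # any number inside the final interval works

model = {"h": 0.2, "e": 0.2, "l": 0.4, "o": 0.2}   # hypothetical probabilities
print(arithmetic_encode("hello", model))
```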
LZW (Lempel–Ziv–Welch) is a dictionary-based coding method. Dictionary-based coding can be
static or dynamic. In static dictionary coding, the dictionary is fixed during the encoding and
decoding processes. In dynamic dictionary coding, the dictionary is updated on the fly. LZW is
widely used in the computer industry and is implemented as the compress command on UNIX.
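A minimal LZW encoder sketch over strings, with the dictionary updated on the fly as described above; real implementations also manage code widths and dictionary resets, which are omitted here.

```python
def lzw_encode(data):
    """Encode a string as a list of dictionary indices (LZW)."""
    dictionary = {chr(i): i for i in range(256)}   # initial single-character entries
    next_code = 256
    current, output = "", []
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                    # keep extending the match
        else:
            output.append(dictionary[current])     # emit code of longest match
            dictionary[candidate] = next_code      # new phrase added on the fly
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

print(lzw_encode("ABABABA"))   # [65, 66, 256, 258]
```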
Lossless predictive coding
The system consists of an encoder and a decoder, each containing an identical predictor. As
each successive pixel of the input image is introduced to the encoder, the predictor generates
the anticipated value of that pixel based on some number of past inputs. The output of the
predictor is then rounded to the nearest integer.
f̂(n) = round[ Σ_{i=1}^{m} αi f(n − i) ]
where m is the order of the linear predictor, round is a function used to denote the rounding or
nearest-integer operation, and the αi for i = 1, 2, …, m are the prediction coefficients. The
prediction error
e(n) = f(n) − f̂(n)
is sent across the channel. The same predictor is used on the decoder side to predict the value.
The reconstructed image is
f(n) = e(n) + f̂(n)
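A sketch of the scheme for a first-order predictor (m = 1, α1 = 1, i.e. previous-pixel prediction) applied along one image row; the function names are illustrative.

```python
import numpy as np

def predictive_encode(row):
    """e(n) = f(n) - f^(n), with f^(n) = previous pixel (first pixel sent as-is)."""
    row = row.astype(np.int32)
    errors = np.empty_like(row)
    errors[0] = row[0]
    errors[1:] = row[1:] - row[:-1]     # prediction errors
    return errors

def predictive_decode(errors):
    """f(n) = e(n) + f^(n); cumulative sum undoes previous-pixel prediction."""
    return np.cumsum(errors)

row = np.array([100, 102, 103, 103, 101, 99], dtype=np.uint8)
e = predictive_encode(row)
print(e)                        # [100   2   1   0  -2  -2] -- small, low-entropy values
print(predictive_decode(e))     # recovers the original row exactly (lossless)
```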
LOSSY COMPRESSION TECHNIQUES
Unlike the error-free compression, lossy encoding is based on the concept of compromising the
accuracy of the reconstructed image in exchange for increased compression.
Lossy compression methods produce distortion that is irreversible. On the other hand, very
high compression ratios, ranging between 10:1 and 50:1, can be achieved with results that are
visually indistinguishable from the original. Error-free methods rarely give compression ratios
greater than 3:1.
Vector Quantization
The basic idea in this technique is to develop a dictionary of fixed-size vectors, called code
vectors. A vector is usually a block of pixel values. A given image is then partitioned into non-
overlapping blocks (vectors) called image vectors. For each image vector, the closest matching
code vector in the dictionary is determined, and its index in the dictionary is used as the
encoding of the original image vector. Thus, each image is represented by a sequence of indices
that can be further entropy coded.
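The encoding step can be sketched as a nearest-neighbour search against the codebook; the codebook and image vectors below are hypothetical (in practice the codebook is trained, for example with the LBG/k-means algorithm).

```python
import numpy as np

def vq_encode(image_vectors, codebook):
    """Index of the nearest code vector for every image vector."""
    # Squared Euclidean distance from each image vector to each code vector
    d = ((image_vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Hypothetical 2x2 blocks flattened to length-4 vectors, and a 4-entry codebook
codebook = np.array([[0] * 4, [80] * 4, [160] * 4, [255] * 4], dtype=float)
blocks = np.array([[10, 12, 9, 11], [250, 248, 252, 251]], dtype=float)
indices = vq_encode(blocks, codebook)
print(indices)             # [0 3] -- only the indices are stored/transmitted
print(codebook[indices])   # decoder output: approximate reconstruction of the blocks
```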
Block Transformation Coding
In this coding scheme, transforms such as DFT (Discrete Fourier Transform) and DCT (Discrete
Cosine Transform) are used to change the pixels in the original image into frequency domain
coefficients (called transform coefficients). These coefficients have several desirable properties.
One is the energy compaction property that results in most of the energy of the original data
being concentrated in only a few of the significant transform coefficients. This is the basis of
achieving the compression. Only those few significant coefficients are selected and the
remaining are discarded. The selected coefficients are considered for further quantization and
entropy encoding. DCT coding has been the most common approach to transform coding. It is
also adopted in the JPEG image compression standard.
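The energy-compaction property can be seen on a single 8x8 block using an orthonormal DCT-II matrix built with NumPy; the smooth sample block below is hypothetical.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that coefficients = C @ block @ C.T."""
    k = np.arange(n).reshape(-1, 1)       # frequency index
    m = np.arange(n).reshape(1, -1)       # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)            # DC row
    return c

C = dct_matrix(8)
# A smooth (slowly varying) hypothetical block: a linear intensity ramp
block = 100.0 + 4.0 * np.add.outer(np.arange(8), np.arange(8))
coeffs = C @ block @ C.T
print(np.round(coeffs, 1))   # most of the energy sits in a few low-frequency coefficients
```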
DCT-based JPEG (Joint Photographic Expert Group) Standard
Step 1. Block preparation: The image is divided into smaller blocks. For example, if the RGB
image is 1024x1024, it can be divided into sub-blocks of 4x4, 8x8, or 16x16. If the chosen
sub-block size is 8x8, there will be 1024/8 x 1024/8 = 128x128 blocks in the horizontal and
vertical directions.
Step 2. Quantization:
After the frequency coefficients are obtained, not all of them are necessary, since the human eye
is more sensitive to the low-frequency components. For this purpose a threshold value is applied,
and the frequency components whose values are less than the threshold are discarded. The
quantization table is used for this purpose. If the value in the quantization table is 10 and the
frequency coefficient is 43, the quantized value is 43/10 = 4.3, which is approximated to the
nearest integer, i.e. 4. Depending on the threshold value, each coefficient is either retained or
discarded.
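The division-and-rounding step can be sketched as below. The quantization table here is purely illustrative (JPEG specifies standard luminance and chrominance tables); it shows how the example value 43 with a table entry of 10 becomes 4, and how small high-frequency coefficients become zero and are lost.

```python
import numpy as np

# Illustrative table: step size 10 at DC, growing for higher frequencies
Q = 10.0 + 4.0 * np.add.outer(np.arange(8), np.arange(8))

def quantize(dct_block, table):
    """q = round(coefficient / table entry)."""
    return np.round(dct_block / table).astype(int)

def dequantize(q_block, table):
    """Approximate coefficients recovered at the decoder: q * table entry."""
    return q_block * table

dct_block = np.zeros((8, 8))
dct_block[0, 0], dct_block[7, 7] = 43.0, 3.0
q = quantize(dct_block, Q)
print(q[0, 0], q[7, 7])          # 4 and 0: 43/10 = 4.3 -> 4, the small value is discarded
print(dequantize(q, Q)[0, 0])    # 40.0, not 43.0 -- the loss is irreversible
```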
Once the encoded file is received, decoding is the inverse process, as given below.
JPEG Decoder
The JPEG format offers the following four modes of operation:
1. Sequential DCT – based mode
2. Lossless mode
3. Progressive mode
4. Hierarchical mode
v. Frame building: The frame header is created with additional information such as start
bit, end bit, data type and nature of the image. The compressed data is packed and the
frames are sent across the channel.
The JPEG decoder mirrors the encoder, performing its operations in reverse. It consists of the following components:
a. Frame decoder
b. Entropy decoder
c. Dequantization
d. Inverse DCT transform
e. Image Builder
Due to quantization, information loss occurs and hence perfect reconstruction is not possible.
However, the quantization thresholds are chosen such that the loss is barely perceptible to the
human visual system.
4. Hierarchical mode
This scheme uses a pyramidal data structure that stores the image at several different
resolutions. Its advantage is that it allows the user to negotiate with the application for the
required resolution. The bottom layer of the pyramid data structure is the original image, and
the subsequent layers are versions of the image subsampled by a factor of 2.
Predictive coding is used to encode the differential frames. Hence hierarchical coding
supports lossless coding as well as progressive coding.
Video compression- MPEG
A sequence of still images associated with a time index is called video. Video includes both still
image data and audio information. Video compression is based on two important aspects: spatial
redundancy, which is present within a single frame of the video, and temporal redundancy,
which is present among successive frames.
The MPEG data hierarchy is organized as: Video → Group of pictures → Picture → Slice → Macroblock → Block.
Frame Construction
1. The first frame, or key frame, is called the I-frame. These frames are very important as they
carry the most information.
2. Forward prediction schemes compute the difference between the current and the previous
frames; backward prediction schemes compute the difference between the current and the
next frames. P-pictures, or predictive pictures, are obtained from the difference between a
frame and its previous frame. Bidirectional schemes predict the difference between the
current frame and both the previous and the subsequent frames. B-frames are obtained by
predicting from both the previous and the later frames and then interpolating between them
to get the complete frame.
I-frames are encoded using intra-frame compression algorithms. P-frames can be coded as
follows (a sketch of the residual step is given after this discussion):
1. The difference between the current frame and the preceding frame is calculated.
2. The difference is encoded using the DCT. The frequency coefficients are then quantized,
and RLC is used for coding the sequence along with the motion vector.
B-frames use both past and future frames, and hence their reference frames must be available
before a B-frame can be reconstructed. Thus the frames are reordered before being sent to the
decoder.
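A minimal sketch of the temporal-redundancy idea behind P-frame coding: the frame difference (residual) is computed against the previous frame and would then be passed to the DCT, quantization and RLC stages described above. Motion compensation is omitted, and the frames are hypothetical NumPy arrays.

```python
import numpy as np

def p_frame_residual(current, previous):
    """Prediction error coded for a P-frame (no motion compensation here)."""
    return current.astype(np.int16) - previous.astype(np.int16)

def reconstruct(previous, residual):
    """Decoder side: add the decoded residual back onto the reference frame."""
    return (previous.astype(np.int16) + residual).astype(np.uint8)

prev = np.full((4, 4), 120, dtype=np.uint8)    # hypothetical reference frame
curr = prev.copy()
curr[1:3, 1:3] = 130                           # a small changed region
residual = p_frame_residual(curr, prev)
print(np.count_nonzero(residual), "of", residual.size, "samples differ")   # 4 of 16
print(np.array_equal(reconstruct(prev, residual), curr))                   # True
```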
Motion Estimation
Audio Compression
PCM encoder: This takes the audio signal and generates 32 samples.
Filter bank: This employs the 32-point FFT, and its 32 frequency coefficients are considered as
32 sub-bands.
The audio compression data flow is shown in figs 6.25(a) and 6.25(b).
Fig. Audio compression data flow. (a) Sender side: Read audio signal → Transform to frequency domain → Bit allocation → Quantization → Entropy encoding → Transmit. (b) Receiver side: Read audio compressed data → Entropy decoding → Unpack and dequantize → Frequency sample reconstruction → Transform to time domain → Audio signal.