IMAGE COMPRESSION
Definition: Image compression deals with reducing the amount of data required to represent a
digital image by removing redundant data.
Images can be represented in digital format in many ways. Encoding the contents of a
2-D image in a raw bitmap (raster) format is usually not economical and may result in very
large files. Since raw image representations usually require a large amount of storage space
(and proportionally long transmission times in the case of file uploads/downloads), most
image file formats employ some type of compression. The need to save storage space and
shorten transmission time, as well as the human visual system's tolerance to a modest amount of
loss, have been the driving factors behind image compression techniques.
Data vs. Information:
Definition of compression ratio:
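In its usual formulation, if $n_1$ and $n_2$ denote the number of information-carrying units (e.g., bits) in the original and in the compressed representation of the same information, the compression ratio is
\[
C_R = \frac{n_1}{n_2},
\]
and the corresponding relative data redundancy of the original representation is $R_D = 1 - \dfrac{1}{C_R}$.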
Coding redundancy:
Code: a list of symbols (letters, numbers, bits, etc.)
Code word: a sequence of symbols used to represent a piece of information or an event
(e.g., gray levels).
Code word length: number of symbols in each code word.
COMPRESSION METHODS OF IMAGES:
Compression methods can be lossy, when a tolerable degree of deterioration in the visual quality of the
resulting image is acceptable, or lossless, when the image is encoded in its full quality. The overall
results of the compression process, both in terms of storage savings – usually expressed numerically in
terms of compression ratio (CR) or bits per pixel (bpp) – and resulting quality loss (for the case of lossy
techniques), may vary depending on the technique, format, options (such as the quality setting for
JPEG), and the image contents.
As a rule of thumb, lossy compression should be used for general-purpose photographic images,
whereas lossless compression should be preferred when dealing with line art, technical drawings,
cartoons, etc., or images in which no loss of detail may be tolerable (most notably, space images and
medical images).
At a high level, image compression is achieved by transforming (encoding) a 2-D pixel array into a
statistically uncorrelated data set. This transformation is applied prior to storage or transmission.
At some later time, the compressed image is decompressed to reconstruct the original image
information (preserving or lossless techniques) or an approximation of it (lossy techniques).
Redundancy
Data compression is the process of reducing the amount of data required to represent a given
quantity of information. Different amounts of data might be used to communicate the same
amount of information. If the same information can be represented using different amounts of
data, it is reasonable to believe that the representation that requires more data contains what is
technically called data redundancy.
To evaluate this loss, several fidelity criteria can be used (some objective, such as root mean
square (RMS) error, some subjective, such as pairwise comparison of two images encoded with
different quality settings). Most of the image coding algorithms in use today exploit this
(psychovisual) redundancy, such as the Discrete Cosine Transform (DCT)-based algorithm at the
heart of the JPEG encoding standard.
Figure 2 shows the source encoder in further detail. Its main components are:
Mapper: transforms the input data into a (usually nonvisual) format designed to
reduce interpixel redundancies in the input image. This operation is generally
reversible and may or may not directly reduce the amount of data required to
represent the image.
Quantizer: reduces the accuracy of the mapper's output in accordance with some pre-
established fidelity criterion. Reduces the psychovisual redundancies of the input
image. This operation is not reversible and must be omitted if lossless compression is
desired.
Symbol (entropy) encoder: creates a fixed- or variable-length code to represent the
quantizer's output and maps the output in accordance with the code, reducing coding
redundancy. This operation is reversible.
Error-free compression
For a source emitting statistically independent symbols with probabilities $p_i$, the entropy is defined as
\[
H = -\sum_{i} p_i \log_2 p_i \quad \text{(bits/symbol)},
\]
where $p_i$ is the probability of occurrence of the $i$-th source symbol.
The concept of entropy provides an upper bound on how much compression can be achieved,
given the probability distribution of the source. In other words, it establishes a theoretical
limit on the amount of lossless compression that can be achieved using entropy encoding
techniques alone.
Run-length encoding (RLE)
RLE is one of the simplest data compression techniques. It consists of replacing a sequence
(run) of identical symbols by a pair containing the symbol and the run length. It is used as the
primary compression technique in the 1-D CCITT Group 3 fax standard and in conjunction
with other techniques in the JPEG image compression standard (described in a separate short
article).
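As a minimal sketch of the basic idea (illustrative Python, not the actual CCITT code tables):

# Minimal run-length encoding (RLE) sketch: each run of identical
# symbols is replaced by a (symbol, run_length) pair.

def rle_encode(pixels):
    """Return a list of (symbol, run_length) pairs for a 1-D sequence."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1] = (p, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((p, 1))               # start a new run
    return runs

def rle_decode(runs):
    """Rebuild the original sequence from (symbol, run_length) pairs."""
    out = []
    for symbol, length in runs:
        out.extend([symbol] * length)
    return out

row = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
encoded = rle_encode(row)          # [(0, 3), (1, 2), (0, 4), (1, 1)]
assert rle_decode(encoded) == row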
Differential coding
Differential coding techniques explore the interpixel redundancy in digital images. The basic
idea consists of applying a simple difference operator to neighboring pixels to calculate a
difference image, whose values are likely to fall within a much narrower range than the
original gray-level range. As a consequence of this narrower distribution – and consequently
reduced entropy – Huffman coding or other variable-length coding (VLC) schemes will produce
shorter codewords for the difference image.
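The entropy reduction can be checked with a small NumPy sketch; the ramp image below is synthetic and the entropy helper is only illustrative:

import numpy as np

def entropy(values):
    """First-order entropy (bits/symbol) of a sequence of integer values."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# A toy smooth image: neighboring pixels are highly correlated.
x = np.arange(256)
image = np.tile(x, (64, 1)).astype(np.int16)

# Horizontal difference image: first column kept, the rest are differences.
diff = image.copy()
diff[:, 1:] = image[:, 1:] - image[:, :-1]

print("entropy of original  :", entropy(image.ravel()))   # 8 bits/pixel for this ramp
print("entropy of difference:", entropy(diff.ravel()))    # much smaller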
Predictive coding
Figure 3 shows the main blocks of a lossless predictive encoder. The key component is the
predictor, whose function is to generate an estimated (predicted) value for each pixel from the
input image based on previous pixel values. The predictor’s output is rounded to the nearest
integer and compared with the actual pixel value: the difference between the two –
called prediction error – is then encoded by a VLC encoder. Since prediction errors are likely
to be smaller than the original pixel values, the VLC encoder will likely generate shorter
codewords.
There are several local, global, and adaptive prediction algorithms in the literature. In most
cases, the predicted pixel value is a linear combination of previous pixels.
Dictionary-based coding
Dictionary-based (Lempel-Ziv) techniques replace recurring patterns of symbols with references to
entries in a dictionary that is built as the data are encoded; LZW, the best-known variant, is the
compression method used in the GIF file format.
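A minimal sketch of LZW encoding (illustrative function name and byte-oriented dictionary; real implementations also limit the dictionary size and pack the codes into a bitstream):

def lzw_encode(data):
    """Minimal LZW encoder: returns a list of dictionary indices."""
    # Start with a dictionary containing every single-byte string.
    dictionary = {bytes([i]): i for i in range(256)}
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                              # keep extending the current match
        else:
            codes.append(dictionary[w])         # emit code for the longest match
            dictionary[wc] = len(dictionary)    # add the new string to the dictionary
            w = bytes([byte])
    if w:
        codes.append(dictionary[w])
    return codes

print(lzw_encode(b"ABABABABAB"))  # repeated pattern -> later codes reuse dictionary entries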
Lossy compression
Quantization
The quantization stage is at the core of any lossy image encoding algorithm. Quantization, at the
encoder side, means partitioning the input data range into a smaller set of values.
There are two main types of quantizers: scalar quantizers and vector quantizers. A scalar
quantizer partitions the domain of input values into a smaller number of intervals. If the
output intervals are equally spaced, which is the simplest way to do it, the process is
called uniform scalar quantization; otherwise, for reasons usually related to minimization of
total distortion, it is called nonuniform scalar quantization. One of the most popular
nonuniform quantizers is the Lloyd-Max quantizer. Vector quantization (VQ) techniques
extend the basic principles of scalar quantization to multiple dimensions. Because of its fast
lookup capabilities at the decoder side, VQ-based coding schemes are particularly attractive
to multimedia applications.
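A uniform scalar quantizer amounts to dividing by a step size and rounding; the sketch below is illustrative (the step size of 16 is arbitrary), and a nonuniform quantizer such as Lloyd-Max would instead choose interval boundaries that minimize the total distortion:

import numpy as np

def uniform_quantize(x, step):
    """Map each input value to the index of its quantization interval."""
    return np.round(np.asarray(x, dtype=float) / step).astype(int)

def uniform_dequantize(indices, step):
    """Reconstruct each value as the representative level of its interval."""
    return indices * step

samples = np.array([3.2, 15.7, 100.4, 101.1, 254.9])
step = 16                               # illustrative step size
q = uniform_quantize(samples, step)     # [0, 1, 6, 6, 16]
rec = uniform_dequantize(q, step)       # [0, 16, 96, 96, 256]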
Transform coding
The techniques discussed so far work directly on the pixel values and are usually
called spatial domain techniques. Transform coding techniques use a reversible, linear
mathematical transform to map the pixel values onto a set of coefficients, which are then
quantized and encoded. The key factor behind the success of transform-based coding schemes is
that many of the resulting coefficients for most natural images have small magnitudes and can be
quantized (or discarded altogether) without causing significant distortion in the decoded image.
Different mathematical transforms, such as the Fourier (DFT), Walsh-Hadamard (WHT), and
Karhunen-Loeve (KLT) transforms, have been considered for the task. For compression purposes,
the higher a transform's capability of packing information into fewer coefficients, the better; for
that reason, the Discrete Cosine Transform (DCT) has become the most widely used transform
coding technique.
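The energy-compaction property can be observed with a short sketch using SciPy's dctn/idctn; the smooth 8×8 block and the 1.0 threshold below are arbitrary illustrative choices:

import numpy as np
from scipy.fft import dctn, idctn

# A smooth synthetic 8x8 block: values vary slowly across the block.
x = np.arange(8)
block = np.add.outer(x, x).astype(float) * 4 + 100

coeffs = dctn(block, norm='ortho')          # 2-D DCT of the block
small = np.abs(coeffs) < 1.0                # most coefficients are near zero
coeffs[small] = 0                           # discard them (crude "compression")
approx = idctn(coeffs, norm='ortho')        # reconstruct from the kept coefficients

print("coefficients kept:", int((~small).sum()), "of 64")
print("max reconstruction error:", float(np.abs(block - approx).max()))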
Wavelet coding
Wavelet coding techniques are also based on the idea that the coefficients of a transform that
decorrelates the pixels of an image can be coded more efficiently than the original pixels
themselves. The main difference between wavelet coding and DCT-based coding (Figure 4)
is the omission of the first stage. Because wavelet transforms are capable of representing an
input signal with multiple levels of resolution, and yet maintain the useful compaction
properties of the DCT, the subdivision of the input image into smaller subimages is no longer
necessary. Wavelet coding has been at the core of the latest image compression standards,
most notably JPEG 2000, which is discussed in a separate short article.
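As an illustration only (JPEG 2000 itself uses longer 5/3 and 9/7 wavelet filters), a single level of the simplest wavelet transform, the 2-D Haar transform, can be written directly with NumPy:

import numpy as np

def haar2d_level(img):
    """One level of the 2-D Haar wavelet transform.
    Returns the approximation subband LL and the detail subbands (LH, HL, HH)."""
    img = np.asarray(img, dtype=float)
    # Transform rows: average (low-pass) and difference (high-pass) of pixel pairs.
    lo = (img[:, 0::2] + img[:, 1::2]) / np.sqrt(2)
    hi = (img[:, 0::2] - img[:, 1::2]) / np.sqrt(2)
    # Transform columns of each result the same way.
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return ll, (lh, hl, hh)

image = np.tile(np.arange(16, dtype=float), (16, 1))  # smooth toy image
ll, details = haar2d_level(image)
# Most of the energy ends up in the LL subband; the detail subbands are small.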
Work on international standards for image compression started in the late 1970s with
the CCITT (currently ITU-T) need to standardize binary image compression algorithms for
Group 3 facsimile communications. Since then, many other committees and standards have
been formed to produce de jure standards (such as JPEG), while several commercially
successful initiatives have effectively become de facto standards (such as GIF). Image
compression standards bring about many benefits, such as: (1) easier exchange of image files
between different devices and applications; (2) reuse of existing hardware and software for a
wider array of products; (3) existence of benchmarks and reference data sets for new and
alternative developments.
Work on binary image compression standards was initially motivated by CCITT Group 3 and
4 facsimile standards. The Group 3 standard uses a non-adaptive, 1-D RLE technique in
which the last K-1 lines of each group of K lines (for K = 2 or 4) are optionally coded in a 2-
D manner, using the Modified Relative Element Address Designate (MREAD) algorithm. The
Group 4 standard uses only the MREAD coding algorithm. Both classes of algorithms are
non-adaptive and were optimized for a set of eight test images, containing a mix of
representative documents, which sometimes resulted in data expansion when applied to
different types of documents (e.g., half-tone images). The Joint Bilevel Image Group
(JBIG) – a joint committee of the ITU-T and ISO – has addressed these limitations and
proposed two new standards
(JBIG and JBIG2) which can be used to compress binary and gray-scale images of up to 6
gray-coded bits/pixel.
These methods encode each pixel while ignoring inter-pixel dependencies. Among them are:
1. Entropy Coding: Every block of an image is entropy encoded based upon the Pk’s
within a block. This produces variable length code for each block depending on
spatial activities within the blocks.
2. Run-Length Encoding: Scan the image horizontally or vertically and, while scanning,
assign a group of pixels with the same intensity to a pair (gi, li), where gi is the
intensity and li is the length of the "run". This method can also be used for detecting
edges and boundaries of an object. It is mostly used for images with a small number
of gray levels and is not effective for highly textured images.
Example 2: Let the transition probabilities for run-length encoding of a binary image
(0:black and 1:white) be p0 = P(0/1) and p1 = P(1/0). Assuming all runs are independent, find
(a) average run lengths, (b) entropies of white and black runs, and (c) compression ratio.
Solution: the run lengths follow a geometric distribution, and the required quantities are obtained
by summing the corresponding geometric series.
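A sketch of the standard derivation, assuming independent, geometrically distributed runs (subscripts 1 and 0 refer to white and black runs, respectively):

(a) A white run ends at each pixel with probability $p_0$ and a black run with probability $p_1$, so the average run lengths are
\[
\mu_1 = \frac{1}{p_0}, \qquad \mu_0 = \frac{1}{p_1}.
\]

(b) With $P(l) = (1-p)^{\,l-1}p$, summing the geometric series in $-\sum_{l \ge 1} P(l)\log_2 P(l)$ gives the run entropies
\[
H_1 = \frac{\mathcal{H}(p_0)}{p_0}, \qquad H_0 = \frac{\mathcal{H}(p_1)}{p_1},
\qquad \text{where } \mathcal{H}(p) = -p\log_2 p - (1-p)\log_2 (1-p).
\]

(c) One white run and one black run together cover $\mu_0 + \mu_1$ pixels on average and require at least $H_0 + H_1$ bits, so the achievable compression ratio relative to 1 bit/pixel is
\[
C = \frac{\mu_0 + \mu_1}{H_0 + H_1}.
\]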
Example 3: For the same image as in the previous example, which requires 3 bits/pixel using
standard PCM, we can construct the Huffman code shown in the following table and tree.
Fig. Tree structure for Huffman Encoding
i.e., an average of 2 bits/pixel (instead of 3 bits/pixel using PCM) can be used to code the
image. However, the drawback of the standard Huffman encoding method is that the codes
have variable lengths.
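A compact way to build such a code is with a priority queue; the sketch below is illustrative Python, and the probabilities used are not the ones from the example's table:

import heapq

def huffman_code(probabilities):
    """Build a Huffman code; `probabilities` maps symbol -> probability.
    Returns a dict mapping each symbol to its binary codeword string."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)     # two least probable nodes
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # illustrative probabilities
code = huffman_code(probs)
avg_len = sum(probs[s] * len(w) for s, w in code.items())
print(code, avg_len)   # average length 1.75 bits/symbol for these probabilities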
PREDICTIVE ENCODING:
Idea: Remove mutual redundancy among successive pixels in a region of support (ROS) or
neighborhood and encode only the new information. This method is based upon linear
prediction. Let us start with 1-D linear predictors. An Nth-order linear prediction of x(n)
based on the N previous samples is generated using a 1-D autoregressive (AR) model:
\[
\hat{x}(n) = \sum_{i=1}^{N} a_i\, x(n-i),
\]
where the $a_i$ are model coefficients determined from some sample signals. Now, instead of
encoding x(n), the prediction error $e(n) = x(n) - \hat{x}(n)$ is encoded.
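A minimal sketch of this scheme, using an illustrative first-order predictor (N = 1, a1 = 1); because the rounded prediction error is transmitted exactly, the decoder reproduces the signal without loss:

import numpy as np

def predictive_encode(x, a):
    """Encode x with an N-th order linear predictor (coefficients a[0..N-1]).
    Returns the prediction errors; the first N samples are kept as-is."""
    x = np.asarray(x, dtype=float)
    n_order = len(a)
    errors = x.copy()
    for n in range(n_order, len(x)):
        prediction = sum(a[i] * x[n - 1 - i] for i in range(n_order))
        errors[n] = x[n] - round(prediction)
    return errors

def predictive_decode(errors, a):
    """Invert the encoder by re-running the same predictor on decoded samples."""
    errors = np.asarray(errors, dtype=float)
    n_order = len(a)
    x = errors.copy()
    for n in range(n_order, len(errors)):
        prediction = sum(a[i] * x[n - 1 - i] for i in range(n_order))
        x[n] = errors[n] + round(prediction)
    return x

signal = np.array([100, 102, 104, 107, 110, 112], dtype=float)
coeffs = [1.0]                              # first-order predictor: x_hat(n) = x(n-1)
e = predictive_encode(signal, coeffs)       # prediction errors: 100, 2, 2, 3, 3, 2
assert np.array_equal(predictive_decode(e, coeffs), signal)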
To understand the need for compact image representation, consider the amount of
data required to represent a 2-hour standard-definition (SD) video using 720 × 480 × 24-bit
pixel arrays.
A video is a sequence of video frames, where each frame is a full-color still image.
Because a video player must display the frames sequentially at rates near 30 fps, standard-
definition data must be accessed at 30 fps × (720 × 480) ppf × 3 bpp = 31,104,000 bps
(fps: frames per second, ppf: pixels per frame, bpp: bytes per pixel, bps: bytes per second).
For a 2-hour movie this amounts to 31,104,000 bps × (60)^2 sph × 2 hours ≈ 2.24 × 10^11 bytes
= 224 GB of data, where sph is seconds per hour.
Twenty-seven 8.5 GB dual-layer DVDs would be needed to store it.
To put the 2-hour movie on a single DVD, each frame must be compressed by a factor of around
26.3.
The compression must be even higher for HD (high definition), where image resolutions reach
1920 × 1080 × 24 bits per image.
Web page images and high-resolution digital camera photos are also compressed to save
storage space and reduce transmission time.
Residential Internet connections deliver data at speeds ranging from 56 kbps (conventional
phone line) to more than 12 Mbps (broadband).
The time required to transmit a small 128 × 128 × 24-bit full-color image over this range of
speeds is from about 7.0 to 0.03 sec.
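These figures follow directly from the image size:
\[
128 \times 128 \ \text{pixels} \times 24 \ \text{bits/pixel} = 393{,}216 \ \text{bits},
\]
so the transfer takes roughly $393{,}216 / 56{,}000 \approx 7.0$ s over a 56 kbps line and $393{,}216 / 12{,}000{,}000 \approx 0.03$ s over a 12 Mbps connection.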
Similarly, the number of uncompressed full-color images that an 8-megapixel digital camera can
store on a 1 GB memory card can be increased through compression.
Data compression: It refers to the process of reducing the amount of data required to
represent a given quantity of information.
Data Vs Information:
Data and information are not the same thing; data are the means by which information is
conveyed.
Because various amounts of data can be used to represent the same amount of information,
representations that contain irrelevant or repeated information are said to contain redundant data.
In today's multimedia wireless communication, a major issue is the bandwidth needed to satisfy real-time
transmission of image data. Compression is one of the good solutions to address this issue.
Transform-based compression algorithms are widely used in the field of compression because of
their de-correlation and other properties useful in compression. A comparative study of
compression methods can be made based on their types, which highlights the importance of the
transform in image compression and the selection of a particular transform for image compression.
The performance of a variety of different image transforms can be compared based on
compression ratio, entropy, and time factor.
Source: The Role of Transforms in Image Compression. Available from:
https://www.researchgate.net/publication/257251096_The_Role_of_Transforms_in_Image_Compression
[accessed Jun 05 2018].
The general encoding architecture of an image compression system is shown in Fig. 1.4. The
fundamental theory and concept of each functional block will be introduced in the following
sections.
Why can an image be compressed? The reason is that the correlation between one
pixel and its neighboring pixels is very high; or, we can say that the values of one pixel and its
adjacent pixels are very similar. Once the correlation between the pixels is reduced, we can
take advantage of the statistical characteristics and the variable-length coding theory to
reduce the storage quantity. This is the most important part of the image compression
algorithm, and a lot of relevant processing methods have been proposed. The best-known
methods are as follows:
JPEG and JPEG 2000 have their own quantization methods, and the details of the relevant
theory will be introduced in Chapter 2.
ENTROPY CODING
The main objective of entropy coding is to achieve a smaller average code length for the
image. Entropy coding assigns codewords to the corresponding symbols according to
the probability of the symbols. In general, entropy encoders are used to compress
the data by replacing symbols represented by equal-length codes with codewords
whose length is inversely proportional to the corresponding probability. The entropy
encoders of JPEG and JPEG 2000 will also be introduced in Chapter 2.
2 AN OVERVIEW OF IMAGE COMPRESSION STANDARDS:
In this chapter, we will introduce the fundamental theory of two well-known
image compression standards – JPEG and JPEG 2000.
Fig. 2.1 and Fig. 2.2 show the encoder and decoder models of JPEG. We will introduce
the operation and fundamental theory of each block in the following sections.
DISCRETE COSINE TRANSFORM
The next step after color coordinate conversion is to divide the three color
components of the image into many 8×8 blocks. The mathematical definitions of the Forward
DCT and the Inverse DCT are as follows:
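In their standard 8×8 form, as used by JPEG, the forward and inverse transforms are
\[
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\,
\cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16},
\]
\[
f(x,y) = \frac{1}{4} \sum_{u=0}^{7}\sum_{v=0}^{7} C(u)\, C(v)\, F(u,v)\,
\cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16},
\]
where $C(k) = 1/\sqrt{2}$ for $k = 0$ and $C(k) = 1$ otherwise.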
f(x,y) is the value of each pixel in the selected 8×8 block, and F(u,v) is the DCT coefficient
after transformation. The transform of an 8×8 block is again an 8×8 block, composed of the
F(u,v) values.
The DCT is closely related to the DFT. Both of them take a set of points from the
spatial domain and transform them into an equivalent representation in the frequency domain.
Why, then, is the DCT more appropriate for image compression than the DFT? The two main
reasons are:
1. The DCT can concentrate the energy of the transformed signal in the low-frequency
coefficients, whereas the DFT cannot do so as effectively. According to Parseval's theorem,
the energy is the same in the spatial domain and in the frequency domain.
Because the human eye is less sensitive to the high-frequency components,
we can focus on the low-frequency components and reduce the contribution
of the high-frequency components after taking the DCT.
2. For image compression, the DCT produces a smaller blocking effect than the
DFT.
After transformation, the element in the upper-left corner, which corresponds to zero
frequency in both directions, is the "DC coefficient," and the rest are called "AC
coefficients."
Quantization in JPEG:
Quantization is the step where we actually throw away data. The DCT is a lossless
procedure. The data can be precisely recovered through the IDCT (this isn’t entirely true
because in reality no physical implementation can compute with perfect accuracy). During
Quantization, every coefficient in the 8×8 DCT matrix is divided by a corresponding
quantization value. The quantized coefficient is defined in (2.3), and the reverse process
is given by (2.4).
The goal of quantization is to reduce most of the less important high-frequency DCT
coefficients to zero; the more zeros we generate, the better the image will compress. The
matrix Q generally has lower numbers in the upper-left region and larger numbers in the
lower-right region. Although the high-frequency components are removed, the IDCT can still
obtain an approximate matrix which is close to the original 8×8 block. The JPEG
committee has recommended certain Q matrices that work well, with performance close to
the optimal condition; the Q matrices for the luminance and chrominance components are defined in
(2.5) and (2.6).
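The operation can be sketched in a few lines of Python; the luminance table below is the widely published example table from the JPEG specification, and the DCT block is synthetic:

import numpy as np

# Example luminance quantization table (the widely published JPEG example table).
Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(dct_block, q_table):
    """Divide each DCT coefficient by its quantization step and round."""
    return np.round(dct_block / q_table).astype(int)

def dequantize(q_block, q_table):
    """Reverse process: multiply the quantized values by the quantization steps."""
    return q_block * q_table

# A synthetic DCT block: large DC term, small high-frequency terms.
dct_block = np.zeros((8, 8))
dct_block[0, 0] = 500.0
dct_block[0, 1] = -30.0
dct_block[7, 7] = 5.0

q = quantize(dct_block, Q_LUMA)        # most high-frequency entries become 0
rec = dequantize(q, Q_LUMA)            # coarse approximation of dct_block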
ZIGZAG SCAN:
After quantization, the DC coefficient is treated separately from the 63 AC
coefficients. The DC coefficient is a measure of the average value of the original 64 image
samples. Because there is usually strong correlation between the DC coefficients of adjacent
8×8 blocks, the quantized DC coefficient is encoded as the difference from the DC term of
the previous block. This special treatment is worthwhile, as DC coefficients frequently
contain a significant fraction of the total image energy. The other 63 entries are the AC
components. They are treated separately from the DC coefficients in the entropy coding
process.
We set DC0 = 0. The DC of the current block, DCi, equals DCi-1 + Diffi; therefore, in the
JPEG file, the first coefficient stored is actually the difference of DCs. The difference is
then encoded with the Huffman coding algorithm, together with the encoding of the AC
coefficients.
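The zigzag ordering itself can be generated with a short, generic Python sketch (not taken from any particular codec) that simply walks the anti-diagonals of the block:

import numpy as np

def zigzag_indices(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col (anti-diagonal index)
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()                     # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order

def zigzag_scan(block):
    """Flatten a quantized 8x8 block into the 1-D zigzag sequence (DC first)."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]

block = np.arange(64).reshape(8, 8)
print(zigzag_scan(block)[:10])   # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]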
(i) Redundancy can be broadly classified into Statistical redundancy and Psychovisual
redundancy.
(ii) Statistical redundancy can be classified into inter-pixel redundancy and coding
redundancy.
(iii) Inter-pixel can be further classified into spatial redundancy and temporal redundancy.
(iv) Spatial redundancy or correlation between neighboring pixel values.
(v) Spectral redundancy or correlation between different color planes or spectral bands.
(vi) Temporal redundancy or correlation between adjacent frames in a sequence of images in
video applications.
(vii) Image compression research aims at reducing the number of bits needed to represent an
image by removing the spatial and spectral redundancies as much as possible.
(viii) In digital image compression, three basic data redundancies can be identified and
exploited: Coding redundancy, Inter-pixel redundancy and Psychovisual redundancy.
Coding Redundancy:
o Coding redundancy is associated with the representation of information.
o The information is represented in the form of codes.
o If the gray levels of an image are coded in a way that uses more code symbols
than absolutely necessary to represent each gray level then the resulting image
is said to contain coding redundancy.
Inter-pixel Spatial Redundancy:
o Interpixel redundancy is due to the correlation between the neighboring pixels
in an image.
o That means neighboring pixels are not statistically independent. The gray
levels are not equally probable.
o The value of any given pixel can be predicted from the values of its neighbors;
that is, they are highly correlated.
o The information carried by an individual pixel is relatively small. To reduce the
interpixel redundancy, the difference between adjacent pixels can be used to
represent an image.
Inter-pixel Temporal Redundancy:
o Interpixel temporal redundancy is the statistical correlation between pixels
from successive frames in video sequence.
o Temporal redundancy is also called interframe redundancy. Temporal
redundancy can be exploited using motion compensated predictive coding.
o Removing a large amount of redundancy leads to efficient video compression.
Psychovisual Redundancy:
o The Psychovisual redundancies exist because human perception does not
involve quantitative analysis of every pixel or luminance value in the image.
o The elimination of this real visual information is possible only because the
information itself is not essential for normal visual processing.