IP
LECTURE NOTES
INTRODUCTION:
Basic concept of digital image:
The field of digital image processing refers to processing digital images by means of a digital
computer. A digital image is composed of a finite number of elements, each of which has a
particular location and value. These elements are called picture elements, image elements, pels
and pixels. Pixel is the term used most widely to denote the elements of a digital image. An image
is a two-dimensional function that represents a measure of some characteristic, such as
brightness or color, of a viewed scene. An image is a projection of a 3-D scene onto a 2-D
projection plane.
An image may be defined as a two-dimensional function f(x,y), where x and y are spatial (plane)
coordinates, and the amplitude of f at any pair of coordinates (x,y) is called the intensity of the
image at that point.
The term gray level is used often to refer to the intensity of monochrome images.
Color images are formed by a combination of individual 2-D images. For example, in the RGB
color system a color image consists of three individual component images (red, green and blue).
For this reason, many of the techniques developed for monochrome images can be
extended to color images by processing the three component images individually.
An image may be continuous with respect to the x- and y- coordinates and also in amplitude.
Converting such an image to digital form requires that the coordinates, as well as the amplitude,
be digitized.
Applications of digital image processing
Since digital image processing has very wide applications and almost all of the
technical fields are impacted by DIP, we will just discuss some of the major
applications of DIP.
Digital image processing has a broad spectrum of applications, such as
• Remote sensing via satellites and other spacecraft
• Image transmission and storage for business applications
• Medical processing,
• RADAR (Radio Detection and Ranging)
• SONAR (Sound Navigation and Ranging) and
• Acoustic image processing (The study of underwater sound is known as
underwater acoustics or hydro acoustics.)
• Robotics and automated inspection of industrial parts.
Images acquired by satellites are useful in tracking of:
• Earth resources;
• Geographical mapping;
• Prediction of agricultural crops,
• Urban growth and weather monitoring
• Flood and fire control and many other environmental applications.
Space image applications include:
• Recognition and analysis of objects contained in images obtained from deep
space-probe missions.
• Image transmission and storage applications occur in broadcast television
• Teleconferencing
• Transmission of facsimile images (printed documents and graphics) for office automation
• Communication over computer networks
• Closed-circuit television based security monitoring systems and
• In military communications.
• Medical applications:
• Processing of chest X- rays
• Cineangiograms
• Projection images of transaxial tomography and
• Medical images that occur in radiology and nuclear magnetic resonance (NMR)
Image processing toolbox (IPT) is a collection of functions that extend the capability
of the MATLAB numeric computing environment. These functions, and the
expressiveness of the MATLAB language, make many image-processing operations
easy to write in a compact, clear manner, thus providing an ideal software prototyping
environment for the solution of image processing problems.
Components of Image processing System:
Color image processing: It is an area that has been gaining importance because of the
use of digital images over the internet. Color image processing deals basically with
color models and their implementation in image processing applications.
Wavelets and Multiresolution Processing: These are the foundation for representing
images in various degrees of resolution.
Compression: It deals with techniques for reducing the storage required to save an image,
or the bandwidth required to transmit it over a network. It has two major approaches:
a) Lossless compression b) Lossy compression
Morphological processing: It deals with tools for extracting image components that
are useful in the representation and description of shape and boundary of objects. It is
majorly used in automated inspection applications.
Representation and Description: It almost always follows the output of the segmentation step,
that is, raw pixel data constituting either the boundary of a region or the points in the
region itself. In either case, converting the data to a form suitable for computer
processing is necessary.
Recognition: It is the process that assigns a label to an object based on its descriptors.
It is the last step of image processing, and it typically uses artificial intelligence software.
Knowledge base:
Knowledge about a problem domain is coded into an image processing system in the
form of a knowledge base. This knowledge may be as simple as detailing regions of
an image where the information of interest is known to be located, thus limiting the
search that has to be conducted in seeking that information. The knowledge base can
also be quite complex, such as an interrelated list of all major possible defects in a materials
inspection problem, or an image database containing high resolution satellite images
of a region in connection with change detection applications.
Digital image through scanner
A scanner is a device that scans images, printed text, handwriting, etc., and converts
them to digital form. It is so named because the data is converted one line at a
time, or scanned down the page, as the scanning head moves down the page.
The glass plate is the transparent plate on which the original is placed so that the scanner
can scan it, and the cover keeps out stray light that can affect the accuracy of the scan.
Scanning head
Scanning head is the most important component because it is the one which does actual
scanning. It contains components like
Light source and mirror: It is the bright white light that is used to illuminate the
original as it is being scanned; the light bounces off the original and is reflected off
several mirrors.
Stabilizer bar: It is a long stainless steel rod that is securely fastened to the case of
the scanner and it provides a smooth ride as the scanner scans down the page
CCD (Charge Coupled Device) or CIS (Contact Image Sensor): A CCD array is a
device that converts photons into electricity. Any scanner that uses a CCD uses a lens to
focus the light coming from the mirrors within the scanning head.
Another technology used in some cheaper scanners is CIS wherein the light source is
a set of LEDs that runs the length of the glass plate.
Stepper motor:
The stepper motor in a scanner moves the scan head down the page during the scan cycle;
it is often located either on the scan head itself or attached to a belt that drives the
scanner head.
Flatbed Scanners
The most commonly used scanner is a flatbed scanner also known as desktop scanner.
It has a glass plate on which the picture or the document is placed. The scanner head
placed beneath the glass plate moves across the picture and the result is a good quality
scanned image. For scanning large maps or top sheets wide format flatbed scanners
can be used.
Sheet-fed Scanners
Sheet-fed scanners work on a principle similar to that of a fax machine. In this, the
document to be scanned is moved past the scanning head and the digital form of the
image is obtained. The disadvantage of this type of scanner is that it can only scan
loose sheets and the scanned image can easily become distorted if the document is not
handled properly while scanning
Handheld Scanners
In analog cameras, image formation is due to the chemical reaction that takes place
on the strip that is used for image formation. A 35mm strip, denoted in the figure by the
35mm film cartridge, is used in an analog camera. This strip is coated with silver halide
(a chemical substance).
As with analog cameras, in the case of digital cameras too, when light falls on the object, the
light reflects back after striking the object and is allowed to enter the camera.
Each sensor of the CCD array is itself an analog sensor. When photons of light strike
the chip, the light is held as a small electrical charge in each photo sensor. The response
of each sensor is directly proportional to the amount of light (photon energy) that strikes the
surface of the sensor.
Since we have already defined an image as a two dimensional signal, and due to the two
dimensional arrangement of the CCD array, a complete image can be obtained from this
CCD array. It has a limited number of sensors, which means only a limited amount of detail can be
captured by it. Also, each sensor can have only one value for the photons
that strike it, so the number of photons striking (the current) is counted and
stored. In order to measure these accurately, external CMOS sensors are also attached
to the CCD array.
Sampling and quantization:
To create a digital image, we need to convert the continuous sensed data into digital
form. This involves two processes: sampling and quantization. An image may be
continuous with respect to the x and y coordinates and also in amplitude. To convert
it into digital form, we have to sample the function in both coordinates and in
amplitude.
Digitizing the coordinate values is called sampling. Digitizing the amplitude values
is called quantization. Consider the continuous image values along a line segment AB. To
sample this function, we take equally spaced samples along line AB. The location of
each sample is given by a vertical tick mark in the bottom part of the figure. The samples
are shown as small squares superimposed on the function; the set of these discrete locations
gives the sampled function.
In order to form a digital image, the gray level values must also be converted (quantized) into
discrete quantities. So we divide the gray level scale into eight discrete levels. The continuous
gray levels are quantized simply by assigning one of the eight discrete gray levels to each
sample. The assignment is made depending on the vertical proximity of a sample to a vertical
tick mark.
Starting at the top of the image and carrying out this procedure line by line produces
a two dimensional digital image.
Digital Image definition:
A digital image f(m,n) described in a 2D discrete space is derived from an analog
image f(x,y) in a 2D continuous space through a sampling process that is frequently
referred to as digitization. The mathematics of that sampling process will be described
in subsequent Chapters. For now we will look at some basic definitions associated with
the digital image. The effect of digitization is shown in figure.
The 2D continuous image f(x,y) is divided into N rows and M columns. The
intersection of a row and a column is termed a pixel. The value assigned to the integer
coordinates (m,n), with m = 0, 1, 2, ..., N-1 and n = 0, 1, 2, ..., M-1, is f(m,n). In fact, in most
cases f(m,n) is actually a function of many variables, including depth, color and time (t).
Thus each element of the matrix on the right-hand side represents a digital element, pixel or pel. The matrix
can be represented in the following form as well. The sampling process may be viewed
as partitioning the xy plane into a grid, with the coordinates of the center of each grid cell
being a pair of elements from the Cartesian product Z², which is the set of all ordered
pairs of elements (Zi, Zj) with Zi and Zj being integers from Z. Hence f(x,y) is a digital
image if (x,y) are integers from Z² and f is a function that assigns a gray
level (that is, a real number from the set of real numbers R) to each distinct pair of
coordinates (x,y). This functional assignment is the quantization process. If the gray
levels are also integers, Z replaces R, and a digital image becomes a 2D function
whose coordinates and amplitude values are integers. Due to processing, storage and
hardware considerations, the number of gray levels typically is an integer power of 2:
L = 2^k
Then, the number b of bits required to store a digital image is b = M × N × k. When
M = N, the equation becomes b = N² × k.
When an image can have 2^k gray levels, it is referred to as a “k-bit image”. An image with
256 possible gray levels is called an “8-bit image” (256 = 2^8).
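As a quick worked illustration of these relations, here is a minimal Python sketch (not part of the original notes; the function name and example values are illustrative):

def image_storage_bits(M, N, k):
    # L = 2^k gray levels; b = M * N * k bits of storage
    L = 2 ** k
    b = M * N * k
    return L, b

levels, bits = image_storage_bits(1024, 1024, 8)
print(levels, bits, bits // 8)   # 256 gray levels, 8388608 bits, 1048576 bytes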
Figure shows the components of a single sensor. Perhaps the most familiar sensor of this type is the
photodiode, which is constructed of silicon materials and whose output voltage waveform is
proportional to light. The use of a filter in front of a sensor improves selectivity. For
example, a green (pass) filter in front of a light sensor favors light in the green band of the
color spectrum. As a consequence, the sensor output will be stronger for green light than for
other components in the visible spectrum.
In order to generate a 2-D image using a single sensor, there has to be relative
displacements in both the x- and y-directions between the sensor and the area to be
imaged. Figure shows an arrangement used in high-precision scanning, where a film
negative is mounted onto a drum whose mechanical rotation provides displacement in
one dimension. The single sensor is mounted on a lead screw that provides motion in
the perpendicular direction. Since mechanical motion can be controlled with high
precision, this method is an inexpensive (but slow) way to obtain high-resolution
images. Other similar mechanical arrangements use a flat bed, with the sensor moving
in two linear directions. These types of mechanical digitizers sometimes are referred
to as micro densitometers.
A pixel p at coordinates (x,y) has four horizontal and vertical neighbors whose coordinates are
(x+1, y), (x-1, y), (x, y+1), (x, y-1). This set of pixels, called the 4-neighbors of p, is denoted by
N4(p). Each pixel is one unit distance from (x,y), and some of the neighbors of p lie outside the
digital image if (x,y) is on the border of the image. The four diagonal neighbors of p have coordinates
(x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1)
and are denoted by ND(p). These points, together with the 4-neighbors, are called the 8-neighbors of p,
denoted by N8(p).
As before, some of the points in ND (p) and N8 (p) fall outside the image if (x,y) is on
the border of the image.
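These neighborhoods can be written down directly. A minimal Python sketch (the helper names are illustrative assumptions, not from the notes), returning coordinate lists and discarding neighbors that fall outside a rows x cols image:

def n4(x, y):
    # 4-neighbors of p = (x, y)
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def nd(x, y):
    # diagonal neighbors of p
    return [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]

def n8(x, y):
    # 8-neighbors = 4-neighbors plus diagonal neighbors
    return n4(x, y) + nd(x, y)

def inside(coords, rows, cols):
    # drop neighbors that fall outside the image (border pixels)
    return [(i, j) for (i, j) in coords if 0 <= i < rows and 0 <= j < cols]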
Adjacency and connectivity
Let V be the set of gray-level values used to define adjacency. In a binary image,
V = {1}. In a gray-scale image the idea is the same, but V typically contains more
elements, for example, V = {180, 181, 182, …, 200}.
If the possible intensity values are in the range 0 to 255, V can be any subset of these 256 values
to which we refer when testing adjacency of pixels.
Three types of adjacency
• 4-adjacency – two pixels p and q with values from V are 4-adjacent if q is in
the set N4(p)
• 8-adjacency – two pixels p and q with values from V are 8-adjacent if q is in
the set N8(p)
• m-adjacency – two pixels p and q with values from V are m-adjacent if (i) q is
in N4(p), or (ii) q is in ND(p) and the set N4(p) ∩ N4(q) has no pixels whose values
are from V.
• Mixed adjacency is a modification of 8-adjacency. It is introduced to eliminate
the ambiguities that often arise when 8-adjacency is used.
• For example:
Figure :(a) Arrangement of pixels; (b) pixels that are 8-adjacent (shown dashed) to
the center pixel; (c) m-adjacency.
Types of Adjacency:
• In this example, we can note that to connect between two pixels (finding a path
between two pixels):
– In 8-adjacency way, you can find multiple paths between two pixels
– While, in m-adjacency, you can find only one path between two pixels
• So, m-adjacency has eliminated the multiple path connection that has been
generated by the 8-adjacency.
• Two subsets S1 and S2 are adjacent, if some pixel in S1 is adjacent to some pixel in
S2. Adjacent means, either 4-, 8- or m-adjacency.
A Digital Path:
• A digital path (or curve) from pixel p with coordinate (x,y) to pixel q with coordinate
(s,t) is a sequence of distinct pixels with coordinates (x0,y0), (x1,y1), …, (xn, yn) where
(x0,y0) = (x,y) and (xn, yn) = (s,t) and pixels (xi, yi) and (xi-1, yi-1) are adjacent for 1 ≤ i ≤
n
• n is the length of the path
• If (x0,y0) = (xn, yn), the path is closed.
We can specify 4-, 8- or m-paths depending on the type of adjacency specified.
• Return to the previous example:
Figure: (a) Arrangement of pixels; (b) pixels that are 8-adjacent(shown dashed) to
the center pixel; (c) m-adjacency.
In figure (b) the paths between the top right and bottom right pixels are 8-paths, and
the path between the same two pixels in figure (c) is an m-path.
Connectivity:
• Let S represent a subset of pixels in an image, two pixels p and q are said to be
connected in S if there exists a path between them consisting entirely of pixels in
S.
• For any pixel p in S, the set of pixels that are connected to it in S is called a
connected component of S. If it only has one connected component, then set S is
called a connected set.
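A minimal Python sketch of extracting the connected component containing a pixel p, here using 4-adjacency on a binary NumPy array with V = {1} (the function name and the breadth-first approach are illustrative assumptions):

from collections import deque

def connected_component(img, p):
    # img: 2-D NumPy array of 0s and 1s; p: (row, col) with img[p] == 1
    rows, cols = img.shape
    seen = {p}
    queue = deque([p])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):  # 4-neighbors
            if 0 <= nx < rows and 0 <= ny < cols and img[nx, ny] == 1 \
                    and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append((nx, ny))
    return seen   # the connected component of S containing p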
Region and Boundary:
REGION: Let R be a subset of pixels in an image. We call R a region of the image if
R is a connected set.
BOUNDARY: The boundary (also called border or contour) of a region R is the set of pixels
in the region that have one or more neighbors that are not in R. If R happens to be an entire
image, then its boundary is defined as the set of pixels in the first and last rows and columns
of the image. This extra definition is required because an image has no neighbors beyond its
borders. Normally, when we refer to a region, we are referring to a subset of an image, and any
pixels in the boundary of the region that happen to coincide with the border of the image are
included implicitly as part of the region boundary.
DISTANCE MEASURES:
For pixels p, q and z with coordinates (x,y), (s,t) and (v,w) respectively, D is a distance
function or metric if
D(p,q) ≥ 0 (D(p,q) = 0 iff p = q),
D(p,q) = D(q,p), and
D(p,z) ≤ D(p,q) + D(q,z).
• The Euclidean Distance between p and q is defined as:
De(p,q) = [(x – s)² + (y – t)²]^(1/2)
Pixels having a distance less than or equal to some value r from (x,y) are the points
contained in a disk of radius r centered at (x,y).
• The D4 distance (also called city-block distance) between p and q is defined as:
D4 (p,q) = | x – s | + | y – t |
Pixels having a D4 distance from (x,y) less than or equal to some value r form a
diamond centered at (x,y).
Example:
The pixels with distance D4 ≤ 2 from (x,y) form the following contours of constant
distance.
The pixels with D4 = 1 are the 4-neighbors of (x,y)
• The D8 distance (also called chessboard distance) between p and q is defined as:
D8 (p,q) = max(| x – s |,| y – t |)
Pixels having a D8 distance from (x,y) less than or equal to some value r form a
square centered at (x,y).
Example:
The pixels with D8 distance ≤ 2 from (x,y) form the following contours of constant distance.
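The three measures are easy to compute directly. A small Python sketch (illustrative helper names), with p = (x, y) and q = (s, t):

def d_euclidean(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def d4(p, q):
    # city-block distance
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d8(p, q):
    # chessboard distance
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

print(d_euclidean((0, 0), (3, 4)), d4((0, 0), (3, 4)), d8((0, 0), (3, 4)))  # 5.0 7 4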
• Dm distance:
It is defined as the shortest m-path between the points. In this case, the distance
between two pixels will depend on the values of the pixels along the path, as well as
the values of their neighbors.
• Example:
Consider the following arrangement of pixels and assume that p, p2, and p4 have value
1 and that p1 and p3 can have a value of 0 or 1. Suppose that we consider
adjacency of pixels with value 1 (i.e. V = {1}).
Case 1: If p1 = 0 and p3 = 0
The length of the shortest m-path (the Dm distance) is 2 (p, p2, p4)
Case 2: If p1 = 1 and p3 = 0
The length of the shortest m-path becomes 3 (p, p1, p2, p4)
Case 3: If p1 = 0 and p3 = 1
The same applies here, and the shortest m-path will be 3 (p, p2, p3, p4)
Case 4: If p1 = 1 and p3 = 1
The length of the shortest m-path will be 4 (p, p1, p2, p3, p4)
Gray levels:
Image resolution
A resolution can be defined as the total number of pixels in an image. This has been
discussed under image resolution. We have also discussed that the clarity of an image
does not depend on the number of pixels, but on the spatial resolution of the image; this
has been discussed under spatial resolution. Here we are going to discuss another type
of resolution, which is called gray level resolution.
Gray level resolution
Gray level resolution refers to the predictable or deterministic change in the shades or
levels of gray in an image.
In short gray level resolution is equal to the number of bits per pixel.
We have already discussed bits per pixel in our tutorial of bits per pixel and image
storage requirements. We will define bpp here briefly.
BPP
The number of different colors in an image depends on the depth of color, or bits per
pixel.
Mathematically
The mathematical relation between gray level resolution and bits per pixel can be given as
L = 2^k
In this equation, L refers to the number of gray levels (it can also be described as the number
of shades of gray), and k refers to bpp, or bits per pixel. So 2 raised to the power of the bits
per pixel is equal to the gray level resolution.
For example:
The above image of Einstein is a grayscale image, i.e. an image with 8 bits
per pixel (8 bpp).
To calculate its gray level resolution:
L = 2^k, where k = 8, so L = 2^8 = 256.
This means its gray level resolution is 256; in other words, this image has
256 different shades of gray. The higher the bits per pixel of an image, the higher its
gray level resolution.
Defining gray level resolution in terms of bpp
It is not necessary that a gray level resolution should only be defined in terms of levels.
We can also define it in terms of bits per pixel.
For example
If you are given an image of 4 bpp and are asked to calculate its gray level
resolution, there are two ways to state the answer.
The first answer is 16 levels.
The second answer is 4 bits.
Finding bpp from Gray level resolution
You can also find the bits per pixels from the given gray level resolution. For this, we
just have to twist the formula a little.
Equation (1): L = 2^k
This formula finds the number of levels. If we were instead to find the bits per pixel, in this
case k, we simply rearrange it:
k = log₂(L)    Equation (2)
Because in the first equation the relationship between levels (L) and bits per pixel (k)
is exponential, to invert it we take the logarithm, since the inverse of the exponential is
the log.
Let’s take an example to find bits per pixel from gray level resolution.
For example:
If you are given an image of 256 levels, what are the bits per pixel required for it?
Putting 256 into the equation, we get
k = log₂(256) = 8
So the answer is 8 bits per pixel.
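A small Python sketch of both directions of this conversion (illustrative helper names, not from the notes):

import math

def levels_from_bpp(k):
    return 2 ** k            # L = 2^k

def bpp_from_levels(L):
    return int(math.log2(L)) # k = log2(L)

print(levels_from_bpp(4))    # 16 levels for a 4 bpp image
print(bpp_from_levels(256))  # 8 bits per pixel for 256 levels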
Gray level resolution and quantization:
The quantization will be formally introduced in the next tutorial, but here we are just
going to explain the relationship between gray level resolution and quantization.
Gray level resolution is found on the y axis of the signal. In the tutorial Introduction
to signals and systems, we studied that digitizing an analog signal requires two
steps: sampling and quantization.
The concept of sampling is directly related to zooming: the more samples you take,
the more pixels you get. Oversampling can also be called zooming; this has been
discussed in the sampling and zooming tutorial.
But the story of digitizing a signal does not end at sampling; there is another step
involved, which is known as quantization.
What is quantization
Quantization is the counterpart of sampling: it is done on the y axis. When you quantize an
image, you are actually dividing a signal into quanta (partitions). On the x axis of the
signal are the coordinate values, and on the y axis we have the amplitudes. Digitizing
the amplitudes is known as quantization. Here is how it is done.
You can see in this image that the signal has been quantized into three different levels.
That means that when we sample an image, we actually gather a lot of values, and in
quantization, we set levels to these values. This can be made clearer in the image below.
In the figure shown for sampling, although the samples have been taken, they still
span a continuous range of gray level values vertically. In the figure shown
above, these vertically ranging values have been quantized into 5 different levels or
partitions, ranging from 0 (black) to 4 (white). This number of levels can vary according to the type
of image you want.
The relation of quantization with gray levels has been further discussed below.
Relation of Quantization with gray level resolution:
The quantized figure shown above has 5 different levels of gray. It means that the
image formed from this signal would only have 5 different gray values; it would be more or
less a black and white image with a few shades of gray. If you want to make the
quality of the image better, there is one thing you can do: increase the number of levels,
i.e. the gray level resolution. If you increase this level to 256, you have
a gray scale image, which is far better than a simple black and white image.
Now 256, or 5, or whatever number of levels you choose, is called the number of gray levels.
Remember the formula that we discussed in the previous tutorial on gray level resolution.
We have discussed that gray level resolution can be defined in two ways (levels or bits per pixel).
Now we will start reducing the gray levels. We will first reduce the gray levels from
256 to 128.
Now we will start reducing the gray levels. We will first reduce the gray levels from
256 to 128.
128 Gray Levels
There is not much effect on the image after decreasing the gray levels to half. Let's
decrease some more.
64 Gray Levels
Still not much effect, then lets reduce the levels more.
32 Gray Levels
Surprisingly, there is still only a small effect. Maybe it is because it
is the picture of Einstein, but let's reduce the levels further.
16 Gray Levels
Here the image finally reveals that it is affected by the reduced number of levels.
8 Gray Levels
4 Gray Levels
Before reducing it further to 2 levels, you can easily see that the image has already been
distorted badly by reducing the gray levels. Now we will reduce it to 2 levels, which
is nothing but a simple black and white level. It means the image would be a simple
black and white image.
2 Gray Levels
That's the last level we can achieve, because if we reduce it further, it would simply be a
black image, which cannot be interpreted.
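The reduction shown in the figures above can be reproduced by requantizing the image. A minimal NumPy sketch (assuming img is an 8-bit grayscale array; the names are illustrative, and the output keeps each bin's base value rather than rescaling to the full range):

import numpy as np

def reduce_gray_levels(img, levels):
    # img: 8-bit grayscale array; levels: desired number of gray levels (2, 4, ..., 128)
    step = 256 // levels          # width of each quantization bin
    return (img // step) * step   # map every pixel to the base value of its bin

# e.g. img16 = reduce_gray_levels(img, 16)   # 'img' is a hypothetical input array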
Contouring
There is an interesting observation here: as we reduce the number of gray levels,
a special type of effect starts appearing in the image, which can be seen clearly in
the 16 gray level picture. This effect is known as contouring.
Iso preference curves
The explanation of why this effect appears lies in isopreference curves. They are
discussed in our next tutorial on contouring and isopreference curves.
What is contouring?
As we decrease the number of gray levels in an image, some false colors or edges start
appearing in the image. This was shown in our last tutorial on quantization.
Lets have a look at it.
Consider that we have an image of 8 bpp (a grayscale image) with 256 different shades of
gray, or gray levels.
The above picture has 256 different shades of gray. When we reduce it to 128
and further reduce it to 64, the image is more or less the same. But when we reduce it
further to 32 different levels, we get a picture like this.
If you look closely, you will find that the effects start appearing in the image.
These effects are more visible when we reduce the levels further to 16, and we get an
image like this.
The lines that start appearing in this image are known as contouring, and they are very
visible in the above image.
Increase and decrease in contouring
The effect of contouring increases as we reduce the number of gray levels, and the effect
decreases as we increase the number of gray levels.
That means more quantization results in more contouring, and vice versa. But is
this always the case? The answer is no; it depends on something else that is
discussed below.
Isopreference curves
A study was conducted on this effect of gray levels and contouring, and the results were
shown in a graph in the form of curves, known as isopreference curves.
The phenomenon of isopreference curves shows that the effect of contouring depends not only
on the decrease of gray level resolution but also on the image detail.
The essence of the study is: if an image has more detail, the effect of contouring
starts to appear on this image later, compared to an image which has less detail,
when the gray levels are quantized. According to the original research, the researchers
took three images and varied the gray level resolution in all three images.
The images were
Level of detail
The first image has only a face in it, and hence very little detail. The second image has
some other objects in the image too, such as the camera man, his camera, the camera stand,
and background objects, whereas the third image has more detail than all the
other images. The graph is shown below.
According to this graph, we can see that the first image, which was of a face, was subject
to contouring earlier than the other two images. The second image, that of the
cameraman, was subject to contouring a bit after the first image when its gray levels
were reduced, because it has more detail than the first image. And the third
image was subject to contouring well after the first two images, i.e. after 4 bpp,
because this image has the most detail.
Imaging geometry
Central Projection
Vector notation:
Here central projection is represented in the coordinate frame attached to the camera.
Generally, there is no direct access to this camera coordinate frame. Instead, we need
to determine the mapping from a world coordinate frame to an image coordinate
system.
2D-FFT PROPERTIES:
Therefore, the array formed by the Walsh matrix is a real symmetric matrix. It is
easily shown that it has orthogonal columns and rows
1-D Inverse Walsh Transform
The array formed by the inverse Walsh matrix is identical to the one formed by the
forward Walsh matrix apart from a multiplicative factor N.
2-D Walsh Transform
We define now the 2-D Walsh transform as a straightforward extension of the 1-D
transform:
Hadamard Transform:
We define now the 2-D Hadamard transform. It is similar to the 2-D Walsh
transform.
We define now the Inverse 2-D Hadamard transform. It is identical to the forward 2-
D Hadamard transform.
and the corresponding inverse 2-D DCT transform is simply F⁻¹(u,v).
The basic operation of the DCT is as follows:
From the sequency property, it is clear that the rows are ordered by the number of sign
changes.
The slant transform reproduces linear variations of brightness very well. However, its
performance at edges is not as good as that of the KLT or DCT. Because of the 'slant' nature
of the lower order coefficients, its effect is to smear the edges.
Hotelling Transform:
The KL transform is named after Kari Karhunen and Michel Loeve who developed it
as a series expansion method for continuous random processes. Originally, Harold
Hotelling studied the discrete formulation of the KL transform and for this reason, the
KL transform is also known as the Hotelling transform. The KL transform is a
reversible linear transform that exploits the statistical properties of a vector
representation. The basis functions of the KL transform are orthogonal eigenvectors
of the covariance matrix of a data set. A KL transform optimally decorrelates the input
data. After a KL transform, most of the ‘energy’ of the transform coefficients is
concentrated within the first few components. This is the energy compaction property
of a KL transform.
Drawbacks of KL Transforms
The two serious practical drawbacks of KL transform are the following:
i. A KL transform is input-dependent and the basis functions have to be calculated for
each signal model on which it operates. The KL bases have no specific mathematical structure
that leads to fast implementations.
ii. The KL transform requires O(m²) multiply/add operations, whereas the DFT and DCT
require only O(m log₂ m) operations when fast algorithms are used.
Applications of KL Transforms
(i) Clustering Analysis The KL transform is used in clustering analysis to determine a
new coordinate
system for sample data where the largest variance of a projection of the data lies on
the first axis, the next largest variance on the second axis, and so on. Because these
axes are orthogonal, this approach allows for reducing the dimensionality of the data
set by eliminating those coordinate axes with small variances. This data-reduction
technique is commonly referred to as Principal Component Analysis (PCA).
(ii) Image Compression The KL transform is heavily utilised for performance
evaluation of compression algorithms since it has been proven to be the optimal
transform for the compression of an image sequence in the sense that the KL spectrum
contains the largest number of zero-valued coefficients.
MODULE-II
IMAGE ENHANCEMENT
COURSE OUTCOMES MAPPED WITH MODULE-II
At the end of the unit students are able to:
CO | Course Outcomes | Bloom's Taxonomy
CO 2 | Construct image intensity transformations and spatial filtering for image enhancement in the spatial domain. | Apply
CO 3 | Identify 2D convolution and filtering techniques for smoothening and sharpening of images in frequency domain. | Apply
INTRODUCTION:
Image enhancement approaches fall into two broad categories: spatial domain methods and frequency
domain methods. The term spatial domain refers to the image plane itself, and approaches in this
category are based on direct manipulation of pixels in an image.
Frequency domain processing techniques are based on modifying the Fourier transform of an image.
Enhancing an image provides better contrast and a more detailed image compared to the non-enhanced
image. Image enhancement has many useful applications: it is used to enhance medical images, images
captured in remote sensing, satellite images, etc. As indicated previously, the term spatial domain
refers to the aggregate of pixels composing an image. Spatial domain methods are procedures that operate
directly on these pixels. Spatial domain processes will be denoted by the expression.
g(x,y) = T[f(x,y)]
where f(x, y) is the input image, g(x, y) is the processed image, and T is an operator on f, defined
over some neighborhood of (x, y). The principal approach in defining a neighborhood about a point
(x, y) is to use a square or rectangular subimage area centered at (x, y), as the figure shows. The center of the
subimage is moved from pixel to pixel starting, say, at the top left corner. The operator T is applied
at each location (x, y) to yield the output, g, at that location. The process utilizes only the pixels in
the area of the image spanned by the neighborhood.
The simplest form of T is when the neighborhood is of size 1*1 (that is, a single pixel). In this case,
g depends only on the value of f at (x, y), and T becomes a gray-level (also called an intensity or
mapping) transformation function of the form
s=T(r)
where r is the gray level of the input image pixel and s is the gray level of the output image pixel.
T is a transformation function that maps each value of r to a value of s.
For example, if T(r) has the form shown in the figure, the effect of this transformation would be to produce
an image of higher contrast than the original by darkening the levels below m and brightening the levels
above m in the original image. In this technique, known as contrast stretching, the values of r below m are
compressed by the transformation function into a narrow range of s, toward black. The opposite
effect takes place for values of r above m.
In the limiting case shown in Fig. T(r) produces a two-level (binary) image. A mapping of this form
is called a thresholding function.
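A minimal NumPy sketch of these two limiting cases, contrast stretching over the full range and thresholding at m, assuming img is an 8-bit grayscale array (the helper names are illustrative assumptions):

import numpy as np

def contrast_stretch(img, L=256):
    # stretch [r_min, r_max] linearly onto [0, L-1]; assumes img is not constant
    r = img.astype(np.float64)
    r_min, r_max = r.min(), r.max()
    s = (r - r_min) * (L - 1) / (r_max - r_min)
    return s.astype(np.uint8)

def threshold(img, m):
    # limiting case: two-level (binary) output
    return np.where(img > m, 255, 0).astype(np.uint8)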
One of the principal approaches in this formulation is based on the use of so-called masks (also
referred to as filters, kernels, templates, or windows). Basically, a mask is a small (say, 3*3) 2-D
array, such as the one shown in Fig, in which the values of the mask coefficients determine the
nature of the process, such as image sharpening. Enhancement techniques based on this type of
approach often are referred to as mask processing or filtering.
Figure: Gray level transformation functions for contrast enhancement.
Image enhancement can be done through gray level transformations which are discussed below.
Basic gray level transformations:
• Image negative
• Log transformations
• Power law transformations
• Piecewise-linear transformation functions
Linear transformation:
First we will look at the linear transformation. Linear transformation includes simple identity
and negative transformation.
Identity transition is shown by a straight line. In this transition, each value of the input image is
directly mapped to the same value of the output image. That results in an output image identical to
the input image, and hence it is called the identity transformation. It has been shown below:
Negative transformation:
The second linear transformation is the negative transformation, which is the inverse of the identity
transformation. In the negative transformation, each value of the input image is subtracted from
L-1 and mapped onto the output image.
Image negative: The image negative, with gray level values in the range [0, L-1], is obtained by the negative
transformation given by s = T(r), or
s = L - 1 - r
where r is the gray level value at pixel (x,y)
and L-1 is the largest gray level in the image. The result is like a photographic negative. It is useful
for enhancing white details embedded in dark regions of the image. The overall graph of these
transitions is shown below.
Since the input image of Einstein is an 8 bpp image, the number of levels in this image is
256. Putting 256 into the equation, we get
s = 255 – r
So each pixel value is subtracted from 255, and the resulting image is shown above. What happens
is that the lighter pixels become dark and the darker pixels become light, and this results in
the image negative.
It has been shown in the graph below.
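A one-line NumPy sketch of the negative transformation for an 8-bit image (the helper name is illustrative):

import numpy as np

def negative(img, L=256):
    # s = (L - 1) - r, e.g. s = 255 - r for an 8-bit image
    return (L - 1) - img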
Logarithmic transformations:
Logarithmic transformation further contains two types of transformation: log transformation and
inverse log transformation.
Log transformations:
The log transformations can be defined by this formula
s = c log(r + 1).
where s and r are the pixel values of the output and the input image and c is a constant. The
value 1 is added to each pixel value of the input image because if there is a pixel intensity
of 0 in the image, then log(0) is undefined. So 1 is added, to make the minimum value at
least 1.
During log transformation, the dark pixels in an image are expanded compared to the higher
pixel values; the higher pixel values are compressed. This results in the following image
enhancement. Another way of stating it:
Log transformations: Enhance details in the darker regions of an image at the expense of detail in brighter
regions.
s = C * log(1 + r)
• Here C is constant and r ≥ 0.
• The shape of the curve shows that this transformation maps the narrow range of low gray level values
in the input image into a wider range of output image.
• The opposite is true for high level values of input image.
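A minimal NumPy sketch of the log transformation, with c chosen so the output again spans [0, 255] (the scaling choice and helper name are assumptions, not from the notes):

import numpy as np

def log_transform(img):
    # s = c * log(1 + r), with c scaling the output back to [0, 255]
    r = img.astype(np.float64)
    c = 255.0 / np.log(1.0 + r.max())   # assumes r.max() > 0
    s = c * np.log(1.0 + r)
    return s.astype(np.uint8)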
Fig. 2.13 Plot of the equation s = c·r^γ for various values of γ (c = 1 in all cases).
This type of transformation is used for enhancing images for different types of display devices.
The gamma of different display devices is different. Varying gamma (γ) gives a family of
possible transformation curves s = C·r^γ, where C and γ are positive constants.
From the plot of s versus r for various values of γ:
γ > 1 compresses dark values and expands bright values.
γ < 1 (similar to the log transformation) expands dark values and compresses bright values.
When C = γ = 1, it reduces to the identity transformation.
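A minimal NumPy sketch of the power-law transformation, normalizing r to [0, 1] before applying the exponent (the helper name and normalization are illustrative assumptions):

import numpy as np

def gamma_transform(img, gamma, c=1.0):
    # s = c * r^gamma, with r normalized to [0, 1] before the power law
    r = img.astype(np.float64) / 255.0
    s = c * np.power(r, gamma)
    return np.clip(s * 255.0, 0, 255).astype(np.uint8)

# gamma < 1 expands dark values; gamma > 1 compresses them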
Correcting gamma:
Fig. x Contrast stretching. (a) Form of transformation function. (b) A low-contrast stretching.
(c) Result of contrast stretching. (d) Result of thresholding
Figure x(b) shows an 8-bit image with low contrast. Fig. x(c) shows the result of contrast
stretching, obtained by setting (r1, s1 )=(rmin, 0) and (r2, s2)=(rmax,L-1) where rmin and rmax
denote the minimum and maximum gray levels in the image, respectively.
Thus, the transformation function stretched the levels linearly from their original range to the
full range
[0, L-1] . Finally, Fig. x(d) shows the result of using the thresholding function defined
previously, with r1=r2=m, the mean gray level in the image. The original image on which these
results are based is a scanning electron microscope image of pollen, magnified approximately
700 times.
Gray-level slicing:
Highlighting a specific range of gray levels in an image often is desired. Applications include
enhancing features such as masses of water in satellite imagery and enhancing flaws in X-ray
images.
There are several ways of doing level slicing, but most of them are variations of two basic
themes. One approach is to display a high value for all gray levels in the range of interest and a
low value for all other gray levels.
This transformation, shown in Fig. y(a), produces a binary image. The second approach, based
on the transformation shown in Fig .y (b), brightens the desired range of gray levels but
preserves the background and gray-level tonalities in the image. Figure y (c) shows a gray-scale
image, and Fig. y(d) shows the result of using the transformation in Fig. y(a).Variations of the
two transformations shown in Fig. are easy to formulate.
Bit-plane slicing:
Instead of highlighting gray-level ranges, highlighting the contribution made to total image
appearance by specific bits might be desired. Suppose that each pixel in an image is represented
by 8 bits. Imagine that the image is composed of eight 1-bit planes, ranging from bit-plane 0 for
the least significant bit to bit plane 7 for the most significant bit. In terms of 8-bit bytes, plane
0 contains all the lowest order bits in the bytes comprising the pixels in the image and plane 7
contains all the high-order bits.
Figure illustrates these ideas, and Fig. 3.14 shows the various bit planes for the image shown in
Fig. Note that the higher-order bits (especially the top four) contain the majority of the visually
significant data. The other bit planes contribute to more subtle details in the image. Separating
a digital image into its bit planes is useful for analyzing the relative importance played by each
bit of the image, a process that aids in determining the adequacy of the number of bits used to
quantize each pixel.
In terms of bit-plane extraction for an 8-bit image, it is not difficult to show that the (binary)
image for bit-plane 7 can be obtained by processing the input image with a thresholding gray-
level transformation function that (1) maps all levels in the image between 0 and 127 to one
level (for example, 0); and (2) maps all levels between 128 and 255 to another (for example,
255). The binary image for bit-plane 7 in Fig. was obtained in just this manner. It is left as an
exercise to obtain the gray-level transformation functions that would yield the other bit planes.
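A minimal NumPy sketch of bit-plane extraction, plus the equivalent thresholding view of plane 7 described above (the helper names are illustrative assumptions):

import numpy as np

def bit_plane(img, k):
    # plane k of an 8-bit image (0 = least significant, 7 = most significant)
    plane = (img >> k) & 1
    return (plane * 255).astype(np.uint8)   # scaled to {0, 255} for display

def bit_plane7_by_threshold(img):
    # thresholding view of plane 7: 0-127 -> 0, 128-255 -> 255
    return np.where(img >= 128, 255, 0).astype(np.uint8)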
Histogram Processing:
The histogram of a digital image with gray levels in the range [0, L-1] is a discrete function of
the form
H(rk) = nk
where rk is the kth gray level and nk is the number of pixels in the image having the level rk.
A normalized histogram is given by the equation p(rk) = nk/n for k = 0, 1, 2, ..., L-1, where p(rk)
gives an estimate of the probability of occurrence of gray level rk. The sum of all components of a
normalized histogram is equal to 1. Histogram plots are simply plots of H(rk) = nk versus rk.
In a dark image the components of the histogram are concentrated on the low (dark) side of
the gray scale. In a bright image the histogram components are biased towards the high
side of the gray scale. The histogram of a low contrast image will be narrow and will be centered
towards the middle of the gray scale.
The components of the histogram in the high contrast image cover a broad range of the gray
scale. The net effect of this will be an image that shows a great deal of gray levels details and
has high dynamic range.
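A minimal NumPy sketch of computing H(rk) = nk and the normalized histogram p(rk) = nk/n for an 8-bit image (the helper name is illustrative):

import numpy as np

def histogram(img, L=256):
    h = np.bincount(img.ravel(), minlength=L)   # nk for k = 0 .. L-1
    p = h / img.size                            # normalized histogram, sums to 1
    return h, p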
Histogram Equalization:
Histogram equalization is a common technique for enhancing the appearance of images.
Suppose we have an image which is predominantly dark. Then its histogram would be skewed
towards the lower end of the grey scale and all the image detail would be compressed into the dark
end of the histogram. If we could 'stretch out' the grey levels at the dark end to produce a more
uniformly distributed histogram, then the image would become much clearer.
Let r represent the gray levels of the image to be enhanced, treated as a continuous quantity. The
range of r is [0, 1], with r = 0 representing black and r = 1 representing white. The transformation
function is of the form
s = T(r), where 0 ≤ r ≤ 1
It produces a level s for every pixel value r in the original image.
The transformation function is assumed to fulfill two conditions: (a) T(r) is single-valued
and monotonically increasing in the interval 0 ≤ r ≤ 1; and (b) 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1.
The transformation function should be single-valued so that the inverse transformation exists.
The monotonically increasing condition preserves the increasing order from black to white in
the output image. The second condition guarantees that the output gray levels will be in the
same range as the input levels. The gray levels of the image may be viewed as random variables
in the interval [0, 1].
The most fundamental descriptor of a random variable is its probability density function (PDF)
Pr(r) and Ps(s) denote the probability density functions of random variables r and s respectively.
Basic results from elementary probability theory state that if Pr(r) and T(r) are known and
T⁻¹(s) satisfies condition (a), then the probability density function Ps(s) of the transformed
variable is given by the formula Ps(s) = Pr(r) |dr/ds|.
Thus the PDF of the transformed variable s is determined by the gray level PDF of the
input image and by the chosen transformation function.
A transformation function of particular importance in image processing is the cumulative
distribution function of r: s = T(r) = ∫ from 0 to r of Pr(w) dw.
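A minimal NumPy sketch of the discrete form of this idea, mapping each gray level through the scaled cumulative distribution of the histogram (the helper name is illustrative):

import numpy as np

def equalize(img, L=256):
    # discrete histogram equalization: s_k = (L-1) * cumulative sum of p_r(r_j)
    h = np.bincount(img.ravel(), minlength=L)
    cdf = np.cumsum(h) / img.size
    mapping = np.round((L - 1) * cdf).astype(np.uint8)
    return mapping[img]   # apply T(r) to every pixel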
where P and Q are the padded sizes from the basic equations. Wraparound error in circular
convolution can be avoided by padding these functions with zeros.
Visualization: ideal low pass filter:
As shown in fig. below
Fig: ideal low pass filter 3-D view and 2-D view and line graph. Effect of different cutoff frequencies:
Fig. below (a) Test pattern of size 688x688 pixels, and (b) its Fourier spectrum. The spectrum
is double the image size due to padding but is shown in half size so that it fits in the page. The
superimposed circles have radii equal to 10, 30, 60, 160 and 460 with respect to the full- size
spectrum image. These radii enclose 87.0, 93.1, 95.7, 97.8 and 99.2% of the padded image
power respectively.
Fig: (a) Test pattern of size 688x688 pixels (b) its Fourier spectrum
Fig: (a) original image, (b)-(f) Results of filtering using ILPFs with cutoff frequencies set at radii
values 10, 30, 60, 160 and 460, as shown in the figure. The power removed by these filters was 13,
6.9, 4.3, 2.2 and 0.8% of the total, respectively.
As the cutoff frequency decreases,
• the image becomes more blurred
• ringing becomes more pronounced
• the effect is analogous to using larger spatial filter sizes
The severe blurring in this image is a clear indication that most of the sharp detail information
in the picture is contained in the 13% of power removed by the filter. As the filter radius
increases, less and less power is removed, resulting in less blurring.
Why is there ringing?
Ideal low-pass filter function is a rectangular function
The inverse Fourier transform of a rectangular function is a sinc function.
Fig. Spatial representation of ILPFs
Fig. Spatial representation of ILPFs of order 1 and 20 and corresponding intensity profiles
through the center of the filters( the size of all cases is 1000x1000 and the cutoff frequency is
5), observe how ringing increases as a function of filter order.
Butterworth low-pass filter:
The transfer function of a Butterworth low-pass filter (BLPF) of order n, with cutoff frequency
at a distance D0 from the origin, is defined as
H(u,v) = 1 / [1 + (D(u,v)/D0)^(2n)]
The transfer function does not have a sharp discontinuity establishing a cutoff between passed and
filtered frequencies.
The cutoff frequency D0 defines the point at which H(u,v) = 0.5.
Fig. (a) perspective plot of a Butterworth lowpass-filter transfer function. (b) Filter displayed
as an image. (c)Filter radial cross sections of order 1 through 4.Unlike the ILPF, the BLPF
transfer function does not have a sharp discontinuity that gives a clear cutoff between passed
and filtered frequencies.
Fig. (a)-(d) Spatial representation of BLPFs of order 1, 2, 5 and 20 and corresponding intensity
profiles through the center of the filters (the size in all cases is 1000 x 1000 and the cutoff
frequency is 5) Observe how ringing increases as a function of filter order.
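A minimal NumPy sketch of building a BLPF transfer function and applying it through the 2-D FFT (padding is omitted for brevity; the names are illustrative assumptions, not from the notes):

import numpy as np

def butterworth_lowpass(shape, D0, n):
    # H(u,v) = 1 / (1 + (D(u,v)/D0)^(2n)), centered on the frequency rectangle
    P, Q = shape
    u = np.arange(P) - P / 2
    v = np.arange(Q) - Q / 2
    V, U = np.meshgrid(v, u)
    D = np.sqrt(U ** 2 + V ** 2)
    return 1.0 / (1.0 + (D / D0) ** (2 * n))

def filter_image(img, H):
    # multiply the centered spectrum by H and transform back
    F = np.fft.fftshift(np.fft.fft2(img))
    g = np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))
    return g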
Gaussian low pass filters:
The form of these filters in two dimensions is given by
H(u,v) = e^(−D²(u,v)/(2·D0²))
where D(u,v) is the distance from the center of the frequency rectangle and D0 is the cutoff frequency.
Fig.(a) Original image. (b)-(f) Results of filtering using GLPFs with cutoff frequencies at the radii.
Fig. (a) Original image (784x 732 pixels). (b) Result of filtering using a GLPF with D0 = 100. (c) Result of filtering using a
GLPF with D0 = 80. Note the reduction in fine skin lines in the magnified sections in (b) and (c).
The figure shows an application of low-pass filtering for producing a smoother, softer-looking result
from a sharp original. For human faces, the typical objective is to reduce the sharpness of fine
skin lines and small blemishes.
Image sharpening using frequency domain filters:
An image can be smoothed by attenuating the high-frequency components of its Fourier
transform. Because edges and other abrupt changes in intensities are associated with high-
frequency components, image sharpening can be achieved in the frequency domain by high pass
filtering, which attenuates the low-frequency components without disturbing high- frequency
information in the Fourier transform.
The filter functions H(u,v) are understood to be discrete functions of size PxQ; that is, the discrete
frequency variables are in the range u = 0,1,2,…,P-1 and v = 0,1,2,…,Q-1.
The meaning of sharpening is
• Edges and fine detail characterized by sharp transitions in image intensity
• Such transitions contribute significantly to high frequency components of Fourier transform
• Intuitively, attenuating certain low frequency components and preserving high frequency
components result in sharpening.
• The intended goal is to do the reverse operation of low-pass filters: when a low-pass filter
attenuates frequencies, a high-pass filter passes them, and when a high-pass filter attenuates
frequencies, a low-pass filter passes them.
A high-pass filter is obtained from a given low-pass filter using the equation
Hhp(u,v) = 1 − Hlp(u,v)
where Hlp(u,v) is the transfer function of the low-pass filter. That is, when the low-pass filter
attenuates frequencies, the high-pass filter passes them, and vice versa. We consider ideal,
Butterworth, and Gaussian high-pass filters. As in the previous
section, we illustrate the characteristics of these filters in both the frequency and spatial
domains. Fig. Shows typical 3-D plots, image representations and cross sections for these filters.
As before, we see that the Butter-worth filter represents a transition between the sharpness of
the ideal filter and the broad smoothness of the Gaussian filter. Fig. discussed in the sections
the follow, illustrates what these filters look like in the spatial domain. The spatial filters were
obtained and displayed by using the procedure used.
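A minimal sketch of this low-pass/high-pass relationship, reusing the Butterworth low-pass transfer function sketched earlier (illustrative names; not from the notes):

def highpass_from_lowpass(H_lp):
    # Hhp(u,v) = 1 - Hlp(u,v)
    return 1.0 - H_lp

# e.g. a Butterworth high-pass transfer function from the earlier BLPF sketch:
# H_bhp = highpass_from_lowpass(butterworth_lowpass(img.shape, D0=60, n=2))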
Fig: Top row: Perspective plot, image representation, and cross section of a typical ideal high-
pass filter. Middle and bottom rows: The same sequence for typical butter-worth and Gaussian
high-pass filters.
Ideal high-pass filter:
A 2-D ideal high-pass filter (IHPF) is defined as
H(u,v) = 0 if D(u,v) ≤ D0, and H(u,v) = 1 if D(u,v) > D0
Where D0 is the cutoff frequency and D(u,v) is given by eq. As intended, the IHPF is the
opposite of the ILPF in the sense that it sets to zero all frequencies inside a circle of radius D0
while passing, without attenuation, all frequencies outside the circle. As in case of the ILPF,
the IHPF is not physically realizable.
Spatial representation of high pass filters:
Fig. Spatial representation of typical (a) ideal (b) Butterworth and (c) Gaussian frequency
domain high-pass filters, and corresponding intensity profiles through their centers. We can
expect IHPFs to have the same ringing properties as ILPFs. This is demonstrated clearly in Fig.,
which consists of various IHPF results using the original image in Fig.(a) with D0 set to 30,
60, and 160 pixels, respectively. The ringing in Fig. (a) is so severe that it produced distorted,
thickened object boundaries (e.g., look at the large letter “a”). Edges of the top three circles do
not show well because they are not as strong as the other edges in the image (the intensity of
these three objects is much closer to the background intensity, giving discontinuities of smaller
magnitude).
Fig. Results of high-pass filtering the image in Fig.(a) using an IHPF with D0 = 30, 60, and
160. The situation improved somewhat with D0 = 60. Edge distortion is still quite evident, but
now we begin to see filtering on the smaller objects.
Due to the now familiar inverse relationship between the frequency and spatial domains, we
know that the spot size of this filter is smaller than the spot of the filter with D0 = 30. The result
for D0 = 160 is closer to what a high-pass filtered image should look like. Here, the edges are
much cleaner and less distorted, and the smaller objects have been filtered properly. Of course,
the constant background in all images is zero in these high-pass filtered images, because
high-pass filtering is analogous to differentiation in the spatial domain.
Butter-worth high-pass filters:
A 2-D Butterworth high-pass filter (BHPF) of order n and cutoff frequency D0 is defined as
H(u,v) = 1 / [1 + (D0/D(u,v))^(2n)]
where D(u,v) is given by Eq.(3). This expression follows directly from Eq.(6). The middle row
of Fig. shows an image and cross section of the BHPF function.
Butterworth high-pass filters behave more smoothly than IHPFs. The figure shows the performance of
a BHPF of order 2 with D0 set to the same values as before. The boundaries are
much less distorted than in the IHPF results, even for the smallest value of cutoff frequency.
Filtered results: BHPF:
Fig. Results of high-pass filtering the image in above figure (a) using a BHPF of order 2 with
D0 = 30, 60, and 160 corresponding to the circles in above figure (b). These results are much
smoother than those obtained with an IHPF.
Gaussian high-pass filters:
The transfer function of the Gaussian high-pass filter (GHPF) with cutoff frequency locus at a
distance D0 from the center of the frequency rectangle is given by
H(u,v) = 1 − e^(−D²(u,v)/(2·D0²))
where D(u,v) is given by Eq.(4). This expression follows directly from Eqs.(2) and (6). The
third row in the figure below shows a perspective plot, image and cross section of the GHPF
function. Following the same format as for the BHPF, we show in the figure below comparable
results using GHPFs. As expected, the results obtained are more gradual than with the previous
two filters.
MODULE – III
IMAGE RESTORATION AND FILTERING
At the end of the unit students are able to:
CO | Course Outcomes | Knowledge Level (Bloom's Taxonomy)
CO 3 | Apply region and edge based image segmentation techniques for detection of objects in images. | Apply
CO 4 | Interpret morphological operations for extracting image components to represent and describe region shape. | Apply
PROGRAM OUTCOMES AND PROGRAM SPECIFIC OUTCOMES MAPPED WITH MODULE III
PO 1 Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex
engineering problems.
Gray level interpolation.
The distortion correction equations yield non integer values for x' and y'. Because the
distorted image g is digital, its pixel values are defined only at integer coordinates. Thus using
non integer values for x' and y' causes a mapping into locations of g for which no gray levels
are defined. Inferring what the gray-level values at those locations should be, based only on the
pixel values at integer coordinate locations, then becomes necessary. The technique used to
accomplish this is called gray-level interpolation. The simplest scheme for gray-level
interpolation is based on a nearest neighbor approach. This method, also called zero-order
interpolation, is illustrated in Fig. 3.1. The figure shows (A) the mapping of integer (x, y) coordinates
into fractional coordinates (x', y') by means of the distortion-correction equations, and the selection
of the closest integer coordinate neighbor to (x', y'),
and
(B) the assignment of the gray level of this nearest neighbor to the pixel located at (x, y).
Although nearest neighbor interpolation is simple to implement, this method often has
the drawback of producing undesirable artifacts, such as distortion of straight edges in images
of high resolution. Smoother results can be obtained by using more sophisticated techniques,
such as cubic convolution interpolation, which fits a surface of the sin(z)/z type through a much
larger number of neighbors (say, 16) in order to obtain a smooth estimate of the gray level at any desired point. Typical areas in which smoother approximations generally are required include 3-D graphics and medical imaging. The price paid for smoother approximations is additional computational burden. For general-purpose image processing, a bilinear interpolation approach
that uses the gray levels of the four nearest neighbors usually is adequate. This approach is
straightforward. Because the gray level of each of the four integral nearest neighbors of a non
integral pair of coordinates (x', y') is known, the gray-level value at these coordinates, denoted
v(x', y'), can be interpolated from the values of its neighbors by using the relationship
v(x', y') = ax' + by' + cx'y' + d
where the four coefficients a, b, c and d are easily determined from the four equations in four unknowns that can be written using the four known neighbors of (x', y'). When these coefficients have been determined, v(x', y') is computed and this value is assigned to the location in f(x, y) that yielded
the spatial mapping into location (x', y'). It is easy to visualize this procedure with the aid of
Fig.3.1. The exception is that, instead of using the gray-level value of the nearest neighbor to
(x', y'), we actually interpolate a value at location (x', y') and use this value for the gray-level
assignment at (x, y).
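A minimal sketch of bilinear interpolation at a fractional location, assuming the distorted image g is a NumPy array and (xp, yp) lies in its interior; the function name is hypothetical.

import numpy as np

def bilinear(g, xp, yp):
    """Bilinear interpolation of image g at non-integer coordinates (xp, yp).

    The gray level is modelled as v(x', y') = a*x' + b*y' + c*x'*y' + d,
    whose coefficients are fixed by the four integer neighbours."""
    x0, y0 = int(np.floor(xp)), int(np.floor(yp))
    x1, y1 = x0 + 1, y0 + 1
    dx, dy = xp - x0, yp - y0
    # weighted average of the four nearest neighbours
    return ((1 - dx) * (1 - dy) * g[x0, y0] +
            dx       * (1 - dy) * g[x1, y0] +
            (1 - dx) * dy       * g[x0, y1] +
            dx       * dy       * g[x1, y1])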
Minimum mean square error (Wiener) filtering:
The objective is to find an estimate f^ of the uncorrupted image f such that the mean square error between them is minimized, where the error measure is
e² = E{ (f − f^)² }
where E {•} is the expected value of the argument. It is assumed that the noise and the image
are uncorrelated; that one or the other has zero mean; and that the gray levels in the estimate
are a linear function of the levels in the degraded image. Based on these conditions, the
minimum of the error function is given in the frequency domain by the expression
F^(u, v) = [ H*(u, v) Sf(u, v) / ( Sf(u, v) |H(u, v)|² + Sη(u, v) ) ] G(u, v)
         = [ (1 / H(u, v)) · |H(u, v)|² / ( |H(u, v)|² + Sη(u, v) / Sf(u, v) ) ] G(u, v)
where we used the fact that the product of a complex quantity with its conjugate is equal to the
magnitude of the complex quantity squared. This result is known as the Wiener filter, after N.
Wiener [1942], who first proposed the concept in the year shown. The filter, which consists of
the terms inside the brackets, also is commonly referred to as the minimum mean square error
filter or the least square error filter. The Wiener filter does not have the same problem as the
inverse filter with zeros in the degradation function, unless both H(u, v) and Sη(u, v) are zero
for the same value(s) of u and v. The terms in the above equation are as follows:
H(u, v) = degradation function
H*(u, v) = complex conjugate of H(u, v)
|H(u, v)|² = H*(u, v) H(u, v)
Sη(u, v) = |N(u, v)|² = power spectrum of the noise
Sf(u, v) = |F(u, v)|² = power spectrum of the undegraded image
As before, H (u, v) is the transform of the degradation function and G (u, v) is the transform of
the degraded image. The restored image in the spatial domain is given by the inverse Fourier
transform of the frequency-domain estimate F^(u, v). Note that if the noise is zero, then the
noise power spectrum vanishes and the Wiener filter reduces to the inverse filter. When we are
dealing with spectrally white noise, the spectrum |N(u, v)|² is a constant, which simplifies
things considerably. However, the power spectrum of the undegraded image seldom is known.
An approach used frequently when these quantities are not known or cannot be estimated is to approximate the expression as
F^(u, v) = [ (1 / H(u, v)) · |H(u, v)|² / ( |H(u, v)|² + K ) ] G(u, v)
where K is a specified constant.
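A short sketch of the parametric (constant-K) Wiener filter just described, assuming the degradation transfer function H is supplied in the same unshifted frequency layout produced by np.fft.fft2; the function name and the default K are illustrative only.

import numpy as np

def wiener_deconvolve(g, H, K=0.01):
    """Parametric Wiener filter: Sη/Sf is approximated by the constant K.

    g : degraded image (spatial domain); H : degradation transfer function,
    same shape as the image spectrum; K : noise-to-signal ratio estimate."""
    G = np.fft.fft2(g)
    H2 = np.abs(H)**2
    F_hat = (np.conj(H) / (H2 + K)) * G        # equivalent to (1/H)*|H|^2/(|H|^2+K)
    return np.real(np.fft.ifft2(F_hat))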
The underlying degradation model in the spatial domain is
g(x, y) = h(x, y) * f(x, y) + η(x, y)
where h(x, y) is the spatial representation of the degradation function and the symbol * indicates convolution. Because convolution in the spatial domain is equal to multiplication in the frequency domain, the model may equivalently be written as
G(u, v) = H(u, v) F(u, v) + N(u, v)
where the terms in capital letters are the Fourier transforms of the corresponding spatial-domain terms.
Restoration in the presence of noise only can be carried out with spatial filters, which fall into three broad families:
1. Mean filters
2. Order-statistic filters and
3. Adaptive filters
Mean filters.
There are four types of mean filters: the arithmetic mean, geometric mean, harmonic mean, and contraharmonic mean filters. The arithmetic mean filter replaces the pixel at (x, y) by the average of the gray levels in the window Sxy,
f^(x, y) = (1 / mn) Σ_(s,t)∈Sxy g(s, t)
This operation can be implemented using a convolution mask in which all coefficients have value 1/mn.
The harmonic mean filter,
f^(x, y) = mn / Σ_(s,t)∈Sxy ( 1 / g(s, t) )
works well for salt noise, but fails for pepper noise. It also does well with other types of noise, such as Gaussian noise.
The contraharmonic mean filter yields a restored image based on the expression
f^(x, y) = Σ_(s,t)∈Sxy g(s, t)^(Q+1) / Σ_(s,t)∈Sxy g(s, t)^Q
where Q is called the order of the filter. This filter is well suited for reducing or virtually
eliminating the effects of salt-and-pepper noise. For positive values of Q, the filter eliminates
pepper noise. For negative values of Q it eliminates salt noise. It cannot do both
simultaneously. Note that the contraharmonic filter reduces to the arithmetic mean filter if Q
= 0, and to the harmonic mean filter if Q = -1.
The best-known order-statistic filter is the median filter, which replaces the value of a pixel by the median of the gray levels in the neighborhood Sxy of that pixel,
f^(x, y) = median_(s,t)∈Sxy { g(s, t) }
The original value of the pixel is included in the computation of the median. Median filters
are quite popular because, for certain types of random noise, they provide excellent noise-
reduction capabilities, with considerably less blurring than linear smoothing filters of
similar size. Median filters are particularly effective in the presence of both bipolar and
unipolar impulse noise.
The 100th percentile filter is the max filter, f^(x, y) = max_(s,t)∈Sxy { g(s, t) }. This filter is useful for finding the brightest points in an image. Also, because pepper noise has very low values, it is reduced by this filter as a result of the max selection process in the subimage area Sxy. The 0th percentile filter is the min filter, f^(x, y) = min_(s,t)∈Sxy { g(s, t) }.
This filter is useful for finding the darkest points in an image. Also, it reduces salt noise as a
result of the min operation.
The alpha-trimmed mean filter deletes the d/2 lowest and the d/2 highest gray-level values of g(s, t) in Sxy and averages the remaining mn − d pixels, denoted gr(s, t):
f^(x, y) = ( 1 / (mn − d) ) Σ_(s,t)∈Sxy gr(s, t)
where the value of d can range from 0 to mn − 1. When d = 0, the alpha-trimmed filter
reduces to the arithmetic mean filter. If d = (mn − 1)/2, the filter becomes a median filter. For
other values of d, the alpha-trimmed filter is useful in situations involving multiple types of
noise, such as a combination of salt-and-pepper and Gaussian noise.
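The order-statistic filters above can be sketched with a single windowed routine; the brute-force loop below is illustrative only (window size, the parameter d and the function name are assumptions), not an optimized implementation.

import numpy as np

def order_statistic_filter(g, m=3, n=3, kind="median", d=2):
    """Median, max, min and alpha-trimmed mean filters on an m x n window."""
    M, N = g.shape
    pm, pn = m // 2, n // 2
    padded = np.pad(g, ((pm, pm), (pn, pn)), mode="edge")
    out = np.zeros_like(g, dtype=float)
    for x in range(M):
        for y in range(N):
            window = np.sort(padded[x:x + m, y:y + n].ravel())
            if kind == "median":
                out[x, y] = window[len(window) // 2]
            elif kind == "max":          # brightest points / reduces pepper noise
                out[x, y] = window[-1]
            elif kind == "min":          # darkest points / reduces salt noise
                out[x, y] = window[0]
            else:                        # alpha-trimmed: drop d/2 lowest and d/2 highest
                out[x, y] = window[d // 2: len(window) - d // 2].mean()
    return out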
The adaptive, local noise reduction filter operates on a local region Sxy. The response of the filter at any point (x, y) on which the region is centered is based on four quantities: (a) g(x, y), the value of the noisy image at (x, y); (b) σ²η, the variance of the noise corrupting f(x, y) to form g(x, y); (c) mL, the local mean of the pixels in Sxy; and (d) σ²L, the local variance of the pixels in Sxy. The behavior of the filter is to be as follows:
1. If σ²η is zero, the filter should return simply the value of g(x, y). This is the trivial, zero-noise
case in which g (x, y) is equal to f (x, y).
2. If the local variance is high relative to σ2η the filter should return a value close to g (x, y). A
high local variance typically is associated with edges, and these should be preserved.
3. If the two variances are equal, we want the filter to return the arithmetic mean value of the
pixels in Sxy. This condition occurs when the local area has the same properties as the overall
image, and local noise is to be reduced simply by averaging. An adaptive expression for obtaining f^(x, y) based on these assumptions may be written as
f^(x, y) = g(x, y) − (σ²η / σ²L) [ g(x, y) − mL ]
The only quantity that needs to be known or estimated is the variance of the overall noise, σ²η.
The other parameters are computed from the pixels in Sxy at each location (x, y) on which the
filter window is centered.
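A sketch of the adaptive, local noise reduction rule f^(x, y) = g(x, y) − (σ²η/σ²L)[g(x, y) − mL], assuming the overall noise variance is supplied; window size and function name are illustrative.

import numpy as np

def adaptive_local_filter(g, noise_var, m=7, n=7):
    """Adaptive, local noise-reduction filter following the three rules above."""
    M, N = g.shape
    pm, pn = m // 2, n // 2
    padded = np.pad(g.astype(float), ((pm, pm), (pn, pn)), mode="edge")
    out = np.zeros((M, N))
    for x in range(M):
        for y in range(N):
            S = padded[x:x + m, y:y + n]
            local_mean = S.mean()
            local_var = S.var()
            ratio = noise_var / local_var if local_var > 0 else 1.0
            ratio = min(ratio, 1.0)      # avoid over-correction when the estimate is poor
            out[x, y] = g[x, y] - ratio * (g[x, y] - local_mean)
    return out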
The adaptive median filter works with a window Sxy whose size is allowed to grow. The notation used is:
zmin = minimum gray level value in Sxy
zmax = maximum gray level value in Sxy
zmed = median of the gray levels in Sxy
zxy = gray level at coordinates (x, y)
Smax = maximum allowed size of Sxy
The adaptive median filtering algorithm works in two levels, denoted level A and level B, as follows:
Level A: A1 = zmed - zmin
A2 = zmed - zmax
If A1 > 0 AND A2 < 0, go to level B; else increase the window size.
If the window size is less than or equal to Smax, repeat level A; else output zmed.
Level B: B1 = zxy - zmin
B2 = zxy - zmax
If B1 > 0 AND B2 < 0, output zxy; else output zmed.
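A sketch of the two-level adaptive median procedure above, with assumed starting and maximum window sizes; it follows the level A and level B tests literally and is not optimized.

import numpy as np

def adaptive_median(g, S_init=3, S_max=7):
    """Adaptive median filter: grow the window (level A) until the median is
    not an impulse, then decide (level B) whether the centre pixel is an impulse."""
    M, N = g.shape
    pad = S_max // 2
    padded = np.pad(g, pad, mode="edge")
    out = g.astype(float)
    for x in range(M):
        for y in range(N):
            s = S_init
            while True:
                half = s // 2
                cx, cy = x + pad, y + pad
                window = padded[cx - half:cx + half + 1, cy - half:cy + half + 1]
                zmin, zmax, zmed = window.min(), window.max(), np.median(window)
                zxy = padded[cx, cy]
                if zmin < zmed < zmax:                             # level A passed
                    out[x, y] = zxy if zmin < zxy < zmax else zmed  # level B decision
                    break
                s += 2                                             # enlarge the window
                if s > S_max:
                    out[x, y] = zmed                               # window limit reached
                    break
    return out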
An image f(x, y) can be characterized by two components: the amount of source illumination incident on the scene being viewed and the amount of illumination reflected by the objects in the scene. Appropriately, these are called the illumination and reflectance components and are denoted by i(x, y) and r(x, y), respectively. The two functions combine as a product to form f(x, y):
f (x, y) = i (x, y) r (x, y) …. (2)
where
0 < i (x, y) < ∞ …. (3)
and
0 < r (x, y) < 1 …. (4)
Equation (4) indicates that reflectance is bounded by 0 (total absorption) and 1 (total reflectance). The nature of i (x, y) is determined by the illumination source, and r (x, y) is
determined by the characteristics of the imaged objects. It is noted that these expressions also
are applicable to images formed via transmission of the illumination through a medium, such
as a chest X-ray.
Inverse filtering.
The simplest approach to restoration is direct inverse filtering, where an estimate F^(u, v) of the transform of the original image is computed simply by dividing the transform of the degraded image, G(u, v), by the degradation function:
F^(u, v) = G(u, v) / H(u, v)
Since G(u, v) = H(u, v) F(u, v) + N(u, v), it follows that
F^(u, v) = F(u, v) + N(u, v) / H(u, v)
This tells us that even if the degradation function is known, the undegraded image cannot be recovered exactly [by taking the inverse Fourier transform of F^(u, v)], because N(u, v) is a random function whose Fourier transform is not known.
If the degradation function has zero or very small values, then the ratio N(u, v)/H(u, v) could easily dominate the estimate F^(u, v).
One approach to get around the zero or small-value problem is to limit the filter
frequencies to values near the origin. H(0, 0) is equal to the average value of h(x, y), and this is usually the highest value of H(u, v) in the frequency domain. Thus, by limiting the analysis
to frequencies near the origin, the probability of encountering zero values is reduced.
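A sketch of inverse filtering restricted to a circle of radius D0 around the origin, assuming a centred (fftshifted) transfer function H of the same shape as the image; outside the cutoff the degraded spectrum is simply passed through, which is one possible choice.

import numpy as np

def inverse_filter(g, H, D0=40, eps=1e-6):
    """Direct inverse filtering, applied only to frequencies near the origin
    (a circle of radius D0) to avoid division by near-zero values of H."""
    M, N = g.shape
    G = np.fft.fftshift(np.fft.fft2(g))
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    V, U = np.meshgrid(v, u)
    D = np.sqrt(U**2 + V**2)
    F_hat = np.where(D <= D0, G / (H + eps), G)   # divide only inside the cutoff
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_hat)))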
Gaussian noise
Because of its mathematical tractability in both the spatial and frequency domains,
Gaussian (also called normal) noise models are used frequently in practice. In fact, this
tractability is so convenient that it often results in Gaussian models being used in situations in
which they are marginally applicable at best.
The PDF of a Gaussian random variable z is given by
p(z) = ( 1 / ( √(2π) σ ) ) e^( −(z − µ)² / (2σ²) ) … (1)
where z represents gray level, µ is the mean (average) value of z, and σ is its standard
deviation. The standard deviation squared, σ2, is called the variance of z. A plot of this function
is shown in Fig. 5.10. When z is described by Eq. (1), approximately 70% of its values will be
in the range [(µ - σ), (µ +σ)], and about 95% will be in the range [(µ - 2σ), (µ + 2σ)].
Rayleigh noise
The PDF of Rayleigh noise is given by
p(z) = (2 / b)(z − a) e^( −(z − a)² / b ) for z ≥ a, and p(z) = 0 for z < a
with mean µ = a + √(πb / 4) and variance σ² = b(4 − π) / 4.
Figure 5.10 shows a plot of the Rayleigh density. Note the displacement from the origin and
the fact that the basic shape of this density is skewed to the right. The Rayleigh density can be
quite useful for approximating skewed histograms.
Erlang (gamma) noise
The PDF of Erlang noise is given by
p(z) = ( a^b z^(b−1) / (b − 1)! ) e^(−az) for z ≥ 0, and p(z) = 0 for z < 0
where the parameters are such that a > 0, b is a positive integer, and "!" indicates factorial.
The mean and variance of this density are given by
µ=b/a
σ2 = b / a2
Exponential noise
The PDF of exponential noise is given by
p(z) = a e^(−az) for z ≥ 0, and p(z) = 0 for z < 0, where a > 0.
Its mean and variance are
µ=1/a
σ2 = 1 / a2
This PDF is a special case of the Erlang PDF, with b = 1.
Uniform noise
The PDF of uniform noise is given by
p(z) = 1 / (b − a) for a ≤ z ≤ b, and p(z) = 0 otherwise.
Its mean and variance are
µ = (a + b) / 2
σ² = (b − a)² / 12
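The noise models above can be sampled with NumPy's random generator; the following sketch (parameter names and defaults are illustrative) adds each kind of noise to an image held as a float array.

import numpy as np

rng = np.random.default_rng(0)

def add_noise(f, kind="gaussian", **p):
    """Add a sample of one of the noise models above to image f (float, [0, 255])."""
    f = f.astype(float)
    if kind == "gaussian":                       # mean mu, standard deviation sigma
        return f + rng.normal(p.get("mu", 0), p.get("sigma", 20), f.shape)
    if kind == "rayleigh":                       # skewed to the right
        return f + rng.rayleigh(p.get("scale", 20), f.shape)
    if kind == "exponential":                    # mean 1/a
        return f + rng.exponential(1.0 / p.get("a", 0.05), f.shape)
    if kind == "uniform":                        # uniform on [a, b]
        return f + rng.uniform(p.get("a", -20), p.get("b", 20), f.shape)
    if kind == "salt_and_pepper":                # bipolar impulse noise
        g = f.copy()
        u = rng.random(f.shape)
        prob = p.get("prob", 0.05)
        g[u < prob / 2] = 0
        g[u > 1 - prob / 2] = 255
        return g
    raise ValueError(kind)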
(ii) Image enhancement can be implemented by spatial- and frequency-domain techniques, whereas image restoration can be implemented by frequency-domain and algebraic techniques.
(iii) The computational complexity for image enhancement is relatively low compared to that for image restoration, since the algebraic methods used in restoration require the manipulation of a large number of simultaneous equations. Under some conditions, however, the computational complexity can be reduced to the same level as that required by traditional frequency-domain techniques.
(iv) Image enhancement techniques are problem oriented, whereas image restoration techniques are
general and are oriented towards modeling the degradation and applying the reverse process in
order to reconstruct the original image.
(v) Masks are used in spatial domain methods for image enhancement, whereas masks are not used
for image restoration techniques.
(vi) Contrast stretching is considered an image enhancement technique because it is based on the pleasing aspects it presents to the viewer, whereas removal of image blur by applying a deblurring function is considered an image restoration technique.
The L-R algorithm is based on a maximum-likelihood formulation in which the image is modeled with Poisson statistics. Maximizing the likelihood of the model yields an equation that is satisfied when the following iteration converges:
f^_(k+1)(x, y) = f^_k(x, y) [ h(−x, −y) * ( g(x, y) / ( h(x, y) * f^_k(x, y) ) ) ]
Here * denotes convolution, g is the degraded image, and h is the point spread function (PSF).
The factor f^ that appears in the denominator on the right-hand side makes the iteration non-linear. Because the algorithm is a non-linear restoration method, it is simply stopped when a satisfactory result is obtained. The basic syntax of the function deconvlucy, with which the L-R algorithm is implemented, is
fr = deconvlucy(g, psf, NUMIT, DAMPAR, WEIGHT)
Here g is the degraded image, psf is the point spread function, NUMIT is the number of iterations, and the remaining parameters are as follows:
DAMPAR
The DAMPAR parameter is a scalar that specifies the threshold deviation of the resulting image from the degraded image g. Iterations are suppressed for pixels that deviate from their original value by less than DAMPAR, so as to reduce noise generation while preserving essential image information.
WEIGHT
The WEIGHT parameter assigns a weight to every pixel. It is an array of the same size as the degraded image g. A pixel that leads to an improper (bad) result can be excluded from the solution by assigning it a weight of 0. Pixels may also be given weights reflecting the flat-field correction that is appropriate for the image array. Weights are used in applications such as blurring with a specified PSF; they are used to exclude pixels at the boundary of the image, which are blurred differently by the PSF. If the array size of the PSF is n x n, then the border of zero weights used is ceil(n / 2) pixels wide.
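For comparison, a plain Richardson-Lucy iteration (without DAMPAR or WEIGHT handling) can be sketched directly in NumPy/SciPy; the flat initial estimate and the small stabilizing constant are assumptions of this sketch.

import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(g, psf, num_iter=10):
    """Plain Richardson-Lucy (Lucy-Richardson) iteration, no damping or weights."""
    f = np.full_like(g, g.mean(), dtype=float)    # flat initial estimate
    psf_flipped = psf[::-1, ::-1]                 # h(-x, -y)
    for _ in range(num_iter):
        blurred = fftconvolve(f, psf, mode="same") + 1e-12
        ratio = g / blurred
        f = f * fftconvolve(ratio, psf_flipped, mode="same")
    return f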
MODULE-IV
COLOR IMAGE PROCESSING
4.1 Introduction:
As stated earlier, only energy within a certain frequency/wavelength range is measured. This wavelength range is denoted the visual spectrum. In the human eye this measurement is done by the so-called rods,
which are specialized nerve-cells that act as photoreceptors. Besides the rods, the human eye also
contains cones. These operate like the rods, but are not sensitive to all wavelengths in the visual
spectrum. Instead, the eye contains three types of cones, each sensitive to a different wavelength
range. The human brain interprets the output from these different cones as different colors as seen in
Table . So, a color is defined by a certain wavelength in the electromagnetic spectrum as illustrated.
Since the three different types of cones exist we have the notion of the primary colors being red,
green and blue. Psycho-visual experiments have shown that the different cones have different
sensitivity. This means that when you see two different colors with the same intensity, you will judge
their brightness differently. On average, a human perceives red as being 2.6 times as bright as blue
and green as being 5.6 times as bright as blue. Hence the eye is more sensitive to green and least
sensitive to blue. When all wavelengths (all colors) are present at the same time, the eye perceives
this as a shade of gray, hence no color is seen! If the energy level increases the shade becomes brighter and ultimately becomes white. Conversely, when the energy level decreases, the shade becomes darker and ultimately becomes black.
A color camera is based on the same principle as the human eye. That is, it measures the
amount of incoming red light, green light and blue light, respectively. This is done in one
of two ways depending on the number of sensors in the camera. In the case of three
sensors, each sensor measures one of the three colors, respectively. This is done by
splitting the incoming light into the three wavelength ranges using some optical filters and
mirrors. So red light is only sent to the “red-sensor” etc. The result is three images each
describing the amount of red, green and blue light per pixel, respectively. In a color image,
each pixel therefore consists of three values: red, green and blue. The actual representation
might be three images—one for each color, as illustrated in Fig. 3.4, but it can also be a 3-
dimensional vector for each pixel, hence an image of vectors. Such a vector looks like this:
Color pixel = [Red,Green,Blue]=[R,G,B]
In terms of programming a color pixel is usually represented as a struct. Say we want to set
the RGB values of the pixel at position (2, 4) to: Red = 100, Green = 42, and Blue = 10,
respectively. In C-code this can for example be written as
f[2][4].R = 100;
f[2][4].G = 42;
f[2][4].B = 10;
Typically each color value is represented by an 8-bit (one byte) value meaning that
256 different shades of each color can be measured. Combining different values of
the three colors, each pixel can represent 256³ = 16,777,216 different colors. A
cheaper alternative to having three sensors including mirrors and optical filters is to
only have one sensor. In this case, each cell in the sensor is made sensitive to one of
the three colors (ranges of wavelength). This can be done in a number of different
ways. One is using a Bayer pattern. Here 50% of the cells are sensitive to green,
while the remaining cells are divided equally between red and blue. The reason
being, as mentioned above, that the human eye is more sensitive to green. The layout
of the different cells is illustrated in Fig. 3.5. The figure shows the upper-left corner
of the sensor, where the letters illustrate which color a particular pixel is sensitive to.
This means that each pixel only captures one color and that the two other colors of
a particular pixel must be inferred from the neighbors. Algorithms for finding the
remaining colors of a pixel are known as demosaicing and, generally speaking, the
algorithms are characterized by the required processing time (often directly
proportional to the number of neighbors included) and the quality of the output. The
higher the processing time the better the result. How to balance these two issues is up
to the camera manufactures, and in general, the higher the quality of the camera, the
higher the cost. Even very advanced algorithms are not as good as a three-sensor
color camera and note that when using, for example, a cheap web-camera, the quality
of the colors might not be too good and care should be taken before using the colors
for any processing. Regardless of the choice of demosaicing algorithm, the output
is the same as when using three sensors, namely Eq. 3.1. That is, even though only
one color is measured per pixel, the output for each pixel will (after demosaicing)
consist of three values: R, G, and B. An example of a simple demosaicing algorithm
is to infer the missing colors from the nearest pixels, for example using the following
set of equations:
g(x, y):
[R, G, B]_B  = [f(x + 1, y + 1), f(x + 1, y), f(x, y)]
[R, G, B]_GB = [f(x, y + 1), f(x, y), f(x − 1, y)]
[R, G, B]_GR = [f(x + 1, y), f(x, y), f(x, y − 1)]
[R, G, B]_R  = [f(x, y), f(x − 1, y), f(x − 1, y − 1)]    (3.2)
where f(x, y) is the input image (Bayer pattern) and g(x, y) is the output RGB image. The RGB values in the
output image are found differently depending on which color a particular pixel is
sensitive to: [R,G,B]B should be used for the pixels sensitive to blue, [R,G,B]R
should be used for the pixels sensitive to red, and [R,G,B]GB and [R,G,B]GR should
be used for the pixels sensitive to green followed by a blue or red pixel, respectively.
In Fig. 3.6 a concrete example of this algorithm is illustrated. In the left figure the
values sampled from the sensor are shown. In the right figure the resulting RGB
output image is shown using Eq. 3.2.
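A sketch of Eq. 3.2 in code. The assignment of colours to sensor cells depends on the layout in Fig. 3.5, which is not reproduced here; the layout assumed below (even rows B G B G ..., odd rows G R G R ..., with x the column and y the row index) is the one consistent with the equations, and border pixels are skipped.

import numpy as np

def demosaic(f):
    """Nearest-neighbour demosaicing following Eq. 3.2 (assumed Bayer layout)."""
    H, W = f.shape
    g = np.zeros((H, W, 3), dtype=f.dtype)          # output RGB image
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            if y % 2 == 0 and x % 2 == 0:           # blue-sensitive pixel
                R, G, B = f[y + 1, x + 1], f[y, x + 1], f[y, x]
            elif y % 2 == 0:                        # green pixel followed by blue (GB)
                R, G, B = f[y + 1, x], f[y, x], f[y, x - 1]
            elif x % 2 == 0:                        # green pixel followed by red (GR)
                R, G, B = f[y, x + 1], f[y, x], f[y - 1, x]
            else:                                   # red-sensitive pixel
                R, G, B = f[y, x], f[y, x - 1], f[y - 1, x - 1]
            g[y, x] = (R, G, B)
    return g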
According to Eq. 3.1 a color pixel has three values and can therefore be represented
as one point in a 3D space spanned by the three colors. If we say that each color is
represented by 8-bits, then we can construct the so-called RGB color cube, see Fig.
3.7. In the color cube a color pixel is one point or rather a vector from (0, 0, 0) to the
pixel value. The different corners in the color cube represent some of the pure colors
and are listed in Table 3.2. The vector from (0, 0, 0) to (255, 255, 255) passes
through all the gray-scale values and is denoted the gray-vector. Note that the
gray vector is identical to Fig. 3.2.
3.2.2 Converting from RGB to Gray-Scale
Even though you use a color camera, it might be sufficient for your algorithm to apply the intensity information in the image, and you therefore need to convert the color image into a gray-scale image. Converting from RGB to gray-scale is performed as
I = WR · R + WG · G + WB · B
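The weights WR, WG and WB are not given at this point in the notes; the sketch below assumes the commonly used luminance weights 0.299, 0.587 and 0.114.

import numpy as np

def rgb_to_gray(rgb):
    """I = WR*R + WG*G + WB*B with the common luminance weights (assumed here)."""
    WR, WG, WB = 0.299, 0.587, 0.114
    return WR * rgb[..., 0] + WG * rgb[..., 1] + WB * rgb[..., 2]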
If we have the following three RGB pixel values (0, 50, 0), (0, 100, 0), and (0, 223,
0) in the RGB color cube, we can see that they all lie on the same vector, namely
the one spanned by (0, 0, 0) and (0, 255, 0). We say that all values are a shade of
green and go even further and say that they all have the same color (green), but
different levels of illumination. This also applies to the rest of the color cube. For
example, the points (40, 20, 50), (100, 50, 125) and (200, 100, 250) all lie on the
same vector and therefore have the same color, but just different illumination levels.
This is illustrated in Fig. 3.9. If we generalize this idea of different points on the
same line having the same color, then we can see that all possible lines pass through
the triangle defined by the points (1, 0, 0), (0, 1, 0) and (0, 0, 1), see Fig. 3.10(a).
The actual point (r, g, b) where a line intersects the triangle is found as:
r = R / (R + G + B),   g = G / (R + G + B),   b = B / (R + G + B)   …… (3.5)
These values are named normalized RGB and denoted (r,g,b). In Table 3.3 the rgb
values of some RGB values are shown. Note that each value is in the interval [0, 1]
and that r + g + b = 1. This means that if we know two of the normalized values, the third can always be found (for example, b = 1 − r − g).
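A minimal sketch of Eq. 3.5 for a whole image, with a guard for black pixels added as an implementation choice.

import numpy as np

def normalized_rgb(rgb):
    """Normalized rgb of Eq. 3.5; r + g + b = 1 for every non-black pixel."""
    s = rgb.sum(axis=-1, keepdims=True).astype(float)
    s[s == 0] = 1.0                       # avoid division by zero for black pixels
    return rgb / s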
Other Color Representations
From a human perception point of view the triangular representation in Fig. 3.10(b) is not intuitive. Instead humans rather use the notion of hue
and saturation, when perceiving colors. The hue is the dominant wavelength in the
perceived light and represents the pure color, i.e., the colors located on the edges of
the triangle in Fig. 3.10(b). The saturation is the purity of the color and represents the
amount of white light mixed with the pure color. To understand these entities better,
let us look at Fig. 3.11(a). First of all we see that the point C corresponds to the
neutral point, meaning the colorless center of the triangle where (r,g) = (1/3, 1/3). Let
us define a random point in the triangle as P. The hue of this point is now defined as an angle, θ, between the vector from C to the corner r = 1 and the vector from C to P. So hue = 0° means red and
hue = 120° means green. If the point P is located on the edge of the triangle then we
say the saturation is 1, hence a pure color. As the point approaches C the saturation
goes toward 0, and ultimately becomes 0 when P = C. Since the distance from C to
the three edges of the triangle is not uniform, the saturation is defined as a relative
distance. That is, saturation is defined as the ratio between the distance from C to P ,
and the distance from C to the point on the edge of the triangle in the direction of
CP. Mathematically we have
Saturation = |CP| / |CP′|,   Hue = θ    (3.7)
where |CP| is the length of the vector from C to P and |CP′| is the distance from C to the point P′ on the edge of the triangle in the direction of CP. The representation of
colors based on hue and saturation results in a circle as opposed to the triangle in Fig.
3.10(b). In Fig. 3.11(b) the hue–saturation representation is illustrated together with
some of the pure colors. It is important to realize how this figure relates to Fig. 3.7,
or in other words, how the hue–saturation representation relates to the RGB
representation. The center of the hue–saturation circle in Fig. 3.11(b) is a shade of
gray and corresponds to the gray-vector in Fig. 3.7. The circle is located so that it is
perpendicular to the gray-vector. For a particular RGB value, the hue–saturation
circle is therefore centered at a position on the gray-vector, so that the RGB value is
included in the circle. A number of different color representations exist, which are
based on the notion of hue and saturation. Below two of these are presented.
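The two representations referred to above are not reproduced here; as one concrete hue-and-saturation representation, the widely used HSI conversion formulas can be sketched as follows (per pixel; hue is undefined for pure grays).

import numpy as np

def rgb_to_hsi(R, G, B):
    """Hue (degrees), saturation and intensity of a single RGB pixel,
    using the standard HSI formulas as one possible hue/saturation representation."""
    R, G, B = float(R), float(G), float(B)
    I = (R + G + B) / 3.0
    S = 0.0 if I == 0 else 1.0 - min(R, G, B) / I
    num = 0.5 * ((R - G) + (R - B))
    den = np.sqrt((R - G)**2 + (R - B) * (G - B)) + 1e-12
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    H = theta if B <= G else 360.0 - theta     # hue measured from red
    return H, S, I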
Further Information
When reading literature on color spaces and color processing it is important to realize that a number of different terms are used. Unfortunately, some of these terms are used interchangeably even though they might have different physical/perceptual/technical meanings. We therefore give a guideline to some of the terms you are likely to encounter when reading literature on colors:
Chromatic Color: All colors in the RGB color cube except those lying on the gray-line spanned by (0, 0, 0) and (255, 255, 255).
Achromatic Color: The colorless values in the RGB cube, i.e., all those colors lying on the gray-line. The opposite of chromatic color.
Shades of gray: The same as achromatic color.
Intensity: The average amount of energy, i.e., (R + G + B)/3.
Brightness: The amount of light perceived by a human.
Lightness: The amount of light perceived by a human.
Luminance: The amount of light perceived by a human. Note that when you venture into the science of color understanding, the luminance defines the amount of emitted light.
Luma: Gamma-corrected luminance.
Shade: Darkening a color. When a subtractive color space is applied, different shades (darker nuances) of a color are obtained by mixing the color with different amounts of black.
Tint: Lightening a color. When a subtractive color space is applied, different tints (lighter nuances) of a color are obtained by mixing the color with different amounts of white.
Tone: A combination of shade and tint, where gray is mixed with the input color.
′ (prime): The primed version of a color, e.g. R′, means that the value has been gamma-corrected.
Sometimes a gray-scale image is mapped to a color image in order to enhance some aspect of the image. As mentioned above, a true color image cannot be reconstructed from a gray-level image. We therefore use the term pseudo color to underline that we are not talking about a true RGB image. Mapping from gray-scale to color can be done in many different ways.
MODULE-V
IMAGE COMPRESSION
Introduction:
Image compression and the redundancies in a digital image.
The term data compression refers to the process of reducing the amount of data required to
represent a given quantity of information. A clear distinction must be made between data and
information. They are not synonymous. In fact, data are the means by which information is
conveyed. Various amounts of data may be used to represent the same amount of information.
Such might be the case, for example, if a long-winded individual and someone who is short and to the point were to relate the same story. Here, the information of interest is the story; words are
the data used to relate the information. If the two individuals use a different number of words to
tell the same basic story, two different versions of the story are created, and at least one includes
nonessential data. That is, it contains data (or words) that either provide no relevant information
or simply restate that which is already known. It is thus said to contain data redundancy.
Data redundancy is a central issue in digital image compression. It is not an abstract concept
but a mathematically quantifiable entity. If n1 and n2 denote the number of information-
carrying units in two data sets that represent the same information, the relative data
redundancy RD of the first data set (the one characterized by n1) can be defined as
RD = 1 − 1 / CR
where CR = n1 / n2 is called the compression ratio.
In digital image compression, three basic data redundancies can be identified and exploited:
coding redundancy, inter pixel redundancy, and psychovisual redundancy. Data
compression is achieved when one or more of these redundancies are reduced or
eliminated.
Coding Redundancy:
In this, we utilize formulation to show how the gray-level histogram of an image also can
provide a great deal of insight into the construction of codes to reduce the amount of data
used to represent it.
Let us assume, once again, that a discrete random variable rk in the interval [0, 1] represents
the gray levels of an image and that each rk occurs with probability pr(rk),
pr(rk) = nk / n,   k = 0, 1, 2, …, L − 1
where L is the number of gray levels, nk is the number of times that the kth gray level appears in the image, and n is the total number of pixels in the image. If the number of bits used to represent each value of rk is l(rk), then the average number of bits required to represent each pixel is
Lavg = Σ_(k=0)^(L−1) l(rk) pr(rk)
That is, the average length of the code words assigned to the various gray-level
values is found by summing the product of the number of bits used to represent each
gray level and the probability that the gray level occurs. Thus the total number of bits
required to code an M X N image is MNLavg.
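A small numeric illustration with hypothetical probabilities and code lengths:

# Hypothetical 8-level source: probabilities p_r(r_k) and code lengths l(r_k)
probs   = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]
lengths = [3, 2, 2, 3, 4, 5, 6, 6]

L_avg = sum(p * n for p, n in zip(probs, lengths))   # average bits per pixel
# total number of bits required to code an M x N image: M * N * L_avg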
Interpixel Redundancy:
Figure: Two images and their gray-level histograms and normalized autocorrelation coefficients along one line.
Figures (e) and (f) show the respective autocorrelation coefficients computed along
one line of each image.
The normalized autocorrelation coefficients are computed as
γ(Δn) = A(Δn) / A(0)
where
A(Δn) = ( 1 / (N − Δn) ) Σ_(y=0)^(N−1−Δn) f(x, y) f(x, y + Δn)
The scaling factor in the equation above accounts for the varying number of sum terms that
arise for each integer value of Δn. Of course, Δn must be strictly less than N, the
number of pixels on a line. The variable x is the coordinate of the line used in the
computation. Note the dramatic difference between the shape of the functions shown
in Figs. (e) and (f). Their shapes can be qualitatively related to the structure in the
images in Figs. (a) and (b).This relationship is particularly noticeable in Fig. (f), where
the high correlation between pixels separated by 45 and 90 samples can be directly
related to the spacing between the vertically oriented matches of Fig. (b). In addition,
the adjacent pixels of both images are highly correlated. When Δn is 1, γ is 0.9922 and
0.9928 for the images of Figs. (a) and (b), respectively. These values are typical of
most properly sampled television images. These illustrations reflect another important
form of data redundancy—one directly related to the inter pixel correlations within an
image. Because the value of any given pixel can be reasonably predicted from the
value of its neighbors, the information carried by individual pixels is relatively small.
Much of the visual contribution of a single pixel to an image is redundant; it could
have been guessed on the basis of the values of its neighbors. A variety of names,
including spatial redundancy, geometric redundancy, and inter frame redundancy,
have been coined to refer to these inter pixel dependencies. We use the term inter pixel
redundancy to encompass them all.
In order to reduce the inter pixel redundancies in an image, the 2-D pixel array
normally used for human viewing and interpretation must be transformed into a more
efficient (but usually "non visual") format. For example, the differences between
adjacent pixels can be used to represent an image. Transformations of this type (that
is, those that remove inter pixel redundancy) are referred to as mappings. They are
called reversible mappings if the original image elements can be reconstructed from
the transformed data set.
Psychovisual Redundancy:
The brightness of a region, as perceived by the eye, depends on factors other than
simply the light reflected by the region. For example, intensity variations (Mach bands)
can be perceived in an area of constant intensity. Such phenomena result from the fact
that the eye does not respond with equal sensitivity to all visual information. Certain
information simply has less relative importance than other information in normal
visual processing. This information is said to be psycho visually redundant. It can be
eliminated without significantly impairing the quality of image perception.
That psycho visual redundancies exist should not come as a surprise, because human
perception of the information in an image normally does not involve quantitative
analysis of every pixel value in the image. In general, an observer searches for
distinguishing features such as edges or textural regions and mentally combines them
into recognizable groupings. The brain then correlates these groupings with prior
knowledge in order to complete the image interpretation process. Psycho visual
redundancy is fundamentally different from the redundancies discussed earlier. Unlike
coding and inter pixel redundancy, psycho visual redundancy is associated with real
or quantifiable visual information. Its elimination is possible only because the
information itself is not essential for normal visual processing. Since the elimination
of psycho visually redundant data results in a loss of quantitative information, it is
commonly referred to as quantization.
This terminology is consistent with normal usage of the word, which generally means
the mapping of a broad range of input values to a limited number of output values. As
it is an irreversible operation (visual information is lost), quantization results in lossy
data compression.
Fidelity criterion.
The removal of psycho visually redundant data results in a loss of real or quantitative
visual information. Because information of interest may be lost, a repeatable or
reproducible means of quantifying the nature and extent of information loss is highly
desirable. Two general classes of criteria are used as the basis for such an assessment: (1) objective fidelity criteria and (2) subjective fidelity criteria.
When the level of information loss can be expressed as a function of the original or
input image and the compressed and subsequently decompressed output image, it is
said to be based on an objective fidelity criterion. A good example is the root-mean-
square (rms) error between an input and output image. Let f(x, y) represent an input
image and let f^(x, y) denote an estimate or approximation of f(x, y) that results from compressing and subsequently decompressing the input. For any value of x and y, the error e(x, y) between f(x, y) and f^(x, y) can be defined as
e(x, y) = f^(x, y) − f(x, y)
so that the total error between the two images is
Σ_(x=0)^(M−1) Σ_(y=0)^(N−1) [ f^(x, y) − f(x, y) ]
where the images are of size M X N. The root-mean-square error, erms, between f(x, y) and f^(x, y) then is the square root of the squared error averaged over the M X N array, or
erms = [ (1 / MN) Σ_(x=0)^(M−1) Σ_(y=0)^(N−1) [ f^(x, y) − f(x, y) ]² ]^(1/2)
A closely related objective fidelity criterion is the mean-square signal-to-noise ratio of the compressed-decompressed image,
SNR_ms = Σ_(x=0)^(M−1) Σ_(y=0)^(N−1) [ f^(x, y) ]² / Σ_(x=0)^(M−1) Σ_(y=0)^(N−1) [ f^(x, y) − f(x, y) ]²
The rms value of the signal-to-noise ratio, denoted SNR_rms, is obtained by taking the square root of the equation above.
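The objective criteria above translate directly into code; f and f_hat below stand for the original and the decompressed images.

import numpy as np

def rms_error(f, f_hat):
    """Root-mean-square error between the original and the decompressed image."""
    e = f_hat.astype(float) - f.astype(float)
    return np.sqrt(np.mean(e**2))

def snr_rms(f, f_hat):
    """rms signal-to-noise ratio: output signal power over squared-error power."""
    e = f_hat.astype(float) - f.astype(float)
    return np.sqrt(np.sum(f_hat.astype(float)**2) / np.sum(e**2))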
Although objective fidelity criteria offer a simple and convenient mechanism for
evaluating information loss, most decompressed images ultimately are viewed by
humans. Consequently, measuring image quality by the subjective evaluations of a
human observer often is more appropriate. This can be accomplished by showing a
"typical" decompressed image to an appropriate cross section of viewers and averaging
their evaluations. The evaluations may be made using an absolute rating scale or by
means of side-by-side comparisons of f(x, y) and f^(x, y).
The second stage, or quantizer block in Fig. (a), reduces the accuracy of the mapper's output in accordance with some preestablished fidelity criterion. This stage reduces the
psycho visual redundancies of the input image. This operation is irreversible. Thus it
must be omitted when error-free compression is desired.
In the third and final stage of the source encoding process, the symbol coder creates a
fixed- or variable-length code to represent the quantizer output and maps the output in
accordance with the code. The term symbol coder distinguishes this coding operation
from the overall source encoding process. In most cases, a variable-length code is used
to represent the mapped and quantized data set. It assigns the shortest code words to
the most frequently occurring output values and thus reduces coding redundancy. The
operation, of course, is reversible. Upon completion of the symbol coding step, the
input image has been processed to remove each of the three redundancies.
Figure (a) shows the source encoding process as three successive operations, but all
three operations are not necessarily included in every compression system. Recall, for
example, that the quantizer must be omitted when error-free compression is desired.
In addition, some compression techniques normally are modeled by merging blocks
that are physically separate in Fig (a). In the predictive compression systems, for
instance, the mapper and quantizer are often represented by a single block, which
simultaneously performs both operations.
The source decoder shown in Fig. (b) Contains only two components: a symbol
decoder and an inverse mapper. These blocks perform, in reverse order, the inverse
operations of the source encoder's symbol encoder and mapper blocks. Because
quantization results in irreversible information loss, an inverse quantizer block is not
included in the general source decoder model shown in Fig. (b).
If a nonzero value is found, the decoder simply complements the code word bit position
indicated by the parity word. The decoded binary value is then extracted from the corrected
code word as h3h5h6h7.
Method of generating variable-length codes with an example:
Variable-Length Coding:
Huffman coding:
The most popular technique for removing coding redundancy is due to Huffman
(Huffman [1952]). When coding the symbols of an information source individually,
Huffman coding yields the smallest possible number of code symbols per source
symbol. In terms of the noiseless coding theorem, the resulting code is optimal for a
fixed value of n, subject to the constraint that the source symbols be coded one at a
time.The first step in Huffman's approach is to create a series of source reductions by
ordering the probabilities of the symbols under consideration and combining the
lowest probability symbols into a single symbol that replaces them in the next source
reduction. Figure illustrates this process for binary coding (K-ary Huffman codes can
also be constructed). At the far left, a hypothetical set of source symbols and their
probabilities are ordered from top to bottom in terms of decreasing probability values.
To form the first source reduction, the bottom two probabilities, 0.06 and 0.04, are
combined to form a "compound symbol" with probability 0.1. This compound symbol
and its associated probability are placed in the first source reduction column so that
the probabilities of the reduced source are also ordered from the most to the least
probable. This process is then repeated until a reduced source with two symbols (at the
far right) is reached.
The second step in Huffman's procedure is to code each reduced source, starting with
the smallest source and working back to the original source. The minimal length binary
code for a two-symbol source, of course, is the symbols 0 and 1. As Fig. 4.2 shows,
these symbols are assigned to the two symbols on the right (the assignment is arbitrary;
reversing the order of the 0 and 1 would work just as well). As the reduced source
symbol with probability 0.6 was generated by combining two symbols in the reduced
source to its left, the 0 used to code it is now assigned to both of these symbols, and a
0 and 1 are arbitrarily
appended to each to distinguish them from each other. This operation is then repeated for each reduced source until the original source is reached. The final code appears at the far left in the figure. The average length of this code is
Lavg = (0.4)(1) + (0.3)(2) + (0.1)(3) + (0.1)(4) + (0.06)(5) + (0.04)(5) = 2.2 bits/symbol
and the entropy of the source is 2.14 bits/symbol. The resulting Huffman code efficiency is 2.14/2.2 = 0.973.
Huffman's procedure creates the optimal code for a set of symbols and probabilities
subject to the constraint that the symbols be coded one at a time. After the code has
been created, coding and/or decoding is accomplished in a simple lookup table manner.
The code itself is an instantaneous uniquely decodable block code. It is called a block
code because each source symbol is mapped into a fixed sequence of code symbols. It
is instantaneous, because each code word in a string of code symbols can be decoded
without referencing succeeding symbols. It is uniquely decodable, because any string
of code symbols can be decoded in only one way. Thus, any string of Huffman encoded
symbols can be decoded by examining the individual symbols of the string in a left to
right manner. For the binary code of Fig., a left-to-right scan of the encoded string
010100111100 reveals that the first valid code word is 01010, which is the code for
symbol a3 .The next valid code is 011, which corresponds to symbol a1. Continuing in
this manner reveals the completely decoded message to be a3a1a2a2a6.
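A compact sketch of the two-step Huffman procedure (source reductions via a priority queue, then code assignment); the exact code words can differ from those in the figure because the 0/1 assignment at each merge is arbitrary, but the average length is the same.

import heapq

def huffman_code(probs):
    """Binary Huffman code for a dict {symbol: probability}."""
    # each heap entry is (probability, tie-breaker, {symbol: partial code word})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)          # two least probable (compound) symbols
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"a2": 0.4, "a6": 0.3, "a1": 0.1, "a4": 0.1, "a3": 0.06, "a5": 0.04}
codes = huffman_code(probs)
L_avg = sum(p * len(codes[s]) for s, p in probs.items())   # 2.2 bits/symbol for this source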
Arithmetic encoding process with an example.
Arithmetic coding:
Unlike the variable-length codes described previously, arithmetic coding generates
non block codes. In arithmetic coding, which can be traced to the work of Elias, a one-
to-one correspondence between source symbols and code words does not exist.
Instead, an entire sequence of source symbols (or message) is assigned a single
arithmetic code word. The code word itself defines an interval of real numbers between
0 and 1. As the number of symbols in the message increases, the interval used to
represent it becomes smaller and the number of information units (say, bits) required
to represent the interval becomes larger. Each symbol of the message reduces the size
of the interval in accordance with its probability of occurrence. Because the technique
does not require, as does Huffman’s approach, that each source symbol translate into
an integral number of code symbols (that is, that the symbols be coded one at a time),
it achieves (but only in theory) the bound established by the noiseless coding theorem.
In this manner, symbol a2 narrows the subinterval to [0.04, 0.08), a3 further narrows it
to [0.056, 0.072), and so on. The final message symbol, which must be reserved as a
special end-of-message indicator, narrows the range to [0.06752, 0.0688). Of course,
any number within this subinterval—for example, 0.068—can be used to represent the
message. In the arithmetically coded message of Fig. 5.6, three decimal digits are used
to represent the five-symbol message. This translates into 3/5 or 0.6 decimal digits per
source symbol and compares favorably with the entropy of the source, which is 0.58
decimal digits or 10-ary units/symbol. As the length of the sequence being coded
increases, the resulting arithmetic code approaches the bound established by the
noiseless coding theorem. In practice, two factors cause coding performance to fall
short of the bound: (1) the addition of the end-of-message indicator that is needed to
separate one message from an- other; and (2) the use of finite precision arithmetic.
Practical implementations of arithmetic coding address the latter problem by
introducing a scaling strategy and a rounding strategy (Langdon and Rissanen [1981]).
The scaling strategy renormalizes each subinterval to the [0, 1) range before
subdividing it in accordance with the symbol probabilities. The rounding strategy
guarantees that the truncations associated with finite precision arithmetic do not
prevent the coding subintervals from being represented accurately.
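A sketch of the interval-narrowing step, using the symbol probabilities implied by the subintervals quoted above (a1 = a2 = a4 = 0.2, a3 = 0.4, with a4 acting as the end-of-message indicator); plain floating point is used, so the sketch is only suitable for short messages.

def arithmetic_encode(message, probs):
    # cumulative sub-intervals of [0, 1), e.g. a1 -> [0.0, 0.2), a2 -> [0.2, 0.4), ...
    cum, start = {}, 0.0
    for s, p in probs.items():
        cum[s] = (start, start + p)
        start += p
    low, high = 0.0, 1.0
    for s in message:                      # each symbol narrows the current interval
        span = high - low
        low, high = low + span * cum[s][0], low + span * cum[s][1]
    return low, high                       # any number in [low, high) encodes the message

low, high = arithmetic_encode(["a1", "a2", "a3", "a3", "a4"],
                              {"a1": 0.2, "a2": 0.2, "a3": 0.4, "a4": 0.2})
# low, high are 0.06752, 0.0688 (up to floating-point rounding), matching the text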
LZW coding is conceptually very simple (Welch [1984]). At the onset of the coding
process, a codebook or "dictionary" containing the source symbols to be coded is
constructed. For 8-bit monochrome images, the first 256 words of the dictionary are assigned to the gray values 0, 1, 2, ..., and 255. As the encoder sequentially examines the image's pixels, gray-level sequences
that are not in the dictionary are placed in algorithmically determined (e.g., the next
unused) locations. If the first two pixels of the image are white, for instance, sequence
“255-255” might be assigned to location 256, the address following the locations
reserved for gray levels 0 through 255. The next time that two consecutive white pixels
are encountered, code word 256, the address of the location containing sequence 255-
255, is used to represent them.
Table details the steps involved in coding its 16 pixels. A 512-word dictionary with
the following starting content is assumed:
Locations 256 through 511 are initially unused. The image is encoded by processing
its pixels in a left- to-right, top-to-bottom manner. Each successive gray-level value is
concatenated with a variable— column 1 of Table 6.1 —called the "currently
recognized sequence." As can be seen, this variable is initially null or empty. The
dictionary is searched for each concatenated sequence and if found, as was the case in
the first row of the table, is replaced by the newly concatenated and recognized (i.e.,
located in the dictionary) sequence. This was done in column 1 of row 2.
No output codes are generated, nor is the dictionary altered. If the concatenated
sequence is not found, however, the address of the currently recognized sequence is
output as the next encoded value, the concatenated but unrecognized sequence is added
to the dictionary, and the currently recognized sequence is initialized to the current
pixel value. This occurred in row 2 of the table. The last two columns detail the gray-
level sequences that are added to the dictionary when scanning the entire 4 x 4 image.
Nine additional code words are defined. At the conclusion of coding, the dictionary
contains 265 code words and the LZW algorithm has successfully identified several
repeating gray-level sequences—leveraging them to reduce the original 128-bit image to 90 bits (i.e., 10 9-bit codes). The encoded output is obtained by reading the third
column from top to bottom. The resulting compression ratio is 1.42:1.
A unique feature of the LZW coding just demonstrated is that the coding dictionary or
code book is created while the data are being encoded. Remarkably, an LZW decoder
builds an identical decompression dictionary as it simultaneously decodes the encoded data stream. Although not needed in this example, most practical applications require
a strategy for handling dictionary overflow. A simple solution is to flush or reinitialize
the dictionary when it becomes full and continue coding with a new initialized
dictionary. A more complex option is to monitor compression performance and flush
the dictionary when it becomes poor or unacceptable. Alternately, the least used
dictionary entries can be tracked and replaced when necessary.
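A sketch of the LZW encoder described above; encoding a hypothetical 4 x 4 image made of four identical rows 39 39 126 126 reproduces the figures quoted in the text (10 output codes, 265 dictionary entries).

def lzw_encode(pixels, bits=8):
    """LZW coding of a 1-D pixel sequence (e.g. a 4x4 image scanned row by row)."""
    dictionary = {(i,): i for i in range(2**bits)}   # gray levels 0..255
    next_code = 2**bits                              # first free location, e.g. 256
    current = ()                                     # currently recognized sequence
    output = []
    for p in pixels:
        candidate = current + (p,)
        if candidate in dictionary:                  # keep growing the recognized sequence
            current = candidate
        else:
            output.append(dictionary[current])       # emit code of recognized sequence
            dictionary[candidate] = next_code        # add new sequence to the dictionary
            next_code += 1
            current = (p,)
    if current:
        output.append(dictionary[current])
    return output

codes = lzw_encode([39, 39, 126, 126] * 4)           # 10 output codes of 9 bits each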
Bit-plane decomposition:
The gray levels of an m-bit gray-scale image can be represented in the form of the base 2 polynomial
a_(m−1) 2^(m−1) + a_(m−2) 2^(m−2) + … + a_1 2^1 + a_0 2^0
Based on this property, a simple method of decomposing the image into a collection
of binary images is to separate the m coefficients of the polynomial into m 1-bit bit
planes. The zeroth-order bit plane is generated by collecting the a0 bits of each pixel, while the (m − 1)st-order bit plane contains the am−1 bits or coefficients. In general, each bit plane is numbered from 0 to
m-1 and is constructed by setting its pixels equal to the values of the appropriate bits
or polynomial coefficients from each pixel in the original image. The inherent
disadvantage of this approach is that small changes in gray level can have a significant
impact on the complexity of the bit planes. If a pixel of intensity 127 (01111111) is
adjacent to a pixel of intensity 128 (10000000), for instance, every bit plane will
contain a corresponding 0 to 1 (or 1 to 0) transition. For example, as the most
significant bits of the two binary codes for 127 and 128 are different, bit plane 7 will
contain a zero-valued pixel next to a pixel of value 1, creating a 0 to 1 (or 1 to 0)
transition at that point.
An alternative decomposition approach (which reduces the effect of small gray-level variations) is to first represent the image by an m-bit Gray code. The m-bit Gray code g_(m−1) … g_1 g_0 that corresponds to the polynomial above can be computed from
g_i = a_i ⊕ a_(i+1),  0 ≤ i ≤ m − 2
g_(m−1) = a_(m−1)
Here ⊕ denotes the exclusive OR operation. This code has the unique property that successive code words differ in only one bit position. Thus, small changes in gray level are less likely to affect all m bit planes. For instance, when gray levels 127 and 128 are adjacent, only the 7th bit plane will contain a 0 to 1 transition, because the Gray codes that correspond to 127 and 128 are 01000000 and 11000000, respectively.
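Both decompositions can be sketched with bitwise operations; a ^ (a >> 1) implements g_i = a_i XOR a_(i+1) for all planes at once.

import numpy as np

def bit_planes(img, gray_code=False):
    """Decompose an 8-bit image into 8 binary bit planes (plane 0 = LSB).
    With gray_code=True, the planes of the Gray-coded image are returned instead."""
    a = img.astype(np.uint8)
    if gray_code:
        a = a ^ (a >> 1)                 # g_i = a_i XOR a_(i+1), g_(m-1) = a_(m-1)
    return [(a >> i) & 1 for i in range(8)]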
The error-free compression approach does not require decomposition of an image into
a collection of bit planes. The approach, commonly referred to as lossless predictive
coding, is based on eliminating the interpixel redundancies of closely spaced pixels by
extracting and coding only the new information in each pixel. The new information
of a pixel is defined as the difference between the actual and predicted value of that
pixel.
Figure shows the basic components of a lossless predictive coding system. The system
consists of an encoder and a decoder, each containing an identical predictor. As each
successive pixel of the input image, denoted fn, is introduced to the encoder, the
predictor generates the anticipated value of that pixel based on some number of past
inputs. The output of the predictor is then rounded to the nearest integer, denoted f^n, and used to form the difference or prediction error
en = fn − f^n
which is coded using a variable-length code (by the symbol encoder) to generate the next element of the compressed data stream.
Figure :A lossless predictive coding model: (a) encoder; (b) decoder
The decoder of Fig. (b) reconstructs en from the received variable-length code words and performs the inverse operation
fn = en + f^n
to recreate the original input sequence. Various local, global, and adaptive methods can be used to generate f^n. In most cases, however, the prediction is formed by a linear combination of m previous pixels. That is,
f^n = round [ Σ_(i=1)^(m) αi f_(n−i) ]
where m is the order of the linear predictor, round is a function used to denote the
rounding or nearest integer operation, and the αi, for i = 1,2,..., m are prediction
coefficients. In raster scan applications, the subscript n indexes the predictor outputs
in accordance with their time of occurrence. That is, fn, f^n and en in Eqns. above could
be replaced with the more explicit notation f (t), f^(t), and e (t), where t represents
time. In other cases, n is used as an index on the spatial coordinates and/or frame
number (in a time sequence of images) of an image. In 1-D linear predictive coding, for example, the equation above can be written as
f^n(x, y) = round [ Σ_(i=1)^(m) αi f(x, y − i) ]
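A sketch of a first-order (previous-pixel) lossless predictive coder and its decoder, applied row by row; alpha and the row-wise scan are assumptions of this example.

import numpy as np

def predictive_encode(img, alpha=1.0):
    """First-order 1-D lossless predictive coder: f^n = round(alpha * f_(n-1)),
    applied along each row; the first pixel of each row is kept unchanged."""
    f = img.astype(int)
    e = f.copy()
    pred = np.round(alpha * f[:, :-1]).astype(int)   # prediction from previous pixel
    e[:, 1:] = f[:, 1:] - pred                       # prediction error (entropy coded later)
    return e

def predictive_decode(e, alpha=1.0):
    """Inverse operation f_n = e_n + f^n, reconstructing each row left to right."""
    f = e.astype(int)
    for n in range(1, f.shape[1]):
        f[:, n] = e[:, n] + np.round(alpha * f[:, n - 1]).astype(int)
    return f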
Figure: A lossy predictive coding model: (a) encoder and (b) decoder.
In order to accommodate the insertion of the quantization step, the error-free encoder of figure
must be altered so that the predictions generated by the encoder and decoder are equivalent. As
Fig. shows, this is accomplished by placing the lossy encoder's predictor within a feedback loop,
where its input, denoted f˙n, is generated as a function of past predictions and the corresponding quantized errors. That is,
f˙n = e˙n + f^n
where f^n is formed as in the lossless case.
This closed loop configuration prevents error buildup at the decoder's output. Note
from Fig that the output of the decoder also is given by the above Eqn.
Optimal predictors:
The optimal predictor used in most predictive coding applications minimizes the encoder's mean-square prediction error
E{ en² } = E{ [ fn − f^n ]² }
That is, the optimization criterion is chosen to minimize the mean-square prediction
error, the quantization error is assumed to be negligible (e˙n ≈ en), and the prediction
is constrained to a linear combination of m previous pixels. These restrictions are not
essential, but they simplify the analysis considerably and, at the same time, decrease
the computational complexity of the predictor. The resulting predictive coding
approach is referred to as differential pulse code modulation (DPCM).
Transform Coding:
All the predictive coding techniques operate directly on the pixels of an image and
thus are spatial domain methods. In this coding, we consider compression techniques
that are based on modifying the transform of an image. In transform coding, a
reversible, linear transform (such as the Fourier transform) is used to map the image
into a set of transform coefficients, which are then quantized and coded. For most
natural images, a significant number of the coefficients have small magnitudes and can
be coarsely quantized (or discarded entirely) with little image distortion. A variety of
transformations, including the discrete Fourier transform (DFT), can be used to
transform the image data.
Figure shows a typical transform coding system. The decoder implements the inverse
sequence of steps (with the exception of the quantization function) of the encoder,
which performs four relatively straightforward operations: sub image decomposition,
transformation, quantization, and coding. An N X N input image first is subdivided
into sub images of size n X n, which are then transformed to generate (N/n)² sub image
transform arrays, each of size n X n. The goal of the transformation process is to
decorrelate the pixels of each sub image, or to pack as much information as possible
into the smallest number of transform coefficients. The quantization stage then
selectively eliminates or more coarsely quantizes the coefficients that carry the least
information. These coefficients have the smallest impact on reconstructed sub image
quality. The encoding process terminates by coding (normally using a variable-length
code) the quantized coefficients. Any or all of the transform encoding steps can be
adapted to local image content, called adaptive transform coding, or fixed for all sub
images, called non adaptive transform coding.
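A toy sketch of the encoder steps for a DCT-based system: 8 x 8 sub image decomposition, transformation, crude quantization by keeping only the largest coefficients, and inverse transformation to view the effect; block size and the number of retained coefficients are illustrative choices.

import numpy as np
from scipy.fft import dctn, idctn

def transform_code(img, n=8, keep=10):
    """Block DCT transform coding: keep only the `keep` largest-magnitude
    coefficients of each n x n sub image, then inverse transform."""
    H, W = img.shape
    out = np.zeros_like(img, dtype=float)
    for i in range(0, H - H % n, n):
        for j in range(0, W - W % n, n):
            block = dctn(img[i:i + n, j:j + n].astype(float), norm="ortho")
            thresh = np.sort(np.abs(block).ravel())[-keep]     # keep-th largest magnitude
            block[np.abs(block) < thresh] = 0                  # discard small coefficients
            out[i:i + n, j:j + n] = idctn(block, norm="ortho")
    return out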