
IMAGE PROCESSING

B.Tech V semester – UG-20

LECTURE NOTES

ACADEMIC YEAR: 2022-2023

Prepared By

Ms. B. Santhosh Kumar, Assistant Professor

INSTITUTE OF AERONAUTICAL ENGINEERING


Autonomous
Dundigal, Hyderabad - 500 043
Computer Science and Engineering
MODULE -I INTRODUCTION
What is digital image processing, origins of digital image processing, examples of fields that use
DIP, fundamental steps in digital image processing, components of an image processing system;
Digital image fundamentals: Elements of visual perception, a simple image formation model, basic
concepts in sampling and quantization, representing digital images, spatial and gray-level
resolution, zooming and shrinking digital images, some basic relationships between pixels, linear
and nonlinear operations
MODULE -II IMAGE ENHANCEMENT IN THE SPATIAL DOMAIN
Some basic gray level transformations, histogram processing, enhancement using
arithmetic/logic operations, basics of spatial filtering, smoothing spatial filters, sharpening
spatial filters, combining spatial enhancement methods. Introduction to the Fourier
transform and the frequency domain, smoothing frequency domain filters, sharpening
frequency domain filters, homomorphic filtering.
MODULE -III IMAGE RESTORATION AND FILTERING
A model of the image degradation/restoration process, noise models, restoration in the presence of
noise only spatial filtering, periodic noise reduction by frequency domain filtering. Linear position
invariant degradations, estimating the degradation function, inverse filtering, minimum mean
square error (Wiener) filtering, constrained least squares filtering, and geometric mean filter.
MODULE -IV COLOR IMAGE PROCESSING
Color models, pseudo color image processing, basics of full-color image processing, color
transformations, smoothing and sharpening, color segmentation, noise in color images, color image
compression; Wavelets and multiresolution processing: Image pyramids, subband coding, the Haar
transform, multiresolution expansions, wavelet transforms in one dimension, fast wavelet
transform, wavelet transforms in two dimensions, wavelet packets; Fundamentals, image
compression models, error-free (lossless) compression, lossy compression.
MODULE -V SYSTEM DESIGN TECHNIQUES
Preliminaries, dilation and erosion, opening and closing, the hit-or-miss transformation, some basic
morphological algorithms; Image segmentation: Detection of discontinuities, edge linking and
boundary detection, thresholding, region-based segmentation.
Text Books:
1. Rafael C. Gonzalez, Richard E. Woods, “Digital Image Processing”, Pearson, 3rd Edition,
2008.
2. S. Jayaraman, S. Esakkirajan, T. Veerakumar, “Digital Image Processing”, TMH, 3rd Edition,
2010.
Reference Books:
1. Rafael C. Gonzalez, Richard E. Woods, Steven L. Eddins, “Digital Image Processing Using
MATLAB”, Tata McGraw Hill, 2nd Edition, 2010.
2. A.K. Jain, “Fundamentals of Digital Image Processing”, PHI, 1st Edition, 1989.
3. Sonka, Hlavac, Boyle, “Digital Image Processing and Computer Vision”, Cengage Learning,
1st Edition, 2008.
4. Adrian Low, “Introductory Computer Vision, Imaging Techniques and Solutions”, Tata
McGraw-Hill, 2nd Edition, 2008.
5. John C. Russ, J. Christian Russ, “Introduction to Image Processing & Analysis”, CRC Press,
1st Edition, 2010.
MODULE-1
INTRODUCTION
COURSE OUTCOMES MAPPED WITH MODULE-I

At the end of the unit students are able to:


CO 1 (Understand): Outline the principles and terminology of digital image
processing for describing the features of an image.
CO 2 (Apply): Design systems using 2D DFT transforms for image processing
applications.
CO 3 (Apply): Make use of various image transform techniques like Walsh, Hadamard,
Slant, DCT and Haar transforms for analyzing images in the transform domain.

PROGRAM OUTCOMES AND PROGRAM SPECIFIC OUTCOMES MAPPED


WITH MODULE-I
PO1 Engineering knowledge: Apply the knowledge of mathematics, science,
engineering fundamentals, and an engineering specialization to the solution
of complex engineering problems.
PO 2 Problem analysis: Identify, formulate, review research literature, and
analyze complex engineering problems reaching substantiated conclusions
using first principles of mathematics, natural sciences, and engineering
sciences.

PO 5 Modern tool usage: Create, select, and apply appropriate techniques,


resources, and modern engineering and IT tools including prediction and
modeling to complex engineering activities with an understanding of the
limitations.

PO 10 Communication: Communicate effectively on complex engineering activities


with the engineering community and with society at large, such as, being able
to comprehend and write effective reports and design documentation, make
effective presentations, and give and receive clear instructions.

PO 12 Life-long learning: Recognize the need for, and have the


preparation and ability to engage in independent and life-long learning in the
broadest context of technological change
PSO 3 Understand, design and analyze computer programs in the areas related to
algorithms, systems, software, web design, big data, artificial intelligence,
machine learning and networking.
MAPPING OF COs WITH POs, PSOs FOR MODULE I
Program
Course Program Outcomes Specific
Outcomes Outcomes
1 2 3 4 5 6 7 8 9 10 11 12 1 2 3

CO 1 √ √

CO 2 √ √ √

CO 3 √ √ √ √ √ √
INTRODUCTION:
Basic concept of digital image:
The field of digital image processing refers to processing digital images by means of a digital
computer. A digital image is composed of a finite number of elements, each of which has a
particular location and value. These elements are called picture elements, image elements, pels,
and pixels. Pixel is the term used most widely to denote the elements of a digital image. An image
is a two-dimensional function that represents a measure of some characteristic such as
brightness or color of a viewed scene. An image is a projection of a 3-D scene onto a 2-D
projection plane.
An image may be defined as a two-dimensional function f(x,y), where x and y are spatial (plane)
coordinates, and the amplitude of f at any pair of coordinates (x,y) is called the intensity of the
image at that point.
The term gray level is used often to refer to the intensity of monochrome images.

Color images are formed by a combination of individual 2-D images. For example, in the RGB
color system, a color image consists of three individual component images (red, green and blue).
For this reason, many of the techniques developed for monochrome images can be
extended to color images by processing the three component images individually.
An image may be continuous with respect to the x- and y- coordinates and also in amplitude.
Converting such an image to digital form requires that the coordinates, as well as the amplitude,
be digitized.
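As a small illustration (a sketch in Python with NumPy; the 4×4 array is made-up data used only to show the idea), a digitized grayscale image can be held as a 2-D array whose entry at coordinates (x, y) is the intensity f(x, y):

import numpy as np

# A tiny 4x4 digital image: the entry at row x, column y is the intensity f(x, y).
f = np.array([[ 12,  50,  50,  12],
              [ 50, 200, 200,  50],
              [ 50, 200, 255,  50],
              [ 12,  50,  50,  12]], dtype=np.uint8)

print("f(2, 2) =", f[2, 2])        # 255 : intensity at coordinates (2, 2)
print("rows, columns =", f.shape)  # (4, 4)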
Applications of digital image processing
Since digital image processing has very wide applications and almost all of the
technical fields are impacted by DIP, we will just discuss some of the major
applications of DIP.
Digital image processing has a broad spectrum of applications, such as
• Remote sensing via satellites and other spacecrafts
• Image transmission and storage for business applications
• Medical processing,
• RADAR (Radio Detection and Ranging)
• SONAR(Sound Navigation and Ranging) and
• Acoustic image processing (The study of underwater sound is known as
underwater acoustics or hydro acoustics.)
• Robotics and automated inspection of industrial parts.
Images acquired by satellites are useful in tracking of:
• Earth resources;
• Geographical mapping;
• Prediction of agricultural crops,
• Urban growth and weather monitoring
• Flood and fire control and many other environmental applications. Space image
applications include:
• Recognition and analysis of objects contained in images obtained from deep
space-probe missions.
Image transmission and storage applications occur in:
• Broadcast television
• Teleconferencing
• Transmission of facsimile images (printed documents and graphics) for office automation
• Communication over computer networks
• Closed-circuit television based security monitoring systems
• Military communications.
• Medical applications:
• Processing of chest X-rays
• Cineangiograms
• Projection images of transaxial tomography
• Medical images that occur in radiology and nuclear magnetic resonance (NMR)
The Image Processing Toolbox (IPT) is a collection of functions that extends the capability
of the MATLAB numeric computing environment. These functions, and the
expressiveness of the MATLAB language, make many image-processing operations
easy to write in a compact, clear manner, thus providing an ideal software prototyping
environment for the solution of image processing problems.
Components of Image processing System:

Figure: Components of Image processing System


Image Sensors: With reference to sensing, two elements are required to acquire digital
images. The first is a physical device that is sensitive to the energy radiated by the object
we wish to image, and the second is specialized image processing hardware.
Specialized image processing hardware: It consists of the digitizer just mentioned,
plus hardware that performs other primitive operations, such as an arithmetic logic unit (ALU),
which performs arithmetic operations such as addition and subtraction and logical operations in
parallel on images.
Computer: It is a general-purpose computer and can range from a PC to a
supercomputer depending on the application. In dedicated applications, sometimes
specially designed computers are used to achieve a required level of performance.
Software: It consists of specialized modules that perform specific tasks. A well-
designed package also includes the capability for the user to write code that, as a minimum,
utilizes the specialized modules. More sophisticated software packages allow the
integration of these modules.
Mass storage: This capability is a must in image processing applications. An image
of size 1024 x 1024 pixels, in which the intensity of each pixel is an 8-bit quantity,
requires one megabyte of storage space if the image is not compressed. Image
processing applications fall into three principal categories of storage:
• Short-term storage for use during processing
• On-line storage for relatively fast retrieval
• Archival storage, such as magnetic tapes and disks
Image displays: Image displays in use today are mainly color TV monitors. These
monitors are driven by the outputs of image and graphics display cards that are an
integral part of the computer system.
Hardcopy devices: The devices for recording images include laser printers, film
cameras, heat-sensitive devices, inkjet units and digital units such as optical and CD-
ROM disks. Film provides the highest possible resolution, but paper is the obvious
medium of choice for written applications.
Networking: It is almost a default function in any computer system in use today
because of the large amount of data inherent in image processing applications. The key
consideration in image transmission is bandwidth.
Fundamental Steps in Digital Image Processing:
There are two categories of steps involved in image processing:
1. Methods whose inputs and outputs are images.
2. Methods whose inputs may be images but whose outputs are attributes extracted from those images.
Fig: Fundamental Steps in Digital Image Processing
Image acquisition: It could be as simple as being given an image that is already in
digital form. Generally, the image acquisition stage involves preprocessing such as scaling.
Image Enhancement: It is among the simplest and most appealing areas of digital
image processing. The idea behind this is to bring out details that are obscured or
simply to highlight certain features of interest in an image. Image enhancement is a very
subjective area of image processing.

Image Restoration: It also deals with improving the appearance of an image. It is an
objective approach, in the sense that restoration techniques tend to be based on
mathematical or probabilistic models of image degradation. Enhancement, on the other
hand, is based on human subjective preferences regarding what constitutes a “good”
enhancement result.

Color image processing: It is an area that has been gaining importance because of the
use of digital images over the internet. Color image processing deals basically with
color models and their implementation in image processing applications.
Wavelets and Multiresolution Processing: These are the foundation for representing
images in various degrees of resolution.
Compression: It deals with techniques for reducing the storage required to save an image,
or the bandwidth required to transmit it over a network. It has two major approaches:
a) Lossless compression b) Lossy compression
Morphological processing: It deals with tools for extracting image components that
are useful in the representation and description of the shape and boundary of objects. It is
mainly used in automated inspection applications.
Representation and Description: It almost always follows the output of a segmentation
stage, that is, raw pixel data constituting either the boundary of a region or all the points in the
region itself. In either case, converting the data to a form suitable for computer
processing is necessary.
Recognition: It is the process that assigns a label to an object based on its descriptors.
It is the last step of image processing, and it typically makes use of artificial intelligence techniques in software.
Knowledge base:
Knowledge about a problem domain is coded into an image processing system in the
form of a knowledge base. This knowledge may be as simple as detailing regions of
an image where the information of interest is known to be located, thus limiting the
search that has to be conducted in seeking that information. The knowledge base can also
be quite complex, such as an interrelated list of all major possible defects in a materials
inspection problem, or an image database containing high-resolution satellite images
of a region in connection with change detection applications.
Digital image through scanner

A scanner is a device that scans images, printed text, handwriting, etc., and converts
them to digital form, i.e. an image. It is so named because the data is converted one line at a
time, or scanned, as the scanning head moves down the page.

Components inside a scanner are the following

Glass Plate and Cover

The glass plate is the transparent plate on which the original is placed so that the scanner
can scan it, and the cover keeps out stray light that can affect the accuracy of the scan.

Scanning head

The scanning head is the most important component because it is the one which does the actual
scanning. It contains components such as the following.
Light source and mirrors: A bright white light is used to illuminate the
original as it is being scanned; the light bounces off the original and is reflected off
several mirrors.

Stabilizer bar: It is a long stainless-steel rod that is securely fastened to the case of
the scanner and provides a smooth ride as the scan head moves down the page.

CCD (Charge Coupled Device) or CIS (Contact Image Sensor): A CCD array is a
device that converts photons into electricity. Any scanner that uses a CCD uses a lens to
focus the light coming from the mirrors within the scanning head.

Another technology used in some cheaper scanners is CIS, wherein the light source is
a set of LEDs that runs the length of the glass plate.
Stepper motor

The stepper motor in a scanner moves the scan head down the page during the scan cycle.
It is often located either on the scan head itself or attached to a belt that drives the
scan head.

Flatbed Scanners

The most commonly used scanner is the flatbed scanner, also known as a desktop scanner.
It has a glass plate on which the picture or document is placed. The scanner head,
placed beneath the glass plate, moves across the picture, and the result is a good-quality
scanned image. For scanning large maps or top sheets, wide-format flatbed scanners
can be used.

Sheet fed Scanners

Figure: sheet fed scanner

Sheet-fed scanners work on a principle similar to that of a fax machine. In this type, the
document to be scanned is moved past the scanning head and the digital form of the
image is obtained. The disadvantage of this type of scanner is that it can only scan
loose sheets, and the scanned image can easily become distorted if the document is not
handled properly while scanning.

Handheld Scanners

Figure: hand held scanner


Hand-held scanners, although portable, can only scan images up to about four inches
wide. They require a very steady hand for moving the scan head over the document.
They are useful for scanning small logos or signatures and are virtually of no use for
scanning maps and photographs.

Human eye operation


Before we discuss image formation in analog and digital cameras, we first have to
discuss image formation in the human eye, because the basic principle followed
by cameras has been taken from the way the human eye works.
When light falls upon a particular object, it is reflected back after striking
the object. The rays of light, when passed through the lens of the eye, form a particular
angle, and the image is formed on the retina, at the back wall of the eye. The
image that is formed is inverted. This image is then interpreted by the brain, and that
makes us able to understand things. Due to angle formation, we are able to perceive
the height and depth of the object we are seeing. This is explained further in the
discussion of perspective transformation.

Figure: human eye


As you can see in the above figure, when sunlight falls on the object (in this case
the object is a face), it is reflected back, and the different rays form different angles when
they pass through the lens, so that an inverted image of the object is formed on
the back wall. The last portion of the figure denotes that the image has been interpreted
by the brain and re-inverted.
Now let's take our discussion back to image formation in analog and digital
cameras.
Image formation on analog cameras

In analog cameras, image formation is due to the chemical reaction that takes place
on the strip that is used for image formation. A 35mm strip is used in an analog camera; it
is denoted in the figure by the 35mm film cartridge. This strip is coated with silver halide
(a chemical substance).

Figure: Analog film


Light consists of small particles known as photons. When these
photons pass through the camera, they react with the silver halide particles
on the strip, and the result is silver, which forms the negative of the image. In order to
understand it better, have a look at this relation:
Photons (light particles) + silver halide → silver (image negative)
This is just the basics. Image formation also involves many other concepts
regarding the passing of light inside the camera, and the concepts of shutter, shutter speed,
aperture and its opening, but for now we will move on to the next part. Most
of these concepts are covered in the discussion of shutter and aperture.
Image formation on digital cameras
In digital cameras, image formation is not due to a chemical reaction;
rather, it is a bit more complex than this. In a digital camera, a CCD array
of sensors is used for image formation.
Image formation through CCD array

Figure: digital camera CCD


CCD stands for charge-coupled device. It is an image sensor, and like other sensors it
senses values and converts them into an electric signal; in the case of a CCD, it senses
the image and converts it into an electric signal.
The CCD is actually in the shape of an array or rectangular grid. It is like a matrix in which
each cell contains a sensor that senses the intensity of photons.

As in analog cameras, in the digital case too, when light falls on the object, the light
reflects back after striking the object and is allowed to enter the camera.
Each sensor of the CCD array is itself an analog sensor. When photons of light strike
the chip, the light is held as a small electrical charge in each photo sensor. The response
of each sensor is directly proportional to the amount of light (photon energy) striking the
surface of the sensor.
Since we have already defined an image as a two-dimensional signal, and due to the two-
dimensional layout of the CCD array, a complete image can be obtained from this
CCD array. It has a limited number of sensors, which means only a limited amount of detail can be
captured by it. Also, each sensor can have only one value for the photons
that strike it. So the number of photons striking each sensor (the current) is counted and
stored. In order to measure these counts accurately, external CMOS sensors are also attached
to the CCD array.
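The photon-counting description above can be illustrated with a toy simulation in Python (a sketch only; the 8×8 grid size, the flux values and the Poisson arrival model are illustrative assumptions, not part of the text):

import numpy as np

rng = np.random.default_rng(0)

# Illustrative "scene": mean photon flux arriving at each cell of an 8x8 CCD grid.
flux = np.outer(np.linspace(10.0, 200.0, 8), np.linspace(0.5, 1.0, 8))

# Each cell accumulates a photon count during the exposure (Poisson arrivals).
counts = rng.poisson(flux)

# Crude stand-in for the read-out/ADC step: scale the counts to 8-bit gray levels.
image = np.clip(counts / counts.max() * 255, 0, 255).astype(np.uint8)
print(image)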
Sampling and quantization:
To create a digital image, we need to convert the continuous sensed data into digital
form. This involves two processes: sampling and quantization. An image may be
continuous with respect to the x and y coordinates and also in amplitude. To convert
it into digital form, we have to sample the function in both coordinates and in
amplitude.
Digitizing the coordinate values is called sampling. Digitizing the amplitude values
is called quantization. Consider a continuous image and the intensity values along a line segment AB. To
sample this function, we take equally spaced samples along line AB. The location of
each sample is given by a vertical tick mark in the bottom part of the figure. The samples
are shown as small squares superimposed on the function, and the set of these discrete locations
gives the sampled function.
In order to form a digital function, the gray-level values must also be converted (quantized) into
discrete quantities. So we divide the gray-level scale into eight discrete levels, ranging
from black to white. The continuous gray levels are quantized simply by assigning
one of the eight discrete gray levels to each sample. The assignment is made depending
on the vertical proximity of a sample to a vertical tick mark.
Starting at the top of the image and carrying out this procedure line by line produces
a two-dimensional digital image.
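A minimal sketch of these two steps on a 1-D intensity profile (the sine-shaped profile, the 16 samples and the 8 levels are illustrative choices, not values from the text):

import numpy as np

# "Continuous" intensity profile along a scan line AB (an arbitrary smooth function).
t = np.linspace(0.0, 1.0, 1000)
profile = 0.5 + 0.4 * np.sin(2 * np.pi * 2 * t)

# Sampling: keep a small number of equally spaced samples along AB.
idx = np.linspace(0, len(t) - 1, 16).astype(int)
samples = profile[idx]

# Quantization: assign each sample to one of 8 discrete gray levels (0..7).
levels = 8
quantized = np.round(samples * (levels - 1)).astype(int)

print(np.round(samples, 2))
print(quantized)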
Digital Image definition:
A digital image f(m,n) described in a 2-D discrete space is derived from an analog
image f(x,y) in a 2-D continuous space through a sampling process that is frequently
referred to as digitization. The mathematics of that sampling process will be described
in subsequent chapters. For now we will look at some basic definitions associated with
the digital image. The effect of digitization is shown in the figure.
The 2-D continuous image f(x,y) is divided into N rows and M columns. The
intersection of a row and a column is termed a pixel. The value assigned to the integer
coordinates (m,n), with m = 0, 1, 2, ..., N-1 and n = 0, 1, 2, ..., M-1, is f(m,n). In fact, in most
cases, f(m,n) is actually a function of many variables including depth, color and time (t).

There are three types of computerized processes in the processing of images:

Low-level processes: these involve primitive operations such as image preprocessing to
reduce noise, contrast enhancement and image sharpening. These kinds of processes
are characterized by the fact that both inputs and outputs are images.
Mid-level image processing: it involves tasks like segmentation, description of
objects to reduce them to a form suitable for computer processing, and classification
of individual objects. The inputs to the process are generally images, but the outputs are
attributes extracted from images.
High-level processing: it involves "making sense" of an ensemble of recognized
objects, as in image analysis, and performing the cognitive functions normally
associated with vision.
Representing Digital Images: The result of sampling and quantization is a matrix of real
numbers. Assume that an image f(x,y) is sampled so that the resulting digital image
has M rows and N columns. The values of the coordinates (x,y) now become discrete
quantities; thus, the values of the coordinates at the origin become (x,y) = (0,0). The next
coordinate values along the first row of the image follow in the same way. This does not mean
that these are the actual values of the physical coordinates when the image was sampled.

Each element of the matrix is called a digital element, pixel or pel. The matrix
can be represented in the following form as well. The sampling process may be viewed
as partitioning the xy plane into a grid, with the coordinates of the center of each grid cell
being a pair of elements from the Cartesian product Z², which is the set of all ordered
pairs of elements (Zi, Zj) with Zi and Zj being integers from Z. Hence f(x,y) is a digital
image if it assigns a gray
level (that is, a real number from the set of real numbers R) to each distinct pair of
coordinates (x,y). This functional assignment is the quantization process. If the gray
levels are also integers, Z replaces R, and a digital image becomes a 2-D function
whose coordinates and amplitude values are integers. Due to processing, storage and
hardware considerations, the number of gray levels typically is an integer power of 2:
L = 2^k

Then the number, b, of bits required to store a digital image is b = M × N × k. When
M = N, this equation becomes b = N² × k.
When an image can have 2^k gray levels, it is referred to as a “k-bit image”. An image with
256 possible gray levels is called an “8-bit image” (256 = 2^8).
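A quick check of the storage relation b = M × N × k (a small helper sketch; the function name is mine, and an uncompressed image is assumed):

# Storage needed for an uncompressed digital image: b = M * N * k bits.
def storage_bits(M, N, k):
    """Bits required for an M x N image with k bits per pixel."""
    return M * N * k

bits = storage_bits(1024, 1024, 8)
print(bits, "bits =", bits // 8, "bytes =", bits // (8 * 1024 ** 2), "MB")  # 1 MB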

Spatial and Gray level resolution:


Spatial resolution is the smallest discernible detail in an image. Suppose a chart is
constructed with vertical lines of width w, with the space between them also having
width w; a line pair then consists of one such line and its adjacent space. Thus the width
of a line pair is 2w, and there are 1/2w line pairs per unit distance. Spatial resolution is simply
the smallest number of discernible line pairs per unit distance.
Gray-level resolution refers to the smallest discernible change in gray level. Measuring
discernible changes in gray level is a highly subjective process. Reducing the number
of gray levels while keeping the spatial resolution constant creates the problem of false
contouring.
It is caused by the use of an insufficient number of gray levels in the smooth areas of
a digital image. It is called so because the ridges resemble topographic contours in a
map. It is generally quite visible in images displayed using 16 or fewer uniformly spaced
gray levels.
Image sensing and Acquisition:
The types of images in which we are interested are generated by the combination of
an “illumination” source and the reflection or absorption of energy from that source
by the elements of the “scene” being imaged. We enclose illumination and scene in
quotes to emphasize the fact that they are considerably more general than the familiar
situation in which a visible light source illuminates a common everyday 3-D (three-
dimensional) scene. For example, the illumination may originate from a source of
electromagnetic energy such as radar, infrared, or X-ray energy. But, as noted earlier,
it could originate from less traditional sources, such as ultrasound or even a computer-
generated illumination pattern. Similarly, the scene elements could be familiar objects,
but they can just as easily be molecules, buried rock formations, or a human brain.
We could even image a source, such as acquiring images of the sun. Depending on
the nature of the source, illumination energy is reflected from, or transmitted through,
objects. An example in the first category is light reflected from a planar surface. An
example in the second category is when X-rays pass through a patient’s body for the

purpose of generating a diagnostic X-ray film. In some applications, the reflected or


transmitted energy is focused onto a photo converter (e.g., a phosphor screen), which
converts the energy into visible light. Electron microscopy and some applications of
gamma imaging use this approach. The idea is simple: Incoming energy is transformed
into a voltage by the combination of input electrical power and sensor material that is
responsive to the particular type of energy being detected. The output voltage
waveform is the response of the sensor(s), and a digital quantity is obtained from each
sensor by digitizing its response. In this section, we look at the principal modalities for
image sensing and generation.
Fig: Single Image sensor

Fig: Line Sensor

Fig: Array sensor
Image Acquisition using a Single sensor:

The figure shows the components of a single sensor. Perhaps the most familiar sensor of this type is the
photodiode, which is constructed of silicon materials and whose output voltage waveform is
proportional to light. The use of a filter in front of a sensor improves selectivity. For
example, a green (pass) filter in front of a light sensor favors light in the green band of the
color spectrum. As a consequence, the sensor output will be stronger for green light than for
other components in the visible spectrum.

In order to generate a 2-D image using a single sensor, there has to be relative
displacements in both the x- and y-directions between the sensor and the area to be
imaged. Figure shows an arrangement used in high-precision scanning, where a film
negative is mounted onto a drum whose mechanical rotation provides displacement in
one dimension. The single sensor is mounted on a lead screw that provides motion in
the perpendicular direction. Since mechanical motion can be controlled with high
precision, this method is an inexpensive (but slow) way to obtain high-resolution
images. Other similar mechanical arrangements use a flat bed, with the sensor moving
in two linear directions. These types of mechanical digitizers sometimes are referred
to as microdensitometers.

Image Acquisition using a Sensor strips:


A geometry that is used much more frequently than single sensors consists of an in-
line arrangement of sensors in the form of a sensor strip, as the figure shows. The strip provides
imaging elements in one direction. Motion perpendicular to the strip provides imaging
in the other direction. This is the type of arrangement used in most flatbed scanners.
Sensing devices with 4000 or more in-line sensors are possible. In-line sensors are
used routinely in airborne imaging applications, in which the imaging system is
mounted on an aircraft that flies at a constant altitude and speed over the geographical
area to be imaged. One dimensional imaging sensor strips that respond to various
bands of the electromagnetic spectrum are mounted perpendicular to the direction of
flight. The imaging strip gives one line of an image at a time, and the motion of the
strip completes the other dimension of a two-dimensional image. Lenses or other
focusing schemes are used to project the area to be scanned onto the sensors. Sensor strips
mounted in a ring configuration are used in medical and industrial imaging to obtain
cross-sectional (“slice”) images of 3-D objects.

Fig: Image Acquisition using linear strip and circular strips.


Image Acquisition using a Sensor Arrays:
The individual sensors arranged in the form of a 2-D array. Numerous electromagnetic
and some ultrasonic sensing devices frequently are arranged in an array format. This
is also the predominant arrangement found in digital cameras. A typical sensor for
these cameras is a CCD array, which can be manufactured with a broad range of
sensing properties and can be packaged in rugged arrays of elements. CCD
sensors are used widely in digital cameras and other light-sensing instruments. The
response of each sensor is proportional to the integral of the light energy projected
onto the surface of the sensor, a property that is used in astronomical and other
applications requiring low-noise images. Noise reduction is achieved by letting the
sensor integrate the input light signal over minutes or even hours. Since the sensor array is two-
dimensional, its key advantage is that a complete image can be obtained by focusing
the energy pattern onto the surface of the array. Motion obviously is not necessary, as
is the case with the sensor arrangements discussed above. The figure shows the energy from an
illumination source being reflected from a scene element, but, as mentioned at the
beginning of this section, the energy also could be transmitted through the scene
elements. The first function performed by the imaging system is to collect the
incoming energy and focus it onto an image plane. If the illumination is light, the front
end of the imaging system is a lens, which projects the viewed scene onto the lens
focal plane. The sensor array, which is coincident with the focal plane, produces
outputs proportional to the integral of the light received at each sensor. Digital and
analog circuitries sweep these outputs and convert them to a video signal, which is
then digitized by another section of the imaging system.
Image sampling and Quantization:
To create a digital image, we need to convert the continuous sensed data into digital
form. This involves two processes: sampling and quantization. A continuous image,
f(x, y), that we want to convert to digital form. An image may be continuous with
respect to the x- and y- coordinates, and also in amplitude. To convert it to digital
form, we have to sample the function in both coordinates and in amplitude. Digitizing
the coordinate values is called sampling. Digitizing the amplitude values is called
quantization.

Digital Image representation:


A digital image is a finite collection of discrete samples (pixels) of any observable
object. The pixels represent a two- or higher-dimensional “view” of the object, each
pixel having its own discrete value in a finite range. The pixel values may represent
the amount of visible light, infrared light, absorption of X-rays, electrons, or any other
measurable value such as ultrasound wave impulses. The image does not need to have
any visual sense; it is sufficient that the samples form a two-dimensional spatial
structure that may be illustrated as an image. The images may be obtained by a digital
camera, scanner, electron microscope, ultrasound stethoscope, or any other optical or
non-optical sensor. Examples of digital images are:
• digital photographs
• satellite images
• radiological images (x-rays, mammograms)
• binary images, fax images, engineering drawings
Computer graphics, CAD drawings, and vector graphics in general are not considered
in this course even though their reproduction is a possible source of an image. In fact,
one goal of intermediate level image processing may be to reconstruct a model (e.g.
vector representation) for a given digital image.

Relationship between pixels:


We consider several important relationships between pixels in a digital image.
Neighbors of a pixel
• A pixel p at coordinates (x,y) has four horizontal and vertical neighbors whose
coordinates are given by:
(x+1,y), (x-1, y), (x, y+1), (x,y-1)

This set of pixels, called the 4-neighbors of p, is denoted by N4(p). Each pixel is one
unit distance from (x,y), and some of the neighbors of p lie outside the digital image if
(x,y) is on the border of the image. The four diagonal neighbors of p have the coordinates
listed below and are denoted by ND(p):
(x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1)
These points, together with the 4-neighbors, are called the 8-neighbors of p, denoted
by N8 (p).

As before, some of the points in ND (p) and N8 (p) fall outside the image if (x,y) is on
the border of the image.
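These neighborhoods are easy to express in code. The following sketch lists the coordinate offsets and discards neighbors that fall outside the image (the helper names n4, nd, n8 and inside are mine, chosen only for illustration):

def n4(x, y):
    """4-neighbors N4(p) of pixel p = (x, y)."""
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def nd(x, y):
    """Diagonal neighbors ND(p)."""
    return [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]

def n8(x, y):
    """8-neighbors N8(p) = N4(p) together with ND(p)."""
    return n4(x, y) + nd(x, y)

def inside(coords, rows, cols):
    """Keep only the neighbors that lie inside an image of the given size."""
    return [(i, j) for (i, j) in coords if 0 <= i < rows and 0 <= j < cols]

print(inside(n8(0, 0), rows=5, cols=5))  # only 3 of the 8 neighbors are inside at a corner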
Adjacency and connectivity
Let V be the set of gray-level values used to define adjacency. In a binary image,
V = {1}. In a gray-scale image, the idea is the same, but V typically contains more
elements, for example, V = {180, 181, 182, …, 200}.
If the possible intensity values are 0 – 255, the set V can be any subset of these 256 values
to which we refer when testing the adjacency of pixels.
Three types of adjacency:
• 4-adjacency: two pixels p and q with values from V are 4-adjacent if q is in
the set N4(p).
• 8-adjacency: two pixels p and q with values from V are 8-adjacent if q is in
the set N8(p).
• m-adjacency: two pixels p and q with values from V are m-adjacent if (i) q is
in N4(p), or (ii) q is in ND(p) and the set N4(p) ∩ N4(q) has no pixels whose values
are from V.
• Mixed adjacency is a modification of 8-adjacency. It is introduced to eliminate
the ambiguities that often arise when 8-adjacency is used.
• For example:

Figure :(a) Arrangement of pixels; (b) pixels that are 8-adjacent (shown dashed) to
the center pixel; (c) m-adjacency.
Types of Adjacency:
• In this example, we can note that to connect between two pixels (finding a path
between two pixels):
– In 8-adjacency way, you can find multiple paths between two pixels
– While, in m-adjacency, you can find only one path between two pixels
• So, m-adjacency has eliminated the multiple path connection that has been
generated by the 8-adjacency.
• Two subsets S1 and S2 are adjacent if some pixel in S1 is adjacent to some pixel in
S2. Here "adjacent" means 4-, 8- or m-adjacency.
A Digital Path:
• A digital path (or curve) from pixel p with coordinate (x,y) to pixel q with coordinate
(s,t) is a sequence of distinct pixels with coordinates (x0,y0), (x1,y1), …, (xn, yn) where
(x0,y0) = (x,y) and (xn, yn) = (s,t) and pixels (xi, yi) and (xi-1, yi-1) are adjacent for 1 ≤ i ≤
n
• n is the length of the path
• If (x0,y0) = (xn, yn), the path is closed.
We can specify 4-, 8- or m-paths depending on the type of adjacency specified.
• Return to the previous example:

Figure: (a) Arrangement of pixels; (b) pixels that are 8-adjacent(shown dashed) to
the center pixel; (c) m-adjacency.
In figure (b) the paths between the top-right and bottom-right pixels are 8-paths, and
the path between the same two pixels in figure (c) is an m-path.
Connectivity:
• Let S represent a subset of pixels in an image, two pixels p and q are said to be
connected in S if there exists a path between them consisting entirely of pixels in
S.
• For any pixel p in S, the set of pixels that are connected to it in S is called a
connected component of S. If it only has one connected component, then set S is
called a connected set.
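As an illustration of connectivity, the sketch below grows the connected component containing a seed pixel using 4-adjacency and a value set V (a breadth-first search; the function name and the 4×4 test array are illustrative choices of mine):

from collections import deque
import numpy as np

def connected_component(img, seed, V=frozenset({1})):
    """Pixels 4-connected to `seed` through pixels whose values are in V (a sketch)."""
    rows, cols = img.shape
    if img[seed] not in V:
        return set()
    comp, queue = {seed}, deque([seed])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):   # 4-neighbors
            if 0 <= nx < rows and 0 <= ny < cols and (nx, ny) not in comp and img[nx, ny] in V:
                comp.add((nx, ny))
                queue.append((nx, ny))
    return comp

img = np.array([[1, 1, 0, 0],
                [0, 1, 0, 1],
                [0, 0, 0, 1],
                [1, 0, 1, 1]])
print(sorted(connected_component(img, (0, 0))))  # [(0, 0), (0, 1), (1, 1)]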
Region and Boundary:
REGION: Let R be a subset of pixels in an image. We call R a region of the image if
R is a connected set.
BOUNDARY: The boundary (also called border or contour) of a
region R is the set of pixels in the region that have one or more neighbors that are not
in R. If R happens to be an entire image, then its boundary is defined as the set of pixels in the
first and last rows and columns of the image. This extra definition is required because an image
has no neighbors beyond its borders. Normally, when we refer to a region, we are referring to a
subset of an image, and any pixels in the boundary of the region that happen to coincide with
the border of the image are included implicitly as part of the region boundary.
DISTANCE MEASURES:
For pixels p, q and z with coordinates (x,y), (s,t) and (v,w) respectively, D is a distance
function or metric if
(a) D(p,q) ≥ 0 (D(p,q) = 0 iff p = q),
(b) D(p,q) = D(q,p), and
(c) D(p,z) ≤ D(p,q) + D(q,z).
• The Euclidean distance between p and q is defined as:
De(p,q) = [(x − s)² + (y − t)²]^(1/2)
Pixels having a distance less than or equal to some value r from (x,y) are the points
contained in a disk of radius r centered at (x,y).

• The D4 distance (also called city-block distance) between p and q is defined as:
D4 (p,q) = | x – s | + | y – t |

Pixels having a D4 distance from (x,y) less than or equal to some value r form a
diamond centered at (x,y).

Example:
The pixels with distance D4 ≤ 2 from (x,y) form the following contours of constant
distance.
The pixels with D4 = 1 are the 4-neighbors of (x,y)

• The D8 distance (also called chessboard distance) between p and q is defined as:
D8 (p,q) = max(| x – s |,| y – t |)
Pixels having a D8 distance from (x,y) less than or equal to some value r form a
square centered at (x,y).
Example:
D8 distance ≤ 2 from (x,y) form the following contours of constant distance.
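The three distance measures can be written directly from the definitions above (a sketch; the function names are mine). For p = (0, 0) and q = (3, 4), the Euclidean, city-block and chessboard distances are 5, 7 and 4 respectively:

def d_euclidean(p, q):
    (x, y), (s, t) = p, q
    return ((x - s) ** 2 + (y - t) ** 2) ** 0.5

def d4(p, q):                      # city-block distance
    (x, y), (s, t) = p, q
    return abs(x - s) + abs(y - t)

def d8(p, q):                      # chessboard distance
    (x, y), (s, t) = p, q
    return max(abs(x - s), abs(y - t))

p, q = (0, 0), (3, 4)
print(d_euclidean(p, q), d4(p, q), d8(p, q))   # 5.0  7  4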

• Dm distance:
It is defined as the shortest m-path between the points. In this case, the distance
between two pixels will depend on the values of the pixels along the path, as well as
the values of their neighbors.
• Example:
Consider the following arrangement of pixels and assume that p, p2, and p4 have value
1 and that p1 and p3 can have a value of 0 or 1. Suppose that we consider
adjacency of pixels with value 1 (i.e. V = {1}).

Now, to compute the Dm between points p and p4


Here we have 4 cases:
Case1: If p1 =0 and p3 = 0
The length of the shortest m-path (the Dm distance) is 2 (p, p2, p4)

Case2: If p1 =1 and p3 = 0

now, p and p2 will no longer be m-adjacent (see the m-adjacency definition);


then, the length of the shortest path will be 3 (p, p1, p2, p4)

Case3: If p1 =0 and p3 = 1

The same applies here, and the shortest m-path will be 3 (p, p2, p3, p4)

Case4: If p1 =1 and p3 = 1
The length of the shortest m-path will be 4 (p, p1 , p2, p3, p4)

Gray levels:
Image resolution
Resolution can be defined as the total number of pixels in an image. As discussed
earlier, the clarity of an image does not depend on the number of pixels alone, but on the
spatial resolution of the image. Here we are going to discuss another type
of resolution, which is called gray level resolution.
Gray level resolution
Gray level resolution refers to the predictable or deterministic change in the shades or
levels of gray in an image.
In short gray level resolution is equal to the number of bits per pixel.
We have already discussed bits per pixel in connection with image storage
requirements; bpp is defined briefly here.
BPP
The number of different colors in an image depends on the color depth, or bits per
pixel.
Mathematically
The mathematical relation between gray level resolution and bits per pixel can be given as:

L = 2^k

In this equation, L refers to the number of gray levels. It can also be defined as the shades
of gray, and k refers to bpp, or bits per pixel. So 2 raised to the power of the bits per
pixel is equal to the gray level resolution.
For example:

The above image of Einstein is a gray-scale image, i.e. an image with 8 bits
per pixel, or 8 bpp.
Now, if we were to calculate the gray level resolution, here is how we do it:
L = 2^k
where k = 8, so
L = 2^8
L = 256
It means its gray level resolution is 256. In other words, this image has
256 different shades of gray. The more bits per pixel an image has, the higher its
gray level resolution.
Defining gray level resolution in terms of bpp
It is not necessary that a gray level resolution should only be defined in terms of levels.
We can also define it in terms of bits per pixel.
For example
If you are given an image of 4 bpp and are asked to calculate its gray level
resolution, there are two answers to that question.
The first answer is 16 levels.
The second answer is 4 bits.
Finding bpp from Gray level resolution
You can also find the bits per pixels from the given gray level resolution. For this, we
just have to twist the formula a little.
Equation (1): L = 2^k

This formula gives the number of levels. Now, if we want to find the bits per pixel, or in this case
k, we simply rearrange it:
Equation (2): k = log2(L)
Because in the first equation the relationship between levels (L) and bits per pixel (k)
is exponential, we have to invert it, and the inverse of the exponential is the
logarithm.
Let’s take an example to find bits per pixel from gray level resolution.
For example:
If you are given an image of 256 levels, what are the bits per pixel required for it?
Putting 256 in the equation, we get:
k = log2(256)
k = 8
So the answer is 8 bits per pixel.
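Both directions of the relation, L = 2^k and k = log2(L), fit in two small helpers (a sketch; the names are mine, and L is assumed to be a power of two):

import math

def gray_levels(bpp):
    """L = 2**k : number of gray levels for k bits per pixel."""
    return 2 ** bpp

def bits_per_pixel(levels):
    """k = log2(L) : bits per pixel for L gray levels (L a power of two)."""
    return int(math.log2(levels))

print(gray_levels(8))        # 256
print(bits_per_pixel(256))   # 8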
Gray level resolution and quantization:
Quantization will be formally introduced below, but here we are just
going to explain the relationship between gray level resolution and quantization.
Gray level resolution is found on the y axis of the signal. As studied in signals and
systems, digitizing an analog signal requires two
steps: sampling and quantization.

Sampling is done on x axis. And quantization is done in Y axis.


So that means digitizing the gray level resolution of an image is done in quantization.
We are now formally going to relate quantization to digital images. Let's first discuss
quantization a little.
Digitizing a signal
As we have seen, digitizing an analog signal into a digital one
requires two basic steps: sampling and quantization. Sampling is done on the x axis. It is
the conversion of the x axis (infinite values) to digital values. The figure below shows
sampling of a signal.

Fig: Sampling with relation to digital images

The concept of sampling is directly related to zooming. The more samples you take,
the more pixels you get. Oversampling can also be called zooming.
But the story of digitizing a signal does not end at sampling; there is another step
involved, which is known as quantization.
What is quantization
Quantization is the counterpart of sampling. It is done on the y axis. When you are quantizing an
image, you are actually dividing a signal into quanta (partitions). On the x axis of the
signal are the coordinate values, and on the y axis we have the amplitudes. So digitizing
the amplitudes is known as quantization. Here is how it is done.
You can see in this image that the signal has been quantized into three different levels.
That means that when we sample an image, we actually gather a lot of values, and in
quantization, we set levels to these values. This is made clearer in the image below.

Figure: quantizing the sampled signal into discrete levels

In the figure shown for sampling, although the samples have been taken, they were
still spanning vertically over a continuous range of gray level values. In the figure shown
above, these vertically ranging values have been quantized into 5 different levels or
partitions, ranging from 0 (black) to 4 (white). The number of levels can vary according to the type
of image you want.
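A minimal sketch of uniform quantization of sampled amplitudes into a chosen number of levels (the sample values and the 5 levels mirror the idea of the figure but are illustrative, not taken from it):

import numpy as np

def quantize(samples, levels):
    """Uniformly quantize amplitudes in [0, 1] into integer levels 0 .. levels-1."""
    return np.clip(np.floor(samples * levels), 0, levels - 1).astype(int)

samples = np.array([0.05, 0.30, 0.55, 0.80, 0.95, 0.62, 0.18])
print(quantize(samples, levels=5))   # [0 1 2 4 4 3 0]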
The relation of quantization with gray levels has been further discussed below.
Relation of Quantization with gray level resolution:
The quantized figure shown above has 5 different levels of gray. It means that the
image formed from this signal would have only 5 different gray values; it would be a black
and white image, more or less, with a few shades of gray. Now, if you were to make the
quality of the image better, there is one thing you can do here, which is to increase
the number of levels, i.e. the gray level resolution. If you increase this to 256, you
have a gray-scale image, which is far better than a simple black and white image.
Now 256, or 5, or whatever number of levels you choose, is called the gray level. Remember the
formula for gray level resolution that we discussed above:
L = 2^k
We have discussed that the gray level can be defined in two ways, namely:

• Gray level = number of bits per pixel (bpp) (k in the equation)

• Gray level = number of levels per pixel
In this case the gray level is equal to 256. If we have to calculate the number of
bits, we simply put the values in the equation. In the case of 256 levels, we have 256
different shades of gray and 8 bits per pixel; hence the image would be a gray-scale
image.
Reducing the gray level
Now we will reduce the gray levels of the image to see the effect on the image.
For example
Let's say you have an image of 8 bpp that has 256 different levels. It is a gray-scale
image, and the image looks something like this.
256 Gray Levels

Now we will start reducing the gray levels. We will first reduce the gray levels from
256 to 128.
128 Gray Levels

There is not much effect on the image after decreasing the gray levels to half. Let's
decrease some more.
64 Gray Levels
Still not much effect; then let's reduce the levels further.
32 Gray Levels

Surprisingly, there is still only a small effect. Maybe it is because it
is a picture of Einstein, but let's reduce the levels further.

16 Gray Levels

Here we go: the image finally reveals that it is affected by the number of levels.
8 Gray Levels

4 Gray Levels

Before reducing it further to 2 levels, you can easily see that the image has already been
distorted badly by reducing the gray levels. Now we will reduce it to 2 levels, which
is nothing but simple black and white. It means the image would be a simple
black and white image.
2 Gray Levels

That's the last level we can achieve, because if we reduced it further, it would simply be a
black image, which cannot be interpreted.
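The gray-level reduction described above can be reproduced by requantizing an 8-bit image to 2^k levels (a sketch; a random array stands in for the Einstein picture, and keeping values on the 0–255 scale is an illustrative display choice):

import numpy as np

def reduce_gray_levels(img8, k):
    """Requantize an 8-bit image to 2**k gray levels (values kept on the 0-255 scale)."""
    step = 256 // (2 ** k)
    return (img8 // step) * step

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)   # stand-in for the Einstein picture
print(reduce_gray_levels(img, k=4))   # 16 gray levels
print(reduce_gray_levels(img, k=1))   # 2 gray levels (values 0 and 128 only)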
Contouring
There is an interesting observation here: as we reduce the number of gray levels,
a special type of effect starts appearing in the image, which can be seen clearly in
the 16 gray level picture. This effect is known as contouring.
Iso preference curves
The explanation of why this effect appears lies in isopreference curves, which are
discussed below along with contouring.
What is contouring?
As we decrease the number of gray levels in an image, false colors or edges start
appearing in the image, as shown in the quantization discussion above.
Let's have a look at it.
Consider an image of 8 bpp (a grayscale image) with 256 different shades of
gray, or gray levels.

The above picture has 256 different shades of gray. When we reduce it to 128,
and further reduce it to 64, the image is more or less the same. But when we reduce it
further to 32 different levels, we get a picture like this.

If you look closely, you will find that the effects start appearing in the image.
These effects are more visible when we reduce it further to 16 levels, and we get an
image like this.

The lines that start appearing in this image are known as contouring, and they are
clearly visible in the above image.
Increase and decrease in contouring
The effect of contouring increases as we reduce the number of gray levels, and it
decreases as we increase the number of gray levels; the two are inversely related.
That means coarser quantization results in more contouring, and vice versa. But is
this always the case? The answer is no; it depends on something else, which is
discussed below.
Isopreference curves
A study was conducted on this effect of gray levels and contouring, and the results were
shown in a graph in the form of curves, known as isopreference curves.
The phenomenon of isopreference curves shows that the effect of contouring depends not only
on the decrease in gray level resolution but also on the image detail.
The essence of the study is: if an image has more detail, the effect of contouring
starts to appear on the image later, as compared to an image which has less detail,
when the gray levels are quantized. According to the original research, the researchers
took three images and varied the gray level resolution in all three images.
The images were:
Level of detail
The first image has only a face in it, and hence very little detail. The second image has
some other objects in the image too, such as the camera man, his camera, camera stand,
and background objects, etc., whereas the third image has more detail than all the
other images. The graph is shown below.

According to this graph, we can see that the first image, which was of a face, was subject
to contouring earlier than the other two images. The second image, that of the
cameraman, was subject to contouring a bit after the first image when its gray levels
were reduced. This is because it has more detail than the first image. And the third
image was subject to contouring well after the first two images, i.e. after 4 bpp. This is
because this image has more detail.
Imaging geometry

Central Projection

Vector notation: the central projection equation relates the scene point and the image
point, both written as 3-vectors, up to a scale factor.

Here central projection is represented in the coordinate frame attached to the camera.
Generally, there is no direct access to this camera coordinate frame. Instead, we need
to determine the mapping from a world coordinate frame to an image coordinate
system.

2D-FFT PROPERTIES:
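Two standard 2-D DFT properties, conjugate symmetry for real images and separability, can be checked numerically (a sketch using NumPy's FFT; the 8×8 random test image is an illustrative choice):

import numpy as np

rng = np.random.default_rng(0)
f = rng.random((8, 8))                 # a real-valued test "image"

F = np.fft.fft2(f)

# Property 1: conjugate symmetry of the DFT of a real image, F(u, v) = F*(-u, -v).
F_neg = np.roll(np.flip(F), shift=(1, 1), axis=(0, 1))     # F evaluated at (-u, -v) mod N
print("conjugate-symmetry error:", np.max(np.abs(F - np.conj(F_neg))))

# Property 2: separability - the 2-D FFT equals 1-D FFTs taken over rows, then columns.
F_sep = np.fft.fft(np.fft.fft(f, axis=0), axis=1)
print("separability error:", np.max(np.abs(F - F_sep)))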


Walsh transform :
We define now the 1-D Walsh transform as follows:

The above is equivalent to:


The transform kernel values are obtained from:

Therefore, the array formed by the Walsh matrix is a real, symmetric matrix. It is
easily shown that it has orthogonal columns and rows.
1-D Inverse Walsh Transform

The above is again equivalent to

The array formed by the inverse Walsh matrix is identical to the one formed by the
forward Walsh matrix apart from a multiplicative factor N.
2-D Walsh Transform
We define now the 2-D Walsh transform as a straightforward extension of the 1-D
transform:

The above is equivalent to:

2D inverse Walsh Transform


We define now the Inverse 2-D Walsh transform. It is identical to the forward 2-D
Walsh transform
The above is equivalent to:

Hadamard Transform:
We define now the 2-D Hadamard transform. It is similar to the 2-D Walsh
transform.

The above is equivalent to:

We define now the Inverse 2-D Hadamard transform. It is identical to the forward 2-
D Hadamard transform.

The above is equivalent to:
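Because the Hadamard kernel is a symmetric matrix of +1/−1 entries, the 2-D transform and its inverse can be written as matrix products (a sketch; the 1/N normalization used here is one common convention and may differ from the kernel definition in the equations above):

import numpy as np
from scipy.linalg import hadamard

N = 8
H = hadamard(N)                     # N x N Hadamard matrix of +1 / -1 entries (N a power of 2)

rng = np.random.default_rng(0)
f = rng.random((N, N))              # test "image" block

# Forward 2-D Hadamard transform.
F = H @ f @ H / N

# The inverse has the same form as the forward transform, as stated above.
f_back = H @ F @ H / N
print(np.allclose(f, f_back))       # True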

Discrete cosine transforms (DCT):


The discrete cosine transform (DCT) helps separate the image into parts (or spectral
sub- bands) of differing importance (with respect to the image's visual quality). The
DCT is similar to the discrete Fourier transform: it transforms a signal or image from
the spatial domain to the frequency domain.
The general equation for a 1-D (N data items) DCT is defined by the following equation,
and the corresponding inverse 1-D DCT transform is simply F^(-1)(u):

The general equation for a 2-D (N by M image) DCT is defined by the following
equation, and the corresponding inverse 2-D DCT transform is simply F^(-1)(u,v):
The basic operation of the DCT is as follows:

• The input image is N by M;


• f(i,j) is the intensity of the pixel in row i and column j;
• F(u,v) is the DCT coefficient in row u and column v of the DCT matrix.
• For most images, much of the signal energy lies at low frequencies; these appear
in the upper left corner of the DCT.
• Compression is achieved since the lower right values represent higher
frequencies, and are often small - small enough to be neglected with little visible
distortion.
• The DCT input is an 8 by 8 array of integers. This array contains each pixel's
gray scale level;
• 8 bit pixels have levels from 0 to 255.
• The Haar transform is based on a class of orthogonal matrices whose elements
are either 1, –1, or 0, multiplied by powers of √2. The Haar transform is a
computationally efficient transform, as the transform of an N-point vector
requires only 2(N – 1) additions and N multiplications.
Slant transform
The slant transform was introduced by Enomoto and Shibata as an orthogonal
transform containing sawtooth waveforms or ‘slant’ basis vectors. A slant basis vector
that is monotonically decreasing in constant steps from maximum to minimum has the
sequency property and has a fast computational algorithm. Let SN denote an N × N
slant matrix with N = 2^n. Then
The S4 matrix is obtained by the following operation:

If a = 2b and b = 1/√5, the slant matrix is given by

The sequency of the slant matrix of order four is given below:

From the sequency property, it is clear that the rows are ordered by the number of sign
changes.
The slant transform reproduces linear variations of brightness very well. However, its
performance at edges is not as good as that of the KLT or DCT. Because of the 'slant' nature
of the lower-order basis vectors, its effect is to smear the edges.
Hotelling Transform:
The KL transform is named after Kari Karhunen and Michel Loève, who developed it
as a series expansion method for continuous random processes. Harold Hotelling
originally studied the discrete formulation of the KL transform, and for this reason the
KL transform is also known as the Hotelling transform. The KL transform is a
reversible linear transform that exploits the statistical properties of a vector
representation. The basis functions of the KL transform are the orthogonal eigenvectors
of the covariance matrix of the data set. A KL transform optimally decorrelates the input
data. After a KL transform, most of the 'energy' of the transform coefficients is
concentrated within the first few components. This is the energy compaction property
of the KL transform.
Drawbacks of KL Transforms
The two serious practical drawbacks of the KL transform are the following:
i. The KL transform is input-dependent, and its basis functions have to be calculated for
each signal model on which it operates. The KL bases have no specific mathematical
structure that leads to fast implementations.
ii. The KL transform requires O(m²) multiply/add operations, whereas the DFT and DCT
require only O(m log₂ m) operations.
Applications of KL Transforms
(i) Clustering Analysis The KL transform is used in clustering analysis to determine a
new coordinate system for sample data in which the largest variance of a projection of the
data lies on the first axis, the next largest variance on the second axis, and so on. Because
these axes are orthogonal, this approach allows the dimensionality of the data set to be
reduced by eliminating the coordinate axes with small variances. This data-reduction
technique is commonly referred to as Principal Component Analysis (PCA).
(ii) Image Compression The KL transform is heavily utilised for performance
evaluation of compression algorithms since it has been proven to be the optimal
transform for the compression of an image sequence in the sense that the KL spectrum
contains the largest number of zero-valued coefficients.
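The following sketch (an illustrative assumption, not part of the original notes) computes the KL/Hotelling transform of a toy data set in Python/NumPy by eigendecomposition of the covariance matrix; it demonstrates the decorrelation and energy-compaction (PCA) behaviour described above.

import numpy as np

# Toy data: 500 samples of 4-dimensional vectors (e.g. pixel feature vectors).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) @ np.array([[3, 0, 0, 0],
                                          [1, 2, 0, 0],
                                          [0, 1, 1, 0],
                                          [0, 0, 0.2, 0.1]])

mean = X.mean(axis=0)
C = np.cov(X - mean, rowvar=False)           # covariance matrix of the data

eigvals, eigvecs = np.linalg.eigh(C)         # eigenvectors = KL basis
order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
eigvecs = eigvecs[:, order]

Y = (X - mean) @ eigvecs                     # KL (Hotelling) transform coefficients
print(np.round(np.cov(Y, rowvar=False), 3))  # ~diagonal: coefficients are decorrelated

# Dimensionality reduction: keep only the first 2 principal components.
X_approx = Y[:, :2] @ eigvecs[:, :2].T + mean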
MODULE-II
IMAGE ENHANCEMENT
COURSE OUTCOMES MAPPED WITH MODULE-II
At the end of the unit students are able to:
CO 2: Construct image intensity transformations and spatial filtering for image enhancement in the spatial domain. (Bloom's Taxonomy: Apply)
CO 3: Identify 2D convolution and filtering techniques for smoothing and sharpening of images in the frequency domain. (Bloom's Taxonomy: Apply)

PROGRAM OUTCOMES AND PROGRAM SPECIFIC OUTCOMES MAPPED WITH MODULE II

PO 1 Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO 2 Problem analysis: Identify, formulate, review research literature, and analyze
complex engineering problems reaching substantiated conclusions using first
principles of mathematics, natural sciences, and engineering sciences
PO 3 Design/development of solutions: Design solutions for complex engineering
problems and design system components or processes that meet the specified
needs with appropriate consideration for the public health and safety, and the
cultural, societal, and environmental considerations.

PO 4 Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.

PO 10 Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.

PSO 3 Focus on improving software reliability, network security or information retrieval systems.

MAPPING OF COs WITH POs, PSOs FOR MODULE II


Program
Course Program Outcomes Specific
Outcomes Outcomes
1 2 3 4 5 6 7 8 9 10 11 12 1 2 3

CO 2 √ √ √ √
CO 3 √ √ √ √
INTRODUCTION:
Image enhancement approaches fall into two broad categories: spatial domain methods and frequency
domain methods. The term spatial domain refers to the image plane itself, and approaches in this
category are based on direct manipulation of pixels in an image.
Frequency domain processing techniques are based on modifying the Fourier transform of an image.
Enhancing an image provides better contrast and a more detailed image compared to the non-enhanced
image. Image enhancement has many useful applications: it is used to enhance medical images, images
captured in remote sensing, satellite images, etc. As indicated previously, the term spatial domain
refers to the aggregate of pixels composing an image. Spatial domain methods are procedures that operate
directly on these pixels. Spatial domain processes will be denoted by the expression

g(x,y) = T[f(x,y)]
where f(x, y) is the input image, g(x, y) is the processed image, and T is an operator on f, defined
over some neighborhood of (x, y). The principal approach in defining a neighborhood about a point
(x, y) is to use a square or rectangular subimage area centered at (x, y), as the figure shows. The center
of the subimage is moved from pixel to pixel starting, say, at the top left corner. The operator T is applied
at each location (x, y) to yield the output, g, at that location. The process utilizes only the pixels in
the area of the image spanned by the neighborhood.
The simplest form of T is when the neighborhood is of size 1*1 (that is, a single pixel). In this case,
g depends only on the value of f at (x, y), and T becomes a gray-level (also called an intensity or
mapping) transformation function of the form

s=T(r)
where r is the pixel value of the input image and s is the pixel value of the output image. T is a
transformation function that maps each value of r to a value of s.
For example, if T(r) has the form shown in the figure, the effect of this transformation would be to
produce an image of higher contrast than the original by darkening the levels below m and brightening
the levels above m in the original image. In this technique, known as contrast stretching, the values
of r below m are compressed by the transformation function into a narrow range of s, toward black.
The opposite effect takes place for values of r above m.
In the limiting case shown in Fig. T(r) produces a two-level (binary) image. A mapping of this form
is called a thresholding function.
One of the principal approaches in this formulation is based on the use of so-called masks (also
referred to as filters, kernels, templates, or windows). Basically, a mask is a small (say, 3*3) 2-D
array, such as the one shown in Fig, in which the values of the mask coefficients determine the
nature of the process, such as image sharpening. Enhancement techniques based on this type of
approach often are referred to as mask processing or filtering.
Figure: Gray level transformation functions for contrast enhancement.

Image enhancement can be done through gray level transformations which are discussed below.
Basic gray level transformations:
• Image negative
• Log transformations
• Power law transformations
• Piecewise-linear transformation functions
Linear transformation:
First we will look at the linear transformation. Linear transformation includes the simple identity
and negative transformations.
The identity transformation is shown by a straight line. In this transformation, each value of the input
image is directly mapped to the same value of the output image, resulting in an output image identical
to the input image; hence it is called the identity transformation. It has been shown below:

Fig. Linear transformation between input and output.

Negative transformation:
The second linear transformation is the negative transformation, which is the inverse of the identity
transformation. In the negative transformation, each value of the input image is subtracted from
L-1 and mapped onto the output image.
Image negative: The image negative of an image with gray level values in the range [0, L-1] is obtained by
the negative transformation given by S = T(r), or
S = L - 1 - r
where r is the gray level value at pixel (x, y) and L is the largest gray level in the image. The result
is a photographic negative. It is useful for enhancing white or gray details embedded in dark regions
of an image. The overall graph of these transformations is shown below.



Fig. Some basic gray-level transformation functions used for image enhancement.

In this case the following transition has been done.


s = (L – 1) – r

Since the input image of Einstein is an 8 bpp image, the number of levels in this image is
256. Putting 256 into the equation, we get
s = 255 - r

So each value is subtracted from 255 and the resulting image is shown above. What happens
is that the lighter pixels become dark and the darker pixels become light, and the result is
the image negative.
It has been shown in the graph below.

Fig. Negative transformations.
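A minimal Python/NumPy sketch of the negative transformation s = (L-1) - r (the sample pixel values below are arbitrary, chosen only for illustration):

import numpy as np

def image_negative(img, L=256):
    # s = (L - 1) - r, applied to every pixel
    return ((L - 1) - img.astype(np.int32)).astype(np.uint8)

img = np.array([[0, 10, 200],
                [255, 128, 64]], dtype=np.uint8)
print(image_negative(img))   # dark pixels become light and vice versa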

Logarithmic transformations:
Logarithmic transformation further contains two types of transformations: the log transformation and
the inverse log transformation.
Log transformations:
The log transformation can be defined by the formula
s = c log(r + 1)
where s and r are the pixel values of the output and the input image and c is a constant. The
value 1 is added to each pixel value of the input image because if there is a pixel intensity
of 0 in the image, then log(0) is undefined. So 1 is added to make the minimum argument at
least 1.
During log transformation, the dark pixels in an image are expanded relative to the higher
pixel values. The higher pixel values are compressed in the log transformation. This results
in the following image enhancement. Another way of presenting the
Log transformations: Enhance details in the darker regions of an image at the expense of detail in the
brighter regions.
s = c log(1 + r)
• Here c is a constant and r ≥ 0.
• The shape of the curve shows that this transformation maps the narrow range of low gray-level values
in the input image into a wider range of output values.
• The opposite is true for the high gray-level values of the input image.

Fig. log transformation curve input vs output
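A minimal Python/NumPy sketch of the log transformation; the scaling constant c is chosen here, as an assumption, so that the output spans the full [0, L-1] range:

import numpy as np

def log_transform(img, L=256):
    # img is assumed to be an 8-bit grayscale image with at least one nonzero pixel
    img = img.astype(np.float64)
    c = (L - 1) / np.log(1 + max(img.max(), 1))   # scale output to [0, L-1]
    s = c * np.log(1 + img)                        # s = c log(1 + r)
    return s.astype(np.uint8)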

POWER – LAW TRANSFORMATIONS:


There are two further transformations under power-law transformations: the nth power and
nth root transformations. These transformations can be given by the expression:
s = c r^γ
The symbol γ is called gamma, due to which this transformation is also known as the gamma
transformation.
Variation in the value of γ varies the enhancement of the images. Different display devices /
monitors have their own gamma correction, which is why they display their images at different
intensities.
where c and γ are positive constants. Sometimes the equation is written as s = c (r + ε)^γ to account
for an offset (that is, a measurable output when the input is zero). Plots of s versus r for various
values of γ are shown in the figure. As in the case of the log transformation, power-law curves with
fractional values of γ map a narrow range of dark input values into a wider range of output values,
with the opposite being true for higher values of input levels. Unlike the log function, however, we
notice here a family of possible transformation curves obtained simply by varying γ.
Curves generated with values of γ > 1 have exactly the opposite effect as those generated with values
of γ < 1. Finally, we note that the transformation reduces to the identity transformation when
c = γ = 1.

Fig. 2.13 Plot of the equation s = c r^γ for various values of γ (c = 1 in all cases).

This type of transformation is used for enhancing images for different types of display devices,
since the gamma of different display devices differs. Varying gamma (γ) yields a family of possible
transformation curves s = c r^γ, where c and γ are positive constants:
• γ > 1 compresses dark values and expands bright values.
• γ < 1 expands dark values and compresses bright values (similar to the log transformation).
• When c = γ = 1, it reduces to the identity transformation.
Correcting gamma:

s = c r^γ, for example s = c r^(1/2.5)

The same image but with different gamma values has been shown here.
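A hedged Python/NumPy sketch of gamma correction with s = c r^γ; normalizing the input to [0, 1] before applying the power is an implementation assumption:

import numpy as np

def gamma_correct(img, gamma, c=1.0, L=256):
    r = img.astype(np.float64) / (L - 1)          # normalize to [0, 1]
    s = c * np.power(r, gamma)                    # s = c * r^gamma
    return np.clip(s * (L - 1), 0, L - 1).astype(np.uint8)

# gamma < 1 expands dark values; gamma > 1 compresses them.
# e.g. gamma_correct(img, 1/2.5) brightens a display-darkened image.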
Piecewise-Linear Transformation Functions:
A complementary approach to the methods discussed in the previous three sections is to use
piecewise linear functions. The principal advantage of piecewise linear functions over the types
of functions we have discussed thus far is that the form of piecewise functions can be arbitrarily
complex.
The principal disadvantage of piecewise functions is that their specification requires
considerably more user input.
Contrast stretching: One of the simplest piecewise linear functions is the contrast-stretching
transformation. Low-contrast images can result from poor illumination, lack of dynamic range in
the imaging sensor, or even a wrong setting of the lens aperture during image acquisition.
s = T(r)
Figure x(a) shows a typical transformation used for contrast stretching. The locations of the points
(r1, s1) and (r2, s2) control the shape of the transformation function. If r1 = s1 and r2 = s2, the
transformation is a linear function that produces no changes in gray levels. If r1 = r2, s1 = 0 and
s2 = L-1, the transformation becomes a thresholding function that creates a binary image.
Intermediate values of (r1, s1) and (r2, s2) produce various degrees of spread in the gray levels
of the output image, thus affecting its contrast. In general, r1 ≤ r2 and s1 ≤ s2 is assumed so that
the function is single-valued and monotonically increasing.

Fig. x Contrast stretching. (a) Form of transformation function. (b) A low-contrast stretching.
(c) Result of contrast stretching. (d) Result of thresholding

Figure x(b) shows an 8-bit image with low contrast. Fig. x(c) shows the result of contrast
stretching, obtained by setting (r1, s1 )=(rmin, 0) and (r2, s2)=(rmax,L-1) where rmin and rmax
denote the minimum and maximum gray levels in the image, respectively.
Thus, the transformation function stretched the levels linearly from their original range to the
full range
[0, L-1] . Finally, Fig. x(d) shows the result of using the thresholding function defined
previously, with r1=r2=m, the mean gray level in the image. The original image on which these
results are based is a scanning electron microscope image of pollen, magnified approximately
700 times.
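A possible Python/NumPy implementation of the piecewise-linear contrast-stretching transformation through (r1, s1) and (r2, s2); running the end segments through (0, 0) and (L-1, L-1) follows the figure described above, and the small guards against division by zero are implementation assumptions:

import numpy as np

def contrast_stretch(img, r1, s1, r2, s2, L=256):
    # Piecewise-linear mapping with breakpoints (r1, s1) and (r2, s2);
    # below r1 and above r2 the segments run to (0, 0) and (L-1, L-1).
    r = img.astype(np.float64)
    out = np.empty_like(r)
    low = r < r1
    mid = (r >= r1) & (r <= r2)
    high = r > r2
    out[low] = s1 * r[low] / r1 if r1 > 0 else 0.0
    out[mid] = s1 + (s2 - s1) * (r[mid] - r1) / max(r2 - r1, 1e-12)
    out[high] = s2 + (L - 1 - s2) * (r[high] - r2) / max(L - 1 - r2, 1e-12)
    return np.clip(out, 0, L - 1).astype(np.uint8)

# Full-range stretching as described in the text:
# stretched = contrast_stretch(img, img.min(), 0, img.max(), 255)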
Gray-level slicing:
Highlighting a specific range of gray levels in an image often is desired. Applications include
enhancing features such as masses of water in satellite imagery and enhancing flaws in X-ray
images.
There are several ways of doing level slicing, but most of them are variations of two basic
themes. One approach is to display a high value for all gray levels in the range of interest and a
low value for all other gray levels.
This transformation, shown in Fig. y(a), produces a binary image. The second approach, based
on the transformation shown in Fig .y (b), brightens the desired range of gray levels but
preserves the background and gray-level tonalities in the image. Figure y (c) shows a gray-scale
image, and Fig. y(d) shows the result of using the transformation in Fig. y(a).Variations of the
two transformations shown in Fig. are easy to formulate.

Bit-plane slicing:
Instead of highlighting gray-level ranges, highlighting the contribution made to total image
appearance by specific bits might be desired. Suppose that each pixel in an image is represented
by 8 bits. Imagine that the image is composed of eight 1-bit planes, ranging from bit-plane 0 for
the least significant bit to bit plane 7 for the most significant bit. In terms of 8-bit bytes, plane
0 contains all the lowest order bits in the bytes comprising the pixels in the image and plane 7
contains all the high-order bits.
Figure illustrates these ideas, and Fig. 3.14 shows the various bit planes for the image shown in
Fig. Note that the higher-order bits (especially the top four) contain the majority of the visually
significant data. The other bit planes contribute to more subtle details in the image. Separating
a digital image into its bit planes is useful for analyzing the relative importance played by each
bit of the image, a process that aids in determining the adequacy of the number of bits used to
quantize each pixel.

In terms of bit-plane extraction for an 8-bit image, it is not difficult to show that the (binary)
image for bit-plane 7 can be obtained by processing the input image with a thresholding gray-
level transformation function that (1) maps all levels in the image between 0 and 127 to one
level (for example, 0); and (2) maps all levels between 128 and 255 to another (for example,
255). The binary image for bit-plane 7 in the figure was obtained in just this manner. It is left as an
exercise to obtain the gray-level transformation functions that would yield the other bit planes.
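A small Python/NumPy sketch of bit-plane slicing for an 8-bit image (the sample values below are arbitrary):

import numpy as np

def bit_plane(img, k):
    # Extract bit-plane k (0 = LSB, 7 = MSB) of an 8-bit image as a binary image
    return (img >> k) & 1

img = np.array([[0, 129, 200],
                [255, 64, 127]], dtype=np.uint8)
print(bit_plane(img, 7))   # 1 wherever the pixel value is >= 128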
Histogram Processing:
The histogram of a digital image with gray levels in the range [0, L-1] is a discrete function of
the form
H(rk)=nk
where rk is the kth gray level and nk is the number of pixels in the image having the level rk.
A normalized histogram is given by the equation p(rk) = nk/n for k = 0, 1, 2, ..., L-1; p(rk) gives
an estimate of the probability of occurrence of gray level rk. The sum of all components of a
normalized histogram is equal to 1. The histogram plots are simple plots of H(rk) = nk versus rk.
In a dark image the components of the histogram are concentrated on the low (dark) side of
the gray scale. In a bright image the histogram components are biased towards the high
side of the gray scale. The histogram of a low-contrast image will be narrow and will be centered
towards the middle of the gray scale.
The components of the histogram of a high-contrast image cover a broad range of the gray
scale. The net effect of this will be an image that shows a great deal of gray-level detail and
has a high dynamic range.
Histogram Equalization:
Histogram equalization is a common technique for enhancing the appearance of images.
Suppose we have an image which is predominantly dark. Then its histogram would be skewed
towards the lower end of the grey scale and all the image detail would be compressed into the dark
end of the histogram. If we could 'stretch out' the grey levels at the dark end to produce a more
uniformly distributed histogram then the image would become much clearer.
Let r be the gray levels of the image to be enhanced, viewed as a continuous variable. The
range of r is [0, 1], with r = 0 representing black and r = 1 representing white. The transformation
function is of the form
s = T(r), where 0 ≤ r ≤ 1
It produces a level s for every pixel value r in the original image.
The transformation function is assumed to fulfill two conditions: (a) T(r) is single-valued
and monotonically increasing in the interval 0 ≤ r ≤ 1; and (b) 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1. The
transformation function should be single-valued so that the inverse transformation exists. The
monotonically increasing condition preserves the increasing order from black to white in the output
image. The second condition guarantees that the output gray levels will be in the same range as the
input levels. The gray levels of the image may be viewed as random variables in the interval [0, 1].
The most fundamental descriptor of a random variable is its probability density function (PDF).
Pr(r) and Ps(s) denote the probability density functions of the random variables r and s respectively.
A basic result from elementary probability theory states that if Pr(r) and T(r) are known and
T^-1(s) satisfies condition (a), then the probability density function Ps(s) of the transformed
variable is given by the formula Ps(s) = Pr(r) |dr/ds|.

Thus the PDF of the transformed variable s is determined by the gray-level PDF of the
input image and by the chosen transformation function.
A transformation function of particular importance in image processing is the cumulative
distribution function (CDF) of r,
s = T(r) = ∫ pr(w) dw, with the integral taken from w = 0 to w = r,
whose discrete counterpart is sk = T(rk) = Σ (j = 0 to k) nj / n, for k = 0, 1, ..., L-1,
where L is the total number of possible gray levels in the image.
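A compact Python/NumPy sketch of discrete histogram equalization using the cumulative distribution as the mapping function; rounding to integer output levels is an implementation assumption, and the input is assumed to be an 8-bit (uint8) image:

import numpy as np

def histogram_equalize(img, L=256):
    hist = np.bincount(img.ravel(), minlength=L)   # h(r_k) = n_k
    cdf = np.cumsum(hist) / img.size               # cumulative distribution of r
    s = np.round((L - 1) * cdf).astype(np.uint8)   # s_k = (L-1) * sum_{j<=k} n_j / n
    return s[img]                                  # map every pixel through T(r)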
Image enhancement in frequency domain
Blurring/noise reduction: Noise is characterized by sharp transitions in image intensity. Such
transitions contribute significantly to the high-frequency components of the Fourier transform.
Intuitively, attenuating certain high-frequency components results in blurring and reduction of
image noise.
Ideal low-pass filter:
The ideal low-pass filter cuts off all high-frequency components at a distance greater than a certain
distance from the origin (the cutoff frequency):
H(u,v) = 1 if D(u,v) ≤ D0
H(u,v) = 0 if D(u,v) > D0
where D0 is a positive constant and D(u,v) is the distance between a point (u,v) in the
frequency domain and the center of the frequency rectangle; that is
D(u,v) = [(u - P/2)² + (v - Q/2)²]^(1/2)
where P and Q are the padded sizes from the basic equations; wraparound error in the circular
convolution can be avoided by padding these functions with zeros.
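A short Python/NumPy sketch that builds the ideal low-pass transfer function H(u, v) on a padded P x Q grid (the padded size used in the example call is an assumption matching the test-pattern discussion below):

import numpy as np

def ideal_lowpass(P, Q, D0):
    # H(u,v) = 1 if D(u,v) <= D0, else 0, with the origin at the centre (P/2, Q/2)
    u = np.arange(P)[:, None]
    v = np.arange(Q)[None, :]
    D = np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2)
    return (D <= D0).astype(float)

H = ideal_lowpass(2 * 688, 2 * 688, 60)   # padded size P = Q = 2 * 688, cutoff radius 60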
Visualization: ideal low pass filter:
As shown in fig. below

Fig: ideal low pass filter 3-D view and 2-D view and line graph. Effect of different cutoff frequencies:
Fig. below (a) Test pattern of size 688x688 pixels, and (b) its Fourier spectrum. The spectrum
is double the image size due to padding but is shown in half size so that it fits in the page. The
superimposed circles have radii equal to 10, 30, 60, 160 and 460 with respect to the full- size
spectrum image. These radii enclose 87.0, 93.1, 95.7, 97.8 and 99.2% of the padded image
power respectively.
Fig: (a) Test patter of size 688x688 pixels (b) its Fourier spectrum

Fig: (a) original image, (b)-(f) Results of filtering using ILPFs with cutoff frequencies set at radii
values 10, 30, 60, 160 and 460, as shown in figure.The power removed by these filters was 13,
6.9, 4.3, 2.2 and 0.8% of the total, respectively.
As the cutoff frequency decreases:
• the image becomes more blurred
• noise increases
• analogous to larger spatial filter sizes
The severe blurring in this image is a clear indication that most of the sharp detail information
in the picture is contained in the 13% power removed by the filter. As the filter radius is
increases less and less power is removed, resulting in less blurring.
Why is there ringing?
Ideal low-pass filter function is a rectangular function
The inverse Fourier transform of a rectangular function is a sinc function.
Fig. Spatial representation of ILPFs and corresponding intensity profiles through the center of the
filters (the size in all cases is 1000x1000 and the cutoff frequency is 5); observe the ringing caused
by the sharp cutoff.
Butterworth low-pass filter:
The transfer function of a Butterworth low-pass filter (BLPF) of order n, with cutoff frequency
at a distance D0 from the origin, is defined as
H(u,v) = 1 / [1 + (D(u,v)/D0)^(2n)]
The transfer function does not have a sharp discontinuity establishing the cutoff between passed and
filtered frequencies.
The cutoff frequency D0 defines the point at which H(u,v) = 0.5.
Fig. (a) Perspective plot of a Butterworth low-pass filter transfer function. (b) Filter displayed
as an image. (c) Filter radial cross sections of order 1 through 4. Unlike the ILPF, the BLPF
transfer function does not have a sharp discontinuity that gives a clear cutoff between passed
and filtered frequencies.

Butterworth low-pass filters of different cutoff frequencies:

Fig. (a) Original image. (b)-(f) Results of filtering using BLPFs of order 2, with cutoff
frequencies at the radii shown earlier.
The figure shows the results of applying the BLPF to Fig. (a), with n = 2 and D0 equal to the five
radii used for the ILPF in Fig. (b). Unlike the ILPF, we note here a smooth transition in blurring
as a function of increasing cutoff frequency. Moreover, no ringing is visible in any of the images
processed with this particular BLPF, a fact attributed to the filter's smooth transition between low
and high frequencies.
A BLPF of order 1 has no ringing in the spatial domain. Ringing generally is imperceptible in
filters of order 2, but can become significant in filters of higher order.
The figure shows a comparison between the spatial representations of BLPFs of various orders (using
a cutoff frequency of 5 in all cases). Shown also is the intensity profile along a horizontal scan
line through the center of each filter. The filter of order 2 does show mild ringing and small
negative values, but they certainly are less pronounced than in the ILPF. A Butterworth filter
of order 20 exhibits characteristics similar to those of the ILPF (in the limit, both filters are
identical).

Fig. (a)-(d) Spatial representation of BLPFs of order 1, 2, 5 and 20 and corresponding intensity
profiles through the center of the filters (the size in all cases is 1000 x 1000 and the cutoff
frequency is 5) Observe how ringing increases as a function of filter order.
Gaussian low pass filters:
The form of these filters in two dimensions is given by
H(u,v) = e^(-D²(u,v) / (2 D0²))
• This transfer function is smooth, like the Butterworth filter.
• A Gaussian in the frequency domain remains a Gaussian in the spatial domain.
• Advantage: no ringing artifacts.
where D0 is the cutoff frequency. When D(u,v) = D0, the GLPF is down to 0.607 of its
maximum value. This means that a spatial Gaussian filter, obtained by computing the IDFT of
the above equation, will have no ringing. The figure shows a perspective plot, image display and
radial cross sections of a GLPF function.
Fig. (a) Perspective plot of a GLPF transfer function. (b) Filter displayed as an image.
(c). Filter radial cross sections for various values of D0

Fig.(a) Original image. (b)-(f) Results of filtering using GLPFs with cutoff frequencies at the radii.
Fig. (a) Original image (784x 732 pixels). (b) Result of filtering using a GLPF with D0 = 100. (c) Result of filtering using a
GLPF with D0 = 80. Note the reduction in fine skin lines in the magnified sections in (b) and (c).
Fig. shows an application of low-pass filtering for producing a smoother, softer-looking result
from a sharp original. For human faces, the typical objective is to reduce the sharpness of fine
skin lines and small blemishes.
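A hedged Python/NumPy sketch of Gaussian low-pass filtering in the frequency domain, including zero padding to avoid wraparound; doubling the image size for padding follows the earlier discussion and is otherwise an implementation assumption:

import numpy as np

def gaussian_lowpass_filter(img, D0):
    P, Q = 2 * img.shape[0], 2 * img.shape[1]          # padded sizes to avoid wraparound
    F = np.fft.fftshift(np.fft.fft2(img, s=(P, Q)))    # centred DFT of the padded image
    u = np.arange(P)[:, None]
    v = np.arange(Q)[None, :]
    D2 = (u - P / 2) ** 2 + (v - Q / 2) ** 2
    H = np.exp(-D2 / (2 * D0 ** 2))                    # GLPF transfer function
    g = np.fft.ifft2(np.fft.ifftshift(F * H)).real     # back to the spatial domain
    return g[:img.shape[0], :img.shape[1]]             # crop to the original size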
Image sharpening using frequency domain filters:
An image can be smoothed by attenuating the high-frequency components of its Fourier
transform. Because edges and other abrupt changes in intensities are associated with high-
frequency components, image sharpening can be achieved in the frequency domain by high pass
filtering, which attenuates the low-frequency components without disturbing high- frequency
information in the Fourier transform.
The filter functions H(u,v) are understood to be discrete functions of size PxQ; that is, the discrete
frequency variables are in the range u = 0, 1, 2, ..., P-1 and v = 0, 1, 2, ..., Q-1.
The meaning of sharpening is
• Edges and fine detail characterized by sharp transitions in image intensity
• Such transitions contribute significantly to high frequency components of Fourier transform
• Intuitively, attenuating certain low frequency components and preserving high frequency
components result in sharpening.
• Intended goal is to do the reverse operation of low-pass filters
• When the low-pass filter attenuates frequencies, the high-pass filter passes them, and vice versa.
A high-pass filter is obtained from a given low-pass filter using the equation

H_hp(u,v) = 1 - H_lp(u,v)
where H_lp(u,v) is the transfer function of the low-pass filter. That is, when the low-pass filter
attenuates frequencies, the high-pass filter passes them, and vice versa. We consider ideal,
Butterworth, and Gaussian high-pass filters. As in the previous section, we illustrate the
characteristics of these filters in both the frequency and spatial domains. The figure shows typical
3-D plots, image representations and cross sections for these filters. As before, we see that the
Butterworth filter represents a transition between the sharpness of the ideal filter and the broad
smoothness of the Gaussian filter. The figure discussed in the sections that follow illustrates what
these filters look like in the spatial domain. The spatial filters were obtained and displayed by
using the procedure used previously.

Fig: Top row: Perspective plot, image representation, and cross section of a typical ideal high-
pass filter. Middle and bottom rows: The same sequence for typical butter-worth and Gaussian
high-pass filters.
Ideal high-pass filter:
A 2-D ideal high-pass filter (IHPF) is defined as
H(u,v) = 0 if D(u,v) ≤ D0
H(u,v) = 1 if D(u,v) > D0

where D0 is the cutoff frequency and D(u,v) is given by the earlier distance equation. As intended,
the IHPF is the opposite of the ILPF in the sense that it sets to zero all frequencies inside a circle
of radius D0 while passing, without attenuation, all frequencies outside the circle. As in the case
of the ILPF, the IHPF is not physically realizable.
Spatial representation of high pass filters:

Fig. Spatial representation of typical (a) ideal, (b) Butterworth and (c) Gaussian frequency-domain
high-pass filters, and corresponding intensity profiles through their centers.
We can expect IHPFs to have the same ringing properties as ILPFs. This is demonstrated clearly in
the figure, which consists of various IHPF results using the original image with D0 set to 30, 60
and 160 pixels, respectively. The ringing in (a) is so severe that it produced distorted, thickened
object boundaries (e.g., look at the large letter "a"). Edges of the top three circles do not show well
because they are not as strong as the other edges in the image (the intensity of these three objects
is much closer to the background intensity, giving discontinuities of smaller magnitude).

Filtered results: IHPF:

Fig. Results of high-pass filtering the image using an IHPF with D0 = 30, 60, and 160.
The situation improved somewhat with D0 = 60. Edge distortion is quite evident still, but
now we begin to see filtering on the smaller objects.
Due to the now familiar inverse relationship between the frequency and spatial domains, we
know that the spot size of this filter is smaller than the spot of the filter with D0 = 30. The result
for D0 = 160 is closer to what a high-pass filtered image should look like. Here, the edges are
much cleaner and less distorted, and the smaller objects have been filtered properly. Of course,
the constant background in all images is zero in these high-pass filtered images because
high-pass filtering is analogous to differentiation in the spatial domain.
Butter-worth high-pass filters:
A 2-D Butterworth high-pass filter (BHPF) of order n and cutoff frequency D0 is defined as
H(u,v) = 1 / [1 + (D0/D(u,v))^(2n)]
where D(u,v) is given by the earlier distance equation; this expression follows directly from the
preceding relations. The middle row of the figure shows an image and cross section of the BHPF
function.
Butterworth high-pass filters behave more smoothly than IHPFs. The figure shows the performance of
a BHPF of order 2 with D0 set to the same values as before. The boundaries are much less distorted
than with the IHPF, even for the smallest value of cutoff frequency.
Filtered results: BHPF:

Fig. Results of high-pass filtering the image in above figure (a) using a BHPF of order 2 with
D0 = 30, 60, and 160 corresponding to the circles in above figure (b). These results are much
smoother than those obtained with an IHPF.
Gaussian high-pass filters:
The transfer function of the Gaussian high-pass filter (GHPF) with cutoff frequency locus at a
distance D0 from the center of the frequency rectangle is given by
H(u,v) = 1 - e^(-D²(u,v) / (2 D0²))
where D(u,v) is given by the earlier distance equation; this expression follows directly from the
preceding relations. The third row in the figure shows a perspective plot, image and cross section
of the GHPF function. Following the same format as for the BHPF, comparable results using GHPFs
are shown in the figure. As expected, the results obtained are more gradual than with the previous
two filters.
MODULE – III
IMAGE RESTORATION AND FILTERING
At the end of the unit students are able to:
CO 3: Apply region and edge based image segmentation techniques for detection of objects in images. (Knowledge Level, Bloom's Taxonomy: Apply)
CO 4: Interpret morphological operations for extracting image components for representation and description of region shape. (Knowledge Level, Bloom's Taxonomy: Apply)

PROGRAM OUTCOMES AND PROGRAM SPECIFIC OUTCOMES MAPPED WITH MODULE III
PO 1 Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex
engineering problems.

PO 2 Problem analysis: Identify, formulate, review research literature, and analyze


complex engineering problems reaching substantiated conclusions using first
principles of mathematics, natural sciences, and engineering sciences

PO 4 Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
PO5 Modern tool usage: Create, Select, and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modeling to complex
engineering activities with an understanding of the limitations.
PO 10 Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and
write effective reports and design documentation, make effective presentations, and give
and receive clear instructions.
PSO 3 Build the Embedded hardware design and software programming skills for entry level job
positions to meet the requirements of employers
MAPPING OF COs WITH POs and PSOs FOR UNIT-III
Course Program Outcomes Program
Outcomes Specific
Outcomes

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3
CO 3 √ √
CO 4 √ √ √ √ √ √
Gray level interpolation.
The distortion correction equations yield non-integer values for x' and y'. Because the
distorted image g is digital, its pixel values are defined only at integer coordinates. Thus using
non-integer values for x' and y' causes a mapping into locations of g for which no gray levels
are defined. Inferring what the gray-level values at those locations should be, based only on the
pixel values at integer coordinate locations, then becomes necessary. The technique used to
accomplish this is called gray-level interpolation. The simplest scheme for gray-level
interpolation is based on a nearest-neighbor approach. This method, also called zero-order
interpolation, is illustrated in Fig. 3.1, which shows the mapping of integer (x, y) coordinates
into fractional coordinates (x', y') by means of the following equations

x' = c1x + c2y + c3xy + c4


and

y' = c5x + c6y + c7xy + c8


(A) The selection of the closest integer coordinate neighbor to (x', y');

and

(B) The assignment of the gray level of this nearest neighbor to the pixel located at (x, y).

Figure 3.1: Gray-level interpolation based on the nearest neighbor concept.

Although nearest neighbor interpolation is simple to implement, this method often has
the drawback of producing undesirable artifacts, such as distortion of straight edges in images
of high resolution. Smoother results can be obtained by using more sophisticated techniques,
such as cubic convolution interpolation, which fits a surface of the sin(z)/z type through a much
larger number of neighbors (say, 16) in order to obtain a smooth estimate of the gray level
at any
Desired point. Typical areas in which smoother approximations generally are required include
3-D graphics and medical imaging. The price paid for smoother approximations is additional
computational burden. For general-purpose image processing a bilinear interpolation approach
that uses the gray levels of the four nearest neighbors usually is adequate. This approach is
straightforward. Because the gray level of each of the four integral nearest neighbors of a non
integral pair of coordinates (x', y') is known, the gray-level value at these coordinates, denoted
v(x', y'), can be interpolated from the values of its neighbors by using the relationship

v (x', y') = ax' + by' + c x' y' + d

where the four coefficients are easily determined from the four equations in four unknowns that
can be written using the four known neighbors of (x', y'). When these coefficients have been
determined, v(x', y') is computed and this value is assigned to the location in f(x, y) that yielded
the spatial mapping into location (x', y'). It is easy to visualize this procedure with the aid of
Fig. 3.1. The exception is that, instead of using the gray-level value of the nearest neighbor to
(x', y'), we actually interpolate a value at location (x', y') and use this value for the gray-level
assignment at (x, y).
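A minimal Python/NumPy sketch of bilinear gray-level interpolation at a non-integer location (x', y'); treating the first index as x and the second as y is an assumed convention:

import numpy as np

def bilinear_interpolate(g, xp, yp):
    # Gray level at non-integer coordinates (xp, yp) of image g.
    # v(x', y') = a x' + b y' + c x' y' + d fitted to the 4 nearest neighbours,
    # which is equivalent to the usual weighted average of those neighbours.
    x0, y0 = int(np.floor(xp)), int(np.floor(yp))
    x1, y1 = min(x0 + 1, g.shape[0] - 1), min(y0 + 1, g.shape[1] - 1)
    dx, dy = xp - x0, yp - y0
    return ((1 - dx) * (1 - dy) * g[x0, y0] + dx * (1 - dy) * g[x1, y0]
            + (1 - dx) * dy * g[x0, y1] + dx * dy * g[x1, y1])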

Wiener filter used for image restoration.


The inverse filtering approach makes no explicit provision for handling noise. The Wiener filtering
approach, in contrast, incorporates both the degradation function and the statistical characteristics
of noise into the restoration process. The method is founded on considering images and noise as
random processes, and the objective is to find an estimate f̂ of the uncorrupted image f such that
the mean square error between them is minimized. This error measure is given by

e² = E{(f - f̂)²}
where E {•} is the expected value of the argument. It is assumed that the noise and the image
are uncorrelated; that one or the other has zero mean; and that the gray levels in the estimate
are a linear function of the levels in the degraded image. Based on these conditions, the
minimum of the error function is given in the frequency domain by the expression
F̂(u,v) = [ (1/H(u,v)) · |H(u,v)|² / ( |H(u,v)|² + Sη(u,v)/Sf(u,v) ) ] G(u,v)
where we used the fact that the product of a complex quantity with its conjugate is equal to the
magnitude of the complex quantity squared. This result is known as the Wiener filter, after N.
Wiener [1942], who first proposed the concept in the year shown. The filter, which consists of
the terms inside the brackets, also is commonly referred to as the minimum mean square error
filter or the least square error filter. The Wiener filter does not have the same problem as the
inverse filter with zeros in the degradation function, unless both H(u, v) and Sη(u, v) are zero
for the same value(s) of u and v. The terms in the above equation are as follows:
H(u, v) = degradation function
H*(u, v) = complex conjugate of H(u, v)
|H(u, v)|² = H*(u, v) H(u, v)
Sη(u, v) = |N(u, v)|² = power spectrum of the noise
Sf(u, v) = |F(u, v)|² = power spectrum of the undegraded image.

As before, H(u, v) is the transform of the degradation function and G(u, v) is the transform of
the degraded image. The restored image in the spatial domain is given by the inverse Fourier
transform of the frequency-domain estimate F̂(u, v). Note that if the noise is zero, then the
noise power spectrum vanishes and the Wiener filter reduces to the inverse filter. When we are
dealing with spectrally white noise, the spectrum |N(u, v)|² is a constant, which simplifies
things considerably. However, the power spectrum of the undegraded image seldom is known.
An approach used frequently when these quantities are not known or cannot be estimated is to
approximate the equation as
F̂(u,v) = [ (1/H(u,v)) · |H(u,v)|² / ( |H(u,v)|² + K ) ] G(u,v)
where K is a specified constant.
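A hedged Python/NumPy sketch of the approximate Wiener filter with the constant K in place of the unknown noise-to-signal power ratio; the PSF is assumed to be registered at the origin before taking its DFT:

import numpy as np

def wiener_filter(g, h, K):
    # g: degraded image, h: spatial PSF (same size as g or smaller),
    # K: constant replacing the unknown ratio S_eta / S_f.
    H = np.fft.fft2(h, s=g.shape)
    G = np.fft.fft2(g)
    H2 = np.abs(H) ** 2
    F_hat = (np.conj(H) / (H2 + K)) * G    # equal to (1/H) * H2/(H2 + K) * G
    return np.real(np.fft.ifft2(F_hat))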


Model of the Image Degradation/Restoration Process.
Figure 3.2 shows that the degradation process is modeled as a degradation function that, together
with an additive noise term, operates on an input image f(x, y) to produce a degraded image g(x,
y). Given g(x, y), some knowledge about the degradation function H, and some knowledge about
the additive noise term η(x, y), the objective of restoration is to obtain an estimate f̂(x, y) of the
original image. The estimate should be as close as possible to the original input image and, in
general, the more we know about H and η, the closer f̂(x, y) will be to f(x, y). The degraded
image is given in the spatial domain by

g (x, y) = h (x, y) * f (x, y) + η (x, y)

where h (x, y) is the spatial representation of the degradation function and, the symbol
* indicates convolution. Convolution in the spatial domain is equal to multiplication in
the frequency domain, hence

G (u, v) = H (u, v) F (u, v) + N (u, v)

where the terms in capital letters are the Fourier transforms of the corresponding terms in above
equation.

Figure 3.2: Model of the image degradation/restoration process.
The restoration filters used when the image degradation is due to noise only.

If the degradation present in an image is only due to noise, then,

g (x, y) = f (x, y) + η (x, y)

G (u, v) = F (u, v) + N (u, v)

The restoration filters used in this case are,

1. Mean filters
2. Order-statistic filters
3. Adaptive filters

Mean filters.
There are four types of mean filters. They are

(i) Arithmetic mean filter


This is the simplest of the mean filters. Let Sxy represent the set of coordinates in a
rectangular subimage window of size m x n, centered at point (x, y). The arithmetic mean
filtering process computes the average value of the corrupted image g(x, y) in the area
defined by Sxy. The value of the restored image f̂ at any point (x, y) is simply the arithmetic
mean computed using the pixels in the region defined by Sxy. In other words
f̂(x, y) = (1/mn) Σ(s,t)∈Sxy g(s, t)
This operation can be implemented using a convolution mask in which all coefficients have
value 1/mn.

(ii) Geometric mean filter


An image restored using a geometric mean filter is given by the expression
f̂(x, y) = [ Π(s,t)∈Sxy g(s, t) ]^(1/mn)
Here, each restored pixel is given by the product of the pixels in the subimage window, raised
to the power 1/mn. A geometric mean filter achieves smoothing comparable to the arithmetic
mean filter, but it tends to lose less image detail in the process.

(iii) Harmonic mean filter


The harmonic mean filtering operation is given by the expression
f̂(x, y) = mn / Σ(s,t)∈Sxy [ 1 / g(s, t) ]
The harmonic mean filter works well for salt noise, but fails for pepper noise. It does well
also with other types of noise like Gaussian noise.

(iv) Contra harmonic mean filter


The contraharmonic mean filtering operation yields a restored image based on the expression
f̂(x, y) = Σ(s,t)∈Sxy g(s, t)^(Q+1) / Σ(s,t)∈Sxy g(s, t)^Q
where Q is called the order of the filter. This filter is well suited for reducing or virtually
eliminating the effects of salt-and-pepper noise. For positive values of Q, the filter eliminates
pepper noise. For negative values of Q it eliminates salt noise. It cannot do both
simultaneously. Note that the contraharmonic filter reduces to the arithmetic mean filter if Q
= 0, and to the harmonic mean filter if Q = -1.
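A straightforward (and deliberately simple) Python/NumPy sketch of the contraharmonic mean filter; edge-replication padding and the loop-based implementation are assumptions made for clarity rather than speed:

import numpy as np

def contraharmonic_mean(g, m, n, Q):
    # Q > 0 removes pepper noise, Q < 0 removes salt noise,
    # Q = 0 is the arithmetic mean, Q = -1 the harmonic mean.
    # In practice, add a small epsilon to w to avoid division issues when
    # zero-valued pixels meet a negative Q.
    g = g.astype(np.float64)
    out = np.zeros_like(g)
    pad = np.pad(g, ((m // 2, m // 2), (n // 2, n // 2)), mode='edge')
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            w = pad[i:i + m, j:j + n]
            num = np.sum(np.power(w, Q + 1))
            den = np.sum(np.power(w, Q))
            out[i, j] = num / den if den != 0 else 0.0
    return out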

The Order-Statistic Filters.


There are four types of Order-Statistic filters. They are

(i) Median filter


The best-known order-statistic filter is the median filter, which, as its name implies,
replaces the value of a pixel by the median of the gray levels in the neighborhood of that pixel:
f̂(x, y) = median(s,t)∈Sxy { g(s, t) }
The original value of the pixel is included in the computation of the median. Median filters
are quite popular because, for certain types of random noise, they provide excellent noise-
reduction capabilities, with considerably less blurring than linear smoothing filters of
similar size. Median filters are particularly effective in the presence of both bipolar and
unipolar impulse noise.

(ii) Max and min filters


Although the median filter is by far the order-statistic filter most used in image
processing, it is by no means the only one. The median represents the 50th percentile of a
ranked set of numbers, but the reader will recall from basic statistics that ranking lends itself
to many other possibilities. For example, using the 100th percentile results in the so-called
max filter, given by
f̂(x, y) = max(s,t)∈Sxy { g(s, t) }
This filter is useful for finding the brightest points in an image. Also, because pepper noise
has very low values, it is reduced by this filter as a result of the max selection process in the
subimage area Sxy. The 0th percentile filter is the min filter,
f̂(x, y) = min(s,t)∈Sxy { g(s, t) }
This filter is useful for finding the darkest points in an image. Also, it reduces salt noise as a
result of the min operation.

(iii) Midpoint filter


The midpoint filter simply computes the midpoint between the maximum and minimum
values in the area encompassed by the filter:
f̂(x, y) = (1/2) [ max(s,t)∈Sxy { g(s, t) } + min(s,t)∈Sxy { g(s, t) } ]
Note that this filter combines order statistics and averaging. This filter works best for randomly
distributed noise, like Gaussian or uniform noise.

(iv) Alpha - trimmed mean filter


It is a filter formed by deleting the d/2 lowest and the d/2 highest gray-level values of
g(s, t) in the neighborhood Sxy. Let gr(s, t) represent the remaining mn - d pixels. A filter formed by
averaging these remaining pixels is called an alpha-trimmed mean filter:
f̂(x, y) = (1/(mn - d)) Σ(s,t)∈Sxy gr(s, t)
where the value of d can range from 0 to mn - 1. When d = 0, the alpha-trimmed filter
reduces to the arithmetic mean filter. If d = (mn - 1)/2, the filter becomes a median filter. For
other values of d, the alpha-trimmed filter is useful in situations involving multiple types of
noise, such as a combination of salt-and-pepper and Gaussian noise.

The Adaptive Filters.


Adaptive filters are filters whose behavior changes based on statistical characteristics
of the image inside the filter region defined by the m X n rectangular window Sxy.

Adaptive, local noise reduction filter:


The simplest statistical measures of a random variable are its mean and variance. These
are reasonable parameters on which to base an adaptive filter because they are quantities
closely related to the appearance of an image. The mean gives a measure of the average gray level
in the region over which the mean is computed, and the variance gives a measure of the average
contrast in that region.

This filter is to operate on a local region, Sxy. The response of the filter at any point (x,
y) on which the region is centered is to be based on four quantities: (a) g(x, y), the value of the
noisy image at (x, y); (b) σ²η, the variance of the noise corrupting f(x, y) to form g(x, y); (c)
mL, the local mean of the pixels in Sxy; and (d) σ²L, the local variance of the pixels in Sxy. The
behavior of the filter is to be as follows:

1. If σ²η is zero, the filter should return simply the value of g(x, y). This is the trivial, zero-noise
case in which g(x, y) is equal to f(x, y).

2. If the local variance is high relative to σ2η the filter should return a value close to g (x, y). A
high local variance typically is associated with edges, and these should be preserved.

3. If the two variances are equal, we want the filter to return the arithmetic mean value of the
pixels in Sxy. This condition occurs when the local area has the same properties as the overall
image, and local noise is to be reduced simply by averaging.

The adaptive local noise reduction filter is given by
f̂(x, y) = g(x, y) - (σ²η / σ²L) [ g(x, y) - mL ]
The only quantity that needs to be known or estimated is the variance of the overall noise, σ²η.
The other parameters are computed from the pixels in Sxy at each location (x, y) on which the
filter window is centered.

Adaptive median filter:


The median filter performs well as long as the spatial density of the impulse noise is not large
(as a rule of thumb, Pa and Pb less than 0.2). The adaptive median filtering can handle impulse
noise with probabilities even larger than these. An additional benefit of the adaptive median
filter is that it seeks to preserve detail while smoothing nonimpulse noise, something that the
"traditional" median filter does not do. The adaptive median filter also works in a rectangular
window area Sxy. Unlike those filters, however, the adaptive median filter changes (increases)
the size of Sxy during filter operation, depending on certain conditions. The output of the filter
is a single value used to replace the value of the pixel at (x, y), the particular point on which the
window Sxy is centered at a given time.

Consider the following notation:

zmin = minimum gray level value in Sxy
zmax = maximum gray level value in Sxy
zmed = median of gray levels in Sxy
zxy = gray level at coordinates (x, y)
Smax = maximum allowed size of Sxy

The adaptive median filtering algorithm works in two levels, denoted level A and level B, as
follows:

Level A: A1 = zmed - zmin

A2 = zmed - zmax

If A1 > 0 AND A2 < 0, Go to level B

Else increase the window size

If window size ≤ Smax repeat level A

Else output zxy

Level B: B1 = zxy - zmin

B2 = zxy - zmax

If B1> 0 AND B2 < 0, output zxy

Else output zmed
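A Python/NumPy sketch of the two-level adaptive median algorithm above; the starting window size of 3x3 and edge-replication padding are implementation assumptions:

import numpy as np

def adaptive_median(g, S_max=7):
    g = g.astype(np.float64)
    out = g.copy()
    pad = S_max // 2
    gp = np.pad(g, pad, mode='edge')
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            s = 3                              # start with a 3x3 window
            while True:
                r = s // 2
                w = gp[i + pad - r:i + pad + r + 1, j + pad - r:j + pad + r + 1]
                zmin, zmax, zmed = w.min(), w.max(), np.median(w)
                if zmin < zmed < zmax:         # level A: A1 > 0 and A2 < 0
                    zxy = g[i, j]
                    # level B: B1 > 0 and B2 < 0 -> keep zxy, else output zmed
                    out[i, j] = zxy if zmin < zxy < zmax else zmed
                    break
                s += 2                         # otherwise increase the window size
                if s > S_max:                  # window exceeded Smax: output zxy
                    out[i, j] = g[i, j]
                    break
    return out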

Image Formation Model.


An image is represented by two-dimensional functions of the form f(x, y). The value or
amplitude of f at spatial coordinates (x, y) is a positive scalar quantity whose physical meaning
is determined by the source of the image. When an image is generated from a physical process,
its values are proportional to energy radiated by a physical source (e.g., electromagnetic waves).
As a consequence, f(x, y) must be nonzero and finite; that is,

0 < f (x, y) < ∞ …. (1)

The function f(x, y) may be characterized by two components:

A) The amount of source illumination incident on the scene being viewed.

B) The amount of illumination reflected by the objects in the scene.

Appropriately, these are called the illumination and reflectance components and are
denoted by i (x, y) and r (x, y), respectively. The two functions combine as a product to
form f (x, y).
f (x, y) = i (x, y) r (x, y) …. (2)

where
0 < i (x, y) < ∞ …. (3)

and

0 < r (x, y) < 1 …. (4)

Equation (4) indicates that reflectance is bounded by 0 (total absorption) and 1 (total
reflectance).The nature of i (x, y) is determined by the illumination source, and r (x, y) is
determined by the characteristics of the imaged objects. It is noted that these expressions also
are applicable to images formed via transmission of the illumination through a medium, such
as a chest X-ray.

Inverse filtering.
The simplest approach to restoration is direct inverse filtering, where F̂(u, v), the estimate of the
transform of the original image, is computed simply by dividing the transform of the degraded
image, G(u, v), by the degradation function:
F̂(u, v) = G(u, v) / H(u, v)
The divisions are between individual elements of the functions.

But G(u, v) is given by

G(u, v) = H(u, v) F(u, v) + N(u, v)

Hence
F̂(u, v) = F(u, v) + N(u, v) / H(u, v)
This tells us that even if the degradation function is known, the undegraded image [the inverse
Fourier transform of F(u, v)] cannot be recovered exactly, because N(u, v) is a random
function whose Fourier transform is not known.

If the degradation function has zero or very small values, then the ratio N(u, v)/H(u, v) could
easily dominate the estimate F̂(u, v).

One approach to get around the zero or small-value problem is to limit the filter
frequencies to values near the origin. H(0, 0) is equal to the average value of h(x, y), and this
is usually the highest value of H(u, v) in the frequency domain. Thus, by limiting the analysis
to frequencies near the origin, the probability of encountering zero values is reduced.

Noise Probability Density Functions.


The following are among the most common PDFs found in image processing applications.

Gaussian noise
Because of its mathematical tractability in both the spatial and frequency domains,
Gaussian (also called normal) noise models are used frequently in practice. In fact, this
tractability is so convenient that it often results in Gaussian models being used in situations in
which they are marginally applicable at best.

The PDF of a Gaussian random variable, z, is given by

p(z) = (1 / (√(2π) σ)) e^( -(z - µ)² / (2σ²) )   … (1)
where z represents gray level, µ is the mean (average) value of z, and σ is its standard
deviation. The standard deviation squared, σ², is called the variance of z. A plot of this function
is shown in Fig. 5.10. When z is described by Eq. (1), approximately 70% of its values will be
in the range [(µ - σ), (µ + σ)], and about 95% will be in the range [(µ - 2σ), (µ + 2σ)].
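A small Python/NumPy sketch that corrupts an image with additive Gaussian noise drawn from the PDF in Eq. (1); the default mean and standard deviation are arbitrary illustrative choices:

import numpy as np

def add_gaussian_noise(img, mu=0.0, sigma=20.0):
    noise = np.random.normal(mu, sigma, img.shape)   # samples from the Gaussian PDF
    noisy = img.astype(np.float64) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)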

Rayleigh noise
The PDF of Rayleigh noise is given by

The mean and variance of this density are given by

µ = a + √(πb/4)
σ² = b(4 - π)/4

Figure 5.10 shows a plot of the Rayleigh density. Note the displacement from the origin and
the fact that the basic shape of this density is skewed to the right. The Rayleigh density can be
quite useful for approximating skewed histograms.

Erlang (Gamma) noise

The PDF of Erlang noise is given by

where the parameters are such that a > 0, b is a positive integer, and "!" indicates factorial.
The mean and variance of this density are given by

µ=b/a

σ2 = b / a2
Exponential noise
The PDF of exponential noise is given by

The mean and variance of this density are given by

µ = 1/a

σ² = 1/a²
This PDF is a special case of the Erlang PDF, with b = 1.

Uniform noise

The PDF of uniform noise is given by

The mean and variance of this density are given by

µ = (a + b)/2

σ² = (b - a)²/12

Impulse (salt-and-pepper) noise


The PDF of (bipolar) impulse noise is given by
If b > a, gray-level b will appear as a light dot in the image. Conversely, level a will
appear like a dark dot. If either Pa or Pb is zero, the impulse noise is called unipolar. If neither
probability is zero, and especially if they are approximately equal, impulse noise values will
resemble salt-and-pepper granules randomly distributed over the image. For this reason, bipolar
impulse noise also is called salt-and-pepper noise. Shot and spike noise also are terms used to
refer to this type of noise.

Figure 3.4: Some important probability density functions


Enumerate the differences between the image enhancement and image
restoration.
(i) Image enhancement techniques are heuristic procedures designed to manipulate an image in order
to take advantage of the psychophysical aspects of the human visual system, whereas image restoration
techniques are basically reconstruction techniques by which a degraded image is reconstructed using
some prior knowledge of the degradation phenomenon.

(ii) Image enhancement can be implemented by spatial and frequency domain technique, whereas
image restoration can be implement by frequency domain and algebraic techniques.

(iii) The computational complexity of image enhancement is relatively low compared to that of image
restoration, since algebraic methods require the manipulation of a large number of simultaneous
equations. However, under some conditions the computational complexity can be reduced to the same
level as that required by traditional frequency domain techniques.

(iv) Image enhancement techniques are problem oriented, whereas image restoration techniques are
general and are oriented towards modeling the degradation and applying the reverse process in
order to reconstruct the original image.

(v) Masks are used in spatial domain methods for image enhancement, whereas masks are not used
for image restoration techniques.

(vi) Contrast stretching is considered an image enhancement technique because it is based on the
pleasing aspects to the viewer, whereas removal of image blur by applying a deblurring function is
considered an image restoration technique.

Iterative nonlinear restoration using the Lucy–Richardson algorithm.

The Lucy-Richardson algorithm is a nonlinear restoration method used to recover a latent
image which is blurred by a Point Spread Function (psf). It is also known as Richardson-Lucy
de-convolution. With pij as the point spread function, the pixels of the observed image are
expressed as
ci = Σj pij uj
Here,

uj = pixel value at location j in the (latent) image

ci = observed value at the ith pixel location
The L-R algorithm cannot be used in applications in which the psf (pij) is dependent on one
or more unknown variables.

The L-R algorithm is based on a maximum-likelihood formulation in which Poisson statistics are
used to model the image. Maximizing the likelihood of the model yields an equation that is
satisfied when the following iteration converges:
f̂(k+1)(x, y) = f̂(k)(x, y) [ h(-x, -y) ★ ( g(x, y) / ( h(x, y) ★ f̂(k)(x, y) ) ) ]
Here,

f̂ = estimate of the undegraded image; g and h are the degraded image and the psf, and ★ denotes convolution.

The factor f which is present in the right side denominator leads to non-linearity. Since, the
algorithm is a type of nonlinear restorations; hence it is stopped when satisfactory result is
obtained. The basic syntax of function deconvlucy with the L-R algorithm is implemented is
given below.

fr = deconvlucy(g, psf, NUMIT, DAMPAR, WEIGHT). Here the parameters are,

g = Degraded image, fr = Restored image, psf = Point spread function

NUMIT = Total number of iterations. The remaining two parameters are,

DAMPAR

The DAMPAR parameter is a scalar that specifies the threshold deviation of the resulting
image from the degraded image g. Iterations are suppressed for the pixels that deviate from their
original value by less than DAMPAR, which reduces noise generation while preserving
essential image information.
WEIGHT
The WEIGHT parameter assigns a weight to each pixel. It is an array of the same size
as the degraded image g. In applications where a bad pixel should be excluded, it is
removed by assigning it a weight of 0. The pixels may also be given weights based on
the amount of flat-field correction required by the imaging array. Weights are also used
in applications in which blurring with a specified PSF spills over the image boundary:
pixels at the border of the image can be excluded by giving them zero weight. If the array size
of the PSF is n x n, the border of zero weights used is ceil(n / 2) pixels wide.
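To make the iteration above concrete, the following 1-D sketch (an illustrative assumption, not the deconvlucy implementation) applies the multiplicative L-R update to a small blurred signal using a known 3-tap PSF. The signal, PSF, number of iterations and initial estimate are all assumptions chosen for the example.

#include <stdio.h>

#define N 16
#define K 3                       /* PSF length (odd) */

/* 1-D convolution with a centered PSF, zero padding at the borders. */
static void convolve(const double *in, const double *psf, double *out)
{
    for (int i = 0; i < N; i++) {
        double s = 0.0;
        for (int k = 0; k < K; k++) {
            int j = i + k - K / 2;
            if (j >= 0 && j < N)
                s += psf[k] * in[j];
        }
        out[i] = s;
    }
}

int main(void)
{
    double psf[K]   = {0.25, 0.5, 0.25};      /* assumed, known PSF              */
    double psf_r[K] = {0.25, 0.5, 0.25};      /* mirrored PSF (symmetric here)   */
    double truth[N] = {0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0};
    double g[N], est[N], blur[N], ratio[N], corr[N];

    convolve(truth, psf, g);                  /* simulated observed image        */
    for (int i = 0; i < N; i++) est[i] = 0.5; /* flat initial estimate           */

    for (int it = 0; it < 50; it++) {         /* L-R multiplicative update       */
        convolve(est, psf, blur);             /* h * f_k                         */
        for (int i = 0; i < N; i++)
            ratio[i] = g[i] / (blur[i] + 1e-12);
        convolve(ratio, psf_r, corr);         /* correlate with mirrored PSF     */
        for (int i = 0; i < N; i++)
            est[i] *= corr[i];
    }

    for (int i = 0; i < N; i++)
        printf("%2d: g = %.3f  estimate = %.3f\n", i, g[i], est[i]);
    return 0;
}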
MODULE-IV
COLOR IMAGE PROCESSING
4.1 Introduction:
Only energy within a certain frequency/wavelength range is measured; this wavelength range is
denoted the visual spectrum. In the human eye this measurement is done by the so-called rods,
which are specialized nerve cells that act as photoreceptors. Besides the rods, the human eye also
contains cones. These operate like the rods, but are not sensitive to all wavelengths in the visual
spectrum. Instead, the eye contains three types of cones, each sensitive to a different wavelength
range. The human brain interprets the output from these different cones as different colors, as seen in
Table 3.1. So, a color is defined by a certain wavelength in the electromagnetic spectrum.
Since the three different types of cones exist we have the notion of the primary colors being red,
green and blue. Psycho-visual experiments have shown that the different cones have different
sensitivity. This means that when you see two different colors with the same intensity, you will judge
their brightness differently. On average, a human perceives red as being 2.6 times as bright as blue
and green as being 5.6 times as bright as blue. Hence the eye is more sensitive to green and least
sensitive to blue. When all wavelengths (all colors) are present at the same time, the eye perceives
this as a shade of gray, hence no color is seen! If the energy level increases the shade becomes
brighter and ultimately becomes white. Conversely, when the energy level is decreased, the shade
becomes darker and ultimately becomes black. This continuum of different gray-levels (or shades of
gray) is denoted the achromatic colors and is illustrated in Fig. 3.2. Note that this is the same as Fig. 2.18.

4.2 Basics of full-color image processing

An image is created by sampling the incoming light.
The colors of the incoming light depend on the color of the light source illuminating the scene
and the material the object is made of, see Fig. 3.3. Some of the light that hits the object will
bounce right off and some will penetrate into the object. An amount of this light will be absorbed
by the object and an amount leaves again possibly with a different color. So when you see a green
car this means that the wavelengths of the main light reflected from the car are in the range of the
type M cones, see Table 3.1. If we assume the car was illuminated by the sun, which emits all
wavelengths, then we can reason that all wavelengths except the green ones are absorbed by the
material the car is made of. Or in other words, if you are wearing a black shirt all wavelengths
(energy) are absorbed by the shirt and this is why it becomes hotter than a white shirt. When the
resulting color is created by illuminating an object by white light and then absorbing some of the
wavelengths (colors) we use the notion of subtractive colors. Exactly as when you mix paint to
create a color. Say you start with a white piece of paper, where no light is absorbed. The resulting
color will be white. If you then want the paper to become green you add green paint, which
absorbs everything but the green wavelengths. If you add yet another color of paint, then more
wavelengths will be absorbed, and hence the resulting light will have a new color. Keep doing
this and you will in theory end up with a mixture where all wavelengths are absorbed, that is,
black. In practice, however, it will probably not be black, but rather dark gray/brown

Representation of an RGB Color Image :

A color camera is based on the same principle as the human eye. That is, it measures the
amount of incoming red light, green light and blue light, respectively. This is done in one
of two ways depending on the number of sensors in the camera. In the case of three
sensors, each sensor measures one of the three colors, respectively. This is done by
splitting the incoming light into the three wavelength ranges using some optical filters and
mirrors. So red light is only sent to the "red sensor", etc. The result is three images, each
describing the amount of red, green and blue light per pixel, respectively. In a color image,
each pixel therefore consists of three values: red, green and blue. The actual representation
might be three images—one for each color, as illustrated in Fig. 3.4, but it can also be a 3-
dimensional vector for each pixel, hence an image of vectors. Such a vector looks like this:
Color pixel = [Red,Green,Blue]=[R,G,B]

In terms of programming, a color pixel is usually represented as a struct. Say we want to set
the RGB values of the pixel at position (2, 4) to: Red = 100, Green = 42, and Blue = 10,
respectively. In C code this can for example be written as

f[2][4].R = 100;
f[2][4].G = 42;
f[2][4].B = 10;
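A self-contained version of the snippet above might look as follows; the struct name and the 10 x 10 image dimensions are illustrative assumptions.

#include <stdio.h>

/* One RGB pixel, 8 bits per channel. */
struct rgb_pixel {
    unsigned char R, G, B;
};

int main(void)
{
    struct rgb_pixel f[10][10] = {0};   /* a small 10 x 10 color image */

    /* Set the pixel at position (2, 4) as in the text. */
    f[2][4].R = 100;
    f[2][4].G = 42;
    f[2][4].B = 10;

    printf("f[2][4] = (%d, %d, %d)\n", f[2][4].R, f[2][4].G, f[2][4].B);
    return 0;
}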

Typically each color value is represented by an 8-bit (one byte) value meaning that
256 different shades of each color can be measured. Combining different values of
the three colors, each pixel can represent 256³ = 16,777,216 different colors. A
cheaper alternative to having three sensors including mirrors and optical filters is to
only have one sensor. In this case, each cell in the sensor is made sensitive to one of
the three colors (ranges of wavelength). This can be done in a number of different
ways. One is using a Bayer pattern. Here 50% of the cells are sensitive to green,
while the remaining cells are divided equally between red and blue. The reason
being, as mentioned above, that the human eye is more sensitive to green. The layout
of the different cells is illustrated in Fig. 3.5. The figure shows the upper-left corner
of the sensor, where the letters illustrate which color a particular pixel is sensitive to.
This means that each pixel only captures one color and that the two other colors of
a particular pixel must be inferred from the neighbors. Algorithms for finding the
remaining colors of a pixel are known as demosaicing and, generally speaking, the
algorithms are characterized by the required processing time (often directly
proportional to the number of neighbors included) and the quality of the output. The
higher the processing time the better the result. How to balance these two issues is up
to the camera manufactures, and in general, the higher the quality of the camera, the
higher the cost. Even very advanced algorithms are not as good as a three sensor
color camera and note that when using, for example, a cheap web-camera, the quality
of the colors might not be too good and care should be taken before using the colors
for any processing. Regardless of the choice of demosaicing algorithm, the output
is the same as when using three sensors, namely Eq. 3.1. That is, even though only
one color is measured per pixel, the output for each pixel will (after demosaicing)
consist of three values: R, G, and B. An example of a simple demosaicing algorithm
is to infer the missing colors from the nearest pixels, for example using the following
set of equations:

g(x, y):
  [R,G,B]_B  = [ f(x+1, y+1), f(x+1, y), f(x, y)     ]
  [R,G,B]_GB = [ f(x, y+1),   f(x, y),   f(x−1, y)   ]
  [R,G,B]_GR = [ f(x+1, y),   f(x, y),   f(x, y−1)   ]
  [R,G,B]_R  = [ f(x, y),     f(x−1, y), f(x−1, y−1) ]        (3.2)

where f(x, y) is the input image (Bayer pattern) and g(x, y) is the output RGB image. The RGB values in the
output image are found differently depending on which color a particular pixel is
sensitive to: [R,G,B]B should be used for the pixels sensitive to blue, [R,G,B]R
should be used for the pixels sensitive to red, and [R,G,B]GB and [R,G,B]GR should
be used for the pixels sensitive to green followed by a blue or red pixel, respectively.
In Fig. 3.6 a concrete example of this algorithm is illustrated. In the left figure the
values sampled from the sensor are shown. In the right figure the resulting RGB
output image is shown using Eq. 3.2.
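As a sketch of how Eq. (3.2) can be applied, the following C program demosaics a small Bayer-pattern array. The raw sensor values and the assumed cell layout (even rows B G B G ..., odd rows G R G R ..., which is consistent with the indexing in Eq. (3.2)) are illustrative assumptions, not the layout of any particular camera.

#include <stdio.h>

#define W 4
#define H 4

/* clamp an index into [0, n-1] so border pixels can also be processed */
static int clamp(int v, int n) { return v < 0 ? 0 : (v >= n ? n - 1 : v); }
static int F(int f[H][W], int x, int y) { return f[clamp(y, H)][clamp(x, W)]; }

int main(void)
{
    /* Assumed Bayer layout consistent with Eq. (3.2):
       even rows: B G B G ...   odd rows: G R G R ...            */
    int f[H][W] = {                       /* raw sensor values */
        { 10, 200,  12, 210 },
        {190,  60, 195,  62 },
        { 11, 205,  13, 215 },
        {200,  65, 198,  61 }
    };
    int R[H][W], G[H][W], B[H][W];

    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            if (y % 2 == 0 && x % 2 == 0) {          /* blue-sensitive pixel  */
                R[y][x] = F(f, x + 1, y + 1);
                G[y][x] = F(f, x + 1, y);
                B[y][x] = F(f, x, y);
            } else if (y % 2 == 0) {                 /* green in a blue row   */
                R[y][x] = F(f, x, y + 1);
                G[y][x] = F(f, x, y);
                B[y][x] = F(f, x - 1, y);
            } else if (x % 2 == 0) {                 /* green in a red row    */
                R[y][x] = F(f, x + 1, y);
                G[y][x] = F(f, x, y);
                B[y][x] = F(f, x, y - 1);
            } else {                                 /* red-sensitive pixel   */
                R[y][x] = F(f, x, y);
                G[y][x] = F(f, x - 1, y);
                B[y][x] = F(f, x - 1, y - 1);
            }
        }
    }

    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            printf("g(%d,%d) = [%3d,%3d,%3d]\n", x, y, R[y][x], G[y][x], B[y][x]);
    return 0;
}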

The RGB Color Space

According to Eq. 3.1 a color pixel has three values and can therefore be represented
as one point in a 3D space spanned by the three colors. If we say that each color is
represented by 8-bits, then we can construct the so-called RGB color cube, see Fig.
3.7. In the color cube a color pixel is one point or rather a vector from (0, 0, 0) to the
pixel value. The different corners in the color cube represent some of the pure colors
and are listed in Table 3.2. The vector from (0, 0, 0) to (255, 255, 255) passes
through all the gray-scale values and is denoted the gray-vector. Note that the
gray vector is identical to Fig. 3.2.

Converting from RGB to Gray-Scale

Even though you use a color camera, it might be sufficient for your algorithm to apply only the
intensity information in the image, and you therefore need to convert the color image
into a gray-scale image. Converting from RGB to gray-scale is performed as

I = WR · R + WG · G + WB · B

where the weights WR, WG and WB sum to one (a common choice is WR = 0.299, WG = 0.587, WB = 0.114).
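A minimal conversion sketch; the weights below are the common choice mentioned above and are an assumption, not a value fixed by the text.

#include <stdio.h>

/* Convert an RGB pixel to a gray-scale intensity using a weighted sum.
   The weights 0.299, 0.587, 0.114 reflect the eye's higher sensitivity
   to green and lower sensitivity to blue. */
unsigned char rgb_to_gray(unsigned char r, unsigned char g, unsigned char b)
{
    double i = 0.299 * r + 0.587 * g + 0.114 * b;
    return (unsigned char)(i + 0.5);      /* round to the nearest integer */
}

int main(void)
{
    printf("gray(100, 42, 10)   = %d\n", rgb_to_gray(100, 42, 10));
    printf("gray(255, 255, 255) = %d\n", rgb_to_gray(255, 255, 255));
    return 0;
}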

If we have the following three RGB pixel values (0, 50, 0), (0, 100, 0), and (0, 223,
0) in the RGB color cube, we can see that they all lie on the same vector, namely
the one spanned by (0, 0, 0) and (0, 255, 0). We say that all values are a shade of
green and go even further and say that they all have the same color (green), but
different levels of illumination. This also applies to the rest of the color cube. For
example, the points (40, 20, 50), (100, 50, 125) and (200, 100, 250) all lie on the
same vector and therefore have the same color, but just different illumination levels.
This is illustrated in Fig. 3.9. If we generalize this idea of different points on the
same line having the same color, then we can see that all possible lines pass through
the triangle defined by the points (1, 0, 0), (0, 1, 0) and (0, 0, 1), see Fig. 3.10(a).

The actual point (r, g, b) where a line intersects the triangle is found as

(r, g, b) = ( R / (R + G + B), G / (R + G + B), B / (R + G + B) )        (3.5)

These values are named normalized RGB and denoted (r, g, b). In Table 3.3 the rgb
values of some RGB values are shown. Note that each value is in the interval [0, 1]
and that r + g + b = 1. This means that if we know two of the normalized values, the
third follows directly.
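A minimal sketch of Eq. (3.5); the three test points are the ones mentioned in the text, which lie on the same line through the origin and therefore share the same normalized rgb value.

#include <stdio.h>

/* Compute normalized RGB (Eq. 3.5). A tiny epsilon guards against
   division by zero for a pure black pixel. */
void normalized_rgb(double R, double G, double B,
                    double *r, double *g, double *b)
{
    double sum = R + G + B + 1e-12;
    *r = R / sum;
    *g = G / sum;
    *b = B / sum;
}

int main(void)
{
    double pts[3][3] = { {40, 20, 50}, {100, 50, 125}, {200, 100, 250} };

    for (int i = 0; i < 3; i++) {
        double r, g, b;
        normalized_rgb(pts[i][0], pts[i][1], pts[i][2], &r, &g, &b);
        printf("(%3.0f,%3.0f,%3.0f) -> (r,g,b) = (%.3f, %.3f, %.3f)\n",
               pts[i][0], pts[i][1], pts[i][2], r, g, b);
    }
    return 0;
}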

Other Color Representations

From a human perception point of view the triangular representation in Fig. 3.10(b) is not
intuitive. Instead humans rather use the notions of hue and saturation when perceiving
colors. The hue is the dominant wavelength in the
perceived light and represents the pure color, i.e., the colors located on the edges of
the triangle in Fig. 3.10(b). The saturation is the purity of the color and represents the
amount of white light mixed with the pure color. To understand these entities better,
let us look at Fig. 3.11(a). First of all we see that the point C corresponds to the
neutral point, meaning the colorless center of the triangle where (r,g) = (1/3, 1/3). Let
us define a random point in the triangle as P . The hue of this point is now defined as
an angle, θ , between the vectors −−→Cr=1 and −→CP . So hue = 0° means red and
hue = 120° means green. If the point P is located on the edge of the triangle then we
say the saturation is 1, hence a pure color. As the point approaches C the saturation
goes toward 0, and ultimately becomes 0 when P = C. Since the distance from C to
the three edges of the triangle is not uniform, the saturation is defined as a relative
distance. That is, saturation is defined as the ratio between the distance from C to P ,
and the distance from C to the point on the edge of the triangle in the direction of
−→CP . Mathematically we have Saturation = −→CP −−→CP , Hue = θ
(3.7) where −→CP is the length of the vector −→CP . The representation of
colors based on hue and saturation results in a circle as opposed to the triangle in Fig.
3.10(b). In Fig. 3.11(b) the hue–saturation representation is illustrated together with
some of the pure colors. It is important to realize how this figure relates to Fig. 3.7,
or in other words, how the hue–saturation representation relates to the RGB
representation. The center of the hue–saturation circle in Fig. 3.11(b) is a shade of
gray and corresponds to the gray-vector in Fig. 3.7. The circle is located so that it is
perpendicular to the gray-vector. For a particular RGB value, the hue–saturation
circle is therefore centered at a position on the gray-vector, so that the RGB value is
included in the circle. A number of different color representations exist which are based
on the notion of hue and saturation; one commonly used formulation is sketched below.
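The specific representations are not reproduced in these notes. As an illustration of the idea, the following sketch uses one widely used hue–saturation based formulation (the classic HSI conversion); this particular formulation is an assumption for the example, not the text's own derivation.

#include <stdio.h>
#include <math.h>

#define PI 3.14159265358979323846

/* Convert an RGB triple (each in [0,1]) to hue (degrees), saturation
   and intensity using the HSI formulation. Hue is undefined on the
   gray-vector (where S = 0). */
void rgb_to_hsi(double R, double G, double B,
                double *H, double *S, double *I)
{
    double num = 0.5 * ((R - G) + (R - B));
    double den = sqrt((R - G) * (R - G) + (R - B) * (G - B)) + 1e-12;
    double theta = acos(num / den) * 180.0 / PI;

    *H = (B <= G) ? theta : 360.0 - theta;

    double minv = fmin(R, fmin(G, B));
    double sum  = R + G + B + 1e-12;
    *S = 1.0 - 3.0 * minv / sum;   /* 0 on the gray-vector, 1 for pure colors */
    *I = (R + G + B) / 3.0;        /* average energy                           */
}

int main(void)
{
    double H, S, I;
    rgb_to_hsi(1.0, 0.0, 0.0, &H, &S, &I);   /* pure red   */
    printf("red:   H = %6.1f  S = %.2f  I = %.2f\n", H, S, I);
    rgb_to_hsi(0.0, 1.0, 0.0, &H, &S, &I);   /* pure green */
    printf("green: H = %6.1f  S = %.2f  I = %.2f\n", H, S, I);
    rgb_to_hsi(0.5, 0.5, 0.5, &H, &S, &I);   /* mid gray (hue meaningless) */
    printf("gray:  H = %6.1f  S = %.2f  I = %.2f\n", H, S, I);
    return 0;
}

Compile with the math library (e.g. cc hsi.c -lm).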

Further Information

When reading literature on color spaces and color processing it is important to realize
that a number of different terms are used. Unfortunately, some of these terms are used
interchangeably even though they might have different physical/perceptual/technical
meanings. We therefore give a guideline to some of the terms you are likely to
encounter when reading literature on colors:

Chromatic color: all colors in the RGB color cube except those lying on the gray line
spanned by (0, 0, 0) and (255, 255, 255).
Achromatic color: the colorless values in the RGB cube, i.e., all those colors lying on
the gray-line. The opposite of chromatic color.
Shades of gray: the same as achromatic color.
Intensity: the average amount of energy, i.e., (R + G + B)/3.
Brightness: the amount of light perceived by a human.
Lightness: the amount of light perceived by a human.
Luminance: the amount of light perceived by a human. Note that when you venture
into the science of color understanding, the luminance defines the amount of emitted light.
Luma: gamma-corrected luminance.
Shade: darkening a color. When a subtractive color space is applied, different shades
(darker nuances) of a color are obtained by mixing the color with different amounts of black.
Tint: lightening a color. When a subtractive color space is applied, different tints
(lighter nuances) of a color are obtained by mixing the color with different amounts of white.
Tone: a combination of shade and tint, where gray is mixed with the input color.
' (prime): the primed version of a color, e.g., R', means that the value has been gamma-corrected.

Sometimes a gray-scale image is mapped to a color image in order to enhance some
aspect of the image. As mentioned above, a true color image cannot be reconstructed
from a gray-level image. We therefore use the term pseudo color to underline that we
are not talking about a true RGB image. The mapping from gray-scale to color can be
done in many different ways.
MODULE-V
IMAGE COMPRESSION
Introduction:
Image compression and the redundancies in a digital image.
The term data compression refers to the process of reducing the amount of data required to
represent a given quantity of information. A clear distinction must be made between data and
information. They are not synonymous. In fact, data are the means by which information is
conveyed. Various amounts of data may be used to represent the same amount of information.
Such might be the case, for example, if a long-winded individual and someone who is short and
to the point were to relate the same story. Here, the information of interest is the story; words are
the data used to relate the information. If the two individuals use a different number of words to
tell the same basic story, two different versions of the story are created, and at least one includes
nonessential data. That is, it contains data (or words) that either provide no relevant information
or simply restate that which is already known. It is thus said to contain data redundancy.

Data redundancy is a central issue in digital image compression. It is not an abstract concept
but a mathematically quantifiable entity. If n1 and n2 denote the number of information-
carrying units in two data sets that represent the same information, the relative data
redundancy RD of the first data set (the one characterized by n1) can be defined as

RD = 1 − 1/CR

where CR, commonly called the compression ratio, is

CR = n1 / n2

In digital image compression, three basic data redundancies can be identified and exploited:
coding redundancy, inter pixel redundancy, and psychovisual redundancy. Data
compression is achieved when one or more of these redundancies are reduced or
eliminated.
Coding Redundancy:
In this, we utilize formulation to show how the gray-level histogram of an image also can
provide a great deal of insight into the construction of codes to reduce the amount of data
used to represent it.

Let us assume, once again, that a discrete random variable rk in the interval [0, 1] represents
the gray levels of an image and that each rk occurs with probability pr(rk), where

pr(rk) = nk / n,    k = 0, 1, 2, ..., L − 1

where L is the number of gray levels, nk is the number of times that the kth gray level
appears in the image, and n is the total number of pixels in the image. If the number of bits
used to represent each value of rk is l(rk), then the average number of bits required to
represent each pixel is

Lavg = Σ (from k = 0 to L − 1) l(rk) pr(rk)
That is, the average length of the code words assigned to the various gray-level
values is found by summing the product of the number of bits used to represent each
gray level and the probability that the gray level occurs. Thus the total number of bits
required to code an M X N image is MNLavg.
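A small sketch of the Lavg computation; the eight gray-level probabilities and the two code-length assignments below (a natural 3-bit code and a variable-length code) are illustrative assumptions.

#include <stdio.h>

/* Compute Lavg = sum over k of l(rk) * pr(rk) for an 8-level source. */
int main(void)
{
    double pr[8] = {0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02};
    int    natural_len[8]  = {3, 3, 3, 3, 3, 3, 3, 3};    /* fixed 3-bit code      */
    int    variable_len[8] = {2, 2, 2, 3, 4, 5, 6, 6};    /* shorter codes for the */
                                                          /* more probable levels  */
    double lavg_fixed = 0.0, lavg_var = 0.0;
    for (int k = 0; k < 8; k++) {
        lavg_fixed += natural_len[k]  * pr[k];
        lavg_var   += variable_len[k] * pr[k];
    }

    printf("Lavg (natural 3-bit code)   = %.2f bits/pixel\n", lavg_fixed);
    printf("Lavg (variable-length code) = %.2f bits/pixel\n", lavg_var);
    printf("Compression ratio CR        = %.2f\n", lavg_fixed / lavg_var);
    printf("Relative redundancy RD      = %.2f\n", 1.0 - lavg_var / lavg_fixed);
    return 0;
}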

Inter pixel Redundancy:


Consider the images shown in Figs. (a) and (b). As Figs. (c) and (d) show, these images
have virtually identical histograms. Note also that both histograms are trimodal,
indicating the presence of three dominant ranges of gray-level values. Because the gray
levels in these images are not equally probable, variable-length coding can be used to
reduce the coding redundancy that would result from a straight or natural binary
encoding of their pixels. The coding process, however, would not alter the level of
correlation between the pixels within the images. In other words, the codes used to
represent the gray levels of each image have nothing to do with the correlation between
pixels. These correlations result from the structural or geometric relationships
between the objects in the image.

Figure: Two images and their gray-level histograms and normalized autocorrelation coefficients along
one line.
Figures (e) and (f) show the respective autocorrelation coefficients computed along
one line of each image,

γ(Δn) = A(Δn) / A(0)

where

A(Δn) = (1 / (N − Δn)) Σ (from y = 0 to N − 1 − Δn) f(x, y) f(x, y + Δn)

The scaling factor in Eq. above accounts for the varying number of sum terms that
arise for each integer value of Δn. Of course, Δn must be strictly less than N, the
number of pixels on a line. The variable x is the coordinate of the line used in the
computation. Note the dramatic difference between the shape of the functions shown
in Figs. (e) and (f). Their shapes can be qualitatively related to the structure in the
images in Figs. (a) and (b).This relationship is particularly noticeable in Fig. (f), where
the high correlation between pixels separated by 45 and 90 samples can be directly
related to the spacing between the vertically oriented matches of Fig. (b). In addition,
the adjacent pixels of both images are highly correlated. When Δn is 1, γ is 0.9922 and
0.9928 for the images of Figs. (a) and (b), respectively. These values are typical of
most properly sampled television images. These illustrations reflect another important
form of data redundancy—one directly related to the inter pixel correlations within an
image. Because the value of any given pixel can be reasonably predicted from the
value of its neighbors, the information carried by individual pixels is relatively small.
Much of the visual contribution of a single pixel to an image is redundant; it could
have been guessed on the basis of the values of its neighbors. A variety of names,
including spatial redundancy, geometric redundancy, and inter frame redundancy,
have been coined to refer to these inter pixel dependencies. We use the term inter pixel
redundancy to encompass them all.
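The following sketch computes γ(Δn) along a single image line; the synthetic stripe pattern is an assumption chosen so that the periodic correlation structure is easy to see.

#include <stdio.h>

#define N 64   /* number of pixels on the line */

/* Normalized autocorrelation coefficient gamma(dn) = A(dn)/A(0) computed
   along one image line, following the definition above. */
double gamma_coeff(const double *line, int dn)
{
    double a_dn = 0.0, a_0 = 0.0;

    for (int y = 0; y < N - dn; y++)
        a_dn += line[y] * line[y + dn];
    a_dn /= (double)(N - dn);

    for (int y = 0; y < N; y++)
        a_0 += line[y] * line[y];
    a_0 /= (double)N;

    return a_dn / a_0;
}

int main(void)
{
    double line[N];
    for (int y = 0; y < N; y++)            /* synthetic periodic line:        */
        line[y] = (y % 8 < 4) ? 200 : 50;  /* light/dark stripes of width 4   */

    for (int dn = 1; dn <= 16; dn++)
        printf("gamma(%2d) = %.4f\n", dn, gamma_coeff(line, dn));
    return 0;
}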

In order to reduce the inter pixel redundancies in an image, the 2-D pixel array
normally used for human viewing and interpretation must be transformed into a more
efficient (but usually "non visual") format. For example, the differences between
adjacent pixels can be used to represent an image. Transformations of this type (that
is, those that remove inter pixel redundancy) are referred to as mappings. They are
called reversible mappings if the original image elements can be reconstructed from
the transformed data set.
Psychovisual Redundancy:
The brightness of a region, as perceived by the eye, depends on factors other than
simply the light reflected by the region. For example, intensity variations (Mach bands)
can be perceived in an area of constant intensity. Such phenomena result from the fact
that the eye does not respond with equal sensitivity to all visual information. Certain
information simply has less relative importance than other information in normal
visual processing. This information is said to be psycho visually redundant. It can be
eliminated without significantly impairing the quality of image perception.
That psycho visual redundancies exist should not come as a surprise, because human
perception of the information in an image normally does not involve quantitative
analysis of every pixel value in the image. In general, an observer searches for
distinguishing features such as edges or textural regions and mentally combines them
into recognizable groupings. The brain then correlates these groupings with prior
knowledge in order to complete the image interpretation process. Psycho visual
redundancy is fundamentally different from the redundancies discussed earlier. Unlike
coding and inter pixel redundancy, psycho visual redundancy is associated with real
or quantifiable visual information. Its elimination is possible only because the
information itself is not essential for normal visual processing. Since the elimination
of psycho visually redundant data results in a loss of quantitative information, it is
commonly referred to as quantization.

This terminology is consistent with normal usage of the word, which generally means
the mapping of a broad range of input values to a limited number of output values. As
it is an irreversible operation (visual information is lost), quantization results in lossy
data compression.

Fidelity criterion.
The removal of psycho visually redundant data results in a loss of real or quantitative
visual information. Because information of interest may be lost, a repeatable or
reproducible means of quantifying the nature and extent of information loss is highly
desirable. Two general classes of criteria are used as the basis for such an assessment:

Objective fidelity criteria and Subjective fidelity criteria.

When the level of information loss can be expressed as a function of the original or
input image and the compressed and subsequently decompressed output image, it is
said to be based on an objective fidelity criterion. A good example is the root-mean-
square (rms) error between an input and output image. Let f(x, y) represent an input
image and let f^(x, y) denote an estimate or approximation of f(x, y) that results from
compressing and subsequently decompressing the input. For any value of x and y, the
error e(x, y) between f(x, y) and f^(x, y) can be defined as

e(x, y) = f^(x, y) − f(x, y)

so that the total error between the two images is

Σ (x = 0 to M − 1) Σ (y = 0 to N − 1) [ f^(x, y) − f(x, y) ]

where the images are of size M X N. The root-mean-square error, erms, between f(x, y)
and f^(x, y) is then the square root of the squared error averaged over the M X N array, or

erms = [ (1 / MN) Σ (x = 0 to M − 1) Σ (y = 0 to N − 1) [ f^(x, y) − f(x, y) ]² ]^(1/2)

A closely related objective fidelity criterion is the mean-square signal-to-noise ratio of
the compressed-decompressed image. If f^(x, y) is considered to be the sum of the
original image f(x, y) and a noise signal e(x, y), the mean-square signal-to-noise ratio
of the output image, denoted SNRms, is

SNRms = Σ Σ f^(x, y)² / Σ Σ [ f^(x, y) − f(x, y) ]²

The rms value of the signal-to-noise ratio, denoted SNRrms, is obtained by taking the
square root of the expression above.
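A minimal sketch of the objective criteria above; the two small 4 x 4 images are assumed test data.

#include <stdio.h>
#include <math.h>

#define M  4
#define NC 4

/* Objective fidelity criteria: rms error and mean-square SNR between an
   original image f and its compressed/decompressed approximation fhat. */
int main(void)
{
    double f[M][NC] = { {52, 55, 61, 66},
                        {70, 61, 64, 73},
                        {63, 59, 55, 90},
                        {67, 61, 68, 79} };
    double fhat[M][NC] = { {52, 54, 62, 66},
                           {71, 61, 63, 74},
                           {63, 58, 56, 89},
                           {67, 62, 68, 78} };

    double sq_err = 0.0, sig = 0.0;
    for (int x = 0; x < M; x++) {
        for (int y = 0; y < NC; y++) {
            double e = fhat[x][y] - f[x][y];
            sq_err += e * e;
            sig    += fhat[x][y] * fhat[x][y];
        }
    }

    double erms  = sqrt(sq_err / (M * NC));   /* root-mean-square error   */
    double snrms = sig / sq_err;              /* mean-square SNR          */

    printf("e_rms   = %.3f\n", erms);
    printf("SNR_ms  = %.1f\n", snrms);
    printf("SNR_rms = %.1f\n", sqrt(snrms));
    return 0;
}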

Although objective fidelity criteria offer a simple and convenient mechanism for
evaluating information loss, most decompressed images ultimately are viewed by
humans. Consequently, measuring image quality by the subjective evaluations of a
human observer often is more appropriate. This can be accomplished by showing a
"typical" decompressed image to an appropriate cross section of viewers and averaging
their evaluations. The evaluations may be made using an absolute rating scale or by
means of side-by-side comparisons of f(x, y) and f^(x, y).

Image compression models.


As the figure shows, a compression system consists of two distinct structural blocks: an encoder
and a decoder. An input image f(x, y) is fed into the encoder, which creates a set of
symbols from the input data. After transmission over the channel, the encoded
representation is fed to the decoder, where a reconstructed output image f^(x, y) is
generated. In general, f^(x, y) may or may not be an exact replica of f(x, y). If it is, the
system is error free or information preserving; if not, some level of distortion is present
in the reconstructed image. Both the encoder and decoder shown in Fig. consist of two
relatively independent functions or sub blocks. The encoder is made up of a source
encoder, which removes input redundancies, and a channel encoder, which increases
the noise immunity of the source encoder's output. As would be expected, the decoder
includes a channel decoder followed by a source decoder. If the channel between the
encoder and decoder is noise free (not prone to error), the channel encoder and decoder
are omitted, and the general encoder and decoder become the source encoder and
decoder, respectively.
Figure: A general compression system model

The Source Encoder and Decoder:


The source encoder is responsible for reducing or eliminating any coding, interpixel,
or psychovisual redundancies in the input image. The specific application and
associated fidelity requirements dictate the best encoding approach to use in any given
situation. Normally, the approach can be modeled by a series of three independent
operations. As Fig (a) shows, each operation is designed to reduce one of the three
redundancies. Figure (b) depicts the corresponding source decoder. In the first stage
of the source encoding process, the mapper transforms the input data into a (usually
non visual) format designed to reduce interpixel redundancies in the input image. This
operation generally is reversible and may or may not reduce directly the amount of
data required to represent the image.

Figure: (a) Source encoder and (b) source decoder model

Run-length coding is an example of a mapping that directly results in data compression


in this initial stage of the overall source encoding process. The representation of an
image by a set of transform coefficients is an example of the opposite case. Here, the
mapper transforms the image into an array of coefficients, making its inter pixel
redundancies more accessible for compression in later stages of the encoding process.

The second stage, or quantizer block in Fig. (a), reduces the accuracy of the mapper's
output in accordance with some pre-established fidelity criterion. This stage reduces the
psycho visual redundancies of the input image. This operation is irreversible. Thus it
must be omitted when error-free compression is desired.
In the third and final stage of the source encoding process, the symbol coder creates a
fixed- or variable-length code to represent the quantizer output and maps the output in
accordance with the code. The term symbol coder distinguishes this coding operation
from the overall source encoding process. In most cases, a variable-length code is used
to represent the mapped and quantized data set. It assigns the shortest code words to
the most frequently occurring output values and thus reduces coding redundancy. The
operation, of course, is reversible. Upon completion of the symbol coding step, the
input image has been processed to remove each of the three redundancies.

Figure (a) shows the source encoding process as three successive operations, but all
three operations are not necessarily included in every compression system. Recall, for
example, that the quantizer must be omitted when error-free compression is desired.
In addition, some compression techniques normally are modeled by merging blocks
that are physically separate in Fig (a). In the predictive compression systems, for
instance, the mapper and quantizer are often represented by a single block, which
simultaneously performs both operations.

The source decoder shown in Fig. (b) contains only two components: a symbol
decoder and an inverse mapper. These blocks perform, in reverse order, the inverse
operations of the source encoder's symbol encoder and mapper blocks. Because
quantization results in irreversible information loss, an inverse quantizer block is not
included in the general source decoder model shown in Fig. (b).

The Channel Encoder and Decoder:


The channel encoder and decoder play an important role in the overall encoding-
decoding process when the channel of Fig. is noisy or prone to error. They are
designed to reduce the impact of channel noise by inserting a controlled form of
redundancy into the source encoded data. As the output of the source encoder contains
little redundancy, it would be highly sensitive to transmission noise without the
addition of this "controlled redundancy." One of the most useful channel encoding
techniques was devised by R. W. Hamming (Hamming [1950]). It is based on
appending enough bits to the data being encoded to ensure that some minimum number
of bits must change between valid code words. Hamming showed, for example, that if
3 bits of redundancy are added to a 4-bit word, so that the distance between any two
valid code words is 3, all single-bit errors can be detected and corrected. (By appending
additional bits of redundancy, multiple-bit errors can be detected and corrected.) The
7-bit Hamming (7, 4) code word h1, h2, h3, ..., h6, h7 associated with a 4-bit binary
number b3 b2 b1 b0 is

h1 = b3 ⊕ b2 ⊕ b0
h2 = b3 ⊕ b1 ⊕ b0
h3 = b3
h4 = b2 ⊕ b1 ⊕ b0
h5 = b2
h6 = b1
h7 = b0

where ⊕ denotes the exclusive OR operation. Note that bits h1, h2, and h4 are even-parity
bits for the bit fields b3 b2 b0, b3 b1 b0, and b2 b1 b0, respectively. (Recall that a string of
binary bits has even parity if the number of bits with a value of 1 is even.) To decode
a Hamming encoded result, the channel decoder must check the encoded value for odd
parity over the bit fields in which even parity was previously established. A single-bit
error is indicated by a nonzero parity word c4 c2 c1, where

c1 = h1 ⊕ h3 ⊕ h5 ⊕ h7
c2 = h2 ⊕ h3 ⊕ h6 ⊕ h7
c4 = h4 ⊕ h5 ⊕ h6 ⊕ h7

If a nonzero value is found, the decoder simply complements the code word bit position
indicated by the parity word. The decoded binary value is then extracted from the corrected
code word as h3 h5 h6 h7.

Method of generating variable length codes with an example.

Variable-Length Coding:

The simplest approach to error-free image compression is to reduce only coding


redundancy. Coding redundancy normally is present in any natural binary encoding of
the gray levels in an image. It can be eliminated by coding the gray levels. To do so
requires construction of a variable-length code that assigns the shortest possible code
words to the most probable gray levels. Here, we examine several optimal and near
optimal techniques for constructing such a code. These techniques are formulated in
the language of information theory. In practice, the source symbols may be either the
gray levels of an image or the output of a gray-level mapping operation (pixel
differences, run lengths, and so on).

Huffman coding:
The most popular technique for removing coding redundancy is due to Huffman
(Huffman [1952]). When coding the symbols of an information source individually,
Huffman coding yields the smallest possible number of code symbols per source
symbol. In terms of the noiseless coding theorem, the resulting code is optimal for a
fixed value of n, subject to the constraint that the source symbols be coded one at a
time. The first step in Huffman's approach is to create a series of source reductions by
ordering the probabilities of the symbols under consideration and combining the
lowest probability symbols into a single symbol that replaces them in the next source
reduction. Figure illustrates this process for binary coding (K-ary Huffman codes can
also be constructed). At the far left, a hypothetical set of source symbols and their
probabilities are ordered from top to bottom in terms of decreasing probability values.
To form the first source reduction, the bottom two probabilities, 0.06 and 0.04, are
combined to form a "compound symbol" with probability 0.1. This compound symbol
and its associated probability are placed in the first source reduction column so that
the probabilities of the reduced source are also ordered from the most to the least
probable. This process is then repeated until a reduced source with two symbols (at the
far right) is reached.

The second step in Huffman's procedure is to code each reduced source, starting with
the smallest source and working back to the original source. The minimal length binary
code for a two-symbol source, of course, is the symbols 0 and 1. As Fig. 4.2 shows,
these symbols are assigned to the two symbols on the right (the assignment is arbitrary;
reversing the order of the 0 and 1 would work just as well). As the reduced source
symbol with probability 0.6 was generated by combining two symbols in the reduced
source to its left, the 0 used to code it is now assigned to both of these symbols, and a
0 and 1 are arbitrarily

Figure: Huffman source reductions.

Figure: Huffman code assignment procedure.

Appended to each to distinguish them from each other. This operation is then repeated
for each reduced source until the original source is reached. The final code appears at
the far left in the figure. The average length of this code is

Lavg = (0.4)(1) + (0.3)(2) + (0.1)(3) + (0.1)(4) + (0.06)(5) + (0.04)(5) = 2.2 bits/symbol

and the entropy of the source is 2.14 bits/symbol. The resulting Huffman code
efficiency is 2.14 / 2.2 = 0.973.
Huffman's procedure creates the optimal code for a set of symbols and probabilities
subject to the constraint that the symbols be coded one at a time. After the code has
been created, coding and/or decoding is accomplished in a simple lookup table manner.
The code itself is an instantaneous uniquely decodable block code. It is called a block
code because each source symbol is mapped into a fixed sequence of code symbols. It
is instantaneous, because each code word in a string of code symbols can be decoded
without referencing succeeding symbols. It is uniquely decodable, because any string
of code symbols can be decoded in only one way. Thus, any string of Huffman encoded
symbols can be decoded by examining the individual symbols of the string in a left to
right manner. For the binary code of Fig., a left-to-right scan of the encoded string
010100111100 reveals that the first valid code word is 01010, which is the code for
symbol a3. The next valid code is 011, which corresponds to symbol a1. Continuing in
this manner reveals the completely decoded message to be a3a1a2a2a6.
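The following sketch decodes the string 010100111100 with the code assignments assumed to be the ones produced in the figure above (a2 → 1, a6 → 00, a1 → 011, a4 → 0100, a3 → 01010, a5 → 01011); since the figure itself is not reproduced here, this table is an assumption for the example.

#include <stdio.h>
#include <string.h>

/* Assumed code table from the Huffman example above. */
static const char *symbols[] = { "a1",  "a2", "a3",    "a4",   "a5",    "a6" };
static const char *codes[]   = { "011", "1",  "01010", "0100", "01011", "00" };
#define NSYM 6

int main(void)
{
    const char *encoded = "010100111100";
    const char *p = encoded;

    printf("decoded: ");
    while (*p) {
        int matched = 0;
        for (int i = 0; i < NSYM; i++) {
            size_t len = strlen(codes[i]);
            /* Because the code is prefix-free (instantaneous), exactly one
               code word can match at the current position. */
            if (strncmp(p, codes[i], len) == 0) {
                printf("%s ", symbols[i]);
                p += len;
                matched = 1;
                break;
            }
        }
        if (!matched) {           /* should not happen for a valid string */
            printf("<invalid>\n");
            return 1;
        }
    }
    printf("\n");                 /* expected output: a3 a1 a2 a2 a6 */
    return 0;
}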
Arithmetic encoding process with an example.
Arithmetic coding:
Unlike the variable-length codes described previously, arithmetic coding generates
non block codes. In arithmetic coding, which can be traced to the work of Elias, a one-
to-one correspondence between source symbols and code words does not exist.
Instead, an entire sequence of source symbols (or message) is assigned a single
arithmetic code word. The code word itself defines an interval of real numbers between
0 and 1. As the number of symbols in the message increases, the interval used to
represent it becomes smaller and the number of information units (say, bits) required
to represent the interval becomes larger. Each symbol of the message reduces the size
of the interval in accordance with its probability of occurrence. Because the technique
does not require, as does Huffman’s approach, that each source symbol translate into
an integral number of code symbols (that is, that the symbols be coded one at a time),
it achieves (but only in theory) the bound established by the noiseless coding theorem.

Figure: Arithmetic coding procedure


Figure illustrates the basic arithmetic coding process. Here, a five-symbol sequence or
message, a1a2a3a3a4, from a four-symbol source is coded. At the start of the coding
process, the message is assumed to occupy the entire half-open interval [0, 1). As Table
shows, this interval is initially subdivided into four regions based on the probabilities
of each source symbol. Symbol a1, for example, is associated with subinterval [0, 0.2).
Because it is the first symbol of the message being coded, the message interval is
initially narrowed to [0, 0.2). Thus in Fig. [0, 0.2) is expanded to the full height of the
figure and its end points labeled by the values of the narrowed range. The narrowed
range is then subdivided in accordance with the original source symbol probabilities
and the process continues with the next message symbol.
Table Arithmetic coding example

In this manner, symbol a2 narrows the subinterval to [0.04, 0.08), a3 further narrows it
to [0.056, 0.072), and so on. The final message symbol, which must be reserved as a
special end-of- message indicator, narrows the range to [0.06752, 0.0688). Of course,
any number within this subinterval—for example, 0.068—can be used to represent the
message. In the arithmetically coded message of Fig. 5.6, three decimal digits are used
to represent the five-symbol message. This translates into 3/5 or 0.6 decimal digits per
source symbol and compares favorably with the entropy of the source, which is 0.58
decimal digits or 10-ary units/symbol. As the length of the sequence being coded
increases, the resulting arithmetic code approaches the bound established by the
noiseless coding theorem. In practice, two factors cause coding performance to fall
short of the bound: (1) the addition of the end-of-message indicator that is needed to
separate one message from an- other; and (2) the use of finite precision arithmetic.
Practical implementations of arithmetic coding address the latter problem by
introducing a scaling strategy and a rounding strategy (Langdon and Rissanen [1981]).
The scaling strategy renormalizes each subinterval to the [0, 1) range before
subdividing it in accordance with the symbol probabilities. The rounding strategy
guarantees that the truncations associated with finite precision arithmetic do not
prevent the coding subintervals from being represented accurately.
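The following sketch reproduces the interval narrowing for the five-symbol message a1 a2 a3 a3 a4; the sub-interval boundaries are the ones implied by the example (a1: [0, 0.2), a2: [0.2, 0.4), a3: [0.4, 0.8), a4: [0.8, 1.0)), which is an assumption about the table that is not reproduced here.

#include <stdio.h>

int main(void)
{
    /* cumulative sub-interval boundaries for a1..a4 inside [0, 1) */
    double cum[5]   = { 0.0, 0.2, 0.4, 0.8, 1.0 };
    int message[5]  = { 1, 2, 3, 3, 4 };          /* a1 a2 a3 a3 a4 */

    double low = 0.0, high = 1.0;
    for (int k = 0; k < 5; k++) {
        double range = high - low;
        int s = message[k];
        high = low + range * cum[s];              /* shrink the interval */
        low  = low + range * cum[s - 1];
        printf("after a%d: [%.5f, %.5f)\n", s, low, high);
    }

    /* any number inside the final interval, e.g. 0.068, encodes the message */
    printf("final interval: [%.5f, %.5f)\n", low, high);
    return 0;
}

Running it reproduces the intervals quoted above, ending at [0.06752, 0.06880).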

LZW coding with an example.

LZW Coding:


The technique, called Lempel-Ziv-Welch (LZW) coding, assigns fixed-length code
words to variable length sequences of source symbols but requires no a priori
knowledge of the probability of occurrence of the symbols to be encoded. LZW
compression has been integrated into a variety of mainstream imaging file formats,
including the graphic interchange format (GIF), tagged image file format (TIFF), and
the portable document format (PDF).

LZW coding is conceptually very simple (Welch [1984]). At the onset of the coding
process, a codebook or "dictionary" containing the source symbols to be coded is
constructed. For 8-bit Monochrome images, the first 256 words of the dictionary are
assigned to the gray values 0, 1, 2..., and
255. As the encoder sequentially examines the image's pixels, gray- level sequences
that are not in the dictionary are placed in algorithmically determined (e.g., the next
unused) locations. If the first two pixels of the image are white, for instance, sequence
"255-255" might be assigned to location 256, the address following the locations
reserved for gray levels 0 through 255. The next time that two consecutive white pixels
are encountered, code word 256, the address of the location containing sequence 255-
255, is used to represent them.

If a 9-bit, 512-word dictionary is employed in the coding process, the original (8 + 8)


bits that were used to represent the two pixels are replaced by a single 9-bit code word.
Cleary, the size of the dictionary is an important system parameter. If it is too small,
the detection of matching gray- level sequences will be less likely; if it is too large, the
size of the code words will adversely affect compression performance. Consider the
following 4 x 4, 8-bit image of a vertical edge:

Table details the steps involved in coding its 16 pixels. A 512-word dictionary with
the following starting content is assumed:

Locations 256 through 511 are initially unused. The image is encoded by processing
its pixels in a left- to-right, top-to-bottom manner. Each successive gray-level value is
concatenated with a variable— column 1 of Table 6.1 —called the "currently
recognized sequence." As can be seen, this variable is initially null or empty. The
dictionary is searched for each concatenated sequence and if found, as was the case in
the first row of the table, is replaced by the newly concatenated and recognized (i.e.,
located in the dictionary) sequence. This was done in column 1 of row 2.
No output codes are generated, nor is the dictionary altered. If the concatenated
sequence is not found, however, the address of the currently recognized sequence is
output as the next encoded value, the concatenated but unrecognized sequence is added
to the dictionary, and the currently recognized sequence is initialized to the current
pixel value. This occurred in row 2 of the table. The last two columns detail the gray-
level sequences that are added to the dictionary when scanning the entire 4 x 4 image.
Nine additional code words are defined. At the conclusion of coding, the dictionary
contains 265 code words and the LZW algorithm has successfully identified several
repeating gray-level sequences, leveraging them to reduce the original 128-bit image
to 90 bits (i.e., 10 9-bit codes). The encoded output is obtained by reading the third
column from top to bottom. The resulting compression ratio is 1.42:1.

A unique feature of the LZW coding just demonstrated is that the coding dictionary or
code book is created while the data are being encoded. Remarkably, an LZW decoder
builds an identical decompression dictionary as it decodes simultaneously the encoded
data stream. . Although not needed in this example, most practical applications require
a strategy for handling dictionary overflow. A simple solution is to flush or reinitialize
the dictionary when it becomes full and continue coding with a new initialized
dictionary. A more complex option is to monitor compression performance and flush
the dictionary when it becomes poor or unacceptable. Alternately, the least used
dictionary entries can be tracked and replaced when necessary.
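A minimal LZW encoder sketch for a sequence of 8-bit values. The sample sequence below (rows of 39 39 126 126) is an assumption standing in for the 4 x 4 vertical-edge image, which is not reproduced above; with this input the encoder produces 10 output codes and 9 new dictionary entries, matching the counts quoted in the text.

#include <stdio.h>

#define DICT_SIZE 512
#define NPIX 16

/* Each dictionary entry above 255 is a previously seen sequence, stored
   as (code of its prefix sequence, last pixel value). Codes 0..255 are
   the single gray levels themselves. */
static int prefix[DICT_SIZE];
static int last  [DICT_SIZE];
static int next_code = 256;

/* Return the code for sequence (p, c) if it is in the dictionary, else -1. */
static int find(int p, int c)
{
    for (int i = 256; i < next_code; i++)
        if (prefix[i] == p && last[i] == c)
            return i;
    return -1;
}

int main(void)
{
    int pixels[NPIX] = { 39, 39, 126, 126,     /* assumed 4 x 4 image, row by row */
                         39, 39, 126, 126,
                         39, 39, 126, 126,
                         39, 39, 126, 126 };
    int ncodes = 0;

    int current = pixels[0];                   /* currently recognized sequence */
    for (int i = 1; i < NPIX; i++) {
        int c = pixels[i];
        int code = find(current, c);
        if (code >= 0) {
            current = code;                    /* sequence already in dictionary */
        } else {
            printf("output code %d\n", current);
            ncodes++;
            if (next_code < DICT_SIZE) {       /* add the new sequence */
                prefix[next_code] = current;
                last[next_code]   = c;
                next_code++;
            }
            current = c;
        }
    }
    printf("output code %d\n", current);       /* flush the last sequence */
    ncodes++;

    printf("%d pixels (%d bits) encoded as %d 9-bit codes (%d bits)\n",
           NPIX, NPIX * 8, ncodes, ncodes * 9);
    return 0;
}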

Concept of bit plane coding method.

Bit-Plane Coding:

An effective technique for reducing an image's inter pixel redundancies is to process


the image's bit planes individually. The technique, called bit-plane coding, is based on
the concept of decomposing a multilevel (monochrome or color) image into a series of
binary images and compressing each binary image via one of several well-known
binary compression methods.

Bit-plane decomposition:

The gray levels of an m-bit gray-scale image can be represented in the form of the base
2 polynomial

am-1 2^(m-1) + am-2 2^(m-2) + ... + a1 2^1 + a0 2^0

Based on this property, a simple method of decomposing the image into a collection
of binary images is to separate the m coefficients of the polynomial into m 1-bit bit
planes. The zeroth- order bit plane is
generated by collecting the a0 bits of each pixel, while the (m - 1) st-order bit plane
contains the am-1, bits or coefficients. In general, each bit plane is numbered from 0 to
m-1 and is constructed by setting its pixels equal to the values of the appropriate bits
or polynomial coefficients from each pixel in the original image. The inherent
disadvantage of this approach is that small changes in gray level can have a significant
impact on the complexity of the bit planes. If a pixel of intensity 127 (01111111) is
adjacent to a pixel of intensity 128 (10000000), for instance, every bit plane will
contain a corresponding 0 to 1 (or 1 to 0) transition. For example, as the most
significant bits of the two binary codes for 127 and 128 are different, bit plane 7 will
contain a zero-valued pixel next to a pixel of value 1, creating a 0 to 1 (or 1 to 0)
transition at that point.

An alternative decomposition approach (which reduces the effect of small gray-level


variations) is to first represent the image by an m-bit Gray code. The m-bit Gray code
gm-1 ... g2 g1 g0 that corresponds to the polynomial above can be computed from

gi = ai ⊕ ai+1,    0 ≤ i ≤ m − 2
gm-1 = am-1

Here, ⊕ denotes the exclusive OR operation. This code has the unique property that
successive code words differ in only one bit position. Thus, small changes in gray level
are less likely to affect all m bit planes. For instance, when gray levels 127 and 128
are adjacent, only the 7th bit plane will contain a 0 to 1 transition, because the Gray
codes that correspond to 127 and 128 are 01000000 and 11000000, respectively.
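A short sketch of the Gray-code conversion (the bitwise form g = b XOR (b >> 1) is equivalent to the bit-by-bit definition above), demonstrating that only one bit plane differs between 127 and 128 after the conversion.

#include <stdio.h>

/* Convert a gray level to its Gray code: g = b XOR (b >> 1). */
unsigned int to_gray(unsigned int b) { return b ^ (b >> 1); }

static void print_bits(unsigned int v)
{
    for (int i = 7; i >= 0; i--)
        putchar((v >> i) & 1 ? '1' : '0');
}

int main(void)
{
    unsigned int a = 127, b = 128;

    printf("binary(127) = "); print_bits(a);
    printf("   gray(127) = "); print_bits(to_gray(a)); putchar('\n');
    printf("binary(128) = "); print_bits(b);
    printf("   gray(128) = "); print_bits(to_gray(b)); putchar('\n');

    /* All 8 bit planes differ between the binary codes, but only bit
       plane 7 differs between the Gray codes. */
    printf("binary planes that differ: "); print_bits(a ^ b); putchar('\n');
    printf("gray planes that differ:   ");
    print_bits(to_gray(a) ^ to_gray(b)); putchar('\n');
    return 0;
}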

Lossless Predictive Coding:

The error-free compression approach does not require decomposition of an image into
a collection of bit planes. The approach, commonly referred to as lossless predictive
coding, is based on eliminating the interpixel redundancies of closely spaced pixels by
extracting and coding only the new information in each pixel. The new information
of a pixel is defined as the difference between the actual and predicted value of that
pixel.

Figure shows the basic components of a lossless predictive coding system. The system
consists of an encoder and a decoder, each containing an identical predictor. As each
successive pixel of the input image, denoted fn, is introduced to the encoder, the
predictor generates the anticipated value of that pixel based on some number of past
inputs. The output of the predictor is then rounded to the nearest integer, denoted f^n
and used to form the difference or prediction error which is coded using a variable-
length code (by the symbol encoder) to generate the next element of the compressed
data stream.
Figure :A lossless predictive coding model: (a) encoder; (b) decoder

The decoder of Fig. (b) reconstructs en from the received variable-length code words
and performs the inverse operation

fn = en + f^n

Various local, global, and adaptive methods can be used to generate f^n. In most cases,
however, the prediction is formed by a linear combination of m previous pixels. That is,

f^n = round [ Σ (from i = 1 to m) αi fn-i ]

where m is the order of the linear predictor, round is a function used to denote the
rounding or nearest integer operation, and the αi, for i = 1,2,..., m are prediction
coefficients. In raster scan applications, the subscript n indexes the predictor outputs
in accordance with their time of occurrence. That is, fn, f^n and en in Eqns. above could
be replaced with the more explicit notation f (t), f^(t), and e (t), where t represents
time. In other cases, n is used as an index on the spatial coordinates and/or frame
number (in a time sequence of images) of an image. In 1-D linear predictive coding,
for example, the prediction above can be written as

f^(x, y) = round [ Σ (from i = 1 to m) αi f(x, y − i) ]

where each subscripted variable is now expressed explicitly as a function of spatial
coordinates x and y. This equation indicates that the 1-D linear prediction f^(x, y) is a function of the previous
pixels on the current line alone. In 2-D predictive coding, the prediction is a function
of the previous pixels in a left- to-right, top-to-bottom scan of an image. In the 3-D
case, it is based on these pixels and the previous pixels of preceding frames. Equation
above cannot be evaluated for the first m pixels of each line, so these pixels must be
coded by using other means (such as a Huffman code) and considered as an overhead
of the predictive coding process. A similar comment applies to the higher-dimensional
cases.
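A minimal 1-D sketch of lossless predictive coding with a first-order predictor (alpha = 1, i.e., previous-pixel prediction); the sample line of pixel values is an assumption for the example.

#include <stdio.h>

#define N 12

int main(void)
{
    int f[N] = { 98, 99, 101, 104, 104, 105, 120, 121, 121, 122, 120, 119 };
    int e[N], rec[N];

    /* encoder: prediction error e_n = f_n - f^_n with f^_n = f_(n-1).
       The first pixel is transmitted directly as overhead, as noted above. */
    e[0] = f[0];
    for (int n = 1; n < N; n++)
        e[n] = f[n] - f[n - 1];

    /* decoder: f_n = e_n + f^_n */
    rec[0] = e[0];
    for (int n = 1; n < N; n++)
        rec[n] = e[n] + rec[n - 1];

    printf(" n   f    e   reconstructed\n");
    for (int n = 0; n < N; n++)
        printf("%2d  %3d  %3d  %3d\n", n, f[n], e[n], rec[n]);
    return 0;
}

Because no quantizer is involved, the reconstructed values equal the originals exactly, while the error values cluster around zero and are therefore cheaper to code.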

Lossy Predictive Coding:


In this type of coding, we add a quantizer to the lossless predictive model and examine
the resulting trade-off between reconstruction accuracy and compression performance.
As Fig. shows, the quantizer, which absorbs the nearest integer function of the error-
free encoder, is inserted between the symbol encoder and the point at which the
prediction error is formed. It maps the prediction error into a limited range of outputs,
denoted e^n which establish the amount of compression and distortion associated with
lossy predictive coding.

Figure: A lossy predictive coding model: (a) encoder and (b) decoder.

In order to accommodate the insertion of the quantization step, the error-free encoder of figure
must be altered so that the predictions generated by the encoder and decoder are equivalent. As
Fig. shows, this is accomplished by placing the lossy encoder's predictor within a feedback loop,
where its input, denoted f˙n, is generated as a function of past predictions and the corresponding
quantized errors. That is,

f˙n = e˙n + f^n
This closed loop configuration prevents error buildup at the decoder's output. Note
from Fig that the output of the decoder also is given by the above Eqn.

Optimal predictors:

The optimal predictor used in most predictive coding applications minimizes the
encoder's mean-square prediction error

E{ en² } = E{ [ fn − f^n ]² }

subject to the constraints that

f˙n = e˙n + f^n ≈ fn

and

f^n = Σ (from i = 1 to m) αi fn-i
That is, the optimization criterion is chosen to minimize the mean-square prediction
error, the quantization error is assumed to be negligible (e˙n ≈ en), and the prediction
is constrained to a linear combination of m previous pixels. These restrictions are not
essential, but they simplify the analysis considerably and, at the same time, decrease
the computational complexity of the predictor. The resulting predictive coding
approach is referred to as differential pulse code modulation (DPCM).
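As a sketch of the simplest lossy predictive coder, the following program implements delta modulation: the predictor is f^n = f˙(n-1) (alpha = 1) and the quantizer outputs only +zeta or -zeta, so each sample is coded with a single bit. The input samples and the step size zeta = 4 are assumptions for the example; DPCM in general uses richer predictors and quantizers.

#include <stdio.h>

#define N 12

int main(void)
{
    double zeta = 4.0;
    double f[N] = { 14, 15, 14, 15, 13, 15, 15, 14, 20, 26, 27, 28 };
    double fdot[N];                     /* reconstructed samples            */

    double prev = f[0];                 /* assume the first sample is known */
    fdot[0] = prev;

    printf(" n     f   fhat     e  ehat  fdot\n");
    for (int n = 1; n < N; n++) {
        double fhat = prev;             /* prediction from the feedback loop */
        double e    = f[n] - fhat;      /* prediction error                  */
        double ehat = (e > 0) ? zeta : -zeta;   /* 1-bit quantizer           */
        fdot[n] = ehat + fhat;          /* reconstruction at both ends       */
        printf("%2d  %5.1f %5.1f %5.1f %5.1f %5.1f\n",
               n, f[n], fhat, e, ehat, fdot[n]);
        prev = fdot[n];
    }
    return 0;
}

The output illustrates the two characteristic distortions of delta modulation: granular noise in the flat region at the start and slope overload when the signal rises quickly near the end.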

Block diagram of a transform coding system.

Transform Coding:
All the predictive coding techniques operate directly on the pixels of an image and
thus are spatial domain methods. In this coding, we consider compression techniques
that are based on modifying the transform of an image. In transform coding, a
reversible, linear transform (such as the Fourier transform) is used to map the image
into a set of transform coefficients, which are then quantized and coded. For most
natural images, a significant number of the coefficients have small magnitudes and can
be coarsely quantized (or discarded entirely) with little image distortion. A variety of
transformations, including the discrete Fourier transform (DFT), can be used to
transform the image data.

Figure A transform coding system: (a) encoder; (b) decoder.

Figure shows a typical transform coding system. The decoder implements the inverse
sequence of steps (with the exception of the quantization function) of the encoder,
which performs four relatively straightforward operations: sub image decomposition,
transformation, quantization, and coding. An N X N input image first is subdivided
into sub images of size n X n, which are then transformed to generate (N/n)² sub image
transform arrays, each of size n X n. The goal of the transformation process is to
decorrelate the pixels of each sub image, or to pack as much information as possible
into the smallest number of transform coefficients. The quantization stage then
selectively eliminates or more coarsely quantizes the coefficients that carry the least
information. These coefficients have the smallest impact on reconstructed sub image
quality. The encoding process terminates by coding (normally using a variable-length
code) the quantized coefficients. Any or all of the transform encoding steps can be
adapted to local image content, called adaptive transform coding, or fixed for all sub
images, called non adaptive transform coding.
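A minimal sketch of the encoder path for one sub image: a naive 2-D DCT (chosen here instead of the DFT; the transform, the 8 x 8 block content and the coefficient threshold are all assumptions for the example), a crude "quantizer" that discards small coefficients, and the inverse transform to check the reconstruction error.

#include <stdio.h>
#include <math.h>

#define B 8                      /* sub image (block) size */
#define PI 3.14159265358979323846

static double alpha(int u) { return u == 0 ? sqrt(1.0 / B) : sqrt(2.0 / B); }

/* forward 2-D DCT of one B x B sub image (naive O(B^4) form for clarity) */
static void dct2(double f[B][B], double C[B][B])
{
    for (int u = 0; u < B; u++)
        for (int v = 0; v < B; v++) {
            double s = 0.0;
            for (int x = 0; x < B; x++)
                for (int y = 0; y < B; y++)
                    s += f[x][y] *
                         cos((2 * x + 1) * u * PI / (2.0 * B)) *
                         cos((2 * y + 1) * v * PI / (2.0 * B));
            C[u][v] = alpha(u) * alpha(v) * s;
        }
}

/* inverse 2-D DCT */
static void idct2(double C[B][B], double f[B][B])
{
    for (int x = 0; x < B; x++)
        for (int y = 0; y < B; y++) {
            double s = 0.0;
            for (int u = 0; u < B; u++)
                for (int v = 0; v < B; v++)
                    s += alpha(u) * alpha(v) * C[u][v] *
                         cos((2 * x + 1) * u * PI / (2.0 * B)) *
                         cos((2 * y + 1) * v * PI / (2.0 * B));
            f[x][y] = s;
        }
}

int main(void)
{
    double f[B][B], C[B][B], g[B][B];
    int kept = 0;

    for (int x = 0; x < B; x++)                 /* smooth synthetic block */
        for (int y = 0; y < B; y++)
            f[x][y] = 100.0 + 10.0 * x + 5.0 * y;

    dct2(f, C);

    /* crude "quantizer": discard coefficients with small magnitude */
    for (int u = 0; u < B; u++)
        for (int v = 0; v < B; v++) {
            if (fabs(C[u][v]) < 10.0)
                C[u][v] = 0.0;
            else
                kept++;
        }

    idct2(C, g);

    double err = 0.0;
    for (int x = 0; x < B; x++)
        for (int y = 0; y < B; y++)
            err += (g[x][y] - f[x][y]) * (g[x][y] - f[x][y]);

    printf("coefficients kept: %d of %d\n", kept, B * B);
    printf("rms reconstruction error: %.3f\n", sqrt(err / (B * B)));
    return 0;
}

Because the block is smooth, only a handful of low-frequency coefficients survive the threshold, yet the reconstruction error stays small, which is exactly the energy-packing behavior the transformation step is meant to exploit.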
