
UNIT - 1

Digital Image Processing

Digital Image Processing means processing a digital image by means of a digital computer.
In other words, it is the use of computer algorithms either to enhance an image or to extract useful information from it.
Digital image processing applies algorithms and mathematical models to process and analyse digital images.
The goal of digital image processing is to enhance the quality of images, extract meaningful information from
them, and automate image-based tasks.

❖ The basic steps involved in digital image processing are:


1. Image acquisition: This involves capturing an image using a digital camera or scanner, or importing an
existing image into a computer.
2. Image enhancement: This involves improving the visual quality of an image, such as increasing
contrast, reducing noise, and removing artifacts.
3. Image restoration: This involves removing degradation from an image, such as blurring, noise, and
distortion.
4. Image segmentation: This involves dividing an image into regions or segments, each of which
corresponds to a specific object or feature in the image.
5. Image representation and description: This involves representing an image in a way that can be
analyzed and manipulated by a computer, and describing the features of an image in a compact and
meaningful way.
6. Image analysis: This involves using algorithms and mathematical models to extract information from an
image, such as recognizing objects, detecting patterns, and quantifying features.
7. Image synthesis and compression: This involves generating new images or compressing existing
images to reduce storage and transmission requirements.

❖ What is an image?
An image is defined as a two-dimensional function, F(x,y), where x and y are spatial coordinates, and
the amplitude of F at any pair of coordinates (x,y) is called the intensity of that image at that point.
When x,y and amplitude values of F are finite, we call it a digital image.
In other words, an image can be defined by a two-dimensional array specifically arranged in rows and
columns.
A digital image is composed of a finite number of elements, each of which has a particular value at a particular location.
These elements are referred to as picture elements, image elements, or pixels.
Pixel is the term most widely used to denote the elements of a digital image.

❖ Resolution: -
• Resolution is an important characteristic of an imaging system.
• It is the ability of the imaging system to resolve the smallest discernible detail, i.e., to show the smallest-sized
object clearly and differentiate it from the neighbouring small objects that are present in the
image.
• The number of rows in digital image is called vertical resolution.
• The number of columns is known as horizontal resolution.
➢ Image resolution depends on two factors:
o Optical resolution of the lens
o Spatial resolution: - A useful way to define resolution is the smallest number of line
pairs per unit distance.

✓ Spatial resolution also depends on two parameters: -
• Number of pixels of the image.
• Number of bits necessary for adequate intensity resolution, referred to
as the bit depth.
The number of bits necessary to encode a pixel value is called the bit depth; the corresponding number of
intensity levels is 2 raised to the bit depth.
So, the total number of bits necessary to represent the image = Number of rows × Number of columns
× Bit depth.
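As a quick illustration of this formula, the sketch below computes the raw storage needed for a hypothetical 1024 × 768 image at 8 bits per pixel (the image size and bit depth are assumptions chosen only for this example).

```python
# Raw storage required for an uncompressed image:
# total bits = rows * columns * bit depth
rows, cols, bit_depth = 1024, 768, 8   # assumed example values

total_bits = rows * cols * bit_depth
total_bytes = total_bits // 8

print(f"Total bits : {total_bits}")      # 6291456 bits
print(f"Total bytes: {total_bytes}")     # 786432 bytes (~768 KB)
print(f"Grey levels: {2 ** bit_depth}")  # 256 levels for an 8-bit depth
```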
Generally, image processing operations are divided into two categories:
1. Low Level Operations: - Low level image processing is associated with traditional
image processing.
2. High Level Operations: - High level image processing deals with image understanding.

➢ Process of image understanding can be understood as: -


o Constructing a model of the real-world object or scene.
o Constructing a model from the image.
o Matching the model created from the image against the real-world model.
o A feedback mechanism that invokes additional routines to update the model if required.
This process is performed iteratively until the model converges to achieve the global goal. These tasks are
quite complex and computationally intensive.

❖ Types of an image: -
1. BINARY IMAGE– A binary image, as its name suggests, contains only two pixel values, 0 and
1, where 0 refers to black and 1 refers to white. This image is also known as a monochrome image.
2. BLACK AND WHITE IMAGE– An image which consists of only black and white color is called a
black and white image.
3. 8-bit COLOR FORMAT– It is the most common image format. It has 256 different shades of color
in it and is commonly known as a grayscale image. In this format, 0 stands for black, 255 stands for
white, and 127 stands for gray.
4. 16-bit COLOR FORMAT– It is a color image format. It has 65,536 different colors in it and is also
known as the high color format. In this format the distribution of color is not the same as in a grayscale
image.
A 16-bit format is actually divided into three further channels, Red, Green and Blue: the
well-known RGB format.

❖ DIGITAL IMAGE REPRESENTATION
➢ Image as a Matrix
As we know, an image is represented in rows and columns, using the following matrix notation:

f(x,y) = [ f(0,0)     f(0,1)     ...  f(0,N-1)
           f(1,0)     f(1,1)     ...  f(1,N-1)
           ...
           f(M-1,0)   f(M-1,1)   ...  f(M-1,N-1) ]

The right-hand side of this equation is a digital image by definition. Every element of this matrix is called an
image element, picture element, or pixel.

In MATLAB the start index is 1 instead of 0; therefore MATLAB's f(1,1) corresponds to f(0,0) in the notation above.


In MATLAB, matrices are stored in variables such as X, x, input_image, and so on.
Variable names must begin with a letter, as in other programming languages.
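For comparison with MATLAB's 1-based indexing, here is a minimal NumPy sketch in which the same kind of image matrix is indexed from 0, so the top-left element is f[0, 0]; the 3×3 values are assumed for illustration.

```python
import numpy as np

# A tiny assumed 3x3 image matrix f, accessed with 0-based NumPy indexing.
f = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90]], dtype=np.uint8)

M, N = f.shape
print(f[0, 0])          # 10 -> top-left pixel, f(0,0) in the notation above
print(f[M - 1, N - 1])  # 90 -> bottom-right pixel, f(M-1, N-1)
```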

❖ Advantages of Digital Image Processing: -


1. Improved image quality: Digital image processing algorithms can improve the visual quality of
images, making them clearer, sharper, and more informative.
2. Automated image-based tasks: Digital image processing can automate many image-based tasks,
such as object recognition, pattern detection, and measurement.
3. Increased efficiency: Digital image processing algorithms can process images much faster than
humans, making it possible to analyze large amounts of data in a short amount of time.
4. Increased accuracy: Digital image processing algorithms can provide more accurate results than
humans, especially for tasks that require precise measurements or quantitative analysis.

❖ Disadvantages of Digital Image Processing: -


1. High computational cost: Some digital image processing algorithms are computationally intensive
and require significant computational resources.
2. Limited interpretability: Some digital image processing algorithms may produce results that are
difficult for humans to interpret, especially for complex or sophisticated algorithms.
3. Dependence on quality of input: The quality of the output of digital image processing algorithms
is highly dependent on the quality of the input images. Poor quality input images can result in poor
quality output.
4. Limitations of algorithms: Digital image processing algorithms have limitations, such as the
difficulty of recognizing objects in cluttered or poorly lit scenes, or the inability to recognize objects
with significant deformations or occlusions.
5. Dependence on good training data: The performance of many digital image processing algorithms
is dependent on the quality of the training data used to develop the algorithms. Poor quality training
data can result in poor performance of the algorithm.

Image sampling and quantization

❖ Introduction: -
Analog images cannot be processed or stored directly on our computers.
Digital images are more useful than analog images: we can store them on computers, apply digital image
processing, make thousands of copies, and share them over the internet.

❖ What is an analog image?


When we capture the image of an object, we use image sensors to sense the incoming light and form the
image.
Image sensors convert the incoming light from an object into electrical signals that can be stored and
viewed later.
These analog signals are continuous.
The images are stored in an analog form.
Thus, the image formed has continuous variation in the tone.
We cannot process analog images by a computer.
Analog signals contain infinite points, and we need infinite memory to store them.
We need to convert the analog images into digital images to store and process by a computer.

❖ Analog image to digital image conversion:


An analog image is converted to a digital image by digitizing the analog signals.
We apply sampling and quantization to the analog signals to convert them into digital form.
A digital image is formed by arranging pixels in rows and columns.
Each pixel has a particular integral value.
The computer processes each integral value and displays the corresponding pixel; the arrangement of the pixels forms the
digital image.
We use sampling and quantization to change the continuous analog image into quantized integral values
that will represent each pixel and ultimately form the digital image.

❖ What is Image Sampling:


Sampling is the process of converting an analog signal into discrete values.
A sampling function is applied to the analog signal that results in the sampled signal.
We get a finite number of samples of an analog signal.
The number of samples gives us the number of pixels.
More samples will result in higher image quality of the digital image because of more pixels.

The sampled signal is then quantized to get the value of each pixel.

❖ What is Image quantization


After sampling the analog signal, we will apply quantization.
Quantization digitizes the amplitude of the sampled signal.
Quantization is done by rounding off the amplitude of each sample and then assigning a different value
according to its amplitude.
Each value will represent a different color tone.

Each pixel is assigned an integer value after quantization.


Each number represents a different shade of grey.
The collection of these pixels will form the image.
For an 8-bit image, there are 256 quantization levels.
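A minimal sketch of uniform quantization, assuming an input array of continuous sample amplitudes in the range 0–255: each amplitude is mapped to one of a chosen number of levels by rounding. The number of levels used here is an assumption for illustration.

```python
import numpy as np

def quantize(samples, levels=256, max_val=255.0):
    """Map continuous amplitudes in [0, max_val] to `levels` integer values."""
    step = max_val / (levels - 1)                # width of one quantization step
    return np.round(samples / step).astype(int)  # nearest quantization level

# Continuous sampled amplitudes (assumed example values).
samples = np.array([0.0, 12.7, 63.4, 127.9, 200.2, 255.0])

print(quantize(samples, levels=256))  # fine quantization:   [  0  13  63 128 200 255]
print(quantize(samples, levels=4))    # coarse quantization: [0 0 1 2 2 3]
```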

❖ Difference between Image Sampling and Quantization:

• Sampling is the digitization of the coordinate values; quantization is the digitization of the amplitude values.
• In sampling, the x-axis (time) is discretized and the y-axis (amplitude) remains continuous; in quantization, the x-axis (time) remains continuous and the y-axis (amplitude) is discretized.
• Sampling is done prior to the quantization process; quantization is done after the sampling process.
• Sampling determines the spatial resolution of the digitized image; quantization determines the number of grey levels in the digitized image.
• Sampling reduces the continuous curve to a series of tent poles over time; quantization reduces the continuous curve to a continuous series of stair steps.
• In sampling, a single amplitude value is selected from the different values of the time interval to represent it; in quantization, the values representing the time intervals are rounded off to create a defined set of possible amplitude values.

Image Processing Steps

1. Image Acquisition:
Image acquisition is the first step in image processing. This step is also known as pre-processing in image
processing. It involves retrieving the image from a source, usually a hardware-based source.

2. Image Enhancement:
Image enhancement is the process of bringing out and highlighting certain features of interest in an image that
has been obscured. This can involve changing the brightness, contrast, etc.

3. Image Restoration:
Image restoration is the process of improving the appearance of an image. However, unlike image enhancement,
image restoration is done using certain mathematical or probabilistic models.

4. Color Image Processing:
Color image processing includes a number of color modeling techniques in a digital domain. This step has
gained prominence due to the significant use of digital images over the internet.

5. Wavelets and Multiresolution Processing:


Wavelets are used to represent images in various degrees of resolution. The images are subdivided into wavelets
or smaller regions for data compression and for pyramidal representation.

6. Compression:
Compression is a process used to reduce the storage required to save an image or the bandwidth required to
transmit it. This is done particularly when the image is for use on the Internet.

7. Morphological Processing:
Morphological processing is a set of operations that process images based on the shapes they contain.

8. Segmentation:
Segmentation is one of the most difficult steps of image processing. It involves partitioning an image into its
constituent parts or objects.

9. Representation and Description:


After an image is segmented into regions in the segmentation process, each region is represented and described
in a form suitable for further computer processing. Representation deals with the image’s characteristics and
regional properties. Description deals with extracting quantitative information that helps differentiate one class
of objects from the other.

10. Recognition:
Recognition assigns a label to an object based on its description.

Image Acquisition

Image acquisition can be defined as the act of procuring an image from sources.
This can be done via hardware systems such as cameras, encoders, sensors, etc.
In the image acquisition step, the incoming light wave from an object is converted into an electrical signal by a
combination of photo-sensitive sensors.
These small subsystems fulfil the role of providing your machine vision algorithms with an accurate description of
the object.
The primary goal of an image acquisition system is to maximize the contrast of the features of interest.

❖ Components of Image Acquisition:


The image acquisition system is composed of four significant parts.
While the efficiency of the sensors and cameras might vary with the available technology, the users have
absolute control over the illumination systems. The major image acquisition components have been mentioned
below:
1. Trigger
2. Camera
3. Optics
4. Illumination

1. Trigger:
A free-running camera reads the input from the sensor continuously. Upon an "image query," the
current image is captured completely.
After this, a new image acquisition is started and the completely captured image is transferred to the PC.
Sensors, PLCs, and push buttons for manual operation can issue these image queries.
Triggers also depend on the type of camera installed in the system.

2. Camera:
In a machine vision system, the cameras are responsible for taking the light information from a scene and
converting it into digital information i.e. pixels using CMOS or CCD sensors.
Many key specifications of the system correspond to the camera's image sensor.
These include resolution, i.e., the total number of rows and columns of pixels the sensor
accommodates.
The higher the resolution, the more data the system collects, and the more precisely it can judge discrepancies
in the environment.
However, more data demands more processing, which can significantly limit the performance of a system.

Based on the image format, cameras could be of three major types:


2D cameras
3D cameras
Hyperspectral cameras

Based on the acquisition type, cameras could be classified into two major categories:
Line Scan cameras
Area scan cameras
While cameras and sensors are crucial, they alone are not sufficient to capture an image.

3. Optics:
The lens should provide appropriate working distance, image resolution, and magnification for a vision
system.
To calibrate magnification precisely, it is necessary to know the camera’s image sensor size and the field of
view that is desirable. Some of the most used lenses include:
Standard Resolution Lenses:

These lenses are optimized for focusing to infinity with low distortion and vignetting.

Macro Lenses:
Specified in terms of their magnification relative to the camera sensor, they are optimized for 'close-up'
focusing with negligible distortion.

High-Resolution Lenses:
These lenses offer better performance than standard resolution lenses and are suitable for precise
measurement applications.

Telecentric Lenses:
These are specialized lenses that produce no distortion and result in images with constant magnification
regardless of the object’s distance.

4. Illumination:
The lighting should provide uniform illumination throughout all the visible object surfaces.
The illumination system should be set up in a way that avoids glare and shadows.
Spectral uniformity and stability are key.
Ambient light and the time of day need to be considered as well.

Color Image Representation

1. Binary Images:
It is the simplest type of image. It takes only two values, black and white, or 0 and 1. A binary image
is a 1-bit image: only one binary digit is needed to represent each pixel.
Binary images are mostly used to capture general shape or outline.
For example: Optical Character Recognition (OCR).
Binary images are generated using a threshold operation: pixels above the threshold value are turned white ('1')
and pixels below the threshold value are turned black ('0'); a minimal thresholding sketch is given after this list.
2. Gray-scale images:
Grayscale images are monochrome images, meaning they have only one channel.
Grayscale images do not contain any information about color.
Each pixel takes one of the available grey levels.
A normal grayscale image contains 8 bits/pixel data, which has 256 different grey levels. In medical images and
astronomy, 12 or 16 bits/pixel images are used.
3. Color images:
Color images are three-band monochrome images in which each band contains a different color; the actual
information is stored across these bands of the digital image.
The color images contain gray level information in each spectral band.
The images are represented as red, green and blue (RGB images).
Each color image has 24 bits/pixel, i.e., 8 bits for each of the three color bands (RGB).
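As referenced under binary images above, here is a minimal thresholding sketch in NumPy; the threshold value of 128 and the tiny test array are assumptions for illustration.

```python
import numpy as np

def to_binary(gray, threshold=128):
    """Turn a grayscale image into a binary image: 1 above the threshold, 0 below."""
    return (gray >= threshold).astype(np.uint8)

# Small assumed 8-bit grayscale example.
gray = np.array([[ 10, 200],
                 [130,  50]], dtype=np.uint8)

print(to_binary(gray))
# [[0 1]
#  [1 0]]
```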

❖ Image Formats:
1. 8-bit color format:
8-bit color is used for storing image information in a computer's memory or in a file of an image.
In this format, each pixel is represented by one 8-bit byte.
It has 0-255 range of colors, in which 0 is used for black, 255 for white and 127 for gray color.
The 8-bit color format is also known as a grayscale image.

2. 16-bit color format:


The 16-bit color format is also known as high color format.
It has 65,536 different color shades.
The 16-bit color format is further divided into three formats which are Red, Green, and Blue also known as
RGB format.
In RGB format, there are 5 bits for R, 6 bits for G, and 5 bits for B. The additional bit is given to green
because, of the three colors, the human eye is most sensitive to green.

3. 24-bit color format:
The 24-bit color format is also known as the true color format.
The 24-bit color format is also distributed in Red, Green, and Blue.
As 24 can be divided equally into three 8-bit parts, it is distributed equally among the three colors: 8 bits for R, 8
bits for G and 8 bits for B.

UNIT - 2
Intensity Transformation

In intensity transformation, as the name suggests, we transform the pixel intensity values using a transformation
function or mathematical expression.
Intensity transformation operation is usually represented in the form
s = T(r)
where, r and s denote the pixel value before and after processing and T is the transformation that maps pixel value r
into s.
Basic types of transformation functions used for image enhancement are
• Linear (Negative and Identity Transformation)
• Logarithmic (log and inverse-log transformation)
• Power law transformation

1. Image Negatives: -

Equation: s = L – 1 – r
Consider L = 256 and r be the intensity of the image (Range 0 to 255).

2. Log Transformation: -

Equation: s = c log (1 + r) where c is a constant

Consider c = 1 and r be the intensity of the image (Range 0 to 255).

3. Power-Law (Gamma) correction: -

Equation: s = c·r^γ, where c and γ (gamma) are positive constants.

Consider c = 1, γ = 0.04 and r be the intensity of the image (range 0 to 255).
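The sketch below applies the three transformations above to an 8-bit intensity array; the intensities are first normalized to [0, 1] for the power-law case so the output stays in range, and the gamma value used is chosen only for illustration (both are assumptions of this sketch, not part of the notes).

```python
import numpy as np

L = 256  # number of intensity levels for an 8-bit image

def negative(r):
    """Image negative: s = L - 1 - r."""
    return (L - 1) - r

def log_transform(r, c=1.0):
    """Log transformation: s = c * log(1 + r), rescaled to [0, L-1]."""
    s = c * np.log1p(r.astype(float))
    return np.uint8(s / s.max() * (L - 1))

def gamma_correct(r, c=1.0, gamma=0.4):
    """Power-law (gamma) correction: s = c * r^gamma on normalized intensities."""
    # gamma value chosen for illustration (differs from the 0.04 quoted in the notes)
    r_norm = r.astype(float) / (L - 1)
    return np.uint8(c * (r_norm ** gamma) * (L - 1))

r = np.array([0, 1, 64, 128, 255], dtype=np.uint8)
print(negative(r))        # [255 254 191 127   0]
print(log_transform(r))   # low intensities are expanded, output is brighter
print(gamma_correct(r))   # gamma < 1 also brightens dark regions
```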

These transformation functions are usually compared on a single plot of s versus r. Here, L denotes the number of intensity levels (for an 8-bit image, L = 256, so intensities lie in the range [0, 255]).

This is a spatial domain technique which means that all the operations are done directly on the pixels.

❖ Applications:

1. To increase the contrast between certain intensity values or image regions.


2. For image thresholding or segmentation.

Histograms

In digital image processing, the histogram is a graphical representation of a digital image.
It is a plot of the number of pixels for each tonal value.
Nowadays, image histograms are built into digital cameras; photographers use them to see the distribution of the tones
captured.
In the graph, the horizontal axis represents the tonal values while the vertical axis represents the number of
pixels having each tone.
Black and dark areas appear on the left side of the horizontal axis, medium grey tones in the
middle, and bright areas on the right; the height along the vertical axis represents the size of each area.

❖ Applications of Histograms:
1. In digital image processing, histograms are used for simple calculations in software.

2. It is used to analyze an image. Properties of an image can be predicted by the detailed study of the
histogram.

3. The brightness of the image can be adjusted by having the details of its histogram.

4. The contrast of the image can be adjusted according to the need by having details of the x-axis of a
histogram.

5. It is used for image equalization. Gray level intensities are expanded along the x-axis to produce a high
contrast image.

6. Histograms are used in thresholding, since a suitable threshold can be chosen from the histogram.

7. If we have input and output histogram of an image, we can determine which type of transformation is
applied in the algorithm.

❖ Histogram Processing Techniques:

1. Histogram Sliding:

In histogram sliding, the complete histogram is shifted towards the right or the left. When a histogram
is shifted towards the right or left, clear changes are seen in the brightness of the image.

The brightness of the image is defined by the intensity of light which is emitted by a particular light source.

2. Histogram Stretching:

In histogram stretching, contrast of an image is increased.

The contrast of an image is defined between the maximum and minimum value of pixel intensity.

To increase the contrast of an image, its histogram is stretched so that it covers the full
dynamic range.

From the histogram of an image, we can check whether the image has low or high contrast.

3. Histogram Equalization:

Histogram equalization is used for equalizing all the pixel values of an image. The transformation is done in such a
way that a uniform, flattened histogram is produced.

Histogram equalization increases the dynamic range of pixel values and makes an equal count of pixels at each
level which produces a flat histogram with high contrast image.

In histogram stretching the shape of the histogram remains the same, whereas in histogram equalization the
shape of the histogram changes; equalization defines a single fixed transformation, so it generates only one output image.
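A minimal sketch of histogram equalization for an 8-bit grayscale array using the cumulative distribution function (CDF); this follows the standard textbook mapping and is not taken verbatim from these notes.

```python
import numpy as np

def equalize_histogram(gray, levels=256):
    """Histogram equalization: map each grey level through the normalized CDF."""
    hist = np.bincount(gray.ravel(), minlength=levels)  # histogram of grey levels
    cdf = np.cumsum(hist)                               # cumulative distribution
    cdf_min = cdf[cdf > 0][0]                           # first nonzero CDF value
    # Standard equalization mapping, scaled back to [0, levels-1].
    mapping = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * (levels - 1))
    return mapping[gray].astype(np.uint8)

# Low-contrast example: values crowded between 100 and 103.
gray = np.array([[100, 100, 101, 101],
                 [102, 102, 103, 103]], dtype=np.uint8)
print(equalize_histogram(gray))   # values spread across the full 0-255 range
```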

Spatial Filtering

Spatial Filtering technique is used directly on pixels of an image.


The mask is usually of odd size so that it has a specific center pixel.
This mask is moved over the image such that the center of the mask traverses all image pixels.

❖ General Classification:
Smoothing Spatial Filter: A smoothing filter is used for blurring and noise reduction in an image.
Blurring is a pre-processing step for the removal of small details, and noise reduction is accomplished by
blurring.
➢ Types of Smoothing Spatial Filter:

1. Mean Filter:
A mean filter is a linear spatial filter whose output is simply the average of the pixels contained in the neighbourhood of the filter mask.
The idea is to replace the value of every pixel in the image with the average of the grey levels in the
neighbourhood defined by the filter mask.
Types of Mean filter:
(i) Averaging filter: It is used to reduce detail in an image. All coefficients are equal.
(ii) Weighted averaging filter: In this, pixels are multiplied by different coefficients.
The center pixel is multiplied by a higher value than in the average filter.

2. Order Statistics Filter:


It is based on ordering (ranking) the pixels contained in the image area encompassed by the filter.
It replaces the value of the center pixel with the value determined by the ranking result.
Edges are better preserved in this filtering.
Types of Order statistics filter:
(i) Minimum filter: 0th percentile filter is the minimum filter. The value of the center is
replaced by the smallest value in the window.
(ii) Maximum filter: 100th percentile filter is the maximum filter. The value of the center
is replaced by the largest value in the window.
(iii) Median filter: Each pixel in the image is considered in turn. First the neighbouring pixels are
sorted, then the original value of the pixel is replaced by the median of the sorted list.

Sharpening Spatial Filter: It is also known as derivative filter.


The purpose of the sharpening spatial filter is just the opposite of that of the smoothing spatial filter.
Its main focus is on removing blurring and highlighting the edges.
It is based on first- and second-order derivatives.
First order derivative:
▪ Must be zero in flat segments.
▪ Must be non zero at the onset of a grey level step.
▪ Must be non zero along ramps.

First order derivative in 1-D is given by:


f' = f(x+1) - f(x)
Second order derivative:
▪ Must be zero in flat areas.
▪ Must be non-zero at the onset and end of a grey-level step or ramp.
▪ Must be zero along ramps of constant slope.
Second order derivative in 1-D is given by:
f'' = f(x+1) + f(x-1) - 2f(x)
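To make the discrete derivatives above concrete, the sketch below computes the 1-D first and second differences and uses the second difference (a 1-D Laplacian) to sharpen a signal; the signal values are an assumed example.

```python
import numpy as np

# Assumed 1-D signal: flat region, a ramp, then another flat region.
f = np.array([5, 5, 5, 6, 7, 8, 8, 8], dtype=float)

first_deriv = f[1:] - f[:-1]                 # f'(x)  = f(x+1) - f(x)
second_deriv = f[2:] + f[:-2] - 2 * f[1:-1]  # f''(x) = f(x+1) + f(x-1) - 2f(x)

print(first_deriv)    # [0. 0. 1. 1. 1. 0. 0.]  -> nonzero along the ramp
print(second_deriv)   # [0. 1. 0. 0. -1. 0.]    -> nonzero only at ramp onset/end

# Basic sharpening: subtract the second difference from the original signal.
sharpened = f[1:-1] - second_deriv
print(sharpened)      # [5. 4. 6. 7. 9. 8.]: the ramp ends are emphasized
```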

Fourier Transforms & It’s Properties

The Fourier transform is an important tool used to decompose an image into its sine and cosine components.

❖ Properties of Fourier Transform:


1. Linearity:
Linearity means that the addition of two functions corresponds to the addition of their two frequency spectra.
a. If we multiply a function by a constant, the Fourier transform of the resulting function is
multiplied by the same constant.
b. The Fourier transform of a sum of two or more functions is the sum of the Fourier transforms of
the functions.
Case I: If h(x) -> H(f), then a·h(x) -> a·H(f)
Case II: If h(x) -> H(f) and g(x) -> G(f), then h(x) + g(x) -> H(f) + G(f)

2. Scaling:
Scaling describes how stretching or compressing a function in the time domain affects its spectrum.
a. If we stretch a function by a factor in the time domain, its Fourier transform is squeezed by
the same factor in the frequency domain (and scaled in amplitude).
b. If f(t) -> F(w), then f(at) -> (1/|a|) F(w/a)

3. Differentiation:
Differentiating a function with respect to time corresponds to multiplying its Fourier transform by jw.
a. If f(t) -> F(w), then f'(t) -> jw·F(w)

4. Convolution:
Convolution combines two functions by integrating the product of one with a shifted, flipped copy of the other.
a. The Fourier transform of a convolution of two functions is the point-wise product of their
respective Fourier transforms.
b. If f(t) -> F(w) and g(t) -> G(w),
c. then (f * g)(t) -> F(w)·G(w), where * denotes convolution.

5. Frequency Shift:
A shift in frequency corresponds to multiplication by a complex exponential in time.
a. There is a duality between the time and frequency domains: a frequency shift corresponds to
modulation in the time domain.
b. If f(t) -> F(w), then f(t)·exp[jw't] -> F(w - w')

6. Time Shift:
A shift of the time variable also affects the frequency function.
a. The time-shifting property states that a linear displacement in time corresponds to a linear
phase factor in the frequency domain.
b. If f(t) -> F(w), then f(t - t') -> F(w)·exp[-jwt']
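As a quick numerical check of the convolution property, the sketch below compares circular convolution computed directly against the inverse FFT of the point-wise product of the two spectra; the short test signals are assumed examples.

```python
import numpy as np

# Two short assumed test signals of the same length.
f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 0.0, -0.5, 1.0])

# Circular convolution computed directly in the "time" domain.
n = len(f)
direct = np.array([sum(f[k] * g[(i - k) % n] for k in range(n)) for i in range(n)])

# Convolution property: the spectrum of f * g is the point-wise product F(w)·G(w).
via_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

print(np.allclose(direct, via_fft))   # True: both give the same result
```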

Frequency domain
Since the Fourier series and the frequency domain are purely mathematical topics, we will keep the mathematics to a minimum and
focus more on their use in DIP.

❖ Frequency domain analysis:


In frequency domain analysis, a signal is analyzed with respect to its frequency content rather than time or space.

❖ Difference between spatial domain and frequency domain:


In the spatial domain, we deal with the image as it is.
The values of the pixels of the image change with respect to the scene.
Whereas in the frequency domain, we deal with the rate at which the pixel values are changing in the spatial domain.

❖ Spatial domain:

In the spatial domain, we deal directly with the image matrix.

Whereas in the frequency domain, we deal with a transform of the image rather than with the image matrix itself.

❖ Frequency Domain:
We first transform the image to its frequency distribution.
Our black-box system then performs whatever processing it has to perform, and its output in this case is not an
image but a transform.
After performing the inverse transformation, the result is converted back into an image, which is then viewed in the spatial domain.

❖ Transformation:
A signal can be converted from time domain into frequency domain using mathematical operators called
transforms.

There are many kinds of transforms that do this. Some of them are given below.
1. Fourier Series.
2. Fourier transformation.
3. Laplace transform.
4. Z transform.

❖ Frequency components:
Any image in spatial domain can be represented in a frequency domain.
We will divide frequency components into two major components.

❖ High frequency components:


High frequency components correspond to edges in an image.

❖ Low frequency components:


Low frequency components in an image correspond to smooth regions.
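A minimal sketch of frequency-domain processing, assuming a grayscale NumPy array: the image is transformed with the 2-D FFT, an ideal low-pass mask keeps only frequencies near the centre of the spectrum (the smooth regions), and the inverse FFT returns to the spatial domain. The cutoff radius and the random test image are assumptions.

```python
import numpy as np

def ideal_lowpass(gray, cutoff=30):
    """Keep only low-frequency components within `cutoff` of the spectrum centre."""
    F = np.fft.fftshift(np.fft.fft2(gray))        # spectrum with DC at the centre
    rows, cols = gray.shape
    y, x = np.ogrid[:rows, :cols]
    dist = np.sqrt((y - rows / 2) ** 2 + (x - cols / 2) ** 2)
    F_filtered = F * (dist <= cutoff)             # zero out high frequencies (edges)
    return np.fft.ifft2(np.fft.ifftshift(F_filtered)).real

# Assumed random test image; the low-pass result is a smoothed version of it.
gray = np.random.rand(128, 128)
smoothed = ideal_lowpass(gray, cutoff=15)
print(gray.shape, smoothed.shape)
```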

Color Models

❖ Additive Color Model:


1. These types of models use light which is emitted directly from a source to display colors.

2. These models mix different amount of RED, GREEN, and BLUE (primary colors) light to produce rest of

the colors.

3. Adding these three primary colors results in WHITE image.

4. Example: RGB model is used for digital displays such as laptops, TVs, tablets, etc.
❖ Subtractive Color Model:
1. These types of models use printing inks to display colors.

2. Subtractive color starts with an object that reflects light and uses colorants to subtract portions of the white

light illuminating an object to produce other colors.

3. If an object reflects all the white light back to the viewer, it appears white, and if it absorbs all the light then

it appears black.

4. Example: Graphic designers used the CMYK model for printing purpose.

1. RGB:

The model’s name comes from the initials of the three additive primary colors, red, green, and blue.

The RGB color model is an additive color model in which red, green, and blue are added together in various

ways to reproduce a wide range of colors .

Usually, in RGB a pixel is represented using 8 bits for each red, green, and blue.

This creates a total of around 16.7 million colors (2²⁴).

Equal values of the three primary colors represent shades of gray ranging from black to white.

The primary colors lie at the corners of the cube on the three axes.

The origin is black, and the corner diagonally opposite the origin is white.

The remaining three corners of the cube are cyan, magenta, and yellow. Inside the cube, we get a variety of colors

represented by the RGB vector (origin at black).

With the help of the primary colors, we can generate secondary colors (Yellow, Cyan, and Magenta) as follows.
Colour combination:
Green(255) + Red(255) = Yellow
Green(255) + Blue(255) = Cyan
Red(255) + Blue(255) = Magenta
Red(255) + Green(255) + Blue(255) = White

2. CMY and CMYK:

The CMY color model is a subtractive color model in which cyan, magenta, and yellow (secondary colors)

pigments or dyes are mixed in different ways to produce a broad range of colors .

The secondary colors are also called the primary color pigments.

The CMY color model itself does not describe what is meant by cyan, magenta, and yellow colorimetrically, so

the mixing results are not specified as absolute but relative to the primary colors.

When the exact chromaticities of the cyan, magenta, and yellow primaries are defined, the color model then

becomes an absolute color space.


➢ The Process of Color Subtraction:

The methodology of color subtraction is a valuable way of predicting the ultimate color appearance of an object

if the color of the incident light and the pigments are known.

The relationship between the RGB and CMY color models (with components normalized to [0, 1]) is given by:
RGB = 1 - CMY, or equivalently CMY = 1 - RGB (i.e., C = 1 - R, M = 1 - G, Y = 1 - B).

When white light (R+G+B) is incident on a yellow surface, the blue light gets absorbed and we see

only the combination of red and green light.

Similarly, if we throw a magenta light, a combination of red and blue, on a yellow pigment, the result will be

a red light because the yellow pigment absorbs the blue light.
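A minimal sketch of the RGB-to-CMY relation given above, assuming 8-bit RGB values that are normalized to [0, 1] before the subtraction.

```python
import numpy as np

def rgb_to_cmy(rgb_8bit):
    """CMY = 1 - RGB, with 8-bit values first normalized to [0, 1]."""
    rgb = np.asarray(rgb_8bit, dtype=float) / 255.0
    return 1.0 - rgb

print(rgb_to_cmy([255, 255, 0]))    # yellow -> [0. 0. 1.]  (pure yellow ink)
print(rgb_to_cmy([255, 0, 0]))      # red    -> [0. 1. 1.]  (magenta + yellow)
print(rgb_to_cmy([255, 255, 255]))  # white  -> [0. 0. 0.]  (no ink at all)
```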

➢ CMYK:

The CMYK color model is used in hardcopy (printing) devices.

In theory, 100% cyan, 100% magenta, and 100% yellow combine to give pure black; in practice, a separate black ink (the K component) is used to produce deep blacks.

3. HSI:

HSI stands for Hue, Saturation, and Intensity.

When humans view a color object, its hue, saturation, and brightness are described.

1) Hue: It is a color attribute that describes a pure color.

2) Saturation: It measures the extent to which a pure color is diluted by white light.

3) Brightness: It depends upon color intensity, which is a key factor in describing the color sensation.

The intensity is easily measurable, and the results are also easily interpretable.

Pseudo Coloring

Pseudo Coloring is one of the attractive categories in image processing.


It is used to make old black and white images or videos colorful.
Pseudo coloring techniques are used in analysis for identifying colored surfaces of a sample image and for adaptive
modeling of the histogram of a black and white image.

❖ Grayscale image:
It is a black and white image.
The pixel values are shades of gray, i.e., combinations of black and white.
The image is represented in form of one 2-Dimensional matrix. Each value represents the intensity or
brightness of the corresponding pixel at that coordinate in the image.
Total 256 shades are possible for the grayscale images.
0 means black and 255 means white.
As we increase the value from 0 to 255, the white component increases and so does the brightness.

❖ RGB color image:


It is a colored image.
It consists of three 2-Dimensional matrices, which are called channels.
Red, Green and Blue channels contain the corresponding colour values for each pixel in the image.
In integer format, the range of pixel intensity goes from 0 to 255.
0 means black and 255 represents the highest intensity of the primary colour.
There exist 256 shades of each colour.

❖ Steps:
1. Read the grayscale image.
2. If its bit-depth is 24, then make it 8.
3. Create an empty image of the same size.
4. Assign some random weight to RGB channels.
5. Copy weighted product of grayscale image to each channel of Red, Green, and Blue.
6. Display the images after creation.

❖ Functions Used:
1. imread( ) inbuilt function is used to read the image.
2. imtool( ) inbuilt function is used to display the image.
3. rgb2gray( ) inbuilt function is used to convert RGB to gray image.
4. uint8( ) inbuilt function is used to convert double into integer format.
5. pause( ) inbuilt function is used to stop execution for specified seconds.
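The notes above describe the procedure in terms of MATLAB functions; the sketch below mirrors the same steps in NumPy (the random channel weights and the gradient test image are assumptions of this illustration, not values from the notes).

```python
import numpy as np

def pseudo_color(gray, weights=(1.0, 0.6, 0.3)):
    """Build a pseudo-colored RGB image by weighting a grayscale image per channel."""
    gray = gray.astype(float)
    rgb = np.zeros(gray.shape + (3,), dtype=np.uint8)   # empty color image, same size
    for channel, w in enumerate(weights):                # weighted copy into R, G, B
        rgb[..., channel] = np.clip(gray * w, 0, 255).astype(np.uint8)
    return rgb

# Assumed grayscale gradient as input.
gray = np.tile(np.arange(256, dtype=np.uint8), (64, 1))
colored = pseudo_color(gray)
print(colored.shape)   # (64, 256, 3): one weighted copy of the gray image per channel
```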

Color Transformations
Color transformations convert the components of an image between color models.
Formulation:
We model color transformations using the expression
G(x,y) = T[f(x,y)]
where f(x,y) is a color input image, G(x,y) is the transformed or processed color output image, and T is an
operator on f over a spatial neighbourhood of (x,y).
The pixel values here are triplets or quartets from the color space chosen to represent the images.
A color can be described by its red (R), green (G) and blue (B) coordinates (the well-known RGB system), or by some
linear transformation of them such as XYZ, CMY, YUV or YIQ, among others.
If the RGB coordinates lie in the interval from 0 to 1, each color can be represented by a point in the cube in the
RGB space.
We need a model in which the range of saturation values is identical for all hues.
From this point of view, the GLHS color model is probably the best of the current ones, particularly for w_min =
w_mid = w_max = 1/3.

❖ Color Complement:
The hues directly opposite one another on the color circle are called complements.
Our interest in complements stems from the fact that they are analogous to gray-scale negatives.
As in the gray-scale case, color complements are useful for enhancing detail that is embedded in dark
regions of a color image, particularly when those regions are dominant in size.
The computed complement is reminiscent of conventional photographic color film negatives.
Reds of the original image are replaced by cyans in the complement.
When the original image is black, the complement is white, and so on.
Each of the hues in the complement image can be predicted from the original image using the color circle.
Each of the RGB component transforms involved in the computation of the complement is a function
of only the corresponding input color component.

❖ Histogram Processing:
The gray-level histogram processing transformations can be applied to color images in an automated way.
Since color images are composed of multiple components, however, consideration must be given to
adapting the gray-scale technique to more than one component and/or histogram.
It is generally unwise to histogram equalize the components of a color image independently.
This results in erroneous color.
A more logical approach is to spread the color intensities uniformly leaving the colors themselves
unchanged.


❖ Color Slicing:
Highlighting a specific range of colors in an image is useful for separating objects from their surroundings.
The basic idea is either to:
1. Display the colors of interest so that they stand out from the background, or
2. Use the region defined by the colors as a mask for further processing.
One of the simplest ways to "slice" a color image is to map the colors outside some range of interest to a
non-prominent neutral color.
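A minimal color-slicing sketch, assuming an RGB image as a NumPy array: pixels whose color lies within a cube of half-width W around a chosen prototype color are kept, and all other pixels are mapped to a neutral gray. The prototype color, W, and the tiny test image are assumptions of this example.

```python
import numpy as np

def color_slice(rgb, prototype=(200, 30, 30), width=60, neutral=128):
    """Keep colors within a cube of half-width `width` around `prototype`;
    map everything else to a neutral gray value."""
    rgb = rgb.astype(int)
    proto = np.array(prototype)
    inside = np.all(np.abs(rgb - proto) <= width, axis=-1)   # per-pixel test
    out = np.full_like(rgb, neutral)                         # start all-neutral
    out[inside] = rgb[inside]                                # keep colors of interest
    return out.astype(np.uint8)

# Assumed tiny image: one reddish pixel and one greenish pixel.
img = np.array([[[210, 40, 20], [20, 200, 40]]], dtype=np.uint8)
print(color_slice(img))
# [[[210  40  20]      <- reddish pixel kept
#   [128 128 128]]]    <- other colors mapped to neutral gray
```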

❖ A good color model should satisfy some demands:

1. The brightness should be a linear combination of all three RGB components; at least, it must be a
continuously growing function of all of them.
2. The hue differences between the basic colors (red, green and blue) should be 120°, and similarly
between the complementary colors (yellow, magenta and cyan). The hue difference between a basic color
and an adjacent complementary one (e.g. red and yellow) should be 60°.
3. The saturation should be 1 for the colors on the surface of the RGB color cube, i.e., when one
of the RGB components is 0 or 1 (except the black and white vertices), and it should be 0 when R = G = B.

Wavelet Transformation

A wavelet is a wave-like oscillation with an amplitude that begins at zero, increases, and then decreases back to zero.

Wavelets are functions that are concentrated in time and frequency around a certain point.

This transformation technique is used to overcome the drawbacks of the Fourier method.

Fourier transformation, although it deals with frequencies, does not provide temporal details.

This wavelet transform finds its most appropriate use in non-stationary signals.

This transformation achieves good frequency resolution for low-frequency components and high temporal resolution

for high-frequency components.

❖ Wavelet Analysis:

Wavelet analysis is used to divide information present on an image (signals) into two discrete components

approximations and details (sub-signals).

A signal is passed through two filters, high pass and low pass filters. The image is then decomposed into

high frequency (details) and low frequency components (approximation).

At every level, we get 4 sub-signals.

The approximation shows the overall trend of the pixel values, and the details appear as the horizontal, vertical and

diagonal components.

If these details are insignificant, they can be set to zero without significant impact on the image, thereby

achieving filtering and compression.

❖ Wavelet Based Denoising of Images:

We perform a 3-level discrete wavelet transform on a noisy image and thresholding on the high frequency

(detail) components on the frequency domain of the image.

There are two types of thresholding for denoising — hard and soft.

Hard thresholding is the process of setting to zero the coefficients whose absolute values are lower than the

threshold λ .

Soft thresholding is another method by first setting to zero coefficients whose absolute values are lower than the

threshold λ and then shrinking the nonzero coefficients toward zero.

First we estimate thresholds for all detail coefficients.

Once we apply the threshold on all levels, we get the denoised matrices for all the detail components in every

level.

We use these matrices as coefficients for inverse discrete wavelet transformation to reconstruct the image.

The reconstructed image is now denoised.
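A minimal denoising sketch along the lines described above, assuming the PyWavelets package (pywt) is available: a 3-level 2-D wavelet decomposition, soft thresholding of the detail coefficients, and reconstruction. The wavelet name, the threshold value, and the synthetic noisy image are assumptions of this example.

```python
import numpy as np
import pywt  # PyWavelets, assumed installed

# Assumed noisy test image: a smooth gradient plus Gaussian noise.
rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 255, 128), (128, 1))
noisy = clean + rng.normal(0, 20, clean.shape)

# 3-level discrete wavelet decomposition.
coeffs = pywt.wavedec2(noisy, wavelet="db2", level=3)

# Soft-threshold only the detail coefficients (keep the approximation as-is).
threshold = 30.0  # assumed value; in practice estimated from the noise level
denoised_coeffs = [coeffs[0]] + [
    tuple(pywt.threshold(d, threshold, mode="soft") for d in detail)
    for detail in coeffs[1:]
]

# Inverse transform reconstructs the denoised image.
denoised = pywt.waverec2(denoised_coeffs, wavelet="db2")
print(denoised.shape)   # same size as the input image
```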

UNIT - 3
Image Degradation and Restoration process

❖ IMAGE RESTORATION: -
Restoration improves image in some predefined sense. It is an objective process. Restoration attempts to
reconstruct an image that has been degraded by using a priori knowledge of the degradation phenomenon.
These techniques are oriented toward modeling the degradation and then applying the inverse process in
order to recover the original image. Image restoration refers to a class of methods that aim to remove or
reduce the degradations that have occurred while the digital image was being obtained.
All natural images when displayed have gone through some sort of degradation:
a) During display mode
b) Acquisition mode, or
c) Processing mode
The degradations may be due to
a) Sensor noise
b) Blur due to camera misfocus
c) Relative object-camera motion
d) Random atmospheric turbulence
e) Others

❖ Degradation Model: -
The degradation process applies a degradation function to an input image together with an additive
noise term. The input image is represented by the notation f(x,y) and the noise term by
η(x,y); these two terms combined give the degraded result g(x,y).
If we are given g(x,y), some knowledge about the degradation function H, and some knowledge about
the additive noise term η(x,y), the objective of restoration is to obtain an estimate f'(x,y) of the original
image.
We want the estimate to be as close as possible to the original image: the more we know about H and η,
the closer f'(x,y) will be to f(x,y). If the degradation is a linear, position-invariant process, then the degraded image is given in
the spatial domain by
g(x,y) = f(x,y) * h(x,y) + η(x,y)
where h(x,y) is the spatial representation of the degradation function and the symbol * represents convolution.
In the frequency domain we may write this equation as
G(u,v) = F(u,v) H(u,v) + N(u,v)
where the terms in capital letters are the Fourier transforms of the corresponding terms in the spatial domain.

The image restoration process can be achieved by inversing the image degradation process.
Although the concept is relatively simple, the actual implementation is difficult to achieve, as one requires
prior knowledge or identifications of the unknown degradation function and the unknown noise source. In
the following sections, common noise models and method of estimating the degradation function are
presented.

Noise Models

❖ Based on Distribution: -
Noise is a fluctuation in pixel values and is characterized by a random variable.
A random variable probability distribution is an equation that links the values of the statistical result with its
probability of occurrence.
Categorization of noise based on probability distribution is very popular.

❖ Gaussian Noise: -
Because of its mathematical simplicity, the Gaussian noise model is often used in practice, even in situations
where it is marginally applicable at best. Its parameters are the mean µ and the variance σ².
Gaussian noise arises in an image due to factors such as electronic circuit noise and sensor noise due to poor
illumination or high temperature.
Random noise that enters a system can be modelled as a Gaussian or normal distribution.
Gaussian noise affects both dark and light areas of image.

The Gaussian noise probability density function is

p(z) = (1 / (σ√(2π))) · e^(−(z−µ)² / (2σ²))

where z represents the gray level, µ is the mean (average) value of z, and σ is the standard deviation.
The standard deviation squared, σ², is known as the variance of z.
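A minimal sketch of adding Gaussian noise with a chosen mean and standard deviation to a grayscale array; the parameter values and the flat test image are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(42)

def add_gaussian_noise(gray, mean=0.0, sigma=20.0):
    """Add zero-mean Gaussian noise with standard deviation `sigma` to an 8-bit image."""
    noisy = gray.astype(float) + rng.normal(mean, sigma, gray.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

gray = np.full((64, 64), 128, dtype=np.uint8)        # assumed flat gray test image
noisy = add_gaussian_noise(gray)
print(noisy.mean().round(1), noisy.std().round(1))   # mean stays near 128, std near sigma
```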

❖ Rayleigh Noise: -
This type of noise is mostly present in range images.
Range images are mostly used in remote sensing applications.

Here mean m and variance σ2 are the following:

Rayleigh noise is usually used to characterize noise phenomena in range imaging.

❖ Erlang (or gamma) Noise: -

Here ! indicates factorial. The mean and variance are given below.

Gamma noise density finds application in laser imaging.

❖ Exponential Noise: -
This type of noise occurs mostly due to the illumination problems.
It is present in laser imaging.

Here a > 0. The mean and variance of this noise pdf are:

This density function is a special case of the Erlang (gamma) density with b = 1.


Exponential noise is also commonly present in cases of laser imaging.

❖ Uniform Noise: -
It is also a very popular noise model; it occurs in images where different values of noise are equally probable.
It often arises as quantization noise.

The mean and variance are given below.

Uniform noise is not practically present but is often used in numerical simulations to analyze systems.

❖ Impulse Noise: -
It is also known as Shot Noise, Salt and Pepper Noise and Binary Noise.
It occurs mostly because of sensor or memory problems, due to which pixels are assigned incorrect
extreme values.

If b > a, intensity b will appear as a light dot in the image; conversely, level a will appear as a dark dot.
The presence of these white and black dots in the image resembles salt-and-pepper granules,
hence the name salt-and-pepper noise. When either Pa or Pb is zero, it is called unipolar noise. The origin of
impulse noise is quick transients, such as faulty switching in cameras or other such cases.

❖ Poisson Noise: -
This type of noise manifests as a random structure or texture in images.
It is very common in X-ray images.

P(z) = ((np)^z / z!) · e^(−np)

where n is the number of pixels and p is the ratio of noise pixels to the total number of pixels.

❖ Gamma Noise: -
This type of noise also occurs mostly due to the illumination problems.
P(z) = (a^b · z^(b−1) / (b−1)!) · e^(−az) for z ≥ 0, and 0 otherwise.

❖ Based on Correlation: -
Statistical dependence among pixels is known as correlation.
If a pixel is independent of its neighboring pixels, it is known as uncorrelated pixel otherwise it is known
as correlated pixel.
Uncorrelated Noise is known as White Noise.
Mathematically for white noise, the noise power spectrum or power spectral density remains constant
with frequency.
Characterization of colored noise is quite difficult because its origin is mostly unknown.
One popular colored noise is Pink Noise.
Its power spectrum is not constant, rather it is proportional to reciprocal of frequency.
This is also known as 1/f or Flicker Noise.

❖ Based on Nature: -
Additive Noise:
In this case the observed image can be modelled as the true image plus noise.
This is a linear problem:
g(x,y) = f(x,y) + n(x,y)
where f is the input image and n is the noise.

Multiplicative Noise:
It can be modelled as multiplicative process.
Speckle noise is most encountered multiplicative noise in image processing.
It is mostly present in medical images.
It can be modelled as the pixel value multiplied by a random noise value:
g(x,y) = f(x,y) + f(x,y)·n(x,y) = f(x,y)·[1 + n(x,y)]

❖ Based on Sources: -
Noise based on source, commonly encountered in image processing are:
Quantization Noise:
It occurs due to a difference between the actual and allocated values.
It is inherent in the quantization process.

Photon Noise:
It occurs due to the statistical nature of electromagnetic waves.
Generation of photon is not constant because of statistical variation.
This causes variation in photon count which is known as photon noise.
It is present in many medical images.

Noise Filters
Noise is always present in digital images, introduced during the image acquisition, coding, transmission, and processing steps. It is
very difficult to remove noise from digital images without prior knowledge of filtering techniques. This
section gives a brief overview of various noise filtering techniques; a filter is selected by analysing the
behaviour of the noise.
Filtering image data is a standard process used in almost every image processing system. Filters remove noise
from images while preserving the details of the image. The choice of filter depends on the
filter's behaviour and the type of data.
Noise is an abrupt change in the pixel values of an image. So, when it comes to filtering images, the first
intuition is to replace the value of each pixel with the average of the pixels around it. This process smooths the
image. For this we make two assumptions.

❖ Assumption:
1. The true value of a pixel is similar to the true values of nearby pixels.
2. The noise is added to each pixel independently.
Let’s first consider 1-dimensional function before going into 2-dimensional image.

If we consider each sample of the original function (fig. 1) as a pixel value, then the smoothed
function (fig. 2) is the result of averaging each pixel with the neighbouring pixel values on either side.

1. Filtering with weighted moving average uniform weight: -
Instead of simply averaging over the local pixels, which results in some loss of data, we consider a set
of local pixels and assign them uniform weights. Here we assume that noise is added to each pixel
independently, and weights are assigned to the different pixels accordingly.

2. Filtering with weighted moving average non-uniform weight: -


Previously we assumed that the true value of a pixel is similar to the true values of nearby pixels, but this is
not always true. So, for higher accuracy, we assign greater weight to the nearby pixels than to the pixels that are
far away. This smooths the image and preserves the image information with less data loss.

3. Weighted moving average in 2-dimensional image: -


Thinking of the image as a 2-dimensional matrix, we slide a small window (the red square in fig. 5) over the whole
image and replace each pixel with the average of the nearby pixels. This small window is otherwise known as a mask or
kernel.

The process used in filtering with uniform weights is also called correlation or correlation filtering.

Fig. Correlation function for uniform weights. src: Udacity
In correlation filtering with non-uniform weights, a function over the small sliding window is used to supply
the weights; this function is also called the mask or kernel. The process used here is
called cross-correlation.

Fig. Correlation function for non-uniform weights .src: Udacity

❖ Types of Image noise filters:


There are different types of image noise filters. They can typically be divided into 2 types.

Fig. Classification of Filters

Though there are many types of filters, for this article we will consider 4 filters which are mostly used in
image processing.

1. Gaussian Filter:
In image processing, a Gaussian blur (also known as Gaussian smoothing) is the result of blurring
an image by a Gaussian function (named after mathematician and scientist Carl Friedrich Gauss). It is a
widely used effect in graphics software, typically to reduce image noise and reduce detail.
2. Mean Filter:
The mean filter is a simple sliding-window filter that replaces the center value with the average of all pixel values in the
window. The window or kernel is usually square but it can be of any shape.

3. Median Filter:
The median filter is a simple sliding-window filter that replaces the center value with the median of all pixel values in the
window. The window or kernel is usually square but it can be of any shape.

4. Bilateral Filter:
Bilateral filter uses Gaussian Filter but it has one more multiplicative component which is a function of pixel
intensity difference. It ensures that only pixel intensity similar to that of the central pixel is included in
computing the blurred intensity value. This filter preserves edges.
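A minimal sketch contrasting the mean and median filters described above on a signal corrupted by a single impulse (salt-and-pepper-like) value, using a plain 1-D sliding window so no extra libraries are needed; the data values are assumed.

```python
import numpy as np

def sliding_filter(signal, size=3, reducer=np.mean):
    """Apply `reducer` (e.g. np.mean or np.median) over a sliding window."""
    pad = size // 2
    padded = np.pad(signal, pad, mode="edge")
    return np.array([reducer(padded[i:i + size]) for i in range(len(signal))])

# Assumed signal: roughly constant around 10 with one impulse spike at 255.
signal = np.array([10, 11, 10, 255, 10, 9, 10], dtype=float)

print(sliding_filter(signal, reducer=np.mean))    # spike is smeared into neighbours
print(sliding_filter(signal, reducer=np.median))  # spike is removed, values preserved
```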

Inverse Filtering

Inverse filtering is a deterministic and direct method for image restoration.


The process of removing blurs and noise is known as deconvolution or inverse filtering.
Simple deconvolution starts with the assumption that the blur is characterized by the PSF, i.e., the impulse response of the
system.
It assumes that most blurs are linear, so the output of the imaging system is the convolution of the impulse response with the input
image.
• The images involved must be lexicographically ordered. That means that an image is converted to a column vector
by pasting the rows one by one after converting them to columns.
• An image of size 256 × 256 is converted to a column vector of size 65536 × 1.
• The degradation model is written in a matrix form, where the images are vectors and the degradation process is a
huge but sparse matrix. 𝐠 = 𝐇𝐟
• The above relationship is ideal. What really happens is 𝐠 = 𝐇𝐟 + 𝐧!
• In this problem we know 𝐇 and 𝐠 and we are looking for a decent estimate of 𝐟.
• The problem is formulated as follows: we are looking to minimize the Euclidean norm of the error, i.e., ||𝐧||² = ||𝐠
− 𝐇𝐟||²
• If 𝐇 is a square matrix and its inverse exists, then 𝐟 = 𝐇⁻¹𝐠
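A minimal frequency-domain sketch of the idea, assuming a known blur kernel and no noise: the image is degraded by point-wise multiplication of spectra (G = F·H) and then restored by dividing by H, with a small constant added to guard against near-zero spectrum values. The kernel, the random image, and the stabilizing constant are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.random((64, 64))                      # assumed original image

# Assumed 3x3 averaging blur kernel, zero-padded to the image size.
h = np.zeros_like(f)
h[:3, :3] = 1.0 / 9.0

F = np.fft.fft2(f)
H = np.fft.fft2(h)
G = F * H                                      # degradation: G = F . H (no noise here)

eps = 1e-8                                     # guards against division by ~0
F_est = G / (H + eps)                          # inverse filtering: F ~ G / H
f_est = np.fft.ifft2(F_est).real

print(np.abs(f - f_est).max() < 1e-3)          # close to the original in this noise-free case
```

In the presence of noise, dividing by small values of H amplifies the noise term strongly, which is why inverse filtering alone is rarely sufficient in practice.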

Homomorphic Filtering

Homomorphic filters are widely used in image processing for compensating the effect of non-uniform illumination in
an image.
Homomorphic filtering is a generalized technique for signal and image processing,
involving a nonlinear mapping to a different domain in which linear filtering techniques are applied,
followed by mapping back to the original domain.
It simultaneously normalizes the brightness across an image and increases contrast.
Pixel intensities in an image represent the light reflected from the corresponding points in the objects. According to this
image model, an image f(x,y) may be characterized by two components:
(1) the amount of source light incident on the scene being viewed, and
(2) the amount of light reflected by the objects in the scene.
These portions of light are called the illumination and reflectance components, denoted i(x,y) and r(x,y)
respectively.
The functions i(x,y) and r(x,y) combine multiplicatively to give the image function f(x,y):
f(x,y) = i(x,y)·r(x,y), where 0 < i(x,y) < ∞ and 0 < r(x,y) < 1.
Homomorphic filters are used in such situations where the image is subjected to multiplicative interference or
noise of this kind.
We cannot easily use the above product to operate separately on the frequency components of illumination and
reflectance, because the Fourier transform of f(x,y) is not separable over the product; that is, F[f(x,y)] ≠ F[i(x,y)]·F[r(x,y)].
We can separate the two components by taking the logarithm of the two sides: ln f(x,y) = ln i(x,y) + ln r(x,y).
Taking Fourier transforms on both sides we get F{ln f(x,y)} = F{ln i(x,y)} + F{ln r(x,y)}, that is, F(u,v) = I(u,v) +
R(u,v), where F, I and R are the Fourier transforms of ln f(x,y), ln i(x,y) and ln r(x,y) respectively. The function F
represents the Fourier transform of the sum of two images: a low-frequency illumination image and a high-frequency
reflectance image.
If we now apply a filter with a transfer function that suppresses low-frequency components and enhances high-
frequency components, then we can suppress the illumination component and enhance the reflectance component.

❖ Applications:
1. It is used for removing multiplicative noise that has certain characteristics.
2. It is also used in correcting non uniform illumination in images.
3. It can be used for improving the appearance of a grey scale image.
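A minimal homomorphic filtering sketch following the steps above, assuming a grayscale NumPy array: take the logarithm, apply a Gaussian-shaped high-emphasis transfer function in the frequency domain, then exponentiate back. The low/high gains, the cutoff, and the gradient test image are assumed values of this illustration.

```python
import numpy as np

def homomorphic_filter(gray, gamma_low=0.5, gamma_high=2.0, cutoff=30.0):
    """Suppress illumination (low frequencies) and boost reflectance (high frequencies)."""
    log_img = np.log1p(gray.astype(float))               # ln f = ln i + ln r
    F = np.fft.fftshift(np.fft.fft2(log_img))

    rows, cols = gray.shape
    y, x = np.ogrid[:rows, :cols]
    d2 = (y - rows / 2) ** 2 + (x - cols / 2) ** 2        # squared distance from centre
    # High-emphasis transfer function: gamma_low at DC, gamma_high far from DC.
    H = (gamma_high - gamma_low) * (1 - np.exp(-d2 / (2 * cutoff ** 2))) + gamma_low

    filtered = np.fft.ifft2(np.fft.ifftshift(F * H)).real
    out = np.expm1(filtered)                              # undo the logarithm
    return np.clip(out, 0, 255).astype(np.uint8)

# Assumed test image: a gradient simulating non-uniform illumination.
gray = np.tile(np.linspace(30, 220, 128), (128, 1)).astype(np.uint8)
print(homomorphic_filter(gray).shape)   # (128, 128)
```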

UNIT - 4
Coding Redundancy

Redundancy means repetitive data.


This may be data that share some common characteristics or overlapped information.

❖ Types of redundancy:
1. Coding Redundancy:
• Coding redundancy is caused by a poor choice of coding technique.
• A coding technique assigns a unique code to each symbol of the message.
• A wrong choice of coding technique creates unnecessary additional bits; these extra bits are the redundancy.
Coding Redundancy = Average bits used to code – Entropy (see the short sketch after this list).

2. Inter-pixel Redundancy:
▪ This type of redundancy is related with the inter-pixel correlations within an image.
▪ Much of the visual contribution of a single pixel is redundant and can be guessed from the values of
its neighbors.
▪ Example:
• The visual nature of the image background is given by many pixels that are not actually
necessary.
• This is known as Spatial Redundancy or Geometrical Redundancy.
• Inter-pixel dependency is solved by algorithms like:
o Predictive Coding, Bit Plane Algorithm, Run Length Coding and Dictionary based
Algorithms.
• Spatial Redundancy may be present in:
o Single frame or among multiple frames.

3. Psycho-visual Redundancy:
• The eye and the brain do not respond to all visual information with same sensitivity.
• Some information is neglected during the processing by the brain. Elimination of this information
does not affect the interpretation of the image by the brain.
• Edges and textual regions are interpreted as important features and the brain groups and correlates
such grouping to produce its perception of an object.
• Psycho visual redundancy is distinctly vision related, and its elimination does result in loss of
information.
• Quantization is an example.
• When 256 levels are reduced by grouping to 16 levels, objects are still recognizable. The
compression is 2:1, but an objectionable graininess and contouring effect results.

4. Chromatic Redundancy:
• It refers to the presence of unnecessary colors in an image.
• The color channels of color images are highly correlated, and the human visual system cannot perceive millions of colors.
• Therefore the colors that are not perceived by the human visual system can be removed without affecting the perceived image quality.
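The short sketch below illustrates the coding-redundancy formula from item 1 for an assumed 8-symbol source coded with a fixed 3-bit natural binary code; the probability values are illustrative only.

# Coding redundancy = average code length - entropy, for an assumed source.
import numpy as np

p = np.array([0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02])   # assumed symbol probabilities
fixed_bits = 3 * np.ones_like(p)                                  # natural 3-bit binary code

entropy = -np.sum(p * np.log2(p))       # H = -sum p_i log2 p_i
avg_bits = np.sum(p * fixed_bits)       # average bits per symbol actually used
print("entropy           =", round(entropy, 3), "bits/symbol")
print("average code bits =", avg_bits, "bits/symbol")
print("coding redundancy =", round(avg_bits - entropy, 3), "bits/symbol")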

41
Interpixel Redundancy

Interpixel redundancy arises from the correlation between neighbouring pixels, which in turn results from the structural or geometric relationships between the objects in the image.
Two images may have virtually identical histograms and yet contain objects of completely different structure and geometry. Because the gray levels in such images are not equally probable, variable-length coding can be applied to reduce the coding redundancy that would result from a straight (natural) binary encoding of their pixels. The coding process, however, cannot alter the level of correlation between the pixels within the images: the codes used to represent the gray levels of an image have nothing to do with the correlation between its pixels.
To exploit interpixel redundancy, the pixel array must therefore be transformed into a less correlated representation, for example by coding the differences between adjacent pixels, as illustrated in the sketch below.
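The following small sketch (with an assumed synthetic scan line) illustrates the point: the entropy of the inter-pixel differences is much lower than the entropy of the pixel values themselves, which is what predictive coding exploits.

# Entropy of pixels versus entropy of their first differences.
import numpy as np

def entropy(values):
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(1)
row = np.cumsum(rng.integers(-2, 3, size=4096)) + 128   # smooth, correlated "scan line"
row = np.clip(row, 0, 255).astype(np.uint8)

diff = np.diff(row.astype(np.int16))                    # inter-pixel differences
print("entropy of pixels      :", round(entropy(row), 2), "bits")
print("entropy of differences :", round(entropy(diff), 2), "bits")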

42
Psychovisual Redundancy

The brightness of a region of an image as perceived by the human eye depends upon factors other than the light reflected by the region. For example, intensity variations can be perceived in an area of constant intensity; such phenomena show that the eye does not respond with equal sensitivity to all visual information.
Certain information simply has
less relative importance than other information in normal visual processing. This information is said to be
Psychovisual redundant. This type of redundancy can be eliminated without significantly impairing the
quality of image perception. This type of redundancy can present in image as human perception of the
information in an image normally does not involve quantitative analysis of every pixel values in the image. In
general an observer searches for distinguishing features such as edges or texture region and mentally
combines them into recognizable groupings. Afterwards the brain then correlates these grouping with prior
knowledge in order to complete the image interpretation process. Psychovisual redundancy is associated with real or quantifiable visual information. The elimination of such information is possible only because it is not essential for normal visual processing. Since the elimination of psychovisually redundant data results in a loss of quantitative information, it is commonly referred to as quantization. Quantization is an irreversible operation, hence it is a lossy data compression step (a small example follows).
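A quick sketch of the quantization example (on an assumed random test image): regrouping 256 gray levels into 16 levels.

import numpy as np

img = np.random.default_rng(2).integers(0, 256, size=(64, 64)).astype(np.uint8)
img16 = (img // 16) * 16 + 8    # 16 levels, each mapped to the centre of its bin
print("distinct levels before:", len(np.unique(img)), " after:", len(np.unique(img16)))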

43
Huffman Coding

It is a type of variable length coding.


Here coding redundancy can be eliminated by choosing a better way of assigning the codes.
The Huffman Coding algorithm is given as:
1. List the symbols and sort them by probability.
2. Combine the two symbols with the lowest probabilities and label the new compound symbol with the sum of their probabilities.
3. The newly created item is given priority and placed at the highest possible position in the sorted list.
4. Repeat step 2 until only one node remains.
5. At each split, assign code 0 to the higher-up symbol and 1 to the lower-down symbol.
6. Now trace the code symbols going backwards from the root to each original symbol.
Huffman Coding is one of the lossless compression algorithms, its main motive is to minimize the data’s total code
length by assigning codes of variable lengths to each of its data chunks based on its frequencies in the data. High-
frequency chunks get assigned with shorter code and lower-frequency ones with relatively longer code, making a
compression factor ≥ 1.
In information theory, Shannon's source coding theorem states that for an independent and identically distributed source, the code rate (average code length per symbol) cannot be smaller than the Shannon entropy of the source. Huffman coding is optimal in the sense of this theorem among symbol-by-symbol prefix codes: no other code that assigns one codeword per symbol achieves a lower average code length.

❖ Huffman Coding: -
Following are the two steps in Huffman Coding
• Building Huffman Tree
• Assigning codes to Leaf Nodes
➢ Building Huffman Tree: -
First compute probabilities for all data chunks, build a node for each chunk and push all nodes into a list. Then pop the two least probable nodes and create a parent node from them, with probability equal to the sum of their probabilities, and add this parent node back to the list. Repeat the process with the current set of nodes until you create a parent with probability = 1 (the root).

❖ Assigning codes to Leaf Nodes: -

Following the tree built by the above procedure, at an arbitrary node assign to its left and right children the codes child_word = current encoded word + '0' and current encoded word + '1' respectively. Apply this procedure recursively starting from the root node; the codeword of each leaf (symbol) is the sequence of bits collected along the path from the root.
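A compact Python sketch of the two steps above is given below; it is an illustrative implementation (tie-breaking and I/O details vary between real coders), not a reference one.

import heapq
from collections import Counter

def huffman_codes(data):
    freq = Counter(data)
    # each heap item: [weight, tie-breaker, [symbol, code], [symbol, code], ...]
    heap = [[w, i, [sym, ""]] for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate single-symbol input
        return {heap[0][2][0]: "0"}
    count = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)             # two least probable nodes
        hi = heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]          # upper branch gets 0
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]          # lower branch gets 1
        heapq.heappush(heap, [lo[0] + hi[0], count] + lo[2:] + hi[2:])
        count += 1
    return dict(heap[0][2:])

codes = huffman_codes("this is an example of huffman coding")
print(sorted(codes.items(), key=lambda kv: (len(kv[1]), kv[0])))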

44
Arithmetic Coding

In fixed-length coding, a string of characters like the words "hello there" is represented using a fixed number of bits per character, as in the ASCII code. When a string is converted to arithmetic encoding instead, frequently used characters are stored with fewer bits and not-so-frequently occurring characters are stored with more bits, resulting in relatively fewer bits used in total.

In the most straightforward case, the probability of every symbol occurring is equal. For instance, consider a set of three symbols, A, B and C, each equally likely to occur. Straightforward block encoding would require 2 bits per symbol, which is inefficient because one of the bit patterns is never used: A = 00, B = 01, C = 10, and 11 is unused. A more efficient arrangement is to represent a sequence of these three symbols as a rational number in base 3, where every digit represents a symbol. For instance, the sequence "ABBCAB" could become 0.011201 in base 3, a value in the interval [0, 1). The next step is to encode this ternary number using a fixed-point binary number of sufficient precision to recover it, for example 0.0010110010 in binary, which is only 10 bits. This is feasible for long sequences because there are efficient, in-place algorithms for converting the base of arbitrarily precise numbers.
In general, arithmetic coders can produce near-optimal output for any given set of symbols and probabilities (the optimal value is −log2 P bits for every symbol of probability P). Compression algorithms that use arithmetic coding work by first determining a model of the data, essentially a prediction of what patterns will be found in the symbols of the message. The more accurate this prediction is, the closer to optimal the output will be.
Every step of the encoding procedure, except the very last, is the same; the encoder has basically only three pieces of information to consider: the next symbol that needs to be encoded; the current interval (at the very beginning of the encoding procedure the interval is set to [0, 1), but it will change); and the probabilities the model assigns to each of the symbols that are possible at this stage (as mentioned earlier, higher-order or adaptive models mean that these probabilities are not necessarily the same in every step). The encoder divides the current interval into sub-intervals, each representing a fraction of the current interval proportional to the probability of that symbol in the current context. Whichever sub-interval corresponds to the actual symbol to be encoded next becomes the interval used in the subsequent step. When all symbols have been encoded, the resulting interval unambiguously identifies the sequence of symbols that produced it. Anyone who has the same final interval and model can reconstruct the symbol sequence that must have entered the encoder to result in that final interval. It is not necessary to transmit the final interval itself; it is only necessary to transmit one fraction that lies within that interval. In particular, it is only necessary to transmit enough digits (in whatever base) of the fraction so that all fractions beginning with those digits fall into the final interval; this ensures that the resulting code is a prefix code.
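The toy sketch below shows only the interval-narrowing idea described above, using floating-point arithmetic for clarity; practical coders use integer arithmetic with renormalisation, and the symbol probabilities here are assumed for illustration.

# Encode a message as a single fraction inside the final interval.
probs = {"A": 0.5, "B": 0.3, "C": 0.2}          # assumed source model

def cumulative(probs):
    ranges, start = {}, 0.0
    for sym, p in probs.items():
        ranges[sym] = (start, start + p)        # sub-interval of [0, 1) for each symbol
        start += p
    return ranges

def encode(message, probs):
    low, high = 0.0, 1.0
    ranges = cumulative(probs)
    for sym in message:
        span = high - low
        lo_frac, hi_frac = ranges[sym]
        low, high = low + span * lo_frac, low + span * hi_frac   # narrow the interval
    return (low + high) / 2                     # any number inside the final interval

print("encoded value:", encode("ABBCA", probs))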

❖ Difference between Arithmetic coding and Huffman coding: -
• Huffman coding assigns a separate codeword with an integral number of bits to each symbol; arithmetic coding encodes the entire message as a single fraction in [0, 1).
• Because of the integer-bit restriction, Huffman coding can waste up to nearly one bit per symbol, while arithmetic coding can approach the source entropy more closely.
• Huffman coding is simpler and faster; arithmetic coding is more complex but adapts more easily to changing symbol probabilities.
45
Compression

The objective of a compression algorithm is to reduce the source data to a compressed form from which the original data can be recovered (exactly or approximately) by decompression.
Any Compression algorithm has two components:

❖ Modeler: -
It is used to condition the image data for compression using the knowledge of data.
It is present in both sender and receiver.
It can be either static or dynamic.

❖ Coder: -
Sender side coder is known as encoder.
This codes the symbols independently or using the model.
Receiver side coder is known as decoder.
The decoder reconstructs the message from the compressed data.

❖ Compression Algorithms: -
➢ Lossless Compression –
Reconstructed data is identical to the original data (Entropy Coding).
Example techniques:
Huffman Coding.
Arithmetic Coding.
Shannon - Fano Coding.

➢ Lossy Compression –
Reconstructed data approximates the original data (Source Coding).
Example techniques:
Linear Prediction.
Transform Coding.

❖ Difference between lossless and lossy compression: -
• Lossless compression reconstructs the data exactly; lossy compression only approximates it.
• Lossless methods (Huffman, arithmetic, Shannon-Fano) give modest compression ratios; lossy methods (predictive and transform coding) achieve much higher ratios by discarding information that is visually less important.
• Lossless compression is required where exact recovery matters (text, medical or legal images); lossy compression is acceptable for natural photographs and video.
46
❖ Another way of classifying image compression algorithm is:
➢ Entropy Coding:
The average information in an image is known as its entropy.
Coding is based on the entropy of the source and on the probability of occurrence of the symbols.
An event that is less likely to occur is said to contain more information than an event that is more likely to occur.
Set of symbols (alphabet) S = {S1, S2, ……. , Sn},
n is number of symbols in the alphabet
Probability distribution of the symbols: P = { p1, p2, ………. , pn}
According to Shannon, the entropy H of an information source S is defined as:
H = − Σ (i = 1 to n) pi · log2(pi)   bits per symbol
➢ Predictive Coding:
The idea is to remove the mutual dependency between the successive pixels and then perform the encoding.
Normally the sample values themselves are large, but the differences between successive samples are small and can be encoded with fewer bits.

➢ Transform Coding:
Objective is to exploit the information packing capability of the transform.
Energy is packed into fewer components and only these components are encoded and transmitted.
Idea is to remove the redundant high frequency components to create compression.
Removal of these frequency components leads to loss of information.
This loss of information, if tolerable, can be used for imaging and video applications.

➢ Layered Coding:
It is very useful in the case of layered images.
Data structures like pyramids are useful to represent an image in multiresolution form.
Sometimes these images are segmented as foreground and background, and encoding is done based on the application requirement.
Layered coding can also take the form of coding selected frequency coefficients or selected bits of the pixels in an image.

47
JPEG compression

JPEG is a lossy image compression method. JPEG compression uses the DCT (Discrete Cosine Transform) method for the coding transformation. The degree of compression can be adjusted, allowing a trade-off between storage size and image quality.

❖ Following are the steps of JPEG Image Compression: -

Step 1: The input image is divided into small blocks of 8x8 pixels, so each block contains 64 units. Each unit of the image is called a pixel.

Step 2: JPEG uses the [Y, Cb, Cr] colour model instead of the [R, G, B] model, so in the second step RGB is converted into YCbCr.

Step 3: After the colour conversion, each 8x8 block is forwarded to the DCT. The DCT uses a cosine function and does not involve complex numbers. It converts the information in a block of pixels from the spatial domain to the frequency domain.

DCT Formula: F(u, v) = (1/4)·C(u)·C(v)·Σx Σy f(x, y)·cos[(2x+1)uπ/16]·cos[(2y+1)vπ/16], for x, y = 0…7, where C(0) = 1/√2 and C(k) = 1 for k > 0.

Step 4: The human eye is not very sensitive to high-frequency content, so after the DCT the visually important information is concentrated in the low-frequency coefficients. Quantization divides each coefficient by an entry of a quantization matrix and rounds the result, reducing the number of bits per sample and setting most high-frequency coefficients to zero.

48
➢ There are two types of Quantization:
1. Uniform Quantization
2. Non-Uniform Quantization

Step 5: The zigzag scan maps the 8x8 matrix to a 1x64 vector. Zigzag scanning groups the low-frequency coefficients at the top of the vector and the high-frequency coefficients at the bottom, so that the large number of zeros in the quantized matrix come together and can be removed efficiently.

Step 6: The next step is vectoring: differential pulse code modulation (DPCM) is applied to the DC component. DC components are large and vary from block to block, but they are usually close to the previous block's value, so DPCM encodes only the difference between the DC value of the current block and that of the previous block.

Step 7: In this step, Run Length Encoding (RLE) is applied to the AC components, because they contain long runs of zeros. Each non-zero coefficient is encoded as a pair (skip, value), where skip is the number of zeros preceding it and value is the non-zero coefficient itself.

49
Step 8: In this step, the DPCM-coded DC components (and typically the run-length-coded AC components as well) are entropy coded using Huffman coding, as in the sketch below.
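A rough per-block sketch of steps 3 to 5 is given below (level shift, DCT, quantization and zig-zag scan). The quantization matrix and the exact zig-zag traversal used here are illustrative assumptions, not the standard JPEG tables.

import numpy as np
from scipy.fftpack import dct

block = np.random.default_rng(3).integers(0, 256, size=(8, 8)).astype(np.float64)

def dct2(b):
    # 2-D DCT as two 1-D orthonormal DCTs
    return dct(dct(b.T, norm="ortho").T, norm="ortho")

shifted = block - 128                      # level shift to centre values on 0
coeffs = dct2(shifted)                     # spatial domain -> frequency domain

# Assumed quantization matrix: coarser quantization at higher frequencies
Q = 1 + 5 * np.add.outer(np.arange(8), np.arange(8))
quantized = np.round(coeffs / Q).astype(int)

# One common zig-zag ordering: group coefficients by anti-diagonals, low
# frequencies first (exact traversal direction per diagonal is an assumption)
order = sorted(((u, v) for u in range(8) for v in range(8)),
               key=lambda uv: (uv[0] + uv[1], uv[1] if (uv[0] + uv[1]) % 2 else uv[0]))
zigzag = [quantized[u, v] for u, v in order]
print(zigzag)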

50
UNIT - 5
Point Detection

In point detection, a point is detected at the location (x, y) on which the detection mask is centred, when the magnitude of the mask response there exceeds a threshold. For line detection, by contrast, several directional masks are used, so that each point can be associated with the line direction whose mask gives the strongest response at its position.

❖ What Is A Point Detection?


In image processing, point detection means finding isolated points whose gray level differs significantly from that of their neighbours, such as noise pixels or small bright or dark spots lying in an area of nearly constant intensity. It is usually performed by convolving the image with a second-derivative (Laplacian-type) mask and thresholding the magnitude of the response.

❖ Which Mask Is Used For Point Detection?


A Laplacian mask is used for point detection. The Laplacian is isotropic, i.e., it is not sensitive to direction, so it responds equally to gray-level discontinuities in the horizontal, vertical, +45° and −45° directions.

❖ What Is Interest Point Detection In Image Processing?


An interest point is a point in an image that has a well-defined position, can be robustly detected, and is generally associated with a significant local change in one or more image properties at once, the key properties being intensity, color and texture.

❖ How Do You Find The Point Of An Image?


Point features are found by applying a detection mask over the image and computing the response R at every pixel. A pixel is marked as a point wherever |R| > T for a chosen non-negative threshold T, i.e., wherever there is a strong gray-level discontinuity.

❖ How Do You Detect Isolated Points In An Image?


The mask is centred on each pixel location in turn so that its output or response can be computed there. This technique detects isolated spots because there is a significant difference between the gray level of an isolated point and that of its neighbours, which produces a large mask response that can be thresholded (see the sketch below).
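A minimal sketch of this isolated-point detector is shown below; the test image and the threshold value are assumptions for illustration.

import numpy as np
from scipy.ndimage import convolve

# 8-neighbour Laplacian-type point-detection mask
mask = np.array([[-1, -1, -1],
                 [-1,  8, -1],
                 [-1, -1, -1]], dtype=np.float64)

img = np.zeros((64, 64)); img[32, 32] = 255            # one isolated bright point
R = convolve(img, mask, mode="constant", cval=0.0)     # mask response at every pixel

T = 0.9 * np.abs(R).max()                              # assumed threshold
points = np.argwhere(np.abs(R) >= T)
print(points)                                          # expected: around (32, 32)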

51
❖ Which Of The Following Is Used For Point Detection?

Que. For point detection we use
a. first derivative

b. second derivative

c. third derivative

d. both a and b

Answer: b. second derivative (point-detection masks are based on the second derivative, e.g., the Laplacian).

❖ Which Mask Can Be Used For Detection Of Vertical Line?

-1  2  -1
-1  2  -1
-1  2  -1

(The centre column of 2s makes this mask respond most strongly to one-pixel-wide vertical lines.)

❖ Why Mask Is Used In Image Processing?


Using a mask restricts a point or arithmetic operator to the area highlighted by the mask itself: the operator is applied only to the pixels that the mask selects, so processing can be limited to a region of interest or weighted over a small neighbourhood.

❖ Which Filter Is Used For Edge Detection?


The Canny filter is commonly used for edge detection. It first applies a Gaussian filter to reduce the noise in the image, and then computes the intensity gradient using Gaussian-derivative filters.

❖ What Is An Interest Point In An Image?


An interest point represents something special about a particular location in an image. Working with interest points involves two steps: a feature detector (extractor) first finds such locations, and 'local features' are then extracted from the regions around the detected points to describe them.

❖ Which Filter Can Be Used For Point Detection?


The SIFT (scale-invariant feature transform) algorithm detects points as extrema of the difference-of-Gaussian (DoG) function across scales; the DoG is used because it is an efficient approximation to the Laplacian of Gaussian (LoG).

❖ What Is The Main Limitation Of Harris Interest Point Detector?


The main drawback of the Harris corner detector is that it is not scale invariant, and a threshold must be chosen by hand for each image to keep only the most important points; with a low threshold value it also responds to many weak points caused by image noise.

52
❖ What Is Image Of The Point?
If a point P is reflected in a line, the reflected point lies at the same perpendicular distance on the opposite side of the line, as if the line were a mirror. This reflected point is called the image of P and is denoted P' (pronounced 'P prime').

❖ How Do You Find The Image Of A Point With Respect To A Point?


Let P' be the image of a point P in a mirror line. The midpoint of PP' lies on the mirror line, and PP' is perpendicular to it, so the product of their slopes is −1. Solving these two conditions gives P'; for example, the mirror image of (2, 1) with respect to the line mirror x + y − 5 = 0 is (4, 3).

❖ How Do You Find The Image Of A Point In A Given Line?


• Let P be the given point, AB the given line, and Q(h, k) the required image of P in AB.
• P and Q lie at equal perpendicular distances on opposite sides of AB, so the midpoint of PQ lies on AB; substituting the midpoint into the equation of AB gives one equation in h and k.
• In addition, PQ and AB are perpendicular to one another, so slope(PQ) × slope(AB) = −1, which gives a second equation.
• Solving the two equations gives the coordinates (h, k) of the image.

53
Line Detection

The Hough Transform is a method that is used in image processing to detect any shape, if that shape can be
represented in mathematical form. It can detect the shape even if it is broken or distorted a little bit.
We will see how Hough transform works for line detection using the Hough Line transform method. To apply the
Hough line method, first an edge detection of the specific image is desirable.

❖ Basics of Hough line Method: -


A line can be represented as y = mx + c or, in parametric form, as r = x·cosθ + y·sinθ, where r is the perpendicular distance from the origin to the line and θ is the angle formed by this perpendicular and the horizontal axis, measured counter-clockwise (the direction depends on how you represent the coordinate system; this representation is used in OpenCV).

So any line can be represented by these two terms, (r, θ).

Working of Hough line method: -

• First it creates a 2D array or accumulator (to hold values of two parameters) and it is set to zero
initially.
• Let the rows denote r and the columns denote θ.
• The size of the array depends on the accuracy you need. Suppose you want the accuracy of angles to be 1 degree; then you need 180 columns (the maximum angle for a straight line is 180°).
• For r, the maximum distance possible is the diagonal length of the image. So taking one pixel
accuracy, number of rows can be diagonal length of the image.

Example: -
Consider a 100×100 image with a horizontal line in the middle. Take the first point of the line; you know its (x, y) values. Now in the line equation, put the values θ = 0, 1, 2, …, 180 and check the r you get. For every (r, θ) pair, increment the value by one in the accumulator in its corresponding (r, θ) cell. So now in the accumulator, the cell (50, 90) = 1 along with some other cells.
Now take the second point on the line and do the same as above: increment the values in the cells corresponding to the (r, θ) you got. This time, the cell (50, 90) = 2. We are effectively voting for the (r, θ) values. Continue this process for every point on the line. At each point, the cell (50, 90) will be incremented or voted up, while other cells may or may not be voted up. This way, at the end, the cell (50, 90) will have the maximum votes. So if you search the accumulator for the maximum votes, you get the value (50, 90), which says there is a line in this image at distance 50 from the origin and at an angle of 90 degrees.

54
Everything explained above is encapsulated in the OpenCV function cv2.HoughLines(). It simply returns an array of (r, θ) values, where r is measured in pixels and θ is measured in radians, as in the usage sketch below.
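A typical usage sketch is shown below; the file name, Canny thresholds and accumulator threshold are assumed values.

import cv2
import numpy as np

img = cv2.imread("lane.jpg")                       # assumed input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)                   # edge detection first

# 1 pixel resolution for r, 1 degree for theta, accumulator threshold 200
lines = cv2.HoughLines(edges, 1, np.pi / 180, 200)

if lines is not None:
    for r, theta in lines[:, 0]:
        a, b = np.cos(theta), np.sin(theta)
        x0, y0 = a * r, b * r                      # point on the line closest to origin
        p1 = (int(x0 + 1000 * (-b)), int(y0 + 1000 * a))
        p2 = (int(x0 - 1000 * (-b)), int(y0 - 1000 * a))
        cv2.line(img, p1, p2, (0, 0, 255), 2)      # draw the detected line
cv2.imwrite("lines.jpg", img)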

55
Edge Detection

Edges are significant local changes of intensity in a digital image. An edge can be defined as a set of connected
pixels that forms a boundary between two disjoint regions. There are three types of edges:
• Horizontal edges
• Vertical edges
• Diagonal edges
Edge Detection is a method of segmenting an image into regions of discontinuity. It is a widely used technique in
digital image processing like

• pattern recognition
• image morphology
• feature extraction
Edge detection allows users to observe the features of an image where there is a significant change in gray level. Such a change indicates the end of one region in the image and the beginning of another. Edge detection reduces the amount of data in an image while preserving its structural properties.
Edge Detection Operators are of two types:
• Gradient-based operators, which compute first-order derivatives in a digital image, such as the Sobel operator, Prewitt operator and Robert operator.
• Gaussian-based operators, which smooth the image with a Gaussian and use its derivatives, such as the Canny edge detector and the Laplacian of Gaussian.

Sobel Operator: It is a discrete differentiation operator. It computes an approximation of the gradient of the image intensity function for edge detection. At each pixel of the image, the Sobel operator produces either the corresponding gradient vector or the norm of this vector. It uses two 3 x 3 kernels or masks which are convolved with the input image to calculate the vertical and horizontal derivative approximations respectively, as in the sketch below –
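A short usage sketch of the Sobel operator (assumed input file name):

import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # assumed input image
gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)        # derivative in x (responds to vertical edges)
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)        # derivative in y (responds to horizontal edges)
magnitude = np.sqrt(gx**2 + gy**2)                     # gradient magnitude
edges = np.uint8(255 * magnitude / magnitude.max())    # scale to 0..255 for display
cv2.imwrite("sobel_edges.jpg", edges)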

56
❖ Advantages:
1. Simple and time efficient computation.
2. Very easy at searching for smooth edges.

❖ Limitations:
1. Diagonal direction points are not always preserved.
2. Highly sensitive to noise.
3. Not very accurate in edge detection.
4. It tends to produce thick and rough edges, which does not give appropriate results.

❖ Prewitt Operator:
This operator is quite similar to the Sobel operator. It also detects the vertical and horizontal edges of an image, and it is one of the better ways to estimate the orientation and magnitude of edges. It uses the following kernels or masks –

❖ Advantages:
1. Good performance on detecting vertical and horizontal edges.
2. Best operator to detect the orientation of an image.

❖ Limitations:
1. The magnitude of coefficient is fixed and cannot be changed.
2. Diagonal direction points are not preserved always.

❖ Robert Operator:
This gradient-based operator computes the sum of squares of the differences between diagonally adjacent
pixels in an image through discrete differentiation. Then the gradient approximation is made. It uses the
following 2 x 2 kernels or masks –

❖ Advantages:
1. Detection of edges and orientation are very easy
2. Diagonal direction points are preserved
❖ Limitations:
1. Very sensitive to noise
2. Not very accurate in edge detection

57
❖ Marr-Hildreth Operator or Laplacian of Gaussian (LoG):
It is a Gaussian-based operator which uses the Laplacian to take the second derivative of an image. It works well when the transition in gray level is abrupt. It relies on the zero-crossing method: where the second-order derivative crosses zero, the first derivative has a maximum, and that location is taken as an edge location. Here the Gaussian operator reduces the noise and the Laplacian operator detects the sharp edges.
The Gaussian function is defined (up to a normalization constant) by the formula:
G(x, y) = e^(−(x² + y²) / (2σ²)), where σ is the standard deviation that controls the degree of smoothing.

❖ Advantages:
1. Easy to detect edges and their various orientations.
2. There is fixed characteristics in all directions.
❖ Limitations:
1. Very sensitive to noise
2. The localization error may be severe at curved edges.
3. It generates noisy responses that do not correspond to edges, so-called “false edges”.
❖ Canny Operator:
It is a Gaussian-based operator for detecting edges. This operator is not very susceptible to noise and extracts image features without altering them. The Canny edge detector uses an advanced algorithm derived from the earlier work on the Laplacian of Gaussian operator, and it is widely used as an optimal edge detection technique (a short usage sketch follows the lists below).
It detects edges based on three criteria:
1. Low error rate
2. Edge points must be accurately localized
3. There should be just one single edge response

Advantages:
1. It has good localization.
2. It extracts image features without altering the features.
3. Less Sensitive to noise.

Limitations:
1. There is false zero crossing.
2. Complex computation and time consuming.
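A minimal Canny usage sketch follows; the Gaussian kernel size and the two hysteresis thresholds are assumed values that normally need tuning per image.

import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # assumed input image
blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)          # noise reduction step
edges = cv2.Canny(blurred, 100, 200)                   # low and high hysteresis thresholds
cv2.imwrite("canny_edges.jpg", edges)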

Some Real-world Applications of Image Edge Detection:


• medical imaging, study of anatomical structure
• locate an object in satellite images.
• automatic traffic controlling systems.
• face recognition, and fingerprint recognition

58
Thresholding

Image segmentation is the technique of subdividing an image into constituent sub-regions or distinct objects. The
level of detail to which subdivision is carried out depends on the problem being solved. That is, segmentation
should stop when the objects or the regions of interest in an application have been detected.
Segmentation of non-trivial images is one of the most difficult tasks in image processing. Segmentation accuracy
determines the eventual success or failure of computerized analysis procedures. Segmentation procedures are
usually done using two approaches – detecting discontinuity in images and linking edges to form the region (known
as edge-based segmenting), and detecting similarity among pixels based on intensity levels (known as threshold-based segmenting).

❖ Thresholding: -
Thresholding is one of the segmentation techniques that generates a binary image (a binary image is one
whose pixels have only two values – 0 and 1 and thus requires only one bit to store pixel intensity) from a
given grayscale image by separating it into two regions based on a threshold value. Hence pixels having
intensity values greater than the said threshold will be treated as white or 1 in the output image and the others
will be black or 0.

Suppose the above is the histogram of an image f(x,y). We can see one peak near level 40 and another at 180, so there are two major groups of pixels: one group consisting of pixels having a darker shade and the other having a lighter shade. There can thus be an object of interest set against the background. If we use an appropriate threshold value, say 90, it will divide the entire image into two distinct regions.
In other words, if we have a threshold T, then the segmented image g(x,y) is computed as:
g(x,y) = 1 if f(x,y) > T, and g(x,y) = 0 if f(x,y) ≤ T.
So the output segmented image has only two classes of pixels – one having a value of 1 and others
having a value of 0.
If the threshold T is constant in processing over the entire image region, it is said to be global
thresholding. If T varies over the image region, we say it is variable thresholding.
Multiple-thresholding classifies the image into three regions, like two distinct objects on a background. The histogram in such cases shows three peaks and two valleys between them. The segmented image can then be computed using two appropriate thresholds T1 and T2.

59
We may intuitively infer that the success of intensity thresholding is directly related to the width and
depth of the valleys separating the histogram modes. In turn, the key factors affecting the properties of
the valleys are the separation between peaks, the noise content in the image, and the relative sizes of
objects and backgrounds. The more widely the two peaks in the histogram are separated, the better
thresholding and hence image segmenting algorithms will work. Noise in an image often degrades this
widely-separated two-peak histogram distribution and leads to difficulties in adequate thresholding and
segmenting. When noise is present, it is appropriate to use some filter to clean the image and then apply segmentation. The relative object sizes also play a role in determining the accuracy of segmentation.

❖ Global Thresholding: -

When the intensity distributions of objects and background are sufficiently distinct, it is possible to use a single or global threshold applicable over the entire image. The basic global thresholding algorithm iteratively finds the best threshold value for segmenting.
The algorithm is explained below.
1. Select an initial estimate of the threshold T.
2. Segment the image using T to form two groups G1 and G2: G1 consists of all pixels with intensity values > T, and G2 consists of all pixels with intensity values ≤ T.
3. Compute the average intensity values m1 and m2 for groups G1 and G2.
4. Compute the new value of the threshold as T = (m1 + m2)/2.
5. Repeat steps 2 through 4 until the change in T between successive iterations is smaller than a pre-defined value δ.
6. Segment the image as g(x,y) = 1 if f(x,y) > T and g(x,y) = 0 if f(x,y) ≤ T.
This algorithm works well for images that have a clear valley in their histogram. The larger the value of
δ, the smaller will be the number of iterations. The initial estimate of T can be made equal to the average
pixel intensity of the entire image.
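The sketch below is a direct implementation of the iterative algorithm above, tested on an assumed synthetic bimodal intensity distribution; the value of δ is also assumed.

import numpy as np

def basic_global_threshold(img, delta=0.5):
    T = img.mean()                          # step 1: initial estimate
    while True:
        g1 = img[img > T]                   # step 2: split into two groups
        g2 = img[img <= T]
        m1 = g1.mean() if g1.size else 0.0  # step 3: group means
        m2 = g2.mean() if g2.size else 0.0
        T_new = (m1 + m2) / 2.0             # step 4: new threshold
        if abs(T_new - T) < delta:          # step 5: stop when the change is small
            return T_new
        T = T_new

rng = np.random.default_rng(4)
img = np.concatenate([rng.normal(60, 10, 5000), rng.normal(180, 10, 5000)])
T = basic_global_threshold(img)
segmented = (img > T).astype(np.uint8)      # step 6: g = 1 where f > T
print("threshold ~", round(T, 1))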
The above simple global thresholding can be made optimum by using Otsu’s method. Otsu’s method is
optimum in the sense that it maximizes the between-class variance. The basic idea is that well-thresholded
classes or groups should be distinct with respect to the intensity values of their pixels and conversely, a
threshold giving the best separation between classes in terms of their intensity values would be the best or
optimum threshold.

60
❖ Variable Thresholding: -
There are broadly two different approaches to local thresholding. One approach is to partition the image into
non-overlapping rectangles. Then the techniques of global thresholding or Otsu’s method are applied to each
of the sub-images. Hence in the image partitioning technique, the methods of global thresholding are applied
to each sub-image rectangle by assuming that each such rectangle is a separate image in itself. This approach
is justified when the sub-image histogram properties are suitable (have two peaks with a wide valley in
between) for the application of thresholding techniques but the entire image histogram is corrupted by noise
and hence is not ideal for global thresholding.
The other approach is to compute a variable threshold at each point from the neighbourhood pixel properties.
Let us say that we have a neighbourhood Sxy of a pixel having coordinates (x,y). If the mean and standard deviation of pixel intensities in this neighbourhood are mxy and σxy, then the threshold at each point can be computed, for example, as:
Txy = a·σxy + b·mxy
where a and b are arbitrary constants. The above definition of the variable threshold is just an example; other definitions can also be used according to the need.
The segmented image is then computed as:
g(x,y) = 1 if f(x,y) > Txy, and g(x,y) = 0 if f(x,y) ≤ Txy.
Moving averages can also be used as thresholds. This technique of image thresholding is the most general
one and can be applied to widely different cases.

61
Edge Linking and Boundary Detection

Edge linking and boundary detection operations are the fundamental steps in any image understanding. Edge linking
process takes an unordered set of edge pixels produced by an edge detector as an input to form an ordered list of
edges. Local edge information is utilized by the edge linking operation; thus edge detection algorithms are typically followed by a linking procedure to assemble edge pixels into meaningful edges.

❖ Local Processing: -
One of the simplest approaches of linking edge points is to analyze the characteristics of pixels in a small
neighborhood (say, 3 x 3 or 5 x 5) about every point (x, y) in an image that has undergone edge-detection. All
points that are similar are linked, forming a boundary of pixels that share some common properties.
The two principal properties used for establishing similarity of edge pixels in this kind of analysis are (1) the strength of the response of the gradient operator used to produce the edge pixel, and (2) the direction of the gradient vector. The first property is given by the value of |∇f|, the gradient magnitude. Thus an edge pixel with coordinates (x0, y0) in a predefined neighborhood of (x, y) is similar in magnitude to the pixel at (x, y) if
|∇f(x, y) − ∇f(x0, y0)| ≤ E
where E is a non-negative magnitude threshold.
The direction (angle) of the gradient vector is given by α(x, y) = tan⁻¹(Gy/Gx). An edge pixel at (x0, y0) in the predefined neighborhood of (x, y) has an angle similar to the pixel at (x, y) if
|α(x, y) − α(x0, y0)| < A
where A is a non-negative angle threshold. The direction of the edge at (x, y) is perpendicular to the direction of the gradient vector at that point.
A point in the predefined neighborhood of (x, y) is linked to the pixel at (x, y) if both magnitude and direction
criteria are satisfied. This process is repeated at every location in the image. A record must be kept of linked
points as the center of the neighborhood is moved from pixel to pixel. A simple book keeping procedure is to
assign a different gray level to each set of linked edge pixels.

❖ Global Processing via the Hough transform: -


In this section, points are linked by determining first if they lie on a curve of specified shape. Unlike the local
analysis method, we now consider global relationships between pixels.
Suppose that, for n points in an image, we want to find subsets of these points that lie on straight lines. One
possible solution is to first find all lines determined by every pair of points and then find all subsets of points
that are close to particular lines. The problem with this procedure is that it involves finding n(n−1)/2 ≈ n² lines and then performing (n)·(n(n−1))/2 ≈ n³ comparisons of every point to all lines. This approach is computationally prohibitive in all but the most trivial applications.

❖ Global Processing via Graph-Theoretic Techniques: -


In this section, a global approach based on representing edge segments in the form of a graph and searching the
graph for low-cost paths that correspond to significant edges is discussed. This representation provides a rugged
approach that performs well in the presence of noise. As might be expected, the procedure is considerably more
complicated and requires more processing time.
A graph G = (N, A) is a finite, nonempty set of nodes N, together with a set A of unordered pairs of distinct elements of N. Each pair in A is called an arc. A graph in which the arcs are directed is called a directed graph. If an arc is directed from node ni to node nj, then nj is said to be a successor of its parent node ni. The process of identifying the successors of a node is called expansion of the node. In each graph we define levels, such that level 0 consists of a single node, called the start node, and the nodes in the last level are called goal nodes. A cost can be associated with every arc.

62
Hough Transforms

The HT (Hough Transform) is a feature extraction method used in image analysis, computer vision, and digital image processing. It uses a voting mechanism, carried out in parameter space, to identify imperfect instances of objects within a given class of shapes. Object candidates are obtained as local maxima in an accumulator space built by the HT algorithm.
The classical HT was concerned with detecting lines in an image, but it was subsequently extended to identifying positions of arbitrary shapes, most commonly circles or ellipses.

❖ Why is it Needed?
In many circumstances, an edge detector can be used as a pre-processing stage to get picture points or pixels on
the required curve in the image space. However, there may be missing points or pixels on the required curves
due to flaws in either the image data or the edge detector and spatial variations between the ideal
line/circle/ellipse and the noisy edge points acquired by the edge detector. As a result, grouping the extracted
edge characteristics into an appropriate collection of lines, circles, or ellipses is frequently difficult.

Figure 1: Original image of a lane.

63
Figure 2: Image after applying an edge detection technique; the red circles show where the detected line breaks.

❖ How Does it Work?


The Hough approach is effective for computing a global description of a feature (or features) from possibly noisy local measurements, where the number of solution classes does not need to be specified in advance. For example, the Hough approach for line identification is motivated by the assumption that each input measurement contributes to a globally consistent solution (e.g., the physical line which gave rise to that image point).

A line can be described analytically in a variety of ways. One of the line equations uses the parametric or
normal notion: xcosθ+ysinθ=r. where r is the length of a normal from the origin to this line and θ is the
orientation.

64
Point | Equation b = −ax + y | For a = 0 | New point (a, b) | For a = 1 | New point (a, b)
A(1,4) | b = −a + 4 | b = −(0) + 4 = 4 | (0, 4) | b = −(1) + 4 = 3 | (1, 3)
B(2,3) | b = −2a + 3 | b = −2(0) + 3 = 3 | (0, 3) | b = −2(1) + 3 = 1 | (1, 1)
C(3,1) | b = −3a + 1 | b = −3(0) + 1 = 1 | (0, 1) | b = −3(1) + 1 = −2 | (1, −2)
D(4,1) | b = −4a + 1 | b = −4(0) + 1 = 1 | (0, 1) | b = −4(1) + 1 = −3 | (1, −3)
E(5,0) | b = −5a + 0 | b = −5(0) + 0 = 0 | (0, 0) | b = −5(1) + 0 = −5 | (1, −5)

The known variables (i.e., xi, yi) in the image are constants in the parametric line equation, whereas r and θ are the unknown variables we seek. If we plot the possible (r, θ) values defined by each (xi, yi), points in Cartesian image space correspond to curves (i.e., sinusoids) in the polar Hough parameter space. This point-to-curve transformation is the Hough transformation for straight lines. Collinear points in the Cartesian image space become clearly visible when examined in the Hough parameter space because they yield curves that intersect at a single (r, θ) point.
A circle can be described by (x − a)² + (y − b)² = r², where a and b are the circle's centre coordinates and r is the radius. Because we then have three parameters, the parameter space and accumulator become 3-D and the algorithm's computational complexity increases. (In general, the computation and the size of the accumulator array grow polynomially with the number of parameters.) As a result, the basic Hough approach described here is applied mainly to straight lines.

❖ Algorithm: -
1. Determine the range of ρ and θ. Typically, the range of θ is [0, 180] degrees and ρ is [−d, d], where d is the diagonal length of the edge image. It is important to quantize the ranges of ρ and θ so that there are only a finite number of possible values.
2. Create a 2-D array called the accumulator, with dimensions (num_rhos, num_thetas), to represent the Hough space, and set all its values to zero.
3. Perform edge detection on the original image, with whatever edge-detection technique you like.
4. For each pixel of the edge image, check whether it is an edge pixel. If it is, loop over all possible values of θ, compute the corresponding ρ, locate the θ and ρ indices in the accumulator, and increment the accumulator at those index pairs.
5. Iterate over the values in the accumulator. Wherever the value is greater than a specified threshold, retrieve the ρ and θ indices, obtain the values of ρ and θ from the index pair, and convert the line back to the form y = ax + b if required.
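A small numpy sketch of this accumulator algorithm is given below; the binary edge image and the vote threshold are assumptions for illustration.

import numpy as np

def hough_lines(edges, num_thetas=180, threshold=100):
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))                   # maximum possible rho
    thetas = np.deg2rad(np.arange(num_thetas))            # 0..179 degrees
    rhos = np.arange(-diag, diag + 1)                     # -d..d in 1-pixel steps
    acc = np.zeros((len(rhos), num_thetas), dtype=np.int64)

    ys, xs = np.nonzero(edges)                            # edge pixels only
    for x, y in zip(xs, ys):
        r = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[r + diag, np.arange(num_thetas)] += 1         # one vote per theta

    peaks = np.argwhere(acc > threshold)                  # cells with enough votes
    return [(rhos[i], thetas[j]) for i, j in peaks]

edges = np.zeros((100, 100), dtype=np.uint8)
edges[50, :] = 1                                          # a horizontal line
print(hough_lines(edges, threshold=80)[:3])               # expect rho=50, theta=90 deg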

❖ Sum of Hough Transform:


Problem: Given set of points, use Hough transform to join these points. A(1,4), B(2,3) ,C(3,1) ,D(4,1) ,E(5,0)
Solution:
Let us start from the equation of a line, y = ax + b. Rewriting the same equation with b on the LHS gives b = −ax + y. So if we write this equation for point A(1, 4), we take x = 1 and y = 4 and get b = −a + 4. The following table shows the equations for all the given points.
Point X and y values Substituting the value in b=-ax+y
A(1,4) x=1 ; y=4 b= -a+4
B(2,3) x=2 ; y=3 b= -2a+3
C(3,1) x=3 ; y=1 b= -3a+1
D(4,1) x=4 ; y=1 b= -4a+1
E(5,0) x=5 ; y=0 b= -5a+0

Now take a = 0 and a = 1 and find the corresponding b values for the above five equations (see the table given earlier).
Let us plot the new (a, b) points on the graph, as shown in the figure below.

65
We can see that almost all the lines cross each other at the point (−1, 5), so a = −1 and b = 5.
Putting these values into the equation y = ax + b gives y = −x + 5, which is the equation of the line that links the given edge points.

❖ Advantages:
The HT benefits from not requiring all pixels on a single line to be contiguous. As a result, it can be quite effective
when identifying lines with small gaps due to noise or when objects are partially occluded.

❖ Disadvantages:
The HT has the following drawbacks:
• It can produce deceptive results when objects align by accident;
• Rather than finite lines with definite ends, detected lines are infinite lines defined by their (m,c) values.

❖ Application:
The HT has been widely employed in numerous applications because of benefits such as noise immunity: 3-D applications, object and shape detection, lane and traffic-sign recognition, industrial and medical applications, pipe and cable inspection, and underwater tracking are just a few examples. For instance, one approach proposes the hierarchical additive Hough transform (HAHT) for detecting lane lines; the HAHT accumulates votes at various hierarchical levels, and segmenting lines into multiple blocks also minimizes the computational load. Another approach proposes a lane detection strategy in which the HT is merged with JPEG (Joint Photographic Experts Group) compression; however, only simulations are used to test that method.

66
Region Based Segmentation

This process involves dividing the image into smaller segments that satisfy a certain set of rules. The technique employs an algorithm that divides the image into several components having common pixel characteristics. The process looks for chunks of segments within the image: small segments can absorb similar pixels from neighbouring areas and subsequently grow in size. The algorithm can pick up the gray level from surrounding pixels.

❖ The region-based approaches are of the following types: -


1. Region growing − This method recursively grows segments by including neighbouring pixels with similar
characteristics. It uses the difference in gray levels for gray regions and the difference in textures for textured
images.

2. Region splitting − In this method, the whole image is initially considered a single region. The region is then divided into segments by checking whether its pixels satisfy a predefined set of criteria; pixels that follow the same rules are kept in one segment, and regions that do not are split further.

• Region merging − In this approach each pixel is initially treated as a separate region, so the number of regions equals the number of pixels in the image. Adjacent regions are then merged whenever they satisfy the given merging rule (homogeneity criterion), and merging continues as long as the rule is followed.

• Region splitting and merging − When splitting and merging are combined, i.e., both take place within the same procedure, it is called region splitting-and-merging segmentation. (A region-growing sketch follows this list.)
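The sketch below illustrates region growing on an assumed synthetic image; the seed point and the similarity threshold are assumptions.

import numpy as np
from collections import deque

def region_grow(img, seed, thresh=10):
    h, w = img.shape
    region = np.zeros((h, w), dtype=bool)
    seed_val = float(img[seed])
    q = deque([seed])
    region[seed] = True
    while q:
        y, x = q.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):    # 4-connected neighbours
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not region[ny, nx]:
                if abs(float(img[ny, nx]) - seed_val) <= thresh:
                    region[ny, nx] = True                    # similar: add to the region
                    q.append((ny, nx))
    return region

img = np.full((50, 50), 30, dtype=np.uint8)
img[10:30, 10:30] = 200                                      # bright square object
mask = region_grow(img, seed=(15, 15), thresh=20)
print("region size:", mask.sum())                            # expected 400 pixels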

67
Boundary Descriptors

❖ Simple Descriptors:
• Length of a contour: obtained by counting the number of pixels along the contour. For a chain-coded curve with unit spacing in both directions, the number of vertical and horizontal components plus √2 times the number of diagonal components gives the exact length of the curve.
• Boundary diameter: it is defined as Diam(B) = max over i, j of [D(pi, pj)], where D is a distance measure which can be either the Euclidean distance or the D4 distance. The value of the diameter and the orientation of the major axis of the boundary are two useful descriptors.
• Curvature: the rate of change of slope. Curvature can be determined by using the difference between the slopes of adjacent boundary segments at the point of intersection of the segments.
• Shape numbers: the shape number is the first difference, of smallest magnitude, of a chain-code representation. The order of a shape number is defined as the number of digits in its representation; the shape order is even for a closed boundary.

❖ Regional Descriptors:
• Simple descriptors: area, perimeter and compactness are the simple region descriptors, with Compactness = (perimeter)²/area (see the short sketch below).
• Topological descriptors (rubber-sheet distortions): topology is the study of properties of a figure that are unaffected by any deformation, as long as there is no tearing or joining of the figure.
• Euler number: the Euler number E of a region depends on the number of connected components C and holes H: E = C − H. A connected component of a set is a subset of maximal size such that any two of its points can be joined by a connected curve lying entirely within the subset.
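A brief sketch computing the simple regional descriptors (area, perimeter and compactness) with OpenCV contours; the test shape is an assumed filled disc, and the two-value return signature of cv2.findContours assumes OpenCV 4.

import cv2
import numpy as np

img = np.zeros((200, 200), dtype=np.uint8)
cv2.circle(img, (100, 100), 60, 255, -1)                  # filled disc

contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
cnt = contours[0]
area = cv2.contourArea(cnt)
perimeter = cv2.arcLength(cnt, True)
compactness = perimeter**2 / area                         # about 4*pi (~12.57) for a disc
print("area:", area, "perimeter:", round(perimeter, 1), "compactness:", round(compactness, 2))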

68
