UNIT - 1
Digital Image Processing (DIP)
Digital Image Processing means processing digital images by means of a digital computer.
We can also say that it is the use of computer algorithms either to enhance an image or to extract some
useful information from it.
Digital image processing is the use of algorithms and mathematical models to process and analyse digital images.
The goal of digital image processing is to enhance the quality of images, extract meaningful information from
images, and automate image-based tasks.
❖ What is an image?
An image is defined as a two-dimensional function, F(x,y), where x and y are spatial coordinates, and
the amplitude of F at any pair of coordinates (x,y) is called the intensity of that image at that point.
When x,y and amplitude values of F are finite, we call it a digital image.
In other words, an image can be defined by a two-dimensional array specifically arranged in rows and
columns.
A digital image is composed of a finite number of elements, each of which has a particular
value at a particular location.
These elements are referred to as picture elements, image elements, and pixels.
A Pixel is most widely used to denote the elements of a Digital Image.
❖ Resolution: -
• Resolution is an important characteristic of an imaging system.
• It is the ability of the imaging system to produce the smallest discernible detail, i.e., to render the smallest
sized object clearly and differentiate it from the neighbouring small objects that are present in the
image.
• The number of rows in digital image is called vertical resolution.
• The number of columns is known as horizontal resolution.
➢ Image resolution depends on two factors:
o Optical resolution of the lens
o Spatial resolution: - A useful way to quantify spatial resolution is the largest number of
discernible line pairs per unit distance.
✓ Spatial resolution also depends on two parameters: -
• Number of pixels of the image.
• Number of bits necessary for adequate intensity resolution, referred to
as the bit depth.
The number of bits necessary to encode a pixel value is called the bit depth; it is normally a power of 2
(for example 1, 8, or 24).
So, the total number of bits necessary to represent the image is = Number of rows * Number of columns
* Bit depth.
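As a quick check of this formula, here is a minimal sketch in plain Python; the 1024 x 1024 size and the bit depth of 8 are assumed example values, not figures from the notes:

```python
# Total bits = number of rows * number of columns * bit depth (assumed example values)
rows, cols, bit_depth = 1024, 1024, 8

total_bits = rows * cols * bit_depth
total_bytes = total_bits // 8

print(f"Total bits  : {total_bits}")    # 8,388,608 bits
print(f"Total bytes : {total_bytes}")   # 1,048,576 bytes (1 MiB)
```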
Generally, image processing operations are divided into two categories:
1. Low Level Operations: - Low level image processing is associated with traditional
image processing.
2. High Level Operations: - High level image processing deals with image understanding.
❖ Types of an image: -
1. BINARY IMAGE– The binary image, as its name suggests, contains only two pixel values, i.e. 0 &
1, where 0 refers to black and 1 refers to white. This image is also known as a monochrome image.
2. BLACK AND WHITE IMAGE– The image which consist of only black and white color is called
BLACK AND WHITE IMAGE.
3. 8 bit COLOR FORMAT– It is the most famous image format. It has 256 different shades of colors
in it and commonly known as Grayscale Image. In this format, 0 stands for Black, and 255 stands for
white, and 127 stands for gray.
4. 16 bit COLOR FORMAT– It is a color image format. It has 65,536 different colors in it. It is also
known as High Color Format. In this format the distribution of color is not the same as in a grayscale
image.
A 16 bit format is actually divided into three further channels, which are Red, Green and Blue: that is the
famous RGB format.
❖ DIGITAL IMAGE REPRESENTATION
➢ Image as a Matrix
As we know, images are represented in rows and columns; a digital image can therefore be written in matrix form as:

f(x,y) = [ f(0,0)     f(0,1)     ...  f(0,N-1)
           f(1,0)     f(1,1)     ...  f(1,N-1)
           ...
           f(M-1,0)   f(M-1,1)   ...  f(M-1,N-1) ]

The right side of this equation is a digital image by definition. Every element of this matrix is called an
image element, picture element, or pixel.
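A small illustration of this matrix view, assuming NumPy is available; the 4x4 values are made up for the example:

```python
import numpy as np

# A tiny 4x4 "digital image": rows are x (0..M-1), columns are y (0..N-1)
f = np.array([
    [  0,  50, 100, 150],
    [ 25,  75, 125, 175],
    [ 50, 100, 150, 200],
    [ 75, 125, 175, 255],
], dtype=np.uint8)

M, N = f.shape        # number of rows and columns
print(f[2, 3])        # intensity of the pixel at row 2, column 3 -> 200
print(M * N)          # total number of pixels -> 16
```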
Image sampling and quantization
❖ Introduction: -
Images captured by a sensor are continuous (analog) signals, and we cannot process or store analog images on our computers.
Digital images are more useful than analog images:
we can store them on computers, apply digital image processing, make hundreds and thousands of copies,
and share them over the internet.
Digitizing the coordinate values of the continuous image is called sampling, and digitizing its amplitude values is called
quantization. The sampled signal is then quantized to get the value of each pixel.
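A minimal sketch of both operations with NumPy: sampling is imitated by keeping every k-th row and column, and quantization by reducing the number of grey levels. The synthetic gradient image and the parameters are assumptions for illustration:

```python
import numpy as np

# Synthetic smooth 256x256 gradient standing in for the "analog-like" image
x = np.linspace(0, 255, 256)
img = np.tile(x, (256, 1)).astype(np.uint8)

# Sampling: keep every 4th sample in each direction (coarser spatial grid)
sampled = img[::4, ::4]                   # 64x64 image

# Quantization: map 256 grey levels down to 8 levels
levels = 8
step = 256 // levels
quantized = (sampled // step) * step      # each pixel snapped to its level

print(sampled.shape, len(np.unique(quantized)))   # (64, 64) 8
```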
❖ Difference between Image Sampling and Quantization:
Sampling:
• It determines the spatial resolution of the digitized image.
• A single amplitude value is selected from the different values of the time interval to represent it.
Quantization:
• It determines the number of grey levels in the digitized image.
• Values representing the time intervals are rounded off to create a defined set of possible amplitude values.
Image Processing Steps
1. Image Acquisition:
Image acquisition is the first step in image processing. This step is also known as pre-processing in image
processing. It involves retrieving the image from a source, usually a hardware-based source.
2. Image Enhancement:
Image enhancement is the process of bringing out and highlighting certain features of interest in an image that
has been obscured. This can involve changing the brightness, contrast, etc.
3. Image Restoration:
Image restoration is the process of improving the appearance of an image. However, unlike image enhancement,
image restoration is done using certain mathematical or probabilistic models.
4. Color Image Processing:
Color image processing includes a number of color modeling techniques in a digital domain. This step has
gained prominence due to the significant use of digital images over the internet.
6. Compression:
Compression is a process used to reduce the storage required to save an image or the bandwidth required to
transmit it. This is done particularly when the image is for use on the Internet.
7. Morphological Processing:
Morphological processing is a set of operations that extract image components which are useful in the representation and description of shape.
8. Segmentation:
Segmentation is one of the most difficult steps of image processing. It involves partitioning an image into its
constituent parts or objects.
10. Recognition:
Recognition assigns a label to an object based on its description.
Image Acquisition
Image acquisition can be defined as the act of procuring an image from sources.
This can be done via hardware systems such as cameras, encoders, sensors, etc.
In the image acquisition step, the incoming light wave from an object is converted into an electrical signal by a
combination of photo-sensitive sensors.
These small subsystems fulfil the role of providing your machine vision algorithms with an accurate description of
the object.
The supreme goal of an image acquisition system is to maximize the contrast for the features of interest.
1. Trigger:
A completely free-running camera reads the input from the sensor permanently. Upon an “image query,” the
current image is captured completely.
After this, new image acquisition is started and then this completely captured image is transferred to the PC.
Sensors, PLC, and push buttons for manual operation can perform these image queries.
Triggers also depend on the type of camera you have installed in the system.
2. Camera:
In a machine vision system, the cameras are responsible for taking the light information from a scene and
converting it into digital information i.e. pixels using CMOS or CCD sensors.
Many key specifications of the system correspond to the camera’s image sensor.
These key aspects include resolution, the total number of rows, and columns of pixels the sensor
accommodates.
The higher the resolution, the more data the system collects, and the more precisely it can judge discrepancies
in the environment.
However, more data demands more processing, which can significantly limit the performance of a system.
Based on the acquisition type, cameras could be classified into two major categories:
Line Scan cameras
Area scan cameras
While cameras and sensors are crucial, they alone are not sufficient to capture an image.
3. Optics:
The lens should provide appropriate working distance, image resolution, and magnification for a vision
system.
To calibrate magnification precisely, it is necessary to know the camera’s image sensor size and the field of
view that is desirable. Some of the most used lenses include:
Standard Resolution Lenses:
These lenses are optimized for focusing to infinity with low distortion and vignetting.
Macro Lenses:
Specified in terms of their magnification relative to the camera sensor, they are optimized for 'close-up'
focusing with negligible distortion.
High-Resolution Lenses:
These lenses offer better performance than standard resolution lenses and are suitable for precise
measurement applications.
Telecentric Lenses:
These are specialized lenses that produce no distortion and result in images with constant magnification
regardless of the object’s distance.
4. Illumination:
The lighting should provide uniform illumination throughout all the visible object surfaces.
The illumination system should be set up in a way that avoids glare and shadows.
Spectral uniformity and stability are key.
Ambient light and daytime need to be considered as well.
Color Image Representation
1. Binary Images:
It is the simplest type of image. It takes only two values i.e., Black, and White or 0 and 1. The binary image
consists of a 1-bit image and it takes only 1 binary digit to represent a pixel.
Binary images are mostly used for general shape or outline.
For Example: Optical Character Recognition (OCR).
Binary images are generated using a threshold operation:
pixels above the threshold value are turned white ('1') and pixels below the threshold value
are turned black ('0').
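A small NumPy sketch of the threshold operation just described; the threshold value of 128 and the random test image are assumptions:

```python
import numpy as np

gray = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)   # assumed test image
T = 128                                                         # assumed threshold

binary = (gray > T).astype(np.uint8)   # 1 (white) above T, 0 (black) otherwise
print(gray)
print(binary)
```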
2. Gray-scale images:
Grayscale images are monochrome images, meaning they have only one channel.
Grayscale images do not contain any information about color.
Each pixel value represents one of the available grey levels.
A normal grayscale image contains 8 bits/pixel data, which has 256 different grey levels. In medical images and
astronomy, 12 or 16 bits/pixel images are used.
3. Color images:
Colour images are three-band monochrome images in which each band corresponds to a different primary color; the actual
information is stored as gray-level values in the digital image.
The color images contain gray level information in each spectral band.
The images are represented as red, green and blue (RGB images).
Each color image therefore has 24 bits/pixel: 8 bits for each of the three color bands (RGB).
❖ Image Formats:
1. 8-bit color format:
8-bit color is used for storing image information in a computer's memory or in a file of an image.
In this format, each pixel is represented by one 8-bit byte.
It has a 0-255 range of values, in which 0 is used for black, 255 for white and 127 for gray.
The 8-bit color format is also known as a grayscale image.
3. 24-bit color format:
The 24-bit color format is also known as the true color format.
The 24-bit color format is also distributed in Red, Green, and Blue.
As 24 bits can be divided equally into three groups of 8, it is distributed equally between the 3 different colors: 8 bits for R, 8
bits for G and 8 bits for B.
UNIT - 2
Intensity Transformation
Intensity transformation, as the name suggests, transforms the pixel intensity values using some transformation
function or mathematical expression.
Intensity transformation operation is usually represented in the form
s = T(r)
where, r and s denote the pixel value before and after processing and T is the transformation that maps pixel value r
into s.
Basic types of transformation functions used for image enhancement are
• Linear (Negative and Identity Transformation)
• Logarithmic (log and inverse-log transformation)
• Power law transformation
1. Image Negatives: -
Equation: s = L – 1 – r
Consider L = 256 and r be the intensity of the image (Range 0 to 255).
2. Log Transformation: -
Equation: s = c log(1 + r)
3. Power-Law (Gamma) Transformation: -
Equation: s = c r^γ
Consider c = 1, γ = 0.04 and r the intensity of the image (Range 0 to 255).
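A hedged NumPy sketch of the three transformations above (negative, log, power-law); the constants c = 1 and γ = 0.04 follow the values quoted in the notes, while the rescaling of the outputs back to the 0-255 range is my own normalization choice:

```python
import numpy as np

L = 256
r = np.arange(L, dtype=np.float64)            # intensities 0..255

# 1. Image negative: s = L - 1 - r
negative = (L - 1) - r

# 2. Log transformation: s = c * log(1 + r), rescaled to 0..255 for display
c = 1.0
log_t = c * np.log1p(r)
log_t = 255 * log_t / log_t.max()

# 3. Power-law (gamma) transformation: s = c * (r / (L - 1)) ** gamma, scaled to 0..255
gamma = 0.04
power_t = 255 * c * (r / (L - 1)) ** gamma

print(negative[[0, 127, 255]])                # [255. 128.   0.]
print(np.round(log_t[[0, 127, 255]]))         # dark values are stretched, bright compressed
print(np.round(power_t[[1, 127, 255]]))       # gamma < 1 strongly brightens the image
```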
The below figure summarizes these functions. Here, L denotes the number of intensity levels (for an 8-bit image, L = 256, so intensities lie in the range [0, 255]).
This is a spatial domain technique which means that all the operations are done directly on the pixels.
❖ Applications:
Histograms
In digital image processing, the histogram is used for graphical representation of a digital image.
The histogram is a plot of the number of pixels at each tonal (intensity) value.
Nowadays, image histogram is present in digital cameras. Photographers use them to see the distribution of tones
captured.
In the graph, the horizontal axis represents the tonal variations, whereas the vertical axis represents the
number of pixels having that tone.
Black and dark areas are represented on the left side of the horizontal axis, medium grey is represented in the
middle, light and white areas lie towards the right, and the vertical axis represents the size (pixel count) of each area.
❖ Applications of Histograms:
1. In digital image processing, histograms are used for simple calculations in software.
2. It is used to analyze an image. Properties of an image can be predicted by the detailed study of the
histogram.
3. The brightness of the image can be adjusted by having the details of its histogram.
4. The contrast of the image can be adjusted according to the need by having details of the x-axis of a
histogram.
5. It is used for image equalization. Gray level intensities are expanded along the x-axis to produce a high
contrast image.
7. If we have input and output histogram of an image, we can determine which type of transformation is
applied in the algorithm.
1. Histogram Sliding:
In histogram sliding, the complete histogram is shifted rightwards or leftwards. When a histogram
is shifted towards the right or left, clear changes are seen in the brightness of the image.
The brightness of the image is defined by the intensity of light which is emitted by a particular light source.
2. Histogram Stretching:
The contrast of an image is defined by the difference between the maximum and minimum values of pixel intensity.
If we want to increase the contrast of an image, its histogram is stretched so that it covers the full
dynamic range.
From histogram of an image, we can check that the image has low or high contrast.
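A short NumPy sketch of the two operations above (sliding shifts every pixel by a constant; stretching linearly rescales the pixel range); the shift value and the synthetic low-contrast image are assumptions:

```python
import numpy as np

img = np.random.randint(80, 150, size=(100, 100), dtype=np.uint8)   # assumed low-contrast image

# Histogram sliding: add a constant to every pixel (shifts the histogram to the right)
shift = 50
slid = np.clip(img.astype(np.int16) + shift, 0, 255).astype(np.uint8)

# Histogram stretching: linearly map [min, max] onto the full range [0, 255]
lo, hi = img.min(), img.max()
stretched = ((img - lo) * 255.0 / (hi - lo)).astype(np.uint8)

print(img.min(), img.max())               # original narrow range
print(slid.min(), slid.max())             # shifted range (brighter image)
print(stretched.min(), stretched.max())   # full 0..255 range (higher contrast)
```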
3. Histogram Equalization:
Histogram equalization is used for equalizing all the pixel values of an image. The transformation is done in such a
way that a uniform, flattened histogram is produced.
Histogram equalization increases the dynamic range of pixel values and makes an equal count of pixels at each
level which produces a flat histogram with high contrast image.
While stretching histogram, the shape of histogram remains the same whereas in Histogram equalization, the
shape of histogram changes and it generates only one image.
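A hedged sketch of histogram equalization via the cumulative distribution function (CDF) with NumPy; OpenCV users could call cv2.equalizeHist instead. The low-contrast test image is an assumption:

```python
import numpy as np

img = np.random.randint(50, 100, size=(64, 64), dtype=np.uint8)   # assumed low-contrast image

hist, _ = np.histogram(img, bins=256, range=(0, 256))
cdf = hist.cumsum()
cdf_min = cdf[cdf > 0][0]

# Classic equalization mapping: spread the CDF over the full 0..255 range
mapping = np.round((cdf - cdf_min) / (img.size - cdf_min) * 255).astype(np.uint8)
equalized = mapping[img]

print(img.min(), img.max(), equalized.min(), equalized.max())   # range expands to 0..255
```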
Spatial Filtering
❖ General Classification:
Smoothing Spatial Filter: Smoothing filters are used for blurring and noise reduction in an image.
Blurring is a pre-processing step for the removal of small details, and noise reduction is accomplished by
blurring.
➢ Types of Smoothing Spatial Filter:
1. Mean Filter:
This linear spatial filter simply takes the average of the pixels contained in the neighbourhood of the filter mask.
The idea is to replace the value of every pixel in an image by the average of the grey levels in the
neighbourhood defined by the filter mask.
Types of Mean filter:
(i) Averaging filter: It is used to reduce detail in an image. All coefficients are equal.
(ii) Weighted averaging filter: In this, pixels are multiplied by different coefficients;
the center pixel is multiplied by a higher value than in the averaging filter.
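A hedged sketch of the two mean-filter variants, assuming OpenCV (cv2) is available for the convolution; the kernels follow the standard box and centre-weighted forms, and the random test image is an assumption:

```python
import cv2
import numpy as np

img = np.random.randint(0, 256, size=(100, 100), dtype=np.uint8)   # assumed noisy test image

# 3x3 averaging (box) filter: all coefficients equal, summing to 1
box_kernel = np.ones((3, 3), np.float32) / 9.0

# 3x3 weighted averaging filter: centre pixel weighted more heavily
weighted_kernel = np.array([[1, 2, 1],
                            [2, 4, 2],
                            [1, 2, 1]], np.float32) / 16.0

box_smoothed = cv2.filter2D(img, -1, box_kernel)
weighted_smoothed = cv2.filter2D(img, -1, weighted_kernel)

print(box_smoothed.shape, weighted_smoothed.dtype)
```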
Fourier Transforms & It’s Properties
The Fourier transform is a tool used to decompose an image into its sine and cosine components.
2. Scaling:
Scaling is the operation used to change the range of the independent variable (time or space) of the
signal.
a. If we stretch a function by the factor in the time domain then squeeze the Fourier transform by
the same factor in the frequency domain.
b. If f(t) -> F(w) then f(at) -> (1/|a|)F(w/a)
3. Differentiation:
Differentiating a function with respect to time corresponds to multiplying its Fourier transform by jw.
a. If f(t) -> F(w) then f'(t) -> jwF(w)
4. Convolution:
Convolution in the time domain corresponds to multiplication in the frequency domain.
a. The Fourier transform of a convolution of two functions is the point-wise product of their
respective Fourier transforms.
b. If f(t) -> F(w) and g(t) -> G(w)
c. then f(t) * g(t) -> F(w)G(w), where * denotes convolution (not point-wise multiplication).
5. Frequency Shift:
Multiplying a signal by a complex exponential shifts its spectrum in frequency.
a. There is a duality between the time and frequency domains, so the frequency-shift property mirrors the time-shift
property.
b. If f(t) -> F(w) then f(t)exp[jw't] -> F(w-w')
6. Time Shift:
A shift of the time variable also affects the frequency function.
a. The time shifting property concludes that a linear displacement in time corresponds to a linear
phase factor in the frequency domain.
b. If f(t) -> F(w) then f(t-t') -> F(w)exp[-jwt']
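A quick numerical check of the time-shift property with NumPy's 1-D FFT; the discrete transform models a circular shift, and the test signal and shift amount are arbitrary assumptions:

```python
import numpy as np

N = 64
t = np.arange(N)
f = np.sin(2 * np.pi * 3 * t / N) + 0.5 * np.cos(2 * np.pi * 7 * t / N)   # assumed test signal
t0 = 5                                                                     # assumed shift

F = np.fft.fft(f)
F_shifted = np.fft.fft(np.roll(f, t0))        # transform of the (circularly) shifted signal

w = 2 * np.pi * np.arange(N) / N
predicted = F * np.exp(-1j * w * t0)          # F(w) * exp(-j w t0), as stated in the property

print(np.allclose(F_shifted, predicted))      # True
```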
Frequency domain
Since the Fourier series and the frequency domain are purely mathematical, we will try to minimize the maths part and
focus more on its use in DIP.
❖ Spatial domain:
In the spatial domain, the processing system operates directly on the pixel values of the input image, and the output of the
system is again an image.
❖ Frequency Domain:
We first transform the image to its frequency distribution.
Then our black-box system performs whatever processing it has to perform; the output of the black box in
this case is not an image, but a transform.
After performing the inverse transformation, it is converted back into an image, which is then viewed in the spatial domain.
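A minimal sketch of this pipeline with NumPy: forward FFT, an ideal low-pass filter standing in for the black box, and the inverse FFT back to the spatial domain; the cutoff radius and the random test image are assumptions:

```python
import numpy as np

img = np.random.rand(128, 128)                    # assumed test image in [0, 1]

# 1. Transform the image to the frequency domain (zero frequency moved to the centre)
F = np.fft.fftshift(np.fft.fft2(img))

# 2. "Black box": an ideal low-pass filter with an assumed cutoff radius of 20
rows, cols = img.shape
u = np.arange(rows) - rows // 2
v = np.arange(cols) - cols // 2
U, V = np.meshgrid(u, v, indexing="ij")
H = (np.sqrt(U**2 + V**2) <= 20).astype(float)

G = F * H                                         # filtering = multiplication in the frequency domain

# 3. Inverse transform back to the spatial domain
filtered = np.real(np.fft.ifft2(np.fft.ifftshift(G)))
print(filtered.shape, filtered.dtype)
```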
❖ Transformation:
A signal can be converted from time domain into frequency domain using mathematical operators called
transforms.
There are many kinds of transforms that do this. Some of them are given below.
1. Fourier Series.
2. Fourier transformation.
3. Laplace transform.
4. Z transform.
❖ Frequency components:
Any image in the spatial domain can be represented in the frequency domain.
We will divide the frequency content into two major components: high-frequency components, which correspond to edges and
rapid changes in intensity, and low-frequency components, which correspond to smooth, slowly varying regions.
Color Models
❖ Additive Color Model:
2. These models mix different amounts of RED, GREEN, and BLUE (primary colors) light to produce the rest of
the colors.
4. Example: RGB model is used for digital displays such as laptops, TVs, tablets, etc.
❖ Subtractive Color Model:
1. These types of models use printing inks to display colors.
2. Subtractive color starts with an object that reflects light and uses colorants to subtract portions of the white light.
3. If an object reflects all the white light back to the viewer, it appears white, and if it absorbs all the light then
it appears black.
4. Example: Graphic designers used the CMYK model for printing purpose.
1. RGB:
The model’s name comes from the initials of the three additive primary colors, red, green, and blue.
The RGB color model is an additive color model in which red, green, and blue light are added together in various
proportions to reproduce a broad array of colors.
Usually, in RGB a pixel is represented using 8 bits for each of red, green, and blue.
Equal values of these three primary colors represent shades of gray ranging from black to white.
In the RGB color cube, the origin is black and the corner diagonally opposite to the origin is white; three corners
correspond to the primaries (red, green, blue) and the remaining three to cyan, magenta, and yellow. Inside the cube, we get a variety of colors.
With the help of the primary colors, we can generate secondary colors (Yellow, Cyan, and Magenta) as follows.
Colour combination:
Green(255) + Red(255) = Yellow
Green(255) + Blue(255) = Cyan
Red(255) + Blue(255) = Magenta
Red(255) + Green(255) + Blue(255) = White
2. CMY:
The CMY color model is a subtractive color model in which cyan, magenta, and yellow (secondary colors)
pigments or dyes are mixed in different ways to produce a broad range of colors .
The secondary colors are also called the primary color pigments.
The CMY color model itself does not describe what is meant by cyan, magenta, and yellow colorimetrically, so
the mixing results are not specified as absolute but relative to the primary colors.
When the exact chromaticities of the cyan, magenta, and yellow primaries are defined, the color model then becomes an absolute color space.
The methodology of color subtraction is a valuable way of predicting the ultimate color appearance of an object
if the color of the incident light and the pigments are known.
The relationship between the RGB and CMY color models is given by:
RGB = 1 - CMY or CMY = 1 - RGB (with the components normalized to the range [0, 1])
When white light (R+G+B) is incident on this yellow surface, the blue light gets absorbed and we see the reflected red and green light, which together appear yellow.
Similarly, if we throw a magenta light, a combination of red and blue, on a yellow pigment, the result will be
a red light because the yellow pigment absorbs the blue light.
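A small NumPy sketch of the additive combinations and the RGB/CMY relation above, with the color components normalized to [0, 1]; the example RGB triple is an assumption:

```python
import numpy as np

red   = np.array([1.0, 0.0, 0.0])
green = np.array([0.0, 1.0, 0.0])
blue  = np.array([0.0, 0.0, 1.0])

# Additive mixing of the primaries gives the secondary colors
print(red + green)        # [1. 1. 0.] -> yellow
print(green + blue)       # [0. 1. 1.] -> cyan
print(red + blue)         # [1. 0. 1.] -> magenta

# CMY = 1 - RGB (and RGB = 1 - CMY)
rgb = np.array([0.2, 0.6, 0.9])   # assumed example color
cmy = 1.0 - rgb
print(cmy, 1.0 - cmy)             # converting back recovers the original RGB triple
```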
➢ CMYK:
According to the theory, 100% cyan, 100% magenta, and 100% yellow would result in pure black. In practice the
mixture produces a muddy dark color, so a separate black (K) ink is added; this gives the CMYK model.
3. HSI:
When humans view a color object, they describe it by its hue, saturation, and brightness.
1) Hue: It represents the dominant color as perceived by an observer.
2) Saturation: It measures the extent to which a pure color is diluted by white light.
3) Brightness: It depends upon color intensity, which is a key factor in describing the color sensation.
The intensity is easily measurable, and the results are also easily interpretable.
Pseudo Coloring
❖ Grayscale image:
It is a black and white image.
The pixel values are shades of gray, which are combinations of the white and black shades.
The image is represented in form of one 2-Dimensional matrix. Each value represents the intensity or
brightness of the corresponding pixel at that coordinate in the image.
Total 256 shades are possible for the grayscale images.
0 means black and 255 means white.
As we increase the value from 0 to 255, the white component increases and the pixel gets brighter.
❖ Steps:
1. Read the grayscale image.
2. If its bit-depth is 24, then make it 8.
3. Create an empty image of the same size.
4. Assign some random weight to RGB channels.
5. Copy weighted product of grayscale image to each channel of Red, Green, and Blue.
6. Display the images after creation.
❖ Functions Used:
1. imread( ) inbuilt function is used to read the image.
2. imtool( ) inbuilt function is used to display the image.
3. rgb2gray( ) inbuilt function is used to convert RGB to gray image.
4. uint8( ) inbuilt function is used to convert double into integer format.
5. pause( ) inbuilt function is used to stop execution for specified seconds.
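The functions listed above are MATLAB built-ins; the sketch below mirrors the same steps in Python with NumPy, where the per-channel weights are arbitrary assumptions:

```python
import numpy as np

gray = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)   # assumed grayscale image

# Assumed weights for the R, G and B channels (step 4 above)
w_r, w_g, w_b = 1.0, 0.6, 0.3

pseudo = np.zeros((*gray.shape, 3), dtype=np.uint8)               # empty color image (step 3)
pseudo[..., 0] = np.clip(gray * w_r, 0, 255).astype(np.uint8)     # red channel   (step 5)
pseudo[..., 1] = np.clip(gray * w_g, 0, 255).astype(np.uint8)     # green channel
pseudo[..., 2] = np.clip(gray * w_b, 0, 255).astype(np.uint8)     # blue channel

print(pseudo.shape)   # (64, 64, 3): each gray level is now mapped to an RGB color
```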
Color Transformations
Color transformations involve processing the components of a color image within a single color model, as well as the conversion of those components between color models.
Formulation:
We model color transformations using the expression
g(x,y) = T[f(x,y)]
where f(x,y) is a color input image,
g(x,y) is the transformed or processed color output image, and T is an operator on f over a spatial
neighbourhood of (x,y).
The pixel values here are triplets or quartets from the color space chosen to represent the images.
Color can be described by its red (R), green (G) and blue (B) coordinates (the well-known RGB system), or by some
linear transformation of them, such as XYZ, CMY, YUV, YIQ, among others.
If the RGB coordinates are in the interval from 0 to 1, each color can be represented by the point in the cube in the
RGB space.
We need a model in which the range of saturation values is identical for all hues.
From this point of view, the GLHS color model is probably the best from the current ones, particularly for w min =
wmid = wmax = 1/3.
❖ Histogram Processing:
The gray-level histogram processing transformations can be applied to color images in an automated way.
Since color images are composed of multiple components, however, consideration must be given to
adapting the gray-scale technique to more than one component and/or histogram.
It is generally unwise to histogram equalize the components of a color image independently.
This results in erroneous color.
A more logical approach is to spread the color intensities uniformly leaving the colors themselves
unchanged.
❖ Color Complement:
The hues directly opposite one another on the color circle of next figure are called complements.
Our interest in complements stems from the fact that they are analogous to the gray – scale negatives.
As in the gray-scale case, color complements are useful for enhancing detail that is embedded in dark
regions of a color image, particularly when the regions are dominant in size.
The computed complement is reminiscent of conventional photographic color film negatives.
Reds of the original image are replaced by cyans in the complement.
When the original image is black, the complement is white, and so on.
Each of the hues in the complement image can be predicted from the original image using the color circle.
And each of the RGB component transforms involved in the computation of the complement is a function
of only the corresponding input color component.
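A minimal NumPy sketch of the RGB complement, exploiting the fact noted above that each output channel depends only on the corresponding input channel; the pixel values are assumed:

```python
import numpy as np

rgb = np.array([[[255,   0,   0],      # red pixel
                 [  0,   0,   0],      # black pixel
                 [200, 150,  50]]],    # arbitrary color
               dtype=np.uint8)

complement = 255 - rgb                 # per-channel negative

print(complement[0, 0])   # [  0 255 255] -> cyan (complement of red)
print(complement[0, 1])   # [255 255 255] -> white (complement of black)
```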
❖ Color Slicing:
Highlighting a specific range of colors in an image is useful for separating objects from their surrounding.
The basic idea is either:
1. Display the colors of interest so that they stand out from the background, or
2. Use the region defined by the colors as a mask for further processing.
One of the simplest ways to "slice" a color image is to map the colors outside some range of interest to a
non-prominent neutral color.
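A short NumPy sketch of this slicing idea: pixels farther than an assumed radius from an assumed color of interest are mapped to a neutral gray:

```python
import numpy as np

img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)   # assumed color image

target = np.array([200, 30, 30], dtype=np.float64)   # assumed color of interest (reddish)
radius = 80.0                                        # assumed slicing radius
neutral = 128                                        # mid-gray used for everything else

dist = np.linalg.norm(img.astype(np.float64) - target, axis=-1)
sliced = img.copy()
sliced[dist > radius] = neutral                      # non-interesting colors become neutral gray

print(np.count_nonzero(dist <= radius), "pixels kept")
```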
1. The brightness should be a linear combination of all three RGB components. At least, it must be a
continuously growing function of all of them.
2. The hue differences between the basic colors (red, green and blue) should be 120° and similarly
between the complementary colors (yellow, magenta and cyan). The hue difference between a basic color
and an adjacent complementary one (e.g. red and yellow) should be 60°.
3. The saturation should be 1 for the colors on the surface of the RGB color cube, i.e., when one
of the RGB components is 0 or 1 (except at the black and white vertices), and it should be 0 when R = G = B.
Wavelet Transformation
A wavelet is a wave-like oscillation with an amplitude that begins at zero, increases, and then decreases back to zero.
Wavelets are functions that are concentrated in time and frequency around a certain point.
Fourier transformation, although it deals with frequencies, does not provide temporal details.
The wavelet transform finds its most appropriate use in non-stationary signals.
This transformation achieves good frequency resolution for low-frequency components and good temporal resolution for
high-frequency components.
❖ Wavelet Analysis:
Wavelet analysis is used to divide the information present in an image (signal) into two discrete components:
approximations and details.
The signal is passed through two filters, a high-pass and a low-pass filter, and the image is decomposed into
approximation and detail coefficients.
The approximation shows the overall trend of pixel values, and the details capture the horizontal, vertical and
diagonal components.
If these details are insignificant, they can be set to zero without significant impact on the image, thereby
achieving compression.
❖ Wavelet Based Denoising of Images:
We perform a 3-level discrete wavelet transform on a noisy image and apply thresholding to the high-frequency (detail)
coefficients.
There are two types of thresholding for denoising: hard and soft.
Hard thresholding sets to zero the coefficients whose absolute values are lower than the
threshold λ.
Soft thresholding first sets to zero the coefficients whose absolute values are lower than the threshold λ, and then
shrinks the remaining coefficients towards zero by λ.
Once we apply the threshold on all levels, we get the denoised matrices for all the detail components in every
level.
We use these matrices as coefficients for inverse discrete wavelet transformation to reconstruct the image.
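A hedged sketch of this denoising procedure, assuming the PyWavelets package (pywt) is available; the wavelet, the number of levels and the threshold λ are arbitrary choices:

```python
import numpy as np
import pywt

noisy = np.random.rand(128, 128)                 # assumed noisy image

# 3-level 2-D discrete wavelet transform
coeffs = pywt.wavedec2(noisy, wavelet="db2", level=3)

# Soft-threshold only the detail (high-frequency) coefficients at every level
lam = 0.1                                        # assumed threshold value
denoised_coeffs = [coeffs[0]] + [
    tuple(pywt.threshold(d, lam, mode="soft") for d in detail)
    for detail in coeffs[1:]
]

# Inverse transform reconstructs the denoised image
denoised = pywt.waverec2(denoised_coeffs, wavelet="db2")
print(denoised.shape)
```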
UNIT - 3
Image Degradation and Restoration process
❖ IMAGE RESTORATION: -
Restoration improves image in some predefined sense. It is an objective process. Restoration attempts to
reconstruct an image that has been degraded by using a priori knowledge of the degradation phenomenon.
These techniques are oriented toward modeling the degradation and then applying the inverse process in
order to recover the original image. Image restoration refers to a class of methods that aim to remove or
reduce the degradations that have occurred while the digital image was being obtained.
All natural images when displayed have gone through some sort of degradation:
a) During display mode
b) Acquisition mode, or
c) Processing mode
The degradations may be due to
a) Sensor noise
b) Blur due to camera misfocus
c) Relative object-camera motion
d) Random atmospheric turbulence
e) Others
❖ Degradation Model: -
The degradation process is modelled as a degradation function that operates on an input image, together with an additive
noise term. The input image is represented using the notation f(x,y), and the noise term can be represented as
η(x,y). These two terms, when combined, give the degraded result g(x,y).
If we are given g(x,y), some knowledge about the degradation function H, and some knowledge about
the additive noise term η(x,y), the objective of restoration is to obtain an estimate f'(x,y) of the original
image.
We want the estimate to be as close as possible to the original image. The more we know about H and η,
the closer f'(x,y) will be to f(x,y). If the degradation is a linear, position-invariant process, then the degraded image is
given in the spatial domain by
g(x,y) = f(x,y)*h(x,y) + η(x,y)
where h(x,y) is the spatial representation of the degradation function and the symbol * represents convolution.
In frequency domain we may write this equation as
G(u,v)=F(u,v)H(u,v)+N(u,v)
The terms in the capital letters are the Fourier Transform of the corresponding terms in the spatial domain.
The image restoration process can be achieved by inverting the image degradation process.
Although the concept is relatively simple, the actual implementation is difficult to achieve, as one requires
prior knowledge or identifications of the unknown degradation function and the unknown noise source. In
the following sections, common noise models and method of estimating the degradation function are
presented.
Noise Models
❖ Based on Distribution: -
Noise is a fluctuation in pixel values and is characterized by a random variable.
The probability distribution of a random variable is an equation that links the values of the statistical outcome with their
probability of occurrence.
Categorization of noise based on probability distribution is very popular.
❖ Gaussian Noise: -
Because of its mathematical simplicity, the Gaussian noise model is often used in practice, even in situations
where it is marginally applicable at best. Its pdf is p(z) = (1 / (σ√(2π))) e^(-(z - m)^2 / (2σ^2)), where m is the mean and σ^2 is the variance.
Gaussian noise arises in an image due to factors such as electronic circuit noise and sensor noise due to poor
illumination or high temperature.
Random noise that enters a system can be modelled as a Gaussian or normal distribution.
Gaussian noise affects both dark and light areas of image.
❖ Rayleigh Noise: -
This type of noise is mostly present in range images.
Range images are mostly used in remote sensing applications.
The Rayleigh pdf is p(z) = (2/b)(z - a) e^(-(z - a)^2 / b) for z ≥ a, and 0 otherwise; its mean is a + √(πb/4) and its variance is b(4 - π)/4.
❖ Exponential Noise: -
This type of noise occurs mostly due to the illumination problems.
It is present in laser imaging.
Here a > 0. The pdf is p(z) = a e^(-az) for z ≥ 0 (and 0 otherwise); its mean is 1/a and its variance is 1/a^2.
❖ Uniform Noise: -
It is another common noise model, occurring in images where the different noise values are equally probable.
It typically arises from quantization (quantization noise).
Uniform noise is rarely encountered in practice, but it is often used in numerical simulations to analyse systems.
❖ Impulse Noise: -
It is also known as Shot Noise, Salt and Pepper Noise and Binary Noise.
It occurs mostly because of sensor and memory problem because of which pixels are assigned incorrect
maximum values.
If b > a, intensity b appears as a light dot in the image, while level a appears as a dark dot.
The presence of such white and black dots in the image resembles salt-and-pepper granules,
hence it is also called salt-and-pepper noise. When either Pa or Pb is zero, it is called unipolar noise. The origin of
impulse noise is quick transients such as faulty switching in cameras or other such cases.
❖ Poisson Noise: -
This type of noise manifests as a random structure or texture in images.
It is very common in X-ray images.
❖ Gamma Noise: -
This type of noise also occurs mostly due to illumination problems. Its pdf is
P(z) = (a^b z^(b-1) / (b-1)!) e^(-az) for z ≥ 0, and 0 otherwise.
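A short NumPy sketch generating the two most common models above, Gaussian and salt-and-pepper (impulse) noise; the mean, standard deviation and impulse probabilities are assumptions:

```python
import numpy as np

img = np.full((100, 100), 128, dtype=np.uint8)      # assumed flat test image

# Gaussian noise: mean m = 0, standard deviation sigma = 20 (assumed values)
gauss = np.random.normal(0, 20, img.shape)
noisy_gauss = np.clip(img + gauss, 0, 255).astype(np.uint8)

# Salt-and-pepper noise: Pa = Pb = 0.02 (assumed probabilities)
noisy_sp = img.copy()
r = np.random.rand(*img.shape)
noisy_sp[r < 0.02] = 0        # pepper (dark dots, level a)
noisy_sp[r > 0.98] = 255      # salt (light dots, level b)

print(noisy_gauss.std(), np.count_nonzero(noisy_sp == 255))
```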
❖ Based on Correlation: -
Statistical dependence among pixels is known as correlation.
If a pixel is independent of its neighbouring pixels, the noise is said to be uncorrelated; otherwise it is said to be
correlated.
Uncorrelated Noise is known as White Noise.
Mathematically for white noise, the noise power spectrum or power spectral density remains constant
with frequency.
Characterization of colored noise is quite difficult because its origin is mostly unknown.
One popular colored noise is Pink Noise.
Its power spectrum is not constant, rather it is proportional to reciprocal of frequency.
This is also known as 1/f or Flicker Noise.
❖ Based on Nature: -
Additive Noise:
In this case an image can be perceived as the image plus noise.
This is a linear problem.
G(x,y) = f(x,y) + n(x,y)
where f is the input image and n is the additive noise.
Multiplicative Noise:
It can be modelled as a multiplicative process.
Speckle noise is the most commonly encountered multiplicative noise in image processing.
It is mostly present in medical images.
It can be modelled as the pixel value multiplied by a random value:
g(x,y) = f(x,y) + f(x,y) · n(x,y)
where n(x,y) is the multiplicative (speckle) noise term.
❖ Based on Sources: -
Noise based on source, commonly encountered in image processing are:
Quantization Noise:
It occurs due to a difference between the actual and allocated values.
It is inherent in the quantization process.
Photon Noise:
It occurs due to the statistical nature of electromagnetic waves.
The generation of photons is not constant because of statistical variation.
This causes variation in photon count which is known as photon noise.
It is present in many medical images.
Noise Filters
Noise is always present in digital images during the image acquisition, coding, transmission, and processing steps. It is
very difficult to remove noise from digital images without prior knowledge of filtering techniques. This
article gives a brief overview of various noise filtering techniques. These filters can be selected by analysing the noise
behaviour. In this way, a complete and quantitative analysis of noise and the filters best suited to each type is presented
here.
Filtering image data is a standard process used in almost every image processing system. Filters are used for this
purpose: they remove noise from images while preserving the details. The choice of filter depends on the
filter behaviour and the type of data.
We all know that noise is an abrupt change in pixel values in an image. So when it comes to filtering images, the first
intuition is to replace the value of each pixel with the average of the pixels around it. This process smooths the
image. For this we make two assumptions.
❖ Assumption:
1. The true value of a pixel is similar to the true values of the pixels nearby.
2. The noise is added to each pixel independently.
Let’s first consider 1-dimensional function before going into 2-dimensional image.
In the image of the original function (fig-1), if we consider each circle as a pixel value, then the smoothed
function (fig-2) is the result of averaging the neighbouring pixel values around each pixel.
1. Filtering with weighted moving average uniform weight: -
Instead of just averaging the local pixels, which results in some loss of data, we consider a set
of local pixels and assign them uniform weights. Here we assume that noise is added to each pixel
independently. According to this noise level, we assign weights to the different pixels.
The process used in filtering with uniform weights is also called correlation or correlation filtering.
Fig. Correlation function for uniform weights. src: Udacity
In correlation filtering with non-uniform weights, a function is used to supply the non-uniform weights; this function is
also called a mask or kernel (a function of the pixel values of the small sliding window). The process used in it is
called cross-correlation.
Though there are many types of filters, for this article we will consider 4 filters which are mostly used in
image processing.
1. Gaussian Filter:
In image processing, a Gaussian blur (also known as Gaussian smoothing) is the result of blurring
an image by a Gaussian function (named after mathematician and scientist Carl Friedrich Gauss). It is a
widely used effect in graphics software, typically to reduce image noise and reduce detail.
2. Mean Filter:
The mean filter is a simple sliding window that replaces the center value with the average of all pixel values in the
window. The window or kernel is usually square, but it can be of any shape.
3. Median Filter:
The median filter is a simple sliding window that replaces the center value with the median of all pixel values in the
window. The window or kernel is usually square, but it can be of any shape.
4. Bilateral Filter:
The bilateral filter uses a Gaussian filter but has one more multiplicative component, which is a function of the pixel
intensity difference. It ensures that only pixels with intensity similar to that of the central pixel are included in
computing the blurred intensity value. This filter preserves edges.
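A hedged sketch applying the four filters above, assuming OpenCV (cv2) is available; the kernel sizes and the bilateral parameters are arbitrary choices, and the random test image is an assumption:

```python
import cv2
import numpy as np

img = np.random.randint(0, 256, size=(128, 128), dtype=np.uint8)   # assumed noisy image

gaussian = cv2.GaussianBlur(img, (5, 5), 1.0)   # Gaussian filter (5x5 kernel, sigma = 1)
mean     = cv2.blur(img, (5, 5))                # mean (box) filter
median   = cv2.medianBlur(img, 5)               # median filter (5x5 window)
bilat    = cv2.bilateralFilter(img, 9, 75, 75)  # bilateral filter: weights combine spatial
                                                # closeness and intensity similarity

print(gaussian.shape, mean.dtype, median.max(), bilat.shape)
```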
Inverse Filtering
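Following the degradation model G(u,v) = F(u,v)H(u,v) + N(u,v) introduced above, inverse filtering estimates the original spectrum as F̂(u,v) = G(u,v) / H(u,v); in practice the division has to be stabilized where H(u,v) is close to zero, otherwise noise is strongly amplified. A minimal NumPy sketch under these assumptions (the mild Gaussian blur used as the degradation and the stabilisation constant are illustrative choices):

```python
import numpy as np

f = np.random.rand(128, 128)                       # assumed original image

# Assumed degradation: a mild Gaussian low-pass H defined in the frequency domain
u = np.fft.fftfreq(128)
U, V = np.meshgrid(u, u, indexing="ij")
H = np.exp(-(U**2 + V**2) / (2 * 0.5**2))          # never falls close to zero here

G = np.fft.fft2(f) * H                             # degraded spectrum (noise-free in this sketch)

# Inverse filter: divide by H, guarding against division by (near) zero
eps = 1e-3                                         # assumed stabilisation constant
H_safe = np.where(np.abs(H) < eps, eps, H)
f_hat = np.real(np.fft.ifft2(G / H_safe))

# With no noise and H bounded away from zero, the original is recovered almost exactly
print(np.max(np.abs(f - f_hat)))
```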
Homomorphic Filtering
Homomorphic filters are widely used in image processing for compensating the effect of non-uniform illumination in
an image.
It is a generalized technique for signal and image processing,
involving a nonlinear mapping to a different domain in which linear filtering techniques are applied,
followed by a mapping back to the original domain.
It simultaneously normalizes the brightness across an image and increases contrast.
Pixel intensities in an image represent the light reflected from the corresponding points in the objects. According to this
image model, an image f(x,y) may be characterized by two components:
(1) the amount of source light incident on the scene being viewed, and
(2) the amount of light reflected by the objects in the scene.
These portions of light are called the illumination and reflectance components, and are denoted i(x,y) and r(x,y)
respectively.
The functions i(x,y) and r(x,y) combine multiplicatively to give the image function f(x,y):
f(x,y) = i(x,y) · r(x,y),    (1)    where 0 < i(x,y) < ∞ and 0 < r(x,y) < 1.
Homomorphic filters are used in such situations where the image is subjected to the multiplicative interference or
noise as depicted.
We cannot easily use the above product to operate separately on the frequency components of illumination and
reflectance, because the Fourier transform of f(x,y) is not separable; that is, F[f(x,y)] ≠ F[i(x,y)] · F[r(x,y)].
We can separate the two components by taking the logarithm of the two sides: ln f(x,y) = ln i(x,y) + ln r(x,y).
Taking Fourier transforms on both sides we get F[ln f(x,y)] = F[ln i(x,y)] + F[ln r(x,y)], that is, F(u,v) = I(u,v) + R(u,v),
where F, I and R are the Fourier transforms of ln f(x,y), ln i(x,y), and ln r(x,y) respectively. The function F
represents the Fourier transform of the sum of two images: a low-frequency illumination image and a high-frequency
reflectance image.
If we now apply a filter with a transfer function that suppresses low- frequency components and enhances high-
frequency components, then we can suppress the illumination component and enhance the reflectance component.
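A hedged NumPy sketch of the whole chain (logarithm, FFT, high-frequency-emphasis filter, inverse FFT, exponential); the filter gains and the cutoff D0 are arbitrary assumptions:

```python
import numpy as np

img = np.random.rand(128, 128) + 0.1        # assumed image with strictly positive values

z = np.log(img)                             # ln f = ln i + ln r
Z = np.fft.fftshift(np.fft.fft2(z))

# High-frequency-emphasis filter: gamma_L < 1 suppresses illumination (low frequencies),
# gamma_H > 1 boosts reflectance (high frequencies). All constants are assumed.
gamma_L, gamma_H, c, D0 = 0.5, 2.0, 1.0, 30.0
rows, cols = img.shape
u = np.arange(rows) - rows // 2
v = np.arange(cols) - cols // 2
U, V = np.meshgrid(u, v, indexing="ij")
D2 = U**2 + V**2
H = (gamma_H - gamma_L) * (1 - np.exp(-c * D2 / (D0**2))) + gamma_L

S = H * Z
s = np.real(np.fft.ifft2(np.fft.ifftshift(S)))
result = np.exp(s)                          # back from the log domain

print(result.shape, result.min() > 0)
```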
❖ Applications:
1. It is used for removing multiplicative noise that has certain characteristics.
2. It is also used in correcting non uniform illumination in images.
3. It can be used for improving the appearance of a grey scale image.
UNIT - 4
Coding Redundancy
❖ Types of redundancy:
1. Coding Redundancy:
• Coding redundancy is caused by a poor selection of coding technique.
• Coding techniques assign a unique code to every symbol of the message.
• A wrong choice of coding technique creates unnecessary additional bits. These extra bits are called
redundancy.
Coding Redundancy = Average bits used to code – Entropy
2. Inter-pixel Redundancy:
▪ This type of redundancy is related with the inter-pixel correlations within an image.
▪ Much of the visual contribution of a single pixel is redundant and can be guessed from the values of
its neighbors.
▪ Example:
• The visual nature of the image background is given by many pixels that are not actually
necessary.
• This is known as Spatial Redundancy or Geometrical Redundancy.
• Inter-pixel dependency is solved by algorithms like:
o Predictive Coding, Bit Plane Algorithm, Run Length Coding and Dictionary based
Algorithms.
• Spatial Redundancy may be present in:
o Single frame or among multiple frames.
3. Psycho-visual Redundancy:
• The eye and the brain do not respond to all visual information with same sensitivity.
• Some information is neglected during the processing by the brain. Elimination of this information
does not affect the interpretation of the image by the brain.
• Edges and textual regions are interpreted as important features and the brain groups and correlates
such grouping to produce its perception of an object.
• Psycho visual redundancy is distinctly vision related, and its elimination does result in loss of
information.
• Quantization is an example.
• When 256 levels are reduced by grouping to 16 levels, objects are still recognizable. The
compression is 2:1, but an objectionable graininess and contouring effect results.
4. Chromatic Redundancy:
• It refers to the presence of unnecessary colors in an image.
• The color channels of color images are highly correlated, and the human visual system cannot perceive
millions of colors.
• Therefore the colors that are not perceived by human visual system can be removed without
affecting the image quality.
Interpixel Redundancy
Interpixel redundancy arises from the correlation between neighbouring pixels, which in turn comes from the structural or
geometric relationships between the objects in the image.
Two images can have virtually identical histograms even though the objects in them differ in structure and geometry.
Because the gray levels in such images are not equally probable, variable-length coding can be applied to reduce the coding
redundancy that would result from a straight (natural) binary encoding of their pixels. The coding process, however, cannot
alter the level of correlation between the pixels within the images: the codes only represent the gray levels of each image,
so the interpixel redundancy remains until the pixel values are mapped into another form (for example differences or run lengths).
Unlike coding redundancy, interpixel redundancy is therefore not a property of the codes themselves; it reflects the
structural or geometric relationship between the objects in the image.
Psychovisual Redundancy
The brightness of a region of an image as perceived by the human eye depends on factors other than the light reflected
by the region. For example, intensity variations can be perceived in an area of constant intensity; such
phenomena show that the eye does not respond with equal sensitivity to all visual information.
Certain information simply has less relative importance than other information in normal visual processing. This
information is said to be psychovisually redundant. This type of redundancy can be eliminated without significantly
impairing the quality of image perception. It exists because human perception of the
information in an image normally does not involve a quantitative analysis of every pixel value in the image. In
general, an observer searches for distinguishing features such as edges or texture regions and mentally
combines them into recognizable groupings. The brain then correlates these groupings with prior
knowledge in order to complete the image interpretation process. Psychovisual redundancy is associated with real or
quantifiable visual information; the elimination of such information is possible only because it is not
essential for normal visual processing. Since the elimination of psychovisually redundant data results in a loss
of quantitative information, it is commonly referred to as quantization. This is an irreversible operation, hence it is
called lossy data compression.
Huffman Coding
❖ Huffman Coding: -
Following are the two steps in Huffman Coding
• Building Huffman Tree
• Assigning codes to Leaf Nodes
➢ Building Huffman Tree: -
First compute the probabilities of all data chunks, build a node for each chunk, and push all the nodes into a list.
Then pop the two least probable nodes and create a parent node out of them, with probability equal to the sum of
their probabilities, and add this parent node back to the list. Repeat the process with the current set of
nodes until you create a parent with probability = 1 (the root).
➢ Assigning codes to Leaf Nodes: -
Following the tree built by the above procedure, at each node assign its left and right child nodes the codewords
child_word = current codeword + '0' and current codeword + '1' respectively. Apply this procedure starting from the root
node; the codeword accumulated at each leaf is the Huffman code of that symbol.
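A compact sketch of both steps using Python's heapq; the example alphabet and probabilities are assumptions:

```python
import heapq

# Assumed example alphabet with probabilities summing to 1
probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}

# Step 1: build the Huffman tree by repeatedly merging the two least probable nodes.
# Each heap entry is (probability, tie-breaker, tree), where a tree is a symbol or a pair.
heap = [(p, i, sym) for i, (sym, p) in enumerate(probs.items())]
heapq.heapify(heap)
counter = len(heap)
while len(heap) > 1:
    p1, _, left = heapq.heappop(heap)
    p2, _, right = heapq.heappop(heap)
    heapq.heappush(heap, (p1 + p2, counter, (left, right)))
    counter += 1
tree = heap[0][2]

# Step 2: walk the tree, appending '0' for left children and '1' for right children.
def assign_codes(node, prefix="", codes=None):
    if codes is None:
        codes = {}
    if isinstance(node, tuple):              # internal node
        assign_codes(node[0], prefix + "0", codes)
        assign_codes(node[1], prefix + "1", codes)
    else:                                    # leaf: a symbol
        codes[node] = prefix or "0"
    return codes

print(assign_codes(tree))   # {'a': '0', 'b': '10', 'd': '110', 'c': '111'} for this distribution
```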
Arithmetic Coding
Unlike fixed-length encodings such as ASCII, which represent a string of characters like the words "hello there" using a
fixed number of bits per character, arithmetic coding represents the whole message with a single fractional number.
When a string is converted to arithmetic encoding, frequently used characters are stored with fewer bits and not-
so-frequently occurring characters are stored with more bits, resulting in relatively fewer bits used in total.
In the simplest case, every symbol is equally probable. For instance, consider a set of three symbols, A, B, and C, each
equally likely to occur. Simple block encoding would require 2 bits per symbol, which is wasteful, as one of the bit
patterns is never used: A = 00, B = 01, C = 10, and 11 is unused. A more efficient solution is to represent a
sequence of these three symbols as a rational number in base 3, where each digit represents a symbol. For
instance, the sequence "ABBCAB" could become 0.011201 in base 3, interpreted as a value in the interval
[0, 1). The next step is to encode this ternary number using a fixed-point binary number of sufficient
precision to recover it, such as 0.0010110010 in binary, which is only 10 bits. This is feasible for long
sequences because there are efficient, in-place algorithms for converting the base of arbitrarily precise
numbers.
In general, arithmetic coders can produce near-optimal output for any given set of
symbols and probabilities (the optimal value is -log2 P bits for each symbol of probability P). Compression
algorithms that use arithmetic coding work by determining a model of the data, essentially a prediction of
what patterns will be found in the symbols of the message. The more accurate this prediction is, the closer to
optimal the output will be.
Each step of the encoding procedure, except for the very last, is the same; the encoder has basically three pieces of
information to consider: the next symbol that needs to be encoded; the current interval (at the very beginning of the
encoding procedure the interval is set to [0, 1), but it narrows at every step); and the probabilities the model assigns to
each of the symbols that are possible at this stage (as mentioned earlier, higher-order or adaptive models mean that these
probabilities are not necessarily the same at each step). The encoder divides the current interval into sub-intervals, each
representing a fraction of the current interval proportional to the probability of that symbol in the current context.
Whichever sub-interval corresponds to the symbol actually being encoded becomes the interval used in the next step. When
all symbols have been encoded, the resulting interval unambiguously identifies the sequence of symbols that produced it:
anyone who has the same final interval and the same model can reconstruct the symbol sequence that must have entered the
encoder to result in that final interval. It is not necessary to transmit the final interval itself, however; it is only
necessary to transmit one fraction that lies inside it. In particular, it is only necessary to transmit enough digits (in
whatever base) of the fraction so that all fractions beginning with those digits fall into the final interval; this will
also ensure that the resulting code is a prefix
code.
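A minimal float-based sketch of the interval-narrowing procedure described above (practical coders use integer arithmetic with renormalization to avoid precision loss); the model and the message are assumptions:

```python
# Assumed model: cumulative intervals for each symbol within [0, 1)
model = {"A": (0.0, 0.5), "B": (0.5, 0.8), "C": (0.8, 1.0)}   # P(A)=0.5, P(B)=0.3, P(C)=0.2

def encode(message):
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        s_low, s_high = model[sym]
        # Narrow the current interval to the sub-interval of this symbol
        low, high = low + span * s_low, low + span * s_high
    return low, high          # any number inside [low, high) identifies the message

def decode(value, length):
    out, low, high = [], 0.0, 1.0
    for _ in range(length):
        span = high - low
        scaled = (value - low) / span
        for sym, (s_low, s_high) in model.items():
            if s_low <= scaled < s_high:
                out.append(sym)
                low, high = low + span * s_low, low + span * s_high
                break
    return "".join(out)

low, high = encode("ABBCA")
code = (low + high) / 2                 # transmit any fraction inside the final interval
print(low, high, decode(code, 5))       # decoding recovers "ABBCA"
```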
Compression
The objective of a compression algorithm is to reduce the source data to a compressed form and to decompress it to recover
the original data.
Any Compression algorithm has two components:
❖ Modeler: -
It is used to condition the image data for compression using the knowledge of data.
It is present in both sender and receiver.
It can be either static or dynamic.
❖ Coder: -
Sender side coder is known as encoder.
This codes the symbols independently or using the model.
Receiver side coder is known as decoder.
Decoder, decodes the message from the compressed data.
❖ Compression Algorithms: -
➢ Lossless Compression –
Reconstructed data is identical to the original data (Entropy Coding).
Example techniques:
Huffman Coding.
Arithmetic Coding.
Shannon - Fano Coding.
➢ Lossy Compression –
Reconstructed data approximates the original data (Source Coding).
Example techniques:
Linear Prediction.
Transform Coding.
❖ Another way of classifying image compression algorithm is:
➢ Entropy Coding:
The average information content of an image is known as its entropy.
Coding is based on the entropy of the source and on the probability of occurrence of the symbols.
An event that is less likely to occur is said to contain more information than an event that is more likely to
occur.
Set of symbols (alphabet): S = {s1, s2, ..., sn}, where
n is the number of symbols in the alphabet.
Probability distribution of the symbols: P = {p1, p2, ..., pn}.
According to Shannon, the entropy H of an information source S is defined as:
H(S) = - Σ (i = 1 to n) pi log2(pi)
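A quick NumPy sketch of this formula for an assumed probability distribution:

```python
import numpy as np

p = np.array([0.5, 0.25, 0.125, 0.125])   # assumed symbol probabilities (sum to 1)

H = -np.sum(p * np.log2(p))               # Shannon entropy in bits/symbol
print(H)                                  # 1.75 bits/symbol for this distribution
```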
➢ Predictive Coding:
The idea is to remove the mutual dependency between successive pixels and then perform the encoding.
Normally the sample values are large, but the differences between successive samples are small.
➢ Transform Coding:
Objective is to exploit the information packing capability of the transform.
Energy is packed into fewer components and only these components are encoded and transmitted.
Idea is to remove the redundant high frequency components to create compression.
Removal of these frequency components leads to loss of information.
This loss of information, if tolerable, can be used for imaging and video applications.
➢ Layered Coding:
It is very useful in the case of layered images.
Data structures like pyramids are useful for representing an image in multiresolution form.
Sometimes these images are segmented into foreground and background, and encoding is done based on the application
requirement.
Layering can also take the form of selected frequency coefficients or selected bits of the pixels in an image.
JPEG compression
JPEG is a lossy image compression method. JPEG compression uses the DCT (Discrete Cosine Transform) method
for the coding transformation. It allows a trade-off between storage size and image quality: the degree of compression can be adjusted.
Step 1: The input image is divided into small blocks of 8x8 pixels, i.e., 64 pixels per block. Each unit of
the image is called a pixel.
Step 2: JPEG uses the [Y, Cb, Cr] model instead of the [R, G, B] model, so in the 2nd step RGB is converted into
YCbCr.
Step 3: After the color conversion, the data is forwarded to the DCT. The DCT uses a cosine function and does not use
complex numbers. It converts the information in a block of pixels from the spatial domain to the
frequency domain.
DCT Formula
Step 4: The human eye is not very sensitive to high-frequency content, so these components can be discarded with little
visible loss. After the DCT, only the lower-frequency coefficients need to be preserved, and only up to a certain
precision. Quantization is used to reduce the number of bits per sample.
➢ There are two types of Quantization:
1. Uniform Quantization
2. Non-Uniform Quantization
Step 5: The zigzag scan is used to map the 8x8 matrix to a 1x64 vector. Zigzag scanning groups the low-
frequency coefficients at the top of the vector and the high-frequency coefficients at the bottom. It also gathers the
large runs of zeros in the quantized matrix together so that they can be removed efficiently.
Step 6: The next step is vectoring: differential pulse code modulation (DPCM) is applied to the DC component. DC
components are large and variable, but they are usually close to the DC value of the previous block, so DPCM encodes the
difference between the DC coefficient of the current block and that of the previous block.
Step 7: In this step, Run Length Encoding (RLE) is applied to the AC components. This is done because the AC
components contain many zeros. They are encoded as (skip, value) pairs, in which skip is the number of zeros preceding a
non-zero coefficient and value is that non-zero coefficient.
Step 8: In this step, the DPCM-coded DC components and the RLE-coded AC components are Huffman coded.
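A hedged sketch of the heart of steps 1, 3 and 4 on a single 8x8 block, assuming OpenCV (cv2) for the DCT; the flat quantization matrix is an assumption, not the standard JPEG luminance table:

```python
import cv2
import numpy as np

block = np.random.randint(0, 256, size=(8, 8)).astype(np.float32)   # assumed 8x8 pixel block

# Step 3: 2-D DCT of the (level-shifted) block
dct = cv2.dct(block - 128.0)

# Step 4: quantization - divide by a quantization matrix and round (assumed flat table of 16)
Q = np.full((8, 8), 16.0, dtype=np.float32)
quantized = np.round(dct / Q)

# Decoding reverses the two steps: dequantize, inverse DCT, undo the level shift
reconstructed = cv2.idct(quantized * Q) + 128.0

print(int(np.count_nonzero(quantized)), "non-zero coefficients after quantization")
print(float(np.abs(block - reconstructed).max()), "max reconstruction error (lossy)")
```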
UNIT - 5
Point Detection
In point detection, an isolated point is detected at the location (x, y) on which the detection mask is centred: a
second-derivative (Laplacian-type) mask is moved over the image, and a point is flagged wherever the absolute value of
the mask response exceeds a threshold. Line detection, in contrast, uses several masks so that each point can be
associated with a line according to its direction.
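A hedged sketch of second-derivative point detection, consistent with the quiz answer below: a Laplacian-type mask is convolved with the image (assuming OpenCV, cv2) and responses above a threshold are flagged; the mask coefficients and the threshold factor are standard textbook choices stated here as assumptions:

```python
import cv2
import numpy as np

img = np.zeros((9, 9), dtype=np.float32)
img[4, 4] = 255.0                          # assumed test image: one isolated bright point

# Laplacian-type point-detection mask (second derivative in all directions)
mask = np.array([[-1, -1, -1],
                 [-1,  8, -1],
                 [-1, -1, -1]], dtype=np.float32)

response = cv2.filter2D(img, -1, mask)
T = 0.9 * np.abs(response).max()           # assumed threshold: 90% of the peak response
points = np.abs(response) > T

print(np.argwhere(points))                 # [[4 4]]: the isolated point is detected
```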
❖ Which Of The Following Is Used For Point Detection?
b. second derivative
c. third derivative
d. Both a and b
Answer: second derivative
❖ What Is The Image Of A Point?
If a point P and its reflection lie at the same distance on opposite sides of a line, the line acts like a mirror and
the given point appears on the other side as well. The reflection of a point P in a line is denoted P' (pronounced
"P prime") and is called the image of the point.
Line Detection
The Hough Transform is a method that is used in image processing to detect any shape, if that shape can be
represented in mathematical form. It can detect the shape even if it is broken or distorted a little bit.
We will see how Hough transform works for line detection using the Hough Line transform method. To apply the
Hough line method, first an edge detection of the specific image is desirable.
• First it creates a 2D array or accumulator (to hold values of two parameters) and it is set to zero
initially.
• Let the rows denote the r values and the columns denote the θ (theta) values.
• The size of the array depends on the accuracy you need. Suppose you want the accuracy of angles to be 1
degree; then you need 180 columns (the maximum angle for a straight line is 180 degrees).
• For r, the maximum distance possible is the diagonal length of the image. So, taking one-pixel
accuracy, the number of rows can be the diagonal length of the image.
Example: -
Consider a 100×100 image with a horizontal line at the middle. Take the first point of the line. You know its (x, y)
values. Now in the line equation, put the values θ = 0, 1, 2, ..., 180 and check the r you get. For every (r, θ)
pair, you increment the value by one in the accumulator in its corresponding (r, θ) cell. So now in the accumulator, the
cell (50, 90) = 1 along with some other cells.
Now take the second point on the line. Do the same as above. Increment the values in the cells corresponding to
the (r, θ) you got. This time, the cell (50, 90) = 2. We are actually voting for the (r, θ) values. You continue this process for
every point on the line. At each point, the cell (50, 90) will be incremented or voted up, while other cells may or
may not be voted up. This way, at the end, the cell (50, 90) will have the maximum votes. So if you search the
accumulator for the maximum votes, you get the value (50, 90), which says there is a line in this image at a distance of 50
from the origin and at an angle of 90 degrees.
Everything explained above is encapsulated in the OpenCV function cv2.HoughLines(). It simply returns an
array of (r, θ) values, where r is measured in pixels and θ is measured in radians.
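A hedged sketch of the full pipeline (edge detection followed by cv2.HoughLines) on a synthetic version of the example image, assuming OpenCV (cv2); the Canny thresholds and the vote threshold are arbitrary choices:

```python
import cv2
import numpy as np

# Assumed 100x100 test image with a horizontal line at the middle, as in the example above
img = np.zeros((100, 100), dtype=np.uint8)
cv2.line(img, (0, 50), (99, 50), 255, 1)

edges = cv2.Canny(img, 50, 150)                     # edge map is the input to the transform
lines = cv2.HoughLines(edges, 1, np.pi / 180, 80)   # r step = 1 pixel, theta step = 1 degree

if lines is not None:
    for r, theta in lines[:, 0]:
        print(r, np.degrees(theta))                 # expect r close to 50 and theta = 90 degrees
```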
Edge Detection
Edges are significant local changes of intensity in a digital image. An edge can be defined as a set of connected
pixels that forms a boundary between two disjoint regions. There are three types of edges:
• Horizontal edges
• Vertical edges
• Diagonal edges
Edge Detection is a method of segmenting an image into regions of discontinuity. It is a widely used technique in
digital image processing like
• pattern recognition
• image morphology
• feature extraction
Edge detection allows users to observe the features of an image where there is a significant change in the gray level. Such
discontinuities indicate the end of one region in the image and the beginning of another. Edge detection reduces the amount
of data in an image while preserving its structural properties.
Edge Detection Operators are of two types:
• Gradient-based operators, which compute first-order derivatives of a digital image, such as the Sobel
operator, Prewitt operator, and Roberts operator.
• Gaussian-based operators, which compute second-order derivatives of a digital image, such as the Canny
edge detector and the Laplacian of Gaussian.
Sobel Operator: It is a discrete differentiation operator. It computes an approximation of the gradient of the image
intensity function for edge detection. At each pixel of an image, the Sobel operator produces either the
corresponding gradient vector or its norm. It uses two 3 x 3 kernels or masks which are convolved
with the input image to calculate the vertical and horizontal derivative approximations respectively, as in the sketch below.
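Example (Python sketch): - A minimal sketch, assuming an OpenCV/NumPy setup, showing the two 3 x 3 Sobel kernels described above applied with cv2.filter2D; the file name "input.jpg" is a placeholder.
import cv2
import numpy as np

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)   # responds to vertical edges
sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=np.float32)  # responds to horizontal edges

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
gx = cv2.filter2D(gray, -1, sobel_x)    # horizontal derivative approximation
gy = cv2.filter2D(gray, -1, sobel_y)    # vertical derivative approximation
magnitude = np.sqrt(gx ** 2 + gy ** 2)  # gradient magnitude used as edge strength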
❖ Advantages:
1. Simple and time efficient computation.
2. Works well for detecting smooth edges.
❖ Limitations:
1. Diagonal direction points are not preserved always.
2. Highly sensitive to noise.
3. Not very accurate in edge detection.
4. It produces thick and rough edges, which do not give accurate results.
❖ Prewitt Operator:
This operator is very similar to the Sobel operator. It also detects the vertical and horizontal edges of an image.
It is one of the best ways to estimate the orientation and magnitude of edges in an image. It uses two 3 x 3 kernels
or masks, shown in the sketch after the Roberts operator below.
❖ Advantages:
1. Good performance on detecting vertical and horizontal edges.
2. Good operator for estimating the orientation of edges in an image.
❖ Limitations:
1. The magnitude of coefficient is fixed and cannot be changed.
2. Diagonal direction points are not preserved always.
❖ Roberts Operator:
This gradient-based operator computes the sum of the squares of the differences between diagonally adjacent
pixels in an image through discrete differentiation. Then the gradient approximation is made. It uses two
2 x 2 kernels or masks, shown together with the Prewitt kernels in the sketch below.
❖ Advantages:
1. Detection of edges and orientation are very easy
2. Diagonal direction points are preserved
❖ Limitations:
1. Very sensitive to noise
2. Not very accurate in edge detection
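Example (Python sketch): - A minimal sketch, assuming OpenCV/NumPy, of the Prewitt (3 x 3) and Roberts (2 x 2) kernels described above, applied with cv2.filter2D; the file name "input.jpg" is a placeholder.
import cv2
import numpy as np

prewitt_x = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]], dtype=np.float32)
prewitt_y = np.array([[-1, -1, -1],
                      [ 0,  0,  0],
                      [ 1,  1,  1]], dtype=np.float32)
roberts_1 = np.array([[1,  0],
                      [0, -1]], dtype=np.float32)
roberts_2 = np.array([[ 0, 1],
                      [-1, 0]], dtype=np.float32)

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
px = cv2.filter2D(gray, -1, prewitt_x)
py = cv2.filter2D(gray, -1, prewitt_y)
r1 = cv2.filter2D(gray, -1, roberts_1)
r2 = cv2.filter2D(gray, -1, roberts_2)
prewitt_mag = np.sqrt(px ** 2 + py ** 2)   # Prewitt gradient magnitude
roberts_mag = np.sqrt(r1 ** 2 + r2 ** 2)   # Roberts gradient magnitude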
❖ Marr-Hildreth Operator or Laplacian of Gaussian (LoG):
It is a Gaussian-based operator which uses the Laplacian to take the second derivative of an image. It
works well when the transition of the grey level is abrupt. It works on the zero-crossing
method, i.e., when the second-order derivative crosses zero, that particular location corresponds to a
maximum of the first derivative and is taken as an edge location. Here the Gaussian operator reduces the noise
and the Laplacian operator detects the sharp edges.
The Gaussian function is defined by the formula G(x,y) = (1/(2πσ²)) · exp(−(x² + y²)/(2σ²)), where σ is the
standard deviation that controls the amount of smoothing.
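Example (Python sketch): - A minimal sketch, assuming OpenCV: Gaussian smoothing followed by the Laplacian approximates the LoG operator, and zero crossings of the result mark edge locations. The file name "input.jpg", the kernel sizes, and the sigma value are illustrative choices.
import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)
smoothed = cv2.GaussianBlur(gray, (5, 5), sigmaX=1.0)   # Gaussian step: noise reduction
log = cv2.Laplacian(smoothed, cv2.CV_64F, ksize=3)      # Laplacian step: second derivative

# crude zero-crossing test: sign changes between horizontally adjacent pixels
signs = np.sign(log)
zero_cross = np.zeros_like(gray)
zero_cross[:, :-1] = (signs[:, :-1] * signs[:, 1:] < 0) * 255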
❖ Advantages:
1. Easy to detect edges and their various orientations.
2. It has fixed (isotropic) characteristics in all directions.
❖ Limitations:
1. Very sensitive to noise
2. The localization error may be severe at curved edges.
3. It generates noisy responses that do not correspond to edges, so-called “false edges”.
❖ Canny Operator:
It is a Gaussian-based operator for detecting edges. This operator is not very susceptible to noise. It extracts image
features without affecting or altering them. The Canny edge detector has an advanced algorithm derived from
the previous work on the Laplacian of Gaussian operator. It is widely used as an optimal edge detection technique.
It detects edges based on three criteria:
1. Low error rate
2. Edge points must be accurately localized
3. There should be just one single edge response
Advantages:
1. It has good localization.
2. It extracts image features without altering the features.
3. Less Sensitive to noise.
Limitations:
1. There is false zero crossing.
2. The computation is complex and time-consuming.
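Example (Python sketch): - A minimal sketch of Canny edge detection with OpenCV; the hysteresis thresholds 100 and 200 and the file name "input.jpg" are illustrative, not taken from the notes.
import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)   # suppress noise before detection
edges = cv2.Canny(blurred, 100, 200)            # low and high hysteresis thresholds
cv2.imwrite("canny_edges.jpg", edges)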
Thresholding
Image segmentation is the technique of subdividing an image into constituent sub-regions or distinct objects. The
level of detail to which subdivision is carried out depends on the problem being solved. That is, segmentation
should stop when the objects or the regions of interest in an application have been detected.
Segmentation of non-trivial images is one of the most difficult tasks in image processing. Segmentation accuracy
determines the eventual success or failure of computerized analysis procedures. Segmentation procedures are
usually done using two approaches – detecting discontinuity in images and linking edges to form the region (known
as edge-based segmenting), and detecting similarity among pixels based on intensity levels (known as
threshold-based segmenting).
❖ Thresholding: -
Thresholding is one of the segmentation techniques that generates a binary image (a binary image is one
whose pixels have only two values – 0 and 1 and thus requires only one bit to store pixel intensity) from a
given grayscale image by separating it into two regions based on a threshold value. Hence pixels having
intensity values greater than the said threshold will be treated as white or 1 in the output image and the others
will be black or 0.
Suppose the above is the histogram of an image f(x,y). We can see one peak near level 40 and another near
180. So there are two major groups of pixels – one group consisting of pixels having a darker shade and
the other having a lighter shade. So there can be an object of interest set against the background. Using an
appropriate threshold value, say 90, will divide the entire image into two distinct regions.
In other words, if we have a threshold T, then the segmented image g(x,y) is computed as:
g(x,y) = 1 if f(x,y) > T, and g(x,y) = 0 if f(x,y) ≤ T.
So the output segmented image has only two classes of pixels – one having a value of 1 and the other
having a value of 0.
If the threshold T is constant in processing over the entire image region, it is said to be global
thresholding. If T varies over the image region, we say it is variable thresholding.
Multiple-thresholding classifies the image into three regions – like two distinct objects on a background.
The histogram in such cases shows three peaks and two valleys between them. The segmented image can
be computed using two appropriate thresholds T1 and T2.
We may intuitively infer that the success of intensity thresholding is directly related to the width and
depth of the valleys separating the histogram modes. In turn, the key factors affecting the properties of
the valleys are the separation between peaks, the noise content in the image, and the relative sizes of
objects and backgrounds. The more widely the two peaks in the histogram are separated, the better
thresholding and hence image segmenting algorithms will work. Noise in an image often degrades this
widely-separated two-peak histogram distribution and leads to difficulties in adequate thresholding and
segmenting. When noise is present, it is appropriate to use some filter to clean the image and then apply
segmentation. The relative object sizes play a role in determining the accuracy of segmentation.
❖ Global Thresholding: -
When the intensity distribution of objects and background are sufficiently distinct, it is possible to use a
single or global threshold applicable over the entire image. The basic global thresholding algorithm
iteratively finds the best threshold value for segmenting.
The algorithm is explained below.
1. Select an initial estimate of the threshold T.
2. Segment the image using T to form two groups G1 and G2: G1 consists of all pixels with
intensity values > T, and G2 consists of all pixels with intensity values ≤ T.
3. Compute the average intensity values m1 and m2 for groups G1 and G2.
4. Compute the new value of the threshold T as T = (m1 + m2)/2.
5. Repeat steps 2 through 4 until the difference between successive values of T is smaller than a
pre-defined value δ.
6. Segment the image as g(x,y) = 1 if f(x,y) > T and g(x,y) = 0 if f(x,y) ≤ T.
This algorithm works well for images that have a clear valley in their histogram. The larger the value of
δ, the smaller will be the number of iterations. The initial estimate of T can be made equal to the average
pixel intensity of the entire image.
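Example (Python sketch): - A minimal NumPy sketch of the iterative global-thresholding algorithm listed above (steps 1 to 6); the stopping tolerance delta is a free parameter.
import numpy as np

def global_threshold(f, delta=0.5):
    T = f.mean()                            # step 1: initial estimate (image average)
    while True:
        g1 = f[f > T]                       # step 2: pixels with intensity > T
        g2 = f[f <= T]                      #         pixels with intensity <= T
        m1 = g1.mean() if g1.size else 0.0  # step 3: group means
        m2 = g2.mean() if g2.size else 0.0
        T_new = 0.5 * (m1 + m2)             # step 4: updated threshold
        if abs(T_new - T) < delta:          # step 5: convergence check
            g = (f > T_new).astype(np.uint8)   # step 6: segmented image (0/1)
            return g, T_new
        T = T_new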
The above simple global thresholding can be made optimum by using Otsu’s method. Otsu’s method is
optimum in the sense that it maximizes the between-class variance. The basic idea is that well-thresholded
classes or groups should be distinct with respect to the intensity values of their pixels and conversely, a
threshold giving the best separation between classes in terms of their intensity values would be the best or
optimum threshold.
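Example (Python sketch): - A minimal sketch of Otsu's method via OpenCV: when THRESH_OTSU is set, cv2.threshold selects the optimum threshold automatically and the threshold argument passed in (0 here) is ignored. "input.jpg" is a placeholder.
import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)
t_otsu, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("Otsu threshold:", t_otsu)   # the between-class-variance-maximizing threshold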
❖ Variable Thresholding: -
There are broadly two different approaches to local thresholding. One approach is to partition the image into
non-overlapping rectangles. Then the techniques of global thresholding or Otsu’s method are applied to each
of the sub-images. Hence in the image partitioning technique, the methods of global thresholding are applied
to each sub-image rectangle by assuming that each such rectangle is a separate image in itself. This approach
is justified when the sub-image histogram properties are suitable (have two peaks with a wide valley in
between) for the application of thresholding techniques but the entire image histogram is corrupted by noise
and hence is not ideal for global thresholding.
The other approach is to compute a variable threshold at each point from the neighbourhood pixel properties.
Let us say that we have a neighbourhood Sxy of a pixel having coordinates (x,y). If the mean and standard
deviation of the pixel intensities in this neighbourhood are mxy and σxy, then the threshold at each point can be
computed, for example, as:
Txy = a·σxy + b·mxy
where a and b are arbitrary constants. The above definition of the variable threshold is just an example. Other
definitions can also be used according to the need.
The segmented image is computed as g(x,y) = 1 if f(x,y) > Txy and g(x,y) = 0 if f(x,y) ≤ Txy.
Moving averages can also be used as thresholds. This technique of image thresholding is the most general
one and can be applied to widely different cases.
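Example (Python sketch): - A minimal sketch of variable thresholding using the local mean and standard deviation of a neighbourhood Sxy, with Txy = a·σxy + b·mxy as defined above; the window size and the constants a and b are illustrative choices, not values from the notes.
import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float64)
win = (15, 15)                                  # size of the neighbourhood Sxy
m = cv2.blur(gray, win)                         # local mean m_xy
sq_m = cv2.blur(gray ** 2, win)
sigma = np.sqrt(np.maximum(sq_m - m ** 2, 0))   # local standard deviation sigma_xy

a, b = 2.0, 0.5                                 # arbitrary constants a and b from the text
T = a * sigma + b * m                           # variable threshold T_xy
g = (gray > T).astype(np.uint8) * 255           # segmented image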
Edge Linking and Boundary Detection
Edge linking and boundary detection operations are fundamental steps in image understanding. The edge linking
process takes an unordered set of edge pixels produced by an edge detector as input and forms an ordered list of
edges. Local edge information is utilized by the edge linking operation; thus edge detection algorithms are typically
followed by a linking procedure to assemble edge pixels into meaningful edges.
❖ Local Processing: -
One of the simplest approaches of linking edge points is to analyze the characteristics of pixels in a small
neighborhood (say, 3 x 3 or 5 x 5) about every point (x, y) in an image that has undergone edge-detection. All
points that are similar are linked, forming a boundary of pixels that share some common properties.
The two principal properties used for establishing similarity of edge pixels in this kind of analysis are (1) the
strength of the response of the gradient operator used to produce the edge pixel; and (2) the direction of the
gradient vector. The first property is given by the value of the gradient magnitude |∇f|. Thus an edge pixel with
coordinates (s, t) in a predefined neighborhood of (x, y) is similar in magnitude to the pixel at (x, y) if
|∇f(s,t) − ∇f(x,y)| ≤ E, where E is a non-negative threshold; similarly, the two pixels have comparable directions
if the difference of their gradient angles does not exceed a predefined angle threshold A.
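Example (Python sketch): - A minimal sketch of this local-processing idea, assuming OpenCV/NumPy: edge pixels in a 3 x 3 neighbourhood whose gradient magnitude and direction differ from those of the centre pixel by less than the thresholds E and A are treated as linked. The thresholds and the edge-map cut-off are illustrative assumptions.
import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
mag = np.hypot(gx, gy)                        # gradient magnitude
ang = np.degrees(np.arctan2(gy, gx))          # gradient direction in degrees

E, A = 25.0, 15.0                             # illustrative magnitude and angle thresholds
edge = mag > 100                              # crude edge map for demonstration
linked = np.zeros_like(edge, dtype=bool)
h, w = edge.shape
for y in range(1, h - 1):
    for x in range(1, w - 1):
        if not edge[y, x]:
            continue
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy == 0 and dx == 0:
                    continue
                ny, nx = y + dy, x + dx
                if (edge[ny, nx]
                        and abs(mag[ny, nx] - mag[y, x]) <= E
                        and abs(ang[ny, nx] - ang[y, x]) <= A):
                    linked[y, x] = True       # (y, x) has a similar neighbour to link to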
Hough Transforms
The HT is a feature extraction method used in image analysis, computer vision, and digital image processing. It uses a
voting mechanism to identify imperfect instances of objects within a given class of shapes. This voting is carried out
in parameter space: object candidates are obtained as local maxima in an accumulator space that is constructed by
the HT algorithm.
The traditional HT was concerned with detecting lines in an image, but it was subsequently expanded to identifying
locations of arbitrary shapes, most often circles or ellipses.
❖ Why is it Needed?
In many circumstances, an edge detector can be used as a pre-processing stage to get picture points or pixels on
the required curve in the image space. However, there may be missing points or pixels on the required curves
due to flaws in either the image data or the edge detector and spatial variations between the ideal
line/circle/ellipse and the noisy edge points acquired by the edge detector. As a result, grouping the extracted
edge characteristics into an appropriate collection of lines, circles, or ellipses is frequently difficult.
Figure 2: Image after applying an edge detection technique.
A line can be described analytically in a variety of ways. One form uses the parametric or normal notation:
x cos θ + y sin θ = r, where r is the length of the normal from the origin to the line and θ is its
orientation.
Point Equation At a=0 New point (a,b) At a=1 New point (a,b)
A(1,4) b = -a+4 b = -(0)+4 = 4 (0,4) b = -(1)+4 = 3 (1,3)
B(2,3) b = -2a+3 b = -2(0)+3 = 3 (0,3) b = -2(1)+3 = 1 (1,1)
C(3,1) b = -3a+1 b = -3(0)+1 = 1 (0,1) b = -3(1)+1 = -2 (1,-2)
D(4,1) b = -4a+1 b = -4(0)+1 = 1 (0,1) b = -4(1)+1 = -3 (1,-3)
E(5,0) b = -5a+0 b = -5(0)+0 = 0 (0,0) b = -5(1)+0 = -5 (1,-5)
The known variables (i.e., xi, yi) in the image are constants in the parametric line equation, whereas r and θ are the
unknown variables we seek. If we plot the potential (r, θ) values defined by each point, points in the Cartesian image
space map to curves (i.e., sinusoids) in the polar Hough parameter space. This point-to-curve transformation is the
Hough transformation for straight lines. Collinear points in the Cartesian image space become clearly visible when
viewed in the Hough parameter space, because they produce curves that intersect at a single (r, θ) point.
The HT can also be applied to circles, whose equation is (x − a)² + (y − b)² = r², where a and b are the circle’s center
coordinates and r is the radius. Because we now have three coordinates in the parameter space and a 3-D
accumulator, the algorithm’s computing complexity increases. (In general, the computation and the size of the
accumulator array grow polynomially with the number of parameters.) As a result, the fundamental Hough approach
described here is only applied to straight lines.
❖ Algorithm: -
1. Determine the range of ρ and θ. Typically, the range of θ is [0, 180] degrees and ρ is [-d, d], where d is the
diagonal length of the edge image. It is crucial to quantize the range of ρ and θ, which means there should
only be a finite number of possible values.
2. Create a 2D array called the accumulator with the dimensions (num rhos, num thetas) to represent the Hough
Space and set all its values to zero.
3. Use the original Image to do edge detection (ED). You can do this with whatever ED technique you like.
4. Check each pixel of the edge image to see whether it is an edge pixel. If it is, loop over all possible
values of θ, compute the corresponding ρ, locate the θ and ρ indices in the accumulator, and increment the
accumulator at those index pairs.
5. Iterate over the accumulator’s values. If a cell’s value is greater than a specified threshold, retrieve its ρ and θ
indices, obtain the corresponding values of ρ and θ, and transform them back into a line of the form
y = ax + b. (A code sketch of these steps is given after the worked example below.)
Now take a = 0 and a = 1 and find the corresponding b values for the five equations given above, as tabulated.
Let us plot the new points on the graph, as given below in the figure.
We can see that almost all the lines cross each other at the point (-1, 5). So here a = -1 and b = 5.
Now let us put these values in the equation y = ax + b: we get y = -1·x + 5, so y = -x + 5 is the line equation that
links all the edge points.
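Example (Python sketch): - A minimal NumPy sketch of the accumulator-based algorithm in steps 1 to 5 above, using the normal form r = x·cos θ + y·sin θ; the input edges is assumed to be a binary edge image from any edge detector, and the vote threshold is an illustrative value.
import numpy as np

def hough_lines(edges, vote_threshold=100):
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))              # maximum possible |r|
    thetas = np.deg2rad(np.arange(0, 180))           # 1-degree angular accuracy
    accumulator = np.zeros((2 * diag + 1, len(thetas)), dtype=np.int32)

    ys, xs = np.nonzero(edges)                       # coordinates of edge pixels
    for x, y in zip(xs, ys):
        for t_idx, theta in enumerate(thetas):
            r = int(round(x * np.cos(theta) + y * np.sin(theta)))
            accumulator[r + diag, t_idx] += 1        # vote for this (r, theta) cell

    # cells with enough votes correspond to detected lines
    r_idx, t_idx = np.nonzero(accumulator > vote_threshold)
    return [(r - diag, thetas[t]) for r, t in zip(r_idx, t_idx)]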
❖ Advantages:
The HT benefits from not requiring all pixels on a single line to be contiguous. As a result, it can be quite effective
when identifying lines with small gaps due to noise or when objects are partially occluded.
❖ Disadvantages:
The HT has the following drawbacks:
• It can produce deceptive results when objects align by accident;
• Rather than finite lines with definite ends, detected lines are infinite lines defined by their (m,c) values.
❖ Application:
The HT has been widely employed in numerous applications because of benefits such as noise immunity. 3D
applications, object and shape detection, lane and traffic sign recognition, industrial and medical applications, pipe
and cable inspection, and underwater tracking are just a few examples. For instance, one approach proposes the
hierarchical additive Hough transform (HAHT) for detecting lane lines; the HAHT accumulates the votes at various
hierarchical levels, and segmenting lines into multiple blocks also minimizes the computational load. Another
approach proposes a lane detection strategy in which the HT is merged with Joint Photographic Experts Group
(JPEG) compression; however, only simulations are used to test that method.
Region Based Segmentation
This process involves dividing the image into smaller segments according to a certain set of rules. The technique
employs an algorithm that divides the image into several components with common pixel characteristics, looking
for chunks of segments within the image.
1. Region growing − Small segments start from seed pixels, absorb similar pixels from the neighbourhood, and
subsequently grow in size; the algorithm picks up the gray level from the surrounding pixels (a minimal sketch
follows this list).
2. Region splitting − In this method, the whole image is initially considered a single region. The region is then
split into segments by checking whether the pixels in it satisfy a predefined set of criteria; pixels that follow
similar rules are grouped into one segment.
• Region merging − In this method we initially treat every pixel as its own region, so the number of regions
is equal to the number of pixels in the image. Regions are then merged according to a given similarity
rule whenever the rule is satisfied.
• Region splitting and merging − When splitting and merging are combined and take place together, the
process is called region splitting and merging segmentation.
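Example (Python sketch): - A minimal sketch of seeded region growing, assuming a grayscale NumPy image: starting from a seed pixel, 4-connected neighbours whose intensity differs from the running region mean by less than tol are absorbed into the region. The seed position and tolerance are illustrative assumptions.
import numpy as np
from collections import deque

def region_grow(img, seed, tol=10):
    h, w = img.shape
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    mean = float(img[seed])          # running mean intensity of the region
    count = 1
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not region[ny, nx]:
                if abs(float(img[ny, nx]) - mean) < tol:
                    region[ny, nx] = True
                    # update the running mean as the region grows
                    mean = (mean * count + float(img[ny, nx])) / (count + 1)
                    count += 1
                    queue.append((ny, nx))
    return region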
Boundary Descriptors
❖ Simple Descriptors:
• Length of a Contour: The length is found by counting the number of pixels along the contour. For a chain-coded
curve with unit spacing in both directions, the number of vertical and horizontal components plus √2 times the
number of diagonal components gives the exact length of the curve.
• Boundary Diameter: It is defined as Diam(B) = max[D(pi, pj)] over i, j, where D is a distance measure which
can be either the Euclidean distance or the D4 distance. The value of the diameter and the orientation of the major
axis of the boundary are two useful descriptors.
• Curvature: It is the rate of change of slope. Curvature can be determined by using the difference between the
slopes of adjacent boundary segments at the point of intersection of the segments.
• Shape Numbers: The shape number is the first difference of smallest magnitude of a chain-code representation.
The order of a shape number is defined as the number of digits in its representation. The shape order is even for a
closed boundary.
REGIONAL DESCRIPTORS
❖ Simple Descriptors: Area, perimeter, and compactness are the simple region descriptors, where
Compactness = (perimeter)²/area.
❖ Topological Descriptors:
• Rubber-sheet Distortions: Topology is the study of the properties of a figure that are unaffected by any
deformation, as long as there is no tearing or joining of the figure.
• Euler Number: The Euler number (E) of a region depends on the number of connected components (C) and holes
(H): E = C − H. A connected component of a set is a subset of maximal size such that any two of its points
can be joined by a connected curve lying entirely within the subset.
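Example (Python sketch): - A minimal sketch, assuming OpenCV and a binary mask image, that computes the simple regional descriptors named above (area, perimeter, and compactness) from the largest external contour; the file name "mask.png" is a placeholder and the mask is assumed to contain at least one non-degenerate object.
import cv2

mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)     # hypothetical binary image
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
contour = max(contours, key=cv2.contourArea)            # largest object boundary

area = cv2.contourArea(contour)
perimeter = cv2.arcLength(contour, closed=True)         # length of the contour
compactness = perimeter ** 2 / area                     # (perimeter)^2 / area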