Assignment DIP
Assignment No 3
Submitted To:
Dr. Shahzad Anwar
Submission Date:
5th January, 2024
What is Image Segmentation?
Image segmentation, a fundamental task in computer vision, divides a digital image into several
segments, sometimes referred to as image regions or objects. This procedure converts an image
into a representation that is more meaningful and easier to analyse. Image segmentation assigns a label to
every pixel so that pixels with the same label share similar properties. The method is useful for locating
objects and boundaries within images. In medical imaging, for example, segmentation can generate 3D
reconstructions from CT scans using geometry-reconstruction methods.
Instance Segmentation: Every pixel is associated with a specific instance of an object. It is concerned
with distinguishing the individual objects of interest in an image. In a photograph containing numerous
people, for example, each person would be segmented as a separate object.
Panoptic Segmentation:
Panoptic segmentation, a hybrid of semantic and instance segmentation, determines the class to which each
pixel belongs while discriminating between distinct instances of the same class.
Working Mechanism:
Panoptic segmentation has emerged as a game changer in computer vision. It is a hybrid strategy that
combines the strengths of semantic and instance segmentation. Whereas semantic segmentation categorizes
each pixel and instance segmentation recognizes individual object instances, panoptic segmentation
performs both: it classifies every pixel and assigns a unique instance ID to each distinct object.
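The idea above can be sketched in a few lines of plain Python. This is only an illustration of the output format (a class label plus an instance ID per pixel), not a segmentation model; the tiny grids and the class numbering are invented for the example.

```python
# Illustrative sketch: a panoptic result assigns each pixel both a class label
# and an instance ID. "Stuff" regions (e.g. road) get instance ID 0, while
# "thing" classes (e.g. person) get a unique ID per object instance.
semantic = [          # per-pixel class: 0 = road, 1 = person (hypothetical)
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
]
instance = [          # per-pixel instance ID (0 = no distinct instance)
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 0],
]

def panoptic(semantic, instance):
    """Fuse semantic classes and instance IDs into (class, instance) pairs."""
    return [[(c, i) for c, i in zip(crow, irow)]
            for crow, irow in zip(semantic, instance)]

pan = panoptic(semantic, instance)
# Two separate "person" regions share class 1 but carry different instance IDs.
assert pan[0][2] == (1, 1) and pan[2][1] == (1, 2)
```

Note how the two people share the same semantic class but remain distinguishable, which is exactly what separates panoptic from plain semantic segmentation.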
The Efficient Panoptic Segmentation (EfficientPS) approach is one of the most advanced methods in
panoptic segmentation. It uses deep learning and neural networks to generate high-quality segmentation
results. EfficientPS is designed to be both computationally efficient and effective in terms of
segmentation quality. It processes input images and generates segmentation masks using feature pyramid
networks and convolutional layers. The COCO dataset is also used for training and validation, ensuring
that the models are exposed to a wide range of images and scenes.
The benefit of panoptic segmentation, particularly of approaches like EfficientPS, is that it can give a
precise, pixel-level understanding of images. This is extremely useful in real-world applications such as
driverless cars, where recognizing both the category of each pixel (road, pedestrian, vehicle) and the
overall environment is critical.
Autonomous Vehicles
Another arena in which panoptic segmentation excels is autonomous cars. Understanding the
environment is critical for self-driving automobiles, and panoptic segmentation helps by offering a
pixel-level comprehension of the surroundings. It is also valuable when estimating distances to objects,
allowing the vehicle to make intelligent judgements in real time. By discriminating between countable
objects (such as pedestrians and other vehicles) and uncountable regions (such as roads and sky),
panoptic segmentation enables safer navigation for autonomous cars.
Figure 2: Result of panoptic segmentation (input image and corresponding output).
Watershed Algorithm
Thresholding:
In the context of the Watershed Algorithm, thresholding plays an important role in identifying
certain parts of the image. After converting the image to grayscale, the algorithm applies
thresholding to the grayscale image to obtain a binary image that helps in segregating the foreground
(objects to be segmented) and the background.
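The thresholding step can be sketched as follows. This is a minimal pure-Python illustration of global thresholding on a grayscale grid; the threshold value 120 and the sample pixel values are arbitrary, and a real pipeline would use a library routine (and often an automatic threshold such as Otsu's method).

```python
# Minimal sketch of global thresholding: pixels brighter than the threshold
# become foreground (255), everything else becomes background (0).
def threshold(gray, t):
    return [[255 if px > t else 0 for px in row] for row in gray]

gray = [
    [ 10,  20, 200],
    [ 15, 210, 220],
    [ 12,  18,  25],
]
binary = threshold(gray, 120)
# The bright pixels form the foreground mask used by the later steps.
assert binary == [[0, 0, 255], [0, 255, 255], [0, 0, 0]]
```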
Opening (Erosion followed by Dilation):
In this step, the opening operation, which is an erosion operation followed by a dilation
operation, is performed. The purpose of this step is primarily to remove noise. The erosion operation
removes small white noise in the image, but it also shrinks our objects. Following this with a dilation
operation allows us to retain the size of our objects while keeping the noise out.
Erosion:
This operation erodes away the boundaries of the foreground object. It works by creating a
convolution kernel and passing it over the image. If any of the pixels in the region under the kernel
are black, then the pixel in the middle of the kernel is set to black. This operation is effective at
removing small white noise.
Dilation:
After erosion, dilation is performed, which is essentially the opposite of erosion. It adds pixels
to the boundaries of objects in an image. If any of the pixels in the region under the kernel are white,
then the pixel in the middle of the kernel is set to white.
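The three operations described above (erosion, dilation, and their composition, opening) can be sketched on a 0/1 grid. This is an illustrative toy with a fixed 3×3 kernel and out-of-bounds pixels treated as background; it is not how a production library implements morphology.

```python
# Binary morphology sketch on 0/1 grids with a 3x3 kernel.
def _neighbors(img, y, x):
    h, w = len(img), len(img[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            yield img[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0

def erode(img):
    # Centre stays foreground only if every pixel under the kernel is foreground.
    return [[1 if all(v == 1 for v in _neighbors(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]

def dilate(img):
    # Centre becomes foreground if any pixel under the kernel is foreground.
    return [[1 if any(v == 1 for v in _neighbors(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]

def opening(img):
    # Opening = erosion followed by dilation: removes noise, restores object size.
    return dilate(erode(img))

noisy = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 1],   # lone pixel at the right is "salt" noise
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
opened = opening(noisy)
# The 3x3 object survives, but the isolated noise pixel is gone.
assert opened[2][5] == 0 and opened[2][2] == 1
```

Dilating `opened` further would expand the object, so everything outside that expanded region can safely be treated as background, which is the idea behind the background-identification step.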
Dilation for Background Identification:
In this step, the dilation operation is used to identify the background region of the image. The
result of previous step, where noise has been removed, is subjected to dilation. After dilation, a
significant portion around the objects (or the foreground) is expected to be the background region
(since dilation expands the objects). This “sure background” region aids in the subsequent steps of
the Watershed algorithm where we aim to identify distinct segments/objects.
Distance Transformation:
The Watershed Algorithm involves applying a distance transform to the binary image to identify
regions that are likely to be the foreground.
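A distance transform can be sketched as a multi-source breadth-first search from the background pixels. This toy uses 8-connectivity (chessboard distance) on a small hand-made grid; library implementations typically compute an exact Euclidean distance instead.

```python
# Sketch of a distance transform: each foreground pixel gets its (chessboard)
# distance to the nearest background pixel, via a multi-source BFS. Peaks of
# this map mark pixels that are almost surely foreground, which the Watershed
# algorithm can use as object markers.
from collections import deque

def distance_transform(binary):
    h, w = len(binary), len(binary[0])
    dist = [[0 if binary[y][x] == 0 else None for x in range(w)]
            for y in range(h)]
    q = deque((y, x) for y in range(h) for x in range(w) if binary[y][x] == 0)
    while q:
        y, x = q.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                    dist[ny][nx] = dist[y][x] + 1
                    q.append((ny, nx))
    return dist

binary = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
dist = distance_transform(binary)
# The object's centre is farther from the background than its edge pixels.
assert dist[2][2] == 2 and dist[1][1] == 1
```

Thresholding this distance map at a high value keeps only the object centres, giving the "sure foreground" markers.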
How does the watershed algorithm work?
The concept of “flooding” and “dam construction” in the Watershed Algorithm is essentially a
metaphorical way to describe how the algorithm works.
Flooding:
The “flooding” process refers to the expansion of each labeled region (the markers) based on
the gradient of the image. In this context, the gradient represents the topographic elevation, with
high-intensity pixel values representing peaks and low-intensity pixel values representing valleys.
The flooding starts from the valleys, or the regions with the lowest intensity values. The flooding
process is carried out in such a way that each pixel in the image is assigned a label. The label it
receives depends on which marker’s “flood” reaches it first. If a pixel is equidistant from multiple
markers, it remains as part of the unknown region for now.
Dam Construction:
As the flooding process continues, the floodwaters from different markers (representing
different regions in the image) will eventually start to meet. When they do, a “dam” is constructed.
In terms of the algorithm, this dam construction corresponds to the creation of boundaries in the
marker image. These boundaries are assigned a special label (usually -1). The dams are constructed
at the locations where the floodwaters from different markers meet, which are typically the areas of
the image where there’s a rapid change in intensity — signifying the boundary between different
regions in the image.
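The flooding-and-dam metaphor can be made concrete with a toy priority-queue flood: pixels are visited in order of increasing intensity starting from the markers, and wherever floods from two different markers meet, the pixel becomes a dam (label -1). This is a simplified 4-connected sketch on a one-row "valley-peak-valley" profile, not a full watershed implementation.

```python
# Toy flooding sketch: expand labelled markers in order of increasing
# intensity; where two different floods meet, build a dam (-1).
import heapq

def flood(image, markers):
    h, w = len(image), len(image[0])
    labels = [row[:] for row in markers]          # 0 = not yet flooded
    pq = []
    for y in range(h):
        for x in range(w):
            if labels[y][x] > 0:
                heapq.heappush(pq, (image[y][x], y, x))
    while pq:
        _, y, x = heapq.heappop(pq)
        if labels[y][x] == -1:                    # dams do not flood further
            continue
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w):
                continue
            if labels[ny][nx] == 0:               # flood reaches a new pixel
                labels[ny][nx] = labels[y][x]
                heapq.heappush(pq, (image[ny][nx], ny, nx))
            elif labels[ny][nx] not in (-1, labels[y][x]):
                labels[ny][nx] = -1               # two floods meet: build a dam
    return labels

# Two basins (intensity 1) separated by a ridge (intensity 5),
# with one marker seeded in each basin.
image = [[1, 2, 5, 2, 1]]
markers = [[1, 0, 0, 0, 2]]
labels = flood(image, markers)
# The dam forms on the ridge, exactly where the two floods meet.
assert labels == [[1, 1, -1, 2, 2]]
```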
MATLAB Code:

% Setup (assumed, since the listing began mid-script): load an image, convert
% it to grayscale, and compute the gradient magnitude, which the watershed
% transform treats as a topographic surface.
I = im2gray(imread("pears.png"));
gmag = imgradient(I);
imshow(gmag,[])
title("Gradient Magnitude")

% Applying watershed directly to the gradient magnitude over-segments the image.
L = watershed(gmag);
Lrgb = label2rgb(L);
imshow(Lrgb)
title("Watershed Transform of Gradient Magnitude")

% Morphological opening with a disk-shaped structuring element removes small detail.
se = strel("disk",20);
Io = imopen(I,se);
imshow(Io)
title("Opening")

% Opening-by-reconstruction: erode, then reconstruct against the original image.
Ie = imerode(I,se);
Iobr = imreconstruct(Ie,I);
imshow(Iobr)
title("Opening-by-Reconstruction")

% Closing after opening fills small dark gaps.
Ioc = imclose(Io,se);
imshow(Ioc)
title("Opening-Closing")

% Closing-by-reconstruction: dilate, then reconstruct on the complement.
Iobrd = imdilate(Iobr,se);
Iobrcbr = imreconstruct(imcomplement(Iobrd),imcomplement(Iobr));
Iobrcbr = imcomplement(Iobrcbr);
imshow(Iobrcbr)
title("Opening-Closing by Reconstruction")

% Regional maxima of the cleaned-up image give the foreground markers.
fgm = imregionalmax(Iobrcbr);
imshow(fgm)
title("Regional Maxima of Opening-Closing by Reconstruction")

% Overlay the foreground markers on the original image.
I2 = labeloverlay(I,fgm);
imshow(I2)