14 Segmentation

Segmentation partitions an image into distinct regions, each containing pixels with similar attributes.

Meaningful segmentation is the first step from low-level image processing (transforming a greyscale or color image into one or more other images) to high-level image description in terms of features, objects, and scenes.

The success of image analysis depends on the reliability of segmentation, but an accurate partitioning of an image is generally a very challenging problem.

FCD detection and segmentation

Segmentation:

- Input: images (e.g. automatic separation of arteries and veins)
- Output: segmented regions, structures, etc.

In both of these images, you can easily segment the foreground region by applying a threshold on the image.
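As a toy illustration (not from the slides), thresholding can be sketched in NumPy; the image values and the threshold of 128 are assumptions for this example:

```python
import numpy as np

# Toy greyscale "image": a bright foreground object on a dark background
img = np.array([[ 10,  12,  11,  10],
                [ 10, 200, 210,  12],
                [ 11, 205, 220,  10],
                [ 12,  10,  11,  10]])

# Threshold chosen by inspection for this toy example
mask = img > 128          # True where the pixel belongs to the foreground
print(mask.astype(int))
```

Every pixel above the threshold is labelled foreground; this only works when foreground and background intensities are well separated.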
What is semantic segmentation?

- Idea: recognizing and understanding what is in the image at the pixel level.
- Each pixel in the image is classified into one of the output labels.
- In other words, image classification at the pixel level.


(Figure: image classification vs. image segmentation)
Fully Convolutional Networks (FCN) for 2D segmentation

• Presented at CVPR 2015.

• They build "fully convolutional" networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.

• They show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state of the art in semantic segmentation.

• This is the first work to train FCNs end-to-end (1) for pixelwise prediction and (2) from supervised pre-training.

• Fully convolutional networks (FCN): CNNs without fully connected (dense) layers.
In classification, conventionally, an input image is downsized, passes through the convolution layers and fully connected (FC) layers, and the network outputs one predicted label for the input image, as follows:

https://towardsdatascience.com/review-fcn-semantic-segmentation-eb8c9b50d2d1
The above architecture can be converted into an FCN by using 1×1 convolutional layers.

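A minimal NumPy sketch (not from the source) of why a 1×1 convolution can replace an FC classifier head: it applies the same dense weights independently at every spatial position, turning one label per image into one score vector per pixel. The shapes and random weights below are illustrative assumptions:

```python
import numpy as np

H, W, C, K = 4, 4, 3, 5            # 4x4 feature map, 3 channels, 5 classes
feat = np.random.rand(H, W, C)     # final convolutional feature map
W_fc = np.random.rand(C, K)        # weights of a dense classifier over channels

# A 1x1 convolution is just this dense layer applied at every position:
# (H, W, C) @ (C, K) -> (H, W, K), i.e. per-pixel class scores
scores = feat @ W_fc
print(scores.shape)                # (4, 4, 5)
```

Because no layer fixes the spatial size, the same weights now work for input of arbitrary size, which is the key property the FCN paper exploits.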
When we do segmentation, the size (dimensions) of the output (segmented image) should be the same as that of the input image.

One straightforward approach is to use "same convolutions" and avoid pooling layers.

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Designing deep CNNs with "same convolutions" and without max pooling is challenging: the memory requirement and the number of parameters in the model will be huge, and we also lose the advantages of max pooling.

The alternative is to build a model with valid/strided convolutions plus max pooling. In that case the output will be smaller than the original image, so we need some kind of upsampling to make the output (predictions) the same size as the input.
1. Nearest Neighbors: as the name suggests, we take an input pixel value and copy it to its K nearest neighbors in the output, where K depends on the expected output size.
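A short NumPy sketch (an illustration, not from the slides) of nearest-neighbor upsampling by a factor of 2, where each input pixel is copied into a 2×2 block of the output:

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])

# Repeat each row and then each column: every pixel becomes a 2x2 block
up = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)
print(up)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```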

2. Bed of Nails: we copy the value of each input pixel to the corresponding position in the output image and fill zeros in the remaining positions.
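The same 2×2 example for bed-of-nails upsampling (again an illustrative sketch): each input value lands at the top-left of its output block and everything else stays zero.

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])

up = np.zeros((4, 4), dtype=x.dtype)
up[::2, ::2] = x   # copy each pixel to its corresponding position, zeros elsewhere
print(up)
# [[1 0 2 0]
#  [0 0 0 0]
#  [3 0 4 0]
#  [0 0 0 0]]
```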
3. Max-Unpooling: the max-pooling layer in a CNN takes the maximum among all the values in the kernel window. To perform max-unpooling, the index of the maximum value is first saved for every max-pooling layer during the encoding step. The saved index is then used during the decoding step, where each input pixel is mapped back to its saved index, with zeros filled everywhere else.
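A self-contained NumPy sketch of this encode/decode pair (an assumption-laden toy, not library code): the pooling step records where each maximum came from, and unpooling scatters the values back to those positions.

```python
import numpy as np

def max_pool_with_indices(x, size=2):
    """2x2 max-pooling that also records the argmax position in each window."""
    H, W = x.shape
    out = np.zeros((H // size, W // size))
    idx = np.zeros_like(out, dtype=int)       # flat index within each window
    for i in range(0, H, size):
        for j in range(0, W, size):
            win = x[i:i+size, j:j+size]
            out[i // size, j // size] = win.max()
            idx[i // size, j // size] = win.argmax()
    return out, idx

def max_unpool(pooled, idx, size=2):
    """Place each pooled value back at its saved index; zeros everywhere else."""
    H, W = pooled.shape
    out = np.zeros((H * size, W * size))
    for i in range(H):
        for j in range(W):
            di, dj = divmod(idx[i, j], size)
            out[i * size + di, j * size + dj] = pooled[i, j]
    return out

x = np.array([[1., 5., 2., 1.],
              [3., 2., 0., 4.],
              [0., 1., 6., 2.],
              [7., 2., 3., 1.]])
p, idx = max_pool_with_indices(x)
restored = max_unpool(p, idx)
print(restored)
```

Unlike bed of nails, the surviving values return to their original locations, which preserves some spatial detail from the encoder.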
Transposed convolution
Implementing a transposed convolutional layer can be explained as a 4-step process:

Step 1: Calculate the new parameters z and p′.

Step 2: Between each row and column of the input, insert z zeros. This increases the size of the input to (2i − 1) × (2i − 1).

Step 3: Pad the modified input image with p′ zeros.

Step 4: Carry out standard convolution on the image generated from step 3 with a stride length of 1.
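The four steps above can be sketched directly in NumPy. This is a toy implementation under the usual assumptions z = stride − 1 and p′ = k − 1 (for original padding 0); the kernel and input values are illustrative:

```python
import numpy as np

def conv2d(x, k):
    """Valid cross-correlation with stride 1 (the 'convolution' of deep learning)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

def transposed_conv2d(x, k, stride=2):
    z = stride - 1                  # Step 1: zeros to insert between cells
    p = k.shape[0] - 1              # Step 1: p' = k - 1, assuming padding 0
    H, W = x.shape
    # Step 2: insert z zeros between each row and column of the input
    up = np.zeros((H + (H - 1) * z, W + (W - 1) * z))
    up[::stride, ::stride] = x
    # Step 3: pad the modified input with p' zeros on every side
    up = np.pad(up, p)
    # Step 4: standard convolution with stride 1
    return conv2d(up, k)

x = np.arange(4.0).reshape(2, 2)    # 2x2 input
k = np.ones((3, 3))                 # 3x3 kernel of ones (illustrative)
y = transposed_conv2d(x, k, stride=2)
print(y.shape)                      # (5, 5): stride*(i-1) + k = 2*1 + 3
```

The output size matches the formula s(i − 1) + k for an un-padded transposed convolution, i.e. exactly the inverse of what a strided convolution would shrink a 5×5 input to.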
Convolution With Stride 1, No Padding

Convolution With Stride 2, No Padding


Transpose Convolution With Stride 2, No Padding

• First, we create an intermediate grid in which the original input's cells are spaced apart with a step size equal to the stride.
• Next, we extend the edges of the intermediate image with additional zero-valued cells. We add the maximum number of these such that a kernel placed in the top left still covers one of the original cells.
• Finally, the kernel is moved across this intermediate grid in step sizes of 1. This step size is always 1: unlike in normal convolution, the stride does not decide how the kernel moves; it only sets how far apart the original cells are in the intermediate grid.

http://makeyourownneuralnetwork.blogspot.com/2020/02/calculating-output-size-of-convolutions.html
Convolution

Transpose Convolution
Transpose Convolution With Stride 1, No Padding
Transpose Convolution With Stride 2, With Padding

Unlike the normal convolution where padding is used to expand the image, here it is used to reduce it.
Transposed convolutions suffer from checkerboard artifacts.

These can be fixed or reduced by using a kernel size divisible by the stride, e.g. a kernel size of 2×2 or 4×4 with a stride of 2.
