14 Segmentation
Meaningful segmentation is the first step from low-level image processing, which transforms a greyscale or color image into one or more other images, to high-level image description in terms of features, objects, and scenes.
The success of image analysis depends on the reliability of segmentation, but accurately partitioning an image is generally a very challenging problem.
Segmentation:
Input: images
Output: segmented regions, structures, etc. (e.g. automatic separation of arteries/veins)
In both these images, you can easily segment the foreground region by applying a threshold to the image.
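As a minimal sketch of thresholding (the pixel values and the threshold below are made up for illustration, not taken from the images referred to above):

```python
import numpy as np

# Pixels brighter than the threshold become foreground (1),
# everything else background (0). Threshold of 100 is arbitrary.
img = np.array([[ 10,  20, 200],
                [ 15, 220, 210],
                [ 12,  18,  25]], dtype=np.uint8)

mask = img > 100              # boolean foreground mask
print(mask.astype(int))
# [[0 0 1]
#  [0 1 1]
#  [0 0 0]]
```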
What is semantic segmentation?
https://towardsdatascience.com/review-fcn-semantic-segmentation-eb8c9b50d2d1
The above architecture can be converted into an FCN by using 1 x 1 convolutional layers in place of the fully connected layers.
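A sketch of the idea (the shapes, names, and function below are illustrative, not from the FCN paper's code): a 1 x 1 convolution applies the same linear map at every spatial location, so a fully connected classifier can be reused convolutionally and produce a spatial score map instead of a single vector.

```python
import numpy as np

def conv1x1(x, w, b):
    """1x1 convolution: x (C_in, H, W), w (C_out, C_in), b (C_out,)
    -> (C_out, H, W). The same linear map is applied at every pixel."""
    c_in, h, wd = x.shape
    out = w @ x.reshape(c_in, h * wd) + b[:, None]
    return out.reshape(-1, h, wd)

rng = np.random.default_rng(0)
x = rng.normal(size=(512, 7, 7))   # encoder feature map (illustrative shape)
w = rng.normal(size=(21, 512))     # 21 class scores (e.g. PASCAL VOC classes)
b = rng.normal(size=21)

scores = conv1x1(x, w, b)          # (21, 7, 7): a score map, not a vector
# At any one pixel, the result equals an ordinary linear layer:
assert np.allclose(scores[:, 3, 4], w @ x[:, 3, 4] + b)
```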
When we do segmentation, the size (dimensions) of the output (the segmented image) should be the same as that of the input image.
One straightforward approach is to use "same" convolutions and avoid pooling layers.
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
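To see why "same" convolution preserves size, here is the standard output-size formula (the helper function name is ours, for illustration): with an odd k x k kernel and stride 1, padding (k - 1) / 2 on each side keeps the output the same size as the input.

```python
def same_conv_output_size(n, k, stride=1):
    """Output size of a conv over an input of size n with kernel k,
    using 'same' padding of (k - 1) // 2 on each side."""
    pad = (k - 1) // 2
    return (n + 2 * pad - k) // stride + 1

print(same_conv_output_size(224, 3))  # 224
print(same_conv_output_size(224, 5))  # 224
```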
Designing deep CNNs with "same" convolutions and without max pooling is challenging: the memory requirement and the number of parameters in the model become huge, and we also lose the advantages of max pooling.
So the alternative is to create a model with valid/strided convolutions plus max pooling. In that case the output is smaller than the original image, so we must apply some kind of upsampling to make the output (predictions) the same size as the input.
1. Nearest Neighbors: as the name suggests, we take an input pixel value and copy it to its K nearest neighbors in the output, where K depends on the expected output size.
2. Bed of Nails: we copy the value of each input pixel to the corresponding position in the output image and fill zeros in the remaining positions.
3. Max-Unpooling: the max-pooling layer in a CNN keeps the maximum among all the values under the kernel. To perform max-unpooling, the index of the maximum value is first saved for every max-pooling layer during the encoding step. The saved index is then used during the decoding step, where each input pixel is mapped back to its saved index and zeros are filled everywhere else.
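The three schemes above can be sketched in a few lines of NumPy, here all with an upsampling factor of 2 (the arrays and saved indices are illustrative):

```python
import numpy as np

def nearest_neighbor(x):
    # Copy each input pixel into a 2x2 block of the output.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def bed_of_nails(x):
    # Copy each pixel to the top-left of its 2x2 block, zeros elsewhere.
    out = np.zeros((x.shape[0] * 2, x.shape[1] * 2), dtype=x.dtype)
    out[::2, ::2] = x
    return out

def max_unpool(pooled, idx, out_shape):
    # idx holds, for each pooled value, the flat index where the maximum
    # originally lived (saved by the paired max-pooling layer).
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out.flat[idx.ravel()] = pooled.ravel()
    return out

x = np.array([[1, 2],
              [3, 4]])
print(nearest_neighbor(x))   # each value becomes a 2x2 block
print(bed_of_nails(x))       # values at even positions, zeros elsewhere

pooled = np.array([[4, 5],
                   [6, 7]])
idx = np.array([[4, 7],      # flat positions of the maxima
                [13, 10]])   # in the original 4x4 input
print(max_unpool(pooled, idx, (4, 4)))
```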
Transposed convolution
Implementing a transposed convolutional layer can be explained as a three-step process:
• First, we create an intermediate grid in which the original input's cells are spaced apart with a step size equal to the stride.
• Next, we extend the edges of the intermediate grid with additional zero-valued cells. We add the maximum number of these such that a kernel placed in the top-left corner still covers one of the original cells.
• Finally, the kernel is moved across this intermediate grid in steps of 1. This step size is always 1: the stride only sets how far apart the original cells are in the intermediate grid and, unlike in normal convolution, does not determine how the kernel moves.
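The steps above can be sketched directly in NumPy (a minimal implementation, not a library routine; the kernel is flipped in the last step so the result matches the usual "sum of shifted, scaled kernels" definition used in deep learning libraries, which convolve by cross-correlation):

```python
import numpy as np

def conv_transpose2d(x, k, stride=2):
    kh, kw = k.shape
    h, w = x.shape
    # Step 1: intermediate grid with the input cells `stride` apart.
    grid = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1))
    grid[::stride, ::stride] = x
    # Step 2: extend the edges with zeros so a kernel in the top-left
    # corner still covers one of the original cells.
    grid = np.pad(grid, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    # Step 3: move the (flipped) kernel across the grid in steps of 1.
    kf = k[::-1, ::-1]
    out = np.zeros((grid.shape[0] - kh + 1, grid.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(grid[i:i + kh, j:j + kw] * kf)
    return out

x = np.array([[1., 2.],
              [3., 4.]])
k = np.ones((2, 2))
print(conv_transpose2d(x, k, stride=2))
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]
```

With a 2x2 kernel of ones and stride 2 (kernel size equal to stride, so no overlaps), each input value simply fills its own 2x2 block of the output.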
http://makeyourownneuralnetwork.blogspot.com/2020/02/calculating-output-size-of-convolutions.html
Convolution
Transpose Convolution
Transpose Convolution With Stride 1, No Padding
Transpose Convolution With Stride 2, With Padding
Unlike in normal convolution, where padding is used to expand the image, here padding is used to reduce the output.
Transposed convolutions suffer from checkerboard artifacts.
These can be avoided or reduced by using a kernel size divisible by the stride, e.g. a kernel size of 2x2 or 4x4 with a stride of 2.
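A quick way to see the effect (a small illustrative check, not a full convolution): count how many kernel applications touch each output position of a 1-D transposed convolution. When the kernel size is not divisible by the stride the counts alternate, which is what produces the checkerboard pattern.

```python
import numpy as np

def coverage(n, k, stride):
    # How many kernel placements overlap each output cell for an input
    # of size n (1-D case, no cropping).
    out = np.zeros((n - 1) * stride + k, dtype=int)
    for i in range(n):
        out[i * stride : i * stride + k] += 1
    return out

print(coverage(4, 3, 2))  # kernel 3, stride 2: uneven interior
# [1 1 2 1 2 1 2 1 1]
print(coverage(4, 4, 2))  # kernel 4, stride 2: even interior
# [1 1 2 2 2 2 2 2 1 1]
```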