5. Object Detection and Segmentation - part 2

The document discusses semantic segmentation in deep learning, focusing on labeling each pixel in images without differentiating instances. It covers various datasets used for training, models like Fully Convolutional Networks (FCN), U-Net, and DeepLabV3+, and techniques for upsampling and downsampling. Additionally, it addresses the concepts of object detection, instance segmentation, and panoptic segmentation, along with their respective metrics for evaluation.


Deep Learning

Object Detection and Segmentation


Huỳnh Văn Thống
FPT Univ.
Semantic Segmentation
• Label each pixel in the image with a category label.
• Don’t differentiate instances, only care about pixels.
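A minimal sketch of what "label each pixel" means in practice, assuming the network outputs one score map per class (the class names and score values below are made up for illustration). Each pixel takes the class with the highest score, so two cows that touch end up with the same label.

```python
def label_pixels(scores):
    """Return an H x W label map from per-class score maps.

    scores[c][y][x] is the score of class c at pixel (y, x).
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    labels = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Pick the class with the highest score at this pixel.
            labels[y][x] = max(range(num_classes), key=lambda c: scores[c][y][x])
    return labels


# Hypothetical 2 x 3 image with 3 classes (0 = sky, 1 = cow, 2 = grass).
scores = [
    [[0.9, 0.8, 0.1], [0.1, 0.2, 0.1]],  # sky scores
    [[0.0, 0.1, 0.7], [0.2, 0.7, 0.6]],  # cow scores
    [[0.1, 0.1, 0.2], [0.7, 0.1, 0.3]],  # grass scores
]
label_map = label_pixels(scores)
print(label_map)  # [[0, 0, 1], [2, 1, 1]]
```

All "cow" pixels share label 1 regardless of how many cows are in the image; nothing in the output separates instances.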

2/24/2025 2
Segmentation: Dataset
• Pascal VOC: 16k training natural images divided into 20 classes.
• Cityscapes: 25K urban-street images divided into 30 classes.
• ADE20K: 25K scene-parsing images (the "20K" refers to the 20K training images) divided into 150 classes.
• MS COCO: 328K images with 80 thing categories and 91 stuff categories.

Models are often pre-trained on the large MS COCO dataset before being fine-tuned on the target dataset.

Semantic Segmentation: FCN
• FCN = Fully Convolutional Network.
• Design a network as a bunch of convolutional layers to make predictions for all pixels at once.

Problem #1: The effective receptive field size is linear in the number of conv layers: with L 3x3 conv layers, the receptive field is 1+2L.
Problem #2: Convolution on high-resolution images is expensive! Recall that the ResNet stem aggressively downsamples.

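To make Problem #1 concrete, here is a small sketch of the 1+2L rule for stacked stride-1 convolutions. The helper names and the 224-pixel example are assumptions chosen for illustration, not something from the slides.

```python
def receptive_field(num_layers, kernel_size=3):
    """Receptive field (pixels along one axis) of stacked stride-1 convs."""
    field = 1
    for _ in range(num_layers):
        # Each k x k layer widens the field by (k - 1) pixels.
        field += kernel_size - 1
    return field


def layers_to_cover(size, kernel_size=3):
    """Smallest number of stride-1 layers whose receptive field spans `size`."""
    return -(-(size - 1) // (kernel_size - 1))  # ceiling division


print(receptive_field(3))    # 7, i.e. 1 + 2 * 3
print(layers_to_cover(224))  # 112 layers to see a full 224-pixel side
```

Needing over a hundred full-resolution layers to see the whole image is what motivates downsampling inside the network.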
Semantic Segmentation: FCN
• Design network as a bunch of convolutional layers, with downsampling and upsampling inside the network!

Downsampling: pooling, strided convolution
Upsampling: ?

In-Network Upsampling: “Unpooling”

In-Network Upsampling: Bilinear Interpolation

Use the two closest neighbors in 𝑥 and 𝑦 to construct linear approximations.

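A rough sketch of bilinear upsampling in plain Python. It assumes the half-pixel coordinate convention (output pixel centers are mapped back to input coordinates) and clamps at the borders. Both are choices made for this example, and real frameworks expose them as options.

```python
import math


def bilinear_sample(image, y, x):
    """Sample a 2D list at real-valued (y, x) using the 2 nearest rows and columns."""
    height, width = len(image), len(image[0])
    # Clamp so that border pixels are repeated outside the image.
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    wy, wx = y - y0, x - x0
    # Linear interpolation along x on both rows, then along y.
    top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
    bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
    return (1 - wy) * top + wy * bottom


def upsample_bilinear(image, factor=2):
    """Upsample by an integer factor (half-pixel convention, an assumption)."""
    height, width = len(image), len(image[0])
    return [
        [bilinear_sample(image, (i + 0.5) / factor - 0.5, (j + 0.5) / factor - 0.5)
         for j in range(width * factor)]
        for i in range(height * factor)
    ]


small = [[1.0, 2.0], [3.0, 4.0]]
big = upsample_bilinear(small)  # 4 x 4 output
print(big[0])  # [1.0, 1.25, 1.75, 2.0]
```

Nothing here is learned: the output is a fixed function of the input, which is the contrast with transposed convolution.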
In-Network Upsampling: Bicubic Interpolation

Use the four closest neighbors in 𝑥 and 𝑦 (a 4 x 4 neighborhood) to construct cubic approximations.
(This is how we normally resize images.)

In-Network Upsampling: “Max Unpooling”
Max Pooling: Remember which position had the max.
Max Unpooling: Place into remembered positions.

Pair each downsampling layer with an upsampling layer.

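The pairing of max pooling with max unpooling can be sketched as follows. This is an illustrative pure-Python version that assumes 2 x 2 windows, stride 2, and even input sizes; the function names are hypothetical.

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2; also returns where each max came from."""
    pooled, positions = [], []
    for i in range(0, len(x), 2):
        pooled_row, pos_row = [], []
        for j in range(0, len(x[0]), 2):
            window = [(x[i + di][j + dj], (i + di, j + dj))
                      for di in range(2) for dj in range(2)]
            value, pos = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            pos_row.append(pos)
        pooled.append(pooled_row)
        positions.append(pos_row)
    return pooled, positions


def max_unpool_2x2(values, positions, height, width):
    """Place each value at its remembered position; fill the rest with zeros."""
    out = [[0] * width for _ in range(height)]
    for value_row, pos_row in zip(values, positions):
        for value, (i, j) in zip(value_row, pos_row):
            out[i][j] = value
    return out


features = [[1, 2, 6, 3],
            [3, 5, 2, 1],
            [1, 2, 2, 1],
            [7, 3, 4, 8]]
pooled, positions = max_pool_2x2(features)
print(pooled)  # [[5, 6], [7, 8]]

# Later in the network: unpool a 2 x 2 map using the remembered positions.
restored = max_unpool_2x2([[1, 2], [3, 4]], positions, 4, 4)
print(restored)  # [[0, 0, 2, 0], [0, 1, 0, 0], [0, 0, 0, 0], [3, 0, 0, 4]]
```

The values being unpooled come from a later layer, but they are placed where the maxima originally were, which helps keep spatial detail aligned.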
Learnable Upsampling: Transposed Convolution

Recall: Normal 3 x 3 convolution, stride 1, pad 1

Recall: Normal 3 x 3 convolution, stride 2, pad 1

Convolution with stride > 1 is “Learnable Downsampling”.

Can we use stride < 1 for “Learnable Upsampling”?

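As a quick check of the two recalled cases, the sketch below computes output sizes along one axis. The convolution formula is the standard one. The transposed-convolution formula follows the convention used by common deep learning frameworks (no dilation), and the `output_padding` parameter is an assumption borrowed from that convention, not from the slides.

```python
def conv_output_size(input_size, kernel_size, stride, padding):
    """Output length along one axis of a regular convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def transposed_conv_output_size(input_size, kernel_size, stride, padding,
                                output_padding=0):
    """Output length along one axis of a transposed convolution (no dilation)."""
    return (input_size - 1) * stride - 2 * padding + kernel_size + output_padding


print(conv_output_size(4, 3, stride=1, padding=1))  # 4: same size
print(conv_output_size(4, 3, stride=2, padding=1))  # 2: learnable downsampling
# A stride-2 transposed convolution acts like "stride 1/2" and maps 2 back to 4.
print(transposed_conv_output_size(2, 3, stride=2, padding=1, output_padding=1))  # 4
```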
Learnable Upsampling: Transposed Convolution

3 x 3 transposed convolution, stride 2

Sum where outputs overlap.

Transposed Convolution: 1D example

Output has copies of the filter weighted by the input.

Stride 2: Move 2 pixels in the output for each pixel in the input.

Sum at overlaps.

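The 1D case can be written out directly. This is a minimal sketch with no padding; the numeric input and filter values are arbitrary and chosen so the overlap is easy to see.

```python
def transposed_conv1d(inputs, kernel, stride=2):
    """1D transposed convolution without padding."""
    output = [0] * ((len(inputs) - 1) * stride + len(kernel))
    for i, value in enumerate(inputs):
        for k, weight in enumerate(kernel):
            # Paste a copy of the filter scaled by the input value,
            # shifted by `stride` per input pixel; overlaps are summed.
            output[i * stride + k] += value * weight
    return output


# Input [a, b] = [2, 3], filter [x, y, z] = [1, 10, 100].
print(transposed_conv1d([2, 3], [1, 10, 100]))  # [2, 20, 203, 30, 300]
```

The middle entry 203 is az + bx = 2 * 100 + 3 * 1, the only position where the two filter copies overlap.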
Many names:
• Deconvolution (bad).
• Upconvolution.
• Fractionally strided convolution.
• Backward strided convolution.
• Transposed convolution (best).

Semantic Segmentation: FCN
• Design network as a bunch of convolutional layers, with downsampling and upsampling inside the network!

Downsampling: pooling, strided convolution
Upsampling: interpolation, transposed convolution

Semantic Segmentation: FCN
• Combine predictions with different resolutions

Fully Convolutional Networks for Semantic Segmentation, Long et al., CVPR 2015

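One way to picture "combining predictions with different resolutions" is to upsample a coarse score map by 2x and add it to a finer score map, which is the spirit of the skip fusion in the FCN paper. The sketch below handles a single class score map and uses nearest-neighbor upsampling purely for brevity, which is an assumption; the integer scores are made up.

```python
def upsample_nearest_2x(scores):
    """Repeat every value in a 2D list twice along both axes."""
    out = []
    for row in scores:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse_predictions(coarse, fine):
    """Upsample the coarse score map 2x and sum it with the finer one."""
    upsampled = upsample_nearest_2x(coarse)
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(upsampled, fine)]


coarse = [[10, 20]]                    # scores from a deep, low-resolution layer
fine = [[1, 2, 3, 4], [5, 6, 7, 8]]    # scores from a shallower, 2x finer layer
fused = fuse_predictions(coarse, fine)
print(fused)  # [[11, 12, 23, 24], [15, 16, 27, 28]]
```

The coarse map contributes context and the fine map contributes spatial detail. Repeating the fusion with even shallower layers gives progressively sharper predictions.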
Semantic Segmentation: U-Net
• Incorporate low-level information from the encoder into the decoder through skip connections.

U-Net: Convolutional Networks for Biomedical Image Segmentation, Ronneberger et al., MICCAI 2015

Semantic Segmentation: DeepLabV3+
• Encode multi-scale contextual information by applying atrous convolution at multiple scales.

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, Chen et al., ECCV 2018

Atrous Convolution

Sparse feature extraction with standard convolution on a low-resolution input feature map.

Dense feature extraction with atrous convolution with rate r=2, applied on a high-resolution input feature map.

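A 1D sketch of atrous (dilated) convolution may help here: the filter taps are spaced `rate` pixels apart, so the same three weights cover a wider window without any downsampling. The example uses no padding and no filter flipping (the usual deep learning convention); the signal and filter values are arbitrary.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """1D atrous convolution, no padding; rate=1 is a standard convolution."""
    span = (len(kernel) - 1) * rate  # distance between first and last tap
    return [
        sum(weight * signal[start + k * rate] for k, weight in enumerate(kernel))
        for start in range(len(signal) - span)
    ]


def effective_kernel_size(kernel_size, rate):
    """Window covered by a filter once gaps of (rate - 1) are inserted."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


signal = [1, 2, 3, 4, 5, 6, 7]
print(atrous_conv1d(signal, [1, 1, 1], rate=1))  # [6, 9, 12, 15, 18]
print(atrous_conv1d(signal, [1, 1, 1], rate=2))  # [9, 12, 15]
print(effective_kernel_size(3, 2))               # 5
```

Running the same filter with several rates and combining the results is one way to collect context at multiple scales.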
Semantic Segmentation: DeepLabV3+
• Encode multi-scale contextual information by applying atrous convolution at multiple scales.
• Refine the segmentation results along object boundaries.

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, Chen et al., ECCV 2018

Computer Vision Tasks
Object Detection: Detects individual object instances, but only gives boxes.
Semantic Segmentation: Gives per-pixel labels, but merges instances.

Things and Stuff
Things: Object categories that can be separated into object instances (e.g. cats, cars, people).

Stuff: Object categories that cannot be separated into instances (e.g. sky, grass, water, trees).

Computer Vision Tasks
Object Detection: Detects individual object instances, but only gives boxes. (Only things)
Semantic Segmentation: Gives per-pixel labels, but merges instances. (Both things and stuff)

Computer Vision Tasks
Instance Segmentation: Detect all objects in the image and identify the pixels that belong to each object. (Only things!)
Semantic Segmentation: Gives per-pixel labels, but merges instances. (Both things and stuff)

Computer Vision Tasks: Instance Segmentation
Instance Segmentation: Detect all objects in the image, and identify the pixels that belong to each object. (Only things!)

Approach: Perform object detection, then predict a segmentation mask for each object!

Beyond Instance Segmentation: Panoptic Segmentation

• Label all pixels in the image (both things and stuff).
• For “thing” categories, also separate into instances.

Panoptic quality (PQ) measure
• Computed per category, and the results are averaged across categories.
• Ground-truth and predicted segments are matched when their IoU exceeds 0.5.
• TP (matched pairs), FP (unmatched predicted segments), and FN (unmatched ground-truth segments).

SQ: how close the predicted segments are to the ground-truth segments (does not consider bad predictions!).
RQ: just like for detection, we want to know if we are missing any instances (FN) or predicting extra instances (FP).

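The quantities above combine as PQ = SQ x RQ, where SQ is the mean IoU over matched pairs and RQ = |TP| / (|TP| + 0.5 |FP| + 0.5 |FN|). The sketch below computes them for a single category, assuming each segment is given as a set of pixel ids and that segments within one image do not overlap (so an IoU above 0.5 gives a unique match). The data is invented for illustration.

```python
def mask_iou(a, b):
    """IoU of two segments given as sets of pixel ids."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(predicted, ground_truth, threshold=0.5):
    """Return (PQ, SQ, RQ) for one category."""
    matched_ious, used = [], set()
    for gt in ground_truth:
        for idx, pred in enumerate(predicted):
            if idx in used:
                continue
            iou = mask_iou(pred, gt)
            if iou > threshold:  # matched pair: a true positive
                matched_ious.append(iou)
                used.add(idx)
                break
    tp = len(matched_ious)
    fp = len(predicted) - tp      # unmatched predicted segments
    fn = len(ground_truth) - tp   # unmatched ground-truth segments
    denominator = tp + 0.5 * fp + 0.5 * fn
    if denominator == 0:
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / tp if tp else 0.0
    rq = tp / denominator
    return sq * rq, sq, rq


ground_truth = [{0, 1, 2, 3}, {10, 11, 12, 13}]
predicted = [{0, 1, 2}, {20, 21}]
print(panoptic_quality(predicted, ground_truth))  # (0.375, 0.75, 0.5)
```

Here one pair matches with IoU 0.75 (TP = 1), one prediction is spurious (FP = 1), and one ground-truth segment is missed (FN = 1). The final score would average this per-category PQ over all categories.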
Next
• Visualization and Understanding
• Attention and Transformer
• Foundation Models and Promptable Segmentation.
• ….

Questions?
