5. Object Detection and Segmentation - part 2
2/24/2025 2
Segmentation: Dataset
• Pascal VOC: 16K natural training images divided into 20 classes.
• Cityscapes: 25K urban street-scene images divided into 30 classes.
• ADE20K: 25K scene-parsing images (the “20K” refers to the 20K
training images) divided into 150 classes.
• MS COCO: 328K images with 80 thing categories and 91 stuff
categories.
Semantic Segmentation: FCN
• FCN = Fully Convolutional Network.
• Design a network as a bunch of convolutional layers to make
predictions for pixels all at once.
Problem #1: Effective receptive field size is linear in the number of conv layers: with L 3x3 conv layers, the receptive field is 1+2L.
Problem #2: Convolution on high-res images is expensive! Recall that the ResNet stem aggressively downsamples.
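The receptive-field claim in Problem #1 can be checked with a few lines of arithmetic. This is a minimal sketch that assumes stride-1 convolutions with no dilation; the function name `receptive_field` is illustrative and not from the slides.

```python
def receptive_field(num_layers: int, kernel_size: int = 3) -> int:
    """Receptive field (along one axis) of a stack of stride-1 convolutions."""
    # Each stride-1 layer with a k x k kernel widens the field by (k - 1) pixels.
    return 1 + num_layers * (kernel_size - 1)


# With 3x3 kernels this reduces to 1 + 2L, so growth is only linear in depth.
for num_layers in (1, 5, 10, 50):
    print(num_layers, receptive_field(num_layers))
```

Even 50 such layers see only a 101-pixel-wide window, which is why downsampling inside the network is attractive.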
Semantic Segmentation: FCN
• Design network as a bunch of convolutional layers, with
downsampling and upsampling inside the network!
Downsampling: pooling, strided convolution.
Upsampling: ?
In-Network Upsampling: “Unpooling”
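The figure for this slide is not reproduced here. As a rough illustration, two common parameter-free unpooling schemes are nearest-neighbor (copy each value into a block) and bed of nails (place each value in one corner of a block and fill the rest with zeros). Whether the slide showed exactly these two variants is an assumption, and the function names below are illustrative.

```python
def unpool_nearest(x, factor=2):
    """Nearest-neighbor unpooling: copy each value into a factor x factor block."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(factor)]  # repeat along width
        out.extend([list(wide) for _ in range(factor)])  # repeat along height
    return out


def unpool_bed_of_nails(x, factor=2):
    """Bed-of-nails unpooling: value in the top-left of each block, zeros elsewhere."""
    h, w = len(x), len(x[0])
    out = [[0] * (w * factor) for _ in range(h * factor)]
    for i in range(h):
        for j in range(w):
            out[i * factor][j * factor] = x[i][j]
    return out


x = [[1, 2],
     [3, 4]]
print(unpool_nearest(x))
print(unpool_bed_of_nails(x))
```

Both turn a 2x2 input into a 4x4 output without any learnable parameters.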
In-Network Upsampling: Bilinear Interpolation
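Bilinear interpolation computes each output pixel as a weighted average of its four nearest input pixels. The sketch below is a plain-Python illustration that assumes the corner pixels of input and output are aligned (one of several coordinate conventions used in practice); `bilinear_upsample` is a hypothetical helper name.

```python
def bilinear_upsample(x, out_h, out_w):
    """Resize a 2D grid with bilinear interpolation (corner-aligned coordinates)."""
    in_h, in_w = len(x), len(x[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        # Fractional source row for this output row.
        sy = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        wy = sy - y0
        for j in range(out_w):
            # Fractional source column for this output column.
            sx = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            wx = sx - x0
            # Interpolate along the width first, then along the height.
            top = (1 - wx) * x[y0][x0] + wx * x[y0][x1]
            bottom = (1 - wx) * x[y1][x0] + wx * x[y1][x1]
            out[i][j] = (1 - wy) * top + wy * bottom
    return out


small = [[0, 2],
         [4, 6]]
for row in bilinear_upsample(small, 3, 3):
    print(row)
```

The new center value is the average of all four inputs, and the new edge values are averages of the two neighbors they sit between.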
In-Network Upsampling: Bicubic Interpolation
In-Network Upsampling: “Max Unpooling”
Max Pooling: Remember which position had the max.
Max Unpooling: Place into remembered positions.
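The pairing of max pooling and max unpooling can be sketched in plain Python. This is an illustrative version that assumes non-overlapping k x k windows and an input whose sides are divisible by k; the function names are not from the slides.

```python
def max_pool_with_indices(x, k=2):
    """k x k max pooling (stride k) that also records where each max came from."""
    pooled, indices = [], []
    for i in range(0, len(x), k):
        pooled_row, index_row = [], []
        for j in range(0, len(x[0]), k):
            # All (row, col) positions inside the current window.
            window = [(a, b) for a in range(i, i + k) for b in range(j, j + k)]
            a, b = max(window, key=lambda pos: x[pos[0]][pos[1]])
            pooled_row.append(x[a][b])
            index_row.append((a, b))
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(values, indices, out_h, out_w):
    """Place each value at its remembered position; everything else stays zero."""
    out = [[0] * out_w for _ in range(out_h)]
    for value_row, index_row in zip(values, indices):
        for v, (a, b) in zip(value_row, index_row):
            out[a][b] = v
    return out


x = [[1, 2, 6, 3],
     [3, 5, 2, 1],
     [1, 2, 2, 1],
     [7, 3, 4, 8]]
pooled, idx = max_pool_with_indices(x)
restored = max_unpool([[1, 2], [3, 4]], idx, 4, 4)
print(pooled)
print(restored)
```

In a network, the values passed to the unpooling step come from later layers, while the positions come from the matching pooling layer earlier in the network.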
Learnable Upsampling: Transposed Convolution
Transposed Convolution: 1D example
Sum at overlaps.
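A 1D transposed convolution can be written as "scale the filter by each input value, paste the copies `stride` positions apart, and sum at overlaps". The sketch below assumes no padding; `conv_transpose_1d` is an illustrative name.

```python
def conv_transpose_1d(x, kernel, stride=2):
    """1D transposed convolution without padding."""
    out = [0] * ((len(x) - 1) * stride + len(kernel))
    for i, xi in enumerate(x):
        for k, wk in enumerate(kernel):
            # Each input value pastes a scaled copy of the kernel into the output;
            # copies that land on the same position are summed.
            out[i * stride + k] += xi * wk
    return out


# Input [a, b] = [1, 2] and filter [x, y, z] = [1, 10, 100] with stride 2
# give [ax, ay, az + bx, by, bz].
result = conv_transpose_1d([1, 2], [1, 10, 100], stride=2)
print(result)  # [1, 10, 102, 20, 200]
```

The middle output, 102, is the one position where the two scaled filter copies overlap.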
Transposed Convolution: 1D example
Transposed convolution goes by many names:
• Deconvolution (bad).
• Upconvolution.
• Fractionally strided convolution.
• Backward strided convolution.
• Transposed convolution (best).
Semantic Segmentation: FCN
• Design network as a bunch of convolutional layers, with
downsampling and upsampling inside the network!
Downsampling: pooling, strided convolution.
Upsampling: interpolation, transposed convolution.
Semantic Segmentation: FCN
• Combine predictions with different resolutions
Fully Convolutional Networks for Semantic Segmentation. Long et al., CVPR, 2015
Semantic Segmentation: U-Net
• Incorporates low-level information by passing encoder features
to the decoder through skip connections.
Semantic Segmentation: DeepLabV3+
• Encode multi-scale contextual
information by applying atrous
convolution at multiple scales.
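Atrous (dilated) convolution spaces the kernel taps `rate` positions apart, so the same number of weights covers a wider context. The 1D sketch below only illustrates that idea and the notion of applying several rates in parallel; it is not DeepLabV3+'s actual module, and the function name and chosen rates are assumptions.

```python
def atrous_conv_1d(x, kernel, rate=1):
    """'Valid' 1D atrous convolution (cross-correlation form, no padding)."""
    span = (len(kernel) - 1) * rate + 1  # input extent covered by the kernel
    return [
        sum(w * x[i + k * rate] for k, w in enumerate(kernel))
        for i in range(len(x) - span + 1)
    ]


signal = list(range(10))
kernel = [1, 1, 1]

# The same 3-tap kernel applied at several rates sees 3, 5, and 7 inputs.
multi_scale = {rate: atrous_conv_1d(signal, kernel, rate) for rate in (1, 2, 3)}
for rate, response in multi_scale.items():
    print(rate, response)
```

A multi-scale head would then combine these parallel responses, typically after padding so that all of them keep the same length.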
Things and Stuff
Things: Object categories that can be separated into object instances (e.g. cats, cars, people).
Stuff: Object categories that cannot be separated into instances (e.g. sky, grass, water, trees).
Computer Vision Tasks
Object Detection: Detects individual object instances, but only gives a box. (Only things)
Semantic Segmentation: Gives per-pixel labels, but merges instances. (Both things and stuff)
Computer Vision Tasks
Instance Segmentation: Detect all objects in the image and identify the pixels that belong to each object. (Only things!)
Semantic Segmentation: Gives per-pixel labels, but merges instances. (Both things and stuff)
Computer Vision Tasks: Instance Segmentation
Instance Segmentation: Detect all
objects in the image, and identify the
pixels that belong to each object.
(Only things!)
Beyond Instance Segmentation: Panoptic Segmentation
Panoptic quality (PQ) measure
• Computed per category; results are averaged across categories.
• Ground-truth and predicted segments are matched when their
IoU exceeds 0.5.
• Matching yields TP (matched pairs), FP (unmatched predicted
segments), and FN (unmatched ground-truth segments).
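The slide text does not state the formula itself. The usual definition of panoptic quality for one category is the sum of IoUs over matched pairs divided by |TP| + 0.5 |FP| + 0.5 |FN|. The sketch below follows that definition under some simplifying assumptions: each segment is a set of pixel ids, segments within a prediction or within the ground truth do not overlap, and the function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two segments given as sets of pixel ids."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(pred_segments, gt_segments, threshold=0.5):
    """PQ for a single category."""
    matched_pred = set()
    iou_sum = 0.0
    for g in gt_segments:
        for pi, p in enumerate(pred_segments):
            if pi in matched_pred:
                continue
            score = iou(p, g)
            # With non-overlapping segments, IoU > 0.5 makes the match unique.
            if score > threshold:
                matched_pred.add(pi)
                iou_sum += score
                break
    tp = len(matched_pred)
    fp = len(pred_segments) - tp  # unmatched predicted segments
    fn = len(gt_segments) - tp    # unmatched ground-truth segments
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


gt = [{0, 1, 2, 3}, {10, 11}]
pred = [{0, 1, 2}, {20, 21}]
pq = panoptic_quality(pred, gt)
print(pq)  # one match with IoU 0.75, one FP, one FN: 0.75 / 2 = 0.375
```

The overall score would then be the mean of this per-category value across all categories.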
Questions?