5. Object Detection and Segmentation - part 2

The document discusses semantic segmentation in deep learning, focusing on labeling each pixel in images without differentiating instances. It covers various datasets used for training, models like Fully Convolutional Networks (FCN), U-Net, and DeepLabV3+, and techniques for upsampling and downsampling. Additionally, it addresses the concepts of object detection, instance segmentation, and panoptic segmentation, along with their respective metrics for evaluation.

Deep Learning

Object Detection and Segmentation

Huỳnh Văn Thống
FPT Univ.
Semantic Segmentation
• Label each pixel in the image with a category label.
• Don’t differentiate instances; only care about pixels.
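
As a minimal sketch of what "label each pixel" means in practice, the snippet below takes per-pixel class scores and keeps the highest-scoring class at every position. The class names, the scores, and the helper `label_pixels` are hypothetical and only serve the illustration.

```python
# Hypothetical class list and scores, made up for illustration.
CLASS_NAMES = ["sky", "road", "car"]


def label_pixels(scores):
    """Return an H x W map with the highest-scoring class index per pixel.

    scores[y][x] is a list holding one score per class.
    """
    return [
        [max(range(len(pixel)), key=lambda c: pixel[c]) for pixel in row]
        for row in scores
    ]


# A 2 x 3 "image" with 3 class scores per pixel.
scores = [
    [[0.9, 0.1, 0.0], [0.8, 0.1, 0.1], [0.2, 0.1, 0.7]],
    [[0.1, 0.8, 0.1], [0.1, 0.2, 0.7], [0.0, 0.3, 0.7]],
]

label_map = label_pixels(scores)
print(label_map)  # [[0, 0, 2], [1, 2, 2]]
```

Every pixel labeled 2 is simply "car": the label map does not say whether those pixels belong to one car or to several, which is the "no instances" property stated above.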

2/24/2025 2
Segmentation: Dataset
• Pascal VOC: 16K natural images for training, divided into 20 classes.
• Cityscapes: 25K urban-street images divided into 30 classes.
• ADE20K: 25K scene-parsing images (the “20K” in the name refers to the 20K training images) divided into 150 classes.
• MS COCO: 328K images with 80 thing categories and 91 stuff categories.

Models are often pre-trained on the large MS-COCO dataset before being fine-tuned on the target dataset.

Semantic Segmentation: FCN
• FCN = Fully Convolutional Network.
• Design a network as a bunch of convolutional layers to make
predictions for pixels all at once.

Semantic Segmentation: FCN
• Design a network as a bunch of convolutional layers to make
predictions for pixels all at once.

Problem #1: Effective receptive field size is linear in the number of conv layers: with L 3x3 conv layers, the receptive field is 1+2L.
Problem #2: Convolution on high-res images is expensive! Recall that the ResNet stem aggressively downsamples.

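The 1+2L rule from Problem #1 can be checked with a few lines of Python. This is a rough sketch that assumes stride-1 layers with no dilation; the function names and the 224-pixel target are illustrative assumptions, not values from the slides.

```python
def receptive_field(num_layers, kernel_size=3):
    """Receptive field of num_layers stacked stride-1 convolutions.

    Each layer widens the field by (kernel_size - 1) pixels,
    so 3x3 layers give 1 + 2L.
    """
    return 1 + num_layers * (kernel_size - 1)


def layers_needed(target, kernel_size=3):
    """Smallest number of stride-1 layers whose receptive field covers target pixels."""
    layers = 0
    while receptive_field(layers, kernel_size) < target:
        layers += 1
    return layers


print(receptive_field(5))  # 11
print(layers_needed(224))  # 112 (224 is an assumed image size)
```

Covering an assumed 224-pixel-wide image this way takes over a hundred full-resolution layers, which is why the network needs downsampling inside it.
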
Semantic Segmentation: FCN
• Design network as a bunch of convolutional layers, with
downsampling and upsampling inside the network!

Downsampling: Pooling, strided convolution
Upsampling: ?

In-Network Upsampling: “Unpooling”

In-Network Upsampling: Bilinear Interpolation

Use the two closest neighbors in 𝑥 and 𝑦 to construct linear approximations.

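A small pure-Python sketch of bilinear upsampling follows. It assumes the "align corners" convention (the corner pixels of input and output coincide), which is only one of several conventions in use; the function names are illustrative.

```python
import math


def bilinear_sample(img, x, y):
    """Interpolate img (a list of rows) at the fractional position (x, y)."""
    h, w = len(img), len(img[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    # Linear interpolation along x on the two neighboring rows...
    top = (1 - dx) * img[y0][x0] + dx * img[y0][x1]
    bottom = (1 - dx) * img[y1][x0] + dx * img[y1][x1]
    # ...then linear interpolation along y between those two results.
    return (1 - dy) * top + dy * bottom


def upsample_bilinear(img, out_h, out_w):
    """Resize img to out_h x out_w (both >= 2), assuming aligned corners."""
    h, w = len(img), len(img[0])
    scale_y = (h - 1) / (out_h - 1)
    scale_x = (w - 1) / (out_w - 1)
    return [
        [bilinear_sample(img, j * scale_x, i * scale_y) for j in range(out_w)]
        for i in range(out_h)
    ]


small = [[0, 10], [20, 30]]
for row in upsample_bilinear(small, 3, 3):
    print(row)
# [0.0, 5.0, 10.0]
# [10.0, 15.0, 20.0]
# [20.0, 25.0, 30.0]
```

There are no learnable parameters here: the output is fully determined by the input values and the chosen output size.
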
In-Network Upsampling: Bicubic Interpolation

Use the four closest neighbors along each of 𝑥 and 𝑦 (a 4 x 4 neighborhood) to construct cubic approximations.
(This is how we normally resize images.)

In-Network Upsampling: “Max Unpooling”
Max Pooling: Remember which position had the max.
Max Unpooling: Place into remembered positions.

Pair each downsampling layer with an upsampling layer.

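The pairing of max pooling with max unpooling can be sketched in plain Python as below. The 4 x 4 input and the values being unpooled are illustrative numbers, and the helper names are assumptions made for this example.

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2; also returns where each max came from."""
    pooled, positions = [], []
    for i in range(0, len(x), 2):
        pooled_row, pos_row = [], []
        for j in range(0, len(x[0]), 2):
            window = [
                (x[i + di][j + dj], (i + di, j + dj))
                for di in (0, 1)
                for dj in (0, 1)
            ]
            value, pos = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            pos_row.append(pos)
        pooled.append(pooled_row)
        positions.append(pos_row)
    return pooled, positions


def max_unpool_2x2(values, positions, out_h, out_w):
    """Place each value at its remembered position; everything else stays 0."""
    out = [[0] * out_w for _ in range(out_h)]
    for value_row, pos_row in zip(values, positions):
        for value, (r, c) in zip(value_row, pos_row):
            out[r][c] = value
    return out


x = [
    [1, 2, 6, 3],
    [3, 5, 2, 1],
    [1, 2, 2, 1],
    [7, 3, 4, 8],
]
pooled, positions = max_pool_2x2(x)
print(pooled)  # [[5, 6], [7, 8]]

# Later in the network, a 2x2 feature map is unpooled with the same positions.
unpooled = max_unpool_2x2([[1, 2], [3, 4]], positions, 4, 4)
for row in unpooled:
    print(row)
# [0, 0, 2, 0]
# [0, 1, 0, 0]
# [0, 0, 0, 0]
# [3, 0, 0, 4]
```
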
Learnable Upsampling: Transposed Convolution

Recall: Normal 3 x 3 convolution, stride 1, pad 1

Learnable Upsampling: Transposed Convolution

Recall: Normal 3 x 3 convolution, stride 2, pad 1

Convolution with stride > 1 is “Learnable Downsampling”

Can we use stride < 1 for “Learnable Upsampling”?
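
To make the "learnable downsampling" remark concrete, the sketch below applies the usual output-size formula for a convolution, floor((W + 2P - K) / S) + 1. The input sizes 4 and 224 are assumed examples, and `conv_output_size` is an illustrative helper name.

```python
def conv_output_size(in_size, kernel_size, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * padding - kernel_size) // stride + 1


# 3 x 3 convolution, stride 1, pad 1: the size is preserved.
print(conv_output_size(4, kernel_size=3, stride=1, padding=1))  # 4

# 3 x 3 convolution, stride 2, pad 1: the size is halved.
print(conv_output_size(4, kernel_size=3, stride=2, padding=1))  # 2
print(conv_output_size(224, kernel_size=3, stride=2, padding=1))  # 112
```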

Learnable Upsampling: Transposed Convolution

3 x 3 transposed convolution, stride 2

Learnable Upsampling: Transposed Convolution

3 x 3 transposed convolution, stride 2. Sum where outputs overlap.

Transposed Convolution: 1D example

Output has copies of the filter weighted by the input.
Stride 2: Move 2 pixels in the output for each pixel in the input.
Sum at overlaps.

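The three rules above translate almost line by line into code. The following is a minimal sketch with no padding or cropping; the input and filter values are made up so that each contribution is easy to spot in the output.

```python
def transposed_conv1d(inputs, kernel, stride):
    """1D transposed convolution without padding or cropping."""
    out_len = (len(inputs) - 1) * stride + len(kernel)
    out = [0] * out_len
    for i, value in enumerate(inputs):
        # Each input element pastes a copy of the filter, scaled by its value,
        # starting `stride` positions after the previous copy.
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight  # overlaps are summed
    return out


# Input [a, b] = [2, 3] and filter [x, y, z] = [1, 10, 100] with stride 2
# give [a*x, a*y, a*z + b*x, b*y, b*z].
print(transposed_conv1d([2, 3], [1, 10, 100], stride=2))
# [2, 20, 203, 30, 300]
```

The middle entry, 203, is the only position where the two filter copies overlap (2 * 100 + 3 * 1).
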
Transposed Convolution: 1D example
Many names:
• Deconvolution (bad).
• Upconvolution.
• Fractionally strided convolution.
• Backward strided convolution.
• Transposed convolution (best).

Semantic Segmentation: FCN
• Design network as a bunch of convolutional layers, with
downsampling and upsampling inside the network!

Downsampling: Pooling, strided convolution
Upsampling: Interpolation, transposed convolution

Semantic Segmentation: FCN
• Combine predictions with different resolutions

Fully Convolutional Networks for Semantic Segmentation. Long et al., CVPR, 2015
Semantic Segmentation: U-Net
• Incorporate low-level information by passing encoder features to the decoder through skip connections.

U-Net: Convolutional Networks for Biomedical Image Segmentation, Ronneberger et al., MICCAI 2015

Semantic Segmentation: DeepLabV3+
• Encode multi-scale contextual
information by applying atrous
convolution at multiple scales

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, Chen et al., ECCV 2018

Atrous Convolution

Sparse feature extraction with standard convolution on a low-resolution input feature map.

Dense feature extraction with atrous convolution with rate r=2, applied on a high-resolution input feature map.

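A 1D sketch may help show what the rate r does: the filter taps are spaced r positions apart, so the same three weights cover a wider span of the input. The signal, the filter, and the function name are illustrative assumptions, and the operation is written as cross-correlation (no filter flip), as is common in deep learning code.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid 1D atrous (dilated) convolution with taps spaced `rate` apart."""
    span = (len(kernel) - 1) * rate + 1  # effective filter size
    return [
        sum(weight * signal[start + k * rate] for k, weight in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

# rate=1 is a standard convolution: the filter spans 3 inputs.
print(atrous_conv1d(signal, kernel, rate=1))  # [-2, -2, -2, -2, -2]

# rate=2 spans 5 inputs with the same 3 weights.
print(atrous_conv1d(signal, kernel, rate=2))  # [-4, -4, -4]
```

The number of weights stays the same while the receptive field grows, which is how atrous convolution gathers wider context without further downsampling.
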
Semantic Segmentation: DeepLabV3+
• Encode multi-scale contextual
information by applying atrous
convolution at multiple scales.

• Refine the segmentation results along object boundaries.

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, Chen et al., ECCV 2018

Computer Vision Tasks
Object Detection: Detects individual object instances, but only gives boxes.
Semantic Segmentation: Gives per-pixel labels, but merges instances.

Things and Stuff
Things: Object categories that can be separated into object instances (e.g. cats, cars, people).

Stuff: Object categories that cannot be separated into instances (e.g. sky, grass, water, trees).

Computer Vision Tasks
Object Detection: Detects individual object instances, but only gives boxes. (Only things)
Semantic Segmentation: Gives per-pixel labels, but merges instances. (Both things and stuff)

Computer Vision Tasks
Instance Segmentation: Detect all objects in the image and identify the pixels that belong to each object. (Only things!)
Semantic Segmentation: Gives per-pixel labels, but merges instances. (Both things and stuff)

Computer Vision Tasks: Instance Segmentation
Instance Segmentation: Detect all objects in the image, and identify the pixels that belong to each object. (Only things!)

Approach: Perform object detection, then predict a segmentation mask for each object!

Beyond Instance Segmentation: Panoptic Segmentation

• Label all pixels in the image (both things and stuff).
• For “thing” categories, also separate into instances.

Panoptic quality (PQ) measure
• Computed per category; results are averaged across categories.
• The ground truth and predicted segments are matched with an IoU threshold of 0.5.
• TP (matched pairs), FP (unmatched predicted segments), and FN (unmatched ground truth segments).

SQ (segmentation quality): how close the predicted segments are to the ground truth segments (does not consider bad predictions!).
RQ (recognition quality): just like for detection, we want to know if we are missing any instances (FN) or predicting extra instances (FP).

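As a rough sketch for a single category, PQ can be computed from the matched IoUs together with the FP and FN counts, using the standard definition PQ = SQ x RQ with SQ = (sum of matched IoUs) / |TP| and RQ = |TP| / (|TP| + 0.5 |FP| + 0.5 |FN|). Segments are represented here as sets of hypothetical pixel indices, a match requires IoU strictly above the threshold, and the function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two segments given as sets of pixel indices."""
    return len(a & b) / len(a | b)


def panoptic_quality(pred_segments, gt_segments, threshold=0.5):
    """Return (PQ, SQ, RQ) for one category.

    Assumes the segments within each list do not overlap, so a ground truth
    segment can match at most one prediction with IoU > 0.5.
    """
    matched_ious = []
    used = set()
    for gt in gt_segments:
        for idx, pred in enumerate(pred_segments):
            if idx in used:
                continue
            score = iou(pred, gt)
            if score > threshold:
                matched_ious.append(score)
                used.add(idx)
                break
    tp = len(matched_ious)
    fp = len(pred_segments) - tp  # unmatched predicted segments
    fn = len(gt_segments) - tp    # unmatched ground truth segments
    denom = tp + 0.5 * fp + 0.5 * fn
    if denom == 0:
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / tp if tp else 0.0
    rq = tp / denom
    return sq * rq, sq, rq


# Two ground truth segments; one is found (IoU 0.8), one is missed (FN),
# and one prediction matches nothing (FP).
gt_segments = [set(range(0, 10)), set(range(20, 30))]
pred_segments = [set(range(0, 8)), set(range(40, 45))]

pq, sq, rq = panoptic_quality(pred_segments, gt_segments)
print(round(pq, 3), round(sq, 3), round(rq, 3))  # 0.4 0.8 0.5
```

The final PQ reported for a dataset would then be the average of this value over all categories, as stated in the first bullet.
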
Next
• Visualization and Understanding
• Attention and Transformer
• Foundation Models and Promptable Segmentation.
• ….

Questions?

