Object Detection with YOLO
Chad Wakamiya
Spring 2020
Agenda
[Figure: example images with objects labeled Cat, Car, and Dog, each enclosed in a bounding box.]
Naive Approach
1. Scan the image with a sliding window.
2. Feed each window into a classifier model to predict a label for that region.
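The two steps above can be sketched as a window generator. This is a minimal illustration, not code from the slides: the function name `sliding_windows` and the parameters `win` and `stride` are hypothetical, and the classifier call is left out.

```python
from typing import Iterator, Tuple


def sliding_windows(img_w: int, img_h: int, win: int, stride: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) for every square window that fits fully inside the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, win, win)


# Each window would be cropped and passed to the classifier (not shown here).
windows = list(sliding_windows(img_w=8, img_h=6, win=4, stride=2))
print(len(windows))  # one classifier call is needed per window
```

The number of classifier calls grows with the image size and shrinks with the stride, which is why this approach becomes expensive.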
[Figure: each window is fed to a classifier model (CNN) that outputs a label: Dog? Person? Nothing?]
[Figure: Input image → Convolutional Neural Net → Output: a bounding box with a label and confidence probability (Car: 0.93).]
YOLO Steps
1. Divide the image into cells with an S x S grid.
2. Each cell predicts B bounding boxes.
3. Return bounding boxes above the confidence threshold.
[Figure: example with S=3 and B=2; the returned box is labeled Car: 0.93.]
A cell is responsible for detecting an object if the center of the object's bounding box falls within the cell. (Notice that each cell has 2 blue dots.)
All other bounding boxes have a confidence probability less than the threshold (say 0.90), so they are suppressed.
In practice, we would use larger values (S = 19 and B = 5) to identify more objects.
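Step 3 can be sketched as a simple filter over the predicted boxes. The `Detection` type and `filter_by_confidence` function are hypothetical names introduced for illustration, and the sketch assumes a box is kept when its confidence is at or above the threshold.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    label: str
    confidence: float


def filter_by_confidence(dets: List[Detection], threshold: float) -> List[Detection]:
    """Keep only detections whose confidence reaches the threshold."""
    return [d for d in dets if d.confidence >= threshold]


S, B = 3, 2
total_boxes = S * S * B  # every cell predicts B boxes

# Made-up confidences for three of the predicted boxes.
detections = [Detection("car", 0.93), Detection("car", 0.40), Detection("dog", 0.12)]
kept = filter_by_confidence(detections, threshold=0.90)
print(total_boxes, kept)
```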
How are bounding boxes encoded?
Let's use a simple example where there are 3x3 cells (S=3), each cell predicts 1 bounding box (B=1),
and objects are either dog = 1 or human = 2. For each cell, the CNN predicts a vector y:
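$$y = \begin{bmatrix} p_c & b_x & b_y & b_h & b_w & c_1 & c_2 \end{bmatrix}^\top$$
● pc: Probability the bounding box contains an object.
● bx, by: Coordinates of the bounding box's center.
● bh, bw: Height and width of the bounding box as a percent of the cell's height and width.
● c1, c2: Probability the cell contains an object that belongs to class 1 (or 2), given the cell contains an object.
Example: $y = \begin{bmatrix} 1 & b_x & b_y & b_h & b_w & 0 & 1 \end{bmatrix}^\top$ describes a cell that contains an object (pc = 1) belonging to class 2, a human (c1 = 0, c2 = 1).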
*There's a probability for each class so if there are 80 classes we would have c1,…c80
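To make the encoding concrete, here is a hedged sketch that converts one cell's vector y into pixel coordinates. It assumes bx and by are offsets in [0, 1] measured from the cell's top-left corner, and that bh and bw are multiples of the cell size; the slides do not state the origin, so treat that as an assumption. `decode_cell` is a hypothetical helper.

```python
from typing import Dict, Sequence


def decode_cell(y: Sequence[float], row: int, col: int, cell_w: float, cell_h: float) -> Dict:
    """Convert one cell's prediction [pc, bx, by, bh, bw, c1, ..., cC] to pixel units."""
    pc, bx, by, bh, bw, *class_probs = y
    center_x = (col + bx) * cell_w  # assumed: bx is an offset inside the cell
    center_y = (row + by) * cell_h  # assumed: by is an offset inside the cell
    width = bw * cell_w
    height = bh * cell_h
    # Class ids start at 1 to match the example (dog = 1, human = 2).
    class_id = 1 + max(range(len(class_probs)), key=lambda i: class_probs[i])
    return {
        "confidence": pc,
        "center": (center_x, center_y),
        "size": (width, height),
        "class_id": class_id,
    }


# Middle cell of a 3x3 grid over a 300x300 image, so each cell is 100x100 pixels.
result = decode_cell([1.0, 0.5, 0.5, 2.0, 1.5, 0.0, 1.0], row=1, col=1, cell_w=100, cell_h=100)
print(result)
```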
Encoding Multiple Bounding Boxes
What happens if we predict multiple bounding boxes per cell (B>1)? We simply augment y.
Input: W×H×3
● W: Width of image in pixels
● H: Height of image in pixels
● 3: Number of color channels in RGB
CNN: Series of convolutional and pooling layers.
Output: S×S×(5B+C), a tensor that specifies the bounding box locations and class probabilities (C is the number of classes).
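The output size follows directly from the formula: each of the B boxes contributes 5 numbers (pc, bx, by, bh, bw) and each cell adds C class probabilities. A small sketch, with `output_shape` as a hypothetical helper name:

```python
from typing import Tuple


def output_shape(S: int, B: int, C: int) -> Tuple[int, int, int]:
    """Shape of the prediction tensor: S x S cells, 5 numbers per box plus C class probabilities."""
    return (S, S, 5 * B + C)


simple = output_shape(S=3, B=1, C=2)     # the 3x3, one-box, two-class example
larger = output_shape(S=19, B=5, C=80)   # the larger setting with 80 classes
print(simple, larger)
```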
Measuring Performance with IoU
● Intersection over Union (IoU) measures the overlap between two bounding boxes.
● During training, we calculate the IoU between a predicted bounding box and the ground truth
(the prelabeled bounding box we aim to match).
[Figure: a predicted bounding box compared against the Ground Truth box.]
$$\text{Intersection over Union} = \frac{\text{Area of Intersection}}{\text{Area of Union}}$$
https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
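The ratio of intersection area to union area can be computed as follows. This sketch assumes boxes are given as (x1, y1, x2, y2) corner coordinates, which is a convention chosen here and not one stated on the slides; the function name `iou` is likewise illustrative.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 < x2 and y1 < y2


def iou(a: Box, b: Box) -> float:
    """Area of intersection divided by area of union for two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 4 + 4 - 1 = 7.
score = iou((0, 0, 2, 2), (1, 1, 3, 3))
print(round(score, 4))
```

Identical boxes give 1.0 and boxes that do not touch give 0.0, so a higher value means a closer match to the ground truth.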
Double Counting Objects (Non-Max Suppression)
● When predicting more than one bounding box per cell, sometimes the same object will be
detected multiple times (overlapping boxes with the same label).
● Non-max suppression solves multiple counting by removing the box with the lower confidence
probability when the Intersection over Union (IoU) between 2 boxes with the same label is above some threshold.
Non-Max Suppression
[Figure: three overlapping boxes labeled Dog: 0.95, Dog: 0.82, and Dog: 0.41; the overlaps with the highest-confidence box are IoU: 0.62 and IoU: 0.47.]
1. Identify the box with the highest confidence.
2. Calculate the IoU (Intersection over Union) between the highest-confidence box and each of the other boxes.
3. Suppress boxes with IoU above a selected threshold (usually 0.3).
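The three steps can be sketched as a greedy loop. This is an illustrative version under stated assumptions, not the slides' implementation: boxes use (x1, y1, x2, y2) corners, detections are (label, confidence, box) tuples, only boxes with the same label suppress each other, and the names `nms` and `iou` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Det = Tuple[str, float, Box]              # (label, confidence, box)


def iou(a: Box, b: Box) -> float:
    """Area of intersection divided by area of union."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Det], iou_threshold: float = 0.3) -> List[Det]:
    """Greedy non-max suppression applied among boxes that share a label."""
    remaining = sorted(dets, key=lambda d: d[1], reverse=True)
    kept: List[Det] = []
    while remaining:
        best = remaining.pop(0)           # step 1: highest remaining confidence
        kept.append(best)
        remaining = [
            d for d in remaining
            # steps 2 and 3: drop same-label boxes that overlap `best` too much
            if d[0] != best[0] or iou(d[2], best[2]) <= iou_threshold
        ]
    return kept


dets = [
    ("dog", 0.95, (0, 0, 10, 10)),
    ("dog", 0.82, (1, 1, 11, 11)),    # overlaps the first box heavily
    ("dog", 0.41, (2, 2, 12, 12)),    # also overlaps the first box
    ("dog", 0.60, (50, 50, 60, 60)),  # a second dog elsewhere in the image
]
kept = nms(dets)
print([(label, conf) for label, conf, _ in kept])
```

Repeating the loop on the remaining boxes lets a second, non-overlapping object of the same class survive.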
Implementing YOLO
Pretrained Models
● Training a YOLO model requires images labeled with bounding boxes. These datasets may take
time to label, so readily available prelabeled images are often used to train models.
● A common dataset for image classification/detection/segmentation is COCO (Common
Objects in Context), a database of images with 80 labeled classes.
● Popular YOLO models pretrained on COCO:
○ ImageAI (easy-to-use, lightweight YOLO implementation)
○ Darknet (trained by the author of YOLO)
YOLO Implementation
[Figure: an input image is passed through a CNN (pretrained model with COCO) to produce labeled detections.]
Pineapples and cantaloupes are not in COCO, so they are not recognized.
Applications built with COCO-trained models will only be able to identify these objects!
COCO Pretrained Labels
car        parking meter  zebra     sports ball     fork   hot dog  tvmonitor  refrigerator
motorbike  bench          giraffe   kite            knife  pizza    laptop     book
aeroplane  bird           backpack  baseball bat    spoon  donut    mouse      clock
bus        cat            umbrella  baseball glove  bowl   cake     remote     vase