Object-Detection-with-YOLO

The document discusses object detection using the YOLO (You Only Look Once) algorithm, which improves speed by processing the entire image at once rather than scanning multiple regions. It outlines the steps of the YOLO algorithm, including bounding box prediction, performance measurement using Intersection over Union (IoU), and non-max suppression to avoid double counting. Additionally, it covers the use of models pretrained on the COCO dataset and the process for training custom YOLO models.

Object Detection with YOLO

Chad Wakamiya
Spring 2020
Agenda

Object Detection
● Defining the object detection problem and a naive solution.

YOLO Algorithm
● YOLO algorithm steps
● Bounding boxes
● Measuring performance (IoU)
● Non-max suppression

YOLO Implementations
● Pretrained models with the COCO dataset.
● Custom trained models
Object Detection
Classification vs. Object Detection
Object Detection is the problem of locating and classifying objects in an image.

Classification
● Each image has one object
● Model predicts one label

Object Detection
● Each image may contain multiple objects
● Model classifies objects and identifies their location.

(Figure: example images labeled Cat, Car, and Dog. In the object detection example, each object is outlined with a bounding box.)
Naive Approach
1. Scan the image with a sliding window.
2. Feed each window into a classifier model to predict a label for that region.

(Figure: each window is passed to a classifier model (CNN), which outputs a label: Dog? Person? Nothing?)

● This approach is slow since it checks many windows that don't contain anything -> Not good for real time uses.
● The Region-based Convolutional Neural Net (R-CNN) is an improved version that strategically selects regions that are likely to contain an object to run through the CNN.
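To make the cost of the naive approach concrete, here is a minimal sketch that enumerates the windows a sliding-window scan would send to the classifier. The function name `sliding_windows` and the image, window, and stride sizes are assumptions chosen for illustration; they do not come from the slides.

```python
def sliding_windows(image_w, image_h, window, stride):
    """Yield (x, y, w, h) for every square window that fits inside the image."""
    for y in range(0, image_h - window + 1, stride):
        for x in range(0, image_w - window + 1, stride):
            yield (x, y, window, window)


# Hypothetical sizes: a 416x416 image, a 64-pixel window, a 16-pixel stride.
windows = list(sliding_windows(416, 416, window=64, stride=16))

# Each window would need its own classifier call, and this is for a single
# window size only; scanning several sizes multiplies the count.
print(len(windows))
```

Most of these windows contain no object, which is why the approach is a poor fit for real-time use.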
YOLO Algorithm
YOLO "You Only Look Once"
● Instead of making predictions on many regions of an image, YOLO passes the entire image at once into a CNN that predicts the labels, bounding boxes, and confidence probabilities for objects in the image.
● YOLO runs much faster than region-based algorithms because it requires only a single pass through a CNN.

(Figure: Input image -> Convolutional Neural Net -> Output image with a bounding box, a label, and a confidence probability, e.g. Car: 0.93.)
YOLO Steps
1. Divide the image into cells with an S x S grid. (In the figure, S=3.)
2. Each cell predicts B bounding boxes. (In the figure, B=2.) A cell is responsible for detecting an object if the center of the object's bounding box falls within the cell. (Notice that each cell has 2 blue dots.)
3. Return bounding boxes above the confidence threshold. (In the figure, Car: 0.93.) All other bounding boxes have a confidence probability less than the threshold (say 0.90), so they are suppressed.

In practice, we would use larger values (S = 19 and B = 5) to identify more objects.
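The steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the responsible cell is taken to be the one containing the center of the object's bounding box (the usual YOLO formulation), and the helper names `responsible_cell` and `filter_by_confidence`, the dictionary layout, and all numbers are hypothetical.

```python
def responsible_cell(center_x, center_y, image_w, image_h, S):
    """Return (row, col) of the S x S grid cell that contains the box center."""
    col = min(int(center_x * S // image_w), S - 1)  # clamp points on the right edge
    row = min(int(center_y * S // image_h), S - 1)  # clamp points on the bottom edge
    return row, col


def filter_by_confidence(detections, threshold):
    """Keep only detections whose confidence is above the threshold."""
    return [d for d in detections if d["confidence"] > threshold]


# Hypothetical detections: only the first one clears the 0.90 threshold.
detections = [
    {"label": "car", "confidence": 0.93},
    {"label": "car", "confidence": 0.40},
    {"label": "dog", "confidence": 0.12},
]
kept = filter_by_confidence(detections, threshold=0.90)

# A box centered at (250, 150) in a 300x300 image with S=3 belongs to
# the middle row and the right-hand column.
cell = responsible_cell(250, 150, image_w=300, image_h=300, S=3)
print(kept, cell)
```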
How are bounding boxes encoded?
Let's use a simple example where there are 3x3 cells (S=3), each cell predicts 1 bounding box (B=1), and objects are either dog = 1 or human = 2. For each cell, the CNN predicts a vector y:

y = [pc, bx, by, bh, bw, c1, c2]

● pc: Probability the bounding box contains an object
● bx, by: Coordinates of the bounding box's center
● bh, bw: Height and width of the bounding box as a percent of the cell's height and width
● c1, c2: Probability the cell contains an object that belongs to class 1 (or 2) given the cell contains an object

Example:

y = [1, bx, by, bh, bw, 0, 1]

*There's a probability for each class so if there are 80 classes we would have c1,…c80
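As a rough sketch of this encoding, the function below builds y for one labeled object. Two things are assumptions rather than statements from the slides: bx and by are expressed as offsets from the responsible cell's top-left corner in units of the cell size, and the function name `encode_box` and the pixel values are made up for the example.

```python
def encode_box(center_x, center_y, box_w, box_h, class_id,
               image_w, image_h, S, num_classes):
    """Return (row, col, y) with y = [pc, bx, by, bh, bw, c1, ..., cC]."""
    cell_w = image_w / S
    cell_h = image_h / S
    col = min(int(center_x // cell_w), S - 1)
    row = min(int(center_y // cell_h), S - 1)
    # Assumed convention: center offset inside the cell, as a fraction of the cell.
    bx = (center_x - col * cell_w) / cell_w
    by = (center_y - row * cell_h) / cell_h
    # Height and width relative to the cell; may exceed 1 for large objects.
    bh = box_h / cell_h
    bw = box_w / cell_w
    classes = [0.0] * num_classes
    classes[class_id - 1] = 1.0  # classes are numbered from 1 (dog = 1, human = 2)
    return row, col, [1.0, bx, by, bh, bw] + classes


# Hypothetical human (class 2) centered at (150, 250) in a 300x300 image,
# 50 pixels wide and 120 pixels tall, with S=3.
row, col, y = encode_box(150, 250, box_w=50, box_h=120, class_id=2,
                         image_w=300, image_h=300, S=3, num_classes=2)
print(row, col, y)
```

Cells that contain no object center would get pc = 0, and the remaining entries of y would not matter for those cells.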
Encoding Multiple Bounding Boxes
What happens if we predict multiple bounding boxes per cell (B>1)? We simply augment y.

For B=2 and two classes:

y = [pc, bx, by, bh, bw, pc, bx, by, bh, bw, c1, c2]

Notice that y has 5B+C elements (C is the number of classes).

The CNN will predict a y for each cell, so the size of the output tensor (multidimensional "matrix") should be:

S×S×(5B+C)
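The size arithmetic can be checked with a short sketch. The helper names `output_shape` and `split_cell_vector` are hypothetical; the layout (B blocks of five numbers followed by C class probabilities) follows the vector described above.

```python
def output_shape(S, B, C):
    """Shape of the output tensor: one vector of length 5B + C per grid cell."""
    return (S, S, 5 * B + C)


def split_cell_vector(y, B, C):
    """Split one cell's vector into B boxes [pc, bx, by, bh, bw] and C class scores."""
    assert len(y) == 5 * B + C
    boxes = [y[5 * i: 5 * i + 5] for i in range(B)]
    class_probs = y[5 * B:]
    return boxes, class_probs


print(output_shape(S=3, B=1, C=2))    # (3, 3, 7)
print(output_shape(S=19, B=5, C=80))  # (19, 19, 105)

# A placeholder vector for B=2, C=2: two 5-number boxes, then two class scores.
boxes, class_probs = split_cell_vector(list(range(12)), B=2, C=2)
```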
YOLO Overview
Input: W×H×3
● W: Width of image in pixels
● H: Height of image in pixels
● 3: Number of color channels in RGB

Convolutional Neural Net: Series of convolutional and pooling layers.

Output: S×S×(5B+C)
● A tensor that specifies the bounding box locations and class probabilities (e.g. Car: 0.93).
Measuring Performance with IoU
● Intersection over Union (IoU) measures the overlap between two bounding boxes.
● During training, we calculate the IoU between a predicted bounding box and the ground truth (the prelabeled bounding box we aim to match).

Intersection over Union = Area of Intersection / Area of Union

(Figure: a predicted bounding box overlapping the ground truth box, with examples of poor, good, and excellent overlap.)

https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
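A minimal sketch of the overlap measure, computed as area of intersection divided by area of union. The box format (x1, y1, x2, y2) with corner coordinates is an assumption for this example, as are the function name `iou` and the sample boxes.

```python
def iou(box_a, box_b):
    """Area of intersection divided by area of union for (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are zero when the boxes do not intersect.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: the prediction is shifted right by half the box width.
ground_truth = (0, 0, 10, 10)
predicted = (5, 0, 15, 10)
print(iou(ground_truth, predicted))  # intersection 50, union 150
```

The value is 1 for identical boxes and 0 for boxes that do not overlap.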
Double Counting Objects (Non-Max Suppression)
● When predicting more than 1 bounding box per cell, sometimes the same object will be detected multiple times (overlapping boxes with the same label).
● Non-max suppression solves multiple counting by removing the box with the lower confidence probability when the IoU between 2 boxes with the same label is above some threshold.

(Figure: three overlapping boxes labeled Dog: 0.95, Dog: 0.82, and Dog: 0.41. The IoU values shown against the highest-confidence box are 0.62 and 0.47. After non-max suppression, only Dog: 0.95 remains.)

1. Identify the box with the highest confidence.
2. Calculate the IoU between the highest confidence box and each of the other boxes.
3. Suppress boxes with IoU above a selected threshold (usually 0.3).
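The three steps can be sketched as a greedy loop. This follows a common formulation of non-max suppression in which the steps repeat on the boxes that survive, so that several distinct objects with the same label can each keep one box; that repetition, the tuple layout (label, confidence, box), the function names, and the coordinates are assumptions for illustration. The overlap measure is area of intersection divided by area of union.

```python
def iou(box_a, box_b):
    """Area of intersection divided by area of union for (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, overlap_threshold=0.3):
    """Greedy suppression of same-label boxes that overlap a stronger box."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # step 1: highest confidence box left
        kept.append(best)
        # Steps 2 and 3: drop same-label boxes overlapping it above the threshold.
        remaining = [d for d in remaining
                     if d[0] != best[0] or iou(d[2], best[2]) <= overlap_threshold]
    return kept


# Hypothetical detections: three overlapping dog boxes and one dog far away.
detections = [
    ("dog", 0.82, (20, 20, 120, 120)),
    ("dog", 0.95, (10, 10, 110, 110)),
    ("dog", 0.41, (0, 15, 100, 115)),
    ("dog", 0.60, (300, 300, 400, 400)),
]
kept = non_max_suppression(detections)
print([(label, conf) for label, conf, _ in kept])
```

With these coordinates the 0.82 and 0.41 boxes overlap the 0.95 box well above the threshold and are removed, while the distant 0.60 box is kept as a separate object.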
Implementing YOLO
Pretrained Models
● Training a YOLO model requires images labeled with bounding boxes. These datasets may take
time to label, so readily available prelabeled images are often used to train models.
● A common dataset for image classification/detection/segmentation is COCO (Common Objects in Context), a database of images with 80 labelled classes.
● Popular pretrained YOLO models with COCO:
○ ImageAI (easy-to-use, lightweight YOLO implementation)
○ Darknet (trained by the author of YOLO)

(Figure: a YOLO implementation (CNN) using a model pretrained with COCO.) Pineapples and cantaloupes are not in COCO so they are not recognized.

COCO Pretrained Labels
Applications built with COCO trained models will only be able to identify these objects!

person | fire hydrant | elephant | skis | wine glass | broccoli | diningtable | toaster
bicycle | stop sign | bear | snowboard | cup | carrot | toilet | sink
car | parking meter | zebra | sports ball | fork | hot dog | tvmonitor | refrigerator
motorbike | bench | giraffe | kite | knife | pizza | laptop | book
aeroplane | bird | backpack | baseball bat | spoon | donut | mouse | clock
bus | cat | umbrella | baseball glove | bowl | cake | remote | vase
train | dog | handbag | skateboard | banana | chair | keyboard | scissors
truck | horse | tie | surfboard | apple | sofa | cell phone | teddy bear
boat | sheep | suitcase | tennis racket | sandwich | pottedplant | microwave | hair drier
traffic light | cow | frisbee | bottle | orange | bed | oven | toothbrush

Custom Models
● If your use case only uses objects in COCO → you can use a pretrained model.
● Otherwise you will need to train your own YOLO model. This will require:

1. Find images of the objects to recognize.
2. Label bounding boxes.
3. Train your YOLO model. There are 2 options:
a. Implement your own model using OpenCV, TensorFlow/Keras
b. Use ImageAI's custom training methods.
References/Further Reading
● YOLO
○ ://towardsdatascience.com/you-only-look-once-yolo-implementing-yolo-in-less-than-30-lines-of-python-code-97fb9835bfd2
● R-CNN
○ https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e
● CNN
○ https://www.coursera.org/lecture/convolutional-neural-networks/optional-region-proposals-aCYZv
● YOLO
○ https://hackernoon.com/understanding-yolo-f5a74bbc7967
○ https://www.analyticsvidhya.com/blog/2018/12/practical-guide-object-detection-yolo-framewor-python/
● Intersection Over Union
○ https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
