
CV - YOLO v1

The document discusses YOLO v1, a one-stage object detection algorithm. YOLO v1 frames object detection as a regression problem, predicting bounding boxes and class probabilities directly from the full image in a single evaluation. It divides each image into an S × S grid and predicts two bounding boxes per grid cell, each with a confidence score that reflects both how likely the box is to contain an object and how accurate the box is. The model is trained end-to-end with a sum-squared error loss. Detection metrics such as average precision are used to evaluate YOLO v1 on datasets such as PASCAL VOC.

Uploaded by

TẤN TRÌNH
Copyright © All Rights Reserved

Computer Vision

YOLO v1

Pham Viet Cuong


Dept. Control Engineering & Automation, FEEE
Ho Chi Minh City University of Technology
Face Detection: Viola - Jones Algorithm
✓ Object detection problem
❖ Single object detection
❖ Multiple object detection
❖ Bounding box(es)
❖ Class(es) of object(s)

Pham Viet Cuong - Dept. Control Eng. & Automation, FEEE, HCMUT 2
✓ Haar-like features
❖ Window size: 24x24
❖ Type
❖ Position
❖ Size
❖ ~ 160K features
✓ Feature usefulness?

✓ Feature selection
❖ Positive & negative sets: 10K examples/set
❖ Weak classifiers
[Figure: 24x24 window, feature]
❖ Objective: minimize the number of misclassified examples
❖ ~ 6K features selected
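As an illustration of a weak classifier and its objective, the sketch below fits a single-feature threshold classifier (a decision stump) that minimizes the number of misclassified examples. It is a simplified, unweighted version; the function name `best_stump`, the polarity convention, and the toy data are assumptions made for this example.

```python
def best_stump(feature_values, labels):
    """Return (threshold, polarity, errors) with the fewest misclassifications.

    The stump predicts 1 (face) when polarity * value < polarity * threshold,
    and 0 (non-face) otherwise.
    """
    values = sorted(set(feature_values))
    # Candidate thresholds: below all values, between neighbours, above all.
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)

    best = None
    for threshold in candidates:
        for polarity in (1, -1):
            errors = 0
            for value, label in zip(feature_values, labels):
                predicted = 1 if polarity * value < polarity * threshold else 0
                if predicted != label:
                    errors += 1
            if best is None or errors < best[2]:
                best = (threshold, polarity, errors)
    return best


# Toy data: faces (label 1) have large feature values.
print(best_stump([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # (6.5, -1, 0)
```

Feature selection would then repeat this for every candidate feature and keep the features whose stumps make the fewest mistakes.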
✓ Trade-off
❖ More features (higher detection rate, lower false positive rate)
❖ More computational complexity
✓ Cascade structure
❖ 6061 features
❖ 38 stages
❖ First 5 layers: 1, 10, 25, 25, 50 features
❖ Average: 10 feature evaluations per window
❖ 15 – 600 times faster than earlier detectors
❖ Negative examples for later stages? False positives of the earlier stages
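The low average number of feature evaluations comes from early rejection: most windows fail one of the first small stages and never reach the later ones. A minimal sketch of that control flow follows; the representation of a stage as `(weak_classifiers, stage_threshold)` and the name `cascade_classify` are assumptions for illustration.

```python
def cascade_classify(window, stages):
    """Run a window through the cascade and stop at the first failed stage.

    stages: list of (weak_classifiers, stage_threshold), where each weak
    classifier is a callable window -> score.
    Returns (is_face, number_of_feature_evaluations).
    """
    evaluations = 0
    for weak_classifiers, stage_threshold in stages:
        score = 0.0
        for classifier in weak_classifiers:
            score += classifier(window)
            evaluations += 1
        if score < stage_threshold:
            # Rejected early: later, larger stages are never evaluated.
            return False, evaluations
    return True, evaluations


# Toy cascade with stage sizes 1 and 10; the "window" is a single number.
toy_stages = [
    ([lambda w: w], 0.5),
    ([lambda w: w] * 10, 5.0),
]
print(cascade_classify(0.0, toy_stages))  # (False, 1)
print(cascade_classify(1.0, toy_stages))  # (True, 11)
```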
✓ How to detect face(s) in an image?
❖ Sliding window: 24x24 (384x288 image)
❖ Binary classifier: Face / Non-face
❖ Window scaling
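To get a feel for how many windows the binary classifier has to process, the sketch below counts window positions in a 384x288 image, first at the base 24x24 size and then over a range of window scales. The stride of 1 pixel and the scale factor of 1.25 are assumed parameters, and the function names are illustrative.

```python
def count_windows(img_w, img_h, win=24, stride=1):
    """Number of win x win windows that fit in the image at a given stride."""
    if win > img_w or win > img_h:
        return 0
    return ((img_w - win) // stride + 1) * ((img_h - win) // stride + 1)


def count_multiscale(img_w, img_h, base=24, scale=1.25, stride=1):
    """Total windows when the window is repeatedly enlarged by `scale`."""
    total = 0
    size = float(base)
    while int(size) <= min(img_w, img_h):
        total += count_windows(img_w, img_h, int(size), stride)
        size *= scale
    return total


print(count_windows(384, 288))     # 95665 windows at the base scale
print(count_multiscale(384, 288))  # total over all assumed scales
```

Even at a single scale there are close to 100K windows per image, which is why a cheap per-window classifier matters.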


❖ AlexNet?
▪ Binary classifier → AlexNet
▪ Multiple object detection
❖ More efficient?
▪ Region proposal

YOLO v1
✓ Two-stage object detection (R-CNN, Fast R-CNN, Faster R-CNN)
Image → Region Proposal → Object Classification
✓ One-stage object detection
Image → CNN
❖ YOLO – You Only Look Once

✓ Structure

✓ S = 7 (the image is divided into an S × S grid)
✓ Bounding box: x, y, w, h
✓ Confidence score
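In the YOLO v1 paper, the grid cell containing an object's center is responsible for that object; (x, y) is the center's offset within that cell, and (w, h) are normalized by the image size. A minimal sketch of this encoding, assuming pixel-coordinate inputs and a hypothetical helper name `encode_box`:

```python
def encode_box(cx, cy, w, h, img_w, img_h, S=7):
    """Map a box (center cx, cy and size w, h in pixels) to YOLO v1 targets.

    Returns (row, col, (x, y, w_rel, h_rel)) where x, y are in [0, 1)
    relative to the responsible cell and w_rel, h_rel are relative to
    the whole image.
    """
    grid_x = cx / img_w * S
    grid_y = cy / img_h * S
    # Clamp so a center on the right/bottom edge stays in the last cell.
    col = min(int(grid_x), S - 1)
    row = min(int(grid_y), S - 1)
    return row, col, (grid_x - col, grid_y - row, w / img_w, h / img_h)


# A 112x112 box centered in a 448x448 image falls in the middle cell.
print(encode_box(224, 224, 112, 112, 448, 448))  # (3, 3, (0.5, 0.5, 0.25, 0.25))
```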

✓ # outputs:
❖ $(5B + C)\,S^2$, with B = 2, C = 20, S = 7
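The output count can be checked directly, and the per-cell vector of length 5B + C can be split into its boxes and class probabilities. In the sketch below, the ordering inside a cell (all boxes first, then class probabilities) is an assumption for illustration, as are the function names; actual implementations may lay the values out differently.

```python
def output_size(S=7, B=2, C=20):
    """Total number of network outputs: (5B + C) values per cell, S*S cells."""
    return (5 * B + C) * S * S


def split_cell(cell_vector, B=2, C=20):
    """Split one cell's outputs into B boxes (x, y, w, h, confidence)
    and C conditional class probabilities (assumed layout)."""
    if len(cell_vector) != 5 * B + C:
        raise ValueError("expected %d values per cell" % (5 * B + C))
    boxes = [tuple(cell_vector[5 * b:5 * b + 5]) for b in range(B)]
    class_probs = list(cell_vector[5 * B:])
    return boxes, class_probs


print(output_size())  # 1470

boxes, class_probs = split_cell(list(range(30)))
print(boxes)             # [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]
print(len(class_probs))  # 20
```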

✓ Linear regression problem

✓ Confidence score
❖ How likely is it that the bounding box contains an object?
❖ How accurate is the bounding box (location and size)?
$C = \Pr(\text{Object}) \times \text{IoU}$
IoU: Intersection over Union
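IoU is the area of overlap between the predicted and ground-truth boxes divided by the area of their union. A minimal sketch, assuming boxes are given as corner coordinates `(x1, y1, x2, y2)` with `x1 < x2` and `y1 < y2` (the box format and function name are choices made for this example):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width/height are clamped at zero when the boxes do not overlap.
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # 0.0 (no overlap)
```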

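✓ Conditional class probability
$p_i(c) = \Pr(\text{Class}_i \mid \text{Object})$
✓ Test:
$\Pr(\text{Class}_i \mid \text{Object}) \times \Pr(\text{Object}) \times \text{IoU} = \Pr(\text{Class}_i) \times \text{IoU}$
Class-specific confidence scores for each box: the probability of that class appearing in the box and how well the predicted box fits the object.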
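At test time, the class-specific score of a box is the cell's conditional class probability multiplied by the box's confidence score. A small sketch of that product, with hypothetical function names and toy numbers:

```python
def class_specific_scores(class_probs, box_confidence):
    """Multiply each conditional class probability Pr(Class_i | Object)
    by the box confidence Pr(Object) * IoU."""
    return [p * box_confidence for p in class_probs]


def best_class(class_probs, box_confidence):
    """Return (class_index, score) of the highest class-specific score."""
    scores = class_specific_scores(class_probs, box_confidence)
    index = max(range(len(scores)), key=lambda i: scores[i])
    return index, scores[index]


# Toy cell with two classes and one box with confidence 0.5.
print(class_specific_scores([0.5, 0.25], 0.5))  # [0.25, 0.125]
print(best_class([0.5, 0.25], 0.5))             # (0, 0.25)
```

In a full pipeline these scores would typically be thresholded and passed through non-maximum suppression, which is not shown here.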
✓ Activation function
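The YOLO v1 paper uses a leaky rectified linear activation (slope 0.1 for negative inputs) in every layer except the final one, which is linear. Assuming this is the activation the slide refers to, a minimal sketch looks like this (the function name is illustrative):

```python
def leaky_relu(x, slope=0.1):
    """Leaky ReLU: identity for positive inputs, small slope otherwise."""
    return x if x > 0 else slope * x


print(leaky_relu(2.0))   # 2.0
print(leaky_relu(-2.0))  # -0.2
```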

✓ Loss function
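For reference, the sum-squared error loss as given in the YOLO v1 paper is reproduced below, on the assumption that this is the loss the slide shows. Here $\mathbb{1}_{ij}^{\text{obj}}$ indicates that box $j$ of cell $i$ is responsible for an object, $\mathbb{1}_{i}^{\text{obj}}$ that an object appears in cell $i$, hats mark predictions, and the paper sets $\lambda_{\text{coord}} = 5$ and $\lambda_{\text{noobj}} = 0.5$.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
         + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2
 + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
    \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The square roots on width and height make the same absolute error count for more in small boxes than in large ones.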

✓ Training

✓ Limitations
❖ Spatial constraint
▪ Two bounding boxes, one class per grid cell
▪ Struggles with small objects in groups, e.g. flocks of birds
❖ Relatively coarse features due to multiple downsampling layers
❖ Main error: incorrect localization
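The spatial constraint can be made concrete by checking which objects share a grid cell: when several object centers fall into the same cell, that cell still predicts only one class. The sketch below groups object centers by cell; the helper name `cells_with_conflicts` and the toy coordinates are assumptions for illustration.

```python
from collections import defaultdict


def cells_with_conflicts(centers, img_w, img_h, S=7):
    """Return {(row, col): [object indices]} for cells holding 2+ centers."""
    cells = defaultdict(list)
    for index, (cx, cy) in enumerate(centers):
        col = min(int(cx / img_w * S), S - 1)
        row = min(int(cy / img_h * S), S - 1)
        cells[(row, col)].append(index)
    return {cell: ids for cell, ids in cells.items() if len(ids) > 1}


# Two small, nearby objects land in cell (0, 0) of a 448x448 image.
centers = [(10, 10), (20, 20), (400, 400)]
print(cells_with_conflicts(centers, 448, 448))  # {(0, 0): [0, 1]}
```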

✓ Results – PASCAL VOC 2007

✓ Results – PASCAL VOC 2012

✓ Results

✓ Comparison

✓ Evaluation
❖ Classification problem?
▪ Top-1 error rate
▪ Top-5 error rate
❖ Object detection problem?
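For the classification case, the top-k error rate is the fraction of examples whose true label is not among the k highest-scoring classes. A minimal sketch, with an illustrative function name and toy scores:

```python
def top_k_error(scores_per_example, true_labels, k=1):
    """Fraction of examples whose true label is outside the top-k classes."""
    if not true_labels:
        raise ValueError("need at least one example")
    errors = 0
    for scores, label in zip(scores_per_example, true_labels):
        # Class indices sorted by descending score, truncated to the top k.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:k]:
            errors += 1
    return errors / len(true_labels)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
labels = [2, 0]
print(top_k_error(scores, labels, k=1))  # 0.5
print(top_k_error(scores, labels, k=2))  # 0.0
```

Object detection needs different metrics because a prediction must match both the class and the location of each object.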

❖ Confusion matrix
❖ Recall (detection rate, true positive rate, sensitivity)
$\text{Recall} = \text{DR} = \dfrac{TP}{TP + FN}$
❖ Precision
$\text{Precision} = \dfrac{TP}{TP + FP}$
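Both quantities follow directly from the confusion-matrix counts. A short sketch with hypothetical counts (the function names and the zero-denominator convention are choices made for this example):

```python
def precision(tp, fp):
    """Fraction of detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp, fn):
    """Fraction of ground-truth objects that are detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


# Hypothetical counts: 8 correct detections, 2 false alarms, 4 missed objects.
print(precision(8, 2))  # 0.8
print(recall(8, 4))     # 8/12, about 0.667
```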

❖ Precision - Recall curve
❖ Interpolated Precision - Recall curve
❖ AP
❖ AP50, AP75
❖ mAP
✓ Dataset
❖ PASCAL VOC
❖ COCO
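The evaluation steps listed above (precision-recall curve, interpolation, AP, mAP) can be sketched as follows. The code assumes each detection has already been marked as a true or false positive, for example by an IoU threshold of 0.5 for AP50 or 0.75 for AP75, and it uses all-point interpolation; benchmark-specific details such as 11-point sampling are not reproduced, and the function names are illustrative.

```python
def average_precision(detections, num_ground_truth):
    """AP for one class.

    detections: list of (score, is_true_positive) pairs.
    num_ground_truth: number of ground-truth objects of this class.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    # Precision-recall curve: one point per detection, in score order.
    precisions, recalls = [], []
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)

    # Interpolation: precision at recall r becomes the max precision
    # achieved at any recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated curve.
    ap, previous_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - previous_recall) * p
        previous_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP: mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


detections = [(0.9, True), (0.8, False), (0.7, True)]
print(average_precision(detections, num_ground_truth=2))  # about 0.833
print(mean_average_precision([0.5, 1.0]))                 # 0.75
```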
