Computer Vision
YOLO v1
Pham Viet Cuong
Dept. Control Engineering & Automation, FEEE
Ho Chi Minh City University of Technology
Face Detection: Viola - Jones Algorithm
✓ Object detection problem
❖ Single object detection
❖ Multiple object detection
❖ Bounding box(es)
❖ Class(es) of object(s)
Pham Viet Cuong - Dept. Control Eng. & Automation, FEEE, HCMUT 2
Face Detection: Viola - Jones Algorithm
✓ Haar-like features
❖ Window size: 24x24
❖ Type
❖ Position
❖ Size
❖ ~ 160K features
✓ Features usefulness?
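The ~160K figure above can be checked by enumerating every type, size, and position. The sketch below assumes the five upright feature shapes usually associated with Viola-Jones (two-rectangle horizontal and vertical, three-rectangle horizontal and vertical, four-rectangle); the function name is hypothetical.

```python
def count_haar_features(window=24):
    """Count upright Haar-like features that fit in a square window."""
    # Assumed base shapes (width, height) in cells: two-rectangle
    # (horizontal, vertical), three-rectangle (horizontal, vertical),
    # and four-rectangle.
    base_shapes = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]
    total = 0
    for bw, bh in base_shapes:
        # Size: every integer multiple of the base shape that still fits.
        for w in range(bw, window + 1, bw):
            for h in range(bh, window + 1, bh):
                # Position: every top-left corner that keeps the feature inside.
                total += (window - w + 1) * (window - h + 1)
    return total


print(count_haar_features(24))  # 162336
```

Under these assumptions the count is 162,336, which matches the "~ 160K" on the slide.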
Face Detection: Viola - Jones Algorithm
✓ Feature selection
❖ Positive & negative sets: 10K examples per set
❖ Weak classifiers: each one thresholds a single feature of the 24x24 window
❖ Objective: minimize the number of misclassified examples
❖ ~ 6K features selected
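A weak classifier of the kind listed above can be sketched as a decision stump on one feature value. This is a minimal sketch, not the original training code: the function name, the brute-force threshold search, and the use of per-example weights (as in AdaBoost) are assumptions for illustration.

```python
import numpy as np


def best_stump(feature_values, labels, weights):
    """Pick the threshold and polarity with the lowest weighted error.

    feature_values: 1-D array, one feature value per training example.
    labels: 1-D array of 1 (face) / 0 (non-face).
    weights: 1-D array of non-negative example weights summing to 1.
    Returns (threshold, polarity, weighted_error).
    """
    best = (None, None, np.inf)
    for threshold in np.unique(feature_values):
        for polarity in (1, -1):
            # Predict "face" when polarity * value < polarity * threshold.
            predictions = (polarity * feature_values < polarity * threshold).astype(int)
            error = np.sum(weights * (predictions != labels))
            if error < best[2]:
                best = (threshold, polarity, error)
    return best


values = np.array([1.0, 2.0, 3.0, 4.0])
labels = np.array([1, 1, 0, 0])
weights = np.full(4, 0.25)
threshold, polarity, error = best_stump(values, labels, weights)
```

Feature selection then amounts to running this search for every candidate feature and keeping the feature whose stump has the lowest weighted error in each boosting round.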
Face Detection: Viola - Jones Algorithm
✓ Trade-off
❖ More features (higher detection rate, lower false positive rate)
❖ More computational complexity
✓ Cascade structure
❖ 6061 features
❖ 38 stages
❖ First 5 layers: 1, 10, 25, 25, 50 features
❖ Average: 10 feature evaluations per window
❖ 15 to 600 times faster than earlier detectors
❖ Negative examples for later stages? False positives of the preceding stages
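The cascade idea can be sketched as a chain of stages with early rejection. The stage functions and thresholds below are hypothetical toy values, chosen only to show why most windows cost very few feature evaluations.

```python
def cascade_classify(window_features, stages):
    """Run one window through a rejection cascade.

    stages: list of (score_fn, threshold) pairs, cheapest stage first.
    Returns (is_face, number_of_stages_evaluated).
    """
    for evaluated, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window_features) < threshold:
            # Early rejection: the window never reaches the costlier stages.
            return False, evaluated
    return True, len(stages)


# Hypothetical toy stages: each score sums a growing subset of features.
stages = [
    (lambda f: sum(f[:1]), 0.5),
    (lambda f: sum(f[:3]), 1.5),
    (lambda f: sum(f[:5]), 2.5),
]

print(cascade_classify([0, 1, 1, 1, 1], stages))  # (False, 1)
print(cascade_classify([1, 1, 1, 1, 1], stages))  # (True, 3)
```

A window rejected by the first stage costs a single cheap evaluation, which is how the average can stay near 10 features even though the full cascade holds thousands.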
Face Detection: Viola - Jones Algorithm
✓ How to detect face(s) in an image?
❖ Sliding window: 24x24 (384x288 image)
❖ Binary classifier: Face / Non-face
❖ Window scaling
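To get a feel for the cost of sliding-window detection, the sketch below counts window positions on a 384x288 image. The scale step of 1.25 and the stride that grows with the scale are assumptions for illustration, not values taken from the slides.

```python
def count_windows(image_w, image_h, base=24, scale_step=1.25, stride=1):
    """Count sliding-window positions over all window scales.

    scale_step must be greater than 1 so that the loop terminates.
    """
    total = 0
    scale = 1.0
    while True:
        size = int(round(base * scale))
        if size > image_w or size > image_h:
            break
        # Assumed: the stride grows together with the window.
        step = max(1, int(round(stride * scale)))
        positions_x = (image_w - size) // step + 1
        positions_y = (image_h - size) // step + 1
        total += positions_x * positions_y
        scale *= scale_step
    return total


# A single scale (24x24 only) already gives 361 * 265 = 95665 windows.
print(count_windows(384, 288, scale_step=100))  # 95665
```

Every one of these windows goes through the binary classifier, which is why a cheap rejection test per window matters so much.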
Face Detection: Viola - Jones Algorithm
✓ How to detect face(s) in an image?
❖ Sliding window: 24x24 (384x288 image)
❖ Binary classifier: Face / Non-face
❖ AlexNet?
▪ Binary classifier → AlexNet
▪ Multiple object detection
❖ More efficient?
▪ Region proposal
YOLO v1
✓ Two-stage object detection (R-CNN, Fast R-CNN, Faster R-CNN)
Image → Region Proposal → Object Classification
✓ One-stage object detection
Image → CNN
❖ YOLO – You Only Look Once
YOLO v1
✓ Structure
YOLO v1
✓ S = 7: the image is divided into an S × S grid
✓ Bounding box: x, y, w, h
✓ Confidence score
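The box parameters can be illustrated with a small encoder. It follows the YOLO v1 convention that x, y are offsets of the box center inside its grid cell and w, h are fractions of the whole image; the function name and the pixel-based input format are assumptions.

```python
def encode_box(cx, cy, bw, bh, image_w, image_h, S=7):
    """Encode a box given by its center (cx, cy) and size (bw, bh) in pixels.

    Returns (row, col, x, y, w, h): the responsible grid cell, the center
    offset inside that cell in [0, 1], and the size relative to the image.
    """
    cell_w = image_w / S
    cell_h = image_h / S
    # The cell containing the box center is responsible for the object.
    col = min(int(cx / cell_w), S - 1)
    row = min(int(cy / cell_h), S - 1)
    x = cx / cell_w - col
    y = cy / cell_h - row
    w = bw / image_w
    h = bh / image_h
    return row, col, x, y, w, h


print(encode_box(224, 224, 112, 56, 448, 448))  # (3, 3, 0.5, 0.5, 0.25, 0.125)
```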
YOLO v1
✓ # outputs:
❖ $(5B + C)\,S^2 = (5 \cdot 2 + 20) \cdot 7^2 = 1470$, with B = 2, C = 20, S = 7
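The output count can be reproduced, and the flat output vector split into its parts, with a few lines of NumPy. The channel order used here (class probabilities first, then the B boxes) is an assumption; implementations differ on the layout.

```python
import numpy as np

S, B, C = 7, 2, 20
num_outputs = (5 * B + C) * S * S  # 1470

# Hypothetical flat network output; random values stand in for predictions.
flat = np.random.rand(num_outputs)
grid = flat.reshape(S, S, 5 * B + C)

# Assumed channel layout: C class probabilities first, then B boxes of
# (x, y, w, h, confidence).
class_probs = grid[..., :C]                # shape (7, 7, 20)
boxes = grid[..., C:].reshape(S, S, B, 5)  # shape (7, 7, 2, 5)

print(num_outputs, class_probs.shape, boxes.shape)
```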
YOLO v1
✓ Linear regression problem
YOLO v1
✓ Confidence score
❖ How likely the bounding box contains an object?
❖ How accurate is the bounding box (location and size)?
$\text{Confidence score} = \Pr(\text{Object}) \cdot \text{IoU}$
IoU: Intersection over Union between the predicted box and the ground-truth box
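IoU is simple to compute for axis-aligned boxes. The sketch below assumes boxes are given as corner coordinates (x_min, y_min, x_max, y_max), which is a choice made for this example rather than the YOLO output format.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```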
YOLO v1
✓ Confidence score
❖ How likely the bounding box contains an object?
❖ How accurate is the bounding box (location and size)?
Confidence score $C = \Pr(\text{Object}) \cdot \text{IoU}$
✓ Conditional class probability
$p_i(c) = \Pr(\text{Class}_i \mid \text{Object})$
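✓ Test:
❖ Multiply each box's confidence score by the conditional class probabilities: $\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \text{IoU} = \Pr(\text{Class}_i) \cdot \text{IoU}$
❖ The resulting class-specific confidence score for each box encodes both the probability of that class appearing in the box and how well the predicted box fits the object.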
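The test-time scores are one broadcasted product. In this sketch the array shapes (S, S, C) for class probabilities and (S, S, B) for box confidences are assumptions about how the predictions are stored.

```python
import numpy as np


def class_scores(class_probs, box_conf):
    """Class-specific confidence scores for every box.

    class_probs: (S, S, C) conditional class probabilities per grid cell.
    box_conf: (S, S, B) confidence score per box.
    Returns an array of shape (S, S, B, C).
    """
    # Each box in a cell shares that cell's class probabilities.
    return box_conf[..., :, None] * class_probs[..., None, :]


scores = class_scores(np.full((7, 7, 20), 0.5), np.full((7, 7, 2), 0.4))
print(scores.shape)  # (7, 7, 2, 20)
```

With S = 7, B = 2, C = 20 this gives 98 boxes, each with 20 class scores, which can then be thresholded.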
YOLO v1
✓ Activation function
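The figure for this slide is not reproduced in the text. As a point of reference, the YOLO v1 paper uses a linear activation on the final layer and a leaky ReLU with slope 0.1 on all other layers; the sketch below shows that leaky ReLU, with a hypothetical function name.

```python
import numpy as np


def leaky_relu(x, slope=0.1):
    """Leaky ReLU: x for x > 0, slope * x otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, slope * x)


print(leaky_relu([-2.0, 0.0, 3.0]))  # negative input scaled by 0.1, positive unchanged
```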
YOLO v1
✓ Loss function
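The equation shown on this slide is not reproduced in the text. For reference, the sum-squared-error loss given in the YOLO v1 paper can be written as below, where $\mathbb{1}_{ij}^{\text{obj}}$ marks the j-th box of cell i as responsible for an object, $C_i$ is the confidence score, and $p_i(c)$ is the conditional class probability. The paper sets $\lambda_{\text{coord}} = 5$ and $\lambda_{\text{noobj}} = 0.5$. Treat this as a reference transcription rather than the slide's own rendering.

```latex
\begin{aligned}
\mathcal{L} ={} & \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
& + \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
         + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
& + \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
& + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
& + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
    \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The square roots on w and h make the same absolute error count more for small boxes than for large ones, and the two λ weights keep the many empty cells from dominating the gradient.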
YOLO v1
✓ Training
YOLO v1
✓ Limitations
❖ Spatial constraint
▪ Two bounding boxes and one class per grid cell
▪ Struggles with small objects that appear in groups, e.g. flocks of birds
❖ Relatively coarse features due to multiple downsampling layers
❖ Main error: incorrect localization
YOLO v1
✓ Results – PASCAL VOC 2007
YOLO v1
✓ Results – PASCAL VOC 2012
YOLO v1
✓ Results
YOLO v1
✓ Comparison
YOLO v1
✓ Evaluation
❖ Classification problem?
▪ Top-1 error rate
▪ Top-5 error rate
❖ Object detection problem?
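For the classification case, the top-k error is the fraction of examples whose true class is not among the k highest-scoring classes. A minimal sketch, assuming scores come as an (N, num_classes) array and labels as integer class indices:

```python
import numpy as np


def top_k_error(scores, labels, k):
    """Fraction of examples whose true class is outside the top k scores.

    scores: (N, num_classes) array of class scores.
    labels: (N,) array of true class indices.
    """
    # Indices of the k highest-scoring classes for each example.
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = np.any(top_k == labels[:, None], axis=1)
    return 1.0 - hits.mean()


scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 1, 0])
print(top_k_error(scores, labels, 1), top_k_error(scores, labels, 2))
```

This metric has no notion of where an object is, which is why detection needs the box-aware measures on the following slides.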
YOLO v1
✓ Evaluation
❖ Confusion matrix
❖ Recall (detection rate, true positive rate, sensitivity)
$\text{Recall} = DR = \dfrac{TP}{TP + FN}$
❖ Precision
$\text{Precision} = \dfrac{TP}{TP + FP}$
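Both ratios follow directly from the confusion-matrix counts. The helper below is a minimal sketch; returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Example: 8 correct detections, 2 spurious detections, 4 missed objects.
precision, recall = precision_recall(tp=8, fp=2, fn=4)
print(precision, recall)  # 0.8 and about 0.667
```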
YOLO v1
✓ Evaluation
❖ Precision - Recall curve
❖ Interpolated Precision - Recall curve
❖ AP
❖ AP50, AP75
❖ mAP
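The chain from precision-recall curve to AP to mAP can be sketched as follows. This version uses all-point interpolation (area under the interpolated curve); PASCAL VOC 2007 instead averaged the interpolated precision at 11 recall levels, so the numbers can differ slightly. The function name and input format are assumptions.

```python
import numpy as np


def average_precision(is_tp, num_gt):
    """AP for one class from detections sorted by descending confidence.

    is_tp: booleans, True where a detection matches a ground-truth box
           (for example IoU >= 0.5 for AP50, IoU >= 0.75 for AP75).
    num_gt: number of ground-truth boxes of this class.
    """
    is_tp = np.asarray(is_tp, dtype=bool)
    tp = np.cumsum(is_tp)
    fp = np.cumsum(~is_tp)
    recall = tp / num_gt
    precision = tp / (tp + fp)
    # Pad so the curve spans recall 0 to 1.
    recall = np.concatenate(([0.0], recall, [1.0]))
    precision = np.concatenate(([0.0], precision, [0.0]))
    # Interpolation: precision at recall r is the max precision at recall >= r.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Sum rectangle areas wherever recall increases.
    steps = np.nonzero(recall[1:] != recall[:-1])[0]
    return float(np.sum((recall[steps + 1] - recall[steps]) * precision[steps + 1]))


ap_a = average_precision([True, False, True], num_gt=2)  # 5/6
ap_b = average_precision([True, True], num_gt=2)         # 1.0
mean_ap = float(np.mean([ap_a, ap_b]))                   # mAP over two classes
```

mAP is then the mean of the per-class AP values, as in the last line.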
✓ Dataset
❖ PASCAL VOC
❖ COCO