Object Detection Week 2 YOLOv1-YOLOv8

YOLO and Its Variants discusses the YOLO object detection algorithm and its variants YOLOv1, YOLOv2, and YOLOv3. The original YOLO algorithm detects objects in images using a single neural network, making it faster than two-stage detectors like Faster R-CNN. YOLOv2 improved on YOLOv1 by adding batch normalization, higher resolution classifiers, anchor boxes, and dimension clusters to detect objects more accurately. YOLOv3 further improved performance by using Darknet-53, a deeper backbone network with 53 convolutional layers, to better detect small objects.

Uploaded by

Ngọc Hân

YOLO and Its Variants

Dr. Vinh Dinh Nguyen

A paper list of object detection using deep learning
https://arxiv.org/abs/1905.05055
https://www.v7labs.com/blog/yolo-object-detection
Motivation
➢ In 2015, Joseph Redmon (University of Washington) developed
YOLO. One of his co-authors, Ross Girshick (Microsoft Research),
published the Faster R-CNN paper around the same time. They
probably shared ideas, as there are some similarities between
YOLO and Faster R-CNN. For example, both models apply
convolutional layers to input images to generate feature maps.
However, Faster R-CNN uses a two-stage object detection pipeline,
while YOLO has no separate region proposal step and is much
faster than Faster R-CNN.
➢ YOLO has many versions (variants). Joseph Redmon
developed the first three: YOLOv1, v2, and v3. He then
left the field.
What is YOLO?
➢ YOLO is an abbreviation for the term ‘You Only Look Once’. It
is an algorithm that detects and recognizes various objects in a picture
in real time. Object detection in YOLO is framed as a regression
problem, and the model outputs the class probabilities of the detected objects.
➢ The YOLO algorithm employs convolutional neural networks (CNN) to
detect objects in real time. As the name suggests, the algorithm
requires only a single forward propagation through a neural network
to detect objects.
Why the YOLO algorithm is important
➢ Speed: This algorithm improves the speed of detection
because it can predict objects in real-time.
➢ High accuracy: YOLO is a predictive technique that provides
accurate results with minimal background errors.
➢ Learning capabilities: The algorithm has excellent learning
capabilities that enable it to learn the representations of objects
and apply them in object detection.
How the YOLO algorithm works
➢ Residual blocks
➢ Bounding box regression
➢ Intersection Over Union (IOU)
Residual blocks
Bounding box regression
➢ Every bounding box in the image consists of the following
attributes:
○ Width (bw)
○ Height (bh)
○ Class (for example, person, car, traffic light, etc.),
represented by the letter c
○ Bounding box center (bx,by)
Intersection over union (IOU)
➢ Each grid cell is responsible for predicting the bounding
boxes and their confidence scores. The IOU measures how much
a predicted box overlaps the real box: it is the area of their
intersection divided by the area of their union, and it equals 1
if the predicted bounding box is the same as the real box.
Predicted boxes whose IOU with the real box is low are eliminated.
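The IOU computation described above can be sketched in a few lines. This is a minimal illustration, assuming boxes are given as corner coordinates (x_min, y_min, x_max, y_max); the function name `iou` and the corner format are choices made for this example, not part of the slides.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap
    print(iou((0, 0, 1, 1), (5, 5, 6, 6)))  # no overlap
```

Identical boxes give an IOU of 1, and boxes that do not overlap give 0, matching the description above.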
Combination of the three techniques

https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf
What is YOLO algorithm? | Deep Learning Tutorial 31 (Tensorflow, Keras & Python)
https://www.youtube.com/watch?v=ag3DLKsl2vk
Non Maximum Suppression
Intersection Over Union (IoU)
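Non-maximum suppression builds directly on IoU: among overlapping detections it keeps the highest-scoring box and drops the others. The following is a minimal greedy sketch, assuming corner-format boxes and an IoU threshold of 0.5; the function names and the threshold value are assumptions for illustration, not values taken from the slides.

```python
def iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the boxes to keep."""
    # Process candidates from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In practice this is usually run once per class, so that overlapping objects of different classes do not suppress each other.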
Yolo-v1 - Network Architecture
YOLO-v1
➢ The YOLO network was “inspired by” GoogLeNet. It
has 24 convolutional layers acting as feature extractors
and 2 dense layers for making the predictions. The
framework this architecture is built on is called Darknet.
There is a fast version of YOLO called “Tiny-YOLO”, which
has only 9 convolutional layers.
YOLOV1
➢ The input image is divided into an S×S grid (S=7).
➢ Each grid cell predicts B bounding boxes (B=2) and confidence scores for those boxes.
➢ Each bounding box consists of 5 predictions: x, y, w, h, and confidence.
➢ Each grid cell also predicts conditional class probabilities, P(Class_i | Object), for a total of 20 classes.
➢ The output size of the network becomes 7×7×(2×5+20) = 1470.
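The output size arithmetic can be checked with a short sketch. The helper names are hypothetical, and the per-cell layout used in `split_cell` (all boxes first, then class probabilities) is an assumption made for illustration; actual implementations may order the values differently.

```python
def yolo_v1_output_size(s, b, c):
    """Number of output values: S x S cells, each with B boxes of 5 values plus C class scores."""
    return s * s * (b * 5 + c)


def split_cell(cell_vector, b, c):
    """Split one cell's prediction vector into B boxes and C class probabilities.

    Assumed layout (illustrative only): [box_1 (5), ..., box_B (5), classes (C)].
    """
    if len(cell_vector) != b * 5 + c:
        raise ValueError("cell vector has the wrong length")
    boxes = [cell_vector[i * 5:(i + 1) * 5] for i in range(b)]
    class_probs = cell_vector[b * 5:]
    return boxes, class_probs


if __name__ == "__main__":
    S, B, C = 7, 2, 20
    print(yolo_v1_output_size(S, B, C))
    boxes, class_probs = split_cell(list(range(B * 5 + C)), B, C)
    print(len(boxes), len(boxes[0]), len(class_probs))
```

Note that the class probabilities are predicted once per cell and shared by both boxes in that cell.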
Yolo-v1 - Network Architecture
Yolo-v1
Yolo-v1
Limitations of YOLO
YOLOV1 - implementation
MOTIVATION OF YOLOV2
➢ YOLO v1 was faster than Faster R-CNN, but it was less
accurate.
➢ YOLO v1’s weakness was bounding box accuracy. It
didn’t predict object locations and sizes well and was
particularly bad at spotting small objects.
➢ SSD, another single-stage detector, broke the record by
being better (more accurate) than Faster R-CNN and
even faster than YOLO v1.
MOTIVATION OF YOLOV2
➢ Joseph Redmon and Ali Farhadi developed YOLO v2, which is
better and faster than SSD and Faster R-CNN. They made a series
of changes, and the paper details how much improvement each
incremental change brought. However, they didn’t stop there.
➢ They wanted their object detector to recognize a wide
variety of objects. The Pascal VOC object detection dataset
contains only 20 classes, and they wanted their model to
recognize many more classes of objects.
MOTIVATION OF YOLOV2
➢ The problem was that building an object detection dataset with
millions of labeled images would take too much time. Labeling
images for object detection is far more expensive than for image
classification. So, they devised a new method to train YOLO9000,
simultaneously taking advantage of the MS COCO and ImageNet
datasets. As a result, YOLO9000 can detect over 9000 object
categories.
Changes in YOLOV2
➢ Batch Normalization
○ In YOLO v2, they added Batch Normalization to all
convolutional layers
Batch normalization
➢ Normalizing the input data => speed up training process
➢ Why do we not normalize the activation of every layer?
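The idea of normalizing the activations of a layer, not just the input data, can be sketched as follows. This is a simplified training-time version over a small batch of feature vectors, written in plain Python; the function name and the scalar `gamma`/`beta` are assumptions for illustration. A real convolutional layer normalizes per channel, learns `gamma` and `beta`, and uses running statistics at inference time.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature across the batch to zero mean and unit variance.

    batch: list of samples, each a list of feature values of equal length.
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for f in range(num_features):
        column = [sample[f] for sample in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        for i, v in enumerate(column):
            # Normalize, then apply the scale (gamma) and shift (beta).
            out[i][f] = gamma * (v - mean) / math.sqrt(var + eps) + beta
    return out


if __name__ == "__main__":
    activations = [[1.0, 10.0], [3.0, 30.0]]
    for row in batch_norm(activations):
        print(row)
```

After normalization both features are on the same scale even though the second one was ten times larger, which is what makes training less sensitive to the scale of earlier layers.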
Changes in YOLOV2
➢ High-Resolution Classifier
○ In YOLO v1, they trained a classifier on ImageNet images of
size 224 x 224, then increased the image resolution to
448 x 448 to train the object detection model. Hence, the
network had to simultaneously adjust to the new input
resolution and learn object detection. YOLO v2 first fine-tunes
the classifier at 448 x 448 on ImageNet before training for
detection.
Changes in YOLOV2
➢ High-Resolution Classifier
yolo-v2
➢ Convolutional with Anchor Boxes
○ YOLOv2 removes all fully connected layers and uses
anchor boxes to predict bounding boxes.
○ YOLO v1 only predicted two bounding boxes per grid cell,
which means a total of 98 (= 7 x 7 x 2) bounding boxes per
image, much lower than Faster R-CNN.
YOLO v1 suffered from a low recall rate
YOLOV1

Two bounding boxes had to share the same class probabilities. As such, increasing the number of bounding
boxes would not benefit much. On the contrary, Faster R-CNN and SSD predicted class probabilities for each
bounding box, making it easier to predict multiple classes sharing a similar center location.
Increasing Feature Map Resolution
➢ They removed one max-pooling layer, leaving five of them
for downsampling input images by a factor of 32. It would
convert the original input size of 448 x 448 into 14 x 14
feature maps, four times more grid cell locations than the
original feature map of 7 x 7.

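➢ Shrinking the input from 448 x 448 to 416 x 416 gives a
13 x 13 feature map (416 / 32 = 13), an odd size with a
single center cell.
➢ A larger feature map, however, means more weights in the
fully-connected layers.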
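The grid sizes quoted above follow from dividing the input size by the total downsampling factor. A minimal sketch, assuming a stride of 32 as stated in the text; the helper names are made up for this example.

```python
def grid_size(input_size, stride=32):
    """Side length of the output feature map for a square input."""
    if input_size % stride != 0:
        raise ValueError("input size must be a multiple of the stride")
    return input_size // stride


def has_single_center_cell(size):
    """An odd-sized grid has exactly one cell at its center."""
    return size % 2 == 1


if __name__ == "__main__":
    for input_size in (448, 416):
        g = grid_size(input_size)
        print(input_size, "->", g, "x", g,
              "| cells:", g * g,
              "| single center cell:", has_single_center_cell(g))
```

The 14 x 14 map has 196 cells, four times the 49 cells of the original 7 x 7 map, while the 13 x 13 map has 169.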
Going Fully Convolutional
➢ They replaced the fully-connected layers with
convolutional layers. The number of weights became
negligible since the kernel size is small and fixed no matter
the input resolution. It allowed them to use more bounding
boxes per grid cell. As such, they introduced anchor boxes
similar to Faster R-CNN.
➢ The recall rate increased thanks to the increased feature
map resolution and more bounding boxes. However, the
accuracy went down from 69.5 mAP to 69.2 mAP. So, why
did they still make this change? Because recall rose from
81% to 88% in the paper, which leaves the model more room
to improve.
➢ YOLOv1 was an anchor-free model that predicted the coordinates of
B-boxes directly using fully connected layers in each grid cell.
➢ Inspired by Faster-RCNN, which predicts B-boxes using hand-picked priors
known as anchor boxes, YOLOv2 works on the same principle.
➢ YOLOv1 predicted one set of class probabilities per grid cell,
regardless of the number of boxes B. YOLOv2 instead
predicts class and objectness for every anchor box.
Dimension Clusters
➢ Unlike Faster-RCNN, which used hand-picked anchor boxes,
YOLOv2 used a smart technique to find anchor boxes for the
PASCAL VOC and MS COCO datasets.
➢ Redmon and Farhadi reasoned that instead of using
hand-picked anchor boxes, they could pick better priors that
reflect the data more closely. This gives the network a better
starting point, making it easier to predict the detections and
faster to optimize.
Dimension Clusters
➢ YOLOv2 runs k-means clustering on the training set bounding
boxes to find good anchor boxes or priors.
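The clustering step can be sketched as below. The YOLOv2 paper uses the distance d(box, centroid) = 1 − IoU(box, centroid) so that large boxes do not dominate the error; everything else here (function names, the seeded random initialization, the toy box list) is an assumption for illustration, not the authors' code.

```python
import random


def wh_iou(wh_a, wh_b):
    """IoU of two boxes described only by (width, height), aligned at one corner."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def kmeans_anchors(boxes_wh, k, iterations=100, seed=0):
    """Cluster (width, height) pairs using 1 - IoU as the distance."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes_wh, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for wh in boxes_wh:
            # Smallest 1 - IoU distance is the same as largest IoU.
            best = max(range(k), key=lambda c: wh_iou(wh, centroids[c]))
            clusters[best].append(wh)
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                new_centroids.append((mean_w, mean_h))
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids)


if __name__ == "__main__":
    toy_boxes = [(10, 10), (12, 11), (11, 12), (100, 50), (110, 55), (105, 52)]
    for anchor in kmeans_anchors(toy_boxes, k=2):
        print(anchor)
```

On this toy data the two anchors settle on one small, roughly square box and one large, wide box, which is the kind of prior the network then refines.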
Direct Location Prediction
Add fine-grained feature
Multi-Scale Training
Light-weight backbone
➢ Darknet-19
➢ A fully convolutional model with 19 convolutional layers
and five max-pooling layers was designed.
YOLOV2 - Implementation
➢ https://www.maskaravivek.com/post/yolov2/
Yolo-v3
➢ YOLO v3 was released in April 2018. It adds further small
improvements, including predicting bounding boxes at
different scales. In this version, the Darknet backbone
expanded to 53 convolutional layers.
Yolo-v3
Yolo-v3
YOLOV3
➢ YOLO v2 introduced Darknet-19, a 19-layer network
supplemented with 11 more layers for object detection.
However, with its 30-layer architecture, YOLO v2 often
struggled to detect small objects. Therefore, the authors
introduced successive 3 × 3 and 1 × 1 convolutional layers
followed by shortcut connections, allowing the network to
be much deeper. The result is Darknet-53, which is shown
below.
➢ The original YOLO was the first object detection network to combine
the problems of drawing bounding boxes and identifying class labels in one
end-to-end differentiable network.
➢ YOLOv2 made a number of iterative improvements on top of YOLO,
including BatchNorm, higher resolution, and anchor boxes.
➢ YOLOv3 built upon previous models by adding an objectness score to
bounding box prediction, adding connections to the backbone network
layers, and making predictions at three separate levels of granularity to
improve performance on smaller objects.
YOLOV3-Implementation
YOLOV4
➢ There are two types of object detection models: two-stage object detectors
and single-stage object detectors. Single-stage detectors (like YOLO)
are composed of three components: a Backbone, a Neck, and a
Head that makes dense predictions, as shown in the figure below.
Yolo-v4
➢ YOLO v4 was released in April 2020, but this release is not from the first
author of YOLO. In Feb 2020, Joseph Redmon announced he was leaving the field
of computer vision due to concerns regarding the possible negative impact of
his work.
YOLO-v4
➢ The YOLO v4 release lists three authors: Alexey Bochkovskiy, the Russian developer
who built the YOLO Windows version, Chien-Yao Wang, and Hong-Yuan Mark
Liao.
➢ Compared with the previous YOLOv3, YOLOv4 has the following advantages:
○ It is an efficient and powerful object detection model that enables anyone with a
1080 Ti or 2080 Ti GPU to train a super fast and accurate object detector.
○ The influence of state-of-the-art “Bag-of-Freebies” and “Bag-of-Specials” object
detection methods during detector training has been verified.
○ The modified state-of-the-art methods, including CBN (Cross-iteration batch
normalization), PAN (Path aggregation network), etc., are now more efficient
and suitable for single GPU training.
Backbone
➢ Models such as ResNet, DenseNet, and VGG are used as
feature extractors. They are pre-trained on image
classification datasets, like ImageNet, and then fine-tuned
on the detection dataset. These networks produce different
levels of features, with higher semantics as the network
gets deeper (more layers), which are useful for the later
parts of the object detection network.
What’s the backbone for?
➢ It’s a deep neural network composed mainly of convolution
layers. The main objective of the backbone is to extract the
essential features, so the choice of backbone is a key step
that affects the performance of object detection. Often,
pre-trained neural networks are used as the backbone.
YOLOv4 Backbone Network: Feature Formation
➢ CSPResNext50
➢ CSPDarknet53
➢ EfficientNet-B3
CSPDarknet53
➢ CSP (Cross-Stage-Partial connections) + Darknet53
DenseNet (Dense connected convolutional network)
Five-layer dense block
One Dense Block in DenseNet
The ResNet-18 architecture.
EfficientNet
YOLOv4 Neck: Feature Aggregation
➢ FPN, PAN, NAS-FPN, BiFPN
Feature image pyramid
SPP
Path Aggregation Network (PAN)
Path Aggregation Network (PAN)
Spatial Attention Module
YOLOV4+SAM
YOLOv4 Head: The Detection Step
Bag of freebies
➢ Bag of freebies methods are the set of methods that only
increase the cost of training or change the training
strategy while leaving the cost of inference low. Let’s present
some simple methods commonly used in computer vision.
YOLOv4 Bag of Freebies
➢ Improve performance of the network without adding to inference
time in production
CutMix data augmentation
Mosaic data augmentation.
DropBlock regularization

https://playground.tensorflow.org
Class label smoothing
Overconfidence in Neural Networks
YOLOv4 Bag of Specials
• Mish activation,
• Cross-stage partial connections (CSP)
• Multi-input weighted residual connections (MiWRC)
Mish activation
YOLOV4
➢ IMPLEMENTATION
YOLOV5
YOLOV5
➢ Yolov5 closely resembles Yolov4, with the following
differences:
○ Yolov4 was released in the Darknet framework, which is
written in C. Yolov5 is based on the PyTorch
framework.
○ Yolov4 uses a .cfg file for configuration, whereas Yolov5
uses a .yaml file.
Backbone
➢ CSPResBlock
=> C3 Module
Activation function
Loss Function
➢ YOLOv5 returns three outputs: the classes of the detected
objects, their bounding boxes, and the objectness scores.
It uses BCE (Binary Cross Entropy) to compute the class
loss and the objectness loss, and CIoU (Complete
Intersection over Union) loss to compute the location loss.
The final loss is a weighted sum of these three terms.
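The CIoU location loss and the weighted sum can be sketched as follows. The CIoU terms follow the published definition (IoU, normalized center distance, and an aspect-ratio penalty); the function names, the (center x, center y, width, height) box format, and the weight parameters of `total_loss` are assumptions for this example and are not taken from the YOLOv5 code base.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for two boxes given as (cx, cy, w, h) with positive w and h."""
    # Convert centers and sizes to corners.
    px1, py1 = pred[0] - pred[2] / 2, pred[1] - pred[3] / 2
    px2, py2 = pred[0] + pred[2] / 2, pred[1] + pred[3] / 2
    tx1, ty1 = target[0] - target[2] / 2, target[1] - target[3] / 2
    tx2, ty2 = target[0] + target[2] / 2, target[1] + target[3] / 2

    # Plain IoU.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pred[2] * pred[3] + target[2] * target[3] - inter
    iou = inter / (union + eps)

    # Squared center distance over squared diagonal of the enclosing box.
    rho2 = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
    c2 = (max(px2, tx2) - min(px1, tx1)) ** 2 + (max(py2, ty2) - min(py1, ty1)) ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(target[2] / target[3]) - math.atan(pred[2] / pred[3])) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


def total_loss(cls_loss, obj_loss, box_loss, w_cls, w_obj, w_box):
    """Weighted sum of the three loss terms; the weights are hyperparameters."""
    return w_cls * cls_loss + w_obj * obj_loss + w_box * box_loss


if __name__ == "__main__":
    print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))   # identical boxes
    print(ciou_loss((0, 0, 2, 2), (1, 0, 2, 2)))   # shifted box
    print(ciou_loss((0, 0, 2, 2), (10, 0, 2, 2)))  # disjoint boxes
```

Unlike plain IoU, the loss still changes with distance when the boxes do not overlap at all, which gives the optimizer a useful signal in that case.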
Anchor Box
➢ n: extra small (nano) size model
➢ s: small size model
➢ m: medium size model
➢ l: large size model
➢ x: extra large size model
YOLOV5 implementation
YOLOX
➢ Released in July 2021, YOLOX has switched to the anchor
free approach which is different from previous YOLO
networks.
➢ In short, salient features of YOLOX are,
• Anchor free design
• Decoupled head
• simOTA label assignment strategy
• Advanced Augmentations: Mixup and Mosaic
Anchors in Object Detection
Anchor-Based Object Detection
Drawbacks of Anchor Based Approach
1. It needs a large set of anchor boxes. For example, it is more
than 100k in RetinaNet.
2. The anchor boxes require a lot of hyperparameters and
design tweaks. For example,
○ Number of anchors
○ Size of the anchors
○ The aspect ratio of the boxes
○ The number of sections the image should be divided into
Anchor Free Object Detection
Anchor Free Object Detectors
Anchor Free YOLOX
➢ YOLOX adopts the center-based approach, which has a per-pixel detection mechanism.
In anchor-based detectors, each location on the input image acts as the center for multiple
anchors.
➢ YOLOv3 SPP outputs 3 predictions per location. Each prediction is an 85D vector
holding the following values:
○ Class score
○ IoU score
○ Bounding box coordinates (center x, center y, width, height)
➢ YOLOX, on the other hand, reduces the predictions at
each location (pixel) from 3 to 1. The prediction contains
only a single 4D vector, encoding the location of the box at
each foreground pixel.
Striding for anchor free
The anchor location on the image can be obtained with the following formulas, where (i, j) is
the grid index and s is the stride:
x = s/2 + s*i
y = s/2 + s*j
The following formulas map a predicted bounding box (p_x, p_y, p_w, p_h)
to the actual location on the image (l_x, l_y, l_w, l_h), where (x, y) is the intersection point on
the grid which the prediction belongs to and s is the stride at the current FPN level:
l_x = p_x + x
l_y = p_y + y
l_w = s*e^(p_w)
l_h = s*e^(p_h)
For example, let’s go back to the bear image with a stride of 32. If the anchor point for this
prediction is (i, j) = (2, 1), meaning intersection point 2 on the x-axis and 1 on the y-axis, the
point on the image is (x, y) = (80, 48). If the model produces the prediction
(20, 15, 0.2, 0.3), then:
l_x = 20 + 80 = 100
l_y = 15 + 48 = 63
l_w = 32*e^(0.2) ≈ 39
l_h = 32*e^(0.3) ≈ 43
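The decoding formulas above translate directly into code. This is a small sketch that reproduces the worked example; the function names `anchor_point` and `decode` are chosen for illustration.

```python
import math


def anchor_point(i, j, stride):
    """Image coordinates of the anchor point for grid index (i, j)."""
    return stride / 2 + stride * i, stride / 2 + stride * j


def decode(pred, i, j, stride):
    """Map a raw prediction (p_x, p_y, p_w, p_h) to image coordinates (l_x, l_y, l_w, l_h)."""
    p_x, p_y, p_w, p_h = pred
    x, y = anchor_point(i, j, stride)
    return (
        p_x + x,                  # center x: offset added to the anchor point
        p_y + y,                  # center y
        stride * math.exp(p_w),   # width: exponential keeps it positive
        stride * math.exp(p_h),   # height
    )


if __name__ == "__main__":
    l_x, l_y, l_w, l_h = decode((20, 15, 0.2, 0.3), i=2, j=1, stride=32)
    print(l_x, l_y, round(l_w), round(l_h))
```

Scaling the width and height by the stride means the same raw output corresponds to a larger box at coarser FPN levels.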
Introducing the Decoupled Head in YOLOX
Shared Head in YOLO
➢ Earlier YOLO networks used a coupled head architecture.
All the regression and classification scores are obtained from
the same head.
The Need for Decoupling the YOLO Head
Decoupled Head in YOLOX
Multi-Positives in YOLOX to Improve the
Recall
Label Assignment in Object Detection problem
Label Assignment Before SimOTA
OTA: Optimal Transport Assignment for Object Detection
simOTA Advanced Label Assignment Strategy
Dynamic k Estimation
Center Prior
Strong Data Augmentation in YOLOX
Mosaic Augmentation
YOLOX Implementation
YOLOV6

YOLO detectors are constantly evolving, as is evident from new YOLO models being released
every few months. With YOLOv6, let’s explore what new and exciting features it brings to the
table.
➢ YOLOv6 employs plenty of new approaches to achieve state-of-the-art
results. These can be summarized into four points:
• Anchor free: provides better generalizability and costs less time in
post-processing.
• Model architecture: YOLOv6 comes with a revised reparameterized
backbone and neck.
• Loss functions: YOLOv6 uses Varifocal loss (VFL) for classification and
Distribution Focal loss (DFL) for detection.
• Industry-handy improvements: longer training epochs, quantization, and
knowledge distillation are some techniques that make YOLOv6 models well
suited for real-time industrial applications.
Anchor-free method
➢ The anchor-free design makes YOLOv6 51% faster compared to most
anchor-based object detectors. This is possible because it has 3
times fewer predefined priors.
The YOLOv6 Backbone Architecture
Training Inference
The YOLOv6 Neck Architecture
The YOLOv6 Detection Head
YOLOv6 implementation
YOLOV7
➢ YOLOv3 model, introduced by Redmon et al. in 2018
➢ YOLOv4 model, released by Bochkovskiy et al. in 2020, YOLOv4-tiny model,
research published in 2021
➢ YOLOR (You Only Learn One Representation) model, published in 2021
➢ YOLOX model, published in 2021
➢ NanoDet-Plus model, published in 2021
➢ PP-YOLOE, an industrial object detector, published in 2022
➢ YOLOv5 model v6.1 published by Ultralytics in 2022
➢ YOLOv7, published in 2022
YOLOV7
• Backbone: ELAN (YOLOv7-p5, YOLOv7-p6), E-ELAN (YOLOv7-
E6E)
• Neck: SPPCSPC + (CSP-OSA)PANet (YOLOv7-p5, YOLOv7-p6)
+ RepConv
• Head: YOLOR + Auxiliary Head (YOLOv7-p6)
➢ The architecture is derived from YOLOv4, Scaled YOLOv4, and
YOLO-R. Using these models as a base, further experiments
were carried out to develop new and improved YOLOv7.
➢ Backbone: Extended efficient layer aggregation networks (E-ELAN)
Model scaling for concatenation-based models
Planned re-parameterized convolution
Coarse for Auxiliary and Fine for Lead Loss
YOLOV7 implementation
What’s New in YOLOv8
➢ User-friendly API (Command Line + Python).
➢ Faster and more accurate.
➢ Supports:
○ Object Detection
○ Instance Segmentation
○ Image Classification
➢ Extensible to all previous versions.
➢ New backbone network.
➢ New anchor-free head.
➢ New loss function.
YOLOV8 implementation
Here is a summary of the steps to calculate the AP:
• Generate the prediction scores using the model.
• Convert the prediction scores to class labels.
• Calculate the confusion matrix: TP, FP, TN, FN.
• Calculate the precision and recall metrics.
• Calculate the area under the precision-recall curve.
• Measure the average precision.
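The steps above can be sketched for a single class as follows. This is a simplified version that sweeps the decision threshold over the sorted scores and sums precision over each increase in recall; the function name and the 0/1 label format are assumptions for illustration. For object detection, the labels would come from matching each predicted box to a ground-truth box by IoU, and benchmark-specific interpolation rules may give slightly different numbers.

```python
def average_precision(scores, labels):
    """Area under the precision-recall curve.

    scores: confidence score of each prediction.
    labels: 1 if the prediction is a true positive, 0 if it is a false positive.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0

    # Lowering the threshold step by step is the same as walking down the
    # predictions sorted from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        # Add the rectangle under the curve for this step in recall.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
    print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0]))
```

The mean of this value over all classes gives the mAP figures quoted for the detectors earlier in the deck.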
Dr. Nguyen Dinh Vinh, FPT University, Can Tho
