
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

JNANA SANGAMA, BELAGAVI – 590014

A Seminar Report on

“OBJECT DETECTION WITH YOLO”


Submitted in partial fulfillment of the requirements for the award of the degree of

BACHELOR OF ENGINEERING
IN
ELECTRONICS AND COMMUNICATION
Submitted by:
NAME: Shrinivas Bhusannavar
USN: 2AG16EC036

Under the Guidance of


Mr. Gajanan Tudavekar.
Assistant Professor, Dept. of ECE,
AITM Belgaum.

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

ANGADI INSTITUTE OF TECHNOLOGY AND MANAGEMENT,


BELGAUM – 590008
2019-2020
SURESH ANGADI EDUCATION FOUNDATION’S
ANGADI INSTITUTE OF TECHNOLOGY AND MANAGEMENT
(An Integrated Campus)
Department of Electronics & Communication Engineering
Approved by AICTE, New Delhi
Affiliated to Visvesvaraya Technological University, Belagavi

Certificate
Certified that the seminar work entitled “OBJECT DETECTION WITH YOLO” is
bonafide work carried out by SHRINIVAS A BHUSANNAVAR (2AG16EC036), in partial
fulfillment of the requirements for the award of the degree of Bachelor of Engineering in
Electronics and Communication of Visvesvaraya Technological University, Belagavi, during
the year 2019-2020. It is certified that all the corrections/suggestions indicated for internal
assessment have been incorporated in the report. The seminar report has been approved as it
satisfies the academic requirements in respect of seminar work prescribed for the Bachelor of
Engineering degree.

Signature of the Guide Signature of the HOD Signature of the Principal

Mr. Gajanan Tudavekar Dr. Anand Deshpande Dr. S.A.Pujari


Assistant Professor, Professor and Head, Principal,
Dept. of ECE, AITM. Dept. of ECE, AITM. AITM, Belgaum

Name of the Examiners: Signature with date:

1. ………………………. ………………………..

2. ………………………. ………………………..
Ⅰ. ABSTRACT:
The objective is to detect objects using You Only Look Once (YOLO), a new
approach to object detection. This method has several advantages over other
object detection algorithms. Region-based algorithms such as R-CNN and Fast
R-CNN do not look at the image as a whole; YOLO, in contrast, looks at the
complete image, predicting bounding boxes with a convolutional network along
with class probabilities for those boxes, and it detects objects faster than
the other algorithms. Our base YOLO model processes images in real time at 45
frames per second. Finally, YOLO learns very general representations of
objects and outperforms top detection methods when generalizing from natural
images to other domains.
Ⅱ. INTRODUCTION:
Object detection is a technology that detects semantic objects of a given class in digital
images and videos. One of its real-time applications is self-driving cars, where the task
is to detect multiple objects in an image; the most common objects to detect in this
application are cars, motorcycles, and pedestrians. To locate the objects in the image we
use object localization, and a real-time system must be able to locate more than one
object at once.
The techniques for object detection can be split into two categories. The first is
algorithms based on classification, such as CNN- and R-CNN-style detectors. Here we
must select regions of interest from the image and classify each of them with a
convolutional neural network. This approach is very slow because a prediction must be
run for every selected region. The second category is algorithms based on regression,
which includes the YOLO method. Here we do not select regions of interest; instead, we
predict the classes and bounding boxes for the whole image in a single run of the
algorithm, detecting multiple objects with a single neural network. The YOLO algorithm
is therefore fast compared to classification-based algorithms; in real time it processes
45 frames per second.
YOLO makes localization errors but predicts fewer false positives in the background,
because it reasons globally about the image when making predictions. Unlike sliding-
window and region-proposal-based techniques, YOLO sees the entire image during
training and test time, so it encodes contextual information about classes as well as
their appearance. Fast R-CNN, a top detection method, mistakes background patches in
an image for objects because it cannot see the larger context; YOLO makes less than
half the number of background errors of Fast R-CNN. YOLO also learns generalizable
representations of objects: when trained on natural images and tested on artwork, it
outperforms top detection methods like DPM and R-CNN by a wide margin. Since YOLO
is highly generalizable, it is less likely to break down when applied to new domains or
unexpected input.
Ⅲ. LITERATURE SURVEY:
You Only Look Once: Unified, Real-Time Object Detection, by Joseph Redmon et al. Their
prior work is on detecting objects using a regression algorithm; to get high accuracy and
good predictions they proposed the YOLO algorithm in this paper.
Understanding of Object Detection Based on CNN Family and YOLO, by Juan Du. This
paper explains object-detection families such as CNN and R-CNN, compares their
efficiency, and introduces the YOLO algorithm as a way to increase it.
Learning to Localize Objects with Structured Output Regression, by Matthew B.
Blaschko and Christoph H. Lampert. This paper is about object localization; the authors
use the bounding-box method to localize objects, overcoming the drawbacks of the
sliding-window method.
Ⅳ. WORKING OF YOLO ALGORITHM:
First, an image is taken and the YOLO algorithm is applied. In our example, the image is
divided into a 3 × 3 grid; an image can be divided into any number of grid cells,
depending on its complexity. Once the image is divided, each grid cell undergoes
classification and localization of the object, and the objectness (confidence) score of
each cell is found. If no proper object is found in a cell, the objectness and bounding-box
values of that cell are zero; if an object is found, the objectness is 1 and the bounding-box
values are the corresponding values of the detected object. Bounding-box prediction is
explained below, as are anchor boxes, which are used to increase the accuracy of object
detection.
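As a minimal sketch of the grid step described above, the following assumed helper divides an image into an S × S grid and reports which cell a given point (for example, an object's centre) falls into. The function name and values are illustrative, not from the report:

```python
# A minimal sketch of dividing an image into an S x S grid and finding
# which cell a point falls into (hypothetical helper and values).
def cell_for_point(x, y, img_w, img_h, S=3):
    """Return (row, col) of the grid cell containing point (x, y)."""
    col = min(int(x / img_w * S), S - 1)   # clamp so edge points stay in-grid
    row = min(int(y / img_h * S), S - 1)
    return row, col

# A car centred at (300, 350) in a 450x450 image lands in cell (2, 2):
print(cell_for_point(300, 350, 450, 450))  # -> (2, 2)
```

Each cell is then responsible for classifying and localizing any object whose centre falls inside it.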

Figure 1: Working of YOLO

4.1 Bounding box predictions:


The YOLO algorithm predicts accurate bounding boxes for the image. The image is
divided into an S × S grid, and bounding boxes and class probabilities are predicted for
each grid cell. Both image classification and object localization are applied to each grid
cell, and each cell is assigned a label. The algorithm then checks each cell separately,
marks the label of any cell that contains an object, and marks its bounding boxes. The
labels of cells without an object are marked as zero.
Figure 2: Example image with 3x3 grids

Consider the above example: an image is taken and divided into a 3 × 3 grid. Each grid
cell is labelled, and each cell undergoes both image classification and object
localization. The label is denoted Y, and Y consists of 8 values.

Figure 3: Elements of label Y


Pc – represents whether an object is present in the grid cell; if present, pc = 1, else 0.
bx, by, bh, bw – the bounding-box values of the object (if present). c1, c2, c3 – the
classes. If the object is a car, then c1 and c3 will be 0 and c2 will be 1.
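The 8-element label described above can be sketched in code as follows. The function and the box values are illustrative assumptions; only the layout [pc, bx, by, bh, bw, c1, c2, c3] comes from the report:

```python
# Sketch of the 8-element label vector Y for one grid cell,
# following the layout [pc, bx, by, bh, bw, c1, c2, c3].
def make_label(object_present, box=None, class_id=None, num_classes=3):
    """Build the label for a single grid cell.

    box      -- (bx, by, bh, bw); only meaningful when an object is present
    class_id -- 0-based index of the object's class
    """
    label = [0.0] * (1 + 4 + num_classes)
    if object_present:
        label[0] = 1.0                # pc = 1: object present
        label[1:5] = box              # bounding-box values
        label[5 + class_id] = 1.0     # one-hot class entry (car -> c2)
    return label

# A cell containing a car (class index 1, i.e. c2), hypothetical box values:
print(make_label(True, box=(0.5, 0.6, 0.3, 0.8), class_id=1))
# -> [1.0, 0.5, 0.6, 0.3, 0.8, 0.0, 1.0, 0.0]
# An empty cell: pc = 0 and the remaining values are "don't care" (zeros here).
print(make_label(False))
```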
In our example image, the first grid cell contains no proper object, so it is represented as:

Figure 4: Bounding box and Class values of grid 1

In this grid cell there exists no proper object, so the pc value is 0, and the rest of the
values do not matter because no object is present; they are represented as ?. Now
consider a cell that does contain an object: both the 6th and 9th cells of the image
contain one. Let's consider the 9th cell; it is represented as:

Figure 5: Bounding box and Class values of grid 9

In this table, 1 represents the presence of an object, and bx, by, bh, bw are the
bounding-box values of the object in the 9th cell. The object in that cell is a car, so the
classes are (0, 1, 0). The matrix form of Y here is 3 × 3 × 8. For the 6th cell the matrix
will be similar, with different bounding-box values depending on the object's position in
the corresponding cell. If two or more cells contain the same object, the centre point of
the object is found and the cell containing that point is taken. To get accurate detection
of the object, we can use two methods: Intersection over Union (IoU) and non-max
suppression. IoU takes the actual and predicted bounding-box values and computes
IoU = Area of Intersection / Area of Union. If the IoU is greater than or equal to our
threshold value (0.5), it is a good prediction. The threshold value here is just an assumed
value; a greater threshold can be taken to increase accuracy and get a better prediction
of the object. The other method is non-max suppression: the highest-probability boxes
are taken and the boxes with a high IoU against them are suppressed, repeating until a
final box is selected as the bounding box for that object.
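The two methods above can be sketched directly from their definitions. This is a minimal illustration assuming boxes in (x1, y1, x2, y2) corner format; the helper names and the sample boxes are hypothetical:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, suppress overlapping ones, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two heavily overlapping detections of the same car plus one distant box:
boxes = [(10, 10, 60, 60), (12, 12, 62, 62), (100, 100, 150, 150)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # -> [0, 2]
```

The second (lower-scoring) box overlaps the first with IoU above the 0.5 threshold, so it is suppressed, while the distant box survives.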
4.2 Accuracy Improvement:
ANCHOR BOX: With bounding boxes alone, only one object can be identified per grid
cell. To detect more than one object in a cell, we use anchor boxes.

Figure 6: An example image for anchor box

Consider the above picture: the midpoints of both the human and the car fall in the
same grid cell. For this case we use the anchor-box method. The red grid cells are the
two anchor boxes for those objects. Any number of anchor boxes can be used for a
single image to detect multiple objects; in our case we have taken two.

Figure 7: Anchor boxes Figure 8: Anchor box prediction values

The above figure represents the anchor boxes of the image we considered. The vertical
anchor box is for the human and the horizontal one is for the car. In this type of
overlapping object detection, the label Y contains 16 values, i.e., the values of both
anchor boxes.
Pc in each anchor box represents the presence of an object, and bx, by, bh, bw in each
anchor box are the corresponding bounding-box values. The class value in anchor box 1
is (1, 0, 0) because the detected object is a human; in anchor box 2 the detected object
is a car, so the class value is (0, 1, 0). In this case the matrix form of Y is 3 × 3 × 16, or
equivalently 3 × 3 × 2 × 8: because there are two anchor boxes, the last dimension is
2 × 8.
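The shape arithmetic above can be checked with a short sketch, using the values from this example (a 3 × 3 grid, 2 anchor boxes, 3 classes); the variable names are illustrative:

```python
# Label-tensor shape arithmetic with anchor boxes (values from the example:
# a 3x3 grid, 2 anchor boxes per cell, 3 classes).
S = 3                       # grid size
B = 2                       # anchor boxes per cell
C = 3                       # classes (human, car, ...)
per_anchor = 1 + 4 + C      # pc + (bx, by, bh, bw) + class scores = 8
print((S, S, B * per_anchor))   # flat layout:   (3, 3, 16)
print((S, S, B, per_anchor))    # nested layout: (3, 3, 2, 8)
```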

4.3 Design:
The idea of YOLO is to make a convolutional neural network predict a (7, 7, 30) tensor.
It uses a convolutional neural network to scale the spatial dimension down to 7 × 7 with
1024 output channels at every location. Using two fully connected layers, it performs a
linear regression to create a 7 × 7 × 2 bounding-box prediction; the final prediction is
made by taking the boxes with high confidence scores.
The initial convolutional layers of the network extract features from the image, while the
fully connected layers predict the output probabilities and coordinates. The network
architecture is inspired by the GoogLeNet model for image classification: it has 24
convolutional layers followed by 2 fully connected layers. However, instead of the
inception modules used by GoogLeNet, it simply uses 1 × 1 reduction layers followed
by 3 × 3 convolutional layers, similar to Lin et al. The full network is shown in Figure 9.

Figure 9: CNN Network Design

The final output of our network is the 7 × 7 × 30 tensor of predictions.
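The depth of 30 follows the same S × S × (B·5 + C) pattern as the anchor-box labels: each of the B = 2 boxes per cell carries 5 values (x, y, w, h, confidence), plus C = 20 class probabilities, the PASCAL VOC setting used in the original YOLO paper:

```python
# Decomposition of YOLO's 7x7x30 output tensor: S x S x (B*5 + C), with
# S = 7 grid cells per side, B = 2 boxes per cell (x, y, w, h, confidence
# each), and C = 20 PASCAL VOC class probabilities.
S, B, C = 7, 2, 20
depth = B * 5 + C
print((S, S, depth))  # -> (7, 7, 30)
```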


Ⅴ. APPLICATIONS:

▪ Image panoramas
▪ Image watermarking
▪ Global robot localization
▪ Face Detection
▪ Optical Character Recognition
▪ Manufacturing Quality Control
▪ Content-Based Image Indexing
▪ Object Counting and Monitoring
▪ Automated vehicle parking systems
▪ Visual Positioning and tracking
▪ Video Stabilization
Ⅵ. CONCLUSION:
We introduced YOLO, a unified model for object detection. The model is simple to
construct and can be trained directly on full images. Unlike classifier-based approaches,
YOLO is trained on a loss function that directly corresponds to detection performance,
and the entire model is trained jointly. The YOLO algorithm detects objects using a
single neural network. The algorithm generalizes well: it outperforms other strategies
when generalizing from natural pictures to other domains. It is simple to build and can
be trained directly on a complete image. Region-proposal strategies limit the classifier
to a region, whereas YOLO has access to the entire image when predicting boundaries,
and it predicts fewer false positives in background areas. Compared to classifier-based
algorithms, this algorithm is more efficient and faster to use in real time.
Ⅶ. REFERENCES:
1. Joseph Redmon, Santosh Divvala, Ross Girshick, “You Only Look Once: Unified,
Real-Time Object Detection”, The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2016, pp. 779-788.
2. Juan Du, “Understanding of Object Detection Based on CNN Family and YOLO”,
New Research and Development Center of Hisense, Qingdao 266071, China.
3. Matthew B. Blaschko, Christoph H. Lampert, “Learning to Localize Objects with
Structured Output Regression”, Computer Vision – ECCV 2008, pp. 2-15.
4. Wei Liu, Dragomir Anguelov, Dumitru Erhan, “SSD: Single Shot MultiBox
Detector”, Computer Vision – ECCV 2016, pp. 21-37.
5. Lichao Huang, Yi Yang, Yafeng Deng, Yinan Yu, “DenseBox: Unifying Landmark
Localization with End to End Object Detection”, Computer Vision and Pattern
Recognition (cs.CV).
6. Dumitru Erhan, Christian Szegedy, Alexander Toshev, “Scalable Object Detection
using Deep Neural Networks”, The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2014, pp. 2147-2154.
