Object detection research paper
Abstract—Today, images and videos are everywhere; the sheer quantity of images on social media and networking sites is unfathomable. Every device is now fitted with a camera, which opens up huge possibilities. Object recognition is the process of detecting an object and identifying it using various image algorithms. The main purpose of this paper is to recognize objects in real time and assign them to previously defined classes. The algorithms we utilized are computationally efficient. Previously, object detection was done using RFID and IR technologies, which required dedicated hardware, but with the advent of image processing and neural networks, almost no new hardware is required. Almost everything has a camera these days, from pens to mobile phones. This has given rise to a new field called computer vision, i.e. using pictures and videos to detect, segregate and track objects or events so that we can “understand” a real-world scenario.

Keywords—Haar Cascade, numpy array, depth calculation, tensorflow, datasets

I. INTRODUCTION

Blind people have traditionally relied on guide canes and physical touch to sense objects. With about 285 million people worldwide having some form of visual impairment according to the World Health Organization, developing visual aids is one of the most vibrant research areas in the computer vision community. We have designed one such aid by combining the traditional cane with a device that uses neural networks and image processing to guide the visually impaired.

One of the biggest challenges in computer vision and image processing is achieving true invariant object recognition. Although concepts like image matching, robust feature detection, and 3D models have been in the conversation for a long time, it is only recently, more specifically since the end of the last decade, that researchers and professionals have approached this problem seriously. Only recently has there been substantial progress in implementing algorithms that detect invariant features in increasingly complex everyday images. The early endeavors towards digital image recognition were limited in scope: identification was limited to corners and edges. This proved effective but had many limitations, as the mere recognition of corners was not enough for the elaboration of 3D models, and object reconstruction suffered in many cases. Hence another class of algorithms, focused on matching textures, was introduced.

As we see, there have been many projects and technical publications in this domain. The system in [1] uses ultrasonic sensors to discover objects and hurdles for correct evaluation. High-value ultrasonic sensors may be user friendly, but they are highly affected by temperature variations and have problems scrutinizing reflections from soft, curved, slim and tiny objects. With the help of buzzers and alarms, an alert is generated so as to make sure the person does not face an accident. That paper also makes use of the Dangling Object Detection algorithm, which determines the position of the object and whether it falls in the warning range. Its analysis shows that users can quickly obtain real-time output by utilizing their device while in movement. The use of buzzers is a viable option in our project as it is an ideal output mode for the visually impaired. Using the Haar Cascade algorithm, we can create a similar comparison model for recognition. One can extract the images and make the identification by dividing an image into various regions and running the algorithm in each region, a part-by-part division that yields higher operational accuracy [2]. One can also implement such a system using RFID tags, which increases the overall accuracy. This is based on transmitter-receiver technology, where the objects are fitted with a transmitter and the person holds the receiver. Data is transferred from a tag to a reader, which can then act on that data [3]. Using this method, one can measure the distance between the detector and the object, and the motion of the locator with respect to the RFID tag, using the RSSI (Received Signal Strength Indicator) value. The usage of Haar-like features makes the task simpler and easier to design: a Haar-like feature takes neighboring rectangular pixel regions at a given point in a detection window, sums the pixel intensities in each region, and computes the difference between these sums. This difference is then used to classify sub-regions of an image [4]. The detection window slides over the entire image and yields a positive or negative value at each position: if the object is found, the window gives a positive value, otherwise a negative one. By carrying out this process over the entire image, we can accurately identify the object. The overall speed of this process is high, so there is little chance of a delayed or incorrect output. A further advancement in the object detection domain has been the use of haptic technology within a virtual environment [5]. With the help of virtual reality, we can easily construct an ideal route for the visually impaired. In the object recognition scenario, we reduce the time required for the system to identify objects based on different shapes, which are comprehended using different edges, points and lines.
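The Haar-like feature computation described above (summing pixel intensities over neighboring rectangular regions and taking the difference, typically via an integral image for speed) can be sketched in a few lines. This is a minimal NumPy illustration; the function names are ours and not taken from any detection library:

```python
import numpy as np

def integral_image(img):
    # Cumulative sums over both axes let any rectangular
    # pixel sum be read off in constant time.
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    # Sum of pixels in img[r0:r1, c0:c1], computed from the
    # integral image with at most four lookups.
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

def two_rect_feature(ii, r, c, h, w):
    # A simple two-rectangle Haar-like feature: the intensity
    # difference between the left and right halves of a window.
    left = rect_sum(ii, r, c, r + h, c + w // 2)
    right = rect_sum(ii, r, c + w // 2, r + h, c + w)
    return left - right

# A tiny dark-to-bright test patch: a strong vertical edge
img = np.array([[10, 10, 200, 200],
                [10, 10, 200, 200]], dtype=np.int64)
ii = integral_image(img)
print(two_rect_feature(ii, 0, 0, 2, 4))  # large negative edge response
```

A trained cascade evaluates thousands of such features at every window position, but each one reduces to this rectangle-sum arithmetic.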
978-1-5386-9166-3/19/$31.00 ©2019 IEEE
2019 International Conference on Nascent Technologies in Engineering (ICNTE 2019)
Systems like the vOICe are very efficient at converting captured images to sound signals which can be heard by the person on a headset [6]. One can also use the Vibe system, which takes into account certain scanning laws and correlates image intensity to sound, thus giving a different sound level for different pixel intensities [7].

Traditionally, radars have also been used for object detection, especially when the objects are moving vehicles, but even in this scenario cameras are found to offer significant assistance to radars, giving more accuracy in motion, more desirable resolution and lower overall cost. Volvo's Blind Spot Information System is one example [8]. The fact that extra features can be added to already existing camera systems strengthens the case for using a camera over other devices. Detection of an object and its moving direction using depth calculation has also been realized [9]. For image acquisition, that work used the RGB camera and depth sensor of a Microsoft Kinect; for data collection, a total of 600 samples of depth images of many different front scenes, with their respective RGB images, were scrutinized to test the new system. This again becomes a huge problem because of its computationally heavy nature, and real-time application would be very arduous. A near-range object detection system using randomly aligned stereo cameras is presented in [10]. It is based on stereo reverse perspective mapping: the left and right images are compared, and obstacle detection is achieved using their differences. After the transformation phase, a comparison is made using a polar histogram. The system holds up well in a variety of conditions, but it requires two cameras, which would increase cost and space requirements.

II. IMPLEMENTATION

We are implementing an object recognition system using the Haar Cascade algorithm. We have taken various real-time input datasets which are stored internally. We then import these images into our algorithm to keep as a reference model for comparison. Our input datasets comprise all the common image models which could be encountered by a blind person in their day-to-day life. We take our input image using a Raspberry Pi camera module and interface it with the Raspberry Pi hardware. We set our frame rate to 1 fps. With a frame rate of 10 fps, the Raspberry Pi takes more than 20 seconds to process the image and recognize the objects; this time lag is not desirable, so the frame rate is kept low. This helps us get an output with good accuracy without compromising on speed.

We took objects from real life and took more than 200 images of each object. This dataset was then trained for more than 40 epochs, i.e. cycles. Thus, the edges were clearly defined in these datasets. To add simplicity to our project, we utilized
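The frame-rate choice above can be enforced with a simple throttled capture loop: grab a frame, run recognition, then sleep out the rest of the frame period. A minimal sketch, where `capture_frame` and `run_inference` are hypothetical stand-ins for the camera read and the classifier call:

```python
import time

TARGET_FPS = 1.0                  # capture at 1 fps, as in our setup
FRAME_PERIOD = 1.0 / TARGET_FPS   # seconds per frame

def process_stream(capture_frame, run_inference, num_frames,
                   frame_period=FRAME_PERIOD):
    # Throttle capture to the target rate so that slow per-frame
    # inference never builds up a backlog of unprocessed frames.
    results = []
    for _ in range(num_frames):
        start = time.monotonic()
        frame = capture_frame()
        results.append(run_inference(frame))
        elapsed = time.monotonic() - start
        if elapsed < frame_period:
            time.sleep(frame_period - elapsed)
    return results
```

On the Raspberry Pi, `capture_frame` would wrap the camera module read and `run_inference` the Haar cascade classifier; here they are placeholders for illustration.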
the imageai library. Along with our original datasets, this library helped us recognize more objects. These images are made into model files by TensorFlow.

The real-time data was then compared with the datasets. The images were converted to a numpy array, which provides a high-performance array object and various tools for operating on it. The numpy array that we made is a grid of values, indexed by a tuple of nonnegative integers. We also used the SciPy library, which provides primary functions to work with images; it reads images from disk into numpy arrays and resizes them. The numpy array was particularly useful in creating a set of image pixels on which various operations can be performed. Once the image was obtained, we imported the numpy function which assigns the required value to all pixels and creates a grid. This array makes it easy for the classifier to initialize the recognition process and implement the required logic. We ran inference once per second. In each of the frames, the objects are detected and the appropriate functions are called. In the end, these objects were labelled with the classes that we defined before, and the probability of the object actually being present was also found. We set the confidence threshold to 55%: below this threshold, even though an object is detected and recognized, it is not shown in the output.

Using the distance between the object and the identifier, we can give an output signal conveying the position of the object, how far away it is, and what route should best be adopted to avoid collision.

We also added features for obstacle aversion. This is necessary as we want to avoid obstacles and design a clear path for movement. The object estimation has to be done perfectly so as to avoid any mishap. The signals could be as simple as 'left' and 'right' signals, which notify the user about the object and thus update him or her on the change in route. The output also gives the object's name so that the user has a clear idea of what is in front of him or her. This output signal is given as audio so that it is friendly to the visually impaired person. So first we introduced a buzzer, which gives an alert about the presence of an object, and then the audio output giving information about the object, how far away it is, and what direction change is to be adopted. Using these audio directions, one can easily manage his or her route safely.

III. RESULTS AND CONCLUSION

As we obtained the input images, the classifier converted those images into a numpy array of defined values. After detecting the image, certain functions were called to find the edges and points of the image and to find its particular shape and dimensions. As we took input datasets of various objects, we were able to identify these objects by the class names that we had taken, and to state a probability of that object appearing in the image. In the first part of this project, we designed the algorithm for identifying objects in a given captured image. The objects covering a larger area of the image are considered large objects, and their probability of occurrence is high. We ran the code for various images with varying light and background to check the accuracy. The output showed even the smallest of images being identified by the classifier.
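The 55% confidence threshold described above amounts to a simple filter over the classifier's scored detections. A minimal sketch; the (name, probability) tuples here are illustrative and not the actual output format of our classifier:

```python
THRESHOLD = 55.0  # per-cent confidence below which a detection is suppressed

def filter_detections(detections, threshold=THRESHOLD):
    # Keep only detections whose estimated probability of the
    # object actually being present reaches the threshold.
    return [(name, prob) for name, prob in detections if prob >= threshold]

detections = [("chair", 92.4), ("person", 61.0), ("dog", 38.5)]
print(filter_detections(detections))  # the 38.5% "dog" is dropped
```

Detections that fail the filter are still computed internally; they are simply never announced to the user, which trades a little recall for fewer false alarms.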
IV. FUTURE WORK