
ISSN 2278-3091
Volume 10, No.4, July - August 2021

MD. Salar Mohammad et al., International Journal of Advanced Trends in Computer Science and Engineering, 10(4), July – August 2021, 2762 – 2767
International Journal of Advanced Trends in Computer Science and Engineering
Available Online at http://www.warse.org/IJATCSE/static/pdf/file/ijatcse171042021.pdf
https://doi.org/10.30534/ijatcse/2021/171042021

Object Detection with Voice Sensor and Cartoonizing the Image


MD. Salar Mohammad1, Bollepalli Pranitha2, Shivani Goud Pandula3, Pulakanti Teja Sree4
1 Sreyas Institute of Engineering and Technology, India, [email protected]
2 Sreyas Institute of Engineering and Technology, India, [email protected]
3 Sreyas Institute of Engineering and Technology, India, [email protected]
4 Sreyas Institute of Engineering and Technology, India, [email protected]

ABSTRACT

Object detection is a general term for a collection of related computer vision tasks that involve identifying objects in digital photographs or in live captured images. Object detection combines two tasks: it localizes and classifies one or more objects in an image. Object localization refers to identifying the location of one or more objects in an image and drawing a bounding box around their extent, while image classification involves predicting the class of one object in an image. In this application, SAPI.SpVoice is used to add voice output. The voice sensor is intended especially for people who cannot see the objects in a particular image.

We present YOLO, a new approach to object detection. YOLO is a technique for object recognition designed for speed and real-time use. The YOLO model processes images in real time at 45 frames per second; a smaller version of the network, Fast YOLO, processes an astounding 155 frames per second.

Cartoonizing an image transforms it into a cartoon image. Today we can find countless photo editing applications on the internet that transform images into cartoons. The effect is similar to the "beautify" or AI effects in modern mobile phone cameras, and can be seen as smoothening the image to an extent: it makes the image look vivid, like a watercolor painting, by removing the roughness in the colors.

So, this application detects and identifies the objects in an image, uses a voice sensor to convert the annotated text to speech, and transforms the image into a cartoon image without using any external tool.

Key words: Object Detection, YOLO (You Only Look Once), NMS (Non-Max Suppression), IoU (Intersection over Union), Cartoonizing, Voice Sensor (win32com.client).

PROBLEM STATEMENT

The main aim of this project is to recognize the objects in an image, using a voice sensor that converts the annotated text to speech, and to cartoonize the image. When we look at images or videos, we can easily locate and identify the objects of our interest within moments; for computers, detecting objects is a very big task. Blind people cannot detect objects in an image, so our application uses a voice sensor to help them. The application mainly uses a new approach to object detection, YOLO (You Only Look Once).

Cartoonizing an image transforms it into a cartoon image. It is similar to the beautify or AI effect in modern mobile phone cameras, and can be seen as smoothening the image to an extent, making it look vivid, like a watercolor painting, with the roughness in the colors removed.

OBJECTIVE

Object detection is a key ability required by most computer and robot vision systems, and the latest research in this area has been making great progress in many directions. The objective of object detection with a voice sensor is to detect each object within an image, with its corresponding class id, regardless of its position, scale, or view within the image, and to convert the class ids to speech.

1. INTRODUCTION

1.1 Object Detection

Object recognition describes a collection of related computer vision tasks that involve identifying objects in digital photographs. Image classification involves predicting the class of one object in an image. Object localization refers to identifying the location of one or more objects in an image and drawing a bounding box around their extent. Object detection combines these two tasks: it localizes and classifies one or more objects in an image. When a user or practitioner refers to the term "object recognition", they often mean "object detection". It may be challenging for beginners to distinguish between these related computer vision tasks.


So, we can distinguish between these three computer vision tasks with this example:

Image Classification: Predict the type or class of an object in an image.
Input: An image with a single object, such as a photograph.
Output: A class label (e.g. one or more integers that are mapped to class labels).

Object Localization: Locate the presence of objects in an image and indicate their location with a bounding box.
Input: An image with one or more objects, such as a photograph.
Output: One or more bounding boxes (e.g. defined by a point, width, and height).

Object Detection: Locate the presence of objects with a bounding box, along with the types or classes of the located objects in the image.
Input: An image with one or more objects, such as a photograph.
Output: One or more bounding boxes (e.g. defined by a point, width, and height), and a class label for each bounding box.

A further extension of this breakdown of computer vision tasks is object segmentation, also called "object instance segmentation" or "semantic segmentation," where instances of recognized objects are indicated by highlighting the specific pixels of the object instead of a coarse bounding box. From this breakdown, we can understand that object recognition refers to a suite of challenging computer vision tasks.

For example, image classification is fairly straightforward, but the differences between object localization and object detection can be confusing, especially when all three tasks may equally be referred to as object recognition.

Humans can detect and identify objects present in an image. The human visual system is fast and accurate, and can perform complex tasks like identifying multiple objects and detecting obstacles with little conscious thought. With the availability of large data sets, faster GPUs, and better algorithms, we can now easily train computers to detect and classify multiple objects within an image with high accuracy. We need to understand terms such as object detection, object localization, and the loss function for object detection and localization, and finally explore an object detection algorithm known as "You Only Look Once" (YOLO).

Image classification involves assigning a class label to an image, whereas object localization involves drawing a bounding box around one or more objects in an image. Object detection is more challenging still: it combines these two tasks, drawing a bounding box around each object of interest in the image and assigning it a class label. Together, all these problems are referred to as object recognition.

Object recognition refers to a collection of related tasks for identifying objects in digital photographs. Region-based Convolutional Neural Networks, or R-CNNs, are a family of techniques for addressing object localization and recognition tasks, designed for model performance. You Only Look Once, or YOLO, is a second family of techniques for object recognition, designed for speed and real-time use.

1.2 Cartoonizing an Image

Image processing, in this field of research, consists of identifying an object in an image, identifying its dimensions and the number of objects, and applying effects such as blurring; such effects are highly appreciated in this modern era of media and communication. There are multiple properties in image processing, and each property helps produce an image with more essence and a sharper appearance. Each image is divided into a grid: the picture elements together are viewed as a 2-D matrix, with each cell storing the pixel value of the corresponding picture element.

2. LITERATURE SURVEY

[1] Joseph Redmon, Santosh Divvala, Ali Farhadi - Unified, Real-Time Object Detection: A unified model for object detection which is easy to build and is trained straight on full images. The model was built to detect images accurately and fast, and to differentiate between art and real images. [2] Chengji Liu, Yufan Tao - Degenerative model: A degenerative model built for detecting degraded images, such as blurred and noisy images. This model performed better at detecting degraded images and coped better with complex scenes. [3] Wenbo Lan, Song Wang - YOLO Network Model: The number of detection frames can reach 25 frames/s, which meets the demands of real-time performance. [4] Rumin Zhang, Yifeng Yang - The images of common obstacles were labeled and used for training YOLO. An object filter is applied to remove obstacles of no concern. Different types of scene, including pedestrians, chairs, books and so on, are demonstrated to prove the effectiveness of this obstacle detection algorithm. [5] Zhimin Mo, Liding Chen, Wen-jing - Identification and detection of


automotive door panel solder joints based on YOLO: the proposed YOLO algorithm identifies the position of the solder joints accurately and in real time. This helps increase the efficiency of the production line and has great significance for the flexibility and real-time operation of the welding of automobile door panels. [6] Gatys first proposed a neural style transfer (NST) method based on CNNs that transfers the style from a style image to a content image. They use the feature maps of a pre-trained VGG network to represent the content and optimize the result image. The results for cartoon style transfer are more problematic, as they often fail to reproduce clear edges or smooth shading. [7] Li and Wand obtained style transfer by local matching of CNN feature maps and using a Markov Random Field for fusion (CNNMRF). However, local matching can make mistakes, resulting in semantically incorrect output. [8] Chen proposed a method to improve comic style transfer by training a dedicated CNN to classify comic/non-comic images. [9] Liao proposed a Deep Analogy method which keeps semantically meaningful dense correspondences between the content and style images while transferring the style. They also compare and blend patches in the VGG feature space.

3. METHODOLOGY

Object detection is done using the YOLO algorithm; YOLO is a single-stage detector. win32com.client is used to convert the annotated text to speech. To achieve the basic cartoon effect, a bilateral filter and edge detection are used. The bilateral filter reduces the color palette, i.e. the number of colors used in the image, and also reduces noise in the image.

3.1 Technique of detection

3.1.1 YOLO

All previous object detection algorithms used regions to localize the object within the image: the network does not look at the complete image, but only at the parts of the image that have a high probability of containing the object. YOLO, or You Only Look Once, is an object detection algorithm that differs from the region-based algorithms described above. In YOLO, a single convolutional network predicts the bounding boxes and the class probabilities for these boxes. To increase the speed of deep learning-based object detectors, YOLO uses a one-stage detector strategy.

YOLO works by taking an image and splitting it into an S x S grid; within each grid cell we take m bounding boxes. For each bounding box, the network outputs a class probability and offset values for the bounding box. The bounding boxes with a class probability above a threshold value are selected and used to locate the object within the image.

Image classification and localization are applied to each grid cell. YOLO then predicts the bounding boxes and their corresponding class probabilities for the objects.

We need to pass labelled data to the model in order to train it. Suppose we have divided the image into a 3 x 3 grid and there are a total of 3 classes into which we want the objects to be classified; say the classes are Pedestrian, Car, and Motorcycle. Then, for each grid cell, the label y will be an eight-dimensional vector:

Figure 3.1 Y-Vector

• pc defines whether an object is present in the grid cell or not (it is the probability)
• bx, by, bh, bw specify the bounding box, if there is an object
• c1, c2, c3 represent the classes. So, if the object is a car, c2 will be 1 and c1 and c3 will be 0, and so on

3.1.2 Non-Max Suppression

One of the most common problems with object detection algorithms is that, rather than detecting an object just once, they may detect it multiple times. The Non-Max Suppression (NMS) technique cleans this up so that we get only a single detection per object: take the box with the maximum probability, suppress the nearby boxes with non-maximum probabilities, and discard all boxes with probabilities less than or equal to a pre-defined threshold (say, 0.5).

3.1.3 win32com.client

The win32com.client module is used to add the voice that converts the annotated text to speech. Specifically, SAPI.SpVoice is used.
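The eight-dimensional label vector of Figure 3.1 and the NMS procedure of Section 3.1.2 can be sketched in plain Python. This is a minimal illustration under assumed conventions (the NMS boxes use corner coordinates for simplicity, and the thresholds are examples), not the project's actual code:

```python
# Sketch of the per-cell label vector and of IoU + non-max suppression.

CLASSES = ["pedestrian", "car", "motorcycle"]

def make_label(present, box=(0, 0, 0, 0), class_name=None):
    """y = [pc, bx, by, bh, bw, c1, c2, c3] for one grid cell."""
    pc = 1.0 if present else 0.0
    one_hot = [1.0 if class_name == c else 0.0 for c in CLASSES]
    return [pc, *box, *one_hot]

def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def nms(detections, score_thresh=0.5, iou_thresh=0.5):
    """detections: list of (box, score). Returns the detections kept."""
    # Drop low-confidence boxes, then greedily keep the highest-scoring
    # box and suppress any remaining box that overlaps it too much.
    boxes = sorted((d for d in detections if d[1] > score_thresh),
                   key=lambda d: d[1], reverse=True)
    kept = []
    while boxes:
        best = boxes.pop(0)
        kept.append(best)
        boxes = [d for d in boxes if iou(best[0], d[0]) < iou_thresh]
    return kept
```

For example, given two heavily overlapping boxes and one distant box, `nms` keeps the best of the overlapping pair plus the distant box.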


3.1.4 Cartoonizing an image

The process of creating a cartoon-effect image can initially be branched into two divisions: detecting, blurring, and bolding the edges of the actual RGB color image; and smoothing, quantizing, and converting the RGB image to grayscale. The result comes from combining the two images, which helps achieve the desired effect.

4. IMPLEMENTATION

During the implementation phase, code is generated from the deliverables of the design phase; it is the longest phase of the software development life cycle. For a developer, this is the most vital stage of the life cycle because it is where the code is created. The implementation phase may overlap with the design and testing phases. There are numerous tools (CASE tools) available to automate the production of code based on information gathered and produced during the design phase.

5. SYSTEM ARCHITECTURE

The design phase's goal is to start organizing a solution to the problem stated in the requirements document. This section describes how the work moves from the problem domain to the solution domain. The design phase meets the system's requirements. The design of a system is most likely the most important factor in determining the quality of the software package, and it has a significant impact on the later stages, particularly testing and maintenance.

The design document is the result of this phase. It works like a blueprint of the solution and is used later in implementation, testing, and maintenance. The design process is typically divided into two phases: System Design and Detailed Design.

System design, also known as top-level design, seeks to identify the modules that should be included in the system, the specifications of those modules, and how they interact with one another to provide the desired results. All of the main data structures, file formats, and output formats, as well as the major modules within the system and their specifications, are set at the end of system design. System design is the method or art of creating the architecture, components, modules, interfaces, and data for a system in order to meet the stated requirements; it applies systems theory to development.

The inner logic of each of the modules laid out in system design is determined in Detailed Design. In this phase, the details of a module are usually specified in a high-level design description language that is independent of the target language in which the software will eventually be implemented. The main goal of system design is to identify the modules, whereas the main goal of detailed design is to plan the logic for each of the modules.

Figure 5.1 Architecture diagram

Figure 5.1 shows that the user will upload or capture the image; the user is given the choice of which. The given image is then divided into S x S grid cells by the YOLO algorithm and passed forward through a DCNN. Prediction of bounding boxes then takes place, and their corresponding class ids are taken into consideration. Since more than one bounding box may be predicted for a single object, Non-Max Suppression along with IoU has to be applied; NMS keeps only the highest-scoring boxes.

IoU = (area of intersection of the bounding boxes) / (area of union of the bounding boxes)

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware products. To convert the annotated text to speech, the win32com module (part of the pywin32 package), specifically SAPI.SpVoice, is used.

To convert the image given by the user to a cartoon, it is first converted to grayscale. cvtColor(image, flag) is a method in cv2 which is used to transform an image into the colour space specified by 'flag'. Here, our first step is to convert the image into grayscale, so we use the BGR2GRAY flag; this returns the image in grayscale, which is stored as grayScaleImage.
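The speech step just described can be sketched as follows. This is a hedged illustration, not the paper's code: the announce() helper and its label list are hypothetical, and win32com (from the pywin32 package) is Windows-only, so the sketch falls back to simply returning the text elsewhere:

```python
# Illustrative sketch of the text-to-speech step (hypothetical helper).
# SAPI.SpVoice via win32com is Windows-only; degrade gracefully elsewhere.
try:
    import win32com.client
    _speaker = win32com.client.Dispatch("SAPI.SpVoice")
except Exception:
    _speaker = None

def announce(labels, speaker=_speaker):
    """Turn a list of detected class labels into one spoken sentence."""
    text = "Detected: " + ", ".join(labels)
    if speaker is not None:
        speaker.Speak(text)  # SAPI call; skipped when no speaker is available
    return text
```

For example, `announce(["car", "person"])` returns (and, on Windows, speaks) the sentence "Detected: car, person".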


To smoothen the image, we simply apply a blur effect using the medianBlur() function, and we use bilateralFilter to remove noise; this can be seen as smoothening the image to an extent. Next, we retrieve the edges and highlight them, which is attained by the adaptive thresholding technique. Finally, we perform bitwise_and on the two images to mask them, and this cartoonifies our image.

6. RESULTS

Figure 6.1 command for uploading the image

Figure 6.2 image window with the detected objects

Figure 6.3 image window with cartoonized image

Figure 6.4 command for capturing the image

Figure 6.5 image window with the detected objects

Figure 6.6 image window with cartoonized image

Figure 6.1 shows the command for uploading the image. Figure 6.2 refers to an image window that is displayed to the user with the detected objects, and Figure 6.3 refers to an image window that is displayed to the user, as a separate window, with the cartoonized image. Figure 6.4 shows the command for capturing the image. Figure 6.5 refers to an image window that is displayed to the user with the detected objects, and Figure 6.6 refers to an image window that is displayed to the user, as a separate window, with the cartoonized image.


7. CONCLUSION

Object detection with a voice sensor and cartoonizing the image can be widely used to provide the blind with privacy and convenience in everyday life. It is also expected to be applied in industrial areas where diminished visibility occurs, such as coal mines and sea beds, to greatly help production and industrial development in extreme environments.

This application aims to enable people with visual impairment to live more independently. By making efficient use of the application and its associated voice feedback, people with visual impairment will be able to overcome some threats that they may come across in their day-to-day life, whether while reading a book or traveling through the city, thus helping visually impaired people to 'see through the ears'.

Cartoonizing an image transforms it into a cartoon image. Today we can find countless photo editing applications on the internet that transform images into cartoons. The effect is similar to the beautify or AI effect in modern mobile phone cameras, and can be seen as smoothening the image to an extent, making it look vivid, like a watercolor painting, with the roughness in the colors removed.

8. FUTURE SCOPE

• Object detection is a key ability for most computer and robot vision systems. Great progress has been observed in recent years, and some existing techniques are now part of many consumer electronics (e.g., face detection for auto-focus in smartphones) or have been integrated into driver-assistance technologies.
• Object detection can serve the fields of healthcare and security systems. In healthcare, medical image analysis can be performed using image extraction or object detection systems for computer vision predictive analytics and therapy; identification of cancer cells in a tissue biopsy may serve as an example.
• It is impossible for humans to reach the deepest parts of the sea, as they cannot handle the pressure, so object detection systems for nano-robots, or for robots in general, can be used to explore areas that have never been seen by humans.

9. ACKNOWLEDGEMENT

We have tried our best to present this paper on "Object Detection with Voice Sensor and Cartoonizing the Image" as clearly as possible. We are thankful to our guide, Prof. Md Salar Mohammad, for providing the technical guidance and suggestions regarding the completion of this work; it is our duty to acknowledge his constant encouragement, support, and guidance throughout the development of the project and its timely completion. We are also thankful to Prof. Abdul Nabi Shaik (HOD, Computer Science and Engineering); without his support and advice our project would not have shaped up as it has.

REFERENCES

1. Aditya Raj, Manish Kannaujiya, Ajeet Bharti, Rahul Prasad, Namrata Singh, Ishan Bhardwaj, "Model for Object Detection using Computer Vision and Machine Learning for Decision Making", International Journal of Computer Applications (0975 – 8887).
2. Cartoonizing an image, https://data-flair.training/blogs/cartoonify-image-opencv-python/
3. Global data on visual impairment, World Health Organization, https://www.who.int/blindness/publications/globaldata/en/
4. Google Cloud Text-to-Speech, https://cloud.google.com/text-to-speech
5. Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, "You Only Look Once: Unified, Real-Time Object Detection".
6. OpenCV, https://opencv.org/
7. Python programming language, https://www.python.org/
8. Rafael C. Gonzalez and Richard E. Woods, "Digital Image Processing", Pearson, 2018.
9. Selman Tosun, Enis Karaarslan, "Real-Time Object Detection Application for Visually Impaired People: Third Eye".
10. win32com.client, https://pbpython.com/windows-com.html
