Object Detection With Voice Sensor and Cartoonizing The Image
So, we can distinguish between these three computer vision tasks with the following example:

Image Classification: Predict the type or class of an object in an image.
Input: An image that contains a single object, such as a photograph.
Output: A class label (e.g. one or more integers that are mapped to class labels).

Object Localization: Locate the presence of objects in an image and indicate their location with a bounding box.
Input: An image that contains one or more objects, such as a photograph.
Output: One or more bounding boxes (e.g. defined by a point, width, and height).

Object Detection: Locate the presence of objects with a bounding box and determine the types or classes of the located objects in an image.
Input: An image that contains one or more objects, such as a photograph.
Output: One or more bounding boxes (e.g. defined by a point, width, and height), and a class label for each bounding box.

A further extension of this breakdown of computer vision tasks is object segmentation, also called "object instance segmentation" or "semantic segmentation," where instances of recognized objects are indicated by highlighting the specific pixels of the object instead of a coarse bounding box. From this breakdown, we can understand that object recognition refers to a suite of challenging computer vision tasks.

For example, image classification is fairly straightforward, but the differences between object localization and object detection can be confusing, especially when all three tasks may equally be referred to as object recognition.

Humans can detect and identify objects present in an image. The human visual system is fast and accurate and can also perform complex tasks like identifying multiple objects and detecting obstacles with little conscious thought. With the availability of large sets of data, faster GPUs, and better algorithms, we can now easily train computers to detect and classify multiple objects within an image with high accuracy. We need to understand terms such as object detection, object localization, and the loss function for object detection and localization, and finally explore an object detection algorithm known as "You Only Look Once" (YOLO).

Image classification involves assigning a class label to an image, whereas object localization involves drawing a bounding box around one or more objects in an image. Object detection is more challenging: it combines these two tasks, drawing a bounding box around each object of interest in the image and assigning it a class label. Together, all these problems are referred to as object recognition.

Object recognition refers to a collection of related tasks for identifying objects in digital photographs. Region-based Convolutional Neural Networks, or R-CNNs, are a family of techniques for addressing object localization and recognition tasks, designed for model performance. You Only Look Once, or YOLO, is a second family of techniques for object recognition, designed for speed and real-time use.

1.2 Cartoonizing an Image

Image Processing: In this field of research, processing an image consists of identifying an object in an image, identifying its dimensions and the number of objects, applying effects such as blurring, and so on; such effects are highly appreciated in this modern era of media and communication. Image processing has multiple properties, each of which estimates how the image can be produced with more essence and sharpness. Each image is examined as a grid: the picture elements together are viewed as a 2-D matrix, with each cell storing the pixel value corresponding to that picture element.
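As a minimal illustration of this matrix view (a sketch assuming OpenCV and NumPy are installed; the file name photo.jpg is hypothetical):

    import cv2

    # OpenCV reads an image as a NumPy array of pixel values (BGR channel order).
    image = cv2.imread("photo.jpg")   # hypothetical file name

    print(image.shape)    # (height, width, 3): one cell per picture element and channel
    print(image[0, 0])    # the pixel values stored in grid cell (0, 0)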
2. LITERATURE SURVEY

[1] Joseph Redmon, Santosh Divvala, Ali Farhadi - Unified, Real-Time Object Detection: a unified model for object detection which is easy to build and is trained directly on full images. The model was built to detect objects accurately and fast, and to differentiate between artwork and real images. [2] Chengji Liu, Yufan Tao - Degenerative model: a model built for detecting degraded images, such as blurred and noisy images. This model performed better in terms of detecting degraded images and coped better with complex scenes. [3] Wenbo Lan, Song Wang - YOLO Network Model: the number of detection frames can reach 25 frames/s, which meets the demands of real-time performance. [4] Rumin Zhang, Yifeng Yang - Images of common obstacles were labeled and used for training YOLO. An object filter is applied to remove obstacles of no concern. Different types of scenes, including pedestrians, chairs, books and so on, are demonstrated to prove the effectiveness of this obstacle detection algorithm. [5] Zhimin Mo, Liding Chen, Wen-jing - Identification and detection of
automotive door panel solder joints based on YOLO. The proposed YOLO algorithm identifies the position of the solder joints accurately in real time. This helps increase the efficiency of the production line and has great significance for the flexibility and real-time operation of the welding of automobile door panels. [6] Gatys first proposed a neural style transfer (NST) method based on CNNs that transfers the style from a style image to a content image. They use the feature maps of a pre-trained VGG network to represent the content and optimize the result image. The results for cartoon style transfer are more problematic, as they often fail to reproduce clear edges or smooth shading. [7] Li and Wand obtained style transfer by local matching of CNN feature maps and using a Markov Random Field for fusion (CNNMRF). However, local matching can make mistakes, resulting in semantically incorrect output. [8] Chen proposed a method to improve comic style transfer by training a dedicated CNN to classify comic/non-comic images. [9] Liao proposed a Deep Analogy method which keeps semantically meaningful dense correspondences between the content and style images while transferring the style. They also compare and blend patches in the VGG feature space.

3. METHODOLOGY

Object detection is done using the YOLO algorithm. YOLO is a single-stage detector. win32com.client is used to convert the annotated text to speech. To achieve the basic cartoon effect, a bilateral filter and edge detection are used. The bilateral filter reduces the color palette, that is, the number of colors used in the image, and also reduces noise in the image.

3.1 Technique of detection

3.1.1 YOLO

All the previous object detection algorithms used regions to localize the object within the image: the network does not look at the complete image, but only at the parts of the image which have high probabilities of containing the object. YOLO, or You Only Look Once, is an object detection algorithm that differs from the region-based algorithms described above. In YOLO, a single convolutional network predicts the bounding boxes and the class probabilities for these boxes. To increase the speed of deep learning-based object detection, YOLO uses a one-stage detector strategy.

YOLO works by taking an image and splitting it into an S x S grid; within each grid cell we take m bounding boxes. For each bounding box, the network outputs a class probability and offset values for the bounding box. The bounding boxes whose class probability is above a threshold value are selected and used to locate the object within the image.

Image classification and localization are applied to each grid cell. YOLO then predicts the bounding boxes and their corresponding class probabilities for objects.
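To make this single-pass detection concrete, the following is a minimal sketch of YOLO inference using OpenCV's DNN module. The paper does not specify the YOLO version or how the model is loaded, so the file names yolov3.cfg and yolov3.weights, the 416 x 416 input size, and the 0.5 threshold are assumptions for illustration:

    import cv2
    import numpy as np

    # Load a pretrained Darknet YOLO model (assumed file names).
    net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

    image = cv2.imread("input.jpg")   # hypothetical input image
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)

    # A single forward pass ("you only look once") returns all candidate boxes.
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    for output in outputs:
        for detection in output:
            scores = detection[5:]              # class probabilities for this box
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > 0.5:                # keep boxes above the threshold
                cx, cy, w, h = detection[:4]    # box centre and size (relative units)
                print(class_id, confidence, cx, cy, w, h)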
We need to pass labelled data to the model in order to train it. Suppose we have divided the image into a grid of size 3 x 3 and there are a total of 3 classes into which we want the objects to be classified. Let's say the classes are Pedestrian, Car, and Motorcycle. Then, for each grid cell, the label y will be an eight-dimensional vector:

Figure 3.1 - Y vector

In Figure 3.1:
pc defines whether an object is present in the grid cell or not (it is the probability).
bx, by, bh, bw specify the bounding box, if there is an object.
c1, c2, c3 represent the classes. So, if the object is a car, c2 will be 1 and c1 and c3 will be 0, and so on.
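As a small illustration of this labelling scheme, the vector for one grid cell might be built as follows (a sketch; the coordinate values are hypothetical):

    import numpy as np

    # Layout of the label for one grid cell: [pc, bx, by, bh, bw, c1, c2, c3]
    y_car = np.array([1.0,             # pc: an object is present in this cell
                      0.5, 0.6,        # bx, by: box centre, relative to the cell
                      0.8, 0.4,        # bh, bw: box height and width
                      0.0, 1.0, 0.0])  # c2 = 1, so the object is a Car

    # For a cell with no object, pc is 0 and the other entries are "don't care".
    y_empty = np.zeros(8)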
3.1.2 Non-Max Suppression

One of the most common problems with object detection algorithms is that rather than detecting an object just once, they may detect it multiple times. The Non-Max Suppression technique cleans this up so that we get only a single detection per object: first, discard all the boxes whose probability is less than or equal to a pre-defined threshold (say, 0.5); then repeatedly take the box with the maximum probability and suppress the close-by boxes with non-maximum probabilities.
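A minimal sketch of this procedure (the 0.5 thresholds are the illustrative values from the text; the iou() helper is the one sketched in Section 5):

    def non_max_suppression(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
        """boxes: list of (x1, y1, x2, y2); scores: their class probabilities."""
        # Discard all boxes with probability at or below the threshold.
        kept = [i for i, s in enumerate(scores) if s > score_thresh]
        # Visit the surviving boxes from highest to lowest probability.
        kept.sort(key=lambda i: scores[i], reverse=True)
        result = []
        while kept:
            best = kept.pop(0)
            result.append(best)
            # Suppress the close-by boxes that overlap the best box too much.
            kept = [i for i in kept if iou(boxes[best], boxes[i]) < iou_thresh]
        return result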
3.1.3 win32com.client

The win32com.client module is used to add a voice that converts the annotated text to speech. Specifically, SAPI.SpVoice is used.
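A minimal sketch of this text-to-speech step (assumes Windows with the pywin32 package installed; the spoken sentence is a hypothetical detection annotation):

    import win32com.client

    # Create the Microsoft Speech API voice and speak the annotated text.
    speaker = win32com.client.Dispatch("SAPI.SpVoice")
    speaker.Speak("Detected: person, car")   # hypothetical annotation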
3.1.4 Cartoonizing an image

The process of creating a cartoon-effect image can be branched into two parts: detecting, blurring and bolding the edges of the actual RGB color image; and smoothing, quantizing and converting the RGB image to grayscale. Combining the two results then achieves the desired cartoon effect.
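A minimal sketch of these two branches with OpenCV (the filter sizes and thresholds are illustrative assumptions, not values given in the paper):

    import cv2

    image = cv2.imread("input.jpg")   # hypothetical input image

    # Branch 1: detect, blur and bold the edges.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.medianBlur(gray, 7)
    edges = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                  cv2.THRESH_BINARY, 9, 2)

    # Branch 2: smooth and quantize the colors; the bilateral filter flattens
    # color regions (reducing the palette) while keeping edges sharp.
    color = cv2.bilateralFilter(image, 9, 250, 250)

    # Combine the two results: keep the smoothed colors away from the edges.
    cartoon = cv2.bitwise_and(color, color, mask=edges)
    cv2.imwrite("cartoon.jpg", cartoon)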
4. IMPLEMENTATION

The goal of the design phase is to begin organizing a solution to the problem specified in the requirements document. This section describes how the work moves from the problem domain to the solution domain. The design phase meets the system's requirements. The design of a system is probably the most important factor determining the quality of the software; it has a significant impact on the later stages, particularly testing and maintenance.

The design document is the output of this phase. It works like a blueprint of the solution and is used later in implementation, testing, and maintenance. The design process is typically divided into two phases: System Design and Detailed Design.

System design, also known as top-level design, seeks to identify the modules that should be included in the system, the specifications of those modules, and how they interact with one another to produce the desired results. All of the main data structures, file formats and output formats, as well as the major modules within the system and their specifications, are decided during system design. System design is the method or art of defining the architecture, components, modules, interfaces, and data for a system in order to meet the specified requirements; it can be seen as the application of systems theory to product development.

The inner logic of each of the modules laid out in system design is determined in Detailed Design. Throughout this phase, the details of a module are usually specified in a high-level design description language that is independent of the target language in which the software will eventually be implemented. In short, the main goal of system design is to identify the modules, whereas the main goal of detailed design is to define the logic for each of the modules.

5. SYSTEM ARCHITECTURE

Figure 5.1 - Architecture diagram

Figure 5.1 shows that the user will either upload or capture the image; the choice is left to the user. The given image is then divided into S x S grid cells by the YOLO algorithm and passed forward through a deep convolutional neural network (DCNN). Bounding boxes are then predicted and their corresponding class ids are taken into consideration. Since more than one bounding box may be predicted for a single object, Non-Max Suppression (NMS) together with IoU has to be applied; NMS keeps only the highest-scoring boxes.

IoU = (area of intersection of the bounding boxes) / (area of union of the bounding boxes)
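A minimal sketch of this computation for axis-aligned boxes given as (x1, y1, x2, y2) corners (this is the iou() helper assumed by the NMS sketch in Section 3.1.2):

    def iou(box_a, box_b):
        """Intersection over Union of two (x1, y1, x2, y2) boxes."""
        # Corners of the intersection rectangle.
        x1 = max(box_a[0], box_b[0])
        y1 = max(box_a[1], box_b[1])
        x2 = min(box_a[2], box_b[2])
        y2 = min(box_a[3], box_b[3])
        # Zero if the boxes do not overlap.
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)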
Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware products. In order to convert the annotated text to speech, the win32com module from the pywin32 package, specifically SAPI.SpVoice, is used.

In order to convert the image given by the user to a cartoon, it is first converted to grayscale. cvtColor(image, flag) is a method in cv2 which is used to transform an image into the colour space specified by 'flag'. Here, our first step is to convert the image into grayscale, so we use the BGR2GRAY flag. This returns the image in grayscale, which is stored as grayScaleImage.
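A minimal sketch of this conversion (the variable name grayScaleImage mirrors the text; input.jpg is a hypothetical file name):

    import cv2

    image = cv2.imread("input.jpg")   # hypothetical input image
    # COLOR_BGR2GRAY transforms OpenCV's default BGR colour space to grayscale.
    grayScaleImage = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)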
6. RESULTS
Figure 6.3 - Image window with cartoonized image
Figure 6.6 - Image window with cartoonized image

Figure 6.1 shows the command for uploading the image. Figure 6.2 refers to an Image Window that is displayed to the user with the detected objects. Figure 6.3 refers to an Image Window that is displayed to the user as a separate window with the cartoonized image.

Figure 6.4 shows the command for capturing the image. Figure 6.5 refers to an Image Window that is displayed to the user with the detected objects. Figure 6.6 refers to an Image Window that is displayed to the user as a separate window with the cartoonized image.
9. ACKNOWLEDGEMENT