REPORT Python

The document discusses object detection using TensorFlow. It provides an overview of TensorFlow and its uses in applications like image recognition, voice search, etc. It then discusses various object detection models available in TensorFlow like ResNet, R-CNN, Fast R-CNN, Faster R-CNN and YOLO. These models differ in their approaches to generate region proposals and classify objects, with newer models aiming to improve speed and accuracy over earlier ones. The document also outlines the objectives and theoretical background of the object detection problem.

Uploaded by

imroz

OBJECT DETECTION USING TENSORFLOW

CHAPTER 1

INTRODUCTION

The TensorFlow Object Detection API is an open-source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models.

The framework ships with pre-trained models, collectively referred to as the Model Zoo.

Several models are available, so what distinguishes them? The models have different architectures and therefore provide different accuracies, and there is a trade-off between speed of execution and accuracy in placing bounding boxes.

TensorFlow bundles together machine learning and deep learning models and algorithms. It uses Python as a convenient front end and runs the computations efficiently in optimized C++.

TensorFlow allows developers to create a graph of computations to perform. Each node in the graph represents a mathematical operation and each connection represents data. Hence, instead of dealing with low-level details such as figuring out how to hitch the output of one function to the input of another, the developer can focus on the overall logic of the application.
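
The graph idea described above can be illustrated with a minimal pure-Python sketch. This is not how TensorFlow is implemented; the `Node` class and its `evaluate` method are hypothetical names chosen for illustration. Each node holds one operation, and its inputs (the connections) carry data from other nodes or from constants.

```python
import operator


class Node:
    """One operation in the graph; inputs are other nodes or plain constants."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def evaluate(self):
        # Resolve every input first (recursively for nodes), then apply the operation.
        values = [i.evaluate() if isinstance(i, Node) else i for i in self.inputs]
        return self.op(*values)


# Describe the computation (2 + 3) * 4 as a graph, then run it.
total = Node(operator.add, 2, 3)
product = Node(operator.mul, total, 4)
result = product.evaluate()
```

The graph is described first and only computed when `evaluate` is called, which mirrors the separation between defining a computation and running it.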

TensorFlow is among the most widely used libraries for deep learning, and several real-world applications of deep learning contribute to its popularity. As an open-source library for deep learning and machine learning, TensorFlow finds a role in text-based applications, image recognition, voice search, and many more. Image recognition systems such as Facebook's DeepFace and voice assistants such as Apple's Siri illustrate this class of application, and Google uses TensorFlow across many of its own products.

Department of CSE, Jain Polytechnic Belgaum

1.1 Objectives

The core purposes of this project are:

• To detect objects in image inputs and in real time.
• To recognise and label objects in image inputs and in real time.
• To publish the code in a public GitHub repository for public use and modification.

1.2 Theoretical background and definition of the problem

The aim of object detection is to detect all instances of objects from a known class, such as people, cars or faces, in an image. Generally, only a small number of instances of the object are present in the image, but there is a very large number of possible locations and scales at which they can occur, and these need to be explored somehow. Each detection is reported with some form of pose information. This can be as simple as the location of the object, a location and scale, or the extent of the object defined in terms of a bounding box. In other situations, the pose information is more detailed and contains the parameters of a linear or non-linear transformation.
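
As a rough illustration of the pose information described above, the following sketch stores a detection as a bounding box and derives a location and a scale from it. The `Detection` class and its field names are assumptions made for this example and do not come from any library.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """A detected object described by its class label and bounding box (pixels)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def location(self):
        # Centre of the bounding box.
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)

    @property
    def scale(self):
        # Width and height of the bounding box.
        return (self.x_max - self.x_min, self.y_max - self.y_min)


person = Detection("person", 10, 20, 50, 100)
```

Here `person.location` gives the centre of the box and `person.scale` its width and height, so the simpler forms of pose information can all be recovered from the bounding box.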

CHAPTER 2
LITERATURE SURVEY

• Online documentation:

https://github.com/tensorflow/tensorflow

This repository provides the dependencies and pre-trained models.

https://www.mygreatlearning.com/blog/object-detection-using-tensorflow/

This website describes the virtual environment setup and gives a tutorial for installing the dependencies.

• In various fields, there is a need to detect the target object and track it effectively while handling occlusions and other complexities. Many researchers (Almeida and Guting 2004, Hsiao-Ping Tsai 2011, Nicolas Papadakis and Aurélie Bugeau 2010) have attempted various approaches to object tracking. The nature of the techniques largely depends on the application domain.

CHAPTER 3

SYSTEM ANALYSIS

3.1 Existing Models

ResNet
To train the network more effectively, we adopt the same strategy as DSSD, which relies on a residual network because it generally performs better than the VGG network. The goal is to improve accuracy. The first modification is to replace the VGG network used in the original SSD with ResNet. We also add a series of convolutional feature layers at the end of the base network. These feature layers gradually decrease in size, which allows detection results to be predicted at multiple scales. However, for input sizes of 300 and 320, experiments show that replacing SSD's base convolutional network with a residual network does not improve accuracy but rather decreases it, even though ResNet-101 is deeper than VGG-16.

R-CNN

To circumvent the problem of selecting a huge number of regions, Ross Girshick et al. proposed a method that uses selective search to extract just 2000 regions from the image, which they called region proposals. Therefore, instead of trying to classify a huge number of regions, you can work with just 2000. These 2000 region proposals are generated by the selective search algorithm described below.

Selective Search:
1. Generate an initial sub-segmentation to produce many candidate regions.
2. Use a greedy algorithm to recursively combine similar regions into larger ones.
3. Use the generated regions to produce the final candidate region proposals.
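
The three steps above can be sketched in simplified form. The sketch below is a toy under stated assumptions: each region is reduced to a bounding box, a mean intensity and a pixel count, similarity is just the difference in mean intensity, and adjacency between regions is ignored. The actual selective search algorithm combines several similarity measures and merges only neighbouring regions. The function names are hypothetical.

```python
from itertools import combinations


def merge_boxes(a, b):
    """Smallest box (x1, y1, x2, y2) that encloses both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def greedy_region_proposals(regions):
    """regions: list of (box, mean_intensity, pixel_count) from an initial segmentation.

    Returns the boxes of all initial regions plus every merged region.
    """
    regions = list(regions)
    proposals = [box for box, _, _ in regions]
    while len(regions) > 1:
        # Greedy step: pick the pair of regions with the closest mean intensity.
        i, j = min(combinations(range(len(regions)), 2),
                   key=lambda pair: abs(regions[pair[0]][1] - regions[pair[1]][1]))
        (box_i, val_i, n_i), (box_j, val_j, n_j) = regions[i], regions[j]
        merged = (merge_boxes(box_i, box_j),
                  (val_i * n_i + val_j * n_j) / (n_i + n_j),  # size-weighted mean
                  n_i + n_j)
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged[0])
    return proposals


initial = [((0, 0, 10, 10), 0.10, 100),
           ((10, 0, 20, 10), 0.12, 100),
           ((0, 10, 20, 20), 0.90, 200)]
proposals = greedy_region_proposals(initial)
```

Every intermediate merge contributes a proposal, which is why the procedure yields candidate boxes at several scales from one segmentation.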

These 2000 candidate region proposals are warped into a square and fed into a convolutional neural network that produces a 4096-dimensional feature vector as output. The CNN acts as a feature extractor: the output dense layer holds the features extracted from the image, and these features are fed into an SVM to classify the presence of the object within that candidate region proposal.

In addition to predicting the presence of an object within the region proposals, the algorithm also predicts four offset values that increase the precision of the bounding box. For example, given a region proposal, the algorithm might have predicted the presence of a person, but the face of that person could have been cut in half within that region proposal. The offset values therefore help in adjusting the bounding box of the region proposal.
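
One common way to apply four offset values to a proposal is the parameterisation used in the R-CNN family of papers, sketched below. The function name `apply_offsets` and the centre/width/height box format are assumptions for illustration: the centre is shifted in proportion to the box size, and the width and height are scaled exponentially.

```python
import math


def apply_offsets(box, offsets):
    """Refine a proposal with predicted offsets.

    box:     (cx, cy, w, h) centre, width and height of the region proposal
    offsets: (dx, dy, dw, dh) predicted by the regressor
    """
    cx, cy, w, h = box
    dx, dy, dw, dh = offsets
    return (cx + w * dx,        # shift centre horizontally, relative to width
            cy + h * dy,        # shift centre vertically, relative to height
            w * math.exp(dw),   # rescale width
            h * math.exp(dh))   # rescale height


# Move the box up by a quarter of its height and stretch it vertically.
refined = apply_offsets((50, 50, 20, 40), (0.0, -0.25, 0.0, math.log(1.5)))
```

With all offsets equal to zero the proposal is returned unchanged, so the regressor only has to learn corrections.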

Problems with R-CNN

• Training the network still takes a huge amount of time, as 2000 region proposals have to be classified per image.
• It cannot be implemented in real time, as it takes around 47 seconds for each test image.
• The selective search algorithm is a fixed algorithm, so no learning happens at that stage. This can lead to the generation of bad candidate region proposals.

Fast R-CNN

The author of the previous paper (R-CNN) addressed some of its drawbacks to build a faster object detection algorithm, called Fast R-CNN. The approach is similar to the R-CNN algorithm, but instead of feeding the region proposals to the CNN, the input image is fed to the CNN to generate a convolutional feature map. From the convolutional feature map, the regions of the proposals are identified and warped into squares, and an RoI pooling layer reshapes them to a fixed size so that they can be fed into a fully connected layer. From the RoI feature vector, a softmax layer predicts the class of the proposed region, together with the offset values for the bounding box. Fast R-CNN is faster than R-CNN because the 2000 region proposals do not have to be fed to the convolutional neural network every time. Instead, the convolution operation is done only once per image and a feature map is generated from it.
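
The reshaping step performed by RoI pooling can be sketched in pure Python. This is a simplified illustration, not the layer used by any particular framework: the function name `roi_max_pool` is hypothetical, the region is given in whole feature-map cells, and it is assumed to span at least `output_size` cells per side.

```python
def roi_max_pool(feature_map, roi, output_size=2):
    """Max-pool one region of a 2-D feature map into an output_size x output_size grid.

    feature_map: list of rows of numbers
    roi:         (row0, col0, row1, col1), half-open, in feature-map cells
    """
    row0, col0, row1, col1 = roi
    height, width = row1 - row0, col1 - col0
    if height < output_size or width < output_size:
        raise ValueError("RoI must span at least output_size cells per side")
    pooled = []
    for i in range(output_size):
        # Row range covered by the i-th output bin.
        r_start = row0 + (i * height) // output_size
        r_end = row0 + ((i + 1) * height) // output_size
        row = []
        for j in range(output_size):
            c_start = col0 + (j * width) // output_size
            c_end = col0 + ((j + 1) * width) // output_size
            row.append(max(feature_map[r][c]
                           for r in range(r_start, r_end)
                           for c in range(c_start, c_end)))
        pooled.append(row)
    return pooled


feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]
pooled = roi_max_pool(feature_map, (0, 0, 4, 4))
```

Regions of different sizes all come out as the same fixed grid, which is what allows them to share one fully connected layer.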

Faster R-CNN
Both of the above algorithms (R-CNN and Fast R-CNN) use selective search to find the region proposals. Selective search is a slow, time-consuming process that hurts the performance of the network. As in Fast R-CNN, the image is provided as input to a convolutional network, which produces a convolutional feature map. Instead of running the selective search algorithm on the feature map to identify the region proposals, a separate network is used to predict them. The predicted region proposals are then reshaped using an RoI pooling layer, whose output is used to classify the image within each proposed region and to predict the offset values for the bounding boxes.

Therefore, it can even be used for real-time object detection.

YOLO — You Only Look Once

All the previous object detection algorithms use regions to localize the object within the image. The network does not look at the complete image, only at the parts of the image that have a high probability of containing the object. YOLO, or You Only Look Once, is an object detection algorithm that differs considerably from the region-based algorithms seen above. In YOLO, a single convolutional network predicts the bounding boxes and the class probabilities for these boxes.

YOLO works by taking an image and splitting it into an S × S grid; within each grid cell, m bounding boxes are taken. For each bounding box, the network outputs a class probability and offset values for the bounding box. The bounding boxes whose class probability is above a threshold value are selected and used to locate the object within the image. YOLO is much faster (45 frames per second) than the other object detection algorithms described above. The limitation of the YOLO algorithm is that it struggles with small objects within the image; for example, it might have difficulty identifying a flock of birds. This is due to the spatial constraints of the algorithm.
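
Two pieces of the YOLO description above, the S × S grid and the probability threshold, can be sketched as follows. The function names and the tuple layout of a prediction are assumptions for illustration; the network that produces the predictions is not modelled here.

```python
def grid_cell(cx, cy, image_w, image_h, s):
    """Return (row, col) of the S x S grid cell that contains the point (cx, cy)."""
    col = min(int(cx / image_w * s), s - 1)  # clamp points on the right edge
    row = min(int(cy / image_h * s), s - 1)  # clamp points on the bottom edge
    return row, col


def filter_by_threshold(predictions, threshold=0.5):
    """Keep predictions (box, class_name, probability) whose probability exceeds threshold."""
    return [p for p in predictions if p[2] > threshold]


predictions = [((30, 40, 120, 200), "person", 0.91),
               ((10, 10, 25, 30), "bird", 0.20),
               ((200, 80, 300, 160), "car", 0.65)]
kept = filter_by_threshold(predictions, threshold=0.5)
cell = grid_cell(224, 100, 448, 448, 7)
```

Because each grid cell only proposes a fixed number of boxes, many small objects that fall into the same cell compete with each other, which is one way to read the spatial constraint mentioned above.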

SSD:
SSD object detection is composed of two parts:

1. Extract feature maps, and
2. Apply convolution filters to detect objects.

SSD uses VGG16 to extract feature maps. It then detects objects using the Conv4_3 layer. For illustration, Conv4_3 is drawn as 8 × 8 spatially (it is actually 38 × 38). For each cell (also called a location), it makes 4 object predictions. Each prediction is composed of a boundary box and 21 scores, one per class (including one extra class for no object), and the highest score determines the class of the bounded object.

SSD does not use a dedicated region proposal network. Instead, it resorts to a very simple method: it computes both the location and the class scores using small convolution filters. After extracting the feature maps, SSD applies 3 × 3 convolution filters for each cell to make predictions. (These filters compute their results just like regular CNN filters.) Each filter outputs 25 channels: 21 class scores plus one boundary box (4 values).

So far, SSD has been described as detecting objects from a single layer. In fact, it uses multiple layers (multi-scale feature maps) to detect objects independently. As the CNN gradually reduces the spatial dimension, the resolution of the feature maps also decreases. SSD uses the lower-resolution layers to detect larger-scale objects. For example, the 4 × 4 feature maps are used for larger-scale objects.

SSD adds 6 more auxiliary convolution layers after VGG16. Five of these layers are added for object detection. In three of those layers, 6 predictions are made per cell instead of 4. In total, SSD makes 8732 predictions using 6 convolution layers.

Multi-scale feature maps improve accuracy: detection accuracy varies with the number of feature map layers used for object detection.
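
The figures of 8732 predictions and 25 channels can be checked with a little arithmetic. The feature map sizes used below (38, 19, 10, 5, 3 and 1) are taken from the SSD300 configuration in the SSD paper and are not stated in the text above, so treat them as an assumption; the constant and function names are hypothetical.

```python
# (feature map side length, predictions per cell) for the six detection layers.
# Sizes are assumed from the SSD300 configuration; three layers use 6 predictions.
SSD300_LAYERS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

NUM_CLASS_SCORES = 21  # class scores, including the "no object" class
BOX_VALUES = 4         # one boundary box is described by 4 values


def total_predictions(layers):
    """Sum of cells x predictions-per-cell over all detection layers."""
    return sum(side * side * per_cell for side, per_cell in layers)


def channels_per_prediction(num_scores=NUM_CLASS_SCORES, box_values=BOX_VALUES):
    """Output channels needed for one prediction: class scores plus box values."""
    return num_scores + box_values


total = total_predictions(SSD300_LAYERS)
channels = channels_per_prediction()
```

Under these assumptions the 38 × 38 layer alone contributes 5776 of the predictions, so most boxes come from the highest-resolution feature map.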

MANet:
Target detection has long been a fundamental and challenging problem and has been a hotspot in computer vision for many years. The objective of target detection is to determine whether any instances of a specified category of objects exist in an image. If there is an object to be detected in a given image, target detection returns the spatial position and extent of each instance (for example, by means of a bounding box). As one of the cornerstones of image understanding and computer vision, target and object detection forms the basis for more complex, higher-level visual tasks such as object tracking, image captioning, instance segmentation, and others. Target detection is also widely used in areas of artificial intelligence and information technology, including machine vision, autonomous vehicles, and human–computer interaction. In recent times, deep learning methods that automatically learn feature representations from data have effectively improved the performance of target detection. Neural networks are the foundation of deep learning.

Therefore, the design of better neural networks has become a key issue in improving target detection algorithms and their performance. Recently developed object detectors based on convolutional neural networks (CNNs) fall into two types. The first is the two-stage detector, such as Region-Based CNN (R-CNN), Region-Based Fully Convolutional Networks (R-FCN), and Feature Pyramid Network (FPN); the other is the single-stage detector, such as You Only Look Once (YOLO), the Single-Shot Detector (SSD), and RetinaNet. The former type generates a series of candidate frames as data samples and then classifies the samples with a CNN; the latter type does not generate candidate frames but directly converts the problem of positioning the object frame into a regression problem.

To maintain real-time speed without sacrificing precision, Liu et al. proposed SSD, which is faster than YOLO and has accuracy comparable to that of the most advanced region-based target detectors. SSD combines the regression idea of YOLO with the anchor box mechanism of Faster R-CNN, predicts the object region based on the feature maps of different convolution layers, and outputs discretised default box coordinates at multiple scales and aspect ratios. The convolution kernel predicts the coordinate offsets of a series of candidate frames and the confidence for each category. The local feature maps of the multi-scale areas are used to obtain results for each position in the entire image. This maintains the speed of the YOLO algorithm while ensuring that the frame positioning is similar to that of Faster R-CNN. However, SSD directly and independently uses two layers of the VGG16 backbone and four extra layers obtained by convolution with stride 2 to construct the feature pyramid, and it lacks strong contextual connections.
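
The multi-scale, multi-aspect-ratio default boxes mentioned above can be sketched numerically. The linear spacing of scales and the values `s_min = 0.2` and `s_max = 0.9` follow the SSD paper rather than the text above, so they are assumptions here, and the function names are hypothetical.

```python
import math


def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Scale of the default boxes for each feature map, evenly spaced from s_min to s_max."""
    return [s_min + (s_max - s_min) * k / (num_maps - 1) for k in range(num_maps)]


def default_box_size(scale, aspect_ratio):
    """Width and height (relative to the image) of a default box."""
    return scale * math.sqrt(aspect_ratio), scale / math.sqrt(aspect_ratio)


scales = default_box_scales(6)           # one scale per detection layer
wide_box = default_box_size(scales[0], 2.0)
```

Boxes with different aspect ratios at the same scale keep the same area, so the scale controls object size while the ratio controls object shape.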

To solve these problems, a single-stage detection architecture, commonly referred to as MANet, aggregates feature information at different scales. MANet achieves 82.7% mAP on the PASCAL VOC 2007 test set.

3.2 SYSTEM REQUIREMENT:

1. Install Python on your computer system.
2. Install ImageAI and its dependencies such as TensorFlow, NumPy, OpenCV, etc.
3. Download the object detection model file (RetinaNet).

CHAPTER 4

4.1 Steps to be followed

1) Download and install Python version 3 from the official Python website:
https://python.org
2) Install the following dependencies via pip:

i. TensorFlow:
TensorFlow is an open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks. It is used for both research and production at Google. TensorFlow was developed by the Google Brain team for internal Google use, as Google Brain's second-generation system, and was released under the Apache License 2.0 on November 9, 2015. Version 1.0.0 was released on February 11, 2017. While the reference implementation runs on single devices, TensorFlow can run on multiple CPUs and GPUs (with optional CUDA and SYCL extensions for general-purpose computing on graphics processing units). TensorFlow is available on 64-bit Linux, macOS, Windows, and mobile computing platforms including Android and iOS.
The architecture of TensorFlow allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), from desktops to clusters of servers to mobile and edge devices.

TensorFlow computations are expressed as stateful dataflow graphs. The name TensorFlow derives from the operations that such neural networks perform on multidimensional data arrays, which are referred to as tensors.

Command: pip install tensorflow

ii. NumPy:
NumPy is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors.

Command: pip install numpy

iii. SciPy:
SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in engineering. SciPy builds on the NumPy array object and is part of the NumPy stack, which includes tools such as Matplotlib, pandas and SymPy, and an expanding set of scientific computing libraries. This NumPy stack has uses similar to applications such as MATLAB, Octave, and Scilab. The NumPy stack is also sometimes referred to as the SciPy stack. The SciPy library is currently distributed under the BSD license, and its development is sponsored and supported by an open community of developers. It is also supported by NumFOCUS, a community foundation for supporting reproducible and accessible science.

Command: pip install scipy

iv. OpenCV:
OpenCV is a library of programming functions mainly aimed at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage and then Itseez. The library is cross-platform and free to use under an open-source license (BSD for older releases; Apache 2.0 since version 4.5.0).

Command: pip install opencv-python

v. Pillow:
Pillow, a fork of the Python Imaging Library (PIL), is a free library for the Python programming language that provides support for opening, editing and saving several different image file formats. It is available for Windows, macOS and Linux.

Command: pip install pillow

vi. Matplotlib:
Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits such as Tkinter, wxPython, Qt, or GTK+.

Command: pip install matplotlib

vii. H5py:
The h5py package provides both a high-level and a low-level Python interface to the HDF5 library. The low-level interface is intended to be a complete wrapping of the HDF5 API, while the high-level component uses established Python and NumPy concepts to support access to HDF5 files, datasets and groups.
A strong emphasis on automatic conversion between Python (NumPy) datatypes and data structures and their HDF5 equivalents vastly simplifies the process of reading and writing data from Python.

Command: pip install h5py

viii. Cython:
Cython is an optimising static compiler for both the Python programming
language and the extended Cython programming language (based on Pyrex). It
makes writing C extensions for Python as easy as Python itself.

Cython gives you the combined power of Python and C to let you

• write Python code that calls back and forth from and to C or C++ code natively at any point.
• easily tune readable Python code into plain C performance by adding static type declarations, also in Python syntax.
• use combined source code level debugging to find bugs in your Python, Cython and C code.
• interact efficiently with large data sets, e.g. using multi-dimensional NumPy arrays.
• quickly build your applications within the large, mature and widely used CPython ecosystem.
• integrate natively with existing code and data from legacy, low-level or high-performance libraries and applications.

The Cython language is a superset of the Python language that additionally supports calling C functions and declaring C types on variables and class attributes. This allows the compiler to generate very efficient C code from Cython code. The C code is generated once and then compiles with all major C/C++ compilers in CPython 2.6, 2.7 (2.4+ with Cython 0.20.x) as well as 3.3 and all later versions. The Cython project regularly runs integration tests against all supported CPython versions and their latest in-development branches to make sure that the generated code stays widely compatible and well adapted to each version. PyPy support is work in progress (on both sides) and is considered mostly usable since Cython 0.17. The latest PyPy version is always recommended.

All of this makes Cython the ideal language for wrapping external C libraries,
embedding CPython into existing applications, and for fast C modules that
speed up the execution of Python code.

ix. Lxml

The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible with but superior to the well-known ElementTree API. At the time of writing, the latest release works with all CPython versions from 2.7 to 3.9.

x. Tf_slim

TF-Slim is a lightweight library for defining, training and evaluating complex models in TensorFlow. Components of TF-Slim can be freely mixed with native TensorFlow, as well as with other frameworks.

xi. Jupyter Notebook

Jupyter Notebook is a web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning. A modular design invites extensions that expand and enrich its functionality.

xii. pyCOCO (Common Objects in Context) dataset

COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features:

• Object segmentation
• Recognition in context
• Superpixel stuff segmentation
• 330K images (>200K labeled)
• 1.5 million object instances
• 80 object categories
• 91 stuff categories
• 5 captions per image
• 250,000 people with keypoints

Steps to set up TensorFlow and dependencies:

Python 3 (version 3.7 to 3.10) and Anaconda should be installed.

Step 1: Set up a virtual environment using

conda create -n obj_name

Step 2: Activate the virtual environment using

conda activate obj_name

Step 3: Install TensorFlow using pip in the models directory downloaded from the official TensorFlow GitHub repository.

pip install tensorflow

Step 4: Install the rest of the dependencies

pip install pillow Cython lxml jupyter matplotlib contextlib2 tf_slim

Step 5: Download Protocol Buffers (protobuf, v21.4) from

https://github.com/protocolbuffers/protobuf/releases

Step 6: In the Anaconda prompt, navigate to the folder containing the protoc file using cd 'path of folder' and run this command

protoc object_detection/protos/*.proto --python_out=.

Step 7: Install pycocotools using pip

pip install pycocotools
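
On shells that do not expand `*.proto` themselves, the Step 6 command can fail on some setups. One possible workaround, sketched below, is to expand the file list in Python and pass it to protoc explicitly. The function name `build_protoc_command` and the `research_dir` parameter are hypothetical, and the sketch assumes the `.proto` files live under `object_detection/protos` inside that directory.

```python
from pathlib import Path


def build_protoc_command(research_dir, protoc="protoc"):
    """Build the protoc argument list for every .proto file under object_detection/protos."""
    research_dir = Path(research_dir)
    proto_files = sorted(research_dir.glob("object_detection/protos/*.proto"))
    # protoc expects paths relative to the directory it is run from.
    relative = [p.relative_to(research_dir).as_posix() for p in proto_files]
    return [protoc, *relative, "--python_out=."]


# To run it (assumed layout), pass the list to subprocess from the same directory:
#   subprocess.run(build_protoc_command("models/research"),
#                  cwd="models/research", check=True)
```

Building the argument list separately from running it makes it easy to print and inspect the command before anything is executed.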

Setup for real-time object detection

https://www.lfd.uci.edu/~gohlke/pythonlibs/#opencv

Step 1: Download the .whl file (the version compatible with your Python and Windows installation) from the above website into the models directory.

Step 2: In the Anaconda prompt, navigate to the folder and install OpenCV using pip

pip install "file location"
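
After the setup steps, a quick check that the packages can be found may save debugging time later. The sketch below uses only the standard library; the list of import names is an assumption based on the packages installed above (for example, opencv-python is imported as `cv2` and pillow as `PIL`), and the function name is hypothetical.

```python
import importlib.util

# Import names assumed for the packages installed in the steps above.
REQUIRED_MODULES = ["tensorflow", "numpy", "cv2", "PIL", "matplotlib",
                    "lxml", "Cython", "tf_slim", "pycocotools"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example: print(missing_modules(REQUIRED_MODULES)) inside the activated environment.
```

An empty list means every module was located; any names returned point to the installation step that needs to be repeated.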

CHAPTER 5

DETAILED LIFE-CYCLE OF PROJECT

5.1 Data-flow diagram (DFD)

CHAPTER 6

6.1 Code

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import pathlib
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
from IPython.display import display
from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

# Move out of the "models" directory tree so the relative paths below resolve.
while "models" in pathlib.Path.cwd().parts:
    os.chdir('..')

def load_model(model_name):
  # Download and unpack the pre-trained model, then load its SavedModel.
  base_url = 'http://download.tensorflow.org/models/object_detection/'
  model_file = model_name + '.tar.gz'
  model_dir = tf.keras.utils.get_file(
    fname=model_name,
    origin=base_url + model_file,
    untar=True)

  model_dir = pathlib.Path(model_dir)/"saved_model"

  model = tf.saved_model.load(str(model_dir))

  return model

# Map the numeric class ids returned by the model to COCO label names.
PATH_TO_LABELS = 'models/research/object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(
  PATH_TO_LABELS, use_display_name=True)

model_name = 'ssd_inception_v2_coco_2017_11_17'
detection_model = load_model(model_name)

def run_inference_for_single_image(model, image):
  image = np.asarray(image)
  # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
  input_tensor = tf.convert_to_tensor(image)
  # The model expects a batch of images, so add an axis with `tf.newaxis`.
  input_tensor = input_tensor[tf.newaxis,...]

  # Run inference
  model_fn = model.signatures['serving_default']
  output_dict = model_fn(input_tensor)

  # All outputs are batched tensors.
  # Convert to numpy arrays, and take index [0] to remove the batch dimension.
  # We're only interested in the first num_detections.
  num_detections = int(output_dict.pop('num_detections'))
  output_dict = {key:value[0, :num_detections].numpy()
                 for key,value in output_dict.items()}
  output_dict['num_detections'] = num_detections

  # detection_classes should be ints.
  output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)

  # Handle models with masks:
  if 'detection_masks' in output_dict:
    # Reframe the bbox mask to the image size.
    detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
      output_dict['detection_masks'], output_dict['detection_boxes'],
      image.shape[0], image.shape[1])
    detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5,
                                       tf.uint8)
    output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()

  return output_dict

def show_inference(model, image_path):
  # The array-based representation of the image is used later to prepare
  # the result image with boxes and labels on it.
  image_np = np.array(Image.open(image_path))
  # Actual detection.
  output_dict = run_inference_for_single_image(model, image_np)
  # Visualization of the results of a detection.
  vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    output_dict['detection_boxes'],
    output_dict['detection_classes'],
    output_dict['detection_scores'],
    category_index,
    instance_masks=output_dict.get('detection_masks_reframed', None),
    use_normalized_coordinates=True,
    line_thickness=8)

  display(Image.fromarray(image_np))

# Run detection on every JPEG in the test images directory.
PATH_TO_TEST_IMAGES_DIR = pathlib.Path('models/research/object_detection/test_images')
TEST_IMAGE_PATHS = sorted(list(PATH_TO_TEST_IMAGES_DIR.glob("*.jpg")))
for image_path in TEST_IMAGE_PATHS:
    print(image_path)
    show_inference(detection_model, image_path)
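
The listing above passes `use_normalized_coordinates=True`, which means each detection box is given as fractions of the image size in the order ymin, xmin, ymax, xmax. If pixel coordinates are needed, for example to crop a detected object, a conversion along the following lines could be used. The function name `to_pixel_boxes` and the `min_score` parameter are hypothetical, and plain Python lists stand in for the NumPy arrays in `output_dict`.

```python
def to_pixel_boxes(boxes, scores, image_height, image_width, min_score=0.5):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes to pixel (left, top, right, bottom).

    Detections whose score is below min_score are dropped.
    """
    results = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score < min_score:
            continue
        results.append((round(xmin * image_width),
                        round(ymin * image_height),
                        round(xmax * image_width),
                        round(ymax * image_height),
                        score))
    return results


# Stand-in values shaped like output_dict['detection_boxes'] and ['detection_scores'].
example_boxes = [[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]]
example_scores = [0.9, 0.3]
pixel_boxes = to_pixel_boxes(example_boxes, example_scores,
                             image_height=100, image_width=200)
```

The same function accepts the arrays from `output_dict` directly, since it only iterates over rows and multiplies individual values.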

Real-Time Object detection using Tensorflow


import numpy as np

import os

Department of CSE, Jain Polytechnic Belgaum 23


OBJECT DETECTION USING TENSORFLOW

import six.moves.urllib as urllib

import sys

import tarfile

import tensorflow as tf

import zipfile

import pathlib

from collections import defaultdict

from io import StringIO

from matplotlib import pyplot as plt

from PIL import Image

from IPython.display import display

from object_detection.utils import ops as utils_ops

from object_detection.utils import label_map_util

from object_detection.utils import visualization_utils as vis_util

while "models" in pathlib.Path.cwd().parts:

    os.chdir('..')

def load_model(model_name):

  base_url = 'http://download.tensorflow.org/models/object_detection/'
Department of CSE, Jain Polytechnic Belgaum 24
OBJECT DETECTION USING TENSORFLOW

  model_file = model_name + '.tar.gz'

  model_dir = tf.keras.utils.get_file(

    fname=model_name,

    origin=base_url + model_file,

    untar=True)

  model_dir = pathlib.Path(model_dir)/"saved_model"

  model = tf.saved_model.load(str(model_dir))

  return model

PATH_TO_LABELS =
'models/research/object_detection/data/mscoco_label_map.pbtxt'

category_index =
label_map_util.create_category_index_from_labelmap(PATH_TO_L
ABELS, use_display_name=True)

model_name = 'ssd_inception_v2_coco_2017_11_17'

detection_model = load_model(model_name)

def run_inference_for_single_image(model, image):
  image = np.asarray(image)
  # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
  input_tensor = tf.convert_to_tensor(image)
  # The model expects a batch of images, so add an axis with `tf.newaxis`.
  input_tensor = input_tensor[tf.newaxis,...]

  # Run inference
  model_fn = model.signatures['serving_default']
  output_dict = model_fn(input_tensor)

  # All outputs are batched tensors.
  # Convert to numpy arrays, and take index [0] to remove the batch dimension.
  # We're only interested in the first num_detections.
  num_detections = int(output_dict.pop('num_detections'))
  output_dict = {key:value[0, :num_detections].numpy()
                 for key,value in output_dict.items()}
  output_dict['num_detections'] = num_detections

  # detection_classes should be ints.
  output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)

  # Handle models with masks:
  if 'detection_masks' in output_dict:
    # Reframe the bbox mask to the image size.
    detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
              output_dict['detection_masks'], output_dict['detection_boxes'],
               image.shape[0], image.shape[1])
    detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5,
                                       tf.uint8)
    output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()

  return output_dict

def show_inference(model, frame):
  # Take the frame from the webcam feed and convert it to an array.
  image_np = np.array(frame)
  # Actual detection.
  output_dict = run_inference_for_single_image(model, image_np)
  # Visualization of the results of a detection.
  vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      instance_masks=output_dict.get('detection_masks_reframed', None),
      use_normalized_coordinates=True,
      line_thickness=5)

  return image_np

# Now we open the webcam and start detecting objects.
import cv2

video_capture = cv2.VideoCapture(0)
while True:
    # Capture frame-by-frame; stop if the camera returns no frame.
    ret, frame = video_capture.read()
    if not ret:
        break
    image_np = show_inference(detection_model, frame)
    cv2.imshow('object detection', cv2.resize(image_np, (800, 600)))
    # Press 'q' to quit.
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
video_capture.release()
cv2.destroyAllWindows()


CHAPTER 7

7.1 Results and discussion

Snap 1: Application recognising a cup

Snap 2: Recognising a mouse and keyboard in a frame (real-time)

Snap 3: Detecting a laptop (single object)

Snap 4: Detecting similar objects in a frame (bottles)

Snap 5: Detecting different kinds of objects in a frame


CHAPTER 8
APPLICATION AND ADVANTAGES
Object detection is closely linked with related computer vision techniques such
as image segmentation and image recognition, which help us understand and
analyze the scenes in videos and images. Several real-world use cases of object
detection are now deployed commercially and have had a significant impact on
different industries.

Here we examine specifically how object detection applications have had an
impact in the following areas.

Self-driving cars

A primary reason behind the progress of autonomous vehicles is real-time object
detection based on artificial intelligence models. These systems allow a vehicle
to locate, identify and track the objects around it, for the purpose of safety
and efficiency.

Video Surveillance

Real-time object detection, combined with tracking of object movements, allows
video surveillance cameras to keep a record of the scenes at a particular
location such as an airport. The technique recognizes and locates several
instances of a given object in the video. As an object moves through a scene or
across a frame, the system stores the information alongside the real-time
tracking feed.
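
One simple way to follow an object as it moves from frame to frame is to pair each box in the current frame with the previous-frame box it overlaps most. The sketch below is an illustration only and is not the method used in this report: the helper names `iou` and `match_boxes` and the 0.3 overlap threshold are assumptions, and boxes are taken to be in the normalized `[ymin, xmin, ymax, xmax]` format of `detection_boxes`.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [ymin, xmin, ymax, xmax] boxes."""
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_boxes(prev_boxes, curr_boxes, min_iou=0.3):
    """Greedily pair each current box with the best unused previous box.

    Returns a list of (prev_index, curr_index) pairs. Current boxes without a
    sufficiently overlapping predecessor stay unmatched (treated as new objects).
    """
    matches, used = [], set()
    for j, curr in enumerate(curr_boxes):
        best_i, best_iou = None, min_iou
        for i, prev in enumerate(prev_boxes):
            if i in used:
                continue
            overlap = iou(prev, curr)
            if overlap >= best_iou:
                best_i, best_iou = i, overlap
        if best_i is not None:
            used.add(best_i)
            matches.append((best_i, j))
    return matches
```

Greedy matching is chosen here only because it is short; it can make poor choices when many objects overlap, which is one reason production trackers use more elaborate assignment schemes.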

Crowd Counting

For heavily populated areas such as shopping malls, airports, city squares and
theme parks, this application performs well. More generally, the same counting
approach is helpful to large enterprises and municipalities for tracking road
traffic, law violations and the number of vehicles passing in a particular time
frame.
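
Counting objects of one class reduces to filtering the detector output by class id and confidence. The following is a minimal sketch, assuming an `output_dict` shaped like the one returned by `run_inference_for_single_image`; the function name `count_class`, the 0.5 score threshold and the sample values are illustrative assumptions. Class id 1 corresponds to "person" in the COCO label map.

```python
def count_class(output_dict, class_id, min_score=0.5):
    """Count detections of `class_id` whose score is at least `min_score`."""
    classes = output_dict['detection_classes']
    scores = output_dict['detection_scores']
    return sum(
        1 for cls, score in zip(classes, scores)
        if int(cls) == class_id and float(score) >= min_score
    )


# Hypothetical detector output for one frame (values invented for illustration).
example_output = {
    'detection_classes': [1, 1, 3, 1],
    'detection_scores': [0.92, 0.61, 0.88, 0.30],
}
person_count = count_class(example_output, class_id=1)
print(person_count)  # prints 2
```

The function iterates with `zip`, so it works the same way on plain lists and on the NumPy arrays the model returns.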

Anomaly detection

Several anomaly detection applications across different industries use object
detection. For instance, in agriculture, object detection models can recognize
and locate potential instances of plant disease. Farmers can then be notified
and act to protect their crops from such threats.

As another example, object detection models have been used to identify skin
infections and symptomatic lesions. Some applications for skin care and acne
treatment have already been built using object detection models.

Keep in mind that some problems are encountered when creating any kind of
object detection model. However, solutions are also available to limit these
challenges.


CHAPTER 9

9.1 CONCLUSION
Based on the experimental results of this work, we are able to detect objects
more precisely and to identify each object individually, together with its
exact location in the picture along the x and y axes. This work also provides
experimental results for different object detection and identification methods
and compares their efficiency.
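
The detector reports each object's location as a normalized `[ymin, xmin, ymax, xmax]` box, so pixel coordinates along the x and y axes are obtained by scaling with the image size. A minimal sketch follows; the helper name `box_to_pixels` is an assumption made for illustration.

```python
def box_to_pixels(box, image_height, image_width):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel
    coordinates (left, top, right, bottom)."""
    ymin, xmin, ymax, xmax = box
    left = int(round(xmin * image_width))
    top = int(round(ymin * image_height))
    right = int(round(xmax * image_width))
    bottom = int(round(ymax * image_height))
    return left, top, right, bottom


# A box covering the central region of a 600 x 800 (height x width) frame.
print(box_to_pixels([0.25, 0.25, 0.75, 0.75], image_height=600, image_width=800))
# prints (200, 150, 600, 450)
```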

9.2 FUTURE ENHANCEMENTS


The object recognition system can be applied in areas such as surveillance,
face recognition, fault detection and character recognition. The objective of
this thesis is to develop an object recognition system that recognizes 2D and
3D objects in an image. The performance of such a system depends on the
features used and the classifier employed for recognition. This research work
attempts to propose a novel feature extraction method that extracts global
features and obtains local features from the region of interest. It also
attempts to hybridize traditional classifiers to recognize the object. The
object recognition system developed in this research was tested with benchmark
datasets such as COIL100, Caltech 101, ETH80 and MNIST.
The object recognition system is implemented in MATLAB 7.5. It is important to
mention the difficulties observed during experimentation, which arise from the
many features present in an image. The research work suggests that the image be
preprocessed and reduced to a size of 128 x 128. The proposed feature
extraction method helps to select the important features. To improve the
efficiency of the classifier, the number of features should be small.
Specifically, the contributions of this research work are as follows:

• An object recognition system is developed that recognizes two-dimensional
and three-dimensional objects.
• The features extracted are sufficient for recognizing the object and marking
its location.
• The proposed classifier is able to recognize the object at a lower
computational cost.
• The proposed global feature extraction requires less time than the
traditional feature extraction method.
• The SVM-kNN performs better than the BPN and SVM, and its results are
promising.
• The One-against-One classifier is efficient.
• The global feature is extracted from the local parts of the image.
• The local feature PCA-SIFT is computed from the blobs detected by the
Hessian-Laplace detector.
• Along with the local features, the width and height of the object, computed
through the projection method, are used.
The methods presented for feature extraction and recognition are general and
can be applied to any application that involves object recognition.
The proposed object recognition method combines the state-of-the-art
classifiers SVM and k-NN to recognize the objects in the image: the multiclass
SVM is hybridized with the k-NN for recognition. The feature extraction method
proposed in this research work is efficient and provides unique information for
the classifier. The image is segmented into 16 parts; from each part, Hu's
moment invariants are computed and converted into eigen components. The local
features of the image are obtained using the Hessian-Laplace detector. This
makes it easier to obtain the object's features and to mark the object's
location.
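
The grid-based global feature described above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the work described: the image is split into a 4 x 4 grid (16 parts), only the first Hu moment invariant is computed per part for brevity, the conversion to eigen components is omitted, and the function names `first_hu_invariant` and `grid_features` are hypothetical.

```python
import numpy as np


def first_hu_invariant(patch):
    """First Hu invariant (eta20 + eta02) of a 2-D grey-scale patch."""
    patch = np.asarray(patch, dtype=np.float64)
    m00 = patch.sum()
    if m00 == 0:
        return 0.0  # empty patch: moments are undefined, report 0
    ys, xs = np.mgrid[:patch.shape[0], :patch.shape[1]]
    # Intensity-weighted centroid of the patch.
    x_bar = (xs * patch).sum() / m00
    y_bar = (ys * patch).sum() / m00
    # Second-order central moments.
    mu20 = (((xs - x_bar) ** 2) * patch).sum()
    mu02 = (((ys - y_bar) ** 2) * patch).sum()
    # Normalised central moments of order 2 divide by m00 ** 2.
    return float((mu20 + mu02) / m00 ** 2)


def grid_features(image, grid=4):
    """Split `image` into grid x grid parts and return one feature per part."""
    image = np.asarray(image, dtype=np.float64)
    rows = np.array_split(image, grid, axis=0)
    return np.array([
        first_hu_invariant(cell)
        for row in rows
        for cell in np.array_split(row, grid, axis=1)
    ])


# A 128 x 128 image yields a 16-element feature vector.
features = grid_features(np.ones((128, 128)))
print(features.shape)  # prints (16,)
```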


As scope for future enhancement:

• The number of features used for recognition, whether local or global, can be
increased to improve the efficiency of the object recognition system.
• Geometric properties of the image can be included in the feature vector for
recognition.
• An unsupervised classifier can be used instead of a supervised classifier for
recognition of the object.
• The proposed object recognition system uses grey-scale images and discards
the colour information. The colour information in the image can be used for
recognition of the object. Colour-based object recognition plays a vital role
in robotics.
Although the visual tracking algorithm proposed here is robust in many
conditions, it can be made more robust by eliminating some of the limitations
listed below:
• In single visual tracking, the size of the template remains fixed. If the
size of the object reduces over time, the background becomes more dominant than
the object being tracked, and the object may no longer be tracked.
• A fully occluded object cannot be tracked and is considered a new object in
the next frame.
• Foreground object extraction depends on binary segmentation, which is carried
out by applying threshold techniques. Blob extraction and tracking therefore
depend on the threshold value.
• Splitting and merging cannot be handled well in all conditions using a single
camera, because information is lost when a 3D object is projected onto 2D
images.
• For night-time visual tracking, night vision mode should be available as an
inbuilt feature of the CCTV camera.


To make the system fully automatic and to overcome the above limitations,
multi-view tracking can be implemented in future using multiple cameras.
Multi-view tracking has an obvious advantage over single-view tracking because
of its wide coverage range, with different viewing angles for the objects to be
tracked. In this thesis, an effort has been made to develop an algorithm that
provides the base for future applications such as those listed below.
• In this research work, object identification and visual tracking have been
done using an ordinary camera. The concept extends well to applications such as
intelligent robots, automatic guided vehicles, enhancement of security systems
to detect suspicious behaviour and weapons, identification of suspicious enemy
movements on borders with the help of night vision cameras, and many more.
• The proposed method uses a background subtraction technique that is simple
and fast. This technique is applicable where the camera does not move. In
robotic applications or automated vehicle assistance systems, the camera moves
and the background changes continuously, which calls for different
segmentation techniques such as single Gaussian mixture or multiple Gaussian
mixture models.
• Object identification with motion estimation needs to be fast enough to be
implemented in a real-time system. There is still scope for developing faster
algorithms for object identification. Such algorithms can be implemented using
an FPGA or CPLD for fast execution.
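
Background subtraction with a fixed threshold, as referred to in the points above, can be sketched in a few lines. This is a minimal illustration assuming a static camera and grey-scale frames held as NumPy arrays; the function name `foreground_mask` and the threshold value of 30 are assumptions chosen for the example.

```python
import numpy as np


def foreground_mask(frame, background, threshold=30):
    """Return a boolean mask that is True where the frame differs from the
    background by more than `threshold` grey levels."""
    # Cast to a signed type so the subtraction cannot wrap around in uint8.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold


# Synthetic example: a flat background and a frame with one bright 2 x 2 object.
background = np.full((6, 6), 50, dtype=np.uint8)
frame = background.copy()
frame[2:4, 1:3] = 200
mask = foreground_mask(frame, background)
print(int(mask.sum()))  # prints 4
```

Because the mask depends directly on `threshold`, a value that is too low marks sensor noise as foreground, while a value that is too high misses low-contrast objects.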


