REPORT Python
CHAPTER 1
INTRODUCTION
There are various models available, so what differentiates them? These models
have different architectures and therefore provide different accuracies, but
there is a trade-off between speed of execution and accuracy in placing
bounding boxes.
TensorFlow bundles together machine learning and deep learning models and
algorithms. It uses Python as a convenient front-end and runs them efficiently
in optimized C++.
TensorFlow is at present one of the most popular deep learning libraries.
Several real-world applications of deep learning make TensorFlow popular.
Being an open-source library for deep learning and machine learning,
TensorFlow finds a role to play in text-based applications, image recognition,
voice search, and many more. DeepFace, Facebook's image recognition system,
is one well-known example of such deep-learning-based image recognition.
1.1 Objectives
The aim of object detection is to detect all instances of objects from a known
class, such as people, cars, or faces, in an image. Generally, only a small
number of instances of the object are present in the image, but there is a very
large number of possible locations and scales at which they can occur, and
these need to be explored somehow. Each detection is reported with some form
of pose information. This can be as simple as the location of the object, a
location and scale, or the extent of the object defined in terms of a bounding
box. In other situations, the pose information is more detailed and contains
the parameters of a linear or non-linear transformation.
CHAPTER 2
LITERATURE SURVEY
Reference documentation:
https://github.com/tensorflow/tensorflow
https://www.mygreatlearning.com/blog/object-detection-using-tensorflow/
This website contains the virtual environment setup and a tutorial for
installing the dependencies.
In various fields, there is a need to detect a target object and also track it
effectively while handling occlusions and other complexities. Many researchers
(Almeida and Guting 2004, Hsiao-Ping Tsai 2011, Nicolas Papadakis and Aurélie
Bugeau 2010) have attempted various approaches to object tracking. The nature
of the techniques largely depends on the application domain.
CHAPTER 3
SYSTEM ANALYSIS
ResNet
To train the network model more effectively, we adopt the same strategy as
that used for DSSD (the performance of the residual network is better than
that of the VGG network). The goal is to improve accuracy. The first
modification implemented was therefore the replacement of the VGG network used
in the original SSD with ResNet. We also add a series of convolutional feature
layers at the end of the underlying network. These feature layers gradually
decrease in size, allowing detection results to be predicted at multiple
scales. However, for input sizes of 300 and 320, it is known experimentally
that although ResNet-101 is deeper than VGG-16, replacing SSD's underlying
convolutional network with the residual network does not improve accuracy but
rather decreases it.
R-CNN
Selective Search:
1. Generate an initial sub-segmentation of the image to produce many candidate
regions.
2. Use a greedy algorithm to recursively combine similar regions into larger
ones.
3. Use the generated regions to produce the final candidate region proposals.
These roughly 2000 candidate region proposals are warped into a square and fed
into a convolutional neural network that produces a 4096-dimensional feature
vector as output. The CNN plays the role of feature extractor: the output
dense layer contains the features extracted from the image, and these features
are fed into an SVM to classify the presence of the object within that
candidate region proposal.
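As a rough illustration of the proposal stage described above, the following sketch (not part of the report's own code) generates selective-search region proposals with OpenCV's contrib implementation; the input file name is only a placeholder.

# Sketch: generating candidate regions with Selective Search
# (requires opencv-contrib-python; 'test.jpg' is a hypothetical image).
import cv2

image = cv2.imread('test.jpg')
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()          # fast mode trades some recall for speed
rects = ss.process()                      # each entry is (x, y, w, h)
proposals = rects[:2000]                  # R-CNN keeps roughly 2000 proposals
print(len(rects), 'regions proposed')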
Fast R-CNN
The same author of the previous paper (R-CNN) solved some of the drawbacks of
R-CNN to build a faster object detection algorithm, called Fast R-CNN. The
approach is similar to the R-CNN algorithm, but instead of feeding the region
proposals to the CNN, we feed the input image to the CNN to generate a
convolutional feature map. From the convolutional feature map, we identify the
region proposals, warp them into squares, and, using an RoI pooling layer,
reshape them to a fixed size so that they can be fed into a fully connected
layer. From the RoI feature vector, a softmax layer predicts the class of the
proposed region as well as the offset values for the bounding box. The reason
Fast R-CNN is faster than R-CNN is that you do not have to feed 2000 region
proposals to the convolutional neural network every time. Instead, the
convolution operation is done only once per image and a feature map is
generated from it.
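The report's code does not include an RoI pooling layer; as an illustrative sketch, the fixed-size reshaping of proposals can be approximated in TensorFlow with tf.image.crop_and_resize. The feature map and boxes below are made-up stand-ins, not outputs of a real network.

# Sketch: approximating RoI pooling with tf.image.crop_and_resize.
# Boxes are normalized [y1, x1, y2, x2] coordinates (assumed values).
import tensorflow as tf

feature_map = tf.random.normal([1, 38, 38, 512])      # stand-in conv feature map
boxes = tf.constant([[0.1, 0.1, 0.5, 0.5],
                     [0.3, 0.2, 0.9, 0.8]])            # two hypothetical proposals
box_indices = tf.constant([0, 0])                      # both boxes belong to image 0
rois = tf.image.crop_and_resize(feature_map, boxes, box_indices, crop_size=[7, 7])
print(rois.shape)                                      # (2, 7, 7, 512) fixed-size regions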
Faster R-CNN
Both of the above algorithms (R-CNN and Fast R-CNN) use selective search to
find the region proposals. Selective search is a slow and time-consuming
process that affects the performance of the network. In Faster R-CNN,
similarly to Fast R-CNN, the image is provided as input to a convolutional
network which produces a convolutional feature map. Instead of running the
selective search algorithm on the feature map to identify the region
proposals, a separate region proposal network is used to predict them. The
predicted region proposals are then reshaped using an RoI pooling layer, which
is used to classify the image within the proposed region and predict the
offset values for the bounding boxes.
YOLO
All the previous object detection algorithms use regions to localize the
object within the image: the network does not look at the complete image, but
only at parts of the image that have a high probability of containing the
object. YOLO, or You Only Look Once, is an object detection algorithm quite
different from the region-based algorithms seen above. In YOLO, a single
convolutional network predicts both the bounding boxes and the class
probabilities for these boxes.
YOLO works by taking an image and splitting it into an SxS grid; within each
grid cell we take m bounding boxes. For each bounding box, the network outputs
a class probability and offset values for the box. The bounding boxes with a
class probability above a threshold value are selected and used to locate the
object within the image. YOLO is orders of magnitude faster (45 frames per
second) than other object detection algorithms. The limitation of the YOLO
algorithm is that it struggles with small objects within the image; for
example, it might have difficulty identifying a flock of birds. This is due to
the spatial constraints of the algorithm.
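The grid-and-threshold step described above can be sketched in a few lines of NumPy; the grid size, box count, and scores below are arbitrary stand-ins rather than real YOLO outputs.

# Sketch: keep only boxes whose best class probability exceeds a threshold.
import numpy as np

S, M, C = 7, 2, 20                          # grid size, boxes per cell, classes
boxes = np.random.rand(S, S, M, 4)          # (x, y, w, h) for each box
class_probs = np.random.rand(S, S, M, C)    # class probabilities for each box
scores = class_probs.max(axis=-1)           # best class score per box
keep = scores > 0.5                         # confidence threshold
selected_boxes = boxes[keep]
selected_classes = class_probs.argmax(axis=-1)[keep]
print(selected_boxes.shape[0], 'boxes kept above threshold')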
SSD:
The SSD object detector is composed of two parts: a backbone network that
extracts feature maps, and convolutional filters applied to those maps to
detect objects.
SSD uses VGG16 to extract feature maps. It then detects objects using the
Conv4_3 layer. For illustration, we draw the Conv4_3 feature map as 8 × 8
spatially (it is actually 38 × 38). For each cell (also called a location), it
makes 4 object predictions.
Each prediction is composed of a boundary box and 21 scores, one for each
class (including one extra class for no object), and we pick the highest score
as the class for the bounded object.
SSD does not use a delegated region proposal network. Instead, it uses a very
simple method: it computes both the locations and the class scores using small
convolutional filters. After extracting the feature maps, SSD applies 3 × 3
convolutional filters at each cell to make predictions. (These filters compute
results just like regular CNN filters.) Each filter outputs 25 channels: 21
class scores plus one boundary box.
So far we have described SSD detecting objects from a single layer. In fact,
it uses multiple layers (multi-scale feature maps) to detect objects
independently. As the CNN reduces the spatial dimension gradually, the
resolution of the feature maps also decreases. SSD uses lower-resolution
layers to detect larger-scale objects. For example, the 4 × 4 feature maps are
used for larger-scale objects.
SSD adds 6 more auxiliary convolutional layers after VGG16. Five of them are
used for object detection, and for three of those layers we make 6 predictions
per cell instead of 4. In total, SSD makes 8732 predictions using 6 prediction
layers.
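The figure of 8732 predictions can be checked from the feature-map sizes and the number of default boxes per cell on the six prediction layers of SSD300:

# 38x38, 19x19, 10x10, 5x5, 3x3 and 1x1 maps with 4, 6, 6, 6, 4, 4 boxes per cell.
layers = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]
total = sum(size * size * boxes for size, boxes in layers)
print(total)    # 8732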
Multi-scale feature maps enhance accuracy; in the SSD experiments, detection
accuracy drops when fewer feature map layers are used for detection.
MANet:
Target detection has been a fundamental, challenging problem and a hotspot in
the area of computer vision for many years. The purpose of target detection is
to determine whether any instances of a specified category of objects exist in
an image. If there is an object to be detected in a specific image, target
detection returns the spatial positions and spatial extents of the instances
of the objects (based on the use of a bounding box, for example). As one of
the cornerstones of image understanding and computer vision, target detection
forms the basis for more complex and higher-level visual tasks, such as object
tracking, image captioning, instance segmentation, and others. Target
detection is also widely used in areas such as artificial intelligence and
information technology, including machine vision, self-driving vehicles, and
human–computer interaction. In recent times, methods that automatically learn
feature representations from data based on deep learning have effectively
improved the performance of target detection. Neural networks are the
foundation of deep learning.
Therefore, the design of better neural networks has become a key issue in
improving target detection algorithms and their performance. Recently
developed object detectors based on convolutional neural networks (CNNs) can
be classified into two types. The first is the two-stage detector type, such
as Region-Based CNN (R-CNN), Region-Based Fully Convolutional Networks
(R-FCN), and the Feature Pyramid Network (FPN); the other is the single-stage
detector type, such as You Only Look Once (YOLO), the Single-Shot Detector
(SSD), and RetinaNet. The former type generates a series of candidate frames
as data samples and then classifies the samples with a CNN; the latter type
does not generate candidate frames but directly converts the object-frame
localization problem into a regression problem.
CHAPTER 4
i. Tensorflow:
TensorFlow is an open-source software library for dataflow and differentiable
programming across a range of tasks. It is a symbolic math library and is also
used for machine learning applications such as neural networks. It is used for
both research and production at Google. TensorFlow was developed by the Google
Brain team for internal Google use and was released under the Apache License
2.0 on November 9, 2015. TensorFlow is Google Brain's second-generation
system. Version 1.0 of TensorFlow was released on February 11, 2017. While the
reference implementation runs on single devices, TensorFlow can run on
multiple CPUs and GPUs (with optional CUDA and SYCL extensions for
general-purpose computing on graphics processing units). TensorFlow is
available on various platforms such as 64-bit Linux, macOS, Windows, and
mobile computing platforms including Android and iOS.
The architecture of TensorFlow allows easy deployment of computation across a
variety of platforms (CPUs, GPUs, TPUs), from desktops to clusters of servers
to mobile and edge devices.
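A minimal illustration of TensorFlow as a numerical dataflow library (arbitrary values, eager execution as in TensorFlow 2):

import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0], [2.0]])
print(tf.matmul(a, b).numpy())     # matrix product: [[5.], [11.]]
print(tf.reduce_sum(a).numpy())    # sum of all elements: 10.0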
ii. Numpy:
NumPy is a library for the Python programming language that adds support for
large, multi-dimensional arrays and matrices, along with a large collection of
high-level mathematical functions to operate on these arrays. The ancestor of
NumPy, Numeric, was originally created by Jim Hugunin with contributions from
several other developers. In 2005, Travis Oliphant created NumPy by
incorporating features of the competing Numarray into Numeric, with extensive
modifications. NumPy is open-source software and has many contributors.
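A small example of the array operations described above:

import numpy as np

a = np.arange(6).reshape(2, 3)     # 2x3 array [[0, 1, 2], [3, 4, 5]]
print(a.mean(axis=0))              # column means: [1.5 2.5 3.5]
print(a @ a.T)                     # matrix product, shape (2, 2)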
iii. SciPy:
SciPy contains modules for optimization, linear algebra, integration,
interpolation, special functions, FFT, signal and image processing, ODE
solvers, and other tasks common in science and engineering. SciPy builds
mainly on the NumPy array object and is part of the NumPy stack, which
includes tools like Matplotlib, pandas, and SymPy, together with an expanding
set of scientific computing libraries. This NumPy stack has similar uses to
other environments such as MATLAB, Octave, and Scilab, and is also sometimes
referred to as the SciPy stack. The SciPy library is currently distributed
under the BSD license, and its development is sponsored and supported by an
open community of developers. It is also supported by NumFOCUS, a community
foundation for supporting reproducible and accessible science.
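A small example using the optimization and linear-algebra modules mentioned above:

import numpy as np
from scipy import optimize, linalg

# Minimize the simple quadratic f(x) = (x - 3)^2.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)
print(result.x)                    # approximately 3.0

# Solve the linear system Ax = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
print(linalg.solve(A, b))          # [2. 3.]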
iv. OpenCV:
OpenCV (Open Source Computer Vision Library) is an open-source library of
computer vision and image processing functions. In this project it is used to
capture frames from the webcam and display the detection results.
v. Pillow:
The Python Imaging Library (PIL), now maintained as Pillow, is a free Python
library that provides support for opening, editing, and saving many different
image file formats. It is available for Windows, Mac OS X, and Linux.
It can be installed with the command: pip install pillow
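A small example of opening, converting, and saving an image with Pillow (the file names are placeholders):

from PIL import Image

img = Image.open('test.jpg')
print(img.size, img.mode)
img.convert('L').save('test_gray.png')    # save a grayscale copy in another format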
vi. Matplotlib:
Matplotlib is a plotting library for the Python programming language and its
numerical mathematics extension NumPy. It provides an object-oriented API for
embedding plots into applications using general-purpose GUI toolkits such as
Tkinter, wxPython, Qt, or GTK+.
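A minimal example of the object-oriented plotting API:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label='sin(x)')
ax.set_xlabel('x')
ax.legend()
plt.show()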
vii. H5py:
The h5py package provides both a high-level and a low-level interface to the
HDF5 library from Python. The low-level interface is intended to be a complete
wrapping of the HDF5 API, while the high-level component uses established
Python and NumPy concepts to support access to HDF5 files, datasets, and
groups.
A strong emphasis on automatic conversion between Python (NumPy) datatypes and
data structures and their HDF5 equivalents vastly simplifies the process of
reading and writing data from Python.
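A small example of writing and reading an HDF5 dataset (the file name is chosen arbitrarily):

import h5py
import numpy as np

with h5py.File('example.h5', 'w') as f:
    f.create_dataset('weights', data=np.random.rand(4, 4))

with h5py.File('example.h5', 'r') as f:
    print(f['weights'][:].shape)   # (4, 4)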
viii. Cython:
Cython is an optimising static compiler for both the Python programming
language and the extended Cython programming language (based on Pyrex). It
makes writing C extensions for Python as easy as Python itself.
Cython gives you the combined power of Python and C and lets you:
- write Python code that calls back and forth from and to C or C++ code
natively at any point;
- easily tune readable Python code into plain-C performance by adding static
type declarations, also in Python syntax;
- use combined source-level debugging to find bugs in your Python, Cython, and
C code;
- interact efficiently with large data sets, e.g. using multi-dimensional
NumPy arrays;
- quickly build your applications within the large, mature, and widely used
CPython ecosystem;
- integrate natively with existing code and data from legacy, low-level, or
high-performance libraries and applications.
Cython is continuously tested against all supported CPython versions and their
latest in-development branches to make sure that the generated code stays
widely compatible and well adapted to each version. PyPy support is work in
progress (on both sides) and is considered mostly usable since Cython 0.17;
the latest PyPy version is always recommended.
All of this makes Cython the ideal language for wrapping external C libraries,
embedding CPython into existing applications, and for writing fast C modules
that speed up the execution of Python code.
ix. Lxml
The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and
libxslt. It is unique in that it combines the speed and XML feature
completeness of these libraries with the simplicity of a native Python API,
mostly compatible with but superior to the well-known ElementTree API. The
latest release works with all CPython versions from 2.7 to 3.9. See the
introduction for more information about the background and goals of the lxml
project.
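A small example of parsing XML with lxml's ElementTree-compatible API (the XML snippet is only illustrative, in the style of a Pascal VOC label file):

from lxml import etree

xml = '<annotation><object><name>person</name></object></annotation>'
root = etree.fromstring(xml)
for obj in root.findall('object'):
    print(obj.findtext('name'))    # person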
x. Tf_slim
TF-Slim (tf_slim) is a lightweight library for defining, training, and
evaluating models in TensorFlow; the TensorFlow Object Detection API used in
this project depends on it.
The pre-trained detection models used here are trained on the COCO dataset,
which offers:
Object segmentation
Recognition in context
Superpixel stuff segmentation
330K images (>200K labeled)
1.5 million object instances
80 object categories
91 stuff categories
5 captions per image
250,000 people with keypoints
https://github.com/protocolbuffers/protobuf/releases
Step 6: Now, in the Anaconda prompt, navigate to the folder containing the
protoc file using cd 'path of folder' and run the command below.
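The command itself is not reproduced here; for the TensorFlow Object Detection API it is normally the standard protobuf compilation step, run from the models/research directory:

protoc object_detection/protos/*.proto --python_out=.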
https://www.lfd.uci.edu/~gohlke/pythonlibs/#opencv
Step 1: Download the .whl file (the version compatible with your Python
version and Windows) from the above website into the models directory.
Step 2: Now, in the Anaconda prompt, navigate to that folder and install
OpenCV using pip, as shown below.
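The exact file name depends on the wheel that was downloaded; the command has the general form (the file name below is a placeholder):

pip install opencv_python-<version>-<python-tag>-win_amd64.whl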
CHAPTER 5
CHAPTER 6
6.1 Code
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import pathlib
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
from IPython.display import display
from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
while "models" in pathlib.Path.cwd().parts:
    os.chdir('..')
def load_model(model_name):
    base_url = 'http://download.tensorflow.org/models/object_detection/'
    model_file = model_name + '.tar.gz'
    model_dir = tf.keras.utils.get_file(
        fname=model_name,
        origin=base_url + model_file,
        untar=True)
    model_dir = pathlib.Path(model_dir)/"saved_model"
    model = tf.saved_model.load(str(model_dir))
    return model
PATH_TO_LABELS = 'models/research/object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
model_name = 'ssd_inception_v2_coco_2017_11_17'
detection_model = load_model(model_name)
def run_inference_for_single_image(model, image):
    image = np.asarray(image)
    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
    input_tensor = tf.convert_to_tensor(image)
    # The model expects a batch of images, so add an axis with `tf.newaxis`.
    input_tensor = input_tensor[tf.newaxis, ...]
    # Run inference.
    model_fn = model.signatures['serving_default']
    output_dict = model_fn(input_tensor)
    return output_dict
def show_inference(model, image_path):
    # Helper reconstructed from the fragments in the report: load the image,
    # run detection, draw the boxes, and display the result.
    image_np = np.array(Image.open(image_path))
    # Actual detection.
    output_dict = run_inference_for_single_image(model, image_np)
    # Visualization of the results of a detection (drop the batch dimension).
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'][0].numpy(),
        output_dict['detection_classes'][0].numpy().astype(np.int64),
        output_dict['detection_scores'][0].numpy(),
        category_index,
        use_normalized_coordinates=True,
        line_thickness=5)
    display(Image.fromarray(image_np))
PATH_TO_TEST_IMAGES_DIR = pathlib.Path('models/research/object_detection/test_images')
TEST_IMAGE_PATHS = sorted(list(PATH_TO_TEST_IMAGES_DIR.glob("*.jpg")))
for image_path in TEST_IMAGE_PATHS:
    print(image_path)
    show_inference(detection_model, image_path)
# Second script: real-time detection on a webcam feed.
import os
import sys
import tarfile
import zipfile
import pathlib

import cv2
import numpy as np
import tensorflow as tf

from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
os.chdir('..')
def load_model(model_name):
    base_url = 'http://download.tensorflow.org/models/object_detection/'
    model_file = model_name + '.tar.gz'
    model_dir = tf.keras.utils.get_file(
        fname=model_name,
        origin=base_url + model_file,
        untar=True)
    model_dir = pathlib.Path(model_dir)/"saved_model"
    model = tf.saved_model.load(str(model_dir))
    return model
PATH_TO_LABELS = 'models/research/object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
model_name = 'ssd_inception_v2_coco_2017_11_17'
detection_model = load_model(model_name)
def run_inference_for_single_image(model, image):
    image = np.asarray(image)
    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
    input_tensor = tf.convert_to_tensor(image)
    # The model expects a batch of images, so add an axis with `tf.newaxis`.
    input_tensor = input_tensor[tf.newaxis, ...]
    # Run inference.
    model_fn = model.signatures['serving_default']
    output_dict = model_fn(input_tensor)
    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
    num_detections = int(output_dict.pop('num_detections'))
    output_dict = {key: value[0, :num_detections].numpy()
                   for key, value in output_dict.items()}
    output_dict['num_detections'] = num_detections
    # Detection classes should be ints.
    output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)
    # Mask handling only applies to models that output masks (the `if` guard is
    # reconstructed; SSD models without masks simply skip this branch).
    if 'detection_masks' in output_dict:
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            output_dict['detection_masks'],
            output_dict['detection_boxes'],
            image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5, tf.uint8)
        output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()
    return output_dict
def show_inference(model, frame):
    # Take the frame from the webcam feed and convert it to an array.
    image_np = np.array(frame)
    # Actual detection.
    output_dict = run_inference_for_single_image(model, image_np)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks_reframed', None),
        use_normalized_coordinates=True,
        line_thickness=5)
    return image_np

# Webcam capture loop (the capture setup and loop header are reconstructed
# from context; the report only shows the loop body).
video_capture = cv2.VideoCapture(0)
while True:
    ret, frame = video_capture.read()
    Imagenp = show_inference(detection_model, frame)
    cv2.imshow('object detection', cv2.resize(Imagenp, (800, 600)))
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
video_capture.release()
cv2.destroyAllWindows()
CHAPTER 7
CHAPTER 8
APPLICATION AND ADVANTAGES
Object detection is closely inter-linked with other computer vision techniques
such as image segmentation and image recognition, which help us understand and
analyze scenes in videos and images. Nowadays, several real-world use cases of
object detection are implemented in the market, making a tremendous impact
across different industries.
Self-driving cars
Real-time object detection is a key component of autonomous vehicles, which
must recognize pedestrians, other vehicles, and traffic signals in their
surroundings in order to navigate safely.
Video Surveillance
Real-time object detection and tracking of the movements of objects allow
video surveillance cameras to track scenes of a particular location such as an
airport. This state-of-the-art technique accurately recognizes and locates
several instances of a given object in the video. As the object moves through
a given scene or across a particular frame, the system stores the information
in real-time tracking feeds.
Crowd Counting
For heavily populated areas such as shopping malls, airports, city squares,
and theme parks, this application performs remarkably well. This object
detection application also proves helpful to large enterprises and
municipalities for tracking road traffic, violations of law, and the number of
vehicles passing in a particular time frame.
Anomaly detection
A practical example is in agriculture, where an object detection model can
identify instances of plant disease. With its help, farmers get notified and
are able to protect their crops from such threats.
As another example, such models have been used to identify skin infections and
symptomatic lesions. Some applications are already built for skin care and
acne treatment using object detection models.
Keep in mind that some problems are encountered while creating any kind of
object detection model; however, solutions are available to limit these
challenges.
CHAPTER 9
9.1 CONCLUSION
Based on the experimental results presented in this thesis, we are able to
detect objects more precisely and identify them individually, with the exact
location of each object in the picture along the x and y axes. This work also
provides experimental results on different methods for object detection and
identification and compares the methods for their efficiency.
To make the system fully automatic and to overcome the above limitations,
multi-view tracking can be implemented in the future using multiple cameras.
Multi-view tracking has an obvious advantage over single-view tracking because
of its wide coverage range, with different viewing angles for the objects to
be tracked. In this thesis, an effort has been made to develop an algorithm
that provides the base for future applications such as those listed below.
In this research work, object identification and visual tracking have been
done using an ordinary camera. The concept is extendable to applications such
as intelligent robots, automatic guided vehicles, enhancement of security
systems to detect suspicious behaviour along with detection of weapons, and
identification of suspicious movements of enemies on borders with the help of
night-vision cameras, among many others.
In the proposed method, a background subtraction technique has been used that
is simple and fast. This technique is applicable where there is no movement of
the camera. For robotic applications or automated vehicle assistance systems,
the camera moves and the background changes continuously, which requires
different segmentation techniques such as single-Gaussian or multiple-Gaussian
(mixture of Gaussians) models.
Object identification with motion estimation needs to be fast enough to be
implemented in a real-time system. There is still scope for developing faster
algorithms for object identification; such algorithms can be implemented on
FPGA or CPLD hardware for fast execution.
BIBLIOGRAPHY
1. Agarwal, S., Awan, A., and Roth, D. (2004). Learning to detect objects in
images via a sparse, part-based representation. IEEE Trans. Pattern Anal.
Mach. Intell. 26, 1475–1490. doi:10.1109/TPAMI.2004.108
3. Aloimonos, J., Weiss, I., and Bandyopadhyay, A. (1988). Active vision. Int.
J. Comput. Vis. 1, 333–356. doi:10.1007/BF00133571
6. Azzopardi, G., and Petkov, N. (2013). Trainable COSFIRE filters for
keypoint detection and pattern recognition. IEEE Trans. Pattern Anal. Mach.
Intell. 35, 490–503. doi:10.1109/TPAMI.2012.106
10. Bourdev, L. D., Maji, S., Brox, T., and Malik, J. (2010). "Detecting
people using mutually consistent poselet activations," in Computer Vision –
ECCV 2010 – 11th European Conference on Computer Vision, Heraklion, Crete,
Greece, September 5–11, 2010, Proceedings, Part VI, Volume 6316 of Lecture
Notes in Computer Science (Heraklion: Springer), 168–181.
11. Bourdev, L. D., and Malik, J. (2009). "Poselets: body part detectors
trained using 3D human pose annotations," in IEEE 12th International
Conference on Computer Vision, ICCV 2009, Kyoto, Japan, September 27 – October
4, 2009 (Kyoto: IEEE), 1365–1372.