Batch 20 KV
MASK R-CNN
A PROJECT REPORT
Submitted by
KANNAN. E (210416105018)
GOKUL. G (210416105012)
BACHELOR OF ENGINEERING
IN
ELECTRICAL AND ELECTRONICS ENGINEERING
SIGNATURE
PROFESSOR
Electrical and Electronics Engineering,
Chennai Institute of Technology,
Pudhupedu, Kundrathur,
Chennai – 600069

SIGNATURE
PROFESSOR
Information Technology,
Chennai Institute of Technology,
Pudhupedu, Kundrathur,
Chennai – 600069
We thank our beloved Chairman Shri. P. SRIRAM and all the trust members of
Chennai Institute of Technology for providing us with a plethora of facilities to
complete our project successfully.
We owe our sincere gratitude to our Vice Chairman Mr. P. JANAKIRAMAN and our
Secretary Mrs. S. SRIDEVI for helping us in every way to complete the project successfully.
We take the privilege to express our thanks to our Principal Dr. A. Ramesh, M.E., Ph.D., who has been a
bastion of moral strength and a source of incessant encouragement to us.
We express our sincere thanks to Dr. M. ETTAPAN, Ph.D., Head of the Department,
Chennai Institute of Technology, for her valuable guidance and suggestions.
We take immense pleasure to express our heartfelt thanks to our beloved project guide,
Mr. R. JANARTHANAN, M.E., Ph.D., and Co-project Guide Mr. KEERTHI
VIJAYADHASAN, M.E., for their valuable suggestions, excellent guidance and constant
support provided all through the course of our project.
We also thank the teaching and non-teaching staff members of Electrical and Electronics
Engineering Department and all our fellow students who stood with us to complete our project
successfully.
Last but not least, we extend our deep gratitude to our beloved family members for their
moral support, encouragement and financial help in carrying out this project.
ABSTRACT
India is the second most populous country in the world after China. Because
of this large population, crowd management is a very challenging task. For
crowd management and crowd analysis, we propose a system based on
Computer Vision and Deep Learning. Our system uses Mask R-CNN to detect
objects, specifically people, in real-time surveillance camera footage. We
train the Convolutional Neural Network on our own dataset, since only people
need to be detected by our network. Using this deep learning algorithm
(Mask R-CNN), we detect the people who pass in front of the camera and count
them in parallel. When the count exceeds a set threshold, a warning message
is displayed so that the crowd in that area can be managed. Surveillance
cameras are already present everywhere for monitoring people, and our system
uses these cameras to perform crowd management, which makes managing the
crowd an easy task. With the help of this data we can also analyse the crowd
and the rate at which people pass through a given path. This provides an easy
and simple solution using Computer Vision and Deep Learning.
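The counting-and-warning step described above can be sketched as a small per-frame monitor. This is a hypothetical illustration, not the project's code: it assumes the detector has already produced a people count for each frame, and the class name `CrowdMonitor`, the rule "warn when the count is strictly above the threshold", and the running average (one simple way to summarise the crowd over time) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrowdMonitor:
    """Hypothetical helper: records per-frame people counts and flags overcrowding."""
    threshold: int                                    # maximum people allowed before warning
    history: List[int] = field(default_factory=list)  # counts seen so far, one per frame

    def update(self, people_in_frame: int) -> bool:
        """Store this frame's count; return True if a warning should be displayed."""
        self.history.append(people_in_frame)
        return people_in_frame > self.threshold

    def average_count(self) -> float:
        """Mean people count over all recorded frames (0.0 before any frame)."""
        return sum(self.history) / len(self.history) if self.history else 0.0


monitor = CrowdMonitor(threshold=3)
for count in [1, 2, 4]:
    if monitor.update(count):
        print(f"WARNING: {count} people in view exceeds the limit of {monitor.threshold}")
```

For example, with `threshold=3`, feeding the counts 1, 2 and 4 triggers a warning only on the third frame.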
TABLE OF CONTENTS
CHAPTER TITLE
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 Aim
1.2 Object Detection
1.3 Mask R-CNN
1.4 Characteristics of Mask R-CNN
1.5 Applications of Object Detection
1.6 Object Detection Algorithms
2 LITERATURE SURVEY
3 SYSTEM ANALYSIS
3.1 Existing System
3.1.1 Disadvantages in Existing System
3.2 Proposed System
4 SYSTEM REQUIREMENT
4.1 Introduction
5 SYSTEM DESIGN
5.1 Introduction
5.2 UML Diagram
5.2.1 System Architecture
5.2.2 Sequence Diagram
5.2.3 Class Diagram
6 SYSTEM IMPLEMENTATION
6.1 Modules
6.1.1 Image Annotation Module
6.1.2 Object Detection Module
6.1.3 People Counter Module
6.2 Algorithm
6.2.1 Mask R-CNN
7 SYSTEM TESTING
7.1 Coding Standards
7.1.1 Naming Conventions
7.1.2 Value Conventions
7.1.3 Script Writing Standard
7.1.4 Message Box Format
7.2 Testing Objectives
7.3 Types of Testing
7.3.1 Unit Testing
7.3.2 Integration Testing
7.3.3 Validation Testing
7.4 Testing Strategies
7.4.1 White Box Testing
7.4.2 Black Box Testing
7.4.3 User Interface Testing
7.4.4 Module Testing
7.4.5 Integration Testing
7.4.6 User Acceptance Testing
8 FEASIBILITY STUDY
8.1 Feasibility Study
8.1.1 Technical Feasibility
8.1.2 Economic Feasibility
8.1.3 Operational Feasibility
9 RESULT ANALYSIS
A) Screenshots
11 REFERENCES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER 1
INTRODUCTION
1.1 AIM
monitoring of people; our system uses these surveillance cameras to
perform crowd management. With the help of this data we can analyse
the crowd and the rate at which people pass through a given path. This
provides an easy and simple solution using Computer Vision and Deep
Learning.
approaches. For deep learning approaches, any one of the following
methods can be used for object detection:
Retina-Net
1.4 CHARACTERISTICS OF MASK RCNN
Mask R-CNN has been the new state of the art in instance
segmentation.
Mask R-CNN is a deep neural network aimed at solving the instance
segmentation problem in machine learning and computer vision.
Its backbone is a Feature Pyramid Network (FPN) style deep neural
network.
A lightweight neural network called the Region Proposal Network (RPN)
scans the FPN top-down pathway and proposes regions which may
contain objects.
A key advantage of Mask R-CNN is that different layers of the neural
network can be made to learn features at different scales.
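The multi-scale point above can be made concrete with the level-assignment rule from the FPN paper, which decides which pyramid level a region of interest is pooled from: small boxes use finer levels, large boxes use coarser ones. This is a minimal sketch of that published heuristic, k = floor(k0 + log2(sqrt(w*h) / 224)) with k0 = 4; the clamp to levels P2 to P5 mirrors common Mask R-CNN setups but is an assumption here, and the function name is hypothetical.

```python
import math


def fpn_level_for_roi(width, height, k0=4, canonical_size=224, k_min=2, k_max=5):
    """Pick the FPN level P_k from which a box of this size is pooled.

    Uses the FPN heuristic k = floor(k0 + log2(sqrt(w * h) / canonical_size)),
    then clamps k to the pyramid levels that exist (assumed P2..P5 here).
    """
    k = math.floor(k0 + math.log2(math.sqrt(width * height) / canonical_size))
    return max(k_min, min(k_max, k))


for size in (32, 112, 224, 448):
    print(size, "->", "P%d" % fpn_level_for_roi(size, size))
```

A 224 × 224 box maps to P4; halving its side moves it one level finer (P3), doubling it moves one level coarser (P5).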
1.5 APPLICATIONS OF OBJECT DETECTION
belong to a given class. Examples include upper torsos, pedestrians,
and cars.
4. PEDESTRIAN DETECTION
Pedestrian detection is an essential and significant task in any
intelligent video surveillance system, as it provides the fundamental
information for semantic understanding of video footage. It has an
obvious extension to automotive applications due to the potential for
improving safety systems.
1.6 OBJECT DETECTION ALGORITHMS
1. Single Shot Detector (SSD):
3. Fast R-CNN:
Fast R-CNN uses ideas from SPP-net and R-CNN and fixes the
key problem of SPP-net: it makes end-to-end training possible.
To propagate gradients through the spatial pooling layer, it uses a simple
back-propagation calculation that is very similar to the max-pooling
gradient calculation, except that pooling regions overlap and
therefore a cell can receive gradients from multiple regions.
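The overlapping-region gradient described above can be sketched in NumPy. This is a simplified illustration, not Fast R-CNN's actual layer: it assumes a single-channel feature map and pools each RoI into a single output bin (a real RoI pooling layer divides each RoI into a grid of bins, for example 7 × 7), and the function name is hypothetical. What it shows is that gradients are added rather than overwritten, so a cell that is the maximum for several overlapping RoIs collects gradient from all of them.

```python
import numpy as np


def roi_max_pool_backward(feature, rois, grad_out):
    """Route each RoI's output gradient back to its argmax cell (one bin per RoI).

    feature:  2-D array (H, W), a single-channel feature map
    rois:     list of (y0, x0, y1, x1) boxes, end indices exclusive
    grad_out: list of scalar gradients, one per RoI
    """
    grad_in = np.zeros_like(feature, dtype=float)
    for (y0, x0, y1, x1), g in zip(rois, grad_out):
        region = feature[y0:y1, x0:x1]
        # Location of the maximum inside this RoI (the cell chosen in the forward pass)
        ry, rx = np.unravel_index(np.argmax(region), region.shape)
        # Accumulate with += because overlapping RoIs may select the same cell
        grad_in[y0 + ry, x0 + rx] += g
    return grad_in


feature = np.array([[1.0, 2.0, 3.0],
                    [4.0, 9.0, 5.0],
                    [6.0, 7.0, 8.0]])
# Two overlapping RoIs whose maximum is the same centre cell
print(roi_max_pool_backward(feature, [(0, 0, 2, 2), (1, 1, 3, 3)], [0.5, 0.25]))
```

In this usage example both RoIs pick the centre value 9, so the centre cell receives 0.5 + 0.25 = 0.75 and every other cell receives zero.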
CHAPTER 2
LITERATURE SURVEY
Description:
The vision community has rapidly improved object detection and
semantic segmentation results over a short period of time. In large
part, these advances have been driven by
powerful baseline systems, such as the Fast/Faster R-CNN and Fully
Convolutional Network (FCN) frameworks for object detection and
semantic segmentation, respectively. These methods are conceptually
intuitive and offer flexibility and robustness, together with fast
training and inference time. Our goal in this work is to develop a
comparably enabling framework for instance segmentation.
Description:
We propose a deep Convolutional Neural Network (CNN) for
counting the number of people across a line-of-interest (LOI) in
surveillance videos. It is a challenging problem and has many
potential applications. Observing the limitations of temporal slices used
by state-of-the-art LOI crowd counting methods, our proposed CNN
directly estimates the crowd counts with pairs of video frames as inputs
and is trained with pixel-level supervision maps. Such rich supervision
information helps our CNN learn more discriminative feature
representations.
A two-phase training scheme is adopted, which decomposes the original
counting problem into two easier sub-problems: estimating a crowd
density map and estimating a crowd velocity map. Learning to solve the
sub-problems provides a good initial point for our CNN model, which is
then fine-tuned to solve the original counting problem. A new dataset
with pedestrian trajectory annotations is introduced for evaluating LOI
crowd counting methods and has more annotations than any existing one.
Our extensive experiments show that our proposed method is robust to
variations of crowd density, crowd velocity, and directions of the LOI,
and outperforms state-of-the-art LOI counting methods.
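The LOI counting task addressed by this paper can be contrasted with its simplest non-learned baseline: counting how many tracked people cross a line of interest. The sketch below is that geometric baseline, not the CNN method summarised above. It assumes each person already has a track of (x, y) centroids (for example from a detector plus a tracker) and treats the LOI as the infinite line through two points `a` and `b`; all names are hypothetical.

```python
def side_of_line(p, a, b):
    """Sign of the 2-D cross product (b - a) x (p - a): which side of line a-b point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def count_loi_crossings(tracks, a, b):
    """Count directional crossings of the line through a and b.

    tracks: dict mapping a track id to a time-ordered list of (x, y) centroids
    Returns (positive_to_negative, negative_to_positive) crossing counts.
    A step that starts or ends exactly on the line (sign 0) is not counted.
    """
    pos_to_neg = neg_to_pos = 0
    for points in tracks.values():
        for prev, cur in zip(points, points[1:]):
            s0, s1 = side_of_line(prev, a, b), side_of_line(cur, a, b)
            if s0 > 0 and s1 < 0:
                pos_to_neg += 1
            elif s0 < 0 and s1 > 0:
                neg_to_pos += 1
    return pos_to_neg, neg_to_pos


tracks = {1: [(-1, 5), (1, 5)], 2: [(2, 3), (-2, 3)], 3: [(1, 1), (2, 2)]}
print(count_loi_crossings(tracks, (0, 0), (0, 10)))
```

Here track 1 crosses the vertical line in one direction, track 2 in the other, and track 3 stays on one side, so the result is one crossing each way.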
Authors:
Description:
being counted. To overcome the above issues, we propose a novel
real-time people counting approach dubbed YOLO-PC (YOLO based
People Counting).
CHAPTER 3
SYSTEM ANALYSIS
3.2 PROPOSED SYSTEM
CHAPTER 4
SYSTEM REQUIREMENT
4.1 INTRODUCTION
The system requirement is a technical specification of
requirements for the software product. It is the first step in the
requirements analysis process; it lists the requirements of a particular
software system, including functional, performance and security
requirements. The requirements also provide usage scenarios from a
user, an operational and an administrative perspective. The purpose
of a software requirements specification is to provide a detailed
overview of the software project, its parameters and goals. It
describes the project's target audience and its user interface, hardware
and software requirements. It defines how the client, team and
audience see the project and its functionality.
4.2.2 SOFTWARE REQUIREMENTS
Operating System : Windows 10
Technologies Used : Python, TensorFlow, Keras, ImageAI
Tools Used : PyCharm
4.3.1 PYTHON
4.3.2 TENSORFLOW
4.3.3 KERAS
4.3.4 OpenCV
OpenCV (Open Source Computer Vision Library) is an open
source computer vision and machine learning software library.
OpenCV was built to provide a common infrastructure for computer
vision applications and to accelerate the use of machine perception in
commercial products. Being a BSD-licensed product, OpenCV
makes it easy for businesses to utilize and modify the code. The
library has more than 2500 optimized algorithms, which include a
comprehensive set of both classic and state-of-the-art computer vision
and machine learning algorithms. These algorithms can be used to
detect and recognize faces, identify objects, classify human actions in
videos
4.3.6 VERSION TRACKING: GIT
Version control systems are a category of software tools that help a
software team manage changes to source code over time. Version
control software keeps track of every modification to the code in a
special kind of database. If a mistake is made, developers can turn
back the clock and compare earlier versions of the code to help fix the
mistake while minimizing disruption to all team members.
4.3.8 CYTHON
CPython is the reference implementation of the Python
programming language. Written in C and Python, CPython is the
default and most widely used implementation of the language.
CPython can be defined as both an interpreter and a compiler, as it
compiles Python code into bytecode before interpreting it. It has
a foreign function interface with several languages, including C, in
which one must explicitly write bindings in a language other than
Python.
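The compile-then-interpret behaviour described above can be observed directly from Python with the built-in `compile()` and `exec()` functions and the standard `dis` module. This is a minimal sketch; the variable names are illustrative, and the exact opcode names printed vary between CPython versions.

```python
import dis

# Step 1: CPython compiles source text into a code object containing bytecode.
code = compile("total = price * quantity", "<example>", "exec")

# Step 2: the interpreter executes that bytecode; `namespace` supplies the variables.
namespace = {"price": 3, "quantity": 4}
exec(code, namespace)
print(namespace["total"])  # 12

# List the bytecode instructions the interpreter ran (names differ across versions).
for instr in dis.get_instructions(code):
    print(instr.opname, instr.argrepr)
```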
4.3.9 NUMPY
CHAPTER 5
SYSTEM DESIGN
5.1 INTRODUCTION
5.2.1 SYSTEM ARCHITECTURE DIAGRAM
5.2.2 SEQUENCE DIAGRAM
5.2.3 CLASS DIAGRAM
CHAPTER 6
SYSTEM IMPLEMENTATION
6.1 MODULES
6.1.2 OBJECT DETECTION MODULE:
6.2 ALGORITHM
The system uses the Mask R-CNN algorithm to detect people in images
and videos. The algorithm is explained as follows:
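As an illustration of how Mask R-CNN detections can feed the people count, the sketch below keeps only confident "person" detections and binarizes their soft masks at 0.5, the cut-off the Mask R-CNN paper uses for its predicted masks. It assumes the detections are already available as NumPy arrays of class ids, scores and per-instance mask probabilities; that array layout, the function name, `person_class_id=1` (a COCO-style index with background at 0) and `score_thresh=0.7` are illustrative assumptions, not the project's actual settings.

```python
import numpy as np


def count_people(class_ids, scores, mask_probs,
                 person_class_id=1, score_thresh=0.7, mask_thresh=0.5):
    """Keep confident 'person' detections and binarize their soft masks.

    class_ids:  (N,) int array of predicted class ids
    scores:     (N,) float array of confidence scores
    mask_probs: (N, H, W) float array of per-pixel mask probabilities
    Returns (count, binary_masks) with binary_masks of shape (count, H, W).
    """
    keep = (class_ids == person_class_id) & (scores >= score_thresh)
    binary_masks = mask_probs[keep] >= mask_thresh  # boolean instance masks
    return int(keep.sum()), binary_masks


class_ids = np.array([1, 1, 3])          # two 'person' detections, one other class
scores = np.array([0.9, 0.5, 0.95])      # the second person is below the score threshold
mask_probs = np.full((3, 2, 2), 0.6)     # toy 2 x 2 soft masks
count, masks = count_people(class_ids, scores, mask_probs)
print(count, masks.shape)
```

Only the first detection survives both filters, so the example reports one person.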
CHAPTER-7
SYSTEM TESTING
7.1 CODING STANDARDS
Coding standards are programming guidelines that focus on
the physical structure and appearance of the program. They make the code
easier to read, understand and maintain. This phase of the system
implements the blueprint developed during the design phase. The coding
specification should be written so that any programmer is able
to understand the code and make changes whenever
necessary. Some of the standards needed to achieve the above-mentioned
objectives are as follows:
Program should be simple, clear and easy to understand.
Naming conventions
Value conventions
user. So it is customary to follow the conventions. These
conventions are as follows:
Class names
Class names are equivalent to problem-domain concepts; they begin
with a capital letter and use mixed case.
Member Function and Data Member Names
7.1.4 MESSAGE BOX FORMAT
7.2 TESTING OBJECTIVES:
Testing is a set of activities that can be planned in advance and
conducted systematically. For this reason, a template for software
testing (a set of steps into which specific test case design
techniques and testing methods can be placed) should be defined for the
software process. Testing often accounts for more effort than any other
software engineering activity. If it is conducted haphazardly, time is
wasted, unnecessary effort is expended and, even worse, errors sneak
through undetected. It would therefore seem reasonable to establish a
systematic strategy for testing software.
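As one example of a systematically designed set of test cases, the sketch below applies boundary-value analysis (testing just below, at and just above a limit) using Python's standard `unittest` module. The function `should_warn`, the threshold of 50 and the rule that a warning starts strictly above the threshold are hypothetical, chosen only to illustrate the technique.

```python
import unittest


def should_warn(count: int, threshold: int) -> bool:
    """Hypothetical rule: warn only when the count is strictly above the threshold."""
    return count > threshold


class ShouldWarnBoundaryTest(unittest.TestCase):
    # Boundary-value analysis: one case on each side of the limit and one on it.
    def test_just_below_threshold(self):
        self.assertFalse(should_warn(49, 50))

    def test_at_threshold(self):
        self.assertFalse(should_warn(50, 50))

    def test_just_above_threshold(self):
        self.assertTrue(should_warn(51, 50))


# Run the suite without calling sys.exit(), so it also works inside a larger script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ShouldWarnBoundaryTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Running the suite reports three passing tests; moving the warning rule to `>=` would make `test_at_threshold` fail, which is exactly the kind of off-by-one error boundary cases are designed to catch.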
7.3 TYPES OF TESTING:
7.3.3 VALIDATION TESTING:
what is going on during execution of the system. The points at which
bugs occurred were all identified, and the bugs were removed.
their correctness. By testing in this way, we could clearly identify
all the bugs that occurred.
CHAPTER 8
FEASIBILITY STUDY
A feasibility study is carried out to select the best system that meets
performance requirements. The main aim of the feasibility study
activity is to determine whether it would be financially and
technically feasible to develop the product. The feasibility study
activity involves the analysis of the problem and collection of all
relevant information relating to the product such as the different data
items which would be input to the system, the processing required to
be carried out on these data, the output data required to be produced
by the system as well as various constraints on the behavior of the
system.
In examining technical feasibility, the configuration of the system is
given more importance than the actual make of the hardware. The
configuration should give the complete picture of the system's
requirements: how many workstations are required, how these units
are interconnected so that they can operate and communicate
smoothly, and what input and output speeds should be achieved at a
particular quality of printing.
What new skills will be required? Do the existing staff
members have these skills? If not, can they be trained in due
course of time?
This feasibility study is carried out by a small group of people
who are familiar with information system techniques and are skilled
in the system analysis and design process. Proposed projects are
beneficial only if they can be turned into information systems that
will meet the operating requirements of the organization. This test of
feasibility asks whether the system will work when it is developed and
installed.
CHAPTER-9
RESULT ANALYSIS
Based on the test results obtained from the different test strategies, we can
define the performance and efficiency of the code. Black box testing and
white box testing revealed no internal errors. These tests can be
functional or non-functional, though they are usually functional. Their
results show that our code is working correctly.
The model was trained on a dataset of images and videos web-scraped
online.
The model was able to recognize objects in test data that is totally different
from the data used to train the model. The model accuracy for
the test images and video
A common threshold for IoU (Intersection over Union) is 0.5; our model has an
IoU of 0.57, which is considered a decent score among object detection models.
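For reference, IoU for two bounding boxes is the area of their overlap divided by the area of their union, so 1.0 means a perfect match and 0.0 means no overlap; a detection is commonly counted as correct when its IoU with the ground truth is at least 0.5. This is a minimal sketch for axis-aligned boxes; the (x1, y1, x2, y2) corner format and the function name are assumptions, and for segmentation masks the same ratio is computed over pixels instead of box areas.

```python
def box_iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero when the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7 -> about 0.143
```

With the 0.5 rule, the pair in the usage line would not count as a match, whereas two identical boxes (IoU 1.0) would.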
In the lab test under certain conditions, the model achieved an
average accuracy score above 80.44.
A) SCREENSHOTS
FIG 10.2 Real Time Object Detection Module
FIG 10.3 Video Object Detection Module
FIG 10.4 People Counter Module
FIG 10.5 Mask RCNN Object Detection Module
CHAPTER-10
Crowd management is done efficiently and precisely with the help of the
Mask R-CNN algorithm, which helps to monitor each and every individual in a
hallway or any place with a high possibility of people gathering. This way,
we can maintain social distancing in these places without fear of
transmission.
At large bus stands and railway stations during festivities, we can manage
the crowd by diverting it to prevent congestion.
CHAPTER 11
REFERENCES
1. He K, Gkioxari G, Dollár P, Girshick R (2018) Mask R-CNN.
Facebook AI Research (FAIR). arXiv:1703.06870v3, 24 Jan 2018.
Ryan Luna, UTRGV, Edinburg, Texas