Face Mask Detection Final Report Draft
St. Xavier’s College
Affiliated to Tribhuvan University
Maitighar, Kathmandu
Submitted by
Dilip Karki (T.U. Exam Roll No. 15173/074)
Kishan KC (T.U. Exam Roll No. 15181/074)
Submitted to
St. Xavier’s College
Department of Computer Science
Maitighar, Kathmandu, Nepal
April, 2021
Face Mask Detection And Alert System – [CSC-404]
Submitted By:
Dilip Karki (T.U. Exam Roll No. 15173/074)
Kishan KC (T.U. Exam Roll No. 15181/074)
Submitted To:
St. Xavier’s College
Department of Computer Science
Maitighar, Kathmandu, Nepal
April, 2021
CERTIFICATE OF APPROVAL
The undersigned certify that they have read and recommended to the Department of
Computer Science for acceptance, a project proposal entitled “Face Mask Detection
and Alert System” submitted by Dilip Karki (T.U. Exam Roll No. 15173/074) and
Kishan KC (T.U. Exam Roll No. 15181/074) for the partial fulfillment of the
requirement for the degree of Bachelor of Science in Computer Science and Information
Technology awarded by Tribhuvan University.
…………………………..
Mr. Bal Krishna Subedi
Project Supervisor /Lecturer
St. Xavier’s College
…………………………..
External Examiner
Tribhuvan University
…………………………..
Mr. Ganesh Yogi
Head of the Department
Department of Computer Science
St. Xavier’s College
ACKNOWLEDGEMENT
We are greatly privileged to be students of Computer Science here at St. Xavier’s
College, with a department full of experts in their respective fields who are greatly
supportive of learners. We would like to express our sincere gratitude to our supervisor,
Mr. Sarjan Shrestha, for creating a virtuous academic and sociable environment to foster
this project, and our innermost thanks for providing us with all the crucial advice,
guidelines, and resources for the accomplishment of this project.
We are also grateful to the entire Computer Science Department of St. Xavier’s College for
providing us a suitable environment in which to work on this project. We were pleased to
be guided by the department, which helped us in all possible ways. We would also take this
opportunity to express our gratitude to Mr. Ganesh Yogi for his continuous encouragement
and support throughout the completion of this project.
We would also like to express our heartfelt gratitude to Er. Rajan Karmacharya, Mr. Bal
Krishna Subedi, Er. Anil Shah, Er. Saugat Sigdel, Er. Nitin Malla, Er. Sansar Dewan,
Er. Sanjay Kumar Yadav, Mr. Ganesh Dhami and Mr. Ramesh Shahi for their constant
support and guidance.
At the end we would like to express our sincere thanks to all our friends and others who
helped us directly or indirectly during this project work.
ABSTRACT
Object detection is one of the newest and most widely studied areas of computer vision
and machine learning. The purpose of object detection is to locate particular target
objects within a given image and assign them the corresponding class labels. With the
help of deep neural networks, the use and performance of object detection systems has
expanded significantly. Our project employs modern strategies for object detection that
can also be used for real-time detection.
One of the main drawbacks of many object detection mechanisms designed before the
adoption of deep learning is their reliance on various hand-crafted prediction techniques,
which degrades overall system performance. This project uses deep learning to solve the
object detection problem end to end. The network is trained on datasets developed
in-house. The resulting modules are very fast and accurate and can also be used for
real-time object detection.
TABLE OF CONTENTS
ACKNOWLEDGEMENT ........................................................................................... i
ABSTRACT ................................................................................................................. ii
Background .....................................................................................................1
2.2.3 Threshold ...................................................................................................13
Design............................................................................................................30
CHAPTER 5: IMPLEMENTATION.......................................................................33
Implementation..............................................................................................33
Testing ...........................................................................................................42
Conclusion.....................................................................................................56
REFERENCES ...........................................................................................................57
LIST OF FIGURES
Figure 26 counted mask ............................................................................................... 45
Figure 31 Detection of face with different cases (wearing mask and not wearing mask) .......... 49
Figure 37: Email alert after detection of absence of face mask ................................... 55
CHAPTER 1: INTRODUCTION
Background
Face masks are crucial in the prevention of airborne diseases. Airborne illnesses spread
through droplets of microorganisms ejected into the air by coughing, sneezing, or talking.
The pathogens in question can be viruses, bacteria, or fungi. Tuberculosis, influenza, and
smallpox are just a few of the frequent illnesses that can spread through the air.
People with some diseases can transfer disease via the air when they cough, sneeze, or talk,
releasing nasal and throat secretions. Some viruses or bacteria take to the air and float
around, landing on humans or surfaces. When you inhale harmful germs from the air, they
take up residence inside you. You can also pick up germs by touching a germ-infested
surface and then touching your own eyes, nose, or mouth. These infections are difficult to
control because they spread through the air.
COVID-19 has recently triggered a global pandemic. COVID-19 spreads when an infected
person exhales virus-containing droplets and very minute particles. Other people may
inhale these droplets and particles, or they may settle on their eyes, noses, or mouths. They
may contaminate surfaces they come into contact with in some cases. People who are closer
than 6 feet from the infected person are most likely to get infected.
Wearing a mask can prevent the spattering of droplets from the body of an infected person.
If a person is infected by any airborne disease, then by using a face mask he or she can
prevent other people from being infected, and vice versa.
Previous studies have found that facemask-wearing is valuable in preventing the spread of
respiratory viruses. For instance, the efficiencies of N95 and surgical masks in blocking the
transmission of SARS are 91% and 68%, respectively. Facemask-wearing can interrupt
airborne viruses and particles effectively, such that these pathogens cannot enter the
respiratory system of another person. As a non-pharmaceutical intervention, facemask-
wearing is a non-invasive and cheap method to reduce mortality and morbidity from
respiratory infections [1].
Hence, it is very crucial to detect whether an individual is wearing a face mask. Our project
will help detect the presence of a face mask on a person’s face. If the person is not
wearing a mask of any kind, the system will detect that individual and alert the
concerned authority. At present, mask detection is mostly done manually. With an alert
system functioning side by side with the face mask detection system, the process of mask
detection can be automated. This will help prevent the spread of many airborne
diseases; the recent global pandemic has demonstrated the importance of preventing
such diseases.
Problem Statement
Face masks are crucial in preventing the spread of many airborne diseases. Although the
importance of face masks is evident, many people can be seen roaming in public places such
as banks without a face mask. Some people do not consider wearing a face mask a moral duty,
and when such people go unnoticed or unpunished, they tend not to wear masks in the future
either. Many people also wear the mask in an improper way, i.e., without covering the nose
and mouth properly.
Project Objectives
This project helps identify face masks in video surveillance feeds across different places
such as hospitals, emergency departments, out-patient facilities, residential care
facilities, emergency medical services, and home health care delivery, to provide safety
to doctors and patients and reduce the outbreak of disease. The detection of face masks
needs to happen in real time, as the necessary actions in case of any disobedience will
be taken on the spot.
Project Scope
1.4.1 Airports:
The Face Mask Detection System can be used at airports to detect travelers without
masks. Face data of travelers can be captured in the system at the entrance. If a traveler
is found to be without a face mask, their picture is sent to the airport authorities so that
they can take quick action.
1.4.2 Hospitals:
Using the Face Mask Detection System, hospitals can monitor whether their staff are wearing
masks during their shifts. If any health worker is found without a mask, the system alerts
them. Likewise, for quarantined people who are required to wear a mask, the system can keep
watch, detect whether the mask is present, and automatically send a notification or report
to the authorities.
1.4.3 Offices:
The Face Mask Detection System can be used at office premises to detect if employees
are maintaining safety standards at work. It monitors employees without masks.
Development Methodology
1.5.1 Agile
Our project is based on the agile model, with each development step done iteratively.
"Agile" means swift or versatile; the "agile process model" refers to a software
development approach based on iterative development. Agile methods break tasks into smaller
iterations, or parts, and do not directly involve long-term planning. The project scope and
requirements are laid down at the beginning of the development process, and plans regarding
the number of iterations and the duration and scope of each iteration are clearly defined
in advance. Each iteration is considered a short time "frame" in the agile process model,
typically lasting from one to four weeks. Dividing the entire project into smaller parts
helps minimize project risk and reduce the overall project delivery time.
Each iteration involves a team working through a full software development life cycle
including planning, requirements analysis, design, coding, and testing before a working
product is demonstrated to the client[2].
Report Organization
The first chapter of this report consists of the project introduction, along with the
problem definition, objectives, scope, and limitations. The next chapter includes a
literature review covering the background of the project and existing systems. Chapter 3
consists of system analysis, including requirements analysis and feasibility analysis.
General theory and concept:
Neural networks, also known as artificial neural networks (ANNs) or simulated neural
networks (SNNs), are a subset of machine learning and are at the heart of deep learning
algorithms. Their name and structure are inspired by the human brain, mimicking the
way that biological neurons signal to one another.
Neural networks rely on training data to learn and improve their accuracy over time.
However, once these learning algorithms are fine-tuned for accuracy, they are powerful
tools in computer science and artificial intelligence, allowing us to classify and cluster
data at a high velocity. Tasks in speech recognition or image recognition can take
minutes versus hours when compared to the manual identification by human experts.
One of the most well-known neural networks is Google’s search algorithm [4].
Artificial neural networks (ANNs) are composed of node layers, containing an input
layer, one or more hidden layers, and an output layer. Each node, or artificial neuron,
connects to another and has an associated weight and threshold. If the output of any
individual node is above the specified threshold value, that node is activated, sending
data to the next layer of the network. Otherwise, no data is passed along to the next layer
of the network.
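The activation rule described above can be sketched in a few lines of Python; the inputs, weights, and threshold below are illustrative values, not parameters from this project:

```python
# A single artificial neuron: the weighted sum of the inputs is compared
# against a threshold; the node only passes data on when the sum exceeds it.
def neuron_output(inputs, weights, threshold):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Activated: forward the value to the next layer; otherwise pass nothing (0).
    return weighted_sum if weighted_sum > threshold else 0.0

# The neuron activates because the weighted sum (0.9) exceeds the threshold (0.5)
print(neuron_output([1.0, 0.5], [0.4, 1.0], threshold=0.5))
# With a higher threshold the same inputs produce no activation
print(neuron_output([1.0, 0.5], [0.4, 1.0], threshold=2.0))
```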
Convolutional neural networks are very good at picking up on patterns in the input
image, such as lines, gradients, circles, or even eyes and faces. It is this property that
makes convolutional neural networks so powerful for computer vision. Unlike earlier
computer vision algorithms, convolutional neural networks can operate directly on a
raw image and do not need any preprocessing.
With four convolutional layers it is possible to recognize handwritten digits, and with 25
layers it is possible to distinguish human faces. This is important when our goal is to
design an architecture that is not only good at learning features but is also scalable to
massive datasets [5].
The Kernel
The element which is involved in the process of carrying out the convolution operation
in the first part of the convolutional layer is called the Kernel/Filter.
In Fig. 3, the left section is a 5 × 5 × 1 matrix, which is the input image, and the right
section is a 3 × 3 × 1 matrix, which is the kernel, represented here as K.
Here, the kernel will shift 9 times because Stride Length = 1, every time performing a
matrix multiplication operation between K and the portion P of the image over which
the kernel is hovering. The filter will keep on moving to the right with some stride value
until it parses the complete width. Then it will move down to the left most beginning of
the image where it will again continue its journey to the end until the complete image is
traversed[6].
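As an illustration of this sliding-kernel operation, the following Python sketch convolves a 5 × 5 input with a 3 × 3 kernel at stride 1, so the kernel lands in 9 positions and produces a 3 × 3 feature map. The image and kernel values are toy examples, not those of Fig. 3:

```python
# Slide a kernel over an image with the given stride, taking the element-wise
# product of the kernel and the patch it hovers over and summing the result.
def convolve2d(image, kernel, stride=1):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, len(image) - kh + 1, stride):
        row = []
        for j in range(0, len(image[0]) - kw + 1, stride):
            row.append(sum(kernel[m][n] * image[i + m][j + n]
                           for m in range(kh) for n in range(kw)))
        out.append(row)
    return out

image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]
feature_map = convolve2d(image, kernel)  # 3x3 output: 9 kernel positions
print(feature_map)
```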
Pooling Layer
The function of the pooling layer is to reduce the spatial size of the convolved feature.
Because of this the computational power required to process the data will decrease
gradually through dimensionality reduction. Also, it is useful for finding out the
dominant features which are independent of rotation and position thereby maintaining
the process of effectively training the model.
Max pooling works as a noise reducer. It removes the noisy activations and performs
de-noising along with dimensionality reduction.
Average pooling, by contrast, simply performs dimensionality reduction as a
noise-suppressing mechanism. Hence, max pooling generally performs better than average
pooling [6].
Figure 5 Pooling process
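The pooling operation above can be sketched as follows; the 4 × 4 feature map is an illustrative example, and a 2 × 2 window with stride 2 halves each spatial dimension:

```python
# Max or average pooling over square windows: max keeps the dominant
# activation in each window, average smooths it.
def pool2d(feature, size=2, stride=2, mode="max"):
    out = []
    for i in range(0, len(feature) - size + 1, stride):
        row = []
        for j in range(0, len(feature[0]) - size + 1, stride):
            window = [feature[i + m][j + n]
                      for m in range(size) for n in range(size)]
            row.append(max(window) if mode == "max"
                       else sum(window) / len(window))
        out.append(row)
    return out

feature = [[1, 3, 2, 1],
           [4, 6, 5, 0],
           [2, 2, 1, 1],
           [0, 1, 3, 2]]
print(pool2d(feature, mode="max"))      # [[6, 5], [2, 3]]
print(pool2d(feature, mode="average"))  # [[3.5, 2.0], [1.25, 1.75]]
```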
Linear regression analysis is used to predict the value of a variable based on the value
of another variable. The variable you want to predict is called the dependent variable.
The variable you are using to predict the other variable's value is called the independent
variable.
This form of analysis estimates the coefficients of the linear equation, involving one or
more independent variables that best predict the value of the dependent variable. Linear
regression fits a straight line or surface that minimizes the discrepancies between
predicted and actual output values. There are simple linear regression calculators that
use a “least squares” method to discover the best-fit line for a set of paired data. You
then estimate the value of Y (the dependent variable) from X (the independent variable) [7].
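A minimal least-squares fit illustrating this idea; the data points are made up for illustration and lie roughly on y = 2x:

```python
# Ordinary least squares for a single independent variable:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
def least_squares(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]          # noisy samples of y = 2x
slope, intercept = least_squares(xs, ys)  # slope comes out close to 2
print(round(slope, 2), round(intercept, 2))
```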
o Logistic regression is one of the most popular Machine Learning algorithms, which
comes under the Supervised Learning technique. It is used for predicting the
categorical dependent variable using a given set of independent variables.
o Logistic Regression is very similar to Linear Regression except in how it is used:
Linear Regression is used for solving regression problems,
whereas Logistic Regression is used for solving classification problems.
o The curve from the logistic function indicates the likelihood of something such as
whether the cells are cancerous or not, a mouse is obese or not based on its weight,
etc.
Logistic Regression can be used to classify the observations using different types of data
and can easily determine the most effective variables used for the classification[8].
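The logistic (sigmoid) curve described above can be sketched as follows; the weight, bias, and 0.5 cutoff are assumed toy values, not parameters from this project:

```python
import math

# The logistic function squashes any weighted sum into (0, 1),
# which is read as the probability of the positive class.
def predict_probability(x, weight, bias):
    z = weight * x + bias
    return 1.0 / (1.0 + math.exp(-z))

# Observations with probability at or above the cutoff are classified positive.
def classify(x, weight=1.5, bias=-3.0, cutoff=0.5):
    return 1 if predict_probability(x, weight, bias) >= cutoff else 0

print(classify(1.0))  # 0: probability well below 0.5
print(classify(4.0))  # 1: probability well above 0.5
```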
In a Feature Pyramid Network (FPN), the construction of the pyramid involves a bottom-up
pathway and a top-down pathway.
Related Terms
2.2.1 IoU
Intersection over Union (IoU) measures the overlap between a predicted bounding box and a
ground-truth box: the area of their intersection divided by the area of their union. In
the figure, the left image has a very low IoU.
The bounding box is a rectangle that is drawn in such a way that it covers the entire
object and fits it perfectly. There exists a bounding box for every instance of the object
in the image, and for each box 4 numbers are predicted: the center x-coordinate, the
center y-coordinate, the width, and the height.
Recall
Recall is the ratio of true positive (true predictions) and the total of ground truth positives
(total number of cars)[11].
How many relevant items are selected?
The recall is the measure of how accurately we detect all the objects in the data.
Recall = TP / (TP + FN)
Precision
Precision is the ratio of true positive (true predictions) (TP) and the total number of
predicted positives (total predictions)[11].
How many selected items are relevant?
Precision = TP / (TP + FP)
mAP
Average precision is calculated by taking the area under the precision-recall curve.
Mean Average Precision is the mean of the AP calculated for all the classes.
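These metrics can be computed directly from the box coordinates and the TP/FP/FN counts, as in the illustrative sketch below; boxes are given as (x1, y1, x2, y2) corners, which is an assumed convention:

```python
# IoU of two axis-aligned boxes: intersection area over union area.
def iou(box_a, box_b):
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Precision = TP / (TP + FP); Recall = TP / (TP + FN).
def precision_recall(tp, fp, fn):
    return tp / (tp + fp), tp / (tp + fn)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, a low overlap
p, r = precision_recall(tp=8, fp=2, fn=4)
print(p, r)  # precision 0.8, recall 8/12
```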
2.2.3 Threshold
Conf. Threshold
Confidence Threshold is a base probability value above which the detection made by
the algorithm will be considered as an object. Most of the time it is predicted by a
classifier[12].
NMS Threshold
The Non-Maximum Suppression (NMS) threshold is the IoU value above which two overlapping
detections are treated as duplicates of the same object, so that only the
highest-confidence box is kept.
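A minimal sketch of how an NMS threshold is applied, assuming boxes are given as (x1, y1, x2, y2) corners with one confidence score each:

```python
# Keep boxes in order of decreasing confidence; suppress any box whose IoU
# with an already-kept box exceeds the NMS threshold (a duplicate detection).
def nms(boxes, scores, nms_threshold=0.5):
    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 too heavily
```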
2.2.4 Activation Function
Sigmoid Function
Research carried out on sigmoid functions has resulted in three variants of the sigmoid
activation function, which are used in deep learning applications. The sigmoid function is
mostly used in feedforward neural networks.
It is a bounded differentiable real function, defined for real input values, with positive
derivatives everywhere and some degree of smoothness.
The sigmoid function appears in the output layers of the DL architectures, and they are
useful for predicting probability-based output.
ReLU Function
ReLU is the most widely used activation function for deep learning applications, giving the
most accurate results. It is faster than many other activation functions. ReLU represents a
nearly linear function, and hence it preserves the properties of linear functions that make
them easy to optimize with gradient-descent methods. The ReLU activation function applies a
threshold operation to each input element, setting values less than zero to zero [13].
Figure 11 ReLU Activation Function
Leaky ReLU Function
The leaky ReLU was introduced to sustain and keep the weight updates alive during the entire
propagation process. A parameter named alpha was introduced as a solution to ReLU’s
dead-neuron problem, so that the gradients will not be zero at any time during training.
LReLU computes the gradient for negative inputs using a very small constant value, alpha,
typically around 0.01.
The LReLU gives similar results to standard ReLU, with the exception that it has non-zero
gradients over the entire input range, suggesting that there is no significant improvement
over standard ReLU and other activation functions except in sparsity and dispersion [13].
Figure 12 Leaky ReLU Activation Function
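The three activation functions discussed above can be written directly, with alpha = 0.01 as the small negative-slope constant mentioned for leaky ReLU:

```python
import math

# Sigmoid: bounded, smooth, useful for probability-based outputs.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# ReLU: thresholds each input, setting values below zero to zero.
def relu(x):
    return max(0.0, x)

# Leaky ReLU: a small constant slope alpha keeps the gradient
# non-zero for negative inputs (the "dead neuron" fix).
def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x

print(sigmoid(0))                # 0.5
print(relu(-2.0), relu(3.0))     # 0.0 3.0
print(leaky_relu(-2.0))          # roughly -0.02: stays non-zero for negatives
```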
There are 3 detection layers in the YOLO algorithm, each responsible for calculating the
loss at a different scale. The losses calculated at the 3 scales are then summed for
backpropagation. Every detection layer of YOLO uses 7 dimensions to calculate the loss: the
first 4 dimensions correspond to the center_X, center_Y, width, and height of the bounding
box; the next dimension corresponds to the objectness score of the bounding box; and the
last 2 dimensions correspond to the one-hot encoded class prediction of the bounding box.
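For this two-class (mask / no-mask) setup, the 7-dimensional prediction vector can be unpacked as in the sketch below; the field names and the toy vector are illustrative, not the project’s actual decoding code:

```python
# Layout: [center_x, center_y, width, height, objectness, p(mask), p(no_mask)]
def decode_prediction(vector):
    assert len(vector) == 7
    return {
        "box": {"center_x": vector[0], "center_y": vector[1],
                "width": vector[2], "height": vector[3]},
        "objectness": vector[4],
        # The one-hot class slots: pick whichever score is higher.
        "class": "mask" if vector[5] >= vector[6] else "no_mask",
    }

pred = decode_prediction([0.5, 0.5, 0.2, 0.3, 0.95, 0.9, 0.1])
print(pred["class"], pred["objectness"])  # mask 0.95
```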
Related work
After the outbreak of the worldwide COVID-19 pandemic, there arose a severe need for
protection mechanisms. The year 2020 showed mankind a mind-boggling series of events, among
which the COVID-19 pandemic is the most life-changing, having startled the world since the
year began. Affecting the health and lives of the masses, COVID-19 has called for strict
measures to be followed in order to prevent the spread of the disease.
From the very basic hygiene standards to the treatments in the hospitals, people are doing
all they can for their own and society’s safety; face masks are one form of personal
protective equipment. People wear face masks once they step out of their homes, and
authorities strictly ensure that people are wearing face masks while they are in groups and
public places. To select a base model, we evaluated the metrics like accuracy, precision
and recall and selected MobileNetV2 architecture with the best performance having 100%
precision and 99% recall. It is also computationally efficient using MobileNetV2 which
makes it easier to install the model to embedded systems. This face mask detector can be
deployed in many areas like shopping malls, airports and other heavy traffic places to
monitor the public and to avoid the spread of the disease by checking who is following
basic rules and who is not[15].
To identify facemask wearing condition, the input images were processed with image pre-
processing, facial detection and cropping, SR, and facemask-wearing condition
identification. Finally, SRCNet achieved a 98.70% accuracy and outperformed traditional
end-to-end image classification methods by over 1.5% in kappa. Our findings indicate
that the proposed SRCNet can achieve high accuracy in facemask-wearing condition
identification, which is meaningful for the prevention of epidemic diseases including
COVID-19 in public[1].
In 2017, a cascade framework for masked face detection proposed by Wei Bu, Jiangjian Xiao,
and Chuanhong Zhou used a simple system for mask detection. The architecture consists of
three cascaded convolutional mask detectors: Mask-12, Mask-24-1, and Mask-24-2. Here a
ResNet-style model with a 7-layer convolutional structure followed by a pooling layer is
used; Mask-12 is the first stage and Mask-24-2 is the last stage of the masked face
detector. A masked face dataset containing 160 images for training and 40 images for
testing is used. The training process includes a pre-trained model and fine-tuned models,
and PASCAL VOC is used for the evaluation process. Testing on masked faces achieved 86.6%
accuracy [16].
While many people are already persuaded of the value of facial protective masks, as
suggested by the World Health Organization (WHO, 2020) and by studies conducted by
N. Leung et al. (2020), S. Zhou et al. (2018), and M. Sande et al. (2008), one may note
that many people do not wear masks for protection from the virus (see various sample data
in Figure 1). Such findings led nurses and other people to initiate public health education
campaigns on mask wearing. Such campaigns consist specifically of sensitizing people to the
importance of wearing a mask by sharing prevention posters and sketches [17].
In this paper, we proposed a new object detection method based on YOLOv3, named Squeeze and
Excitation YOLOv3 (SE-YOLOv3). The proposed method can locate the face in real time and
assess how the mask is being worn, to aid control of the pandemic in public areas. Our main
contributions are as follows: we built a large dataset of masked faces, the
Properly-Wearing Masked Face Detection Dataset (PWMFD), with three predefined classes
covering the targeted cases. Combined with the channel-dimension attention mechanism, the
backbone of YOLOv3 was improved: we added the Squeeze-and-Excitation block (SE block)
between the convolutional layers of Darknet53, which helped the model learn the
relationship between channels. The final accuracy reached 99.64% on the Real-World Masked
Face Dataset (RMFD) [18].
Jiang and Fan in 2020 proposed a one-stage face-detection model capable of classifying
detected faces with respect to whether they are wearing masks or not. The proposed
approach was again inspired by the RetinaNet model and represents a one-stage object
detector that consists of a Feature Pyramid Network (FPN) and a novel context attention
module. The model comprises a backbone, a neck, and a head. The main (high accuracy)
model uses a ResNet backbone, but a simpler model with a MobileNet backbone is also
explored. For the neck of the model (the intermediate connection between the backbone
and the heads of the model), the authors use an FPN. For the heads, the proposed approach
relies on a structure similar to that used in single-shot detectors (SSD). The model is tested
on selected subsets from the MAFA and Wider Face datasets that consist of a total of 7959
images with masked and unmasked faces. Despite the impressive detection performance, the
proposed models do not distinguish between faces that wear masks properly (in accordance
with recommendations) and faces that do not [19].
Face and iris localization is one of the most active research areas in image understanding for
new applications in security and theft prevention, as well as in the development of human–
machine interfaces. In the past, several methods for real-time face localization have been
developed using face anthropometric templates which include face features such as eyes,
eyebrows, nose and mouth. It has been shown that accuracy in face and iris localization is
crucial to face recognition algorithms. An error of a few pixels in face or iris localization
will produce significant reduction in face recognition rates. In this paper, we present a new
method based on particle swarm optimization (PSO) to generate templates for frontal face
localization in real time. The PSO templates were tested for face localization on the Yale
B Face Database and compared to other methods based on anthropometric templates and
Adaboost.
Additionally, the PSO templates were compared in iris localization to a method using
combined binary edge and intensity information in two subsets of the AR face database,
and to a method based on SVM classifiers in a subset of the FERET database. Results show
that the PSO templates exhibit better spatial selectivity for frontal faces resulting in a better
performance in face localization and face size estimation. Correct face localization reached
a rate of 97.4% on Yale B which was higher than 96.2% obtained with the anthropometric
templates and much better than 60.5% obtained with the Adaboost face detection method.
On the AR face subsets, different disparity errors were considered and for the smallest
error, a 100% correct detection was reached in the AR-63 subset and 99.7% was obtained
in the AR-564 subset. On the FERET subset a detection rate of 96.6% was achieved using
the same criteria. In contrast to the Adaboost method, PSO templates were able to localize
faces on high-contrast or poorly illuminated environments. Additionally, in comparison
with the anthropometric templates, the PSO templates have fewer pixels, resulting in a 40%
reduction in processing time thus making them more appropriate for real-time applications
[20].
In 2021, Madhura Inamdar and Ninad Mehendale carried out a project on face mask detection.
Deep learning can be used in unsupervised learning algorithms to process unlabeled data. A
CNN model for speedy face detection has been introduced by Li et al. that evaluates a
low-resolution input image, discards non-face sections, and accurately processes the
regions at greater resolution for precise detection. Calibration nets are used to stimulate
detection.
are used to stimulate detection. The advantage of this model is that it is fast and achieves
14 FPS in case of standard VGA images on the CPU and can be quickened to 100 FPS on
GPU [21].
The objective of this work is to provide a simple and yet efficient tool to detect human faces
in video sequences. This information can be very useful for many applications such as
video indexing and video browsing. In particular the paper will focus on the significant
improvements made to our face detection algorithm presented in [l]. Specifically, a novel
approach to retrieve skin-like homogeneous regions will be presented, which will be later
used to retrieve face images. Good results have been obtained for a large variety of video
sequences [22].
The closest to our work is the recent paper by Qin and Li. Here, the authors describe an
approach (SRCNet) for classifying face-mask wearing. The approach incorporates an
image super resolution model that makes it possible to process low-resolution faces and a
classification network that predicts whether faces are masked, without masks or if the
masks are worn incorrectly. The model is trained and evaluated on a dataset that contained
a total of 3835 images, which unfortunately is no longer available. Out of the 3835 images,
671 contain faces without masks, 134 images contain faces with incorrectly worn masks
and 3030 images contain faces with correctly worn face-masks. An accuracy of 98.70% is
reported for the proposed model. Although this work shares the basic problem statement,
we do not focus solely on low-resolution faces, but explore the general task of detecting
whether face-masks are worn correctly or not regardless of the data characteristics[1],[23].
Currently, there is a global outbreak of novel coronavirus pneumonia, which has infected
many people. One of the most efficient ways to prevent infection is to wear a mask. Thus,
mask detection, which essentially belongs to object detection, is meaningful for the
authorities in preventing and controlling the epidemic. After comparing different methods
used in object detection and conducting relevant analysis, YOLOv3-tiny proved suitable for
real-time detection [24].
There are many solutions to prevent the spread of the COVID-19 virus and one of the most
effective solutions is wearing a face mask. Almost everyone is wearing face masks at all
times in public places during the coronavirus pandemic. This encourages us to explore face
mask detection technology to monitor people wearing masks in public places. Most recent
and advanced face mask detection approaches are designed using deep learning. In this
article, two state-of-the-art object detection models, namely, YOLOv3 and faster R-CNN
are used to achieve this task. The authors have trained both the models on a dataset that
consists of images of people of two categories that are with and without face masks. This
work proposes a technique that draws bounding boxes (red or green) around the faces
of people, based on whether a person is wearing a mask or not, and keeps a record of the
ratio of people wearing face masks on a daily basis. The authors have also compared the
performance of both models, i.e., their precision rate and inference time [25].
The human face is a complicated multidimensional visual model, and hence it is very
difficult to develop a computational model for recognizing it. The paper presents a
methodology for recognizing the human face based on features derived from the image.
The proposed methodology is implemented in two stages. The first stage detects the human
face in an image using the Viola-Jones algorithm. In the next stage, the detected face in the
image is recognized using a fusion of Principal Component Analysis and a Feed-Forward
Neural Network. The performance of the proposed method is compared with existing
methods, and better recognition accuracy is realized with the proposed method. The
proposed methodology uses the BioID Face Database as the standard image database [10].
COVID-19 is an unparalleled crisis leading to a huge number of casualties and security
problems. To reduce the spread of coronavirus, people often wear masks to protect
themselves. This makes face recognition a very difficult task, since certain parts of the face
are hidden. A primary focus of researchers during the ongoing coronavirus pandemic
is to come up with rapid and efficient solutions to handle this problem.
This paper presents a review of various methods and algorithms used for human
recognition with a face mask. Different approaches, i.e., Haar cascade, AdaBoost, the
VGG-16 CNN model, etc., are described in this paper. A comparative analysis is made of
these methods to conclude which approach is feasible. With the advancement of technology
and time, more reliable methods for human recognition with a face mask can be implemented
in the future. Finally, it includes some of the applications of face detection. This system
has various applications at public places, schools, etc., where people need to be detected
with the presence of a face mask and recognized, helping society [26].
The outbreak of coronavirus disease has thus far killed over 2.85 million people and
infected over 131 million all over the world, causing a global health crisis, due to which
governments were forced to impose lockdowns all over the world. As made mandatory by
the World Health Organization (WHO), the only effective protection method is to wear a
face mask every time we are out in public and to maintain social distancing. Wearing face
masks will automatically reduce the risk of spreading the deadly virus. An efficient
approach used for building a deep learning model for face detection is presented. Here, the
dataset consists of images with and without masks, and OpenCV is later used for real-time
face mask detection from the webcam. The dataset is used to build a COVID-19 face mask
detector with computer vision using Python, OpenCV, TensorFlow, and Keras. The aim is
to identify whether the person in the image/video is masked or unmasked. The model
achieves 98.7% accuracy in distinguishing people with or without a face mask. The authors
hope that their study will be useful to reduce the rapid spread of the virus [27].
CHAPTER 3: SYSTEM ANALYSIS
System Analysis
The team size for the development of the system was 2 and the total project duration
was 20 weeks. Each member worked 35 hours per week to develop the system.
i) Functional Requirements
A Functional Requirement (FR) describes the service that the software must provide. It
refers to a software system or its component. A function is nothing more than the
software system's inputs, behavior, and outputs. It could be a calculation, data
manipulation, business process, user interaction, or any other specific functionality that
describes the function that a system is likely to perform[29].
Figure 13: Use Case Diagram
The use case diagram describes the interaction between the actor and the system: what the
actor does and how the system reacts.
In face mask detection, all users/visitors should first pass through the system. The system
takes visual input of the user and checks whether a face mask is being worn.
Performance and scalability: How fast does the system return results? How much will
this performance change with higher workloads?
Portability and compatibility: Face mask detection can run on a system with 4 GB of
RAM or higher and a 1 GHz or faster processor; these are readily available on most
systems nowadays. During development, it runs on the Windows platform, but it can be
further developed to run on Mac and Linux. Python is available for all platforms, and the
main library used, OpenCV, is cross-platform as well. All of the tools support
cross-platform portability and will have no issues with compatibility.
Localization: Face mask detection matches the local specifics currently. It can surely
also be used globally.
Usability: The face mask detection system has a simple design and is very simple to use.
Hardware Requirements
• 4 GB RAM or higher.
• 1 GHz or faster processor.
• Input device: Keyboard, Mouse
• Output device: Monitor
• Camera
Software Requirements
i) Technical Feasibility
Development of the proposed facemask detection system is technically feasible and
complies with current technology. PyCharm is an open-source platform which can be
programmed using the Python language, and only access to a computer system with a
camera is required.
ii) Operational Feasibility
This project aims to provide a suitable working environment in the office by alerting
individuals without causing any harm. Hence, this product is operationally feasible.
iii) Economic Feasibility
The application used in the system will be developed using open-source platforms and
technologies such as Python, PyCharm, and TensorFlow, which require no seed
investment. Also, any computer device or smartphone will be capable of making use of
the application, and the controller can access it using any computer device connected to
the system. The tool used for the simulation was open source. So, the proposed project is
economically feasible.
Gantt Chart
The chart plots each task against a timeline running from 13-Jun to 20-Nov, with the
following durations:
• Preliminary Investigation: 18
• Planning: 30
• Coding: 25
• Development: 50
• Testing and Debugging: 20
• Finalizing: 10
• Documentation: 120
3.1.3 System Analysis
Data Modelling
Figure 15: ER Diagram
Figure 16: DFD Level 0
DFD Level 0 is also called a Context Diagram. It’s a basic overview of the whole system
or process being analyzed or modeled. It’s designed to be an at-a-glance view, showing
the system as a single high-level process, with its relationship to external entities.
DFD Level 1 provides a more detailed breakout of pieces of the Context Level Diagram.
The main functions carried out by the system are highlighted, as the high-level process
of the Context Diagram is broken down into its sub-processes. The DFD Level 1 is shown
in the figure above.
Level 2 DFD goes one step deeper into parts of the Level 1 DFD. It can be used to plan or
record the specific/necessary details about the system's functioning.
CHAPTER 4: SYSTEM DESIGN
Design
The following relational diagram shows the relationship between the 4 major entities of
our application.
Algorithm details
Introduction to YOLO v3
YOLOv3 (You Only Look Once, Version 3) is a real-time object detection algorithm
that identifies specific objects in videos, live feeds, or images. YOLO uses features
learned by a deep convolutional neural network to detect an object. Versions 1-3 of
YOLO were created by Joseph Redmon and Ali Farhadi.
The first version of YOLO was created in 2016, and version 3, which is discussed
extensively in this section, was made two years later in 2018. YOLOv3 is an improved
version of YOLO and YOLOv2. YOLO can be implemented using the Keras or OpenCV
deep learning libraries.
As typical for object detectors, the features learned by the convolutional layers are
passed onto a classifier which makes the detection prediction. In YOLO, the prediction
is based on a convolutional layer that uses 1×1 convolutions.
YOLO is named “you only look once” because the network makes its predictions in a
single pass over the image; since the prediction is made with 1×1 convolutions, the size
of the prediction map is exactly the size of the feature map before it [32].
Architecture
The YOLOv3 algorithm first separates an image into a grid. Each grid cell predicts some
number of boundary boxes (sometimes referred to as anchor boxes) around objects that
score highly with the aforementioned predefined classes.
Each boundary box has a respective confidence score of how accurate it assumes that
prediction should be and detects only one object per bounding box. The boundary boxes
are generated by clustering the dimensions of the ground truth boxes from the original
dataset to find the most common shapes and sizes.
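The per-cell prediction described above can be sketched in a few lines. This is a minimal illustration, not the report's exact code, assuming each detection vector follows the usual YOLO output layout of [center_x, center_y, width, height, objectness, class scores...] with coordinates normalized to the image size:

```python
import numpy as np

def decode_detection(detection, img_w, img_h):
    """Turn one YOLO detection vector into a pixel-space box, class id, and score."""
    # first four values: box center and size, normalized to [0, 1]
    center_x = int(detection[0] * img_w)
    center_y = int(detection[1] * img_h)
    w = int(detection[2] * img_w)
    h = int(detection[3] * img_h)
    # top-left corner of the box: shift the center by half the box size
    x = int(center_x - w / 2)
    y = int(center_y - h / 2)
    # values after the objectness score are the per-class scores
    scores = detection[5:]
    class_id = int(np.argmax(scores))
    confidence = float(scores[class_id])
    return [x, y, w, h], class_id, confidence

# a detection centered in a 416x416 frame, a quarter of the frame on each side
det = np.array([0.5, 0.5, 0.25, 0.25, 0.9, 0.1, 0.8])
box, class_id, confidence = decode_detection(det, 416, 416)
print(box, class_id, confidence)  # [156, 156, 104, 104] 1 0.8
```

In a full detector this decoding runs for every grid cell and anchor box, and the resulting boxes are then filtered by non-maximum suppression.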
Other comparable algorithms that can carry out the same objective are R-CNN (Region-
based Convolutional Neural Networks, made in 2015), Fast R-CNN (an R-CNN
improvement developed in 2017), and Mask R-CNN.
However, unlike systems like R-CNN and Fast R-CNN, YOLO is trained to do
classification and bounding box regression at the same time.
Working
YOLO looks at the whole image at test time, so its predictions are informed by the global
context in the image. YOLO and other convolutional neural network algorithms “score”
regions based on their similarity to predefined classes.
High-scoring regions are noted as positive detections of whatever class they most closely
identify with. For example, in a live feed of traffic, YOLO can be used to detect different
kinds of vehicles depending on which regions of the video score highly in comparison
to predefined classes of vehicles[33].
Step 1: Start
Step 2: User launches the software.
Step 3: User presses the start button.
Step 4: When the user presses the start button, the software accesses the webcam of
the device and initializes the webcam.
Step 5: After the webcam is initialized, the software checks whether a person is
wearing face mask or not for each frame.
Step 6: If everyone is wearing face mask, the system displays the status as safe. If less
than 3 persons are not wearing face mask, the system displays warning status. If
more than 3 persons are not wearing face mask, the status is displayed as danger.
Step 7: For each danger status, an alert email is sent to the concerned authority
consisting of count value of masked and non-masked individual along with date and
time.
Step 8: End.
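The status decision in Step 6 can be sketched as a small helper function. Note that the boundary case of exactly 3 unmasked persons is not specified by the steps above ("less than 3" gives a warning, "more than 3" gives danger), so treating exactly 3 as danger here is an assumption:

```python
def mask_status(no_mask_count):
    """Map the number of unmasked persons in a frame to an alert status."""
    if no_mask_count == 0:
        return "Safe"
    elif no_mask_count < 3:
        return "Warning"
    else:
        # 3 or more unmasked persons; exactly 3 is assumed to be danger
        return "Danger"

print([mask_status(n) for n in (0, 1, 2, 3, 5)])
# → ['Safe', 'Warning', 'Warning', 'Danger', 'Danger']
```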
CHAPTER 5: IMPLEMENTATION
AND TESTING
Implementation
The face mask detection and alert system, developed from the previously mentioned
requirements and design, is implemented in this phase. The tools used and the
implementation details are as follows:
The following hardware and software tools are used to develop the face mask detection
and alert system:
Hardware tools
4 GB RAM or higher.
1 GHz or faster processor.
Input device: Keyboard, Mouse
Output device: Monitor
Camera
Software Tools
The following software tools are used to develop the face mask detection and alert system:
VS code: Visual Studio Code is a distribution of the Code repository with Microsoft-
specific customizations released under a traditional Microsoft product license. Visual
Studio Code combines the simplicity of a code editor with what developers need for their
core edit-build-debug cycle. It provides comprehensive code editing, navigation, and
understanding support along with lightweight debugging, a rich extensibility model, and
lightweight integration with existing tools.
PyCharm: PyCharm is an integrated development environment (IDE) used in
computer programming, specifically for the Python programming language. It is
developed by the Czech company JetBrains (formerly known as IntelliJ). It
provides code analysis, a graphical debugger, an integrated unit tester, integration
with version control systems (VCSes), and supports web development with Django
as well as data science with Anaconda.
Python: Python is an interpreted high-level general-purpose programming language.
Its design philosophy emphasizes code readability with its use of significant
indentation. Its language constructs as well as its object-oriented approach aim to
help programmers write clear, logical code for small and large-scale projects. Python
is dynamically-typed and garbage-collected. It supports multiple programming
paradigms, including structured (particularly, procedural), object-oriented and
functional programming. It is often described as a "batteries included" language due
to its comprehensive standard library.
The Python language was designed with the following features:
Easy to code: Python is a high-level programming language that is very easy to
learn compared to other languages like C, C#, JavaScript, and Java. It is very easy
to code in Python, and anybody can learn the basics in a few hours or days. It is
also a developer-friendly language.
Free and Open Source: The Python language is freely available on the official
website. Since it is open source, the source code is also available to the public,
so you can download it, use it, and share it.
Object-Oriented Language: One of the key features of Python is object-oriented
programming. Python supports object-oriented concepts such as classes, objects,
encapsulation, etc.
GUI Programming Support: Graphical User interfaces can be made using a module
such as PyQt5, PyQt4, wxPython, or Tk in python. PyQt5 is the most popular option
for creating graphical apps with Python.
High-Level Language: Python is a high-level language. When we write programs
in Python, we do not need to remember the system architecture, nor do we need to
manage the memory.
Anaconda Python: Anaconda is a distribution of the Python and R programming
languages for scientific computing (data science, machine learning applications,
large-scale data processing, predictive analytics, etc.), that aims to
simplify package management and deployment. The distribution includes data-
science packages suitable for Windows, Linux, and macOS. It is developed and
maintained by Anaconda, Inc., which was founded by Peter Wang and Travis
Oliphant in 2012. As an Anaconda, Inc. product, it is also known as Anaconda
Distribution or Anaconda Individual Edition, while other products from the
company are Anaconda Team Edition and Anaconda Enterprise Edition, both of
which are not free.
LabelImg: LabelImg is a free, open source tool for graphically labeling images. It’s
written in Python and uses QT for its graphical interface. It’s an easy, free way to
label a few hundred images to try out your next object detection project.
Google Drive: Google Drive is a file storage and synchronization
service developed by Google. Launched on April 24, 2012, Google Drive allows
users to store files in the cloud (on Google's servers), synchronize files across
devices, and share files. In addition to a web interface, Google Drive offers apps
with offline capabilities for Windows and macOS computers
and Android and iOS smartphones and tablets. Google Drive encompasses Google
Docs, Google Sheets, and Google Slides, which are a part of the Google Docs
Editors office suite that permits collaborative editing of documents, spreadsheets,
presentations, drawings, forms, and more.
Microsoft Excel: Microsoft Excel is a helpful and powerful program for data
analysis and documentation. It is a spreadsheet program containing a number of
columns and rows, where each intersection of a column and a row is a “cell”. It is
used to create grids of text, numbers, and formulas specifying calculations, which
is extremely valuable for many businesses that use it to record expenditures and
income, plan budgets, chart data, and succinctly present fiscal results [34].
The main reason we used Excel for our project was to create a Gantt chart.
Draw.io software: Designed by Seibert Media, draw.io is proprietary software for
making diagrams and charts. The software allows you to choose from an automatic
layout function, or create a custom layout. They have a large selection of shapes
and hundreds of visual elements to make your diagram or chart one-of-a-kind. It
also produces web-based diagramming technology and integrates with Google
Drive and Dropbox[35].
We used this online diagram software for the purpose of making the flowchart, context
diagram, data flow diagrams, use case diagram, and class diagrams.
Microsoft Word: Microsoft Word is a word processor developed by Microsoft. It
was first released on October 25, 1983 [36]. Using Word, you can create documents
and edit them later, as and when required, by adding more text, modifying the
existing text, and deleting/moving some part of it. Changing the size of the margins
can reformat the complete document or part of the text. Font size and font type can
also be changed, and page numbers, headers, and footers can be included.
We used Microsoft Word for the purpose of documenting the whole workflow from
start to the end of the project.
GitHub: GitHub is a code hosting platform for version control and collaboration.
It lets you and others work together on projects from anywhere, and is highly used
software for version control. To understand GitHub, we must first have an
understanding of Git. Git is a version control system which allows developers to
collaborate easily, as they can download a new version of the software, make
changes, and upload the newest revision.
The main reason why we preferred Git was that it has multiple advantages over the
other systems available. It stores file changes more efficiently and ensures file
integrity better[37].
OpenCV: OpenCV is a huge open-source library for computer vision, machine
learning, and image processing, and it now plays a major role in real-time operation,
which is very important in today's systems. By using it, one can process images and
videos to identify objects, faces, or even human handwriting. When integrated
with various libraries such as NumPy, Python is capable of processing the OpenCV
array structure for analysis. To identify an image pattern and its various features,
we use vector space and perform mathematical operations on these features.
NumPy: NumPy is a general-purpose array-processing package. It provides a high-
performance multidimensional array object and tools for working with these arrays.
It is the fundamental package for scientific computing with Python.
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-
dimensional container of generic data.
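For example, the detection code later in this chapter relies on NumPy both for picking the best-scoring class with argmax and for flattening the column-shaped index array returned by non-maximum suppression. A minimal sketch with illustrative values:

```python
import numpy as np

# per-class scores for one detection (illustrative values): [mask, no_mask]
scores = np.array([0.15, 0.82])
class_id = int(np.argmax(scores))      # index of the best-scoring class
confidence = float(scores[class_id])

# NMS-style index output is often a column vector; flatten it to iterate
indexes = np.array([[0], [2], [3]])
kept = [int(i) for i in indexes.flatten()]
print(class_id, kept)  # 1 [0, 2, 3]
```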
pytz: Pytz brings the Olson tz database into Python and thus supports almost all
time zones. This module provides date-time conversion functionality and helps
users serving an international client base. It enables time zone calculations in our
Python applications and also allows us to create timezone-aware datetime
instances.
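For instance, the detection loop later in this chapter stamps each status with datetime.now(IST). A timezone-aware timestamp can be built with pytz as sketched below; Asia/Kathmandu is assumed here as the local zone, while the report's code only shows a zone object named IST:

```python
from datetime import datetime

import pytz

# assumed local zone; the report's code names its zone object IST
IST = pytz.timezone('Asia/Kathmandu')
now = datetime.now(IST)             # timezone-aware current time
print(now.strftime('%Y-%m-%d %H:%M:%S %Z'))
```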
Imutils: A series of convenience functions to make basic image processing
functions such as translation, rotation, resizing, skeletonization, and displaying
Matplotlib images easier with OpenCV and both Python 2.7 and Python 3.
Python GUI –
Tkinter: Python offers multiple options for developing GUI (Graphical User
Interface). Out of all the GUI methods, tkinter is the most commonly used method.
It is a standard Python interface to the Tk GUI toolkit shipped with Python. Python
with tkinter is the fastest and easiest way to create GUI applications, and creating a
GUI using tkinter is an easy task.
Pillow: PIL is the Python Imaging Library by Fredrik Lundh and Contributors.
Pillow for enterprise is available via the Tidelift Subscription.
Python Tcl: Tcl is a dynamic interpreted programming language, just like Python.
Though it can be used on its own as a general-purpose programming language, it is
most commonly embedded into C applications as a scripting engine or an interface
to the Tk toolkit[38].
while True:
    # grab the next frame from the webcam
    _, frame = cap.read()
    height, width = frame.shape[:2]

    # build an input blob from the frame and run a forward pass of the network
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    output_layers_names = net.getUnconnectedOutLayersNames()
    layerOutputs = net.forward(output_layers_names)

    boxes = []
    confidences = []
    class_ids = []
    for output in layerOutputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:
                # scale the normalized box back to pixel coordinates
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                # top-left corner: shift the center by half the box size
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    # non-maximum suppression drops overlapping duplicate boxes
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

    border_size = 100
    # display status
    text = "Status:"
    datetime_ist = datetime.now(IST)
    if status == "Danger":
        # status is derived from the mask / no-mask counts (computation not shown)
        sendEmail(msg)
        print(next_frame_towait)
    else:
        if len(indexes) > 0:
            for i in indexes.flatten():
                x, y, w, h = boxes[i]
                label = str(classes[class_ids[i]])
                confidence = str(round(confidences[i], 2))
                color = colors[i]

    key = cv2.waitKey(1)
    if key == 27:  # Esc key exits the loop
        break

# e-mail alert over Gmail SMTP with SSL
smtp_server = "smtp.gmail.com"
password = "********"  # Gmail app password (redacted)
context = ssl.create_default_context()
server.login(sender_email, password)
For GUI:

root = tkinter.Tk()
root.configure(bg='#87cefa')
canvas = tkinter.Canvas(root, width=600, height=400)
canvas.grid(columnspan=3, rowspan=3)

# logo image shown at the top of the window
logo = Image.open('pp.png')
logo = logo.resize((300, 300))
logo = ImageTk.PhotoImage(logo)
logo_label = tkinter.Label(image=logo)
logo_label.image = logo  # keep a reference so the image is not garbage-collected
logo_label.grid(column=1, row=0)

# instruction label (constructor reconstructed; label text not shown in the report)
instruction = tkinter.Label(root, text="...",
                            padx=10, pady=10, bg="#87cefa")
instruction.grid(column=1, row=10, padx=10, pady=10)
# browse_btn is created earlier in the script (definition not shown)
browse_btn.grid(column=1, row=20, padx=10, pady=10)
root.mainloop()
Testing
Unit Testing is a software testing technique by means of which individual units of software,
i.e., groups of computer program modules, usage procedures, and operating procedures, are
tested to determine whether they are suitable for use. It is a testing method in which every
independent module is tested by the developer to determine if there are any issues.
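As a concrete sketch, a unit test for the status logic of this system might look like the following; mask_status here is a hypothetical stand-in for the real module under test:

```python
import io
import unittest

def mask_status(no_mask_count):
    """Hypothetical stand-in for the system's status logic."""
    if no_mask_count == 0:
        return "Safe"
    elif no_mask_count < 3:
        return "Warning"
    return "Danger"

class TestMaskStatus(unittest.TestCase):
    def test_all_masked(self):
        self.assertEqual(mask_status(0), "Safe")

    def test_few_unmasked(self):
        self.assertEqual(mask_status(2), "Warning")

    def test_many_unmasked(self):
        self.assertEqual(mask_status(5), "Danger")

# run the three tests quietly and report the result
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMaskStatus)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(result.testsRun, result.wasSuccessful())  # 3 True
```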
Test Case 1
Figure 21: Start of application
Test Case 2
Test Case 3
Figure 24: No mask detected
Test Case 4
Test Objectives: Test for showing the mask count / no-mask count
Test Case 5
Figure 27: Email alert
Test Case 1
Test Objectives: Test for detection of face mask for single person with status
Figure 28: Single-person face mask detection
Test Case 2
Test Objectives: Test for detection of face mask for a single person in unusual cases
Test Case 3
Test Objectives: Test for detection of face mask for multiple persons with status Warning
Figure 30: Detection of multiple persons not wearing masks
Figure 31: Detection of faces in different cases (wearing mask and not wearing mask)
Test Case 4
Test Objectives: Test for detection of face mask for multiple persons with status Danger
Test Case 5
Test Objectives: Test for detection of face mask for multiple persons with status Safe
Figure 33: Detection of faces wearing masks
Test Case 6
Expected Output: Successfully send the email to the controller (every 30 seconds)
Figure 34: Email alert system
Result Analysis
All the functionalities of the project are achieved as per the objectives of the project.
Figure 35: Detection of absence of face mask
Figure 36: Detection of Presence of face mask
An alert system is included which alerts the concerned authority when an individual isn't
wearing a mask.
CHAPTER 6: CONCLUSION AND FUTURE
ENHANCEMENTS
Conclusion
Face mask detection system is a software which can detect whether a person wearing a face
mask or not and alert the concerned authority by displaying the stats on the screen and
sending alert email as well. A simple and easy GUI pops up when the software is launched.
With the help of single button, the detection system starts. User can set a threshold count
value for receiving email alert on their email address. The email will state the number of
people with and without face mask on a particular time. The system waits for a certain
amount of time before sending another alert email. The algorithms used in the development
process of face detection and recognition are fast and work perfectly fine on good lighting
conditions.
This system can replace the process of manual face mask detection, which is unsafe, hectic,
and impractical in many places. It can be easily deployed as an executable file. The GUI
is very simple to use, and the software works properly on all available operating systems.
Future Enhancements
To overcome the limitations in the future following enhancements can be implemented:
1. The first step towards enhancement would be to improve accuracy in detecting
uncommon or fancy face masks.
2. It should be capable of detecting faces under any lighting conditions.
3. Improve the recognition rate of the algorithms when there are unintentional changes
in a person, like wearing a scarf, glasses, or a hat.
4. Better performance even on low end devices.
5. The system can be made to store images of individuals roaming without face mask
so that the culprit can be identified and punished.
REFERENCES
[1] B. Qin and D. Li, “Identifying facemask-wearing condition using image super-resolution
with classification network to prevent COVID-19,” Sensors (Switzerland), vol. 20, no. 18,
pp. 1–23, Sep. 2020, doi: 10.3390/s20185236.
[6] B. O. F. Technology, Face-Mask Detection Using Yolo V3 Architecture, no. May. 2020.
[10] K. Sharma and E. Gurinder Singh, “IJARCCE Face Recognition using Principal Component
Analysis and ANN,” Int. J. Adv. Res. Comput. Commun. Eng., vol. 5, 2016, doi:
10.17148/IJARCCE.2016.53144.
[15] A. Ahmed, S. Adeel, H. Shahriar, and S. Mojumder, “Face Mask Detector Face Mask
Recognition View project,” 2020, doi: 10.13140/RG.2.2.32147.50725.
[16] J. Babu, “A Review on Face Mask Detection using Convolutional Neural Network,” Int.
Res. J. Eng. Technol., 2020, [Online]. Available: www.irjet.net.
[17] L. Dinalankara, “Face Detection & Face Recognition Using Open Computer Vision
Classifiers,” [Online]. Available: https://www.researchgate.net/publication/318900718.
[18] X. Jiang, T. Gao, Z. Zhu, and Y. Zhao, “Real-time face mask detection method based on
yolov3,” Electron., vol. 10, no. 7, Apr. 2021, doi: 10.3390/electronics10070837.
[19] M. Jiang, X. Fan, and H. Yan, “RetinaMask: A Face Mask detector,” May 2020, [Online].
Available: http://arxiv.org/abs/2005.03950.
[21] M. Inamdar and N. Mehendale, “Real-time face mask identification using Facemasknet deep
learning network.” [Online]. Available: https://ssrn.com/abstract=3663305.
[22] A. Kumar, A. Kaur, and M. Kumar, “Face detection techniques: a review,” Artif. Intell.
Rev., vol. 52, pp. 927–948, 2019.
[23] B. Batagelj, P. Peer, V. Štruc, and S. Dobrišek, “How to Correctly Detect Face-Masks for
COVID-19 from Visual Information?,” Appl. Sci., vol. 11, no. 5, p. 2070, Feb. 2021, doi:
10.3390/app11052070.
[24] G. Cheng, S. Li, Y. Zhang, and R. Zhou, “A Mask Detection System Based on Yolov3-
Tiny,” Front. Soc. …, vol. 2, no. 11, pp. 33–41, 2020, doi: 10.25236/FSST.2020.021106.
[25] S. Singh, U. Ahuja, M. Kumar, K. Kumar, and M. Sachdeva, “Face mask detection using
YOLOv3 and faster R-CNN models: COVID-19 environment,” Multimed. Tools Appl., vol.
80, no. 13, pp. 19753–19768, 2021, doi: 10.1007/s11042-021-10711-8.
[26] V. S. Bhat, “Review on Literature Survey of Human Recognition with Face Mask,” vol. 10,
no. 01, pp. 697–702, 2021.
[27] S. Singh, R. Swami, and M. V Bonde, “REAL TIME FACE MASK DETECTION USING,”
vol. 8, no. 5, pp. 1–5, 2021.
[30] “Feasibility Study Definition: How Does It Work?,” [Online]. Available:
https://www.investopedia.com/terms/f/feasibility-study.asp.
[32] J. Redmon and A. Farhadi, “YOLO v.3,” Tech Rep., pp. 1–6, 2018, [Online]. Available:
https://pjreddie.com/media/files/papers/YOLOv3.pdf.
[37] K. Brown, “What Is GitHub, and What Is It Used For?,” How-To Geek, Nov. 13, 2019.