
USER LOGIN SYSTEM USING FACE RECOGNITION

MINI PROJECT REPORT

Submitted by

C.ARUNPANDIAN REGISTER NO: 21TD0213
S.P.KARKATESH REGISTER NO: 21TD0248
R.PRADEEP REGISTER NO: 21TD0273
S.SATHISH REGISTER NO: 21TD0289

Under the guidance of

Dr.N.Palanivel, M.E., Ph.D.,


Associate Professor

in partial fulfillment for the award of the degree

of

BACHELOR OF TECHNOLOGY

in

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

MANAKULA VINAYAGAR INSTITUTE OF TECHNOLOGY,


KALITHEERTHALKUPPAM, PONDICHERRY
PONDICHERRY UNIVERSITY, INDIA.

MAY 2024
MANAKULA VINAYAGAR INSTITUTE OF TECHNOLOGY
PONDICHERRY UNIVERSITY

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

BONAFIDE CERTIFICATE

This is to certify that the mini project work entitled “USER LOGIN SYSTEM
USING FACE RECOGNITION” is a bonafide work done by C. ARUNPANDIAN
[REGISTER NO: 21TD0213], S.P.KARKATESH [REGISTER NO: 21TD0248],
R.PRADEEP [REGISTER NO: 21TD0273], S.SATHISH [REGISTER NO:
21TD0289] in partial fulfillment of the requirement for the award of B.Tech Degree in
Computer Science and Engineering by Pondicherry University during the academic year
2023-24.

HEAD OF THE DEPARTMENT PROJECT GUIDE

Dr.S.Pariselvam, M.E., Ph.D., Dr.N.Palanivel, M.E., Ph.D.,


Professor and Head, CSE Associate Professor, CSE

MINIPROJECT COORDINATOR

Mrs. I. VARALAKSHMI
Assistant Professor (SG)

ACKNOWLEDGEMENT

We express our deep sense of gratitude to Theiva Thiru. N. Kesavan, Founder,


Shri. M. Dhanasekaran, Chairman & Managing Director, Shri. S. V. Sugumaran,
Vice-Chairman and Dr. K. Gowtham Narayanasamy Secretary of Sri Manakula
Vinayagar Educational Trust, Puducherry for providing necessary facilities to
successfully complete our project and report works.
We express our sincere thanks to our beloved Principal Dr. S. Malarkkan for
having provided necessary facilities and encouragement for successful completion of this
project work.
We express our sincere thanks to Dr. S. Pariselvam, Professor and Head of the
Department, Computer Science and Engineering, for his support in making the necessary
arrangements for the conduct of the project and for guiding us to execute our project
successfully.
We express our sincere thanks to Dr. N. Palanivel, Associate Professor,
Computer Science and Engineering, for his consistent reviews which motivated us in
completing the project.
We express our sincere gratitude to the Mini Project coordinator Mrs. I. Varalakshmi,
Assistant Professor (SG), Computer Science and Engineering, for her consistent
reviews and suggestions which helped us in the effective completion of the project.
We thank all our department faculty members, non-teaching staff and our friends
of Computer Science and Engineering for helping us to complete the document
successfully on time.
We would like to express our eternal gratitude to our parents for the sacrifices
they made for educating and preparing us for our future and their everlasting love and
support. We thank the Almighty for blessing us with such wonderful people and for being
with us always.

SDG GOALS

A user login system using face recognition can potentially align with some
Sustainable Development Goals (SDGs) but also raise concerns for others. Here's a
breakdown of the relevant SDGs:

Goal 9: Industry, Innovation and Infrastructure


A face recognition login system leverages advanced technologies like computer
vision and deep learning. Developing and deploying such an innovative system
contributes to building resilient infrastructure and promoting inclusive and sustainable
industrialization.

Goal 16: Peace, Justice and Strong Institutions


Face recognition can enhance security and prevent unauthorized access to systems
and data. This helps promote peaceful and inclusive societies for sustainable
development, provide access to justice for all, and build effective, accountable and
inclusive institutions at all levels.

Goal 17: Partnerships for the Goals


Implementing a face recognition login system often requires collaboration
between different stakeholders like hardware manufacturers, software developers, and
system integrators. This fosters global partnerships and cooperation to support and
achieve the ambitious targets of the SDGs.

In summary, a user login system based on face recognition can directly contribute
to SDG 9 by promoting innovation, SDG 16 by enhancing security and accountability,
and SDG 17 by encouraging multi-stakeholder partnerships. However, it's important to
ensure such systems are designed with privacy and ethics in mind to avoid potential
misuse.

ABSTRACT

With every passing day, we are becoming more and more dependent upon
technology to carry out even the most basic of our actions. Facial detection and facial
recognition help us in many ways, be it sorting photos in our mobile phone gallery by
recognizing pictures with faces in them, unlocking a phone by a mere glance, or adding
biometric information in the form of face images to the country's unique ID database
(Aadhaar) as an acceptable biometric input for verification. This project lays out the basic
terminology required to understand the implementation of face detection and face
recognition using Intel's computer vision library, OpenCV. It also shows the practical
implementation of face detection and face recognition using OpenCV with Python on
both the Windows and macOS platforms. The aim of the project is to implement facial
recognition on faces that the script can be trained for. The input is taken from a webcam
and the recognized faces are displayed along with their names in real time. This project
can be implemented on a larger scale to develop a biometric attendance system which can
replace the time-consuming process of manual attendance. In today's digital era, the need
for robust yet user-friendly authentication methods is paramount. This report proposes a
face recognition-based authentication system tailored for login applications. Leveraging
the advancements in deep learning and computer vision, the proposed system offers a
seamless and secure login experience for users across various platforms and devices.
Identifying human faces using a web camera is known as face detection, a very effective
technique in computer vision. Different types of attendance systems are in use, such as
password login, punch card, fingerprint, etc. In this work, we introduce a facial
recognition biometric system that can identify a specific face by analyzing and comparing
patterns in a digital image. The proposed login system is based on face detection: the
device captures face images, stores the captured images in a specific path on the
computer, and relates the information to a database.

TABLE OF CONTENTS

CHAPTER NO.    TITLE    PAGE NO.
ACKNOWLEDGEMENT iii
SDG GOALS iv
ABSTRACT v
LIST OF FIGURES ix
LIST OF TABLES x
LIST OF ABBREVIATIONS xi

1 INTRODUCTION 1
1.1 OVERVIEW 1
1.2 TECHNOLOGY (used in project) 2
1.3 PROJECT OBJECTIVE 3
1.4 PROJECT ARCHITECTURE 3

2 LITERATURE SURVEY 4
2.1 APPEARANCE BASED APPROACH 4
2.1.1 THE EIGEN FACE METHOD 5
2.1.2 THE FISHER FACE METHOD 6
2.1.3 SUPPORT VECTOR MACHINES 7
2.2 FEATURE BASED APPROACH 9
2.2.1 GEOMETRIC FEATURE METHOD 9
2.2.2 HIDDEN MARKOV MODEL 11

2.2.3 ACTIVE APPEARANCE MODEL 11
2.3 CONCLUSION 13

3 EXISTING WORK 14
3.1 ARCHITECTURE 14
3.2 SKIN COLOR BASED FACE DETECTION 15
3.3 VIOLA JONES METHOD 16
3.4 MECHANISM OF HUMAN FACIAL 17
RECOGNITION
3.5 EYE SPACING MEASUREMENT 18
3.6 LDA ALGORITHM 19

4 PROPOSED SYSTEM 20
4.1 SYSTEM ARCHITECTURE 20
4.2 DESIGN METHODOLOGY 21
4.2.1 OpenCV 21
4.2.2 Dlib 21
4.2.3 Cmake 21
4.3 ALGORITHM 22
4.3.1 CONVOLUTIONAL NEURAL NETWORK 23
4.3.2 ARCHITECTURE OF CNN 24
4.3.3 TRADITIONAL NEURAL NETWORK 26
4.3.4 BENEFITS OF CNN 28
4.3.5 APPLICATIONS 30
4.3.6 ARTIFICIAL NEURONS 32
4.4.1 REGION BASED CNN 35
4.4.2 R-CNN 38
4.4.3 FAST R-CNN 40
4.4.4 FASTER R-CNN 41
4.4.5 YOLO 42
4.4.6 ADVANTAGES 44

5 CONCLUSION 45
5.1 CONCLUSION 45
5.2 FUTURE SCOPE 45
REFERENCES 46
6 APPENDIX-1 (Coding)
APPENDIX-2 (ScreenShot)

LIST OF FIGURES

FIGURE NO.    TITLE    PAGE NO.
1.1 Feature vector are derived using Eigen faces 5
1.2 Example of six classes using LDA 5
1.3 Snapshot of ORL database 8
1.4 Cropped yale database 9
1.5 Geometrical feature 10
1.6 Left to Right HMM face region 14

1.7 Normalized feature 15
3.1 Skin color detection 16
3.2 Viola Jones detection 17
4.1 Sample of CNN 19
4.2 Sample of CNN 2 21
4.3 Working of CNN 23
4.4 Kernel Matrix 24
4.5 Convolution of matrix 25
4.6 Artificial Neurons 31
4.7 Object Detection 32
4.8 Pooling layer 33
4.9 Max and Average Pool 34
4.10 Limitations of CNN 35
4.11 R-CNN features 38
4.12 Fast R-CNN 39
4.13 Faster R-CNN 41
4.14 YOLO 43

LIST OF TABLES

TABLE NO.    TITLE    PAGE NO.

1.1 ORL Result 4


1.2 Yale Result 12

LIST OF ABBREVIATIONS

AAM Active Appearance Model


ANN Artificial Neural Network
CNN Convolutional Neural Network
DNN Deep Neural Network
FLD Fisher's Linear Discriminant
HMM Hidden Markov Model
IFK Improved Fisher Kernel
ICA Independent Component Analysis
KNN K-Nearest Neighbor Approach
LDA Linear Discriminant Analysis
LFA Local Feature Analysis
NM Nearest Mean Approach
OSH Optimal Separating Hyperplane

OpenCV Open Source Computer Vision
PCA Principal Component Analysis

CHAPTER 1

INTRODUCTION

1.1 OVERVIEW

A face recognition system is a technology capable of matching a person's face from a
digital image or a video frame against a database of faces, typically to identify or verify
that person. Researchers are currently developing multiple ways in which face
recognition systems work. The most advanced face recognition method, which is also
used to authenticate users through ID verification services, works by pinpointing and
measuring facial features from a given image. While initially a form of computer
application, face recognition systems have seen wider use in recent times on smartphones
and in other forms of technology, such as artificial intelligence. Because computerized
face recognition involves the measurement of a human's physiological characteristics,
face recognition systems are classified as biometrics. Although the accuracy of face
recognition as a biometric technology is lower than that of iris recognition and fingerprint
recognition, it is widely adopted because of its contactless and non-invasive nature. Facial
recognition systems are deployed in advanced human-computer interaction, video
surveillance and automatic indexing of images. We have created a face recognition
technology capable of identifying faces. There are various advantages to developing
software that uses face detection and recognition in the field of authentication. Face
detection is an easy and simple task for humans, but not so for computers. It has been
regarded as one of the most complex and challenging problems in the field of computer
vision due to the large intra-class variations caused by changes in facial appearance,
lighting and expression. Face detection is the process of identifying one or more human
faces in images or videos. It plays an important part in many biometric, security and
surveillance systems, as well as image and video indexing systems. Face detection can be
regarded as a specific case of object-class detection, where the task is to find the locations
and sizes of all objects in an image that belong to a given class. In recent years, face
recognition has attracted much attention and its research has rapidly expanded, involving
not only engineers but also neuroscientists, since it has many potential applications in
computer vision, communication and automatic access control systems.

1.2 PROJECT TECHNOLOGY

EmguCV Library
EmguCV is a cross platform .Net wrapper to the OpenCV image processing
library. OpenCV/EmguCV uses a type of face detector called a Haar Cascade. The Haar
Cascade is a classifier (detector) trained on thousands of human faces.
Visual Studio

Visual Studio is able to build and run the solution examples after a proper
configuration of EmguCV. The desktop software will implement the two sub-systems
(Training set manager and Face recognizer) together with face detector in windows form.

Open Source Computer Vision

OpenCV (Open Source Computer Vision) is a library of programming functions for


real-time computer vision. The face detection part of the project was made using an
OpenCV library for Scala. The reason was that most face APIs are restricted to
performing detection on still pictures only, whereas the project required face detection on
live video footage to speed up the process of checking student attendance and prevent
queues before lectures. The OpenCV library proved to be flexible enough for the project,
as it can accurately detect a face in real time and highlight it by drawing a rectangle
around the faces of the students passing by. This all happens in a window separate from
the face recognition, so the lecturer can keep track of both the students passing by having
their faces detected and the feedback from the recognition part of the system. While faces
are being detected, the application takes a snapshot of the live footage every second and
then sends it to the recognition system.
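
To illustrate the kind of real-time detection loop described above, here is a minimal OpenCV/Python sketch using a Haar cascade face detector. It is a hedged sketch, not the project's exact code; the cascade file ships with OpenCV, and the detector parameters shown are common illustrative defaults.

import cv2

# Load the pre-trained frontal face Haar cascade bundled with OpenCV
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

cap = cv2.VideoCapture(0)                      # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Detect faces; scaleFactor and minNeighbors are illustrative values
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("Face detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):      # press q to quit
        break
cap.release()
cv2.destroyAllWindows()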

1.3 PROJECT OBJECTIVE

Whenever a new system is implemented, it is developed to remove the shortcomings of
the existing system. A computerized mechanism has more of an edge than a manual one.
The existing system is manual and takes a lot of time to get the work done. The proposed
system is a web application that maintains a centralized repository of all related
information. The system allows one to easily access the software and retrieve what one
wants.

1. Reducing time wastage during conventional class attendance.
2. Utilizing the latest trends in machine vision to implement a feasible solution for a class
attendance system.
3. Automating the whole process so that we have a digital environment.
4. Preventing fake roll calls, as only one-to-one attendance marking is possible.
5. Encouraging the use of technology in daily life.

1.4 PROJECT ARCHITECTURE

CHAPTER 2

LITERATURE SURVEY

2.0 OVERVIEW

In recent years, techniques based on the biological properties of human beings
have become much more significant in the identification of individuals, since other
techniques like passcodes, OTP generation and other security modes carry the risk of
being stolen, misused or forged. Hence biological properties like the face, fingerprints,
palm, ear, iris, retina and signature, which are not easily accessed by anyone, can be used
[3]. The major purpose of face recognition is to verify and identify. Face recognition
applications play a significant role in fields such as security investigation, camera
surveillance, general identity verification, criminal case investigation, database
management systems, and applications based on smart cards and other types of magnetic
cards. In addition, the underlying techniques have also been modified and used in related
applications like gender classification, gesture recognition, facial recognition and
tracking. Gesture recognition can be used in the field of medicine for monitoring
intensive care units. Facial recognition can be applied to tracking a vehicle driver's face,
and face recognition can also be hybridized with other biometrics like speech, fingerprint
and gait recognition. It is similar to object recognition. Human faces mostly appear
similar and the differences between them are small. Although the face is not unique, there
are several factors that cause variation in its appearance. These can be classified as
intrinsic factors and extrinsic factors. Intrinsic factors are independent of the observer and
represent the object of the face itself; they can be divided into interpersonal and
intrapersonal factors.

2.1 FACE RECOGNITION TECHNIQUES: APPEARANCE-BASED
APPROACHES

2.1.1 The Eigen face Method

Kirby and Sirovich first demonstrated the eigenfaces method for recognition. Turk and
Pentland improved on this research by employing the eigenfaces method based on
Principal Component Analysis (PCA) for the same purpose. PCA is a Karhunen-Loeve
transformation: a linear dimensionality reduction method used to determine a set of
mutually orthogonal basis functions, as shown in Figure 2.1. It uses the leading
eigenvectors of the sample covariance matrix to characterize the lower-dimensional
subspace, and it is used to reduce the dimension of the image matrix. For example, if a
face image is represented in a g-dimensional space, PCA aims to obtain an h-dimensional
subspace using linear transforms that captures

Figure 2.1: Feature vectors are derived using Eigen faces [5]

the maximum variance of the g-dimensional space, where g is large compared to h.
Mean-centered images are calculated by subtracting the mean image from the normalized
training images. If W is the mean-centered training image matrix with columns Wi
(i = 1, 2, ..., L) and L is the number of training images, the covariance matrix D is
calculated from W as in Equation 1.

To reduce the size of the covariance matrix D, we can use the much smaller matrix
W^T W instead. The eigenvectors e_i and eigenvalues λ_i are obtained from this
covariance matrix.

In Equation 2, Z_i represents the new feature vector in the lower-dimensional space. A
drawback of this method is that it maximizes inter-class and intra-class scatter together.
Inter-class scatter is good for classification, whereas intra-class scatter is not: under
varying illumination, intra-class scatter grows very large, and classes may even appear
smeared together.
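
To make the computation above concrete, the following NumPy sketch derives eigenfaces via the smaller L x L matrix trick described in the text. It is illustrative only: X is assumed to be an L x g matrix of flattened training faces (one image per row), so the small matrix here is W W^T rather than the column convention's W^T W, and the function and variable names are the author's assumptions.

import numpy as np

def eigenfaces(X, h):
    # Mean-center the training images (Equation 1 in the text)
    mean_face = X.mean(axis=0)
    W = X - mean_face
    # Work with the small L x L matrix instead of the huge g x g covariance
    D_small = W @ W.T
    eigvals, eigvecs = np.linalg.eigh(D_small)
    order = np.argsort(eigvals)[::-1][:h]        # keep the top-h eigenvectors
    # Map back to image space and normalize to obtain the eigenfaces
    E = W.T @ eigvecs[:, order]
    E /= np.linalg.norm(E, axis=0)
    return mean_face, E

# Projection into the h-dimensional subspace (Equation 2 in the text):
# z = E.T @ (x - mean_face)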

2.1.2 The Fisher face Method

Belhumeur introduced the Fisherface method in 1997, a derivative of Fisher's Linear
Discriminant (FLD) which uses linear discriminant analysis (LDA) to obtain the most
discriminant features. The eigenface and Fisherface methods are similar in that PCA and
LDA are both used to produce a subspace projection matrix. LDA finds a set of
projection vectors that maximize the between-class scatter while minimizing the within-
class scatter, and it concurrently produces a lower error rate than the eigenface method.
Six different classes computed using LDA, with large variance between classes but little
variance within classes, are shown in Figure 2.2. Kernel FLD is capable of extracting the
most distinct features in the feature space, which correspond to nonlinear features in the
original input space, and it shows better results than the conventional Fisherface method,
which is based on the second-order statistics of an image set without considering higher-
order statistical dependencies. Some of the modern LDA-based algorithms include the
following. Direct LDA constructs the image scatter matrix from a normal two-
dimensional image and is capable of resolving the small sample size problem. To resolve
the same problem, the Dual-Space LDA algorithm uses the full discriminative
information of the face. The advantages of both LDA and weighted pairwise Fisher
criteria are combined by Direct-Weighted LDA. The Block LDA algorithm segments the
entire image into several blocks and structures each block as a row vector; linear
discriminant analysis is then performed on the row vectors of the blocks, which form
two-dimensional matrices. The K-Nearest Neighbor approach (KNN) and the Nearest
Mean approach (NM) are two approaches that were fused using LDA and PCA and
evaluated on the AT&T and Yale datasets.

Figure 2.2: Example of Six Classes Using LDA

Fisherface or Linear Discriminant Analysis (LDA) aims to increase inter-class
differences rather than to maximize the data representation.

The within-class (Equation 3) and between-class (Equation 4) scatter matrices,
reconstructed here in LaTeX notation from the definitions in the text, are:

S_w = \sum_{j=1}^{R} \sum_{i=1}^{M_j} (w_i^j - \mu_j)(w_i^j - \mu_j)^T   (3)

S_b = \sum_{j=1}^{R} M_j (\mu_j - \mu)(\mu_j - \mu)^T   (4)

Here the index i is the image number and j is the class; \mu_j is the mean of class j and
\mu is the mean of all classes; M_j is the number of images in class j, and R is the number
of classes. For classification, S_b is maximized while S_w is minimized.

2.1.3 Support Vector Machines

To improve the classification performance of the PCA and LDA subspace features,
support vector machines (SVMs) came into use. SVMs are generally trained with
supervised learning techniques. In training an SVM, a set of images is used to estimate
the Optimal Separating Hyperplane (OSH), which brings down the risk of
misclassification between two classes of images in some feature space. Guo et al. applied
this technique to face recognition, using a binary tree classification scheme in which a
face image is repeatedly assigned to one of two classes; the binary tree structure is
applied until the two classes denote individual subjects and a final classification decision
can be reached. SVMs have been adopted for face recognition by several other
researchers with good results.

2.1.3.1 Performance comparison & Experimental result

Two sets of experiments are presented to examine the performance of the individual
algorithms. For a given set of n images in a class, a classifier is trained using (n-1)
images in that class and tested on the remaining single image. This is repeated n times,
each time training the classifier with one image left out (the leave-one-out protocol). In
this way all images are used for both training and testing, which makes the results
reliable.
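
As an illustration of this protocol, the hedged sketch below combines PCA features with an SVM classifier and evaluates with leave-one-out using scikit-learn. It is not the experiments' actual code: 'images' and 'labels' are assumed NumPy arrays of face images and identities, and the number of PCA components is an arbitrary choice that assumes enough training images.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut
from sklearn.pipeline import make_pipeline

X = images.reshape(len(images), -1)          # flatten each face image
model = make_pipeline(PCA(n_components=50), SVC(kernel="linear"))

# Leave-one-out evaluation as described in the text
correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    model.fit(X[train_idx], labels[train_idx])
    correct += int(model.predict(X[test_idx])[0] == labels[test_idx][0])
print("accuracy:", correct / len(X))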

The ORL Database of Faces

Figure 2.3: Snapshot of ORL Database

The first experiments were performed on the ORL face database from AT&T
Laboratories Cambridge. The images are in grayscale with a resolution of 92 x 112
pixels; the database contains 400 images of 40 distinct people, each with 10 images that
differ in position, rotation, scale and expression. The images were shot under constant
light exposure. Figure 2.3 shows a snapshot of 4 individuals from the ORL database.

Figure 2.4: Snapshot of cropped Yale database

Table 1: ORL Result
Image Set Eigen Fisher SVM
1 92.5% 100.0% 95.0%
2 85.0% 100.0% 100%
3 87.5% 100.0% 100%
4 90.0% 97.5% 100%
5 85.0% 100.0% 100%
6 87.5% 97.5% 97.5%
7 82.5% 95.0% 95.0%
8 92.5% 95.0% 97.5%
9 90.0% 100.0% 97.5%
10 85.0% 97.5% 95.0%
Average 87.5% 98.3% 97.8%

Yale Face Database

The second experiment is performed on the Yale face database from Yale University.
The images are in grayscale and have been cropped to a resolution of 116 x 136 pixels;
the database contains 165 images of 15 distinct people, each with 11 images that differ in
both lighting and expression. A snapshot of 4 individuals from the database is shown in
Figure 2.4. The results of FRCM performed on the Yale database to distinguish the 15
people under different conditions are given in Table 2.

2.2 FEATURE BASED APPROACHES

2.2.1 Face Recognition through geometric features

In the first phase, a set of fiducial points is located in each face, geometric facts such as
the distances between these points are extracted, and the image closest to the query face
is selected. This approach was pioneered by Kanade, who employed the Euclidean
distance for correlation between 16 extracted feature vectors on an image database of 20
distinct people with 2 images per person, attaining an accuracy rate of 75%. Later,
Brunelli and Poggio applied the same idea with geometric features on an image database
of 47 people with 4 images per person, as displayed in Figure 2.5, and attained a
performance rate of 95%. More recently, Cox et al. derived 35 facial features from a
database comprising 685 images, with one image per individual, and reported a
recognition performance rate of 95%.

Table 2: Yale Result
Image Set Eigen Fisher SVM
Centerlight 53.3% 93.3% 86.7%
Glasses 80.0% 100.0% 86.7%
Happy 93.3% 100.0% 100%
Left light 26.7% 26.7% 26.7%
No glasses 100.0% 100.0% 100%
Normal 86.7% 100% 100%
Right light 26.7% 40% 13.3%
Sad 86.7% 93.3% 100%
Sleepy 86.7% 100.0% 100.0%
Surprised 86.7% 66.7% 73.3%
Wink 100% 100% 93.3%

Figure 2.5: Geometrical feature used by Brunelli and Poggio

2.2.2 Hidden Markov Model (HMM)

The HMM was first applied to faces by Samaria and Young. HMMs are generally
employed on images with variations due to lighting, orientation and facial expression,
and thus have advantages over earlier approaches. To treat images using an HMM,
sequences of spatial strips are considered. The procedure is named a Hidden Markov
Model because the states are invisible; only the output is visible to the external observer.
This procedure uses pixel strips to cover all the areas in the face without finding the
precise locations of facial features. The face is identified as a sequence of
discrete parts, and the ordering of these parts must be maintained: for example, it should
run from top to bottom through forehead, eyes, nose, mouth and chin, as in Figure 2.6.

Figure 2.6: Left to Right HMM for face recognition

2.2.3 Active Appearance Model (AAM): 2D Morphable Method

Faces are highly distinct and deformable. Depending on pose, expression and lighting,
faces can have various appearances in images. Cootes, Taylor, and Edwards [56]
presented the Active Appearance Model, which is strongly capable of describing the
appearance of a face with a set of model parameters. AAM is an integrated statistical
model, built from a training set comprising labeled images on which the landmark points
are marked, as shown in Figure 2.7. Model parameters are found by matching the model
to the image, minimizing the difference between the image and a synthesized model
sample projected into the image.

Figure 2.7: Image is split into shape and shape-normalized texture

2.2.4 3D Morphable Model

To handle facial variations like illumination and pose, the 3D morphable model is an
effective, strong and versatile representation of human faces, so it is better to represent
the face using a 3D model. High-quality frontal and half-profile pictures of each subject
are first taken under ambient lighting conditions to build a 3D model. These images are
then used as references in an analysis-by-synthesis loop, which results in a face model.
Blanz et al. [57] proposed a method based on a 3D morphable face model in which an
algorithm reconstructs parameters like texture and shape from a single image of a face
and encodes them with respect to the model parameters. Thus the 3D morphable model
provides full 3D feature information, which enables automatic extraction of facial
regions and facial components.

2.2.5 Hybrid Methods

These methods show better results by using both holistic and feature-based methods to
recognize the face. Eigenmodules were proposed by Pentland et al. [58], who applied
both local eigenfeatures and global eigenfaces and showed much better results than
holistic eigenfaces alone. Penev and Atick [59] used a method called hybrid LFA (Local
Feature Analysis). A shape-normalized flexible appearance technique was proposed by
Lanitis et al. [61], and a method combining component-based recognition and a 3D
morphable model for face recognition was proposed by Huang et al. [61]. The important
phase is to generate 3D face models using the 3D morphable model from three reference
images of each person. These images, rendered under variable illumination conditions
and poses, populate a large set of synthetic images that are used to train a component-
based face recognition system [62]. A Support Vector Machine (SVM) based recognition
system is used to decompose the face into a set of components that are interconnected by
a flexible geometrical model, so that it can track the changes in the position of the facial
components caused by changes in head pose.

2.3 Conclusions

Face recognition is a highly challenging task in the domain of image analysis and
computer vision that has received an immense amount of attention over the last few
decades because of its many applications in vast domains. A few classical face
recognition techniques are cited in this survey. On some face databases, the SVM and
HMM methods can produce better face recognition results, but they use more complex
algorithms. Research has been conducted extensively in this area and immense progress
has been attained; notable results have been obtained, and present face recognition
systems have reached a certain measure of maturity under constrained conditions.
However, these methods are far from achieving the ideal of performing adequately in all
the various situations commonly faced by the applications employing these procedures in
practical life. The fundamental goal of researchers in this domain is to enable computers
to emulate the human vision system and, as has been aptly pointed out by Torres, a strong
and coordinated effort between the computer vision, psychophysics, signal processing
and neuroscience communities is needed to achieve this objective.

CHAPTER 3

EXISTING SYSTEM

3.1 ARCHITECTURE OF EXISTING SYSTEM

3.2 SKIN COLOR BASED FACE DETECTION METHOD:

Detection of skin color in color images is a very popular and useful technique for face
detection. Color is an important feature of human faces, and using skin color as a feature
for tracking a face has several advantages: color processing is much faster than
processing other facial features. In the skin color detection process, each pixel is
classified as skin or non-skin based on its color components. In situations where color
description plays an integral role, the HSV color model is often preferred over the RGB
model. The first step of face detection is to segment the color image into skin and non-
skin regions; different color spaces have different ranges of pixel values that represent
skin and non-skin regions. After the segmentation procedure, morphological operators are
applied with a structuring element. After the application of the morphological operators,
the standard deviation of each area is calculated and rectangles are drawn around the skin
regions. Any unwanted rectangles that are created are then removed.

Figure 3.1 : Skin color detection
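
A minimal sketch of the pipeline just described, using OpenCV. The HSV threshold values and the minimum-area filter are illustrative assumptions, not values from the cited work, and the image path is a placeholder.

import cv2
import numpy as np

img = cv2.imread("input.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Classify each pixel as skin / non-skin by its color components
lower, upper = np.array([0, 40, 60]), np.array([25, 180, 255])   # assumed range
mask = cv2.inRange(hsv, lower, upper)

# Morphological operators with a structuring element clean up the mask
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

# Draw rectangles around the detected skin regions
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h > 500:                        # drop tiny, unwanted rectangles
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)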

Advantages:

∙ This method is able to correctly locate all faces in the images at almost the right scale.

∙ It is more robust to noise and shape variations.

∙ Accuracy of 80-82%.

Disadvantages:

∙ Moderate false detection rate.

∙ Sometimes non-face skin-colored regions are also detected.

∙ Many objects in the real world have skin-tone colors, such as some kinds of leather,
sand, wood, fur, etc., which might be mistakenly detected.

3.3 VIOLA JONES FACE DETECTION SYSTEM

The Viola-Jones object detection framework, proposed in 2001 by Paul Viola and
Michael Jones, provides robust and competitive object detection rates in real time. Even
though it can be trained to detect a variety of object classes, it was motivated mainly by
the task of face detection. This face detection framework is capable of processing images
extremely rapidly while achieving high detection rates. There are three main stages in the
framework. 1. Integral Image: a new representation of an image which allows the features
used by the detector to be computed very rapidly. Once the integral image is computed,
Haar-like features can be computed at any scale or location in constant time. The integral
image at location (x, y) is the sum of the pixels above and to the left of (x, y), inclusive; a
worked sketch follows the figure below.

Figure 3.2 : Viola-Jones face detection
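
A small sketch of the integral image idea in plain NumPy: after one cumulative-sum pass, the sum of any rectangle takes at most four lookups, which is what makes Haar-like features (differences of rectangle sums) cheap to evaluate. The function name and test values are illustrative.

import numpy as np

def rect_sum(ii, x1, y1, x2, y2):
    # Sum of pixels in the rectangle [x1..x2] x [y1..y2], inclusive,
    # using the four-corner lookup on the integral image ii
    total = ii[y2, x2]
    if x1 > 0: total -= ii[y2, x1 - 1]
    if y1 > 0: total -= ii[y1 - 1, x2]
    if x1 > 0 and y1 > 0: total += ii[y1 - 1, x1 - 1]
    return total

img = np.random.randint(0, 256, (100, 100))
ii = img.cumsum(axis=0).cumsum(axis=1)       # the integral image
assert rect_sum(ii, 10, 20, 29, 39) == img[20:40, 10:30].sum()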

3.4 MECHANISMS OF HUMAN FACIAL RECOGNITION

This paper presents an extension and a new way of looking at the author's theory of
human visual information processing. The method includes extracting a second sub-
image from the second image, where the second sub-image includes a representation of
at least one corresponding facial landmark. "In turn, detecting a facial gesture by
determining whether a sufficient difference exists between the second sub-image and the
first sub-image to indicate the facial gesture, and determining, based on detecting the
facial gesture, whether to deny authentication to the user with respect to the human
recognition system", and the same was applied. Several indispensable techniques are
involved: encoding of visible images into neural patterns, detection of simple facial
features, measurement standardization, and reduction of the dimensionality of the neural
patterns. "The logical (computational) role suggested for the primary visual cortex has
several components: size standardization, size reduction, and object extraction". "The
result of processing by the primary visual cortex, it is suggested, is a neural encoding of
the visual pattern at a size suitable for storage." "(In this context, object extraction is the
isolation of regions in the visual field having the same color, texture, or spatial extent.)"
It is shown in detail how the topology of the mapping from retina to cortex, the
connections between the retina, lateral geniculate bodies and primary visual cortex, and
the local structure of the cortex itself may combine to encode the visual patterns. Aspects
of this theory are illustrated graphically with human faces as the primary stimulus.
However, the theory is not limited to facial recognition but pertains to Gestalt recognition
of any class of familiar objects or scenes.

3.5 EYE SPACING MEASUREMENT FOR FACIAL RECOGNITION

Several approaches to computerized facial recognition have employed geometric
measurement of characteristic points of a human face, and eye spacing measurement has
been recognized as an essential step in reaching this goal. Measurement of the spacing
has been made by applying the Hough transform to detect the instance of a circular shape
and of an ellipsoidal shape which approximate the perimeter of the iris, and both the
perimeter of the sclera and the shape of the region under the eyebrows, respectively. Both
gradient magnitude and gradient direction were used to handle the noise contaminating
the feature space. "Results of this application indicate that measurement of the spacing
by detection of the iris is the most accurate of these three methods, with measurement by
detection of the position of the eyebrows the least accurate. However, measurement by
detection of the eyebrows' position is the least constrained method. Application of these
strategies has led to measurement of a characteristic feature of the human face with
adequate accuracy to merit later inclusion in a full package for computerized facial
recognition".

3.6 A DIRECT LDA ALGORITHM FOR HIGH-DIMENSIONAL DATA WITH
APPLICATION TO FACE RECOGNITION

“Linear discriminant analysis (LDA) has been successfully used as a dimensionality
reduction technique for many classification problems, such as speech recognition, face
recognition, and multimedia information retrieval”. The objective is to find a projection
A that maximizes the ratio of the between-class scatter to the within-class scatter
(Fisher's criterion), as written out below.
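
Fisher's criterion, reconstructed here in LaTeX notation from the standard definition (with S_b and S_w the between-class and within-class scatter matrices defined in Section 2.1.2), is:

A^{*} = \arg\max_{A} \frac{\lvert A^{T} S_{b} A \rvert}{\lvert A^{T} S_{w} A \rvert}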

CHAPTER 4

PROPOSED SYSTEM

4.1 SYSTEM ARCHITECTURE

4.2 DESIGN METHODOLOGY

The project utilizes various Python libraries.

4.2.1 OpenCV

“OpenCV (Open Source Computer Vision Library) is an open source computer vision
and machine learning software library”. It was built to provide a common infrastructure
for computer vision applications and to accelerate the use of machine perception in
commercial products. “Being a BSD-licensed product, OpenCV makes it straightforward
for businesses to utilize and modify the code”. The library has more than 2,500 optimized
algorithms: a comprehensive set comprising both classic and state-of-the-art computer
vision and machine learning algorithms. These algorithms are commonly used to detect
and recognize faces, identify objects, classify human actions in videos, track camera
movements, track moving objects, extract 3D models of objects, produce 3D point clouds
from stereo cameras, stitch images together to produce a high-resolution image of a full
scene, find similar images from an image database, remove red eyes from images taken
using flash, follow eye movements, and recognize scenery and establish markers to
overlay it with augmented reality. The library has a user community of more than 47
thousand people and an estimated number of downloads exceeding 18 million. It is used
extensively in companies, research groups and by governmental bodies. Along with
well-established corporations like Google, Yahoo, Microsoft, Intel, IBM, Sony, Honda
and Toyota that use the library, there are many startups such as Applied Minds,
VideoSurf, and Zeitera that make extensive use of OpenCV. OpenCV's deployed uses
span a wide range: from stitching street-view images together, detecting intrusions in
surveillance video in Israel and monitoring mine equipment in China, to helping robots
navigate and pick up objects at Willow Garage, detecting swimming pool drowning
accidents in Europe, running interactive art in Spain and New York, checking runways
for debris in Turkey, inspecting labels on products in factories around the world, and fast
face detection in Japan.

WORKING OF OpenCV

1. READ THE IMAGE
OpenCV helps to read the image from a file or directly from a camera to make it
accessible for further processing.
2. IMAGE ENHANCEMENT
We can enhance an image by adjusting its brightness, sharpness or contrast. This is used
to improve image quality.
3. OBJECT DETECTION
OpenCV can detect not only the face of a person; it is also capable of detecting other
objects in a scene.
4. IMAGE FILTERING
You can change the image by applying various filters, such as blurring or sharpening.
5. DRAWING ON THE IMAGE
OpenCV allows drawing text, lines and shapes on the image.
6. SAVING THE CHANGED IMAGE
After processing, you can save the modified images for future work and analysis. A
minimal end-to-end sketch of these steps appears below.
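
A minimal sketch tying the steps above together (step 3, object detection, is shown separately with the Haar cascade example in Section 1.2). The file names and adjustment values are placeholders, not the project's actual settings.

import cv2

img = cv2.imread("photo.jpg")                          # 1. read the image
img = cv2.convertScaleAbs(img, alpha=1.2, beta=20)     # 2. enhance brightness/contrast
img = cv2.GaussianBlur(img, (5, 5), 0)                 # 4. filter (blur)
cv2.putText(img, "processed", (10, 30),
            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)   # 5. draw text
cv2.imwrite("photo_out.jpg", img)                      # 6. save the modified image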

4.2.2 Dlib

Dlib is a popular toolkit for machine learning that is used primarily for computer
vision and image processing tasks, such as face recognition, facial landmark detection,
and more.
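
As a hedged illustration of a typical dlib workflow, the sketch below detects a face and draws its 68 facial landmarks. The shape predictor model file must be downloaded separately from dlib.net, and the file names are placeholders.

import dlib
import cv2

detector = dlib.get_frontal_face_detector()
# 68-point landmark model, downloaded separately from dlib.net
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = cv2.imread("face.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
for face in detector(gray):                    # one rectangle per detected face
    landmarks = predictor(gray, face)
    for i in range(68):                        # mark each landmark point
        p = landmarks.part(i)
        cv2.circle(img, (p.x, p.y), 2, (0, 0, 255), -1)
cv2.imwrite("landmarks.jpg", img)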

4.2.3 CMake

CMake is used to control the software compilation process using simple platform-
and compiler-independent configuration files, and to generate native makefiles and
workspaces that can be used in the compiler environment of your choice.

4.3 ALGORITHM

4.3.1 Convolutional Neural Network (CNN)

Introduction

In the past few decades, Deep Learning has proved to be a very powerful tool because
of its ability to handle large amounts of data. Interest in using hidden layers has surpassed
that in traditional techniques, especially in pattern recognition. One of the most popular
deep neural networks in deep learning is the Convolutional Neural Network (also known
as CNN or ConvNet), especially when it comes to Computer Vision applications.

Figure 4.1 : Sample of CNN

Since the 1950s, the early days of AI, researchers have struggled to make a system that
can understand visual data. In the following years, this field came to be known as
Computer Vision. In 2012, computer vision took a quantum leap when a group of
researchers from the University of Toronto developed an AI model that surpassed the
best image recognition algorithms, and that too by a large margin.

The AI system, which became known as AlexNet (named after its main creator, Alex
Krizhevsky), won the 2012 ImageNet computer vision contest with an amazing 85
percent accuracy. The runner-up scored a modest 74 percent on the test.

At the heart of AlexNet was the Convolutional Neural Network, a special type of neural
network that roughly imitates human vision. Over the years CNNs have become a very
important part of many Computer Vision applications, and hence a part of any computer
vision course online. So let's take a look at the workings of CNNs, or the CNN algorithm
in deep learning.

4.3.1 What Are Convolutional Neural Networks (CNNs)?

In deep learning, a convolutional neural network (CNN/ConvNet) is a class of deep
neural networks most commonly applied to analyzing visual imagery. When we think of
a neural network we think of matrix multiplications, but that is not the case with a
ConvNet, which uses a special technique called convolution. In mathematics, convolution
is an operation on two functions that produces a third function expressing how the shape
of one is modified by the other.

Figure 4.2 : Sample CNN 2

4.3.2 How Does CNN work?

Before we get to the workings of convolutional neural networks (CNNs), let's cover the
basics, such as what an image is and how it is represented. An RGB image is nothing but
a matrix of pixel values having three planes, whereas a grayscale image is the same but
has a single plane. Take a look at the image below to understand more.

Figure 4.3 : Working of CNN

For simplicity, let’s stick with grayscale images as we try to understand how CNNs work.

Figure 4.4 : Kernel Matrix

The above image shows what a convolution is. We take a filter/kernel(3×3 matrix) and
apply it to the input image to get the convolved feature. This convolved feature is passed
on to the next layer.

Figure 4.5 : Convolution of Matrix

In the case of an RGB image, the same kernel is slid over each color channel. The sketch below walks through the operation on a grayscale input.
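
A pure-NumPy illustration of the sliding-window dot product just described. The kernel shown is an assumed edge detector, and (as in most deep learning libraries) the code computes cross-correlation, which is what CNNs use in practice.

import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image and take a dot product at each position
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

image = np.random.rand(28, 28)               # stand-in grayscale image
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])       # simple edge-detecting filter
feature_map = convolve2d(image, edge_kernel) # shape (26, 26)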

CNNs use a series of layers, each of which detects different features of an input image.
Depending on the complexity of its intended purpose, a CNN can contain dozens,
hundreds or even thousands of layers, each building on the outputs of previous layers to
recognize detailed patterns .

The process starts by sliding a filter designed to detect certain features over the input
image, a process known as the convolution operation (hence the name "convolutional
neural network"). The result of this process is a feature map that highlights the presence
of the detected features in the image. This feature map then serves as input for the next
layer, enabling a CNN to gradually build a hierarchical representation of the image.

Initial filters usually detect basic features, such as lines or simple textures. Subsequent
layers' filters are more complex, combining the basic features identified earlier on to
recognize more complex patterns. For example, after an initial layer detects the presence
of edges, a deeper layer could use that information to start identifying shapes.

Between these layers, the network takes steps to reduce the spatial dimensions of the
feature maps to improve efficiency and accuracy. In the final layers of a CNN, the model
makes a final decision -- for example, classifying an object in an image -- based on the
output from the previous layers.

4.3.2.1 Unpacking the architecture of a CNN

A CNN typically consists of several layers, which can be broadly categorized into
three groups: convolutional layers, pooling layers and fully connected layers. As data
passes through these layers, the complexity of the CNN increases, which lets the CNN
successively identify larger portions of an image and more abstract features.

Convolutional layer

The convolutional layer is the fundamental building block of a CNN and is where the
majority of computations occur. This layer uses a filter or kernel -- a small matrix of
weights -- to move across the receptive field of an input image to detect the presence of
specific features.

The process begins by sliding the kernel over the image's width and height, eventually
sweeping across the entire image over multiple iterations. At each position, a dot
product is calculated between the kernel's weights and the pixel values of the image under
the kernel. This transforms the input image into a set of feature maps or convolved
features, each of which represents the presence and intensity of a certain feature at
various points in the image.

CNNs often include multiple stacked convolutional layers. Through this layered
architecture, the CNN progressively interprets the visual information contained in the raw
image data. In the earlier layers, the CNN identifies basic features such as edges, textures

or colors. Deeper layers receive input from the feature maps of previous layers, enabling
them to detect more complex patterns, objects and scenes.

Pooling layer

The pooling layer of a CNN is a critical component that follows the convolutional
layer. Similar to the convolutional layer, the pooling layer's operations involve a
sweeping process across the input image, but its function is otherwise different.

The pooling layer aims to reduce the dimensionality of the input data while retaining
critical information, thus improving the network's overall efficiency. This is typically
achieved through down sampling: decreasing the number of data points in the input.

For CNNs, this typically means reducing the number of pixels used to represent the
image. The most common form of pooling is max pooling, which retains the maximum
value within a certain window (i.e., the kernel size) while discarding other values.
Another common technique, known as average pooling, takes a similar approach but uses
the average value instead of the maximum.

Downsampling significantly reduces the overall number of parameters and


computations. In addition to improving efficiency, this strengthens the model's
generalization ability. Less complex models with higher-level features are typically
less prone to overfitting -- a phenomenon that occurs when a model learns noise and
overly specific details in its training data, negatively affecting its ability to generalize to
new, unseen information.

Reducing the spatial size of the representation does have a potential downside, namely
loss of some information. However, learning only the most prominent features of the
input data is usually sufficient for tasks such as object detection and image classification.

Fully connected layer

The fully connected layer plays a critical role in the final stages of a CNN, where it is
responsible for classifying images based on the features extracted in the previous layers.
The term fully connected means that each neuron in one layer is connected to each neuron
in the subsequent layer.

The fully connected layer integrates the various features extracted in the previous
convolutional and pooling layers and maps them to specific classes or outcomes. Each
input from the previous layer connects to each activation unit in the fully connected layer,
enabling the CNN to simultaneously consider all features when making a final
classification decision.

Not all layers in a CNN are fully connected. Because fully connected layers have
many parameters, applying this approach throughout the entire network would create
unnecessary density, increase the risk of overfitting and make the network very expensive
to train in terms of memory and compute. Limiting the number of fully connected layers
balances computational efficiency and generalization ability with the capability to learn
complex patterns .

4.3.3 CNNs vs. Traditional Neural Networks

A more traditional form of neural networks, known as multilayer perceptrons, consists


entirely of fully connected layers. These neural networks, while versatile, are not
optimized for spatial data like images. This can create a number of problems when using
them to handle larger, more complex input data.

For a smaller image with fewer color channels, a traditional neural network might
produce satisfactory results. But as image size and complexity increase, so does the
amount of computational resources required. Another major issue is the tendency to
overfit, as fully connected architectures do not automatically prioritize the most relevant
features and are more likely to learn noise and other irrelevant information .

CNNs differ from traditional neural networks in a few key ways. Importantly, in a
CNN, not every node in a layer is connected to each node in the next layer. Because their
convolutional layers have fewer parameters compared with the fully connected layers of a
traditional neural network, CNNs perform more efficiently on image processing tasks.

CNNs use a technique known as parameter sharing that makes them much more
efficient at handling image data. In the convolutional layers, the same filter -- with fixed
weights -- is used to scan the entire image, drastically reducing the number of parameters
compared to a fully connected layer of a traditional neural network. The pooling layers
further reduce the dimensionality of the data to improve a CNN's overall efficiency and
generalizability.

Convolutional vs. recurrent neural networks

Recurrent neural networks (RNNs) are a type of deep learning algorithm designed to
process sequential or time series data. RNNs are well suited for use in natural language
processing (NLP), language translation, speech recognition and image captioning, where
the temporal sequence of data is particularly important. CNNs, in contrast, are primarily
specialized for processing spatial data, such as images. They excel at image-related tasks
such as image recognition, object classification and pattern recognition.

4.3.4 Benefits of using CNNs for deep learning

Deep learning, a subcategory of machine learning, uses multilayered neural networks


that offer several benefits over simpler single-layer networks. Both RNNs and CNNs are
forms of deep learning algorithms.

CNNs are especially useful for computer vision tasks such as image recognition and
classification because they are designed to learn the spatial hierarchies of features by
capturing essential features in early layers and complex patterns in deeper layers. One of
the most significant advantages of CNNs is their ability to perform automatic feature
extraction or feature learning. This eliminates the need to extract features manually,
historically a labor-intensive and complex process.

CNNs are also well suited for transfer learning, in which a pretrained model is fine-
tuned for new tasks. This reusability makes CNNs versatile and efficient, particularly for
tasks with limited training data. Building on preexisting networks enables machine
learning developers to deploy CNNs in various real-world scenarios while minimizing
computational costs.

As described above, CNNs are more computationally efficient than traditional fully
connected neural networks thanks to their use of parameter sharing. Due to their
streamlined architecture, CNNs can be deployed on a wide range of devices, including
mobile devices such as smartphones, and in edge computing scenarios.

4.3.5 Applications of Convolutional Neural Networks

Because processing and interpreting visual data is such a common task, CNNs have a
wide range of real-world applications, from healthcare and automotive to social
media and retail.

Some of the most common fields in which CNNs are used include the following:

Healthcare. In the healthcare sector, CNNs are used to assist in medical diagnostics and
imaging. For example, a CNN could analyze medical images such as X-rays or pathology
slides to detect anomalies indicative of disease, thereby aiding in diagnosis and treatment
planning.

Automotive. The automotive industry uses CNNs in self-driving cars that navigate their
environments by interpreting camera and sensor data. CNNs are also useful in AI-
powered features of nonautonomous vehicles, such as automated cruise control and
parking assistance.

Social media. On social media platforms, CNNs are employed in a range of image
analysis tasks. For example, a social media company might use a CNN to suggest people
to tag in photographs or to flag potentially offensive images for moderation.

Retail. E-commerce retailers use CNNs in visual search systems that let users search for
products using images rather than text. Online retailers can also use CNNs to improve
their recommender systems by identifying products that visually resemble those a
shopper has shown interest in.

4.3.6 Artificial Neurons

Artificial neurons, a rough imitation of their biological counterparts, are
mathematical functions that calculate the weighted sum of multiple inputs and output an
activation value. When you input an image into a ConvNet, each layer generates several
activation maps that are passed on to the next layer.

The first layer usually extracts basic features such as horizontal or diagonal edges. This
output is passed on to the next layer which detects more complex features such as corners
or combinational edges. As we move deeper into the network it can identify even more
complex features such as objects, faces, etc.

Figure 4.6 : Artificial neurons
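
A minimal sketch of such a neuron, with an assumed ReLU activation (the activation function is the author's illustrative choice, not prescribed by the text):

import numpy as np

def neuron(inputs, weights, bias):
    z = np.dot(weights, inputs) + bias   # weighted sum of the inputs
    return max(0.0, z)                   # ReLU activation value

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, 0.9])
print(neuron(x, w, bias=0.1))            # prints approximately 1.8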

Based on the activation map of the final convolution layer, the classification layer
outputs a set of confidence scores (values between 0 and 1) that specify how likely the
image is to belong to a “class.” For instance, if you have a ConvNet that detects cats,
dogs, and horses, the output of the final layer is the possibility that the input image
contains any of those animals.

Figure 4.7 : Object Detection

4.3.7 Background of Convolutional Neural Networks (CNNs)

CNNs were first developed and used around the 1980s. The most that a
Convolutional Neural Network (CNN) could do at that time was recognize handwritten
digits. CNNs were mostly used in the postal sector to read zip codes, pin codes, etc. The
important thing to remember about any deep learning model is that it requires a large
amount of data to train as well as a lot of computing resources. This was a major
drawback for CNNs at that period, and hence CNNs remained limited to the postal sector
and failed to enter the wider world of machine learning.

In 2012 Alex Krizhevsky realized that it was time to bring back the branch of deep
learning that uses multi-layered neural networks. The availability of large sets of data, to
be more specific ImageNet datasets with millions of labeled images and an abundance of
computing resources enabled researchers to revive CNNs .

4.3.8 What Is a Pooling Layer?

Similar to the convolutional layer, the pooling layer is responsible for reducing the
spatial size of the convolved feature. This decreases the computational power required to
process the data by reducing the dimensionality. There are two types of pooling: average
pooling and max pooling.

Figure 4.8 : Pooling Layer


In max pooling, we find the maximum value of a pixel from the
portion of the image covered by the kernel. Max pooling also acts as a noise
suppressant: it discards the noisy activations altogether, performing de-noising
along with dimensionality reduction.

On the other hand, average pooling returns the average of all the values from the
portion of the image covered by the kernel; it performs dimensionality reduction with
only a mild noise-suppressing effect. Hence, we can say that max pooling often performs
better than average pooling. A small worked example follows the figure below.

Figure 4.9 : Max and Average Pooling
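
The worked example below implements 2x2 max and average pooling with stride 2 in plain NumPy; the feature map values are made up for illustration.

import numpy as np

def pool2x2(feature_map, mode="max"):
    # Split the map into non-overlapping 2x2 blocks, then reduce each block
    h, w = feature_map.shape
    blocks = feature_map[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))          # average pooling

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 0],
               [7, 2, 9, 3],
               [0, 1, 4, 8]], dtype=float)
print(pool2x2(fm, "max"))   # [[6. 4.] [7. 9.]]
print(pool2x2(fm, "avg"))   # [[3.75 1.75] [2.5  6.  ]]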

4.3.9 Limitations of Convolutional neural networks (CNNs)

Despite their power and resource complexity, CNNs provide in-depth results. At
the root of it all, a CNN is just recognizing patterns and details so minute and
inconspicuous that they go unnoticed by the human eye. But when it comes
to understanding the contents of an image, it fails.

Let's take a look at an example. When we pass the below image to a CNN, it detects a
person in their mid-30s and a child probably around 10 years old. But when we look at
the same image, we start thinking of multiple different scenarios. Maybe it's a father and
son day out, a picnic, or maybe they are camping. Maybe it is a school ground where the
child scored a goal and his dad lifted him up in celebration.

Figure 4.10 : Limitations of CNN
These limitations become more than evident in practical applications. For example,
CNNs are widely used to moderate content on social media, yet despite the vast
archives of images and videos they were trained on, they still fail to completely block
and remove inappropriate content. Notoriously, Facebook's filters flagged a 30,000-
year-old statue for nudity.

Several studies have shown that CNNs trained on ImageNet and other popular datasets
fail to detect objects when they see them under different lighting conditions and from
new angles.

Does this mean that CNNs are useless? Despite their limits, there is no denying that
convolutional neural networks have sparked a revolution in artificial intelligence.
Today, CNNs are used in many computer vision applications such as facial
recognition, image search and editing, and augmented reality. As advances in
ConvNets show, our achievements are remarkable and useful, but we are still very far
from replicating the key components of human intelligence.

4.4.1 FACE DETECTION SYSTEM BASED ON REGION-BASED
CONVOLUTIONAL NEURAL NETWORK (R-CNN)

R-CNN, or Region-based Convolutional Neural Network, was introduced by Ross
Girshick in 2014. It is a method for object detection that uses a convolutional neural
network (CNN) to classify object proposals, or regions of interest (ROIs), within an
image. R-CNN is a two-stage object detection pipeline: it first generates a set of ROIs
using a method such as selective search or edge boxes, and then classifies the objects
within these ROIs using a CNN.

Computer vision is an interdisciplinary field that has been gaining huge amounts of
traction in recent years (since the resurgence of CNNs), with self-driving cars taking
centre stage. Another integral part of computer vision is object detection, which aids in
pose estimation, vehicle detection, surveillance, and more. The difference between
object detection algorithms and classification algorithms is that detection algorithms
draw a bounding box around each object of interest to locate it within the image.
Moreover, there may be many bounding boxes, one per object of interest, and you do
not know how many beforehand.

In recent years, different architectures and models of ANN have been used for face
detection. Rowley, Baluja and Kanade presented a face detection system based on a
retinally connected neural network that examines small windows of an image to decide
whether each window contains a face. Object detection is crucial in computer vision,
as it involves locating and identifying objects within an image or video. There are
various approaches to object detection, from traditional methods such as Scale
Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF) to more
recent deep learning-based methods such as R-CNN, Fast R-CNN, and Faster R-CNN.

The major reason you cannot tackle this problem by building a standard convolutional
network followed by a fully connected layer is that the length of the output layer is
variable, not constant: the number of occurrences of the objects of interest is not fixed.
A naive approach would be to take many different regions of interest from the image
and use a CNN to classify the presence of the object within each region. The problem
with this approach is that the objects of interest may have different spatial locations
within the image and different aspect ratios, so you would have to select a huge
number of regions, and the computation could blow up. Algorithms like R-CNN and
YOLO were therefore developed to find these occurrences, and to find them fast.

4.4.2 R-CNN

To bypass the problem of selecting a huge number of regions, Ross Girshick et al.
proposed a method that uses selective search to extract just 2000 regions from the
image, which he called region proposals. Instead of classifying a huge number of
regions, you therefore work with only 2000. These 2000 region proposals are
generated using the selective search algorithm outlined below.

Selective Search:
1. Generate an initial sub-segmentation, producing many candidate regions.
2. Use a greedy algorithm to recursively combine similar regions into larger ones.
3. Use the generated regions to produce the final candidate region proposals.
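OpenCV ships an implementation of selective search in its contrib modules, so the proposal stage can be tried directly. The sketch below assumes the opencv-contrib-python package is installed and that a test image exists at the hypothetical path 'test.jpg'.

import cv2

img = cv2.imread('test.jpg')  # hypothetical input image

# Selective search lives in OpenCV's contrib package.
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()  # the 'fast' mode trades recall for speed

rects = ss.process()  # array of (x, y, w, h) candidate boxes
print('Total proposals:', len(rects))

# R-CNN keeps roughly the first 2000 proposals per image.
for (x, y, w, h) in rects[:2000]:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
cv2.imwrite('proposals.jpg', img)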

Figure 4.11 : R-CNN features

These 2000 candidate region proposals are warped into a square and fed into a
convolutional neural network that produces a 4096-dimensional feature vector as
output. The CNN acts as a feature extractor: its dense output layer holds the features
extracted from the image, and these features are fed into an SVM to classify the
presence of the object within the candidate region proposal. In addition to predicting
the presence of an object within the region proposal, the algorithm also predicts four
offset values to increase the precision of the bounding box. For example, given a
region proposal, the algorithm might have predicted the presence of a person, but the
face of that person within the region proposal could have been cut in half; the offset
values help adjust the bounding box of the region proposal.
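A minimal sketch of this per-proposal stage is shown below. It assumes PyTorch and torchvision are available and that a linear SVM has already been trained on such features; AlexNet's penultimate layer is used because it yields exactly the 4096-dimensional vectors described above, and ImageNet mean/std normalization is omitted for brevity.

import cv2
import numpy as np
import torch
import torchvision.models as models

# Pretrained AlexNet truncated after its second fully connected layer (fc7)
# produces the 4096-d feature vector used by R-CNN.
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()
feature_extractor = torch.nn.Sequential(
    alexnet.features, alexnet.avgpool, torch.nn.Flatten(),
    *list(alexnet.classifier.children())[:-1])

def proposal_features(img_bgr, rects):
    # Warp every proposal to a fixed square and extract its features.
    crops = []
    for (x, y, w, h) in rects:
        warped = cv2.resize(img_bgr[y:y + h, x:x + w], (224, 224))
        crops.append(warped[:, :, ::-1] / 255.0)  # BGR -> RGB, scale to [0, 1]
    batch = torch.tensor(np.stack(crops).transpose(0, 3, 1, 2),
                         dtype=torch.float32)
    with torch.no_grad():
        return feature_extractor(batch).numpy()  # shape (len(rects), 4096)

# 'svm' would be a scikit-learn LinearSVC trained beforehand on labeled
# features; per-proposal scores would then be:
# scores = svm.decision_function(proposal_features(img, rects[:2000]))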

4.4.2.1 Problems with R-CNN
● It still takes a huge amount of time to train the network, as you have to classify
2000 region proposals per image.
● It cannot be implemented in real time, as it takes around 47 seconds for each test
image.
● The selective search algorithm is a fixed algorithm, so no learning happens at that
stage; this can lead to the generation of bad candidate region proposals.

4.4.3 Fast R-CNN

Figure 4.12 : Fast R-CNN

The same author of the previous paper (R-CNN) solved some of the drawbacks of
R-CNN to build a faster object detection algorithm, called Fast R-CNN. The approach
is similar to the R-CNN algorithm, but instead of feeding the region proposals to the
CNN, we feed the input image to the CNN to generate a convolutional feature map.
From the convolutional feature map we identify the region proposals, warp them into
squares, and use an RoI pooling layer to reshape them into a fixed size so that they can
be fed into a fully connected layer. From the RoI feature vector, we use a softmax
layer to predict the class of the proposed region as well as the offset values for the
bounding box.

The reason “Fast R-CNN” is faster than R-CNN is that you do not have to feed
2000 region proposals to the convolutional neural network every time. Instead, the
convolution operation is done only once per image, and a feature map is generated from it.
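The RoI pooling step can be tried in isolation with torchvision. In the sketch below the feature map and the single proposal box are invented; the point is that the output size is fixed regardless of the box size, which is what makes the fully connected layer possible.

import torch
from torchvision.ops import roi_pool

# Invented backbone output: batch of 1, 256 channels, 32x32 spatial map.
feature_map = torch.randn(1, 256, 32, 32)

# One proposal in (batch_index, x1, y1, x2, y2) coordinates of the input
# image; spatial_scale maps image coordinates onto the smaller feature map
# (here the image is assumed to be 512x512, so the scale is 32/512).
boxes = torch.tensor([[0, 60.0, 80.0, 300.0, 400.0]])

pooled = roi_pool(feature_map, boxes, output_size=(7, 7),
                  spatial_scale=32 / 512)
print(pooled.shape)  # torch.Size([1, 256, 7, 7]) whatever the box size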

From the training- and testing-time comparisons reported for these models, you can
infer that Fast R-CNN is significantly faster than R-CNN in both training and testing.
However, when you look at Fast R-CNN's performance at test time, including region
proposals slows the algorithm down significantly compared with not using them. The
region proposals therefore become the bottleneck of the Fast R-CNN algorithm,
limiting its performance.

4.4.4 Faster R-CNN

Figure 4.13 : Faster R-CNN

Both of the above algorithms (R-CNN and Fast R-CNN) use selective search to
find the region proposals. Selective search is a slow and time-consuming process that
affects the performance of the network. Shaoqing Ren et al. therefore came up with an
object detection algorithm that eliminates the selective search algorithm and lets the
network learn the region proposals.

Similar to Fast R-CNN, the image is provided as input to a convolutional network,
which produces a convolutional feature map. Instead of running a selective search
algorithm on the feature map to identify the region proposals, a separate network is
used to predict them. The predicted region proposals are then reshaped using an RoI
pooling layer, which is used to classify the image within each proposed region and
predict the offset values for the bounding boxes.

From the speed comparisons, you can see that Faster R-CNN is much faster than its
predecessors. Therefore, it can even be used for real-time object detection.
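Because Faster R-CNN is bundled with torchvision, detection can be tried in a few lines. The sketch below assumes a torchvision version that provides pretrained weights and a hypothetical image file 'people.jpg'.

import torch
from PIL import Image
from torchvision.models.detection import (fasterrcnn_resnet50_fpn,
                                          FasterRCNN_ResNet50_FPN_Weights)
from torchvision.transforms.functional import to_tensor

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()

img = to_tensor(Image.open('people.jpg').convert('RGB'))  # hypothetical file
with torch.no_grad():
    prediction = model([img])[0]  # one dict per input image

# Keep only the detections the model is reasonably confident about.
for box, label, score in zip(prediction['boxes'], prediction['labels'],
                             prediction['scores']):
    if score > 0.8:
        print(weights.meta['categories'][label], box.tolist(), float(score))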

4.4.5 YOLO — You Only Look Once

All of the previous object detection algorithms use regions to localize the object
within the image: the network does not look at the complete image, but only at the
parts of the image that have a high probability of containing an object. YOLO, or You
Only Look Once, is an object detection algorithm quite different from the region-based
algorithms seen above. In YOLO, a single convolutional network predicts the
bounding boxes and the class probabilities for these boxes.

Figure 4.14 : YOLO

YOLO works by splitting the image into an SxS grid and taking m bounding boxes
within each grid cell. For each bounding box, the network outputs a class probability
and offset values for the box. The bounding boxes with a class probability above a
threshold value are selected and used to locate the object within the image.

YOLO is orders of magnitude faster (45 frames per second) than other object
detection algorithms. Its limitation is that it struggles with small objects within the
image; for example, it might have difficulty detecting a flock of birds. This is due to
the spatial constraints of the algorithm.
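The per-cell selection step can be made concrete with a small NumPy sketch; the grid size, box count, and prediction values below are all invented purely to show how boxes above the threshold are kept.

import numpy as np

S, m = 7, 2          # 7x7 grid with 2 boxes per cell (invented sizes)
threshold = 0.5

# Invented network output: for each cell and box, (x, y, w, h, confidence),
# with coordinates expressed relative to the image.
rng = np.random.default_rng(0)
predictions = rng.random((S, S, m, 5))

detections = []
for row in range(S):
    for col in range(S):
        for box in range(m):
            x, y, w, h, conf = predictions[row, col, box]
            if conf > threshold:  # keep only confident boxes
                detections.append((x, y, w, h, conf))

print(len(detections), 'boxes survive the threshold')
# A real pipeline would follow this with non-maximum suppression (NMS)
# to merge overlapping boxes.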

The R-CNN pipeline used in this system can be divided into three main steps:

1. Region proposal: a method such as selective search or edge boxes generates a set
of ROIs within the image. Bounding boxes around the objects of interest typically
define these ROIs.
2. Feature extraction: a CNN is used to extract features from each ROI. These
features then represent the ROI in a compact and informative manner.
3. Classification: the extracted features are fed into a classifier, such as a support
vector machine (SVM), to predict the object's class within the ROI.

4.4.6 Advantages:
∙ This method produces good detection rates (77.9% and 90.3%) with an acceptable
number of false positives.
∙ Depending on the application, the system can be made more or less conservative by
varying the arbitration heuristics or thresholds used.
∙ The same algorithm has also been applied to the detection of car tyres and human eyes.

CHAPTER 5

CONCLUSION

5.1 CONCLUSION

Facial detection and recognition systems are gaining a lot of popularity these
days: most flagship smartphones from the major manufacturers use face recognition to
grant the user access. This project report explains the implementation of face detection
and face recognition using OpenCV with Python, and also lays out the basic
information needed to develop face detection and face recognition software. The goal
of increasing this project's accuracy will remain constant, and new configurations and
different algorithms will be tested to obtain better results. In this project, the approach
we used was that of Local Binary Pattern Histograms (LBPH), which are part of the
FaceRecognizer class of OpenCV.
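For reference, a minimal LBPH training-and-prediction sketch is shown below. It assumes opencv-contrib-python (which provides the cv2.face module); the image paths and the label numbering are invented for illustration.

import cv2
import numpy as np

# The LBPH recognizer lives in OpenCV's contrib package (cv2.face).
recognizer = cv2.face.LBPHFaceRecognizer_create()

# Hypothetical grayscale face crops, all the same size, with integer labels.
faces = [cv2.imread(p, cv2.IMREAD_GRAYSCALE)
         for p in ['db/alice_0.png', 'db/alice_1.png', 'db/bob_0.png']]
labels = np.array([0, 0, 1], dtype=np.int32)  # 0 = alice, 1 = bob

recognizer.train(faces, labels)

# Predict on a new face crop; a lower distance means a closer match.
probe = cv2.imread('probe.png', cv2.IMREAD_GRAYSCALE)  # hypothetical file
label, distance = recognizer.predict(probe)
print('predicted label:', label, 'distance:', distance)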

5.2 SCOPE FOR FUTURE WORK

• Government/Identity Management: Governments all around the world use face
recognition systems to identify civilians. America has one of the largest face databases
in the world, containing data on about 117 million people.

• Emotion & Sentiment Analysis: Face detection and recognition have brought us
closer to automated psyche evaluation, as systems can now judge precise emotions
frame by frame.

• Authentication Systems: Various devices, from mobile phones to ATMs, work
using facial recognition, making access and verification quicker and hassle-free.

• Full Automation: This technology helps us become fully automated, as little to no
effort is required for verification using facial recognition.

• High Accuracy: Face detection and recognition systems have reached very high
accuracy, can be trained on very small datasets, and their false acceptance rates have
dropped significantly.

REFERENCES

[1] A. Özdil and M. M. Özbilen, "A Survey on Comparison of Face Recognition
Algorithms," IEEE, 2015.

[2] K. Kim, "Intelligent Immigration Control System by Using Passport Recognition
and Face Verification," in International Symposium on Neural Networks, Chongqing,
China, 2005, pp. 147-156.

[3] J. N. K. Liu, M. Wang, and B. Feng, "iBotGuard: an Internet-based intelligent
robot security system using invariant face recognition against intruder," IEEE
Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews,
Vol. 35, pp. 97-105, 2005.

[4] T. Choudhry, B. Clarkson, T. Jebara, and A. Pentland, "Multimodal person
recognition using unconstrained audio and video," in Proceedings, International
Conference on Audio- and Video-Based Person Authentication, 1999, pp. 176-181.

[5] J.-H. Lee and W.-Y. Kim, "Video Summarization and Retrieval System Using
Face Recognition and MPEG-7 Descriptors," in Image and Video Retrieval, Vol. 3115.

Appendix I (Coding)

Util.py

import os
import pickle

import tkinter as tk
from tkinter import messagebox
import face_recognition


def get_button(window, text, color, command, fg='white'):
    # Uniformly styled button used by the main and registration windows.
    button = tk.Button(
        window,
        text=text,
        activebackground="black",
        activeforeground="white",
        fg=fg,
        bg=color,
        command=command,
        height=2,
        width=20,
        font=('Helvetica bold', 20)
    )
    return button


def get_img_label(window):
    # Label that holds the live webcam frame.
    label = tk.Label(window)
    label.grid(row=0, column=0)
    return label


def get_text_label(window, text):
    label = tk.Label(window, text=text)
    label.config(font=("sans-serif", 21), justify="left")
    return label


def get_entry_text(window):
    # Text box used to type the new user's name during registration.
    inputtxt = tk.Text(window, height=2, width=15, font=("Arial", 32))
    return inputtxt


def msg_box(title, description):
    messagebox.showinfo(title, description)


def recognize(img, db_path):
    # It is assumed there will be at most one match in the database.
    embeddings_unknown = face_recognition.face_encodings(img)
    if len(embeddings_unknown) == 0:
        return 'no_persons_found'
    embeddings_unknown = embeddings_unknown[0]

    db_dir = sorted(os.listdir(db_path))

    match = False
    j = 0
    while not match and j < len(db_dir):
        path_ = os.path.join(db_path, db_dir[j])
        with open(path_, 'rb') as file:
            embeddings = pickle.load(file)
        match = face_recognition.compare_faces([embeddings],
                                               embeddings_unknown)[0]
        j += 1

    if match:
        # Strip the '.pickle' extension (7 characters) to recover the username.
        return db_dir[j - 1][:-7]
    return 'unknown_person'

Main.py

import os.path
import datetime
import pickle

import tkinter as tk
import cv2
from PIL import Image, ImageTk
import face_recognition

import util


class App:
    def __init__(self):
        self.main_window = tk.Tk()
        self.main_window.geometry("1200x520+350+100")

        self.login_button_main_window = util.get_button(
            self.main_window, 'login', 'green', self.login)
        self.login_button_main_window.place(x=750, y=200)

        self.logout_button_main_window = util.get_button(
            self.main_window, 'logout', 'red', self.logout)
        self.logout_button_main_window.place(x=750, y=300)

        self.register_new_user_button_main_window = util.get_button(
            self.main_window, 'register new user', 'gray',
            self.register_new_user, fg='black')
        self.register_new_user_button_main_window.place(x=750, y=400)

        self.webcam_label = util.get_img_label(self.main_window)
        self.webcam_label.place(x=10, y=0, width=700, height=500)
        self.add_webcam(self.webcam_label)

        self.db_dir = './db'
        if not os.path.exists(self.db_dir):
            os.mkdir(self.db_dir)
        self.log_path = './log.txt'

    def add_webcam(self, label):
        # Open the webcam once and start the frame-update loop.
        if 'cap' not in self.__dict__:
            self.cap = cv2.VideoCapture(0)
        self._label = label
        self.process_webcam()

    def process_webcam(self):
        # Grab a frame, convert BGR -> RGB, and show it on the label.
        ret, frame = self.cap.read()
        self.most_recent_capture_arr = frame
        img_ = cv2.cvtColor(self.most_recent_capture_arr, cv2.COLOR_BGR2RGB)
        self.most_recent_capture_pil = Image.fromarray(img_)
        imgtk = ImageTk.PhotoImage(image=self.most_recent_capture_pil)
        self._label.imgtk = imgtk
        self._label.configure(image=imgtk)
        self._label.after(20, self.process_webcam)

    def login(self):
        name = util.recognize(self.most_recent_capture_arr, self.db_dir)
        if name in ['unknown_person', 'no_persons_found']:
            util.msg_box('Ups...',
                         'Unknown user. Please register new user or try again.')
        else:
            util.msg_box('Welcome back !', 'Welcome, {}.'.format(name))
            with open(self.log_path, 'a') as f:
                f.write('{},{},logged in\n'.format(name,
                                                   datetime.datetime.now()))

    def logout(self):
        name = util.recognize(self.most_recent_capture_arr, self.db_dir)
        if name in ['unknown_person', 'no_persons_found']:
            util.msg_box('Ups...',
                         'Unknown user. Please register new user or try again.')
        else:
            util.msg_box('Goodbye !', 'See you soon, {}.'.format(name))
            with open(self.log_path, 'a') as f:
                f.write('{},{},logged out\n'.format(name,
                                                    datetime.datetime.now()))

    def register_new_user(self):
        self.register_new_user_window = tk.Toplevel(self.main_window)
        self.register_new_user_window.geometry("1200x520+370+120")

        self.accept_button_register_new_user_window = util.get_button(
            self.register_new_user_window, 'Accept', 'green',
            self.accept_register_new_user)
        self.accept_button_register_new_user_window.place(x=750, y=300)

        self.try_again_button_register_new_user_window = util.get_button(
            self.register_new_user_window, 'Try again', 'red',
            self.try_again_register_new_user)
        self.try_again_button_register_new_user_window.place(x=750, y=400)

        self.capture_label = util.get_img_label(self.register_new_user_window)
        self.capture_label.place(x=10, y=0, width=700, height=500)
        self.add_img_to_label(self.capture_label)

        self.entry_text_register_new_user = util.get_entry_text(
            self.register_new_user_window)
        self.entry_text_register_new_user.place(x=750, y=150)

        self.text_label_register_new_user = util.get_text_label(
            self.register_new_user_window, 'Please, \ninput username:')
        self.text_label_register_new_user.place(x=750, y=70)

    def try_again_register_new_user(self):
        self.register_new_user_window.destroy()

    def add_img_to_label(self, label):
        # Freeze the most recent webcam frame for registration.
        imgtk = ImageTk.PhotoImage(image=self.most_recent_capture_pil)
        label.imgtk = imgtk
        label.configure(image=imgtk)
        self.register_new_user_capture = self.most_recent_capture_arr.copy()

    def start(self):
        self.main_window.mainloop()

    def accept_register_new_user(self):
        # Compute and store the face embedding for the typed username.
        name = self.entry_text_register_new_user.get(1.0, "end-1c")
        embeddings = face_recognition.face_encodings(
            self.register_new_user_capture)[0]
        with open(os.path.join(self.db_dir,
                               '{}.pickle'.format(name)), 'wb') as file:
            pickle.dump(embeddings, file)
        util.msg_box('Success!', 'User was registered successfully !')
        self.register_new_user_window.destroy()


if __name__ == "__main__":
    app = App()
    app.start()

Appendix II (Screenshots)

Appendix III (Copyright)

COPYRIGHT PARTICULARS

CONTACT INFORMATION
1. Details of the Applicant:
Name (Official Designation & Address with Phone No): Dr. N. Palanivel, M.E., Ph.D.,
Associate Professor, Manakula Vinayagar Institute of Technology,
Kalitheerthalkuppam, Puducherry 605107
Age: 48
Experience: 17 years
Citizenship: India
Mobile No: 9976167499

2. Communication Details:
Name: Sathish S
Address: No.25, 5th cross, Sakthi nagar, Saram, Puducherry -13.
Telephone No: 9344082030
Mobile No:9344082030
E-mail Id: [email protected]

I. INFORMATION FOR COPYRIGHT FILING
1. Type of creation (Please tick )
o Artistic Works (Poster)
o Musical Works
o Literature Works
o Dramatic Works
o Books
o App
o Coding/Software

Annexures Required:
1. For App, Coding/Software: object code and source code are required. These have to
be given on 2 CDs/2 pen drives for filing.
2. For Books, Artistic Works (Posters), Literature Works: 2 hard copies have to be
given for filing.
3. For Musical, Dramatic Works:

2. Provide a brief description of the functionality/use of your creation. (The content
given below is only for illustration purposes. Kindly insert your content here.)

In today's digital era, the need for robust yet user-friendly authentication methods is
paramount. This paper proposes a novel face recognition-based authentication system tailored for
login applications. Leveraging the advancements in deep learning and computer vision, the
proposed system offers a seamless and secure login experience for users across various
platforms and devices.

3. Whether the work is published or unpublished: UNPUBLISHED

Signature of the Inventor(s) (add Inventors if needed)


1st Author/Inventor
Name: Arunpandian C
Date:
Sign:

2nd Author/Inventor
Name : Sathish S
Date:
Sign:

3rd Author/Inventor

Name: Pradeep R
Date:
Sign:

4th Author/Inventor
Name: Karkatesh S P
Date:
Sign:

DESIGN PARTICULARS

Project Title: User Login System using Face Recognition

Inventor Details:
Inventor 1: S. Sathish (Student), Phone No: 9344082030, Mail ID: [email protected]

Inventor 2: C. Arunpandian (Student), Phone No: 8072396488, Mail ID: [email protected]

Inventor 3: S. P. Karkatesh (Student), Phone No: 9363332168, Mail ID: [email protected]

Inventor 4: R. Pradeep (Student), Phone No: 9025330797, Mail ID: [email protected]
Signature of the Inventors
Inventor 1 : Inventor 2 Inventor 3 Inventor 4

INFORMATION FOR DESIGN FILING


Abstract: (Minimum 750 Words)

Face recognition technology has emerged as a promising solution for authentication in
various domains, offering a secure, convenient, and user-friendly alternative to traditional password-based
systems. This abstract presents a comprehensive overview of a face recognition login project, detailing its
methodology, benefits, challenges, and potential applications.

Introduction:
In today's digital age, the need for robust authentication mechanisms to protect sensitive information and
secure access to systems has become paramount. Traditional password-based authentication methods are
prone to vulnerabilities such as password theft, phishing attacks, and user forgetfulness. As a result, there
is a growing demand for more secure and user-friendly authentication solutions. Face recognition
technology, which leverages biometric data unique to each individual, has gained traction as a promising
authentication method due to its inherent security and convenience.

Methodology:
The face recognition login project utilizes state-of-the-art computer vision algorithms and machine
learning techniques to authenticate users based on their facial features. The methodology involves several
key steps:

1. Data Collection: A diverse dataset of facial images is collected, encompassing various facial
expressions, lighting conditions, and angles to ensure robust performance.

2. Preprocessing: The collected facial images are preprocessed to enhance their quality and consistency,
including tasks such as normalization, alignment, and resizing.

3. Feature Extraction: Distinctive features are extracted from the preprocessed facial images using
techniques such as Principal Component Analysis (PCA), Local Binary Patterns (LBP), or Convolutional
Neural Networks (CNNs).

4. Model Training: A machine learning model, such as a Support Vector Machine (SVM) or a Deep
Learning model, is trained using the extracted features to recognize faces and associate them with specific
users.

5. Deployment: The trained face recognition model is deployed within the login system or application,
where users interact with it through a camera interface.

6. Authentication: When a user attempts to log in, their facial image is captured by the system's camera
and processed by the deployed face recognition model. The model compares the extracted features from
the captured image with those stored in its database to determine the user's identity.

Challenges:
Despite its numerous benefits, the face recognition login project also faces several challenges:

1. Privacy Concerns: The collection and storage of biometric data raise privacy concerns, as facial images
are inherently personal and sensitive information.

2. Algorithm Bias: Face recognition algorithms may exhibit bias, leading to inaccurate or unfair results,
particularly for individuals from underrepresented demographics.

3. Security Vulnerabilities: Like any technology, face recognition systems are susceptible to security
vulnerabilities, including spoofing attacks, adversarial examples, and data breaches.

4. Ethical Considerations: The ethical implications of face recognition technology, such as potential
misuse or abuse, require careful consideration and regulation to ensure responsible deployment.

Potential Applications:
The face recognition login project has a wide range of potential applications across various industries and
domains, including:

1. Mobile Devices: Integrating face recognition into smartphones and tablets for secure unlocking and
authentication of mobile applications.

2. Banking and Finance: Enhancing security for online banking and financial transactions by using face
recognition for user authentication.

3. Healthcare: Securing access to electronic health records and medical devices using face recognition to
authenticate healthcare professionals.

4. Retail: Implementing face recognition for personalized customer experiences and secure access to
loyalty programs and payment systems.

Conclusion:
The face recognition login project represents an innovative approach to authentication, leveraging
biometric technology to enhance security, convenience, and user experience. While facing challenges
such as privacy concerns and algorithm bias, the project holds immense potential for revolutionizing
authentication across various industries and domains. With careful consideration of ethical and regulatory
implications, face recognition technology can pave the way for a more secure and accessible digital
future.

Application: (Minimum 500 Words)

The face recognition login project has numerous applications across various industries and
domains, including:

1. Smartphones and Tablets: Face recognition can be used to unlock devices and authorize payments.

2. Computers and Laptops: Face recognition can replace traditional password-based login methods
on computers and laptops, providing a more secure and user-friendly authentication experience.

3. Access Control Systems: Face recognition can be integrated into access control systems for
buildings, offices, and secure facilities, allowing authorized personnel to gain entry based on facial
recognition.

4. Banking and Finance: Face recognition can enhance security in banking and finance
applications by verifying the identity of customers for online banking, ATM transactions, and
account access.

5. Education: Face recognition can be used in educational institutions for attendance tracking,
access control to campus facilities, and secure authentication for online learning platforms.

6. Transportation: Face recognition can improve security and streamline passenger experiences in
transportation hubs such as airports, train stations, and bus terminals by verifying passenger
identities for boarding and ticketing.

7. Government Services: Face recognition can be deployed in government services for identity
verification in areas such as passport control, border security, voter registration, and social welfare
programs.

These are just a few examples of the diverse applications of the face recognition login project,
demonstrating its versatility and potential to enhance security, convenience, and efficiency across
various industries and sectors.

Advantages: (Minimum 250 Words)

The face recognition login project offers several advantages over traditional authentication methods:

1. Enhanced Security: Face recognition provides a higher level of security compared to passwords.

2. Convenience: Users no longer need to remember complex passwords or carry physical tokens;
they can simply use their face to authenticate. This offers a more seamless and user-friendly login
experience, saving time and reducing frustration.

3. Accessibility: Face recognition can be particularly beneficial for users with disabilities who may
have difficulty typing passwords or using traditional authentication methods. It promotes inclusivity
by offering an alternative authentication solution that is easier to use for a wider range of
individuals.

4. Reduced Fraud: Since facial features are difficult to forge or replicate, face recognition helps
reduce instances of fraud and impersonation. This is especially important in sectors such as banking,
e-commerce, and healthcare where secure authentication is crucial.

5. Scalability: The face recognition login project can be easily integrated into existing systems and
applications, making it scalable and adaptable to various use cases and environments. It can be
deployed across different platforms, devices, and industries with minimal effort.

6. User Experience: By eliminating the need for passwords or PINs, face recognition improves the
overall user experience by simplifying the authentication process. This can lead to higher user
satisfaction and retention rates.

Overall, the face recognition login project offers a secure, convenient, and user-friendly solution.

Appendix IV (Expo Image)

