
Parallel Computer For Face Recognition Using Artificial Intelligence
Bashini Balachandran, Kazi Farzana Saad, Ketu Patel and Nagi Mekhiel
Department of Electrical, Computer and Biomedical Engineering
Ryerson University
Toronto, ON, CANADA M5B 2K3
Email: [email protected]

Abstract—We implemented a facial recognition application with AI. We used the VGGFace model for our neural net to identify faces. The application includes training and recognizing: the training part adds new faces to our system, while the recognizing part determines the identity of a face. The application runs on multiple cores and is able to scale with different numbers of cores. The implementation of parallelism uses tensorflow. For performance measurements, we used the Task Manager application found in Windows, with the special option known as 'affinity' to choose the number of cores that run the application. The results show that our system scales in performance with the number of processors up to twelve.

I. INTRODUCTION

In recent years, there has been extraordinary growth in the field of artificial intelligence. Many great applications have been created on the foundation of neural nets and other learning techniques. Applications such as Google's famous assistant, Google Now, use machine learning to accomplish tasks that would otherwise be impossible to run. Another example is Tesla: the backbone behind their self-driving cars is based on AI. With this rise of AI, there has also been a rise in the need for more processing power to fuel this innovation.

Companies like Nvidia have been key players in moving the AI trend forward with their GPUs and rich software libraries. Moore's law has been slowing down, and there has been little increase in the IPC of CPU architectures. Our objective is to find better ways to use current CPU hardware to execute AI workloads. To do this, we decided to use parallel computing: the process of using multiple cores of a CPU to execute multiple tasks concurrently. The main goal is for performance to scale linearly as the number of cores increases. Multicore CPUs were the solution that hardware manufacturers like Intel and AMD proposed for the slow IPC improvements. The biggest challenge that arose from parallel code was scaling. It is very hard to write code whose performance increases linearly with the number of cores, due to the overhead of synchronization and communication between different processors.

The challenge with AI applications is that they are an entirely different workload. The flow of data is different, with a high amount of data being transferred from main memory to the CPU. There is a need for an efficient model that can handle transferring large amounts of data to all the cores of a CPU. Our task is to find effective methods of overcoming this inherent challenge in AI applications and to create a model that brings true scaling to AI workloads. One current method relies on using one core to transfer data while another core is used for actually processing the data with the neural net model.

This project uses parallel computing to improve the performance of an AI application that requires significant computing power. The application implements a real-time home monitoring system. The system captures video/images in real time and detects whether a person belongs to the household, is a guest, or is an intruder, based on conditions predefined by the application. We evaluated different types of AI face recognition systems, parallelization techniques, and application measuring techniques to decide on the optimal system.

II. BACKGROUND

The application aims to secure a home from intruders with the use of artificial intelligence; we then try to enhance the performance of the application by parallelizing it, and this enhancement is measured. There are a few companies developing security systems using AI.

Figure 1 shows Lighthouse AI's security camera. This camera comes with a 3D sensor, speaker, microphone, and an alarm. It is controlled by an associated app that is capable of streaming live surveillance video, detecting and distinguishing between organic and inorganic motion, managing profiles of homeowners, guests and other trusted parties, and excluding pings for certain times of the day.

This project focused on facial recognition using artificial intelligence. There are many methods of coding facial recognition AI in the industry, among which OpenCV, Facenet and VGG-Face are three popular ones. Because this project was built using Keras VGG-Face, the background related to VGG-Face is the one used for face recognition and discussed here.

VGG-Face, or DeepFace Recognition, was developed by the Visual Geometry Group of Oxford University. Initially, the DeepFace method was trained using 4 million examples pertaining to 4000 different people.

Fig. 1. Face recognition and notification system

III. MOTIVATIONS AND CONCEPT

Our project, Safe Home, is an artificial intelligence based application that is capable of recognizing intruders in a home and notifying homeowners by email. The design of this application was done in three parts. The first part was creating the AI base necessary for facial recognition, followed by adding email functionality, and finally designing the user interface of this AI application. The second phase of the design was parallelizing the application and implementing it on multicore processors. Figure 2 illustrates the design model that we followed to complete the Safe Home project. It was chosen for our Safe Home application due to its simplicity, while providing direct access to modify all three design phases.

Fig. 2. The AI Design Model

IV. USING ARTIFICIAL INTELLIGENCE

The artificial intelligence program used for this project is based on facial recognition. Facial recognition is similar to object recognition: we used specific algorithms for feature extraction and pattern recognition. The facial recognition AI used in this project was based on three main components: VGG-Face, Siamese networks and CNNs.

Siamese neural networks were first applied in the early 1990s by Bromley and LeCun to solve signature verification as an image matching problem [2]. They consist of a pair of networks that may receive different inputs but are combined at the end by an energy function. These neural networks can be used to differentiate between two input images and establish similarities. In this project, the images are of faces.

FaceNet is an application of Siamese neural networks. This deep convolutional network was designed by Google and trained to solve face verification. It is used to map an image of a face to a compact Euclidean space, where distance is a measure of the similarity of faces: the smaller the distance, the more similar the faces. CNN denotes the convolutional neural network; the anchor refers to the image already established in the database against which new images are compared; a positive refers to a face image that belongs to the same face as the anchor; and a negative refers to a face image that does not belong to the same face as the anchor [3].

The last component of interest is the triplet loss. Usually in supervised learning the number of classes is fixed, and the SoftMax cross-entropy loss function can be used to train the network. However, in face recognition we need to be able to compare two unknown faces and decide whether the face images belong to the same person. This gives rise to employing the triplet loss [4]. The triplet loss enforces a margin between each pair of faces from one person and all the other faces. In other words, the triplet loss minimizes the distance between the anchor and a positive while increasing the distance between the anchor and a negative.
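As a concrete illustration (a sketch of ours, not the project's code), the triplet objective of [3] for embeddings f(a), f(p), f(n) is max(||f(a) - f(p)||^2 - ||f(a) - f(n)||^2 + margin, 0), which can be computed directly with NumPy; the margin value below is illustrative.

    import numpy as np

    def triplet_loss(anchor, positive, negative, margin=0.2):
        """Triplet loss for one (anchor, positive, negative) embedding triple.

        Uses squared Euclidean distance; the margin value is illustrative.
        """
        d_ap = np.sum((anchor - positive) ** 2)  # anchor-positive distance
        d_an = np.sum((anchor - negative) ** 2)  # anchor-negative distance
        return max(d_ap - d_an + margin, 0.0)

    # The loss is zero once the negative is at least `margin` farther
    # from the anchor than the positive is.
    a = np.array([0.1, 0.9])
    p = np.array([0.12, 0.88])
    n = np.array([0.9, 0.1])
    print(triplet_loss(a, p, n))  # 0.0 for this well-separated triple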
V. THE SYSTEM DESIGN

One of the most important design aspects of a software design project is the coding environment. The coding environment of this project was mainly a combination of Pycharm and Anaconda. The versions used were as follows: Pycharm, version 2019.1.1 (Windows, Community); Python, version 3.6 (Windows, 64 bit); Anaconda, Python 3.6 (Windows, 64 bit). Coding was done in the Python programming language. The above-mentioned frameworks made the libraries needed to run the application very easy to install and update.

A. Design for Artificial Intelligence

As mentioned above, Safe Home is an application that can be trained to recognize family members' faces; by implication, any face that is not recognized can be considered a possible intruder. For this application to be able to recognize facial features and subsequently classify the face, artificial intelligence programming was used.

After considering many methods of implementing the necessary AI in the application, we decided to use Keras with a Tensorflow backend, together with OpenCV [5].
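As an illustration of this stack (a sketch of ours, not the project's source), the third-party keras_vggface package provides a Keras VGG-Face model whose pooled output can serve as a face embedding, and OpenCV can supply the face crops. The cascade file, image handling, and the 0.5 threshold below are assumptions, not values taken from the paper.

    import cv2
    import numpy as np
    from keras_vggface.vggface import VGGFace
    from keras_vggface.utils import preprocess_input

    # VGG-Face as an embedding extractor (no classifier head).
    model = VGGFace(model='resnet50', include_top=False,
                    input_shape=(224, 224, 3), pooling='avg')

    def embed(bgr_image):
        """Detect the first face with OpenCV and return its embedding.

        Assumes at least one face is present in the image.
        """
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
        gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
        x, y, w, h = cascade.detectMultiScale(gray)[0]
        face = cv2.resize(bgr_image[y:y + h, x:x + w], (224, 224))
        face = preprocess_input(face.astype('float64')[np.newaxis], version=2)
        return model.predict(face)[0]

    def same_person(img_a, img_b, threshold=0.5):
        """Compare two faces by cosine distance; the threshold is illustrative."""
        ea, eb = embed(img_a), embed(img_b)
        cos = np.dot(ea, eb) / (np.linalg.norm(ea) * np.linalg.norm(eb))
        return (1.0 - cos) < threshold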


The videos that were used for training purposes were stored inside the same directory as the project source code files. For our project, we trained our AI using 4 videos from 4 people. They were stored inside the data folder, and each video file was named according to the name intended to be displayed whenever monitoring starts.
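A minimal sketch of how such training videos can be read frame by frame with OpenCV (the folder layout and file naming follow the description above; the helper itself is ours):

    import os
    import cv2

    def training_frames(data_dir='data'):
        """Yield (person_name, frame) pairs from each video in the data folder.

        The displayed name is taken from the video file name, as described above,
        e.g. 'Alice.mp4' -> 'Alice'.
        """
        for filename in os.listdir(data_dir):
            name, _ = os.path.splitext(filename)
            capture = cv2.VideoCapture(os.path.join(data_dir, filename))
            while True:
                ok, frame = capture.read()
                if not ok:
                    break
                yield name, frame
            capture.release()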
One of the add-ons to this application was allowing the program to send an email to any specified email address. The library used to facilitate this procedure was smtplib. It allowed setting up the server that sends the email, dictating the message and subject, and eventually sending the email when the pertinent code was prompted via a method call.
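The standard-library smtplib supports exactly this flow. The sketch below is illustrative: the server address, port, credentials, and addresses are placeholders, not the project's values.

    import smtplib
    from email.message import EmailMessage

    def send_alert(body, subject='Safe Home: possible intruder'):
        """Compose and send a notification email via an SMTP server."""
        msg = EmailMessage()
        msg['Subject'] = subject
        msg['From'] = 'safehome@example.com'      # placeholder sender
        msg['To'] = 'homeowner@example.com'       # placeholder recipient
        msg.set_content(body)

        with smtplib.SMTP('smtp.example.com', 587) as server:
            server.starttls()                     # upgrade to an encrypted channel
            server.login('safehome@example.com', 'app-password')  # placeholder
            server.send_message(msg)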
Figure 3 shows the flow chart for the AI programming of Safe Home based on VGGFace.

Fig. 3. Flow chart for AI programming of Safe Home based on VGGFace

Figure 4 shows the User Interface (UI) of the Safe Home AI application after Start Monitoring Now was pressed. This user interface was designed so that it can be installed as a non-web-based application. The libraries used to make this UI were tkinter and pillow (PIL). It was made by creating a window that can display a preloaded image and a button that can be pressed to start the monitoring of the home.

Fig. 4. User Interface (UI) of Safe Home AI Application after Start Monitoring Now

Figure 5 shows the flow chart of the Safe Home AI application.

Fig. 5. Flow chart of Safe Home AI Application
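A minimal sketch of such a window with tkinter and pillow (the image file name and the callback body are assumptions):

    import tkinter as tk
    from PIL import Image, ImageTk

    def start_monitoring():
        print('monitoring started')  # placeholder for the real monitoring call

    root = tk.Tk()
    root.title('Safe Home')

    # Preloaded image shown above the button; the file name is an assumption.
    photo = ImageTk.PhotoImage(Image.open('home.png'))
    tk.Label(root, image=photo).pack()

    tk.Button(root, text='Start Monitoring Now',
              command=start_monitoring).pack(pady=10)

    root.mainloop()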
VI. THE DESIGN FOR PARALLEL PROCESSING

To achieve parallelism in our application, we relied on the built-in functionality of the tensorflow core. The core is comprised of the client, the underlying C API, the distributed master, kernels, and hardware layers. Figure 6 shows the Tensorflow core architecture.

Fig. 6. Tensorflow Core Architecture

In the tensorflow core, the client represents the user, who is responsible for creating a program. The program is composed of a neural network with individual operations or a library, and it has a respective session, an environment for evaluating tensor objects. This session passes the graph definition from the client to the distributed master. After the graph definition passes from the client to the master, the master finds an appropriate partitioning and sends it to the workers. The master also inserts send and receive nodes to allow the flow of information between the workers.

The second stage of parallelism occurs at the worker services. These workers have special kernels that they can dispatch on local devices. If there are multiple cores available, the workers run the kernels in parallel. The workers also have various optimizations that allow for efficient transfers between the local cores and between tasks. The kernels themselves are implemented using C++ templates for efficient parallel code.
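How many cores the kernels may use can be steered from the client side. A TensorFlow 1.x-style sketch (the thread-pool sizes are illustrative, not the project's configuration):

    import tensorflow as tf

    # Illustrative thread-pool sizes; not the values used in the project.
    config = tf.ConfigProto(
        intra_op_parallelism_threads=4,   # cores used inside one kernel
        inter_op_parallelism_threads=2)   # independent kernels run concurrently

    with tf.Session(config=config) as sess:
        a = tf.random_normal((1000, 1000))
        b = tf.random_normal((1000, 1000))
        sess.run(tf.matmul(a, b))  # the matmul kernel may use several cores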
A. Implementation Using Parallel Processing

In our search for a suitable multiprocessing library, we decided to use PyMP. It is a python implementation of OpenMP. Its design philosophy is based on python, yet it contains all the needed features of OpenMP, such as minimal code changes.
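A minimal sketch of the PyMP pattern (assuming the pymp-pypi package; the work function and sizes are placeholders):

    import pymp

    def heavy_work(i):
        return float(i * i)  # stand-in for a real per-task computation

    # Shared array visible to all forked worker processes.
    results = pymp.shared.array((8,), dtype='float64')

    with pymp.Parallel(4) as p:          # fork 4 processes, OpenMP style
        for i in p.range(0, 8):          # iterations are split across processes
            results[i] = heavy_work(i)

    print(list(results))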


It is important to find a balance between the number of cores and the performance requirements of the functions. If the number of independent tasks that can run in parallel is less than the number of available cores, then some cores are not utilized, which affects system scalability [1].

VII. RESULTS AND DISCUSSION

The results were obtained during the implementation on a 12-core desktop computer.

Figure 7 shows the time it takes for training versus the number of cores.

Fig. 7. Performance of the Training Application on different cores

The graph shows a significant gain in reducing training time as the number of cores increases. The scalability increases linearly with the number of used cores up to four processors; this resulted in a 100% improvement in training time when using 4 cores. When increasing the number of cores beyond four, the rate of improvement decreases, because a portion of the training application cannot be improved, in accordance with Amdahl's law.

Figure 8 shows the time it takes to run the application when using different numbers of cores.

Fig. 8. Performance of the Face recognition Application on different cores

The results show that the initialization time is independent of the number of cores. This causes a bottleneck to the performance improvement when increasing the number of cores. This limitation is common in parallel processing: when a portion of the time cannot be improved with parallelism, it limits the performance of the whole system, according to Amdahl's law. This is noticeable from the graph: when the number of cores equals six, the initialization time is 12 sec, and when the number of cores is increased to 12, the initialization time is 11 sec, only about a 10% decrease for doubling the number of processors.

Figure 8 also shows the recognition time versus the number of cores. The recognition time scales well with the increase in the number of processors up to six cores; beyond that it does not improve much with additional cores. This is because the recognition code contains a fixed number of independent parallel tasks, which limits the overall performance gain as the number of cores increases.

Figure 8 also shows the time spent in scaling to make the application more scalable, which improves only when the number of cores is larger than 4. If the number of cores is smaller than 4, the scaling time increases due to overhead added to the application that cannot be amortized unless the number of cores increases beyond 4.

Finally, Figure 8 shows that the overall system performance and scalability of the application improve by about 250% when using 12 cores. This indicates that parallel processing is useful and gives a considerable gain in reducing the time to identify intruders for a security system that requires fast action.
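Amdahl's law makes these ceilings quantitative: with a parallel fraction p of the runtime, the speedup on n cores is 1/((1 - p) + p/n). The sketch below evaluates it; the parallel fraction of 0.65 is an illustrative value chosen so that 12 cores give roughly the 250% overall gain reported above, not a measured quantity.

    def amdahl_speedup(p, n):
        """Amdahl's law: p = parallel fraction of runtime, n = number of cores."""
        return 1.0 / ((1.0 - p) + p / n)

    # Illustrative parallel fraction, not a measurement from this project.
    for n in (1, 2, 4, 6, 12):
        print(n, 'cores ->', round(amdahl_speedup(0.65, n), 2), 'x')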


In summary, the results obtained were very close to those expected. The performance improvement from simply running the application on a desktop with higher processing power and more cores, instead of a personal laptop, was significant. The total time required to run even on a single core of the desktop machine was significantly lower than on the laptop. That said, the trend observed in both cases was very similar: comparing the graphs obtained from the 4-core laptop and the 12-core desktop shows a good scaling trend. As more cores were added, the total time decreased and thus performance increased.
Another major observation from the analysis was the effect of bottlenecks. It can be seen that, for the init function used during facial recognition, the total time the function needs to execute does not change with the number of cores provided: the time always remains around 5.01 seconds regardless of how many processors are added. By the time all 12 cores are activated and the application runs, almost half the total application runtime is consumed by the init function.

Another major trend that can be seen is the effect of a performance ceiling. This can be observed by looking at the total times for 6, 8, 10 and 12 cores. In comparison to the first 4 cores, the performance scale-up decreases significantly: the values are almost the same and the lines are almost flat, signifying again that merely increasing the number of cores on the machine does not automatically increase performance.
VIII. CONCLUSIONS
We implemented a working facial recognition application with the use of AI. The VGGFace model was used for our neural net. There are two major parts to our application, training and recognizing: the training part adds new faces to our system, while the recognizing part determines the identity of a face. The application is successfully able to run on multiple cores.

The results show that the application scales well across different numbers of cores using tensorflow. We have achieved the major goals of this project: creating a working facial recognition application based on neural nets that scales well.
REFERENCES
[1] J. Hennessy and D. Patterson, "Computer Architecture: A Quantitative Approach," 2nd ed. Morgan Kaufmann, San Mateo, Calif., 2003.
[2] J. Bromley, J. W. Bentz, L. Bottou, I. Guyon, Y. LeCun, C. Moore, E. Sackinger, and R. Shah, "Signature verification using a siamese time delay neural network," International Journal of Pattern Recognition and Artificial Intelligence, 7(04):669-688, 1993.
[3] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 815-823.
[4] A. Hermans, L. Beyer, and B. Leibe, "In defense of the triplet loss for person re-identification," arXiv preprint arXiv:1703.07737, 2017.
[5] L. Beaucourt, "Real-time and video processing object detection using Tensorflow, OpenCV and Docker," Apr. 12, 2018. [Online]. Available: https://towardsdatascience.com/real-time-and-video-processing-object-detection-using-tensorflow-opencv-and-docker-2be1694726e5


