AI Based Recognition System
The prototype could successfully recognize human faces and activate the
electronic components. It performed quickly and could log information
about recognized people in the Google database.
1 Introduction
2 Artificial Intelligence
3 Facial Recognition
4 Implementation
4.2.2.4 Database
5 Conclusion
References
Appendices
Appendix 1: The encodings of an image in the dataset
Appendix 2: The Circuit diagram of the project
List of Abbreviations
2D: 2-dimensional
3D: 3-dimensional
1 Introduction
The goal of the thesis project was to build a facial recognition system that
could recognize people through a camera and unlock the door locks. The
prototype would be fixed to the doors and use the camera to operate the
whole circuitry. The results would be logged in the Google database and
could be analyzed by users after a successful login.
2 Artificial Intelligence
Artificial intelligence is divided into two categories: strong AI and weak AI.
Weak AI is a narrow application, suitable for specific tasks such as
virtual assistants. Strong AI, on the other hand, is a broader
application with human-level intelligence. It is mainly used in advanced
robotics and automation. [4.]
Supervised learning algorithms are trained by feeding in labelled data
and adjusting the weights until the model fits appropriately. This process
helps to prevent both overfitting and underfitting. Over
time, the algorithms learn to approximate the connection between the input
data and the labels. Once the algorithms are fully trained, they can observe
new objects and predict the proper labels. [6.]
There are many possible hyperplanes that can be found in a plane, as shown in
figure 4. In order to find the optimal hyperplane among them, a mathematical
computation of the margin is needed, which is described below.
f(x) = w^T x + b    (1)
In the equation, w and b are the weight vector and the bias, respectively [9].
|w^T x + b| = 1    (2)

where x denotes the data points closest to the hyperplane, called support vectors,
which are used to maximize the margin and help to build the classifier. [9.]
The next step is to compute the distance between a point x and the
hyperplane using geometry:

d = |w^T x + b| / ‖w‖    (3)

For the support vectors, equation (2) gives

d_sv = |w^T x_sv + b| / ‖w‖ = 1 / ‖w‖    (4)
The margin is twice the distance between the support vectors and the hyperplane:

M = 2 d_sv = 2 / ‖w‖    (5)

Maximizing this margin is equivalent to minimizing the following loss:

min_{w, w_0} L(w) = ‖w‖² / 2    (6)
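As a sketch (not part of the thesis code), the margin of equation (5) can be read off the weight vector of a fitted linear SVM, for example with scikit-learn; the toy data below is made up for illustration.

import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2D data (illustrative only)
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Linear SVM: f(x) = w^T x + b, as in equation (1)
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]                    # weight vector w
b = clf.intercept_[0]               # bias b
margin = 2.0 / np.linalg.norm(w)    # equation (5): M = 2 / ||w||
print("w =", w, "b =", b, "margin =", margin)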
In order to find the distance between two data points x and z, the following
formula is used:

d(x, z) = √( Σ_{i=1}^{P} (x_i − z_i)² )    (7)
Equation (7) is the Euclidean distance formula; it is used to calculate the
distance between two data points and is also applied in face recognition,
as described in further sections. [10.]
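For illustration, equation (7) can be evaluated directly with NumPy; the two sample vectors below are arbitrary.

import numpy as np

x = np.array([0.2, 0.5, 0.1])
z = np.array([0.3, 0.1, 0.4])

# Equation (7): Euclidean distance between two data points
d = np.sqrt(np.sum((x - z) ** 2))
print(d)    # same value as np.linalg.norm(x - z)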
The main types of unsupervised learning are clustering, association, and
dimensionality reduction.
The model gives faulty results in the beginning. However, as long as
feedback is provided to the algorithm, it learns to favour correct outputs
over incorrect ones and improves itself for the next trial. Over time, the
algorithm makes fewer mistakes than it used to. [6.]
Deep learning algorithms process unstructured data, such as text and images,
and automate feature extraction. For instance, an algorithm can process a
set of photos of different animals and categorize them as cat, dog, and so on.
It can determine which features, such as the ears or the nose, are most
significant for distinguishing one animal from another. [12.]
Neural networks are at the heart of deep learning algorithms. Their name
and structure are inspired by the biological neuron. A neuron in a neural
network is a mathematical operation that imitates the functioning of a
biological neuron, as schematized in figure 7. [13.]
As figure 7 illustrates, the input feeds into the neuron, which produces the output.
Several input neurons are used to solve more complicated problems, as shown in
figure 8.
z = Σ_i w_i x_i + b    (8)

a = ψ(z)    (9)
Each neuron multiplies its inputs x_i by the weights w_i, adds the bias b, and
passes the sum through the activation function ψ. [14.]
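A minimal sketch of equations (8) and (9) for a single artificial neuron, assuming a sigmoid activation; the weights and inputs below are arbitrary.

import numpy as np

def neuron(x, w, b):
    z = np.dot(w, x) + b                 # equation (8): weighted sum plus bias
    return 1.0 / (1.0 + np.exp(-z))      # equation (9): sigmoid activation psi(z)

x = np.array([0.5, -1.0, 2.0])           # inputs x_i
w = np.array([0.4, 0.3, -0.2])           # weights w_i
b = 0.1                                  # bias b
print(neuron(x, w, b))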
The architecture of an artificial neural network starts with the input layer,
which ingests the data and passes it to the hidden layers, where all the
mathematical computations are done. Finally, the output layer produces the
result for the given inputs. [15.]
A CNN starts its operation by converting the input image into pixels and
forwarding them to filter processing. The filters used in image processing are
vertical-edge and horizontal-edge filters. The combination of these filters
extracts the edges of an object in an image. [16.] The vertical edge filter, VEF,
is defined as follows:
VEF = [ 1  0  −1
        1  0  −1
        1  0  −1 ] = HEF^T    (10)
This filter slides over the input image to extract the vertical edges; each output
value is the sum of the elementwise products in a block, as shown in figure 11.
[16.]
Figure 11. The feature map after filtering the image [16]
When the VEF is used, the pixels on the edges contribute less than those in the
middle, which means that data from the edges is partly ignored. In order to solve
this problem, padding can be added around the image so that the edge pixels are
also taken into account, as shown in figure 12. [16.]
Figure 12. The output, after adding padding around the image [16]
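As a hedged sketch of the filtering and padding described above (not the thesis code), the vertical edge filter of equation (10) can be slid over a small made-up image with SciPy; mode="same" zero-pads the borders so the edge pixels are kept.

import numpy as np
from scipy.signal import correlate2d

# Vertical edge filter from equation (10)
VEF = np.array([[1, 0, -1],
                [1, 0, -1],
                [1, 0, -1]])

# Tiny grayscale image with a vertical edge in the middle (made-up values)
image = np.array([[10, 10, 0, 0],
                  [10, 10, 0, 0],
                  [10, 10, 0, 0],
                  [10, 10, 0, 0]])

# Elementwise product and sum in each 3x3 block; 'same' zero-pads the edges
feature_map = correlate2d(image, VEF, mode="same", boundary="fill")
print(feature_map)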
Once the stride and the padding are defined, the CNN can be constructed layer by
layer. A CNN consists of three types of layers: convolutional, pooling, and
fully connected layers. [16.]
As mentioned above, the CNN derives its name from the convolution operator.
The primary goal of the convolutional layer is to extract features from the
input image, which can be mathematically represented as a tensor with the
dimensions (n_H, n_W, n_C). The filter has an odd dimension f, so that each
pixel can be centred, and the same number of channels as the input image [16].
The convolution of an input a with a filter K is defined as

conv(a, K)_{x,y} = ψ( Σ_{i=1}^{H} Σ_{j=1}^{W} Σ_{k=1}^{C} K_{i,j,k} a_{x+i−1, y+j−1, k} + b )

Here, a^[l−1] is the input image with the dimensions (n_H^[l−1], n_W^[l−1], n_C^[l−1]),
and n_C^[l] is the number of filters in layer l. The output of the convolutional
layer stacks the convolutions with all n_C^[l] filters:

a^[l] = [ ψ^[l](conv(a^[l−1], K^(1))), ψ^[l](conv(a^[l−1], K^(2))), … , ψ^[l](conv(a^[l−1], K^(n_C^[l]))) ]    (15)
with

n_{H/W}^[l] = ⌊ (n_{H/W}^[l−1] + 2p^[l] − f^[l]) / s^[l] + 1 ⌋    (17)
Figure 14. Illustration of the convolutional layer with multiple filters [16]
In figure 14, 𝑝[𝑙] and 𝑠[𝑙] are the padding and stride parameters,
respectively, and the learned parameters from these convolutional layers
are filters and the bias [16].
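The output size of equation (17) can be checked with a small helper function (a sketch, not from the thesis); for example, a 64x64 input with a 3x3 filter, padding 1 and stride 1 keeps the spatial size at 64.

import math

def conv_output_size(n, f, p, s):
    # Equation (17): floor((n + 2p - f) / s + 1)
    return math.floor((n + 2 * p - f) / s + 1)

print(conv_output_size(64, 3, 1, 1))   # 64
print(conv_output_size(64, 3, 0, 2))   # 31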
The CNN uses the pooling layer to reduce the training time and the dimensionality
of the feature maps; it is applied to each channel separately but still maintains
the useful information in the image. There are two commonly used pooling types:
max pooling and average pooling. Max pooling returns the largest element of each
region of the feature map, whereas average pooling takes the average of all
elements, as illustrated in figure 15 for a stride parameter equal to two. [16.]
pool(a^[l−1])_{x,y,z} = φ^[l]( (a^[l−1]_{x+i−1, y+j−1, z})_{i,j ∈ {1, 2, …, f^[l]}} )    (18)
Here, a^[l−1] is the input image to the pooling layer, which passes through the
pooling function φ^[l] to produce the output a^[l], as shown in figure 16. [16.]
This layer only produces the compressed version of images using the
pooling function, and it has no learned parameters [16].
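A minimal max-pooling sketch for one channel, assuming a 2x2 window and stride 2 as in the description above (illustrative only).

import numpy as np

def max_pool2d(a, f=2, s=2):
    h, w = a.shape
    out = np.zeros((h // s, w // s))
    for i in range(0, h - f + 1, s):
        for j in range(0, w - f + 1, s):
            # Largest element of each f x f window, as in max pooling
            out[i // s, j // s] = a[i:i + f, j:j + f].max()
    return out

a = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool2d(a))   # [[6. 8.] [3. 4.]]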
The fully connected layers are the main layers of the CNN; they connect every
neuron in one layer to every neuron in the next layer. The primary purpose of
these layers is to take the features produced by the convolutional and pooling
layers and produce the desired output. They are the layers where the actual neural
network starts: they take in a vector a^[l−1] and return a vector a^[l]. The
formula of the fully connected layer for the j-th node of the i-th layer is
z_j^[i] = Σ_{l=1}^{n_{i−1}} w_{j,l}^[i] a_l^[i−1] + b_j^[i]    (19)

a_j^[i] = ψ^[i](z_j^[i])    (20)

Here, w_{j,l}^[i] is the weight, b_j^[i] is the bias, and a^[i−1] is the output of
the pooling layer, flattened into a vector of length

n_{i−1} = n_H^[i−1] × n_W^[i−1] × n_C^[i−1]    (21)

The fully connected layers can be summarized in the illustration in figure 17.
This vector feeds into the fully connected layer and generates the output. The
learned parameters from this layer are the weights and the bias. [16.]
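A sketch of equations (19)-(21): the pooled feature maps are flattened into a vector of length n_H × n_W × n_C and passed through a dense layer; the shapes and random values below are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(0)

# Output of the last pooling layer: 4 x 4 spatial size, 8 channels
pooled = rng.standard_normal((4, 4, 8))

# Equation (21): flatten to a vector of length n_H * n_W * n_C = 128
a_prev = pooled.reshape(-1)

# Fully connected layer with 10 output nodes
W = rng.standard_normal((10, a_prev.size))   # weights w[i]
b = rng.standard_normal(10)                  # bias b[i]

z = W @ a_prev + b     # equation (19)
a = np.tanh(z)         # equation (20) with psi = tanh
print(a.shape)         # (10,)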
Initially, the CNN extracts features from the input image with the convolutional
and pooling layers. These features are fed to the fully connected layers to
produce the output. The output can be the label of the input image or other
features, such as the 128 measurements described in further sections.
Data preprocessing is the step that transforms the data so that the computer
can easily read it. It is also applied to increase the number of images in a
given dataset. Many techniques are used in data preprocessing, such as cropping,
rotation, and flipping. These techniques enable better learning because of the
larger training set and allow the algorithm to learn from different conditions.
Before the CNN is trained, the dataset is split into a training set and a test
set. The training set is used to train the algorithm and consists of 80% of the
dataset. The test set, on the other hand, is used to check the algorithm's
precision. [14.]
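A hedged sketch of the 80/20 split described above, using scikit-learn; the arrays are placeholders.

import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder dataset: 100 flattened images with 10 class labels
X = np.random.rand(100, 64 * 64)
y = np.random.randint(0, 10, size=100)

# 80 % of the data is used for training, 20 % for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print(len(X_train), len(X_test))   # 80 20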
Learning algorithms
Learning algorithms aim to find the parameters that give the best prediction.
For this, a loss function J is defined to measure the distance between the real
and the predicted values. Training involves two steps: forward propagation and
backward propagation. [14.]
The loss function is evaluated as

J(θ) = (1/m) Σ_{i=1}^{m} ℒ(ŷ_i(θ), y_i)    (22)
Here, m is the size of the training set, θ denotes the model parameters, ℒ is the
cost function, and y_i are the real values for all i = (1, 2, …, N). N is the
number of iterations of the same process, called the epoch number. [14.]
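Equation (22) averages a per-example cost over the training set; a minimal sketch with a squared-error cost (the choice of cost function here is an assumption for illustration).

import numpy as np

def loss(y_hat, y):
    # Equation (22): J = (1/m) * sum of per-example costs L(y_hat_i, y_i)
    m = len(y)
    return np.sum((y_hat - y) ** 2) / m   # squared-error cost as an example

y_hat = np.array([0.9, 0.2, 0.8])   # predicted values
y     = np.array([1.0, 0.0, 1.0])   # real values
print(loss(y_hat, y))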
The convolutional neural network is fully trained when the parameters are
adjusted so that training gives the minimum loss, which makes the model fast
and reliable.
Sigmoid:

ψ(x) = 1 / (1 + e^(−x))    (24)

Tanh:

ψ(x) = (1 − e^(−2x)) / (1 + e^(−2x))    (25)
LeakyReLU:

ψ(x) = max(αx, x), where α is a small constant slope applied to negative inputs
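The activation functions above can be written directly in NumPy (a sketch; the value of α for LeakyReLU is an assumption, a small constant such as 0.01 is common).

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))                           # equation (24)

def tanh(x):
    return (1.0 - np.exp(-2 * x)) / (1.0 + np.exp(-2 * x))    # equation (25)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)                      # LeakyReLU with small slope alpha

x = np.linspace(-2, 2, 5)
print(sigmoid(x), tanh(x), leaky_relu(x))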
The RNN is a type of neural network that operates on sequential data and is
used for natural language processing, speech recognition, language translation,
and so on. RNNs are derived from feedforward neural networks and use their
memory to take information from previous inputs into account when producing
the current output, as shown in figure 19. [18.]
The rolled RNN represents the total predicted outputs. On the other hand,
the unrolled RNN represents the individual layers of the neural network, and
each layer maps to a single output. [18.]
Computer vision is a field of AI that works like human vision. It uses the
machine learning and deep learning algorithms described in sections 2.1 and 2.2
to enable computers to observe and understand images and videos by being fed
large amounts of data. The algorithms run over the data repeatedly until they
can recognize images. [19.]
3 Facial Recognition
Face recognition is executed in three stages: face detection, face encoding, and
face classification [22].
The operation of face recognition starts with face detection, which uses the
HOG method to find the faces in an image. HOG stands for histogram of oriented
gradients. The method starts by converting the image to black and white. For
every pixel in the image, the surrounding pixels are examined to figure out how
dark that pixel is compared to its surroundings. Then an arrow is drawn in the
direction in which the image gets darker, as shown in figure 20. [22.]
This process is repeated for every single pixel in the image. In the end, every
pixel is replaced by an arrow. These arrows are called gradients, and they are
obtained by combining the magnitude and the angle computed from the image.
First, the gradients G_x and G_y are calculated for each pixel using the
following formulas. [23.]
G_x(r, c) = I(r, c+1) − I(r, c−1),  G_y(r, c) = I(r+1, c) − I(r−1, c)    (28)

After these calculations, the magnitude and the direction of the gradient are
obtained as

μ = √(G_x² + G_y²)    (29)
θ = arctan(G_y / G_x)    (30)
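A sketch of the gradient computation above with NumPy (central differences over a grayscale image; the tiny array is made up for illustration).

import numpy as np

image = np.array([[ 0, 10, 20, 30],
                  [ 0, 10, 20, 30],
                  [ 0, 10, 20, 30],
                  [ 0, 10, 20, 30]], dtype=float)

# Horizontal and vertical gradients G_x and G_y (central differences)
Gx = np.zeros_like(image)
Gy = np.zeros_like(image)
Gx[:, 1:-1] = image[:, 2:] - image[:, :-2]
Gy[1:-1, :] = image[2:, :] - image[:-2, :]

# Magnitude and direction of the gradient for every pixel
magnitude = np.sqrt(Gx ** 2 + Gy ** 2)
direction = np.degrees(np.arctan2(Gy, Gx))
print(magnitude[1, 1], direction[1, 1])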
The magnitude and the direction are divided into several cells. For each cell, a
9-point histogram is calculated, and each bin gives the intensity of the gradient.
Once the histogram computation is over for all cells, four cells are combined to
form a block. This combining is done in an overlapping manner, as shown in
figure 21. [23.]
For all four cells in a block, 9-point histograms of each cell are
concatenated to form a 36-point feature vector. Then the normalization is
applied to reduce the effect of changes in the contrast between images of
the same face. [23.]
Figure 22 below shows the input HOG image extracted from a large number of other
training faces [22].
In this way, faces can easily be found in any image. If the image size is
128x64, the total number of HOG features is

T_f = 7 × 15 × 36 = 3780    (31)

Here, 36 is the length of the feature vector per block, and 7 and 15 are the
numbers of blocks in the horizontal and vertical directions, respectively. [23.]
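The count in equation (31) can be reproduced with scikit-image's hog() using 8x8-pixel cells, 2x2-cell blocks, and 9 orientations; these parameter values are the usual HOG choices and an assumption here, since the thesis does not list them at this point.

import numpy as np
from skimage.feature import hog

# 128 x 64 placeholder image
image = np.random.rand(128, 64)

features = hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))
print(features.size)   # 3780 = 7 * 15 * 36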
The HOG method thus goes through eight steps to collect the feature vectors that
form the HOG representation of the input image.
After detecting the person's face, FaceNet is used to extract features from
that face. FaceNet is a convolutional neural network published in 2015 by the
Google researchers Florian Schroff, Dmitry Kalenichenko, and James Philbin.
Generally, CNNs are trained to recognize pictures, objects, and digits.
FaceNet, however, takes an input image of a person's face, extracts features
with convolutional and max-pooling layers as described in section 2.2.2, and
generates a vector of 128 measurements from the fully connected layers, as
shown in figure 24. [24.]
Figure 25. Distances between embeddings of anchor, positive and negative [20]
When the embeddings give close measurements, the neural network is trained
and can generate 128 measurements for any face [22].
The last step is to compare the embedding of the test image with the
embeddings of the database images. In this case, the machine learning
algorithm SVM can be used to classify the test image with the closest
match. As described in section 2.1.1, equation (7) is used to find the
distance between two data points. The same technique can be applied to the
embeddings of images: if the distance between the embeddings is small, the
faces are from the same person, and vice versa. [22.]
Overall, the face recognition system can be summarized in the following figure
26.
Figure 26. Illustration of the face recognition system (Modified from [24])
After FaceNet is trained, the database and the test images pass through the
FaceNet, which generates embeddings. These embeddings feed into the
SVM classifier to tell whether they match or not.
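A hedged sketch of this classification stage: an SVM is trained on 128-dimensional embeddings of database images and used to predict the identity of a test embedding. The names and random vectors below are placeholders, not the thesis data.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder 128-dimensional embeddings for two known people
embeddings = np.vstack([rng.normal(0.0, 0.1, 128) for _ in range(10)] +
                       [rng.normal(1.0, 0.1, 128) for _ in range(10)])
names = ["person_a"] * 10 + ["person_b"] * 10

classifier = SVC(kernel="linear").fit(embeddings, names)

# Embedding of a test face generated by FaceNet (placeholder value)
test_embedding = rng.normal(1.0, 0.1, 128).reshape(1, -1)
print(classifier.predict(test_embedding))   # ['person_b']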
4 Implementation
This section of the thesis describes the practical use of the theoretical
background, the necessary materials, tools, technologies, and the detailed
workflow of the project.
4.1.1 Python
In this project, Python was used for machine learning, deep learning,
mathematics, and computer vision by taking advantage of various Python
libraries such as OpenCV, TensorFlow, and Openface.
4.1.2 OpenCV
In this thesis work, the OpenCV library was used to read image paths, capture
video, draw frames, and put the name of the detected face on the image.
4.1.3 TensorFlow
4.1.4 Openface
4.1.5 Firebase
Firebase is a Google backend platform that helps to build and run web and
mobile applications. This platform provides tools for analytics, reporting,
marketing, fixing app crashes, cloud messaging, test lab, authentication, as
well as a real-time database, which is used in the project and described in
further sections. [30.]
4.1.6 HTML/CSS/JS
HTML, CSS, and JavaScript are the languages to run the web. They all are
related but have specific functions. HTML controls the layout of the content,
which provides the structure for the web page. Then CSS applies to stylize
the web page elements, mainly targets various screen sizes to make web
pages responsive. The last step is to use javascript for adding interactivity
to a web page. [31.]
Jetson Nano is NVIDIA’s small and powerful computer for AI purposes such as
deep learning and computer vision. Figure 27 illustrates the Jetson Nano
board. [32.]
The Jetson Nano board has four USB ports, an HDMI port, two connectors for CSI
cameras, and a 40-pin GPIO expansion header to control electronic components.
The board operates at 5 volts, supplied through either a barrel jack or a
micro-USB port. The barrel jack delivers up to 4 amps, while the micro-USB port
supplies 2.5 amps. [33.]
Jetson Nano allows running multiple neural networks in parallel for image
classification, segmentation, object detection, speech processing, and face
recognition [32].
4.1.8 Arduino
4.2.1 Hardware
Various components and sensors were used in this project to build the fully
functional facial recognition system. Some of these components and sensors
are attached to the Arduino UNO board and others to the Jetson Nano board,
as illustrated in figure 29.
Table 1 below lists all the necessary components, their quantities, and values.

Component           Quantity   Value
Resistor            2x         330 Ω
Green LED           1x         -
Red LED             1x         -
Relay               2x         5 Volts
Buzzer              1x         -
Ultrasonic sensor   1x         -
OLED display        2x         -
Fan                 1x         5 Volts
Webcam              1x         -
Wi-Fi dongle        1x         -
USB cable           1x         -
In this project, the ultrasonic sensor was used to measure distance. When the
distance is less than 30 centimeters, the buzzer buzzes and the OLED display
shows the message “Please, look at the camera,” as shown in figure 30.
Resistors were used to limit the current through the green and red LEDs.
These LEDs were connected to the Arduino UNO. The green LED lights up when a
face is recognized, and the red LED lights up when access is denied, as shown
in figure 31.
The relays were used to send power to the solenoid locks shown in figure 32
below, which lock and unlock the door.
The fan was attached to the Jetson Nano heat sink to cool the processor
during the training process, and the webcam was used to capture the video.
The Wi-Fi dongle was plugged into the USB port of the Jetson Nano to access
the internet, since the Jetson Nano does not have built-in Wi-Fi. The board
was powered using a 5 V 2.5 A Raspberry Pi adapter and shared that power with
the Arduino using the USB cable. This USB cable was also used for serial
communication between the two boards.
4.2.2 Software
The dataset images and the real-time face pass through the facial recognition
stages. When the embeddings give close measurements in the face classification
stage, the faces match, and the data is sent to the Google database. All the
steps in the block diagram are explained in further sections.
import cv2
from skimage.feature import hog

image = cv2.imread('image1.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
fd, hog_image = hog(image, orientations=8, pixels_per_cell=(16, 16),
                    cells_per_block=(1, 1), visualize=True, multichannel=True)
Listing 1. A python code that generates the face pattern using the HOG function
[36]
Here, the HOG function was applied with 16x16 pixels per cell and 1x1 cells
per block with eight vector orientations. The output from this HOG function
can be plotted using the matplotlib library, as shown in listing 2 below.
import matplotlib.pyplot as plt
from skimage import exposure

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4), sharex=True, sharey=True)
ax1.axis('off')
ax1.imshow(image, cmap=plt.cm.gray)
ax1.set_title('Input Image')
hog_image_rescaled = exposure.rescale_intensity(hog_image, in_range=(0, 10))
ax2.axis('off')
ax2.imshow(hog_image_rescaled, cmap=plt.cm.gray)
ax2.set_title('Histogram of Oriented Gradients')
plt.show()
Listing 2. A python code that plots the output from the HOG function [36]

The following figure 34 shows the output from the HOG function.
This HOG image was inputted to the function in the face recognition library
to detect the face, as shown in the following python code in listing 3.
import cv2
import face_recognition

img = cv2.imread("Ogtay_Ahmadli.jpg")
color = (0, 0, 255)
faceLocationCurrentImage = face_recognition.face_locations(hog_image)
# face_locations() returns (top, right, bottom, left) for each detected face
y1, x2, y2, x1 = faceLocationCurrentImage[0]
cv2.rectangle(img, (x1, y1), (x2, y2), color, 1)
Listing 3. A python code that draws a rectangle to the detected face
After the successful detection shown in figure 35, a new python subroutine
called findEncodings() was created to find the encodings for each face image
in the dataset. This subroutine goes through the dataset and, for each image,
uses the FaceNet method to generate the encodings. When the encoding process is
completed, the subroutine returns two lists. The first list contains the
encodings of each image in the dataset, as illustrated in Appendix 1. The
second list contains the names in the dataset, as shown in figure 36.
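The findEncodings() code itself is not reproduced in this copy; the following is a minimal sketch of how such a subroutine could look with the face_recognition library. The folder layout and return values follow the description above; everything else, including the use of file names as person names, is an assumption.

import os
import cv2
import face_recognition

def findEncodings(path):
    encodingList, classNames = [], []
    for fileName in os.listdir(path):
        img = cv2.imread(os.path.join(path, fileName))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        # FaceNet-based encoder of the face_recognition library:
        # returns one 128-measurement vector per detected face
        encodings = face_recognition.face_encodings(img)
        if encodings:
            encodingList.append(encodings[0])
            classNames.append(os.path.splitext(fileName)[0])
    return encodingList, classNames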
Once the face images were encoded, the subroutine called recognizeFaces()
was created to recognize faces using the support vector machine algorithm.
This subroutine takes the returned lists from the previous subroutine as
inputs along with the image.
The subroutine starts by generating the encodings of the real-time face image
detected from the webcam. Next, the encodings are looped through to calculate
the face distance and the result. The result is a list that compares the
dataset faces with the real-time face using the compare_faces() function of
the face recognition library and outputs the list shown in figure 37.
The face distance is computed using equation (7) in section 2.1.1, the
Euclidean formula, to find the distance between the encodings of the dataset
faces and the real-time face, as shown in listing 3.
faceDistance = distance.euclidean(encodingList,encodingFace)
Listing 3. A python code to calculate the distance between encodings
matchIndex = np.argmin(faceDistance)
Listing 4. The python code to get the minimum value of a list
The output from this line is equal to one, which is the index of the second
element in a list in figure 38.
The following listing 5 checks whether the result in figure 38 is true or false
at the minimum value.
names = []
if result[matchIndex]:
    name = classNames[matchIndex]
    color = (0, 255, 0)
    sm.sendData(ser, [0, 0, 1, 0], 1)
else:
    name = 'unknown'
    color = (0, 0, 255)
    sm.sendData(ser, [1, 1, 0, 1], 1)
names.append(name)
Listing 5. A python code to recognize faces.
Here, if the result is true, it means that the face is recognized. The name is
labeled according to the list in figure 38 and the match index. Then the data
is sent to the Arduino UNO to unlock the solenoid locks and turn on the
green LED.
If the result is false, on the other hand, the name is labeled as ”unknown,”
and the Arduino UNO receives the data to keep the locks closed and turn on
the red LED.
y1, x2, y2, x1 = faceLocation
y1, x2, y2, x1 = int(y1 / 0.25), int(x2 / 0.25), int(y2 / 0.25), int(x1 / 0.25)
cv2.rectangle(imgFaces, (x1, y1), (x2, y2), color, 2)
cv2.putText(imgFaces, name, (x1 + 6, y1 - 6),
            cv2.FONT_HERSHEY_COMPLEX, 1, color, 2)
Listing 6. A python code to draw a rectangle and put text on the recognized face
[36]
Because of the reduced image size in figure 35, the face locations are scaled
up four times to get the proper face frame from the webcam. Then a rectangle
and a text label were added around the face using the computer vision library.
4.2.2.4 Database
In this project, Firebase was used to keep the data in Google’s real-time
database. First, the Firebase database was created, and then the python
module shown in listing 7 was designed to handle the communication with
Firebase.
After importing the firebase library, the URL of the Firebase database was
copied into the code. Then the postData() subroutine was created to post the
name and the time to the database.
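Listing 7 itself is not reproduced in this copy; as a sketch, the name and time can be posted to the Firebase Realtime Database through its REST interface with the requests library. The database URL and the "Attendance" path below are placeholders, not the project's actual values.

import requests

DATABASE_URL = "https://example-project-default-rtdb.firebaseio.com"  # placeholder URL

def postData(name, dateString):
    # POST appends a new record under the /Attendance node
    payload = {"name": name, "time": dateString}
    response = requests.post(f"{DATABASE_URL}/Attendance.json", json=payload)
    if response.status_code != 200:
        print("Posting to Firebase failed")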
from datetime import datetime

def markAttendance(name):
    with open('Attendance.csv', 'r+') as f:
        myDataList = f.readlines()
        nameList = []
        for line in myDataList:
            entry = line.split(',')
            nameList.append(entry[0])
        if name not in nameList:
            now = datetime.now()
            dateString = now.strftime('%H:%M:%S')
            f.writelines(f'{name},{dateString}\n')
            fbm.postData(name, dateString)
Listing 8. The python subroutine that marks the name and the date [36]
def main():
    encodingList, classNames = findEncodings("ImageAttendance")
    cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)
    sm.sendData(ser, [1, 1, 0, 0], 1)
    while True:
        success, img = cap.read()
        imgFaces, names = recognizeFaces(img, encodingList, classNames)
        for name in names:
            if name == "unknown":
                sleep(0.2)
            else:
                markAttendance(name)
        cv2.imshow("Image", imgFaces)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
Listing 9. The transmitter function
The function starts the operation by taking the returned values of the
findEncodings() function according to the images in the dataset called
“ImageAttendance.” Then it activates the camera and sends the initial lock
and LED values to the Arduino UNO board.
Then the webcam captures the image and inputs it to the recognizeFaces()
function. Here, a for loop is used to loop through the names of the captured
faces. If the face is not recognized, the program does not publish anything.
Otherwise, the name and the time are sent to the database, as shown in
figure 39.
In the end, the function displays the output, which can be seen in figure 40.
In this project, the Jetson Nano is responsible for the AI, and the Arduino UNO
is responsible for the electronics operation. The Jetson Nano board is in serial
communication with the Arduino UNO to transmit the desired data and make the
components operate, as shown in figure 41.
When the Jetson Nano is connected to the Arduino UNO with the USB cable, the
python subroutine shown in listing 10 checks if the boards are connected.
import serial

def initConnection(portNo, baudRate):
    try:
        ser = serial.Serial(portNo, baudRate)
        print("Device Connected")
        return ser
    except:
        print("Not Connected")
Here, the subroutine opens the connection using the port number and the baud
rate of the Arduino UNO with the serial library and returns the initialized
serial object. When the Arduino UNO is connected, the subroutine prints
"Device Connected"; otherwise it prints "Not Connected".
After the successful connection, a new subroutine was created to send the
data to the Arduino UNO, as shown in listing 11 below.
This subroutine takes the initialized serial object, the data, and the number of
digits per data value as inputs. It loops through the data, inserts a dollar sign
in front of the message, and sends the data to the relevant port. If an issue
occurs in the connection, the subroutine prints "Data Transmission Failed."
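Listing 11 is not shown in this copy; the following is a sketch of a transmitter that matches the description above (serial object, data list, digits per value, a leading dollar sign, and an error message on failure). The exact implementation in the thesis may differ.

def sendData(ser, data, digits):
    # Build a message like "$0010": '$' marker followed by each value
    # padded to the given number of digits
    message = "$" + "".join(str(value).zfill(digits) for value in data)
    try:
        ser.write(message.encode())
    except:
        print("Data Transmission Failed")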
The next step was to create a receiver function for the Arduino UNO to control
the components. This subroutine starts its operation by checking for the dollar
sign, as shown in listing 12 below.
#define numOfValsRec 4
#define digitsPerValRec 1
int valsRec[numOfValsRec];
int stringLength = numOfValsRec * digitsPerValRec + 1;
int counter = 0;
bool counterStart = false;
String receivedString;
void receiveData() {
  while (Serial.available()) {
    char c = Serial.read();
    if (c == '$') { counterStart = true; }
    if (counterStart) {
      // Remainder reconstructed from the description below: collect the
      // characters, then split the received string into the valsRec array
      if (counter < stringLength) { receivedString += c; counter++; }
      if (counter >= stringLength) {
        for (int i = 0; i < numOfValsRec; i++) {
          valsRec[i] = receivedString.substring(i * digitsPerValRec + 1, (i + 1) * digitsPerValRec + 1).toInt();
        }
        receivedString = ""; counter = 0; counterStart = false;
      }
    }
  }
}
As listing 12 shows, when the dollar sign is detected and the counter is less
than the string length, the function stores the incoming characters and
increments the counter. After that, it loops through the received data
elements and stores each of them in an array so that they can be used in the
code independently.
Firstly, the Arduino pin of each component was defined and set up as an input
or output. Then a new function was created to pass the received data to the
solenoid locks and LEDs, as shown in listing 13.
void unlock_solenoid() {
  digitalWrite(solenoid1Pin, valsRec[0]);
  digitalWrite(solenoid2Pin, valsRec[1]);
  digitalWrite(greenLed, valsRec[2]);
  digitalWrite(redLed, valsRec[3]);
}
Listing 13. The Arduino subroutine that sends digital values to the components
As listing 13 shows, the array filled by the receiver function in listing 12 was
used to get each signal element and assign it to the components.
Overall, there are three main functions in the code that loop all the time, as
shown in listing 14.
void loop() {
receiveData();
unlock_solenoid();
oled();
}
Listing 14. The Looping process of the functions
The first function is to receive the data from the Jetson Nano. The second
one is the function above to pass data to the components. Finally, the last
function is to display the status message on the OLED display according to
the data and the distance from the ultrasonic sensor.
The web page was created using HTML, CSS, and JavaScript. The first step
was to create a login interface for the web page, which can be seen in figure
42.
After a successful login, the Firebase configuration is used to access the
data, and the web page displays it, as shown in figure 43.
5 Conclusion
The goal of the project was to build a facial recognition system that could
recognize human faces, log information into the database, and unlock the
door.
The thesis project was executed in three steps. In the first step, machine
learning and deep learning algorithms were used to recognize faces and send
the data to the Google database. In the second step, the AI data was
transmitted to the electronic components and sensors to make a smart lock
system. Finally, the last step was to design a web page that requires a login
and displays the attendance list.
The project’s result was accomplished as expected: the prototype could
successfully recognize human faces and activate the electronic components.
It performed quickly and could log information about recognized people in the
Google database.
This prototype can be used on office doors to identify employees, open the
door, and send the boss an attendance list that displays each employee’s name
and entry time. A future improvement of the prototype could be to implement
more extensive algorithms to distinguish printed pictures from real faces in
front of the camera. Such algorithms would make the prototype faster, more
secure, and suitable for commercial purposes.
References
6. Towards Data Science. What are the types of machine learning? [online].
URL: https://towardsdatascience.com/what-are-the-types-of-machine-learning-e2b9e5d1756f
Accessed on: 15.10.2021