
Proc. of the International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME 2023)
19-20 July 2023, Tenerife, Canary Islands, Spain

Hybrid Deepfake Detection Utilizing MLP and LSTM

Jacob Mallet, Department of Computer Science, University of Wisconsin – Eau Claire, Eau Claire, WI, USA, [email protected]
Natalie Krueger, Department of Computer Science, University of Wisconsin – Eau Claire, Eau Claire, WI, USA, [email protected]
Dr. Mounika Vanamala, Department of Computer Science, University of Wisconsin – Eau Claire, Eau Claire, WI, USA, [email protected]
Dr. Rushit Dave, Department of Computer Information Science, Minnesota State University, Mankato, Mankato, MN, USA, [email protected]

Abstract—Society's reliance on social media for authentic information has done nothing but increase over the past years, which has only raised the potential consequences of the spread of misinformation. One method growing in popularity is to deceive users through the use of a deepfake. A deepfake is a recent invention, enabled by the latest technological advancements, which allows nefarious online users to replace a person's face with a computer-generated, synthetic face, including those of numerous powerful members of society. Deepfake images and videos now provide the means to mimic important political and cultural figures and to spread massive amounts of false information. Models that are able to detect these deepfakes and prevent the spread of misinformation are now of tremendous necessity. In this paper, we propose a new deepfake detection schema utilizing two deep learning algorithms: long short-term memory and multilayer perceptron. We evaluate our model using a publicly available dataset named 140k Real and Fake Faces to detect images altered by a deepfake, with accuracies achieved as high as 74.7%.

Keywords—Deepfake, Machine Learning, Fake Image Detection, Long Short-Term Memory, Multilayer Perceptron

I. INTRODUCTION

In the modern world, digital media has a large impact on public opinion, especially media that originates from well-known people such as politicians or celebrities. Deepfakes can take advantage of this impact and use it for malicious purposes. A deepfake is a digitally created photo or video of a person in which it is not really them, but an altered version of them. Deepfake technology has progressed to the point that almost anyone can easily impersonate someone else without their permission. This has allowed many people to maliciously create fake photos and videos of well-known public figures, painting them in a negative light or making it seem as if they are saying or doing something that they have not. This media can spread rapidly and cause public outrage or confusion when the deepfake is realistic enough to trick the average person. This is a prominent reason why research is needed to develop ways of detecting deepfakes accurately, helping to stop the spread of malicious media and to create a more informed public.

This paper provides a new method of deepfake image detection that uses two different machine learning algorithms. Machine learning has been shown to be effective when used for image classification [1], user authentication [2-13], and other security functions [14-16], which is evidence that it can also be an effective method for detecting deepfakes. The algorithms tested in this study are the Long Short-Term Memory network (LSTM) and the Multilayer Perceptron (MLP). Both methods have been shown to produce accurate results when used for deepfake detection [17-19]. The dataset we use to test these algorithms is called 140k Real and Fake Faces [20], a publicly available dataset retrieved from Kaggle. This dataset consists of 70,000 images from each of two different datasets: Flickr-Faces-HQ [21], which contains entirely real faces, and the Deepfake Detection Challenge dataset [22], which contains deepfake faces created using a style-based Generative Adversarial Network (GAN). The novel contribution of this study is the testing of the two mentioned algorithms on their ability to classify real and fake images, all on the same dataset.

II. RELATED WORK

A. LSTM

In one study [17], researchers use a convolutional LSTM-based residual network, CLRNet, to detect deepfakes. This method focuses on deepfake videos rather than images, detecting the inconsistencies between frames of a video. It also uses a convolutional LSTM to overcome the lack of spatial information recorded by other LSTM methods; this includes 3D tensors that record two dimensions of spatial information. Sets of five frames are taken from videos in multiple datasets, resized, and put through data augmentation methods before being evaluated by the algorithm. CLRNet is compared to several current methods on three different tests of transfer learning. The method performs best, with an accuracy of 97.18%, when a single source dataset and a single target dataset are used. CLRNet has been shown to be a superior architecture when compared to previous baseline models and provides a step towards improved future deepfake detection.

B. CNN

Another study [18] features a Convolutional Neural Network (CNN) used alongside a Vision Transformer (ViT) to create a Convolutional Vision Transformer method. This study also works to classify deepfake videos. First, faces are extracted from the videos and the data is augmented to prepare for classification. The next step, performed by a CNN with seventeen convolutional layers, is feature learning, which involves extracting learnable features from the selected faces. Each layer of the CNN has three to four convolutional operations, and the final product is a feature map of the inputted face images. This map is then passed through the ViT, which uses an encoder to classify the faces as real or fake. The researchers split multiple premade datasets into training, validation, and testing sets, and in an experiment, thirty facial images are evaluated at a time. The proposed model reached a maximum accuracy of 93% and performed similarly to compared models. One issue with this experiment stemmed from the face detection software used in preprocessing, which misclassified many images. The study concludes that the model has room for improvement in multiple ways, but the high accuracy shows that CLRNet is a good example of a generalized model.

C. MLP

Researchers in [19] used a hybrid of a CNN and an MLP to classify deepfake videos. Both algorithms are used separately and in different ways. Initially, frames from a video are input into a CNN, which performs automatic feature extraction. However, before the MLP is used, data must be extracted from the frames with facial landmark detection, which is done by a pre-trained detector. The data includes the number of eye blinks and shape features such as eye and nose width and lip size. This data is normalized and passed as input to the MLP, which uses two layers to produce an initial classification. Once both the MLP and the CNN have produced output, their outputs are combined and linked to an activation layer and a connected neural layer, which produce the final result: a classification of the deepfake.

An experiment is conducted using 199 fake videos and 53 real videos from a premade dataset, and 66 real videos from a separate set. The model was compared to another that only includes a CNN. Results showed that the testing accuracy of the hybrid model reached 87%, with the CNN model at 74%. The proposed model also provides a faster training period. An important note is that the CNN model overfits earlier than the proposed model, which resulted in a drop in its test accuracy. This model provides a base for future work on classifying deepfakes quickly and with few computational resources.

III. METHODOLOGY

A. Classifier Background

This section details the background of the two classifiers used in this study, LSTM and MLP, whose results are reported in Tables I-IV as well as Figs. 1 and 2. The Multilayer Perceptron is a deep-learning algorithm that is repeatedly used in classification contexts due to its efficiency in this domain. MLP is known as a feed-forward neural network, meaning the data flows through the model advancing from the input layer to the output layer, never retreating. Neurons named perceptrons compose the MLP algorithm, with each perceptron receiving n features and assigning a weight to each individual feature.

During the training process of an MLP classifier, a supervised learning approach called backpropagation is used. This technique attempts to minimize error within the model and trains using gradient descent. During training, the weights associated with each feature are updated, which identifies and prioritizes the input features that the algorithm finds most important. MLP is a versatile deep learning algorithm, in that it has the ability to approximate both linear and more general continuous functions. This enables the algorithm to be applied in several different contexts, including image classification.
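To make these training mechanics concrete, here is a minimal NumPy sketch (not code from the paper) of one gradient-descent update for a single sigmoid perceptron under a binary cross-entropy loss; the sample count, feature count, and learning rate are arbitrary assumptions.

import numpy as np

# Minimal sketch of one backpropagation step for a single sigmoid neuron.
# Shapes and the learning rate are illustrative, not taken from the paper.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))          # 8 samples, 4 input features
y = rng.integers(0, 2, size=(8, 1))  # binary labels (real vs. deepfake)

w = rng.normal(size=(4, 1))          # one weight per feature
b = np.zeros((1, 1))
lr = 0.1

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

pred = sigmoid(X @ w + b)                                        # forward pass
loss = -np.mean(y * np.log(pred) + (1 - y) * np.log(1 - pred))   # binary cross-entropy

grad_z = (pred - y) / len(X)                  # gradient of the loss w.r.t. the pre-activation
w -= lr * (X.T @ grad_z)                      # gradient-descent weight update
b -= lr * grad_z.sum(axis=0, keepdims=True)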
Long Short-Term Memory is another type of artificial neural network and deep learning algorithm. LSTM is very similar to a Recurrent Neural Network (RNN), in that both have the ability to process sequences of data in addition to individual data points, like most deep learning algorithms. One major issue with traditional RNNs stems from their failure to leverage and remember data over the long term. LSTM attempts to solve this issue by placing more of an emphasis on remembering long-term dependencies.

During the training process of an LSTM, the primary goal the network strives for is to minimize loss as much as possible. To do so, the gradient of the loss with respect to the network's weights is computed, and those weights are continually adjusted to produce the lowest loss possible. For each time step, data flows through an input gate, where the features that the network identifies as meaningful are extracted.

The remaining data is then assessed to evaluate its long-term relevance, and any features identified as relevant are scaled and added to the cell state. The data continues to flow into an output gate, where it is determined whether the cell state is ready to make a prediction or needs to be fed back into the LSTM network for further refinement. After traversing the data, the model is ready to compile and make predictions.
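The gate mechanics described above can be illustrated with a minimal NumPy sketch of a single LSTM time step in its standard formulation; this is not code from the paper, and the layer sizes are illustrative.

import numpy as np

# One LSTM time step in the standard formulation (illustrative sizes).
def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """x_t: input at time t; h_prev, c_prev: previous hidden and cell state.
    W, U, b hold the stacked parameters for the four gates."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    z = W @ x_t + U @ h_prev + b                   # all four gate pre-activations at once
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # forget, input, output gates
    g = np.tanh(g)                                  # candidate cell update
    c_t = f * c_prev + i * g                        # keep long-term info, add new info
    h_t = o * np.tanh(c_t)                          # exposed hidden state
    return h_t, c_t

n_in, n_hidden = 8, 16
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)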
B. System Overview

This section overviews both models evaluated in our research, starting with the preprocessing steps taken. Both of our models use data with the same preprocessing methodology applied. First, images are read using an image data generator. Grayscale conversion, resizing, and zooming were applied to the images, primarily to focus on the person of interest's face and to trim out some of the background. These images are then output and converted to a three-dimensional array, which is fed directly into each model.
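A minimal Keras sketch of this preprocessing pipeline is shown below. The directory path, target image size, and batch size are assumptions rather than details given in the paper, and zoom_range is used as the closest built-in option to the 20% central zoom described in the Dataset subsection.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Sketch of the described preprocessing: grayscale, resizing, zooming, and a
# 75/25 split. The directory layout, target size, and batch size are assumed.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # scale pixel values to [0, 1]
    zoom_range=0.2,           # zoom toward the face region (paper describes a 20% central zoom)
    validation_split=0.25,    # 75/25 train-test style split
)

train_gen = datagen.flow_from_directory(
    "140k_real_and_fake_faces/",   # hypothetical dataset directory
    target_size=(128, 128),        # resize images (size assumed)
    color_mode="grayscale",        # convert to grayscale
    class_mode="binary",           # real vs. deepfake
    batch_size=64,
    subset="training",
)

test_gen = datagen.flow_from_directory(
    "140k_real_and_fake_faces/",
    target_size=(128, 128),
    color_mode="grayscale",
    class_mode="binary",
    batch_size=64,
    subset="validation",
)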
To expand on our model based on the LSTM classifier, we use just two layers. The first is an LSTM layer whose activation function is set to sigmoid. The next and final layer is a dense output layer that provides the prediction for this model. Finally, we compile the model using a binary cross-entropy loss function to place each prediction into one class or the other: a deepfake or an authentic image.
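A Keras sketch consistent with this description follows; the number of LSTM units, the optimizer, and the treatment of each grayscale image as a sequence of pixel rows are assumptions that the paper does not specify.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Two-layer model as described: an LSTM layer with a sigmoid activation and a
# dense sigmoid output, compiled with binary cross-entropy. Unit count, input
# shape, and optimizer are assumed.
lstm_model = Sequential([
    LSTM(64, activation="sigmoid", input_shape=(128, 128)),
    Dense(1, activation="sigmoid"),
])
lstm_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

In practice, each grayscale image from the generator would be squeezed from (rows, columns, 1) to a (rows, columns) sequence before being passed to this model.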
Moving on to our model using an MLP classifier, here we use four layers in total. The first layer flattens the input data in order to advance into our first fully connected dense layer, which uses a sigmoid activation function. This dense layer is essentially repeated, with the difference being that half of the units of the previous layer are used. After these two fully connected layers, the final layer, serving as the output layer, is a dense layer with a sigmoid activation function applied here again. The same compilation process as in the previous model is applied to our MLP-based model as well.
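A Keras sketch matching this four-layer description is given below; the concrete unit counts and input shape are assumptions, since the paper only specifies that the second hidden layer uses half the units of the first.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

# Four-layer MLP as described: flatten, two fully connected sigmoid layers
# (the second with half the units of the first), and a sigmoid output layer.
# The starting unit count and input shape are assumed.
mlp_model = Sequential([
    Flatten(input_shape=(128, 128, 1)),
    Dense(128, activation="sigmoid"),
    Dense(64, activation="sigmoid"),      # half the units of the previous layer
    Dense(1, activation="sigmoid"),       # output: deepfake vs. authentic
])
mlp_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])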

C. Dataset

The dataset used to evaluate our models in this study is called 140k Real and Fake Faces [20]. This dataset merges two separate datasets, namely Flickr-Faces-HQ [21] and the Deepfake Detection Challenge dataset [22], to create an expansive and robust set of images to benchmark our model. 70,000 images come from the Flickr-Faces-HQ dataset, all of which are real faces of different people. Another 70,000 images are taken from the Deepfake Detection Challenge dataset, which contains deepfakes generated by a style GAN [20]. This style GAN has the ability to learn authentic facial features and patterns and then apply that knowledge in generating deepfakes. Using a dataset that contains an equivalent number of deepfakes and real faces is essential for training our model without biasing it in one direction. We split these images into a 75/25 train-test split. All of the images are preprocessed using the steps detailed earlier and fed to an image data generator to obtain an output capable of being learned by the neural networks. Within the image data generator process, the images are zoomed in centrally by 20% to further isolate the face within the image.

D. Ethical Disclaimer

On top of the safety concerns regarding misinformation that come with the use of deepfakes, research in this area comes with an abundance of ethical issues in itself. When a deepfake is used, whether with malicious intent or not, a real person's face is used as the source for generating the deepfake, unless the face is entirely computer generated. A key aspect of the data used in this study is that it satisfies the latter scenario, meaning all of our deepfakes are completely computer-generated faces. Therefore, none of the synthetic faces were created with the intent of taking a real person's face and applying their likeness to someone else. In addition to the deepfakes included in this study, 70,000 real faces are used as well. These images containing authentic faces were captured with consent from the users. On no occasion were synthetic or real faces used without permission granted from the participants allowing their likeness to be used in the creation of their respective datasets.

E. Performance Metrics

To measure the performance of the two models, we use six success metrics, namely accuracy, precision, recall, F1 Score, Area Under the Curve (AUC), and a confusion matrix. Perhaps the most widely used metric, accuracy, evaluates the percentage of the time the model correctly identifies whether a face is a deepfake or real. The second metric recorded in this study is precision. Precision gives us an idea of the rate at which our model correctly identifies positives, or a real face in our case. That is, precision looks at all of the positive predictions and identifies the percentage that were actually real faces. Recall also deals with positive predictions, but accounts for the rate at which real images were correctly identified. This time, all the real faces are gathered, and the rate at which they were correctly classified as real faces represents the recall. Recall can also be referred to as the true positive rate (TPR). The third metric we use, F1 Score, is a balance between precision and recall. Another metric used as a benchmark, the Area Under the Curve, is the area underneath the receiver operating characteristic (ROC) curve. The ROC curve plots the TPR, or recall, against the false positive rate (FPR). The FPR identifies the rate at which a deepfake is incorrectly identified as a real face. Again, the AUC is the area underneath the curve that results from these two parameters. The final success metric used to evaluate our models is a confusion matrix. A confusion matrix has four quadrants in a two-by-two table. The upper left quadrant of the matrix represents the number of true positives from our model, similar to recall. The upper right quadrant displays the number of false positives predicted, similar to the FPR. The lower left quadrant represents the number of false negatives; this is the number of times the model predicted that a real image was a deepfake. The final, lower right quadrant tallies the number of true negatives, which is the model successfully identifying a deepfake in our use case.
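All six of these metrics are available in scikit-learn; the sketch below uses synthetic stand-in labels and scores purely to illustrate the calls that would be applied to the test-split predictions.

import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

# Illustrative stand-ins: in practice y_true would be the test labels and
# y_prob the model's sigmoid outputs on the 25% test split.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_prob = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, size=1000), 0, 1)
y_pred = (y_prob >= 0.5).astype(int)               # threshold the sigmoid output

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print("auc      :", roc_auc_score(y_true, y_prob))  # computed from the raw scores
print(confusion_matrix(y_true, y_pred))             # scikit-learn orders this as [[TN, FP], [FN, TP]]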
IV. RESULTS AND DISCUSSION

Many conclusions can be drawn by analyzing the results produced by the deep learning algorithms used in this study. In [1], Convolutional Neural Network (CNN) and Support Vector Machine (SVM) algorithms are used on the same dataset, and those results can be used as a benchmark for the algorithms used in this study. A comprehensive view of all metrics across all algorithms can be seen in Table I. One of the most significant conclusions found within our results is the performance of SVM, and especially CNN, compared to MLP and LSTM. CNN outperformed MLP and LSTM in every metric used to evaluate the algorithms.

TABLE I. RESULTS FOR ALL ALGORITHMS
Algorithm | Accuracy | Precision | Recall | F1 Score
LSTM      | 74.7%    | 71.8%     | 81.4%  | 76.3%
MLP       | 68%      | 69%       | 61%    | 64%
CNN       | 88.3%    | 89.9%     | 86.3%  | 88.1%
SVM       | 81.7%    | 84.8%     | 77.2%  | 80.8%

As for the models developed in this study, LSTM yielded slightly higher results in each metric observed in comparison to MLP. Using LSTM and MLP resulted in accuracies of 74.7% and 68%, respectively. A comparison of accuracies between the four algorithms evaluated can be seen in Figure 2. Regarding precision, both algorithms produced very similar results, with LSTM at 71.8% and MLP at 69%. One big disparity found between the two algorithms was in the sensitivity, or recall. Recall was the lowest metric yielded by MLP at 61%, while it was the highest for LSTM at 81.4%. This 20.4% difference shows a great contrast in the two algorithms' ability to identify true positives, meaning that when LSTM is given a real face, it is able to classify the image correctly at a much higher rate than MLP. An in-depth comparison of recall and precision for each algorithm can be seen in Figure 1.

Fig. 1. Comparing Precision and Recall (bar chart of precision and recall scores, in %, for LSTM, MLP, CNN, and SVM).

Fig. 2. Comparing Accuracies Across Algorithms (bar chart of accuracy scores, in %, for LSTM, MLP, CNN, and SVM).

When looking at the confusion matrices, MLP actually has a higher raw number of true positives, but its number of false negatives is over double the total observed in LSTM's confusion matrix, which provides further context to the recall disparity observed. A lower recall in comparison to precision, as MLP exhibits, would indicate the model is classifying too many real images as deepfakes. LSTM has the opposite issue: the model has a lower precision compared to recall, so the algorithm incorrectly classifies deepfakes as real images at a higher rate. LSTM produced a higher F1 Score than MLP, with the algorithms producing scores of 76.3% and 64%, respectively.

When a curve is graphed to represent the true and false positives, we achieved an AUC of 74.7% for LSTM and 66% for MLP. These AUCs show the ability our models have to effectively distinguish between deepfake and real images. Finally, our confusion matrices can be seen in Table III and Table IV. As noted earlier, the most glaring difference between the two algorithms can be seen in the bottom two quadrants, displaying the false negatives and true negatives. This leads us to conclude that LSTM is more proficient than MLP in correctly identifying deepfakes. The two algorithms perform similarly for true and false positive results, but the biggest takeaway is LSTM's superior ability to classify deepfakes compared to MLP.

TABLE II. AUC FOR ALL ALGORITHMS
Algorithm | LSTM  | MLP | CNN   | SVM
AUC       | 74.7% | 66% | 88.3% | 81.7%

TABLE III. CONFUSION MATRIX FOR LSTM
          Real    | Deepfake
Real      11,895  | 5,605
Deepfake  3,261   | 14,239

TABLE IV. CONFUSION MATRIX FOR MLP
          Real    | Deepfake
Real      12,642  | 4,858
Deepfake  6,876   | 10,624
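As a quick arithmetic check (a sketch that assumes the rows of Table III are ground-truth labels and the columns are predictions), the headline LSTM figures in Table I can be recomputed from its confusion matrix; the reported precision and recall are recovered when the deepfake class is scored as the positive label.

# Recomputing the LSTM metrics from Table III (assumed layout: rows = actual,
# columns = predicted, ordered Real then Deepfake).
real_real, real_fake = 11_895, 5_605      # actual Real row
fake_real, fake_fake = 3_261, 14_239      # actual Deepfake row

total = real_real + real_fake + fake_real + fake_fake    # 35,000 test images
accuracy = (real_real + fake_fake) / total                # ~0.747, matches Table I

# Scoring the deepfake class as positive reproduces the reported 71.8% / 81.4% / 76.3%:
precision_deepfake = fake_fake / (fake_fake + real_fake)  # ~0.718
recall_deepfake = fake_fake / (fake_fake + fake_real)     # ~0.814
f1_deepfake = (2 * precision_deepfake * recall_deepfake
               / (precision_deepfake + recall_deepfake))  # ~0.763
print(accuracy, precision_deepfake, recall_deepfake, f1_deepfake)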
V. LIMITATIONS

One limitation of this study is the cost of using this model in the real world. To detect deepfakes in the wild, every image would have to be tested, and this amount of testing would be extremely costly given the number of deepfakes being created and distributed every day. Another limitation that frequently affects deepfake detection studies is the generalizability and transferability of a model. When a model is trained on a single dataset, it can be difficult to apply the model to real-world situations or use it on other datasets without a major drop in performance. This limitation also applies to this research. Using a training dataset that contains a diverse collection of deepfakes made using multiple methods is something that could minimize this problem in future studies.

VI. CONCLUSION & FUTURE WORK

Through this paper, we have introduced a new method of deepfake detection for use with images. We used a dataset consisting of both real and fake images, 140k Real and Fake Faces. We tested two different algorithms, the Long Short-Term Memory network and the Multilayer Perceptron. Through these models, we achieved an accuracy of up to 74.7% with the LSTM algorithm.
Although these results are a good indicator that this model can accurately detect deepfake images, there is still a need to make more progress if it is to be used in real-world situations. It was previously stated that a model trained on only one dataset is likely to experience a drop in performance in the real world. Increasing the ability of our model to move towards realistic applications is a necessary next step. We can accomplish this by training and testing our method on different, more robust datasets to create a smoother transition into real-world use. We also plan to test our model with other algorithms to find the best performing machine learning algorithms possible.

ACKNOWLEDGMENT

Funding for this project has been provided by the University of Wisconsin-Eau Claire's Office of Research and Special Programs Summer Research Experience Grant.

REFERENCES

[1] Mallet, J., Dave, R., Seliya, N., & Vanamala, M. (2022). Using Deep Learning to Detecting Deepfakes. arXiv preprint arXiv:2207.13644.
[2] Mallet, J., Pryor, L., Dave, R., Seliya, N., Vanamala, M., & Sowells-Boone, E. (2022, March). Hold On and Swipe: A Touch-Movement Based Continuous Authentication Schema based on Machine Learning. In 2022 Asia Conference on Algorithms, Computing and Machine Learning (CACML) (pp. 442-447). IEEE.
[3] Tee, W. Z., Dave, R., Seliya, N., & Vanamala, M. (2022, March). Human Activity Recognition models using Limited Consumer Device Sensors and Machine Learning. In 2022 Asia Conference on Algorithms, Computing and Machine Learning (CACML) (pp. 456-461). IEEE.
[4] Tee, W. Z., Dave, R., Seliya, J., & Vanamala, M. (2022, May). A Close Look into Human Activity Recognition Models using Deep Learning. In 2022 3rd International Conference on Computing, Networks and Internet of Things (CNIOT) (pp. 201-206). IEEE.
[5] Siddiqui, N., Dave, R., Vanamala, M., & Seliya, N. (2022). Machine and Deep Learning Applications to Mouse Dynamics for Continuous User Authentication. Machine Learning and Knowledge Extraction, 4(2), 502-518.
[6] Pryor, L., Mallet, J., Dave, R., Seliya, N., Vanamala, M., & Boone, E. S. (2022). Evaluation of a User Authentication Schema Using Behavioral Biometrics and Machine Learning. arXiv preprint arXiv:2205.08371.
[7] Deridder, Z., Siddiqui, N., Reither, T., Dave, R., Pelto, B., Seliya, N., & Vanamala, M. (2022). Continuous User Authentication Using Machine Learning and Multi-Finger Mobile Touch Dynamics with a Novel Dataset. arXiv preprint arXiv:2207.13648.
[8] Mason, J., Dave, R., Chatterjee, P., Graham-Allen, I., Esterline, A., & Roy, K. (2020). An investigation of biometric authentication in the healthcare environment. Array, 8, 100042.
[9] Gunn, D. J., et al. (2019). Touch-based active cloud authentication using traditional machine learning and LSTM on a distributed tensorflow framework. International Journal of Computational Intelligence and Applications, 18(04), 1950022.
[10] Shelton, J., et al. (2018). Palm Print Authentication on a Cloud Platform. In 2018 International Conference on Advances in Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa (pp. 1-6). IEEE. doi: 10.1109/ICABCD.2018.8465479.
[11] Siddiqui, N., Pryor, L., & Dave, R. (2021). User authentication schemes using machine learning methods—a review. In Proceedings of International Conference on Communication and Computational Technologies: ICCCT 2021. Springer Singapore.
[12] Mallet, J., et al. (2022). Hold on and swipe: a touch-movement based continuous authentication schema based on machine learning. In 2022 Asia Conference on Algorithms, Computing and Machine Learning (CACML). IEEE.
[13] Pryor, L., et al. (2021). Machine learning algorithms in user authentication schemes. In 2021 International Conference on Electrical, Computer and Energy Technologies (ICECET). IEEE.
[14] Vanamala, M., Gilmore, J., Yuan, X., & Roy, K. (2020, December). Recommending Attack Patterns for Software Requirements Document. In 2020 International Conference on Computational Science and Computational Intelligence (CSCI) (pp. 1813-1818). IEEE.
[15] Mounika, V., Yuan, X., & Bandaru, K. (2019, December). Analyzing CVE database using unsupervised topic modelling. In 2019 International Conference on Computational Science and Computational Intelligence (CSCI) (pp. 72-77). IEEE.
[16] Vanamala, M., Yuan, X., & Roy, K. (2020, August). Topic modeling and classification of Common Vulnerabilities And Exposures database. In 2020 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD) (pp. 1-5). IEEE.
[17] Tariq, S., Lee, S., & Woo, S. S. (2020). A convolutional LSTM based residual network for deepfake video detection. arXiv preprint arXiv:2009.07480.
[18] Wodajo, D., & Atnafu, S. (2021). Deepfake video detection using convolutional vision transformer. arXiv preprint arXiv:2102.11126.
[19] Kolagati, S., Priyadharshini, T., & Rajam, V. M. A. (2022). Exposing deepfakes using a deep multilayer perceptron–convolutional neural network model. International Journal of Information Management Data Insights, 2(1), 100054.
[20] Xhlulu. (2020, February 10). 140k Real and Fake Faces. Kaggle. https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces
[21] Karras, T., & Hellsten, J. (2021). Flickr-Faces-HQ Dataset. GitHub. https://github.com/NVlabs/ffhq-dataset
[22] Tunguz, B. Deepfake Detection Challenge. Kaggle. https://www.kaggle.com/c/deepfake-detection-challenge/discussion/121173
