Deepfake Detection System Using Deep Neural Networks

2024 2nd International Conference on Computer, Communication and Control (IC4)

Deepfake Detection System Using Deep Neural Networks

979-8-3503-8793-3/24/$31.00 ©2024 IEEE | DOI: 10.1109/IC457434.2024.10486659

Dr. Madan Lal Saini, Arnav Patnaik, Dr. Mahadev, Dayal Chandra Sati, Dr. Ratish Kumar

Department of CSE, Apex Institute of Technology
Chandigarh University, Punjab, India

Abstract— The rapid progress in technology and automation has enabled sophisticated manipulation of multimedia content, blurring the line between real and fabricated media. Deepfake technology, driven by deep learning and Generative Adversarial Networks (GANs), creates hyper-realistic fake content, with applications spanning video games, films, and advertising. However, this technology also carries substantial societal risks, fostering misinformation and explicit content. To mitigate these concerns, this paper presents a deepfake detection system that utilizes deep neural networks to discern genuine from forged images. Frames are extracted from videos, and face detection and face cropping are performed. LSTM and ResNext CNN are utilized to generate a feature vector. The proposed system uses the Anvil platform to design the front end and Visual Studio and Jupyter Notebook for the back end. A publicly available dataset was used to train and test the model. The proposed model achieved 86% accuracy on a video dataset.

Keywords— Deepfake, Fake contents, Deep Neural Networks, Generative Adversarial Networks (GANs), Anvil Platform.

I. INTRODUCTION

A. Problem Overview

A deepfake is a deceptive piece of media, whether a video or an image, crafted using advanced deep learning and machine learning techniques. Typically, deepfakes involve superimposing a person's appearance, voice, and behaviour onto fictional scenarios that never actually took place, resulting in misleading content [1]. The consequences of this technology can be deeply distressing, particularly when it is used to create revenge porn, which can inflict intense emotional turmoil on the victims, leading to feelings of anger, guilt, paranoia, depression, and, tragically, even thoughts of suicide.

Moreover, deepfake technology allows for the effortless swapping of faces in videos or images, a capability frequently exploited for nefarious purposes. Malicious actors use this technology to fabricate videos featuring individuals we trust, aiming to deceive others into falling for scams, divulging sensitive information, or even parting with their hard-earned money.

B. How do Deep Fakes Work?

Over time, there has been a significant surge in the volume of data accessible to the public through various online platforms, including social media. This wealth of information has become a valuable resource. Simultaneously, rapid advancements in machine learning and deep learning, particularly through technologies like Generative Adversarial Networks, have made it remarkably simple to create convincingly realistic fake content [2].

Various deepfake techniques exist for altering the appearances of individuals; three of the most frequently encountered methods are identity swapping, expression swapping, and the creation of entirely new faces. Machine learning plays a pivotal role in the creation of deepfakes, with these techniques rooted in deep learning methods and neural networks [3]. Fig. 1 illustrates the process of deep learning, which serves as the foundation for generating deepfake content. The initial step in the deepfake creation process is training a neural network, which falls under the broader umbrella of machine learning. This training gives the model a comprehensive understanding of a person's appearance in images or videos under various conditions, such as different angles and lighting scenarios. Machine learning further divides the model used in deepfake creation into two distinct phases: training and testing. During model training, a dataset is employed to generate fake videos, while in parallel, detection mechanisms are developed to identify these fabricated videos [4].

C. Deep Neural Networks

A deep neural network, comprising numerous layers of nodes, is designed to execute complex operations on the provided input. A convolutional neural network (CNN), in turn, specializes in image recognition and plays a crucial role in tasks such as deepfake detection, which involves analyzing images and video frames individually. CNNs are fundamental components of contemporary artificial intelligence systems [5]. The cornerstone of their functionality lies in convolution operations, where an image, represented as a two-dimensional array of numbers, is convolved with a


Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on October 19,2024 at 06:31:06 UTC from IEEE Xplore. Restrictions apply.
kernel image. The kernel, which is likewise an array of numbers, is flipped before this operation. The convolution process proceeds by sliding the flipped kernel over the image, row by row and column by column, multiplying and accumulating the elements within each matching patch. These kernels serve as feature detectors, and CNNs adapt them to enhance their performance. In essence, a CNN is a deep-learning algorithm tailored to process images. It assigns adjustable weights and biases to elements within an image, allowing it to distinguish and separate objects effectively [6]. The term "deepfakes" refers to highly realistic, AI-generated videos and images that depict individuals engaging in actions and utterances that never transpired. These deepfakes are the outcome of Generative Adversarial Networks (GANs), in which two competing neural networks jointly produce authentic-looking multimedia. Detecting these synthetic creations can be challenging because they incorporate genuine footage and convincing audio.

Fig. 1. Flowchart of Deep Learning process

Fig. 2. Deep Learning Network

D. Convolution Process

The efficacy of deepfake media primarily relies on the application of Generative Adversarial Networks (GANs). GANs adopt an unsupervised learning paradigm known as "generative modelling," which entails recognizing and assimilating patterns within input data, enabling the model to generate novel instances that closely resemble those from the original dataset. GANs train such a generative model by framing the problem as a supervised learning task with two distinct sub-models. The first sub-model, the generator, is trained to produce fresh instances, while the second, the discriminator, is dedicated to distinguishing authentic samples from counterfeit ones. Through iterative training, these two sub-models converge, until the generator produces samples that resemble those from the genuine dataset closely enough to challenge human discernment. Furthermore, convolution plays a pivotal role in image processing, and an alternative perspective is to regard it not merely as a feature detector but as a replicator [7].

E. How are datasets created for deepfake detection?

To identify deepfake videos, researchers create datasets containing a mix of authentic and manipulated content from various sources. Compiling a deepfake detection dataset involves extensive data collection and manipulation. Typically, such datasets encompass high-resolution images of diverse individuals displaying a wide range of poses and facial expressions. Kaggle offers a notable dataset for deepfake detection, comprising a total of 100,000 clips derived from 3,426 paid actors who have been subjected to various deepfake techniques. Some datasets are initially sourced from genuine videos on platforms like YouTube and then undergo deepfake generation through various methods. Examples of these include the Face Forensics dataset and the Celeb-DF dataset [8], both of which are constructed from real YouTube videos. In addition, datasets like "Deepfake in the Wild" [9] are curated from deepfake content available online. Furthermore, "Deepfake Timit" [10] employs face swap-GAN techniques, while the Google Deepfake Detection dataset was built by hiring actors and actresses to create a real video dataset, which subsequently became the basis for generating deepfake content.

II. LITERATURE REVIEW

Yang et al. [11] address the detection of synthetic content, such as deepfake videos and images, through the analysis of inconsistent head poses. In this context, a 3D pose transforms world coordinates into the corresponding camera coordinates, with 'R' representing a 3x3 rotation matrix and 't' a 3x1 translation vector. Estimating the 3D head pose means solving the reverse problem, that is, recovering 'R' and 't' from 2D image coordinates. The model employs a Support Vector Machine (SVM) for synthetic content detection, which is trained to distinguish between fake and authentic data based on disparities between the head poses derived from a comprehensive set of facial landmarks and those derived from the central face region only. The dataset used for training and testing includes both real and deepfake photos and videos. Accuracy and complexity are the model's assessment metrics.

In the context of deepfake detection, neural networks like MesoNet, specifically the Meso4 model, come into play [12]. This convolutional model comprises four initial layers of repeated convolutions and pooling, followed by a dense network with one hidden layer. To improve robustness, the fully connected layers incorporate Dropout for regularisation, while ReLU activation functions introduce non-linearity, and Batch Normalization regulates their output to counter the vanishing gradient problem. With a total of 27,977 trainable parameters, this compact network serves as a tool for deepfake identification.
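
The parameter count quoted for Meso4 can be checked with simple arithmetic. The sketch below is illustrative only: the layer sizes (a 256x256 RGB input, four convolution blocks with 8, 8, 16 and 16 filters, kernel sizes 3, 5, 5 and 5, pooling factors 2, 2, 2 and 4, and a 16-unit hidden layer) are assumptions taken from the publicly available MesoNet reference design [12], not from the text above, and the helper name is hypothetical.

```python
# Layer sizes are ASSUMPTIONS based on the MesoNet reference design [12];
# the surrounding text only states the total of 27,977 trainable parameters.
INPUT_SIZE, INPUT_CHANNELS = 256, 3
# (filters, kernel_size, pool_size) for the four convolution blocks
CONV_BLOCKS = [(8, 3, 2), (8, 5, 2), (16, 5, 2), (16, 5, 4)]
HIDDEN_UNITS = 16


def meso4_trainable_params():
    """Count trainable parameters layer by layer (hypothetical helper)."""
    total = 0
    channels, size = INPUT_CHANNELS, INPUT_SIZE
    for filters, kernel, pool in CONV_BLOCKS:
        # Convolution: one kernel per (input channel, filter) pair, plus biases.
        total += kernel * kernel * channels * filters + filters
        # Batch normalization: a learned scale and shift per filter.
        total += 2 * filters
        channels = filters
        # With 'same' padding only the pooling step shrinks the feature map.
        size //= pool
    flat = size * size * channels
    total += flat * HIDDEN_UNITS + HIDDEN_UNITS  # hidden dense layer
    total += HIDDEN_UNITS + 1                    # single output unit
    return total


print(meso4_trainable_params())
```

Under these assumptions the convolution blocks contribute 11,560 parameters and the two dense layers 16,417, which together reproduce the 27,977 figure; most of the parameters sit in the first dense layer rather than in the convolutions.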

Balde et al. [13] proposed a method for improving the facial recognition of FaceNet. By employing facial recognition technology to examine video frames, the model discriminates between authentic and altered content. The work uses FaceNet, a facial recognition system, as a starting point. FaceNet maps facial features to points in a multidimensional space, where the proximity of two points indicates how similar one face is to another. A plotted histogram conveys the final verdict on video authenticity and sheds light on the model's level of confidence in its prediction. The evaluation of this model primarily considers complexity and accuracy. However, the absence of a graphical user interface makes it a plain text-based program, and because the final output is presented only as a plot, the result can be hard to interpret.

Tariq et al. [14] employed Long Short-Term Memory (LSTM), a type of recurrent neural network, to identify videos falling into the category of "deepfakes." Deep learning techniques such as LSTM and ResNext CNN were utilized to generate a feature vector. The system architecture of this model entails the uploading of videos, followed by data preprocessing, which includes data splitting. The deepfake detection model incorporates ResNext feature extraction and LSTM video classification. Following the detection phase, the model's performance is assessed using a confusion matrix, and a final label indicates whether the video is authentic (original) or synthetic (fake). To bring this system to life, a website was created using Django, a high-level Python web framework known for facilitating swift development and clean code practices. Lal and Saini [15] performed a study on deepfake identification techniques using deep learning. Table 1 presents a summary of the contributions made by various researchers.

III. PROPOSED SYSTEM

The proposed system aims to develop a method for the detection of deepfake videos. The methodology for identifying deepfakes is outlined as follows:

Step 1: Obtain a deepfake dataset from Kaggle, which will be refined by incorporating real-world samples. This refinement process enhances the quality of the dataset, particularly in terms of feature selection.

Step 2: Develop a system using the Meso-4 algorithm, which is based on deep learning and neural networks. This algorithm will be employed for training and testing on the refined dataset.

Step 3: To present the analysis effectively, a graphical user interface (GUI) has been created. This GUI displays the results in the form of a bar chart, offering users a clear visual representation of the growth of deepfakes and their variations based on confidence scores. The Anvil platform, a Python-based drag-and-drop web app builder, was utilized to design the GUI.
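
Step 3 of the proposed methodology describes a bar chart driven by confidence scores. As a minimal sketch of the data preparation such a chart might need, the snippet below groups scores into equal-width bins whose counts could serve as bar heights. The function name, the bin count and the example scores are all hypothetical, and the paper does not say how its chart is actually computed.

```python
def confidence_histogram(scores, bins=5):
    """Count scores in [0, 1] per equal-width bin (hypothetical helper)."""
    counts = [0] * bins
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
        # A score of exactly 1.0 belongs to the last bin, not a new one.
        index = min(int(s * bins), bins - 1)
        counts[index] += 1
    return counts


# Made-up confidence scores, purely for illustration.
frame_scores = [0.05, 0.92, 0.88, 0.40, 0.97, 1.0]
print(confidence_histogram(frame_scores))
```

Each returned count corresponds to one bar, so a plotting component only has to label the bins (here 0.0-0.2, 0.2-0.4, and so on) and draw the heights.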

Fig. 3. System Architecture of proposed system

TABLE 1 COMPARISON OF DIFFERENT WORKS

Authors | Purpose of study | Findings | Technique used | Dataset used
Yang X, Li Y and Lyu S [11] | The primary focus is on the examination of deepfake materials through the assessment of inconsistent head pose estimation. This particular method involves the utilization of a classifier such as SVM. | Used to detect synthetic video by making use of face recognition. | This paper is to employ deep learning techniques, including ResNext and LSTM, for the detection of video deepfakes through transfer learning. | Paperswithcode.com/dataset/faceforensics-1
Piyush Chandra [16] | Deep Learning Head Pose estimation | Deep Learning | ResNext, LSTM | kaggle.com/code/piyush357/deepfake-detection/input
D. H. Choi, H. J. Lee [18] | Fake images and video detection | Detection of fake images and videos | Detection of fake videos | FaceForensics++, DFDC, Celeb-DF

IV. IMPLEMENTATION

The system flowchart of Deep Fake Analyzer is depicted in Fig. 4. This flowchart is divided into two components: the backend and the front end. The backend comprises datasets containing both genuine and deepfake images of individuals, as well as the weights of the Convolutional Neural Network (CNN) layers; the learning process adjusts these weights to extract the features necessary for classification. The Python code is organized into sections for preprocessing, model building, training, and prediction.

A. Experimental Setup

Implementing the proposed system requires only a standard system configuration. The proposed system used Windows 11, an AMD Ryzen 7 processor, an 8GB NVIDIA GPU, 16GB RAM, and TensorFlow version 2.9.
B. Dataset

The proposed system used a publicly available dataset from Kaggle [16], which contains 50 short videos. These videos are pre-processed and frames are extracted. The frames are resized to a standard resolution, and data augmentation is applied to increase the dataset size.

C. Model Architecture

The proposed system used the pre-trained MesoNet deep neural network for feature extraction and classification. Meso-4 consists of four layers of convolutions and pooling, followed by a dense network with a single hidden layer. One more layer was added to classify the images as fake or real.

D. Training and Fine-Tuning the Model

The dataset was split into training, test, and validation sets. The model was fine-tuned on this dataset to achieve better performance.

Fig. 4. Back End and Front End of Proposed System

Anvil, a web application development platform, was employed for crafting the front end of our system. The front-end component is divided into two main sections: the Homepage and the Analysis section. The Homepage features an upload button that allows users to submit datasets or specific images, for which the system then predicts the confidence score. In the development of this system, we made use of several software tools, including Jupyter Notebook, Visual Studio Code, and Anvil.

V. RESULT AND DISCUSSION

The primary challenge faced in this research work is the insufficient availability of datasets for authenticating both genuine and deepfake images. To enhance the model's predictive accuracy, it is imperative to expand the dataset. This issue is significant because it hinders the model's ability to detect deepfake images prevalent in the real world, primarily because the majority of available deepfake datasets originate from adult content websites or a specific subset thereof. Addressing this challenge requires a more extensive and diverse dataset featuring images of various types and individuals from different ethnic backgrounds. Several strategies can be explored to tackle this issue. One approach involves establishing a platform for users to contribute their images. Alternatively, collaboration with prominent image platforms like Meta, which hosts a wide range of user-generated images, could yield valuable datasets for research purposes. Another avenue is to amalgamate publicly accessible datasets of both deepfake and genuine images into a unified resource.

Another noteworthy concern pertains to the limitations of deepfake analysis as a comprehensive solution for mitigating the proliferation of synthetic content on the internet. Our findings suggest that the ultimate solution lies in implementing content verification mechanisms across various internet platforms. In practice, this may involve major IT corporations collectively integrating identification protocols into their platforms when publishing content. The implementation of the developed model and its results are illustrated in Fig. 5, Fig. 6 and Fig. 7.

Fig. 5. Deep Fake Analyzer’s Web Application

Fig. 6. Deepfake Detected by the App
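
Since the system works on frames extracted from videos and reports a confidence score, one plausible way to arrive at a single verdict per video is to average the per-frame scores and compare the result with a threshold. The sketch below illustrates that idea only. The averaging rule, the 0.5 threshold, the convention that a score is the estimated probability of a frame being fake, and the function name are all assumptions; the paper does not specify how the app combines its predictions.

```python
from statistics import mean


def video_verdict(frame_scores, threshold=0.5):
    """Return (label, mean score) for a video from per-frame fake probabilities.

    ASSUMPTION: each score is the estimated probability that a frame is fake.
    """
    if not frame_scores:
        raise ValueError("at least one frame score is required")
    score = mean(frame_scores)
    label = "fake" if score >= threshold else "real"
    return label, score


# Made-up scores for two hypothetical videos.
print(video_verdict([0.9, 0.8, 0.7]))
print(video_verdict([0.1, 0.2]))
```

Averaging smooths out single misclassified frames; a stricter design could instead flag a video as soon as a fixed fraction of frames exceeds the threshold.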

Fig. 7. Real Image Detected by the App

The deepfake detection system was trained and tested using a video dataset from which frames are extracted. The model achieves an accuracy of 86%, particularly when applied to synthetic datasets.

VI. CONCLUSION

Deep learning has emerged as a prominent force in the realm of artificial intelligence, and image-related challenges have played a pivotal role in this progress. Thanks to recent advancements in convolutional neural networks, there is a prevailing belief that AI will soon surpass human capabilities in most conventional vision-based tasks. This development carries profound implications for society, information dissemination, and the media, especially in light of AI's newfound ability to generate convincingly deceptive images and videos. Our proposed model, employing the Meso4 architecture and trained on a publicly available dataset of videos, has demonstrated an 86% accuracy in detecting deepfakes. Future work will focus on improving the accuracy of deepfake video detection.

REFERENCES

[1] Deepfakes: Trick or Treat. ScienceDirect. https://www.sciencedirect.com/science/article/abs/pii/S0007681319301600.
[2] Korshunov, P., & Marcel, S. Deepfakes: A new threat to face recognition, assessment and detection. arXiv.org, available online at https://arxiv.org/abs/1812.08685.
[3] H. A. Khalil and S. A. Maged, "Deepfakes Creation and Detection Using Deep Learning," 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), Cairo, Egypt, 2021, pp. 1-4, doi: 10.1109/MIUCC52538.2021.9447642.
[4] S. S. Chauhan, N. Jain, S. C. Pandey and A. Chabaque, "Deepfake Detection in Videos and Picture: Analysis of Deep Learning Models and Dataset," 2022 IEEE International Conference on Data Science and Information System (ICDSIS), Hassan, India, 2022, pp. 1-5, doi: 10.1109/ICDSIS55133.2022.9915885.
[5] Raza A, Munir K, Almutairi M. A Novel Deep Learning Approach for Deepfake Image Detection. Applied Sciences. 2022; 12(19):9820. https://doi.org/10.3390/app12199820
[6] Güera, David, and Edward J. Delp. "Deepfake video detection using recurrent neural networks." 2018 15th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, 2018.
[7] K. Bansal, S. Agarwal and N. Vyas, "Deepfake Detection Using CNN and DCGANS to Drop-Out Fake Multimedia Content: A Hybrid Approach," 2023 International Conference on IoT, Communication and Automation Technology (ICICAT), Gorakhpur, India, 2023, pp. 1-6, doi: 10.1109/ICICAT57735.2023.10263628.
[8] https://github.com/yuezunli/celeb-deepfakeforensics
[9] https://github.com/deepfakeinthewild/deepfake-in-the-wild
[10] https://www.idiap.ch/en/scientific-research/data/deepfaketimit
[11] Li, Y., Yang, X., Sun, P., Qi, H., & Lyu, S. (2020). Celeb-df: A large-scale challenging dataset for deepfake forensics. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3207-3216).
[12] D. Afchar, V. Nozick, J. Yamagishi, and I. Echizen, “MesoNet: A compact facial video forgery detection network,” 10th IEEE Int. Work. Inf. Forensics Secur. WIFS 2018, 2018, doi: 10.1109/WIFS.2018.8630761
[13] Balde, Ansh Abhay, Abhay Jain, and Dipti Patra. "Improving facial recognition of FaceNet in a small dataset using DeepFakes." 2020 4th International Conference on Computer, Communication and Signal Processing (ICCCSP). IEEE, 2020.
[14] Tariq, Shahroz, Sangyup Lee, and Simon S. Woo. "A convolutional lstm based residual network for deepfake video detection." arXiv preprint arXiv:2009.07480 (2020).
[15] Kavita Lal, Madan Lal Saini; A study on deep fake identification techniques using deep learning. AIP Conf. Proc. 15 June 2023; 2782 (1): 020155. https://doi.org/10.1063/5.0154828
[16] https://www.kaggle.com/code/piyush357/deepfake-detection/notebook
[17] https://www.kaggle.com/competitions/deepfake-detection-challenge/data
[18] D. H. Choi, H. J. Lee, S. Lee, J. U. Kim and Y. M. Ro, "Fake Video Detection With Certainty-Based Attention Network," 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 2020, pp. 823-827, doi: 10.1109/ICIP40778.2020.9190655
[19] Tolosana, R., Vera-Rodriguez, R., Fierrez, J., Morales, A. and Ortega-Garcia, J. Deepfakes and beyond: A Survey of face manipulation and fake detection. Information Fusion, (2020), [online] 64, pp.131–148. Available at: https://arxiv.org/pdf/2001.00179.pdf.
[20] Nguyen, H. M., & Derakhshani, R. Eyebrow recognition for identifying deepfake videos. In 2020 international conference of the biometrics special interest group (BIOSIG) (pp. 1-5). IEEE.
