Phase 1
Report On
“DEEPFAKE DETECTION SYSTEM”
Submitted in partial fulfilment of the requirement for the award of the degree of
BACHELOR OF ENGINEERING
IN
ARTIFICIAL INTELLIGENCE & DATA SCIENCE
Submitted by
Jayashree N 1KG22AD024
Manoj B 1KG22AD036
Niveditha P 1KG22AD041
Tejas J 1KG22AD060
CERTIFICATE
Certified that the Project Work Phase I (BAD685) entitled “DEEPFAKE DETECTION
SYSTEM” is a bonafide work carried out by:
Jayashree N 1KG22AD024
Manoj B 1KG22AD036
Niveditha P 1KG22AD041
Tejas J 1KG22AD060
in partial fulfilment for VI semester B.E., Project Work in the branch of Artificial Intelligence
& Data Science prescribed by Visvesvaraya Technological University, Belagavi during the
academic year 2024-2025. It is certified that all the corrections and suggestions indicated for
internal assessment have been incorporated. The Project Work Phase I (BAD685) report has
been approved as it satisfies the academic requirements in respect of project work prescribed
for the Bachelor of Engineering degree.
DECLARATION
We, the undersigned students of 6th semester, Artificial Intelligence & Data Science, KSSEM,
declare that our Project Work entitled “Deepfake Detection System” is a bonafide work of ours.
Our project is neither a copy nor in any way a modification of any other engineering project.
We also declare that this project has not been submitted to any other university in the past,
that it remains our only submission, and that it will not be submitted by us to any other
university in the future.
Place: Bengaluru
Date:
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task are
incomplete without mentioning the individuals to whom we are greatly indebted, whose
guidance and facilities have served as a beacon of light and crowned our efforts with
success.
We would like to express our gratitude to our MANAGEMENT, K.S. School of Engineering and
Management, Bengaluru, for providing excellent infrastructure and for all the kindness extended
to us in carrying out this project work at the college.
We would like to express our gratitude to Dr. K.V.A Balaji, CEO, K.S. School of Engineering
and Management, Bengaluru, for his valuable guidance.
We would like to express our gratitude to Dr. K. Rama Narasimha, Principal, K.S. School of
Engineering and Management, Bengaluru, for his valuable guidance.
We would like to extend our gratitude to Dr. Manjunath T K, Professor and Head, Department of
Artificial Intelligence & Data Science, for providing excellent facilities and all the support
extended to us in carrying out this project work successfully.
We would also like to thank our Project Coordinator Mrs. K. Padma Priya, Assistant Professor,
Artificial Intelligence & Data Science, for her help and support in carrying out the Project
Work successfully.
We are also thankful to the teaching and non-teaching staff of Artificial Intelligence & Data
Science, KSSEM for helping us in completing the Project Work.
Jayashree N (1KG22AD024)
Manoj B (1KG22AD036)
Niveditha P (1KG22AD041)
Tejas J (1KG22AD060)
ABSTRACT
Deepfake technology leverages advanced deep learning techniques to generate highly realistic
fake videos, images, and audio that are often indistinguishable from authentic content. While these
synthetic media offer creative applications in fields like entertainment, education, and
accessibility, they also present significant threats including misinformation, identity theft, and
digital fraud. As deepfakes continue to evolve in quality and accessibility, the need for effective
and reliable detection systems becomes increasingly urgent.
This project presents a detailed literature survey and comparative analysis of forty recent research
papers focusing on machine learning and deep learning-based deepfake detection techniques. The
selected studies explore a range of models, including Support Vector Machines (SVM), Decision
Trees, Convolutional Neural Networks (CNNs), EfficientNet, InceptionNet, and hybrid
architectures. Each method is analyzed in terms of its dataset usage, model architecture, accuracy,
limitations, and real-world applicability.
The goal of this phase is to understand the landscape of current detection methods, evaluate their
performance across different datasets, and identify existing research gaps. Based on the findings,
the project aims to implement or propose a robust, scalable, and generalizable deepfake detection
framework that can adapt to evolving manipulation techniques and support real-time media
verification.
TABLE OF CONTENTS
CERTIFICATE I
DECLARATION II
ACKNOWLEDGEMENT III
ABSTRACT IV
TABLE OF CONTENTS V
Chapter 1. INTRODUCTION 1
Chapter 2. LITERATURE SURVEY 2
2.1 Deepfake Detection using Inception-ResNetV2 2
2.2 DeepFake Videos Detection and Classification Using ResNeXt 3
and LSTM Neural Network
2.3 Facial Recognition for Deepfake Detection 4
2.4 A Comprehensive Overview of Deepfake: Generation, 4
Detection, Datasets, and Opportunities
2.5 Improving Deepfake Detection by Mixing Top Solutions of the 5
DFDC
2.6 Real-Time Face Transition Using Deepfake Technology (GAN 6
Model)
2.7 Wave-Spectrogram Cross-Modal Aggregation for Audio 7
Deepfake Detection
2.8 Spatial Vision Transformer: A Novel Approach to Deepfake 8
Video Detection
2.9 Robust and Generalized DeepFake Detection 9
2.13 AI-Based Deepfake Detection. 12
2.14 Motion Magnified 3D Residual-in-Dense Network for 12
DeepFake Detection.
2.15 Domain Generalization via Aggregation and Separation for 13
Audio Deepfake Detection.
2.16 Comparative Analysis on Different DeepFake Detection 13
Methods and Semi Supervised GAN Architecture for
DeepFake Detection.
2.17 Deepfake Detection: A Systematic Literature Review. 14
2.18 Deepfake Generation and Detection – An Exploratory Study. 15
2.19 An Effective Approach for Detecting Deepfake Videos Using 15
Long Short-Term Memory and ResNet
2.20 Artificial Intelligence into Multimedia Deepfakes Creation 16
and Detection.
2.21 DeepFake Video Detection Using Machine Learning and 17
Deep Learning Techniques.
2.22 Enhancing Deepfake Video Detection Performance with a 18
Hybrid CNN Deep Learning Model.
2.23 Comparative Analysis of Deepfake Video Detection Using 19
InceptionNet and EfficientNet.
2.24 Comparison of Different Machine Learning Algorithms for 20
Deep Fake Detection.
2.25 Analysis and Comparison of Deepfakes Detection Methods 22
for Cross-Library Generalisation
2.26 A Comprehensive Review on Fake Images/Videos Detection 23
Techniques.
2.27 Contemporary Cybersecurity Challenges in Metaverse Using 24
Artificial Intelligence.
2.28 Deepfake Disasters: A Comprehensive Review of 25
Technology, Ethical Concerns, Countermeasures, and
Societal Implications.
2.29 DeepFake Video Detection. 26
2.30 Unmasking the Illusions: A Comprehensive Study on 27
Deepfake Videos and Images.
2.31 A Heterogeneous Feature Ensemble Learning based Deepfake 29
Detection Method
2.32 Deepfake Detection in Videos and Picture: Analysis of Deep 30
Learning Models and Dataset
2.33 Deepfake Detection Using Deep Learning 30
2.34 Detecting Deepfakes: Training Adversarial Detectors with 31
GANs for Image Authentication
2.35 Detection of Deepfakes: Protecting Images and Videos 32
Against Deepfake
2.36 Div-Df: A Diverse Manipulation Deepfake Video Dataset 32
2.37 Model Attribution of Face-Swap Deepfake Videos 33
2.38 Review: DeepFake Detection Techniques using Deep Neural 34
Networks (DNN)
2.39 Deepfake Generation and Detection: A Survey 34
2.40 Deep Fake in Picture Using Convolutional Neural Network 35
Chapter 3 PROBLEM IDENTIFICATION 36
3.1 Problem statement
3.2 Project Scope
Chapter 4 GOALS AND OBJECTIVES 37
4.1 Project Goals
4.2 Project Objectives
Chapter 5 SYSTEM REQUIREMENT SPECIFICATION 38
5.1 Software Requirements
5.2 Hardware Requirements
Chapter 6 METHODOLOGY 39
6.1 Literature Survey
6.2 Dataset Collection and Preprocessing
6.3 Feature Extraction
6.4 Model Training and Evaluation 40
6.5 Result Analysis 40
6.6 Documentation and Reporting 40
Chapter 7 APPLICATIONS 41
REFERENCES 42
CHAPTER 1
INTRODUCTION
Deepfake technology, a portmanteau of "deep learning" and "fake," refers to the use of
advanced artificial intelligence techniques to create highly realistic synthetic videos, images,
or audio that can convincingly mimic real people. It utilizes deep learning models like
Generative Adversarial Networks (GANs) to manipulate or generate visual and auditory
content. While deepfakes offer creative applications in fields like filmmaking, education, and
accessibility, they also pose serious threats to security, privacy, and public trust.
With the emergence of easy-to-use tools such as FakeApp, DeepFaceLab, and FaceSwap, even
individuals without technical expertise can generate convincing fake media. These deepfakes
have been weaponized in various contexts, including fake political speeches, doctored celebrity
content, identity theft, and non-consensual pornography. Their widespread misuse has made
deepfake detection a critical issue in cybersecurity and digital forensics.
Early detection techniques relied on handcrafted features and traditional machine learning
algorithms like Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Decision
Trees. However, as deepfakes became more sophisticated, deep learning models—particularly
CNNs, InceptionNet, EfficientNet, and hybrid architectures—have shown greater success.
Despite this, a major limitation persists: many models excel on their training datasets but fail
to generalize to unseen, real-world data.
In this project, we conduct a comprehensive review of forty research papers, each exploring
different detection techniques, including ML classifiers, deep CNNs, and hybrid models
incorporating both spatial and temporal cues. We compare their performance on datasets like
FaceForensics++, DFDC, and CelebDF to evaluate accuracy, generalization, and feasibility.
This review aims to highlight current advancements, uncover research gaps, and guide future
development of scalable and reliable deepfake detection systems.
CHAPTER 2
LITERATURE SURVEY
2.1. The paper "Deepfake Detection using Inception-ResNetV2", presented at IEEE ICACFCT 2021,
provides a detailed study on frame-based fake video detection using deep learning. Key
highlights include:
• Problem Statement: With the rapid evolution of AI-generated facial manipulations,
detecting deepfakes has become increasingly challenging, as they closely resemble real
facial expressions and movements.
• Proposed System: The paper proposes a CNN-based classifier using Inception-ResNetV2
to extract features from facial frames in videos. It focuses on binary classification of frames
as real or fake using deep hierarchical features.
• Implementation Details:
o Video frames are extracted and resized to 299×299.
o Face alignment is performed before feeding the data into the network.
o The model is trained and tested on a deepfake dataset.
o The network uses multiple convolutional and residual layers for feature extraction.
• Technologies Used:
o Deep Learning Model: Inception-ResNetV2
o Tools/Libraries: TensorFlow, Keras, OpenCV
o Dataset: Not explicitly mentioned but aligned with standard deepfake benchmarks.
• Accuracy: The model achieves high accuracy (over 90%) in identifying deepfake frames
using visual cues.
• Advantages:
o Strong feature extraction capability.
o Performs well on high-quality face-level manipulations.
o Scalable and can be integrated with real-time systems
• Conclusion: The study concludes that CNNs like Inception-ResNetV2 are powerful in
detecting facial forgeries in videos. However, further research is required to improve
detection under occlusion and low-light conditions.
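A minimal Keras sketch of the frame-level pipeline outlined in this entry; the dataset directory layout, the frozen ImageNet-pretrained backbone, and the training settings are illustrative assumptions rather than details taken from the paper.

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionResNetV2

IMG_SIZE = (299, 299)  # frame size used in the paper

# Assumed layout: frames/real/*.jpg and frames/fake/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "frames/", image_size=IMG_SIZE, batch_size=32, label_mode="binary")

base = InceptionResNetV2(include_top=False, weights="imagenet",
                         input_shape=IMG_SIZE + (3,), pooling="avg")
base.trainable = False  # fine-tune only the classification head at first

model = models.Sequential([
    layers.Rescaling(1.0 / 127.5, offset=-1),  # Inception-style preprocessing
    base,
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),     # real (0) vs fake (1)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)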
2.2 The paper "DeepFake Videos Detection and Classification Using ResNeXt and LSTM
Neural Network" by Suman Patel, Saroj Kumar Chandra, and Amit Jain provides a detailed
study on deepfake video classification using spatial-temporal neural networks. Key highlights
include:
• Problem Statement: Deepfake videos pose severe threats to digital authenticity by
creating realistic face-swapped videos, which are hard to detect with the human eye or
traditional algorithms.
• Proposed System: The model integrates ResNeXt (a CNN) for extracting frame-level
features and LSTM (an RNN) for capturing temporal dependencies to detect deepfake
sequences effectively.
• Implementation Details:
o Extract video frames and apply preprocessing.
o ResNeXt extracts features from each frame.
o LSTM analyzes time-based dependencies.
o Network is trained using a binary classification objective.
• Technologies Used:
o CNN + RNN: ResNeXt and LSTM
o Tools: Python, TensorFlow/Keras
o Datasets: FaceForensics++, Celeb-DF, DFDC
• Accuracy: Performance improves with more epochs. The paper highlights a significant
drop in training loss and a consistent increase in model accuracy, although exact figures are
not given.
• Advantages:
o Effective at capturing spatial and temporal artifacts.
o Can detect face-swapping and reconstruction-based deepfakes.
o Robust with sequential data.
• Conclusion: Combining ResNeXt and LSTM provides improved deepfake video
classification. Future work should explore attention mechanisms and transformer-based
models for enhanced performance.
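The following PyTorch sketch illustrates the ResNeXt-plus-LSTM idea described above; the clip length, hidden size, and the use of torchvision's ResNeXt-50 backbone (the paper lists TensorFlow/Keras as tools) are illustrative assumptions.

import torch
import torch.nn as nn
from torchvision import models

class ResNeXtLSTM(nn.Module):
    def __init__(self, hidden_size=256):
        super().__init__()
        backbone = models.resnext50_32x4d(
            weights=models.ResNeXt50_32X4D_Weights.IMAGENET1K_V1)
        backbone.fc = nn.Identity()            # keep the 2048-d pooled features
        self.backbone = backbone
        self.lstm = nn.LSTM(2048, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 2)  # real vs fake

    def forward(self, clips):                  # clips: (batch, frames, 3, 224, 224)
        b, t, c, h, w = clips.shape
        feats = self.backbone(clips.view(b * t, c, h, w)).view(b, t, -1)
        _, (h_n, _) = self.lstm(feats)         # final hidden state summarises the clip
        return self.head(h_n[-1])

model = ResNeXtLSTM()
logits = model(torch.randn(2, 16, 3, 224, 224))  # two clips of 16 frames each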
2.3. The paper "Facial Recognition for Deepfake Detection" by Firaol Desta and Emily J
Brown provides a detailed study on using face embeddings for detecting altered or
impersonated identities. Key highlights include:
• Problem Statement: Deepfakes enable malicious impersonation on social media, which
can lead to privacy violations, fraud, and misinformation.
• Proposed System: The system uses the Python face_recognition library to compare
known and unknown face images to detect tampered or fake identities.
• Implementation Details:
o Create a folder of known celebrity faces.
o Prepare a separate folder of altered/unknown images.
o Compare faces using facial embeddings.
o Count matches to determine authenticity.
• Technologies Used:
o Tool: Python face_recognition library.
o Libraries: Dlib, OpenCV.
o Dataset: Custom image dataset of known/unknown faces.
• Accuracy: Effective in detecting obvious modifications or disguises, but limited against
high-quality deepfakes.
• Advantages:
o Easy to implement.
o Efficient for image-level manipulation detection.
o Ideal for small-scale verification tasks.
• Conclusion: While basic facial recognition is useful for flagging identity mismatches, it is
insufficient alone for detecting modern deepfakes. Future systems must combine visual,
temporal, and contextual cues.
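A small sketch of the known-versus-unknown comparison described in this entry, using the face_recognition library; the folder names and the matching tolerance are illustrative assumptions.

import os
import face_recognition

def load_encodings(folder):
    encodings = []
    for name in os.listdir(folder):
        image = face_recognition.load_image_file(os.path.join(folder, name))
        faces = face_recognition.face_encodings(image)
        if faces:                              # keep only images where a face was found
            encodings.append(faces[0])
    return encodings

known = load_encodings("known_faces")          # trusted reference images
unknown = load_encodings("unknown_faces")      # possibly altered images

for idx, candidate in enumerate(unknown):
    matches = face_recognition.compare_faces(known, candidate, tolerance=0.6)
    verdict = "matches a known identity" if any(matches) else "no match (possible fake)"
    print(f"unknown image {idx}: {verdict}")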
2.5. The paper "Improving Deepfake Detection by Mixing Top Solutions of the DFDC"
by Anis Trabelsi, Marc Michel Pic, and Jean-Luc Dugelay provides a detailed study on model
ensembling for robust deepfake detection. Key highlights include:
• Problem Statement: Top-performing models in the Deepfake Detection Challenge
(DFDC) often overfit and fail on unseen data, highlighting a lack of generalization.
• Proposed System: The paper investigates assembling the top five DFDC models using
boosting, bagging, and stacking to create a more robust and generalizable detector.
• Implementation Details:
o Collect predictions from different models.
o Apply ensemble strategies to merge outputs.
• Technologies Used:
o Ensemble Methods: Bagging, Boosting, Stacking.
o Frameworks: PyTorch, TensorFlow.
o Datasets: DFDC, Celeb-DF, Google DFD, FaceForensics++.
• Accuracy:
o +2.26% improvement in accuracy.
o +41% improvement in log-loss over single models.
• Advantages:
o High generalization.
o Combines strengths of diverse models.
o Scalable to other detection frameworks.
• Conclusion: Ensembling top solutions yields better performance than individual models.
The study encourages dynamic, modular fusion strategies for practical deepfake detection.
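The ensemble strategies summarized above can be sketched as follows; the probability arrays are placeholders for the outputs of five base detectors, and simple averaging and a logistic-regression stacker are shown as illustrative examples of bagging- and stacking-style fusion.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows = videos, columns = fake-probabilities from five base detectors.
val_preds = np.random.rand(200, 5)             # predictions on a held-out validation set
val_labels = np.random.randint(0, 2, 200)      # ground-truth labels for that set
test_preds = np.random.rand(50, 5)             # predictions on unseen videos

# 1) Plain averaging of the base models' probabilities (bagging-style fusion).
avg_scores = test_preds.mean(axis=1)

# 2) Stacking: a meta-classifier learns how much to trust each base model.
stacker = LogisticRegression()
stacker.fit(val_preds, val_labels)
stacked_scores = stacker.predict_proba(test_preds)[:, 1]

print(avg_scores[:5], stacked_scores[:5])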
2.6. The paper "Improving Deepfake Detection by Mixing Top Solutions of the DFDC"
by Anis Trabelsi, Marc Michel Pic, and Jean-Luc Dugelay provides a detailed study on model
ensembling for robust deepfake detection. Key highlights include:
• Problem Statement: Top-performing models in the Deepfake Detection Challenge
(DFDC) often overfit and fail on unseen data, highlighting a lack of generalization.
• Proposed System: The paper investigates assembling the top five DFDC models using
boosting, bagging, and stacking to create a more robust and generalizable detector.
• Implementation Details:
o Collect predictions from different models.
o Apply ensemble strategies to merge outputs.
o Evaluate performance on DFDC and unseen video samples.
• Technologies Used:
o Ensemble Methods: Bagging, Boosting, Stacking.
o Frameworks: PyTorch, TensorFlow.
o Datasets: DFDC, Celeb-DF, Google DFD, FaceForensics++.
• Accuracy:
o +2.26% improvement in accuracy.
o +41% improvement in log-loss over single models.
• Advantages:
o High generalization.
o Combines strengths of diverse models.
2.7. The paper "Improving Deepfake Detection by Mixing Top Solutions of the DFDC"
by Anis Trabelsi, Marc Michel Pic, and Jean-Luc Dugelay provides a detailed study on model
ensembling for robust deepfake detection. Key highlights include:
• Problem Statement: Top-performing models in the Deepfake Detection Challenge
(DFDC) often overfit and fail on unseen data, highlighting a lack of generalization.
• Proposed System: The paper investigates assembling the top five DFDC models using
boosting, bagging, and stacking to create a more robust and generalizable detector.
• Implementation Details:
o Collect predictions from different models.
o Apply ensemble strategies to merge outputs.
o Evaluate performance on DFDC and unseen video samples.
• Technologies Used:
o Ensemble Methods: Bagging, Boosting, Stacking.
o Frameworks: PyTorch, TensorFlow.
o Datasets: DFDC, Celeb-DF, Google DFD, FaceForensics++.
• Accuracy:
o +2.26% improvement in accuracy.
o +41% improvement in log-loss over single models.
• Advantages:
o High generalization.
o Combines strengths of diverse models.
o Scalable to other detection frameworks.
• Conclusion: Ensembling top solutions yields better performance than individual models.
The study encourages dynamic, modular fusion strategies for practical deepfake detection.
2.8. The paper "Spatial Vision Transformer: A Novel Approach to Deepfake Video Detection"
provides a detailed study on a hybrid CNN-transformer architecture for deepfake video
detection. Key highlights include:
• Problem Statement: With the increasing sophistication of deepfake generation techniques,
detecting forgeries in videos has become more challenging. Traditional CNN-based
methods often fall short in modeling long-range dependencies and generalizing across
diverse manipulation types.
• Proposed System: The authors propose a hybrid deepfake detection model named Spatial
Vision Transformer (SViT), which integrates CNN-based ConvBlocks and SCConv
(Spatial and Channel Reconstruction Convolution) blocks with a Vision Transformer
architecture. This model is designed to capture both local (spatial) and global (contextual)
features effectively from face regions in video frames.
• Implementation Details:
o Videos are preprocessed using frame extraction, face detection (BlazeFace), and
cropping with post-processing.
o Frames are resized to 224×224 pixels.
o The architecture includes 5 ConvBlocks and 2 SCConvBlocks for feature extraction.
o These features are embedded into patches and passed through a transformer encoder
with multi-head self-attention.
o The output is processed through an MLP head for binary classification (real or fake).
• Technologies Used:
o Model Architecture: SViT (ConvBlocks + SCConv + Vision Transformer)
o Tools: Python, PyTorch, OpenCV, BlazeFace, MTCNN
o Dataset: Subset of DFDC (Deepfake Detection Challenge) dataset
• Accuracy: The SViT model achieved an accuracy of 93.92%, AUC of 94.79%, and an
F1 Score of 93.01%, outperforming baseline models like CViT and DSViT with fewer
parameters (79.9M vs. CViT’s 88.9M).
• Advantages:
o Enhanced feature representation via SCConv
o High accuracy with reduced false positives
o Efficient architecture with optimized preprocessing
o Balanced trade-off between complexity and performance
• Conclusion: The paper concludes that the Spatial Vision Transformer model provides
superior detection capabilities by combining CNN and transformer strengths. It addresses
the limitations of earlier models like CViT and DSViT, making it suitable for real-time or
resource-constrained deployment. Future research aims to test SViT on more sophisticated
deepfake techniques and larger, diverse datasets.
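A simplified Keras sketch of the CNN-plus-transformer idea behind SViT; the plain convolutional stem (standing in for the paper's ConvBlocks and SCConv blocks), the single encoder block, and the layer sizes are illustrative simplifications.

import tensorflow as tf
from tensorflow.keras import layers

inputs = layers.Input(shape=(224, 224, 3))

# Convolutional stem producing a 28x28x256 feature map.
x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(inputs)
x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Conv2D(256, 3, strides=2, padding="same", activation="relu")(x)

# Flatten spatial positions into a sequence of patch tokens.
tokens = layers.Reshape((28 * 28, 256))(x)

# One transformer encoder block: self-attention + feed-forward with residuals.
attn = layers.MultiHeadAttention(num_heads=4, key_dim=64)(tokens, tokens)
x = layers.LayerNormalization()(tokens + attn)
ff = layers.Dense(512, activation="gelu")(x)
ff = layers.Dense(256)(ff)
x = layers.LayerNormalization()(x + ff)

# MLP head for binary real/fake classification.
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])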
2.9. The paper "Improving Deepfake Detection by Mixing Top Solutions of the DFDC" by
Anis Trabelsi, Marc Michel Pic, and Jean-Luc Dugelay provides a detailed study on model
ensembling for robust deepfake detection. Key highlights include:
• Problem Statement: Top-performing models in the Deepfake Detection Challenge
(DFDC) often overfit and fail on unseen data, highlighting a lack of generalization.
• Proposed System: The paper investigates assembling the top five DFDC models using
boosting, bagging, and stacking to create a more robust and generalizable detector.
• Implementation Details:
o Collect predictions from different models.
o Apply ensemble strategies to merge outputs.
o Evaluate performance on DFDC and unseen video samples.
• Technologies Used:
o Ensemble Methods: Bagging, Boosting, Stacking.
o Frameworks: PyTorch, TensorFlow.
o Datasets: DFDC, Celeb-DF, Google DFD, FaceForensics++.
• Accuracy:
o +2.26% improvement in accuracy.
o +41% improvement in log-loss over single models.
• Advantages:
o High generalization.
o Combines strengths of diverse models.
o Scalable to other detection frameworks.
• Conclusion: Ensembling top solutions yields better performance than individual models.
The study encourages dynamic, modular fusion strategies for practical deepfake detection.
2.10. The paper "Real-Time Face Transition using Deepfake Technology (GAN Model)"
by Shubham Tandon, Aryan Vig, Harish Chandra Kumawat, and Murli Kartik provides a
detailed study on lightweight GAN architectures for real-time face animation. Key
highlights include:
• Problem Statement: Deepfake videos use advanced face-swapping and generation
techniques to create highly realistic fake content that can deceive both humans and
traditional detection algorithms. Detecting these manipulations requires understanding not
just frame-level features but also the temporal consistency across frames.
• Proposed System: The study proposes a hybrid model combining ResNeXt-50, a
convolutional neural network, for spatial feature extraction, with LSTM (Long Short-Term
Memory) networks to capture temporal dependencies between video frames. The dual-
architecture is designed to leverage spatial and sequential patterns to detect manipulations
more effectively.
• Implementation Details:
o Video frames are extracted and preprocessed for consistent input.
o ResNeXt-50 extracts frame-level features from individual video frames.
o The sequence of feature vectors is passed into an LSTM to analyze frame-to-frame
dynamics.
o The model is trained with binary classification labels: real or fake.
• Technologies Used:
o Deep Learning Architecture: ResNeXt-50 + LSTM
o Tools: Python, TensorFlow/Keras
o Datasets: Celeb-DF, FaceForensics++, DFDC
• Accuracy: The hybrid ResNeXt-LSTM model achieved an accuracy of 93.5%,
demonstrating strong performance in identifying both spatial artifacts and temporal
inconsistencies typical of deepfake content.
• Advantages:
o Combines powerful spatial and temporal modeling.
o Performs well across multiple datasets and deepfake types.
o Suitable for sequential data such as videos.
• Conclusion: The paper highlights the effectiveness of combining CNN and RNN
architectures for deepfake detection. The integration of ResNeXt and LSTM enables the
model to analyze both visual quality and temporal coherence.
2.11. The paper “ABC-CapsNet: Attention-based Cascaded Capsule Network for Audio
Deepfake Detection” by Taiba Majid Wani presents a novel approach for detecting audio
deepfakes with high precision using capsule networks and attention mechanisms. Key
highlights include:
• Problem Statement: Traditional CNN-based audio deepfake detectors suffer from loss of
spatial hierarchy and temporal information, resulting in limited performance against
sophisticated attacks.
• Proposed System: Introduces ABC-CapsNet which combines Mel spectrograms with
VGG18-based feature extraction, followed by a cascaded capsule network and attention
layer for robust classification.
• Implementation Details:
o Input: Mel spectrograms from audio samples.
o Features extracted using VGG18.
o Attention layer highlights critical segments.
o Capsule network captures hierarchical features.
o Evaluated on ASVspoof2019 and FoR datasets.
• Technologies Used: Python, VGG18, Capsule Networks, Attention Mechanisms.
• Accuracy: Achieved EER of 0.06% (ASVspoof2019) and 0.04% (FoR).
• Advantages: Maintains spatial hierarchy, improves generalization, reduces EER, and
surpasses CNN limitations.
• Conclusion: ABC-CapsNet effectively detects manipulated audio, offering high accuracy
across varied datasets and establishing a new benchmark in audio deepfake detection.
2.12. The paper “Audio Deepfake Detection Using Deep Learning” by R. Anagha focuses
on detecting audio deepfakes using CNNs applied on Mel spectrograms. Key highlights
include:
• Problem Statement: Audio deepfakes are increasingly used for impersonation and
misinformation, necessitating robust detection systems.
• Proposed System: A CNN-based deep learning architecture that processes Mel
spectrograms of audio files to distinguish real vs. fake samples.
• Implementation Details:
o Dataset: ASVspoof2019.
o Mel spectrograms generated and used as input.
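The Mel-spectrogram front end used by both 2.11 and 2.12 can be sketched with librosa as follows; the file name and spectrogram parameters are illustrative assumptions.

import librosa
import numpy as np

y, sr = librosa.load("sample.wav", sr=16000)            # mono waveform at 16 kHz
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)          # decibel-scaled spectrogram

# Normalise to [0, 1] and add batch/channel axes so a 2-D CNN can consume it.
log_mel = (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-8)
cnn_input = log_mel[np.newaxis, ..., np.newaxis]        # shape: (1, n_mels, frames, 1)
print(cnn_input.shape)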
2.13. The paper “AI-Based Deepfake Detection” by Aditi Garde explores physiological cue-
based deepfake detection focusing on eye-blinking irregularities. Key highlights include:
• Problem Statement: GANs often fail to mimic natural human traits like eye blinking,
which can be exploited for detection.
• Proposed System: A deepfake detection method based on detecting unnatural eye blinking
using CNN and SVM.
• Implementation Details:
o Extract video frames.
o Detect eyes and analyze blinking patterns.
o Train classifiers (SVM and CNN) to detect irregularities.
• Technologies Used: Python, OpenCV, CNN, SVM.
• Advantages: Innovative use of involuntary physiological behavior; works on low-
resolution content.
• Conclusion: Eye-blinking detection offers an efficient and interpretable method for real-
time deepfake detection in videos.
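A rough OpenCV sketch of the blink-rate cue described above; the Haar-cascade eye detector and the frame-counting heuristic are illustrative stand-ins for the CNN/SVM classifiers used in the paper.

import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

cap = cv2.VideoCapture("suspect_video.mp4")
blink_frames, total_frames = 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    total_frames += 1
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        eyes = eye_cascade.detectMultiScale(gray[y:y + h, x:x + w])
        if len(eyes) == 0:                 # eyes briefly disappear during a blink
            blink_frames += 1
cap.release()

# An unnaturally low blink ratio can hint at GAN-generated faces.
print("blink-frame ratio:", blink_frames / max(total_frames, 1))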
2.15. The paper “Domain Generalization via Aggregation and Separation for Audio
Deepfake Detection” by Yuankun Xie proposes a model that generalizes better across unseen
audio deepfake datasets. Key highlights include:
• Problem Statement: Audio deepfake detectors often overfit to the training dataset and fail
to detect new types of fake audio.
• Proposed System: ASDG method that aggregates real speech and separates fake samples
using adversarial and triplet loss learning.
• Implementation Details:
o Uses Lightweight CNN for feature generation.
o Adversarial training on real data only.
o Triplet loss increases separation between fake types.
• Technologies Used: TensorFlow, Python, multiple English audio datasets.
• Accuracy: Up to 39.24% reduction in Equal Error Rate (EER) vs. baseline.
• Advantages: Strong generalization to unseen domains; improves robustness without
needing new spoofed data.
• Conclusion: ASDG offers a scalable and adaptive audio deepfake detection approach
suitable for rapidly evolving real-world threats.
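The separation objective used by ASDG can be illustrated with a triplet loss, as in the PyTorch sketch below; the small embedding network and the random batch tensors are placeholders for real and spoofed speech features.

import torch
import torch.nn as nn

embedder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
triplet = nn.TripletMarginLoss(margin=1.0)

# Placeholder mini-batch of 16 feature vectors per role.
anchor = embedder(torch.randn(16, 128))      # real speech from one domain
positive = embedder(torch.randn(16, 128))    # real speech from another domain
negative = embedder(torch.randn(16, 128))    # spoofed/fake speech

loss = triplet(anchor, positive, negative)   # pull real together, push fake away
loss.backward()
print(float(loss))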
2.16. The paper “Comparative Analysis on Different DeepFake Detection Methods and
Semi Supervised GAN Architecture for DeepFake Detection” by Jerry John and Bismin V.
Sherif presents a comparative overview of detection methods and proposes a semi-supervised
GAN-based model. Key highlights include:
• Problem Statement: Deepfakes, particularly with combined visual and voice
manipulations, are difficult to detect manually. There is a need to identify optimal detection
techniques across varied scenarios.
2.17. The paper “Deepfake Detection: A Systematic Literature Review” by Md Shohel Rana
provides a comprehensive survey of over 100 papers on deepfake detection. Key highlights
include:
• Problem Statement: The rapid development of deepfake generation techniques has
outpaced detection methods, necessitating a structured review to map current
advancements.
• Proposed System: The paper categorizes deepfake detection approaches into four groups:
classical ML, deep learning, statistical methods, and blockchain-based techniques.
• Implementation Details:
o Reviews 112 research papers (2018–2020).
o Evaluates based on technique, dataset used, metrics, and performance.
o Provides taxonomy and challenges for each detection method.
• Technologies Used: Multi-technique (DL, ML, blockchain); FaceSwap, DeepFaceLab,
Face2Face.
• Advantages: Broad coverage, useful for benchmarking; highlights the pros/cons of each
approach.
• Conclusion: Deep learning-based approaches outperform others, but generalization, real-
time execution, and interpretability remain key challenges for future research.
2.18. The paper “Deepfake Generation and Detection – An Exploratory Study” by Diya
Garg and Rupali Gill gives an overview of creation and detection methods along with
benchmark datasets. Key highlights include:
• Problem Statement: Deepfakes, especially multimodal ones (audio + video), are
increasingly realistic and pose risks in privacy and public trust.
• Proposed System: Reviews state-of-the-art detection models, including an AVoiD-DF
model that exploits audio-visual disparity using temporal-spatial information.
• Implementation Details:
o Covers GAN-based generation, D-CNN-based detection.
o Proposes multi-modal joint decoder for integrated feature analysis.
o Introduces DefakeAVMiT dataset for audio-video deepfakes.
• Technologies Used: Deep CNNs, multi-modal encoder-decoder, binary cross-entropy,
Adam optimizer.
• Advantages: Exploits cross-modal inconsistencies; novel dataset supports real-world
multimodal detection.
• Conclusion: Detecting multimodal deepfakes is more challenging but achievable with joint
analysis of audio and visual cues.
2.19. The paper “An Effective Approach for Detecting Deepfake Videos Using Long Short-
Term Memory and ResNet” by Keerthana S. introduces a hybrid LSTM + ResNet architecture
to capture temporal and spatial features. Key highlights include:
• Problem Statement: Most models fail to capture both frame-level and temporal
inconsistencies in deepfake videos.
• Proposed System: Combines ResNet (for spatial features from frames) with LSTM (for
temporal dependencies) to detect forged videos.
• Implementation Details:
o ResNext CNN used for extracting frame-level features.
o LSTM processes temporal information across frames.
o Datasets used: FaceForensics++, DFDC, Celeb-DF.
• Technologies Used: Python, ResNet, LSTM, TensorFlow/Keras.
• Accuracy: Not explicitly stated; validated across multiple benchmark datasets.
• Advantages: Captures both short and long-term inconsistencies, effective in detecting
sequential manipulations.
• Conclusion: The LSTM-ResNet model enhances deepfake detection accuracy and robustness.
2.20. The paper “Artificial Intelligence into Multimedia Deepfakes Creation and
Detection” by Moaiad Ahmed Khder explores AI techniques in both creating and detecting
deepfakes. Key highlights include:
• Problem Statement: Deepfake detection is becoming increasingly challenging as videos
grow more realistic, leading to a loss of public trust and increased misinformation. Traditional
detection techniques are no longer sufficient to identify subtle manipulations.
• Proposed System: The paper reviews deep learning-based systems that analyze facial
features and behaviors to distinguish real videos from manipulated ones. It emphasizes the use
of autoencoders and GANs for creation and binary classification models for detection.
• Implementation Details:
o Deep autoencoders and GANs used to generate deepfakes.
o Detection uses binary classification models trained on real vs fake data.
o Features like facial movements and speech patterns are analyzed.
o Benchmarks discussed for cross-dataset and cross-forgery generalization.
• Technologies Used: Python, Deep Learning, TensorFlow, Autoencoders, GANs, Web
scraping for data collection.
• Accuracy: Not explicitly stated; emphasizes the need for better generalization and real-
world robustness.
• Advantages:
o Can detect high-quality and subtle manipulations.
o Helps restore trust by identifying fake content.
o Adaptable with updated datasets and model improvements.
• Conclusion: Deepfake technology poses serious risks but also benefits in entertainment
and healthcare. The paper concludes that evolving AI-based detection methods, updated
datasets, and ethical awareness are crucial to combating this growing threat.
2.21. The paper "DeepFake Video Detection Using Machine Learning and Deep Learning
Techniques" compares traditional machine learning classifiers with deep learning models for
identifying manipulated videos. Key highlights include:
• Advantages:
o The approach uses widely available datasets and standard tools, enhancing replicability.
o The system provides insights into model effectiveness on complex deepfake data.
• Conclusion: The paper concludes that deep learning methods, especially CNNs,
significantly outperform traditional machine learning classifiers in detecting deepfake
videos. It highlights the importance of automated feature learning and sets the stage for
integrating temporal analysis techniques like LSTM in future work.
2.22. The paper "Enhancing Deepfake Video Detection Performance with a Hybrid CNN
Deep Learning Model" by R. Bhanu Prasad et al. proposes a multi-CNN hybrid approach
to improve deepfake detection accuracy and robustness. Key highlights include:
• Problem Statement:
o Deepfake videos have increasingly subtle manipulations that single CNN models
struggle to detect reliably.
o Many existing systems fail to capture both shallow pixel-level artifacts and deeper
semantic inconsistencies.
o There is a need for a hybrid model that combines strengths of multiple CNN
architectures to enhance detection performance.
• Proposed System: The system integrates multiple CNN models—Xception, VGG16, and
InceptionResNetV2—to form a hybrid architecture that fuses features from different layers.
This approach allows the model to simultaneously detect fine-grained anomalies and global
inconsistencies in facial videos, improving overall accuracy and reducing false positives.
• Implementation Details:
o The DFDC dataset is used, featuring thousands of deepfake and real videos in varied
settings.
o Video frames are extracted and faces detected using Haar cascades or MTCNN.
o Each CNN model is fine-tuned on the dataset to extract complementary features.
o Features are concatenated and passed through dense layers for classification.
o The system is trained over multiple epochs using batch normalization and dropout to
prevent overfitting.
• Technologies Used:
o Python and Keras for model development
o TensorFlow backend
o DFDC dataset for training and evaluation
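A small OpenCV sketch of the frame-sampling and face-cropping step listed in the implementation details above; the path, sampling rate, and crop size are illustrative assumptions.

import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_face_crops(video_path, every_n=10, size=(224, 224)):
    cap = cv2.VideoCapture(video_path)
    crops, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n == 0:                      # sample every n-th frame
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
                crops.append(cv2.resize(frame[y:y + h, x:x + w], size))
        index += 1
    cap.release()
    return crops

faces = extract_face_crops("dfdc_sample.mp4")
print(f"extracted {len(faces)} face crops")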
2.23. The paper “Comparative Analysis of Deepfake Video Detection Using InceptionNet
and EfficientNet” by V. Pandey and K. Jain evaluates the suitability of two CNN
architectures for efficient and accurate deepfake detection. Key highlights include:
• Problem Statement:
o Many deepfake detection models achieve accuracy at the expense of high
computational costs, making real-time application challenging.
o There is a need to balance detection accuracy with inference speed and resource
efficiency.
o Choosing an optimal CNN architecture can address this trade-off.
• Proposed System: The study evaluates InceptionNet and EfficientNet architectures,
augmented with Vision Transformer components, under a common training and evaluation
pipeline to determine which backbone offers the better balance of detection accuracy,
inference speed, and resource efficiency.
• Implementation Details:
o The DFDC dataset is used, featuring thousands of deepfake and real videos in varied
settings.
o Video frames are extracted and faces detected using Haar cascades or MTCNN.
o Each CNN model is fine-tuned on the dataset to extract complementary features.
o Features are concatenated and passed through dense layers for classification.
o The system is trained over multiple epochs using batch normalization and dropout to
prevent overfitting.
• Technologies Used:
o TensorFlow and Keras frameworks
o OpenCV for face detection preprocessing
o YOLOv3 for real-time face extraction
o Vision Transformers integrated with CNN backbones
o Custom dataset compiled from public deepfake sources
• Accuracy: EfficientNet achieved an accuracy of 94%, outperforming InceptionNet in both
accuracy and inference speed.
• Advantages:
o EfficientNet offers a lightweight yet powerful alternative for deepfake detection.
o Vision Transformers enhance the model’s ability to capture context.
o Suitable for real-time deployment in low-resource environments.
o Demonstrates better generalization on varied datasets.
• Conclusion: EfficientNet combined with Vision Transformers presents an optimal solution
balancing speed and accuracy, promising for scalable and real-time deepfake detection
systems.
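A minimal Keras sketch of the kind of backbone comparison this entry describes: the same classification head is attached to EfficientNetB0 and InceptionV3 so accuracy and model size can be compared on identical data. The frozen ImageNet backbones and input sizes are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB0, InceptionV3

def build_detector(backbone_cls, input_size):
    inputs = layers.Input(shape=input_size + (3,))
    base = backbone_cls(include_top=False, weights="imagenet", pooling="avg")
    base.trainable = False            # compare backbones as fixed feature extractors
    x = layers.Dropout(0.3)(base(inputs))
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return models.Model(inputs, outputs)

efficientnet_model = build_detector(EfficientNetB0, (224, 224))
inception_model = build_detector(InceptionV3, (299, 299))

for name, model in [("EfficientNetB0", efficientnet_model), ("InceptionV3", inception_model)]:
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    print(name, "parameters:", model.count_params())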
2.24. The paper “Comparison of Different Machine Learning Algorithms for Deep Fake
Detection” by R. Arora and K. Madan compares traditional machine learning classifiers for
image-based deepfake detection. Key highlights include:
• Problem Statement:
o Deepfake detection research predominantly focuses on deep learning, but there is still
a need to evaluate simpler machine learning methods for resource-constrained
environments.
o Understanding which traditional ML algorithm performs best on static image datasets
aids in selecting appropriate models for different application scenarios.
• Proposed System: The authors propose a comparative study of five machine learning
algorithms—Support Vector Machine (SVM), Naïve Bayes, Decision Tree, Random
Forest, and K-Nearest Neighbors (KNN)—to classify real and fake images.
The images are subjected to preprocessing and dimensionality reduction using Principal
Component Analysis (PCA) before being fed to the classifiers. The models are evaluated
based on standard classification metrics.
• Implementation Details:
o Images are preprocessed by resizing and grayscale conversion.
o PCA reduces features while retaining essential information.
o Classifiers are trained using 80:20 train-test splits.
o Model performance evaluated using accuracy, precision, recall, and F1-score.
• Technologies Used:
o Python’s Scikit-learn library for ML models
o Pandas and NumPy for data handling
o Matplotlib for visualization
o Kaggle dataset for real and fake images
• Accuracy:
o SVM achieved the highest accuracy at approximately 94%.
o Random Forest and Decision Tree showed slightly lower but comparable
performances.
• Advantages:
o Machine learning models are less computationally expensive than deep learning
models.
o Easier to implement and interpret.
o Suitable for image-only detection where temporal features are not required.
o Faster training times on moderate datasets.
• Conclusion: The study concludes that Support Vector Machine outperforms other
traditional ML algorithms in detecting deepfake images. Despite the simplicity of the
approach, SVM demonstrated strong accuracy and generalization, suggesting that it can
serve as an efficient alternative when deep learning is not practical. The paper emphasizes
the relevance of machine learning techniques in scenarios where speed, interpretability, or low-
resource deployment is crucial.
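The PCA-plus-classifier comparison described above can be sketched with scikit-learn as follows; the random feature matrix is a placeholder for the flattened grayscale images of the Kaggle dataset.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(1000, 4096)                 # placeholder: flattened grayscale images
y = np.random.randint(0, 2, 1000)              # placeholder: 0 = real, 1 = fake

X_reduced = PCA(n_components=100).fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_reduced, y, test_size=0.2, stratify=y, random_state=42)

classifiers = {
    "SVM": SVC(kernel="rbf"),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, "accuracy:", accuracy_score(y_test, clf.predict(X_test)))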
2.25. The paper “Analysis and Comparison of Deepfakes Detection Methods for Cross-
Library Generalisation” by B. Packkildurai and V. Sivasankari investigates the
generalizability of popular detection models across datasets. Key highlights include:
• Problem Statement:
o Deepfake detection models often overfit to the datasets they are trained on, limiting
their real-world effectiveness.
o There is a critical need to evaluate how well these models generalize when tested on
entirely different datasets with varying video qualities and compression artifacts.
• Proposed System: The study compares the performance of XceptionNet, EfficientNet, and
Capsule Networks across multiple datasets including FaceForensics++, DFDC, and Celeb-
DF. Models are trained on one dataset and tested on others to evaluate cross-library
generalization.
• Implementation Details:
o Transfer learning is used for model training.
o Cross-dataset testing protocol ensures models are evaluated on unseen data distributions.
o Accuracy and F1-score are computed for each training-testing dataset pair.
o Confusion matrices and ROC curves analyze misclassification patterns.
• Technologies Used:
o Python-based scripting and experimentation
o TensorFlow/Keras deep learning frameworks
o Data augmentation tools for preprocessing.
o Evaluation metrics: Accuracy, EER, HTER
o Datasets: FaceForensics++, Celeb-DF, DeepfakeTIMIT
• Accuracy:
o Intra-library testing: most models achieved high accuracy (>90%), with the Multi-task
model reaching up to 98% accuracy in same-dataset testing.
o Cross-library testing: performance dropped significantly—HTER remained above
30% for all models, indicating poor generalization.
• Advantages:
o Provides a benchmark comparison across six popular detection models.
o Introduces a unified testing framework including cross-library testing, making results
more interpretable.
o Identifies real-world factors like domain offset, data partitioning, and threshold tuning
that directly impact model performance.
o Highlights the importance of person-based dataset splitting for better generalization.
• Conclusion: The study concludes that while many models perform excellently within their
training datasets, they struggle with real-world generalizability. Random data partitioning
and poorly tuned thresholds inflate performance metrics. The paper recommends using
person-based partitioning, carefully selected thresholds, and diverse datasets for
evaluation. It emphasizes the importance of creating more robust and transferable models
that can handle domain shifts across various deepfake generation techniques and contexts.
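The HTER metric used in the cross-library protocol is the mean of the false-acceptance and false-rejection rates at a chosen threshold, as in the sketch below; the score and label arrays are placeholders for a model trained on one dataset and tested on another.

import numpy as np

def hter(scores, labels, threshold=0.5):
    preds = (scores >= threshold).astype(int)  # 1 = predicted fake
    far = np.mean(preds[labels == 0] == 1)     # real videos flagged as fake
    frr = np.mean(preds[labels == 1] == 0)     # fake videos passed as real
    return (far + frr) / 2

scores = np.random.rand(500)                   # fake-probabilities on an unseen dataset
labels = np.random.randint(0, 2, 500)          # ground truth of that dataset
print("cross-dataset HTER:", hter(scores, labels))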
• Advantages:
o Offers a consolidated comparison of detection techniques and their trade-offs.
o Bridges the understanding between traditional media forensics and modern AI-
driven approaches.
• Conclusion: The study concludes that no single model or method can universally detect all
types of fake content. A combination of deep learning, statistical analysis, and multi-modal
feature extraction is essential. It calls for more diverse datasets, real-time testing
environments, and adaptive models to tackle the fast-evolving landscape of synthetic media.
• Advantages:
o Raises awareness of the future impact of deepfakes in emerging platforms like the
metaverse.
o Encourages the use of AI not only for security but also for trust-building and identity
protection.
• Conclusion: The paper concludes that AI must play a central role in metaverse security to
mitigate deepfake threats. Embedding intelligent detection tools within virtual systems is
essential for maintaining trust, user safety, and the integrity of digital identities.
• Advantages:
o Offers a holistic understanding of deepfakes beyond just detection.
o Bridges technology with ethics, law, media, and psychology.
• Conclusion: The paper concludes that combating deepfakes requires more than technical
solutions. It advocates for multi-pronged efforts combining detection technology, strict
regulation, media awareness, and ethical design to create safe digital ecosystems.
o F1-score: 84.23%
• Advantages:
o Incorporates both spatial and temporal learning for more reliable detection.
o Performs well across multiple datasets.
o Flexible design — can be extended to other biometric modalities.
• Conclusion: The paper concludes that combining CNN and LSTM improves detection
accuracy and robustness. However, the model is computationally intensive, and future work
should focus on optimizing it for real-time and mobile environments.
2.30. The paper “Unmasking the Illusions: A Comprehensive Study on Deepfake Videos
and Images” by Ravikant Ranout and Prof. CRS Kumar examines both the creation and
detection of deepfakes, focusing on the evolution of detection techniques. Key highlights
include:
• Problem Statement:
o Deepfake generation techniques have outpaced detection methods, making it harder
for existing tools to keep up.
o A dual perspective is needed to understand both how deepfakes are made and how
they can be countered effectively.
• Proposed System: The paper reviews traditional and modern detection strategies including
forensic fingerprinting, CNN-based models, and attention-based architectures. It also
highlights future directions such as using blockchain and watermarking for validation.
• Implementation Details:
o Analyzes how early detection was done using cues like eye-blink rate, facial jitter,
and inconsistent lighting.
o Reviews recent deep learning approaches using XceptionNet, Capsule Networks,
and GAN discriminators.
o Discusses role of datasets like FaceForensics++, DFDC, and their influence on
model development.
• Technologies Used:
o CNNs, RNNs, Capsule Networks, Attention Mechanisms.
o Autoencoders, GAN discriminators.
o Python, TensorFlow, Deep Learning libraries.
• Accuracy: As a survey paper, it does not report specific accuracy values but identifies
XceptionNet and CapsuleNet as among the most effective models in reviewed studies.
• Advantages:
o Comprehensive summary of creation and detection mechanisms
o Bridges the technical and societal perspective on deepfakes
• Conclusion: The paper concludes that a multi-modal approach combining video, audio,
metadata, and user behavior is essential for future-proof deepfake detection.
2.31. The paper “A Heterogeneous Feature Ensemble Learning based Deepfake Detection
Method” by Jixin Zhang, Ke Cheng, Giuliano Sovernigo, and Xiaodong Lin presents a novel
method combining multiple feature types to improve the robustness and accuracy of deepfake
image detection. Key highlights include:
• Problem Statement: Deepfake detectors often perform poorly when tested on fake content
generated by models different from the training set due to a lack of generalization.
• Proposed System: The authors propose an ensemble learning model integrating gray
gradient, spectrum, and texture features, which are flattened and input into a back-
propagation neural network classifier.
• Implementation Details:
o Three features: grey gradients (facial landmarks), co-occurrence matrix (texture), and
spectrum are extracted.
o A flattening process is used to unify heterogeneous features.
o These features are used in a back-propagation neural network for training and
classification.
• Technologies Used:
o Feature Engineering: Gray Gradient Histogram, Co-occurrence Matrix, Spectral
Features
o Model: Back propagation Neural Network
• Accuracy: Achieved detection accuracy of 97.04%, outperforming state-of-the-art
detectors on cross-model evaluations.
• Advantages:
o Good generalization across unknown deepfake models
o Effective feature combination strategy
o Improved accuracy and robustness
• Conclusion: The paper demonstrates that heterogeneous feature ensemble learning
significantly boosts the robustness and accuracy of deepfake detectors against unseen
manipulations.
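A rough sketch of the heterogeneous-feature idea: a gradient histogram, co-occurrence texture statistics, and low-frequency spectrum values are extracted from a grayscale face crop, concatenated, and classified. scikit-image supplies the co-occurrence features and scikit-learn's MLPClassifier stands in for the back-propagation network; all parameters are illustrative assumptions.

import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.neural_network import MLPClassifier

def heterogeneous_features(gray_img):          # gray_img: uint8 array, e.g. 128x128
    # 1) Gray-gradient histogram.
    gx, gy = np.gradient(gray_img.astype(float))
    grad_hist, _ = np.histogram(np.hypot(gx, gy), bins=32, density=True)
    # 2) Texture via the gray-level co-occurrence matrix.
    glcm = graycomatrix(gray_img, distances=[1], angles=[0], levels=256)
    texture = np.array([graycoprops(glcm, p)[0, 0]
                        for p in ("contrast", "homogeneity", "energy", "correlation")])
    # 3) Low-frequency magnitude spectrum.
    spectrum = np.abs(np.fft.fft2(gray_img))[:8, :8].ravel()
    return np.concatenate([grad_hist, texture, np.log1p(spectrum)])

# Placeholder dataset of random "face" crops with binary labels.
images = [np.random.randint(0, 256, (128, 128), dtype=np.uint8) for _ in range(200)]
labels = np.random.randint(0, 2, 200)
X = np.stack([heterogeneous_features(img) for img in images])

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300)
clf.fit(X, labels)
print("training accuracy:", clf.score(X, labels))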
2.32. The paper “Deepfake Detection in Videos and Picture: Analysis of Deep Learning
Models and Dataset” by Surendra Singh Chauhan, Nitin Jain, Satish Chandra Pandey, and
Aakash Chabaque presents a comparative analysis of deepfake detection methods and datasets.
Key highlights include:
• Problem Statement: Rising misuse of easily accessible deepfake tools necessitates better
detection systems that can keep up with evolving manipulation techniques.
• Proposed System: The authors analyze different deepfake detection models (CNN, RNN,
hybrid), datasets (FaceForensics++, DFDC), and manipulation types (lip-sync, face swap,
puppeteering).
• Implementation Details:
o Overview of models like CNN, LSTM, and DenseNet with recurrent convolutional
components.
o Used pre-existing datasets for training and evaluation (FaceForensics++, DFDC).
o Compared methods based on detection capabilities for various manipulation types.
• Technologies Used:
o Models: DenseNet + GRU, CNN + LSTM
o Libraries: TensorFlow, OpenCV, Python
o Datasets: FaceForensics++, DFDC
• Accuracy: Methods discussed achieved promising results with temporal-aware models
outperforming frame-wise models in many cases.
• Advantages:
o Comprehensive coverage of techniques
o Identifies limitations in dataset diversity
o Emphasizes need for multi-modal analysis
• Conclusion: The paper underscores the importance of hybrid and temporally aware models
and datasets with broader manipulation coverage for reliable deepfake detection.
2.33. The paper “Deepfake Detection Using Deep Learning” by Prakash Raj S, Pravin D,
Sabareeswaran G, Sanjith R K, and Gomathi B presents an application-centric study on using
modern deep learning architectures to tackle deepfake media. Key highlights include:
• Problem Statement: Traditional approaches fall short against sophisticated deepfake
techniques due to lack of contextual understanding and robustness.
• Proposed System: A study of CNN-based and hybrid (CNN + RNN) models for detecting
deepfake video and image content with focus on robustness and real-time application.
• Implementation Details:
o Preprocessing of video/image data for model ingestion.
o Model architectures include CNN variants (VGG, ResNet) and hybrid with LSTM
layers.
o Evaluation based on metrics like accuracy and F1-score.
• Technologies Used:
o Tools: Python, TensorFlow, Keras
o Architectures: CNN, ResNet, VGG16, LSTM
• Accuracy: The hybrid models demonstrated improved detection with temporal consistency
across frames.
• Advantages:
o Enhanced performance using RNNs on video sequences
o Effective for both static and dynamic deepfakes
• Conclusion: The authors affirm that combining spatial and temporal features using deep
learning significantly strengthens deepfake detection systems.
2.34. The paper “Detecting Deepfakes: Training Adversarial Detectors with GANs for
Image Authentication” presents a generative adversarial training approach for improving fake
image detection. Key highlights include:
• Problem Statement: Existing detectors are vulnerable to adversarial examples and fail to
generalize across manipulation techniques.
• Proposed System: A system where GANs are used to simulate adversarial attacks on
detectors, thereby improving their robustness and adaptability.
• Implementation Details:
o Adversarial examples generated using GAN variants.
o Fine-tuning traditional classifiers with adversarial training.
o Evaluation based on classification robustness.
• Technologies Used:
o GANs, TensorFlow, Custom CNNs
• Accuracy: Improved generalization on unseen manipulations compared to non-adversarial
baselines.
• Advantages:
o Boosts robustness against adversarial attacks
o Prepares models for real-world conditions
• Conclusion: The paper validates adversarial training as a potent strategy for developing
resilient deepfake detectors capable of withstanding attacks and unseen variations.
2.35. The paper “Detection of Deepfakes: Protecting Images and Videos Against Deepfake”
by Ying Tian, Wang Zhou, and Amin Ul Haq presents an overview of deepfake generation and
CNN-based detection techniques, with attention to societal implications. Key highlights
include:
• Problem Statement: The increasing misuse of deepfake technology poses threats to
security, privacy, and social trust.
• Proposed System: The authors describe the use of CNN architectures for detecting
manipulated images and videos, analyzing facial attributes and inconsistencies.
• Implementation Details:
o Deepfake videos analyzed for facial distortions, light mismatches, and temporal
inconsistencies.
o Model trained on facial landmarks, pixel inconsistencies, and edge information
• Technologies Used:
o CNN, GANs
o Tools: TensorFlow, OpenCV
• Accuracy: Empirical results suggest robust performance for basic CNN configurations on
typical datasets.
• Advantages:
o Ease of implementation
o Good performance on synthetic datasets
• Conclusion: The authors highlight the importance of continuous improvement in detection
tools to meet evolving threats posed by deepfake technologies.
2.36. The paper “Div-Df: A Diverse Manipulation Deepfake Video Dataset” by Deepak
Dagar and Dinesh Kumar Vishwakarma introduces a novel dataset for benchmarking deepfake
detection. Key highlights include:
• Problem Statement: Existing datasets are biased toward face-swap manipulation and do
not represent diverse real-world deepfakes.
• Proposed System: The authors present Div-DF, a dataset containing 250 manipulated
videos using face-swap, lip-sync, and facial reenactment, along with 150 real videos.
• Implementation Details:
o Fake videos created using FaceSwap-GAN and Wav2Lip methods.
o Includes diverse content: different identities, lighting, professions, and expressions.
• Technologies Used:
o Deepfake Synthesis Tools: FSGAN, Wav2Lip
o Evaluation Models: CNNs, Vision Transformers
• Accuracy: Detection models performed significantly lower on Div-DF than on
conventional datasets, revealing its challenge.
• Advantages:
o Improves generalization benchmarking
o Highlights real-world diversity issues
• Conclusion: Div-DF provides a richer and more realistic benchmark for testing and
improving deepfake detection systems.
2.37. The paper “Model Attribution of Face-Swap Deepfake Videos” by Shan Jia, Xin Li,
and Siwei Lyu presents a novel method to identify the specific model used to generate a
deepfake. Key highlights include:
• Problem Statement: Most studies focus on detecting whether a video is fake, but forensic
attribution to the generating model is lacking.
• Proposed System: The authors frame attribution as a multi-class classification task using a
new dataset (DFDM) containing videos from five different autoencoder models
• Implementation Details:
o Spatial and temporal attention mechanism (DMA-STA) for feature extraction.
o Trained on 6450 Deepfake videos from different encoder-decoder settings.
• Technologies Used:
o Vision Transformers with Attention Modules
o DFDM Dataset
• Accuracy: Achieved over 70% attribution accuracy on high-quality deepfakes.
• Advantages:
o Supports forensic tracking
o Discriminates subtle visual differences
• Conclusion: The method introduces a new dimension to deepfake analysis by enabling
traceability of generation tools.
2.38. The paper “Review: DeepFake Detection Techniques using Deep Neural Networks
(DNN)” by Harsh Chotaliya, Mohammed Adil Khatri, Shubham Kanojiya, and Mandar
Bivalkar provides a comparative analysis of various deepfake detection methods using DNNs.
Key highlights include:
• Problem Statement: Deepfake detection methods struggle with generalization and
robustness against evolving manipulations.
• Proposed System: Comparison of CNN models (VGG16, ResNet, EfficientNet) and hybrid
models (CNN + LSTM) on benchmark datasets.
• Implementation Details:
o Model evaluation on FaceForensics++ and custom test sets.
o Analysis based on accuracy, AUC, and processing time.
• Technologies Used:
o DNN Frameworks: TensorFlow, Keras
o Models: VGG16, ResNet, EfficientNet, LSTM
• Accuracy: CNN + LSTM models showed improved performance on temporal datasets.
• Advantages:
o Wide model comparison
o Identification of pros and cons for each architecture
• Conclusion: The paper offers practical insights into selecting and improving deepfake
detection models using deep learning.
2.39. The paper “Deepfake Generation and Detection: A Survey” by Tao Zhang provides an
extensive review of deepfake synthesis and detection methods, highlighting existing challenges
and future research directions. Key highlights include:
• Problem Statement: Deepfake technology poses severe societal risks due to its realism and
accessibility.
• Proposed System: Surveys both generation (face-swap, reenactment) and detection
methods (CNN, RNN, GAN-based, multimodal).
• Implementation Details:
o Compares over 20 generation and detection approaches.
o Identifies gaps in dataset diversity and model robustness.
• Technologies Used:
o DNNs, GANs, CNNs, RNNs, LSTMs, SVMs
• Accuracy: Varies widely across techniques and datasets.
• Advantages:
o Comprehensive landscape overview
o Highlights real-world threats and limitations
• Conclusion: Emphasizes the need for scalable, efficient, and generalizable detection
systems with robust benchmarking datasets.
2.40. The paper “Deep Fake in Picture Using Convolutional Neural Network” by Dr. J.N. Singh,
Ashutosh Gautam, and Harsh Tomar presents a deep learning approach for detecting fake
images using CNNs. Key highlights include:
• Problem Statement: With the rise of realistic image manipulation through deepfake tools,
it becomes critical to develop reliable methods to identify fake images generated through
AI.
• Proposed System: The authors develop and train a CNN-based model to differentiate
between authentic and manipulated images, leveraging the MesoNet architecture for
enhanced feature extraction.
• Implementation Details:
o Dataset: Real and fake images sourced from Kaggle.
o Model: Custom CNN using four convolutional layers, batch normalization, max
pooling, and dense layers (see the sketch after this entry).
o Script Flow: Data is loaded via Google Drive, images are passed through the
network using a generator, and predictions are evaluated visually.
• Technologies Used:
o Language: Python
o Frameworks: Custom MesoNet implementation, Matplotlib for visualization
o Tools: Google Colab, Image datasets from Kaggle
• Accuracy: While exact metrics are not reported, the model demonstrated strong
performance, particularly on compressed and altered inputs where traditional systems fail.
• Advantages:
o Strong image-based detection performance
o Lightweight architecture suitable for mid-tier systems (4GB RAM, 128GB SSD)
o Practical against real-world noise and compression artifacts
• Conclusion: The CNN-based approach using MesoNet provides a reliable means to detect
fake images. The authors recommend further enhancements for broader dataset diversity
and cross-domain generalization.
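For reference, the following is a minimal sketch of a MesoNet-style classifier of the kind described above, assuming TensorFlow/Keras on 256x256 face crops. The filter counts, kernel sizes, and dropout rates follow the publicly known Meso4 layout and are assumptions rather than the authors' exact configuration.

# MesoNet-style sketch: four convolution -> batch normalization -> max pooling
# blocks followed by a small dense head producing a real/fake probability.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_meso4(input_shape=(256, 256, 3)):
    x_in = layers.Input(shape=input_shape)
    x = x_in
    for filters, kernel, pool in [(8, 3, 2), (8, 5, 2), (16, 5, 2), (16, 5, 4)]:
        x = layers.Conv2D(filters, kernel, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D(pool_size=pool, padding="same")(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(16)(x)
    x = layers.LeakyReLU(0.1)(x)
    x = layers.Dropout(0.5)(x)
    out = layers.Dense(1, activation="sigmoid")(x)
    model = models.Model(x_in, out)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_meso4()
model.summary()

The shallow architecture keeps the parameter count low, which is consistent with the authors' claim that the model runs on mid-tier hardware.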
CHAPTER 3
PROBLEM IDENTIFICATION
3.1 Problem Statement
Despite significant advancements in deepfake detection through spatial, temporal, and hybrid
models, existing systems struggle to operate effectively in dynamic real-world environments
such as social media, legal platforms, or crisis communication channels. Current approaches
often lack adaptability to low-resolution, user-uploaded content, overlook contextual metadata
anomalies, and fail to provide transparent, explainable outputs. This gap not only limits
detection accuracy in uncontrolled settings but also undermines public trust and decision-
making, especially during critical events where manipulated media can have serious ethical,
social, and legal consequences.
The scope of this project includes the research and evaluation of various machine learning and
deep learning techniques for deepfake detection. It covers a comprehensive review of existing
models, ranging from traditional classifiers like SVMs to advanced CNN-based and hybrid
architectures such as EfficientNet, InceptionNet, and ensemble methods. The objective is to
analyze their performance, generalizability, computational efficiency, and applicability in real-
world scenarios. Key limitations such as poor cross-dataset performance, high resource
demands, and lack of real-time processing will be identified to inform the design of a more
robust detection methodology.
In addition to technical improvements, the project also addresses ethical and societal
considerations. The study involves experimentation using benchmark datasets such as
FaceForensics++, DFDC, and Celeb-DF, with evaluation based on accuracy, precision, and
recall, along with cross-dataset generalization. Ultimately, the project aims to develop a scalable,
adaptive, and trustworthy deepfake detection system suitable for applications in legal forensics,
media verification, and combating misinformation.
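As a concrete illustration of the evaluation metrics mentioned above, the following is a small sketch assuming scikit-learn; the label and prediction arrays are placeholders, not experimental results.

# Placeholder evaluation step: compute accuracy, precision, and recall
# for binary real (0) vs. fake (1) predictions on a held-out set.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]   # ground-truth labels (placeholder)
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]   # thresholded model outputs (placeholder)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))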
CHAPTER 4
OBJECTIVES
The primary goal of this project is to explore and analyze various machine learning and deep
learning models for the detection of deepfake videos and images. The project aims to develop
a deeper understanding of how existing detection systems perform, especially across different
datasets, manipulation types, and model architectures.
Another major goal is to identify and implement an approach that offers improved accuracy,
better generalizability across datasets, and potential for real-time applicability. The ultimate
aim is to contribute toward building a reliable and scalable deepfake detection system that can
assist in real-world scenarios such as media verification, digital forensics, and online content
moderation.
CHAPTER 5
TOOLS AND TECHNOLOGIES
o The project will be developed using the Python 3.x programming language due to its
strong support for AI/ML libraries.
o TensorFlow and Keras will be the primary deep learning frameworks used for
building and training CNN models.
o Scikit-learn will be used for implementing traditional machine learning algorithms
like SVM, Random Forest, etc.
o OpenCV will be used for video frame extraction and face detection (a brief
preprocessing sketch follows this list).
o NumPy, Pandas, and Matplotlib will be used for data manipulation and visualization.
o Development will be done using Jupyter Notebook, Google Colab, or VS Code.
o Seaborn will be optionally used for advanced plotting and visual analytics.
o Dataset integration will be done using tools/APIs for FaceForensics++, DFDC, and
Celeb-DF.
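To make the preprocessing step concrete, the following is a minimal sketch assuming OpenCV's bundled Haar cascade face detector; the video path, sampling interval, and crop size are placeholders.

# Sample every n-th frame from a video, detect faces with a Haar cascade,
# and return resized face crops ready for a CNN classifier.
import cv2

def extract_faces(video_path, every_n_frames=10, size=(224, 224)):
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    faces, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n_frames == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
                faces.append(cv2.resize(frame[y:y + h, x:x + w], size))
        idx += 1
    cap.release()
    return faces

# Example usage (placeholder path):
# crops = extract_faces("sample_video.mp4")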
CHAPTER 6
METHODOLOGY
The methodology for this project outlines the systematic approach followed to study, analyze,
and implement deepfake detection models. It is divided into multiple phases, each contributing
to the successful development and evaluation of the detection system.
o Each model’s performance is recorded and compared based on detection accuracy and
computational efficiency.
o Deep learning models generally outperform ML models in terms of accuracy but
require more training time and computational resources.
o Cross-dataset performance is analyzed to determine how well models generalize to
unseen datasets (see the comparison sketch below).
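The recording and comparison step described above could be organized as in the following sketch, assuming already-trained Keras models and preprocessed evaluation arrays; all names and the decision threshold are placeholders.

# Record detection accuracy and inference time for each model on each dataset.
import time
from sklearn.metrics import accuracy_score

def compare_models(models, datasets):
    results = []
    for model_name, model in models.items():
        for data_name, (X, y) in datasets.items():
            start = time.perf_counter()
            probs = model.predict(X, verbose=0).ravel()
            elapsed = time.perf_counter() - start
            acc = accuracy_score(y, (probs >= 0.5).astype(int))
            results.append((model_name, data_name, acc, elapsed))
    return results

# Example usage (placeholders):
# table = compare_models({"cnn_lstm": cnn_lstm_model},
#                        {"FaceForensics++": (X_ff, y_ff), "Celeb-DF": (X_cdf, y_cdf)})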
CHAPTER 7
APPLICATIONS
Deepfake detection systems have wide-ranging applications across several domains due to the
rising concern over the misuse of manipulated media. As synthetic media becomes increasingly
realistic and accessible, the need for automated and accurate detection systems has become
critical. Key application areas where deepfake detection plays a vital role include media
verification and misinformation control on social platforms, digital and legal forensics, and
online content moderation.
REFERENCES
1. S. Patel, S. K. Chandra, and A. Jain, “DeepFake Videos Detection and Classification Using
ResNeXt and LSTM Neural Network,” 2023 3rd Int. Conf. Smart Gen. Comput., Commun.
Netw. (SMART GENCON), IEEE, 2023.
2. F. Desta and E. J. Brown, “Facial Recognition for Deepfake Detection,” 2022 IEEE
Integrated STEM Educ. Conf. (ISEC), pp. 364–367, IEEE, 2022.
3. J. W. Seow et al., “A Comprehensive Overview of Deepfake: Generation, Detection,
Datasets, and Opportunities,” Neurocomputing, vol. 513, pp. 351–371, Elsevier, 2022.
4. A. Trabelsi, M. M. Pic, and J.-L. Dugelay, “Improving Deepfake Detection by Mixing Top
Solutions of the DFDC,” 2022 Eur. Signal Process. Conf. (EUSIPCO), IEEE, 2022.
5. S. Tandon et al., “Real-Time Face Transition Using Deepfake Technology (GAN Model),”
2023 Int. Conf. RAEEUCCI, IEEE, 2023.
6. Z. Jin, L. Lang, and B. Leng, “Wave-Spectrogram Cross-Modal Aggregation for Audio
Deepfake Detection,” 2025 IEEE ICASSP, pp. 1–5, 2025.
7. Anonymous, “Deepfake Detection using Inception-ResNetV2,” 2021 Int. Conf. ACFCFT,
IEEE, 2021.
8. P. M. Thuan, B. T. Lam, and P. D. Trung, “Spatial Vision Transformer: A Novel Approach
to Deepfake Video Detection,” Proc. VCRIS, IEEE, 2024, pp. 1–10.
9. S. Yadav et al., “Robust and Generalized DeepFake Detection,” Proc. ICCCNT, IEEE,
2022, pp. 1–8.
10. P. Karthik et al., “DeepFake Videos Detection and Classification Using ResNeXt and
LSTM Neural Network,” 2023 Int. Conf. SMART GENCON, IEEE, 2023.
11. T. M. Wani et al., “ABC-CapsNet: Attention-based Cascaded Capsule Network for Audio
Deepfake Detection,” CVPRW, pp. 2464–2473, IEEE, 2024.
12. R. Anagha et al., “Audio Deepfake Detection Using Deep Learning,” Proc. SMART–2023,
IEEE, 2023.
13. Garde, S. Suratkar, and F. Kazi, “AI-Based Deepfake Detection,” 2022 1st Int. Conf. DDS,
IEEE, 2022.
14. A. Mehra et al., “Motion Magnified 3-D Residual-in-Dense Network for DeepFake
Detection,” IEEE Trans. Biometrics, Behav. Identity Sci., vol. 5, no. 1, pp. 39–52, Jan.
2023.
15. Y. Xie et al., “Domain Generalization via Aggregation and Separation for Audio Deepfake
Detection,” IEEE Trans. Inf. Forensics Secur., vol. 19, pp. 344–356, 2024.
34. T. Zhang, “Deepfake Generation and Detection: A Survey,” Multimedia Tools Appl., vol.
81, pp. 6259–6276, Springer, 2022. doi: 10.1007/s11042-021-11733-y.
35. J. N. Singh et al., “Deep Fake in Picture Using CNN,” 2023 5th ICAC3N, IEEE, pp. 1104–
1107, 2023. doi: 10.1109/ICAC3N60023.2023.10541758.