
Received 17 November 2022, accepted 17 December 2022, date of publication 26 December 2022, date of current version 3 January 2023.

Digital Object Identifier 10.1109/ACCESS.2022.3232290

Generalization of Forgery Detection With Meta Deepfake Detection Model

VAN-NHAN TRAN¹, SEONG-GEUN KWON², SUK-HWAN LEE³, HOANH-SU LE⁴, AND KI-RYONG KWON¹

¹Department of Artificial Intelligence Convergence, Pukyong National University, Busan 48513, South Korea
²Department of Electronics Engineering, Kyungil University, Gyeongsan 38428, South Korea
³Department of Computer Engineering, Dong-A University, Busan 49315, South Korea
⁴Faculty of Information Systems, University of Economics and Law, Vietnam National University Ho Chi Minh City, Ho Chi Minh 700000, Vietnam

Corresponding author: Ki-Ryong Kwon ([email protected])


This work was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF)
funded by the Ministry of Education under Grant 2020R1I1A306659411 and Grant 2020R1F1A1069124; and in part by the Ministry of
Science and ICT (MSIT), South Korea, through the Information Technology Research Center (ITRC) Support Program supervised by the
Institute for Information & Communications Technology Planning & Evaluation (IITP) under Grant IITP-2022-2020-0-01797.

ABSTRACT Face forgery generation algorithms that produce a range of manipulated videos and images have developed quickly, causing an increase in the production of fake information that is difficult to identify. Because facial manipulation technologies raise severe concerns, face forgery detection is gaining increasing attention in the area of computer vision. In real-world applications, face forgery detection systems frequently encounter unseen domains and perform poorly on them due to poor generalization. In this paper, we propose a deepfake detection method based on meta-learning called Meta Deepfake Detection (MDD). The goal is to develop a generalized model capable of directly handling new unseen domains without the need for model updates. The MDD algorithm assigns different weights to facial images from different domains. Specifically, MDD uses meta-weight learning to shift information from the source domains to the target domains through meta-optimization steps, so that the model generates effective representations of both. We build multi-domain sets using a meta-splitting strategy to create a meta-train set and a meta-test set. Based on these sets, the model computes gradients and performs backpropagation, and the inner and outer loop gradients are aggregated to update the model and enhance generalization. By introducing pair-attention loss and average-center alignment loss, the detection capabilities of the system are substantially enhanced. In addition, we use several evaluation benchmarks established from popular deepfake datasets to compare the generalization of our proposal against several baselines and assess its effectiveness.

INDEX TERMS Deepfake detection, meta-learning, artificial intelligence, computer vision.

I. INTRODUCTION
Face recognition systems have progressed substantially in recent times. In particular, deep learning technologies have significantly improved the performance of this task. However, the sophistication of face image manipulation puts existing facial recognition algorithms in danger of becoming ineffective. With the development of technologies such as Generative Adversarial Networks (GANs) [1], the wider GAN family, and Variational AutoEncoders [2], [3], fake facial images and videos can be created and used to deceive recognition systems. Many manipulation algorithms [4], [5], [6] allow a person without specific skills to produce high-quality fake faces, with no expert knowledge or special training required. As a result, it is often challenging for the human eye to identify the difference between authentic and manipulated images. This has led to an increase in the use of modified multimedia content in various cybercrime activities. The technology may be used maliciously, resulting in a major trust issue for modern society.

(The associate editor coordinating the review of this manuscript and approving it for publication was Liangxiu Han.)
FIGURE 1. Overview architecture of our proposed MDD.

Because such methods can produce high-quality fake images that are indistinguishable even to the human eye, the scientific community has shown great interest in developing techniques for identifying authentic faces among fraudulent images. Many methods for deepfake detection have been proposed in [7], [8], [9], [10], and [11]. These proposals are primarily inspired by the binary classification problem, applying classification models to the deepfake detection challenge in order to differentiate between real and fake photos. The common pipeline typically applies data preprocessing together with backbone networks to extract features from the faces in images or videos, and then uses a binary classifier network to label them as real or fake. However, due to the rapid advancement of face forgery generation algorithms, some samples appear extremely similar to one another and differ only in a few small features, so it is getting harder to tell fake features from real ones in forged images. In addition, there is a lot of variety among fake images produced by different algorithms, which results in ineffective performance for such global feature-based systems built on binary classifier networks.

Presently, face forgery generation algorithms are increasing rapidly; examples include expression swapping, identity swapping, face swapping, and face synthesis. Based on these algorithms, a variety of manipulated datasets have been created to serve the research and development of face forgery detection. Common datasets used in the experiments of this paper are DFDC [12], Celeb-DF-v2 [13], and FaceForensics++ [9]. The synthetic faces within each of these datasets were produced using the same algorithm, leading to a similar data distribution in each one. When training and testing are completed on one dataset, only one data distribution is used to assess the outcomes, and when testing on other databases the results are often poor. However, in real-world applications, the model is frequently used in a significantly different domain (an unseen domain) with a different distribution than the source domains. As a result, generalized face forgery detection is less researched and remains difficult in the face of unseen facial manipulations.

In this research, we design a generalized face forgery detection model to solve the face authentication issue. Without any model updating, the model can be evaluated directly on unseen domains after being trained on a number of source domains. Inspired by [14], [15], and [16], we use meta-learning to propose a novel deepfake detection algorithm, termed Meta Deepfake Detection (MDD), with a meta-optimization objective that learns efficient face representations on both synthetic source and target domains. MDD shifts information from the source domains to the target domains. To increase model generalization, the gradients from the meta-train and meta-test sets are combined using meta-optimization. MDD can thus handle unseen domains without model updates. The following is a summary of our main contributions:
• We propose a Meta Deepfake Detection (MDD) model to handle the generalization of the deepfake forgery detection problem, which uses transferable knowledge across domains, learned through meta-learning, to enhance model generalization.
• We emphasize the generalized deepfake detection challenge, which requires that a trained model generalize effectively on new domains without any updating.

• We propose two loss functions: Pair-Attention Loss (PAL), which concentrates on optimizing positive and negative pairings and separating positive samples from negative samples, and Average-Center Alignment Loss (ACA), which minimizes the variations within each class while retaining the capacity to differentiate between features of different classes. These two losses are aggregated with softmax loss to update the entire model and learn across domains.
• We apply data preprocessing along with a block shuffling transformation technique to increase the performance of the generalized model.
• Several generalized deepfake detection benchmarks are used for the evaluation of our proposal. A number of experiments on these benchmarks are conducted and compared with related methods.

II. RELATED WORK
A. FACE FORGERY GENERATION
Deep generative models, which are gaining popularity, are being used to synthesize and produce fake videos and images, and manipulation algorithms expand along with them. Several well-known categories include face swap, face manipulation, and expression reenactment.

1) FACE SWAP
Face swapping involves replacing the face of a source image with that of a target image. Notable research includes RSGAN [17], which proposed a region-separative generative adversarial network that handles face and hair appearances separately in the latent-space representations of the faces and reconstructs the full face to achieve face swapping. FSGAN [18] proposed a Face Swapping GAN, which derives a recurrent neural network (RNN) for face reenactment and adapts to changes in position and expression. FSGANv2 [19] offered a subject-agnostic swapping scheme for face reenactment which adjusts for important pose and expression variation. MobileFaceSwap [20] proposed an advanced face swapping approach with a lightweight Identity-aware Dynamic Network (IDN) that modifies the model parameters dynamically depending on the identity information.

2) FACE MANIPULATION
This is a generation task in which the facial attributes and styles of the output face are changed toward an intended target. AttGAN [21] applied an attribute classification constraint to ensure the precise changing of the desired characteristics in the resulting image while preserving attribute-excluding details; the approach was further enhanced to allow attribute style adjustment in an unsupervised setting. STGAN [22] presented a selective transfer perspective that utilizes the target attribute vector to direct flexible translation to the desired target domain. MaskGAN [6] proposed a model with two primary components, a Dense Mapping Network (DMN) and Editing Behavior Simulated Training (EBST), to modify target images and learn style mapping by using a modified mask. StarGANv2 [23] proposed a framework that addresses both the diversity of generated images and scalability across multiple domains when learning a mapping across several visual domains. FacialGAN [24] proposed a framework that allows simultaneous manipulation of dynamic face features and extensive style transfers.

3) EXPRESSION REENACTMENT
The conditional face synthesis problem of facial expression reenactment aims to transfer a source face shape to a target face while keeping the target identity and appearance. Related research includes MarioNETte [25], which creates reenactments of unseen identities in a few-shot environment by handling image attention blocks, a facial landmark transformer, and feature alignment. DEA-GAN [26] presented a self-supervised hybrid model that learns a pose-invariant embedded face for each video by using a multi-frame deforming auto-encoder. FReeNet [27] proposed a multi-identity face reenactment framework that shares a common model and transmits facial expressions from the source face to the target face. AD-NeRF [28] proposed an audio-driven talking head technique that renders portraits by directly mapping audio characteristics to dynamic neural radiance fields. FACEGAN [29] proposed a model that uses the Action Unit (AU) representation to transfer facial motion from the driving face.

B. FACE FORGERY DETECTION
Face forgery detection can be divided into different groups, such as spatial clues for detection, temporal clues for detection, and generalizable clues for detection.

1) SPATIAL CLUE FOR DETECTION
The work in [30] presented an innovative attention-based layer to boost classification efficiency and generate an attention map showing the altered face areas. The work in [31] designed an inconsistency-aware wavelet dual-branch network to recognize real and fake images. Capsule-forensics [32] proposed a method that employs a deep convolutional neural network and a capsule network to identify several types of spoofs, including replay attacks that use printed pictures or recorded videos as well as computer-generated videos. FakeLocator [33] introduced an attention mechanism based on face parsing and suggested single-sample clustering and partial data augmentation to improve the training data. The research in [34] concentrates on the analysis of deepfakes of human faces, with the goal of developing a novel detection technique that can find a forensics trail concealed in images.


2) TEMPORAL CLUE FOR DETECTION
MesoNet [35] presented a method for quickly and effectively identifying face tampering in videos, with a focus on two recent methods for creating fake videos that appear extremely realistic. FakeCatcher [36] provided a fresh method for detecting phony content in portrait videos as a preventative measure against the growing danger of deepfakes. Bita-Net [37] proposed a model to detect fake faces that reflects a two-pathway architecture to enhance forgery detection ability. Furthermore, the work in [38] proposed a spatiotemporal attention mechanism combined with an Xception-LSTM algorithm to improve deepfake detection.

3) GENERALIZABLE CLUE FOR DETECTION
The work in [39] presented a multi-task incremental learning-based methodology for the detection and classification of manipulated images; the model can adapt to new classes without losing existing information. OC-FakeDect [40] presented a model that uses only real face images for training and treats fake images such as deepfakes as anomalies. The research in [41] recommended adversarial training to increase generalization performance: the generalization ability is significantly enhanced by adversarially created training samples designed to challenge the classification models.
C. META-LEARNING
Meta-learning is currently one of the most promising and popular research areas in the field of artificial intelligence. Basically, with the help of meta-learning, an adaptable AI model is created that can learn to perform various tasks without needing to be trained from scratch. The meta-learning model is trained on a variety of related tasks with sparse data points, allowing it to transfer what it learns from those tasks to new related tasks. Well-known research includes MAML [42], which finds a better initial parameter so that the model can learn quickly on new tasks with fewer gradient steps. CAML [43] used context parameters and shared parameters to adapt and share information across tasks in order to avoid overfitting problems. Meta-SGD [16] is a meta-learning algorithm for learning quickly that determines not only the optimal parameter but also the optimal learning rate and update direction. TAML [44] prevents the model from becoming biased toward some tasks while adapting to new tasks with the meta-learning technique, especially the tasks sampled in the meta-training phase. MLDG [14] presented a new meta-learning method and training procedure for domain generalization, developing models that naturally generalize to new testing domains.

Some research has used meta-learning to solve the problems of face forgery detection and face anti-spoofing. The work in [45] designed a novel meta-learning framework named Regularized Fine-grained Meta-learning to identify generalized learning directions in the meta-learning process, accomplished by performing effectively in simulated domain shift scenarios. The work in [46] designed a domain generalization model named learning-to-weight, in which facial images from various domains are configured with various weights; the generalizability of the model can be balanced across many domains using their network, and the gradient of the source domain is then calibrated by the meta-optimization, allowing additional discriminative features to be learned. The work in [47] presented a frequency adversarial attack technique based on meta-learning for face forgery detection; they performed a discrete cosine transform (DCT) on the input photos and applied a fusion module to capture the strong area in the frequency domain. NAS-FAS [48] presented an approach based on neural architecture search and created a brand-new search space using pooling and central difference convolution operators. The work in [49] suggested a learnable network that extracts a Meta Pattern (MP) in a learning-to-learn architecture and created a two-stream network that uses a Hierarchical Fusion Module to hierarchically fuse the input RGB picture and the extracted MP. The discriminative features extracted from the MP enable a more generalized model by substituting the MP for handmade features.
III. METHODOLOGY
A. OVERVIEW
We propose a method based on meta-learning called the Meta Deepfake Detection (MDD) algorithm. The model aims to enhance the performance of detecting manipulated images and videos produced by a given method as well as to enhance the generalization of the detector. In the training stage, we have N related tasks TS = {TS1, TS2, ..., TSN}, N > 1, where each task TSi = {(xi, yi)} represents the ith task, xi denotes extracted feature vectors, and yi denotes the corresponding set of labels. In the evaluation stage, the trained model is tested on one or more unseen target domains TT = {TT1, TT2, ..., TTM}, M ≥ 1. The model learns from a variety of connected tasks, and the meta-learning process makes it a fast learner with good generalization ability. We define a single backbone during training, a parametrized function f(θ) with parameters θ. By training and optimizing on the source domains, the model generalizes its parameters to predict the target domains accurately. The overall architecture is displayed in Fig. 1.

B. META SPLITTING
We separate the source domains into a meta-train domain Tstrain and a meta-test domain Tstest during training to obtain domain generalization. By simulating the domain shift problem that exists in real-world situations, the model is driven to acquire generalizable knowledge about how to perform well on new domains with different distributions. We also create meta-batches for training and testing by randomly splitting the N source domains of TS; these data contain both real and fake face pairs, and these patterns are not duplicated across domains. These pairs increase the collation and comparison of information between real and fake images. They therefore also increase inter-class separability, which can be interpreted as a distinct dispersion of the feature distribution of samples, increasing differentiation during training and enhancing the quality of the model. More distinguishable characteristics can be learned by the network with less effort during optimization.

In fact, features learned by supervised learning have much less ability to generalize when subjected to unseen manipulation techniques. This suggests that supervised learning-based characteristics have a close relationship with the manipulation techniques seen during training. The features of samples produced by various manipulation techniques make it challenging to combine all of the manipulated faces. Therefore, the model generalizes more easily when the source domains are split into meta-train and meta-test sets. In addition, samples in the meta-train and meta-test sets are shuffled and selected at random, which reduces overfitting. Moreover, the data in an unseen domain is very diverse in reality, and the model has never seen or been trained on it before; meta-splitting thus makes the model easier to train and also to generalize to unseen data.
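To make the splitting step concrete, the sketch below shows one minimal per-iteration realization; the function name and interface are our own, assuming each source domain is simply a dataset object.

```python
import random

def meta_split(source_domains, n_test=1):
    """Randomly partition the N source domains into meta-train and
    meta-test sets. Called once per iteration, so the simulated
    domain shift changes from step to step."""
    domains = list(source_domains)
    random.shuffle(domains)
    return domains[n_test:], domains[:n_test]   # meta-train, meta-test
```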

C. DATA PREPROCESSING
A lot of data is used to train deep learning models, so proper dataset preparation is essential for their learning quality and prediction accuracy. In this paper, we use several existing datasets: DFDC [12], Celeb-DF-v2 [13], and FaceForensics++ [9]. These datasets include real and manipulated videos accompanied by real or fake labels. The videos are sampled to obtain images; afterward, face extraction is used to extract the faces from the images and resize them to 224 × 224 RGB format. The multitask cascaded convolutional network (MTCNN) [50] library is used to extract the faces. Fig. 2 and Fig. 3 show samples of the extracted faces. Our approach does not use any data augmentation techniques, in order to compare fairly with related studies on deepfake detection. After obtaining a set of extracted face images, a block pixel shuffling transformation is applied to part of the extracted face images to increase the diversity of the data set during training. This differs from data augmentation in that the amount of data does not change after applying the block shuffling transformation. An overview of the data preprocessing process is shown in Fig. 4.

FIGURE 2. A few samples of extracted real images in the FaceForensics++ dataset [9].

FIGURE 3. A few samples of extracted fake images in the FaceForensics++ dataset [9].
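As one concrete realization of the face extraction step described above, the sketch below combines OpenCV frame sampling with the MTCNN implementation from the facenet-pytorch package; the paper only states that an MTCNN library is used, so the specific package, sampling scheme, and function names here are our assumptions.

```python
import cv2
from facenet_pytorch import MTCNN

detector = MTCNN(image_size=224, margin=0, post_process=False)

def extract_faces(video_path, n_frames=10):
    """Sample n_frames frames from a video and crop one 224x224 face per frame."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total // n_frames, 1)
    faces = []
    for idx in range(0, total, step):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            continue
        face = detector(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if face is not None:            # None when no face is detected
            faces.append(face)          # tensor of shape (3, 224, 224)
    cap.release()
    return faces
```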
The local spatial structure of local regions can be destroyed by shuffling the pixels of an image, which prevents the network from extracting valuable features; this is also noted in research related to image encryption [51], [52], [53], [54]. However, if the blocks of the image are shuffled in a proper way, essential characteristics can be preserved while the quality of the model is enhanced [55], [56]. Additionally, the studies in [57] and [58] have demonstrated that creating patches from characteristics gathered in an image also increases the quality of the training process. These results show that block shifting and shuffling of local regions raise quality considerably when applied properly. The block shuffling transformation is therefore a data enhancement technique used to increase the performance of the system: to improve the robustness and generalization of face forgery detection, more efficient local features are extracted using neural networks. The block shuffling transformation is applied to a portion of the samples of the meta-train set; it is illustrated in Fig. 5.

FIGURE 4. Overview of data processing.

FIGURE 5. Visualization of the block shuffling transformation. On top is an example of the original image; on the bottom is the shuffled image with its block coordinates.

We divide an RGB image of dimension X × Y × 3 into blocks by using a window of size W × W × 3. The original image is divided into smaller blocks the size of the window; if the original image size is not divisible by the block size, padding is applied. We thus obtain r × c blocks, where r and c are the numbers of horizontal and vertical blocks and i ∈ {0, 1, ..., r} and j ∈ {0, 1, ..., c} are the horizontal and vertical indices. The block B(i, j) is interpreted as the block in the ith row and jth column. Each block B(i, j) is relocated randomly, where the row indices 0 to r and the column indices 0 to c are each randomly permuted.
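A minimal NumPy sketch of this transformation, following the row/column permutation just described (the padding mode and the interface are our choices):

```python
import numpy as np

def block_shuffle(img, w):
    """Shuffle an X x Y x 3 image in blocks of w x w x 3."""
    x, y, _ = img.shape
    # Pad so that both spatial dimensions divide evenly by the window size
    img = np.pad(img, ((0, (-x) % w), (0, (-y) % w), (0, 0)), mode="edge")
    r, c = img.shape[0] // w, img.shape[1] // w
    rows = np.random.permutation(r)      # random permutation of row block indices
    cols = np.random.permutation(c)      # random permutation of column block indices
    out = np.empty_like(img)
    for i in range(r):
        for j in range(c):
            # Block B(i, j) receives the block at the permuted coordinates
            out[i*w:(i+1)*w, j*w:(j+1)*w] = \
                img[rows[i]*w:(rows[i]+1)*w, cols[j]*w:(cols[j]+1)*w]
    return out
```

Because only block positions move, the pixel statistics of the image are preserved while local co-occurrence across block borders is disrupted, which matches the intent described above.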

D. LOSS FUNCTION
1) PAIR-ATTENTION LOSS (PAL)
The basic idea of Pair-Attention Loss (PAL) is to focus on optimizing negative and positive pairs while distinguishing positive samples from negative samples. Each iteration's batch contains B identities, and each identity contains real and fake faces. We define the input as X. With B identities, we have Fr = f(Xr, θ) ∈ R^(P×C) and Ff = f(Xf, θ) ∈ R^(N×C), where C is the embedding dimension, Fr is the embedding vector of a real face obtained through the model f(θ), and Ff is the embedding vector of a fake face obtained through f(θ), with labels l = {l1, l2, ..., lC}, li ∈ {0, 1}, where ''0'' denotes a real sample and ''1'' a fake sample. P is the number of positive samples and N the number of negative samples in a batch B, with B = P + N. The PAL function is formulated as follows:

$$L_{PAL} = \frac{1}{2(P+1)} \sum_{i \in P} \left\| F_{r_i} - F_{f_i} \right\|_2 - \frac{1}{2(N+1)} \sum_{j \in N} \left\| F_{f_j} - F_{r_j} \right\|_2 \tag{1}$$
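Read literally, both sums in Eq. (1) measure real-versus-fake distances, so the intended pairing convention is not fully recoverable from the extracted text. The PyTorch sketch below therefore implements one plausible contrastive reading, in which same-class (positive) distances are minimized and real-versus-fake (negative) distances are maximized; treat the pairing scheme and the function name as our assumptions.

```python
import torch

def pair_attention_loss(f_real, f_fake):
    """Contrastive sketch of Eq. (1).
    f_real: (P, C) embeddings of real faces; f_fake: (N, C) of fake faces."""
    P, N = f_real.size(0), f_fake.size(0)
    # Positive pairs: squared distances within each class, pulled together
    pos = (torch.cdist(f_real, f_real) ** 2).sum() / (2 * (P + 1)) \
        + (torch.cdist(f_fake, f_fake) ** 2).sum() / (2 * (N + 1))
    # Negative pairs: mean squared real-vs-fake distance, pushed apart
    neg = (torch.cdist(f_real, f_fake) ** 2).mean()
    return pos - neg
```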

2) SOFTMAX LOSS (SOF)
The goal of the softmax loss is to identify a decision boundary that divides the classes by mapping the samples to discrete labels. The softmax loss function is:

$$L_{SOF} = -\sum_{i=1}^{m} \log \frac{e^{w_{y_i}^{T} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{w_{y_j}^{T} x_i + b_{y_j}}} \tag{2}$$

3) AVERAGE-CENTER ALIGNMENT LOSS (ACA)
The purpose of the average-center alignment loss (ACA) is to minimize the variations within each class while maintaining the ability to distinguish between characteristics of different classes. The domain gap between the meta-train domains can be reduced by adding the average-center alignment loss, which makes the embedding domain-invariant. We determine the embedding center over all mean embeddings of the meta-train domains. As these embedding centers are optimized, each center point moves toward a better destination: it gets closer to the other data points of its class and reduces the gap between the two classes (''1'' and ''0''). As the embedding centers get close to each other, the embedding distributions of the class samples also get closer, and as a result the domain gap between different meta-train domains is reduced. The alignment of all meta-train domains therefore becomes easier to generalize. The average-center alignment loss is only used on the meta-train domains. The loss is formulated as:

$$c_{r_i} = \frac{1}{P} \sum_{i=1}^{P} F_{r_i}^{T_{s_j}} \tag{3}$$

$$c_{f_i} = \frac{1}{N} \sum_{i=1}^{N} F_{f_i}^{T_{s_j}} \tag{4}$$

$$c_{avg} = \frac{1}{n} \sum_{i=1}^{n} \left( c_{r_i} + c_{f_i} \right) \tag{5}$$

$$L_{ACA} = \frac{1}{n} \sum_{i=1}^{n} \left\| \left( c_{r_i} + c_{f_i} \right) - c_{avg} \right\|_2 \tag{6}$$

where $F_{r_i}^{T_{s_j}}$ and $F_{f_i}^{T_{s_j}}$ are the embedding features of real and fake samples, respectively, in the jth meta-train domain. In a batch B (B = P + N) sampled from domain Tsj, cri represents the mean embedding of the real samples of Tsj, cfi represents the mean embedding of the fake samples, n is the number of meta-train domains, and cavg is the embedding center of all meta-train domains. Ideally, cri and cfi should be updated whenever the deep features change; however, it is inefficient and perhaps impracticable to calculate the mean embedding of each class in each iteration when taking the entire training set into consideration. Therefore, instead of calculating and updating the centers over the entire training set, we perform the update on the B identities in a batch.
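A PyTorch sketch of Eqs. (3)-(6) follows; it assumes one (real, fake) embedding batch per meta-train domain, and the function name is ours.

```python
import torch

def average_center_alignment_loss(domain_batches):
    """Sketch of Eqs. (3)-(6). domain_batches: list of (f_real, f_fake)
    embedding tensors, one pair per meta-train domain."""
    centers = []
    for f_real, f_fake in domain_batches:
        c_r = f_real.mean(dim=0)          # Eq. (3): mean real embedding
        c_f = f_fake.mean(dim=0)          # Eq. (4): mean fake embedding
        centers.append(c_r + c_f)
    centers = torch.stack(centers)         # (n, C), one row per domain
    c_avg = centers.mean(dim=0)            # Eq. (5): shared embedding center
    # Eq. (6): pull each domain's center toward the shared center
    return (centers - c_avg).norm(dim=1).mean()
```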

E. LEARNING PROCESS
In the meta-learning algorithm, we train the model over a distribution of related tasks and look for a better model parameter that can be used in a variety of similar tasks and easily adopted for new tasks.

1) META-TRAIN
In each iteration, we apply meta-splitting to obtain the meta-train sets Tstrain and the meta-test sets Tstest. For meta-training, Tstrain is used to calculate the loss function in the training stage. In each task TSitrain of Tstrain, the data points are divided into batches of size B, which contain both fake and real samples. The purpose of the meta-training stage is to calculate the loss of each task based on the binary classification model f(θ), where θ represents the model parameters. The loss function of the meta-train stage, Ltrain, is formulated as:

$$L_{train} = L_{PAL} + L_{SOF} + \lambda L_{ACA} \tag{7}$$

Here λ is a hyper-parameter that balances the average-center alignment loss (ACA) against the other losses. Because the average-center alignment loss can reduce the domain gap between the meta-train domains, it also pulls the distribution of the data closer to its center point.

2) META-TEST
After the meta-train stage, in each iteration the model is validated on the meta-test sets Tstest and on meta sets of unseen target domains TT with a different distribution. The pair-attention loss and softmax loss are calculated to update the parameters, in order to allow the model to generalize across domains. The loss of the meta-test is calculated as:

$$L_{test} = L_{PAL} + L_{SOF} \tag{8}$$

After calculating the loss functions of the meta-train and the meta-test, we update the parameters of the inner loop. The gradient is synthesized and updated as follows:

$$\nabla g_{\theta} = \gamma \nabla_{\theta} L_{train} + (1 - \gamma) \nabla_{\theta} L_{test} \tag{9}$$

$$g_{\theta}^{*} \leftarrow g_{\theta} + \nabla g_{\theta} \tag{10}$$

3) META-OPTIMIZATION
After updating the model in the inner loop, the model is updated thoroughly in the outer loop. We use stochastic gradient descent (SGD) for this optimization.
[13], FaceForensics++ [9]. The DFDC dataset, which has
2) META-TEST over 100,000 total videos gathered from 3,426 paid actors
After the meta-train, in each iteration, the model is validated and was created using a variety of Deepfake, GAN-based
on the meta-test sets Tstest and meta sets of unseen target and unlearned algorithms, the DFDC dataset is a sizable face
domains TT with a different distribution. The pair-attention swap video dataset and freely accessible. The Celeb-DF-v2
loss and softmax loss are calculated to update parameters. presents a large-scale Deepfake video dataset based on the
In order to allow the model to generalize across domains. The development and evaluation of improved deepfake synthe-
loss of the meta-test is calculated as follows: sis algorithms. The Celeb-DF-v2 contains 5639 high-quality
Ltest = LPAL + LSOF (8) Deepfake videos. The FaceForensics++ dataset contains
four state-of-the-art methods for facial manipulation, namely,
After calculating the loss function of the meta-train and the Deepfakes, FaceSwap, Face2Face, and NeuralTextures. Each
meta-test, we need to update the parameters of the inner loop. method has different manipulated techniques and algorithms.
The gradient is synthesized and updated as follows: Corresponding to each method, it includes 1000 original
videos (real videos) and 1000 manipulated videos (fake
∇gθ = γ ∇ θ Ltrain + (1 − γ ) ∇θ Ltest (9) videos). This dataset released raw videos and compressed
g∗θ ← gθ + ∇gθ (10) videos (high-quality videos and low-quality videos).

IV. EXPERIMENTS
To evaluate the quality of our proposed MDD, we use open datasets of facial synthesis: DFDC [12], Celeb-DF-v2 [13], and FaceForensics++ [9]. The DFDC dataset, which has over 100,000 total videos gathered from 3,426 paid actors and was created using a variety of Deepfake, GAN-based, and non-learned algorithms, is a sizable and freely accessible face swap video dataset. Celeb-DF-v2 is a large-scale Deepfake video dataset built for the development and evaluation of improved deepfake synthesis algorithms; it contains 5,639 high-quality Deepfake videos. The FaceForensics++ dataset covers four state-of-the-art facial manipulation methods, namely Deepfakes, FaceSwap, Face2Face, and NeuralTextures. Each method uses different manipulation techniques and algorithms, and for each method the dataset includes 1,000 original (real) videos and 1,000 manipulated (fake) videos. This dataset was released as raw videos and compressed videos (high-quality and low-quality versions).

A. EVALUATION BENCHMARKS
Based on the popular datasets mentioned above, we use videos generated by different methods for the source domains and the target unseen domains. We rely on this variety to illustrate the large gap between target unseen domains and source domains. In real-world scenarios, after a model is trained it is validated with many different manipulated videos; the model even needs to be evaluated on videos generated by a specific method it has never been trained on (the unseen domain). The target unseen domain aims to simulate this situation. The detailed content of the evaluation benchmarks is illustrated in Table 1. Inspired by [46], we take advantage of similar evaluation benchmarks, which also allows comparison with related research.
TABLE 1. Seven evaluation benchmarks. The FaceForensics++ dataset uses compressed videos: ''C23'' means higher quality (constant rate quantization parameter equal to 23) and ''C40'' means lower quality (quantization parameter equal to 40). For benchmarking in the unseen domain, source domains are used for training and target domains for testing. CID: crossing intra-datasets. CVD: crossing a variety of datasets.
B. SETTINGS
The official release of each method in the FaceForensics++ dataset includes 720 videos for training, 140 videos for validation, and 140 videos for testing. For each method of the FaceForensics++ dataset, we use a training set for the source domains and a testing set for the target domains. (For example, in CID-DF23, NeuralTextures, FaceSwap, Face2Face, and the original videos provide 720 videos per method in the source domains, while DeepFakes and the original videos provide 140 videos for the target domains.) The Celeb-DF-v2 dataset contains 5,639 high-quality fake videos and 890 real videos. For the source domains of CVD-CV23-1 and CVD-CV23-3, we used 6,011 original and DeepFake videos from Celeb-DF-v2. For the target domains of CVD-CV23-2 and CVD-CV23-3, we selected 518 test videos (official release) from Celeb-DF-v2. In CVD-CV23-2 and CVD-CV23-3, we used the DFDC test set for the target domains. All of these videos are sampled and the faces are extracted with the multitask cascaded convolutional network (MTCNN) [50] library. We choose only 10 frames of facial extraction per video for training and testing. The extracted face images are resized to 224 × 224 RGB format.

C. IMPLEMENTATION DETAILS
We use EfficientNet-B0 [59], with 5.3M parameters, as the single backbone f(θ). The meta-train step size α, the meta-optimization learning rate β, the hyper-parameter γ (which balances the meta-train and meta-test losses), and the hyper-parameter λ (which balances the PAL and SOF losses against the ACA loss) are initially set to 0.0005, 0.0005, 0.5, and 0.01, respectively. The batch size B is set to 32. To evaluate the performance of the model, our comparisons include: (i) Base: the model pre-trained on ImageNet [60] without being fine-tuned on our benchmarks. (ii) FT-Base: the base model fine-tuned with the same training datasets of our benchmarks, providing a fair comparison with our MDD. (iii) Multi-task [61]: a multi-task learning approach proposed to improve the generalization of the model, with two tasks: one shares knowledge to improve the performance of both tasks, and the other shares the data it has collected; we ran the official code on our benchmarks. (iv) MLDG [14]: a novel meta-learning method and model-agnostic training procedure for domain generalization; we adapted it for face forgery detection and trained it on our benchmarks. (v) Learning-to-weight (LTW) [46]: a domain-general model that balances different weights for face forgery images from various domains, also proposed to handle deepfake detection problems; experimental results already reported on several similar benchmarks in their paper are reused, and the benchmarks not yet tested in their proposal are conducted in our research. (vi) Multi-attentional model (Multi-Att) [62]: a multiple spatial attention network that combines attention-map-guided high-level semantic information with low-level textural features for face forgery detection; we ran the official code on our benchmarks. (vii) Model-Agnostic Meta-Learning (MAML) [42]: a well-known meta-learning model; we adapted it for face forgery detection and trained it on our benchmarks.
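A minimal sketch of how such a backbone might be instantiated is shown below; the paper names only the architecture, so the torchvision instantiation and the binary head are our choices.

```python
import torch.nn as nn
from torchvision import models

def build_backbone(num_classes=2):
    """EfficientNet-B0 feature extractor with a binary real/fake head."""
    net = models.efficientnet_b0(weights="IMAGENET1K_V1")  # ImageNet pre-training
    # Replace the 1000-way ImageNet classifier with a 2-way real/fake head
    net.classifier[1] = nn.Linear(net.classifier[1].in_features, num_classes)
    return net
```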

TABLE 2. Performance on the CID-DF23/40, CID-FF23/40, CID-FS23/40, and CID-NT23/40 benchmarks.

D. EVALUATION METRICS
For performance evaluation, we use the area under the receiver operating characteristic curve (AUC). The receiver operating characteristic (ROC) curve displays a classifier's performance across classification thresholds, and the AUC is the area covered by the ROC curve. We also use the accuracy score (ACC) for evaluating the classification models. Another metric we apply to measure model performance is the log loss. We chose the log loss metric because it measures how close the predicted probability is to the corresponding actual value, and it is appropriate for binary classification, where ''0'' represents the real class and ''1'' represents the fake class:

$$LogLoss = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right] \tag{11}$$
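All three metrics can be computed with scikit-learn, as in the short sketch below (the helper name and the 0.5 threshold are ours):

```python
from sklearn.metrics import accuracy_score, log_loss, roc_auc_score

def evaluate(y_true, y_prob, threshold=0.5):
    """AUC, ACC, and log loss for binary real(0)/fake(1) labels,
    where y_prob is the predicted probability of the fake class."""
    y_pred = [int(p > threshold) for p in y_prob]
    return {
        "AUC": roc_auc_score(y_true, y_prob),
        "ACC": accuracy_score(y_true, y_pred),
        "LogLoss": log_loss(y_true, y_prob),   # Eq. (11)
    }
```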

E. EVALUATION RESULTS
1) CID COMPARISONS
From the results in Table 2, our proposal achieves superior results on most of the benchmarks. The Base model is pre-trained on ImageNet; without being fine-tuned on our benchmarks, its results are frequently insufficient to identify false information. The FT-Base model, fine-tuned on our benchmarks, can detect fake images but cannot generalize well to the target domains, especially for low-quality images and videos. The Multi-task, MLDG, and Learning-to-weight (LTW) methods take different approaches, each offering a different solution for generalizing the model in order to identify tampering from as many sources as possible; notably, the outcomes of LTW are fairly promising. Our method achieves the best results on higher-quality images. Compared with FT-Base on AUC, our method improves performance from 0.903 to 0.931 on CID-DF23, from 0.742 to 0.777 on CID-DF40, from 0.792 to 0.821 on CID-FF23, from 0.669 to 0.691 on CID-FF40, from 0.579 to 0.658 on CID-FS23, from 0.609 to 0.681 on CID-FS40, from 0.764 to 0.791 on CID-NT23, and from 0.618 to 0.621 on CID-NT40. This demonstrates that our method improves the generalization of the model on all of the CID benchmarks.


TABLE 3. Performance on the CVD-CV23-1, CVD-CV23-2, and CVD-CV23-3 benchmarks.

TABLE 4. Performance on different backbone architectures with or without our proposal on the CID-DF23 benchmark.

TABLE 5. Ablation study of the loss functions on the CID-DF23 benchmark.

TABLE 6. Ablation study of the data preprocessing technique on the CID-DF23 benchmark.

2) CVD COMPARISONS
Results on the CVD benchmarks are shown in Table 3. Here we focus on performance across datasets: the target domains are the test sets of several datasets. The obtained results show that our proposal improves the quality of the model on all of these benchmarks. Compared with the most basic and commonly used model, FT-Base, the performance improvements on AUC are from 0.582 to 0.708 on the CVD-CV23-1 benchmark, from 0.672 to 0.788 on the CVD-CV23-2 benchmark, and from 0.717 to 0.821 on the CVD-CV23-3 benchmark. This shows that our proposal increases the generalization of the basic model when tested on these benchmarks.

The results in Table 4 show the effect of different backbones with and without our proposal. We use the CID-DF23 benchmark and test different architectures (with light and heavy parameter counts). The observed results demonstrate that our method is model-independent and can improve performance irrespective of the type of architecture. The gain is smaller for complex models than for simple ones, because the higher the performance of a model already is, the harder it is to increase beyond a certain level. The backbones used for the experiment are all strong backbones for image classification, so the difference in results between them is usually not large.

3) EFFECTIVENESS OF DIFFERENT COMPONENTS
We compare our entire MDD with three degraded versions on the CID-DF23 benchmark to assess the efficacy of its components. The first component is the pair-attention loss (PAL), which prioritizes optimizing negative and positive pairings and distinguishing positive from negative input. The second component is the average-center alignment loss (ACA), which focuses on lowering the variability within each class while maintaining the ability to distinguish between attributes of other classes. The third configuration uses both the pair-attention loss and the average-center alignment loss. The efficiency of each component is displayed in Table 5: when any of them is eliminated, performance decreases. The quality of the model degrades the most when neither the pair-attention loss nor the average-center alignment loss is employed. This demonstrates the impact of the proposed loss functions on the quality of the model.

Table 6 displays the effects of the block shuffling transformation. The results of the model fluctuate around an average AUC of 0.919 if the block shuffling transformation is not used in the data preprocessing. With the block shuffling transformation, performance improves from an AUC of 0.919 to 0.931, ACC improves from 0.832 to 0.861, and the loss decreases from 0.84 to 0.78. This demonstrates that the block shuffling transformation improves the performance of the model.


V. CONCLUSION
In this paper, we propose an approach that improves the generalization of the model, named the Meta Deepfake Detection (MDD) model. We also apply the block shuffling transformation to enhance performance and reduce overfitting. Moreover, we design two loss functions, Pair-Attention Loss and Average-Center Alignment Loss, which are aggregated with the softmax loss to update the model and learn across domains. We show that by using MDD we can generalize to unseen domains, as demonstrated in the experiments on several benchmarks. For future work, we will look for new strategies to develop MDD and experiment with more benchmarks.
REFERENCES
[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, ''Generative adversarial networks,'' Commun. ACM, vol. 63, no. 11, pp. 139–144, 2020.
[2] D. P. Kingma and M. Welling, ''Auto-encoding variational Bayes,'' 2013, arXiv:1312.6114.
[3] D. J. Rezende, S. Mohamed, and D. Wierstra, ''Stochastic backpropagation and approximate inference in deep generative models,'' in Proc. Int. Conf. Mach. Learn., 2014, pp. 1278–1286.
[4] R. Wu, G. Zhang, S. Lu, and T. Chen, ''Cascade EF-GAN: Progressive facial expression editing with local focuses,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 5021–5030.
[5] Y. Shen, J. Gu, X. Tang, and B. Zhou, ''Interpreting the latent space of GANs for semantic face editing,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 9243–9252.
[6] C.-H. Lee, Z. Liu, L. Wu, and P. Luo, ''MaskGAN: Towards diverse and interactive facial image manipulation,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 5549–5558.
[7] F. Matern, C. Riess, and M. Stamminger, ''Exploiting visual artifacts to expose deepfakes and face manipulations,'' in Proc. IEEE Winter Appl. Comput. Vis. Workshops (WACVW), Jan. 2019, pp. 83–92.
[8] L. Li, J. Bao, T. Zhang, H. Yang, D. Chen, F. Wen, and B. Guo, ''Face X-ray for more general face forgery detection,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 5001–5010.
[9] A. Rossler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Niessner, ''FaceForensics++: Learning to detect manipulated facial images,'' in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 1–11.
[10] X. Wu, Z. Xie, Y. Gao, and Y. Xiao, ''SSTNet: Detecting manipulated faces through spatial, steganalysis and temporal features,'' in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2020, pp. 2952–2956.
[11] Y. Qian, G. Yin, L. Sheng, Z. Chen, and J. Shao, ''Thinking in frequency: Face forgery detection by mining frequency-aware clues,'' in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 86–103.
[12] B. Dolhansky, J. Bitton, B. Pflaum, J. Lu, R. Howes, M. Wang, and C. C. Ferrer, ''The DeepFake detection challenge (DFDC) dataset,'' 2020, arXiv:2006.07397.
[13] Y. Li, X. Yang, P. Sun, H. Qi, and S. Lyu, ''Celeb-DF: A large-scale challenging dataset for DeepFake forensics,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 3207–3216.
[14] D. Li, Y. Yang, Y.-Z. Song, and T. Hospedales, ''Learning to generalize: Meta-learning for domain generalization,'' in Proc. AAAI Conf. Artif. Intell., 2018, vol. 32, no. 1, pp. 1–8.
[15] K. Hsu, S. Levine, and C. Finn, ''Unsupervised learning via meta-learning,'' 2018, arXiv:1810.02334.
[16] Z. Li, F. Zhou, F. Chen, and H. Li, ''Meta-SGD: Learning to learn quickly for few-shot learning,'' 2017, arXiv:1707.09835.
[17] R. Natsume, T. Yatagawa, and S. Morishima, ''RSGAN: Face swapping and editing using face and hair representation in latent spaces,'' 2018, arXiv:1804.03447.
[18] Y. Nirkin, Y. Keller, and T. Hassner, ''FSGAN: Subject agnostic face swapping and reenactment,'' in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 7184–7193.
[19] Y. Nirkin, Y. Keller, and T. Hassner, ''FSGANv2: Improved subject agnostic face swapping and reenactment,'' 2022, arXiv:2202.12972.
[20] Z. Xu, Z. Hong, C. Ding, Z. Zhu, J. Han, J. Liu, and E. Ding, ''MobileFaceSwap: A lightweight framework for video face swapping,'' 2022, arXiv:2201.03808.
[21] Z. He, W. Zuo, M. Kan, S. Shan, and X. Chen, ''AttGAN: Facial attribute editing by only changing what you want,'' IEEE Trans. Image Process., vol. 28, no. 11, pp. 5464–5478, Nov. 2019.
[22] M. Liu, Y. Ding, M. Xia, X. Liu, E. Ding, W. Zuo, and S. Wen, ''STGAN: A unified selective transfer network for arbitrary image attribute editing,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 3673–3682.
[23] Y. Choi, Y. Uh, J. Yoo, and J.-W. Ha, ''StarGAN V2: Diverse image synthesis for multiple domains,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 8188–8197.
[24] R. Durall, J. Jam, D. Strassel, M. H. Yap, and J. Keuper, ''FacialGAN: Style transfer and attribute manipulation on synthetic faces,'' 2021, arXiv:2110.09425.
[25] S. Ha, M. Kersner, B. Kim, S. Seo, and D. Kim, ''Marionette: Few-shot face reenactment preserving identity of unseen targets,'' in Proc. AAAI Conf. Artif. Intell., 2020, vol. 34, no. 7, pp. 10893–10900.
[26] X. Zeng, Y. Pan, M. Wang, J. Zhang, and Y. Liu, ''Realistic face reenactment via self-supervised disentangling of identity and pose,'' in Proc. AAAI Conf. Artif. Intell., 2020, vol. 34, no. 7, pp. 12757–12764.
[27] J. Zhang, X. Zeng, M. Wang, Y. Pan, L. Liu, Y. Liu, Y. Ding, and C. Fan, ''FReeNet: Multi-identity face reenactment,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 5326–5335.
[28] Y. Guo, K. Chen, S. Liang, Y.-J. Liu, H. Bao, and J. Zhang, ''AD-NeRF: Audio driven neural radiance fields for talking head synthesis,'' in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 5784–5794.
[29] S. Tripathy, J. Kannala, and E. Rahtu, ''FACEGAN: Facial attribute controllable reenactment GAN,'' in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Jan. 2021, pp. 1329–1338.
[30] H. Dang, F. Liu, J. Stehouwer, X. Liu, and A. K. Jain, ''On the detection of digital face manipulation,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 5781–5790.
[31] G. Jia, M. Zheng, C. Hu, X. Ma, Y. Xu, L. Liu, Y. Deng, and R. He, ''Inconsistency-aware wavelet dual-branch network for face forgery detection,'' IEEE Trans. Biometrics, Behav., Identity Sci., vol. 3, no. 3, pp. 308–319, Jul. 2021.
[32] H. H. Nguyen, J. Yamagishi, and I. Echizen, ''Capsule-forensics: Using capsule networks to detect forged images and videos,'' in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2019, pp. 2307–2311.
[33] Y. Huang, F. Juefei-Xu, Q. Guo, Y. Liu, and G. Pu, ''FakeLocator: Robust localization of GAN-based face manipulations,'' IEEE Trans. Inf. Forensics Security, vol. 17, pp. 2657–2672, 2022.
[34] L. Guarnera, O. Giudice, and S. Battiato, ''DeepFake detection by analyzing convolutional traces,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2020, pp. 666–667.
[35] D. Afchar, V. Nozick, J. Yamagishi, and I. Echizen, ''MesoNet: A compact facial video forgery detection network,'' in Proc. IEEE Int. Workshop Inf. Forensics Secur. (WIFS), Dec. 2018, pp. 1–7.
[36] U. A. Ciftci, I. Demir, and L. Yin, ''FakeCatcher: Detection of synthetic portrait videos using biological signals,'' IEEE Trans. Pattern Anal. Mach. Intell., early access, Jul. 15, 2020, doi: 10.1109/TPAMI.2020.3009287.


[37] Y. Ru, W. Zhou, Y. Liu, J. Sun, and Q. Li, ''Bita-Net: Bi-temporal attention network for facial video forgery detection,'' in Proc. IEEE Int. Joint Conf. Biometrics (IJCB), Aug. 2021, pp. 1–8.
[38] B. Chen, T. Li, and W. Ding, ''Detecting DeepFake videos based on spatiotemporal attention and convolutional LSTM,'' Inf. Sci., vol. 601, pp. 58–70, Jul. 2022.
[39] F. Marra, C. Saltori, G. Boato, and L. Verdoliva, ''Incremental learning for the detection and classification of GAN-generated images,'' in Proc. IEEE Int. Workshop Inf. Forensics Secur. (WIFS), Dec. 2019, pp. 1–6.
[40] H. Khalid and S. S. Woo, ''OC-FakeDect: Classifying DeepFakes using one-class variational autoencoder,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2020, pp. 656–657.
[41] Z. Wang, Y. Guo, and W. Zuo, ''DeepFake forensics via an adversarial game,'' IEEE Trans. Image Process., vol. 31, pp. 3541–3552, 2022.
[42] C. Finn, P. Abbeel, and S. Levine, ''Model-agnostic meta-learning for fast adaptation of deep networks,'' in Proc. Int. Conf. Mach. Learn., 2017, pp. 1126–1135.
[43] L. Zintgraf, K. Shiarli, V. Kurin, K. Hofmann, and S. Whiteson, ''Fast context adaptation via meta-learning,'' in Proc. Int. Conf. Mach. Learn., 2019, pp. 7693–7702.
[44] M. A. Jamal and G.-J. Qi, ''Task agnostic meta-learning for few-shot learning,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 11719–11727.
[45] R. Shao, X. Lan, and P. C. Yuen, ''Regularized fine-grained meta face anti-spoofing,'' in Proc. AAAI Conf. Artif. Intell., 2020, vol. 34, no. 7, pp. 11974–11981.
[46] K. Sun, H. Liu, Q. Ye, Y. Gao, J. Liu, L. Shao, and R. Ji, ''Domain general face forgery detection by learning to weight,'' in Proc. AAAI Conf. Artif. Intell., 2021, vol. 35, no. 3, pp. 2638–2646.
[47] S. Jia, C. Ma, T. Yao, B. Yin, S. Ding, and X. Yang, ''Exploring frequency adversarial attacks for face forgery detection,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 4103–4112.
[48] Z. Yu, J. Wan, Y. Qin, X. Li, S. Z. Li, and G. Zhao, ''NAS-FAS: Static-dynamic central difference network search for face anti-spoofing,'' IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 9, pp. 3005–3023, Sep. 2021.
[49] R. Cai, Z. Li, R. Wan, H. Li, Y. Hu, and A. C. Kot, ''Learning meta pattern for face anti-spoofing,'' IEEE Trans. Inf. Forensics Security, vol. 17, pp. 1201–1213, 2022.
[50] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, ''Joint face detection and alignment using multitask cascaded convolutional networks,'' IEEE Signal Process. Lett., vol. 23, no. 10, pp. 1499–1503, Oct. 2016.
[51] T. Chuman, W. Sirichotedumrong, and H. Kiya, ''Encryption-then-compression systems using grayscale-based image encryption for JPEG images,'' IEEE Trans. Inf. Forensics Security, vol. 14, no. 6, pp. 1515–1525, Jun. 2019.
[52] W. Sirichotedumrong and H. Kiya, ''Grayscale-based block scrambling image encryption using YCbCr color space for encryption-then-compression systems,'' APSIPA Trans. Signal Inf. Process., vol. 8, no. 1, pp. 1–15, 2019.
[53] W. Sirichotedumrong, Y. Kinoshita, and H. Kiya, ''Pixel-based image encryption without key management for privacy-preserving deep neural networks,'' IEEE Access, vol. 7, pp. 177844–177855, 2019.
[54] M. Du, S. Pentyala, Y. Li, and X. Hu, ''Towards generalizable DeepFake detection with locality-aware AutoEncoder,'' in Proc. 29th ACM Int. Conf. Inf. Knowl. Manag., Oct. 2020, pp. 325–334.
[55] M. Maung, A. Pyone, and H. Kiya, ''Encryption inspired adversarial defense for visual classification,'' in Proc. IEEE Int. Conf. Image Process. (ICIP), Oct. 2020, pp. 1681–1685.
[56] Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang, ''Random erasing data augmentation,'' in Proc. AAAI Conf. Artif. Intell., 2020, vol. 34, no. 7, pp. 13001–13008.
[57] D. A. Coccomini, N. Messina, C. Gennaro, and F. Falchi, ''Combining EfficientNet and vision transformers for video DeepFake detection,'' in Proc. Int. Conf. Image Anal. Process. Cham, Switzerland: Springer, 2022, pp. 219–229.
[58] D. Wodajo and S. Atnafu, ''DeepFake video detection using convolutional vision transformer,'' 2021, arXiv:2102.11126.
[59] M. Tan and Q. Le, ''EfficientNet: Rethinking model scaling for convolutional neural networks,'' in Proc. Int. Conf. Mach. Learn., 2019, pp. 6105–6114.
[60] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, ''ImageNet: A large-scale hierarchical image database,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255.
[61] H. H. Nguyen, F. Fang, J. Yamagishi, and I. Echizen, ''Multi-task learning for detecting and segmenting manipulated facial images and videos,'' in Proc. IEEE 10th Int. Conf. Biometrics Theory, Appl. Syst. (BTAS), Sep. 2019, pp. 1–8.
[62] H. Zhao, T. Wei, W. Zhou, W. Zhang, D. Chen, and N. Yu, ''Multi-attentional DeepFake detection,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 2185–2194.
[63] K. He, X. Zhang, S. Ren, and J. Sun, ''Deep residual learning for image recognition,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.

VAN-NHAN TRAN received the B.E. degree in electronics-telecommunications engineering from the Vietnam National University Ho Chi Minh City, University of Technology, Vietnam, in 2018. He is currently pursuing the M.S.E. degree in artificial intelligence convergence with Pukyong National University, South Korea. His research interests include computer vision, multimedia security, machine learning, and AI.

SEONG-GEUN KWON received the B.S., M.S., and Ph.D. degrees in electrical engineering from Kyungpook National University, South Korea, in 1996, 1998, and 2002, respectively. He worked as a Senior Engineer with the Mobile Division, Samsung Electronics, from 2002 to 2011. He is currently working as a Professor with the Department of Electronic Engineering, Kyungil University. His research interests include mobile devices, multimedia security, and computer vision.

SUK-HWAN LEE received the B.S., M.S., and Ph.D. degrees in electrical engineering from Kyungpook National University, South Korea, in 1999, 2001, and 2004, respectively. He is currently a Professor with the Department of Computer Engineering, Dong-A University. His research interests include multimedia security, digital image processing, and computer graphics. He is a Thesis Editor-in-Chief of the KMMS journal.

HOANH-SU LE received the B.E. degree in electronics and telecommunication and the M.Sc. and M.B.A. degrees in MIS from the Vietnam National University Ho Chi Minh City, and the Ph.D. degree in MIS from Pukyong National University, South Korea. From 2006 to 2011, he was a Senior Engineer and a Project Team Leader at Global CyberSoft. Since 2011, he has been a Faculty Member with the University of Economics and Law, Vietnam National University Ho Chi Minh City, where he is currently the Dean of the Faculty of Information Systems. His research interests include data analytics, big data, robotics, and AI.

KI-RYONG KWON received the B.S., M.S., and Ph.D. degrees in electronics engineering from Kyungpook National University, in 1986, 1990, and 1994, respectively. He worked at Hyundai Motor Company, from 1986 to 1988, and at the Pusan University of Foreign Language, from 1996 to 2006. He was a Postdoctoral Researcher at the University of Minnesota, USA, from 2000 to 2002. He is currently the Dean of the Engineering College as well as a Professor with the Department of IT Convergence and Application Engineering, Pukyong National University. His research interests include digital image processing, multimedia security, bioinformatics, and machine learning. He was the President of the Korea Multimedia Society, from 2015 to 2016.
