Advancing Fake News Detection: Hybrid Deep Learning With FastText and Explainable AI
ABSTRACT The widespread propagation of misinformation on social media platforms poses a significant
concern, prompting substantial endeavors within the research community to develop robust detection
solutions. Individuals often place unwavering trust in social networks, without discerning the origins
and authenticity of the information disseminated through these platforms. Hence, the identification of
media-rich fake news necessitates an approach that adeptly leverages multimedia elements and effectively
enhances detection accuracy. The ever-changing nature of cyberspace highlights the need for measures
that may effectively resist the spread of media-rich fake news while protecting the integrity of information
systems. This study introduces a robust approach for fake news detection, utilizing three publicly available
datasets: WELFake, FakeNewsNet, and FakeNewsPrediction. We integrated FastText word embeddings with
various Machine Learning and Deep Learning methods, further refining these algorithms with regularization
and hyperparameter optimization to mitigate overfitting and promote model generalization. Notably, a hybrid
model combining Convolutional Neural Networks and Long Short-Term Memory, enriched with FastText
embeddings, surpassed other techniques in classification performance across all datasets, registering
accuracy and F1-scores of 0.99, 0.97, and 0.99, respectively. Additionally, we utilized state-of-the-art
transformer-based models such as BERT, XLNet, and RoBERTa, enhancing them through hyperparameter
adjustments. These transformer models, surpassing traditional RNN-based frameworks, excel in managing
syntactic nuances, thus aiding in semantic interpretation. In the concluding phase, explainable AI modeling
was employed using Local Interpretable Model-Agnostic Explanations (LIME) and Latent Dirichlet Allocation (LDA) to
gain deeper insights into the model’s decision-making process.
INDEX TERMS Fake news, deep learning, interpretability modeling, machine learning, word embeddings,
transformers.
The associate editor coordinating the review of this manuscript and approving it for publication was Leimin Wang.
© 2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 12, 2024
E. Hashmi et al.: Advancing Fake News Detection: Hybrid Deep Learning

I. INTRODUCTION
In the current era, digital platforms such as social media, online forums, and websites have overtaken traditional media as the foremost sources of information [1]. This paradigm shift highlights the transformation in our methods of accessing and interacting with information [2]. Social media’s freedom of expression and instant information make it very popular, especially with the younger generation. People all over the world use these platforms to get news about everything from celebrities to politics, often without questioning if the news is real or not [3]. Fake news, which is intentionally created and verifiably false information, is seen as a threat to the stability of democratic systems, diminishing public trust in government institutions, and having a profound effect on critical societal aspects such as elections, economic conditions, and public opinions on matters like wars [4], [5].

The dissemination of fake news was markedly prominent in the key stages of the 2016 U.S. presidential election. This trend not only influenced public perception but also
raised concerns about the integrity of information consumed by voters during such significant democratic processes [6]. During that period, around 19 million bot accounts were established to disseminate false news regarding Trump and Clinton, and this deliberate strategy rapidly increased the spread and influence of misinformation among the public [7], [8]. Additionally, reports indicate that fake news tends to receive more attention on social media compared to factual news, with examples of this trend visible on prominent social media platforms. The issue of fake news is considered to be more critical than other types of misinformation [9], [10]. As the widespread presence of fake news on social media continues to challenge the trustworthiness of online information, it becomes increasingly important to develop effective measures to address this problem. With the continuous increase in data volume, the need to rapidly and efficiently gather pertinent information becomes increasingly important. This underscores the importance of using computational linguistic methods. In this context, the application of Artificial Intelligence (AI) techniques becomes crucial, providing advanced tools to detect and address misinformation effectively.

The use of AI in fake news detection is critical because it can methodically analyze the minute details of language and context that might be missed by human moderators [11], [12]. Recent progress in AI and Natural Language Processing (NLP) has heightened the interest in fake news detection, resulting in the creation of many innovative approaches for research in this area [13], [14]. The extensive array of online content, encompassing a wide range of subjects, increases the complexity of the task. This has led researchers to focus on developing methods for automated detection of fake news. Consequently, this advancement in technology is crucial for maintaining the integrity of information on the internet [15]. Identifying fake news presents a significant technological challenge for several reasons. This complexity necessitates advanced solutions to ensure the reliability and accuracy of information disseminated online.

This paper utilizes Machine Learning (ML) and Deep Learning (DL) based techniques, including state-of-the-art transformer-based models, to enhance fake news detection. By incorporating FastText word embeddings for effective text data processing and applying these methods to three publicly available datasets, we achieve a thorough and detailed analysis. This approach is crucial for accurately identifying misinformation in the world of online media. Additionally, our work integrates explainable AI methods, ensuring that our processes are not only effective but also transparent and understandable, aligning with the growing need for accountability in AI-driven solutions.

These advanced DL-based models are excellent when it comes to classification, but these models operate as black boxes [16]. To understand how the model works and which attributes contribute most to a prediction, Explainable AI (XAI) comes into play. In this work, we have utilized XAI algorithms to determine the words that contributed the most to the classification of a sentence as fake news. We have employed LIME with multiple deep learning models to interpret these black-box deep learning models.

The contributions of this research work are summarized next, followed by how the rest of the paper is organized.

A. WORK CONTRIBUTION
1) In this study, our focus is on advancing the detection of fake news by the refinement and application of established fake news detection methodologies through the use of regularization methods, optimization techniques, and hyperparameter tuning. Our methodology is carefully applied to a baseline dataset suited for binary classification, differentiating between factual and fabricated information. We carried out our work using three publicly available fake news datasets: WELFake, and two other news article datasets from Kaggle.
2) We stacked supervised and unsupervised FastText embeddings into ML-based models, including Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), and boosting classifiers such as Extreme Gradient Boosting (XGBoost) and Categorical Boosting (CatBoost). To ensure comprehensive coverage of text data, we also implemented a solution to handle out-of-vocabulary (OOV) words using FastText embeddings, allowing our models to effectively process previously unseen terms. In addition, we pursued rigorous optimization, fine-tuning regularization techniques and hyperparameters across our ML models. This meticulous approach aimed to optimize model performance, prevent overfitting, and ultimately produce robust, generalizable results.
3) Additionally, to effectively capture complex contextual information and sequential dependencies within the text data, we applied FastText embeddings in DL-based models such as Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Convolutional Neural Network (CNN). Furthermore, this study implemented state-of-the-art text classification transformer-based models, including Bidirectional Encoder Representation from Transformers (BERT), Robustly Optimized BERT (RoBERTa), and the auto-regressive transformer XLNet, with hyperparameter tuning. We leveraged these transformers for their proven ability to capture intricate contextual information and long-range dependencies in text data, making them well-suited for the complex task of fake news detection.
4) To enhance the interpretability of our results, particularly after observing the best performance of the CNN-LSTM model, we implemented Explainable AI (XAI) techniques. These included Local Interpretable Model-Agnostic Explanations (LIME), coupled with topic modeling using Latent Dirichlet Allocation (LDA), all applied to the WELFake dataset.

B. STRUCTURE OF THE PAPER
The remainder of this paper is organized as follows: Section (II) reviews the existing research on fake news detection. Section (III) details the methodology of the proposed work. Section (IV) is dedicated to presenting the results and discussions. Section (V) compares these results with baseline methods. Section (VI) delves into the interpretability modeling using LIME and LDA. Finally, Section (VII) concludes the paper and outlines future work.

characteristics from a dataset based on phrase frequency and then applying classification algorithms. The method is particularly effective at detecting rogue accounts within biased datasets, which are typical in social media platforms. The technology distinguishes between legitimate and fake identities with high accuracy. The system achieves improved accuracy by utilizing Recurrent Neural Networks (RNNs) with multiple activation functions. Furthermore, as the number of folds in cross-validation increases, the classification precision improves. The experimental analysis includes tests on both synthetic and real-time social media datasets, with real-time Twitter data obtaining roughly 96% accuracy and synthetic datasets achieving 98% accuracy.
a binary classification framework for fake news detection that combines Bidirectional Encoder Representations from Transformers (BERT) to capture global text semantics through the relationships between words in sentences, and CNN to leverage N-gram features for local text semantics. They conducted their experiments on four publicly available datasets. A similar approach was proposed by Guo et al. [39] using DL-based models and a pre-trained transformer-based BERT model for the same purpose. The results of both studies provide valuable insights into the effectiveness of these methods in the domain of fake news detection.

Praseed et al. [40] presented an approach for detecting fake news in Hindi using an ensemble of pre-trained transformer models, XLM-RoBERTa [41], mBERT, and ELECTRA [42], which are separately fine-tuned for the task of Hindi fake news detection. After undergoing appropriate fine-tuning, pre-trained transformer models have demonstrated their capability to identify fake news across various languages. In their research study, they utilized the CONSTRAINT2021 dataset [43], which comprises a total of 8192 online posts. Among these posts, 4358 are categorized as non-hostile, whereas the remaining 3834 posts exhibit some form of hostility. In their research study, Biradar et al. [44] introduced an early fusion-based approach that combined essential features extracted from context-based embeddings like BERT, XLNet, and ELMo [45]. This fusion method aimed to improve the collection of context and semantic information from social media posts, leading to increased accuracy in detecting false news. Alongside this approach, they implemented both ML and DL-based techniques. Their experiments were conducted using the “CONSTRAINT shared task 2021” dataset. Moreover, when considering the various embeddings discussed, BERT embeddings exhibited significantly superior performance compared to XLNet and ELMo, particularly when applied to the limited short text data extracted from Twitter. Additionally, combining features derived from different embeddings into a unified vector for classification resulted in a slight performance improvement.

Wu et al. [46] introduce Graph-based Semantic Structure Mining with Contrastive Learning (GETRAL), a graph-based semantic structure mining framework with contrastive learning for evidence-based fake news identification that significantly surpasses existing models on the Snopes [47] and PolitiFact [48] datasets. This methodology overcomes the constraints of earlier methods by representing claims and evidence as graph-structured data, allowing for the capture of long-distance semantic relationships. GETRAL lowers information redundancy through graph structure learning and enhances representation learning through supervised contrastive learning with adversarial augmented examples. On Snopes, GETRAL achieves an F1-Macro score of 80.61% and an F1-Micro score of 85.12%. On the PolitiFact dataset, GETRAL records an F1-Macro of 69.53% and an F1-Micro of 69.81%, demonstrating its superior performance in addressing the challenges of fake news detection by integrating advanced techniques for a more accurate and interpretable analysis.

Soga et al. [49] focus on the detection of fake news on social media by analyzing stance similarity and employing Graph Neural Networks (GNNs). Their research work proposes a method that accounts for the opinion similarity between users by examining their stances towards news articles and user post interactions. This method uses Graph Transformer Networks (GTNs) to extract both global structural information and interactions of similar stances effectively. The technique addresses stance analysis challenges in microblogs and minimizes the impact of poorly represented stance features. The approach was evaluated using custom crawled Twitter data and the benchmark FibVID (footnote 1) dataset, demonstrating significant improvements in detection performance compared to conventional methods, including state-of-the-art approaches. This advancement suggests that incorporating stance similarity in news-sharing interactions, alongside the extraction of propagation patterns characteristic of fake news, enhances the detection accuracy, making it a promising direction for future fake news detection studies.

Footnote 1: https://github.com/merry555/FibVID

Pilkevych et al. [50] explored fake news detection using GNNs, with a detailed analysis aimed at mitigating the impacts of disinformation, particularly in the context of Russia’s aggression against Ukraine. They advocate for GNNs as a potent tool for the automated identification of harmful content, emphasizing their application in monitoring online media to promptly detect and assess fake news. Their approach leverages knowledge graphs (KG) for entity recognition and relationship mapping in textual content, with an emphasis on detecting signs of negative psychological influence. Among the models evaluated, GraphSAGE stands out for its performance, achieving notable accuracy scores of 89.78% on the Politifact dataset and 98.01% on the Gossipcop dataset, when trained on data embodying signs of negative psychological influence. This research underscores the critical role of sophisticated machine learning techniques in addressing the challenge of disinformation, highlighting the effectiveness of GNNs in enhancing the accuracy and efficiency of fake news detection systems.

C. MAJOR CHALLENGES
After a comprehensive analysis of related work, the following are the current challenges in fake news detection:
1) Variability and Sophistication: Fake news often mimics genuine news in style and presentation, making it difficult to distinguish based on surface features alone. The sophistication of misinformation tactics evolves continuously, necessitating advanced detection techniques that can adapt to changing patterns [51].
2) Linguistic Nuances and Contextual Understanding: The effective detection of fake news requires a deep understanding of linguistic subtleties and the ability to interpret context. This is challenging due to the vast
diversity of languages and the specific cultural contexts within which news is disseminated [52].
3) Bias and Subjectivity: Identifying biases and subjective assertions within news content without suppressing freedom of expression or introducing detection biases presents a significant challenge.
4) Scalability and Generalizability: The ability to scale detection mechanisms to process vast quantities of data across different platforms, and ensuring these mechanisms are generalizable across various domains and languages, is a complex endeavor.

From the existing literature, it is evident that numerous studies have tackled the problem of fake news detection utilizing both traditional ML and DL-based approaches and highlighted the current challenges in the domain of fake news detection, such as the sophisticated techniques used to generate and disseminate fake news, the rapid evolution of misinformation, and the difficulty of achieving high accuracy in detection while maintaining interpretability and generalizability. In this study, we aim to contribute to this analysis by employing a comprehensive range of techniques, including ML, DL, and transformer-based models. To enhance the accuracy and generalizability of fake news detection, we leverage supervised and unsupervised FastText word embeddings using three benchmark datasets, complemented by extensive regularization techniques and hyperparameter tuning methods. A noteworthy aspect of our contribution to this paper will be our focus on addressing the limited body of work concerning XAI within fake news. By incorporating explainable AI and topic modeling techniques into our research methodology, we intend to shed light on the interpretability and transparency of our models, ultimately making fake news detection more comprehensible. Table 1 represents the comparative analysis of the current state-of-the-art methods.

III. WORK METHODOLOGY
The proposed research methodology of this study involves a systematic approach to achieving promising results, as shown in Figure 1. Each of the steps from our research methodology is further elaborated in detail below:

A. DATASET
In our study, we addressed the binary classification problem, where 0 represents fake news, and 1 represents real news. We employed three publicly available datasets: WELFake [29], FakeNewsNet [30], and FakeNewsPrediction (footnote 4). WELFake consists of 72,134 news articles, with 35,028 categorized as fake news and 37,106 classified as real news. To prevent classifier overfitting and enhance machine learning training, the authors combined data from four prominent news datasets, including those from Kaggle, McIntire, Reuters, and BuzzFeed Political, thereby enriching the dataset with a more extensive and varied collection of text data. FakeNewsNet comprises two extensive datasets that encompass a wide range of characteristics related to news content, social context, and spatiotemporal information. The third dataset, FakeNewsPrediction, comprises 3,171 instances of real news and 3,164 instances of fake news. Table 2 represents the count of instances in the three datasets used in this paper.

Footnote 4: https://www.kaggle.com/datasets/rajatkumar30/fake-news

TABLE 2. Count of instances in datasets.

B. DATA PREPROCESSING
Effective data preprocessing plays a pivotal role in enhancing the performance of various ML and DL-based models, as it involves eliminating irrelevant text from the dataset and ensuring that the data is presented in a concise and suitable format. In our study, we placed particular emphasis on two primary columns: “text,” which contained all the news comments, and “label,” representing the true or fake label. The rationale behind text preprocessing lies in its ability to significantly impact the performance of learning algorithms. By preparing the data appropriately, we can improve the quality and relevance of information used for training and analysis. To preprocess the “text” column, we implemented a series of essential steps. Initially, we converted all uppercase letters to lowercase and removed non-essential characters, such as ASCII symbols. Subsequently, we conducted tokenization of both words and sentences while eliminating stop words to further refine the data. Moreover, we employed Python’s regular expression (RegEx) library to filter and process elements such as numbers, punctuation, and specific patterns, including email addresses, URLs, and phone numbers. Additionally, we removed duplicate examples within the dataset, ensuring data quality and diversity for model training. Data preprocessing ensures that the dataset is cleansed of extraneous information that might otherwise hinder the learning process. In addition to these steps, we applied lemmatization to our text data. Lemmatization is employed to reduce words to their base or root form, promoting consistency in word usage and improving the model’s ability to recognize similarities between different inflections of the same word. Overall, our text preprocessing pipeline was designed to optimize the quality and relevance of the data fed into our learning algorithms, thereby enhancing the accuracy of fake news detection.

For our transformer-based models, we have streamlined our preprocessing to include word and sentence tokenization, converting uppercase characters to lowercase, and removing extraneous symbols. This focused approach is instrumental in addressing the issue of syntactic ambiguity, as highlighted in prior research [53]. Syntactic ambiguity presents a substantial challenge encountered in previous ML and DL-based algorithms, where words within a sentence can have multiple meanings depending on the context, making interpretation a complex endeavor. Table 3 highlights some preprocessed text data examples from the WELFake dataset.

C. WORD EMBEDDING
Word embeddings provide numerical representations for textual inputs, allowing machines to process and understand textual data more effectively. These embeddings capture semantic relationships and contextual information, facilitating tasks such as sentiment analysis, text classification, and language modeling. By transforming words into vectors in a continuous vector space, word embeddings enable machines to recognize similarities between words, capture word meanings, and generalize from the training data, ultimately enhancing the performance of various natural language tasks. In this paper, we have utilized FastText embeddings due to their effectiveness in capturing semantic information and contextual nuances within text data. FastText embeddings offer distinct advantages over traditional word embeddings, as they can represent subword information and handle out-of-vocabulary words more gracefully. These qualities make FastText embeddings a superior choice, particularly when dealing with languages with rich morphological structures and variations. Conventional word vectors disregard the internal structure of words, which holds valuable information. This information could prove beneficial when generating representations for infrequent or incorrectly spelled words. Equation 1 denotes the mathematical formula to compute FastText word embeddings [54].

$$u_w + \frac{1}{|N|} \sum_{n \in N} x_n \qquad (1)$$

where:
$u_w$: represents the vector for a word $w$ in the embedding space.
$\frac{1}{|N|}$: is the fraction representing the average.
$\sum$: is the sum symbol, used to sum over a set of vectors.
$n \in N$: specifies that we are summing over the set $N$.
$x_n$: represents the vector for the context words in the set.

FastText, a word representation tool developed by Facebook’s research division, provides both unsupervised and supervised modes, featuring pre-trained vectors for 2 million words trained on Common Crawl (600 billion tokens). Each word is represented in a 300-dimensional vector space. What sets this word embedding method apart is its unique approach, incorporating character n-grams as features in addition to individual words [55]. FastText offers two primary modes of usage: unsupervised and supervised. In our research, we have employed both of these modes, conducting a comprehensive analysis of their respective applications.
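As a rough illustration of Equation 1 and of why subword information helps with out-of-vocabulary (OOV) and misspelled words, the sketch below computes a toy word vector. It is not the authors' implementation and does not call the fastText library. The assumptions are ours: the set N is taken to be the character n-grams of the word (3 to 6 characters with `<` and `>` boundary markers, which are fastText's defaults), trained embeddings are replaced by deterministic pseudo-random vectors, the dimensionality is 8 rather than 300, and an OOV word is given a zero word-level vector. The helper names `char_ngrams`, `toy_vector`, and `embed` are hypothetical.

```python
import hashlib
import random

DIM = 8  # toy dimensionality (assumption); the paper uses 300-dimensional vectors


def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word, using < and > as boundary markers."""
    marked = f"<{word}>"
    return [
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    ]


def toy_vector(token, dim=DIM):
    """Deterministic pseudo-random vector standing in for a trained embedding."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def embed(word, vocab_vectors, dim=DIM):
    """Equation 1: u_w + (1/|N|) * sum of x_n for n in N.

    Assumption: N is the set of character n-grams of `word`, and u_w is the
    zero vector when the word is not in the vocabulary.
    """
    u_w = vocab_vectors.get(word, [0.0] * dim)
    grams = char_ngrams(word)
    if not grams:  # nothing to average over (for example, an empty string)
        return list(u_w)
    sums = [0.0] * dim
    for gram in grams:
        x_n = toy_vector(gram, dim)
        for k in range(dim):
            sums[k] += x_n[k]
    mean = [s / len(grams) for s in sums]
    return [u + m for u, m in zip(u_w, mean)]


vocab = {"election": toy_vector("election")}
in_vocab = embed("election", vocab)
oov = embed("electionz", vocab)  # misspelled word that is not in the vocabulary
shared = set(char_ngrams("election")) & set(char_ngrams("electionz"))
```

Because "electionz" shares many n-grams with "election" (the `shared` set), it still receives a usable vector built from subword pieces, whereas a conventional lookup table would have nothing to return for it.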
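The cleaning steps listed in Section III-B (lowercasing, pattern removal with regular expressions, stop-word filtering, lemmatization, and duplicate removal) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the pipeline used in the paper: the stop-word set, the tiny lemma table, and the regular expressions are illustrative stand-ins chosen by us, and a real pipeline would typically rely on an NLP toolkit for tokenization and lemmatization. The function names `preprocess` and `deduplicate` are hypothetical.

```python
import re

# Illustrative stop-word list (assumption); the paper does not publish its list.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in", "on",
              "and", "or", "at", "for"}

# Toy lemma table standing in for a real lemmatizer (assumption).
LEMMAS = {"elections": "election", "voters": "voter", "said": "say",
          "claims": "claim"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
PHONE_RE = re.compile(r"\+?\d[\d\-\s()]{7,}\d")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # numbers, punctuation, other symbols


def preprocess(text):
    """Lowercase, strip URLs/emails/phone numbers/symbols, drop stop words, lemmatize."""
    text = text.lower()
    for pattern in (URL_RE, EMAIL_RE, PHONE_RE, NON_ALPHA_RE):
        text = pattern.sub(" ", text)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return [LEMMAS.get(tok, tok) for tok in tokens]


def deduplicate(rows):
    """Keep the first occurrence of each text in a list of (text, label) pairs."""
    seen = set()
    unique = []
    for text, label in rows:
        if text not in seen:
            seen.add(text)
            unique.append((text, label))
    return unique


sample = ("Voters said the ELECTION was rigged! Email tips@example.com or "
          "visit https://example.com/news, call +1 555-123-4567.")
tokens = preprocess(sample)
rows = deduplicate([("same story", 0), ("other story", 1), ("same story", 0)])
```

The labels in `rows` follow the paper's convention of 0 for fake and 1 for real news. Removing URLs and email addresses before stripping punctuation matters here, because stripping symbols first would leave fragments such as "https" and "example" behind as ordinary tokens.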
hyperparameters and regularization techniques have been employed to optimize performance.

TABLE 5. Configuration details for ML models.

In Table 5, for DT, the Split_min values of 2, 5, and 10 dictate the minimum number of samples required for a node split, influencing the tree’s complexity and potential overfitting. In RF, the N-Estimators parameter, with values 50, 100, and 200, determines the number of trees in the forest, balancing between computational efficiency and model accuracy. The SVM with a linear kernel and LR classifiers both utilize the regularization parameter C, tested at values 0.1, 1, and 10 for SVM, and 1, 10, and 100 for LR. The C parameter plays a crucial role in controlling the strength of regularization, which helps to prevent overfitting by penalizing the magnitude of the coefficients. Lower values of C imply more regularization, constraining the model to simpler decision boundaries.

All these parameters across different models were meticulously optimized using GridSearchCV, an exhaustive search over specified parameter values. GridSearchCV systematically evaluates combinations of parameters, selecting the ones that yield the best performance metrics, thereby ensuring that each model is finely tuned for optimal accuracy and generalization. Equation 2 represents the GridSearchCV algorithm in ML. In this formulation, optimize reflects the goal of GridSearchCV to find the best model parameters. The hyperparameters $h_1 \in H_1, h_2 \in H_2, \ldots, h_n \in H_n$ are exhaustively searched to maximize the score function within their ranges. The argmax operator identifies the specific set of hyperparameters that yield the highest score, typically a measure of model accuracy or performance.

$$\text{optimize}\left(\underset{h_1 \in H_1,\, h_2 \in H_2,\, \ldots,\, h_n \in H_n}{\arg\max}\ \text{score}\big(\text{model}(h_1, h_2, \ldots, h_n)\big)\right) \qquad (2)$$

The following Table 6 represents the hyperparameters and regularization details for boosting methods in the proposed approach.

TABLE 6. Configuration details for boosting algorithms.

rate of 0.1, and logloss as the loss function. XGBoost uses a maximum depth of 3 and a subsample ratio of 0.8, while CatBoost uses a maximum depth of 6 and a subsample ratio of 0.7. These parameters are critical in managing the models’ complexity and preventing overfitting while ensuring efficient learning.

2) DL BASED MODELS
In our study, we implemented LSTM, its variant BiLSTM, GRU, and the hybrid CNN-LSTM model. These RNN-based models excel in processing sequential data, with LSTM units adept at capturing long-term dependencies. The BiLSTM variant further enhances this by processing data in both forward and backward directions, thus gaining a more comprehensive understanding of context, which is especially beneficial in complex sequential tasks. GRU, while similar to LSTM in managing sequence dependencies, offers a more streamlined architectural design. Additionally, the CNN-LSTM model combines CNN with LSTM, leveraging CNNs’ ability to extract spatial features and LSTMs’ strength in interpreting these features temporally. This hybrid model is particularly effective in tasks that require an understanding of both spatial and temporal patterns, such as video classification and time-series forecasting.

1) Regularization Techniques: Regularization techniques serve as a method in classifier training to avoid overfitting, a condition where a model predicts training data accurately but fails to generalize well to new, unseen data. The performance enhancement of the CNN-LSTM model is significantly attributed to the use of kernel L2 regularization, with a lambda setting of 0.01 for both LSTM and CNN layers. The importance of L2 regularization lies in its ability to minimize weight magnitudes, thereby encouraging the model to adopt smaller values for weights. This approach accomplishes two key goals: it minimizes the likelihood of overfitting and preserves the model’s ability to generalize across different datasets effectively. The preference for L2 regularization over L1 was a calculated choice. L1 regularization, while capable of inducing sparsity by turning some weights to zero, could lead to underfitting, an issue that emerged in the initial testing phases. The formulas for L1 and L2 regularization are detailed in equations (3) and (4), respectively.

$$L1(w) = \lambda \sum_{i=1}^{n} |w_i| \qquad (3)$$

where:
L1 regularization incorporates the absolute magnitude of coefficients as a penalty to the loss function. This addition of absolute values introduces a non-linear penalty based on the weights, making L1 regularization conducive to sparse outcomes where numerous coefficients become precisely zero.

$$L2(w) = \lambda \sum_{i=1}^{n} w_i^2 \qquad (4)$$

L2 regularization introduces the squared magnitude of coefficients as a penalty to the loss function. This squaring process results in a smoother, differentiable penalty, even at $w_i = 0$. Contrary to L1 regularization, L2 does not lead to sparse models because it generally does not push coefficients to become exactly zero, although it may reduce them to small values.

2) Hyperparameter Tuning for DL-Based Models: In the hyperparameter optimization process for DL-based models, we methodically adjusted the model’s learning process through targeted experimentation. The training period was set to 10 epochs, a duration chosen to balance effective learning against the risk of overfitting, and ended when the model’s loss decreased.

In Figure 3, the CNN-LSTM model combines two convolutional layers and LSTM layers for advanced data processing. The convolutional layers, each with 64 filters, use kernel sizes of 4 and 3 respectively, with ‘relu’ activation, effectively extracting spatial features. A MaxPooling layer follows, reducing data dimensionality and enhancing efficiency. The LSTM segment, with two layers of 50 and 30 units, captures temporal dynamics, crucial for sequential data analysis. The model concludes with a ‘softmax’-activated dense layer, making it suitable for classification tasks. This architecture excels in tasks requiring both spatial feature extraction and temporal sequence understanding. Table 7 illustrates the hyperparameters and configuration details of each DL-based model. Notably, the count of each layer has been mentioned as well.

3) TRANSFORMER BASED MODELS
The Transformer, an innovative system in Natural Language Processing (NLP), is structured to handle sequence-to-sequence tasks. It utilizes a self-attention mechanism that efficiently manages long-range dependencies and comprises two main components: an encoder and a decoder. BERT, RoBERTa, and XLNet are all encoder-only models. This architecture makes them highly effective for text classification tasks, where understanding and processing input data to generate contextual representations is crucial. Transformers were first introduced in 2017 by Vaswani et al. [56]. The Transformer’s self-attention mechanism is characterized by its ability to focus on different parts of the input sequence, which can be represented through a specific mathematical formulation.

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{Q K_i^{\top}}{\sqrt{d_k}}\right) V_i \qquad (5)$$

where:
$Q$: is the query matrix
$K$: is the key matrix
$V$: is the value matrix
$d_k$: is the dimension of the key vectors
$N$: is the length of the input sequence
$i$: is the index of the query vector

This study concentrates on the use of transformers, with a particular emphasis on the optimization of their hyperparameters. Transformers represent a notable progression from earlier language models like RNNs, which were limited by their computational intensity and memory demands, especially in generative tasks. In our research, we leveraged extensive text datasets and utilized text classification transformers, including BERT, XLNet, and RoBERTa. BERT
excels in understanding the context of a word in a sentence by looking at the words that come before and after it. XLNet, an extension of the Transformer-XL model, outperforms BERT in certain scenarios by using a permutation-based training approach. RoBERTa modifies key hyperparameters in BERT, including removing the next-sentence pretraining objective and training with much larger mini-batches and learning rates, leading to improved performance on several benchmarks. Table 8 presents the hyperparameters and configuration details for the transformer-based methods in the proposed approach.

TABLE 8. Configuration details for transformer based models.

IV. RESULTS AND DISCUSSION
In our assessment, we utilized standard metrics to evaluate the model’s performance. These metrics include accuracy, precision, recall, and F1-score, all of which offer quantitative measures of the model’s effectiveness.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{7}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{8}$$

$$\text{F1-Score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9}$$

Here TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

A. COMPUTATIONAL EFFICIENCY
To ensure a comprehensive understanding of our proposed models’ performance and efficiency, we have conducted an in-depth comparison of our achievements against existing state-of-the-art methods. Our evaluation extends beyond accuracy, precision, recall, and F1-scores to include computational efficiency, a crucial aspect for practical applications.
1) Hardware and Optimization: Our experiments were conducted on a MacBook M3 Max with 128GB of unified memory. This setup allowed us to benchmark the computational requirements accurately.
2) Post Quantization on Supervised FastText: We employed post quantization techniques to optimize the performance of our supervised FastText model. This approach was particularly beneficial for accommodating the model’s scalability and efficiency without compromising accuracy. Post-quantization enabled us to adjust learning rates dynamically, with certain parameters set to true, thereby optimizing computational resource usage.
3) ML Models Execution Time: On average, each epoch for our ML-based models required approximately 5 minutes of execution time. This efficiency demonstrates the models’ suitability for scalable applications.
4) DL-Based Models: The deep learning models took roughly 5 minutes per epoch, striking a balance between computational demand and performance.
5) Transformer-Based Models: Due to their architectural complexity, transformer-based models necessitated about 15 minutes per epoch for training. Despite the longer duration, the significant improvements in detection capabilities justify the computational investment.
6) Model Optimization: In addition to post-quantization, we explored various optimization techniques to enhance model efficiency further. These included layer pruning, dropout adjustments, and batch normalization, which collectively contributed to reducing overfitting and accelerating the training process.

B. ANALYSIS OF RESULTS: UNSUPERVISED FASTTEXT WITH ML AND DL MODELS
The weighted evaluation scores for ML and DL-based models, employing unsupervised FastText embeddings on WELFake, FakeNewsNet, and FakeNewsPrediction, are displayed in Tables 9, 10, and 11, respectively. The tables highlight the SVM classifier’s best performance across all three datasets, surpassing all other ML classifiers in both accuracy and F1-scores, achieving values of 0.92, 0.97, and 0.91, respectively. Notably, it outperforms even DL-based models utilizing unsupervised FastText embeddings. This consistent performance is noteworthy, especially considering the differing dataset sizes. The SVM classifier’s ability to effectively handle high-dimensional data, create clear decision boundaries, and navigate complex, non-linear relationships makes it a strong choice for text classification, contributing to its exceptional performance in fake news detection tasks.
TABLE 11. Results of ML and DL-Based models with unsupervised FastText on FakeNewsPrediction dataset.

Unlike the SVM classifier, which demonstrated remarkable and consistent performance, ML classifiers such as LR, RF, and DT performed inconsistently across the three datasets, even when different regularization techniques were employed with unsupervised FastText embeddings. This inconsistency underscores the challenges they faced in adapting to the unique characteristics of each dataset. In contrast, all DL-based models consistently maintained their performance and generalizability across the datasets, showcasing their reliability in handling varying data complexities.
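The regularization techniques referred to above add a penalty term to the training loss. As a minimal illustration, the sketch below evaluates the L2 penalty of Eq. (4) alongside the standard L1 penalty (the sum of absolute weights scaled by λ) for a hypothetical weight vector. The function names, the value of λ, and the weights are assumptions chosen only for illustration.

```python
def l1_penalty(weights, lam):
    """Standard L1 penalty: lambda times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 penalty of Eq. (4): lambda times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def l2_gradient(weights, lam):
    """Gradient of the L2 penalty; it shrinks smoothly to zero at w_i = 0."""
    return [2.0 * lam * w for w in weights]


# Hypothetical coefficients and regularization strength.
weights = [0.5, -2.0, 0.0, 1.5]
lam = 0.1

print("L1 penalty:", l1_penalty(weights, lam))
print("L2 penalty:", l2_penalty(weights, lam))
print("L2 gradient:", l2_gradient(weights, lam))
```

The L2 gradient is proportional to each weight, so small weights are pushed only gently and rarely reach exactly zero. The L1 penalty has a constant-magnitude slope for any non-zero weight, which is what drives coefficients to exactly zero.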
Table 12 highlights some examples of incorrect predictions made by the CNN-LSTM model using unsupervised FastText on the WELFake dataset.

TABLE 13. Results of ML and DL-Based models with supervised FastText on WELFake dataset.

TABLE 15. Results of ML and DL-Based models with supervised FastText on FakeNewsPrediction dataset.

TABLE 16. Results of transformer based models on WELFake dataset.
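For the transformer-based models, the core operation is the scaled dot-product attention of Eq. (5). The sketch below implements its standard matrix form, as described by Vaswani et al. [56], in plain Python for small inputs. The function names and toy matrices are hypothetical, and the code is not the implementation used by BERT, XLNet, or RoBERTa, which add multiple heads, learned projections, and masking.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors; K and V must have the same
    number of rows (the input sequence length N).
    """
    d_k = len(K[0])          # dimension of the key vectors
    n_cols = len(V[0])
    output = []
    for q in Q:              # one output row per query vector
        scores = [
            sum(qj * kj for qj, kj in zip(q, k)) / math.sqrt(d_k)
            for k in K
        ]
        weights = softmax(scores)   # attention weights over the N keys
        output.append([
            sum(w * v[c] for w, v in zip(weights, V))
            for c in range(n_cols)
        ])
    return output


# Hypothetical toy inputs: two tokens with 2-dimensional keys and values.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
result = attention(Q, K, V)
print(result)
```

Each output row is a weighted average of the rows of V, with more weight on the values whose keys align with the query. Here the first query matches the first key, so its output lies closer to the first value vector.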
0.02 increase in accuracy may appear marginal at first glance, it is statistically significant when considering the extensive size of the datasets involved. Specifically, the WELFake dataset includes 72,134 records, and the FakeNewsNet dataset contains 23,196 records. By implementing strategic regularization techniques and meticulous parametric tuning, we were able to achieve these promising results. Such approaches not only enhance model performance but also contribute to the robustness and generalizability of the models. This suggests that our models are not only adept at handling the specific datasets they were trained on but also have the potential to perform well across varied datasets,

$$\mathrm{Perplexity} = \exp\left(-1 \cdot \frac{\text{log likelihood}}{\text{total number of words}}\right) \tag{11}$$

We applied LDA to the WELFake, FakeNewsNet, and FakeNewsPrediction datasets mentioned in this paper. Based on the coherent terms identified, we categorized each dataset into three primary topics, providing a structured thematic understanding of the datasets. The LDA hyperparameters were tuned through a series of experiments, and the best parameters obtained, which are used in this study, are shown in Table 20.

TABLE 20. Hyperparameter tuning of LDA model.
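Eq. (11) can be evaluated directly once a topic model reports its total log-likelihood. A minimal sketch, assuming the natural-log likelihood and the token count are already available (the function name and the toy numbers are illustrative assumptions):

```python
import math


def perplexity(log_likelihood, total_words):
    """Perplexity as in Eq. (11): exp(-1 * log-likelihood / number of words)."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return math.exp(-1.0 * log_likelihood / total_words)


# Hypothetical example: a model assigning probability 1/4 to each of 10 tokens.
n_tokens = 10
log_lik = n_tokens * math.log(0.25)
print(perplexity(log_lik, n_tokens))
```

In this toy case the perplexity is 4, meaning the model is as uncertain as a uniform choice among four words. Lower perplexity on held-out text indicates a better fit.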
measures the discrepancy between the predictions of g and

FIGURE 10. Example 1: Supervised CNN-LSTM with LIME.
[4] N. Capuano, G. Fenza, V. Loia, and F. D. Nota, "Content-based fake news detection with machine and deep learning: A systematic review," Neurocomputing, vol. 530, pp. 91–103, Apr. 2023.
[5] F. Miró-Llinares and J. C. Aguerri, "Misinformation about fake news: A systematic critical review of empirical studies on the phenomenon and its status as a 'threat,'" Eur. J. Criminol., vol. 20, no. 1, pp. 356–374, Jan. 2023.
[6] C. Silverman, "This analysis shows how viral fake election news stories outperformed real news on Facebook," BuzzFeed news, vol. 16, p. 24, Jan. 2016.
[7] G. Sansonetti, F. Gasparetti, G. D’Aniello, and A. Micarelli, "Unreliable users detection in social media: Deep learning techniques for automatic detection," IEEE Access, vol. 8, pp. 213154–213167, 2020.
[8] A. Jarrahi and L. Safari, "Evaluating the effectiveness of publishers’ features in fake news detection on social media," Multimedia Tools Appl., vol. 82, no. 2, pp. 2913–2939, Jan. 2023.
[9] R. Rodríguez-Ferrándiz, "An overview of the fake news phenomenon: From untruth-driven to post-truth-driven approaches," Media Commun., vol. 11, no. 2, pp. 15–29, Apr. 2023.
[10] M. R. Kondamudi, S. R. Sahoo, L. Chouhan, and N. Yadav, "A comprehensive survey of fake news in social networks: Attributes, features, and detection approaches," J. King Saud Univ.-Comput. Inf. Sci., vol. 35, no. 6, Jun. 2023, Art. no. 101571.
[11] C. Martel and D. G. Rand, "Misinformation warning labels are widely effective: A review of warning effects and their moderating features," Current Opinion Psychol., vol. 54, Dec. 2023, Art. no. 101710.
[12] S. Wang, "Factors related to user perceptions of artificial intelligence (AI)-based content moderation on social media," Comput. Hum. Behav., vol. 149, Dec. 2023, Art. no. 107971.
[13] K. Węcel, M. Sawiński, M. Stróżyna, W. Lewoniewski, E. Księżniak, P. Stolarski, and W. Abramowicz, "Artificial intelligence—Friend or foe in fake news campaigns," Econ. Bus. Rev., vol. 9, no. 2, pp. 41–70, 2023.
[14] A. Altheneyan and A. Alhadlaq, "Big data ML-based fake news detection using distributed learning," IEEE Access, vol. 11, pp. 29447–29463, 2023.
[15] S. D. M. Kumar and A. M. Chacko, "A systematic survey on explainable AI applied to fake news detection," Eng. Appl. Artif. Intell., vol. 122, Jun. 2023, Art. no. 106087.
[16] S. Ali, F. Akhlaq, A. S. Imran, Z. Kastrati, S. M. Daudpota, and M. Moosa, "The enlightening role of explainable artificial intelligence in medical & healthcare domains: A systematic literature review," Comput. Biol. Med., vol. 166, Nov. 2023, Art. no. 107555.
[17] D. Choudhury and T. Acharjee, "A novel approach to fake news detection in social networks using genetic algorithm applying machine learning classifiers," Multimedia Tools Appl., vol. 82, no. 6, pp. 9029–9045, Mar. 2023.
[18] W. Wang, "A new benchmark dataset for fake news detection," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, vol. 2, 2021.
[19] S. Dutta and S. K. Bandyopadhyay, "Fake job recruitment detection using machine learning approach," Int. J. Eng. Trends Technol., vol. 68, no. 4, pp. 48–53, Apr. 2020.
[20] L. R. Ali, B. N. Shaker, and S. A. Jebur, "An extensive study of sentiment analysis techniques: A survey," in AIP Conf. Proc., 2023.
[21] M. A. Chandra and S. S. Bedi, "Survey on SVM and their application in image classification," Int. J. Inf. Technol., vol. 13, no. 5, pp. 1–11, Oct. 2021.
[22] H. Wang, F. G. Quintana, Y. Lu, M. Mohebujjaman, and K. Kamronnaher, "An application of ordinal logistic regression model to a health survey in a hispanic university," Tech. Rep.
[23] J. Hu and S. Szymczak, "A review on longitudinal data analysis with random forest in precision medicine," 2022, arXiv:2208.04112.
[24] M. A. Alsheikh, D. Niyato, S. Lin, H.-P. Tan, and Z. Han, "Mobile big data analytics using deep learning and apache spark," IEEE Netw., vol. 30, no. 3, pp. 22–29, May 2016.
[25] S. Lee, J. Lee, H. Moon, C. Park, J. Seo, S. Eo, S. Koo, and H. Lim, "A survey on evaluation metrics for machine translation," Mathematics, vol. 11, no. 4, p. 1006, Feb. 2023.
[26] V. Bhaskar and U. Shanmugam, "Novel spam comment detection system using countvectorizer techniques with SVM for online YouTube comments for improving the recall and precision value over naive Bayes," in Proc. AIP Conf., 2023.
[27] P. Akhtar, A. M. Ghouri, H. U. R. Khan, M. Amin ul Haq, U. Awan, N. Zahoor, Z. Khan, and A. Ashraf, "Detecting fake news and disinformation using artificial intelligence and machine learning to avoid supply chain disruptions," Ann. Operations Res., vol. 327, no. 2, pp. 633–657, Aug. 2023.
[28] A. K. Shalini, S. Saxena, and B. S. Kumar, "Designing a model for fake news detection in social media using machine learning techniques," Int. J. Intell. Syst. Appl. Eng., vol. 11, no. 2, pp. 218–226, 2023.
[29] P. K. Verma, P. Agrawal, I. Amorim, and R. Prodan, "WELFake: Word embedding over linguistic features for fake news detection," IEEE Trans. Computat. Social Syst., vol. 8, no. 4, pp. 881–893, Aug. 2021.
[30] K. Shu, D. Mahudeswaran, S. Wang, D. Lee, and H. Liu, "FakeNewsNet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media," Big Data, vol. 8, no. 3, pp. 171–188, Jun. 2020.
[31] C.-O. Truică and E.-S. Apostol, "It’s all in the embedding! Fake news detection using document embeddings," Mathematics, vol. 11, no. 3, p. 508, Jan. 2023.
[32] J. H. Joloudari, S. Hussain, M. A. Nematollahi, R. Bagheri, F. Fazl, R. Alizadehsani, R. Lashgari, and A. Talukder, "BERT-deep CNN: State of the art for sentiment analysis of COVID-19 tweets," Social Netw. Anal. Mining, vol. 13, no. 1, p. 99, Jul. 2023.
[33] D. Antony, S. Abhishek, S. Singh, S. Kodagali, N. Darapaneni, M. Rao, and A. R. Paduri, "A survey of advanced methods for efficient text summarization," in Proc. IEEE 13th Annu. Comput. Commun. Workshop Conf. (CCWC), Mar. 2023, pp. 0962–0968.
[34] J. Briskilal and C. N. Subalalitha, "An ensemble model for classifying idioms and literal texts using BERT and RoBERTa," Inf. Process. Manage., vol. 59, no. 1, Jan. 2022, Art. no. 102756.
[35] S. J. Johnson, M. R. Murty, and I. Navakanth, "A detailed review on word embedding techniques with emphasis on word2vec," Multimedia Tools Appl., vol. 2023, pp. 1–29, Oct. 2023.
[36] M. Umer, Z. Imtiaz, M. Ahmad, M. Nappi, C. Medaglia, G. S. Choi, and A. Mehmood, "Impact of convolutional neural network and FastText embedding on text classification," Multimedia Tools Appl., vol. 82, no. 4, pp. 5569–5585, Feb. 2023.
[37] A. Nanade and A. Kumar, "Combating fake news on Twitter: A machine learning approach for detection and classification of fake tweets," Int. J. Intell. Syst. Appl. Eng., vol. 12, no. 1, pp. 424–436, 2024.
[38] P. K. Verma, P. Agrawal, V. Madaan, and R. Prodan, "MCred: Multi-modal message credibility for fake news detection using BERT and CNN," J. Ambient Intell. Humanized Comput., vol. 14, no. 8, pp. 10617–10629, Aug. 2023.
[39] Z. Guo, Q. Zhang, F. Ding, X. Zhu, and K. Yu, "A novel fake news detection model for context of mixed languages through multiscale transformer," IEEE Trans. Computat. Social Syst., 2024.
[40] A. Praseed, J. Rodrigues, and P. S. Thilagam, "Hindi fake news detection using transformer ensembles," Eng. Appl. Artif. Intell., vol. 119, Mar. 2023, Art. no. 105731.
[41] S. Sai, A. W. Jacob, S. Kalra, and Y. Sharma, "Stacked embeddings and multiple fine-tuned XLM-roBERTa models for enhanced hostility identification," in Combating Online Hostile Posts in Regional Languages During Emergency Situation. Cham, Switzerland: Springer, 2021, pp. 224–235.
[42] K. Subramanyam Kalyan, A. Rajasekharan, and S. Sangeetha, "AMMUS: A survey of transformer-based pretrained models in natural language processing," 2021, arXiv:2108.05542.
[43] M. Bhardwaj, M. Shad Akhtar, A. Ekbal, A. Das, and T. Chakraborty, "Hostility detection dataset in Hindi," 2020, arXiv:2011.03588.
[44] S. Biradar, S. Saumya, and A. Chauhan, "Combating the infodemic: COVID-19 induced fake news recognition in social media networks," Complex Intell. Syst., vol. 9, no. 3, pp. 2879–2891, Jun. 2023.
[45] M. S. I. Malik, A. Nawaz, M. M. Jamjoom, and D. I. Ignatov, "Effectiveness of ELMo embeddings, and semantic models in predicting review helpfulness," Intell. Data Anal., vol. 2023, pp. 1–21, Nov. 2023.
[46] J. Wu, W. Xu, Q. Liu, S. Wu, and L. Wang, "Adversarial contrastive learning for evidence-aware fake news detection with graph neural networks," IEEE Trans. Knowl. Data Eng., 2023.
[47] K. Popat, S. Mukherjee, J. Strötgen, and G. Weikum, "Where the truth lies: Explaining the credibility of emerging claims on the Web and social media," in Proc. 26th Int. Conf. World Wide Web Companion, 2017, pp. 1003–1012.
[48] A. Vlachos and S. Riedel, "Fact checking: Task definition and dataset construction," in Proc. ACL Workshop Lang. Technol. Comput. Social Sci., 2014, pp. 18–22.
[49] K. Soga, S. Yoshida, and M. Muneyasu, "Exploiting stance similarity and graph neural networks for fake news detection," Pattern Recognit. Lett., vol. 177, pp. 26–32, Jan. 2024.
[50] I. A. Pilkevych, D. L. Fedorchuk, M. P. Romanchuk, and O. M. Naumchak, "An analysis of approach to the fake news assessment based on the graph neural networks," in Proc. CEUR Workshop, vol. 3374, 2023, pp. 56–65.
[51] T. J. Billard and R. E. Moran, "Designing trust: Design style, political ideology, and trust in 'fake' news websites," Digit. Journalism, vol. 11, no. 3, pp. 519–546, Mar. 2023.
[52] P. P. Ray, "ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope," Internet Things Cyber-Phys. Syst., vol. 3, pp. 121–154, Jan. 2023.
[53] R. S. Satpute and A. Agrawal, "A critical study of pragmatic ambiguity detection in natural language requirements," Int. J. Intell. Syst. Appl. Eng., vol. 11, no. 3s, pp. 249–259, 2023.
[54] T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin, "Advances in pre-training distributed word representations," 2017, arXiv:1712.09405.
[55] C. Qiao, B. Huang, G. Niu, D. Li, D. Dong, W. He, D. Yu, and H. Wu, "A new method of region embedding for text classification," in Proc. ICLR, 2018.
[56] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Proc. Adv. Neural Inf. Process. Syst., 2023.
[57] S. N. Edi, "Topic modelling Twitter data with latent Dirichlet allocation method," Tech. Rep., 2022.
[58] D. M. Mimno, H. M. Wallach, E. M. Talley, M. Leenders, and A. McCallum, "Optimizing semantic coherence in topic models," in Proc. Conf. Empirical Methods Natural Lang. Process., 2011, pp. 262–272.
[59] H. M. Wallach, I. Murray, R. Salakhutdinov, and D. Mimno, "Evaluation methods for topic models," in Proc. 26th Annu. Int. Conf. Mach. Learn., Jun. 2009, pp. 1105–1112.
[60] P. Biecek and T. Burzykowski, "Local interpretable model-agnostic explanations (LIME)," Explanatory Model Anal. Explore, Explain Examine Predictive Models, vol. 1, pp. 107–124, Jan. 2021.
[61] H. Mehta and K. Passi, "Social media hate speech detection using explainable artificial intelligence (XAI)," Algorithms, vol. 15, no. 8, p. 291, Aug. 2022.

MUHAMMAD MUDASSAR YAMIN is currently an Associate Professor with the Department of Information and Communication Technology, Norwegian University of Science and Technology (NTNU). He is a member with the System Security Research Group, and the focus of his research is on system security, penetration testing, security assessment, and intrusion detection. Before joining NTNU, he was an Information Security Consultant and served multiple government and private clients. He holds multiple cybersecurity certifications, such as OSCE, OSCP, LPT-MASTER, CEH, CHFI, CPTE, CISSO, and CBP.

SUBHAN ALI received the bachelor’s degree from Sukkur IBA University, through a fully funded Talent Hunt Scholarship offered by OGDCL, Pakistan, in 2021. He is currently pursuing the master’s degree in applied computer science with Norwegian University of Science and Technology (NTNU), Norway, through a NORPART-CONNECT fully funded scholarship. He is a highly motivated Researcher with a passion for advancing the field of artificial intelligence. His research interests include the intersection of explainable AI, generative AI, and natural language processing. His talent for innovative problem-solving and his dedication to advancing the field of AI makes him a valuable addition to any team.