Received 10 March 2024, accepted 20 March 2024, date of publication 25 March 2024, date of current version 29 March 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3381038

Advancing Fake News Detection: Hybrid Deep Learning With FastText and Explainable AI
EHTESHAM HASHMI 1, SULE YILDIRIM YAYILGAN 1, MUHAMMAD MUDASSAR YAMIN 1, SUBHAN ALI 2, AND MOHAMED ABOMHARA 1

1 Department of Information Security and Communication Technology (IIK), Norwegian University of Science and Technology (NTNU), 2815 Gjøvik, Norway
2 Department of Computer Science (IDI), Norwegian University of Science and Technology (NTNU), 2815 Gjøvik, Norway
Corresponding author: Ehtesham Hashmi ([email protected])
This work was supported by the Research Council of Norway through the SOCYTI Project (Technological Convergence Related to
Enabling Technologies) under Grant 331736.

ABSTRACT The widespread propagation of misinformation on social media platforms poses a significant
concern, prompting substantial endeavors within the research community to develop robust detection
solutions. Individuals often place unwavering trust in social networks without discerning the origins
and authenticity of the information disseminated through these platforms. Hence, the identification of
media-rich fake news necessitates an approach that adeptly leverages multimedia elements and effectively
enhances detection accuracy. The ever-changing nature of cyberspace highlights the need for measures
that may effectively resist the spread of media-rich fake news while protecting the integrity of information
systems. This study introduces a robust approach for fake news detection, utilizing three publicly available
datasets: WELFake, FakeNewsNet, and FakeNewsPrediction. We integrated FastText word embeddings with
various Machine Learning and Deep Learning methods, further refining these algorithms with regularization
and hyperparameter optimization to mitigate overfitting and promote model generalization. Notably, a hybrid
model combining Convolutional Neural Networks and Long Short-Term Memory, enriched with FastText
embeddings, surpassed other techniques in classification performance across all datasets, registering
accuracy and F1-scores of 0.99, 0.97, and 0.99, respectively. Additionally, we utilized state-of-the-art
transformer-based models such as BERT, XLNet, and RoBERTa, enhancing them through hyperparameter
adjustments. These transformer models, surpassing traditional RNN-based frameworks, excel in managing
syntactic nuances, thus aiding in semantic interpretation. In the concluding phase, explainable AI modeling
was employed using Local Interpretable Model-Agnostic Explanations (LIME) and Latent Dirichlet Allocation (LDA) to
gain deeper insights into the model’s decision-making process.

INDEX TERMS Fake news, deep learning, interpretability modeling, machine learning, word embeddings,
transformers.

The associate editor coordinating the review of this manuscript and approving it for publication was Leimin Wang.
2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 12, 2024
E. Hashmi et al.: Advancing Fake News Detection: Hybrid Deep Learning

I. INTRODUCTION
In the current era, digital platforms such as social media, online forums, and websites have overtaken traditional media as the foremost sources of information [1]. This paradigm shift highlights the transformation in our methods of accessing and interacting with information [2]. Social media's freedom of expression and instant information make it very popular, especially with the younger generation. People all over the world use these platforms to get news about everything from celebrities to politics, often without questioning if the news is real or not [3]. Fake news, which is intentionally created and verifiably false information, is seen as a threat to the stability of democratic systems, diminishing public trust in government institutions, and having a profound effect on critical societal aspects such as elections, economic conditions, and public opinions on matters like wars [4], [5].
The dissemination of fake news was markedly prominent in the key stages of the 2016 U.S. presidential election. This trend not only influenced public perception but also
raised concerns about the integrity of information consumed by voters during such significant democratic processes [6]. During that period, around 19 million bot accounts were established to disseminate false news regarding Trump and Clinton, and this deliberate strategy rapidly increased the spread and influence of misinformation among the public [7], [8]. Additionally, reports indicate that fake news tends to receive more attention on social media compared to factual news, with examples of this trend visible on prominent social media platforms. The issue of fake news is considered to be more critical than other types of misinformation [9], [10]. As the widespread presence of fake news on social media continues to challenge the trustworthiness of online information, it becomes increasingly important to develop effective measures to address this problem. With the continuous increase in data volume, the need to rapidly and efficiently gather pertinent information becomes increasingly important. This underscores the importance of using computational linguistic methods. In this context, the application of Artificial Intelligence (AI) techniques becomes crucial, providing advanced tools to detect and address misinformation effectively.
The use of AI in fake news detection is critical because it can methodically analyze the minute details of language and context that might be missed by human moderators [11], [12]. Recent progress in AI and Natural Language Processing (NLP) has heightened the interest in fake news detection, resulting in the creation of many innovative approaches for research in this area [13], [14]. The extensive array of online content, encompassing a wide range of subjects, increases the complexity of the task. This has led researchers to focus on developing methods for automated detection of fake news. Consequently, this advancement in technology is crucial for maintaining the integrity of information on the internet [15]. Identifying fake news presents a significant technological challenge for several reasons. This complexity necessitates advanced solutions to ensure the reliability and accuracy of information disseminated online.
This paper utilizes Machine Learning (ML) and Deep Learning (DL) based techniques, including state-of-the-art transformer-based models, to enhance fake news detection. By incorporating FastText word embeddings for effective text data processing and applying these methods to three publicly available datasets, we achieve a thorough and detailed analysis. This approach is crucial for accurately identifying misinformation in the world of online media. Additionally, our work integrates explainable AI methods, ensuring that our processes are not only effective but also transparent and understandable, aligning with the growing need for accountability in AI-driven solutions.
These advanced DL-based models are excellent when it comes to classification, but these models operate as black boxes [16]. To understand how the model works and which attributes contribute most to a prediction, Explainable AI (XAI) comes into play. In this work, we have utilized XAI algorithms to determine the words that contributed the most to the classification of a sentence as fake news. We have employed LIME with multiple deep learning models to interpret these black-box deep learning models.
The contributions of this research work are summarized below, followed by an outline of how the rest of the paper is organized.

A. WORK CONTRIBUTION
1) In this study, our focus is on advancing the detection of fake news by the refinement and application of established fake news detection methodologies through the use of regularization methods, optimization techniques, and hyperparameter tuning. Our methodology is carefully applied to a baseline dataset suited for binary classification, differentiating between factual and fabricated information. We carried out our work using three publicly available fake news datasets: WELFake, and two other news article datasets from Kaggle.
2) We stacked supervised and unsupervised FastText embeddings into ML-based models, including Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), and boosting classifiers such as Extreme Gradient Boosting (XGBoost) and Categorical Boosting (CatBoost). To ensure comprehensive coverage of text data, we also implemented a solution to handle out-of-vocabulary (OOV) words using FastText embeddings, allowing our models to effectively process previously unseen terms. In addition, we pursued rigorous optimization, fine-tuning regularization techniques and hyperparameters across our ML models. This meticulous approach aimed to optimize model performance, prevent overfitting, and ultimately produce robust, generalizable results.
3) Additionally, to effectively capture complex contextual information and sequential dependencies within the text data, we applied FastText embeddings in DL-based models such as Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Convolutional Neural Network (CNN). Furthermore, this study implemented state-of-the-art text classification transformer-based models, including Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT (RoBERTa), and the auto-regressive transformer XLNet with hyperparameter tuning. We leveraged these transformers for their proven ability to capture intricate contextual information and long-range dependencies in text data, making them well-suited for the complex task of fake news detection.
4) To enhance the interpretability of our results, particularly after observing the best performance of the CNN-LSTM model, we implemented Explainable AI (XAI) techniques. These included Local Interpretable Model-Agnostic Explanations (LIME) and coupled
with topic modeling using Latent Dirichlet Allocation (LDA), all applied to the WELFake dataset.

B. STRUCTURE OF THE PAPER
The remainder of this paper is organized as follows: Section (II) reviews the existing research on fake news detection. Section (III) details the methodology of the proposed work. Section (IV) is dedicated to presenting the results and discussions. Section (V) compares these results with baseline methods. Section (VI) delves into the interpretability modeling using LIME and LDA. Finally, Section (VII) concludes the paper and outlines future work.

II. RELATED WORK
In this section, we discuss the existing research in the field of fake news detection, where extensive studies have explored various methodologies ranging from traditional ML and DL to transformer-based methods.

A. ML BASED APPROACHES
Choudhury and Acharjee [17] proposed an ML-based approach for fake news detection using three different datasets: Liar [18], Fake Job Posting [19], and Fake News. After data pre-processing, the cleaned text was converted into numerical features using Term Frequency-Inverse Document Frequency (TF-IDF) to select the categorical features, and these features were then fed to various ML-based algorithms, including Naive Bayes (NB) [20], SVM [21], LR [22], and RF [23]. The SVM classifier achieved the highest accuracy, with 61%, 97%, and 96% on these datasets, respectively. Altheneyan and Alhadlaq [14] introduced a distributed ML-based approach for fake news detection using the Spark framework [24]. Their study utilized the Fake News Challenge (FNC-1) dataset, categorizing fake news into four distinct categories. Leveraging big data technology with Spark, they assessed and compared their method with other state-of-the-art approaches. Their approach involved creating a stacked ensemble model and experimenting on a distributed Spark cluster. To enhance performance, they explored three distinct word embedding techniques: N-grams [25], Hashing TF-IDF, and Count Vectorizer (CV) [26]. Akhtar et al. [27] introduced a query expansion technique for detecting fake news and disinformation with the integration of AI and ML, aiming to mitigate Supply Chain Disruptions (SCD). They focused on four prominent Pakistani online news sources: 'Geo News,' 'The Dawn,' 'Express Tribune,' and 'The News.' Their study involved analyzing approximately 500 pages from each source to extract relevant events and topics spanning from January to April 2021. The SCD data were categorized into various types, including natural, human-caused, maritime, and mass disruptions, all associated with fake news and disinformation.
Shalini et al. [28] proposed ML-based techniques to distinguish between bot-generated and human-generated information on social media. The process entails extracting characteristics from a dataset based on phrase frequency and then applying classification algorithms. The method is particularly effective at detecting rogue accounts within biased datasets, which are typical in social media platforms. The technology distinguishes between legitimate and fake identities with high accuracy. The system achieves improved accuracy by utilizing Recurrent Neural Networks (RNNs) with multiple activation functions. Furthermore, as the number of folds in cross-validation increases, the classification precision improves. The experimental analysis includes tests on both synthetic and real-time social media datasets, with real-time Twitter data obtaining roughly 96% accuracy and synthetic datasets achieving 98% accuracy.

B. DL AND TRANSFORMER BASED APPROACHES
Verma et al. [29] presented Word Embedding Over Linguistic Features for Fake News Detection (WELFake), a novel two-phase benchmark model to authenticate news content by leveraging machine learning classification with word embedding over linguistic features. This comprehensive approach demonstrates a remarkable improvement in fake news detection, with the WELFake model achieving a peak accuracy of 96.73%. This performance surpasses traditional methods, including BERT and CNN models, by up to 4.25%, highlighting the efficacy of combining linguistic features with advanced embedding techniques. The study further contributes a novel dataset comprising approximately 72,000 articles, enhancing the model's reliability and generalizability across diverse datasets. Shu et al. [30] introduced FakeNewsNet, a repository designed to support research on fake news detection on social media. This repository comprises two detailed datasets, rich in news content, social context, and spatiotemporal information, to overcome the limitations of existing datasets. The comprehensive analysis of FakeNewsNet sheds light on its potential applications in detecting fake news, aiming to address the challenges posed by the scarcity of multifaceted fake news datasets. This initiative marks a significant step towards enhancing the accuracy and effectiveness of fake news detection mechanisms.
Truică and Apostol [31] introduced an innovative approach that employs document embeddings to construct multiple models capable of accurately classifying news articles as either reliable or fake. Their evaluation encompassed various machine learning (ML) models, including NB, Gradient Boosting, DL-based models like LSTM and GRU, as well as three transformer-based models: pre-trained BERT [32], Bidirectional and Auto-Regressive Transformers (BART) [33], and RoBERTa [34]. These evaluations were conducted using five distinct datasets containing fake news articles, employing various word embeddings, including TF-IDF, Word2Vec [35], and FastText [36]. In their study, Nanade and Kumar [37] proposed a transformer-based method for Twitter fake news detection using the BERT base model, which provided them with an accuracy score of 77.29%. Verma et al. [38] introduced
a binary classification framework for fake news detection that combines Bidirectional Encoder Representations from Transformers (BERT) to capture global text semantics through the relationships between words in sentences, and CNN to leverage N-gram features for local text semantics. They conducted their experiments on four publicly available datasets. A similar approach was proposed by Guo et al. [39] using DL-based models and a pre-trained transformer-based BERT model for the same purpose. The results of both studies provide valuable insights into the effectiveness of these methods in the domain of fake news detection.
Praseed et al. [40] presented an approach for detecting fake news in Hindi using an ensemble of pre-trained transformer models, XLM-RoBERTa [41], mBERT, and ELECTRA [42], which are separately fine-tuned for the task of Hindi fake news detection. After undergoing appropriate fine-tuning, pre-trained transformer models have demonstrated their capability to identify fake news across various languages. In their research study, they utilized the CONSTRAINT2021 dataset [43], which comprises a total of 8192 online posts. Among these posts, 4358 are categorized as non-hostile, whereas the remaining 3834 posts exhibit some form of hostility. In their research study, Biradar et al. [44] introduced an early fusion-based approach that combined essential features extracted from context-based embeddings like BERT, XLNet, and ELMo [45]. This fusion method aimed to improve the collection of context and semantic information from social media posts, leading to increased accuracy in detecting false news. Alongside this approach, they implemented both ML and DL-based techniques. Their experiments were conducted using the “CONSTRAINT shared task 2021” dataset. Moreover, when considering the various embeddings discussed, BERT embeddings exhibited significantly superior performance compared to XLNet and ELMo, particularly when applied to the limited short text data extracted from Twitter. Additionally, combining features derived from different embeddings into a unified vector for classification resulted in a slight performance improvement.
Wu et al. [46] introduce Graph-based Semantic Structure Mining with Contrastive Learning (GETRAL), a graph-based semantic structure mining framework with contrastive learning, to improve evidence-based fake news identification that significantly surpasses existing models on the Snopes [47] and PolitiFact [48] datasets. This methodology overcomes the constraints of earlier methods by representing claims and evidence as graph-structured data, allowing for the capture of long-distance semantic relationships. GETRAL lowers information redundancy through graph structure learning and enhances representation learning through supervised contrastive learning with adversarial augmented examples. On Snopes, GETRAL achieves an F1-Macro score of 80.61% and an F1-Micro score of 85.12%. On the PolitiFact dataset, GETRAL records an F1-Macro of 69.53% and an F1-Micro of 69.81%, demonstrating its superior performance in addressing the challenges of fake news detection by integrating advanced techniques for a more accurate and interpretable analysis. Soga et al. [49] focus on the detection of fake news on social media by analyzing stance similarity and employing Graph Neural Networks (GNNs). Their research work proposes a method that accounts for the opinion similarity between users by examining their stances towards news articles and user post interactions. This method uses Graph Transformer Networks (GTNs) to extract both global structural information and interactions of similar stances effectively. The technique addresses stance analysis challenges in microblogs and minimizes the impact of poorly represented stance features. The approach was evaluated using custom crawled Twitter data and the benchmark FibVID¹ dataset, demonstrating significant improvements in detection performance compared to conventional methods, including state-of-the-art approaches. This advancement suggests that incorporating stance similarity in news-sharing interactions, alongside the extraction of propagation patterns characteristic of fake news, enhances the detection accuracy, making it a promising direction for future fake news detection studies.
Pilkevych et al. [50] explored fake news detection using GNNs, conducting a detailed analysis aimed at mitigating the impacts of disinformation, particularly in the context of Russia's aggression against Ukraine. They advocate for GNNs as a potent tool for the automated identification of harmful content, emphasizing their application in monitoring online media to promptly detect and assess fake news. Their approach leverages knowledge graphs (KG) for entity recognition and relationship mapping in textual content, with an emphasis on detecting signs of negative psychological influence. Among the models evaluated, GraphSAGE stands out for its performance, achieving notable accuracy scores of 89.78% on the Politifact dataset and 98.01% on the Gossipcop dataset, when trained on data embodying signs of negative psychological influence. This research underscores the critical role of sophisticated machine learning techniques in addressing the challenge of disinformation, highlighting the effectiveness of GNNs in enhancing the accuracy and efficiency of fake news detection systems.
¹ https://github.com/merry555/FibVID

C. MAJOR CHALLENGES
A comprehensive analysis of the related work points to the following current challenges in fake news detection:
1) Variability and Sophistication: Fake news often mimics genuine news in style and presentation, making it difficult to distinguish based on surface features alone. The sophistication of misinformation tactics evolves continuously, necessitating advanced detection techniques that can adapt to changing patterns [51].
2) Linguistic Nuances and Contextual Understanding: The effective detection of fake news requires a deep understanding of linguistic subtleties and the ability to interpret context. This is challenging due to the vast
diversity of languages and the specific cultural contexts within which news is disseminated [52].
3) Bias and Subjectivity: Identifying biases and subjective assertions within news content without suppressing freedom of expression or introducing detection biases presents a significant challenge.
4) Scalability and Generalizability: The ability to scale detection mechanisms to process vast quantities of data across different platforms, and ensuring these mechanisms are generalizable across various domains and languages, is a complex endeavor.
From the existing literature, it is evident that numerous studies have tackled the problem of fake news detection utilizing both traditional ML and DL-based approaches and highlight the current challenges in the domain of fake news detection, such as the sophisticated techniques used to generate and disseminate fake news, the rapid evolution of misinformation, and the difficulty of achieving high accuracy in detection while maintaining interpretability and generalizability. In this study, we aim to contribute to this analysis by employing a comprehensive range of techniques, including ML, DL, and transformer-based models. To enhance the accuracy and generalizability of fake news detection, we leverage supervised and unsupervised FastText word embeddings using three benchmark datasets, complemented by extensive regularization techniques and hyperparameter tuning methods. A noteworthy aspect of our contribution is our focus on addressing the limited body of work concerning XAI within fake news. By incorporating explainable AI and topic modeling techniques into our research methodology, we intend to shed light on the interpretability and transparency of our models, ultimately improving the comprehensibility of fake news detection. Table 1 presents the comparative analysis of the current state-of-the-art methods.

TABLE 1. Comparative analysis of selected studies.

III. WORK METHODOLOGY
The proposed research methodology of this study involves a systematic approach to achieving promising results, as shown in Figure 1. Each of the steps from our research methodology is further elaborated in detail below:

A. DATASET
In our study, we addressed the binary classification problem, where 0 represents fake news, and 1 represents real news. We employed three publicly available datasets: WELFake [29], FakeNewsNet [30], and FakeNewsPrediction.⁴
⁴ https://www.kaggle.com/datasets/rajatkumar30/fake-news
WELFake consists of 72,134 news articles, with 35,028 categorized as fake news and 37,106 classified as real news. To prevent classifier overfitting and enhance machine learning training, the authors combined data from four prominent news datasets, including those from Kaggle, McIntire, Reuters, and BuzzFeed Political, thereby enriching the dataset with a more extensive and varied collection of text data. FakeNewsNet comprises two extensive datasets that encompass a wide range of characteristics related to
news content, social context, and spatiotemporal information. The third dataset, FakeNewsPrediction, comprises 3,171 instances of real news and 3,164 instances of fake news. Table 2 presents the count of instances in the three datasets used in this paper.

TABLE 2. Count of instances in datasets.

B. DATA PREPROCESSING
Effective data preprocessing plays a pivotal role in enhancing the performance of various ML and DL-based models, as it involves eliminating irrelevant text from the dataset and ensuring that the data is presented in a concise and suitable format. In our study, we placed particular emphasis on two primary columns: “text,” which contained all the news comments, and “label,” representing the true or fake label. The rationale behind text preprocessing lies in its ability to significantly impact the performance of learning algorithms. By preparing the data appropriately, we can improve the quality and relevance of information used for training and analysis. To preprocess the “text” column, we implemented a series of essential steps. Initially, we converted all uppercase letters to lowercase and removed non-essential characters, such as ASCII symbols. Subsequently, we conducted tokenization of both words and sentences while eliminating stop words to further refine the data. Moreover, we employed Python's RegEx library to filter and process elements such as numbers, punctuation, and specific patterns, including email addresses, URLs, and phone numbers. Additionally, we addressed the removal of duplicate examples within the dataset, ensuring data quality and diversity for model training. Data preprocessing ensures that the dataset is cleansed of extraneous information that might otherwise hinder the learning process. In addition to these steps, we applied lemmatization to our text data. Lemmatization is employed to reduce words to their base or root form, promoting consistency in word usage and improving the model's ability to recognize similarities between different inflections of the same word. Overall, our text preprocessing pipeline was designed to optimize the quality and relevance of the data fed into our learning algorithms, thereby enhancing the accuracy of fake news detection.
For our transformer-based models, we have streamlined our preprocessing to include word and sentence tokenization, converting uppercase characters to lowercase, and removing extraneous symbols. This focused approach is instrumental in addressing the issue of syntactic ambiguity, as highlighted in prior research [53]. Syntactic ambiguity presents a substantial challenge encountered in previous ML and DL-based algorithms, where words within a sentence can have multiple meanings depending on the context, making interpretation a complex endeavor. Table 3 highlights some preprocessed text data examples from the WELFake dataset.

C. WORD EMBEDDING
Word embeddings provide numerical representations for textual inputs, allowing machines to process and understand textual data more effectively. These embeddings capture semantic relationships and contextual information, facilitating tasks such as sentiment analysis, text classification, and language modeling. By transforming words into vectors in a continuous vector space, word embeddings enable machines to recognize similarities between words, capture word meanings, and generalize from the training data, ultimately enhancing the performance of various natural language tasks. In this paper, we have utilized FastText embeddings due to their effectiveness in capturing semantic information and contextual nuances within text data. FastText embeddings offer distinct advantages over traditional word embeddings, as they can represent subword information and handle out-of-vocabulary words more gracefully. These qualities make FastText embeddings a superior choice, particularly when dealing with languages with rich morphological structures and variations. Conventional word vectors disregard the internal structure of words, which holds valuable information. This information could prove beneficial when generating representations for infrequent or incorrectly spelled words. Equation 1 gives the mathematical formula used to compute FastText word embeddings [54].

$$u_w + \frac{1}{|N|} \sum_{n \in N} x_n \qquad (1)$$

where:
$u_w$ : represents the vector for a word $w$ in the embedding space.
$\frac{1}{|N|}$ : is the fraction representing the average.
$\sum$ : is the sum symbol, used to sum over a set of vectors.
$n \in N$ : specifies that we are summing over the set $N$.
$x_n$ : represents the vector for the context words in the set.

FastText, a word representation tool developed by Facebook's research division, features an extensive lexicon of 2 million words sourced from Common Crawl. Each word is represented in a 300-dimensional vector space, with the vectors trained on a Common Crawl corpus of roughly 600 billion tokens. What sets this word embedding method apart is its approach of incorporating n-grams as features in addition to individual words [55]. FastText offers two primary modes of usage: unsupervised and supervised. In our research, we have employed both of these modes, conducting a comprehensive analysis of their respective applications.
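
As a rough illustration of Equation 1, the sketch below adds a word vector u_w to the average of the vectors x_n in a set N. It assumes plain Python lists stand in for embedding vectors; the helper names `mean_vector` and `compose` and the toy 3-dimensional values are hypothetical and are not part of the FastText library or of the authors' implementation.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of the vectors: (1/|N|) * sum over n in N of x_n."""
    if not vectors:
        raise ValueError("N must contain at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def compose(u_w: Vector, xs: Sequence[Vector]) -> Vector:
    """Equation 1: u_w + (1/|N|) * sum over n in N of x_n."""
    avg = mean_vector(xs)
    return [a + b for a, b in zip(u_w, avg)]


# Toy values chosen for illustration only (not taken from any trained model).
u_w = [1.0, 0.0, 2.0]
N = [[0.0, 2.0, 2.0], [2.0, 4.0, 0.0]]
print(compose(u_w, N))  # [2.0, 3.0, 3.0]
```

In practice the vectors would be 300-dimensional and would come from a trained model rather than being written by hand; the arithmetic stays the same.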

FIGURE 1. Methodology diagram.

TABLE 3. Examples of preprocessed data on WELFake dataset.

1) UNSUPERVISED FASTTEXT
In unsupervised learning, FastText generates word vectors, extending the Word2Vec model to include subword information by breaking words into a bag of character n-grams. For example, with the word “Obama”, FastText would consider not just “Obama” but also n-grams like “Oba”, “bam”, “ama”, depending on the specified n-gram range. Similarly, for “Trump”, it would analyze fragments like “Tru”, “rum”, “ump”. This approach is valuable for understanding suffixes and prefixes, helping the model recognize that words with similar subparts might be semantically related. In FastText's approach to unsupervised learning, when breaking down words into a bag of character n-grams, the typical range for these n-grams is between 3 and 6 characters. FastText's unsupervised learning method uses vast amounts of unlabeled text to build word representations. These word vectors can be utilized in various tasks such as word similarity, word analogy, or as features in downstream NLP applications.
In this study, we used FastText's unsupervised word vectors, specifically the pre-trained model cc.en.300.bin.⁵
⁵ https://fasttext.cc/docs/en/english-vectors.html
This model was developed on Common Crawl and Wikipedia using FastText's unsupervised learning technique, which integrates subword information into the training process. By doing so, the model captures the morphological intricacies of words and represents them as vectors in a 300-dimensional space. Each word vector is enhanced with information collected from character n-grams, helping the model to better understand word morphology and handle out-of-vocabulary terms. The introduction of subword information allows our study to include not just the semantic representation


of complete words but also the semantic implications of their constituent pieces, providing a more sophisticated perspective of language semantics.

FIGURE 2. FastText word embedding architecture.

In our implementation, we use FastText's unsupervised model to create word embeddings for a Fake News dataset. The function text_to_fasttext_embeddings processes each text, generating embeddings for each word using get_word_vector. This method effectively handles OOV words by leveraging subword information. It computes the average of these embeddings to represent the entire text. If a text yields no valid word vectors (for example, an empty text), a zero vector is returned. Applied to our dataset df, this approach results in a feature matrix X_fasttext, suitable for various analytical tasks in our Fake News study. Algorithm 1 explains the unsupervised FastText with OOV used in our paper.

Algorithm 1 Create Unsupervised FastText Embeddings for Text Data
Require: FastText model file 'cc.lang.300.bin', text data from df
Ensure: Matrix X_fasttext of FastText embeddings
    Load the FastText model: ft_model ← fasttext.load_model('cc.lang.300.bin')
    function text_to_fasttext_embeddings(text, ft_model)
        words ← split the text
        embeddings ← initialize an empty list
        for each word in words do
            vector ← ft_model.get_word_vector(word)
            if vector is valid then
                append vector to embeddings
            end if
        end for
        if embeddings is not empty then
            return mean of embeddings across axis 0
        else
            return zero vector of length ft_model.get_dimension()
        end if
    end function
    X_fasttext ← stack vertically the result of text_to_fasttext_embeddings for each text in df

2) SUPERVISED FASTTEXT
In supervised learning, FastText is used for text classification. It applies the same principle of using subword information but is trained on a labeled dataset where each text snippet has an associated label or category. FastText can use a hierarchical softmax function based on the Huffman coding tree, which speeds up training and prediction, making it feasible to train on millions of documents. For text classification, the model averages the word vectors in a text to form the text representation, which is then used to predict a label. FastText's supervised mode is particularly powerful because it can handle large datasets and large numbers of classes efficiently.

In this study, we explored both supervised and unsupervised FastText models. While both approaches produced encouraging results, supervised FastText consistently outperformed its unsupervised counterpart. This analysis emphasizes the importance of using labeled training data in text classification problems, where supervised learning can exploit explicit category information to obtain greater accuracy. The effectiveness of the supervised FastText model confirms its applicability for classification-oriented tasks and highlights its potential as a significant tool for improving the accuracy of our research. In our experimentation, we trained the FastText model over 50 epochs, employing learning rates of 0.01, 0.1, and 0.01, respectively, for the three datasets. This training strategy allowed us to harness the power of FastText embeddings to enhance our classification performance effectively.

TABLE 4. Hyperparameters details for supervised FastText.

D. MODELING APPROACHES
This section details the ML, DL, and transformer-based models utilized in this paper. It provides an in-depth examination of each model's architecture and its application within our research framework.

1) ML BASED MODELS
FastText embeddings were used as input for the subsequent supervised ML-based models, including DT, SVM, LR, and RF. In addition, boosting methods such as XGBoost and CatBoost, along with feature engineering techniques, were applied. In the implementation of ML-based models, several


hyperparameters and regularization techniques have been employed to optimize performance.

TABLE 5. Configuration details for ML models.

In Table 5, for DT, the Split_min values of 2, 5, and 10 dictate the minimum number of samples required for a node split, influencing the tree's complexity and potential overfitting. In RF, the N-Estimators parameter, with values 50, 100, and 200, determines the number of trees in the forest, balancing computational efficiency against model accuracy. The SVM with a linear kernel and the LR classifier both utilize the regularization parameter C, tested at values 0.1, 1, and 10 for SVM, and 1, 10, and 100 for LR. The C parameter controls the strength of regularization, which helps to prevent overfitting by penalizing the magnitude of the coefficients. Lower values of C imply more regularization, constraining the model to simpler decision boundaries.

All these parameters across different models were optimized using GridSearchCV, an exhaustive search over specified parameter values. GridSearchCV systematically evaluates combinations of parameters, selecting the ones that yield the best performance metrics, so that each model is tuned for accuracy and generalization. Equation (2) represents the GridSearchCV algorithm in ML. In this formulation, optimize reflects the goal of GridSearchCV to find the best model parameters. The hyperparameters h_1 ∈ H_1, h_2 ∈ H_2, ..., h_n ∈ H_n are exhaustively searched to maximize the score function within their ranges. The argmax operator identifies the specific set of hyperparameters that yield the highest score, typically a measure of model accuracy or performance.

$$ \mathrm{optimize}\left( \underset{h_1 \in H_1,\, h_2 \in H_2,\, \ldots,\, h_n \in H_n}{\arg\max}\ \mathrm{score}\big(\mathrm{model}(h_1, h_2, \ldots, h_n)\big) \right) \tag{2} $$

Table 6 presents the hyperparameters and regularization details for boosting methods in the proposed approach.

TABLE 6. Configuration details for boosting algorithms.

For the gradient boosting models, XGBoost and CatBoost, the configuration includes 100 boosting rounds, a learning rate of 0.1, and logloss as the loss function. XGBoost uses a maximum depth of 3 and a subsample ratio of 0.8, while CatBoost uses a maximum depth of 6 and a subsample ratio of 0.7. These parameters are critical in managing the models' complexity and preventing overfitting while ensuring efficient learning.

2) DL BASED MODELS
In our study, we implemented LSTM, its variant BiLSTM, GRU, and the hybrid CNN-LSTM model. These RNN-based models excel in processing sequential data, with LSTM units adept at capturing long-term dependencies. The BiLSTM variant further enhances this by processing data in both forward and backward directions, thus gaining a more comprehensive understanding of context, which is especially beneficial in complex sequential tasks. GRU, while similar to LSTM in managing sequence dependencies, offers a more streamlined architectural design. Additionally, the CNN-LSTM model combines CNN with LSTM, leveraging CNNs' ability to extract spatial features and LSTMs' strength in interpreting these features temporally. This hybrid model is particularly effective in tasks that require an understanding of both spatial and temporal patterns, such as video classification and time-series forecasting.

1) Regularization Techniques: Regularization techniques serve as a method in classifier training to avoid overfitting, a condition where a model predicts training data accurately but fails to generalize well to new, unseen data. The performance enhancement of the CNN-LSTM model is significantly attributed to the use of kernel L2 regularization, with a lambda setting of 0.01 for both LSTM and CNN layers. The importance of L2 regularization lies in its ability to minimize weight magnitudes, thereby encouraging the model to adopt smaller values for weights. This approach accomplishes two key goals: it minimizes the likelihood of overfitting and preserves the model's ability to generalize across different datasets effectively. The preference for L2 regularization over L1 was a calculated choice. L1 regularization, while capable of inducing sparsity by turning some weights to zero, could lead to underfitting, an issue that emerged in the initial testing phases. The formulas for L1 and L2 regularization are detailed in equations (3) and (4), respectively.

$$ L1(w) = \lambda \sum_{i=1}^{n} |w_i| \tag{3} $$

where:
w: is the weight vector of the model
λ: is the regularization coefficient
n: is the number of weights in the vector
w_i: is the ith weight in the weight vector
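To make the two penalties concrete, the sketch below evaluates the absolute-value penalty of Equation (3) and the squared-weight penalty that the text refers to as Equation (4). The function names and the sample weight vector are invented for illustration; only the coefficient λ = 0.01 comes from the configuration described above.

```python
def l1_penalty(weights: list[float], lam: float) -> float:
    """L1 penalty: lambda times the sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights: list[float], lam: float) -> float:
    """L2 penalty: lambda times the sum of squared weight values."""
    return lam * sum(w * w for w in weights)


# Illustrative weights only; lam matches the lambda of 0.01 stated in the text.
weights = [0.5, -1.5, 0.0, 2.0]
lam = 0.01
print("L1 penalty:", l1_penalty(weights, lam))
print("L2 penalty:", l2_penalty(weights, lam))
```

In this example the largest weight (2.0) accounts for half of the L1 penalty but more than 60% of the L2 penalty, which shows why L2 shrinks large weights more aggressively while L1 applies the same per-unit pressure to every non-zero weight.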


FIGURE 3. CNN-LSTM proposed model architecture diagram.

L1 regularization incorporates the absolute magnitude of coefficients as a penalty to the loss function. This addition of absolute values introduces a non-linear penalty based on the weights, making L1 regularization conducive to sparse outcomes where numerous coefficients become precisely zero.

$$ L2(w) = \lambda \sum_{i=1}^{n} w_i^{2} \tag{4} $$

L2 regularization introduces the squared magnitude of coefficients as a penalty to the loss function. This squaring process results in a smoother, differentiable penalty, even at w_i = 0. Contrary to L1 regularization, L2 does not lead to sparse models because it generally does not push coefficients to become exactly zero, although it may reduce them to small values.

2) Hyperparameter Tuning for DL-Based Models: In the hyperparameter optimization process for DL-based models, we methodically adjusted the model's learning process through targeted experimentation. The training period was set to 10 epochs, a duration chosen to balance effective learning against the risk of overfitting, and ended when the model's loss decreased.

In Figure 3, the CNN-LSTM model combines two convolutional layers and LSTM layers for advanced data processing. The convolutional layers, each with 64 filters, use kernel sizes of 4 and 3, respectively, with 'relu' activation, effectively extracting spatial features. A MaxPooling layer follows, reducing data dimensionality and enhancing efficiency. The LSTM segment, with two layers of 50 and 30 units, captures temporal dynamics, crucial for sequential data analysis. The model concludes with a 'softmax'-activated dense layer, making it suitable for classification tasks. This architecture excels in tasks requiring both spatial feature extraction and temporal sequence understanding. Table 7 illustrates the hyperparameters and configuration details of each DL-based model. Notably, the count of each layer has been mentioned as well.

3) TRANSFORMER BASED MODELS
The Transformer, an innovative system in Natural Language Processing (NLP), is structured to handle sequence-to-sequence tasks. It uses a self-attention mechanism that efficiently manages long-range dependencies, and it comprises two main components, an encoder and a decoder. BERT, RoBERTa, and XLNet are all encoder-only models. This architecture makes them highly effective for text classification tasks, where understanding and processing input data to generate contextual representations is crucial. Transformers were first introduced in 2017 by Vaswani et al. [56]. The Transformer's self-attention mechanism is characterized by its ability to focus on different parts of the input sequence, which can be represented through a specific mathematical formulation.

$$ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{Q K_i^{\top}}{\sqrt{d_k}} \right) V_i \tag{5} $$

where:
Q: is the query matrix
K: is the key matrix
V: is the value matrix
d_k: is the dimension of the key vectors
N: is the length of the input sequence
i: is the index of the query vector

This study concentrates on the use of transformers, with a particular emphasis on the optimization of their hyperparameters. Transformers represent a notable progression from earlier language models like RNNs, which were limited by their computational intensity and memory demands, especially in generative tasks. In our research, we leveraged extensive text datasets and utilized text classification transformers, including BERT, XLNet, and RoBERTa. BERT


excels in understanding the context of a word in a sentence by looking at the words that come before and after it. XLNet, an extension of the Transformer model, outperforms BERT in certain scenarios by using a permutation-based training approach. RoBERTa modifies key hyperparameters in BERT, including removing the next-sentence pretraining objective and training with much larger mini-batches and learning rates, leading to improved performance on several benchmarks. Table 8 presents the hyperparameters and configuration details for transformer-based methods in the proposed approach.

TABLE 7. Configuration details for DL models.

TABLE 8. Configuration details for transformer based models.

IV. RESULTS AND DISCUSSION
In our assessment, we utilized standard metrics to evaluate the model's performance. These metrics include accuracy, precision, recall, and F1-score, all of which offer quantitative measures of the model's effectiveness.

$$ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6} $$

$$ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{7} $$

$$ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{8} $$

$$ \text{F1-Score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9} $$

A. COMPUTATIONAL EFFICIENCY
To ensure a comprehensive understanding of our proposed models' performance and efficiency, we have conducted an in-depth comparison of our achievements against existing state-of-the-art methods. Our evaluation extends beyond accuracy, precision, recall, and F1-scores to include computational efficiency, a crucial aspect for practical applications.
1) Hardware and Optimization: Our experiments were conducted on a MacBook M3 Max with 128 GB of unified memory. This setup allowed us to benchmark the computational requirements accurately.
2) Post Quantization on Supervised FastText: We employed post quantization techniques to optimize the performance of our supervised FastText model. This approach was particularly beneficial for accommodating the model's scalability and efficiency without compromising accuracy. Post-quantization enabled us to adjust learning rates dynamically, with certain parameters set to true, thereby optimizing computational resource usage.
3) ML Models Execution Time: On average, each epoch for our ML-based models required approximately 5 minutes of execution time. This efficiency demonstrates the models' suitability for scalable applications.
4) DL-Based Models: The deep learning models took roughly 5 minutes per epoch, striking a balance between computational demand and performance.
5) Transformer-Based Models: Due to their architectural complexity, transformer-based models necessitated about 15 minutes per epoch for training. Despite the longer duration, the significant improvements in detection capabilities justify the computational investment.
6) Model Optimization: In addition to post-quantization, we explored various optimization techniques to enhance model efficiency further. These included layer pruning, dropout adjustments, and batch normalization, which collectively contributed to reducing overfitting and accelerating the training process.

B. ANALYSIS OF RESULTS: UNSUPERVISED FASTTEXT WITH ML AND DL MODELS
The weighted evaluation scores for ML and DL-based models, employing unsupervised FastText embeddings on WELFake, FakeNewsNet, and FakeNewsPrediction, are displayed in Tables 9, 10, and 11, respectively. The tables highlight the SVM classifier's best performance across all three datasets, surpassing all other ML classifiers in both accuracy and F1-scores, with values of 0.92, 0.97, and 0.91, respectively. Notably, it outperforms even DL-based models utilizing unsupervised FastText embeddings. This consistent performance is noteworthy, especially considering the differing dataset sizes. The SVM classifier's ability to effectively handle high-dimensional data, create clear decision boundaries, and navigate complex, non-linear relationships makes it a strong choice for text classification, contributing to its exceptional performance in fake news detection tasks.
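The metrics of Equations (6) to (9) can be computed directly from the four confusion-matrix counts, as in the sketch below. All names (`Counts`, `weighted_f1`, and so on) and the example counts are hypothetical. The `weighted_f1` helper assumes that "weighted" means a support-weighted average of the per-class F1 scores, which is how common toolkits define it; the paper does not spell out its averaging scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def accuracy(c: Counts) -> float:
    return safe_div(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)


def precision(c: Counts) -> float:
    return safe_div(c.tp, c.tp + c.fp)


def recall(c: Counts) -> float:
    return safe_div(c.tp, c.tp + c.fn)


def f1_score(c: Counts) -> float:
    p, r = precision(c), recall(c)
    return safe_div(2 * p * r, p + r)


def weighted_f1(c: Counts) -> float:
    """Support-weighted mean of the F1 scores of both classes (assumed scheme)."""
    # Treat the negative class as "positive" by swapping the roles of the counts.
    flipped = Counts(tp=c.tn, tn=c.tp, fp=c.fn, fn=c.fp)
    support_pos = c.tp + c.fn
    support_neg = c.tn + c.fp
    total = f1_score(c) * support_pos + f1_score(flipped) * support_neg
    return safe_div(total, support_pos + support_neg)


# Made-up counts for a 200-sample test set.
example = Counts(tp=90, tn=80, fp=10, fn=20)
print(accuracy(example), precision(example), recall(example))
print(f1_score(example), weighted_f1(example))
```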


Unlike the SVM classifier, which demonstrated consistent performance, ML classifiers such as LR, RF, and DT performed inconsistently across the three datasets, even when employing different regularization techniques with unsupervised FastText embeddings. This inconsistency underscores the challenges they faced in adapting to the unique characteristics of each dataset. In contrast, all DL-based models consistently maintained their performance and generalizability across the datasets, showcasing their reliability in handling varying data complexities.

TABLE 9. Results of ML and DL-Based models with unsupervised FastText on WELFake dataset.

TABLE 10. Results of ML and DL-Based models with unsupervised FastText on FakeNewsNet dataset.

TABLE 11. Results of ML and DL-Based models with unsupervised FastText on FakeNewsPrediction dataset.

Figures 4 and 5 show the training and validation loss and accuracy curves for the CNN-LSTM model. This is our best model; when combined with supervised FastText embeddings it outperformed the unsupervised FastText configurations, as discussed in detail later. The curves depict a stable convergence on the WELFake dataset, where the validation metrics closely mirror the training metrics throughout the training process. The alignment between training and validation accuracy, coupled with a continual decrease in loss for both training and validation, suggests that the model is effectively learning and not exhibiting signs of overfitting to the training data.

FIGURE 4. CNN-LSTM training and validation loss curve.

FIGURE 5. CNN-LSTM training and validation accuracy curve.

Figure 6 displays the confusion matrix for binary classification in the context of fake news detection using the CNN-LSTM model on the WELFake dataset.

FIGURE 6. CNN-LSTM confusion matrix with unsupervised FastText.
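For readers who want to reproduce a matrix of the kind shown in Figure 6, the sketch below tallies a binary confusion matrix from true and predicted labels. The function name and the toy label lists are made up for illustration; the label convention (0 for fake, 1 for real) is assumed here.

```python
def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Return a matrix whose rows are true labels and columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


# Toy example: 0 = fake, 1 = real (assumed convention).
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
for row in confusion_matrix(y_true, y_pred):
    print(row)
```

The diagonal entries count correct predictions for each class, while the off-diagonal entries are the misclassified fake and real articles.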


TABLE 12. Examples of wrongly predicted instances.

Table 12 highlights some examples of incorrect predictions made by the CNN-LSTM model using unsupervised FastText on the WELFake dataset.

C. ANALYSIS OF RESULTS: SUPERVISED FASTTEXT WITH ML AND DL MODELS
The weighted evaluation scores for ML and DL-based models using supervised FastText embeddings are shown in Tables 13, 14, and 15. With supervised FastText, each ML and DL-based algorithm outperformed its counterpart using unsupervised FastText. For instance, DT had the lowest accuracy scores with unsupervised FastText, at 0.83, 0.85, and 0.78, respectively, across the three datasets. With supervised FastText, these scores improved significantly, reaching 0.98, 0.91, and 0.90, respectively. SVM, which outperformed all other ML-based algorithms with unsupervised FastText, showed similarly consistent performance with supervised FastText. However, RF surpassed all other ML-based algorithms, with accuracy scores of 0.99, 0.95, and 0.93.

Our optimal model, CNN-LSTM, surpassed all other ML and DL models, and even those based on transformers, which will be discussed later. It achieved the highest accuracy and F1-scores of 0.99, 0.99, and 0.97, respectively, marking it as an exceptionally effective algorithm for fake news classification, even in larger datasets. The outstanding performance of CNN-LSTM can be attributed to its ability to efficiently capture both local features through CNN and long-term dependencies using LSTM, making it particularly adept at handling the complexities of natural language in fake news detection. In summary, the CNN-LSTM model not only demonstrates consistently high performance across all three datasets but also outperforms or matches the other ML and DL models in more complex detection scenarios. Its ability to maintain high evaluation scores of 0.99 and 0.97 on the second and third datasets, where ML models showed reduced effectiveness, underscores the CNN-LSTM model's advanced capability for accurately classifying fake news. This analysis, by directly referencing the specific scores from the tables, highlights the CNN-LSTM model's significant contribution to the field of fake news detection and its suitability as the proposed method in this research.

TABLE 13. Results of ML and DL-Based models with supervised FastText on WELFake dataset.

TABLE 14. Results of ML and DL-Based models with supervised FastText on FakeNewsNet dataset.

FIGURE 7. Supervised CNN-LSTM training and validation loss curve.

Figures 7 and 8 indicate a stable convergence, with the validation metrics closely tracking the training metrics across epochs. The graphs demonstrate a stable convergence and


indicate that the validation scores are in close agreement with the training scores throughout the training epochs. The close alignment between training and validation accuracy, alongside a consistent decrease in loss for both training and validation, suggests that the model is learning generalizable patterns rather than overfitting the training data.

FIGURE 8. Supervised CNN-LSTM training and validation accuracy curve.

Figure 9 represents the confusion matrix for CNN-LSTM using supervised FastText.

FIGURE 9. CNN-LSTM confusion matrix with supervised FastText.

TABLE 15. Results of ML and DL-Based models with supervised FastText on FakeNewsPrediction dataset.

D. ANALYSIS OF RESULTS: TRANSFORMER BASED MODELS
Tables 16, 17, and 18 present the results obtained from the transformer-based models: BERT in its base and large variants, XLNet, and RoBERTa.

TABLE 16. Results of transformer based models on WELFake dataset.

TABLE 17. Results of transformer based models on FakeNewsNet dataset.

TABLE 18. Results of transformer based models on FakeNewsPrediction dataset.

On the WELFake dataset, the models, including BERT base, BERT large, XLNet base, and RoBERTa large, show strong and nearly uniform performance, with each scoring 0.97 in precision, recall, accuracy, and F1-score, except RoBERTa large, which is marginally lower at 0.96. This slight underperformance of RoBERTa large might indicate some dataset-specific challenges or limitations in model architecture. On the FakeNewsNet dataset, a remarkable increase in model performance is observed, with all models achieving a uniform score of 0.99 across all metrics. This exceptional performance suggests that the FakeNewsNet dataset contains patterns more easily interpreted by these models compared to the other datasets. On the FakeNewsPrediction dataset, a slight variation is noted. While the BERT base, BERT large, and RoBERTa large models maintain a consistent score of 0.95 across all metrics, the XLNet base model demonstrates superior performance with scores of 0.97.

Despite the consistent and robust performance of the transformer-based models, the CNN-LSTM architecture exceeded the effectiveness of all other learning algorithms used in this research. This outcome highlights the CNN-LSTM model's superior capability in handling the specific challenges and nuances of the datasets used.

V. COMPARISON OF THE RESULTS WITH THE STATE-OF-THE-ART
In this section, we compare our results with those from two baseline studies, [29] and [30]. On both the WELFake and FakeNewsNet datasets, the proposed models achieved an accuracy of 0.99, surpassing the baseline scores of 0.97. This improvement underscores the effectiveness of the applied methodologies in our study. While the


 
0.02 increase in accuracy may appear marginal at first glance, it is statistically significant when considering the extensive size of the datasets involved. Specifically, the WELFake dataset includes 72,134 records, and the FakeNewsNet dataset contains 23,196 records. By implementing strategic regularization techniques and meticulous parametric tuning, we were able to achieve these promising results. Such approaches not only enhance model performance but also contribute to the robustness and generalizability of the models. This suggests that our models are not only adept at handling the specific datasets they were trained on but also have the potential to perform well across varied datasets, demonstrating their adaptability and reliability in broader applications.

TABLE 19. Comparison of the results with SOTA.

VI. INTERPRETABILITY MODELING
Interpretability modeling is crucial in the field of fake news detection, as it helps create models or methods that make complex machine learning processes more clear and transparent. Techniques like LDA and LIME are particularly useful in this area. LDA helps uncover hidden themes in large text datasets, which is vital for understanding the content patterns that might indicate fake news. On the other hand, LIME provides straightforward explanations for individual predictions made by the models, shedding light on the reasons behind classifying certain news articles as fake.

A. TOPIC MODELING WITH LDA
LDA is renowned for its effective balance between simplicity and sophistication in topic modeling. It is crafted to identify various topics within a corpus of text. These identified topics can be understood as groups of words that often appear together [57]. In our research, we assessed the performance of LDA using two key metrics: coherence and perplexity. Coherence evaluates how meaningful and interpretable the topics generated by LDA are, by assessing the semantic similarity between words within these topics. Perplexity, on the other hand, measures how well the model predicts a sample. The calculations for coherence and perplexity are detailed in equations 10 and 11, respectively, as cited in [58] and [59].

$$ \mathrm{Coherence} = \frac{1}{C} \cdot \sum \mathrm{PMI}(w_i, w_j) \tag{10} $$

$$ \mathrm{Perplexity} = \exp\left( -1 \cdot \frac{\text{log likelihood}}{\text{total number of words}} \right) \tag{11} $$

We applied LDA to the WELFake, FakeNewsNet, and FakeNewsPrediction datasets mentioned in this paper. Based on the coherent terms identified, we categorized each dataset into three primary topics, providing a structured thematic understanding of the datasets. The hyperparameters of LDA were tuned through a series of experiments, and the best parameters obtained, which are used in this study, are shown in Table 20.

TABLE 20. Hyperparameter tuning of LDA model.

The LDA results across the three datasets, shown in Table 22, indicate solid performance in terms of both coherence and perplexity scores, suggesting a good understanding and effective topic modeling of the datasets. For the WELFake dataset, the coherence score is 0.26236, coupled with a perplexity score of -8.218491. These figures imply that the topics generated are reasonably coherent and meaningful, with the negative perplexity indicating a good fit of the model to the data. In the case of the FakeNewsPrediction dataset, the coherence score is slightly lower at 0.22106, yet still represents a decent level of topic interpretability and relevance. The perplexity score is very close to that of WELFake, at -8.2188, indicating a consistent model performance across different datasets. Finally, the FakeNewsNet dataset shows a coherence score of 0.26039 and a perplexity of -8.9435. The coherence is comparable to that of WELFake, suggesting effective topic representation. The relatively lower perplexity score here signifies an even better model fit to the dataset compared to the other two. Overall, these scores reflect that LDA has a good grasp on the datasets, effectively capturing the underlying thematic structures in the text. This demonstrates the utility of LDA in providing meaningful insights into large text corpora, particularly in the context of fake news detection and analysis.

B. LIME
Local Interpretable Model-Agnostic Explanations (LIME) is a method designed for providing localized insights and evaluating the predictions of any learning algorithm. It offers an understanding of how a model's predictions correspond to the specific requirements of a given task. This technique is especially useful in situations where comprehending a model's decision-making process is as crucial as the accuracy of its outcomes, as noted by [60]. The LIME formula seeks to identify an interpretable model, denoted as ĝ, within a class of models G. It aims to minimize the loss L, which

TABLE 21. Topics and items for four datasets.

TABLE 22. LDA results across all datasets.

measures the discrepancy between the predictions of g and the more complex model f. This is done while accounting for the locality kernel π_x. Additionally, Ω(g) represents the complexity of the interpretable model g, with a preference for lower complexity to ensure better interpretability and maintain simplicity.

$$\hat{g} = \arg\min_{g \in G} L(f, g, \pi_x) + \Omega(g) \tag{12}$$

LIME is a widely used interpretability technique known for its simplicity and effectiveness in providing insights into complex machine learning models. Other techniques, such as SHAP, counterfactual explanations, and similar tools, can also help explain complex models, but we chose LIME because it relies on simpler methods that make explanations easy to understand. LIME’s use of Lasso or short trees helps create straightforward and focused explanations, which are easier for humans to understand [61]. While other techniques offer valuable interpretability features, we opted for LIME due to its ability to provide local explanations, which are crucial for understanding individual predictions within our model. LIME generates locally faithful explanations by approximating the decision boundary around a specific instance, thus providing insights into why a model made a particular prediction for that instance. This approach aligns well with our goal of gaining deeper insights into the model’s decision-making process, particularly in the context of hate speech detection in multimedia-rich content.

In this study, we implement LIME with the supervised FastText CNN-LSTM, which showed the best performance across all three datasets, and we use examples from the WELFake dataset to interpret the model’s decision-making process. The LIME visualization provided in figure 10 offers a granular view into the decision-making process of the learning algorithm. According to the prediction probabilities, the learning algorithm has confidently classified this text as fake news (label 0) with a probability of 1.00, and there is no probability assigned to it being real news (label 1). The right side of the visualization highlights key words that have likely contributed to the model’s prediction, with words like “reform”, “early”, “dismantling”, “tax”, “administration”, “killing”, “Obamacare”, “pushing”, “rule”, and “list”. These words are weighted according to their influence on the prediction, as indicated by the numbers next to each word. Terms such as “Obamacare”, “reform”, and “dismantling” have higher weights, suggesting they are strong indicators for the model’s decision to classify this as fake news.

FIGURE 10. Example 1: Supervised CNN-LSTM with LIME.

In figure 11, the LIME visualization depicts the model’s prediction process for another news article. The learning algorithm has assigned a higher probability to the text being fake news (label 0), with a probability of 0.80, while there is a smaller probability of 0.20 for the text being real news (label 1). The words highlighted on the right side as influencing the model’s decision include “disqualified”, “president”, “thing”, “latest”, “revelation”, “email”, “one”, “Hillary”, and “point”. The weights next to the words suggest their relative impact on the prediction, with “disqualified” and “president” having the most significant influence. The model appears to be using these key terms to assess the credibility of the text, with words related to political figures and potential controversies (“Hillary”, “email”) being central to its classification as fake news. The presence of words like “latest” and “revelation” might indicate a news-like structure, which could contribute to the ambiguity in the model’s decision, reflected in the less confident prediction compared to the first example.

In figure 12, the LIME visualization shows that the learning algorithm is completely certain in its classification, assigning a probability of 1.00 to the text being fake news (label 0) and no probability of it being real (label 1). The visualization highlights the terms “Reuters”, “EDIT”, “Source”, “Twitter”, “realDonaldTrump”, “confirmed”, “edited”, and “posted” as key contributors to the prediction. Notably, the word “Reuters” carries the highest weight, followed by “EDIT” and “Source,” which


may indicate that the presence of these terms is strongly associated with the features of fake news within the model’s learned parameters. Additionally, the inclusion of “realDonaldTrump” and “FraudNewsCNN” could suggest the model is picking up on the mention of high-profile names and possible claims of misinformation as indicators of fake news.

FIGURE 11. Example 2: Supervised CNN-LSTM with LIME.

FIGURE 12. Example 3: Supervised CNN-LSTM with LIME.

In figure 13, the learning algorithm makes its prediction with a high degree of confidence, indicating a 0.99 probability that the text is fake news (label 0) and a very small probability of 0.01 for it being real news (label 1). Key words influencing this decision include “breitbart”, “memo”, “weekday”, “added”, “went”, “remains”, “see”, “wonder”, and “president”. The term “breitbart” has the highest weight, which suggests that the model has learned to associate this term with the characteristics of fake news. Words like “memo” and temporal references such as “weekday” and “12:25” also seem to play a significant role in the classification.

FIGURE 13. Example 4: Supervised CNN-LSTM with LIME.

The LIME visualization in figure 14 indicates that the learning algorithm misclassified a fake news article (label 0) as real news (label 1) with a confidence of 0.71. The LIME output highlights the words “October”, “said”, “November”, “massive”, “Britain”, “French”, “France”, and “March” as significant for the model’s decision-making process. The weightiest term, “October”, along with other time-related terms like “November” and “March”, and frequent occurrences of the word “said,” might have misled the model into perceiving the text as a legitimate news report. This could indicate that the model may be overvaluing specific keywords that are often found in factual news content, without fully grasping the deceptive context that characterizes fake news.

FIGURE 14. Example 5: Supervised CNN-LSTM with LIME.

VII. CONCLUSION AND FUTURE WORK

This research study presents a comprehensive and robust framework for the detection of fake news, addressing the pressing challenge of misinformation spread across social media. By utilizing three diverse datasets, we systematically evaluated the efficacy of FastText embeddings in conjunction with advanced ML and DL-based techniques, with an emphasis on optimization strategies to avoid overfitting and address generalizability. Our findings indicate that a hybrid model combining CNN with LSTM layers, augmented with FastText embeddings, outperforms other models in accurately classifying news articles. Moreover, the application of transformer-based models has shed light on the capabilities of these architectures in deciphering complex syntactic structures for enhanced semantic understanding. The use of explainable AI through LIME and LDA has not only supported the transparency of the detection process but also provided valuable interpretative insights. In future work, we aim to expand the scope of fake news detection to encompass multiple languages, particularly those with scarce linguistic resources. We will explore the potential of multilingual transformers, such as mBERT, mT5, and GPT, to tackle the intricacies of multi-label classification, thereby advancing the frontier in combating the global challenge of fake news dissemination. Furthermore, by adopting adversarial training methods, we anticipate further advancements in our fight against the widespread issue of fake news dissemination globally.

ACKNOWLEDGMENT

This research work has been acknowledged by SOCYTI.6

6 https://www.bigdata.vestforsk.no/ongoing/socyti

REFERENCES

[1] R. M. Johnson, “Social media and free speech: A collision course that threatens democracy,” Ohio Northern Univ. Law Rev., vol. 49, no. 2, p. 5, 2023.
[2] S. Rastogi and D. Bansal, “A review on fake news detection 3T’s: Typology, time of detection, taxonomies,” Int. J. Inf. Secur., vol. 22, no. 1, pp. 177–212, Feb. 2023.
[3] S. Rastogi and D. Bansal, “Disinformation detection on social media: An integrated approach,” Multimedia Tools Appl., vol. 81, no. 28, pp. 40675–40707, Nov. 2022.


[4] N. Capuano, G. Fenza, V. Loia, and F. D. Nota, “Content-based fake news detection with machine and deep learning: A systematic review,” Neurocomputing, vol. 530, pp. 91–103, Apr. 2023.
[5] F. Miró-Llinares and J. C. Aguerri, “Misinformation about fake news: A systematic critical review of empirical studies on the phenomenon and its status as a ‘threat,’” Eur. J. Criminol., vol. 20, no. 1, pp. 356–374, Jan. 2023.
[6] C. Silverman, “This analysis shows how viral fake election news stories outperformed real news on Facebook,” BuzzFeed news, vol. 16, p. 24, Jan. 2016.
[7] G. Sansonetti, F. Gasparetti, G. D’Aniello, and A. Micarelli, “Unreliable users detection in social media: Deep learning techniques for automatic detection,” IEEE Access, vol. 8, pp. 213154–213167, 2020.
[8] A. Jarrahi and L. Safari, “Evaluating the effectiveness of publishers’ features in fake news detection on social media,” Multimedia Tools Appl., vol. 82, no. 2, pp. 2913–2939, Jan. 2023.
[9] R. Rodríguez-Ferrándiz, “An overview of the fake news phenomenon: From untruth-driven to post-truth-driven approaches,” Media Commun., vol. 11, no. 2, pp. 15–29, Apr. 2023.
[10] M. R. Kondamudi, S. R. Sahoo, L. Chouhan, and N. Yadav, “A comprehensive survey of fake news in social networks: Attributes, features, and detection approaches,” J. King Saud Univ.-Comput. Inf. Sci., vol. 35, no. 6, Jun. 2023, Art. no. 101571.
[11] C. Martel and D. G. Rand, “Misinformation warning labels are widely effective: A review of warning effects and their moderating features,” Current Opinion Psychol., vol. 54, Dec. 2023, Art. no. 101710.
[12] S. Wang, “Factors related to user perceptions of artificial intelligence (AI)-based content moderation on social media,” Comput. Hum. Behav., vol. 149, Dec. 2023, Art. no. 107971.
[13] K. Węcel, M. Sawiński, M. Stróżyna, W. Lewoniewski, E. Księżniak, P. Stolarski, and W. Abramowicz, “Artificial intelligence—Friend or foe in fake news campaigns,” Econ. Bus. Rev., vol. 9, no. 2, pp. 41–70, 2023.
[14] A. Altheneyan and A. Alhadlaq, “Big data ML-based fake news detection using distributed learning,” IEEE Access, vol. 11, pp. 29447–29463, 2023.
[15] S. D. M. Kumar and A. M. Chacko, “A systematic survey on explainable AI applied to fake news detection,” Eng. Appl. Artif. Intell., vol. 122, Jun. 2023, Art. no. 106087.
[16] S. Ali, F. Akhlaq, A. S. Imran, Z. Kastrati, S. M. Daudpota, and M. Moosa, “The enlightening role of explainable artificial intelligence in medical & healthcare domains: A systematic literature review,” Comput. Biol. Med., vol. 166, Nov. 2023, Art. no. 107555.
[17] D. Choudhury and T. Acharjee, “A novel approach to fake news detection in social networks using genetic algorithm applying machine learning classifiers,” Multimedia Tools Appl., vol. 82, no. 6, pp. 9029–9045, Mar. 2023.
[18] W. Wang, “A new benchmark dataset for fake news detection,” in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, vol. 2, 2021.
[19] S. Dutta and S. K. Bandyopadhyay, “Fake job recruitment detection using machine learning approach,” Int. J. Eng. Trends Technol., vol. 68, no. 4, pp. 48–53, Apr. 2020.
[20] L. R. Ali, B. N. Shaker, and S. A. Jebur, “An extensive study of sentiment analysis techniques: A survey,” in AIP Conf. Proc., 2023.
[21] M. A. Chandra and S. S. Bedi, “Survey on SVM and their application in image classification,” Int. J. Inf. Technol., vol. 13, no. 5, pp. 1–11, Oct. 2021.
[22] H. Wang, F. G. Quintana, Y. Lu, M. Mohebujjaman, and K. Kamronnaher, “An application of ordinal logistic regression model to a health survey in a hispanic university,” Tech. Rep.
[23] J. Hu and S. Szymczak, “A review on longitudinal data analysis with random forest in precision medicine,” 2022, arXiv:2208.04112.
[24] M. A. Alsheikh, D. Niyato, S. Lin, H.-P. Tan, and Z. Han, “Mobile big data analytics using deep learning and apache spark,” IEEE Netw., vol. 30, no. 3, pp. 22–29, May 2016.
[25] S. Lee, J. Lee, H. Moon, C. Park, J. Seo, S. Eo, S. Koo, and H. Lim, “A survey on evaluation metrics for machine translation,” Mathematics, vol. 11, no. 4, p. 1006, Feb. 2023.
[26] V. Bhaskar and U. Shanmugam, “Novel spam comment detection system using countvectorizer techniques with SVM for online YouTube comments for improving the recall and precision value over naive Bayes,” in Proc. AIP Conf., 2023.
[27] P. Akhtar, A. M. Ghouri, H. U. R. Khan, M. Amin ul Haq, U. Awan, N. Zahoor, Z. Khan, and A. Ashraf, “Detecting fake news and disinformation using artificial intelligence and machine learning to avoid supply chain disruptions,” Ann. Operations Res., vol. 327, no. 2, pp. 633–657, Aug. 2023.
[28] A. K. Shalini, S. Saxena, and B. S. Kumar, “Designing a model for fake news detection in social media using machine learning techniques,” Int. J. Intell. Syst. Appl. Eng., vol. 11, no. 2, pp. 218–226, 2023.
[29] P. K. Verma, P. Agrawal, I. Amorim, and R. Prodan, “WELFake: Word embedding over linguistic features for fake news detection,” IEEE Trans. Computat. Social Syst., vol. 8, no. 4, pp. 881–893, Aug. 2021.
[30] K. Shu, D. Mahudeswaran, S. Wang, D. Lee, and H. Liu, “FakeNewsNet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media,” Big Data, vol. 8, no. 3, pp. 171–188, Jun. 2020.
[31] C.-O. Truică and E.-S. Apostol, “It’s all in the embedding! Fake news detection using document embeddings,” Mathematics, vol. 11, no. 3, p. 508, Jan. 2023.
[32] J. H. Joloudari, S. Hussain, M. A. Nematollahi, R. Bagheri, F. Fazl, R. Alizadehsani, R. Lashgari, and A. Talukder, “BERT-deep CNN: State of the art for sentiment analysis of COVID-19 tweets,” Social Netw. Anal. Mining, vol. 13, no. 1, p. 99, Jul. 2023.
[33] D. Antony, S. Abhishek, S. Singh, S. Kodagali, N. Darapaneni, M. Rao, and A. R. Paduri, “A survey of advanced methods for efficient text summarization,” in Proc. IEEE 13th Annu. Comput. Commun. Workshop Conf. (CCWC), Mar. 2023, pp. 0962–0968.
[34] J. Briskilal and C. N. Subalalitha, “An ensemble model for classifying idioms and literal texts using BERT and RoBERTa,” Inf. Process. Manage., vol. 59, no. 1, Jan. 2022, Art. no. 102756.
[35] S. J. Johnson, M. R. Murty, and I. Navakanth, “A detailed review on word embedding techniques with emphasis on word2vec,” Multimedia Tools Appl., vol. 2023, pp. 1–29, Oct. 2023.
[36] M. Umer, Z. Imtiaz, M. Ahmad, M. Nappi, C. Medaglia, G. S. Choi, and A. Mehmood, “Impact of convolutional neural network and FastText embedding on text classification,” Multimedia Tools Appl., vol. 82, no. 4, pp. 5569–5585, Feb. 2023.
[37] A. Nanade and A. Kumar, “Combating fake news on Twitter: A machine learning approach for detection and classification of fake tweets,” Int. J. Intell. Syst. Appl. Eng., vol. 12, no. 1, pp. 424–436, 2024.
[38] P. K. Verma, P. Agrawal, V. Madaan, and R. Prodan, “MCred: Multi-modal message credibility for fake news detection using BERT and CNN,” J. Ambient Intell. Humanized Comput., vol. 14, no. 8, pp. 10617–10629, Aug. 2023.
[39] Z. Guo, Q. Zhang, F. Ding, X. Zhu, and K. Yu, “A novel fake news detection model for context of mixed languages through multiscale transformer,” IEEE Trans. Computat. Social Syst., 2024.
[40] A. Praseed, J. Rodrigues, and P. S. Thilagam, “Hindi fake news detection using transformer ensembles,” Eng. Appl. Artif. Intell., vol. 119, Mar. 2023, Art. no. 105731.
[41] S. Sai, A. W. Jacob, S. Kalra, and Y. Sharma, “Stacked embeddings and multiple fine-tuned XLM-roBERTa models for enhanced hostility identification,” in Combating Online Hostile Posts in Regional Languages During Emergency Situation. Cham, Switzerland: Springer, 2021, pp. 224–235.
[42] K. Subramanyam Kalyan, A. Rajasekharan, and S. Sangeetha, “AMMUS: A survey of transformer-based pretrained models in natural language processing,” 2021, arXiv:2108.05542.
[43] M. Bhardwaj, M. Shad Akhtar, A. Ekbal, A. Das, and T. Chakraborty, “Hostility detection dataset in Hindi,” 2020, arXiv:2011.03588.
[44] S. Biradar, S. Saumya, and A. Chauhan, “Combating the infodemic: COVID-19 induced fake news recognition in social media networks,” Complex Intell. Syst., vol. 9, no. 3, pp. 2879–2891, Jun. 2023.
[45] M. S. I. Malik, A. Nawaz, M. M. Jamjoom, and D. I. Ignatov, “Effectiveness of ELMo embeddings, and semantic models in predicting review helpfulness,” Intell. Data Anal., vol. 2023, pp. 1–21, Nov. 2023.
[46] J. Wu, W. Xu, Q. Liu, S. Wu, and L. Wang, “Adversarial contrastive learning for evidence-aware fake news detection with graph neural networks,” IEEE Trans. Knowl. Data Eng., 2023.
[47] K. Popat, S. Mukherjee, J. Strötgen, and G. Weikum, “Where the truth lies: Explaining the credibility of emerging claims on the Web and social media,” in Proc. 26th Int. Conf. World Wide Web Companion, 2017, pp. 1003–1012.


[48] A. Vlachos and S. Riedel, “Fact checking: Task definition and dataset construction,” in Proc. ACL Workshop Lang. Technol. Comput. Social Sci., 2014, pp. 18–22.
[49] K. Soga, S. Yoshida, and M. Muneyasu, “Exploiting stance similarity and graph neural networks for fake news detection,” Pattern Recognit. Lett., vol. 177, pp. 26–32, Jan. 2024.
[50] I. A. Pilkevych, D. L. Fedorchuk, M. P. Romanchuk, and O. M. Naumchak, “An analysis of approach to the fake news assessment based on the graph neural networks,” in Proc. CEUR Workshop, vol. 3374, 2023, pp. 56–65.
[51] T. J. Billard and R. E. Moran, “Designing trust: Design style, political ideology, and trust in ‘fake’ news websites,” Digit. Journalism, vol. 11, no. 3, pp. 519–546, Mar. 2023.
[52] P. P. Ray, “ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope,” Internet Things Cyber-Phys. Syst., vol. 3, pp. 121–154, Jan. 2023.
[53] R. S. Satpute and A. Agrawal, “A critical study of pragmatic ambiguity detection in natural language requirements,” Int. J. Intell. Syst. Appl. Eng., vol. 11, no. 3s, pp. 249–259, 2023.
[54] T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin, “Advances in pre-training distributed word representations,” 2017, arXiv:1712.09405.
[55] C. Qiao, B. Huang, G. Niu, D. Li, D. Dong, W. He, D. Yu, and H. Wu, “A new method of region embedding for text classification,” in Proc. ICLR, 2018.
[56] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Proc. Adv. Neural Inf. Process. Syst., 2023.
[57] S. N. Edi, “Topic modelling Twitter data with latent Dirichlet allocation method,” Tech. Rep., 2022.
[58] D. M. Mimno, H. M. Wallach, E. M. Talley, M. Leenders, and A. McCallum, “Optimizing semantic coherence in topic models,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2011, pp. 262–272.
[59] H. M. Wallach, I. Murray, R. Salakhutdinov, and D. Mimno, “Evaluation methods for topic models,” in Proc. 26th Annu. Int. Conf. Mach. Learn., Jun. 2009, pp. 1105–1112.
[60] P. Biecek and T. Burzykowski, “Local interpretable model-agnostic explanations (LIME),” Explanatory Model Anal. Explore, Explain Examine Predictive Models, vol. 1, pp. 107–124, Jan. 2021.
[61] H. Mehta and K. Passi, “Social media hate speech detection using explainable artificial intelligence (XAI),” Algorithms, vol. 15, no. 8, p. 291, Aug. 2022.

EHTESHAM HASHMI received the B.S. degree in computer science from the University of Central Punjab, Lahore Campus, in 2020, and the M.S. degree in computer science from COMSATS University Islamabad, Lahore Campus, in 2022. He is currently pursuing the Ph.D. degree with the Department of Information Security and Communication Technology (IIK), Norwegian University of Science and Technology (NTNU). From 2022 to 2023, he was a Lecturer with the Department of Computer Science, The University of Lahore. His research interests include multilingual natural language processing, computational linguistics, large language models, knowledge graphs, and data mining.

SULE YILDIRIM YAYILGAN received the M.Sc. degree in computer engineering, in 1995, and the Ph.D. degree in artificial intelligence and computer science, in 2002. She has been with the Department of Information Security and Communication Technology (IIK), NTNU, since 2009. She has worked for more than 25 years in academic teaching. She has supervised students at different academic levels and has published more than 100 journal and conference papers. Her research interests include image processing, information security, natural language processing, computational linguistics, large language models, and data mining.

MUHAMMAD MUDASSAR YAMIN is currently an Associate Professor with the Department of Information and Communication Technology, Norwegian University of Science and Technology (NTNU). He is a member of the System Security Research Group, and the focus of his research is on system security, penetration testing, security assessment, and intrusion detection. Before joining NTNU, he was an Information Security Consultant and served multiple government and private clients. He holds multiple cybersecurity certifications, such as OSCE, OSCP, LPT-MASTER, CEH, CHFI, CPTE, CISSO, and CBP.

SUBHAN ALI received the bachelor’s degree from Sukkur IBA University, through a fully funded Talent Hunt Scholarship offered by OGDCL, Pakistan, in 2021. He is currently pursuing the master’s degree in applied computer science with Norwegian University of Science and Technology (NTNU), Norway, through a NORPART-CONNECT fully funded scholarship. He is a highly motivated Researcher with a passion for advancing the field of artificial intelligence. His research interests include the intersection of explainable AI, generative AI, and natural language processing. His talent for innovative problem-solving and his dedication to advancing the field of AI make him a valuable addition to any team.

MOHAMED ABOMHARA received the bachelor’s degree in Libya, in 2006, the master’s degree (M.Sc.) in computer science in Malaysia, in 2011, the master’s degree (M.B.A.) in business administration, in 2013, and the Ph.D. degree in information technology from the University of Agder, Norway, in 2018. He is currently a Cybersecurity Researcher and a Data Protection Specialist with the Department of Information Security and Communication Technology, Norwegian University of Science and Technology (NTNU). His scholarly contributions extend to active participation in several prestigious European, Erasmus+, and Norwegian Research Council projects, where he has assumed both technical and managerial roles and has published multiple journal and conference papers. His commitment to advancing technology while upholding ethical and privacy standards underscores his prominent role in academia and research. His research interests include the development of data-driven technologies that uphold critical principles, such as transparency, accountability, and privacy.
