AComparative Study of Machine Learning and Deep Learning Techniques For Fake News Detection
AComparative Study of Machine Learning and Deep Learning Techniques For Fake News Detection
Article
A Comparative Study of Machine Learning and Deep Learning
Techniques for Fake News Detection
Jawaher Alghamdi 1,2, *, Yuqing Lin 1 and Suhuai Luo 1
1 School of Information and Physical Sciences, College of Engineering Science and Environment,
The University of Newcastle, Newcastle 2308, Australia
2 Department of Computer Science, King Khalid University, Abha 62521, Saudi Arabia
* Correspondence: [email protected]
Abstract: Efforts have been dedicated by researchers in the field of natural language processing
(NLP) to detecting and combating fake news using an assortment of machine learning (ML) and deep
learning (DL) techniques. In this paper, a review of the existing studies is conducted to understand
and curtail the dissemination of fake news. Specifically, we conducted a benchmark study using a
wide range of (1) classical ML algorithms such as logistic regression (LR), support vector machines
(SVM), decision tree (DT), naive Bayes (NB), random forest (RF), XGBoost (XGB) and an ensemble
learning method of such algorithms, (2) advanced ML algorithms such as convolutional neural
networks (CNNs), bidirectional long short-term memory (BiLSTM), bidirectional gated recurrent
units (BiGRU), CNN-BiLSTM, CNN-BiGRU and a hybrid approach of such techniques and (3) DL
transformer-based models such as BERTbase and RoBERTabase . The experiments are carried out
using different pretrained word embedding methods across four well-known real-world fake news
datasets—LIAR, PolitiFact, GossipCop and COVID-19—to examine the performance of different
techniques across various datasets. Furthermore, a comparison is made between context-independent
embedding methods (e.g., GloVe) and the effectiveness of BERTbase -contextualised representations
in detecting fake news. Compared with the state of the art’s results across the used datasets, we
achieve better results by solely relying on news text. We hope this study can provide useful insights
Citation: Alghamdi, J.; Lin, Y.; Luo, S. for researchers working on fake news detection.
A Comparative Study of Machine
Learning and Deep Learning Keywords: fake news; misinformation; machine learning; deep learning; transformer-based models
Techniques for Fake News Detection.
Information 2022, 13, 576. https://
doi.org/10.3390/info13120576
2. Problem Definition
We consider the problem of fake news detection as a binary classification, where a
model is created to predict the credibility of a piece of news as either fake or real based on
Information 2022, 13, 576 3 of 28
3. Related Works
ML and DL techniques have significantly contributed to detecting and categorising
fake news. Earlier research in fake news detection mainly relied on linguistic features to
engineer relevant information. The textually based features were divided broadly into two
groups, according to [12]: (1) general features and (2) latent features.
They demonstrated that their approach was more accurate than state-of-the-art approaches
such as Transformers’ Bidirectional Encoder Representations (BERT). Linguistic features
have been used in both supervised and unsupervised learning approaches. It is essential to
recognise that prospective deceivers use short sentences, certain phrasal verbs as well as
certain verb tenses as part of their language, which has been revealed by some cooperative
experiments between psychologists, linguists and computer scientists [22,29–32]. For ex-
ample, 16 language variables were investigated by Burgoon et al. [29] in order to see if they
may help distinguish between deceptive and truthful communications. They conducted
two experiments in order to construct a database, in which they set up face-to-face or
computer-based discussions with one volunteer acting as the deceiver and the other acting
genuinely. Then, such discussions were transcribed for further processing, and they came
up with certain linguistic cue classes that could reveal the deceiver. The authors utilised
the C4.5 decision tree (DT) technique with 15-fold cross-validation to cluster and construct
a hierarchical tree structure of the proposed features. Their proposed method achieved
an overall accuracy of 60.72 per cent on a short sample of 72 cases. They concluded that
noncontent words (e.g., function words) should be included when studying social and
personality processes. According to the authors, in addition to specific nouns and verbs,
linguistic style markers such as pronouns, articles and prepositions are equally important
in revealing what someone is thinking and feeling. Therefore, liars are more likely to tell
less self-relevant, less complicated and more negative stories. Moreover, liars were more
likely than truth-tellers to utilise negative emotion (more negative feeling) terms [31].
Liars may feel guilty, either for the lie they said or for the topic they lied about [33].
Knapp et al. [34] observed that liars were far more likely than truth-tellers to make dis-
paraging statements about their communication partners. In addition, they observed that
the use of other references is more common among liars than among truth-tellers. How-
ever, this is inconsistent with what [31] found, where they observed that compared with
truth-tellers, liars used fewer third-person pronouns. Horne et al. [35] applied an SVM
classifier with a linear kernel using several linguistic clues. The authors cast the prob-
lem as a multi-class classification to identify whether an article was real, fake or satire,
where classes were equally distributed. After a 5-fold cross-validation with BuzzFeed
News (https://www.buzzfeednews.com/article/craigsilverman/viral-fake-election-news-
outperformed-real-news-on-facebook) (accessed on 5 May 2022) that was enriched with
satire articles, they reached 78 percent accuracy. Their feature set mainly consisted of POS
tags and specific LIWC word categories.
A wide range of stylistic, morphological and grammatical text and punctuation could
serve as useful cues to detect news veracity. These sets of features were adopted by
Papadopoulou et al. [36] using a two-level text-based classifier in order to detect click-
bait. Similarly, Rubin et al. [37] used some of these features, such as punctuation and
grammatical features. Sentiment analysis, opinion mining and opinionated and inflam-
matory language as factors for fake news detection have also been explored [19,37–46].
Bhelande et al. [47] applied a naive Bayesian classifier using a bag of positive and negative
words (i.e., sentiment analysis). Syntactic features (e.g., sentence-level features such as POS
and n-grams) [4,13,18,25,37,39,44,48–50], and lexical features for salient content words (i.e.,
character-level and word-level features) [8,18,20,25,35,37,40,44,51–56] have been adopted
by existing research. Castillo et al. [40] analysed and assessed information credibility
on Twitter by adopting a list of rudimentary content-based features. In [57], the authors
aimed to gauge the credibility of web claims by exploiting language stylistic features.
They claimed that the language style of the reporting articles plays a crucial role in un-
derstanding their credibility. Da Silva et al. [58] examined ML approaches to identify and
detect fake news. They found that neural networks composed of statistical classification
algorithms heavily concentrate on analysing lexicons as the main features for detection
to perform best. Potthast et al. [20] used news content to extract style features for fake
news prediction. Psycholinguistic cues (i.e., additional indicators of persuasive language)
such as sadness, anger and others and signals of biased language have been adopted in
Information 2022, 13, 576 5 of 28
detection as well [9,18,37,46,49,54]. More work on fake news detection relying on linguistic
cues can be found in [52,59,60]. A systematic mapping study was conducted in [61] to
analyse and synthesise studies regarding the use of ML techniques for detecting fake news.
Klyuev [62] discussed different approaches and techniques to fight fake news. They also
discussed how important it is to define text features using NLP methods to build a text
document profile. However, it turns out that straightforward approaches such as n-grams
and POS tags tend to ignore different writing style features and cannot capture cues across
long news articles. Indeed, irrelevant (noisy) text inevitably exists in fake news datasets,
particularly those extracted from OSNs such as Twitter. If not processed, this would end up
with classification performance leading to inaccurate predictions. This poses the challenge
of how to encode such data so as to mitigate such a problem. Various dedicated efforts
have been put forward to respond to this challenge using neural network-based models.
Each provides a unique set of informative and useful features that can help discriminate
fake news from real news.
tion. The technique of learning how to transfer knowledge is known as transfer learning,
which stores and applies the knowledge gained from performing a specific task to another
problem. Learning this way is useful for training and evaluating models with relatively
small amounts of data. In recent years, pretrained language models (PLMs) have become
mainstream for downstream text classification [74] thanks to transformer-based structures.
Major advances have been driven by the use of PLMs, such as ELMo [75], GPT [76] or
BERT [74]. BERT and RoBERTa, the most commonly utilised PLMs, were trained on huge
corpora, such as those containing over three billion words for BERT [74]. The success of
such approaches raises the question of how such models can be used for downstream text
classification tasks. Over the PLMs, task-specific layers are added for each downstream
task, and then the new model is trained with only those layers from scratch [74,77,78] in a
supervised manner. Specifically, these models use a two-step learning approach. First, in a
self-supervised manner, they learn language representations by analysing a tremendous
amount of text. This process is commonly called pretraining. Second, feature-based and
fine-tuning approaches can then be used to apply these pretrained language representations
to downstream NLP tasks. The former uses pretrained representations and includes them
as additional features for learning a given task. The latter introduces minimal task-specific
parameters, and all pretrained parameters are fine-tuned for the downstream tasks. These
models are advantageous in that they can learn deep context-aware word representations
from large unannotated text corpora, namely large-scale self-supervised pretraining. This
is especially useful when learning a domain-specific language with insufficient available
labelled data.
As we will further discuss in later sections, this paper exploits the power of BERTbase
and its variations in building robust fake news predictive models. Few studies have been
conducted using such models, despite using different methodologies and different scenar-
ios, which have shown promising results. One recent example is a study conducted by
Kula et al. [79], which presents a hybrid architecture based on a combination of BERT and
an RNN. Alghamdi et al. [80] presented a computational framework for automatic fake
news detection using BERTbase . The authors applied BERTbase to encode the input text, and
then the resulting output sequence was fed into a CNN network to extract the salient local
features. The metadata were then encoded using a CNN followed by a BiLSTM and then a
classification layer. Their study achieved state-of-the-art performance on the LIAR dataset.
Aggarwal et al. [81] showed that BERT, even with minimal text preprocessing, provided
better performance compared with that of the LSTM and gradient-boosted tree models.
Jwa et al. [82] adopted BERT for fake news detection by analysing the relationship between
the headline and the body text of news using the FNC dataset, where they achieved an F1
score of 0.746. In an attempt to automatically detect fake news spreaders, Baruah et al. [83]
proposed BERT for the classification task, achieving an accuracy of 0.690.
Given that BERT is more complex to train (depending on how large the number
of parameters being used is), a variation of BERT, the so-called DistilBERT method [84],
provides a simpler and more reasonable number of parameters compared with that of
BERT (reducing BERT by 40% in size while retaining 97% of its language understanding
abilities), thus leading to faster training (60% faster). With a larger dataset, larger batches
and more iterations, a robust BERT model was developed: the so-called RoBERTa [78].
A benchmark study of ML models for fake news detection was provided in [85], where
the authors formulated the problem of fake news detection using three different datasets,
including the LIAR dataset [69] (see Section 4.6.1 for more details about the dataset), as a
binary classification. Their experimental results showed the power of advanced PLMs such
as BERT and RoBERTa. A summary of the previous related studies on the LIAR dataset
(binary classification) can be seen in Table 1.
Information 2022, 13, 576 7 of 28
Models Metrics
Accuracy Precision Recall F1
RoBERTabase [85] 0.62 0.63 0.62 0.62
SVM [86] 0.62 NA NA NA
Using a composite fake news dataset called FakeNewsNet (see Section 4.6.2 for more
details about the dataset), which includes two datasets—PolitiFact and GossipCop—the
authors of [87] employed an autoencoder with LSTM of two layers for both encoders
and decoders and also used another network of LSTM with two layers to capture the
temporal pattern of user engagements. The authors later presented a DL model based
on the hierarchical attention network for fake news detection using the same datasets
where their proposed framework achieved state-of-the-art results [88]. Using the PolitiFact
dataset, a deep neural network model with various representations was proposed in [89].
A summary of the previous related work on the FakeNewsNet dataset can be seen in
Tables 2 and 3.
Models Metrics
Accuracy Precision Recall F1
Social Article Fusion [87] 0.69 0.64 0.79 0.71
Logistic Regression (N-Gram) [90] 0.80 0.79 0.78 0.78
BiLSTM-BERT [91] 0.8558 NA NA NA
LNN-KG [89] 0.880 0.9011 0.880 0.8892
DEFEND [88] 0.904 0.902 0.956 0.928
Models Metrics
Accuracy Precision Recall F1
CNN [87] 0.723 0.751 0.701 0.725
Logistic Regression (N-Gram) [90] 0.82 0.75 0.79 0.77
DEFEND [88] 0.808 0.729 0.782 0.755
In [92], the authors applied a pretrained transformer model, the so-called XLNet
model, combined with Latent Dirichlet Allocation (LDA) by integrating contextualised
representations generated from the former with topical distributions produced by the latter.
Their model achieved an F1 score of 0.967. In the same vein, a fine-tuned transformer-based
ensemble model was proposed in [93]. The proposed model achieved a 0.979 F1 score on
the Constraint@AAAI2021-COVID19 fake news dataset (see Section 4.6.3 for more details
about the dataset). Similarly, the authors of [94] carried out several experiments on the
same dataset, and they proposed a framework for detecting fake news using the BERT
language model by considering content information and prior knowledge as well as the
credibility of the source. According to the results, the highest F1 scores obtained ranged
from 97.57 to 98.13. By applying several supervised advanced ML models such as CNNs,
LSTM and BERT to detect COVID-19 fake news, the authors of [95] achieved the best
accuracy of 98.41% using the BERT-based version. A summary of the previous related
studies on the COVID-19 dataset can be seen in Table 4.
Information 2022, 13, 576 8 of 28
Models Metrics
Accuracy Precision Recall F1
SVM+LR+NB+biLSTM [96] NA NA NA 0.94
SVM [97] 0.9570 0.9571 0.9570 0.9570
SNN(LM+KG) [89] 0.9570 0.9533 0.9652 0.9569
BERT-Based [95] 0.9841 NA NA NA
Ensemble Transformer Models [93] 0.9799 0.9799 0.9799 0.9799
XLNet with Topic Distributions [92] NA 0.968 0.967 0.967
SVM [98] 0.9332 0.9333 0.9332 0.9332
4. Comparative Study
Based on thorough literature review analysis, we conducted this comprehensive
comparative study by selecting several popular, well-discussed classical and advanced
ML and DL models for fake news detection. In this section, we will first present different
word-embedding techniques developed in NLP and then introduce different traditional and
advanced ML algorithms as well as DL models for fake news detection and the benchmark
datasets, which will be used to conduct the comparative study. The objective is to examine
the performance of a wide range of ML techniques, including classical and advanced ML
models as well as DL approaches, across various datasets for fake news detection.
4.1. Embeddings
Word embedding is considered the centre of any NLP task and is basically a form
of word representation that bridges the human understanding of language to that of
a machine. Word embeddings are typically learned from a large corpus of text. The
distributed representational vector that captures the semantics of the input text can be
obtained using two main alternatives that have been used for several text classification
tasks in NLP. Modelling text is challenging because it is messy, and techniques such as ML
algorithms require well-defined, fixed-length inputs and outputs. ML algorithms require
the text to be converted into numbers (specifically vectors of numbers) as they cannot
directly work with raw text. These vectors capture more linguistic properties of the text,
as “in language processing, the vectors x are derived from textual data, in order to reflect
various linguistic properties of the text” [99] (p. 65). Textual data requires a numerical
representation and, most importantly, an efficient representation for computation and
manipulation. Various statistically and contextually based methods have been developed
to represent text numerically. The former is based only on statistical metrics, which typically
generate a sparse vector space, while the latter is based on the concept of word context,
which produces a rather dense vector space.
Several approaches have applied such methods (e.g., the bag-of-words [100], n-
gram [101] and TF–IDF methods [102]) as input for text classification using ML (e.g., NB
classifiers [103], K-NN algorithms [104] and SVM [105]). Nevertheless, the contextualised
Information 2022, 13, 576 9 of 28
representation that allows for efficient computation and captures the underlying patterns
is urgently needed, especially with the massive amounts of textual data. These statistically
based representation methods are computationally efficient and have achieved promising
results where they are traditionally considered the centre of any text classification task.
However, these methods focus entirely on capturing the frequency features of a particular
word, and the contextual information of a text is fully disregarded, making it difficult to
capture semantic-based information. To capture more semantical information, pretrained
language representation models (a.k.a context-independent models) were developed, such
as Word2Vec and GloVe, which captured those semantics patterns but did little to capture
context information. A significant amount of attention has been devoted to developing
context-aware representations of textual data, including transformer-based models such
as BERT, which has led to outstanding results in most NLP mainstream tasks. Overall,
these techniques try to model the following problem. Assume we have a corpus of n
documents S = {d1 , d2 , . . . , dn }, each of which consists of m words W = {w1 , w2 , . . . , wm }
and a vocabulary V. The embedding vector representation w ~ i is defined as mapping each
word wi in a specific document di into a continuous space Rd , where d is the dimension of
the vector space. Mathematically, the words in a document can be mapped as follows:
wi → w ~ i ∈ Rd
~ i, w (1)
information in a given document. “Thus the idf of a rare term is high, whereas the idf
of a frequent term is likely to be low” [108] (p. 118). For each word in a document j, the
TF-IDF value is calculated by first calculating the term frequency (TF), which counts the
number of occurrences of words in a document, and then the inverse document frequency
(IDF), which is the catalyst for ensuring that words appear less commonly are assigned
more weights compared with those occurring more frequently (e.g., stop words), which is
calculated as follows:
|D|
log( ) (2)
d fi
where d f i denotes the number of documents that contains a word i and | D | refers to the
number of documents in the corpus. The TF-IDF metric is calculated as follows:
|D|
wi = t f ij · log( (3)
d fi
4.2. ML Algorithms
In this subsection, we mainly describe the classification models we used.
• Support Vector Machine (SVM): The SVM classifier is a strong classifier that yields
promising results in a suite of NLP tasks.
• Multinomial Naive Bayes (MNB): MNB is a kind of probabilistic algorithm (a Bayesian
learning approach) that is also popular and yields great results in different NLP tasks.
• Decision Tree (DT): This is a tree-based algorithm whose end nodes represent high-
level features. A branch represents an output, while a leaf represents a label class.
There are internal nodes that test one attribute and branch from a node that selects one
value for the attribute. The leaf node is used to predict the class label. Classification is
carried out based on supervised learning, which involves mapping the features and
values to desired outcomes.
• Random Forest (RF): RF models consist of a set of decision trees, each trained from a
random selection of features.
• XGBoost (XGB): This is an ensemble ML algorithm. The XGB algorithm uses a
gradient-boosting framework whose algorithm is based on decision trees. Through
boosting, the trees are constructed sequentially, with each one (i.e., weak learners)
aimed at reducing the errors of the previous one. With the help of these weak learn-
ers, the boosting technique is able to combine these weak learners to produce a
strong learner.
• Ensemble: This is a hard voting ensemble learning method that combines the ML
algorithms, including LR, SVM, DT, MNB, RF and XGB, which is built for
better performance.
a CNN can be used to reduce the dimensionality of the feature space while retaining
the informative features from the text. In addition, a CNN can capture and extract
local patterns.
• CNN-BiLSTM: Hybridising recurrence-based models with a CNN helps extract
salient features, capturing local contextualised patterns and improving the model’s
accuracy. First, a single CNN layer of 128 filters with a kernel size of 5 is used to
process the input vectors and extract the local features. The resultant feature maps
of the CNN layer are then fed to a single BiLSTM layer with 128 units to learn the
long-term dependencies of the local features of news articles. This is followed by an
output layer with a single unit activated with a sigmoid function. The temporal and
contextual features and long-term dependencies of the input text can be learned and
captured from the text by using an RNN, and important local features can be detected
by harnessing the power of the CNN in handling the spatial relations [115,116].
• CNN-BiGRU: Similar to the CNN-BiLSTM model, the architecture with a BiLSTM
layer was replaced with a BiGRU layer.
• Hybrid: This is a hybrid model that combines three models: a single CNN layer with
128 neurons of a kernel size of 5, followed by a max-pooling and then a BiLSTM layer
with 128 units and then a BiGRU layer with 128 units.
Hyperparameters
Model Hidden Activation Neurons or Kernel Dropout
Layer(s) Function Filters Size
CNN 1 ReLU 128 5 0.3
BiLSTM 1 N/A 128 N/A 0.3
BiGRU 1 N/A 128 N/A 0.3
(FC) layer is added on top of the PLMs for classification purposes. A sophisticated approach
is needed to detect fake news, since it has become increasingly difficult to distinguish
between fake and real content.
This section evaluates two models—BERT [74] and RoBERTa [78]—which have been
considered the key breakthroughs of the impressive performance in a suite of NLP tasks,
largely due to their powerful language representations being learned from massive amounts
of a text corpus. Moreover, such models can be easily fine-tuned to a specific downstream
task through so-called transfer learning:
• BERT: Originally introduced by Devlin et al. [74], BERT stands for Bidirectional
Encoder Representation from Transformers. As the first deeply bidirectional and
unsupervised language representation, this model uses a multi-layer bidirectional
transformer encoder that simultaneously conditions the left and right contexts. As a re-
sult, BERT generates embeddings that are context-aware. BERT further eliminates the
unidirectional constraint by performing pretraining using an unsupervised prediction
task that includes a masked language model (MLM) which understands context and
predicts words. A vector representation can therefore be generated by the model that
captures the general information of the input text. The semantic representation of each
word in the input text can be improved using an attention mechanism by boosting
semantic representation based on the context of the word. The attention mechanism
plays an important role in transformer architecture in that it assigns varying weights
to different parts of text according to their contributions to the output. An Attention
function maps queries and follows key-value and output-vector pairs, which can be
seen in Equation (14):
QK T
Attention( Q, K, V ) = So f tmax ( √ )V (14)
dk
√
where Q, K and V denote the query, key and value itself, respectively. dk denotes the
dimension of the key vector k and query vector q. Attention uses a Softmax activation
function that normalises the inputs to a value between 0 and 1. BERT uses a multi-head
attention mechanism which can be seen in Equation (15), where each specific head
and the associated weight matrices are denoted with the subscript i:
where W O denotes a weight matrix that was trained jointly with the model and each
headi is calculated as follows:
Even though BERT contains millions of parameters (i.e., BERTbase contains 110 million
parameters, while BERTlarge has 340 million parameters) [74], in contrast to pretraining,
BERT is relatively inexpensive to apply to downstream tasks using jointly fine-tuned
parameters based on a pretrained model. In this work, we use BERTbase .
• RoBERTa [78]: An optimised version of the BERT approach was introduced by Face-
book. In this method, BERT is retrained with an improved training methodology by
(1) removing the Next Sentence Prediction task from pretraining; (2) using 10 times
as much data as BERT to train RoBERTa and (3) introducing dynamic masking with
larger batch sizes so that the masked tokens change during training, as opposed to the
static masking pattern used in BERT. In this way, RoBERTa differs from BERT in the
way it approaches pretraining. We experiment with RoBERTabase in this paper.
4.3. Preprocessing
Preprocessing of the input text included stop word and punctuation removal. The text
was then passed through a process of tokenising it into tokens and building a dictionary
Information 2022, 13, 576 16 of 28
as a result of learning the vocabulary included in the given text corpus. Such a dictionary
was then used to map every token into a unique integer, resulting in sequences of integers.
Then, we padded or truncated such sequences to a fixed number of entries, as we needed to
feed the models with vectors of the same length. The downside here was that any vectors
that were too long would be truncated, resulting in the loss of some (useful) information.
These sequences would then be transformed into fixed-length vectors of word embeddings
(i.e., the dimension of such word vector embeddings being used). We then initialised the
neural networks with (1) context-independent pretrained word-embedding models such as
Word2Vec and GloVe and (2) the context-informed pretrained language model BERTbase . For
classical ML algorithms, in order to convert tokenised texts into features, CountVectorizer
(CV) and TF-IDF were used as statistical feature extractors for the ML models. Additionally,
we used pretrained Word2Vec and GloVe to produce word representations as features for
classical ML algorithms.
TP
Precision = (18)
TP + FP
TP
Recall = (19)
TP + FN
Information 2022, 13, 576 17 of 28
• F1 score (F1): The F1 score is calculated as the weighted harmonic mean of the precision
and recall measures of the classifier using Equation (20):
2 ∗ Precision ∗ Recall 2 ∗ TP
F1 = = (20)
Precision + Recall 2 ∗ TP + FP + FN
where TP, TN, FP and FN are true positive, true negative, false positive and false
negative, respectively.
4.6. Datasets
Our study made use of the following datasets described below. These datasets best
reflect the real world, where the percentage of fake news that is fabricated and shared
is smaller than the volume of news produced on a daily basis. The baseline models on
the LIAR, FakeNewsNet-PolitiFact, FakeNewsNet-GossipCop and COVID-19 datasets are
summarised in Tables 1–4, respectively.
4.6.1. LIAR
This is a large-scale, publicly available fake news dataset proposed in [69]. The dataset
has about 12,800 entries and contains two main components: user profiles and short political
statements. The user profile features include the speaker’s name, job, party affiliation, state,
credit history and context. The statements (reported during the time interval from 2007
to 2016) have been labelled by the editors of Politifact.com (accessed on 18 March 2022)
using six fine-grained categories, namely true, mostly true, half true, barely true, false
and pants on fire. These six labels are relatively balanced in size. Overall, each statement
has its associated label and information about the speaker of that statement. The authors
considered a multi-class classification task on the Liar dataset. Several textual and metadata
elements, such as the speaker’s affiliation and the source newspaper, were included, and
labels were assigned based on the six degrees of truth that PolitiFact provides. In our study,
we cast the problem as a binary classification problem where we set true, mostly true, and
half true to “true” and the rest to “false”. The statistics of the dataset are shown in Table 6.
4.6.2. FakeNewsNet
This comprehensive dataset (https://github.com/KaiDMML/FakeNewsNet) (ac-
cessed on 20 March 2022) consists of full-text news articles collected from the politifact.com
and gossipcop.com (accessed on 18 March 2022) websites. Each of these includes tagged
news content (e.g., news articles) and social context information (e.g., relevant social user
interactions for news articles). Nevertheless, this contextual information may not always
be available in the real world. Thus, we solely relied on content-based features (i.e., news
article text). The statistics of the dataset are tabulated in Table 7.
4.6.3. COVID-19
The goal of the Constraint@AAAI2021 COVID19 fake news detection challenge [118]
was to build a model that has the potential to identify whether a message about COVID-19
is fake or real news. This dataset is a collection of COVID-19-related social media posts,
comments and news, classified as real or fake based on their truthfulness. The dataset [98]
was collected from various social media platforms, such as Twitter and YouTube. In
addition, the challenge organisers collected 10,700 social media posts and news articles
about COVID-19 in the form of an annotated dataset in English. As the dataset was
separated in advance by the task organisers into training, validation and testing sets, we
opted to evaluate our models using the original split. The statistics of the dataset are shown
in Table 8.
Table 9. Fake news detection using classical ML classification models on LIAR dataset.
Dataset
LIAR
Model Feature A (%) P (%) R (%) F1 (%)
LR CV 0.6196 0.6530 0.6933 0.6726
SVM CV 0.6346 0.6441 0.7857 0.7079
MNB CV 0.6243 0.6530 0.7115 0.6810
DT CV 0.5730. 0.6137 0.6541 0.6332
RF CV 0.6275 0.6360 0.7927 0.7057
XGB CV 0.6093 0.6085 0.8599 0.7127
Ensemble CV 0.6306 0.6504 0.7451 0.6945
LR TFIDF 0.6354 0.6493 0.7675 0.8655
SVM TFIDF 0.6101 0.6421 0.6961 0.6680
MNB TFIDF 0.6093 0.6311 0.7381 0.6804
DT TFIDF 0.5414 0.5917 0.6008 0.5962
RF TFIDF 0.5659 0.5658 0.9874 0.7194
XGB TFIDF 0.6093 0.6121 0.8375 0.7073
Ensemble TFIDF 0.6393 0.6503 0.7787 0.7087
LR Word2Vec 0.6361 0.6462 0.7829 0.7080
SVM Word2Vec 0.6314 0.6438 0.7745 0.7031
MNB Word2Vec 0.5762 0.5778 0.9202 0.7099
DT Word2Vec 0.5493 0.6032 0.5854 0.5942
RF Word2Vec 0.5675 0.5707 0.9384 0.7097
XGB Word2Vec 0.6156 0.6300 0.7703 0.6931
Ensemble Word2Vec 0.6267 0.6355 0.7913 0.7049
LR GloVe 0.6172 0.6265 0.7941 0.7004
SVM GloVe 0.6212 0.6306 0.7913 0.7019
MN GloVe 0.5635 0.5635 0.5635 0.7208
DT GloVe 0.5178 0.5724 0.5700 0.5712
RF GloVe 0.6006 0.6042 0.8445 0.7044
XGB GloVe 0.5943 0.6109 0.7717 0.6819
Ensemble GloVe 0.5983 0.6099 0.7969 0.6910
Table 10. Fake news detection using advanced ML and DL classification models on LIAR dataset.
Dataset
LIAR
Model Feature A (%) P (%) R (%) F1 (%)
CNN Word2Vec 0.5825 0.6219 0.6611 0.6409
BiLSTM Word2Vec 0.5533 0.6028 0.6078 0.6053
BiGRU Word2Vec 0.5572 0.6073 0.6064 0.6069
CNN-LSTM Word2Vec 0.5635 0.5635 1.0000 0.7208
CNN-GRU Word2Vec 0.5627 0.5632 0.9986 0.7202
CNN-BiLSTM Word2Vec 0.5762 0.6281 0.6078 0.6178
CNN-BiGRU Word2Vec 0.5714 0.6067 0.6807 0.6416
Hybrid Word2Vec 0.5785 0.6175 0.6625 0.6392
CNN GloVe 0.6172 0.6388 0.7381 0.6849
BiLSTM GloVe 0.5888 0.6442 0.6036 0.6233
BiGRU GloVe 0.5927 0.6371 0.6443 0.6407
CNN-LSTM GloVe 0.5635 0.5635 1.0000 0.7208
CNN-GRU GloVe 0.5635 0.5635 1.0000 0.7208
CNN-BiLSTM GloVe 0.5912 0.6365 0.6401 0.6383
CNN-BiGRU GloVe 0.6014 0.6240 0.7367 0.6757
Hybrid GloVe 0.5809 0.6332 0.6092 0.6210
CNN BERTbase 0.5975 0.6555 0.6022 0.6277
BiLSTM BERTbase 0.6204 0.6392 0.7493 0.6899
BiGRU BERTbase 0.6180 0.6386 0.7423 0.6865
CNN-LSTM BERTbase 0.6306 0.6602 0.7101 0.6842
CNN-GRU BERTbase 0.6077 0.6543 0.6443 0.6493
CNN-BiLSTM BERTbase 0.5793 0.6100 0.7031 0.6532
CNN-BiGRU BERTbase 0.6140 0.6313 0.7577 0.6887
Hybrid BERTbase 0.5912 0.6195 0.7115 0.6623
BERTbase BERTbase 0.6306 0.6662 0.6905 0.6781
RoBERTabase RoBERTabase 0.6117 0.6468 0.6849 0.6653
Table 11. Fake news detection using classical ML classification models on FakeNewsNet dataset.
Datasets
PolitiFact GossipCop
Model Feature A (%) P (%) R (%) F1 (%) A (%) P (%) R (%) F1 (%)
LR CV 0.8311 0.7374 0.9771 0.8405 0.8375 0.8644 0.9333 0.8976
SVM CV 0.8330 0.7248 0.9978 0.8397 0.8183 0.8136 0.9881 0.8924
MNB CV 0.7220 0.5393 1.0000 0.7007 0.8335 0.8417 0.9628 0.8982
DT CV 0.7419 0.8050 0.7758 0.7901 0.7875 0.8607 0.8607 0.8607
RF CV 0.8131 0.6918 0.9977 0.8171 0.8536 0.8505 0.9803 0.9108
XGB CV 0.8548 0.7720 0.9840 0.8652 0.8429 0.8442 0.9737 0.9043
Ensemble CV 0.8993 1.0000 0.8250 0.9041 0.8528 0.8521 0.9765 0.9100
LR TFIDF 0.8567 0.9979 0.7642 0.8655 0.8587 0.8605 0.9723 0.9130
SVM TFIDF 0.8416 0.9855 0.7484 0.8508 0.8635 0.8667 0.9702 0.9155
MNB TFIDF 0.7524 0.9973 0.5912 0.7423 0.8212 0.8574 0.9182 0.8868
DT TFIDF 0.6841 0.8272 0.6022 0.6970 0.7861 0.8612 0.8578 0.8595
RF TFIDF 0.7960 1.000 0.6619 0.7966 0.8496 0.8435 0.9856 0.9090
XGB TFIDF 0.8748 0.9846 0.8050 0.8858 0.8410 0.8445 0.9702 0.9030
Ensemble TFIDF 0.8777 0.9565 0.8250 0.8859 0.8611 0.8650 0.9691 0.9141
LR Word2Vec 0.7884 0.6525 0.9952 0.7882 0.8383 0.8421 0.9698 0.9015
SVM Word2Vec 0.7922 0.6651 0.9860 0.7944 0.8367 0.8325 0.9839 0.9019
MNB Word2Vec 0.6575 0.9791 0.4418 0.6089 0.7626 0.7626 1.0000 0.8653
DT Word2Vec 0.6907 0.5393 0.9122 0.6779 0.7578 0.8529 0.8245 0.8385
RF Word2Vec 0.6907 0.4984 0.9784 0.6604 0.8394 0.8388 0.9772 0.9027
XGB Word2Vec 0.6983 0.5047 0.9907 0.6688 0.8332 0.8406 0.9642 0.8982
Ensemble Word2Vec 0.8849 0.9324 0.8625 0.8961 0.8410 0.8395 0.9786 0.9037
LR GloVe 0.6992 0.9819 0.5110 0.6722 0.7628 0.7632 0.9989 0.8653
SVM GloVe 0.7239 0.9859 0.5503 0.7064 0.7626 0.7626 1.0000 0.8653
MN GloVe 0.6575 0.9758 0.4434 0.6097 0.7626 0.7626 1.0000 0.8653
DT GloVe 0.6262 0.8151 0.4921 0.6137 0.7230 0.8265 0.8059 0.8161
RF GloVe 0.6546 0.9789 0.4371 0.6043 0.8236 0.8228 0.9796 0.8944
XGB GloVe 0.7106 0.9853 0.5283 0.6878 0.8062 0.8137 0.9674 0.8839
Ensemble GloVe 0.7986 0.9194 0.7125 0.8028 0.7944 0.7912 0.9923 0.8804
Table 12. Fake news detection using advanced ML and DL classification models on FakeNewsNet dataset.
Datasets
PolitiFact GossipCop
Model Feature A (%) P (%) R (%) F1 (%) A (%) P (%) R (%) F1 (%)
CNN Word2Vec 0.6475 0.9189 0.4250 0.5812 0.8415 0.8566 0.9516 0.9016
BiLSTM Word2Vec 0.7770 0.9153 0.6750 0.7770 0.8239 0.8702 0.9038 0.8867
BiGRU Word2Vec 0.7554 0.8194 0.7375 0.7763 0.8263 0.8616 0.9200 0.8898
CNN-LSTM Word2Vec 0.7266 0.7059 0.9000 0.7912 0.8303 0.8700 0.9140 0.8915
CNN-GRU Word2Vec 0.6691 0.6848 0.7875 0.7326 0.8266 0.8694 0.9091 0.8888
CNN-BiLSTM Word2Vec 0.7050 0.7532 0.7250 0.7389 0.8260 0.8683 0.9098 0.8886
CNN-BiGRU Word2Vec 0.7338 0.7417 0.8250 0.7811 0.8097 0.8791 0.8701 0.8746
Hybrid Word2Vec 0.7050 0.8197 0.6250 0.7092 0.8298 0.8639 0.9221 0.8920
CNN GloVe 0.8561 0.8750 0.8750 0.8750 0.8097 0.8968 0.8480 0.8717
BiLSTM GloVe 0.8129 0.8971 0.7625 0.8243 0.8284 0.8548 0.9337 0.8925
BiGRU GloVe 0.8201 0.8571 0.8250 0.8408 0.8271 0.8580 0.9266 0.8910
CNN-LSTM GloVe 0.7554 0.8382 0.7125 0.7703 0.8185 0.8709 0.8947 0.8826
CNN-GRU GloVe 0.7194 0.7356 0.8000 0.7665 0.8137 0.8688 0.8901 0.8793
CNN-BiLSTM GloVe 0.7986 0.8611 0.7750 0.8158 0.8290 0.8590 0.9280 0.8922
CNN-BiGRU GloVe 0.8129 0.8375 0.8375 0.8375 0.8180 0.8675 0.8986 0.8828
Hybrid GloVe 0.8085 0.8354 0.8250 0.8302 0.8349 0.8598 0.9361 0.8963
CNN BERTbase 0.8993 0.9342 0.8875 0.9103 0.8616 0.8684 0.9702 0.9145
BiLSTM BERTbase 0.9065 0.9589 0.8750 0.9150 0.8391 0.8769 0.9179 0.8969
BiGRU BERTbase 0.9065 0.9136 0.9250 0.9193 0.8533 0.8754 0.9417 0.9073
CNN-LSTM BERTbase 0.8561 0.8750 0.8750 0.8750 0.8418 0.8729 0.9277 0.8994
CNN-GRU BERTbase 0.8417 0.8537 0.8750 0.8642 0.8514 0.8784 0.9396 0.9061
CNN-BiLSTM BERTbase 0.8993 0.9125 0.9125 0.9125 0.8370 0.8756 0.9165 0.8956
CNN-BiGRU BERTbase 0.8777 0.9437 0.8375 0.8874 0.8322 0.8915 0.8880 0.8897
Hybrid BERTbase 0.9209 0.9600 0.9000 0.9290 0.8437 0.8719 0.9319 0.9009
BERTbase BERTbase 0.9137 0.9359 0.9125 0.9241 0.8544 0.8724 0.9477 0.9085
RoBERTabase RoBERTabase 0.9209 0.9259 0.9375 0.9317 0.8383 0.8794 0.9133 0.8960
Information 2022, 13, 576 22 of 28
Table 13. Fake news detection using classical ML classification models on COVID-19 dataset.
Dataset
COVID-19
Model Feature A (%) P (%) R (%) F1 (%)
LR CV 0.9313 0.9363 0.9321 0.9342
SVM CV 0.9318 0.9333 0.9366 0.9349
MNB CV 0.9051 0.9127 0.9054 0.9090
DT CV 0.8874 0.8921 0.8929 0.8925
RF CV 0.9271 0.9221 0.9402 0.9310
XGB CV 0.8944 0.8942 0.9054 0.8997
Ensemble CV 0.9327 0.9444 0.9259 0.9351
LR TFIDF 0.9294 0.9209 0.9464 0.9335
SVM TFIDF 0.9430 0.9354 0.9571 0.9462
MNB TFIDF 0.9187 0.9085 0.9393 0.9236
DT TFIDF 0.8696 0.8778 0.8723 0.8751
RF TFIDF 0.8654 0.8467 0.9071 0.8759
XGB TFIDF 0.8883 0.8930 0.8938 0.8934
Ensemble TFIDF 0.9360 0.9408 0.9366 0.9387
LR Word2Vec 0.9028 0.9050 0.9098 0.9074
SVM Word2Vec 0.9051 0.9018 0.9187 0.9102
MNB Word2Vec 0.8388 0.8083 0.9071 0.8549
DT Word2Vec 0.7916 0.7993 0.8036 0.8014
RF Word2Vec 0.8734 0.8794 0.8786 0.8790
XGB Word2Vec 0.8916 0.8943 0.8991 0.8967
Ensemble Word2Vec 0.9023 0.9175 0.8938 0.9055
LR GloVe 0.8182 0.8170 0.8411 0.8289
SVM GloVe 0.8173 0.8145 0.8429 0.8284
MN GloVe 0.6486 0.6024 0.6961 0.7421
DT GloVe 0.7379 0.7450 0.7589 0.7519
RF GloVe 0.8280 0.8144 0.8696 0.8411
XGB GloVe 0.8416 0.8381 0.8643 0.8510
Ensemble GloVe 0.8449 0.8493 0.8554 0.8523
Table 14. Fake news detection using advanced ML and DL classification models on COVID-19 dataset.
Dataset
COVID-19
Model Feature A (%) P (%) R (%) F1 (%)
CNN Word2Vec 0.9187 0.9022 0.9473 0.9242
BiLSTM Word2Vec 0.9257 0.9309 0.9268 0.9289
BiGRU Word2Vec 0.9294 0.9246 0.9420 0.9332
CNN-LSTM Word2Vec 0.9290 0.9291 0.9357 0.9324
CNN-GRU Word2Vec 0.9332 0.9342 0.9384 0.9363
CNN-BiLSTM Word2Vec 0.9304 0.9362 0.9304 0.9333
CNN-BiGRU Word2Vec 0.9070 0.9656 0.8527 0.9056
Hybrid Word2Vec 0.9294 0.9145 0.9545 0.9340
CNN GloVe 0.9542 0.9554 0.9571 0.9563
BiLSTM GloVe 0.9355 0.9161 0.9652 0.9400
BiGRU GloVe 0.9421 0.9220 0.9714 0.9461
CNN-LSTM GloVe 0.8593 0.7991 0.9768 0.8791
CNN-GRU GloVe 0.5234 0.5234 0.9991 0.6869
CNN-BiLSTM GloVe 0.9477 0.9308 0.9723 0.9511
CNN-BiGRU GloVe 0.9472 0.9382 0.9625 0.9502
Hybrid GloVe 0.9374 0.9143 0.9714 0.9420
CNN BERTbase 0.9752 0.9725 0.9804 0.9764
BiLSTM BERTbase 0.9650 0.9485 0.9866 0.9672
BiGRU BERTbase 0.9706 0.9616 0.9830 0.9722
CNN-LSTM BERTbase 0.9598 0.9434 0.9821 0.9624
CNN-GRU BERTbase 0.9626 0.9710 0.9571 0.9640
CNN-BiLSTM BERTbase 0.9621 0.9529 0.9759 0.9643
CNN-BiGRU BERTbase 0.9565 0.9485 0.9696 0.9589
Hybrid BERTbase 0.9617 0.9642 0.9625 0.9634
BERTbase BERTbase 0.9771 0.9735 0.9830 0.9782
RoBERTabase RoBERTabase 0.9668 0.9541 0.9839 0.9688
Information 2022, 13, 576 23 of 28
6. Discussion
In this section, we will have an overall discussion of the experimental results in the
previous subsections. The experiments were carried out using a set of datasets commonly
used in fake news detection by applying an assortment of classical and advanced ML clas-
sifiers using different word representation methods as well as different transformer-based
architectures. The results of the models on the LIAR dataset are tabulated in Tables 9 and 10.
The TF-IDF-based method for extracting text features performed better, with the ensemble
approach exhibiting the highest accuracy while the CNN-LSTM/CNN-GRU hybrid model
with GloVe embeddings had the highest F1 score. Despite their ability to capture automatic
features, advanced ML models may produce unsatisfactory results when used individually
at this level of analysis, where hybrid algorithms generally perform better. For instance,
when a CNN and LSTM were combined with BERTbase , the classification accuracy was
comparable to that of the best-performing model. As such, we agree with [80] that combin-
ing different models based on CNNs and recurrence-based models helps uncover useful
patterns to discriminate fake from real content, where a model based on this combination
would not only be able to extract the most relevant and salient local features but also
capture long-term dependencies. Additionally, BERTbase alone achieved an accuracy result
that was on par with the best-performing models, demonstrating the power of such a
model in capturing useful clues from a short text sample. Moreover, we observed that
the CNN network with context-free embedding models beat the recurrence-based models,
whereas the opposite was true for the embedding models that took context into account.
Using the FakeNewsNet dataset, as shown in Tables 11 and 12, we found that the
transformer-based DL architectures outperformed the state-of-the-art approaches on the
PolitiFact dataset. In contrast, classical ML models, such as SVM with TF-IDF features,
beat the DL models on the GossipCop dataset, including the official baselines. On the
COVID-19 dataset (see Tables 13 and 14), BERTbase was ranked as the top-performing model,
and CNN-BERTbase was the second-best model. The results show that transformer-based
models are powerful in capturing the intricacies of such (noisy and sparse) datasets.
7. Conclusions
This paper presents an overview of fake news detection approaches found in the
literature and compares seven classical ML algorithms, namely LR, SVM, NB, DT, RF, XGB
and an ensemble that combines all such algorithms, with two scenarios of word represen-
tation methods: statistical (sparse) word vector representation methods and context-free
(dense) pretrained word representation models. In addition, this paper compares eight
advanced ML models, namely the CNN, BiLSTM, BiGRU, CNN-BiLSTM, CNN-BiGRU
and different hybrid models with two types of text-representation models—context-free
and context-aware embedding models—and two advanced pretrained transformer-based
models: BERTbase and RoBERTabase . We found that the ensemble of the classical ML meth-
ods with TF-IDF features outperformed the other methods on the LIAR dataset, including
advanced ML models. As the LIAR dataset contains short political statements, it is difficult
to obtain useful clues that could help discriminate fake news from real news. However,
BERTbase alone was able to achieve an accuracy result that was on par with the best-
performing models, demonstrating the power of such a model in capturing useful clues
from a short piece of text. RoBERTabase performed the best on the PolitiFact dataset with
an F1 score of 93.17. In contrast, classical ML methods, such as SVM with TF-IDF features,
beat the DL models on the GossipCop dataset, including the official baselines. For the
COVID-19 dataset, the best-performing model was BERTbase . By relying only on news text
with minimal text preprocessing, we achieved better results than the state-of-the-art results
across the used datasets. It may be possible to improve the performance if context-based
information, in addition to news text, is considered and if other factors, such as style
and sentiment, are included. Generally, the experimental results indicate that no single
technique can deliver the best performance scores across all used datasets.
Information 2022, 13, 576 24 of 28
To this end, we hope our comparative study will contribute to future research in this
area and identify the best model for detecting fake news.
Author Contributions: Conceptualisation, experiments and formal analysis, J.A.; paper writing, J.A.;
writing—review and editing, Y.L.; supervision, S.L. and Y.L. All authors have read and agreed to the
published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The LIAR data is available from [69], FakeNewsNet data is available
from https://github.com/KaiDMML/FakeNewsNet (accessed on 20 March 2022) and COVID-19
data is available from [118].
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Bondielli, A.; Marcelloni, F. A survey on fake news and rumour detection techniques. Inf. Sci. 2019, 497, 38–55. [CrossRef]
2. Kwak, H.; Lee, C.; Park, H.; Moon, S. What is Twitter, a Social Network or a News Media? In Proceedings of the 19th International
Conference on World Wide Web; Association for Computing Machinery: New York, NY, USA, 2010; pp. 591–600. [CrossRef]
3. Allcott, H.; Gentzkow, M. Social media and fake news in the 2016 election. J. Econ. Perspect. 2017, 31, 211–236. [CrossRef]
4. Shu, K.; Sliva, A.; Wang, S.; Tang, J.; Liu, H. Fake News Detection on Social Media: A Data Mining Perspective. arXiv 2017,
arXiv:1708.01967.
5. Trends, G. “Fake News—Explore—Google Trends”. Available online: https://trends.google.com/trends/explore?date=2010-01-
01%202022-07-14&q=%22fake%20news%22 (accessed on 20 July 2022).
6. Langin, K. Fake news spreads faster than true news on Twitter—Thanks to people, not bots. Sci. Mag. 2018. Available online:
https://www.science.org/content/article/fake-news-spreads-faster-true-news-twitter-thanks-people-not-bots (accessed on 20
February 2022).
7. Zubiaga, A.; Aker, A.; Bontcheva, K.; Liakata, M.; Procter, R. Detection and Resolution of Rumours in Social Media: A Survey.
ACM Comput. Surv. 2018, 51, 1–36. [CrossRef]
8. Zhao, Z.; Resnick, P.; Mei, Q. Enquiring Minds: Early Detection of Rumors in Social Media from Enquiry Posts. In Proceedings of
the 24th International Conference on World Wide Web; International World Wide Web Conferences Steering Committee: Geneva,
Switzerland, 2015; pp. 1395–1405. [CrossRef]
9. Vosoughi, S.; Roy, D.; Aral, S. The spread of true and false news online. Science 2018, 359, 1146–1151. [CrossRef]
10. Kumar, S.; West, R.; Leskovec, J. Disinformation on the web: Impact, characteristics, and detection of wikipedia hoaxes. In
Proceedings of the 25th International Conference on World Wide Web, Montreal, QC, Canada, 11–15 April 2016; pp. 591–602.
11. Friggeri, A.; Adamic, L.; Eckles, D.; Cheng, J. Rumor cascades. In Proceedings of the International AAAI Conference on Web and
Social Media, Ann Arbor, MI, USA, 1–4 June 2014; Volume 8.
12. Zhou, X.; Zafarani, R. A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities. ACM Comput.
Surv. 2020, 53, 1–40. [CrossRef]
13. Conroy, N.K.; Rubin, V.L.; Chen, Y. Automatic deception detection: Methods for finding fake news. Proc. Assoc. Inf. Sci. Technol.
2015, 52, 1–4. [CrossRef]
14. Zhou, X.; Jain, A.; Phoha, V.V.; Zafarani, R. Fake News Early Detection: An Interdisciplinary Study. arXiv 2019, arXiv:1904.11679.
15. Feng, S.; Banerjee, R.; Choi, Y. Syntactic Stylometry for Deception Detection. In Proceedings of the 50th Annual Meeting of the
Association for Computational Linguistics (Volume 2: Short Papers); Association for Computational Linguistics: Jeju Island, Republic
of Korea, 2012; pp. 171–175.
16. Karimi, H.; Tang, J. Learning hierarchical discourse-level structure for fake news detection. arXiv 2019, arXiv:1903.07389.
17. Pisarevskaya, D. Deception detection in news reports in the russian language: Lexics and discourse. In Proceedings of the 2017
EMNLP Workshop: Natural Language Processing Meets Journalism, Copenhagen, Denmark, 7 September 2017; pp. 74–79.
18. Pérez-Rosas, V.; Kleinberg, B.; Lefevre, A.; Mihalcea, R. Automatic detection of fake news. arXiv 2017, arXiv:1708.07104.
19. Chen, Y.; Conroy, N.J.; Rubin, V.L. Misleading Online Content: Recognizing Clickbait as “False News”. In Proceedings of the 2015
ACM on Workshop on Multimodal Deception Detection; Association for Computing Machinery: New York, NY, USA, 2015; pp. 15–19.
[CrossRef]
20. Potthast, M.; Kiesel, J.; Reinartz, K.; Bevendorff, J.; Stein, B. A stylometric inquiry into hyperpartisan and fake news. arXiv 2017,
arXiv:1702.05638.
21. Fuller, C.M.; Biros, D.P.; Wilson, R.L. Decision support for determining veracity via linguistic-based cues. Decis. Support Syst.
2009, 46, 695–703. [CrossRef]
Information 2022, 13, 576 25 of 28
22. Zhou, L.; Burgoon, J.K.; Nunamaker, J.F.; Twitchell, D. Automating linguistics-based cues for detecting deception in text-based
asynchronous computer-mediated communications. Group Decis. Negot. 2004, 13, 81–106. [CrossRef]
23. Pennebaker, J.W.; Francis, M.E.; Booth, R.J. Linguistic inquiry and word count: LIWC 2001. Mahway Lawrence Erlbaum Assoc.
2001, 71, 2001.
24. Riedel, B.; Augenstein, I.; Spithourakis, G.P.; Riedel, S. A simple but tough-to-beat baseline for the Fake News Challenge stance
detection task. arXiv 2018, arXiv:cs.CL/1707.03264.
25. Ahmed, H.; Traore, I.; Saad, S. Detection of online fake news using n-gram analysis and machine learning techniques. In
Proceedings of the International Conference on Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments;
Springer: Berlin/Heidelberg, Germany, 2017; pp. 127–138.
26. Bharadwaj, P.; Shao, Z. Fake news detection with semantic features and text mining. Int. J. Nat. Lang. Comput. (IJNLC) 2019, 8,
17–22. [CrossRef]
27. Wynne, H.E.; Wint, Z.Z. Content based fake news detection using n-gram models. In Proceedings of the 21st International Con-
ference on Information Integration and Web-Based Applications & Services, Munich, Germany, 2–4 December 2019; pp. 669–673.
28. Gravanis, G.; Vakali, A.; Diamantaras, K.; Karadais, P. Behind the cues: A benchmarking study for fake news detection. Expert
Syst. Appl. 2019, 128, 201–213. [CrossRef]
29. Burgoon, J.K.; Blair, J.P.; Qin, T.; Nunamaker, J.F. Detecting deception through linguistic analysis. In Proceedings of the International
Conference on Intelligence and Security Informatics; Springer: Berlin/Heidelberg, Germany, 2003; pp. 91–101.
30. Hancock, J.T.; Curry, L.E.; Goorha, S.; Woodworth, M. On lying and being lied to: A linguistic analysis of deception in
computer-mediated communication. Discourse Process. 2007, 45, 1–23. [CrossRef]
31. Newman, M.L.; Pennebaker, J.W.; Berry, D.S.; Richards, J.M. Lying Words: Predicting Deception from Linguistic Styles. Personal.
Soc. Psychol. Bull. 2003, 29, 665–675. [CrossRef]
32. Tausczik, Y.R.; Pennebaker, J.W. The psychological meaning of words: LIWC and computerized text analysis methods. J. Lang.
Soc. Psychol. 2010, 29, 24–54. [CrossRef]
33. Vrij, A. Detecting Lies and Deceit: The Psychology of Lying and Implications for Professional Practice; Wiley: Hoboken, NJ, USA, 2000.
34. Knapp, M.L.; Hart, R.P.; Dennis, H.S. An exploration of deception as a communication construct. Hum. Commun. Res. 1974,
1, 15–29. [CrossRef]
35. Horne, B.; Adali, S. This just in: Fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire
than real news. In Proceedings of the International AAAI Conference on Web and Social Media, Montreal, QC, Canada, 15–18
May 2017; Volume 11, pp. 759–766.
36. Papadopoulou, O.; Zampoglou, M.; Papadopoulos, S.; Kompatsiaris, I. A two-level classification approach for detecting clickbait
posts using text-based features. arXiv 2017, arXiv:1710.08528.
37. Rubin, V.L.; Conroy, N.; Chen, Y.; Cornwell, S. Fake news or truth? using satirical cues to detect potentially misleading news. In
Proceedings of the Second Workshop on Computational Approaches to Deception Detection, San Diego, CA, USA, 17 June 2016;
pp. 7–17.
38. Chowdhury, G.G. Natural language processing. Annu. Rev. Inf. Sci. Technol. 2003, 37, 51–89. [CrossRef]
39. Qazvinian, V.; Rosengren, E.; Radev, D.R.; Mei, Q. Rumor has it: Identifying Misinformation in Microblogs. In Proceedings of the
2011 Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics: Edinburgh, UK,
2011; pp. 1589–1599.
40. Castillo, C.; Mendoza, M.; Poblete, B. Information Credibility on Twitter. In Proceedings of the 20th International Conference on
World Wide Web; Association for Computing Machinery: New York, NY, USA, 2011; pp. 675–684. [CrossRef]
41. Hamidian, S.; Diab, M.T. Rumor Detection and Classification for Twitter Data. arXiv 2019, arXiv:1912.08926.
42. Yang, F.; Liu, Y.; Yu, X.; Yang, M. Automatic Detection of Rumor on Sina Weibo. In Proceedings of the ACM SIGKDD Workshop on
Mining Data Semantics; Association for Computing Machinery: New York, NY, USA, 2012. [CrossRef]
43. Ajao, O.; Bhowmik, D.; Zargari, S. Sentiment Aware Fake News Detection on Online Social Networks. In Proceedings of the
ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17
May 2019; pp. 2507–2511.
44. Ratkiewicz, J.; Conover, M.; Meiss, M.; Gonçalves, B.; Flammini, A.; Menczer, F. Detecting and tracking political abuse in social
media. In Proceedings of the International AAAI Conference on Web and Social Media, Barcelona, Spain, 17–21 July 2011;
Volume 5.
45. Samonte, M.J.C. Polarity analysis of editorial articles towards fake news detection. In Proceedings of the 2018 International
Conference on Internet and e-Business, Singapore, 25–27 April 2018; pp. 108–112.
46. Volkova, S.; Shaffer, K.; Jang, J.Y.; Hodas, N. Separating facts from fiction: Linguistic models to classify suspicious and trusted
news posts on twitter. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2:
Short Papers), Vancouver, BC, Canada, 30 July–4 August 2017; pp. 647–653.
47. Bhelande, M.; Sanadhya, A.; Purao, M.; Waldia, A.; Yadav, V. Identifying controversial news using sentiment analysis. Imp. J.
Interdiscip. Res. 2017, 3. Available online: https://www.semanticscholar.org/paper/Identifying-Controversial-News-using-
Sentiment-Bhelande-Sanadhya/23862325ff7b53e7851cd4398553d82cbca483f4 (accessed on 22 May 2022).
48. Qin, Y.; Wurzer, D.; Lavrenko, V.; Tang, C. Spotting Rumors via Novelty Detection. arXiv 2016, arXiv:1611.06322.
49. Kwon, S.; Cha, M.; Jung, K. Rumor Detection over Varying Time Windows. PLoS ONE 2017, 12, e0168344. [CrossRef]
Information 2022, 13, 576 26 of 28
50. Wei, W.; Wan, X. Learning to identify ambiguous and misleading news headlines. arXiv 2017, arXiv:1705.06031.
51. Chakraborty, A.; Paranjape, B.; Kakarla, S.; Ganguly, N. Stop Clickbait: Detecting and Preventing Clickbaits in Online News
Media. arXiv 2016, arXiv:1610.09786.
52. Feng, V.W.; Hirst, G. Detecting Deceptive Opinions with Profile Compatibility. In Proceedings of the Sixth International Joint
Conference on Natural Language Processing; Asian Federation of Natural Language Processing: Nagoya, Japan, 2013; pp. 338–346.
53. Potthast, M.; Köpsel, S.; Stein, B.; Hagen, M. Clickbait Detection. In Proceedings of the ECIR, Padua, Italy, 20–23 March 2016.
54. Gupta, A.; Kumaraguru, P.; Castillo, C.; Meier, P. Tweetcred: Real-time credibility assessment of content on twitter. In Proceedings
of the International Conference on Social Informatics; Springer: Berlin/Heidelberg, Germany, 2014; pp. 228–243.
55. Bhattacharjee, S.D.; Talukder, A.; Balantrapu, B.V. Active learning based news veracity detection with feature weighting and
deep-shallow fusion. In Proceedings of the 2017 IEEE International Conference on Big Data (Big Data), Boston, MA, USA, 11–14
December 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 556–565.
56. Ribeiro, M.H.; Calais, P.H.; Almeida, V.A.; Meira, W., Jr. “Everything I Disagree With is# FakeNews”: Correlating Political
Polarization and Spread of Misinformation. arXiv 2017, arXiv:1706.05924.
57. Popat, K. Assessing the Credibility of Claims on the Web. In Proceedings of the 26th International Conference on World Wide Web
Companion; International World Wide Web Conferences Steering Committee: Geneva, Switzerland, 2017; pp. 735–739. [CrossRef]
58. Cardoso Durier da Silva, F.; Vieira, R.; Garcia, A.C. Can machines learn to detect fake news? A survey focused on social media.
In Proceedings of the 52nd Hawaii International Conference on System Sciences, Maui, HI, USA, 8–11 January 2019.
59. Markowitz, D.M.; Hancock, J.T. Linguistic Traces of a Scientific Fraud: The Case of Diederik Stapel. PLoS ONE 2014, 9, e105937.
[CrossRef] [PubMed]
60. Ruchansky, N.; Seo, S.; Liu, Y. CSI: A Hybrid Deep Model for Fake News Detection. In Proceedings of the 2017 ACM on Conference
on Information and Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2017; pp. 797–806.
61. Lahby, M.; Aqil, S.; Yafooz, W.M.S.; Abakarim, Y. Online Fake News Detection Using Machine Learning Techniques: A Systematic
Mapping Study. In Combating Fake News with Computational Intelligence Techniques; Lahby, M., Pathan, A.S.K., Maleh, Y., Yafooz,
W.M.S., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 3–37. [CrossRef]
62. Klyuev, V. Fake news filtering: Semantic approaches. In Proceedings of the 2018 7th International Conference on Reliability,
Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 29–31 August 2018; IEEE:
Piscataway, NJ, USA, 2018; pp. 9–15.
63. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013,
arXiv:1301.3781.
64. Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference
on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
65. Arora, S.; Liang, Y.; Ma, T. A simple but tough-to-beat baseline for sentence embeddings. In Proceedings of the International
Conference on Learning Representations, Toulon, France, 24–26 April 2017.
66. Le, Q.; Mikolov, T. Distributed representations of sentences and documents. In Proceedings of the International Conference on
Machine Learning, PMLR, Beijing, China, 21–26 June 2014; pp. 1188–1196.
67. Zubiaga, A.; Liakata, M.; Procter, R. Exploiting context for rumour detection in social media. In Proceedings of the International
Conference on Social Informatics; Springer: Berlin/Heidelberg, Germany, 2017; pp. 109–123.
68. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; Dean, J. Distributed Representations of Words and Phrases and Their
Compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems; Curran Associates
Inc.: Red Hook, NY, USA, 2013; Volume 2, pp. 3111–3119.
69. Wang, W.Y. “Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection. arXiv 2017, arXiv:1705.00648.
70. Qian, F.; Gong, C.; Sharma, K.; Liu, Y. Neural User Response Generator: Fake News Detection with Collective User Intelligence.
In Proceedings of the IJCAI, Stockholm, Sweden, 13–19 July 2018; Volume 18, pp. 3834–3840.
71. Goldani, M.H.; Momtazi, S.; Safabakhsh, R. Detecting fake news with capsule neural networks. Appl. Soft Comput. 2021,
101, 106991. [CrossRef]
72. Girgis, S.; Amer, E.; Gadallah, M. Deep learning algorithms for detecting fake news in online text. In Proceedings of the 2018 13th
International Conference on Computer Engineering and Systems (ICCES), Cairo, Egypt, 18–19 December 2018; IEEE: Piscataway,
NJ, USA, 2018; pp. 93–97.
73. Sarnovskỳ, M.; Maslej-Krešňáková, V.; Ivancová, K. Fake News Detection Related to the COVID-19 in Slovak Language Using
Deep Learning Methods. Acta Polytech. Hung. 2022, 19, 43–57. [CrossRef]
74. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding.
arXiv 2018, arXiv:1810.04805.
75. Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep contextualized word representations.
arXiv 2018, arXiv:1802.05365.
76. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training.
2018. Available online: https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf (accessed on 10
August 2022).
77. Müller, M.; Salathé, M.; Kummervold, P.E. Covid-twitter-bert: A natural language processing model to analyse COVID-19 content
on twitter. arXiv 2020, arXiv:2005.07503.
Information 2022, 13, 576 27 of 28
78. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A robustly
optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692.
79. Kula, S.; Choraś, M.; Kozik, R. Application of the bert-based architecture in fake news detection. In Proceedings of the Computational
Intelligence in Security for Information Systems Conference; Springer: Berlin/Heidelberg, Germany, 2019; pp. 239–249.
80. Alghamdi, J.; Lin, Y.; Luo, S. Modeling Fake News Detection Using BERT-CNN-BiLSTM Architecture. In Proceedings of the
2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR), Online, 2–4 August 2022;
pp. 354–357. [CrossRef]
81. Aggarwal, A.; Chauhan, A.; Kumar, D.; Mittal, M.; Verma, S. Classification of fake news by fine-tuning deep bidirectional
transformers based language model. EAI Endorsed Trans. Scalable Inf. Syst. 2020, 7, e10. [CrossRef]
82. Jwa, H.; Oh, D.; Park, K.; Kang, J.M.; Lim, H. exBAKE: Automatic fake news detection model based on bidirectional encoder
representations from transformers (bert). Appl. Sci. 2019, 9, 4062. [CrossRef]
83. Baruah, A.; Das, K.A.; Barbhuiya, F.A.; Dey, K. Automatic Detection of Fake News Spreaders Using BERT. In Proceedings of the
CLEF (Working Notes), Thessaloniki, Greece, 22–25 September 2020.
84. Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019,
arXiv:1910.01108.
85. Khan, J.Y.; Khondaker, M.T.I.; Afroz, S.; Uddin, G.; Iqbal, A. A benchmark study of machine learning models for online fake news
detection. Mach. Learn. Appl. 2021, 4, 100032. [CrossRef]
86. Elhadad, M.K.; Li, K.F.; Gebali, F. A Novel Approach for Selecting Hybrid Features from Online News Textual Metadata for
Fake News Detection. In Proceedings of the Advances on P2P, Parallel, Grid, Cloud and Internet Computing; Barolli, L., Hellinckx, P.,
Natwichai, J., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 914–925.
87. Shu, K.; Mahudeswaran, D.; Wang, S.; Lee, D.; Liu, H. FakeNewsNet: A Data Repository with News Content, Social Context and
Spatialtemporal Information for Studying Fake News on Social Media. arXiv 2018, arXiv:1809.01286.
88. Shu, K.; Cui, L.; Wang, S.; Lee, D.; Liu, H. DEFEND: Explainable Fake News Detection. In Proceedings of the 25th ACM SIGKDD
International Conference on Knowledge Discovery & Data Mining; Association for Computing Machinery: New York, NY, USA, 2019;
pp. 395–405. [CrossRef]
89. Koloski, B.; Perdih, T.S.; Robnik-Šikonja, M.; Pollak, S.; Škrlj, B. Knowledge graph informed fake news classification via
heterogeneous representation ensembles. Neurocomputing 2022, 496, 208–226. [CrossRef]
90. Oriola, O. Exploring N-gram, word embedding and topic models for content-based fake news detection in FakeNewsNet
evaluation. Int. J. Comput. Appl. 2021, 975, 8887. [CrossRef]
91. Sadeghi, F.; Jalaly Bidgoly, A.; Amirkhani, H. Fake News Detection on Social Media Using A Natural Language Inference
Approach. Multimed. Tools Appl. 2020. [CrossRef]
92. Gautam, A.; Venktesh, V.; Masud, S. Fake News Detection System using XLNet model with Topic Distributions: CON-
STRAINT@AAAI2021 Shared Task. arXiv 2021, arXiv:2101.11425. [CrossRef]
93. Shifath, S.M.S.U.R.; Khan, M.F.; Islam, M.S. A transformer based approach for fighting COVID-19 fake news. arXiv 2021,
arXiv:2101.12027. [CrossRef]
94. Veyseh, A.P.B.; Thai, M.T.; Nguyen, T.H.; Dou, D. Rumor detection in social networks via deep contextual modeling. In
Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Vancouver,
BC, Canada, 27–30 August 2019; pp. 113–120.
95. Wani, A.; Joshi, I.; Khandve, S.; Wagh, V.; Joshi, R. Evaluating Deep Learning Approaches for COVID-19 Fake News De-
tection. In Combating Online Hostile Posts in Regional Languages during Emergency Situation; Springer International Publishing:
Berlin/Heidelberg, Germany, 2021; pp. 153–163. [CrossRef]
96. Shushkevich, E.; Cardiff, J. TUDublin team at Constraint@ AAAI2021–COVID19 Fake News Detection. arXiv 2021,
arXiv:2101.05701.
97. Felber, T. Constraint 2021: Machine learning models for COVID-19 fake news detection shared task. arXiv 2021, arXiv:2101.03717.
98. Patwa, P.; Sharma, S.; Pykl, S.; Guptha, V.; Kumari, G.; Akhtar, M.S.; Ekbal, A.; Das, A.; Chakraborty, T. Fighting an infodemic:
COVID-19 fake news dataset. In Proceedings of the International Workshop on Combating Online Hostile Posts in Regional Languages
during Emergency Situation; Springer: Berlin/Heidelberg, Germany, 2021; pp. 21–29.
99. Goldberg, Y. Neural network methods for natural language processing. Synth. Lect. Hum. Lang. Technol. 2017, 10, 1–309.
100. Wallach, H.M. Topic modeling: Beyond bag-of-words. In Proceedings of the 23rd International Conference on Machine Learning,
Pittsburgh, PA, USA, 25–29 June 2006; pp. 977–984.
101. Damashek, M. Gauging similarity with n-grams: Language-independent categorization of text. Science 1995, 267, 843–848.
[CrossRef] [PubMed]
102. Joachims, T. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization. Technical Report,
Carnegie-Mellon univ Pittsburgh pa Dept of Computer Science. 1996. Available online: https://www.cs.cornell.edu/people/tj/
publications/joachims_97a.pdf (accessed on 11 January 2022).
103. McCallum, A.; Nigam, K. A comparison of event models for naive bayes text classification. In Proceedings of the AAAI-98 Workshop
on Learning for Text Categorization; Citeseer: State College, PA, USA, 1998; Volume 752, pp. 41–48.
104. Trstenjak, B.; Mikac, S.; Donko, D. KNN with TF-IDF based framework for text categorization. Procedia Eng. 2014, 69, 1356–1364.
[CrossRef]
Information 2022, 13, 576 28 of 28
105. Joachims, T. Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the
European Conference on Machine Learning; Springer: Berlin/Heidelberg, Germany, 1998; pp. 137–142.
106. Harris, Z.S. Distributional structure. Word 1954, 10, 146–162. [CrossRef]
107. Salton, G. MCGILL, Michael. In Introduction to Modern Information Retrieval; McGraw-Hill, Inc.: New York, NY, USA, 1986.
108. Schütze, H.; Manning, C.D.; Raghavan, P. Introduction to Information Retrieval; Cambridge University Press: Cambridge, UK, 2008;
Volume 39.
109. Mikolov, T.; Yih, W.t.; Zweig, G. Linguistic regularities in continuous space word representations. In Proceedings of the 2013
Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies,
Atlanta, GA, USA, 9–14 June 2013; pp. 746–751.
110. Howard, J.; Ruder, S. Universal language model fine-tuning for text classification. arXiv 2018, arXiv:1801.06146.
111. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
112. Tang, D.; Qin, B.; Liu, T. Document Modeling with Gated Recurrent Neural Network for Sentiment Classification. In Proceedings
of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1422–
1432.
113. Graves, A. Supervised sequence labelling. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer:
Berlin/Heidelberg, Germany, 2012; pp. 5–13.
114. Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations
using RNN Encoder-Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078.
115. Zhang, X.; Chen, F.; Huang, R. A combination of RNN and CNN for attention-based relation classification. Procedia Comput. Sci.
2018, 131, 911–917. [CrossRef]
116. Zhou, C.; Sun, C.; Liu, Z.; Lau, F. A C-LSTM neural network for text classification. arXiv 2015, arXiv:1511.08630.
117. Gururangan, S.; Marasović, A.; Swayamdipta, S.; Lo, K.; Beltagy, I.; Downey, D.; Smith, N.A. Don’t stop pretraining: Adapt
language models to domains and tasks. arXiv 2020, arXiv:2004.10964.
118. Patwa, P.; Bhardwaj, M.; Guptha, V.; Kumari, G.; Sharma, S.; Pykl, S.; Das, A.; Ekbal, A.; Akhtar, M.S.; Chakraborty, T. Overview
of constraint 2021 shared tasks: Detecting english COVID-19 fake news and hindi hostile posts. In Proceedings of the International
Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation; Springer: Berlin/Heidelberg, Germany,
2021; pp. 42–53.
119. Horne, L.; Matti, M.; Pourjafar, P.; Wang, Z. GRUBERT: A GRU-Based Method to Fuse BERT Hidden Layers for Twitter Sentiment
Analysis. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th
International Joint Conference on Natural Language Processing: Student Research Workshop; Association for Computational Linguistics:
Suzhou, China, 2020; pp. 130–138.