Analysis and Classification of Fake News Using Sequential Pattern Mining
Abstract: Disinformation, often known as fake news, is a major issue that has received a lot of attention lately. Many researchers have proposed effective means of detecting and addressing it. Current machine and deep learning based methodologies for the classification/detection of fake news are content-based, network (propagation) based, or multimodal methods that combine both textual and visual information. We introduce here a framework, called FNACSPM, based on sequential pattern mining (SPM), for fake news analysis and classification. In this framework, six publicly available datasets, containing a diverse range of fake and real news, and their combination, are first transformed into a proper format. Then, algorithms for SPM are applied to the transformed datasets to extract frequent patterns (and rules) of words, phrases, or linguistic features. The obtained patterns capture distinctive characteristics associated with fake or real news content, providing valuable insights into the underlying structures and commonalities of misinformation. Subsequently, the discovered frequent patterns are used as features for fake news classification. This framework is evaluated with eight classifiers, and their performance is assessed with various metrics. Extensive experiments were performed, and the obtained results show that FNACSPM outperformed other state-of-the-art approaches for fake news classification, and that it expedites the classification task with high accuracy.
Key words: disinformation; fake news; sequential pattern mining (SPM); frequent patterns; classification
© The author(s) 2024. The articles published in this open access journal are distributed under the terms of the
Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
M. Zohaib Nawaz et al.: Analysis and Classification of Fake News Using Sequential Pattern Mining 943
responds to true news and can undermine the credibility of the whole news ecosystem[7–9]. Thus, it is important to analyze and detect fake news on OSN and other platforms.

Many manual tools and websites for fact-checking (e.g., PolitiFact◎, FactCheck‡, Snopes§, and Fiskkit¶) are currently available for the analysis, evaluation, and recognition of fake news. However, the problem of fake news analysis and detection is far from being solved. It is now not possible to manually assess and verify every news item or piece of information due to the enormous amount of online data generated every minute, particularly on OSN platforms[1]. Moreover, determining the credibility of online news articles is difficult as fake news frequently contains wrong or false information mixed with certain facts[10]. In the last decade, computational approaches for fake news classification/detection have drawn a lot of interest. Fake news classification/detection methods, based on machine learning (ML) and deep learning (DL), can be broadly classified into two main groups: (1) content-based methods and (2) propagation-based methods[11–14]. Content-based methods detect fake news by analyzing the news content or information present in articles, either by relying on a knowledge-based system[15, 16] or by finding latent[13, 17] and non-latent (hand-crafted) features[16, 18] in the content. Knowledge-based fake news detection methods can only detect false news but not fake news[1, 11]. Non-latent features are style-based and self-defined at various language levels, and various embedding and encoding techniques are used for these features. Latent features are automatically generated by using matrix or tensor factorization, or DL techniques (for more details about non-latent and latent features, see Section 2). Selecting features or extracting non-latent features requires expertise, and some discovered linguistic clues might not be applicable to news or information. Latent features perform well, but they are difficult to comprehend. Moreover, content-based methods often face problems of computational efficiency, interpretability, scalability, and generalization because they are tested on limited datasets. As far as we are aware, no study has been published yet for fake news classification or detection based on pattern mining that focuses on a diverse set of datasets.

This study's two primary objectives are to (1) examine the application of sequential pattern mining (SPM)[19] for the reliable and accurate classification and detection of fake news from datasets in textual format, and (2) evaluate the SPM-based fake news classification approach on multiple datasets, and their combination, to get insights into its effectiveness and generalization ability across different data sources and characteristics. In the past, SPM has been used extensively in various applications such as tourist movement analysis[20], bioinformatics[21, 22], market basket analysis[23], text analysis[24], energy reduction in smart homes[25], malware analysis[26], proof sequence analysis[27], and webpage click-stream analysis[28]. However, no one has explored its applicability for fake news analysis and classification yet. Based on the analysis of online news and information contents, we present a new content-based framework, called fake news analysis and classification using sequential pattern mining (FNACSPM), that provides:

● One approach based on SPM to analyze news contents. Using this approach, the datasets are first transformed into an appropriate learning format. Second, SPM techniques are employed to find frequent sequential patterns in the transformed datasets. Additionally, frequent sequential rules among fake and real news are identified.

● One fake news detection approach that uses frequent patterns (FreqP), discovered by using SPM algorithms. These patterns are then utilized in the fake news classification process. For classification, eight classifiers are utilized, and comprehensive experiments are conducted by using various evaluation metrics to evaluate the effectiveness of the detection approach. The proposed framework is evaluated on six datasets, and their combination, for both binary and multi-class fake news classification. Obtained results indicate that using FNACSPM to identify frequent sequential patterns in news and using these patterns yields improved classification results as compared to using all the news. It is also observed that logistic regression (LR) performed well, overall, for both types of classification. Using all the news in the classification process provided less accurate results and took more time. FNACSPM also outperformed state-of-the-art approaches for fake news classification/detection. By

◎https://www.politifact.com/
‡https://www.factcheck.org/
§https://www.snopes.com/
¶https://fiskkit.com/
944 Big Data Mining and Analytics, September 2024, 7(3): 942−963
utilizing frequent patterns, this study offers valuable insights into the linguistic and semantic structures present in fake news. This aids in a deeper and better understanding of the characteristics and commonalities of misinformation, potentially assisting in the development of faster and more reliable strategies and models for detection.

The rest of the paper contains five sections. Section 2 examines the previous research on the analysis and classification of fake news by using ML and DL. Section 3 provides the details for the six datasets that are used in this study. FNACSPM is presented in Section 4, which offers approaches for fake news analysis and classification. The experimental results and the comparison of FNACSPM with recent fake news classification/detection approaches are presented in Section 5. Finally, the paper is concluded with some remarks in Section 6.

2 Related Work

The two main categories of fake news detection techniques are content-based and propagation (or social context) based. Content-based approaches for fake news detection evaluate online news/information by examining textual information, visual information, or both. Content-based approaches use three common textual representations to analyze news: knowledge, style information (non-latent or general), and latent information[13, 15–18]. Propagation- or network-based methods analyze and identify fake news by investigating how news/information spreads over social networks. As this second category of propagation-based techniques is not relevant to this work, it is not discussed further.

The first representation, knowledge, is a set of subject, predicate, object (SPO) tuples that is obtained from the text of online news. To identify fake news, knowledge-based methods assess the news authenticity by evaluating the knowledge discovered in news content that needs verification. One way to identify true knowledge is by comparing the obtained SPO from a news article with a knowledge graph (KG)[15, 16]. Generally, knowledge-based systems assess the credibility of a given news item, but they also face challenges related to the authenticity of the source(s) from which the KG is constructed. For fact-checking online news, it is necessary not only to identify parts of the news that are worth checking but also to have or create a KG that has all the possible “valuable” information and facts[1].

Style-based approaches for fake news detection, as opposed to knowledge-based systems, examine the news contents. To differentiate fake news from the truth, these methods use various general self-defined (non-latent) features that represent the writing style of online news. Non-latent features describe the style of the news (or content) at four language levels: (1) lexicon[11, 16, 17], (2) syntax[11, 29], (3) discourse[30, 31], and (4) semantics[18]. At the lexicon level, these approaches compute lexicon frequency statistics with models such as bag-of-words (BOW)[11]. Part-of-speech (POS) taggers are used for shallow syntax tasks at the syntax level to compute the frequencies of POS[11, 29, 32]. Moreover, probabilistic context-free grammar (PCFG) can be used in style-based methods to examine and compute the rewrite rule frequencies[18, 29]. The rhetorical structure theory (RST) and tools for rhetorical parsing are used at the discourse level to compute the frequencies of rhetorical relations among sentences as features[30, 31]. At the fourth language level (semantics), frequencies are assigned to phrases or lexicons that fit into each category of psycho-linguistics (like those that are described in linguistic inquiry and word count (LIWC)[18]) or that fit into each self-described psycho-linguistic feature. Experience and associated deception theories can be used to learn such features. Style-based approaches can also use term frequency–inverse document frequency (TF-IDF) and n-grams at various language levels to capture features of sequences of words (POS tagging, rewrite rules, etc.).

Latent textual features are generally used to create embeddings of the news content. These features can be extracted at the word, sentence, or document level. Embeddings are vectors that can be fed to classifiers within a traditional ML framework for fake news detection. In a DL framework, such embeddings can also be incorporated into neural networks and transformers[1, 11]. In theory, a latent representation can also be generated automatically by processes such as matrix or tensor factorization. The selection or extraction of general (non-latent) features is heavily influenced by experience and is weakly supported by fundamental theories from other disciplines. Latent features are difficult to comprehend and thus make it difficult to educate the public about fake/real news.
Content-based approaches do not take into account auxiliary information that plays a role in news propagation, such as news spreaders. Moreover, these approaches are sensitive to news content. A malicious entity can also manipulate the detection results by disguising their writing style[14].

Next, we review style-based fake news detection studies published in the last seven years, based on traditional ML and DL.

The semi-supervised learning method[5] to detect breaking news rumors combined unsupervised and supervised learning objectives. Sitaula et al.[10] assessed the veracity of fake news, and they found that the total number of authors and the link to the creator of a news article with false information play important roles in identifying fake news. The theory-driven method[11] represented news articles with various manual features that captured content structure and writing style. A multi-modal approach was used in SAFE[13] to identify fake news that relied on similarities between news text and visual information. Reis et al.[33] used various supervised classifiers for fake news classification on some features from the literature and also on a new set of features. Some studies[34, 35] have compared various ML classifiers on different datasets for fake news detection. Ahmad et al.[36] investigated various textual properties of news and used an ensemble approach to detect fake news. TF-IDF and 23 classifiers were used in Ref. [37] to detect fake news in three datasets. Shu et al.[38] examined fake news datasets from various contexts to understand their characteristics and used various standard ML classifiers and social article fusion models for classification.

A hybrid framework, named BerConvoNet[12], combined bidirectional encoder representations from transformers (BERT) embeddings and a convolutional neural network (CNN) to detect fake news. The two-level CNN with user response generator (TCNN-URG) framework[39] for fake news detection represented online articles at the sentence and word levels for the extraction of semantic information. The BERT model was applied in Ref. [40] to examine how the news title and the text (body) relate to fake news. Shu et al.[41] proposed dEFEND, an explainable fake news detection method that was based on recurrent neural network (RNN) and co-attention-based techniques. Reference [42] proposed an explainable detection model based on a sentence-comment co-attention sub-network. Sastrawan et al.[43] combined CNN and RNN to identify fake news. Similarly, the approach in Ref. [44] examined news headlines using BERT and a long short-term memory (LSTM) network. To classify fake news on OSN, the FakeBERT[45] approach combined CNN with BERT. An ensemble learning model based on BERT and text sentiment analysis was employed in Ref. [46] for improved detection of harmful news. Reference [47] used various word vector representation techniques with a feed-forward neural network (FNN) and LSTM for fake news identification. FNDNet[48] is a deep CNN for fake news identification. Until now, the majority of the literature has focused on fake and real news identification as a binary classification problem. Some studies[49–59] worked on multi-class fake news identification. Recently, DL and neural network based techniques have been proposed and developed for fake news detection that incorporate multi-modal data such as social context[60], text and image[13, 17, 61–65], and text with users' behavior and profiles[66].

3 Dataset

This study uses six publicly available datasets to analyze and validate the effectiveness of the proposed framework. Fact-checking experts provided the ground truth labels of true (real) or false (fake) for news articles in each of these six datasets. The George McIntire Dataset[67] is the first dataset, referred to as Dataset-1 (DS-1), containing 2291 fake news and 2285 real news articles. The second and third datasets are from the FakeNewsNet Repository[38]. The fact-checking websites GossipCop and PolitiFact were used to get both fake and true news. The GossipCop dataset, referred to as Dataset-2 (DS-2), contains 5335 (16 819) fake (real) news articles. The PolitiFact dataset, referred to as Dataset-3 (DS-3), contains 474 (798) fake (real) news stories. The next three datasets are originally sourced from the Kaggle data science community. The BuzzFeed dataset[68], called Dataset-4 (DS-4), comprises 91 real and 91 fake news articles. Another dataset known as Fake News Classification[69], referred to as Dataset-5 (DS-5), contains 23 503 (21 418) fake (real) news articles. The last dataset used in this study is known as the Fake and Real News Dataset[70] and is here called Dataset-6 (DS-6). It contains 34 980 (35 208) fake (real) news articles. The authors of this dataset integrated various famous datasets (i.e., McIntire,
Kaggle, BuzzFeed Political, and Reuters).

Statistical details about the six datasets are given in Table 1. Furthermore, the data present in the aforementioned six datasets are combined into one large dataset which is called the whole dataset (WDataset). In each of the six datasets, the articles vary in nature. The goal of WDataset is to assess and evaluate the classifiers on a whole dataset that includes news and information from a wide range of diverse domains. These datasets contain various attributes such as title, body, subject, video, and image. To prepare the data for analysis, we combined only text-based data (i.e., title and body) into a single attribute called “Text”. For the datasets with only a title attribute, we simply used the title as the text. For the datasets with both title and body attributes, we concatenated the two attributes with a separator (e.g., a space or newline character) to form the text. For the datasets with additional attributes (such as subject, timestamps, video, and image links), we ignored them. This process of combining the attributes into a single “Text” attribute enabled us to easily feed the data into pattern mining tools for analysis and, consequently, our ML models for classification. It also helped to standardize the input format across all datasets and to make the modeling process less complex.

4 Methodology

The proposed FNACSPM framework (Fig. 1) for the analysis and detection/classification of fake news consists of four main parts:

(1) Datasets pre-processing and abstraction: The first step is to pre-process the datasets to put them into a suitable format for SPM. This is carried out by converting each sequence into a discrete sequence, where each distinct word is transformed into a distinct positive integer.

(2) Learning via SPM: The second step entails applying various algorithms for SPM on the abstracted datasets to find frequent words, their frequent patterns, and the sequential relationships among discovered frequent patterns.
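The attribute merging described in Section 3 above (title and body concatenated into a single “Text” attribute, all other attributes dropped) can be sketched with pandas. The column names and toy rows below are illustrative assumptions, not the actual schemas of the six datasets:

```python
import pandas as pd

# Toy rows standing in for one dataset (column names are assumptions;
# the real datasets use various attribute names for title and body).
df = pd.DataFrame({
    "title": ["Breaking story", "Quiet day"],
    "body":  ["Full article text.", None],
    "subject": ["politics", "news"],   # extra attribute, ignored below
})

# Concatenate title and body into a single "Text" attribute, separated
# by a space; fall back to the title alone when no body is available.
df["Text"] = (df["title"].fillna("") + " " + df["body"].fillna("")).str.strip()

df = df[["Text"]]  # all other attributes are dropped
```

A newline could be used as the separator instead of a space, as the text notes; either choice only needs to keep title and body tokens from merging into one word.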
Table 1 Datasets statistics.
Dataset Fake news True news Feature MiL MaL MeL
DS-1 2291 2285 T, B 23 32 674 4379.5
DS-2 5335 16 819 NURL, T, TID 10 204 69.5
DS-3 474 798 NURL, T, TID 10 340 60.7
DS-4 91 91 T, B, URL ¤ 62 32 641 3257.3
DS-5 23 503 21 418 T, B, subject 30 32 888 2553.49
DS-6 34 980 35 208 T, B 15 33 026 3138.40
Note: T: title, B: body, NURL: news URL, TID: tweet ID, MiL: minimum length, MaL: maximum length, MeL: mean length, and ¤:
top_img, authors, source, publish_date, movies, images, canonical_link, meta_data.
[Figure 1 diagram: real/fake news → (1) Datasets pre-processing and abstraction (null data removal, lowercasing, stopwords removal, abstraction) → (2) Learning using SPM (frequent words, frequent word sequences, sequential rules between words; Features 1 and 2) → (3) Classification (random forest, Gaussian naïve Bayes, multi layer perceptron, kNN, support vector machine, logistic regression) → (4) Benchmark evaluation (accuracy, precision, recall, F1-measure).]
Fig. 1 FNACSPM framework, for fake news analysis and classification, consisting of four main steps: (1) Datasets pre-
processing and abstraction, (2) learning using SPM, (3) classification via discovered frequent sequential patterns of words in
the datasets by training various classifiers, and (4) evaluation of the framework by performing extensive experiments.
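The four stages of Fig. 1 can be outlined as a minimal pipeline skeleton. The function names and the toy stopword list below are our own illustrative assumptions, not the paper's implementation:

```python
def preprocess(texts):
    """Step 1a from Fig. 1: null data removal, lowercasing, stopword removal."""
    stopwords = {"the", "a", "is"}  # toy stopword list (assumption)
    cleaned = []
    for t in texts:
        if t is None:          # null data removal
            continue
        words = [w for w in t.lower().split() if w not in stopwords]
        cleaned.append(words)
    return cleaned

def run_pipeline(texts):
    sequences = preprocess(texts)
    # Step 1b (abstraction), Step 2 (SPM), Step 3 (classification), and
    # Step 4 (evaluation) would follow here; see Section 4.
    return sequences
```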
(3) Classification via frequent patterns: The third step is to use the frequent patterns, discovered in Step (2), for the classification/detection of fake news. Various classifiers are utilized, and their performance is evaluated with various evaluation measures.

(4) Evaluation: Comprehensive experiments are carried out to assess FNACSPM's performance and compare it with recent approaches for fake news detection.

In the next subsections, the first three parts of FNACSPM are explained in greater detail.

4.1 Dataset pre-processing and abstraction

The first step is data pre-processing, where cleaning operations such as lemmatization and stemming, and eliminating special characters, punctuation, and stop words, are performed to prepare the data for further analysis. After the pre-processing, the sequences of words in the datasets are represented in an appropriate format. Table 2 provides the statistical details of the six datasets after pre-processing. After pre-processing, the datasets are reduced, approximately, as follows: DS-1 (29%), DS-2 (24%), DS-3 (23%), DS-4 (35%), DS-5 (32%), and DS-6 (31%). For example, DS-1 (29%) indicates that the size of DS-1 is reduced to 29% of its original size as a result of the cleaning operations performed in pre-processing.

Let W = {w1, w2, ..., wm} represent the set of words in a dataset. A words set WS is a set of words such that WS ⊆ W. Set cardinality is represented by |WS|. A words set WS has a length of n (known as n-WS) if it contains n words, i.e., |WS| = n. For instance, take W = {trump, image, people, featured, via, even}. Then, the set {trump, people, via, even} is a WS containing four words. A total order relation on words, indicated by ≺, is defined to aid in the identification of patterns. In the framework's implementation, the lexicographical order is employed as the processing order for pattern searching.

A sequence of words is basically a list of words sets S = ⟨WS1, WS2, ..., WSn⟩, such that WSi ⊆ W (1 ⩽ i ⩽ n). A words corpus dataset, WCD = ⟨S1, S2, ..., Sn⟩, is a list of words sequences. In WCD, a sequence is associated with an ID. Table 3 shows a WCD containing four word sequences. According to the first sequence, the word “trump” is followed by “featured”, then “via”, and “show”.

The word sequences are transformed into integer sequences. This is done to prepare the datasets in a format that SPM algorithms can process more easily. Each line in the final transformed datasets denotes a word sequence for a fake/real news item. In sequences, a unique positive integer is used to replace each unique word type. For instance, the words “trump” and “featured” are changed to 1 and 3, respectively. A single space and the negative number −1 are used to separate the words in sequences from one another. A news item (sequence) ends when the negative number −2 appears at the end of a line. Table 3 also provides the conversion of the word sequences into integer sequences.

Table 2 Datasets statistics (after pre-processing).
Dataset  MiL  MaL     MeL
DS-1     17   21 875  3144
DS-2     4    174     53.1
DS-3     10   279     47.1
DS-4     39   20 203  2124.3
DS-5     22   3279    1759.5
DS-6     9    22 831  2189.1

Table 3 Sample of WCD.
ID  Sequence  |  Representation as integer sequence
1   ⟨{trump}, {featured}, {via}, {show}⟩  |  1 −1 3 −1 4 −1 5 −1 −2
2   ⟨{image}, {getty}, {image}, {image}, {said}, {president}⟩  |  6 −1 12 −1 6 −1 6 −1 32 −1 23 −1 −2
3   ⟨{one}, {donald}, {image}, {said}, {reuters}, {release}, {image}, {american}⟩  |  7 −1 11 −1 6 −1 32 −1 15 −1 18 −1 6 −1 22 −1 −2

4.2 Learning via SPM

WCD is analyzed, in the second step, to discover frequent patterns. Suppose that Sa = ⟨a1, a2, ..., an⟩ and Sb = ⟨b1, b2, ..., bm⟩ are two sequences of words. Sb contains Sa (Sa ⊑ Sb) if and only if there exist integers 1 ⩽ k1 < k2 < · · · < kn ⩽ m, s.t., a1 ⊆ bk1, a2 ⊆ bk2, ..., an ⊆ bkn. Sa is considered to be Sb's subsequence if Sb contains Sa. The importance and interestingness of a subsequence in SPM can be found via various
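The abstraction step above (each distinct word replaced by a distinct positive integer, items separated by −1, sequences terminated by −2) can be sketched in a few lines. The order in which integers are assigned here (first appearance) is an assumption, so the resulting IDs differ from those shown in Table 3:

```python
def abstract_corpus(word_sequences):
    """Encode pre-processed word sequences as integer sequences in the
    format described above: items separated by " -1 ", end marked by "-2"."""
    word_to_id = {}  # each distinct word gets a distinct positive integer
    lines = []
    for seq in word_sequences:
        ids = []
        for word in seq:
            if word not in word_to_id:
                word_to_id[word] = len(word_to_id) + 1
            ids.append(word_to_id[word])
        lines.append(" -1 ".join(str(i) for i in ids) + " -1 -2")
    return lines, word_to_id

lines, vocab = abstract_corpus([
    ["trump", "featured", "via", "show"],
    ["image", "getty", "image", "image", "said", "president"],
])
```

With this first-appearance assignment, the first sequence becomes "1 -1 2 -1 3 -1 4 -1 -2"; each output line matches the one-item-per-itemset layout of the transformed datasets.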
measures, of which the support measure is the most used. In a WCD, the support of Sa is the total number of sequences (S) that contain Sa, which is denoted by the symbol sup(Sa):

sup(Sa) = |{S | Sa ⊑ S ∧ S ∈ WCD}|  (1)

In a sequential dataset, such as WCD, SPM deals with the enumeration problem of finding all the frequent subsequences. If the support of a sequence S is equal to or greater than a user-provided threshold of minimum support (sup(S) ⩾ minsup), then S is said to be a frequent sequence. Sequences can have up to 2^n − 1 distinct subsequences, where n represents the total number of items. For most datasets, finding the support of all potential subsequences using the naive method is not possible[71]. However, over the past two decades, various effective algorithms have been developed that can discover all sequential patterns without having to search through all the potential subsequences.

SPM algorithms use the s-extension and i-extension operations to move through the search space of sequential patterns. For an item y, Sb is an s-extension of Sx if Sb = ⟨x1, x2, ..., xn, {y}⟩. On the other hand, Sc is an i-extension of Sx if Sc = ⟨x1, x2, ..., xn ∪ {y}⟩. In general, SPM algorithms use a depth-first or breadth-first search with various optimizations and data structures.

Frequent itemset mining (FIM), a special case of SPM, deals with analyzing records where the sequential ordering among items is not considered. The first and best-known FIM method, called Apriori[72], can discover frequent itemsets (like word sets) in large databases. Apriori first discovers items (e.g., words) in databases that occur frequently. Then, discovered items are expanded to discover larger itemsets that appear often enough. Besides finding itemsets, Apriori can also find relationships (association rules) among items. Multiple memory-efficient and fast algorithms can be used for FIM, which find the same patterns. These newer algorithms use different types of data structures, optimization techniques, and search strategies.

One SPM algorithm used in this work is top-k sequential (TKS)[73], which can find the top-k most common sequential patterns in a database (or dataset), where a user sets the parameter k. TKS finds the desired k patterns by applying the sequential pattern mining (SPAM) algorithm's candidate generation procedure and a vertical database representation (VDR). The VDR facilitates the counting of patterns without expensive database scans. Thus, SPM algorithms based on a VDR generally work more effectively on dense or long sequences. Other strategies for search space reduction are also used in TKS, along with the precedence map (PMAP) data structure. These methods allow TKS to lower the number of costly operations like bit vector intersections. Another SPM algorithm used in this work is CM-SPAM[74]. It scans the search space of a dataset or database to find frequent sequential patterns. CM-SPAM uses the co-occurrence map (CMAP) data structure that stores item co-occurrence information. CM-SPAM uses a generic mechanism to prune the search space via the VDR. The reader may refer to Refs. [73, 74] for more details about the two aforementioned algorithms for SPM.

The aforementioned algorithms have the main drawback that they may discover too many sequential patterns, most of which are not interesting or important for users. Sequential patterns appearing frequently in a database with low confidence are of no value in tasks of prediction or decision-making. Due to this, there is another pattern type known as sequential rules. A sequential rule considers both the confidence (conditional probability) that some events (words in this work) will follow or be followed by others, in addition to the support of events. A sequential rule X → Y in this work represents a relationship between two WSs X, Y ⊆ W, s.t., X, Y ≠ ∅ and X ∩ Y = ∅. According to the rule r: X → Y, if words from X appear in a sequence, then words from Y will follow in the same sequence. Sx contains X if and only if X ⊆ ∪ (i=1 to n) xi. Similarly, Sx contains the rule r if an integer k exists s.t. 1 ⩽ k < n, X ⊆ ∪ (i=1 to k) xi and Y ⊆ ∪ (i=k+1 to n) xi. In a dataset WCD, the confidence of a rule r is

confWCD(r) = |{S | r ⊑ S ∧ S ∈ WCD}| / |{S | X ⊑ S ∧ S ∈ WCD}|  (2)

Similarly, in WCD, the support of a rule r is

supWCD(r) = |{S | r ⊑ S ∧ S ∈ WCD}| / |WCD|  (3)

For a WCD and a user-specified minimum support threshold (minsup), a rule r is considered a frequent sequential rule if and only if supWCD(r) ⩾ minsup. Similarly, for a user-specified minimum confidence threshold (minconf), a rule r is considered a valid sequential rule if (1) it is frequent and (2) confWCD(r) ⩾ minconf. Enumerating all the valid
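The containment relation ⊑ and the measures of Eqs. (1)–(3) can be sketched directly. The greedy matching and the toy corpus below (built from Table 3's word sequences) are illustrative only, not the optimized data structures (VDR, PMAP, CMAP, SCM) used by the actual mining algorithms:

```python
def contains(seq_b, seq_a):
    """S_a ⊑ S_b: each itemset of seq_a is a subset of an itemset of seq_b
    at strictly increasing positions (greedy leftmost matching suffices)."""
    k = 0
    for a in seq_a:
        while k < len(seq_b) and not set(a) <= set(seq_b[k]):
            k += 1
        if k == len(seq_b):
            return False
        k += 1
    return True

def support(wcd, seq_a):
    """Eq. (1): number of corpus sequences that contain seq_a."""
    return sum(1 for s in wcd if contains(s, seq_a))

def contains_rule(seq, x, y):
    """A sequence contains rule X -> Y if, for some cut point k, all of X
    occurs in the first k itemsets and all of Y in the remaining ones."""
    for k in range(1, len(seq)):
        before = set().union(*(set(i) for i in seq[:k]))
        after = set().union(*(set(i) for i in seq[k:]))
        if set(x) <= before and set(y) <= after:
            return True
    return False

def rule_support(wcd, x, y):
    """Eq. (3): fraction of corpus sequences that contain the rule."""
    return sum(contains_rule(s, x, y) for s in wcd) / len(wcd)

def rule_confidence(wcd, x, y):
    """Eq. (2): rule occurrences divided by antecedent occurrences."""
    antecedent = sum(1 for s in wcd
                     if set(x) <= set().union(*(set(i) for i in s)))
    if antecedent == 0:
        return 0.0
    return sum(contains_rule(s, x, y) for s in wcd) / antecedent

# Toy WCD built from the word sequences of Table 3.
wcd = [
    [{"trump"}, {"featured"}, {"via"}, {"show"}],
    [{"image"}, {"getty"}, {"image"}, {"image"}, {"said"}, {"president"}],
    [{"one"}, {"donald"}, {"image"}, {"said"}, {"reuters"}, {"release"},
     {"image"}, {"american"}],
]
```

On this corpus, the pattern ⟨{image}, {said}⟩ has support 2, and the rule {image} → {said} has support 2/3 and confidence 1.0.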
sequential rules in a dataset is the goal of sequential regard to c for a chosen class c ∈ NT :
rule mining. In this work, the ERMiner algorithm[75] is {
1, if s ∈ c;
used to discover frequent sequential rules in fake news Sc = (4)
0, otherwise
datasets. A VDR is employed by ERMiner. The rules
The news class labels, according to Eq. (4), are
search space is investigated by the use of equivalency
classes of rules with identical antecedent and labeled to 1 for those that belong to c , while others are
consequent. Moreover, the search space of sequential labeled as 0 (or “Others”). A simple example is
rules is investigated by using two procedures (called provided to illustrate this process. For instance, if the
left and right merges). ERMiner is more effective than earlier algorithms for mining sequential rules because it uses the sparse count matrix (SCM) approach for search space pruning. In summary, SPM algorithms differ from each other on the basis of (1) the use of a depth-first or breadth-first search, (2) the use of a VDR or a horizontal representation and of particular data structures, and (3) how the support measure is calculated to find those frequent patterns that satisfy the minsup constraint.

4.3 Classification

The third step of the framework involves fake news classification using the sequential frequent patterns discovered with SPM. News articles are generally long (see, for example, Tables 1 and 2). A close inspection of the WDS reveals that the majority of the sequences (both real and fake) contain the same words repeated multiple times. This word repetition in online news can be avoided during the classification process by treating contiguous identical words as a single word.

More precisely, FNACSPM uses the sequential frequent patterns, found with the SPM algorithms, to classify fake and real news in the datasets. For classification, two methods (binary and multi-class (MC)) are employed. Two types of binary classification are considered for training a classifier so that it classifies each news item as fake or real.

Type 1: Each dataset is considered separately in the first type. For a separate dataset, binary classification assigns a "fake" or "real" label to each sequence (news) corresponding to that class.

Type 2: In this type, all datasets are combined together to create one dataset, which is used to train a model for the classification of news (sequence) type. This classification type assigns the label "1" to sequences that originally belonged to the dataset and news type of interest, and the label "Others" (or 0) to all other sequences.

Definition 1 Assume that NT denotes the set of all news types (classes). A sequence S is labeled with 1 if it belongs to the news type of interest and with 0 otherwise (Eq. (4)). For example, if the news type of interest in DS-1 is fake, then Eq. (4) assigns 1 to all the DS-1 sequences belonging to the fake type and 0 to all other sequences in DS-1 and other sequences in the whole dataset.

A second way to train and test classifiers for fake news classification/detection is to use MC classification. In the context of this work, each sequence in the whole dataset (DS-7 or WDS), which combines all six datasets, is labeled with its respective class name. There are 12 classes in total, as shown in Table 4. In MC classification, a classifier is trained to correctly label sequences according to those classes. For classification, seven standard ML algorithms and one DL algorithm are used: (1) Bernoulli Naive Bayes (BNB), (2) Gaussian Naive Bayes (GNB), (3) decision tree (DT), (4) random forest (RF), (5) support vector machine (SVM), (6) k-nearest neighbors (kNN), (7) LR, and (8) multi-layer perceptron (MLP). We chose these eight classifiers because most previous studies on fake news analysis and detection also used them.

Table 4 MC labeling of sequences from the whole dataset.
Dataset | Class | MC class
DS-1 | Fake | DS1-F
DS-1 | Real | DS1-R
DS-2 | Fake | DS2-F
DS-2 | Real | DS2-R
DS-3 | Fake | DS3-F
DS-3 | Real | DS3-R
DS-4 | Fake | DS4-F
DS-4 | Real | DS4-R
DS-5 | Fake | DS5-F
DS-5 | Real | DS5-R
DS-6 | Fake | DS6-F
DS-6 | Real | DS6-R
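The Type 2 binary labeling of Definition 1 and the MC labeling of Table 4 can be sketched as follows; the function and variable names are illustrative, not the paper's implementation:

```python
# Sketch of the two labeling schemes described above (hypothetical helper
# names; the paper's own code is not shown).

def binary_label(seq_dataset, seq_class, target_dataset, target_class):
    """Type 2 binary labeling (cf. Eq. (4)): 1 for sequences of the news
    type of interest in the target dataset, 0 ("Others") for the rest."""
    return 1 if (seq_dataset, seq_class) == (target_dataset, target_class) else 0

def mc_label(seq_dataset, seq_class):
    """MC labeling as in Table 4, e.g. ("DS-1", "Fake") -> "DS1-F"."""
    return seq_dataset.replace("-", "") + "-" + seq_class[0].upper()

print(binary_label("DS-1", "Fake", "DS-1", "Fake"),
      binary_label("DS-2", "Fake", "DS-1", "Fake"))  # 1 0
print(mc_label("DS-6", "Real"))  # DS6-R
```

With the target type set to fake news in DS-1, every other (dataset, class) pair collapses into the single "Others" (0) class, exactly as in the example following Definition 1.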
950 Big Data Mining and Analytics, September 2024, 7(3): 942−963
The performance of classifiers is assessed using the following seven metrics: accuracy (ACC), precision (P), F1 score, recall (R), Matthews correlation coefficient (MCC), area under the receiver operating characteristic curve (AUC), and area under the precision-recall curve (AUPRC). The seven measures are defined as follows:

ACC = (TP + TN) / (TP + TN + FP + FN)    (5)

Precision = TP / (TP + FP)    (6)

Recall = TP / (TP + FN)    (7)

F1 = 2 × (P × R) / (P + R)    (8)

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))    (9)

AUC = ∫₀¹ TPR d(FPR)    (10)

AUPRC = Σ_{i=1}^{n} (R_i − R_{i−1}) × (P_i + P_{i−1}) / 2    (11)

Here, TP = true positive, TN = true negative, FP = false positive, and FN = false negative. In Eq. (10), TPR represents the recall (R), and d(FPR) is the differential of the false positive rate (FPR), where FPR = FP / (FP + TN). P_i and R_i in Eq. (11) represent the values of precision and recall, respectively, at the i-th decision threshold.

5 Results

A computer equipped with 16 GB RAM and an 11th-generation Core i5 processor was utilized for carrying out the experiments. A Java-based open-source library, called SPMF[76], was used to examine and find patterns in the datasets. Implementations of over 250 data and pattern mining algorithms are available in this library. For classification purposes, Python is used, with a variety of libraries, including scikit-learn[77] for ML algorithms, NumPy for numerical computations, and Pandas for data manipulation. In the text pre-processing phase, TF-IDF was used, via the "TfidfVectorizer" module from the scikit-learn library. To ensure reliable model evaluation, each dataset is split into training and testing sets (80% training and 20% testing) by using the train_test_split function from scikit-learn. This function facilitated the random partitioning of the data, allocating a specified proportion for training the models and the remaining portion for evaluating their performance. Next, we discuss the results obtained by using the SPM algorithms on the abstracted datasets.

5.1 Discovered patterns and rules

The Apriori algorithm is first applied to the transformed datasets to find frequent words. Both fake and real news contain many similar words (Fig. 2). We found that among the first 3000 frequent words discovered by Apriori in fake and real news, approximately 93% are similar to each other. However, frequent sets of words are unordered. Besides, Apriori does not guarantee that words from a word set (WS) occur consecutively in a sequence. As a result, Apriori's long patterns are neither interesting nor important and offer no helpful information. Apriori is unable to identify sequential patterns because it ignores the order relationships between words. Next, we present the outcomes for SPM algorithms that improve upon Apriori.

More important and meaningful patterns can be discovered in data using SPM algorithms like TKS, CM-SPAM, and ERMiner. The top-k sequential patterns of words in the datasets are discovered using the TKS algorithm. The CM-SPAM algorithm needs the minsup threshold to be set, unlike TKS. Table 5 lists some frequent sequential patterns of words that are found in the six datasets with TKS and CM-SPAM. From the discovered patterns, one can find useful and interesting details about the frequent occurrences of words in fake and real news. The bold patterns represent fake ones, while others represent real ones.

Fig. 2 Frequent words discovered in the whole dataset: (a) fake; (b) real.
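The Case 1 experimental setup described above (TF-IDF features via "TfidfVectorizer", an 80%/20% train_test_split, and one of the eight classifiers) can be sketched as follows; the tiny corpus is illustrative, not the paper's data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

# Toy corpus standing in for the pre-processed news sequences (label 1 = fake).
texts = ["hillary clinton campaign said", "trump said campaign",
         "weather sunny today nice", "rain cloudy weather today"] * 25
labels = [1, 1, 0, 0] * 25

X = TfidfVectorizer().fit_transform(texts)        # TF-IDF features
X_tr, X_te, y_tr, y_te = train_test_split(        # 80%/20% split
    X, labels, test_size=0.2, stratify=labels, random_state=42)

clf = LogisticRegression(C=1.0, solver="lbfgs", max_iter=100).fit(X_tr, y_tr)
pred = clf.predict(X_te)
acc = accuracy_score(y_te, pred)
mcc = matthews_corrcoef(y_te, pred)
print(acc, mcc)
```

Because the two toy classes share no vocabulary, LR separates them perfectly; on real news data the same pipeline yields the accuracies reported in Section 5.2.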
We find that some fake and real patterns are similar to each other and that there are some differences among the patterns found in the six datasets. Overall, it was observed that applying pattern mining to news was quite fast. However, for datasets that contain long news sequences, we need to fine-tune some parameters of both algorithms to find frequent sequential patterns.

Table 6 shows the relationships between frequent words that are identified in each dataset via the ERMiner algorithm. It was observed that different datasets require different parameter settings (minsup and minconf) before they start yielding sequential rules. For example, for DS-1, minconf = 15%, which indicates that rules must have a minimum confidence of 15%. The third rule in DS-1 indicates that the word "campaign" is followed by the word "clinton". Similarly, the last rule indicates that the words "new" and "people" are followed by "hillary" and "state", respectively. ERMiner uncovered some useful dependencies and relationships that are present among frequent words. On the six datasets, the three SPM algorithms performed effectively. The obtained results showed a clear association between the total number of words in news sequences and the effectiveness of the algorithms for sequential patterns. In Table 6, an X for DS-6 indicates that ERMiner was unable to find rules in the set of fake news due to running out of time or memory.

Table 6 Extracted sequential rules by using the ERMiner algorithm.
Dataset | Extracted sequential rule
DS-1 | donald → republican
DS-1 | state → hillary, clinton
DS-1 | campaign → clinton
DS-1 | state → time, year
DS-1 | party, campaign → president, said, state
DS-1 | trump, donald → said, clinton, hillary, campaign
DS-1 | clinton, state → people
DS-1 | new, people → hillary, state
DS-2 | brad → pitt
DS-2 | miley, cyrus → liam
DS-2 | kim → kardashian
DS-2 | selena, gomez → justin, bieber
DS-2 | beyonce → jay, z
DS-2 | wedding, prince, harry → meghan, markle
DS-2 | jennifer, leaving → biggest, mistake
DS-2 | brad, pitt → angelina, jolie
DS-3 | week → transcript
DS-3 | news, latest → video
DS-3 | office → news, breaking
DS-3 | donald → paid
DS-3 | senate, call, vote → congress
DS-3 | trump, executive → order
DS-3 | queen → say, elizabeth
DS-3 | kim, jong → trump, north
DS-4 | new → president
DS-4 | get, life → people
DS-4 | thing, get → short, life
DS-4 | hillary → trump
DS-4 | hillary, clinton → donald, trump
DS-4 | hillary → said, clinton, trump
DS-4 | donald, continued → trump
DS-4 | thing, know → get, trial
DS-5 | reuters → president
DS-5 | washington → president, trump
DS-5 | official, house → state, year
DS-5 | featured → image
DS-5 | government, last, president → new, republican
DS-5 | trump → people, president
DS-5 | obama, president → time, image
DS-5 | twitter, pic, country → white, house
DS-6 | X
DS-6 | video → trump
DS-6 | X
DS-6 | president, donald → trump
DS-6 | donald → image, trump
DS-6 | X
DS-6 | trump → image, featured
DS-6 | X

5.2 Results for classification

The experimental results for both binary and MC classification on the six datasets are presented in this section. The eight classifiers were used for two cases:

Case 1: All the words, after pre-processing, in news sequences are used in the classification process.

Case 2: The frequent sequential patterns, found with two SPM algorithms, are used in the classification process.

The TKS and CM-SPAM algorithms are used in Case 2 to find the 100, 200, 400, and 600 most frequent patterns of words in each dataset. Four different numbers of patterns were considered to investigate whether or not the number of patterns affects how well classification models perform. After discovery, the frequent patterns are further pre-processed to ensure that each pattern contains at least 3 distinct frequent words.
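In Case 2, each frequent sequential pattern acts as one feature: a news sequence is encoded by whether it contains the pattern as an ordered (not necessarily contiguous) subsequence. A minimal sketch with hypothetical patterns and tokens, not the paper's actual feature code:

```python
def contains_pattern(sequence, pattern):
    """True if pattern occurs in sequence as an ordered subsequence."""
    it = iter(sequence)
    # `word in it` advances the iterator, enforcing left-to-right order.
    return all(word in it for word in pattern)

def encode(sequence, patterns):
    """0/1 feature vector over the frequent sequential patterns."""
    return [int(contains_pattern(sequence, p)) for p in patterns]

# Hypothetical frequent patterns and one tokenized news sequence.
patterns = [("hillary", "clinton"), ("donald", "trump"), ("white", "house")]
seq = ["said", "donald", "j", "trump", "white", "office"]
print(encode(seq, patterns))  # [0, 1, 0]
```

Feeding such short 0/1 vectors to the classifiers, instead of full TF-IDF matrices over entire articles, is what makes Case 2 markedly faster in the results below.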
For the classification in both cases, the default hyperparameters for the algorithms were as follows:
● BNB with an α value of 1.0;
● GNB with no significant hyperparameters to tune;
● DT with a criterion of "gini", a splitter of "best", no maximum depth limit, a minimum samples split of 2, and a minimum samples leaf of 1;
● RF with 100 estimators, "gini" criterion, no maximum depth limit, minimum samples split of 2, and minimum samples leaf of 1;
● MLP with a hidden layer size of 600, "tanh" activation function, "adam" solver, an α value of 0.0001, "invscaling" learning rate, and learning rate initialization of 0.001;
● SVM with a C value of 1.0, a "radial basis function (rbf)" kernel, a degree of 3, and a "scale" γ value;
● kNN with 2 neighbors, "uniform" weight scheme, "auto" algorithm, leaf size of 30, and Euclidean distance metric (p = 2);
● LR with a C value of 1.0, the "Limited-memory Broyden-Fletcher-Goldfarb-Shanno (lbfgs)" solver, and a maximum iteration limit of 100.

5.2.1 Binary classification results

Table 7 provides the results of binary classification for Case 1 (all words are used for classification). Results are reported in the format "accuracy/running time". For example, the entry 0.84/9.5 indicates that BNB achieved an accuracy of 0.84 on DS-1 and took 9.5 s to terminate. For Case 1, 10 000 randomly selected real and fake news articles were used from DS-5 and DS-6 in the classification. For the WDS, proportionate sampling was used to deal with data imbalance. It involves selecting a subset of data from each dataset in a way that maintains the original class distribution. This helps to ensure that each class is represented proportionally in the 10 000 randomly selected articles. LR achieved the highest average accuracy of 0.83 across all datasets. On DS-5, DT and RF achieved the highest accuracy of 0.99, followed by LR (0.98). The ranking of classifiers based on average accuracy is in the order LR > RF > DT > kNN > BNB > GNB. SVM and MLP are not included in the ranking as they were unable to produce results within 5 h on various datasets.

kNN performed best in terms of computational time, followed by LR. The ranking of classifiers based on time is kNN > LR > BNB > GNB > RF > DT. RF and LR performed better overall, but RF was slow compared to LR. The accuracy of the classifiers, except RF, was lower on DS-4 than on the other five datasets. This is because DS-4 contained few fake and real news items. For the whole dataset, the results for all the classifiers decreased significantly. In Case 1, interestingly, we find that the classifiers, except kNN, achieved their highest accuracy on the Fake News Classification dataset (DS-5).
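The proportionate sampling described above corresponds to stratified sampling; a sketch with scikit-learn on toy labels (the sizes are illustrative, standing in for the 10 000-article subsample):

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Toy imbalanced labels: class "a" makes up 75% of the data, class "b" 25%.
labels = ["a"] * 750 + ["b"] * 250
items = list(range(1000))

# Draw a 100-item subsample whose class shares match the original data.
sample, _, sample_labels, _ = train_test_split(
    items, labels, train_size=100, stratify=labels, random_state=0)

print(Counter(sample_labels))  # 75 "a" and 25 "b": proportions preserved
```

Passing stratify=labels makes train_test_split preserve each class's share in the drawn subset, which is exactly the property needed to keep the WDS classes proportionally represented.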
Table 7 Classifiers’ accuracy and running time for binary classification (Case 1).
Dataset | Result | BNB | GNB | DT | RF | MLP | SVM | kNN | LR
DS-1 | Accuracy | 0.84 | 0.81 | 0.82 | 0.90 | 0.93 | 0.93 | 0.84 | 0.92
DS-1 | Running time (s) | 9.5 | 17.8 | 55.8 | 28.8 | 3402.4 | 2845.6 | 3.5 | 8.4
DS-2 | Accuracy | 0.84 | 0.65 | 0.80 | 0.84 | 0.77 | – | 0.81 | 0.85
DS-2 | Running time (s) | 17.4 | 23.4 | 1677.8 | 943.4 | 6390.5 | – | 26.7 | 10.9
DS-3 | Accuracy | 0.75 | 0.75 | 0.76 | 0.75 | 0.79 | 0.81 | 0.86 | 0.81
DS-3 | Running time (s) | 0.1 | 0.1 | 0.9 | 1.8 | 34.9 | 4.9 | 0.3 | 0.07
DS-4 | Accuracy | 0.62 | 0.59 | 0.76 | 0.80 | 0.65 | 0.70 | 0.73 | 0.69
DS-4 | Running time (s) | 0.2 | 0.1 | 0.5 | 0.7 | 27.7 | 0.7 | 0.4 | 0.08
DS-5 | Accuracy | 0.85 | 0.90 | 0.99 | 0.99 | – | – | 0.66 | 0.98
DS-5 | Running time (s) | 28.2 | 58.1 | 60.6 | 72.6 | – | – | 8.6 | 19.4
DS-6 | Accuracy | 0.84 | 0.75 | 0.91 | 0.90 | 0.92 | – | 0.86 | 0.91
DS-6 | Running time (s) | 62.9 | 329.2 | 329.2 | 118.3 | 3749.1 | – | 15.1 | 30.7
WDS | Accuracy | 0.58 | 0.57 | 0.57 | 0.61 | – | – | 0.62 | 0.65
WDS | Running time (s) | 83.6 | 167.7 | 217.7 | 212.5 | – | – | 12.9 | 43.1
Average | Accuracy | 0.76 | 0.71 | 0.80 | 0.82 | – | – | 0.76 | 0.83
Average | Running time (s) | 28.8 | 85.2 | 334.6 | 196.8 | – | – | 9.64 | 16.1
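The classifier rankings quoted in the text are obtained by averaging each classifier's per-dataset accuracies (as in the Average row of Table 7) and sorting; a sketch with hypothetical numbers, not the table's values:

```python
# Hypothetical per-dataset accuracies for three classifiers over three
# datasets; the real values are those of Table 7.
acc = {
    "LR":  [0.92, 0.85, 0.81],
    "RF":  [0.90, 0.84, 0.75],
    "kNN": [0.84, 0.81, 0.86],
}
avg = {name: sum(scores) / len(scores) for name, scores in acc.items()}
ranking = sorted(avg, key=avg.get, reverse=True)
print(ranking)  # ['LR', 'kNN', 'RF']
```

Sorting by the mean hides per-dataset variation (here kNN beats both on the third dataset), which is why the text also reports dataset-level winners separately.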
In previous studies[10–14, 35–38, 41, 42, 44, 60, 63] that used multiple datasets, classifiers performed better on the PolitiFact dataset (DS-3). Figure 3 shows the overall results for LR, which performed best for Case 1.

Fig. 3 Overall binary classification results for LR (Case 1) on DS-1 to DS-6 and WDS.

Classifiers in Case 2 performed significantly better than classifiers in Case 1 (Table 8). Overall, for varying pattern lengths, it is observed that classifiers achieved better results on patterns found with TKS as compared to CM-SPAM. Moreover, classification models achieved the highest accuracy with 400 patterns, followed by 600, 200, and 100, respectively. The ranking of classifiers based on accuracy on patterns found by using TKS (CM-SPAM) is in the order LR (LR) ≈ (>) SVM (MLP) > (>) MLP (BNB) > (>) RF (SVM) > (>) DT (kNN) > (>) BNB (RF) > (>) kNN (DT) > (>) GNB (GNB). Table 9 lists the overall results for LR, which performed best on patterns discovered using TKS. LR performed best on DS-2 (GossipCop), followed by DS-3 (PolitiFact). The results obtained so far clearly indicate that using all the words not only provides less accurate results, compared to using frequent patterns, but also takes much more time and memory.

5.2.2 MC classification results

Tables 10 and 11 provide the obtained MC classification results for both cases. For Case 1, LR and SVM achieved better accuracy (0.77) than the others. The ranking of classifiers based on accuracy is in the order LR ≈ SVM > RF > MLP > kNN > DT ≈ BNB > GNB. Interestingly, the classifiers achieved higher accuracy for MC than for binary classification in Case 1. Computation-wise, kNN performed best, followed by BNB, while SVM performed worst, followed by MLP.

For Case 2, LR performed better (average accuracy 0.767) than the others. For Case 2, the ranking of classifiers based on average accuracy is in the order LR ≈ MLP ≈ SVM > BNB > RF > DT > kNN > GNB. When compared to TKS, all classifiers generally performed better on patterns found with CM-SPAM. Moreover, classifiers achieved high accuracy on 600 patterns, followed by 400, 200, and 100, respectively. Again, the time taken by classifiers reduced significantly in Case 2 as compared to Case 1. For Case 2, GNB performed worst.

Interestingly, the majority of the classifiers, except BNB, GNB, and MLP, performed better in Case 1 than with TKS's 100 and 200 frequent patterns. Conversely, all classifiers performed better with the 600 patterns obtained by CM-SPAM. Moreover, the classifiers in Case 1 performed better in some cases than in Case 2. Figure 4 provides the overall results of LR for Case 1 and Case 2.

In summary, the obtained results show that frequent patterns discovered in news can be used efficiently to classify and detect fake news instead of providing the whole news sequences. Using all the words (or the entire news) not only provides less accurate results, compared to using frequent patterns, but also takes much more time. From Tables 1 and 2, it is evident that news articles typically consist of thousands of words. However, the sequential patterns discovered by the TKS algorithm contain 74 words at most. For binary classification, it was observed that classification models performed better, overall, on TKS's patterns as compared to CM-SPAM's patterns. The opposite was true for MC classification. However, classifiers performed better on TKS's 600 patterns as compared to CM-SPAM's 600 patterns. For binary (MC) classification, classifiers performed better on 400 (600) patterns.

5.2.3 Comparison

FNACSPM is compared in this section with state-of-the-art approaches (published during 2017−2023) for fake news classification and detection.
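Before turning to the comparison, it is worth making the rule-selection thresholds of Section 5.1 concrete. A sequential rule X → Y (as in Table 6) is kept only if its support reaches minsup and its confidence reaches minconf; a toy sketch of the two measures, simplified to single-word antecedents and consequents (not ERMiner itself):

```python
def rule_stats(sequences, x, y):
    """Support and confidence of the sequential rule x -> y: x occurs in a
    sequence and y occurs after x's first occurrence (a simplification of
    ERMiner's unordered-itemset rules)."""
    sup_x = sup_xy = 0
    for seq in sequences:
        if x in seq:
            sup_x += 1
            if y in seq[seq.index(x) + 1:]:
                sup_xy += 1
    support = sup_xy / len(sequences)
    confidence = sup_xy / sup_x if sup_x else 0.0
    return support, confidence

# Toy tokenized sequences; "campaign -> clinton" is the third DS-1 rule.
seqs = [["campaign", "clinton", "said"],
        ["campaign", "trump", "clinton"],
        ["campaign", "state"],
        ["state", "clinton"]]
sup, conf = rule_stats(seqs, "campaign", "clinton")
print(sup, conf)  # support 0.5, confidence 2/3
```

With minconf = 15% as used for DS-1, this toy rule (confidence ≈ 67%) would be retained.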
Table 8 Classifiers’ accuracy and running time for binary classification on frequent sequential patterns discovered by TKS (CM-SPAM).
Dataset | FreqP | Result | BNB | GNB | DT | RF | MLP | SVM | kNN | LR
DS-1 | 100 | Accuracy | 0.78 (0.80) | 0.80 (0.88) | 0.90 (0.88) | 0.80 (0.85) | 0.85 (0.82) | 0.88 (0.82) | 0.80 (0.80) | 0.88 (0.80)
DS-1 | 100 | Running time (s) | 0.2 (0.2) | 0.01 (0.01) | 0.07 (0.07) | 0.3 (0.2) | 0.7 (0.7) | 0.02 (0.02) | 0.2 (0.4) | 0.02 (0.02)
DS-1 | 200 | Accuracy | 0.85 (0.82) | 0.82 (0.81) | 0.96 (0.85) | 0.85 (0.75) | 0.89 (0.81) | 0.90 (0.82) | 0.85 (0.81) | 0.91 (0.85)
DS-1 | 200 | Running time (s) | 0.2 (0.3) | 0.01 (0.01) | 0.07 (0.07) | 0.4 (0.3) | 2.2 (2.6) | 0.05 (0.06) | 0.2 (0.3) | 0.01 (0.01)
DS-1 | 400 | Accuracy | 0.89 (0.86) | 0.90 (0.92) | 0.89 (0.84) | 0.89 (0.85) | 0.92 (0.91) | 0.94 (0.88) | 0.82 (0.78) | 0.94 (0.88)
DS-1 | 400 | Running time (s) | 0.4 (0.2) | 0.02 (0.01) | 0.09 (0.08) | 0.5 (0.4) | 4.4 (4.03) | 0.2 (0.1) | 0.3 (0.2) | 0.03 (0.02)
DS-1 | 600 | Accuracy | 0.95 (0.92) | 0.96 (0.92) | 0.88 (0.86) | 0.93 (0.92) | 0.93 (0.95) | 0.94 (0.93) | 0.85 (0.84) | 0.95 (0.94)
DS-1 | 600 | Running time (s) | 0.2 (0.3) | 0.02 (0.03) | 0.08 (0.09) | 0.6 (0.5) | 5.9 (9.1) | 0.2 (0.8) | 0.2 (0.2) | 0.02 (0.03)
DS-2 | 100 | Accuracy | 1 (0.80) | 1 (0.80) | 0.95 (0.68) | 1 (0.72) | 1 (0.70) | 1 (0.68) | 1 (0.65) | 1 (0.70)
DS-2 | 100 | Running time (s) | 0.3 (0.2) | 0.01 (0.01) | 0.07 (0.08) | 0.2 (0.3) | 0.7 (0.8) | 0.01 (0.02) | 0.3 (0.2) | 0.01 (0.02)
DS-2 | 200 | Accuracy | 0.94 (0.90) | 0.94 (0.89) | 1 (0.88) | 1 (0.88) | 0.94 (0.90) | 1 (0.94) | 1 (0.78) | 1 (0.94)
DS-2 | 200 | Running time (s) | 0.2 (0.3) | 0.01 (0.02) | 0.07 (0.08) | 0.2 (0.3) | 0.7 (2.7) | 0.02 (0.05) | 0.2 (0.3) | 0.01 (0.02)
DS-2 | 400 | Accuracy | 0.99 (0.94) | 0.99 (0.94) | 1 (0.91) | 1 (0.92) | 0.99 (0.94) | 1 (0.94) | 1 (0.92) | 1 (0.94)
DS-2 | 400 | Running time (s) | 0.2 (0.3) | 0.01 (0.02) | 0.07 (0.08) | 0.3 (0.4) | 1.8 (4.1) | 0.08 (0.3) | 0.3 (0.2) | 0.02 (0.02)
DS-2 | 600 | Accuracy | 0.99 (0.95) | 0.99 (0.96) | 1 (0.93) | 1 (0.95) | 0.99 (0.95) | 1 (0.97) | 1 (0.91) | 1 (0.96)
DS-2 | 600 | Running time (s) | 0.2 (0.3) | 0.01 (0.02) | 0.07 (0.08) | 0.3 (0.4) | 1.8 (4.7) | 0.1 (0.5) | 0.4 (0.3) | 0.04 (0.03)
DS-3 | 100 | Accuracy | 1 (0.89) | 0.48 (0.55) | 1 (0.82) | 1 (0.91) | 1 (0.91) | 1 (0.86) | 1 (0.89) | 1 (0.89)
DS-3 | 100 | Running time (s) | 0.2 (0.3) | 0.01 (0.01) | 0.07 (0.07) | 0.4 (0.3) | 0.8 (0.9) | 0.01 (0.02) | 0.3 (0.2) | 0.01 (0.02)
DS-3 | 200 | Accuracy | 1 (0.92) | 0.68 (0.54) | 1 (0.56) | 1 (0.57) | 1 (0.91) | 1 (0.92) | 1 (0.92) | 1 (0.92)
DS-3 | 200 | Running time (s) | 0.2 (0.3) | 0.01 (0.02) | 0.09 (0.08) | 0.3 (0.4) | 0.8 (3.3) | 0.02 (0.05) | 0.2 (0.4) | 0.01 (0.02)
DS-3 | 400 | Accuracy | 0.98 (0.84) | 0.69 (0.61) | 0.98 (0.57) | 0.98 (0.56) | 0.98 (0.83) | 0.98 (0.62) | 0.96 (0.82) | 0.98 (0.84)
DS-3 | 400 | Running time (s) | 0.2 (0.3) | 0.01 (0.02) | 0.07 (0.1) | 0.3 (0.7) | 1.5 (12.5) | 0.06 (0.3) | 0.2 (0.3) | 0.02 (0.03)
DS-3 | 600 | Accuracy | 0.98 (0.80) | 0.76 (0.52) | 0.96 (0.50) | 0.98 (0.51) | 0.98 (0.78) | 0.99 (0.79) | 0.90 (0.79) | 0.97 (0.81)
DS-3 | 600 | Running time (s) | 0.2 (0.3) | 0.01 (0.04) | 0.07 (0.3) | 0.3 (1.1) | 1.7 (21.5) | 0.1 (1.1) | 0.3 (0.2) | 0.02 (0.04)
DS-4 | 100 | Accuracy | 0.98 (0.92) | 0.98 (0.95) | 0.95 (0.95) | 0.95 (0.95) | 0.95 (0.95) | 0.95 (0.95) | 0.92 (0.95) | 0.95 (0.95)
DS-4 | 100 | Running time (s) | 0.2 (0.2) | 0.01 (0.01) | 0.07 (0.06) | 0.2 (0.2) | 0.8 (0.7) | 0.02 (0.01) | 0.3 (0.2) | 0.01 (0.02)
DS-4 | 200 | Accuracy | 0.96 (0.98) | 0.94 (0.91) | 0.94 (0.96) | 0.95 (0.96) | 0.96 (0.96) | 0.96 (0.96) | 0.94 (0.96) | 0.96 (0.96)
DS-4 | 200 | Running time (s) | 0.3 (0.2) | 0.01 (0.02) | 0.09 (0.07) | 0.3 (0.4) | 1.6 (1.9) | 0.03 (0.02) | 0.4 (0.2) | 0.02 (0.01)
DS-4 | 400 | Accuracy | 0.98 (0.97) | 0.96 (0.93) | 0.96 (0.96) | 0.98 (0.96) | 0.98 (0.98) | 0.98 (0.98) | 0.96 (0.95) | 0.98 (0.98)
DS-4 | 400 | Running time (s) | 0.3 (0.2) | 0.02 (0.01) | 0.2 (0.07) | 0.4 (0.3) | 2.5 (2.6) | 0.2 (0.04) | 0.3 (0.2) | 0.03 (0.02)
DS-4 | 600 | Accuracy | 0.96 (0.96) | 0.95 (0.93) | 0.95 (0.93) | 0.96 (0.95) | 0.95 (0.96) | 0.95 (0.95) | 0.95 (0.94) | 0.95 (0.96)
DS-4 | 600 | Running time (s) | 0.2 (0.3) | 0.01 (0.02) | 0.08 (0.07) | 0.4 (0.3) | 3.2 (2.8) | 0.2 (0.07) | 0.2 (0.3) | 0.02 (0.03)
DS-5 | 100 | Accuracy | 0.92 (0.95) | 0.82 (0.90) | 0.88 (0.95) | 0.89 (0.95) | 0.90 (0.95) | 0.90 (0.95) | 0.85 (0.95) | 0.90 (0.95)
DS-5 | 100 | Running time (s) | 0.1 (0.2) | 0.02 (0.01) | 0.1 (0.07) | 0.4 (0.3) | 1.2 (0.6) | 0.03 (0.01) | 0.3 (0.2) | 0.03 (0.01)
DS-5 | 200 | Accuracy | 0.88 (0.98) | 0.81 (0.91) | 0.90 (0.96) | 0.91 (0.96) | 0.84 (0.96) | 0.89 (0.96) | 0.88 (0.96) | 0.79 (0.96)
DS-5 | 200 | Running time (s) | 0.1 (0.2) | 0.02 (0.01) | 0.09 (0.06) | 0.4 (0.2) | 1.2 (1.1) | 0.07 (0.02) | 0.3 (0.2) | 0.02 (0.01)
DS-5 | 400 | Accuracy | 0.88 (0.92) | 0.77 (0.62) | 0.87 (0.91) | 0.88 (0.91) | 0.88 (0.95) | 0.87 (0.94) | 0.88 (0.93) | 0.89 (0.95)
DS-5 | 400 | Running time (s) | 0.3 (0.2) | 0.03 (0.01) | 0.09 (0.07) | 0.4 (0.3) | 2.4 (1.6) | 0.1 (0.2) | 0.2 (0.3) | 0.02 (0.02)
DS-5 | 600 | Accuracy | 0.87 (0.92) | 0.71 (0.67) | 0.86 (0.91) | 0.89 (0.90) | 0.90 (0.94) | 0.92 (0.95) | 0.89 (0.93) | 0.92 (0.94)
DS-5 | 600 | Running time (s) | 0.2 (0.3) | 0.02 (0.01) | 0.1 (0.09) | 0.5 (0.6) | 2.5 (2.4) | 0.2 (0.1) | 0.2 (0.2) | 0.03 (0.02)
DS-6 | 100 | Accuracy | 0.90 (0.98) | 0.57 (0.62) | 0.90 (0.88) | 0.88 (0.95) | 0.95 (0.98) | 0.92 (0.95) | 0.90 (0.90) | 0.95 (0.98)
DS-6 | 100 | Running time (s) | 0.3 (0.2) | 0.01 (0.009) | 0.07 (0.07) | 0.2 (0.3) | 0.7 (0.5) | 0.02 (0.01) | 0.2 (0.3) | 0.01 (0.02)
DS-6 | 200 | Accuracy | 0.96 (0.91) | 0.71 (0.68) | 0.96 (0.85) | 0.96 (0.84) | 0.91 (0.89) | 0.91 (0.85) | 0.92 (0.82) | 0.92 (0.88)
DS-6 | 200 | Running time (s) | 0.3 (0.2) | 0.01 (0.01) | 0.08 (0.07) | 0.4 (0.3) | 1.5 (1.8) | 0.02 (0.03) | 0.3 (0.2) | 0.03 (0.02)
DS-6 | 400 | Accuracy | 0.91 (0.88) | 0.67 (0.71) | 0.89 (0.84) | 0.89 (0.85) | 0.91 (0.86) | 0.89 (0.86) | 0.88 (0.83) | 0.90 (0.86)
DS-6 | 400 | Running time (s) | 0.2 (0.3) | 0.02 (0.01) | 0.07 (0.08) | 0.4 (0.3) | 2.8 (4.1) | 0.08 (0.09) | 0.2 (0.5) | 0.02 (0.02)
DS-6 | 600 | Accuracy | 0.90 (0.91) | 0.75 (0.76) | 0.88 (0.86) | 0.88 (0.87) | 0.90 (0.89) | 0.89 (0.90) | 0.87 (0.87) | 0.89 (0.90)
DS-6 | 600 | Running time (s) | 0.3 (0.2) | 0.02 (0.01) | 0.08 (0.07) | 0.5 (0.4) | 5.01 (5.6) | 0.2 (0.2) | 0.3 (0.2) | 0.03 (0.02)
WDS | 100 | Accuracy | 0.78 (0.82) | 0.72 (0.75) | 0.76 (0.82) | 0.78 (0.83) | 0.83 (0.82) | 0.77 (0.83) | 0.70 (0.78) | 0.83 (0.82)
WDS | 100 | Running time (s) | 0.7 (0.7) | 0.03 (0.03) | 0.1 (0.1) | 0.7 (0.7) | 11.3 (11.1) | 1.4 (0.8) | 0.2 (0.2) | 0.03 (0.06)
WDS | 200 | Accuracy | 0.72 (0.85) | 0.66 (0.71) | 0.76 (0.76) | 0.77 (0.78) | 0.78 (0.88) | 0.78 (0.88) | 0.65 (0.80) | 0.79 (0.87)
WDS | 200 | Running time (s) | 0.9 (0.7) | 0.07 (0.08) | 0.2 (0.2) | 1.5 (1.3) | 23.8 (17.1) | 3.2 (2.9) | 0.2 (0.2) | 0.08 (0.09)
WDS | 400 | Accuracy | 0.85 (0.84) | 0.81 (0.75) | 0.87 (0.80) | 0.89 (0.82) | 0.91 (0.86) | 0.90 (0.86) | 0.80 (0.79) | 0.88 (0.86)
WDS | 400 | Running time (s) | 0.8 (0.7) | 0.2 (0.2) | 0.9 (0.7) | 3.4 (3.6) | 59.8 (30.5) | 13.5 (14.8) | 0.2 (0.2) | 0.2 (0.2)
WDS | 600 | Accuracy | 0.79 (0.87) | 0.64 (0.75) | 0.80 (0.87) | 0.83 (0.84) | 0.84 (0.87) | 0.86 (0.89) | 0.70 (0.81) | 0.84 (0.88)
WDS | 600 | Running time (s) | 1.5 (1.6) | 0.5 (0.8) | 2.3 (2.8) | 8.3 (8.1) | 78.6 (71.8) | 55.2 (52.6) | 0.3 (0.3) | 0.5 (0.6)
Average | 100 | Accuracy | 0.891 (0.88) | 0.767 (0.778) | 0.905 (0.854) | 0.90 (0.879) | 0.925 (0.875) | 0.916 (0.862) | 0.88 (0.845) | 0.929 (0.869)
Average | 100 | Running time (s) | 0.28 (0.24) | 0.015 (0.017) | 0.086 (0.088) | 0.39 (0.43) | 3.1 (4.8) | 0.26 (0.2) | 0.2 (0.2) | 0.01 (0.02)
Average | 200 | Accuracy | 0.90 (0.90) | 0.79 (0.77) | 0.931 (0.831) | 0.92 (0.817) | 0.911 (0.901) | 0.919 (0.904) | 0.89 (0.864) | 0.91 (0.911)
Average | 200 | Running time (s) | 0.31 (0.3) | 0.02 (0.03) | 0.09 (0.08) | 0.5 (0.4) | 4.5 (4.3) | 0.04 (0.04) | 0.2 (0.2) | 0.02 (0.02)
Average | 400 | Accuracy | 0.925 (0.892) | 0.827 (0.782) | 0.922 (0.832) | 0.929 (0.838) | 0.938 (0.903) | 0.936 (0.873) | 0.899 (0.859) | 0.938 (0.901)
Average | 400 | Running time (s) | 0.3 (0.3) | 0.04 (0.04) | 0.09 (0.08) | 0.4 (0.5) | 10.6 (8.4) | 2.03 (2.2) | 0.2 (0.3) | 0.03 (0.02)
Average | 600 | Accuracy | 0.919 (0.904) | 0.822 (0.786) | 0.903 (0.836) | 0.924 (0.848) | 0.925 (0.905) | 0.935 (0.911) | 0.88 (0.87) | 0.928 (0.912)
Average | 600 | Running time (s) | 0.3 (0.3) | 0.02 (0.03) | 0.3 (0.5) | 1.5 (1.5) | 14.05 (16.7) | 7.9 (7.8) | 0.3 (0.2) | 0.03 (0.03)
Average | All | Accuracy | 0.908 (0.894) | 0.802 (0.781) | 0.915 (0.838) | 0.918 (0.845) | 0.924 (0.896) | 0.926 (0.887) | 0.887 (0.859) | 0.926 (0.898)
Average | All | Running time (s) | 0.3 (0.3) | 0.02 (0.02) | 0.1 (0.2) | 0.6 (0.7) | 8.06 (8.5) | 2.5 (2.5) | 0.2 (0.2) | 0.02 (0.023)
Except Ref. [12], no previous study used the MCC metric to evaluate models.

Interestingly, in studies that used multiple datasets, the highest accuracy was achieved on the PolitiFact dataset, which contains few real and fake sequences. Here, the highest accuracy for binary classification is achieved on the GossipCop dataset, which is much larger in size than PolitiFact. For binary classification, some studies[36, 43, 46, 58, 59] achieved the highest accuracy of 0.99, followed by Refs. [34, 45, 48] with an accuracy of 0.98. Because LR outperformed the other classifiers in both types of classification (binary and MC), we have included the LR findings for comparison with other classifiers from the literature. LR outperformed the multimodal approaches[13, 17, 60–66] for fake news detection. Some studies, such as Refs. [58, 59], used their models for binary classification on the ISOT dataset and MC classification on the LIAR dataset. Table 12 lists state-of-the-art approaches (published during the last three years) for binary and MC fake news detection.

Fig. 4 MC classification results for LR.

The FNACSPM results (highlighted in bold) for both binary and MC classification outperformed the other classifiers. For MC classification, the majority of the previous studies used the LIAR dataset, which has 6 labels. The whole dataset used in this work for MC classification has 12 labels. Interestingly, the MC results
Table 12 Comparison of FNACSPM with recent studies for fake news identification.
Classification | Reference | Dataset used | Best learning model | ACC | P | R | F1 | MCC | AUC | AUPRC
Binary | [5] | PHEME | LSTM-RNN | – | 0.83 | 0.84 | 0.83 | – | – | –
Binary | [10] | BuzzFeed, PolitiFact | Linear SVM | – | – | – | 0.82 | – | – | –
Binary | [11] | BuzzFeed, PolitiFact | RF | 0.89 | 0.87 | 0.90 | 0.89 | – | – | –
Binary | [12] | George M., Kaggle, GossipCop, PolitiFact | BERT+CNN | 0.97 | 0.96 | 0.98 | 0.97 | 0.94 | – | –
Binary | [13] | BuzzFeed, PolitiFact | SAFE (Multimodal) | 0.87 | 0.88 | 0.90 | 0.89 | – | – | –
Binary | [14] | BuzzFeed, PolitiFact | DT | 0.92 | – | – | 0.93 | – | – | –
Binary | [17] | Twitter, Weibo | EANN (Multimodal) | 0.82 | 0.84 | 0.81 | 0.82 | – | – | –
Binary | [18] | FakeNewsAMT, Celebrity | Linear SVM | 0.76 | – | – | – | – | – | –
Binary | [30] | Combination of 5 datasets | HDSF | 0.82 | – | – | – | – | – | –
Binary | [33] | BuzzFeed | XGB | – | – | – | 0.81 | – | 0.86 | –
Binary | [34] | LIAR, George M., self-made | RoBERTa | 0.98 | 0.98 | 0.98 | 0.98 | – | – | –
Binary | [35] | BuzzFeed, PolitiFact | Linguistic+SVM | 0.84 | – | – | – | – | – | –
Binary | [36] | ISOT Fake News, 2 Kaggle, George M. | LIWC+RF | 0.99 | 0.99 | 1 | 0.99 | – | – | –
Binary | [37] | BuzzFeed, Random Political News, ISOT Fake News | TF-IDF+DT | 0.96 | 0.96 | 0.97 | 0.96 | – | – | –
Binary | [38] | GossipCop, PolitiFact | SAF | 0.69 | 0.63 | 0.78 | 0.70 | – | – | –
Binary | [39] | Weibo, self-made | TCNN-URG | 0.89 | – | – | – | – | – | –
Binary | [40] | News headlines from CNN, Daily Mail | BERT+WCE | – | – | – | 0.74 | – | – | –
Binary | [41] | GossipCop, PolitiFact | Co-attention network | 0.90 | 0.90 | 0.95 | 0.92 | – | – | –
Binary | [42] | GossipCop, PolitiFact | Co-attention network | 0.93 | 0.93 | 0.97 | 0.95 | – | – | –
Binary | [43] | ISOT Fake News, 2 Kaggle | GloVe+BiLSTM | 0.99 | 0.99 | 0.99 | 0.99 | – | – | –
Binary | [44] | GossipCop, PolitiFact | BERT+LSTM | 0.88 | 0.91 | 0.90 | 0.90 | – | – | –
Binary | [45] | Kaggle | BERT+CNN | 0.98 | – | – | – | – | – | –
Binary | [46] | Fake and Real News Dataset | BERT-based ensemble | 0.99 | 0.98 | 0.99 | 0.99 | – | – | –
Binary | [47] | George M. | Word2Vec+LSTM | 0.91 | 0.89 | 0.94 | 0.91 | – | – | –
Binary | [48] | Kaggle | GloVe+CNN | 0.98 | 0.99 | 0.96 | 0.98 | – | – | –
Binary | [58] | ISOT Fake News | CNN-ML | 0.99 | – | – | – | – | – | –
Binary | [59] | ISOT Fake News | Static+Capsule neural net | 0.99 | – | – | – | – | – | –
Binary | [60] | BuzzFeed, PolitiFact | TriFN (Multimodal) | 0.87 | 0.86 | 0.89 | 0.88 | – | – | –
Binary | [61] | GossipCop, Weibo | TRIMOON (Multimodal) | 0.91 | 0.92 | 0.88 | 0.90 | – | – | –
Binary | [62] | Twitter, Weibo A, Weibo B, Weibo C | MCN+CARN (Multimodal) | 0.92 | 0.92 | 0.92 | 0.92 | – | – | –
Binary | [63] | GossipCop, PolitiFact | BERT+CapsNet (Multimodal) | 0.93 | 0.92 | 0.91 | 0.92 | – | – | –
Binary | [64] | GossipCop, PolitiFact | SceneFND (Multimodal) | 0.83 | 0.84 | 0.84 | 0.83 | – | – | –
Binary | [65] | Twitter, Weibo | MPFN (Multimodal) | 0.88 | 0.82 | 0.82 | 0.81 | – | – | –
Binary | [66] | Twitter15, Twitter16 | GCAN (Multimodal) | 0.86 | 0.79 | 0.79 | 0.79 | – | – | –
Binary | FNACSPM (LR) | George M., GossipCop, PolitiFact, BuzzFeed, Fake News Classification, Fake and Real News Classification, all combined | LR | 1 | 1 | 1 | 1 | 1 | 1 | 1
MC | [5] | PHEME | LSTM-RNN | – | – | – | 0.79 | – | – | –
MC | [49] | LIAR, 6 classes | Hybrid CNN | 0.27 | – | – | – | – | – | –
MC | [50] | LIAR, 6 classes | MMDF | 0.34 | – | – | – | – | – | –
MC | [51] | PolitiFact, 6 classes | LIWC+LSTM | – | – | – | 0.22 | – | – | –
MC | [52] | LIAR, 6 classes | DT | 0.39 | – | – | – | – | – | –
MC | [53] | CT-FAN-21, 4 classes | RoBERTa | 0.47 | 0.36 | 0.34 | 0.29 | – | – | –
MC | [54] | LIAR, 6 classes | BiLSTM | 0.41 | – | – | – | – | – | –
MC | [55] | LIAR, 6 classes | BERT | 0.41 | – | – | – | – | – | –
MC | [56] | LIAR, 6 classes | BERT+CNN-BiLSTM | 0.47 | – | – | – | – | – | –
MC | [57] | LIAR, 6 classes | AC-BiLSTM | 0.33 | – | – | 0.36 | – | – | –
MC | [58] | LIAR, 6 classes | Static CNN-ML | 0.41 | – | – | – | – | – | –
MC | [59] | LIAR, 6 classes | Non-static+capsule neural net | 0.40 | – | – | – | – | – | –
MC | FNACSPM (LR, Case 1) | Combination of 6 datasets, 12 classes | LR | 0.77 | 0.76 | 0.63 | 0.78 | 0.66 | 0.97 | 0.82
MC | FNACSPM (LR, Case 2) | Combination of 6 datasets, 12 classes | LR | 0.80 | 0.81 | 0.78 | 0.78 | 0.65 | 0.89 | 0.92
MC | FNACSPM (BNB) | LIAR, 6 classes | BNB | 0.49 | 0.49 | 0.47 | 0.48 | 0.42 | 0.78 | 0.61
that we obtained with the classifiers in Case 1, when all the words in the pre-processed data are considered, are better than those of the approaches listed in Table 12, except for Ref. [5], which achieved the highest F1.

We also assessed the robustness and scalability of the proposed framework on the LIAR dataset[49], which contains 12 800 short, manually labeled statements, in various contexts, from PolitiFact. From this dataset, we took relevant attributes including "statement", "subject", "speaker", "speaker's job", "state", "party affiliation", and "context (venue)". For the LIAR dataset, BNB achieved the highest accuracy of 0.49 on patterns discovered by using TKS. This result for the LIAR dataset shows the superior performance of the proposed framework, which outperforms other approaches[49, 50, 52, 54–59] that also used the LIAR dataset for MC classification.

6 Conclusion

A novel SPM-based framework (called FNACSPM) is presented to analyze and classify fake news. Six diverse datasets, and their combination, were used to investigate the effectiveness and generalization ability of FNACSPM. The datasets were first abstracted, and algorithms for SPM were then applied to them to discover frequent words, their frequent sequential patterns, and sequential rules. The discovered frequent patterns were then used in the classification process. Eight classifiers were applied, and their performance was assessed and compared by using seven metrics. The results suggest that LR performed better than the others for binary and MC classification. It was also observed that (1) using all the words (or the entire news) not only provided less accurate results, compared to using frequent patterns, but also took more time and memory, and (2) limited (or short) sequences of news that contain only frequent patterns of words can be used for reliable prediction and classification rather than entire news articles. Moreover, FNACSPM outperformed previous fake news classification approaches. The proposed framework can handle both binary and MC classification tasks, showcasing its versatility and efficacy in distinguishing between fake and genuine news articles across different complexity levels. Additionally, the research has shed light on the linguistic and semantic structures underlying fake news articles through the utilization of frequent patterns.

This study has various limitations: (1) A drawback of using SPM for fake news classification is that it may exclude crucial words that serve as significant differentiators between fake and real news. This occurs when these words have low frequency and are not considered frequent patterns. As a result, this approach may overlook valuable discriminatory features, which
may affect the classification accuracy. (2) The credibility of online datasets used for training and testing may not be reliable, and bias in information collection may not be completely eliminated. (3) The interpretability of the extracted frequent sequential patterns and their relationship to the classification decisions may be limited. Understanding the underlying reasons behind the classification results and explaining them to users or stakeholders may be challenging. (4) Patterns and rules discovered by SPM algorithms require validation and verification from experts. The study focused on extracting patterns from static and retrospective datasets, which do not capture the dynamic nature of fake news propagation in real time. Real-time analysis and detection of emerging fake news may require additional considerations and techniques beyond pattern mining. Moreover, emerging or contrast pattern mining[78] can be used on the datasets to find contrasting frequent patterns of words, and these patterns can then be used for analysis and classification.

References

[1] X. Zhou and R. Zafarani, A survey of fake news: Fundamental theories, detection methods, and opportunities, ACM Comput. Surv., vol. 53, no. 5, pp. 1–40, 2020.
[2] G. Ruffo, A. Semeraro, A. Giachanou, and P. Rosso, Studying fake news spreading, polarisation dynamics, and manipulation by bots: A tale of networks and language, Comput. Sci. Rev., vol. 47, p. 100531, 2023.
[3] X. Zhang and A. A. Ghorbani, An overview of online fake news: Characterization, detection, and discussion, Inf. Process. Manag., vol. 57, p. 102025, 2020.
[4] C. Kong, G. Luo, L. Tian, and X. Cao, Disseminating authorized content via data analysis in opportunistic social networks, Big Data Mining and Analytics, vol. 2, no. 1, pp. 12–24, 2019.
[5] S. A. Alkhodair, S. H. H. Ding, B. C. M. Fung, and J. Liu, Detecting breaking news rumors of emerging topics in social media, Inf. Process. Manag., vol. 57, p. 102018, 2020.
[6] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, Fake news detection on social media: A data mining perspective, arXiv preprint arXiv: 1708.01967, 2017.
[7] T. Buchanan, Why do people spread false information online? The effects of message and viewer characteristics on self-reported likelihood of sharing social media disinformation, PLoS One, vol. 15, no. 10, p. e0239666, 2020.
[8] C. Boididou, S. Papadopoulos, Y. Kompatsiaris, S. Schifferes, and N. Newman, Challenges of computational verification in social multimedia, in Proc. 23rd Int. Conf. World Wide Web, Seoul, Republic of Korea, 2014, pp. 743–748.
[9] C. Boididou, S. Papadopoulos, M. Zampoglou, L. Apostolidis, O. Papadopoulou, and Y. Kompatsiaris, Detection and visualization of misleading content on Twitter, Int. J. Multimed. Inf. Retr., vol. 7, no. 1, pp. 71–86, 2018.
[10] N. Sitaula, C. K. Mohan, J. Grygiel, X. Zhou, and R. Zafarani, Credibility-based fake news detection, in Disinformation, Misinformation, and Fake News in Social Media, K. Shu, S. Wang, D. Lee, and H. Liu, eds. Cham, Switzerland: Springer, 2020, pp. 163–182.
[11] X. Zhou, A. Jain, V. V. Phoha, and R. Zafarani, Fake news early detection: A theory-driven model, Digit. Threats Res. Pract., vol. 1, no. 2, p. 12, 2020.
[12] M. Choudhary, S. S. Chouhan, E. S. Pilli, and S. K. Vipparthi, BerConvoNet: A deep learning framework for fake news classification, Appl. Soft Comput., vol. 110, p. 107614, 2021.
[13] X. Zhou, J. Wu, and R. Zafarani, SAFE: Similarity-aware multi-modal fake news detection, in Proc. 24th Pacific-Asia Conference, PAKDD 2020, Singapore, 2020, pp. 354–367.
[14] X. Zhou and R. Zafarani, Network-based fake news detection: A pattern-driven approach, arXiv preprint arXiv: 1906.04210, 2019.
[15] B. Shi and T. Weninger, Discriminative predicate path mining for fact checking in knowledge graphs, Knowl. Based Syst., vol. 104, no. C, pp. 123–133, 2016.
[16] G. L. Ciampaglia, P. Shiralkar, L. M. Rocha, J. Bollen, F. Menczer, and A. Flammini, Computational fact checking from knowledge networks, PLoS One, vol. 10, no. 6, p. e0128193, 2015.
[17] Y. Wang, F. Ma, Z. Jin, Y. Yuan, G. Xun, K. Jha, L. Su, and J. Gao, EANN: Event adversarial neural networks for multi-modal fake news detection, in Proc. 24th ACM SIGKDD Conf. Knowledge Discovery & Data Mining, London, UK, 2018, pp. 849–857.
[18] V. Pérez-Rosas, B. Kleinberg, A. Lefevre, and R. Mihalcea, Automatic detection of fake news, arXiv preprint arXiv: 1708.07104, 2017.
[19] P. Fournier-Viger, J. C. W. Lin, R. U. Kiran, Y. S. Koh, and R. Thomas, A survey of sequential pattern mining, Data Science and Pattern Recognition, vol. 1, no. 1, pp. 54–77, 2017.
[20] M. Cheng, X. Jin, Y. Wang, X. Wang, and J. Chen, A sequential pattern mining approach to tourist movement: The case of a mega event, J. Travel. Res., vol. 62, no. 6, pp. 1237–1256, 2023.
[21] M. S. Nawaz, P. Fournier-Viger, M. Aslam, W. Li, Y. He, and X. Niu, Using alignment-free and pattern mining methods for SARS-CoV-2 genome analysis, Appl. Intell., vol. 53, no. 19, pp. 21920–21943, 2023.
[22] M. S. Nawaz, P. Fournier-Viger, Y. He, and Q. Zhang, PSAC-PDB: Analysis and classification of protein structures, Comput. Biol. Med., vol. 158, p. 106814, 2023.
[23] L. Ni, W. Luo, N. Lu, and W. Zhu, Mining the local dependency itemset in a products network, ACM Trans. Manage. Inf. Syst., vol. 11, no. 1, pp. 1–31, 2020.
[24] R. U. Mustafa, M. S. Nawaz, J. Ferzund, M. I. U. Lali, B. Shahzad, and P. Fournier-Viger, Early detection of controversial Urdu speeches from social media, Data Science and Pattern Recognition, vol. 1, no. 2, pp. 26–42, 2017.
[25] D. Schweizer, M. Zehnder, H. Wache, H. F. Witschel, D. Zanatta, and M. Rodriguez, Using consumer behavior data to reduce energy consumption in smart homes: Applying
machine learning to save energy without lowering comfort of inhabitants, in Proc. IEEE 14th Int. Conf. Machine Learning and Applications (ICMLA), Miami, FL, USA, 2015, pp. 1123–1129.
[26] M. S. Nawaz, P. Fournier-Viger, M. Z. Nawaz, G. Chen, and Y. Wu, MalSPM: Metamorphic malware behavior analysis and classification using sequential pattern mining, Comput. Secur., vol. 118, p. 102741, 2022.
[27] M. S. Nawaz, M. Sun, and P. Fournier-Viger, Proof guidance in PVS with sequential pattern mining, in Proc. FSEN 2019, Tehran, Iran, 2019, pp. 45–60.
[28] P. Fournier-Viger, T. Gueniche, and V. S. Tseng, Using partially-ordered sequential rules to generate more accurate sequence prediction, in Proc. 8th Int. Conf. Advanced Data Mining and Applications, ADMA 2012, Nanjing, China, 2012, pp. 431–442.
[29] S. Feng, R. Banerjee, and Y. Choi, Syntactic stylometry for deception detection, in Proc. 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012, Jeju Island, Republic of Korea, 2012, pp. 171–175.
[30] H. Karimi and J. Tang, Learning hierarchical discourse-level structure for fake news detection, in Proc. 2019 Conf. the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2019, pp. 3432–3442.
[31] V. L. Rubin and T. Lukoianova, Truth and deception at the rhetorical structure level, J. Assoc. Inf. Sci. Technol., vol. 66, no. 5, pp. 905–917, 2015.
[32] B. Horne and S. Adali, This just in: Fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news, Proc. Int. AAAI Conf. Web Soc. Medium., vol. 11, no. 1, pp. 759–766, 2017.
[33] J. C. S. Reis, A. Correia, F. Murai, A. Veloso, and F. Benevenuto, Supervised learning for fake news detection, IEEE Intell. Syst., vol. 34, no. 2, pp. 76–81, 2019.
[34] J. Y. Khan, M. T. I. Khondaker, S. Afroz, G. Uddin, and A. Iqbal, A benchmark study of machine learning models for online fake news detection, Mach. Learn. Appl., vol. 4, p. 100032, 2021.
[35] G. Gravanis, A. Vakali, K. Diamantaras, and P. Karadais, Behind the cues: A benchmarking study for fake news detection, Expert Syst. Appl., vol. 128, no. C, pp. 201–213, 2019.
[36] I. Ahmad, M. Yousaf, S. Yousaf, and M. O. Ahmad, Fake news detection using machine learning ensemble methods, Complexity, vol. 2020, p. 8885861, 2020.
[37] F. A. Ozbay and B. Alatas, Fake news detection within online social media using supervised artificial intelligence algorithms, Phys. A: Stat. Mech. Appl., vol. 540, p. 123174, 2020.
[38] K. Shu, D. Mahudeswaran, S. Wang, D. Lee, and H. Liu, FakeNewsNet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media, Big Data, vol. 8, no. 3, pp. 171–188, 2020.
[39] F. Qian, C. Gong, K. Sharma, and Y. Liu, Neural user response generator: Fake news detection with collective user intelligence, in Proc. 27th Int. Joint Conf. Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 2018, pp. 3834–3840.
[40] H. Jwa, D. Oh, K. Park, J. Kang, and H. Lim, exBAKE: Automatic fake news detection model based on bidirectional encoder representations from transformers (BERT), Appl. Sci., vol. 9, no. 19, p. 4062, 2019.
[41] K. Shu, L. Cui, S. Wang, D. Lee, and H. Liu, dEFEND: Explainable fake news detection, in Proc. 25th ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining, Anchorage, AK, USA, 2019, pp. 395–405.
[42] F. Khan, R. Alturki, G. Srivastava, F. Gazzawe, S. T. U. Shah, and S. Mastorakis, Explainable detection of fake news on social media using pyramidal co-attention network, IEEE Trans. Comput. Soc. Syst., doi: 10.1109/TCSS.2022.3207993.
[43] I. K. Sastrawan, I. P. A. Bayupati, and D. M. S. Arsa, Detection of fake news using deep learning CNN–RNN based methods, ICT Express, vol. 8, no. 3, pp. 396–408, 2022.
[44] N. Rai, D. Kumar, N. Kaushik, C. Raj, and A. Ali, Fake news classification using transformer based enhanced LSTM and BERT, Int. J. Cogn. Comput. Eng., vol. 3, pp. 98–105, 2022.
[45] R. K. Kaliyar, A. Goswami, and P. Narang, FakeBERT: Fake news detection in social media with a BERT-based deep learning approach, Multimed. Tools Appl., vol. 80, no. 8, pp. 11765–11788, 2021.
[46] S. Y. Lin, Y. C. Kung, and F. Y. Leu, Predictive intelligence in harmful news identification by BERT-based ensemble learning model with text sentiment analysis, Inf. Process. Manag., vol. 59, no. 2, p. 102872, 2022.
[47] S. Deepak and B. Chitturi, Deep neural approach to fake-news identification, Procedia Comput. Sci., vol. 167, pp. 2236–2243, 2020.
[48] R. K. Kaliyar, A. Goswami, P. Narang, and S. Sinha, FNDNet—A deep convolutional neural network for fake news detection, Cogn. Syst. Res., vol. 61, no. C, pp. 32–44, 2020.
[49] W. Y. Wang, “Liar, liar pants on fire”: A new benchmark dataset for fake news detection, arXiv preprint arXiv: 1705.00648, 2017.
[50] H. Karimi, P. C. Roy, S. Saba-Sadiya, and J. Tang, Multi-source multi-class fake news detection, in Proc. 27th Int. Conf. Computational Linguistics (COLING), Santa Fe, NM, USA, 2018, pp. 1546–1557.
[51] H. Rashkin, E. Choi, J. Y. Jang, S. Volkova, and Y. Choi, Truth of varying shades: Analyzing language in fake news and political fact-checking, in Proc. 2017 Conf. Empirical Methods in Natural Language Processing (EMNLP), Copenhagen, Denmark, 2017, pp. 2931–2937.
[52] T. Rasool, W. H. Butt, A. Shaukat, and M. U. Akram, Multi-label fake news detection using multi-layered supervised learning, in Proc. 2019 11th Int. Conf. Computer and Automation Engineering, Perth, Australia, 2019, pp. 73–77.
[53] M. Arif, A. L. Tonja, I. Ameer, O. Kolesnikova, A. F. Gelbukh, G. Sidorov, and A. G. M. Meque, CIC at CheckThat! 2022: Multi-class and cross-lingual fake news detection, in Proc. CEUR Workshop, Bologna, Italy, 2022, pp. 434–443.
[54] Y. Long, Q. Lu, R. Xiang, M. Li, and C. R. Huang, Fake news detection through multi-perspective speaker profiles, in Proc. 8th Int. Joint Conf. Natural Language Processing (IJCNLP), Taipei, China, 2017, pp. 252–256.
[55] N. Singh, R. K. Kaliyar, T. Vivekanand, K. Uthkarsh, V. Mishra, and A. Goswami, B-LIAR: A novel model for
handling multiclass fake news data utilizing a transformer encoder stack-based architecture, in Proc. 1st Int. Conf. Informatics (ICI), Noida, India, 2022, pp. 31–35.
[56] J. Alghamdi, Y. Lin, and S. Luo, Modeling fake news detection using BERT-CNN-BiLSTM architecture, in Proc. IEEE 5th Int. Conf. Multimedia Information Processing and Retrieval (MIPR), CA, USA, 2022, pp. 354–357.
[57] T. E. Trueman, J. Ashok Kumar, P. Narayanasamy, and J. Vidya, Attention-based C-BiLSTM for fake news detection, Appl. Soft Comput., vol. 110, p. 107600, 2021.
[58] M. H. Goldani, R. Safabakhsh, and S. Momtazi, Convolutional neural network with margin loss for fake news detection, Inf. Process. Manag., vol. 58, no. 1, p. 102418, 2021.
[59] M. H. Goldani, S. Momtazi, and R. Safabakhsh, Detecting fake news with capsule neural networks, Appl. Soft Comput., vol. 101, p. 106991, 2021.
[60] K. Shu, S. Wang, and H. Liu, Beyond news contents: The role of social context for fake news detection, arXiv preprint arXiv: 1712.07709, 2017.
[61] S. Xiong, G. Zhang, V. Batra, L. Xi, L. Shi, and L. Liu, TRIMOON: Two-round inconsistency-based multi-modal fusion network for fake news detection, Inf. Fusion, vol. 93, no. C, pp. 150–158, 2023.
[62] C. Song, N. Ning, Y. Zhang, and B. Wu, A multimodal fake news detection model based on crossmodal attention residual and multichannel convolutional neural networks, Inf. Process. Manag., vol. 58, no. 1, p. 102437, 2021.
[63] B. Palani, S. Elango, and V. K. Vignesh, CB-Fake: A multimodal deep learning framework for automatic fake news detection using capsule neural network and BERT, Multimed. Tools Appl., vol. 81, no. 4, pp. 5587–5620, 2022.
[64] G. Zhang, A. Giachanou, and P. Rosso, SceneFND: Multimodal fake news detection by modelling scene context information, J. Inf. Sci., vol. 50, no. 2, pp. 355–367, 2022.
[65] J. Jing, H. Wu, J. Sun, X. Fang, and H. Zhang, Multimodal fake news detection via progressive fusion networks, Inf. Process. Manag., vol. 60, no. 1, p. 103120, 2023.
[66] Y. J. Lu and C. T. Li, GCAN: Graph-aware co-attention networks for explainable fake news detection on social media, in Proc. 58th Annual Meeting of the Association for Computational Linguistics, Virtual Event, 2020, pp. 504–514.
[67] G. McIntire, Fake Real News Dataset, https://github.com/GeorgeMcIntire/fake_real_news_dataset, 2024.
[68] Kaggle, BuzzFeed News Analysis and Classification, http://kaggle.com/code/sohamohajeri/buzzfeed-news-analysis-and-classification/, 2024.
[69] Kaggle, Fake News Classification, http://kaggle.com/datasets/saurabhshahane/fake-news-classification, 2024.
[70] Kaggle, Fake and Real News Dataset, http://github.com/MuhammadzohaibNawaz/FakeNewDS6, 2024.
[71] M. S. Nawaz, P. Fournier-Viger, A. Shojaee, and H. Fujita, Using artificial intelligence techniques for COVID-19 genome analysis, Appl. Intell., vol. 51, no. 5, pp. 3086–3103, 2021.
[72] R. Agrawal and R. Srikant, Fast algorithms for mining association rules in large databases, in Proc. 20th VLDB, Santiago, Chile, 1994, pp. 487–499.
[73] P. Fournier-Viger, A. Gomariz, T. Gueniche, E. Mwamikazi, and R. Thomas, TKS: Efficient mining of top-k sequential patterns, in Proc. 9th Int. Conf. Advanced Data Mining and Applications (ADMA), Hangzhou, China, 2013, pp. 109–120.
[74] P. Fournier-Viger, A. Gomariz, M. Campos, and R. Thomas, Fast vertical mining of sequential patterns using co-occurrence information, in Advances in Knowledge Discovery and Data Mining, V. S. Tseng, T. B. Ho, Z. H. Zhou, A. L. P. Chen, and H. Y. Kao, eds. Cham, Switzerland: Springer, 2014, pp. 40–52.
[75] P. Fournier-Viger, T. Gueniche, S. Zida, and V. S. Tseng, ERMiner: Sequential rule mining using equivalence classes, in Advances in Intelligent Data Analysis XIII, H. Blockeel, M. van Leeuwen, and V. Vinciotti, eds. Cham, Switzerland: Springer, 2014, pp. 108–119.
[76] P. Fournier-Viger, J. C. W. Lin, A. Gomariz, T. Gueniche, A. Soltani, Z. Deng, and H. T. Lam, The SPMF open-source data mining library version 2, in Machine Learning and Knowledge Discovery in Databases, B. Berendt, B. Bringmann, É. Fromont, G. Garriga, P. Miettinen, N. Tatti, and V. Tresp, eds. Cham, Switzerland: Springer, 2016, pp. 36–40.
[77] O. Kramer, Scikit-learn, in Machine Learning for Evolution Strategies, O. Kramer, ed. Cham, Switzerland: Springer, 2016, pp. 45–53.
[78] S. Ventura and J. M. Luna, Supervised Descriptive Pattern Mining. Berlin, Germany: Springer, 2018.
M. Saqib Nawaz received the BS degree in computer systems engineering from University of Engineering and Technology, Peshawar, Pakistan in 2011, the MS degree in computer science from University of Sargodha, Pakistan in 2014, and the PhD degree from Peking University, Beijing, China in 2019. He worked as a postdoctoral fellow at Harbin Institute of Technology (Shenzhen), China from September 2019 to January 2022. He is currently working as an associate researcher at Shenzhen University, China. His research interests include bioinformatics, pattern mining, formal methods, and the use of machine learning and data mining in software engineering.

M. Zohaib Nawaz received the bachelor's degree from University of Sargodha, Pakistan in 2016 and the master's degree from National University of Sciences and Technology (NUST), Pakistan in 2020. He has been a lecturer (on leave) at the Department of Computer Science, Faculty of Computing and Information Technology, University of Sargodha, Pakistan since 2018. Currently, he is pursuing the PhD degree in computer science at Shenzhen University, China. His research interests include descriptive pattern mining and formal methods. He is a member of ACM.