Hindawi

Complexity
Volume 2020, Article ID 8885861, 11 pages
https://doi.org/10.1155/2020/8885861

Research Article
Fake News Detection Using Machine Learning Ensemble Methods

Iftikhar Ahmad,1 Muhammad Yousaf,1 Suhail Yousaf,1 and Muhammad Ovais Ahmad2

1 Department of Computer Science and Information Technology, University of Engineering and Technology, Peshawar, Pakistan
2 Department of Mathematics and Computer Science, Karlstad University, Karlstad, Sweden

Correspondence should be addressed to Muhammad Ovais Ahmad; [email protected]

Received 4 September 2020; Revised 14 September 2020; Accepted 16 September 2020; Published 17 October 2020

Academic Editor: M. Irfan Uddin

Copyright © 2020 Iftikhar Ahmad et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The advent of the World Wide Web and the rapid adoption of social media platforms (such as Facebook and Twitter) paved the way for information dissemination on a scale never before witnessed in human history. With the current usage of social media platforms, consumers are creating and sharing more information than ever before, some of it misleading and with no relevance to reality. Automated classification of a text article as misinformation or disinformation is a challenging task; even an expert in a particular domain has to explore multiple aspects before giving a verdict on the truthfulness of an article. In this work, we propose a machine learning ensemble approach for the automated classification of news articles. Our study explores different textual properties that can be used to distinguish fake content from real. Using those properties, we train a combination of different machine learning algorithms with various ensemble methods and evaluate their performance on four real-world datasets. Experimental evaluation confirms the superior performance of our proposed ensemble learner approach in comparison to individual learners.

1. Introduction

The advent of the World Wide Web and the rapid adoption of social media platforms (such as Facebook and Twitter) paved the way for information dissemination on a scale never before witnessed in human history. Besides other use cases, news outlets benefited from the widespread use of social media platforms by providing updated news in near real time to their subscribers. The news media evolved from newspapers, tabloids, and magazines to digital forms such as online news platforms, blogs, social media feeds, and other digital media formats [1]. It became easier for consumers to acquire the latest news at their fingertips; Facebook referrals alone account for 70% of traffic to news websites [2]. These social media platforms in their current state are extremely powerful and useful, allowing users to discuss and share ideas and to debate issues such as democracy, education, and health. However, such platforms are also used with a negative perspective by certain entities, commonly for monetary gain [3, 4] and, in other cases, for creating biased opinions, manipulating mindsets, and spreading satire or absurdity. The phenomenon is commonly known as fake news.

There has been a rapid increase in the spread of fake news in the last decade, most prominently observed in the 2016 US elections [5]. Such proliferation of online articles that do not conform to facts has led to many problems, not limited to politics but covering various other domains such as sports, health, and even science [3]. One area affected by fake news is the financial markets [6], where a rumor can have disastrous consequences and may bring the market to a halt.

Our ability to make decisions relies mostly on the type of information we consume; our world view is shaped by the information we digest. There is increasing evidence that consumers have reacted absurdly to news that later proved to be fake [7, 8]. One recent case is the spread of the novel coronavirus, where fake reports spread over the Internet about the origin, nature, and behavior of the virus [9]. The situation worsened as more people read the fake content online. Identifying such news online is a daunting task.

Fortunately, there are a number of computational techniques that can be used to mark certain articles as fake on the basis of their textual content [10]. The majority of these techniques use fact-checking websites such as PolitiFact and Snopes. There are also a number of repositories maintained by researchers that contain lists of websites identified as ambiguous or fake [11]. However, the problem with these resources is that human expertise is required to identify articles and websites as fake. More importantly, the fact-checking websites cover articles from particular domains, such as politics, and do not generalize to fake news articles from other domains, such as entertainment, sports, and technology.

The World Wide Web contains data in diverse formats such as documents, videos, and audio. News published online in an unstructured format (articles, videos, and audio) is relatively difficult to detect and classify, as this strictly requires human expertise. However, computational techniques such as natural language processing (NLP) can be used to detect anomalies that separate a text article that is deceptive in nature from articles that are based on facts [12]. Other techniques involve analyzing the propagation of fake news in contrast with real news [13]; more specifically, such an approach analyzes how a fake news article propagates through a network differently from a true article. The response that an article receives can be differentiated at a theoretical level to classify the article as real or fake. A hybrid approach can also be used, analyzing the social response to an article along with its textual features to examine whether the article is deceptive in nature.

A number of studies have primarily focused on detection and classification of fake news on social media platforms such as Facebook and Twitter [13, 14]. At a conceptual level, fake news has been classified into different types; this knowledge is then expanded to generalize machine learning (ML) models across multiple domains [10, 15, 16]. The study by Ahmed et al. [17] extracted linguistic features such as n-grams from textual articles and trained multiple ML models, including K-nearest neighbors (KNN), support vector machine (SVM), logistic regression (LR), linear support vector machine (LSVM), decision tree (DT), and stochastic gradient descent (SGD), achieving the highest accuracy (92%) with SVM and logistic regression. According to that research, as n increased in the n-grams calculated for a particular article, the overall accuracy decreased; the phenomenon was observed across the learning models used for classification. Shu et al. [12] achieved better accuracies with different models by combining textual features with auxiliary information such as user social engagements on social media. The authors also discussed social and psychological theories and how they can be used to detect false information online. Further, the authors discussed different data mining algorithms for model construction, along with techniques for feature extraction. These models are based on knowledge, such as writing style, and on social context, such as stance and propagation.

A different approach is followed by Wang [18]. The author used textual features and metadata to train various ML models, focusing mainly on a convolutional neural network (CNN). A convolutional layer is used to capture the dependency between the metadata vectors, followed by a bidirectional LSTM layer. The max-pooled text representations are concatenated with the metadata representation from the bidirectional LSTM and fed to a fully connected layer with a softmax activation function to generate the final prediction. The research was conducted on a dataset from the political domain containing statements from two different parties; metadata such as subject, speaker, job, state, party, context, and history are also included in the feature set. An accuracy of 27.7% was achieved with the combination of text and speaker features, whereas 27.4% accuracy was achieved by combining all the different metadata elements with text. A competitive solution is provided by Riedel et al. [19]: a stance detection system that assigns one of four labels to an article ("agree," "disagree," "discuss," or "unrelated"), depending on the conformity of the article headline with the article text. The authors used linguistic properties of text, namely term frequency (TF) and term frequency-inverse document frequency (TF-IDF), as the feature set, and a multilayer perceptron (MLP) classifier with one hidden layer and a softmax function on the output of the final layer. The dataset contained articles with a headline, body, and label. The system's accuracy on the "disagree" label was poor on test examples, whereas it performed best on the "agree" label. The authors used a simple MLP with some fine-tuned hyperparameters to achieve an overall accuracy of 88.46%. Shu et al. [12] also discussed several varieties of veracity assessment methods for detecting fake news online. Two major categories of assessment methods are explored: linguistic cue approaches and network analysis approaches; a combination of both creates a more robust hybrid approach for online fake news detection. Linguistic approaches involve deep syntax, rhetorical structure, and discourse analysis, and they are used to train classifiers such as SVM or naïve Bayes models. Network-based approaches involve analyzing and processing social network behavior and linked data. A unique approach is followed by Vosoughi et al. [13] to explore the properties of news spread on social media: the authors studied the spread of news (rumors) on Twitter and analyzed how the diffusion of fake news differs from that of real news. Multiple analysis techniques are discussed in the paper, such as the depth, size, maximum breadth, and structural virality of true and false rumor cascades, their mean breadth at various depths, the number of unique Twitter users reached at any depth, and the number of minutes it takes true and false rumor cascades to reach a given depth and number of Twitter users.

1.1. Our Contributions. In the current fake news corpus, there have been multiple instances where both supervised and unsupervised learning algorithms are used to classify text [20, 21]. However, most of the literature focuses on specific datasets or domains, most prominently the politics domain [10, 19, 21]. Therefore, the trained algorithm works

best on a particular domain of articles and does not achieve optimal results when exposed to articles from other domains. Since articles from different domains have a unique textual structure, it is difficult to train a generic algorithm that works best on all news domains. In this paper, we propose a solution to the fake news detection problem using a machine learning ensemble approach. Our study explores different textual properties that can be used to distinguish fake content from real. Using those properties, we train a combination of different machine learning algorithms with various ensemble methods that are not thoroughly explored in the current literature. Ensemble learners have proven useful in a wide variety of applications, as the learning models tend to reduce the error rate through techniques such as bagging and boosting [22]. These techniques facilitate training different machine learning algorithms in an effective and efficient manner. We also conducted extensive experiments on four real-world publicly available datasets. The results validate the improved performance of our proposed technique on the four commonly used performance metrics (namely, accuracy, precision, recall, and F1 score).

2. Materials and Methods

In the following, we describe our proposed framework, followed by the description of algorithms, datasets, and performance evaluation metrics.

2.1. Proposed Framework. In our proposed framework, as illustrated in Figure 1, we expand on the current literature by introducing ensemble techniques with various linguistic feature sets to classify news articles from multiple domains as true or fake. The ensemble techniques, together with the Linguistic Inquiry and Word Count (LIWC) feature set used in this research, are the novelty of our proposed approach.

There are numerous reputable websites that post legitimate news content, and a few other websites, such as PolitiFact and Snopes, that are used for fact checking. In addition, there are open repositories maintained by researchers [11] that keep an up-to-date list of currently available datasets and hyperlinks to potential fact-checking sites that may help in countering the spread of false news. We selected three datasets for our experiments that contain news from multiple domains (such as politics, entertainment, technology, and sports) and contain a mix of both truthful and fake articles. The datasets are available online and were extracted from the World Wide Web. The first dataset is the ISOT Fake News Dataset [23]; the second and third datasets are publicly available at Kaggle [24, 25]. A detailed description of the datasets is provided in Section 2.5.

The corpus collected from the World Wide Web is preprocessed before being used as input for training the models. Unwanted variables of the articles, such as author, date posted, URL, and category, are filtered out. Articles with no body text or with fewer than 20 words in the article body are also removed. Multicolumn articles are transformed into single-column articles for uniformity of format and structure. These operations are performed on all the datasets to achieve consistency of format and structure.

Once the relevant attributes are selected after the data cleaning and exploration phase, the next step involves extraction of the linguistic features. Linguistic features are textual characteristics converted into numerical form so that they can be used as input for the training models. These features include the percentage of words implying positive or negative emotions; the percentage of stop words; punctuation; function words; informal language; and the percentage of certain parts of speech used in sentences, such as adjectives, prepositions, and verbs. To extract the features from the corpus, we used the LIWC2015 tool, which classifies the text into different discrete and continuous variables, some of which are mentioned above. The LIWC tool extracts 93 different features from any given text. As all of the features extracted by the tool are numerical values, no encoding is required for categorical variables. However, scaling is employed to ensure that each feature's values lie in the range (0, 1). This is necessary because some values lie in the range 0 to 100 (such as percentages), whereas others have an arbitrary range (such as word counts). The input features are then used to train the different machine learning models. Each dataset is divided into training and testing sets with a 70/30 split, and the articles are shuffled to ensure a fair allocation of fake and true articles to training and test instances.
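The scaling and splitting steps just described map onto standard scikit-learn utilities. A minimal sketch, assuming the 93 LIWC features are already collected in a matrix X with binary labels y (the variable names, the synthetic stand-in data, and the use of scikit-learn are our assumptions; the paper does not name its tooling):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# Stand-in for the LIWC output: 1000 articles x 93 numerical features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(1000, 93))  # mix of percentages and counts
y = rng.integers(0, 2, size=1000)         # 1 = true article, 0 = fake article

# Rescale every feature into the (0, 1) range, as described above.
X_scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

# 70/30 split with shuffling; stratify keeps the fake/true ratio balanced
# across the two sets (our reading of "fair allocation").
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.30, shuffle=True, stratify=y, random_state=42)
```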
The learning algorithms are trained with different hyperparameters to achieve maximum accuracy for a given dataset, with an optimal balance between variance and bias. Each model is trained multiple times with different parameter sets using a grid search to optimize the model for the best outcome. Using a grid search to find the best parameters is computationally expensive [26]; however, the measure is taken to ensure the models neither overfit nor underfit the data.
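A sketch of this per-model tuning step with scikit-learn's GridSearchCV, continuing with X_train and y_train from the previous sketch; the grid shown is illustrative only, since the paper does not report the grids it searched:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Illustrative grid; the actually searched values are not reported.
param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0],  # inverse regularization strength
    "solver": ["lbfgs", "liblinear"],
}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="accuracy",
    cv=5,        # assumed fold count for the inner search
    n_jobs=-1,   # parallelize, since grid search is expensive
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```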
Novel to this research, various ensemble techniques, such as bagging, boosting, and voting classifiers, are explored to evaluate performance over the multiple datasets. We used two different voting classifiers, each composed of three learning models: the first is an ensemble of logistic regression, random forest, and KNN, whereas the second consists of logistic regression, linear SVM, and classification and regression trees (CART). The criterion for the voting classifiers is to train the individual models with their best parameters and then predict the output label on the basis of the majority vote of the three models. We trained a bagging ensemble consisting of 100 decision trees, and two boosting ensemble algorithms are used, XGBoost and AdaBoost. k-fold (k = 10) cross validation is employed for all ensemble learners. The learning models used are described in detail in Section 2.2. To evaluate the performance of each model, we used the accuracy, precision, recall, and F1 score metrics discussed in Section 2.6.
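This configuration maps directly onto scikit-learn and the xgboost package; a sketch under the assumption that those libraries were used (the paper does not name its implementation), reusing X_scaled and y from the earlier sketch:

```python
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier  # requires the xgboost package

# The two voting ensembles, composed as described above.
voting_1 = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier()),
    ("knn", KNeighborsClassifier()),
], voting="hard")  # final label = majority vote of the three models

voting_2 = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("lsvm", LinearSVC(max_iter=5000)),
    ("cart", DecisionTreeClassifier()),
], voting="hard")

ensembles = {
    "voting (RF, LR, KNN)": voting_1,
    "voting (LR, LSVM, CART)": voting_2,
    "bagging (100 decision trees)": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=100),
    "AdaBoost": AdaBoostClassifier(),
    "XGBoost": XGBClassifier(),
}

# 10-fold cross validation for every ensemble learner, as stated above.
for name, model in ensembles.items():
    scores = cross_val_score(model, X_scaled, y, cv=10, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```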
2.2. Algorithms. We used the following learning algorithms in conjunction with our proposed methodology to evaluate the performance of fake news detection classifiers.

[Figure 1 is a workflow diagram: data scraping from the World Wide Web feeds dataset collection and data cleaning/exploration; linguistic features are extracted and selected; models are trained on the training data using ensemble methods with hyperparameter/model tuning; the models are evaluated on the test data; and the trained model classifies user queries.]

Figure 1: Workflow for training algorithms and classification of news articles.

2.2.1. Logistic Regression. As we are classifying text on the basis of a wide feature set with a binary output (true article/fake article), a logistic regression (LR) model is used, since it provides an intuitive equation for classifying problems into binary or multiple classes [27]. We performed hyperparameter tuning to get the best result for each individual dataset, testing multiple parameter settings before obtaining the maximum accuracy from the LR model. Mathematically, the logistic regression hypothesis function can be defined as follows [27]:

h_\theta(X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}.  (1)

Logistic regression uses the sigmoid function to transform the output into a probability value, and the objective is to minimize the cost function to achieve an optimal probability:

\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)), & y = 1, \\ -\log(1 - h_\theta(x)), & y = 0. \end{cases}  (2)
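Equations (1) and (2) can be checked numerically; a small self-contained sketch (the sample numbers are illustrative):

```python
import numpy as np

def h(x, beta0, beta1):
    """Logistic hypothesis of equation (1): sigmoid of a linear score."""
    return 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))

def cost(h_x, y):
    """Per-example cost of equation (2)."""
    return -np.log(h_x) if y == 1 else -np.log(1.0 - h_x)

p = h(x=2.0, beta0=-1.0, beta1=1.5)  # sigmoid(2.0) ~ 0.88, read as P(y=1|x)
print(cost(p, y=1))                  # small cost: confident and correct
print(cost(p, y=0))                  # large cost: confident but wrong
```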
2.2.2. Support Vector Machine. The support vector machine (SVM) is another model for binary classification problems and is available with various kernel functions [28]. The objective of an SVM model is to estimate a hyperplane (decision boundary), on the basis of the feature set, that separates the data points [29]. The dimension of the hyperplane varies with the number of features. As there can be multiple candidate hyperplanes in an N-dimensional space, the task is to identify the plane that separates the data points of the two classes with maximum margin. A mathematical representation of the cost function for the SVM model is given in [30]:

J(\theta) = \frac{1}{2} \sum_{j=1}^{n} \theta_j^2,  (3)

such that

\theta^{T} x^{(i)} \ge 1 \quad \text{if } y^{(i)} = 1,  (4)

\theta^{T} x^{(i)} \le -1 \quad \text{if } y^{(i)} = 0.  (5)

The function above uses a linear kernel. Kernels are typically used to fit data points that are not easily separable or that are multidimensional. In our case, we used sigmoid SVM, polynomial (kernel) SVM, Gaussian SVM, and basic linear SVM models.
objective of an SVM model is to estimate a hyperplane (or
decision boundary) on the basis of feature set to classify Here, b(1) and b(2) are the bias vectors, W(1) and W(2) are the
data points [29]. )e dimension of hyperplane varies weight matrices, and g and s are the activation functions. In
according to the number of features. As there could be our case, the activation function is ReLU and the Adam
multiple possibilities for a hyperplane to exist in an N- solver, with 3 hidden layers.
dimensional space, the task is to identify the plane that
separates the data points of two classes with maximum
margin. A mathematical representation of the cost 2.2.4. K-Nearest Neighbors (KNN). KNN is an unsupervised
function for the SVM model is defined as given in [30] and machine learning model where a dependent variable is not
shown in required to predict the outcome on a specific data. We
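The stated configuration (ReLU activations, the Adam solver, three hidden layers) translates directly to scikit-learn's MLPClassifier; the layer widths below are our assumption, since the paper does not report them:

```python
from sklearn.neural_network import MLPClassifier

# Three hidden layers as stated in the text; the widths (100, 50, 25)
# are illustrative assumptions.
mlp = MLPClassifier(
    hidden_layer_sizes=(100, 50, 25),
    activation="relu",
    solver="adam",
    max_iter=500,
)
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))
```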

2.2.4. K-Nearest Neighbors (KNN). KNN is a nonparametric, instance-based (lazy) learning model: rather than fitting an explicit predictive model, it stores the training data and lets the neighborhood of a new data point decide the outcome. We provide enough training data to the model and let it decide to which particular neighborhood a data point belongs. The KNN model estimates the distance of a new data point to its nearest neighbors, and the value of K determines how many neighbors' votes are counted; if the value of K is 1, then the new data point is assigned to the class of its single nearest neighbor. The distance between two points can be calculated with the following formulae [31]:

\text{Euclidean distance} = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2},  (7)

\text{Manhattan distance} = \sum_{i=1}^{k} |x_i - y_i|,  (8)

\text{Minkowski distance} = \Big( \sum_{i=1}^{k} |x_i - y_i|^q \Big)^{1/q}.  (9)
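Equations (7)-(9) in NumPy form; note that Euclidean and Manhattan distance are the Minkowski distance with q = 2 and q = 1, respectively (a self-contained sketch):

```python
import numpy as np

def minkowski(x, y, q):
    """Equation (9); q = 1 gives Manhattan (8), q = 2 gives Euclidean (7)."""
    return np.sum(np.abs(x - y) ** q) ** (1.0 / q)

a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 6.0, 3.0])
print(minkowski(a, b, 2))  # Euclidean: sqrt(9 + 16 + 0) = 5.0
print(minkowski(a, b, 1))  # Manhattan: 3 + 4 + 0 = 7.0
```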
2.3. Ensemble Learners. We propose using existing ensemble techniques, with textual characteristics as feature input, to improve the overall accuracy of classification between truthful and false articles. Ensemble learners tend to achieve higher accuracy, as more than one model is trained, using a particular technique, to reduce the overall error rate and improve performance. The intuition behind ensemble modeling is analogous to one we are already used to in daily life, such as requesting the opinions of multiple experts before making a decision in order to minimize the chance of a bad outcome. For example, a classification algorithm can be trained on a particular dataset with a unique set of parameters to produce a decision boundary that fits the data to some extent. The outcome of that particular algorithm depends not only on the training parameters but also on the nature of the training data; if the training data contains little variance or is overly uniform, the model may overfit and produce biased results on unseen data. Approaches like cross validation are therefore used to minimize the risk of overfitting. Several models can also be trained with different parameter sets on randomly chosen subsets of the training data, creating multiple decision boundaries. Using ensemble learning techniques, these problems can be addressed and mitigated by training multiple algorithms and combining their results for a near-optimal outcome. One such technique is the voting classifier, where the final classification depends on the majority vote of all the algorithms [32]. However, other ensemble techniques can also be used in different scenarios, such as the following.

2.3.1. Random Forest (RF). Random forest (RF) is an advanced form of decision trees (DT) and is also a supervised learning model. An RF consists of a large number of decision trees working individually to predict the outcome of a class, where the final prediction is based on the class that receives the majority of the votes. The error rate in a random forest is low compared to other models, due to the low correlation among trees [33]. Our random forest model was trained with different parameters; i.e., different numbers of estimators were used in a grid search to produce the model that best predicts the outcome. There are multiple algorithms for deciding a split in a decision tree, depending on whether the problem is regression or classification. For the classification problem, we used the Gini index as the cost function for estimating splits in the dataset. The Gini index is calculated by subtracting the sum of the squared class probabilities from one [34]:

G_{\mathrm{ind}} = 1 - \sum_{i=1}^{c} (P_i)^2,  (10)

where c is the number of classes and P_i is the probability of class i at the node under consideration.
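Equation (10) in code, with the two-class case worked out (a self-contained sketch):

```python
def gini_index(class_probabilities):
    """Equation (10): 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p ** 2 for p in class_probabilities)

print(gini_index([0.5, 0.5]))  # 0.5 -> maximally impure two-class node
print(gini_index([1.0, 0.0]))  # 0.0 -> pure node, the ideal split outcome
```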
2.3.2. Bagging Ensemble Classifiers. Bootstrap aggregating, or bagging for short, is an early ensemble method mainly used to reduce variance (overfitting) over a training set. The random forest model is one of the most frequently used variants of the bagging classifier. Intuitively, for a classification problem, the bagging model selects the class on the basis of the majority of the votes estimated by M trees, thereby reducing the overall variance, while the data for each tree is selected by random sampling with replacement from the overall dataset. For regression problems, the bagging model instead averages over the multiple estimates.

2.3.3. Boosting Ensemble Classifiers. Boosting is another widely used ensemble method for training weak models to become strong learners. For that purpose, a forest of randomized trees is trained, and the final prediction is based on the majority vote over the trees. This method incrementally allows weak learners to correctly classify the data points that are usually misclassified. Initially, equal weight coefficients are used for all data points. In successive rounds, the weight coefficients are decreased for data points that are correctly classified and increased for data points that are misclassified [35]. Each tree formed in a subsequent round thus learns to reduce the errors of the preceding round and to increase overall accuracy by correctly classifying the points previously misclassified. One major problem with boosting ensembles is that they may overfit the training data, leading to incorrect predictions for unseen instances [36]. Multiple boosting algorithms are available for both classification and regression; in our experiments, we used the XGBoost [37] and AdaBoost [38] algorithms for classification.

2.3.4. Voting Ensemble Classifiers. The voting ensemble is generally used for classification problems, as it allows the combination of two or more learning models trained on the whole dataset [39]. Each model predicts an outcome for a sample data point, which is considered a "vote" in favor of the class that the model has predicted; once every model has predicted, the final prediction is based on the majority vote for a specific class [32]. Compared to bagging and boosting algorithms, the voting ensemble is simpler in terms of implementation. As discussed, bagging algorithms create multiple subsets of data by random sampling with replacement from the whole dataset; each subset is used to train a model, and the final result is an aggregation of the outcomes of all the models. In boosting, multiple models are trained sequentially, each learning from the previous by increasing the weights of the misclassified points, thereby creating a generic model able to classify the problem correctly. The voting ensemble, on the other hand, is a combination of multiple independent models whose individual classification results contribute to the overall prediction by majority voting.

2.4. Benchmark Algorithms. In this section, we discuss the benchmark algorithms against which we compare the performance of our methodology.

2.4.1. Linear SVM. We use the linear SVM approach proposed in [21]. To ensure a meaningful comparison, we trained the linear SVM on the feature set discussed in [21] with 5-fold cross validation. The approach is referred to as Perez-LSVM in the text.

2.4.2. Convolutional Neural Network. Wang [18] used a convolutional neural network (CNN) for automatic detection of fake news. We employed the same approach using our data; however, we could not use the feature set of Wang [18], as that dataset contains only short statements. The approach is referred to as Wang-CNN in the text.

2.4.3. Bidirectional Long Short-Term Memory Networks. Wang [18] also used bidirectional long short-term memory networks (Bi-LSTM), and we used the same approach with different feature sets. The approach is referred to as Wang-Bi-LSTM in the text.

2.5. Datasets. The datasets used in this study are open source and freely available online. The data includes both fake and truthful news articles from multiple domains. The truthful news articles contain true descriptions of real-world events, while the fake news websites contain claims that are not aligned with facts. For many articles from the politics domain, the conformity of claims can be manually checked with fact-checking websites such as politifact.com and snopes.com. We used three different datasets in this study, briefly described as follows.

The first dataset is the "ISOT Fake News Dataset" [23] (hereafter referred to as DS1), which contains both true and fake articles extracted from the World Wide Web. The true articles were extracted from reuters.com, a renowned news website, while the fake articles were collected from multiple sources, mostly websites flagged by politifact.com. The dataset contains a total of 44,898 articles, of which 21,417 are truthful and 23,481 are fake. The corpus contains articles from different domains but most prominently targets political news.

The second dataset is available at Kaggle [24] (hereafter referred to as DS2) and contains a total of 20,386 articles used for training and 5,126 articles used for testing. The dataset is built from multiple sources on the Internet. The articles are not limited to a single domain such as politics; they include both fake and true articles from various other domains.

The third dataset is also available at Kaggle [25] (hereafter referred to as DS3); it includes a total of 3,352 articles, both fake and true. The true articles were extracted from trusted online sources such as CNN, Reuters, the New York Times, and various others, while the fake news articles were extracted from untrusted news websites. The domains covered include sports, entertainment, and politics.

A combined dataset collects the articles of the three datasets above (hereafter referred to as DS4). As the articles vary in nature across the datasets, this fourth dataset is created to evaluate the performance of the algorithms on a single dataset covering a wide array of domains.

2.6. Performance Metrics. To evaluate the performance of the algorithms, we used several metrics, most of them based on the confusion matrix. The confusion matrix is a tabular representation of a classification model's performance on the test set and consists of four counts: true positives, false positives, true negatives, and false negatives (see Table 1).

2.6.1. Accuracy. Accuracy is the most commonly used metric, representing the fraction of correctly predicted observations, whether true or false:

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.  (11)

In most cases, a high accuracy value indicates a good model; however, since we are training a classification model, an article predicted as true while actually false (a false positive) can have negative consequences, and, similarly, an article predicted as false while containing factual data can create trust issues. Therefore, we also used three metrics that take the incorrectly classified observations into account: precision, recall, and F1-score.

2.6.2. Recall. Recall represents the fraction of the true class that is classified as positive. In our case, it is the number of articles predicted as true out of the total number of true articles.

Table 1: Confusion matrix.

                Predicted true        Predicted false
Actual true     True positive (TP)    False negative (FN)
Actual false    False positive (FP)   True negative (TN)

\text{Recall} = \frac{TP}{TP + FN}.  (12)

2.6.3. Precision. Conversely, the precision score represents the ratio of true positives to all events predicted as true. In our case, precision is the number of articles that are actually true out of all articles predicted as true:

\text{Precision} = \frac{TP}{TP + FP}.  (13)

2.6.4. F1-Score. The F1-score represents the trade-off between precision and recall: it is their harmonic mean, and it thus takes both the false positive and the false negative observations into account:

\text{F1-score} = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.  (14)

3. Results and Discussion

Table 2: Overall accuracy score for each dataset.

                                       DS1    DS2    DS3    DS4
Logistic regression (LR)               0.97   0.91   0.91   0.87
Linear SVM (LSVM)                      0.98   0.37   0.53   0.86
Multilayer perceptron                  0.98   0.35   0.94   0.90
K-nearest neighbors (KNN)              0.88   0.28   0.82   0.77
Ensemble learners
Random forest (RF)                     0.99   0.35   0.95   0.91
Voting classifier (RF, LR, KNN)        0.97   0.88   0.94   0.88
Voting classifier (LR, LSVM, CART)     0.96   0.86   0.92   0.85
Bagging classifier (decision trees)    0.98   0.94   0.94   0.90
Boosting classifier (AdaBoost)         0.98   0.92   0.92   0.86
Boosting classifier (XGBoost)          0.98   0.94   0.94   0.89
Benchmark algorithms
Perez-LSVM                             0.99   0.79   0.96   0.90
Wang-CNN                               0.87   0.66   0.58   0.73
Wang-Bi-LSTM                           0.86   0.52   0.57   0.62

Table 2 summarizes the accuracy achieved by each algorithm on the four considered datasets. The maximum accuracy achieved on DS1 (ISOT Fake News Dataset) is 99%, attained by the random forest algorithm and Perez-LSVM. Linear SVM, multilayer perceptron, bagging classifiers, and boosting classifiers achieved an accuracy of 98%. The average accuracy attained by ensemble learners is 97.67% on DS1, whereas the corresponding average for individual learners is 95.25%; the absolute difference of 2.42% between the two is not significant. The benchmark algorithms Wang-CNN and Wang-Bi-LSTM performed worse than all other algorithms. On DS2, the bagging classifier (decision trees) and boosting classifier (XGBoost) are the best performing algorithms, achieving an accuracy of 94%. Interestingly, linear SVM, random forest, and Perez-LSVM performed poorly on DS2. Individual learners reported an average accuracy of 47.75% on DS2, whereas the ensemble learners' average accuracy is 81.5%. A similar trend is observed for DS3, where the individual learners' average accuracy is 80% and the ensemble learners' is 93.5%; however, unlike on DS2, the best performing algorithm on DS3 is Perez-LSVM, which achieved an accuracy of 96%. On DS4 (DS1, DS2, and DS3 combined), the best performing algorithm is random forest (91% accuracy). On average, individual learners achieved an accuracy of 85%, whereas ensemble learners achieved an accuracy of 88.16%. The worst performing algorithm is Wang-Bi-LSTM, which achieved an accuracy of 62%.

Figure 2 summarizes the average accuracy of all algorithms over the four datasets. Overall, the best performing algorithm is the bagging classifier (decision trees), with 94% accuracy, whereas the worst performing algorithm is Wang-Bi-LSTM, with 64.25%. The average accuracy of the individual learners is 77.6%, whereas that of the ensemble learners is 92.25%. Random forest achieved better accuracy on all datasets except DS2. However, the accuracy score alone is not a sufficient measure of a model's performance; therefore, we also evaluate the learning models on the basis of precision, recall, and F1-score.

Tables 3-5 summarize the precision, recall, and F1 score of each algorithm on all four datasets. In terms of average precision (Table 3), the boosting classifier (XGBoost) achieved the best results, with an average precision of 95.25% over the four datasets. Random forest (RF) achieved an average precision of 79.75%; however, over three datasets (removing DS2, the dataset with its lowest score), the average precision of random forest jumps to 96.3%. The corresponding three-dataset score for the boosting classifier (XGBoost) is 96.3% as well.

On the recall metric, the bagging classifier (decision trees) stands best, achieving an average recall of 0.942, closely followed by the boosting classifier (XGBoost) with 0.94. Among the benchmark algorithms, Perez-LSVM is the best performer, achieving a recall of 0.92. On the F1-score, the algorithms behave similarly to precision: the boosting classifier (XGBoost) achieved an F1-score of 0.945, the best among all the techniques, followed by the bagging classifier (decision trees) and logistic regression (LR).

Figure 3 is a graphical representation of the average performance of the learning algorithms on all datasets in terms of precision, recall, and F1-score. It can be seen that there is not much difference in performance across the various metrics, except for linear SVM, KNN, Wang-CNN, and Wang-Bi-LSTM.
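All of the scores in Tables 2-5 follow from the confusion-matrix counts via equations (11)-(14); a small self-contained sketch with hypothetical counts for a 300-article test set:

```python
def scores(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts,
    per equations (11)-(14)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts, not taken from the paper.
print(scores(tp=130, tn=140, fp=10, fn=20))
# -> (0.9, 0.9286..., 0.8667..., 0.8966...)
```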

[Figure 2 is a bar chart: average accuracy (y-axis, 0.6-0.95) of each algorithm (x-axis: logistic regression, linear SVM, multilayer perceptron, KNN, random forest, the two voting classifiers, bagging (decision trees), AdaBoost, XGBoost, Perez-LSVM, Wang-CNN, and Wang-Bi-LSTM) over all datasets.]

Figure 2: Average accuracy over all datasets.

Table 3: Precision on the 4 datasets.

                                       DS1    DS2    DS3    DS4
Logistic regression (LR)               0.98   0.92   0.93   0.88
Linear SVM (LSVM)                      0.98   0.31   0.54   0.88
Multilayer perceptron                  0.97   0.32   0.93   0.92
K-nearest neighbors (KNN)              0.91   0.22   0.85   0.80
Ensemble learners
Random forest (RF)                     0.99   0.30   0.98   0.92
Voting classifier (RF, LR, KNN)        0.96   0.88   0.92   0.86
Voting classifier (LR, LSVM, CART)     0.94   0.86   0.88   0.83
Bagging classifier (decision trees)    0.98   0.94   0.93   0.90
Boosting classifier (AdaBoost)         0.98   0.92   0.92   0.86
Boosting classifier (XGBoost)          0.99   0.94   0.96   0.92
Benchmark algorithms
Perez-LSVM                             0.99   0.79   0.96   0.90
Wang-CNN                               0.84   0.65   0.48   0.72
Wang-Bi-LSTM                           0.92   0.43   0.50   0.65

Table 4: Recall on the 4 datasets.

                                       DS1    DS2    DS3    DS4
Logistic regression (LR)               0.98   0.90   0.92   0.86
Linear SVM (LSVM)                      0.98   0.32   1.00   0.86
Multilayer perceptron                  1.00   0.36   0.96   0.88
K-nearest neighbors (KNN)              0.87   0.24   0.81   0.74
Ensemble learners
Random forest (RF)                     1.00   0.34   0.93   0.91
Voting classifier (RF, LR, KNN)        0.97   0.89   0.96   0.90
Voting classifier (LR, LSVM, CART)     0.97   0.87   0.96   0.89
Bagging classifier (decision trees)    0.97   0.95   0.94   0.91
Boosting classifier (AdaBoost)         0.98   0.93   0.92   0.86
Boosting classifier (XGBoost)          0.99   0.94   0.94   0.89
Benchmark algorithms
Perez-LSVM                             0.99   0.81   0.97   0.91
Wang-CNN                               0.90   0.71   0.29   0.75
Wang-Bi-LSTM                           0.78   0.59   0.35   0.61

Table 5: F1-score on the 4 datasets.

                                       DS1    DS2    DS3    DS4
Logistic regression (LR)               0.98   0.91   0.92   0.87
Linear SVM (LSVM)                      0.98   0.32   0.70   0.87
Multilayer perceptron                  0.98   0.34   0.95   0.90
K-nearest neighbors (KNN)              0.89   0.23   0.83   0.77
Ensemble learners
Random forest (RF)                     0.99   0.32   0.95   0.91
Voting classifier (RF, LR, KNN)        0.97   0.88   0.94   0.88
Voting classifier (LR, LSVM, CART)     0.96   0.86   0.92   0.86
Bagging classifier (decision trees)    0.98   0.94   0.94   0.90
Boosting classifier (AdaBoost)         0.98   0.92   0.92   0.86
Boosting classifier (XGBoost)          0.99   0.94   0.95   0.90
Benchmark algorithms
Perez-LSVM                             0.99   0.80   0.96   0.90
Wang-CNN                               0.87   0.67   0.31   0.73
Wang-Bi-LSTM                           0.84   0.44   0.35   0.57

[Figure 3 is a grouped bar chart: precision, recall, and F1-score (y-axis, 0.5-0.95) of each algorithm (x-axis, same algorithms as in Figure 2) averaged over all datasets.]

Figure 3: Precision, recall, and F1-score over all datasets.

The ensemble learner XGBoost performed better than the other learning models on all performance metrics. The main factor in the superior performance of XGBoost is its working principle, which efficiently identifies errors and minimizes them in each iteration: XGBoost uses multiple classification and regression trees (CART), combining multiple weak learners and assigning higher weights to misclassified data points. During each subsequent iteration, the model is therefore able to correctly identify the previously misclassified points, while regularization parameters are used to reduce overfitting.

Logistic regression is a comparatively simple model but achieved an average accuracy of over 90% on three datasets (DS1, DS2, and DS3). There can be multiple explanations for this high average accuracy: firstly, the logistic regression model is fine-tuned using an extensive grid search over different hyperparameters; secondly, some of the datasets (such as DS1) contain articles with similar writing styles, which led to the 97% accuracy of the logistic regression model. On DS4, which is the combination of all three datasets (and thereby includes more versatile writing styles), the accuracy of logistic regression drops to 87%.

4. Conclusion

The task of classifying news manually requires in-depth knowledge of the domain and expertise in identifying anomalies in the text. In this research, we discussed the problem of classifying fake news articles using machine learning models and ensemble techniques. The data used in our work is collected from the World Wide Web and contains news articles from various domains, covering most kinds of news rather than specifically classifying political news. The primary aim of the research is to identify patterns in text that differentiate fake articles from true news. We extracted different textual features from the articles using the LIWC tool and used the feature set as input to the models. The learning models were trained and parameter-tuned to obtain optimal accuracy; some models achieved comparatively higher accuracy than others. We used multiple performance metrics to compare the results for each algorithm, and the ensemble learners showed overall better scores on all performance metrics than the individual learners.

Fake news detection has many open issues that require the attention of researchers. For instance, in order to reduce the spread of fake news, identifying the key elements involved in the spread of news is an important step; graph theory and machine learning techniques can be employed to identify the key sources involved in the spread of fake news. Likewise, real-time fake news identification in videos is another possible future direction.

Data Availability

Previously reported data were used to support this study and are available at https://www.kaggle.com/c/fake-news and https://www.kaggle.com/jruvika/fake-news-detection.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

[1] A. Douglas, "News consumption and the new electronic media," The International Journal of Press/Politics, vol. 11, no. 1, pp. 29–52, 2006.
[2] J. Wong, "Almost all the traffic to fake news sites is from Facebook, new data show," 2016.
[3] D. M. J. Lazer, M. A. Baum, Y. Benkler et al., "The science of fake news," Science, vol. 359, no. 6380, pp. 1094–1096, 2018.
[4] S. A. García, G. G. García, M. S. Prieto, A. J. M. Guerrero, and C. R. Jiménez, "The impact of term fake news on the scientific community scientific performance and mapping in web of science," Social Sciences, vol. 9, no. 5, 2020.
[5] A. D. Holan, 2016 Lie of the Year: Fake News, Politifact, Washington, DC, USA, 2016.
[6] S. Kogan, T. J. Moskowitz, and M. Niessner, "Fake news: evidence from financial markets," 2019, https://ssrn.com/abstract=3237763.
[7] A. Robb, "Anatomy of a fake news scandal," Rolling Stone, vol. 1301, pp. 28–33, 2017.
[8] J. Soll, "The long and brutal history of fake news," Politico Magazine, vol. 18, no. 12, 2016.
[9] J. Hua and R. Shaw, "Corona virus (COVID-19) "infodemic" and emerging issues through a data lens: the case of China," International Journal of Environmental Research and Public Health, vol. 17, no. 7, p. 2309, 2020.
[10] N. K. Conroy, V. L. Rubin, and Y. Chen, "Automatic deception detection: methods for finding fake news," Proceedings of the Association for Information Science and Technology, vol. 52, no. 1, pp. 1–4, 2015.
[11] F. T. Asr and M. Taboada, "Misinfotext: a collection of news articles, with false and true labels," 2019.
[12] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, "Fake news detection on social media," ACM SIGKDD Explorations Newsletter, vol. 19, no. 1, pp. 22–36, 2017.
[13] S. Vosoughi, D. Roy, and S. Aral, "The spread of true and false news online," Science, vol. 359, no. 6380, pp. 1146–1151, 2018.
[14] H. Allcott and M. Gentzkow, "Social media and fake news in the 2016 election," Journal of Economic Perspectives, vol. 31, no. 2, pp. 211–236, 2017.
[15] V. L. Rubin, N. Conroy, Y. Chen, and S. Cornwell, "Fake news or truth? Using satirical cues to detect potentially misleading news," in Proceedings of the Second Workshop on Computational Approaches to Deception Detection, pp. 7–17, San Diego, CA, USA, 2016.
[16] H. Jwa, D. Oh, K. Park, J. M. Kang, and H. Lim, "exBAKE: automatic fake news detection model based on bidirectional encoder representations from transformers (BERT)," Applied Sciences, vol. 9, no. 19, 2019.
[17] H. Ahmed, I. Traore, and S. Saad, "Detection of online fake news using n-gram analysis and machine learning techniques," in Proceedings of the International Conference on Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments, pp. 127–138, Springer, Vancouver, Canada, 2017.
[18] W. Y. Wang, "Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection, Association for Computational Linguistics, Stroudsburg, PA, USA, 2017.
[19] B. Riedel, I. Augenstein, G. P. Spithourakis, and S. Riedel, "A simple but tough-to-beat baseline for the fake news challenge stance detection task," 2017, https://arxiv.org/abs/1707.03264.
[20] N. Ruchansky, S. Seo, and Y. Liu, "CSI: a hybrid deep model for fake news detection," in Proceedings of the 2017 ACM Conference on Information and Knowledge Management, pp. 797–806, Singapore, 2017.
[21] V. Pérez-Rosas, B. Kleinberg, A. Lefevre, and R. Mihalcea, "Automatic detection of fake news," 2017, https://arxiv.org/abs/1708.07104.
[22] P. Bühlmann, "Bagging, boosting and ensemble methods," in Handbook of Computational Statistics, pp. 985–1022, Springer, Berlin, Germany, 2012.
[23] H. Ahmed, I. Traore, and S. Saad, "Detecting opinion spams and fake news using text classification," Security and Privacy, vol. 1, no. 1, 2018.
[24] Kaggle, Fake News, Kaggle, San Francisco, CA, USA, 2018, https://www.kaggle.com/c/fake-news.
[25] Kaggle, Fake News Detection, Kaggle, San Francisco, CA, USA, 2018, https://www.kaggle.com/jruvika/fake-news-detection.
[26] J. Bergstra and Y. Bengio, "Random search for hyper-parameter optimization," Journal of Machine Learning Research, vol. 13, pp. 281–305, 2012.
[27] T. M. Mitchell, The Discipline of Machine Learning, Carnegie Mellon University, Pittsburgh, PA, USA, 2006.
[28] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[29] T. Hofmann, B. Schölkopf, and A. J. Smola, "Kernel methods in machine learning," The Annals of Statistics, vol. 36, no. 3, pp. 1171–1220, 2008.

[30] V. Kecman, "Support vector machines: an introduction," in Support Vector Machines: Theory and Applications, Springer, New York, NY, USA, 2005.
[31] S. Akhtar, F. Hussain, F. R. Raja et al., "Improving mispronunciation detection of Arabic words for non-native learners using deep convolutional neural network features," Electronics, vol. 9, no. 6, 2020.
[32] D. Ruta and B. Gabrys, "Classifier selection for majority voting," Information Fusion, vol. 6, no. 1, pp. 63–81, 2005.
[33] B. Gregorutti, B. Michel, and P. Saint-Pierre, "Correlation and variable importance in random forests," Statistics and Computing, vol. 27, no. 3, pp. 659–678, 2017.
[34] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees, Springer, Berlin, Germany, 1984.
[35] R. E. Schapire, "A brief introduction to boosting," IJCAI, vol. 99, pp. 1401–1406, 1999.
[36] E. M. Dos Santos, R. Sabourin, and P. Maupin, "Overfitting cautious selection of classifier ensembles with genetic algorithms," Information Fusion, vol. 10, no. 2, pp. 150–162, 2009.
[37] T. Chen and C. Guestrin, "XGBoost: a scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794, San Francisco, CA, USA, 2016.
[38] T. Hastie, S. Rosset, J. Zhu, and H. Zou, "Multi-class AdaBoost," Statistics and Its Interface, vol. 2, no. 3, pp. 349–360, 2009.
[39] L. Lam and S. Y. Suen, "Application of majority voting to pattern recognition: an analysis of its behavior and performance," IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 27, no. 5, pp. 553–568, 1997.
