
Volume 8, Issue 11, November 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Fake News Classification using Machine Learning Techniques

Islam D. S. Aabdalla
PhD Scholar, CSE, JNTUH University
Hyderabad, India

Dr. D. Vasumathi
Professor, CSE, JNTUH University
Hyderabad, India

Abstract:- Fake news exerts a pervasive and urgent influence, causing mental harm to readers. Differentiating between fake and genuine news is increasingly tricky, impacting countless lives. This proliferation of falsehoods spreads harm and misinformation and erodes trust in global information sources, affecting individuals, organizations, and nations. It requires immediate attention. To address this issue, we conducted a comprehensive study utilizing techniques such as TF-IDF and feature engineering to detect fake news. We applied Machine Learning Techniques (MLT), including Naïve Bayes (NB), Decision Trees (DT), Support Vector Machines (SVM), Random Forest (RF), and Logistic Regression (LR), to classify news articles. Our studies involved analyzing word patterns from diverse news sources to identify unreliable news. We calculated the likelihood of an article being fake or genuine based on the extracted features and evaluated algorithm accuracy using a carefully prepared training dataset. The analysis revealed that the decision tree algorithm exhibited the highest accuracy, detecting fake news at a rate of 99.68%. While the remaining algorithms performed well, none surpassed the accuracy of the decision tree. This study highlights the potential of machine learning techniques in combating the pervasive menace of fake news. Our research presents a reliable and efficient method to identify and classify unreliable information, safeguarding the integrity of news sources and protecting individuals and societies from the harmful effects of misinformation.

Keywords:- Machine Learning, TF-IDF, Feature Extraction, Fake News Detection, Social Media.

I. INTRODUCTION

Fake news is a term used to describe inaccurate or deceptive information that is presented as genuine news. This can encompass fabricated narratives, overstated or altered facts, and deliberately misleading content [1]. With new technologies, the internet has made it possible for people to access news from all over the world, at any time and on any device. The internet, primarily through social media and other media applications, has become the primary platform for spreading fake news. Despite the abundance of information available, the truth often needs to be clarified [2]. The purpose behind the spread of fake news is to manipulate the audience, whether for political or commercial gain [3]. In today's digital landscape, a vast amount of news is published across various media outlets, making it increasingly challenging to discern between accurate and false information [4]. Unreliable news is created for financial or political motives, or to gain notoriety, using ideological narratives to deceive its recipients [5],[6]. This unreliable content, news manipulation, knowledge bubbles, and a lack of security on social platforms have become a pervasive problem in our society.

Not only is unreliable news prevalent in traditional media, but it has also gained prominence in social forums, allowing it to spread quickly and extensively [2]. Clickbait, often with catchy headlines, is commonly used to attract readers' attention [7]. By clicking on these enticing titles, readers are led to poorly written articles with little relevance to the news they were expecting. Clickbait aims to drive more traffic to websites that rely on advertisements for revenue. An infamous example occurred during the 2016 U.S. presidential election, where Russian trolls used clickbait to sway public opinion away from Hillary Clinton and toward Donald Trump. This instance illustrates the considerable influence that false information can exert on important matters. Social media platforms have evolved into environments where untrustworthy news, characterized by errors, informal language, and flawed grammar, proliferates [8]. The quest for improved credibility and accuracy has created an urgent need for techniques that help users make informed decisions [6].

Websites like Snopes and Politifact have emerged to fact-check news articles and uphold the truth. Research studies have also developed repositories to identify genuine and fraudulent internet sources [9]. In light of these discussions, categorizing unreliable news hinges on two factors: authenticity and purpose. Authenticity concerns whether the news contains inaccurate information. Purpose concerns whether the news content was deliberately manipulated to deceive the audience [10].

The main challenge lies in distinguishing between fake and real news [11]. Different social media platforms recognize false news through feature extraction (FE), while traditional news organizations rely on various factors, such as images and text, to identify and spot fake news. In terms of textual, word-based sources, there are several aspects to consider:

• It is essential to determine whether the news article carries the original content or just a part of it.

• The authenticity of the news source needs to be evaluated; knowing who published the news is crucial.

IJISRT23NOV2097 www.ijisrt.com 2194


Another aspect to consider is the headline, which provides an overview of the news and aims to entice the audience. Additionally, the news article should accurately represent the content of the news. Researchers believe that analyzing datasets and applying machine learning techniques can significantly contribute to quickly detecting unreliable news, both for the title and the article content [12]. However, categorizing news articles poses a significant challenge because analyzing text news from datasets involves processing many words, terms, and phrases, leading to computational limitations. Furthermore, redundant and extraneous features can harm the performance of classifiers. Feature engineering is crucial for enhancing performance. In this study, we bridge this gap by applying machine learning algorithms such as Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), Naive Bayes (NB), and Random Forest (RF). We also employ feature extraction techniques such as TF-IDF features, N-grams, and feature engineering.

The meaningful contributions of this paper are as follows:
• Two datasets are merged after removing unnecessary entities and eliminating duplicate and missing values.
• After removing stop words and punctuation and converting text to lowercase, feature extraction techniques such as TF-IDF are applied to the news articles, and feature engineering is employed to enhance performance.
• The probability of each word is calculated, and each article is predicted to be fake or real based on these probabilities.
• To obtain the best results, we have implemented different algorithms for detecting fake news, including Naive Bayes, Decision Tree, Random Forest, Logistic Regression, and SVM. We compare the performance of these algorithms with previous approaches. Notably, the decision tree algorithm shows promising results in classifying fake news.

The remaining sections of this study are structured as follows: Section two provides a Literature Review, highlighting the related work on detecting unreliable news in the last three years. Section three presents the methodology framework for detecting fake news, focusing on models for predicting the news's authenticity. Section four presents the results and discussion, evaluating the obtained results. Finally, in section five, we conclude our study and provide recommendations for future work.

II. LITERATURE REVIEW

This section provides an overview of relevant studies in the field. Numerous experiments have been conducted to detect the spread of fake news on social media using AI and ML.

Sudhakar, M., and K. Kaliyamurthie [13] discussed the detection of fake news articles through ML algorithms. They identified several open problems that require further research. They proposed an LVQ (learning vector quantization) approach and achieved a precision of 93.54%. The authors also suggested future research areas for the real-time identification of fake news in videos.

Khan, J.Y., et al. [14] investigated the effectiveness of benchmarking ML models on various datasets for fake news detection. They analyzed the content and size of news articles and compared them with existing studies. The study aimed to assist the research community in selecting the most reliable technique for identifying fake news. The authors found that pre-trained BERT (Bidirectional Encoder Representations from Transformers)-based models performed well on small datasets.

Baydogan, C., and B. Alatas [15] proposed a framework based on ML models and NLP techniques to predict fake news from article content. They used count vectors, word embeddings, and TF-IDF (Term Frequency-Inverse Document Frequency) to generate feature vectors. The linear SVM (Support Vector Machine) classification algorithm achieved a precision of 0.94.

B. Alatas and Ozbay [16] improved the detection of fake news articles by utilizing the FNC-1 dataset, which includes four categories of false news. They assessed modern techniques for fake news detection using ML algorithms and big data technologies. The authors employed a decentralized Spark cluster and stacked ensemble algorithms. By using N-grams, count vectorizers, and TF-IDF, they achieved a performance of 92.45% in detecting fake news.

Amutha, R., and D.V. Kumar [17] presented a methodology for analyzing news information and distinguishing between real and fake news. They used a dataset consisting of Twitter microblog postings related to newsworthy topics. The study focused on supervised learning techniques such as SVM and decision trees, evaluated with Kappa statistics. The authors considered subsets of attributes, including text characteristics, social network features, and propagation-based attributes. SVM achieved high precision with 87% recall and 82% accuracy for real news, and 84% precision with 89% recall and 87% accuracy for fake news.

Kaur, P., and M. Edalati [18] analyzed and classified fake news using a dataset of approximately 40,000 news articles. They first created a list of stop words to remove unnecessary words from the articles. Then, they applied CountVectorizer and TfidfVectorizer to generate feature vectors. They selected classification models such as Naive Bayes, Linear SVC, Logistic Regression, and Random Forest. Logistic regression achieved the highest performance, with 80% accuracy for fake news and 76% accuracy for reliable news.

Meel, P., and D.K. Vishwakarma [19] focused on classifying movie opinions as positive or negative using ML algorithms. They analyzed online movie reviews using opinion mining and text classification algorithms. Five ML algorithms, including DT-J48, SVM, NB, and KNN, were compared. SVM achieved the highest accuracy of 81.35% for sentiment classification. The authors also suggested extending the analysis to other datasets, such as those from Amazon or eBay.

Aslam, N., et al. [20] discussed the reliability of news on the internet and proposed a fake news detection system. They collected posts from Facebook and used two classification techniques: a Boolean crowd-sourcing algorithm and logistic regression. Logistic regression achieved a high accuracy of 99% in predicting fake news posts.

This section has presented an overview of relevant studies on detecting fake news. The approach adopted in this study aligns with the methods used by the previously mentioned authors. Various ML models are employed, and feature extraction techniques, including feature engineering, are applied. The study then compares different ML techniques and assesses their effectiveness. Compared with the results of previous studies, this study achieves higher accuracy.

III. METHODOLOGY

In this section, we present our proposed approach, which encompasses multiple stages, including using two datasets, feature extraction, feature engineering, ML classification, and addressing the challenge of detecting unreliable news.

The dataset comprises text news with attributes such as the headline, ID, and date, together with a label indicating whether the news is real or fake. Figure 1 provides an overview of our approach, illustrating the process of detecting fake news on a combined dataset.

In the first step, we merge two different datasets to create a corpus. The second step involves applying preprocessing techniques, including handling missing values, removing duplicate attributes, and eliminating unnecessary attributes in the fake news dataset. Furthermore, various preprocessing operations are performed on the news attributes, such as removing redundant words, converting text to lowercase, and other necessary preprocessing steps. Subsequently, the dataset is divided into 80% for training and 20% for testing.

In the third step, we concentrate on feature extraction methods to convert the textual data into numerical representations while utilizing feature engineering techniques to enhance accuracy. The fourth step details the ML models employed in this analysis as we explore various machine learning algorithms for detecting fake news.

Finally, in the last step, we evaluate the performance of the models and compare them with other approaches, allowing us to assess their effectiveness.
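The merge, cleaning, and 80/20 split steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the field name `text`, the helper names `build_corpus` and `split_80_20`, the fixed seed, and the toy rows are all assumptions made for the example. The label convention (0 for fake, 1 for real) follows the dataset description in this paper.

```python
import random

def build_corpus(fake_rows, real_rows):
    """Merge two labelled sources, then drop rows with missing or duplicate text."""
    merged = [dict(r, label=0) for r in fake_rows] + [dict(r, label=1) for r in real_rows]
    seen, corpus = set(), []
    for row in merged:
        text = (row.get("text") or "").strip()
        if not text:        # missing value
            continue
        if text in seen:    # duplicate article text
            continue
        seen.add(text)
        corpus.append({"text": text, "label": row["label"]})
    return corpus

def split_80_20(corpus, seed=42):
    """Shuffle reproducibly and split into 80% training and 20% testing rows."""
    rows = corpus[:]
    random.Random(seed).shuffle(rows)
    cut = int(0.8 * len(rows))
    return rows[:cut], rows[cut:]

# Toy data (hypothetical): one duplicate and one missing value in the fake set.
fake = [{"text": f"fake story {i}"} for i in range(5)] + [{"text": "fake story 0"}, {"text": None}]
real = [{"text": f"real report {i}"} for i in range(5)]

corpus = build_corpus(fake, real)
train, test = split_80_20(corpus)
print(len(corpus), len(train), len(test))
```

Fixing the shuffle seed keeps the split reproducible, which matters when several classifiers are later compared on the same test set.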

Fig. 1: Graphical Representation of Proposed System

A. Dataset collection
This initial phase of the study focuses on the dataset, which contains 23,481 headline-article pairs of fake news and 21,417 of real news. Table 1 describes its attributes. Furthermore, Figure 2 provides a visual representation of the dataset, where class label 0 represents fake news and 1 denotes real news. The dataset is accessible online on the Kaggle website [27].

Table 1: Description of Dataset Collection.

Attribute       Description
No              Unique ID of the news article.
Headline        The headline of the news article.
Article news    Text of the news article; may be incomplete.
Class           Label indicating fake or real.
Author          Who wrote the article.
Date            Date the news was published.

Fig. 2: Dataset Labels: Fake and Real

Figure 3 provides an overview of the distribution of articles across different subjects. It includes 1,570 articles related to government news, 778 articles about the Middle East, 9,050 general news articles, 783 articles on US news, 4,459 articles categorized as Left news, 11,272 articles labeled as P news, 10,145 articles covering world news, and 6,831 articles focusing on politics. These articles are sourced from reputable outlets such as the Washington Post, New York Times, CNN, etc. This study's findings support the proposed model's effectiveness in identifying fake news articles by analyzing their text using machine learning algorithms. This approach streamlines the decision-making process.

Fig. 3: Article News Per Subject

Moreover, Figure 4 and Figure 5 display word clouds that have been generated based on the identified fake and real news within the system, respectively. These word clouds visually represent the presence of multiple words associated with each category, offering an insight into the most frequently occurring words found in both fake and real news articles.


Fig. 4: Word Cloud for Unreliable News

Fig. 5: Word Cloud for Reliable News

Table 2: Word cloud parameters for the fake and real news datasets

Parameter        Value
Target           True
Height           500
Max_Font_Size    110
Collocations     False
Generate         all_words
Figsize          10,7
Width            800
Interpolation    Bilinear

Table 2 presents the most important parameters used to generate the word clouds of common words in real and fake news articles. Furthermore, Figure 6 and Figure 7 provide visual representations of the distribution of these common words.

Fig. 6: Most Common Words of Real News


Fig. 7: Most Common Words of Fake News

Table 3: Common parameters

Parameter                Value
Rotation                 Vertical
Figure size              12,8
Tokenization phrase      token_space.tokenize(all_words)
Data                     Df_Frequency
                         List(Frequency.Values())
Color                    Red
Frequency                Nltk.FreqDist(token_phrase)
DataFrame(DF)            "Word": list(frequency.keys)
Df_Frequency.nlargest    N=Quantity
                         columns = Frequency
Token_space              Tokenize.WhitespaceTokenizer
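The parameters in Table 3 describe a word-frequency count: the text is split on whitespace, token frequencies are tallied, and the N most frequent words are kept for plotting. The sketch below illustrates that idea using only the Python standard library, with `collections.Counter` standing in for the NLTK frequency distribution and the DataFrame `nlargest` call named in the table. The function name `most_common_words` and the sample sentences are hypothetical.

```python
from collections import Counter

def most_common_words(texts, quantity=3):
    """Return the `quantity` most frequent whitespace-separated tokens."""
    all_words = " ".join(texts)      # join every article into one string
    tokens = all_words.split()       # whitespace tokenization
    frequency = Counter(tokens)      # token -> number of occurrences
    return frequency.most_common(quantity)

# Hypothetical sample texts.
sample = ["president said today", "president said", "president video"]
top = most_common_words(sample, quantity=2)
print(top)
```

The resulting (word, count) pairs are what a bar chart such as Figures 6 and 7 would display.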

B. Preprocessing Dataset

Machine learning heavily relies on preprocessing to transform incomplete and inconsistent datasets into useful representations. Various text preprocessing techniques are applied to the dataset, including text transformation for stop word elimination, conversion to lowercase, stemming, tokenization, and utilization of models from the Keras library. The dataset is then visualized using an N-gram term-based tokenizer, which segments the news based on the specified size of N. Specific preprocessing steps, such as tokenization, sentence segmentation, lowercase conversion, stop word removal, and punctuation deletion, are performed to reduce the dataset's volume by eliminating irrelevant details. These preprocessing steps are crucial in preparing the data for subsequent analysis. Data preprocessing plays a vital role in many supervised learning algorithms. The individual data preprocessing steps are as follows:
• Specify a stop words list and remove punctuation.
• Configure the tokenizer. Tokenization involves separating the news into units such as words or sentences. It facilitates text detection by converting the content into features using ML models [28]. After tokenizing the samples, the next step is to transform the tokens into a standardized form. Stemming is applied to convert phrases into their basic form, reducing the number of term types or labels in the data for faster and more efficient detection.
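The preprocessing steps listed above can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: the stop word list is a tiny hypothetical one, tokenization is plain whitespace splitting, and stemming is omitted because it would need an external library. The helper names `preprocess` and `ngrams` are illustrative and do not come from the paper.

```python
import string

# Hypothetical, deliberately tiny stop word list; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "in", "and"}

def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [t for t in tokens if t not in stop_words]

def ngrams(tokens, n):
    """Return all runs of n adjacent tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = preprocess("The Senate is voting on the bill, in a rare session!")
print(tokens)
print(ngrams(tokens, 2))
```

With N = 2 the tokenizer yields bigrams such as ("senate", "voting"); larger N captures longer phrases at the cost of a much larger vocabulary.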


Fig. 8: Article News Tokenization

C. Lowercase transformation and stemming:
In this step, all terms in the dataset are transformed to lowercase to accommodate variations in capitalization. Moreover, words are reduced to their base forms using NLTK's WordNet implementation [8] (in NLTK, this is a lemmatizer). In addition, NLTK's Snowball stemming implementation [30] is utilized to reduce phrases to their stem forms. This rule-based approach aids in reducing the word corpus while preserving the meaningfulness of the words.

Fig. 9: Stemming and Convert to Lower Case Process

D. Feature Extraction (FE):
The main challenge in news categorization is dealing with high-dimensional data. The presence of numerous document terms, phrases, and words can lead to increased computational limitations in learning. Additionally, redundant and irrelevant features can hinder the interpretation of classifiers. Therefore, it is crucial to perform feature reduction and transform the text into numerical features that can be further processed while preserving the information in the dataset [29].

The CountVectorizer (bag of words) describes the occurrence of terms within news articles. It assigns a value of 1 if a term is present in the sentence and 0 if it is not. This creates a bag-of-words document matrix for each text document. N-grams are combinations of adjacent terms or phrases of length "n" that can be found in the original text [31].

TF-IDF (Term Frequency-Inverse Document Frequency) is a widely used weighting metric in dataset analysis. It is a statistical measure that evaluates the importance of a phrase to a document in a collection of news articles. The importance of a phrase increases with the number of occurrences within the document but is also influenced by its frequency in the entire corpus.

Term frequency (TF) is a method that uses the occurrence counts of terms in documents to determine the similarity between documents. Each count vector is then normalized by the total word count of the document, so that each element represents the probability of a specific phrase occurring in the document [32], as in the following equation:

\[ \mathrm{TF} = \frac{\text{number of occurrences of a term in a document}}{\text{total number of terms in the document}} \tag{1} \]

The IDF (Inverse Document Frequency) is computed by taking the logarithm of the ratio between the total number of documents in the corpus and the number of documents in which the specific phrase appears [3], with 1 added to the denominator:

\[ \mathrm{IDF} = \log\frac{D}{1 + DF} \tag{2} \]

Where:
• D is the total number of documents in the collection.
• DF is the number of documents containing the term [33].

For every word present in a dataset row, the value is non-zero, and if the word is not present, the value is zero. The TF-IDF of a token is calculated by combining the two equations above:

\[ \text{TF-IDF} = \mathrm{TF} \times \mathrm{IDF} \tag{3} \]
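To make the TF, IDF, and TF-IDF definitions concrete, the sketch below computes them exactly as written in this section, including the 1 added to the document-frequency count in the IDF denominator. It is a minimal standard-library illustration. The natural logarithm, the function names, and the four-document toy corpus are assumptions, and library implementations of TF-IDF may use a different smoothing formula.

```python
import math
from collections import Counter

def tf(tokens):
    """Term frequency: occurrences of a term divided by the document length."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: c / total for term, c in counts.items()}

def idf(corpus):
    """Inverse document frequency: log(D / (1 + DF)) for every term in the corpus."""
    D = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(D / (1 + n)) for term, n in df.items()}

def tf_idf(corpus):
    """TF-IDF weight of every term in every document: TF * IDF."""
    weights = idf(corpus)
    return [{t: v * weights[t] for t, v in tf(doc).items()} for doc in corpus]

# Hypothetical tokenized documents.
docs = [
    ["senate", "passes", "bill"],
    ["aliens", "endorse", "bill"],
    ["senate", "vote", "delayed"],
    ["markets", "rally", "today"],
]
scores = tf_idf(docs)
print(scores[1])
```

In the second document, "aliens" appears in only one document and therefore receives a higher weight than "bill", which appears in two. With this form of IDF, a term that occurs in all but one document gets weight zero, and a term that occurs in every document gets a negative weight.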


Fig. 10: Extract Article News

E. Feature engineering for fake news detection
Feature engineering (FET) is crucial for enhancing the performance of any machine learning algorithm, including when it is applied to features extracted from datasets. Transforming the raw dataset into feature data improves the quality of the model and helps achieve sufficient accuracy [30]. FET involves transforming the original values into features during the feature engineering step. Various techniques are available for feature engineering, and it is sometimes unclear which methods fall under the scope of FE and which do not [37].

F. Algorithms Used for Classification
Applying machine learning (ML) during experimentation allows unreliable news to be categorized quickly. We use the following ML algorithms: Naïve Bayes (NB), Decision Tree (DT), Random Forest (RF), SVM, and Logistic Regression (LR), to detect unreliable news and analyze the effectiveness of the algorithms.

• Naïve Bayes (NB):
The NB algorithm provides a probabilistic modeling technique. It computes the probability of each class label given the values of the input variables. Using conditional probabilities for an unseen record, the model computes a score for every target class and predicts the most likely outcome. NB is a probabilistic, supervised classification algorithm based on Bayes' theorem, which is named after Thomas Bayes. It is easy to interpret and computationally efficient.

• Decision Tree (DT):
The DT algorithm partitions data into two or more subsets based on the similarity of samples. It is a recursive process that splits subsets and repeats the process until a stopping condition is satisfied. Each decision node tests the value of a specific feature, and each branch corresponds to a different test outcome. Decision trees are efficient for building classifiers and can handle both categorical and continuous variables.

• Random Forest (RF):
RF is a collection of tree predictors in which each tree depends on a random vector sampled independently and with the same distribution for all trees. The random vector is used to sample features independently in each tree. The prediction error of random forests converges as more trees are added, and they have the advantage of being robust against noise.

• Support Vector Machine (SVM):
SVM is a supervised model that helps identify patterns in data for classification and regression. It learns from labeled training datasets and has a sound theoretical basis. SVM requires a relatively small number of samples compared to the dimensionality of the data. It addresses the problem of discriminating between members of two classes represented as n-dimensional vectors.

• Logistic Regression (LR):
LR is a classification model used for predicting the outcome of a categorical dependent variable based on predictor features. It can handle numeric or categorical predictors and a categorical label. LR estimates discrete values and predicts the probability of an event occurring, with values between 0 and 1.

• Evaluation Metrics:
We use various evaluation metrics to analyze the efficiency of the models in detecting false news articles. In the equations below, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives.

Accuracy: the proportion of correct predictions relative to the total number of predictions [8].

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4} \]

Recall: the proportion of relevant instances that are retrieved, out of the total number of relevant instances [9].

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{5} \]

F-measure (F1 or F-score): the harmonic mean of recall and precision [10], given by:

\[ \text{F-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{6} \]

Precision: the proportion of positive predictions that are correct, obtained by dividing the number of true positives by the total number of positive predictions [11].

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{7} \]

This section presents the results of identifying fake news: the most common words in the real and fake news in the dataset, the classification models for real and fake news and opinions, and the evaluation of the results. It examines how the ML algorithms process and analyze the datasets. Methods such as RF, NB, and SVM are therefore used to determine which news is real and which is fake news spreading over social media.

Table 4: Classification Report of the Proposed Models.

Methods                         Accuracy (%)   Precision (%)   Recall (%)   F1-score (%)
Naïve Bayes (NB)                99.55          99.72           99.42        99.56
Decision Tree (DT)              99.68          99.69           99.71        99.69
Support Vector Machine (SVM)    94.86          96.75           93.36        95.02
Logistic Regression (LR)        98.73          99.53           98.78        99.15
Random Forest (RF)              99.03          99.37           98.78        99.07
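As a small worked example of the evaluation metrics, the sketch below computes accuracy, precision, recall, and F1 from confusion-matrix counts, and then checks that the F1 values reported in the table above equal the harmonic mean of the reported precision and recall to within rounding. The confusion-matrix counts are hypothetical, since the paper reports only percentages, and the 0.02 tolerance is an assumption chosen to absorb two-decimal rounding.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, used only to exercise the formulas.
acc, prec, rec, f1 = classification_metrics(tp=90, tn=85, fp=10, fn=15)
print(round(acc, 4), round(prec, 4), round(rec, 4), round(f1, 4))

# (precision %, recall %, reported F1 %) as listed in the classification report.
reported = {
    "NB": (99.72, 99.42, 99.56),
    "DT": (99.69, 99.71, 99.69),
    "SVM": (96.75, 93.36, 95.02),
    "LR": (99.53, 98.78, 99.15),
    "RF": (99.37, 98.78, 99.07),
}
consistent = {m: abs(f1_from(p, r) - f) <= 0.02 for m, (p, r, f) in reported.items()}
print(consistent)
```

Accuracy cannot be checked the same way, because it depends on the class counts in the test set, which are not listed in the table.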

The results presented in Table 4 show the performance of the classification models in identifying fake news using TF-IDF feature extraction and feature engineering techniques. Naïve Bayes achieves an accuracy of 99.55%, a precision of 99.72%, a recall of 99.42%, and an F1-score of 99.56%. The decision tree model stands out with the highest accuracy at 99.68% and also exhibits high precision, recall, and F1-score. In contrast, SVM delivers satisfactory performance, achieving an accuracy of 94.86%, precision of 96.75%, recall of 93.36%, and an F1-score of 95.02%, although it does not reach the levels attained by Naïve Bayes or the decision tree model. The logistic regression model performs well, securing an accuracy of 98.73% with precision, recall, and F1-score of 99.53%, 98.78%, and 99.15%, respectively, although there is room for improvement. The random forest model achieves high accuracy (99.03%), precision (99.37%), recall (98.78%), and F1-score (99.07%). Figure 10 provides a visual representation of the comparison results, offering an overview of each model's performance. These findings underscore the effectiveness of TF-IDF feature extraction and feature engineering techniques in enhancing classification accuracy and overall model performance. In summary, all models demonstrate strong capabilities in identifying fake news, with the decision tree model achieving the highest accuracy. Naïve Bayes, logistic regression, and random forest also perform very well, while SVM delivers satisfactory results. These outcomes provide insight into the relative strengths of the different classification models.

Fig. 11: Distribution of Classification Result

The distribution of the classification results of the ML algorithms is visualized in Figure 11. It depicts the number of instances for each class in the testing set. Accuracy, F1 score, precision, and recall were calculated for the classification of the classes.


Fig. 12: Confusion Matrices of the Final ML Models: (a) NB, (b) DT, (c) SVM, (d) RF, (e) LR

We also constructed confusion matrices for the ML models, as shown in Figure 12. A confusion matrix is a table that provides an overview of the performance of a supervised algorithm. Panels (a) NB, (b) DT, (c) SVM, (d) RF, and (e) LR correspond to the models used, and they show that every model made some incorrect classifications. Among the models, DT achieved the highest accuracy of 99.68%, followed by NB with 99.55% and RF with 99.03%. Additionally, LR obtained an accuracy of 98.73%, and SVM achieved 94.86%, as shown in Figure 7. These results were part of the evaluation process, which assessed accuracy, precision, recall, F1-score, and the confusion matrix to evaluate each model's performance. Python was chosen for implementing the ML models due to its extensive libraries and high efficiency.
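To illustrate how the four evaluation measures follow from a confusion matrix, the sketch below counts the matrix cells for a binary fake/real task and derives accuracy, precision, recall, and F1 from them. It is a minimal pure-Python illustration, not the authors' code: the label lists are invented toy data, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="fake"):
    """Count TP, FP, FN, TN, treating `positive` as the target class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics_from_counts(tp, fp, fn, tn):
    """Derive accuracy, precision, recall, and F1 from confusion-matrix cells."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


# Invented toy labels (assumption): 4 fake and 6 real test articles.
y_true = ["fake"] * 4 + ["real"] * 6
y_pred = ["fake", "fake", "fake", "real",
          "real", "real", "real", "real", "real", "fake"]

counts = confusion_counts(y_true, y_pred)
scores = metrics_from_counts(*counts)
print("TP, FP, FN, TN =", counts)
print(scores)
```

The off-diagonal cells (FP and FN) are the incorrect classifications that the confusion matrices make visible; all four measures are simple ratios of the same four counts.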

Fig. 13: Probability of the Fake News

Fig. 14: Confusion Matrix of Each Word


Fig. 15: Distribution of Accuracy Result

Following the feature extraction process depicted in Figures 13 and 14, the task was to identify unreliable or real news articles by calculating the probability of each article being real or fake based on specific criteria. The models were trained using a dataset to estimate the probability values and analyze the dataset using three different methods. This allowed for determining which model was more accurate in classifying the news articles.
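As an illustration of estimating the probability that an article is fake or real, the sketch below trains a small multinomial Naïve Bayes model and returns class probabilities for a new text. It is a simplified stand-in rather than the paper's pipeline: it uses raw word counts instead of TF-IDF weights, the four training sentences are invented, and the names `train_nb` and `predict_proba` are hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model on whitespace-tokenised text."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for c in classes for w in word_counts[c]}
    return classes, word_counts, doc_counts, vocab


def predict_proba(model, text):
    """Return P(class | text) for every class, using add-one smoothing."""
    classes, word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for c in classes:
        total_words = sum(word_counts[c].values())
        score = math.log(doc_counts[c] / total_docs)  # class prior
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[c][w] + 1) / (total_words + len(vocab))
                )
        log_scores[c] = score
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(unnorm.values())
    return {c: v / norm for c, v in unnorm.items()}


# Invented toy corpus (assumption): two fake and two real headlines.
docs = [
    "shocking miracle cure doctors hate",
    "celebrity secret shocking scandal exposed",
    "government publishes annual budget report",
    "central bank raises interest rates",
]
labels = ["fake", "fake", "real", "real"]

model = train_nb(docs, labels)
probs = predict_proba(model, "shocking secret cure exposed")
print(probs)
```

Comparing such probabilities across several trained models on a held-out test set is one way to decide which model classifies the articles more accurately.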

Table 6: Comparison-Based Classification Results with Previous Work.


Authors and References                  Accuracy (%)
Faustini et al. [3]                     79.00
Goswami et al. [4]                      85.86
Liu, Y. and Y.-F.B. Wu [2]              90.00
Altheneyan, A. and A. Alhadlaq [1]      92.45
Goldani and S. Momtazi [7]              99.08
Our work                                99.68

Table 6 provides a comparative analysis of our proposed model with previous studies in detecting unreliable news. The decision tree (DT) model in our current work achieved the highest accuracy of 99.68% for detecting fake news articles, 0.60 percentage points above the best previous result in the table (99.08%).

IV. CONCLUSION AND FUTURE WORK

Many machine learning algorithms can be used to detect fake news. However, it is crucial to select the model that achieves high accuracy on the datasets. This study focused on identifying fake news by utilizing TF-IDF feature extraction and feature engineering methods.

In conclusion, this study employed feature extraction using TF-IDF and feature engineering techniques to detect fake news. Several machine learning classification algorithms were applied and compared, including Random Forest (RF), Naïve Bayes (NB), Decision Tree (DT), SVM, and Logistic Regression (LR). Our findings revealed that DT performed best, achieving a classification accuracy of 99.68% in identifying fake news and surpassing previous research results. Future work should explore deep learning algorithms to further enhance the development of real-time fake news detection techniques. Additionally, incorporating sentiment analysis into the detection process may help to better identify and flag fake news content. By integrating these advancements, we expect to improve the accuracy and efficiency of detecting fake news.

REFERENCES

[1]. Romaguera, O.G., News (?) papers: A Typology of Fake News, 1880-1920. 2023.
[2]. Sarkar, S. and M. Nandan, A Comprehensive Approach to AI-Based Fake News Prediction in Digital Platforms by Applying Supervised Machine Learning Techniques, in Handbook of Research on Applications of AI, Digital Twin, and Internet of Things for Sustainable Development. 2023, IGI Global. p. 61-86.
[3]. Dumitru, E.-A., Testing children and adolescents’ ability to identify fake news: a combined design of quasi-experiment and group discussions. Societies, 2020. 10(3): p. 71.
[4]. Fraga-Lamas, P. and T.M. Fernandez-Carames, Fake news, disinformation, and deepfakes: Leveraging distributed ledger technologies and blockchain to combat digital deception and counterfeit reality. IT Professional, 2020. 22(2): p. 53-59.
[5]. Khan, A., K. Brohman, and S. Addas, The anatomy of ‘fake news’: Studying false messages as digital objects. Journal of Information Technology, 2022. 37(2): p. 122-143.
[6]. Altheneyan, A. and A. Alhadlaq, Big Data ML-Based Fake News Detection Using Distributed Learning. IEEE Access, 2023. 11: p. 29447-29463.
[7]. Bruzzese, M., Fake news and information disorder: a journey through QAnon's conspiracy theory. 2021.
[8]. Hakak, S., et al., An ensemble machine learning approach through effective feature extraction to classify fake news. Future Generation Computer Systems, 2021. 117: p. 47-58.
[9]. Singh, G. and K. Selva, A Comparative Study of Hybrid Machine Learning Approaches for Fake News Detection that Combine Multi-Stage Ensemble Learning and NLP-based Framework. 2023.
[10]. Liu, Y. and Y.-F.B. Wu, Fned: a deep network for fake news early detection on social media. ACM Transactions on Information Systems (TOIS), 2020. 38(3): p. 1-33.
[11]. Faustini, P.H.A. and T.F. Covoes, Fake news detection in multiple platforms and languages. Expert Systems with Applications, 2020. 158: p. 113503.
[12]. Weiss, S.M., et al., Text mining: predictive methods for analyzing unstructured information. 2010: Springer Science & Business Media.
[13]. Sudhakar, M. and K. Kaliyamurthie, Effective prediction of fake news using a learning vector quantization with hamming distance measure. Measurement: Sensors, 2023. 25: p. 100601.
[14]. Khan, J.Y., et al., A benchmark study of machine learning models for online fake news detection. Machine Learning with Applications, 2021. 4: p. 100032.
[15]. Baydogan, C. and B. Alatas, Metaheuristic ant lion and moth flame optimization-based novel approach for automatic detection of hate speech in online social networks. IEEE Access, 2021. 9: p. 110047-110062.
[16]. Ozbay, F.A. and B. Alatas, Fake news detection within online social media using supervised artificial intelligence algorithms. Physica A: Statistical Mechanics and its Applications, 2020. 540: p. 123174.
[17]. Amutha, R. and D.V. Kumar, Ensemble based Classification of Dynamic Rumor Detection in Social Networks for Green Communication. Journal of Green Engineering, 2021. 11(2): p. 1220-1243.
[18]. Kaur, P. and M. Edalati, Sentiment analysis on electricity twitter posts. arXiv preprint arXiv:2206.05042, 2022.
[19]. Meel, P. and D.K. Vishwakarma, Fake news, rumor, information pollution in social media and web: A contemporary survey of state-of-the-arts, challenges and opportunities. Expert Systems with Applications, 2020. 153: p. 112986.
[20]. Aslam, N., et al., Fake detect: A deep learning ensemble model for fake news detection. Complexity, 2021. 2021: p. 1-8.
[21]. Kang, M., et al., A study on the influence of online reviews of new products on consumers’ purchase decisions: An empirical study on JD.com. Frontiers in Psychology, 2022. 13: p. 983060.
[22]. Zhang, X. and A.A. Ghorbani, An overview of online fake news: Characterization, detection, and discussion. Information Processing & Management, 2020. 57(2): p. 102025.
[23]. Kaliyar, R.K., A. Goswami, and P. Narang, DeepFakE: improving fake news detection using tensor decomposition-based deep neural network. The Journal of Supercomputing, 2021. 77: p. 1015-1037.
[24]. Vereshchaka, A., S. Cosimini, and W. Dong, Analyzing and distinguishing fake and real news to mitigate the problem of disinformation. Computational and Mathematical Organization Theory, 2020. 26: p. 350-364.
[25]. Di Franco, G. and M. Santurro, Machine learning, artificial neural networks and social research. Quality & Quantity, 2021. 55(3): p. 1007-1025.
[26]. Goldani, M.H., R. Safabakhsh, and S. Momtazi, Convolutional neural network with margin loss for fake news detection. Information Processing & Management, 2021. 58(1): p. 102418.
[27]. Ahmad, I., et al., Fake news detection using machine learning ensemble methods. Complexity, 2020. 2020: p. 1-11.
[28]. Baarir, N.F. and A. Djeffal. Fake news detection using machine learning. in 2020 2nd International Workshop on Human-Centric Smart Environments for Health and Well-being (IHSH). 2021. IEEE.
[29]. Wahab, O.A., Intrusion detection in the iot under data and concept drifts: Online deep learning approach. IEEE Internet of Things Journal, 2022. 9(20): p. 19706-19716.
[30]. Kaliyar, R.K., et al., FNDNet–a deep convolutional neural network for fake news detection. Cognitive Systems Research, 2020. 61: p. 32-44.
[31]. Perincheri, S., et al., An independent assessment of an artificial intelligence system for prostate cancer detection shows strong diagnostic accuracy. Modern Pathology, 2021. 34(8): p. 1588-1595.
[32]. Rahman, M.S., F.B. Ashraf, and M.R. Kabir. An Efficient Deep Learning Technique for Bangla Fake News Detection. in 2022 25th International Conference on Computer and Information Technology (ICCIT). 2022. IEEE.
[33]. Lv, Z. and S. Xie, Artificial intelligence in the digital twins: State of the art, challenges, and future research topics. Digital Twin, 2022. 1(12): p. 12.
[34]. Shahid, W., et al., Detecting and mitigating the dissemination of fake news: Challenges and future research opportunities. IEEE Transactions on Computational Social Systems, 2022.
[35]. Ragia Sultana, M.K.H., et al., An Effective Fake News Detection on Social Media and Online News Portal by Using Machine Learning.
[36]. Madani, M., H. Motameni, and H. Mohamadi, Fake news detection using deep learning integrating feature extraction, natural language processing, and statistical descriptors. Security and Privacy, 2022. 5(6): p. e264.
[37]. Verdonck, T., B. Baesens, M. Óskarsdóttir, and S. vanden Broucke, Special issue on feature engineering editorial. Machine Learning, 2021: p. 1-12.
