D13_Project Report

The project report titled 'Fake News Detection Using Machine Learning' outlines a study conducted by students at Siksha 'O' Anusandhan University, focusing on the challenges of identifying fake news in the digital age. It explores various machine learning algorithms, particularly emphasizing the Decision Tree algorithm for its accuracy in detection. The report includes acknowledgments, individual contributions, and a thorough examination of existing systems, methodologies, and datasets used for the project.

Uploaded by

ipsit9009

FAKE NEWS DETECTION USING

MACHINE LEARNING
A Project Report

Submitted by:

Ashutosh Kumar (2041011113)

Aditi Rath (2041018064)
Ashutosh Sarangi (2041019145)
Indrajit Das (2041004164)

in partial fulfillment for the award of the degree

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Faculty of Engineering and Technology, Institute of Technical Education and Research
SIKSHA ‘O’ ANUSANDHAN (DEEMED TO BE) UNIVERSITY
Bhubaneswar, Odisha, India
(June 2024)
CERTIFICATE

This is to certify that the project report titled “FAKE NEWS DETECTION USING
MACHINE LEARNING” being submitted by Ashutosh Kumar, Aditi Rath, Ashutosh
Sarangi, Indrajit Das of section ‘D’ to the Institute of Technical Education and Research,
Siksha ‘O’ Anusandhan (Deemed to be) University, Bhubaneswar for the partial fulfillment
for the degree of Bachelor of Technology in Computer Science and Engineering is a record of
original bona fide work carried out by them under my/our supervision and guidance. The project
work, in my/our opinion, has reached the requisite standard fulfilling the requirements for the
degree of Bachelor of Technology.

The results contained in this project work have not been submitted in part or full to any other
University or Institute for the award of any degree or diploma.

(Name and signature of the Project Supervisor)

Department of Computer Science and Engineering
Faculty of Engineering and Technology;
Institute of Technical Education and Research;
Siksha ‘O’ Anusandhan (Deemed to be) University

ACKNOWLEDGMENT

We would like to thank Dr. Prativa Das, our project supervisor, from the bottom of our hearts
for all of her help and support during our group project. Her knowledge, perceptions, and
priceless comments have been extremely helpful in forming our project and guaranteeing its
triumphant conclusion. Her patience, attention, and dedication to our education and
development are deeply appreciated.

We express our gratitude to Siksha ‘O’ Anusandhan (Deemed to be) University for providing
the facilities, resources, and infrastructure that were critical to our joint project's success. Our
capacity to conduct research, develop, and collaborate is a result of the institution's
commitment to fostering an atmosphere that promotes academic success and inquiry.

We also acknowledge and express our gratitude to the other group members for their efforts,
cooperation, and project-related contributions. Their varied backgrounds, viewpoints, and
commitment have greatly aided in our project's overall success. Our capacity to collaborate as
a team and overcome challenges has enabled us to achieve our objectives.

Finally, we would like to thank everyone and every organization that has contributed to our
collaborative effort through conversations, criticism, or other forms of aid. Your help and
encouragement have been critical to our growth and achievement of our purpose. We have
developed tremendously as a consequence of your support and encouragement, and our
project has been finished successfully.

Place: Signature of Students

Date:

DECLARATION

We declare that this written submission represents our ideas in our own words and where
other’s ideas or words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of academic honesty
and integrity and have not misrepresented or fabricated or falsified any idea/fact/source in our
submission. We understand that any violation of the above will be cause for disciplinary action
by the University and can also evoke penal action from the sources which have not been
properly cited or from whom proper permission has not been taken when needed.

Signature of Students with Registration Numbers

Date: ___________

2041018064

2041011113

2041004164

REPORT APPROVAL

This project report titled “FAKE NEWS DETECTION USING MACHINE LEARNING”
submitted by Ashutosh Kumar (2041011113), Aditi Rath (2041018064), Ashutosh
Sarangi (2041019145), Indrajit Das (2041004164) is approved for the degree of Bachelor of
Technology in Computer Science and Engineering.

Examiner(s)

________________________
________________________
________________________

Supervisor

________________________

Project Coordinator
________________________

PREFACE

Fake news, a term denoting misinformation or disinformation spread via various media
channels, has become a pervasive issue in today's digital era. The importance of detecting and
combating fake news cannot be overstated, as it can distort public perception, influence
elections, and incite social unrest. This report explores the critical need for effective fake news
detection mechanisms. The main issues include the rapid dissemination of false information,
the sophisticated nature of modern fake news, and the challenge of distinguishing it from
genuine news. To address these issues, various algorithms have been employed, such as
Natural Language Processing (NLP) techniques, machine learning models like Naive Bayes,
Support Vector Machines (SVM), Decision Tree, Logistic Regression and Random Forest
algorithms. Among these, the Decision Tree (DT) algorithm demonstrated the highest accuracy and robustness
in detecting fake news in our experiments, making it the preferred choice.

INDIVIDUAL CONTRIBUTIONS

Ashutosh Kumar: Introduction, Project overview, Motivation and Uniqueness.

Aditi Rath: Literature survey, Material and methods, Model Diagram, Methods used, Tools used, Evaluation Measures Used, Result, Experimental Outcomes.

Ashutosh Sarangi: System Specifications, Parameters Used.

Indrajit Das: Existing System, Problem Outcomes.

TABLE OF CONTENTS
Title Page
Certificate
Acknowledgment
Declaration
Report Approval
Preface
Individual Contributions
Table of Contents
List of Figures
List of Tables

1. INTRODUCTION
1.1 Introduction
1.2 Project Overview
1.3 Motivation
1.4 Uniqueness of the Work
1.5 Report Layout

2. LITERATURE SURVEY
2.1 Existing System
2.2 Problem Identification

3. METHODS
3.1 Dataset Description
3.2 Model Diagram
3.3 Methods
3.4 Libraries
3.5 Evaluation Measures

4. EXPERIMENTATION AND RESULTS
4.1 System Specifications
4.2 Parameters Used
4.3 Results and Outcomes
4.4 Result Analysis and Validation

5. CONCLUSIONS
6. REFERENCES
7. REFLECTION OF THE TEAM MEMBERS ON THE PROJECT
8. SIMILARITY REPORT

LIST OF FIGURES

NO   FIGURE NAME

1    Representing Fake Datasets
2    Representing True Datasets
3    Frequent words in fake news
4    Frequent words in true news
5    Article per subject
6    News percentage representing Pie-Chart
7    Model Diagram of Fake news Recognition
8    Confusion matrix
9    User Interface for prediction of fake news

LIST OF TABLES

NO   TABLE NAME

1    Performance Metrics of the Classifiers

1. INTRODUCTION

1.1 Introduction

This introduction provides a concise overview of the project's requirements and
objectives. It outlines the issues or challenges the project aims to address, emphasizing
the initiative's motivations. The originality of the work is underscored, showcasing its
innovative aspects and contributions to the field. Additionally, the report layout section
offers a roadmap for the reader, briefly describing the organizational structure of the
report and guiding them through its various sections.

1.2 Project Overview

Due to the increased density of international information exchange, the average
individual nowadays struggles to distinguish between true and fake news. Users of online
social networks are quickly influenced by the deceptive language used in fake news,
which has already had a profound effect on offline society. To increase the reliability of
information in online social networks, it is crucial to swiftly identify fake news. This
study addresses the problems caused by the elusive characteristics of fake news and the
complex relationships between news items, producers, and subjects. Machine learning-
based methods for identifying false news can help mitigate the harmful effects of
misinformation by providing a more accurate and efficient way to verify the reliability of
news sources.

Numerous examples exist of supervised and unsupervised learning algorithms being used
to categorize text within current fake news corpora. However, most research focuses on
specific datasets or domains, with the political domain being particularly prominent.
Consequently, algorithms trained on a specific type of article do not perform optimally
when exposed to articles from different domains. Developing a general algorithm that
performs well across all news domains is challenging due to the varying textual structures
of articles from different domains.

In this research, we propose a machine learning ensemble strategy to address the issue of
fake news detection. Our study examines various textual characteristics that can
distinguish between authentic and fraudulent content. We train several different machine
learning algorithms using a variety of ensemble methods that are not well explored in the
existing literature. These methods enable the effective and efficient training of various
machine learning algorithms. Additionally, we conducted thorough tests on four real-
world datasets that are freely accessible to the public.

1.3 Motivation

The goal is to identify news articles or other materials that make false or deceptive
claims. Fake news detection systems are crucial in curbing the rapid spread of
misinformation through social media platforms and other communication channels by
educating consumers about the characteristics and indicators of fake news. These
technologies enable people to consume news and information safely. Ensemble learners
have proven effective in numerous applications, as these learning models tend to reduce
error rates by utilizing strategies like bagging and boosting.

1.4 Uniqueness of the Work

Various algorithms are used by different fake news detection systems to recognize and
categorize fake news, and the algorithm selected has a big influence on the system's
accuracy. Numerous data sources, such as social media sites, news websites, and fact-
checking databases, are accessible to these systems. These algorithms employ a variety of
features, including linguistic ones like syntax and grammar as well as semantic ones like
word choice and context, to detect fake news.

1.5 Report Layout

This report is organized into sections so that details are easy to locate. Section 1
presents the introduction; Section 2 covers the literature review; and Section 3 presents
our proposed model. Section 4 reports our results together with the other statistical
measures. Section 5 concludes the report with potential future considerations.
2. LITERATURE SURVEY

This part examines the systems and solutions that are currently in use and are important
to the project. It also gives an overview of prior attempts and their flaws. This review
describes the difficulties with the current systems and provides a foundation for
identifying the problems that the project seeks to solve.

2.1 Existing System

In [1], the authors comprehensively compare high-performing models and their
characteristics for fake news detection using both machine learning and deep learning
algorithms. In [2], a transformer-based approach is proposed for fake news detection,
focusing on both news content and social contexts. This study employs Transformer-
based Encoder and Decoder models, achieving superior accuracy in a matter of minutes
using the LIAR and FakeNewsNet datasets. Patil et al. [3] investigate the effectiveness
of several machine learning algorithms, including Naive Bayes, SVM, and Passive
Aggressive Classifier, for fake news detection. Using a dataset containing both real and
fake news, the SVM model achieved an accuracy of 95.05%. In [4], Khanam et al.
explore various machine learning approaches on the LIAR dataset. They employ
algorithms such as Random Forest, SVM, Decision Tree, Naive Bayes, and XGBoost,
achieving an accuracy of over 75%. Ahmad et al. [5] focus on the use of ensemble
methods for fake news detection, utilizing the Kaggle and ISOT Fake News datasets.
Their study employs algorithms including Random Forest (RF), Linear SVM (LSVM),
K-Nearest Neighbors (KNN), and Logistic Regression (LR), with the RF algorithm
reaching a 99% accuracy rate. In [6], Goswami et al. evaluate multiple machine learning
techniques using the LIAR dataset. Their study, published on SSRN, employs methods
such as XGBoost, Random Forest (RF), AdaBoost, ExtraTrees, and Bagging, with the
Bagging Classifier and AdaBoost achieving an accuracy of 70%.

2.2 Problem Identification

Building a trustworthy fake news detection system faces many challenges. One of the
main challenges is that different studies use different databases; therefore, there is no
uniform dataset. It is challenging to compare system performance correctly because of
this lack of standardization.

Furthermore, the sheer volume and speed of online information present significant
hurdles. The rapid diffusion of information and the large volume of content produced and
shared make it challenging to stay up to date on the latest news and verify its accuracy
before it circulates widely.

Additionally, the deliberate production and dissemination of misleading information by
those with ulterior motives complicates the problem further. Bad actors may use deceptive
tactics, such as creating fake social media accounts or manipulating visual media, to alter
public opinion or advance their own objectives through the spread of false information.

In conclusion, the lack of standardized datasets, the vast amount and speed of information
available online, and the existence of deliberate disinformation efforts by people with
hidden agendas are the challenges in creating a system to identify fake news. Developing
trustworthy methods for spotting and stopping fake news requires addressing these
problems.

3. METHODS

The materials and methods section includes a brief description of the datasets used, as
well as a synopsis of their features. A schematic layout or model diagram is also included
to illustrate the system's or model's structure. A brief description of the project's
methodologies is provided, with a focus on the key algorithms or techniques employed.
The project's technology stack, including any tools or software utilized, is explained.
Furthermore, the evaluation metrics or criteria that were employed to assess the project's
solution's efficacy are examined.

3.1 Dataset Description

The datasets used for this investigation are freely available online and are open source.
They include news stories from various domains, both fake and genuine. Fake news
websites present unsupported statements, while authentic news articles provide accurate
accounts of real events. Many of the political statements in these articles can be manually
verified using fact-checking websites like politifact.com and snopes.com. We now
describe the datasets examined in our work. We acquired the news
article-based datasets from Kaggle [6]. Each article is labeled as either “fake” or “true.”
The dataset includes the title, text, subject, and date of each article. The title is the
headline of the news piece; the text is the main content, detailing the news’s focus; the
subject indicates the nature of the news; and the date shows the publication date.
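
The dataset structure described above can be illustrated with a short pandas sketch. The rows below are invented for illustration and are not taken from the Kaggle files; the `label` column name is also an assumption. The sketch only shows how two sources with the columns title, text, subject, and date might be labeled and combined.

```python
import pandas as pd

# Toy stand-ins for the fake and true sources. These rows are invented
# for illustration only and do not come from the real dataset.
fake = pd.DataFrame({
    "title": ["Shocking claim goes viral"],
    "text": ["An unsupported statement spreads online."],
    "subject": ["News"],
    "date": ["December 31, 2017"],
})
true = pd.DataFrame({
    "title": ["Parliament passes budget", "Central bank holds rates"],
    "text": ["Lawmakers approved the budget on Tuesday.",
             "The bank left interest rates unchanged."],
    "subject": ["politicsNews", "worldnews"],
    "date": ["December 30, 2017", "December 29, 2017"],
})

# Attach the class label to each source before combining them.
# The column name "label" is an assumption.
fake["label"] = "fake"
true["label"] = "true"

# Stack both sources into one frame with a fresh 0..n-1 index.
news = pd.concat([fake, true], ignore_index=True)

print(news.shape)                    # (rows, columns)
print(news["label"].value_counts())  # number of articles per class
```

Calling `.shape` on each source frame is how row and column counts such as those reported for Figures 1 and 2 are obtained.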

Figure 1. Representing Fake Datasets

Figure 1 shows that the fake news dataset has the shape (23481, 4), which indicates that it has
23481 rows and 4 columns.

Figure 2. Representing True Datasets

Figure 2 shows that the true news dataset has the shape (21417, 4), which indicates that it
has 21417 rows and 4 columns.

Figure 3. Frequent words in fake news

Figure 4. Frequent words in true news

Figure 3 and Figure 4 above show the most frequent words in the fake and true news
datasets, respectively. Frequent words can help reveal the deceptive patterns that fake
news articles often share. We first preprocessed the text by removing commas and other
punctuation, and then tokenized the text into individual words. The frequency of each
term in the dataset was then counted, with the counts separated by the label of the news
story, i.e., authentic or fraudulent. In this way we identify the words that appear most
frequently in the text. Analyzing the frequently used terms makes it easier to understand
the common concepts, themes, or topics related to fake news.
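
The word-frequency procedure described above can be sketched with the Python standard library. This is a minimal illustration rather than the project's code: the example sentences are invented, the helper name `word_frequencies` is hypothetical, and lowercasing before counting is an added assumption.

```python
import string
from collections import Counter


def word_frequencies(texts):
    """Count word occurrences across a list of texts."""
    counts = Counter()
    # Translation table that deletes every ASCII punctuation character.
    strip_punct = str.maketrans("", "", string.punctuation)
    for text in texts:
        # Lowercasing is an assumption; the report only mentions
        # punctuation removal and tokenization at this step.
        cleaned = text.lower().translate(strip_punct)
        counts.update(cleaned.split())  # whitespace tokenization
    return counts


# Invented (label, text) pairs used only for illustration.
articles = [
    ("fake", "Shocking! You won't believe this shocking story."),
    ("true", "The minister said the budget was approved."),
    ("fake", "Believe it: the story is shocking."),
]

# Group the texts by label, then count words separately per label.
by_label = {}
for label, text in articles:
    by_label.setdefault(label, []).append(text)

freqs = {label: word_frequencies(texts) for label, texts in by_label.items()}
print(freqs["fake"].most_common(3))
print(freqs["true"].most_common(3))
```

Keeping one counter per label makes it straightforward to plot the top terms of each class side by side.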

Figure 5. Article per subject

Figure 5 shows how many articles are available for each subject. The articles are divided
into the following categories: government news, Middle East news, general news, US
news, left news, political news, and world news.

Figure 6. News percentage

Figure 6 shows the share of fake and true articles in the dataset: 23481 articles, or
about 52% of the total, are fake, while 21417 articles, or about 48% of the total, are
true.

3.2 Model Diagram

The process of developing a system to detect fake news is illustrated in Figure 7. It
entails data preparation, data splitting, TF-IDF feature extraction, classification with a
decision tree classifier, and performance evaluation using metrics and a confusion
matrix. The flow of these phases is depicted, and the essential components engaged in
each step of the process are highlighted visually.

Figure 7. Model Diagram

3.3 Methods

To achieve a successful implementation, various approaches are utilized. The initial stage
is data preprocessing, which converts raw data into a comprehensible format suitable for
training machine learning models. Several methods are used in this transformation,
including managing missing data, normalizing the data, feature scaling, handling
outliers, feature extraction, and data partitioning. Preprocessing ensures that the data is
suitable for analysis, which leads to more accurate results and enhances the performance
of ML models. It also enables data scientists to make informed decisions and reduces
errors and inconsistencies.

First, we concatenate our data and then remove non-essential columns to streamline the
dataset. We then standardize the text by converting it to lowercase and removing
punctuation. Capitalization may differ between sources and can lead to duplicated or
inconsistent values if not standardized, and punctuation usually carries little value for
text analysis tasks, so this step makes the text data more consistent and easier to work
with.

Next, we consider the features used by the ML model. Features are measurable properties
or aspects of data that influence the model's learning process and provide crucial
information for accurate classifications or predictions. We use TF-IDF, which measures
the significance of terms within a document relative to a collection of documents, aiding
text analysis by highlighting important terms. We then split our data into training and
testing sets. Our workflow involves defining and evaluating a decision tree classifier.
The pipeline includes three steps: the CountVectorizer, which transforms text data into
word count matrices; the TF-IDF Transformer, which applies TF-IDF weights; and the
Decision Tree Classifier, which is trained on the TF-IDF weighted word counts.
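
The three-step pipeline described above can be sketched with Scikit-learn as follows. This is a minimal sketch under stated assumptions: the training sentences and labels are invented, the step names `vect`, `tfidf`, and `clf` are arbitrary, and all hyperparameters other than `random_state` are library defaults rather than the project's settings.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Invented toy corpus; the real project uses the Kaggle news articles.
texts = [
    "shocking miracle cure revealed by anonymous insider",
    "celebrity secret exposed in shocking leaked memo",
    "you will not believe this unbelievable hoax",
    "parliament approves the annual budget after debate",
    "court upholds election ruling in unanimous decision",
    "central bank keeps interest rates unchanged",
]
labels = ["fake", "fake", "fake", "true", "true", "true"]

model = Pipeline([
    ("vect", CountVectorizer()),      # text -> word count matrix
    ("tfidf", TfidfTransformer()),    # counts -> TF-IDF weights
    ("clf", DecisionTreeClassifier(random_state=0)),  # tree on TF-IDF features
])

# Fitting the pipeline runs the three steps in order on the training text.
model.fit(texts, labels)

prediction = model.predict(["shocking secret cure exposed"])
print(prediction)
```

Wrapping the steps in a single `Pipeline` means the same vocabulary and weights learned during training are reused automatically at prediction time.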

To evaluate fake news detection classifiers, we used the Decision Tree (DT) algorithm, a
supervised learning method that predicts outcomes by recursively splitting data into
subsets based on the most significant features.

Naive Bayes is a probabilistic machine learning algorithm based on Bayes' Theorem that
assumes the predictors are independent of one another. It is frequently applied to
classification problems, such as the identification of fake news.

3.3.1 Support Vector Machine: - The SVM algorithm's goal is to locate a hyperplane
that, as far as feasible, separates the data points of one class from those of another. The
approach finds such a hyperplane only for linearly separable problems; for most real-world
problems, it optimizes the soft margin, permitting a limited number of misclassifications.
The subset of training observations that determines the position of the separating
hyperplane is referred to as the support vectors.

3.3.2 Logistic Regression: - Logistic Regression (LR) is the appropriate regression
analysis to conduct when the dependent variable is dichotomous (binary). Like all
regression analyses, logistic regression is a predictive analysis. It describes data and
explains the relationship between one dependent binary variable and one or more
nominal, ordinal, interval, or ratio-level independent variables.

3.3.3 Random Forest: - Another popular machine learning algorithm for a variety of
tasks, including the identification of false news, is Random Forest (RF). This kind of
ensemble learning technique constructs several decision trees and combines them to get a
forecast that is more reliable and accurate. Using bootstrapping, a random subset of
features and a random subset of data are used to train each tree.
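
As a rough illustration of how the classifiers discussed in this section could be trained side by side, the following sketch fits each one on the same TF-IDF features. The corpus is invented, and the specific Scikit-learn estimators (`MultinomialNB`, `LinearSVC`, and default hyperparameters) are assumptions rather than the project's confirmed choices. `TfidfVectorizer` is used here as a shorthand for count vectorization followed by TF-IDF weighting.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Invented toy corpus used only to make the sketch runnable.
texts = [
    "shocking miracle cure revealed by anonymous insider",
    "celebrity secret exposed in shocking leaked memo",
    "you will not believe this unbelievable hoax",
    "parliament approves the annual budget after debate",
    "court upholds election ruling in unanimous decision",
    "central bank keeps interest rates unchanged",
]
labels = ["fake", "fake", "fake", "true", "true", "true"]

# Estimator choices below are assumptions for illustration.
classifiers = {
    "Naive Bayes": MultinomialNB(),
    "Support Vector Machine": LinearSVC(),
    "Logistic Regression": LogisticRegression(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(texts, labels)
    # Accuracy on the training data itself: only a sanity check,
    # not an estimate of performance on unseen articles.
    scores[name] = model.score(texts, labels)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

On a corpus this small the scores carry no meaning; a real comparison would score each model on a held-out test set.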

The final stage is prediction. After training on historical data, the algorithm predicts
outcomes based on new information. We assess performance using metrics like F1 score,
recall, precision, and accuracy. A confusion matrix, which displays predicted versus
actual values, helps evaluate the classifier's performance. The model's structure is
paramount in this process.

3.4 Libraries

Key Python modules used in the project include matplotlib for data visualization, NumPy
for numerical computations, Pandas for data analysis and manipulation, and Scikit-Learn
for machine learning tasks. These technologies make it possible for efficient data
processing, analysis, visualization, and machine learning algorithm application to achieve
the project's objectives.

3.4.1 NumPy

NumPy, a core Python module, plays a vital role in fake news detection programs by
providing efficient numerical computations and array operations. Its array data structure
enhances data representation, allowing for the storage and manipulation of multi-
dimensional arrays.

3.4.2 Pandas

Pandas plays a crucial role in fake news detection by efficiently handling and transforming
datasets. It provides a high-level data structure called a DataFrame, which simplifies
organizing, exploring, and preprocessing the dataset. Overall, Pandas is a key component
of the code, streamlining dataset handling, preprocessing, and feature creation.

3.4.3 Matplotlib

Matplotlib is a powerful Python data visualization package that can help summarize the
results and enhance the study of false news detection models. When it comes to
identifying false news, Matplotlib can be utilized in a multitude of ways to provide an
understanding of the predictions and summarize the model's performance.

3.4.4 Scikit-Learn

Scikit-learn, a well-known Python machine learning toolkit, is used extensively in the
fake news detection code. It is essential to several operations, including data
preparation, training, and evaluation, and the code draws on several of its
functionalities. The `train_test_split` function from Scikit-learn is first used to divide
the dataset into training and testing sets.

3.4.5 NLTK

Natural Language Toolkit is a potent Python library that is widely used for tasks related to
natural language processing (NLP), such as the identification of false news. Machine
learning models are constructed using the numerical representations of text data
following feature extraction. A variety of algorithms, including Support Vector Machines
(SVM), Naive Bayes, and even deep learning methods, can be used.

3.4.6 Word Cloud

A word cloud is a visual representation of text data in which the size of each word
shows the term's relevance or frequency within the text corpus. Word clouds can be
quite useful for both exploratory data analysis and feature extraction when it comes to
machine learning (ML) techniques for fake news identification.

3.5 Evaluation Measures

Several evaluation techniques can be used to assess the effectiveness of the classifier in
identifying fake news using machine learning. Commonly employed evaluation metrics
include:

1. Accuracy: This metric calculates the percentage of correct predictions made by
the classifier out of the total number of predictions.

\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1} \]

2. Precision: Precision measures the proportion of true positive predictions relative
to the total number of positive predictions.

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]

3. Recall: Also known as sensitivity or the true positive rate, recall measures the
proportion of actual positive events that the classifier correctly identified.

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{3} \]

4. F1-score: This metric combines precision and recall into a single statistic, offering
a balanced assessment that accounts for both false positives and false negatives.
The F1-score is especially useful when dealing with imbalanced datasets.

\[ \text{F1-Score} = \frac{TP}{TP + \frac{1}{2}(FP + FN)} \tag{4} \]

Where, TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative.

5. Confusion Matrix: A confusion matrix is a tabular representation that shows the
true positive, true negative, false positive, and false negative predictions.

Using functions provided by libraries like Scikit-learn, these evaluation metrics can be
calculated. Combining these metrics gives a comprehensive understanding of the model's
effectiveness in detecting fake news, including its accuracy, precision, recall, and the
distribution of predictions, as shown in the confusion matrix.
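
As a worked illustration of Equations (1) to (4), the following sketch computes the four metrics directly from confusion-matrix counts. The function name `classification_metrics` and the example counts are hypothetical, and the sketch assumes every denominator is non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # Equation (1)
    precision = tp / (tp + fp)                   # Equation (2)
    recall = tp / (tp + fn)                      # Equation (3)
    f1 = tp / (tp + 0.5 * (fp + fn))             # Equation (4)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
m = classification_metrics(tp=90, tn=85, fp=10, fn=15)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Equation (4) is algebraically the same as the harmonic mean of precision and recall, 2PR / (P + R), which offers a quick consistency check on the computed values.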

4. EXPERIMENTATION AND RESULTS

This section describes the system specifications used in the project and the parameters
that were varied or held fixed during the experiments. It concludes by presenting the
experimental findings, together with the statistical metrics and visuals that demonstrate
the system's performance.

4.1 System Specifications

To complete this project, the following hardware specifications are required:

• Processor: 11th-generation Intel i5

• RAM: 8GB

• Storage: 512GB SSD

• Graphics Card: NVIDIA GEFORCE GTX 1650

The operating system is Windows (64-bit), and Visual Studio Code (VS Code) is used for
Python development. This hardware and software configuration is sufficient to design
and execute the project effectively.

4.2 Parameters Used

To detect fake news using machine learning, we implemented several preprocessing and
feature extraction techniques, leveraging CountVectorizer and TF-IDF for feature
representation and employing a Decision Tree Classifier for classification. The
preprocessing stage standardized the textual data to make it suitable for machine
learning algorithms. We started by converting all text to lowercase using the `apply()`
function from the pandas library, which applies a supplied function to each element of
a DataFrame column. Here, a lambda function applied `lower()` to each text element in
the 'text' column, converting all uppercase letters to lowercase. This step is critical
because it standardizes the text data, eliminating conflicts caused by differences in
capitalization between sources.
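
The lowercasing step can be sketched with pandas as follows. The sample headlines are invented. The punctuation removal shown alongside it follows the step mentioned in Section 3.3, but the use of `str.translate` for it is an assumption about how that step might be implemented.

```python
import string

import pandas as pd

# Invented sample rows; the real DataFrame holds the news articles.
df = pd.DataFrame({"text": ["BREAKING: Markets RALLY!", "Officials Deny the Report."]})

# Convert every element of the 'text' column to lowercase.
df["text"] = df["text"].apply(lambda x: x.lower())

# Remove ASCII punctuation (assumed implementation of the punctuation step).
punct_table = str.maketrans("", "", string.punctuation)
df["text"] = df["text"].apply(lambda x: x.translate(punct_table))

print(df["text"].tolist())
```

After both steps, "Markets" and "MARKETS" map to the same token, so they are no longer counted as different words.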

Next, we removed common words, known as stop words, using the `stopwords` corpus.
Stop words, such as "a," "the," "is," and "are," occur frequently in language but often
carry little meaning and can interfere with the analysis of the underlying text. The
`download()` function is used to download the corpus of stop words; the
`stopwords.words()` function call then returns a list of all the stop words in English.
We applied another lambda function to each element in the 'text' column of the
DataFrame, splitting each text element into individual words using the `split()` function.
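
Stop-word filtering with `apply()`, a lambda, and `split()` can be sketched as below. To keep the example self-contained, a small hand-written set named `STOP_WORDS` stands in for the downloaded English stop-word list; both the set and the sample sentences are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded English stop-word list.
STOP_WORDS = {"a", "the", "is", "are", "of", "to"}

# Invented sample rows, already lowercased.
df = pd.DataFrame({"text": ["the report is a hoax", "officials are going to respond"]})

# Split each text into words, drop the stop words, and join the rest back.
df["text"] = df["text"].apply(
    lambda x: " ".join(word for word in x.split() if word not in STOP_WORDS)
)

print(df["text"].tolist())
```

Using a set for the stop words keeps each membership check fast, which matters when the filter runs over tens of thousands of articles.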

Tokenization, a crucial step in text preprocessing, involves splitting text into tokens
(words). By converting all text to lowercase during vectorization, the Count Vectorizer
ensures that variations in capitalization do not result in separate tokens. Additionally, the
Count Vectorizer allows for the elimination of stop words, further refining the textual
data by excluding words that frequently occur but carry little meaningful information.
This refinement ensures that the text representation focuses on more significant terms that
contribute to distinguishing between real and fake news.
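
A minimal sketch of these two CountVectorizer behaviors is shown below. The documents are invented, and the built-in `stop_words="english"` list is an assumption; the project may have supplied its own list.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Invented documents that differ in capitalization.
docs = ["The Senate PASSED the budget", "the senate rejected THE budget"]

# lowercase=True is the default; it is spelled out here for clarity.
# stop_words="english" uses Scikit-learn's built-in list (an assumption).
vectorizer = CountVectorizer(lowercase=True, stop_words="english")
counts = vectorizer.fit_transform(docs)

# The learned vocabulary maps each kept token to a column index.
print(sorted(vectorizer.vocabulary_))
print(counts.toarray())
```

Because of lowercasing, "Senate" and "senate" share one column, and the stop word "the" does not appear in the vocabulary at all.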

We used the Term Frequency-Inverse Document Frequency (TF-IDF) method for feature
extraction. This is a statistical method that measures the importance of a word in a
document relative to a collection of documents (corpus). It consists of two components:
Term Frequency (TF) and Inverse Document Frequency (IDF). Term Frequency (TF)
measures the frequency of a word in a document. It is obtained by counting the number
of times a word appears in a document, and then dividing this count by the total number
of words in the document.
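
The TF definition above can be worked through by hand as follows. The IDF formula used here, log(N / df), is the textbook form and is an assumption, since the text does not spell it out; Scikit-learn's TF-IDF Transformer uses a smoothed variant by default, so its values will differ. The corpus and function names are hypothetical, and the sketch assumes the term occurs in at least one document.

```python
import math

# Invented, already tokenized corpus.
docs = [
    "fake news spreads fast".split(),
    "real news reports facts".split(),
    "fake claims spread fast".split(),
]


def term_frequency(term, doc):
    """Occurrences of the term divided by the total number of words."""
    return doc.count(term) / len(doc)


def inverse_document_frequency(term, corpus):
    """Textbook IDF, log(N / df). Assumes the term occurs in the corpus."""
    n_containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / n_containing)


def tf_idf(term, doc, corpus):
    return term_frequency(term, doc) * inverse_document_frequency(term, corpus)


# "news" occurs in two documents, "reports" in only one, so "reports"
# receives the higher weight within the second document.
print(tf_idf("news", docs[1], docs))
print(tf_idf("reports", docs[1], docs))
```

Both terms have the same TF in the second document, so the difference in their weights comes entirely from the IDF component.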

We used the `train_test_split` function from the Scikit-learn library to split the dataset
into training and testing subsets. The training set is needed for the model to learn the
patterns and features in news articles, while the test set is needed to evaluate its
performance on unseen data.
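
A minimal sketch of the split is shown below. The 80/20 ratio, the `random_state` value, and the use of `stratify` are assumptions chosen for illustration; the text does not state the settings actually used.

```python
from sklearn.model_selection import train_test_split

# Invented placeholder articles and labels.
texts = [f"article {i}" for i in range(10)]
labels = ["fake"] * 5 + ["true"] * 5

# test_size, random_state, and stratify are illustrative assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

print(len(X_train), len(X_test))
print(sorted(y_test))
```

Passing `stratify=labels` keeps the proportion of fake and true articles the same in both subsets, and fixing `random_state` makes the split reproducible.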

We chose the Decision Tree Classifier because it has several advantages in the context of
detecting fake news. Decision trees are highly interpretable, which makes the model's
conclusions easy to understand and justify. This interpretability is critical in fake
news detection, since it is necessary to understand the reasons behind classifying news
as real or fake. Decision trees can handle both categorical and numerical features,
making them useful for assessing the various data types found in news items, including
text, headlines, and metadata.

The Decision Tree Classifier's overall simplicity, interpretability, versatility, and capacity
to recognize complex patterns make it a crucial tool in fake news detection, enabling
clear and understandable decision-making and contributing to the identification of
significant features that enhance the credibility of the classification results.
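
The split, feature extraction, and training steps described above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the project's code: the eight example texts are made up, the label encoding (1 = fake, 0 = real) is assumed, and the values of `test_size` and `random_state` are chosen only for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy, made-up examples; label 1 = fake, 0 = real (an assumed encoding).
texts = [
    "shocking miracle cure doctors hate",
    "parliament passes annual budget bill",
    "aliens secretly control world banks",
    "city council approves new bus routes",
    "celebrity clone replaces famous singer",
    "central bank holds interest rates steady",
    "secret miracle pill melts fat overnight",
    "university publishes annual research report",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Hold out 25% of the data; stratify keeps both classes in each subset.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Fit TF-IDF on the training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train_vec, y_train)
predictions = model.predict(X_test_vec)
print(len(X_train), len(X_test), list(predictions))
```

Fitting the vectorizer on the training subset alone keeps information from the test set out of the features. For inspecting the learned rules, scikit-learn also provides `sklearn.tree.export_text`, which prints a fitted tree as nested if/else conditions.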

In summary, our approach to fake news detection involves comprehensive text
preprocessing and feature extraction, followed by model training and evaluation using
robust techniques.

4.3 Results and Outcomes

As shown in Table 1, the decision tree classifier gave the best results among the models
we compared. It achieved an F1-score of 99.61%, an accuracy of 99.6%, a precision of
99.75%, and a recall of 99.52%; Table 1 lists precision, recall, and F1 score as whole
percentages.

Table 1. Performance Metrics of the Classifiers

Classifier                Accuracy   Precision   Recall   F1 Score
Naive Bayes               93.65%     94%         94%      94%
Random Forest             98.75%     99%         99%      99%
Logistic Regression       98.78%     99%         99%      99%
Support Vector Machine    99.33%     99%         99%      99%
Decision Tree             99.67%     100%        100%     100%

Figure 8. Confusion matrix

We used several metrics to assess the performance of the methods, and the confusion
matrix is the foundation for most of them. A confusion matrix tabulates a model's
predictions on the test set against the actual labels. The most commonly used metric is
accuracy, which indicates the proportion of observations, whether real or fake, that
were predicted correctly.

From the predictions produced by the decision tree classifier, we created a confusion
matrix. Figure 8 shows this confusion matrix, which tabulates the predicted and actual
outcomes of the classification problem and helps in assessing the classifier's
performance. It presents a table of all of the classifier's predicted and actual values.
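
How accuracy, precision, recall, and F1 score follow from the four cells of a confusion matrix can be shown with a short plain-Python sketch. The labels below are made up for illustration (1 = fake, 0 = real is an assumed encoding) and are not the project's results; the `confusion_counts` helper is hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, tn, fp, fn) for a binary classification problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return tp, tn, fp, fn


# Made-up labels: 1 = fake, 0 = real (an assumed encoding).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(actual, predicted)

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted fakes that are fake
recall = tp / (tp + fn)                     # share of actual fakes that were caught
f1 = 2 * precision * recall / (precision + recall)

print(tp, tn, fp, fn)
print(accuracy, precision, recall, f1)
```

Accuracy alone can hide errors on the smaller class, which is why precision, recall, and F1 score are reported alongside it.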

4.4 Result Analysis and Validation

Figure 9. User Interface for prediction of fake news

Finally, Figure 9 shows the user interface of our project, which runs on the
best-performing model, the Decision Tree Classifier.

Designing a decision tree classifier-based UI for fake news detection involves four
steps: developing and evaluating the decision tree model, designing an intuitive and
user-friendly interface, integrating the model with the UI via a backend service, and
continuously improving the model based on user feedback and model performance. This
approach ensures that the system is user-friendly and easily comprehensible in addition
to being accurate in identifying fake news.

5. CONCLUSION

Identifying false information is both essential and difficult in today's digital age.
The rapid growth of social networks and online platforms has led to the widespread
circulation of false information, which supports anti-social behavior and has a large
impact on social digital marketing. Identifying false information remains a persistent
and complicated challenge, with no single solution capable of stopping its
dissemination. A comprehensive plan is required, which includes technological
strategies, media literacy, and critical thinking skills. Methods such as machine
learning and natural language processing (NLP) have been developed to deal with this
issue. Education and technology are essential to fight false news, as they can help
people acquire analytical thinking skills that enable them to evaluate the reliability
of information sources and their quality. Technology businesses
must collaborate with policymakers to successfully combat the spread of false
information. Through collaboration, we can improve transparency and decrease the
dissemination of false information in the digital environment. Ongoing research,
partnerships among different groups, and the creation of innovative technologies are
crucial for tackling these problems and protecting the accuracy of information in the
digital age.

6. REFERENCES

[1] Wang, Y., Qian, X. Li Y., & Zhang, H. (2018). Fake news detection on social
media: A data mining perspective using a hybrid deep learning model. ACM
Transactions on Management Information Systems (TMIS), 9(3), 1-21.

[2] Albahr, A., & Albahar, M. (2020). An empirical comparison of fake news
detection using different machine learning algorithms. International Journal of
Advanced Computer Science and Applications, 11(9).

[3] Khan, A. I., Shahzad, F., & Ali, S. (2019). Fake news detection: a deep learning
approach using CNN. IEEE Access, 7, 44112-44121. doi: 10.1109/ACCESS.2019.2901590.

[4] Thakur, P., Shah, R. R., & Rana, N. P. (2020). A survey on automated fake news
detection: Trends and challenges. Information Processing & Management, 57(2),
102026. doi: 10.1016/j.ipm.2019.102026.

[5] Kumar, R., Singh, R. K., & Roy, P. P. (2021). Fake news detection on social media:
A review. Artificial Intelligence Review, 54(4), 2997-3030. doi: 10.1007/s10462-
020-09981-4.

[6] Allcott, H., & Gentzkow, M. (2017). Social Media and Fake News in the 2016
Election. Journal of Economic Perspectives, 31(2), 211-236.
https://doi.org/10.1257/jep.31.2.211

[7] Conroy, N. J., Rubin, V. L., & Chen, Y. (2015). Automatic deception detection:
Methods for finding fake news. Proceedings of the Association for Information
Science and Technology, 52(1), 1-4.
https://doi.org/10.1002/pra2.2015.145052010082

[8] Shu, K., Sliva, A., Wang, S., Tang, J., & Liu, H. (2017). Fake News Detection on
Social Media: A Data Mining Perspective. ACM SIGKDD Explorations
Newsletter, 19(1), 22-36. https://doi.org/10.1145/3137597.3137600

[9] Rashkin, H., Choi, E., Jang, J. Y., Volkova, S., & Choi, Y. (2017). Truth of
Varying Shades: Analyzing Language in Fake News and Political Fact-Checking.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language
Processing (EMNLP), 2931-2937. https://doi.org/10.18653/v1/D17-1317.

7. Reflection of the Team Members on The Project

Working on the project to identify fake news using machine learning techniques like
decision trees has given our team new knowledge and experience. We were able to
achieve our objectives through efficient teamwork since every team member brought a
unique set of skills and perspectives to the table. In this reflection, each of us will offer
our unique viewpoints and project-related contributions.

During this project, I, Aditi Rath (2041018064), focused mostly on feature
engineering and data preprocessing. I collected and analyzed a large number of news items
using techniques like tokenization, stemming, and TF-IDF in order to extract useful
features. Using decision tree algorithms, I was able to identify crucial features that help
identify false news. In order to improve the decision tree model's performance, I also
performed cross-validation and made hyperparameter adjustments. In addition, I actively
participated in team meetings and discussions by bringing up ideas to improve workflow
overall and boost team output. I also focused on creating and evaluating the decision
tree model, which required understanding the preprocessed data and selecting the
appropriate features for the model's training. To assess the precision, recall, and
accuracy of the model, I employed decision tree approaches and conducted a comprehensive
testing procedure. I also investigated components such as the TF-IDF transformer and the
Count Vectorizer to see whether they could help the model perform better. Throughout the
project, I encouraged discussions with the team about potential adjustments and future
directions by sharing the data and conclusions with them. Finally, I created
comprehensive evaluation standards, including F1 score, recall, accuracy, and precision,
to rank the decision tree model's performance. I also used methods like confusion matrix
analysis to look at the strengths and weaknesses of the model, and built a user
interface as well.

I, Ashutosh Kumar (2041011113), worked on the overall project overview, explaining why
it was necessary to build the Fake News Detection system while respecting the dignity
and integrity of other working models in the existing system. I also worked on the
motivation behind this project, which was maintaining the authenticity of the
information shared across the internet. Our project is unique in that it compares four
traditional machine learning models with our model to find the most accurate one.
Throughout the project, I
encouraged discussions with the team about potential adjustments and future directions
by sharing the data and conclusions with them.

For the model's training, I, Ashutosh Sarangi (2041019145), focused on obtaining and
annotating a reliable dataset. I worked very hard to locate trustworthy news sources and
ensured that the dataset was of the highest caliber. I also worked on the system
specifications to analyze what was best for our project and to ensure its smooth
application. I collaborated with my team members to finalize the parameters we used for
the successful application of our project.

I, Indrajit Das (2041004164), centered my main contribution on the assessment and
interpretation of the model. I analyzed the existing systems by reading articles on fake
news detection across the internet and reviewed the solutions that already exist. From
this I arrived at the problem outcome, which was developing this project to compare
traditional ML algorithms and find the best one.

Overall, we were able to tackle the challenging problem of identifying fake news by
cooperating effectively and using our individual skills.
8. SIMILARITY REPORT
