D13_Project Report
MACHINE LEARNING
A Project Report
Submitted by:
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
This is to certify that the project report titled “FAKE NEWS DETECTION USING
MACHINE LEARNING” being submitted by Ashutosh Kumar, Aditi Rath, Ashutosh
Sarangi, Indrajit Das of section ‘D’ to the Institute of Technical Education and Research,
Siksha ‘O’ Anusandhan (Deemed to be) University, Bhubaneswar for the partial fulfillment
for the degree of Bachelor of Technology in Computer Science and Engineering is a record of
original bona fide work carried out by them under my/our supervision and guidance. The project
work, in my/our opinion, has reached the requisite standard fulfilling the requirements for the
degree of Bachelor of Technology.
The results contained in this project work have not been submitted in part or full to any other
University or Institute for the award of any degree or diploma.
ACKNOWLEDGMENT
We would like to thank Dr. Prativa Das, our project supervisor, from the bottom of our hearts
for all of her help and support during our group project. Her knowledge, perceptions, and
priceless comments have been extremely helpful in forming our project and guaranteeing its
triumphant conclusion. Her patience, attention, and dedication to our education and
development are deeply appreciated.
We express our gratitude to Siksha "O" Anusandhan (Deemed to be University) for providing
the facilities, resources, and infrastructure that were critical to our joint project's success. Our
capacity to conduct research, develop, and collaborate is a result of the institution's
commitment to fostering an atmosphere that promotes academic success and inquiry.
We also acknowledge and express our gratitude to the other group members for their efforts,
cooperation, and project-related contributions. Their varied backgrounds, viewpoints, and
commitment have greatly aided in our project's overall success. Our capacity to collaborate as
a team and overcome challenges has enabled us to achieve our objectives.
Finally, we would like to thank everyone and every organization that has contributed to our
collaborative effort through conversations, criticism, or other forms of aid. Your help and
encouragement have been critical to our growth and achievement of our purpose. We have
developed tremendously as a consequence of your support and encouragement, and our
project has been finished successfully.
Date:
DECLARATION
We declare that this written submission represents our ideas in our own words and where
others’ ideas or words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of academic honesty
and integrity and have not misrepresented or fabricated or falsified any idea/fact/source in our
submission. We understand that any violation of the above will be cause for disciplinary action
by the University and can also evoke penal action from the sources which have not been
properly cited or from whom proper permission has not been taken when needed.
2041018064
2041018064
2041011113
2041004164
REPORT APPROVAL
This project report titled “FAKE NEWS DETECTION USING MACHINE LEARNING”
submitted by Ashutosh Kumar (2041011113), Aditi Rath (2041018064), Ashutosh
Sarangi (2041019145), Indrajit Das (2041004164) is approved for the degree of Bachelor of
Technology in Computer Science and Engineering.
Examiner(s)
________________________
________________________
________________________
Supervisor
________________________
Project Coordinator
________________________
PREFACE
Fake news, a term denoting misinformation or disinformation spread via various media
channels, has become a pervasive issue in today's digital era. The importance of detecting and
combating fake news cannot be overstated, as it can distort public perception, influence
elections, and incite social unrest. This report explores the critical need for effective fake news
detection mechanisms. The main issues include the rapid dissemination of false information,
the sophisticated nature of modern fake news, and the challenge of distinguishing it from
genuine news. To address these issues, various approaches have been employed, including
Natural Language Processing (NLP) techniques and machine learning models such as Naive
Bayes, Support Vector Machines (SVM), Decision Tree, Logistic Regression, and Random
Forest. Among these, the Decision Tree (DT) algorithm demonstrated the highest accuracy and
robustness in detecting fake news in our experiments, making it the preferred choice.
INDIVIDUAL CONTRIBUTIONS
TABLE OF CONTENTS
Title Page
Certificate
Acknowledgment
Declaration
Report Approval
Preface
Individual Contributions
Table of Contents
List of Figures
List of Tables
1. INTRODUCTION
1.1 Introduction
1.2 Project Overview
1.3 Motivation(s)
1.4 Uniqueness of the Work
1.5 Report Layout
2. LITERATURE SURVEY
2.1 Existing System
2.2 Problem Identification
3. METHODS
3.1 Dataset(s) Description
3.2 Model Diagram
3.3 Methods
3.4 Libraries
3.5 Evaluation Measures
LIST OF FIGURES
LIST OF TABLES
1. INTRODUCTION
1.1 Introduction
Numerous examples exist of supervised and unsupervised learning algorithms being used
to categorize text within current fake news corpora. However, most research focuses on
specific datasets or domains, with the political domain being particularly prominent.
Consequently, algorithms trained on a specific type of article do not perform optimally
when exposed to articles from different domains. Developing a general algorithm that
performs well across all news domains is challenging due to the varying textual structures
of articles from different domains.
In this research, we propose a machine learning ensemble strategy to address the issue of
fake news detection. Our study examines various textual characteristics that can
distinguish between authentic and fraudulent content. We train several different machine
learning algorithms using a variety of ensemble methods that are not well explored in the
existing literature. These methods enable the effective and efficient training of various
machine learning algorithms. Additionally, we conducted thorough tests on four real-
world datasets that are freely accessible to the public.
1.3 Motivation
The goal is to identify news articles or other materials that make false or deceptive
claims. Fake news detection systems are crucial in curbing the rapid spread of
misinformation through social media platforms and other communication channels by
educating consumers about the characteristics and indicators of fake news. These
technologies enable people to consume news and information safely. Ensemble learners
have proven effective in numerous applications, as these learning models tend to reduce
error rates by utilizing strategies like bagging and boosting.
Various algorithms are used by different fake news detection systems to recognize and
categorize fake news, and the algorithm selected has a big influence on the system's
accuracy. Numerous data sources, such as social media sites, news websites, and fact-
checking databases, are accessible to these systems. These algorithms employ a variety of
features, including linguistic ones like syntax and grammar as well as semantic ones like
word choice and context, to detect fake news.
This report is organized into sections so that details are easy to find. Section 1
presents the introduction; Section 2 covers the literature review; and Section 3 presents
our suggested model. Section 4 reports our results together with the other statistical
measures. Section 5 concludes the report with potential future considerations.
2. LITERATURE SURVEY
This part examines the systems and solutions that are currently in use and are important
to the project. It also gives an overview of prior attempts and their flaws. This review
describes the difficulties with the current systems and provides a foundation for
identifying the problems that the project seeks to solve.
2.2 Problem Identification
Building a trustworthy fake news detection system faces many challenges. One of the
main challenges is that different studies use different databases; therefore, there is no
uniform dataset. It is challenging to compare system performance correctly because of
this lack of standardization.
Furthermore, the sheer volume and speed of online information present significant
hurdles. The rapid diffusion of information and the large volume of content produced and
shared make it challenging to stay up to date on the latest news and verify its accuracy
before it circulates widely.
In conclusion, the lack of standardized datasets, the vast amount and speed of information
available online, and the existence of deliberate disinformation efforts by people with
hidden agendas are the challenges in creating a system to identify fake news. Developing
trustworthy methods for spotting and stopping fake news requires addressing these
problems.
3. METHODS
The materials and methods section includes a brief description of the datasets used, as
well as a synopsis of their features. A schematic layout or model diagram is also included
to illustrate the system's or model's structure. A brief description of the project's
methodologies is provided, with a focus on the key algorithms or techniques employed.
The project's technology stack, including any tools or software utilized, is explained.
Furthermore, the evaluation metrics or criteria that were employed to assess the project's
solution's efficacy are examined.
3.1 Dataset Description
The datasets used for this investigation are freely available online and are open source.
They include news stories from various domains, both fake and genuine. Fake news
websites present unsupported statements, while authentic news articles provide accurate
accounts of real events. Many of the political statements in these articles can be manually
verified using fact-checking websites like politifact.com and snopes.com. We now
describe the datasets used in our study. We acquired the news
article-based datasets from Kaggle [6]. Each article is labeled as either “fake” or “true.”
The dataset includes the title, text, subject, and date of each article. The title is the
headline of the news piece; the text is the main content, detailing the news’s focus; the
subject indicates the nature of the news; and the date shows the publication date.
Figure 1 shows that the fake news dataset has the shape (23481, 4), which indicates that
it has 23481 rows and 4 columns.
Figure 2. Representing True Datasets
Figure 2 shows that the true news dataset has the shape (21417, 4), which indicates that
it has 21417 rows and 4 columns.
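As an illustration of how such shapes are read off, the sketch below builds two tiny stand-in tables with the four columns described above and inspects their `shape` attribute with pandas. The rows are invented placeholders, not records from the Kaggle dataset; with the real files, one would typically load each CSV with `pandas.read_csv` (the file names are not given in this report).

```python
import pandas as pd

# Hypothetical stand-in rows; the real datasets have 23481 and 21417 rows.
columns = ["title", "text", "subject", "date"]
fake_df = pd.DataFrame(
    [
        ["Headline A", "Body of article A", "News", "January 1, 2017"],
        ["Headline B", "Body of article B", "politics", "January 2, 2017"],
        ["Headline C", "Body of article C", "News", "January 3, 2017"],
    ],
    columns=columns,
)
true_df = pd.DataFrame(
    [
        ["Headline D", "Body of article D", "worldnews", "January 4, 2017"],
        ["Headline E", "Body of article E", "politicsNews", "January 5, 2017"],
    ],
    columns=columns,
)

# DataFrame.shape is a (rows, columns) tuple.
fake_shape = fake_df.shape
true_shape = true_df.shape
print(fake_shape, true_shape)
```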
Figure 4. Frequent words in true news
Figure 3 and Figure 4 above show graphs based on the frequency of words in the dataset.
Frequent words can be useful for spotting the deceptive patterns that fake news articles
often share. The first step in this procedure was to preprocess the text by removing
commas and other punctuation. Then, we tokenized the text into individual words. The
frequency of each term in the dataset is then counted, and the frequencies are separated
by the label of the news story, i.e., authentic or fake. In this way, we identify the
words that appear frequently in the text. Analyzing the frequently used terms makes it
easier to understand the common concepts, themes, or topics related to fake news.
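The counting procedure described above can be sketched in a few lines of standard-library Python. The example articles are invented, whitespace splitting stands in for whatever tokenizer was actually used, and the lowercasing step is an added assumption so that capitalized and lowercase forms are counted together.

```python
import string
from collections import Counter

def word_frequencies(texts):
    """Count word occurrences after lowercasing and stripping punctuation."""
    counts = Counter()
    strip_punct = str.maketrans("", "", string.punctuation)
    for text in texts:
        cleaned = text.lower().translate(strip_punct)
        counts.update(cleaned.split())  # simple whitespace tokenization
    return counts

# Invented example articles grouped by label (not taken from the dataset).
articles = {
    "fake": [
        "Shocking! You won't believe this shocking claim.",
        "Shocking claim goes viral",
    ],
    "true": [
        "The senate passed the bill on Tuesday.",
        "The bill moves to the house.",
    ],
}

# One frequency table per label, as in the per-label graphs.
freq_by_label = {label: word_frequencies(texts) for label, texts in articles.items()}
print(freq_by_label["fake"].most_common(2))
```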
Figure 5. Article per subject
Figure 5 shows how many articles are available for each subject. The articles are divided
into the following categories: government news, Middle East news, normal news, US
news, leftover news, political news, and world news.
Figure 6 shows the number and percentage of articles in each class: 23481 articles, or
52% of the total, are fake, while 21417 articles, or 48% of the total, are real.
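The percentages follow directly from the two row counts reported above, as this short check shows (only the counts come from the report; the variable names are illustrative).

```python
fake_count = 23481
true_count = 21417
total = fake_count + true_count

# Share of each class as a percentage of all articles.
fake_share = 100 * fake_count / total
true_share = 100 * true_count / total
print(f"total={total}, fake={fake_share:.1f}%, true={true_share:.1f}%")
```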
3.3 Methods
To achieve a successful implementation, various approaches are utilized. The first and
most substantial stage is data preprocessing, which converts raw data into a
comprehensible format suitable for training machine learning models. Several techniques
are used in this transformation, including managing missing data, normalizing the data,
feature scaling, handling outliers, feature extraction, and data partitioning.
Preprocessing ensures that the data is suitable for analysis, which leads to more accurate
results and improves the performance of the ML models. It also helps data scientists make
informed decisions and reduces errors and inconsistencies.
First, we concatenate our data and then remove the columns that are not needed, which
makes the dataset leaner. We then convert the text to lowercase: since capitalization
might differ between sources and can lead to duplication or inconsistent values if not
standardized, this step makes the text data more consistent and easier to deal with. We
also remove punctuation, which usually carries little value in text analysis tasks.
Next, we consider the features. Features are measurable properties or aspects of the data
that influence the model's learning process and provide crucial information for accurate
classifications or predictions. We use TF-IDF, which measures the significance of terms
within a document relative to a collection of documents, aiding text analysis by
highlighting important terms. We then split our data into training and testing sets. Our
workflow involves defining and evaluating a decision tree classifier. The pipeline
includes three steps: the Count Vectorizer, which transforms text data into word count
matrices; the TF-IDF Transformer, which applies TF-IDF weights; and the Decision Tree
classifier, which trains a decision tree on the TF-IDF weighted word counts.
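A minimal sketch of such a three-step pipeline, assuming scikit-learn's `CountVectorizer`, `TfidfTransformer`, and `DecisionTreeClassifier` with default settings, might look as follows. The toy corpus, the step names, the label encoding (1 for fake, 0 for true), and `random_state=0` are all assumptions made for illustration; the project's actual parameters are not stated.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Invented toy corpus; labels are 1 for fake, 0 for true (assumed encoding).
texts = [
    "shocking secret cure doctors hate",
    "you will not believe this shocking trick",
    "shocking miracle claim goes viral",
    "senate passes budget bill after debate",
    "central bank holds interest rates steady",
    "city council approves new budget",
]
labels = [1, 1, 1, 0, 0, 0]

pipeline = Pipeline([
    ("vect", CountVectorizer()),                        # text -> word counts
    ("tfidf", TfidfTransformer()),                      # counts -> TF-IDF weights
    ("model", DecisionTreeClassifier(random_state=0)),  # tree on weighted counts
])
pipeline.fit(texts, labels)

# Predictions on the training texts only; a real evaluation needs held-out data.
train_predictions = pipeline.predict(texts)
print(train_predictions)
```

Wrapping the steps in a single `Pipeline` object means the same vectorization and weighting are applied automatically whenever the model is fitted or asked to predict on new text.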
To evaluate fake news detection classifiers, we used the Decision Tree (DT) algorithm, a
supervised learning method that predicts outcomes by recursively splitting data into
subsets based on the most significant features.
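One common way to score how significant a candidate split is uses Gini impurity, which is the default criterion of scikit-learn's `DecisionTreeClassifier`. Whether the project kept that default is not stated, so the sketch below is only an illustration, with invented labels and a hypothetical split.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted average impurity of the two child nodes."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

parent = ["fake", "fake", "fake", "true", "true", "true"]
# Hypothetical split on whether a given word occurs in the article.
left = ["fake", "fake", "fake", "true"]
right = ["true", "true"]

parent_impurity = gini(parent)
child_impurity = split_impurity(left, right)
print(parent_impurity, child_impurity)
```

A split is useful when the weighted impurity of the children is lower than the impurity of the parent; the tree-building procedure picks the split with the largest reduction and then repeats on each child.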
3.3.1 Support Vector Machine: The SVM algorithm's goal is to locate a hyperplane that
separates the data points of one class from those of another as well as possible. A
perfectly separating hyperplane exists only for linearly separable problems; for most
real-world problems, the algorithm optimizes a soft margin, permitting a limited number
of misclassifications. The subset of training observations that determine the position of
the separating hyperplane are referred to as support vectors.
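For illustration, the sketch below fits a linear SVM with scikit-learn's `SVC` on a handful of invented 2-D points and inspects the support vectors. The data and the value `C=1.0` are assumptions; the report does not give the SVM settings used in the comparison.

```python
from sklearn.svm import SVC

# Invented 2-D points; class 0 lies on the left, class 1 on the right.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.5], [3.0, 0.5], [4.0, 0.0], [4.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]

# C controls the soft margin: a smaller C tolerates more margin violations.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The training points that determine the separating hyperplane.
support_vectors = clf.support_vectors_
predictions = clf.predict([[0.5, 0.5], [3.5, 0.5]])
print(support_vectors)
print(predictions)
```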
3.3.3 Random Forest: Another popular machine learning algorithm for a variety of tasks,
including the identification of false news, is Random Forest (RF). This ensemble learning
technique constructs several decision trees and combines them to obtain a prediction that
is more reliable and accurate. Each tree is trained on a bootstrapped random subset of the
data using a random subset of the features.
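The two sources of randomness and the final combination step can be sketched with the standard library alone. This is not a full random forest: the tree training itself is omitted, and the helper names, seed, and feature list are illustrative assumptions that mirror the description above.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(features, k, rng):
    """Pick k distinct features at random."""
    return rng.sample(features, k)

def majority_vote(votes):
    """Return the most common prediction among the trees."""
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(0)
rows = list(range(10))  # stand-in row indices
features = ["title", "text", "subject", "date"]

sample = bootstrap_sample(rows, rng)                # data seen by one tree
subset = random_feature_subset(features, 2, rng)    # features seen by one tree
prediction = majority_vote(["fake", "true", "fake"])  # votes from three trees
print(sample, subset, prediction)
```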
The final stage is prediction. After training on historical data, the algorithm predicts
outcomes based on new information. We assess performance using metrics like F1 score,
recall, precision, and accuracy. A confusion matrix, which displays predicted versus
actual values, helps evaluate the classifier's performance. The model's structure is
paramount in this process.
3.4 Libraries
Key Python modules used in the project include matplotlib for data visualization, NumPy
for numerical computations, Pandas for data analysis and manipulation, and Scikit-Learn
for machine learning tasks. These libraries enable efficient data processing, analysis,
visualization, and application of machine learning algorithms to achieve the project's
objectives.
3.4.1 NumPy
NumPy, a core Python module, plays a vital role in fake news detection programs by
providing efficient numerical computations and array operations. Its array data structure
enhances data representation, allowing for the storage and manipulation of multi-
dimensional arrays.
3.4.2 Pandas
Pandas plays a crucial role in fake news detection by efficiently handling and transforming
datasets. It provides a high-level data structure called a DataFrame, which simplifies
organizing, exploring, and preprocessing the dataset. Overall, Pandas is a key component
of the code, streamlining dataset handling, preprocessing, and feature creation.
3.4.3 Matplotlib
Matplotlib is a powerful Python data visualization package that can help summarize the
results and enhance the study of false news detection models. When it comes to
identifying false news, Matplotlib can be utilized in a multitude of ways to provide an
understanding of the predictions and summarize the model's performance.
3.4.4 Scikit-Learn
3.4.5 NLTK
Natural Language Toolkit is a potent Python library that is widely used for tasks related to
natural language processing (NLP), such as the identification of false news. Machine
learning models are constructed using the numerical representations of text data
following feature extraction. A variety of algorithms, including Support Vector Machines
(SVM), Naive Bayes, and even deep learning methods, can be used.
The size of each word in a word cloud, which is a visual representation of text
data, shows the term's relevance or frequency within the text corpus. Word clouds can be
quite useful for both exploratory data analysis and feature extraction when it comes to
machine learning (ML) techniques for false news identification.
Several evaluation techniques can be used to assess the effectiveness of the classifier in
identifying fake news using machine learning. Commonly employed evaluation metrics
include:
\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1} \]
2. Precision: Precision measures the proportion of true positive predictions relative
to the total number of positive predictions.
\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]
3. Recall: Also known as sensitivity or the true positive rate, recall measures the
proportion of actual positive events that the classifier correctly identified.
\[ \text{Recall} = \frac{TP}{TP + FN} \tag{3} \]
4. F1-score: This metric combines precision and recall into a single statistic, offering
a balanced assessment that accounts for both false positives and false negatives.
The F1-score is especially useful when dealing with imbalanced datasets.
\[ \text{F1-Score} = \frac{TP}{TP + \frac{1}{2}(FP + FN)} \tag{4} \]
Where, TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative.
Using functions provided by libraries like Scikit-learn, these evaluation metrics can be
calculated. Combining these metrics gives a comprehensive understanding of the model's
effectiveness in detecting fake news, including its accuracy, precision, recall, and the
distribution of predictions, as shown in the confusion matrix.
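The four formulas can also be written out directly, which makes it easy to check them by hand. The function name and the example counts below are invented for illustration, and the sketch assumes none of the denominators is zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute Equations (1)-(4) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = tp / (tp + 0.5 * (fp + fn))
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Invented counts, not the project's results.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
print(metrics)
```

In scikit-learn, the corresponding functions are `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` in `sklearn.metrics`.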
This section describes the system specifications used in the project and the variables
that were changed or kept under control during the experiments or simulations. It
concludes by presenting the experimental findings, together with any statistical metrics
or visuals that demonstrate the system's performance or efficacy.
4.1 System Specifications
• RAM: 8GB
The operating system should be Windows (64-bit). VS Code is used for Python development.
This hardware setup and software configuration are sufficient to design and execute the
project effectively.
To detect fake news using machine learning, we implemented several preprocessing and
feature extraction techniques, using the Count Vectorizer and TF-IDF for feature
representation and a Decision Tree classifier for classification. The preprocessing stage
standardized the textual data to make it suitable for machine learning algorithms. We
started by converting all text to lowercase using the `apply()` function from the pandas
library, which applies the supplied function to each element of a DataFrame column. Here,
a lambda function called `lower()` on each element of the 'text' column, converting all
uppercase letters to lowercase. This step is critical because it standardizes the text
data, eliminating inconsistencies caused by differences in capitalization between sources.
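A minimal sketch of this lowercasing step, using an invented two-row DataFrame with a 'text' column, is shown below.

```python
import pandas as pd

# Invented example rows; only the 'text' column name comes from the report.
df = pd.DataFrame({"text": ["BREAKING: Senate Passes Bill", "You WON'T Believe This"]})

# Apply str.lower() to every element of the 'text' column.
df["text"] = df["text"].apply(lambda x: x.lower())

lowered = df["text"].tolist()
print(lowered)
```

The vectorized accessor `df["text"].str.lower()` gives the same result and is a common alternative to `apply()` with a lambda.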
Next, we removed common words, known as stop words, using the `stopwords` corpus. Stop
words are words such as "a," "the," "is," and "are," which occur frequently in language
but often carry little meaning and can interfere with the analysis of the underlying text.
The `download()` function is used to download the corpus of stop words. Then, the
`stopwords.words()` function call returns a list of all the stop words in English. We
applied another lambda function to each element in the 'text' column of the DataFrame,
splitting each text element into individual words using the `split()` function.
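The filtering step can be sketched as follows. To keep the example self-contained, it uses a small hand-picked stop-word set rather than a downloaded list; if the library is NLTK, as the Libraries section suggests, the full English list would come from `nltk.download("stopwords")` followed by `nltk.corpus.stopwords.words("english")`. Rejoining the kept words into a string is an assumption about how the cleaned text was stored.

```python
# Small hand-picked stop-word set for illustration only; the project used
# a downloaded English stop-word list instead.
STOP_WORDS = {"a", "the", "is", "are", "on", "of"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Split on whitespace, drop stop words, and rejoin the rest."""
    kept = [word for word in text.split() if word not in stop_words]
    return " ".join(kept)

cleaned = remove_stop_words("the senate is voting on a bill")
print(cleaned)
```

With a DataFrame, the same function could be passed to `apply()` on the 'text' column, in the same way as the lowercasing lambda.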
Tokenization, a crucial step in text preprocessing, involves splitting text into tokens
(words). By converting all text to lowercase during vectorization, the Count Vectorizer
ensures that variations in capitalization do not result in separate tokens. Additionally, the
Count Vectorizer allows for the elimination of stop words, further refining the textual
data by excluding words that frequently occur but carry little meaningful information.
This refinement ensures that the text representation focuses on more significant terms that
contribute to distinguishing between real and fake news.
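Both behaviours are exposed as constructor parameters of scikit-learn's `CountVectorizer`. The sketch below uses two invented sentences; `stop_words="english"` selects scikit-learn's built-in English list, which may differ from the list used in the project.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Invented documents with mixed capitalization.
docs = ["The Senate passed THE budget", "the budget is a hoax"]

# lowercase=True is the default; it is spelled out here for clarity.
vectorizer = CountVectorizer(lowercase=True, stop_words="english")
counts = vectorizer.fit_transform(docs)

# vocabulary_ maps each kept token to its column index in the count matrix.
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
print(counts.toarray())
```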
We used the train_test_split function from the scikit-learn library to split the dataset
into training and testing subsets for training and evaluating the models. A training set
is needed so that the model can learn to recognize patterns and features in news articles,
while a test set is needed to evaluate its performance on unseen data.
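A minimal sketch of such a split is shown below. The placeholder data, the 80/20 ratio, the fixed `random_state`, and the use of `stratify` are assumptions; the report does not state the split parameters that were used.

```python
from sklearn.model_selection import train_test_split

# Placeholder articles and alternating labels, for illustration only.
texts = [f"article {i}" for i in range(10)]
labels = [i % 2 for i in range(10)]

# Hold out 20% of the data for testing; stratify keeps the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)
print(len(X_train), len(X_test))
```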
We chose the Decision Tree Classifier because it has various advantages in the context of
detecting fake news. Decision trees are highly interpretable, which makes the model's
conclusions simple to understand and justify. This interpretability is critical in fake
news detection since it is necessary to comprehend the reasons behind classifying news as
real or fake. Decision trees can handle categorical and numerical features, making them
useful for assessing the various data types found in news items, including text,
headlines, and metadata.
The Decision Tree Classifier's overall simplicity, interpretability, versatility, and capacity
to recognize complex patterns make it a crucial tool in fake news detection, enabling
clear and understandable decision-making and contributing to the identification of
significant features that enhance the credibility of the classification results.
Table 1 reports the outcomes we obtained after applying the decision tree classifier,
which improved our results: an F1-score of 99.61%, an accuracy of 99.6%, a precision of
99.75%, and a recall of 99.52%.
Figure 8. Confusion matrix
We use several criteria to assess the performance of the methods, and the confusion
matrix (CM) is the foundation for the vast majority of them. A confusion matrix tabulates
a model's predictions on the test set against the actual labels. The most frequently used
metric is accuracy, which indicates the proportion of correctly predicted observations,
whether true or fake.
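To make the tabulation concrete, the sketch below counts the four confusion-matrix cells from two invented label lists and derives accuracy from them. Treating "fake" as the positive class is an assumption; the report does not state which class was taken as positive.

```python
def confusion_counts(y_true, y_pred, positive="fake"):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

# Invented labels, not the project's test set.
y_true = ["fake", "fake", "true", "true", "fake", "true"]
y_pred = ["fake", "true", "true", "fake", "fake", "true"]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
print(tp, fp, tn, fn, accuracy)
```

scikit-learn provides the same tabulation through `sklearn.metrics.confusion_matrix`.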
4.4 Result Analysis and Validation
Lastly, this is the user interface of our project, which is built on the best-performing
model, the Decision Tree Classifier.
Designing a decision tree classifier-based UI for fake news detection involves four
steps: a decision tree model must be developed and evaluated, an intuitive and
user-friendly interface must be designed, the model must be integrated with the UI via a
backend service, and the model must be continuously improved based on user feedback and
model performance. This approach helps ensure that the system is user-friendly and easily
comprehensible in addition to being accurate in identifying fake news.
5. CONCLUSION
Identifying false information is essential and difficult in today's digital age. The quick
growth of social networks and online platforms has resulted in the widespread spread of
false information, which supports anti-social behaviors and has a big impact on social
digital marketing. Identifying false information remains a persistent and complicated
hurdle, with no one solution capable of stopping its dissemination. A comprehensive
plan is required, which includes technological strategies, media knowledge, and critical
thinking skills. Methods such as machine learning and natural language processing (NLP)
were developed to deal with this issue. Education and technology are essential to fight
false news, as they can help people acquire analytical thinking skills that enable them to
evaluate the reliability of information sources and their quality. Technology businesses
must collaborate with policymakers to successfully combat the spread of false
information. Through collaboration, we can improve transparency and decrease the
dissemination of false information in the digital environment. Ongoing research,
partnerships among different groups, and the creation of innovative technologies are
crucial for tackling these problems and protecting the accuracy of information in the
digital age.
6. REFERENCES
[1] Wang, Y., Qian, X. Li Y., & Zhang, H. (2018). Fake news detection on social
media: A data mining perspective using a hybrid deep learning model. ACM
Transactions on Management Information Systems (TMIS), 9(3), 1-21.
[2] Albahr, A., & Albahar, M. (2020). An empirical comparison of fake news
detection using different machine learning algorithms. International Journal of
Advanced Computer Science and Applications, 11(9).
[3] Khan, A. I., Shahzad, F., & Ali, S. (2019). Fake news detection: a deep learning
approach using CNN. IEEE Access, 7, 44112-44121. doi: 10.1109/ACCESS.2019.2901590.
[4] Thakur, P., Shah, R. R., & Rana, N. P. (2020). A survey on automated fake news
detection: Trends and challenges. Information Processing & Management, 57(2),
102026. doi: 10.1016/j.ipm.2019.102026.
[5] Kumar, R., Singh, R. K., & Roy, P. P. (2021). Fake news detection on social media:
A review. Artificial Intelligence Review, 54(4), 2997-3030. doi: 10.1007/s10462-020-09981-4.
[6] Allcott, H., & Gentzkow, M. (2017). Social Media and Fake News in the 2016
Election. Journal of Economic Perspectives, 31(2), 211-236.
https://doi.org/10.1257/jep.31.2.211
[7] Conroy, N. J., Rubin, V. L., & Chen, Y. (2015). Automatic deception detection:
Methods for finding fake news. Proceedings of the Association for Information
Science and Technology, 52(1), 1-4.
https://doi.org/10.1002/pra2.2015.145052010082
[8] Shu, K., Sliva, A., Wang, S., Tang, J., & Liu, H. (2017). Fake News Detection on
Social Media: A Data Mining Perspective. ACM SIGKDD Explorations
Newsletter, 19(1), 22-36. https://doi.org/10.1145/3137597.3137600
[9] Rashkin, H., Choi, E., Jang, J. Y., Volkova, S., & Choi, Y. (2017). Truth of
Varying Shades: Analyzing Language in Fake News and Political Fact-Checking.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language
Processing (EMNLP), 2931-2937. https://doi.org/10.18653/v1/D17-1317.
Working on the project to identify fake news using machine learning techniques like
decision trees has given our team new knowledge and experience. We were able to
achieve our objectives through efficient teamwork since every team member brought a
unique set of skills and perspectives to the table. In this reflection, each of us will offer
our unique viewpoints and project-related contributions.
During this research, I, Aditi Rath (2041018064), focused mostly on feature engineering
and data preprocessing. I collected and analyzed a large number of news items using
techniques like tokenization, stemming, and TF-IDF in order to extract useful features.
Using decision tree algorithms, I was able to identify crucial features that help identify
false news. In order to improve the decision tree model's performance, I also performed
cross-validation and made hyperparameter adjustments. In addition, I actively participated
in team meetings and discussions by bringing up ideas to improve the overall workflow and
boost team output. I focused on creating and evaluating the decision tree model, which
involved understanding the preprocessed data and selecting the appropriate features for
the model's training. To assess the precision, recall, and accuracy of the model, I
employed decision tree approaches and conducted a comprehensive testing procedure. I also
investigated pipeline components like the TF-IDF transformer and Count Vectorizer to see
whether they could help the model perform better. Throughout the project, I encouraged
discussions with the team about potential adjustments and future directions by sharing the
data and conclusions with them. Finally, I created comprehensive evaluation standards,
including F1 score, recall, accuracy, and precision, to rank the decision tree model's
performance. I also used methods like confusion matrix analysis to look at the pros and
cons of the model, and I built a user interface as well.
For the model's training, I, Ashutosh Sarangi (2041019145), focused on obtaining and
annotating a reliable dataset. I worked very hard to locate trustworthy news sources and
ensured that the dataset was of the highest caliber. I also worked on the system
specifications to analyze what was best for our project and to ensure its smooth
operation. I collaborated with my team members to finalize the parameters we used for the
successful application of our project.
My main contribution to the research, as Indrajit Das (2041004164), was centered on the
assessment and interpretation of the model. I analyzed the existing systems by reading
articles on fake news detection across the internet and reviewed the solutions that
already exist. This led me to the problem statement for this project, which compares
traditional ML algorithms to find the best one.
Overall, we were able to tackle the challenging problem of identifying fake news by
effectively cooperating and utilizing our unique individual skills.
8. SIMILARITY REPORT