Batch 16 FakeMediaDetection BasedonNLP

Project

Uploaded by

baminiaishwarya

A

Mini Project
On

FAKE MEDIA DETECTION BASED ON NATURAL LANGUAGE


PROCESSING AND BLOCKCHAIN APPROACHES
(Submitted in partial fulfillment of the requirements for the award of Degree)

BACHELOR OF TECHNOLOGY

in

COMPUTER SCIENCE AND ENGINEERING


By
B.Aishwarya(217R1A0508)
S.Praveen(217R1A0550)
N.Avanthi Rathod(217R1A0541)

Under the Guidance of

G.Swarna Latha
(Assistant Professor)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


CMR TECHNICAL CAMPUS
UGC AUTONOMOUS

(Accredited by NAAC, NBA, Permanently Affiliated to JNTUH, Approved by AICTE, New Delhi)
Recognized Under Section 2(f) & 12(B) of the UGC Act, 1956,
Kandlakoya (V), Medchal Road, Hyderabad-501401.
2021-25
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE

This is to certify that the project entitled “FAKE MEDIA DETECTION BASED ON
NATURAL LANGUAGE PROCESSING AND BLOCKCHAIN APPROACHES” being
submitted by B.AISHWARYA (217R1A0508), S.PRAVEEN (217R1A0550) &
N.AVANTHI RATHOD (217R1A0541) in partial fulfillment of the requirements for the
award of the degree of B.Tech in Computer Science and Engineering to the Jawaharlal Nehru
Technological University Hyderabad, is a record of bonafide work carried out by them
under our guidance and supervision during the year 2024-25.

The results embodied in this thesis have not been submitted to any other University
or Institute for the award of any degree or diploma.

Mrs.G.Swarna Latha Dr. A. Raji Reddy


(Assistant Professor)
DIRECTOR
(INTERNAL GUIDE)

Dr. Nuthanakanti Bhaskar EXTERNAL EXAMINER


HoD

Submitted for viva voce Examination held on


ACKNOWLEDGEMENT

Apart from our own efforts, the success of any project depends largely on the
encouragement and guidance of many others. We take this opportunity to express our
gratitude to the people who have been instrumental in the successful completion of this project.
We express our profound gratitude and deep regard to our guide
Mrs. G. Swarna Latha, Assistant Professor, for her exemplary guidance, monitoring,
and constant encouragement throughout the project work. The blessing, help, and guidance
given by her shall carry us a long way in the journey of life on which we are about to embark.
We also express a deep sense of gratitude to the Project Review
Committee (PRC) Coordinators: Dr. K. Maheswari, Dr. S. Suma, Mr. J. Narasimha Rao, Mrs.
K. Shilpa, and Mr. K. Ranjith Reddy, for their cordial support, valuable information, and guidance,
which helped us in completing this task through various stages.
We are also thankful to Dr. Nuthanakanti Bhaskar, Head, Department of Computer
Science and Engineering, for providing encouragement and support for completing this project
successfully.
We are obliged to Dr. A. Raji Reddy, Director, for being cooperative throughout the
course of this project. We would like to express our sincere gratitude to Sri. Ch. Gopal Reddy,
Chairman, for providing excellent infrastructure and a nice atmosphere throughout the course
of this project.
We are grateful for the guidance, constant support, and help received from all the members
of CMR Technical Campus who contributed to the completion of the project.
Finally, we would like to take this opportunity to thank our family for their constant
encouragement, without which this assignment would not have been completed. We sincerely
acknowledge and thank all those who gave support directly and indirectly in the completion
of this project.

B.AISHWARYA (217R1A0508)
S.PRAVEEN (217R1A0550)
N.AVANTHI RATHOD (217R1A0541)
ABSTRACT

Social media networks have become an important part of daily life, driven by recent
developments in computer science. These platforms are widely used for sharing
information, news, and daily reports on any topic, which makes them a major venue for
collecting and transmitting data. Alongside these advantages, social media carries a large
volume of fake news and misleading information. The lack of trustworthy information and
reliable news is one of the major problems of these platforms. To overcome
this problem, we propose an integrated system that combines blockchain and
natural language processing (NLP) with machine learning techniques to detect fake news
and to better predict fake user accounts and posts. A reinforcement learning technique is
applied for this process. To improve the security of the platform, a decentralized
blockchain framework is applied, which provides an outline for proving the authority of
digital content. More specifically, the aim of this system is to develop a secure platform to
predict and identify fake news in social media networks.

LIST OF FIGURES

FIGURE NO       FIGURE NAME
Figure 3.1      Project Architecture
Figure 3.2      Use case diagram
Figure 3.3      Class diagram
Figure 3.4      Sequence diagram
Figure 3.5      Activity diagram

LIST OF SCREENSHOTS

SCREENSHOT NO.  SCREENSHOT NAME
Figure 5.1      Home Page Result
Figure 5.2      Upload Result
Figure 5.3      Preprocess Result
Figure 5.4      Reprocessed Result
Figure 5.5      Run Result
Figure 5.6      Console Result
Figure 5.7      Graph result
Figure 5.8      Upload Text Result
Figure 5.9      Prediction Result

TABLE OF CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF SCREENSHOTS
1. INTRODUCTION
  1.1 PROJECT SCOPE
  1.2 PROJECT PURPOSE
  1.3 PROJECT FEATURES
2. SYSTEM ANALYSIS
  2.1 PROBLEM DEFINITION
  2.2 EXISTING SYSTEM
    2.2.1 LIMITATIONS OF THE EXISTING SYSTEM
  2.3 PROPOSED SYSTEM
    2.3.1 ADVANTAGES OF PROPOSED SYSTEM
  2.4 FEASIBILITY STUDY
    2.4.1 ECONOMIC FEASIBILITY
    2.4.2 OPERATIONAL FEASIBILITY
    2.4.3 TECHNICAL FEASIBILITY
  2.5 HARDWARE & SOFTWARE REQUIREMENTS
    2.5.1 HARDWARE REQUIREMENTS
    2.5.2 SOFTWARE REQUIREMENTS
3. ARCHITECTURE
  3.1 PROJECT ARCHITECTURE
  3.2 DESCRIPTION
  3.3 USE CASE DIAGRAM
  3.4 CLASS DIAGRAM
  3.5 SEQUENCE DIAGRAM
  3.6 ACTIVITY DIAGRAM
4. IMPLEMENTATION
  4.1 SAMPLE CODE
5. RESULTS AND DISCUSSION
6. TESTING
  6.1 INTRODUCTION TO TESTING
  6.2 TYPES OF TESTING
    6.2.1 UNIT TESTING
    6.2.2 INTEGRATION TESTING
    6.2.3 FUNCTIONAL TESTING
  6.3 TEST CASES
    6.3.1 UPLOADING IMAGES
    6.3.2 CLASSIFICATION
7. CONCLUSION & FUTURE SCOPE
  7.1 PROJECT CONCLUSION
  7.2 FUTURE SCOPE
8. BIBLIOGRAPHY
  8.1 REFERENCES
  8.2 WEBSITES

FAKE MEDIA DETECTION USING NATURAL
LANGUAGE PROCESSING AND BLOCKCHAIN

1. INTRODUCTION

1.1 PROJECT SCOPE

The "Fake Media Detection Using Natural Language Processing and Blockchain"
project aims to develop a comprehensive system for identifying and mitigating the spread of
fake news in online media. The scope of the project encompasses several key areas, beginning
with data collection, where a diverse dataset of news articles will be gathered from various
sources, including both established media outlets and user-generated content. To analyze the
textual content of these articles, the project will implement natural language processing (NLP)
techniques that focus on features such as sentiment analysis, topic modeling, and linguistic
patterns indicative of misinformation. Additionally, community detection algorithms will be
utilized to identify clusters of news sources, helping to understand how information spreads
within these communities and recognizing the influence of regional and cultural biases.

The integration of blockchain technology will create a decentralized verification system for
news sources and articles, allowing users to track the provenance of information and verify its
authenticity. The project will also involve the development of predictive models designed to
assess the likelihood of news articles being fake, based on the identified features and
community dynamics. A user-friendly interface will be created to enable users to check news
articles for credibility, view the source’s history, and report suspected fake news.
Comprehensive evaluation and testing of the models and the overall system will ensure
accuracy, reliability, and scalability.

1.2 PROJECT PURPOSE


The primary purpose of this project is to combat the proliferation of fake news in online
media by leveraging advanced technologies like NLP and blockchain. The project seeks to
enhance public awareness by providing users with tools to critically evaluate the information
they consume, fostering a more informed public. It aims to promote credibility in journalism
by encouraging news outlets to maintain high standards, as users will be able to easily track
and verify reporting histories. Furthermore, the project intends to facilitate information
integrity through a decentralized system that holds media sources accountable, thereby
reducing the risk of misinformation spreading across social networks.

By developing robust models that accurately predict and detect fake news, the project
also aspires to improve the overall quality of information available to users.

1.3 PROJECT FEATURES

Key features of the project will include text analysis tools for sentiment analysis and
fake news detection algorithms, as well as community structure analysis to better understand
the dynamics of information spread. The incorporation of blockchain will ensure immutable
records of news sources, enhancing transparency and traceability. A user feedback mechanism
will allow users to report fake news, contributing to the model’s training and improving
detection capabilities over time. Real-time monitoring of trending news articles and their
authenticity scores will be enabled, alongside an interactive visualization dashboard displaying
the credibility of news sources, recent reports of fake news, and insights from community
detection. The system will be designed for scalability and performance to handle large datasets
and numerous users through optimized algorithms and parallel processing techniques. Finally,
educational resources will be provided to help users better understand how to identify fake
news and the importance of media literacy. This structured approach will ensure the project is
comprehensive, targeted, and effective in addressing the challenges posed by fake media.


2. SYSTEM ANALYSIS
System analysis is an important phase in the system development process. The system
is studied in minute detail and analyzed. The system analyst plays the role of an
interrogator and delves deep into the working of the present system. In analysis, a detailed study
of the operations performed by the system and their relationships within and outside the
system is done. A key question considered here is, “What must be done to solve the problem?”
The system is viewed as a whole and the inputs to the system are identified. Once analysis is
completed, the analyst has a firm understanding of what is to be done.

2.1 PROBLEM DEFINITION

As misinformation spreads rapidly across social media and other digital platforms,
individuals often struggle to discern credible news from false reports, leading to misinformed
opinions and decisions. Traditional methods of fact-checking are often slow and inadequate to
keep pace with the volume and velocity of information shared online. Furthermore, the lack
of transparency and accountability in many news sources complicates efforts to verify the
authenticity of information, allowing misleading narratives to gain traction.

Additionally, the existing approaches to combating misinformation often fail to
account for the complex community dynamics within online media, where information
diffusion can be influenced by geographical, cultural, and social factors. This makes it difficult
to accurately model and predict the spread of fake news.

The project aims to address these issues by developing a robust system that utilizes
natural language processing to analyze news articles for indicators of misinformation, while
also leveraging blockchain technology to create a decentralized and transparent verification
process for news sources. By integrating community detection algorithms, the system will
enhance the understanding of how information spreads within specific clusters, ultimately
improving the prediction and detection of fake news.

2.2 EXISTING SYSTEM


News reports shape the public perception of critical social, political, and
economic events around the world. Yet the way in which emergent phenomena are reported
in the news makes the early prediction of such phenomena a challenging task.


2.2.1 LIMITATIONS OF EXISTING SYSTEM

• Limited prediction capability
• Low accuracy
• Low efficiency

2.3 PROPOSED SYSTEM

We propose a scalable community-based probabilistic framework to model the
spreading of news about events in online media. Our approach exploits the latent community
structure in the global news media and uses the affiliation of the early adopters with a variety
of communities to identify the events widely reported in the news at the early stage of their
spread. The time complexity of our approach is linear in the number of news reports. It is also
amenable to efficient parallelization. To demonstrate these features, the inference algorithm is
parallelized for the message-passing paradigm and tested on the Rensselaer Polytechnic Institute
Advanced Multiprocessing Optimized System, one of the fastest Blue Gene/Q supercomputers
in the world. Thanks to the community-level features of the early adopters, the model gains an
improvement of 20% in the early detection of the most massively reported events compared
with the feature-based machine learning algorithm. Its parallelization scheme achieves orders
of magnitude speedup.

2.3.1 ADVANTAGES OF THE PROPOSED SYSTEM


• Improved prediction capability
• High accuracy
• High efficiency

2.4 FEASIBILITY STUDY


The feasibility of the project is analyzed in this phase, and a business proposal is put forth
with a very general plan for the project and some cost estimates. During system analysis, the
feasibility study of the proposed system is carried out. This is to ensure that the proposed
system is not a burden to the company. The three key considerations involved in the analysis are:

• Economic Feasibility
• Operational Feasibility
• Technical Feasibility


2.4.1 ECONOMIC FEASIBILITY


A system that can be developed technically, and that will be used if installed, must still be a
good investment for the organization. In the economic feasibility study, the development cost of
creating the system is evaluated against the ultimate benefit derived from the new system.
Financial benefits must equal or exceed the costs. The system is economically feasible. It does
not require any additional hardware or software. Since the interface for this system is developed
using the existing resources and technologies available at NIC, the expenditure is nominal
and economic feasibility is certain.

2.4.2 OPERATIONAL FEASIBILITY


Proposed projects are beneficial only if they can be turned into information systems
that meet the organization’s operating requirements. Operational feasibility aspects of
the project are to be taken as an important part of the project implementation. This system is
designed in accordance with the above-mentioned issues. The management
issues and user requirements have been taken into consideration beforehand, so there is no
question of resistance from the users that could undermine the possible application benefits.
The well-planned design would ensure the optimal utilization of the computer resources and
would help in the improvement of performance status.

2.4.3 TECHNICAL FEASIBILITY


Earlier, no system existed to cater to the needs of the ‘Secure Infrastructure Implementation
System’. The current system developed is technically feasible. It is a web-based user interface
for audit workflow at NIC-CSD. Thus it provides easy access to the users. The database’s
purpose is to create, establish, and maintain a workflow among various entities in order to
facilitate all concerned users in their various capacities or roles. Permission to the users would
be granted based on the roles specified. Therefore, it provides the technical guarantee of
accuracy, reliability, and security.

2.5 HARDWARE & SOFTWARE REQUIREMENTS

2.5.1 HARDWARE REQUIREMENTS:


Hardware interfaces specify the logical characteristics of each interface between the
software product and the hardware components of the system. The following are some
hardware requirements:

• Processor : Intel i5

• Hard disk : 16GB and Above.

• RAM : 4GB and Above.

• Monitor : 5 inches or above.

2.5.2 SOFTWARE REQUIREMENTS:


Software requirements specify the logical characteristics of each interface and
software component of the system. The following are some software requirements:

• Operating system : Windows 8


• Languages : Python(v3.7.0)


3. ARCHITECTURE

3.1 PROJECT ARCHITECTURE

This project architecture shows the procedure followed for fake media detection,
starting from input to final prediction.

Figure 3.1: Project Architecture for Fake Media Detection

3.2 DESCRIPTION

1. Data Collection

• The system gathers data from two primary sources:

Social Network: Content from social media platforms where a large amount of user-
generated content is shared.

News Data: Data from news websites, which are another critical source for detecting
misinformation or fake news.

• This raw data is collected and fed into the next phase, where it undergoes various
processes.


2. Blockchain Module

This section handles the blockchain-related operations, ensuring the system's transparency
and immutability in verifying fake media.

Hash Generation: This step generates a unique hash for each piece of media or data being
analyzed. The hash helps to ensure data integrity and traceability on the blockchain.

Mining Data: The collected data is then 'mined' into blocks. This essentially means that the
system organizes the data into blockchain-compatible structures for further processing and
verification.

Smart Contract: A smart contract is a self-executing contract where the terms of the
agreement are directly written into lines of code. Here, it plays a role in automating the
verification of media authenticity. If the media is identified as fake, the contract might
trigger an alert or store relevant information on the blockchain.

Consensus Verification: Blockchain operates on a decentralized network where consensus
mechanisms (such as Proof of Work, Proof of Stake, etc.) verify the authenticity of blocks.
This step ensures that multiple nodes agree on the authenticity of the data before it is
accepted into the blockchain.
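
The hash generation, mining, and consensus steps described above can be illustrated with a minimal sketch. It is not the project's blockchain module: the Block fields, the mine helper, and the difficulty of three leading zeros are assumptions chosen for illustration, and a toy proof-of-work loop stands in for a real consensus protocol.

```python
import hashlib
import json
from dataclasses import dataclass


def media_hash(content: str) -> str:
    """Hash generation: SHA-256 fingerprint of a news item."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class Block:
    # Hypothetical block layout, chosen only for this sketch
    index: int
    media_hashes: list
    prev_hash: str
    nonce: int = 0

    def digest(self) -> str:
        """Hash of the block contents, including the link to the previous block."""
        payload = json.dumps(
            {"index": self.index, "media": self.media_hashes,
             "prev": self.prev_hash, "nonce": self.nonce},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


DIFFICULTY = "000"  # assumed toy difficulty: digest must start with three zeros


def mine(block: Block) -> str:
    """Mining: search for a nonce whose digest meets the difficulty target."""
    while not block.digest().startswith(DIFFICULTY):
        block.nonce += 1
    return block.digest()


def chain_is_valid(chain: list) -> bool:
    """Check every block's link to its predecessor and its proof of work."""
    for prev, cur in zip(chain, chain[1:]):
        if cur.prev_hash != prev.digest():
            return False
    return all(b.digest().startswith(DIFFICULTY) for b in chain)


genesis = Block(0, [], "0" * 64)
mine(genesis)
block1 = Block(1, [media_hash("Sample news text")], genesis.digest())
mine(block1)
chain = [genesis, block1]
print(chain_is_valid(chain))
```

Because each block stores the digest of the block before it, changing any earlier block changes its digest and breaks the link, which is what makes stored records tamper-evident.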

3. Data Nodes

• P1, P2, ..., Pn: These are distributed nodes in the blockchain network that store verified
data blocks. Each node has a copy of the blockchain ledger, making the system resilient
and decentralized.
• The data nodes respond to access requests from users seeking to verify the authenticity of
a piece of media.

4. User Interaction

• Access Request: Users can request access to verify whether a piece of media (either from
social networks or news) is genuine or fake.
• Response: Once the blockchain verifies the data, the user receives a response indicating
the authenticity of the media.


Overall Flow:

• Social media and news data are collected.

• The data is processed using blockchain techniques like hashing, mining, and smart
contracts for verification.
• The verified data is stored in distributed data nodes.
• Users can query the system to check the validity of media, and blockchain provides a
verified response.
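
The query step of this flow can be sketched as a lookup across the distributed data nodes. This is a toy illustration under stated assumptions: the DataNode class, the "genuine"/"fake" labels, and the simple majority rule are hypothetical and are not taken from the project's implementation.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Normalize the text and hash it so identical items map to the same key."""
    return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()


class DataNode:
    """Hypothetical data node (P1..Pn) holding its own copy of the verified records."""

    def __init__(self, name):
        self.name = name
        self.ledger = {}  # fingerprint -> "genuine" or "fake"

    def store(self, text, label):
        self.ledger[fingerprint(text)] = label

    def lookup(self, text):
        return self.ledger.get(fingerprint(text))


def verify(nodes, text):
    """Return the label that a strict majority of all nodes agree on, else 'unknown'."""
    known = [v for v in (n.lookup(text) for n in nodes) if v is not None]
    if not known:
        return "unknown"
    label = max(set(known), key=known.count)
    return label if known.count(label) > len(nodes) / 2 else "unknown"


nodes = [DataNode(f"P{i}") for i in range(1, 4)]
for node in nodes:
    node.store("Aliens landed in Hyderabad", "fake")
nodes[0].store("Only one node saw this", "genuine")

print(verify(nodes, "  aliens landed in hyderabad "))
print(verify(nodes, "Only one node saw this"))
```

A record held by a single node is reported as unknown, which mirrors the idea that a response is trusted only when multiple nodes agree.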

3.3 USE CASE DIAGRAM


A use case diagram shows the actors of a system and the actions each of them can
perform. In this system the actor is the user, who can log in, upload news articles, run the
fake news detector, and log out.

Figure 3.2: Use Case Diagram for user for Fake media Detection

The use case diagram illustrates the key interactions between the user and the Fake News
Detection system. The process begins with the User Login Screen, where the user needs to
authenticate to gain access to the system’s features. This step ensures that only authorized
individuals can utilize the platform. Once logged in, the user can proceed to Upload News
Articles. Here, the user provides the news articles or textual data for the system to analyze,
forming the input for the core functionality.

The next step involves the Run Fake News Detector Algorithm, which is the central
feature of the system. After the news articles are uploaded, the user initiates the algorithm,
which applies natural language processing (NLP) and potentially blockchain verification to
determine the authenticity of the news content. This function is crucial for distinguishing
between real and fake news.

Finally, the user has the option to Logout, which securely ends the session and prevents
unauthorized access after the tasks are completed. The overall flow in the diagram shows a
straightforward interaction, where the user logs in, uploads content, triggers the fake news
detection process, and logs out after the operation is complete.

3.4 CLASS DIAGRAM

A class diagram shows the classes of a system, their methods, and the relationships between them.

Figure 3.3: Class Diagram for Fake media Detection



The class diagram represents a machine learning workflow centered around an LSTM
(Long Short-Term Memory) model used for text classification or prediction tasks. The process
begins with the DatasetHandler, which is responsible for managing the dataset. This class has
methods like uploadDataset() for uploading the dataset, loadCSV(filename: String) for loading
data from a CSV file, and getData() to retrieve the data for further processing.

Once the data is loaded, it is passed to the Preprocessor class, which cleans and
prepares the data. The preprocessor removes unnecessary words (stopwords) and applies
lemmatization (using WordNetLemmatizer) to reduce words to their base forms. It has
functions like cleanPost(doc: String) to clean individual documents and preprocess() to handle
the complete preprocessing operation on the dataset.

The LSTMModel class plays the central role by performing the machine learning tasks.
It includes methods like runLSTM() to run the model, trainLSTM(X, Y) to train the model
with the input features (X) and labels (Y), and saveModel() or loadModel() to store or retrieve
models. Once trained, it can predict outcomes through the predict(review: String) function and
evaluate the model's performance using evaluate().

The PredictionHandler takes care of predictions specifically for news articles with its
predictNews(newsItem: String) method, which predicts the category or sentiment of the given
news item.

Finally, the GraphHandler class provides visualization tools to monitor the model's
performance. It has methods like graph() and plotGraph(accuracy: list, loss: list) to generate
and display graphs that track the model's accuracy and loss during the training process.
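
The responsibilities of the first two classes in the diagram can be sketched in plain Python. This is an illustrative skeleton rather than the project's code: method names follow the diagram in snake_case, the stopword set is a tiny assumed sample, and lemmatization is left out so that the sketch needs no external corpora (the sample code in Section 4.1 uses NLTK's WordNetLemmatizer for that step).

```python
import csv
import io
from string import punctuation


class DatasetHandler:
    """Loads (text, label) rows from CSV data with 'text' and 'target' columns."""

    def __init__(self):
        self.rows = []

    def load_csv(self, handle):
        reader = csv.DictReader(handle)
        self.rows = [(row["text"], int(row["target"])) for row in reader]

    def get_data(self):
        return list(self.rows)


class Preprocessor:
    # Assumed sample; a real run would use a full stopword list
    STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and"}

    def clean_post(self, doc):
        """Lowercase, strip punctuation, and drop stopwords and one-letter tokens."""
        table = str.maketrans("", "", punctuation)
        tokens = [w.translate(table) for w in doc.lower().split()]
        kept = [w for w in tokens
                if w.isalpha() and len(w) > 1 and w not in self.STOPWORDS]
        return " ".join(kept)

    def preprocess(self, rows):
        return [(self.clean_post(text), label) for text, label in rows]


# An in-memory CSV stands in for an uploaded dataset file
sample = io.StringIO(
    'text,target\n'
    '"The moon is made of cheese!",1\n'
    '"Rain expected in Hyderabad.",0\n'
)
handler = DatasetHandler()
handler.load_csv(sample)
cleaned = Preprocessor().preprocess(handler.get_data())
print(cleaned)
```

Keeping loading and cleaning in separate classes, as the diagram does, lets the same Preprocessor be reused on a single news item at prediction time.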


3.5 SEQUENCE DIAGRAM

Figure 3.4: Sequence Diagram for Fake media detection

This sequence diagram outlines the process flow for a system that performs news
detection using an LSTM (Long Short-Term Memory) model. It involves several components:
GUI, DatasetHandler, Preprocessor, LSTMModel, GraphHandler, and PredictionHandler. The
user initiates the application by uploading a dataset (uploadDataset(filename)) through the
GUI, which is then cleaned by the DatasetHandler (cleanPost()). The Preprocessor processes
the data (preprocess()), after which the LSTMModel is triggered to either load a pre-trained
model (loadModel()) or train a new model if none exists (trainLSTM()), followed by saving
the model (saveModel()).

Once the model is prepared, the system generates accuracy and loss graphs (Generate
Graph()) using the GraphHandler, and these graphs are displayed to the user.


In the second phase, the user can input a news item to test for detection
(predictNews(newsItem)), which is processed by the PredictionHandler to make a prediction
(predict(newsItem)), and the results are displayed back to the user.
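
The load-or-train decision in the first phase can be sketched as follows. The train function here is a hypothetical stand-in that only counts labels instead of fitting an LSTM, and the file name model.pkl is an assumption; only the control flow mirrors the sequence diagram.

```python
import os
import pickle
import tempfile


def train(samples):
    """Hypothetical stand-in for trainLSTM(): returns label counts as the 'model'."""
    counts = {}
    for _, label in samples:
        counts[label] = counts.get(label, 0) + 1
    return counts


def load_or_train(model_path, samples):
    """Return (model, source), where source is 'loaded' or 'trained'."""
    if os.path.exists(model_path):
        # A saved model exists: reuse it instead of training again
        with open(model_path, "rb") as f:
            return pickle.load(f), "loaded"
    model = train(samples)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)  # save for future runs
    return model, "trained"


samples = [("headline a", 1), ("headline b", 0), ("headline c", 1)]
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "model.pkl")
    first = load_or_train(path, samples)   # no file yet, so this trains and saves
    second = load_or_train(path, samples)  # file now exists, so this loads
print(first[1], second[1])
```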

3.6 ACTIVITY DIAGRAM

An activity diagram describes the flow of control from one activity to the next.

Figure 3.5: Activity Diagram for User for fake media detection


This activity diagram represents the workflow for a news detection system using an LSTM
(Long Short-Term Memory) model. It begins with the user uploading a dataset, followed
by the steps required to process and use this data for model training or prediction.

1. Upload Dataset: The user initiates the process by providing a dataset.


2. Clean and Preprocess Data: The raw dataset is cleaned and preprocessed, preparing it
for the next steps.
3. Transform Data using TF-IDF: The preprocessed data is transformed using TF-IDF
(Term Frequency-Inverse Document Frequency), a feature extraction technique used to
quantify words' importance in a dataset.
4. Split Data into Train/Test sets: The dataset is split into training and testing sets to enable
model validation.
5. Model Exists?: A decision point checks if a pre-trained LSTM model is available.
   • If the model exists: The pre-trained LSTM model is loaded.
   • If no model exists: A new LSTM model is trained using the training dataset, and the
     trained model is saved for future use.
6. View Accuracy and Loss Graph: The system generates graphs showing the accuracy and
loss metrics, which provide insight into the model's performance.
7. Test News Prediction: After the model is trained or loaded, it can be used to test news
predictions, where a user can input new data for classification.
8. End: The process concludes after the prediction is made.
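
Steps 3 and 4 can be made concrete with a small worked sketch. It computes TF-IDF by hand as raw term count times idf, with idf(t) = ln(N / df(t)) + 1, which is the unsmoothed, unnormalized variant that scikit-learn documents for smooth_idf=False and norm=None. The five example documents, the tfidf and split helpers, and the unshuffled 80/20 split are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, matrix) with weight = term count * (ln(N / df) + 1)."""
    tokenized = [d.split() for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows


def split(rows, labels, test_fraction=0.2):
    """Hold out the last test_fraction of the rows (at least one) for testing."""
    cut = len(rows) - max(1, int(len(rows) * test_fraction))
    return rows[:cut], rows[cut:], labels[:cut], labels[cut:]


docs = ["fake news spreads fast", "real news report", "fake claim",
        "real report", "news report"]
labels = [1, 0, 1, 0, 0]

vocab, rows = tfidf(docs)
X_train, X_test, y_train, y_test = split(rows, labels)
print(vocab)
print(len(X_train), len(X_test))
```

A term that occurs in few documents, such as "spreads", receives a larger weight than a term that occurs in many, such as "news", which is why TF-IDF highlights the words that distinguish one article from another.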


4. IMPLEMENTATION

4.1 SAMPLE CODE

# GUI, file handling, and plotting
from tkinter import *
from tkinter import messagebox, simpledialog, ttk, filedialog
import tkinter
import os
import pickle
from string import punctuation

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Text preprocessing (the NLTK 'stopwords' and 'wordnet' corpora must be available)
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Feature extraction and model building
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, normalize
import keras.layers
from keras.models import Sequential, model_from_json
from keras.layers import Dense, Activation, Dropout, Flatten, LSTM

main = Tk()
main.title("DETECTION OF FAKE NEWS THROUGH IMPLEMENTATION OF DATA SCIENCE APPLICATION")
main.geometry("1300x1200")

# Shared state filled in by the GUI callbacks below
global filename
global X, Y
global tfidf_X_train, tfidf_X_test, tfidf_y_train, tfidf_y_test
global tfidf_vectorizer
global accuracy, error

stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

textdata = []
labels = []
global classifier


def cleanPost(doc):
    # Split on whitespace and strip punctuation from every token
    tokens = doc.split()
    table = str.maketrans('', '', punctuation)
    tokens = [w.translate(table) for w in tokens]
    # Keep alphabetic, non-stopword tokens longer than one character
    tokens = [word for word in tokens if word.isalpha()]
    tokens = [w for w in tokens if w not in stop_words]
    tokens = [word for word in tokens if len(word) > 1]
    # Reduce each remaining word to its lemma and rebuild the text
    tokens = [lemmatizer.lemmatize(token) for token in tokens]
    return ' '.join(tokens)

def uploadDataset():
    global filename
    text.delete('1.0', END)
    filename = filedialog.askopenfilename(initialdir="TwitterNewsData")
    textdata.clear()
    labels.clear()
    dataset = pd.read_csv(filename)
    dataset = dataset.fillna(' ')
    for i in range(len(dataset)):
        # .at[] reads a single cell by row label and column name
        msg = dataset.at[i, 'text']
        label = dataset.at[i, 'target']
        msg = str(msg).strip().lower()
        labels.append(int(label))
        clean = cleanPost(msg)
        textdata.append(clean)
        text.insert(END, clean + " ==== " + str(label) + "\n")

def preprocess():
    text.delete('1.0', END)
    global X, Y
    global tfidf_vectorizer
    global tfidf_X_train, tfidf_X_test, tfidf_y_train, tfidf_y_test
    # A distinct local name keeps the imported `stopwords` module usable elsewhere
    stopword_list = nltk.corpus.stopwords.words("english")
    tfidf_vectorizer = TfidfVectorizer(stop_words=stopword_list, use_idf=True,
                                       ngram_range=(1, 2), smooth_idf=False, norm=None,
                                       decode_error='replace', max_features=200)
    tfidf = tfidf_vectorizer.fit_transform(textdata).toarray()
    # get_feature_names_out() requires scikit-learn 1.0 or later
    df = pd.DataFrame(tfidf, columns=tfidf_vectorizer.get_feature_names_out())
    text.insert(END, str(df))
    print(df.shape)
    X = normalize(df.values)
    Y = LabelEncoder().fit_transform(np.asarray(labels))
    # Shuffle features and labels with the same permutation
    indices = np.arange(X.shape[0])
    np.random.shuffle(indices)
    X = X[indices]
    Y = Y[indices].reshape(-1, 1)
    # Add a trailing axis: the LSTM expects (samples, timesteps, features)
    X = X.reshape((X.shape[0], X.shape[1], 1))
    print(Y)
    print(Y.shape)
    print(X.shape)
    tfidf_X_train, tfidf_X_test, tfidf_y_train, tfidf_y_test = train_test_split(X, Y, test_size=0.2)
    text.insert(END, "\n\nTotal News found in dataset : " + str(len(X)) + "\n")
    text.insert(END, "Total records used to train machine learning algorithms : " + str(len(tfidf_X_train)) + "\n")
    text.insert(END, "Total records used to test machine learning algorithms : " + str(len(tfidf_X_test)) + "\n")

def runLSTM():
    text.delete('1.0', END)
    global classifier
    if os.path.exists('model/model.json'):
        # Reuse the saved architecture, weights, and training history
        with open('model/model.json', "r") as json_file:
            classifier = model_from_json(json_file.read())
        classifier.load_weights("model/model_weights.h5")
        print(classifier.summary())
        with open('model/history.pckl', 'rb') as f:
            data = pickle.load(f)
        acc = data['accuracy'][-1] * 100  # accuracy of the final epoch
        text.insert(END, "LSTM Fake News Detection Accuracy : " + str(acc) + "\n\n")
        text.insert(END, 'LSTM Model Summary can be seen in black console for layer details\n')
        # model/model.txt is not written by this listing, so load it only if present
        if os.path.exists('model/model.txt'):
            with open('model/model.txt', 'rb') as file:
                classifier = pickle.load(file)
    else:
        # Two stacked LSTM layers followed by a dense classification head
        lstm_model = Sequential()
        lstm_model.add(LSTM(128, input_shape=X.shape[1:], activation='relu', return_sequences=True))
        lstm_model.add(Dropout(0.2))
        lstm_model.add(LSTM(128, activation='relu'))
        lstm_model.add(Dropout(0.2))
        lstm_model.add(Dense(32, activation='relu'))
        lstm_model.add(Dropout(0.2))
        lstm_model.add(Dense(2, activation='softmax'))
        lstm_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        # Fit on the training split only so the held-out split stays unseen
        hist = lstm_model.fit(tfidf_X_train, tfidf_y_train, epochs=10,
                              validation_data=(tfidf_X_test, tfidf_y_test))
        classifier = lstm_model
        os.makedirs('model', exist_ok=True)
        classifier.save_weights('model/model_weights.h5')
        with open("model/model.json", "w") as json_file:
            json_file.write(classifier.to_json())
        history = hist.history
        with open('model/history.pckl', 'wb') as f:
            pickle.dump(history, f)
        acc = history['accuracy'][-1] * 100  # accuracy of the final epoch
        text.insert(END, "LSTM Accuracy : " + str(acc) + "\n\n")
        text.insert(END, 'LSTM Model Summary can be seen in black console for layer details\n')
        print(lstm_model.summary())

def graph():
    # Load the per-epoch metrics saved by runLSTM().
    with open('model/history.pckl', 'rb') as f:
        data = pickle.load(f)
    acc = data['accuracy']
    loss = data['loss']
    plt.figure(figsize=(10, 6))
    plt.grid(True)
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy/Loss')
    # 'o-' draws circle markers joined by lines; the colour is set explicitly.
    plt.plot(acc, 'o-', color='green')
    plt.plot(loss, 'o-', color='blue')
    plt.legend(['Accuracy', 'Loss'], loc='upper left')
    plt.title('LSTM Model Accuracy & Loss Graph')
    plt.show()

def predict():
    testfile = filedialog.askopenfilename(initialdir="TwitterNewsData")
    testData = pd.read_csv(testfile)
    text.delete('1.0', END)
    testData = testData.values[:, 0]  # the first column holds the news text
    print(testData)
    for msg in testData:
        print(msg)
        # Apply the same cleaning and TF-IDF vocabulary that were used for training.
        review = cleanPost(msg.strip().lower())
        testReview = tfidf_vectorizer.transform([review]).toarray()
        prediction = classifier.predict(testReview)
        print(prediction)
        # Label 0 is reported as genuine; any other label is reported as fake.
        if prediction[0] == 0:
            text.insert(END, msg+" === Given news predicted as GENUINE\n\n")
        else:
            text.insert(END, msg+" === Given news predicted as FAKE\n\n")


font = ('times', 15, 'bold')
title = Label(main, text='DETECTION OF FAKE NEWS THROUGH IMPLEMENTATION OF DATA SCIENCE APPLICATION')
title.config(bg='gold2', fg='thistle1')
title.config(font=font)
title.config(height=3, width=120)
title.place(x=0, y=5)

ff = ('times', 12, 'bold')

# One button per step of the workflow, stacked down the left side.
uploadButton = Button(main, text="Upload Fake News Dataset", command=uploadDataset)
uploadButton.place(x=20, y=100)
uploadButton.config(font=ff)

processButton = Button(main, text="Preprocess Dataset", command=preprocess)
processButton.place(x=20, y=150)
processButton.config(font=ff)

dtButton = Button(main, text="Run LSTM Algorithm", command=runLSTM)
dtButton.place(x=20, y=200)
dtButton.config(font=ff)

graphButton = Button(main, text="Accuracy & Loss Graph", command=graph)
graphButton.place(x=20, y=250)
graphButton.config(font=ff)

predictButton = Button(main, text="Test News Detection", command=predict)
predictButton.place(x=20, y=300)
predictButton.config(font=ff)

# Text area on the right that shows the output of every step.
font1 = ('times', 12, 'bold')
text = Text(main, height=30, width=100)
scroll = Scrollbar(text)
text.configure(yscrollcommand=scroll.set)
text.place(x=330, y=100)
text.config(font=font1)

main.config(bg='DarkSlateGray1')
main.mainloop()


5. RESULTS AND DISCUSSION


5.1 HOME PAGE
To run the project, double-click the ‘run.bat’ file to get the screen below.

Screenshot 5.1: Home Page for fake media detection

In the above screen, click the ‘Upload Fake News Dataset’ button to upload the dataset.

5.2 UPLOAD RESULT

Screenshot 5.2: Upload result of Fake media detection


In the above screen, select the ‘news.csv’ file and click the ‘Open’ button to load the
dataset and get the screen below.

5.3 PREPROCESS RESULT

Figure 5.3 Preprocess result of Fake media detection

In the above screen, the dataset is loaded and the text area shows every news text with its
class label, 0 or 1. Now click the ‘Preprocess Dataset’ button to convert this string data
into numeric vectors and get the screen below.

5.4 PREPROCESSED RESULT

Screenshot 5.4: Preprocessed Result


In the above screen, each vocabulary term (a word or a two-word phrase) is a column
header. If a term appears in a news record, that row's column holds the term's TF-IDF
weight; if it does not appear, the column holds 0.
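
The way each cell of this matrix is filled can be illustrated with a minimal sketch. It assumes the weighting that scikit-learn documents for smooth_idf=False and norm=None, namely the raw term count multiplied by ln(n / df) + 1. The three short sentences are invented for illustration, and the sketch uses single words only, without stop-word removal or two-word phrases.

```python
import math
from collections import Counter

# Tiny made-up corpus; the project itself works on the uploaded news records.
docs = [
    "fire in the city",
    "city council meets",
    "fire fire alarm",
]

tokenized = [d.split() for d in docs]
vocab = sorted({word for doc in tokenized for word in doc})
n_docs = len(docs)

# Document frequency: in how many records each term occurs.
df = {t: sum(t in doc for doc in tokenized) for t in vocab}
# Inverse document frequency without smoothing: ln(n / df) + 1.
idf = {t: math.log(n_docs / df[t]) + 1.0 for t in vocab}

def tfidf_row(doc):
    """Return one matrix row: term count times idf, 0 for absent terms."""
    counts = Counter(doc)
    return [counts[t] * idf[t] for t in vocab]

matrix = [tfidf_row(doc) for doc in tokenized]

for term, weight in zip(vocab, matrix[2]):
    print(f"{term:8s} {weight:.3f}")
```

A term that occurs in fewer records gets a larger weight, and a term that is absent from a record leaves a 0 in that row.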

The above screen shows some of the 7613 news records in the dataset. The bottom lines
report that the dataset contains 7613 records in total, of which the application uses 80%
(6090 news records) for training and 20% (1523 news records) for testing. The dataset is
now in numeric form. Click the ‘Run LSTM Algorithm’ button to train an LSTM model on this
dataset and calculate its accuracy and loss.
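
The record counts above can be checked with a short calculation. This is a sketch that assumes the test share is rounded up to a whole record and the remainder goes to training, which is how train_test_split is understood to handle a fractional test_size.

```python
import math

n_samples = 7613   # total news records reported by the application
test_size = 0.2    # same fraction that is passed to train_test_split

# The test share is rounded up; the remaining records are used for training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print("train:", n_train, "test:", n_test)
```

The two counts add up to the full dataset, so no record is dropped by the split.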

5.5 RUN RESULT

Figure 5.5 Run result of Fake media detection

In the above screen, the LSTM model is generated with an accuracy of 69.49%, which is the
training accuracy recorded for the final epoch. The console below shows the LSTM layer details.


5.6 CONSOLE RESULT

Figure 5.6 Console result of LSTM for Fake media detection

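The above screen lists the layers of the model: two LSTM layers and a dense layer, each
followed by dropout, and a final two-class output layer. These layers extract features from
the input data for prediction. Now click the ‘Accuracy & Loss Graph’ button to get the LSTM graph.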

5.7 GRAPH RESULT

Figure 5.7 Graph result for fake media detection

In the above graph, the x-axis represents the epoch and the y-axis represents the accuracy
and loss values. The green line is accuracy and the blue line is loss. As the epochs
increase, the loss decreases and the accuracy reaches about 70%.


Now click the ‘Test News Detection’ button to upload some test news sentences; the
application then predicts whether each news item is genuine or fake. The test news dataset
below contains only text, with no class label, and the LSTM predicts the class label for it.

5.8 UPLOAD TEXT RESULT

Figure 5.8 Upload text result for fake media detection

In the above screen, select the ‘testNews.txt’ file and click the ‘Open’ button to load the
data and get the prediction result below.

5.9 PREDICTION RESULT

Figure 5.9 Prediction result for fake media detection


In the above screen, the news text appears before the separator symbols and the
application's prediction, FAKE or GENUINE, appears after them. Once the model is built, any
news text given to it is scored against the genuine and fake categories, and the
application reports whichever category gets the higher score.
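
The choice between the two categories can be sketched as follows. This is an illustration rather than the project's own prediction code: it assumes a two-unit softmax output in which index 0 stands for genuine and index 1 for fake, and the input scores are made-up numbers.

```python
import math

LABELS = ["GENUINE", "FAKE"]  # assumed order: index 0 = genuine, index 1 = fake

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def decide(scores):
    """Return the label with the highest probability and that probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Made-up scores for one news item; the second score is larger.
label, confidence = decide([0.3, 1.7])
print(label, round(confidence, 3))
```

Whichever category receives the larger probability becomes the reported label, so the probability itself can also be shown to the user as a confidence value.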


6. TESTING
6.1 INTRODUCTION TO TESTING
The purpose of testing is to discover errors. Testing is the process of trying to
discover every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, subassemblies, assemblies and/or a finished product. It is the
process of exercising software with the intent of ensuring that the software system meets its
requirements and user expectations and does not fail in an unacceptable manner. There are
various types of tests. Each test type addresses a specific testing requirement.

6.2 TYPES OF TESTING


6.2.1 UNIT TESTING
Unit testing involves the design of test cases that validate that the internal program logic
is functioning properly, and that program inputs produce valid outputs. All decision branches
and internal code flow should be validated. It is the testing of individual software units of the
application. It is done after the completion of an individual unit and before integration. This is
structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests
perform basic tests at component level and test a specific business process, application, and/or
system configuration. Unit tests ensure that each unique path of a business process performs
accurately to the documented specifications and contains clearly defined inputs and expected results.
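
A unit test of this kind can be sketched with Python's unittest module. The function clean_post below is a hypothetical stand-in for the project's text-cleaning step, written only so that the example is self-contained; the real helper may behave differently.

```python
import re
import unittest

def clean_post(text):
    """Hypothetical cleaner: lowercase, keep letters only, collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return " ".join(text.split())

class CleanPostTest(unittest.TestCase):
    """Each test defines one input and the output expected for it."""

    def test_lowercases_and_strips_punctuation(self):
        self.assertEqual(clean_post("Breaking: FIRE in Town!"), "breaking fire in town")

    def test_collapses_whitespace(self):
        self.assertEqual(clean_post("  too   many\tspaces "), "too many spaces")

    def test_empty_input_stays_empty(self):
        self.assertEqual(clean_post(""), "")

# Run the three tests and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleanPostTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test covers one path through the unit, which matches the idea that every branch should have a clearly defined input and expected result.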

6.2.2 INTEGRATION TESTING


Integration tests are designed to test integrated software components to determine if
they actually run as one program. Testing is event driven and is more concerned with the basic
outcome of screens or fields. Integration tests demonstrate that although the components were
individually satisfactory, as shown by successful unit testing, the combination of components
is also correct and consistent. Integration testing is specifically aimed at exposing the problems
that arise from the combination of components.

6.2.3 FUNCTIONAL TESTING


Functional tests provide systematic demonstrations that functions tested are available
as specified by the business and technical requirements, system documentation, and user
manuals.


Functional testing is centered on the following items:

Valid Input : identified classes of valid input must be accepted.

Invalid Input : identified classes of invalid input must be rejected.

Functions : identified functions must be exercised.

Output : identified classes of application outputs must be exercised.

Systems/Procedures: interfacing systems or procedures must be invoked.


Organization and preparation of functional tests is focused on requirements, key functions,
or special test cases. In addition, systematic coverage of the identified business process
flows, data fields, and predefined processes should be considered.
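
The valid-input and invalid-input items above can be illustrated with a small check on uploaded records. This is a sketch under assumptions: the function validate_news_row and the column names "text" and "target" are hypothetical and are not taken from the project code.

```python
def validate_news_row(row):
    """Hypothetical check: accept a record only if it has text and a 0/1 label."""
    text = row.get("text")
    label = row.get("target")
    if not isinstance(text, str) or not text.strip():
        return False, "rejected: text is missing or empty"
    if label not in (0, 1):
        return False, "rejected: label must be 0 or 1"
    return True, "accepted"

# One valid record and three invalid ones, each invalid for a different reason.
valid_row = {"text": "Forest fire spreads near the town", "target": 1}
invalid_rows = [
    {"text": "", "target": 0},
    {"text": "Council meets on Monday", "target": "fake"},
    {"target": 1},
]

results = [validate_news_row(valid_row)] + [validate_news_row(r) for r in invalid_rows]
for ok, reason in results:
    print(ok, reason)
```

A functional test then asserts that the valid class of input is accepted and that every identified invalid class is rejected with a reason.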

6.3 TEST CASES


6.3.1 UPLOADING IMAGES

Test case ID | Test case name | Purpose | Test Case | Output
-------------|----------------|---------|-----------|-------
1 | User login | Use it for identification | Verify whether the username and password are correct or not | Login successful
2 | Uploads News Articles | Use it for verification | The user uploads the news article | Uploaded successfully
3 | Run Fake News Detector Algorithm | Use it for detection | Verify whether the Fake News Detector Algorithm runs or not | Running done successfully

6.3.2 CLASSIFICATION
Test case ID | Test case name | Purpose | Input | Output
-------------|----------------|---------|-------|-------
1 | Classification test 1 | To check if the classifier performs its task | A news article is given | Detection done successfully.
2 | Classification test 2 | To check if the classifier performs its task | News articles without loading dataset | We cannot do further operations.
3 | Classification test 3 | To check if the classifier performs its task | Fake news articles are given | Detected as fake.


7. CONCLUSION & FUTURE SCOPE

7.1 PROJECT CONCLUSION

We exploit the latent community structure in the global news network to improve the
prediction of the viral cascades of news about events. The cascades which have early adopters
in different communities have advantages in disseminating the contagion to these communities
in parallel and therefore are more likely to result in the viral infections within a limited time
period. Our model captures such property by inferring the community structure using the
response times of nodes. Thus, we avoid using the explicit network topology which is often
not known because the references to propagation sources are usually missing in the real data
sets. Due to the size of the relevant data sets, we successfully parallelized the inference
algorithm for distributed memory machines and tested this parallelization on the RPI AMOS
achieving orders of magnitude speedup.

7.2 FUTURE SCOPE

"Fake News Detection Using NLP and Blockchain" has significant potential for future
advancements. One key area is real-time integration with social media platforms to monitor
and verify news as it is posted. The system can be improved by using advanced models like
BERT or GPT for more accurate text analysis. Expanding to multimodal detection—
including images and videos—can provide a comprehensive solution across media types.
Supporting multiple languages would broaden the system’s global applicability.

Blockchain technology can be leveraged to create decentralized, community-driven
verification platforms where users collaboratively report and validate fake news. Sentiment
analysis could also be added to track emotional manipulation in news content. For scalability,
deploying the system on cloud platforms would ensure it can handle larger datasets and more
users.

Additionally, social network analysis can help predict how fake news spreads,
enabling preventive actions. User-friendly mobile and web applications could provide real-
time notifications about fake news. Finally, blockchain's immutable records would ensure
secure and transparent verification, enhancing the system’s reliability. These improvements
would make the system more robust, scalable, and effective in combating misinformation.


8. BIBLIOGRAPHY

8.1 REFERENCES
[1] A. Nematzadeh, E. Ferrara, A. Flammini, and Y. Y. Ahn, “Optimal network modularity
for information diffusion,” Phys. Rev. Lett., vol. 113, no. 8, p. 088701, 2014.

[2] J. Firmstone and S. Coleman, “The changing role of the local news media in enabling
citizens to engage in local democracies,” Journalism Pract., vol. 8, no. 5, pp. 596–606, 2014.

[3] R. A. Hackett, “Decline of a paradigm? Bias and objectivity in news media studies,” Crit.
Stud. Media Commun., vol. 1, no. 3, pp. 229–259, 1984.

[4] S. Della Vigna and E. Kaplan, “The Fox News effect: Media bias and voting,” Quart. J.
Econ., vol. 122, no. 3, pp. 1187–1234, 2007.

[5] M. Gentzkow and J. M. Shapiro, “Media bias and reputation,” J. Political Economy, vol.
114, no. 2, pp. 280–316, 2006.

[6] M. Karsai et al., “Small but slow world: How network topology and burstiness slow down
spreading,” Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 83, no. 2, p.
025102, 2011.

8.2 WEBSITES
[1] https://ieeexplore.ieee.org/document/9536745
[2] GitHub link: https://github.com/saidanollaPraveen/Batch-16-FAKE-MEDIA-DETECTION-USING-NATURAL-LANGUAGE-PROCESSING-AND-BLOCKCHAIN
