
Fyp 1

This document summarizes a final year project on spam tweet detection and data scraping. The project aims to develop a system using machine learning and natural language processing techniques to identify and filter spam tweets and analyze sentiment in tweets. A chatbot will also be integrated to help users navigate Twitter data and understand trends. The project was submitted to fulfill the requirements for a bachelor's degree in software engineering.

Uploaded by

bazedmmii
Copyright
© All Rights Reserved

Title of Final Project

“Spam Tweets Detection and Data Scraping”

Project ID: (FYP-F23-29)

Session: BSSE Fall 2020 to 2024

Project Advisor:
Syed Zeeshan Ali

Submitted By

Bazed Gul 70113413

Muhammad Farooq 70110650

Taimoor Asghar 70110725


Department of Software Engineering
The University of Lahore
Lahore, Pakistan

Declaration

We have read the project guidelines and understand the meaning of academic dishonesty,
in particular plagiarism and collusion. We hereby declare that the work submitted for our final
year project, entitled "Tweets Spamming Analysis & Sentiment Analysis", is original and
has not been printed, published, or submitted before as a final year project, research work,
publication, or any other documentation.

Group Member 1 Name: Bazed Gul

SAP No: 70113413

Signature: …………………………
Group Member 2 Name: Muhammad Farooq

SAP No: 70110650

Signature: …………………………

Group Member 3 Name: Taimoor Asghar

SAP No: 70110725

Signature: …………………………

Statement of Submission

This is to certify that Bazed Gul (Roll No. 70113413), Muhammad Farooq (Roll No. 70110650),
and Taimoor Asghar (Roll No. 70110725) have successfully submitted the final project titled
"Tweets Spamming Analysis & Data Scraping" at the Software Engineering Department,
The University of Lahore, Lahore, Pakistan, in partial fulfillment of the requirements for the degree of BS
in Software Engineering.

Supervisor Name: Sir Syed Zeeshan Ali

Signature: …………………………

Date: ………………………
Dedication

This project is dedicated to my father, who taught me that the best kind of knowledge to have is
that which is learned for its own sake. It is also dedicated to my mother, who taught me that
even the largest task can be accomplished if it is done one step at a time.
Acknowledgment

We extend our heartfelt gratitude to our project supervisor, Sir Syed Zeeshan Ali, for his unwavering
support and guidance throughout this journey. His constant motivation was a catalyst for our
project's progress, always pushing us to strive for excellence. His insightful feedback provided
invaluable direction, shaping our work and refining our ideas. Beyond expert guidance, Sir Syed
Zeeshan Ali's approachable nature and willingness to help whenever needed created a
comfortable environment in which we could thrive. We are truly grateful for his dedication and his
commitment to our success.

Date:

Jan 20, 2024

Abstract
This project tackles the dual challenge of spam classification and sentiment analysis on social
media platforms, specifically focusing on Twitter. We develop a web-based application that
empowers users with valuable insights into the digital atmosphere of Twitter. Employing a
combination of data scraping techniques and Natural Language Processing (NLP) methods, our
application performs:

● Spam Classification: Utilizing rule-based and machine-learning algorithms, we identify
and filter unwanted content, enabling users to engage with genuine voices on Twitter.
● Sentiment Analysis: By leveraging state-of-the-art NLP techniques, we extract the
emotional tone from tweets, providing users with a real-time understanding of public
opinion and audience response.
Furthermore, the application integrates a conversational chatbot that acts as the user's digital
guide through the Twitterverse. This AI companion can answer questions about trending topics
and analyze the sentiment of specific tweets.
Area of the Project

Our project sits at the vibrant intersection of Artificial Intelligence and Social Media analysis,
where we're building a powerful web-based application to tackle the two-headed beast of spam
and sentiment on Twitter. We're weaving intricate strands of Natural Language Processing
(NLP) into a shield against spam, using rule-based and machine-learning algorithms to identify
and filter unwanted content. On the other hand, we're wielding the delicate scalpel of sentiment
analysis, dissecting tweets to unveil the hidden emotions and opinions of the Twitterverse. But
our ambition doesn't stop there. We're also breathing life into a friendly and intelligent chatbot,
your own digital guide through the ever-evolving landscape of Twitter trends and insights.

Technologies used

The project uses HTML, CSS, JavaScript, React.js, Python, and MongoDB, among other technologies.
List of Figures

Figure 1 Usecase Diagram Create Account
Figure 2 Architecture Diagram
Figure 3 ERD
Figure 4 Level 0 DFD
Figure 5 Level 1 DFD
Figure 6 Class Diagram
Figure 7 Activity Diagram Create Account
Figure 8 Sequence Diagram Create Account
Figure 9 Collaboration Diagram
Figure 10 State Transition Diagram
Figure 11 Component Diagram
Figure 12 Deployment Diagram

List of Tables

Table 1 Functional Requirement Create Account
Table 2 Usecase Create Account


Table of Contents

Declaration
Statement of Submission
Dedication
Acknowledgment
Abstract
List of Figures
List of Tables
Chapter 1: Introduction to the Problem
1.1 Introduction
1.2 Purpose
1.3 Objective
1.4 Existing Solution
1.5 Proposed Solution
Chapter 2: Software Requirement Specification
2.1 Introduction
2.1.1 Purpose
2.1.2 Scope
2.1.3 Definitions, acronyms, and abbreviations
2.2 Overall description
2.2.1 Product perspective
2.2.2 Product functions
2.2.3 User characteristics
2.2.4 Constraints
2.2.5 Assumptions and dependencies
2.2.6 Apportioning of requirements
2.3 Specific requirements
2.3.1 Functional Requirement
2.3.2 Non-functional Requirements
Chapter 3: Use Case Analysis
Chapter 4: Design
4.1 Architecture Diagram
4.2 ERD with data dictionary
4.3 Data Flow diagram
4.3.1 The level 0
4.3.2 The level 1
4.4 Class Diagram
4.5 Activity Diagram
4.6 Sequence Diagram
4.7 Collaboration Diagram
4.8 State Transition Diagram
4.9 Component Diagram
4.10 Deployment Diagram
References
Appendix

Chapter 1: Introduction to the Problem


1.1 Introduction
The vast ocean of social media, while a treasure trove of information and connection, is often
plagued by unwanted currents - spam and misconstrued sentiment. Our project aims to
navigate these murky waters with the power of technology, providing users with a clearer and
more insightful lens to view the Twitterverse.

1.2 Purpose
Our purpose is to empower users with a web-based application that tackles the twin challenges
of spam classification and sentiment analysis on Twitter. We believe that understanding the
authenticity and emotional undertones of online communication is crucial for navigating the
often chaotic world of social media.

1.3 Objective
Our key objectives are:
● Develop a robust spam detection system: Utilizing a combination of rule-based and
machine-learning algorithms, we aim to identify and filter unwanted content, ensuring
users see genuine voices on Twitter.
● Implement accurate sentiment analysis: By leveraging state-of-the-art NLP techniques,
we will extract the emotional tone from tweets, providing users with valuable insights into
public opinion and audience response.
● Craft a user-friendly chatbot: Integrating a conversational AI companion, we will offer
users a guide through the Twitterverse, allowing them to ask questions, analyze specific
tweets, and engage in interactive dialogue.

1.4 Existing Solution


The ever-growing prevalence of spam and the difficulty in accurately gauging sentiment on
social media platforms like Twitter pose significant challenges for users.
● Spam: Inundated with unwanted promotional content and bots, users struggle to find
genuine voices amid the noise and to engage in meaningful interactions.
● Misconstrued Sentiment: The nuanced world of emotions often gets lost in text-based
communication, leading to misunderstandings and misinterpretations of public opinion.
These issues hinder meaningful social interaction, limit the accuracy of information gathering,
and create a frustrating user experience.

1.5 Proposed Solution


Our web-based application offers a multifaceted solution to the aforementioned problems:
● Precise Spam Filtering: Our system analyzes tweets for suspicious patterns and
linguistic markers to identify and filter spam, leaving users with a cleaner
and more authentic Twitter feed.
● In-depth Sentiment Analysis: Utilizing advanced NLP techniques, we extract the
emotional tone of tweets, allowing users to understand the underlying feelings and
opinions behind the text.
● Interactive Chatbot: Our AI companion helps users navigate the Twitterverse with
ease. Users can ask questions about trending topics, analyze the sentiment of specific tweets, and
engage in conversation, all within our platform.
We believe our project has the potential to substantially improve the way users interact with and
understand the vast and vibrant world of Twitter. It offers a beacon of clarity in the sea of spam
and a window into the collective pulse of the digital world.

Chapter 2: Software Requirement Specification


Project Name: Tweets Spam Analysis (Sentiment Analysis and Chatbot)
Date: 2024-01-16
Version: 1.0

2.1 Introduction
This document defines the software requirements for Tweets Spam Analysis, a web-based application
designed to combat spam and analyze sentiment on Twitter. It details the software's intended
purpose, scope, and functionalities, providing a clear understanding of its goals and desired
behavior.

2.1.1 Purpose
The purpose of this application is to:
● Detect and filter spam tweets: Identify unwanted content, such as advertising bots
and malicious messages, providing users with a cleaner and more authentic Twitter
experience.
● Analyze the sentiment of tweets: Extract the emotional tone (positive, negative, neutral)
behind tweets, enabling users to understand public opinion and gauge audience
response.
● Offer an interactive chatbot: Provide users with a conversational AI companion to
answer questions about trending topics, analyze specific tweets, and engage in dialogue.
2.1.2 Scope

This application will focus on:

● Analyzing tweets publicly available on the Twitter platform.
● Providing sentiment analysis primarily for English-language tweets.
● Offering chatbot interaction focused on Twitter-related information and analysis.

2.1.3 Definitions
● Spam: Unwanted or irrelevant content, including promotional tweets, bots, and malicious
messages.
● Sentiment Analysis: The process of automatically extracting the emotional tone (positive,
negative, neutral) from text.
● Natural Language Processing (NLP): A field of computer science concerned with the
interaction between computers and human (natural) languages.
● Chatbot: A computer program that simulates a conversation with human users.
● API: Application Programming Interface, a set of functions and protocols that allows one
program to communicate with another.

Acronyms and Abbreviations


● NLP - Natural Language Processing
● API - Application Programming Interface
● AI - Artificial Intelligence
● ML - Machine Learning
● UI - User Interface
● UX - User Experience

2.2 Overall Description

2.2.1 Product Perspective

From the user's vantage point, Tweets Spam Analysis and Sentiment Analysis integrates seamlessly
into their Twitter experience, offering a suite of features without disrupting their usual
interactions.
System Interfaces:

● Seamless integration with Twitter: The application connects to the Twitter API,
enabling users to analyze tweets, access their feeds, and interact with Twitter data without
leaving the application.
● Browser-based accessibility: Users can access the application through any modern web browser,
eliminating the need for specific hardware or software installations.

User Interfaces:

● Intuitive interface: The design prioritizes user-friendliness, with clear navigation, informative
visualizations, and interactive elements.
● Streamlined spam filtering: Users can easily identify and filter spam tweets with a single click or
customize filtering thresholds based on their preferences.
● Visual sentiment analysis: Sentiment scores are presented using color-coding or charts, making
it easy to grasp the emotional tone of tweets and discussions.
● Chatbot integration: The chatbot seamlessly blends into the interface, providing a natural way to
ask questions, analyze sentiment, and engage in conversations, enhancing the user experience.

Hardware Interfaces:

● Minimal hardware requirements: The web-based nature of the application means it runs smoothly
on most devices with a stable internet connection, including smartphones, tablets, and
computers.

Software Interfaces:

● Compatibility with major browsers: The application is compatible with popular browsers like
Chrome, Firefox, Safari, and Edge.

Communications Interfaces:

● Secure communication with Twitter API: The application uses secure protocols (HTTPS) to
protect user data when interacting with the Twitter API.

Memory:

● Optimized for efficient performance: The application is designed to minimize memory usage,
ensuring smooth operation even on devices with limited resources.

Operations:

● Automatic updates: The application stays up-to-date with the latest features and security
enhancements through automatic background updates.
● Data backup and recovery: Options for backing up and restoring user data are available, protecting
information against loss.

Site Adaptation Requirements:

● Flexible configuration: The application can be customized to align with specific user preferences
or site-specific requirements, allowing for tailored experiences.
By integrating these technical elements, Tweets Spam Analysis and Sentiment Analysis
helps users navigate Twitter with clarity and confidence: it filters noise, amplifies authentic
voices, and reveals the emotional landscape of the Twitterverse, all within a user-friendly and
accessible interface.

2.2.2 Product functions

| ID | Name | Description | Input | Output | Basic Workflow | Requirements (Optional) |
|---|---|---|---|---|---|---|
| FR-1 | Create Account | Allows new users to create an account to access the application's features. | Username, email address, password, confirmation of password, optional profile information. | Confirmation of account creation, display of user profile page. | 1. User enters required information. 2. System validates input and checks for duplicate accounts. 3. System creates account and stores user information securely. 4. System displays confirmation and directs user to profile page. | Password strength validation, email address verification, optional profile fields (e.g., bio). |
| FR-2 | View Account | Enables users to view their account details and profile information. | User authentication (login). | Display of user profile information, including username, email address, optional profile fields, and account settings. | 1. User logs in successfully. 2. System retrieves user information from the database. 3. System displays user profile page. | |
| FR-3 | Update Account | Allows users to modify their account details, including password, email address, and profile information. | Updated information, user authentication (login). | Confirmation of updates, display of updated profile information. | 1. User logs in successfully. 2. User enters updated information. 3. System validates input and updates user information in the database. 4. System displays confirmation and updated profile page. | Password strength validation, email address verification. |
| FR-4 | Delete Account | Enables users to permanently delete their account and associated data. | User authentication (login), confirmation of deletion. | Confirmation of account deletion, removal of user data. | 1. User logs in successfully. 2. User confirms deletion request. 3. System removes user data from the database. 4. System displays confirmation of deletion. | |
| FR-5 | Log In | Allows existing users to access their account and the application's features. | Username or email address, password. | Confirmation of successful login, display of user dashboard or profile page. | 1. User enters credentials. 2. System verifies credentials against database. 3. System grants access upon successful authentication. | Password hashing, login attempt limits, session management. |
| FR-6 | Log Out | Enables users to securely end their session and protect their account. | User action (clicking logout button). | Confirmation of logout, redirection to login page. | 1. User initiates logout action. 2. System invalidates session and clears user data from memory. 3. System redirects user to login page. | |

Table 1 Functional Requirement Create Account
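FR-5 lists password hashing among its requirements, and FR-1 and FR-3 list password strength validation, but the document does not name an algorithm or a strength policy. The following is a minimal sketch, assuming PBKDF2-HMAC-SHA256 from Python's standard library (our choice for illustration, not necessarily the project's); the iteration count and the rule in `is_strong_password` are hypothetical values.

```python
import hashlib
import hmac
import os
import re

# Assumed iteration count; a real deployment would tune it to its hardware.
ITERATIONS = 200_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 with a random 16-byte salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


def is_strong_password(password: str) -> bool:
    """Hypothetical strength rule: at least 8 characters, one letter and one digit."""
    return (
        len(password) >= 8
        and re.search(r"[A-Za-z]", password) is not None
        and re.search(r"\d", password) is not None
    )
```

Only the salt and digest would be stored in the user record; the plaintext password is never persisted, and login (FR-5) calls `verify_password` with the stored pair.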

● Spam Detection:
○ Analyze tweets for spam indicators.
○ Assign spam probability scores.
○ Filter tweets based on spam score thresholds.
○ Flag potential spam for user review.
● Sentiment Analysis:
○ Analyze the sentiment of individual tweets.
○ Aggregate sentiment scores for hashtags, users, and topics.
○ Visualize sentiment analysis results.
● Chatbot Interaction:
○ Understand natural language queries about Twitter trends and user analysis.
○ Access and process relevant Twitter data.
○ Provide informative and engaging responses.
○ Maintain a conversational tone.
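Aggregating sentiment scores for hashtags can be sketched as a simple group-and-average step. This assumes each tweet already carries a per-tweet score in [-1, 1] from the sentiment analysis module; the scale and the `aggregate_by_hashtag` helper are hypothetical, and the same grouping would apply to users or topics with a different key extractor.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#(\w+)")


def aggregate_by_hashtag(scored_tweets):
    """Average per-tweet sentiment scores for every hashtag a tweet contains.

    scored_tweets: iterable of (text, score) pairs, with score assumed to lie in
    [-1.0, 1.0] (negative to positive).
    Returns {hashtag (lower-cased): (mean_score, tweet_count)}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, score in scored_tweets:
        # A set, so a tweet repeating a hashtag counts once for that hashtag.
        for tag in {t.lower() for t in HASHTAG.findall(text)}:
            totals[tag] += score
            counts[tag] += 1
    return {tag: (totals[tag] / counts[tag], counts[tag]) for tag in totals}


# Made-up example tweets with made-up scores.
tweets = [
    ("Loving the new update #AppLaunch", 0.8),
    ("#applaunch crashed twice today", -0.6),
    ("Great talk at #PyCon #AppLaunch", 0.4),
]
summary = aggregate_by_hashtag(tweets)
```

Returning the tweet count alongside the mean lets the visualization layer hide hashtags with too few tweets to be meaningful.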

2.2.3 User characteristics

1. Socially Savvy Trendsetters:


● Young adults and teenagers actively engaged on Twitter, seeking to stay informed about
trending topics and conversations.
● Value accurate and insightful sentiment analysis to gauge public opinion and understand
the emotional pulse of Twitter.
● Appreciate the chatbot's ability to answer quick questions about trending topics and
analyze specific tweets.
2. Professional Analysts and Researchers:
● Professionals from various fields who require deep insights into public opinion and social
media trends.
● Utilize the application for sentiment analysis of specific hashtags, users, or campaigns to
inform their research and decision-making.
● Value advanced filtering options to focus their analysis on relevant data and identify
important voices.
3. Educators and Media Professionals:
● Teachers and journalists seeking to incorporate Twitter analysis into their lessons or
reporting.
● Use the application to illustrate real-world examples of public opinion and sentiment
formation.
● Appreciate the chatbot's ability to provide quick summaries of trending topics and key
talking points.
4. Casual Twitter Users:
● Individuals who occasionally use Twitter for entertainment or staying connected with
friends and family.
● Benefit from the spam detection feature to remove unwanted promotional content and
bots, enhancing their Twitter experience.
● Use the sentiment analysis for fun, understanding the emotional undertones of popular
tweets and reactions.
5. Security and Marketing Professionals:
● Individuals concerned about online misinformation and social media manipulation.
● Utilize the application's spam detection and sentiment analysis capabilities to identify
potentially malicious content and understand its impact.
● Benefit from the chatbot's ability to analyze specific users and campaigns, potentially
revealing coordinated actions or misinformation campaigns.
Relevant user factors include:
● Demographics: Age, gender, location, and education level.
● Technical expertise: Familiarity with technology and web applications.
● Twitter usage patterns: Frequency, interests, followed accounts.

2.2.4 Constraints
● Technical limitations: The chosen technologies, APIs, and platforms impose limits, for
example Twitter API rate limits, computational resource limits for NLP tasks, and browser
compatibility.
● Resource constraints: Development is bounded by budget, available personnel, and
the project timeline.
● Legal and ethical considerations: Collecting and analyzing Twitter data must respect
applicable regulations and user privacy.

2.2.5 Assumptions and dependencies


● User acceptance: Initial user adoption is assumed to require marketing and educational
efforts.
● Data quality: Twitter data is assumed to have some limitations in accuracy and consistency.
● Third-party services: The Twitter API and other dependencies are assumed to remain
available and perform stably.
● External dependencies: Core functionality depends on external tools, libraries, and APIs
(e.g., the Twitter API and NLP libraries).
● Internal dependencies: Some modules depend on others (e.g., sentiment analysis results
feed into the chatbot).

2.2.6 Apportioning of requirements


● Modular division: Functionality is divided into distinct modules (spam detection,
sentiment analysis, and chatbot).
● Priority and ownership: Each module is assigned a priority and a team responsible for
its development.
● Dependencies and resource allocation: Resources and development timelines are
allocated to account for dependencies between modules.

The following table apportions the requirements across the project's modules:

| Module | Functionalities | Priority | Owner | Estimated Time | Dependencies |
|---|---|---|---|---|---|
| Spam Detection | Analyze tweets for spam markers, assign scores, filter options | High | Team A | 2 weeks | Twitter API, machine learning libraries |
| Sentiment Analysis | Analyze sentiment of tweets, aggregate scores, visualize outputs | High | Team A | 3 weeks | NLP libraries, data storage |
| Chatbot | Understand queries, analyze tweets, provide responses | Medium | Team A | 4 weeks | Sentiment analysis results, user interaction module |

2.3 Specific requirements

This section describes the functional and non-functional requirements of the system in
enough detail for designers to build a system that satisfies the user requirements and for
testers to verify that it does.
2.3.1 Functional Requirement
● 2.3.1.1 Spam Detection:
○ The application should analyze tweets for spam indicators such as keywords,
hashtags, suspicious links, and unusual posting patterns.
○ It should assign a spam probability score to each tweet, allowing users to filter or
flag potential spam.
○ Different filtering options should be provided, based on spam probability score
thresholds.
● 2.3.1.2 Sentiment Analysis:
○ The application should analyze the sentiment of tweets using NLP techniques
like lexicon-based analysis or machine learning models.
○ Sentiment analysis should be performed on individual tweets and aggregated for
specific hashtags, users, or topics.
○ The results should be presented visually, using charts or graphs, for easy
interpretation.
● 2.3.1.3 Chatbot:
○ The chatbot should understand natural language queries related to Twitter
trends, user analysis, and general information.
○ It should be able to access and process relevant Twitter data using the Twitter
API.
○ The chatbot should provide informative and engaging responses while
maintaining a conversational tone.
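The spam detection requirements in 2.3.1.1 (indicators such as keywords, hashtags, suspicious links, and posting patterns; a spam probability score; threshold-based filtering) can be sketched with a small rule-based scorer. The keyword list, weights, shortener domains, and the 0.5/0.8 thresholds are all hypothetical. The logistic squash yields a probability-like value in (0, 1), not a calibrated probability; a trained model would supply that in practice.

```python
import math
import re

# Hypothetical indicator lists and weights; a real system would tune or learn these.
SPAM_KEYWORDS = {"free", "winner", "click", "giveaway", "crypto"}
URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#\w+")
SHORTENERS = ("bit.ly", "tinyurl.com")


def spam_score(text: str, posts_per_hour: float = 0.0) -> float:
    """Combine simple spam indicators into a 0-1 score with a logistic squash."""
    words = re.findall(r"[a-z]+", text.lower())
    keyword_hits = sum(w in SPAM_KEYWORDS for w in words)
    urls = URL.findall(text)
    shortened = sum(any(s in u for s in SHORTENERS) for u in urls)
    hashtags = len(HASHTAG.findall(text))
    # Linear combination of indicators; the -3.0 bias keeps clean tweets near 0.
    z = (-3.0 + 1.0 * keyword_hits + 0.8 * len(urls) + 1.5 * shortened
         + 0.3 * max(hashtags - 2, 0) + 0.1 * posts_per_hour)
    return 1.0 / (1.0 + math.exp(-z))


def triage(text: str, posts_per_hour: float = 0.0,
           flag_at: float = 0.5, filter_at: float = 0.8) -> str:
    """Map the score onto user-chosen thresholds: 'keep', 'flag' or 'filter'."""
    score = spam_score(text, posts_per_hour)
    if score >= filter_at:
        return "filter"
    if score >= flag_at:
        return "flag"
    return "keep"
```

Exposing `flag_at` and `filter_at` as parameters matches the requirement that users can customize filtering thresholds; "flag" corresponds to potential spam held for user review.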

2.3.2 Non-functional Requirements


● 2.3.2.1 Performance: The application should respond to user interactions within a
reasonable timeframe.
● 2.3.2.2 Security: User data and personal information should be protected with appropriate
security measures.
● 2.3.2.3 Accessibility: The application should be accessible to users with disabilities, following
relevant web accessibility guidelines.
● 2.3.2.4 User Interface: The UI should be user-friendly, intuitive, and visually appealing.

Chapter 3: Use Case Analysis


1. Signup
Use Case ID: UC-1
Use Case Name: Signup
Description: Allows new users to create an account to access the application's features.
Primary Actor: User
Secondary Actor: System
Pre-Condition: User does not have an existing account.
Post-Condition: User account is created, and user is logged in.
Basic Flow:
1. User enters username, email, password, and optional profile information.
2. System validates input and checks for duplicates.
3. System creates account and stores user information.
4. System sends confirmation email (optional).
5. System logs user in and displays profile page.
Alternate Flow:
2a. Invalid input: System prompts user to correct errors.
3a. Duplicate account: System prompts user to choose a different username or email.
4a. Email delivery failure: System prompts user to verify email address.
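Steps 2 and 3 of the signup flow, with alternate flows 2a and 3a, can be sketched against an in-memory dictionary standing in for the user database. The username and email patterns, the 8-character minimum, and the `SignupError` type are hypothetical; password hashing is deliberately left out and would happen before the record is stored.

```python
import re

# Hypothetical validation rules for illustration.
USERNAME = re.compile(r"[A-Za-z0-9_]{3,20}")
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


class SignupError(ValueError):
    """Raised for the alternate flows of UC-1."""


def sign_up(users: dict, username: str, email: str, password: str, confirm: str) -> dict:
    """Validate input (step 2), reject duplicates (3a) and store the account (step 3).

    users maps lower-cased usernames to account records.
    """
    errors = []
    if not USERNAME.fullmatch(username):
        errors.append("username must be 3-20 letters, digits or underscores")
    if not EMAIL.fullmatch(email):
        errors.append("email address is not valid")
    if len(password) < 8:
        errors.append("password must be at least 8 characters")
    if password != confirm:
        errors.append("passwords do not match")
    if errors:  # Alternate flow 2a: prompt the user to correct errors.
        raise SignupError("; ".join(errors))
    key = username.lower()
    if key in users or any(u["email"] == email.lower() for u in users.values()):
        # Alternate flow 3a: ask for a different username or email.
        raise SignupError("username or email already registered")
    # A password hash (not the plaintext) would be added to this record.
    users[key] = {"username": username, "email": email.lower()}
    return users[key]
```

Comparing lower-cased usernames and emails prevents near-duplicates such as `New_User` and `new_user` from coexisting.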
2. Login
Use Case ID: UC-2
Use Case Name: Login
Description: Allows existing users to authenticate themselves and access their accounts.
Primary Actor: User
Secondary Actor: System
Pre-Condition: User has a registered account.
Post-Condition: User is authenticated and granted access to the application.
Basic Flow:
1. User enters email or username and password.
2. System verifies credentials against stored user information.
3. System logs user in and displays appropriate dashboard or homepage.
Alternate Flow:
2a. Invalid credentials: System displays error message and prompts user to retry.
3a. Account locked: System displays message indicating account is locked and provides
instructions for unlocking.
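The login flow, including alternate flows 2a (invalid credentials) and 3a (account locked), can be sketched as a counter of consecutive failures per account. `MAX_ATTEMPTS`, the `Account` class, and the PBKDF2 parameters are assumptions for illustration; the unlocking instructions are out of scope here.

```python
import hashlib
import hmac
import os

MAX_ATTEMPTS = 5  # Hypothetical lock-out threshold.


def _digest(password: str, salt: bytes) -> bytes:
    """Salted PBKDF2-HMAC-SHA256 digest (assumed parameters)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class Account:
    """Minimal stand-in for a stored user record."""

    def __init__(self, username: str, password: str):
        self.username = username
        self.salt = os.urandom(16)
        self.digest = _digest(password, self.salt)
        self.failed_attempts = 0
        self.locked = False


def log_in(account: Account, password: str) -> str:
    """Return 'ok', 'invalid' (alternate flow 2a) or 'locked' (alternate flow 3a)."""
    if account.locked:
        return "locked"
    if hmac.compare_digest(_digest(password, account.salt), account.digest):
        account.failed_attempts = 0  # Reset the counter on success.
        return "ok"
    account.failed_attempts += 1
    if account.failed_attempts >= MAX_ATTEMPTS:
        account.locked = True
        return "locked"
    return "invalid"
```

Once locked, even the correct password returns "locked", so the lock cannot be bypassed by guessing correctly on a later attempt.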

| Field | Signup | Login |
|---|---|---|
| Use Case ID | UC_01 | UC_02 |
| Use Case Name | Signup | Login |
| Description | Allows new users to create an account to access the application's features. | Allows existing users to authenticate themselves and access their accounts. |
| Primary Actor | User | User |
| Secondary Actor | System | System |
| Pre-Condition | User does not have an existing account. | User has a registered account. |
| Post-Condition | User account is created, and user is logged in. | User is authenticated and granted access to the application. |
| Basic Flow | 1. User enters username, email, password, and confirmation of password. 2. System validates input and checks for duplicate accounts. 3. System creates account and stores user information. 4. System sends confirmation email (optional). 5. System logs user in and displays profile page. | 1. User enters username or email address and password. 2. System verifies credentials against stored user information. 3. System grants access upon successful authentication and displays the user dashboard or homepage. |
| Alternate Flow | 1a. Invalid input (e.g., missing field, wrong format): System prompts user to correct errors. 1b. Duplicate username or email: System prompts user to choose a different username or email. 3a. Email delivery failure: System prompts user to verify email address. | 1a. Invalid credentials: System displays an error message and prompts user to retry. 1b. Account locked: System displays a message indicating the account is locked and provides instructions for unlocking. |

Figure 1 Usecase Diagram Create Account

Use case diagram detail

| Field | Detail |
|---|---|
| Use Case ID | UC_01 (all IDs should follow this sequence) |
| Use Case Name | Create Account (the name of this use case is Create Account) |
| Description | Detail of this use case |
| Primary Actor | Actors associated with the use case |
| Secondary Actor | |
| Pre-Condition | What is required to perform this function |
| Post-Condition | What is the output of this function |
| Basic Flow | Actor Action: flow of information. System Action: what the system does with the information. |
| Alternate Flow | Another way to work with this function |

Table 2 Usecase Create Account

Tweet Sentiment Analysis Use Case Diagram


Use Case ID: UC_xx
Use Case Name: Analyze Sentiment
Description: Allows users to analyze the sentiment of tweets or Twitter content to gauge public
opinion and emotional tone.
Primary Actor: User
Secondary Actor: System
Pre-Condition: User is logged in, and a trained sentiment analysis model is available.
Post-Condition: Sentiment analysis results are displayed, and insights are provided.
Basic Flow:
1. User enters a tweet or a URL linking to tweets.
2. User clicks "Check Tweet" (or similar action).
3. System loads the trained sentiment analysis model.
4. System analyzes the text using the model.
5. System determines the overall sentiment (positive, negative, or neutral).
6. System displays the sentiment scores and any relevant visualizations.
7. System provides additional insights or recommendations based on the analysis
(optional).
Alternate Flow:
2a. Invalid input: System prompts user to enter valid text or URL.
4a. Model loading failure: System displays an error message and prompts user to retry.
Table:

| Field | Description |
|---|---|
| Use Case ID | UC_xx |
| Use Case Name | Analyze Sentiment |
| Description | ... (as above) |
| Primary Actor | User |
| Secondary Actor | System |
| Pre-Condition | ... (as above) |
| Post-Condition | ... (as above) |
| Basic Flow | 1-7 (as above) |
| Alternate Flow | 2a, 4a (as above) |
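Steps 4 and 5 of the Analyze Sentiment use case (analyze the text, then determine positive, negative, or neutral) can be illustrated with a minimal lexicon-based scorer, one of the techniques named in Section 2.3.1.2. The tiny lexicon, the one-word negation rule, and the neutral band of 0.05 are hypothetical; the project's trained model would take the place of `analyze_sentiment`.

```python
import re

# Tiny hypothetical lexicon; a real system would use a full sentiment
# dictionary or a trained model.
LEXICON = {"good": 1, "great": 2, "love": 2, "happy": 1,
           "bad": -1, "terrible": -2, "hate": -2, "slow": -1}
NEGATORS = {"not", "no", "never", "don't"}


def analyze_sentiment(text: str, neutral_band: float = 0.05):
    """Return (label, score), where score is summed word polarity divided by word count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("empty input")  # Alternate flow 2a: ask for valid text.
    total = 0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity  # "not good" counts as negative.
        total += polarity
    score = total / len(tokens)
    if score > neutral_band:
        return "positive", score
    if score < -neutral_band:
        return "negative", score
    return "neutral", score
```

Dividing by the token count keeps long and short tweets on a comparable scale, which also suits the hashtag-level averaging described in 2.2.2.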

Diagram For Chatbot

Use Case Name: Interact with Chatbot


Description: This use case describes the interaction between the user and the chatbot within
the application. The chatbot provides various functionalities, including answering questions,
analyzing sentiment, and engaging in conversations.
Actors:
● Primary: User
● Secondary: Chatbot, System
Use Case Diagram
1. User types a question or request in the chat interface.
2. Chatbot receives the input and analyzes it using Natural Language Processing (NLP)
techniques.
3. Chatbot identifies the intent and entities of the user's input.
4. Based on the intent and entities, the chatbot performs various actions:
○ Answering questions: If the intent is to ask a question about Twitter trends, user
analysis, or the project itself, the chatbot retrieves relevant information from the
system and displays it to the user.
○ Analyzing sentiment: If the intent is to analyze the sentiment of specific tweets or
hashtags, the chatbot sends the text to the sentiment analysis module, retrieves
the results, and presents them to the user.
○ Engaging in conversations: If the intent is to have a casual conversation, the
chatbot uses its conversational skills to interact with the user in a natural and
engaging way.
5. Chatbot displays the response or takes other actions, depending on the intent and
entities.
6. User receives the response and can engage further with the chatbot or end the
interaction.
Table:

| Field | Description |
|---|---|
| Use Case Name | Interact with Chatbot |
| Description | ... (as above) |
| Actors | Primary: User, Secondary: Chatbot, System |
| Pre-Condition | User is logged in and has access to the chatbot. |
| Post-Condition | User receives a response from the chatbot or completes their desired action. |
| Basic Flow | 1-6 (as above) |
| Alternate Flow | |
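Steps 2 to 5 of the chatbot flow (detect the intent and entities, then route to the matching action) can be sketched with simple keyword rules. The document does not specify the NLP technique, so the keyword sets, intent names, and the `sentiment_fn` and `trends_fn` callbacks (standing in for the sentiment analysis module and a Twitter trends lookup) are hypothetical.

```python
import re

# Hypothetical keyword rules standing in for a real intent classifier.
INTENT_KEYWORDS = {
    "analyze_sentiment": {"sentiment", "feel", "feeling", "mood", "analyze"},
    "trending": {"trend", "trending", "popular", "topics"},
}


def detect(message: str):
    """Return (intent, entities); entities are the #hashtags and @users mentioned."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    entities = re.findall(r"[#@]\w+", message)
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent, entities
    return "small_talk", entities


def respond(message: str, sentiment_fn, trends_fn) -> str:
    """Route to the sentiment module, the trends source, or a conversational reply."""
    intent, entities = detect(message)
    if intent == "analyze_sentiment" and entities:
        results = ", ".join(f"{e}: {sentiment_fn(e)}" for e in entities)
        return f"Sentiment - {results}"
    if intent == "analyze_sentiment":
        return "Which tweet, #hashtag or @user should I analyze?"
    if intent == "trending":
        return "Trending now: " + ", ".join(trends_fn())
    return "Happy to chat! Ask me about trends or the sentiment of a hashtag."
```

Passing the sentiment and trends functions in as callbacks keeps the chatbot independent of the other modules, mirroring the internal dependency noted in 2.2.5.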

Chapter 4: Design
In this section, we present the design of our modules through the following diagrams:
1. Architecture Diagram
2. ERD with data dictionary
3. Data Flow Diagram
4. Class Diagram
5. Activity Diagram
6. Sequence Diagram
7. Collaboration Diagram
8. State Transition Diagram
9. Component Diagram
10. Deployment Diagram
4.1 Architecture Diagram

Overall Architecture:

The architecture is a pipeline system in which data flows through successive stages of processing
and analysis. Tweets are the main input, and the system outputs sentiment analysis results and spam
classification labels.

Key Components:

● Tweepy API: This component uses Tweepy, a Python client library for the Twitter API, to retrieve
tweets based on specific criteria (e.g., keywords, hashtags).
● Tweet Pre-Processing: This stage cleans and prepares the tweet text for further analysis. It
involves tasks such as:
○ Filtering: Removing irrelevant content such as usernames, URLs, and special characters.
○ Tokenization: Breaking the text down into individual words or phrases.
○ Normalization: Converting words to lowercase and applying stemming or lemmatization (reducing
words to their root form).
● Spam Detection: This stage analyzes the processed tweets to identify potential spam based on
various features and machine learning models.
○ Features: The system might extract features like keywords, hashtags, suspicious links,
unusual posting patterns.
○ Models: Different machine learning models, such as Logistic Regression, Naive Bayes, or
Support Vector Machines, could be used to classify tweets as spam or ham (non-spam).
● Sentiment Analysis: This stage analyzes the processed tweets to determine their emotional tone
(positive, negative, or neutral).
○ Techniques: This could involve lexicon-based analysis using sentiment dictionaries or
supervised machine learning models trained on labeled sentiment data.
● Testing Classifiers: This stage evaluates the performance of the spam detection and sentiment
analysis models using separate testing datasets. Different classifiers are compared to choose the
most accurate ones for deployment.
● Classifier with Highest Accuracy: The chosen classifiers for both spam detection and sentiment
analysis are used to process incoming tweets in the main pipeline.
● Classifying Given Tweet: This final stage applies the chosen classifiers to the processed tweet,
giving it a spam probability score and a sentiment label.
Figure 1 Architecture Diagram
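The Tweet Pre-Processing stage can be sketched with the standard library alone. This is a minimal example under stated assumptions: the stop-word list and regular expressions are illustrative choices, not the project's, and stemming or lemmatization is left out; a library such as NLTK (for example its `PorterStemmer`) could be applied to the resulting tokens.

```python
import re

# Illustrative stop-word list; a real system would likely use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "in", "on", "for", "this"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9#\s]")  # keeps letters, digits, '#' and whitespace

def preprocess(tweet: str) -> list[str]:
    """Normalize, filter and tokenize a single tweet."""
    text = tweet.lower()               # normalization: lowercase
    text = URL_RE.sub(" ", text)       # filtering: remove URLs
    text = MENTION_RE.sub(" ", text)   # filtering: remove @usernames
    text = NON_WORD_RE.sub(" ", text)  # filtering: remove special characters
    tokens = text.split()              # tokenization on whitespace
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("WIN a FREE iPhone!!! Click https://spam.example now @user #giveaway"))
# ['win', 'free', 'iphone', 'click', 'now', '#giveaway']
```

Hashtags are kept on purpose, since the report lists them among the features used for spam detection.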

4.2 ERD with data dictionary


Entities:
● User: This entity represents a user on the platform. Each user is identified by a
unique user ID (userID) that is an integer and serves as the primary key for the
User table. Other attributes of a user include the creation date (creationDate), the
username (unique), email (unique), and password.
● Tweet: This entity represents a tweet posted by a user. Each tweet is identified
by a unique tweet ID (tweetId) that is an integer and serves as the primary key
for the Tweet table. Other attributes of a tweet include creation date (createdAt),
the text of the tweet (tweetText), and a foreign key (userID) that references the
user ID of the user who posted the tweet. This foreign key establishes the
relationship between the User and Tweet entities.

Relationships:

The ERD shows a one-to-many relationship between User and Tweet. This means that
one user can have many tweets, but each tweet belongs to only one user. This
relationship is enforced by the foreign key (userID) in the Tweet table, which references
the primary key (userID) in the User table.

Additional Notes:

● The ERD also specifies the data types for each attribute. For example, userID
and tweetId are integers, while username, email, password, and tweetText are
strings.
● The ERD does not show any constraints on the length of the attributes. However,
there are likely constraints on the lengths of username, email, password, and
tweet text.

Figure 3 ERD
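The User and Tweet entities and their one-to-many relationship can be expressed as a relational schema. The sketch below uses SQLite through Python's `sqlite3` module; the choice of SQLite and of TEXT columns for the dates is an assumption, because the report does not name a database engine, and length constraints are omitted just as they are in the ERD. The sample rows, including the `'hashed-pw'` placeholder, are invented for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE User (
    userID       INTEGER PRIMARY KEY,
    creationDate TEXT    NOT NULL,
    username     TEXT    NOT NULL UNIQUE,
    email        TEXT    NOT NULL UNIQUE,
    password     TEXT    NOT NULL
);
CREATE TABLE Tweet (
    tweetId   INTEGER PRIMARY KEY,
    createdAt TEXT    NOT NULL,
    tweetText TEXT    NOT NULL,
    userID    INTEGER NOT NULL REFERENCES User(userID)  -- one user, many tweets
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)

conn.execute("INSERT INTO User VALUES (1, '2024-01-01', 'alice', 'alice@example.com', 'hashed-pw')")
conn.executemany(
    "INSERT INTO Tweet VALUES (?, ?, ?, ?)",
    [(10, "2024-01-02", "first tweet", 1), (11, "2024-01-03", "second tweet", 1)],
)
count = conn.execute("SELECT COUNT(*) FROM Tweet WHERE userID = 1").fetchone()[0]

try:  # a tweet pointing at a non-existent user violates the foreign key
    conn.execute("INSERT INTO Tweet VALUES (12, '2024-01-04', 'orphan', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(count, orphan_rejected)  # 2 True
```

The rejected insert shows the foreign key enforcing the rule that every tweet belongs to exactly one existing user.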

4.3 Data Flow Diagram


The data flow diagram has two levels.
4.3.1 Level 0
Key Elements:
● External Entities:
○ User: A person who interacts with the system to send or receive tweets.
○ Twitter API: An external service that provides access to Twitter data.
● Process:
○ System: The main system that handles the processing of tweets.
● Data Flows:
○ Enter query or tweet: The user enters a query to search for tweets or
composes a new tweet.
○ Retrieve tweets: The system retrieves tweets from the Twitter API based
on the user's query or sends a new tweet to be posted.
○ Provide tweets: The Twitter API provides the requested tweets to the
system.
○ Display results: The system displays the retrieved tweets or confirms the
successful posting of a new tweet.

Overall Functionality:

● The diagram depicts a system that interacts with Twitter to facilitate the sending
and receiving of tweets.
● Users can either enter a query to search for tweets or compose a new tweet to
be posted.
● The system communicates with the Twitter API to retrieve or send tweets as
needed.
● The retrieved tweets or confirmation of a posted tweet are then displayed to the
user.

Key Points:

● This is a level 0 DFD, which means it provides a high-level overview of the
system's functionality and interactions with external entities.
● It doesn't delve into the detailed processes or data stores within the system.

Figure 4 Level 0 DFD


4.3.2 Level 1

External Entities:
● User: This represents the person interacting with the system, likely entering
queries or composing tweets.
● Twitter API: This is the external service that provides access to Twitter data, such
as retrieving tweets based on specific criteria.

Processes:

● System: This encompasses the main functionalities of the system, further broken
down into subprocesses:
○ Validate User Input: Ensures the user's query or tweet adheres to
formatting requirements and is suitable for processing.
○ Construct API Request: Builds the appropriate request to send to the
Twitter API based on the validated user input.
○ Send Request to Twitter API: Transmits the constructed request to the
Twitter API.
○ Receive Response from Twitter API: Obtains the response containing
tweets from the Twitter API.
○ Preprocess Tweets: Cleans and prepares the received tweets for further
analysis, likely involving removing irrelevant information and tokenizing the
text.
○ Extract Features: Identifies relevant features from the preprocessed
tweets that can be used for analysis, such as n-grams or sentiment-
related word frequencies.
○ Spam Detection: Analyzes the extracted features to classify the tweets as
either spam or non-spam (ham). This might involve machine learning
algorithms trained on labeled data.
○ Sentiment Analysis: Analyzes the extracted features to classify the tweets
as positive, negative, or neutral sentiment. This could also involve
machine learning algorithms trained on labeled data.
○ Store Results: Saves the analyzed tweets and their associated labels
(spam/ham and sentiment) in a persistent storage system for future
reference or analysis.
○ Display Results: Presents the processed and analyzed tweets to the user,
potentially highlighting spam or sentiment classifications.

Data Flows:

● Enter query or tweet (User -> Validate User Input): The user's input is sent for
validation.
● Validated query or tweet (Validate User Input -> Construct API Request): The
validated input is used to create the API request.
● API request (Construct API Request -> Send Request to Twitter API): The
constructed request is sent to the Twitter API.
● Tweets (Twitter API -> Receive Response from Twitter API): The Twitter API
provides tweets in response to the request.
● Received tweets (Receive Response from Twitter API -> Preprocess Tweets):
The received tweets are sent for preprocessing.
● Preprocessed tweets (Preprocess Tweets -> Extract Features): The cleaned
tweets are used for feature extraction.
● Extracted features (Extract Features -> Spam Detection, Sentiment Analysis):
The identified features are used for both spam and sentiment analysis.
● Spam/Sentiment labels (Spam Detection, Sentiment Analysis -> Store Results):
The classification results are stored along with the tweets.
● Analyzed tweets with labels (Store Results -> Display Results): The final
processed and analyzed tweets are presented to the user.

Data Stores:

● (Optional): Consider whether temporary storage is needed for tweets during
processing or a database for storing analyzed results and user preferences.

Figure 5 Level 1 DFD
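The first two level 1 processes, Validate User Input and Construct API Request, can be sketched as plain functions. This is an illustration under assumptions: `SearchRequest`, `validate_query`, `build_request` and the 512-character query limit are hypothetical, while the 10 to 100 page-size range reflects the Twitter API v2 recent-search endpoint.

```python
from dataclasses import dataclass

MAX_QUERY_LENGTH = 512  # assumption: check the limit for the API access level in use

@dataclass
class SearchRequest:
    """Hypothetical request object produced by 'Construct API Request'."""
    query: str
    max_results: int = 10

def validate_query(raw: str) -> str:
    """'Validate User Input': trim whitespace and reject empty or over-long queries."""
    query = raw.strip()
    if not query:
        raise ValueError("query must not be empty")
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(f"query longer than {MAX_QUERY_LENGTH} characters")
    return query

def build_request(raw: str, max_results: int = 10) -> SearchRequest:
    """'Construct API Request': pair the validated query with request options."""
    if not 10 <= max_results <= 100:  # page-size range of the v2 recent-search endpoint
        raise ValueError("max_results must be between 10 and 100")
    return SearchRequest(query=validate_query(raw), max_results=max_results)

request = build_request("  #python lang:en  ", max_results=20)
print(request)
```

With Tweepy, the Send Request step could then pass these fields to `tweepy.Client.search_recent_tweets(query=request.query, max_results=request.max_results)`. Validating first means malformed input is rejected before any network call is made.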

4.4 Class Diagram

Classes:

● User:
○ Attributes:
■ userID (int): A unique identifier for each user.
■ username (String): The user's chosen username.
■ email (String): The user's email address.
■ password (String): The user's password.
○ Methods:
■ createAccount(): Allows a user to create a new account.
■ login(): Allows a user to log in to their existing account.
■ updateProfile(): Allows a user to update their profile information.
● Tweet:
○ Attributes:
■ tweetID (int): A unique identifier for each tweet (assumed, as not
explicitly shown in the diagram).
■ tweetText (String): The text content of the tweet.
■ userID (int): The identifier of the user who posted the tweet (foreign
key).
○ Methods: (Not explicitly shown in the diagram, but would likely include
methods for creating and managing tweets)
● Chatbot:
○ Attributes: (Not explicitly shown in the diagram, but would likely include
attributes related to its conversational abilities and state)
○ Methods: (Not explicitly shown in the diagram, but would likely include
methods for interacting with users and generating responses)

Relationships:

● User - Tweet (One-to-Many): A single user can post multiple tweets, but each
tweet belongs to only one user. This is indicated by the 1..* multiplicity on the
Tweet end of the association.
● User - Chatbot (Interacts With): This association indicates that users can interact
with the chatbot, but the specific nature of the interaction isn't explicitly defined in
the diagram.

Key Points:

● The diagram focuses on the core entities involved in a system that likely involves
user interactions, tweet management, and a chatbot component.
● It highlights the basic attributes and methods of each class, but doesn't provide
details about the chatbot's functionality or the specific interactions between users
and the chatbot.

Figure 6 Class Diagram
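The class diagram can be mirrored in Python dataclasses. This is a sketch, not the project's code: attribute and method names follow the diagram's camelCase, while `postTweet`, the `tweets` list and the whole of `Chatbot` are hypothetical, since the diagram does not specify the chatbot's attributes or methods. Passwords are kept as plain strings only to match the diagram.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Tweet:
    tweetID: int      # assumed identifier (not drawn explicitly in the diagram)
    tweetText: str
    userID: int       # foreign key: the User who posted this tweet

@dataclass
class User:
    userID: int
    username: str
    email: str
    password: str     # illustration only; a real system would store a salted hash
    tweets: List[Tweet] = field(default_factory=list)  # one user -> many tweets

    @classmethod
    def createAccount(cls, userID: int, username: str, email: str, password: str) -> "User":
        return cls(userID=userID, username=username, email=email, password=password)

    def login(self, password: str) -> bool:
        return password == self.password

    def updateProfile(self, username: Optional[str] = None, email: Optional[str] = None) -> None:
        if username is not None:
            self.username = username
        if email is not None:
            self.email = email

    def postTweet(self, tweetID: int, text: str) -> Tweet:
        """Hypothetical helper: creates a Tweet whose userID points back at this user."""
        tweet = Tweet(tweetID=tweetID, tweetText=text, userID=self.userID)
        self.tweets.append(tweet)
        return tweet

@dataclass
class Chatbot:
    """Attributes and methods are not specified in the diagram; these are hypothetical."""
    history: List[Tuple[str, str]] = field(default_factory=list)

    def respond(self, user: User, message: str) -> str:
        reply = f"Hello {user.username}, you asked: {message}"
        self.history.append((message, reply))
        return reply
```

Creating tweets only through `postTweet` keeps both sides of the one-to-many link consistent: the tweet stores its owner's `userID`, and the owner's `tweets` list contains the tweet.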


4.5 Activity Diagram

Here are the steps involved:

1. Retrieve tweets from Twitter API: This is the first step, where the system uses the
Twitter API to fetch tweets based on a specified query or criteria.
2. Preprocess tweets (clean, tokenize): The retrieved tweets are then cleaned and
preprocessed. This may involve removing irrelevant information such as stop
words, punctuation, and URLs, as well as tokenizing the text into individual words
or phrases.
3. Analyze sentiment using a trained model: The preprocessed tweets are then fed
into a sentiment analysis model, which classifies them as positive, negative, or
neutral based on their emotional tone.
4. Classify tweets as spam or ham: Based on the sentiment analysis results, the
tweets are classified as either spam or ham (non-spam).
5. Filter out spam tweets: If a tweet is classified as spam, it is filtered out and not
displayed.
6. Display tweet as ham: If a tweet is classified as ham, it is displayed.

The activity diagram also shows two alternative paths for displaying tweets:

● Display sentiment analysis results: If the user wants to see the sentiment
analysis results for a particular tweet, they can click on a button to display them.
● Display tweet as ham: If the user does not want to see the sentiment analysis
results, they can simply click on the tweet to display it.

Overall, this activity diagram provides a good overview of the process of using
sentiment analysis to filter out spam tweets from Twitter.
Figure 7 Activity Diagram Create Account

4.6 Sequence Diagram

Participants:
● User: A person who interacts with the system to search for tweets.
● System: The main system that handles the retrieval and display of tweets.
● Chatbot: A component within the system that interacts with the user and
potentially assists with tweet retrieval.

Interactions:

1. User initiates search:
○ The user starts the process by entering a search query.
2. System retrieves tweets:
○ The system, likely in response to the user's query, fetches relevant tweets
from the Twitter API (although this specific interaction isn't explicitly shown
in the diagram).
3. Chatbot receives tweets (optional):
○ The retrieved tweets might be passed to the chatbot, potentially for
filtering, analysis, or other processing (this is indicated by the dashed
arrow, suggesting it's optional or conditional).
4. Chatbot returns tweets (optional):
○ If the chatbot was involved, it would return the processed tweets back to
the system for display.
5. System displays tweets:
○ The system presents the final set of tweets to the user, enabling them to
view the search results.

Key Points:

● The diagram highlights the primary flow of interactions for a tweet search
scenario.
● It suggests potential chatbot involvement in tweet processing, but leaves the
exact nature of that involvement open for interpretation.

Figure 8 Sequence Diagram Create Account


4.7 Collaboration Diagram

Here's how it works:

1. Tweet retrieval: The process starts with the retrieval of tweets, either through a
user query or a continuous stream. This could involve interacting with the Twitter
API to fetch relevant tweets based on specific criteria.
2. Preprocessing: The retrieved tweets are then preprocessed to prepare them for
further analysis. This might involve cleaning the text by removing unnecessary
characters, punctuation, and stop words. Additionally, tokenization might occur,
where the tweet is broken down into individual words or phrases.
3. Sentiment analysis: The preprocessed tweets are then sent to the sentiment
analyzer. This component analyzes the emotional tone of the text and classifies it
as positive, negative, or neutral.
4. Spam detection: Simultaneously, the tweets are also passed to the spam
detector. This component utilizes various techniques to identify tweets that are
likely to be spam, such as analyzing the content for suspicious keywords,
patterns, or links.
5. Collaboration and decision: The sentiment analysis results and the spam
detection outcome are then combined to make a final decision about the tweet.
This could involve:
○ Displaying the tweet: If the tweet is classified as non-spam and has a
neutral or positive sentiment, it might be directly displayed to the user.
○ Flagging or filtering: If the tweet is classified as spam or has a negative
sentiment, it might be flagged for further review or filtered out from the
results.
○ Chatbot intervention: Depending on the specific system design, the
chatbot might intervene in certain cases. For example, it could interact
with the user to clarify the intent of a negative tweet or provide additional
information about a flagged tweet.

Figure 9 Collaboration Diagram
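The "collaboration and decision" step combines the two classifier outputs into one action. The section leaves the exact policy open ("might be flagged ... or filtered out"), so the rule below is an assumption: spam is always filtered, and non-spam tweets with negative sentiment are flagged for review. `Action` and `decide` are hypothetical names.

```python
from enum import Enum

class Action(Enum):
    DISPLAY = "display"
    FLAG = "flag"
    FILTER = "filter"

def decide(is_spam: bool, sentiment: str) -> Action:
    """Combine the spam detector's and sentiment analyzer's outputs into one action."""
    if is_spam:
        return Action.FILTER   # assumed policy: spam never reaches the user
    if sentiment == "negative":
        return Action.FLAG     # assumed policy: negative non-spam is flagged for review
    return Action.DISPLAY      # non-spam with neutral or positive sentiment

for text, spam, mood in [
    ("Free followers!!! Click here", True, "positive"),
    ("Worst update ever", False, "negative"),
    ("Great talk today", False, "positive"),
]:
    print(f"{text!r} -> {decide(spam, mood).value}")
```

Checking spam first means a spam tweet is filtered regardless of its sentiment. Chatbot intervention could be attached to the `FLAG` branch.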

4.8 State Transition Diagram


● States: The diagram shows several states, represented by rounded boxes, but
only two of them are labeled: "Start" and "End Chat." This suggests that the
diagram focuses on the beginning and ending of a chatbot conversation.
● Transitions: Arrows connect the states, indicating possible transitions between
them. Unfortunately, none of the transitions are labeled in the diagram, making it
unclear what triggers the movement from one state to another.
● Missing information: To fully understand the chatbot's behavior, we would need
more information about the unlabeled states and transitions. This could include:
○ Labels for the states: What actions or activities do each state represent in
the conversation flow?
○ Labels for the transitions: What user inputs or system actions trigger the
transitions between states?
○ Additional states: Are there any other relevant states, such as
intermediate conversation stages or error handling scenarios, that are not
shown in the diagram?

Figure 10 State Transition Diagram
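A chatbot conversation like this one can be modeled as a transition table keyed by (state, event). Only "Start" and "End Chat" come from the diagram; the intermediate states and the event names below are hypothetical placeholders for the unlabeled parts.

```python
# Hypothetical states and events: only START and END_CHAT come from the diagram.
TRANSITIONS = {
    ("START", "open_chat"): "AWAITING_INPUT",
    ("AWAITING_INPUT", "user_message"): "PROCESSING",
    ("PROCESSING", "reply_sent"): "AWAITING_INPUT",
    ("PROCESSING", "error"): "AWAITING_INPUT",  # recover and wait for the next message
    ("AWAITING_INPUT", "quit"): "END_CHAT",
}

class ChatSession:
    def __init__(self) -> None:
        self.state = "START"

    def fire(self, event: str) -> str:
        """Apply one event; (state, event) pairs missing from the table are rejected."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state}")
        self.state = TRANSITIONS[key]
        return self.state

session = ChatSession()
for event in ["open_chat", "user_message", "reply_sent", "quit"]:
    session.fire(event)
print(session.state)  # END_CHAT
```

An explicit table answers the questions the diagram leaves open, namely which events trigger which transitions. It also makes an error-handling path such as `error` easy to add.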


4.9 Component Diagram

Components:

● Twitter API: This component interacts with the Twitter API to retrieve tweets
based on a specified query or criteria.
● Preprocessor: This component cleans and preprocesses the retrieved tweets.
This might involve removing irrelevant information such as stop words,
punctuation, URLs, and usernames, as well as tokenizing the text into individual
words or phrases.
● Feature Extractor: This component extracts relevant features from the
preprocessed tweets. These features could be linguistic features like n-grams or
sentiment-related features like word frequency of positive and negative words.
● Spam Detector: This component uses the extracted features to classify tweets as
spam or ham (non-spam). It might employ machine learning algorithms like Naive
Bayes or Support Vector Machines trained on labeled data to make these
predictions.
● Sentiment Analyzer: This component analyzes the sentiment of the tweets,
classifying them as positive, negative, or neutral. It could also use machine
learning algorithms trained on labeled data to perform this task.
● Persistence Store: This component stores the analyzed tweets and their
associated labels (spam/ham and sentiment) for future use or analysis.
● Visualization Tool: This component displays the results of the analysis in a user-
friendly format, such as charts or graphs. This could allow users to see trends in
spam and sentiment over time or for specific topics.

Interactions:

1. Tweets retrieved: The Twitter API retrieves tweets based on the user's query or
criteria.
2. Preprocessing and feature extraction: The tweets are preprocessed and relevant
features are extracted.
3. Spam detection: The features are used by the spam detector to classify the
tweets as spam or ham.
4. Sentiment analysis: The features are also used by the sentiment analyzer to
classify the tweets as positive, negative, or neutral.
5. Persistence: The analyzed tweets and their labels are stored in the persistence
store.
6. Visualization: The visualization tool retrieves the stored data and displays the
results to the user.

Figure 11 Component Diagram
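The Feature Extractor component is described as producing n-grams and counts of positive and negative words. Below is a minimal sketch that assumes the input is an already preprocessed token list. The word lists and the `__pos_count__`/`__neg_count__` feature names are hypothetical.

```python
from collections import Counter

# Tiny illustrative word lists; a real system would use a published sentiment lexicon.
POSITIVE_WORDS = {"good", "great", "love", "happy"}
NEGATIVE_WORDS = {"bad", "hate", "awful", "sad"}

def ngrams(tokens: list[str], n: int) -> list[str]:
    """Contiguous n-grams joined by spaces, e.g. ['free', 'iphone'] -> 'free iphone'."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_features(tokens: list[str]) -> Counter:
    """Bag of unigrams and bigrams plus counts of positive and negative words."""
    features = Counter(ngrams(tokens, 1) + ngrams(tokens, 2))
    features["__pos_count__"] = sum(t in POSITIVE_WORDS for t in tokens)
    features["__neg_count__"] = sum(t in NEGATIVE_WORDS for t in tokens)
    return features

print(extract_features(["love", "free", "iphone", "love"]))
```

Both the Spam Detector and the Sentiment Analyzer can consume the same feature dictionary, which matches the component diagram: one extractor feeds both classifiers.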

4.10 Deployment Diagram

Components:

● Twitter Streaming API: This component continuously retrieves tweets from
Twitter in real-time, based on a specified query or criteria.
● Kafka: This component acts as a distributed streaming platform that buffers and
distributes the incoming tweets from Twitter to different processing units.
● Spark Streaming: This component performs real-time analysis on the stream of
tweets received from Kafka. It likely involves tasks like preprocessing, feature
extraction, and sentiment analysis.
● Spam Detector: This component analyzes the features extracted from the tweets
and classifies them as spam or ham (non-spam).
● Sentiment Analyzer: This component analyzes the sentiment of the tweets,
classifying them as positive, negative, or neutral.
● Elasticsearch: This component serves as a search and analytics engine that
stores the analyzed tweets and their associated labels (spam/ham and
sentiment) for further analysis and visualization.
● Kibana: This component is a visualization tool that allows users to explore and
visualize the stored data in Elasticsearch. It could allow users to see trends in
spam and sentiment over time or for specific topics.

Interactions:

1. Tweets retrieved: The Twitter Streaming API retrieves tweets based on the user's
query or criteria.
2. Streaming to Kafka: The tweets are streamed to Kafka, which buffers and
distributes them to the Spark Streaming component for real-time processing.
3. Real-time analysis: Spark Streaming performs real-time analysis on the tweets,
including preprocessing, feature extraction, and sentiment analysis.
4. Spam detection and Sentiment analysis: The extracted features are used by the
Spam Detector and Sentiment Analyzer to classify the tweets as spam/ham and
positive/negative/neutral, respectively.
5. Persistence and visualization: The analyzed tweets and their labels are stored in
Elasticsearch and visualized using Kibana.

Figure 12 Deployment Diagram



Appendix
A. Dataset Description:

The project utilized a diverse dataset comprising tweets sourced from the Twitter API.

The dataset included a mix of spam and legitimate tweets to ensure a representative
training and testing set for the developed algorithms.

Details on the preprocessing steps undertaken, including tokenization, stemming, and
removal of stop words, are documented for transparency.

B. Machine Learning Models:

A detailed overview of the machine learning models employed for spam detection,
including but not limited to Naive Bayes, Support Vector Machines (SVM), and neural
networks.

The rationale behind the selection of each model, hyperparameter tuning, and validation
strategies are discussed.
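One of the listed models, Naive Bayes, is simple enough to show in full. Below is a from-scratch multinomial Naive Bayes with Laplace (add-one) smoothing, written only to expose the mechanics. In practice a library class such as scikit-learn's `MultinomialNB` would normally be used. The four training tweets are invented toy data, not the project's dataset.

```python
import math
from collections import Counter

class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing over token lists."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MultinomialNaiveBayes":
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior, self.word_counts, self.total_words = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(tok for d in class_docs for tok in d)
            self.word_counts[c] = counts
            self.total_words[c] = sum(counts.values())
        return self

    def predict(self, doc: list[str]) -> str:
        v = len(self.vocab)

        def log_posterior(c: str) -> float:
            score = self.log_prior[c]
            for tok in doc:
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            return score

        return max(self.classes, key=log_posterior)

# Invented toy data for illustration only.
docs = [["win", "free", "iphone", "click"], ["free", "money", "click", "now"],
        ["meeting", "at", "noon", "today"], ["great", "talk", "today"]]
labels = ["spam", "spam", "ham", "ham"]
model = MultinomialNaiveBayes().fit(docs, labels)
print(model.predict(["click", "free", "prize"]))  # spam
```

Working in log space avoids numeric underflow when a tweet has many tokens. Add-one smoothing prevents a single word that is unseen for one class from zeroing out that class's probability.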

C. Natural Language Processing (NLP) Techniques:

The NLP techniques applied in sentiment analysis, such as sentiment lexicons, word
embeddings, and deep learning architectures.

An exploration of how these techniques were adapted to handle the nuances of social
media language, slang, and emojis.
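Lexicon-based sentiment scoring, adapted to social media text, can be sketched as summing per-token polarity scores over a lexicon that includes slang and emojis, with simple negation handling. The lexicon, scores, negator list and threshold are all hypothetical. The sketch assumes a tokenizer that keeps emojis as separate tokens.

```python
# Hypothetical mini-lexicon mixing words, slang and emojis with polarity scores.
LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0, "lit": 1.5,  # "lit": slang for excellent
    "bad": -1.0, "awful": -2.0, "hate": -2.0, "meh": -0.5,
    "😀": 1.5, "😍": 2.0, "😡": -2.0, "😢": -1.5,
}
NEGATORS = {"not", "never", "no"}

def lexicon_sentiment(tokens: list[str], threshold: float = 0.5) -> str:
    """Sum lexicon scores, flipping the sign of a token that directly follows a negator."""
    score = 0.0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

print(lexicon_sentiment(["this", "concert", "is", "lit", "😀"]))  # positive
print(lexicon_sentiment(["not", "good"]))                        # negative
```

A lexicon needs no labeled training data, but it misses sarcasm and context. This is one reason the appendix also mentions word embeddings and deep learning architectures.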

D. Feature Engineering:

Explanation of the key features used for spam detection, such as word frequency, user
engagement metrics, and time-based features.

The incorporation of sentiment-related features for sentiment analysis, including
sentiment lexicon scores and sentiment polarity.

E. System Architecture:

Overview of the system architecture, including the data flow, processing pipeline, and
integration points with external APIs or tools.

Details on the technology stack used, highlighting any specific frameworks, libraries, or
platforms that played a pivotal role in the project.

F. User Interface Design:

Screenshots or wireframes of the user interface, providing a visual representation of how
end-users interact with the system.

Description of the user interface components and functionalities, emphasizing the user-
friendly aspects and design considerations.

G. Evaluation Metrics:

A comprehensive list of metrics used to evaluate the performance of the spam detection
and sentiment analysis models.

Discussion on the choice of metrics and their relevance in the context of social media
analysis.
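The specific metrics are not named in this appendix. Accuracy, precision, recall and F1 are common choices for spam detection. Precision measures how many flagged tweets really are spam, and recall measures how much of the spam was caught. Both matter because spam is usually a minority class, so a high accuracy alone can hide poor spam detection. A minimal sketch follows, with invented labels in the example.

```python
def evaluate(y_true: list[str], y_pred: list[str], positive: str = "spam") -> dict[str, float]:
    """Accuracy, precision, recall and F1, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("no predictions to evaluate")
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham wrongly flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "f1": f1}

y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
print(evaluate(y_true, y_pred))
```

Selecting the "classifier with highest accuracy" described in Chapter 4 amounts to running each candidate on the same held-out test set and comparing these numbers.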

H. Ethical Considerations:

Reflection on the ethical considerations taken into account during the project, particularly
regarding privacy, bias, and the responsible use of user-generated content.
