
A PROJECT REPORT

on

“TWITTER SENTIMENT ANALYSIS”

Submitted to

KIIT Deemed to be University

In Partial Fulfillment of the Requirement for the Award of

BACHELOR’S DEGREE IN INFORMATION TECHNOLOGY

BY
ABHISHEK PANI 21051192
SHIRSHAK PATTNAIK 21052360
ABHIJEET PANI 21052552
BARENYA NAYAK 21052577
CHANDRAKANTA MEHER 21052580

UNDER THE GUIDANCE OF


Mr. Sourav Kumar Giri
KIIT Deemed to be University
School of Computer Engineering
Bhubaneswar, ODISHA 751024

CERTIFICATE
This is to certify that the project entitled

“TWITTER SENTIMENT ANALYSIS”


submitted by
ABHISHEK PANI 21051192
SHIRSHAK PATTNAIK 21052360
ABHIJEET PANI 21052552
BARENYA NAYAK 21052577
CHANDRAKANTA MEHER 21052580

is a record of bona fide work carried out by them in partial fulfillment of the
requirements for the award of the Degree of Bachelor of Engineering (Computer Science
& Engineering) at KIIT Deemed to be University, Bhubaneswar. This work was done
during the year 2023-2024 under our guidance.

Date: 31/03/24

Project Guide:
Mr. Sourav Kumar Giri
Acknowledgements

We would like to express our sincere gratitude to all those who
have contributed to the completion of this project on Twitter
sentiment analysis. Firstly, we extend our deepest appreciation to
our supervisor, Ms. Sarita Mishra, whose guidance, support,
and expertise have been instrumental throughout this journey.

We are immensely grateful for the valuable insights,
encouragement, and constructive feedback provided by Ms. Sarita
Mishra, which contributed significantly to the refinement and
success of our project. Her dedication to our learning and growth
has been truly inspiring and has enriched our understanding and
implementation of various techniques and methodologies for
sentiment analysis on Twitter data.

This project has been a journey of learning, growth, and
collaboration, and we are grateful to everyone who has been a part
of it.

We are so thankful for your exceptional guidance and support.

ABHISHEK PANI 21051192
RITIKESH KUMAR 21052019
SHIRSHAK PATTNAIK 21052360
ABHIJEET PANI 21052552
BARENYA NAYAK 21052577
CHANDRAKANTA MEHER 21052580
ABSTRACT

Twitter sentiment analysis has emerged as a crucial tool for
understanding public opinion and sentiment towards various topics,
products, and events. In this study, we employed two prominent machine
learning algorithms, Naive Bayes and Logistic Regression, to analyze
sentiment in Twitter data.

The dataset comprised a diverse range of tweets collected from the
Twitter platform, covering a wide array of topics and themes.
Preprocessing techniques including tokenization, stop-word removal,
and stemming were applied to clean and prepare the data for analysis.

Naive Bayes and Logistic Regression models were implemented to
classify tweets into positive, negative, or neutral sentiment categories.
These models were trained on labeled data, using features extracted
from the text of the tweets, such as word frequencies and n-grams.

Performance evaluation was conducted using metrics including accuracy,
precision, recall, and F1-score. The results demonstrated the
effectiveness of both Naive Bayes and Logistic Regression in accurately
classifying sentiment in Twitter data, with Logistic Regression slightly
outperforming Naive Bayes in terms of overall accuracy.

Additionally, feature importance analysis provided insights into the
significant words and phrases contributing to sentiment classification,
aiding in the interpretation of model predictions.

Overall, this study showcases the utility of Naive Bayes and Logistic
Regression models in Twitter sentiment analysis, highlighting their
potential applications in diverse domains such as marketing, public
opinion research, and brand management.

Keywords: Sentiment Analysis, Machine Learning, Naive Bayes Classifier,
Logistic Regression Model, Twitter Data Analysis
Contents

1 Introduction 1
2 Basic Concepts/ Literature Review 2
2.1 Logistic Regression 2
2.2 Naive Bayes Classifier 2
2.3 Preprocessing Techniques 2
2.4 Literature Review 2
3 Problem Statement / Requirement Specifications 3
3.1 Problem Statement 3
3.2 Project Planning 3
3.3 Project Analysis 3
3.4 System Design 3
3.4.1 Design Constraints 3
3.4.2 System Architecture / Block Diagram 3
4 Implementation 4
4.1 Methodology / Proposal 4
4.2 Testing / Verification Plan 4
4.3 Result Analysis / Screenshots 4
4.4 Quality Assurance 4
5 Standards Adopted 5
5.1 Design Standards 5
5.2 Coding Standards 5
5.3 Testing Standards 5
6 Conclusion and Future Scope 6
6.1 Conclusion 6
6.2 Future Scope 6
Individual Contribution 8
Plagiarism Report 9
List of Figures
Twitter Sentiment Analysis

Chapter 1

Introduction
Millions of people use Twitter to express emotions such as happiness,
sadness, and anger. Sentiment analysis is concerned with detecting these
emotions, together with opinions, assessments, and attitudes, as indicators
of how people think. It classifies emotions into classes such as positive or
negative. Nowadays, industries are interested in using textual data for
sentiment analysis to extract people's views about their products and
services. Sentiment analysis helps them gauge customer satisfaction and
improve their services accordingly. To obtain text data, they often extract
it from social media platforms. Many social media sites, such as Google Plus,
Facebook, and Twitter, allow users to express opinions, views, and emotions
about certain topics and events. The microblogging site Twitter is expanding
rapidly among online social networking sites, with about 200 million users.
Twitter was founded in 2006 and is currently the most famous microblogging
platform. In 2017, 2 million users shared 8.3 million tweets in one hour.
Twitter users post their thoughts, emotions, and messages on their profiles
as short posts called tweets. A single tweet was originally limited to 140
characters; the limit was raised to 280 characters in 2017. Twitter sentiment
analysis is based on the field of natural language processing (NLP). For
tweet text, we use NLP techniques such as tokenizing the words and removing
stop words like I, me, my, our, your, is, was, etc. NLP is also used to
preprocess the data, for example by cleaning the text and removing special
characters and punctuation marks. Sentiment analysis is important because it
reveals trends in people's emotions on specific topics through their tweets.

Figure 1.1: Twitter Sentiment Analysis
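The tokenizing and stop-word removal steps mentioned above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's code: the stop-word set contains only the examples named in the text (an assumption), and the regex tokenizer is a simplification of what libraries such as NLTK provide.

```python
import re

# Illustrative stop-word set: only the examples named in the text.
# Real projects typically use a much longer list (e.g. NLTK's English stop words).
STOP_WORDS = {"i", "me", "my", "our", "your", "is", "was"}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(tokens: list[str]) -> list[str]:
    """Keep only tokens that are not in the stop-word set."""
    return [t for t in tokens if t not in STOP_WORDS]

if __name__ == "__main__":
    tokens = tokenize("My phone was great, I love it!")
    print(tokens)                     # ['my', 'phone', 'was', 'great', 'i', 'love', 'it']
    print(remove_stop_words(tokens))  # ['phone', 'great', 'love', 'it']
```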

School of Computer Engineering, KIIT, BBSR 1


Twitter Sentiment Analysis

Chapter 2
Basic Concepts/ Literature Review
This chapter provides an overview of the basic concepts, tools, and techniques
employed in the Twitter sentiment analysis project. It also includes a
literature review that places the project within the existing body of research.

2.1 Logistic Regression

Logistic regression is a statistical method used for binary classification tasks.
It estimates the probability of the positive class by passing a weighted sum of
the input features through the logistic (sigmoid) function. In the context of
sentiment analysis, logistic regression models are trained to predict the
sentiment of tweets as either positive or negative. This project explores the
application of logistic regression for sentiment classification, leveraging its
simplicity and interpretability.
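To make the idea concrete, here is a minimal from-scratch sketch of binary logistic regression trained by batch gradient descent, using only the Python standard library. The learning rate, epoch count, and the two-word toy vocabulary ("good", "bad") are assumptions chosen for illustration; the project lists scikit-learn as its machine learning library, which would normally be used instead of hand-written training code.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """P(positive | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the average log-loss.

    X: list of feature vectors (e.g. word counts); y: list of 0/1 labels.
    """
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # For log-loss, the gradient w.r.t. the score is (predicted - actual).
            err = predict_proba(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

if __name__ == "__main__":
    # Toy word counts for the hypothetical vocabulary ["good", "bad"]; 1 = positive.
    X = [[2, 0], [1, 0], [0, 1], [0, 2]]
    y = [1, 1, 0, 0]
    w, b = train_logistic_regression(X, y)
    print(predict_proba(w, b, [1, 0]))  # > 0.5, i.e. classified positive
    print(predict_proba(w, b, [0, 1]))  # < 0.5, i.e. classified negative
```

The learned weights are directly interpretable: a positive weight on a word pushes tweets containing it toward the positive class, which is the interpretability the section refers to.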

2.2 Naive Bayes Classifier

The Naive Bayes classifier is a probabilistic model based on Bayes' theorem with
the assumption that features are conditionally independent given the class. It is
widely used in text classification tasks, including sentiment analysis. In this
project, Naive Bayes classification techniques have been implemented to categorize
tweets into positive or negative sentiment classes.
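A minimal sketch of multinomial Naive Bayes with Laplace (add-one) smoothing, written from scratch with the standard library, shows how Bayes' theorem and the independence assumption turn into a sum of log-probabilities. Using raw word counts as features and skipping unseen words at prediction time are assumptions for this illustration; scikit-learn's `MultinomialNB` implements the same kind of model.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate log-priors and Laplace-smoothed log-likelihoods per class.

    docs: list of token lists; labels: list of class names.
    """
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> token frequencies
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {t: math.log((word_counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_likelihood, vocab

def predict_naive_bayes(model, doc):
    """Return the class maximising log P(c) + sum of log P(token | c)."""
    log_prior, log_likelihood, vocab = model
    scores = {
        # Tokens never seen in training are skipped (a common simplification).
        c: log_prior[c] + sum(log_likelihood[c][t] for t in doc if t in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)

if __name__ == "__main__":
    docs = [["love", "great"], ["great", "happy"], ["hate", "awful"], ["awful", "sad"]]
    labels = ["positive", "positive", "negative", "negative"]
    model = train_naive_bayes(docs, labels)
    print(predict_naive_bayes(model, ["great", "love"]))  # positive
    print(predict_naive_bayes(model, ["awful"]))          # negative
```

Working in log space avoids the numerical underflow that multiplying many small probabilities would cause, and smoothing keeps a word absent from one class from forcing that class's probability to zero.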

2.3 Preprocessing Techniques

Preprocessing plays a crucial role in enhancing the effectiveness of sentiment
analysis models. Techniques such as stemming, which involves reducing words to
their base or root form, and regular expression (regex) for pattern matching and
text manipulation have been employed to clean and standardize the text data.
Additionally, techniques like removing special characters, converting text to
lowercase, and eliminating stop words have been utilized to improve the quality
of input data for classification.
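The cleaning steps listed above can be sketched with Python's built-in `re` module. The URL and @mention patterns, the stop-word set, and the `crude_stem` helper are illustrative assumptions, not the project's actual rules. In particular, `crude_stem` is a deliberately simplified suffix stripper and not the Porter stemmer; a real pipeline would more likely use NLTK's `PorterStemmer`.

```python
import re

# Small illustrative stop-word set (an assumption; the report does not list its own).
STOP_WORDS = {"a", "an", "the", "is", "was", "i", "me", "my", "and", "to", "this"}

def crude_stem(word: str) -> str:
    """Strip one common English suffix if a root of 3+ letters remains.

    Simplified stand-in for a real stemmer such as Porter's algorithm.
    """
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(tweet: str) -> list[str]:
    """Lowercase, strip URLs/mentions/non-letters, drop stop words, then stem."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)                   # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)               # remove digits, punctuation, '#'
    return [crude_stem(t) for t in text.split() if t not in STOP_WORDS]

if __name__ == "__main__":
    print(preprocess("Enjoying the updates!!! @TechCo #happy https://t.co/xyz"))
    # ['enjoy', 'update', 'happy']
```

Only the `#` symbol is removed from hashtags, so the hashtag word itself survives; hashtags often carry sentiment and discarding them entirely would lose signal.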

2.4 Literature Review

Several studies in the field of sentiment analysis have explored the effectiveness
of different machine learning algorithms and preprocessing techniques. Research
has shown that logistic regression and Naive Bayes classifiers are popular choices
for sentiment analysis tasks due to their simplicity, efficiency, and good
performance with text data. Furthermore, the use of preprocessing techniques
such as stemming and regex has been demonstrated to improve the accuracy and
robustness of sentiment analysis models.

Overall, the literature review underscores the importance of logistic regression,
Naive Bayes classifiers, and preprocessing methods in the domain of sentiment
analysis, providing valuable insights for the implementation and evaluation of the
project.


Chapter 3
Problem Statement / Requirement Specifications
3.1 Problem Statement

The objective of this project is to conduct sentiment analysis on Twitter data
to extract insights regarding the sentiment trends associated with specific
topics or events. The problem statement revolves around the need to develop
a system capable of automatically categorizing tweets into positive or
negative sentiments. This categorization will enable users to understand
public opinion and sentiment dynamics regarding various subjects, such as
products, events, or social issues, based on Twitter data.

3.2 Project Planning

The project planning phase involves the following steps:

● Identify the scope and objectives of the sentiment analysis project.
● Define the target audience and stakeholders.
● Gather requirements for data collection, preprocessing, analysis, and
visualization.
● Create a timeline and milestones for project execution.
● Allocate resources, including personnel and computing infrastructure.
● Develop a contingency plan to address potential risks and challenges.

3.3 Project Analysis

After defining the problem statement and collecting requirements, the
project analysis phase focuses on:

● Reviewing and refining the problem statement to ensure clarity and
specificity.
● Conducting a feasibility study to assess the technical, economic, and
operational viability of the project.
● Analyzing the requirements to identify any ambiguities,
inconsistencies, or missing details.
● Collaborating with stakeholders to validate and prioritize
requirements.

3.4 System Design

3.4.1 Design Constraints

The project will utilize the following working environment and tools:

● Software: Python programming language, libraries such as scikit-learn
for machine learning, NLTK for natural language processing, and pandas for
data manipulation.

● Hardware: Standard computing hardware with sufficient processing
power and memory to handle data preprocessing and model training tasks.

● Experimental Setup: The sentiment analysis models will be trained
and evaluated using historical Twitter data collected through the Twitter API
or publicly available datasets.

3.4.2 System Architecture or Block Diagram

The diagram outlines the sequential steps involved in the sentiment analysis
process, including data collection, preprocessing, feature extraction, model
training, and sentiment classification. Each component interacts with the
next in a cohesive pipeline to achieve the overall objective of sentiment
analysis on Twitter data.


Chapter 4

Implementation
4.1 Methodology or Proposal

The implementation of the sentiment analysis project involved the following
methodology:

● Data Collection: Twitter data was collected using the Twitter API
based on specific search queries or hashtags related to the topics of interest.

● Data Preprocessing: The collected tweets were preprocessed to
remove noise and standardize the text data. Preprocessing steps included
tokenization, stemming, removal of stop words, and handling of special
characters.

● Feature Extraction: Features such as word frequency, TF-IDF scores,
and n-gram representations were extracted from the preprocessed text data
to represent the tweets numerically.

● Model Selection and Training: Logistic Regression and Naive Bayes
classifiers were selected as the machine learning models for sentiment
analysis. These models were trained using the extracted features to classify
tweets into positive, negative, or neutral sentiments.

● Evaluation: The performance of the trained models was evaluated
using metrics such as accuracy, precision, recall, and F1-score on a separate
test dataset.
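The feature-extraction step can be illustrated with a small standard-library sketch that builds unigram and bigram features and weights them by TF-IDF. The specific formula is an assumption: it uses raw counts for term frequency and the unsmoothed `idf = ln(N / df) + 1` variant (the form scikit-learn's `TfidfVectorizer` uses when `smooth_idf=False`), with no vector normalisation.

```python
import math
from collections import Counter

def ngrams(tokens, n_max=2):
    """Return all 1..n_max-grams of a token list, joined with spaces."""
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams

def tfidf(docs, n_max=2):
    """Compute a {term: weight} dict for each tokenised document.

    tf  = raw count of the term in the document
    idf = ln(N / df) + 1, where df counts documents containing the term
    """
    grams_per_doc = [Counter(ngrams(doc, n_max)) for doc in docs]
    n_docs = len(docs)
    # Iterating a Counter yields each distinct term once, so df counts documents.
    df = Counter(term for grams in grams_per_doc for term in grams)
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in grams.items()} for grams in grams_per_doc]

if __name__ == "__main__":
    weights = tfidf([["not", "good"], ["very", "good"]])
    print(weights[0])  # 'good' gets weight 1.0; 'not' and 'not good' get ln(2) + 1
```

The bigram "not good" becomes its own feature, which is one reason n-grams help with negated phrases that unigram counts alone would misread as positive.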

4.2 Testing or Verification Plan

In this section, we outline the testing or verification plan for evaluating the
sentiment analysis models developed as part of the project. The test case
titled "Model Evaluation" focuses on assessing the performance of the
trained models in predicting sentiment for tweets in a designated test
dataset.

● Test Condition: Trained sentiment analysis models loaded with test
data.
To conduct the evaluation, the sentiment analysis models will be loaded
with a separate test dataset. This dataset contains a representative sample
of tweets covering various topics and sentiments, ensuring a comprehensive
assessment of the models' performance.

● System Behavior: The system will predict sentiment for each tweet
in the test dataset using the loaded models.
During the evaluation process, the sentiment analysis models will analyze
the text of each tweet in the test dataset and classify it into one of the
predefined sentiment categories: positive, negative, or neutral. This
behavior represents the core functionality of the models, which is to
accurately categorize tweets based on their sentiment.

● Expected Result: Classification accuracy and other performance
metrics, such as precision, recall, and F1-score, will be calculated to evaluate
the effectiveness of the sentiment analysis models.
Upon completion of the evaluation, we anticipate obtaining metrics such as
classification accuracy, precision, recall, and F1-score. These metrics will
provide insights into the models' ability to correctly classify tweets
according to their sentiment. Additionally, the confusion matrix generated
during the evaluation will offer a detailed breakdown of true positive, true
negative, false positive, and false negative predictions, facilitating a
comprehensive assessment of the models' performance across different
sentiment classes.
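With three classes, true/false positives and negatives are defined per class in a one-vs-rest sense. The sketch below builds a confusion matrix and derives accuracy and per-class precision, recall, and F1 from it, using only the standard library. The label names and the tiny prediction lists are hypothetical; scikit-learn's `sklearn.metrics.confusion_matrix` and `classification_report` compute the same quantities, and this version only makes the arithmetic explicit.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def report_from_matrix(matrix, labels):
    """Derive accuracy and per-class precision, recall and F1 (one-vs-rest)."""
    total = sum(map(sum, matrix))
    accuracy = sum(matrix[i][i] for i in range(len(labels))) / total
    report = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fp = sum(row[k] for row in matrix) - tp  # predicted k, actually another class
        fn = sum(matrix[k]) - tp                 # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report

if __name__ == "__main__":
    labels = ["positive", "negative", "neutral"]
    y_true = ["positive", "positive", "negative", "negative", "neutral"]
    y_pred = ["positive", "negative", "negative", "negative", "neutral"]
    m = confusion_matrix(y_true, y_pred, labels)
    print(m)  # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
    accuracy, report = report_from_matrix(m, labels)
    print(accuracy)  # 0.8
    print(report["positive"])  # precision 1.0, recall 0.5, F1 about 0.667
```

Reading the matrix row by row shows where errors go: here the single mistake is a positive tweet predicted as negative, which lowers recall for "positive" and precision for "negative".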

By adhering to this testing or verification plan, we aim to validate the
efficacy of the sentiment analysis models developed in this project and
ensure their suitability for practical deployment in analyzing real-world
Twitter data.

4.4 Quality Assurance

Quality assurance procedures followed during the project development
include:

● Regular code reviews and peer feedback to ensure code quality and
adherence to best practices.

● Testing conducted on different datasets to assess the robustness and
generalization capability of the sentiment analysis models.

● Documentation of project requirements, design decisions, and
implementation details to facilitate future maintenance and extension of the
system.

Classification reports for both models:

For Logistic Regression:

For Naive Bayes:


Chapter 5

Standards Adopted

5.1 Design Standards

In all engineering streams, predefined design standards such as IEEE and ISO
standards are available. List all the recommended practices for project design.
In software, UML diagrams or database design standards can also be followed.

5.2 Coding Standards

Coding standards are collections of coding rules, guidelines, and best practices.
A few of the coding standards are:
● Write as few lines as possible.
● Use appropriate naming conventions.
● Segment blocks of code in the same section into paragraphs.
● Use indentation to mark the beginning and end of control structures, and
clearly specify the code between them.
● Don’t use lengthy functions. Ideally, a single function should carry out a
single task.
…...

5.3 Testing Standards

There are ISO and IEEE standards for quality assurance and testing of the
product. Mention the standards followed for testing and verification of your
project work.


Chapter 6

Conclusion and Future Scope


6.1 Conclusion
The Twitter sentiment analysis project has provided valuable insights into
understanding public opinion and sentiment trends on the Twitter platform.
Through meticulous data collection, preprocessing, feature extraction, and
model training, we have developed a robust sentiment analysis system
capable of classifying tweets into positive, negative, and neutral categories
with high accuracy. Our analysis has highlighted the effectiveness of
traditional machine learning techniques, namely Naive Bayes and Logistic
Regression, in capturing the nuanced sentiment expressed in tweets.

Furthermore, our evaluation and analysis have not only demonstrated the
performance and robustness of the sentiment analysis system but also shed
light on potential ethical considerations, biases, and real-world applications.
By addressing these issues and continuously seeking improvements, we can
ensure the responsible deployment and utilization of sentiment analysis
technology in diverse domains.

6.2 Future Scope

1. Enhanced Model Performance: Continuously refine and improve the
sentiment analysis models by incorporating more sophisticated
architectures, leveraging larger datasets, and exploring advanced
techniques such as self-supervised learning and multi-task learning.

2. Multimodal Analysis: Expand the analysis beyond textual data by
integrating multimodal features such as images, videos, and user profiles to
capture a more comprehensive understanding of sentiment expressed on
Twitter.

3. Real-Time Analysis: Develop capabilities for real-time sentiment
analysis to provide timely insights and facilitate immediate responses to
emerging trends, events, or crises on Twitter.

4. Domain-Specific Analysis: Tailor the sentiment analysis system to
specific domains or industries, such as finance, healthcare, or politics, by
incorporating domain-specific lexicons, knowledge graphs, and contextual
information.

5. Ethical Considerations: Continuously monitor and address ethical
considerations, biases, and fairness issues in sentiment analysis, including
algorithmic bias, privacy concerns, and unintended consequences.

6. User Interaction and Feedback: Implement mechanisms for user
interaction and feedback within the sentiment analysis system to
incorporate user preferences, corrections, and domain expertise for
improved accuracy and relevance.

7. Scalability and Deployment: Scale the sentiment analysis system to
handle larger volumes of data and user requests by optimizing
infrastructure, parallelizing computations, and leveraging cloud services for
elastic scalability.

8. Interpretability and Explainability: Enhance the interpretability
and explainability of sentiment analysis models to provide users with
insights into how predictions are made and enable them to trust and
understand the results.

9. Collaboration and Knowledge Sharing: Foster collaboration with
academia, industry, and the broader community to advance research and
development in sentiment analysis, share best practices, and contribute to
open-source projects and datasets.

By pursuing these future plans, we aim to further advance the field of
sentiment analysis and empower users with valuable insights into public
sentiment expressed on Twitter, thereby facilitating informed decision-making
and fostering meaningful interactions in the digital landscape.


SAMPLE INDIVIDUAL CONTRIBUTION REPORT:

<TITLE OF THE PROJECT IN FONT SIZE 14, FONT STYLE TIMES NEW ROMAN, BOLD AND CENTERED>

<Student Name (in capital letters, font size 12, Times New Roman, centered)>
<Student Roll number (font size 12, Times New Roman, centered)>

Abstract: A short description of the aim and objective of the project work carried out in 3-4
lines. This part should be common to all students in the group. The font size and style will
remain the same from this point onwards: font size 12, font style Times New Roman, and line
spacing 1.5.
This report should be prepared in A4 page format with the ‘default’ option under ‘Margin’ in the
‘Page Layout’ tab in Microsoft Word. The word limit for this section is 80.

Individual contribution and findings: The student should clearly indicate his/her role
in the project group and the contribution in implementing the project work. The student should
also outline his/her planning involved in implementing his/her part in the work. This
contribution report should be different for every student in the group. The student should also
write his/her technical findings and experience while implementing the corresponding part of
the project. The overall contribution report should not be less than 1 page for each student. The
student should provide both the soft copy and a signed hard copy to the project supervisor.

Individual contribution to project report preparation: The student should mention
his/her role in preparing the group project report, indicating which chapters and portions
he/she contributed.

Individual contribution for project presentation and demonstration: The student
should mention his/her role in preparing presentations and the part of the project demonstrated.

Full signature of Supervisor: …………………………….
Full signature of the student: ……………………………..

TURNITIN PLAGIARISM REPORT

(This report is mandatory for all projects, and plagiarism must be below 25%.)