Synopsis Report
on
Opinion Mining of Pandemic using Machine Learning
Submitted as requirement for the
Final Year Project
Session 2024-25
By:
Ojas Garg
2103213060
Radhika Mehrotra
2103213074
Under the guidance of:
Mr. Shyam Sharma
Assistant Professor
DEPARTMENT OF CSE-AIML
ABES ENGINEERING COLLEGE, GHAZIABAD
AFFILIATED TO
DR. A.P.J. ABDUL KALAM TECHNICAL UNIVERSITY, U.P., LUCKNOW
(Formerly UPTU)
Student’s Declaration
We hereby declare that the work presented in this report, entitled “Opinion
Mining of Pandemic using Machine Learning”, is an authentic record of our own
work carried out under the supervision of Mr. Shyam Sharma, Assistant Professor,
CSE-AIML. The matter embodied in this report has not been submitted by us anywhere
else.
Date:
Signature of student Signature of student
(Name: Ojas Garg) (Name: Radhika Mehrotra)
(Roll No. 2103213060) (Roll No. 2103213074)
Department: CSE-AIML Department: CSE-AIML
This is to certify that the above statement made by the candidate(s) is correct to the best
of my knowledge.
Signature of HOD Signature of Supervisor
…………………… Mr. Shyam Sharma
CSE-AIML Assistant Professor
Date: CSE-AIML
Acknowledgement
We would like to convey our sincere thanks to Mr. Shyam Sharma for his
motivation, knowledge and support throughout the course of the project. His continuous
support helped us complete the project successfully, and the knowledge he shared was very
useful to us.
We would also like to give special thanks to the Department of CSE-AIML for its
continuous support and for the opportunities it gave us to carry out our project.
Signature of student Signature of student
Ojas Garg Radhika Mehrotra
(Roll No. 2103213060) (Roll No. 2103213074)
Table of Contents
Student’s Declaration
Acknowledgement
List of Figures
List of Tables
Abstract
Chapter 1: Introduction
Chapter 2: Related Work/Methodology
2.1: Existing Approaches
2.2: Comparative Analysis of Existing Works
Chapter 3: Project Objective
Chapter 4: Proposed Methodology
Chapter 5: Design and Implementation
5.1: Work Flow Diagram
Chapter 6: Results and Discussion
Chapter 7: Conclusion and Future Scope
References
List of Tables
Table 1. Count of tweets in each hashtag
List of Figures
Fig.1. Proposed Approach
Fig.2. Work Flow Diagram
Fig.3. Proportion of positive, negative and neutral tweets.
ABSTRACT
COVID-19, popularly known as coronavirus disease, is an infectious disease
that originated in Wuhan, China, in 2019 and has since spread to all parts of
the world. In India, the first case was found in early 2020, and a lockdown
was imposed soon afterwards to control the situation. By the time of this work,
India had become the second most affected country. In this project, we determine
the sentiments that people expressed on social media during the pandemic and
try to find which machine learning algorithm is best suited to analyzing those
sentiments. About 1.5 lakh (150,000) tweets from Twitter have been analyzed to
determine whether people's opinions are positive, negative or neutral.
Chapter 1
Introduction
The first case of the novel coronavirus was reported in China in December 2019. From
there it spread to other countries such as Italy, Spain, the USA and India. The World
Health Organization declared it a health emergency, and soon afterwards countries began
taking measures to stop the spread of the virus. On March 25, 2020, a nationwide lockdown
was imposed in India as a safety measure. By the time of this work, India had become the
second most affected country after the USA.
This project examines people's opinions after the lockdown was imposed across India
and people were confined to their homes. Sentiment analysis is an emerging area of
natural language processing (NLP) that categorizes people's opinions and sentiments
using text mining techniques. It is useful in many settings. For example, a seller can
gather customer feedback on a product from online sites and, by analyzing that feedback,
improve the quality of the product.
Social media is a place where people can express themselves without
hesitation [6,8]. Twitter is a popular social media platform on which people express
themselves in the form of tweets. These tweets are studied to find out people's
sentiments or opinions on a given subject.
The main objectives of the project are:
1. To analyze tweets from Twitter and classify the emotions they express into three
categories: positive, negative or neutral [3].
2. To study different machine learning algorithms for sentiment analysis and to
find the one that fits best [7].
Chapter 2
Related Work
The related work associated with our project is given below:
2.1. Existing Approaches
Twitter Sentiment Analysis using Python:
Performs sentiment analysis of Twitter data using Python and reports the
percentage of positive and negative tweets [5].
Word frequency and sentiment analysis of Twitter messages during Coronavirus
pandemic [9]:
Finds the frequency of each word and performs sentiment analysis of a
pandemic dataset [2].
COVID-19 pandemic: a sentiment analysis:
Performs sentiment analysis of a COVID-19 dataset [4].
An "Infodemic": Leveraging High-Volume Twitter Data to Understand Public
Sentiment for the COVID-19 Outbreak [10]:
Measures and studies the early changes in content and opinion about
COVID-19 [1].
2.2. Comparative Analysis of Existing Works
Existing projects obtain the positive or negative polarity of individual words,
whereas our project obtains the polarity of the overall dataset.
Existing projects also do not specify which machine learning model is best for
sentiment analysis; our project aims to determine that as well.
Chapter 3
Project Objective
This project will analyze the emotions of people during the pandemic.
It will implement an algorithm for automatic classification of tweets as
positive, negative or neutral.
It will analyze different machine learning algorithms and find the
one with the best accuracy.
Chapter 4
Proposed Methodology
The proposed methodology related to our project is given below:
Step 1: Identify the popular hashtags used on Twitter in India during the pandemic. Tweets
under those hashtags are extracted through the Twitter API using the Tweepy library.
Step 2: Preprocess the dataset. This involves the following steps:
Removal of hashtags.
Removal of links, GIFs, emoji, images and special characters.
Removal of stop words.
Removal of non-English words.
Lemmatization.
Step 3: Analyze the polarity of the dataset.
Step 4: Feed the output of Step 3 to different machine learning algorithms and compare
them to find the algorithm with the best accuracy.
Step 5: Represent the results using different charts.
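A minimal sketch of the cleaning in Step 2, assuming plain regular expressions in place of the project's actual NLTK pipeline. The function name `preprocess_tweet`, the patterns and the short stop word set are illustrative assumptions, and whole hashtags are assumed to be dropped rather than only the `#` symbol. Removal of non-English words and lemmatization are left out.

```python
import re

# Assumption: a tiny illustrative stop word set. NLTK's English stop word
# corpus would normally be used instead.
STOP_WORDS = {"a", "an", "and", "are", "at", "in", "is", "of", "the", "to", "we"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
HASHTAG_RE = re.compile(r"#\w+")                # whole hashtags
NON_LETTER_RE = re.compile(r"[^a-z\s]")         # emoji, digits, punctuation


def preprocess_tweet(text):
    """Return a list of cleaned, lower-cased tokens for one tweet."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    # Split on whitespace and drop stop words.
    return [token for token in text.split() if token not in STOP_WORDS]


print(preprocess_tweet("We are safe at home! #IndiaLockdown https://example.com"))
# ['safe', 'home']
```

Links are removed before the other patterns so that the URL is matched as a whole and not broken up by the punctuation filter.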
Table 1. Count of tweets in each hashtag
S. No.   Hashtag               No. of Tweets
1.       #coronavirusIndia     10,000
2.       #IndiafightsCorona    10,000
3.       #IndiaLockdown        10,000
• Extraction of the dataset from the Twitter API
• Pre-processing of the data to remove special characters, punctuation, stop words and images
• Processing of the data to analyze the polarity of the dataset
• Application of machine learning algorithms to find which fits best for sentiment analysis
• Representation of the results using tables and graphs
Fig.1. Proposed Approach
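The comparison of algorithms by accuracy can be sketched as follows. This is only an illustration: the helper names `accuracy` and `best_model`, the model names and the toy labels are assumptions, not the project's code or results. Each model is assumed to have already produced one predicted label per test tweet.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def best_model(true_labels, predictions_by_model):
    """Return (model_name, accuracy) for the highest-scoring model."""
    scores = {name: accuracy(true_labels, preds)
              for name, preds in predictions_by_model.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Toy data for illustration only; these are not results from the project.
y_true = ["positive", "neutral", "negative", "neutral"]
predictions = {
    "model_a": ["positive", "neutral", "neutral", "neutral"],     # 3 of 4 correct
    "model_b": ["positive", "negative", "negative", "positive"],  # 2 of 4 correct
}
print(best_model(y_true, predictions))
# ('model_a', 0.75)
```

Accuracy is used here because it is the criterion named in the methodology; with unevenly sized sentiment classes, per-class measures may also be worth reporting.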
Chapter 5
Design and Implementation
The design and implementation of our project is as follows:
5.1. Work Flow Diagram
Fig.2. Work Flow Diagram
The dataset was extracted from the Twitter API using the Tweepy library in
Python. The NumPy library is used for numerical computation and pandas for
data manipulation. The Natural Language Toolkit (NLTK) is used for
preprocessing the dataset. The TextBlob library is used for spelling checks and
sentiment analysis.
Matplotlib is used for graphical representation of the results.
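A minimal sketch of how a polarity score could be turned into one of the three sentiment classes. It assumes a score in the range [-1.0, 1.0], such as the value TextBlob exposes as `TextBlob(text).sentiment.polarity`. The function name `label_from_polarity` and the `threshold` parameter are assumptions for illustration; the report does not state the cut-off that was used.

```python
def label_from_polarity(polarity, threshold=0.0):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    Scores above `threshold` are positive, scores below `-threshold`
    are negative, and everything in between is neutral.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Hypothetical scores, for illustration only.
for score in (0.6, 0.0, -0.3):
    print(score, label_from_polarity(score))
```

With the default threshold of 0.0, only tweets with a score of exactly zero are labeled neutral; a small positive threshold widens the neutral band.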
Chapter 6
Results and Discussion
The result of analyzing the tweets is shown in Fig.3.
Fig.3. Proportion of positive, negative and neutral tweets.
Fig.3 shows that 46% of the tweets are neutral, about 36.5% are positive
and 17.5% are negative.
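The proportions reported for Fig.3 can be computed from a list of per-tweet labels as sketched below. The function name `sentiment_proportions` is an assumption, and the 200-label list is a toy input sized only so that it reproduces the reported percentages; it is not the project's dataset.

```python
from collections import Counter


def sentiment_proportions(labels):
    """Return the percentage of each label, rounded to one decimal place."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels given")
    return {label: round(100 * count / total, 1)
            for label, count in counts.items()}


# Toy input for illustration: 92 neutral, 73 positive and 35 negative labels.
labels = ["neutral"] * 92 + ["positive"] * 73 + ["negative"] * 35
print(sentiment_proportions(labels))
# {'neutral': 46.0, 'positive': 36.5, 'negative': 17.5}
```

The resulting dictionary can be passed to a plotting library such as Matplotlib to draw the pie chart.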
Chapter 7
Conclusion and Future Scope
The project gives the overall polarity of the tweets and aims to find the
best algorithm for performing sentiment analysis.
From the analysis of the tweets, we observe that the largest share of tweets (46%)
is neutral, that is, neither positive nor negative.
In future, we plan to perform the analysis on other social platforms such as
Instagram and Facebook, and to classify the sentiments further.
References
[1] Medford, R. J., Saleh, S. N., Sumarsono, A., Perl, T. M., & Lehmann, C. U. (2020). An
"Infodemic": Leveraging High-Volume Twitter Data to Understand Public Sentiment for the
COVID-19 Outbreak. medRxiv.
[2] Rajput, N. K., Grover, B. A., & Rathi, V. K. (2020). Word frequency and sentiment
analysis of twitter messages during Coronavirus pandemic. arXiv preprint
arXiv:2004.03925.
[3] Samuel, J., Ali, G. G., Rahman, M., Esawi, E., & Samuel, Y. (2020). Covid-19 public
sentiment insights and machine learning for tweets classification. Information, 11(6), 314.
[4] Kumar, A., Khan, S. U., & Kalra, A. (2020). COVID-19 pandemic: a sentiment
analysis. European Heart Journal.
[5] Ahuja, S., & Dubey, G. (2017, August). Clustering and sentiment analysis on Twitter
data. In 2017 2nd International Conference on Telecommunication and Networks
(TEL-NET) (pp. 1-5). IEEE.
[6] Suman, C., Saha, S., Bhattacharyya, P., & Chaudhari, R. S. (2020). Emoji Helps! A
Multi-modal Siamese Architecture for Tweet User Verification. Cognitive Computation, 1-16.
[7] Neethu, M. S., & Rajasree, R. (2013, July). Sentiment analysis in twitter using machine
learning techniques. In 2013 Fourth International Conference on Computing,
Communications and Networking Technologies (ICCCNT) (pp. 1-5). IEEE.
[8] Gupta, S., Singh, A., & Ranjan, J. (2020). Sentiment Analysis: Usage of Text and
Emoji for Expressing Sentiments. In Advances in Data and Information Sciences (pp.
477-486). Springer, Singapore.
[9] Rajput, N. K., Grover, B. A., & Rathi, V. K. (2020). Word frequency and sentiment
analysis of twitter messages during coronavirus pandemic. arXiv preprint
arXiv:2004.03925.
[10] Medford, R. J., Saleh, S. N., Sumarsono, A., Perl, T. M., & Lehmann, C. U. An
“Infodemic”: Leveraging High-Volume Twitter Data to Understand Early Public
Sentiment for the COVID-19 Outbreak. In Open Forum Infectious Diseases.