Twitter Sentiment Analysis
Project Report
By
Jaanav Mathavan
S Vishal
Bevin Sukil Subash
Class: 12
Section: F1
(Affiliated to Central Board of Secondary Education, New Delhi)
(Chettinad House, R.A.Puram, Chennai – 600 028)
COMPUTER SCIENCE
Date: Teacher-in-charge
ACKNOWLEDGEMENT
project.
INDEX
S. No. Topic
1 OVERVIEW OF PYTHON
2 PROJECT DESCRIPTION
3 FUNCTIONS USED
4 FILES USED
5 SOURCE CODE
6 SAMPLE OUTPUTS
7 CONCLUSION
8 BIBLIOGRAPHY
OVERVIEW OF PYTHON
Introduction
What is sentiment analysis?
Sentiment analysis is the process of computationally determining whether a
piece of writing is positive, negative or neutral. It is also known as opinion
mining, since it derives the opinion or attitude of the writer or speaker.
More formally, sentiment analysis (sometimes called emotion AI) refers to the
use of natural language processing, text analysis, computational linguistics, and
biometrics to systematically identify, extract, quantify, and study affective states
and subjective information. Sentiment analysis is widely applied to voice of the
customer materials such as reviews and survey responses, online and social
media, and healthcare materials for applications that range from marketing to
customer service to clinical medicine.
Why sentiment analysis?
Business: In marketing, companies use it to develop their strategies, to
understand customers’ feelings towards products or brands, how people
respond to their campaigns or product launches and why consumers don’t
buy some products.
Politics: In the political field, it is used to keep track of political views, to
detect consistency and inconsistency between statements and actions at
the government level. It can be used to predict election results as well!
Public Actions: Sentiment analysis is also used to monitor and analyse
social phenomena, for the spotting of potentially dangerous situations and
determining the general mood of the blogosphere.
Overview of the Proposed System
Introduction
As the term suggests, microblogging is the blogging of small statements such as
“I am having lunch” and is considered a passive form of blogging.
Microblogging services provide a simple, easy form of communication that
enables users to broadcast and share information about their day-to-day
activities, opinions, news stories, current status, and other interests.
Commercial or purposive microblogs also exist and are used to promote
websites, services, products, or individuals by using microblogging on popular
platforms such as Twitter, Facebook, etc., as marketing and public relations
services.
Since its launch in July 2006, Twitter has become a ubiquitous real-time
information network powered by people all around the world that lets users
share and discover what is happening now. Twitter is a social medium for
people to communicate and stay connected through the exchange of quick,
frequent messages. People write short updates, often called “tweets,” originally
limited to 140 characters (280 since November 2017), about various topics such
as their day-to-day activities. They share information, news, and opinions with
followers, and seek knowledge and expertise through public tweets.
Twitter employs a social-networking model called “following,” in which
Twitter users can follow any other user without permission, i.e., the relationship
of following requires no reciprocation. To follow someone on Twitter means to
subscribe to their tweets or updates on the site almost in real time. A
“follower” is another Twitter user who has followed you. A “reply” is a tweet
posted in reply to another user’s message; it begins with “@username,” where
the “@” sign is used to call out usernames in tweets.
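The “@username” convention described above can be detected with a short regular expression. The following is only a minimal sketch: the helper names extract_usernames and is_reply are hypothetical and are not part of the project's source code, and the pattern assumes that usernames are made of letters, digits and underscores.

```python
import re

# Assumption: usernames consist of letters, digits and underscores.
USERNAME_PATTERN = re.compile(r"@(\w+)")


def extract_usernames(tweet_text):
    """Return every username that appears as @username in the tweet."""
    return USERNAME_PATTERN.findall(tweet_text)


def is_reply(tweet_text):
    """A reply begins with @username, as described above."""
    return USERNAME_PATTERN.match(tweet_text) is not None


if __name__ == "__main__":
    sample = "@alice thanks for the update, cc @bob_99"
    print(extract_usernames(sample))
    print(is_reply(sample))
```

Because is_reply only checks the start of the text, a tweet that names a user somewhere in the middle is not treated as a reply.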
“RT,” which stands for “retweet,” is the act of forwarding another user’s tweet
to all of your followers. Users can respond to another person’s tweet, which is
called “mention.” A “mention” is any Twitter update that contains
“@username” in the tweet content. It is important for popular users such as
celebrities, politicians, or corporations to understand their audiences, and to
measure their influence toward audiences on Twitter. The goal of this study is to
develop a measure of positive-negative influence for popular users on Twitter
and reveal how this measure of influence is related to real-world phenomena.
We will collect the tweets of certain popular users, together with the tweets of
other users, and carry out an empirical analysis of user sentiment on Twitter
based on an analysis of negative and positive words. We will develop a measure
of the positive-negative influence between popular users and their audience and
then investigate whether the positive-negative influence changes over time. The
primary contribution of this work is that this measure of influence on Twitter
can be used as an indicator to identify real-world audience sentiments, providing
new insights into influence and a better understanding of popular users.
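An analysis based on positive and negative words can be sketched as a simple word count. The word lists and function names below are illustrative assumptions, not the lexicon or the measure used in this project (the source code in this report relies on TextBlob instead).

```python
import re

# Hypothetical, deliberately tiny word lists, for illustration only.
POSITIVE_WORDS = {"good", "great", "love", "happy", "excellent"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "sad", "awful"}


def word_score(text):
    """Count positive words minus negative words in one tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives - negatives


def positive_negative_ratio(tweets):
    """Share of positive tweets minus share of negative tweets.

    The result lies between -1 (all negative) and +1 (all positive).
    """
    if not tweets:
        return 0.0
    scores = [word_score(tweet) for tweet in tweets]
    positive = sum(score > 0 for score in scores)
    negative = sum(score < 0 for score in scores)
    return (positive - negative) / len(tweets)
```

Computing this ratio separately for tweets collected in different periods would be one simple way to see whether the balance changes over time.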
Methodology Adopted
The proposed methodology for our project can be summarised by the following
four steps:
1. First, we authorize the Twitter API client with the help of the Twitter API
credentials that have been provided to us.
2. We then send a GET request to the Twitter API for a particular query with
the help of the tweepy library, which will be explained later in detail.
3. We then parse the tweets and classify each tweet as positive, negative
or neutral.
4. We then create a pie chart with the help of the matplotlib library showing
the percentage of tweets that are positive, negative and neutral.
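The last two steps can be sketched as follows, assuming the tweets have already been labelled. The function names and the output file name are hypothetical; the plotting helper needs matplotlib to be installed and is not called in the demonstration at the bottom.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def sentiment_percentages(labels):
    """Return the percentage of each sentiment label, rounded to 2 places."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: round(100 * counts[label] / total, 2) for label in LABELS}


def save_pie_chart(percentages, path="sentiment_pie.png"):
    """Draw the percentages as a pie chart and save it (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    figure, axes = plt.subplots()
    axes.pie(list(percentages.values()), labels=list(percentages.keys()),
             autopct="%1.1f%%")
    figure.savefig(path)
    plt.close(figure)


if __name__ == "__main__":
    sample = ["positive", "neutral", "positive", "negative"]
    print(sentiment_percentages(sample))
```

Keeping the arithmetic separate from the plotting makes the percentages easy to check without opening a window.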
Authentication:
In order to fetch tweets through the Twitter API, one needs to register an app
through a Twitter account. The steps are as follows:
1. Create a Twitter Developer account.
2. Click ‘Create New App’.
3. Fill in the application details. The callback URL field can be left empty.
4. Open the ‘Keys and Access Tokens’ tab.
5. Copy the ‘Consumer Key’, ‘Consumer Secret’, ‘Access Token’ and ‘Access
Token Secret’.
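Once copied, the four values are best kept out of the program text. One possible approach, shown here only as a sketch, is to read them from environment variables; the variable names and the load_credentials helper are assumptions made for this example and do not appear in the project.

```python
import os

# Hypothetical variable names; any consistent naming scheme would do.
REQUIRED_VARIABLES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(environ=None):
    """Read the four credentials, failing early if any is missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARIABLES if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARIABLES}
```

The returned values could then be passed to OAuthHandler and set_access_token in the same way as the literal strings in the source code.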
Modules imported
Installation:
contents
update_to_box(): adds the contents of the drop-down menu to the entry box
python_code()
__init__(self)
clean_tweet(self, tweet)
get_tweet_sentiment(self, tweet)
main()
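To illustrate what clean_tweet and get_tweet_sentiment do, the sketch below reuses the regular expression from the source code and applies the same thresholds to a polarity value. It is a simplified stand-alone version: the helper name label_from_polarity is hypothetical, and the polarity is passed in as a number (TextBlob reports it as a float between -1 and 1) rather than computed by TextBlob.

```python
import re

# Same pattern as clean_tweet in the source code: it removes @usernames,
# any character other than letters, digits, spaces and tabs, and URLs.
CLEAN_PATTERN = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)")


def clean_tweet(tweet):
    """Replace unwanted pieces with spaces, then collapse the whitespace."""
    return " ".join(CLEAN_PATTERN.sub(" ", tweet).split())


def label_from_polarity(polarity):
    """Map a polarity score to the three labels used in this project."""
    if polarity > 0:
        return "positive"
    elif polarity == 0:
        return "neutral"
    else:
        return "negative"


if __name__ == "__main__":
    print(clean_tweet("@user I love Python!! https://t.co/abc"))
    print(label_from_polarity(0.5))
```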
Files used
SOURCE CODE
import re
import sys
import tweepy
import matplotlib.pyplot as plt
from tweepy import OAuthHandler
from textblob import TextBlob
from tkinter import *
from tkinter import ttk
from tkinter import messagebox
entered_value=[];new=[];lst=[];show_queries=[]
positive_tweets=[];negative_tweets=[];neutral_tweets=[];code_analysis={}
g=''
def python_code():
    global new
    global entered_value
    new = []
    # normalise the query: upper-case it and collapse repeated whitespace
    queryname = ' '.join(entry.get().upper().split())
    new.append(queryname)
    # remember the entry only if it has not been searched before
    if new[-1] not in entered_value:
        entered_value.append(entry.get())
    class TwitterClient(object):
        def __init__(self):
            # keys and tokens from the Twitter Dev Console
            # (hard-coded as in the original project; secrets like these are
            # normally kept out of the source code)
            consumer_key = 'xM7IgmY4nbq7tgIw9ENVXyBEw'
            consumer_secret = 'qkcW5AgsgLNOKjIhn4Hd0LTt1ktq7ox52pypYEc5lJUNy2fJXQ'
            access_token = '1225054341913923584-WlGX3batUtcb9KXTPGKQ6z7bf2Nqw5'
            access_token_secret = 'qndhd5zaI3WAwIXBLe5gyRiV5X1u6fwJ8O6KC39X2PRgu'
            # attempt authentication
            try:
                # create OAuthHandler object
                self.auth = OAuthHandler(consumer_key, consumer_secret)
                # set access token and secret
                self.auth.set_access_token(access_token, access_token_secret)
                # create tweepy API object to fetch tweets
                self.api = tweepy.API(self.auth)
            except Exception:
                print("Error: Authentication Failed")
        def clean_tweet(self, tweet):
            '''
            Utility function to clean tweet text by removing links and special
            characters using simple regex statements.
            '''
            return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())
        def get_tweet_sentiment(self, tweet):
            '''
            Utility function to classify sentiment of passed tweet
            using textblob's sentiment method
            '''
            # create TextBlob object of passed tweet text
            analysis = TextBlob(self.clean_tweet(tweet))
            # set sentiment
            if analysis.sentiment.polarity > 0:
                return 'positive'
            elif analysis.sentiment.polarity == 0:
                return 'neutral'
            else:
                return 'negative'
        def get_tweets(self, query, count = 10):
            '''
            Main function to fetch tweets and parse them.
            '''
            # empty list to store parsed tweets
            tweets = []
            try:
                # call twitter api to fetch tweets
                # (written for Tweepy 3.x; Tweepy 4 renamed API.search to
                # API.search_tweets and TweepError to TweepyException)
                fetched_tweets = self.api.search(q = query, count = count)
                # parsing tweets one by one
                for tweet in fetched_tweets:
                    # empty dictionary to store required params of a tweet
                    parsed_tweet = {}
                    # saving text of tweet
                    parsed_tweet['text'] = tweet.text
                    # saving sentiment of tweet
                    parsed_tweet['sentiment'] = self.get_tweet_sentiment(tweet.text)
                    # appending parsed tweet to tweets list
                    if tweet.retweet_count > 0:
                        # if tweet has retweets, ensure that it is appended only once
                        if parsed_tweet not in tweets:
                            tweets.append(parsed_tweet)
                    else:
                        tweets.append(parsed_tweet)
                # return parsed tweets
                return tweets
            except tweepy.TweepError as e:
                # print error (if any); the method then returns None
                print("Error : " + str(e))
    def main():
        # creating object of TwitterClient Class
        api = TwitterClient()
        global g, positive_tweets, negative_tweets, neutral_tweets, code_analysis
        g = entered_value[-1]
        if g is None:
            g = input("enter value")
        # calling function to get tweets
        tweets = []
        try:
            tweets = api.get_tweets(query = g, count = 10)
            if not tweets:
                # sentinel value checked by the caller, which shows an error box
                return 'this should raise an error'
            a = str(len(tweets))
            if len(a) == 1:
                a = '0' + a
            code_analysis['No. of tweets\t\t \t:'] = a
            # picking positive tweets from tweets
            ptweets = [tweet for tweet in tweets if tweet['sentiment'] == 'positive']
            # percentage of positive tweets
            pp = round(100*len(ptweets)/len(tweets), 2)
            code_analysis['No. of positive tweets \t\t:'] = str(len(ptweets))
            # picking negative tweets from tweets
            ntweets = [tweet for tweet in tweets if tweet['sentiment'] == 'negative']
            code_analysis['No. of negative tweets \t\t:'] = str(len(ntweets))
            # percentage of negative tweets
            np = round(100*len(ntweets)/len(tweets), 2)
            # picking neutral tweets from tweets
            netweets = [tweet for tweet in tweets if tweet['sentiment'] == 'neutral']
            # percentage of neutral tweets
            nup = round(100*len(netweets)/len(tweets), 2)
            code_analysis['No. of neutral tweets \t\t:'] = str(len(netweets))
            code_analysis['Positive tweets Percentage\t\t:'] = pp
            code_analysis['Negative tweets Percentage\t\t:'] = np
            code_analysis['Neutral tweets Percentage\t\t:'] = nup
            # storing the text of every positive tweet
            for tweet in ptweets:
                positive_tweets.append(',' + str(tweet['text']) + '\n')
            # storing the text of every negative tweet
            for tweet in ntweets:
                negative_tweets.append(', ' + str(tweet['text']) + '\n')
            # storing the text of every neutral tweet
            for tweet in netweets:
                neutral_tweets.append(', ' + str(tweet['text']) + '\n')
            exp_vals = [pp, np, nup]
            exp_labels = ["Positive", "Negative", "Neutral"]
            plt.pie(exp_vals, labels=exp_labels)
        except Exception as error:
            print("ERROR : ", repr(error))
    if __name__ == "__main__":
        # calling main function
        a = main()
        if a == 'this should raise an error':
            error()
        else:
            show_queries.insert(-2, queryname)
            dd()
            create_and_append(new[0])
            function_for_output()
def error():
    messagebox.showerror('ERROR','No Tweets for this topic!!')
def list_of_queries():
    global lst
    try:
        f = open('List_of_queries.txt','r')
        s = ' '
        while s:
            s = f.readline()
            s = s.rstrip('\n')
            s = s.strip(' ')
            lst.append(s)
        f.close()
        print(lst)
    except Exception:
        lst = [' ']
def checking():
    global positive_tweets,negative_tweets,neutral_tweets,code_analysis,lst,show_queries
    list_of_queries()
    positive_tweets.clear();negative_tweets.clear();neutral_tweets.clear()
    # normalise the query in the same way as python_code()
    queryname = ' '.join(entry.get().upper().split())
    lst = [i for i in lst if i != '']
    if queryname in lst:
        # the query was searched before: load the saved results from its file
        name = open(queryname+'.txt','r',encoding='UTF-8')
        a = ''
        lst1 = name.readlines()
        name.close()
        code_analysis['No. of tweets\t\t \t:'] = (lst1[0].split())[-1]
        code_analysis['No. of positive tweets \t\t:'] = (lst1[1].split())[-1]
        code_analysis['No. of negative tweets \t\t:'] = (lst1[2].split())[-1]
        code_analysis['No. of neutral tweets \t\t:'] = (lst1[3].split())[-1]
        code_analysis['Positive tweets Percentage\t\t:'] = float(lst1[4].split()[-1])
        code_analysis['Negative tweets Percentage\t\t:'] = float(lst1[5].split()[-1])
        code_analysis['Neutral tweets Percentage\t\t:'] = float(lst1[6].split()[-1])
        print(lst1)
        u = int(lst1[1].split()[-1])
        v = int(lst1[2].split()[-1])
        w = int(lst1[3].split()[-1])
        for i in lst1[7:]:
            i = i.rstrip('\n')
            print(i)
            if i == 'newline':
                # 'newline' marks the end of one saved tweet
                if len(positive_tweets) < u:
                    positive_tweets.append(a)
                elif len(negative_tweets) < v:
                    negative_tweets.append(a)
                elif len(neutral_tweets) < w:
                    neutral_tweets.append(a)
                a = ''
                continue
            elif i in ['',' ']:
                pass
            else:
                a += i+' '
            print(a)
        function_for_output()
        clear_content()
    else:
        python_code()
        clear_content()
def create_and_append(queryname):
    global positive_tweets,negative_tweets,neutral_tweets,code_analysis
    file_contents = open(queryname+'.txt','w',encoding='UTF-8')
    contents = [positive_tweets,negative_tweets,neutral_tweets]
    for i in code_analysis:
        file_contents.write(i+' '+ str(code_analysis[i])+'\n')
    for i in contents:
        for j in i:
            file_contents.write(j+'\n')
            file_contents.write('newline\n')
    file_contents.close()
    searched = open('List_of_queries.txt','a',encoding='UTF-8')
    searched.write(queryname+'\n')
    searched.close()
def function_for_output():
    output = Toplevel()
    output.geometry("1400x700")
    output.title('output')
    output.config(bg='#fafcff')
    notebook = ttk.Notebook(output)
    tab1 = Frame(notebook,width=1250,height=600)
    tab2 = Frame(notebook,width=1250,height=600)
    tab3 = Frame(notebook,width=1250,height=600)
    tab4 = Frame(notebook,width=1250,height=600)
    notebook.add(tab1,text='code analysis')
    notebook.add(tab2,text='positive tweets')
    notebook.add(tab3,text='negative tweets')
    notebook.add(tab4,text='neutral tweets')
    notebook.place(x=0,y=0)
    # tab 1: one label per summary statistic
    c = 0
    for i in code_analysis:
        Label(tab1,text=i + str(code_analysis[i]),
              font=('Bahnschrift SemiBold',14,'bold')).place(x=0,y=40*c)
        c += 1
    # tabs 2 to 4: one numbered label per tweet, or a notice if there are none
    for i in range(len(positive_tweets)):
        Label(tab2,text=str(i+1) + positive_tweets[i],
              font=('Bahnschrift SemiBold',12,'bold')).place(x=0,y=40*i)
    if len(positive_tweets) == 0:
        Label(tab2,text='No Positive Tweets for this Topic!!',
              font=('Bahnschrift SemiBold',12,'bold'),fg='red').place(x=0,y=0)
    for i in range(len(negative_tweets)):
        Label(tab3,text=str(i+1) + negative_tweets[i],
              font=('Bahnschrift SemiBold',12,'bold')).place(x=0,y=40*i)
    if len(negative_tweets) == 0:
        Label(tab3,text='No Negative Tweets for this Topic!!',
              font=('Bahnschrift SemiBold',12,'bold'),fg='red').place(x=0,y=0)
    for i in range(len(neutral_tweets)):
        Label(tab4,text=str(i+1) + neutral_tweets[i],
              font=('Bahnschrift SemiBold',12,'bold')).place(x=0,y=40*i)
    if len(neutral_tweets) == 0:
        Label(tab4,text='No Neutral Tweets for this Topic!!',
              font=('Bahnschrift SemiBold',12,'bold'),fg='red').place(x=0,y=0)
    def angle(n):
        # convert a percentage to degrees; 100% is capped at 359 degrees
        if n != 100.00:
            return (360*n)/100
        else:
            return 359
    pos = code_analysis['Positive tweets Percentage\t\t:']
    neg = code_analysis['Negative tweets Percentage\t\t:']
    neu = code_analysis['Neutral tweets Percentage\t\t:']
    # pie chart drawn as three consecutive arcs on a canvas
    canvas = Canvas(tab1,width=200,height=200)
    canvas.place(x=50,y=320)
    canvas.create_arc((2,2,150,150),fill='green',outline='green',
                      start=0,extent=angle(pos))
    canvas.create_arc((2,2,150,150),fill='red',outline='red',
                      start=angle(pos),extent=angle(neg))
    canvas.create_arc((2,2,150,150),fill='blue',outline='blue',
                      start=angle(pos)+angle(neg),extent=angle(neu))
    # legend for the pie chart
    Label(tab1,bg='green').place(x=220,y=330)
    Label(tab1,text='positive tweets ({})'.format(pos)).place(x=260,y=330)
    Label(tab1,bg='red').place(x=220,y=370)
    Label(tab1,text='negative tweets ({})'.format(neg)).place(x=260,y=370)
    Label(tab1,bg='blue').place(x=220,y=410)
    Label(tab1,text='neutral tweets ({})'.format(neu)).place(x=260,y=410)
#Creating main window
mw=Tk()
mw.geometry("1500x700")
mw.title("TWITTER SENTIMENTAL ANALYSIS")
mw.config(bg='#1DA1F2')
clicked=StringVar()
clicked.set('Show Searched Queries')
def update_to_box():
    if clicked.get() in ['Show Searched Queries','',' ']:
        clicked.set('Show Searched Queries')
    else:
        update = clicked.get()
        entry.delete(0,'end')
        entry.insert(0,update)
        clicked.set('Show Searched Queries')
def clear_content():
    entry.delete(0,'end')
def dd():
    global clicked
    print('show queries= ',show_queries)
    list_of_queries()#lst
    drop = OptionMenu(mw,clicked,*show_queries)
    drop.place(x=400,y=340)
    Button(mw,text='Update to Entry Box',command=update_to_box).place(x=570,y=340,height=30)
    Button(mw,text='X',command=clear_content).place(x=800,y=300,height=30)
# loading previously searched queries for the drop down menu
try:
    f = open('List_of_queries.txt','r')
    s = ' '
    while s:
        s = f.readline()
        s = s.rstrip('\n')
        s = s.strip(' ')
        show_queries.append(s)
    f.close()
    print(lst)
except Exception:
    show_queries = [' ']
#inserting image
photo = PhotoImage(file='twitter logo.png')
label=Label(mw,image=photo)
label.pack()
#creating label
label2=Label(mw,
text="Learn from twitter",
font=('Bahnschrift SemiBold',14,'bold'),
bg='#ffffff')
label2.place(x=400,y=260)
# creating the entry box
entry=Entry(mw,
font=('Bookman Old Style',14))
entry.place(x=400,y=300,width= 400,height=30)
dd()
#creating the search button
sub=Button(mw,
text='search',
bg="#e4edf5",
font=('Franklin Gothic Medium',14,'bold'),
command=checking)
sub.place(x=815,y=300,height=30)
#creating a drop down menu
mw.mainloop()
SAMPLE OUTPUT
CONCLUSION:
In this project, sentiment analysis is carried out on tweets fetched through the
Twitter API, to gauge public opinion on any specific hashtag or topic of
discussion. With the rise of real-time social media in the 21st century, this
work is directly relevant to the way opinions are shared today. For each tweet,
the program judges from its content whether the author is conveying a positive,
negative or neutral message. Automating this is important because, given the
sheer volume of tweets posted on Twitter each day, it is practically impossible
to analyse the contents of individual tweets manually.
BIBLIOGRAPHY
1. Honey, C. and Herring, S.C., 2009, January. Beyond microblogging:
Conversation and collaboration via Twitter. In 2009 42nd Hawaii
International Conference on System Sciences (pp. 1-10). IEEE.
2. Huberman, B.A., Romero, D.M. and Wu, F., 2008. Social networks that
matter: Twitter under the microscope. arXiv preprint arXiv:0812.1045.
3. Cha, M., Haddadi, H., Benevenuto, F. and Gummadi, K.P., 2010, May.
Measuring user influence in Twitter: The million follower fallacy. In
Fourth International AAAI Conference on Weblogs and Social Media.
4. https://www.youtube.com/watch?v=TuLxsvK4svQ&t=8721s
5. https://www.geeksforgeeks.org/twitter-sentiment-analysis-using-python/
6. Stack Overflow
7. W3Schools
8. https://classroom.google.com/u/0/c/MzEzMDQ3NjAwOTM3/m/MzY2MDMwMTk1NTUz/details
9. Computer Science with Python, textbook for Class 12, by Sumita Arora.
10. https://webapps.stackexchange.com/questions/19241/how-can-i-get-code-syntax-highlighting-in-google-docs