Sentiment Analysis Machine Learning
Sentiment Analysis Machine Learning
net/publication/318368881
CITATIONS READS
3 2,686
2 authors:
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Kavita Oza on 24 July 2017.
A domain specific feature based model for movies review has been developed by [4]. Here aspect based
technique is used, which analyzes text movie reviews and assign sentiment label to it on the basis of aspect.
Each aspect is then aggregated from multiple reviews to find sentiment score of specific movie. Author uses
SentiWordNet based technique for feature extraction and to compute document level sentiment. The result
obtained by algorithm is compared with Alchemy API result. The result of comparison shows feature based
model result is better than Alchemy API technique. In short aspect wise sentiment result is better than document
wise result. [4]
A huge collection of near about 300000 corpus tweets for sentiment analysis and opinion mining is collected
by [5]. A sentiment classifier model is build which identifies tweets positive, negative or neutral. In this
technique, collected corpus was divided into 3 sets namely positive emotions- happiness, amusement or joy;
Negative emotions- sadness, anger or disappointment and Neutral-text doesn’t contains emotions. Tree Tagger
is used for POS-tagging for distribution of emotions.
Consumer marketing data is used for collecting sentiments about product and collected data is used for
future prediction. Consumer review data is huge amount of data so author uses hadoop environment for
sentiment analysis. Experimental work created hadoop clusters for analysis of data. Tweets were categorized as
positive, negative and neutral [6].
Hadoop’s FLUME and HIVE tools are also used for analysis of twitter data. FLUME tool extracts data
and stores into HDFS form. HIVE tool is used to extract and analyze data from HDFS type storage. HIVE tool
is helps in analysis of different topics by changing keywords. Author identifies sentiments and polarity of tweets
from election voting data [7].
Twitter data is also automatically classified into positive, negative and neutral according to query term
used in consumer review tweets. In the paper author uses Parts Of Speech (POS) polarity technique and tree
kernel technique. Research work uses two types of resources such as hand dictionary of emotions and
dictionary collected from web. Author used different types of classification and feature extraction algorithm.[8]
III.SENTIMENT ANALYSIS
Six different sentiments can be analyzed using sentiment package namely anger, disgust, fear, joy, sadness
and surprise. By using word cloud frequently occurring words were recorded. A sentiment was added to these
frequently occurring words. These new words and sentiments are added to sentiment file for sentiment analysis.
Present uses bayes algorithm. Sentiment analysis algorithm compares each word with words in sentiment file
and assigns count for each sentiment. Finally it can display count for each sentiment.
Present work also finds polarity of text. Polarity will be positive, negative or neutral. In this experiment
new words were identified using word cloud and then polarity was assigned to them. Similar to sentiment
analysis, it also compares each word with polarity word file and counts polarity of text file. Lastly it displays
count for each polarity.
A. Dataset Creation
Dataset was created from retrieval of Uri Attack tweets. To retrieve tweets a tweeter application was
generated to get ConsumerKey, Consumer_Secret, Access_Token and Access_Token_Secret. These keys are
used to connect R Studio and tweeter application. Once the connection is done, providing search term as “Uri
Attack” a dataset of 5000 tweets was created. This data set was pre-processed to eliminate duplicate tweets so
final dataset contained 1788 tweets
B. Data Pre-processing
The final dataset consisted of raw tweets, which needed pre-processing to get good results. Tweets were
processed to remove stop words, frequent usage words such as conjunctions, numbers, prepositions, names, base
verbs, etc. These type of words do not play any important role in sentiment analysis. Following are pre-
processing steps.
Filtering-In this step tweets are cleaned by removing inks, some special words, emotions symbols, user
names etc.
Tokenization-In this step tweets are separated into different tokens.
Stopwords Removal-Stop words are nothing but specific common words which have no analytic value
are removed from the tweets.
stemDocument-This step is used to remove common word endings such as “ing”, “es”, “s” etc.
Remove White Spaces-Each text contains lots of white spaces. In this step white spaces are removes.
Convert to lower-After removing all unnecessary terms from text, it is converted to lower case.
C. Experimental Work
Experimental work is carried out on Windows XP operating system. Configuration used for working
environment is Intel i3, 3.3 GHz core processor with 2 GB RAM. For experimental purpose R studio version
0.99.486 and R version 3.2.2 is used.
R is open source statistical programming language and software environment. It is mostly useful for
data manipulation, data analysis, calculations and visualization of result in graphical format. Some important
characteristics of R are; it is vast capabilities, wide range of statistical, graphical, data mining, data analysis
techniques or functionalities, excellent community support, extensible functionality and lots of help is available.
It is mostly useful for educational and research purpose. It is open source and freely available [9].
Present study uses R for sentiment analysis of Uri Attack. R has rich set of built in packages such as tm,
sentiment, wordcloudetc [10] are used for sentiment analysis.
A sentiment analysis algorithm is applied on pre-processed dataset of tweets which gave 6 different
sentiments with its counts and polarity of each tweet positive, negative or neutral. Table 1 show sentiment score
for each emotion whereas table 2 shows polarity count.
Table 1. Sentiment score for each emotion
IV. OBSERVATIONS
Present study classifies tweets into six different emotions namely anger, disgust, fear, joy, sadness and
surprise. Also we classified these tweets into three polarities namely positive, negative and neutral. Table 1
shows count of sentiment for different sentiments and table 2 shows polarity count for each type of
polarity.Figure 1 and figure 2 shows sentiment analysis about Uri attack according to emotions and polarity.
From table 1, it is observed that most of the people have fear about this event. Fear count was highest among
all the emotions which are equal to 55.59%. Table also shows that anger and disgust emotions are little bit
similar and are 18.46% and 19.63% respectively. Some tweets are related to sadness and are equal to 0.62%.
Table also shows that some people were surprised about Uri attack. But shameful fact is that 4.75% people were
in joy about Uri attack. This group of people may be terrorist or terrorist supporter. Overall we say that near
about 94.3% people have fear, anger, sadness and disgust emotions, and as 5.7% people were surprised and
joyed about Uri attack.
Polarity of tweet’s aim is to identify overall conceptual polarity of writer. Table 2 shows polarity of the
tweets. From table 2, we observe that 67.17% tweets express negativety, 14.93% express neutral stance and
17.90% were positive.
According to human psychology, any attack or terrorist activity generates panic. Present work accurately
classifies people’s emotions about Uri attack.
V. CONCLUSION
Uri attack was an attack by four terrorist on 18-sep-2016 on Indian military camp located near Uri town in
Jammu And Kashmir State. This attack was condemned by the world. People all over the world tweeted on Uri
attack. Tweets were extracted using Twitter API. This text data is pre-process by different technique to extract
features which were helpful to identifies emotions and polarity of tweets. Present study accurately classifies
emotions as per human psychology. It is useful to discover opinion/ sentiments of people when they tweet. It
also helps in identifying polarity of tweets. All this is carried out using R studio and its text mining packages.
All over the world many netizens have tweeted on this event but only 5000 tweets were considered for
analysis due to restriction of space and processing capability, remaining tweets with their emotions and polarity
were not considered. In future big data analysis technique can be used to classify all emotions for large volume
of tweets.
REFERENCES
[1] M.Govindarajan, Sentiment Analysis of Movie Reviews using Hybrid Method of Naive Bayes and Genetic Algorithm , International
Journal of Advanced Computer Research (ISSN (print): 2249-7277 ISSN (online): 2277-7970), Volume-3 Number-4 Issue-13
December-2013
[2] Apoorv Agarwal, BoyiXie Ilia Vovsha, Owen Rambow, Rebecca Passonneau, Sentiment Analysis of Twitter Data
[3] Apoor v Agarwal, Jasneet Singh Sabharwal, End-to-End Sentiment Analysis of Twitter Data, Proceedings of the Workshop on
Information Extraction and Entity Analytics on Social Media Data, pages 39–44,COLING 2012, Mumbai, December 2012.
[4] V.K. Singh, R. Piryani, A. Uddin,P. Waila, Sentiment analysis of movie reviews: A new feature-based heuristic for aspect-level
sentiment classification, Conference Paper March 2013, DOI: 10.1109/iMac4s.2013.6526500
[5] Alexander Pak, Patrick Paroubek, Twitter as a Corpus for Sentiment Analysis and Opinion Mining
[6] Ajinkya Ingle, Anjali Kante, ShriyaSamak, Anita Kumari, Sentiment Analysis of Twitter Data Using Hadoop, International Journal of
Engineering Research and General Science Volume 3, Issue 6, November-December, 2015, ISSN 2091-2730, www.ijergs.org
[7] Sangeeta, Twitter Data Analysis Using FLUME & HIVE on HadoopFrameWork,Special Issue on International Journal of Recent
Advances in Engineering & Technology (IJRAET) V-4 I-2 For National Conference on Recent Innovations in Science, Technology &
Management (NCRISTM) ISSN (Online): 2347-2812, Gurgaon Institute of Technology and Management, Gurgaon 26th to 27th
February 2016
[8] VarshaSahayak, VijayaShete, ApashabiPathan, Sentiment Analysis on Twitter Data, International Journal of Innovative Research in
Advanced Engineering (IJIRAE) ISSN: 2349-2163, Issue 1, Volume 2 (January 2015)
[9] http://www.rstudio.com accessed 10-Feb-2016
[10] https://cran.r-project.org/web/packages/ accessed 15-Feb-2016
AUTHOR PROFILE
Mr. Dipak R. Kawade is working asassistant professor in department of Computer Science at Sangola
College affiliated to Solapur University (MS) India. Research Interest are Data mining, Text Mining, Digital
Image Processing, Pattern mining etc.
Dr. Kavita S. Oza working asassistant professor in department of Computer Science at Shivaji University,
Kolhapur (MS) India. Her area of research is machine learning and big data analytic.