Sentiment Classification in Social Media
An Analysis of Methods and the Impact of Emoticon Removal
ANDREAS PÅLSSON
DANIEL SZERSZEN
1 Introduction
1.1 Problem Statement
1.2 Research Question
1.3 Hypothesis
1.4 Scope
1.5 Structure
2 Background
2.1 Sentiment Classification
2.2 Methods of Sentiment Classification
2.2.1 Learning-based Classifiers
2.2.2 Lexicon-based Classifiers
2.3 Measuring Performance
2.3.1 Recall
2.3.2 Precision
2.3.3 F-measure
2.4 Related Work
3 Method
3.1 Programming Frameworks
3.2 The Data
3.3 Preprocessing
3.4 Lexicon-based classifiers
3.4.1 Neutrality Thresholds
3.5 Learning-based classifiers
3.5.1 Naive Bayes
3.5.2 Support Vector Machine
3.5.3 Maximum Entropy
4 Results
4.1 VADER
4.2 VADER preprocessed
4.3 Naive Bayes
4.4 Naive Bayes preprocessed
4.5 Support Vector Machine
4.6 Support Vector Machine preprocessed
4.7 Maximum Entropy
4.8 Maximum Entropy preprocessed
4.9 Comparison of the Results
5 Discussion
5.1 Size of Data Set and Domain-Specific Knowledge
5.2 Imbalanced Data Set
5.3 Cross-validation
5.4 Preprocessing
5.5 Twitter Data Set
5.6 Finding the Optimal Parameters
6 Future Research
6.1 Larger Data Set
6.2 Neural Networks
6.3 Preprocessing
7 Conclusion
Chapter 1
Introduction
Social media is a growing source of data and a growing channel for spreading information. However, the information is convoluted with varying interests, opinions and emotions. Moreover, this form of communication lacks standardized grammar and spelling, and makes frequent use of slang, sarcasm and abbreviations. These variables can make extracting critical points, facts, and the sentiment of a message difficult when several of them are present. Through natural language processing (NLP) it is possible
to study and analyze these messages and objectively classify sentiments presented
in social media.
Sentiment classification is the task of labeling data with a polarity through
analysis of the properties contained within the data. Classification can be binary,
meaning either positive or negative, or describe a detailed range of polarity at
the expense of increased implementation complexity. Social media increases the
complexity of the problem, necessitating analysis of informal communication which
does not necessarily adhere to any grammatical or contextual rules. An interesting
aspect of this topic is the difference between spoken and written language and
evaluating which variables are the most important in conveying sentiment in written
form.
Ent) and Support Vector Machine (SVM). In addition, the classifiers will evaluate
differently processed data sets to examine the effects of emoticons on their perfor-
mance. Therefore, the main research question is:
1.3 Hypothesis
The machine-learning classifiers are expected to outperform the lexicon-based approach due to their strength in pattern recognition when trained correctly. This
attribute should prove useful in recognizing strong sentiment-laden properties in the
informal language and lead to improved classification results.
In addition, the better prepared data sets are expected to improve classification
performance. More specifically, classification of preprocessed sets should outperform
their unprocessed counterparts, considering that a generalized vocabulary is easier to process. Though emoticons carry strong emotional connotations, the noise they introduce makes classification more difficult.
1.4 Scope
The scope of the report is limited to investigating the performance of classifiers on
differently processed data sets of Tweets1 and examining the effects of emoticons on
sentiment classification. The classifiers include a lexicon-based and a learning-based
approach, limited to three different algorithms for the learning-based classifier. The
algorithms explored are naive Bayes, Maximum Entropy and Support Vector Ma-
chine. A single sentiment lexicon is utilized for the lexicon-based classifier VADER.
Classification is tested on a data set consisting of 4200 manually annotated Tweets.
Two variations of this data set are used: an unprocessed set with emoticons included,
and a preprocessed set where emoticons have been replaced with sentiment-laden
words corresponding to the sentiment value of the emoticon.
1 https://twitter.com
1.5 Structure
The report is structured into six main sections consisting of Background, Method,
Results, Discussion, Future Research and Conclusion. Background details all the
essential information surrounding the theory and state-of-the-art of the topic in or-
der to assist with understanding the following sections of the report. It also includes
a subsection dedicated to related work outlining similar or related research on the
topic. Method describes the necessary procedure of preparing and implementing
the sentiment classifiers with the accompanied data set. The performance of the
classifiers on the different data sets is presented in Results. An attempt to explain
the results, propose possible improvements and follow-up investigations is included
in the Discussion. Following the discussion is a section detailing Future Research
that could be pursued. Finally, the Conclusion answers the research questions based
on the results of the investigation.
Chapter 2
Background
Text classification refers to the automated process of dividing and labeling units of text into separate, predefined categories, also known as classes. Text classification can be used to extract the topic from a text but can also include sentiment
classification. Sentiment classification, or sentiment analysis, relates to the polarity
classification of a text, i.e. deciding whether the text is positive, negative or neutral.
NLP is used to systematically examine and evaluate the sentiment conveyed in text
and label it with a corresponding sentiment class.
The popularity of social media has increased the interest and importance of
sentiment classification (Kiritchenko et al 2014). The substantial amounts of data
which they produce increase the need for an automated process of structuring and
categorizing the data, which has potential for a multitude of commercial, political
and social applications. These include, but are not limited to, trend recognition, market prediction, spam and flame detection, decision making and popularity analysis. Seeing as 97% of comments on MySpace deviate from standard formal written English (Thelwall, 2009), being able to correctly classify informal text is becoming increasingly important. The availability of the data can allow businesses
to analyze their customers in larger volumes and detect if their product is in the
target domain, while consumers can gain access to information about their interests
allowing for more informed choices (Pang, Lee, 2008).
Certain approaches to sentiment analysis separate the problem into a two-step
process (Wiebe et al, 2005; Yang & Cardie 2013). First the text is analyzed to
determine if it is objective or subjective. The subjective texts are then classified as
positive, negative or neutral (Yu & Hatzivassiloglou, 2003). However, this approach
can cause additional errors in classification in cases where a text is mislabeled as
either objective or subjective. In addition, objective texts are not necessarily free
of sentiment-laden statements and could thus possibly be handled by the subjective
classifier (Kiritchenko et al., 2014). The neutral class has often been omitted in
research or included in the positive and negative classes, but results have shown
that distinguishing neutral cases from the positive and negative classes can yield
increased classification accuracy (Koppel and Schler 2006).
Feature Extraction
Feature extraction refers to the process of extracting features (e.g. words, sequences
of words) from text. In learning-based algorithms these features are extracted and
put into a bag-of-features, a data set suitable for machine learning algorithms.
Term frequency is another feature that often is an accurate indicator of class
belonging, e.g. text containing many occurrences of “happy” is likely of positive
sentiment. However, longer texts naturally contain more word occurrences, and terms that occur very frequently across documents are less informative than features that occur rarely. Features therefore need to be weighted accordingly, which is known as tf-idf (term frequency times inverse document frequency).
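The weighting above can be sketched in a few lines of Python. This is a minimal illustration using raw term counts and a logarithmic idf; production implementations (e.g. scikit-learn's TfidfVectorizer) add smoothing and normalization on top of this idea, and the toy corpus below is invented for illustration.

```python
from collections import Counter
import math

def tfidf(docs):
    """Compute tf-idf weights for a list of tokenized documents.

    tf is the raw term count within a document; idf is log(N / df),
    where df counts the documents in which the term appears.
    """
    n = len(docs)
    # document frequency: in how many documents does each term appear?
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({term: count * math.log(n / df[term])
                        for term, count in tf.items()})
    return weights

docs = [["happy", "happy", "great"],
        ["sad", "awful"],
        ["happy", "sad"]]
w = tfidf(docs)
```

Note how "great", which occurs in only one document, outweighs "happy" in the first document even though "happy" appears there twice: rarity across documents compensates for raw frequency.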
Naive Bayes
A naive Bayes classifier utilizes a simple methodology to calculate the probability of a text belonging to a class. It rests on the naive assumption that the features of a text are independent. This assumption and Bayes' theorem form the basis of an NB classifier. In the following equation C_k is a class and x is a feature vector:

p(C_k | x) = p(C_k) p(x | C_k) / p(x)    (2.1)
Despite the classifier's simplicity, it can compete with other state-of-the-art sentiment classifiers, and can be considered a strong competitor in the field as it is easy to implement and achieves high performance scores (Rennie et al., 2003).
Given the assumption of independence between features given the class C, the
following proportionality is obtained:
p(C_k | x_1, ..., x_n) ∝ p(C_k) ∏_{i=1}^{n} p(x_i | C_k)    (2.2)
A number of different NB classifiers exist, and the ones most suitable for classi-
fying texts are Bernoulli NB and multinomial NB. The two classifiers differ in how
the features are represented. In Bernoulli classifiers features are independent bi-
nary variables (booleans), whereas a multinomial NB classifier uses feature vectors
representing the frequencies of feature occurrences.
Bernoulli classifiers have been shown to outperform the multinomial classifiers
when the vocabulary of the data set is small (McCallum et al, 1998). However,
term frequency is of greater importance when classifying longer texts with a larger
vocabulary, and so for longer texts the multinomial variant is preferred (McCallum
et al, 1998).
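The distinction between the two variants can be illustrated with scikit-learn (one of the frameworks used in this report): the only difference on the feature side is whether the vectors hold binary presence indicators or raw counts. The tiny training corpus and labels below are invented for illustration, not taken from the experiments.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

train = ["happy happy good fun", "great happy fun",
         "sad bad awful", "terrible sad bad"]
labels = ["pos", "pos", "neg", "neg"]

# Multinomial NB: feature vectors hold term frequencies
counts = CountVectorizer()
X_counts = counts.fit_transform(train)
mnb = MultinomialNB().fit(X_counts, labels)

# Bernoulli NB: feature vectors hold binary presence indicators
binary = CountVectorizer(binary=True)
X_bin = binary.fit_transform(train)
bnb = BernoulliNB().fit(X_bin, labels)

test = ["happy good day"]  # "day" is out-of-vocabulary and is ignored
mnb_pred = mnb.predict(counts.transform(test))[0]
bnb_pred = bnb.predict(binary.transform(test))[0]
```

On this toy corpus both variants agree; the difference shows up on longer texts, where repeated terms contribute to the multinomial likelihood but not to the Bernoulli one.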
Maximum Entropy
Maximum Entropy is an alternative method of calculating the probability of class membership. The method's underlying principle is that uniform distributions should be preferred (Nigam et al., 1999). As opposed to the NB method, MaxEnt makes no assumptions about feature independence. This is more in line with intuition, and MaxEnt has been shown to be effective in a number of different NLP applications (Berger et al., 1996), and sometimes outperforms NB in standard text classification (Nigam et al., 1999). The probability of data d belonging to class c in a MaxEnt estimation is as follows:

P_ME(c | d) := (1/Z(d)) exp(Σ_i λ_{i,c} F_{i,c}(d, c))    (2.5)
Negation Words
One of the major drawbacks of using a lexicon-based classifier is that the context
in which the text is found is often neglected. Consider the following example:
This restaurant was actually neither that good, nor super trendy.
Even though the sentence carries an overall negative sentiment, the words that
will carry the most weight are “good” and “trendy”, and hence the sentence’s class
will be predicted as positive. The problem is that the classifier has no concept of
negation and intensification. One possible solution is to invert the polarity of a
word when it is found next to a negation word, e.g. not. However, one problem is
that negation words are often placed long before the word they are negating. In the
example above, neither “neither” nor “nor” is next to the word they are negating.
A possible improvement that might fix this is looking for negation words before the
word until a clause boundary marker is found (Taboada et al., 2011).
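The polarity-inversion idea can be sketched as follows. The negator list, clause-boundary markers and toy lexicon are invented for illustration; the scan-back-to-a-clause-boundary behavior is approximated by carrying a negation flag forward and resetting it at boundaries.

```python
NEGATORS = {"not", "neither", "nor", "never", "no"}
CLAUSE_BOUNDARIES = {",", ".", ";", "but"}

def score_with_negation(tokens, polarities):
    """Sum word polarities, inverting a word's polarity if a
    negation word precedes it within the same clause."""
    total = 0.0
    negated = False
    for tok in tokens:
        if tok in CLAUSE_BOUNDARIES:
            negated = False           # negation does not cross clause boundaries
        elif tok in NEGATORS:
            negated = True
        elif tok in polarities:
            score = polarities[tok]
            total += -score if negated else score
    return total

lex = {"good": 0.7, "trendy": 0.4}
tokens = "this restaurant was neither that good nor super trendy".split()
score = score_with_negation(tokens, lex)
```

Here "neither" flips both "good" and "trendy", so the example sentence from above correctly comes out negative instead of positive.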
Intensifiers
Intensifiers are words that alter the polarity value of another word or phrase, e.g. "super" or "slightly", and are largely ignored in lexicon-based approaches. These words are
important for classifying the sentiment value of a phrase, seeing as “very good” has
a higher polarity value than “good”. A possible solution is to increase the polarity
value of a word when placed next to an amplifier (e.g. “very”, “really”), and decrease
it when found next to a downtoner (e.g. “slightly”, “somewhat”).
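This amplifier/downtoner scheme can be sketched as a multiplicative adjustment. The modifier weights below are invented for illustration; in practice they would come from a tuned lexicon (cf. Taboada et al., 2011).

```python
# hypothetical modifier weights: amplifiers scale polarity up,
# downtoners scale it down
MODIFIERS = {"very": 1.5, "really": 1.4, "super": 1.5,
             "slightly": 0.6, "somewhat": 0.7}

def score_with_intensifiers(tokens, polarities):
    """Sum word polarities, scaling a word's polarity when it
    directly follows an amplifier or downtoner."""
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok in polarities:
            score = polarities[tok]
            if i > 0 and tokens[i - 1] in MODIFIERS:
                score *= MODIFIERS[tokens[i - 1]]
            total += score
    return total

lex = {"good": 0.5}
```

With these weights, "very good" scores higher than plain "good", and "slightly good" scores lower, matching the intuition described above.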
Increasing a lexicon-based classifier’s ability to handle negations and intensifiers
greatly improves on its weaknesses, and consistently scores high accuracy on texts
from different domains, despite the classifier’s simplicity (Taboada et al., 2011).
2.3.1 Recall
Recall is measured as the fraction of the relevant instances retrieved. However,
since it does not take false positives into account, a perfect recall can be achieved
by classifying every text as that class.
recall = TP / (TP + FN)    (2.6)
2.3.2 Precision
Precision is measured as the probability of a predicted class being correct. It can
be used in conjunction with recall to fill in the holes left by only considering a recall
measure, i.e. to distinguish where data has been predicted incorrectly.
precision = TP / (TP + FP)    (2.7)
2.3.3 F-measure
F-measure (or F-score) is a measure of performance that combines precision and recall by calculating their weighted harmonic mean, which covers the flaws of accuracy on skewed data.
F = 2 · (precision · recall) / (precision + recall)    (2.8)
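The three measures can be computed directly from confusion-matrix counts. The counts below are invented for illustration, not taken from the experiments.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and the F1-measure
    (equations 2.6-2.8) from true positive, false positive
    and false negative counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 8 correct positives, 2 false alarms, 4 missed positives
p, r, f = precision_recall_f1(tp=8, fp=2, fn=4)
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two components, so a classifier cannot trade recall for precision (or vice versa) without it showing in the score.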
Equation 2.8 is also known as the F1-score because the relative weight of recall and precision is set to 1. Unless an experiment explicitly necessitates changing the relative weights, the F1 score is the default measure for comparing performance in sentiment classification. A higher F1 score, henceforth referred to as the F-measure, indicates better performance than a lower score.
• Word normalization - Texts in social media often contain words that have been stressed by repeating some of their letters. For example, querying Twitter for the word "haaate" generates a large number of results. To deal with this, Balahur checked whether the word existed in a dictionary and, if it did not, removed stressed letters until a dictionary word was found. For example, "haaate" would become "haate", and finally "hate".
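Balahur's reduction step can be sketched as follows, assuming a set of known dictionary words. Collapsing one doubled letter per iteration is one plausible reading of the description above ("haaate" → "haate" → "hate").

```python
import re

def normalize(word, vocab):
    """Collapse one repeated letter at a time until the word
    appears in the vocabulary or no repeats remain."""
    while word not in vocab:
        reduced = re.sub(r"(.)\1", r"\1", word, count=1)
        if reduced == word:   # no repeated letters left; give up
            break
        word = reduced
    return word
```

Words that never match a dictionary entry are returned with their repeats collapsed as far as possible, which at least groups variants like "haaate" and "haaaaate" together.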
Chapter 3
Method
1 https://python.org
2 version 3.2, http://nltk.org
3 version v0.17.1, http://scikit-learn.org
4 http://comp.social.gatech.edu/papers/
classifiers return sentiment values ranging from -1 to +1. In order to account for
this, the polarity values are normalized to the [-1, 1] range.
Thresholds were set for positive, negative and neutral classes after inspecting the
data set. A sentiment polarity value ≥ 0.2 is considered positive, whereas a value
≤ −0.2 is considered negative. Any value in the range [-0.2, 0.2] is considered to be
of neutral polarity. The thresholds were necessary because validation requires a label of simply positive, negative or neutral that can be compared against the annotations in the data set.
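The threshold mapping can be expressed as a small helper; the default values mirror the ±0.2 thresholds described above.

```python
def label(score, pos=0.2, neg=-0.2):
    """Map a continuous polarity score in [-1, 1] to a class
    label using the neutrality thresholds of Section 3.4.1."""
    if score >= pos:
        return "positive"
    if score <= neg:
        return "negative"
    return "neutral"
```

The same helper can be reused when sweeping the thresholds, as is done for VADER in the results chapter, by passing different `pos` and `neg` values.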
The data set contains 4200 Tweets distributed over the three sentiment classes
of positive, negative and neutral as follows:
3.3 Preprocessing
In order to determine the importance of emoticons when conveying sentiment, the
classifiers will also be tested with a preprocessed set. The data will be preprocessed
by replacing emoticons in the data set with “happy”, “neutral” or “sad”. These
words were chosen for representing the average sentiment value of each class.
The lexicon used in the lexicon-based experiments contains over 200 emoticons,
and they are all manually annotated with a sentiment score in the range [−4, 4],
normalized to the range [−1, 1]. Every Tweet in the data set is then scanned for emoticons. If an emoticon is found, it is replaced with "happy" for sentiment values ≥ 0.2, "sad" for values ≤ −0.2 and "neutral" for the [−0.2, 0.2] range.
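A sketch of this replacement step, with a few invented emoticon scores standing in for the 200-entry lexicon (the real scores come from the annotated lexicon after normalization):

```python
# hypothetical emoticon scores, already normalized to [-1, 1]
EMOTICON_SCORES = {":)": 0.5, ":(": -0.55, ":|": 0.0}

def replace_emoticons(text, scores, pos=0.2, neg=-0.2):
    """Replace each known emoticon with a sentiment-laden word
    matching its class, as described in Section 3.3."""
    words = []
    for tok in text.split():
        if tok in scores:
            s = scores[tok]
            tok = "happy" if s >= pos else "sad" if s <= neg else "neutral"
        words.append(tok)
    return " ".join(words)
```

Splitting on whitespace is a simplification; a real pass over Tweets would need tokenization that keeps emoticons attached to words intact.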
considerations in text. This allows the handling of features, such as negation and
intensifiers, which improves the performance of VADER compared to other lexicon-
based classifiers (Hutto, Gilbert, 2014). The lexicon was created with the incoherent
nature of social media content in mind. Therefore, VADER is considered suitable
for a lexicon-based classifier for social media classification.
The test data is stripped of any labels or annotation and fed to the classifier. The classifier returns a score in the range [−1, 1] for every Tweet. Finally, the resulting label is validated against the label in the original data set. A neutrality threshold needs to be set for this comparison to be meaningful.
unigrams occurring most frequently in the data set. The number of unigrams chosen is varied over the values 10, 20, 50, 100, 200 and 300, in search of the parameters that yield the maximum performance.
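Selecting the n most frequent unigrams as the feature set can be done with a counter. This is a sketch of the idea, not the exact feature-extraction code used in the experiments.

```python
from collections import Counter

def top_unigrams(docs, n):
    """Return the n most frequent unigrams in the corpus,
    to be used as the feature set for the learning-based
    classifiers."""
    counts = Counter(tok for doc in docs for tok in doc.split())
    return [term for term, _ in counts.most_common(n)]

docs = ["happy happy good", "happy sad", "good sad sad"]
features = top_unigrams(docs, 2)
```

Sweeping n as described above then just means calling this with each candidate value and retraining the classifier on the resulting feature set.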
Chapter 4
Results
This section follows up on the method outlined in the previous sections and presents
the results that were achieved. A range of results for different parameters is shown for each classifier, followed by its preprocessed variant. The lexicon-based VADER
is presented first, which is succeeded by the machine learning approaches of NB,
SVM and MaxEnt respectively. Finally, the results of the optimal parameters are
aggregated and compared in one table.
4.1 VADER
Varying the positive neutrality threshold from 0.1 to 0.9 and the negative threshold from −0.1 to −0.9 yielded a maximum F-measure of 72.3%, given a negative threshold of −0.3 and a positive threshold of 0.4.
Figure 4.5. Performance of the SVM classifier on unprocessed data for different α values

Figure 4.6. Performance of the SVM classifier on preprocessed data for different α values
Figure 4.9. Performance of every classifier with optimal parameters with and
without preprocessed data, where PP stands for preprocessed
Chapter 5
Discussion
As shown in the aggregate results, the method yielding the best F-measure is the
lexicon-based VADER classifier. The scores are the highest when the Tweets have
been preprocessed, i.e. all emoticons in the Tweets have been replaced with their
respective label.
The results also show that the more advanced machine-learning approaches performed worse than their lexicon-based counterpart. This section attempts to explain this trend, identifies faults in the proposed method, and suggests improvements that could be included in future research.
wide range of domains. The Tweets used in the experiment spanned many different areas, but the data set was lacking in size. Therefore, it is possible that a less reliable but larger data set could improve the performance of the learning-based classifiers.
5.3 Cross-validation
One method through which the imbalanced data set could have been amended to some degree would have been cross-validation, e.g. 5-fold cross-validation, to gain a better distribution of the negative, positive and neutral training data. This
would not solve the problems caused by the size of the data set and domain-specific
knowledge, but could improve the distribution of the sentiments in the training
data, and consequently allow for a more balanced performance between the positive,
negative and neutral classifications.
5.4 Preprocessing
The increase in performance when the emoticons in the data set were replaced was measured at up to 5%. This might not be in line with intuition, considering that emoticons and other informal features are important for conveying sentiment in social media
(Davidov et al., 2010). However, an explanation for this could be that the words
“happy”, “neutral” and “sad” occurred more frequently in the data set after pre-
processing, and as a result, allowed the classifiers to more accurately weigh these
unigrams.
The data set contained many of the informal features that are frequent in social media text. These features were not removed, as they were considered important for conveying sentiment (Davidov et al., 2010). Generalizing the language through preprocessing in order to improve classifier performance has been used before (Balahur, 2013). That preprocessing removed repeated punctuation and normalized capitalization, among other replacements. In other words, the generalization was done to such an extent that most of the informal sentiment-laden features were removed and the data was remodeled into formal text. As is evident from the results in this report, removing detail and abstracting the problem through preprocessing can be a suitable method for improving classification performance in social media.
For the learning-based classifiers, in particular MaxEnt, it is important to note
that features that occur rarely in the data set, i.e. less than the minimum number of
occurrences required to be considered for training, would not have an impact on the
classifier's performance. By replacing the occurrences of emoticons with the words "happy", "sad", or "neutral", and thus increasing the frequency of these words, the replaced emoticons would be correctly accounted for by the classifier. In addition,
features that were not previously considered could have been included in training
and evaluated accurately by the classifiers.
Preprocessing had the least effect on the NB classifier. An explanation for
this could be that the NB classifier was a Bernoulli NB classifier. In other words,
the classifier only weighed the features based on their presence rather than their
frequency. Therefore, when the emoticons were replaced and certain words became
more frequent, it did not have a significant effect on how the NB classifier evaluated
the data. An alternative would have been to use a Multinomial NB classifier which
evaluates features based on their frequency, in which case, preprocessing could have
had a more noticeable impact on the classifier’s performance.
VADER’s performance differed by 2.6 percentage points when comparing the
results between the unprocessed and preprocessed data sets. The lexicon-based
classifier evaluates the sentiment according to the weights contained in its sentiment
lexicon. While it may seem less accurate than evaluating each emoticon with its
specific weight, the performance difference could be explained by the fact that the
emoticons were replaced with words corresponding to a value closer to the average
of that class. In other words, emoticons with values close to the thresholds for each
class increased in value and thus conveyed sentiment stronger. Simultaneously, the
more sentiment-laden emoticons decreased in value, but the performance gain could
be attributed to the overall increase in words conveying strong sentiment.
Preprocessing might affect differently sized data sets in varying ways. If the
data set used for training is larger and contains enough occurrences of informal
features for the classifiers to be trained on, generalizing the language could penalize
the performance. However, if the data set is smaller and does not contain enough
of these informal features, it is likely that preprocessing the language improves the
performance of the classifier. The accuracy of this statement and the effects of more
sophisticated preprocessing methods could be investigated further.
Chapter 6
Future Research
6.3 Preprocessing
Preprocessing was shown to have a positive impact on the performance of the classifiers. While the effects of extensive preprocessing have been explored before (Balahur, 2013), the extent of the preprocessing in this report was relatively minimal. A more balanced approach that avoids removing all informal features could be explored, i.e. normalizing these features so as to retain the difference in sentiment conveyed by e.g. "!" and "!!!!!". The chosen approach only evaluated text based on three classes; the importance of retaining these features could be more evident if additional classes were introduced.
Chapter 7
Conclusion
The results give reason to believe that a lexicon-based approach is the best choice for
sentiment classification in social media. The simplicity of the lexicon-based classifier
coupled with not requiring resource-costly training data makes it a strong contender
for social media sentiment classification. A generalized vocabulary improves the
performance of the classifiers, which proposes that further language abstraction
enhances classification performance in social media. Preprocessing the data set is
therefore a successful method for improving sentiment classification results.
Bibliography
Aue, A. and Gamon, M., 2005. Customizing sentiment classifiers to new domains: A case study. In Proceedings of the International Conference on Recent Advances in Natural Language Processing, Borovets, Bulgaria. http://www.msr-waypoint.com/pubs/65430/new_domain_sentiment.pdf (visited on 29/3/2016)

Agarwal, A., Xie, B., Vovsha, I., Rambow, O. and Passonneau, R., 2011, June. Sentiment analysis of twitter data. In Proceedings of the Workshop on Languages in Social Media (pp. 30-38). Association for Computational Linguistics. http://www.cs.columbia.edu/~julia/papers/Agarwaletal11.pdf (visited on 4/4/2016)

Balahur, A., 2013, June. Sentiment analysis in social media texts. In 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (pp. 120-128). http://www.aclweb.org/anthology/W13-1617 (visited on 1/4/2016)

Berger, A.L., Pietra, V.J.D. and Pietra, S.A.D., 1996. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1), pp. 39-71. http://www.isi.edu/natural-language/people/ravichan/papers/bergeretal96.pdf (visited on 8/4/2016)

Davidov, D., Tsur, O. and Rappoport, A., 2010, August. Enhanced sentiment learning using twitter hashtags and smileys. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters (pp. 241-249). Association for Computational Linguistics.

Go, A., Bhayani, R. and Huang, L., 2009. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1, p. 12. http://s3.eddieoz.com/docs/sentiment_analysis/Twitter_Sentiment_Classification_using_Distant_Supervision.pdf (visited on 10/3/2016)

Hutto, C.J., Yardi, S. and Gilbert, E., 2013, April. A longitudinal study of follow predictors on twitter. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 821-830). ACM. http://comp.social.gatech.edu/papers/follow_chi13_final.pdf (visited on 14/3/2016)

Hutto, C.J. and Gilbert, E., 2014, May. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Eighth International AAAI Conference on Weblogs and Social Media. http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf (visited on 22/3/2016)

Kiritchenko, S., Zhu, X. and Mohammad, S., 2014. Sentiment analysis of short informal texts. Journal of Artificial Intelligence Research, 50. https://www.jair.org/media/4272/live-4272-8102-jair.pdf (visited on 29/3/2016)

Koppel, M. and Schler, J., 2006. The importance of neutral examples for learning sentiment. Computational Intelligence, 22(2), pp. 100-109. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.9735&rep=rep1&type=pdf (visited on 21/3/2016)

McCallum, A. and Nigam, K., 1998, July. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization (Vol. 752, pp. 41-48). http://www.cs.cmu.edu/~knigam/papers/multinomial-aaaiws98.pdf (visited on 3/4/2016)

Musto, C., Semeraro, G. and Polignano, M., 2014. A comparison of lexicon-based approaches for sentiment analysis of microblog posts. Information Filtering and Retrieval, p. 59. http://ceur-ws.org/Vol-1314/paper-06.pdf (visited on 14/4/2016)

Nigam, K., Lafferty, J. and McCallum, A., 1999, August. Using maximum entropy for text classification. In IJCAI-99 Workshop on Machine Learning for Information Filtering (Vol. 1, pp. 61-67). http://www.kamalnigam.com/papers/maxent-ijcaiws99.pdf (visited on 5/4/2016)

Pang, B. and Lee, L., 2004, July. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics (p. 271). Association for Computational Linguistics.

Pang, B. and Lee, L., 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), pp. 1-135. http://www.cs.cornell.edu/home/llee/omsa/omsa.pdf (visited on 12/4/2016)

Pang, B., Lee, L. and Vaithyanathan, S., 2002, July. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10 (pp. 79-86). Association for Computational Linguistics. http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf (visited on 10/3/2016)

Rennie, J.D., Shih, L., Teevan, J. and Karger, D.R., 2003, August. Tackling the poor assumptions of naive Bayes text classifiers. In ICML (Vol. 3, pp. 616-623). http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf (visited on 21/3/2016)

Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y. and Potts, C., 2013, October. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (Vol. 1631, p. 1642).

Taboada, M., Brooke, J., Tofiloski, M., Voll, K. and Stede, M., 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2), pp. 267-307. http://www.mitpressjournals.org/doi/pdfplus/10.1162/COLI_a_00049 (visited on 8/4/2016)

Thelwall, M., 2009. MySpace comments. Online Information Review, 33, pp. 58-76. http://www.emeraldinsight.com/doi/abs/10.1108/14684520910944391 (visited on 12/4/2016)

Yang, B. and Cardie, C., 2013. Joint inference for fine-grained opinion extraction. In ACL (1) (pp. 1640-1649). https://aclweb.org/anthology/P/P13/P13-1161.pdf (visited on 15/4/2016)

Yu, H. and Hatzivassiloglou, V., 2003. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2003), Sapporo, Japan, pp. 129-136.

Wiebe, J., Wilson, T. and Cardie, C., 2005. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2-3), pp. 165-210. https://www.cs.cornell.edu/home/cardie/papers/lre05withappendix.pdf (visited on 15/4/2016)