Sentiment Analysis of Conditional Sentences
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 180–189,
Singapore, 6-7 August 2009. © 2009 ACL and AFNLP
the consequent clause, that are dependent on each other. Their relationship has significant implications on whether the sentence describes an opinion. One simple observation is that sentiment words (also known as opinion words) (e.g., great, beautiful, bad) alone cannot distinguish an opinion sentence from a non-opinion one. A conditional sentence may contain many sentiment words or phrases, but express no opinion.

Example 1: If someone makes a beautiful and reliable car, I will buy it expresses no sentiment towards any particular car, although "beautiful" and "reliable" are positive sentiment words.

This, however, does not mean that a conditional sentence cannot express opinions/sentiments.

Example 2: If your Nokia phone is not good, buy this great Samsung phone is positive about the "Samsung phone" but does not express an opinion on the "Nokia phone" (although the owner of the "Nokia phone" may be negative about it). Clearly, if the sentence did not have "if", the first clause would be negative. Hence, a method for determining sentiments in normal sentences will not work for conditional sentences. The examples below further illustrate the point.

In many cases, both the condition and the consequent together determine the opinion.

Example 3: If you are looking for a phone with good voice quality, don't buy this Nokia phone is negative about the "voice quality" of the "Nokia phone", although there is a positive sentiment word "good" in the conditional clause modifying "voice quality". However, in the following example, the opinion is just the opposite.

Example 4: If you want a phone with good voice quality, buy this Nokia phone is positive about the "voice quality" of the "Nokia phone".

As we can see, sentiment analysis of conditional sentences is a challenging problem.

One may ask whether there is a large enough percentage of conditional sentences to warrant a focused study. Indeed, there is a fairly large proportion of such sentences in evaluative text, and they can have a major impact on sentiment analysis accuracy. Table 1 shows the percentage of conditional sentences (sentences containing the words if, unless, assuming, etc.) and also the total number of sentences from which we computed the percentage in several user forums. The figures suggest that there is considerable benefit to be gained by developing techniques that can analyze conditional sentences.

Table 1: Percent of conditional sentences

Source          % of cond. (total # of sent.)
Cellphone       8.6  (47711)
Automobile      5.0  (8113)
LCD TV          9.92 (258078)
Audio Systems   8.1  (5702)
Medicine        8.29 (160259)

To the best of our knowledge, there is no focused study on conditional sentences. This paper makes such an attempt. Specifically, we determine whether a conditional sentence (also called a conditional in the linguistic literature) expresses positive, negative or neutral opinions on some topics/features. Since our focus is on studying how conditions and consequents affect sentiments, we assume that the topics are given; they are product attributes, since our data sets are user comments on different products.

Our study is conducted from two perspectives. We start with the linguistic angle to gain a good understanding of existing work on different types of conditionals. As conditionals can be expressed with words or phrases other than if, we study how they behave compared to if. We also show the distribution of these conditionals in our data sets.

With this linguistic knowledge, we perform a computational study using machine learning. A set of features for learning is designed to capture the essential determining information. Note that the features here are data attributes used in learning rather than product attributes or features. Three classification strategies are designed to study how best to perform the classification task, given the complex situation of two clauses and their interactions in conditional sentences: clause-based, consequent-based and whole-sentence-based classification. Clause-based classification classifies each clause separately and then combines the results. Consequent-based classification uses only consequents for classification, as it is observed that in conditional sentences it is often the consequent that decides the opinion. Whole-sentence-based classification treats the entire sentence as a whole. Experimental results on conditional sentences from diverse domains demonstrate the effectiveness of these classification models. The results indicate that the whole-sentence-based classifier performs the best.

Since this paper only studies conditional sentences, a natural question is whether the proposed technique can be easily integrated into an overall sentiment analysis or opinion mining system. The answer is yes, because a large proportion of conditional sentences can be detected using conditional connectives. Keyword search is
thus sufficient to identify such sentences for special handling using the proposed approach. There are, however, some subtle conditionals which do not use normal conditional connectives and will need an additional module to identify them, but such sentences are very rare, as Table 2 indicates.

2 The Problem Statement

The paper follows the feature-based sentiment analysis model in (Hu and Liu, 2004; Popescu and Etzioni, 2005). We are particularly interested in sentiments on products and services, which are called objects or entities. Each object is described by its parts and attributes, which are collectively called features in (Hu and Liu, 2004; Liu, 2006). For example, in the sentence If this camera has great picture quality, I will buy it, "picture quality" is a feature of the camera. For formal definitions of objects and features, please refer to (Liu, 2006; Liu, 2009). In this paper, we use the term topic to mean feature, as feature here can be confused with the features used in machine learning. The term topic has also been used by other researchers (e.g., Kim and Hovy, 2004; Stoyanov and Cardie, 2008).

Our objective is to predict the sentiment orientation (positive, negative or neutral) on each topic that has been commented on in a sentence. The problem of automatically identifying features or topics being spoken about in a sentence has been studied in (Hu and Liu, 2004; Popescu and Etzioni, 2005; Stoyanov and Cardie, 2008). In this work, we do not attempt to identify such topics automatically. Instead, we assume that they are given, because our objective is to study how the interaction of the condition and consequent clauses affects sentiments. For this purpose, we manually identify all the topics.

3 Conditional Sentences

This section presents the linguistic perspective on conditional sentences.

Table 2: Percentage of sentences with some main conditional connectives

Conditional Connective   % of sentences
If                       6.42
Unless                   0.32
Even if                  0.17
Until                    0.10
As (so) long as          0.09
Assuming/supposing       0.04
In case                  0.04
Only if                  0.03

3.1 Conditional Connectives

A large majority of conditional sentences are introduced by the subordinating conjunction if. However, there are also many other conditional connectives, e.g., even if, unless, in case, assuming/supposing, as long as, etc. Table 2 shows the distribution of conditional sentences with various connectives in our data. Detailed linguistic discussions of them are beyond the scope of this paper; interested readers may refer to (Declerck and Reed, 2001). Below, we briefly discuss some important ones and their interpretations.

If: This is the most commonly used conditional connective. In addition to its own usage, it can also be used to replace other conditional connectives, except some semantically richer connectives (Declerck and Reed, 2001). Most (but not all) conditional sentences can be logically expressed in the form 'If P then Q', where P is the condition clause and Q is the consequent clause. For practical purposes, we can automatically segment the condition and consequent clauses using simple rules generated by observing grammatical and linguistic patterns.

Unless: Most conditional sentences containing unless can be replaced with equivalent sentences with an if and a not. For example, the sentence Unless you need clarity, buy the cheaper model can be expressed as If you don't need clarity, buy the cheaper model.

Even if: Linguistic theories claim that even if is a special case of a conditional which may not always imply an if-then relationship (Gauker, 2005). However, in our datasets, we have observed that the usage of even if almost always translates into a conditional. Replacing even if by if yields a sentence that is semantically similar enough for the purpose of sentiment analysis.

Only if, provided/providing that, on condition that: Conditionals involving these phrases typically express a necessary condition, e.g., I will buy this camera only if they can reduce the price. In such sentences, only usually does not affect whether the sentence is opinionated or not.

In case: Conditional sentences containing in case usually describe a precaution (I will close the window in case it rains), prevention (I wore sunglasses in case I was recognized), or a relevance conditional (In case you need a car, you can rent one). Identifying the condition and consequent clauses is not straightforward in many cases. Further, in these instances, replacing in case with if may not convey the intended meaning of the conditional. We have ignored these cases in our analysis, as we believe that they need a separate study, and such sentences are also rare.
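As a minimal illustration of connective-based detection and of the 'If P, Q' segmentation mentioned above, the following sketch uses a small hand-picked connective list and a single comma-based pattern (both are assumptions for illustration; the paper's actual lists and segmentation rules are richer):

```python
import re

# Illustrative subset of conditional connectives (the paper's list also
# includes "until", "assuming", "supposing", "as long as", etc.).
CONNECTIVES = ["even if", "only if", "as long as", "in case", "unless", "if"]

def is_conditional(sentence: str) -> bool:
    """Keyword search: does the sentence contain a conditional connective?"""
    s = sentence.lower()
    return any(re.search(r"\b" + re.escape(c) + r"\b", s) for c in CONNECTIVES)

def split_if_clauses(sentence: str):
    """Crude segmentation of a sentence-initial 'If P, Q' pattern into
    (condition, consequent); returns None when the pattern is absent."""
    m = re.match(r"(?i)^\s*if\s+(.+?),\s*(.+)$", sentence)
    return (m.group(1), m.group(2)) if m else None
```

For example, split_if_clauses("If the acceleration is good, I will buy it") yields the condition "the acceleration is good" and the consequent "I will buy it".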
As (so) long as: Sentences with these connectives behave similarly to if and can usually be replaced with if.

Assuming/Supposing: These are a category of conditionals that behave quite differently. The participles supposing and assuming create conditional sentences where the condition clause and the consequent clause can be syntactically independent. It is quite difficult to distinguish those conditional sentences which contain an explicit consequent clause and fit within our analysis framework. In our data, most such sentences have no consequent, thus representing assumptions rather than opinions. We omit these sentences in our study (they are also rare).

3.2 Types of Conditionals

There are extensive studies of conditional sentences (also known as conditionals) in linguistics. Various theories have led to a number of classification systems. Popular types of conditionals include actualization conditionals, inferential conditionals, implicative conditionals, etc. (Declerck and Reed, 2001). However, these classifications are mainly based on semantic meanings, which are difficult for a computer program to recognize. To build classification models, we instead exploit the canonical tense patterns of conditionals, which are often used in pedagogic grammar books. They are defined based on tense and are associated with general meanings, although, as described in (Declerck and Reed, 2001), their actual meanings are much more complex and numerous than their associated general meanings. The advantage of this classification is that the different types can be detected easily, because they depend on tense, which can be produced by a part-of-speech tagger. As we will see in Section 5, canonical tense patterns help sentiment classification significantly. Below, we introduce the four canonical tense patterns.

Zero Conditional: This conditional form is used to describe universal statements like facts, rules and certainties. In a zero conditional, both the condition and consequent clauses are in the simple present tense. An example of such sentences is: If you heat water, it boils.

First Conditional: Conditional sentences of this type are also called potential or indicative conditionals. They are used to express a hypothetical situation that is probably true, but the truth of which is unverified. In the first conditional, the condition is in the simple present tense, and the consequent can be in either the past tense or the present tense, usually with a modal auxiliary verb preceding the main verb, e.g., If the acceleration is good, I will buy it.

Second Conditional: This is usually used to describe less probable situations, for stating preferences and imaginary events. The condition clause of a second conditional sentence is in the past subjunctive (past tense), and the consequent clause contains a conditional verb modifier (like would, should, might), in addition to the main verb, e.g., If the cell phone was robust, I would consider buying it.

Third Conditional: This is usually used to describe contrary-to-fact (impossible) past events. The past perfect tense is used in the condition clause, and the consequent clause is in the present perfect tense, e.g., If I had bought the a767, I would have hated it.

Based on the above definitions, we have developed approximate part-of-speech (POS) tag patterns [1] for the condition and the consequent of each type (Table 3). These do not cover all sentences, but overall they cover a majority of them. For the uncovered cases, the problem is mainly due to incomplete sentences and wrong grammar, which are typical of informal writing in forum postings and blogs. For example, the sentence Great car if you need powerful acceleration does not fall into any category, but it actually means It is a great car if you need powerful acceleration, which is a zero conditional. To handle such sentences, we designed a set of rules to assign them default types:

If condition contains VB/VBP/VBZ → 0 conditional
If consequent contains VB/VBP/VBZ → 0 conditional
If condition contains VBG → 1st conditional
If condition contains VBD → 2nd conditional
If condition contains VBN → 3rd conditional

Table 3: Tenses for identifying conditional types

Type  Linguistic Rule                               Condition POS tags  Consequent POS tags
0     If + simple present → simple present          VB/VBP/VBZ          VB/VBP/VBZ
1     If + simple present → will + bare infinitive  VB/VBP/VBZ          MD + VB/VBG
2     If + past tense → would + infinitive          VBD                 MD + VB
3     If + past perfect → present perfect           VBD+VBN             MD + VBD

[1] The list of Part-Of-Speech (POS) tags can be found at: http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
By using these rules, we can increase the sentence coverage from 73% to 95%.

4 Sentiment Analysis of Conditionals

We now describe our computational study. We take a machine learning approach to predict sentiment orientations. Below, we first describe the features used and then the classification strategies.

4.1 Feature construction

I. Sentiment words/phrases and their locations: Sentiment words are words used to express positive or negative opinions, which are instrumental for sentiment classification for obvious reasons. We obtained a list of over 6500 sentiment words gathered from various sources. The bulk of it is from http://www.cs.pitt.edu/mpqa. We also added some of our own. Our list is mainly from the work in (Hu and Liu, 2004; Ding, Liu and Yu, 2008). In addition to words, there are phrases that describe opinions. We have identified a set of such phrases. Although obtaining these phrases was time-consuming, it was only a one-time effort. We will make this list available as a community resource. It is possible that there is a better automated method for finding such phrases, such as the methods in (Kanayama and Nasukawa, 2006; Breck, Choi and Cardie, 2007). However, automatically generating sentiment phrases has not been the focus of this work, as our objective is to study how the two clauses interact to determine opinions given that the sentiment words and phrases are known. Our list of phrases is by no means complete and we will continue to expand it in the future.
For each sentence, we also identify whether it contains sentiment words/phrases in its condition or consequent clause. It was observed that the presence of a sentiment word/phrase in the consequent clause has more effect on the sentiment of a sentence.

II. POS tags of sentiment words: Sentiment words may be used in several contexts, not all of which may correspond to an opinion. For example, I trust Motorola and He has a trust fund both contain the word trust, but only the former contains an opinion. In such cases, the POS tags can provide useful information.

III. Words indicating no opinion: Just as some words are related to opinions, there are also a number of words which imply the opposite. Words like wondering, thinking, debating are used when the user is posing a question or expressing doubts. Thus such phrases usually do not contribute an opinion, especially if they are in the vicinity of the if connective. We search a window of 3 words on either side of if to determine if there is any such word. We have compiled a list of these words as well and use it in our experiments.

IV. Tense patterns: These are the canonical tense patterns in Section 3.2. They are used to generate a set of features. We identify the first verb in both the condition and consequent clauses by searching for the relevant POS tags in Table 3. We also search the words preceding the main verb to find modal auxiliary verbs, which are also used as features.

V. Special characters: The presence or absence of '?' and '!'.

VI. Conditional connectives: The conditional connective used in the sentence (if, even if, unless, only if, etc.) is also taken as a feature.

VII. Length of condition and consequent clauses: Using simple linguistic and punctuation rules, we automatically segment a sentence into condition and consequent clauses. The numbers of words in the condition and consequent clauses are then used as features. We observed that when the condition clause is short, it usually has no impact on whether the sentence expresses an opinion.

VIII. Negation words: The use of negation words like not, don't, never, etc., often alters the sentiment orientation of a sentence. For example, the addition of not before a sentiment word can change the orientation of a sentence from positive to negative. We consider a window of 3-6 words before an opinion word, and search for these kinds of words.

The following two features are singled out for easy reference later. They are only used in one classification strategy. The first feature is an indicator, and the second feature has a parameter (which will be evaluated separately).

(1). Topic location: This feature indicates whether the topic is in the condition clause or the consequent clause.

(2). Opinion weight: This feature considers only the sentiment words in the vicinity of the topic, since they are more likely to influence the opinion on the topic. A window size is used to control what we mean by vicinity. The following formula is used to assign a weight to each sentiment word, which is inversely proportional to the distance D_op of the sentiment word to the topic mention. The sentiment value is +1 for a positive word and -1 for a negative word, and sentwords is the set of known sentiment words and phrases:

    weight = Σ_op (±1 / D_op),  ∀ op ∈ {sentwords}
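Read directly, the formula sums, over every sentiment word op within the window, its polarity (+1 or -1) divided by its token distance D_op to the topic mention. A sketch with an assumed toy polarity lexicon:

```python
# Toy polarity lexicon for illustration; the paper uses a 6500+ word list.
POLARITY = {"good": 1, "great": 1, "bad": -1, "poor": -1}

def opinion_weight(tokens, topic_index, window=8):
    """weight = sum of (+/-1) / D_op over sentiment words within `window`
    tokens of the topic mention, D_op being the token distance."""
    weight = 0.0
    for i, tok in enumerate(tokens):
        polarity = POLARITY.get(tok.lower())
        d = abs(i - topic_index)
        if polarity is not None and 0 < d <= window:
            weight += polarity / d
    return weight
```

For the tokens ["a", "good", "voice", "quality"] with the topic quality at index 3, good contributes +1/2, giving a weight of 0.5; closer sentiment words thus dominate the feature value.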
184
4.2 Classification Strategies

Since we are interested in topic-based sentiment analysis, how to perform classification becomes an interesting issue. Because of the two clauses, it may not be sufficient to classify the whole sentence as positive or negative, since in the same sentence some topics may be positive and some may be negative. We propose three strategies.

Clause-based classification: Since there are two clauses in a conditional sentence, in this case we build two classifiers, one for the condition and one for the consequent.
Condition classifier: This method classifies the condition clause as expressing a positive, negative or neutral opinion.
Training data: Each training sentence is represented as a feature vector. Its class is positive, negative or neutral depending on whether the condition clause is positive, negative or neutral while considering both clauses.
Testing: For each test sentence, the resulting classifier predicts the opinion of the condition clause.
Topic class prediction: To predict the opinion on a topic, if the topic is in the condition clause, it takes the predicted class of the clause.
Consequent classifier: This classifier classifies the consequent clause as expressing a positive, negative or neutral opinion.
Training data: Each training sentence is represented as a feature vector. Its class is positive, negative or neutral depending on whether the consequent clause is positive, negative or neutral while considering both clauses.
Testing: For each test sentence, the resulting classifier predicts the opinion of the consequent clause.
Topic class prediction: To predict the opinion on a topic, if the topic is in the consequent clause, it takes the predicted class of the clause.
The combination of these two classifiers is called the clause-based classifier. It works as follows: if a topic is in the condition clause, the condition classifier is used, and if a topic is in the consequent clause, the consequent classifier is used.

Consequent-based classification: It is observed that in most cases the condition clause contains no opinion, whereas the consequent clause reflects the sentiment of the entire sentence. Thus, this method uses (in a different way) only the above consequent classifier. If it classifies the consequent of a test conditional sentence as positive, all the topics in the whole sentence are assigned the positive orientation, and likewise for negative and neutral.

Whole-sentence-based classification: In this case, a single classifier is built to predict the opinion on each topic in a sentence.
Training data: In addition to the normal features, the two features (1) and (2) in Section 4.1 are used for this classifier. If a sentence contains multiple topics, multiple training instances of the same sentence are created in the training data. Each instance represents one specific topic. The class of the instance depends on whether the opinion on the topic is positive, negative or neutral.
Testing: For each topic in each test sentence, the resulting classifier predicts its opinion.
Topic class prediction: This is not needed, as the prediction has been done in testing.

5 Results and Discussions

5.1 Data sets

Our data consists of conditional sentences from 5 different user forums: Cellphone, Automobile, LCD TV, Audio Systems and Medicine. We obtained user postings from these forums and extracted the conditional sentences. We then manually annotated 1378 sentences from this corpus. We also annotated the condition and consequent clauses and identified the topics (or product features) being commented upon, and their sentiment orientations. In our annotation, we observed that sentences with no sentiment words or phrases almost never express opinions, i.e., only around 3% of them express opinions. Around 26% of the sentences in our data contain no sentiment words or phrases. To make the problem challenging, we restrict our attention to only those sentences that contain at least one sentiment word or phrase. We have annotated topics from around 900 such sentences. Table 4 shows the class distributions of this data. At the clause level (topics are not considered), we observe that condition clauses contain few opinions. At the topic level, 43.5% of the topics have positive opinions, 26.4% of the topics have negative opinions, and the rest have no opinions.
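Section 5.2 below evaluates all models with 10-fold cross validation. As a generic illustration of that protocol (not the paper's code, which used LIBSVM), a k-fold index generator can be sketched as:

```python
def k_fold_indices(n, k=10):
    """Yield (train_indices, test_indices) pairs partitioning range(n)
    into k folds, each fold serving once as the held-out test set."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Each classifier is trained on the nine training folds and scored on the held-out fold, and the reported numbers are averages over the ten runs.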
Table 4: Distribution of classes

             Positive  Negative  Neutral
Condition    6.9%      6.7%      86.4%
Consequent   49.3%     16.5%     34%
Topic-level  43.5%     26.4%     29.9%

For the annotation of the data, we assume that topics are known. One student annotated the topics first. Then two students annotated the sentiments on the topics. If a student found that a topic annotation was wrong, he would let us know. Some mistakes and missing topics were found, but they were mainly due to oversights rather than disagreements. The agreement on sentiment annotations was computed using the Kappa score. We achieved a Kappa score of 0.63, which indicates strong agreement. The conflicting cases were then resolved through discussion to reach consensus. We did not find any case on which the annotators absolutely disagreed.

5.2 Experimental results

We now present the results for different combinations of features and classification strategies. For model building, we used Support Vector Machines (SVM), specifically the LIBSVM implementation (Chang and Lin, 2001) with a Gaussian kernel, which produced the best results. All the results are obtained via 10-fold cross validation.

Two-class classification: We first discuss the results for a simpler version of the problem that involves only sentences with positive or negative orientations on some topics (at least one of the clauses must have a positive/negative opinion on a topic). Neutral sentences are not used (~28% of the total). The results of all three classifiers are given in Table 5. The feature sets were described in Section 4.1. For all the experiments below, features (1) and (2) are used only by the whole-sentence-based classifier, not by the other two classifiers, for obvious reasons.

{I+II}: This setting uses sentiment words and phrases, their positions and POS tags as features (we used Brill's POS tagger). This can be seen as the baseline. We observe that both the consequent-based and whole-sentence-based classifiers perform dramatically better than the clause-based classifier. The consequent-based classifier and the whole-sentence-based classifier perform similarly (with the latter being slightly better). The precision, recall, and F-score are computed as the average of the two classes.

{I+II+III}: In this setting, the list of special non-sentiment related words is added to the feature set. All three classifiers improve slightly.

{I+II+III+IV}: This setting includes all the canonical tense based features. We see marked improvements for the consequent-based and whole-sentence-based classifiers in terms of both accuracy and F-score, which are statistically significant compared to those of {I+II+III} at the 95% confidence level based on a paired t-test.

All: When all the features are used, the results of all the classifiers improve further.

Two main observations are worth mentioning:
1. Both the consequent-based and whole-sentence-based classifiers outperform the clause-based classifier dramatically. This confirms our observation that the consequent usually plays the key role in determining the sentiment of the sentence. This is further reinforced by the fact that the consequent-based classifier actually performs similarly to the whole-sentence-based classifier. The condition clause seems to give no help.
2. The linguistic knowledge of canonical tense patterns helps significantly, which shows that such linguistic knowledge is very useful.

We also noticed that many misclassifications are caused by grammatical errors, use of slang phrases and improper punctuation, which are typical of postings on the Web. Due to language irregularities (e.g., wrong grammar, missing punctuation, sarcasm, exclamations), the POS tagger makes many mistakes as well, causing some errors in the tense-based features.

Three-class classification: We now move to the more difficult and realistic case of three classes: positive, negative and neutral (no opinion). Table 6 shows the results. The trend is similar, except that the whole-sentence-based classifier now performs markedly better than the consequent-based classifier. We believe that this is because the neutral class needs information from both the condition and consequent clauses. This is evident from the fact that there is little or no improvement after {I+II} for the consequent-based classifier. We also observe that the accuracies and F-scores for the three-class classification are lower than those for the two-class classification. This is understandable due to the difficulty of determining whether a sentence has an opinion or not. Again, a statistical test shows that the canonical tense-based features help significantly.

Table 5: Two-class classification – positive and negative

                               Clause-based classifier   Consequent-based classifier   Whole-sentence-based classifier
                               Acc.  Prec. Rec.  F       Acc.  Prec. Rec.  F           Acc.  Prec. Rec.   F
I+II (senti. words+POS)        39.9  42.8  34.0  37.9    69.1  72.9  67.1  69.8        68.9  73.7  68.13  70.8
I+II+III (+ non-senti. words)  41.5  44.9  37.1  40.6    69.3  73.9  66.3  69.9        69.2  73.7  63.5   71.0
I+II+III+IV (+ tenses)         42.7  45.2  38.5  41.6    72.7  76.4  72.0  74.1        71.1  77.9  72.2   74.9
All                            43.2  46.1  38.9  42.2    73.3  77.0  72.7  74.8        72.3  77.8  73.6   75.6

Table 7: Accuracy of the whole-sentence-based classifier with varying window sizes (n)

Window size   1     2     3     4     5     6     7     8     9     10
Accuracy      66.1  62.6  64.1  64.8  65.3  65.7  66.3  67.3  66.9  66.8

As mentioned in Section 4.1, the whole-sentence-based classifier only considers those sentiment words in the vicinity of the topic under
investigation. For this, we search a window of n ferent from our work as we are interested in con-
words on either side of the topic mention. To ditional sentences.
study the effect of varying n, we performed an Another important direction is classifying
experiment with various values of the window sentences as subjective or objective, and classify-
size and measured the overall accuracy for each case. Table 7 shows how the accuracy changes as we increase the window size. We found that a window size of 6-10 yielded good accuracies. This is because lower values of n lead to a loss of information about sentiment words, since some sentiment words can be far from the topic. We finally used 8, which gave the best results.
We also investigated ways of using the negation word in a sentence to correctly predict the sentiment. One method is to use the negation word as a feature, as described in Section 4.1. Another is to reverse the orientation of the prediction for those sentences that contain negation words. We found that the former technique yielded better results; the results reported so far are based on it.

6 Related Work

There are several research directions in sentiment analysis (or opinion mining). One of the main directions is sentiment classification, which classifies a whole opinion document (e.g., a product review) as positive or negative (e.g., Pang et al., 2002; Turney, 2002; Dave et al., 2003; Ng et al., 2006; McDonald et al., 2007). It is clearly different from our work. Another direction works at the sentence level, classifying subjective sentences or clauses as positive or negative (Wiebe et al., 1999; Wiebe and Wilson, 2002; Yu and Hatzivassiloglou, 2003; Wilson et al., 2004; Kim and Hovy, 2004; Riloff and Wiebe, 2003; Gamon et al., 2005; McDonald et al., 2007). Although these works deal with sentences, they aim to solve the general problem. This paper argues that there is unlikely to be a one-technique-fits-all solution, and advocates dealing with specific types of sentences differently by exploiting their unique characteristics. Conditional sentences are the focus of this paper; to the best of our knowledge, there is no focused study on them.
Several researchers have also studied feature/topic-based sentiment analysis (e.g., Hu and Liu, 2004; Popescu and Etzioni, 2005; Ku et al., 2006; Carenini et al., 2006; Mei et al., 2007; Ding, Liu and Yu, 2008; Titov and McDonald, 2008; Stoyanov and Cardie, 2008; Lu and Zhai, 2008). Their objective is to extract topics or product features in sentences and determine whether the sentiments expressed on them are positive or negative. Again, no focused study has been made of conditional sentences. Effective handling of conditional sentences can help their effort significantly.
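The window-based feature extraction and negation-as-feature approach discussed in the experiments above can be sketched as follows. This is an illustrative sketch, not the authors' code: the lexicons below are toy stand-ins for the MPQA list used in the paper, and all function and variable names are our own.

```python
# Illustrative sketch: build features for a topic mention from a +/-4-word
# window (window size 8, the best value reported in the experiments), using
# sentiment-word counts and a negation-word indicator as features.

POSITIVE = {"good", "great", "beautiful", "reliable"}   # toy lexicon (paper uses MPQA)
NEGATIVE = {"bad", "poor", "ugly"}
NEGATIONS = {"not", "no", "never", "n't"}

def window_features(tokens, topic_index, window=8):
    # Take up to window/2 tokens on each side of the topic word.
    half = window // 2
    lo = max(0, topic_index - half)
    hi = min(len(tokens), topic_index + half + 1)
    ctx = [t.lower() for t in tokens[lo:hi]]
    return {
        "pos_count": sum(t in POSITIVE for t in ctx),
        "neg_count": sum(t in NEGATIVE for t in ctx),
        "has_negation": int(any(t in NEGATIONS for t in ctx)),
    }

tokens = "If your Nokia phone is not good , buy this great Samsung phone".split()
print(window_features(tokens, tokens.index("Samsung")))
# → {'pos_count': 1, 'neg_count': 0, 'has_negation': 0}
```

Note how the window localizes sentiment evidence to the topic: the same sentence yields `has_negation = 1` for the topic "Nokia", since "not" falls inside its window but outside the window around "Samsung". Such feature vectors would then be fed to an SVM classifier as in the paper's computational study.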
In this work, we used many sentiment words and phrases. These words and phrases are usually compiled using different approaches (Hatzivassiloglou and McKeown, 1997; Kaji and Kitsuregawa, 2006; Kanayama and Nasukawa, 2006; Esuli and Sebastiani, 2006; Breck et al., 2007; Ding, Liu and Yu, 2008; Qiu et al., 2009). There are several existing lists produced by researchers. We used the one from the MPQA corpus (http://www.cs.pitt.edu/mpqa), with added phrases of our own from (Ding, Liu and Yu, 2008). In our work, we also assume that the topics are known; topic/feature extraction has been studied in (Hu and Liu, 2004; Popescu and Etzioni, 2005; Kobayashi, Inui and Matsumoto, 2007; Stoyanov and Cardie, 2008).
One existing focused study is on comparative and superlative sentences (Jindal and Liu, 2006; Bos and Nissim, 2006; Fiszman et al., 2007; Ganapathibhotla and Liu, 2008). Their work identifies comparative sentences, extracts comparative relations in the sentences, and analyzes comparative opinions (Ganapathibhotla and Liu, 2008). An example comparative sentence is "Honda looks better than Toyota". As we can see, comparative sentences are entirely different from conditional sentences; thus, their methods cannot be directly applied to conditional sentences.

7 Conclusion

To perform sentiment analysis accurately, we argue that a divide-and-conquer approach is needed, i.e., a focused study of each type of sentence. It is unlikely that there is a one-size-fits-all solution. This paper studied one type, conditional sentences, which have some unique characteristics that need special handling. Our study was carried out from both the linguistic and the computational perspectives. In the linguistic study, we focused on canonical tense patterns, which have been shown to be useful in classification. In the computational study, we built SVM models to automatically predict whether opinions on topics are positive, negative or neutral. Experimental results have shown the effectiveness of the models.
In our future work, we will further improve the classification accuracy and study related problems, e.g., identifying topics/features. Although there are some special conditional sentences that do not use easily recognizable conditional connectives, and identifying them would be useful, such sentences are very rare, and spending time and effort on them may not be cost-effective at the moment.

Acknowledgements

This work was supported in part by DOE SCIDAC-2: Scientific Data Management Center for Enabling Technologies (CET) grant DE-FC02-07ER25808, DOE FASTOS award number DE-FG02-08ER25848, NSF HECURA CCF-0621443, NSF SDCI OCI-0724599, and NSF ST-HEC CCF-0444405.

References

J. Bos and M. Nissim. 2006. An Empirical Approach to the Interpretation of Superlatives. EMNLP-2006.
E. Breck, Y. Choi, and C. Cardie. 2007. Identifying expressions of opinion in context. IJCAI-2007.
G. Carenini, R. Ng, and A. Pauls. 2006. Interactive Multimedia Summaries of Evaluative Text. IUI-2006.
C.-C. Chang and C.-J. Lin. 2001. LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm
K. Dave, S. Lawrence, and D. Pennock. 2003. Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews. WWW-2003.
R. Declerck and S. Reed. 2001. Conditionals: A Comprehensive Empirical Analysis. Berlin: Mouton de Gruyter.
X. Ding, B. Liu, and P. S. Yu. 2008. A holistic lexicon-based approach to opinion mining. WSDM-2008.
A. Esuli and F. Sebastiani. 2006. Determining term subjectivity and term orientation for opinion mining. EACL-2006.
M. Fiszman, D. Demner-Fushman, F. Lang, P. Goetz, and T. Rindflesch. 2007. Interpreting Comparative Constructions in Biomedical Text. BioNLP-2007.
M. Gamon, A. Aue, S. Corston-Oliver, and E. Ringger. 2005. Pulse: Mining customer opinions from free text. IDA-2005.
G. Ganapathibhotla and B. Liu. 2008. Identifying Preferred Entities in Comparative Sentences. COLING-2008.
C. Gauker. 2005. Conditionals in Context. MIT Press.
V. Hatzivassiloglou and K. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. ACL-EACL-1997.
M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. KDD-2004.
N. Jindal and B. Liu. 2006. Mining Comparative Sentences and Relations. AAAI-2006.
N. Kaji and M. Kitsuregawa. 2006. Automatic construction of polarity-tagged corpus from HTML documents. ACL-2006.
H. Kanayama and T. Nasukawa. 2006. Fully Automatic Lexicon Expansion for Domain-Oriented Sentiment Analysis. EMNLP-2006.
S. Kim and E. Hovy. 2004. Determining the Sentiment of Opinions. COLING-2004.
N. Kobayashi, K. Inui, and Y. Matsumoto. 2007. Extracting Aspect-Evaluation and Aspect-of Relations in Opinion Mining. EMNLP-2007.
L.-W. Ku, Y.-T. Liang, and H.-H. Chen. 2006. Opinion Extraction, Summarization and Tracking in News and Blog Corpora. AAAI-CAAW.
B. Liu. 2006. Web Data Mining: Exploring Hyperlinks, Content and Usage Data. Springer.
B. Liu. 2009. Sentiment Analysis and Subjectivity. To appear in Handbook of Natural Language Processing, Second Edition (editors: N. Indurkhya and F. J. Damerau), 2009 or 2010.
Y. Lu and C. X. Zhai. 2008. Opinion integration through semi-supervised topic modeling. WWW-2008.
R. McDonald, K. Hannan, T. Neylon, M. Wells, and J. Reynar. 2007. Structured models for fine-to-coarse sentiment analysis. ACL-2007.
Q. Mei, X. Ling, M. Wondra, H. Su, and C. X. Zhai. 2007. Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs. WWW-2007.
V. Ng, S. Dasgupta, and S. M. Niaz Arifin. 2006. Examining the role of linguistic knowledge sources in the automatic identification and classification of reviews. ACL-2006.
B. Pang and L. Lee. 2008. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval 2(1-2), pp. 1-135.
B. Pang, L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification Using Machine Learning Techniques. EMNLP-2002.
A.-M. Popescu and O. Etzioni. 2005. Extracting Product Features and Opinions from Reviews. EMNLP-2005.
G. Qiu, B. Liu, J. Bu, and C. Chen. 2009. Expanding Domain Sentiment Lexicon through Double Propagation. IJCAI-2009.
E. Riloff and J. Wiebe. 2003. Learning extraction patterns for subjective expressions. EMNLP-2003.
V. Stoyanov and C. Cardie. 2008. Topic Identification for fine-grained opinion analysis. COLING-2008.
I. Titov and R. McDonald. 2008. A Joint Model of Text and Aspect Ratings for Sentiment Summarization. ACL-2008.
P. Turney. 2002. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. ACL-2002.
J. Wiebe, R. Bruce, and T. O'Hara. 1999. Development and use of a gold standard data set for subjectivity classifications. ACL-1999.
J. Wiebe and T. Wilson. 2002. Learning to Disambiguate Potentially Subjective Expressions. CoNLL-2002.
T. Wilson, J. Wiebe, and R. Hwa. 2004. Just how mad are you? Finding strong and weak opinion clauses. AAAI-2004.
H. Yu and Y. Hatzivassiloglou. 2003. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. EMNLP-2003.