sarcasm detection using emotion detection
Abstract— Sarcasm is a sophisticated form of sentiment expression in which speakers express the opposite of what they mean. Sarcasm detection and emotion detection on social networking sites have become an active field of study. With the growth of e-services such as e-commerce, e-tourism and e-business, companies are keen to exploit emotion and sarcasm analysis in their marketing strategies in order to evaluate public attitudes towards their brands. An efficient emotion and sarcasm modeling system can therefore be a good solution to this problem. This work aims at developing a system that groups posts based on emotion and sentiment and finds sarcastic posts, if present. The proposed system is a prototype that helps infer the emotions of posts, namely anger, surprise, happiness, fear, sorrow, trust, anticipation and disgust, with three sentic levels in each. This gives a better understanding of the posts than approaches that sense only the polarity of a post and report just its sentiment, i.e., positive, negative or neutral. Posts carrying these emotions might be sarcastic too. The sentiment and emotion identification module identifies the sentiment or emotion of a post by evaluating the score of each word in the comment, which is then used by the different sarcasm detection methods. The emotion identification module uses the lexical databases WordNet and SentiWordNet to find the right sentiment scores for the words with respect to each emotion. The system also uses sarcasm detection algorithms such as emoticon sarcasm detection, hybrid sarcasm detection, hashtag processing and Interjection Word Start (IWS).

Keywords— WordNet, SentiWordNet, Hybrid sarcasm detection, Interjection Word Start (IWS)

I. INTRODUCTION

There are different trends opening in the era of sentiment analysis, which analyzes the attitudes and opinions of people in social media, including social sites like Facebook, Twitter, blogs, etc. The main aim of sentiment analysis is to identify the polarity (positive, negative or neutral) of a given text. Sarcasm is a special type of sentiment which has the ability to flip the polarity of the given text. Sarcasm is defined as ‘the use of irony to mock or convey contempt’. It is a sophisticated form of sentiment expression in which speakers express the opposite of what they mean. Sarcasm is a contrast between a positive sentiment word and a negative situation [6]. What makes the task of detecting sarcasm hard is that even humans sometimes find it hard to understand without prior knowledge of the topic. Example: “Oh! He out on a duck, what a legendary batsman”. In this example, the person uses words that express a positive sentiment (greatness), but overall the tweet reflects a negative sentiment toward the batsman. Unlike a simple negation, a sarcastic text typically conveys a negative opinion using only positive words. Recognition of sarcasm is one of the most difficult tasks in natural language processing. The average human reader will have difficulty recognizing sarcasm in Twitter data, product reviews, blogs, online discussion forums, etc. Sarcasm is also very close to lying in some contexts, which makes the task harder: because the author writes exactly the opposite of what he means, the text resembles a lie.

Emotion is a strong feeling deriving from one's circumstances, mood, or relationships with others. It is any strong agitation of the feelings actuated by experiencing love, hate, fear, etc., usually accompanied by certain physiological changes, such as increased heartbeat or respiration, and often by overt manifestation, such as crying or shaking. Detecting the emotion of comments or reviews on social sites is important in different research areas. One of the applications of emotion detection is sarcasm detection. It is also used to study the points of view of customers of different businesses, employees of companies, users of online sites, etc.

An emoticon/emoji is a pictorial representation of a facial expression using punctuation marks, numbers and letters, which is used to express a person's feelings or mood. Every emoji has its own emotion, which is used to express a feeling instead of writing text. Emoticons are also used to make a sentence sarcastic.

There are different libraries used for finding sentiment, for synset detection and for finding sets of cognitive synonyms. For sarcasm detection and emotion detection, the system uses lexical databases such as WordNet, SentiWordNet and WordNet-Affect. WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations [2]. SentiWordNet is a lexical resource used for opinion mining. SentiWordNet assigns sentiment scores to each synset of WordNet, dividing words into three categories: positive, negative and neutral [7]. WordNet-Affect is an extension of WordNet Domains, including a subset of synsets suitable to represent affective concepts correlated with affective words. The affective concepts represent emotional states as well as moods, situations eliciting emotions, and emotional responses [7].

II. RELATED WORK

In recent times, research interest in sarcasm detection in text has grown rapidly. Many researchers have investigated sarcasm in data collected from various sources such as tweets on Twitter, Facebook, Amazon product reviews, website comments, etc.

The identification of emotion on social media has gained a lot of attention in recent years. Researchers have automated the analysis of emotions in text using six basic emotions, namely anger, fear, disgust, joy, surprise and sadness [1]. SenticNet, which is built using common-sense reasoning techniques along with an emotion categorization model, is used for opinion mining. Researchers have used the combination of SentiWordNet and WordNet-Affect to find the emotion in web-based content [3]. WordNet is used to find the similarity of texts and also to find emotion synsets. Words have different meanings depending on the context in which they appear, which is not handled. Emoticons are also used for sarcasm detection. One work states that emojis are not always a direct labeling of emotional content. For instance, a positive emoji may serve to disambiguate an ambiguous sentence or to complement an otherwise relatively negative text [2]. Latent Semantic Analysis (LSA) gives a vector space model that allows for a homogeneous representation of words, word sets, sentences and texts. The LSA space can represent an emotion in different ways: the vector of the specific word denoting the emotion (e.g. love), the vector representing the synset of the emotion (e.g. choler, ire), and the vector of all the words in the synsets labeled with the emotion [3]. Sequence labeling has been used as a learning mechanism for sarcasm detection in dialogue, with new features derived from the information available in the dataset. A comparison is made between two sequence labelers (SEARN and SVM-HMM) and three classifiers (SVM with oversampled and under-sampled data, and Naïve Bayes) [4]. Another approach is a novel bootstrapping algorithm that automatically learns lists of positive sentiment phrases and negative situation phrases from sarcastic tweets. An SVM classifier is used for classifying the tweets [5].

... word in the post. The emotion identification module uses the lexical databases WordNet and SentiWordNet to find the right sentiment scores for the words with respect to the sentence. The fuzzy union of the sentiment scores of all the keywords is taken to obtain the measure for the entire sentence, and the emotion of the sentence is found. The emotion score of the sentence is normalized by the number of words in the sentence. The same methodology is applied to each of the sentences to find the emotion of the whole paragraph and then of the entire post text.

The sentiment scores found are provided to the sarcasm detection module. This module has several sub-modules, namely the emoticon based sarcasm detection approach, the post-comments based approach, the hybrid approach, positive sentiment and negative situation sarcasm detection, and pattern text match sarcasm detection. Every module has its own significance in detecting sarcasm.

Fig. 1. System Architecture

IV. MODULE DESIGN

This section gives a detailed description of the modules used in the system architecture.
... the data from the Facebook database. The whole data is analyzed and the required fields are extracted. The language in which the posts are written is analyzed, and non-English posts are cleaned out using a parser.

2) POS Tagger: The POS tagger takes the words of a text as input and assigns a part of speech to each word as output. Stanford NLP is used for this, together with parsing. Parsing is the process of analyzing grammatical structure, identifying parts of speech and the syntactic relations of words in sentences. The post text is analyzed and the part of speech of every word in the text is found using the Stanford POS tagger. The POS tags are used to pick out verbs, nouns, adjectives, etc. When a sentence is passed through the parser, the parser divides the sentence into words and identifies the POS tag information.

3) Extracting Required Data: The dataset contains a lot of noise such as http links, elongated words, etc. Words that are not useful for emotion or sarcasm detection, such as prepositions, pronouns and conjunctions, are removed. Unicode characters are removed from each keyword and stemming is performed.

B. Emotion & Sentiment Identifier:
It finds the emotion of the words and emoticons present in the comment.

1) Finding the Sentiment score of words: The first step in this phase is to find the sentiment score of each word in the comment using the SentiWordNet dictionary. The dictionary provides a score for each word in the comment with the help of POS tagging. Only those words that are helpful for sentiment analysis are taken; POS tagging helps here because it identifies the part of speech of the words in the comment. After a score has been assigned to each word, the scores are added to get the final output, which is a negative score or a positive score. Based on that score, the comment is assigned a positive or a negative emotion.

2) Finding Sentiment score & Emotion of Emoticons: In this step the sentiment score and emotion of each emoticon are found using the emoticon library and the SentiWordNet dictionary. The emoticon library stores all the information about an emoticon, such as the emoticon id, meaning, emoticon score (obtained with the help of the SentiWordNet dictionary), etc. Every emoticon has a unique id called the emoticon id. The emoticon id is looked up in the library to get its score, and from that score the sentiment and emotion of the emoticon are found.

C. Sarcasm Detection:
After finding the emotions, the sarcasm in the posts under each of these emotions is also detected using the following methodologies, which also make use of the emotion model. In this work, we have designed four methodologies for identifying sarcasm:

1) Word Based Detection: In this method, the emotion of each comment (positive, negative or neutral) is first obtained with the help of the SentiWordNet dictionary. The number of comments with each emotion is then counted. The emotion with the maximum count is taken as non-sarcastic, and the comments with any other emotion are treated as sarcastic.
For example, consider 25 comments on one feed of a Facebook page, of which 16 are positive, 5 are negative and 4 are neutral. The maximum count is 16 (positive comments), so the sarcastic comments are the remaining 9: the 5 negative comments and the 4 neutral comments.

2) Emoticon Based Detection: A post may or may not contain an emoticon, but if one is present in the comment it is used to detect sarcasm. In emoticon based detection, the emotion of the emoticon (positive, negative or neutral) is obtained. The number of comments with each emotion is then counted. The emotion with the maximum count is taken as non-sarcastic, and the comments with any other emotion are treated as sarcastic.
For example, consider 25 comments on one feed of a Facebook page, of which 17 are positive, 5 are negative and 3 are neutral. The maximum count is 17 (positive comments), so the sarcastic comments are the remaining 8: the 5 negative comments and the 3 neutral comments.

3) Hybrid sarcasm detection approach: This method combines the two approaches, emoticon based and comment based, to get the output. The text part of the post is extracted separately and its emotion is found. Likewise, the emoticons are extracted separately and their emotion is found. A valid conflict between the emotion of the sentence and the emotion of the emoticon indicates the possibility of sarcasm [1].

Algorithm: Hybrid sarcasm detection
Input: Facebook comments
Output: Classification of comments as sarcastic or not sarcastic
Notation: C: Comment, CR: Corpus, E: Emoticon, EE: Emotion of emoticon, N: Negative, Ne: Neutral, P: Positive, W: Words, WE: Emotion of words

1. for C in CR
2. W = find words
3. E = find emoticons
4. end for
5. for W in C
6. Score = find score for each word
7. Total score = Total score + Score
8. end for
9. WE = sense obtained from the total score
10. for E in C
11. Check Unicode of each emoticon
12. Score = find score for each Unicode
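The hybrid rule described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two small dictionaries are hypothetical stand-ins for SentiWordNet and the emoticon library, the function names are invented for illustration, and the conflict rule (one side positive, the other negative) is an assumption about what a "valid conflict" means.

```python
import re

# Hypothetical toy lexicons standing in for SentiWordNet and the emoticon library.
WORD_SCORES = {"fault": -0.6, "hate": -0.8, "love": 0.8, "great": 0.7}
EMOTICON_SCORES = {":)": 0.5, ":D": 0.7, ":(": -0.5}


def polarity(total):
    """Map a summed score to P (positive), N (negative) or Ne (neutral)."""
    if total > 0:
        return "P"
    if total < 0:
        return "N"
    return "Ne"


def classify(comment):
    """Return 'sarcastic' when word and emoticon polarities are opposite."""
    # Emotion of emoticons (EE): sum the score of every emoticon occurrence.
    found = {e: comment.count(e) for e in EMOTICON_SCORES if e in comment}
    if not found:
        return "not sarcastic"  # no emoticon, so no conflict can be checked
    ee = polarity(sum(EMOTICON_SCORES[e] * n for e, n in found.items()))

    # Emotion of words (WE): strip the emoticons, then sum the word scores.
    text = comment
    for e in found:
        text = text.replace(e, " ")
    words = re.findall(r"[a-z']+", text.lower())
    we = polarity(sum(WORD_SCORES.get(w, 0.0) for w in words))

    # Assumed conflict rule: one side positive, the other negative.
    return "sarcastic" if {we, ee} == {"P", "N"} else "not sarcastic"


def classify_corpus(corpus):
    """Classify every comment (C) in the corpus (CR)."""
    return [(c, classify(c)) for c in corpus]
```

With these toy scores, a comment whose words sum to a negative score but which carries a positive emoticon (for instance `classify("I hate waiting :D")`) is flagged as sarcastic, while a comment without any emoticon is never flagged by this rule.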
d) Then add up all the scores to get the total score of the emoticons.
e) Check the count of each emotion among the comments in the post.
f) The emotion with the maximum count is taken as non-sarcastic, while comments with any other emotion are sarcastic.
Count of Negative comments: 14, Count of Positive comments: 8, Count of Neutral comments: 3
Here max count = 14, so the other 8 + 3 = 11 comments are sarcastic.

3) Hybrid based sarcasm detection: Refer to Fig. 2.
Comment: “A woman forgives you when she is at fault!!”
a) Get the emotion of the comment from word based emotion detection: In this case the emotion of the comment text is Negative.
b) Get the emotion of the emoticon, if present in the comment, from emoticon based emotion detection: In this case the emotion of the emoticon is Positive.
c) Use the hybrid algorithm to check the conflict condition between the comment text and the emoticon, then report the comment as sarcastic or not sarcastic: According to the algorithm, if the emotion of the words is negative and the emotion of the emoticon is positive, a conflict occurs and the comment is Sarcastic.

Fig. 3. Facebook post: these guys are real heroes!
Comment: “Heroes without doing anything #sarcasm”
a) Find the hashtag in the comment: “Heroes without doing anything #sarcasm”
b) Check for the word after the hashtag (#): “Heroes without doing anything #sarcasm”
c) Look up that word in the hashtag database.
d) If that word is present in the hashtag database, the comment is Sarcastic, else the comment is Not-Sarcastic: The word is found in the database, so the comment is Sarcastic.

5) Pattern analysis for Sarcasm Detection: Refer to Fig. 2.
Comment: “Oh dear! I don’t know about this. GROW UP”.
a) Apply POS tagging to the comment, which assigns a part of speech to each word of the comment: Oh_UH dear_RB !_. I_PRP do_VBP n't_RB know_VB about_IN this_DT ._. GROW_VB UP_RP
b) Apply the Interjection Word Start (IWS) algorithm to the comment.
c) Check the pattern of the comment against the IWS pattern.
d) If a pattern match is found the comment is sarcastic, else it is not sarcastic: According to the IWS pattern, if an adverb is present after the interjection word then the comment is sarcastic. In the example, an adverb (dear) is present after the interjection word (Oh), so the comment is Sarcastic.

C. Statistical Evaluation Metrics:
There are four statistical parameters, namely accuracy, precision, recall and F-score, which are used to evaluate our proposed approaches.

Method                                    Precision   Recall   F-score
Emoticon based method                     0.8412      0.8617   0.8513
Hybrid method                             0.8857      0.9323   0.9084
Hashtag method                            0.833       0.854    0.846
Pattern Analysis                          0.831       0.736    0.774
Positive Sentiment & Negative Situation   0.801       0.515    0.639
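As a reference for the evaluation metrics named in this section, the following minimal Python sketch computes accuracy, precision, recall and F-score using their standard definitions. The function names and the confusion counts in the usage note are hypothetical and are not taken from the paper; the last two lines only check the standard F-score formula against the precision and recall values reported for two of the methods.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics(tp, fp, fn, tn):
    """Compute the four metrics from confusion counts.

    tp: sarcastic comments flagged as sarcastic
    fp: non-sarcastic comments flagged as sarcastic
    fn: sarcastic comments that were missed
    tn: non-sarcastic comments left unflagged
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score(precision, recall),
    }


# Reported precision/recall pairs for two methods, used to check the formula.
emoticon_f = round(f_score(0.8412, 0.8617), 4)  # 0.8513
hybrid_f = round(f_score(0.8857, 0.9323), 4)    # 0.9084
```

For example, `metrics(tp=8, fp=2, fn=2, tn=8)` (made-up counts) gives 0.8 for all four values. The two rounded F-scores agree with the values reported for the emoticon based and hybrid methods.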
According to the table, the Hybrid method gives the best results in terms of precision, recall and F-score, because on Facebook people tend to use emoticons and the conflict pattern mentioned in the algorithm when they want to make a sarcastic statement. The Positive Sentiment & Negative Situation method gives the lowest scores because on Facebook people generally do not follow the pattern provided in the PBLGA algorithm. Another reason is that this method requires a full-fledged database containing all the positive sentiment and negative situation phrases present in English literature.

VI. CONCLUSION AND FUTURE WORK

Using a single approach for sarcasm detection is not sufficient. Our paper uses a combined approach of different methods such as emotion detection, use of emoticons, patterns, etc. to identify whether a social site comment is sarcastic or not. A combined approach that draws on different methods is therefore required to identify whether a comment is sarcastic.

The sarcasm identification model is a novel approach based on an emotion model. It uses different algorithms, libraries and methods in the emotion detection phase, and the result of that phase is used for sarcasm detection. It is therefore heavily dependent on the emotion identification module, which poses a risk at times. Most of the time, Facebook comments consist of words, hashtags and emoticons. The system also considers hashtags and emoticons for sarcasm detection, which are an important feature set of Facebook posts. The combined approach used in the system gives more accurate results than the individual methods. This system addresses the disadvantages of the former algorithms and methods mentioned in the paper. In future work, sarcasm in images and videos can be detected.

REFERENCES

[1] Bjarke Felbo; Alan Mislove; Anders Søgaard; Iyad Rahwan; Sune Lehmann, “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm”, 27th Conference on Neural Information Processing Systems (NIPS), pp. 3111–3119, 2017.
[2] Raghavan V M; Mohana Kumar P; Sundara Raman R and Rajeswari Sridhar, “Emotion and sarcasm identification of Post from Facebook data using a Hybrid approach”, ICTACT journal on soft computing, vol. 07, Issue. 02, 2017.
[3] Aditya Joshi; Vaibhav Tripathi; Pushpak Bhattacharyya; Mark Carman, “Harnessing Sequence Labeling for Sarcasm Detection in Dialogue from TV Series ‘Friends’”, Indian Institute of Technology Bombay, India; Monash University, Australia; IITB-Monash Research Academy, India, vol. 05, 2017.
[4] Dario Bertero and Pascale Fung, “A Long Short-Term Memory Framework for Predicting Humor in Dialogues”, Association for Computational Linguistics, pp. 130–135, 2017.
[5] Aditya Joshi; Pushpak Bhattacharyya and Mark J. Carman, “Automatic Sarcasm Detection: A Survey”, ACM Computing Surveys, vol. 50, pp. 122-145, 2016.
[6] Prof. Nikita P. Desai; Anandkumar D. Dave, “A Comprehensive Study of Classification Techniques for Sarcasm Detection on Textual Data”, International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), pp. 1985-1991, 2016.
[7] Satoshi Hiai and Kazutaka Shimada, “A Sarcasm Extraction Method Based on Patterns of Evaluation Expressions”, 5th IIAI International Congress on Advanced Applied Informatics, pp. 31-36, 2016.
[8] Santosh Kumar Bharti; Korra Sathya Babu, “Parsing-based Sarcasm Sentiment Recognition in Twitter Data”, IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 1373-1380, 2015.
[9] Ellen R. and Prafulla S., “Sarcasm as Contrast between a Positive Sentiment and Negative Situation”, in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics (ACL), pp. 704-714, 2013.
[10] Li, W., & Xu, H., “Text-based emotion classification using emotion cause extraction”, Published by Elsevier Ltd, Expert Systems with Applications, pp. 202-210, 2013.
[11] Elena Filatova, “Irony and sarcasm: Corpus generation and analysis using crowdsourcing”, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), pp. 392-398, 2012.
[12] Roberto González-Ibáñez; Smaranda Muresan and Nina Wacholder, “Identifying Sarcasm in Twitter: A Closer Look”, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: shortpapers, pp. 581–586, 2011.
[13] Ze-Jing Chuang and Chung-Hsien Wu, “Multi-Modal Emotion Recognition from Speech and Text”, Computational Linguistics and Chinese Language Processing, vol. 9, pp. 45-62, 2011.
[14] C. Strapparava and R. Mihalcea, “Learning to Identify Emotions in Text”, Proceedings of the 2008 ACM Symposium on Applied Computing, pp. 1556-1560, 2011.