Lecture 7 - Sentiment Analysis Understanding
Agenda
1. Social Media
2. Sentiment Analysis
4. How to determine Sentiment?
5. Sentiment & SNA Software
7. How to improve Sentiment?
8. How to select the right tool?
Introduction (Cont’d.)
Emotion
• Roman statesman Cicero categorized emotions into four classes: fear, pain, lust and pleasure.
Introduction (Cont’d.)
E.g. ‘I am happy’
The emotion of the person: ‘happy’
The sentiment behind the emotion: ‘positive’
Issues and challenges
Issues with textual data analysis:
E.g. ‘You don’t reply to my WhatsApp messages’
Emotion: angry (negative sentiment)
Conversation analysis applications
● Psychological assessment
- Depression or mental status examination
- Behavior assessment
- Criminal psychology assessment
- Legal trials
- Personal or telephonic interviews
● Rural and remote health care
● Hate speech detection
● Human-computer interaction systems
● Consumer behavior analysis and preparing business strategies
● Development of innovative products and product analysis
● Suggestion/Recommendation systems
Consumer behavior analysis
• Sentiment analysis: classifies data points by whether they reflect a positive, negative or neutral feeling.
• Emotional analysis: provides a deeper analysis of consumer emotions that tries to drill down into the psychology of different user behaviors.
• In sentiment analysis, both of the following are negative sentiments:
E.g. “This product wasn’t what I expected” and
“I hate this product with the white hot fury of a thousand suns”
• Yet there is a huge emotional difference between these statements: ‘do not like’ versus ‘hate’.
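The difference between polarity and emotional intensity can be sketched with a toy weighted lexicon. This is only an illustration: the words, phrases and weights below are assumptions, not values from any published resource.

```python
# Toy lexicon: word -> signed intensity (hypothetical values for illustration).
LEXICON = {"hate": -3.0, "fury": -2.0, "love": 3.0, "like": 1.0}
# Negated phrases are listed explicitly to keep the sketch simple.
PHRASES = {"wasn't what i expected": -1.0, "do not like": -1.0}


def score(text):
    """Return the summed signed intensity of known phrases and words."""
    lowered = text.lower().replace("’", "'")
    total = 0.0
    for phrase, weight in PHRASES.items():
        if phrase in lowered:
            total += weight
            lowered = lowered.replace(phrase, " ")
    for token in lowered.split():
        total += LEXICON.get(token.strip(".,!?\"'"), 0.0)
    return total


def polarity(text):
    """Collapse the score to the three labels used in sentiment analysis."""
    s = score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


mild = "This product wasn't what I expected"
strong = "I hate this product with the white hot fury of a thousand suns"
print(polarity(mild), score(mild))
print(polarity(strong), score(strong))
```

Both statements get the same polarity label, while the score keeps the difference in strength that emotional analysis is interested in.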
SA levels
• Document level: Document-level SA is used to categorize an entire document as positive, negative or neutral. E.g. Consider a document containing a review of a drama.
• Sentence level: SA is performed for each sentence. The opinion expressed in a sentence can be gauged using sentence-level SA.
• Word/phrase level: At this level SA is performed on given phrases or words. The sentiment of a phrase/entity is determined in this approach.
• Aspect-based sentiment analysis (ABSA): The process of extracting relevant aspects and determining the sentiment of the corresponding opinion.
Figure 8: SA levels (document, sentence, word/phrase, aspect)
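A minimal sketch of how document-level and sentence-level SA can disagree on the same text. The mini-lexicon and the counting rule are assumptions made for illustration; a real system would use a curated lexicon or a trained classifier.

```python
import re

# Hypothetical mini-lexicon; a real system would use a curated resource.
POSITIVE = {"great", "interesting", "fabulous"}
NEGATIVE = {"dud", "boring", "bad"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def label(tokens):
    """Label a bag of tokens by counting positive and negative words."""
    balance = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if balance > 0 else "negative" if balance < 0 else "neutral"


def sentence_level(document):
    """Return one (sentence, label) pair per sentence."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [(s, label(tokenize(s))) for s in sentences]


def document_level(document):
    """Return a single label for the whole document."""
    return label(tokenize(document))


review = "His last movie was great and interesting. This one's a dud."
print(document_level(review))
for sentence, sentiment in sentence_level(review):
    print(sentiment, "|", sentence)
```

The single document-level label hides the negative second sentence, which is the information sentence-level SA recovers.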
ABSA (Cont’d.)
• Sometimes, it is not enough to say whether a post has a "positive" or a "negative" sentiment.
• The user may want to identify the aspects of given target entities and the sentiment expressed
towards each aspect.
E.g. 1 “I love Apple products but iPhone 7s is overhyped.”
In this scenario there are two opinions:
- one with a positive note about Apple products
- one with a negative note about the iPhone 7s.
E.g. 2 “Although the service is not that great, I still love this restaurant.”
The overall sentiment is positive, but the sentence is not entirely positive: it expresses a negative sentiment about the service and a positive sentiment about the restaurant.
The majority of current sentiment analysis approaches try to detect the overall polarity of a sentence (or a document) regardless of the target entities (e.g. restaurants) and their aspects (e.g. food, price).
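A rough sketch of aspect-level sentiment for the restaurant example, assuming the aspect terms are known in advance and that each clause carries one opinion. The aspect list, opinion lexicon and negation rule are hypothetical simplifications, not an established ABSA method.

```python
import re

# Assumed inputs: a fixed aspect list and a tiny opinion lexicon (both hypothetical).
ASPECTS = {"service", "restaurant", "food", "price"}
OPINIONS = {"love": 1, "great": 1, "overhyped": -1}
NEGATORS = {"not", "never", "no"}


def aspect_sentiments(sentence):
    """Map each aspect term to the polarity of the clause that mentions it."""
    results = {}
    # Split on commas and contrastive markers so each clause carries one opinion.
    clauses = re.split(r",|\bbut\b|\balthough\b", sentence.lower())
    for clause in clauses:
        tokens = re.findall(r"[a-z]+", clause)
        score = sum(OPINIONS.get(t, 0) for t in tokens)
        if any(t in NEGATORS for t in tokens):
            score = -score  # crude negation handling: flip the clause polarity
        for token in tokens:
            if token in ASPECTS and score != 0:
                results[token] = "positive" if score > 0 else "negative"
    return results


print(aspect_sentiments(
    "Although the service is not that great, I still love this restaurant."
))
```

The lookup against ASPECTS stands in for aspect extraction, and the per-clause score stands in for aspect sentiment classification.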
ABSA (Cont’d.)
• Aspect Term Extraction (ATE)
• Aspect Term Sentiments (ATS)
Challenges in SA
• Unstructured data
• Noise (slang, abbreviations):
E.g.1 mvie ws awsummm
Web content contains a large number of spelling variations for the same word.
E.g.2 The word “awesome” can be found in various forms such as “awsum, awssuummm, awsome”.
• Contextual information: Identifying the context of the text is an important challenge to address.
E.g.1 The movie was long.
E.g.2 Lecture was long.
E.g.3 Battery life of iPhone 11 Pro is long.
In all three examples the meaning of “long” is the same: it indicates duration or the passage of time. In e.g.1 and e.g.2 “long” indicates boredom, hence a negative expression, whereas in e.g.3 “long” indicates efficiency, hence a positive expression.
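Both challenges can be sketched in a few lines, under strong assumptions. Spelling noise is handled by squeezing repeated letters and fuzzy-matching against an assumed vocabulary using Python's standard difflib; the context dependence of "long" is handled with a hypothetical lookup table from target to polarity.

```python
import difflib
import re

# Assumed vocabulary of correctly spelled words (hypothetical, for illustration).
VOCABULARY = ["movie", "was", "awesome"]

# Assumed polarity of "long" per target; a real system would learn this from data.
LONG_POLARITY = {"movie": "negative", "lecture": "negative", "battery": "positive"}


def normalize(token):
    """Map a noisy token to the closest vocabulary word, if one is close enough."""
    token = token.lower()
    if token in VOCABULARY:
        return token
    squeezed = re.sub(r"(.)\1+", r"\1", token)  # "awssuummm" -> "awsum"
    matches = difflib.get_close_matches(squeezed, VOCABULARY, n=1, cutoff=0.6)
    return matches[0] if matches else token


def polarity_of_long(sentence):
    """Resolve the polarity of "long" from the target mentioned in the sentence."""
    for target, polarity in LONG_POLARITY.items():
        if target in sentence.lower():
            return polarity
    return "unknown"


print(" ".join(normalize(t) for t in "mvie ws awsummm".split()))
print([normalize(t) for t in ["awsum", "awssuummm", "awsome"]])
print(polarity_of_long("The movie was long."))
print(polarity_of_long("Battery life is long."))
```

Squeezing repeated letters also damages legitimate double letters, so the vocabulary check comes first and the fuzzy match is only a fallback.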
Challenges in SA (Cont’d.)
• Sarcasm detection: sarcasm is a sharp, bitter, or cutting expression or remark; a bitter gibe or taunt, usually conveyed through irony or understatement.
E.g.1 Few characters are not irritating because they are already dead.
E.g.2 This was a fantastic place, I will not come again over here.
• Word sense disambiguation: handling words with multiple meanings
• Language Constructs
- Word Order
- Morphological Variations
- Handling Spelling Variations
- Lack of resources
Social Media
- Social Collaboration
- Wikis
- Blogs / Microblogs
- Collaborative office
- Social Publishing
- Content Sharing
- Content Aggregation
Social Media Analysis - Context
- 1.5 billion: Internet users worldwide
- 600k: new members on Facebook per day
Using Opensource
Introduction - Facts and Opinions
2. Search Engines
a. Search for facts (expressed with topic keywords)
b. Do not search for opinions
Opinions are hard to express with a few keywords,
e.g. What do people think of Nokia cell phones?
Our Interest
To mine opinions expressed in the user-generated content
An intellectually very challenging problem.
Practically very useful.
Applications / Motivations
Sentiment
"personal experience, one's own feeling"
"what one feels about something"
"feeling, affection, opinion"
Sentiment Analysis
Identify the orientation of text in the document
The task is to identify objects and their feature sets and to then classify them as positive or
negative.
Sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some
topic
Example text: “His last movie was great and interesting. This one’s a dud.”
- Word-level SA (e.g. “fabulous”, “interesting”)
- Sentence-level SA
- Document-level SA
Use Cases
Use Case - 1
Recommendation Systems
Use Case - 4
A company experienced serious, global damage to its reputation after its employees spread bad news on YouTube.
- The man in the video put some cheese up his nose and nasal mucus on the sandwiches, and violated other health-code standards, while a fellow employee provided narration.
- Within two days of its posting, more than half a million people had watched it.
- Major news outlets reported the crisis.
- People started to discuss it on Twitter.
- The brand's reputation was badly damaged.
- The CEO followed with an apology.
How to determine sentiment?
Technical Approaches to Sentiment Analysis
Classify documents (e.g. reviews) based on the overall sentiments expressed by authors.
E.g. epinions.com on automobiles, banks, movies, and travel destinations.
NLP approach (Turney, ACL-02): accuracy obtained 70% to 85%.
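As commonly described, Turney's approach scores each extracted phrase by its semantic orientation (SO), i.e. the difference between its pointwise mutual information with "excellent" and with "poor", and labels the review by the average SO. The sketch below assumes the phrases are already extracted; all hit counts are hypothetical stand-ins for the search-engine counts used in the paper.

```python
import math

# Hypothetical co-occurrence counts; the original work used search-engine hit counts.
HITS_EXCELLENT = 1000  # hits("excellent")
HITS_POOR = 1000       # hits("poor")
NEAR_HITS = {          # phrase -> (hits near "excellent", hits near "poor")
    "nice phone": (80, 10),
    "too expensive": (10, 40),
}


def semantic_orientation(phrase):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")."""
    near_excellent, near_poor = NEAR_HITS[phrase]
    # 0.01 is added to avoid division by zero when a phrase never co-occurs.
    return math.log2(
        ((near_excellent + 0.01) * HITS_POOR) / ((near_poor + 0.01) * HITS_EXCELLENT)
    )


def classify_review(phrases):
    """Label a review by the average semantic orientation of its phrases."""
    average = sum(semantic_orientation(p) for p in phrases) / len(phrases)
    return "recommended" if average > 0 else "not recommended"


print(semantic_orientation("nice phone") > 0)
print(classify_review(["too expensive"]))
```

The phrase-extraction step (part-of-speech patterns in the paper) is deliberately left out of this sketch.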
Machine learning classifiers:
a. Naïve Bayes
b. Maximum Entropy
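A minimal from-scratch sketch of the first classifier, multinomial Naïve Bayes with add-one smoothing, applied to document-level sentiment. The training sentences are a tiny hypothetical set for illustration only, and whitespace tokenization is a simplifying assumption.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter(labels)
        for text, label in zip(documents, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # log P(label) + sum of log P(word | label)
            score = math.log(doc_count / total_docs)
            for word in text.lower().split():
                if word in self.vocabulary:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny hypothetical training set, for illustration only.
train = [
    ("great camera amazing pictures", "positive"),
    ("nice phone clear voice", "positive"),
    ("battery too expensive", "negative"),
    ("bad screen poor battery", "negative"),
]
model = NaiveBayes().fit(*zip(*train))
print(model.predict("amazing phone"))
print(model.predict("poor battery"))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied.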
Identifying subjective / opinionated sentences. Much of the work on sentence-level sentiment analysis focuses on identifying subjective sentences in news articles. All techniques use some form of machine learning.
Objective: e.g., I bought an iPhone a few days ago.
Subjective: e.g., It is such a nice phone.
Alternatives:
1. Use a Naïve Bayes classifier with a set of data features / attributes extracted from training sentences (Wiebe et al., ACL-99).
2. Bootstrapping approach using learned patterns (Riloff and Wiebe, EMNLP-03). Two high-precision (low-recall) classifiers are used first: a high-precision subjective classifier and a high-precision objective classifier. Both are based on manually collected lexical items (single words and n-grams) that are good subjective clues. A set of patterns is then learned from the sentences identified as subjective and objective. Syntactic templates are provided to restrict the kinds of patterns to be discovered, e.g., <subj> passive-verb. The learned patterns are then used to extract more subjective and objective sentences (the process can be repeated).
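The bootstrapping loop can be sketched roughly as follows. This is a heavily simplified illustration, not the published method: the clue list is hypothetical, the thresholds are assumptions, and the syntactic pattern learning is replaced by a word-based stand-in that promotes words seen only in confidently subjective sentences.

```python
# Assumed, manually collected subjective clues (hypothetical list).
CLUES = {"nice", "cool", "love", "hate", "mad", "really"}


def tokens(sentence):
    """Split on whitespace, strip basic punctuation, lowercase."""
    return [t.strip(".,!?").lower() for t in sentence.split()]


def high_precision_label(sentence, clues):
    """Label only confident cases: two or more clues -> subjective, none -> objective."""
    hits = sum(t in clues for t in tokens(sentence))
    if hits >= 2:
        return "subjective"
    if hits == 0:
        return "objective"
    return None  # left unlabeled: this is why recall is low


def bootstrap(sentences, clues, rounds=2):
    """Grow the clue set from confidently labeled sentences, then relabel."""
    clues = set(clues)
    labels = {}
    for _ in range(rounds):
        labels = {s: high_precision_label(s, clues) for s in sentences}
        subjective = {t for s, lab in labels.items() if lab == "subjective" for t in tokens(s)}
        objective = {t for s, lab in labels.items() if lab == "objective" for t in tokens(s)}
        # Stand-in for pattern learning: keep words seen only in subjective sentences.
        clues |= subjective - objective
    return labels


sentences = [
    "It is such a nice phone, really cool.",
    "I bought an iPhone a few days ago.",
    "Such a nice screen.",
]
for sentence, lab in bootstrap(sentences, CLUES).items():
    print(lab, "|", sentence)
```

In the first round the third sentence has only one clue and stays unlabeled; after the clue set grows it is picked up as subjective, which mirrors how repeating the process raises recall.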
Sentence Level Sentiment Analysis
3. Yu and Hatzivassiloglou (EMNLP-03): To classify each sentence, the method takes the average of the log-likelihood ratio (LLR) scores of the words in the sentence and uses cutoffs to decide positive, negative or neutral.
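The averaging-and-cutoff step can be sketched as below. The per-word LLR scores and the two cutoffs are hypothetical values, and averaging only over words that have a score is a simplifying assumption of this sketch.

```python
# Hypothetical per-word log-likelihood ratio (LLR) scores:
# positive values suggest positive orientation, negative values the opposite.
LLR = {"nice": 2.1, "cool": 1.8, "clear": 1.2, "expensive": -1.9, "mad": -2.4}


def classify_sentence(sentence, upper=0.5, lower=-0.5):
    """Average the LLR scores of the scored words and compare against two cutoffs."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    scores = [LLR[w] for w in words if w in LLR]
    if not scores:
        return "neutral"
    average = sum(scores) / len(scores)
    if average > upper:
        return "positive"
    if average < lower:
        return "negative"
    return "neutral"


for s in [
    "It was such a nice phone.",
    "She also thought the phone was too expensive.",
    "I bought an iPhone a few days ago.",
]:
    print(classify_sentence(s), "|", s)
```

The band between the two cutoffs is what allows a sentence with mixed or weak evidence to be labeled neutral rather than forced into a polarity.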
Sentiment classification at both the document and sentence (or clause) level is useful, but:
1. It does not find what the opinion holder liked and disliked.
2. A negative sentiment on an object does not mean that the opinion holder dislikes everything about the object.
3. A positive sentiment on an object does not mean that the opinion holder likes everything about the object.
“I bought an iPhone a few days ago. It was such a nice phone. The touch
screen was really cool. The voice quality was clear too. Although the
battery life was not long, that is ok for me. However, my mother was mad
with me as I did not tell her before I bought the phone. She also thought
the phone was too expensive, and wanted me to return it to the shop. …”
What do we see?
Opinions
Targets of opinions
Opinion holders
Feature Extraction - Different Review format
1. Format 1 - Pros, Cons and detailed review: The reviewer is asked to describe Pros and Cons separately and also write a detailed review. Epinions.com uses this format.
2. Format 2 - Pros and Cons: The reviewer is asked to describe Pros and Cons separately. Cnet.com used to use this format.
3. Format 3 - Free format: The reviewer can write freely, i.e., no separation of Pros and Cons. Amazon.com uses this format.
GREAT Camera., Jun 3, 2004
Reviewer: jprice174 from Atlanta, Ga.
I did a lot of research last year before I bought this camera. It
kinda hurt to leave behind my beloved nikon 35mm SLR, but I
was going to Italy, and I needed something smaller, and digital.
The pictures coming out of this camera are amazing. The 'auto'
feature takes great pictures most of the time. And with digital,
you're not wasting film if the picture doesn't come out.
Feature Extraction from Pros and Cons of Format 1
(Liu et al WWW-03; Hu and Liu, AAAI-CAAW-05)
Observation: Each sentence segment in Pros or Cons contains only one feature. Sentence segments
can be separated by commas, periods, semi-colons, hyphens, ‘&’s, ‘and's, ‘but's, etc.
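The segmentation step implied by this observation can be sketched with a regular expression. The separator set follows the list above; the sample Pros string is a made-up example.

```python
import re

# Separators named in the observation: commas, periods, semi-colons, hyphens, '&', 'and', 'but'.
SEPARATORS = re.compile(r"\s*(?:[,.;&-]|\band\b|\bbut\b)\s*", flags=re.IGNORECASE)


def split_segments(pros_or_cons):
    """Split a Pros or Cons string into segments, each assumed to hold one feature."""
    return [seg for seg in SEPARATORS.split(pros_or_cons) if seg]


print(split_segments("great photos, easy to use & small but heavy battery"))
```

Each resulting segment can then be passed to a feature extractor on the assumption that it names exactly one product feature.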
Software
Challenges
2. Humans are complicated. Not all of them know English well, e.g. mixed language, abbreviations, spelling mistakes, etc.
3. Named Entity Recognition - What is the person actually talking about, e.g. is
300 Spartans a group of Greeks or a movie?
6. Sarcasm - If you don't know the author you have no idea whether 'bad' means
bad or good.
E.g. Let’s say Karen learns from a Facebook friend that an
electronics company has just started charging customers a support fee for a
popular product that had historically been free. Karen posts the following
response on Facebook: “Oh, that’s just great.”
Challenges (Cont’d.)
- 60% of the people who use the service are actually tweeting.
- 40% of Twitter users don’t tweet or haven’t tweeted in 30 days (Observers)
- Huge variability and subtlety of spoken and written language.
10. Relative Sentiment – “I bought a Honda Accord.” Great for Honda but bad for
Toyota.
Challenges (Cont’d.)
12. Compound Sentiment – “I love the trailer but hated the movie”
13. Conditional Sentiment – “I was really pissed, but then they gave me the
refund.”
14. Scoring Sentiment – How positive is it? “I like it” versus “I really like it” versus “I love it”
16. International Sentiment - Japanese have unique emoticons, like (;_;) for
crying. Italians tend to be far more effusive and grandiose, whereas Brits are
generally drier and less effusive, making those relative scoring challenges
mentioned earlier all the more complicated.
Thank You