
Lecture 7 - Sentiment Analysis Understanding

This document discusses sentiment analysis and social media analysis. It covers how to determine sentiment, common software tools, use cases, and challenges. Key aspects include determining sentiment at the document, sentence, word/phrase, and aspect level. Challenges include dealing with unstructured noisy data, contextual information, sarcasm, word sense disambiguation, and language constructs. Social media analysis involves examining social networks, profiles, collaboration, publishing, feedback and commenting on social platforms using metrics like membership, links, clicks and posts.

Uploaded by

trisim mathur
Copyright
© All Rights Reserved

Sentiment Analysis

Agenda

1. Social Media
2. Sentiment & Sentiment Analysis
3. Use Cases
4. How to determine Sentiment?
5. Softwares
6. Challenges
7. How to improve Sentiment?
8. How to select right SNA tool?
9. Q&A Ahead

Introduction (Cont’d.)
• Sentiment analysis (SA): the process of computationally identifying and categorizing
expressed opinions.

• Emotions involve a set of expressive, behavioral, physiological and phenomenological
features.

Introduction (Cont’d.)
Emotion
• Roman statesman Cicero categorized emotions into four classes:
fear, pain, lust and pleasure.

• Psychologist Robert Plutchik proposed eight basic emotions: joy,
trust, fear, surprise, sadness, anticipation, anger, and disgust.

Figure 5: Emotion wheel by Plutchik

Introduction (Cont’d.)

• ‘Emotions’ and ‘sentiments’ are often treated as interchangeable terms, but
sentiments represent a more general idea.

E.g. ‘I am happy’
The emotion of a person: ‘happy’
The sentiment behind the emotion: ‘positive’.

[Diagram labels: Conversation, Emotion, Sentiment]

Issues and challenges

• Issues with image/video data analysis:

Emotion detection is difficult if expressions are missing in image datasets.

Figure 6: Image source: Towards Data Science (author: Reza Chu)

Issues and challenges

Issues with textual data analysis:
‘You don’t reply to my WhatsApp messages’

This message can be read as an angry emotion (negative sentiment) or as a sad
emotion (neutral sentiment).

Interpretation of this message is difficult for humans as well as for machine programs.

Conversation analysis applications
● Psychological assessment
- Depression or mental status examination
- Behavior assessment
- Criminal psychology assessment
- Legal trials
- Personal or telephonic interviews
● Rural and remote health care
● Hate speech detection
● Human computer interaction system
● Consumer behavior analysis and preparing business strategies
● Development of innovative products and product analysis
● Suggestion/Recommendation systems

Consumer behavior analysis

• Sentiment analysis: classifies data points by whether they reflect a negative, positive,
or neutral feeling.
• Emotional analysis: provides a deeper analysis of consumer emotions that tries to drill down
into the psychology of different user behaviors.
• In sentiment analysis,
E.g. “This product wasn’t what I expected” and
“I hate this product with the white hot fury of a thousand suns”
are both negative sentiments.
• Yet there is a huge emotional difference between these statements: ‘do not like’ versus ‘hate’.

SA levels
• Document level: Document-level SA is used to categorize an
entire document as positive, negative or neutral.
E.g. Consider a document containing a review of a drama.
• Sentence level: SA is performed for each sentence. The opinion
expressed in a sentence can be gauged using sentence-level SA.
• Word/phrase level: At this level SA is performed on given phrases or words.
The sentiment of a phrase/entity is determined in this approach.
• Aspect-based sentiment analysis (ABSA): the process of extracting
relevant aspects and determining the sentiment of the
corresponding opinion.

Figure 8: SA levels (Document, Sentence, Word/phrase, Aspect)

ABSA (Cont’d.)
• Sometimes, it is not enough to say whether a post has a "positive" or a "negative" sentiment.
• The user may want to identify the aspects of given target entities and the sentiment expressed
towards each aspect.
E.g. 1 “I love Apple products but iPhone 7s is overhyped”;
In this scenario there are two opinions - one with a positive note about Apple products
- one with a negative note about iPhone 7s.
E.g. 2 “Although the service is not that great, I still love this restaurant.”
This clearly connotes positive sentiment; however, the sentence is not entirely positive.
In fact, it expresses negative sentiment about the service, but positive
sentiment about the restaurant.

The majority of current sentiment analysis approaches try to detect the overall polarity of a sentence
(or a document) regardless of the target entities (e.g. restaurants) and their aspects (e.g. food, price).

ABSA (Cont’d.)
Subtasks in ABSA:
• Aspect Term Extraction (ATE)
• Aspect Term Sentiments (ATS)
• Aspect Category Detection (ACD)
• Aspect Category Sentiments (ACS)

Figure 9: Subtasks in ABSA

E.g. ‘Ice-cream is nice as well.’
ATE: ice-cream, ATS: Positive,
ACD: Food/Dessert, ACS: Positive

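The four ABSA subtasks can be illustrated with a toy, lexicon-based sketch. Everything named here (the `ASPECT_CATEGORIES` map, the `POSITIVE` and `NEGATIVE` word sets, the `absa` function) is a hypothetical illustration, not part of the lecture. The sketch also assumes each sentence carries a single opinion, so it assigns the sentence polarity to every aspect it finds; real ABSA systems do not make that simplification.

```python
import re

# Hypothetical toy resources (assumptions for illustration only).
ASPECT_CATEGORIES = {"ice-cream": "Food/Dessert", "service": "Service", "price": "Price"}
POSITIVE = {"nice", "great", "love", "good"}
NEGATIVE = {"bad", "slow", "overhyped", "poor"}


def absa(sentence):
    """Return ATE, ATS, ACD and ACS results for one short, single-opinion sentence."""
    # Lowercase words, keeping hyphenated terms such as "ice-cream" together.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
    # Sentence polarity from counts of opinion words.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    polarity = "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"
    results = []
    for term in tokens:
        if term in ASPECT_CATEGORIES:                # ATE: aspect term extraction
            results.append({
                "ATE": term,
                "ATS": polarity,                     # ATS: aspect term sentiment
                "ACD": ASPECT_CATEGORIES[term],      # ACD: aspect category detection
                "ACS": polarity,                     # ACS: aspect category sentiment
            })
    return results


print(absa("Ice-cream is nice as well."))
```

Because the polarity is computed once per sentence, this sketch would mislabel mixed sentences such as the restaurant example earlier; handling those requires tying each opinion word to its own target.
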
Challenges in SA

• Unstructured Data
• Noise (slang, abbreviations):
E.g.1 mvie ws awsummm
Web content contains a large number of spelling variations for the same word.
E.g.2 the word “awesome” can be found in various forms such as “awsum, awssuummm, awsome”.
• Contextual Information: Identifying the context of the text becomes an important challenge to
address.
E.g.1 The movie was long.
E.g.2 Lecture was long.
E.g.3 Battery life of iPhone 11 Pro is long.

In all 3 examples above, the meaning of “long” is the same: it indicates duration or the passage of time. In
e.g.1 and e.g.2 “long” indicates boredom, hence a Negative expression, whereas in e.g.3 “long” indicates
efficiency, hence a Positive expression.

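One common way to reduce spelling noise is to collapse repeated letters and then look the result up in a normalization table. The sketch below shows the idea on the examples above; the `CANONICAL` table and the function names are hypothetical, and a real system would need a far larger table or a spelling-correction model.

```python
import re

# Hypothetical mapping from collapsed noisy spellings to canonical words.
CANONICAL = {
    "awsum": "awesome",
    "awsome": "awesome",
    "mvie": "movie",
    "ws": "was",
}


def normalize(token):
    """Collapse runs of a repeated letter, then look the result up in CANONICAL."""
    lowered = token.lower()
    collapsed = re.sub(r"(.)\1+", r"\1", lowered)
    # Fall back to the original (lowercased) token so that valid words with
    # double letters, such as "good", are not damaged.
    return CANONICAL.get(collapsed, lowered)


def normalize_text(text):
    """Normalize every whitespace-separated token in a short text."""
    return " ".join(normalize(t) for t in text.split())


print(normalize_text("mvie ws awsummm"))
```

The fallback to the original token is a deliberate choice: collapsing letters is only trusted when the collapsed form is a known noisy variant.
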
Challenges in SA (Cont’d.)
• Sarcasm Detection: sarcasm is a sharp, bitter, or cutting expression or remark; a bitter jibe or
taunt usually conveyed through irony or understatement.
E.g.1 Few characters are not irritating because they are already dead.
E.g.2 This was a fantastic place, I will not come again over here.
• Word Sense Disambiguation: words with multiple meanings
• Language Constructs
- Word Order
- Morphological Variations
- Handling Spelling Variations
- Lack of resources

1

Social Media
Social Media

• Social Networking
- Social Profiles
- Social Network Analysis

• Social Collaboration
- Wikis
- Blogs / Microblogs
- Collaborative office

• Social Publishing
- Content Sharing
- Content Aggregation
- Social Publishing

• Social Feedback
- Social Rating, Ranking, Commentary
- Social Content Structure

Social Media Analysis - Context

Internet Statistics
• 1.5 billion – Internet users worldwide
• 55 trillion – Links on the Internet
• 100 billion – The number of clicks per day
• 90 trillion – The number of emails sent in 2011
• 81% – The percentage of emails that were spam

Social Media Statistics
• 600k – New members on Facebook per day
• 400 million – People on Facebook
• 27.3 million – Number of tweets on Twitter per day (November, 2009); 57% from USA
• 900000 – The number of blog posts put up every day
• 84% – Percent of social network sites with more women than men
• 24 – Hours of video uploaded every minute onto YouTube

Reference: Internet Sources
Social Media Analysis - Context

• Seeking Relevance: Identify the right area where information can be located.

• Influence and Authority Finding: Having identified this subset of relevant blogs, how do we
identify the most authoritative or influential bloggers in this space?

• Sentiment Detection: How do we detect and characterize specific sentiment expressed about
an entity (e.g. product) mentioned in a blog or forum?

• Emerging Topics / Idea Finding: How do we tease apart novel emerging topics of discussion
from the constant chatter in the blogosphere?

Using Opensource
Introduction - Facts and Opinions

1. Two main types of information on the Web: Facts and Opinions

2. Search Engines
a. Search for facts (expressed with topic keywords)
b. Do not search for opinions
Opinions are hard to express with a few keywords
e.g. What do people think of Nokia cell phones?

3. Word-of-mouth on the Web
One can express personal experiences and opinions on almost anything, at review sites, forums,
discussion groups, blogs, etc. They contain valuable information.
Web / Global scale: no longer limited to your circle of friends

Our Interest
To mine opinions expressed in user-generated content:
an intellectually very challenging problem, and practically very useful.
Applications / Motivations

• Businesses and organizations:
- product and service benchmarking.
- market intelligence.
- Businesses spend a huge amount of money to find consumer sentiments and
opinions.
- Consultants, surveys and focus groups, etc.

• Individuals: interested in others’ opinions when
- purchasing a product or using a service
- finding opinions on political topics

• Ads placements: Placing ads in user-generated content
- Place an ad when one praises a product.
- Place an ad from a competitor if one criticizes a product.

• Opinion retrieval/search: providing general search for opinions.
- Tracking sentiments over time, etc.
Sentiment & Sentiment Analysis

Sentiment
• "personal experience, one's own feeling"
• "what one feels about something"
• "feeling, affection, opinion"
Sentiment Analysis
• Identify the orientation of text in the document
• The task is to identify objects and their feature sets and then to classify them as positive or
negative.
• Sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some
topic

Examples: “The movie was fabulous!”, “The movie stars Mr. X”, “The movie was horrible!”

Can be generalized to a wider set of emotions

Types of Sentiment Analysis

• Polarity based Sentiment Analysis

• Subjectivity based Sentiment Analysis

• Feature / Aspect based Sentiment Analysis


Sentiment Analysis at different levels

[Figure: word-level, sentence-level and document-level SA, illustrated with the words
“fabulous” and “interesting”, the parsed sentence “police (subj.) stopped (verb) corruption (obj.)”,
and the review “His last movie was great and interesting. This one’s a dud.”]


Sentiment Analysis - Context
3

Use Cases
Use Case - 1

Measuring different product features


Use Case - 2

Comparing Competitive brands


Use Case - 3

Recommendation Systems
Use Case - 4

Geographical Sentiment Analysis


Use Case - 5

Target Sentiment on Twitter


Use Case - 6

Stock Market Prediction


Use Case - 7

Brand Reputation Management - Domino’s Pizza – 2009 crisis

• Experienced serious and global damage to its reputation due to the spread of bad news on YouTube
by its employees
- The man in the video put some cheese up his nose, nasal mucus on the sandwiches and
violated other health-code standards while a fellow employee provided narration
• Within two days of its posting, more than half a million people watched it
• Major news outlets reported the crisis
• People started to discuss it on Twitter
• Brand reputation was badly affected
• Followed by an apology from the CEO

Source: Sentiment Analysis on Bad News Spreading
Jaram Park, Meeyoung Cha, Hoh Kim, Jaeseung Jeong
Graduate School of Culture Technology, KAIST
4

How to determine
sentiment?
Technical Approaches to Sentiment Analysis

Linguistic Natural Language Processing

• Linguistic NLP uses grammatical and lexical processing rules to identify and extract entities and
their attributes (features), associate them with specific topics, isolate subjective statements of
sentiment and assign a positive-negative polarity rating.
• Sentiment analysis is typically an additional processing module of an NLP-based text analytics
system. Linguistic NLP requires an industry or enterprise domain expert and sometimes a
linguistics expert to build the language models and rules required to interpret the language
snippets that are being analyzed for sentiment.

Machine Learning Statistical Analysis

• Machine learning statistical analysis (MLSA) forgoes linguistics-based analysis of documents and
instead relies on statistical analyses of words in documents to determine topic, sentiment and
polarity.
• Most MLSA approaches are supervised, meaning that the systems are trained on document
sets that already have sentiment and polarity assigned to them. The training process uses a
variety of mathematical algorithms to develop predictive models (e.g., decision trees, logistic
regression, or neural networks) that enable a trained system to properly classify new text
inputs. Training takes time, so there have been some efforts to create unsupervised or
semi-supervised MLSA systems.
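To make the supervised setting concrete, here is a minimal sketch of a word-count classifier trained on labelled text. It uses a multinomial Naïve Bayes model with add-one smoothing, written in plain Python; the class name, the four training sentences and their labels are illustrative assumptions, and a practical system would train on thousands of labelled documents.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)             # log prior
            n_words = sum(self.word_counts[label].values())
            for w in doc.lower().split():
                if w in self.vocab:                           # skip unseen words
                    count = self.word_counts[label][w]
                    score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny hypothetical training set (pre-labelled, as supervised learning requires).
train_docs = [
    "the movie was fabulous",
    "great and interesting plot",
    "the movie was horrible",
    "this one is a dud",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("a fabulous and interesting movie"))
```

The model never looks at grammar; it only compares how often each word appeared under each label, which is the sense in which MLSA "forgoes linguistics-based analysis".
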
Machine Learning Statistical Analysis
NLP vs. Statistical

Rules, Models & Lexicons
• Framework for characterizing expressions
• Custom Taxonomies and Ontologies
• Linguistic processing rules
• Rules-based processing model
• Dictionaries / Word Lists
• Training Corpus

Run-Time Sentiment Analysis Engine

Linguistic NLP Analysis
• Language Detection
• Sentence Simplification
• Sentence Detection
• Tokenization
• Part-of-speech tagging
• Stemming
• Named Entity Recognition
• Shallow Parsing / Chunking
• Quintuple Identification
• Co-Reference Analysis

Machine Learning Statistical Analysis
Feature Creation:
• Bag-of-Words (BOW)
• Parts-of-Speech Tagging
Feature Extraction / Sentence comparison:
• Latent Semantic Indexing
Classifiers:
• Naïve Bayes
• Maximum Entropy
• Support Vector Machines

Sentiment Results
Scoring:
• Bagging
• Boosting
Representation & Display:
• Textual
• Non-Textual
• Representative Sample
• Timeline
Opinion Mining Tasks / Types of Sentiment Analysis

Document level
Task: sentiment classification of reviews
• Classes: positive, negative, and neutral
• Assumption: each document (or review) focuses on a single object O (not true in
many discussion posts) and contains opinion from a single opinion holder.

Sentence level
Task 1: identifying subjective / opinionated sentences
• Classes: objective and subjective (opinionated)
Task 2: sentiment classification of sentences
• Classes: positive, negative and neutral.
• Assumption: a sentence contains only one opinion; not true in many cases.

Feature level
Task 1: identifying and extracting object features that have been commented on in
each review.
Task 2: determining whether the opinions on the features are positive, negative or
neutral in the review.
Task 3: grouping feature synonyms. Produce a feature-based opinion summary of
multiple reviews (more on this later).
• Opinion holders: identifying holders is also useful, e.g., in news articles, etc., but
they are usually known in user-generated content, i.e., the authors of the posts.
Document Level Sentiment Analysis

Classify documents (e.g. reviews) based on the overall sentiments expressed by authors.
E.g. epinions.com on automobiles, banks, movies, and travel destinations.

Task involved (NLP Approach, Turney – ACL 02); accuracy obtained: 70% to 85%

Step # Activity

1 Part-of-speech tagging

2 Extract two consecutive words (two-word phrases) from reviews if their tags
conform to one of the patterns described in the table next.

3 Estimate the semantic orientation of the extracted phrases using
Pointwise Mutual Information:

$$\mathrm{PMI}(word_1, word_2) = \log_2 \frac{P(word_1 \wedge word_2)}{P(word_1)\,P(word_2)}$$

4 Semantic Orientation of a phrase = PMI(phrase, “excellent”) – PMI(phrase, “poor”)

5 Compute the average semantic orientation of all phrases in the review; a positive
average indicates a positive review, a negative average a negative one.
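Steps 3 to 5 can be sketched as follows. The probabilities are estimated from raw co-occurrence counts; all counts below are made-up numbers for illustration, and the function and key names are assumptions. The sketch also assumes every count is nonzero, since the logarithm is undefined otherwise.

```python
import math


def pmi(joint_count, count_a, count_b, total):
    """PMI(a, b) = log2( P(a and b) / (P(a) * P(b)) ), estimated from counts."""
    p_joint = joint_count / total
    return math.log2(p_joint / ((count_a / total) * (count_b / total)))


def semantic_orientation(c, total):
    """SO(phrase) = PMI(phrase, 'excellent') - PMI(phrase, 'poor')."""
    return (pmi(c["with_excellent"], c["phrase"], c["excellent"], total)
            - pmi(c["with_poor"], c["phrase"], c["poor"], total))


def classify_review(phrases, total):
    """Average the SO of all extracted phrases and map the sign to a label."""
    avg = sum(semantic_orientation(c, total) for c in phrases) / len(phrases)
    return avg, ("positive" if avg > 0 else "negative")


TOTAL = 1000  # hypothetical number of text windows in the corpus

# Hypothetical counts for two extracted phrases.
really_cool = {"phrase": 10, "excellent": 100, "poor": 100,
               "with_excellent": 8, "with_poor": 2}
too_expensive = {"phrase": 20, "excellent": 100, "poor": 100,
                 "with_excellent": 2, "with_poor": 4}

print(semantic_orientation(really_cool, TOTAL))
print(classify_review([really_cool, too_expensive], TOTAL))
```

A phrase that co-occurs with "excellent" more often than chance, and with "poor" less often, ends up with a positive orientation; the review label is just the sign of the average.
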


Document Level Sentiment Analysis

Classify documents (e.g. reviews) based on the overall sentiments expressed by authors.
E.g. epinions.com on automobiles, banks, movies, and travel destinations.

Task involved (Machine Learning Approach, Pang et al, EMNLP-02)

Step # Activity

1 Any of three classification techniques can be tried, namely:
a. Naïve Bayes
b. Maximum Entropy
c. Support Vector Machine (best accuracy: 83% with unigrams)


Sentence Level Sentiment Analysis

Identifying subjective / opinionated sentences. Much of the work on sentence-level sentiment
analysis focuses on identifying subjective sentences in news articles. All techniques use some form of
machine learning.
Objective: e.g., I bought an iPhone a few days ago.
Subjective: e.g., It is such a nice phone.

Task involved (Machine Learning Approach)

Alternative # Activity
1 Use a Naïve Bayes classifier with a set of data features / attributes extracted
from training sentences (Wiebe et al., ACL-99).
2 Bootstrapping approach using learned patterns (Riloff and Wiebe, EMNLP-03)

A high-precision classifier is used to automatically identify some subjective and
objective sentences.

• Two high-precision (low-recall) classifiers are used, namely a high-precision
subjective classifier and a high-precision objective classifier.
• These are based on manually collected lexical items, single words and n-grams, which
are good subjective clues. A set of patterns is then learned from these
identified subjective and objective sentences. Syntactic templates are
provided to restrict the kinds of patterns to be discovered, e.g., <subj> passive-verb.
The learned patterns are then used to extract more subjective and
objective sentences (the process can be repeated).
Sentence Level Sentiment Analysis

Task involved (NLP Approach)

Alternative # Activity
3 Yu and Hatzivassiloglou, EMNLP-03

For opinion orientation classification, it uses a method similar to (Turney, ACL-02),
but with more seed words (rather than two) and based on the log-likelihood ratio
(LLR).

For classification of each sentence, it takes the average of the LLR scores of the words in the
sentence and uses cutoffs to decide positive, negative or neutral.

4 Sum up the orientations of the opinion words in a sentence (or within some word
window) (Kim and Hovy, COLING-04)
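The idea behind alternative 4, summing word orientations and applying a cutoff, can be sketched in a few lines. This is a deliberately simplified illustration and not the published model; the `ORIENTATION` lexicon, its values and the `cutoff` parameter are hypothetical.

```python
import re

# Hypothetical word orientations: +1 for positive words, -1 for negative words.
ORIENTATION = {
    "nice": 1, "cool": 1, "clear": 1, "great": 1,
    "mad": -1, "expensive": -1, "awful": -1,
}


def sentence_polarity(sentence, cutoff=0):
    """Sum the orientations of the opinion words and compare against a cutoff."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    total = sum(ORIENTATION.get(t, 0) for t in tokens)
    if total > cutoff:
        return "positive"
    if total < -cutoff:
        return "negative"
    return "neutral"


for s in ["It was such a nice phone.",
          "She also thought the phone was too expensive.",
          "I bought an iPhone a few days ago."]:
    print(s, "->", sentence_polarity(s))
```

Raising `cutoff` widens the neutral band, which trades recall for precision on weakly opinionated sentences.
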
Feature Level Sentiment Analysis – Context Setting

Sentiment classifications at both document and sentence (or clause) level are useful, but:

1. They do not find what the opinion holder liked and disliked.

2. A negative sentiment on an object does not mean that the opinion holder dislikes
everything about the object.

3. A positive sentiment on an object does not mean that the opinion holder likes
everything about the object.

We need to go to the feature level.


Feature Level Sentiment Analysis – Context Setting

“I bought an iPhone a few days ago. It was such a nice phone. The touch
screen was really cool. The voice quality was clear too. Although the
battery life was not long, that is ok for me. However, my mother was mad
with me as I did not tell her before I bought the phone. She also thought
the phone was too expensive, and wanted me to return it to the shop. …”

What do we see?
Opinions
Targets of opinions
Opinion holders
Feature Extraction - Different Review format

1. Format 1 - Pros, Cons and detailed review: The reviewer is asked to describe Pros and Cons separately and also write
a detailed review. Epinions.com uses this format.

2. Format 2 - Pros and Cons: The reviewer is asked to describe Pros and Cons separately.
Cnet.com used to use this format.

3. Format 3 - Free format: The reviewer can write freely, i.e., no separation of
Pros and Cons. Amazon.com uses this format.

GREAT Camera., Jun 3, 2004
Reviewer: jprice174 from Atlanta, Ga.
I did a lot of research last year before I bought this camera. It
kinda hurt to leave behind my beloved nikon 35mm SLR, but I
was going to Italy, and I needed something smaller, and digital.

The pictures coming out of this camera are amazing. The 'auto'
feature takes great pictures most of the time. And with digital,
you're not wasting film if the picture doesn't come out.
Feature Extraction from Pros and Cons of Format 1
(Liu et al WWW-03; Hu and Liu, AAAI-CAAW-05)

Observation: Each sentence segment in Pros or Cons contains only one feature. Sentence segments
can be separated by commas, periods, semi-colons, hyphens, ‘&’s, ‘and’s, ‘but’s, etc.

Pros in Example 1 can be separated into 3 segments:
great photos <photo>
easy to use <use>
very small <small> → <size>

Cons can be separated into 2 segments:
battery usage <battery>
included memory is stingy <memory>
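The segmentation step described in the observation can be sketched with a single regular expression. The input strings below are assumed reconstructions of how the Pros and Cons fields might have been written (the original review text is not shown on the slide), and `split_segments` is a hypothetical helper name.

```python
import re

# Separators from the observation: commas, periods, semicolons, '&',
# free-standing hyphens, and the words "and" / "but".
SEPARATORS = re.compile(r"\s*(?:[,.;&]|\s-\s|\band\b|\bbut\b)\s*", re.IGNORECASE)


def split_segments(text):
    """Split a Pros or Cons field into segments, one candidate feature each."""
    return [s.strip() for s in SEPARATORS.split(text) if s.strip()]


# Assumed field contents, built from the segments listed on the slide.
pros = "great photos, easy to use, very small"
cons = "battery usage; included memory is stingy"

print(split_segments(pros))
print(split_segments(cons))
```

Hyphens are only treated as separators when surrounded by spaces, so hyphenated words stay intact. Periods inside numbers (for example a screen size) would still be split, which a fuller implementation would need to handle.
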
Feature Extraction using Label Sequential Rules – Format 2 & 3

Reviews of these formats are usually complete sentences
e.g., “the pictures are very clear.”
Explicit feature: picture

“It is small enough to fit easily in a coat pocket or purse.”
Implicit feature: size

Extraction: Frequency-based approach
• Frequent features
• Infrequent features
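A frequency-based split into frequent and infrequent features might look like the sketch below. It assumes the candidate nouns of each sentence have already been extracted by a part-of-speech tagger (not shown); the sample data and the `min_support` threshold are hypothetical.

```python
from collections import Counter


def split_by_frequency(nouns_per_sentence, min_support=0.3):
    """Separate candidate features into frequent and infrequent ones.

    A noun is 'frequent' if it occurs in at least `min_support` of the sentences.
    """
    n = len(nouns_per_sentence)
    counts = Counter()
    for nouns in nouns_per_sentence:
        counts.update(set(nouns))  # count each noun at most once per sentence
    frequent = sorted(w for w, c in counts.items() if c / n >= min_support)
    infrequent = sorted(w for w, c in counts.items() if c / n < min_support)
    return frequent, infrequent


# Hypothetical candidate nouns from five review sentences.
sentences = [
    ["picture"],
    ["picture", "camera"],
    ["battery"],
    ["picture", "zoom"],
    ["camera"],
]

frequent, infrequent = split_by_frequency(sentences)
print("Frequent:", frequent)
print("Infrequent:", infrequent)
```

Infrequent candidates are not necessarily noise: a rarely mentioned feature can still matter to some buyers, which is why the slide lists them as a separate extraction target.
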
Feature Level Sentiment Analysis - Example

“I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really
cool. The voice quality was clear too. Although the battery life was not long, that is ok for
me. However, my mother was mad with me as I did not tell her before I bought the phone. She
also thought the phone was too expensive, and wanted me to return it to the shop. …”

Feature Based Summary

Feature1: Touch screen

Positive:
• The touch screen was really cool.
• The touch screen was so easy to use and can do amazing things.

Negative:
• The screen is easily scratched.
• I have a lot of difficulty in removing finger marks from the touch screen.

Feature2: battery life

Note: We omit opinion holders
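Once each opinion sentence has been tagged with a feature and a polarity, building the feature-based summary is a simple grouping step. The sketch below assumes that tagging has already been done; the tuple layout and the `summarize` name are illustrative choices.

```python
from collections import defaultdict


def summarize(opinions):
    """Group (feature, polarity, sentence) triples into a feature-based summary."""
    summary = defaultdict(lambda: {"positive": [], "negative": []})
    for feature, polarity, sentence in opinions:
        summary[feature][polarity].append(sentence)
    return dict(summary)


# Sentences from the slide, with feature and polarity tags assumed to be given.
opinions = [
    ("touch screen", "positive", "The touch screen was really cool."),
    ("touch screen", "positive",
     "The touch screen was so easy to use and can do amazing things."),
    ("touch screen", "negative", "The screen is easily scratched."),
    ("touch screen", "negative",
     "I have a lot of difficulty in removing finger marks from the touch screen."),
]

summary = summarize(opinions)
for feature, groups in summary.items():
    print(feature, "positive:", len(groups["positive"]), "negative:", len(groups["negative"]))
```
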


5

Softwares
Softwares

Natural Language Processing
1. OpenNLP
2. LingPipe
3. Stanford CoreNLP

Others
4. crawler4j
5. Twitter4J
6. RSSUtils
7. Sphinx4
8. TextCat
9. GenderAnalyzer

Databases
1. MySQL

Words Databases
2. SentiWordNet
3. AFINN

Graphs
4. Jasper Reports
5. Google Visualization APIs
6. JFreeChart
6

Challenges
Challenges

1. Different people perceive the polarity of words differently.
• Sharp - "a sharp businessman"
• Sharp - "sharp criticism"; "a sharp-worded exchange"

2. Humans are complicated, and not all of them write standard English: e.g. mixed language,
abbreviations, spelling mistakes, etc.

3. Named Entity Recognition - What is the person actually talking about, e.g. is
300 Spartans a group of Greeks or a movie?

4. Thwarted expressions - the sentences/words that contradict the overall
sentiment of the set are in the majority. E.g. The actors are good, the music is
brilliant and appealing. Yet, the movie fails to strike a chord.
Challenges

5. Anaphora Resolution - the problem of resolving what a pronoun or a noun
phrase refers to. "We watched the movie and went to dinner; it was awful."
What does "it" refer to?

6. Sarcasm - If you don't know the author, you have no idea whether 'bad' means
bad or good.

E.g. Karen learns from a Facebook friend that an
electronics company has just started charging customers a support fee for a
popular product that had historically been free. Karen posts the following
response on Facebook: “Oh, that’s just great.”
Challenges

9. Twitter - abbreviations, lack of capitals, poor spelling, poor punctuation, poor
grammar, …

- 60% of the people who use the service are actually tweeting.
- 40% of Twitter users don’t tweet or haven’t tweeted in 30 days (Observers)
- Huge variability and subtlety of spoken and written language.

10. Relative Sentiment – “I bought a Honda Accord.” Great for Honda but bad for
Toyota.
Challenges

12. Compound Sentiment – “I love the trailer but hated the movie”

13. Conditional Sentiment – “I was really pissed, but then they gave me the
refund.”

14. Scoring Sentiment – How positive is it? “I like it” versus “I really like it” versus “I
love it”

15. Sentiment Modifiers - “I bought an iPhone today :-)”

16. International Sentiment - The Japanese have unique emoticons, like (;_;) for
crying. Italians tend to be far more effusive and grandiose, whereas Brits are
generally drier and less effusive, making the relative scoring challenges
mentioned earlier all the more complicated.
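The scoring challenge in item 14 can be made concrete with a graded lexicon plus intensifiers. The sketch below is one possible scheme, not a standard; the scores in `BASE` and the multipliers in `INTENSIFIERS` are arbitrary assumptions chosen only to produce a sensible ordering.

```python
import re

# Hypothetical graded lexicon and intensifier weights.
BASE = {"like": 1.0, "love": 2.0, "dislike": -1.0, "hate": -2.0}
INTENSIFIERS = {"really": 1.5, "very": 1.5}


def score(sentence):
    """Sum graded word scores, boosting a word that directly follows an intensifier."""
    total, boost = 0.0, 1.0
    for token in re.findall(r"[a-z']+", sentence.lower()):
        if token in INTENSIFIERS:
            boost *= INTENSIFIERS[token]
        else:
            total += boost * BASE.get(token, 0.0)
            boost = 1.0  # an intensifier only affects the next word
    return total


for s in ["I like it", "I really like it", "I love it"]:
    print(s, "->", score(s))
```

With these assumed weights the three sentences are ranked in the order the slide suggests, but the exact numbers are a design decision; choosing them consistently across domains and cultures is the hard part.
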
Thank You
