
Chapter 4

Social Media and Text Analytics

Overview of social media analytics
Social Media Analytics
• Social media analytics is the process of collecting, tracking and analyzing data from social networks.
• Social media analytics uses specifically designed software platforms that work similarly to web search tools: data about keywords or topics is retrieved through search queries.
• Social media analytics includes the concept of social listening. Listening is monitoring social channels for problems and opportunities.
• Information gathered from social channels is used to support business decisions and to measure the performance of actions based on those decisions through social media.
Benefits of Social Media Analytics:
▪ Social media analytics help improve content strategy, understand audience behavior, boost engagement, and track campaign performance.
▪ They provide real-time insights, enhance customer service, and help identify trends. Analytics also allow you to measure and monitor competitor performance and optimize ad spend for better results.
▪ In essence, they enable data-driven decisions to maximize the effectiveness of social media efforts.
Social Media Analytics Process
1. Data Capturing  2. Data Understanding  3. Data Presentation
Seven layers of social media analytics
• TEXT: Social media text analytics deals with the extraction and analysis of
business insights from textual elements of social media content, such as
comments, tweets, blog posts, and Facebook status updates. Text
analytics is mostly used to understand social media users’ sentiments or
identify emerging themes and topics.
• NETWORK: Social media network analytics extract, analyze, and interpret personal and professional social networks, for example, Facebook friendship networks and Twitter follower networks. Network analytics seeks to identify influential nodes (e.g., people and organizations) and their position in the network.
• ACTIONS: Social media actions analytics deals with extracting, analyzing, and interpreting the actions performed by social media users, including likes, dislikes, shares, mentions, and endorsements. Actions analytics is mostly used to measure popularity and influence and to make predictions in social media. The case study included at the end of the chapter demonstrates how social media actions (e.g., Twitter mentions) can be used for business intelligence purposes.

• MOBILE/APP: Mobile analytics is the next frontier in the social business landscape. Mobile analytics deals with measuring and optimizing user engagement with mobile applications (or apps for short).
• HYPERLINKS: Hyperlink analytics is about extracting, analyzing, and
interpreting social media hyperlinks (e.g., in-links and out-links). Hyperlink
analysis can reveal, for example, Internet traffic patterns and sources of
incoming or outgoing traffic to and from a source.
• LOCATION: Location analytics, also known as spatial analysis or
geospatial analytics, is concerned with mining and mapping the
locations of social media users, contents, and data.
• SEARCH ENGINES: Search engine analytics focuses on analyzing historical search data to gain valuable insights into a range of areas, including trend analysis, keyword monitoring, search result and advertisement history, and advertisement spending statistics.
Social Media Analytics Life Cycle
STEP 1: IDENTIFICATION
• The identification stage is the art part of social media analytics and is concerned
with searching and identifying the right source of information for analytical
purposes.
• The numbers and types of users and information (such as text, conversation, and
networks) available over social media are huge, diverse, multilingual, and noisy.
• Thus, framing the right question and knowing what data to analyze is extremely
crucial in gaining useful business insights.
• The source and type of data to be analyzed should be aligned with business objectives. Most of the data for analytics will come from your business-owned social media platforms, such as your official Twitter account, Facebook fan pages, blogs, and YouTube channel.
• Some data for analytics, however, will also be harvested from nonofficial social media platforms, such as Google search engine trends data or Twitter search stream data.
STEP 2: EXTRACTION
• Once a reliable and minable source of data is identified, next comes the extraction stage, the science part of social media analytics.
• The type (e.g., text, numerical, or network) and size of data will determine the method and tools suitable for extraction.
• Small-size numerical information, for example, can be extracted manually (e.g., going through your Facebook fan page and counting likes and copying comments), while large-scale automated extraction is done through an API (application programming interface).
• Manual data extraction may be practical for small-scale data, but it is the API-based extraction tools that will help you get the most out of your social media platforms (a minimal sketch of API-based extraction follows at the end of this step).
• Mostly, the social media analytics tools use API-based data extraction.
• APIs, in simple words, are sets of routines/protocols that social media service companies (e.g., Twitter and Facebook) have set up that allow users to access small portions of data hosted in their databases.
• The greatest benefit of using APIs is that they allow other entities (e.g., customers, developers, and partner organizations) to build applications and services on top of the platform's data.
•Some data, such as social networks and hyperlink networks, can only be extracted
through specialized tools. Two important issues to bear in mind here are the privacy and
ethical issues related to mining data from social media platforms.
•Privacy advocacy groups have long raised serious concerns regarding large-scale
mining of social media data and warned against transforming social spaces into
behavioral laboratories.
•The social media privacy issue first came into the spotlight particularly due to the
large-scale “Facebook Experiment” carried out in 2012, in which Facebook
manipulated the news feeds feature of thousands of people to see if emotion
contagion occurs without face-to-face interaction (and absence of nonverbal cues)
between people in social networks (Kramer, Guillory et al. 2014).
• Though the experiment was consistent with Facebook's Data Use Policy (Editorial 2014) and helped promote our understanding of online social behavior, it does, however, raise serious concerns regarding obtaining informed consent from participants and allowing them to opt out. The bottom line here is that your data extraction practices should stay within the ethical and legal boundaries set by the platforms, your users, and applicable privacy regulations.
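As an illustration of API-based extraction, here is a minimal Python sketch built on the requests library. The endpoint URL, access token, query parameters, and response field are hypothetical placeholders; a real platform API (e.g., the Twitter/X API or the Facebook Graph API) defines its own endpoints, authentication, rate limits, and terms of service, which your extraction code must respect.

```python
# Hypothetical API-based extraction sketch: only the `requests` calls are real;
# the endpoint, token, parameters, and response fields are placeholders.
import requests

API_URL = "https://api.example-social.com/v1/search"   # hypothetical endpoint
API_TOKEN = "YOUR_ACCESS_TOKEN"                        # issued by the platform

def fetch_posts(keyword, max_results=50):
    """Request recent posts mentioning `keyword` and return them as a list."""
    response = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        params={"query": keyword, "limit": max_results},
        timeout=10,
    )
    response.raise_for_status()               # fail loudly on HTTP errors
    return response.json().get("posts", [])   # hypothetical response field

if __name__ == "__main__":
    for post in fetch_posts("customer service"):
        print(post)
```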
STEP 3: CLEANING
• This step involves removing the unwanted data from the automatically extracted data. Some data may need a lot of cleaning, and other data can go into analysis directly. In the case of text analytics, for example, cleaning, coding, clustering, and filtering may be needed to get rid of irrelevant textual data using natural language processing (NLP). Coding and filtering can be performed by machines (i.e., automated) or can be performed manually by humans. For example, DiscoverText combines both machine learning and human coding techniques to code, cluster, and classify social media data (Shulman 2014).
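A minimal sketch of rule-based cleaning for extracted social media text, using only Python's standard library. The patterns below (URLs, @mentions, hashtag symbols, emojis and punctuation) are illustrative; production pipelines such as DiscoverText add machine learning and human coding on top of this kind of preprocessing.

```python
# Rule-based cleaning sketch for social media text (illustrative patterns only).
import re

def clean_post(text):
    text = re.sub(r"http\S+", " ", text)           # drop URLs
    text = re.sub(r"@\w+", " ", text)              # drop @mentions
    text = text.replace("#", " ")                  # keep hashtag words, drop the symbol
    text = re.sub(r"[^A-Za-z0-9' ]+", " ", text)   # strip emojis and punctuation
    return re.sub(r"\s+", " ", text).strip().lower()

print(clean_post("Loving the new app!! 😍 #UserExperience https://t.co/xyz @brand"))
# -> "loving the new app userexperience"
```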
STEP 4: ANALYZING
• At this stage the clean data is analyzed for business insights. Depending on the layer of
social media analytics under consideration and the tools and algorithm employed, the
steps and approach you take will greatly vary. For example, nodes in a social media
network can be clustered and visualized in a variety of ways depending on the algorithm
employed. The overall objective at this stage is to extract meaningful insights without the
data losing its integrity. While most analytics tools will walk you through a step-by-step procedure to analyze your data, having background knowledge of the tools and algorithms employed and an understanding of the data will help you interpret the results correctly.
STEP 5: VISUALIZATION
• In addition to numerical results, most of the seven layers of social media analytics will also result in visual outcomes. The science of effective visualization, known as visual analytics, is becoming an important part of interactive decision making facilitated by solid visualization (Wong and Thomas 2004; Kielman and Thomas 2009). Effective visualization is particularly helpful with complex and huge data because it can reveal hidden patterns, relationships, and trends. It is the effective visualization of the results that will demonstrate the value of social media data to top management.
• Depending on the layer of the analytics, the analysis part will result in relevant visualizations for effective communication of results. Text analytics, for instance, can result in a word co-occurrence cloud; hyperlink analytics will provide visual hyperlink networks; and location analytics can produce interactive maps.
• Depending on the type of data, different types of visualization are possible, including the following (a minimal frequency-chart sketch follows this list):
  - Network data (with whom): network data visualizations can show who is connected to whom. For example, a Twitter follow-following network chart can show who is following whom. Different types of networks are discussed in a later chapter.
  - Topical data (what): topical data visualization is mostly focused on what aspect of a phenomenon is under investigation. A text cloud generated from social media comments can show which topics/themes occur more frequently in the discussion.
  - Temporal data (when): temporal data visualizations slice and dice data with respect to a time horizon and can reveal longitudinal trends, patterns, and relationships hidden in the data. Google Trends data, for example, can be used to visually investigate longitudinal search engine trends.
  - Geospatial data (where): geospatial data visualization is used to map and locate data, people, and resources.
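As a simple stand-in for the word clouds mentioned above, the sketch below plots the most frequent terms in a handful of made-up comments using collections.Counter and matplotlib; a dedicated word-cloud or network-visualization library would normally be used for richer visuals.

```python
# Frequency bar chart as a minimal text-visualization sketch (made-up comments).
from collections import Counter
import matplotlib.pyplot as plt

comments = [
    "love the new design", "the new design is great",
    "shipping was slow", "great support team", "slow shipping again",
]
counts = Counter(word for c in comments for word in c.split()).most_common(8)
words, freqs = zip(*counts)

plt.bar(words, freqs)
plt.title("Most frequent terms in comments")
plt.xlabel("term")
plt.ylabel("frequency")
plt.tight_layout()
plt.show()
```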
STEP 6: INTERPRETATION
• Interpreting and translating analytics results into a meaningful business
problem is the art part of social media analytics. This step relies on
human judgments to interpret valuable knowledge from the visual data.
• Meaningful interpretation is particularly important when we are dealing
with descriptive analytics that leave room for different interpretations.
• Having domain knowledge and expertise is crucial for interpreting the obtained results correctly.
Two strategies or approaches used here can be
1) producing easily consumable analytical results and
2) improving analytics consumption capabilities (Ransbotham 2015).
The first approach requires training data scientists and analysts to produce interactive and easy-to-use visual results. The second strategy focuses on improving the ability of managers and other decision makers to understand and act on the analytical results.
Social Network Analysis
• Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory.
• SNA is the practice of representing networks of people as graphs and then exploring these graphs.
• A typical social network representation has nodes for people, and edges connecting two nodes to represent one or more relationships between them, as shown in the figure.
• The resulting graph can reveal patterns of connection among people.
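A minimal sketch of this idea using the NetworkX library: a few made-up people are connected as a friendship graph, and degree centrality serves as a simple proxy for spotting influential nodes.

```python
# Friendship network as a graph; degree centrality highlights well-connected people.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "Dave"),
    ("Bob", "Carol"), ("Dave", "Eve"),
])

centrality = nx.degree_centrality(G)   # degree / (n - 1) for each node
for person, score in sorted(centrality.items(), key=lambda x: -x[1]):
    print(f"{person}: {score:.2f}")    # Alice scores highest here
```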
The following are some everyday types of social media networks that we come
across and that can be subject to network analytics.
• FRIENDSHIP NETWORKS: The most common type of social media network; Facebook friendship networks are a typical example.
• FOLLOW-FOLLOWING NETWORKS: In a follow-following network, users follow (or keep track of) other users of interest. Twitter is a good example of a follow-following network, where users follow influential people, brands, and organizations.
• CONTENT NETWORKS: Content networks are formed by the content posted by social media users. A network among YouTube videos is an example of a content network.
• PROFESSIONAL NETWORKS: LinkedIn is a good example of a professional network, where people manage their professional identity by creating a profile that lists their achievements, education, work history, and interests. Nodes in these networks are, for example, people, brands, and organizations, and links are professional relations (such as coworker, employee, or collaborator).
Introduction to Natural Language Processing
• Natural language processing (NLP) is the intersection of computer science, linguistics, and machine learning.
• The field focuses on communication between computers and humans in natural language.
• NLP is all about making computers understand and generate human language.
• Applications of NLP techniques include voice assistants like Amazon's Alexa and Apple's Siri, but also things like machine translation and text filtering.
The field is divided into three parts:

1. Speech Recognition: the translation of spoken language into text.
2. Natural Language Understanding: the computer's ability to understand what we say.
3. Natural Language Generation: the generation of natural language by a computer.
Understanding human language is considered a difficult task due to its complexity.
• For example, there is an infinite number of different ways to arrange words in a
sentence.
• Also, words can have several meanings and contextual information is necessary
to correctly interpret sentences.
Text Analytics
• Text analytics is the process of
transforming unstructured text
documents into usable, structured
data.
• Text analysis works by breaking apart
sentences and phrases into their
components, and then evaluating each
part’s role and meaning using complex
software rules and machine learning
algorithms.
• In broad terms, these NLP features aim to answer four questions:
1. Who is talking?
2. What are they talking about?
3. What are they saying about those subjects?
4. How do they feel?
Benefits of Text Analytics
• There is a range of ways that text analytics can help businesses, organizations, and even social movements:
• Helps businesses understand customer trends, product performance, and service quality. This results in quick decision making, enhanced business intelligence, increased productivity, and cost savings.
• Helps researchers explore a great deal of pre-existing literature in a short time, extracting what is relevant to their study. This enables quicker scientific breakthroughs.
• Assists in understanding general trends and opinions in society, which enables governments and political bodies to make decisions.
• Text analytic techniques help search engines and information retrieval systems improve their performance, thereby providing fast and relevant results to users.
Tokenization
• Tokens are the individual units of meaning we are operating on.
• This can be words, phonemes, or even full sentences.
• Tokenization is the process of breaking text documents apart into those pieces.
• In text analytics, tokens are most frequently just words.
E.g. A sentence of 10 words, then, would contain 10 tokens.
• Tokenization is language-specific, and each language has its own tokenization requirements.
• For example, English uses white space and punctuation to denote tokens and is relatively simple to tokenize.
• Many logographic (character-based) languages, such as Chinese, have no space breaks between words. Tokenizing these languages requires the use of machine learning algorithms.
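A minimal tokenization sketch using NLTK's word_tokenize, assuming the nltk package is installed and its tokenizer data can be downloaded:

```python
# English word tokenization with NLTK (downloads tokenizer data on first run).
import nltk

for pkg in ("punkt", "punkt_tab"):   # punkt_tab is only needed by newer NLTK releases
    try:
        nltk.download(pkg, quiet=True)
    except Exception:
        pass

from nltk.tokenize import word_tokenize

text = "Social media analytics isn't just counting likes."
print(word_tokenize(text))
# e.g. ['Social', 'media', 'analytics', 'is', "n't", 'just', 'counting', 'likes', '.']
```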
Bag of words (BoW)
• A bag-of-words model, or BoW for short, is a way of extracting features from text for use in modeling, such as with machine learning algorithms.
• Bag of words is a Natural Language Processing technique of text modelling.
• In technical terms, we can say that it is a method of feature extraction with text data.
• This approach is a simple and flexible way of extracting features from documents.
• A bag of words is a representation of text that describes the occurrence of words within a document, disregarding grammar and word order but keeping track of how often each word appears.
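A minimal bag-of-words sketch using scikit-learn's CountVectorizer (assuming scikit-learn 1.0+ is installed); each document becomes a vector of word counts over a shared vocabulary:

```python
# Bag-of-words with scikit-learn: documents -> matrix of word counts.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "The cat is on the mat.",
    "The dog is on the mat.",
]
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # ['cat' 'dog' 'is' 'mat' 'on' 'the']
print(bow.toarray())                       # [[1 0 1 1 1 2]
                                           #  [0 1 1 1 1 2]]
```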


Word weighting
• Weighting determines the importance of a term for a document.
• Weights are calculated by many different formulas that consider the
frequency of each term in a document and in the collection, as well as the
length of the document and the average or maximum length of any document
in the collection.
TF-IDF
• TF-IDF stands for “Term Frequency — Inverse Document Frequency”. This is
a technique to quantify words in a set of documents.
• We generally compute a score for each word to signify its importance in the
document
TF-IDF = Term Frequency (TF) * Inverse Document Frequency (IDF)
Term Frequency :
• This measures the frequency of a word in a document.
• This highly depends on the length of the document and the generality of the word, for
example, a very common word such as “was” can appear multiple times in a document.
• But if we take two documents with 100 words and 10,000 words respectively, there is a high probability that the common word "was" appears more often in the 10,000-word document.
• But we cannot say that the longer document is more important than the shorter
document.
• For this exact reason, we perform normalization on the frequency value: we divide the frequency by the total number of words in the document.
• Recall that we need to finally vectorize the document. TF is individual to each
document and word, hence we can formulate TF as follows:
tf(t, d) = (count of t in d) / (number of words in d)
where t = term (word) and d = document (set of words).
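A direct implementation of the tf(t, d) formula above, with deliberately naive whitespace tokenization to keep the sketch short:

```python
# Term frequency: count of the term divided by the number of words in the document.
def tf(term, document):
    words = document.lower().split()
    return words.count(term.lower()) / len(words)

print(tf("cat", "The cat is on the mat"))   # 1/6 ≈ 0.1667
```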
Document Frequency
• This measures the importance of a document in the whole set of the corpus.
• This is very similar to TF, but the only difference is that TF is the frequency counter for a term t in document d, whereas DF is the count of occurrences of term t in the document set N.
• In other words, DF is the number of documents in which the word is present. We count one occurrence if the term is present in the document at least once; we do not need to know the number of times the term is present.

df(t) = occurrence of t in N documents
Inverse document frequency

IDF helps assess the significance of a word based on its frequency across a collection of documents:
▪ IDF is high for terms that are rare across the corpus.
▪ IDF is low for terms that appear in many documents, as these are typically considered less informative.

The idea behind IDF is that if a term appears in many documents, it is likely a common term that doesn't provide much useful information for distinguishing documents from one another. Conversely, terms that appear in fewer documents are considered more meaningful for differentiating between documents.

Formula for IDF

The formula for calculating IDF is:
IDF(t) = log(N / df(t))
where N = number of documents in the corpus and df(t) = number of documents that contain the term t.
Example
Consider a corpus with 5 documents:
▪ "The cat is on the mat."
▪ "The dog is on the mat."
▪ "The cat and the dog are playing."
▪ "The dog is playing outside."
▪ "The cat is playing with the dog."
If the term "cat" appears in 3 of these documents, its IDF would be calculated as:

IDF(cat) = log(5/3) ≈ 0.2218

The term "dog" appears in 4 of the 5 documents, so:

IDF(dog) = log(5/4) ≈ 0.0969
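The IDF values above can be reproduced with a plain log-base-10 implementation of the formula; note that library implementations such as scikit-learn's TfidfVectorizer use a smoothed variant, so their numbers differ slightly.

```python
# IDF(t) = log10(N / df(t)) over the five-document example corpus.
import math
import re

corpus = [
    "The cat is on the mat.",
    "The dog is on the mat.",
    "The cat and the dog are playing.",
    "The dog is playing outside.",
    "The cat is playing with the dog.",
]

def tokens(doc):
    return re.findall(r"[a-z]+", doc.lower())   # crude word extraction

def idf(term, documents):
    n = len(documents)
    df = sum(1 for doc in documents if term in tokens(doc))
    return math.log10(n / df)

print(f"IDF(cat) = {idf('cat', corpus):.4f}")   # log10(5/3) ≈ 0.2218
print(f"IDF(dog) = {idf('dog', corpus):.4f}")   # log10(5/4) ≈ 0.0969
```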
N-gram

An N-gram can be defined as a contiguous sequence of n items from a given sample of text or speech. The items can be letters, words, or base pairs according to the application. N-grams are typically collected from a text or speech corpus (a long text dataset).

For example, "Medium blog" is a 2-gram (a bigram), "A Medium blog post" is a 4-gram, and "Write on Medium" is a 3-gram (a trigram).
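A minimal sketch of building word-level n-grams using only the Python standard library:

```python
# Contiguous n-grams from a token list.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = "Write on Medium".split()
print(ngrams(words, 2))   # bigrams: [('Write', 'on'), ('on', 'Medium')]
print(ngrams(words, 3))   # trigram: [('Write', 'on', 'Medium')]
```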


Stop Words
• The words which are generally filtered out before processing a natural
language are called stop words.
• These are actually the most common words in any language (like articles, prepositions, pronouns, conjunctions, etc.) and do not add much information to the text.
• Examples of a few stop words in English are “the”, “a”, “an”, “so”,
“what”.
• Stop words are available in abundance in any human language.
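A minimal stop-word filtering sketch using NLTK's English stop-word list, assuming the stopwords corpus can be downloaded:

```python
# Removing common English stop words from a token list with NLTK.
import nltk
nltk.download("stopwords", quiet=True)
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))
tokens = ["the", "new", "design", "is", "a", "big", "improvement"]
print([t for t in tokens if t not in stop_words])
# ['new', 'design', 'big', 'improvement']
```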
Stemming and Lemmatization
• Stemming is a technique used to extract the base form of the words by removing
affixes from them.
• It is just like cutting down the branches of a tree to its stems.
• For example, the stem of the words eating, eats, eaten is eat.
• Search engines use stemming for indexing the words.
• That’s why rather than storing all forms of a word, a search engine can store only
the stems.
• In this way, stemming reduces the size of the index and increases retrieval
accuracy.
• The lemmatization technique is like stemming.
• The output we get after lemmatization is called a 'lemma', which is a root word rather than a root stem (the output of stemming).
• After lemmatization, we get a valid word that means the same thing.
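A minimal sketch contrasting stemming and lemmatization with NLTK's PorterStemmer and WordNetLemmatizer, assuming the WordNet data can be downloaded:

```python
# Stemming chops affixes; lemmatization returns a valid dictionary word (lemma).
import nltk
for pkg in ("wordnet", "omw-1.4"):   # lemmatizer data (omw-1.4 needed by some NLTK versions)
    try:
        nltk.download(pkg, quiet=True)
    except Exception:
        pass

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["eating", "eats", "eaten"]:
    print(word,
          "-> stem:", stemmer.stem(word),
          "| lemma:", lemmatizer.lemmatize(word, pos="v"))
# eating -> stem: eat   | lemma: eat
# eats   -> stem: eat   | lemma: eat
# eaten  -> stem: eaten | lemma: eat   (stemming can miss forms that lemmatization handles)
```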
Parts of speech tagging
• Parts of speech tagging is the process in which words in sentences are tagged with their parts of speech.
• We are all familiar with the parts of speech used in the English language: reading a sentence involves identifying which word is a noun, adverb, conjunction, pronoun, etc.
• Consider an example: given the sentence "Mary had a little lamb", 'Mary' is a noun, 'had' is a verb, 'a' is a determiner, 'little' is an adjective, and 'lamb' is a noun.
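A minimal POS-tagging sketch with NLTK on the example sentence above, assuming the tokenizer and tagger models can be downloaded:

```python
# Tagging each word of the example sentence with its part of speech.
import nltk
for pkg in ("punkt", "punkt_tab",
            "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    try:
        nltk.download(pkg, quiet=True)   # *_tab/_eng names are used by newer NLTK releases
    except Exception:
        pass

sentence = "Mary had a little lamb"
print(nltk.pos_tag(nltk.word_tokenize(sentence)))
# e.g. [('Mary', 'NNP'), ('had', 'VBD'), ('a', 'DT'), ('little', 'JJ'), ('lamb', 'NN')]
```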
• Word Sense Disambiguation: This is the problem of determining in which "sense" (meaning) a word is being used in a sentence, i.e., the meaning of the word changes with the context. For example, consider the two sentences:
• He done work hardly.
• He hardly do any work.
• Here, "hardly" has two different meanings in the two sentences.
• Grammar checking: A grammar checker uses parts of speech tagging. First it takes the input and breaks it into sentences, and then it uses POS to tag the individual sentences. For example, consider the two sentences:
• I wish I was dead.
• I wish I were dead.
• Here, the second is grammatically correct while the first is not. A grammar checker uses POS tags to check whether the tags associated with the words follow the expected grammatical patterns.
• Text-to-Speech Conversion: For converting text to speech, we use parts of speech tagging, since POS taggers have to deal with unknown words (the out-of-vocabulary problem) and words with ambiguous POS tags (same structure in the sentence), such as nouns, verbs, and adjectives. As an example, consider the use of a participle as an adjective for a noun, as in "broken glass".

Sentiment Analysis
• Sentiment analysis is an automated process capable of understanding the feelings or opinions that underlie a text. It is one of the most interesting subfields of NLP, a branch of Artificial Intelligence (AI) that focuses on how machines process human language.
• Sentiment analysis studies the subjective information in an expression, that is, the
opinions, appraisals, emotions, or attitudes towards a topic, person or entity. Expressions
can be classified as positive, negative, or neutral.
• For example:
“I really like the new design of your website!” → Positive
“I’m not sure if I like the new design” →Neutral
“The new design is bad!” → Negative
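A minimal sentiment sketch using NLTK's VADER analyzer, which is tuned for short social media text, assuming the vader_lexicon data can be downloaded; the ±0.05 compound-score cutoffs are conventional thresholds and will not always match hand-assigned labels such as those above.

```python
# Rule-based sentiment scoring of short texts with NLTK's VADER.
import nltk
nltk.download("vader_lexicon", quiet=True)
from nltk.sentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
for text in [
    "I really like the new design of your website!",
    "I'm not sure if I like the new design",
    "The new design is bad!",
]:
    score = analyzer.polarity_scores(text)["compound"]   # -1 (negative) .. +1 (positive)
    label = "Positive" if score > 0.05 else "Negative" if score < -0.05 else "Neutral"
    print(f"{label:8} ({score:+.2f})  {text}")
```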
Examples of Sentiment Analysis
• Some of the most popular sentiment analysis business applications,
below:
1. Social media monitoring
2. Brand monitoring
3. Customer support analysis
4. Customer feedback analysis
5. Market research
Challenges to Social Media Analytics
▪ Data Overload: Social media generates massive amounts of data daily. Analyzing this data efficiently and extracting meaningful insights can be overwhelming without the right tools or strategies in place.
▪ Data Accuracy and Quality: Social media platforms can be prone to errors, spam accounts, or bots that distort the data. Ensuring that the data is accurate and clean is a significant challenge, as poorly cleaned data can lead to incorrect insights.
▪ Privacy Concerns: With increasing concerns about user privacy and data protection regulations (e.g., GDPR), social media analytics can be restricted in terms of how data is collected, processed, and shared, which limits the depth of analysis.
▪ Sentiment Analysis Complexity: Understanding emotions or opinions from text-based posts, comments, or reviews (e.g., sarcasm, irony, or mixed sentiments) can be difficult for automated sentiment analysis tools, which often leads to inaccurate assessments.
▪ Real-time Data Processing: Social media trends can change rapidly, so the ability to analyze and react to data in real time is important. However, this can require sophisticated technology and resources to manage, especially for large brands or campaigns.
