Sentiment Analysis of Social Media with Python _ by Haaya Naushan _ Towards Data Science

The document provides a beginner-friendly overview of Python tools for sentiment analysis in social media, detailing various methods and libraries such as TextBlob, NLTK-VADER, and deep learning techniques using transformers. It emphasizes the importance of contextualization in achieving higher accuracy in sentiment classification and discusses the advantages of different machine learning frameworks like TensorFlow and PyTorch. The author shares personal experiences and code snippets to guide readers in their sentiment analysis journey.

1/24/22, 6:19 AM Sentiment Analysis of Social Media with Python | by Haaya Naushan | Towards Data Science

Sentiment Analysis of Social Media with Python


Beginner-friendly overview of Python tools available for classifying sentiment in
social media text. I discuss my experiences using different tools and offer
suggestions to get you started on your own Python sentiment analysis journey!

Haaya Naushan Oct 2, 2020 · 7 min read

Photo by T. Selin Erkan on Unsplash

https://towardsdatascience.com/sentiment-analysis-of-social-media-with-python-45268dc8f23f

In ancient Rome, public discourse happened at the Forum at the heart of the city. People
gathered to exchange ideas and debate topics of social relevance. Today that public
discourse has moved online to the digital forums of sites like Reddit, the microblogging
arena of Twitter and other social media outlets. Perhaps as a researcher you are curious
what people’s opinions are about a specific topic, or perhaps as an analyst you wish to
study the effect of your company’s recent marketing campaign. Monitoring social media
with sentiment analysis is a good way to gauge public opinion. Luckily, with Python
there are many options available, and I will discuss the methods and tools I have
experimented with, along with my thoughts about the experience.

On my learning journey, I started with the simplest option, TextBlob, and worked my
way up to using transformers for deep learning with PyTorch and TensorFlow. If you are a
beginner to Python and sentiment analysis, don’t worry: the next section provides
background. Otherwise, feel free to skip ahead to my diagram below for a visual
overview of the Python natural language processing (NLP) playground.

Introduction to Sentiment Analysis

Sentiment analysis is a part of NLP; text can be classified by sentiment (sometimes
referred to as polarity), at a coarse or fine-grained level of analysis. Coarse sentiment
analysis could be either binary (positive or negative) classification or on a 3-point scale
that also includes neutral, whereas a 5-point scale would be fine-grained analysis,
representing highly positive, positive, neutral, negative and highly negative. Early
analysis relied on rule-based methods, like those used by the Python libraries TextBlob
and NLTK-VADER, both of which are popular amongst beginners. Most machine learning
(ML) methods are feature-based and involve either shallow or deep learning. Shallow
approaches include using classification algorithms in a single-layer neural network,
whereas deep learning for NLP necessitates multiple layers in a neural network. One of
these layers (the first hidden layer) will be an embedding layer, which contains
contextual information.
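
To make the coarse versus fine-grained distinction concrete, here is a minimal sketch that maps one polarity score in the range [-1, 1] onto a 3-point and a 5-point scale. The function names and every threshold are assumptions chosen for illustration; they do not come from TextBlob, VADER or any dataset mentioned in this post.

```python
def coarse_label(polarity: float, threshold: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to a 3-point (coarse) scale.

    The neutral band of +/- 0.05 is an illustrative assumption.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def fine_label(polarity: float) -> str:
    """Map the same score to a 5-point (fine-grained) scale.

    The cut-offs at 0.2 and 0.6 are illustrative assumptions.
    """
    if polarity >= 0.6:
        return "highly positive"
    if polarity >= 0.2:
        return "positive"
    if polarity > -0.2:
        return "neutral"
    if polarity > -0.6:
        return "negative"
    return "highly negative"


for score in (0.8, 0.3, 0.0, -0.3, -0.9):
    print(score, coarse_label(score), fine_label(score))
```

The same score of 0.3 and 0.8 both count as "positive" on the coarse scale, while the fine-grained scale separates them, which is why fine-grained classification is the harder task.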

A detailed explanation of neural nets is beyond the scope of this post; however, for our
purposes an oversimplification will suffice: neural networks are a collection of
algorithms that learn relationships about data in a way that mimics the network of
neurons in the human brain. For a deeper dive into the fascinating theory behind neural
networks, I suggest this introductory post.

A common theme I noticed is that the better a method is at capturing nuances from
context, the greater the sentiment classification accuracy. There are several techniques
for encoding or embedding text in a way that captures context for higher accuracy.
Therefore an embedding layer is integral to the success of a deep learning model. Today,
deep learning is advancing the NLP field at an exciting rate. At the cutting edge of deep
learning are transformers, pre-trained language models with potentially billions of
parameters, that are open-source and can be used for state-of-the-art accuracy scores. I
created the diagram below to showcase the Python libraries and ML frameworks
available for sentiment analysis, but don’t feel overwhelmed: there are several options
that are accessible for beginners.

Python libraries and machine learning frameworks available for sentiment analysis. Image by Author.

Rule-based Python Libraries

TextBlob is popular because it is simple to use, and it is a good place to start if you are
new to Python. An early project of mine involved data visualization of polarity and
subjectivity scores calculated with TextBlob. The code snippet below shows a
straightforward implementation of TextBlob on tweets streamed from Twitter in real
time; for the full code, check out my gist.

While using TextBlob is easy, unfortunately it is not very accurate, since natural
language, especially social media language, is complex and the nuance of context is
missed with rule-based methods. NLTK-VADER is an NLP package developed specifically
for processing social media text. I suggest checking it out if you are working with tweets
and looking for a point of comparison for TextBlob.

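    from textblob import TextBlob

    # Method excerpted from the tweet-streaming class in the full gist;
    # `clean_tweet` is another method of that class and is not shown here.
    def analyze_sentiment(self, tweet):
        analysis = TextBlob(self.clean_tweet(tweet))

        # Collapse TextBlob's polarity score (a float in [-1.0, 1.0]) to a 3-point label.
        if analysis.sentiment.polarity > 0:
            return 1   # positive
        elif analysis.sentiment.polarity == 0:
            return 0   # neutral
        else:
            return -1  # negative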


TextBlob example, full gist with real-time Twitter streaming is available.
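
To illustrate why rule-based scoring can miss context, here is a deliberately tiny lexicon scorer. It is a toy sketch, not the algorithm used by TextBlob or NLTK-VADER (both are considerably more elaborate and include heuristics this toy lacks); the `LEXICON` entries and the `lexicon_polarity` name are hypothetical.

```python
# Hypothetical word scores chosen for illustration only.
LEXICON = {
    "good": 1.0, "great": 1.0, "love": 1.0,
    "bad": -1.0, "terrible": -1.0, "hate": -1.0,
}


def lexicon_polarity(text: str) -> float:
    """Average the scores of known words; unknown words are ignored."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(lexicon_polarity("I love this phone"))                     # 1.0
print(lexicon_polarity("This is not good"))                      # 1.0, negation is ignored
print(lexicon_polarity("Great, my flight got cancelled again"))  # 1.0, sarcasm is missed
```

Each word is scored in isolation, so the second and third sentences come out as fully positive. Hand-written rules can patch individual cases such as negation, but the general problem of context is what motivates the learned approaches in the next sections.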

Machine Learning for Feature-based Methods

I realized that if I wanted greater accuracy, I needed to use machine learning;
contextualization was key. I started with conventional shallow learning approaches like
logistic regression and support vector machine algorithms used in single-layer neural
nets. Besides requiring less work than deep learning, the advantage is in extracting
features automatically from raw data with little or no preprocessing. I used the NLP
package spaCy in combination with the ML package scikit-learn to run simple
experiments. I was inspired by a blog post, where the author used these two packages to
detect insults in social commentary to identify bullies. For fine-grained sentiment
classification, machine learning (feature-based) has an advantage over rule-based
methods; this excellent post compares the accuracy of rule-based methods to
feature-based methods on the 5-class Stanford Sentiment Treebank (SST-5) dataset.
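
For readers who want to see what a shallow, feature-based classifier does under the hood, here is a dependency-free sketch of logistic regression over bag-of-words features. It is not the spaCy and scikit-learn setup described above; the function names, the four training sentences and the hyperparameters are all assumptions made for illustration.

```python
import math


def tokenize(text):
    return text.lower().split()


def train_logreg(texts, labels, epochs=200, lr=0.5):
    """Fit one weight per vocabulary word with plain stochastic gradient descent."""
    vocab = sorted({w for t in texts for w in tokenize(t)})
    index = {w: i for i, w in enumerate(vocab)}
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            ids = [index[w] for w in tokenize(text)]
            z = bias + sum(weights[i] for i in ids)
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(positive)
            grad = p - y                    # gradient of the log loss w.r.t. z
            bias -= lr * grad
            for i in ids:
                weights[i] -= lr * grad
    return index, weights, bias


def predict_proba(model, text):
    """Return P(positive); words unseen during training are skipped."""
    index, weights, bias = model
    z = bias + sum(weights[index[w]] for w in tokenize(text) if w in index)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy data: 1 = positive, 0 = negative.
texts = ["i love this movie", "what a great day", "this is awful", "i hate waiting"]
labels = [1, 1, 0, 0]

model = train_logreg(texts, labels)
print(predict_proba(model, "i love this movie"))  # should be above 0.5
print(predict_proba(model, "this is awful"))      # should be below 0.5
```

Unlike a fixed lexicon, the word weights here are learned from labeled examples, so the classifier adapts to whatever vocabulary the training data contains. A library such as scikit-learn provides the same idea with far better feature extraction and optimization.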

Deep Learning: Embeddings and Transformers

Deep learning and word embeddings further improved accuracy scores for sentiment
analysis. In 2013, Google created the Word2Vec embedding algorithm, which, along
with the GloVe algorithm, remains one of the two most popular word embedding methods.
For a practical walk-through, check out this post, where the author uses embeddings to
create a book recommendation system. Traditionally, for deep learning classification a
word embedding would be used as part of a recurrent or convolutional neural network.
However, these networks take a very long time to train, because recurrence and
convolutions are difficult to parallelize. Attention mechanisms improved the accuracy of
these networks, and then in 2017 the transformer architecture introduced a way to use
attention mechanisms without recurrence or convolutions. Therefore, the biggest
development in deep learning for NLP in the past couple of years is undoubtedly the
advent of transformers.
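
The attention mechanism at the core of the transformer can be sketched in a few lines. The version below is scaled dot-product attention written in plain Python for readability; real implementations use batched matrix operations, multiple heads and learned projections, none of which are shown. The function names and the toy vectors are assumptions for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:  # each query is handled independently of the others
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # how much this position attends to every other one
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# Self-attention: three toy token vectors attend to one another.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = attention(tokens, tokens, tokens)
print(contextual)
```

Because no step depends on the result for a previous position, all positions can be computed at once, which is the property that lets this approach avoid the sequential bottleneck of recurrence.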

Python Deep Learning Libraries

When I started studying deep learning, I relied on Reddit recommendations to pick a
Python framework to start with. The top suggestion for beginners was the Python
library Keras, which works as a functional API. I found it very accessible, especially since
it is built on top of the TensorFlow framework with enough abstraction that the details do
not become overwhelming, and straightforward enough that a beginner can learn by
playing with the code. Just because Keras simplifies deep learning does not mean
that it is ill-equipped to handle complex problems in a sophisticated way. It is relatively
easy to augment Keras with TensorFlow tools when necessary to tweak details at a low
level of abstraction; therefore, Keras is a capable competitor on the deep-learning
battlefield. In the code snippet below I was attempting to build a classifier from a
pre-trained language model while experimenting with multi-sample dropout and stratified
k-fold cross-validation, all of which was possible with Keras.

    import tensorflow as tf
    from tensorflow.keras import Model, layers
    from tensorflow.keras.losses import BinaryCrossentropy
    from tensorflow.keras.optimizers import Adam

    N_SAMPLES = 6  # number of dropout samples averaged in the classification head

    def create_model(bert_model, MAX_LEN=100):
        input_ids = layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name='input_ids')
        attention_mask = layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name='attention_mask')

        # Index [0] is the last hidden state; indexing works whether the model
        # returns a tuple or a Huggingface output object.
        last_hidden_state = bert_model({'input_ids': input_ids, 'attention_mask': attention_mask})[0]
        last_hidden_state = layers.Dropout(0.1)(last_hidden_state)

        # Pool over the sequence dimension in two ways and concatenate.
        x_avg = layers.GlobalAveragePooling1D()(last_hidden_state)
        x_max = layers.GlobalMaxPooling1D()(last_hidden_state)
        x = layers.Concatenate()([x_avg, x_max])

        # Multi-sample dropout: several independently dropped-out heads.
        samples = []
        for n in range(N_SAMPLES):
            sample_mask = layers.Dense(64, activation='relu', name=f'dense_{n}')
            sample = layers.Dropout(.5)(x)
            sample = sample_mask(sample)
            sample = layers.Dense(1, activation='sigmoid', name=f'sample_{n}')(sample)
            samples.append(sample)

        # Average the per-sample predictions into a single output.
        output = layers.Average(name='output')(samples)

        model = Model(inputs=[input_ids, attention_mask], outputs=output)
        # The end of this line is cut off in the source; metrics=['accuracy'] is the assumed completion.
        model.compile(Adam(learning_rate=1e-5), loss=BinaryCrossentropy(label_smoothing=0.1), metrics=['accuracy'])
        return model


Snippet of Keras code for a multi-dropout model, with sampling for stratified k-fold cross-validation.
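
The snippet above builds the model but does not show the cross-validation split itself. As a rough sketch of what "stratified k-fold" means, the function below assigns the examples of each class to folds in round-robin order so that every validation fold keeps roughly the same label mix as the full dataset. The function name and toy labels are assumptions; in practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used, and it also supports shuffling.

```python
from collections import defaultdict


def stratified_k_fold(labels, k):
    """Yield (train_indices, val_indices) pairs with a similar label mix per fold."""
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)

    # Deal the indices of each class into the k folds in round-robin order.
    folds = [[] for _ in range(k)]
    for indices in by_label.values():
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)

    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val


# Hypothetical imbalanced labels: four positives, two negatives.
labels = [1, 1, 1, 1, 0, 0]
splits = list(stratified_k_fold(labels, k=2))
for train_idx, val_idx in splits:
    print("train:", train_idx, "val:", val_idx)
```

With these labels, each validation fold ends up with two positives and one negative, so a fresh model built by `create_model` can be trained and evaluated once per fold without any fold lacking a class.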

My introduction to transformers was the adorably named Python library, Huggingface
transformers. This library makes it simple to use transformers with the major machine
learning frameworks, TensorFlow and PyTorch, and it also offers its own Huggingface
Trainer to fine-tune the assortment of pre-trained models it makes available. The most
popular transformer, BERT, is a language model pre-trained on a huge corpus; the base
model has 110 million parameters and the large model has 340 million parameters. For
sentiment classification, BERT has to be fine-tuned with a sentiment-labeled dataset on
a downstream classification task. This is referred to as transfer learning, which leverages
the power of pre-trained model weights that allow for the nuances of contextual
embedding to be transferred during the fine-tuning process. There are several other
transformers such as RoBERTa, ALBERT and ELECTRA, to name a few. In addition to
being very accessible, Huggingface has excellent documentation if you are interested in
exploring the other models, linked here. Additionally, since fine-tuning takes time on
CPUs, I suggest taking advantage of Colab notebooks, which will allow you to run
experiments for free on Google’s cloud GPUs (there is a monthly rate limit) for a faster
training time.

Which Machine Learning Framework Is Right for You?


I can offer my opinion on which machine learning framework I prefer based on my
experiences, but my suggestion is to try them all at least once. The OG framework
TensorFlow is an excellent ML framework; however, I mostly use either the PyTorch
framework (expressive, very fast, and complete control) or the HF Trainer
(straightforward, fast, and simple) for my NLP transformers experiments. My preference
for PyTorch is due to the control it allows in designing and tinkering with an experiment,
and because, in my experience, it is faster than Keras. If you prefer object-oriented
programming over functional, I suggest the PyTorch framework since the code makes use
of classes, and consequently is elegant and clear. In the code snippet below using
PyTorch, I create a classifier class and use a constructor to create an object from the
class, which is then executed by the class’s forward pass method.

    # Adapted from https://www.curiousily.com/posts/sentiment-analysis-with-bert-and-hugging-face-us
    # Example classifier class with super constructor and forward pass method
    from torch import nn
    from transformers import BertModel

    # Assumed checkpoint name; the original snippet uses this constant without defining it.
    PRE_TRAINED_MODEL_NAME = 'bert-base-cased'

    class SentimentClassifier(nn.Module):

        def __init__(self, n_classes):
            super(SentimentClassifier, self).__init__()
            self.bert = BertModel.from_pretrained(PRE_TRAINED_MODEL_NAME)
            self.drop = nn.Dropout(p=0.3)
            self.out = nn.Linear(self.bert.config.hidden_size, n_classes)

        def forward(self, input_ids, attention_mask):
            outputs = self.bert(
                input_ids=input_ids,
                attention_mask=attention_mask
            )
            # Index [1] is the pooled output; indexing works whether the model
            # returns a tuple or a Huggingface output object.
            pooled_output = outputs[1]
            output = self.drop(pooled_output)
            return self.out(output)  # raw class scores (logits)


Snippet of a Pytorch implementation of BERT using a Huggingface pre-trained model.

Additional code is needed to compute the loss, run a backward pass, and use an optimizer
to update the weights. The code for PyTorch is significantly longer than the code
required for Keras. If you prefer to write code quickly and not spell out every training
step, then Keras is a better option for you. However, if you want to understand
everything that happens during training, PyTorch makes this possible. For a step-by-step
guide to PyTorch with examples, check out this introductory post. For a cool project with
PyTorch, I recommend this great tutorial by Venelin Valkov, where he shows you how to
use BERT with Huggingface transformers and PyTorch, and then deploy that model with
FastAPI.

Hopefully this post shed some light on where to start for sentiment analysis with Python,
and what your options are as you progress. Personally, I look forward to learning more
about recent advancements in NLP so that I can better utilize the amazing Python tools
available.
