Sentiment Analysis of Social Media with Python | by Haaya Naushan | Towards Data Science
https://towardsdatascience.com/sentiment-analysis-of-social-media-with-python-45268dc8f23f 1/9
1/24/22, 6:19 AM Sentiment Analysis of Social Media with Python | by Haaya Naushan | Towards Data Science
In ancient Rome, public discourse happened at the Forum at the heart of the city. People
gathered to exchange ideas and debate topics of social relevance. Today that public
discourse has moved online to the digital forums of sites like Reddit, the microblogging
arena of Twitter and other social media outlets. Perhaps as a researcher you are curious
about what people’s opinions are on a specific topic, or perhaps as an analyst you wish to
study the effect of your company’s recent marketing campaign. Monitoring social media
with sentiment analysis is a good way to gauge public opinion. Luckily, with Python
there are many options available, and I will discuss the methods and tools I have
experimented with, along with my thoughts about the experience.
On my learning journey, I started with the simplest option, TextBlob, and worked my
way up to using transformers for deep learning with PyTorch and TensorFlow. If you are a
beginner to Python and sentiment analysis, don’t worry: the next section provides
background. Otherwise, feel free to skip ahead to my diagram below for a visual
overview of the Python natural language processing (NLP) playground.
A detailed explanation of neural nets is beyond the scope of this post; however, for our
purposes an oversimplification will suffice: neural networks are a collection of
algorithms that learn relationships in data in a way that loosely mimics the network of
neurons in the human brain. For a deeper dive into the fascinating theory behind neural
networks, I suggest this introductory post.
A common theme I noticed is that the better a method is at capturing nuances from
context, the greater its sentiment classification accuracy. There are several techniques
for encoding or embedding text in a way that captures context for higher accuracy, which
is why an embedding layer is integral to the success of a deep learning model. Today,
deep learning is advancing the NLP field at an exciting rate. At the cutting edge of deep
learning are transformers: pre-trained language models, some with billions of
parameters, many of which are open-source and can be used to reach state-of-the-art
accuracy. I created the diagram below to showcase the Python libraries and ML frameworks
available for sentiment analysis, but don’t feel overwhelmed: several of these options
are accessible to beginners.
Python libraries and machine learning frameworks available for sentiment analysis. Image by Author.
TextBlob is popular because it is simple to use, and it is a good place to start if you are
new to Python. An early project of mine involved data visualization of polarity and
subjectivity scores calculated with TextBlob. The code snippet below shows a
straightforward implementation of TextBlob on tweets streamed from Twitter in real
time; for the full code, check out my gist.
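In TextBlob itself, `TextBlob(text).sentiment` returns a named tuple. Its `polarity` lies in [-1, 1] and its `subjectivity` lies in [0, 1]. To make those two scores concrete, here is a toy sketch of the lexicon-averaging idea behind rule-based scorers like TextBlob's default analyzer. Each known word carries a polarity and a subjectivity, and a text's scores are the averages over the words found in the lexicon. The tiny `LEXICON` and its values are hypothetical. TextBlob's real analyzer uses a much larger lexicon and extra rules (for example for intensifiers and negation) that this sketch omits.

```python
import re

# Hypothetical toy lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1]).
# The values are made up for illustration; they are not TextBlob's lexicon.
LEXICON = {
    "great": (0.8, 0.75),
    "love": (0.5, 0.6),
    "terrible": (-1.0, 1.0),
    "slow": (-0.3, 0.4),
}

def toy_sentiment(text):
    """Average polarity and subjectivity over the words found in LEXICON.

    Text with no lexicon words is treated as neutral and objective: (0.0, 0.0).
    """
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    if not hits:
        return 0.0, 0.0
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity

tweets = [
    "I love this phone, the camera is great!",
    "Delivery was slow and support was terrible.",
    "Just landed in Toronto.",
]
for tweet in tweets:
    polarity, subjectivity = toy_sentiment(tweet)
    print(f"{polarity:+.2f} {subjectivity:.2f}  {tweet}")
```

Because the score is just an average of per-word entries, word order and context never enter the calculation.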
While TextBlob is easy to use, unfortunately it is not very accurate: natural
language, especially social media language, is complex, and rule-based methods miss
the nuance of context. VADER, a lexicon- and rule-based sentiment tool available through
NLTK, was developed specifically for social media text. I suggest checking it out if you
are working with tweets and looking for a point of comparison for TextBlob.
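VADER also relies on a sentiment lexicon, but it adds heuristics tuned for social media and reports a normalized `compound` score. In the vaderSentiment reference implementation, the sum of word valences is mapped into [-1, 1] with `score / sqrt(score**2 + alpha)` using `alpha = 15`. Negated words are flipped and dampened by a factor of -0.74. The sketch below reproduces just those two ideas under simplifying assumptions. The `VALENCE` table is hypothetical, and only the immediately preceding word is checked for negation. VADER looks further back and applies many more rules, such as capitalization, punctuation emphasis, intensifiers and "but" contrasts.

```python
import math

# Hypothetical valences on VADER's -4..+4 scale (illustrative, not the real lexicon).
VALENCE = {"good": 1.9, "great": 3.1, "bad": -2.5, "awful": -2.0}
NEGATORS = {"not", "never", "isn't", "wasn't", "don't"}
NEGATION_SCALAR = -0.74  # flips and dampens the valence of a negated word

def normalize(score, alpha=15):
    """Squash an unbounded valence sum into [-1, 1] via x / sqrt(x^2 + alpha)."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))

def toy_compound(text):
    """Sum word valences with a simplified negation rule, then normalize."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    total = 0.0
    for i, word in enumerate(words):
        valence = VALENCE.get(word, 0.0)
        # Simplification: only the immediately preceding word is checked for negation.
        if valence and i > 0 and words[i - 1] in NEGATORS:
            valence *= NEGATION_SCALAR
        total += valence
    return normalize(total)

print(round(toy_compound("The food was good"), 3))
print(round(toy_compound("The food was not good"), 3))
```

Even this crude negation rule changes the sign of "not good". A pure averaging scorer with no rules would still rate that phrase positive.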
Deep learning and word embeddings further improved accuracy scores for sentiment
analysis. In 2013, Google created the Word2Vec embedding algorithm; together with the
GloVe algorithm, it remains one of the two most popular word embedding methods. For
a practical walk-through, check out this post, where the author uses embeddings to
create a book recommendation system. Traditionally, for deep learning classification, a
word embedding would be used as part of a recurrent or convolutional neural network.
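A word embedding maps each word to a dense vector. Words used in similar contexts end up with vectors pointing in similar directions, which is usually measured with cosine similarity. The 4-dimensional vectors below are hypothetical stand-ins for trained Word2Vec or GloVe vectors, which typically have 50 to 300 dimensions. Averaging a sentence's word vectors, shown in `sentence_vector`, is one simple way to turn variable-length text into a fixed-size classifier input. Recurrent and convolutional networks instead consume the sequence of vectors directly.

```python
import math

# Hypothetical 4-d embeddings; trained Word2Vec/GloVe vectors would be loaded from disk.
EMBEDDINGS = {
    "happy":  [0.9, 0.1, 0.3, 0.0],
    "joyful": [0.8, 0.2, 0.4, 0.1],
    "sad":    [-0.7, 0.1, 0.2, 0.0],
    "tweet":  [0.0, 0.9, -0.1, 0.5],
}

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|), a value in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def sentence_vector(words):
    """Embedding lookup for known words followed by mean pooling."""
    dim = len(next(iter(EMBEDDINGS.values())))
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]

print(round(cosine(EMBEDDINGS["happy"], EMBEDDINGS["joyful"]), 3))
print(round(cosine(EMBEDDINGS["happy"], EMBEDDINGS["sad"]), 3))
print([round(x, 3) for x in sentence_vector(["happy", "tweet"])])
```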
However, recurrent and convolutional networks can take a very long time to train.
Recurrent networks process a sentence one token at a time, which makes training hard to
parallelize, and convolutional networks need many stacked layers to relate words that are
far apart. Attention mechanisms improved the accuracy of these networks, and then in 2017
the transformer architecture introduced a way to use attention mechanisms without
recurrence or convolutions. The biggest development in deep learning for NLP in the past
couple of years is undoubtedly the advent of transformers.
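The operation that lets transformers drop recurrence is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. Every token's query is scored against every token's key. Each query's row is computed independently of the others, so in matrix form all positions can be processed in parallel. Below is a minimal pure-Python sketch of a single attention head. It omits the learned query, key and value projections and multiple heads, and the tiny 3-token input `X` is hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Return (softmax(Q K^T / sqrt(d_k)) V, attention weights) for lists of row vectors.

    Q: n_queries x d_k, K: n_keys x d_k, V: n_keys x d_v.
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # One score per key: how strongly this query matches each position.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # The output is the attention-weighted sum of the value vectors.
        outputs.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return outputs, weights

# Hypothetical 3-token sequence with d_k = d_v = 2 (self-attention: Q = K = V = X).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(X, X, X)
for row in attn:
    print([round(w, 3) for w in row])
```

Each printed row sums to 1. It shows how much one token draws on every token in the sequence, including tokens far away, in a single step.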
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.optimizers import Adam

N_SAMPLES = 6  # number of dropout-sampled classification heads

def create_model(bert_model, MAX_LEN=100):
    # Token IDs and attention mask produced by the BERT tokenizer
    input_ids = layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name='input_ids')
    attention_mask = layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name='attention_mask')

    # Index 0 of the BERT output is the last hidden state: (batch, MAX_LEN, hidden_size)
    last_hidden_state = bert_model({'input_ids': input_ids, 'attention_mask': attention_mask})[0]
    last_hidden_state = layers.Dropout(0.1)(last_hidden_state)

    # Pool over the sequence dimension with both mean and max, then concatenate
    x_avg = layers.GlobalAveragePooling1D()(last_hidden_state)
    x_max = layers.GlobalMaxPooling1D()(last_hidden_state)
    x = layers.Concatenate()([x_avg, x_max])

    # Multi-dropout: each head has its own Dense layers and an independent dropout mask
    samples = []
    for n in range(N_SAMPLES):
        sample_mask = layers.Dense(64, activation='relu', name=f'dense_{n}')
        sample = layers.Dropout(.5)(x)
        sample = sample_mask(sample)
        sample = layers.Dense(1, activation='sigmoid', name=f'sample_{n}')(sample)
        samples.append(sample)

    # Average the heads' probabilities into a single binary prediction
    output = layers.Average(name='output')(samples)

    model = Model(inputs=[input_ids, attention_mask], outputs=output)
    model.compile(Adam(learning_rate=1e-5),
                  loss=BinaryCrossentropy(label_smoothing=0.1),
                  metrics=['accuracy'])
    return model
Snippet of Keras code for a multi-dropout model, with sampling for stratified k-fold cross-validation.
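The model above is compiled with `BinaryCrossentropy(label_smoothing=0.1)`. For binary targets, Keras smooths labels toward 0.5 by computing `y * (1 - label_smoothing) + 0.5 * label_smoothing`. With 0.1, a positive label becomes 0.95 and a negative label becomes 0.05, which penalizes over-confident predictions. Below is a pure-Python sketch of the smoothed loss for a single prediction. The clipping `eps` is an assumption meant to mimic Keras's guard against `log(0)`, not a copy of its internals.

```python
import math

def smooth_labels(y, label_smoothing):
    """Squeeze a binary target toward 0.5, as Keras's BinaryCrossentropy does."""
    return y * (1.0 - label_smoothing) + 0.5 * label_smoothing

def binary_crossentropy(y_true, y_pred, label_smoothing=0.0, eps=1e-7):
    """Binary cross-entropy for one example, with optional label smoothing."""
    y = smooth_labels(y_true, label_smoothing)
    p = min(max(y_pred, eps), 1.0 - eps)  # clip so log() never sees 0
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

print(f"{smooth_labels(1.0, 0.1):.2f} {smooth_labels(0.0, 0.1):.2f}")
for p in (0.90, 0.95, 0.999):
    plain = binary_crossentropy(1.0, p)
    smoothed = binary_crossentropy(1.0, p, label_smoothing=0.1)
    print(f"p={p}: plain={plain:.4f} smoothed={smoothed:.4f}")
```

Without smoothing, the loss keeps shrinking as the prediction approaches 1. With smoothing, it is smallest at p = 0.95 and rises again for more confident predictions.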
With PyTorch, additional code is needed to compute the loss, run a backward pass, and use
an optimizer to update the weights, so the PyTorch code is significantly longer than the
code required for Keras. If you prefer to write code quickly and not spell out every training
step, then Keras is a better option for you. However, if you want to understand
everything that happens during training, PyTorch makes this possible. For a step-by-step
guide to PyTorch with examples, check out this introductory post. For a cool project with
PyTorch, I recommend this great tutorial by Venelin Valkov, where he shows you how to
use BERT with Hugging Face Transformers and PyTorch, and then deploy that model with
FastAPI.
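In a PyTorch training loop, each iteration spells out these steps. You run a forward pass and compute the loss, call `optimizer.zero_grad()` and `loss.backward()` to obtain gradients, then call `optimizer.step()` to update the weights. The sketch below shows what those calls do without needing a PyTorch install. It trains a one-feature logistic regression (effectively a single sigmoid neuron) in pure Python, with hand-derived gradients and plain gradient descent. The toy `data` and the learning rate are hypothetical.

```python
import math

# Hypothetical toy data: one feature per example (e.g. a lexicon score) and a 0/1 label.
data = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]

w, b = 0.0, 0.0  # model parameters
learning_rate = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for epoch in range(200):
    grad_w = grad_b = 0.0  # like optimizer.zero_grad(): gradients start at zero each step
    loss = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)                                  # forward pass
        loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))  # binary cross-entropy
        # Backward pass: for sigmoid + BCE, dLoss/dz = p - y; the chain rule gives dw and db.
        grad_w += (p - y) * x
        grad_b += p - y
    n = len(data)
    loss, grad_w, grad_b = loss / n, grad_w / n, grad_b / n
    # Optimizer step: plain gradient descent on the mean loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# `loss` is the last epoch's mean loss, computed before that epoch's update.
print(f"w={w:.3f} b={b:.3f} loss={loss:.4f}")
print([round(sigmoid(w * x + b)) for x, _ in data])
```

Keras's `model.fit` hides this loop entirely. PyTorch automates the gradient calculation through autograd but leaves the loop itself to you.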
Hopefully this post shed some light on where to start for sentiment analysis with Python,
and what your options are as you progress. Personally, I look forward to learning more
about recent advancements in NLP so that I can better utilize the amazing Python tools
available.