Analyzing The Performance of Sentiment Analysis Using BERT DistilBERT and RoBERTa
Abstract—Sentiment analysis, or opinion mining, is a natural language processing (NLP) technique to identify, extract, and quantify the emotional tone behind a body of text. It helps to capture public opinion and user interests on various topics based on comments on social events, product reviews, film reviews, etc. Linear Regression, Support Vector Machines, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), LSTM (Long Short-Term Memory), and other machine learning and deep learning algorithms can be used to analyze the sentiment behind a text. This work analyses the sentiments behind movie reviews and tweets using the Coronavirus tweets NLP dataset and the Sentiment140 dataset. Three advanced transformer-based deep learning models, BERT, DistilBERT, and RoBERTa, are experimented with to perform the sentiment analysis. Finally, the performance obtained using these models on the two datasets is compared using accuracy as the performance evaluation metric. On analyzing the performance, it can be seen that the BERT model outperforms the other two models.
Index Terms—Sentiment Analysis, Natural Language Processing, BERT, DistilBERT, RoBERTa
I. INTRODUCTION
As a result of the rapid development of information technology, social media platforms such as Twitter, Instagram, and Facebook have emerged to play an important role in modern life. Online platforms have replaced traditional sources of textual data over the past few years. Extracting and utilizing useful information from user-generated data is crucial for various organizations and governments [1]. One of the biggest challenges in performing efficient data analysis is extracting sentiment from textual data to determine the author’s attitude or opinion. Sentiment analysis, also called opinion mining, is the technique of identifying, obtaining, and categorizing subjective data from unstructured text using various text analyses and linguistic techniques [2].
Although statistical machine learning algorithms such as linear regression and Support Vector Machines perform well in simpler sentiment analysis applications, they cannot be generalized to more complex sentiment analysis problems. Deep learning models like CNN, RNN, and LSTM, on the other hand, produce significant results in sentiment analysis [2]. BERT (Bidirectional Encoder Representations from Transformers) has become an emerging technology for various NLP tasks like text classification, sentiment analysis, etc. This work aims to experimentally evaluate BERT for sentiment analysis on the Sentiment140 and Coronavirus tweets NLP datasets and compares the results with the DistilBERT and RoBERTa models.
II. RELATED WORKS
Researchers used Information Gain and K-Nearest Neighbour (KNN) to conduct research on sentiment analysis of movie reviews in 2020 [3]. In order to improve the performance of the system, feature selection was made with Information Gain. They made use of the Polarity v2.0 dataset from the Cornell movie review dataset, and on comparing with Naïve Bayes, SVM, and Random Forest, the KNN model with information gain achieved the best performance with 96.8% accuracy. Although KNN gave a comparatively good accuracy, the results can be further improved by automating the process of selecting the optimal threshold. Ghorbani et al. [4] developed a ConvLSTMConv hybrid model in 2020, which used the architecture of the Convolutional Neural Network (CNN) and the Long Short-Term Memory (LSTM) network to determine the polarity of words on the Google cloud. It made use of the Movie Reviews (MR) dataset, which is a collection of negative and positive movie reviews with a sentence in each. A CNN is used to extract the features, the contextual information is learned by a BiLSTM, and before the final dense layer is applied, the results are passed through a CNN again to provide an abstract feature. This model achieved 89.02% accuracy. Since two CNNs were utilized for the model, the complexity was higher compared to other classifiers. Jnoub et al. [5], in the year 2020, introduced a neural model-based domain-independent classification model for sentiment analysis. In those models, the sentiment classification task was performed using CNN and SNN (Shallow Neural Network). The model was trained on the IMDb dataset and then tested on three different datasets: the IMDb dataset, the Movie Reviews (MR) dataset, and a custom dataset collected from Amazon reviews that rate users’ opinions about Apple products. The model can automatically and adaptively extract spatial hierarchies of features from written reviews that may capture different writing styles of users. SNN outperformed CNN in generalization performance using the MR dataset, with an accuracy of 0.82. These models
Authorized licensed use limited to: Rajshahi University Of Engineering and Technology. Downloaded on November 21,2023 at 08:04:55 UTC from IEEE Xplore. Restrictions apply.
Fig. 2. Sample data from Sentiment140 dataset
Fig. 3. Plot of the distribution of class in training and testing data of Sentiment140 dataset
Fig. 4. Sample data from Coronavirus tweets NLP dataset
C. Proposed Framework
corresponds to validation loss, and the line indicated in red corresponds to training loss. After each epoch, both training and validation loss are seen to decrease for all three models.
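The observation that both losses fall after every epoch can be checked programmatically rather than read off a plot. The following is a minimal sketch, not the authors' code: the helper names `decreases_every_epoch` and `first_rising_epoch` are hypothetical, and the loss values are made-up illustrative numbers, not results from this work.

```python
from typing import Optional, Sequence


def decreases_every_epoch(losses: Sequence[float]) -> bool:
    """Return True if each epoch's loss is strictly lower than the previous one."""
    return all(later < earlier for earlier, later in zip(losses, losses[1:]))


def first_rising_epoch(losses: Sequence[float]) -> Optional[int]:
    """Return the 1-based epoch at which the loss first rises, or None if it never does."""
    for epoch, (earlier, later) in enumerate(zip(losses, losses[1:]), start=2):
        if later > earlier:
            return epoch
    return None


# Illustrative per-epoch losses only (assumed values, not taken from the paper).
example_train_loss = [0.52, 0.31, 0.24]
example_val_loss = [0.47, 0.35, 0.33]

print(decreases_every_epoch(example_train_loss))  # training loss trend
print(decreases_every_epoch(example_val_loss))    # validation loss trend
print(first_rising_epoch(example_val_loss))       # None when validation loss never rises
```

A validation loss that starts rising while training loss keeps falling is the usual sign of overfitting, which is why both curves are tracked per epoch.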
TABLE I
ACCURACY SCORES OF THE THREE MODELS IN SENTIMENT140 DATASET
NLP dataset when compared to the DistilBERT and RoBERTa models. The BERT model gives 95.3% training accuracy and 93.13% testing accuracy. The loss graphs of the three models, BERT, DistilBERT, and RoBERTa, are shown in Figs. 10, 11, and 12, respectively. The line indicated in blue corresponds to validation loss, and the line indicated in red corresponds to training loss. After each epoch, both training and validation loss are seen to decrease for all three models.
For testing the three models, 5000 data samples from the Sentiment140 dataset and 100 data samples from the Coronavirus tweets NLP dataset were given to the model, which obtained an accuracy of 90.43% and 93.76% on the Coronavirus tweets NLP dataset and the Sentiment140 dataset, respectively. From the results, it can be seen that the BERT model obtains better performance. The predictions for a sample of test data from the Coronavirus tweets NLP dataset and the Sentiment140 dataset with the BERT model are shown in Figs. 13 and 14.
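Accuracy, the evaluation measure used throughout this comparison, is the fraction of test samples whose predicted label matches the true label. A minimal sketch of that computation follows; the function name `accuracy` and the example labels are assumptions for illustration and are not taken from the paper's experiments.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Illustrative labels only (0 = negative, 1 = positive); not data from the paper.
example_predicted = [1, 0, 1, 1, 0, 1, 0, 0]
example_actual = [1, 0, 0, 1, 0, 1, 1, 0]

# Six of the eight predictions match, so this prints "accuracy = 75.00%".
print(f"accuracy = {accuracy(example_predicted, example_actual):.2%}")
```

Accuracy treats every class equally, so it is most informative when the classes are roughly balanced in the test set.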
TABLE II
ACCURACY SCORES OF THE THREE MODELS IN CORONAVIRUS TWEETS NLP DATASET
Fig. 10. Loss graph of BERT model on Coronavirus tweets NLP dataset
Fig. 11. Loss graph of DistilBERT model on Coronavirus tweets NLP dataset
Fig. 12. Loss graph of RoBERTa model on Coronavirus tweets NLP dataset
Fig. 13. Predicted labels for a sample of test data from Coronavirus tweets NLP dataset in BERT model
Fig. 14. Predicted labels for a sample of test data from Sentiment140 dataset in BERT model
V. CONCLUSION
Deep learning models are currently popular in the field of sentiment analysis, but existing traditional models can be improved in accuracy [12]. This work experimented with a proposed BERT model for sentiment analysis on two different datasets and compared the results with the DistilBERT and RoBERTa models. On analysing the results, it is seen that the BERT model achieved better accuracy than the other two models on the Sentiment140 and Coronavirus tweets NLP datasets, with a training accuracy of 95.3%, validation accuracy of 93.13%, and testing accuracy of 92.76% on the Sentiment140 dataset and a training accuracy of 94.1%, validation accuracy of 81.3%, and testing accuracy of 90.43% on the Coronavirus tweets NLP dataset. Since DistilBERT has fewer layers than BERT due to the distillation process, its accuracy may vary slightly, and the NSP (Next Sentence Prediction) task used in pre-training BERT seems to be useful on these datasets, as the BERT model achieves higher accuracy than the RoBERTa model. Pre-training the model on more task-specific data may further improve the accuracy.
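One way to read the BERT accuracies reported in the conclusion is to look at the difference between training and validation accuracy on each dataset. The sketch below only restates the reported figures and subtracts them; the dictionary layout and the helper name `train_validation_gap` are assumptions made for illustration, and the gap is not a quantity reported by the authors.

```python
from typing import Dict

# BERT accuracies (in percent) as stated in the conclusion of this paper.
reported_bert_accuracy: Dict[str, Dict[str, float]] = {
    "Sentiment140": {"train": 95.3, "validation": 93.13, "test": 92.76},
    "Coronavirus tweets NLP": {"train": 94.1, "validation": 81.3, "test": 90.43},
}


def train_validation_gap(scores: Dict[str, float]) -> float:
    """Training accuracy minus validation accuracy, rounded to two decimals."""
    return round(scores["train"] - scores["validation"], 2)


for dataset, scores in reported_bert_accuracy.items():
    gap = train_validation_gap(scores)
    print(f"{dataset}: train-validation gap = {gap} percentage points")
```

With these figures the gap is 2.17 points on Sentiment140 and 12.8 points on the Coronavirus tweets NLP dataset, so the training and validation figures sit much further apart on the second dataset.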
REFERENCES
[1] Q. A. Xu, V. Chang, and C. Jayne, “A systematic review of social media-based sentiment analysis: Emerging trends and challenges,” Decision Analytics Journal, p. 100073, 2022.
[2] S. Tam, R. B. Said, and Ö. Ö. Tanriöver, “A ConvBiLSTM deep learning model-based approach for Twitter sentiment classification,” IEEE Access, vol. 9, pp. 41283–41293, 2021.
[3] N. O. F. Daeli and A. Adiwijaya, “Sentiment analysis on movie reviews using information gain and k-nearest neighbor,” Journal of Data Science and Its Applications, vol. 3, no. 1, pp. 1–7, 2020.
[4] M. Ghorbani, M. Bahaghighat, Q. Xin, and F. Özen, “ConvLSTMConv network: a deep learning approach for sentiment analysis in cloud computing,” Journal of Cloud Computing, vol. 9, no. 1, pp. 1–12, 2020.
[5] N. Jnoub, F. Al Machot, and W. Klas, “A domain-independent classification model for sentiment analysis using neural models,” Applied Sciences, vol. 10, no. 18, p. 6221, 2020.
[6] I. Priyadarshini and C. Cotton, “A novel LSTM–CNN–grid search-based deep neural network for sentiment analysis,” The Journal of Supercomputing, vol. 77, no. 12, pp. 13911–13932, 2021.
[7] S. Kausar, X. Huahu, W. Ahmad, and M. Y. Shabir, “A sentiment polarity categorization technique for online product reviews,” IEEE Access, vol. 8, pp. 3594–3605, 2019.
[8] A. U. Rehman, A. K. Malik, B. Raza, and W. Ali, “A hybrid CNN-LSTM model for improving accuracy of movie reviews sentiment analysis,” Multimedia Tools and Applications, vol. 78, no. 18, pp. 26597–26613, 2019.
[9] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
[10] V. Sanh, L. Debut, J. Chaumond, and T. Wolf, “DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter,” arXiv preprint arXiv:1910.01108, 2019.
[11] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “RoBERTa: A robustly optimized BERT pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.