XLNet Transfer Learning Model for Sentimental Analysis
Abstract— In natural language processing, sentiment analysis is an important task that involves categorizing textual content according to whether it expresses a positive, negative, or neutral sentiment. Sentiment analysis systems face challenges such as ambiguity, subjectivity, contextual understanding, and domain adaptation, which make accurately determining the sentiment of a text a complex task. To address these challenges, this work develops a transfer learning-based XLNet model for sentiment analysis on movie review datasets. In recent years, transformer-based models have produced notable advances across several NLP tasks. This work examines the feasibility of using the XLNet model for sentiment analysis by fine-tuning it on a labeled sentiment analysis dataset. First, the dataset is preprocessed and the pre-trained XLNet model is loaded. A classification layer is then added to the model, and transfer learning is applied to fine-tune it on the sentiment analysis dataset. The effectiveness of the proposed approach is evaluated on a test set, and accuracy, precision, recall, and F1 score are reported. Experimental results indicate that the XLNet model achieves better results than other transformer-based models on the movie review dataset, demonstrating the effectiveness of transfer learning with XLNet for sentiment analysis.

Keywords- XLNet, Sentiment analysis, Transfer learning, Fine-tuning, Transformer model.

I. INTRODUCTION

Sentiment analysis is a branch of natural language processing that seeks to automatically detect and extract subjective and emotional information from text. It has various applications, including social media monitoring, brand reputation management, customer feedback analysis, and market research. In recent years, with the rapid growth of social media and the internet, sentiment analysis has become increasingly important for businesses and organizations that need to understand public opinion and respond to customer needs.

In sentiment analysis, text data is classified into three categories: positive, negative, or neutral. Traditional approaches to sentiment analysis relied on handcrafted features and machine learning techniques such as SVM and Naive Bayes classifiers. However, these methods were limited by the quality and quantity of labeled data and by the complexity of the relationships between words and their contexts.

In machine learning-based sentiment analysis, a labeled dataset is used to train a model that classifies new text into one of the three sentiment categories. The input features can include word frequencies, word embeddings, or other linguistic features, and the model is trained with supervised learning methods such as logistic regression (LR), SVM, or decision trees.

CNNs have been used for sentiment analysis by treating the text as an image and using convolutional and pooling layers to extract features from it. The input is first converted into a matrix in which each row represents a word or a group of words and each column represents a feature. The convolutional layers learn patterns in the text by sliding a filter over the matrix, the pooling layers reduce the size of the convolutional output through downsampling, and the fully connected layers then use the learned features to predict the sentiment of the text.

Recent advances in deep learning and transformer-based models have revolutionized the field of NLP, including sentiment analysis. Transformer models such as BERT, GPT-2, and XLNet [9][13] have shown significant improvements in sentiment analysis compared to traditional machine learning and CNN-based models. The self-attention mechanism in these models enables them to better capture the relationships between words and their contexts, as well as long-range dependencies among the words in a sentence.

In transformer models [11][12][14][15], the input text is tokenized and converted into embeddings, which are then fed into multiple transformer layers. In these layers, a self-attention mechanism models the relationships between words and their contexts within the text, and the output of the transformer layers is used to make a
prediction on the sentiment of the text. Transformer-based models can capture long-term dependencies between words and handle context better than traditional machine learning models.

The main contributions of this work are outlined below:

1) The labeled sentiment analysis dataset is preprocessed by cleaning it and removing unwanted characters, stop words, and punctuation. The text is then tokenized into individual words or subwords.

2) The pre-trained XLNet model is loaded, and a classification layer is appended to it.

3) To optimize the performance of the model, the hyperparameters, including the learning rate, batch size, and number of epochs, are fine-tuned.

4) After fine-tuning, the model's performance is assessed and compared with that of other models.

The remainder of this paper is organized as follows. Section II provides an overview of related research. Section III presents the proposed methodology in detail. Section IV discusses the performance of the proposed model and compares it with other models. Finally, Section V concludes the work.

II. RELATED WORKS

A neural network model for text sentiment analysis that combines the pre-trained BERT language model, BiLSTM, and an attention mechanism is introduced in [1]. The model aims to overcome a limitation of existing sentiment classification models, which fail to consider the context of words. To address this issue, it uses BERT to obtain word vectors that carry contextual semantic information, BiLSTM to extract context-related features, and an attention mechanism to assign greater weight to significant information. Evaluated on the SST dataset, the model achieves a test accuracy of 89.17%, an improvement over other methods.

The work in [2] presents a method for recommending friends on social networking platforms by mining user interests through the classification of users' text posts. The authors modify the BERT language model to address the problem of missing local information, proposing the KBERT-CNN text classification model, which combines the output of BERT's last four layers with TextCNN to classify text. The probability distribution over each user's text categories is then used to calculate interest similarity, and the Top-N friends are recommended. Experimental results indicate that the KBERT-CNN model achieves an F1 score of 92.26%, and that the precision of friend recommendations based on text classification is better than that of other content-based methods.

Because the BERT language model lacks domain-specific and task-specific knowledge, a BERT-based text classification model, BERT4TC, was proposed in [3] to improve its performance. BERT4TC uses auxiliary sentence construction to transform classification tasks into binary sentence-pair problems, overcoming the issues of limited training data and task awareness. The study presents the model's architecture, its implementation, and a post-training approach for addressing BERT's domain challenge. The authors conducted experiments on seven datasets and tested various fine-tuning strategies, such as the learning rate and the selection of the hidden state vector. The results showed that BERT4TC outperformed typical feature-based and fine-tuning methods, achieving superior performance on multi-class classification. Furthermore, post-training BERT4TC on a domain-related corpus yielded better performance on binary sentiment classification datasets than the original BERT model.

In [4], a novel deep network model based on a Hierarchical Graph Transformer is introduced for large-scale multi-label text classification. The model represents the text and its semantics as a graph and employs a multi-layer transformer structure with multi-head attention to capture features at different levels. It also incorporates the relationships between labels to create label representations and employs a weighted loss function based on semantic distances. Results on three standard datasets show that the model effectively captures the hierarchy and logic of the text and outperforms existing methods.

The authors of [5] propose a method that enriches the semantics captured from short texts by incorporating knowledge-based conceptualization and a transformer encoder. The method uses a CNN to identify local information and enriches short texts with information obtained from a knowledge base. Furthermore, it employs a subnetwork based on a transformer embedding encoder to embed concepts into a low-dimensional vector space and to attend more strongly to them. Using the concept space and the transformer encoder space, the method constructs understanding models for short-text information retrieval and classification. Experimental results show that the method significantly improves the performance of short-text analysis.

A BERT-based convolutional network model is adapted in [6] to improve the accuracy of long-text classification for Chinese news using local features. The model consists of four modules: Dynamic LEAD-n, which extracts short texts from long texts; a Text-Text Encoder module, which captures global features using BERT and an attention mechanism; a CNN-based local feature module, which captures local features; and a fusion module, which combines the feature vectors produced by the different operations. By addressing BERT's limit on input sequence length, the method improves classification accuracy for Chinese text, as validated by experimental results.

A further study uses a labeled Twitter dataset to perform depression intensity classification. Four transformer-based models, along with one moderately larger model, are adapted to classify depression intensity from tweets. The models are tuned with various hyperparameters, and their performance is evaluated using accuracy, precision, recall, and F1-score. The results indicate that Electra Small Generator (ESG) [7] achieves better accuracy than all other models. The study highlights the need for further optimization of ESG for low-powered devices and emphasizes the potential for better classification performance in depression detection.

A novel approach to aspect-based sentiment analysis that uses a lexicalized ontology to extract indirect relationships in user social data is proposed in [8]. The approach employs XLNet and BiLSTM networks for
comprehensive context extraction and aspect classification, respectively. Experimental results on six real-world drug-related social datasets show that the approach outperforms existing approaches, achieving high accuracy and F-measure in aspect-based sentiment analysis of adverse drug reactions (ADRs). This demonstrates the effectiveness of the approach in improving feature extraction and sentiment classification accuracy on unstructured social media text.

The BERT and XLNet [9][13] language models have also been fine-tuned for cross-domain sentiment classification. That work explores the transferability of these models and analyzes their performance, achieving a considerable improvement over existing cross-domain approaches while using only a minimal amount of data. The findings suggest that bidirectional contextualized models provide better results than previous work on cross-domain sentiment text classification.

The study in [10] employed a pre-trained BERT model with the AdamW optimizer to assess the sentiment of COVID-19-related tweets. The dataset contained approximately 32,000 tweets categorized into three classes: negative, neutral, and positive. Under-sampling was used to balance the data, and the model was fine-tuned for four epochs. The model was most accurate in predicting negative sentiment and least accurate in predicting neutral sentiment, with an overall accuracy of 75.15%. The study suggests, however, that increasing the size of the dataset is expected to improve accuracy substantially.

Despite these advances in sentiment analysis, several limitations remain. These include heavy reliance on labeled training data, difficulty in accurately capturing context, challenges in adapting models to different domains, limited support for multilingual analysis, and the lack of interpretability in model predictions. These limitations restrict the scalability, generalizability, and applicability of sentiment analysis models. To overcome them, research is focusing on improving models' ability to comprehend context, enhancing domain adaptation techniques, expanding support for multilingual sentiment analysis, and developing interpretable models. Addressing these limitations would make sentiment analysis more effective across applications.

III. PROPOSED METHODOLOGY

The proposed work consists of four phases: data preprocessing, the XLNet model, fine-tuning the XLNet model for text classification, and performance analysis. In the first phase, the data is processed and cleaned to prepare it for the XLNet model. In the second phase, the pre-trained XLNet model is used for text classification, as depicted in Fig. 1. In the third phase, the XLNet model is further fine-tuned using a transfer learning approach to optimize its performance. Finally, in the fourth phase, the performance of the fine-tuned XLNet model is assessed with several metrics to determine its effectiveness in text classification.
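The first phase of the proposed methodology (contribution 1 in Section I) cleans the reviews, removes unwanted characters, stop words, and punctuation, and tokenizes the text. A minimal Python sketch of that step is shown below, assuming raw IMDb-style review strings that still contain HTML line breaks. The stop-word list `STOP_WORDS` and the function `preprocess` are hypothetical illustrations, not the exact procedure used in this work. The sketch produces word-level tokens only; the subword tokenization XLNet itself requires would be applied afterwards by XLNet's SentencePiece tokenizer (for example, `XLNetTokenizer` in the Hugging Face Transformers library).

```python
import re
import string

# Hypothetical, deliberately small stop-word list used only for illustration.
# Negations ("not", "no", "never") are intentionally NOT included because they flip sentiment.
STOP_WORDS = frozenset({"a", "an", "the", "is", "was", "this", "it", "of", "and", "to", "i"})

HTML_TAG = re.compile(r"<[^>]+>")
PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def preprocess(review: str) -> list[str]:
    """Clean one raw review and return its lowercase word tokens without stop words."""
    text = HTML_TAG.sub(" ", review)       # drop HTML remnants such as <br />
    text = text.lower().replace("'", "")   # merge contractions: "don't" -> "dont"
    text = text.translate(PUNCT_TO_SPACE)  # remaining punctuation becomes whitespace
    return [tok for tok in text.split() if tok not in STOP_WORDS]


if __name__ == "__main__":
    sample = "This movie was NOT good.<br /><br />Don't watch it!"
    print(preprocess(sample))  # ['movie', 'not', 'good', 'dont', 'watch']
```

Negation words are kept out of the stop-word list on purpose: removing "not" from "not good" would invert the sentiment the classifier sees. Apostrophes are stripped before the other punctuation so that contractions such as "don't" survive as a single token instead of splitting into "don" and "t".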
IV. EXPERIMENTS

The proposed text classification system is assessed by applying various transformer models and comparing their results.

A. Dataset

A movie review dataset is used to analyze the results of the transformer-based models.

Movie Review Dataset: IMDb

The IMDb Movie Reviews dataset consists of 50,000 reviews from the Internet Movie Database (IMDb), each labeled as either positive or negative for binary sentiment analysis. The dataset contains equal numbers of positive and negative reviews, and only highly polarized reviews are included: a review must have a score of 4 out of 10 or lower to be labeled negative, and a score of 7 out of 10 or higher to be labeled positive. In addition, no more than 30 reviews per movie are included, and the dataset also contains some unlabeled data. Sample data is shown in Fig. 4.

Fig. 4. Sample data of dataset

B. Performance Metrics

The transformer-based models are assessed using accuracy and the F1-score, together with the precision and recall from which the F1-score is computed. In the equations below, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

Accuracy: Accuracy is the ratio of correct predictions made by a model to the total number of predictions it has made, as given in equation (1).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

F-Measure: The F-measure, given in equation (2), is the harmonic mean of precision (p) and recall (r), combining both into a single measure.

$$F = \frac{2\,p\,r}{p + r} \tag{2}$$

Precision (p): Precision measures the proportion of texts predicted as belonging to a class that actually belong to that class, as given in equation (3).

$$p = \frac{TP}{TP + FP} \tag{3}$$

Recall (r): Recall measures a model's ability to correctly identify all relevant instances of a particular class, as given in equation (4).

$$r = \frac{TP}{TP + FN} \tag{4}$$

C. Performance Analysis

A comparative study is conducted to evaluate the performance of various transformer-based models on the movie review dataset. Two evaluation metrics, accuracy and loss, which are standard measures of the effectiveness of transformer models, were used to compare the models. Table II compares the accuracy and loss of the different transformer-based models, while Table III presents their precision, recall, F1-measure, and accuracy. The F1-measure is an additional evaluation metric that takes both precision and recall into account.

The performance analysis revealed that the XLNet model outperformed the other models on the movie review dataset, achieving the highest training and test accuracy in Table II (95% and 94%, respectively), although its reported loss values are not the lowest. Furthermore, Table III shows that the XLNet model also has the highest F1-measure, indicating a better balance between precision and recall than the other models. Fig. 5, which displays the accuracy and F1-measure of the different transformer models, further illustrates XLNet's lead on both measures, although its F1-measure margin over BERT (Base) is small (92.5 versus 92.49).

Because the XLNet model achieves the highest accuracy, it is a natural candidate for further tuning through transfer learning. Transfer learning improves the XLNet model's performance, as shown by its higher accuracy on the movie review dataset in Tables IV and V. After fine-tuning, the XLNet model was identified as the most suitable option for text classification, as illustrated in Fig. 6. Sample output is shown in Fig. 7.
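Equations (1) to (4) can be computed directly from a model's true and predicted labels. The sketch below does this for the binary positive/negative setting of the IMDb dataset. The function `binary_metrics`, the `BinaryMetrics` container, and the toy label lists are hypothetical illustrations, not code or outputs from this work; library routines such as scikit-learn's `accuracy_score` and `precision_recall_fscore_support` compute the same quantities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f_measure: float


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy (1), F-measure (2), precision (3) and recall (4) for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Count the four confusion-matrix cells (booleans sum as 0/1).
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f_measure)


# Toy labels for illustration only (1 = positive review, 0 = negative review).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
print(binary_metrics(y_true, y_pred))
```

For these toy labels there are 3 true positives, 2 true negatives, 2 false positives, and 1 false negative, so accuracy is 5/8 = 0.625, precision is 3/5 = 0.6, recall is 3/4 = 0.75, and the F-measure is 2(0.6)(0.75)/(0.6 + 0.75), approximately 0.667. Returning 0.0 when a denominator is zero is one common convention, not the only one.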
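TABLE II. COMPARISON ANALYSIS OF TRANSFORMER MODELS IN TERMS OF ACCURACY AND LOSS

S.No | Model                        | Training Accuracy | Training Loss | Test Accuracy | Test Loss
1    | BERT (Base)                  | 93                | 14            | 93            | 59
2    | BERT (Large)                 | 87                | 34            | 86            | 32
3    | Transformer encoder with SVM | 88                | 64            | 89            | 62
4    | XLNet                        | 95                | 37            | 94            | 64

TABLE III. RESULT ANALYSIS OF VARIOUS TRANSFORMER MODELS IN TERMS OF F-MEASURE

S.No | Model                        | Precision | Recall | F-Measure | Accuracy
1    | BERT (Base)                  | 92        | 93     | 92.49     | 93
2    | BERT (Large)                 | 83        | 84     | 84.5      | 86
3    | Transformer encoder with SVM | 87        | 88     | 87.5      | 89
4    | XLNet                        | 92        | 93     | 92.5      | 94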
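As a worked instance of equation (2), the F-measure column of Table III can be recomputed from its precision and recall columns. The short Python sketch below does this; the dictionary only copies the reported values, and `f_measure` is an illustrative helper, not code from this work.

```python
# Precision, recall, and reported F-measure (all in %) copied from Table III.
TABLE_III = {
    "BERT (Base)": (92, 93, 92.49),
    "BERT (Large)": (83, 84, 84.5),
    "Transformer encoder with SVM": (87, 88, 87.5),
    "XLNet": (92, 93, 92.5),
}


def f_measure(precision: float, recall: float) -> float:
    """Equation (2): the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for model, (p, r, reported) in TABLE_III.items():
    print(f"{model:<30} computed = {f_measure(p, r):6.2f}   reported = {reported}")
```

Equation (2) gives approximately 92.50, 83.50, 87.50, and 92.50 for the four rows. This matches the reported F-measure to within 0.01 for every model except BERT (Large), where the listed value is 84.5.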
TABLE IV. PERFORMANCE ANALYSIS OF PROPOSED MODEL IN TERMS OF ACCURACY AND LOSS

S.No | Model                                    | Training Accuracy | Training Loss | Test Accuracy | Test Loss
1    | XLNet                                    | 95                | 37            | 94            | 64
2    | Fine-Tuned XLNet using Transfer Learning | 93                | 20            | 99            | 29

TABLE V. PERFORMANCE ANALYSIS OF FINE-TUNED XLNET MODEL ON MOVIE REVIEW DATASET IN TERMS OF F-MEASURE

S.No | Model                                    | Precision | Recall | F-Measure | Accuracy
1    | XLNet                                    | 92        | 93     | 92.48     | 94
2    | Fine-Tuned XLNet using Transfer Learning | 97        | 98     | 97.5      | 99

Fig. 7. Sample screenshot

TABLE VI. PERFORMANCE COMPARISON ANALYSIS OF PROPOSED MODEL WITH EXISTING TECHNIQUES

S.No | Model                     | Method                                  | Accuracy (%)
1    | S. M. Qaisar et al. [16]  | LSTM                                    | 89.9
2    | Shaukat Z et al. [17]     | Lexicon and neural networks             | 91
3    | Su, Sichang et al. [18]   | SVM model                               | 85.2
4    | Kumar, H et al. [19]      | Hybrid feature model with ML algorithms | 83.7
5    | T. E. Trueman et al. [20] | BERT                                    | 94.6

The comparative analysis in Table VI shows the performance of various sentiment analysis models on a movie review dataset. The proposed XLNet model, fine-tuned using transfer learning, yields the highest accuracy of 99%, indicating that it outperforms the other models in classifying the sentiment of movie reviews. Compared with the existing models, whose accuracies range from 83.7% to 94.6%, the proposed fine-tuned XLNet model achieves substantially higher accuracy. This highlights the effectiveness of fine-tuning XLNet with transfer learning for sentiment analysis on movie review datasets: the proposed model leverages the pre-trained knowledge captured by XLNet and tailors it specifically to sentiment analysis, resulting in superior performance.

V. CONCLUSION

Over the past few years, sentiment analysis has attracted significant interest owing to its usefulness in a variety of fields, including marketing, social media surveillance, and customer support. Transformer models have been widely used for sentiment analysis tasks, with transfer learning being one of the popular approaches for improving their performance. In this work, an XLNet transfer learning model is proposed to classify text for sentiment analysis. A publicly available movie review dataset containing reviews labeled as either positive or negative was used, and the data underwent preprocessing, including cleaning, tokenization, and encoding. The XLNet model served as the base model and was fine-tuned through transfer learning, with the objective of improving its text classification accuracy and performance. The results showed that the fine-tuned XLNet model performed exceptionally well on the movie review dataset, achieving an accuracy of 99%. The initial evaluation of XLNet already showed higher accuracy than the other models, and after fine-tuning its test accuracy improved from 94% to 99%, demonstrating the ability of transfer learning to improve the performance of pre-trained models on specific tasks.

REFERENCES

[1] Y. Shen and J. Liu, "Comparison of Text Sentiment Analysis based on Bert and Word2vec," in 2021 IEEE 3rd International Conference on Frontiers Technology of Information and Computer (ICFTIC), Greenville, SC, USA, 2021, pp. 144-147, doi: 10.1109/ICFTIC54370.2021.9647258.

[2] N. Pan, W. Yao, and X. Li, "Friends Recommendation Based on KBERT-CNN Text Classification Model," in 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 2021, pp. 1-6, doi: 10.1109/IJCNN52387.2021.9533618.

[3] S. Yu, J. Su, and D. Luo, "Improving BERT-Based Text Classification With Auxiliary Sentence and Domain Knowledge," IEEE Access, vol. 7, pp. 176600-176612, 2019, doi: 10.1109/ACCESS.2019.2953990.