Received February 18, 2022, accepted March 13, 2022, date of publication March 16, 2022, date of current version March 28, 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3160172

A Study of the Application of Weight Distributing Method Combining Sentiment Dictionary and TF-IDF for Text Sentiment Analysis
HAO LIU, XI CHEN, AND XIAOXIAO LIU
Collaborative Innovation Centre of Assessment for Basic Education Quality, BNU, Beijing 100875, China
Institute of Education, University of Alberta, Edmonton, AB T6G 2G5, Canada
Corresponding author: Hao Liu
This work was supported in part by the National Social Science Fund of China under Grant 20CTJ019.

ABSTRACT The most commonly used methods in text sentiment analysis are the rule-based sentiment dictionary method and machine learning, with the latter referring to the use of vectors to represent text followed by the use of machine learning to classify the vectors. Both methods have their limitations, including inflexibility of rules and non-prominence of sentiment words. In this paper, we design a weight distributing method combining the two methods for text sentiment analysis, by which the sentence vectors obtained can highlight words with sentiment meanings while retaining their text information. Empirical results show that, based on this new method, the accuracy rate of text sentiment analysis can reach as high as 82.1%, which is 13.9% higher than the rule-based sentiment dictionary method and 7.7% higher than the TF-IDF weighting method.

INDEX TERMS Text sentiment analysis, sentiment dictionary, sentence vector, weights distribution.

The associate editor coordinating the review of this manuscript and approving it for publication was Md. Asaduzzaman.

I. INTRODUCTION
Natural language processing (NLP) is a way to systematically analyze, understand and extract information from text data. Because the capacity for language is one of the central features of human intelligence, NLP is one of the most important branches of artificial intelligence, integrating social sciences (linguistics, logic, etc.), natural sciences (computer science, statistics, etc.) and engineering (electrical engineering, etc.) [1], [2]. The goal of NLP is to provide new computational capabilities around human language, which typically involves information extraction, machine translation, text generation and so on [2].

Text sentiment analysis is an important application of NLP, which refers to "analysis of subjective texts with sentiment overtones to tap and classify their sentiment connotations and attitudes" [3]. With the rapid development of the Internet and social media, people often log on to different types of websites to express their opinions on current affairs or products, so individuals and organizations are increasingly using the content in these media for decision making [4]. Effectively obtaining and analyzing the sentiment information contained in massive digital texts has become one of today's hot research topics. Sentiment analysis applications have spread to almost every possible domain, such as consumer products, services, healthcare, financial services and social events [4]. By analyzing the sentiment orientation of online comments, governments can effectively grasp the trend of public opinion and obtain a basis for policy making; businesses can tap customer feedback on various products to develop more precise marketing strategies; and ordinary consumers can learn other users' opinions about a product and so make better purchasing decisions [5]. In the era of big data, when the analysis of massive text data can only be done by computers, the common challenge faced by researchers in this area is how to improve the accuracy and efficiency of text sentiment analysis.

II. RELATED WORK
The initial steps of text sentiment analysis include data collection and data pre-processing; the latter includes such processes as removal of invalid characters, word tokenizing and stop word filtering. This section reviews the work done in the field of text sentiment analysis from three aspects: sentiment analysis based on sentiment dictionary, sentiment analysis methods based on machine learning, and sentiment analysis methods based on integration of sentiment dictionary and machine learning.
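The pre-processing steps named in this section (removal of invalid characters, tokenizing, stop word filtering) can be illustrated with a minimal sketch. It is not the authors' pipeline: the regular expressions, the whitespace tokenizer and the tiny `STOP_WORDS` set are assumptions chosen for illustration, and Chinese text would need a word segmenter rather than whitespace splitting.

```python
import re
from typing import List

# Hypothetical stop-word list; a real pipeline would load a much fuller one.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to"}

def preprocess(text: str) -> List[str]:
    """Clean a raw comment and return its content-bearing tokens."""
    # 1. Remove invalid characters: drop HTML-like tags, then replace
    #    everything that is not a word character or whitespace.
    no_tags = re.sub(r"<[^>]+>", " ", text.lower())
    cleaned = re.sub(r"[^\w\s]", " ", no_tags)
    # 2. Tokenize (whitespace split; an assumption that suits English only).
    tokens = cleaned.split()
    # 3. Filter stop words.
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The service was GREAT!!! <br> and the food... meh :("))
```

The surviving tokens are what a sentiment dictionary lookup or a vectorizer would consume in the later stages reviewed below.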

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
32280 VOLUME 10, 2022

A. SENTIMENT ANALYSIS BASED ON SENTIMENT DICTIONARY
The most important indicators of sentiments are sentiment words, also called opinion words, which are commonly used to express positive or negative sentiments [4]. The rule-based sentiment dictionary method statistically calculates the sentiment weights of the sentiment words in a text based on a sentiment dictionary [6]. The main strategy is rule-based, using a dictionary of words marked with sentiment to determine the sentiment of a sentence, which makes it an unsupervised method [7]. This method is the most intuitive and is also similar to the process by which people recognize sentiments in a text. In sentiment analysis, the more widely used dictionaries are the HowNet sentiment dictionary from CNKI, the affective lexicon ontology of Dalian University of Technology, and the Chinese sentiment dictionary from Taiwan University [8]. Some scholars have explored sentiment tendency analysis based on this method. Haodong and Wenqi [6] improved the average accuracy of sentiment tendency analysis of Chinese microblogs to some extent by combining the HowNet sentiment dictionary and a lexical ontology database, and by incorporating numerous features of emoticons, semantic rules, negation words, degree adverbs and Internet neologisms. Peiyu and Yan [8] constructed a Chinese novel sentiment dictionary based on lexicon and Word2Vec, including a basic sentiment dictionary, an expanded dictionary and a sentiment-imagery dictionary, and obtained the sentiment tendency of sentences after sentiment score accumulation and averaging. Xu et al. [9] constructed an extended sentiment dictionary, including basic sentiment words, field sentiment words and polysemous sentiment words, and obtained the sentiment polarity of the text by using the extended sentiment dictionary and the designed sentiment scoring rules. Yu and Egger [10] used the VADER algorithm, a sentiment-scoring method for text based on sentiment lexicons and rules, to explore how tourists feel about safety, service, queues, etc. when visiting popular and crowded attractions.

B. SENTIMENT ANALYSIS BASED ON MACHINE LEARNING
Machine learning, a branch of artificial intelligence (AI) and computer science, generates empirical-data-based models that can make decisions and judgments in new situations [11]. The method is often used to process and analyze large amounts of data and is widely used in finance, healthcare, education, and other fields. In text sentiment analysis, the sentiment analysis task is usually modeled as a classification problem. Researchers convert the text into feature vectors, and then feed these vectors to the model to generate predicted labels for the corresponding vectors [7].

The key to using machine learning for sentiment classification is to select text features suitable for sentiment classification [12], represent the text data with a vector, and then train the model and classify the text using machine learning methods. Many studies have focused on feature selection. Jianzhong et al. [13], for example, used four features, i.e., sentiment word frequency, the polarity of the first occurrence of a sentiment word, emoji polarity, and negation words, to analyze space-related text data collected from the Web with SVM (support vector machine) and Naive Bayes, respectively. The results were satisfactory in terms of accuracy and recall. Word2vec, a word embedding tool open-sourced by Google in 2013, is one of the most commonly used methods for generating representation vectors of words. It became popular because of a few initial successes that motivated early adopters to do more, while leaving plenty of room for them to contribute and benefit [14]. Embedding is essentially the representation of words with low-dimensional vectors, in which word vectors that are close to each other correspond to words with similar meanings [15].

Once sentence vectors have been built from word vectors, they are classified via machine learning. The TF-IDF weighting method is one of the commonly used methods for generating sentence vectors based on word vectors. For example, Thomas and Latha [16] used TF-IDF for feature selection of Kannada text and used a decision tree classifier to classify the text. Soumya and Pramod [17] used methods such as BOW and TF-IDF to form feature vectors. They then used different machine learning techniques, such as Naive Bayes (NB), Support Vector Machines (SVM), and Random Forests (RF), to classify tweets into positive and negative ones. Mukwazvure and Supreethi [18] also used TF-IDF for information weighting to generate sentence vectors and then used SVM and K-nearest neighbors for classification, which achieved good results for text sentiment classification. Gang and Fei [19] proposed performing sentiment clustering through a voting mechanism over multiple clusterings, which is a kind of unsupervised machine learning. Specifically, each clustering uses the TF-IDF feature weighting method, which overcomes the low accuracy and instability of K-means.

As a new direction of machine learning, deep learning has gradually been widely used in sentiment analysis because its models are more complex and accurate than traditional machine learning models. Jane [20] utilized deep learning techniques such as the Doc2Vec algorithm, Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN) to extract insights valuable to electric vehicle buyers, marketers and manufacturers. Wang et al. [21] used an unsupervised BERT (Bidirectional Encoder Representations from Transformers) model to classify texts on Sina Weibo into positive, neutral, and negative, and used a TF-IDF model to summarize the topics of posts. Aydin and Gungor [22] proposed a novel neural network framework that combines recurrent and recursive neural network models. Recurrent models propagate information about sentiment labels throughout word sequences; recursive models extract syntactic structure from text. The neural network framework achieved state-of-the-art results and outperformed the baseline study by a significant margin.