A Study of The Application of Weight Distributing Method Combining Sentiment Dictionary and TF-IDF For Text Sentiment Analysis
ABSTRACT The most commonly used methods in text sentiment analysis are the rule-based sentiment dictionary and machine learning, with the latter referring to the use of vectors to represent text followed by the use of machine learning to classify the vectors. Both methods have their limitations, including the inflexibility of rules and the non-prominence of sentiment words. In this paper, we design a weight distributing method combining the two methods for text sentiment analysis, by which the sentence vectors obtained highlight words with sentiment meanings while retaining their text information. Empirical results show that with this new method, the accuracy of text sentiment analysis reaches 82.1%, which is 13.9 percentage points higher than the rule-based sentiment dictionary method and 7.7 percentage points higher than the TF-IDF weighting method.
INDEX TERMS Text sentiment analysis, sentiment dictionary, sentence vector, weights distribution.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
32280 VOLUME 10, 2022
H. Liu et al.: Study of Application of Weight Distributing Method Combining Sentiment Dictionary and TF-IDF
A. SENTIMENT ANALYSIS BASED ON SENTIMENT DICTIONARY
The most important indicators of sentiments are sentiment words, also called opinion words, which are commonly used to express positive or negative sentiments [4]. The rule-based sentiment dictionary method statistically calculates the sentiment weights of the sentiment words in a text based on a sentiment dictionary [6]. The main strategy is rule-based, using a dictionary of words marked with sentiment to determine the sentiment of a sentence, which is an unsupervised method [7]. This method is the most intuitive and is also similar to the process by which people recognize sentiments in a text. In sentiment analysis, the more widely used dictionaries are the HowNet sentiment dictionary from CNKI, the affective lexicon ontology of Dalian University of Technology, and the Chinese sentiment dictionary from Taiwan University [8]. Some scholars have explored sentiment tendency analysis based on this method. Haodong and Wenqi [6] improved the average accuracy of sentiment tendency analysis of Chinese microblogs to some extent by combining the HowNet sentiment dictionary and a lexical ontology database, and incorporating numerous features of emoticons, semantic rules, negation words, degree adverbs and Internet neologisms. Peiyu and Yan [8] constructed a Chinese novel sentiment dictionary based on lexicon and Word2Vec, including a basic sentiment dictionary, an expanded dictionary and a sentiment-imagery dictionary, and obtained the sentiment tendency of sentences after sentiment score accumulation and averaging. Xu et al. [9] constructed an extended sentiment dictionary, including basic sentiment words, field sentiment words and polysemous sentiment words, and obtained the sentiment polarity of the text by using the extended sentiment dictionary and the designed sentiment scoring rules. Yu and Egger [10] used the VADER algorithm, a sentiment-scoring method for text based on sentiment lexicons and rules, to explore how tourists feel about safety, service, queues, etc. when visiting popular and crowded attractions.

B. SENTIMENT ANALYSIS BASED ON MACHINE LEARNING
Machine learning, a branch of artificial intelligence (AI) and computer science, generates empirical-data-based models that can make decisions and judgments in new situations [11]. The method is often used to process and analyze large amounts of data and is widely used in finance, healthcare, education, and other fields. In text sentiment analysis, the sentiment analysis task is usually modeled as a classification problem. Researchers convert the text into feature vectors and then feed these vectors to the model to generate predicted labels for the corresponding vectors [7].
The key to using machine learning for sentiment classification is to select text features suitable for sentiment classification [12], represent the text data with a vector, and then train and classify the model using machine learning methods. Many studies have focused on feature selection. For example, Jianzhong et al. [13] used four features, i.e., sentiment word frequency, the polarity of the first occurrence of a sentiment word, emoji polarity, and negation words, to analyze space-related text data collected from the Web, using SVM (support vector machine) and Naive Bayes, respectively. The results are satisfactory in terms of accuracy and recall. Word2vec, a word embedding tool open sourced by Google in 2013, is one of the most commonly used methods for generating representation vectors of words; its popularity is attributed to a few initial successes that motivated early adopters to do more, while leaving plenty of room for them to contribute and benefit [14]. An embedding is essentially the representation of words with low-dimensional vectors, in which word vectors that are close to each other correspond to words with similar meanings [15].
After the sentence vectors are built from word vectors, they are classified via machine learning. The TF-IDF weighting method is one of the commonly used methods for generating sentence vectors based on word vectors. For example, Rohini et al. [16] used TF-IDF for feature selection of Kannada text and used a decision tree classifier to classify the text. Soumya and Pramod [17] used methods such as BOW and TF-IDF to form feature vectors. They then used different machine learning techniques, such as Naive Bayes (NB), Support Vector Machines (SVM), and Random Forests (RF), to classify tweets into positive and negative ones. Mukwazvure and Supreethi [18] also used TF-IDF for information weighting to generate sentence vectors and then used SVM and K-nearest neighbors for classification, which achieved good results for text sentiment classification. Li and Liu [19] proposed to perform sentiment clustering through a voting mechanism over multiple clusterings, which is a kind of unsupervised machine learning. Specifically, each clustering uses the TF-IDF feature weighting method, which overcomes the low accuracy and instability of K-means.
As a new direction of machine learning, deep learning has gradually been widely used in sentiment analysis because its models are more complex and can be more accurate than traditional machine learning models. Jena [20] utilized deep learning techniques such as the Doc2Vec algorithm, Recurrent Neural Network (RNN), and Convolutional Neural Network (CNN) to extract insights valuable to electric vehicle buyers, marketers and manufacturers. Wang et al. [21] used an unsupervised BERT (Bidirectional Encoder Representations from Transformers) model to classify texts on Sina Weibo into positive, neutral, and negative, and used a TF-IDF model to summarize the topics of posts. Aydin and Gungor [22] proposed a novel neural network framework that combines recurrent and recursive neural network models. Recurrent models propagate information about sentiment labels throughout word sequences; recursive models extract syntactic structure from text. The neural network framework achieved state-of-the-art results and outperformed the baseline study by a significant margin.
C. SENTIMENT ANALYSIS INTEGRATING SENTIMENT DICTIONARY AND MACHINE LEARNING
Although both methods perform well in sentiment analysis, their shortcomings are also obvious. Rules based on a sentiment dictionary rely too much on sentiment words and are not flexible enough, while the machine learning approach does not highlight the important role of sentiment words. To address these drawbacks, some scholars have improved the method of sentence vector generation based on sentiment dictionaries. Yang [23] proposed to assign weights of 2 to sentiment words and 1 to neutral words, and then sum the word vectors obtained from Word2Vec training according to these weights to obtain the corresponding sentence vectors. Hui et al. [12] adjusted the sentiment of the word vectors obtained from Word2Vec training to obtain word vectors considering both semantic and sentiment tendencies, used TF-IDF values to weight and sum the word vectors into a text vector representation, and then classified the text for sentiment with machine learning methods. Dashtipour et al. [24] first judged sentence sentiment polarity based on a sentiment dictionary and rules. If the sentence could not be classified, the concatenated fastText embedding of the sentence was input to a DNN (Deep Neural Network) to determine the polarity of the sentence. Yang et al. [25] used the BERT model to train word vectors, used a sentiment dictionary to enhance the sentiment features in the text, and then passed the weighted sentiment features through a convolutional layer and a pooling layer for classification. Chiny et al. [26] proposed a hybrid sentiment analysis model based on a Long Short-Term Memory network (LSTM), a rule-based sentiment dictionary (VADER) and the TF-IDF weighting method. Each of the three methods produces a sentiment score; the three scores are then treated as three inputs to classification models such as Logistic Regression (LR), Random Forest (RF) and Support Vector Machine (SVM) for sentiment polarity classification.
Currently, the major difficulty facing researchers in the area is how to combine sentiment dictionary and machine learning. To solve this problem, we propose the Sentiment Dictionary Weighting method based on TF-IDF, which combines a sentiment dictionary and pre-trained word vectors so that the obtained sentence vectors retain text information while highlighting the words with sentiment tendencies.

III. RESEARCH METHODOLOGY
A. RULES BASED ON SENTIMENT DICTIONARY
The sentiment dictionary used in this study is a Chinese ontology resource compiled and labeled by the Information Retrieval Research Laboratory of Dalian University of Technology. The resource classifies sentiments into 7 major categories and 20 subcategories, and describes a Chinese word or phrase from different perspectives, including word lexical category, sentiment category, sentiment intensity and polarity, etc. [27], [28]. In this study, the 2 major sentiment categories ‘‘happy’’ and ‘‘good’’ are combined into ‘‘positive sentiment,’’ and the 5 major sentiment categories ‘‘angry,’’ ‘‘sad,’’ ‘‘scared,’’ ‘‘disgusted,’’ and ‘‘astonished’’ are combined into ‘‘negative sentiment.’’ The sentiment score of a sentence is obtained from semantic rules and the intensity of the sentiment words: if the score is greater than 0, the sentence is judged as positive sentiment; if the score is less than 0, it is considered negative sentiment; if the score is equal to 0, it is considered neutral sentiment.
The rules used in this study include the following three.
Rule 1: When a negation word, e.g., ‘‘not’’ or ‘‘without,’’ appears before a sentiment word, and there is no punctuation between the sentiment word and the negation word, the sentiment is reversed, and the weight of the negation word is −1. For example, in the sentence ‘‘I'm not happy,’’ the word ‘‘happy’’ is a positive sentiment word with an intensity of 7, so the sentence has a sentiment score of −7 and is finally judged as a negative sentiment.
Rule 2: When there is an adversative conjunction, such as ‘‘but’’ or ‘‘however,’’ the sentiment before the conjunction is reversed. For example, consider the sentence ‘‘Although he studied hard, his performance was still unsatisfactory.’’ The strengths of the idiom glossed as ‘‘someone studies so hard that he forgets to eat and sleep’’ and of ‘‘satisfactory’’ are 7 and 5, respectively. The adversative conjunction ‘‘but’’ gives the former a weight of −1, and the negation word ‘‘not’’ gives ‘‘satisfactory’’ a weight of −1. The final sentiment score of the sentence is −12, so it is judged as a negative sentiment. When there are multiple sentences in a paragraph, an adversative conjunction reverses only the sentence that it is in, rather than the sentiment of all the sentences, so the score needs to be calculated sentence by sentence.
Rule 3: When there are adverbs of degree, e.g., ‘‘a little,’’ ‘‘very,’’ etc., they are given different weights. In this study, the degree adverbs are divided into 5 categories, ‘‘extremely,’’ ‘‘very,’’ ‘‘more,’’ ‘‘slightly,’’ and ‘‘less,’’ which are given weights of 1.5, 1.4, 1.2, 0.8, and 0.5, respectively. When a degree adverb and a negation word appear in the same sentence, the negation word is given a weight of −0.5 if it comes before the degree adverb, and a weight of −1.5 if it comes after the degree adverb. For example, in the sentence ‘‘I'm very unhappy,’’ the strength of the word ‘‘happy’’ is 7, the weight of the degree adverb ‘‘very’’ is 1.4, and the weight of the negation word is −1.5 because it comes after the degree adverb. Therefore, the final score is 7 × 1.4 × (−1.5) = −14.7, which is a negative sentiment.
As the most widely used approach, this method is easy to operate and highlights the role of sentiment words, but it also has many shortcomings: only sentiment words, negation words and related words are considered while other information in the sentence is overlooked, and the rules of classification are rigid and inflexible, among others.
all neutral words in text s. I_C is an indicator function that takes 1 when condition C is satisfied and 0 otherwise.
The framework of the weight distribution method proposed in this paper is shown in Figure 1.
As the total weight of the sentiment words, α represents the importance of sentiment words. To highlight the importance of sentiment words, α cannot be too small, but if it is too large, the role of neutral words is reduced. According to the latent variable generative model of Arora et al. [31], the generation of sentences is considered as a dynamic process, in which the sentence vector does a slow random walk in order to generate similar words in the context at moment t. This means
Thus we make a hypothesis that when the sentence vector of sentence s is known, the probability that word w appears in sentence s is proportional to the exponential of the inner product of the sentence vector and the word vector. To facilitate the calculation, we assume that the intercept term is 0, so

$$P(w \in s \mid v_s) = A \times \exp(v_s \cdot v_w)$$

where A is a constant, and we obtain a log-likelihood function for a sentence

$$\ln L(w, v_s) = \ln\Big[\prod_{w \in s} A \times \exp(v_s \cdot v_w)\Big] \propto \sum_{w \in s} v_s \cdot v_w$$
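The step from the product to the sum in the log-likelihood can be spelled out as follows. This is a short worked expansion under the stated hypothesis only; |s|, the number of words in the sentence, is notation introduced here for illustration.

```latex
\begin{aligned}
\ln L(w, v_s)
  &= \ln \prod_{w \in s} A \exp(v_s \cdot v_w) \\
  &= \sum_{w \in s} \bigl[ \ln A + v_s \cdot v_w \bigr] \\
  &= |s| \ln A + \sum_{w \in s} v_s \cdot v_w
\end{aligned}
```

The first term does not depend on the sentence vector, so comparing likelihoods for different sentence vectors amounts to comparing the sum of inner products, which is the quantity kept on the right-hand side.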
where s_1 is the set of sentiment words in s and s_2 is the set of neutral words in s. We define:

$$a = \sum_{w \in s} \sum_{w' \in s_1} \frac{d_{w'}}{D_s} \, v_{w'} \cdot v_w, \qquad b = \sum_{w \in s} \sum_{w' \in s_2} \frac{t_{w'}}{T_s} \, v_{w'} \cdot v_w \tag{5}$$

It can be seen that, if a < b, the smaller α is, the larger the likelihood function, which means that for the whole sentence structure, neutral words are more important than sentiment words; given the importance of sentiment words in sentiment classification, sentiment words and neutral words can be considered equally important, so α is taken as 0.5. When a > b, the larger α is, the larger the likelihood function, which means that for the whole sentence structure, sentiment words are more important than neutral words. Considering the importance of sentiment words in sentiment classification, the weight of sentiment words can be enlarged, but it cannot be taken as 1, otherwise the role of neutral words would be completely eliminated. So α takes 0.75, the middle value between 0.5 and 1. In summary, the following formula is used to determine the value of α in the text s.

$$\alpha_s = \begin{cases} 0.75 & \text{if } a > b \\ 0.5 & \text{if } a < b \\ 0 & \text{if } s_1 = \emptyset \end{cases} \tag{6}$$

The vector computing method for sentiment classification is shown in Table 1.

cross-validation to compare the predictive power of the three methods mentioned: each experiment uses four chapters as the test set and the remaining 12 chapters as the training set, repeated four times altogether. The sentiment words of the novel text were manually marked. Because researchers often assume in sentence-level analysis that a sentence expresses a single sentiment from a single opinion holder [4], long, emotionally rich paragraphs were segmented so that each paragraph contained only one kind of sentiment as far as possible. A total of 2079 sentences were obtained, including 486 neutral sentences, 535 sentences with positive sentiment and 1058 sentences with negative sentiment. There are 503 sentences in chapters 1-4, 478 sentences in chapters 5-8, 363 sentences in chapters 9-12, and 735 sentences in chapters 13-16. Figure 2 shows the distribution of sentiment in the novel. It can be found that the sentiment distribution of each part of the text is almost the same, and that negative sentiment sentences, which account for about half of the total, outnumber positive and neutral sentences, which is in line with the tragic characteristics of loneliness, fear and sadness of the novel.

B. EVALUATION METRICS
All three methods determine the sentiment category (negative sentiment, neutral sentiment, and positive sentiment) of each sentence. Accuracy is used to measure overall performance.
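Returning to the choice of α in Eq. (6), the selection rule can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' code. The excerpt does not define d, D_s, t and T_s, so d is assumed here to be a sentiment intensity and t a TF-IDF value, each normalized by its sum over the sentence. The combination α × (sentiment part) + (1 − α) × (neutral part) is likewise an assumed form of the sentence vector. The case a = b is not covered by Eq. (6) and is mapped to 0.5 here as a further assumption. All function and parameter names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def normalized_sum(vectors, weights, dim):
    """Return sum_i (weights[i] / sum(weights)) * vectors[i]; zero vector if empty."""
    out = [0.0] * dim
    total = sum(weights)
    for vec, w in zip(vectors, weights):
        for i in range(dim):
            out[i] += (w / total) * vec[i]
    return out


def sentence_vector(sent_vecs, intensities, neut_vecs, tfidf):
    """Pick alpha as in Eq. (6) and build a weighted sentence vector.

    sent_vecs / intensities: word vectors and assumed weights d of sentiment words.
    neut_vecs / tfidf:       word vectors and assumed weights t of neutral words.
    The sentence must contain at least one word, and all weights must be positive.
    """
    all_vecs = sent_vecs + neut_vecs
    dim = len(all_vecs[0])
    sent_part = normalized_sum(sent_vecs, intensities, dim)
    neut_part = normalized_sum(neut_vecs, tfidf, dim)
    # a and b from Eq. (5), using linearity of the inner product.
    a = sum(dot(sent_part, v) for v in all_vecs)
    b = sum(dot(neut_part, v) for v in all_vecs)
    if not sent_vecs:
        alpha = 0.0   # no sentiment words in the sentence
    elif a > b:
        alpha = 0.75
    else:
        alpha = 0.5   # a < b per Eq. (6); a == b is an assumption made here
    vec = [alpha * p + (1 - alpha) * q for p, q in zip(sent_part, neut_part)]
    return alpha, vec
```

For instance, one sentiment word with vector `[1.0, 0.0]` and two neutral words with vector `[0.0, 1.0]` give a = 1 and b = 2, so the sketch selects α = 0.5. With two such sentiment words and one neutral word, a = 2 and b = 1, so it selects α = 0.75.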
where the slack variable ξ1 is the proportional amount by which the prediction f(x) = x^T β + β_0 is on the wrong side of its margin [34]. The cost parameter C represents the penalty for misclassification. A kernel function is used in nonlinear classification tasks. The basic idea is to use a transformation to map the sample points to a new space in which they become linearly separable [30]. The most popular choice of kernel function in the SVM literature is the radial basis function

$$K(x, x') = \exp\big(-\gamma \lVert x - x' \rVert^2\big)$$

To maximize the predictive ability of the model, parameter selection is performed first. A grid search was used for the parameters C and γ of the SVM, with 5-fold cross-validation for each combination of the two parameters: the training set is randomly divided into five subsets, four of which are used for model training while the remaining one is used for testing. This process is repeated five times, and the average of the five results is taken to represent the classification performance of the parameter combination. We found that the optimal parameter combination for both the TF-IDF method and the method of this paper is C = 10 and γ = 0.00. The model was trained on the training set and sentiment classification was performed on the test set; Figure 3 compares the accuracy of the three sentiment classification methods.
As can be seen from Figure 3, the performance of the rule-based sentiment dictionary method is not satisfactory, with accuracy between 60% and 70%. The method proposed in this paper performed best in all four experiments, with accuracy above 80%. The mean accuracies of the three methods (rule-based sentiment dictionary, TF-IDF weighting, and the proposed method) are 68.2%, 74.4% and 82.1%, respectively. Table 2, Table 3, and Table 4 compare the precision, recall and F1 score of the experimental results of leave-four-chapters-out cross-validation.
From the result of our experiment, the rule-based sentiment dictionary method has a significant gap in classification effect
compared with the other two methods; the TF-IDF method, which does not consider sentiment tendency, also has significant shortcomings for sentiment analysis. The method proposed in the paper combines the advantages of the two, and the classification effect shows an obvious improvement: the accuracy is improved by 13.9 percentage points compared with the rule-based sentiment dictionary method and by 7.7 percentage points compared with the TF-IDF weighting method. The good classification results of the proposed method illustrate that it fully extracts the sentiment information in the text and verify its effectiveness in sentiment analysis.

V. CONCLUSION
In this paper, a sentence vector generation method based on sentiment dictionaries and pre-trained word vectors is proposed for sentiment classification, which calculates the weights of sentiment words and neutral words in a sentence separately, and retains the overall information of the sentence while highlighting the sentiment words. A supervised machine learning method is used to determine the sentiment polarity of the text (in this paper, SVM is used for classification), and the classification effect is compared with that of the commonly used rule-based sentiment dictionary and TF-IDF weighting methods. Yu Hua's novel Cries in the Drizzle was used as the domain classification corpus. From the experimental result, the accuracy of text sentiment analysis using the method proposed in this paper reaches 82.1%, which is 13.9 and 7.7 percentage points higher, respectively, than the other two methods.
It must be admitted that there are still some limitations and shortcomings in the proposed method, which needs further improvement. First, the method proposed in this paper cannot break the boundary of the single sentence, which cuts the connection between text and context and hinders the study of paragraph and discourse comprehension. Second, sentiment analysis at the sentence level is often insufficient for applications because it does not identify opinion targets or assign sentiments to such targets [4]. In addition, it can be seen from the classification results that all three methods perform significantly better on negative sentiment than on neutral and positive sentiment. The reason for this phenomenon is that negative sentiment sentences outnumber positive and neutral sentences in the corpus. It indicates that the classification performance is affected by the imbalance of text sentiment.

REFERENCES
[1] F. Zhi-wei, ‘‘Academic position of natural language processing,’’ J. PLA Univ. Foreign Languages, vol. 3, no. 28, pp. 1–8, 2005.
[2] J. Eisenstein, Natural Language Processing. Cambridge, MA, USA: MIT Press, 2018.
[3] W. Ting and Y. Wenzhong, ‘‘Review of text sentiment analysis methods,’’ Comput. Eng. Appl., vol. 57, no. 12, pp. 11–24, 2021.
[4] B. Liu, Sentiment Analysis and Opinion Mining. San Rafael, CA, USA: Morgan & Claypool, 2012.
[5] C. Long, G. Ziyu, H. Jianhong, and P. Jinye, ‘‘A survey on sentiment classification,’’ J. Comput. Res. Develop., vol. 54, no. 6, pp. 1150–1170, 2017.
[6] Z. Haodong and L. Wenqi, ‘‘Chinese micro-blog emotional analysis method based on semantic rules and expression weighting,’’ J. Light Ind., vol. 35, no. 2, pp. 74–82, 2020.
[7] P. Sudhir and V. D. Suresh, ‘‘Comparative study of various approaches, applications and classifiers for sentiment analysis,’’ Global Transitions Proc., vol. 2, no. 2, pp. 205–211, Nov. 2021.
[8] S. Peiyu and X. Yan, ‘‘Research on multi-feature fusion method for sentiment analysis of Chinese microblog,’’ Electron. World, vol. 2, pp. 20–21, Feb. 2018.
[9] G. Xu, Z. Yu, H. Yao, F. Li, Y. Meng, and X. Wu, ‘‘Chinese text sentiment analysis based on extended sentiment dictionary,’’ IEEE Access, vol. 7, pp. 43749–43762, 2019.
[10] J. Yu and R. Egger, Tourist Experiences at Overcrowded Attractions: A Text Analytics Approach. Springer, 2021, pp. 231–243.
[11] Z. Zhihua, Machine Learning. Beijing, China: Tsinghua Univ., 2016.
[12] D. Hui, X. Xueke, W. Dayong, L. Yue, Y. Zhihua, and C. Xueqi, ‘‘A sentiment classification method based on sentiment-specific word embedding,’’ J. Chin. Inf. Process., vol. 31, no. 3, pp. 170–176, 2017.
[13] X. Jianzhong, Z. Jun, Z. Rui, Z. Liang, H. Liang, and L. Jiaojiao, ‘‘Sentiment analysis of aerospace microblog using SVM,’’ J. Inf. Secur. Res., vol. 3, no. 12, pp. 1129–1133, 2017.
[14] K. W. Church, ‘‘Word2Vec,’’ Nat. Lang. Eng., vol. 23, no. 1, pp. 155–162, 2017, doi: 10.1017/S1351324916000334.
[15] C. Deguang, M. Jinlin, M. Ziping, and Z. Jie, ‘‘Review of pre-training techniques for natural language processing,’’ J. Frontiers Comput. Sci. Technol., vol. 15, no. 8, pp. 1359–1389, 2021.
[16] V. Rohini, M. Thomas, and C. A. Latha, ‘‘Domain based sentiment analysis in regional Language-Kannada using machine learning algorithm,’’ in Proc. IEEE Int. Conf. Recent Trends Electron., Inf. Commun. Technol. (RTEICT), May 2016, pp. 503–507.
[17] S. Soumya and K. V. Pramod, ‘‘Sentiment analysis of Malayalam tweets using machine learning techniques,’’ ICT Exp., vol. 6, no. 4, pp. 300–305, Dec. 2020.
[18] A. Mukwazvure and K. P. Supreethi, ‘‘A hybrid approach to sentiment analysis of news comments,’’ in Proc. 4th Int. Conf. Rel., Infocom Technol. Optim. (ICRITO) (Trends Future Directions), Sep. 2015, pp. 1–6.
[19] G. Li and F. Liu, ‘‘A clustering-based approach on sentiment analysis,’’ in Proc. IEEE Int. Conf. Intell. Syst. Knowl. Eng., Nov. 2010, pp. 331–337.
[20] R. Jena, ‘‘An empirical case study on Indian consumers’ sentiment towards electric vehicles: A big data analytics approach,’’ Ind. Marketing Manage., vol. 90, pp. 605–616, Oct. 2020.
[21] T. Wang, K. Lu, K. P. Chow, and Q. Zhu, ‘‘COVID-19 sensing: Negative sentiment analysis on social media in China via BERT model,’’ IEEE Access, vol. 8, pp. 138162–138169, 2020, doi: 10.1109/ACCESS.2020.3012595.
[22] C. R. Aydin and T. Gungor, ‘‘Combination of recursive and recurrent neural networks for aspect-based sentiment analysis using inter-aspect relations,’’ IEEE Access, vol. 8, pp. 77820–77832, 2020.
[23] G. Yang, ‘‘Research and application of sentiment analysis based on Word2Vec method,’’ Xiamen Univ., Xiamen, China, Tech. Rep., 2019.
[24] K. Dashtipour, M. Gogate, J. Li, F. Jiang, B. Kong, and A. Hussain, ‘‘A hybrid Persian sentiment analysis framework: Integrating dependency grammar based rules and deep neural networks,’’ Neurocomputing, vol. 380, pp. 1–10, Mar. 2020.
[25] L. Yang, Y. Li, J. Wang, and R. S. Sherratt, ‘‘Sentiment analysis for e-commerce product reviews in Chinese based on sentiment lexicon and deep learning,’’ IEEE Access, vol. 8, pp. 23522–23530, 2020, doi: 10.1109/ACCESS.2020.2969854.
[26] M. Chiny, M. Chihab, O. Bencharef, and Y. Chihab, ‘‘LSTM, VADER and TF-IDF based hybrid sentiment analysis model,’’ Int. J. Adv. Comput. Sci. Appl., vol. 12, no. 7, p. 2021, 2021, doi: 10.14569/IJACSA.2021.0120730.
[27] X. Lin-Hong, L. Hong-Fei, and Z. Jing, ‘‘Construction and analysis of emotional corpus,’’ J. Chin. Inf. Process., vol. 22, no. 1, pp. 116–122, 2008.
[28] L. Xu, H. Lin, Y. Pan, H. Ren, and J. Chen, ‘‘Constructing the affective lexicon ontology,’’ J. China Soc. Sci. Tech. Inf., vol. 27, no. 2, pp. 180–185, 2008.
[29] T. Ming, Z. Lei, and Z. Xian-chun, ‘‘Document vector representation based on Word2Vec,’’ Comput. Sci., vol. 6, no. 43, pp. 214–217, 2016.
[30] S. Li, Z. Zhao, R. Hu, W. Li, T. Liu, and X. Du, ‘‘Analogical reasoning on Chinese morphological and semantic relations,’’ 2018, arXiv:1805.06504.
[31] S. Arora, Y. Li, Y. Liang, T. Ma, and A. Risteski, ‘‘A latent variable model approach to PMI-based word embeddings,’’ 2015, arXiv:1502.03520.
[32] S. Arora, Y. Liang, and T. Ma, ‘‘A simple but tough-to-beat baseline for sentence embeddings,’’ in Proc. 5th Int. Conf. Learn. Represent., 2017, pp. 1–16.
[33] L. Hang, Statistical Learning Methods. Beijing, China: Tsinghua Univ., 2019.
[34] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Cham, Switzerland: Springer, 2009.

XI CHEN was born in Linxia, Gansu, China, in 1997. He received the B.S. degree from the South China University of Technology, in 2016. He is currently pursuing the master’s degree with the Beijing Normal University, Beijing, China. His current research interests include machine learning, sentiment analysis, and model interpretability.