Sentiment Analysis of User Comment Text Based on LSTM
DOI: 10.37394/232014.2023.19.3
Feng Li, Chenxi Cui, Yashi Hu, Lingling Wang
Abstract: Taking the user-generated Chinese comment dataset on online platforms as the research object, we
constructed word2vec word vectors using gensim and built a sentiment analysis model based on LSTM using
the TensorFlow deep learning framework. From the perspective of mining user comment data on the platform,
we analyzed the sentiment tendency of user comments, providing data support for hotels to understand
consumers' real sentiment tendencies and improve their own service quality. Through analysis of the validation
dataset results obtained by crawling the website, the accuracy of this LSTM model can reach up to 0.89, but
there is still much room for improvement in the accuracy of sentiment analysis for some datasets. In future
research, this model needs further optimization to obtain a stable and more accurate deep-learning model.
Received: May 8, 2022. Revised: February 12, 2023. Accepted: March 3, 2023. Published: April 5, 2023.
information age, a single sentiment dictionary cannot make accurate judgments, while building a more complete and diverse dictionary can be labor-intensive.

2.1.2 Machine Learning-based Sentiment Analysis Methods
[8], compared the results of decision trees, Bernoulli naive Bayes (BNB), Maximum Entropy (ME), support vector machines (SVM), and multinomial naive Bayes (MNB) in sentiment classification, and found that multinomial naive Bayes obtained the best result of 88.5%. [9], constructed a binary sentiment analyzer based on SVM and naive Bayes to analyze Twitter data and compared it with a sentiment analyzer using only SVM or NB. [10], proposed an optimized sentiment analysis framework (OSAF), which uses SVM grid search techniques and cross-validation. [11], proposed an emoticon-based sentiment analysis method and discussed the role of symbolic expressions in sentiment analysis. [12], proposed a computational algorithm for semantic analysis based on the WordNet linguistic English lexicon training set, using a combination of the machine learning algorithms SVM and NB to automatically detect strongly associated negative tweets.

Although machine learning-based sentiment analysis has made progress compared to lexicon-based sentiment analysis, it still requires manual labeling of text, and subjective factors can affect the final result. Traditional machine learning also places high demands on the model: if the model is not efficient, it is difficult to adapt to an era of exploding information. In addition, traditional machine learning has difficulty using contextual information in sentiment analysis, which also affects accuracy.

2.1.3 Deep Learning-based Sentiment Analysis Methods
A sentiment analysis method based on deep learning can automatically learn deep features from a large amount of text information; the sentiment analysis is effective and the model is highly adaptable, without human intervention during the learning process. [13], proposed a Restricted Boltzmann Machine (RBM) based rule model for sentiment analysis of sentences. [14], proposed a restricted-data framework that uses an RNN to train a single model on the language with the largest dataset and reuse it for languages with limited datasets; this framework gives good results for sentiment analysis of low-resource languages. LSTM is a special structure of RNN; to improve the training speed and reduce computational cost and time, [15], proposed an attention-based memory network for aspect-level sentiment classification built on LSTM. [16], proposed a streamlined LSTM with six different parameter settings and compared the performance differences between these LSTMs on a Twitter dataset to establish the best set of parameters for the LSTM. [17], proposed a new sentiment analysis scheme based on Twitter and Weibo data, focusing on the impact of emoticon expressions on sentiment and training an emotion classifier that attends to these expressions, embedded in an attention-based long short-term memory network, which is a good guide for sentiment analysis. Because of the lower human input as well as the higher accuracy, deep learning-based sentiment analysis methods have become a hot research topic in recent years.

2.1.4 Analysis of Irony
It is easy to find many instances of irony and sarcasm on online platforms, and the emotion implied by such statements is often the opposite of their surface meaning. Therefore, analyzing ironic statements and the deeper meaning behind them helps to determine the emotional polarity of the text. [18], achieved good results in experiments with four machine learning methods by improving the sentiment analysis and decision-making processes and crawling data on Twitter: linear SVC (accuracy = 83%, F1-score = 0.81), logistic regression (accuracy = 83%, F1-score = 0.81), naive Bayes (accuracy = 74%, F1-score = 0.73), and a random forest classifier (accuracy = 80%, F1-score = 0.81). Some authors, [19], found that previous research on sarcasm detection had mostly been conducted using natural language processing techniques, without considering the context, the user's expression habits, and so on. Therefore, a two-channel convolutional neural network was used to analyze the semantics of the target text as well as its emotional context, and an attention mechanism was used to extract the user's expression habits. The effectiveness of the method is confirmed by experiments on several datasets, and it can effectively improve the performance of the irony detection task.

2.1.5 Implicit Sentiment Analysis
Implicit sentiment analysis is a special part of the sentiment analysis field: because of the lack of sentiment vocabulary and the ambiguity of sentiment polarity, it is a difficult area of research at this stage. Combing through the literature on
implicit sentiment analysis at this stage, it is found that the current research is very limited. [20], found that previous Graph Convolutional Networks (GCNs) used for sentiment analysis problems had difficulty effectively using contextual information or often ignored the sentiment dependencies between phrases. Therefore, they proposed a context-specific heterogeneous graph convolutional network (CsHGCN) on this basis, and experimental results showed that the model could effectively identify target emotions in sentences.

2.1.6 Aspect-level Sentiment Analysis
Aspect-Based Sentiment Analysis (ABSA), an actively challenging part of the sentiment analysis field, aims to identify and analyze fine-grained sentiment polarities towards particular aspects. [21], proposed a new neural network-based framework to analyze the sentiment of aspect targets in comments. This framework captures distant textual sentiment information through a multi-attention mechanism, employing a non-linear combination with recurrent neural networks to enhance the expressive power of the model, allowing it to handle more complex semantic problems. The performance of this model was validated on four datasets: two from SemEval 2014 (restaurant and laptop reviews), a Chinese news review dataset, and a Twitter dataset.

[22], found that most previous prediction methods used long short-term memory and attention mechanisms to analyze the emotional polarity of the target of interest, and that such methods tended to be more complex and required more training time. Therefore, it was proposed to group the previous methods into two subtasks: aspect-category sentiment analysis (ACSA) and aspect-term sentiment analysis (ATSA). A model based on gating mechanisms and convolutional neural networks was also proposed, which is more accurate and effective. The method first uses a new gating unit, Tanh-ReLU, to selectively output sentiment features based on a given entity or aspect; this architecture is simpler than the attention layer used in existing models. Second, the computations of this model are easily parallelized during training because the gating units work independently. Finally, experiments on the SemEval dataset validate the effectiveness of the model.

Arabic poses several challenges for the task of sentiment analysis because of its complex grammatical structure and the lack of relevant resources. Some scholars have taken the sentiment analysis of aspects of Arabic as a research direction, [23], and used a composite model combining a long short-term memory (LSTM) model and a convolutional neural network (CNN) to analyze the sentiment of Arabic tweets. On the Arabic Sentiment Tweets Dataset (ASTD), this model scored 64.46% on F1, outperforming other deep learning models. Some scholars, [24], researched the use of two different long short-term memory (LSTM) neural networks for aspect-level sentiment analysis of Arabic hotel reviews. The first is an aspect-OTE-oriented LSTM for aspect sentiment polarity classification, with opinion target expressions serving as sentiment polarity markers; the second is a character-level bidirectional LSTM with a conditional random field classifier (Bi-LSTM-CRF) for aspect opinion target expression (OTE) extraction. This method was evaluated on a reference dataset of Arabic hotel reviews, and the results showed that it outperformed the baseline study on both tasks, by 6% and 39% respectively.

2.2 Relevant Research Techniques
Sentiment analysis of text content is the complete process of applying text preprocessing, such as word segmentation, stop-word removal, and named entity recognition, to the target text, followed by text vectorization, feature engineering, model training, classification, and other steps to derive sentiment tendency labels. A flowchart of text classification is presented in Figure 2.

Fig. 2: Flowchart of text classification
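As a minimal sketch of this pipeline (the stop-word list, example sentences, and parameter choices below are illustrative placeholders, not the paper's exact setup), the preprocessing and vectorization stages can be assembled with jieba and gensim, the tools this paper names:

    # Illustrative preprocessing/vectorization sketch for the Fig. 2 pipeline.
    import jieba
    from gensim.models import Word2Vec

    STOPWORDS = {"的", "了", "是"}  # placeholder stop-word set

    def preprocess(text):
        # Segment Chinese text with jieba and drop stop words.
        return [w for w in jieba.lcut(text) if w.strip() and w not in STOPWORDS]

    corpus = ["酒店很干净，位置也方便。", "房间太旧，服务态度差。"]  # toy comments
    tokenized = [preprocess(t) for t in corpus]

    # Train 300-dimensional word2vec vectors with gensim (cf. Section 4.3).
    w2v = Word2Vec(sentences=tokenized, vector_size=300, window=5, min_count=1)
    vector = w2v.wv["酒店"]  # 300-dimensional vector for "酒店" (hotel)

In practice, the trained vectors (or the pre-trained Chinese-word-vectors described in Section 4.1) are then fed to the feature engineering and model training stages of Figure 2.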
during manual calibration. Moreover, traditional machine learning models place high demands on model accuracy, and the explosion of information in today's world makes it difficult for such models to adapt perfectly to complex and varied needs. In addition, traditional machine learning has difficulty utilizing contextual information, which can affect accuracy in sentiment analysis. Deep learning-based sentiment analysis methods can automatically learn deep features from a large amount of text information, with good sentiment analysis results and strong model adaptability, without the need for human intervention in the learning process. Due to the low efficiency and quality of traditional methods, researchers have begun to use deep learning to construct network models for text classification tasks. [30], reviewed more than 150 deep learning-based text classification models developed in recent years and discussed their technical contributions, similarities, and advantages. Therefore, this paper chose a deep learning-based sentiment analysis method to complete the sentiment judgment of text information.

Common deep-learning models include CNN, RNN, Transformer, GRU, and LSTM. Traditional CNN models may fail to activate neurons that recognize the same object presented slightly differently, owing to translational invariance, i.e., changes in the orientation or position of the same object. Moreover, the pooling layer causes a significant loss of valuable information by ignoring the correlation between local and global features. Therefore, it is difficult for CNN models to judge precise textual sentiment accurately. Although RNN models, compared to CNN models, can consider historical information during calculation and share weights over time, their computation is slow and cannot take into account any future input of the current state. In addition, RNN models often suffer from gradient disappearance and explosion: they find it difficult to capture long-term dependencies because multiplicative gradients can decrease or increase exponentially with the number of layers. Although GRU models can effectively alleviate the gradient explosion problem of RNN models, LSTM models have more parameters, stronger functionality, and stronger expressive power than GRU models.

LSTM has a working mechanism similar to RNN, but its more refined internal processing units enable effective storage and updating of contextual information. Due to these excellent properties, LSTM has been used in many tasks related to sequence learning, such as speech recognition, [31], language models, [32], part-of-speech tagging, [35], and machine translation, [36]. Therefore, considering all factors, this paper uses LSTM as the deep learning model for sentiment analysis.

3.1 Recurrent Network Model
RNNs, or Recurrent Neural Networks, excel at processing sequences of data where context is essential. One of the distinguishing features of RNNs is their ability to create directed loops between nodes, [38]. Examples of sequence data that RNNs handle well include speech recognition, language prediction, garbage image classification, [39], and stock data analysis, [40]. Since the data at each node in the sequence is related to the preceding and subsequent data points, RNNs can capture these dynamic relationships. By retaining previous information and using it as input for subsequent nodes, RNNs are well suited to analyzing time-sequenced data.

3.2 RNN Model Gradient Disappearance Phenomenon
[41], showed that standard RNNs suffer from gradient vanishing, which refers to the vanishing of gradients for more distant time steps. The BPTT method is used for backpropagation in RNNs, where the gradient of the loss E with respect to the parameter W_h equals the sum, over all time steps, of the corresponding derivatives:

\[ \frac{\partial E}{\partial W_h} = \sum_{i=1}^{t} \frac{\partial E}{\partial y_t}\,\frac{\partial y_t}{\partial h_t}\,\frac{\partial h_t}{\partial h_i}\,\frac{\partial h_i}{\partial W_h} \]

The calculation of this expression is complicated because it rests on repeated differentiation of a composite function:

\[ \frac{\partial h_t}{\partial h_i} = \prod_{k=i+1}^{t} \frac{\partial h_k}{\partial h_{k-1}} \]

Here, ∂h_k/∂h_{k-1} is the partial derivative of the current hidden state with respect to the previous hidden state:

\[ \frac{\partial h_k}{\partial h_{k-1}} = \sigma' W_h \]

Suppose that a time step j is (t-j) steps away from time step t. Then:
\[ \frac{\partial h_t}{\partial h_j} = \prod_{k=j+1}^{t} \sigma' W_h = (\sigma' W_h)^{t-j} \]

If t-j is large, that is, j is far from time step t, a gradient explosion problem arises when σ′W_h > 1, and a gradient disappearance problem arises when σ′W_h < 1. When t-j is small, neither gradient disappearance nor gradient explosion occurs. In summary, the gradient from a time step j far away from time step t will vanish, so such a j has no effect on the final output y_t. This means that RNNs cannot capture long-term dependencies.
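A small numerical illustration of this effect (ours, not the paper's) shows how the factor (σ′W_h)^(t-j) collapses or blows up as the distance t-j grows:

    # Magnitude of the repeated Jacobian factor (sigma' * W_h)^(t-j).
    def grad_magnitude(sigma_prime_w, distance):
        return sigma_prime_w ** distance

    for spw in (0.9, 1.1):
        print(spw, [round(grad_magnitude(spw, k), 4) for k in (1, 10, 50)])
    # 0.9 -> [0.9, 0.3487, 0.0052]    gradient vanishes
    # 1.1 -> [1.1, 2.5937, 117.3909]  gradient explodes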
3.3 Gradient Disappearance Phenomenon
To address the problem of long-term dependencies, [42], proposed the Long Short-Term Memory (LSTM) network, which performs much better than RNN, especially in long-distance dependency tasks, [43]. The LSTM was originally designed so that the partial derivative of the current memory unit with respect to the previous memory unit would be constant. In the original 1997 version of the LSTM, the memory cell update formula was

\[ C_t = C_{t-1} + Z_i \odot x_t \]

so that

\[ \frac{\partial C_t}{\partial C_{t-1}} = 1 \]

Later, to avoid the unbounded growth of the memory cells, [44], refined the LSTM cell by introducing the "forget gate". The updated formula is:

\[ C_t = Z_f \odot C_{t-1} + Z_i \odot x_t \]

The value of the partial derivative then becomes:

\[ \frac{\partial C_t}{\partial C_{t-1}} = Z_f \]

Although Z_f is a value in the interval [0,1], and therefore does not strictly keep the partial derivative of the current memory cell with respect to the previous memory cell constant, it is common to set a large bias term for the forget gate so that the gate is closed in most cases and open only in a few. Recall the formula for the forget gate, where the bias b_f has been added:

\[ Z_f = \sigma(W_f [h_{t-1}, x_t] + b_f) \]

The forget gate is closed when Z_f tends to 1 and opened when it tends to 0. By setting a large bias term, most forget gates tend to 1, which also alleviates the problem of gradient disappearance caused by multiplying fractional values.
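To make the gate mechanics concrete, the following single-step sketch (illustrative, not the paper's implementation; modern cells pass the candidate through a tanh transform where the formulas above write x_t) implements the refined cell update:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_cell_step(h_prev, c_prev, x_t, W_f, W_i, W_c, b_f, b_i, b_c):
        z = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
        Z_f = sigmoid(W_f @ z + b_f)        # forget gate: Z_f = sigma(W_f[h,x] + b_f)
        Z_i = sigmoid(W_i @ z + b_i)        # input gate
        cand = np.tanh(W_c @ z + b_c)       # candidate memory
        c_t = Z_f * c_prev + Z_i * cand     # C_t = Z_f ⊙ C_{t-1} + Z_i ⊙ candidate
        return c_t, Z_f                     # dC_t/dC_{t-1} = Z_f

A large positive b_f keeps Z_f close to 1, so the derivative along the cell state stays near 1; tf.keras.layers.LSTM encodes the same idea through its unit_forget_bias=True default, which initializes the forget-gate bias to one.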
4 Sentiment Analysis based on LSTM

4.1 Data and Processing
In this paper, the dataset is based on the comment corpus collated by Tan Songbo, with 2000 positive and 2000 negative examples, which is a relatively small dataset. Examples are shown in Table 1. Moreover, Table 2 presents the model parameter settings.

Table 1. Example of ChnSentiCorp data

Positive:
- "It is a very nice 5-star hotel; the rooms are large, the facilities are new, and the location is convenient to the financial center, so I would consider staying there again."
- "The room was clean and the facilities were OK, though the furniture was a bit old. The business room has a good floor and front desk, and the price point is relatively low for a 4-star."
- "The hotel was clean, the waiter recommended the ladies' non-smoking floor, the facilities were quite good, and the dim sum in the restaurant tasted OK."

Negative:
- "Depressed!!! Angry!!! I don't understand how the fiber optic can be even slower than the internet speed at Shanghai Jinjiang Star. Don't go to this place if you want fast internet at night!!!!"
- "The room was never arranged to have a frontal lake view, and the standard of the reception was really poor, with grumbling and expressionless faces."
- "Not even as good as a 2-star or no-star hotel."
Word vectors: This experiment uses the open-source Chinese-word-vectors word vectors, which are trained on the Zhihu corpus.

In this work, the data was divided into a training set and a test set in a ratio of 4:1. For the training and validation sets, the following format was followed when producing the training data: in the text file, each row is the input for one sample, each comment occupies one line, and its words are segmented with jieba and separated by spaces.
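A sketch of producing this format and the 4:1 split is given below (the file names are hypothetical, and scikit-learn is an assumed convenience rather than a tool named in the paper):

    import jieba
    from sklearn.model_selection import train_test_split

    def load(path, label):
        # One raw comment per line; segment with jieba, join tokens with spaces.
        with open(path, encoding="utf-8") as f:
            return [(" ".join(jieba.lcut(line.strip())), label)
                    for line in f if line.strip()]

    samples = load("pos.txt", 1) + load("neg.txt", 0)  # hypothetical file names
    texts, labels = zip(*samples)

    # test_size=0.2 gives the 4:1 training/test split described above.
    train_x, test_x, train_y, test_y = train_test_split(
        texts, labels, test_size=0.2, random_state=42)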
4.2 Measurement Criteria
In this paper, recall, accuracy, precision, and F1 values are used as experimental measures, where positive texts refer to texts with positive affective tendencies and negative texts refer to texts with negative affective tendencies. In the confusion matrix (see Table 3), TP is the number of texts correctly classified as positive; FN is the number of positive texts incorrectly classified as negative; FP is the number of negative texts incorrectly classified as positive; and TN is the number of texts correctly classified as negative.

Precision is the percentage of texts judged to be of a certain type that are judged correctly:

\[ P = \frac{TP}{TP + FP} \quad \text{or} \quad P = \frac{TN}{TN + FN} \]

Recall is the percentage of texts actually of a certain type that are judged correctly:

\[ R = \frac{TP}{TP + FN} \quad \text{or} \quad R = \frac{TN}{TN + FP} \]

The F1 value is the harmonic mean of precision and recall, serving as a combined precision-recall evaluation metric:

\[ F1 = \frac{2 \cdot P \cdot R}{P + R} \]

Accuracy is the percentage of correctly judged texts out of all texts:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
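As a check on these formulas, the counts reported in Table 3 below (TP = 865, FN = 135, FP = 91, TN = 909) reproduce the positive-class row of Table 4:

    # Recomputing Section 4.4's metrics from the Table 3 confusion counts.
    TP, FN, FP, TN = 865, 135, 91, 909

    precision_pos = TP / (TP + FP)                 # 865/956  ≈ 0.90
    recall_pos = TP / (TP + FN)                    # 865/1000 = 0.865 ≈ 0.87
    f1_pos = (2 * precision_pos * recall_pos
              / (precision_pos + recall_pos))      # ≈ 0.88
    accuracy = (TP + TN) / (TP + TN + FP + FN)     # 1774/2000 = 0.887 ≈ 0.89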
contains 2000 texts, including 1000 positive and
Accuracy is the percentage of correctly judged 1000 negative texts. After processing the LSTM
texts out of all texts. model, the following results were obtained.
TP + TN In Table 3 we present the experiment results
Accuracy = using LSTM. Similarly, in Table 4 we present the
TP + TN + FP + FN
LSTM model processing the data results of our
paper. Specifically, regarding Table 4 properties, we
4.4 Results

Table 3. Experiment results using LSTM

Textual emotional tendency | Classified positive | Classified negative | Support
Positive texts             | TP: 865             | FN: 135             | 1000
Negative texts             | FP: 91              | TN: 909             | 1000

Table 4. LSTM model processing data results

             | precision | recall | f1-score | support
POS          | 0.90      | 0.87   | 0.88     | 1000
NEG          | 0.87      | 0.91   | 0.89     | 1000
micro avg    | 0.89      | 0.89   | 0.89     | 2000
macro avg    | 0.89      | 0.89   | 0.89     | 2000
weighted avg | 0.89      | 0.89   | 0.89     | 2000

4.4.1 Model Training Results
In this paper, 5000 positive and 5000 negative emotion texts were used to train the model, divided into a training set and a test set at a ratio of 4:1: the training set contains 8000 texts (4000 positive and 4000 negative), and the test set contains 2000 texts (1000 positive and 1000 negative). After processing with the LSTM model, the following results were obtained. Table 3 presents the experiment results using LSTM, and Table 4 presents the results of the LSTM model processing the data. Specifically, regarding the Table 4 rows, we note the following (a sketch reproducing these averages follows the list):
1. Macro average (macro avg): averages the precision, recall, and F1 values over the categories.
2. Micro average (micro avg): builds a global confusion matrix over every instance in the dataset, regardless of category, and then calculates the corresponding metrics.
3. Weighted average (weighted avg): an improvement on macro-averaging that weights each category by its share of the total number of samples.
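Table 4's layout, per-class rows plus micro, macro, and weighted averages, matches the report produced by scikit-learn's classification_report; assuming such a tool was used, the table can be reproduced directly from the Table 3 counts:

    from sklearn.metrics import classification_report

    y_true = [1] * 1000 + [0] * 1000                        # 1 = POS, 0 = NEG
    y_pred = [1] * 865 + [0] * 135 + [1] * 91 + [0] * 909   # Table 3 counts
    print(classification_report(y_true, y_pred, target_names=["NEG", "POS"]))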
4.4.2 Validation of the Dataset Results
By importing into the trained model a corpus of e-commerce reviews from Baidu's document library, containing 1000 positive and 1000 negative texts, the 2000 texts were divided into 20 groups, and the precision, recall, F1 value, and accuracy of these 20 groups were calculated. The following graphs were generated from the results.

From the analysis of these graphs, we find that: the precision on positive text reaches a maximum of 0.98 and a minimum of 0.82; the precision on negative text reaches a maximum of 0.92 and a minimum of 0.44; the recall on positive text reaches a maximum of 0.91 and a minimum of 0.64; the recall on negative text reaches a maximum of 0.98 and a minimum of 0.68; the maximum F1 value for positive text is 0.8727 and the minimum is 0.7189; the maximum F1 value for negative text is 0.8383 and the minimum is 0.5626; and the accuracy of this LSTM model reaches up to 0.89.

A comprehensive analysis of this LSTM model leads to the conclusion that its accuracy still needs to be improved, and further refinements are needed to achieve more accurate sentiment tendency analysis.

5 Conclusion
In this paper, sentiment tendency analysis of e-commerce platform reviews is carried out with an LSTM model, which is trained and validated on an open dataset downloaded from the web. Our research findings are summarized in Figure 4, which covers the experimental results for the LSTM model validation dataset. Moreover, Figure 5 showcases the precision, recall, and F1 values for positive text, whereas Figure 6 does so for negative text. Lastly, Figure 7 presents the overall accuracy of the studied datasets.

The results show that the classification accuracy of this LSTM model can reach a maximum of 0.89, but there is still much room for improvement. The LSTM model implemented in this paper aims to judge the sentiment tendency of user-generated reviews on e-commerce platforms, to perform sentiment analysis on such reviews, and to provide a proven method for e-commerce platforms to judge the sentiment polarity of user reviews and extract keywords when investigating user feedback, thereby providing data support for merchants to understand consumers' needs and real reviews and to improve service quality in a targeted manner. Sentiment analysis of user reviews can effectively reveal whether users identify with a shop, show how much they like a product, and help the management of an e-commerce platform discover the strengths and weaknesses of a shop, improve the level of service, and enhance user satisfaction.

The amount of data collected in this experiment is not large enough for effective analysis of non-semantic symbols and expressions, the model training takes too long, and there are large differences in accuracy across individual validation groups. The analysis of emoji information, the use of multiple parameter settings, and the optimization of the model will be the next research directions. In subsequent research, a comparison between the optimized LSTM model and other deep learning neural network models will also be carried out as the capability for text information recognition and generalization increases.

Based on the content of this paper, future research can be conducted in three areas. First, the sentiment analysis model can be further optimized, trying more efficient and accurate deep learning models, such as pre-trained language models like BERT and GPT, combined with attention mechanisms to improve performance. Second, it is necessary to explore how to deal with the challenges of semantic complexity and ambiguity in Chinese sentiment analysis, further improving the accuracy and robustness of the model. Finally, it is necessary to consider the evolution of emotions and contextual factors to more accurately determine the user's emotional tendencies. These research directions will help further improve the effectiveness of LSTM-based text sentiment analysis and make it more applicable to practical scenarios.
Fig. 4: Summary of experimental results for the LSTM model validation dataset
Fig. 5: Precision, recall, and F1 values for positive text across the 20 validation groups
Fig. 6: Precision, recall, and F1 values for negative text across the 20 validation groups
Fig. 7: Overall accuracy of the studied datasets across the 20 validation groups
[25] Cai Y, Yang K, Huang D P, et al. A hybrid model for opinion mining based on domain sentiment dictionary. International Journal of Machine Learning and Cybernetics, 2019, 10: 2131-2142.
[26] Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks. Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 2013: 6645-6649.
[27] Hou B, Yang J, Wang P, Yan R. LSTM-based auto-encoder model for ECG arrhythmias classification. IEEE Transactions on Instrumentation and Measurement, 2020: 1232-1240.
[28] Gregor K, Danihelka I, Graves A, Rezende D J, Wierstra D. DRAW: A recurrent neural network for image generation. Proceedings of the International Conference on Machine Learning, 2015: 1462-1471.
[29] Mikolov T, Kombrink S, Deoras A, Burget L, Cernocky J H. RNNLM - Recurrent neural network language modeling toolkit. Proceedings of the Automatic Speech Recognition and Understanding Workshop, 2011: 196-201.
[30] Minaee S, Kalchbrenner N, Cambria E, et al. Deep learning based text classification: a comprehensive review. Computation and Language, 2020, 8(5): 85616-85638.
[31] Graves A, Jaitly N. Towards end-to-end speech recognition with recurrent neural networks. Proceedings of the 31st International Conference on Machine Learning, Beijing: JMLR, 2014: 1764-1772.
[32] Akase S, Suzuki J, Nagata M. Input-to-output gate to improve RNN language models. arXiv preprint, 2017: arXiv:1709.08907.
[33] Miyamoto Y, Cho K. Gated word-character recurrent language model. arXiv preprint, 2016: arXiv:1606.01700.
[34] Jozefowicz R, Vinyals O, Schuster M, et al. Exploring the limits of language modeling. arXiv preprint, 2016: arXiv:1602.02410.
[35] Cho K, Van Merrienboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint, 2014: arXiv:1406.1078.
[36] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint, 2014: arXiv:1409.0473.
[37] Wu Y, Schuster M, Chen Z, et al. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint, 2016: arXiv:1609.08144.
[38] Mou L, Ghamisi P, Zhu X X. Deep recurrent neural networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(7): 3639-3655.
[39] Li F, Wang L. Application of deep learning based on garbage image classification. WSEAS Transactions on Computers, 2022, 21: 277-282.
[40] Li F, Wang L. Case-based teaching for stock prediction system based on deep learning. WSEAS Transactions on Business and Economics, 2022, 19: 1325-1331.
[41] Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 1994, 5(2): 157-166.
[42] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9(8): 1735-1780.
[43] Gers F A, Schraudolph N N. Learning precise timing with LSTM recurrent networks. Journal of Machine Learning Research, 2002, 3(1): 115-143.
[44] Gers F A, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Computation, 2000, 12(10): 2451-2471.

Contribution of Individual Authors to the Creation of a Scientific Article (Ghostwriting Policy)
- Feng Li and Chenxi Cui carried out the simulation and the optimization.
- Yashi Hu and Lingling Wang organized and executed the experiments of Section 4.

Sources of Funding for Research Presented in a Scientific Article or Scientific Article Itself
This work was supported in part by the Undergraduate Teaching Quality and Teaching Reform Project of Anhui University of Finance and Economics under Grant No. acszjyyb2021035.

Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Creative Commons Attribution License 4.0 (Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the Creative Commons Attribution License 4.0:
https://creativecommons.org/licenses/by/4.0/deed.en_US