This article has been accepted for publication in IEEE Transactions on Quantum Engineering. This is the author's version, which has not been fully edited; content may change prior to final publication. Citation information: DOI 10.1109/TQE.2024.3373903.

Application of Quantum Recurrent Neural Network in Low Resource Language Text Classification

WENBIN YU1,2,4 (Member, IEEE), LEI YIN1, CHENGJUN ZHANG2,3,4 (Member, IEEE), YADANG CHEN3 (Member, IEEE), ALEX X. LIU5 (Fellow, IEEE)
1 School of Software, Nanjing University of Information Science and Technology, Nanjing 210044, China; [email protected] (W.Y.)
2 Nanjing University of Information Science & Technology, Wuxi Institute of Technology, Wuxi 214000, Jiangsu, China
3 School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China; [email protected] (Y.C.)
4 Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, Nanjing 210044, China
5 Shandong Provincial Key Laboratory of Computer Networks, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, China; [email protected]
Corresponding author: [email protected] (C.Z.)
This research was funded by the Natural Science Foundation of China, grant number 62071240; the Natural Science Foundation of Jiangsu Province, grant number BK20231142; and the Innovation Program for Quantum Science and Technology, grant number 2021ZD0302900.

ABSTRACT
Text sentiment analysis is an important task in natural language processing and has long been an active research topic. However, for languages of low-resource regions such as South Asia, where Bengali is widely used, research interest remains relatively low compared to high-resource languages, owing to limited computational resources, flexible word order, and the highly inflectional nature of the language. With the development of quantum technology, quantum machine learning models leverage the superposition property of qubits to enhance model expressiveness and achieve faster computation than classical systems. To promote the development of quantum machine learning in low-resource language domains, we propose a quantum-classical hybrid architecture. This architecture uses a pre-trained multilingual BERT model to obtain vector representations of words and combines the proposed Batched Upload Quantum Recurrent Neural Network (BUQRNN) and Parameter Non-shared Batched Upload Quantum Recurrent Neural Network (PN-BUQRNN) as feature extraction models for sentiment analysis in Bengali. Our numerical results demonstrate that the proposed BUQRNN structure achieves a maximum accuracy improvement of 0.993% in Bengali text classification tasks while reducing average model complexity by 12%. The PN-BUQRNN structure further surpasses the BUQRNN structure and outperforms classical architectures in certain tasks.

INDEX TERMS Natural language processing, Quantum machine learning, Quantum recurrent neural network.

I. INTRODUCTION
As one of the classical subfields of machine learning [1]–[3], natural language processing (NLP) [4], [5] has been a hot research topic in recent years. Text sentiment analysis (SA) [6], as a subtask of NLP, aims to classify text into positive and negative sentiment categories by detecting the polarity of the text. SA has been approached in various ways, including lexicon-based SA [7], machine learning-based SA [8], and deep learning-based SA [9]. Remarkable results have been achieved in SA for high-resource languages such as English and Chinese [10], [11]. However, due to the complexity of language grammar, limited usage, and expensive computational resources, SA in low-resource languages has not been extensively explored. With the development of the internet, a large influx of textual comments has made SA in low-resource languages feasible. In general, effective SA can be achieved by combining a good word embedding model with an efficient feature extraction model. In the case of studying word embeddings for Bengali texts, a significant challenge lies in capturing the rich expressions of sentiment present in the Bengali language. Due to the complexity of its grammar rules, the extraction of features for sentiment
classification becomes intricate, as sentiment information may be expressed differently across sentences. The emergence of pre-trained language models [12], [13] has improved feature extraction in Bengali sentiment classification tasks, as they can effectively capture sentiment information within Bengali sentences by learning rich language representations and contextual understanding.

Despite the positive role of pre-trained language models in Bengali sentiment classification tasks, traditional feature extraction models still face efficiency challenges. Quantum deep learning [14], [15] combines the concepts of quantum computing and deep learning, leveraging the parallelism advantage [16] of quantum computing to accelerate the training and inference processes of models. By utilizing quantum neural networks and quantum gate operations, quantum deep learning models can handle complex sentiment classification tasks more efficiently [17]. A quantum-classical hybrid recurrent neural network model (QRNN) built on a variational quantum circuit (VQC) core was proposed in the literature [18]. Such networks have been successfully applied as feature extractors in text classification tasks for high-resource languages, demonstrating better performance than their classical counterparts. Considering the characteristics of low-resource languages, this sparks the idea of using quantum algorithms to improve low-resource text SA tasks. However, previous studies have shown that QRNNs may struggle to effectively capture semantic information and may even lose information when dealing with longer sequences. The current challenges can be summarized as follows:
• Due to the limitations of current Noisy Intermediate-Scale Quantum (NISQ) devices [19], it is necessary to match the dimensionality of the input sequence to the number of qubits. Previous QRNN models employed parameter-sharing linear layers to reduce the dimensionality of the input data, which may result in the loss of semantic information to some extent.
• Previous QRNN models did not optimize each Quantum Variational Circuit (VQC) specifically but instead utilized parameter-sharing linear layers for optimization across all VQCs.

In response to the first situation mentioned above, we designed and utilized a Batched Uploading Quantum Neural Network (BUQNN), which is essentially a structure that incorporates VQC circuits. The BUQNN divides the input feature sequence into batches and loads them into the circuit to obtain the complete semantic information. We refer to the QRNN model that utilizes this BUQNN as BUQRNN. By adopting this approach, we can alleviate the semantic information loss caused by previous methods while using only a small number of qubits.

Regarding the second situation mentioned, in reference [20], a non-parameter-sharing linear layer was applied after the VQC to enhance the expressiveness of the circuit. We followed this idea and made improvements by using non-parameter-sharing linear layers both before and after each VQC. This allows for independent optimization of each VQC, which is advantageous for the model.

Our contributions are as follows:
• We proposed a Batched Uploading Quantum Recurrent Neural Network (BUQRNN) specifically designed for sequential data. This method requires only a small number of qubits and mitigates the loss of semantic information caused by previous approaches.
• We introduced a Parameter Non-sharing BUQRNN (PN-BUQRNN) that employs independent linear layers for each VQC, enabling independent optimization of each VQC in the model structure.
• For the first time, we applied quantum algorithms to sentiment classification tasks in low-resource languages such as Bengali, which holds significant importance for advancing the development of quantum machine learning in low-resource languages.

The rest of the paper is organized as follows. Section 2 focuses on the text classification process and discusses word embedding techniques and the QRNN network for low-resource language text classification. Section 3 presents the specific implementation approach to address the aforementioned issues. Section 4 describes the numerical simulation results, and Section 5 concludes the paper, providing insights into future directions and prospects.

II. RELATED WORK
Reference [21] employed a combination of Multilingual BERT embeddings and an RNN for text classification tasks in Bengali. In this section, we introduce an improved quantum-classical hybrid model based on that architecture. We discuss various methods for text classification in the low-resource language domain and provide an overview of Bengali-specific word embedding techniques and the QRNN model.

A. WORD EMBEDDING
In the domain of high-resource languages, there are generally two methods for generating word vectors: contextual word embedding techniques and non-contextual word embedding techniques. BERT, as a context-based pre-trained language model, can also be employed for word embedding. In reference [22], BERT is used to obtain word vectors for English text, followed by feature extraction using a quantum TCN. The results demonstrate the high performance of BERT while also indicating the feasibility of related quantum models in text classification tasks. Similar to the high-resource language domain, low-resource languages can be categorized into two main types in terms of word embedding methods. The first type is non-contextual word embedding methods. Reference [23] used the word2vec model to generate Bengali word vectors. However, word2vec does not handle subword information, such as affixes, and does not consider the spelling characteristics of words. In contrast to Word2Vec, GloVe [24]
considers both local context and global statistical information to generate word vectors. FastText [25], an open-source word embedding and text classification library developed by Facebook AI Research, differs from Word2Vec and GloVe in that it considers subword information within words. In contrast to non-contextual word embeddings, the second type, contextual word embeddings, takes into account the context of words in specific sentences or texts, enabling better capture of word semantics. Multilingual BERT (MBERT) [26], a multilingual version of the BERT model, has been pretrained on Wikipedia texts from 104 languages. An important feature of MBERT is its bidirectional self-attention mechanism based on the Transformer model, which allows for a better understanding of the complex semantics of words within their context. Experimental results [21] demonstrate that, compared to traditional word embeddings such as FastText and GloVe, MBERT performs better for Bengali word embeddings. In addition to the classical methods for text word embedding mentioned above, reference [27] proposes a novel technique for text word embedding using a quantum language model. The results indicate that word vectors mapped by the quantum language model can achieve performance comparable to their classical counterparts in downstream tasks. In this paper, we choose MBERT as the word embedding model for Bengali text. A sentence x can be represented as x = (x_1, x_2, x_3, ..., x_t) after tokenization, where x_t represents the t-th word in the sentence. After passing through the MBERT model, the corresponding BERT vector representation is obtained as follows:

MBERT(x_1, x_2, x_3, ..., x_t) = (e_1, e_2, e_3, ..., e_t)    (1)

Here, each e_t represents the MBERT vector representation of the sentence with index j at time step t.
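As a concrete illustration of this embedding step, the snippet below sketches how contextual token vectors can be obtained from the multilingual BERT checkpoint with the Hugging Face transformers library; the checkpoint name, the 8-dimensional projection, and the variable names are our own illustrative assumptions rather than the exact configuration used in the experiments.

import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative sketch: obtain MBERT token vectors e_1, ..., e_t for one sentence.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
mbert = AutoModel.from_pretrained("bert-base-multilingual-cased")

sentence = "..."  # placeholder: a Bengali review from the corpus
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = mbert(**inputs)

# Shape (1, t, 768): one 768-dimensional contextual vector per token.
token_vectors = outputs.last_hidden_state

# Assumed projection to a small feature size so the vectors fit a few-qubit simulation.
projection = torch.nn.Linear(768, 8)
e = projection(token_vectors)  # shape (1, t, 8), fed to the recurrent feature extractor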
duces the dependence on the number of qubits and fully loads
B. QRNN

FIGURE 1. The VQC structure used in QLSTM consists of (a) an encoding layer, (b) a variational layer, and (c) a measurement layer. The encoding layer is composed of a series of rotation gates used for encoding.

QLSTM [28] is a variation of QRNN that incorporates a variational quantum circuit (VQC) [29] to construct quantum-gated units. The VQC adjusts the initial states of the qubits through a series of rotation gates, performs quantum entanglement operations using a series of CNOT gates [30], and finally collapses them into classical states through measurement operations. The parameters of the rotation gates are updated through gradient descent. Fig. 1 illustrates the structure of the variational quantum circuit included in the QLSTM network. In this case, we assume the circuit is simulated with 4 qubits. The input word vector sequence is v_t = (v_t^1, v_t^2, ..., v_t^n), where v_t is composed of the current MBERT output e_t and the previous hidden layer feature h_{t-1}. Initially, a linear layer is used to match the number of qubits, and the encoding layer utilizes angle encoding [31] to embed the input sequence into the circuit. The variational layer optimizes the rotation angles of the qubits on the Bloch sphere [32] using a series of rotation gates with updatable parameters. The measurement part employs Pauli-Z gates to measure the states of the qubits. Measuring in the Pauli-Z basis

σ_z = [ 1   0 ]
      [ 0  −1 ]

means projecting the state onto one of the eigenstates of the Pauli-Z matrix, namely |0⟩ or |1⟩. The measured classical bits are then expanded to match the size of the hidden layer through another linear layer. However, when I > n (where n is the number of qubits and I is the length of the feature sequence), the current approach typically reduces the input dimensionality to match the number of qubits through a linear layer, which may result in the loss of semantic information to some extent. We draw inspiration from a quantum algorithm for image classification on the MNIST handwritten dataset [33], which differs from the earlier data re-uploading approach [34]. The former method slices the image horizontally and uploads the value of each pixel to the quantum circuit, while the latter repeatedly uploads the feature vector to the circuit. Due to the limitations of current NISQ devices, the latter approach requires significant computational resources for processing high-dimensional feature sequences, making it difficult to implement. The former approach, by batch uploading, reduces the dependence on the number of qubits and fully loads the feature sequence into the circuit. We aim to address the first issue mentioned in the introduction for traditional QRNN using this method.

In addition, the parameters of the linear embedding layer before each VQC and the linear expansion layer after each VQC in traditional QRNN are shared. Taking the QLSTM in reference [28] as an example, the model uses four VQCs to replace the classical network layers, and these four VQCs share the embedding layer and expansion layer. However, this shared linear layer cannot be effectively optimized for different VQCs. In reference [20], non-shared linear layers with separate parameters were added after each VQC in QLSTM to achieve independent optimization of the circuits. We were inspired by this and made improvements by incorporating non-shared parameter linear layers both before and after each VQC, along with the aforementioned batch uploading approach, to achieve more comprehensive circuit optimization and further improve the performance of QRNN.
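To ground this discussion, the sketch below shows a minimal PennyLane/PyTorch version of the conventional dimensionality-reducing VQC pipeline described above (compress with a linear layer, angle-encode, apply a variational layer, measure Pauli-Z, expand); the layer sizes, the single variational layer, and all names are illustrative assumptions, not the exact circuit of [28].

import torch
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def qlstm_vqc(angles, weights):
    # (a) encoding layer: angle encoding of the compressed input
    for i in range(n_qubits):
        qml.RY(angles[i], wires=i)
    # (b) variational layer: entangling CNOTs followed by trainable rotations
    for i in range(n_qubits - 1):
        qml.CNOT(wires=[i, i + 1])
    for i in range(n_qubits):
        qml.Rot(weights[i, 0], weights[i, 1], weights[i, 2], wires=i)
    # (c) measurement layer: Pauli-Z expectation values
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

# Conventional QLSTM gate: compress the 12-dim [e_t, h_{t-1}] vector down to n_qubits,
# run the circuit, then expand the 4 expectation values back to the hidden size.
compress = torch.nn.Linear(12, n_qubits)   # this step can discard semantic information
expand = torch.nn.Linear(n_qubits, 4)
weights = torch.randn(n_qubits, 3, requires_grad=True)

v_t = torch.randn(12)                      # concatenation of e_t (8 dims) and h_{t-1} (4 dims)
z = torch.stack(qlstm_vqc(compress(v_t), weights))
out = expand(z.float())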

FIGURE 2. The framework includes two red dashed sections, showcasing the proposed structures designed to address the mentioned issues.

III. THE QRNN WE PROPOSE
Regarding the first issue mentioned in the introductory section, in Sections A and B we demonstrate the replacement of the classical neural networks in classical RNNs with a Batched Uploading Quantum Neural Network (BUQNN) and apply it to QRNN, proposing a BUQRNN based on a classical-quantum hybrid framework. Unlike traditional QRNN, which requires matching the number of qubits to the feature vectors, our approach provides universality and effectiveness for handling higher-dimensional feature vectors. BUQNN divides the input features into batches according to a predefined number of qubits and passes them through a variational quantum circuit, forming an n-layer encoding-variational hybrid structure. This approach allows sequence data to be processed without reducing the dimensionality of the feature vectors and does not require a large number of qubits. Addressing the second issue mentioned in the introductory section, Section C provides a specific solution. The workflow for the sentiment classification task in the Bengali language is shown in Fig. 2. The input text is transformed into word embeddings using MBERT, and then the word vectors are fed into the proposed BUQRNN or PN-BUQRNN for feature extraction. Finally, the extracted features are input into a fully connected layer for classification. We now describe the implementation details of BUQRNN and PN-BUQRNN.

A. BATCH UPLOADING QUANTUM NEURAL NETWORK
In the QNN, the encoding gates, decoding gates, and variational gates are further organized into encoding layers, decoding layers, and variational layers. The selection of encoding gates is based on the chosen encoding method and the number of input features. The optimal choice of encoding method is crucial for successful learning of the QNN model. We implement the BUQNN using a multi-layer encoding-variational structure. The left portion of Fig. 2 illustrates the BUQNN structure that we employ. We divide the features v_t = (v_t^1, v_t^2, ..., v_t^n) into batches based on the number of qubits, such that the number of batches is n/N = p. Here, N represents the number of qubits and n represents the dimensionality of the input word vector features. This assumes a scenario where the division is exact. If it is not, the number of batches p is increased by 1 and the remaining space is padded with zero elements. After the division, v_t = (batch_1, batch_2, ..., batch_p), where each batch is a vector containing N features. In this demonstration, we use four qubits, and there is one variational layer between any two encoding layers. This restriction is just a design choice, and alternative design schemes can be chosen. Each feature batch undergoes an angle-encoding embedding circuit and then a variational layer. One feature batch is uploaded to the circuit at a time, and after loading p times, all feature vectors are embedded in the circuit. It is important to note that the essence of BUQNN is a VQC using batch uploading, which differs from the structure of the VQC included in traditional QLSTM networks. It can directly embed feature vectors into the circuit without the need for additional linear layers. The BUQNN with linear layers will be discussed in Section D below.
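As a small illustration of this batching rule (our own sketch; the function and variable names are assumptions), the snippet below splits an n-dimensional feature vector into p zero-padded batches of N = 4 elements:

import numpy as np

def split_into_batches(v_t, n_qubits=4):
    """Split a feature vector into ceil(n / N) batches of length N, zero-padding the last one."""
    n = len(v_t)
    p = -(-n // n_qubits)                      # ceiling division: one extra batch if not divisible
    padded = np.zeros(p * n_qubits)
    padded[:n] = v_t
    return padded.reshape(p, n_qubits)         # shape (p, N): batch_1, ..., batch_p

v_t = np.arange(1, 13, dtype=float)            # e.g. a 12-dimensional [e_t, h_{t-1}] vector
batches = split_into_batches(v_t)              # 3 batches of 4 features; no padding needed here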


FIGURE 3. The encoding layer circuit and variational layer structure that we use.

Fig. 3 illustrates the encoding layer circuit and variational layer structure that we utilize. In a simulation with N qubits, consider the example of a batch vector batch_1 = (v_t^1, v_t^2, ..., v_t^N), where 1 ≤ i ≤ N. For each v_t^i, we generate the angles θ_{i,1} = arctan(v_t^i) and θ_{i,2} = arctan((v_t^i)^2), resulting in a total of 2N rotation angles. θ_{i,1} is applied using the Ry(θ_{i,1}) gate for rotation around the y-axis, while θ_{i,2} is applied using the Rz(θ_{i,2}) gate for rotation around the z-axis. The encoded data is in a quantum state and undergoes a series of unitary operations, including multiple CNOT gates and
single-qubit rotation gates. R(θ_1, θ_2, θ_3) represents a general parameterized single-qubit rotation gate and can be expressed as:

R(θ_1, θ_2, θ_3) = [  e^{iθ_2} cos(θ_1)      e^{iθ_3} sin(θ_1)  ]
                   [ −e^{−iθ_3} sin(θ_1)    e^{−iθ_2} cos(θ_1) ]

Assuming a quantum circuit consisting of 4 qubits, the aforementioned process of quantum state changes can be summarized as follows:
• First, the initial state of the four qubits in the quantum circuit is denoted as |ψ_0⟩, which is typically a pure state with all qubits in the ground state |0⟩, i.e., |ψ_0⟩ = |0000⟩.
• Next, batch_1 is encoded into |ψ_0⟩ to form a new quantum state |ψ_1⟩. The encoding operation uses RY and RZ gates, the rotation angles of which are determined by the elements in batch_1. We denote this operation as U(batch_1), so the quantum state becomes |ψ_1⟩ = U(batch_1)|ψ_0⟩.
• In the third step, additional quantum operations V(θ_1) are applied to |ψ_1⟩ to introduce more complex quantum correlations, generating a new quantum state |ψ_1′⟩ = V(θ_1)|ψ_1⟩ = V(θ_1)U(batch_1)|ψ_0⟩. Here, V(θ_1) is a variational layer controlled by the parameters θ_1.
• Steps 2 and 3 are repeated, encoding (batch_2, ..., batch_p) into the respective quantum states and performing the unitary operation V(θ_i) after each encoding, where i ∈ {1, 2, ..., p}. The resulting series of quantum states are |ψ_2⟩, ..., |ψ_p⟩ and their variational states |ψ_2′⟩, ..., |ψ_p′⟩. For example, for batch_2 the encoded state is |ψ_2⟩ = U(batch_2)|ψ_1′⟩ = U(batch_2)V(θ_1)U(batch_1)|ψ_0⟩, and applying the unitary operation of the variational layer V(θ_2) results in |ψ_2′⟩ = V(θ_2)|ψ_2⟩ = V(θ_2)U(batch_2)V(θ_1)U(batch_1)|ψ_0⟩.
• Finally, the measurement operation M is performed on the quantum state |ψ_p′⟩, and the expectation value of the measurement result is computed, usually by random sampling in the Pauli basis. The expectation E can be represented as E = ⟨ψ_p′|M|ψ_p′⟩, and the obtained expectation value is used for subsequent calculations.
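The following PennyLane sketch (our own illustration; the function and variable names are assumptions) implements this encode-variate-repeat-measure cycle for four qubits, using the arctan angle mapping described above:

import numpy as np
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def buqnn(batches, weights):
    """Batched uploading circuit: one encoding layer per batch, each followed by a variational layer."""
    for k, batch in enumerate(batches):
        # encoding layer U(batch_k): RY(arctan(v)) and RZ(arctan(v^2)) per qubit
        for i, v in enumerate(batch):
            qml.RY(np.arctan(v), wires=i)
            qml.RZ(np.arctan(v ** 2), wires=i)
        # variational layer V(theta_k): entangling CNOTs plus trainable general rotations
        for i in range(n_qubits - 1):
            qml.CNOT(wires=[i, i + 1])
        for i in range(n_qubits):
            qml.Rot(*weights[k, i], wires=i)
    # measurement: Pauli-Z expectation on every qubit
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

p = 3                                            # number of batches for a 12-dim input
weights = np.random.uniform(0, 2 * np.pi, size=(p, n_qubits, 3))
batches = np.random.randn(p, n_qubits)
expectations = buqnn(batches, weights)           # p uploads, then one measurement pass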
B. BATCH UPLOADING QUANTUM LSTM
Similar to QLSTM, we replace the classical neural network in LSTM with the aforementioned BUQNN. Fig. 4 shows the BUQLSTM network we propose, which consists of 4 BUQNNs. The expectation values output by the BUQNNs are combined in the LSTM network after passing through nonlinear activation functions, such as tanh and sigmoid, to update the values of the various gating units. The calculation process of the 4 BUQNN units is as follows:

FIGURE 4. Our Proposed BUQLSTM.

f_t = σ(BUQNN_f);                            (2)
i_t = σ(BUQNN_i);                            (3)
c̃_t = tanh(BUQNN_c);                         (4)
c_t = f_t ∗ c_{t−1} + i_t ∗ c̃_t;             (5)
o_t = σ(BUQNN_o);                            (6)
h_t = o_t ∗ tanh(c_t).                       (7)

The BUQLSTM that we propose uses four BUQNN networks, represented by BUQNN_n (n ∈ {f, i, c, o}). Through the above calculations, the LSTM network obtains the hidden state h_t and the cell state c_t at time step t.

Algorithm 1 outlines the numerical computation process of BUQLSTM. Initially, within each gate unit, the input vector "inputs" is partitioned into several "batch" vectors. Subsequently, these vectors are sequentially embedded into the quantum circuit following the encoding-variation order. The "weights" are updated as part of the subsequent optimization process. Finally, the expectation values of the Pauli-Z operators on the relevant qubits are calculated, and the results are returned in the form of a list for further computations. The four BUQNN modules used in BUQLSTM are explained below.

BUQNN_f: BUQNN_f obtains the vector f_t by applying a sigmoid function, which maps the expectation values to the interval [0, 1]. f_t is a crucial component of the BUQLSTM network, with its output shown in (2). It operates on c_{t−1} through f_t ∗ c_{t−1}, meaning it decides whether to "forget" or "retain" the corresponding elements of the previous cell state c_{t−1}. For instance, values of 1 or 0 indicate that the corresponding elements will be entirely retained (or forgotten). Typically, the f_t vector scales cell state values between 0 and 1, indicating that only part of the information is retained. Its function is vital for learning and modeling time dependence.

BUQNN_i and BUQNN_c: First, the BUQNN_i module processes the input data v_t and outputs a set of values between 0 and 1 through a sigmoid function, determining which information can be added to the current cell state. Simultaneously, the BUQNN_c module also processes the
same input data and generates a new cell state candidate c̃_t through a tanh function (as shown in (4)). Equation (5) combines the output of the input gate with the forget gate f_t, the cell state of the previous moment c_{t−1}, and the new cell state candidate c̃_t, and the resulting vector is used to update the current cell state. In other words, the output of the input gate (a real number between 0 and 1) determines how much of the new information c̃_t is added to the current cell state c_t. This mechanism allows the LSTM to better remember long-term dependencies, avoiding the vanishing gradient problem of ordinary RNNs.

BUQNN_o: The goal of BUQNN_o is to generate the output of the cell. In Equation (6), o_t is obtained by applying the sigmoid function to the expectation values from BUQNN_o. In Equation (7), the output o_t is multiplied element-wise with the updated cell state c_t (processed through a tanh activation function), generating the new hidden state vector h_t, which is passed to the next time step for calculation.
Algorithm 1 An algorithm for BUQLSTM.
BUQLSTM(input_size, hidden_size)
  inputs = concatenate(input_size, hidden_size)
  forget gate: device_f = device(backend, wires = w_f)
    # circuit_forget(inputs, weights):
    split inputs into p batches
    for batch in p batches:
      encoding(batch, wires = w_f)
      variation(weights_f, wires = w_f)
    return [Expectation(PauliZ(wire)) for each wire]
  input gate: device_i = device(backend, wires = w_i)
    # circuit_input(inputs, weights):
    split inputs into p batches
    for batch in p batches:
      encoding(batch, wires = w_i)
      variation(weights_i, wires = w_i)
    return [Expectation(PauliZ(wire)) for each wire]
  update gate: device_c = device(backend, wires = w_c)
    # circuit_update(inputs, weights):
    split inputs into p batches
    for batch in p batches:
      encoding(batch, wires = w_c)
      variation(weights_c, wires = w_c)
    return [Expectation(PauliZ(wire)) for each wire]
  output gate: device_o = device(backend, wires = w_o)
    # circuit_output(inputs, weights):
    split inputs into p batches
    for batch in p batches:
      encoding(batch, wires = w_o)
      variation(weights_o, wires = w_o)
    return [Expectation(PauliZ(wire)) for each wire]
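To make the data flow of (2)-(7) concrete, the sketch below wires four such gate circuits into one BUQLSTM cell using PyTorch. It is a minimal illustration that assumes a helper buqnn_gate(name, x) wrapping the batched-uploading circuit for gate name; it is not the authors' exact implementation.

import torch

def buqlstm_cell(e_t, h_prev, c_prev, buqnn_gate):
    """One BUQLSTM step. buqnn_gate(name, x) returns the Pauli-Z expectations
    of the batched-uploading circuit for the given gate ('f', 'i', 'c' or 'o')."""
    v_t = torch.cat([e_t, h_prev])                    # current input + previous hidden state

    f_t = torch.sigmoid(buqnn_gate("f", v_t))         # forget gate, Eq. (2)
    i_t = torch.sigmoid(buqnn_gate("i", v_t))         # input gate, Eq. (3)
    c_cand = torch.tanh(buqnn_gate("c", v_t))         # candidate cell state, Eq. (4)
    c_t = f_t * c_prev + i_t * c_cand                 # cell state update, Eq. (5)
    o_t = torch.sigmoid(buqnn_gate("o", v_t))         # output gate, Eq. (6)
    h_t = o_t * torch.tanh(c_t)                       # new hidden state, Eq. (7)
    return h_t, c_t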

C. BATCH UPLOADING QUANTUM GRU
Above, we detailed the structure of BUQLSTM. Since the BUQGRU is similar in principle to BUQLSTM, in this section we only provide a brief introduction to the BUQGRU network.

FIGURE 5. Our Proposed BUQGRU.

Fig. 5 depicts our BUQGRU network, in which we replace the classical neural network of the GRU model with BUQNN. Compared to BUQLSTM, it requires only three BUQNNs. The current input e_tj and the previous moment's hidden state h_{t−1} are fed into the network. The reset gate, composed of BUQNN_r and the sigmoid activation function, determines how much of the previous moment's hidden state information should be used when calculating the current candidate hidden state. The update gate, composed of BUQNN_z and the sigmoid function, determines how much of the previous moment's hidden state information should be preserved when calculating the current hidden state. Next, BUQNN_h combined with the tanh activation function is used to calculate the candidate hidden state h̃_t. Finally, Equation (11) decides whether the new hidden state h_t should fully accept the candidate hidden state, retain the previous moment's hidden state, or be a compromise between the two.

r_t = σ(BUQNN_r);                             (8)
z_t = σ(BUQNN_z);                             (9)
h̃_t = tanh(BUQNN_h);                          (10)
h_t = (1 − z_t) ∗ h_{t−1} + z_t ∗ h̃_t;        (11)

Here, BUQNN_i (i ∈ {r, z, h}) represent the reset gate circuit, the update gate circuit, and the candidate hidden state circuit, respectively. It is important to note that, due to the characteristics of the GRU network, the input to BUQNN_h differs from that of the other BUQNNs. In the computation of Equation (10), the output r_t of the reset gate is multiplied by the hidden state output h_{t−1} from the previous time step. This product determines how much information from the previous time step can be utilized. The resulting product is concatenated with e_tj and serves as input to BUQNN_h. After passing through the tanh activation function, a new candidate hidden state is obtained.
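Analogously to the BUQLSTM sketch above, equations (8)-(11) can be assembled as follows (again an illustrative sketch with an assumed buqnn_gate helper):

import torch

def buqgru_cell(e_t, h_prev, buqnn_gate):
    """One BUQGRU step using three batched-uploading gate circuits ('r', 'z', 'h')."""
    v_t = torch.cat([e_t, h_prev])

    r_t = torch.sigmoid(buqnn_gate("r", v_t))             # reset gate, Eq. (8)
    z_t = torch.sigmoid(buqnn_gate("z", v_t))             # update gate, Eq. (9)
    # candidate hidden state: the reset-scaled h_{t-1} is concatenated with e_t, Eq. (10)
    h_cand = torch.tanh(buqnn_gate("h", torch.cat([e_t, r_t * h_prev])))
    h_t = (1 - z_t) * h_prev + z_t * h_cand               # new hidden state, Eq. (11)
    return h_t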
D. PARAMETER NON-SHARING BUQNN
In this section, we introduce the proposed PN-BUQNN (Parameter Non-sharing BUQNN) to enhance the learning capability of BUQNN. As shown in Fig. 6(a), in traditional
QLSTM networks, the linear embedding layer before each VQC and the linear expansion layer after each VQC share their parameters. While this reduces the parameter count of the model, it may introduce some issues. By using the same linear transformation for all variational quantum circuits, the model's ability to learn different input data features could be limited. For instance, if the forget gate and the input gate need to extract different features from the input data, a model with shared parameters may struggle to simultaneously satisfy the requirements of both gates. Furthermore, during the training process, all variational quantum circuits backpropagate through the same linear layer. This can lead to gradient explosion or vanishing gradients: if the gradient of a particular variational quantum circuit is exceptionally large, it may "overwhelm" the gradients of the other circuits, making it difficult for the entire network to learn effectively. To address the aforementioned issues, we propose PN-BUQNN, as depicted in Fig. 6(b).

FIGURE 6. (a) represents a parameter-shared VQC, while (b) represents our proposed PN-BUQNN.

We employ separate linear layers before and after each BUQNN, which means each can learn input and output transformations appropriate to itself. The independent linear layers imply that each BUQNN can learn and extract different features, with optimizations specific to each BUQNN. During the training process, since each BUQNN has its own linear layers, their gradient updates no longer affect each other. Furthermore, adding a linear layer before the BUQNN can to some extent alleviate the issue of gradient vanishing. Gradient vanishing is discussed for low-qubit Variational Quantum Circuits (VQC) in [35], and a similar problem exists in the encoding layer of BUQNN. The input feature vector v_t^i generates different rotation angles for the parameters of the Ry and Rz gates using the arctan function. For the input batch_1, the encoding layer of BUQNN can be represented as:

encoding(batch_1) = Rz(arctan((v_t^1)^2)) Ry(arctan(v_t^1)) |0⟩ . . . Rz(arctan((v_t^N)^2)) Ry(arctan(v_t^N)) |0⟩.

It can be observed that the encoding layer of BUQNN involves the use of arctan, whose derivative may lead to gradient vanishing. For instance, the derivative of Ry(arctan(v_t^1)) is given as follows:

∂Ry(arctan(v_t^1)) / ∂w_i = −Ry(arctan(v_t^1))^T · ∂arctan(v_t^1) / ∂w_i        (12)

∂arctan(v_t^1) / ∂w_i = 1 / (1 + (v_t^1)^2) · ∂v_t^1 / ∂w_i                     (13)

Here, v_t^1 is an element of batch_1, and w_i denotes the updatable parameters. From (13), it is evident that the value of v_t^1 should not be too large or too small, as this may lead to gradient vanishing. Therefore, a clip operation can be added after the linear layer preceding the BUQNN, restricting its output to a specified range. Through this approach, it is possible to optimize the weights of the linear transformation effectively, thus mitigating gradient vanishing to a certain extent. In the subsequent experiments, we use such a linear layer and control its output within the range of [-3, 3].
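A minimal way to realize this per-gate, non-shared input mapping with a clipped output is sketched below (an illustration under our own naming and sizing assumptions, not the exact training code):

import torch
import torch.nn as nn

class ClippedInputLayer(nn.Module):
    """Non-shared linear layer placed before one BUQNN gate, with output clipped to [-3, 3]."""
    def __init__(self, in_features, out_features, bound=3.0):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.bound = bound

    def forward(self, x):
        return torch.clamp(self.linear(x), -self.bound, self.bound)

# One independent input layer and one independent output layer per gate,
# so each circuit is optimized through its own linear transformations.
gates = ["f", "i", "c", "o"]
input_layers = nn.ModuleDict({g: ClippedInputLayer(12, 12) for g in gates})
output_layers = nn.ModuleDict({g: nn.Linear(4, 4) for g in gates})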
IV. DATASET AND EXPERIMENTAL RESULT
A. DATASET
The experiments utilized two Bengali text classification datasets. The BOOK-Reviews dataset [36] is a collection of Bengali book reviews gathered from the internet (such as blogs, Facebook, and e-commerce websites). It is a binary classification dataset (with positive and negative classes) containing 2000 book reviews. The other dataset used in the experiments is YouTube-B [37], a collection of comments on Bengali dramas collected from the YouTube site. It contains 11807 comments, of which 8500 are positive and 3307 are negative. Given the limitations that current NISQ devices impose on the efficiency of quantum algorithms in classification tasks, we selected 2000 entries from the YouTube-B dataset for the experiments, with 1700 for training and 300 for testing. We named the modified YouTube-B dataset YouTubeB-S. The average number of words per sentence in the BOOK-Reviews and YouTubeB-S datasets is 46 and 21, respectively. As shown in Table 1, to prevent a long-tail distribution in the data, the number of entries in each category was kept approximately equal. The dataset we used is available at https://github.com/nuistyl/Bengali-dataset.

TABLE 1. The datasets used in the experiment are the BOOK-Reviews dataset and the YouTubeB-S dataset.
Dataset         Positive   Negative   All
BOOK-Reviews    996        1004       2000
YouTubeB-S      1005       995        2000

TABLE 2. Comparison of the Number of Model Parameters Used.
Model      Number of Quantum Gates   Number of Classical Parameters   All
LSTM       0                         224                              224
GRU        0                         168                              168
QLSTM      96                        72                               168
QGRU       72                        72                               144
BUQLSTM    144                       20                               164
BUQGRU     108                       20                               128

In the subsequent experimental process, we use these two datasets to test the two structures shown on the two sides of Fig. 2, respectively.

FIGURE 7. Experimental results graph.

B. STRUCTURE WITH BUQRNN
In this section, we conducted experiments on the BUQRNN architecture shown in Fig. 2. We employed BUQNN to define BUQLSTM and BUQGRU and conducted experiments using a 4-qubit circuit. For comparison, we also constructed a traditional variational quantum circuit with 4 qubits. In Equation (1), the input vector is passed through the MBERT model, resulting in the embedding vector e_tj. To facilitate quantum simulation, we combined MBERT with a linear layer to control the feature size to 8 dimensions. We also compared against a classical LSTM network with an input size of 8 dimensions and 224 parameters. All experiments used a hidden layer size of 4 dimensions, and the feature dimensions combined with the hidden layer dimensions were fed into the model. For a fair comparison, the depth of the VQC in the traditional QRNN was set to 2.


FIGURE 8. Experimental results graph.

Reference [38] mentions that when the VQC architecture is extensive enough that increasing the circuit depth does not decrease the number of gates, the quantum data classification error of a VQC typically decreases exponentially with increasing circuit depth. This rapid error suppression ends when the final Helstrom limit of quantum state discrimination is reached. However, considering the limitations of NISQ devices, it is challenging to train more parameters. In future work, we plan to further explore the impact of VQC depth on classification accuracy by improving the experimental design and adopting more advanced quantum devices. The model parameter counts used in the experiments are shown in Table 2. The experiments were conducted using the AdamW optimizer and the cross-entropy loss function. The PennyLane framework was used for modeling the quantum circuits; it includes multiple built-in simulators to meet different task requirements.

All of the aforementioned experiments were conducted with a learning rate of 0.01, 50 epochs, and a linear warm-up optimization method. To enhance the persuasiveness of the experimental results, we conducted 10 runs for each experiment using datasets divided into
different partitions to obtain the mean accuracy.
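For reference, the optimization setup described above can be written as in the sketch below; it is an illustrative configuration in which the stand-in model, the synthetic data loader, and the warm-up length are our own assumptions.

import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(8, 2)   # stand-in for the MBERT + (PN-)BUQRNN + classifier pipeline
dataset = torch.utils.data.TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64,)))
train_loader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.01)

# Linear warm-up: scale the learning rate from 0 to its nominal value over the
# first warmup_steps updates, then keep it constant (assumed warm-up schedule).
warmup_steps = 100
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps))

for epoch in range(50):
    for features, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()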
Due to space constraints, we present only the first three results for each experiment. Fig. 7 presents the comparison results of the different models on the two datasets, and the highest accuracy is shown in Table 3. Overall, while the quantum structures are at a disadvantage compared to the classical structures, the BUQRNN-based structures show improved accuracy compared to the traditional QRNN structures on both datasets. This is because traditional QRNN uses linear layers to reduce the dimensionality of the input feature sequence during circuit simulation, which may result in a loss of semantic information to some extent.

TABLE 3. Accuracy of Structures with BUQRNN and Baseline Structures on BOOK-Reviews and YouTubeB-S Validation Sets.
Model            BOOK-Reviews   YouTubeB-S
MBERT+LSTM       84.231         92.394
MBERT+QLSTM      82.235         91.266
MBERT+BUQLSTM    83.121         92.037
MBERT+GRU        84.524         92.214
MBERT+QGRU       83.857         90.735
MBERT+BUQGRU     84.227         91.728

In contrast, BUQRNN uploads the complete sequence into the circuit in batches. On the YouTubeB-S dataset, the BUQLSTM structure and the BUQGRU structure achieve improvements of 0.771% and 0.993%, respectively, compared to the QLSTM structure and the QGRU structure. On the BOOK-Reviews dataset, the BUQLSTM structure and the BUQGRU structure achieve improvements of 0.886% and 0.370%, respectively, compared to the QLSTM structure and the QGRU structure. Although the improvements in accuracy are limited, our proposed BUQRNN model has fewer parameters than QRNN and classical RNN, making it more suitable for low-resource language domains.

C. STRUCTURE WITH PN-BUQRNN
In the previous section, we presented the results of BUQRNN. In this section, we tested the structure equipped with PN-BUQRNN, in which independent linear layers are used before and after each BUQNN. For the comparative experiments, to maintain consistency with the structure of PN-BUQRNN, we set the depth of the VQC in the traditional QRNN to 3 and also used independent linear layers before and after each VQC, denoted as PN-QRNN.

The experimental settings remained consistent with the previous section. Fig. 8 illustrates the comparison results of the parameter non-shared LSTM and GRU models on the two datasets. The highest accuracy is shown in Table 4. Due to space constraints, we present only the results of the first three runs for each experiment. Overall, the PN-BUQRNN structure outperforms the classical RNN structure and the PN-QRNN structure on both datasets. After applying the parameter non-shared VQC, the PN-QRNN structure even performs worse in certain tasks. This is due to the information loss caused by the dimensionality reduction of the linear layer, which prevents it from working effectively. In contrast, in the PN-BUQRNN structure, we fully load the information into the BUQNN circuit and optimize each BUQNN circuit independently, resulting in better results.

TABLE 4. The accuracy of the structure with PN-BUQRNN compared to the baseline structures on the YouTubeB-S and BOOK-Reviews validation sets.
Model               BOOK-Reviews   YouTubeB-S
MBERT+LSTM          84.231         92.394
MBERT+PN-QLSTM      84.266         91.416
MBERT+PN-BUQLSTM    84.738         92.681
MBERT+GRU           84.524         92.214
MBERT+PN-QGRU       84.343         90.762
MBERT+PN-BUQGRU     85.304         92.690

D. ABLATION STUDY ON PARAMETER NON-SHARING CIRCUITS

FIGURE 9. There are two modes for parameter non-shared circuits.

In the previous experiments, we constructed parameter non-sharing models by using separate linear layers before and after each quantum circuit. However, there is still a question to be verified. As shown in Fig. 9, what would be the result if we used parameter non-sharing linear layers only at the input end of the quantum circuit, or only at the output end?

We tested the PN-BUQRNN structure with the two modes shown in Fig. 9(a) and Fig. 9(b). Fig. 10 and Table 5 present the experimental results, indicating a decreasing trend in accuracy for both modes on the two datasets. Mode (a) performs better than mode (b). When using mode (a), the accuracy decreases slightly compared to the case in the previous section where both the input and output ends are parameter non-shared, while mode (b) significantly affects the experimental results. This confirms the correctness of our structure: adding parameter non-shared linear layers at both the input and output ends of the quantum circuit is necessary and better optimizes the quantum circuit. Therefore, we believe
that it is possible to select the appropriate model structure according to the task requirements. That is, if accuracy is the priority, the BUQRNN with parameter non-sharing linear layers both before and after can be used. If both accuracy and efficiency are pursued, it is also feasible to use the parameter non-sharing structure only after the BUQNN.

FIGURE 10. The experimental results of the two modes are presented on the datasets BOOK-Reviews and YouTubeB-S, respectively.

TABLE 5. PN-BUQRNN Structure with Two Types of Parameter-Unshared Circuits, where superscript (a) refers to (a) in Fig. 9 and superscript (b) refers to (b) in Fig. 9.
Model            BOOK-Reviews   YouTubeB-S
PN-BUQLSTM(a)    84.547         92.036
PN-BUQGRU(a)     85.047         91.644
PN-BUQLSTM(b)    84.232         89.925
PN-BUQGRU(b)     82.435         91.356

V. CONCLUSION
To address the brute-force approach taken by traditional QRNNs when handling feature dimensions larger than the number of qubits, we propose a solution that breaks feature vectors into batches and passes them through the circuit, thereby increasing the available information. To this end, we design a novel incremental quantum neural network, termed BUQNN, and apply it to LSTM and GRU networks, forming BUQRNN. Experimental results on the Bengali corpus demonstrate that, compared to traditional QRNNs, our proposed BUQRNN improves accuracy by up to 0.993% while reducing model complexity on average by 12%. Given that traditional QRNNs are unable to independently optimize the linear layers before and after the quantum circuit, thereby limiting their adaptability to the VQC, we were inspired to combine the aforementioned BUQNN design with a recurrent neural network with non-shared parameters, resulting in a model class called PN-BUQRNN. Quantum neural networks constructed using this approach perform better in our experiments, surpassing both classical neural networks and traditional QRNNs on two Bengali text datasets. As an attempt in the field of low-resource language quantum neural networks, we demonstrate the feasibility of applying quantum algorithms to address practical issues in the low-resource text domain. Considering the limited computational resources in low-resource regions, our method allows for circuit simulation with a small number of qubits, aligning with the characteristics of the low-resource language domain. Finally, our goal is to introduce the proposed model to natural language processing tasks in more low-resource regions, in order to address a broader range of real-world issues.

VI. APPENDIX
A. FEASIBILITY EXPLORATION OF BUQRNN AS A WORD EMBEDDING MODEL
In the discussion of related work above, reference [27] mentions an approach that utilizes a quantum neural network as a word embedding model. The authors employed QLSTM as the pretraining model and then utilized the obtained word embeddings for downstream tasks. Due to the similarity with this work, in this section we
explore the applicability of the proposed BUQRNN as an embedding-layer model on Bengali text corpora. The experimental data consist of the aforementioned BOOK-Reviews and YouTubeB-S datasets, and the specific methodology is outlined below:
• Pretraining Quantum Models: We selected the proposed BUQRNN and the QRNN mentioned in Section IV.B as comparative models. To ensure a fair comparison, we kept the experimental parameters consistent with those in Section IV.B, with word embedding sizes of 8 for both BUQRNN and QRNN, and the vocabulary size as the output size.
• Pretraining: The aforementioned models were separately trained on BOOK-Reviews and YouTubeB-S until convergence. During the language model training, sentiment labels were disregarded.
• Model Evaluation: We assessed the utility of the pretrained embedding vectors in downstream SA tasks. Specifically, we used the pretrained word embeddings from BUQRNN and QRNN to train a linear-layer-based classifier. The choice of a linear layer as the classifier was made for experimental convenience. Consistent with the methods discussed in the related work, we employed a random subset of the language model training data for evaluation to avoid introducing additional noise into the model.

We conducted ten experiments to obtain the mean performance, and Table 6 presents the experimental results. The word embeddings trained by BUQRNN demonstrate higher accuracy than those of QRNN, indicating the high utility of our proposed BUQRNN as a word embedding model in the low-resource language domain. The primary reason for BUQRNN's superior results lies in its ability to fully load input vectors into the circuit, thus avoiding the loss of semantic information. This experimental outcome also reinforces the viewpoint discussed earlier. In future explorations, we anticipate using BUQRNN as a word embedding model for various downstream tasks in the low-resource language domain.

TABLE 6. The results of word vector representations obtained using BUQRNN and QRNN on the BOOK-Reviews and YouTubeB-S datasets.
Model      BOOK-Reviews   YouTubeB-S
QLSTM      76.314         82.246
QGRU       77.241         82.436
BUQLSTM    77.243         83.357
BUQGRU     78.923         83.143

B. THE DIFFERENCE BETWEEN BUQNN AND VQC USING AMPLITUDE ENCODING
In a VQC, the encoding layer can utilize various encoding methods, such as the common angle encoding or amplitude encoding. In BUQNN, we adopt angle encoding to encode feature vectors into the quantum circuit. However, can amplitude encoding be used? The nature of BUQNN (essentially a VQC using batch uploading) makes it difficult to use amplitude encoding for feature encoding; otherwise, BUQNN would degrade into a traditional VQC circuit. To compare with amplitude encoding, we used the BUQLSTM of Section IV.B and created a QLSTM using amplitude encoding. For 8-dimensional input data, amplitude encoding requires 4 qubits to encode the features into the circuit. The model layers and experimental hyperparameters were kept consistent with Section IV.B.

TABLE 7. Comparison of our BUQLSTM network with the amplitude-encoded QLSTM network on the BOOK-Reviews and YouTubeB-S datasets.
Model              BOOK-Reviews   YouTubeB-S
Amplitude-QLSTM    82.579         91.557
BUQLSTM            83.121         92.037

Table 7 demonstrates the advantages of the proposed BUQLSTM network. To some extent, amplitude encoding theoretically offers a higher information capacity. However, in practical applications, the state preparation required for amplitude encoding is expensive in terms of operations. In contrast, the encoding method of BUQLSTM is more direct and requires only a small number of qubits to embed feature vectors into the circuit.
until convergence. During the language model training, and requires only a small number of qubits to embed feature
sentiment labels were disregarded. vectors into the circuit.
• Model Evaluation: We assessed the utility of the pre-
trained embedding vectors in downstream SA tasks. REFERENCES
Specifically, we used the pretrained word embeddings
[1] J. J. Grefenstette, “Genetic algorithms and machine learning,”in Proceed-
from BUQRNN and QRNN to train a linear layer-based ings of the sixth annual conference on Computational learning theo-
classifier. The choice of a linear layer as the classifier ryz.,Santa Cruz., CA.USA,1993,pp.3–4.
aimed for experimental convenience. Consistent with [2] T.M. Mitchell, Machine learning, NY,USA:McGraw-hill,2007,pp.14–16.
[3] Z. Zhou, Machine learning, USA:Springer Nature,2021,pp.5–25.
the discussed methods in related work, we employed [4] P. M. Nadkarni, L. Ohno-Machado, and W. W. Chapman, “Natural lan-
a random subset of language model training data for guage processing: an introduction,” J AM MED INFORM ASSN, vol. 18,
evaluation to avoid introducing additional noise into the no. 5, pp.544–551,Sep,2011,doi:10.1136/amiajnl-2011-000464.
[5] D. W. Otter, J. R. Medina, and J. K. Kalita, “A survey of the usages of
model. deep learning for natural language processing,” IEEE TNNLS, vol. 32, no.
We conducted ten experiments to obtain the mean perfor- 2, pp.604–624,Apr,2020,doi:10.1109/TNNLS.2020.2979670.
mance, and Table 6 presents our experimental results.Table [6] W. Medhat, A. Hassan, and H. Korashy, “Sentiment analysis algorithms
and applications: A survey,”AIN SHAMS ENG J, vol. 5, no. 4, pp.1093–
6 presents our experimental results. The word embeddings 1113,Apr,2014,doi:10.1016/j.asej.2014.04.011.
trained by BUQRNN demonstrate higher accuracy com- [7] M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede, “Lexicon-
pared to QRNN, indicating the high utility of our proposed based methods for sentiment analysis,” Computational linguistics, vol. 37,
no. 2, pp.267–307, June, 2011, doi:10.1162/COLI_a_00049.
BUQRNN as a word embedding model in the low-resource [8] A. Hasan, S. Moin , A. Karim, and S. Shamshirband, “Machine
language domain. The primary reason for BUQRNN’s supe- learning-based sentiment analysis for twitter accounts,” Mathemati-
rior results lies in its ability to fully load input vectors into cal and computational applications, vol. 23, no. 1, pp.11, Oct,2018,
doi:10.3390/mca28050101.
the circuit, thus avoiding the loss of semantic information. [9] B. Agarwal, R. Nayak, N. Mittal, and S. Patnaik, Deep learning-based
This experimental outcome also reinforces our viewpoint approaches for sentiment analysis,USA:Springer, 2020,pp.2–31.
discussed earlier. In future explorations, we anticipate using [10] Zhang, Binlong and Zhou, Wei, “Transformer-Encoder-GRU (TE-GRU)
for Chinese Sentiment Analysis on Chinese Comment Text," Neu-
BUQRNN as a word embedding model for various down- ral Processing Letters, vol. 55, no.2, pp.1857–1867, July,2022, doi:
stream tasks in the low-resource language domain. 10.1007/s11063-022-10966-8.
[11] Luo, Xiaoyu, “Efficient English text classification using selected ma-
B. THE DIFFERENCE BETWEEN BUQNN AND VQC chine learning techniques," Alexandria Engineering Journal, vol.60, no.3,
pp.3401–3409, Feb,2021, doi:10.1016/j.aej.2021.02.009.
USING AMPLITUDE ENCODING [12] Min B, Ross H, Sulem E, et al, " Recent advances in natural language
In VQC, the encoding layer can utilize various encoding processing via large pre-trained language models: A survey," ACM Com-
methods, such as the most common angle encoding or am- puting Surveys, vol.56, No. 30, pp.1–40, Sep,2023,doi:10.1145/3605943.
[13] Haifeng Wang, Jiwei Li, Hua Wu, Eduard Hovy, Yu Sun, " Pre-Trained
plitude encoding. In BUQNN, we adopt angle encoding to Language Models and Their Applications," Engineering, vol.25, pp.51–
encode feature vectors into the quantum circuit. However, 65, Apr,2022, doi:10.1016/j.eng.2022.04.024.

[14] L. Alchieri, D. Badalotti, P. Bonardi, and S. Bianco, "An introduction to quantum machine learning: from quantum logic to quantum deep learning," Quantum Machine Intelligence, vol. 3, pp. 1–30, Oct. 2021, doi: 10.1007/s42484-021-00056-8.
[15] N. Wiebe, A. Kapoor, and K. M. Svore, "Quantum deep learning," 2014, arXiv:1412.3489.
[16] C. H. Bennett, E. Bernstein, G. Brassard, and U. Vazirani, "Strengths and weaknesses of quantum computing," SIAM Journal on Computing, vol. 26, no. 5, pp. 1510–1523, Jan. 1997, doi: 10.1137/S0097539796300933.
[17] W. Lai, J. Shi, and Y. Chang, "Quantum-Inspired Fully Complex-Valued Neutral Network for Sentiment Analysis," Axioms, vol. 12, no. 3, p. 308, Feb. 2023, doi: 10.3390/axioms12030308.
[18] S. Y. Chen, S. Yoo, and Y. L. Fang, "Quantum long short-term memory," in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 8622–8626.
[19] J. Preskill, "Quantum computing in the NISQ era and beyond," Quantum, vol. 2, p. 79, Aug. 2018, doi: 10.22331/q-2018-08-06-79.
[20] Y. Cao, X. Zhou, X. Fei, H. Zhao, W. Liu, and J. Zhao, "Linear-layer-enhanced quantum long short-term memory for carbon price forecasting," Quantum Machine Intelligence, vol. 5, no. 2, pp. 1–12, Jul. 2023, doi: 10.1007/s42484-023-00115-2.
[21] S. Sazzed, "Cross-lingual sentiment classification in low-resource Bengali language," in Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), Nov. 2020, pp. 50–60.
[22] C.-H. H. Yang, J. Qi, S. Y.-C. Chen, Y. Tsao, and P.-Y. Chen, "When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing," in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022, pp. 8602–8606, doi: 10.1109/ICASSP43922.2022.9746412.
[23] M. Al-Amin, M. S. Islam, and S. D. Uzzal, "Sentiment analysis of Bengali comments with Word2Vec and sentiment information of words," in 2017 International Conference on Electrical, Computer and Communication Engineering (ECCE), Feb. 2017, pp. 186–190.
[24] P. Chowdhury, E. M. Eumi, O. Sarkar, and M. F. Ahamed, "Bangla news classification using GloVe vectorization, LSTM, and CNN," in Proceedings of the International Conference on Big Data, IoT, and Machine Learning: BIM 2021, Dec. 2017, pp. 723–731.
[25] M. R. Hossain, M. M. Hoque, and I. H. Sarker, Text Classification Using Convolution Neural Networks with FastText Embedding. USA: Springer, 2021, pp. 101–113.
[26] T. Pires, E. Schlinger, and D. Garrette, "How multilingual is multilingual BERT?," 2019, arXiv:1906.01502.
[27] S. S. Li et al., "PQLM - Multilingual Decentralized Portable Quantum Language Model," in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1–5, doi: 10.1109/ICASSP49357.2023.10095215.
[28] R. Di Sipio, J.-H. Huang, S. Y.-C. Chen, S. Mangini, and M. Worring, "The Dawn of Quantum Natural Language Processing," in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022, pp. 8612–8616, doi: 10.1109/ICASSP43922.2022.9747675.
[29] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, et al., "Variational quantum algorithms," Nature Reviews Physics, vol. 3, no. 9, pp. 625–644, Aug. 2021, doi: 10.1038/s42254-021-00348-9.
[30] R. Horodecki, P. Horodecki, M. Horodecki, and K. Horodecki, "Quantum entanglement," Reviews of Modern Physics, vol. 81, no. 2, p. 865, Jun. 2009, doi: 10.1103/RevModPhys.81.865.
[31] R. LaRose and B. Coyle, "Robust data encodings for quantum classifiers," Physical Review A, vol. 102, no. 3, Aug. 2020, doi: 10.1103/PhysRevA.102.032420.
[32] I. Glendinning, "The Bloch sphere," in QIA Meeting, 2005, pp. 3–18.
[33] M. Periyasamy, N. Meyer, C. Ufrecht, D. D. Scherer, A. Plinge, and C. Mutschler, "Incremental data-uploading for full-quantum classification," in 2022 IEEE International Conference on Quantum Computing and Engineering (QCE), May 2022, pp. 31–3.
[34] A. Pérez-Salinas et al., "Data re-uploading for a universal quantum classifier," Quantum, vol. 4, p. 226, Feb. 2020, doi: 10.22331/q-2020-02-06-226.
[35] Z. Hong, J. Wang, X. Qu, C. Zhao, W. Tao, and J. Xiao, "QSpeech: Low-Qubit Quantum Speech Application Toolkit," in 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 2022, pp. 01–08, doi: 10.1109/IJCNN55064.2022.9892496.
[36] S. Sazzed, "Cross-lingual sentiment classification in low-resource Bengali language," in Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), Nov. 2020, pp. 50–60.
[37] R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts, "Recursive deep models for semantic compositionality over a sentiment treebank," in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Oct. 2013, pp. 1631–1642.
[38] B. Zhang and Q. Zhuang, "Fast decay of classification error in variational quantum circuits," Quantum Science and Technology, Jun. 2022, doi: 10.1088/2058-9565/ac70f5.