Application of Quantum Recurrent Neural Network in Low Resource Language Text Classification
ABSTRACT
Text sentiment analysis is an important task in natural language processing and has always been a hot
research topic. However, in low-resource regions such as South Asia, where languages like Bengali are
widely used, research interest remains relatively low compared to high-resource regions due to limited
computational resources, flexible word order, and the highly inflectional nature of the language. With the
development of quantum technology, quantum machine learning models leverage the superposition property
of qubits to enhance model expressiveness and achieve faster computation compared to classical systems.
To promote the development of quantum machine learning in low-resource language domains, we propose
a quantum-classical hybrid architecture. This architecture utilizes a pre-trained multilingual BERT model to
obtain vector representations of words and combines the proposed Batched Upload Quantum Recurrent
Neural Network (BUQRNN) and Parameter Non-shared Batched Upload Quantum Recurrent Neural
Network (PN-BUQRNN) as feature extraction models for sentiment analysis in Bengali. Our numerical
results demonstrate that the proposed BUQRNN structure achieves a maximum accuracy improvement of
0.993% in Bengali text classification tasks while reducing average model complexity by 12%. The PN-BUQRNN structure further surpasses BUQRNN and outperforms classical architectures in certain tasks.
sification becomes intricate as sentiment information may be expressed differently in sentences. The emergence of pre-trained language models [12], [13] has improved feature extraction in Bengali sentiment classification tasks, as they can effectively capture sentiment information within Bengali sentences by learning rich language representations and contextual understanding.

Despite the positive role of pre-trained language models in Bengali sentiment classification tasks, traditional feature extraction models still face efficiency challenges. Quantum deep learning [14], [15] combines the concepts of quantum computing and deep learning, leveraging the parallelism advantage [16] of quantum computing to accelerate the training and inference processes of models. By utilizing quantum neural networks and quantum gate operations, quantum deep learning models can handle complex sentiment classification tasks more efficiently [17]. A quantum-classical hybrid recurrent neural network model (QRNN) based on a variational quantum circuit (VQC) core was proposed in [18]. Such networks have been successfully applied as feature extractors in text classification tasks for high-resource languages, demonstrating better performance than their classical counterparts. Considering the characteristics of low-resource languages, this sparks the idea of using quantum algorithms to improve low-resource text SA tasks. However, previous studies have shown that QRNNs may struggle to effectively capture semantic information and may even lose information when dealing with longer sequences. The current challenges can be summarized as follows:

• Due to the limitations of current Noisy Intermediate-Scale Quantum (NISQ) devices [19], it is necessary to match the dimensionality of the input sequence with the number of qubits. Previous QRNN models employed parameter-sharing linear layers to reduce the dimensionality of the input data, which may result in the loss of semantic information to some extent.

• Previous QRNN models did not optimize each Variational Quantum Circuit (VQC) specifically but instead utilized parameter-sharing linear layers for optimization across all VQCs.

In response to the first situation mentioned above, we designed and utilized a Batched Uploading Quantum Neural Network (BUQNN), which is essentially a structure that incorporates VQC circuits. The BUQNN divides the input feature sequence into batches and loads them into the circuit to obtain the complete semantic information. We refer to the QRNN model that utilizes this BUQNN as BUQRNN. By adopting this approach, we can alleviate the semantic information loss caused by previous methods using only a small number of qubits.

Regarding the second situation mentioned, in reference [20] a non-parameter-sharing linear layer was applied after the VQC to enhance the expressiveness of the circuit. We followed this idea and made improvements by using non-parameter-sharing linear layers both before and after each VQC. This allows for independent optimization of each VQC, which is advantageous for the model.

Our contributions are as follows:

• We propose a Batched Uploading Quantum Recurrent Neural Network (BUQRNN) specifically designed for sequential data. This method requires only a small number of qubits and mitigates the loss of semantic information caused by previous approaches.

• We introduce a Parameter Non-sharing BUQRNN (PN-BUQRNN) that employs independent linear layers for each VQC, enabling independent optimization of each VQC in the model structure.

• For the first time, we apply quantum algorithms to sentiment classification tasks in low-resource languages such as Bengali, which holds significant importance for advancing the development of quantum machine learning in low-resource languages.

The rest of the paper is organized as follows. Section 2 focuses on the text classification process and discusses word embedding techniques and the QRNN network for low-resource language text classification. Section 3 presents the specific implementation approach to address the aforementioned issues. Section 4 describes the numerical simulation results, and Section 5 concludes the paper, providing insights into future directions and prospects.

II. RELATED WORK
Reference [21] employed a combination of Multilingual BERT embeddings and an RNN for text classification tasks in Bengali. In this section, we introduce an improved quantum-classical hybrid model based on its architecture. We discuss various methods for text classification in the low-resource language domain and provide an overview of Bengali-specific word embedding techniques and the QRNN model.

A. WORD EMBEDDING
In the domain of high-resource languages, there are generally two methods for generating word vectors: contextual word embedding techniques and non-contextual word embedding techniques. BERT, as a context-based pre-trained language model, can also be employed for word embedding. In reference [22], BERT is utilized to obtain word vectors for English text, followed by feature extraction using a quantum TCN. The results demonstrate the high performance of BERT while also indicating the feasibility of related quantum models in text classification tasks. Similar to the high-resource language domain, low-resource languages can be categorized into two main types in terms of word embedding methods. The first type is non-contextual word embedding methods. Reference [23] used the word2vec model to generate Bengali word vectors. However, word2vec does not handle subword information, such as affixes, and does not consider spelling characteristics of words. In contrast to Word2Vec, GloVe [24]
considers both local context and global statistical information to generate word vectors. FastText [25], an open-source word embedding and text classification library developed by Facebook AI Research, differs from Word2Vec and GloVe in that it considers subword information within words. In contrast to non-contextual word embeddings, the second type, contextual word embeddings, takes into account the context of words in specific sentences or texts, enabling better capture of word semantics. Multilingual BERT (MBERT) [26], as a multilingual version of the BERT model, has been pretrained on Wikipedia texts from 104 languages. An important feature of MBERT is its bidirectional self-attention mechanism based on the Transformer model, which allows for a better understanding of the complex semantics of words within their context. Experimental results [21] demonstrate that, compared to traditional word embeddings such as FastText and GloVe, MBERT performs better for Bengali word embeddings. In addition to the classical word embedding methods mentioned above, reference [27] proposes a novel technique for text word embedding using a quantum language model. The results indicate that word vectors mapped by the quantum language model can achieve performance comparable to their classical counterparts in downstream tasks. In this paper, we choose MBERT as the word embedding model for Bengali text. The sentence vector x can be represented as x = (x_1, x_2, x_3, ..., x_t) after tokenization, where x_t represents the t-th word in the sentence. After passing through the MBERT model, the corresponding BERT vector representation can be obtained as follows:

MBERT(x_1, x_2, x_3, ..., x_t) = (e_1, e_2, e_3, ..., e_t)    (1)

Here, each e_t represents the MBERT vector representation, at time step t, of the sentence with index j.
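For illustration, the sketch below shows one way to obtain such per-token MBERT vectors with the Hugging Face transformers library; the checkpoint name and the use of the final hidden states as e_1, ..., e_t are assumptions made for this example, since the paper only specifies that MBERT is used.

```python
# Illustrative sketch (not the authors' exact pipeline): obtaining per-token
# MBERT vectors e_1, ..., e_t for a sentence, as in Eq. (1). The checkpoint
# name and the use of the last hidden states are assumptions for this example.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
mbert = AutoModel.from_pretrained("bert-base-multilingual-cased")

sentence = "..."  # a Bengali review sentence
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = mbert(**inputs)

# shape (1, seq_len, 768): one contextual vector e_t per token x_t
token_vectors = outputs.last_hidden_state
```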
B. QRNN

FIGURE 1. The VQC structure used in QLSTM consists of (a) an encoding layer, (b) a variational layer, and (c) a measurement layer. The encoding layer is composed of a series of rotation gates used for encoding.

QLSTM [28] is a variation of QRNN that incorporates a variational quantum circuit (VQC) [29] to construct quantum-gated units. The VQC adjusts the initial states of qubits through a series of rotation gates, performs quantum entanglement operations using a series of CNOT gates [30], and finally collapses them into classical states through measurement operations. The parameters of the rotation gates need to be updated through gradient descent. Fig. 1 illustrates the structure of the variational quantum circuit included in the QLSTM network. In this case, we assume the circuit is simulated with 4 qubits. The input word vector sequence is v⃗_t = (v_t1, v_t2, ..., v_tn), where v⃗_t is composed of the current MBERT output e_t and the previous hidden layer feature h_{t-1}. Initially, a linear layer is used to match the number of qubits, and the encoding layer utilizes angle encoding [31] to embed the input sequence into the circuit. The variational layer optimizes the rotation angles of the qubits on the Bloch sphere [32] using a series of rotation gates with updatable parameters. The measurement part employs Pauli-Z gates to measure the states of the qubits. Measuring in the Pauli-Z basis σ_z = [[1, 0], [0, -1]] means projecting the state onto one of the eigenstates of the Pauli-Z matrix, namely |0⟩ or |1⟩. The measured classical bits are then expanded to match the size of the hidden layer through another linear layer.

However, when I > n (where n is the number of qubits and I is the length of the feature sequence), the current approach typically reduces the input dimensionality to match the number of qubits through a linear layer, which may result in the loss of semantic information to some extent. We draw inspiration from a quantum algorithm for image classification on the MNIST handwritten dataset [33], which differs from the previous data re-uploading approach [34]. The former method slices the image horizontally and uploads the value of each pixel to the quantum circuit, while the latter repeatedly uploads the feature vector to the circuit. Due to the limitations of current NISQ devices, the latter approach requires significant computational resources for processing high-dimensional feature sequences, making it difficult to implement. The former approach, by batch uploading, reduces the dependence on the number of qubits and fully loads the feature sequence into the circuit. We aim to address the first issue mentioned in the introduction for traditional QRNN using this method.

In addition, the parameters of the linear embedding layer before each VQC and the linear expansion layer after each VQC in traditional QRNN are shared. Taking the QLSTM of reference [28] as an example, the model uses four VQCs to replace the classical network layers, and these four VQCs share the embedding layer and expansion layer. However, this shared linear layer cannot be effectively optimized for different VQCs. In reference [20], non-shared linear layers with separate parameters were added after each VQC in QLSTM to achieve independent optimization of the circuits. We were inspired by this and made improvements by incorporating non-shared parameter linear layers before and after each VQC, along with the aforementioned batch uploading approach, to achieve more comprehensive circuit optimization and further improve the performance of QRNN.
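For concreteness, the following minimal PennyLane sketch reproduces the three-part VQC block described above (angle encoding, one variational layer with CNOT entanglement, and Pauli-Z readout on 4 qubits); the exact gate arrangement used in [18] and in this paper may differ, so the layout here is illustrative only.

```python
# Illustrative sketch of the VQC block described above: (a) angle encoding,
# (b) a variational layer of trainable rotations with CNOT entanglement, and
# (c) Pauli-Z expectation values as the classical output. The exact gate
# layout in [18] may differ; this is an assumed, minimal arrangement.
import pennylane as qml
from pennylane import numpy as np

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def vqc(features, weights):
    qml.AngleEmbedding(features, wires=range(n_qubits), rotation="Y")  # (a)
    for i in range(n_qubits):                                          # (b)
        qml.Rot(*weights[i], wires=i)
    for i in range(n_qubits):
        qml.CNOT(wires=[i, (i + 1) % n_qubits])
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]        # (c)

weights = np.random.uniform(0, np.pi, size=(n_qubits, 3), requires_grad=True)
print(vqc(np.array([0.1, -0.4, 0.7, 0.2]), weights))  # four classical values
```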
FIGURE 2. The framework includes two red dashed sections, showcasing the proposed structures designed to address the mentioned issues.
III. THE QRNN WE PROPOSE
Regarding the first issue mentioned in the introductory section, in sections A and B we demonstrate the replacement of the classical neural networks in classical RNNs with a Batched Uploading Quantum Neural Network (BUQNN) and apply it to QRNN, proposing a BUQRNN based on a classical-quantum hybrid framework. Unlike traditional QRNN, which requires matching the number of qubits to the feature vectors, our approach provides universality and effectiveness for handling higher-dimensional feature vectors. BUQNN divides input features into batches according to a predefined number of qubits and passes them through a variational quantum circuit, forming an n-layer encoding-variational hybrid structure. This approach allows for processing sequence data without reducing the dimensionality of feature vectors and does not require a large number of qubits to handle the data. Addressing the second issue mentioned in the introductory section, section C provides a specific solution. The workflow for the sentiment classification task in the Bengali language is shown in Fig. 2. The input text is transformed into word embeddings using MBERT, and then the word vectors are fed into the proposed BUQRNN or PN-BUQRNN for feature extraction. Finally, the extracted features are input into a fully connected layer for classification. We will now describe the implementation details of BUQRNN and PN-BUQRNN.

the dimensionality of the input word vector features. This assumes a scenario where the division is evenly divisible. If it is not divisible, the number of batches p needs to be increased by 1, and the remaining space is padded with zero elements. After the division, v⃗_t = (batch⃗_1, batch⃗_2, ..., batch⃗_p), where each batch⃗_i is a vector containing N features. In this demonstration, we use four qubits, and there is one variational layer between any two encoding layers. This restriction is just a design choice, and alternative design schemes can be chosen. Each feature batch undergoes an angle-encoding embedding circuit and then a variational layer. One feature batch is uploaded to the circuit at a time, and after loading p times, all feature vectors can be embedded in the circuit. It is important to note that the essence of BUQNN is a VQC using batch uploading, which differs from the structure of the VQC included in traditional QLSTM networks. It can directly embed feature vectors into the circuit without the need for additional linear layers. The BUQNN with linear layers will be mentioned in Section D of this chapter.
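The batch-uploading idea can be sketched as follows in PennyLane: the feature vector is zero-padded if necessary, split into p batches of four values, and each batch is uploaded through an encoding layer followed by one variational layer. The specific gate choices (Y-rotation encoding, Rot gates, a ring of CNOTs) are assumptions for illustration, not the authors' exact circuit.

```python
# Illustrative sketch of batched uploading (BUQNN): the feature vector is
# zero-padded if needed, split into p batches of n_qubits values, and each
# batch is uploaded via an encoding layer followed by one variational layer.
# Gate choices are assumptions, not the authors' exact circuit.
import pennylane as qml
from pennylane import numpy as np

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def buqnn(features, weights):
    pad = (-len(features)) % n_qubits
    padded = np.concatenate([features, np.zeros(pad)]) if pad else features
    batches = padded.reshape(-1, n_qubits)            # p batches of n_qubits
    for b, batch in enumerate(batches):
        qml.AngleEmbedding(batch, wires=range(n_qubits), rotation="Y")
        for i in range(n_qubits):                      # one variational layer
            qml.Rot(*weights[b, i], wires=i)
        for i in range(n_qubits):
            qml.CNOT(wires=[i, (i + 1) % n_qubits])
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

features = np.array([0.2, -0.1, 0.5, 0.3, 0.8, -0.6, 0.4, 0.1])   # I = 8
p = -(-len(features) // n_qubits)                                  # p = 2
weights = np.random.uniform(0, np.pi, size=(p, n_qubits, 3), requires_grad=True)
print(buqnn(features, weights))
```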
same input data and generates a new cell state candidate c̃_t through a tanh function (as shown in (4)). Equation (5) combines the output of the input gate with the forget gate f_t, the cell state of the previous moment c_{t-1}, and the new cell state candidate c̃_t, and the resulting vector is used to update the current cell state. In other words, the output of the input gate (a real number between 0 and 1) determines how much of the new information c̃_t is added to the current cell state c_t. This mechanism allows the LSTM to better remember long-term dependencies, avoiding the vanishing gradient problem of ordinary RNNs.

BUQNN_o: The goal of BUQNN_o is to generate the output of the cell. In Equation (6), O_t obtains its output through the sigmoid function after obtaining the expectation value from BUQNN_o. In Equation (7), the output o_t is multiplied element-wise with the output of the update gate c_t (which is processed through a tanh activation function), generating the new hidden state vector h_t, which is passed to the next time step for calculation.

Algorithm 1 An algorithm for BUQLSTM.
BUQLSTM(input_size, hidden_size)
  inputs = concatenate(input_size, hidden_size)
  forget gate: device_f = device(backend, wires = w_f)
    # circuit_forget(inputs, weights):
    split inputs into p batches
    for batch in p:
      encoding(batch, wires = w_f)
      variation(weights_f, wires = w_f)
    return [Expectation(PauliZ(wire)) for each wire]
  input gate: device_i = device(backend, wires = w_i)
    # circuit_input(inputs, weights):
    split inputs into p batches
    for batch in p:
      encoding(batch, wires = w_i)
      variation(weights_i, wires = w_i)
    return [Expectation(PauliZ(wire)) for each wire]
  update gate: device_c = device(backend, wires = w_c)
    # circuit_update(inputs, weights):
    split inputs into p batches
    for batch in p:
      encoding(batch, wires = w_c)
      variation(weights_c, wires = w_c)
    return [Expectation(PauliZ(wire)) for each wire]
  output gate: device_o = device(backend, wires = w_o)
    # circuit_output(inputs, weights):
    split inputs into p batches
    for batch in p:
      encoding(batch, wires = w_o)
      variation(weights_o, wires = w_o)
    return [Expectation(PauliZ(wire)) for each wire]

FIGURE 5. Our Proposed BUQGRU.

We now give a brief introduction to the BUQGRU network. Fig. 5 depicts our BUQGRU network, where we replace the classical neural network of the GRU model with BUQNN. Compared to BUQLSTM, it only requires three BUQNNs. The current input e_tj and the previous moment's hidden state h_{t-1} are fed into the network. The reset gate, composed of BUQNN_r and the sigmoid activation function, determines how much of the previous moment's hidden state information should be used when calculating the current candidate hidden state. The update gate, composed of BUQNN_z and the sigmoid function, determines how much of the previous moment's hidden state information should be preserved when calculating the current hidden state. Next, BUQNN_h combined with the tanh activation function is used to calculate the candidate hidden state h̃_t. Finally, Equation (11) decides whether the new hidden state h_t should fully accept the candidate hidden state, retain the previous moment's hidden state, or be a compromise between the two.

r_t = σ(BUQNN_r)    (8)
z_t = σ(BUQNN_z)    (9)
h̃_t = tanh(BUQNN_h)    (10)
h_t = (1 - z_t) ∗ h_{t-1} + z_t ∗ h̃_t    (11)

Here, BUQNN_i (i ∈ {r, z, h}) represents the reset gate circuit, update gate circuit, and candidate hidden state circuit, respectively. It is important to note that, due to the characteristics of the GRU network, the input to BUQNN_h differs from that of the other BUQNNs. In the computation of Equation (10), the output r_t of the reset gate is multiplied by the hidden state output h_{t-1} from the previous time step. This product determines how much information from the previous time step can be utilized. The resulting product is concatenated with e_tj and serves as input to BUQNN_h. After passing through the tanh activation function, a new candidate hidden state is obtained.
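To make the gating arithmetic of Equations (8)-(11) concrete, the sketch below shows the classical side of one BUQGRU step; the buqnn_r, buqnn_z, and buqnn_h callables stand in for the three quantum circuits, and the non-shared linear layers around them are omitted for brevity.

```python
# Illustrative sketch of the classical gating arithmetic of one BUQGRU step,
# following Eqs. (8)-(11). buqnn_r, buqnn_z, and buqnn_h are placeholders for
# the three quantum circuits (e.g., wrapped as torch layers); the non-shared
# linear layers before/after them are omitted here for brevity.
import torch

def buqgru_cell(e_t, h_prev, buqnn_r, buqnn_z, buqnn_h):
    v_t = torch.cat([e_t, h_prev], dim=-1)       # concatenated circuit input
    r_t = torch.sigmoid(buqnn_r(v_t))            # reset gate, Eq. (8)
    z_t = torch.sigmoid(buqnn_z(v_t))            # update gate, Eq. (9)
    # Eq. (10): the reset gate scales the previous hidden state before it is
    # concatenated with the current input and fed to BUQNN_h
    h_cand = torch.tanh(buqnn_h(torch.cat([e_t, r_t * h_prev], dim=-1)))
    # Eq. (11): interpolate between the previous and the candidate state
    return (1.0 - z_t) * h_prev + z_t * h_cand
```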
In traditional QLSTM networks, the linear embedding layers before each VQC and the expansion layers after each VQC share their parameters. While this reduces the parameter count of the model, it may introduce some issues. By using the same linear transformation for all variational quantum circuits, the model's ability to learn different input data features could be limited. For instance, if the forget gate and the input gate need to extract different features from the input data, a model with shared parameters may struggle to simultaneously satisfy the requirements of both gates. Furthermore, during the training process, all variational quantum circuits backpropagate through the same linear layer. This can lead to gradient explosion or vanishing gradients. If the gradient of a particular variational quantum circuit is exceptionally large, it may "overwhelm" the gradients of the other circuits, making it difficult for the entire network to learn effectively. To address the aforementioned issues, we propose PN-BUQNN, as depicted in Fig. 6(b).

FIGURE 6. (a) represents a parameter-shared VQC, while (b) represents our proposed PN-BUQNN.

We employ linear layers before and after each BUQNN, which means they can learn more appropriate input and output transformations for themselves. The independent linear layers imply that each BUQNN can learn and extract different features, with optimizations being specific to each BUQNN. During the training process, since each BUQNN has its own linear layers, their gradient updates no longer affect each other. Furthermore, adding a linear layer before BUQNN can to some extent alleviate the issue of gradient vanishing. The issue of gradient vanishing in low-qubit variational quantum circuits (VQCs) is discussed in [35], and a similar problem exists in the encoding layer of BUQNN. The input feature element v_ti generates different rotation angles for the parameters of the Ry and Rz gates using the arctan function. For the input batch⃗_1, the encoding layer of BUQNN can be represented as:

encoding(batch⃗_1) =

Here, v_t1 is an element in batch⃗_1, and w_i denotes the updatable rotation gate parameters. From (13), it is evident that the value of v_t1 should not be too large or too small, as it may lead to gradient vanishing. Therefore, a clip operation can be added to the output of the linear layer preceding BUQNN, restricting the output to a specified range. Through this approach, it is possible to optimize the weights of the linear transformation effectively, thus mitigating gradient vanishing to a certain extent. In subsequent experiments, we use such a linear layer to control the output within the range of [-3, 3].
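One straightforward realization of such a clipped, non-shared pre-circuit linear layer is sketched below; whether the authors implement the clip with clamp or another bounding operation, and the layer sizes used, are assumptions for illustration.

```python
# Illustrative sketch of a non-shared pre-circuit linear layer whose output is
# clipped to [-3, 3]; the clamp operation and the layer sizes (8-dim feature
# plus 4-dim hidden state) are assumptions, not the authors' exact code.
import torch
import torch.nn as nn

class ClippedLinear(nn.Module):
    def __init__(self, in_features, out_features, bound=3.0):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.bound = bound

    def forward(self, x):
        # keep the values fed to the arctan-based encoding in a moderate
        # range, mitigating gradient vanishing
        return torch.clamp(self.linear(x), -self.bound, self.bound)

# one independent pre-circuit layer per BUQNN (parameter non-sharing)
pre_layers = nn.ModuleList([ClippedLinear(12, 12) for _ in range(3)])
```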
IV. DATASET AND EXPERIMENTAL RESULT
A. DATASET
The experiments utilized two Bengali text classification datasets. The BOOK-Reviews dataset [36] is a collection of Bengali book reviews gathered from the internet (such as blogs, Facebook, and e-commerce websites). It is a binary classification dataset (with positive and negative classes) containing 2000 book reviews. The other dataset used in the experiment is YouTube-B [37], a collection of comments on Bengali dramas collected from the YouTube site. It contains 11807 comments, of which 8500 are positive and 3307 are negative. Given the limitations that current NISQ devices place on the efficiency of quantum algorithms in classification tasks, we selected 2000 entries from the YouTube-B dataset for the experiment, with 1700 for training and 300 for testing. We named the modified YouTube-B dataset YouTubeB-S. The average number of words per sentence in the BOOK-Reviews and YouTubeB-S datasets is 46 and 21, respectively. As shown in Table 1, to prevent a long-tail distribution in the data, the number of data entries in each category was kept approximately equal. The dataset we used is available at https://github.com/nuistyl/Bengali-dataset.

TABLE 1. The datasets used in the experiment are the BOOK-Reviews dataset and the YouTubeB-S dataset

Dataset         Positive  Negative  All
BOOK-Reviews    996       1004      2000
YouTubeB-S      1005      995       2000

TABLE 2. Comparison of the Number of Model Parameters Used
two datasets to test the two structures shown on both sides of Fig. 2, respectively.

B. STRUCTURE WITH BUQRNN
In this section, we conducted experiments on the BUQRNN architecture shown in Fig. 2. We employed BUQNN to define BUQLSTM and BUQGRU and conducted experiments using a 4-qubit circuit. For comparison, we also constructed a traditional variational quantum circuit with 4 qubits. In Equation (1), the input vector is passed through the MBERT model, resulting in the embedding vector e_tj. To facilitate quantum simulation, we combined MBERT with a linear layer to control the feature size to 8 dimensions. We also compared it with a classical LSTM network with an input size of 8 dimensions and 224 parameters. All experiments used a hidden layer size of 4 dimensions, and the feature dimensions combined with the hidden layer dimensions were fed into the model.
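The quoted 224-parameter count is consistent with a standard PyTorch LSTM cell of input size 8 and hidden size 4 (4 gates × (8·4 + 4·4) weights plus two bias vectors of length 16), which can be checked directly:

```python
# Checking the quoted 224-parameter figure, assuming PyTorch's standard
# nn.LSTM parameterization (separate input and recurrent bias vectors).
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=4)
# 4 gates * (8*4 + 4*4) weights + 2 bias vectors of length 4*4 = 192 + 32
print(sum(p.numel() for p in lstm.parameters()))  # 224
```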
For a fair comparison, the depth of the VQC in the traditional QRNN was set to 2. Reference [38] mentions that when the architecture of the VQC is extensive, that is, when increasing the circuit depth does not decrease the number of gates, the quantum data classification error of a VQC typically decreases exponentially with increasing circuit depth. This rapid error suppression ends when reaching the final Helstrom limit of quantum state discrimination. However, considering the limitations of NISQ devices, it is challenging to train more parameters. In future work, we plan to further explore the impact of VQC depth on classification accuracy by improving the experimental design and adopting more advanced quantum devices. The model parameter counts used in the experiments are shown in Table 2. The experiments were conducted using the AdamW optimizer and the cross-entropy loss function. The PennyLane framework was used for modeling the quantum circuits; it includes multiple built-in simulators to meet different task requirements.

All the aforementioned experiments were conducted with a learning rate of 0.01, 50 epochs, and a linear warm-up optimization method. To enhance the persuasiveness of the experimental results, we conducted 10 experiments for each trial using datasets divided into different partitions to obtain the mean accuracy.
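A minimal sketch of this optimization setup is given below; the warm-up length and the stand-in model and data loader are assumptions made only for illustration.

```python
# Illustrative sketch of the stated setup: AdamW, cross-entropy loss, learning
# rate 0.01, 50 epochs, linear warm-up. The warm-up length and the stand-in
# model and data loader below are assumptions made only for illustration.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(8, 2)   # stand-in for the MBERT+BUQRNN classifier
train_loader = [(torch.randn(16, 8), torch.randint(0, 2, (16,)))
                for _ in range(10)]                 # stand-in data loader

optimizer = AdamW(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()
total_steps = 50 * len(train_loader)
warmup_steps = int(0.1 * total_steps)               # assumed warm-up length
scheduler = LambdaLR(optimizer,
                     lambda step: min(1.0, (step + 1) / warmup_steps))

for epoch in range(50):
    for features, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()
```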
Due to space constraints, we present only the first three results for each experiment. Fig. 7 presents the comparison results of the different models on the two datasets, and the highest accuracy is shown in Table 3. Overall, while quantum structures still have disadvantages compared to classical structures, the BUQRNN-based structures show improved accuracy compared to traditional QRNN structures on both datasets. This is because traditional QRNN utilizes linear layers to reduce the dimensionality of the input feature sequence during circuit simulation, which may result in a loss of semantic information to some extent.

TABLE 3. Accuracy of Structures with BUQRNN and Baseline Structures on BOOK-Reviews and YouTubeB-S Validation Sets

pared to the QLSTM structure and the QGRU structure. Although the improvements in accuracy are limited, our proposed BUQRNN model has fewer parameters than QRNN and classical RNN, making it more suitable for low-resource language domains.

C. STRUCTURE WITH PN-BUQRNN
In the previous section, we presented the results of BUQRNN. In this section, we tested the structure equipped with PN-BUQRNN, in which independent linear layers were used before and after each BUQNN. For comparative experiments, to maintain consistency with the structure of PN-BUQRNN, we set the depth of the VQC in the traditional QRNN to 3 and also used independent linear layers before and after each VQC, denoted as PN-QRNN.

The experimental settings remained consistent with the previous section. Fig. 8 illustrates the comparison results of the parameter non-shared LSTM and GRU models on the two datasets. The highest accuracy is shown in Table 4. Due to space constraints, we present only the results of the first three runs for each experiment. Overall, the PN-BUQRNN structure outperforms the classical RNN structure and the PN-QRNN structure on both datasets. After applying the parameter non-shared VQC, the PN-QRNN structure even performs worse in certain tasks. This is due to the information loss caused by the dimensionality reduction of the linear layer, which prevents it from working effectively. In contrast, in the PN-BUQRNN structure, we fully load the information into the BUQNN circuit and optimize each BUQNN circuit independently, resulting in better results.

TABLE 4. The accuracy of the structure with PN-BUQRNN compared to the baseline structure on the YouTubeB-S and BOOK-Reviews validation sets

Model               BOOK-Reviews  YouTubeB-S
MBERT+LSTM          84.231        92.394
MBERT+PN-QLSTM      84.266        91.416
MBERT+PN-BUQLSTM    84.738        92.681

In previous experiments, we constructed models with parameter non-sharing by using separate linear layers before and after each quantum circuit. However, there is still a question to be verified. As shown in Fig. 9, what would be the result if we only used parameter non-sharing linear layers at the input end of the quantum circuit, or only at the output end?

We tested the PN-BUQRNN structure with the two modes shown in Fig. 9(a) and Fig. 9(b). Fig. 10 and Table 5 present the experimental results, indicating a decreasing trend in accuracy for both modes on the two datasets. Of the two, mode (a) performs better than mode (b). When using mode (a), the accuracy decreases slightly compared to the case of the previous section, where both the input and output ends use parameter non-shared layers, while mode (b) significantly degrades the experimental results. This confirms the correctness of our structure: adding parameter non-sharing linear layers at both the input and output ends of the quantum circuit is necessary and better optimizes the quantum circuit. Therefore, we believe
FIGURE 10. The experimental results of the two modes are presented on the datasets BOOK-Reviews and YouTubeB-S, respectively.
that it is possible to select the appropriate model structure based on different task requirements. That is, if accuracy is pursued, the BUQRNN with parameter non-sharing linear layers before and after the circuit can be used. If both accuracy and efficiency are pursued, it is also feasible to use only the parameter non-sharing structure after BUQNN.

TABLE 5. PN-BUQRNN Structure with Two Types of Parameter-Unshared Circuits, where superscript (a) refers to (a) in Fig. 9 and superscript (b) refers to (b) in Fig. 9

Model            BOOK-Reviews  YouTubeB-S
PN-BUQLSTM(a)    84.547        92.036
PN-BUQGRU(a)     85.047        91.644
PN-BUQLSTM(b)    84.232        89.925
PN-BUQGRU(b)     82.435        91.356

V. CONCLUSION
To address the brute-force approach taken by traditional QRNNs when handling feature dimensions larger than the number of qubits, we propose a solution that breaks feature vectors into batches and passes them through the circuit, thereby increasing the available information. As such, in this paper we design a novel incremental quantum neural network, termed BUQNN, and apply it to LSTM and GRU networks, forming BUQRNN. Experimental results on the Bengali corpus demonstrate that, compared to traditional QRNNs, our proposed BUQRNN improves accuracy by up to 0.993% while reducing model complexity on average by 12%. Given that traditional QRNNs are unable to independently optimize the linear layers before and after the quantum circuit, thereby limiting their adaptability to the VQC, we are inspired to combine the aforementioned BUQNN design with a recurrent neural network with non-shared parameters, resulting in a model class called PN-BUQRNN. Quantum neural networks constructed using this approach perform better in experiments, surpassing both classical neural networks and traditional QRNNs on two Bengali text datasets. As an attempt in the field of low-resource language quantum neural networks, we demonstrate the feasibility of applying quantum algorithms to address practical issues in the low-resource text domain. Considering the limited computational resources in low-resource regions, our method allows for circuit simulation with a small number of qubits, aligning with the characteristics of the low-resource language domain. Finally, our goal is to introduce our proposed model to natural language processing tasks in more low-resource regions, in order to address a broader range of real-world issues.

VI. APPENDIX
A. FEASIBILITY EXPLORATION OF BUQRNN AS A WORD EMBEDDING MODEL
In the discussion of the aforementioned related work, reference [27] mentions an approach that utilizes a quantum neural network as a word embedding model. The authors employed QLSTM as the pretraining model and then utilized the obtained word embeddings for downstream tasks. Due to the similarity with this work, in this section we
explore the applicability of the proposed BUQRNN as an embedding-layer model on Bengali text corpora. The experimental data consist of the aforementioned BOOK-Reviews and YouTubeB-S datasets, and the specific methodology is outlined below:

TABLE 6. The results of word vector representations obtained using BUQRNN and QRNN on the BOOK-Reviews and YouTubeB-S datasets.

Model      BOOK-Reviews  YouTubeB-S
QLSTM      76.314        82.246
QGRU       77.241        82.436
BUQLSTM    77.243        83.357

can amplitude encoding be used? The nature of BUQNN (essentially a VQC using batch uploading) makes it difficult to use amplitude encoding for feature encoding; otherwise, BUQNN would degrade into a traditional VQC circuit. To compare it with amplitude encoding, we used the BUQLSTM in Section IV.B and created a QLSTM using amplitude encoding. For 8-dimensional input data, amplitude encoding requires 4 qubits to encode the features into the circuit. The model layers and experimental hyperparameters were kept consistent with those of Section IV.B.
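A minimal PennyLane sketch of the amplitude-encoded variant is shown below; the variational layer and measurement mirror the earlier illustrative VQC, and the padding and normalization choices are assumptions rather than the authors' exact circuit.

```python
# Illustrative sketch of the amplitude-encoded comparison circuit: the input
# vector is written into the amplitudes of a 4-qubit state (16 amplitudes),
# padded and normalized as required. The variational layer and measurement
# mirror the earlier sketch and are assumptions, not the authors' exact circuit.
import pennylane as qml
from pennylane import numpy as np

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def amplitude_vqc(features, weights):
    qml.AmplitudeEmbedding(features, wires=range(n_qubits),
                           pad_with=0.0, normalize=True)
    for i in range(n_qubits):
        qml.Rot(*weights[i], wires=i)
    for i in range(n_qubits):
        qml.CNOT(wires=[i, (i + 1) % n_qubits])
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

weights = np.random.uniform(0, np.pi, size=(n_qubits, 3), requires_grad=True)
x = np.random.uniform(-1, 1, size=12, requires_grad=False)  # 8 + 4 dims
print(amplitude_vqc(x, weights))
```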
TABLE 7. Comparison of our BUQLSTM network with the amplitude-encoded QLSTM network on the BOOK-Reviews and YouTubeB-S datasets.
[14] L. Alchieri, D. Badalotti, P. Bonardi, and S. Bianco, "An introduction to quantum machine learning: from quantum logic to quantum deep learning," Quantum Machine Intelligence, vol. 3, pp. 1–30, Oct. 2021, doi: 10.1007/s42484-021-00056-8.
[15] N. Wiebe, A. Kapoor, and K. M. Svore, "Quantum deep learning," 2014, arXiv:1412.3489.
[16] C. H. Bennett, E. Bernstein, G. Brassard, and U. Vazirani, "Strengths and weaknesses of quantum computing," SIAM Journal on Computing, vol. 26, no. 5, pp. 1510–1523, Jan. 1997, doi: 10.1137/S0097539796300933.
[17] W. Lai, J. Shi, and Y. Chang, "Quantum-Inspired Fully Complex-Valued Neutral Network for Sentiment Analysis," Axioms, vol. 12, no. 3, p. 308, Feb. 2023, doi: 10.3390/axioms12030308.
[18] S. Y. Chen, S. Yoo, and Y. L. Fang, "Quantum long short-term memory," in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 8622–8626.
[19] J. Preskill, "Quantum computing in the NISQ era and beyond," Quantum, vol. 2, p. 79, Aug. 2018, doi: 10.22331/q-2018-08-06-79.
[20] Y. Cao, X. Zhou, X. Fei, H. Zhao, W. Liu, and J. Zhao, "Linear-layer-enhanced quantum long short-term memory for carbon price forecasting," Quantum Machine Intelligence, vol. 5, no. 2, pp. 1–12, Jul. 2023, doi: 10.1007/s42484-023-00115-2.
[21] S. Sazzed, "Cross-lingual sentiment classification in low-resource bengali language," in Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), Nov. 2020, pp. 50–60.
[22] C.-H. H. Yang, J. Qi, S. Y.-C. Chen, Y. Tsao, and P.-Y. Chen, "When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing," in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 8602–8606, doi: 10.1109/ICASSP43922.2022.9746412.
[23] M. Al-Amin, M. S. Islam, and S. D. Uzzal, "Sentiment analysis of Bengali comments with Word2Vec and sentiment information of words," in 2017 International Conference on Electrical, Computer and Communication Engineering (ECCE), Feb. 2017, pp. 186–190.
[24] P. Chowdhury, E. M. Eumi, O. Sarkar, and M. F. Ahamed, "Bangla news classification using GloVe vectorization, LSTM, and CNN," in Proceedings of the International Conference on Big Data, IoT, and Machine Learning: BIM 2021, Dec. 2017, pp. 723–731.
[25] M. R. Hossain, M. M. Hoque, and I. H. Sarker, Text Classification Using Convolution Neural Networks with FastText Embedding. USA: Springer, 2021, pp. 101–113.
[26] T. Pires, E. Schlinger, and D. Garrette, "How multilingual is multilingual BERT?," 2019, arXiv:1906.01502.
[27] S. S. Li et al., "PQLM - Multilingual Decentralized Portable Quantum Language Model," in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1–5, doi: 10.1109/ICASSP49357.2023.10095215.
[28] R. Di Sipio, J.-H. Huang, S. Y.-C. Chen, S. Mangini, and M. Worring, "The Dawn of Quantum Natural Language Processing," in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 8612–8616, doi: 10.1109/ICASSP43922.2022.9747675.
[29] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, et al., "Variational quantum algorithms," Nature Reviews Physics, vol. 3, no. 9, pp. 625–644, Aug. 2021, doi: 10.1038/s42254-021-00348-9.
[30] R. Horodecki, P. Horodecki, M. Horodecki, and K. Horodecki, "Quantum entanglement," Reviews of Modern Physics, vol. 81, no. 2, p. 865, Jun. 2009, doi: 10.1103/RevModPhys.81.865.
[31] R. LaRose and B. Coyle, "Robust data encodings for quantum classifiers," Physical Review A, vol. 102, p. 032420, 2020, doi: 10.1103/PhysRevA.102.032420.
[32] I. Glendinning, "The Bloch sphere," in QIA Meeting, 2005, pp. 3–18.
[33] M. Periyasamy, N. Meyer, C. Ufrecht, D. D. Scherer, A. Plinge, and C. Mutschler, "Incremental data-uploading for full-quantum classification," in 2022 IEEE International Conference on Quantum Computing and Engineering (QCE), May 2022, pp. 31–3.
[34] A. Pérez-Salinas et al., "Data re-uploading for a universal quantum classifier," Quantum, vol. 4, p. 226, Feb. 2020, doi: 10.22331/q-2020-02-06-226.
[35] Z. Hong, J. Wang, X. Qu, C. Zhao, W. Tao, and J. Xiao, "QSpeech: Low-Qubit Quantum Speech Application Toolkit," in 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 2022, pp. 01–08, doi: 10.1109/IJCNN55064.2022.9892496.
[36] S. Sazzed, "Cross-lingual sentiment classification in low-resource bengali language," in Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), Nov. 2020, pp. 50–60.
[37] R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts, "Recursive deep models for semantic compositionality over a sentiment treebank," in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Oct. 2013, pp. 1631–1642.
[38] B. Zhang and Q. Zhuang, "Fast decay of classification error in variational quantum circuits," Quantum Science and Technology, Jun. 2022, doi: 10.1088/2058-9565/ac70f5.