QA Using QNLP
Abstract
Natural language processing (NLP) is at the forefront of great advances in contemporary AI, and
it is arguably one of the most challenging areas of the field. At the same time, in the area of
Quantum Computing (QC), with the steady growth of quantum hardware and notable improve-
ments towards implementations of quantum algorithms, we are approaching an era when quantum
computers perform tasks that cannot be done on classical computers with a reasonable amount of
resources. This provides a new range of opportunities for AI, and for NLP specifically. In this work,
we work with the Categorical Distributional Compositional (DisCoCat) model of natural language
meaning, whose underlying mathematical underpinnings make it amenable to quantum instantia-
tions. Earlier work on fault-tolerant quantum algorithms has already demonstrated potential quantum
advantage for NLP, notably employing DisCoCat. In this work, we focus on the capabilities of noisy
intermediate-scale quantum (NISQ) hardware and perform the first implementation of an NLP task
on a NISQ processor, using the DisCoCat framework. Sentences are instantiated as parameterised
quantum circuits; word-meanings are embedded in quantum states using parameterised quantum-
circuits and the sentence’s grammatical structure faithfully manifests as a pattern of entangling
operations which compose the word-circuits into a sentence-circuit. The circuits’ parameters are
trained using a classical optimiser in a supervised NLP task of binary classification. Our novel QNLP
model shows concrete promise for scalability as the quality of the quantum hardware improves in the
near future and solidifies a novel branch of experimental research at the intersection of QC and AI.
language understanding and language generation, under the hood of mainstream NLP models one exclusively finds deep neural networks, which famously suffer the criticism of being uninterpretable black boxes [7].

One way to bring transparency to said black boxes is to explicitly incorporate linguistic structure, such as grammar and syntax [8-10], into distributional language models. Note, in this work we will use the terms grammar and syntax interchangeably, the essence of these terms being that they refer to structural information with which the textual data could be supplemented. A prominent approach attempting this merge is the Distributional Compositional Categorical model of natural language meaning (DisCoCat) [11-13], which pioneered the paradigm of combining explicit grammatical (or syntactic) structure with distributional (or statistical) methods for encoding and computing meaning (or semantics). There has also been follow-up related work on neural-based models where syntax is incorporated in a recursive neural network, where the syntactic structures dictate the order of the recursive calls to the recurring cell [14]. This approach also provides the tools for modelling linguistic phenomena such as lexical entailment and ambiguity, as well as the transparent construction of syntactic structures like relative and possessive pronouns [15, 16], conjunction, disjunction, and negation [17].

From a modern lens, DisCoCat, as it is presented in the literature, is a tensor network language model. Recently, the motivation for designing interpretable AI systems has caused a surge in the use of tensor networks in language modelling [18-21]. A tensor network is a graph whose vertices are endowed with tensors. Every vertex has an arity, i.e. a number of edges to which it belongs, which represent the tensor's indices. Edges represent tensor contractions, i.e. the identification of the indices joined by the edge and summation over their range (Einstein summation). Intuitively, a tensor network is a compressed representation of a multilinear map. Tensor networks have been used to capture probability distributions of complex many-body systems, both classical and quantum, and they also have a formal appeal as rigorous algebraic tools [22, 23].
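As a concrete illustration of this definition (a minimal sketch of our own, not taken from the original experiments; the tensors and dimensions are hypothetical), a two-vertex tensor network can be contracted with numpy's Einstein-summation routine:

```python
import numpy as np

# Two-vertex tensor network: an order-2 tensor (matrix) for an adjective
# and an order-1 tensor (vector) for a noun, joined by a single edge.
# The edge identifies the shared index j and sums over its range.
adjective = np.random.rand(4, 4)   # indices (i, j); dimensions illustrative
noun = np.random.rand(4)           # index j

phrase = np.einsum('ij,j->i', adjective, noun)  # Einstein summation over j
print(phrase.shape)                # (4,): the meaning vector of the phrase
```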
Quantum computing (QC) is a field which, in parallel with NLP, is growing at an extremely fast pace. The prominence of QC is now well-established, especially after the experiments aiming to demonstrate quantum advantage for the specific task of sampling from random quantum circuits [24]. QC has the potential to reach the whole range of human interests, from foundations of physics and computer science, to applications in engineering, finance, chemistry, and optimisation problems [25], and even procedural map generation [26].

In the last half-decade, the natural conceptual fusion of QC with AI, and especially the subfield of AI known as machine learning (ML), has led to a plethora of novel and exciting advancements. The quantum machine learning (QML) literature has reached an immense size considering its young age, with the cross-fertilisation of ideas and methods between fields of research, as well as between academia and industry, being a dominant driving force. The landscape includes using quantum computers for subroutines in ML algorithms for executing linear algebra operations [27], quantising classical machine learning algorithms based on neural networks [28], support vector machines, clustering [29], or artificial agents who learn from interacting with their environment [30], and even quantum-inspired and dequantised classical algorithms which nevertheless retain a complexity-theoretic advantage [31]. Small-scale classification experiments have also been implemented with quantum technology [32, 33].

From this collection of ingredients there organically emerges the interdisciplinary field of Quantum Natural Language Processing (QNLP), a research area still in its infancy [34-38], which combines NLP and QC and seeks novel quantum language model designs and quantum algorithms for NLP tasks. Building on the recently established methodology of QML, one imports QC algorithms to obtain theoretical speedups for specific NLP tasks, or uses the quantum Hilbert space as a feature space in which NLP tasks are to be executed.
The first paper on QNLP using the DisCoCat framework, by Zeng and Coecke [34], introduced an approach where standard NLP tasks are instantiated as quantum computations. The task of sentence similarity was reduced to the closest-vector problem, for which there exists a quantum algorithm providing a quadratic speedup, albeit assuming a Quantum Random Access Memory (QRAM). The mapping of the NLP task to a quantum computation is attributed to the mathematical similarity of the structures underlying DisCoCat and quantum theory. This similarity becomes apparent when both are expressed in the graphical language of string diagrams of monoidal categories or process theories [39]. The categorical formulation of quantum theory is known as Categorical Quantum Mechanics (CQM) [40], and the string diagrams describing CQM are tensor networks endowed with a graphical language in the form of a rewrite-system (a.k.a. a diagrammatic algebra with string diagrams). The language of string diagrams places syntactic structures and quantum processes on equal footing, and thus allows the canonical instantiation of grammar-aware quantum models for NLP.

In this work, we bring DisCoCat to the current age of noisy intermediate-scale quantum (NISQ) devices by performing the first-ever proof-of-concept QNLP experiment on actual quantum processors. We employ the framework introduced in Ref. [41] by adopting the paradigm of parameterised quantum circuits (PQCs) as quantum machine learning models [42, 43], which currently dominates near-term algorithms. PQCs can be used to parameterise quantum states and processes, as well as complex probability distributions, and so they can be used in NISQ machine learning pipelines. The framework we use in this work allows for the execution of experiments involving non-trivial text corpora, which moreover involve complex grammatical structures. The specific task we showcase here is binary classification of sentences in a supervised-learning hybrid classical-quantum QML setup.

2 The model

Fig. 1 Diagram for "Romeo who loves Juliet dies". The grammatical reduction is generated by the nested pattern of non-crossing cups, which connect words through wires of types n or s. Grammaticality is verified by only one s-wire being left open. The diagram represents the meaning of the whole sentence from a process-theoretic point of view. The relative pronoun 'who' is modelled by the Kronecker tensor. Interpreting the diagram in CQM, it represents a quantum state.

The DisCoCat model relies on algebraic models of grammar which use types and reduction-rules to mathematically formalise the syntactic structures of sentences. Historically, the first formulation of DisCoCat employed pregroup grammars (Appendix A), which were developed by Lambek [44]. However, any other typelogical grammar, such as Combinatory Categorial Grammar (CCG), can be used to instantiate DisCoCat models. In this work we work with pregroup grammars, to stay close to the existing DisCoCat literature. In a pregroup grammar, a sentence is composed of a finite product of words, $\sigma = \prod_i w_i$. A parser tags a word $w \in \sigma$ with its part of speech. Accordingly, $w$ is assigned a pregroup type $t_w = \prod_i b_i^{\kappa_i}$ comprising a product of basic (or atomic) types $b_i$ from the finite set $B$. Each type carries an adjoint order $\kappa_i \in \mathbb{Z}$. Pregroup parsing is efficient; specifically, it is linear time under reasonable assumptions [45]. The type of a sentence is the product of the types of its words, and it is deemed grammatical iff it type-reduces to the special sentence-type $s^0 \in B$, i.e. $t_\sigma = \prod_w t_w \to s^0$. Reductions are performed by iteratively applying pairwise annihilations of basic types with adjoint orders of the form $b^i b^{i+1}$. As an example, consider the grammatical reduction:

$t_{\mathrm{Romeo\ who\ loves\ Juliet\ dies}} = t_{\mathrm{Romeo}}\, t_{\mathrm{who}}\, t_{\mathrm{loves}}\, t_{\mathrm{Juliet}}\, t_{\mathrm{dies}} = (n^0)(n^1 n^0 s^{-1} n^0)(n^1 s^0 n^{-1})(n^0)(n^1 s^0) \to n^0 n^1 n^0 s^{-1} n^0 n^1 s^0 n^{-1} n^0 n^1 s^0 \to n^0 s^{-1} s^0 n^1 s^0 \to n^0 n^1 s^0 \to s^0.$
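For illustration, the contraction mechanics above can be sketched in a few lines of Python (our own sketch, not the parser used in the experiments; a greedy search for adjacent contractible pairs suffices for this example, although efficient pregroup parsing in general is more subtle [45]):

```python
def reduce_type(types):
    # types: list of (basic_type, adjoint_order) pairs; repeatedly apply
    # contractions b^k b^(k+1) -> epsilon until no adjacent pair contracts.
    types = list(types)
    reduced = True
    while reduced:
        reduced = False
        for i in range(len(types) - 1):
            (b1, k1), (b2, k2) = types[i], types[i + 1]
            if b1 == b2 and k2 == k1 + 1:
                del types[i:i + 2]
                reduced = True
                break
    return types

sentence = (
    [('n', 0)]                                   # Romeo
    + [('n', 1), ('n', 0), ('s', -1), ('n', 0)]  # who
    + [('n', 1), ('s', 0), ('n', -1)]            # loves
    + [('n', 0)]                                 # Juliet
    + [('n', 1), ('s', 0)]                       # dies
)
print(reduce_type(sentence))  # [('s', 0)]: the sentence is grammatical
```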
At the core of DisCoCat is a process-theoretic model of natural language meaning. Process theories are alternatively known as symmetric monoidal (or tensor) categories [46]. Process networks such as those that manifest in DisCoCat can be represented graphically with string diagrams [47]. String diagrams are not just convenient graphical notation; they constitute a formal graphical language for reasoning about complex process networks (see Appendix B for an introduction to string diagrams and concepts relevant to this work). String diagrams are generated by boxes with input and output wires, with each wire carrying a type. Boxes can be composed to form process networks by wiring outputs to inputs and making
on the number of qubits assigned to each pregroup type $b \in B$ from which the word-types are composed, and cups are mapped to Bell effects.

Given a sentence $\sigma$, we instantiate its quantum circuit by first concatenating in parallel the word-circuits of each word as they appear in the sentence, corresponding to performing a tensor product, $C_\sigma(\theta_\sigma) = \bigotimes_w C_w(\theta_w)$, which prepares the state $|\sigma(\theta_\sigma)\rangle$ from the all-zeros basis state. As such, a sentence is parameterised by the concatenation of the parameters of its words, $\theta_\sigma = \cup_{w \in \sigma}\, \theta_w$. The parameters $\theta_w$ determine the word-embedding $|w(\theta_w)\rangle$. In other words, we use the Hilbert space as a feature space [32, 48, 49] in which the word-embeddings are defined. Finally, we apply Bell effects as dictated by the cup pattern in the grammatical reduction, a function whose result we shall denote $g_\sigma(|\sigma(\theta_\sigma)\rangle)$. Note that in general this procedure prepares an unnormalised quantum state. In the special case where no qubits are assigned to the sentence type, i.e. $q_s = 0$, it is an amplitude, which we write as $\langle g_\sigma|\sigma(\theta_\sigma)\rangle$. Formally, this mapping constitutes a parameterised functor from the pregroup grammar category to the category of quantum circuits. The parameterisation is defined via a function from the set of parameters to functors between the aforementioned source and target categories.
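As a minimal sketch of this construction (our own illustration, with hypothetical parameter values and the gate conventions stated in the code), consider the two-word sentence "Romeo dies" with $q_n = 1$ and $q_s = 0$: each word is a one-qubit state, and the single cup of type $n$ becomes an unnormalised Bell effect:

```python
import numpy as np

def rx(t):
    return np.array([[np.cos(t / 2), -1j * np.sin(t / 2)],
                     [-1j * np.sin(t / 2), np.cos(t / 2)]])

def rz(t):
    return np.array([[np.exp(-1j * t / 2), 0], [0, np.exp(1j * t / 2)]])

ket0 = np.array([1, 0], dtype=complex)

def word_state(t1, t2):
    # unary word-state R_z(t2) R_x(t1) |0>
    return rz(t2) @ rx(t1) @ ket0

romeo = word_state(0.3, 1.1)    # illustrative parameters theta_w
dies = word_state(-0.7, 0.4)

# The cup of type n becomes the (unnormalised) Bell effect <00| + <11|.
bell_effect = np.array([1, 0, 0, 1], dtype=complex)

amplitude = bell_effect @ np.kron(romeo, dies)  # <g_sigma|sigma(theta_sigma)>
print(amplitude, abs(amplitude) ** 2)
```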
Our model has hyperparameters (Appendix E). The wires of the DisCoCat diagrams we consider carry types $n$ or $s$. The numbers of qubits that we assign to these pregroup types are $q_n$ and $q_s$. These determine the arity of each word, i.e. the width of the quantum circuit that prepares each word-state. We set $q_s = 0$ throughout this work, which establishes that the sentence-circuits represent scalars. For a unary word $w$, i.e. a word-state on one qubit, we choose to prepare it using two rotations, as $R_z(\theta_w^2) R_x(\theta_w^1)|0\rangle$. For a word $w$ of arity $k \geq 2$, we use a depth-$d$ IQP-style parameterisation [32] consisting of $d$ layers, where each layer consists of a layer of Hadamard gates followed by a layer of controlled-$Z$ rotations $CR_z(\theta_w^i)$, such that $i \in \{1, 2, \ldots, d(k-1)\}$. Such circuits are in part motivated by the conjecture that circuits involving them are classically hard to evaluate [32].
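The following sketch (ours; one common convention for the controlled-$Z$ rotation is assumed, and the nearest-neighbour entangling pattern is illustrative) prepares such an IQP-style word-state by statevector simulation:

```python
import numpy as np
from functools import reduce

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def crz(theta, k, c, t):
    # Controlled-Rz(theta) with control c and target t on a k-qubit register
    # (convention assumed: phase exp(+/- i*theta/2) on the target when the
    # control is |1>).
    dim = 2 ** k
    U = np.eye(dim, dtype=complex)
    for i in range(dim):
        bits = [(i >> (k - 1 - q)) & 1 for q in range(k)]
        if bits[c] == 1:
            U[i, i] = np.exp(1j * theta / 2 * (1 if bits[t] else -1))
    return U

def iqp_word_state(thetas, k, d):
    # d layers, each a wall of Hadamards followed by CRz gates on
    # neighbouring qubit pairs: d(k-1) parameters in total.
    state = np.zeros(2 ** k, dtype=complex)
    state[0] = 1.0
    h_wall = reduce(np.kron, [H] * k)
    idx = 0
    for _ in range(d):
        state = h_wall @ state
        for q in range(k - 1):
            state = crz(thetas[idx], k, q, q + 1) @ state
            idx += 1
    return state

k, d = 3, 2
thetas = 2 * np.pi * np.random.rand(d * (k - 1))
print(np.linalg.norm(iqp_word_state(thetas, k, d)))  # 1.0: a valid state
```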
The relative pronoun "who" is mapped to the GHZ circuit, i.e. the circuit that prepares a GHZ state on the number of qubits determined by $q_n$ and $q_s$. This is justified by prior work where relative pronouns and other functional words are modelled by a Kronecker tensor (a.k.a. 'spider'), whose entries are all zeros except when all indices are the same, in which case the entries are ones [15, 16]. It is also known as a 'copy' tensor, as it copies the computational basis.

In Fig.2 we show an example of choices of word-circuits for specific numbers of qubits assigned to each basic pregroup type (Appendix E3). In Fig.3 we show the circuit corresponding to "Romeo who loves Juliet dies". In practice, we perform the mapping of sentence diagrams to quantum circuits using the Python library DisCoPy [50], which provides a data structure for monoidal string-diagrams and enables the instantiation of functors, including functors based on PQCs.

Here, a motivating remark is in order. In 'classical' implementations of DisCoCat, where the semantics chosen in order to realise a model is in terms of tensor networks, a sentence diagram represents a vector which results from a tensor contraction. In this case, meanings of words are encoded in the state-tensors in terms of co-occurrence frequencies or other vector-space word-embeddings [51]. In general, tensor contractions are exponentially expensive to compute. The cost scales exponentially with the order of the largest tensors present in the tensor network, and the base of the scaling is the dimension of the vector spaces carried by the wires. However, tensor networks resulting from interpreting syntactic string diagrams over vector spaces and linear maps do not have a generic topology; rather, they are tree-like. This means that tensor networks whose connectivity is given by pregroup reductions can be contracted efficiently as a function of the dimension of the wires carrying the vector spaces playing the role of semantic spaces. Even in this case, however, the dimension of the wires for NLP-relevant applications can become prohibitively large (order of hundreds) in practice. In a fault-tolerant quantum computing setting, ideally, as is proposed in Ref.[34], one has access to a QRAM and would be able to efficiently encode such tensor entries as quantum amplitudes, using only $\lceil \log_2 d \rceil$ qubits to encode a $d$-dimensional vector. However, building a QRAM currently remains challenging [52]. In the NISQ case, we still attempt to take advantage of the tensor-product structure defined by a collection of qubits, which provides an exponentially large Hilbert space as a function
we show the convergence of the cost function under SPSA optimisation and report the training and testing errors for different choices of hyperparameters. This constitutes the first non-trivial QNLP experiment on a programmable quantum processor. According to Fig.4, scaling up the word-circuits results in improved training and testing errors and, remarkably, we observe this on the quantum computer as well. This is important for the scalability of our experiment when future hardware allows for greater circuit sizes and thus richer quantum-enhanced feature spaces and grammatically more complex sentences.
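For reference, SPSA [55] estimates the gradient from only two cost evaluations per iteration, regardless of the number of parameters, which is why it suits shot-noisy cost functions. The following is a minimal sketch of our own (standard gain schedules; the hyperparameter values and the toy cost are illustrative):

```python
import numpy as np

def spsa_minimise(cost, theta0, n_iter=500, a=0.2, c=0.1):
    # Each step estimates the gradient from two (possibly noisy) cost
    # evaluations along a random +/-1 perturbation direction (Spall [55]).
    theta = np.array(theta0, dtype=float)
    for k in range(1, n_iter + 1):
        ak = a / k ** 0.602          # standard SPSA gain schedules
        ck = c / k ** 0.101
        delta = np.random.choice([-1.0, 1.0], size=theta.shape)
        g_hat = (cost(theta + ck * delta) - cost(theta - ck * delta)) \
            / (2 * ck) * delta       # 1/delta_i == delta_i for +/-1 entries
        theta -= ak * g_hat
    return theta

# Toy usage: recover the minimiser of a noisy quadratic cost.
target = np.array([0.5, -1.0, 2.0])
cost = lambda t: np.sum((t - target) ** 2) + 0.01 * np.random.randn()
print(spsa_minimise(cost, np.zeros(3)))  # approximately `target`
```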
4 Discussion and Outlook

We have performed the first-ever quantum natural language processing experiment, by means of classification of sentences annotated with binary labels, a special case of QA, on actual quantum hardware. We used a compositional-distributional model of meaning, DisCoCat, constructed by a structure-preserving mapping from grammatical reductions of sentences to PQCs. This proof-of-concept work serves as a demonstration that QNLP is possible on currently available quantum devices and that it is a line of research worth exploring.

A remark on postselection is in order. QML-based QNLP tasks such as the one implemented in this work rely on the optimisation of a scalar cost function. In general, estimating a scalar encoded in an amplitude on a quantum computer requires either postselection or coherent control over arbitrary circuits, so that a swap test or a Hadamard test can be performed (Appendix H). Notably, in special cases of interest to QML, the Hadamard test can be adapted to NISQ technologies [61, 62]. In its general form, however, the depth-cost resulting after compilation of controlled circuits becomes prohibitive for current quantum devices. However, given the rapid improvement in quantum computing hardware, we envision that such operations will be within reach in the near term.

Future work includes experimentation with other grammars, such as CCG, which returns tree-like diagrams, and using them to construct PQC-based functors, as is done in Ref.[14] but with neural networks. This, for example, would enable the design of PQC-based functors that do not require postselection, in contrast to the pregroup-based models, where in order for each Bell effect to take place one needs to postselect on measurements involving the qubits on which one wishes to realise a Bell effect.

We also look toward more complex QNLP tasks, such as sentence similarity, and toward working with real-world, large-scale data using a pregroup parser, as made possible with lambeq [53]. In that context, regularisation techniques during training will become important, an increasingly relevant topic for QML that in general deserves more attention [43].

In addition, our DisCoCat-based QNLP framework is naturally generalisable to accommodate mapping sentences to quantum circuits involving mixed states and quantum channels. This is useful as mixed states allow for modelling lexical entailment and ambiguity [63, 64]. As also stated above, it is possible to define functors in terms of hybrid models where both neural networks and PQCs are involved, where heuristically one aims to quantify the possible advantage of such models compared to strictly classical ones.

Furthermore, note that the word-embeddings are learned in-task in this work. However, training PQCs to prepare quantum states that serve as word embeddings can also be achieved by using the usual NLP objectives [51]. It would be interesting to verify that such pretrained word embeddings are useful in downstream tasks, such as the simple classification task presented in this work.

Finally, looking beyond the DisCoCat model, it is well-motivated to adopt the recently introduced DisCoCirc model [65] of meaning and its mapping to PQCs [66], which allows for QNLP experiments on text-scale real-world data in a fully-compositional framework. In this model, nouns are treated as the first-class citizens, the 'entities', of a text, and sentence composition is made explicit. Entities go through gates which act as modifiers on them, modelling for example the application of adjectives or verbs. The model also considers higher-order modifiers, such as adverbs modifying verbs. This interaction structure, viewed as a process network, can again be used to instantiate models in terms of neural networks, tensor networks, or quantum circuits. In the latter case, entities are modelled as density matrices carried by wires and their modifiers as quantum channels.
Acknowledgments
KM thanks Vojtech Havlicek and Christopher
Self for discussions on QML, Robin Lorenz
and Marcello Benedetti for comments on the
manuscript, and the TKET team at Quantinuum
for technical advice on quantum compilation on
IBMQ machines. KM is grateful to the Royal
Commission for the Exhibition of 1851 for finan-
cial support under a Postdoctoral Research
Fellowship. AT thanks Simon Harrison for finan-
cial support through the Wolfson Harrison UK
Research Council Quantum Foundation Scholar-
ship. We acknowledge the use of IBM Quantum
services for this work. The views expressed are
those of the authors, and do not reflect the official
policy or position of IBM or the IBM Quantum
team.
Appendix A Pregroup Grammar

Pregroup grammars were introduced by Lambek as an algebraic model for grammar [44]. A pregroup grammar $G$ is freely generated by the basic types $b$ in a finite set $B$. Basic types are decorated by an integer $k \in \mathbb{Z}$, which signifies their adjoint order. Negative integers $-k$, with $k \in \mathbb{N}$, are called left adjoints of order $k$, and positive integers $k \in \mathbb{N}$ are called right adjoints. We shall refer to a basic type raised to some adjoint order (including the zeroth order) simply as a 'type'. The zeroth order $k = 0$ signifies no adjoint action on the basic type, and so we often omit it in notation, $b^0 = b$.

The pregroup algebra is such that the two kinds of adjoint (left and right) act as left and right inverses under multiplication of basic types:

$b^k b^{k+1} \to \epsilon \to b^{k+1} b^k$,

where $\epsilon \in B$ is the trivial or unit type. The left hand side of this reduction is called a contraction and the right hand side an expansion. Pregroup grammar also accommodates induced steps $a \to b$ for $a, b \in B$. The symbol '$\to$' is to be read as 'type-reduction', and the pregroup grammar sets the rules for which reductions are valid.

Now, to go from word to sentence, we consider a finite set of words called the vocabulary $V$. We call the dictionary (or lexicon) the finite set of entries $D \subseteq V \times (B \times \mathbb{Z})^*$. The star symbol $A^*$ denotes the set of finite strings with elements from the set $A$.

Appendix B String Diagrams

String diagrams describing process theories are generated by states, effects, and processes. In Fig.B1 we comprehensively show these generators, along with the constraining equations on them. String diagrams for process theories formally describe process networks where only connectivity matters, i.e. which outputs are connected to which inputs. In other words, the length of the wires carries no meaning and the wires are freely deformable as long as the topology of the network is respected. It is beyond the purposes of this work to provide a comprehensive exposition on diagrammatic languages; we provide the necessary elements which are used for the implementation of our QNLP experiments.

Fig. B1 Diagrams are read from top to bottom. States have only outputs, effects have only inputs, and processes (boxes) have both input and output wires. All wires carry types. Placing boxes side by side is allowed by the monoidal structure and signifies parallel processes. Sequential process composition is represented by composing outputs of a box with inputs of another box. A process transforms a state into a new state. There are special kinds of states called caps and effects called cups, which satisfy the snake equation relating them to the identity wire (trivial process). Process networks freely generated by these generators need not be planar, and so there exists a special process that swaps wires and acts trivially on caps and cups.
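The snake equation stated in the caption can be checked numerically once wires are interpreted as vector spaces and cups and caps as identity matrices; a minimal sketch of our own, with an illustrative wire dimension:

```python
import numpy as np

d = 3                    # wire dimension (illustrative)
cup = np.eye(d)          # cup effect, matrix entries delta_ij
cap = np.eye(d)          # cap state, matrix entries delta_jk

# Composing (cup x id) after (id x cap) contracts the middle wire and
# must equal the identity process on a single wire ("yanking it straight").
snake = np.einsum('ij,jk->ik', cup, cap)
print(np.allclose(snake, np.eye(d)))  # True: the snake equation holds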
Appendix C Random Sentence Generation with CFG

A context-free grammar (CFG) generates a language from a set of production (or rewrite) rules applied on symbols. Symbols belong to a finite set $\Sigma$, and there is a special symbol $S \in \Sigma$ called initial. Production rules belong to a finite set $R$ and are of the form $T \to \prod_i T_i$, where $T, T_i \in \Sigma$. The application of a production rule results in substituting a
symbol with a product (or string) of symbols. Randomly generating a sentence amounts to starting from $S$ and randomly applying production rules uniformly sampled from the set $R$. The production ends when all types produced are terminal types, which are none other than words in the finite vocabulary $V$.

From a process theory point of view, we represent symbols as types carried by wires. Production rules are represented as boxes with input and output wires labelled by the appropriate types. The process network (or string diagram) describing the production of a sentence ends with a production rule whose output is the $S$-type. Then we randomly pick boxes and compose them backwards, always respecting type-matching when inputs of production rules are fed into outputs of other production rules. The generation terminates when production rules which have no inputs are applied (i.e. they are states); these correspond to the words in the finite vocabulary.

In Fig.C2 (on the left hand side of the arrows) we show the string-diagram generators we use to randomly produce sentences from a vocabulary of words composed of nouns, transitive verbs, intransitive verbs, and relative pronouns. The corresponding types of these parts of speech are $N$, $TV$, $IV$, $RPRON$. The vocabulary is the union of the words of each type, $V = V_N \cup V_{TV} \cup V_{IV} \cup V_{RPRON}$.

Having randomly generated a sentence from the CFG, its string diagram can be translated into a pregroup sentence diagram. To do so, we use the translation rules shown in Fig.C2. Note that a cup labeled by the basic type $b$ is used to represent a contraction $b^k b^{k+1} \to \epsilon$. Pregroup grammars are weakly equivalent to context-free grammars, in the sense that they generate the same language [67, 68].
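A minimal sketch of such CFG-based random generation follows (our own illustration; the production rules and words below are hypothetical stand-ins for the grammar of Fig.C2):

```python
import random

# Hypothetical vocabulary and rules standing in for Fig.C2's generators.
vocab = {
    'N': ['Romeo', 'Juliet'],
    'IV': ['dies', 'sleeps'],
    'TV': ['loves', 'kills'],
    'RPRON': ['who'],
}
rules = [
    ('S', ['NP', 'IV']),                 # e.g. "Romeo dies"
    ('S', ['NP', 'TV', 'NP']),           # e.g. "Romeo loves Juliet"
    ('NP', ['N']),
    ('NP', ['N', 'RPRON', 'TV', 'NP']),  # relative clause
]

def generate(symbol='S'):
    # Terminal symbols emit a word; non-terminals expand by a random rule.
    # Recursion terminates with probability one, since every non-terminal
    # can rewrite to terminals.
    if symbol in vocab:
        return [random.choice(vocab[symbol])]
    options = [rhs for lhs, rhs in rules if lhs == symbol]
    return [w for s in random.choice(options) for w in generate(s)]

print(' '.join(generate()))  # e.g. "Romeo who loves Juliet dies"
```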
We use the default temperature value (1) and the default number of basin hops (100).

Fig. F5 Algebraic decay of the mean training and testing error for the data displayed in Fig.4 (bottom), obtained by basinhopping. Increasing the depth of the word-circuits results in algebraic decay of the mean training and testing errors. The slopes are $\log e_{\mathrm{tr}} \sim -1.2 \log d$ and $\log e_{\mathrm{te}} \sim -0.3 \log d$. We attribute the existence of the plateau for $e_{\mathrm{tr}}$ at large depths to the small scale of our experiment and the small values of our hyperparameters determining the size of the quantum-enhanced feature space.

F.1 Error Decay

In Fig.F5 we show the decay of the mean training and test errors for the question answering task for corpus K30, simulated classically, which is shown as an inset in Fig.4. Plotting in log-log scale reveals, at least initially, an algebraic decay of the errors with the depth of the word-circuits.

F.2 On the influence of noise on the cost function landscape

Regarding optimisation on a quantum computer, we comment on the effect of noise on the optimal parameters. Consider a successful optimisation of $L(\theta)$ performed on a NISQ device, returning $\theta^*_{\mathrm{NISQ}}$. However, if we instantiate the circuits $C_\sigma(\theta^*_{\mathrm{NISQ}})$ and evaluate them on a classical computer to obtain the predicted labels $l^{\mathrm{CC}}_{\mathrm{pr}}(\theta^*_{\mathrm{NISQ}})$, we observe that these can in general differ from the labels $l^{\mathrm{NISQ}}_{\mathrm{pr}}(\theta^*_{\mathrm{NISQ}})$ predicted by evaluating the circuits on the quantum computer. In the context of a fault-tolerant quantum computer this should not be the case. However, since there is a non-trivial coherent-noise channel that our circuits undergo, it is expected that the optimiser's results are affected in this way.

Appendix G Quantum Compilation

In order to perform quantum compilation we use pytket [60]. It is a Python module for interfacing with CQC's TKET, a toolset for quantum programming. From this toolbox, we need to make use of compilation passes.

At a high level, quantum compilation can be described as follows. Given a circuit and a device, quantum operations are decomposed in terms of the device's native gateset. Furthermore, the quantum circuit is reshaped in order to make it compatible with the device's topology [74]. Specifically, the compilation pass that we use is default_compilation_pass(2). The integer option is set to 2 for maximum optimisation under compilation [75].

Circuits written in pytket can be run on other devices by simply changing the backend being called, regardless of whether the hardware might be fundamentally different in terms of what physical systems are used as qubits. This makes TKET platform agnostic. We stress that on IBMQ machines specifically, the native gates are arbitrary single-qubit unitaries ('U3' gate) and entangling controlled-not gates ('CNOT' or 'CX'). Importantly, CNOT gates show error rates which are one or even two orders of magnitude larger than those of U3 gates. Therefore, we measure the depth of our circuits in terms of the CNOT-depth. Using pytket, this can be obtained by invoking the command depth_by_type(OpType.CX).

For both backends used in this work, ibmq_montreal and ibmq_toronto, the reported quantum volume is 32 and the maximum allowed number of shots is $2^{13}$.
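For concreteness, a minimal pytket sketch follows (assuming the pytket-qiskit extension and stored IBMQ credentials; exact import paths have varied across pytket versions, and the circuit shown is illustrative rather than one of our sentence-circuits):

```python
from pytket import Circuit, OpType
from pytket.extensions.qiskit import IBMQBackend

# Illustrative two-qubit circuit.
circ = Circuit(2)
circ.H(0).CX(0, 1).Rz(0.25, 1).CX(0, 1)

backend = IBMQBackend("ibmq_montreal")           # requires IBMQ credentials
backend.default_compilation_pass(2).apply(circ)  # optimisation level 2

# CNOT-depth: the figure of merit used in the text.
print(circ.depth_by_type(OpType.CX))
```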
Appendix H Swap Test and Hadamard Test

In our binary classification NLP task, the predicted label is the norm squared of the zero-to-zero transition amplitude, where the unitary $U$ represents the word-circuits and the circuits that implement the Bell effects as dictated by the grammatical structure. Estimating $|\langle 0\ldots 0|U|0\ldots 0\rangle|^2$, or the amplitude $\langle 0\ldots 0|U|0\ldots 0\rangle$ itself in case one wants to define a cost function in which it appears, can be achieved with a swap test or a Hadamard test, respectively.
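A minimal statevector sketch of the Hadamard test (our own illustration; finite-shot sampling is emulated with a binomial draw, and the toy unitary is hypothetical):

```python
import numpy as np

def hadamard_test(U, shots=100000, imag=False):
    # Ancilla in |+> controls U; after a final Hadamard on the ancilla,
    # P(ancilla = 0) = (1 + Re<0...0|U|0...0>)/2. Applying S-dagger on the
    # ancilla branch instead yields the imaginary part.
    n = U.shape[0]
    zero = np.zeros(n, dtype=complex)
    zero[0] = 1.0
    branch0 = zero / np.sqrt(2)          # ancilla-|0> branch
    branch1 = (U @ zero) / np.sqrt(2)    # ancilla-|1> branch (controlled U)
    if imag:
        branch1 *= -1j                   # S-dagger on the ancilla
    anc0 = (branch0 + branch1) / np.sqrt(2)  # final Hadamard, |0> component
    p0 = np.linalg.norm(anc0) ** 2
    p0_est = np.random.binomial(shots, p0) / shots  # finite-shot estimate
    return 2 * p0_est - 1

theta = 0.7
U = np.array([[np.exp(1j * theta), 0], [0, 1]])  # toy one-qubit unitary
print(hadamard_test(U), np.cos(theta))           # estimate vs exact Re part
```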
Models are Few-Shot Learners (2020)

[4] Turing, A.M.: I.—Computing machinery and intelligence. Mind LIX(236), 433–460 (1950). https://doi.org/10.1093/mind/LIX.236.433

[5] Searls, D.B.: The language of genes. Nature 420(6912), 211–217 (2002). https://doi.org/10.1038/nature01255

[6] Zeng, Z., Shi, H., Wu, Y., Hong, Z.: Survey of natural language processing techniques in bioinformatics. Computational and Mathematical Methods in Medicine 2015, 674296 (2015). https://doi.org/10.1155/2015/674296

[7] Buhrmester, V., Münch, D., Arens, M.: Analysis of Explainers of Black Box Deep Neural Networks for Computer Vision: A Survey (2019)

[8] Lambek, J.: The mathematics of sentence structure. American Mathematical Monthly, 154–170 (1958)

[9] Montague, R.: Universal grammar. Theoria 36(3), 373–398 (1970). https://doi.org/10.1111/j.1755-2567.1970.tb00434.x

[10] Chomsky, N.: Syntactic Structures. Mouton (1957)

[11] Coecke, B., Sadrzadeh, M., Clark, S.: Mathematical Foundations for a Compositional Distributional Model of Meaning (2010)

[12] Grefenstette, E., Sadrzadeh, M.: Experimental support for a categorical compositional distributional model of meaning. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 1394–1404 (2011). arXiv:1106.4058

[13] Kartsaklis, D., Sadrzadeh, M.: Prior disambiguation of word tensors for constructing sentence vectors. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1590–1601. ACL (2013)

[14] Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642. Association for Computational Linguistics, Seattle, Washington, USA (2013). https://www.aclweb.org/anthology/D13-1170

[15] Sadrzadeh, M., Clark, S., Coecke, B.: The Frobenius anatomy of word meanings I: subject and object relative pronouns. Journal of Logic and Computation 23(6), 1293–1317 (2013). https://doi.org/10.1093/logcom/ext044

[16] Sadrzadeh, M., Clark, S., Coecke, B.: The Frobenius anatomy of word meanings II: possessive relative pronouns. Journal of Logic and Computation 26(2), 785–815 (2014). https://doi.org/10.1093/logcom/exu027

[17] Lewis, M.: Towards logical negation for compositional distributional semantics (2020)

[18] Pestun, V., Vlassopoulos, Y.: Tensor network language model (2017)

[19] Gallego, A.J., Orus, R.: Language Design as Information Renormalization (2019)

[20] Bradley, T.-D., Stoudenmire, E.M., Terilla, J.: Modeling Sequences with Quantum States: A Look Under the Hood (2019)

[21] Efthymiou, S., Hidary, J., Leichenauer, S.: TensorNetwork for Machine Learning (2019)

[22] Eisert, J.: Entanglement and tensor network states (2013)

[23] Orús, R.: Tensor networks for complex quantum systems. Nature Reviews Physics 1(9), 538–550 (2019). https://doi.org/10.1038/s42254-019-0086-7

[24] Arute, F., Arya, K., Babbush, R., Bacon, D., Bardin, J.C., Barends, R., Biswas, R., Boixo, S., Brandao, F.G.S.L., Buell, D.A., Burkett, B., Chen, Y., Chen, Z., Chiaro, B., Collins, R., Courtney, W., Dunsworth, A.,
Farhi, E., Foxen, B., Fowler, A., Gidney, C., Giustina, M., Graff, R., Guerin, K., Habegger, S., Harrigan, M.P., Hartmann, M.J., Ho, A., Hoffmann, M., Huang, T., Humble, T.S., Isakov, S.V., Jeffrey, E., Jiang, Z., Kafri, D., Kechedzhi, K., Kelly, J., Klimov, P.V., Knysh, S., Korotkov, A., Kostritsa, F., Landhuis, D., Lindmark, M., Lucero, E., Lyakh, D., Mandrà, S., McClean, J.R., McEwen, M., Megrant, A., Mi, X., Michielsen, K., Mohseni, M., Mutus, J., Naaman, O., Neeley, M., Neill, C., Niu, M.Y., Ostby, E., Petukhov, A., Platt, J.C., Quintana, C., Rieffel, E.G., Roushan, P., Rubin, N.C., Sank, D., Satzinger, K.J., Smelyanskiy, V., Sung, K.J., Trevithick, M.D., Vainsencher, A., Villalonga, B., White, T., Yao, Z.J., Yeh, P., Zalcman, A., Neven, H., Martinis, J.M.: Quantum supremacy using a programmable superconducting processor. Nature 574(7779), 505–510 (2019). https://doi.org/10.1038/s41586-019-1666-5

[25] Bharti, K., Cervera-Lierta, A., Kyaw, T.H., Haug, T., Alperin-Lea, S., Anand, A., Degroote, M., Heimonen, H., Kottmann, J.S., Menke, T., Mok, W.-K., Sim, S., Kwek, L.-C., Aspuru-Guzik, A.: Noisy intermediate-scale quantum (NISQ) algorithms (2021)

[26] Wootton, J.R.: Procedural generation using quantum computation. In: International Conference on the Foundations of Digital Games (2020). https://doi.org/10.1145/3402942.3409600

[27] Harrow, A.W., Hassidim, A., Lloyd, S.: Quantum algorithm for linear systems of equations. Physical Review Letters 103(15) (2009). https://doi.org/10.1103/physrevlett.103.150502

[28] Beer, K., Bondarenko, D., Farrelly, T., Osborne, T.J., Salzmann, R., Scheiermann, D., Wolf, R.: Training deep quantum neural networks. Nature Communications 11(1) (2020). https://doi.org/10.1038/s41467-020-14454-2

[29] Kerenidis, I., Landman, J., Luongo, A., Prakash, A.: q-means: A quantum algorithm for unsupervised machine learning. In: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32, pp. 4134–4144. Curran Associates, Inc. (2019). https://proceedings.neurips.cc/paper/2019/file/16026d60ff9b54410b3435b403afd226-Paper.pdf

[30] Dunjko, V., Taylor, J.M., Briegel, H.J.: Quantum-enhanced machine learning. Physical Review Letters 117(13) (2016). https://doi.org/10.1103/physrevlett.117.130501

[31] Chia, N.-H., Gilyén, A., Li, T., Lin, H.-H., Tang, E., Wang, C.: Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning. In: Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (2020). https://doi.org/10.1145/3357713.3384314

[32] Havlíček, V., Córcoles, A.D., Temme, K., Harrow, A.W., Kandala, A., Chow, J.M., Gambetta, J.M.: Supervised learning with quantum-enhanced feature spaces. Nature 567(7747), 209–212 (2019). https://doi.org/10.1038/s41586-019-0980-2

[33] Li, Z., Liu, X., Xu, N., Du, J.: Experimental realization of a quantum support vector machine. Physical Review Letters 114(14) (2015). https://doi.org/10.1103/physrevlett.114.140504

[34] Zeng, W., Coecke, B.: Quantum algorithms for compositional natural language processing. Electronic Proceedings in Theoretical Computer Science 221, 67–75 (2016). https://doi.org/10.4204/eptcs.221.8

[35] O'Riordan, L.J., Doyle, M., Baruffa, F., Kannan, V.: A hybrid classical-quantum workflow for natural language processing. Machine Learning: Science and Technology (2020). https://doi.org/10.1088/2632-2153/abbd2e

[36] Wiebe, N., Bocharov, A., Smolensky, P., Troyer, M., Svore, K.M.: Quantum Language Processing (2019)
[37] Bausch, J., Subramanian, S., Piddock, S.: A Quantum Search Decoder for Natural Language Processing (2020)

[38] Chen, J.C.: Quantum computation and natural language processing (2002)

[39] Coecke, B., Kissinger, A.: Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press (2017). https://doi.org/10.1017/9781316219317

[40] Abramsky, S., Coecke, B.: A categorical semantics of quantum protocols. In: Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science, 2004, pp. 415–425 (2004). https://doi.org/10.1109/LICS.2004.1319636

[41] Meichanetzidis, K., Gogioso, S., de Felice, G., Chiappori, N., Toumi, A., Coecke, B.: Quantum Natural Language Processing on Near-Term Quantum Computers (2020)

[42] Schuld, M., Bocharov, A., Svore, K.M., Wiebe, N.: Circuit-centric quantum classifiers. Physical Review A 101(3) (2020). https://doi.org/10.1103/physreva.101.032308

[43] Benedetti, M., Lloyd, E., Sack, S., Fiorentini, M.: Parameterized quantum circuits as machine learning models. Quantum Science and Technology 4(4), 043001 (2019). https://doi.org/10.1088/2058-9565/ab4eb5

[44] Lambek, J.: From word to sentence

[45] Preller, A.: Linear processing with pregroups. Studia Logica: An International Journal for Symbolic Logic 87(2/3), 171–197 (2007)

[46] Baez, J.C., Stay, M.: Physics, Topology, Logic and Computation: A Rosetta Stone (2009)

[47] Selinger, P.: A survey of graphical languages for monoidal categories. Lecture Notes in Physics, 289–355 (2010). https://doi.org/10.1007/978-3-642-12821-9_4

[48] Schuld, M., Killoran, N.: Quantum machine learning in feature Hilbert spaces. Physical Review Letters 122(4) (2019). https://doi.org/10.1103/physrevlett.122.040504

[49] Lloyd, S., Schuld, M., Ijaz, A., Izaac, J., Killoran, N.: Quantum embeddings for machine learning (2020)

[50] de Felice, G., Toumi, A., Coecke, B.: DisCoPy: Monoidal Categories in Python (2020)

[51] Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient Estimation of Word Representations in Vector Space (2013)

[52] Aaronson, S.: Read the fine print. Nature Physics 11(4), 291–293 (2015)

[53] Kartsaklis, D., Fan, I., Yeung, R., Pearson, A., Lorenz, R., Toumi, A., de Felice, G., Meichanetzidis, K., Clark, S., Coecke, B.: lambeq: An Efficient High-Level Python Library for Quantum NLP (2021)

[54] Yeung, R., Kartsaklis, D.: A CCG-Based Version of the DisCoCat Framework (2021)

[55] Spall, J.C.: Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Transactions on Aerospace and Electronic Systems 34(3), 817–823 (1998). https://doi.org/10.1109/7.705889

[56] Bonet-Monroig, X., Wang, H., Vermetten, D., Senjean, B., Moussa, C., Bäck, T., Dunjko, V., O'Brien, T.E.: Performance comparison of optimization methods on variational quantum algorithms (2021)

[57] de Felice, G., Meichanetzidis, K., Toumi, A.: Functorial question answering. Electronic Proceedings in Theoretical Computer Science 323, 84–94 (2020). https://doi.org/10.4204/eptcs.323.6

[58] Chen, Y., Pan, Y., Dong, D.: Quantum Language Model with Entanglement Embedding for Question Answering (2020)

[59] Zhao, Q., Hou, C., Liu, C., Zhang, P., Xu,
R.: A quantum expectation value based language model with application to question answering. Entropy 22(5), 533 (2020)

[60] Sivarajah, S., Dilkes, S., Cowtan, A., Simmons, W., Edgington, A., Duncan, R.: Tket: A retargetable compiler for NISQ devices. Quantum Science and Technology (2020). https://doi.org/10.1088/2058-9565/ab8e92

[61] Mitarai, K., Fujii, K.: Methodology for replacing indirect measurements with direct measurements. Physical Review Research 1(1) (2019). https://doi.org/10.1103/physrevresearch.1.013006

[62] Benedetti, M., Fiorentini, M., Lubasch, M.: Hardware-efficient variational quantum algorithms for time evolution (2020)

[63] Piedeleu, R., Kartsaklis, D., Coecke, B., Sadrzadeh, M.: Open System Categorical Quantum Semantics in Natural Language Processing (2015)

[64] Bankova, D., Coecke, B., Lewis, M., Marsden, D.: Graded Entailment for Compositional Distributional Semantics (2016)

[65] Coecke, B.: The Mathematics of Text Structure (2020)

[66] Coecke, B., de Felice, G., Meichanetzidis, K., Toumi, A.: Foundations for Near-Term Quantum Natural Language Processing (2020)

[67] Buszkowski, W., Moroz, K.: Pregroup Grammars and Context-free Grammars

[69] https://github.com/andim/noisyopt

[71] Olson, B., Hashmi, I., Molloy, K., Shehu, A.: Basin hopping as a general and versatile optimization framework for the characterization of biological macromolecules. Advances in Artificial Intelligence 2012, 1–19 (2012). https://doi.org/10.1155/2012/674832

[72] Gao, F., Han, L.: Implementing the Nelder-Mead simplex algorithm with adaptive parameters. Computational Optimization and Applications 51(1), 259–277 (2010). https://doi.org/10.1007/s10589-010-9329-3

[73] https://pypi.org/project/scipy

[74] Cowtan, A., Dilkes, S., Duncan, R., Krajenbrink, A., Simmons, W., Sivarajah, S.: On the Qubit Routing Problem. In: van Dam, W., Mancinska, L. (eds.) 14th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2019). Leibniz International Proceedings in Informatics (LIPIcs), vol. 135, pp. 5:1–5:32. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2019). https://doi.org/10.4230/LIPIcs.TQC.2019.5. http://drops.dagstuhl.de/opus/volltexte/2019/10397

[75] https://github.com/CQCL/pytket

[76] Aharonov, D., Jones, V., Landau, Z.: A polynomial quantum algorithm for approximating the Jones polynomial. Algorithmica 55(3), 395–421 (2008). https://doi.org/10.1007/s00453-008-9168-0