
Grammar-aware sentence classification on quantum computers


arXiv:2012.03756v2 [quant-ph] 14 Feb 2023

Konstantinos Meichanetzidis, Alexis Toumi, Giovanni de Felice and Bob Coecke


Quantinuum, 17 Beaumont Street, Oxford OX1 2NA, United Kingdom.
Department of Computer Science, University of Oxford, OX1 3QD, United Kingdom.

*Corresponding author(s). E-mail(s): [email protected];

Abstract
Natural language processing (NLP) is at the forefront of great advances in contemporary AI, and it is arguably one of the most challenging areas of the field. At the same time, in the area of Quantum Computing (QC), with the steady growth of quantum hardware and notable improvements towards implementations of quantum algorithms, we are approaching an era when quantum computers perform tasks that cannot be done on classical computers with a reasonable amount of resources. This provides a new range of opportunities for AI, and for NLP specifically. In this work, we work with the Categorical Distributional Compositional (DisCoCat) model of natural language meaning, whose mathematical underpinnings make it amenable to quantum instantiations. Earlier work on fault-tolerant quantum algorithms has already demonstrated potential quantum advantage for NLP, notably employing DisCoCat. In this work, we focus on the capabilities of noisy intermediate-scale quantum (NISQ) hardware and perform the first implementation of an NLP task on a NISQ processor, using the DisCoCat framework. Sentences are instantiated as parameterised quantum circuits; word meanings are embedded in quantum states using parameterised quantum circuits, and the sentence's grammatical structure faithfully manifests as a pattern of entangling operations which compose the word-circuits into a sentence-circuit. The circuits' parameters are trained using a classical optimiser in a supervised NLP task of binary classification. Our novel QNLP model shows concrete promise for scalability as the quality of quantum hardware improves in the near future, and it solidifies a novel branch of experimental research at the intersection of QC and AI.

Keywords: string diagrams, compositionality, natural language processing, quantum computing

1 Introduction

NLP is a rapidly evolving area of AI of both theoretical importance and practical interest [1, 2]. Large language models, such as the relatively recent GPT-3 with its 175 billion parameters [3], show impressive results on general NLP tasks, and one dares to claim that humanity is entering Turing-test territory [4]. Such models, almost always based on neural networks, work by learning to model conditional probability distributions of words in the context of other words. The textual data on which they are trained are mined from large text corpora; the probability distribution to be learned captures the statistical patterns of co-occurrence of words in the data. Due to this, in this work we shall refer to such models as distributional.

NLP technology becomes increasingly entangled with everyday life as part of search engines, personal assistants, information extraction and data-mining algorithms, medical diagnoses, and even bioinformatics [5, 6].
Despite success in both language understanding and language generation, under the hood of mainstream NLP models one exclusively finds deep neural networks, which famously suffer the criticism of being uninterpretable black boxes [7].

One way to bring transparency to said black boxes is to explicitly incorporate linguistic structure, such as grammar and syntax [8–10], into distributional language models. Note that in this work we use the terms grammar and syntax interchangeably; in essence, both refer to structural information with which the textual data can be supplemented. A prominent approach attempting this merge is the Distributional Compositional Categorical model of natural language meaning (DisCoCat) [11–13], which pioneered the paradigm of combining explicit grammatical (or syntactic) structure with distributional (or statistical) methods for encoding and computing meaning (or semantics). There has also been follow-up related work on neural-based models where syntax is incorporated in a recursive neural network, with the syntactic structures dictating the order of the recursive calls to the recurring cell [14]. This approach also provides the tools for modelling linguistic phenomena such as lexical entailment and ambiguity, as well as the transparent construction of syntactic structures like relative and possessive pronouns [15, 16], conjunction, disjunction, and negation [17].

From a modern lens, DisCoCat, as it is presented in the literature, is a tensor network language model. Recently, the motivation for designing interpretable AI systems has caused a surge in the use of tensor networks in language modelling [18–21]. A tensor network is a graph whose vertices are endowed with tensors. Every vertex has an arity, i.e. a number of edges to which it belongs, which represent the tensor's indices. Edges represent tensor contractions, i.e. identification of the indices joined by the edge and summation over their range (Einstein summation). Intuitively, a tensor network is a compressed representation of a multilinear map. Tensor networks have been used to capture probability distributions of complex many-body systems, both classical and quantum, and they also have a formal appeal as rigorous algebraic tools [22, 23].
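As a toy illustration of such a contraction, consider a transitive sentence in a tensor network language model: the subject and object are vectors, the verb is an order-3 tensor, and the sentence meaning is obtained by an Einstein summation over the connecting wires. The dimensions and random entries below are illustrative only, not taken from this work's experiments.

```python
import numpy as np

d_n, d_s = 4, 2                       # illustrative noun- and sentence-space dimensions
subject = np.random.rand(d_n)         # order-1 state-tensor for the subject noun
verb = np.random.rand(d_n, d_s, d_n)  # order-3 state-tensor for a transitive verb
obj = np.random.rand(d_n)             # order-1 state-tensor for the object noun

# Contract the noun wires on either side of the verb; the open sentence
# wire (index 's') carries the meaning of the whole sentence.
sentence_meaning = np.einsum('i,isj,j->s', subject, verb, obj)
```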
Quantum computing (QC) is a field which, in parallel with NLP, is growing at an extremely fast pace. The prominence of QC is now well-established, especially after the experiments aiming to demonstrate quantum advantage for the specific task of sampling from random quantum circuits [24]. QC has the potential to reach the whole range of human interests, from foundations of physics and computer science, to applications in engineering, finance, chemistry, and optimisation problems [25], and even procedural map generation [26].

In the last half-decade, the natural conceptual fusion of QC with AI, and especially the subfield of AI known as machine learning (ML), has led to a plethora of novel and exciting advancements. The quantum machine learning (QML) literature has reached an immense size considering its young age, with the cross-fertilisation of ideas and methods between fields of research, as well as between academia and industry, being a dominant driving force. The landscape includes using quantum computers for subroutines in ML algorithms executing linear algebra operations [27], quantising classical machine learning algorithms based on neural networks [28], support vector machines, clustering [29], or artificial agents who learn from interacting with their environment [30], and even quantum-inspired and dequantised classical algorithms which nevertheless retain a complexity-theoretic advantage [31]. Small-scale classification experiments have also been implemented with quantum technology [32, 33].

From this collection of ingredients there organically emerges the interdisciplinary field of Quantum Natural Language Processing (QNLP), a research area still in its infancy [34–38], which combines NLP and QC and seeks novel quantum language model designs and quantum algorithms for NLP tasks. Building on the recently established methodology of QML, one imports QC algorithms to obtain theoretical speedups for specific NLP tasks, or uses the quantum Hilbert space as a feature space in which NLP tasks are to be executed.

The first paper on QNLP using the DisCoCat framework, by Zeng and Coecke [34], introduced an approach where a standard NLP task is instantiated as a quantum computation. The task of sentence similarity was reduced to the closest-vector problem, for which there exists a quantum algorithm providing a quadratic speedup, albeit assuming a Quantum Random Access Memory (QRAM).
The mapping of the NLP task to a quantum computation is attributed to the mathematical similarity of the structures underlying DisCoCat and quantum theory. This similarity becomes apparent when both are expressed in the graphical language of string diagrams of monoidal categories, or process theories [39]. The categorical formulation of quantum theory is known as Categorical Quantum Mechanics (CQM) [40], and the string diagrams describing CQM are tensor networks endowed with a graphical language in the form of a rewrite-system (a.k.a. a diagrammatic algebra with string diagrams). The language of string diagrams places syntactic structures and quantum processes on equal footing, and thus allows the canonical instantiation of grammar-aware quantum models for NLP.

In this work, we bring DisCoCat to the current age of noisy intermediate-scale quantum (NISQ) devices by performing the first-ever proof-of-concept QNLP experiment on actual quantum processors. We employ the framework introduced in Ref. [41], adopting the paradigm of parameterised quantum circuits (PQCs) as quantum machine learning models [42, 43], which currently dominates near-term algorithms. PQCs can be used to parameterise quantum states and processes, as well as complex probability distributions, and so they can be used in NISQ machine learning pipelines. The framework we use allows for the execution of experiments involving non-trivial text corpora, which moreover involve complex grammatical structures. The specific task we showcase here is binary classification of sentences in a supervised-learning hybrid classical-quantum QML setup.

Fig. 1 Diagram for "Romeo who loves Juliet dies". The grammatical reduction is generated by the nested pattern of non-crossing cups, which connect words through wires of types n or s. Grammaticality is verified by only one s-wire being left open. The diagram represents the meaning of the whole sentence from a process-theoretic point of view. The relative pronoun 'who' is modelled by the Kronecker tensor. Interpreting the diagram in CQM, it represents a quantum state.

2 The model

The DisCoCat model relies on algebraic models of grammar which use types and reduction-rules to mathematically formalise the syntactic structures of sentences. Historically, the first formulation of DisCoCat employed pregroup grammars (Appendix A), which were developed by Lambek [44]. However, any other typelogical grammar, such as Combinatory Categorial Grammar (CCG), can be used to instantiate DisCoCat models.

In this work we will work with pregroup grammars, to stay close to the existing DisCoCat literature. In a pregroup grammar, a sentence is a finite product of words, $\sigma = \prod_i w_i$. A parser tags each word $w \in \sigma$ with its part of speech. Accordingly, $w$ is assigned a pregroup type $t_w = \prod_i b_i^{\kappa_i}$, comprising a product of basic (or atomic) types $b_i$ from the finite set $B$. Each type carries an adjoint order $\kappa_i \in \mathbb{Z}$. Pregroup parsing is efficient; specifically, it is linear time under reasonable assumptions [45]. The type of a sentence is the product of the types of its words, and the sentence is deemed grammatical iff its type reduces to the special sentence type $s^0 \in B$, i.e. $t_\sigma = \prod_w t_w \to s^0$. Reductions are performed by iteratively applying pairwise annihilations of basic types with adjoint orders of the form $b^i b^{i+1}$. As an example, consider the grammatical reduction:

$t_{\text{Romeo who loves Juliet dies}} = t_{\text{Romeo}}\, t_{\text{who}}\, t_{\text{loves}}\, t_{\text{Juliet}}\, t_{\text{dies}} = (n^0)(n^1 n^0 s^{-1} n^0)(n^1 s^0 n^{-1})(n^0)(n^1 s^0) = n^0 n^1 n^0 s^{-1} n^0 n^1 s^0 n^{-1} n^0 n^1 s^0 \to n^0 s^{-1} s^0 n^1 s^0 \to n^0 n^1 s^0 \to s^0.$
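The contraction-based reduction is mechanical enough to sketch in a few lines of Python. The following minimal sketch (our illustration, not part of the paper's pipeline) encodes a type as a list of (basic type, adjoint order) pairs and greedily applies contractions, which suffices for the example above:

```python
from typing import List, Tuple

PregroupType = List[Tuple[str, int]]  # basic type with adjoint order, e.g. ('n', -1)

def reduce_types(types: PregroupType) -> PregroupType:
    """Greedily apply pairwise contractions b^k b^(k+1) -> unit until none apply."""
    types = list(types)
    changed = True
    while changed:
        changed = False
        for i in range(len(types) - 1):
            (b1, k1), (b2, k2) = types[i], types[i + 1]
            if b1 == b2 and k2 == k1 + 1:
                del types[i:i + 2]  # the adjacent pair annihilates to the unit type
                changed = True
                break
    return types

# Word types for "Romeo who loves Juliet dies", as in the example above:
sentence_type = (
    [('n', 0)]                                   # Romeo
    + [('n', 1), ('n', 0), ('s', -1), ('n', 0)]  # who
    + [('n', 1), ('s', 0), ('n', -1)]            # loves
    + [('n', 0)]                                 # Juliet
    + [('n', 1), ('s', 0)]                       # dies
)
assert reduce_types(sentence_type) == [('s', 0)]  # grammatical: reduces to s^0
```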
At the core of DisCoCat is a process-theoretic model of natural language meaning. Process theories are alternatively known as symmetric monoidal (or tensor) categories [46]. Process networks such as those that manifest in DisCoCat can be represented graphically with string diagrams [47]. String diagrams are not just convenient graphical notation; they constitute a formal graphical language for reasoning about complex process networks (see Appendix B for an introduction to string diagrams and the concepts relevant to this work). String diagrams are generated by boxes with input and output wires, with each wire carrying a type. Boxes can be composed to form process networks by wiring outputs to inputs, making sure the types are respected.
Fig. 2 Example instance of the mapping from sentence diagrams to PQCs, where q_n = 1 and q_s = 0. (a) The dashed square is the empty diagram. In this example, (b) unary word-states are prepared by parameterised Rx rotations followed by Rz rotations, and (c) k-ary word-states are prepared by parameterised word-circuits of width k and depth d = 2. (d) The cup is mapped to a Bell effect, i.e. a CNOT followed by a Hadamard on the control and postselection on ⟨00|. (e) The Kronecker tensor modelling the relative pronoun is mapped to a GHZ state.

Fig. 3 The PQC to which "Romeo who loves Juliet dies" of Fig. 1 is mapped, with the choices of hyperparameters of Fig. 2. As q_s = 0, the circuit represents a scalar.

Output-only processes are called states and input-only processes are called effects.

A grammatical reduction is viewed as a process, and so it can be represented as a string diagram. In string diagrams representing pregroup-grammar reductions, words are represented as states, and pairwise type-reductions are represented by a pattern of nested cup-effects (wires bent in a U-shape) and identities (straight wires). Wires in the string diagram carry the label of the basic type being reduced. As an example, in Fig. 1 we show the string diagram representing the pregroup reductions for "Romeo who loves Juliet dies". Only the s-wire is left open, which is the witness of grammaticality.

Given a string diagram resulting from the grammatical reduction of a sentence, we can instantiate a model for natural language processing by giving semantics to the string diagram. This two-step process of constructing a model, where syntax and semantics are treated separately, is the origin of the framework's name: "Compositional" refers to the string diagram describing structure, and "Distributional" refers to the semantic spaces where meaning is encoded and processed. Any choice of semantics that respects the compositional structure is allowed, and it is implemented by component-wise substitution of the boxes and wires in the diagrams. Such a structure-preserving mapping constitutes a functor, and it ensures that the model instantiation is 'canonical'. Valid choices of semantics range from neural networks, to tensor networks (which was the default choice for the majority of the DisCoCat literature), to quantum processes, and even hybrid combinations involving components from the whole range of available choices, by interpreting the string diagram correspondingly.

In this work, we focus on the latter choice of quantum processes and, in particular, we will be realising quantum processes using pure quantum circuits. As we have described in Ref. [41], the string diagram of the syntactic structure of a sentence σ can be canonically mapped to a PQC $C_\sigma(\theta_\sigma)$ over the parameter set $\theta_\sigma$. The key idea here is that such circuits inherit their architecture, in terms of a particular connectivity of entangling gates, from the grammatical reduction of the sentence.

Quantum circuits, also being part of pure quantum theory, enjoy a graphical language in terms of string diagrams. The mapping from sentence diagram to quantum circuit begins simply by reinterpreting a sentence diagram, such as that of Fig. 1, as a diagram in categorical quantum mechanics (CQM). The word-state of word w in a sentence diagram is mapped to a pure quantum state prepared from a trivial reference product-state by a PQC, as $|w(\theta_w)\rangle = C_w(\theta_w)|0\rangle^{\otimes q_w}$, where $q_w = \sum_{b \in t_w} q_b$. The width of the circuit depends on the number of qubits assigned to each pregroup type $b \in B$ from which the word-types are composed; cups are mapped to Bell effects.

Given a sentence σ, we instantiate its quantum circuit by first concatenating in parallel the word-circuits of each word as they appear in the sentence, corresponding to performing a tensor product, $C_\sigma(\theta_\sigma) = \bigotimes_w C_w(\theta_w)$, which prepares the state $|\sigma(\theta_\sigma)\rangle$ from the all-zeros basis state. As such, a sentence is parameterised by the concatenation of the parameters of its words, $\theta_\sigma = \cup_{w \in \sigma} \theta_w$. The parameters $\theta_w$ determine the word-embedding $|w(\theta_w)\rangle$. In other words, we use the Hilbert space as a feature space [32, 48, 49] in which the word-embeddings are defined. Finally, we apply Bell effects as dictated by the cup pattern in the grammatical reduction, a function whose result we shall denote $g_\sigma(|\sigma(\theta_\sigma)\rangle)$. Note that in general this procedure prepares an unnormalised quantum state. In the special case where no qubits are assigned to the sentence type, i.e. q_s = 0, it is an amplitude, which we write as $\langle g_\sigma|\sigma(\theta_\sigma)\rangle$. Formally, this mapping constitutes a parameterised functor from the pregroup grammar category to the category of quantum circuits. The parameterisation is defined via a function from the set of parameters to functors from the aforementioned source category to the target category.

Our model has hyperparameters (Appendix E). The wires of the DisCoCat diagrams we consider carry types n or s. The numbers of qubits that we assign to each pregroup type are q_n and q_s. These determine the arity of each word, i.e. the width of the quantum circuit that prepares each word-state. We set q_s = 0 throughout this work, which establishes that the sentence-circuits represent scalars. For a unary word w, i.e. a word-state on one qubit, we choose to prepare the state using two rotations, as $R_z(\theta_w^2) R_x(\theta_w^1)|0\rangle$. For a word w of arity k ≥ 2, we use a depth-d IQP-style parameterisation [32] consisting of d layers, where each layer consists of a layer of Hadamard gates followed by a layer of controlled-Z rotations $CR_z(\theta_w^i)$, such that $i \in \{1, 2, \dots, d(k-1)\}$. Such circuits are in part motivated by the conjecture that circuits involving them are classically hard to evaluate [32]. The relative pronoun "who" is mapped to the GHZ circuit, i.e. the circuit that prepares a GHZ state on the number of qubits determined by q_n and q_s. This is justified by prior work where relative pronouns and other functional words are modelled by a Kronecker tensor (a.k.a. 'spider'), whose entries are all zeros except when all indices are equal, in which case the entries are ones [15, 16]. It is also known as a 'copy' tensor, as it copies the computational basis.

In Fig. 2 we show an example of choices of word-circuits for specific numbers of qubits assigned to each basic pregroup type (Appendix E, Fig. E3). In Fig. 3 we show the corresponding circuit for "Romeo who loves Juliet dies". In practice, we perform the mapping of sentence diagrams to quantum circuits using the Python library DisCoPy [50], which provides a data structure for monoidal string-diagrams and enables the instantiation of functors, including functors based on PQCs.
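For illustration, a sentence diagram of the kind just described can be constructed in a few lines with DisCoPy. The snippet below assumes the pregroup interface of the DisCoPy releases contemporary with this work (the API has since evolved) and uses a plain transitive sentence rather than the full relative-pronoun example:

```python
# Assumes the pregroup interface of DisCoPy ~0.3; the API has changed
# in later releases of the library.
from discopy import Ty, Word, Cup, Id

n, s = Ty('n'), Ty('s')

romeo = Word('Romeo', n)
loves = Word('loves', n.r @ s @ n.l)
juliet = Word('Juliet', n)

# Words in parallel, then cups for the pairwise type contractions;
# only the s-wire is left open, witnessing grammaticality.
diagram = romeo @ loves @ juliet >> Cup(n, n.r) @ Id(s) @ Cup(n.l, n)
```

Interpreting such a diagram over quantum circuits is then exactly the functorial mapping described above.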
Here, a motivating remark is in order. In "classical" implementations of DisCoCat, where the semantics chosen to realise a model is in terms of tensor networks, a sentence diagram represents a vector which results from a tensor contraction. In this case, meanings of words are encoded in the state-tensors in terms of co-occurrence frequencies or other vector-space word-embeddings [51]. In general, tensor contractions are exponentially expensive to compute: the cost scales exponentially with the order of the largest tensors present in the tensor network, with the base of the scaling being the dimension of the vector spaces carried by the wires. However, tensor networks resulting from interpreting syntactic string diagrams over vector spaces and linear maps do not have a generic topology; rather, they are tree-like. This means that tensor networks whose connectivity is given by pregroup reductions can be contracted efficiently as a function of the dimension of the wires carrying the vector spaces playing the role of semantic spaces. Even in this case, however, the dimension of the wires for NLP-relevant applications can become prohibitively large (order of hundreds) in practice. In a fault-tolerant quantum computing setting, ideally, as is proposed in Ref. [34], one has access to a QRAM and would be able to efficiently encode such tensor entries as quantum amplitudes, using only ⌈log₂ d⌉ qubits to encode a d-dimensional vector. However, building a QRAM currently remains challenging [52]. In the NISQ case, we still attempt to take advantage of the tensor-product structure defined by a collection of qubits, which provides an exponentially large Hilbert space as a function of the number of qubits and can be used as a feature space in which the word-embeddings can be trained.
Consequently, we adopt the paradigm of QML in terms of PQCs to carry out near-term QNLP tasks. Any possible quantum advantage is to be identified heuristically on a case-by-case basis, depending on the task, the data, and the available quantum computing resources.

3 Classification task

Now that we have established our construction of sentence circuits, we describe a simple QNLP task. The dataset, or 'labelled corpus', $K = \{(D_\sigma, l_\sigma)\}_\sigma$, is a finite set of sentence-diagrams $\{D_\sigma\}_\sigma$ constructed from a finite vocabulary of words V. Each sentence has a binary label $l_\sigma \in \{0, 1\}$. In this work, the labels represent the sentences' truth values: 0 for False and 1 for True. Our setup trivially generalises to multi-class classification by assigning $q_s = \log_2(\#\text{classes})$ to the s-type wire and measuring in the Z-basis.

We split K into the training set ∆, containing the first $\lfloor p\, |\{D_\sigma\}_\sigma| \rceil$ of the sentences, where $p \in (0, 1)$, and the test set E, containing the rest. We define the predicted label as

$l^{\mathrm{pr}}_\sigma(\theta_\sigma) = |\langle g_\sigma | \sigma(\theta_\sigma) \rangle|^2 \in [0, 1]$    (1)

from which we obtain the binary label by rounding to the nearest integer, $\lfloor l^{\mathrm{pr}}_\sigma \rceil \in \{0, 1\}$. The parameters of the words need to be optimised (or trained) so that the predicted labels match the labels in the training set. The optimiser we invoke is SPSA [55], a gradient-free optimiser which has shown adequate performance in noisy settings [56] (Appendix F). The cost function we define is

$L(\theta) = \sum_{\sigma \in \Delta} \left( l^{\mathrm{pr}}_\sigma(\theta_\sigma) - l_\sigma \right)^2.$    (2)

Minimising the cost function returns the optimal parameters $\theta^* = \mathrm{argmin}_\theta L(\theta)$, from which the model predicts the labels $l^{\mathrm{pr}}_\sigma(\theta^*)$. Essentially, this constitutes learning a functor from the grammar category to the category of quantum circuits. We then quantify the performance by the training and test errors $e_\Delta$ and $e_E$, defined as the proportion of labels predicted incorrectly:

$e_A = \frac{1}{|A|} \sum_{\sigma \in A} \left| \lfloor l^{\mathrm{pr}}_\sigma(\theta^*) \rceil - l_\sigma \right|, \quad A = \Delta, E.$
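Assuming the amplitudes $\langle g_\sigma|\sigma(\theta_\sigma)\rangle$ have already been evaluated (in simulation or estimated on hardware), Eqs. (1)-(2) and the error e_A amount to a few lines of numpy; this is a minimal sketch of the bookkeeping, not the optimisation itself:

```python
import numpy as np

def predicted_labels(amplitudes: np.ndarray) -> np.ndarray:
    """Eq. (1): l_pr = |<g_sigma|sigma(theta_sigma)>|^2."""
    return np.abs(amplitudes) ** 2

def cost(amplitudes: np.ndarray, labels: np.ndarray) -> float:
    """Eq. (2): sum of squared differences over the training set."""
    return float(np.sum((predicted_labels(amplitudes) - labels) ** 2))

def error(amplitudes: np.ndarray, labels: np.ndarray) -> float:
    """e_A: proportion of rounded predictions that disagree with the labels."""
    return float(np.mean(np.rint(predicted_labels(amplitudes)) != labels))
```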
This supervised learning task of binary classification for sentences is a special case of question answering (QA) [57–59]: questions are posed as statements, and the truth labels are the binary answers. After training on ∆, the model predicts the answer to a previously unseen question from E, which comprises sentences containing only words that have appeared in ∆. The optimisation is performed over the parameters of all the sentences in the training set, $\theta = \cup_{\sigma \in \Delta} \theta_\sigma$. In our experiments, each word appears at least once in the training set, and so $\theta = \cup_{w \in V} \theta_w$. Note that what is being learned are the inputs, i.e. the quantum word embeddings, to an entangling process corresponding to the grammar. Recall that a given sentence-circuit does not necessarily involve the parameters of every word. However, every word appears in at least one sentence, which introduces classical correlations between the sentences. This makes such a learning task possible.

In this work, we use artificial data in the form of a very small-scale corpus of grammatical sentences. We randomly generate sentences using a simple context-free grammar (CFG) whose production rules we define. Each sentence is then accompanied by its syntax tree, by definition of our generation procedure. The syntax tree can be cast in string-diagram form, and each CFG-generated sentence-diagram can then be transformed into a DisCoCat diagram (see Appendix C for details). Even though the data is synthetic, we curate the data by assigning labels by hand, so that the truth values among the sentences are consistent with a story, rendering the classification task semantically non-trivial.

Were one to use real-world datasets of labelled sentences, one could use a parser to obtain the syntactic structures of the sentences; the rest of our pipeline would remain the same. Since the writing of this manuscript, the open-source Python package lambeq [53] has been made available, which couples to the end-to-end parser Bobcat. The parser, given a sentence, returns its syntax tree as a grammatical reduction in the Combinatory Categorial Grammar (CCG).
The package also has the capability to translate CCG reductions to pregroup reductions [54], and so it is effectively the first practical tool introduced for large-scale parsing in pregroup grammar. The string diagrams in lambeq are encoded using DisCoPy, which enables the instantiation of large-scale DisCoCat models on real-world data.

3.1 Classical Simulation

We first show results from classical simulations of the QA task. The sentence circuits are evaluated exactly on a classical computer to compute the predicted labels in Eq. 1. We consider the corpus K30 of 30 sentences sampled from a vocabulary of 7 words (Appendix D), and we set p = 0.5. In Fig. 4 we show the convergence of the cost function, for q_n = 1 and q_n = 2, for increasing word-circuit depth d. The insets clearly show the decrease in training and test errors as a function of d when invoking the global optimiser basinhopping (Appendix F).

Fig. 4 Convergence of the mean cost function ⟨L(θ)⟩ vs number of SPSA iterations for corpus K30. A lower minimum is reached for larger d. (Top) q_n = 1 and |θ| = 8 + 2d. Results are averaged over 20 realisations. (Bottom) q_n = 2 and |θ| = 10d. Results are averaged over 5 realisations. (Insets) Mean training and test errors ⟨e_tr⟩, ⟨e_te⟩ vs d. Using the global optimisation basinhopping with local optimisation Nelder-Mead (red), the errors decrease with d.

3.2 Experiments on IBMQ

We now turn to readily available NISQ devices provided by IBMQ in order to estimate the predicted labels in Eq. 1.

Before each circuit can be run on a backend, in this case a superconducting quantum processor, it first needs to be compiled. A quantum compiler takes as input a circuit and a backend, and outputs an equivalent circuit which is compatible with the backend's topology. A quantum compiler also aims to minimise the most noisy operations. For IBMQ, the gate most prone to errors is the entangling CNOT gate. The compiler we use in this work is TKET [60], and for each circuit-run on a backend we use the maximum allowed number of shots (Appendix G).

We consider the corpus K16, built from 6 words (Appendix D), and set p = 0.5. For every evaluation of the cost function under optimisation, the circuits were run on the IBMQ quantum computers ibmq_montreal and ibmq_toronto.

Fig. 5 Convergence of the cost L(θ) evaluated on quantum computers vs SPSA iterations for corpus K16. For q_n = 1, d = 2, for which |θ| = 10, on ibmq_montreal (blue) we obtain e_tr = 0.125 and e_te = 0.5. For q_n = 1, d = 3, where |θ| = 13, on ibmq_toronto (green) we get e_tr = 0.125 and a lower testing error, e_te = 0.375. On ibmq_montreal (red), for d = 3 we get training and testing errors, e_tr = 0 and e_te = 0.375, both lower than for d = 2. In all cases, the CNOT-depth of any sentence-circuit after TKET-compilation is at most 3. Classical simulations (dashed), averaged over 20 realisations, agree with the behaviour on IBMQ for both cases d = 2 (yellow) and d = 3 (purple).
In Fig. 5 we show the convergence of the cost function under SPSA optimisation and report the training and testing errors for different choices of hyperparameters. This constitutes the first non-trivial QNLP experiment on a programmable quantum processor. According to Fig. 4, scaling up the word-circuits results in an improvement in training and testing errors, and, remarkably, we observe this on the quantum computer as well. This is important for the scalability of our experiment when future hardware allows for greater circuit sizes, and thus richer quantum-enhanced feature spaces and grammatically more complex sentences.

4 Discussion and Outlook

We have performed the first-ever quantum natural language processing experiment, by means of classification of sentences annotated with binary labels, a special case of QA, on actual quantum hardware. We used a compositional-distributional model of meaning, DisCoCat, constructed by a structure-preserving mapping from grammatical reductions of sentences to PQCs. This proof-of-concept work serves as a demonstration that QNLP is possible on currently available quantum devices, and that it is a line of research worth exploring.

A remark on postselection is in order. QML-based QNLP tasks, such as the one implemented in this work, rely on the optimisation of a scalar cost function. In general, estimating a scalar encoded in an amplitude on a quantum computer requires either postselection or coherent control over arbitrary circuits, so that a swap test or a Hadamard test can be performed (Appendix H). Notably, in special cases of interest to QML, the Hadamard test can be adapted to NISQ technologies [61, 62]. In its general form, however, the depth-cost resulting after compilation of controlled circuits becomes prohibitive on current quantum devices. Still, given the rapid improvement in quantum computing hardware, we envision that such operations will be within reach in the near term.

Future work includes experimentation with other grammars, such as CCG, which returns tree-like diagrams, and using them to construct PQC-based functors, as is done in Ref. [14] but with neural networks. This, for example, would enable the design of PQC-based functors that do not require postselection, unlike the pregroup-based models, where, in order for each Bell effect to take place, one needs to postselect on measurements involving the qubits on which one wishes to realise a Bell effect.

We also look toward more complex QNLP tasks, such as sentence similarity, and toward working with real-world large-scale data using a pregroup parser, as made possible with lambeq [53]. In that context, regularisation techniques during training will become important; this is an increasingly relevant topic for QML that in general deserves more attention [43].

In addition, our DisCoCat-based QNLP framework is naturally generalisable to accommodate mapping sentences to quantum circuits involving mixed states and quantum channels. This is useful, as mixed states allow for modelling lexical entailment and ambiguity [63, 64]. As also stated above, it is possible to define functors in terms of hybrid models where both neural networks and PQCs are involved, where heuristically one aims to quantify the possible advantage of such models compared to strictly classical ones.

Furthermore, note that the word-embeddings are learned in-task in this work. However, training PQCs to prepare quantum states that serve as word embeddings can also be achieved by using the usual NLP objectives [51]. It would be interesting to verify that such pretrained word embeddings can be useful in downstream tasks, such as the simple classification task presented in this work.

Finally, looking beyond the DisCoCat model, it is well-motivated to adopt the recently introduced DisCoCirc model [65] of meaning and its mapping to PQCs [66], which allows for QNLP experiments on text-scale real-world data in a fully-compositional framework. In this model, nouns are treated as first-class-citizen 'entities' of a text, and sentence composition is made explicit. Entities go through gates which act as modifiers on them, modelling for example the application of adjectives or verbs. The model also considers higher-order modifiers, such as adverbs modifying verbs. This interaction structure, viewed as a process network, can again be used to instantiate models in terms of neural networks, tensor networks, or quantum circuits. In the latter case, entities are modelled as density matrices carried by wires, and their modifiers as quantum channels.
Acknowledgments

KM thanks Vojtech Havlicek and Christopher Self for discussions on QML, Robin Lorenz and Marcello Benedetti for comments on the manuscript, and the TKET team at Quantinuum for technical advice on quantum compilation on IBMQ machines. KM is grateful to the Royal Commission for the Exhibition of 1851 for financial support under a Postdoctoral Research Fellowship. AT thanks Simon Harrison for financial support through the Wolfson Harrison UK Research Council Quantum Foundation Scholarship. We acknowledge the use of IBM Quantum services for this work. The views expressed are those of the authors, and do not reflect the official policy or position of IBM or the IBM Quantum team.

Author contributions: All authors contributed to the theory, the design of the model, and the high-level definition of the classification task. AT and GDF wrote the DisCoPy library [50] and tested early versions of the experiment on ibmq_singapore. KM generated and annotated the data, implemented the simulations and experiments, and wrote the manuscript. BC led the project.

Author information: The authors declare no competing financial interests.

Data availability: The toy datasets of generated sentences used in this work can be found in Appendix D. Further information about the datasets generated and analysed during the current study is available from the corresponding author on reasonable request.
Appendix

In this supplementary material we begin by briefly reviewing pregroup grammar. We then provide the necessary background on the graphical language of process theories and describe our procedure for generating random sentence diagrams using a context-free grammar. For completeness, we include the three labelled corpora of sentences we used in this work. Furthermore, we show details of our mapping from sentence diagrams to quantum circuits. Finally, we give details on the optimisation methods we used for our supervised quantum machine learning task, and on the specific compilation pass we used from CQC's compiler, TKET.

Appendix A Pregroup Grammar

Pregroup grammars were introduced by Lambek as an algebraic model for grammar [44]. A pregroup grammar G is freely generated by the basic types in a finite set B. Basic types $b \in B$ are decorated by an integer $k \in \mathbb{Z}$, which signifies their adjoint order. Negative integers $-k$, with $k \in \mathbb{N}$, are called left adjoints of order k, and positive integers $k \in \mathbb{N}$ are called right adjoints. We shall refer to a basic type at some adjoint order (including the zeroth order) simply as a 'type'. The zeroth order k = 0 signifies no adjoint action on the basic type, and so we often omit it in notation, $b^0 = b$.

The pregroup algebra is such that the two kinds of adjoint (left and right) act as left and right inverses under multiplication of basic types:

$b^k b^{k+1} \to \epsilon \to b^{k+1} b^k,$

where $\epsilon \in B$ is the trivial or unit type. The left-hand side of this reduction is called a contraction and the right-hand side an expansion. Pregroup grammar also accommodates induced steps $a \to b$ for $a, b \in B$. The symbol '→' is to be read as 'type-reduction', and the pregroup grammar sets the rules for which reductions are valid.

Now, to go from word to sentence, we consider a finite set of words called the vocabulary V. We call the dictionary (or lexicon) the finite set of entries $D \subseteq V \times (B \times \mathbb{Z})^*$. The star symbol $A^*$ denotes the set of finite strings that can be generated by the elements of the set A. Each dictionary entry assigns a product (or string) of types to a word, $t_w = \prod_i b_i^{k_i}$, $k_i \in \mathbb{Z}$.

Finally, a pregroup grammar G generates a language $L_G \subseteq V^*$ as follows. A sentence is a sequence (or list) of words $\sigma \in V^*$. The type of a sentence is the product of the types of its words, $t_\sigma = \prod_i t_{w_i}$, where $w_i \in V$ and $i \leq |\sigma|$. A sentence is grammatical, i.e. it belongs to the language generated by the grammar, $\sigma \in L_G$, if and only if there exists a sequence of reductions so that the type of the sentence reduces to the special sentence-type $s \in B$, as $t_\sigma \to \dots \to s$. Note that it is in fact possible to type-reduce grammatical sentences using only contractions.

Appendix B String Diagrams

String diagrams describing process theories are generated by states, effects, and processes. In Fig. B1 we comprehensively show these generators, along with the constraining equations on them. String diagrams for process theories formally describe process networks where only connectivity matters, i.e. which outputs are connected to which inputs. In other words, the length of the wires carries no meaning, and the wires are freely deformable as long as the topology of the network is respected. It is beyond the purposes of this work to provide a comprehensive exposition of diagrammatic languages; we provide the necessary elements which are used for the implementation of our QNLP experiments.

Fig. B1 Diagrams are read from top to bottom. States have only outputs, effects have only inputs, and processes (boxes) have both input and output wires. All wires carry types. Placing boxes side by side is allowed by the monoidal structure and signifies parallel processes. Sequential process composition is represented by composing outputs of a box with inputs of another box. A process transforms a state into a new state. There are special kinds of states called caps and effects called cups, which satisfy the snake equation relating them to the identity wire (trivial process). Process networks freely generated by these generators need not be planar, and so there exists a special process that swaps wires and acts trivially on caps and cups.
Appendix C Random Sentence Generation with CFG

A context-free grammar generates a language from a set of production (or rewrite) rules applied on symbols. Symbols belong to a finite set Σ, and there is a special type S ∈ Σ called initial. Production rules belong to a finite set R and are of the form $T \to \prod_i T_i$, where $T, T_i \in \Sigma$. The application of a production rule results in substituting a symbol with a product (or string) of symbols. Randomly generating a sentence amounts to starting from S and randomly applying production rules uniformly sampled from the set R. The production ends when all types produced are terminal types, which are none other than words in the finite vocabulary V.

From a process-theory point of view, we represent symbols as types carried by wires. Production rules are represented as boxes with input and output wires labelled by the appropriate types. The process network (or string diagram) describing the production of a sentence ends with a production rule whose output is the S-type. Then we randomly pick boxes and compose them backwards, always respecting type-matching when inputs of production rules are fed into outputs of other production rules. The generation terminates when production rules are applied which have no inputs (i.e. they are states); these correspond to the words in the finite vocabulary.

In Fig. C2 (on the left-hand side of the arrows) we show the string-diagram generators we use to randomly produce sentences from a vocabulary of words composed of nouns, transitive verbs, intransitive verbs, and relative pronouns. The corresponding types of these parts of speech are N, TV, IV, RPRON. The vocabulary is the union of the words of each type, $V = V_N \cup V_{TV} \cup V_{IV} \cup V_{RPRON}$.

Fig. C2 CFG generation rules used to produce the corpora K30, K6, K16 used in this work, represented as string-diagram generators, where $w_N \in V_N$, $w_{TV} \in V_{TV}$, $w_{IV} \in V_{IV}$, $w_{RPRON} \in V_{RPRON}$. They are mapped to pregroup reductions by mapping CFG symbols to pregroup types, so that CFG-states are mapped to DisCoCat word-states and production boxes are mapped to products of cups and identities. Note that the pregroup unit ε is the empty wire, and so it is never drawn. Pregroup type contractions correspond to cups and expansions to caps. Since grammatical reductions are achievable only with contractions, only cups are required for the construction of sentence diagrams.

Having randomly generated a sentence from the CFG, its string diagram can be translated into a pregroup sentence diagram. To do so, we use the translation rules shown in Fig. C2. Note that a cup labelled by the basic type b is used to represent a contraction $b^k b^{k+1} \to \epsilon$. Pregroup grammars are weakly equivalent to context-free grammars, in the sense that they generate the same language [67, 68].
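The following minimal sketch illustrates this sampling procedure in Python. The production rules below mimic those of Fig. C2 but are our own illustrative reconstruction rather than the paper's exact rule set:

```python
import random

# Terminals (here, the vocabulary of corpus K6) and non-terminal rules.
words = {'N': ['Romeo', 'Juliet'], 'TV': ['loves'], 'IV': ['dies'], 'RPRON': ['who']}
rules = {
    'S': [['NP', 'IV'], ['NP', 'TV', 'NP']],
    'NP': [['N'], ['N', 'RPRON', 'IV'], ['N', 'RPRON', 'TV', 'NP']],
}

def generate(symbol='S'):
    """Expand a symbol by uniformly sampling production rules until only
    terminal types (words) remain; recursion terminates with probability 1."""
    if symbol in words:
        return [random.choice(words[symbol])]
    production = random.choice(rules[symbol])
    return [w for sym in production for w in generate(sym)]

print(' '.join(generate()))  # e.g. "Romeo who loves Juliet dies"
```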
Appendix D Corpora

Here we present the sentences and their labels used in the experiments presented in the main text.

The types assigned to the words are as follows. Nouns are typed $t_{w \in V_N} = n^0$, transitive verbs are typed $t_{w \in V_{TV}} = n^1 s^0 n^{-1}$, intransitive verbs are typed $t_{w \in V_{IV}} = n^1 s^0$, and the relative pronoun is typed $t_{\text{who}} = n^1 n^0 s^{-1} n^0$.

Corpus K30 of 30 labelled sentences from the vocabulary VN = {'Dude', 'Walter'}, VTV = {'loves', 'annoys'}, VIV = {'abides', 'bowls'}, VRPRON = {'who'}:

[('Dude who loves Walter bowls', 1),
('Dude bowls', 1),
('Dude annoys Walter', 0),
('Walter who abides bowls', 0),
('Walter loves Walter', 1),
('Walter annoys Dude', 1),
('Walter bowls', 1),
('Walter abides', 0),
('Dude loves Walter', 1),
('Dude who bowls abides', 1),
('Walter who bowls annoys Dude', 1),
('Dude who bowls bowls', 1),
('Dude who abides abides', 1),
('Dude annoys Dude who bowls', 0),
('Walter annoys Walter', 0),
('Dude who abides bowls', 1),
('Walter who abides loves Walter', 0),
('Walter who bowls bowls', 1),
('Walter loves Walter who abides', 0),
('Walter annoys Walter who bowls', 0),
('Dude abides', 1),
('Dude loves Walter who bowls', 1),
('Walter who loves Dude bowls', 1),
('Dude loves Dude who abides', 1),
('Walter who abides loves Dude', 0),
('Dude annoys Dude', 0),
('Walter who annoys Dude bowls', 1),
('Walter who annoys Dude abides', 0),
('Walter loves Dude', 1),
('Dude who bowls loves Walter', 1)]

Corpus K6 of 6 labelled sentences from the vocabulary VN = {'Romeo', 'Juliet'}, VTV = {'loves'}, VIV = {'dies'}, VRPRON = {'who'}:

[('Romeo dies', 1.0),
('Romeo loves Juliet', 0.0),
('Juliet who dies dies', 1.0),
('Romeo loves Romeo', 0.0),
('Juliet loves Romeo', 0.0),
('Juliet dies', 1.0)]

Corpus K16 of 16 labelled sentences from the vocabulary VN = {'Romeo', 'Juliet'}, VTV = {'loves', 'kills'}, VIV = {'dies'}, VRPRON = {'who'}:

[('Juliet kills Romeo who dies', 0),
('Juliet dies', 1),
('Romeo who loves Juliet dies', 1),
('Romeo dies', 1),
('Juliet who dies dies', 1),
('Romeo loves Juliet', 1),
('Juliet who dies loves Juliet', 0),
('Romeo kills Juliet who dies', 0),
('Romeo who kills Romeo dies', 1),
('Romeo who dies dies', 1),
('Romeo who loves Romeo dies', 0),
('Romeo kills Juliet', 0),
('Romeo who dies kills Romeo', 1),
('Juliet who dies kills Romeo', 0),
('Romeo loves Romeo', 0),
('Romeo who dies kills Juliet', 0)]

Appendix E Sentence to Circuit mapping

Quantum theory has formally been shown to be a process theory. Therefore, it enjoys a diagrammatic language in terms of string diagrams. Specifically, in the context of the quantum circuits we construct in our experiments, we use pure quantum theory, in which processes are unitary operations, or quantum gates in the context of circuits. The monoidal structure allowing for parallel processes is instantiated by the tensor product, and sequential composition is instantiated by sequential composition of quantum gates.

In Fig. E3 we show the generic construction of the mapping from sentence diagrams to parameterised quantum circuits, for the hyperparameters and parameterised word-circuits we use in this work.

A wire carrying basic pregroup type b is given q_b qubits. A word-state with only one output wire becomes a one-qubit state prepared from |0⟩. For the preparation of such unary states we choose the sequence of gates defining an Euler decomposition of one-qubit unitaries, $R_z(\theta_1) \circ R_x(\theta_2) \circ R_z(\theta_3)$.
Fig. E3 Mapping from sentence diagrams to parameterised quantum circuits. Here we show how the generators of sentence diagrams are mapped to generators of circuits, for the hyperparameters we consider in this work.

Word-states with more than one output wire become multi-qubit states on k > 1 qubits, prepared by an IQP-style circuit applied to $\bigotimes_{i=1}^k |0\rangle$. Such a word-circuit is composed of d layers. Each layer consists of a layer of Hadamard gates followed by a layer in which every neighbouring pair of qubit wires is connected by a $CR_z(\theta)$ gate, $\left( \bigotimes_{i=1}^{k} H \right) \circ \left( \bigotimes_{i=1}^{k-1} CR_z(\theta_i)_{i,i+1} \right)$. Since all CRz gates commute with each other, it is justified to consider this as a single layer, at least abstractly. The Kronecker tensor with n output wires of type b is mapped to a GHZ state on $n q_b$ qubits. Specifically, GHZ is the circuit that prepares the state $\sum_{x=0}^{2^{q_b}-1} \bigotimes_{i=1}^{n} |\mathrm{bin}(x)\rangle$, where bin is the binary expression of an integer. The cup of pregroup type b is mapped to $q_b$ nested Bell effects, each of which is implemented as a CNOT followed by a Hadamard gate on the control qubit and postselection on ⟨00|.
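Putting these generators together, the following numpy sketch (our illustration, with arbitrary untrained parameters) evaluates the amplitude of the two-word sentence "Romeo dies" for q_n = 1 and q_s = 0: each word is a unary state prepared by rotations, and the single cup on the n-wire becomes a Bell effect, i.e. a CNOT, a Hadamard on the control, and postselection on ⟨00|.

```python
import numpy as np

def rx(t):
    return np.array([[np.cos(t / 2), -1j * np.sin(t / 2)],
                     [-1j * np.sin(t / 2), np.cos(t / 2)]])

def rz(t):
    return np.array([[np.exp(-1j * t / 2), 0], [0, np.exp(1j * t / 2)]])

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.eye(4)[[0, 1, 3, 2]]          # control on the first qubit

def word_state(a, b, c):
    """Unary word-state: Euler decomposition Rz.Rx.Rz applied to |0>."""
    return rz(c) @ rx(b) @ rz(a) @ np.array([1, 0])

theta = {'Romeo': (0.3, 1.2, -0.7), 'dies': (0.5, -0.4, 0.9)}  # illustrative values
state = np.kron(word_state(*theta['Romeo']), word_state(*theta['dies']))

# Bell effect <00| (H (x) I) CNOT, i.e. row <00| of the cup-circuit.
bell_effect = (np.kron(H, np.eye(2)) @ CNOT)[0]
amplitude = bell_effect @ state
print(abs(amplitude) ** 2)              # the predicted label of Eq. (1)
```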
Appendix F Optimisation Method

The gradient-free optimisation method we use, Simultaneous Perturbation Stochastic Approximation (SPSA), works as follows. Start from a random point in parameter space. At every iteration, pick a random direction and estimate the derivative along it by a finite difference with step size depending on c. This requires only two evaluations of the cost function, independently of the number of parameters, which significantly speeds up the optimisation. Then take a step of size depending on a towards (opposite) that direction if the derivative is negative (positive). In our experiments we use minimizeSPSA from the Python package noisyopt [69], and we set a = 0.1 and c = 0.1, except for the experiment on ibmq for d = 3, for which we set a = 0.05 and c = 0.05.

Note that for classical simulations we use just-in-time compilation of the cost function, by invoking jit from jax [70]. In addition, the choice of the squares-of-differences cost defined in Eq. 2 is not unique. One can just as well use the binary cross entropy

$L_{\mathrm{BCE}}(\theta) = -\frac{1}{|\Delta|} \sum_{\sigma \in \Delta} \left[ l_\sigma \log l^{\mathrm{pr}}_\sigma(\theta_\sigma) + (1 - l_\sigma) \log\left(1 - l^{\mathrm{pr}}_\sigma(\theta_\sigma)\right) \right]$

and this cost function can be minimised as well, as shown in Fig. F4.

Fig. F4 Minimisation of the binary cross entropy cost function L_BCE with SPSA, for the question answering task for corpus K30.

In our classical simulation of the experiment we also used basinhopping [71] in combination with Nelder-Mead [72] from the Python package SciPy [73]. Nelder-Mead is a gradient-free local optimisation method. basinhopping hops (or jumps) between basins (or local minima) and then returns the minimum over local minima of the cost function, where each minimum is found by Nelder-Mead. The hop direction is random. The hop is accepted according to a Metropolis criterion depending on the cost function to be minimised and a temperature. We used the default temperature value (1) and the default number of basin hops (100).
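In its simplest form, the SPSA iteration described at the start of this appendix can be sketched as follows; the constant step sizes are a simplification (minimizeSPSA decays a and c over iterations), and the toy cost stands in for the circuit-based L(θ):

```python
import numpy as np

def spsa_step(cost, theta, a, c, rng):
    """One SPSA iteration: estimate the derivative along a random direction
    from two cost evaluations, then step downhill along that direction."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    grad_est = (cost(theta + c * delta) - cost(theta - c * delta)) / (2 * c)
    return theta - a * grad_est * delta

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=10)     # random starting point
toy_cost = lambda t: float(np.sum(np.sin(t) ** 2))  # stand-in for L(theta)
for _ in range(300):
    theta = spsa_step(toy_cost, theta, a=0.1, c=0.1, rng=rng)

# In the experiments we instead call, with the same hyperparameters:
# from noisyopt import minimizeSPSA
# result = minimizeSPSA(toy_cost, x0=theta, a=0.1, c=0.1, niter=300)
```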
F.1 Error Decay

In Fig. F5 we show the decay of the mean training and test errors for the question answering task for corpus K30, simulated classically, which is shown as an inset in Fig. 4. Plotting in log-log scale reveals, at least initially, an algebraic decay of the errors with the depth of the word-circuits.

Fig. F5 Algebraic decay of the mean training and testing errors for the data displayed in Fig. 4 (bottom), obtained by basinhopping. Increasing the depth of the word-circuits results in algebraic decay of the mean training and testing errors. The slopes are $\log e_{\mathrm{tr}} \sim -1.2 \log d$ and $\log e_{\mathrm{te}} \sim -0.3 \log d$. We attribute the existence of the plateau for e_tr at large depths to the small scale of our experiment and the small values of the hyperparameters determining the size of the quantum-enhanced feature space.

F.2 On the influence of noise on the cost-function landscape

Regarding optimisation on a quantum computer, we comment on the effect of noise on the optimal parameters. Consider a successful optimisation of L(θ) performed on a NISQ device, returning $\theta^*_{\mathrm{NISQ}}$. If we instantiate the circuits $C_\sigma(\theta^*_{\mathrm{NISQ}})$ and evaluate them on a classical computer to obtain the predicted labels $l^{\mathrm{CC}}_{\mathrm{pr}}(\theta^*_{\mathrm{NISQ}})$, we observe that these can in general differ from the labels $l^{\mathrm{NISQ}}_{\mathrm{pr}}(\theta^*_{\mathrm{NISQ}})$ predicted by evaluating the circuits on the quantum computer. In the context of a fault-tolerant quantum computer this should not be the case. However, since there is a non-trivial coherent-noise channel that our circuits undergo, it is expected that the optimiser's results are affected in this way.

Appendix G Quantum Compilation

In order to perform quantum compilation we use pytket [60]. It is a Python module for interfacing with CQC's TKET, a toolset for quantum programming. From this toolbox, we need to make use of compilation passes.

At a high level, quantum compilation can be described as follows. Given a circuit and a device, quantum operations are decomposed in terms of the device's native gate set. Furthermore, the quantum circuit is reshaped in order to make it compatible with the device's topology [74]. Specifically, the compilation pass that we use is default_compilation_pass(2). The integer option is set to 2 for maximum optimisation under compilation [75].

Circuits written in pytket can be run on other devices by simply changing the backend being called, regardless of whether the hardware might be fundamentally different in terms of what physical systems are used as qubits. This makes TKET platform-agnostic. We stress that on IBMQ machines specifically, the native gates are arbitrary single-qubit unitaries ('U3' gate) and entangling controlled-not gates ('CNOT' or 'CX'). Importantly, CNOT gates show error rates which are one or even two orders of magnitude larger than the error rates of U3 gates. Therefore, we measure the depth of our circuits in terms of the CNOT-depth; using pytket, this can be obtained by invoking the command depth_by_type(OpType.CX).

For both backends used in this work, ibmq_montreal and ibmq_toronto, the reported quantum volume is 32, and the maximum allowed number of shots is $2^{13}$.
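In outline, the compilation and execution step looks as follows. This is a sketch against the pytket API, with backend access assumed to come via the pytket-qiskit extension package; method names have shifted somewhat across pytket versions:

```python
from pytket import Circuit, OpType
from pytket.extensions.qiskit import IBMQBackend  # assumes pytket-qiskit

circuit = Circuit(2).H(0).CX(0, 1).measure_all()  # stand-in sentence-circuit

backend = IBMQBackend('ibmq_montreal')
# Applies default_compilation_pass(2): rebase to the native gate set and
# route onto the device topology, at maximum optimisation level.
compiled = backend.get_compiled_circuit(circuit, optimisation_level=2)

print(compiled.depth_by_type(OpType.CX))          # CNOT-depth of the circuit
handle = backend.process_circuit(compiled, n_shots=2 ** 13)  # max shots
counts = backend.get_result(handle).get_counts()
```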
Appendix H Swap Test and Hadamard Test

In our binary classification NLP task, the predicted label is the norm squared of a zero-to-zero transition amplitude, where the unitary U represents the word-circuits and the circuits that implement the Bell effects, as dictated by the grammatical structure. Estimating $|\langle 0 \dots 0|U|0 \dots 0\rangle|^2$, or the amplitude $\langle 0 \dots 0|U|0 \dots 0\rangle$ itself in case one wants to define a cost function in which it appears instead of its norm, can be done by postselecting on $\langle 0 \dots 0|$. However, postselection costs exponential time in the number of postselected qubits; in our case, one needs to discard all bitstrings sampled from the quantum computer that have Hamming weight other than zero. This is the procedure we follow in this proof-of-concept experiment, as we can afford to do so due to the small circuit sizes.

In such a setting, postselection can be avoided by using the swap test to estimate the norm squared of the amplitude, or the Hadamard test for the amplitude itself [76]. See Fig. G6 for the corresponding circuits of those routines. In Fig. G7 we show how the swap test or the Hadamard test can be used to estimate the amplitude represented by the postselected sentence-circuit of Fig. 3. Furthermore, at least for tasks that are defined such that every sentence corresponds to a circuit, we argue that sentences do not grow arbitrarily long, and so the cost of evaluating the cost function is upper bounded in practical applications.

Fig. G6 (Left) Circuit for the Hadamard test. Measuring the control qubit in the computational basis allows one to estimate ⟨Z⟩ = Re(⟨ψ|U|ψ⟩) if b = 0, and ⟨Z⟩ = Im(⟨ψ|U|ψ⟩) if b = 1. The state ψ can be a multiqubit state, and in this work we are interested in the case ψ = |0…0⟩. (Right) Circuit for the swap test. Sampling the control qubit allows one to estimate ⟨Z⟩ = |⟨ψ|φ⟩|².
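The Hadamard-test identity of Fig. G6 is straightforward to verify numerically. In the sketch below (our illustration, with a random two-qubit unitary standing in for a sentence-circuit), the Z-expectation of the control qubit reproduces Re⟨0…0|U|0…0⟩:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
U, _ = np.linalg.qr(A)                 # a random 2-qubit unitary
psi = np.zeros(4); psi[0] = 1.0        # |00>, playing the role of |0...0>

# Control in |+>, apply controlled-U, then H on the control.
ctrl_U = np.block([[np.eye(4), np.zeros((4, 4))],
                   [np.zeros((4, 4)), U]])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
state = ctrl_U @ np.kron(plus, psi)
H = np.kron(np.array([[1, 1], [1, -1]]) / np.sqrt(2), np.eye(4))
state = H @ state

# <Z> on the control equals the real part of the transition amplitude.
p0 = np.sum(np.abs(state[:4]) ** 2)
p1 = np.sum(np.abs(state[4:]) ** 2)
assert np.isclose(p0 - p1, (psi.conj() @ U @ psi).real)
```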
Fig. G7 Use of the swap test (top) and the Hadamard test (bottom) to estimate the norm squared of the amplitude, or the amplitude itself, respectively, which is represented by the postselected circuit of Fig. 3.

References

[1] Jurafsky, D., Martin, J.H.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 1st edn. Prentice Hall PTR, USA (2000)

[2] Blackburn, P., Bos, J.: Representation and Inference for Natural Language: A First Course in Computational Semantics. Center for the Study of Language and Information, Stanford, CA (2005)

[3] Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners (2020)

[4] Turing, A.M.: I.—Computing machinery and intelligence. Mind LIX(236), 433–460 (1950). https://doi.org/10.1093/mind/LIX.236.433

[5] Searls, D.B.: The language of genes. Nature 420(6912), 211–217 (2002). https://doi.org/10.1038/nature01255

[6] Zeng, Z., Shi, H., Wu, Y., Hong, Z.: Survey of natural language processing techniques in bioinformatics. Computational and Mathematical Methods in Medicine 2015, 674296 (2015). https://doi.org/10.1155/2015/674296

[7] Buhrmester, V., Münch, D., Arens, M.: Analysis of Explainers of Black Box Deep Neural Networks for Computer Vision: A Survey (2019)

[8] Lambek, J.: The mathematics of sentence structure. American Mathematical Monthly, 154–170 (1958)

[9] Montague, R.: Universal grammar. Theoria 36(3), 373–398 (1970). https://doi.org/10.1111/j.1755-2567.1970.tb00434.x

[10] Chomsky, N.: Syntactic Structures. Mouton, The Hague (1957)

[11] Coecke, B., Sadrzadeh, M., Clark, S.: Mathematical Foundations for a Compositional Distributional Model of Meaning (2010)

[12] Grefenstette, E., Sadrzadeh, M.: Experimental support for a categorical compositional distributional model of meaning. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 1394–1404 (2011). arXiv:1106.4058

[13] Kartsaklis, D., Sadrzadeh, M.: Prior disambiguation of word tensors for constructing sentence vectors. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1590–1601. ACL (2013)

[14] Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642. Association for Computational Linguistics, Seattle, Washington, USA (2013). https://www.aclweb.org/anthology/D13-1170

[15] Sadrzadeh, M., Clark, S., Coecke, B.: The Frobenius anatomy of word meanings I: subject and object relative pronouns. Journal of Logic and Computation 23(6), 1293–1317 (2013). https://doi.org/10.1093/logcom/ext044

[16] Sadrzadeh, M., Clark, S., Coecke, B.: The Frobenius anatomy of word meanings II: possessive relative pronouns. Journal of Logic and Computation 26(2), 785–815 (2014). https://doi.org/10.1093/logcom/exu027

[17] Lewis, M.: Towards logical negation for compositional distributional semantics (2020)

[18] Pestun, V., Vlassopoulos, Y.: Tensor network language model (2017)

[19] Gallego, A.J., Orus, R.: Language Design as Information Renormalization (2019)

[20] Bradley, T.-D., Stoudenmire, E.M., Terilla, J.: Modeling Sequences with Quantum States: A Look Under the Hood (2019)

[21] Efthymiou, S., Hidary, J., Leichenauer, S.: TensorNetwork for Machine Learning (2019)

[22] Eisert, J.: Entanglement and tensor network states (2013)

[23] Orús, R.: Tensor networks for complex quantum systems. Nature Reviews Physics 1(9), 538–550 (2019). https://doi.org/10.1038/s42254-019-0086-7

[24] Arute, F., Arya, K., Babbush, R., Bacon, D., Bardin, J.C., Barends, R., Biswas, R., Boixo, S., Brandao, F.G.S.L., Buell, D.A., Burkett, B., Chen, Y., Chen, Z., Chiaro, B., Collins, R., Courtney, W., Dunsworth, A., Farhi, E., Foxen, B., Fowler, A., Gidney, C., Giustina, M., Graff, R., Guerin, K., Habegger, S., Harrigan, M.P., Hartmann, M.J., Ho, A., Hoffmann, M., Huang, T., Humble, T.S., Isakov, S.V., Jeffrey, E., Jiang, Z., Kafri, D., Kechedzhi, K., Kelly, J., Klimov, P.V., Knysh, S., Korotkov, A., Kostritsa, F., Landhuis, D., Lindmark, M., Lucero, E., Lyakh, D., Mandrà, S., McClean, J.R., McEwen, M., Megrant, A., Mi, X., Michielsen, K., Mohseni, M., Mutus, J., Naaman, O., Neeley, M., Neill, C., Niu, M.Y., Ostby, E., Petukhov, A., Platt, J.C., Quintana, C., Rieffel, E.G., Roushan, P., Rubin, N.C., Sank, D., Satzinger, K.J., Smelyanskiy, V., Sung, K.J., Trevithick, M.D., Vainsencher, A., Villalonga, B., White, T., Yao, Z.J., Yeh, P., Zalcman, A., Neven, H., Martinis, J.M.: Quantum supremacy using a programmable superconducting processor. Nature 574(7779), 505–510 (2019). https://doi.org/10.1038/s41586-019-1666-5

[25] Bharti, K., Cervera-Lierta, A., Kyaw, T.H., Haug, T., Alperin-Lea, S., Anand, A., Degroote, M., Heimonen, H., Kottmann, J.S., Menke, T., Mok, W.-K., Sim, S., Kwek, L.-C., Aspuru-Guzik, A.: Noisy intermediate-scale quantum (NISQ) algorithms (2021)

[26] Wootton, J.R.: Procedural generation using quantum computation. International Conference on the Foundations of Digital Games (2020). https://doi.org/10.1145/3402942.3409600

[27] Harrow, A.W., Hassidim, A., Lloyd, S.: Quantum algorithm for linear systems of equations. Physical Review Letters 103(15) (2009). https://doi.org/10.1103/physrevlett.103.150502

[28] Beer, K., Bondarenko, D., Farrelly, T., Osborne, T.J., Salzmann, R., Scheiermann, D., Wolf, R.: Training deep quantum neural networks. Nature Communications 11(1) (2020). https://doi.org/10.1038/s41467-020-14454-2

[29] Kerenidis, I., Landman, J., Luongo, A., Prakash, A.: q-means: A quantum algorithm for unsupervised machine learning. In: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32, pp. 4134–4144. Curran Associates, Inc. (2019). https://proceedings.neurips.cc/paper/2019/file/16026d60ff9b54410b3435b403afd226-Paper.pdf

[30] Dunjko, V., Taylor, J.M., Briegel, H.J.: Quantum-enhanced machine learning. Physical Review Letters 117(13) (2016). https://doi.org/10.1103/physrevlett.117.130501

[31] Chia, N.-H., Gilyén, A., Li, T., Lin, H.-H., Tang, E., Wang, C.: Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning. Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (2020). https://doi.org/10.1145/3357713.3384314

[32] Havlíček, V., Córcoles, A.D., Temme, K., Harrow, A.W., Kandala, A., Chow, J.M., Gambetta, J.M.: Supervised learning with quantum-enhanced feature spaces. Nature 567(7747), 209–212 (2019). https://doi.org/10.1038/s41586-019-0980-2

[33] Li, Z., Liu, X., Xu, N., Du, J.: Experimental realization of a quantum support vector machine. Physical Review Letters 114(14) (2015). https://doi.org/10.1103/physrevlett.114.140504

[34] Zeng, W., Coecke, B.: Quantum algorithms for compositional natural language processing. Electronic Proceedings in Theoretical Computer Science 221, 67–75 (2016). https://doi.org/10.4204/eptcs.221.8

[35] O'Riordan, L.J., Doyle, M., Baruffa, F., Kannan, V.: A hybrid classical-quantum workflow for natural language processing. Machine Learning: Science and Technology (2020). https://doi.org/10.1088/2632-2153/abbd2e

[36] Wiebe, N., Bocharov, A., Smolensky, P., Troyer, M., Svore, K.M.: Quantum Language Processing (2019)

[37] Bausch, J., Subramanian, S., Piddock, S.: A Quantum Search Decoder for Natural Language Processing (2020)

[38] Chen, J.C.: Quantum computation and natural language processing (2002)

[39] Coecke, B., Kissinger, A.: Picturing Quantum Processes. A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press (2017). https://doi.org/10.1017/9781316219317

[40] Abramsky, S., Coecke, B.: A categorical semantics of quantum protocols. In: Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science, 2004, pp. 415–425 (2004). https://doi.org/10.1109/LICS.2004.1319636

[41] Meichanetzidis, K., Gogioso, S., Felice, G.D., Chiappori, N., Toumi, A., Coecke, B.: Quantum Natural Language Processing on Near-Term Quantum Computers (2020)

[42] Schuld, M., Bocharov, A., Svore, K.M., Wiebe, N.: Circuit-centric quantum classifiers. Physical Review A 101(3) (2020). https://doi.org/10.1103/physreva.101.032308

[43] Benedetti, M., Lloyd, E., Sack, S., Fiorentini, M.: Parameterized quantum circuits as machine learning models. Quantum Science and Technology 4(4), 043001 (2019). https://doi.org/10.1088/2058-9565/ab4eb5

[44] Lambek, J.: From Word to Sentence. A Computational Algebraic Approach to Grammar. Polimetrica, Milan (2008)

[45] Preller, A.: Linear processing with pregroups. Studia Logica: An International Journal for Symbolic Logic 87(2/3), 171–197 (2007)

[46] Baez, J.C., Stay, M.: Physics, Topology, Logic and Computation: A Rosetta Stone (2009)

[47] Selinger, P.: A survey of graphical languages for monoidal categories. Lecture Notes in Physics, 289–355 (2010). https://doi.org/10.1007/978-3-642-12821-9_4

[48] Schuld, M., Killoran, N.: Quantum machine learning in feature Hilbert spaces. Physical Review Letters 122(4) (2019). https://doi.org/10.1103/physrevlett.122.040504

[49] Lloyd, S., Schuld, M., Ijaz, A., Izaac, J., Killoran, N.: Quantum embeddings for machine learning (2020)

[50] de Felice, G., Toumi, A., Coecke, B.: DisCoPy: Monoidal Categories in Python (2020)

[51] Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient Estimation of Word Representations in Vector Space (2013)

[52] Aaronson, S.: Read the fine print. Nature Physics 11(4), 291–293 (2015)

[53] Kartsaklis, D., Fan, I., Yeung, R., Pearson, A., Lorenz, R., Toumi, A., de Felice, G., Meichanetzidis, K., Clark, S., Coecke, B.: lambeq: An Efficient High-Level Python Library for Quantum NLP (2021)

[54] Yeung, R., Kartsaklis, D.: A CCG-Based Version of the DisCoCat Framework (2021)

[55] Spall, J.C.: Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Transactions on Aerospace and Electronic Systems 34(3), 817–823 (1998). https://doi.org/10.1109/7.705889

[56] Bonet-Monroig, X., Wang, H., Vermetten, D., Senjean, B., Moussa, C., Bäck, T., Dunjko, V., O'Brien, T.E.: Performance comparison of optimization methods on variational quantum algorithms (2021)

[57] de Felice, G., Meichanetzidis, K., Toumi, A.: Functorial question answering. Electronic Proceedings in Theoretical Computer Science 323, 84–94 (2020). https://doi.org/10.4204/eptcs.323.6

[58] Chen, Y., Pan, Y., Dong, D.: Quantum Language Model with Entanglement Embedding for Question Answering (2020)

[59] Zhao, Q., Hou, C., Liu, C., Zhang, P., Xu, R.: A quantum expectation value based language model with application to question answering. Entropy 22(5), 533 (2020)

[60] Sivarajah, S., Dilkes, S., Cowtan, A., Simmons, W., Edgington, A., Duncan, R.: Tket: A retargetable compiler for NISQ devices. Quantum Science and Technology (2020). https://doi.org/10.1088/2058-9565/ab8e92

[61] Mitarai, K., Fujii, K.: Methodology for replacing indirect measurements with direct measurements. Physical Review Research 1(1) (2019). https://doi.org/10.1103/physrevresearch.1.013006

[62] Benedetti, M., Fiorentini, M., Lubasch, M.: Hardware-efficient variational quantum algorithms for time evolution (2020)

[63] Piedeleu, R., Kartsaklis, D., Coecke, B., Sadrzadeh, M.: Open System Categorical Quantum Semantics in Natural Language Processing (2015)

[64] Bankova, D., Coecke, B., Lewis, M., Marsden, D.: Graded Entailment for Compositional Distributional Semantics (2016)

[65] Coecke, B.: The Mathematics of Text Structure (2020)

[66] Coecke, B., de Felice, G., Meichanetzidis, K., Toumi, A.: Foundations for Near-Term Quantum Natural Language Processing (2020)

[67] Buszkowski, W., Moroz, K.: Pregroup Grammars and Context-free Grammars

[68] Pentus, M.: Lambek grammars are context free. In: Proceedings of the Eighth Annual IEEE Symposium on Logic in Computer Science, pp. 429–433 (1993). https://doi.org/10.1109/LICS.1993.287565

[69] https://github.com/andim/noisyopt

[70] Bradbury, J., Frostig, R., Hawkins, P., Johnson, M.J., Leary, C., Maclaurin, D., Wanderman-Milne, S.: JAX: Composable Transformations of Python+NumPy programs. http://github.com/google/jax

[71] Olson, B., Hashmi, I., Molloy, K., Shehu, A.: Basin hopping as a general and versatile optimization framework for the characterization of biological macromolecules. Advances in Artificial Intelligence 2012, 1–19 (2012). https://doi.org/10.1155/2012/674832

[72] Gao, F., Han, L.: Implementing the Nelder-Mead simplex algorithm with adaptive parameters. Computational Optimization and Applications 51(1), 259–277 (2010). https://doi.org/10.1007/s10589-010-9329-3

[73] https://pypi.org/project/scipy

[74] Cowtan, A., Dilkes, S., Duncan, R., Krajenbrink, A., Simmons, W., Sivarajah, S.: On the Qubit Routing Problem. In: van Dam, W., Mancinska, L. (eds.) 14th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2019). Leibniz International Proceedings in Informatics (LIPIcs), vol. 135, pp. 5:1–5:32. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2019). https://doi.org/10.4230/LIPIcs.TQC.2019.5. http://drops.dagstuhl.de/opus/volltexte/2019/10397

[75] https://github.com/CQCL/pytket

[76] Aharonov, D., Jones, V., Landau, Z.: A polynomial quantum algorithm for approximating the Jones polynomial. Algorithmica 55(3), 395–421 (2008). https://doi.org/10.1007/s00453-008-9168-0
