
Quantum speedup of Monte Carlo methods

Ashley Montanaro∗

July 12, 2017


arXiv:1504.06987v3 [quant-ph] 11 Jul 2017

Abstract
Monte Carlo methods use random sampling to estimate numerical quantities which are hard
to compute deterministically. One important example is the use in statistical physics of rapidly
mixing Markov chains to approximately compute partition functions. In this work we describe
a quantum algorithm which can accelerate Monte Carlo methods in a very general setting. The
algorithm estimates the expected output value of an arbitrary randomised or quantum subrou-
tine with bounded variance, achieving a near-quadratic speedup over the best possible classical
algorithm. Combining the algorithm with the use of quantum walks gives a quantum speedup of
the fastest known classical algorithms with rigorous performance bounds for computing partition
functions, which use multiple-stage Markov chain Monte Carlo techniques. The quantum algo-
rithm can also be used to estimate the total variation distance between probability distributions
efficiently.

1 Introduction
Monte Carlo methods are now ubiquitous throughout science, in fields as diverse as statistical
physics [37], microelectronics [30] and mathematical finance [23]. These methods use randomness
to estimate numerical properties of systems which are too large or complicated to analyse deter-
ministically. In general, the basic core of Monte Carlo methods involves estimating the expected
output value µ of a randomised algorithm A. The natural algorithm for doing so is to produce k
samples, each corresponding to the output of an independent execution of A, and then to output
the average µ̃ of the samples as an approximation of µ. Assuming that the variance of the random
variable corresponding to the output of A is at most σ², the probability that the value output by
this estimator is far from the truth can be bounded using Chebyshev's inequality:

$$\Pr[|\tilde{\mu} - \mu| \ge \epsilon] \le \frac{\sigma^2}{k\epsilon^2}.$$

It is therefore sufficient to take k = O(σ²/ε²) to estimate µ up to additive error ε with, say, 99%
success probability. This simple result is a key component in many more complex randomised
approximation schemes (see e.g. [50, 37]).
Although this algorithm is fairly efficient, its quadratic dependence on σ/ε seems far from ideal:
for example, if σ = 1, to estimate µ up to 4 decimal places we would need to run A over 100 million
times. Unfortunately, it can be shown that, without any further information about A, the sample
complexity of this algorithm is asymptotically optimal [15] with respect to its scaling with σ and
ε, although it can be improved by a constant factor [29].
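As a point of reference, here is a minimal classical sketch of this estimator (Python; the toy subroutine standing in for A and the particular constants are illustrative assumptions, not taken from the paper):

```python
import random

def classical_mean_estimate(run_A, sigma, eps, fail_prob=0.01):
    """Estimate E[v(A)] to additive error eps by plain averaging.

    Chebyshev's inequality gives Pr[|estimate - mu| >= eps] <= sigma^2 / (k eps^2),
    so k = sigma^2 / (fail_prob * eps^2) samples suffice.
    """
    k = int(sigma**2 / (fail_prob * eps**2)) + 1
    return sum(run_A() for _ in range(k)) / k

# Toy randomised subroutine: exponential with mean 0.5 (so sigma = 0.5).
run_A = lambda: random.expovariate(2.0)
print(classical_mean_estimate(run_A, sigma=0.5, eps=0.01))
```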

∗Department of Computer Science, University of Bristol, UK; [email protected].

We show here that, using a quantum computer, the number of uses of A required to approximate
µ can be reduced almost quadratically beyond the above classical bound. Assuming that the
variance of the output of the algorithm A is at most σ 2 , we present a quantum algorithm which
estimates µ up to additive error ε, with 99% success probability, using A only Õ(σ/ε) times¹. It
follows from known lower bounds on the quantum complexity of approximating the mean [45] that
the runtime of this algorithm is optimal, up to polylogarithmic factors. This result holds for an
arbitrary algorithm A used as a black box, given only an upper bound on the variance.
An important aspect of this construction is that the underlying subroutine A need not be a
classical randomised procedure, but can itself be a quantum algorithm. This enables any quantum
speedup obtained by A to be utilised within the overall framework of the algorithm. A particu-
lar case in which this is useful is quantum speedup of Markov chain Monte Carlo methods [38].
Classically, such methods use a rapidly mixing Markov chain to approximately sample from a
probability distribution corresponding to the stationary distribution of the chain. Quantum walks
are the quantum analogue of random walks (see e.g. [57] for a review). In some cases, quantum
walks can reduce the mixing time quadratically (see e.g. [3, 58]), although it is not known whether
this can be achieved in general [48, 6, 18]. We demonstrate that this known quadratic reduction
can be combined with our algorithm to speed up the fastest known general-purpose classical algo-
rithm with rigorous performance bounds [50] for approximately computing partition functions up
to small relative error, a fundamental problem in statistical physics [37]. As another example of
how our algorithm can be applied, we substantially improve the runtime of a quantum algorithm
for estimating the total variation distance between two probability distributions [13].

1.1 Prior work


The topic of quantum estimation of mean output values of algorithms with bounded variance con-
nects to several previously-explored directions. First, it generalises the problem of approximating
the mean, with respect to the uniform distribution, of an arbitrary bounded function. This has
been addressed by a number of authors. The first asymptotically optimal quantum algorithm for
this problem, which uses O(1/ε) queries to achieve additive error ε, seems to have been given by
Heinrich [27]; an elegant alternative optimal algorithm was later presented by Brassard et al. [11].
Previous algorithms, which are optimal up to lower-order terms, were described by Grover [25],
Aharonov [2] and Abrams and Williams [1]. Using similar techniques to Brassard et al., Wocjan et
al. [59] described an efficient algorithm for estimating the expected value of an arbitrary bounded
observable. It is not difficult to combine these ideas to approximate the mean of arbitrary bounded
functions with respect to nonuniform distributions (see Section 2.1).
One of the main technical ingredients in the present paper is based on an algorithm of Heinrich
for approximating the mean, with respect to the uniform distribution, of functions with bounded
L2 norm [27]. Section 2.2 describes a generalisation of this result to nonuniform distributions,
using similar techniques. This is roughly analogous to the way that amplitude amplification [12]
generalises Grover’s quantum search algorithm [24].
The related problem of quantum estimation of expectation values of observables, an important
task in the simulation of quantum systems, has been studied by Knill, Ortiz and Somma [36]. These
authors give an algorithm for estimating tr(Aρ) for observables A such that one can efficiently
implement the operator e^{−iAt}. The algorithm is efficient (i.e. achieves runtimes close to O(1/ε))
when the tails of the distribution tr(Aρ) decay quickly. However, in the case where one only knows
an upper bound on the variance of this distribution, the algorithm does not achieve a better runtime
than classical sampling. Yet another related problem, that of exact Monte Carlo sampling from a
desired probability distribution, was addressed by Destainville, Georgeot and Giraud [17]. Their
quantum algorithm, which uses Grover's algorithm as a subroutine, achieves roughly a quadratic
speedup over classical exact sampling. This algorithm's applicability is limited by the fact that its
runtime scaling can be as slow as O(√N), where N is the number of states of the system; we often
think of N as being exponential in the input size.

¹The Õ notation hides polylogarithmic factors.

Algorithm | Precondition                  | Approximation of µ | Uses of A and A⁻¹
----------|-------------------------------|--------------------|------------------
1         | v(A) ∈ [0, 1]                 | Additive error ε   | O(1/ε)
3         | Var(v(A)) ≤ σ²                | Additive error ε   | Õ(σ/ε)
4         | Var(v(A))/(E[v(A)])² ≤ B      | Relative error ε   | Õ(B/ε)

Table 1: Summary of the main quantum algorithms presented in this paper for estimating the mean
output value µ of an algorithm A. (Algorithm 2, omitted, is a subroutine used in Algorithm 3.)

Quantum algorithms have been used previously to approximate classical partition functions and
solve related problems. In particular, a number of authors [40, 39, 4, 56, 21, 7, 22, 16, 43] have
considered the complexity of computing Ising and Potts model partition functions. These works
in some cases achieve exponential quantum speedups over the best known classical algorithms.
Unfortunately, they in general either produce an approximation accurate up to a specified additive
error bound, or only work for specific classes of partition function problems with restrictions on
interaction strengths and topologies, or both. Here we aim to approximate partition functions up
to small relative error in a rather general setting.
Using techniques related to those of the present work, Somma et al. [49] used quantum walks to accelerate
classical simulated annealing processes, and quantum estimation of partition functions up to small
relative error was addressed by Wocjan et al. [59]. Their algorithm, which is based on the use of
quantum walks and amplitude estimation, achieves a quadratic speedup over classical algorithms
with respect to both mixing time and accuracy. However, it cannot be directly applied to accelerate
the most efficient classical algorithms for approximating partition function problems, which use
so-called Chebyshev cooling schedules (discussed in Section 3). This is essentially because these
algorithms are based around estimating the mean of random variables given only a bound on the
variance. This was highlighted as an open problem in [59], which we resolve here.
Several recent works have developed quantum algorithms for the quantum generalisation of
sampling from a Gibbs distribution: producing a Gibbs state ρ ∝ e−βH for some quantum Hamil-
tonian H [53, 47, 52, 60]. Given such a state, one can measure a suitable observable to compute
some quantity of interest about H. Supplied with an upper bound on the variance of such an ob-
servable, the procedure detailed here can be used (as for any other quantum algorithm) to reduce
the number of repetitions required to estimate the observable to a desired accuracy.

1.2 Techniques
We now give an informal description of our algorithms, which are summarised in Table 1 (for
technical details and proofs, see Section 2). For any randomised or quantum algorithm A, we write
v(A) for the random variable corresponding to the value computed by A, with the expected value of
v(A) denoted E[v(A)]. For concreteness, we think of A as a quantum algorithm which operates on
n qubits, each initially in the state |0⟩, and whose quantum part finishes with a measurement of k
of the qubits in the computational basis. Given that the measurement returns outcome x ∈ {0, 1}^k,
the final output is then φ(x), for some fixed function φ : {0, 1}k → R. If A is a classical randomised
algorithm, or a quantum circuit using (for example) mixed states and intermediate measurements,
a corresponding unitary quantum circuit of this form can be produced using standard reversible-
computation techniques [5]. As is common in works based on quantum amplitude amplification
and estimation [12], we also assume that we have the ability to execute the algorithm A−1 , which
is the inverse of the unitary part of A. If we do have a description of A as a quantum circuit, this
can be achieved simply by running the circuit backwards, replacing each gate with its inverse.
We first deal with the special case where the output of A is bounded between 0 and 1. Here a
quantum algorithm for approximating µ := E[v(A)] quadratically faster than is possible classically
can be found by combining ideas from previously known algorithms [27, 11, 59]. We append
an additional qubit and define a unitary operator W on k + 1 qubits which performs the map
|x⟩|0⟩ ↦ |x⟩(√(1 − φ(x))|0⟩ + √(φ(x))|1⟩). If the final measurement of the algorithm A is replaced
with performing W, then measuring the added qubit, the probability that we receive the answer
1 is precisely µ. Using quantum amplitude estimation [12] the probability that this measurement
returns 1 can be estimated to higher accuracy than is possible classically. Using t iterations of
amplitude estimation, we can output an estimate µ̃ such that |µ̃ − µ| = O(√µ/t + 1/t²) with high
probability [12]. In particular, O(1/ε) iterations of amplitude estimation are sufficient to produce
an estimate µ̃ such that |µ̃ − µ| ≤ ε with, say, 99% probability.
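The following small numerical sketch (Python/NumPy; the amplitudes and φ values are arbitrary illustrative choices) checks the key identity behind this construction: after appending the extra qubit and applying W, the probability of measuring 1 on that qubit equals E[v(A)] = Σ_x |α_x|² φ(x).

```python
import numpy as np

# Toy output register of k = 2 qubits: amplitudes alpha_x over x in {00,01,10,11}
alpha = np.array([0.5, 0.5, 0.5, 0.5], dtype=complex)   # |alpha_x|^2 = 1/4 each
phi   = np.array([0.0, 0.2, 0.7, 1.0])                  # output values phi(x) in [0, 1]

# Apply W: |x>|0> -> |x>( sqrt(1-phi(x))|0> + sqrt(phi(x))|1> ), ancilla stored last
state = np.zeros(2 * len(alpha), dtype=complex)
state[0::2] = alpha * np.sqrt(1.0 - phi)                 # ancilla = |0>
state[1::2] = alpha * np.sqrt(phi)                       # ancilla = |1>

prob_one = np.sum(np.abs(state[1::2])**2)                # <psi| (I x |1><1|) |psi>
expected = np.sum(np.abs(alpha)**2 * phi)                # E[v(A)]
print(prob_one, expected)                                # the two agree
```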
The next step is to use the above algorithm as a subroutine in a more general procedure
that can deal with algorithms A whose output is non-negative and has bounded ℓ₂ norm, but is not
necessarily bounded between 0 and 1. That is, algorithms for which we can control the expression
‖v(A)‖₂ := √(E[v(A)²]). The procedure for this case generalises, and is based on the same ideas as,
a previously known result for the uniform distribution [27].
The idea is to split the output of A up into disjoint intervals depending on size. Write A_{p,q}
for the “truncated” algorithm which outputs v(A) if p ≤ v(A) < q, and otherwise outputs 0. We
estimate µ by applying the above algorithm to estimate E[v(A_{p,q})] for a sequence of O(log 1/ε)
intervals which are exponentially increasing in size, and summing the results. As the intervals [p, q)
get larger, the accuracy with which we approximate E[v(A_{p,q})] decreases, and values v(A) larger
than about 1/ε are ignored completely. However, the overall upper bound on ‖v(A)‖₂ allows us to
infer that these larger values do not affect the overall expectation µ much; indeed, if µ depended
significantly on large values in the output, the ℓ₂ norm of v(A) would be high.
The final result is that for ‖v(A)‖₂ = O(1), given appropriate parameter choices, the estimate
µ̃ satisfies |µ̃ − µ| = O(ε) with high probability, and the algorithm uses A Õ(1/ε) times in total.
This scaling is a near-quadratic improvement over the best possible classical algorithm.
We next consider the more general case of algorithms A which have bounded variance, but whose
output need not be non-negative, nor bounded in ℓ₂ norm. To apply the previous algorithm, we
would like to transform the output of A to make its ℓ₂ norm low. If v(A) has mean µ and variance
upper-bounded by σ², a suitable way to achieve this is to subtract µ from the output of A, then
divide by σ. The new algorithm's output would have ℓ₂ norm upper-bounded by 1, and estimating
its expected value up to additive error ε/σ would give us an estimate of µ up to ε. Unfortunately,
we of course do not know µ initially, so cannot immediately implement this idea. To approximately
implement it, we first run A once and use the output m̃ as a proxy for µ. Because Var(v(A)) ≤ σ²,
m̃ is quite likely to be within distance O(σ) of µ. Therefore, the algorithm B produced from A by
subtracting m̃ and dividing by σ is quite likely to have ℓ₂ norm upper-bounded by a constant. We
can thus efficiently estimate the positive and negative parts of E[v(B)] separately, then combine
and rescale them. The overall algorithm achieves accuracy ε in time Õ(σ/ε).
A similar idea can be used to approximate the expected output value of algorithms for which
we have a bound on the relative variance, namely that Var(v(A)) = O(µ²). In this setting it turns
out that Õ(1/ε) uses of A suffice to produce an estimate µ̃ accurate up to relative error ε, i.e. for
which |µ̃ − µ| ≤ εµ. This is again a near-quadratic improvement over the best possible classical
algorithm.

1.3 Approximating partition functions


In this section we discuss (with details in Section 3) how these algorithms can be applied to the
problem of approximating partition functions. Consider a (classical) physical system which has
state space Ω, together with a Hamiltonian H : Ω → R specifying the energy of each configuration²
x ∈ Ω. Here we will assume that H takes integer values in the set {0, . . . , n}. A central problem is
to compute the partition function

$$Z(\beta) = \sum_{x \in \Omega} e^{-\beta H(x)}$$

for some inverse temperature β defined by β = 1/(kB T ), where T is the temperature and kB is
Boltzmann’s constant. As well as naturally encapsulating various models in statistical physics, such
as the Ising and Potts models, this framework also encompasses well-studied problems in computer
science, such as counting the number of valid k-colourings of a graph. In particular, Z(∞) counts
the number of configurations x such that H(x) = 0. It is often hard to compute Z(β) for large
β but easy to approximate Z(β) ≈ |Ω| for β ≈ 0. In many cases, such as the Ising model, it is
known that computing Z(∞) exactly falls into the #P-complete complexity class [34], and hence
is unlikely to admit an efficient quantum or classical algorithm.
Here our goal will be to approximate Z(β) up to relative error ε, for some small ε. That is, to
output Z̃ such that |Z̃ − Z(β)| ≤ ε Z(β), with high probability. For simplicity, we will focus on
β = ∞ in the following discussion, but it is easy to see how to generalise to arbitrary β.
Let 0 = β0 < β1 < · · · < β` = ∞ be a sequence of inverse temperatures. A standard classical
approach to design algorithms for approximating partition functions [55, 19, 10, 50, 59] is based
around expressing Z(β_ℓ) as the telescoping product

$$Z(\beta_\ell) = Z(\beta_0)\,\frac{Z(\beta_1)}{Z(\beta_0)}\,\frac{Z(\beta_2)}{Z(\beta_1)} \cdots \frac{Z(\beta_\ell)}{Z(\beta_{\ell-1})}.$$

If we can compute Z(β0 ) = |Ω|, and can also approximate each of the ratios αi := Z(βi+1 )/Z(βi )
accurately, taking the product will give a good approximation to Z(β` ). Let πi denote the Gibbs
(or Boltzmann) probability distribution corresponding to inverse temperature βi , where
$$\pi_i(x) = \frac{1}{Z(\beta_i)}\, e^{-\beta_i H(x)}.$$

To approximate αi we define the random variable

$$Y_i(x) = e^{-(\beta_{i+1} - \beta_i) H(x)}.$$

Then one can readily compute that Eπi [Yi ] = αi , so sampling from each distribution πi allows us
to estimate the quantities αi . It will be possible to estimate αi up to small relative error efficiently
²We use x to label configurations rather than the more standard σ to avoid confusion with the variance.

if the ratio E[Y_i²]/E[Y_i]² is low. This motivates the concept of a Chebyshev cooling schedule [50]:
a sequence of inverse temperatures β_i such that E[Y_i²]/E[Y_i]² = O(1) for all i. It is known that,
for any partition function problem as defined above such that |Ω| = A, there exists a Chebyshev
cooling schedule with ℓ = Õ(√(log A)) [50].
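As a purely classical illustration of the telescoping-product estimator described above, the sketch below (Python/NumPy) uses a tiny toy Hamiltonian, exact Gibbs sampling, and a fixed geometric schedule standing in for a true Chebyshev cooling schedule; all of these specific choices are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy system: 8 configurations with integer energies H(x) in {0,...,4}
H = np.array([0, 1, 1, 2, 2, 3, 4, 4])

def Z(beta):                       # exact partition function, for comparison
    return np.sum(np.exp(-beta * H))

# A fixed schedule ending at a large beta standing in for beta_l = infinity.
# (A real Chebyshev cooling schedule would be chosen adaptively [50].)
betas = np.concatenate(([0.0], np.geomspace(0.05, 20.0, 12)))

estimate = Z(0.0)                  # Z(beta_0) = |Omega| is known exactly
for b0, b1 in zip(betas[:-1], betas[1:]):
    pi = np.exp(-b0 * H) / Z(b0)               # Gibbs distribution at beta_i
    x = rng.choice(len(H), size=2000, p=pi)    # exact samples from pi_i
    alpha_i = np.mean(np.exp(-(b1 - b0) * H[x]))   # estimate of Z(b1)/Z(b0)
    estimate *= alpha_i

print(estimate, Z(betas[-1]))      # estimated vs exact Z(beta_l)
```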
It is sufficient to approximate E[Y_i] up to relative error O(ε/ℓ) for each i to get an overall
approximation accurate up to relative error ε. To achieve this, the quantum algorithm presented
here needs to use at most Õ(ℓ/ε) samples from Y_i. Given a Chebyshev cooling schedule with
ℓ = Õ(√(log A)), the algorithm thus uses Õ((log A)/ε) samples in total, a near-quadratic improvement
in terms of ε over the complexity of the fastest known classical algorithm [50].
In general, we cannot exactly sample from the distributions πi . Classically, one way of approx-
imately sampling from these distributions is to use a Markov chain which mixes rapidly and has
stationary distribution πi . For a reversible, ergodic Markov chain, the time required to produce
such a sample is controlled by the relaxation time τ := 1/(1 − |λ1 |) of the chain, where λ1 is the
second largest eigenvalue in absolute value [38]. In particular, sampling from a distribution close
to πi in total variation distance requires Ω(τ ) steps of the chain.
It has been known for some time that quantum walks can sometimes mix quadratically faster [3].
One case where efficient mixing can be obtained is for sequences of Markov chains whose stationary
distributions π are close [58]. Further, for this special case one can approximately produce coherent
“quantum sample” states |π⟩ = Σ_{x∈Ω} √(π(x)) |x⟩ efficiently. Here we can show (Section 3.2) that the
Chebyshev cooling schedule condition implies that each distribution in the sequence π₁, . . . , π_{ℓ−1}
is close enough to its predecessor that we can use techniques of [58] to approximately produce any
state |π_i⟩ using Õ(ℓ√τ) quantum walk steps each. Using similar ideas we can approximately reflect
about |π_i⟩ using only Õ(√τ) quantum walk steps.
Approximating E[Y_i] up to relative error O(ε/ℓ) using our algorithm requires one quantum sam-
ple approximating |π_i⟩, and Õ(ℓ/ε) approximate reflections about |π_i⟩. Therefore, the total number
of quantum walk steps required for each i is Õ(ℓ√τ/ε). Summing over i, we get a quantum algo-
rithm for approximating an arbitrary partition function up to relative error ε using Õ((log A)√τ/ε)
quantum walk steps. The fastest known classical algorithm [50] exhibits quadratically worse de-
pendence on both τ and ε.
In the above discussion, we have neglected the complexity of computing the Chebyshev cool-
ing schedule itself. An efficient classical algorithm for this task is known [50], which runs in
time Õ((log A)τ). Adding the complexity of this part, we finish with an overall complexity of
Õ((log A)√τ(√τ + 1/ε)). We leave open the interesting question of whether there exists a more
efficient quantum algorithm for finding a Chebyshev cooling schedule.

1.4 Applications
We now sketch several representative settings (for details, see Section 3.4) in which our algorithm
for approximating partition functions gives a quantum speedup.

• The ferromagnetic Ising model above the critical temperature. This well-studied statis-
tical physics model is defined in terms of a graph G = (V, E) by the Hamiltonian H(z) =
−Σ_{(u,v)∈E} z_u z_v, where |V| = n and z ∈ {±1}ⁿ. The Markov chain known as the Glauber
dynamics is known to mix rapidly above a certain critical temperature and to have as its
stationary distribution the Gibbs distribution. For example, for any graph with maximum
degree O(1), the mixing time of the Glauber dynamics for sufficiently low inverse temperature
β is O(n log n) [44]. In this case, as A = 2ⁿ, the quantum algorithm approximates Z(β) to
within relative error ε in Õ(n^{3/2}/ε + n²) steps. The corresponding classical algorithm [50]
uses Õ(n²/ε²) steps.

• Counting colourings. Here we are given a graph G with n vertices and maximum degree
d. We seek to approximately count the number of valid k-colourings of G, where a colouring
of the vertices is valid if all pairs of neighbouring vertices are assigned different colours. In
the case where k > 2d, the use of a rapidly mixing Markov chain gives a quantum algorithm
approximating the number of colourings of G up to relative error ε in time Õ(n^{3/2}/ε + n²),
as compared with the classical Õ(n²/ε²) [50].

• Counting matchings. A matching in a graph G is a subset M of the edges of G such
that no pair of edges in M shares a vertex. In statistical physics, matchings are studied
under the name of monomer-dimer coverings [26]. Our algorithm can approximately count
the number of matchings on a graph with n vertices and m edges in Õ(n^{3/2}m^{1/2}/ε + n²m)
steps, as compared with the classical Õ(n²m/ε²) [50].

Finally, as another example of how our algorithm can be applied, we improve the accuracy
of an existing quantum algorithm for estimating the total variation distance between probability
distributions. In this setting, we are given the ability to sample from probability distributions p
and q on n elements, and would like to estimate the distance between them up to additive error
ε. A quantum algorithm of Bravyi, Harrow and Hassidim solves this problem using O(√n/ε⁸)
samples [13], while no classical algorithm can achieve sublinear dependence on n [54].
Quantum mean estimation can significantly improve the dependence of this quantum algorithm
on ε. The total variation distance between p and q can be described as the expected value of the
random variable R(x) = |p(x) − q(x)|/(p(x) + q(x)), where x is drawn from the distribution r = (p + q)/2 [13].
For each x, R(x) can be computed up to accuracy ε using Õ(√n/ε^{3/2}) iterations of amplitude
estimation. Wrapping this within O(1/ε) iterations of the mean-estimation algorithm, we obtain
an overall algorithm running in time Õ(√n/ε^{5/2}). See Section 4 for details.
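A quick numerical check of the classical identity underlying this estimator (Python/NumPy; the toy distributions are arbitrary): the mean of R(x) = |p(x) − q(x)|/(p(x) + q(x)) over samples x ~ r = (p + q)/2 converges to the total variation distance.

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.random(16); p /= p.sum()          # toy distributions on n = 16 elements
q = rng.random(16); q /= q.sum()

tv_exact = 0.5 * np.abs(p - q).sum()

r = (p + q) / 2
x = rng.choice(16, size=200_000, p=r)      # samples from r
R = np.abs(p[x] - q[x]) / (p[x] + q[x])    # R(x) for each sample
print(R.mean(), tv_exact)                  # sample mean of R approximates ||p - q||
```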

2 Algorithms
We now give technical details, parameter values and proofs for the various algorithms described
informally in Section 1.2. Recall that, for any randomised or quantum algorithm A, we let v(A)
be the random variable corresponding to the value computed by A. We assume that A takes no
input directly, but may have access to input (e.g. via queries to some black box or “oracle”) during
its execution. We further assume throughout that A is a quantum algorithm of the following form:
apply some unitary operator to the initial state |0ⁿ⟩; measure k ≤ n qubits of the resulting state in
the computational basis, obtaining outcome x ∈ {0, 1}k ; output φ(x) for some easily computable
function φ : {0, 1}k → R. We finally assume that we have access to the inverse of the unitary part
of the algorithm, which we write as A−1 .

Lemma 1 (Powering lemma [35]). Let A be a (classical or quantum) algorithm which aims to
estimate some quantity µ, and whose output µ̃ satisfies |µ − µ̃| ≤ ε except with probability γ, for
some fixed γ < 1/2. Then, for any δ > 0, it suffices to repeat A O(log 1/δ) times and take the
median to obtain an estimate which is accurate to within ε with probability at least 1 − δ.
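A minimal classical illustration of the powering lemma (Python; the unreliable estimator below is an invented stand-in): an estimator that is within ε of the truth with probability only 2/3 becomes reliable once we take the median of O(log 1/δ) independent repetitions.

```python
import random
import statistics

def boost_by_median(estimator, repetitions):
    """Run an estimator several times and return the median of the results.

    If each run is within eps of the truth with probability > 1/2, the median
    of O(log 1/delta) runs is within eps except with probability delta.
    """
    return statistics.median(estimator() for _ in range(repetitions))

# Stand-in estimator: correct to within 0.01 with probability 2/3,
# otherwise wildly wrong.
def noisy_estimator(mu=0.37):
    if random.random() < 2 / 3:
        return mu + random.uniform(-0.01, 0.01)
    return mu + random.uniform(-1, 1)

print(boost_by_median(noisy_estimator, repetitions=25))
```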

We will also need the following fundamental result from [12]:

Theorem 2 (Amplitude estimation [12]). There is a quantum algorithm called amplitude es-
timation which takes as input one copy of a quantum state |ψ⟩, a unitary transformation U =
2|ψ⟩⟨ψ| − I, a unitary transformation V = I − 2P for some projector P, and an integer t. The
algorithm outputs ã, an estimate of a = ⟨ψ|P|ψ⟩, such that

$$|\tilde{a} - a| \le 2\pi \frac{\sqrt{a(1-a)}}{t} + \frac{\pi^2}{t^2}$$

with probability at least 8/π², using U and V t times each.

The success probability of 8/π 2 can be improved to 1 − δ for any δ > 0 using the powering
lemma at the cost of an O(log 1/δ) multiplicative factor.

2.1 Estimating the mean with bounded output values


We first consider the problem of estimating E[v(A)] in the special case where v(A) is bounded
between 0 and 1. The algorithm for this case is effectively a combination of elegant ideas of Brassard
et al. [11] and Wocjan et al. [59]. The former described an algorithm for efficiently approximating
the mean of an arbitrary function with respect to the uniform distribution; the latter described
an algorithm for approximating the expected value of a particular observable, with respect to an
arbitrary quantum state. The first quantum algorithm achieving optimal scaling for approximating
the mean of a bounded function under the uniform distribution was due to Heinrich [27].

Input: an algorithm A such that 0 ≤ v(A) ≤ 1, integer t, real δ > 0.


Assume that A is a quantum algorithm which makes no measurements until the end of the
algorithm; operates on initial input state |0n i; and its final measurement is a measurement of
the last k ≤ n of these qubits in the computational basis.

1. If necessary, modify A such that it makes no measurements until the end of the algorithm;
operates on initial input state |0n i; and its final measurement is a measurement of the last
k ≤ n of these qubits in the computational basis.

2. Let W be the unitary operator on k + 1 qubits defined by

   $$W|x\rangle|0\rangle = |x\rangle\left(\sqrt{1 - \phi(x)}\,|0\rangle + \sqrt{\phi(x)}\,|1\rangle\right),$$

   where each computational basis state x ∈ {0, 1}^k is associated with a real number φ(x) ∈
   [0, 1] such that φ(x) is the value output by A when measurement outcome x is received.

3. Repeat the following step O(log 1/δ) times and output the median of the results:

(a) Apply t iterations of amplitude estimation, setting |ψ⟩ = (I ⊗ W)(A ⊗ I)|0^{n+1}⟩,
    P = I ⊗ |1⟩⟨1|.

Algorithm 1: Approximating the mean output value of algorithms bounded between 0 and 1 (cf. [11,
27, 59])

Theorem 3. Let |ψ⟩ be defined as in Algorithm 1 and set U = 2|ψ⟩⟨ψ| − I. Algorithm 1 uses
O(log 1/δ) copies of the state A|0ⁿ⟩, uses U O(t log 1/δ) times, and outputs an estimate µ̃ such
that

$$|\tilde{\mu} - \mathrm{E}[v(A)]| \le C\left(\frac{\sqrt{\mathrm{E}[v(A)]}}{t} + \frac{1}{t^2}\right)$$

with probability at least 1 − δ, where C is a universal constant. In particular, for any fixed δ > 0
and any ε such that 0 ≤ ε ≤ 1, to produce an estimate µ̃ such that with probability at least 1 − δ,
|µ̃ − E[v(A)]| ≤ ε E[v(A)], it suffices to take t = O(1/(ε√(E[v(A)]))). To achieve |µ̃ − E[v(A)]| ≤ ε
with probability at least 1 − δ it suffices to take t = O(1/ε).

Proof. The complexity claim follows immediately from Theorem 2. Also observe that W can be
implemented efficiently, as it is a controlled rotation of one qubit dependent on the value of φ(x) [59].
It remains to show the accuracy claim. The final state of A, just before its last measurement, can
be written as

$$|\psi'\rangle = A|0^n\rangle = \sum_x \alpha_x |\psi_x\rangle |x\rangle$$

for some normalised states |ψ_x⟩. If we then attach an ancilla qubit and apply W, we obtain

$$|\psi\rangle = (I \otimes W)(A \otimes I)|0^n\rangle|0\rangle = \sum_x \alpha_x |\psi_x\rangle |x\rangle \left(\sqrt{1 - \phi(x)}\,|0\rangle + \sqrt{\phi(x)}\,|1\rangle\right).$$

We have

$$\langle\psi|P|\psi\rangle = \sum_x |\alpha_x|^2 \phi(x) = \mathrm{E}[v(A)].$$

Therefore, when we apply amplitude estimation, by Theorem 2 we obtain an estimate µ̃ of µ =
E[v(A)] such that

$$|\tilde{\mu} - \mu| \le 2\pi \frac{\sqrt{\mu(1-\mu)}}{t} + \frac{\pi^2}{t^2}$$

with probability at least 8/π². The powering lemma (Lemma 1) implies that the median of
O(log 1/δ) repetitions will lie within this accuracy bound with probability at least 1 − δ.

Observe that U = 2|ψihψ| − I can be implemented with one use each of A and A−1 , and
V = I − 2P is easy to implement.
It seems likely that the median-finding algorithm of Nayak and Wu [45] could also be generalised
in a similar way, to efficiently compute the median of the output values of any quantum algorithm.
As we will not need this result here we do not pursue this further.

2.2 Estimating the mean with bounded ℓ₂ norm


We now use Algorithm 1 to give an efficient quantum algorithm for approximating the mean output
value of a quantum algorithm whose output has bounded ℓ₂ norm. In what follows, for any algorithm
A, let A<x , Ax,y , A≥y , be the algorithms defined by executing A to produce a value v(A) and:

• A<x : If v(A) < x, output v(A), otherwise output 0;

• Ax,y : If x ≤ v(A) < y, output v(A), otherwise output 0;

• A≥y : If y ≤ v(A), output v(A), otherwise output 0.

In addition, for any algorithm A and any function f : R → R, let f (A) be the algorithm produced by
evaluating v(A) and computing f (v(A)). Note that Algorithm 1 can easily be modified to compute
E[f (v(A))] rather than E[v(A)], for any function f : R → [0, 1], by modifying the operation W .
Our algorithm and correctness proof are a generalisation of a result of Heinrich [27] for com-
puting the mean with respect to the uniform distribution of functions with bounded L₂ norm, and
are based on the same ideas. Write ‖v(A)‖₂ := √(E[v(A)²]).

Input: an algorithm A such that v(A) ≥ 0, and an accuracy ε < 1/2.

1. Set k = ⌈log₂ 1/ε⌉ and t₀ = ⌈D√(log₂ 1/ε)/ε⌉, where D is a universal constant to be chosen later.

2. Use Algorithm 1 with t = t₀, δ = 1/10 to estimate E[v(A_{0,1})]. Let the estimate be µ̃₀.

3. For ℓ = 1, . . . , k:

   (a) Use Algorithm 1 with t = t₀, δ = 1/(10k) to estimate E[v(A_{2^{ℓ−1},2^ℓ})/2^ℓ]. Let the
       estimate be µ̃_ℓ.

4. Output µ̃ = µ̃₀ + Σ_{ℓ=1}^k 2^ℓ µ̃_ℓ.

Algorithm 2: Approximating the mean of positive functions with bounded ℓ₂ norm
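The dyadic decomposition that Algorithm 2 relies on can be checked with ordinary arithmetic. The sketch below (Python/NumPy, with arbitrary toy data) verifies that E[v(A)] = E[v(A_{0,1})] + Σ_{ℓ=1}^k 2^ℓ E[v(A_{2^{ℓ−1},2^ℓ})/2^ℓ] + E[v(A_{≥2^k})] when the pieces are defined as in the truncated algorithms above.

```python
import numpy as np

rng = np.random.default_rng(2)
values = rng.exponential(scale=1.0, size=10)   # toy non-negative outputs v(A)
probs = rng.random(10); probs /= probs.sum()   # their probabilities

k = 6
mean = np.dot(probs, values)

# Truncated pieces: A_{0,1}, A_{2^{l-1}, 2^l} for l = 1..k, and A_{>= 2^k}
pieces = np.dot(probs, values * (values < 1))
for l in range(1, k + 1):
    lo, hi = 2 ** (l - 1), 2 ** l
    mask = (values >= lo) & (values < hi)
    pieces += 2 ** l * np.dot(probs, (values * mask) / 2 ** l)
pieces += np.dot(probs, values * (values >= 2 ** k))

print(mean, pieces)    # identical up to floating-point error
```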

Lemma 4. Let |ψ⟩ = A|0ⁿ⟩, U = 2|ψ⟩⟨ψ| − I. Algorithm 2 uses O(log(1/ε) log log(1/ε)) copies
of |ψ⟩, uses U O((1/ε) log^{3/2}(1/ε) log log(1/ε)) times, and estimates E[v(A)] up to additive error
ε(‖v(A)‖₂ + 1)² with probability at least 4/5.

Proof. We first show the resource bounds. Algorithm 1 is run Θ(log 1/ε) times, each time with pa-
rameter δ = Ω(1/(log 1/ε)). By Theorem 3, each use of Algorithm 1 consumes O(log log 1/ε) copies
of |ψ⟩ and uses U O((1/ε)√(log(1/ε)) log log(1/ε)) times. The total number of copies of |ψ⟩ used is
O(log(1/ε) log log(1/ε)), and the total number of uses of U is O((1/ε) log^{3/2}(1/ε) log log(1/ε)).
All of the uses of Algorithm 1 succeed, except with probability at most 1/5 in total. To estimate
the total error in the case where they all succeed, we write

$$\mathrm{E}[v(A)] = \mathrm{E}[v(A_{0,1})] + \sum_{\ell=1}^{k} 2^\ell\, \mathrm{E}[v(A_{2^{\ell-1},2^\ell})/2^\ell] + \mathrm{E}[v(A_{\ge 2^k})]$$

and use the triangle inequality term by term to obtain

$$|\tilde{\mu} - \mathrm{E}[v(A)]| \le |\tilde{\mu}_0 - \mathrm{E}[v(A_{0,1})]| + \sum_{\ell=1}^{k} 2^\ell\, |\tilde{\mu}_\ell - \mathrm{E}[v(A_{2^{\ell-1},2^\ell})/2^\ell]| + \mathrm{E}[v(A_{\ge 2^k})].$$

Let p(x) denote the probability that A outputs x. We have

$$\mathrm{E}[v(A_{\ge 2^k})] = \sum_{x \ge 2^k} p(x)\, x \le \frac{1}{2^k} \sum_x p(x)\, x^2 = \frac{\|v(A)\|_2^2}{2^k}.$$

By Theorem 3,

$$|\tilde{\mu}_0 - \mathrm{E}[v(A_{0,1})]| \le C\left(\frac{\sqrt{\mathrm{E}[v(A_{0,1})]}}{t_0} + \frac{1}{t_0^2}\right)$$

and similarly

$$|\tilde{\mu}_\ell - \mathrm{E}[v(A_{2^{\ell-1},2^\ell})/2^\ell]| \le C\left(\frac{\sqrt{\mathrm{E}[v(A_{2^{\ell-1},2^\ell})]}}{t_0\, 2^{\ell/2}} + \frac{1}{t_0^2}\right).$$

So the total error is at most

$$C\left(\frac{\sqrt{\mathrm{E}[v(A_{0,1})]}}{t_0} + \frac{1}{t_0^2} + \sum_{\ell=1}^{k} 2^\ell\left(\frac{\sqrt{\mathrm{E}[v(A_{2^{\ell-1},2^\ell})]}}{t_0\, 2^{\ell/2}} + \frac{1}{t_0^2}\right)\right) + \frac{\|v(A)\|_2^2}{2^k}.$$

We apply Cauchy-Schwarz to the first part of each term in the sum:

$$\sum_{\ell=1}^{k} 2^{\ell/2} \sqrt{\mathrm{E}[v(A_{2^{\ell-1},2^\ell})]} \le \sqrt{k} \left(\sum_{\ell=1}^{k} 2^\ell\, \mathrm{E}[v(A_{2^{\ell-1},2^\ell})]\right)^{1/2} \le \sqrt{2k}\, \|v(A)\|_2,$$

where the second inequality follows from

$$\mathrm{E}[v(A_{2^{\ell-1},2^\ell})] = \sum_{2^{\ell-1} \le x < 2^\ell} p(x)\, x \le \frac{1}{2^{\ell-1}} \sum_{2^{\ell-1} \le x < 2^\ell} p(x)\, x^2 = \frac{\|v(A_{2^{\ell-1},2^\ell})\|_2^2}{2^{\ell-1}}.$$

Inserting this bound and using E[v(A_{0,1})] ≤ 1, we obtain

$$|\tilde{\mu} - \mathrm{E}[v(A)]| \le C\left(\frac{1}{t_0} + \frac{1}{t_0^2} + \frac{\sqrt{2k}\, \|v(A)\|_2}{t_0} + \frac{2^{k+1}}{t_0^2}\right) + \frac{\|v(A)\|_2^2}{2^k}.$$
Inserting the definitions of t₀ and k, we get an overall error bound

$$|\tilde{\mu} - \mathrm{E}[v(A)]| \le \frac{C\epsilon}{D}\left(\frac{1}{\sqrt{\log_2 1/\epsilon}} + \frac{\epsilon}{D \log_2 1/\epsilon} + \sqrt{2}\, \|v(A)\|_2 \left(1 + \frac{1}{\log_2 1/\epsilon}\right)^{1/2} + \frac{4}{D \log_2 1/\epsilon}\right) + \epsilon \|v(A)\|_2^2$$
$$\le \frac{C\epsilon}{D}\left(1 + \frac{1}{D} + 2\|v(A)\|_2 + \frac{4}{D}\right) + \epsilon \|v(A)\|_2^2 = \frac{C\epsilon}{D}\left(1 + \frac{5}{D} + 2\|v(A)\|_2\right) + \epsilon \|v(A)\|_2^2,$$

using 0 < ε < 1/2 in the second inequality. For a sufficiently large constant D, this is upper-bounded
by ε(‖v(A)‖₂ + 1)² as claimed.

Observe that, if E[v(A)²] = O(1), to achieve additive error ε the number of uses of A that
we need is O((1/ε) log^{3/2}(1/ε) log log(1/ε)). By the powering lemma, we can repeat Algorithm 2
O(log 1/δ) times and take the median to improve the probability of success to 1 − δ for any δ > 0.

2.3 Estimating the mean with bounded variance


We are now ready to formally state our algorithm for estimating the mean output value of an
arbitrary algorithm with bounded variance. For clarity, some of the steps are reordered as compared
with the informal description in Section 1.2. Recall that, in the classical setting, if we wish to
estimate E[v(A)] up to additive error ε for an arbitrary algorithm A such that

$$\mathrm{Var}(v(A)) := \mathrm{E}[(v(A) - \mathrm{E}[v(A)])^2] \le \sigma^2,$$

we need to use A Ω(σ²/ε²) times [15].
Input: an algorithm A such that Var(v(A)) ≤ σ² for some known σ, and an accuracy ε such
that ε < 4σ.

1. Set A′ = A/σ.

2. Run A′ once and let m̃ be the output.

3. Let B be the algorithm produced by executing A′ and subtracting m̃.

4. Apply Algorithm 2 to algorithms −B_{<0}/4 and B_{≥0}/4 with accuracy ε/(32σ) and failure
   probability 1/9, to produce estimates µ̃₋, µ̃₊ of E[v(−B_{<0})/4] and E[v(B_{≥0})/4], respec-
   tively.

5. Set µ̃ = m̃ − 4µ̃₋ + 4µ̃₊.

6. Output σµ̃.

Algorithm 3: Approximating the mean with bounded variance
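A purely classical sketch of the recombination performed by Algorithm 3 (Python/NumPy; the toy subroutine and the use of plain averaging in place of Algorithm 2 are assumptions for illustration only): rescale by σ, centre at a single sample m̃, estimate the negative and positive parts separately, then undo the shift and rescaling.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 2.0
run_A = lambda n=1: rng.normal(loc=5.0, scale=sigma, size=n)   # toy A, Var <= sigma^2

m_tilde = (run_A(1) / sigma)[0]               # step 2: one run of A' = A / sigma

samples_B = run_A(50_000) / sigma - m_tilde   # outputs of B = A' - m_tilde
neg_part = np.mean(np.where(samples_B < 0, -samples_B, 0.0) / 4)   # E[v(-B_{<0})/4]
pos_part = np.mean(np.where(samples_B >= 0, samples_B, 0.0) / 4)   # E[v(B_{>=0})/4]

mu_tilde = m_tilde - 4 * neg_part + 4 * pos_part
print(sigma * mu_tilde)                       # estimate of E[v(A)] = 5.0
```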

Theorem 5. Let |ψ⟩ = A|0ⁿ⟩, U = 2|ψ⟩⟨ψ| − I. Algorithm 3 uses O(log(σ/ε) log log(σ/ε)) copies
of |ψ⟩, uses U O((σ/ε) log^{3/2}(σ/ε) log log(σ/ε)) times, and estimates E[v(A)] up to additive error ε
with success probability at least 2/3.

Proof. First, observe that m̃ is quite close to µ′ := E[v(A′)] with quite high probability. As
Var(v(A′)) = Var(v(A))/σ² ≤ 1, by Chebyshev's inequality we have

$$\Pr[|v(A') - \mu'| \ge 3] \le \frac{1}{9}.$$

We therefore assume that |m̃ − µ′| ≤ 3. In this case we have

$$\|v(B)\|_2 = \mathrm{E}[v(B)^2]^{1/2} = \mathrm{E}[((v(A') - \mu') + (\mu' - \tilde{m}))^2]^{1/2} \le \mathrm{E}[(v(A') - \mu')^2]^{1/2} + \mathrm{E}[(\mu' - \tilde{m})^2]^{1/2} \le 4,$$

where the first inequality is the triangle inequality. Thus ‖v(B)/4‖₂ ≤ 1, which implies that
‖v(−B_{<0})/4‖₂ ≤ 1 and ‖v(B_{≥0})/4‖₂ ≤ 1.
The next step is to use Algorithm 2 to estimate E[v(−B_{<0})/4] and E[v(B_{≥0})/4] with accuracy
ε/(32σ) and failure probability 1/9. By Lemma 4, if the algorithm succeeds in both cases the
estimates are accurate up to ε/(8σ). We therefore obtain an approximation of each of E[v(−B_{<0})]
and E[v(B_{≥0})] up to additive error ε/(2σ). As we have

$$\mathrm{E}[v(A)] = \sigma\, \mathrm{E}[v(A')] = \sigma(\tilde{m} - \mathrm{E}[v(-B_{<0})] + \mathrm{E}[v(B_{\ge 0})])$$

by linearity of expectation, using a union bound we have that σµ̃ approximates E[v(A)] up to
additive error ε with probability at least 2/3.

2.4 Estimating the mean with bounded relative error


It is often useful to obtain an estimate of the mean output value of an algorithm which is accurate
up to small relative error, rather than the absolute error achieved by Algorithm 3. Assume that
we have the bound on the relative variance that Var(v(A))/(E[v(A)])² ≤ B, where we normally
think of B as small, e.g. B = O(1). Classically, it follows from Chebyshev's inequality that the
simple classical algorithm described in the Introduction approximates E[v(A)] up to additive error
ε E[v(A)] with O(B/ε²) uses of A. In the quantum setting, we can improve the dependence on ε
near-quadratically.

Input: An algorithm A such that v(A) ≥ 0 and Var(v(A))/(E[v(A)])² ≤ B for some B ≥ 1,
and an accuracy ε < 27B/4.

1. Run A k = ⌈32B⌉ times, receiving output values v₁, . . . , v_k, and set m̃ = (1/k) Σ_{i=1}^k v_i.

2. Apply Algorithm 2 to A/m̃ with accuracy 2ε/(3(2√B + 1)²) and failure probability 1/8.
   Let µ̃ be the output of the algorithm, multiplied by m̃.

3. Output µ̃.

Algorithm 4: Approximating the mean with bounded relative error

Theorem 6. Let |ψ⟩ = A|0ⁿ⟩, U = 2|ψ⟩⟨ψ| − I. Algorithm 4 uses O(B + log(1/ε) log log(1/ε))
copies of |ψ⟩, uses U O((B/ε) log^{3/2}(B/ε) log log(B/ε)) times, and outputs an estimate µ̃ such that

$$\Pr[|\tilde{\mu} - \mathrm{E}[v(A)]| \ge \epsilon\, \mathrm{E}[v(A)]] \le 1/4.$$

Proof. The complexity bounds follow from Lemma 4; we now analyse the claim about accuracy.
m̃ is a random variable whose expectation is E[v(A)] and whose variance is Var(v(A))/⌈32B⌉. By
Chebyshev's inequality, we have

$$\Pr[|\tilde{m} - \mathrm{E}[\tilde{m}]| \ge |\mathrm{E}[\tilde{m}]|/2] \le \frac{4\,\mathrm{Var}(\tilde{m})}{\mathrm{E}[\tilde{m}]^2} = \frac{4\,\mathrm{Var}(v(A))}{\lceil 32B \rceil\, \mathrm{E}[v(A)]^2} \le \frac{1}{8}.$$

We can thus assume that E[v(A)]/2 ≤ m̃ ≤ 3 E[v(A)]/2. In this case, when we apply Algorithm 2
to A/m̃, we receive an estimate of E[v(A)]/m̃ which is accurate up to additive error

$$\frac{2\epsilon(\|v(A)\|_2/\tilde{m} + 1)^2}{3(2\sqrt{B} + 1)^2} \le \frac{\epsilon\, \mathrm{E}[v(A)]\,(2\|v(A)\|_2/\mathrm{E}[v(A)] + 1)^2}{\tilde{m}\,(2\sqrt{B} + 1)^2} \le \frac{\epsilon\, \mathrm{E}[v(A)]}{\tilde{m}}$$

except with probability 1/8, where we use ‖v(A)‖₂/E[v(A)] ≤ √B. Multiplying by m̃ and taking
a union bound, we get an estimate of E[v(A)] which is accurate up to ε E[v(A)] except with probability at
most 1/4.

Once again, using the powering lemma we can repeat Algorithms 3 and 4 O(log 1/δ) times and
take the median to improve their probabilities of success to 1 − δ for any δ > 0.
To see that Algorithms 3 and 4 are close to optimal, we can appeal to a result of Nayak
and Wu [45]. Let A be an algorithm which picks an integer x between 1 and N uniformly at
random, for some large N , and outputs f (x) for some function f : {1, . . . , N } → {0, 1}. Then
E[v(A)] = |{x : f (x) = 1}|/N . It was shown by Nayak and Wu [45] that any quantum algorithm
which computes this quantity for an arbitrary function f up to (absolute or relative) error ε must
make at least Ω(1/ε) queries to f in the case that |{x : f(x) = 1}| = N/2. As the output of A
for any such function has variance 1/4, this implies that Algorithms 2 and 4 are optimal in the
black-box setting in terms of their scaling with ε, up to polylogarithmic factors. By rescaling, we
get a similar near-optimality claim for Algorithm 3 in terms of its scaling with σ.

3 Partition function problems
In this section we formally state and prove our results about partition function problems. We first
recall the definitions from Section 1.3. A partition function Z is defined by
$$Z(\beta) = \sum_{x \in \Omega} e^{-\beta H(x)},$$

where β is an inverse temperature and H is a Hamiltonian function taking integer values in the set
{0, . . . , n}. Let 0 = β0 < β1 < · · · < β` = ∞ be a sequence of inverse temperatures and assume
that we can easily compute Z(β0 ) = |Ω|. We want to approximate Z(∞) by approximating the
ratios αi := Z(βi+1 )/Z(βi ) and using the telescoping product

$$Z(\beta_\ell) = Z(\beta_0)\,\frac{Z(\beta_1)}{Z(\beta_0)}\,\frac{Z(\beta_2)}{Z(\beta_1)} \cdots \frac{Z(\beta_\ell)}{Z(\beta_{\ell-1})}.$$

Finally, a sequence of Gibbs distributions π_i is defined by

$$\pi_i(x) = \frac{1}{Z(\beta_i)}\, e^{-\beta_i H(x)}.$$

3.1 Chebyshev cooling schedules


We start by motivating, and formally defining, the concept of a Chebyshev cooling schedule [50].
To approximate αi we define the random variable

$$Y_i(x) = e^{-(\beta_{i+1} - \beta_i) H(x)}.$$

Then

$$\mathrm{E}[Y_i] := \mathrm{E}_{\pi_i}[Y_i] = \frac{1}{Z(\beta_i)} \sum_{x \in \Omega} e^{-\beta_i H(x)}\, e^{-(\beta_{i+1} - \beta_i) H(x)} = \frac{1}{Z(\beta_i)} \sum_{x \in \Omega} e^{-\beta_{i+1} H(x)} = \frac{Z(\beta_{i+1})}{Z(\beta_i)} = \alpha_i.$$

The following result was shown by Dyer and Frieze [19] (see [50] for the statement here):
Theorem 7. Let Y₀, . . . , Y_{ℓ−1} be independent random variables such that E[Y_i²]/E[Y_i]² ≤ B for all
i, and write Y = E[Y₀]E[Y₁] . . . E[Y_{ℓ−1}]. Let α̃_i be the average of 16Bℓ/ε² independent samples from
Y_i, and set Ỹ = α̃₀α̃₁ . . . α̃_{ℓ−1}. Then

$$\Pr[(1 - \epsilon)Y \le \tilde{Y} \le (1 + \epsilon)Y] \ge 3/4.$$

Thus a classical algorithm can approximate Z(∞) up to relative error ε using O(Bℓ²/ε²) sam-
ples in total, assuming that Z(0) can be computed without using any samples and that we have
E[Y_i²]/E[Y_i]² ≤ B. To characterise the latter constraint, observe that we have

$$\mathrm{E}[Y_i^2] = \frac{1}{Z(\beta_i)} \sum_{x \in \Omega} e^{-\beta_i H(x)}\, e^{-2(\beta_{i+1} - \beta_i) H(x)} = \frac{1}{Z(\beta_i)} \sum_{x \in \Omega} e^{(\beta_i - 2\beta_{i+1}) H(x)} = \frac{Z(2\beta_{i+1} - \beta_i)}{Z(\beta_i)},$$

so

$$\frac{\mathrm{E}[Y_i^2]}{(\mathrm{E}[Y_i])^2} = \frac{Z(2\beta_{i+1} - \beta_i)\, Z(\beta_i)}{Z(\beta_{i+1})^2}.$$
This motivates the following definition:

Definition 1 (Chebyshev cooling schedules [50]). Let Z be a partition function. Let β0 , . . . , β` be
a sequence of inverse temperatures such that 0 = β0 < β1 < · · · < β` = ∞. The sequence is called
a B-Chebyshev cooling schedule for Z if

$$\frac{Z(2\beta_{i+1} - \beta_i)\, Z(\beta_i)}{Z(\beta_{i+1})^2} \le B$$

for all i, for some fixed B.

Assume that we have a sequence of estimates α̃_i such that, for all i, |α̃_i − α_i| ≤ (ε/2ℓ) α_i with
probability at least 1 − 1/(4ℓ). We output as a final estimate

$$\tilde{Z} = Z(0)\, \tilde{\alpha}_0 \tilde{\alpha}_1 \cdots \tilde{\alpha}_{\ell-1}.$$

By a union bound, all of the estimates α̃_i are accurate to within (ε/2ℓ) α_i, except with probability
at most 1/4. Assuming that all the estimates are indeed accurate, we have

$$1 - \epsilon/2 \le (1 - \epsilon/(2\ell))^\ell \le \frac{\tilde{Z}}{Z(\infty)} \le (1 + \epsilon/(2\ell))^\ell \le e^{\epsilon/2} \le 1 + \epsilon$$

for ε < 1. Thus |Z̃ − Z(∞)| ≤ ε Z(∞) with probability at least 3/4.
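A one-line numerical check of this chain of inequalities (Python; the values of ε and ℓ are arbitrary):

```python
import math

eps, l = 0.2, 40
lower = (1 - eps / (2 * l)) ** l
upper = (1 + eps / (2 * l)) ** l
print(1 - eps / 2 <= lower, upper <= math.exp(eps / 2) <= 1 + eps)   # True True
```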
Using these ideas, we can formalise the discussion in Section 1.3.

Theorem 8. Let Z be a partition function with |Ω| = A. Assume that we are given a B-Chebyshev
cooling schedule 0 = β0 < β1 < · · · < β` = ∞ for Z. Further assume that we have the ability to
exactly sample from the distributions πi , i = 1, . . . , ` − 1. Then there is a quantum algorithm which
outputs an estimate Z̃ such that

$$\Pr[(1 - \epsilon)Z(\infty) \le \tilde{Z} \le (1 + \epsilon)Z(\infty)] \ge 3/4,$$

using

$$O\!\left(\frac{B\ell^2 \log \ell}{\epsilon} \log^{3/2}\!\left(\frac{B\ell}{\epsilon}\right) \log\log\!\left(\frac{B\ell}{\epsilon}\right)\right) = \tilde{O}\!\left(\frac{B\ell^2}{\epsilon}\right)$$

samples in total.

Proof. For each i = 1, . . . , ℓ − 1, we use Algorithm 4 to estimate E[Y_i] up to additive error
(ε/(2ℓ))E[Y_i] with failure probability 1/(4ℓ). As the β_i form a B-Chebyshev cooling schedule,
E[Y_i²]/E[Y_i]² ≤ B, so Var(Y_i)/E[Y_i]² ≤ B. By Theorem 6, each use of Algorithm 4 requires

$$O\!\left(\frac{B\ell}{\epsilon} \log^{3/2}\!\left(\frac{B\ell}{\epsilon}\right) \log\log\!\left(\frac{B\ell}{\epsilon}\right) \log \ell\right)$$

samples from π_i to achieve the desired accuracy and failure probability. The total number of
samples is thus

$$O\!\left(\frac{B\ell^2 \log \ell}{\epsilon} \log^{3/2}\!\left(\frac{B\ell}{\epsilon}\right) \log\log\!\left(\frac{B\ell}{\epsilon}\right)\right)$$

as claimed.

3.2 Approximate sampling
It is unfortunately not always possible to exactly sample from the distributions πi . However, one
classical way of approximately sampling from each of these distributions is to use a (reversible,
ergodic) Markov chain which has unique stationary distribution πi . Assume the Markov chain has
relaxation time τ , where τ := 1/(1 − |λ1 |), and λ1 is the second largest eigenvalue in absolute value.
Then one can sample from a distribution π̃_i such that ‖π̃_i − π_i‖ ≤ ε using O(τ log(1/(ε π_{min,i}))) steps
of the chain, where π_{min,i} = min_x |π_i(x)| [38]. We would like to replace the classical Markov chain
with a quantum walk, to obtain a faster mixing time. A construction due to Szegedy [51] defines
a quantum walk corresponding to any ergodic Markov chain, such that the dependence on τ in the
mixing time can be improved to O(√τ) [48]. Unfortunately, it is not known whether in general the
dependence on πmin,i can be kept logarithmic [48, 18]. Indeed, proving such a result is likely to be
hard, as it would imply a polynomial-time quantum algorithm for graph isomorphism [6].
Nevertheless, it was shown by Wocjan and Abeyesinghe [58] (improving previous work on us-
ing quantum walks for classical annealing [49]) that one can achieve relatively efficient quantum
sampling if one has access to a sequence of slowly varying Markov chains.
Theorem 9 (Wocjan and Abeyesinghe [58]). Let M0 , . . . , Mr be classical reversible Markov chains
with stationary distributions π0 , . . . , πr such that each chain has relaxation time at most τ . Assume
that |⟨π_i|π_{i+1}⟩|² ≥ p for some p > 0 and all i ∈ {0, . . . , r − 1}, and that we can prepare the state
|π₀⟩. Then, for any ε > 0, there is a quantum algorithm which produces a quantum state |π̃_r⟩ such
that ‖|π̃_r⟩ − |π_r⟩|0^a⟩‖ ≤ ε, for some integer a. The algorithm uses

$$O(r \sqrt{\tau} \log^2(r/\epsilon)\, (1/p) \log(1/p))$$

steps in total of the quantum walk operators W_i corresponding to the chains M_i.

In addition, one can approximately reflect about the states |πi i more efficiently still, with a
runtime that does not depend on r. This will be helpful because Algorithm 4 uses significantly
more reflections than it does copies of the starting state.
Theorem 10 (Wocjan and Abeyesinghe [58], see [59] for version here). Let M0 , . . . , Mr be classical
reversible Markov chains with stationary distributions π0 , . . . , πr such that each chain has relaxation
time at most τ. For each i there is an approximate reflection operator R̃_i such that

$$\tilde{R}_i |\phi\rangle|0^b\rangle = (2|\pi_i\rangle\langle\pi_i| - I)|\phi\rangle|0^b\rangle + |\xi\rangle,$$

where |φ⟩ is arbitrary, b = O((log τ)(log 1/ε)), and |ξ⟩ is a vector with ‖|ξ⟩‖ ≤ ε. The algorithm
uses O(√τ log(1/ε)) steps of the quantum walk operator W_i corresponding to the chain M_i.

In our setting, we can easily create the quantum state |π0 i, which is the uniform superposition
over all configurations x. We now show that the overlaps |hπi |πi+1 i|2 are large for all i. We go via
the chi-squared divergence

$$\chi^2(\nu, \pi) := \sum_{x \in \Omega} \pi(x) \left(\frac{\nu(x)}{\pi(x)} - 1\right)^2 = \sum_{x \in \Omega} \frac{\nu(x)^2}{\pi(x)} - 1.$$

As noted in [50], one can calculate that

$$\chi^2(\pi_{i+1}, \pi_i) = \frac{Z(\beta_i)\, Z(2\beta_{i+1} - \beta_i)}{Z(\beta_{i+1})^2} - 1. \quad (1)$$

Therefore, if the βi values form a Chebyshev cooling schedule, χ2 (πi+1 , πi ) ≤ B − 1 for all i. For
any distributions ν, π, we also have
$$\frac{1}{\sqrt{\chi^2(\nu, \pi) + 1}} = \frac{1}{\sqrt{\sum_{x \in \Omega} \nu(x)\, \frac{\nu(x)}{\pi(x)}}} \le \sum_{x \in \Omega} \nu(x) \sqrt{\frac{\pi(x)}{\nu(x)}} = \langle\nu|\pi\rangle$$

by applying Jensen's inequality to the function x ↦ 1/√x. So, for all i, |⟨π_i|π_{i+1}⟩|² ≥ 1/B. Note
that in [50] it was necessary to introduce the concept of a reversible Chebyshev cooling schedule to
facilitate “warm starts” of the Markov chains used in the algorithm. That work uses the fact that
one can efficiently sample from πi+1 , given access to samples from πi , if χ2 (πi , πi+1 ) = O(1); this
is the reverse of the condition (1). Here we do not need to reverse the schedule as the precondition
|hπi |πi+1 i|2 ≥ Ω(1) required for Theorem 9 is already symmetric.
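The overlap bound used here is easy to verify numerically. The sketch below (Python/NumPy, random toy distributions) checks that ⟨ν|π⟩ = Σ_x √(ν(x)π(x)) is always at least 1/√(χ²(ν, π) + 1):

```python
import numpy as np

rng = np.random.default_rng(4)
for _ in range(5):
    nu = rng.random(32); nu /= nu.sum()
    pi = rng.random(32); pi /= pi.sum()
    chi2 = np.sum(nu**2 / pi) - 1.0                 # chi-squared divergence chi^2(nu, pi)
    overlap = np.sum(np.sqrt(nu * pi))              # <nu|pi> for the quantum sample states
    print(overlap >= 1.0 / np.sqrt(chi2 + 1.0))     # always True
```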
We are now ready to formally state our result about approximating partition functions. We
assume that ε is relatively small to simplify the bounds; this is not an essential restriction.

Theorem 11. Let Z be a partition function. Assume we have a B-Chebyshev cooling schedule
β0 = 0 < β1 < β2 < · · · < β` = ∞ for B = O(1). Assume that for every inverse temperature βi
we have a reversible ergodic Markov chain Mi with stationary distribution πi and relaxation time
upper-bounded by τ. Further assume that we can sample directly from M₀. Then, for any δ > 0
and ε = O(1/√(log ℓ)), there is a quantum algorithm which uses

$$O((\ell^2 \sqrt{\tau}/\epsilon) \log^{5/2}(\ell/\epsilon) \log(\ell/\delta) \log\log(\ell/\epsilon)) = \tilde{O}(\ell^2 \sqrt{\tau}/\epsilon)$$

steps of the quantum walks corresponding to the M_i chains and outputs Z̃ such that

$$\Pr[(1 - \epsilon)Z(\infty) \le \tilde{Z} \le (1 + \epsilon)Z(\infty)] \ge 1 - \delta.$$

Proof. For each i, we use Algorithm 4 to approximate α_i up to relative error ε/(2ℓ), with failure
probability γ, for some small constant γ. This would require R reflections about the state |π_{β_i}⟩, for
some R such that R = O((ℓ/ε) log^{3/2}(ℓ/ε) log log(ℓ/ε)), and O(log(ℓ/ε) log log(ℓ/ε)) copies of |π_{β_i}⟩.
Instead of performing exact reflections and using exact copies of the states |π_i⟩, we use approx-
imate reflections and approximate copies of |π_i⟩. By Theorem 10, O(√τ log(1/ε_r)) walk operations
are sufficient to reflect about |π_i⟩ up to an additive error term of order ε_r. By Theorem 9, as we
have a Chebyshev cooling schedule, a quantum state |π̃_i⟩ such that ‖|π̃_i⟩ − |π_i⟩|0^b⟩‖ ≤ ε_s can be
produced using O(ℓ√τ log²(ℓ/ε_s)) steps of the quantum walks corresponding to the Markov chains
M₀, . . . , M_i.
We choose ε_r = γ/R, ε_s = γ. Then the final state of Algorithm 4 using approximate reflections
and starting with the states |π̃_i⟩ rather than |π_i⟩ can differ from the final state of an exact algorithm
by at most Rε_r + ε_s = 2γ in ℓ₂ norm. This implies that the total variation distance between the
output probability distributions of the exact and inexact algorithms is at most 2γ, and hence by a
union bound that the approximation is accurate up to relative error ε/(2ℓ) except with probability
3γ. For each i, we then take the median of O(log(ℓ/δ)) estimates to achieve an estimate which is
accurate up to relative error ε/(2ℓ) except with probability at most δ/ℓ. By a union bound, all
the estimates are accurate up to relative error ε/(2ℓ) except with probability at most δ, so their
product is accurate to relative error ε except with probability at most δ.
The total number of steps needed to produce all the copies of the states |π̃_i⟩ required is thus

$$O(\ell \cdot \ell \sqrt{\tau} (\log^2 \ell) \cdot \log(\ell/\epsilon) \log\log(\ell/\epsilon) \cdot \log(\ell/\delta))$$

and the total number of steps needed to perform the reflections is

$$O(\ell \cdot \sqrt{\tau} (\log R) \cdot R \cdot \log(\ell/\delta)).$$

Adding the two, substituting the value of R, and using ε = O(1/√(log ℓ)), we get an overall bound
of

$$O((\ell^2 \sqrt{\tau}/\epsilon) \log^{5/2}(\ell/\epsilon) \log(\ell/\delta) \log\log(\ell/\epsilon)) = \tilde{O}(\ell^2 \sqrt{\tau}/\epsilon)$$

as claimed.

We remark that, in the above complexities, we have chosen to take the number of quantum
walk steps used as our measure of complexity. This is to enable a straightforward comparison with
the classical literature, which typically uses a random walk step as its elementary operation for
the purposes of measuring complexity [50]. To implement each quantum walk step efficiently and
accurately, two possible approaches are to use efficient state preparation [14] or recently developed
approaches to efficient simulation of sparse Hamiltonians [9].
Finally, we mention that one could also consider a more general setting for approximate sam-
pling. Imagine that we would like to approximate the mean µ of some random variable chosen
according to some distribution π, but only have access to samples from a distribution π̃ that ap-
proximates π (using some method which, for example, might not be a quantum walk). In this case,
one can show that the estimation algorithm does not notice the difference between π̃ and π and
hence allows efficient estimation of µ. See Appendix A for the details.

3.3 Computing a Chebyshev cooling schedule


We still need to show that, given a particular partition function, we can actually find a Chebyshev
cooling schedule. For this we simply use a known classical result:

Theorem 12 (Štefankovič, Vempala and Vigoda [50]). Let Z be a partition function. Assume that
for every inverse temperature β we have a Markov chain Mβ with stationary distribution πβ and
relaxation time upper-bounded by τ . Further assume that we can sample directly from M0 . Then,
for any δ > 0 and any B = O(1), we can produce a B-Chebyshev cooling schedule of length
$$\ell = O(\sqrt{\log A}\, (\log n)(\log\log A))$$

with probability at least 1 − δ, using at most

Q = O((log A)((log n) + log log A)5 τ log(1/δ))

steps of the Markov chains.

We remark that a subsequent algorithm [28] improves the polylogarithmic terms and the hidden
constant factors in the complexity. However, this algorithm assumes that we can efficiently generate
independent samples from distributions approximating πβ for arbitrary β. The most efficient general
algorithm known [50] for approximately sampling from arbitrary distributions πβ uses “warm starts”
and hence does not produce independent samples.
Combining all the ingredients, we have the following result:

Corollary 13. Let Z be a partition function and let ε > 0 be a desired precision such that ε =
O(1/√(log log A)). Assume that for every inverse temperature β we have a Markov chain M_β with
stationary distribution π_β and relaxation time upper-bounded by τ. Further assume that we can
sample directly from M₀. Then, for any δ > 0, there is a quantum algorithm which uses

$$O\big(((\log A)(\log^2 n)(\log\log A)^2 \sqrt{\tau}/\epsilon) \log^{5/2}((\log A)/\epsilon) \log((\log A)/\delta) \log\log((\log A)/\epsilon)$$
$$+ (\log A)((\log n) + \log\log A)^5\, \tau \log(1/\delta)\big) = \tilde{O}((\log A)\sqrt{\tau}(1/\epsilon + \sqrt{\tau}))$$

steps of the M_β chains and their corresponding quantum walk operations, and outputs Z̃ such that

$$\Pr[(1 - \epsilon)Z(\infty) \le \tilde{Z} \le (1 + \epsilon)Z(\infty)] \ge 1 - \delta.$$

The best comparable classical result known is Õ((log A)τ/ε²) [50]. We therefore see that we
have achieved a near-quadratic reduction in the complexity with respect to both τ and ε, assuming
that ε ≤ 1/√τ. Otherwise, we still achieve a near-quadratic reduction with respect to ε.
Note that, if we could find a quantum algorithm that outputs a Chebyshev cooling schedule
using Õ((log A)√τ) steps of the Markov chains, Corollary 13 would be improved to a complexity
of Õ((log A)√τ/ε). It is instructive to note why this does not seem to be immediate. The classical
algorithm for this problem [50] needs to approximately sample from Markov chains M_β for arbitrary
values of β. To do this, it starts by fixing a nonadaptive Chebyshev cooling schedule 0 < β′₁ <
β′₂ < · · · < β′_ℓ = ∞ of length ℓ = Õ(log A). When the algorithm wants to sample from M_β with
β′_i < β < β′_{i+1}, the algorithm uses an approximate sample from M_{β′_i} as a “warm start”. To produce
one sample corresponding to each β′_i value requires Õ(ℓτ) samples, because each M_{β′_i} also provides
a warm start for M_{β′_{i+1}}. But, in the quantum case, this does not work because, by no-cloning, the
states |π_{β′_i}⟩ cannot be reused in this way to provide warm starts for multiple runs of the algorithm.

3.4 Some partition function problems


In this section we describe some representative applications of our results to problems in statistical
physics and computer science.
The ferromagnetic Ising model. This well-studied statistical physics model is defined in
terms of a graph G = (V, E) by the Hamiltonian
$$H(z) = -\sum_{(u,v) \in E} z_u z_v,$$

where |V | = n and z ∈ {±1}n . A standard method to approximate the partition function of the
Ising model uses the Glauber dynamics. This is a simple Markov chain with state space {±1}n ,
each of whose transitions involves only updating individual sites, and whose stationary distribution
is the Gibbs distribution
$$\pi_\beta(z) = \frac{1}{Z(\beta)}\, e^{-\beta H(z)}.$$
This Markov chain, which has been intensively studied for decades, is known to mix rapidly in
certain regimes [41]. Here we mention just one representative recent result:

Theorem 14 (Mossel and Sly [44]). For any integer d > 2, and inverse temperature β > 0 such
that (d − 1) tanh β < 1, the mixing time of the Glauber dynamics on any graph of maximum degree
d is O(n log n).

(More precise results than Theorem 14 are known for certain specific graphs such as lattices [42].) As we have A = 2^n, in the regime where (d − 1) tanh β < 1 the quantum algorithm approximates Z(β) to within ε relative error in Õ(n^{3/2}/ε + n²) steps. The fastest known classical algorithm with rigorously proven performance bounds [50] uses time Õ(n²/ε²). We remark that an alternative approach of Jerrum and Sinclair [34], which is based on analysing a different Markov chain, gives a polynomial-time classical algorithm which works for any temperature, but is substantially slower.
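For concreteness, one step of the heat-bath Glauber dynamics can be sketched as follows (a minimal illustrative sketch in Python; the function name and data layout are chosen here and are not taken from [44] or [50]). Each step resamples a uniformly random site from its conditional Gibbs distribution, and in the regime of Theorem 14 order n log n such updates suffice to approximately sample from π_β.

import math
import random

def glauber_step(spins, adj, beta):
    """One heat-bath Glauber update for the ferromagnetic Ising model.

    spins: list of +1/-1 values, one per vertex
    adj:   adjacency list; adj[v] is an iterable of neighbours of v
    beta:  inverse temperature
    """
    v = random.randrange(len(spins))          # choose a site uniformly at random
    field = sum(spins[u] for u in adj[v])     # local field produced by the neighbours of v
    # Conditional Gibbs probability of spins[v] = +1 given all other spins:
    # proportional to exp(+beta*field), i.e. equal to 1/(1 + exp(-2*beta*field)).
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * field))
    spins[v] = 1 if random.random() < p_plus else -1
    return spins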
Counting colourings. Here we are given as input a graph G with n vertices and maximum
degree d. We seek to approximately count the number of valid k-colourings of G, where a colouring
of the vertices is valid if all pairs of neighbouring vertices are assigned different colours, and k =
O(1). In physics, this problem corresponds to the partition function of the Potts model evaluated
at zero temperature. It is known that the Glauber dynamics for the Potts model mixes rapidly in
some cases [20]. One particularly clean result of this form is work of Jerrum [31] showing that this
Markov chain mixes in time O(n log n) if k > 2d. As here A = k^n, we obtain a quantum algorithm approximating the number of colourings of G up to relative error ε in Õ(n^{3/2}/ε + n²) steps, as compared with the classical Õ(n²/ε²) [50].
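For concreteness, one step of the single-site recolouring chain analysed by Jerrum [31] can be sketched as follows (again an illustrative Python sketch, with our own names and data layout): pick a uniformly random vertex and a uniformly random colour, and recolour only if the colouring stays proper. By the result quoted above, for k > 2d order n log n such steps suffice to mix.

import random

def recolouring_step(colouring, adj, k):
    """One step of the Glauber dynamics on proper k-colourings."""
    v = random.randrange(len(colouring))        # uniformly random vertex
    c = random.randrange(k)                     # uniformly random colour
    if all(colouring[u] != c for u in adj[v]):  # keep the colouring proper
        colouring[v] = c
    return colouring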
Counting matchings. A matching in a graph G is a subset M of the edges of G such that no
pair of edges in M shares a vertex. In statistical physics, matchings are often known as monomer-
dimer coverings [26]. To count the number of matchings, we consider the partition function
Z(\beta) = \sum_{M \in \mathcal{M}} e^{-\beta|M|},

where \mathcal{M} is the set of matchings of G. We have Z(0) = |\mathcal{M}|, while Z(∞) = 1, as in this case every term in the sum is zero except the one corresponding to the empty matching (0^0 = 1). Therefore, in this case we seek to approximate Z(0) using a telescoping product which starts with Z(∞). In terms of the cooling schedule 0 = β_0 < β_1 < · · · < β_ℓ = ∞, we have

Z(\beta_0) = Z(\beta_\ell)\, \frac{Z(\beta_{\ell-1})}{Z(\beta_\ell)}\, \frac{Z(\beta_{\ell-2})}{Z(\beta_{\ell-1})} \cdots \frac{Z(\beta_0)}{Z(\beta_1)}.
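To see why each factor in this reversed product is still amenable to estimation by sampling, note the following routine manipulation (included here for orientation; it is not quoted from earlier sections, and the endpoint β_ℓ = ∞ has to be treated as a limiting case). For finite β_i < β_{i+1}, drawing M from the Gibbs distribution π_{β_{i+1}}(M) ∝ e^{−β_{i+1}|M|} and averaging e^{(β_{i+1}−β_i)|M|} gives an unbiased estimate of Z(β_i)/Z(β_{i+1}):

E_{M \sim \pi_{\beta_{i+1}}}\left[e^{(\beta_{i+1}-\beta_i)|M|}\right] = \sum_{M \in \mathcal{M}} \frac{e^{-\beta_{i+1}|M|}}{Z(\beta_{i+1})}\, e^{(\beta_{i+1}-\beta_i)|M|} = \frac{1}{Z(\beta_{i+1})} \sum_{M \in \mathcal{M}} e^{-\beta_i|M|} = \frac{Z(\beta_i)}{Z(\beta_{i+1})}.

A calculation along the same lines shows that the relative second moment of this estimator is exactly Z(2β_i − β_{i+1})Z(β_{i+1})/Z(β_i)², which is the quantity controlled by the bound sought below.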

As we have reversed our usage of the cooling schedule, rather than looking for it to be a B-Chebyshev
cooling schedule we instead seek the bound

\frac{Z(2\beta_i - \beta_{i+1})\, Z(\beta_{i+1})}{Z(\beta_i)^2} \le B

to hold for all i = 0, . . . , ℓ − 1. That is, the roles of β_i and β_{i+1} have been reversed as compared with (1). However, the classical algorithm for producing a cooling schedule can be modified to output a “reversible” schedule where this constraint is satisfied too, with only a logarithmic increase in complexity [50]. In addition, it was shown by Jerrum and Sinclair [33, 32] that, for any β, there is
a simple Markov chain which has stationary distribution π, where
\pi(M) = \frac{1}{Z(\beta)}\, e^{-\beta|M|}, \qquad M \in \mathcal{M},

and which has relaxation time τ = O(nm) on a graph with n vertices and m edges. Finally, in the setting of matchings, A = O(n!2^n). Putting these parameters together, we obtain a quantum complexity Õ(n^{3/2}m^{1/2}/ε + n²m), as compared with the lowest known classical bound Õ(n²m/ε²) [50].
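For intuition, a simplified Metropolis-style chain on matchings with stationary distribution proportional to e^{−β|M|} can be sketched as follows. This Python sketch (with our own names and details) only illustrates the kind of local moves involved; it is not the chain of [33, 32] for which the O(nm) relaxation-time bound is proved.

import math
import random

def matching_step(matching, edges, beta):
    """One Metropolis update on the matchings of a graph.

    matching: set of edges currently in the matching (each edge a frozenset {u, v})
    edges:    list of all edges of the graph, each a frozenset {u, v}
    beta:     inverse temperature; the stationary weight of M is exp(-beta*|M|)
    """
    e = random.choice(edges)     # propose toggling a uniformly random edge
    if e in matching:
        # Removing an edge multiplies the weight by exp(+beta) >= 1: always accept.
        matching.remove(e)
    else:
        u, v = tuple(e)
        matched = {x for f in matching for x in f}
        if u not in matched and v not in matched:
            # Adding an edge multiplies the weight by exp(-beta): accept with that probability.
            if random.random() < math.exp(-beta):
                matching.add(e)
        # Otherwise the move would not produce a matching; do nothing.
    return matching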

4 Estimating the total variation distance
Here we give the technical details of our improvement of the accuracy of a quantum algorithm of
Bravyi, Harrow and Hassidim [13] for estimating the total variation distance between probability
distributions. In this setting, we are given the ability to sample from probability distributions p and q on n elements, and would like to estimate ‖p − q‖ := (1/2)‖p − q‖_1 = (1/2) \sum_{x∈[n]} |p(x) − q(x)| up to additive error ε. Classically, estimating ‖p − q‖ up to error, say, 0.01 cannot be achieved using O(n^α) samples for any α < 1 [54], but in the quantum setting the dependence on n can be improved quadratically:
Theorem 15 (Bravyi, Harrow and Hassidim [13]). Given the ability to sample from p and q, there is a quantum algorithm which estimates ‖p − q‖ up to additive error ε, with probability of success 1 − δ, using O(√n/(ε⁸δ⁵)) samples.

Here we will use Theorem 3 to improve the dependence on ε and δ of this algorithm. We will
approximate the mean output value of the following algorithm, which was a subroutine previously
used in [13].

Let p and q be probability distributions on n elements and let r = (p + q)/2.

1. Draw a sample x ∈ [n] according to r.

2. Use amplitude estimation with t queries, for some t to be determined, to obtain estimates p̃(x), q̃(x) of the probability of obtaining outcome x under distributions p and q.

3. Output |p̃(x) − q̃(x)|/(p̃(x) + q̃(x)).

Algorithm 5: Subroutine for estimating the total variation distance

If the estimates p̃(x), q̃(x) were precisely accurate, the expected output of the subroutine would be

E := \sum_{x \in [n]} \left(\frac{p(x) + q(x)}{2}\right) \frac{|p(x) - q(x)|}{p(x) + q(x)} = \frac{1}{2} \sum_{x \in [n]} |p(x) - q(x)| = \|p - q\|.
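This identity is easy to check numerically with a purely classical analogue of the subroutine, in which the amplitude-estimation step is replaced by exact knowledge of p(x) and q(x); the Python sketch below (names and parameters chosen here for illustration) demonstrates only the sampling structure, not the quantum speedup.

import random

def tv_subroutine_mean(p, q, samples):
    """Classical analogue of Algorithm 5: sample x according to r = (p+q)/2 and
    average |p(x)-q(x)|/(p(x)+q(x)); the expectation equals the total variation
    distance between p and q."""
    n = len(p)
    r = [(p[x] + q[x]) / 2 for x in range(n)]
    total = 0.0
    for _ in range(samples):
        x = random.choices(range(n), weights=r)[0]   # draw x according to r
        total += abs(p[x] - q[x]) / (p[x] + q[x])
    return total / samples

# Example: for these distributions the estimate converges to (0.3 + 0 + 0.3)/2 = 0.3.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(tv_subroutine_mean(p, q, 100000))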

We now bound how far the expected output Ẽ of the algorithm is from this exact value. By linearity of expectation,

|\tilde{E} - E| = \Big|\sum_{x \in [n]} r(x)\, E[\tilde{d}(x) - d(x)]\Big| \le \sum_{x \in [n]} r(x)\, E[|\tilde{d}(x) - d(x)|],

where d(x) = |p(x) − q(x)|/(p(x) + q(x)), d̃(x) = |p̃(x) − q̃(x)|/(p̃(x) + q̃(x)). Note that d̃(x) is a random variable. Split [n] into “small” and “large” parts according to whether r(x) ≤ ε/n. Then

|\tilde{E} - E| \le \sum_{x,\, r(x) \le \epsilon/n} r(x)\, E[|\tilde{d}(x) - d(x)|] + \sum_{x,\, r(x) \ge \epsilon/n} r(x)\, E[|\tilde{d}(x) - d(x)|]
\le \epsilon + \sum_{x,\, r(x) \ge \epsilon/n} r(x)\, E[|\tilde{d}(x) - d(x)|]

using that 0 ≤ d(x), d̃(x) ≤ 1. From Theorem 2, for any δ > 0 we have

|\tilde{p}(x) - p(x)| \le \frac{2\pi\sqrt{p(x)}}{t} + \frac{\pi^2}{t^2}

except with probability at most δ, using O(t log 1/δ) samples from p. If t ≥ 4π/(η√(p(x) + q(x))) for some 0 ≤ η ≤ 1, this implies that

|\tilde{p}(x) - p(x)| \le \frac{2\pi\eta\sqrt{p(x)}\sqrt{p(x) + q(x)}}{4\pi} + \frac{\pi^2\eta^2(p(x) + q(x))}{16\pi^2} \le \eta(p(x) + q(x))

except with probability at most δ. A similar claim also holds for |q̃(x) − q(x)|. We now use the following technical result from [13]:

Proposition 16. Consider a real-valued function f(p, q) = (p − q)/(p + q) where 0 ≤ p, q ≤ 1. Assume that |p − p̃|, |q − q̃| ≤ η(p + q) for some η ≤ 1/5. Then |f(p, q) − f(p̃, q̃)| ≤ 5η.
By Proposition 16, for all x such that t ≥ 4π/(η√(p(x) + q(x))) we have |d̃(x) − d(x)| ≤ 5η, except with probability at most 2δ. We now fix t = ⌈10√2 π√n ε^{−3/2}⌉. Then, for all x such that p(x) + q(x) ≥ 2ε/n, |d̃(x) − d(x)| ≤ ε except with probability at most 2δ. Thus, for all x such that r(x) ≥ ε/n,

E[|\tilde{d}(x) - d(x)|] \le 2\delta + (1 - 2\delta)\epsilon \le 2\delta + \epsilon.

Taking δ = ε, we have

|\tilde{E} - E| \le 4\epsilon

for any ε, using O(√n ε^{−3/2} log(1/ε)) samples. It therefore suffices to use O(√n ε^{−3/2} log(1/ε)) samples to achieve |Ẽ − E| ≤ ε/2. As the output of this subroutine is bounded between 0 and 1, to approximate Ẽ up to additive error ε/2 with failure probability δ, it suffices to use the subroutine O((1/ε) log(1/δ)) times by Theorem 3. So the overall complexity is O((√n/ε^{5/2}) log(1/ε) log(1/δ)). For small ε and δ this is a substantial improvement on the O(√n/(ε⁸δ⁵)) complexity stated by Bravyi, Harrow and Hassidim [13].

Acknowledgements
This work was supported by the UK EPSRC under Early Career Fellowship EP/L021005/1. I
would like to thank Aram Harrow for helpful conversations and pointing out references, and Daniel
Lidar for supplying further references. I would also like to thank several anonymous referees for
their helpful comments. Special thanks to Tongyang Li for pointing out an error in Section 4.

A Stability of Algorithm 3
It is often the case that one wishes to estimate some quantity of interest defined in terms of samples
from some probability distribution π, but can only sample from a distribution π̃ which is close to π in total variation distance (for example, using Markov chain Monte Carlo methods). We now show that, if Algorithm 3 is given access to samples from π̃ rather than π, it does not notice the difference. We will need the following claim.

Claim 17. For any x, y ∈ [0, 1],

|\arcsin x - \arcsin y| \le \frac{\pi}{2}\sqrt{|x^2 - y^2|}.

Proof. We use a standard addition formula for arcsin to obtain

|\arcsin x - \arcsin y| = \left|\arcsin\!\left(x\sqrt{1 - y^2} - y\sqrt{1 - x^2}\right)\right| \le \frac{\pi}{2}\left|\sqrt{x^2(1 - y^2)} - \sqrt{y^2(1 - x^2)}\right| \le \frac{\pi}{2}\sqrt{|x^2 - y^2|},

where the first inequality is sin θ ≥ (2/π)θ for all θ ∈ [0, π/2], and the second inequality is

|a - b| \le \sqrt{|a - b|(a + b)} = \sqrt{|a^2 - b^2|},

valid for all non-negative a and b.

Lemma 18. Let A and B be algorithms with distributions D_A and D_B on their output values, such that ‖D_A − D_B‖ ≤ γ, for some γ. Assume that Algorithm 3 is applied to A, and uses the operator U = 2|ψ⟩⟨ψ| − I T times, where |ψ⟩ = A|0⟩. Then the algorithm estimates E[v(B)] up to additive error ε except with probability at most 3/10 + (π²/√6) T√γ.

Lemma 18 is reminiscent of the hybrid argument for proving lower bounds on quantum query
complexity [8]: if the distributions DA and DB are close, and the amplitude amplification algorithm
makes few queries, it cannot distinguish them. However, here the quantifiers appear in a different
order: whereas in the setting of lower-bounding quantum query complexity we wish to show that
there exist pairs of distributions which are indistinguishable by any possible algorithm, here we
wish to show that one fixed algorithm cannot distinguish any pair of close distributions.
Also note that Wocjan et al. [59] proved a similar result in the setting where we are given access to an approximate rotation Ũ ≈ U. However, the result here is more general, in that we do not assume that |φ⟩ = B|0⟩ is close to |ψ⟩, but merely that the measured probability distributions are close.

Proof. We first use the calculations for the output probabilities of the amplitude estimation algo-
rithm from [12] when applied as in Theorem 3 with t queries to an algorithm with mean output
value µA , and another with mean output value µB .
For x, y ∈ R, define d(x, y) = min_{z∈Z} |z + x − y|. 2πd(x, y) is the length of the shortest arc on the unit circle between e^{2πix} and e^{2πiy}. Let ω_A and ω_B be defined by sin²ω_A = µ_A, sin²ω_B = µ_B, and set ∆ = d(ω_A, ω_B). Finally, let M_A and M_B be the distributions over the measurement outcomes when amplitude estimation is applied to estimate µ_A, µ_B.
The distribution on the measurement outcomes of the amplitude estimation algorithm after t
uses of the input operator, when applied to a phase of ω, is equivalent [12] to that obtained by
measuring the state
|S_t(\omega)\rangle := \frac{1}{\sqrt{t}} \sum_{y \in [t]} e^{2\pi i \omega y} |y\rangle,

so the total variation distance between the distributions M_A and M_B obeys the bound

\|M_A - M_B\|^2 \le 1 - |\langle S_t(\omega_A)|S_t(\omega_B)\rangle|^2 = 1 - \frac{\sin^2(\pi t\Delta)}{t^2\sin^2(\pi\Delta)},

where the inequality is standard [46] and the equality is [12, Lemma 10]. Using the inequalities

\theta - \frac{\theta^3}{6} \le \sin\theta \le \theta,

valid for θ ≥ 0, we obtain

\|M_A - M_B\|^2 \le 1 - \left(\frac{\pi t\Delta - (\pi t\Delta)^3/6}{t\pi\Delta}\right)^2 = 1 - \left(1 - \frac{(\pi t\Delta)^2}{6}\right)^2 \le \frac{(\pi t\Delta)^2}{3}.

As we have

\Delta = \min_{z \in \mathbb{Z}} |z + \omega_A - \omega_B| \le |\omega_A - \omega_B| \le \frac{\pi}{2}\sqrt{|\mu_A - \mu_B|}

by Claim 17, we have

\|M_A - M_B\| \le \frac{\pi^2}{2\sqrt{3}}\, t\sqrt{|\mu_A - \mu_B|}.
Within Algorithm 2, Theorem 3 is applied to v(A_{2^{ℓ−1},2^ℓ})/2^ℓ for various values of ℓ. We have

\left|E[v(A_{2^{\ell-1},2^\ell})/2^\ell] - E[v(B_{2^{\ell-1},2^\ell})/2^\ell]\right| = \frac{1}{2^\ell}\left|\sum_{2^{\ell-1} \le x < 2^\ell} x\,(\Pr[v(A) = x] - \Pr[v(B) = x])\right| \le \sum_x |\Pr[v(A) = x] - \Pr[v(B) = x]| = 2\|D_A - D_B\| \le 2\gamma.

Thus, for each run of the algorithm which uses A t times,

\|M_A - M_B\| \le \frac{\pi^2}{\sqrt{3}}\, t\sqrt{\gamma}.
This is equivalent to the output of the algorithm being a probabilistic mixture of M_B and some other distribution M, where the probability of it being M is at most (π²/√3) t√γ.
Algorithm 3 uses A T times in total. Each use of A is either within Algorithm 2 or one separate
sample from v(A) in Algorithm 3. We can similarly think of this sample as being taken from B,
except with probability at most γ ≤ (π²/√3)√γ. Taking a union bound over all uses of A, we get the
claimed bound.

References
[1] D. Abrams and C. Williams. Fast quantum algorithms for numerical integrals and stochastic
processes, 1999. quant-ph/9908083.

[2] D. Aharonov. Quantum computation. In Annual Reviews of Computational Physics VI, chap-
ter 7, pages 259–346. World Scientific, 1998. quant-ph/9812037.

[3] D. Aharonov, A. Ambainis, J. Kempe, and U. Vazirani. Quantum walks on graphs. In Proc.
33rd Annual ACM Symp. Theory of Computing, pages 50–59, 2001. quant-ph/0012090.

[4] D. Aharonov, I. Arad, E. Eban, and Z. Landau. Polynomial quantum algorithms for
additive approximations of the Potts model and other points of the Tutte plane, 2007.
quant-ph/0702008.

[5] D. Aharonov, A. Kitaev, and N. Nisan. Quantum circuits with mixed states. In Proc. 30th
Annual ACM Symp. Theory of Computing, pages 20–30, 1998.

[6] D. Aharonov and A. Ta-Shma. Adiabatic quantum state generation. SIAM J. Comput.,
37(1):47–82, 2007. quant-ph/0301023.

[7] I. Arad and Z. Landau. Quantum computation and the evaluation of tensor networks. SIAM
J. Comput., 39:3089–3121, 2010. arXiv:0805.0040.

[8] C. Bennett, E. Bernstein, G. Brassard, and U. Vazirani. Strengths and weaknesses of quantum
computing. SIAM J. Comput., 26(5):1510–1523, 1997. quant-ph/9701001.

[9] D. Berry, A. Childs, and R. Kothari. Hamiltonian simulation with nearly optimal dependence
on all parameters. In Proc. 56th Annual Symp. Foundations of Computer Science, pages 792–
809, 2015. arXiv:1501.01715.

[10] I. Bezáková, D. Štefankovič, V. Vazirani, and E. Vigoda. Accelerating simulated annealing for the permanent and combinatorial counting problems. SIAM J. Comput., 37(5):1429–1454, 2008.

[11] G. Brassard, F. Dupuis, S. Gambs, and A. Tapp. An optimal quantum algorithm to approx-
imate the mean and its application for approximating the median of a set of points over an
arbitrary distance, 2011. arXiv:1106.4267.

[12] G. Brassard, P. Høyer, M. Mosca, and A. Tapp. Quantum amplitude amplification and es-
timation. Quantum Computation and Quantum Information: A Millennium Volume, pages
53–74, 2002. quant-ph/0005055.

[13] S. Bravyi, A. W. Harrow, and A. Hassidim. Quantum algorithms for testing properties of
distributions. IEEE Trans. Inform. Theory, 57(6):3971–3981, 2011. arXiv:0907.3920.

[14] C.-F. Chiang, D. Nagaj, and P. Wocjan. Efficient circuits for quantum walks. Quantum Inf.
Comput., 10(5&6):420–424, 2010. arXiv:0903.3465.

[15] P. Dagum, R. Karp, M. Luby, and S. Ross. An optimal algorithm for Monte Carlo estimation.
SIAM J. Comput., 29(5):1484–1496, 2000.

[16] G. De las Cuevas, W. Dür, M. van den Nest, and M. Martin-Delgado. Quantum algorithms
for classical lattice models. New J. Phys., 13:093021, 2011. arXiv:1104.2517.

[17] N. Destainville, B. Georgeot, and O. Giraud. Quantum algorithm for exact Monte Carlo
sampling. Phys. Rev. Lett., 104:250502, 2010. arXiv:1003.1862.

[18] V. Dunjko and H. Briegel. Sequential quantum mixing for slowly evolving sequences of Markov
chains, 2015. arXiv:1503.01334.

[19] M. Dyer and A. Frieze. Computing the volume of convex bodies: a case where randomness
provably helps. In Probabilistic Combinatorics and Its Applications, volume 44 of Proceedings
of Symposia in Applied Mathematics, pages 123–170. American Mathematical Society, 1992.

[20] A. Frieze and E. Vigoda. A survey on the use of Markov chains to randomly sample colourings.
In Combinatorics, Complexity and Chance, pages 53–71. Oxford University Press, 2007.

[21] J. Geraci and D. Lidar. On the exact evaluation of certain instances of the Potts partition
function by quantum computers. Comm. Math. Phys., 279:735–768, 2008. quant-ph/0703023.

[22] J. Geraci and D. Lidar. Classical Ising model test for quantum circuits. New J. Phys.,
12:075026, 2010. arXiv:0902.4889.

[23] P. Glasserman. Monte Carlo methods in financial engineering. Springer, New York, 2003.

[24] L. Grover. Quantum mechanics helps in searching for a needle in a haystack. Phys. Rev. Lett.,
79(2):325–328, 1997. quant-ph/9706033.

[25] L. Grover. A framework for fast quantum mechanical algorithms. In Proc. 30th Annual ACM
Symp. Theory of Computing, pages 53–62, 1998. quant-ph/9711043.

[26] O. Heilmann and E. Lieb. Theory of monomer-dimer systems. Comm. Math. Phys., 25:190–
232, 1972.

[27] S. Heinrich. Quantum summation with an application to integration. Journal of Complexity, 18(1):1–50, 2001. quant-ph/0105116.

[28] M. Huber. Approximation algorithms for the normalizing constant of Gibbs distributions,
2012. arXiv:1206.2689.

[29] M. Huber. Improving Monte Carlo randomized approximation schemes, 2014. arXiv:1411.4074.

[30] C. Jacoboni and P. Lugli. The Monte Carlo method for semiconductor device simulation.
Springer-Verlag, Wien-New York, 1989.

[31] M. Jerrum. A very simple algorithm for estimating the number of k-colourings of a low-degree
graph. Random Structures and Algorithms, 7(2):157–165, 1995.

[32] M. Jerrum. Counting, sampling and integrating: algorithms and complexity. Birkhäuser Verlag,
Basel, 2003.

[33] M. Jerrum and A. Sinclair. Approximating the permanent. SIAM J. Comput., 18(6):1149–
1178, 1989.

[34] M. Jerrum and A. Sinclair. Polynomial-time approximation algorithms for the Ising model.
SIAM J. Comput., 22(5):1087–1116, 1993.

[35] M. Jerrum, L. Valiant, and V. Vazirani. Random generation of combinatorial structures from
a uniform distribution. Theoretical Computer Science, 43(2–3):169–188, 1986.

[36] E. Knill, G. Ortiz, and R. Somma. Optimal quantum measurements of expectation values of
observables. Phys. Rev. A, 75:012328, 2007. quant-ph/0607019.

[37] W. Krauth. Statistical Mechanics: Algorithms and Computations. Oxford University Press,
Oxford, 2006.

[38] D. Levin, Y. Peres, and E. Wilmer. Markov chains and mixing times. American Mathematical
Society, 2009.

[39] D. Lidar. On the quantum computational complexity of the Ising spin glass partition function
and of knot invariants. New J. Phys., 6:167, 2004. quant-ph/0309064.

[40] D. Lidar and O. Biham. Simulating Ising spin glasses on a quantum computer. Phys. Rev. E,
56:3661, 1997. quant-ph/9611038.

[41] F. Martinelli. Lectures on Glauber dynamics for discrete spin models. In Lectures on probability
theory and statistics (Saint-Flour, 1997), volume 1717 of Lecture Notes in Mathematics, pages
93–191. Springer, 1997.

[42] F. Martinelli and E. Olivieri. Approach to equilibrium of Glauber dynamics in the one phase
region. Comm. Math. Phys., 161(3):447–486, 1994.

[43] A. Matsuo, K. Fujii, and N. Imoto. Quantum algorithm for an additive approximation of Ising
partition functions. Phys. Rev. A, 90:022304, 2014. arXiv:1405.2749.

[44] E. Mossel and A. Sly. Exact thresholds for Ising-Gibbs samplers on general graphs. The Annals
of Probability, 41(1):294–328, 2013.

[45] A. Nayak and F. Wu. The quantum query complexity of approximating the median and
related statistics. In Proc. 31st Annual ACM Symp. Theory of Computing, pages 384–393,
1999. quant-ph/9804066.

[46] M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge
University Press, 2000.

[47] D. Poulin and P. Wocjan. Sampling from the thermal quantum Gibbs state and evaluating partition functions with a quantum computer. Phys. Rev. Lett., 103:220502, 2009. arXiv:0905.2199.

[48] P. Richter. Quantum speedup of classical mixing processes. Phys. Rev. A, 76:042306, 2007.
quant-ph/0609204.

[49] R. Somma, S. Boixo, H. Barnum, and E. Knill. Quantum simulations of classical annealing
processes. Phys. Rev. Lett., 101(13):130504, 2008. arXiv:0804.1571.

[50] D. Štefankovič, S. Vempala, and E. Vigoda. Adaptive simulated annealing: a new connection
between sampling and counting. J. ACM, 56(3):18:1–18:36, 2009. cs.DS/0612058.

[51] M. Szegedy. Quantum speed-up of Markov chain based algorithms. In Proc. 45th Annual
Symp. Foundations of Computer Science, pages 32–41, 2004. quant-ph/0401053.

[52] K. Temme, T. Osborne, K. Vollbrecht, D. Poulin, and F. Verstraete. Quantum Metropolis sampling. Nature, 471:87–90, 2011. arXiv:0911.3635.

[53] R. Tucci. Use of quantum sampling to calculate mean values of observables and partition
function of a quantum system, 2009. arXiv:0912.4402.

[54] P. Valiant. Testing symmetric properties of distributions. SIAM J. Comput., 40(6):1927–1968, 2011.

[55] J. Valleau and D. Card. Monte Carlo estimation of the free energy by multistage sampling. J.
Chem. Phys., 57:5457, 1972.

[56] M. Van den Nest, W. Dür, R. Raussendorf, and H. Briegel. Quantum algorithms for spin
models and simulable gate sets for quantum computation. Phys. Rev. A, 80:052334, 2008.
arXiv:0805.1214.

[57] S. Venegas-Andraca. Quantum walks: a comprehensive review. Quantum Information Processing, 11(5):1015–1106, 2012. arXiv:1201.4780.

[58] P. Wocjan and A. Abeyesinghe. Speedup via quantum sampling. Phys. Rev. A, 78:042336,
2008. arXiv:0804.4259.

[59] P. Wocjan, C.-F. Chiang, D. Nagaj, and A. Abeyesinghe. Quantum algorithm for approximating partition functions. Phys. Rev. A, 80:022340, 2009. arXiv:0811.0596.

[60] M.-H. Yung and A. Aspuru-Guzik. A quantum-quantum Metropolis algorithm. Proceedings of the National Academy of Sciences, 109(3):754–759, 2012. arXiv:1011.1468.
