Markov Chain Monte-Carlo Enhanced Variational Quantum Algorithms

Taylor L. Patti,1,2,* Omar Shehab,2 Khadijeh Najafi,1,2 and Susanne F. Yelin1

1 Department of Physics, Harvard University, Cambridge, Massachusetts 02138, USA
2 IBM Quantum, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
arXiv:2112.02190v2 [quant-ph] 1 Feb 2022

Variational quantum algorithms are poised to have significant impact on high-dimensional optimization, with applications in classical combinatorics, quantum chemistry, and condensed matter. Nevertheless, the optimization landscape of these algorithms is generally nonconvex, causing suboptimal solutions due to convergence to local, rather than global, minima. In this work, we introduce a variational quantum algorithm that uses classical Markov chain Monte Carlo techniques to provably converge to global minima. These performance guarantees are derived from the ergodicity of our algorithm's state space and enable us to place analytic bounds on its time-complexity. We demonstrate both the effectiveness of our technique and the validity of our analysis through quantum circuit simulations for MaxCut instances, solving these problems deterministically and with perfect accuracy. Our technique stands to broadly enrich the field of variational quantum algorithms, improving and guaranteeing the performance of these promising, yet often heuristic, methods.

I. INTRODUCTION

Since [1] the advent of the Variational Quantum Eigensolver (VQE) [2, 3] and Quantum Approximate Optimization Algorithm (QAOA) [4], quantum algorithms that function in tandem with classical machine learning have garnered great interest. These variational quantum algorithms (VQAs) typically harness some form of classical gradient descent to tackle a large-scale optimization problem on the exponential state space of quantum hardware [5–7]. Applications of these methods have included the optimization of NP-hard combinatorial problems [8–12], the identification of eigenstates and energies in quantum chemistry applications [13–15], and the study of condensed matter systems [16–18]. Much like their classical counterparts, the above near-term quantum algorithms can be plagued by nonconvex optimization landscapes, causing them to converge to suboptimal minima [19]. A variety of techniques have been suggested to address this issue in NP-hard combinatorial optimization problems, such as: “warm starting” procedures [20–22], composition with classical neural networks [23], multibasis encodings with bistable convergence [11], and other techniques [12, 24, 25]. However, these methods offer few provable optimization guarantees of practical utility. While optimization landscapes are known to become more convex with high depth [19], the adverse effect of quantum noise [26] and barren plateaus [27–31] on deep quantum networks is well-documented.

In order to avoid the local minima convergence that plagues VQAs, we introduce MCMC-VQA, a technique that adapts the ergodic exploration of classical Markov chain Monte Carlo (MCMC) to guarantee the global convergence of quantum algorithms. As samples of ergodic systems are representative of their underlying probability distribution, an ergodic VQA necessarily yields a sample that contains states near the global minimum. In this work, we focus on the Metropolis-Hastings algorithm due to its success in high-dimensional spaces and suitability for unnormalized probability distributions [32]. MCMC-VQA utilizes modified VQAs and their statistics as the Metropolis-Hastings transition kernels and quantum state energies as state likelihoods. These quantities are then used to determine the viability of parameter updates. Our algorithm requires no increase in quantum overhead and only a minimal increase in classical overhead. MCMC-VQA represents a time-discrete, space-continuous Markov chain, as the algorithm progresses in discrete VQA epochs while training a continuous-parameter quantum circuit. It can also be classified as a form of Stochastic Gradient Descent MCMC [33, 34]. Although in this work we focus on VQE [3], our techniques are readily applicable to a wide array of quantum machine learning applications.

While other works have introduced quantum subroutines for classical MCMC methods that offer a quadratic speedup for random walks [35–37] and sampling [38, 39], this manuscript takes the opposite approach by designing a classical MCMC subroutine for quantum algorithms. Likewise, while classical MCMC methods have been used to simulate quantum computing routines [40, 41], our work uses classical MCMC to enhance quantum algorithms on quantum hardware.

We briefly review VQAs, focusing on VQE (Fig. 1, gray) for quantum optimization for MaxCut problems. This choice of application is motivated by the ample nonconvexity of the corresponding quadratic loss functions [11, 19]. VQAs are parameterized by input states $|\psi\rangle$ and quantum circuit unitaries $U_t = U(\hat{\theta}_t)$, where $\hat{\theta}_t$ are the variable parameters learned during epoch $t-1$. Without loss of generality, we choose the $n$-qubit input state as $|0\rangle = \prod_{i=0}^{n-1} |0\rangle$ such that the output state is entirely defined by $\hat{\theta}$ and assume that the initial parameters $\hat{\theta}_0$ are randomly selected at the start of each new sequence of epochs.

MaxCut is a partitioning problem on undirected graphs $G$ (Fig. 1, black), where edges $\omega_i$ connect pairs of vertices $v_{i_a}, v_{i_b}$ [42]. The goal is to optimally assign all vertices $v_{i_a}, v_{i_b} \in \{-1, 1\}$, so as to maximize the objective function of Eq. 1.

* [email protected]

FIG. 1: Diagram of a random graph for MaxCut, VQE, and MCMC-VQA. Random graphs (black, Secs. I and III) in this work are generated with normally distributed edge weights $w_i$. The objective is to maximize Eq. 1 by optimally assigning each pair of vertices $v_{i_a}, v_{i_b} \in \{-1, 1\}$. MaxCut can be solved on a quantum computer by mapping $v_{i_a}, v_{i_b} \to \sigma_{i_a}, \sigma_{i_b}$ and minimizing the corresponding $H$. See Sec. III for graph details. VQE (gray, Sec. I) minimizes the loss function for each $\hat{\theta}$ by calculating the expectation value $\Lambda(\hat{\theta})$ and updating $\hat{\theta}$ with gradient descent using $\nabla\Lambda(\hat{\theta})$. MCMC-VQA (blue, Sec. II) uses gradient descent with $\nabla\Lambda(\hat{\theta})$ and random noise $\xi\Theta_r$ to produce candidate state $\hat{\theta}'$, but also calculates probability distributions $P(\hat{\theta})$ and $P(\hat{\theta}')$, as well as proposal distributions $G(\hat{\theta}'|\hat{\theta})$ and $G(\hat{\theta}|\hat{\theta}')$. Using these distributions, the acceptance distribution $A(\hat{\theta}'|\hat{\theta})$ is calculated and compared to a random uniform sample $u \sim U(0, 1)$. If $A(\hat{\theta}'|\hat{\theta}) > u$, then $\hat{\theta}' \to \hat{\theta}$. Otherwise, the MCMC-VQA algorithm restarts with the original $\hat{\theta}$. (Red) After the maximum number of MCMC-VQA epochs $T_{MC}$ have occurred, the sampled parameters with the lowest loss, $\hat{\theta}_{min}$, are selected and the optimization completes with a closing sequence of VQE epochs.
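
As a purely classical illustration of the weighted MaxCut instances described in the caption, the following Python sketch builds a small random graph with normally distributed weights, evaluates the cut objective of Eq. 1 and the classical value of the Ising energy of Eq. 2, and checks that maximizing the former is the same as minimizing the latter. The function names, the seed, and the brute-force search are assumptions made for illustration only; they are not the simulation code used in this work, and exhaustive enumeration is only viable for graphs of roughly this size.

```python
import itertools
import random

def random_weighted_graph(n_vertices, n_edges, seed=0):
    """Pick n_edges distinct vertex pairs and give each a N(0, 1) weight."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n_vertices), 2))
    edges = rng.sample(pairs, n_edges)
    return [(a, b, rng.gauss(0.0, 1.0)) for a, b in edges]

def cut_value(edges, v):
    """Eq. 1: (1/2) * sum_i w_i * (1 - v_a * v_b), with entries of v in {-1, +1}."""
    return 0.5 * sum(w * (1 - v[a] * v[b]) for a, b, w in edges)

def ising_energy(edges, v):
    """Classical value of Eq. 2: sum_i w_i * v_a * v_b."""
    return sum(w * v[a] * v[b] for a, b, w in edges)

def brute_force_maxcut(edges, n_vertices):
    """Enumerate all 2^n assignments and return the one with the largest cut."""
    best = max(itertools.product((-1, 1), repeat=n_vertices),
               key=lambda v: cut_value(edges, v))
    return best, cut_value(edges, best)

# Ten vertices and ten edges, matching the graph size quoted in Sec. III.
edges = random_weighted_graph(n_vertices=10, n_edges=10, seed=1)
best_v, best_cut = brute_force_maxcut(edges, 10)
total_w = sum(w for _, _, w in edges)

# cut = (sum_i w_i - energy) / 2, so the maximum cut is the minimum energy.
assert abs(best_cut - 0.5 * (total_w - ising_energy(edges, best_v))) < 1e-12
print("best assignment:", best_v, "cut value:", round(best_cut, 4))
```

Because the sum of the weights is a constant of the graph, the identity in the final assertion is what allows the quantum algorithm to minimize an energy instead of maximizing a cut.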

$$\text{maximize} \quad \frac{1}{2}\sum_i w_i \left(1 - v_{i_a} v_{i_b}\right). \tag{1}$$

In this work, we will consider a generalized form of the problem known as weighted MaxCut, in which $w_i$ take arbitrary real values.

To solve MaxCut via VQE, a graph $G$ is encoded in the Ising model Hamiltonian

$$H = \sum_i \omega_i \sigma_{i_a} \sigma_{i_b}, \tag{2}$$

where the weights $\omega_i = w_i$ remain unchanged from the MaxCut objective function and $v_{i_a}, v_{i_b} \to \sigma_{i_a}, \sigma_{i_b}$ for Pauli-Z spin operators $\sigma_{i_a}, \sigma_{i_b}$. Maximizing the cut of $G$ is then equivalent to minimizing the loss function

$$\Lambda_t = \Lambda(\hat{\theta}_t) = \langle 0|U_t^\dagger H U_t|0\rangle = \sum_i \omega_i \langle \sigma_{i_a} \sigma_{i_b} \rangle_t = \sum_i \mu_t^i, \tag{3}$$

where $\mu_t^i$ are the expectation values of the quadratic MaxCut terms. VQE circuit training updates parameters $\hat{\theta}$ via gradient descent on $\Lambda_t$ (Fig. 1), where the gradient of any $\theta_t^k \in \hat{\theta}_t$ can be calculated as $\nabla_k \Lambda(\hat{\theta}_t) = \left[\Lambda(\hat{\theta}_t + \epsilon\hat{k}) - \Lambda(\hat{\theta}_t - \epsilon\hat{k})\right]/2\epsilon$ by finite difference. As $\nabla\Lambda(\hat{\theta}_t) \to 0$ in the vicinity of both global and local minima, VQE training is prone to stagnation at suboptimal solutions.

II. RESULTS

In this section, we present our novel method for enhancing the performance of VQAs with classical MCMCs, a technique that we dub MCMC-VQA. We start by briefly reviewing traditional MCMC, focusing on the Metropolis-Hastings algorithm. Then, we introduce MCMC-VQA, derive its behavior, and verify our findings with numerical simulations.

A. MCMC-VQA Method

MCMC algorithms, such as Metropolis-Hastings, combine the randomized sampling of Monte-Carlo methods with the Markovian dynamics of a Markov chain in order to randomly sample from a distribution that is difficult to characterize deterministically [32]. MCMC is particularly useful for approximations in high-dimensional spaces, where the so-called “curse of dimensionality” can make techniques such as random sampling prohibitively slow [43]. The core merit of MCMC techniques is their ergodicity, which guarantees that all states of the distribution are eventually sampled in a statistically representative way, regardless of which initial point is chosen. This representative sample is known as the unique stationary distribution $\pi$. In particular, any Markov chain that is both irreducible (each state has a non-zero probability of transitioning to any other state) and aperiodic (not partitioned into sets that undergo periodic transitions) will provably converge to its unique stationary distribution $\pi$, from which it samples ergodically [44]. The mathematical properties of ergodic Markov chains are well-studied, including analytic bounds for solution quality and mixing time (number of epochs) [45, 46].

In order to obtain $\pi$ for a distribution of interest, Metropolis-Hastings specifies the transition kernel $P(x'|x)$, which is the probability that state $x$ transitions to state $x'$. Typically, the Markov process is defined such that transitions satisfy the detailed balance condition:

$$P(x)P(x'|x) = P(x')P(x|x'). \tag{4}$$

When Eq. 4 holds, the chain is said to be reversible and is guaranteed to converge to a stationary distribution. $P(x'|x)$ can be factored into two quantities

$$P(x'|x) = G(x'|x)A(x'|x), \tag{5}$$

where $G(x'|x)$ is the proposal distribution, or the conditional probability of proposing state $x'$ given state $x$, and $A(x'|x)$ is the acceptance distribution, or the probability of accepting the new state $x'$ given state $x$. To satisfy Eq. 4, the acceptance distribution is defined as

$$A(x'|x) = \min\left(1, \frac{P(x')G(x|x')}{P(x)G(x'|x)}\right). \tag{6}$$

Note that as only the ratio $P(x')/P(x)$ is considered, the probability distribution need not be normalized. To determine whether the candidate state $x'$ or the current state $x_t$ should be used as the future state $x_{t+1}$, a sample $u$ is drawn from the uniform distribution $U(0, 1)$. If $A(x'|x_t) \geq u$, then $x_{t+1} = x'$ and we say that the candidate state $x'$ is accepted. Otherwise, $x_{t+1} = x_t$ and we say that $x'$ is rejected.

We now present the MCMC-VQA method. Fig. 1 contains a diagram of the algorithm (blue). In particular, we focus on an ergodic Metropolis-Hastings algorithm, which is guaranteed to sample states near global minima. We outline the algorithm both idealistically and experimentally, prove its ergodicity and convergence, and verify these findings with numerical simulations.

As we seek the lowest energy eigenstate when solving MaxCut via VQE, we define $P(\hat{\theta})$ as the Boltzmann distribution

$$P(\hat{\theta}_a) = \exp(-\beta\Lambda_a)/Z, \qquad Z = \sum_i \exp(-\beta\Lambda_i), \tag{7}$$

such that a state's probability increases exponentially with decreasing loss function.

To calculate the proposal distribution $G(\hat{\theta}'|\hat{\theta}_t)$, we must consider the sampling statistics of VQAs. Due to quantum uncertainty, a measurement $m_i^r(\hat{\theta}_t)$ of operators $\omega_i \sigma_{i_a} \sigma_{i_b}$ from Eq. 2 is a sample from a distribution with mean $\mu_t^i$ and variance

$$(\Delta_t^i)^2 = \omega_i^2\left[\langle(\sigma_{i_a}\sigma_{i_b})^2\rangle_t - \langle\sigma_{i_a}\sigma_{i_b}\rangle_t^2\right] = \omega_i^2\left[1 - (\mu_t^i)^2\right]. \tag{8}$$

The Central Limit Theorem asserts that, assuming at least $M \gtrsim 30$ independent and identically distributed measurements $m_i^r(\hat{\theta}_t)$, an estimate of the loss function $\Lambda_t$ is the statistic $l_t \sim N\left(\Lambda_t, (\Delta_t^\Lambda)^2\right)$, where $(\Delta_t^\Lambda)^2 = \sum_i (\Delta_t^i)^2/M$ [47, 48]. Similarly, $\forall \theta_t^k \in \hat{\theta}_t$ and assuming small parameter shifts $\epsilon$, the gradient $\nabla_k \Lambda_t = \left[\Lambda(\hat{\theta}_t + \epsilon\hat{k}) - \Lambda(\hat{\theta}_t - \epsilon\hat{k})\right]/2\epsilon$ is the statistic $d_k l_t \sim N\left(\nabla_k \Lambda_t, [\Delta_\Lambda^2(\hat{\theta}_t + \epsilon\hat{k}) + \Delta_\Lambda^2(\hat{\theta}_t - \epsilon\hat{k})]/4\epsilon^2\right)$. The variance of this distribution can be simplified by noting that to first order in $\epsilon$, the parameter shifted Pauli operators are $\sigma_{i_a}^{\pm k} = \sigma_{i_a}(\hat{\theta} \pm \epsilon\hat{k}) = \sigma_{i_a} \pm \iota_{i_a k}$, where $\sigma_{i_a} = \sigma_{i_a}(\hat{\theta})$ and $\iota_{i_a k} = (\partial\sigma_{i_a}/\partial\theta_k)\epsilon$. We can then simplify the sum $\Delta_i(\hat{\theta}_t + \epsilon\hat{k})^2 + \Delta_i(\hat{\theta}_t - \epsilon\hat{k})^2 = 2\Delta_i(\hat{\theta}_t)^2$ by

FIG. 2: Example trajectories with inverse thermodynamic temperature β = 0.8 (left) and β = 0.2 (right).
Four-hundred MCMC-VQA epochs (Markovian epochs) are followed by a closing sequence of VQE epochs
(beginning at red dashed line), which is initialized with the best parameters θ̂min found during the Markov process.
At lower temperature (β = 0.8), trajectories become trapped in local minima and reaching ergodicity is a lengthy
process. Conversely, the high-temperature (β = 0.2) trajectories rapidly reach burn-in, generating θ̂min that lead to
near perfect convergence during the VQE closing sequence. See Sec. III for simulation details.

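noting that

$$\Delta_i(\hat{\theta}_t \pm \epsilon\hat{k})^2 = \langle(\omega_i \sigma_{i_a}^{\pm k} \sigma_{i_b}^{\pm k})^2\rangle - \langle\omega_i \sigma_{i_a}^{\pm k} \sigma_{i_b}^{\pm k}\rangle^2, \tag{9a}$$

$$\langle(\sigma_{i_a}^{+k} \sigma_{i_b}^{+k})^2\rangle + \langle(\sigma_{i_a}^{-k} \sigma_{i_b}^{-k})^2\rangle = 2 + O(\iota^2), \tag{9b}$$

$$\langle\sigma_{i_a}^{+k} \sigma_{i_b}^{+k}\rangle^2 + \langle\sigma_{i_a}^{-k} \sigma_{i_b}^{-k}\rangle^2 = 2\langle\sigma_{i_a} \sigma_{i_b}\rangle^2 + O(\iota^2). \tag{9c}$$

Now, up to first order in $\iota$, we can derive the gradient's distribution

$$d_k l_t \sim N\left(\nabla_k \Lambda_t, \Delta_\Lambda^2(\hat{\theta}_t)/2\epsilon^2\right). \tag{10}$$

Standard gradient descent would propose the candidate state $\hat{\theta}' = \hat{\theta} - \eta\nabla\Lambda_t$, however MCMC-VQA adds a normally distributed random noise term $\Theta_r \sim N(0, 1)$ with scale parameter $\xi$ in order to expand the support of the proposal distribution $G(\hat{\theta}'|\hat{\theta}_t)$. This specifies

$$G(\hat{\theta}'|\hat{\theta}_t) = \prod_k G(\hat{\theta}'|\hat{\theta}_t)_k, \qquad G(\hat{\theta}'|\hat{\theta}_t)_k = \mathrm{pdf}\left[N\left(\eta\nabla_k \Lambda(\hat{\theta}_t),\ \xi^2 + \eta^2\frac{(\Delta_t^\Lambda)^2}{2\epsilon^2}\right)\right]\left(\hat{\theta}_t - \hat{\theta}'\right), \tag{11}$$

where the notation $\mathrm{pdf}\left[N(\mu, \sigma^2)\right](x)$ denotes the probability density function at point $x$ of a normal distribution with mean $\mu$ and variance $\sigma^2$. It follows that the acceptance distribution is given by

$$A(\hat{\theta}'|\hat{\theta}_t) = \min\left(1, \frac{P(\hat{\theta}')G(\hat{\theta}_t|\hat{\theta}')}{P(\hat{\theta}_t)G(\hat{\theta}'|\hat{\theta}_t)}\right). \tag{12}$$

We note that $G(\hat{\theta}_t|\hat{\theta}')$ is obtained by simply exchanging $\hat{\theta}_t$ and $\hat{\theta}'$ in Eq. 11. A random uniform sample $u \sim U(0, 1)$ is then drawn for comparison, such that $\hat{\theta}_{t+1} = \hat{\theta}'$ if $A(\hat{\theta}'|\hat{\theta}_t) > u$ and $\hat{\theta}_{t+1} = \hat{\theta}_t$ otherwise.

After $T_{MC}$ epochs of the above Markovian process, MCMC-VQA implements a short series of traditional VQA epochs for rapid convergence to the nearest minimum. In particular, these closing VQA epochs are initialized with $\hat{\theta}_{min}$, the parameter set of lowest eigenvalue $\Lambda_{min}$ found during the Metropolis-Hastings phase. In this manner, MCMC-VQA can be considered a “warm starting” procedure [20–22], but with ergodic guarantees.

Example MCMC-VQA trajectories are shown in Fig. 2 with inverse thermodynamic temperatures $\beta = 0.8$ and $\beta = 0.2$. The details of all simulations are given in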

Sec. III. Our algorithm combines the gradient descent-based optimization of VQE with a Markovian process that escapes local minima. Such exploration is significantly greater at the higher-temperature $\beta = 0.2$, where rather than settling into distinct loss function basins from which escape is relatively rare, the trajectories display the trademark “burn-in” behavior of ergodic Markov chains. By the time that the closing VQE epochs are applied, the ergodic $\beta = 0.2$ MCMC-VQA chains have sampled states sufficiently near the global minimum and converge to the ground truth nearly uniformly.

Fig. 3 (left) displays the average accuracy $1 - \alpha$ (where $\alpha$ is the average error, blue), and standard deviation (gray) of MaxCut solutions with MCMC-VQA as a function of $\beta$. Dashed lines represent the performance of traditional VQE on the same set of graphs and circuit ansatz. We note that all simulated $\beta$ values outperform traditional VQE. Until $\beta \sim 0.2$, higher temperature MCMC-VQA chains have higher accuracy and better convergence, as their more permissive temperature parameter biases the acceptance distribution towards accepting the candidate states. However, performance decreases at very high temperatures, for which the MCMC-VQA chains are no longer appreciably biased towards energy minimization and the algorithm becomes more like random sampling than intrepid gradient descent. Likewise, the optimal amount of parameter update noise $\xi$ is inversely proportional to $\beta$ (Fig. 3, right), as higher temperatures permit more radical deviations from standard gradient descent.

B. Implementation of MCMC-VQA on Quantum Hardware

As discussed above, the loss function $\Lambda_t$ is not precisely determined on actual quantum hardware, but rather estimated as a statistic $l_t = \sum_i q_t^i$, where $q_t^i = \frac{1}{M}\sum_{r=1}^{M} m_i^r(\hat{\theta}_t)$. As a result, the variance of a single observable measurement $(\Delta_t^i)^2$ is estimated by $(\delta_t^i)^2 = \omega_i^2[1 - (q_t^i)^2]$, while that of the total loss function $(\Delta_t^\Lambda)^2$ is estimated by $(\delta_t^\Lambda)^2 = \sum_i (\delta_t^i)^2/M = \sum_i \omega_i^2[1 - (q_t^i)^2]/M$, for $M$ measurements per observable. Alternatively, the variances could be directly estimated from the standard deviations of expectation value statistics. We then define $a(\hat{\theta}'|\hat{\theta}_t)$, the acceptance distribution on quantum hardware, as

$$a(\hat{\theta}'|\hat{\theta}_t) = \min\left(1, \frac{p(\hat{\theta}')g(\hat{\theta}_t|\hat{\theta}')}{p(\hat{\theta}_t)g(\hat{\theta}'|\hat{\theta}_t)}\right), \tag{13a}$$

$$p(\hat{\theta}) \propto \exp(-\beta l_t), \tag{13b}$$

$$g(\hat{\theta}'|\hat{\theta}_t) = \prod_k g(\hat{\theta}'|\hat{\theta}_t)_k, \tag{13c}$$

$$g(\hat{\theta}'|\hat{\theta}_t)_k = \mathrm{pdf}\left[N\left(\eta\, d_k l_t,\ \xi^2 + \eta^2\frac{(\delta_t^\Lambda)^2}{2\epsilon^2}\right)\right]\left(\hat{\theta}_t - \hat{\theta}'\right). \tag{13d}$$

MCMC-VQA does not increase the quantum complexity of VQAs (number of operations carried out on quantum hardware), as the measurements to estimate $\Lambda(\hat{\theta})$ are carried out in the typical way. Moreover, the acceptance distribution and its components are computed classically with simple arithmetic.

C. Proof of Ergodicity

If a Metropolis-Hastings algorithm is irreducible and aperiodic, then the resulting Markov chain is provably ergodic [44]. That is, it will explore all areas of the probability distribution, converging on average to the Markov process' unique stationary distribution, which includes the global minimum of the solution space. Moreover, as we have chosen to sample from the Boltzmann distribution of the loss function, we sample from states near optimal solutions with exponentially higher probability.

C.1. Irreducibility

The VQA Metropolis-Hastings Markov chain is irreducible if $\forall \hat{\theta}_a, \hat{\theta}_b$, $\exists T, \{\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_T\}$ such that

$$p(\hat{\theta}_1|\hat{\theta}_a)\, p(\hat{\theta}_b|\hat{\theta}_T) \prod_{i=1}^{T-1} p(\hat{\theta}_{i+1}|\hat{\theta}_i) > 0. \tag{14}$$

That is, the Markov chain is irreducible if, for any two points in parameter space $\hat{\theta}_a, \hat{\theta}_b$, there exists a series of transitions of any length $T$ such that $\hat{\theta}_a \to \hat{\theta}_b$ with non-zero probability [49]. While this definition of irreducibility is sufficient, we will instead focus on the yet more powerful condition of strong irreducibility. A Markov chain is strongly irreducible iff

$$g(\hat{\theta}_a|\hat{\theta}_b) > 0, \quad \forall \hat{\theta}_a, \hat{\theta}_b, \tag{15}$$

meaning that all points in parameter space have a non-zero probability of transitioning to all other points [50]. This condition is then equivalent to

$$g(\hat{\theta}_b|\hat{\theta}_a)_k = \frac{(2\pi)^{-1/2}}{\sqrt{\xi^2 + \eta^2(\delta_a^\Lambda)^2/2\epsilon^2}} \exp\left[\frac{-\left(\theta_a^k - \theta_b^k - \eta\, d_k l_a\right)^2}{2\left(\xi^2 + \eta^2(\delta_a^\Lambda)^2/2\epsilon^2\right)}\right] > 0, \quad \forall k, \tag{16}$$

where we note that $\delta_\Lambda^2(\hat{\theta}_t) \propto 1/M$.

Eq. 15 is satisfied, at least technically to some tolerance, $\forall \hat{\theta}_a, \hat{\theta}_b$. Although $g(\hat{\theta}_b|\hat{\theta}_a)_k$ may become very small, it will generally retain a non-zero probability for virtually all transitions, and the chain will be strongly irreducible, albeit perhaps slow to converge. More precise arguments can be made in the limit of large $\xi$, where to first order in small $1/\xi$, $g(\hat{\theta}_b|\hat{\theta}_a)_k \to 1/(\sqrt{2\pi}\,\xi)$ and all transitions become equally likely. While this extreme $\xi$ limit is too random to result in efficient gradient descent, it illustrates a concrete transition to irreducibility with increasing $\xi$. Moreover, due to the uncertainty introduced by finite statistics $d_k l_a$ and $(\delta_a^\Lambda)^2$, sampling of the proposal kernel $g(\hat{\theta}_b|\hat{\theta}_a)_k$ can allow for otherwise unlikely transitions.

FIG. 3: (Left, blue) Average MCMC-VQA accuracy ($1 - \alpha$, for average error $\alpha$) vs inverse thermodynamic temperature $\beta$. Nearly perfect average accuracy is obtained for properly tuned hyperparameter $\beta$ (here, $\beta \approx 0.2$). At low temperature (large $\beta$), the algorithm mixes slowly, only partially approximating ergodicity in $T_{MC} = 400$ Markovian epochs. This partial convergence results in lower accuracy, which approaches that of traditional VQE (blue dashed line) in the limit of large $\beta$. Conversely, for high temperature (small $\beta$), the algorithm is insufficiently biased towards low-energy solutions, which renders its gradient descent inefficient and reduces its accuracy. (Left, gray) The standard deviation of MCMC-VQA accuracy vs $\beta$. Higher standard deviation directly corresponds with lower accuracy. As discussed above, at high $\beta$, this is due to runs trapped in local minima (see Fig. 2), while at low $\beta$, this stems from the lack of energy-preferred convergence. (Right) Optimal value of $\xi$ vs $\beta$, where $\xi$ is the gradient descent noise parameter ($\hat{\theta}' = \hat{\theta} - \eta\nabla\Lambda_t + \xi\Theta_r$) and each trajectory undergoes $T_{MC} = 400$ Markovian epochs. As larger temperatures generate more permissive acceptance distributions $A(\hat{\theta}'|\hat{\theta})$, higher $\xi$ values lead to more efficient mixing in the low-$\beta$ limit. See Sec. III for simulation details.

C.2. Aperiodicity

In the case of strong irreducibility argued above (Eq. 15), aperiodicity is automatically satisfied. Assuming only the weaker irreducibility of Eq. 14, it is sufficient to show that [49]

$$a(\hat{\theta}_a|\hat{\theta}_a)\, g(\hat{\theta}_a|\hat{\theta}_a) = g(\hat{\theta}_a|\hat{\theta}_a) = \frac{(2\pi)^{-1/2}}{\sqrt{\xi^2 + \eta^2(\delta_a^\Lambda)^2/2\epsilon^2}} \exp\left[\frac{-\left(\eta\, d_k l_a\right)^2}{2\left(\xi^2 + \eta^2(\delta_a^\Lambda)^2/2\epsilon^2\right)}\right] > 0. \tag{17}$$

As long as $\eta \not\gg \xi$, Eq. 17 holds for all but singular points $\hat{\theta}_a$.
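
The classical arithmetic behind Eqs. 13a, 16 and 17 can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the implementation used for the simulations: the function names are hypothetical, and the loss, gradient, and variance values are made-up numbers standing in for the hardware statistics $l_t$, $d_k l_t$ and $(\delta_t^\Lambda)^2$. It evaluates the proposal kernel in log space, confirms that both a long-range transition and the self-transition of Eq. 17 have non-zero density, and computes the acceptance probability.

```python
import math

def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def proposal_logpdf(theta_to, theta_from, grad_from, var_loss_from, xi, eta, eps):
    """log g(theta_to | theta_from): sum over k of the Gaussian factors of Eq. 16."""
    var = xi ** 2 + eta ** 2 * var_loss_from / (2.0 * eps ** 2)
    return sum(
        normal_logpdf(t_from - t_to, eta * d_k, var)
        for t_to, t_from, d_k in zip(theta_to, theta_from, grad_from)
    )

def acceptance(loss_cur, loss_cand, log_g_fwd, log_g_rev, beta):
    """Eq. 13a with p proportional to exp(-beta * l), evaluated in log space."""
    log_ratio = -beta * (loss_cand - loss_cur) + log_g_rev - log_g_fwd
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)

# Hypothetical statistics at two distant points in parameter space.
xi, eta, eps, beta = 0.5, 0.1, 0.01, 0.2
theta_a, grad_a, var_a, loss_a = [0.1, -0.4], [0.3, -0.2], 0.05, -1.0
theta_b, grad_b, var_b, loss_b = [2.5, 1.7], [-0.1, 0.4], 0.08, -0.6

log_g_ab = proposal_logpdf(theta_b, theta_a, grad_a, var_a, xi, eta, eps)  # a -> b
log_g_ba = proposal_logpdf(theta_a, theta_b, grad_b, var_b, xi, eta, eps)  # b -> a
log_g_aa = proposal_logpdf(theta_a, theta_a, grad_a, var_a, xi, eta, eps)  # Eq. 17

a_ab = acceptance(loss_a, loss_b, log_g_ab, log_g_ba, beta)
print(math.exp(log_g_ab) > 0.0, math.exp(log_g_aa) > 0.0, 0.0 < a_ab <= 1.0)
```

Working with log densities is a design choice of this sketch: the kernel of Eq. 16 can become very small for distant points, and the log form keeps the ratio in Eq. 13a numerically stable in that regime.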



FIG. 4: Average accuracy vs Markovian epochs for three different $\beta$ values. Gray dots are the average MCMC-VQA accuracy $1 - \alpha$, and blue curves are a least squares fit of this data to the analytical accuracy of an ergodic Markov chain $1 - \alpha_{MC}(\tau)$, with theoretical mixing time $\tau$ (see Eq. 18). The analytical time-dependence of $\alpha_{MC}$ matches the observed scaling of $\alpha$, affirming that MCMC-VQA is an ergodic Markov chain, and thus guaranteeing convergence to the global minimum. Furthermore, the ratio of observed scale parameters between MCMC-VQA simulations with different $\beta$ values is consistent with the analytic dependence $\tau \propto \ln(1/\sqrt{\pi^*})$ (Eq. 18) on the least likely state $\pi^* \propto \exp(-\beta\Lambda_{max})$ (Eq. 7). This functional dependence on temperature further supports our claims of ergodically sampling from $P(\hat{\theta})$ and thus deterministically converging to the global minimum.
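
The temperature dependence quoted in the caption can be explored numerically with the mixing-time bound of Eq. 18, $\tau \leq (2/\Phi^2)\ln\left(1/(\alpha_{MC}\sqrt{\pi^*})\right)$. The following Python sketch is only a toy illustration: the four loss values, the conductance $\Phi$, and the target distance $\alpha_{MC}$ are hypothetical numbers, and $\Phi$ is held fixed across temperatures even though in a real chain it would itself depend on $\beta$.

```python
import math

def boltzmann(losses, beta):
    """Normalized Boltzmann probabilities of Eq. 7 over a finite set of losses."""
    weights = [math.exp(-beta * loss) for loss in losses]
    z = sum(weights)
    return [w / z for w in weights]

def mixing_time_bound(conductance, alpha_mc, pi_star):
    """Right-hand side of Eq. 18."""
    return (2.0 / conductance ** 2) * math.log(1.0 / (alpha_mc * math.sqrt(pi_star)))

toy_losses = [-3.0, -1.0, 0.5, 2.0]   # hypothetical loss values Lambda_i
phi, alpha_mc = 0.5, 0.05             # hypothetical conductance and target distance

bounds = {}
for beta in (0.2, 0.5, 0.8):
    pi_star = min(boltzmann(toy_losses, beta))   # least likely (highest loss) state
    bounds[beta] = mixing_time_bound(phi, alpha_mc, pi_star)
    print(f"beta={beta}: pi*={pi_star:.4f}, bound on tau={bounds[beta]:.1f}")
```

In this toy setting the bound grows with $\beta$, because the least likely state becomes exponentially rarer at lower temperature, which is the qualitative trend the caption attributes to the fitted scale parameters.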

D. Mixing Time

The mixing time $\tau$ of a Markov chain is the number of epochs required to reach a certain threshold of convergence. For an ergodic, discrete-time Markov chain, $\tau$ is analytically bounded by

$$\tau \leq \frac{2}{\Phi^2}\ln\left(\frac{1}{\alpha_{MC}\sqrt{\pi^*}}\right), \tag{18}$$

where $\alpha_{MC} = |S - \pi|$ is the distance between the Markov chain's sampled distribution $S$ and the true stationary distribution $\pi$, $\pi^*$ is the probability of the least likely (maximum energy) state of $\pi$, and $\Phi$ is the conductance or “Cheeger constant” of the Markov process [45]. The conductance can be understood as the minimum of normalized ergodic flows between all possible partitions of the state space.

Fig. 4 demonstrates that the performance of MCMC-VQA is consistent with the theoretical predictions of ergodic Markov chains (Eq. 18). That is, the time dependence of MCMC-VQA optimization error $\alpha$ follows the same $\ln(1/\alpha)$ scaling as the distribution distance $\alpha_{MC}$ in Eq. 18. Moreover, least-squares analysis of Fig. 4 data reveals a $\beta$-dependent scale factor that is proportional to $\ln(1/\sqrt{\pi^*})$, which is consistent with the Boltzmann distribution $p(\hat{\theta}_a) \propto \exp(-\beta\Lambda_a)$ from which our method samples. This temperature-dependent time-complexity further verifies that MCMC-VQA is an ergodic Markov process that successfully samples from the target distribution.

III. NUMERICAL SIMULATIONS

The simulations in this work are done using a modified version of TensorLy-Quantum, an open-source software package for quantum circuit simulation using factorized tensors [51, 52]. TensorLy-Quantum specializes in exact tensor contraction, such that the simulations are carried out without truncation or approximation.

The MaxCut instances optimized in this work are generated from ten graphs. Each graph has ten vertices and an equal number of randomly selected edges, whose weights are randomly generated from the unit normal distribution. Such graphs are equivalent to the Gilbert model of random graphs [53]. The number of edges was chosen to be equal to that of vertices as this ratio is observed to pose high difficulty for random MaxCut problems of this model [54, 55].

All numerical simulations in this work are done using the graphs described above, with twenty randomly initialized runs completed for each graph. The quantum circuits use one parameterized rotation per vertex. We illustrate our work using circuits with relatively few parameters, because their optimization landscape is especially nonconvex and thus prone to convergence in local minima [19]; however, MCMC-VQA can be used with arbitrary parameterization. The circuit gates are alternated between a layer of single-qubit parameterized rotations (angles $\hat{\theta}$) about the y-axis and a layer of two-qubit control-Z gates. For each method (VQE or MCMC-VQA) and set of hyperparameters, a variety of learning rates are scanned so that numerical comparisons could be drawn against the optimal performance of each algorithm. All VQE sequences consisted of 100 epochs. Fig. 2 shows an ensemble of trajectories, whereas Figs. 3 and 4 show the average over the optimal learning rate for all ten graphs and 20 random initializations. For simplicity, we take the large $M$ limit, assuming many measurements and precise expectation values.

IV. DISCUSSION

In this work, we have introduced MCMC-VQA: a novel variational quantum algorithm that harnesses classical Markov chains to obtain analytic convergence guarantees for parameterized quantum circuits. As ergodic Markov chains representatively sample a target probability distribution, they identify regions near the global minimum with high probability. We present MCMC-VQA, both from a theoretical and practical perspective, prove its ergodicity, and derive its time-complexity (mixing time) as a function of both accuracy and inverse thermodynamic temperature. Focusing on MaxCut optimization within the VQE framework due to its plentiful local minima and employing a reversible Metropolis-Hastings Markov process, we demonstrate the ergodicity of our method, the validity of our analytical findings, and the capacity of MCMC-VQA to not only outperform traditional VQAs, but to do so with up to perfect and deterministic convergence.

In future research, MCMC-VQA should be studied for a variety of different applications, quantum algorithms, and Markov processes. In addition to quantum optimization, VQAs have been employed to address a myriad of topics in both quantum chemistry [13–15] and condensed matter physics [16–18]. Moreover, VQAs for even simple quantum Hamiltonians, such as the transverse field Ising model, are known to acutely struggle with premature convergence to local, rather than global, minima. Similarly, our technique could be extended to QAOA [4] or any of the numerous VQAs that have been proposed in recent years. Finally, tens of MCMCs have been devised over the past 70 years, each with their own advantages, with variations featuring Gibbs sampling [56], parallel tempering [57], and independence sampling [58]. These methods could be substituted for Metropolis-Hastings in order to produce algorithms with lower computational overhead and faster mixing times. In short, varieties of MCMC-VQA can be developed for a broad spectrum of variational quantum algorithms to both improve and guarantee performance.

ACKNOWLEDGEMENTS

O.S. would like to thank Katie Pizzolato for accommodating HPC resource requests on IBM Cloud. This work was done during T.L.P.'s internship at IBM Quantum, for which T.L.P. thanks Katie Pizzolato and the entire IBM Quantum team. S.F.Y. would like to acknowledge funding by NSF and AFOSR.

[1] While finalizing this manuscript, we became aware of another work applying Markov Chain Monte-Carlo technique in quantum algorithms [59]. However, we differentiate our work by targeting near-term quantum algorithms and providing the proof of ergodicity.
[2] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, New Journal of Physics 18, 023023 (2016).
[3] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, Nature Communications 5, 4213 (2014).
[4] E. Farhi, J. Goldstone, and S. Gutmann, arXiv preprint arXiv:1411.4028 (2014).
[5] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, and P. J. Coles, Nature Reviews Physics 3 (2021).
[6] W. Lavrijsen, A. Tudor, J. Müller, C. Iancu, and W. de Jong, in 2020 IEEE International Conference on Quantum Computing and Engineering (QCE) (IEEE, 2020) pp. 267–277.
[7] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, et al., Nature Reviews Physics, 1 (2021).
[8] M. R. Garey and D. S. Johnson, Computers and Intractability, Vol. 29 (W. H. Freeman, New York, 2002).
[9] G. Nannicini, Phys. Rev. E 99, 013304 (2019).
[10] L. Braine, D. J. Egger, J. Glick, and S. Woerner, IEEE Transactions on Quantum Engineering 2, 1 (2021).
[11] T. L. Patti, J. Kossaifi, A. Anandkumar, and S. F. Yelin, “Variational quantum optimization with multi-basis encodings,” (2021), arXiv:2106.13304 [quant-ph].
[12] B. Fuller, C. Hadfield, J. R. Glick, T. Imamichi, T. Itoko, R. J. Thompson, Y. Jiao, M. M. Kagele, A. W. Blom-Schieber, R. Raymond, and A. Mezzacapo, “Approximate solutions of combinatorial problems via quantum relaxations,” (2021), arXiv:2111.03167 [quant-ph].
[13] S. McArdle, S. Endo, A. Aspuru-Guzik, S. C. Benjamin, and X. Yuan, Rev. Mod. Phys. 92, 015003 (2020).
[14] A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta, Nature 549, 242 (2017).
[15] H. R. Grimsley, S. E. Economou, E. Barnes, and N. J. Mayhall, Nature Communications 10 (2019).
[16] M. B. Ritter, in Journal of Physics: Conference Series, Vol. 1290 (IOP Publishing, 2019) p. 012003.
[17] N. Vogt, S. Zanker, J.-M. Reiner, T. Eckl, A. Marusczyk, and M. Marthaler, “Preparing symmetry broken ground states with variational quantum algorithms,” (2020), arXiv:2007.01582 [quant-ph].
[18] F. Zhang, N. Gomes, Y. Yao, P. P. Orth, and T. Iadecola, Phys. Rev. B 104, 075159 (2021).
[19] J. Lee, A. B. Magann, H. A. Rabitz, and C. Arenz, Phys. Rev. A 104, 032401 (2021).
[20] D. Beaulieu and A. Pham, arXiv preprint arXiv:2108.13464 (2021).
[21] D. J. Egger, J. Mareček, and S. Woerner, Quantum 5, 479 (2021).
[22] W. van Dam, K. Eldefrawy, N. Genise, and N. Parham, arXiv preprint arXiv:2108.08805 (2021).
[23] J. Rivera-Dean, P. Huembeli, A. Acín, and J. Bowles, “Avoiding local minima in variational quantum algorithms with neural networks,” (2021), arXiv:2104.02955 [quant-ph].
[24] S. M. Harwood, D. Trenev, S. T. Stober, P. Barkoutsos, T. P. Gujarati, S. Mostame, and D. Greenberg, arXiv preprint arXiv:2102.02875 (2021).
[25] O. Shehab, I. H. Kim, N. H. Nguyen, K. Landsman, C. H. Alderete, D. Zhu, C. Monroe, and N. M. Linke, arXiv preprint arXiv:1906.00476 (2019).
[26] S. Bravyi, D. Gosset, and R. König, Science 362, 308 (2018).
[27] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Nature Communications 9, 1 (2018).
[28] T. L. Patti, K. Najafi, X. Gao, and S. F. Yelin, Phys. Rev. Research 3, 033090 (2021).
[29] C. Ortiz Marrero, M. Kieferová, and N. Wiebe, PRX Quantum 2, 040316 (2021).
[30] Z. Holmes, K. Sharma, M. Cerezo, and P. J. Coles, “Connecting ansatz expressibility to gradient magnitudes and barren plateaus,” (2021), arXiv:2101.02138 [quant-ph].
[31] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, Nature Communications 12 (2021).
[32] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, Journal of the Royal Statistical Society. Series D (The Statistician) 47, 69 (1998).
[33] H. Robbins and S. Monro, The Annals of Mathematical Statistics, 400 (1951).
[34] C. Nemeth and P. Fearnhead, Journal of the American Statistical Association 116, 433 (2021), https://doi.org/10.1080/01621459.2020.1847120.
[35] M. Szegedy, in Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, FOCS '04 (IEEE Computer Society, USA, 2004) pp. 32–41.
[36] K. Temme, T. J. Osborne, K. G. Vollbrecht, D. Poulin, and F. Verstraete, Nature 471, 87 (2011).
[37] J. Lemieux, B. Heim, D. Poulin, K. Svore, and M. Troyer, Quantum 4, 287 (2020).
[38] A. Montanaro, Proc. R. Soc. A. 471 (2016), https://doi.org/10.1098/rspa.2015.0301.
[39] A. Cornelissen and S. Jerbi, “Quantum algorithms for multivariate monte carlo estimation,” (2021), arXiv:2107.03410 [quant-ph].
[40] Y. Wang, S. Wu, and J. Zou, Statistical Science 31, 362 (2016).
[41] M. Medvidovic and G. Carleo, npj Quantum Information 7 (2021), 10.1038/s41534-021-00440-z.
[42] C. W. Commander, “Maximum cut problem, MAX-CUT,” in Encyclopedia of Optimization, edited by C. A. Floudas and P. M. Pardalos (Springer US, Boston, MA, 2009) pp. 1991–1999.
[43] C. J. Geyer, Statistical Science 7, 473 (1992).
[44] S. P. Brooks, Journal of the Royal Statistical Society. Series D (The Statistician) 47, 69 (1998).
[45] R. Montenegro and P. Tetali, Found. Trends Theor. Comput. Sci. 1, 237–354 (2006).
[46] N. M. March (2011).
[47] T. K. Kim, Korean Journal of Anesthesiology 68, 540 (2015).
[48] S. G. Kwak and J. H. Kim, Korean J Anesthesiol. 70 (2017), 10.4097/kjae.2017.70.2.144.
[49] C. Daskalakis, “6.896: Probability and computation,” (2011).
[50] N. Whiteley, “The metropolis-hastings algorithm,” (2008).
[51] T. L. Patti, J. Kossaifi, S. F. Yelin, and A. Anandkumar, “Tensorly-quantum: Quantum machine learning with tensor methods,” (2021), arXiv:2112.10239 [quant-ph].
[52] “Tensorly-quantum: Tensor-based quantum machine learning,” (2021).
[53] E. N. Gilbert, The Annals of Mathematical Statistics 30, 1141 (1959).
[54] D. Coppersmith, D. Gamarnik, M. Hajiaghayi, and G. B. Sorkin, Random Structures & Algorithms 24, 502 (2004).
[55] T. Luczak, in Proceedings of Random graphs, Vol. 87 (1990) pp. 151–159.
[56] A. E. Gelfand, Journal of the American Statistical Association 95, 1300 (2000).
[57] D. J. Earl and M. W. Deem, Physical Chemistry Chemical Physics 7, 3910 (2005).
[58] W. K. Hastings, Biometrika 57, 97 (1970).
[59] G. Mazzola, “Digital quantum advantage in monte carlo simulations of frustrated spin models,” (in prep.).
