Advantage of Quantum Neural Networks as Quantum Information Decoders

Weishun Zhong,1, 2, 3, ∗ Oles Shtanko,4, † and Ramis Movassagh2, 5, ‡


1 Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
2 IBM Quantum, MIT-IBM Watson AI lab, Cambridge MA, 02142, USA
3 School of Natural Sciences, Institute for Advanced Study, Princeton, NJ, 08540, USA
4 IBM Quantum, IBM Research, Almaden, San Jose CA, 95120, USA
5 Google Quantum AI, Venice, CA, 90291, USA
A promising strategy to protect quantum information from noise-induced errors is to encode it into the low-energy states of a topological quantum memory device. However, readout errors from such memory under realistic settings are less well understood. We study the problem of decoding quantum information encoded in the groundspaces of topological stabilizer Hamiltonians in the presence of generic perturbations, such as quenched disorder. We first prove that the standard stabilizer-based error correction and decoding schemes work adequately well in such perturbed quantum codes by showing that the decoding error diminishes exponentially in the distance of the underlying unperturbed code. We then prove that Quantum Neural Network (QNN) decoders provide an almost quadratic improvement on the readout error. Thus, we demonstrate a provable advantage of using QNNs for decoding realistic quantum error-correcting codes, and our result enables the exploration of a wider range of non-stabilizer codes in near-term laboratory settings.

arXiv:2401.06300v1 [quant-ph] 11 Jan 2024

Many physical quantum systems, including superconducting qubits [1], trapped ions [2], and cold atoms [3], are widely regarded as promising platforms for quantum computers. However, they are susceptible to noise, which prevents them from reliably storing and processing quantum information [4, 5]. One promising strategy to overcome this is to perform Quantum Error Correction (QEC) [1], which involves encoding information using a set of noise-insensitive states. Physically, quantum information can be encoded in the low-energy states of systems described by topological Hamiltonians [6–12]. One of the most common examples of topological Hamiltonians is the class of stabilizer Hamiltonians, which are sums of the parity checks of a stabilizer code [13–17]. However, stabilizer Hamiltonians are finely tuned. Thus, here we explore the more general case of Hamiltonians that are not ideal stabilizer Hamiltonians but are subject to local perturbations. We examine the limitations of quantum error correction under such conditions. We then propose an alternative, Hamiltonian-agnostic approach to decoding information for such codes based on quantum machine learning.

In the last decade, we have witnessed the spectacular success of machine learning with noisy classical data in a variety of tasks, such as classification, regression, and generative modeling [18–22]. Recently, there has been a considerable amount of research devoted to the application of machine learning to the study of quantum many-body systems [23–33], where the complexity of many-body interactions makes analytical or numerical treatments formidable. In particular, it has been demonstrated that parameterized quantum circuits [5, 34, 35], also called Quantum Neural Networks (QNNs), can recognize quantum states in different phases and perform QEC [30, 36, 37]. Despite their success in small-scale tasks, training quantum neural networks is known to suffer from the barren plateau problem, where gradients become exponentially small in the system size, and optimization resembles searching for a needle in a haystack [38]. Nevertheless, it has been recently demonstrated that the barren plateau problem can be circumvented by utilizing logarithmic-depth convolutional networks with local output observables [39]. Here, we propose to employ QNNs for decoding imperfect stabilizer codes that are low-energy states of topological Hamiltonians, and show their advantage over standard QEC.

In this work we introduce and study stabilizer Hamiltonians [40] perturbed by a generic local quenched disorder. We establish rigorous results on decoding errors using standard QEC methods. In particular, we show that simple measurements of logical operators are not sufficient to achieve fault tolerance, and that a sequence of stabilizer measurements followed by error correction is required to decode even noiseless input states. We show theoretically that QNNs can outperform standard QEC methods. We also study numerically how to construct such QNNs for minimal circuit architectures.

Problem setup. We study a model of a quantum code spanning the groundspace of a generic topological Hamiltonian. We start our analysis with a general stabilizer quantum code [[n, 1, d]] of distance d, which encodes a single logical qubit using n physical qubits. We define {S_a} as a set of n − 1 stabilizer operators with the condition that [S_a, S_b] = 0. Furthermore, we consider X_L and Z_L to be the logical Pauli-X and Pauli-Z operators of this stabilizer code, respectively. Next, we introduce
V as a generic k-local perturbation (k < d), which is a sum of spatially local terms acting on at most k qubits. Then, we define the Hamiltonian as

    H = H_0 + \lambda V, \qquad H_0 = -\sum_{a=1}^{n-1} S_a,    (1)

where H_0 is the non-perturbed stabilizer Hamiltonian, V is the k-local perturbation defined above, and λ is a small real parameter that sets the perturbation strength.

When λ = 0, the groundspace of H is doubly degenerate, spanned by the codewords of the original stabilizer code. If 0 < λ < ∆, where ∆ is the gap of the Hamiltonian H_0, the two near-degenerate ground states remain an approximate quantum code. This means we can encode information using the two lowest eigenstates |ψ_0⟩ and |ψ_1⟩ of the perturbed Hamiltonian, H|ψ_i⟩ = E_i|ψ_i⟩. The energy splitting δ = |E_1 − E_0| is exponentially small in the stabilizer code's distance d [41]. Specifically, we encode the quantum information for a single logical qubit using quantum states of the form

    |\Psi\rangle = c_0 |\Omega_0\rangle + c_1 |\Omega_1\rangle,    (2)

where |Ω_i⟩ = Σ_j u_ij |ψ_j⟩ are the codewords, u is a Hamiltonian-dependent 2 × 2 unitary matrix, and c_0 = cos θ and c_1 = e^{iϕ} sin θ represent the amplitudes of the encoded logical state on the Bloch sphere, with θ ∈ [0, π] and ϕ ∈ [0, 2π] being the coordinates. There are two distinct sources of error that affect the precision of decoding: the perturbation V that makes the codewords imperfect, and the potential noise at the input. We model such noise by stochastically applying p-local Pauli operators P_α to input states in Eq. (2). The integer parameter p > 0, which describes the locality of the noise error, serves here as a measure of the noise strength.

Our goal is to construct a logical state decoder that takes |Ψ⟩ as input and returns either 0 or 1 after it has measured the qubit in the logical basis Q ∈ {X, Y, Z}. We illustrate this decoding process in Fig. 1. Specifically, for the Z decoder, the expectation of the qubit output is f_Z(θ, ϕ) := c_0^2 − |c_1|^2 = cos 2θ. Similarly, the X and Y decoders should return the expectations f_X(θ, ϕ) = sin 2θ cos ϕ and f_Y(θ, ϕ) = sin 2θ sin ϕ, respectively. From a mathematical point of view, the decoding is a general measurement of a certain (non-local) operator Ô. To evaluate the generalization error of decoding |Ψ⟩, we use the mean square error as a metric to measure the deviation of the prediction from the ground truth,

    \varepsilon_Q(\hat O) = \mathbb{E}_\alpha \int d\theta\, d\phi\; \mu(\theta,\phi)\, \Big( \langle\Psi| P_\alpha \hat O P_\alpha |\Psi\rangle - f_Q(\theta,\phi) \Big)^2,    (3)

where µ(θ, ϕ) is a measure that depends on the input distribution, E_α is the expectation over input Pauli errors P_α, and f_Q(θ, ϕ) is the function that describes the desired output expectation value for the measurement in the basis Q.

[Figure 1 diagram labels: groundspace, disorder perturbation, Quantum Error Correction, Quantum Neural Network, compare.]
FIG. 1. Schematics of decoding realistic quantum codes. From left to right: Stabilizer codes realized as ground states of physical Hamiltonians get perturbed and yield imperfect codewords. A state |Ψ⟩ prepared by the imperfect codewords is presented to the magic box decoder Ô for subsequent decoding. (i) Top row: Ô = Q_QEC. |Ψ⟩ is decoded through the standard QEC procedure. (ii) Bottom row: Ô = Q_QNN. |Ψ⟩ is decoded through a quantum neural network. After decoding, bit strings are sampled from the resulting probability distributions and the outcomes of the two decoding procedures are compared.

Limitations of standard decoding. The conventional method of constructing a decoder depends on the information available about the system. First, we assume that we only know the logical operators Q_L ∈ {X_L, Y_L, Z_L} for the unperturbed stabilizer code. Given this information, the only accessible action is to measure Q_L directly. This approach could be used, for instance, in an experimental setting [12] where it is known that the logical operators must be a string of Pauli operators,
but no efficient decoder is available. The error associated with this approach is given by the following theorem.

Theorem 1 (naive decoding, informal) For noiseless input and generic V, the generalization error of measuring the logical operator is ε_Q(Q_L) = Θ(λ^4).

Here we use the Θ (big-Theta) asymptotic notation to indicate the exact scaling of the error with perturbation strength λ, while holding everything else fixed (number of qubits, distance, etc.). The formal version of the theorem along with the proof can be found in Supplementary Information (SI) Section II.

We illustrate the result of Theorem 1 by numerically simulating the result of measurement for three different stabilizer codes of distance d = 3: the five-qubit code [42], the Steane code [15], and Shor's code [13]. We also consider the smallest distance-5 code, [[11, 1, 5]] [43]. As for the perturbation, in all cases V = Σ_{i=1}^{n} V_i, where the V_i are single-qubit matrices sampled from the Gaussian unitary ensemble, each normalized to unit spectral norm, ||V_i|| = 1. The results are shown in Fig. 2(a).

FIG. 2. Universality of standard decoding. Performance of standard decoding strategies on perturbed stabilizer codes. On the y-axis we abbreviate the generalization error in the X basis, ε_X, by ε. The universal scaling laws ε ∼ λ^α are shown as dashed lines. Panel (a) shows the error of measuring the logical operator X_L as a function of perturbation strength λ. The dashed line has scaling exponent α = 4, as predicted by Theorem 1. Panel (b) shows the rescaled logarithm of the error for measuring the logical operator X_L after measuring syndromes and applying error correction. The dashed line has scaling exponent α = 2(d + 1), consistent with Theorem 2. Panel (c) is the same as panel (b) but for noisy input states. The dashed line has scaling exponent α = 2(d + 1 − 2p), as predicted by Theorem 2. Note that all the curves from different codes nearly collapse into a single, universal shape in (a)-(c) after dividing by the scaling exponent of the dashed line. The insets in (b)-(c) show the raw unrescaled data, which have different slopes depending on the code distance.

Theorem 1 shows that the error vanishes as λ^4, independent of the distance d. Therefore, measuring logical operators for the unperturbed code does not help if the goal is to achieve fault tolerance by increasing the code distance with more physical qubits. Moreover, simply measuring logical operators for noisy inputs (P_α ≠ I) will necessarily lead to a non-vanishing error for any λ, including λ = 0.

The decoding quality can be significantly improved if, in addition to the logical operators Q_L, one also knows the stabilizer operators S_a from the unperturbed Hamiltonian (Eq. (1)). In this scenario, one could measure the syndromes of the input state and then apply error correction based on the measured syndromes. This procedure with high probability transforms states from the codespace of the perturbed Hamiltonian into the codespace of the original stabilizer code. The logical operators can then be measured to obtain the result. The combination of these two procedures is equivalent to measuring the operator

    Q_{\rm QEC} := \mathcal{E}_{\rm QEC}^{\dagger}(Q_L),    (4)

where E_QEC(·) is the quantum error-correction map (see SI Section III), and A† denotes the adjoint map for a superoperator A, i.e., Tr(O_1 A(O_2)) = Tr(A†(O_1) O_2). The associated error is given by the following theorem (the formal version, as well as the proof, can be found in SI Section III).

Theorem 2 (decoding with error correction, informal) For noiseless input, the generalization error for standard QEC is ε_Q(Q_QEC) = O(λ^{2⌈d/k⌉}). For noisy input (p > 1) and generic V, the generalization error is ε_Q(Q_QEC) = Θ(λ^{2⌈(d+1−2p)/k⌉}).

Here we use O notation to indicate an asymptotic upper bound as λ → 0, and ⌈x⌉ is the ceiling function that rounds up any non-integer x. As a corollary, for states unaffected by noise (p = 0), turning on error correction restores the dependence on the code distance in the error scaling. In particular, for 1-local perturbations, the distance-independent Θ(λ^4) scaling of Theorem 1 becomes O(λ^{2d}).

We illustrate this result in Fig. 2(b) and (c), where we present numerical simulations of the codes and the perturbation V considered earlier. For the noiseless case (Fig. 2(b)), we find that for the codes we study, all curves have the same λ^{2(d+1)} scaling, consistent with the O(λ^{2d}) upper bound in Theorem 2. To illustrate this feature, we show the collapsed version of the original data (shown in the inset), log(ε_Q(Q_QEC))/(d + 1) versus λ. The case where the state |Ψ⟩ is affected by p one-qubit Pauli errors is shown in Fig. 2(c). Similar to the noiseless case,
we show that the values of log(ε_Q(Q_QEC))/(d + 1 − 2p) collapse to the same slope, in agreement with our tight bound in Theorem 2.

The standard decoding approaches considered in Theorems 1-2 cannot be applied if one has no knowledge about the unperturbed code. In such a scenario, variational methods such as the QNN can be utilized. In the following, we do not assume any knowledge of the perturbed Hamiltonian H beyond access to the two spanning codewords of its ground space. Nevertheless, as we show below, QNNs can outperform the standard QEC.

Decoding using QNN. Motivated by recent machine-learning-enabled studies of quantum many-body systems [30, 44, 45], we propose to use QNNs for decoding non-stabilizer codes. Essentially, a QNN is a unitary circuit that takes the encoded state as input and returns the result by measuring the output qubit. If the QNN transformation is represented by the unitary operator U_Q and Z_0 is the Pauli-Z operator of the output qubit, the whole procedure is equivalent to measuring the operator

    Q_{\rm QNN} := U_Q^{\dagger} Z_0 U_Q.    (5)

In the absence of noise in the input, QNN decoding, unlike standard QEC, can achieve arbitrary precision at sufficiently large depths. This is because, for an arbitrary pair of codewords, there exists a unitary U_Q that transforms them into a product state with different states of the target qubit. In the presence of noise, QNN provides an almost quadratic error improvement over the standard QEC result in Theorem 2, as detailed in the following theorem.

Theorem 3 (QNN decoding, informal) For noisy inputs, one could always find a QNN such that its generalization error for decoding is ε_Q(Q_QNN) = O(λ^{4⌈(d−2p)/k⌉}).

The formal version of the theorem and its proof can be found in SI Section IV. It is also possible to show that the error scaling with λ in Theorem 3 cannot be improved in general. Indeed, we show that this bound is saturated in the following proposition.

Proposition 1 (informal) Consider noisy inputs with distribution µ(θ, ϕ) = µ(θ, ϕ + π) and label values f_Q(θ, ϕ) = ±1. Then for generic V, all QNN unitaries U_Q must satisfy ε_Q(Q_QNN) = Ω(λ^{4⌈(d−2p)/k⌉}).

Here we use Ω notation to indicate an asymptotic lower bound as λ → 0. For a formal version of Proposition 1 with accompanying proof, see SI Section V. This rigorous result, as well as Theorem 3, was proved without any restrictions on the unitary U_Q. In practice, achieving this optimal QNN may involve numerous gates and a complex optimization process. In what follows we numerically illustrate our results in the context of an example QNN.

Our network, trainable to sample in the logical Q-basis, consists of two blocks shown in Fig. 3(a): a correction circuit C_Q (blue) and a decoding circuit D_Q (green). The correction circuit is composed of local two-qubit gates forming a brickwork pattern of depth d_C. In turn, the decoding circuit is a convolutional network consisting of O(log n) layers of two-qubit unitaries, discarding half the qubits after each layer, similar to previous proposals [30, 39]. If the combined network has an overall logarithmic depth, we expect that it does not suffer from the barren plateau problem [38] and can be efficiently optimized. We parametrize all the two-qubit unitaries U_k composing the circuit as

    U_k = \exp\Big( i \sum_{\alpha,\beta=1}^{4} \varphi^{k}_{\alpha\beta}\, \sigma_\alpha \otimes \sigma_\beta \Big),    (6)

where φ^k_{αβ} ∈ [0, 2π] are trainable angles and σ_α are single-qubit Pauli matrices.

FIG. 3. QNN performance for 5-qubit code. (a) Our QNN architecture consists of an error-correction circuit C_Q and a decoding circuit D_Q. (b) Decoding noiseless states for the perturbed 5-qubit code at λ = 1. (c)-(d) Example decoding of the imperfect 5-qubit code using X_L (red), X_QEC (green) and X_QNN (blue). (c) Decoding noiseless states. (d) Decoding noisy states corrupted by a randomly chosen single-qubit Pauli operator P_α ∈ {I, X, Y, Z} acting on a randomly chosen qubit. Panels (b)-(c) demonstrate that QNN significantly outperforms the standard approaches across nearly two orders of magnitude in λ. Panel (d) confirms that the scaling of standard QEC and QNN agree with our analytical predictions in Theorems 2-3.

We assume access to a limited set of training and validation states in code space {|Ψ_µ⟩}, with equal probabilities affected by p-local Pauli errors, and the corresponding labels (θ_µ, ϕ_µ). In practice, one can assume such access, as the groundspace can be prepared adiabatically
in experimental settings. We optimize the QNN with objective function ε_Q(Q_QNN) (Eq. (3)) on the training data. After training, we estimate the QNN decoding error from the validation data withheld from training.

We provide numerical simulation results in Fig. 3, where we study a uniform 1-local perturbation V = Σ_{i=1}^{n} (X_i + Z_i). Fig. 3(b)-(c) suggest that our QNN architecture (C = 4) can decode noiseless inputs and achieves superior performance compared to the standard decoding schemes discussed in Theorems 1-2, regardless of the perturbation strength. The QNN error can be further reduced by considering deeper architectures at the cost of longer training times.

The noisy scenario is shown in Fig. 3(d). Incidentally, for the 5-qubit code (d = 3, p = 1) simulation, the bound in Theorem 3 predicts the same λ^4 scaling as in Theorem 2, which the present QNN saturates. However, with QNN one can potentially achieve better scaling by considering larger code distances such as the [[11,1,5]] code. Intriguingly, QNN performance is comparable to that of standard QEC for small λ and becomes superior for large λ. Meanwhile, this approach does not require any knowledge of the Hamiltonian, as needed in standard QEC methods. Also, this result illustrates that sequential error correction and measurement of the logical operator at the end of the circuit can be implemented by applying a QNN unitary and a single-qubit measurement without using any ancilla.

Discussion. In this work, we have developed a theoretical framework for decoding stabilizer codes perturbed by disorder. We proved general performance bounds regarding the decoding errors of standard QEC versus QNN decoding techniques for such codes. Our results reveal universal scaling behaviors of the decoding error for arbitrary code distances, and suggest a provable advantage of using QNNs over standard QEC methods for decoding realistic codes with large code distances. Our analytical work was confirmed and corroborated by numerical simulations of codes of various sizes, ranging from 5 to 11 qubits. The numerical methodology may find use cases elsewhere.

An interesting future direction is to find whether it is possible to construct logarithmic-depth QNNs, such as convolutional NNs, that saturate the bound in Theorem 3. Also, our study reveals a minor gap between the O(λ^{2d}) upper bound scaling in Theorem 2 and the empirical O(λ^{2(d+1)}) performance for standard quantum error correction with noiseless input states for a k = 1 local perturbation. Whether the bound in Theorem 2 can be improved remains an open question.

Our theoretical framework and numerical protocol represent a first step toward decoding non-stabilizer imperfect codes, which may be of practical relevance in near-term quantum devices. Our work paves the way for novel applications of machine learning in quantum error correction. While we focus on imperfect stabilizer codes, an immediate question is whether our formalism can be extended beyond near-stabilizer codes. In addition, how the structure and depth of QNNs affect practical decoding performance remains an important open problem.

ACKNOWLEDGMENTS

W.Z. acknowledges support from the IBM Quantum Summer Internship Program, where this work was initiated. W.Z. also acknowledges support from the Starr Foundation at the Institute for Advanced Study, where the final phase of the project was completed. O.S. and R.M. acknowledge funding from the MIT-IBM Watson AI Lab under the project Machine Learning in Hilbert space. The research was partly supported by the IBM Research Frontiers Institute.

[1] S. Bravyi, O. Dial, J. M. Gambetta, D. Gil, and Z. Nazario, The future of quantum computing with superconducting qubits, arXiv preprint arXiv:2209.06841 (2022).
[2] C. D. Bruzewicz, J. Chiaverini, R. McConnell, and J. M. Sage, Trapped-ion quantum computing: Progress and challenges, Appl. Phys. Rev. 6 (2019).
[3] C. Gross and I. Bloch, Quantum simulations with ultracold atoms in optical lattices, Science 357, 995 (2017).
[4] J. Preskill, Quantum computing in the nisq era and beyond, Quantum 2, 79 (2018).
[5] K. Bharti, A. Cervera-Lierta, T. H. Kyaw, T. Haug, S. Alperin-Lea, A. Anand, M. Degroote, H. Heimonen, J. S. Kottmann, T. Menke, et al., Noisy intermediate-scale quantum (nisq) algorithms, arXiv preprint arXiv:2101.08448 (2021).
[6] A. Y. Kitaev, Fault-tolerant quantum computation by anyons, Ann. Phys. (New York) 303, 2 (2003).
[7] S. B. Bravyi and A. Y. Kitaev, Quantum codes on a lattice with boundary, arXiv preprint quant-ph/9811052 (1998).
[8] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, Surface codes: Towards practical large-scale quantum computation, Phys. Rev. A 86, 032324 (2012).
[9] E. Dennis, A. Kitaev, A. Landahl, and J. Preskill, Topological quantum memory, J. Math. Phys. 43, 4452 (2002).
[10] R. Raussendorf and J. Harrington, Fault-tolerant quantum computation with high threshold in two dimensions, Phys. Rev. Lett. 98, 190504 (2007).
[11] K. Satzinger, Y.-J. Liu, A. Smith, C. Knapp, M. Newman, C. Jones, Z. Chen, C. Quintana, X. Mi, A. Dunsworth, et al., Realizing topologically ordered states on a quantum processor, Science 374, 1237 (2021).
[12] G. Semeghini, H. Levine, A. Keesling, S. Ebadi, T. T. Wang, D. Bluvstein, R. Verresen, H. Pichler, M. Kalinowski, R. Samajdar, et al., Probing topological spin liquids on a programmable quantum simulator, Science 374, 1242 (2021).
[13] P. W. Shor, Scheme for reducing decoherence in quantum computer memory, Phys. Rev. A 52, R2493 (1995).
[14] A. R. Calderbank and P. W. Shor, Good quantum error-correcting codes exist, Phys. Rev. A 54, 1098 (1996).
[15] A. M. Steane, Error correcting codes in quantum theory, Phys. Rev. Lett. 77, 793 (1996).
[16] A. R. Calderbank, E. M. Rains, P. Shor, and N. J. Sloane, Quantum error correction via codes over gf (4), IEEE T. Inform. Theory 44, 1369 (1998).
[17] D. Gottesman, Stabilizer codes and quantum error correction (California Institute of Technology, 1997).
[18] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.-A. Manzagol, and L. Bottou, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res. 11 (2010).
[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Commun. ACM 60, 84 (2017).
[20] D. P. Kingma and M. Welling, Auto-encoding variational bayes, arXiv preprint arXiv:1312.6114 (2013).
[21] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning (MIT press, 2016).
[22] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, Generative adversarial networks, Commun. ACM 63, 139 (2020).
[23] G. Carleo and M. Troyer, Solving the quantum many-body problem with artificial neural networks, Science 355, 602 (2017).
[24] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nature 549, 195 (2017).
[25] E. P. Van Nieuwenburg, Y.-H. Liu, and S. D. Huber, Learning phase transitions by confusion, Nat. Phys. 13, 435 (2017).
[26] J. Carrasquilla and R. G. Melko, Machine learning phases of matter, Nat. Phys. 13, 431 (2017).
[27] G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, Machine learning and the physical sciences, Rev. Mod. Phys. 91, 045002 (2019).
[28] L. Wang, Discovering phase transitions with unsupervised learning, Phys. Rev. B 94, 195105 (2016).
[29] Y. Zhang and E.-A. Kim, Quantum loop topography for machine learning, Phys. Rev. Lett. 118, 216401 (2017).
[30] I. Cong, S. Choi, and M. D. Lukin, Quantum convolutional neural networks, Nat. Phys. 15, 1273 (2019).
[31] Z.-Y. Han, J. Wang, H. Fan, L. Wang, and P. Zhang, Unsupervised generative modeling using matrix product states, Phys. Rev. X 8, 031012 (2018).
[32] A. M. Gomez, S. F. Yelin, and K. Najafi, Reconstructing quantum states using basis-enhanced born machines, arXiv preprint arXiv:2206.01273 (2022).
[33] W. Zhong, X. Gao, S. F. Yelin, and K. Najafi, Many-body localized hidden born machine, arXiv preprint arXiv:2207.02346 (2022).
[34] M. Benedetti, E. Lloyd, S. Sack, and M. Fiorentini, Parameterized quantum circuits as machine learning models, Quantum Sci. Technol. 4, 043001 (2019).
[35] A. Abbas, D. Sutter, C. Zoufal, A. Lucchi, A. Figalli, and S. Woerner, The power of quantum neural networks, Nat. Comput. Sci. 1, 403 (2021).
[36] D. F. Locher, L. Cardarelli, and M. Müller, Quantum error correction with quantum autoencoders, Quantum 7, 942 (2023).
[37] C. Cao, C. Zhang, Z. Wu, M. Grassl, and B. Zeng, Quantum variational learning for quantum error-correcting codes, Quantum 6, 828 (2022).
[38] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Barren plateaus in quantum neural network training landscapes, Nat. Commun. 9, 1 (2018).
[39] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, Cost function dependent barren plateaus in shallow parametrized quantum circuits, Nat. Commun. 12, 1 (2021).
[40] R. Movassagh and Y. Ouyang, Constructing quantum codes from any classical code and their embedding in ground space of local hamiltonians, arXiv preprint arXiv:2012.01453 (2020).
[41] S. Bravyi, M. B. Hastings, and S. Michalakis, Topological quantum order: stability under local perturbations, J. Math. Phys. 51, 093512 (2010).
[42] R. Laflamme, C. Miquel, J. P. Paz, and W. H. Zurek, Perfect quantum error correcting code, Phys. Rev. Lett. 77, 198 (1996).
[43] M. Grassl, Bounds on the minimum distance of linear codes and quantum codes, Online available at http://www.codetables.de (2007), accessed on 2023-04-11.
[44] H.-Y. Huang, R. Kueng, G. Torlai, V. V. Albert, and J. Preskill, Provably efficient machine learning for quantum many-body problems, arXiv preprint arXiv:2106.12627 (2021).
[45] H.-Y. Huang, M. Broughton, J. Cotler, S. Chen, J. Li, M. Mohseni, H. Neven, R. Babbush, R. Kueng, J. Preskill, et al., Quantum advantage in learning from experiments, Science 376, 1182 (2022).
[46] I. Hubač and S. Wilson, Brillouin-wigner methods for many-body systems, in Brillouin-Wigner Methods for Many-Body Systems (Springer, 2010) pp. 133–189.
[47] J. W. S. B. Rayleigh, The theory of sound, Vol. 2 (Macmillan, 1896).
[48] E. Schrödinger, Quantisierung als eigenwertproblem, Annalen der physik 385, 437 (1926).
Supplementary Information for “Advantage of Quantum Neural Networks as Quantum Information Decoders”

Weishun Zhong,1,2,3 Oles Shtanko,4 and Ramis Movassagh2,5

1 Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
2 IBM Quantum, MIT-IBM Watson AI lab, Cambridge MA, 02142, USA
3 School of Natural Sciences, Institute for Advanced Study, Princeton, NJ, 08540, USA
4 IBM Quantum, IBM Research – Almaden, San Jose CA, 95120, USA
5 Google Quantum AI, Venice Beach, CA, 90291, USA

In Section I we present details of the system defined in Eq. (1) and the Brillouin-Wigner perturbation series. Our
central result in this section is Lemma 1, which derives our main results. Section II contains the proof of Theorem 1,
Section III contains the proof of Theorem 2, Section IV contains the proof of Theorem 3, Section V contains the
proof of Proposition 1. In Section VI we present details of the numerical simulations performed in the main text. In
particular, we include details on how to construct our model stabilizer codes as well as our Quantum Neural Networks
(QNNs).

I. PERTURBED STABILIZER CODES

In this section, we first describe the properties of the ideal stabilizer code, then set up the perturbed problem as well as the corresponding Brillouin-Wigner (BW) series used throughout the proofs.
Let S_a be a set of independent stabilizers with a ∈ {1, . . . , n − 1}. We define the stabilizer Hamiltonian H_0 as

    H_0 = -\sum_{a=1}^{n-1} S_a.    (S.1)

The eigenstates and eigenenergies of this unperturbed Hamiltonian are

    H_0 |\psi_k^{(0)}\rangle = E_k^{(0)} |\psi_k^{(0)}\rangle.    (S.2)

In particular, the groundspace of H_0 is doubly degenerate, spanned by a two-dimensional basis of codewords. We define the unperturbed codewords as |Ω_i^{(0)}⟩, i = 0, 1, satisfying S_a|Ω_i^{(0)}⟩ = +|Ω_i^{(0)}⟩ for all a. The stabilizer code {|Ω_i^{(0)}⟩} is said to have code distance d = 2k + 1 if it can correct up to k-qubit Pauli errors. This condition implies that for any p-local Pauli operators P_α, P_β such that 2p + 1 ≤ d, the stabilizer code satisfies the Knill-Laflamme (KL) condition,

    \langle\Omega_k^{(0)}| P_\alpha P_\beta |\Omega_{k'}^{(0)}\rangle = \epsilon^{0}_{\alpha\beta}\, \delta_{kk'},    (S.3)

where ϵ^0_{αβ} is a Hermitian matrix that satisfies ϵ^0_{αα} = 1. Conversely, when 2p + 1 > d, we assume that there exists at least one Pauli operator P_err of weight d such that

    \|\delta O\|_1 > 0, \qquad \delta O := O - \frac{1}{2}\mathrm{Tr}(O)\,O, \qquad O_{kk'} = \langle\Omega_k^{(0)}| P_{\rm err} |\Omega_{k'}^{(0)}\rangle,    (S.4)

where ∥ · ∥_1 denotes the 1-norm. Eq. (S.4) states that above the code distance, there exists at least one error operator that will make the codewords non-orthogonal.
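
As a numerical illustration of Eqs. (S.1)-(S.3), the following minimal sketch builds H_0 for the five-qubit code and checks both the double degeneracy of the groundspace and the KL condition for single-qubit Pauli errors (p = 1, so 2p + 1 ≤ d = 3). The generators XZZXI, IXZZX, XIXZZ, ZXIXZ are the usual textbook choice for this code and are an assumption here, since the text does not list them; the helper names `pauli_string`, `single_qubit_paulis` and `kl_violation` are hypothetical.

```python
import itertools
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string(label):
    """Dense matrix of a tensor product of Paulis, e.g. 'XZZXI'."""
    out = np.eye(1, dtype=complex)
    for ch in label:
        out = np.kron(out, PAULI[ch])
    return out

# Assumed generators of the [[5, 1, 3]] code (n - 1 = 4 stabilizers).
STABILIZERS = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]
n = 5

H0 = -sum(pauli_string(s) for s in STABILIZERS)   # Eq. (S.1)
energies, vectors = np.linalg.eigh(H0)            # eigenvalues in ascending order
codewords = vectors[:, :2]                        # orthonormal basis of the groundspace

def single_qubit_paulis(n):
    """Identity plus all weight-one Pauli operators on n qubits."""
    labels = ["I" * n]
    for q in range(n):
        for p in "XYZ":
            labels.append("I" * q + p + "I" * (n - q - 1))
    return [pauli_string(label) for label in labels]

def kl_violation(codewords, errors):
    """Largest distance of <Omega_k|P_a P_b|Omega_k'> from a multiple of delta_kk'."""
    worst = 0.0
    for A, B in itertools.product(errors, repeat=2):
        M = codewords.conj().T @ (A @ B) @ codewords
        worst = max(worst, np.linalg.norm(M - np.trace(M) / 2 * np.eye(2)))
    return worst

violation = kl_violation(codewords, single_qubit_paulis(n))
print("three lowest energies:", np.round(energies[:3], 6))
print("KL violation for p = 1:", violation)
```

The check is independent of which orthonormal basis the eigensolver returns inside the degenerate groundspace, because Eq. (S.3) only requires the projected operator to be proportional to the identity.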
The unperturbed stabilizer code $|\Omega_i^{(0)}\rangle$ also has the following property.

Claim 1 For any eigenstate $|\psi_k^{(0)}\rangle$ of the Hamiltonian $H_0$ and any Pauli operator P, the state $P|\psi_k^{(0)}\rangle$ is also an
eigenstate of $H_0$.
Proof The eigenstate $|\psi_k^{(0)}\rangle$ has the energy $E_k^{(0)} = -\sum_a s_a$, where $s_a$ are the values of the individual stabilizers,
$S_a|\psi_k^{(0)}\rangle = s_a|\psi_k^{(0)}\rangle$. We consider the action of the unperturbed Hamiltonian $H_0$ on $P|\psi_k^{(0)}\rangle$,
\[ H_0 P|\psi_k^{(0)}\rangle = -\sum_a S_a P|\psi_k^{(0)}\rangle = -\sum_a (-1)^{r_a} P S_a|\psi_k^{(0)}\rangle = -\sum_a (-1)^{r_a} s_a P|\psi_k^{(0)}\rangle := \tilde E_k^{(0)} P|\psi_k^{(0)}\rangle, \tag{S.5} \]
where $\tilde E_k^{(0)} := -\sum_a (-1)^{r_a} s_a$, with $r_a = 0$ if $[P, S_a] = 0$ and $r_a = 1$ if $\{P, S_a\} = 0$. Therefore, the resulting state $P|\psi_k^{(0)}\rangle$
is still an eigenstate of $H_0$ (possibly the same as $|\psi_k^{(0)}\rangle$). □
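Claim 1 is easy to confirm numerically. The following sketch assumes, purely for illustration, the three-qubit repetition code with stabilizers Z1Z2 and Z2Z3 (so n − 1 = 2); the helper names are ours. It applies a Pauli X on the first qubit to every joint stabilizer eigenstate and checks that the result is again an eigenstate of $H_0$, with the energy shifted as in Eq. (S.5).

```python
import numpy as np

# Single-qubit operators.
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def kron_all(*ops):
    """Tensor product of single-qubit operators, first factor = first qubit."""
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

# Illustrative choice (an assumption): H0 = -(Z1 Z2 + Z2 Z3).
stabs = [kron_all(Z, Z, I2), kron_all(I2, Z, Z)]
H0 = -sum(stabs)

P = kron_all(X, I2, I2)  # Pauli X acting on the first qubit

def is_eigenstate(H, v, tol=1e-12):
    """Return (True/False, energy) for a normalized real vector v."""
    Hv = H @ v
    energy = v @ Hv
    return np.allclose(Hv, energy * v, atol=tol), energy

# Computational basis states are joint eigenstates of both stabilizers.
results = []
for idx in range(8):
    psi = np.zeros(8)
    psi[idx] = 1.0
    ok_before, e_before = is_eigenstate(H0, psi)
    ok_after, e_after = is_eigenstate(H0, P @ psi)
    results.append((ok_before, ok_after, e_before, e_after))

all_eigen = all(before and after for before, after, _, _ in results)
print(all_eigen, results[0][2], results[0][3])
```

For the state |000⟩ the energy moves from −2 to 0, because X on the first qubit anticommutes with Z1Z2 only.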
Next, we consider a perturbation of H0 by adding a generic k-local term V , as shown in Eq. (1) of the manuscript.
The resulting Hamiltonian is

H = H0 + λV, (S.6)

where λ is a small real parameter. We focus on the spectral problem

H|ψk ⟩ = Ek |ψk ⟩, (S.7)

where |ψk ⟩ and Ek are the eigenvectors and eigenenergies of the Hamiltonian H. To describe the structure of the
code, we need an analytical expression for the two lowest energy eigenstates |ψ0 ⟩ and |ψ1 ⟩. To find one, we use the
Brillouin-Wigner (BW) perturbative expansion [46] of the form

\[ |\psi_k\rangle = \frac{1}{N_k}\sum_{j=0}^{\infty} \lambda^j |\psi_k^{(j)}\rangle, \tag{S.8} \]

where Nk is a normalization coefficient and the correction terms are


\[ |\psi_k^{(j)}\rangle = (G_k V)^j |\psi_k^{(0)}\rangle, \qquad G_k = \sum_{m\neq k} \frac{|\psi_m^{(0)}\rangle\langle\psi_m^{(0)}|}{E_k - E_m^{(0)}}. \tag{S.9} \]
Note that $|\psi_k^{(j)}\rangle$ for j > 0 are not normalized and depend on λ through $E_k$ on the right-hand side. The associated energies of
the states are
\[ E_k = E_k^{(0)} + \sum_{j=1}^{\infty} \lambda^j \langle\psi_k^{(0)}|V (G_k V)^{j-1}|\psi_k^{(0)}\rangle. \tag{S.10} \]

The advantage of Eqs. (S.8)-(S.9) is that a degeneracy in the unperturbed energy spectrum, as in our case, does not
have to be treated with a separate formula, as is required in the standard Rayleigh-Schrödinger (RS)
perturbation theory [47, 48].
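Because $E_k$ appears on both sides of Eq. (S.10), the Brillouin-Wigner series defines the energy self-consistently. The sketch below illustrates this on a small non-degenerate toy matrix (the matrix, the value of λ, and the function name `bw_energy` are assumptions made for the example). It uses the resummed form $E_k = E_k^{(0)} + \lambda\langle\psi_k^{(0)}|V(1-\lambda G_k V)^{-1}|\psi_k^{(0)}\rangle$ of Eq. (S.10) and solves it by fixed-point iteration.

```python
import numpy as np

def bw_energy(E0, V, lam, k=0, iters=200):
    """Fixed-point solution of the resummed Brillouin-Wigner energy equation.

    E0  : unperturbed energies (H0 is diagonal in this toy example)
    V   : perturbation matrix in the unperturbed eigenbasis
    lam : perturbation strength
    k   : index of the level being followed
    """
    n = len(E0)
    mask = np.arange(n) != k
    E = E0[k]  # start from the unperturbed energy
    for _ in range(iters):
        g = np.zeros(n)
        g[mask] = 1.0 / (E - E0[mask])   # propagator G_k, which depends on E itself
        G = np.diag(g)
        M = np.linalg.inv(np.eye(n) - lam * G @ V)  # sum_j (lam G_k V)^j
        E = E0[k] + lam * (V @ M)[k, k]
    return E

# Toy data (assumed for illustration only).
E0 = np.array([0.0, 1.0, 2.0])
V = np.array([[0.3, 1.0, 0.5],
              [1.0, -0.2, 0.7],
              [0.5, 0.7, 0.1]])
lam = 0.1

E_bw = bw_energy(E0, V, lam)
E_exact = np.linalg.eigvalsh(np.diag(E0) + lam * V)[0]
print(E_bw, E_exact)
```

The example is non-degenerate on purpose; the degenerate groundspace relevant to this section is what Lemma 1 addresses.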
However, for an arbitrary choice of the degenerate eigenstates, the perturbation series in Eqs. (S.8)-(S.9) cannot be
treated as a Taylor series in the following sense. If we take into consideration only the first k terms in the series, the
contribution from the remaining terms will not have the usual $O(\lambda^{k+1})$ scaling that a converging Taylor series has.
This is because the denominator of the propagator $G_k$ can be polynomially small in λ, thus potentially reducing the
power of, or even cancelling, the λ-prefactor. Using the following Lemma, we show that this problem can be resolved by
choosing a proper basis for the degenerate groundspace.
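Lemma 1 There exists a basis $|\phi_k^{(0)}\rangle = \sum_{k'} \beta_{kk'} |\psi_{k'}^{(0)}\rangle$, where β is a 2 × 2 unitary transformation, such that
\[ |\psi_k\rangle = \frac{1}{N}\sum_{j=0}^{\infty} (\lambda \tilde G_0 V)^j |\phi_k^{(0)}\rangle + O(\lambda^d), \tag{S.11} \]
where N > 0 and $\tilde G_0$ takes the form
\[ \tilde G_0 = \sum_{m\neq 0,1} \frac{|\psi_m^{(0)}\rangle\langle\psi_m^{(0)}|}{E_0 - E_m^{(0)}}. \tag{S.12} \]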

The proof is given below at the end of this section. It is important to note that now, in the case of twofold
groundspace degeneracy, all denominators in Eq. (S.12) have O(1) scaling due to the presence of a gap in the
unperturbed Hamiltonian, see Eq. (S.10). This allows us to truncate the series with a controlled error. We will use the
expression from this Lemma in the following sections.
Another important result is the Knill-Laflamme condition for the resulting perturbed quantum code. With the help
of perturbation theory, it can be shown that the generalization of Eq. (S.3) to the groundspace codes can be expressed
as the following Lemma:
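Lemma 2 Let $|\Omega_k\rangle = \sum_{k'} u_{kk'} |\psi_{k'}\rangle$ be a superposition of the two lowest eigenstates of the Hamiltonian H in
Eq. (S.6) for some 2 × 2 unitary matrix u, and let $P_\alpha$ be p-local Pauli errors. Then
\[ \langle\Omega_k|P_\alpha P_\beta|\Omega_{k'}\rangle = \epsilon_{\alpha\beta}\,\delta_{kk'} + \lambda^q h_{\alpha k,\beta k'}, \tag{S.13} \]
where q = ⌈(d − 2p)/k⌉, and $\epsilon_{\alpha\beta}$ and $h_{\alpha k,\beta k'}$ are O(1) matrices satisfying $\epsilon_{\alpha\alpha} = 1$, $h_{\alpha k,\alpha k'} = 0$ and $h_{\alpha k,\beta k'} = h^*_{\beta k',\alpha k}$.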
The proof can be found below in this section.
Finally, we show the following technical result that we will use later.
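Lemma 3 Under the condition in Eq. (S.4), there exist a k-local V and p-local Pauli operators $P_\alpha$, $P_\beta$ such that
\[ \|T_{\alpha\beta}\|_1 > 0, \tag{S.14} \]
where
\[ T_{\alpha\beta} = t_{\alpha\beta} - \frac{1}{2}\mathrm{Tr}(t_{\alpha\beta})\,I, \qquad (t_{\alpha\beta})_{kl} := \sum_{m=0}^{d-2p} \langle\Omega_k^{(0)}|(V\tilde G_0)^m P_\alpha P_\beta (\tilde G_0 V)^{d-m-2p}|\Omega_l^{(0)}\rangle. \tag{S.15} \]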

Here I is 2 × 2 identity matrix and G̃0 is defined in Eq. (S.12).


This Lemma for p = 0 shows that there exists a perturbation V such that the two lowest eigenstates become
non-orthogonal at the d-th order of perturbation theory. In the presence of noise affecting p > 0 qubits, the overlap
becomes nonvanishing at the d − 2p order.

Proof of Lemma 1. We start from the Brillouin-Wigner series, writing it as



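\[ \forall k \in \{0,1\}: \quad |\psi_k\rangle = \frac{1}{N_k} M_k |\psi_k^{(0)}\rangle, \qquad M_k := \sum_{j=0}^{\infty} (\lambda G_k V)^j = \frac{1}{1 - \lambda G_k V}, \tag{S.16} \]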

where $N_k$ are normalization coefficients and $G_k$ is the propagator defined in Eq. (S.9). Here and below we treat 1 as
the identity operator. Next, for each k ∈ {0, 1} we consider the decomposition
\[ G_k = \tilde G_k + \Pi_k, \qquad \tilde G_k = \sum_{m\neq 0,1} \frac{|\psi_m^{(0)}\rangle\langle\psi_m^{(0)}|}{E_k - E_m^{(0)}}, \qquad \Pi_k = \frac{|\psi_{1-k}^{(0)}\rangle\langle\psi_{1-k}^{(0)}|}{E_k - E_{1-k}^{(0)}}, \tag{S.17} \]

where the operator $\tilde G_k$ does not act on the unperturbed codespace and $\Pi_k$ is proportional to the projector onto the
other (orthogonal) state in the codespace. Using this decomposition, we rewrite
\[ M_k = \frac{1}{1 - \lambda(\tilde G_k + \Pi_k)V} = \tilde M_k \frac{1}{1 - \lambda \Pi_k V \tilde M_k}, \tag{S.18} \]

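where we use the notation $\tilde M_k := (1 - \lambda \tilde G_k V)^{-1}$. Then, by applying the Taylor series expansion, we can rewrite the
second factor in the above operator product as
\[ \frac{1}{1 - \lambda \Pi_k V \tilde M_k} = \sum_{j=0}^{\infty} (\lambda \Pi_k V \tilde M_k)^j = 1 + \sum_{j=1}^{\infty} (\lambda \Pi_k V \tilde M_k)^j = 1 + \lambda \Big( \sum_{j=0}^{\infty} (\lambda \Pi_k V \tilde M_k)^j \Big) \Pi_k V \tilde M_k = 1 + \lambda \frac{1}{1 - \lambda \Pi_k V \tilde M_k} \Pi_k V \tilde M_k. \tag{S.19} \]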

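Here, in the last step, we recombine the series back into the original expression. This gives us an expanded version of
Eq. (S.18) as
\[ M_k = \tilde M_k \Big( 1 + \lambda \frac{1}{1 - \lambda \Pi_k V \tilde M_k} \Pi_k V \tilde M_k \Big) = \tilde M_k \Big( 1 + \frac{\lambda |\psi_{1-k}^{(0)}\rangle\langle\psi_{1-k}^{(0)}| V \tilde M_k}{E_k - E_{1-k}^{(0)} - \lambda \langle\psi_{1-k}^{(0)}|V \tilde M_k|\psi_{1-k}^{(0)}\rangle} \Big). \tag{S.20} \]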
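The operator identities in Eqs. (S.18) and (S.20) are purely algebraic and can be sanity-checked with random matrices. In the sketch below all numerical values (dimension, spectrum, λ, and the stand-in value `E` for the perturbed energy $E_k$) are assumptions chosen only so that every inverse exists.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 6, 0.05

# Assumed unperturbed spectrum: a doubly degenerate ground level and gapped excited levels.
E0 = np.array([0.0, 0.0, 2.0, 2.0, 4.0, 4.0])
E = -0.3  # stand-in for the perturbed energy E_k with k = 0

A = rng.normal(size=(n, n))
V = (A + A.T) / 2  # random symmetric perturbation

# Decomposition G_k = G_tilde + Pi of Eq. (S.17), written in the unperturbed eigenbasis.
g = np.zeros(n)
g[2:] = 1.0 / (E - E0[2:])
G_tilde = np.diag(g)
e1 = np.zeros(n)
e1[1] = 1.0  # the other codespace state |psi_{1-k}>
Pi = np.outer(e1, e1) / (E - E0[1])

one = np.eye(n)
M = np.linalg.inv(one - lam * (G_tilde + Pi) @ V)
M_tilde = np.linalg.inv(one - lam * G_tilde @ V)

# Eq. (S.18): factorized form of the full resolvent.
M_factored = M_tilde @ np.linalg.inv(one - lam * Pi @ V @ M_tilde)

# Eq. (S.20): closed form obtained by resumming the rank-one series.
denom = E - E0[1] - lam * (e1 @ V @ M_tilde @ e1)
M_rank_one = M_tilde @ (one + lam * np.outer(e1, e1 @ V @ M_tilde) / denom)

print(np.allclose(M, M_factored), np.allclose(M, M_rank_one))
```

The rank-one structure of $\Pi_k$ is what allows the geometric series in Eq. (S.19) to be summed in closed form.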

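The usefulness of this expression becomes apparent when we insert it into Eq. (S.16) and obtain
\[ \forall k \in \{0,1\}: \quad |\psi_k\rangle = \frac{1}{N_k} \tilde M_k \Big( |\psi_k^{(0)}\rangle + f_k(\lambda) |\psi_{1-k}^{(0)}\rangle \Big), \tag{S.21} \]
where we introduced
\[ f_k(\lambda) = \frac{\lambda \langle\psi_{1-k}^{(0)}|V \tilde M_k|\psi_k^{(0)}\rangle}{E_k - E_{1-k}^{(0)} - \lambda \langle\psi_{1-k}^{(0)}|V \tilde M_k|\psi_{1-k}^{(0)}\rangle}, \qquad f_1(\lambda) = -f_0^*(\lambda) + O(\lambda^d). \tag{S.22} \]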

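Let us rewrite Eq. (S.21) using a new normalization $\tilde N_k$ as
\[ |\psi_k\rangle = \frac{1}{\tilde N_k} \sum_{k'} \beta_{kk'} \tilde M_k |\psi_{k'}^{(0)}\rangle + O(\lambda^d), \tag{S.23} \]
where β is a unitary matrix
\[ \beta = \frac{1}{w} \begin{pmatrix} 1 & f_0 \\ -f_0^* & 1 \end{pmatrix}, \tag{S.24} \]
using $w = \sqrt{1 + |f_0|^2}$ and $\tilde N_k = N_k / w$. From the normalization condition we have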

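\[ \langle\psi_k|\psi_l\rangle = \frac{1}{\tilde N_k \tilde N_l} \sum_{k'l'} \beta^*_{kk'} \beta_{ll'} \langle\psi_{k'}^{(0)}|\tilde M_k^\dagger \tilde M_l|\psi_{l'}^{(0)}\rangle = \frac{1}{\tilde N_k^2} \langle\psi_k^{(0)}|\tilde M_k^\dagger \tilde M_k|\psi_k^{(0)}\rangle \delta_{kl} + O(\lambda^d) = \delta_{kl}. \tag{S.25} \]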

Therefore
\[ \tilde N_k = \big[ \langle\psi_k^{(0)}|\tilde M_k^\dagger \tilde M_k|\psi_k^{(0)}\rangle \big]^{1/2} + O(\lambda^d). \tag{S.26} \]

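Using the fact that $\tilde M_0 = \tilde M_1 + O(\lambda^d)$, we conclude that $N = \tilde N_0 = \tilde N_1 + O(\lambda^d)$, which leads us to
\[ |\psi_k\rangle = \frac{1}{N} \sum_{k'} \beta_{kk'} \sum_{m=0}^{\infty} (\lambda \tilde G_0 V)^m |\psi_{k'}^{(0)}\rangle + O(\lambda^d). \tag{S.27} \]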

This expression concludes our proof. □

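Proof of Lemma 2. We expand both codewords in the BW series using Eq. (S.11) to get
\[ \langle\Omega_k|P_\alpha P_\beta|\Omega_{k'}\rangle = \frac{1}{N^2} \sum_{l,l'\in\{0,1\}} \sum_{m,m'=0}^{\infty} u^*_{kl} u_{k'l'} \lambda^{m+m'} \langle\Omega_l^{(0)}|(V\tilde G_0)^m P_\alpha P_\beta (\tilde G_0 V)^{m'}|\Omega_{l'}^{(0)}\rangle. \tag{S.28} \]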

Next, we use the decomposition of the perturbation operator $V = \sum_\nu v_\nu K_\nu$ with k-local Pauli operators $K_\nu$. Then
consider $|\psi_k^{(0)}\rangle$ to be an eigenstate of the Hamiltonian $H_0$. By Claim 1, $K_\nu|\psi_k^{(0)}\rangle$ is also an eigenstate of the Hamiltonian
$H_0$, hence
\[ \tilde G_0 K_\nu |\psi_k^{(0)}\rangle = \frac{1}{E_0 - E_{\nu k}} K_\nu |\psi_k^{(0)}\rangle, \qquad E_{\nu k} := \langle\psi_k^{(0)}|K_\nu H_0 K_\nu|\psi_k^{(0)}\rangle. \tag{S.29} \]
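Since $|\Omega_k^{(0)}\rangle \equiv |\psi_k^{(0)}\rangle$ for k = 0, 1, there exist real coefficients $\kappa_{\nu_1\ldots\nu_m}$ such that
\[ (V\tilde G_0)^m |\Omega_k^{(0)}\rangle = \sum_{\nu_1\ldots\nu_m} \kappa_{\nu_1\ldots\nu_m} K_{\nu_1} \cdots K_{\nu_m} |\Omega_k^{(0)}\rangle. \tag{S.30} \]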

Using this relation and the Knill-Laflamme condition in Eq. (S.3), for all m + m′ < q := ⌈(d − 2p)/k⌉ the matrix
element in Eq. (S.28) has the form
\[ \langle\Omega_l^{(0)}|(V\tilde G_0)^m P_\alpha P_\beta (\tilde G_0 V)^{m'}|\Omega_{l'}^{(0)}\rangle = \epsilon^{mm'}_{\alpha\beta}\,\delta_{ll'}. \tag{S.31} \]

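We have used the following notation
\[ \epsilon^{mm'}_{\alpha\beta} := \sum_{\mu_1,\ldots,\mu_m} \sum_{\nu_1,\ldots,\nu_{m'}} \kappa^*_{\mu_1,\ldots,\mu_m} \kappa_{\nu_1,\ldots,\nu_{m'}} \prod_{i=1}^{m} v^*_{\mu_i} \prod_{i=1}^{m'} v_{\nu_i}. \tag{S.32} \]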

Inserting Eq. (S.31) into Eq (S.28), we get

⟨Ωk |Pα Pβ |Ωk′ ⟩ = ϵαβ δkk′ + λq hαk,βk′ , (S.33)

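where
\[ \epsilon_{\alpha\beta} = \frac{1}{N^2} \sum_{m=0}^{q-1} \sum_{m'=0}^{q-1-m} \lambda^{m+m'} \epsilon^{mm'}_{\alpha\beta}, \]
\[ h_{\alpha k,\beta k'} = \frac{1}{N^2} \sum_{m,m'=0}^{\infty} \lambda^{m+m'-q} H(m+m'-q) \sum_{l,l'\in\{0,1\}} u^*_{kl} u_{k'l'} \langle\Omega_l^{(0)}|(V\tilde G_0)^m P_\alpha P_\beta (\tilde G_0 V)^{m'}|\Omega_{l'}^{(0)}\rangle, \tag{S.34} \]
where H(·) is the Heaviside step function. Note that since $P_\alpha^2 = I$, we have $\epsilon_{\alpha\alpha} = 1$ and $h_{\alpha k,\alpha k'} = 0$. Also, since
$\langle\Omega_k|P_\alpha P_\beta|\Omega_{k'}\rangle = \langle\Omega_{k'}|P_\beta P_\alpha|\Omega_k\rangle^*$, we have $h_{\alpha k,\beta k'} = h^*_{\beta k',\alpha k}$. □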

Proof of Lemma 3. According to its definition, the operator Oerr in Eq. (S.4) is a Pauli operator and thus is a
product of single-qubit Pauli operators. Given the freedom in choosing p-local operators Pα and Pβ , we can set Pα Pβ
to be a product of 2p one-qubit Pauli operators that are already in Oerr. Then we choose a set of non-overlapping
k-local Pauli operators $\{K_i\}_{i=1}^{\lceil (d-2p)/k \rceil}$ to be complementary such that the uncorrectable error operator can be written
as

Oerr = K1 . . . K⌈(d−2p)/k⌉ Pα Pβ . (S.35)

We also choose operators Ki that have no overlap with Pα Pβ . Now we choose the perturbation V in the form
X
V = Ki . (S.36)
i

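For this perturbation, we can rewrite the main object of this Lemma as
\[ (T_{\alpha\beta})_{kl} := \sum_{m=0}^{d-2p} \langle\Omega_k^{(0)}|(V\tilde G_0)^m P_\alpha P_\beta (\tilde G_0 V)^{d-m-2p}|\Omega_l^{(0)}\rangle = \sum_{m=0}^{d-2p} \sum_{a} \langle\Omega_k^{(0)}|K_{a_1}\tilde G_0 \cdots K_{a_m}\tilde G_0 P_\alpha P_\beta \tilde G_0 K_{a_{m+1}} \tilde G_0 \cdots \tilde G_0 K_{a_d}|\Omega_l^{(0)}\rangle. \tag{S.37} \]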

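Now, let us define a sequence of energy values
\[ \mathcal{E}_{a,m} = \{ E_{(a_1),m},\ E_{(a_1,a_2),m},\ \ldots,\ E_{(a_1,\ldots,a_d),m} \}, \tag{S.38} \]
where
\[ E_{(a_1,\ldots,a_l),m} = \begin{cases} \langle\Omega_0^{(0)}|K_{a_1}\cdots K_{a_l} H_0 K_{a_l}\cdots K_{a_1}|\Omega_0^{(0)}\rangle, & \text{if } l < m, \\ \langle\Omega_0^{(0)}|K_{a_1}\cdots K_{a_l} P_\beta P_\alpha H_0 P_\alpha P_\beta K_{a_l}\cdots K_{a_1}|\Omega_0^{(0)}\rangle, & \text{if } l = m, \\ \langle\Omega_0^{(0)}|K_{a_1}\cdots K_{a_m} P_\beta P_\alpha K_{a_m}\cdots K_{a_l} H_0 K_{a_l}\cdots K_{a_m} P_\alpha P_\beta K_{a_l}\cdots K_{a_1}|\Omega_0^{(0)}\rangle, & \text{if } l > m. \end{cases} \tag{S.39} \]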

Each $E_{(a_1,\ldots,a_l),m}$ is an energy of an eigenstate of the original stabilizer Hamiltonian, therefore $E_{(a_1,\ldots,a_l),m} > E_0$.
Using the expression for $\tilde G_0$ in Eq. (S.12) we can rewrite Eq. (S.37) as
\[ (T_{\alpha\beta})_{kl} = \sum_{m=0}^{d-2p} \sum_{a\in \mathrm{Perm}(d+1)} \prod_{E\in\mathcal{E}_{a,m}} \frac{1}{E_0 - E} \langle\Omega_k^{(0)}|O_{\rm err}|\Omega_l^{(0)}\rangle = c' \langle\Omega_k^{(0)}|O_{\rm err}|\Omega_l^{(0)}\rangle, \tag{S.40} \]
where Perm(n) is defined as the set of all vectors obtained by permuting the elements of (1, . . . , n) and c′ is a
non-vanishing coefficient. We have |c′| > 0 because c′ is a sum of terms that all have the same sign, as $E_0 < E$ for any $E \in \mathcal{E}_{a,m}$.
Using Eq. (S.37), we then obtain
\[ \|T_{\alpha\beta}\|_1 = |c'|\,\|\delta O\|_1 > 0. \tag{S.41} \]

This expression concludes our proof. □



II. PROOF OF THEOREM 1

Theorem 1 (formal) Let $H_0$ be a [[n, 1, d]] stabilizer Hamiltonian with r-local parity check operators $S_a$, $Q_L$ a logical
Pauli operator, and $V = \sum_{i=1}^{N} V_i$, where $V_i$ are k-local operators, k < d. Then, for any V satisfying $\|[V, S_a]\| > 0$ for
N′ operators $S_a$, the generalization error in Eq. (3) for noiseless input (p = 0) is $\varepsilon_Q(Q_L) = \Theta(\lambda^4 N'^2 / r^4)$.
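Proof Let us rewrite the generalization error measure from Eq. (3) as
\[ \varepsilon_Q(Q_L) = \int d\theta\, d\phi\, \mu(\theta,\phi) \Big( \langle\Psi_{\theta,\phi}|Q_L|\Psi_{\theta,\phi}\rangle - \langle\Psi^{(0)}_{\theta,\phi}|Q_L|\Psi^{(0)}_{\theta,\phi}\rangle \Big)^2, \tag{S.42} \]
where $Q_L \in \{X_L, Y_L, Z_L\}$ is a logical Pauli operator, $|\Psi^{(0)}_{\theta,\phi}\rangle$ are logical states of the unperturbed stabilizer code and
$|\Psi_{\theta,\phi}\rangle$ are logical states of the perturbed code,
\[ |\Psi^{(0)}_{\theta,\phi}\rangle = \sum_{k\in\{0,1\}} \psi_k(\theta,\phi) |\Omega_k^{(0)}\rangle, \qquad |\Psi_{\theta,\phi}\rangle = \sum_{k\in\{0,1\}} \psi_k(\theta,\phi) |\Omega_k\rangle, \tag{S.43} \]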

where $\psi_0(\theta,\phi) = \cos\theta$ and $\psi_1(\theta,\phi) = e^{i\phi}\sin\theta$ are two-dimensional wavefunction components. We choose the
codewords to be simultaneously the lowest-energy eigenstates of the Hamiltonian $H_0$ and eigenstates of the logical operator,
\[ Q_L |\Omega_k^{(0)}\rangle = (-1)^k |\Omega_k^{(0)}\rangle. \tag{S.44} \]
In turn, the perturbed codewords are a superposition of the two lowest eigenstates of the perturbed Hamiltonian H,
\[ |\Omega_k\rangle = \sum_{k'\in\{0,1\}} \alpha_{kk'} |\psi_{k'}\rangle, \tag{S.45} \]

where αkk′ are components of a two-dimensional unitary rotation. To prove the theorem, we must find the lower
bound under the assumption that the choice of α can be arbitrary.
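Using these definitions, we rewrite the generalization error as
\[ \varepsilon_Q(Q_L) = \int d\theta\, d\phi\, \mu(\theta,\phi) \Big( \sum_{k,l\in\{0,1\}} \psi_k^*(\theta,\phi) \psi_l(\theta,\phi) \big[ \langle\Omega_k|Q_L|\Omega_l\rangle - \langle\Omega_k^{(0)}|Q_L|\Omega_l^{(0)}\rangle \big] \Big)^2. \tag{S.46} \]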

We also use Lemma 1 to rewrite the eigenstates of the Hamiltonian as a series
\[ |\psi_k\rangle = \frac{1}{N} \sum_{k'\in\{0,1\}} \beta_{kk'} \sum_{m=0}^{\infty} (\lambda \tilde G_0 V)^m |\Omega_{k'}^{(0)}\rangle + O(\lambda^d), \tag{S.47} \]

where we have chosen $|\psi_k^{(0)}\rangle = |\Omega_k^{(0)}\rangle$ for k ∈ {0, 1}. Combining the expressions in Eqs. (S.45) and (S.47), we
express the codewords of the perturbed Hamiltonian through the stabilizer codewords as
\[ |\Omega_k\rangle = \frac{1}{N} \sum_{k'} u_{kk'} \sum_{m=0}^{\infty} (\lambda \tilde G_0 V)^m |\Omega_{k'}^{(0)}\rangle + O(\lambda^d), \tag{S.48} \]

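where we introduce the two-dimensional matrix u = αβ, $uu^\dagger = I$. With the help of this expression, we rewrite
\[ \langle\Omega_k|Q_L|\Omega_l\rangle = \frac{1}{N^2} \sum_{k'l'} u^*_{kk'} u_{ll'} \sum_{m,m'=0}^{\infty} \lambda^{m+m'} \langle\Omega_{k'}^{(0)}|(V\tilde G_0)^m Q_L (\tilde G_0 V)^{m'}|\Omega_{l'}^{(0)}\rangle + O(\lambda^d). \tag{S.49} \]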

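Next, considering only the lowest orders in perturbation theory, we get the expression
\[ \langle\Omega_k|Q_L|\Omega_l\rangle = \frac{1}{N^2} \sum_{k'l'} u^*_{kk'} u_{ll'} \Big( (-1)^{k'} \delta_{k'l'} + \lambda^2 \langle\Omega_{k'}^{(0)}|V\tilde G_0 Q_L \tilde G_0 V|\Omega_{l'}^{(0)}\rangle \Big) + O(\lambda^3), \tag{S.50} \]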

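where we used that $\tilde G_0 |\Omega_k^{(0)}\rangle = 0$. Using perturbation theory and the Knill-Laflamme condition, we can also rewrite
the normalization condition as
\[ \langle\Omega_k|\Omega_k\rangle = \frac{1}{N^2} \Big( 1 + \lambda^2 \langle\Omega_0^{(0)}|V\tilde G_0^2 V|\Omega_0^{(0)}\rangle \Big) + O(\lambda^3) = 1. \tag{S.51} \]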

Thus, we conclude that
\[ \frac{1}{N^2} = 1 - \lambda^2 \langle\Omega_0^{(0)}|V\tilde G_0^2 V|\Omega_0^{(0)}\rangle + O(\lambda^3). \tag{S.52} \]
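For completeness, the step from Eq. (S.51) to Eq. (S.52) is a first-order geometric expansion. The shorthand $x$ below is introduced only for this remark.

```latex
% Shorthand (ours): x := <Omega_0^{(0)}| V \tilde G_0^2 V |Omega_0^{(0)}>.
\begin{align}
  N^2 &= 1 + \lambda^2 x + O(\lambda^3), \\
  \frac{1}{N^2} &= \frac{1}{1 + \lambda^2 x + O(\lambda^3)}
                 = 1 - \lambda^2 x + O(\lambda^3).
\end{align}
```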
Combining Eqs. (S.50) and (S.52), we get the expression
\[ \langle\Omega_k|Q_L|\Omega_l\rangle = \sum_{k'l'} u^*_{kk'} u_{ll'} \Big( (-1)^{k'} \delta_{k'l'} - \lambda^2 M_{k'l'} \Big) + O(\lambda^3), \tag{S.53} \]

where we introduced the 2 × 2 matrix
\[ M_{kl} = \langle\Omega_k^{(0)}|V\tilde G_0 (I - Q_L) \tilde G_0 V|\Omega_k^{(0)}\rangle\, \delta_{kl}. \tag{S.54} \]
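Since the eigenstates of $H_0$ are doubly degenerate at each energy level, it is convenient to split the index k in $|\psi_k^{(0)}\rangle$ as
$|\psi_{i\pm}^{(0)}\rangle$ to denote simultaneous eigenstates of $Q_L$ and $H_0$,
\[ Q_L |\psi_{i\pm}^{(0)}\rangle = \pm|\psi_{i\pm}^{(0)}\rangle, \qquad H_0 |\psi_{i\pm}^{(0)}\rangle = E_i^{(0)} |\psi_{i\pm}^{(0)}\rangle. \tag{S.55} \]
Note that $|\psi_{0+}^{(0)}\rangle \equiv |\Omega_0^{(0)}\rangle$ and $|\psi_{0-}^{(0)}\rangle \equiv |\Omega_1^{(0)}\rangle$ by definition. Since $[Q_L, H_0] = 0$, we can present the matrix M as
\[ M = \begin{pmatrix} M_0 & 0 \\ 0 & M_1 \end{pmatrix}, \tag{S.56} \]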

where its diagonal elements satisfy
\[ M_k = 2 \sum_{i>0} \frac{1}{(E_0 - E_i)^2} \big| \langle\Omega_k^{(0)}|V|\psi_{i-}^{(0)}\rangle \big|^2. \tag{S.57} \]

Now, we can rewrite the generalization error as
\[ \varepsilon_Q(Q_L) = \int d\theta\, d\phi\, \mu(\theta,\phi) \Big( \langle\theta\phi|u(Z - \lambda^2 M)u^\dagger|\theta\phi\rangle - \langle\theta\phi|Z|\theta\phi\rangle \Big)^2 + O(\lambda^3), \tag{S.58} \]
where Z is the 2 × 2 Pauli-Z matrix and $|\theta\phi\rangle = \cos\theta|0\rangle + e^{i\phi}\sin\theta|1\rangle$. Assuming that $V = \sum_{i=1}^{N} V_i$ consists of local terms
that do not commute with at least N′ stabilizers $S_a$, the matrix element $\langle\Omega_k^{(0)}|V|\psi_{i-}^{(0)}\rangle$ is non-vanishing for at least N′ eigenstates.
Furthermore, assuming that the majority of local terms do not commute with ∝ r stabilizers, the energy difference
in the denominator is $E_i - E_0 \propto r$. Therefore
\[ \varepsilon_Q(Q_L) = O(\lambda^4 \max(M_0, M_1)) = O(\lambda^4 N'^2 / r^4). \tag{S.59} \]

This expression concludes our proof. □

Remark 1 Condition [H0 , V ] = 0 break the notion of generality, both provide vanishing generalization error.
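Indeed, the first condition automatically satisfies $\langle\Omega_k^{(0)}|V|\psi_{i-}^{(0)}\rangle = 0$. Using the second condition, one can also show
that
\[ \forall i \neq 0: \quad \langle\Omega_k^{(0)}|V|\psi_{i-}^{(0)}\rangle = \frac{1}{E_0^{(0)} - E_i^{(0)}} \langle\Omega_k^{(0)}|[H_0, V]|\psi_{i-}^{(0)}\rangle = 0. \tag{S.60} \]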

It is also straightforward to show that these assumptions lead to vanishing higher orders.

Remark 2 For a typical local term V , one should expect N ′ ∼ N .

Remark 3 The scaling exponent 4 in Theorem 1 is due to the fact that we use the mean square loss to measure the
generalization error; the exponent would be 2x if we used an $L_x$ loss function.

III. PROOF OF THEOREM 2

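First, let us introduce the weighted squared expectation for a 2 × 2 operator O as
\[ \langle O\rangle_{\mu(\theta,\phi)} := \min_{\theta_0,\phi_0} \int d\theta\, d\phi\, \mu(\theta_0 + \theta, \phi_0 + \phi)\, |\langle\psi_{\theta,\phi}|O|\psi_{\theta,\phi}\rangle|^2, \tag{S.61} \]
where we define $|\psi_{\theta,\phi}\rangle = \cos\theta|0\rangle + e^{i\phi}\sin\theta|1\rangle$. For later convenience, we also introduce the following notation for the
Λth order contributions (in the BW expansion) of decoding noisy input states using standard QEC,
\[ W^{(\Lambda)}_{kl} := \mathbb{E}_\alpha \sum_{m=0}^{\Lambda} \mathrm{Tr}\big( Z_L\, \mathcal{E}_{\rm QEC}(|\psi^{m}_{l\alpha}\rangle\langle\psi^{\Lambda-m}_{k\alpha}|) \big), \tag{S.62} \]
where $|\psi^{m}_{k\alpha}\rangle = (V\tilde G_0)^m P_\alpha |\Omega_k^{(0)}\rangle$. Then we formulate the formal version of the theorem as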
Theorem 2 (Formal) Let $H_0$ be the stabilizer Hamiltonian for the [[n, 1, d]] code, $V = \sum_{i=1}^{N} V_i$, where $V_i$ are k-local
operators, k < d, and $Q_{\rm QEC}$ is the operator of logical measurement after correction in Eq. (). Then, for any V, input
distribution µ(θ, ϕ), and distribution of p-qubit Pauli errors $P_\alpha$, the generalization error in Eq. (3) is

εQ (QQEC ) = O(λ2⌈ξ/k⌉ ), (S.63)

where ξ = d if p = 0 and ξ = d − 2p + 1 if p > 0. Moreover, for p ≥ 1 if

⟨W (d−2p+1) ⟩µ(θ,ϕ) > 0, (S.64)

the generalization error is

εQ (QQEC ) = Θ(λ2⌈(d+1−2p)/k⌉ ). (S.65)

Remark 4 For any given µ(θ, ϕ), there is no reason to believe that Eq. (S.64) is violated for more than a small family
of finely tuned perturbations V and noise distributions. Therefore, in the informal version of the theorem, we call V
that satisfies this condition to be “general”.

Proof Consider $S_\alpha$ to be commuting Pauli parity checks for the stabilizer code satisfying $[S_\alpha, S_{\alpha'}] = 0$ and
$S_\alpha|\Omega_k^{(0)}\rangle = |\Omega_k^{(0)}\rangle$. For any such code, there exists a set of distinct Pauli corrections $C_\beta = C_\beta^\dagger$, satisfying $S_\alpha C_\beta =
s_{\alpha\beta} C_\beta S_\alpha$, where $s_{\alpha\beta} = \pm 1$ are error syndromes. The choice of each operator $C_\beta$ is not unique; we consider the set
where each $C_\beta$ has the smallest possible weight. In this representation, these operators must satisfy $2S(C_\beta) \leq d - 1$,
where S(O) represents the weight of the operator O, i.e. the number of qubits on which it acts non-trivially. Using this
notation, we can write the error correction as a quantum channel
\[ \mathcal{C}(\rho) := \sum_\beta F_\beta \rho F_\beta^\dagger, \qquad F_\beta = C_\beta \prod_\alpha \frac{1}{2}(1 + s_{\alpha\beta} S_\alpha) = P C_\beta, \tag{S.66} \]

where $F_\beta$ are Kraus operators for error correction and P is the projector onto the codespace. It is convenient to express the
expectation values of the logical operators after correction as
\[ \mathrm{Tr}(Q_L \mathcal{C}(\rho)) = \mathrm{Tr}(Q_L^{(0)} \mathcal{C}(\rho)) = \mathrm{Tr}(\mathcal{C}^\dagger(Q_L^{(0)}) \rho) = \mathrm{Tr}(\tilde Q_L \rho), \tag{S.67} \]
where $\mathcal{C}^\dagger$ is the adjoint map of the error-correcting map $\mathcal{C}$, and $Q_L^{(0)} = |\Omega_0^{(0)}\rangle\langle\Omega_0^{(0)}| - |\Omega_1^{(0)}\rangle\langle\Omega_1^{(0)}|$ is the restriction of
the logical operator $Q_L$ to the codespace. Here $|\Omega_k^{(0)}\rangle$ are codewords that, without loss of generality, have been chosen as
eigenstates of the logical operator $Q_L$, i.e. $Q_L|\Omega_k^{(0)}\rangle = (-1)^k|\Omega_k^{(0)}\rangle$. The operator $\tilde Q_L$ represents an effective logical
operator after correction,
\[ \tilde Q_L := \mathcal{C}^\dagger(Q_L^{(0)}) \equiv \sum_\mu F_\mu^\dagger Q_L^{(0)} F_\mu = \sum_\beta \sum_{r\in\{0,1\}} (-1)^r C_\beta |\Omega_r^{(0)}\rangle\langle\Omega_r^{(0)}| C_\beta. \tag{S.68} \]
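The structure of the correction channel in Eq. (S.66) can be made concrete on a toy example. The sketch below assumes the three-qubit bit-flip repetition code (stabilizers Z1Z2 and Z2Z3, corrections I, X1, X2, X3); this choice and all variable names are illustrative assumptions. It builds the Kraus operators $F_\beta$, checks $F_\beta = P C_\beta$ and trace preservation, and verifies that a single bit flip on an encoded state is undone.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron_all(*ops):
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

one = np.eye(8, dtype=complex)
stabs = [kron_all(Z, Z, I2), kron_all(I2, Z, Z)]
corrections = [one, kron_all(X, I2, I2), kron_all(I2, X, I2), kron_all(I2, I2, X)]

def syndrome(S, C):
    """s = +1 if S and C commute, -1 if they anticommute."""
    return 1.0 if np.allclose(S @ C, C @ S) else -1.0

# Codespace projector P = prod_a (1 + S_a) / 2.
P = one.copy()
for S in stabs:
    P = P @ (one + S) / 2

# Kraus operators F_b = C_b prod_a (1 + s_ab S_a) / 2, as in Eq. (S.66).
kraus = []
for C in corrections:
    F = C.copy()
    for S in stabs:
        F = F @ (one + syndrome(S, C) * S) / 2
    kraus.append(F)

matches_PC = all(np.allclose(F, P @ C) for F, C in zip(kraus, corrections))
completeness = sum(F.conj().T @ F for F in kraus)

# Encoded state cos(t)|000> + e^{i p} sin(t)|111> with a bit flip on the second qubit.
t, p = 0.7, 1.3
psi = np.zeros(8, dtype=complex)
psi[0] = np.cos(t)
psi[7] = np.exp(1j * p) * np.sin(t)
rho_err = np.outer(corrections[2] @ psi, (corrections[2] @ psi).conj())
rho_rec = sum(F @ rho_err @ F.conj().T for F in kraus)

print(matches_PC, np.allclose(completeness, one), np.allclose(rho_rec, np.outer(psi, psi.conj())))
```

The repetition code protects only against bit flips, so it does not satisfy the assumptions of Theorem 2; it is used here only because the four syndrome sectors can be enumerated by hand.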

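We aim to derive an upper bound on the generalization error
\[ \varepsilon_Q(Q_L) = \mathbb{E}_\alpha \int d\theta\, d\phi\, \Big( \langle\Psi_{\theta,\phi}|P_\alpha \tilde Q_L P_\alpha|\Psi_{\theta,\phi}\rangle - \langle\Psi^{(0)}_{\theta,\phi}|Q_L|\Psi^{(0)}_{\theta,\phi}\rangle \Big)^2, \tag{S.69} \]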

where Pα are Pauli operators of weight p according to the statement of the theorem.
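As a first step, we use the relationship in Eq. (S.48) between the perturbed and unperturbed codewords to express
the following matrix element as
\[ \langle\Omega_k|P_\alpha \tilde Q_L P_\alpha|\Omega_l\rangle = \frac{1}{N^2} \sum_{k'l'} u^*_{kk'} u_{ll'} \sum_{m,m'=0}^{\infty} \lambda^{m+m'} \langle\Omega_{k'}^{(0)}|(V\tilde G_0)^m P_\alpha \tilde Q_L P_\alpha (\tilde G_0 V)^{m'}|\Omega_{l'}^{(0)}\rangle + O(\lambda^d). \tag{S.70} \]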
Next we note that for a k-local perturbation, we can express $V = \sum_\nu v_\nu K_\nu$, where $K_\nu$ are k-local Pauli operators. Then,
it is possible to write the following decomposition for the perturbation series,
\[ (\tilde G_0 V)^m |\Omega_k^{(0)}\rangle = \sum_\mu \kappa_\mu \lambda^{s_\mu} K_\mu |\Omega_k^{(0)}\rangle, \qquad s_\mu := S(K_\mu) \leq mk, \tag{S.71} \]

where $\kappa_\mu$ are certain coefficients that are not divergent in the limit λ → 0. To prove Eq. (S.71), we express
\[ (\tilde G_0 V)^m |\Omega_k^{(0)}\rangle = \sum_a \frac{v_{a_j} \cdots v_{a_1}}{(E_k - E_{f_j}^{(0)}) \cdots (E_k - E_{f_1}^{(0)})} \lambda^{S(K_{a_j} \cdots K_{a_1})} K_{a_j} \cdots K_{a_1} |\Omega_k^{(0)}\rangle, \tag{S.72} \]

where fi = fi (a) are labels of the energy levels that depend on the sequence a.
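Using Eq. (S.71) leads us to the expression
\[ \langle\Omega_k|P_\alpha \tilde Q_L P_\alpha|\Omega_l\rangle = \sum_{k'l'} u^*_{kk'} u_{ll'} \sum_{\mu\nu} \kappa^*_\mu \kappa_\nu \lambda^{s_\mu + s_\nu} \sum_\beta \sum_r (-1)^r \langle\Omega_{k'}^{(0)}|K_\mu P_\alpha C_\beta|\Omega_r^{(0)}\rangle \langle\Omega_r^{(0)}|C_\beta P_\alpha K_\nu|\Omega_{l'}^{(0)}\rangle + O(\lambda^d). \tag{S.73} \]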

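For the convenience of computing this expression, let us introduce the following definition. For any two Pauli operators
A and B, we say that the action of A is isomorphic to the action of B, or A ∼ B, if $A|\Omega_k^{(0)}\rangle = B|\Omega_k^{(0)}\rangle$ for k ∈ {0, 1}.
Then, for any code stabilized by Pauli operators, we can rewrite
\[ \langle\Omega_a^{(0)}|K_\mu P_\alpha C_\beta|\Omega_b^{(0)}\rangle = e^{i\phi_{\mu\alpha\beta}} \begin{cases} \delta_{ab}, & \text{if } K_\mu P_\alpha C_\beta \sim e^{i\phi_{\mu\alpha\beta}} I, \\ \delta_{\bar a b}, & \text{if } K_\mu P_\alpha C_\beta \sim e^{i\phi_{\mu\alpha\beta}} X_L, \\ (-1)^a i\, \delta_{\bar a b}, & \text{if } K_\mu P_\alpha C_\beta \sim e^{i\phi_{\mu\alpha\beta}} Y_L, \\ (-1)^a \delta_{ab}, & \text{if } K_\mu P_\alpha C_\beta \sim e^{i\phi_{\mu\alpha\beta}} Z_L, \\ 0, & \text{otherwise}, \end{cases} \tag{S.74} \]
where $\bar a := 1 - a$ and $\phi_{\mu\alpha\beta}$ is a phase.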


Keeping these expressions in mind, let us focus on the product of matrix elements in Eq. (S.73), i.e.
\[ \langle\Omega_{k'}^{(0)}|K_\mu P_\alpha C_\beta|\Omega_r^{(0)}\rangle \langle\Omega_r^{(0)}|C_\beta P_\alpha K_\nu|\Omega_{l'}^{(0)}\rangle \tag{S.75} \]

and consider three possible scenarios where it is different from zero.

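• Scenario 1. Both operators of the matrix elements are isomorphic to an identity transformation, i.e. $K_\mu P_\alpha C_\beta \sim
e^{i\phi_{\mu\alpha\beta}} I$ and $K_\nu P_\alpha C_\beta \sim e^{i\phi_{\nu\alpha\beta}} I$. This case also means that $K_\mu K_\nu \sim e^{i\phi_{\mu\nu}} I$, where $\phi_{\mu\nu} = \phi_{\mu\alpha\beta} - \phi_{\nu\alpha\beta}$. The
value of the sum is given by this unit-modulus product of non-vanishing matrix elements,
\[ \sum_\beta \langle\Omega_{k'}^{(0)}|K_\mu P_\alpha C_\beta|\Omega_r^{(0)}\rangle \langle\Omega_r^{(0)}|C_\beta P_\alpha K_\nu|\Omega_{l'}^{(0)}\rangle = e^{i\phi_{\mu\nu}} \delta_{k'r} \delta_{rl'}. \tag{S.76} \]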

• Scenario 2. One of the operators is isomorphic to an identity transformation, and the other is isomorphic to a
logical operator. For example, consider $K_\mu P_\alpha C_\beta \sim e^{i\phi_{\mu\alpha\beta}} Q'_L$ and $K_\nu P_\alpha C_\beta \sim e^{i\phi_{\nu\alpha\beta}} I$, where $Q'_L \in \{X_L, Y_L, Z_L\}$.
This case also means that $K_\mu K_\nu \sim e^{i\phi_{\mu\nu}} Q'_L$. The resulting product of the matrix elements, according to the
Knill-Laflamme condition, is nonzero if and only if
\[ S(K_\mu) + S(K_\nu) \geq d. \tag{S.77} \]
A similar conclusion can be made if $K_\mu P_\alpha C_\beta \sim e^{i\phi_{\mu\alpha\beta}} I$ and $K_\nu P_\alpha C_\beta \sim e^{i\phi_{\nu\alpha\beta}} Q'_L$.

• Scenario 3. Both operators are isomorphic to logical operators. In this case $K_\mu P_\alpha C_\beta \sim e^{i\phi_{\mu\alpha\beta}} Q'_L$ and $K_\nu P_\alpha C_\beta \sim
e^{i\phi_{\nu\alpha\beta}} Q''_L$, where $Q'_L, Q''_L \in \{X_L, Y_L, Z_L\}$. In this case
\[ S(K_\mu) + S(P_\alpha) + S(C_\beta) \geq d, \qquad S(K_\nu) + S(P_\alpha) + S(C_\beta) \geq d. \tag{S.78} \]
Summing these two expressions we get
\[ S(K_\mu) + S(K_\nu) \geq 2d - 2S(C_\beta) - 2S(P_\alpha) \geq d + 1 - 2p, \tag{S.79} \]
where we have taken into account that the weight of any correction operator is less than half the weight
of a logical operator, i.e. $2S(C_\beta) \leq d - 1$.

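Based on the aforementioned results, we conclude that the elements of the matrix should take the form
\[ \begin{aligned} \langle\Omega_k|P_\alpha \tilde Q_L P_\alpha|\Omega_l\rangle ={}& \frac{1}{N^2} \sum_{k'l'} u^*_{kk'} u_{ll'} \sum_r (-1)^r \delta_{k'r} \delta_{l'r} \sum_{\mu\nu} \lambda^{s_\mu + s_\nu} e^{i\phi_{\mu\nu}} \kappa^*_\mu \kappa_\nu H(d - 2p - s_\mu - s_\nu) \\ &+ \lambda^{d-2p+1} \frac{1}{N^2} \sum_{k'l'} u^*_{kk'} u_{ll'} \sum_{m=1}^{d-2p+1} \langle\Omega_k^{(0)}|(V G_0)^m P_\alpha \tilde Z_L P_\alpha (V G_0)^{d-2p+1-m}|\Omega_l^{(0)}\rangle \\ &+ O(\lambda^{\min(\lceil d/k\rceil,\ \lceil (d-2p+2)/k\rceil)}), \end{aligned} \tag{S.80} \]
where H(x) is the Heaviside step function defined as H(x) = 1 if x ≥ 0 and H(x) = 0 otherwise. Recalling that
$\tilde Z_L = \mathcal{E}_{\rm QEC}(Z_L)$, it is not hard to see that the second term of this expression includes the matrix $W_{kl}$ defined in
Eq. (S.62),
\[ \sum_{m=1}^{d-2p+1} \langle\Omega_k^{(0)}|(V G_0)^m P_\alpha \tilde Z_L P_\alpha (V G_0)^{d-2p+1-m}|\Omega_l^{(0)}\rangle = \sum_{m=1}^{d-2p+1} \mathrm{Tr}\big( Z_L\, \mathcal{E}_{\rm QEC}(|\psi_l^{m}\rangle\langle\psi_k^{d-2p+1-m}|) \big) := W_{kl}. \tag{S.81} \]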

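Also, from the wavefunction normalization condition, we have
\[ \langle\Omega_k|\Omega_k\rangle = 1 = \frac{1}{N^2} \sum_{\mu\nu} \lambda^{s_\mu + s_\nu} e^{i\phi_{\mu\nu}} \kappa^*_\mu \kappa_\nu H(d - 2p - s_\mu - s_\nu) + O(\lambda^{\lceil d/k\rceil}). \tag{S.82} \]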

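As a result, we get
\[ \langle\Omega_k|P_\alpha \tilde Q_L P_\alpha|\Omega_l\rangle = \Big[ u^\dagger \Big( Z + \lambda^{\lceil (d-2p+1)/k\rceil} \frac{1}{N^2} W \Big) u \Big]_{kl} + O(\lambda^{\min(\lceil d/k\rceil,\ \lceil (d-2p+1)/k\rceil + 1)}). \tag{S.83} \]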

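This means that, generally, we can choose α and, therefore, the unitary u = αβ, such that
\[ \sum_\alpha \langle\Omega_k|P_\alpha \tilde Q_L P_\alpha|\Omega_l\rangle = \langle\Omega_k^{(0)}|Q_L|\Omega_l^{(0)}\rangle + O(\lambda^{\min(\lceil d/k\rceil,\ \lceil (d-2p+1)/k\rceil)}) \equiv \langle\Omega_k^{(0)}|Q_L|\Omega_l^{(0)}\rangle + O(\lambda^{\lceil \xi/k\rceil}), \tag{S.84} \]
where ξ = d if p = 0 and ξ = d + 1 − 2p if p ≥ 1.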
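Now for p ≥ 1, the generalization error in Eq. (S.69) is
\[ \begin{aligned} \varepsilon_Q(\tilde Q_L) :={}& \int d\theta\, d\phi\, \mu(\theta,\phi) \Big( \sum_{k,l\in\{0,1\}} \psi_k^*(\theta,\phi) \psi_l(\theta,\phi)\, \mathbb{E}_\alpha \big[ \langle\Omega_k|P_\alpha \tilde Q_L P_\alpha|\Omega_l\rangle - \langle\Omega_k^{(0)}|Q_L|\Omega_l^{(0)}\rangle \big] \Big)^2 \\ ={}& \frac{\lambda^{\lceil 2(d-2p+1)/k\rceil}}{N^4} \int d\theta\, d\phi\, \mu(\theta,\phi)\, |\langle\psi_{\theta,\phi}|u^\dagger W^{(d-2p+1)} u|\psi_{\theta,\phi}\rangle|^2 + O(\lambda^R) \\ \geq{}& \frac{\lambda^{\lceil 2(d-2p+1)/k\rceil}}{N^4} \langle W^{(d-2p+1)}\rangle_{\mu(\theta,\phi)} + O(\lambda^R), \end{aligned} \tag{S.85} \]
where R = min(⌈d/k⌉, ⌈(d − 2p + 1)/k⌉ + 1). Here, we use the fact that $u|\psi_{\theta,\phi}\rangle = |\psi_{\theta-\theta_0,\phi-\phi_0}\rangle$ for some $\theta_0 \in [-\pi, \pi]$
and $\phi_0 \in [-\pi, \pi]$. Using Eq. (S.64), we have
\[ \varepsilon_Q(\tilde Q_L) = \Theta(\lambda^{\lceil 2(d-2p+1)/k\rceil}). \tag{S.86} \]
This expression concludes our proof.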


IV. PROOF OF THEOREM 3

We start from the formal version of the Theorem stated as



Theorem 3 (Formal) Define the quantum neural network observable $Q_{\rm QNN} = U_Q^\dagger Z_0 U_Q$, where $U_Q$ is a unitary
transformation and $Z_0$ is the Pauli Z operator on the target qubit. Then for every Pauli observable Q ∈ {X, Y, Z}
there exists a corresponding unitary $U_Q$ such that the error is $\varepsilon_Q(Q_{\rm QNN}) = O(\lambda^{4\lceil (d-2p)/k\rceil})$.
Proof The proof begins by noting that the Pauli-Z operator acting on the output qubit, denoted by $Z_0$, has
eigenvalues ±1. Consequently, the transformed operator can always be expressed as
\[ Q_{\rm QNN} := U_Q^\dagger Z_0 U_Q = \sum_\alpha \sum_{k\in\{0,1\}} (-1)^k |u_{\alpha k}\rangle\langle u_{\alpha k}|, \tag{S.87} \]

where |uαk ⟩ are orthonormal basis states. There is a one-to-one correspondence between the basis set {|uαk ⟩} and
the unitary UQ . Therefore, proving the existence of the basis set is equivalent to proving the existence of UQ .
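To show the existence of the basis set, we first show that there exists another complete set of nearly-orthonormal
states $\{|v_{\alpha k}\rangle\}$ such that
\[ \langle v_{\alpha k}|v_{\beta k'}\rangle = \delta_{\alpha\beta} \delta_{kk'} + O(\lambda^{2q}), \qquad \sum_{\alpha l} (-1)^l \langle\Omega_k|P_\beta|v_{\alpha l}\rangle\langle v_{\alpha l}|P_\beta|\Omega_{k'}\rangle = (-1)^k \delta_{kk'} + O(\lambda^{2q}), \tag{S.88} \]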

where q := ⌈(d − 2p)/k⌉. Given this set, we can always assign the eigenstates as

|uαk ⟩ = |vαk ⟩ + O(λ2q ), (S.89)

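where the last $O(\lambda^{2q})$ term is fixed by the Gram-Schmidt orthonormalization procedure applied to the set of
vectors $\{|v_{\alpha k}\rangle\}$. Finding such states ensures that
\[ \langle\Omega_k|P_\beta U_Q^\dagger Z_0 U_Q P_\beta|\Omega_{k'}\rangle = \sum_{\alpha l} (-1)^l \langle\Omega_k|P_\beta|v_{\alpha l}\rangle\langle v_{\alpha l}|P_\beta|\Omega_{k'}\rangle + O(\lambda^{2q}) = (-1)^k \delta_{kk'} + O(\lambda^{2q}). \tag{S.90} \]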
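The Gram-Schmidt step behind Eq. (S.89) relies on a simple fact: if a set of vectors is orthonormal up to corrections of size ε, orthonormalizing it moves each vector by only O(ε). The sketch below illustrates this with random real vectors; the dimension, the noise model, and the role of ε as a stand-in for $\lambda^{2q}$ are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8

def nearly_orthonormal(eps):
    """Columns form an orthonormal basis plus a perturbation of spectral norm eps."""
    Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    noise = rng.normal(size=(dim, dim))
    noise /= np.linalg.norm(noise, 2)
    return Q + eps * noise

def gram_schmidt(V):
    """Modified Gram-Schmidt on the columns of V, processed in order."""
    U = np.zeros_like(V)
    for i in range(V.shape[1]):
        u = V[:, i].copy()
        for j in range(i):
            u -= (U[:, j] @ u) * U[:, j]
        U[:, i] = u / np.linalg.norm(u)
    return U

report = []
for eps in (1e-2, 1e-3, 1e-4):
    V = nearly_orthonormal(eps)
    U = gram_schmidt(V)
    gram_error = np.linalg.norm(U.T @ U - np.eye(dim), 2)  # U is orthonormal
    shift = np.linalg.norm(U - V, 2)                        # how far the vectors moved
    report.append((eps, gram_error, shift))
    print(eps, gram_error, shift / eps)
```

The ratio in the last column stays bounded as ε decreases, which is the sense in which $|u_{\alpha k}\rangle = |v_{\alpha k}\rangle + O(\lambda^{2q})$.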

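Then, the generalization error is
\[ \varepsilon_Q(Q_{\rm QNN}) = \mathbb{E}_\beta \int d\theta\, d\phi\, \mu(\theta,\phi) \Big[ \sum_{kl} \psi_k \psi_l^* \Big( \langle\Omega_k|P_\beta U_Q^\dagger Z_0 U_Q P_\beta|\Omega_l\rangle - \langle\Omega_k^{(0)}|Q_L|\Omega_l^{(0)}\rangle \Big) \Big]^2 = O(\lambda^{4q}), \tag{S.91} \]
which proves the theorem.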


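The rest of the proof is centered around constructing the basis in Eq. (S.88). We start from Lemma 2. We rewrite
the result in Eq. (S.13) by “diagonalizing” the error basis as
\[ \langle\Omega_k|B_\alpha^\dagger B_\beta|\Omega_{k'}\rangle = e_\alpha \delta_{\alpha\beta} \delta_{kk'} + \lambda^q \tilde h_{\alpha k,\beta k'}, \tag{S.92} \]
where $B_\alpha = \sum_\beta \omega_{\alpha\beta} P_\beta$ are new error operators, $\omega_{\alpha\beta}$ is a unitary matrix defining the eigenbasis of $\epsilon_{\alpha\beta}$, and $e_\alpha$ are
the eigenvalues of the matrix $\epsilon_{\alpha\beta}$,
\[ \sum_{\alpha'\beta'} \omega^*_{\alpha\alpha'} \epsilon_{\alpha'\beta'} \omega_{\beta\beta'} = e_\alpha \delta_{\alpha\beta}, \qquad \tilde h_{\alpha k,\beta k'} := \sum_{\alpha'\beta'} \omega^*_{\alpha\alpha'} h_{\alpha' k,\beta' k'} \omega_{\beta\beta'}. \tag{S.93} \]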

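Assume D is the number of independent errors $\{P_\alpha\}$. Assuming that $e_\alpha > 0$ for small enough λ, our goal is to find a
set of (unnormalized) vectors $|w_{\alpha k}\rangle$ such that the states
\[ |v_{\alpha k}\rangle = \frac{1}{\sqrt{e_\alpha}} \big( B_\alpha|\Omega_k\rangle + \lambda^q |w_{\alpha k}\rangle \big), \quad \alpha \leq D; \qquad |v_{\alpha k}\rangle = |\psi_{\alpha k}\rangle, \quad \alpha > D \tag{S.94} \]
satisfy Eq. (S.88). Here $|\psi_{\alpha k}\rangle$ are states that are orthogonal to the error space spanned by the vectors $P_\alpha|\Omega_k\rangle$. To do so,
we first expand the l.h.s. of the second condition in Eq. (S.88) using the ansatz in Eq. (S.94),
\[ \sum_{\alpha,l} (-1)^l \langle\Omega_k|P_\beta|v_{\alpha l}\rangle\langle v_{\alpha l}|P_\beta|\Omega_{k'}\rangle = \sum_{\alpha,l} \frac{(-1)^l}{e_\alpha} \Big( \langle\Omega_k|P_\beta B_\alpha|\Omega_l\rangle\langle\Omega_l|B_\alpha^\dagger P_\beta|\Omega_{k'}\rangle + \lambda^q \langle\Omega_k|P_\beta B_\alpha|\Omega_l\rangle\langle w_{\alpha l}|P_\beta|\Omega_{k'}\rangle + \lambda^q \langle\Omega_k|P_\beta|w_{\alpha l}\rangle\langle\Omega_l|B_\alpha^\dagger P_\beta|\Omega_{k'}\rangle + O(\lambda^{2q}) \Big). \tag{S.95} \]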

Next, we use the inverse transformation that connects the Pauli errors $P_\alpha$ to the errors $B_\alpha$ as
\[ P_\beta = \sum_{\beta'} \omega_{\beta'\beta} B_{\beta'}. \tag{S.96} \]

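The first term in Eq. (S.95) becomes
\[ \begin{aligned} \sum_{\alpha,l} \frac{(-1)^l}{e_\alpha} \langle\Omega_k|P_\beta B_\alpha|\Omega_l\rangle\langle\Omega_l|B_\alpha^\dagger P_\beta|\Omega_{k'}\rangle ={}& \sum_{\alpha,l} \frac{(-1)^l}{e_\alpha} \sum_{\beta',\beta''} \omega_{\beta''\beta} \omega^*_{\beta'\beta} \langle\Omega_k|B_{\beta'}^\dagger B_\alpha|\Omega_l\rangle\langle\Omega_l|B_\alpha^\dagger B_{\beta''}|\Omega_{k'}\rangle \\ ={}& \sum_{\beta',\beta''} \omega_{\beta''\beta} \omega^*_{\beta'\beta} \sum_{\alpha,l} \frac{(-1)^l}{e_\alpha} \Big( e_\alpha^2 \delta_{\alpha\beta'} \delta_{\alpha\beta''} \delta_{kl} \delta_{k'l} + \lambda^q e_\alpha \delta_{\alpha\beta'} \delta_{kl} \tilde h_{\alpha l,\beta'' k'} + \lambda^q e_\alpha \delta_{\alpha\beta''} \delta_{k'l} \tilde h_{\beta' k,\alpha l} + O(\lambda^{2q}) \Big) \\ ={}& (-1)^k \delta_{kk'} \sum_{\beta'} \omega^*_{\beta'\beta} e_{\beta'} \omega_{\beta'\beta} + \lambda^q (-1)^k \sum_{\beta',\beta''} \omega^*_{\beta'\beta} \tilde h_{\beta' k,\beta'' k'} \omega_{\beta''\beta} + \lambda^q (-1)^{k'} \sum_{\beta',\beta''} \omega^*_{\beta'\beta} \tilde h_{\beta' k,\beta'' k'} \omega_{\beta''\beta} + O(\lambda^{2q}) \\ ={}& (-1)^k \delta_{kk'} \epsilon_{\beta\beta} + \lambda^q \big[ (-1)^k + (-1)^{k'} \big] h_{\beta k,\beta k'} + O(\lambda^{2q}) \\ ={}& (-1)^k \delta_{kk'} + O(\lambda^{2q}), \end{aligned} \tag{S.97} \]
where we used the fact that $h_{\beta k,\beta k'} = 0$ and $\epsilon_{\beta\beta} = 1$. The second term in Eq. (S.95) becomes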
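\[ \begin{aligned} \lambda^q \sum_{\alpha,l} \frac{(-1)^l}{e_\alpha} \langle\Omega_k|P_\beta B_\alpha|\Omega_l\rangle\langle w_{\alpha l}|P_\beta|\Omega_{k'}\rangle ={}& \lambda^q \sum_{\beta'} \omega^*_{\beta'\beta} \sum_{\alpha,l} \frac{(-1)^l}{e_\alpha} \langle\Omega_k|B_{\beta'}^\dagger B_\alpha|\Omega_l\rangle\langle w_{\alpha l}|P_\beta|\Omega_{k'}\rangle \\ ={}& \lambda^q \sum_{\beta'} \omega^*_{\beta'\beta} \sum_{\alpha,l} (-1)^l \delta_{kl} \delta_{\alpha\beta'} \langle w_{\alpha l}|P_\beta|\Omega_{k'}\rangle + O(\lambda^{2q}) \\ ={}& \lambda^q (-1)^k \sum_{\beta'} \omega^*_{\beta'\beta} \langle w_{\beta' k}|P_\beta|\Omega_{k'}\rangle + O(\lambda^{2q}). \end{aligned} \tag{S.98} \]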

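Finally, the third term in Eq. (S.95) becomes
\[ \begin{aligned} \lambda^q \sum_{\alpha,l} \frac{(-1)^l}{e_\alpha} \langle\Omega_k|P_\beta|w_{\alpha l}\rangle\langle\Omega_l|B_\alpha^\dagger P_\beta|\Omega_{k'}\rangle ={}& \lambda^q \sum_{\beta'} \omega_{\beta'\beta} \sum_{\alpha,l} \frac{(-1)^l}{e_\alpha} \langle\Omega_k|P_\beta|w_{\alpha l}\rangle\langle\Omega_l|B_\alpha^\dagger B_{\beta'}|\Omega_{k'}\rangle \\ ={}& \lambda^q \sum_{\beta'} \omega_{\beta'\beta} \sum_{\alpha,l} (-1)^l \delta_{k'l} \delta_{\alpha\beta'} \langle\Omega_k|P_\beta|w_{\alpha l}\rangle + O(\lambda^{2q}) \\ ={}& \lambda^q (-1)^{k'} \sum_{\beta'} \omega_{\beta'\beta} \langle\Omega_k|P_\beta|w_{\beta' k'}\rangle + O(\lambda^{2q}). \end{aligned} \tag{S.99} \]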

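Together, these expressions give us
\[ \begin{aligned} \sum_{\alpha l} (-1)^l \langle\Omega_k|P_\beta|v_{\alpha l}\rangle\langle v_{\alpha l}|P_\beta|\Omega_{k'}\rangle ={}& (-1)^k \delta_{kk'} + \lambda^q \sum_{\beta'} \Big( (-1)^{k'} \omega_{\beta'\beta} \langle\Omega_k|P_\beta|w_{\beta' k'}\rangle + (-1)^k \omega^*_{\beta'\beta} \langle w_{\beta' k}|P_\beta|\Omega_{k'}\rangle \Big) + O(\lambda^{2q}) \\ ={}& (-1)^k \delta_{kk'} + \lambda^q \Big( (-1)^{k'} \langle\Omega_k|P_\beta|\tilde w_{\beta k'}\rangle + (-1)^k \langle\tilde w_{\beta k}|P_\beta|\Omega_{k'}\rangle \Big) + O(\lambda^{2q}), \end{aligned} \tag{S.100} \]
where we introduced the states
\[ |\tilde w_{\beta k}\rangle := \sum_{\beta'} \omega_{\beta'\beta} |w_{\beta' k}\rangle. \tag{S.101} \]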

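Similarly, we express the first condition in Eq. (S.88) as
\[ \begin{aligned} \langle v_{\alpha k}|v_{\beta k'}\rangle ={}& \delta_{kk'} \delta_{\alpha\beta} + \frac{\lambda^q}{\sqrt{e_\alpha e_\beta}} \Big( \tilde h_{\alpha k,\beta k'} + \langle w_{\alpha k}|B_\beta|\Omega_{k'}\rangle + \langle\Omega_k|B_\alpha^\dagger|w_{\beta k'}\rangle \Big) + O(\lambda^{2q}) \\ ={}& \delta_{kk'} \delta_{\alpha\beta} + \frac{\lambda^q}{\sqrt{e_\alpha e_\beta}} \sum_{\alpha'\beta'} \omega^*_{\alpha\alpha'} \omega_{\beta\beta'} \Big( h_{\alpha' k,\beta' k'} + \langle\tilde w_{\alpha' k}|P_{\beta'}|\Omega_{k'}\rangle + \langle\Omega_k|P_{\alpha'}|\tilde w_{\beta' k'}\rangle \Big) + O(\lambda^{2q}). \end{aligned} \tag{S.102} \]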

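Then, to satisfy the conditions in Eq. (S.88), it is sufficient to have
\[ \begin{aligned} (-1)^{k'} \langle\Omega_k|P_\beta|\tilde w_{\beta k'}\rangle + (-1)^k \langle\tilde w_{\beta k}|P_\beta|\Omega_{k'}\rangle &= O(\lambda^q), \\ h_{\alpha k,\beta k'} + \langle\tilde w_{\alpha k}|P_\beta|\Omega_{k'}\rangle + \langle\Omega_k|P_\alpha|\tilde w_{\beta k'}\rangle &= O(\lambda^q). \end{aligned} \tag{S.103} \]
It is straightforward to verify that the following solution satisfies both conditions in Eq. (S.103),
\[ |\tilde w_{\alpha k}\rangle = \sum_{\beta k'} c^*_{\alpha k,\beta k'} P_\beta |\Omega_{k'}\rangle, \qquad c_{\alpha k,\beta k'} = -\frac{1}{2} \sum_\gamma h_{\alpha k,\gamma k'} (\epsilon^{-1})_{\gamma\beta}, \tag{S.104} \]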

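where $(\epsilon^{-1})_{\alpha\beta}$ are the matrix elements of the inverse of $\epsilon_{\alpha\beta}$. Indeed, if we insert this solution in the first line,
we get
\[ \begin{aligned} & (-1)^{k'} \langle\Omega_k|P_\beta|\tilde w_{\beta k'}\rangle + (-1)^k \langle\tilde w_{\beta k}|P_\beta|\Omega_{k'}\rangle \\ &\quad = (-1)^{k'} \sum_{\beta' k''} c^*_{\beta k',\beta' k''} \langle\Omega_k|P_\beta P_{\beta'}|\Omega_{k''}\rangle + (-1)^k \sum_{\beta' k''} c_{\beta k,\beta' k''} \langle\Omega_{k''}|P_{\beta'} P_\beta|\Omega_{k'}\rangle \\ &\quad = (-1)^{k'} \sum_{\beta'} (c_{\beta k',\beta' k}\, \epsilon_{\beta'\beta})^* + (-1)^k \sum_{\beta'} c_{\beta k,\beta' k'}\, \epsilon_{\beta'\beta} + O(\lambda^q) \\ &\quad = -\frac{1}{2} \Big[ (-1)^{k'} h^*_{\beta k',\beta k} + (-1)^k h_{\beta k,\beta k'} \Big] + O(\lambda^q) = O(\lambda^q). \end{aligned} \tag{S.105--S.109} \]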
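Similarly, for the next line, we have
\[ \begin{aligned} & h_{\alpha k,\beta k'} + \langle\tilde w_{\alpha k}|P_\beta|\Omega_{k'}\rangle + \langle\Omega_k|P_\alpha|\tilde w_{\beta k'}\rangle \\ &\quad = h_{\alpha k,\beta k'} + \sum_{\beta' k''} \Big[ c_{\alpha k,\beta' k''} \langle\Omega_{k''}|P_{\beta'} P_\beta|\Omega_{k'}\rangle + c^*_{\beta k',\beta' k''} \langle\Omega_k|P_\alpha P_{\beta'}|\Omega_{k''}\rangle \Big] \\ &\quad = h_{\alpha k,\beta k'} + \sum_{\beta'} \Big[ c_{\alpha k,\beta' k'}\, \epsilon_{\beta'\beta} + (c_{\beta k',\beta' k}\, \epsilon_{\beta'\alpha})^* \Big] + O(\lambda^q) \\ &\quad = h_{\alpha k,\beta k'} - \frac{1}{2} h_{\alpha k,\beta k'} - \frac{1}{2} h^*_{\beta k',\alpha k} + O(\lambda^q) = O(\lambda^q). \end{aligned} \tag{S.110--S.113} \]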
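Then, using Eq. (S.101), we obtain
\[ |w_{\alpha k}\rangle = -\frac{1}{2} \sum_{\beta\gamma k'} \omega_{\beta\alpha}\, c^*_{\beta k,\gamma k'} P_\gamma |\Omega_{k'}\rangle, \qquad c_{\alpha k,\beta k'} = -\frac{1}{2} \sum_\gamma h_{\alpha k,\gamma k'} (\epsilon^{-1})_{\gamma\beta}. \tag{S.114} \]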

Thus, we get the basis for the optimal unitary QNN by substituting Eq. (S.104) into Eq. (S.94). By proving the
existence of Eq. (S.88), we prove Eq. (S.91) and thus, complete our proof. □

V. PROOF OF PROPOSITION 1

In the previous section, we have proven the existence of a unitary transformation that provides $\varepsilon_Q(Q_{\rm QNN}) =
O(\lambda^{4\lceil (d-2p)/k\rceil})$. In this section, we show that this result cannot be improved under a generic problem setting.
Let us define the matrix
\[ (T_{\alpha\beta})_{kl} := \sum_{m=1}^{d-2p} \langle\Omega_k^{(0)}|(V\tilde G_0)^m P_\alpha P_\beta (V\tilde G_0)^{d-2p-m}|\Omega_l^{(0)}\rangle. \tag{S.115} \]

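Proposition 1 Under the conditions of Theorem 3, consider any distribution µ(θ, ϕ) such that f(θ, ϕ) = ±1, µ(θ, ϕ) =
µ(θ, ϕ + π), and
\[ \forall \theta_0, \phi_0: \quad \mathbb{E}_\alpha \mathbb{E}_\beta \int d\theta\, d\phi\, \mu(\theta + \theta_0, \phi + \phi_0)\, |\langle\psi_k(\theta,\phi)|T_{\alpha\beta}|\psi_l(\theta,\phi+\pi)\rangle|^4 > 0. \tag{S.116} \]
Then, for all possible $U_Q$ the generalization error satisfies
\[ \varepsilon_Q(Q_{\rm QNN}) = \Omega(\lambda^{4\lceil (d-2p)/k\rceil}). \tag{S.117} \]
Lemma 4 Any state $|\Psi^{(0)}\rangle \in \mathcal{C} := \{ |\Psi^{(0)}\rangle :\ |\Psi^{(0)}\rangle = \cos\theta|\Omega_0^{(0)}\rangle + e^{i\phi}\sin\theta|\Omega_1^{(0)}\rangle,\ \forall \theta, \phi \in [0, 2\pi] \}$ and its orthogonal state
$|\Psi_\perp^{(0)}\rangle$ in the codespace have opposite-sign expectation values with respect to $Q_L \in \{X_L, Y_L, Z_L\}$, i.e.
\[ \langle\Psi^{(0)}|Q_L|\Psi^{(0)}\rangle = -\langle\Psi_\perp^{(0)}|Q_L|\Psi_\perp^{(0)}\rangle. \tag{S.118} \]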
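Proof Assuming that
\[ |\Psi^{(0)}\rangle = \cos\theta|\Omega_0^{(0)}\rangle + e^{i\phi}\sin\theta|\Omega_1^{(0)}\rangle, \tag{S.119} \]
its orthogonal state (up to a phase factor) must take the form
\[ |\Psi_\perp^{(0)}\rangle = \sin\theta|\Omega_0^{(0)}\rangle - e^{i\phi}\cos\theta|\Omega_1^{(0)}\rangle. \tag{S.120} \]
It is straightforward to verify that
\[ \begin{aligned} \langle\Psi_\perp^{(0)}|X_L|\Psi_\perp^{(0)}\rangle &= -\sin 2\theta \cos\phi = -\langle\Psi^{(0)}|X_L|\Psi^{(0)}\rangle, \\ \langle\Psi_\perp^{(0)}|Y_L|\Psi_\perp^{(0)}\rangle &= -\sin 2\theta \sin\phi = -\langle\Psi^{(0)}|Y_L|\Psi^{(0)}\rangle, \\ \langle\Psi_\perp^{(0)}|Z_L|\Psi_\perp^{(0)}\rangle &= -\cos 2\theta = -\langle\Psi^{(0)}|Z_L|\Psi^{(0)}\rangle. \end{aligned} \tag{S.121} \]
Thus, we arrive at the statement of the Lemma. □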
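Lemma 4 is a statement about a single logical qubit, so it can be checked directly with 2 × 2 Pauli matrices standing in for the logical operators. The sketch below (function and variable names are ours) samples random angles and compares the expectation values in Eq. (S.121).

```python
import numpy as np

# 2 x 2 Pauli matrices standing in for X_L, Y_L, Z_L restricted to the codespace.
PAULIS = {
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def state(theta, phi):
    """cos(theta)|0> + e^{i phi} sin(theta)|1>, as in Eq. (S.119)."""
    return np.array([np.cos(theta), np.exp(1j * phi) * np.sin(theta)])

def orthogonal_state(theta, phi):
    """sin(theta)|0> - e^{i phi} cos(theta)|1>, as in Eq. (S.120)."""
    return np.array([np.sin(theta), -np.exp(1j * phi) * np.cos(theta)])

def expectation(op, v):
    return np.real(v.conj() @ op @ v)

rng = np.random.default_rng(2)
max_violation = 0.0
max_overlap = 0.0
for _ in range(1000):
    theta, phi = rng.uniform(0, 2 * np.pi, size=2)
    psi, psi_perp = state(theta, phi), orthogonal_state(theta, phi)
    max_overlap = max(max_overlap, abs(psi.conj() @ psi_perp))
    for op in PAULIS.values():
        total = expectation(op, psi) + expectation(op, psi_perp)
        max_violation = max(max_violation, abs(total))

print(max_overlap, max_violation)
```

Both printed numbers are at the level of floating-point rounding, consistent with the two states being orthogonal and having opposite Bloch vectors.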

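Proof of Proposition 1. The generalization error of decoding with a QNN can be rewritten as
\[ \varepsilon_Q(Q_{\rm QNN}) = \mathbb{E}_\alpha \int d\theta\, d\phi\, \mu(\theta,\phi) \Big( \mathrm{Tr}(Z_0 U_Q P_\alpha|\Psi\rangle\langle\Psi|P_\alpha U_Q^\dagger) - \langle\Psi^{(0)}|Q_L|\Psi^{(0)}\rangle \Big)^2. \tag{S.122} \]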

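For simplicity, let us introduce the following notation for the label values,
\[ f_Q(\theta,\phi) := \langle\Psi^{(0)}|Q_L|\Psi^{(0)}\rangle, \qquad f_Q^\perp(\theta,\phi) := \langle\Psi_\perp^{(0)}|Q_L|\Psi_\perp^{(0)}\rangle = -f_Q, \tag{S.123} \]
where we use the result of Lemma 4. Note that $f_Q(\theta,\phi)$ can be ±1 according to the condition of the Proposition.
Recall that the input states |Ψ⟩ and |Ψ⊥⟩ are constructed from the codewords, with coefficients $\psi_k := \psi_k(\theta,\phi)$ and
$\psi_k^\perp := \psi_k^\perp(\theta,\phi)$,
\[ |\Psi\rangle = \sum_k \psi_k |\Omega_k\rangle, \qquad |\Psi_\perp\rangle = \sum_k \psi_k^\perp |\Omega_k\rangle, \tag{S.124} \]
where the coefficients satisfy the orthogonality condition $\sum_k \psi_k^* \psi_k^\perp = 0$. Then we introduce
\[ F_\alpha(\theta,\phi) := \mathrm{Tr}(Z_0 U_Q P_\alpha|\Psi\rangle\langle\Psi|P_\alpha U_Q^\dagger), \qquad F_\alpha^\perp(\theta,\phi) := \mathrm{Tr}(Z_0 U_Q P_\alpha|\Psi_\perp\rangle\langle\Psi_\perp|P_\alpha U_Q^\dagger). \tag{S.125} \]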

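Using this notation and noting that $f_Q(\theta,\phi)^2 = 1$, we rewrite
\[
\begin{aligned}
\varepsilon_Q(Q_{\rm QNN}) &= \mathbb{E}_\alpha \int d\theta\, d\phi\, \mu(\theta,\phi) \bigl(1 - f_Q(\theta,\phi) F_\alpha(\theta,\phi)\bigr)^2 \\
&= \frac{1}{2} \mathbb{E}_\alpha \mathbb{E}_\beta \left( \int d\theta\, d\phi\, \mu(\theta,\phi) \bigl(1 - f_Q(\theta,\phi) F_\alpha(\theta,\phi)\bigr)^2 + \int d\theta\, d\phi\, \mu(\theta,\phi+\pi) \bigl(1 - f_Q(\theta,\phi) F_\beta(\theta,\phi)\bigr)^2 \right) \\
&= \frac{1}{2} \mathbb{E}_\alpha \mathbb{E}_\beta \int d\theta\, d\phi\, \mu(\theta,\phi) \Bigl[ \bigl(1 - f_Q(\theta,\phi) F_\alpha(\theta,\phi)\bigr)^2 + \bigl(1 - f_Q^\perp(\theta,\phi) F_\beta^\perp(\theta,\phi)\bigr)^2 \Bigr] \\
&\geq \frac{1}{4} \mathbb{E}_\alpha \mathbb{E}_\beta \int d\theta\, d\phi\, \mu(\theta,\phi) \bigl(2 - f_Q(\theta,\phi) F_\alpha(\theta,\phi) - f_Q^\perp(\theta,\phi) F_\beta^\perp(\theta,\phi)\bigr)^2 \\
&= \mathbb{E}_\alpha \mathbb{E}_\beta \int d\theta\, d\phi\, \mu(\theta,\phi) \left(1 - \frac{1}{2} f_Q(\theta,\phi) \bigl(F_\alpha(\theta,\phi) - F_\beta^\perp(\theta,\phi)\bigr)\right)^2,
\end{aligned} \tag{S.126}
\]
where we used the symmetry property of the input state distribution, $\mu(\theta,\phi) = \mu(\theta,\phi+\pi)$, and the inequality $a^2 + b^2 \geq \frac{1}{2}(a+b)^2$. Next, let us focus on the quantity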

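\[
\delta F := F_\alpha(\theta,\phi) - F_\beta^\perp(\theta,\phi) \equiv \mathrm{Tr}\Bigl[ U_Q^\dagger Z_0 U_Q \bigl( P_\alpha |\Psi\rangle\langle\Psi| P_\alpha - P_\beta |\Psi_\perp\rangle\langle\Psi_\perp| P_\beta \bigr) \Bigr]. \tag{S.127}
\]
Since the operator $U_Q^\dagger Z_0 U_Q$ has eigenvalues $\pm 1$, its operator norm is $\|U_Q^\dagger Z_0 U_Q\| = 1$. Then, by the definition of the trace distance $D(\rho,\sigma)$ between states $\rho$ and $\sigma$, we have
\[
|\delta F| \leq 2 D\bigl( P_\alpha |\Psi\rangle\langle\Psi| P_\alpha,\ P_\beta |\Psi_\perp\rangle\langle\Psi_\perp| P_\beta \bigr). \tag{S.128}
\]
Then, the generalization error is bounded as
\[
\varepsilon_Q(Q_{\rm QNN}) \geq \mathbb{E}_\alpha \mathbb{E}_\beta \int d\theta\, d\phi\, \mu(\theta,\phi) \Bigl( 1 - D\bigl( P_\alpha |\Psi\rangle\langle\Psi| P_\alpha,\ P_\beta |\Psi_\perp\rangle\langle\Psi_\perp| P_\beta \bigr) \Bigr)^2. \tag{S.129}
\]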

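For any two pure states $|\Psi_1\rangle$ and $|\Psi_2\rangle$, the trace distance satisfies
\[
D\bigl( |\Psi_1\rangle\langle\Psi_1|,\ |\Psi_2\rangle\langle\Psi_2| \bigr) \leq \sqrt{1 - |\langle \Psi_1 | \Psi_2 \rangle|^2} \leq 1 - \frac{1}{2} |\langle \Psi_1 | \Psi_2 \rangle|^2. \tag{S.130}
\]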
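
The two bounds in Eq. (S.130) can be checked numerically on random pure states. This is a small sketch rather than part of the proof: the trace distance is computed as half the sum of the absolute eigenvalues of the difference of the two projectors, and the helper names, dimension, and tolerances are illustrative assumptions.

```python
import numpy as np


def random_state(dim, rng):
    """Random normalized complex vector (illustrative sampling choice)."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)


def trace_distance(psi1, psi2):
    """D(rho, sigma) = (1/2) * sum |eigenvalues of (rho - sigma)| for two pure states."""
    diff = np.outer(psi1, psi1.conj()) - np.outer(psi2, psi2.conj())
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(diff))))


def check_bounds(dim=8, n_trials=200, seed=1):
    """Return True if both inequalities of Eq. (S.130) hold on all random trials."""
    rng = np.random.default_rng(seed)
    ok = True
    for _ in range(n_trials):
        a, b = random_state(dim, rng), random_state(dim, rng)
        overlap2 = min(1.0, abs(np.vdot(a, b)) ** 2)
        middle = np.sqrt(1.0 - overlap2)
        ok = ok and trace_distance(a, b) <= middle + 1e-10
        ok = ok and middle <= 1.0 - 0.5 * overlap2 + 1e-12
    return bool(ok)
```

The second inequality is just $\sqrt{1-x} \leq 1 - x/2$ for $x \in [0,1]$, which is why it holds for every sampled pair.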
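Using this inequality, we have
\[
\varepsilon_Q(Q_{\rm QNN}) \geq \frac{1}{4} \mathbb{E}_\alpha \mathbb{E}_\beta \int d\theta\, d\phi\, \mu(\theta,\phi)\, |\langle \Psi | P_\alpha P_\beta | \Psi_\perp \rangle|^4. \tag{S.131}
\]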
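
The step from Eq. (S.129) to Eq. (S.131) can be spelled out as follows. This is a sketch that assumes the $P_\alpha$ are Pauli operators (Hermitian and unitary), so that $P_\alpha|\Psi\rangle$ and $P_\beta|\Psi_\perp\rangle$ are normalized pure states whose overlap is $\langle\Psi|P_\alpha P_\beta|\Psi_\perp\rangle$; the shorthand $D$ denotes the trace distance appearing in Eq. (S.129).

```latex
\begin{align*}
1 - D &\geq 1 - \Bigl(1 - \tfrac{1}{2}\,|\langle \Psi | P_\alpha P_\beta | \Psi_\perp \rangle|^2\Bigr)
       = \tfrac{1}{2}\,|\langle \Psi | P_\alpha P_\beta | \Psi_\perp \rangle|^2 \;\geq\; 0, \\
(1 - D)^2 &\geq \tfrac{1}{4}\,|\langle \Psi | P_\alpha P_\beta | \Psi_\perp \rangle|^4 .
\end{align*}
```

Squaring is allowed because both sides of the first line are non-negative; averaging the second line over $\alpha$, $\beta$, and $\mu(\theta,\phi)$ then gives the stated bound.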
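Using the representation in Eq. (S.124), we can rewrite
\[
\langle \Psi | P_\alpha P_\beta | \Psi_\perp \rangle = \sum_{kl} \psi_k^* \psi_l^\perp \langle \Omega_k | P_\alpha P_\beta | \Omega_l \rangle. \tag{S.132}
\]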

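Now we can expand the codewords in terms of their BW series (Eq. (S.48)),
\[
\langle \Omega_k | P_\alpha P_\beta | \Omega_l \rangle = \sum_{k'l'} \frac{u^*_{kk'} u_{ll'}}{N^2} \sum_{m,m'=0}^{\infty} \langle \Omega_{k'}^{(0)} | (\lambda V \tilde{G}_0)^{m} P_\alpha P_\beta (\lambda \tilde{G}_0 V)^{m'} | \Omega_{l'}^{(0)} \rangle + O(\lambda^{d}). \tag{S.133}
\]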

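In the summation over $m, m'$ in Eq. (S.133), for terms with $m + m' < \lceil (d-2p)/k \rceil$, we can apply the Knill-Laflamme condition (Eq. (S.3)),
\[
\sum_{m+m' < \lceil (d-2p)/k \rceil} \langle \Omega_{k'}^{(0)} | (\lambda V \tilde{G}_0)^{m} P_\alpha P_\beta (\lambda \tilde{G}_0 V)^{m'} | \Omega_{l'}^{(0)} \rangle = \sum_{m+m' < \lceil (d-2p)/k \rceil} \epsilon^{mm'}_{\alpha\beta}\, \delta_{k'l'} := \delta_{k'l'}\, \tilde{\epsilon}_{\alpha\beta}, \tag{S.134}
\]
where $\epsilon^{mm'}_{\alpha\beta}$ can be different for different $m, m'$. Recalling that $u$ is a unitary matrix, we have
\[
\langle \Omega_k | P_\alpha P_\beta | \Omega_l \rangle = \frac{1}{N^2} \Bigl( \tilde{\epsilon}_{\alpha\beta}\, \delta_{kl} + \lambda^{\lceil (d-2p)/k \rceil} \bigl(u^\dagger T_{\alpha\beta} u\bigr)_{kl} \Bigr) + O\bigl(\lambda^{\lceil (d-2p+1)/k \rceil}\bigr). \tag{S.135}
\]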
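Thus, for any unitary $u$ we have
\[
\begin{aligned}
\mathbb{E}_\alpha \mathbb{E}_\beta \int d\theta\, d\phi\, \mu(\theta,\phi)\, |\langle \Psi | P_\alpha P_\beta | \Psi_\perp \rangle|^4 &= \mathbb{E}_\alpha \mathbb{E}_\beta \int d\theta\, d\phi\, \mu(\theta,\phi)\, \bigl|\langle \psi_k(\theta,\phi) | u^\dagger \tilde{T}_{\alpha\beta} u | \psi_l(\theta,\phi+\pi) \rangle\bigr|^4 \\
&= \mathbb{E}_\alpha \mathbb{E}_\beta \int d\theta\, d\phi\, \mu(\theta+\theta_0,\phi+\phi_0)\, \bigl|\langle \psi_k(\theta,\phi) | \tilde{T}_{\alpha\beta} | \psi_l(\theta,\phi+\pi) \rangle\bigr|^4 > 0.
\end{aligned} \tag{S.136}
\]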

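The last inequality follows from the assumption in Eq. (S.116). Therefore, we have
\[
\mathbb{E}_\alpha \mathbb{E}_\beta \int d\theta\, d\phi\, \mu(\theta,\phi)\, |\langle \Psi | P_\alpha P_\beta | \Psi_\perp \rangle|^4 = \Theta\bigl(\lambda^{4\lceil (d-2p)/k \rceil}\bigr). \tag{S.137}
\]
Here $\Theta(\cdot)$ is big-Theta notation, which indicates that asymptotically as $\lambda \to 0$ the function is both upper and lower bounded by the function in the argument of $\Theta$. Inserting this result into Eq. (S.131), we get
\[
\varepsilon_Q(Q_{\rm QNN}) = \Omega\bigl(\lambda^{4\lceil (d-2p)/k \rceil}\bigr). \tag{S.138}
\]
Finally, using Theorem 3, we arrive at the statement of the Proposition. □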

Remark 5 The assumption in Eq. (S.116) serves as a genericity condition for the chosen input states and perturbation
V . As we showed in Lemma 3, the matrix Tαβ is not the identity, at least for some values of V and noise operators.
Therefore, the l.h.s. of Eq. (S.116) vanishes in this case only for a fine-tuned choice of input states. More generally,
there is no particular reason to believe that this expression vanishes for generic parameters of the problem.
Remark 6 For noiseless input (p = 0), the QNN considered in Theorem 3 can decode input states with arbitrary
accuracy, provided that there is no constraint on its architecture.

VI. DETAILS OF NUMERICAL SIMULATIONS

In this section, we will delve into the details of the numerical results that are presented in the main text. First,
we introduce the stabilizers of the unperturbed codes used in the numerical simulations, as well as the possible
errors considered in these codes. Then we outline the numerical procedures for selecting perturbations and obtaining
codewords from the perturbed Hamiltonian, as described by Eq. (1). For Fig. 2, we provide insight into the construction
of the “error-corrected” logical operators, as expressed in Eq. (4), and elaborate on how we obtain the expectation
values. Moving on to Fig. 3, we outline the steps for training the QNN as specified in Eqs. (5)-(6), and provide details
on how to estimate the generalization error in QNNs. We conclude this section by discussing the effect of QNN depth
on the results and by considering possible alternative architectures.

A. Unperturbed codes

In this subsection, we list the stabilizers Sa for the unperturbed stabilizer codes used in the main text, as well as
the possible Pauli errors that can occur in these codes.

1. [[5,1,3]] code

The stabilizers of this code are [42]


S1 = XZZXI,
S2 = IXZZX,
S3 = XIXZZ,
S4 = ZXIXZ.
There are 3 × 5 = 15 single-qubit errors, which correspond to a single-Pauli error, X, Y or Z on one of the 5 qubits.
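
As an illustration of how the listed stabilizers distinguish these 15 errors, the sketch below computes the syndrome of every single-qubit Pauli error on the [[5,1,3]] code. It works purely with Pauli strings (no state vectors); the helper names are illustrative and not taken from the paper.

```python
from itertools import product

# Stabilizer generators of the [[5,1,3]] code as listed above.
STABILIZERS = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]


def anticommutes(p, q):
    """True if Pauli strings p and q anticommute.

    Two Pauli strings anticommute when they differ on an odd number of
    sites where both act nontrivially.
    """
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 1


def single_qubit_errors(n):
    """Yield all 3n single-qubit Pauli errors on n qubits as strings."""
    for site, pauli in product(range(n), "XYZ"):
        yield "I" * site + pauli + "I" * (n - site - 1)


def syndrome(error, stabilizers=STABILIZERS):
    """Bit s_a = 1 if the error anticommutes with stabilizer S_a, else 0."""
    return tuple(int(anticommutes(error, s)) for s in stabilizers)


errors = list(single_qubit_errors(5))
table = {e: syndrome(e) for e in errors}
```

For this code the 15 single-qubit errors map to 15 distinct nonzero syndromes, so each of them can be identified from the four stabilizer measurements.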

2. [[7,1,3]] (Steane) code

The stabilizers of this code are [15]


S1 = IIIXXXX,
S2 = IXXIIXX,
S3 = XIXIXIX,
S4 = IIIZZZZ,
S5 = IZZIIZZ,
S6 = ZIZIZIZ.
There are 3 × 7 = 21 single-qubit errors, which correspond to a single-Pauli error, X, Y or Z on one of the 7 qubits.

3. [[9,1,3]] (Shor) code

The stabilizers of this code are [13]


S1 = ZZIIIIIII,

S2 = IZZIIIIII,
S3 = IIIZZIIII,
S4 = IIIIZZIII,
S5 = IIIIIIZZI,
S6 = IIIIIIIZZ,
S7 = XXXXXXIII,
S8 = IIIXXXXXX.

There are 3 × 9 = 27 single-qubit errors, which correspond to a single-Pauli error, X, Y or Z on one of the 9 qubits.

4. [[11,1,5]] code

The stabilizers are [43]

S1 = XZIZIXIZZII,
S2 = IYIZZYIIZZI,
S3 = IZXIZXIIIZZ,
S4 = IZZYIYZIIZI,
S5 = IIZZXXZZIZZ,
S6 = IIZZIIYZZIY,
S7 = IZIZZZZYIZY,
S8 = IIIZIZIZXZX,
S9 = IZIZIIZIZXX,
S10 = ZZZZZZIIIII.

There are $11 \times 3 = 33$ single-qubit errors, which correspond to a single-Pauli error, X, Y or Z on one of the 11 qubits,
and $3^2 \times \binom{11}{2} = 495$ two-qubit errors, which correspond to two-Pauli errors on two of the 11 qubits.
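
The error counts quoted for the four codes follow from simple combinatorics: a weight-w Pauli error chooses w of the n qubits and one of three Paulis on each. A short enumeration sketch (the function name is illustrative) confirms the numbers:

```python
from itertools import combinations, product
from math import comb


def pauli_errors(n, weight):
    """All Pauli strings on n qubits acting nontrivially on exactly `weight` qubits."""
    errors = []
    for sites in combinations(range(n), weight):
        for paulis in product("XYZ", repeat=weight):
            chars = ["I"] * n
            for site, p in zip(sites, paulis):
                chars[site] = p
            errors.append("".join(chars))
    return errors


# Single-qubit error counts for the 5-, 7-, 9-, and 11-qubit codes.
single_counts = {n: len(pauli_errors(n, 1)) for n in (5, 7, 9, 11)}
# Two-qubit error count for the 11-qubit code: 3^2 * C(11, 2).
two_qubit_11 = len(pauli_errors(11, 2))
```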

B. QEC simulations

The codewords are constructed as linear combinations of the two lowest fully perturbed eigenstates $|\psi_k\rangle$,
\[
|\Omega_k\rangle = \sum_{k' \in \{0,1\}} \alpha_{kk'} |\psi_{k'}\rangle, \tag{S.139}
\]
where $\alpha_{kk'}$ is a unitary matrix that we can choose to optimize the standard QEC performance. We use the following parameterization:
\[
\alpha = \begin{pmatrix} e^{i t_0} \cos t_1 & e^{i(t_0+t_2)} \sin t_1 \\ e^{i(t_0+t_3)} \sin t_1 & -e^{i(t_0+t_2+t_3)} \cos t_1 \end{pmatrix}. \tag{S.140}
\]
Ignoring the global phase, we set $t_0 = 0$. We then choose $t_1$ and $t_2$ by maximizing $\langle \Omega_0 | Z_L | \Omega_0 \rangle$, and $t_3$ by maximizing
$\langle \Omega_+ | X_L | \Omega_+ \rangle$, where $|\Omega_\pm\rangle := (|\Omega_0\rangle \pm |\Omega_1\rangle)/\sqrt{2}$.
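
A minimal sketch of the parameterization in Eq. (S.140) is given below. It only builds the matrix and applies it to a pair of eigenvectors; the maximization over the angles is not shown, and the function names and the row-wise storage of the eigenvectors are assumptions made for illustration.

```python
import numpy as np


def alpha_matrix(t1, t2, t3, t0=0.0):
    """2x2 matrix of Eq. (S.140); t0 is the global phase (set to 0 in the text)."""
    return np.array([
        [np.exp(1j * t0) * np.cos(t1), np.exp(1j * (t0 + t2)) * np.sin(t1)],
        [np.exp(1j * (t0 + t3)) * np.sin(t1), -np.exp(1j * (t0 + t2 + t3)) * np.cos(t1)],
    ])


def rotate_codewords(alpha, eigvecs):
    """Apply Eq. (S.139).

    eigvecs has shape (2, dim), with rows |psi_0> and |psi_1>.
    Row k of the result is |Omega_k> = sum_k' alpha[k, k'] |psi_k'>.
    """
    return alpha @ eigvecs
```

Because the matrix is unitary for any real angles, the rotated codewords stay orthonormal whenever the two eigenvectors are orthonormal.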

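Once $\alpha$ is fixed, we build $P$ samples
\[
\Bigl\{ |\Psi_\mu\rangle = \cos(\theta_\mu) |\Omega_0\rangle + \sin(\theta_\mu) e^{i\phi_\mu} |\Omega_1\rangle \Bigr\}_{\mu=1}^{P}, \tag{S.141}
\]
where $\theta_\mu \in [0, \pi]$ and $\phi_\mu \in [0, 2\pi]$ are chosen i.i.d. from the corresponding uniform distributions. This provides an
estimate for the integral with the constant measure $\mu(\theta,\phi) = 1/(2\pi^2)$.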
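
The sampling of Eq. (S.141) can be sketched as follows, assuming the two codewords are available as NumPy vectors; the function name, argument names, and the seed handling are illustrative choices.

```python
import numpy as np


def sample_states(omega0, omega1, n_samples, seed=0):
    """Draw states cos(theta)|Omega_0> + sin(theta) e^{i phi} |Omega_1>.

    theta is uniform on [0, pi] and phi is uniform on [0, 2*pi], drawn i.i.d.
    Returns (theta, phi, states), where states has shape (n_samples, dim).
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, np.pi, n_samples)
    phi = rng.uniform(0.0, 2.0 * np.pi, n_samples)
    c0 = np.cos(theta)[:, None]
    c1 = (np.sin(theta) * np.exp(1j * phi))[:, None]
    states = c0 * omega0[None, :] + c1 * omega1[None, :]
    return theta, phi, states
```

Averaging a function of the sampled states over the P draws is then a Monte Carlo estimate of the corresponding integral over the uniform measure.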

1. Details for Fig. 2

In all cases, we estimate the decoding loss $\varepsilon_Q$ on $S = 10^3$ samples of the input state $|\Psi_\mu\rangle$ (Eq. (S.141)). The
results are then averaged over $10^5$ different realizations of the single-qubit perturbation, $V = \sum_{i=1}^{n} V_i$, where the random
correction $V_i$ acts on qubit $i$, is sampled from the $2 \times 2$ GUE, and is normalized such that $\|V_i\| = 1$.
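
One way to generate such a perturbation is sketched below. The text does not specify which norm is used for the normalization, so the operator (spectral) norm is assumed here; the function names and the construction of a GUE sample as the Hermitian part of a complex Gaussian matrix are likewise illustrative choices.

```python
import numpy as np


def gue_2x2(rng):
    """Random 2x2 Hermitian matrix from the GUE, rescaled to unit spectral norm.

    Assumption: ||V_i|| is taken to be the operator (spectral) norm.
    """
    a = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    h = (a + a.conj().T) / 2.0
    return h / np.linalg.norm(h, 2)


def embed(op, site, n):
    """Tensor a single-qubit operator into an n-qubit operator acting on `site`."""
    out = np.array([[1.0 + 0.0j]])
    for i in range(n):
        out = np.kron(out, op if i == site else np.eye(2))
    return out


def random_perturbation(n, seed=0):
    """V = sum_i V_i with independent normalized GUE terms on each qubit."""
    rng = np.random.default_rng(seed)
    return sum(embed(gue_2x2(rng), i, n) for i in range(n))
```

Each call with a different seed gives one realization of V; repeating the decoding experiment over many seeds corresponds to the disorder average described above.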
In Fig. 2(a), after specifying fQ (θ, ϕ) for each |Ψ⟩ as defined in Eq. (2) (e.g., fX (θ, ϕ) = sin 2θ cos ϕ, fY (θ, ϕ) =
sin 2θ sin ϕ, fZ (θ, ϕ) = cos 2θ), we get the exact expectation of the ideal code logical operator by evaluating ⟨Ψ|QL |Ψ⟩,
where QL ∈ {XL , YL , ZL }.
In Fig. 2(b), we numerically obtain the expectation value of the effective error-corrected logical operator $\tilde{Q}_L$, defined as
\[
\tilde{Q}_L := \sum_\mu F_\mu^\dagger Q_L F_\mu, \qquad F_\mu = C_\mu \prod_\alpha \frac{1}{2} (1 + s_{\alpha\mu} S_\alpha) = P C_\mu, \tag{S.142}
\]
where $s_{\alpha\mu} = \pm 1$ represents the $\mu$-th error syndrome obtained from measuring the projection onto the $\alpha$-th stabilizer
subspace, $C_\mu$ represents the single-qubit (5-, 7-, 9-, and 11-qubit codes) or two-qubit (11-qubit code) error correction
corresponding to syndrome $s_{\alpha\mu}$, and $P = \sum_{k=0,1} |\Omega_k^{(0)}\rangle\langle\Omega_k^{(0)}|$ is the projector onto the codespace.
In Fig. 2(c), for each sample state |Ψµ ⟩ (Eq. (S.141)), we first randomly apply an error (single-qubit Pauli error for
5, 7, or 9-qubit code, single or two-qubit Pauli error for 11-qubit code) to it, then obtain the expectation value of Q̃L .

C. QNN-QEC simulations

We train the QNN with $N_T \sim 10^3$ training sample states as defined in Eq. (S.141). In some numerical simulations
(see below) we use exact states; in others we apply single- or two-qubit Pauli errors to some of the states. After
training, we test our QNN on $N_V = 10 \times N_T$ samples withheld from the training but independently drawn from the
same distribution. The factor of 10 is chosen to ensure that the validation set provides statistically significant results. If
errors are present in the training set, the states in the validation set are chosen to contain errors as well. For each
validation sample state $|\Psi_{\mu\alpha}\rangle := P_\alpha |\Psi_\mu\rangle$, we obtain the expectation value of measuring the Z operator at the output
qubit, by computing

\[
\langle Q_{\rm QNN} \rangle_{\mu\alpha} := \langle \Psi_{\mu\alpha} | U_Q^\dagger Z_0 U_Q | \Psi_{\mu\alpha} \rangle. \tag{S.143}
\]

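Then we use the result to estimate the generalization error by averaging over the $N_V$ validation sample
states,
\[
\varepsilon_Q(Q_{\rm QNN}) = \frac{1}{M N_V} \sum_{\alpha=1}^{M} \sum_{\mu=1}^{N_V/M} \Bigl( \langle Q_{\rm QNN} \rangle_{\mu\alpha} - \langle \Psi_\mu^{(0)} | Q_L | \Psi_\mu^{(0)} \rangle \Bigr)^2. \tag{S.144}
\]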
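
The validation procedure of Eqs. (S.143) and (S.144) can be sketched as below. The unitary and observable are passed in as dense matrices, and the error is computed as a plain average of squared deviations over all evaluated (error, sample) pairs; this normalization, like the function names, is an assumption of the sketch.

```python
import numpy as np


def qnn_expectation(U, z0, state):
    """<state| U^dagger Z_0 U |state> for a unitary U and a Hermitian observable z0."""
    out = U @ state
    return float(np.real(np.vdot(out, z0 @ out)))


def generalization_error(predictions, labels):
    """Mean squared deviation between QNN outputs and the ideal labels.

    Assumption: a plain average over all validation entries.
    """
    predictions = np.asarray(predictions, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.mean((predictions - labels) ** 2))
```

As a sanity check, an identity "network" acting on a single qubit with the Z observable reproduces the label $f_Z = \cos 2\theta$ exactly, so its estimated error vanishes.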

1. Details for Fig. 3

In this section, we present details of the QNN simulation in Fig. 3 of the main text. Specializing the general architecture
in Fig. 3(a) to the 5-qubit code, we use the 5-qubit QNN architecture in Fig. S1. In Fig. 3(b)-(d), we have NT = 1600
training examples. After training, as described in the previous section, we validate on NV different validation samples,
independently drawn from the same distribution.
Each two-qubit unitary $U_k$ (Eq. (6)) contains $4^2 - 1 = 15$ independent parameters. We fix the overall non-measurable
angle by choosing $\phi^k_{00} = 1$ and allow the other 15 angles $\phi^k_{\alpha\beta}$ to be free parameters to be optimized. For the QNN
shown in Fig. S1, there are a total of $(4 d_C + 1) \times 15$ parameters. For the result presented in Fig. 3, we chose $d_C = 4$,
which corresponds to 255 parameters. We optimize these parameters with the BFGS method using the SciPy optimize
library (the maximum number of iterations is $10^5$ and the tolerance for termination is $10^{-15}$). Note that it is more common to
optimize QNNs with Stochastic Gradient Descent (SGD), where the gradients are estimated by methods such as finite
differences. We find, however, that optimizing with SGD tends to yield a larger decoding error, incompatible with the
accuracy required by the comparison with perturbation theory.
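
The parameter count quoted above can be reproduced with a one-line formula. The sketch assumes that the factor of 4 is the number of two-qubit unitaries per layer of the error-correction circuit and that the extra unitary is the single decoding unitary; the function and argument names are illustrative.

```python
def n_parameters(depth, unitaries_per_layer=4, params_per_unitary=4**2 - 1):
    """Free parameters for `depth` layers of two-qubit unitaries plus one decoding unitary.

    Assumptions: each layer holds `unitaries_per_layer` independent two-qubit
    unitaries, and each unitary has 4^2 - 1 = 15 free angles.
    """
    return (unitaries_per_layer * depth + 1) * params_per_unitary


# Parameter counts for depths 1 through 4.
counts_by_depth = {d: n_parameters(d) for d in range(1, 5)}
```

For depth 4 this gives 17 unitaries with 15 angles each, i.e. the 255 parameters used in Fig. 3.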
In Fig.3(b)-(c), all the NT training samples and NV validation samples are unaffected by errors. In Fig.3(d), all the
samples are noisy: we randomly apply one of the M = 16 possible single-qubit Pauli errors (including the identity)
to each of the sample states in both training and validation set.

FIG. S1. 5-qubit QNN. (a) The ‘error-correction circuit’ CQ has depth dC . The ‘decoding circuit’ DQ consists of a single
2-qubit unitary. (b)-(e) The decoding performance improves as depth dC increases.

FIG. S2. Different QNN architectures. We consider the same task as in Fig. 3(c). dC = 2, 3, 4 corresponds to independent
unitaries Uk in Fig. S1; dC = 4 (Trans. Inv.) corresponds to the same Uk within the same layer. For reference, we reproduce the
decoding performance of the standard decoding εX (QQEC ) (green curve) and εX (QL ) (blue curve) as in Fig. 3(c).

2. The effect of QNN depth

In this section, we discuss the rationale for choosing the architecture presented in Fig. 3(a), as well as the effect of
different depths dC for the QNN. The motivation for calling CQ the ‘error-correction circuit’ is as follows. We find
that for a shallow network, e.g., dC = 1, even for an ideal stabilizer code (λ = 0), the QNN fails to learn noisy samples
(Fig. S1(b)). However, as the depth dC increases, the QNN generalization error decreases and ⟨XQNN ⟩ gradually
approaches fX (Fig. S1(c)-(d)), and there appears to be a critical depth at which the scatter plot of ⟨XQNN ⟩
against fX suddenly falls on the diagonal (the QNN expectation value matches the expected measurement value fX ). We
observe that at dC = 4 the scatter plot falls on the diagonal (⟨XQNN ⟩ ≈ fX ) (Fig. S1(e)). Therefore, CQ acts in the
spirit of error correction on the noisy samples, but we emphasize that CQ does not perform error correction
equivalent to standard QEC. For dC = 4, the typical optimization time is around one hour on an
Intel i7 CPU @ 1.90 GHz with 4 cores (without parallelization).
Next, we compare the generalization error across different circuit depths dC for the architecture in Fig. S1. We
also present an alternative architecture with fewer parameters than the one in Fig. 3(a). We consider
the same task as in Fig. 3(c) (noiseless inputs). In Fig. S2, we consider different depths dC = 2, 3, 4, and find that the
generalization performance improves as depth increases. Next, we consider a Translationally Invariant (Trans. Inv.)
architecture where all the 2-qubit unitaries Uk within the same layer of CQ (same layer of the circuit, see the grey
block in Fig. S1, which includes two layers of qubits) share the same parameters ϕkαβ . Note that this architecture
contains fewer parameters, e.g., dC = 4 (Trans. Inv.) has 75 parameters, fewer than the 255 parameters in dC = 4. We
find that the generalization performance of this architecture is worse than that of the dC = 4 architecture we used
in Fig. S1 (grey curve in Fig. S2). It is worth noting that all the different depths and architectures considered in
Fig. S1 have roughly the same constant scaling with λ, suggesting a robust scaling behavior regardless of the specific
architecture used.
