
SUBSPACE PRESERVING QUANTUM CONVOLUTIONAL NEURAL NETWORK ARCHITECTURES

Léo Monbroussou1,2, Jonas Landman3,4, Letao Wang1,5, Alex Grilo1, and Elham Kashefi1,3

1 Laboratoire d'Informatique de Paris 6, CNRS, Sorbonne Université, 4 Place Jussieu, 75005 Paris, France
2 CEMIS, Direction Technique, Naval Group, 83190 Ollioules, France
3 School of Informatics, University of Edinburgh, United Kingdom
4 QC Ware, Palo Alto, USA and Paris, France
5 École Normale Supérieure Paris-Saclay, 4 avenue des Sciences, 91190 Gif-sur-Yvette, France

arXiv:2409.18918v1 [quant-ph] 27 Sep 2024

Subspace preserving quantum circuits are a class of quantum algorithms that, relying on symmetries in the computation, can offer theoretical guarantees for their training. These algorithms have gained extensive interest as they can offer polynomial speed-ups and can be used to mimic classical machine learning algorithms. In this work, we propose a novel convolutional neural network architecture based on Hamming weight preserving quantum circuits. In particular, we introduce convolutional layers and measurement-based pooling layers that preserve the symmetries of the quantum states while realizing non-linearity using gates that are not subspace preserving. Our proposal offers significant polynomial running time advantages over classical deep-learning architectures. We provide an open source simulation library for Hamming weight preserving quantum circuits that can simulate our techniques more efficiently with GPU-oriented libraries. Using this code, we provide examples of architectures that achieve strong performance on complex image classification tasks with a limited number of qubits, and with fewer parameters than classical deep-learning architectures.

1 Introduction
Quantum Machine Learning (QML) has become a promising area for real-world applications of quantum computers, but near-term methods and their scalability are still important research topics. A considerable amount of effort has been put into understanding how to avoid Barren Plateaus (BP) [1, 2], a vanishing-gradient phenomenon that prevents variational algorithms [3] from being trained efficiently. In particular, evidence has recently been presented [4] that the structures that allow us to avoid BP also seem to enable classical simulation techniques. In addition, other important questions must be tackled to design near-term quantum algorithms that may offer an advantage: how can we ensure that the performance of an algorithm will scale with its dimension, and how can we compare classical and quantum algorithms on different figures of merit for the same use case? These questions are especially hard to answer as we only have access to simulation tools over a low number of qubits or to noisy intermediate-scale quantum computers [5].
In this context, we propose a new QML algorithm that behaves as a Convolutional Neural Network (CNN) architecture [6]. This type of neural network is particularly useful in classical Machine Learning for many use cases, including computer vision tasks [7, 8] and time series analysis [9]; we illustrate ours on image classification. This analogy ensures that our algorithm will remain useful at larger scales, as CNNs are widely used. To design such an algorithm, we use Hamming weight (HW) preserving quantum circuits [10], a particular type of subspace preserving framework that allows one to avoid BP [10-14] by considering a Hilbert space of polynomial size, at the cost of having only a polynomial advantage, which could nevertheless be of high degree. While recent works [4, 15] have shown that the absence of BP in this framework leads to the existence of efficient simulation under certain conditions, i.e., no exponential running time advantage, we work within this framework and use measurement-based techniques to offer significant advantages for our method. In addition, large simulations using GPU clusters show strong results for our method in comparison with classical ones, including a reduction of the number of parameters, which could lead to an even greater advantage.

Related Work: Our proposal differs from previous quantum CNN architectures such as [16-18] in several ways. First, it mimics classical convolutional and pooling layers by using a specific subspace preserving encoding. Therefore, we believe CNNs could be replaced by our quantum equivalents even for large architectures. Secondly, our proposal offers polynomial speed-ups and is therefore "classically simulable", in the sense that a classical algorithm can perform the same computation in polynomial time, due to the use of quantum circuits that are subspace preserving. This choice is motivated by the theoretical guarantees of such circuits on the training and expressivity of the model [10, 13, 14]. However, one could use our methods to obtain a polynomial speed-up of high degree that could eventually achieve a quantum utility in comparison with CNNs, especially considering that our method seems to achieve similar performance with fewer parameters. Finally, we tackle the classical simulation of our layers by offering a subspace preserving simulation library [19] that allows us to test our proposal on larger learning problems than usually presented in the QML community. In addition, recent work [20] on the classical simulation of one specific type of Quantum Convolutional Neural Network (QCNN) has shown, using the LOWESA algorithm [21, 22], that the QCNN proposal of Cong et al. [16] is effectively classically simulable by considering the subspace of the low-weight measurement operators that are sufficient for the classification of "locally-easy" datasets. This method of classical simulation is different from the one used in our library, and could be applied to algorithms that are not subspace preserving. Our proposed QCNN architecture is quite different from the one in [16], and the results found in [20] likely do not directly translate to this work. In particular, our architecture contains correlated parameters and measurement-controlled operations. Future work may determine whether LOWESA or similar algorithms based on Pauli propagation can be adapted to achieve speed-ups paralleling those of our specialized method, especially for challenging classification tasks such as the ones described in this paper.
In this paper, we mainly use operations that are HW preserving; we recall the main properties of these operations in Appendix A. In Section 2, we introduce the different HW preserving layers used to design quantum convolutional neural network architectures: we present the convolutional layer in Section 2.1, the measurement-based pooling layer in Section 2.2, and the dense layer used at the end of the architecture in Section 2.3. In Section 3, we discuss the advantage of our quantum methods over their classical analogs by focusing on the model complexity (Section 3.1), and we present a training comparison on image classification tasks used to benchmark CNN architectures using our simulation tools from [19] (Section 3.2). Finally, we conclude in Section 4.

2 Quantum and Classical Convolutional Neural Network Architecture

In this Section, we present our HW preserving convolutional architecture. There exist many types of CNN architectures; we recall the very first one introduced by LeCun [7], the original version of LeNet, in Figure 1. This neural network is composed of successive convolutional and pooling layers, and it ends with a dense layer.

Figure 1: A Convolutional Neural Network architecture. In this example, the input is a batch of 2-dimensional images and is thus a 3-dimensional tensor.

This structure is quite simple: the convolutional parts extract features from the initial images, the pooling parts reduce the dimension of the images, and the final dense layer mixes the features and performs the classification task. The convolutional layers are well suited to feature extraction, as they perform a translation-invariant operation on the initial image by applying a convolution filter that is optimized through training. Usually [7, 8, 23], each layer can be followed by the application of a nonlinear function.

2.1 Quantum Convolutional Layer

In this Section, we describe our convolutional layer based on tensor encoding. We first recall the classical convolutional layer, and then introduce our Hamming weight preserving quantum convolutional layer. We show how the quantum version performs a convolution operation that is analogous to the classical one, and what their differences are. We illustrate both operations in Figure 2.

Classical Convolutional Layer:


Let us recall the mathematical operation a classical convolutional layer performs. Consider a 2-dimensional tensor $x = (x_{i,j})_{(i,j) \in [d_1] \times [d_2]}$, a convolution filter $W = (w_{i,j})_{i,j \in [K]}$, and a final image $\tilde{x} = (\tilde{x}_{i,j})_{(i,j) \in [d_1] \times [d_2]}$. We have:
\[
\forall (i,j) \in [d_1] \times [d_2], \quad \tilde{x}_{i,j} = \sum_{a,b \in [K]} w_{a,b} \, x_{i - \lfloor K/2 \rfloor + a,\; j - \lfloor K/2 \rfloor + b} \tag{1}
\]
which corresponds to a convolution between the filter tensor and the filter window around each pixel. In Figure 2a, we illustrate this 2-dimensional example with the filter window in green and the filter in blue. We can extend this definition to any k-dimensional tensor and to any convolutional layer of dimension less than or equal to k. Notice that in the case of a 2-dimensional convolution applied to a 3-dimensional input such as a batch of square images (see Figure 1), each image is affected by the same 2-dimensional convolution with the filter, as described in Eq. (1).
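To make Eq. (1) concrete, here is a minimal NumPy sketch of this classical convolutional layer (a sketch of ours, assuming zero padding at the borders so the output keeps the input size; the function name and the averaging filter are illustrative only):

```python
import numpy as np

def conv2d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Classical convolution of Eq. (1): x is a (d1, d2) image, w a (K, K) filter."""
    d1, d2 = x.shape
    K = w.shape[0]
    pad = K // 2
    xp = np.pad(x, pad)                    # zero padding so the output keeps the input size
    out = np.zeros_like(x, dtype=float)
    for i in range(d1):
        for j in range(d2):
            # window of pixels x_{i - floor(K/2) + a, j - floor(K/2) + b}, with a, b in [K]
            window = xp[i:i + K, j:j + K]
            out[i, j] = np.sum(w * window)
    return out

x = np.arange(36, dtype=float).reshape(6, 6)   # toy 6 x 6 image
w = np.ones((3, 3)) / 9.0                      # 3 x 3 averaging filter
print(conv2d(x, w))
```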

Tensor Encoding
To perform the quantum convolutional layer and the encoding, we use Reconfigurable Beam Splitter (RBS) gates, well-known two-qubit gates used for HW preserving algorithms [24-26]. Additional information and properties of this gate can be found in Appendix A.
Definition 1 (Reconfigurable Beam Splitter gate). The Reconfigurable Beam Splitter (RBS) gate is a 2-qubit gate that
corresponds to a θ-planar rotation between the states |01⟩ and |10⟩:
\[
RBS(\theta) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos(\theta) & \sin(\theta) & 0 \\ 0 & -\sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{2}
\]
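As a quick sanity check (a small NumPy snippet of ours, not part of the paper's library), one can verify that RBS(θ) only mixes the states |01⟩ and |10⟩, i.e., that it preserves the Hamming weight of every basis state:

```python
import numpy as np

def rbs(theta: float) -> np.ndarray:
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0],
                     [0, c, s, 0],
                     [0, -s, c, 0],
                     [0, 0, 0, 1]])

U = rbs(0.3)
for col, bits in enumerate(["00", "01", "10", "11"]):
    amp = U[:, col]                                            # image of the basis state |bits>
    support = [f"{r:02b}" for r in range(4) if abs(amp[r]) > 1e-12]
    hw = {b.count("1") for b in support}
    print(f"|{bits}> -> support {support}, Hamming weight(s) {hw}")
```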
We propose encoding classical data in such a way that allows us to apply a convolutional layer by using HW preserving
circuits. More precisely, we propose to load any tensor of dimension k by using amplitude encoding on a Tensor Basis
of HW k.
Definition 2 (HW Preserving Tensor encoding). Consider a classical tensor of dimension $k$ such that $x = (x_{1,\dots,1}, \dots, x_{d_1,\dots,d_k}) \in \mathbb{R}^{d_1 \times \dots \times d_k}$. A tensor encoding data loader is a parametrized $n$-qubit quantum circuit (with $n = \sum_{i \in [k]} d_i$) that prepares the quantum state:
\[
|x\rangle = \frac{1}{\|x\|} \sum_{i_1 \in [d_1]} \cdots \sum_{i_k \in [d_k]} x_{i_1,\dots,i_k} \, \big|e^{d_1}_{i_1}\big\rangle \otimes \cdots \otimes \big|e^{d_k}_{i_k}\big\rangle, \tag{3}
\]
where $|e^{d_l}_{i_l}\rangle = |0 \dots 0 1 0 \dots 0\rangle$ is the state of a bit-string with $d_l$ bits in which only bit $i_l$ is equal to 1. Therefore, for any $l \in [k]$, $\{ |e^{d_l}_{i}\rangle \mid i \in [d_l] \}$ is a fixed family of $d_l$ orthonormal quantum states, and $\| \cdot \|$ denotes the 2-norm.
For example, we can map a $2 \times 2$ matrix image $x$ to a state $|x\rangle$ using this encoding:
\[
X = \begin{pmatrix} x_{1,1} & x_{1,2} \\ x_{2,1} & x_{2,2} \end{pmatrix} \longrightarrow |x\rangle = \frac{1}{\|x\|} \big( x_{1,1} |1010\rangle + x_{1,2} |1001\rangle + x_{2,1} |0110\rangle + x_{2,2} |0101\rangle \big). \tag{4}
\]
This choice of encoding gives the state a structure that allows us to apply the convolutional and pooling layers described in the following. It can be seen as amplitude encoding on a specific basis, and can be realized with quantum data loaders that perform amplitude encoding on a basis of fixed HW [10, 27, 28]. The tensor encoding offers the opportunity to use a measurement-based operation to apply a pooling layer, as described in Section 2.2, that reduces the dimension of the state and applies non-linearities while preserving the tensor encoding structure of the final state, which to our knowledge has never been done before. This is the key ingredient of our global convolutional architecture. Notice that the final state can be decomposed over k registers, and that each register can be considered alone as an HW preserving circuit of HW 1.
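The following NumPy sketch builds the amplitude vector of Definition 2 for a 2-dimensional tensor and reproduces the four terms of Eq. (4); it assumes an ideal state-preparation step and is not the data loader circuit of [10, 27, 28]:

```python
import numpy as np
from itertools import product

def tensor_encode(x: np.ndarray) -> np.ndarray:
    """Return the 2^(d1+d2) amplitude vector |x> of Eq. (3) for a 2-dimensional tensor x."""
    d1, d2 = x.shape
    n = d1 + d2
    state = np.zeros(2 ** n)
    for i, j in product(range(d1), range(d2)):
        bits = ["0"] * n
        bits[i] = "1"            # unary encoding of the row index in the first register
        bits[d1 + j] = "1"       # unary encoding of the column index in the second register
        state[int("".join(bits), 2)] = x[i, j]
    return state / np.linalg.norm(x)

x = np.array([[1.0, 2.0], [3.0, 4.0]])
psi = tensor_encode(x)
for idx in np.nonzero(psi)[0]:
    print(f"|{idx:04b}> : {psi[idx]:+.3f}")   # reproduces the four terms of Eq. (4)
```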

Figure 2: Classical (a) and Quantum (b) Convolutional layers. The convolutional filter is represented in blue.

Hamming-Weight Preserving Convolutional Layer:


Considering a tensor encoding of dimension k, applying an RBS-based quantum circuit on K qubits of one register performs rotations between the states corresponding to the pixels linked with those qubits. For example, applying an RBS-based quantum circuit on the first K qubits of the line register of a tensor-encoded 3-dimensional image will affect all the pixels in the first K lines of all the images. By applying the same circuit to each block of K consecutive qubits of each register, one can perform a k-dimensional convolution. On each register, the HW is equal to 1 (i.e., each register is in a unary state).
For example, with k = 2, we consider a 2-dimensional tensor $x = (x_{i,j})_{(i,j) \in [d_1] \times [d_2]}$ which is tensor encoded as in Definition 2. If one applies an RBS-based circuit between the qubits of indices $I, \dots, (I+K) \in [d_1]$ of the line register, and another one between the qubits of indices $J, \dots, (J+K) \in [d_2]$ of the column register, then the corresponding $K \times K$ pixels form a filter window affected by a unitary matrix $U_{\mathrm{Filter}}(\Theta)$ such that:
\[
\sum_{i=I}^{I+K} \sum_{j=J}^{J+K} \tilde{x}_{i,j} \, |e_i, e_j\rangle = U_{\mathrm{Filter}}(\Theta) \sum_{i=I}^{I+K} \sum_{j=J}^{J+K} x_{i,j} \, |e_i, e_j\rangle, \tag{5}
\]
with $\Theta$ the RBS parameters, $|e_i\rangle$ a unary state as in Definition 2, $U_{\mathrm{Filter}}(\Theta) = (u_{i,j}(\Theta))_{i,j \in [K^2]}$ the quantum convolutional filter, and $\tilde{x} = (\tilde{x}_{i,j})_{(i,j) \in [d_1] \times [d_2]}$ the final image, which is still tensor encoded.
Each pixel in the convolutional window is affected by the quantum filter through a convolutional relation analogous to the one given by Eq. (1):
\[
\forall i \in [\![I, I+K]\!], \; \forall j \in [\![J, J+K]\!], \quad \tilde{x}_{i,j} = \sum_{a,b \in [K]} w^{i,j}_{a,b} \, x_{a,b} \quad \text{with} \quad w^{i,j}_{a,b} = \langle e_a, e_b | \, U_{\mathrm{Filter}}(\Theta) \, | e_i, e_j \rangle. \tag{6}
\]

Eq. (6) shows that applying the same RBS-based circuit to each block of K consecutive qubits of each register of a tensor-encoded state is equivalent to applying a convolution. The quantum convolution is analogous to the classical one in the sense that each pixel in the filter window is affected by a classical convolution with a $K \times K$ classical filter corresponding to a part of the $K^2 \times K^2$ quantum filter coefficients. The HW preserving convolutional layer is illustrated in Figure 2b. Notice that we apply the same operation to each set of $K \times K$ pixels, and not the same operation to each pixel. This limitation can be bypassed by loading several translated copies of the initial tensor in a batch; we implement such a feature in [19]. Eq. (5) and Eq. (6) can be adapted to any initial tensor dimension and any filter dimension.
For a d-dimensional quantum convolutional layer, the quantum filter unitary is of size $K^d \times K^d$. We use K as the quantum convolutional layer hyperparameter. Even if the quantum filter is bigger than the classical filter, the structure of the quantum convolutional circuit is such that the number of parameters is smaller for the quantum layer, as explained in Section 3.1. We show in our simulations presented in Section 3.2 that the performance of the QCNN architecture is similar to that of a classical CNN architecture.
In this setting, previous works [24, 25] have proposed efficient circuits that maximize the controllability at different depths, meaning that those circuits can reach, in the unary basis, any orthogonal matrix. In the case of tensor-encoded data, each register has a corresponding HW of 1. Usual HW preserving ansätze that can achieve any unitary matrix in the subspace of HW 1 are presented in Appendix B. In the following, we will focus on the butterfly circuit [25], which minimizes the depth.

Figure 3: A 2-dimensional convolutional layer using HW preserving quantum circuits and tensor encoding.

Therefore, considering a d-dimensional quantum filter of size $\prod_{i=1}^{d} K_i$, the depth of one convolutional layer is $\max_{i \in [d]} O(\log(K_i))$. We illustrate the HW preserving convolutional layer in Figure 3. Notice that the d-dimensional filter applied in the convolutional layer cannot be an arbitrary matrix of size $K_1 \cdots K_d$. As explained in [24], n-qubit RBS-based circuits can only perform $n \times n$ orthogonal transformations when considering a HW of 1. The resulting filter, or equivalent unitary on all the registers, is thus a parametrized orthogonal matrix with $\sum_{i=1}^{d} K_i (K_i - 1)/2$ independent parameters.

In terms of time complexity, the classical CNN layer depends on the input size and the filter size. Considering a batch of $I \times I$ images, the complexity of a conventional 2D convolution [29] depends on the size of the input image and the number of channels C, and is $O(C^2 \cdot K^2 \cdot I^2)$ (we consider that the final images are of dimension $I \times I$). For our HW preserving convolutional layer, the complexity only depends on the filter size K. As explained previously, considering a butterfly circuit that maximizes the expressivity in the subspace of HW 1, the depth of the quantum circuit is $O(\log(K))$. A comparison of the number of parameters and the time complexity between classical and quantum convolutional architecture layers is presented in Section 3.1. The quantum polynomial advantage increases with the dimension of the tensor, i.e., the HW of the encoding. Therefore, this layer may offer a more interesting advantage for use cases with inputs of large dimension, such as time series classification [30].
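A hedged classical sketch of Eq. (5) follows (ours, not the simulation library [19]): since the same RBS circuit acts on every block of K consecutive qubits of each register, the filter on each K × K window can be modelled as the tensor product of two K × K orthogonal matrices, stand-ins for the trained row and column RBS circuits, applied to the window's amplitudes.

```python
import numpy as np

def random_orthogonal(k: int, rng: np.random.Generator) -> np.ndarray:
    """Stand-in for a trained RBS circuit acting in the HW 1 subspace of k qubits."""
    q, _ = np.linalg.qr(rng.standard_normal((k, k)))
    return q

def hw_conv2d(x: np.ndarray, u_row: np.ndarray, u_col: np.ndarray) -> np.ndarray:
    """Apply the quantum convolution of Eq. (5) to the amplitudes of a tensor-encoded image."""
    d1, d2 = x.shape
    K = u_row.shape[0]
    assert d1 % K == 0 and d2 % K == 0, "each register is split into blocks of K qubits"
    u_filter = np.kron(u_row, u_col)           # the K^2 x K^2 quantum filter U_Filter(Theta)
    out = np.empty_like(x)
    for i in range(0, d1, K):
        for j in range(0, d2, K):
            window = x[i:i + K, j:j + K].reshape(K * K)
            out[i:i + K, j:j + K] = (u_filter @ window).reshape(K, K)
    return out

rng = np.random.default_rng(0)
x = rng.random((8, 8))
x /= np.linalg.norm(x)                         # amplitudes of the tensor-encoded image
y = hw_conv2d(x, random_orthogonal(4, rng), random_orthogonal(4, rng))
print(np.linalg.norm(y))                       # the layer is unitary, so the norm stays 1.0
```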

2.2 Quantum Pooling Layer

In this Section, we introduce a pooling layer that preserves the tensor structure of the quantum state. This layer allows us to reduce the dimension of the image and also to apply some non-linearities by using measurements. Applying non-linearities in QML architectures is a non-trivial task, as variational quantum circuits perform linear algebraic operations on the quantum state. Previous works propose using classical computation between quantum layers to apply non-linearities [24] or using specific hardware tools to create non-linearities [31]. Another quantum CNN proposal [16] has considered using measurements and single-qubit gates controlled by the outcomes to implement non-linearities. However, to our knowledge, there is no existing method to perform a pooling layer with non-linearities that preserves the structure of the state, allowing one to keep the subspace preserving properties of the computation. Therefore, our proposal makes possible deep-learning architectures that are subspace preserving and thus ensure theoretical guarantees on their training. In addition, our method does not require adaptive measurement techniques, but only CNOT gates and ignoring part of the qubits in the remaining part of the circuit.
Considering the tensor encoding, our pooling method consists of applying a CNOT between each pair of qubits in the register corresponding to the dimension we want to reduce. In the following part of the circuit, we only consider the target qubits. This method is mathematically equivalent to measuring the control qubits and applying a bit-flip operation to the target qubits whenever the corresponding control qubits are measured in state $|1\rangle$.
This pooling circuit preserves the tensor encoding structure. Considering an initial state $\rho = |X\rangle\langle X|$ with $|X\rangle = \sum_{i,j \in [I]} \frac{x_{i,j}}{\|x\|} |e^I_i\rangle \otimes |e^I_j\rangle$, the resulting state after a pooling layer for a square image (see Figure 4) is:
\[
\tilde{\rho} = \sum_i p_i \, \big|\tilde{X}^i\big\rangle \big\langle \tilde{X}^i\big| \quad \text{with} \quad \big|\tilde{X}^i\big\rangle = \sum_{l,k \in [O]} \frac{\tilde{x}^i_{l,k}}{\|x\|} \, |e^O_l\rangle \otimes |e^O_k\rangle, \tag{7}
\]

Figure 4: 1) Illustration of the Pooling Layer effect on a 2-dimensional image with Pooling windows in blue. 2) Quantum Circuit for the Pooling layer and its equivalent representation using measurement and controlled-X gates.

with O = I/2. The preservation of the tensor encoding structure allows us to implement several convolutional and pooling layers, as in most classical deep-learning architectures. In addition, this pooling operation is analogous to the average pooling commonly used in deep-learning architectures. Consider the case of a 4 by 4 image to which we apply this pooling operation:
\[
X = \begin{pmatrix}
x_{1,1} & x_{1,2} & x_{1,3} & x_{1,4} \\
x_{2,1} & x_{2,2} & x_{2,3} & x_{2,4} \\
x_{3,1} & x_{3,2} & x_{3,3} & x_{3,4} \\
x_{4,1} & x_{4,2} & x_{4,3} & x_{4,4}
\end{pmatrix} \rightarrow \tilde{\rho} \,. \tag{8}
\]
And
\[
\tilde{\rho} = \frac{1}{\|x\|^2}
\begin{pmatrix}
x_{1,1}^2 + x_{1,2}^2 + x_{2,1}^2 + x_{2,2}^2 & x_{1,2} x_{1,4} + x_{2,2} x_{2,4} & x_{2,1} x_{4,1} + x_{2,2} x_{4,2} & x_{2,2} x_{4,4} \\
x_{1,2} x_{1,4} + x_{2,2} x_{2,4} & x_{1,3}^2 + x_{1,4}^2 + x_{2,3}^2 + x_{2,4}^2 & x_{2,4} x_{4,2} & x_{2,3} x_{4,3} + x_{2,4} x_{4,4} \\
x_{2,1} x_{4,1} + x_{2,2} x_{4,2} & x_{2,4} x_{4,2} & x_{3,1}^2 + x_{3,2}^2 + x_{4,1}^2 + x_{4,2}^2 & x_{3,2} x_{3,4} + x_{4,2} x_{4,4} \\
x_{2,2} x_{4,4} & x_{2,3} x_{4,3} + x_{2,4} x_{4,4} & x_{3,2} x_{3,4} + x_{4,2} x_{4,4} & x_{3,3}^2 + x_{3,4}^2 + x_{4,3}^2 + x_{4,4}^2
\end{pmatrix}. \tag{9}
\]
Notice that the diagonal terms are the sums of the squared values of the pixels in the pooling windows (see Figure 4). Therefore, the probability of measuring the state corresponding to a given pixel after the pooling layer is the sum of the probabilities of measuring the states carrying the pixels in the corresponding pooling window. Finally, the measurement-based pooling operation applies some non-linearity to the quantum state, which is desirable since non-linear activation functions are usually applied after pooling layers.
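The effect of the pooling layer on measurement statistics can be checked with a few lines of NumPy (a sketch of ours, matching the diagonal terms of Eq. (9) rather than the full density matrix): the probability of each pooled pixel is the sum of the probabilities of the pixels in its 2 × 2 pooling window.

```python
import numpy as np

def pooled_probabilities(x: np.ndarray) -> np.ndarray:
    """x: (I, I) amplitudes of a tensor-encoded image; returns the (I/2, I/2) pooled probabilities."""
    p = (x / np.linalg.norm(x)) ** 2                  # probability of measuring each pixel state
    O = x.shape[0] // 2
    return p.reshape(O, 2, O, 2).sum(axis=(1, 3))     # sum over each 2 x 2 pooling window

x = np.arange(1.0, 17.0).reshape(4, 4)                # the 4 x 4 image of Eq. (8)
probs = pooled_probabilities(x)
print(probs)                                          # the diagonal terms of Eq. (9)
print(probs.sum())                                    # 1.0: the pooled state is normalized
```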

2.3 Quantum Dense Layer

In this Section, we discuss the final part of our subspace preserving deep-learning architecture. As explained in the introduction of Section 2, we consider an architecture very similar to LeNet (see Figure 1). After applying several convolutional and pooling layers, this architecture ends with a vectorization of the image followed by a fully connected layer, or dense layer.
In the case of our architecture, no vectorization is required. The dense layer simply consists of applying an RBS-based trainable quantum circuit to the remaining qubits while merging all the registers.
Applying such a circuit to a quantum state of fixed HW k corresponds to applying an orthogonal neural network as a dense layer. Previous works [10, 24] have highlighted the fact that quantum orthogonal neural networks are powerful neural networks. The number of parameters and the choice of structure, and more specifically the connectivity used for this circuit, determine the maximal controllability of this layer. For example, using a line connectivity for this n-qubit layer implies that the equivalent unitary of the layer is a compound matrix [10, 32]. This reduces the maximal dimension of the Dynamical Lie Algebra of this layer to n(n − 1)/2, meaning that only a small number of parameters can be useful.
Depending on the HW of the states in the dense layer, i.e., the dimension k of the tensor considered as the input of the architecture, the dense layer is harder to simulate classically. The impact of an RBS gate parametrized by $\theta$ in a subspace of n qubits and HW k is $\binom{n-2}{k-1}$ $\theta$-planar rotations. Therefore, considering a HW k independent of the number of qubits n, the dense layer can be classically simulated. However, the complexity of this simulation could¹ be polynomial of degree k.
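The counting behind this statement can be illustrated with a short sketch (ours, not the code of [19]): a single RBS(θ) acting on qubits (a, b), restricted to the subspace of Hamming weight k, reduces to $\binom{n-2}{k-1}$ planar rotations between basis states that differ only by moving the excitation from a to b.

```python
import numpy as np
from itertools import combinations

def apply_rbs_in_subspace(amps: dict, a: int, b: int, theta: float) -> dict:
    """Apply RBS(theta) on qubits (a, b), a < b, to a state stored as {bit-string: amplitude}."""
    c, s = np.cos(theta), np.sin(theta)
    new = dict(amps)
    for e, amp in amps.items():
        if e[a] == "1" and e[b] == "0":                        # pair |..1..0..> with its |..0..1..> partner
            f = e[:a] + "0" + e[a + 1:b] + "1" + e[b + 1:]
            new[e] = c * amp - s * amps.get(f, 0.0)            # theta-planar rotation per Eq. (2)
            new[f] = s * amp + c * amps.get(f, 0.0)
    return new

n, k = 4, 2
basis = ["".join("1" if q in pos else "0" for q in range(n)) for pos in combinations(range(n), k)]
state = {e: 1 / np.sqrt(len(basis)) for e in basis}            # uniform state of HW 2
state = apply_rbs_in_subspace(state, 0, 1, np.pi / 5)
print({e: round(v, 3) for e, v in state.items()})              # C(n-2, k-1) = 2 pairs were rotated
```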

Figure 5: A 2-dimensional convolutional layer using Hamming weight preserving quantum circuits and tensor encoding (final pooling layer, dense layer merging the row and column registers, and measurement). Vertical lines in the dense layer represent two-qubit RBS gates, parametrized with independent angles.


3 Results and Simulation


3.1 Complexity results

The term "model complexity" may refer to different notions in deep learning, including the expressive capacity and the effective model complexity [33]. It may also refer to the time complexity of the different layers [34]. To compare the feed-forward and training running times between the quantum deep-learning layers introduced in Section 2 and their classical equivalents, two criteria are particularly significant. First, the number of parameters of the model is a standard proxy for the running time, as a low number of parameters reduces the cost of training and of the forward pass of a model. In addition, the forward pass running time itself is very important, as it determines the number of basic operations a computer needs to run the model. In Table 1, we compare the running time complexity of the forward pass for each convolutional neural network layer with the depth of the analogous quantum layers. The depth of the corresponding quantum circuits gives the number of basic quantum operations, i.e., the number of layers of gates applied in parallel.
Layer            | Classical time complexity                | Quantum layer depth
Convolutional    | $O(\prod_{i=1}^{k} K_i^2 \cdot d_i^2)$   | $O(\log(K))$
Pooling          | $O(\prod_{i=1}^{k} d_i)$                 | $O(1)$
Orthogonal dense | $O(p \cdot \binom{n}{k})$                | $O(p/n)$
Dense            | $O(\sum_{i=1}^{k} d_i^2)$                | -

Table 1: Time complexity comparison between classical deep-learning layers and Hamming weight preserving quantum analogs. We consider k-dimensional convolutional neural network layers with $d_1 \times \cdots \times d_k$ the size of the square input tensor, $\{K_1, \cdots, K_k\}$ the size of the convolutional filter, and p the number of parameters in the orthogonal dense layer. We call n the global number of qubits in the case of the quantum architecture, where $n = \sum_{i=1}^{k} d_i$.

In Section 2, we presented our layers in the case of 2- or 3-dimensional convolutional architectures. In Table 1, we consider the general case of a k-dimensional convolutional layer. The quantum advantage increases with the dimension of the tensor. However, this dimension corresponds to the global HW, and one should keep this value independent of the number of qubits n to avoid Barren Plateaus [10-12].
In addition to the running time complexity, one should also consider the number of parameters of the model and the running time associated with the vectorizations in the model. Indeed, preparing the state for each layer in a classical CNN architecture requires vectorizing it, especially when using GPUs for computation [35]. In the case of our quantum models, we do not need to adapt the state, as our convolutional and pooling layers preserve the structure of the state, and the final dense layer only requires applying RBS gates between qubits from different registers.
¹ To our knowledge, performing the $\binom{n-2}{k-1}$ $\theta$-planar rotations is the best simulation algorithm that exists for RBS-based quantum circuits.

Layer            | Classical layer parameters                        | Quantum layer parameters
Convolutional    | $\prod_{i=1}^{k} K_i^2$                           | $\sum_{i=1}^{k} K_i (K_i - 1)/2$
Pooling          | 0                                                 | 0
Orthogonal dense | $p \leq \binom{n}{k} (\binom{n}{k} - 1)/2$        | $p \leq \binom{n}{k} (\binom{n}{k} - 1)/2$
Dense            | $(\sum_{i=1}^{k} d_i)^2$                          | -

Table 2: The number of parameters of classical deep-learning layers and of Hamming weight preserving quantum analogs. We consider k-dimensional convolutional neural network layers with $d_1 \times \cdots \times d_k$ the size of the square input tensor and $\{K_1, \cdots, K_k\}$ the size of the convolutional filter. We call n the global number of qubits in the case of the quantum architecture, where $n = \sum_{i=1}^{k} d_i$.

From Table 1 and Table 2, we observe that the convolutional and pooling layers offer large polynomial advantages, especially when considering high-dimensional input tensors. The quantum filter has fewer parameters than the classical one, but the simulations presented in Section 3.2 show that these quantum orthogonal filters perform well. Similarly, the quantum orthogonal dense layer performs well with a reduced number of parameters in comparison with classical dense layers; previous works [24, 25, 28] have already shown that orthogonal layers perform well in comparison with dense layers. The quantum advantages in terms of running time complexity, number of parameters, and absence of vectorization open new perspectives for the design of useful subspace preserving QML algorithms.
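As a small worked example following Table 2 (with sizes we picked for illustration): for a 2-dimensional architecture with filter sizes K₁ = K₂ = 4 and a 16 × 16 input, so n = d₁ + d₂ = 32 qubits and HW k = 2, the parameter counts compare as follows.

```python
from math import comb, prod

k = 2
K = [4, 4]                                                # filter sizes K_1, ..., K_k
d = [16, 16]                                              # input tensor sizes d_1, ..., d_k
n = sum(d)                                                # global number of qubits

classical_conv = prod(Ki ** 2 for Ki in K)                # classical convolutional filter parameters
quantum_conv = sum(Ki * (Ki - 1) // 2 for Ki in K)        # RBS angles of the quantum filter
classical_dense = sum(d) ** 2                             # classical dense layer parameters
quantum_dense_bound = comb(n, k) * (comb(n, k) - 1) // 2  # upper bound on p for the orthogonal dense layer

print(f"convolutional filter: classical {classical_conv} vs quantum {quantum_conv}")        # 256 vs 12
print(f"dense layer: classical {classical_dense} vs quantum orthogonal p <= {quantum_dense_bound}")
```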

3.2 Simulations

In this Section, we test our method on several well-known datasets used to benchmark classification algorithms. We provide in [19] a GPU-based toolkit to simulate Hamming weight preserving deep-learning architectures. To do so, the code performs linear algebra using the PyTorch [36] library while only considering the smallest subspaces possible. The pooling part of the circuit is simulated using projectors between different subspace bases. Thanks to this method, we were able to simulate larger quantum circuits and to perform image classification on 10-class datasets, and not only the binary classification usually done in QML. Our simulation software allows one to mix our subspace preserving simulation with classical layers thanks to its PyTorch module implementation. To our knowledge, this is the most complex image classification task, in terms of the number of labels, realized with classical data.
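For instance, mixing the two could look like the standard PyTorch composition below; the quantum_backbone argument stands for the subspace preserving layers of [19], whose actual class names and signatures we do not reproduce here, so the snippet uses a placeholder backbone.

```python
import torch
import torch.nn as nn

class HybridQCNN(nn.Module):
    """A classical head on top of a (simulated) subspace preserving backbone."""
    def __init__(self, quantum_backbone: nn.Module, n_measured: int, n_classes: int):
        super().__init__()
        self.quantum_backbone = quantum_backbone      # e.g. layers from the library in [19]
        self.head = nn.Linear(n_measured, n_classes)  # classical dense layer on the measured outputs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        probs = self.quantum_backbone(x)              # simulated measurement outcomes
        return self.head(probs)

# Stand-in backbone for illustration only (the real one would come from [19]):
model = HybridQCNN(nn.Sequential(nn.Flatten(), nn.Linear(256, 5)), n_measured=5, n_classes=10)
out = model(torch.randn(8, 1, 16, 16))
print(out.shape)                                      # torch.Size([8, 10])
```

Training then proceeds with a usual PyTorch loop (Adam optimizer, cross entropy loss), as in the experiments reported in Table 3.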

Architecture      | Parameters | Dataset       | Training Accuracy | Testing Accuracy | Epochs
CNN Architecture  | 990        | MNIST         | 91.33% ± 0.36%    | 84.59% ± 0.91%   | 30
                  |            | FashionMNIST  | 82.8% ± 0.3%      | 73.83% ± 1.56%   | 40
                  |            | CIFAR-10      | 35.65% ± 0.43%    | 27.79% ± 0.85%   | 40
QCNN Architecture | 755        | MNIST         | 93.79% ± 0.76%    | 86.79% ± 1.45%   | 30
                  |            | FashionMNIST  | 82.95% ± 0.47%    | 78.29% ± 0.83%   | 40
                  |            | CIFAR-10      | 34.29% ± 1.15%    | 28.71% ± 1.05%   | 40

Table 3: Simulation results. We consider 2000 training samples and 1000 testing samples. We trained the architectures described in Figure 6 with the Adam optimizer and cross entropy loss. All hyper-parameters and computations can be found in [19].

To benchmark our layers, we compare a classical CNN architecture with a quantum one with similar hyper-parameters, on well-known image recognition datasets. Each dataset [37-39] has 10 classes of images, which we prepare by applying an average pooling layer to reduce the size of the input images. Every simulation can be found in [19], and an illustration of both architectures is presented in Figure 6a. We ran our simulations using an NVIDIA A100 80 GB GPU on a cluster.
The results presented in Figure 6 and Table 3 show that our architecture offers performance similar to the classical CNN architecture. In addition to the running time complexity advantages summarized in Section 3.1 and the absence of vectorization, the quantum architecture reaches similar accuracy with fewer parameters, thanks to the orthogonality of its final dense layer and the structure of its convolutional layers. Our model even outperforms the classical architecture for the MNIST and Fashion MNIST classification tasks. In the case of the CIFAR-10 dataset, neither architecture has the capacity to achieve a satisfying result after training, but we observe similar training behavior and performance.

(a) CNN architecture and HW preserving convolutional architecture used for the training comparison. (b) MNIST digit dataset [37].

(c) Fashion MNIST dataset [38]. (d) CIFAR-10 dataset [39].

Figure 6: Average training accuracy and standard deviation comparison between the classical CNN architecture and the HW preserving architecture (a) for the classification of 10-label datasets (b, c, d), with 2000 input images. The average values and standard deviations are derived from 10 different trainings. The quantum architecture (QCNN) has 755 parameters and the classical architecture (CNN) has 990 parameters. Both architectures (choice of layers and hyper-parameters) are unchanged in all cases, and can be found in [19].

4 Conclusion

In this paper, we introduce a convolutional layer and a measurement-based pooling layer that offer polynomial advantages over their classical analogs. By conserving the subspace preserving structure of the state during the computation, these layers can be assembled to build complex deep-learning algorithms such as Convolutional Neural Network architectures, while ensuring the correct training of the quantum circuit. In particular, those circuits can avoid Barren Plateaus by only considering subspaces of polynomial size, limiting the potential running time advantages to polynomial ones. Recent work [4] has pointed out the link between the absence of Barren Plateaus and a non-exponential advantage in the near-term QML literature, and we believe that our proposal offers a promising path for useful QML algorithms by optimizing within a framework that avoids vanishing-gradient phenomena.
Our work also addresses an important question that only a few works tackle, due to hardware limitations: how to ensure that a method's performance will scale with the size of the problem? The authors of [40] raise this issue effectively by offering software tools to compare popular QML models with classical ones, observing that out-of-the-box classical machine learning models usually outperform the quantum classifiers for binary classification tasks.

9
In addition, their results suggest that "quantumness" may not be the crucial ingredient for the small learning tasks considered.
By offering software tools that are tailored for Hamming weight preserving algorithms, and by mimicking the behavior of state-of-the-art classical deep-learning layers, we offer a solution that performs well in comparison with classical methods while offering an interesting running time advantage. Our software, which can be accessed through [19], allowed us to train our model on 10-label classification tasks that are far more complex than the binary classification tasks usually used to illustrate QML methods, and that are commonly used in the classical Machine Learning literature.
We hope that our work, combined with future studies on different figures of merit such as noise resilience, energy consumption, and differential privacy, will offer new perspectives for near-term QML algorithms.

5 Acknowledgment
This work is supported by the H2020-FETOPEN Grant PHOQUSING (GA no.: 899544), the Engineering and Physical Sciences Research Council (grant EP/T001062/1), and the Naval Group Centre of Excellence for Information Human factors and Signature Management (CEMIS). ABG is supported by ANR JCJC TCS-NISQ ANR-22-CE47-0004, and by the PEPR integrated project EPiQ ANR-22-PETQ-0007, part of Plan France 2030. This work is part of the HQI initiative (www.hqi.fr) and is supported by France 2030 under the French National Research Agency award number ANR-22-PNCQ-0002. The authors warmly thank Manuel Rudolph for the helpful discussions.

References
[1] Jarrod R. McClean et al. “Barren plateaus in quantum neural network training landscapes”. In: Nature Commu-
nications 9.1 (Nov. 2018). DOI: 10.1038/s41467-018-07090-4.
[2] Martin Larocca et al. A Review of Barren Plateaus in Variational Quantum Computing. 2024. arXiv: 2405.00781 [quant-ph].
[3] M. Cerezo et al. “Variational quantum algorithms”. In: Nature Reviews Physics 3.9 (Aug. 2021), pp. 625–644.
DOI : 10.1038/s42254-021-00348-9.
[4] M. Cerezo et al. Does provable absence of barren plateaus imply classical simulability? Or, why we need to
rethink variational quantum computing. 2024. arXiv: 2312.09121 [quant-ph].
[5] John Preskill. “Quantum Computing in the NISQ era and beyond”. In: Quantum 2 (Aug. 2018), p. 79. ISSN:
2521-327X. DOI: 10.22331/q-2018-08-06-79.
[6] Keiron O’Shea and Ryan Nash. “An Introduction to Convolutional Neural Networks”. In: ArXiv abs/1511.08458
(2015).
[7] Y. Lecun et al. “Gradient-based learning applied to document recognition”. In: Proceedings of the IEEE 86.11
(1998), pp. 2278–2324. DOI: 10.1109/5.726791.
[8] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “ImageNet Classification with Deep Convolutional
Neural Networks”. In: Advances in Neural Information Processing Systems. Ed. by F. Pereira et al. Vol. 25.
Curran Associates, Inc., 2012.
[9] Raneen Younis, Sergej Zerr, and Zahra Ahmadi. “Multivariate Time Series Analysis: An Interpretable CNN-
based Model”. In: 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA).
2022, pp. 1–10. DOI: 10.1109/DSAA54385.2022.10032335.
[10] Léo Monbroussou et al. Trainability and Expressivity of Hamming-Weight Preserving Quantum Circuits for
Machine Learning. 2023. arXiv: 2309.15547 [quant-ph].
[11] Martin Larocca et al. “Diagnosing Barren Plateaus with Tools from Quantum Optimal Control”. In: Quantum 6
(Sept. 2022), p. 824. DOI: 10.22331/q-2022-09-29-824.
[12] N. L. Diaz et al. Showcasing a Barren Plateau Theory Beyond the Dynamical Lie Algebra. 2023. arXiv: 2310.11505 [quant-ph].
[13] Michael Ragone et al. A Unified Theory of Barren Plateaus for Deep Parametrized Quantum Circuits. 2023.
arXiv: 2309.09342 [quant-ph].
[14] Enrico Fontana et al. The Adjoint Is All You Need: Characterizing Barren Plateaus in Quantum Ansätze. 2023.
arXiv: 2309.07902 [quant-ph].
[15] Matthew L. Goh et al. Lie-algebraic classical simulations for variational quantum computing. 2023. arXiv:
2308.01432 [quant-ph].
[16] Iris Cong, Soonwon Choi, and Mikhail D. Lukin. “Quantum convolutional neural networks”. In: Nature Physics
15.12 (Aug. 2019), pp. 1273–1278. ISSN: 1745-2481. DOI: 10.1038/s41567-019-0648-8.

10
[17] Iordanis Kerenidis, Jonas Landman, and Anupam Prakash. Quantum Algorithms for Deep Convolutional Neural
Networks. 2019. arXiv: 1911.01117 [quant-ph].
[18] ShiJie Wei et al. A Quantum Convolutional Neural Network on NISQ Devices. 2021. arXiv: 2104.06918 [quant-ph].
[19] Léo Monbroussou and Letao Wang. “Hamming Weight Preserving QCNN Simulation Software”. In: (2024). URL: https://github.com/ptitbroussou/HW_QCNN.
[20] Pablo Bermejo et al. Quantum Convolutional Neural Networks are (Effectively) Classically Simulable. 2024.
arXiv: 2408.12739 [quant-ph].
[21] Manuel S. Rudolph et al. Classical surrogate simulation of quantum systems with LOWESA. 2023. arXiv: 2308.09109 [quant-ph].
[22] Enrico Fontana et al. Classical simulations of noisy variational quantum circuits. 2023. arXiv: 2306.05400
[quant-ph].
[23] Kaiming He et al. Deep Residual Learning for Image Recognition. 2015. arXiv: 1512.03385 [cs.CV].
[24] Jonas Landman et al. “Quantum Methods for Neural Networks and Application to Medical Image Classifica-
tion”. In: (2022). DOI: 10.22331/q-2022-12-22-881. eprint: arXiv:2212.07389.
[25] El Amine Cherrat et al. “Quantum Vision Transformers”. In: (2022). eprint: arXiv:2209.08167.
[26] Nishant Jain et al. Quantum Fourier Networks for Solving Parametric PDEs. 2023. arXiv: 2306.15415 [quant-ph].
[27] Renato M. S. Farias et al. Quantum encoder for fixed Hamming-weight subspaces. 2024. arXiv: 2405.20408
[quant-ph].
[28] Sonika Johri et al. “Nearest Centroid Classification on a Trapped Ion Quantum Computer”. In: (2020). eprint:
arXiv:2012.04145.
[29] Tao Wei, Yonghong Tian, and Chang Wen Chen. “Rethinking Convolution: Towards an Optimal Efficiency”.
In: 2021.
[30] Hassan Ismail Fawaz et al. “Deep learning for time series classification: a review”. In: Data Mining and Knowl-
edge Discovery 33.4 (Mar. 2019), pp. 917–963. ISSN: 1573-756X. DOI: 10.1007/s10618-019-00619-1.
[31] Gregory R. Steinbrecher et al. Quantum optical neural networks. 2018. arXiv: 1808.10047 [quant-ph].
[32] Iordanis Kerenidis and Anupam Prakash. Quantum machine learning with subspace states. 2022. arXiv: 2202.00054 [quant-ph].
[33] Xia Hu et al. Model Complexity of Deep Learning: A Survey. 2021. arXiv: 2103.05127 [cs.LG].
[34] Bhoomi Shah and Hetal Bharat Bhavsar. “Time Complexity in Deep Learning Models”. In: Procedia Computer
Science (2022).
[35] Jimmy SJ. Ren and Li Xu. On Vectorization of Deep Convolutional Neural Networks for Vision Tasks. 2015.
arXiv: 1501.07338 [cs.CV].
[36] Adam Paszke et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. 2019. arXiv:
1912.01703 [cs.LG].
[37] Yann LeCun, Corinna Cortes, and CJ Burges. “MNIST handwritten digit database”. In: ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist 2 (2010).
[38] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a Novel Image Dataset for Benchmarking Ma-
chine Learning Algorithms. 2017. arXiv: 1708.07747 [cs.LG].
[39] Alex Krizhevsky. “Learning Multiple Layers of Features from Tiny Images”. In: 2009.
[40] Joseph Bowles, Shahnawaz Ahmed, and Maria Schuld. Better than classical? The subtle art of benchmarking
quantum machine learning models. 2024. arXiv: 2403.07059 [quant-ph].

A Reminder on Hamming Weight Preserving circuits


In this Section, we present the main properties of Hamming weight (HW) preserving gates. Let us define the basis of n-qubit states of HW k:
\[
B^n_k = \{ |e\rangle \mid e \in \{0,1\}^n \text{ and } HW(e) = k \}, \tag{10}
\]
with $HW(e)$ the number of 1s in the bit-string e. We call Hamming weight preserving an n-qubit quantum circuit whose corresponding unitary matrix U satisfies:
\[
\forall k \in [n], \; \forall |e\rangle \in B^n_k, \quad U \cdot |e\rangle \in \mathrm{span}(B^n_k). \tag{11}
\]
Such a unitary is subspace preserving, and we can re-order its indices to form a block diagonal matrix where each block is a unitary matrix corresponding to the action of the circuit in a specific subspace.

Figure 7: Block representation of HW preserving unitaries. U is the $2^n \times 2^n$ unitary corresponding to an n-qubit HW preserving quantum circuit. Each block k is the unitary matrix corresponding to the preserved subspace of HW k and the state basis $B^n_k$. Its size is $d_k \times d_k$, where $d_k = \binom{n}{k}$.

The most common HW preserving gate is the Reconfigurable Beam Splitter (RBS). It is the only HW preserving gate used in this article, and it is easy to implement or native on many quantum devices. RBS gates are presented in Definition 1. Notice that the definition of the RBS gate is the same as that of the photonic beam splitter when considering a single photon. Theoretical guarantees on the training of RBS-based circuits have been given in previous work [10], and such circuits can be used to perform amplitude encoding on a specific subspace [10, 27, 28].
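The block structure of Figure 7 can be checked numerically with the sketch below (ours, for illustration): we build the full 2ⁿ × 2ⁿ unitary of a small RBS circuit and verify that, once the basis states are re-ordered by Hamming weight, the matrix is block diagonal.

```python
import numpy as np

def rbs_full(n: int, a: int, b: int, theta: float) -> np.ndarray:
    """2^n x 2^n unitary of RBS(theta) acting on qubits a and b (bit 0 is the leftmost)."""
    c, s = np.cos(theta), np.sin(theta)
    u = np.zeros((2 ** n, 2 ** n))
    for idx in range(2 ** n):
        bits = list(format(idx, f"0{n}b"))
        if bits[a] != bits[b]:
            flipped = bits.copy()
            flipped[a], flipped[b] = bits[b], bits[a]
            jdx = int("".join(flipped), 2)
            # per Eq. (2): |..0..1..> -> cos|..0..1..> - sin|..1..0..>, |..1..0..> -> sin + cos
            if bits[a] == "0":
                u[idx, idx], u[jdx, idx] = c, -s
            else:
                u[idx, idx], u[jdx, idx] = c, s
        else:
            u[idx, idx] = 1.0
    return u

n = 3
U = rbs_full(n, 0, 1, 0.7) @ rbs_full(n, 1, 2, 1.1)
order = sorted(range(2 ** n), key=lambda idx: format(idx, f"0{n}b").count("1"))
P = np.eye(2 ** n)[order]
B = P @ U @ P.T                                     # basis re-ordered by Hamming weight
weights = [format(idx, f"0{n}b").count("1") for idx in order]
off_block = [B[r, c] for r in range(2 ** n) for c in range(2 ** n) if weights[r] != weights[c]]
print(np.allclose(off_block, 0))                    # True: U is block diagonal per HW subspace
```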

B Hamming Weight Preserving Quantum Ansatz
In this Section, we recall the RBS-based ansätze presented in [25], which can reach any unitary in the subspace of HW 1.

Figure 8: From left to right: Pyramid circuit, Butterfly Circuit, and X circuit. Vertical lines represent two-qubit RBS
gates, parametrized with independent angles.

Each n-qubit circuit presented in Figure 8 is able to achieve any unitary matrix of dimension n × n in the subspace of HW 1. Therefore, those circuits are good candidates to be applied to each register of HW 1 in the quantum convolutional layers presented in Section 2.1.
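A small NumPy sketch of this statement follows (the gate layout below is an arbitrary nearest-neighbour example, not the exact pyramid or butterfly placement of [25]): each RBS(θ) on a pair of qubits (i, j) restricts to a Givens rotation on the unary basis, and the whole circuit implements the product of these n × n rotations, hence an orthogonal matrix.

```python
import numpy as np

def rbs_in_unary_basis(n: int, i: int, j: int, theta: float) -> np.ndarray:
    """Restriction of RBS(theta) (Eq. (2)) to the unary states |e_i>, |e_j>, with i < j."""
    g = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    g[i, i], g[i, j] = c, -s
    g[j, i], g[j, j] = s, c
    return g

def circuit_unary_unitary(n: int, gates: list) -> np.ndarray:
    """gates: list of (i, j, theta) applied in order; returns the n x n matrix acting on HW 1 states."""
    u = np.eye(n)
    for i, j, theta in gates:
        u = rbs_in_unary_basis(n, i, j, theta) @ u
    return u

n = 4
rng = np.random.default_rng(0)
gates = [(i, i + 1, rng.uniform(0, 2 * np.pi)) for _ in range(3) for i in range(n - 1)]
u = circuit_unary_unitary(n, gates)
print(np.allclose(u @ u.T, np.eye(n)))   # True: the unary-subspace action is orthogonal
```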

