Synthesis of Quantum Logic Circuits
Abstract
The pressure of fundamental limits on classical computation and the promise of exponential speedups
from quantum effects have recently brought quantum circuits [10] to the attention of the Electronic De-
sign Automation community [18, 28, 7, 27, 17]. We discuss efficient quantum logic circuits which
perform two tasks: (i) implementing generic quantum computations and (ii) initializing quantum reg-
isters. In contrast to conventional computing, the latter task is nontrivial because the state-space of an
n-qubit register is not finite and contains exponential superpositions of classical bit strings. Our proposed
circuits are asymptotically optimal for respective tasks and improve earlier published results by at least
a factor of two.
The circuits for generic quantum computation constructed by our algorithms are the most efficient
known today in terms of the number of difficult gates (quantum controlled-NOTs). They are based
on an analogue of the Shannon decomposition of Boolean functions and a new circuit block, quantum
multiplexor, that generalizes several known constructions. A theoretical lower bound implies that our
circuits cannot be improved by more than a factor of two. We additionally show how to accommodate
the severe architectural limitation of using only nearest-neighbor gates that is representative of current
implementation technologies. This increases the number of gates by almost an order of magnitude, but
preserves the asymptotic optimality of gate counts.
1 Introduction
As the ever-shrinking transistor approaches atomic proportions, Moore’s law must confront the small-scale
granularity of the world: we cannot build wires thinner than atoms. Worse still, at atomic dimensions
we must contend with the laws of quantum mechanics. For example, suppose one bit is encoded as the
presence or the absence of an electron in a small region.1 Since we know very precisely where the elec-
tron is located, the Heisenberg uncertainty principle dictates that we cannot know its momentum with high
accuracy. Without a reasonable upper bound on the electron’s momentum, there is no alternative but to
use a large potential to keep it in place, and expend significant energy during logic switching. A quanti-
tative analysis of these phenomena leads experts from NCSU, SRC and Intel [36] to derive fundamental
limitations on the scalability of any computing device which moves electrons.
Yet these same quantum effects also facilitate a radically different form of computation [13]. Theo-
retically, quantum computers could outperform their classical counterparts when solving certain discrete
1 Most current computing technologies use electron charges to store information; exceptions include spintronics-based techniques,
problems [16]. For example, a successful large-scale implementation of Shor’s integer factorization [29]
would compromise the RSA cryptosystem used in electronic commerce. On the other hand, quantum ef-
fects may also be exploited for public-key cryptography [4]. Indeed, such cryptography systems, based
on single-photon communication, are commercially available from MagiQ Technologies in the U.S. and
IdQuantique in Europe.
Physically, a quantum bit might be stored in one of a variety of quantum-mechanical systems. A broad
survey of these implementation technologies, with feasibility estimates and forecasts, is available in the
form of the ARDA quantum computing roadmap [1]. Sample carriers of quantum information include
top-electrons in hyperfine energy levels of either trapped atoms or trapped ions, tunneling currents in
cold superconductors, nuclear spin polarizations in nuclear magnetic resonance, and polarization states of
single photons. A collection of n such systems would comprise an n-qubit register, and quantum logic gates
(controlled quantum processes) would then be applied to the register to perform a computation. In practice,
such gates might result from rotating the electron between hyperfine levels by shining a laser beam on the
trapped atom/ion, tuning the tunneling potential by changing voltages and/or current in a super-conducting
circuit, or perhaps passing multiple photons through very efficient nonlinear optical media.
The logical properties of qubits also differ significantly from those of classical bits. Bits and their
manipulation can be described using two constants (0 and 1) and the tools of boolean algebra. Qubits, on
the other hand, must be discussed in terms of vectors, matrices, and other linear algebraic constructions.
We will fully specify the formalism in Section 2, but give a rough idea of the similarities and differences
between classical and quantum information below.
These differences notwithstanding, quantum logic circuits, from a high level perspective, exhibit many
similarities with their classical counterparts. They consist of quantum gates, connected (though without
fanout or feedback) by quantum wires which carry quantum bits. Moreover, logic synthesis for quantum
circuits is as important as for the classical case. In current implementation technologies, gates that act
on three or more qubits are prohibitively difficult to implement directly. Thus, implementing a quantum
computation as a sequence of two-qubit gates is of crucial importance. Two-qubit gates may in turn be
decomposed into circuits containing one-qubit gates and a standard two-qubit gate, usually the quantum
controlled-not (CNOT). These decompositions are done by hand for published quantum algorithms (e.g.,
Shor’s factorization algorithm [29] or Grover’s quantum search [16]), but have long been known to be
possible for arbitrary quantum functions [12, 3]. While CNOTs are used in an overwhelming majority of
theoretical and practical work in quantum circuits, their implementations are orders of magnitude more
error-prone than implementations of single-qubit gates and have longer durations. Therefore, the cost of
a quantum circuit can be realistically calculated by counting CNOT gates. Moreover, it has been shown
previously that if CNOT is the only two-qubit gate type used, the number of such gates in a sufficiently
large irredundant circuit is lower-bounded by approximately 20% of the total gate count [27].
The first quantum logic synthesis algorithm to so decompose an arbitrary n-qubit gate would return a
circuit containing O(n^3 4^n) CNOT gates [3]. The work in [9] interprets this algorithm as the QR
decomposition, well-known in matrix algebra. Improvements on this method have used clever circuit
transformations and/or Gray codes [20, 2, 31] to lower this gate count. More recently, different techniques
[21] have led to circuits with CNOT-counts of 4^n − 2^{n+1}. The exponential gate count is not unexpected:
just as the exponential number of n-bit Boolean functions ensures that the circuits computing them are
generically large, so too in the quantum case. Indeed, it has been shown that n-qubit operators generically
require ⌈(1/4)(4^n − 3n − 1)⌉ CNOTs [27]. Similar exponential lower bounds existed earlier in other gate
libraries [20].
Existing algorithms for n-qubit circuit synthesis remain a factor of four away from lower bounds and
fare poorly for small n. These algorithms require at least 8 CNOT gates for n = 2, while three CNOT gates
are necessary and sufficient in the worst case [27, 34, 33]. Further, a simple procedure exists to produce
two-qubit circuits with minimal possible number of CNOT gates [25]. In contrast, in three qubits the lower
bound is 14 while the generic n-qubit decomposition of [21] achieves 48 CNOTs and a specialty 3-qubit
circuit of [32] achieves 40.
In this work, we focus on identifying useful quantum circuit blocks. To this end, we analyze quantum
conditionals and define quantum multiplexors that generalize CNOT, Toffoli and Fredkin gates. Such
quantum multiplexors implement if-then-else conditionals when the controlling predicate evaluates to a
coherent superposition of |0⟩ and |1⟩. We find that quantum multiplexors prove amenable to recursive
decomposition and vastly simplify the discussion of many results in quantum logic synthesis (cf. [8, 31,
21]). Ultimately, our analysis leads to a quantum analogue of the Shannon decomposition, which we apply
to the problem of quantum logic synthesis.
We contribute the following key results.
• An arbitrary n-qubit quantum state can be prepared by a circuit containing no more than 2^{n+1} − 2n
CNOT gates. This lies a factor of four away from the theoretical lower bound.
• An arbitrary n-qubit operator can be implemented in a circuit containing no more than (23/48) ×
4^n − (3/2) × 2^n + 4/3 CNOT gates. This improves upon the best previously published work by a
factor of two and lies less than a factor of two away from the theoretical lower bound.
• In the special case of three qubits, our technique yields a circuit with 20 CNOT gates, whereas the
best previously known result was 40.
2 Background and Notation
The notion of a qubit formalizes the logical properties of an ideal quantum-mechanical system with two
basis states. The two states are labeled |0⟩ and |1⟩. They can be distinguished by quantum measurement of
the qubit, which yields a single classical bit of information, specifying which state the qubit was observed
in. However, the state of an isolated (in particular, unobserved) qubit must be modeled by a vector in a
two-dimensional complex vector space H_1 (see footnote 2) which is spanned by the basis states.
\[ H_1 = \mathrm{span}_{\mathbb{C}} \{ |0\rangle, |1\rangle \} \tag{1} \]
We identify |0⟩ and |1⟩ with the following column vectors.
\[ |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \tag{2} \]
Thus, an arbitrary state |φ⟩ ∈ H_1 can be written in either of the two equivalent forms given below.
\[ |\varphi\rangle = \alpha_0 |0\rangle + \alpha_1 |1\rangle = \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} \tag{3} \]
The entries of the state vector determine the readout probabilities: if we measure a qubit whose state is
described by |φ⟩, we should expect to see |0⟩ with probability |α_0|^2 and |1⟩ with probability |α_1|^2. Since
these are the only two possibilities, α_0 and α_1 are required to satisfy |α_0|^2 + |α_1|^2 = 1.
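As a small illustration of the readout rule, the sketch below computes the two measurement probabilities from a pair of amplitudes and rejects unnormalized input. The function name `measurement_probabilities` and the tolerance are hypothetical choices made for this example only.

```python
import cmath

def measurement_probabilities(alpha0, alpha1, tol=1e-9):
    """Return (P(0), P(1)) for the state alpha0|0> + alpha1|1>."""
    p0, p1 = abs(alpha0) ** 2, abs(alpha1) ** 2
    # The two outcomes are the only possibilities, so they must sum to one.
    if abs(p0 + p1 - 1.0) > tol:
        raise ValueError("state is not normalized")
    return p0, p1

# Equal-magnitude superposition with a relative phase of pi/2:
# the phase changes the state but not the readout probabilities.
amp = 1 / 2 ** 0.5
p0, p1 = measurement_probabilities(amp, amp * cmath.exp(1j * cmath.pi / 2))
```

The relative phase is invisible to this single measurement, which is one reason complex amplitudes carry more information than the two probabilities alone.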
To describe the state of N, we must somehow obtain from |ψ_L⟩ and |ψ_M⟩ a state vector |ψ_N⟩ ∈ H_n.
Quantum mechanics demands that we use a natural generalization of bitstring concatenation called the tensor
product. To compute the tensor product of two states, we write |ψ_N⟩ = |ψ_L⟩ |ψ_M⟩, and expand it using the
distributive law.
\[ |\psi_L\rangle |\psi_M\rangle = \sum_{b \in B^\ell,\; b' \in B^m} \beta_b \gamma_{b'} \, |b\rangle |b'\rangle \tag{6} \]
2 Complex rather than real coefficients are required in most applications. For example, in certain optical implementations [22,
§7.4.2] real and imaginary parts encode both the presence and phase of a photon.
Let · denote concatenation; then |b⟩ |b′⟩ and |b · b′⟩ represent the same bitstring state. As b · b′ ∈ B^n,
we have |ψ_L⟩ |ψ_M⟩ ∈ H_n, as desired.
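The concatenation rule can be checked numerically. The sketch below assumes NumPy and represents states as flat arrays of amplitudes indexed by the bitstring read as a binary number; `basis_state` is a hypothetical helper, and `np.kron` plays the role of the tensor product.

```python
import numpy as np

def basis_state(bits):
    """Amplitude vector of the bitstring state |bits>, e.g. '10' -> |10>."""
    vec = np.zeros(2 ** len(bits), dtype=complex)
    vec[int(bits, 2)] = 1.0
    return vec

# Tensoring bitstring states concatenates the bitstrings: |10>|1> = |101>.
lhs = np.kron(basis_state("10"), basis_state("1"))
rhs = basis_state("101")
concat_ok = np.array_equal(lhs, rhs)

# Tensoring superpositions distributes over the sums, as in Equation 6.
psi_L = np.array([0.6, 0.8j])                 # a one-qubit state
psi_M = np.array([1, 0, 0, 1]) / np.sqrt(2)   # a two-qubit state
psi_N = np.kron(psi_L, psi_M)                 # amplitudes indexed by b.b'
```

Here `psi_N` has eight entries, and the entry at index `0b111` is the product of the `|1>` amplitude of `psi_L` and the `|11>` amplitude of `psi_M`.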
Perhaps counter-intuitively, the quantum-mechanical state of N cannot in general be specified only in
terms of the states of L and M. Indeed, H_k is a 2^k dimensional vector space, and for n ≫ 2 we observe 2^n ≫
2^m + 2^ℓ. For example, three independent qubits can be described by three two-dimensional vectors, while
a generic state-vector of a three-qubit system is eight-dimensional. Much interest in quantum computing
is driven by this exponential scaling of the state space, and the loss of independence between different
subsystems is called quantum entanglement.
The nomenclature R_x, R_y, R_z is motivated by a picture of one-qubit states as points on the surface of a
sphere of unit radius in R^3. This picture is called the Bloch sphere [22], and may be obtained by expanding
an arbitrary two-dimensional complex vector as below.
\[ |\psi\rangle = \alpha_0 |0\rangle + \alpha_1 |1\rangle = r e^{it/2} \left( e^{-i\varphi/2} \cos\frac{\theta}{2} \, |0\rangle + e^{i\varphi/2} \sin\frac{\theta}{2} \, |1\rangle \right) \tag{8} \]
The constant factor r e^{it/2} is physically undetectable. Ignoring it, we are left with two angular parameters
θ and ϕ, which we interpret as spherical coordinates (1, θ, ϕ). In this picture, |0⟩ and |1⟩ correspond to the
north and south poles, (1, 0, 0) and (1, π, 0), respectively. The R_x(θ) gate (resp. R_y(θ), R_z(θ)) corresponds
to a counterclockwise rotation by θ around the x (resp. y, z) axis. Finally, just as the point given by the
spherical coordinates (1, θ, ϕ) can be moved to the north pole by first rotating −ϕ degrees around the z-axis,
then −θ degrees around the y axis, so too the following matrix equations hold.
\[ (V \otimes W) |\psi\rangle = \sum_{b \in B^\ell,\; b' \in B^m} \alpha_{b \cdot b'} \, V|b\rangle \, W|b'\rangle \tag{11} \]
Here, V|b⟩ ∈ H_ℓ and W|b′⟩ ∈ H_m are to be concatenated, or tensored, as per Equation 6. It can be deduced
from Equation 11 that the 2^n × 2^n matrix of V ⊗ W is given by
[Figure 1 diagram: a three-qubit gate U drawn as equivalent to a circuit built from one-qubit gates U1 through U9 and two-qubit gates V1, V2, V3.]
Figure 1: A typical quantum logic circuit. Information flows from left to right, and the higher wires
represent higher order qubits. The quantum operation performed by this circuit is (U7 ⊗ U8 ⊗ U9 )(I2 ⊗
V3 )(V2 ⊗ I2)(U4 ⊗ U5 ⊗ U6 )(I2 ⊗ V1 )(U1 ⊗ U2 ⊗ U3), and the last factor is outlined above. Note that when
the matrix A · B is applied to vector v, this is equivalent to applying the matrix B first, followed by the
matrix A. Therefore, the formulas describing quantum circuits must be read right to left.
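The right-to-left reading rule of the Figure 1 caption can be illustrated numerically. The sketch below assumes NumPy, draws random unitaries for the gates (the helper `random_unitary` and the seed are choices made for this example), and compares the matrix product from the caption with applying the layers one at a time in diagram order.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)

def random_unitary(dim):
    """A random dim x dim unitary, taken as the Q factor of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q

def kron_all(*ops):
    """Tensor product of several operators, most significant qubit first."""
    return reduce(np.kron, ops)

I2 = np.eye(2)
U = [random_unitary(2) for _ in range(9)]   # U[0] is U1, ..., U[8] is U9
V = [random_unitary(4) for _ in range(3)]   # V[0] is V1, ..., V[2] is V3

# Layers in the order they act on the state (left to right in the diagram).
layers = [
    kron_all(U[0], U[1], U[2]),
    np.kron(I2, V[0]),
    kron_all(U[3], U[4], U[5]),
    np.kron(V[1], I2),
    np.kron(I2, V[2]),
    kron_all(U[6], U[7], U[8]),
]

# The circuit matrix lists the same layers from right to left.
circuit = reduce(lambda acc, layer: layer @ acc, layers, np.eye(8))

state = np.zeros(8, dtype=complex)
state[0] = 1.0
stepwise = state
for layer in layers:
    stepwise = layer @ stepwise
```

Both routes give the same output vector, and the product of unitary layers is again unitary.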
gates are fully specified can be checked by multiplying matrices. However, in addition to fully specified
gates, our circuit diagrams will contain the following generic, or under-specified gates:
Notation. An equivalence of circuits containing generic gates will mean that for any specification (i.e.,
parameter values) of the gates on one side, there exists a specification of the gates on the other such that
the circuits compute the same operator. Generic gates used in this paper are limited to the following:
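[Circuit diagrams not recovered. Surviving labels: an equivalence ≅ whose right-hand side is the gate sequence R_z, R_y, R_z, and a one-qubit circuit containing R_z followed by R_y.]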
We shall use a backslash to denote that a given wire may carry an arbitrary number of qubits (quantum
bus). In the sequel, we seek backslashed analogues of Theorems 1 and 2.
3 Quantum Conditionals and the Quantum Multiplexor
Classical conditionals can be described by the if-then-else construction: if the predicate is true,
perform the action specified in the then clause, if it is false, perform the action specified in the else
clause. At the gate level, such an operation might be performed by first processing the two clauses in
parallel, then multiplexing the output. To form the quantum analogue, we replace the predicate by a qubit,
replace true and false by |1⟩ and |0⟩, and demand that the actions corresponding to clauses be unitary.
The resulting “quantum conditional” operator U will then be unitary. In particular, when selecting based
on a coherent superposition α_0|0⟩ + α_1|1⟩, it will generate a linear combination of the then and else
outcomes. Below, we shall use the term quantum multiplexor to refer to the circuit block implementing a
quantum conditional.
Notation. We shall say that a gate U is a quantum multiplexor with select qubits S if it preserves any
bitstring state |b⟩ carried by S. In this case, we denote U in quantum logic circuit diagrams by “ ” on each
select qubit, connected by a vertical line to a gate on the remaining data (read-write) qubits.
In the event that a multiplexor has a single select bit, and the select bit is most significant, the matrix of
the quantum multiplexor is block diagonal.
\[ U = \begin{pmatrix} U_0 & \\ & U_1 \end{pmatrix} \tag{13} \]
The multiplexor will apply U_0 or U_1 to the data qubits according as the select qubit carries |0⟩ or |1⟩. To
express such a block diagonal decomposition, we shall use the notation U = U_0 ⊕ U_1 that is standard in
linear algebra. More generally, let V be a multiplexor with s select qubits and a d-qubit wide data bus. If
the select bits are most significant, the matrix of V will be block diagonal, with 2^s blocks of size 2^d × 2^d.
The j-th block V_j is the operator applied to the data bits when the select bits carry |j⟩.
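A minimal sketch of this block-diagonal picture, assuming NumPy: the hypothetical helper `multiplexor` stacks the blocks V_0, V_1, ... along the diagonal, and the example checks that a select qubit in superposition yields the corresponding linear combination of outcomes. The choice of X and H as the two blocks is arbitrary.

```python
import numpy as np

def multiplexor(blocks):
    """Block-diagonal matrix V_0 (+) V_1 (+) ... with the select bits most significant."""
    d = blocks[0].shape[0]
    out = np.zeros((d * len(blocks), d * len(blocks)), dtype=complex)
    for j, block in enumerate(blocks):
        out[j * d:(j + 1) * d, j * d:(j + 1) * d] = block
    return out

X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

# Apply X to the data qubit if the select qubit is |0>, H if it is |1>.
U = multiplexor([X, H])

# Select qubit in the superposition a0|0> + a1|1>, data qubit in |0>.
a0, a1 = 0.6, 0.8
data = np.array([1, 0], dtype=complex)
state_out = U @ np.kron(np.array([a0, a1]), data)

# By linearity the output is a0 |0>(X data) + a1 |1>(H data).
expected = a0 * np.kron([1, 0], X @ data) + a1 * np.kron([0, 1], H @ data)
```

The select qubit's amplitudes are unchanged in magnitude on each branch, which reflects the defining property that a multiplexor preserves bitstring states on its select qubits.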
In general, a gate depicted as a quantum multiplexor need not read or modify as many qubits as in-
dicated on a diagram. For example, a multiplexor which performs the same operation on the data bits
regardless of what the select bits carry can be implemented as an operation on the data bits alone. We give
a less trivial example below: a multiplexor which applies a different scalar multiplication for each value of
the select bits can be implemented as a diagonal operator applied to the select bits.
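[Circuit equivalence: a multiplexed scalar on the least significant qubit, selected by a quantum bus ≅ a diagonal gate ∆ applied to the select bus alone. Diagram not recovered.]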
Indeed, both circuits represent diagonal matrices in which each diagonal entry is repeated (at least)
twice. In the former case, the repetition is due to a multiplexed scalar acting on the least significant qubit,
and in the latter there is no attempt to modify the least significant qubit.
We now clarify the meaning of multiplexed generic gates in circuit diagrams, like that in the above
circuit equivalence.
Notation. Let G be a generic gate. A specification U of a multiplexed-G gate can be any quantum
multiplexor which effects a potentially different specification of G on the data qubits for each bitstring
appearing on the select qubits. Of course, select qubits may carry a superposition of several bitstring
states, in which case the behavior of the multiplexed gate is defined by linearity.
3.1 Quantum Multiplexors on Two Qubits
Perhaps the simplest quantum multiplexor is the Controlled-NOT (CNOT) gate.
\[ \mathrm{CNOT} = I \oplus \sigma_x = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \tag{14} \]
On bitstring states, the CNOT flips the second (data) bit if the first (select) bit is |1⟩, hence the name
Controlled-NOT. The CNOT is so common in quantum circuits that it has its own notation: a “•” on the
select qubit connected by a vertical line to an “⊕” on the data qubit. This notation is motivated by the
characterization of the CNOT by the formula |b_1⟩ |b_2⟩ ↦ |b_1⟩ |b_1 XOR b_2⟩. Several CNOTs are depicted
in Figure 3.
The CNOT, together with the one-qubit gates defined in §2, forms a universal gate library for quantum
circuits (see footnote 4). In particular, we can use it as a building block to help construct more complicated
multiplexors. For example, we can implement the multiplexor R_z(θ_0) ⊕ R_z(θ_1) by the following circuit.
[Circuit: two CNOTs controlled by the select qubit, interleaved with the gates R_z((θ_0 + θ_1)/2) and R_z((θ_0 − θ_1)/2) on the data qubit.]
In fact, the exact same statement holds if we replace Rz by Ry (this can be verified by multiplying four
matrices). We summarize the result with a circuit equivalence.
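Theorem 4 : Demultiplexing a singly-multiplexed Ry or Rz .
[Circuit equivalence: a singly-multiplexed R_k gate ≅ two CNOTs controlled by the select qubit, interleaved with two R_k gates on the data qubit, for k = y, z. Diagram not recovered.]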
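The "multiplying four matrices" check can be carried out numerically. The sketch below assumes NumPy and the conventions R_z(t) = diag(e^{−it/2}, e^{it/2}) and R_y(t) = [[cos t/2, −sin t/2], [sin t/2, cos t/2]]; these conventions, the gate ordering inside `demultiplexed`, and the sample angles are assumptions made for this example. The identity relies only on X R_k(t) X = R_k(−t), which holds for both k = y and k = z.

```python
import numpy as np

def rz(t):
    return np.array([[np.exp(-1j * t / 2), 0], [0, np.exp(1j * t / 2)]])

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

I2 = np.eye(2)
ZERO = np.zeros((2, 2))
X = np.array([[0, 1], [1, 0]], dtype=complex)
CNOT = np.block([[I2, ZERO], [ZERO, X]])   # select qubit is most significant

def multiplexed(rot, theta0, theta1):
    """The multiplexor rot(theta0) (+) rot(theta1)."""
    return np.block([[rot(theta0), ZERO], [ZERO, rot(theta1)]])

def demultiplexed(rot, theta0, theta1):
    """Two CNOTs and two rotations on the data qubit (rightmost factor acts first)."""
    first = np.kron(I2, rot((theta0 + theta1) / 2))
    second = np.kron(I2, rot((theta0 - theta1) / 2))
    return CNOT @ second @ CNOT @ first

theta0, theta1 = 0.7, -1.9
rz_ok = np.allclose(demultiplexed(rz, theta0, theta1), multiplexed(rz, theta0, theta1))
ry_ok = np.allclose(demultiplexed(ry, theta0, theta1), multiplexed(ry, theta0, theta1))
```

When the select qubit is |0⟩ the two rotations add to θ_0; when it is |1⟩ the CNOTs conjugate the second rotation by X, flipping its sign, so the angles combine to θ_1.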
A similar decomposition exists for any U ⊕ V where U,V are one-qubit gates. The idea is to first
unconditionally apply V on the less significant qubit, and then apply A = UV † , conditioned on the more
significant qubit. Decompositions for such controlled-A operators are well known [3, 9]. Indeed, if we
write A = e^{it} R_z(α) R_y(β) R_z(γ) by Theorem 1, then U ⊕ V is implemented by the following circuit.
[Circuit diagram not recovered. Surviving labels: two controls (•), the phase e^{it/2}, and the gate R_z(t).]
Since V is a generic unitary, it can absorb adjacent one-qubit boxes, simplifying the circuit. We re-
express the result as a circuit equivalence.
Theorem 5 : Decompositions of a two-qubit multiplexor [3]
[Two circuit equivalences; diagrams not recovered. Surviving labels: controls (•), R_z and R_y gates, and a diagonal gate ∆.]
Proof. The first equivalence is just a re-statement of what we have already seen; the second follows from
it by applying a CNOT on the right to both sides and extracting a diagonal operator.
4 This was first shown in [12]. The results in the present work also constitute a complete proof.
3.2 The Multiplexor Extension Property
The theory of n-qubit quantum multiplexors begins with the observation that whole circuits and even circuit
equivalences can be multiplexed. This observation has non-quantum origins and can be exemplified by
comparing two expressions involving conditionals in terms of a classical bit s.
• if (s) A_0 · B_0 else A_1 · B_1
• A_s · B_s. Here A_s means if (s) A_0 else A_1, with the syntax and semantics of (s?A_0:A_1) in
the C programming language.
Indeed, one can either make a whole expression conditional on s or make each term conditional on s —
the two behaviors will be identical. Similarly, one can multiplex a whole equation (with two different
instantiations of every term) or multiplex each of its terms. The same applies to quantum multiplexing by
linearity.
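In matrix terms, the observation above says that multiplexing a product equals the product of the multiplexed factors: (A_0 B_0) ⊕ (A_1 B_1) = (A_0 ⊕ A_1)(B_0 ⊕ B_1). A minimal numerical sketch, assuming NumPy and arbitrary random 2 × 2 matrices (`direct_sum` is a hypothetical helper):

```python
import numpy as np

def direct_sum(a, b):
    """Block-diagonal matrix a (+) b."""
    zero_ab = np.zeros((a.shape[0], b.shape[1]))
    zero_ba = np.zeros((b.shape[0], a.shape[1]))
    return np.block([[a, zero_ab], [zero_ba, b]])

rng = np.random.default_rng(1)
A0, A1, B0, B1 = (
    rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)) for _ in range(4)
)

# Multiplex the whole product ...
whole = direct_sum(A0 @ B0, A1 @ B1)
# ... or multiplex each factor and then multiply.
termwise = direct_sum(A0, A1) @ direct_sum(B0, B1)
mep_ok = np.allclose(whole, termwise)
```

The identity does not require the factors to be unitary; it follows from block-diagonal matrices multiplying block by block.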
Consider the special case of quantum multiplexors with a single data bit, but arbitrarily many select
bits. We seek to implement such multiplexors via CNOTs and one-qubit gates, beginning with the following
decomposition.
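[Circuit equivalences; diagrams not recovered. Surviving labels: multiplexed R_z, R_y, R_z gates on the data qubit together with a diagonal gate ∆ on the select bus; and a chain of equivalences relating a diagonal gate ∆ to multiplexed R_z gates and a diagonal gate ∆ on the remaining qubits.]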
Proof. The first equivalence asserts that any diagonal gate can be expressed as a multiplexor of diagonal
gates. This is true because diagonal gates possess the block-diagonal structure characteristic of multiplex-
ors, with each block being diagonal. The second equivalence amounts to the MEP applied to the obvious
fact that a one-qubit gate given by a diagonal matrix is a scalar multiple of an Rz gate. The third follows
from Theorem 3.
It remains to decompose the other gates appearing on the right in the circuit diagram of Theorem 6.
We shall call these gates multiplexed Rz (or Ry ) gates,5 as, e.g., the rightmost would apply a different Rz
gate to the data qubit for each classical configuration of the select bits. While efficient implementations are
known [8, 21], the usual derivations involve large matrices and Gray codes.
5 Other authors have used the term uniformly-controlled rotations to describe these gates [21].
[Figure 2 diagram: a multiplexed R_z gate expanded recursively into CNOTs and R_z gates; diagram not recovered.]
Figure 2: The recursive decomposition of a multiplexed Rz gate. The boxed CNOT gates may be canceled.
Proof. We show how to produce |0⟩ on the least significant bit; the case of |1⟩ is similar. Let |ψ⟩ be
an arbitrary (n + 1)-qubit state. Divide the 2^{n+1}-element vector |ψ⟩ into 2^n contiguous 2-element blocks.
Each is to be interpreted as a two-dimensional complex vector, and the c-th is to be labeled |ψ_c⟩. We now
determine r_c, t_c, ϕ_c, θ_c as in Equation 9.
\[ R_y(-\theta_c) R_z(-\varphi_c) |\psi_c\rangle = r_c e^{i t_c} |0\rangle \tag{15} \]
Let |ψ′⟩ be the n-qubit state given by the 2^n-element vector with c-th entry r_c e^{i t_c}, and let U be the block
diagonal sum ⊕_c R_y(−θ_c) R_z(−ϕ_c). Then U|ψ⟩ = |ψ′⟩ |0⟩, and U may be implemented by a multiplexed R_z
gate followed by a multiplexed R_y.
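The construction in the proof can be sketched numerically. The code below assumes NumPy and the conventions R_z(t) = diag(e^{−it/2}, e^{it/2}) and R_y(t) = [[cos t/2, −sin t/2], [sin t/2, cos t/2]], which may differ in sign from the definitions used elsewhere in the paper; `disentangle_last_qubit` is a hypothetical name. It builds the block-diagonal U explicitly as a matrix and does not produce a gate-level circuit.

```python
import numpy as np

def rz(t):
    """R_z(t) = diag(exp(-it/2), exp(it/2)) (assumed convention)."""
    return np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])

def ry(t):
    """R_y(t) = [[cos t/2, -sin t/2], [sin t/2, cos t/2]] (assumed convention)."""
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def disentangle_last_qubit(psi):
    """Return (U, psi_prime, residual) where U @ psi interleaves psi_prime and residual."""
    dim = psi.size
    U = np.zeros((dim, dim), dtype=complex)
    for c in range(dim // 2):
        a, b = psi[2 * c], psi[2 * c + 1]
        theta = 2 * np.arctan2(abs(b), abs(a))   # polar angle of block c
        phi = np.angle(b) - np.angle(a)          # relative phase of block c
        # Block c of the direct sum: first R_z(-phi), then R_y(-theta).
        U[2 * c:2 * c + 2, 2 * c:2 * c + 2] = ry(-theta) @ rz(-phi)
    out = U @ psi
    return U, out[0::2], out[1::2]

rng = np.random.default_rng(2)
psi = rng.normal(size=8) + 1j * rng.normal(size=8)
psi /= np.linalg.norm(psi)                        # a random three-qubit state
U, psi_prime, residual = disentangle_last_qubit(psi)
```

After the call, `residual` (the amplitudes with least significant bit 1) vanishes up to rounding, and `U @ psi` equals `np.kron(psi_prime, [1, 0])`, that is, |ψ′⟩|0⟩.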
We may apply Theorem 8 to implement the (n + 1)-bit circuit given above with 2^{n+1} CNOT gates. A
slight optimization is possible given that the gates on the right-hand side in Theorem 8 can be optionally
reversed, as explained above. Indeed, if we reverse the decomposition of the multiplexed R_y gate, its first
gate (CNOT) will cancel with the last gate (CNOT) from the decomposed multiplexed R_z gate. Thus, only
2^{n+1} − 2 CNOT gates are needed.
Applying Theorem 9 recursively can reduce a given n-qubit quantum state |ψ⟩ to a scalar multiple of a
desired bitstring state |b⟩; the resulting circuit C uses 2^{n+1} − 2n CNOT gates. To go from |b⟩ to |ψ⟩, apply
the gates of C in reverse order and inverted. We shall call this the inverse circuit, C†.
The state preparation technique can be used to decompose an arbitrary unitary operator U. The idea is
to construct a circuit for U † by iteratively applying state preparation. Indeed, an operator is entirely deter-
mined by its behavior on basis vectors. To this end, each iteration needs to implement the correct behavior
on a new basis vector while preserving the behavior on previously processed basis vectors. This idea has
been tried before [20, 31], but with methods less efficient than Theorem 9. We outline the procedure below.
• At step 0, apply Theorem 9 to find a circuit C_0 that maps U|0⟩ to a scalar multiple of |0⟩. Let
U_1 = C_0 U.
• At step j, apply Theorem 9 to find a circuit C_j that maps U_j|j⟩ to a scalar multiple of |j⟩. Importantly,
the construction of C_j and the previous steps of the algorithm ensure C_j|i⟩ = |i⟩ for all i < j. Define
U_{j+1} = C_j U_j.
• U_{2^n − 1} will be diagonal, and may be implemented by a circuit D via Theorem 7.
most laptops this numerical computation scales to ten-qubit quantum operators, i.e., 1024 × 1024 matrices.
significant bit for each classical configuration of the low order bits. Thus the CSD can be restated as the
following equivalence of generic circuits.
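[Circuit equivalence: a generic gate on a quantum bus ≅ a multiplexed R_y gate on the most significant qubit, flanked by two side factors. Diagram not recovered.]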
It has been observed that this theorem may be recursively applied to the side factors on the right-hand
side [30]. Indeed, this can be achieved by adding more qubits via the MEP, as shown below.
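[Circuit equivalence obtained by adding select qubits via the MEP; diagram not recovered. Surviving label: a multiplexed R_y gate.]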
We may now outline the best previously published generic quantum logic synthesis algorithm [21].
Iterated application of Theorem 11 to the decomposition of Theorem 10 gives a decomposition of an
arbitrary unitary operator into single-data-bit QMUX gates, some of which are already multiplexed Ry
gates. Those which are not can be decomposed into multiplexed rotations by Theorem 6, and then all the
multiplexed rotations can be decomposed into elementary gates by Theorem 8.
One weakness of this algorithm is that it cannot readily take advantage of hand-optimized generic
circuits on low numbers of qubits [34, 33, 27, 25]. This is because it does not recurse on generic operators,
but rather on multiplexors.
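[Circuit equivalence: a multiplexor with a single select qubit ≅ a generic gate on the data bus, a multiplexed R_z gate acting on the select qubit, and a second generic gate on the data bus. Diagram not recovered.]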
Proof. Let U = U_1 ⊕ U_2 be the multiplexor of choice; we formulate and solve an equation for the unitaries
required to implement U in the manner indicated above. We want unitary V, W and unitary diagonal D
satisfying U = (I ⊗ V)(D ⊕ D†)(I ⊗ W). In other words,
\[ \begin{pmatrix} U_1 & \\ & U_2 \end{pmatrix} = \begin{pmatrix} V & \\ & V \end{pmatrix} \begin{pmatrix} D & \\ & D^\dagger \end{pmatrix} \begin{pmatrix} W & \\ & W \end{pmatrix} \tag{16} \]
That is, U_1 = V D W and U_2 = V D† W. Multiplying the expression for U_1 by the adjoint of the expression
for U_2, we cancel out the W-related terms and obtain U_1 U_2† = V D^2 V†.
Using this equation, one can recover D and V from U_1 U_2† by a standard computational primitive called
diagonalization. Further, W = D V† U_2. It remains only to remark that for D diagonal, the matrix D ⊕ D† is
in fact a multiplexed Rz gate acting on the most significant bit in the circuit.
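The steps of this proof can be followed numerically. The sketch below assumes NumPy and uses `np.linalg.eig` for the diagonalization; it assumes the eigenvalues of U_1 U_2† are distinct, so that the computed eigenvectors are orthonormal up to rounding (a Schur decomposition would be a more robust choice in general). The helper names, the seed, and the 4 × 4 size are choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(7)

def random_unitary(dim):
    """A random dim x dim unitary, taken as the Q factor of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q

def demultiplex(U1, U2):
    """Find V, D, W with U1 = V D W and U2 = V D^dagger W (D diagonal and unitary)."""
    # Diagonalize U1 U2^dagger = V D^2 V^dagger.
    eigvals, V = np.linalg.eig(U1 @ U2.conj().T)
    # Any square root of the (renormalized) eigenvalues serves as D.
    D = np.diag(np.sqrt(eigvals / np.abs(eigvals)))
    W = D @ V.conj().T @ U2
    return V, D, W

def direct_sum(a, b):
    zero = np.zeros_like(a)
    return np.block([[a, zero], [zero, b]])

U1, U2 = random_unitary(4), random_unitary(4)
V, D, W = demultiplex(U1, U2)

I2 = np.eye(2)
lhs = direct_sum(U1, U2)
rhs = np.kron(I2, V) @ direct_sum(D, D.conj().T) @ np.kron(I2, W)
```

Here `lhs` and `rhs` agree up to rounding, and the middle factor `direct_sum(D, D.conj().T)` is diagonal with conjugate phases on the two select branches, as a multiplexed R_z gate on the select qubit requires.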
Number of qubits and gate counts
Synthesis Algorithm         | 1 | 2 |  3 |   4 |    5 |     6 |      7 | n
Original QR decomp. [3, 9]  | - | - |  - |   - |    - |     - |      - | O(n^3 4^n)
Improved QR decomp. [20]    | - | - |  - |   - |    - |     - |      - | O(n 4^n)
Palindrome transform [2]    | - | - |  - |   - |    - |     - |      - | O(n 4^n)
QR [31, Table I]            | 0 | 4 | 64 | 536 | 4156 | 22618 | 108760 | O(4^n)
QR (Theorem 9)              | 0 | 8 | 62 | 344 | 1642 |  7244 |  30606 | 2 × 4^n − (2n + 3) × 2^n + 2n
CSD [21, p. 4]              | 0 | 8 | 48 | 224 |  960 |  3968 |  16128 | 4^n − 2 × 2^n
QSD (l = 1)                 | 0 | 6 | 36 | 168 |  720 |  2976 |  12096 | (3/4) × 4^n − (3/2) × 2^n
QSD (l = 2)                 | 0 | 3 | 24 | 120 |  528 |  2208 |   9024 | (9/16) × 4^n − (3/2) × 2^n
QSD (l = 2, optimized)      | 0 | 3 | 20 | 100 |  444 |  1868 |   7660 | (23/48) × 4^n − (3/2) × 2^n + 4/3
Lower bounds [27]           | 0 | 3 | 14 |  61 |  252 |  1020 |   4091 | ⌈(1/4)(4^n − 3n − 1)⌉
Table 1: A comparison of CNOT counts for unitary circuits generated by several algorithms (best results
are in bold). We have labeled the algorithms by the matrix decomposition they implement. The results
of this paper are boldfaced, including an optimized QR decomposition and three algorithms based on the
Quantum Shannon Decomposition (QSD). Other rows represent previously published algorithms. Gate
counts are not given for algorithms whose performance is not (generically) asymptotically optimal.
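The closed-form entries in the last column of Table 1 can be evaluated directly to reproduce the tabulated counts. The sketch below uses only the Python standard library, with exact rational arithmetic so that the fractional coefficients introduce no rounding; the dictionary keys and the helper `cnot_counts` are naming choices made for this example. The formulas are evaluated for n = 2 through 7; the n = 1 column is identically 0.

```python
from fractions import Fraction
from math import ceil

# Closed forms from the last column of Table 1, as functions of the qubit count n.
FORMULAS = {
    "QR (Theorem 9)": lambda n: 2 * 4**n - (2 * n + 3) * 2**n + 2 * n,
    "CSD [21]": lambda n: 4**n - 2 * 2**n,
    "QSD (l = 1)": lambda n: Fraction(3, 4) * 4**n - Fraction(3, 2) * 2**n,
    "QSD (l = 2)": lambda n: Fraction(9, 16) * 4**n - Fraction(3, 2) * 2**n,
    "QSD (l = 2, optimized)": lambda n: (
        Fraction(23, 48) * 4**n - Fraction(3, 2) * 2**n + Fraction(4, 3)
    ),
    "Lower bound [27]": lambda n: ceil(Fraction(4**n - 3 * n - 1, 4)),
}

def cnot_counts(name, qubits=range(2, 8)):
    """CNOT counts for the named row of Table 1, for n = 2, ..., 7 by default."""
    return [int(FORMULAS[name](n)) for n in qubits]

for name in FORMULAS:
    print(f"{name:24s}", cnot_counts(name))
```

For instance, `cnot_counts("QSD (l = 2, optimized)")` evaluates to the row 3, 20, 100, 444, 1868, 7660, and dividing it entrywise by the lower-bound row shows the ratio staying below two.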
Using the new decomposition, we now demultiplex the two side multiplexors in the Cosine-Sine De-
composition (Theorem 10). This leads to the following decomposition of generic operators that can be
applied recursively.
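[Circuit equivalence: a generic n-qubit gate ≅ four generic (n − 1)-qubit gates on the lower qubits, interleaved with multiplexed R_z, R_y, R_z gates on the most significant qubit. Diagram not recovered.]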
Hence an arbitrary n-qubit operator can be implemented by a circuit containing three multiplexed rota-
tions and four generic (n − 1)-qubit operators, which can be viewed as cofactors of the original operator.
One can now apply the decomposition of Theorem 13 recursively, which corresponds to iterating the above
inequality. If ℓ-qubit operators may be implemented using ≤ c_ℓ CNOT gates, one can prove the following
inequality for c_n by induction.
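A sketch of the counting argument, under the assumption that each multiplexed rotation on j qubits costs 2^{j−1} CNOT gates (an assumption chosen because it reproduces the QSD rows of Table 1): one level of the recursion then gives c_j ≤ 4 c_{j−1} + 3 × 2^{j−1}, and iterating from j = ℓ + 1 up to n gives c_n ≤ 4^{n−ℓ} (c_ℓ + 3 × 2^{ℓ−1}) − 3 × 2^{n−1}. The function names below are hypothetical.

```python
def qsd_cnot_count(n, l, c_l):
    """Iterate c_j = 4 * c_(j-1) + 3 * 2**(j-1) from j = l + 1 up to j = n."""
    count = c_l
    for j in range(l + 1, n + 1):
        # Four generic (j-1)-qubit operators plus three multiplexed rotations.
        count = 4 * count + 3 * 2 ** (j - 1)
    return count

def closed_form(n, l, c_l):
    """Closed form obtained from the recursion by induction on n."""
    return 4 ** (n - l) * (c_l + 3 * 2 ** (l - 1)) - 3 * 2 ** (n - 1)

# Recursion bottoming out at one-qubit operators (l = 1, c_l = 0)
# and at two-qubit operators (l = 2, c_l = 3).
counts_l1 = [qsd_cnot_count(n, 1, 0) for n in range(1, 8)]
counts_l2 = [qsd_cnot_count(n, 2, 3) for n in range(2, 8)]
```

With ℓ = 1 the closed form simplifies to (3/4) × 4^n − (3/2) × 2^n, and with ℓ = 2 and c_ℓ = 3 to (9/16) × 4^n − (3/2) × 2^n.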
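[Figure 3 diagram: a long-range CNOT drawn as equivalent to a sequence of nearest-neighbor CNOTs; diagram and caption not recovered.]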
We have recorded in Table 1 the formula for c_n when the recursion bottoms out at one-qubit operators
(l = 1 and c_l = 0), or two-qubit operators (l = 2 and c_l = 3 by [27, 34, 33]). In either case, we improve
on the best previously published algorithm (cf. [21]). However, to obtain our advertised CNOT-count of
(23/48) × 4^n − (3/2) × 2^n + 4/3 we shall need two further optimizations. Due to their more technical
nature, they are discussed in the Appendix.
Note that for n = 3, only 20 CNOTs are needed. This is the best known three-qubit circuit at present
(cf. [32]). Thus, our algorithm is the first efficient n-qubit circuit synthesis routine which also produces a
best-practice circuit in a small number of qubits.
6 Nearest-Neighbor Circuits
A frequent criticism of quantum logic synthesis (especially highly optimized circuits which nonetheless
must conform to large theoretical lower bounds on the number of gates) is that the resulting circuits are
physically impractical. In particular, naïve gate counts ignore many important physical problems which
arise in practice. Many such are grouped under the topic of quantum architectures [5, 23], including
questions of (1) how best to arrange the qubits and (2) how to adapt a circuit diagram to a particular
physical layout. A spin chain7 is perhaps the most restrictive architecture: the qubits are laid out in a line,
and all CNOT gates must act only on adjacent (nearest-neighbor) qubits. As spin-chains embed into two
and three dimensional grids, we view them as the most difficult architecture from the perspective of layout.
The work in [14] shows how to adapt Shor’s algorithm to spin-chains without asymptotic increase in gate
counts. However, it is not yet clear if generic circuits can be adapted similarly.
As shown next, our circuits adapt well to the spin-chain limitations. Most CNOT gates used in our
decomposition already act on nearest neighbors, e.g., those gates implementing the two-qubit operators.
Moreover, Fig. 2 shows that only 2^{n−k} CNOT gates of length k (where the length of a local CNOT is 1) will
appear in the circuit implementing a multiplexed rotation with (n − 1) control bits. Figure 3 decomposes a
length k CNOT into 4k − 4 length 1 CNOTs. Summation shows that 9 × 2^{n−1} − 8 nearest-neighbor CNOTs
suffice to implement the multiplexed rotation. Therefore restricting CNOT gates to nearest-neighbor
interactions increases CNOT count by at most a factor of nine.
achieve the best known controlled-not counts, both for small numbers of qubits and asymptotically. Our
approach has the additional advantage that it co-opts all results on small numbers of qubits – e.g., future
specialty techniques developed for three-qubit quantum logic synthesis can be used as terminal cases of
our recursion. We have also discussed various problems specific to quantum computation, specifically
initialization of quantum registers and mapping to the nearest-neighbor gate library.
Acknowledgements. We are grateful to Professors Dianne O’Leary from the Univ. of Maryland
and Joseph Shinnerl from UCLA for their help with computing the CS decomposition in Matlab; to
Gavin Brennen at NIST and Jun Zhang at UC Berkeley for their helpful comments, and the authors of
quant-ph/0406003, whose package Qcircuit.tex produced almost all figures.
This work is funded by the DARPA QuIST program and an NSF grant. SSB is supported by an NRC
postdoctoral fellowship. The views and conclusions contained herein are those of the authors and should
not be interpreted as necessarily representing official policies or endorsements of employers and funding
agencies. Certain commercial equipment or instruments may be identified in this paper to specify exper-
imental procedures. Such identification is not intended to imply recommendation or endorsement by the
National Institute of Standards and Technology.
A.2 Extracting Diagonals to Improve Decomposition of Two-Qubit Operators
Terminate the recursion when only two-qubit operators remain; there will be 4n−2 of them. These two-qubit
operators all act on the least significant qubits and are separated by the controls of multiplexed rotations.
To perform better optimization, we recite a known result on the decomposition of two-qubit operators.
• Ry •
∼
= ∆
Ry
We use Theorem 14 to decompose the rightmost two-qubit operator; migrate the diagonal through the
select bits of the multiplexor to the left, and join it with the two-qubit operator on the other side. Now we
decompose this operator, and continue the process. Since we save one CNOT in the implementation of
every two-qubit gate but the last, we improve the l = 2, c_l = 3 count by 4^{n−2} − 1 gates.
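The saving can be tallied with a short sketch. It assumes only the figures quoted in the text (4^{n−2} terminal two-qubit operators, three CNOTs for a full two-qubit operator) together with the inferred cost of two CNOTs for an operator implemented up to a diagonal; the function name and its parameters are hypothetical.

```python
def terminal_cnot_counts(n, cnots_full=3, cnots_up_to_diagonal=2):
    """CNOTs spent on the terminal two-qubit operators of an n-qubit circuit.

    Assumes 4**(n-2) terminal two-qubit operators, cnots_full CNOTs for a
    complete two-qubit operator, and cnots_up_to_diagonal CNOTs when the
    operator only has to be implemented up to a diagonal.
    Returns (before, after, saved).
    """
    if n < 2:
        raise ValueError("the recursion terminates at two qubits, so n >= 2")
    operators = 4 ** (n - 2)
    before = cnots_full * operators
    # Every operator except the last one processed is implemented up to a
    # diagonal; the last one absorbs a diagonal and is implemented in full.
    after = cnots_up_to_diagonal * (operators - 1) + cnots_full
    return before, after, before - after


for n in range(2, 6):
    print(n, terminal_cnot_counts(n))
```

For each n, the third entry of the result equals 4^{n−2} − 1, matching the improvement stated above; for n = 2 there is a single operator and nothing is saved.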
References
[1] The ARDA Roadmap For Quantum Information Science and Technology,
http://qist.lanl.gov
[2] A. V. Aho and K. M. Svore. Compiling quantum circuits using the palindrome transform. e-print,
quant-ph/0311008.
[3] A. Barenco, C. Bennett, R. Cleve, D.P. DiVincenzo, N. Margolus, P. Shor, T. Sleator, J.A. Smolin,
and H. Weinfurter, Elementary gates for quantum computation. Phys. Rev. A, 52:3457, 1995.
[4] C. H. Bennett and G. Brassard. Quantum cryptography: Public-key distribution and coin tossing. In
Proceedings of IEEE International Conference on Computers, Systems, and Signal Processing, pages
175–179, Bangalore, India, 1984. IEEE Press.
[5] G.K. Brennen, D. Song, and C.J. Williams, Quantum-computer architecture using nonlocal
interactions. Phys. Rev. A (R), 67:050302, 2003.
[6] S.S. Bullock, Note on the Khaneja Glaser decomposition. Quant. Info. and Comp. 4:396, 2004.
[7] S. S. Bullock and I. L. Markov. An elementary two-qubit quantum computation in twenty-three
elementary gates. In Proceedings of the 40th ACM/IEEE Design Automation Conference, pages 324–
329, Anaheim, CA, June 2003. Journal: Phys. Rev. A 68:012318, 2003.
[8] S. S. Bullock and I. L. Markov, Smaller circuits for arbitrary n-qubit diagonal computations.
Quant. Info. and Comp. 4:27, 2004.
[9] G. Cybenko: “Reducing Quantum Computations to Elementary Unitary Operations”, Comp. in Sci.
and Engin., March/April 2001, pp. 27-32.
[10] D. Deutsch, Quantum Computational Networks, Proc. R. Soc. London A 425:73, 1989.
[11] D. Deutsch, A. Barenco, A. Ekert, Universality in quantum computation. Proc. R. Soc. London A
449:669, 1995.
[12] D. P. DiVincenzo. Two-bit gates are universal for quantum computation. Phys. Rev. A 51:1015, 1995.
[13] R. P. Feynman. Quantum mechanical computers. Found. Phys., 16:507–531, 1986.
[14] A. G. Fowler, S. J. Devitt, L. C. L. Hollenberg, “Implementation of Shor’s Algorithm on a Linear
Nearest Neighbour Qubit Array”, Quant. Info. Comput. 4, 237-251 (2004).
[15] G.H. Golub and C. vanLoan, Matrix Computations, Johns Hopkins Press, 1989.
[16] L. K. Grover. Quantum mechanics helps with searching for a needle in a haystack. Phys. Rev. Let.,
79:325, 1997.
[17] W. N. N. Hung, X. Song, G. Yang, J. Yang, and M. Perkowski. Quantum logic synthesis by symbolic
reachability analysis. In Proceedings of the 41st Design Automation Conference, San Diego, CA,
June 2004.
[18] K. Iwama, Y. Kambayashi, and S. Yamashita. Transformation rules for designing cnot-based quantum
circuits. In Proceedings of the 39th Design Automation Conference, pages 419–425, 2002.
[19] R. Jozsa and N. Linden, On the role of entanglement in quantum computational speed-up. e-print,
quant-ph/0201143.
[20] E. Knill, Approximation by quantum circuits. LANL report LAUR-95-2225.
[21] M. Möttönen, J. J. Vartiainen, V. Bergholm, and M. M. Salomaa. Quantum circuits for general
multiqubit gates. Phys. Rev. Let., 93:130502, 2004.
[22] M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge
University Press, 2000.
[23] M. Oskin, F.T. Chong, I. Chuang, and J. Kubiatowicz, Building quantum wires: the long and the
short of it. In 30th Annual International Symposium on Computer Architecture (ISCA), June 2003.
[24] C. C. Paige and M. Wei. History and generality of the CS decomposition. Linear Alg. and App.,
208:303, 1994.
[25] V. V. Shende, S. S. Bullock, and I. L. Markov. Recognizing small-circuit structure in two-qubit
operators. Phys. Rev. A, 70:012310, 2004.
[26] V. V. Shende and I. L. Markov. Quantum Circuits for Incompletely Specified Two-Qubit Operators.
Quant. Inf. and Comput., vol.5, no.1, pp. 49-58, January 2005.
[27] V. V. Shende, I. L. Markov, and S. S. Bullock. Smaller two-qubit circuits for quantum communication
and computation. In Design, Automation, and Test in Europe, pages 980–985, Paris, France, February
2004. Journal: Phys. Rev. A, 69:062321, 2004.
[28] V. V. Shende, A. K. Prasad, I. L. Markov, and J. P. Hayes. Synthesis of reversible logic circuits. IEEE
Transactions on Computer Aided Design, 22:710, 2003.
[29] P. Shor. Polynomial-time algorithms for prime factorization and discrete logarithm on a quantum
computer. SIAM Journal on Computing, 26(5):1484–1509, 1997.
[30] R. R. Tucci, A Rudimentary Quantum Compiler. e-print, quant-ph/9805015.
[31] J. J. Vartiainen, M. Möttönen, and M. M. Salomaa. Efficient decomposition of quantum gates. Phys.
Rev. Let., 92:177902, 2004.
[32] F. Vatan and C. Williams. Realization of a general three-qubit quantum gate. e-print,
quant-ph/0401178.
[33] F. Vatan and C. Williams. Optimal quantum circuits for general two-qubit gates. Phys. Rev. A,
69:032315, 2004.
[34] G. Vidal and C. M. Dawson. A universal quantum circuit for two-qubit transformations with three
CNOT gates. Phys. Rev. A, 69:010301, 2004.
[35] J. Zhang, J. Vala, S. Sastry, and K. B. Whaley. Exact two-qubit universal quantum circuit. Phys. Rev.
Let., 91:027903, 2003.
[36] V. V. Zhirnov, R. K. Cavin, J. A. Hutchby, and G. I. Bourianoff. Limits to binary logic switch scaling
— a gedanken model. Proceedings of the IEEE, 91(11):1934–1939, November 2003.