Intro to Quantum Computing
Lecture Notes
Scott Aaronson
Fall 2018
distribution over various possibilities, but since the outcomes are perfectly
correlated, as soon as you learn the headline on your copy you instantly know
that your friend’s must be the same.
So, what does quantum mechanics have to say about each of these princi-
ples? To give you a teaser for much of the rest of the course:
I We’ll still use probabilities. But the way we’ll calculate probabilities will
be totally different, and will violate the axiom of monotonicity. That is,
increasing the number of ways for an event to happen, can decrease the
probability that it happens.
I Locality will be upheld. But Local Realism will be overthrown. And
if those two principles sounded like restatements of each other—well,
quantum mechanics will dramatically illustrate the difference between
them!
The famous theoretical physicist Richard Feynman said that everything about
quantum mechanics could be encapsulated in the Double Slit Experiment.
In the double-slit experiment, you shoot photons one at a time toward a wall
with two narrow slits. Where each photon lands on a second wall is proba-
bilistic. If we plot where photons appear on the back wall, some places are
very likely, some not. In Figures 2.1 – 2.3 you can see diagrams showing
the basic experimental set-up and results from performing both single-slit and
double-slit experiments with photons.
Note that some places on the screen being likely and others unlikely in
and of itself isn’t the weird part: we could totally explain this by some theory
where each photon just had some extra degree of freedom (an “RFID tag”)
that we didn’t know about, and that determined which way it went. What’s
weird is as follows. For some interval on the second wall:
Let P be the probability that the photon lands in the interval with both
slits open.
Let P1 be the probability that the photon lands in the interval if only
slit 1 is open.
Let P2 be the probability that the photon lands in the interval if only
slit 2 is open.
You’d think that P = P1 + P2 . But experiment finds that that’s not the
case! Even places that are never hit when both slits are open, can sometimes
be hit if only one slit is open.
The weirdness isn’t that “God plays dice,” but rather that
“these aren’t normal dice”!
Figure 2.1: Experimental setup for a single-slit photon interference experiment.
Figure 2.2: Experimental setup for a double-slit photon interference experiment.
You may think to measure which slit the photon went through, but doing
so changes the measurement results into something that makes more sense,
with just two bright patches, one for each slit. Note that it isn’t important
whether there’s a conscious observer: if the information about which slit the
photon went through leaks out in any way into the outside environment, the
results go back to looking like they obey classical probability theory.
Figure 2.4: Double-slit experiment with measuring devices on each slit which
measure which slit any given photon passes through. In this case the probabil-
ity distribution looks like the average of the individual single slit distributions.
The story of atomic physics between roughly 1900 and 1926 is that sci-
entists kept finding things that didn’t fit with the usual laws of mechanics
or probability. They often came up with hacky solutions that explained a
α = α1 + α2.    (2.2)

Then from the Born rule, Equation 2.1, we have

P = |α|² = |α1 + α2|².

If, for example, α1 = 1/2 and α2 = −1/2, then we find P = 0 if both slits
are open, but P = 1/4 if only one slit is open. This phenomenon is known as
interference.
So then, to justify the electron not spiraling into the nucleus we can say
that, yes, there are many paths where the electron does do that, but some
have positive amplitudes and others have negative amplitudes and they end
up canceling each other out.
For now, we’ll consider classical probability. Let’s look at flipping a coin.
We model this with a vector assigning a probability to each possibility: p =
P (heads) and q = P (tails).
\begin{pmatrix} p \\ q \end{pmatrix}, \qquad p, q \ge 0, \quad p + q = 1.    (2.5)
Turning the coin over means the probability that the coin was heads is
now the probability that the coin is tails. We can represent this operation as a matrix:

\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} p \\ q \end{pmatrix} = \begin{pmatrix} q \\ p \end{pmatrix}.

If it helps, you can think of the entries of such a matrix as conditional probabilities,
where P(a|b) is the conditional probability for the state of the coin to be "a"
given that it was previously in the state "b." We could also flip the coin fairly.
\begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}\begin{pmatrix} p \\ q \end{pmatrix} = \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix}.    (2.8)
Which means that regardless of previous state, both possibilities are now
equally likely. Let’s say we flip the coin, and if we get heads we flip again, but
if we get tails we turn it to heads.
\begin{pmatrix} 1/2 & 1 \\ 1/2 & 0 \end{pmatrix}\begin{pmatrix} p \\ q \end{pmatrix} = \begin{pmatrix} p/2 + q \\ p/2 \end{pmatrix}.    (2.9)
Does this make sense? Since if the state of the coin is found to be heads we
do a fair flip, we can see that given it's heads the probability after we do the
flip for heads or tails is 1/2. That is, P(heads|heads) = P(tails|heads) = 1/2. If
we see a tails however we always flip it back to heads, and so P(tails|tails) = 0
and P(heads|tails) = 1.
So, which matrices can be used as transformations? Firstly, we know that
all entries have to be non-negative (because probabilities can’t be negative).
We also know that each column must sum to 1, since we need the sum of initial
probabilities to equal the sum of the transformed probabilities (namely, both
should equal 1). A matrix that satisfies these conditions is called a Stochastic
Matrix.
Now let’s say we want to flip two coins, or rather, two bits. For the first
bit a = P (0) and b = P (1). For the second let c = P (0) and d = P (1).
\begin{matrix} 0 \\ 1 \end{matrix}\begin{pmatrix} a \\ b \end{pmatrix} \qquad\qquad \begin{matrix} 0 \\ 1 \end{matrix}\begin{pmatrix} c \\ d \end{pmatrix}
To combine the two vectors we need a new operation, called Tensor Product:
\begin{pmatrix} a \\ b \end{pmatrix} \otimes \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} ac \\ ad \\ bc \\ bd \end{pmatrix} = \begin{pmatrix} P(00) \\ P(01) \\ P(10) \\ P(11) \end{pmatrix}.    (2.10)
It’s worth noting that not all possible 4-element vectors can arise by the
tensor product of two 2-element vectors. For example, suppose by contradic-
tion that
\begin{pmatrix} ac \\ ad \\ bc \\ bd \end{pmatrix} = \begin{pmatrix} 1/2 \\ 0 \\ 0 \\ 1/2 \end{pmatrix}.    (2.11)

Then multiplying the first and last equations together implies that (ac)(bd) = 1/4,
while multiplying the second and third implies that (ad)(bc) = 0, giving a
contradiction. As such, the 4-element vector on the right hand side can't be
written as the tensor product of two 2-element vectors.
As we did with the one-bit systems, we can describe probabilistic transfor-
mations of two-bit systems using stochastic matrices. For example, let’s say
we apply a transformation where if the first bit is 1 then we flip the second bit
and otherwise we do nothing to the system. The 4 × 4 matrix that achieves
this is given in Equation 2.12, and is called the Controlled NOT or CNOT
matrix; it will also come up often in quantum computing.
\text{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix},    (2.12)

where the rows and columns are labeled by the basis states 00, 01, 10, 11.
Suppose we apply the CNOT matrix to the following vector, representing a
system where the first bit is either zero or one with 1/2 probability and the second
bit is always 0:

\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} 1/2 \\ 0 \\ 1/2 \\ 0 \end{pmatrix} = \begin{pmatrix} 1/2 \\ 0 \\ 0 \\ 1/2 \end{pmatrix}.    (2.13)
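As a quick sanity check, here's a short NumPy sketch (ours, not part of the original notes) that reproduces the calculations in Equations 2.10–2.13: the tensor product of two probability vectors and the action of the CNOT stochastic matrix.

```python
# Added sketch: reproducing Equations 2.10-2.13 numerically.
import numpy as np

a, b = 0.5, 0.5        # first bit:  P(0) = a, P(1) = b
c, d = 1.0, 0.0        # second bit: P(0) = c, P(1) = d

# Tensor product of the two distributions (Equation 2.10):
joint = np.kron([a, b], [c, d])          # [P(00), P(01), P(10), P(11)]
print(joint)                             # [0.5 0.  0.5 0. ]

# CNOT as a stochastic matrix (Equation 2.12): flip bit 2 iff bit 1 is 1.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

print(CNOT @ joint)                      # [0.5 0.  0.  0.5]  (Equation 2.13)
print(CNOT.sum(axis=0))                  # every column sums to 1: stochastic
```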
Another motivation for adopting the ket notation is that it simplifies many
of the linear algebra operations we use frequently in quantum mechanics. Of-
ten, for example, we’ll need to take the transpose (or conjugate transpose for
complex-valued vectors):
\begin{pmatrix} \alpha \\ \beta \end{pmatrix} \rightarrow \begin{pmatrix} \alpha^* & \beta^* \end{pmatrix}

and we can use these conjugate-transposed vectors to define a norm on our
vector space:

\|v\|^2 = \begin{pmatrix} \alpha^* & \beta^* \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = |\alpha|^2 + |\beta|^2.    (3.1)
In ket notation both of these operations can easily be represented. The
conjugate-transpose of a ket, |ψi = α |0i + β |1i, is represented by a corre-
sponding object called a bra, hψ| = α∗ h0| + β∗ h1|. Inner products written in
this notation, hx|yi, satisfy the
property that hx|yi = hy|xi∗ . Note that we'll be adopting the physics conven-
tion of denoting the conjugate-transpose with the † symbol (read dagger).
The set of all possible pure quantum states of a qubit with real coefficients
defines a circle. The set of all possible quantum states with complex coefficients
defines a sphere known as the Bloch sphere, which we will learn about in
greater detail in a later lecture. In addition to the states |0i and |1i (known as
the “standard basis states”) there are four other single qubit states that occur
so frequently in quantum information theory that they’ve been given special
names:
|+\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}, \qquad |-\rangle = \frac{|0\rangle - |1\rangle}{\sqrt{2}},
\qquad |i\rangle = \frac{|0\rangle + i|1\rangle}{\sqrt{2}}, \qquad |-i\rangle = \frac{|0\rangle - i|1\rangle}{\sqrt{2}}.    (3.2)
Figure 3.1: Standard and Hadamard basis states represented on the real plane.
The pair {|+i , |−i} is often referred to as the “Hadamard basis.” Figure 3.1
shows the positions of the standard and Hadamard basis states in the real
plane.
A linear transformation on a single qubit, U, is unitary if |α|² + |β|² = |α′|² + |β′|² for
all input vectors [α, β]ᵀ, where [α′, β′]ᵀ = U[α, β]ᵀ. In
other words, U preserves the 2-norm of the vector.
Examples of 1-Qubit Unitary Transformations

\underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}_{\text{Identity}} \qquad \underbrace{\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}}_{\text{NOT Gate}} \qquad \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}}_{\text{Relative Phase Shift}} \qquad \underbrace{\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}}_{\text{2D Rotation}}    (3.3)
The second-to-last unitary matrix above has the effect of mapping |0i → |0i
and |1i → i |1i; in fact, we can replace the i in this matrix with any unit
magnitude complex number of the form eiθ . Remember Euler’s equation,
eiθ = cos (θ) + i sin (θ). (3.4)
Rotations also preserve the 2-norm of vectors and so for example, we can use
the last matrix above to rotate our state in the real plane by some specified
angle θ (we denote this family of matrices Rθ ).
Since a unitary matrix, U , preserves the 2-norm of vectors it immediately
follows that it must also preserve the value of the inner product hψ|ψi. This
gives the following series of equalities: hψ|ψi = (|ψi)† |ψi = (U |ψi)† U |ψi =
hψ| U † U |ψi. This can only be true for all |ψi if U † U = I, which implies that
for a unitary matrix U −1 = U † . It also implies that the rows of U must form an
orthonormal basis. Conversely, it's easy to see that if U −1 = U † then U
is unitary. So you can tell if a matrix is unitary by checking if U † U = I, or
equivalently if the rows (or the columns) form an orthonormal basis.
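Here's a small NumPy sketch (ours, not from the notes) that checks the condition U†U = I for the four example gates of Equation 3.3; the rotation angle θ is an arbitrary choice.

```python
# Added sketch: numerically checking unitarity, U†U = I.
import numpy as np

theta = 0.3
gates = {
    "identity": np.eye(2),
    "NOT":      np.array([[0, 1], [1, 0]]),
    "phase":    np.array([[1, 0], [0, 1j]]),
    "rotation": np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]]),
}
for name, U in gates.items():
    ok = np.allclose(U.conj().T @ U, np.eye(2))
    print(name, "unitary:", ok)           # all True
```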
But in the quantum world, you can sometimes apply a unitary transformation
to a superposition state and get a determinate answer, even though the answer
would have been random had you applied it to any individual component of the
superposition. Many of the most interesting phenomena in quantum mechanics
can be explained in terms of quantum interference. As an illustrative
example, suppose we start initially in the |0i state. We then apply twice a
unitary transformation which, when applied to |0i, places us in |+i = (|0i + |1i)/√2,
and which, when applied to |1i, places us in |−i = (|0i − |1i)/√2. This situation is drawn
in Figure 3.2.
To get the amplitude associated with a particular path we take the product
of the amplitudes along that path. Then to get the final amplitude associated
with a particular output, we sum the amplitudes for each of the paths through
the tree leading to that output. In this case, the amplitude of |0i is 1 and the
amplitude of |1i is 0. The paths leading to |0i interfere constructively while
the paths leading to |1i interfere destructively.
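A minimal NumPy sketch of this path-summing calculation (ours, not from the notes); the transformation described—|0i to |+i and |1i to |−i—is just the Hadamard matrix.

```python
# Added sketch: the interference calculation of Figure 3.2, done numerically.
import numpy as np

H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)
ket0 = np.array([1, 0])

after_two = H @ (H @ ket0)
print(after_two)                         # ~[1, 0]: back to |0> with certainty

# The two paths to |1> cancel (destructive interference):
path_via_0 = H[1, 0] * H[0, 0]           # |0> -> |0> -> |1>
path_via_1 = H[1, 1] * H[1, 0]           # |0> -> |1> -> |1>
print(path_via_0 + path_via_1)           # 0.0
```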
One example of a quantum gate with no classical counterpart is the "square root of NOT,"

\sqrt{\text{NOT}} = \frac{1}{2}\begin{pmatrix} 1+i & 1-i \\ 1-i & 1+i \end{pmatrix},    (4.1)

which you can check satisfies the property that √NOT · √NOT = NOT. One
of the most ubiquitous gates in quantum information is the Hadamard gate

H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.    (4.2)
The Hadamard gate is so useful because it maps the {|0i , |1i} basis to the
{|+i , |−i} basis, and vice versa.
" 1 #
1 1 1 1 √
H |0i = √ = √12 = |+i
2 1 −1 0 2
Similarly, H |1i = |−i, H |+i = |0i and H |−i = |1i. Note that the
{|0i , |1i} basis and the {|+i , |−i} basis form two different orthogonal
(and complementary) bases with the special property that being maximally
certain in the {|0i , |1i} basis means that you’re maximally uncertain in the
{|+i , |−i} basis and vice versa. Another example of a basis changing gate is
\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}.    (4.3)
This one switches us from the {|0i , |1i} basis to the {|ii , |−ii} basis. Why
would we want to use multiple bases? We like to think of vectors existing
abstractly in a vector space, but to do computations we often need to pick a
convenient basis. When we see some actual quantum algorithms and protocols,
we’ll see the power that comes from switching between bases.
• Invertible: This should be clear, since preserving the 2-norm means that
  U † U = I which means U −1 = U † .
  – In other words, the transformation |ψi → U |ψi can always be
    reversed by applying U † , since U † U |ψi = |ψi.
This last item is part of why it's important that unitary matrices are in gen-
eral complex-valued. If, for example, the transformation

\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}

was applied by some process which took 1 second, then by applying the same process for
half of a second, we can obtain

\begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix},

or some other square root of the transformation. But we invite you to check that
\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} has no 2 × 2 real-valued square root.
By the way, if we allow ourselves the ability to add an extra dimension,
then there is a 3 × 3 matrix that "squares" to \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}:

\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}^{2} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.

But, to take a square root of \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, either you need complex numbers, or
else you need to add a third dimension. The latter is analogous to reflecting
a two-dimensional object by lifting it up and flipping it over through a third dimension.
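A quick numerical check of both claims (a sketch we've added, not from the notes):

```python
# Added sketch: square roots of diag(1, -1).
import numpy as np

Z = np.diag([1, -1])

# A complex 2x2 square root:
S = np.diag([1, 1j])
print(np.allclose(S @ S, Z))          # True

# The real 3x3 matrix from the text squares to diag(1, -1, -1):
M = np.array([[1, 0, 0],
              [0, 0, 1],
              [0, -1, 0]])
print(M @ M)                          # diag(1, -1, -1)
```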
So how can we reconcile these two sets of rules? That’s the famous Mea-
surement Problem; we’ll talk about various points of view on it later. De-
spite the philosophical conflict, unitary transformations and measurement sync
up well because unitary transformations preserve the 2-norm and measurement
gives probabilities determined by the 2-norm.
|1i H H
Figure 4.1: Quantum circuit representing two Hadamard gates followed by a
measurement on a qubit initialized to the |1i state.
The operations in Figure 4.1 are simple enough that we’d have no trouble
writing out the matrices and vectors explicitly, but as we add more qubits
to our system and allow for multi-qubit operations, matrix representations
quickly become unwieldy (see the final two gates in Table 4.1). Quantum cir-
cuits give us a tool for succinctly describing all manner of complicated quantum
transformations.
Quantum circuits allow for operations on an arbitrary number of qubits.
Here is a circuit containing a two-qubit gate, labeled U , which is followed by
a Hadamard on the first qubit and then measurements on both qubits.
|0i H
U
|0i
Figure 4.2: A generic two qubit operation followed by a Hadamard on the first
qubit and a pair of measurements.
When describing a system in this manner, we can add new qubits (typically all assumed to be
initialized to |0i) to the system, which are called ancilla qubits.
One final notational convention we’ll introduce in this section is for con-
trolled gates. This includes the CNOT gate that we saw first in Section 2.1.
A controlled gate can be split into two parts, the control and the target. We
represent a control qubit using a thick solid dot on a wire. We then draw a
vertical line connecting to a gate on another qubit(s) which we wish to control.
In the figure below a pair of arbitrary qubits (meaning we won’t specify the
input ahead of time) has a series of controlled gates applied.
• •
U •
Figure 4.3: Different types of controlled operations. The first gate is a CNOT
with the first qubit as the control and the second qubit as the target. Notice
the special notation (⊕) for the target of the CNOT gate. The second is a
controlled-U operation, where U is arbitrary. This operation applies U if the
control qubit is |1i and does nothing otherwise. The final gate is a CNOT
gate with second qubit as the control and the first as the target. Control can
run in either direction up or down.
There are other notational conventions used for various gates relevant to
quantum information, more than we can go through in detail. For a summary
of the most common ones that we'll come across during the course, see Table 4.1.
Table 4.1: Circuit representations for some commonly used gates in quantum
information theory.
Another interesting variant of the same kind of effect is called the Watched
Pot Effect. Say we want to keep a qubit at |0i, but it keeps rotating towards
|1i (it's drifting). If we keep measuring it in the {|0i , |1i} basis every time
the qubit has drifted by an angle ε, the odds of it jumping to |1i at any given
measurement are only ε². So if we repeat the measurements ≈ 1/ε times, then the
probability of it ending up at |1i is only ≈ ε, even though it would have drifted
to |1i with certainty had we not measured.
This is the quantum airport though (IATA code BQP), so suppose instead
that we can upgrade our bit to a qubit: |bi = α |0i + β |1i. We’ll also assume
that in the case there is no bomb, the state |bi gets returned to you. If there
is a bomb, the bomb measures in the {|0i , |1i} basis. If the outcome is |0i,
then |0i is returned to you, while if the outcome is |1i, the bomb explodes.
What we can do is start with the |0i state and apply the rotation

R_\varepsilon = \begin{pmatrix} \cos\varepsilon & -\sin\varepsilon \\ \sin\varepsilon & \cos\varepsilon \end{pmatrix},

giving us cos(ε) |0i + sin(ε) |1i. If there's a bomb, the probability it explodes
is sin²(ε) ≈ ε², otherwise we get back |0i. If there's no bomb, we get back
cos(ε) |0i + sin(ε) |1i.
So repeating this process, each time applying R_ε, about π/(2ε) times makes the
total probability of setting off the bomb (if there is a bomb) only about
(π/(2ε)) · sin²(ε) ≈ πε/2. Yet, by measuring our qubit to see whether it's |0i or |1i, we still learn
whether or not a bomb was there. If there was a bomb then by the watched
pot effect the state will be |0i with high probability, and if there wasn't then
our repeated applications of R_ε succeeded in rotating the state by π/2 and our
state is |1i. Of course, the catch is that this requires not merely a qubit on
our end, but also a bomb that can be "quantumly interrogated"!
Figure 4.5: Evolution of the qubit after multiple queries with no bomb.
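Here's a rough simulation of the interrogation protocol (our added sketch; the values of ε and the number of rounds are arbitrary choices):

```python
# Added sketch: simulating the Elitzur-Vaidman interrogation.
import numpy as np

eps = np.pi / 2 / 100            # rotation per round (assumed small)
rounds = 100                     # total rotation pi/2 if nothing ever measures

R = np.array([[np.cos(eps), -np.sin(eps)],
              [np.sin(eps),  np.cos(eps)]])

def interrogate(bomb, rng):
    state = np.array([1.0, 0.0])                  # start at |0>
    for _ in range(rounds):
        state = R @ state
        if bomb:                                  # the bomb measures in {|0>, |1>}
            if rng.random() < state[1] ** 2:
                return "boom"
            state = np.array([1.0, 0.0])          # collapsed back to |0>
    return "|1>" if state[1] ** 2 > 0.5 else "|0>"

rng = np.random.default_rng(0)
print(interrogate(bomb=False, rng=rng))           # |1>: no bomb, rotation went through
booms = sum(interrogate(bomb=True, rng=rng) == "boom" for _ in range(2000))
print(booms / 2000)                               # roughly pi*eps/2 ~ 0.025
```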
Lecture 5: The Coin Problem, Distinguishability, Multi-Qubit States and Entanglement
Figure 5.1: Distribution for the coin-flip probability. Since the standard error
scales like 1/√n, where n is the number of coin flips, if we want to distinguish
reliably between two distributions whose means are separated by ε we need
1/√n ∼ ε, i.e., n ∼ 1/ε² coin flips.
We maintain a single qubit, initialized to
the |0i state, and consider the two rotations R_ε and R_{−ε}, which rotate by ε
and −ε radians respectively. We can repeatedly flip the coin, and if it lands
tails apply R_ε (rotating clockwise) and if it lands heads apply R_{−ε} (rotating
counterclockwise). After many flips (∼ 1/ε²) we can then measure the qubit and
statistically infer that if it's in the |0i state the coin was most likely fair, while
if it's in the |1i state the coin is most likely biased. You might raise a few
objections about the protocol:
I Won’t counting out the right number of steps again require a lot of stor-
age?
– No. We can give a protocol with a half-life (some independent prob-
ability of halting at each step) causing it to repeat approximately
the number of times we want it to.
I What about if the qubit drifts by a multiple of π? Won’t that make a
biased coin look fair?
– That’s possible, but we can make it so that a biased coin is more
likely to land on |1i than a fair coin.
You may want to measure in the |vi, |v ⊥ i basis, as it would eliminate one
kind of error completely (not getting |vi ensures the state was |wi). But if you
just want to maximize the probability of getting the right answer, and if |vi
and |wi are equally likely, then there’s a better way, illustrated in Figure 5.5.
Take the bisector of |vi and |wi and define the measurement basis by using the
states 45◦ to either side. When we perform the measurement in this basis we
output the original vector closest to the measurement result as the outcome.
|0\rangle \otimes \frac{\alpha|0\rangle + \beta|1\rangle}{\sqrt{|\alpha|^2 + |\beta|^2}},    (5.1)

where the factor of \sqrt{|\alpha|^2 + |\beta|^2} in the denominator ensures that the result is
properly normalized. This is called the Partial Measurement Rule. This is
actually the last "basic rule" of quantum mechanics that we'll see in the course;
everything else is just a logical consequence of rules we've already covered.
Suppose we want to apply a NOT to the second qubit while doing nothing to
the first. The 4 × 4 matrix that accomplishes this is

\begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix},

which can be written in tensor product notation as

I \otimes \text{NOT} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \otimes \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.    (5.4)
Likewise, if we want to apply a NOT to the first qubit and do nothing to
the second qubit we can apply NOT ⊗ I, which in matrix representation is
\text{NOT} \otimes I = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix},    (5.5)

with rows and columns labeled by the basis states 00, 01, 10, 11.
Remember that columns correspond to input basis states and rows to
output basis states, so for NOT ⊗ I the amplitude on 00 in the input becomes the
amplitude on 10 in the output.
Very often in quantum information we’ll want to take a group of qubits
and perform an operation on one of them: say, “Hadamard the third qubit.”
What that really means is applying the unitary matrix I ⊗ I ⊗ H ⊗ I ⊗ · · · ⊗ I.
The desired operation on the relevant qubit(s) is tensor-producted with the
identity operation on all the other qubits.
What’s H ⊗ H?
1 1 1 1
11 −1 1 −1
(5.6)
2 1 1 −1 −1
1 −1 −1 1
Why should it look like this? Let's look at the first column: H ⊗ H |00i = |++i.
For the second column, H ⊗ H |01i = |+−i, and so on. All of the two-qubit uni-
taries we've seen were built up using tensor products of single-qubit unitaries,
except for the CNOT, where the first qubit affects the second. We'll need
operations like CNOT in order to have one qubit affect another.
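These tensor-product constructions are easy to reproduce with NumPy's kron (our added sketch, not from the notes):

```python
# Added sketch: building the two-qubit matrices of Equations 5.4-5.6 with np.kron.
import numpy as np

I = np.eye(2)
NOT = np.array([[0, 1], [1, 0]])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

print(np.kron(NOT, I))        # NOT on the first qubit, identity on the second
print(np.kron(I, NOT))        # identity on the first qubit, NOT on the second
print(np.kron(H, H) * 2)      # 2 * (H tensor H): the +/-1 pattern of Eq. 5.6

# "Hadamard the third qubit" of a 4-qubit system:
U = np.kron(np.kron(I, I), np.kron(H, I))
print(U.shape)                # (16, 16)
```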
5.3.2 Entanglement
Let’s see the multi-qubit operations of Section 5.3.1 in action by calculating
the result of applying the circuit below. The sequence of operations in Figure
5.6 and their effects on the input are given in ket notation in Equation 5.7 and
in vector notation in Equation 5.8.
|0i H •
|0i
Figure 5.6: Circuit for producing a Bell pair.
In ket notation,

|00i → (H ⊗ I)|00i = (|00i + |10i)/√2 → CNOT · (|00i + |10i)/√2 = (|00i + |11i)/√2,    (5.7)

or in vector notation,

(1, 0, 0, 0)ᵀ → (1/√2)(1, 0, 1, 0)ᵀ → (1/√2)(1, 0, 0, 1)ᵀ.    (5.8)

The action of the CNOT can also be written as |x, yi → |x, y ⊕ xi. The
state that this circuit ends on, (|00i + |11i)/√2, is called the Singlet or the Bell Pair
or the EPR Pair. This state is particularly interesting because measuring the
first qubit collapses the state of the second qubit. The state can’t be factored
into a tensor product of the first qubit’s state and the second qubit’s state.
Such a state is called entangled , which for pure states simply means: not
decomposable into a tensor product.
A state that’s not entangled is called unentangled or separable or a product
state (for pure states, which are the only kind being discussed at this point,
all three of these mean the same thing).
The basic rules of quantum mechanics, which we saw earlier, force entan-
glement to exist. It was noticed quite early in the history of the field. It turns
out that most states are entangled.
As we mentioned earlier, entanglement was arguably what troubled Ein-
stein the most about quantum mechanics. He thought that it meant that
quantum mechanics must entail “spooky action at a distance.” That’s be-
cause, while typically particles need to be close to become entangled, once
they’re entangled you can separate them to an arbitrary distance and they’ll
stay entangled (assuming nothing else is done to them). This has actually
been demonstrated experimentally for distances of up to 150 miles (improved
to a couple thousand miles by Chinese satellite experiments, while this course
was being taught!).
Let’s say that Alice and Bob entangle a pair of particles by setting their
state to |00i+|11i
√
2
. Then Alice brings her particle to the moon while Bob stays on
Earth. If Alice measures her particle, she can instantaneously know whether
Bob will observe a |0i or a |1i when he measures his.
Figure 5.7: We often denote shared entanglement between two parties with a
squiggly line (you know, cause entanglement is “spooky” and “weird”).
This bothered Einstein, but others thought that it wasn’t that big a deal.
After all, Alice doesn’t get to control the outcome of her measurement! She
sees |0i and |1i with equal probability, which means that in this case, the
“spooky action” can be explained as just a correlation between two random
variables, as we could already see in the classical world. However, a famous
1935 paper of Einstein, Podolsky, and Rosen brought up a further problem:
namely, there are other things Alice could do instead of measuring in the
{|0i , |1i} basis.
What happens if Alice measures in the {|+i , |−i} basis? She’ll get either
|+i or |−i, as you might expect. Indeed, we can model the situation by Alice
Hadamarding her qubit and then measuring in the {|0i , |1i} basis. Alice
Hadamarding gives us the state
(H \otimes I)\,\frac{|00\rangle + |11\rangle}{\sqrt{2}} = \frac{|00\rangle + |01\rangle + |10\rangle - |11\rangle}{2}.
So now, applying the partial measurement rule what is Bob’s state? If Alice
sees |0i, then Bob’s qubit collapses to
(|0i + |1i)/√2 = |+i .
Conversely, if Alice sees |1i then Bob’s qubit collapses to
(|0i − |1i)/√2 = |−i .
Einstein, Podolsky and Rosen went on to talk about how this is more
troubling than before. If Alice measures in the {|0i , |1i} basis, then Bob’s
state collapses to |0i or |1i, but if she measures in the {|+i , |−i} basis, then
Bob’s state collapses to |+i or |−i. And that looks a lot like faster-than-light
communication!
How can we explain this? One thing we can do is ask “what happens if
Bob makes a measurement?”
• In the case where Alice measured her qubit in the {|0i , |1i} basis, Bob
  will see |0i or |1i with equal probability if he measures his qubit in the
  same basis.
• In the case where Alice measured her qubit in the {|+i , |−i} basis, Bob
  will still see |0i or |1i with equal probability if he measures his qubit in
  the {|0i , |1i} basis (as an exercise, check this).
So, at least in this case, the probability that Bob sees |0i or |1i is the
same regardless of what Alice chooses to do. So, it looks like there might be
something more general going on here! In particular, a different description
should exist of Bob’s part of the state that’s totally unaffected by Alice’s
measurements—thereby making manifest the principle of no faster-than-light
communication. Which brings us to the next lecture. . .
Lecture 6: Mixed States
So far we’ve only talked about pure states (i.e., isolated superpositions), but
you can also have quantum superposition layered together with regular, old
probabilistic uncertainty. This becomes extremely important when we talk
about states where we’re only measuring one part. Last time we discussed
the Bell Pair, and how if Alice measures her qubit in any basis, the state of
Bob’s qubit collapses to whichever state she got for her qubit. Even so, there’s
a formalism that helps us see why Bob can’t do anything to learn which basis
Alice makes her measurement in, and more generally, why Alice can’t transmit
any information instantaneously—in keeping with special relativity. This is
the formalism of. . .
Mixed states in some sense are just probability distributions over quantum
superpositions. We can define a mixed state as a distribution over quantum
states, {pi , |ψi i}, meaning that with probability pi the state is |ψi i.
The density matrix corresponding to the mixed state {pi , |ψi i} is

\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|,    (6.1)

where |ψi i hψi | denotes the outer product of |ψi i with itself. The outer prod-
uct is the matrix which you get by multiplying

\begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{N-1} \end{pmatrix}\begin{pmatrix} \alpha_0^* & \alpha_1^* & \cdots & \alpha_{N-1}^* \end{pmatrix} = \begin{pmatrix} |\alpha_0|^2 & \alpha_0\alpha_1^* & \cdots \\ \alpha_1\alpha_0^* & \ddots & \\ \vdots & & |\alpha_{N-1}|^2 \end{pmatrix},    (6.2)

i.e., the N × N matrix whose (i, j) entry is α_i α_j^* .
Note that αi αj∗ = (αi∗ αj )∗ , which means that the matrix is its own conjugate
transpose ρ = ρ† . This makes ρ a Hermitian Matrix. For the standard basis
states, for example, we get
|0\rangle\langle 0| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \qquad \text{and} \qquad |1\rangle\langle 1| = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.

Therefore, an even mixture of them would be

\frac{|0\rangle\langle 0| + |1\rangle\langle 1|}{2} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix} = \frac{I}{2}.    (6.3)
Similarly,
|+\rangle\langle +| = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix} \qquad\qquad |-\rangle\langle -| = \begin{pmatrix} 1/2 & -1/2 \\ -1/2 & 1/2 \end{pmatrix}

and

\frac{|+\rangle\langle +| + |-\rangle\langle -|}{2} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix} = \frac{I}{2}.    (6.4)
Notice that an equal mixture of |0i and |1i is different from an equal
superposition of |0i and |1i (a.k.a. |+i), and so they have different density
matrices. However, the mixture of |0i and |1i and the mixture of |+i and |−i
have the same density matrix, which makes sense because Alice converting
between the two bases in our Bell pair example should maintain Bob’s density
matrix representation of his state.
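A small NumPy sketch we've added to illustrate these density-matrix facts (the helper dm is our own name, not from the notes):

```python
# Added sketch: the density matrices of Equations 6.3 and 6.4, computed directly.
import numpy as np

def dm(psi):
    """Density matrix (outer product) of a pure state."""
    psi = np.asarray(psi, dtype=complex).reshape(-1, 1)
    return psi @ psi.conj().T

ket0, ket1 = [1, 0], [0, 1]
plus  = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)

mix_01 = (dm(ket0) + dm(ket1)) / 2
mix_pm = (dm(plus) + dm(minus)) / 2
print(np.allclose(mix_01, mix_pm))          # True: both equal I/2
print(np.allclose(mix_01, dm(plus)))        # False: a mixture is not |+><+|

# Measuring rho in a basis {|v>, |w>}: P(v) = <v| rho |v>.
v = np.array([np.cos(0.3), np.sin(0.3)])
print(v.conj() @ mix_01 @ v)                # 0.5 in every basis, as claimed
```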
A pure state written as a density matrix, for example |+i h+|, would look
like \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}, a matrix of rank one. Indeed, a density matrix has rank 1 if and
only if it represents a pure state.
What if we want to measure a density matrix in a different basis? Measur-
ing ρ in the basis {|vi , |wi} will give P (|vi) = hv|ρ|vi and P (|wi) = hw|ρ|wi.
The matrix I/2 that we’ve encountered above as the even mixture of |0i
and |1i (and also of |+i and |−i) is called the Maximally Mixed State.
This state is basically just the outcome of a classical coin flip, and it has a
special property: regardless of the basis we measure it in, both outcomes will
be equally likely. So for every basis {|vi , |wi} we get the probabilities
\langle v|\,\frac{I}{2}\,|v\rangle = \frac{1}{2}\langle v|v\rangle = \frac{1}{2}, \qquad \langle w|\,\frac{I}{2}\,|w\rangle = \frac{1}{2}\langle w|w\rangle = \frac{1}{2}.
This explains why Alice, no matter what she tries, is unsuccessful in sending
a message to Bob by measuring her half of a Bell pair. Namely, because the
maximally mixed state in any other basis is still the maximally mixed state.
The generalization of this fact to any state shared by Alice and Bob and to any
operation performed by Alice is called the No-Communication Theorem.
So how do we handle unitary transformations with density matrices? Since
ρ = \sum_i p_i |ψ_i\rangle\langle ψ_i| , applying U to ρ means that ρ gets mapped to

\sum_i p_i (U|\psi_i\rangle)(U|\psi_i\rangle)^\dagger = \sum_i p_i\, U|\psi_i\rangle\langle\psi_i|U^\dagger = U\Big(\sum_i p_i |\psi_i\rangle\langle\psi_i|\Big)U^\dagger = U\rho U^\dagger.    (6.7)
You can pull out the U ’s since it’s the same one applied to each state in the
mixture.
It’s worth noting that getting n2 numbers in the density matrix isn’t some
formal artifact; we really do need all those extra parameters. What do the
off-diagonal entries represent?
|+\rangle\langle +| = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}
The off-diagonal entries are where the “quantumness” of the state re-
sides. They’re where the potential interference between |0i and |1i is repre-
sented. The off-diagonal entries can vary depending on relative phase: |+i h+|
has positive off-diagonal entries, |−i h−| has negative off-diagonal entries, and
|i\rangle\langle i| = \begin{pmatrix} 1/2 & -i/2 \\ i/2 & 1/2 \end{pmatrix} has off-diagonal entries of opposite signs. Later we'll see that
as a quantum system interacts with the environment, the off-diagonal entries
tend to get pushed down toward 0.
So, which matrices can arise as density matrices? We know that a density matrix must be:

• Square
• Hermitian
• Trace 1 (which is to say \sum_i \rho_{ii} = 1)

Could M = \begin{pmatrix} 1/2 & -10 \\ -10 & 1/2 \end{pmatrix} be a density matrix?
No! Measuring this in the {|+i , |−i} basis would give h+|M |+i = −19/2. Bad!
Remember that you can always transform ρ to U ρU † , whose diagonal then
has to be a probability distribution. If we want that condition to hold for all
U , then we need to add the restriction that ρ be positive semidefinite (PSD):
all of its eigenvalues must be non-negative.
As a refresher, for the matrix ρ, the eigenvectors |xi are the vectors that
satisfy the equation ρ |xi = λ |xi for some eigenvalue λ. If we had an eigen-
vector |xi with a negative eigenvalue then the probability hx|ρ|xi = λ would
be negative, which is nonsense.
Could we have missed a condition? Let’s check. We claim: any Hermitian
PSD matrix with trace 1 can arise as a density matrixPof a quantum state.
For such a ρ, we can represent it in the form ρ = i λi |ψi i hψi | where the
|ψi i are the (normalized) eigenvectors of ρ. Then hψi |ρ|ψi i = λi , so the λi ’s
sum to T r(ρ) = 1. This process of obtaining eigenvalues and eigenvectors is
called eigendecomposition. We know the eigenvalues will be real because
the matrix is Hermitian and they’re non-negative because the matrix is PSD.
For every density matrix ρ, there's a U such that U ρU † is diagonal (with ρ's
eigenvalues along its diagonal). Namely, the U that switches between the
standard basis and ρ's eigenbasis.
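Here's an added NumPy sketch of these checks; is_density_matrix is our own helper name, and the example matrices are arbitrary choices.

```python
# Added sketch: testing the conditions (Hermitian, trace 1, PSD) numerically.
import numpy as np

def is_density_matrix(rho, tol=1e-9):
    rho = np.asarray(rho, dtype=complex)
    hermitian = np.allclose(rho, rho.conj().T, atol=tol)
    trace_one = abs(np.trace(rho) - 1) < tol
    psd = np.all(np.linalg.eigvalsh(rho) > -tol)   # eigenvalues non-negative
    return hermitian and trace_one and psd

M = np.array([[0.5, -10], [-10, 0.5]])
print(is_density_matrix(M))                        # False (negative eigenvalue)
print(is_density_matrix(np.eye(2) / 2))            # True: maximally mixed state

rho = np.array([[2/3, 1/3], [1/3, 1/3]])           # an example mixed state
vals, vecs = np.linalg.eigh(rho)                   # eigendecomposition of rho
print(is_density_matrix(rho), vals)                # True, non-negative eigenvalues
```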
One important quantity you can always compute for density matrices is
the rank, defined as the number of nonzero eigenvalues of ρ.
A density matrix of rank n might look, for example, like that in Equation 6.6,
while a density matrix of rank 1 represents a pure state.
In general, rank tells you the minimum number of pure states that you have
to mix to reach a given mixed state.
\frac{2}{3}|+\rangle\langle +| + \frac{1}{3}|0\rangle\langle 0| = \frac{2}{3}\begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix} + \frac{1}{3}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 2/3 & 1/3 \\ 1/3 & 1/3 \end{pmatrix}
In general, if you have a bipartite pure state, it'll look like

|\psi\rangle = \sum_{i,j=0}^{N-1} \alpha_{i,j}\, |i\rangle|j\rangle,

and Bob's reduced density matrix can be obtained using

(\rho_B)_{j,j'} = \sum_i \alpha_{i,j}\,\alpha_{i,j'}^{*}.    (6.10)
The process of going from a pure state of a composite system, to the mixed
state of part of the system, is called tracing out.
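The following sketch (ours, not from the notes) computes Bob's reduced density matrix from Equation 6.10 for two example states:

```python
# Added sketch: Bob's reduced density matrix via Equation 6.10.
import numpy as np

def reduced_B(alpha):
    """alpha[i, j] are the amplitudes of sum_{i,j} alpha[i,j] |i>_A |j>_B;
    returns rho_B with (rho_B)_{j,j'} = sum_i alpha[i,j] * conj(alpha[i,j'])."""
    return alpha.T @ alpha.conj()

bell = np.array([[1, 0], [0, 1]]) / np.sqrt(2)     # (|00> + |11>)/sqrt(2)
print(reduced_B(bell))                             # I/2: maximally mixed

skew = np.diag([np.sqrt(0.9), np.sqrt(0.1)])       # sqrt(.9)|00> + sqrt(.1)|11>
print(reduced_B(skew))                             # diag(0.9, 0.1): less mixed
```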
The key points:
there’s nothing that Alice can do to her subsystem that affects Bob’s reduced
density matrix. You already have the tools to prove this: just calculate Bob’s
reduced density matrix, then apply a unitary transformation to Alice’s side,
then see if Bob’s density matrix changes. Or have Alice measure her side, and
see if Bob’s reduced density matrix changes.
To review, we’ve seen three different types of states in play, each more
general than the last:
|0i h0| + |1i h1| |+i h+| + |−i h−| |ii hi| + |−ii h−i|
, , or .
2 2 2
⊥ ⊥
More generally we can define the maximally mixed state as |vihv|+|v2 ihv | for
any orthogonal pair of states |vi and |v ⊥ i. The sum of any two of these vectors
on the sphere is the origin.
More generally, we can in this way represent any mixed state as a point
inside of the sphere. A mixture of any states |vi and |wi, represented as points
on the surface of the sphere, will be a point on the line segment connecting
Figure 7.1: The Bloch sphere representation of a qubit state. Antipodal points
on the surface of the sphere correspond to orthogonal states. Pure states live
on the surface of the sphere while mixed states correspond to points in the
interior. The center of the sphere is a special point and corresponds to the
maximally mixed state I/2.
the two.
We can show geometrically that every 1-qubit mixed state can be written as
a mixture of only two pure states. Why? Because you can always draw a line
that connects any pure state you want to some point in the sphere representing
a mixed state, and then see which other pure state the line intersects on its
way out. The point can then be described as some convex combination of the
vectors representing the pure states. This is visually represented in Figure 7.2.
Experimentalists love the Bloch sphere, because it works identically to
how spin works with electrons and other spin-1/2 particles. You can measure
the particle's "spin" relative to any axis of the sphere, and the result will be
that the electron is spinning either clockwise or counterclockwise relative to
the axis. The particle's spin state is literally a qubit, which collapses to one
of the two possible spin states on measurement.
The weird part about spin-1/2 particles is that you could have asked the
direction of the spin relative to any other axis and still would have gotten that
it was either clockwise or counter-clockwise relative to that axis. So what’s
really going on: what’s the real spin direction? Well, the actual state is just
some point on the Bloch sphere, so there is a “real spin direction,” but there’s
also no measurement that reliably tells us that direction. The crazy part here
It turns out that we can prove that a procedure to reliably copy an un-
known quantum state cannot exist. It’s easy to prove, but it’s a fundamental
fact about quantum mechanics. In fact, we already saw one proof: namely,
cloning would imply superluminal communication, which would violate the
No-Communication theorem that you proved in the homework! But let’s see
more directly why cloning is impossible.
Let’s try to clone a single qubit, |ψi = α |0i + β |1i.
In our quantum circuit we want to apply some unitary transformation that
takes |ψi and an ancilla as input, and produces two copies of |ψi as output.
|ψi |ψi
U
|0i |ψi
Algebraically our cloner U would need to implement the transformation

U\big( (\alpha|0\rangle + \beta|1\rangle)\,|0\rangle \big) = (\alpha|0\rangle + \beta|1\rangle)(\alpha|0\rangle + \beta|1\rangle) = \alpha^2|00\rangle + \alpha\beta|01\rangle + \alpha\beta|10\rangle + \beta^2|11\rangle.

The problem? This transformation isn't linear (it has quadratic terms), so it
can't be unitary!
To clarify, a known procedure that outputs some state |ψi can be rerun to
get many copies of |ψi. What the No-Cloning Theorem says is that if |ψi is
given to you but is otherwise unknown then you can’t make a copy of it.
Another clarification: CNOT seems like a copying gate—as it maps |00i →
|00i and |10i → |11i. So why doesn’t it violate the No-Cloning Theorem?
Because it only copies if the input state is |0i or |1i. Classical information can
be copied. Doing CNOT on |+i |0i produces the Bell pair (|00i + |11i)/√2; this sort of
copies the first qubit in an entangled way, but that's different than making a
copy of |+i. Having two qubits in the local state I/2 is not the same as having
two in the state |+i. In general, for any orthonormal basis you can clone the
basis vectors, if you know that your input state is one of them.
Since the No-Cloning Theorem is so important, we’ll present another proof
of it. A unitary transformation can be defined as a linear transformation that
preserves inner product. Which is to say that the angle between |vi and |wi
is the same as the angle between U |vi and U |wi. Thus hv|U † U |wi = hv|wi.
What would a cloning map do to this inner product? Let | hv|wi | = c; then
cloning would send hv| h0| · |wi |0i = hv|wi to hv| hv| · |wi |wi = hv|wi², so we would
need c = c², which holds only if c = 0 or c = 1. So no unitary map can clone two
states that are neither identical nor orthogonal.
The bank maintains a giant database that stores for each bill in circulation
the classical serial number s as well as a string f (s) that encodes what the
quantum state attached to bill is supposed to be.
To verify a bill, you bring it back to the bank. The bank verifies the bill
by looking at the serial number, and then measuring each qubit in the bill
in the basis in which it was supposed to be prepared. That is, if the qubit
was supposed to be |0i or |1i, then measure in the {|0i , |1i} basis; if it was
supposed to be |+i or |−i, then measure in the {|+i , |−i} basis. For each
measurement, check that you get the expected outcome.
Consider a counterfeiter who doesn't know which basis each qubit is sup-
posed to be in, so they guess the bases uniformly at random. They only have
a (1/2)^n chance of making all n guesses correctly. Of course one could imagine
a more sophisticated counterfeiter, but it's possible to prove that regardless of
what the counterfeiter does, if they map a single input bill to two output bills
then the output bills will both pass verification with probability at most (3/4)^n.
Consider, for example, a counterfeiter who measures each qubit in the {|0i , |1i}
basis and prepares two copies of whatever outcome they see.
When the bank goes to measure each qubit they'll find that the ones that
should be in the {|0i , |1i} basis are correct all of the time. But, the ones that
should be in the {|+i , |−i} basis are correct on both bills only 1/4 of the time.
Thus the probability that the counterfeiter succeeds (i.e., that both bills pass
verification) is (5/8)^n.
Here is another attack, which uses the ideas behind the Elitzur-Vaidman bomb
tester. The counterfeiter holds a bill qubit |ψi i ∈ {|0i , |1i , |+i , |−i} whose basis they
want to learn without destroying it. They prepare a control qubit |ci = |0i, rotate it by
a small angle ε, apply a CNOT from |ci onto the bill qubit, and then submit the bill for
verification. If |ψi i = |0i, this gives

\text{CNOT}\big( (\cos\varepsilon\,|0\rangle + \sin\varepsilon\,|1\rangle)\,|0\rangle \big) = \cos\varepsilon\,|00\rangle + \sin\varepsilon\,|11\rangle.    (8.1)

Following the measurement of the bill, most of the time |ci will snap back to
|0i. At each step the probability of getting caught (i.e. failing verification) is
sin²(ε) ≈ ε². Thus the total probability of getting caught after the π/(2ε) iterations
is upper-bounded by (π/(2ε)) · ε² = O(ε) by the union bound. A similar analysis can be
done if |ψi i is |1i or |−i; we're unlikely to get caught, and |ci keeps "snapping
back" to |0i. But if |ψi i = |+i then something different happens; in that case
the CNOT gate has no effect, so |ci gradually rotates from |0i to |1i. So, when
we measure at the end we can distinguish |+i from the other states because
it's the only one that causes |ci to rotate to |1i. By symmetry we can give
analogous procedures to recognize the other three possible states for |ψi i. So
then we just iterate over all n qubits in the bill, learning them one by one just
like in the previous interactive attack on Wiesner's scheme.
Can Wiesner’s scheme be fixed to patch this vulnerability? Yes! The bank
can just give the customer a new bill (of the same value) after each verification,
instead of the bill that was verified.
There’s an additional problem with Wiesner’s scheme, as we’ve seen it.
Namely, it requires the bank to hold a huge amount of information, one secret
for every bill in circulation. However, a paper by Bennett, Brassard, Breidbart
and Wiesner, from 1982, points out how to circumvent this by saying: let f
be a pseudorandom function with a secret key k, so that for any serial number
s, the bank can compute fk (s) for itself rather than needing to look it up. Of
course the bank had better keep k itself secret—if it leaks out then the entire
money system collapses! But assuming that k remains a secret, why is this
secure?
We use a reduction argument. Suppose that the counterfeiter can copy
money by some means. What does that say about fk ? If fk were truly random,
then the counterfeiter wouldn’t have succeeded, by the security of Weisner’s
original scheme. So by checking whether the counterfeiter succeeds, we can
distinguish fk from a random function. So fk wasn’t very good at being pseu-
dorandom! Note that with this change, we give up on information-theoretic
security of the sort that we had with Wiesner’s original scheme. Now we “only”
have security assuming that it’s computationally intractable to distinguish fk
from random. Moreover, a recent result by Prof. Aaronson shows that some
computational assumption is necessary if we don’t want the bank to have to
store a giant database.
However, even after we make the improvements above, Wiesner’s scheme
still has a fundamental problem, which is that to verify a bill you need to take
it back to the bank. If you have to go to the bank, then arguably you might
as well have used a credit card or something instead! The point of cash is
supposed to be that we don’t need a bank to complete a transaction. This
leads to the concept of Public-Key Quantum Money.
• Given that Alice and Bob have a shared key k ∈ {0, 1}^n , Alice can take
  her secret message m ∈ {0, 1}^n and encode it by the ciphertext c = m ⊕ k,
  where ⊕ denotes the bit-wise XOR.
• Bob, after receiving c, can decode the message using his copy of the
  secret key, using the fact that c ⊕ k = m ⊕ k ⊕ k = m.
As its name implies, the One-Time Pad can only be used once securely
with a given key, so it requires a large amount of shared key. In fact, in
the classical world Claude Shannon proved that if they want to communicate
securely, Alice and Bob either need a shared secret key that’s at least as long as
all the messages that they want to send, or else they must make computational
assumptions about the eavesdropper “Eve”. The great discovery of Quantum
Key Distribution (QKD) was that quantum mechanics lets us get secure key
distribution with no need for computational assumptions! We do, however,
need communication channels capable of sending quantum states.
The basic idea is that you’re trying to establish some shared secret knowl-
edge and you want to know for certain that no eavesdroppers on the channel
can uncover it. You’ve got a channel to transmit quantum information and a
channel to transmit classical information. In both eavesdroppers may be able
to listen in (no secrecy). But, in the classical channel we’ll assume you at least
have authentication; Bob knows that any messages really come from Alice and
vice versa. The BB84 protocol proceeds as follows:
• Alice picks two uniformly random n-bit strings x and y. For each i, she prepares a
  qubit encoding the bit x_i in the basis determined by y_i ({|0i , |1i} if y_i = 0,
  {|+i , |−i} if y_i = 1), and sends the qubits to Bob over the quantum channel.
• Bob picks his own uniformly random n-bit string y′ and measures the i-th qubit he
  receives in the basis determined by y′_i, recording the outcome as x′_i.
• Now Alice and Bob share which bases they picked to encode and measure
  the qubits (the strings y and y′). They discard any bits of x and x′
  for which they didn't pick the same basis (which will be about half the
  bits). What remains of x and x′ is now their shared secret key.
They can then use this key with a classical encryption scheme, like the One-Time Pad, or with a scheme that depends
on computational assumptions (if they want to make their shared secret key
last longer).
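Here's a toy simulation of the basis-sifting step, added for illustration; it assumes an ideal channel and no eavesdropper, so it doesn't capture the security analysis at all.

```python
# Added sketch: the sifting step of BB84 (ideal channel, no eavesdropper).
import numpy as np

rng = np.random.default_rng(1)
n = 32
x  = rng.integers(0, 2, n)     # Alice's random bits
y  = rng.integers(0, 2, n)     # Alice's bases (0 = {|0>,|1>}, 1 = {|+>,|->})
yp = rng.integers(0, 2, n)     # Bob's random measurement bases

# With no eavesdropper, Bob's outcome agrees with x whenever the bases match,
# and is an independent fair coin flip whenever they don't.
xp = np.where(y == yp, x, rng.integers(0, 2, n))

keep = y == yp                          # publicly compare bases, keep matches
alice_key, bob_key = x[keep], xp[keep]
print(keep.sum(), "of", n, "positions kept")     # about half
print(np.array_equal(alice_key, bob_key))        # True
```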
Lecture 9: Superdense Coding

In superdense coding, Alice wants to send Bob two classical bits, x and y, while
transmitting only a single qubit. To do this, Alice and Bob pre-share a Bell pair
(|00i + |11i)/√2. Depending on her two bits, Alice applies one of the four operations
I, X, Z, or ZX to her half of the pair; these map the shared state to one of the four
orthogonal states (|00i + |11i)/√2, (|01i + |10i)/√2, (|00i − |11i)/√2, (|01i − |10i)/√2.
She then sends her qubit to Bob.
For Bob to decode this transformation, he'll want to use the transformation

\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \end{pmatrix},    (9.2)

which corresponds to a circuit consisting of a CNOT followed by a Hadamard.
So, Alice transforms the Bell pair into one of the four orthogonal states
above, then Bob decodes that two-qubit state into one of the four possible
combinations of |0i and |1i, corresponding to the original bits x and y. For
example, if Bob receives (|01i − |10i)/√2, then applying the CNOT gets him |1i ⊗ |−i and
the Hadamard then gives him |1i ⊗ |1i. If Bob receives (|00i − |11i)/√2, then applying the
CNOT gets him |0i ⊗ |−i and the Hadamard then gives him |0i ⊗ |1i.
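A quick numerical check (ours, not from the notes) that the decoding transformation of Equation 9.2 sends the four orthogonal states to the four computational basis states, up to an overall sign:

```python
# Added sketch: the decoding step of superdense coding.
import numpy as np

D = np.array([[1,  0, 0,  1],
              [1,  0, 0, -1],
              [0,  1, 1,  0],
              [0, -1, 1,  0]]) / np.sqrt(2)

states = {
    "(|00>+|11>)/sqrt2": np.array([1, 0, 0,  1]) / np.sqrt(2),
    "(|00>-|11>)/sqrt2": np.array([1, 0, 0, -1]) / np.sqrt(2),
    "(|01>+|10>)/sqrt2": np.array([0, 1, 1,  0]) / np.sqrt(2),
    "(|01>-|10>)/sqrt2": np.array([0, 1, -1, 0]) / np.sqrt(2),
}
for name, s in states.items():
    print(name, "->", np.round(D @ s, 6))   # |00>, |01>, |10>, -|11>
```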
Naturally, we could ask: if Alice and Bob had even more pre-shared en-
tanglement, could Alice send an arbitrarily large amount of information by
transmitting only one qubit? There’s a theorem that says no. It turns out
that given a qubit and any number of pre-shared entangled qubits (ebits), you
can send two bits of classical information, but no more. That is, we can write
the inequality

1 qubit + 1 ebit ≥ 2 classical bits.
In quantum teleportation, Alice wants to send Bob a qubit |ψi, but she can only
send him classical bits, together with their pre-shared
entanglement. How should Alice go about this? Once the question is posed,
you can play around with different combinations of operations and you'd even-
tually discover that what works is this:
|ψi  •  H

(|00i + |11i)/√2

Figure 10.1: Quantum circuit for performing the quantum teleportation protocol.
where |ψi = α |0i + β |1i is the state Alice wishes to send. The top two qubits
in the circuit above are Alice’s. At the end, will Alice also have |ψi? No.
A logical consequence of the No-Cloning Theorem is that there can only be
one copy of the qubit. Could we hope for a similar protocol without sending
classical information? No, because of the No-Communication Theorem.
Now let’s analyze the behavior of the circuit in Figure 10.1 in more detail.
The qubit Alice wants to transmit is |ψi = α |0i + β |1i. The combined state
of her qubit, along with the entangled Bell Pair she shares with Bob is
1
√ (α |+00i + α |+11i + β |−10i + β |−01i)
2
1
= (α |000i + α |100i + α |011i + α |111i + β |010i − β |110i + β |001i − β |101i)
2
(10.4)
Finally, Alice measures both of her qubits in the {|0i , |1i} basis. This leads
to four possible outcomes for the state of Bob's qubit, conditioned on her
measurement result:

If Alice sees:          00              01              10              11
Then Bob's qubit is:    α|0i + β|1i     α|1i + β|0i     α|0i − β|1i     α|1i − β|0i
Next, Alice tells Bob her measurement results via a classical channel and
Bob uses the information to “correct” his qubit to |ψi. If the first bit sent
by Alice is 1 then Bob applies Z, and if the second bit sent by Alice is 1
then Bob applies X. These transformations will bring Bob’s qubit to the
state |ψi = α |0i + β |1i. That means they’ve successfully transmitted a qubit
without a quantum channel!
Note this protocol never assumed that Alice knew what |ψi
was.
For the protocol to work, Alice had to measure her syndrome bits and
communicate the result to Bob. These measurements were destructive (since
we can’t ensure that they’ll be made in a basis orthonormal to |ψi), and thus
Alice doesn’t have |ψi at the end. Alice and Bob also “use up” their Bell pair
in the process of teleporting |ψi.
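For concreteness, here's a short simulation of the whole protocol that we've added (qubit ordering and variable names are our own choices):

```python
# Added sketch: simulating the teleportation circuit of Figure 10.1.
import numpy as np

rng = np.random.default_rng(7)
I = np.eye(2); X = np.array([[0, 1], [1, 0]]); Z = np.diag([1, -1])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])

psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)                       # random qubit alpha|0> + beta|1>

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)       # Alice/Bob's shared pair
state = np.kron(psi, bell)                       # qubit order: psi, A, B

state = np.kron(CNOT, I) @ state                 # CNOT from qubit 1 onto qubit 2
state = np.kron(np.kron(H, I), I) @ state        # Hadamard on qubit 1

# Alice measures qubits 1 and 2 in the standard basis.
amps = state.reshape(4, 2)                       # rows: Alice's outcome; cols: Bob
probs = np.sum(np.abs(amps) ** 2, axis=1)
outcome = rng.choice(4, p=probs)
bob = amps[outcome] / np.linalg.norm(amps[outcome])

# Bob's corrections from Alice's two classical bits.
b1, b2 = outcome >> 1, outcome & 1
if b2: bob = X @ bob
if b1: bob = Z @ bob

print(np.abs(np.vdot(bob, psi)))                 # ~1.0: Bob now holds |psi>
```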
FAQ
How do people come up with this stuff ?
Well it’s worth pointing out that quantum mechanics was discovered in
1926 and that quantum teleportation was only discovered in the 90’s. These
sorts of protocols can be hard to find. Sometimes someone tries to prove that
something is impossible, and in doing so eventually figures out a way to get it
done. . .
Aren’t we fundamentally sending infinitely more information than
two classical bits if we’ve sent over enough information to perfectly
describe an arbitrary qubit, since the qubit’s amplitudes could be
arbitrary complex numbers?
In some sense, but at the end of the day, Bob only really obtains the infor-
mation that he can measure, which is significantly less. Amplitudes may “ex-
ist” physically, but they’re different from other physical quantities like length,
in that they seem to act a lot more like probabilities. Like, there’s a state of a
single qubit α |0i + β |1i such that the binary encoding of β corresponds to the
complete works of Shakespeare—the rules of quantum mechanics don't put a
limit on the amount of information that it takes to specify an amplitude. With
that said, we could also encode the complete works of Shakespeare into the
probability that a classical coin lands heads! In both cases, the works of Shake-
speare wouldn’t actually be retrievable by measuring the system, assuming we
didn’t have an immense number of copies of it.
H •
×
×
The final gate in this circuit is a SWAP gate between the last two qubits. Note
that the first and third end up entangled even though there’s never “direct”
contact between them. The second qubit serves as an intermediary.
What does it take for Alice and Bob to get entangled? The
obvious way is for Alice to create a Bell pair and then send
one of the qubits to Bob. In most real-world experiments
the entangled qubits are created somewhere between Alice
and Bob and then one qubit is sent to each.
teleports his half of the Bell pair that he shares with Alice to Diane. The
result of this series of teleportations is that Bob and Diane now both have one
half of a Bell pair—even though the two qubits Bob and Diane possess were
never in causal contact with one another!
The three-qubit GHZ state, (|000i + |111i)/√2, will show up again later in the
course, but for now we'll use it to illustrate an interesting conceptual point.
Let's say that Alice, Bob, and Charlie hold random bits, which are either all
0 or all 1 (so, they're classically correlated). If all three of them get together,
they can see that their bits are correlated and the same is true even if only
two of them are together.
Now suppose instead that the three players share a GHZ state. With all
three of them together they can see that the state is entangled, but what
if Charlie is gone? Can Alice and Bob see that they’re entangled with each
other? No. To see this observe that by the No-Communication Theorem,
Charlie could’ve measured without Alice and Bob knowing. But, if he did,
then Alice and Bob would clearly have classical correlation only: either both
|0i's (if Charlie got the measurement outcome |0i) or both |1i's (if Charlie got
|1i). From this it follows that Alice and Bob have only classical correlation
regardless of whether Charlie measured or not.
A different way to see this is to look at the reduced density matrix of the
state shared by Alice and Bob,

\rho_{AB} = \begin{pmatrix} 1/2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1/2 \end{pmatrix}.

Notice that this is different than the density matrix of a Bell pair,

\rho_{\text{Bell}} = \begin{pmatrix} 1/2 & 0 & 0 & 1/2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1/2 & 0 & 0 & 1/2 \end{pmatrix}.
This is one illustration of a general princi-
ple called the Monogamy of Entanglement.
Simply put, if Alice has a qubit that is max-
imally entangled with Bob’s qubit, then that
qubit can’t also be maximally entangled with
Charlie’s qubit. With the GHZ state, you can
only see the entanglement if you have all three
qubits together. This is sometimes analogized to
the Borromean Rings, an arrangement of three
rings with the property that all three are linked together, but removing any
one ring unlinks the other two.
There are other 3-qubit states that behave differently than the GHZ state.
In the W state, (|100i + |010i + |001i)/√3, there's some entanglement between Alice and
Bob, and there’s some entanglement between Alice and Charlie, but neither
pair is maximally entangled.
In the next lecture, we'll make precise what we mean by
saying Alice and Bob share "some" entanglement, when we talk about how to
quantify entanglement.
Lecture 11: Quantifying Entanglement
How do you quantify how much entanglement there is between two quantum
systems? It’s worth noting that we get to decide what we think a measure of
entanglement ought to mean. We’ve seen how it can be useful to think of Bell
pairs as a resource, so we can phrase the question as “how many ‘Bell pairs of
entanglement’ does a given state correspond to?”
Given a bipartite pure state shared between Alice and Bob,
how do we calculate how many Bell pairs it's worth? Our first observation is that
given any bipartite pure state you can always find a change of basis on Alice's
side and another change of basis on Bob's side that puts the state into the
simpler form

\sum_i \lambda_i\, |v_i\rangle |w_i\rangle,    (11.2)

where the set of states {|vi i} form an orthonormal basis and likewise for the set
of states {|wi i}—though the sets of states {|vi i} and {|wi i} are not necessarily
orthonormal with respect to each other. We call the form of the state in
Equation 11.2 the Schmidt Decomposition or Schmidt Form.
Measuring in the {|vi i |wi i} basis would yield the probability distribution
(|λ0 |², . . . , |λn−1 |²).
Recall the Shannon entropy of a classical probability distribution P = (p0 , . . . , pn−1 ):

H(P) = \sum_{i=0}^{n-1} p_i \log_2 \frac{1}{p_i}.    (11.4)
There's a generalization of Shannon entropy to quantum states
(both pure and mixed), called the von Neumann Entropy. The von Neu-
mann entropy of a mixed state ρ with eigenvalues γ0 , . . . , γn−1 is

S(\rho) = \sum_{i=0}^{n-1} \gamma_i \log_2 \frac{1}{\gamma_i}.    (11.5)

Equivalently, S(ρ) is the minimum, over all measurement bases, of the Shannon
entropy of the resulting outcome distribution:

S(\rho) = \min_{U}\, H\big(\mathrm{diag}(U \rho U^\dagger)\big),

where diag(A) is the length-n vector obtained from the diagonal of the n × n
matrix A. This alternative definition makes it immediately clear that the
von Neumann entropy of any pure state |ψi is 0, because there's always some
measurement basis (namely, a basis containing |ψi) that returns a definite
outcome.
For example, you could choose to measure the state |+i in the {|0i , |1i}
basis and you'll have complete uncertainty and a Shannon entropy of 1. But,
if you measure |+i in the {|+i , |−i} basis you'll have a Shannon entropy of 0
because you'll always get the outcome |+i. As such, the von Neumann entropy
of |+i is 0. By contrast, the von Neumann entropy of the maximally mixed
state I/2 is 1; similarly, the von Neumann entropy of the n-qubit maximally
mixed state is n.
To quantify the entanglement of this state we'll use a measure called the
Entanglement Entropy. The entanglement entropy of a (pure) bipartite
state is given by

E(|\psi\rangle) = S(\rho_A) = S(\rho_B) = H\big( (|\lambda_0|^2, \ldots, |\lambda_{n-1}|^2) \big).    (11.8)
Suppose, for example, that Alice and Bob share the state

|\psi\rangle = \frac{3}{5}\,|0\rangle|+\rangle + \frac{4}{5}\,|1\rangle|-\rangle.

Then we can calculate the entanglement entropy using any of the three equiva-
lent definitions in Equation 11.8. In this case we already have the state written
in Schmidt form, so we can directly use the last definition in Equation 11.8:

E(|\psi\rangle) = \left(\frac{3}{5}\right)^{2} \log_2\left(\frac{5}{3}\right)^{2} + \left(\frac{4}{5}\right)^{2} \log_2\left(\frac{5}{4}\right)^{2} \approx 0.942.
This means that if Alice and Bob shared 1000 copies of |ψi, they’d be able to
teleport about 942 qubits. We can also confirm that the other two definitions
for the entanglement entropy, in terms of the von Neumann entropies of Alice
and Bob’s reduced density matrices, do in fact give the same value. Suppose
Bob measures his state in the {|+i , |−i} basis. Bob sees |+i with probability
9/25, in which case Alice’s qubit is |0i, and he sees |−i with probability 16/25,
in which case Alice’s qubit is |1i. As such, the reduced density matrix for
Alice’s qubit is
\rho_A = \frac{9}{25}|0\rangle\langle 0| + \frac{16}{25}|1\rangle\langle 1|.
Likewise, suppose Alice measures her qubit in the {|0i , |1i} basis. Alice will
see |0i with probability 9/25, in which case Bob’s qubit is |+i, and she’ll see
|1i with probability 16/25, in which case Bob’s qubit is |−i. As such, the
reduced density matrix for Bob’s qubit is
\rho_B = \frac{9}{25}|+\rangle\langle +| + \frac{16}{25}|-\rangle\langle -|.
Both of these reduced density matrices are already diagonalized and so we can
immediately see that they share the same eigenvalues—though they have dif-
ferent eigenvectors. Moreover, the values of the eigenvalues correspond exactly
to the values of the squared coefficients of the Schmidt form of the state. As
such, we’ll find precisely the same value for the entanglement entropy regard-
less of whether we calculate it by finding the Schmidt form of a given state or
by finding one of the reduced density matrices. In practice, this means you
should feel free to use whichever method is most convenient.
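If you want to verify this numerically, here's a short NumPy sketch (ours—the helper names aren't from the course) that computes the entanglement entropy of the example state both from its Schmidt coefficients and from Alice's reduced density matrix:

import numpy as np

def entropy(p):
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

ket0, ket1 = np.array([1., 0.]), np.array([0., 1.])
plus, minus = (ket0 + ket1)/np.sqrt(2), (ket0 - ket1)/np.sqrt(2)

# |psi> = (3/5)|0>|+> + (4/5)|1>|->, stored as a 2x2 matrix of amplitudes
psi = (3/5) * np.outer(ket0, plus) + (4/5) * np.outer(ket1, minus)

# The Schmidt coefficients are the singular values of that amplitude matrix
schmidt = np.linalg.svd(psi, compute_uv=False)
print(entropy(schmidt**2))                 # ~0.942

# Alice's reduced density matrix: trace out Bob, i.e. rho_A = psi psi^dagger
rho_A = psi @ psi.conj().T
print(entropy(np.linalg.eigvalsh(rho_A)))  # same value, as promised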
A mixed state is called entangled if and only if it's not separable. This condition is subtle, as it sometimes happens that a density matrix looks entangled, but it turns out there's some non-obvious decomposition which shows it's actually separable. It's important to note that the converse of the separability criterion in Equation 11.9 is not true. That is, being able to decompose a mixed state into a convex combination of entangled states does not imply that the mixed state is entangled. A simple counterexample is the two-qubit maximally mixed state,
$$\frac{I}{4} = \frac{1}{4}\sum_{i=1}^{4} |\phi_i\rangle\langle\phi_i|,$$
where the $|\phi_i\rangle$'s are each one of the 4 Bell states. This writes the state as a convex combination of entangled states, yet the state is clearly separable, since it can also be written as an equal mixture of the four product states $|00\rangle, |01\rangle, |10\rangle, |11\rangle$.
Lecture 12: Interpretations of Quantum Mechanics
At this point in the course, we're finally in a position to step back and ask,
what is quantum mechanics telling us about reality? It should be no surprise
that there isn’t a consensus on this question (to put it mildly)! But, regard-
less of your own views, it’s important to know something about the various
positions people have defended over the years, as the development of these
positions has sometimes gone hand in hand with breakthroughs in quantum
mechanics. We’ll see an example of this later with the Bell inequality, and ar-
guably quantum computing itself is another example. Most discussions about
the implications of quantum mechanics for our understanding of reality center
around the so-called Measurement Problem. In most physics texts (and in
this class for that matter) measurement is introduced as just a primitive op-
eration whose implications we don’t try to understand more deeply. However,
there’s a fundamental weirdness about measurement in QM, which stems from
the fact that the theory seems to demand both:
I Unitary evolution, in which $|\psi\rangle \to U|\psi\rangle$.
I Measurements, in which a state collapses to some outcome $|i\rangle$ with probability $|\langle\psi|i\rangle|^2$.
While the SUAC ("Shut Up And Calculate") view has some obvious practical advantages, it seems clear that it can't satisfy people's curiosity forever. This is not only because
science has always aspired to understand what the world is like, with exper-
iments and predictions a means to that end. A second reason is that as ex-
perimenters become able to create ever larger and more complicated quantum
superpositions—in effect “breaching” the Copenhagen boundary between the
quantum and classical worlds—it becomes less and less viable to “quarantine”
quantum mechanics as simply a weird mathematical formalism that happens
to work for predicting the behavior of electrons and photons. The more QM
impinges on the world of our everyday experience, the more it seems necessary
to come to terms with whatever it says about that world.
Then, Schrödinger adds some flair by asking what happens if we create a quantum state that corresponds to a superposition of states, in one of which a cat is alive and in the other the cat is dead? (Or perhaps a superposition of happy and sad, $\frac{1}{\sqrt{2}}(|\text{happy}\rangle + |\text{sad}\rangle)$, if you prefer a less grisly thought experiment.)
He isolates the state of the cat from the external environment by putting it in
a box. The point of Einstein and Schrödinger’s thought experiment is that the
formal rules of quantum mechanics apply whenever you have distinguishable
states, regardless of their size. In particular, they say that in principle you can
create arbitrary linear combinations of such states. By the time we’re talking
about something as big as a cat, it seems patently obvious that we should have
to say something about the nature of what’s going on before measurement.
Otherwise we’d devolve into extreme solipsism—saying, for example, that the
cat only exists once we’ve opened the box to observe it.
$$|\text{Wigner's Friend}\rangle \otimes \frac{|\text{Wigner}_0\rangle + |\text{Wigner}_1\rangle}{\sqrt{2}} = \frac{|\text{Wigner's Friend}\rangle|\text{Wigner}_0\rangle + |\text{Wigner's Friend}\rangle|\text{Wigner}_1\rangle}{\sqrt{2}}$$
From Wigner’s point of view, he’s thinking one thought or the other one. But,
from his friend’s point of view, Wigner isn’t thinking either of them until
a measurement gets made. After Wigner’s friend makes a measurement, the
state of his own mind will change to either a state in which he saw that Wigner
wanted pancakes or one where he saw Wigner wanted eggs. At that point we’ll
have an entangled state like
$$\frac{|\text{pancakes}\rangle + |\text{eggs}\rangle}{\sqrt{2}} \;\to\; \frac{|\text{pancakes}\rangle\langle\text{pancakes}| + |\text{eggs}\rangle\langle\text{eggs}|}{2}$$
Note that in principle, there’s a measurement that can distinguish the two
states above.
A dynamical-collapse theory needs to specify some criterion for when collapse actually happens—ideally deriving that criterion from more fundamental laws. Some suggestions include:
I Collapse happens when some number of atoms get involved.
I Collapse happens after a certain total mass is reached.
I Collapse happens when a system reaches a certain level of “complexity.”
One concrete proposal along these lines is the GRW (Ghirardi–Rimini–Weber) theory, in which each particle independently has some tiny probability per second of spontaneously collapsing. Notice that the collapse of even a single qubit in the entangled state
$$\frac{|0\cdots0\rangle + |1\cdots1\rangle}{\sqrt{2}}$$
will cause all of the qubits to collapse to either $|0\cdots0\rangle$ or $|1\cdots1\rangle$. While the probability for some given atom to collapse is minuscule, with a sufficiently large number of atoms in our system (say $\sim 10^{23}$ for a typical cat) the probability that at least one of the atoms has collapsed can be overwhelming. In the GRW proposal, then, macroscopically large superposition states (often called cat-states) are inherently unstable—the bigger the system, the shorter the expected lifetime of a Schrödinger-cat-like state.
Penrose then has further ideas about how all of this might
be related to consciousness, which we won’t go into.
In superconducting circuits, for example, the number of electrons participating in the circulating current can number in the billions or trillions, and so this is an example of a quantum superposition involving billions of particles!
Everett himself argued that, in almost all branches of the wavefunction, observers would see statistics for which the Born rule was obeyed. But, many people in the past half-century have been unsatisfied with that argument, seeing it as circular—as it essentially smuggles the Born rule into the definition of "almost all branches"! So, proponents have continued to look for something better.
There are many arguments, which we won’t go into here, that try to for-
malize the intuition that the Born probabilities are naturally “baked into”
how quantum mechanics works. After all, unitary evolution already singles
out the 2-norm as special by preserving it, so then why shouldn’t the proba-
bilities also be governed by the 2-norm? More pointedly, one can argue that,
if the probabilities were governed by something other than the 2-norm, then
we’d get bizarre effects like faster-than-light communication. But, while these
arguments help explain why the Born rule is perhaps the only choice of prob-
ability rule that makes internal mathematical sense, they still leave slightly
mysterious how probability enters at all into Everett’s vision of a determin-
istically evolving wavefunction. In Everett’s defense, one could ask the same
questions—where do these probabilities come from? why should they follow the
Born rule, rather than some other rule? —in any interpretation, not just in
MWI.
If there’s no experiment that could differentiate the Copenhagen Interpre-
tation from Many Worlds, why bother arguing about it?
Many Worlders say that the opponents of Galileo and Copernicus could
also claim the same about the Copernican versus Ptolemaic theories, since
Copernican heliocentrism made no difference to the predictions of celestial
movement. Today we might say that the Copernican view is better since if
you were to fly outside of the solar system and see all the planets (including
Earth) revolving around the far more massive sun, you'd realize that the Copernican view was closer to reality; it's only our parochial situation of living on Earth
that ever motivated geocentrism in the first place. If we push this analogy
further, it might be harder to think of anything similar for the Many Worlds
interpretation, since quantum mechanics itself explains why we can’t really
get outside of the state |ψi to see the branching—or even get outside our own
branch to interact in any way with the other decoherent branches.
There is one neat way you could imagine differentiating the two, though.
Before we talked about doing the double-slit experiment with larger and larger
systems. Bringing that thread to its logical conclusion, what if we could run
the double-slit experiment with a person going through the slits? It seems
like it would then be necessary to say that “observers” can indeed exist in
superpositions of having one experience and having a different one. This is
what the MWI said all along, but it seems to put a lot of rhetorical strain on
the Copenhagen interpretation. If you talk to modern Copenhagenists about
this they’ll often take a quasi-solipsistic view, saying that if this experiment
were run “the person behaving quantumly doesn’t count as an observer; only
I, the experimenter, do.” Of course, the Wigner’s Friend thought experiment
was trying to get at this same difficulty.
Let’s say I buy into the argument that the universe keeps branching. In
what basis is this branching occurring?
This question is called the Preferred Basis Problem. We talked about Schrödinger's cat as branching into the $|\text{alive}\rangle$ state and the $|\text{dead}\rangle$ state. But mathematically we could equally well have decomposed the cat's state in a basis like:
$$\frac{|\text{alive}\rangle + |\text{dead}\rangle}{\sqrt{2}}, \qquad \frac{|\text{alive}\rangle - |\text{dead}\rangle}{\sqrt{2}}$$
So, is there anything besides our intuition to “prefer” the first decomposi-
tion over the second one? There’s a whole field of physics that tries to answer
questions like these, called Decoherence Theory. The central idea is that
there are certain bases whose states tend to be robust to interactions with the
environment; most bases, however, don’t have this property. In the example
above, decoherence theory would explain that an alive cat doesn't easily decohere if you poke it, but a cat in the $\frac{1}{\sqrt{2}}(|\text{alive}\rangle + |\text{dead}\rangle)$ state does, because the $|\text{alive}\rangle$ and $|\text{dead}\rangle$ branches interact differently with the environment. This, according
to decoherence theory, is more-or-less how the laws of physics pick out certain
bases as being special.
From the standpoint of decoherence theory we can say that an event has
“definitely happened” only if there exist many records of the event spread
through the environment, so that it’s no longer feasible to erase them all.
Lecture 13: Hidden Variables and Bell's Inequality
In a hidden-variable theory like Bohmian mechanics, the idea is that, in addition to the wavefunction, there's also a "real place" where the particle is, even before anyone measures it. To make that work we need to give a rule for how the superposition "guides" the real particle. This rule should have the property that, if anyone does measure the particle, they'll find exactly the result that quantum mechanics predicted for it—since we certainly don't want to give up on quantum mechanics' empirical success!
At first, you might think that it would be tricky to find such a rule; in-
deed, you might wonder whether such a rule is possible at all. However, the
real problem turns out to be more like an embarrassment of riches! There
are infinitely many possible rules that could satisfy the above property—and
by design, they all yield exactly the same predictions as standard quantum
mechanics. So there’s no experimental way to know which one is correct.
To explain this in a bit more detail, let’s switch from particle positions
back to the discrete quantum mechanics that we’re more comfortable with
in this course. Suppose we have a quantum pure state, represented as an
amplitude vector in some fixed basis. Then when we multiply by a unitary
transformation, suppose we want to be able to say: “this is the basis state we
were really in before the unitary was applied, and this is the one we’re really
in afterwards.” In other words, we want to take the equation
$$\begin{pmatrix}\beta_0\\ \vdots\\ \beta_{n-1}\end{pmatrix} = \begin{pmatrix}U_{0,0} & \cdots & U_{0,n-1}\\ \vdots & \ddots & \vdots\\ U_{n-1,0} & \cdots & U_{n-1,n-1}\end{pmatrix}\begin{pmatrix}\alpha_0\\ \vdots\\ \alpha_{n-1}\end{pmatrix} \qquad (13.1)$$
and map it to an equation
$$\begin{pmatrix}|\beta_0|^2\\ \vdots\\ |\beta_{n-1}|^2\end{pmatrix} = S \begin{pmatrix}|\alpha_0|^2\\ \vdots\\ |\alpha_{n-1}|^2\end{pmatrix} \qquad (13.2)$$
for some choice of stochastic matrix S (possibly depending on the input and
output vectors). There are many, many such matrices S. For example, you
could put $[\,|\beta_0|^2, \cdots, |\beta_{n-1}|^2\,]^{\top}$ in every column, which would say that you're always jumping randomly around, but in a way that preserves the Born rule.
You could have been in a different galaxy one Planck time (∼ 10−43 seconds)
ago; now you’re here (with fictitious memories planted in your brain); who
knows where you’ll be a Planck time from now?
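To make the "embarrassment of riches" concrete, here's a small NumPy sketch (ours) of the trivial choice described above—putting the output distribution in every column of S—together with a check that Equation 13.2 holds:

import numpy as np

# A random 3-dimensional unitary U and input amplitude vector alpha
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
alpha = rng.normal(size=3) + 1j * rng.normal(size=3)
alpha /= np.linalg.norm(alpha)
beta = U @ alpha

p_in, p_out = np.abs(alpha)**2, np.abs(beta)**2

# One (of infinitely many) stochastic matrices S with S p_in = p_out:
# put p_out in every column, i.e. "jump to the output distribution
# regardless of where you were".
S = np.tile(p_out.reshape(-1, 1), (1, 3))

print(np.allclose(S @ p_in, p_out))   # True: Equation 13.2 is satisfied
print(np.allclose(S.sum(axis=0), 1))  # columns sum to 1, so S is stochastic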
Bohm, however, thought not about this discrete setting, but mostly about the example of a particle moving around in continuous Euclidean space. In the latter case it turns out that one can do something nice that isn't possible in the discrete setting: there's an essentially canonical, deterministic choice for how the hidden particle position should evolve, and that's the rule Bohm adopted.
When Bohm proposed his interpretation, he was super eager for Einstein
(whose objections to quantum mechanics we’ve discussed previously) to accept
it, but Einstein didn't really go for it, probably because of the nonlocality of Bohm's guiding rule. What Einstein really seems to have wanted (in modern terms) is a
Local Hidden Variable Theory where hidden variables not only exist, but
can be localized to specific points in space and are only influenced by things
happening close to them.
You can think of it this way: the hidden variables would specify, in advance, the outcome of every measurement that could possibly be made (before anything is actually measured), and each qubit carries around its own local copy of the answers.
This is not Bohmian mechanics. In fact, around 1963 John Bell wrote a
paper that drew attention to the nonlocal character of Bohmian mechanics.
Bell remarked that it would be interesting to prove that all hidden variable
theories must be nonlocal. In other words that nonlocality isn’t just some
defect of Bohm’s proposal, but inherent to hidden variable theories in general.
The paper has a footnote saying that as the paper was going to press, such a
proof was found. This was the first announcement of one of the most famous
discoveries ever made about quantum mechanics, what we now call Bell’s
Theorem.
Einstein and others had already touched on the idea of local hidden variable
theories in their philosophical debates in the 1930s. Bell was the first to ask: do
local hidden variables have any empirical consequences that disagree with the
predictions of quantum mechanics? Is there an actual experiment that could
rule out the possibility of local hidden variables? Bell came up with such an
experiment. We’ll describe it differently from how Bell did originally—more
computer sciencey—as a game with two cooperating players named (what
else?) Alice and Bob, where the achievable win probability can be improved
through shared entanglement (to a value higher than is possible classically).
This game is called the CHSH Game.
The idea is that Alice and Bob are placed in separate rooms and are both
given a challenge bit (x and y, respectively) by a referee, Charlie. The challenge
bits are chosen uniformly at random, and independently of each other. Alice
sends an answer bit, a, back to the referee and Bob sends back an answer bit b. Alice and Bob "win" the game iff
$$a + b = xy \pmod 2.$$
Figure 13.1: Diagrammatic depiction of the CHSH game. Charlie, the referee,
prepares two challenge bits x and y uniformly at random and sends them
to Alice and Bob respectively. Alice and Bob in response send bits a and b
respectively (which could depend on their inputs) back to Charlie with the
goal of having a + b = xy (mod 2). In other words, Alice and Bob want to
select bits such that the parity of their selected bits is equal to the AND of
their input bits.
Classically, we claim, the best Alice and Bob can do is win 75% of the time. It suffices to analyze deterministic strategies, since a shared random string just amounts to a probability distribution over deterministic strategies—so the argument holds true even if we assume that Alice and Bob have access to shared randomness. Let's treat Alice's output bit a as a function of her input bit x and Bob's output bit b as a function of his input bit y. In order to win the game they need to select bits a and b satisfying the equation $a + b = xy \pmod 2$. Table 13.1 enumerates what happens for some representative deterministic strategies.
Strategy            x  y  a  b  a+b (mod 2)  xy
Always Send 0       0  0  0  0      0         0
                    0  1  0  0      0         0
                    1  0  0  0      0         0
                    1  1  0  0      0         1   *
Always Send 1       0  0  1  1      0         0
                    0  1  1  1      0         0
                    1  0  1  1      0         0
                    1  1  1  1      0         1   *
Same as Input       0  0  0  0      0         0
                    0  1  0  1      1         0   *
                    1  0  1  0      1         0   *
                    1  1  1  1      0         1   *
Opposite of Input   0  0  1  1      0         0
                    0  1  1  0      1         0   *
                    1  0  0  1      1         0   *
                    1  1  0  0      0         1   *
Table 13.1: Enumeration of deterministic strategies for the CHSH game. Rows where a + b ≠ xy (mod 2)—that is, rows where the strategy loses—are marked with an asterisk.
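If you'd rather not check the table by hand, here's a quick brute-force sketch (ours, in Python) over all 16 deterministic strategies—including the asymmetric ones not shown in Table 13.1—confirming that nothing beats 3/4:

from itertools import product

best = 0.0
# A deterministic strategy is a pair of functions a(x) and b(y),
# each specified by its outputs on inputs 0 and 1.
for a0, a1, b0, b1 in product([0, 1], repeat=4):
    wins = sum(((a0 if x == 0 else a1) + (b0 if y == 0 else b1)) % 2 == x * y
               for x, y in product([0, 1], repeat=2))
    best = max(best, wins / 4)

print(best)  # 0.75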
The Bell Inequality, in this framework, is just the slightly boring state-
ment that we proved above. Namely, that the maximum classical win proba-
bility in the CHSH game is 75%. Bell noticed an additional fact, though. If
Alice and Bob have access to a pre-shared Bell pair, $\frac{|00\rangle + |11\rangle}{\sqrt{2}}$, then there's a better strategy. In that case, in fact, their maximum win probability is
$$P = \cos^2\frac{\pi}{8} \approx 85\%.$$
How do they use entanglement to achieve an improvement over the classical
win probability? Tune in next time to find out!
Lecture 14: Nonlocal Games
Here's Alice and Bob's better strategy. They share a Bell pair $\frac{|00\rangle + |11\rangle}{\sqrt{2}}$, and their measurements will use the following rotated basis states:
$$|\pi/8\rangle = \cos\frac{\pi}{8}\,|0\rangle + \sin\frac{\pi}{8}\,|1\rangle, \qquad |5\pi/8\rangle = \cos\frac{5\pi}{8}\,|0\rangle + \sin\frac{5\pi}{8}\,|1\rangle \qquad (14.1)$$
$$|-\pi/8\rangle = \cos\!\left(-\frac{\pi}{8}\right)|0\rangle + \sin\!\left(-\frac{\pi}{8}\right)|1\rangle, \qquad |3\pi/8\rangle = \cos\frac{3\pi}{8}\,|0\rangle + \sin\frac{3\pi}{8}\,|1\rangle \qquad (14.2)$$
The states correspond to a rotation of the standard basis either π/8 radians counterclockwise, as in Equation 14.1, or π/8 radians clockwise, as in Equation 14.2. We actually saw at least one of these states in a different context back in Section 5.2, when discussing the distinguishability of quantum states.
The strategy is this: if x = 0, Alice measures her half of the Bell pair in the $\{|0\rangle, |1\rangle\}$ basis, and if x = 1 she measures in the $\{|+\rangle, |-\rangle\}$ basis; if y = 0, Bob measures his half in the $\{|\pi/8\rangle, |5\pi/8\rangle\}$ basis, and if y = 1 he measures in the $\{|-\pi/8\rangle, |3\pi/8\rangle\}$ basis. In each case they output 0 for the first basis state and 1 for the second. Consider first the case x = y = 0, and suppose Alice sees $|0\rangle$. Then Bob's qubit collapses to $|0\rangle$, and when he measures in the $\{|\pi/8\rangle, |5\pi/8\rangle\}$ basis he sees $|\pi/8\rangle$ or $|5\pi/8\rangle$ with probabilities
$$P(|\pi/8\rangle) = |\langle 0|\pi/8\rangle|^2 = |\cos(\pi/8)|^2 = \cos^2\frac{\pi}{8}, \qquad P(|5\pi/8\rangle) = |\langle 0|5\pi/8\rangle|^2 = |\cos(5\pi/8)|^2 = |\sin(\pi/8)|^2 = \sin^2\frac{\pi}{8}. \qquad (14.3)$$
Now suppose Alice sees |1i. Then Bob’s qubit collapses to |1i and when
he measures in the {|π/8i , |5π/8i} basis he sees either |π/8i or |5π/8i with
probabilities
$$P(|\pi/8\rangle) = |\langle 1|\pi/8\rangle|^2 = |\sin(\pi/8)|^2 = \sin^2\frac{\pi}{8}, \qquad P(|5\pi/8\rangle) = |\langle 1|5\pi/8\rangle|^2 = |\sin(5\pi/8)|^2 = |\cos(\pi/8)|^2 = \cos^2\frac{\pi}{8}. \qquad (14.4)$$
Recall that the goal is to output a and b such that a + b = xy (mod 2). In
this first case xy = 0 so Alice and Bob win if they return either a = b = 0 or
a = b = 1. The probability they win is thus given by
$$P(a{=}0)\,P(b{=}0\,|\,a{=}0) + P(a{=}1)\,P(b{=}1\,|\,a{=}1) = \frac{1}{2}\cos^2\frac{\pi}{8} + \frac{1}{2}\cos^2\frac{\pi}{8} = \cos^2\frac{\pi}{8}.$$
The analysis for the case where x = 0 and y = 1 and the case where
x = 1 and y = 0 is very similar to the case above, so we’ll leave it to the
reader to verify that the protocol indeed succeeds with probability cos2 (π/8)
in those cases. Somewhat more interesting is the case where x = y = 1. In
order to analyze this case it is useful to note that the Bell state takes the
same general form regardless of which basis we choose to right it in. In other
words, |00i+|11i
√
2
= |vvi+|wwi
√
2
for any orthonormal basis {|vi , |wi}. In this case
we’ll choose to write it as |++i+|−−i
√
2
. Therefore, when Alice measures in the
{|+i , |−i} basis she sees both |+i and |−i with equal probability. In either
case, Bob’s qubit collapses to the same outcome. Suppose Alice sees |+i. Then
when Bob measures his qubit in the {|−π/8i , |3π/8i} basis he sees |−π/8i with
probability
$$P(|-\pi/8\rangle) = |\langle +|{-\pi/8}\rangle|^2 = \sin^2\frac{\pi}{8}, \qquad (14.5)$$
and he sees $|3\pi/8\rangle$ with probability
$$P(|3\pi/8\rangle) = 1 - P(|-\pi/8\rangle) = 1 - \sin^2\frac{\pi}{8} = \cos^2\frac{\pi}{8}. \qquad (14.6)$$
Likewise, when Alice sees $|-\rangle$, Bob sees either $|-\pi/8\rangle$, with probability
$$P(|-\pi/8\rangle) = |\langle -|{-\pi/8}\rangle|^2 = \cos^2\frac{\pi}{8}, \qquad (14.7)$$
or $|3\pi/8\rangle$, with probability
$$P(|3\pi/8\rangle) = 1 - P(|-\pi/8\rangle) = 1 - \cos^2\frac{\pi}{8} = \sin^2\frac{\pi}{8}. \qquad (14.8)$$
The win condition for Alice and Bob when x = y = 1 is for a and b to have odd parity. As such, the win probability is
$$P(a{=}0)\,P(b{=}1\,|\,a{=}0) + P(a{=}1)\,P(b{=}0\,|\,a{=}1) = \frac{1}{2}\cos^2\frac{\pi}{8} + \frac{1}{2}\cos^2\frac{\pi}{8} = \cos^2\frac{\pi}{8},$$
so Alice and Bob win with probability $\cos^2\frac{\pi}{8} \approx 85\%$ in this case too.
You may think that maybe, if we came at it another way, we could use entanglement to win even more than 85% of the time, perhaps even 100% of the time. Surprisingly, the $\cos^2\frac{\pi}{8}$ win probability turns out to be optimal for quantum strategies, even if Alice and Bob share unlimited
amounts of entanglement. This result is known as Tsirelson's Inequality, or Tsirelson's Bound. Here's the idea of the argument, at least for strategies in which Alice and Bob each measure their half of a shared Bell pair in a rotated basis: say Alice measures at angle $\theta_x$ on input x, Bob measures at angle $\phi_y$ on input y, and each outputs 0 or 1 according to which basis state they see. Then the win probability is
$$P(\text{win}) = \frac{1}{4}\left[\cos^2(\theta_0 - \phi_0) + \cos^2(\theta_0 - \phi_1) + \cos^2(\theta_1 - \phi_0) + \sin^2(\theta_1 - \phi_1)\right]. \qquad (14.9)$$
Each of the four input pairs has an equal chance of occurring. In the first
three cases, Alice and Bob win iff they output the same bit. The probability of
this is given by the squared inner product of corresponding measurement basis
elements, which is equal to the squared cosine of the difference between their
measurement angles. In the fourth case, Alice and Bob win iff they output
different bits. As such, we take the squared sine of the difference between their
measurement angles. Using the power-reduction identity from trigonometry,
we can rewrite Equation 14.9 as
$$P(\text{win}) = \frac{1}{2} + \frac{1}{8}\left[\cos(2(\theta_0 - \phi_0)) + \cos(2(\theta_0 - \phi_1)) + \cos(2(\theta_1 - \phi_0)) - \cos(2(\theta_1 - \phi_1))\right]. \qquad (14.10)$$
We can then get rid of the 2's inside the cosines by folding them into our original angles: let $u_x$ be the unit vector in the plane at angle $2\theta_x$ and $v_y$ the unit vector at angle $2\phi_y$, so that $\cos(2(\theta_x - \phi_y)) = u_x \cdot v_y$. In that case, we can rewrite the above as
$$\frac{1}{2} + \frac{1}{8}\left[u_0\cdot v_0 + u_0\cdot v_1 + u_1\cdot v_0 - u_1\cdot v_1\right] = \frac{1}{2} + \frac{1}{8}\left[u_0\cdot(v_0 + v_1) + u_1\cdot(v_0 - v_1)\right]. \qquad (14.11)$$
Since $u_0$ and $u_1$ are unit vectors, the Cauchy–Schwarz inequality gives
$$\frac{1}{2} + \frac{1}{8}\left[u_0\cdot(v_0 + v_1) + u_1\cdot(v_0 - v_1)\right] \le \frac{1}{2} + \frac{1}{8}\left[\|v_0 + v_1\| + \|v_0 - v_1\|\right]. \qquad (14.12)$$
Using the parallelogram law ($2\|x\|^2 + 2\|y\|^2 = \|x + y\|^2 + \|x - y\|^2$), together with the fact that $v_0$ and $v_1$ are unit vectors, we get
$$\frac{1}{2} + \frac{1}{8}\left[\|v_0 + v_1\| + \|v_0 - v_1\|\right] = \frac{1}{2} + \frac{1}{8}\left[\|v_0 + v_1\| + \sqrt{4 - \|v_0 + v_1\|^2}\,\right]. \qquad (14.13)$$
Finally, the quantity $t + \sqrt{4 - t^2}$ is maximized at $t = \sqrt{2}$—that is, when $v_0$ and $v_1$ are orthogonal—so we find
$$P(\text{win}) \le \frac{1}{2} + \frac{1}{8}\left[\sqrt{2} + \sqrt{2}\right] = \frac{1}{2} + \frac{1}{8}\left[2\sqrt{2}\right] = \frac{1}{2} + \frac{1}{2\sqrt{2}} = \cos^2\frac{\pi}{8}. \qquad (14.14)$$
So $\cos^2\frac{\pi}{8}$ really is the maximum winning probability for the CHSH game.
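As a numerical sanity check (ours, in NumPy), we can evaluate Equation 14.9 at the angles used in the strategy above, and scan random angles to confirm that nothing beats $\cos^2(\pi/8)$:

import numpy as np

def win_prob(t0, t1, f0, f1):
    """Equation 14.9: CHSH win probability when Alice measures at angle
    theta_x and Bob at angle phi_y on a shared Bell pair."""
    return (np.cos(t0 - f0)**2 + np.cos(t0 - f1)**2
            + np.cos(t1 - f0)**2 + np.sin(t1 - f1)**2) / 4

# The strategy from the text: Alice uses angles 0 and pi/4,
# Bob uses pi/8 and -pi/8.
print(win_prob(0, np.pi/4, np.pi/8, -np.pi/8))   # ~0.8536
print(np.cos(np.pi/8)**2)                        # same number

# Random search over angles never does better (Tsirelson's bound).
rng = np.random.default_rng(1)
angles = rng.uniform(0, 2*np.pi, size=(100000, 4))
print(win_prob(*angles.T).max())                 # stays <= ~0.8536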
When the CHSH experiment was actually performed with entangled photons—most famously by Alain Aspect's group in the early 1980s—the classical 75% bound was indeed violated, just as quantum mechanics predicts. This was evidence, not only that local realism was false, but also that entanglement had been created. Most physicists shrugged, already sold on
quantum mechanics (and on the existence of entanglement). But, a few, still
committed to a classical view of the world, continued to look for loopholes
in the experiment. Skeptics pointed out two main loopholes in the existing
experiments, essentially saying “if you squint enough, classical local realism
might still be possible”:
I Detection Loophole
– Sometimes detectors fail to detect a photon, or they detect non-
existent photons (called “dark counts”). Enough noise in the ex-
periments turns out to make a local hidden-variable explanation
possible again.
I The Locality Loophole
– Performing the measurements and storing their results in a com-
puter memory takes some time, maybe nanoseconds or microsec-
onds. Now, unless Alice and Bob and the referee are very far away
from each other, this opens the possibility of a sort of “local hidden
variable conspiracy,” where as soon as Alice measures, some par-
ticle (unknown to present-day physics) flies over to Bob and says
“hey, Alice got the measurement outcome 0, you should return the
measurement outcome 0 too.” The particle would travel only at
the speed of light, yet could still reach Bob before his computer
registered the measurement outcome.
By the 2000s physicists were able to close the locality loophole, but only
in experiments still subject to the detection loophole and vice versa. Finally,
in 2015 several teams managed to do experiments that closed both loopholes
simultaneously.
There are still people who deny the reality of quantum entanglement, but
through increasingly solipsistic arguments. The last stronghold for these skep-
tics is the idea of Superdeterminism. Superdeterminism and the related
"Freedom-of-Choice Loophole" explain the results of CHSH experiments by
saying “we only think Alice and Bob can choose measurement bases randomly.
In actuality, there’s a grand cosmic conspiracy involving all of our brains, our
computers, and our random number generators, with the purpose of rigging
the measurement bases to ensure that Alice and Bob can win the CHSH game
85% of the time. But that’s all this cosmic conspiracy does! It doesn’t allow
FTL communication or anything like that, even though it easily could.” Nobel
Laureate Gerard ’t Hooft (the ’t is pronounced like “ut”) advocates superde-
terminism, so it’s not like the idea lacks distinguished supporters, at the very
least.
In the Odd Cycle game, Alice and Bob claim to have a two-coloring of a cycle graph with an odd number n of vertices (which is of course impossible). To test the claim, the referee either asks them both about the same vertex, in which case they must answer with the same color, or asks them about two adjacent vertices, in which case they must answer with different colors. The referee chooses between these tests with equal probability—and crucially, he doesn't tell Alice or Bob which test he's performing. In a single run of the game, the referee performs one such test, and gets answers from Alice and Bob. We'll assume their answers are always RED or BLUE.
What strategy provides the best probability that Alice and Bob will pass the
referee’s test and win the game? Classically we know that, regardless of what
Alice and Bob do, P (win) < 1. Why? One can show that for Alice and Bob
to answer all possible challenges correctly they’d need an actual two-coloring,
which is impossible. The best they can do is agree on a coloring for all but
one of the vertices, which gives them a win probability of
$$P(\text{win}) = 1 - \frac{1}{2n}. \qquad (14.15)$$
Figure 14.1: A 5-node cycle with a partial two-coloring. Notice that there is
no consistent way to fill in a color for the node labeled “?”.
Using entanglement, however, Alice and Bob can do much better: there's a quantum strategy, similar in spirit to the CHSH strategy, for which the probability of losing is only
$$P(\text{lose}) = 1 - \cos^2\frac{\pi}{2n} = \sin^2\frac{\pi}{2n} \approx \left(\frac{\pi}{2n}\right)^2 = O\!\left(\frac{1}{n^2}\right). \qquad (14.17)$$
In the Magic Square game, Alice and Bob claim to have a 3×3 grid of ±1 entries in which every row contains an even number of −1's and every column contains an odd number of −1's. The referee sends Alice a uniformly random row and Bob a uniformly random column; Alice returns entries for her row, Bob returns entries for his column, and they win if both parity constraints are satisfied and their answers agree on the cell where the row and column intersect.
As with the Odd Cycle game, there is no classical strategy that allows Alice
and Bob to win with certainty, as that would require an actual assignment of
entries on the grid that satisfies all the constraints on the rows and columns.
The constraints on the rows require the total number of −1’s in the grid to
be even, while the constraints on the columns require the number to be odd.
This implies that there’s no classical strategy that lets Alice and Bob win the
game with probability 1.
Nevertheless, David Mermin discovered a quantum strategy where Alice
and Bob win with probability 1. This strategy requires them to share 2 ebits (i.e., two Bell pairs).
Lecture 15: Einstein-Certified Randomness
Until recently, the Bell inequality was taught because it was historically and
conceptually important, not because it had any practical applications. Sure,
it establishes that you can’t get away with a local hidden variable theory,
but in real life, no one actually wants to play the CHSH game, do they?
Recently, however, the Bell inequality has found applications in one of the most
important tasks in computing and cryptography: the generation of guaranteed
random numbers.
That’s what makes so interesting (and non-obvious) that the Bell inequality
lets us certify numbers as being truly random under very weak assumptions.
These assumptions basically boil down to “no faster-than-light travel is possi-
ble.” Let’s now explain how.
Suppose you have two boxes that share quantum entanglement. We’ll
imagine the boxes were designed by your worst enemy, so you trust nothing
about them. All we’ll assume is that the boxes can’t send signals back and
forth, say because you put them in Faraday cages, or separated them by so
large a distance that light doesn’t have enough time to travel between them
during the duration of your interaction. A referee sends the boxes challenge
numbers, x and y and the boxes return numbers a and b. If the returned
numbers pass a test, we’ll declare them to be truly random.
So what’s the trick? Well, we already saw the trick; it’s just the CHSH
game! The usual way to present the CHSH game is as a way for Alice and Bob
to prove that they share entanglement, and thus that the universe is quantum-
mechanical and that local hidden-variable theories are false. However, winning
the CHSH game more than 75% of the time also establishes that a and b must
have some randomness, and that there was some amount of entropy generated.
Why? Because suppose instead that a and b were deterministic functions.
That is, suppose they could be written as a(x, r) and b(y, r) respectively, in
terms of Alice and Bob’s inputs as well as shared random bits. In that case,
whatever these functions were, they’d define a local hidden-variable theory,
which is precisely what Bell’s Theorem rules out! So the conclusion is that,
if x and y are random and there’s no communication between Alice and Bob,
then there must exist at least some randomness in the outputs a and b.
Around 2012, Umesh Vazirani coined the term Einstein-Certified Ran-
domness for this sort of thing. The basic idea goes back earlier, for example,
to Roger Colbeck’s 2006 PhD thesis and (in cruder form) to Prof. Aaronson’s
2002 review of Stephen Wolfram’s “A New Kind of Science,” which used the
idea to refute Wolfram’s proposal for a deterministic hidden-variable theory
underlying quantum mechanics.
OK, so how do we actually extract random bits from the results of the
CHSH game? You could just take the stream of all the a’s and b’s that are
outputted after many plays of the CHSH game. Admittedly, this need not give
us a uniform random string. In other words, if the output string has length
n, then its Shannon entropy,
$$\sum_x p_x \log_2\frac{1}{p_x}, \qquad (15.1)$$
where $p_x$ is the probability of string x, will in general be less than n. However, we can then convert x into an (almost) uniformly random string on a smaller number of bits, say $\frac{n}{10}$ or something, by using a well-known tool from classical theoretical computer science called a randomness extractor. A randomness
extractor is a function that crunches down many sort-of-random bits (and,
typically, a tiny number of truly random bits, called the seed ) into a smaller
number of very random bits.
OK, but there’s an obvious problem with this whole scheme. Namely, we
needed the input bits to be uniformly random in order to play the CHSH
game. But, that means we put in two perfect random bits, x and y, in order
to get out two bits a and b that are not perfectly random! In other words, the
entropy we put in is greater than the entropy we get out, and the whole thing
is a net loss. A paper by Pironio et al. addressed this by pointing out that you
don’t have to give Alice and Bob perfectly random bits every time the CHSH
game is played. Instead, you can just input x = y = 0 most of the time, and
occasionally stick in some random x’s and y’s to prevent Alice and Bob from
using hidden variables. Crucially, if Alice or Bob gets a 0 input in a given
round, then they have no way of knowing whether that round is for testing
or for randomness generation. So, if they want to pass the randomly-inserted
tests, then they’ll need to play the CHSH game correctly in all the rounds (or
almost all of them), which means generating a lot of randomness.
At this point it all comes down to a quantitative question of how much
entropy can we get out, per bit of entropy that we put in? There was a race
to answer this by designing better and better protocols that got more and
more randomness out per bit of randomness invested. First, Colbeck showed
how to get cn bits out for n bits in, for some constant c > 1. Then, Pironio
et al. showed how to get $\sim n^2$ bits out per n bits in. Then, Vazirani and Vidick showed how to get $\sim e^{\sqrt{n}}$ bits out per n bits in, which is the first
time we had exponential randomness expansion. But, all this time an obvious
question remained in the background: why not just use a constant amount
of randomness to jump-start the randomness generation, and then feed the
randomness outputted by Alice and Bob back in as input, and so on forever,
thereby getting unlimited randomness out?
It turns out that a naïve way of doing this doesn't work. If you just feed
Alice and Bob the same random bits that they themselves generated, then
they’ll recognize those bits, so they won’t be random to them. This will allow
Alice and Bob to cheat, making their further outputs non-random.
If you don’t have a limit on the number of devices used, then a simple fix
for this problem is to feed Alice and Bob’s outputs to two other machines,
Charlie and Diane. Then you can feed Charlie and Diane’s outputs to two
more machines, Edith and Fay, and so on forever, getting exponentially more
randomness each time. But what if we have only a fixed number of devices
(like 4, or 6) and we still want unlimited randomness expansion? In that case,
a few years ago Coudron and Yuen, and independently Chung, Miller, Shi, and
Wu, figured out how to use the additional devices as “randomness laundering
machines.” These extra machines are used to convert random bits that Alice
and Bob can predict into random bits that they can’t predict, so that the
output bits can be fed back to Alice and Bob for further expansion.
One question that these breakthrough works didn’t address was exactly how
many random seed bits are needed to jump-start this entire process. Like, are
we talking a billion bits or 2 bits? In a student project supervised by Prof.
Aaronson, Renan Gross calculated the first explicit upper bound, showing that
a few tens of thousands of random bits suffice. That’s likely still far from the
truth, and finding a tighter upper-bound is still an open question. For all we
know it might be possible with as few as 10 or 20 random bits.
It’s also worth mentioning that this sort of protocol has already been ex-
perimentally demonstrated at NIST, and indeed is part of what’s used for
NIST’s public randomness beacon.
Having seen lots of quantum protocols, we’re finally ready to tackle the holy
grail of the field: a programmable quantum computer, a single machine that
could do any series of quantum-mechanical operations. Quantum computation
has two intellectual origins. One comes from David Deutsch, who was think-
ing about experimentally testing the Many-Worlds Interpretation (of which
he was and remains a firm believer) during his time as a postdoc here at UT
Austin.¹ Much like with Wigner's friend, Deutsch imagined creating an equal
superposition of a brain having measured a qubit as |0i, and the same brain
having measured the qubit as |1i. In some sense, if you ever had to ascribe
such a superposition state to yourself then you’d have gone beyond the Copen-
hagen Interpretation. But, how could we ever test this? Given the practical
impossibility of isolating all the degrees of freedom in a human brain, the first
step would presumably have to be: take a complete description of a human
brain, and upload it to a computer. Then, the second step would be to put the
computer into a superposition of states corresponding to different thoughts.
This then naturally leads to more general questions: could anything as
complicated as a computer be maintained in a superposition of “semantically
different” states? How would you arrange that, in practice? Would such a
computer be able to use its superposition power to do anything that it couldn’t
do otherwise?
The other path to the same questions came from Richard Feynman, who
gave a famous lecture in 1982 concerned with the question how do you sim-
ulate quantum mechanics on a classical computer? Chemists and physicists
had known for decades that this is hard, essentially because the number of
amplitudes that you need to keep track of increases exponentially with the
number of particles. This is the case because, as we know, an n-qubit state in general requires $2^n$ amplitudes to specify.
¹David Deutsch is arguably more famous, however, for his pivotal role in the Avengers' defeat of the evil tyrant Thanos, in the movie Avengers Endgame.
One thing to stress at the outset is that quantum computers would not change what's computable in general. As such, they certainly can't let us solve the halting problem or anything of that kind.
Why does each gate act on only a few qubits? Where is this assumption
coming from?
It’s similar to how classical computers don’t have gates act on arbitrar-
ily large numbers of bits, and instead use small gates like AND, OR, and
NOT to build up complex circuits. The laws of physics provide us with local
interactions—one particle colliding with another one, two currents flowing into
a transistor, etc. . . —and it’s up to us to string together those local interactions
into something more complicated.
In the quantum case, you could imagine a giant unitary matrix U , which
takes qubits, interprets them as encoding an instance of the 3SAT problem (or
some other NP-complete problem) and then CNOTs the answer (1 for yes, 0
for no) into an extra qubit. That U formally exists and is formally allowed
by the rules of quantum mechanics. But how would you go about actually
implementing it in practice? Well, you’d need to build it up out of smaller
components—say, components that act on only a few qubits at a time.
It turns out that it’s possible to implement any unitary U you want using
exponentially many simple components (we’ll say more about that later in
the lecture). The question that will most interest us in quantum computing
theory is which U ’s can be implemented using only polynomially many simple
components. Those are the U ’s that we’ll consider feasible or efficient.
What is the role of interference in quantum computing?
Quantum amplitudes can cancel each other out; that’s the most important
way in which they differ from classical probabilities. The goal in quantum
computing is always to choreograph a pattern of interference such that, for
each wrong answer, some of the contributions to its amplitude are positive and
others are negative (or, for complex amplitudes, they point every which way
in the complex plane), so on the whole they interfere destructively and cancel
each other out. Meanwhile, the contributions to the correct answer’s amplitude
should interfere constructively (say, by pointing along the same direction in
the complex plane). If we can arrange that, then when we measure, the right
answer will be observed with high probability.
Note that if it weren’t for interference, then we might as well have just used
a classical computer with a random-number generator and saved the effort
of building a quantum computer. In that sense, all quantum-computational
advantage relies on interference.
What is the role of entanglement in quantum computing?
In some sense, we can develop the entire theory of quantum computing
without ever talking about entanglement. However, pretty much any time a quantum algorithm gets a real advantage, entanglement shows up: if the state of the computer remained a product state at every time step, of the form
$$(\alpha_0|0\rangle + \beta_0|1\rangle) \otimes (\alpha_1|0\rangle + \beta_1|1\rangle) \otimes \cdots \otimes (\alpha_{n-1}|0\rangle + \beta_{n-1}|1\rangle),$$
then it could be written down using just 2n amplitudes and simulated efficiently by a classical computer.
How complicated can Boolean functions be to implement? Here's a counting argument. For circuits with T gates (drawn from some fixed gate set) to be able to represent most Boolean functions on n bits, the number of such circuits—roughly $(N + T)^{O(T)}$—must be at least the number of functions, $2^{2^n}$. Taking logarithms, this requires (up to constant factors)
$$2^n = T\log T. \qquad (16.4)$$
This is satisfied for $T \approx \frac{2^n}{n}$. Moreover, if T is smaller, then $(N + T)^{O(T)}$ is minuscule compared to $2^{2^n}$, so almost every function must require a circuit of size $\sim \frac{2^n}{n}$.
A set of classical gates is called universal if, by composing them, we can compute any Boolean function on any number of bits. For example, the NAND gate by itself is universal. The
diagram in Figure 16.1 shows how you’d construct an OR gate out of NANDs.
You can also work out how to construct an AND gate and a NOT gate, and
from there you can get anything else. By contrast, the set {AND, OR} is not
universal, because it can only express monotone Boolean functions; that is,
changing an input bit from 0 to 1 can never change an output bit from 1 to
0. Likewise, the set {NOT, XOR} is not universal, because it can only express
Boolean functions that are linear or affine mod 2. So, while “most” gate sets
are universal, being universal isn’t completely automatic.
A Toffoli gate with control bits A and B and target bit initialized to 1 maps the target to $1 \oplus AB = \neg(AB)$, i.e., NAND(A, B).
Figure 16.2: Circuit for simulating the effect of NAND using a Toffoli gate along with an ancillary bit.
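As a quick check of Figure 16.2, here's a tiny Python sketch of ours (nothing from the notes) verifying that a Toffoli gate with its target initialized to 1 computes NAND of the two control bits:

from itertools import product

def toffoli(a, b, c):
    """Toffoli (CCNOT): flips c iff a = b = 1."""
    return a, b, c ^ (a & b)

for a, b in product([0, 1], repeat=2):
    _, _, out = toffoli(a, b, 1)      # ancilla initialized to 1
    assert out == 1 - (a & b)         # 1 xor ab = NOT(a AND b) = NAND(a, b)
print("Toffoli with target 1 computes NAND")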
Note that this argument does not yet establish that Toffoli is a universal
reversible gate in the sense we defined above, because for that we need to
implement all possible reversible transformations, not just compute all Boolean
functions. However, a more careful argument, part of which we’ll give later in
the course, shows that Toffoli is universal in that stronger sense as well.
A good example of a gate that separates the two kinds of universality is
the Fredkin Gate, which is also called Controlled-SWAP or CSWAP. This is
a 3-bit gate that swaps the second and third bits iff the first bit is set to 1.
Just like the Toffoli gate, the Fredkin gate can simulate a NAND gate (we’ll
leave the construction as an exercise for the reader) and is therefore capable
of universal computation. However, the Fredkin gate is not universal in the
stronger sense, because it can never change the total number of 1’s in the
input string, also called the Hamming weight. For example, it’s impossible to
compose any number of Fredkin gates in order to map the string 000 to 111.
A gate with the property that it always preserves the Hamming weight of the
input is called conservative.
There are also reversible gates that are not even capable, on their own, of
universal computation. An important example is the CNOT gate, which we
saw earlier. By composing CNOT gates, we can only express Boolean functions
that are affine mod 2. For example, we can never express AND or OR. More
generally, and in contrast to the irreversible case, it turns out that no 2-bit
classical reversible gate is capable of universal computation. For universality,
we need reversible gates (such as Toffoli or Fredkin) that act on at least 3 bits.
It’s worth noting that any classical reversible gate can also be used as a
quantum gate (i.e., it’s unitary). From this we can immediately see that,
if nothing else, a quantum circuit can compute any function that a classical
circuit of similar size can compute. We simply need to transform the classical
circuit into one made of, say, Toffoli gates.
One important example of a gate set that fails to be universal is the stabilizer set,
$$S = \{\text{CNOT}, \text{Hadamard}, S\}.$$
These gates generate only a finite group of unitaries (the so-called Clifford group), so they can't even approximate most unitaries.
Are there any other ways for a set of quantum gates, acting on qubits,
to fail to be universal? That’s currently an open question! It’s one of Prof.
Aaronson’s personal favorites. Not many people in the field care about this
particular question, since we have lots of universal gate sets that work, and so
most just roll with it. But it would be nice to know, and you should go solve
it.
So then, what are some examples of universal quantum gate sets? It turns
out that the stabilizer set, S = {CNOT, Hadamard, S}, becomes universal if
you swap out the Hadamard gate for nearly anything else. So, for example,
S = {CNOT, Rπ/8 , S} is universal, as is S = {Toffoli, Hadamard, S} by a 2002
result of Yaoyun Shi. Additionally, if you just pick a 2-qubit gate uniformly
at random, then it’s known to have a 100% chance (in the measure-theoretic
sense) of being universal. With universality, the whole difficulty comes from
the remaining 0% of gates! Above, we listed bad cases—ways of failing to be
universal—but there are also general criteria such that if your gate set meets
them then it’s universal. We won’t cover this, but see the paper of Shi, for
example, for such criteria.
The Solovay-Kitaev Theorem says that, given any universal gate set that's closed under inverses, we can approximate any n-qubit unitary to accuracy $\varepsilon$ using only $O(4^n\,\mathrm{polylog}(\frac{1}{\varepsilon}))$ gates. In other words, if we treat n as fixed, then the complexity scales only like some power of $\log\frac{1}{\varepsilon}$. This means that all universal gate sets, or at least the ones closed under inverses, "fill in the space of all unitary transformations reasonably quickly." Furthermore, the gate sequences that achieve the Solovay-Kitaev bound can actually be found by a reasonably fast algorithm. Whether the closed-under-inverses condition can be removed remains an unsolved problem to this day.
The original proofs of Solovay-Kitaev from the late 1990s required a number of gates which grew like $\log^{3.97}(\frac{1}{\varepsilon})$. However, more recently it's been shown that, at least if we use special universal gate sets arising from algebra and number theory, we can get the number of gates to grow only like $\log(\frac{1}{\varepsilon})$. This is
not only much more practical, but it’s also the best one could possibly hope for
on information-theoretic grounds. The proof of the Solovay-Kitaev Theorem is
beyond the scope of this course, but it’s contained in many quantum computing
textbooks, including Nielsen & Chuang.
Lecture 17: Quantum Query Complexity and the Deutsch-Jozsa Problem
People often want to know where the true power of quantum computing comes
from.
I Is it the ability of amplitudes to interfere with one another?
I Is it the huge size of Hilbert space (the space of possible quantum states)?
I Is it that entanglement gives us 2n amplitudes to work with?
But that’s sort of like dropping your keys and asking what made them fall?
There can be many complementary explanations for the same fact, all of
them valid, and that’s the case here. If there weren’t a huge number of am-
plitudes, quantum mechanics would be easy to simulate classically. If the
amplitudes were instead standard probabilities, rather than complex numbers
that could interfere with each other, QM would also be easy to simulate classi-
cally. If no entanglement were allowed, then all pure states would be product
states, once again easy to represent and simulate classically. If we were re-
stricted to stabilizer operations, QM would be easy to simulate classically by
the Gottesman-Knill Theorem. But as far as we know, full QM—involving
interference among exponentially many amplitudes in complicated, entangled
states and with non-stabilizer group operations—is hard to simulate classi-
cally, and that’s what opens up the possibility of getting exponential speedups
using a quantum computer.
The XOR oracle $U_f$ maps $|x\rangle|y\rangle \mapsto |x\rangle|y \oplus f(x)\rangle$.
Figure 17.1: Diagram of the XOR oracle.
Later, we’ll see that it’s often most convenient to consider quantum queries
that map each basis state $|x\rangle$ to $(-1)^{f(x)}|x\rangle$. In other words, queries that write the function value into the phase of the amplitude, rather than storing it explicitly in memory. Or, more precisely, we consider queries that map each basis state $|x, b\rangle$ to $(-1)^{f(x)\cdot b}|x, b\rangle$, where b is a bit that controls whether the
query should even take place or not. This sort of Phase Oracle doesn’t really
have any classical counterpart, but it is extremely useful for setting up desired
interference patterns.
It behooves us to ask: how do the phase oracle and XOR oracle compare?
Could one be more powerful than the other? Happily, it turns out that they’re
equivalent, in the sense that either can be used to simulate the other with no
cost in the number of queries. This is a result that we’ll need later in this
lecture. To see the equivalence, all we need to do is consider what happens
when the second register is placed in the |−i state before a query using a
Hadamard gate. You can check that this converts a XOR oracle into a phase
oracle as follows:
If $f(x) = 0$, then the state after the query is
$$\frac{|x, 0\rangle - |x, 1\rangle}{\sqrt{2}} = |x, -\rangle, \qquad (17.2)$$
and if f (x) = 1 then the state is
$$\frac{|x, 1\rangle - |x, 0\rangle}{\sqrt{2}} = -|x, -\rangle. \qquad (17.3)$$
So we can rewrite both outcomes as just $(-1)^{f(x)}|x, -\rangle$. Meanwhile, if the second register is placed in the $|+\rangle$ state, then nothing happens, so we really do get the $|x, b\rangle \to (-1)^{f(x)\cdot b}|x, b\rangle$ behavior. Conveniently, the converse is also true!
That is, if we have a phase oracle, then by placing the output register in one
of the states |+i or |−i, we can simulate the effect of a XOR oracle, with the
phase oracle causing |+i and |−i to be swapped if and only if f (x) = 1.
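If you'd like to see the equivalence concretely, here's a short NumPy sketch (ours): we build the XOR oracle $U_f$ as a permutation matrix and check that, on $|x\rangle|-\rangle$, it acts exactly like a phase oracle.

import numpy as np

def xor_oracle(f, n):
    """U_f |x>|y> = |x>|y XOR f(x)>, as a 2^(n+1) x 2^(n+1) permutation matrix."""
    N = 2**(n + 1)
    U = np.zeros((N, N))
    for x in range(2**n):
        for y in range(2):
            U[(x << 1) | (y ^ f(x)), (x << 1) | y] = 1
    return U

n = 2
f = lambda x: [0, 1, 1, 0][x]            # some function {0,1}^2 -> {0,1}
U = xor_oracle(f, n)

minus = np.array([1, -1]) / np.sqrt(2)   # |->
for x in range(2**n):
    ket_x = np.zeros(2**n); ket_x[x] = 1
    state = np.kron(ket_x, minus)        # |x>|->
    out = U @ state
    # the query only kicks back a (-1)^f(x) phase; |x>|-> itself is unchanged
    assert np.allclose(out, (-1)**f(x) * state)
print("XOR oracle on |x>|-> acts as a phase oracle")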
Taking a step back, though, what are we really doing when we design
a quantum algorithm in the query model? We’re abstracting away part of
the problem, by saying “Our problem involves some function, f (x), that we
want to learn some property of in a way that only requires evaluating f (x)
on various inputs, while ignoring the details of how f is computed.” Because
of this abstracting away of details, the query model is also referred to as the
Black-Box Model. So, for example, you might want to learn: is there some
134 LECTURE 17. QC AND DEUTSCH-JOSZA
input x such that f (x) = 1?, Does f (x) = 1 for the majority of x’s? Does f
satisfy some global symmetry, such as periodicity, or is it far from any function
satisfying the symmetry? etc. . . We then want to know how many queries to
f are needed to learn this property.
Why do we care about the black-box model? You’re debating how you’d
phrase your wishes if you found a magical genie. Who cares? The truth is
more prosaic, though. You can think of a black box as basically a huge input
string. From that standpoint, querying f (x) just means looking up the xth
element in the string. Or another perspective is to imagine you’re writing
code that calls a subroutine that computes f (x). You don’t want to modify
the subroutine (or even examine its internal workings). You just want to
know how many calls to it are needed to find some information about f . We
assume in the quantum case that calls to the subroutine can happen on a
superposition of different input values and that we get back a superposition of
different answers.
Here's an aside that illustrates why we need to be careful about reversibility. Suppose f were a cryptographic one-way function—easy to compute, but hard to invert—and suppose we had a small circuit C such that $C|x\rangle = |f(x)\rangle$. Then,
simply by running that circuit backwards (inverting all the gates and reversing their order) we could get $C^{-1}|f(x)\rangle = |x\rangle$, thereby inverting the supposed one-way function! But why doesn't this argument actually break the one-way function? Because a reversible circuit for f would at best give us
a mapping like $|x\rangle|0\rangle \to |x\rangle|f(x)\rangle$, a mapping that leaves x around afterward.
Inverting that mapping will only take us from f (x) to x if we know x already,
so the reversible mapping is no help in breaking the one-way function after all.
This still leaves the question of how we efficiently implement the mapping $|x\rangle|0\rangle \to |x\rangle|f(x)\rangle$ if given a small non-reversible circuit for f. In the last
lecture we saw how it’s possible to take any non-reversible circuit and simulate
it by a reversible one by, for example, using Toffoli gates to simulate NAND
gates. The trouble is that along the way to the final answer, the construction
also produces all sorts of undesired results in the ancillary bits. The technical
name for this is Garbage (yes, really).
So the question becomes: how do we implement the map
$$\sum_x \alpha_x |x, 0\rangle \;\to\; \sum_x \alpha_x |x, f(x)\rangle$$
without all the garbage? Back in the 1970s, Charles Bennett (while studying
this problem in the context of classical reversible computation) invented a trick
for this called Uncomputing. It’s simple, though also strange when you first
see it. The trick to getting rid of garbage is to run computations first forward
and then in reverse. Let's say I have some circuit C such that
$$C\,|x\rangle|0\cdots0\rangle = |x\rangle\,|\mathrm{gar}(x)\rangle\,|f(x)\rangle,$$
where gar(x) is a generic term for all the garbage produced as a byproduct of
the computation. Then I do the following:
Figure 17.2: Circuit depicting the uncomputation of garbage: run C on $|x\rangle$ and ancillas in $|0\rangle$, CNOT the answer $f(x)$ into a fresh output register, then run $C^{-1}$ so that all the ancillas return to $|0\rangle$, leaving only $|x\rangle$ and $|f(x)\rangle$.
The reason why we can copy f (x) in spite of the No-Cloning Theorem is
that we’re assuming that f (x) is a classical answer. This won’t work if the
output of the circuit is a general quantum state.
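Here's a tiny classical sketch (ours, in Python) of Bennett's compute–copy–uncompute pattern, using reversible gates on ordinary bits: we compute the AND of three bits (producing a garbage bit along the way), CNOT the answer into a fresh output register, then run the computation in reverse so every ancilla returns to 0.

from itertools import product

def toffoli(state, a, b, c):
    """Reversible CCNOT on a list of bits: flip bit c iff bits a and b are 1."""
    state[c] ^= state[a] & state[b]

def cnot(state, a, c):
    state[c] ^= state[a]

def and3_with_uncompute(x0, x1, x2):
    # registers: inputs (0,1,2), garbage ancilla (3), scratch answer (4), output (5)
    s = [x0, x1, x2, 0, 0, 0]
    # forward computation C: leaves the answer in bit 4 and garbage in bit 3
    toffoli(s, 0, 1, 3)     # garbage: x0 AND x1
    toffoli(s, 3, 2, 4)     # answer:  (x0 AND x1) AND x2
    cnot(s, 4, 5)           # copy the (classical) answer into the output register
    # run C in reverse to uncompute the garbage and the scratch answer
    toffoli(s, 3, 2, 4)
    toffoli(s, 0, 1, 3)
    assert s[3] == 0 and s[4] == 0   # all ancillas restored to 0
    return s[5]

for bits in product([0, 1], repeat=3):
    assert and3_with_uncompute(*bits) == (bits[0] & bits[1] & bits[2])
print("uncomputation leaves no garbage behind")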
With uncomputing out of the way we’re finally ready to talk about some
quantum algorithms.
In Deutsch's problem, we're given a function $f : \{0,1\} \to \{0,1\}$ and want to determine the parity $f(0) \oplus f(1)$. Classically this clearly requires two queries; quantumly, one suffices. We prepare the state $|+\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}$ and make a single phase query. If $f(0) = f(1)$, meaning we have even parity, we get
$$(-1)^{f(0)}\,\frac{|0\rangle + |1\rangle}{\sqrt{2}}, \qquad (17.5)$$
while if $f(0) \ne f(1)$, meaning we have odd parity, we get
$$(-1)^{f(0)}\,\frac{|0\rangle - |1\rangle}{\sqrt{2}}. \qquad (17.6)$$
We can ignore the phase out front, since global phase doesn’t affect measure-
ment. Applying another Hadamard gate now gets our quantum state back to
the $\{|0\rangle, |1\rangle\}$ basis. If we have even parity, $f(0) = f(1)$, then the output is $|0\rangle$, and if we have odd parity, $f(0) \ne f(1)$, then the output is $|1\rangle$. A complete quantum circuit for the algorithm is given in Figure 17.3.
Note that if we wanted the parity of an n-bit input string, Deutsch's algorithm would let us get that with n/2 queries. We simply need to break the string up into n/2 blocks of 2 bits each, use Deutsch's algorithm to learn the parity, $p_B$, of each block B (using n/2 queries in total), and then calculate the parity of the $p_B$'s. This last step doesn't increase the query complexity because it doesn't involve making any additional queries to f. This turns out to be optimal—any quantum algorithm to compute the parity of an n-bit string requires at least n/2 queries—but we won't prove it in this course.
Figure 17.3: Quantum circuit for Deutsch's algorithm, with the additional ancillary qubit used for simulating the phase query shown explicitly: the top qubit starts in $|0\rangle$ and gets a Hadamard before and after the query to $U_F$; the ancilla starts in $|1\rangle$ and gets a Hadamard (making it $|-\rangle$) before the query.
Truth is, this speedup still isn’t all that impressive, because
the classical probabilistic algorithm is nearly as fast, and
would be perfectly fine in practice. Until 1994 or so, none of
the quantum speedups that we knew were very impressive!
Figure 17.4: Quantum circuit for the Deutsch-Jozsa algorithm: each of the n qubits starts in $|0\rangle$ and gets a Hadamard before and after the query to $U_F$, then all qubits are measured.
In the Deutsch-Jozsa problem, we're given oracle access to a function $f : \{0,1\}^n \to \{0,1\}$ that's promised to be either constant or balanced (equal to 1 on exactly half of the inputs), and the goal is to decide which. So, given the circuit for the Deutsch-Jozsa algorithm above (call it C), what's the probability of getting back the state $|0\cdots0\rangle$? This is given by $|\langle 0\cdots0|C|0\cdots0\rangle|^2$. The first step in the algorithm is to apply a Hadamard gate to each of the n input qubits. As we mentioned above, applying a round of Hadamard gates to each qubit in our circuit is such a common primitive in quantum algorithms that it will be helpful to describe the effect of the transformation on arbitrary standard basis states. The first thing to note is that we can rewrite the effect of the Hadamard gate on a single qubit as
$$H|x\rangle = \frac{1}{\sqrt{2}}\sum_{y\in\{0,1\}}(-1)^{x\cdot y}|y\rangle, \qquad (17.8)$$
which, applied to each of n qubits, gives
$$H^{\otimes n}|x\rangle = \frac{1}{\sqrt{2^n}}\sum_{y\in\{0,1\}^n}(-1)^{x\cdot y}|y\rangle. \qquad (17.9)$$
Here x · y denotes the inner product. This formula essentially says that we
pick up a –1 phase for every i such that xi = yi = 1. Now, coming back to
the Deutsch-Jozsa algorithm, after we Hadamard all n of the qubits and then
query the oracle, we get the state
$$\frac{1}{\sqrt{2^n}}\sum_{x\in\{0,1\}^n}(-1)^{f(x)}|x\rangle. \qquad (17.10)$$
Rather than simplifying this entire sum, let's take a shortcut by just looking at the amplitude of the state $|0\cdots0\rangle$ after the final round of Hadamards. This amplitude is given by
$$\frac{1}{2^n}\sum_{x\in\{0,1\}^n}(-1)^{f(x)}. \qquad (17.12)$$
If f is constant, this amplitude is $\pm 1$, so we see $|0\cdots0\rangle$ with certainty; if f is balanced, the $+1$'s and $-1$'s cancel exactly, and we never see $|0\cdots0\rangle$. So a single query suffices to distinguish the two cases.
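Here's a small NumPy sketch (ours) of the whole Deutsch-Jozsa computation with a phase oracle, confirming the conclusion above: the all-zeros outcome has probability 1 for a constant f and probability 0 for a balanced f.

import numpy as np

def deutsch_jozsa_zero_prob(fvals):
    """Probability of measuring |0...0> at the end of Deutsch-Jozsa,
    given the truth table of f as a list of 0s and 1s of length 2^n."""
    N = len(fvals)
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    Hn = H
    while Hn.shape[0] < N:
        Hn = np.kron(Hn, H)
    state = np.zeros(N); state[0] = 1         # |0...0>
    state = Hn @ state                        # Hadamard every qubit
    state = (-1.0)**np.array(fvals) * state   # one phase query to f
    state = Hn @ state                        # Hadamard every qubit again
    return abs(state[0])**2

n = 3
constant = [1] * 2**n
balanced = [0, 1] * 2**(n - 1)
print(deutsch_jozsa_zero_prob(constant))  # 1.0
print(deutsch_jozsa_zero_prob(balanced))  # 0.0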
Lecture 18: Bernstein-Vazirani and Simon's Algorithm
In the Bernstein-Vazirani problem, we're given oracle access to a function $f(x) = s\cdot x \pmod 2$ for some secret n-bit string s, and the goal is to learn s. Classically, the obvious approach is to query f on the n unit vectors and read off one bit of s per query; for n = 4, for example:
$$f(1000) = s_1, \quad f(0100) = s_2, \quad f(0010) = s_3, \quad f(0001) = s_4.$$
But, no classical algorithm can do better than this, since each query can only
provide one new bit of information about s, and s has n bits. The Bernstein-
Vazirani algorithm, however, solves the problem quantumly using only one
query!
Figure 18.1: Quantum circuit for the Bernstein-Vazirani algorithm. Note that, aside from having a different oracle, the circuit is identical to Figure 17.4: each qubit starts in $|0\rangle$ and gets a Hadamard before and after the query to $U_F$.
Since the initial state is the all-zero string $|0\cdots0\rangle$, after a round of Hadamards the state of the system is given by
$$|\psi\rangle = \frac{1}{\sqrt{2^n}}\sum_{x\in\{0,1\}^n}|x\rangle. \qquad (18.1)$$
After the (phase) query to the oracle, the state becomes
$$|\psi\rangle = \frac{1}{\sqrt{2^n}}\sum_{x\in\{0,1\}^n}(-1)^{f(x)}|x\rangle = \frac{1}{\sqrt{2^n}}\sum_{x\in\{0,1\}^n}(-1)^{s\cdot x}|x\rangle. \qquad (18.2)$$
The question is how do we measure the resulting state in a way that gives
us information about the secret string s? To see how to proceed, let’s start
by restating the observation originally made in Equations 17.8 and 17.9: given an input state $|x\rangle = |x_0, \cdots, x_{n-1}\rangle$, the effect of applying a Hadamard gate to all n qubits is
$$H^{\otimes n}|x\rangle = \frac{1}{\sqrt{2^n}}\sum_{y\in\{0,1\}^n}(-1)^{x\cdot y}|y\rangle = \bigotimes_{i=0}^{n-1}\frac{|0\rangle + (-1)^{x_i}|1\rangle}{\sqrt{2}}. \qquad (18.3)$$
Viewed qubit by qubit, the state in Equation 18.2 is exactly such a product: the i-th qubit is $|-\rangle$ if $s_i = 1$ and $|+\rangle$ if $s_i = 0$. So the final round of Hadamards maps the qubits that picked up a phase ($s_i = 1$) from $|-\rangle$ to $|1\rangle$ and the qubits that didn't pick up a phase ($s_i = 0$) from $|+\rangle$ to $|0\rangle$:
$$H^{\otimes n}\,\frac{1}{\sqrt{2^n}}\sum_{x\in\{0,1\}^n}(-1)^{s\cdot x}|x\rangle = |s\rangle. \qquad (18.4)$$
So, from here we can simply measure the qubits in the standard basis to retrieve $|s\rangle = |s_0, \cdots, s_{n-1}\rangle$.
This convenient result isn't a coincidence: you can see that Bernstein and Vazirani designed their problem around what a quantum computer would be able to do!
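Here's a matching NumPy sketch (ours) of Bernstein-Vazirani: one phase query, a final round of Hadamards, and the measurement returns s.

import numpy as np

def bernstein_vazirani(s_bits):
    """One-query BV: recover the secret string s from f(x) = s.x mod 2."""
    n = len(s_bits)
    N = 2**n
    s_int = int("".join(map(str, s_bits)), 2)
    # truth table of f(x) = s . x (mod 2)
    fvals = np.array([bin(x & s_int).count("1") % 2 for x in range(N)])

    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    Hn = np.ones((1, 1))
    for _ in range(n):
        Hn = np.kron(Hn, H)

    state = np.zeros(N); state[0] = 1
    state = Hn @ state                   # uniform superposition
    state = (-1.0)**fvals * state        # single phase query
    state = Hn @ state                   # final Hadamards
    outcome = int(np.argmax(np.abs(state)**2))
    return [int(b) for b in format(outcome, f"0{n}b")]

print(bernstein_vazirani([1, 0, 1, 1]))  # [1, 0, 1, 1]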
In Simon's problem, we're given oracle access to a function $f : \{0,1\}^n \to \{0,1\}^n$, and we're promised that there's a secret nonzero string s—equivalently, a secret subset of the input bits—such that $f(x) = f(y)$ if and only if $y = x$ or $y = x \oplus s$; that is, such that when you flip the bits in that subset the output is unaffected. What does this mean? Let's do an example with 3-bit inputs and with secret string
s = 110. Let's say we query f a few times and get the following outputs:
$$f(000) = 5, \quad f(110) = 5, \quad f(001) = 6, \quad f(111) = 6.$$
We’re given no information about how to interpret the outputs themselves,
so it doesn’t really matter whether we think of them as strings, integers, or
whatever. The only thing that does matter is whether two inputs map to the
same output. Since f (001) = f (111) = 6, we now know that s = 001 ⊕ 111 =
110.
This is simple enough with 3 bits, but we’re more interested in f ’s with,
say, 1000-bit inputs. In that case, we claim that finding the secret string is
prohibitively expensive classically. How expensive? Well, it's not hard to show that any deterministic algorithm needs to query f at least $2^{n-1}+1$ times, by an argument similar to the one that we used for Deutsch-Jozsa. But once again,
the more relevant question is how many queries are needed for a randomized
algorithm.
We claim that we can do a little bit better with a randomized algorithm, getting down to $O(2^{n/2})$ queries. This is related to the famous Birthday Paradox, which isn't really a paradox so much as a "birthday fact." It simply says that, if you gather merely 23 people in a room, that's enough to have a ∼50% probability of getting at least one pair of people who share a birthday. More generally, if there were n days in the year, then you'd need about $\sqrt{n}$ people in the room for a likely birthday collision. (That's at least assuming birthdays are uniformly distributed; in reality they're not exactly, e.g., there are clusters of them about 9 months after major holidays.¹) The takeaway here is that the number of pairs of people is what's important, and that scales quadratically with the number of people.
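For the skeptical, here's a two-line Python check (ours) of the birthday numbers:

import math

def no_collision_prob(people, days=365):
    """Probability that `people` independent uniform birthdays are all distinct."""
    p = 1.0
    for i in range(people):
        p *= (days - i) / days
    return p

print(1 - no_collision_prob(23))   # ~0.507: just over half
print(round(math.sqrt(365)))       # ~19: the sqrt(n) rule of thumb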
The Birthday Paradox is also useful in cryptanalysis. For example, crypto-
graphic hash functions need to make it intractable to find any two inputs x and
y with the same hash value, f (x) = f (y). But, by using a “birthday attack”—
i.e., repeatedly choosing a random input x, then comparing f (x) against f (y)
for every previously queried input y—we can find a collision using a number
of queries that scales only like the square root of the size of f ’s range. This is
quadratically faster than one might have expected naively.
¹ But it can be shown that any nonuniformities only make collisions more likely.
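As an aside, here is a toy sketch of a birthday attack, using the first 24 bits of SHA-256 as a stand-in hash function (the message format and bit count are illustrative). With a b-bit hash value, the dictionary typically fills with only about 2^{b/2} entries before a collision appears.

```python
import hashlib

def birthday_attack(bits=24):
    """Find two strings whose SHA-256 hashes agree on the first `bits` bits."""
    seen = {}
    i = 0
    while True:
        msg = f"message-{i}".encode()
        tag = int.from_bytes(hashlib.sha256(msg).digest(), "big") >> (256 - bits)
        if tag in seen and seen[tag] != msg:
            return seen[tag], msg          # collision found after ~2^(bits/2) attempts
        seen[tag] = msg
        i += 1

print(birthday_attack())
```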
Figure: Quantum circuit for Simon's algorithm — an n-qubit input register with Hadamards before and after U_F, and an n-qubit output register that receives the value of f.
This time the function f has a large output so we need to write out its
values in a separate n-qubit answer register rather than just encoding it into
the phase. But, it’s important to note that the answers themselves aren’t what
we care about! We’re only writing them out because by doing so, we create a
desired interference pattern in the input register. Indeed, at this point in the
algorithm we could simply discard the answer register, or do anything else we
liked with them. For pedagogical simplicity let’s assume we now measure the
answer register, and let’s assume that the result of the measurement is |wi.
Then, by the partial measurement rule, the input register is left in the state

\frac{|x\rangle + |y\rangle}{\sqrt{2}} \quad \text{such that} \quad f(x) = f(y) = w .   (18.7)
Now we apply a second round of Hadamards to the input register. Recall that Hadamarding |x⟩ alone would give

\frac{1}{\sqrt{2^n}} \sum_{z \in \{0,1\}^n} (-1)^{x \cdot z} |z\rangle ,

while Hadamarding |y⟩ alone would give

\frac{1}{\sqrt{2^n}} \sum_{z \in \{0,1\}^n} (-1)^{y \cdot z} |z\rangle .

By linearity, then,

H^{\otimes n} \frac{|x\rangle + |y\rangle}{\sqrt{2}} = \frac{1}{\sqrt{2^{n+1}}} \sum_{z \in \{0,1\}^n} \left[ (-1)^{x \cdot z} + (-1)^{y \cdot z} \right] |z\rangle .   (18.8)
Now we’ll measure this state in the standard basis. Which z’s could we get
when we do so? For a given z to be observed, it must have a nonzero amplitude.
This means that (−1)^{x·z} and (−1)^{y·z} must be equal, which occurs if and only
if x · z = y · z (mod 2). Or, rewriting this equation a bit:
x · z + y · z = 0 (mod 2)
(x ⊕ y) · z = 0 (mod 2) (18.9)
s · z = 0 (mod 2)
So, what we get when we measure is an n-bit string z, chosen uniformly
at random from among all of the 2^{n−1} strings whose inner product with s is
0. In other words, we haven’t yet learned s itself, but we’ve learned a bit of
information about s, which I hope you’ll grant is something! What if we really
want s itself ? In that case, we can just repeat Simon’s algorithm over and
over, starting from the beginning each time! This will give us a collection of
strings {z_0, ⋯, z_{k−1}}, which are selected uniformly at random and independently of each other, and which all have even inner products with s:

s · z_0 = 0 (mod 2)
s · z_1 = 0 (mod 2)
⋮   (18.10)
s · z_{k−1} = 0 (mod 2)
Now that we’ve got these equations, what should we tell a classical computer
to do with them? Well, suppose k = n + c, where c is some large constant.
Then we now have a collection of n + c linear equations in n unknowns over a
finite field with two elements. We can solve this system of equations efficiently
using a classical computer.
It’s not hard to do a probabilistic analysis showing that after we’ve seen
slightly more than n equations, with overwhelming probability we’ll be left
with a system of equations that has exactly two solutions: 0^n and s itself. We can throw away 0^n, because we assumed s ≠ 0^n. So that leaves us with s.
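Here is a minimal sketch of that classical post-processing step. The helper `recover_secret` is hypothetical, and it assumes the sampled z's already pin s down uniquely (i.e., the constraint matrix has rank n−1).

```python
import numpy as np

def recover_secret(zs, n):
    """Given rows z with z . s = 0 (mod 2), find the nonzero solution s over GF(2)."""
    A = np.array(zs, dtype=np.uint8) % 2
    pivots, row = [], 0
    for col in range(n):
        hits = np.nonzero(A[row:, col])[0]
        if len(hits) == 0:
            continue
        A[[row, row + hits[0]]] = A[[row + hits[0], row]]    # swap a pivot row into place
        for r in range(A.shape[0]):                          # eliminate col from other rows
            if r != row and A[r, col]:
                A[r] ^= A[row]
        pivots.append(col)
        row += 1
    free = [c for c in range(n) if c not in pivots]
    assert len(free) == 1, "need more equations to pin down s uniquely"
    s = np.zeros(n, dtype=np.uint8)
    s[free[0]] = 1
    for r, c in enumerate(pivots):
        s[c] = A[r, free[0]]   # row r reads: s[c] + A[r, free]*s[free] = 0 (mod 2)
    return s

print(recover_secret([[0, 0, 1], [1, 1, 0], [1, 1, 1]], 3))  # -> [1 1 0], the s from above
```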
So, with O(n) queries along with a polynomial amount of additional clas-
sical computation we can find s, in sharp contrast to the provably exponential
number of queries needed to solve Simon’s problem classically. A few addi-
tional questions:
Does Simon’s algorithm have a deterministic counterpart? Yes, one can
modify the algorithm so that it succeeds with certainty rather than “merely”
overwhelming probability. We won’t go into the details here.
Why doesn’t this just prove that quantum algorithms are better? It’s sort of
tricky to translate Simon’s algorithm from the black-box setting into the “real-
world.” To get a speedup over classical computing in terms of the sheer number
of gates or computational steps, we’d need some small circuit to compute a
function f that was actually like our magical Simon function.
To illustrate, f (x) = Ax for some rank-(n–1) Boolean matrix A would
satisfy the Simon promise. But, the difficulty in getting a quantum speedup
this way is that once we pin down the details of how we’re computing f —
in our example, as soon as we give the actual matrix A—we then need to
compare against classical algorithms that know those implementation details
as well. As soon as we reveal the innards of the black box, the odds of an
efficient classical solution become much higher! In our example, if we knew
the matrix A then we could solve Simon’s problem in classical polynomial
time just by calculating A’s nullspace. More generally, no one to this day has
found a straightforward “application” of Simon’s algorithm, in the sense of a
class of efficiently computable functions f that satisfy the Simon promise and
for which any classical algorithm plausibly needs exponential time to solve
Simon’s problem when the algorithm is given the implementation details of f .
The story goes that Daniel Simon wrote a paper about this theoretical
black-box problem with an exponential quantum speedup and the paper got
rejected. But there was one guy on the program committee who was like,
“hey, this is interesting.” He figured that if you changed a few aspects of what
Simon was doing, you could get a quantum algorithm to find the periods of
periodic functions, which would in turn let you do all sorts of fun stuff. That
guy was Peter Shor and the algorithm he invented will be the focus of our next
three lectures.
Lecture 19: RSA and Shor's Algorithm
19.1 RSA Encryption
No one has shown that factoring and discrete log are necessarily related, e.g. by giving a reduction between them. In practice, though, advances in solving one problem almost always seem to lead to advances in solving the other in short order.
We now know that both RSA and Diffie-Hellman were first discovered in
secret—by the mathematician Clifford Cocks for the former and by James H.
Ellis, Clifford Cocks and Malcolm J. Williamson for the latter—at GCHQ (the
British NSA) before they were rediscovered in public a few years later.
Recall Euclid's algorithm for computing gcd(x, y), where x > y: we write

x = qy + r,   (19.2)

where qy is the greatest multiple of y that is at most x. This means that y > r.
Then, we find the gcd of y and r and we keep recursing in this way until
r = 0. The size of the numbers involved in each recursive step goes down by
a constant factor each time, which means the whole algorithm runs in time
linear in n, the number of digits of x and y.
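As a sketch, Euclid's algorithm is only a few lines (the function name is mine):

```python
def gcd(x, y):
    # Euclid's algorithm: write x = q*y + r, then recurse on (y, r) until r = 0.
    while y:
        x, y = y, x % y
    return x

print(gcd(1071, 462))   # 21
```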
19.2 Period Finding
OK, but how do we find the collisions classically? This is the birthday paradox all over again. Recall from the last lecture that something like ∼2^{n/2} queries to f are both necessary and sufficient.
Why is this important? Well, let’s say we want to factor some number
N = pq, a product of distinct primes. Then here’s an approach: pick an x
such that gcd(x, N ) = 1.
Such x’s are easy to find. Indeed, in the rare event that we
pick an x and it happens that gcd(x, N ) > 1, we can run
Euclid’s algorithm on x and N to factor N right then and
there!
Now consider the function f(r) = x^r mod N, where r ranges over the nonnegative integers. What can we say about this function? First of all, how hard is it to compute?
A naive approach would use r−1 multiplications, but r can be large and so this can end up being exponential in n = log(N). But there's a much, much faster approach called Repeated Squaring. It's best illustrated with an example. Say we want to calculate 13^{21} mod 15. We could calculate 13 × 13 × ⋯ × 13 mod 15 by alternating multiplication with reducing mod 15, but that's still 20 multiplications. Instead, let's rewrite it as a product of 13 raised to various powers of 2: 13^{21} = 13^{16} · 13^4 · 13^1. Computing 13^2, 13^4, 13^8, and 13^{16} mod 15 takes just four squarings, and two more multiplications then finish the job—so in general only about 2 log(r) multiplications are needed.
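Here is a minimal sketch of repeated squaring (Python's built-in pow(x, r, N) does the same thing):

```python
def power_mod(x, r, N):
    # Repeated squaring: square x and reduce mod N at each step, multiplying
    # the result in whenever the corresponding bit of r is 1.
    result = 1
    x %= N
    while r:
        if r & 1:
            result = (result * x) % N
        x = (x * x) % N
        r >>= 1
    return result

print(power_mod(13, 21, 15))   # 13, and pow(13, 21, 15) agrees
```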
Now, recall Euler's totient function ϕ(N), which counts the positive integers less than N that are relatively prime to N. For N = pq we have

ϕ(N) = (p − 1)(q − 1) = pq − p − q + 1.
So, if we know ϕ(N ) we now know both N = pq and p + q, and we can use the
quadratic formula to solve for p and q themselves. Unfortunately, this doesn't quite work with Shor's algorithm, because the period of f might not equal ϕ(N); the most we can say is that the period divides ϕ(N).
Here's what we'll do instead. Let's pick a random x relatively prime to N, and find the period s of f(r) = x^r mod N. We then have that x^s ≡ 1 (mod N). Now, let's imagine we're lucky and s is even (which intuitively should happen maybe half the time?). In that case we can write

x^s − 1 = (x^{s/2} − 1)(x^{s/2} + 1) ≡ 0 (mod N).

So N divides the product (x^{s/2} − 1)(x^{s/2} + 1), and as long as neither factor is itself a multiple of N, running Euclid's algorithm on x^{s/2} − 1 and N yields a nontrivial factor of N.
This gives us a plan of attack for factoring N . The key remaining problem—
and the one we’ll use quantum mechanics to solve—is finding the period of
f.
The first step in the quantum period finding algorithm is to make an equal
superposition over all nonnegative integers less than some upper bound Q,
\frac{1}{\sqrt{Q}} \sum_{r=0}^{Q-1} |r\rangle |f(r)\rangle ,   (19.8)
with each integer written in its binary representation. For technical reasons we'll explain in a later lecture, we set Q to be a power of 2 of order N². We can prepare this state by Hadamarding the log(Q) qubits in the input register, then querying f and writing the output into the qubits of the answer register (with uncomputing to get rid of garbage).
Unlike with Simon’s algorithm, in Shor’s algorithm Uf is not just an ab-
stract black box. We can find an actual quantum circuit to implement Uf ,
because f is just the modular exponentiation function.
By using the repeated squaring trick, we can create an actual circuit for U_f that maps |r⟩|0⋯0⟩ to |r⟩|f(r)⟩ out of a network of polylog(N) Toffoli gates.
Just like with Simon’s algorithm, we won’t care at all about the actual
value of f (r), but only about the effect that computing f (r) has on the |ri
register. So for pedagogical purposes, we’ll immediately measure the |f (r)i
register and then discard the result.
What’s left in the input register? By the partial measurement rule, what’s
left is an equal superposition over all the possible r’s that could’ve led to the
observed value f (r). Since f is periodic with a secret period s, these values
will differ from each other by multiples of s. In other words, we now have the state

\frac{1}{\sqrt{L}} \sum_{l=0}^{L-1} |r_0 + l s\rangle ,

where r_0 is the smallest r consistent with the observed value of f, and L ≈ Q/s is the number of such r's below Q.
Lecture 20: Quantum Fourier Transform

20.1 Quantum Fourier Transform
Our goal is now to extract the period s. In science and engineering, any time you have a periodic signal
and you’re trying to extract its period there’s an essential tool used called the
Fourier Transform. There are many types of Fourier transforms: contin-
uous, Boolean, etc. For us, though, the Q-dimensional Quantum Fourier
Transform or QFT will be the Q × Q matrix FQ defined as follows:
(F_Q)_{i,j} = \langle i | F_Q | j \rangle = \frac{\omega^{ij}}{\sqrt{Q}} ,   (20.1)

where ω = e^{2πi/Q} is a Qth root of unity. Here's some useful intuition for how
the Fourier transform works. In graduate school you can easily fall into a 26-
hour-per-day cycle. So, one day you wake up at 8am, the next day you wake
up at 10am, then 12pm, and so forth so that if nothing interrupts you, you
cycle all the way around. Suppose you’ve fallen into such a cycle and you want
to figure out how long the cycle is without doing any complicated calculations
like subtraction.
What you can do is install a series of clocks in your room, each tracking
“days” of different lengths. So, you’d have a 23-hour clock, a 24-hour clock, a
25-hour clock, etc. . . In addition, you install a bulletin board below each clock
and place a single thumbtack in its center. Now, every time you wake up you
go to each clock and move the thumbtack one inch in the direction the hour
hand points.
What will happen if you keep doing this, week after week? If you’re really
keeping 26-hour days, then the thumbtack corresponding to the 26-hour clock
will always move in the same direction. This is constructive interference!
And the same is true for the 13-hour clock (as well as the 1-hour and 2-hour
clocks). All the others—the 23-hour clock, the 24-hour clock, etc.—will have
the thumbtack move around, sometimes one direction, sometimes another, so
that it eventually returns to the origin.
The Quantum Fourier Transform is essentially this, but with quantum-
mechanical amplitudes instead of thumbtacks.
There are two questions we need to answer here:
I How do we implement the Quantum Fourier Transform using a small
quantum circuit?
– Since it’s a Q × Q matrix, it’s not obvious whether we can do it
using a circuit with only polylog(Q) gates.
I Once we’ve applied the QFT and measured, how do we make sense of the
outcome?
– Complications can arise because in all likelihood, the period s won’t
evenly divide Q.
F_4 = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & \omega & \omega^2 & \omega^3 \\ 1 & \omega^2 & \omega^4 & \omega^6 \\ 1 & \omega^3 & \omega^6 & \omega^9 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & \omega & \omega^2 & \omega^3 \\ 1 & \omega^2 & \omega^0 & \omega^2 \\ 1 & \omega^3 & \omega^2 & \omega^1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix}   (20.3)

F_8 = \frac{1}{2\sqrt{2}}\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & \omega & \omega^2 & \omega^3 & \omega^4 & \omega^5 & \omega^6 & \omega^7 \\ 1 & \omega^2 & \omega^4 & \omega^6 & 1 & \omega^2 & \omega^4 & \omega^6 \\ 1 & \omega^3 & \omega^6 & \omega & \omega^4 & \omega^7 & \omega^2 & \omega^5 \\ 1 & \omega^4 & 1 & \omega^4 & 1 & \omega^4 & 1 & \omega^4 \\ 1 & \omega^5 & \omega^2 & \omega^7 & \omega^4 & \omega & \omega^6 & \omega^3 \\ 1 & \omega^6 & \omega^4 & \omega^2 & 1 & \omega^6 & \omega^4 & \omega^2 \\ 1 & \omega^7 & \omega^6 & \omega^5 & \omega^4 & \omega^3 & \omega^2 & \omega \end{pmatrix}   (20.4)
As in Equation 20.1, the ω’s in the examples above are Qth roots of unity
where Q is the dimension of the matrix. We could design an algorithm to
apply these matrices by brute force, but there’s a better way. This method is
related to one of the most widely used classical algorithms, the Fast Fourier
Transform or FFT.
Suppose we have a vector of length Q, and we want to apply a Q×Q matrix
A to it. In general this requires us to do the full matrix-vector multiplication
which takes ∼Q² steps. However, if we know that A is the Fourier transform, then
the FFT lets us apply it in only O(Q log(Q)) steps, by exploiting regularities
in the Fourier matrix.
What regularity is there in the Fourier matrix? Look at F4 . If we swap
the second and third columns, we get:
\frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix} \;\rightarrow\; \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & i & -i \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -i & i \end{pmatrix}
Notice how the matrix can be broken up into 4 blocks, each either equal to H
or related to H by the application of a diagonal matrix. In fact if we define
the matrix B as
B = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} ,   (20.5)
You can work out why it happens on your own, but it turns out that we
can define FQ in terms of this nesting recurrence:
F_Q = \frac{1}{\sqrt{2}}\begin{pmatrix} F_{Q/2} & B_{Q/2} F_{Q/2} \\ F_{Q/2} & -B_{Q/2} F_{Q/2} \end{pmatrix} ,   (20.9)
where we’ve now defined the generalization of the diagonal B matrix above,
BQ/2 , as (omitting the zeros on the off-diagonal)
B_{Q/2} = \begin{pmatrix} 1 & & & \\ & \omega & & \\ & & \ddots & \\ & & & \omega^{Q/2 - 1} \end{pmatrix} .   (20.10)
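As a sanity check, here is a small numerical verification of the recurrence for Q = 8 (illustrative only). Note that the block form reproduces F_Q only after the column reordering that corresponds to the qubit-reordering step folded into the circuit.

```python
import numpy as np

def F(Q):
    # The Q x Q quantum Fourier transform matrix from Equation 20.1.
    w = np.exp(2j * np.pi / Q)
    idx = np.arange(Q)
    return w ** np.outer(idx, idx) / np.sqrt(Q)

Q = 8
w = np.exp(2j * np.pi / Q)
B = np.diag(w ** np.arange(Q // 2))                  # B_{Q/2} from Equation 20.10
block_form = np.vstack([
    np.hstack([F(Q // 2),  B @ F(Q // 2)]),
    np.hstack([F(Q // 2), -B @ F(Q // 2)]),
]) / np.sqrt(2)

# Reorder F_Q's columns so even-numbered inputs come first (the swap-gate step).
cols = np.concatenate([np.arange(0, Q, 2), np.arange(1, Q, 2)])
print(np.allclose(block_form, F(Q)[:, cols]))        # True
```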
If we let C(F_Q) be the number of steps needed to apply F_Q, we get the recurrence relation C(F_Q) = C(F_{Q/2}) + O(log Q), since the B_{Q/2} block factors into a separate phase rotation on each of the log(Q/2) qubits (plus one Hadamard and some qubit reordering). Solving the recurrence gives C(F_Q) = O(log²(Q)) gates in total. Note that the phase rotations shrink as we move to the lower-order bits: by the time you reach the last couple of bits, the rotation is exponentially small.
The final step to get the matrix we want is a Hadamard gate applied to
the most significant bit,
H \otimes I_{Q/2} = \frac{1}{\sqrt{2}}\begin{pmatrix} I_{Q/2} & I_{Q/2} \\ I_{Q/2} & -I_{Q/2} \end{pmatrix} ,   (20.15)

which results in the final matrix

\frac{1}{\sqrt{2}}\begin{pmatrix} I & I \\ I & -I \end{pmatrix}\begin{pmatrix} F_{Q/2} & 0 \\ 0 & B F_{Q/2} \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} F_{Q/2} & B F_{Q/2} \\ F_{Q/2} & -B F_{Q/2} \end{pmatrix} = F_Q .   (20.16)
Figure 20.2 contains the finished quantum circuit.
Figure 20.2: The recursive circuit for F_Q: swap gates reorder the qubits, F_{Q/2} and the diagonal phase block act on part of the register, and a final Hadamard completes the transform.
Recall from the previous lecture that after we compute f(r) and then measure the second register, we have a quantum state of the form

\frac{1}{\sqrt{L}} \sum_{l=0}^{L-1} |r_0 + l s\rangle .
In the easy case where s divides Q exactly, applying the QFT and measuring yields an integer multiple of Q/s with certainty: for every other k, the amplitudes ω^{ksl} for l = 0, 1, 2, etc. are all pointing in different directions in the complex plane and they cancel each other out.
The harder, general case is that s doesn't divide Q. In this case, if we calculate the final amplitude for a specific basis state k, then ignoring the global phase ω^{k r_0} and the normalization, we'll still get a sum of the form \sum_{l=0}^{L-1} \omega^{ksl}.
So, how likely we are to observe k will still depend on whether this sum con-
structively or destructively interferes. What changes is that now Q/s isn’t an
integer, and as a result, neither the constructive nor the destructive interfer-
ence will be perfect. But we’ll see that they’re still good enough for the period
s to be efficiently recovered.
We claim that if k is close to an integer multiple of Q/s — that is, if k = ⌊cQ/s⌉ for some integer c, rounding to the nearest integer — then we mostly see constructive interference. If k is far from any integer multiple of Q/s, then we'll see mostly destructive interference.
Let's look at the constructive case first. Assume we get a bit lucky and we have (say) k = cQ/s + ε where |ε| ∼ 1/10. That means that, ignoring normalization, the final amplitude of basis state k has the form

\sum_{l=0}^{L-1} \omega^{(cQ/s + \varepsilon)sl} = \sum_{l=0}^{L-1} \omega^{cQl}\,\omega^{\varepsilon s l} .   (20.19)

We can drop the ω^{cQl} factor because ω^{cQl} = e^{(2πi/Q)(cQl)} = e^{2πicl} = 1. We're then left with just the ω^{εsl} part. For the sake of clarity we can rewrite this term as \sum_{l=0}^{L-1} e^{2\pi i \varepsilon s l / Q}. Recall now that L ∼ Q/s. As such, the sum above corresponds to a sum over complex numbers constrained to an ε fraction of the unit circle, ranging roughly from 1 to e^{2πiε}. Assuming ε is relatively small, this means the complex numbers all point in close to the same direction and so we mostly have constructive interference. This situation is illustrated in Figure 20.3.
Lecture 21: Continued Fractions and Shor Wrap-Up

In this lecture we'll finish Shor's algorithm and then discuss some of its implications. Last we saw our protagonists, they were in a superposition of the form
We've addressed the first case, so we'll now focus on the second case. Say we run the algorithm once, getting an integer k_1 = ⌊c_1 Q/s⌉ (that is, c_1 Q/s rounded to the nearest integer), and then run it again to get k_2, k_3, etc. The question is then: given these integers, almost all of which are close to integer multiples of Q/s, how do we use them to deduce s itself?
21.1 Continued Fraction Algorithm

The tool we'll use for this is the Continued Fraction Algorithm. To illustrate it, let's expand 3.14 as a continued fraction:
3.14 = 3 + \frac{14}{100} = 3 + \frac{1}{\frac{100}{14}} = 3 + \frac{1}{7 + \frac{2}{14}} = \cdots
The idea is that we keep pulling out the largest integer we can and rewriting our expression until we're left with a fraction that approximates k/Q to within an accuracy of about 1/Q². The reason why the method works is that s is a relatively small integer, so c/s is not only rational but has a relatively small denominator. In more detail, let's write
k = c\frac{Q}{s} \pm \varepsilon ,   (21.1)

where ε is some small value. Then we divide the above equation through by Q to get

\frac{k}{Q} = \frac{c}{s} \pm \frac{\varepsilon}{Q} .   (21.2)

This immediately implies the following inequality:

\left| \frac{k}{Q} - \frac{c}{s} \right| \le \frac{\varepsilon}{Q} .   (21.3)
We'll exploit the key inequality above, along with the following:
I We know k.
I We know Q (because we picked it; it's some power of 2 of order N²).
There's math that backs this up; we're just not covering it here.
Suppose I give you a rational number, say 0.25001, and I tell you that it’s
close to a rational number with an unusually small denominator. How could
you figure out which such rational number it’s close to without having to try
all possible small denominators, of which there might still be too many? In
this particular example you just stare at the thing and immediately see that 1/4 is the answer! OK, but what would be a more systematic way of doing it?
A more systematic way is to expand the input number as a continued
fraction until the leftover part is so small that we can safely discard it. To
illustrate:
.25001 = \frac{25001}{100000} = \frac{1}{\frac{100000}{25001}} = \frac{1}{3 + \frac{24997}{25001}} = \frac{1}{3 + \frac{1}{\frac{25001}{24997}}} = \frac{1}{3 + \frac{1}{1 + \frac{4}{24997}}}

Now we've reached \frac{4}{24997}, a number small enough for us to discard, which leaves us with

\frac{1}{3 + \frac{1}{1 + \frac{4}{24997}}} \approx \frac{1}{3 + \frac{1}{1}} = \frac{1}{4} .
So now we have a way to find c/s. Are we done? Well, we still have the same difficulty that we encountered in the s divides Q case: namely, that c and s might share a nontrivial divisor. If, for example, c and s were even, then we'd have no possible way to tell c/s apart from (c/2)/(s/2). We solve this using exactly the same approach as before. We repeat the algorithm several times to generate

\frac{c_1}{s_1}, \frac{c_2}{s_2}, \frac{c_3}{s_3},

etc. One can then show that the least common multiple of the s_i's will be s itself, with high probability.
21.2 Applications of Shor's Algorithm

The ink wasn't dry on Shor's paper before people started asking: what else
might Shor’s algorithm be good for, besides factoring?
For starters, as we mentioned a couple lectures ago and as Shor showed in his original paper, it also gives an exponential speedup for Discrete Log, which is the following problem: given a prime p and integers g and a, find an x such that g^x ≡ a (mod p). This is how Shor's algorithm breaks the Diffie-Hellman cryptosystem.
It was noted shortly afterward that Shor’s algorithm can also be modified
to break Elliptic Curve cryptosystems. Indeed, people quickly figured out that
Shor’s algorithm can be modified to solve pretty much any problem related
to finding hidden structures in abelian groups. Almost all the public-key
cryptosystems that we currently use in practice involve finding such hidden
structures.
In the years after Shor's algorithm, a lot of research in quantum algorithms was directed towards the question: to what extent can we generalize Shor's algorithm to solve problems about non-abelian groups? By now, though,
many people have given up on this research direction. It turns out that finding
hidden structures in non-abelian groups is very, very hard.
Why did people care about non-abelian groups? Well, if Shor's algorithm could be generalized to handle them, there are two famous problems that it would help us solve.
Lecture 22: Grover's Algorithm

The next quantum algorithm we'll cover is Grover's Algorithm, which was discovered in 1995, shortly after Shor's algorithm.
The number of qubits needed to run Grover’s algorithm is very low, O(log (N )),
and the number of gates required (besides those needed to compute f itself)
is also reasonable, O(√N log(N)). However, for Grover's algorithm to work
we do need to assume that we have access to f that lets us apply the uni-
tary transformation |x, ai → |x, a ⊕ f (x)i. This wasn’t important in Shor’s
Algorithm because we only made one query and then discarded the result.
There are two main example applications to keep in mind with Grover’s
algorithm. The first application is solving combinatorial search and optimiza-
tion problems, such as NP-complete problems. Here, we think of N = 2^n as being exponentially large, and we think of each candidate solution x ∈ {0,1}^n as an n-bit string. We then set, for example, f(x) = ϕ(x), where ϕ is an instance of Satisfiability or some other NP-complete problem. Then, Grover's algorithm can find an x such that ϕ(x) = 1 in O(2^{n/2} · poly(n)) time. That is, √N = 2^{n/2} queries to f, and poly(n) time to implement each query (say, by checking whether a given x satisfies ϕ). This is an apparent speedup for
NP-complete problems—but at most a quadratic one and also only conjec-
tural, because of course we can’t even rule out the possibility of P=NP, which
would annihilate this sort of speedup.
For an NP-complete problem like CircuitSAT, we can be pretty confident
that the Grover speedup is real, because no one has found any classical algo-
rithm that’s even slightly better than brute force. On the other hand, for more
“structured” NP-complete problems, we do know exponential-time algorithms
that are faster than brute force. For example, 3SAT is solvable classically in about O(1.3^n) time. So then, the question becomes a subtle one of whether
Grover’s algorithm can be combined with the best classical tricks that we know
to achieve a polynomial speedup even compared to a classical algorithm that
uses the same tricks. For many NP-complete problems the answer seems to
be yes, but it need not be yes for all of them.
The second example application of Grover’s algorithm to keep in mind is
searching an actual physical database. Say you have a database of personnel
records and you want to find a person who matches various conditions (hair
color, hometown, etc.). You can set f(x) = 1 if person x meets the criteria and f(x) = 0 otherwise. Grover's algorithm can search for an x such that f(x) = 1 in O(√N) steps. One big advantage of Grover's algorithm as applied to actual
physical databases is that the quantum speedup is provable; it doesn’t rely on
any unproved computational hardness assumptions.
Some people have questioned the practicality of using Grover’s algorithm to
search a physical database, because the database needs to support “superposed
queries.” That is, you need to be able to query many records in superposition
and get back a superposition of answers. A memory that would support these
kinds of queries is called a “quantum RAM.” Building one is a whole additional
technological problem beyond building a quantum computer itself. It remains unclear whether people will be able to build quantum RAMs without N active, parallel computing elements—which, if you had them, would remove the need to run Grover's algorithm.
22.1 The Algorithm

The first step of Grover's algorithm is to apply a Hadamard gate to each of the n = log₂(N) qubits, preparing the uniform superposition

\frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle .   (22.1)

After each query to U_f (which flips the sign of the marked item's amplitude), we'll apply the N × N matrix D whose every entry equals 2/N, except that the diagonal entries equal 2/N − 1.
This is the so-called Grover Diffusion Operator, which has the effect of
flipping all N amplitudes about the mean amplitude \bar{\alpha} = \frac{1}{N}\sum_{x=0}^{N-1} \alpha_x:

\alpha_x \rightarrow 2\bar{\alpha} - \alpha_x .   (22.4)
So why does applying D help us? Well, let’s look at what’s happening after
a single Grover iteration of applying Uf and D pictorially, using the depiction
shown in Figures 22.1–22.4. After a single diffusion operation, we've managed to increase the amplitude of the marked item to roughly 3/√N and decrease the amplitudes of all the other items accordingly.
Then, we keep repeating by applying another Uf and then another D and
so on. By doing so, we can increase the amplitude of the marked item further
as pictured in Figures 22.5–22.7.
Figure 22.1: The initial amplitudes of the system, an even superposition state.
Figure 22.2: The amplitudes following the first application of the phase oracle. Note that the amplitude of the marked item has had its sign flipped.
Figure 22.3: The average amplitude ᾱ has been explicitly drawn in.
Figure 22.4: The amplitudes following the first Grover diffusion operator.
First, though, a natural question to ask about Grover's algorithm is why should it take √N steps? Why not ∛N, or log N? We see here that in some sense the ultimate source of the √N is the fact that amplitudes are the square roots of probabilities. Instead of adding ∼1/N probability to the marked item with each query, quantum mechanics lets us add ∼1/√N amplitude, resulting in quadratically faster convergence. This intuition will be made more rigorous in the next lecture when we learn about the BBBV Theorem.
To see how to implement the diffusion operator, suppose our state is \sum_x \alpha_x |x\rangle and we Hadamard every qubit. The new amplitudes are then

\beta_y = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} (-1)^{y \cdot x} \alpha_x .   (22.5)
The first of these amplitudes plays a special role. If y = 0 we have \frac{1}{\sqrt{N}}\sum_{x=0}^{N-1} \alpha_x,
which is proportional to the average, which is good because our goal was to
invert about the average. The other y values play no particular role in Grover’s
algorithm.
So, in the Hadamard basis what we want is to perform the diagonal matrix
A,
A = \begin{pmatrix} 1 & & & \\ & -1 & & \\ & & \ddots & \\ & & & -1 \end{pmatrix} .   (22.6)
So to perform D, we can:
1. Apply a round of Hadamards.
2. Apply A.
3. Apply another round of Hadamards.
Figure: The full circuit for Grover's algorithm — an initial round of Hadamards on every qubit, followed by repeated blocks of U_f and the diffusion step (Hadamards, A, Hadamards).
The first thing to note is that we can write the unitary operation A in Equation
22.6 as
A = \begin{pmatrix} 1 & & & \\ & -1 & & \\ & & \ddots & \\ & & & -1 \end{pmatrix} = 2|0\rangle\langle 0| - I .   (22.7)
In other words, the A matrix flips the sign of all of the amplitudes except that
of the |0i state (equivalently you could say A flips the sign of α0 while leaving
the rest of the amplitudes alone). This has the effect of reflecting our state
about the |0i axis in the N -dimensional Hilbert space of the system.
Recall that D = H ⊗n AH ⊗n . Thus,
D = H^{\otimes n} A H^{\otimes n}
  = H^{\otimes n} (2|0\rangle\langle 0| - I) H^{\otimes n}
  = 2 H^{\otimes n} |0\rangle\langle 0| H^{\otimes n} - H^{\otimes n} I H^{\otimes n}   (22.8)
  = 2|\phi\rangle\langle\phi| - I ,

where |\phi\rangle = \frac{1}{\sqrt{N}}\sum_{x=0}^{N-1} |x\rangle is the uniform superposition state. Notice the
similarity between the form of this operator and the form of A in Equation
22.7. Indeed, just as we saw that the result of applying A is a reflection of the
state about the |0i axis in N -dimensional Hilbert space, D reflects about the
|φi axis.
Similarly, Uf corresponds to a reflection about the |x∗ i axis in N -dimensional
Hilbert space, where |x∗ i is the basis state corresponding to the marked item.
22.1.3 Analysis
Now let’s analyze Grover’s algorithm more carefully and actually prove that
it works.
The initial state of the system following the first round of Hadamards
shown in Figure 22.8 is
|\psi\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle .   (22.9)
Somewhere in the N -dimensional space is the basis state |x∗ i corresponding
to the marked item we're looking for. Our initial state |ψ⟩ overlaps only very slightly with the state of the marked item |x*⟩: ⟨ψ|x*⟩ = 1/√N. So |ψ⟩ and |x*⟩ are not quite orthogonal, but nearly so. Now, these two states |ψ⟩ and
|x∗ i span a two-dimensional subspace of the overall N -dimensional Hilbert
space. A crucial insight about Grover’s algorithm is that it operates entirely
within this subspace. Why? Simply because if we start in the subspace then
neither the queries nor the Grover diffusion operations ever cause us to leave
it! This means that we can visualize everything Grover’s algorithm is doing
by just drawing a picture in the 2D plane. The axes here are given by |x∗ i
and |unmarkedi, where |unmarkedi is an equal superposition state over all of
the unmarked items:
|\text{unmarked}\rangle = \frac{1}{\sqrt{N-1}} \sum_{x \ne x^*} |x\rangle .   (22.10)
Note that |x∗ i and |unmarkedi are clearly orthogonal to each other. We’ve
seen already how the algorithm alternates between two types of operations:
I Inverting the component of our state that points in the |x∗ i direction by
querying Uf as shown in Figure 22.9.
Figure 22.9: Applying the oracle U_f to the initial state |ψ⟩ to get the new state |ψ′⟩.

I Reflecting our state about |φ⟩, the uniform superposition state, using the diffusion operator D, as shown in Figure 22.10:
Initially, the angle of |ψ⟩ with the horizontal is θ = arcsin(1/√N) ≈ 1/√N. After each iteration, we've rotated by an additional 2/√N. Hence, the probability of success after the tth iteration is given by

P(\text{success}) = |\langle x^* | \psi \rangle|^2 = \sin^2\!\left( \frac{2t+1}{\sqrt{N}} \right) .   (22.11)

This means that we'll get super close to |x*⟩ and have a high probability of observing x* if we measure after about (π/4)√N iterations—something that you
can directly see from the geometric picture. We might not get exactly to 1 if
the step size of the rotations causes us to overshoot slightly, but at any rate
we’ll get close.
We can see right away that, unlike any classical algorithm, Grover's algorithm has the amusing property that its success probability starts getting worse if we run it for too long!
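Here is a minimal numerical simulation of Grover's algorithm on n = 10 qubits (the marked item and variable names are illustrative). The success probability peaks near (π/4)√N ≈ 25 iterations and then starts falling, as just described.

```python
import numpy as np

n, N = 10, 2 ** 10
marked = 123                                  # illustrative marked item
psi = np.full(N, 1 / np.sqrt(N))              # uniform superposition

probs = []
for t in range(120):
    probs.append(psi[marked] ** 2)            # success probability if we measured now
    psi[marked] *= -1                         # oracle U_f: flip the marked amplitude
    psi = 2 * psi.mean() - psi                # diffusion D: reflect about the mean

best = int(np.argmax(probs))
print(best, round(probs[best], 4))            # peak near (pi/4)*sqrt(N) ~ 25 iterations
```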
One of the first questions people asked about Grover's algorithm was: what if the number of marked items, K, isn't known? You can sort of see the danger. It's possible to run Grover's algorithm the right number of times to hit the peak success probability when there's a single marked item, but that might overshoot and lead to a success probability near 0 if there are more marked items.
The most basic way to solve this problem is simply to run the algorithm for a random number of iterations, say between 0 and √N. If we do this then most
of the time we expect to end up somewhere around the middle of a sinusoid
(neither at a trough nor a peak), where we have a constant probability (say,
40% or 60%) of observing a solution if we measure. This is perfectly sufficient
from an algorithmic standpoint, since it means that we only need to repeat the algorithm O(1) times on average until we see a marked item. This gives us an upper bound of O(√N) queries to find a marked item with high probability,
regardless of how many marked items there are (assuming there’s at least one).
What happens if we run Grover’s algorithm, but the database turns out to
have no marked items? When we query f nothing happens. When we apply
the diffusion operator nothing happens. So, the state just remains a uniform
superposition over all N items for the entire duration of the algorithm. This
means that when we measure we just get a random item. We can check that
item and see that it isn’t marked.
How can we be certain that there are no marked items? This is the question
that arises in the decision version of Grover’s algorithm. In fact, no matter how
many times we run Grover’s algorithm, we never become 100% sure that there
are no marked items since we could’ve just gotten unlucky and failed to find the
items every time. However, after O(1) repetitions, the algorithm has as high
a probability as we like (say, > 99.99%) of finding a marked item assuming
that there’s at least one of them. If, after we’ve run Grover’s algorithm a
sufficient number of times, we still haven’t found a marked item, then we
can deduce that there almost certainly weren't any. This again requires only O(√N) queries.
We've said that if there are K marked items then we can find one of them in O(√N) queries without knowing K. In fact we can do even better than that and find a marked item in only O(√(N/K)) queries, again without knowing
K. This is the same performance as if we did know K, so how does this
work? Assume for simplicity that N is a power of 2. First, we guess that
almost all items are marked, do a single query and then measure. If we find
a marked item, great. If not, we guess that N/2 items are marked and run Grover's algorithm with K = N/2. If we find a marked item, great. Next we run Grover's algorithm with K = N/4, then K = N/8 and so on, repeatedly halving our guess for the number of marked items until either we've found a marked item or we've searched unsuccessfully with K = 1.
This method wastes some queries on “wrong” values of K. Crucially, be-
cause the number of queries is increasing exponentially, the number of wasted
queries is only a constant factor greater than the number of queries used in
the final iteration; the one that guesses an approximately correct value of K.
Details of the analysis are left as an exercise for the reader.
Let’s end by mentioning a different way to handle the case of multiple
marked items that achieves essentially the same performance using a purely
classical trick. Again suppose we have N items, K of which are marked. We
want to reduce this to the case of just a single marked item. How do we do that? Simple: we pick N/K items uniformly at random and then run Grover's algorithm on that subset only. The number of marked items that we'll catch in the subset is well approximated by a Poisson distribution, and one can calculate the probability of catching exactly one marked item in our sample as ∼1/e. So, we search that subset of N/K items using Grover's algorithm for the single marked item case (which uses O(√(N/K)) queries). If we don't find a marked item we can try again with a new random subset.
Exercise for the reader: Show that, if there are K marked items and we want to find all of them, we can do that using O(√(NK)) queries.
Lecture 23: The BBBV Theorem and Applications of Grover's Algorithm
23.1 The BBBV Theorem

The BBBV Theorem, due to Bennett, Bernstein, Brassard, and Vazirani, says that Grover's algorithm is optimal: any quantum algorithm for searching an unstructured list of N items needs Ω(√N) queries. Amusingly, BBBV were trying to prove that there's no magic way to search faster using a quantum computer. They were able to get a lower bound of Ω(√N), and figured that tightening the bound to Ω(N) was a technical issue that they could leave for the future—that is, until Grover came along and showed why such a tightening is impossible!
While by now we know many proofs of the BBBV Theorem, the original
(and still most self-contained) proof uses what’s called a Hybrid Argument.
Imagine we’re using an arbitrary quantum algorithm to search for a single
marked item in a list of size N . Without loss of generality we can say that the
algorithm makes a total of T queries and follows some sequence of operations
that looks like
U0 → Q0 → U1 → Q1 → U2 → Q2 → · · · (23.1)
We’ll let |ψt i denote the state immediately following the tth query. Let
x ∈ {0, . . . , N − 1} denote an index of a queried list element, and suppose
we allow the algorithm to use an unlimited number of ancillary qubits for
workspace, the state of which will be denoted w. In general, |ψ_t⟩ can be written as a superposition over the list elements x and the possible values of the workspace:

|\psi_t\rangle = \sum_{x,w} \alpha_{x,w,t} |x, w\rangle .

Here α_{x,w,t} is the amplitude of the basis state |x, w⟩ following the tth query. Note that \sum_w |\alpha_{x,w,t}|^2 is the probability of finding the item x if we were to measure after the tth query. The query magnitude

M_x = \sum_{t=1}^{T} \sum_w |\alpha_{x,w,t}|^2

is therefore the sum over all iterations t of the probability that we would find item x if we measured at iteration t.
Rearranging, we find that the sum of all the query magnitudes is
\sum_{x=0}^{N-1} M_x = \sum_{x=0}^{N-1} \sum_{t=1}^{T} \sum_w |\alpha_{x,w,t}|^2 = \sum_{t=1}^{T} \sum_{x=0}^{N-1} \sum_w |\alpha_{x,w,t}|^2 = \sum_{t=1}^{T} 1 = T .   (23.4)
From this it immediately follows that the average query magnitude is
\frac{1}{N} \sum_{x=0}^{N-1} M_x = \frac{T}{N} .   (23.5)
Furthermore, given any list of numbers there must be at least one number
in the list whose value is at most the average. Thus let's fix some index x* ∈ {0, . . . , N − 1} such that M_{x*} ≤ T/N.
Figure 23.1: Modified table with the final query changed to return 1 for x∗ .
Suppose we first run the algorithm against an oracle that returns 0 on every input, giving states |ψ_t⟩, and then run it again with the oracle modified so that the final query returns 1 for x* (Figure 23.1), giving states |ψ′_t⟩. So how much can changing the result of the last query change the final state? Before the final query, the states are identical: |ψ_t⟩ = |ψ′_t⟩ for t < T. The effect of the final query is to flip the phase of the amplitudes associated with x*, so

|\psi_T\rangle - |\psi'_T\rangle = \left( \sum_{x=0}^{N-1} \sum_w \alpha_{x,w,T} |x,w\rangle \right) - \left( \sum_{x \ne x^*} \sum_w \alpha_{x,w,T} |x,w\rangle - \sum_w \alpha_{x^*,w,T} |x^*,w\rangle \right) = 2 \sum_w \alpha_{x^*,w,T} |x^*,w\rangle ,

so that

\left\| |\psi_T\rangle - |\psi'_T\rangle \right\| = 2\sqrt{\sum_w |\alpha_{x^*,w,T}|^2} .   (23.6)

Figure 23.2: For the hybrid argument we repeat the process of swapping out the all-zero oracle for more and more queries until we've replaced it with the oracle returning 1 for x* for every iteration of the algorithm.
Suppose now that the hybrid oracle returns 1 for x∗ on the last two queries.
How much will the output state differ from that in the all-zero case now? We
again know that up to the second-to-last query the states are the same. The
distance between |ψT −1 i and |ψT0 −1 i is then (following the exact same process
as above)
|\psi_{T-1}\rangle - |\psi'_{T-1}\rangle = 2 \sum_w \alpha_{x^*,w,T-1} |x^*,w\rangle , \qquad \left\| |\psi_{T-1}\rangle - |\psi'_{T-1}\rangle \right\| = 2\sqrt{\sum_w |\alpha_{x^*,w,T-1}|^2} .   (23.7)
But we're not done yet, because what about the final query? We'll define the new vectors |e_T⟩ = |ψ_T⟩ − |ψ′_T⟩ and |e_{T−1}⟩ = |ψ_{T−1}⟩ − |ψ′_{T−1}⟩. Rearranging, we can rewrite |ψ′_T⟩ as |ψ_T⟩ − |e_T⟩ and |ψ′_{T−1}⟩ as |ψ_{T−1}⟩ − |e_{T−1}⟩. Then, by
The first term above is precisely the output state |ψT0 i that we had when we
changed only the final query to include the marked item. So we can rewrite
the above as
Here the third line crucially used the fact that QT and UT are unitary.
It is straightforward to generalize this to replacing more and more of the
queries with ones that return f(x*) = 1, as shown in Figure 23.2. Let's call the state corresponding to replacing the oracle for all T iterations |ψ_T^{(T)}⟩ (where the superscript denotes the number of times we've modified the oracle). Let M_{x,t} = \sum_w |\alpha_{x,w,t}|^2, so that M_x = \sum_{t=1}^{T} M_{x,t}, where M_x is the query magnitude of x. Then
\left\| |\psi_T\rangle - |\psi_T^{(T)}\rangle \right\| \le 2 \sum_{t=1}^{T} \sqrt{\sum_w |\alpha_{x^*,w,t}|^2} = 2 \sum_{t=1}^{T} \sqrt{M_{x^*,t}} .   (23.11)
Since \sum_{t=1}^{T} M_{x^*,t} = M_{x^*} \le T/N, the right-hand side is maximized when all the M_{x^*,t} are equal to 1/N. In that case,

\left\| |\psi_T\rangle - |\psi_T^{(T)}\rangle \right\| = 2 \sum_{t=1}^{T} \sqrt{M_{x^*,t}} \le \frac{2T}{\sqrt{N}} .   (23.13)

But for the algorithm to reliably distinguish the all-zero oracle from the oracle with marked item x*, this distance needs to be at least a constant, which forces T = Ω(√N).
23.2 Applications of Grover's Algorithm

As an example application, consider an OR of ANDs: we have a √N × √N table of bits, and we want to know whether some row consists entirely of 1's. Classically this requires ∼N queries. Quantumly we could speed this up by searching each row for a 0 using Grover's algorithm. The query complexity for each row would be ∼√(√N) = N^{1/4}, or technically N^{1/4} log(N) if we repeat the Grover search on each row enough times to have (say) a 1/N probability of error. This means that searching the whole table would take ∼ √N · N^{1/4} log(N) = N^{3/4} log(N) queries.
But we can do even better by running an outer Grover search over the rows as well:

O( \underbrace{N^{1/4}}_{\text{Outer Grover}} \times \underbrace{N^{1/4}}_{\text{Inner Grover}} \times \underbrace{\log(N)}_{\text{Error Reduction}} ) = O(\sqrt{N}\,\log(N)) .
Why couldn’t we just do Grover’s algorithm once, over the whole table?
Well, just because there’s a 0 somewhere in the table doesn’t mean that there
couldn’t be a row of all 1’s somewhere else.
OR of ANDs Tree
We could easily generalize the scheme we used for the OR of ANDs problem
to evaluate (e.g.) an OR of ANDs of ORs by doing three recursive layers of
Grover search and so forth. If we allow an arbitrary number of layers then
we enter a setting commonly seen in A.I. research: game trees for two-player
games of alternation such as chess and Go. In this setting the goal is to find
a move you can make (represented by an OR over various options), such that
given any move that your opponent makes (represented by ANDs over various
options), there is a move that you can make in response, etc. . . that eventually
wins you the game.
The problem is that as the game tree gets deeper and deeper, the advantage
of Grover’s algorithm over classical search seems to get weaker and weaker.
This is for two reasons: first, the amplification that’s needed at each layer to
prevent error buildup and second, the constant factors, which multiply across
the layers. With regards to the second reason, you might object that the constant factor for Grover's algorithm is π/4 < 1, so if we multiply this constant factor across layers you'd think we'd do better and better with increasing depth! Note however that each layer actually needs to run Grover's algorithm on the layer below it twice: once to evaluate |x⟩ → |x⟩|f(x)⟩ for that layer, and a second time to uncompute any garbage generated (which would have the effect of destroying the interference effects needed for higher levels of the recursion). So the constant factor actually becomes π/2 > 1.
In short, we still haven’t answered the natural question: can a quantum
computer help you play chess? For game-tree search with a deep enough tree,
Prof. Aaronson and others conjectured that the diminishing returns from
Grover’s algorithm would end up negating any asymptotic advantage over a
classical computer.
In 2007, however, Farhi, Goldstone, and Gutmann, along with others who built on their work, dramatically refuted that conjecture. The upshot is that we now know how to evaluate any game tree with N leaves, no matter how deep, in O(√N) time on a quantum computer. This is also known to be
asymptotically optimal.
So, yes, quantum computers probably would help you play chess! (At least if brute-force game tree evaluation is a component of the classical algorithm you're competing against.) To attach some numbers to this claim, Claude Shannon famously estimated the number of possible board positions in chess as ∼10^{43}, which is certainly out of range for any existing computer on earth. But if quantum computers brought that down to ∼10^{21.5}, solving chess might just be doable.
24.1 More Applications of Grover's Algorithm
In the Collision Problem, we're given oracle access to a two-to-one function f and asked to find any two distinct inputs x, y with f(x) = f(y). Classically, the birthday paradox tells us how many queries this takes; we already met the Birthday Paradox back in Lecture 18, when discussing Simon's algorithm. If there are N days in the year, then you only need to ask about √N people before there's an excellent chance
that you’ll find two with the same birthday. This is because what matters is
the number of pairs of people, which grows quadratically with the number of
people asked. The lower bound can be proven using the union bound. With a random two-to-one function, each pair has only a ∼1/N chance of being a collision, so to find a collision with constant probability you need to look at ∼N pairs or more (and therefore make ∼√N queries).
What about quantumly? Well, we could of course simulate the above randomized algorithm to get ∼√N. But there's also a completely different way to get ∼√N. Namely, we could first query f(1), and then do a Grover search for an x ≠ 1 such that f(x) = f(1). So a question naturally arises: can we combine the two approaches to do even better than √N?
In 1997 Brassard, Hoyer and Tapp (BHT) showed how to do exactly that.
Here’s their algorithm:
I First, pick N 1/3 random inputs to f , query them classically and sort the
results for fast lookup.
I Next, run Grover’s algorithm on N 2/3 more random inputs to f (inputs
that weren’t queried in the first step). In this Grover search, count each
input x as marked if and only if f (x) = f (y), where y is one of the N 1/3
inputs that was already queried in the first step. This requires lookups
to our sorted list, but no additional queries to f .
This algorithm makes N^{1/3} × N^{2/3} = N pairwise comparisons, and the runtime is

N^{1/3} + \sqrt{N^{2/3}} = O(N^{1/3}) .   (24.1)
The BHT algorithm gives a good illustration of one way quantum algo-
rithms can end up with weird running times. You have two or more phases of
the algorithm that you try to balance against each other in order to minimize
the total time.
At a high level you can see why the BBBV proof that we used to prove the
optimality of Grover’s algorithm doesn’t work for the collision problem. In the
BBBV proof we changed a single element from 0 to 1 and then showed that
it would take many iterations for the algorithm to notice. With the collision
problem, by contrast, the key issue is that turning a one-to-one function into
a two-to-one function requires changing half the elements. Instead, Aaronson
and Shi used polynomial approximation theory (a branch of mathematics) to
rule out super-fast quantum algorithms for the collision problem.
In some sense, proving a quantum lower bound for the collision problem
should be harder than proving one for the Grover problem, because if the lower
bound for collision did too much then it would rule out things like Simon’s
algorithm or Shor’s algorithm. The details of the proof are beyond the scope of
this course, but generally what allows a lower bound for the collision problem
is that it has permutation symmetry that Simon’s and Shor’s problems lack: if
you take a one-to-one function and permute its inputs and outputs arbitrarily,
then it’s still a one-to-one function, and likewise for a two-to-one function.
In the Element Distinctness problem, we're given oracle access to a function f on N inputs and asked whether there exist two distinct inputs x ≠ y with f(x) = f(y). Here's one quantum approach, built from nested Grover searches.

Inner Subroutine:
I Given a list of the N values of the function, we randomly split them into √N blocks of √N values each.
I We then pick a block at random and query all elements in it.
– Sort elements for fast lookup. This doesn’t require any extra queries.
– If we find a collision, return it and halt.
I If we don’t find a collision then use Grover to search for collisions between
the block we initially selected and the rest of the list. If we find a collision,
return it and halt.
Outer Grover:
I Run Grover over the choice of which of the √N blocks we query initially when running the inner subroutine above.
– If any of the inner subroutines being Grover searched over return a
collision, then return it and halt.
Suppose, for contradiction, that Element Distinctness could be solved in o(N^{2/3}) queries. Given a two-to-one function on N inputs, pick a random subset of √N of the inputs; by the birthday argument, the subset contains a pair of colliding inputs with constant probability. We now simply run our hypothesized Element Distinctness algorithm on that subset. This gives a query complexity of o((√N)^{2/3}) = o(N^{1/3}), which contradicts the lower bound we proved for the Collision Problem.
Matching this lower bound, in 2003 Andris Ambainis found a quantum al-
gorithm that solves Element Distinctness with O(N 2/3 ) queries. His algorithm
uses “quantum walks,” which are vaguely like Grover’s algorithm but more
sophisticated. It also requires a huge amount of workspace qubits, namely
∼ N 2/3 of them. Whether this large number of workspace qubits is necessary
remains open to this day.
Quantumly it turns out that we can solve this problem using only 1 queries,
a quadratic speedup, by using a clever application of Grover’s algorithm (we
omit the details here).
24.3 Quantum Complexity Theory

We know that P ⊆ BQP, because Toffoli gates can simulate AND, OR,
and NOT gates and hence universal classical digital computation. So any clas-
sical digital calculation can be simulated by a quantum computer. We also
know from Shor’s algorithm that Factoring (when suitably phrased as a deci-
sion problem) is in BQP, though it’s not known (to put it mildly) whether
Factoring is in P. We also don’t know whether NP ⊆ BQP: that is, whether
quantum computers can solve all NP problems (including NP-complete prob-
lems) in polynomial time. The BBBV Theorem does tell us that there isn’t an
“easy” proof of NP ⊆ BQP to be had, one that just treats the NP problem
as a black box.
Whether quantum computers can solve NP-complete problems in polynomial time remains one of the big open problems of quantum complexity theory. Of course, if P = NP then NP ⊆ BQP, so any proof of NP ⊄ BQP would require proving P ≠ NP at the least. No reduction or equivalence between the P = NP and NP ⊆ BQP problems is currently known.
We could also ask the converse: is BQP ⊆ NP? In other words, for every problem that a quantum computer can solve, is there a short proof of the answer
that’s easy to verify classically? It’s possible that there are counterexamples
to this, but we don’t have any good candidates right now—that is, unless
we broaden the definitions of BQP and NP beyond decision problems, to
capture more general problems such as promise problems, search problems
and sampling problems.
The last important question to ask here is: if BQP doesn’t seem to be
contained in P, and maybe not even in NP, then what is it contained in?
What classical class gives an upper bound on what a quantum computer can
do? Well, Bernstein and Vazirani showed that it’s possible to simulate a
quantum computer classically with exponential time and polynomial memory,
basically by writing an amplitude of interest as a sum of exponentially many
contributions, and then evaluating the contributions one by one, reusing the
same memory each time and adding the results to a running total. This gives us
an upper bound: BQP ⊆ PSPACE, where PSPACE (Polynomial Space)
is the class of problems solvable on a digital computer using a polynomial
amount of memory, but possibly exponential time. It’s possible to get a better
upper bound on BQP, but it involves other complexity classes that we won’t
define here.
So, what would it take to prove that P is different from BQP? Of course this would follow if factoring wasn't in P, but proving the latter would require showing P ≠ NP! It's also not clear that there is any better hope for proving P ≠ BQP in the near future than there is for proving P ≠ NP. The reason is that BQP is sandwiched between P and PSPACE:

P ⊆ BQP ⊆ PSPACE.

For this reason, any proof of P ≠ BQP would also need to show that P ≠ PSPACE, which is a big unsolved problem in itself.
But even if it turns out that quantum computers can't solve NP-complete problems in polynomial time, the question still remains: how close can they get?
get? We know from the BBBV Theorem that any approach that ignores the
structure of NP-complete problems can at most yield the Grover speedup.
Lecture 25: Hamiltonians
25.2 Hamiltonians
Recall that unitary matrices are discrete-time linear transformations of quan-
tum states:
|ψi → U |ψi
But in physics, time is typically treated as continuous. Rather than view-
ing the transformation of |ψi to U |ψi as a discrete jump, it is viewed as the
result of a continuous process causing the state to evolve over some interval
of time. Hamiltonians are just the instantaneous time generators of unitary
transformations. That is, they’re things that give rise to unitary transforma-
tions when you “leave them running” for some period of time. Like density
matrices, Hamiltonians are described by Hermitian matrices. Unlike density
matrices, however, Hamiltonians don’t need to be positive semidefinite or to
have trace 1. Physically, Hamiltonians are operators that are used to represent
the total energy of a system.
i \frac{d}{dt} |\psi\rangle = H |\psi\rangle ,   (25.1)
with H being some Hamiltonian. This equation describes the evolution of an
isolated quantum pure state in continuous time.
The solution to this differential equation is |ψ(t)⟩ = e^{−iHt} |ψ(0)⟩. Here the matrix exponential is defined by the same Taylor series as the ordinary exponential,

e^{M} = I + M + \frac{M^2}{2!} + \frac{M^3}{3!} + \cdots ,

where we now just plug in a matrix instead of a scalar in order to get a matrix-valued result. Here are some examples:
\exp\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} , \qquad \exp\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} e & 0 \\ 0 & 1 \end{pmatrix} .   (25.4)
More generally, for any diagonal matrix we have
\exp\begin{pmatrix} \lambda_0 & & \\ & \ddots & \\ & & \lambda_{n-1} \end{pmatrix} = \begin{pmatrix} e^{\lambda_0} & & \\ & \ddots & \\ & & e^{\lambda_{n-1}} \end{pmatrix} .   (25.5)
First, recall that the eigenvalues of a Hermitian matrix are real: if H|v⟩ = λ|v⟩ for a unit vector |v⟩, then ⟨v|H|v⟩ = λ. If we take the complex conjugate of both sides we get ⟨v|H†|v⟩ = λ*, and since H† = H this means λ* = λ. Moreover, by the spectral theorem we can write

H = U D U^{\dagger} ,   (25.7)

where U is a unitary transformation and D is diagonal and real-valued. Now,
to show that e^{−iHt} is unitary, just diagonalize H:

e^{-iHt} = e^{-it U D U^{\dagger}} = U e^{-itD} U^{\dagger} .   (25.8)
We know from Equation 25.5 that e^{−itD} has the form

e^{-itD} = \begin{pmatrix} e^{-it\lambda_0} & & \\ & \ddots & \\ & & e^{-it\lambda_{n-1}} \end{pmatrix} ,   (25.9)

which is clearly unitary because the λ_i's are real. Therefore, e^{−iHt} = U e^{−itD} U† is unitary as well.
Note that if |v⟩ is an eigenvector of H associated with the eigenvalue λ, then |v⟩ is also an eigenvector of e^{−iHt}, with eigenvalue e^{−iλt}. So eigenvectors of H give rise to eigenvectors of the corresponding unitary.
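As a quick sketch, here is how one might compute e^{−iHt} numerically via this diagonalization (the 2×2 Hermitian matrix and the time value are arbitrary, chosen only for illustration):

```python
import numpy as np

def evolve(H, t):
    # e^{-iHt} via the spectral theorem: diagonalize H, exponentiate the eigenvalues.
    lams, U = np.linalg.eigh(H)                      # H = U diag(lams) U^dagger
    return U @ np.diag(np.exp(-1j * lams * t)) @ U.conj().T

H = np.array([[1.0, 0.5], [0.5, -1.0]])              # an arbitrary Hermitian matrix
U_t = evolve(H, 0.7)
print(np.allclose(U_t @ U_t.conj().T, np.eye(2)))    # True: e^{-iHt} is unitary
```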
Now, what about going backwards? Given a unitary U, can we always find a Hermitian matrix H such that U = e^{−iHt}? Yes, this is not hard. First
diagonalize U (which we can always do for unitary matrices, again by the
spectral theorem) to get U = V DV † . We then just need to take matrix
logarithm—like the matrix exponential, this can be defined in terms of the
Taylor expansion. Similar to what we found for the matrix exponential, the
logarithm of a diagonal matrix D can be obtained by taking the logarithm of each entry. For each diagonal element D_{jj}, we find a λ_j such that D_{jj} = e^{−iλ_j t}. Will the set of {λ_j} that we get by solving this be unique? No, because by Euler's formula we can always add 2πi to the exponent and the equation will still hold. We saw, for example, that
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}

is a logarithm of the identity matrix. What else is?

\begin{pmatrix} 2\pi i & 0 \\ 0 & 2\pi i \end{pmatrix} , \quad \begin{pmatrix} 4\pi i & 0 \\ 0 & 6\pi i \end{pmatrix} , \quad \cdots   (25.10)
Thus, any given unitary can arise from infinitely many dif-
ferent Hamiltonians.
25.2.2 Energy
Physicists have a special name for the eigenvalues that you get by diagonalizing a Hamiltonian. They call them energies. Note that they're all real and can therefore be ordered from least to greatest:

D = \begin{pmatrix} \lambda_0 & & \\ & \ddots & \\ & & \lambda_{n-1} \end{pmatrix} ,   (25.11)

with λ_0 ≤ λ_1 ≤ ⋯ ≤ λ_{n−1}. To each energy λ_j there corresponds an energy eigenstate |v_j⟩ such that H|v_j⟩ = λ_j|v_j⟩. Why are they called energies? Because they are possible values for the system's energy.
The energy eigenstates |v_j⟩ form a complete basis (we won't prove that at the moment, but it is indeed true), so we can write an arbitrary state as a superposition over the energy eigenstates:

|\psi\rangle = \sum_j \alpha_j |v_j\rangle .

Under the Hamiltonian H, this state evolves as

|\psi(t)\rangle = \sum_j \alpha_j e^{-i\lambda_j t} |v_j\rangle ,

so each energy eigenstate simply picks up a phase at a rate set by its energy.
But the above picture is actually extremely useful. For one thing, it suggests
that we can simply define energy as the speed at which a quantum state picks
up a phase.
One thing that's clear from our definition is that energy is conserved. More formally, the expectation value of the energy in the state |ψ⟩,

\langle \psi | H | \psi \rangle = \sum_j |\alpha_j|^2 \lambda_j ,

doesn't change as the state evolves, since the magnitudes |α_j| stay fixed while only the phases rotate.
This initial guess that your system is in the ground state turns out to
work very well much of the time. Why are quantum systems often found
sitting in their ground states? Intuitively, it’s because physical systems “like”
to minimize their energy: when lower energy states are available, they tend
towards them. The ground state, by definition, is the lowest they can go.
But since quantum mechanics is time-reversible, how is it even possible for a
system to be “attracted” to a certain state? Excellent question! You can thank
the Second Law of Thermodynamics and the conservation of energy for
this.
The same question arises in classical physics,
which is time-reversible too. If you leave a ball
rolling around in a basin and return a while later
you probably won’t find it in an “excited state”—
i.e. continuing to roll around. Whatever energy
it had in its excited state, it could reach a lower
energy by rolling downhill and giving off heat (en-
ergy) via friction, eventually coming to a rest.
When this happens, the kinetic energy that used to be in the ball dissipates
away in the heat.
In principle it’s possible that the reverse could happen: the heat in the
basin could coalesce back into the ball and make it spontaneously move. But
we essentially never observe that, and the reason comes down to entropy. For
all the heat to coalesce back into the motion of the ball would require an
absurdly finely-tuned “conspiracy,” in which a massive number of particles
synchronize their random motion to impart a kick to the ball. The probability
of this synchronized behavior occurring by chance falls off exponentially with
the number of particles. But the reverse process, motion dissipating into heat,
requires no similar conspiracy; it only requires that our universe does contain
low-entropy objects like balls.
Pretty much exactly the same story works in the quantum case and explains
why, when we find quantum systems in nature, they’re often sitting in their
ground states. If they weren’t then their interactions with surrounding systems
would tend to carry away excess energy until they were. By contrast, all the
quantum algorithms and protocols that we’ve seen in this course are examples
of quantum systems that don’t just sit in their ground states. Stuff happens,
the system evolves!
Figure 25.1: Hydrogen atom emitting a photon and dropping to a lower energy level.
Figure 25.2: Hydrogen atom absorbing a photon and jumping up to a higher energy level.
To give an example of these concepts that the physicists really love, let’s
talk about the hydrogen atom. In the ground state, a hydrogen atom has its
electron sitting in the lowest energy shell (the one closest to the nucleus). The
first excited state has the electron in the next shell up (a bit farther away on
average from the nucleus). If the atom is in its first excited state, it’s easy for
it to drop back down to its ground state by emitting a photon. The photon
carries away an amount of energy that’s exactly equal to the difference between
the ground and the first excited energies. Conversely, a hydrogen atom in its
ground state can jump up to its first excited state via the electron absorbing
a photon. But the latter process is not spontaneous; it requires a photon just
happening to come by. That’s why hydrogen atoms in nature are most often
found in their ground states.
$H = H_0 \otimes H_1$.
We do not, in general, have $e^{-i(H_0 \otimes H_1)} = e^{-iH_0} \otimes e^{-iH_1}$. However, that equation
does hold (up to a global phase) if either $H_0$ or $H_1$ is the identity matrix.
In that case, we can think of H as acting nontrivially on only one of the
two subsystems and trivially on the other (just like with tensor products of
unitaries).
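Here's a quick numpy check of that claim (purely illustrative; the particular matrices are my own choice): in general $e^{-i(H_0\otimes H_1)}$ and $e^{-iH_0}\otimes e^{-iH_1}$ differ, but they agree up to a global phase when one factor is the identity.

```python
# Sketch: the exponential of a tensor product does not factor in general,
# but it does (up to a global phase) when one factor is the identity.
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I = np.eye(2, dtype=complex)

def mismatch(H0, H1):
    """Largest entrywise difference after removing any global phase."""
    lhs = expm(-1j * np.kron(H0, H1))
    rhs = np.kron(expm(-1j * H0), expm(-1j * H1))
    phase = np.exp(1j * np.angle(np.trace(rhs.conj().T @ lhs)))
    return np.max(np.abs(lhs - phase * rhs))

print(mismatch(X, Z))   # clearly nonzero: the equation fails in general
print(mismatch(X, I))   # ~1e-16: holds up to a global phase when H1 = I
```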
What does this mean physically? Intuitively, it just means we’ve got two
things going on at the same time. For example, H0 and H1 could correspond
to two different forces acting on our system.
This basically means that we can achieve the same effect as A and B occurring
simultaneously by repeatedly switching between doing a tiny bit of A and a tiny
bit of B. We won't do it here, but it's possible to prove that the approximation
improves as the step size $\epsilon$ decreases, becoming an exact equality in the limit
$\epsilon \to 0$. This is important for the question of how to simulate a real-world
quantum system using a quantum computer. Indeed, the straightforward approach is just:
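The steps of that straightforward approach aren't reproduced here, but the underlying idea is easy to see numerically. Below is a small numpy sketch (the two toy Hamiltonians are my own choice) showing that alternating many tiny steps of $H_0$ and $H_1$ approximates evolving under $H_0 + H_1$, with the error shrinking as the step size shrinks:

```python
# Sketch of the Trotter idea: alternate small time steps of H0 and H1.
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I = np.eye(2, dtype=complex)

H0 = np.kron(X, I)                        # acts on the first qubit only
H1 = np.kron(I, Z) + np.kron(Z, Z)        # acts on the second qubit and the pair
t = 1.0
exact = expm(-1j * (H0 + H1) * t)

for steps in [1, 10, 100, 1000]:
    eps = t / steps
    step = expm(-1j * H0 * eps) @ expm(-1j * H1 * eps)
    approx = np.linalg.matrix_power(step, steps)
    print(steps, np.max(np.abs(approx - exact)))   # error shrinks roughly like 1/steps
```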
To flesh this out a bit more we ought to say something about what the
Hamiltonians of real physical systems tend to look like. Suppose we model
the universe as a gigantic lattice of qubits, say in 2 or 3 dimensions (hey, it is
a quantum computing class!). This is actually exactly the sort of model used
to study many condensed matter systems. Suppose too that we only consider
Hamiltonians built out of single-qubit terms and nearest-neighbor interactions,
so that
$$H = \sum_j H_j + \sum_{(j,k)} H_{jk}.$$
Here $H_j$ is a Hamiltonian that acts only on the qubit $j$ and trivially (i.e., as
the identity, which has no effect) on all the others, for example:
$$H_0 = \begin{pmatrix} h_1 & h_2 \\ h_3 & h_4 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \otimes \cdots \otimes \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
Meanwhile, for every edge (j, k) in the lattice, Hjk is a Hamiltonian that acts
nontrivially on the neighboring qubits j and k and trivially on all the other
qubits, for example:
$$H_{jk} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \otimes I^{\otimes n-2}.$$
Figure 25.3: Example of a lattice of qubits with some set of pairwise nearest-
neighbor interactions (blue squiggly lines) between them.
So each qubit “talks” only to its immediate neighbors in the lattice. Even
so, evolving the Hamiltonian over time gives us effects that can propagate
arbitrarily far.
As soon as we’ve written this, though, we face a puzzle. Won’t this lead to
faster-than-light travel? Indeed, even when t is arbitrarily small, one can check
that the unitary matrix $e^{-iHt}$ will generally contain effects coupling every qubit
in the lattice to every other one. Granted, the magnitude of these effects will
fall off exponentially with distance, but causality demands that there should
be literally zero effects propagating across the lattice faster than light.
So what’s the resolution? Basically it’s just that the picture we’re using
comes from non-relativistic quantum mechanics, so it yields a good approxi-
mation only if the relevant speeds are small compared to the speed of light.
When the speeds are larger, we need the framework of quantum field theory,
which does ensure that faster-than-light influences are exactly zero.
OK, now we’re ready to set things up for the next lecture. Suppose that H,
a Hamiltonian acting on n qubits, is the sum of many “simple” Hamiltonians
acting on a few qubits each:
H = H0 + · · · + Hm−1 .
Since H is a $2^n \times 2^n$ matrix, figuring out its ground state (or ground states) by
brute-force diagonalization could be extremely time-consuming. This leads to
a question: if I know the ground states of the Hj ’s individually, can I combine
them in some simple way to get the ground state of H itself? Alas, the answer
is almost certainly “no.” More precisely, we claim that finding the ground
state of a Hamiltonian of this form is an NP-hard problem. To prove this
we’ll show how to take any instance of the famous 3SAT problem and encode
it into the ground state problem. Suppose we have a Boolean formula in n
variables,
$$H = \sum_{i=0}^{m-1} H_i \qquad\qquad (25.19)$$
How do you give someone a Hamiltonian like that? Providing the full
$2^n \times 2^n$ Hermitian matrix would be wasteful. Instead you can simply list the
local terms $\{H_0, \ldots, H_{m-1}\}$ (to some suitable precision), together with the
qubits to which they're applied.
Is this problem NP-complete? Since we know it’s NP-hard, what we’re
asking here is whether it’s also in NP. In other words, when we claim that the
ground-state energy of some Hamiltonian is at most (say) 5, can we prove it by
giving a short witness? It turns out that we can—but, as far as anyone knows
today, only by giving a quantum witness! A quantum witness that works is
simply the n-qubit ground state itself. Thus, the Local Hamiltonians problem
is not known to be in NP; it’s only known to be in the quantum analogue of
NP, which is called QMA (Quantum Merlin-Arthur). An important theorem
from around 1999 says that Local Hamiltonians is actually complete for QMA,
just like 3SAT is complete for NP.
So, if natural quantum systems like to settle into their ground states, and if
finding the ground state is NP-hard, does this mean that we could use quantum
systems to solve NP-hard problems? People talked about such questions even
before the concept of quantum computing was in place. But there’s a serious
problem. It’s not always true that natural quantum systems quickly settle
into their ground states. Encoding hard instances of 3SAT might produce
complicated and exotic Hamiltonians, far from physicists’ usual experience.
Those complicated Hamiltonians might be ones for which it’s hard to reach
the ground state.
Figure 26.1: Optimization landscape with a local minimum that the particle
might get trapped in.
In the hillside above, will the ball get to the point that minimizes its grav-
itational potential energy? Probably not anytime soon! If we wait a million
years, maybe a thunderstorm will push the ball up over the hill in the middle,
or something. But for the foreseeable future, it’s much more likely for the ball
to rest at the local minimum on the left.
In hard real-world optimization problems you may have a very bumpy
landscape, with thousands of dimensions and plenty of local optima to get
trapped in. You might wonder if quantum computing could help us wade
through these local optima, and it certainly seems plausible. In fact, the hope
that it could was a central starting point for today's topic.
So, gradually changing the Hamiltonian moves us from one ground state to
another ground state. In the minds of Farhi, Goldstone, Gutmann and Sipser
this suggested a plan to solve NP-hard problems using Hamiltonians.
Here’s how the adiabatic algorithm would work with 3SAT as an example.
First we need to pick a Hamiltonian with a known, easy-to-prepare ground
state. For example, consider the Hamiltonian
$$H = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \qquad\qquad (26.1)$$
with eigenstates $|+\rangle$ and $|-\rangle$. The energy of $|+\rangle$ is 0 and the energy of $|-\rangle$ is
2, so $|+\rangle$ is the ground state. We can create an initial Hamiltonian $H_i$ (note
we're using $i$ here to denote "initial," not as an index) by applying $H$ to each
qubit individually:
$$H_i = \sum_{j=0}^{n-1} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}_j. \qquad\qquad (26.2)$$
Here the subscript $j$ is meant to denote that the $j$th term acts nontrivially
on qubit $j$, tensored with the identity matrix on the remaining $n-1$ qubits.
The ground state of $H_i$, namely $|+\rangle^{\otimes n}$, has the advantage that it is easy to
prepare on a quantum computer. We then gradually change $H_i$ to another
Hamiltonian $H_f$, which encodes some $n$-bit 3SAT instance that we'd like to
solve:
$$H_f = \sum_{j=0}^{m-1} h_j, \qquad\qquad (26.3)$$
the ground state. We define the Minimum Eigenvalue Gap, g, as the min-
imum of the energy difference between the first excited energy and the ground
energy as a function of the time t.
We’ll call the time at which g is minimized tmin . The gap g turns out to be a
crucial quantity for determining how long the adiabatic algorithm will take to
solve a given problem instance. Roughly speaking, in order to remain in the
ground state throughout the entire computation, we need to ensure that near
$t_{\min}$ the rate of evolution of the system is $\propto g^2$. This requirement means that
the overall computation will run in something like $\sim \frac{1}{g^2}$ time.
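To make the role of $g$ concrete, here's a small numpy sketch of how one could compute the minimum gap numerically. It assumes the standard linear interpolation $H(s) = (1-s)H_i + sH_f$, uses the $H_i$ from Equation (26.2), and, instead of a real 3SAT instance, substitutes a made-up diagonal cost Hamiltonian whose entries play the role of the number of violated clauses:

```python
# Sketch: minimum eigenvalue gap of H(s) = (1-s) H_i + s H_f on a toy instance.
import numpy as np

n = 4
dim = 2 ** n
I2 = np.eye(2)
h = np.array([[1, -1], [-1, 1]], dtype=float)     # the term from Eq. (26.2)

def embed(op, qubit):
    """Tensor a single-qubit operator into the n-qubit space at position `qubit`."""
    out = np.array([[1.0]])
    for q in range(n):
        out = np.kron(out, op if q == qubit else I2)
    return out

H_i = sum(embed(h, j) for j in range(n))

# Stand-in for a 3SAT cost function: random clause counts with a unique zero.
rng = np.random.default_rng(1)
costs = rng.integers(1, 5, size=dim).astype(float)
costs[rng.integers(dim)] = 0.0                    # one "satisfying assignment"
H_f = np.diag(costs)

gaps = []
for s in np.linspace(0.0, 1.0, 201):
    evals = np.linalg.eigvalsh((1 - s) * H_i + s * H_f)   # ascending eigenvalues
    gaps.append(evals[1] - evals[0])

print(f"minimum gap g = {min(gaps):.4f} at s = {np.argmin(gaps) / 200:.3f}")
```

On hard instances, of course, the whole point is that this gap can become exponentially small as $n$ grows.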
If we could show that g was always lower-bounded by $\frac{1}{n^{O(1)}}$ for 3SAT (or
some other NP-complete problem), then we’d get NP ⊆ BQP; quantum com-
puters would be able to solve all NP problems in polynomial time. In reality
we’re approximating the Hamiltonians with discrete-time unitary transforma-
tions so we’d end up with something close to the ground state of Hf , not the
ground state itself. But even that would still be good enough to solve our
NP-hard problem.
So the question boils down to: what is the behavior of the minimum spectral
gap for the problem instances that we want to solve?
What emerged after a couple of years is that, for hard instances of 3SAT
(or even 2SAT, for that matter), the minimum eigenvalue gap often does get
exponentially small. At the avoided level crossing you’d need to run the al-
gorithm for an exponential number of steps in order to remain in the ground
state and thereby solve your SAT instance.
But the story doesn’t end here. The physicists regrouped and said, “Okay,
so maybe this technique doesn’t always work, but it might still give big ad-
vantages for some types of optimization problems!”
Figure 26.3: Optimization landscape where a sharp thin peak separates the
global minimum from the rest of the landscape.
By now there's been lots of research on classifying the types of solution
landscapes where the adiabatic algorithm performs well and those where it
performs poorly. Some encouraging results came from Farhi, Goldstone, and
Gutmann in 2002. These authors constructed landscapes that had a global min-
imum at the bottom of a wide basin, but also a tall thin spike blocking the
way to that minimum. Starting from the far left a classical algorithm based
on steepest descent would get stuck forever at the base of the spike (i.e., at a
local minimum) and would never reach the global minimum.
OK, but before we examine the performance of the adiabatic algorithm
on this sort of landscape shouldn’t we first look at better classical algorithms?
Indeed, one example of such an algorithm is Simulated Annealing which,
much like the adiabatic algorithm, will always eventually reach the global
minimum if you run it for long enough. In some regards simulated annealing
can be thought of as a classical counterpart to the adiabatic algorithm.
The basic idea of simulated annealing is to evaluate the fitness function
around the current point and then move probabilistically: moves that improve
the fitness are always accepted, while moves that make it worse are accepted
with a probability that decreases as a "temperature" parameter is lowered over
time. A minimal sketch is shown below.
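Here's that sketch in Python (everything here, including the bumpy fitness function, is made up purely for illustration):

```python
# Minimal sketch of standard simulated annealing on a bumpy 1-D landscape.
import math
import random

def fitness(x):
    # Hypothetical landscape with many local minima.
    return (x - 2.0) ** 2 + 1.5 * math.sin(8 * x)

def simulated_annealing(steps=20000, temp0=2.0):
    x = random.uniform(-4, 4)
    best = x
    for k in range(steps):
        temp = temp0 * (1 - k / steps) + 1e-6        # cooling schedule
        candidate = x + random.gauss(0, 0.3)         # small random move
        delta = fitness(candidate) - fitness(x)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            x = candidate                            # accept the move
        if fitness(x) < fitness(best):
            best = x
    return best

x_star = simulated_annealing()
print(x_star, fitness(x_star))
```

The higher and wider the barriers in the landscape, the longer you have to wait for a lucky sequence of uphill moves, which is exactly the exponential slowdown described next.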
On the fitness landscape with the spike in Figure 26.3, simulated annealing
would eventually get over the spike despite how energetically unfavorable it is.
However, if the spike is tall enough then it would take an exponential amount
of time.
On the other hand, if the spike is thin enough, then, as Farhi et al. showed,
the adiabatic algorithm can get past it in only polynomial time. It does
so by exploiting a well-known quantum phenomenon called Tunneling.
Popular articles explain tunneling by saying that a quantum particle can
get through barriers that a classical particle could never get through. It would
probably be more accurate to say: “in places that a classical particle would
need exponential time to get through, sometimes a quantum particle can get
through in polynomial time.” In terms of interference, we can say, “The paths
that involve the particle not going over the spike interfere destructively and
cancel each other out, leaving only paths where the particle does get over it.”
The phenomenon of tunneling is important in many
places in physics. For one thing, it’s why the sun can
shine! Nuclear fusion requires hydrogen atoms to get su-
per close before they realize that it’s energetically favor-
able for them to fuse. The trouble is, when they’re not
quite so close, they strongly repel each other (because
both nuclei are positively charged). When quantum me-
chanics came along, it explained how, while the energy
barrier would prevent fusion classically, it still happens
because the nuclei are able to tunnel past the barrier.
Anyway, the 2002 paper of Farhi et al. was good news for the adiabatic
algorithm, but tunneling only helps if the spike is sufficiently thin. Since
then, we’ve learned more about the types of fitness landscapes for which the
adiabatic algorithm is expected to help. In a landscape like the one pictured
in Figure 26.4 simulated annealing and the adiabatic algorithm would both
have trouble, and would both take an exponential amount of time.
Figure 26.4: An optimization landscape where a very tall thick hill separates
the global minimum from the rest of the landscape. The thicker the barrier,
the less helpful tunneling is in practice.
Or consider the fitness landscape in Figure 26.5 (we can only draw a 1-
dimensional cross-section, but imagine that all of the $2^n$ solutions in an n-
dimensional hypercube have equal values except for the one good solution).
This would also take exponential time for both simulated annealing and the
adiabatic algorithm to traverse. We actually already know this by the BBBV
Theorem—because in this case we’re effectively just querying a black box in
an attempt to find a unique marked item.
It turns out that if you're clever about how you run the adiabatic algorithm,
you can achieve the Grover speedup in the case above, but not anything faster.
Indeed, the BBBV Theorem tells us that $\Omega(2^{n/2})$ steps are needed by the
adiabatic algorithm or any other quantum algorithm. What's cool is that just
by knowing BBBV, without any physics, we can deduce that the spectral gap
has to decrease exponentially.
Just like the expert whom Farhi consulted was alluding to with
his wisecrack, physicists can use knowledge from quantum
algorithms to learn new things about spectral gaps.
Figure 26.6: High-level sketch of the strategy for getting a Grover speedup
using the adiabatic algorithm. The key idea is that we only need to run the
algorithm slowly in the vicinity of the minimum eigenvalue gap.
OK, here’s a subtler question. Suppose the adiabatic algorithm had worked
to solve 3SAT in polynomial time. Would that have violated the BBBV Theo-
rem? It turns out that the answer is no. The BBBV Theorem applies only to
black-box search. The Hamiltonian encoding a 3SAT instance contains richer
information than the black box considered by BBBV. In particular, it encodes
not merely whether each possible solution is satisfying or unsatisfying, but
also the number of clauses that it violates. A 2002 paper by van Dam, Mosca,
and Vazirani showed that that information alone is enough to reconstruct the
3SAT instance in polynomial time—and hence also to solve the instance in
polynomial time if we assumed (for example) that P = NP! This means that
there’s no hope of proving a black-box lower bound like BBBV in this setting.
We also know classical algorithms that can solve 3SAT in less than $2^{n/2}$
time; the best currently known runs in roughly $O(1.3^n)$ time. This is another way of seeing that
the BBBV Theorem can’t encompass everything that it’s possible to do on
3SAT.
OK, let’s consider one more type of landscape: the type pictured in Figure
26.7. Here a funny thing happens: simulated annealing gets into the local
minimum, but the algorithm then escapes it, crosses the plateau and reaches
the global minimum in polynomial time. Meanwhile the adiabatic algorithm
just keeps returning to the local minimum and takes exponential time to reach
the global minimum!
There's an even further problem with establishing quantum speedups for adi-
Figure 26.7: The adiabatic algorithm struggles with this sort of optimization
landscape while classical algorithms like simulated annealing have little diffi-
culty.
ization part? Indeed, the adiabatic algorithm could be seen not only as an
algorithm but also as a proposal for the physical implementation of quantum
computers. An important result by Dorit Aharonov et al., from 2004, says that
adiabatic quantum computers would be universal for quantum computation;
that is, able to solve all BQP problems efficiently.
There’s a venture-capital-backed startup called D-Wave that’s been build-
ing special-purpose devices to implement a noisy approximation to the adi-
abatic algorithm (called quantum annealing), using physical Hamiltonians
themselves. D-Wave’s latest model has about 2000 superconducting qubits.
You can encode an optimization problem of your choice onto their chip by
choosing the interaction Hamiltonian for each pair of neighboring qubits.
D-Wave was all over the press because they actually sold a
few of their machines to companies like Google and Lock-
heed Martin and were notorious for claiming that quantum
computing is “already useful in practice.” They were even
on the cover of Time magazine!
Professor Aaronson has been to D-Wave’s headquar-
ters. Funnily enough their machine is literally a room-sized
black box (most of the hardware inside the box is devoted
to cooling; the actual qubits are on a chip no larger than
an ordinary computer chip).
So what’s the verdict? Experimental data shows that the D-Wave device
is indeed able to solve optimization problems encoded in its special format
at a speed that’s often competitive with the best known classical algorithms.
Unfortunately, results over the past decade do not clearly show any quantum
speedup over the best classical algorithms. Why not? Roughly speaking there
are three main possible causes for the lack of speedup on D-Wave’s current
devices. As far as we know, the truth might be any combination of them.
This sounds pretty bad! Why is anyone optimistic that quantum computing
could scale even in principle? We’ll see why in the next lecture when we
explore the basics of quantum error-correction, a technique that has not yet
been demonstrated (by D-Wave or by anyone else), but that many researchers
expect will ultimately be necessary for scalable quantum computing.
Lecture 27: Quantum Error Correction
At the end of the last lecture we discussed some of the difficulties with achiev-
ing a quantum speedup using currently available quantum computing devices
such as the one manufactured by D-Wave. We saw how D-Wave's devices
are cooled to 10 milliKelvin, but even that might be too hot and lead to too
much decoherence and error! In the setting of adiabatic quantum computing
this shows up mostly as unwanted level crossings. Based on this you might
wonder if even 10 milliKelvin isn’t cold enough—if nothing short of absolute
zero and perfect isolation seem to suffice—then why should building a scalable
quantum computer be possible at all?
We need to separate two issues. First, there’s the “engineering challenge”
of building a scalable quantum computer. Everyone agrees that at a bare
minimum it will be staggeringly hard to achieve the required degree of isolation
for thousands or millions of qubits when those qubits also need to interact with
each other in a carefully choreographed way. Maybe various practical problems
will prevent human beings from doing it in the next 50 or 100 years. Maybe it
will be too expensive. Theory alone can’t answer such questions. Then there’s
the question of whether anything prevents scalable quantum computing even in
principle. If quantum mechanics itself were to break down that could certainly
prevent quantum computing from being possible—but it would also represent
a much more revolutionary development for physics than “merely” building a
quantum computer! So, short of a breakdown of quantum mechanics, on what
grounds have people argued that scalable quantum computers aren’t possible?
The most basic form of classical error correction rests on the idea of
introducing redundancy directly through repetition. The simplest repetition-
based error correcting code is the 3-bit Repetition Code, which lets us
encode one logical bit using 3 physical bits. We encode a logical 0, denoted
0̄, as 000. Likewise we encode a logical 1, denoted 1̄, as 111. We call the bit
strings selected to encode the logical states the codewords. We claim that this
code lets us both detect and correct an error in any one physical bit. Given
a 3-bit input string x = x0 x1 x2 we do error detection by checking whether
x0 = x1 = x2 . Assuming that at most one bit experiences an error we'll be
able to identify it. If we detect an error (say in x0 ) we can do error correction
by setting every bit of x equal to MAJORITY(x0 , x1 , x2 ).
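In code, the whole scheme fits in a few lines (a minimal sketch, assuming at most one bit-flip error):

```python
# The classical 3-bit repetition code: encode, detect, correct.
def encode(bit):
    return [bit, bit, bit]               # logical 0 -> 000, logical 1 -> 111

def detect_error(x):
    return not (x[0] == x[1] == x[2])    # any disagreement signals an error

def correct(x):
    majority = 1 if sum(x) >= 2 else 0
    return [majority, majority, majority]

codeword = encode(1)                     # [1, 1, 1]
codeword[0] ^= 1                         # a single bit-flip error: [0, 1, 1]
print(detect_error(codeword))            # True
print(correct(codeword))                 # [1, 1, 1]: the logical bit survives
```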
It can be shown that any code that can both detect and correct a single
bit-flip error must use at least 3 bits. By contrast, if we just want to detect a
single bit-flip error and not correct it, 2 bits suffice.
Figure 27.1: Graphical representation of the 3-bit bit-flip code. Each corner
of the cube is labeled with a particular 3-bit string, with adjacent corners
differing in exactly one bit. The bottom (green) cloud encloses the set of
states reachable from 000 when at most one bit-flip occurs. The upper (blue)
cloud encloses the set of states reachable from 111 when at most one bit-flip
occurs. In order for our error correcting code to work, the set of states in the
two clouds cannot have any overlap.
A useful geometric picture for why the 3-bit repetition code works is given
in Figure 27.1. We can associate each of the 8 3-bit strings with a corner of a
cube selected such that each point differs from its neighbor in exactly one place
(the hamming distance is 1). Essentially, the code simply picks two points on
the cube that are maximally far from each other and declares one to be the
encoding of 0 and the other to be the encoding of 1. This idea generalizes
to codes with longer codewords, in which case the set of possible bit strings
corresponds to points on the Boolean hypercube.
000 and 111 can each get corrupted to any point in their
respective “clouds”, but since the two “clouds” don’t over-
lap we’re able to correct the error. We’ll seek to replicate
this behavior in the quantum case.
Crucially, until you get over the hurdle of error correction, it may not look
like your quantum computer is doing much of anything useful. This is the main
reason why progress in experimental quantum computing has often seemed
slow (with the world record for Shor’s algorithm remaining the factorization
of 21 into 3 × 7, etc.). Many people believe that practically important
speedups will come only after we’ve overcome this hurdle.
It’s like the joke where someone calls 911 to report a dead
body and the operator asks the caller if they’re sure the
person is actually dead. Gunshots are heard; then the caller
says, “OK, now what?”
But the quantum Zeno effect only solves the problem of quantum error-
correction if the following two things are true:
I The only thing we’re worried about is continuous drift (rather than, e.g.,
discrete bit-flips).
I We know a basis in which our qubit is supposed to be one of the basis
vectors.
Figure 27.2: Circuit for encoding an input state in the 3-qubit bit-flip code.
Figure 27.3: Circuit for encoding an input state into the 3-qubit phase-flip
code.
$$|\bar{+}\rangle = |{+}\rangle\,|{+}\rangle\,|{+}\rangle \quad\text{and}\quad |\bar{-}\rangle = |{-}\rangle\,|{-}\rangle\,|{-}\rangle \qquad\qquad (27.8)$$
OK, but just like the 3-qubit bit-flip code failed to protect against phase-
flip errors, this code fails to protect against bit-flip errors. We can see this by
first writing the logical states |0̄i and |1̄i using the above code
$$|\bar{0}\rangle = \frac{1}{\sqrt{2}}\left(|\bar{+}\rangle + |\bar{-}\rangle\right) = \frac{1}{\sqrt{2}}\left(|{+}\rangle|{+}\rangle|{+}\rangle + |{-}\rangle|{-}\rangle|{-}\rangle\right) = \frac{1}{2}\left(|000\rangle + |011\rangle + |101\rangle + |110\rangle\right)$$
$$|\bar{1}\rangle = \frac{1}{\sqrt{2}}\left(|\bar{+}\rangle - |\bar{-}\rangle\right) = \frac{1}{\sqrt{2}}\left(|{+}\rangle|{+}\rangle|{+}\rangle - |{-}\rangle|{-}\rangle|{-}\rangle\right) = \frac{1}{2}\left(|001\rangle + |010\rangle + |100\rangle + |111\rangle\right) \qquad (27.9)$$
We now observe that by applying a bit-flip to any qubit we can change |0̄i
to |1̄i or vice versa. This observation brings us to our first serious quantum
error-correcting code.
$$|\bar{0}\rangle = |{+}\rangle\,|{+}\rangle\,|{+}\rangle \quad\text{and}\quad |\bar{1}\rangle = |{-}\rangle\,|{-}\rangle\,|{-}\rangle. \qquad\qquad (27.10)$$
Note we're labeling our codewords differently than we did in Equation 27.8
(that’s ok, we have some freedom in how we choose to label the codewords).
As we noted above though, this encoding is susceptible to bit-flip errors. To
protect from bit-flip errors we take each of the states comprising the codewords
above and encode those using the 3-qubit bit-flip code, which yields:
$$|\bar{0}\rangle = \left(\frac{|000\rangle + |111\rangle}{\sqrt{2}}\right)^{\otimes 3} \quad\text{and}\quad |\bar{1}\rangle = \left(\frac{|000\rangle - |111\rangle}{\sqrt{2}}\right)^{\otimes 3} \qquad\qquad (27.11)$$
We claim that this code lets us detect and correct a bit-flip or a phase-flip
(and hence, any possible error) on any one of the 9 qubits.
How does this detect and correct bit-flip errors? We just need to build a
quantum circuit that checks whether all 3 of the qubits in a given block have
the same value and, if they don't, sets the wayward qubit equal to the majority
of all 3 qubits in the block. We then apply that circuit to each of the 3 blocks
separately. The circuit for performing this check can be found in Figure 27.5.
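As a sanity check on one block, here's a numpy sketch (illustrative; it reads off the two parity checks directly as expectation values rather than building the ancilla circuit of Figure 27.5, which is legitimate here because a single X error leaves the block in a definite parity sector):

```python
# Sketch: detect and correct a single bit-flip on one block a|000> + b|111>.
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron_all(ops):
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

a, b = 0.6, 0.8
logical = a * np.eye(8, dtype=complex)[0] + b * np.eye(8, dtype=complex)[7]  # a|000> + b|111>

error_qubit = 1
state = kron_all([X if q == error_qubit else I for q in range(3)]) @ logical

# Parity checks Z0 Z1 and Z1 Z2; eigenvalue -1 flags a disagreement.
s01 = int(round(np.real(state.conj() @ kron_all([Z, Z, I]) @ state)))
s12 = int(round(np.real(state.conj() @ kron_all([I, Z, Z]) @ state)))
flip = {(1, 1): None, (-1, 1): 0, (-1, -1): 1, (1, -1): 2}[(s01, s12)]

recovered = state if flip is None else \
    kron_all([X if q == flip else I for q in range(3)]) @ state
print(np.allclose(recovered, logical))   # True: the encoded state is restored
```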
More interestingly, how does this code also detect and correct phase-flip
errors? We can build a quantum circuit that computes the relative phase
between |000i and |111i within each block, checks whether all 3 phases have
the same value, and sets any wayward phases equal to the majority of all 3
phases. A circuit for performing this check can be found in Figure 27.6.
We can combine both of the error checking circuits in Figures 27.5 and
27.6. We call the resulting circuit the syndrome detecting circuit. We first
apply the circuit from 27.5 to each of the three blocks of qubits encoded with
the bit-flip code in order to fix any bit-flip errors in these blocks. We then
apply a version of 27.6 to detect phase-flip errors on a block-by-block basis.
Note that a phase-flip on any qubit in a block modifies the overall state of the
Figure 27.4: Encoding circuit for Shor's 9-qubit code. The indentation is meant
to emphasize the concatenated nature of the encoding scheme. Reproduced from
Nielsen and Chuang's "Quantum Computation and Quantum Information."
[Circuit diagram: the three qubits of an encoded block α|000⟩ + β|111⟩ are
coupled by CNOTs to two ancilla qubits prepared in |0⟩, which are measured
to give the syndrome bits x and y; X corrections conditioned on x and y are
then applied to the data qubits.]
Figure 27.5: Circuit for detecting and correcting bit-flip errors on an encoded
input state. The final round of X gates is applied conditionally based on the
values of x and y observed.
[Circuit diagram: same structure as Figure 27.5, but with Hadamard gates
sandwiching the data qubits so that the checks act in the |+⟩/|−⟩ basis; the
two ancillas are again measured to give x and y.]
Figure 27.6: Circuit for detecting and correcting phase-flip errors on an en-
coded input state. The final round of X gates is applied conditionally based
on the values of x and y observed.
qubits in the block in exactly the same way (you can verify this by applying
a phase-flip error to the phase-flip code codewords and seeing what happens).
The final measurements applied to the ancillary qubits in Figure 27.7 produce
an output string whose value depends on precisely which error occurred and
on which qubit; this output string is called the syndrome.
Syndrome table for the bit-flip check (Figure 27.5):
x y | Error
0 0 | No error
1 0 | X0
1 1 | X1
0 1 | X2

Syndrome table for the phase-flip check (Figure 27.6):
x y | Error
0 0 | No error
1 0 | Z0
1 1 | Z1
0 1 | Z2
Measuring the syndrome has the effect of collapsing our state such that only the error corresponding to the syndrome
occurred—subject to the assumption that we really only had one qubit experi-
ence an error. We recommend trying this out directly for an arbitrary logically
encoded input state and a few different errors. The reason the state collapses
following the measurement is that our syndrome detection circuit generates
entanglement between the qubits we use to encode our state and the ancilla
qubits we use to detect the errors.
You can check that both of the operations above work not just for |0̄i and
|1̄i, but for arbitrary superpositions of the form α |0̄i + β |1̄i. So, the above
will fix any stray unitary transformations that get applied to any one qubit.
But what about errors that involve decoherence or measurement? We claim
that once we’ve handled all possible unitary transformations, we’ve automat-
ically handled all possible errors—because an arbitrary error might turn pure
states into mixed states, but it still keeps us within the same 4-dimensional
subspace. The measurements performed at the end of the circuits in Figure
27.7 still project us down to one of the orthogonal states corresponding to
some particular error occurring. We can still get back to our original state
|ψ0 i by applying bit flips and phase flips as needed.
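The key linear-algebra fact behind this claim is that any single-qubit operator, and in particular any 1-qubit error, can be expanded in terms of I, X, Y, and Z. Here's a quick numpy check (the "error" is just a random matrix):

```python
# Sketch: every 2x2 operator E decomposes as E = aI + bX + cY + dZ.
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

rng = np.random.default_rng(3)
E = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))   # arbitrary "error"

paulis = [("I", I), ("X", X), ("Y", Y), ("Z", Z)]
coeffs = {name: np.trace(P.conj().T @ E) / 2 for name, P in paulis}  # Hilbert-Schmidt projection
reconstructed = sum(coeffs[name] * P for name, P in paulis)
print(np.allclose(reconstructed, E))   # True
```

So correcting X, Z, and their product (which is Y up to a phase) on each qubit is enough to correct anything.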
Shor’s 9-qubit code was the first quantum error correcting code. Not long
afterward Andrew Steane found a shorter code that could also detect and
correct any error on 1 qubit. Steane’s code encoded 1 logical qubit into only 7
physical qubits. Then, Raymond Laflamme and others found codes that used
only 5 qubits. Five qubits turns out to be the least possible if you want both
to detect and to correct an arbitrary 1-qubit error, just like 3 bits is the least
[Circuit diagram: three blocks of three data qubits each. Within each block,
CNOTs into a pair of ancillas (measured as a/b, c/d, e/f) detect and correct
bit-flip errors; two further Hadamard-conjugated ancillas (measured as g and
h) identify which block, if any, suffered a phase-flip, which is then corrected
with a Z gate on that block.]
Figure 27.7: Quantum circuit for performing the full syndrome detection and
error correction procedure for the Shor 9-qubit code. We’ve structured the
circuit to highlight the concatenated nature of the error correction. We first use
the circuit from 27.5 on each block of 3 qubits to detect and correct any bit-flip
errors. We then use a version of the syndrome detection circuit in 27.6—we’ve
made liberal use of the identities X = HZH and (H⊗H)(CNOTi→j )(H⊗H) =
CNOTj→i , where the subscripts on CNOTi→j denote that qubit i controls qubit
j, to reduce the size of the circuit for the sake of brevity—to detect and correct
any phase-flip errors.
we can still solve any problem in BQP, so long as the error rate is sufficiently small. More-
over, we can simulate any error-free quantum circuit with only an additional
polylog(n)-factor overhead in the number of gates.
Since its discovery, the threshold theorem has set much of the research
agenda for experimental quantum computing. It says that once we can de-
crease error below a certain threshold, we’ll effectively be able to make it arbi-
trarily small by applying multiple recursive layers of quantum error-correction.
Journalists often try to gauge progress in experimental quantum computing by
asking about the number of qubits, but at least as important is the reliability
of the qubits. It’s reliability that will determine when (if ever) we cross the
threshold that would let us get to arbitrarily small error and add as many
additional qubits as we liked.
No one is there yet, but lots of progress is being made on two fronts:
Here's one milestone that was achieved fairly recently: in 2016, the
research groups of Michel Devoret and Robert Schoelkopf at Yale reported the
use of a quantum error-correcting code to keep a logical qubit alive for longer
than the physical qubits comprising it.
Lecture 28: The Stabilizer Formalism
In this lecture we’ll see a beautiful formalism that was originally invented to
describe quantum error-correcting codes, but now plays many different roles
in quantum computation. First, some definitions:
I Stabilizer Gates are the gates CNOT, Hadamard, and $S = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$ (also
called the "Phase Gate").
I Stabilizer Circuits are quantum circuits made entirely of stabilizer
gates.
I Stabilizer States are the states that a stabilizer circuit can generate
starting from |0 · · · 0i .
$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \qquad (28.1)$$
Notice that these matrices match up with the errors we needed to worry about
in quantum error-correction (the last one up to a global phase).
Note that the Y gate is equivalent to the bit-phase-flip error we saw in
the previous lecture, up to a global phase. The Pauli matrices satisfy several
beautiful identities:
$$XY = iZ \qquad\quad YX = -iZ$$
$$YZ = iX \qquad\quad ZY = -iX \qquad\qquad (28.2)$$
$$ZX = iY \qquad\quad XZ = -iY$$
$$X^2 = Y^2 = Z^2 = I$$
If you’ve seen the quaternions, you might recall that they’re defined using
the same kinds of relations. This is not a coincidence!
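If you'd like to confirm the identities without multiplying matrices by hand, a few lines of numpy suffice (purely a sanity check, not part of the lecture):

```python
# Verify the Pauli identities of Eq. (28.2) numerically.
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

assert np.allclose(X @ Y, 1j * Z) and np.allclose(Y @ X, -1j * Z)
assert np.allclose(Y @ Z, 1j * X) and np.allclose(Z @ Y, -1j * X)
assert np.allclose(Z @ X, 1j * Y) and np.allclose(X @ Z, -1j * Y)
assert all(np.allclose(P @ P, I) for P in (X, Y, Z))
print("all Pauli identities check out")
```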
In addition, all four Pauli matrices are both unitary and Hermitian. So what
does each Pauli matrix stabilize?
So each of the six 1-qubit stabilizer states has a corresponding Pauli matrix
that stabilizes it.
Next, given an n-qubit pure state |ψi, we define |ψi's stabilizer group to be
the group of all tensor products of Pauli matrices that stabilize |ψi. This really
is a group: the set of tensor products of Pauli matrices is closed under multi-
plication, and so is the set of operators that stabilize |ψi. Stabilizer groups
have the additional property of being Abelian (the group multiplication oper-
ation is commutative).
To illustrate, the stabilizer group of |0i is {I, Z}. The stabilizer group of
|+i is {I, X}. The stabilizer group of |0i ⊗ |+i is the Cartesian product of
those two groups:
{I ⊗ I, I ⊗ X, Z ⊗ I, Z ⊗ X}. (28.3)
As a convention, we’ll omit the ⊗’s from now on when unambiguous. For the
example above this convention gives us {II, IX, ZI, ZX}.
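As a quick check (illustrative), each of those four operators really does leave |0⟩|+⟩ unchanged:

```python
# Verify that II, IX, ZI, ZX all stabilize the state |0>|+>.
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

zero = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
state = np.kron(zero, plus)                       # |0> tensor |+>

for name, (A, B) in [("II", (I, I)), ("IX", (I, X)),
                     ("ZI", (Z, I)), ("ZX", (Z, X))]:
    print(name, np.allclose(np.kron(A, B) @ state, state))   # all True
```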
For a slightly more interesting example, what’s the stabilizer group of a
Bell pair? We know XX is in it because
How many bits does it take to store such a generating set in your computer?
Well, there are n generators and each one takes 2n + 1 bits to specify: 2 bits
for each of the n Pauli matrices plus 1 additional bit for the sign. So the total
number of bits is $n(2n + 1) = 2n^2 + n = O(n^2)$. Naïvely writing out the entire
amplitude vector or the entire stabilizer group would have taken $\sim 2^n$ bits, so
we’ve gotten an exponential savings. We’re already starting to see the power
of the stabilizer formalism.
We won't prove here that the rules are correct, but you should examine them one by one and see if you
can convince yourself. We’re also going to cheat a little. Keeping track of
the +’s and –’s is tricky and not particularly illuminating, so we’ll just ignore
them. What do we lose by ignoring them? Well, whenever measuring a qubit
has a definite outcome (say, either |0i or |1i), we need the +’s and −’s to
figure out which of the two it is. On the other hand, if we only want to know
whether measuring a qubit will give a definite outcome or a random outcome
(and not which definite outcome in the former case), then we can ignore the
signs.
The Algorithm:
I Finally, whenever the ith qubit is measured in the {|0i , |1i} basis the
measurement will have a determinate outcome if and only if the ith col-
umn of the X matrix is all 0’s.
There are also rules for updating the tableau in case the measurement outcome
is not determinate, but we won’t cover them here.
Here’s another cool fact: the number of basis states that have nonzero
amplitudes is just $2^k$, where k is the rank of the X-matrix. In the example in
Equation 28.4, rank(X) = 0, corresponding to the fact that our “superposi-
tion” only contains a single basis state, namely |0000i.
[Circuit: starting from |00⟩, apply a Hadamard to the first qubit, then a
CNOT from the first qubit to the second, then an S gate on the first qubit.]
Let’s test this all out by keeping track of the tableau for the circuit above.
We start with the state |00i which has the tableau representation
$$\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \qquad\qquad (28.5)$$
Applying the Hadamard to the first qubit has the effect of swapping the first
columns of the X-matrix and Z-matrix:
$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad\qquad (28.6)$$
You could convert this back into the generators by saying that the current
state is the one generated by +XI and +IZ; this makes sense since those do
indeed generate the stabilizer group for |+i |0i. Now, to apply the CNOT we
bitwise XOR the first column in the X-matrix into the second column in the
X-matrix and likewise bitwise XOR the second column in the Z-matrix into
the first column in the Z-matrix. This results in the tableau
$$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}. \qquad\qquad (28.7)$$
The generators corresponding to this tableau are {XX, ZZ}, which as we saw
earlier are indeed the stabilizer generators for a Bell pair as we expect. Finally,
we apply the Phase gate by bitwise XORing the first column in the X-matrix
into the first column of the Z-matrix. This results in the tableau
$$\begin{pmatrix} 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}. \qquad\qquad (28.8)$$
This final tableau corresponds to the generators {Y X, ZZ}, which you can
check are the stabilizers for the state $\frac{|00\rangle + i\,|11\rangle}{\sqrt{2}}$.
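The whole walkthrough above is easy to automate. Here's a sign-free tableau simulator (a sketch that follows the update rules as stated, ignoring the +/− signs just as we did above) that reproduces each step:

```python
# Sign-free stabilizer tableau for the example circuit: H(0), CNOT(0->1), S(0).
import numpy as np

n = 2
Xmat = np.zeros((n, n), dtype=int)   # X part of the tableau
Zmat = np.eye(n, dtype=int)          # Z part: |0...0> is stabilized by Z_i

def hadamard(q):
    Xmat[:, q], Zmat[:, q] = Zmat[:, q].copy(), Xmat[:, q].copy()  # swap columns

def cnot(control, target):
    Xmat[:, target] ^= Xmat[:, control]   # XOR control's X-column into target's
    Zmat[:, control] ^= Zmat[:, target]   # XOR target's Z-column into control's

def phase(q):
    Zmat[:, q] ^= Xmat[:, q]              # XOR the X-column into the Z-column

def measurement_is_determinate(q):
    return not Xmat[:, q].any()           # all-zero X column => definite outcome

def show(label):
    print(label, np.hstack([Xmat, Zmat]).tolist())

show("start   ")                 # [[0,0,1,0],[0,0,0,1]]  -> generators ZI, IZ
hadamard(0);  show("after H ")   # [[1,0,0,0],[0,0,0,1]]  -> XI, IZ
cnot(0, 1);   show("after CX")   # [[1,1,0,0],[0,0,1,1]]  -> XX, ZZ
phase(0);     show("after S ")   # [[1,1,1,0],[0,0,1,1]]  -> YX, ZZ
print(measurement_is_determinate(0))   # False: measuring qubit 0 gives a random bit
```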
In quantum error-correction research, most (though not all) of the error-correcting codes that have been
seriously considered are stabilizer codes. The reason is similar to why linear
codes play such a central role in classical error correction. Namely, it makes
everything much easier to calculate and reason about, and by insisting on
it we don’t seem to give up any of the error-correcting properties that we
want. As a result, the stabilizer formalism is the lingua franca of quantum
error-correction; it’s completely indispensable in this setting.
Shor's 9-qubit code is an example of a stabilizer code. Recall that with that
code we had the codewords $\frac{(|000\rangle \pm |111\rangle)^{\otimes 3}}{2\sqrt{2}}$. These two codewords correspond to
the stabilizer group generators below.
Z Z I I I I I I I
I Z Z I I I I I I
I I I Z Z I I I I
I I I I Z Z I I I
I I I I I I Z Z I
I I I I I I I Z Z
X X X X X X I I I
I I I X X X X X X
± X X X X X X X X X                                          (28.9)
The sign on the last line gives either |0̄i for + or |1̄i for –.
We can use intuition to see why the above elements are in the stabilizer
group. Firstly, phase-flips applied to any pair of qubits in the same block
cancel each other out. Secondly, bit-flips applied to all of the qubits in a given
block also take us back to where we started, though possibly with the addition
of a global −1 phase. You then just need to check that these 9 elements are
linearly independent of each other, meaning that there aren’t any more to be
found.
Now that we know the stabilizer formalism, we’re finally ready to see an
“optimal” 5-qubit code for detecting and correcting an arbitrary error on
any one qubit. The codeword states would be a mess if we wrote them out
explicitly—they’re given by superpositions over 32 different 5-bit strings! Ev-
erything is much more compact if we use the stabilizer formalism. The code
corresponds to the following set of stabilizer group generators:
X Z Z X I
I X Z Z X
X I X Z Z
Z X I X Z
± X X X X X                                          (28.10)
Once again the sign on the last generator is + if we want the |0̄i state, or − if
we want the |1̄i state. One can check (we won't prove it here) that this code
can indeed detect and correct a bit-flip, phase-flip, or bit-phase-flip error on
any one of the five qubits.
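One property you can verify yourself is that these five generators pairwise commute, as stabilizer generators must. A quick numpy check (illustrative only):

```python
# Check that the generators of the 5-qubit code in Eq. (28.10) all commute.
import numpy as np
from itertools import combinations

pauli = {"I": np.eye(2, dtype=complex),
         "X": np.array([[0, 1], [1, 0]], dtype=complex),
         "Z": np.array([[1, 0], [0, -1]], dtype=complex)}

def string_to_matrix(s):
    out = np.array([[1.0 + 0j]])
    for ch in s:
        out = np.kron(out, pauli[ch])
    return out

generators = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ", "XXXXX"]
mats = [string_to_matrix(g) for g in generators]
print(all(np.allclose(A @ B, B @ A) for A, B in combinations(mats, 2)))  # True
```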
But doing all that is expensive and creates lots of new opportunities for error!
While the qubits are unencoded there’s nothing to protect them from decoher-
ence. So it would be awesome if we had a code where applying gates to encoded
qubits was hardly more complicated than applying them to unencoded qubits.
This motivates the following definition: the gate G is transversal for the code
C if in order to apply G to logical qubits encoded using C, all you need to
do is apply G independently to each of the physical qubits. For example, the
Hadamard gate is transversal if you can Hadamard a logical qubit by just
separately Hadamarding each of the physical qubits. You should check, for
example, that the Hadamard gate is transversal for Shor’s 9-qubit code.
It turns out that there are quantum error-correcting codes for which the
CNOT, Hadamard, and Phase gates are all transversal. Thus, if you use one
of these codes, then applying any stabilizer circuit to the encoded qubits is
extremely cheap and easy. Unfortunately, we already saw that the stabilizer
gates are non-universal. Moreover, there’s a theorem due to Zeng, Cross and
Chuang in 2007 that says that for any useful stabilizer code C the correspond-
ing set of transversal gates can’t be universal. Eastin and Knill generalized
this to show that no useful quantum error-correcting code can have a universal
set of transversal gates. This means that if we want a universal error-corrected
quantum computer we’re going to need to figure out how to implement some
non-stabilizer gate (say Toffoli or T ) in a non-transversal manner, probably
via some sequence of gates that is much more expensive.
[Circuit: the data qubit |ψ⟩ controls a CNOT into an ancilla prepared in the
magic state $\frac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/4}|1\rangle)$; the ancilla is measured, and an S correction
conditioned on the outcome leaves the data qubit in the state T|ψ⟩.]
Figure 28.1: Circuit for simulating a T gate using stabilizer operations and
measurements, with the magic state $\frac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/4}|1\rangle)$.