Simulating Quantum Computers Using OpenCL

Adam Kelly

November 9, 2018

arXiv:1805.00988v2 [quant-ph] 7 Nov 2018

Quantum computing is an emerging technology, promising a paradigm shift in computing and allowing for speedups in many different problems. However, quantum devices are still in their early stages, most with only a small number of qubits. This places a reliance on simulation to develop quantum algorithms and to verify these devices. While there exist many algorithms for the simulation of quantum circuits, there are (at the time of writing) no tools which use OpenCL to parallelize this simulation, thereby taking advantage of devices such as GPUs while still remaining portable.

In this paper, such a tool is described, including optimizations in areas such as gate application. This leads to a new approach that outperforms other popular state vector based simulators. An implementation of the proposed simulator is available at https://qcgpu.github.io.

1 Introduction

Quantum computing is a paradigm shift in computing. These devices are thought to be the key to solving some types of problems, such as factoring semi-prime integers [19], searching for elements in an unstructured database [15, 23], simulation of quantum systems, optimization [13] and chemistry problems.

These problems are not feasible to solve using classical computers, but quantum computers may change that. It is estimated, however, that hundreds [4] up to thousands [5] of qubits (the quantum analogue to bits) will be needed. Even so, the way that quantum computers operate does not violate the Church-Turing principle [10]. This means that quantum computers can be, to some extent, simulated using classical computers.

There are some existing quantum computers, such as IBM's Q Experience [3], a semi-public cloud based quantum computer with up to 20 qubits. While the number of qubits available at the moment is small, many issues are being raised as it increases. One of these issues is the ability to assess the correctness, performance and scalability of quantum algorithms. It is this issue which simulators of quantum computers address. They allow the user to test quantum algorithms using a limited number of qubits, and calculate measurements, state amplitudes and density matrices.

In this work, a simulator using OpenCL is described, a technology introduced in section 1.2.

1.1 Existing Research

The idea of using classical computers to simulate quantum computers and quantum mechanics is nothing new. There exists a variety of software libraries that can be used to do so, each with different purposes. Some libraries such as QuTiP [16] are aimed at solving a wide variety of quantum mechanical problems, whereas others are more specialized, such as Quipper [14] for controlling quantum computers and qHiPSTER for simulating quantum computers using distributed computing techniques [20]. A comprehensive list of tools is available on Quantiki [2].

While the area of simulation is well established, there are, to my knowledge, no simulation tools that can take advantage of hardware acceleration. It is well known that dedicated hardware can speed up certain types of computations. This is becoming increasingly apparent in fields such as machine learning, gaming and cryptocurrency mining.

While this research mainly looks at state vector simulations, there are other ways of doing simulations. These include using the Feynman path integral formulation of quantum mechanics [6, 8], using tensor networks [18] and applying different simulations for circuits made up of certain types of quantum gates [7]. These techniques (while not covered in this work) will hopefully be included in the simulation software at a later date.

1.2 OpenCL

OpenCL (Open Computing Language) is a general-purpose framework for heterogeneous parallel computing on cross-vendor hardware, such as CPUs, GPUs, DSPs (digital signal processors) and FPGAs (field-programmable gate arrays). It provides an abstraction for low-level hardware routing and a consistent memory and execution model for dealing with massively-parallel code execution. This allows the framework to scale from embedded systems to hardware from Nvidia, ATI, AMD, Intel and other manufacturers, all without having to rewrite the source code for various architectures. A more detailed overview of OpenCL is given in [22].

Figure 1: The OpenCL programming model/architecture

The main advantage of using OpenCL over a hardware specific framework is that of a portability first approach. OpenCL has the largest hardware coverage, and as a header only library, it requires no specific tools or other dependencies. Aside from this, OpenCL is very well suited to tasks that can be expressed as a program working in parallel over simple data structures (such as arrays/vectors). The disadvantages of OpenCL, however, come from this lack of a hardware-specific approach. Using proprietary frameworks can sometimes be faster than using OpenCL, and sometimes it can also be more straightforward to develop kernels for the devices.

OpenCL is an open standard maintained by the non-profit Khronos Group. It views a computing system as a number of compute devices (such as CPUs or accelerators such as GPUs), attached to a host processor (a CPU). OpenCL executes functions, called kernels, on these devices, and these kernels are written in a C-like language, OpenCL C. A compute device is made up of several compute units which contain multiple processing elements. It is the processing elements that execute kernels. This is shown in figure 1.

At the host level, a compute device is selected. The OpenCL API then uses its platform layer to submit work to the device and manage things like the work distribution and memory. The work is defined using kernels. These kernels are written in OpenCL C, and execute in parallel over a predefined, n-dimensional computation domain. Each independent element of this execution is a work item. These are equivalent to Nvidia CUDA threads. The groups of work items, work groups, are equivalent to CUDA thread blocks.

With this, a general pipeline for most GPGPU OpenCL applications can be described. First, a CPU host defines an n-dimensional computation domain over some region of DRAM memory. Every index of this n-dimensional domain will be a work item, and each work item will execute the same given kernel. The host then defines a grouping of these into work groups. Each work item in a work group will execute concurrently within a compute unit and will share some local memory. These are placed on a work queue.

The hardware will then load DRAM into the global device RAM, and execute each work group on the work queue.

On the device, the multiprocessor will execute the kernel using multiple threads at once. If there are more work groups than threads on the device, they will be serialized.

There are some limitations. First, the global work size must be a multiple of the work group size. This is to say the work group must fit evenly into the data structure.

Secondly, the number of elements in the n-dimensional vector must be less than or equal to the CL_KERNEL_WORK_GROUP_SIZE flag. This is important to the QCGPU library as it sets a hard limitation on the size of the state vector being stored on the GPU. CL_KERNEL_WORK_GROUP_SIZE is a hardware flag, and OpenCL will return an error code if either of these conditions is violated. This can be avoided by using an approach similar to the distributed memory techniques used in other simulations. This feature is planned to be implemented soon.

1.3 Quantum Computing

Before considering quantum computing, let's first start with classical computation. A classical computer is the type of computer that you may have at home. Laptops, tablets, phones and smart TVs are all examples of classical computers.

A quantum computer is different. It takes advantage of principles of quantum mechanics such as superposition, entanglement and measurement to perform computation (see the following section). Because of this, it can perform some computations that are believed to be infeasible for classical computers.

1.3.1 Qubits and State

In a classical computer, information is represented as a bit. A bit is a binary system, and thus can be in one of two states, 0 or 1. In a quantum computer, information is represented as a qubit. The qubit is the quantum analogue of a bit. Using Dirac notation [11], a qubit can be in the state $|0\rangle$ or $|1\rangle$, or (more importantly) a superposition (linear combination) of these states. Mathematically, the state of a single qubit $|\psi\rangle$ is

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \tag{1.1}$$

such that $\alpha, \beta \in \mathbb{C}$. The coefficients also must follow a normalization condition of $|\alpha|^2 + |\beta|^2 = 1$.

In the above state, the complex numbers $\alpha$ and $\beta$ are known as amplitudes. The states $|0\rangle$ and $|1\rangle$ are known as basis states. Importantly, given any state $|\psi\rangle$, it is impossible to extract the amplitudes of any basis state.

Commonly used is the vector notation for states. The basis states $|0\rangle$ and $|1\rangle$ are vectors that form an orthonormal basis for that qubit's state space. The standard representation (and the one followed throughout QCGPU and this paper) is

$$|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \tag{1.2}$$

Following from this, the state $|\psi\rangle$ can be represented as a unit vector in the two-dimensional complex vector space,

$$|\psi\rangle = \begin{pmatrix} \alpha \\ \beta \end{pmatrix} \tag{1.3}$$

The concepts here generalize to quantum systems containing many qubits. Since a single qubit has two distinct basis states, an n qubit system has $2^n$ distinct basis states. In quantum computing, a multiple qubit system is known as a register.

To combine the states of two individual qubits, the Kronecker/tensor product must be used. For example, to combine the states of two qubits $|\psi_1\rangle$ and $|\psi_2\rangle$,

$$|\psi_1\rangle \otimes |\psi_2\rangle = \begin{pmatrix} \alpha_1 \\ \beta_1 \end{pmatrix} \otimes \begin{pmatrix} \alpha_2 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} \alpha_1 \alpha_2 \\ \alpha_1 \beta_2 \\ \beta_1 \alpha_2 \\ \beta_1 \beta_2 \end{pmatrix} \tag{1.4}$$

When basis vectors are combined, it is convention to say $|1\rangle \otimes |0\rangle = |10\rangle$ or $|2\rangle$ (as '10' is 2 in binary).

More generally, an n qubit register is described by a unit vector $|\phi\rangle$ in the $2^n$ dimensional complex vector space,

$$|\phi\rangle = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{2^n - 1} \end{pmatrix}. \tag{1.5}$$

This is equivalent to a linear combination of the basis states

$$|\phi\rangle = \sum_{j=0}^{2^n - 1} \alpha_j |j\rangle \tag{1.6}$$

where $|j\rangle$ is the jth basis vector, and $\sum_j |\alpha_j|^2 = 1$.

There are some things to note from this. Consider the vector $|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. It was stated before that individual qubits can be combined using the Kronecker/tensor product. Yet, there is no solution for the vectors $|a\rangle$ and $|b\rangle$ to the equation $|a\rangle \otimes |b\rangle = |\psi\rangle$. That is because $|\psi\rangle$ is entangled, which means the state cannot be separated into individual qubit states. This is important, as it is the entanglement that makes the simulation of quantum computers hard, as it means the number of amplitudes that need to be stored grows exponentially rather than linearly.

1.3.2 Manipulating the State

In a classical computer, bits are manipulated using logic gates. There is a quantum analogue to this too. Just as the state of a system of qubits was defined using vectors, the way they change can be described also. The state of a qubit (or multiple qubits) is changed by quantum logic gates, or just gates. When representing the state of qubits as vectors, quantum gates are represented using matrices. These matrices must comply with certain rules in order to be valid quantum gates.

For a matrix to represent a quantum gate, it must be unitary. A matrix $U$ is unitary if it satisfies the property that its conjugate transpose $U^\dagger$ is also its inverse, thus $U^\dagger U = U U^\dagger = I$, where $I$ is the identity matrix. In quantum computing, all gates have a corresponding unitary matrix, and all unitary matrices have a corresponding quantum gate.

Gates that act on a single qubit are represented by a $2 \times 2$ matrix. More generally, an n qubit gate is represented by a $2^n \times 2^n$ matrix.

A single qubit gate can be applied to a quantum register with an arbitrary number of qubits. For a gate $U$ to act on the jth qubit in an n qubit register, the full gate is formed by

$$U_j = \underbrace{I \otimes I \otimes \dots}_{j - 1 \text{ times}} \otimes\, U \otimes \underbrace{\dots \otimes I}_{n - j \text{ times}}, \tag{1.7}$$

or more succinctly, for a target qubit t,

$$U_t = \bigotimes_{j=1}^{n} \begin{cases} U & j = t \\ I & \text{otherwise} \end{cases} \tag{1.8}$$

In matrix form, gates are applied to registers using matrix multiplication. Multiple gates can be applied to a register. This is called a circuit. The gates being applied to a register can be detailed using a circuit diagram.

In a circuit diagram, each line across represents a qubit, and each of the blocks on the lines represents a gate or another operation such as measurement (see section 1.3.3). An example circuit diagram for three qubits, applying the gate $U$ to the second qubit, is shown below.

|0⟩ ───────── |0⟩
|0⟩ ───[U]─── U|0⟩
|0⟩ ───────── |0⟩

1.3.3 Measurement

It was stated before that given any state $|\psi\rangle$, it is impossible to extract the amplitudes for each of the basis states. Still, there is a way to get classical information, a bit, out of a qubit.
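Before the details of measurement, the Kronecker product of equation 1.4 and the entanglement example from section 1.3.1 can be made concrete with a short plain-Python sketch. The names `kron_vec` and `is_product_state` are hypothetical helpers, not QCGPU functions, and the separability check uses the standard fact that a two-qubit state $(a_0, a_1, a_2, a_3)$ factorizes exactly when $a_0 a_3 = a_1 a_2$.

```python
import math


def kron_vec(a, b):
    """Kronecker product of two state vectors given as lists of amplitudes."""
    return [x * y for x in a for y in b]


def is_product_state(v, tol=1e-12):
    """True if a two-qubit state can be written as |a> (x) |b>."""
    return abs(v[0] * v[3] - v[1] * v[2]) < tol


zero = [1, 0]
one = [0, 1]

# |1> (x) |0> = |10>, which is basis state number 2 of a two-qubit register.
ten = kron_vec(one, zero)

# The state (|00> + |11>) / sqrt(2) has no such factorization.
bell = [1 / math.sqrt(2), 0, 0, 1 / math.sqrt(2)]
```

Here `is_product_state(ten)` holds while `is_product_state(bell)` does not, which is the sense in which the second state is entangled.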

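The construction in equations 1.7 and 1.8 can likewise be sketched directly, together with a check of the unitarity condition from section 1.3.2. This is an illustrative dense-matrix version under the assumption that qubits are numbered from 1 with the first qubit as the most significant bit (matching $|1\rangle \otimes |0\rangle = |10\rangle$); the function names are hypothetical and this is not how QCGPU applies gates.

```python
import math


def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def full_gate(gate, target, n):
    """Equation 1.8: `gate` on qubit `target` (1-indexed), identity elsewhere."""
    identity = [[1, 0], [0, 1]]
    result = [[1]]
    for j in range(1, n + 1):
        result = kron(result, gate if j == target else identity)
    return result


def apply_matrix(matrix, vec):
    """Ordinary matrix-vector multiplication."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def is_unitary(u, tol=1e-12):
    """Check that U^dagger U = I, entry by entry."""
    size = len(u)
    for r in range(size):
        for c in range(size):
            entry = sum(u[k][r].conjugate() * u[k][c] for k in range(size))
            if abs(entry - (1 if r == c else 0)) > tol:
                return False
    return True


x = [[0, 1], [1, 0]]
s = 1 / math.sqrt(2)
h = [[s, s], [s, -s]]

# X on the second of three qubits maps |000> to |010> (basis state 2).
flipped = apply_matrix(full_gate(x, 2, 3), [1, 0, 0, 0, 0, 0, 0, 0])
```

The matrix returned by `full_gate` has $2^n \times 2^n$ entries, so this approach is only practical for a handful of qubits.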
In the previous section, it was said that quantum states are altered by unitary transformations or matrices. However, that only applies to a closed quantum system, that is, one that doesn't interact with external physical systems. If you go to find out information about this quantum system, you are interacting with it. This interaction causes the system to be no longer closed, and the system is no longer only altered by unitary transformations. This different type of interaction is called a measurement.

Quantum measurements are described by what is called a measurement operator. These are operators that act on the vector space made up of the basis states of the quantum system being considered. Measurement operators are a collection $\{M_m\}$, where m refers to the measurement outcome that may occur.

If the state of a quantum system (like a quantum register) is $|\psi\rangle$ immediately before a measurement, then the probability of getting a result m is given by

$$p(m) = \langle\psi| M_m^\dagger M_m |\psi\rangle, \tag{1.9}$$

and the state of the system after measurement, $|\psi'\rangle$, is

$$|\psi'\rangle = \frac{M_m |\psi\rangle}{\sqrt{\langle\psi| M_m^\dagger M_m |\psi\rangle}}. \tag{1.10}$$

This description of measurement applies to an arbitrary quantum system, but now just qubits will be considered. Qubits are almost always measured in the computational basis. The measurement of a single qubit in the computational basis has two measurement operators, $M_0 = |0\rangle\langle 0|$ and $M_1 = |1\rangle\langle 1|$. This means that there are two possible measurement outcomes, 0 and 1.

Now consider the state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$. Then, following from equation 1.9, the probability of obtaining a 0 when measuring is

$$p(0) = \langle\psi| M_0^\dagger M_0 |\psi\rangle = \langle\psi| M_0 |\psi\rangle = |\alpha|^2. \tag{1.11}$$

In the same way, the probability of obtaining a 1 is $p(1) = |\beta|^2$. After the measurement, the two possible resulting states are:

$$\frac{M_0 |\psi\rangle}{|\alpha|} = \frac{\alpha}{|\alpha|} |0\rangle \simeq |0\rangle \tag{1.12}$$

$$\frac{M_1 |\psi\rangle}{|\beta|} = \frac{\beta}{|\beta|} |1\rangle \simeq |1\rangle \tag{1.13}$$

Note that the coefficients in equations 1.12 and 1.13 are of the form $\frac{x}{|x|}$. This is a complex number of modulus 1, so it is only a global phase factor; it doesn't affect the measurement outcomes and can be ignored, which is what $\simeq$ denotes above.

The principles shown here generalize to multiple qubits analogously, except there are $2^n$ possible measurement outcomes, corresponding to the number of resulting basis states.

2 Simulating Quantum Computers Using OpenCL

This section describes the simulation method used in the QCGPU library. The focus will be on the OpenCL implementations.

To be able to simulate a quantum computer, a simulation tool must have (at the bare minimum) a few things. The first is the ability to represent the state of the quantum computer. This is usually done by representing the state of the qubit register being considered, and is discussed in section 2.1. Secondly, there needs to be a way to perform operations. This is discussed in sections 2.2 and 2.3. Lastly, there needs to be a way to see the outcome of the operations. This is usually done by implementing quantum measurement, as discussed in section 2.4. However, it is sometimes useful to just see the unmeasured quantum state. This is implemented in the simulator but is not discussed.

Throughout the software, the library 'pyopencl' has been used to interact with OpenCL from Python. 'numpy' is used throughout also.

2.1 Representing State

As previously described, the state of an n qubit register is characterized by a normalized vector in the $2^n$ dimensional complex vector space. Because of this, such a state can be represented using $2^n$ complex numbers. It is here that the main challenge with simulating quantum computers lies: the exponential growth in the number of complex numbers needed to describe a register.

In QCGPU, the state vector is stored as an array of $2^n$ complex floats. These complex floats correspond to the components of the state vector.

When a new state is initialized with a given number of qubits, the initial state is $|00 \dots 0\rangle$. In terms of OpenCL, the array is stored on the device in global memory, with read and write permissions.

The amount of memory needed to store an n qubit state should be noted. A complex float requires 64 bits, and the state is described by $2^n$ complex numbers, thus the total amount of memory needed to store the state vector is

$$64 \cdot 2^n \text{ bits}. \tag{2.1}$$

To give a general idea, to simulate 5 qubits, 256 bytes are required. To simulate 10 qubits, 8.192 kilobytes are required. To simulate 20 qubits, 8.389 megabytes are required. For 25 qubits, 268.4 megabytes are required, and for 30 qubits, 8.59 gigabytes of memory are required.

2.2 Representing Gates

As the state of the qubits is represented as a vector, gates are represented as matrices. This was looked at
in section 1.3.

It was stated before that to apply a single qubit gate $U$ to the tth qubit in an n qubit quantum register, the full matrix could be calculated by

$$U_t = \bigotimes_{j=1}^{n} \begin{cases} U & j = t \\ I & \text{otherwise} \end{cases} \tag{2.2}$$

This presents a problem however, as it is very inefficient. The first problem is that the size of such a matrix would be $2^n \times 2^n$. That would take up a massive amount of memory, which is already a problem. Secondly, this calculation relies on the Kronecker product, which for two matrices of size $n_1 \times m_1$ and $n_2 \times m_2$ has a running time of $O(n_1 n_2 m_1 m_2)$ using big-O notation. This would make the simulator extremely slow.

To avoid these issues, one has to use a gate application algorithm other than matrix multiplication (see the following section), and represent gates in a different way.

In QCGPU, gates are stored as $2 \times 2$ matrices, and the only type of gates are single qubit gates. From this, controlled gates (for multiple qubits) can be applied using any single qubit gate (again, see the following section).

This is possible due to a concept known as universality. A set of gates is known as universal if any possible operation on a quantum computer can be reduced to them. An example of these sets is $\{T, H, CNOT\}$.

The $T$ and $H$ gates are single qubit gates, thus can be represented in QCGPU, and the $CNOT$ gate is just the controlled $X$ gate. The $X$ gate is a single qubit gate, so it can be applied as a controlled gate using the software. This means that the simulator can do any operation by just implementing single qubit gates and the ability to apply them as controlled gates.

For the implementation of the gates, it was chosen just to pass each element of the 2x2 matrix into the OpenCL kernels. This avoided complexity in the gate application methods. This can be seen in the following section.

In the library, single qubit gates are represented as a class. This class allows the end user to just use either 2x2 arrays or 2x2 matrices from numpy to represent gates, so as to not have to think about the internal representation.

2.3 Improving the Gate Application Algorithm

As gates are only represented as $2 \times 2$ matrices, they can't be applied via matrix multiplication. This means a different gate application algorithm must be used.

Algorithm 1 details this approach. The structure of the algorithm is a for loop through half the number of amplitudes. Note that the inside of the for loop is independent (not based on the rest of the computation). This is what makes it suited to being parallelized. For the kernel source code, see appendix B.1.

Algorithm 1: Gate application algorithm (v, G, t)
Input: An n qubit quantum state represented by a column vector $v = (v_0, \dots, v_{2^n - 1})^T$ and a single qubit gate G, represented by a $2 \times 2$ matrix, acting on the tth qubit.
for i ← 0 to $2^{n-1} - 1$ do
    a ← the ith integer whose tth bit is 0;
    b ← the ith integer whose tth bit is 1;
    // The following must be simultaneously updated
    $v_a \leftarrow v_a \cdot G_{0,0} + v_b \cdot G_{0,1}$;
    $v_b \leftarrow v_b \cdot G_{1,1} + v_a \cdot G_{1,0}$;

To apply a single qubit gate as a controlled gate, algorithm 1 can be adapted. If the control qubit is cth in the register, only apply the update to $v_a$ if the cth bit of a is one, and only update $v_b$ if the cth bit of b is one. The corresponding kernel for this is shown in appendix B.2.

2.4 Parallelizing the Measurement Algorithm

The measurement process relies on knowing the probability of each output state. The actual selection of an outcome based on these probabilities cannot be parallelized, however the calculation of the probabilities can. For the source code, see appendix B.3.

From this an outcome can be selected. Because the probabilities can be calculated separately from the measurement, it also allows multiple measurements to be made without having to apply all of the gates again. While this isn't possible on a quantum computer, it does mean that it is easier to prototype / simulate algorithms, the primary goal of the software library.

3 Benchmarking

In order to see if using hardware acceleration to simulate quantum computers is faster than the conventional state vector approach, it was necessary to benchmark the software against other commonly used tools.

It was decided to test against two different tools, ProjectQ [21] and the simulator in Qiskit [1].

3.1 Designing The Experiments

The goal of the benchmarking experiments was to test if there was a difference in speed between the different simulators. The experiments were designed with this goal in mind.
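Looking back at algorithm 1 in section 2.3, the pairing of amplitudes can be sketched sequentially in plain Python. This is not the OpenCL kernel from appendix B.1: each loop iteration here stands in for what one work item would do, and the function names are hypothetical. It also assumes that the target and control indices count bits from the least significant end, starting at 0, which may differ from the library's own convention.

```python
import math


def apply_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector, in place."""
    low_mask = (1 << target) - 1
    for i in range(len(state) // 2):
        # a is the i-th index whose target bit is 0; b differs in that bit only.
        a = ((i & ~low_mask) << 1) | (i & low_mask)
        b = a | (1 << target)
        # Read both amplitudes first so the two updates are simultaneous.
        va, vb = state[a], state[b]
        state[a] = gate[0][0] * va + gate[0][1] * vb
        state[b] = gate[1][0] * va + gate[1][1] * vb
    return state


def apply_controlled_gate(state, gate, control, target):
    """Same pairing as apply_gate, but only where the control bit is 1."""
    low_mask = (1 << target) - 1
    for i in range(len(state) // 2):
        a = ((i & ~low_mask) << 1) | (i & low_mask)
        b = a | (1 << target)
        if (a >> control) & 1:
            va, vb = state[a], state[b]
            state[a] = gate[0][0] * va + gate[0][1] * vb
            state[b] = gate[1][0] * va + gate[1][1] * vb
    return state


s = 1 / math.sqrt(2)
h = [[s, s], [s, -s]]
x = [[0, 1], [1, 0]]

# H on qubit 0 followed by a controlled X prepares (|00> + |11>) / sqrt(2).
state = [1, 0, 0, 0]
apply_gate(state, h, 0)
apply_controlled_gate(state, x, 0, 1)
```

No $2^n \times 2^n$ matrix is ever built: only the four gate entries and the state vector are touched, and no two iterations write to the same pair of amplitudes.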

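The split described in section 2.4, between computing probabilities and selecting an outcome, can be illustrated in the same spirit. This is a hedged CPU-only sketch using the Python standard library rather than the kernel in appendix B.3; `probabilities` and `sample_outcomes` are illustrative names.

```python
import math
import random


def probabilities(state):
    """Probability of each basis state; every entry is computed independently."""
    return [abs(amplitude) ** 2 for amplitude in state]


def sample_outcomes(state, shots, rng=None):
    """Draw `shots` basis-state indices without re-applying any gates."""
    rng = rng or random.Random()
    weights = probabilities(state)
    return rng.choices(range(len(state)), weights=weights, k=shots)


# (|00> + |11>) / sqrt(2): only outcomes 0 and 3 can occur.
bell = [1 / math.sqrt(2), 0, 0, 1 / math.sqrt(2)]
outcomes = sample_outcomes(bell, 100, random.Random(0))
```

The list comprehension in `probabilities` is the part that maps naturally onto one work item per amplitude, while the weighted draw is inherently sequential.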
3.1.1 Avoiding Possible Errors

The task of benchmarking software is not an easy one. There are many different things which can impact the performance of software, all of which have to be taken into account when performing benchmarks. Sometimes, because of the way that programming languages work, different run-time optimizations can change the speed of some software. This can be detrimental to the overall benchmarking results, and can be hard to diagnose.

Most of these issues boil down to independence. The easiest way to avoid these issues is shuffling. If you have a series of experiments to be run using the different tools, the order in which each individual experiment is run should be random. This is done in the benchmarking code in section 3.2.

3.1.2 Reproducibility

Reproducibility is very important in software benchmarking. Different hardware and software configurations can change the performance of software.

To avoid this, all of the experiments were run using a virtual machine hosted by Amazon Web Services. The machine was an EC2 P3.2xLarge instance with the following specifications:

P3.2xLarge
GPU           Nvidia Tesla V100
GPU Memory    16GB
vCPUs         8
Memory        61GB

Table 1: EC2 Instance Specifications

3.2 Benchmarking Method

For the actual experiment that was being timed in the benchmark, it was decided to use the quantum Fourier transform. This is a transformation that can be built up using both single and controlled qubit gates. The reasoning for using the quantum Fourier transform was that it is an integral part of many different quantum algorithms, and thus would be a realistic task that the simulator would perform.

The benchmarking algorithm is detailed in algorithm 2, and the benchmarking source code is given in the appendix.

Algorithm 2: Benchmarking algorithm (n, samples)
Input: The number of qubits n to test up to, and the number of samples for each number of qubits, samples
Output: A list of the type of simulator, the number of qubits and the time taken to run the benchmark.
data ← [];
for i ← 0 to n do
    for 0 to samples do
        type ← a type of simulator to use, randomly chosen;
        t ← the time taken to run a quantum Fourier transform with i qubits;
        append (type, i, t) to data;
return data

3.3 Results

When running the benchmarks, it was found that after 24 qubits, the IBM software was intermittent, occasionally throwing errors, so it was decided to stop the benchmarks at the 24 qubit mark.

To see a graph of the mean running time for each simulator at between 1 and 24 qubits, see figure 2.

Figure 2: Benchmarking Data (plot titled "QFT Performance": runtime in seconds against number of qubits for QCGPU, Qiskit and ProjectQ)

The biggest difference in time can be seen toward the end, where QCGPU is on average over 150 times faster than the Qiskit simulator and 8 times faster than the ProjectQ simulator. This difference would only increase with larger circuits.

3.3.1 A Statistical Analysis

To test the hypothesis that 'using hardware acceleration provides a speed improvement over existing tools', one needs to perform a statistical analysis. The software was being compared against two tools, thus the analysis will be repeated twice, analogously.

The significance test for the populations was chosen based on the properties of the data set.

The dataset showed (against both Qiskit and ProjectQ) a significance in the homogeneity of variances. This was determined using a Levene test, which gave p-values of 0.00194 and 0.006 respectively.

Using a Shapiro-Wilk test, it was found that the samples came from a normally distributed population,
with p values of 0.0144 for QCGPU, 0.0333 for ProjectQ and 0.08 for Qiskit.

Because of these two properties, it was decided to use Welch's t-test to determine the p-value of the null hypothesis. The resulting p-values were 0.0003396 when testing against Qiskit, and 0.003189 when testing against ProjectQ, thus the null hypothesis can be rejected.

4 Conclusions

The previous chapters have explored the implementation of a library for the simulation of quantum computers, using hardware acceleration through the OpenCL framework. Although time-consuming, the simulation of quantum computers is a necessary part of developing and testing new quantum algorithms.

Through the development of the library, it has been shown how hardware acceleration with devices such as GPUs can help speed up the simulation of quantum computers. With the various optimizations done also, there has been shown to be a speedup, even on relatively low powered hardware, compared to existing libraries for a similar purpose.

4.1 Applications of this Work

The software developed during this research, QCGPU, has a number of very useful applications.

Because the software is open source, it is easily accessible (see figure 3), which enables its use without having to get proprietary software or pay some kind of subscription. This means that any research done using the software (such as the simulation of algorithms) can be reproduced freely and easily. It also lowers the barrier to entry in regards to using the software.

Figure 3: The website for the software library.

The need to simulate quantum computers is likely one that will not go away, and will be essential to the development of quantum devices. The use of simulators is vital in the development of quantum algorithms also, as it is the only way to have knowledge of what the internal state of the quantum computer would be like when running the algorithm.

Because of some of the features of this library (namely hardware acceleration), this library fits a wide range of use cases, especially those of labs that already have this kind of hardware available (due to its popularity in fields such as machine learning, which also take advantage of hardware acceleration). The speedup offered by the hardware acceleration makes this library a valid choice for researchers in the theoretical and practical quantum computing field.

4.2 Areas for Future Research

There are many areas for future research in regards to this work.

Because quantum computers are described using linear algebra, there exists a wide variety of ways (other than the state vector approach taken in this work) to simulate quantum computers. Some of these include using the Feynman path integral formulation of quantum mechanics [6, 8], using tensor networks [18] and applying different simulations for circuits made up of certain types of quantum gates [7]. Graph-based approaches [9] have also been shown to be successful. These techniques (while not covered in this work) will hopefully be included in the simulation software at a later date.

The simulator described in this report was able to simulate 28 qubits. To simulate more, a distributed approach would have to be taken. These approaches are detailed in [17, 20].

It is also planned to integrate the software with other quantum computing frameworks, to improve its usefulness and versatility.

References

[1] Qiskit | Quantum Information Science Kit. https://qiskit.org/.
[2] Quantiki, List of QC Simulators.
[3] IBM Research AI. https://www.research.ibm.com/ibm-q/, June 2018.
[4] D. S. Abrams and S. Lloyd. Quantum Algorithm Providing Exponential Speed Increase for Finding Eigenvalues and Eigenvectors. Phys. Rev. Lett., 83(24):5162–5165, Dec. 1999. DOI: 10.1103/PhysRevLett.83.5162.
[5] S. Beauregard. Circuit for Shor's algorithm using 2n+3 qubits. arXiv:quant-ph/0205095, May 2002.
[6] S. Boixo, S. V. Isakov, V. N. Smelyanskiy, and H. Neven. Simulation of low-depth quantum circuits as complex undirected graphical models. arXiv:1712.05384 [quant-ph], Dec. 2017.
[7] S. Bravyi, D. Browne, P. Calpin, E. Campbell, D. Gosset, and M. Howard. Simulation of quantum circuits by low-rank stabilizer decompositions. arXiv:1808.00128 [quant-ph], July 2018.
[8] J. Chen, F. Zhang, C. Huang, M. Newman, and Y. Shi. Classical Simulation of Intermediate-Size Quantum Circuits. arXiv:1805.01450 [quant-ph], May 2018.
[9] Z.-Y. Chen, Q. Zhou, C. Xue, X. Yang, G.-C. Guo, and G.-P. Guo. 64-Qubit Quantum Circuit Simulation. Science Bulletin, 63(15):964–971, Aug. 2018. ISSN 20959273. DOI: 10.1016/j.scib.2018.06.007.
[10] D. Deutsch. Quantum theory, the Church–Turing principle and the universal quantum computer. Proc. R. Soc. Lond. A, 400(1818):97–117, 1985.
[11] P. A. M. Dirac. A new notation for quantum mechanics. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 35, pages 416–418. Cambridge University Press, 1939.
[12] J. Du, M. Shi, J. Wu, X. Zhou, Y. Fan, B. Ye, and R. Han. Implementation of a quantum algorithm to solve Bernstein-Vazirani's parity problem without entanglement on an ensemble quantum computer. Dec. 2000. DOI: 10.1103/PhysRevA.64.042306. URL https://arxiv.org/abs/quant-ph/0012114.
[13] E. Farhi, J. Goldstone, and S. Gutmann. A Quantum Approximate Optimization Algorithm. arXiv:1411.4028 [quant-ph], Nov. 2014.
[14] A. S. Green, P. L. Lumsdaine, N. J. Ross, P. Selinger, and B. Valiron. Quipper: A Scalable Quantum Programming Language. Apr. 2013. DOI: 10.1145/2499370.2462177.
[15] L. K. Grover. A fast quantum mechanical algorithm for database search. arXiv:quant-ph/9605043, May 1996.
[16] J. Johansson, P. Nation, and F. Nori. QuTiP: An open-source Python framework for the dynamics of open quantum systems. Computer Physics Communications, 183(8):1760–1772, 2012.
[17] R. LaRose. Distributed Memory Techniques for Classical Simulation of Quantum Circuits. arXiv:1801.01037 [quant-ph], Jan. 2018.
[18] I. L. Markov and Y. Shi. Simulating quantum computation by contracting tensor networks. SIAM Journal on Computing, 38(3):963–981, Jan. 2008. ISSN 0097-5397, 1095-7111. DOI: 10.1137/050644756.
[19] P. Shor. Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer. SIAM Review, 41(2):303–332, Jan. 1999. ISSN 0036-1445. DOI: 10.1137/S0036144598347011.
[20] M. Smelyanskiy, N. P. D. Sawaya, and A. Aspuru-Guzik. qHiPSTER: The Quantum High Performance Software Testing Environment. arXiv:1601.07195 [quant-ph], Jan. 2016.
[21] D. S. Steiger, T. Häner, and M. Troyer. ProjectQ: An Open Source Software Framework for Quantum Computing. Dec. 2016. DOI: 10.22331/q-2018-01-31-49.
[22] J. Tompson and K. Schlachter. An introduction to the opencl programming model. Person Education, 49, 2012.
[23] C. Zalka. Grover's quantum searching algorithm is optimal. Physical Review A, 60(4):2746–2751, Oct. 1999. ISSN 1050-2947, 1094-1622. DOI: 10.1103/PhysRevA.60.2746.

A Benchmarking Source Code
The following is the source code used during the benchmarking of QCGPU against ProjectQ and Qiskit.
    import click
    import time
    import random
    import statistics
    import csv
    import os.path
    import math

    from qiskit import QuantumRegister, QuantumCircuit
    from qiskit import execute, Aer

    from projectq import MainEngine
    from projectq.backends import Simulator
    import projectq.ops as ops

    import qcgpu

    def construct_circuit(num_qubits):
        q = QuantumRegister(num_qubits)
        circ = QuantumCircuit(q)

        # Quantum Fourier Transform
        for j in range(num_qubits):
            for k in range(j):
                circ.cu1(math.pi / float(2**(j - k)), q[j], q[k])
            circ.h(q[j])

        return circ


    # Benchmarking functions
    qiskit_backend = Aer.get_backend('statevector_simulator')
    eng = MainEngine(backend=Simulator(), engine_list=[])

    # Setup the OpenCL Device
    qcgpu.backend.create_context()

    def bench_qiskit(qc):
        start = time.time()
        job_sim = execute(qc, qiskit_backend)
        sim_result = job_sim.result()
        return time.time() - start

    def bench_qcgpu(num_qubits):
        start = time.time()
        state = qcgpu.State(num_qubits)

        for j in range(num_qubits):
            for k in range(j):
                state.cu1(j, k, math.pi / float(2**(j - k)))
            state.h(j)

        # Wait for all queued OpenCL work to complete before stopping the clock
        state.backend.queue.finish()
        return time.time() - start

    def bench_projectq(num_qubits):
        start = time.time()

        q = eng.allocate_qureg(num_qubits)

        for j in range(num_qubits):
            for k in range(j):
                ops.CRz(math.pi / float(2**(j - k))) | (q[j], q[k])
            ops.H | q[j]
        eng.flush()

        t = time.time() - start
        # measure to get rid of runtime error message
        for j in q:
            ops.Measure | j

        return t

    # NOTE: this listing is an excerpt. The create_csv helper and the
    # command-line decorators (presumably from click) that supply
    # samples, qubits, out and single are not shown here.
    def benchmark(samples, qubits, out, single):
        functions = bench_qcgpu, bench_qiskit, bench_projectq
        times = {f.__name__: [] for f in functions}
        writer = create_csv(out)

        for n in range(0, qubits):
            # Construct the circuit
            qc = construct_circuit(n + 1)

            # Run the benchmarks
            for i in range(samples):
                func = random.choice(functions)
                if func.__name__ != 'bench_qiskit':
                    t = func(n + 1)
                else:
                    t = func(qc)
                times[func.__name__].append(t)

    if __name__ == '__main__':
        benchmark()
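The listing accumulates raw timings in the `times` dictionary but does not show how they are reduced. A minimal sketch of one possible summary follows; the function name `summarise` and the example numbers are hypothetical and are not taken from the paper's benchmark.

```python
import statistics


def summarise(times):
    """Map each benchmark name to (median, minimum) of its samples."""
    return {
        name: (statistics.median(samples), min(samples))
        for name, samples in times.items()
        if samples  # random.choice may leave a function unsampled
    }


# Hypothetical timings in seconds, for illustration only.
example_times = {
    "bench_qcgpu": [0.012, 0.010, 0.011],
    "bench_qiskit": [0.050, 0.047],
    "bench_projectq": [],
}
summary = summarise(example_times)
```

Because each sample picks a benchmark with `random.choice`, the three simulators can end up with different sample counts, which is why the sketch skips empty lists.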

B OpenCL Kernel Source Code


B.1 Gate Application

    /*
     * Returns the nth number where a given digit
     * is cleared in the binary representation of the number
     */
    static int nth_cleared(int n, int target)
    {
        int mask = (1 << target) - 1;
        int not_mask = ~mask;

        return (n & mask) | ((n & not_mask) << 1);
    }

    /*
     * Applies a single qubit gate to the register.
     * The gate matrix must be given in the form:
     *
     *  A B
     *  C D
     */
    __kernel void apply_gate(
        __global cfloat_t *amplitudes,
        int target,
        cfloat_t A,
        cfloat_t B,
        cfloat_t C,
        cfloat_t D)
    {
        int const global_id = get_global_id(0);

        int const zero_state = nth_cleared(global_id, target);
        int const one_state = zero_state | (1 << target);

        cfloat_t const zero_amp = amplitudes[zero_state];
        cfloat_t const one_amp = amplitudes[one_state];

        amplitudes[zero_state] = cfloat_add(cfloat_mul(A, zero_amp), cfloat_mul(B, one_amp));
        amplitudes[one_state] = cfloat_add(cfloat_mul(D, one_amp), cfloat_mul(C, zero_amp));
    }
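The index arithmetic in this kernel can be checked on the host. The following is a rough NumPy sketch, not the QCGPU implementation: a Python loop stands in for the OpenCL work items, and it assumes the kernel is launched with half as many work items as there are amplitudes, with qubit 0 as the least significant bit of the state index.

```python
import numpy as np


def nth_cleared(n, target):
    """Return the n-th non-negative integer whose bit `target` is 0."""
    mask = (1 << target) - 1
    return (n & mask) | ((n & ~mask) << 1)


def apply_gate(amplitudes, target, gate):
    """Apply a 2x2 `gate` to qubit `target` of a state vector, in place."""
    (a, b), (c, d) = gate
    # One loop iteration per (assumed) work item: half as many as amplitudes.
    for global_id in range(len(amplitudes) // 2):
        zero_state = nth_cleared(global_id, target)
        one_state = zero_state | (1 << target)
        zero_amp = amplitudes[zero_state]
        one_amp = amplitudes[one_state]
        amplitudes[zero_state] = a * zero_amp + b * one_amp
        amplitudes[one_state] = c * zero_amp + d * one_amp
    return amplitudes


# Indices with bit 1 cleared, in increasing order.
indices = [nth_cleared(n, 1) for n in range(4)]

# A Hadamard on qubit 0 of |00> spreads the amplitude over indices 0 and 1.
hadamard = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
state = np.zeros(4, dtype=np.complex64)
state[0] = 1
apply_gate(state, 0, hadamard)
```

Each work item owns one pair of amplitudes that differ only in the target bit, so no two work items touch the same memory location.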

B.2 Controlled Gate Application

    /*
     * Applies a controlled single qubit gate to the register.
     */
    __kernel void apply_controlled_gate(
        __global cfloat_t *amplitudes,
        int control,
        int target,
        cfloat_t A,
        cfloat_t B,
        cfloat_t C,
        cfloat_t D)
    {
        int const global_id = get_global_id(0);
        int const zero_state = nth_cleared(global_id, target);
        int const one_state = zero_state | (1 << target); // Set the target bit

        int const control_val_zero = (((1 << control) & zero_state) > 0) ? 1 : 0;
        int const control_val_one = (((1 << control) & one_state) > 0) ? 1 : 0;

        cfloat_t const zero_amp = amplitudes[zero_state];
        cfloat_t const one_amp = amplitudes[one_state];

        if (control_val_zero == 1)
        {
            amplitudes[zero_state] = cfloat_add(cfloat_mul(A, zero_amp), cfloat_mul(B, one_amp));
        }

        if (control_val_one == 1)
        {
            amplitudes[one_state] = cfloat_add(cfloat_mul(D, one_amp), cfloat_mul(C, zero_amp));
        }
    }
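As a sanity check of the control logic, the kernel can be mirrored in NumPy. This is a sketch under the same assumptions as before (a Python loop in place of OpenCL work items, qubit 0 as the least significant index bit); the Python function names are illustrative only.

```python
import numpy as np


def nth_cleared(n, target):
    """Return the n-th non-negative integer whose bit `target` is 0."""
    mask = (1 << target) - 1
    return (n & mask) | ((n & ~mask) << 1)


def apply_controlled_gate(amplitudes, control, target, gate):
    """Apply `gate` to `target` only where the `control` bit is set."""
    (a, b), (c, d) = gate
    for global_id in range(len(amplitudes) // 2):
        zero_state = nth_cleared(global_id, target)
        one_state = zero_state | (1 << target)
        zero_amp = amplitudes[zero_state]
        one_amp = amplitudes[one_state]
        # The two indices differ only in the target bit, so for
        # control != target both tests below give the same answer.
        if zero_state & (1 << control):
            amplitudes[zero_state] = a * zero_amp + b * one_amp
        if one_state & (1 << control):
            amplitudes[one_state] = c * zero_amp + d * one_amp
    return amplitudes


# Controlled-X (CNOT) with control qubit 0 and target qubit 1.
pauli_x = np.array([[0, 1], [1, 0]], dtype=np.complex64)

state = np.zeros(4, dtype=np.complex64)
state[0b01] = 1  # control qubit set, target qubit clear
apply_controlled_gate(state, control=0, target=1, gate=pauli_x)

untouched = np.zeros(4, dtype=np.complex64)
untouched[0b00] = 1  # control qubit clear, so nothing should change
apply_controlled_gate(untouched, control=0, target=1, gate=pauli_x)
```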

B.3 Probability Calculation

    __kernel void calculate_probabilities(
        __global complex_f *const amplitudes,
        __global float *probabilities)
    {
        uint const state = get_global_id(0);
        complex_f amp = amplitudes[state];

        probabilities[state] = complex_abs(mul(amp, amp));
    }
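The kernel computes the modulus of the squared amplitude rather than the amplitude times its conjugate. Assuming `mul` is the complex product and `complex_abs` the complex modulus (neither helper is defined in this listing), the two agree because |z·z| = |z|², as this small NumPy check illustrates:

```python
import numpy as np

amplitudes = np.array([0.6, 0.8j, 0, 0], dtype=np.complex64)

# |amp * amp|, mirroring the kernel under the stated assumptions.
via_square = np.abs(amplitudes * amplitudes)

# amp * conj(amp), the more common way to write the same quantity.
via_conj = (amplitudes * np.conj(amplitudes)).real

probabilities = via_square
```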

C Example Implementation of the Bernstein-Vazirani Algorithm


In this section, the Bernstein-Vazirani algorithm is introduced, along with its implementation using the software
developed in this project.
This algorithm was one of the first to show that quantum computers could have a speedup over
classical computers. It shows the power of even low-depth circuits (circuits with few gates).
The implementation given here is without entanglement, and is based on a paper by Du et al. [12].

C.1 Introduction
The Bernstein-Vazirani algorithm finds a hidden integer $a \in \{0,1\}^n$ from an oracle $f_a$ that returns a bit
$a \cdot x \equiv \sum_i a_i x_i \mod 2$ for an input $x$.
Implemented classically, the oracle returns $f_a(x) = a \cdot x \mod 2$. The quantum oracle behaves analogously, but
can be queried with a superposition.
To solve this problem classically, the hidden integer can be found by querying the oracle with the inputs
$x = 1, 2, \dots, 2^i, \dots, 2^{n-1}$, where the query $x = 2^i$ reveals the $i$th bit of $a$ ($a_i$). This is the optimal classical solution, and
takes $n$ queries, i.e. it is $O(n)$. Using a quantum oracle and the Bernstein-Vazirani algorithm, $a$ can be found with just one query to
the oracle.
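The classical strategy described above can be sketched in a few lines of Python. The helper names `make_oracle` and `find_hidden_integer` are illustrative and are not part of QCGPU.

```python
def make_oracle(a):
    """Classical oracle f_a(x) = a . x mod 2 (bitwise inner product)."""
    def oracle(x):
        return bin(a & x).count("1") % 2
    return oracle


def find_hidden_integer(oracle, n):
    """Recover a with n queries, one per bit."""
    a = 0
    for i in range(n):
        if oracle(1 << i):  # the query x = 2^i reveals bit a_i
            a |= 1 << i
    return a


recovered = find_hidden_integer(make_oracle(101), n=14)
```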

C.2 Algorithm
The Bernstein-Vazirani algorithm to find the hidden integer $a$ is very simple. Start from the zero state $|00 \dots 0\rangle$,
apply a Hadamard gate to each qubit, query the oracle, apply another Hadamard gate to each qubit and measure
the resulting state to find $a$. This procedure is shown in algorithm 3.

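Algorithm 3: Bernstein-Vazirani ($f_a$)

Input: A quantum oracle $U_{f_a}$ that returns a bit $a \cdot x \equiv \sum_i a_i x_i \mod 2$, for a hidden integer $a \in \{0,1\}^n$
and input $x$
Output: $a$: the hidden integer
1 $|\psi\rangle \leftarrow |000 \dots 000\rangle$;
2 $|\psi\rangle \leftarrow H^{\otimes n} |\psi\rangle$;
3 $|\psi\rangle \leftarrow U_{f_a} |\psi\rangle$;
4 $|\psi\rangle \leftarrow H^{\otimes n} |\psi\rangle$;
5 return $a \leftarrow$ Measure $|\psi\rangle$;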

The correctness of this algorithm can also be shown. Consider the state $|a\rangle$, where measuring the state would
result in the binary string corresponding to the hidden integer $a$. If a Hadamard gate is applied to each qubit
in that state, the resulting state is

$$|a\rangle \xrightarrow{H^{\otimes n}} \frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} (-1)^{a \cdot x} |x\rangle . \tag{C.1}$$

Now consider the state $|000 \dots 0\rangle$, the same state that the algorithm starts in. Applying Hadamard gates
gives

$$|000 \dots 0\rangle \xrightarrow{H^{\otimes n}} \frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} |x\rangle . \tag{C.2}$$

These two states differ only by the phase $(-1)^{a \cdot x}$ attached to each basis state $|x\rangle$.
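For completeness, (C.1) can be obtained from the action of a single Hadamard gate on one qubit of $|a\rangle = |a_1 a_2 \dots a_n\rangle$. This is the standard argument, written out here as a short sketch:

```latex
\begin{align*}
H |a_i\rangle &= \frac{1}{\sqrt{2}} \sum_{x_i \in \{0,1\}} (-1)^{a_i x_i} |x_i\rangle , \\
H^{\otimes n} |a\rangle &= \bigotimes_{i=1}^{n} H |a_i\rangle
  = \frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} (-1)^{\sum_i a_i x_i} |x\rangle
  = \frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} (-1)^{a \cdot x} |x\rangle .
\end{align*}
```

Setting $a = 0$ makes every sign $+1$, which recovers (C.2).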


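Now, the quantum oracle $f_a$ returns 1 on input $x$ such that $a \cdot x \equiv 1 \mod 2$, and returns 0 otherwise. This
means we have the following transformation:

$$|x\rangle (|0\rangle - |1\rangle) \xrightarrow{f_a} |x\rangle (|0 \oplus f_a(x)\rangle - |1 \oplus f_a(x)\rangle) = (-1)^{a \cdot x} |x\rangle (|0\rangle - |1\rangle) , \tag{C.3}$$

where $\oplus$ is the XOR operation (outputs 1 only when the inputs differ). In the above
equation, the $|0\rangle - |1\rangle$ state does not change, and can be ignored. Thus, the oracle can create $(-1)^{a \cdot x} |x\rangle$ from
the input $|x\rangle$.
With this, starting from the state $|0\rangle \equiv |00 \dots 0\rangle$,

\begin{align}
|0\rangle &\xrightarrow{H^{\otimes n}} \frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} |x\rangle \tag{C.4} \\
&\xrightarrow{f_a} \frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} (-1)^{a \cdot x} |x\rangle \tag{C.5} \\
&\xrightarrow{H^{\otimes n}} |a\rangle , \tag{C.6}
\end{align}

as the Hadamard gates cancel: $H^{\otimes n}$ is its own inverse, so applying it to the right-hand side of (C.1) returns $|a\rangle$.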
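The three steps (C.4) to (C.6) can be verified numerically with a dense state vector. The sketch below uses plain NumPy rather than QCGPU, and the function names are illustrative; the oracle is applied directly as the phase $(-1)^{a \cdot x}$ on each amplitude.

```python
import numpy as np


def hadamard_all(state):
    """Apply H to every qubit of a state vector of length 2**n."""
    n = len(state).bit_length() - 1
    h = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    psi = state.reshape([2] * n)
    for axis in range(n):
        # Contract H with one qubit axis, then restore the axis order.
        psi = np.moveaxis(np.tensordot(h, psi, axes=([1], [axis])), 0, axis)
    return psi.reshape(-1)


def bernstein_vazirani(a, n):
    """Return the most likely measurement outcome and the final state."""
    state = np.zeros(2 ** n)
    state[0] = 1                          # |00...0>
    state = hadamard_all(state)           # step (C.4)
    parity = np.array([bin(a & x).count("1") % 2 for x in range(2 ** n)])
    state = state * (-1.0) ** parity      # step (C.5), phase oracle
    state = hadamard_all(state)           # step (C.6)
    return int(np.argmax(np.abs(state) ** 2)), state


measured, final_state = bernstein_vazirani(a=101, n=8)
```

All of the probability ends up on the single basis state whose index is the hidden integer, so one oracle query suffices.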

C.3 Inner Product Oracle


The oracle used in this algorithm is the inner product oracle. It transforms the state $|x\rangle$ into the state $(-1)^{a \cdot x} |x\rangle$.
The method of construction shown here requires no ancilla qubits (extra qubits not used in the final result) [12].
This is not the only method. Another approach is to use CNOT gates, but that does require ancilla qubits.

To construct the oracle, first note that

$$(-1)^{a \cdot x} = (-1)^{a_1 x_1} \cdots (-1)^{a_i x_i} \cdots (-1)^{a_n x_n} = \prod_{i : a_i = 1} (-1)^{x_i} . \tag{C.7}$$

It follows from this that the inner product oracle can be composed of single qubit gates,

$$O_{f_a} = O_1 \otimes O_2 \otimes \cdots \otimes O_i \otimes \cdots \otimes O_n , \tag{C.8}$$

where $O_i = (1 - a_i) I + a_i Z$. The gates $I$ and $Z$ are the identity and Pauli $Z$ gates respectively, and
$a_i \in \{0, 1\}$.
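Equation (C.8) can be checked by building the oracle as an explicit matrix and comparing its diagonal with $(-1)^{a \cdot x}$. This is a small NumPy sketch with an illustrative function name; it assumes qubit 0 is the least significant bit of the state index and numbers the factors from 0 instead of 1.

```python
from functools import reduce

import numpy as np


def inner_product_oracle(a, n):
    """Build O_fa as a Kronecker product of I and Z factors."""
    identity = np.eye(2)
    pauli_z = np.diag([1.0, -1.0])
    # O_i = (1 - a_i) I + a_i Z for each qubit i.
    factors = [pauli_z if (a >> i) & 1 else identity for i in range(n)]
    # np.kron puts its first argument on the most significant bits, so
    # the factors are reversed to make qubit 0 the least significant bit.
    return reduce(np.kron, reversed(factors))


n, a = 4, 0b0101
oracle = inner_product_oracle(a, n)
expected = np.array(
    [(-1) ** (bin(a & x).count("1") % 2) for x in range(2 ** n)]
)
```

The resulting matrix is diagonal, which is why an implementation only needs to apply a Z gate to each qubit where $a_i = 1$.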

C.4 Implementation
Now, an implementation of this algorithm using QCGPU will be shown.
    import qcgpu

First, the number of qubits to use in the experiment can be set. Also, in order to construct the oracle, the
hidden integer a must be given.
    num_qubits = 14  # how many qubits to use
    a = 101          # the hidden integer; its bit-string is 1100101

Now the algorithm can be implemented:

    # Create the quantum register
    register = qcgpu.State(num_qubits)

    # Apply Hadamard gates to each qubit
    for i in range(num_qubits):
        register.h(i)

    # Apply the inner-product oracle
    for i in range(num_qubits):
        if a & (1 << i):
            register.z(i)
        # note: otherwise an identity gate would be applied,
        # but that doesn't modify the state

    # Apply Hadamard gates to each qubit
    for i in range(num_qubits):
        register.h(i)

    # Measure the register
    measurements = register.measure(samples=1000)

As can be seen from figure 4, the measurement outcome is the same as the bit-string of the hidden integer a.

Figure 4: Bernstein-Vazirani Measurement Outcomes (bar chart; y-axis: Probabilities; bar labelled 1.000)

You might also like