VCL Guide
Cluster Platform
January 2013
Copyright © 2010-2013 Amnon Barak and Amnon Shiloh. All rights reserved.
Preface
This document presents the VirtualCL and SuperCL guide and manuals.
Further information is available at http://www.mosix.org/txt_vcl.html.
Contents
Preface
2 SuperCL
  2.1 Overview
  2.2 Using SuperCL
3 Installation
  3.1 Automatic installation
  3.2 Manual installation
4 Manuals
  4.1 For users
  4.2 For programmers
Chapter 1
The VCL Cluster Platform
1.1 Overview
The VirtualCL (VCL) cluster platform is a wrapper for OpenCL™ that allows most unmodified
applications to transparently utilize multiple OpenCL devices in a cluster as if all
the devices were on the local computer.
The main features of VCL are:
• Works with OpenCL devices (CPUs, GPUs, Accelerators) from all vendors.
Chapter 2
SuperCL
2.1 Overview
SuperCL is a micro-language to optimize remote OpenCL operations by reducing the
network overheads.
Chapter 3
Installation
File                 Destination
vcl                  /etc/init.d/vcl
vclconf              /sbin/vclconf
opencld              /sbin/opencld
broker               /sbin/broker
libOpenCL.so         /usr/lib/vcl/libOpenCL.so
vclrun               /usr/bin/vclrun
man/man7/vcl.7       /usr/share/man/man7/vcl.7
man/man3/supercl.3   /usr/share/man/man3/supercl.3
supercl.h            /usr/include/supercl.h
Then either run “vclconf” or edit the VCL configuration manually, according to the
instructions in “man vcl”.
Chapter 4
Manuals
The manuals in this chapter are provided for general information. Users are advised to rely
on the manuals that are provided with their specific VCL distribution.
VCL (7) VCL Description VCL (7)
NAME
VCL - Virtual wrapper for OpenCL, combining the power of many GPUs
INTRODUCTION
The OpenCL standard allows applications to accelerate computation by using various GPU and other devices
in a generic way. However, the number of such accelerating devices is limited by hardware, with typically
only 1-4 devices per computer.
VCL is a wrapper for OpenCL that extends access to OpenCL devices (such as CPUs, GPUs and
accelerators) beyond the devices of the local computer.
Users of VCL run their applications on hosting-nodes (hosts), using the VCL library instead of a
vendor-specific SDK (OpenCL Software-Development-Kit). The actual OpenCL devices reside in back-end nodes.
Hosting-nodes and back-end nodes may overlap, so some computers may serve simultaneously as both a
hosting-node and a back-end node.
REQUIREMENTS
1. All participating computers must run Linux with the x86_64 (64-bit) architecture.
2. Hosts must be connected to back-end nodes over a network that supports TCP/IP.
3. TCP/IP port 255 must be reserved for VCL (not used by other applications or blocked by a firewall).
4. Back-end nodes must have OpenCL version 1.1 (or higher) installed (but different nodes are not
required to have the same hardware or SDK).
CONFIGURATION
To configure VCL interactively, simply run vclconf, which will guide you through the various configura-
tion options.
Vclconf can be used in two ways:
1. In order to configure the local computer, respond to the first question by pressing <Enter>.
2. In order to configure other computers, respond to the first question by entering the path to a
root-directory: this is typically done on an NFS server that stores an image of the root-partition for a
cluster of hosts, back-end nodes, or both.
Below is a detailed description of the VCL configuration files, in case you prefer to edit them manually:
/etc/vcl/is_back_end
The presence of this file indicates that the computer is a back-end node.
/etc/vcl/may_read_files
The presence of this file allows reading files on this back-end node for the implementation of the
CL_MEM_FILE_HOST_PTR extension (see below) and within SuperCL(3).
/etc/vcl/may_write_files
The presence of this file allows writing files on this back-end node within SuperCL(3).
/etc/vcl/amd-1.2
The presence of this file indicates that you wish to use AMD’s experimental OpenCL-1.2 SDK on
this back-end (see "CURRENT STATUS" below).
/etc/vcl/is_host
The presence of this file indicates that the computer is a hosting-node.
/etc/vcl/nodes
On hosting nodes, this file explicitly lists the potential back-end nodes, one per line, either as host-
names or as IP addresses (see "STARTING VCL" below for debug options).
/etc/vcl/passwd
This file contains a unique password that is agreed between the relevant hosts and back-end nodes.
It MUST be owned by "root" and allow no read/write permissions to any other users. When present,
this will prevent unauthorized hosting-nodes from attempting to contact VCL.
these extensions are indeed supported by all back-end nodes), then a comma-separated list of exten-
sions can be given, which can then be returned by clGetPlatformInfo().
VCL also adds its own extension, named "multi-node", which by default is the only extension
returned.
Note that extensions that allow OpenCL to interact with OpenGL are not supported.
Alternatively, use:
vclrun --extensions={comma-separated-list-of-extensions}
SIG_FOR_VCL_USE=
The VCL library needs a signal for its internal use. The variable
SIG_FOR_VCL_USE={signum} can be used to select a different signal-number in the range of
34 to 64, in case the default of 45 is already in use by the application.
Alternatively, use:
vclrun --sig={signum}
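A launcher that sets this variable on behalf of an application might want to validate the signal number first. The following is a minimal sketch of such a check, not part of VCL: the helper name vcl_format_sig_env is hypothetical, and only the range of 34 to 64 and the default of 45 are taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define VCL_SIG_MIN 34      /* lowest signal number accepted, per the text above */
#define VCL_SIG_MAX 64      /* highest signal number accepted, per the text above */
#define VCL_SIG_DEFAULT 45  /* default used when the variable is not set */

/* Hypothetical helper: write "SIG_FOR_VCL_USE=<signum>" into buf.
 * Returns 0 on success, -1 if signum is out of range or buf is too small. */
int vcl_format_sig_env(int signum, char *buf, size_t len)
{
    assert(buf != NULL);
    if (signum < VCL_SIG_MIN || signum > VCL_SIG_MAX)
        return -1;
    int n = snprintf(buf, len, "SIG_FOR_VCL_USE=%d", signum);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string can be placed in the environment of the application before it starts, which has the same effect as exporting the variable in the shell.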
VCL_JOBID=
Declare the application to be part of a "job", which can be any positive number: to succeed, the
system-administrator must first allocate separate resources for the given job (see JOB SUPPORT
below).
Alternatively, use:
vclrun --job={job-number}
STARTING VCL
To start VCL, run:
/etc/init.d/vcl start
If you just configured a back-end node to also be a hosting-node, run:
/etc/init.d/vcl start_host
If you just configured a hosting node to also be a back-end node, run:
/etc/init.d/vcl start_backend
VCL EXTENSIONS
A new memory-object creation flag was introduced in VCL to prevent the expensive overhead of sending
kernel input over the network. The initial contents of an input memory-object may instead be read from a file
on the first back-end node where the memory-object is used by a kernel.
When the
CL_MEM_FILE_HOST_PTR (0x100000)
flag is set, the host_ptr argument of clCreateBuffer(), clCreateImage2D() and
clCreateImage3D() points to a structure that describes a file from which the memory-object is to be
read. The structure contains:
{
long long version; /* must be 1 for now */
char *filename; /* the file from which to read */
unsigned long long file_offset; /* where to start reading */
unsigned long long count; /* number of bytes to read */
}
If the filename does not start with a ’/’, then it is interpreted relative to the current-directory on the
hosting-node.
If count is greater than the object’s size, then it is truncated to the object’s size. If it is smaller than the
object’s size, then the rest of the memory-object is filled with 0’s.
Unless filename is "/dev/null", a file named /etc/vcl/may_read_files must be present on the
back-end node to permit access. filename is opened on the back-end host using the same user and group
IDs as that of the calling application. If the file cannot be read on the back-end node (including due to lack
of permission or the absence of the file /etc/vcl/may_read_files), then the kernel fails with the
error CL_OUT_OF_RESOURCES.
The contents of a memory-object created with the CL_MEM_FILE_HOST_PTR flag are undefined until the
first kernel uses that memory-object. Attempts to read/write/copy such a memory-object before its first use
will fail with the error CL_INVALID_MEM_OBJECT.
The CL_MEM_FILE_HOST_PTR flag cannot be combined with CL_MEM_USE_HOST_PTR or
CL_MEM_COPY_HOST_PTR.
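The descriptor and the count rule described above can be sketched in plain C. This is an illustration only: the struct name vcl_file_desc and the helper functions are hypothetical, while the field layout and the flag value 0x100000 are copied from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Flag value as given in the text above. */
#define CL_MEM_FILE_HOST_PTR 0x100000

/* Mirror of the descriptor layout shown above; the struct name is hypothetical. */
struct vcl_file_desc {
    long long version;               /* must be 1 for now */
    char *filename;                  /* the file from which to read */
    unsigned long long file_offset;  /* where to start reading */
    unsigned long long count;        /* number of bytes to read */
};

/* Fill a descriptor for reading `count` bytes of `path` starting at `offset`. */
void vcl_file_desc_init(struct vcl_file_desc *d, char *path,
                        unsigned long long offset, unsigned long long count)
{
    assert(d != NULL && path != NULL);
    d->version = 1;
    d->filename = path;
    d->file_offset = offset;
    d->count = count;
}

/* Bytes taken from the file for an object of `object_size` bytes:
 * count is truncated to the object's size. */
unsigned long long vcl_bytes_from_file(const struct vcl_file_desc *d,
                                       unsigned long long object_size)
{
    return d->count > object_size ? object_size : d->count;
}

/* Bytes of the object that are zero-filled because count was smaller. */
unsigned long long vcl_zero_filled(const struct vcl_file_desc *d,
                                   unsigned long long object_size)
{
    return object_size - vcl_bytes_from_file(d, object_size);
}
```

In an application, a pointer to such a descriptor would be passed as the host_ptr argument of clCreateBuffer() with CL_MEM_FILE_HOST_PTR set in the flags. That call is left out here so that the sketch compiles without the OpenCL headers.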
Another major extension is SUPERCL, which combines multiple back-end operations in a single call, thus
saving on network delays; see the SUPERCL(3) manual page.
JOB SUPPORT
A "job" is a set of application-instances that share access to a number of OpenCL devices on different nodes
(not necessarily the cluster defined by vclconf). Jobs are numbered with positive integers. To set up for
running a job, the system-administrator (or a "root" script) should run:
/sbin/broker -k{jobID}
CURRENT STATUS
VCL supports OpenCL version 1.1 (and 1.0) almost completely.
Outstanding problems:
1. There is no way (yet) for applications to know on which back-end node a given device resides.
2. OpenCL-programs that produce different routines, or routines with different parameters for different
device-types, will only work properly if they are created as different "Programs" for different device-
types.
3. VCL is unable to return an error when an application supplies a kernel with an argument of a wrong
size. An error will therefore only occur when the kernel is eventually activated.
SEE ALSO
supercl(3).
NAME
SuperCL - Optimize remote OpenCL operations by reducing network overheads
PURPOSE
The main obstacle in using OpenCL devices on remote computers (nodes) is network latency: in order to run
a remote kernel, or to perform a remote I/O operation, an instruction must be sent over the network, followed
by a reply - this adds two network-latency delays to the kernel’s execution time. SuperCL minimizes these
delays by packing multiple remote kernel activations and/or I/O operations into a single call, so that only one
networked instruction is needed to activate them all and only one reply is received.
While SuperCL is included and most useful in VCL(7), it may also be convenient for general OpenCL use.
AUDIENCE
This manual is intended for programmers who are familiar with the OpenCL library.
SYNOPSIS
#include <supercl.h>
DESCRIPTION
clSuper is similar in structure to the standard clEnqueueXXX() OpenCL functions, but instead of queuing
a single operation, it queues a sequence of operations. clSuper shares its OpenCL context and other
OpenCL objects with the rest of the OpenCL library and can be freely mixed with the other OpenCL
functions. See the OpenCL manual (version 1.1) for the description of queue,
num_events_in_wait_list, event_wait_list and event.
Sequence is an array of instructions, constituting a mini-program that invokes OpenCL kernels and I/O
operations. Each instruction is 128 bytes long, beginning with the cmd (command) field; the rest of the
instruction depends on the specific command. The sequence ends with the command
SCL_END_OF_SEQUENCE. clSuper executes the instructions of the sequence in order, unless an
instruction calls for a jump, forks a new thread or terminates a thread. When the last thread reaches
SCL_END_OF_SEQUENCE without encountering any errors, the operation terminates and the event status
is set to CL_COMPLETE. If an error is encountered, all threads terminate and the event status is set to the
first error encountered.
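The fixed-size layout described above can be pictured with a stand-in type. This is a sketch under stated assumptions: the names demo_instruction and demo_cmd, the command values and the int type of cmd are hypothetical, and the real definitions are those in supercl.h. Only the 128-byte instruction size and the end-of-sequence terminator come from the text.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_INSTRUCTION_SIZE 128   /* instruction length stated above */

/* Stand-in command codes; the real values come from supercl.h. */
enum demo_cmd { DEMO_END_OF_SEQUENCE = 0, DEMO_LABEL = 1, DEMO_RUN_KERNEL = 2 };

/* Hypothetical instruction: a command followed by command-specific bytes. */
struct demo_instruction {
    int cmd;
    char payload[DEMO_INSTRUCTION_SIZE - sizeof(int)];
};

/* Count the instructions up to and including the end-of-sequence marker. */
size_t demo_sequence_length(const struct demo_instruction *seq)
{
    size_t n = 0;
    assert(seq != NULL);
    while (seq[n].cmd != DEMO_END_OF_SEQUENCE)
        n++;
    return n + 1;
}
```

Because every instruction has the same size, a sequence is an ordinary C array, and instruction numbers are simply array indices starting at 0.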
Registers are available for program control (such as loops) and for other features as described below.
Register values can be either 64-bit integers or 64-bit real numbers. All registers are initially set to integer-zero.
There is no need to pre-define registers and there is no hard limit on the number of registers used, but a large
number of registers (thousands or more) can slow down SuperCL significantly.
Register numbers are 31-bit integers. There are two types of registers, global and private. Global
registers are shared among all threads, while private registers belong to a specific thread and/or its child-threads.
The instructions are:
SCL_KERNEL_ARG
Set a kernel argument to a fixed value.
struct
{
cl_kernel kernel;
cl_int argno;
long arg_len;
union
{
int arg_int;
long arg_long;
float arg_float;
double arg_double;
char arg_value[SCL_MAX_ARG_LEN];
cl_mem arg_mem;
cl_sampler arg_sampler;
struct
{
int regno;
int reg_is_private;
};
};
} kernel_arg;
kernel_arg.kernel is the kernel for which to set the argument.
kernel_arg.argno is the index of the argument to set (starting from 0).
kernel_arg.arg_len is the size of the argument.
Memory-object arguments are placed in kernel_arg.arg_mem. Sampler arguments are placed in
kernel_arg.arg_sampler. Regular arguments are placed in either kernel_arg.arg_int,
kernel_arg.arg_long, kernel_arg.arg_float, kernel_arg.arg_double or
kernel_arg.arg_value.
In the rare case when an argument’s size is more than SCL_MAX_ARG_LEN bytes, several consecutive
SCL_KERNEL_ARG instructions are needed, all with the same kernel_arg.kernel,
kernel_arg.argno and kernel_arg.arg_len (set to the total argument length).
Note that all kernel arguments must be defined within SuperCL sequences before a kernel can run: argu-
ments previously set outside the sequence (using clSetKernelArg) do not hold within clSuper().
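For arguments longer than SCL_MAX_ARG_LEN, the number of consecutive SCL_KERNEL_ARG instructions can be estimated as follows. This is a sketch that assumes each instruction carries up to SCL_MAX_ARG_LEN bytes of the argument in order, which the text above suggests but does not spell out. The function name is hypothetical, and the limit is passed as a parameter because its value is defined in supercl.h.

```c
#include <assert.h>
#include <stddef.h>

/* Number of consecutive SCL_KERNEL_ARG instructions assumed to be needed
 * for an argument of arg_len bytes when each carries at most max_arg_len. */
size_t demo_kernel_arg_instructions(size_t arg_len, size_t max_arg_len)
{
    assert(arg_len > 0 && max_arg_len > 0);
    return (arg_len + max_arg_len - 1) / max_arg_len;  /* round up */
}
```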
SCL_KERNEL_ARG_FROM_REGISTER
Set a variable kernel argument from a given register.
Instruction parameters are as above in SCL_KERNEL_ARG, except that the data is taken from the register
kernel_arg.regno. If kernel_arg.reg_is_private is set, then regno designates a private
register.
The argument’s size must be 1, 2, 4 or 8 bytes, and if the register contains a real value, then the argument
can only be 4 bytes (float) or 8 bytes (double). It is the programmer’s responsibility to make sure that the
argument’s type corresponds to the register’s type (integer or real).
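The size rule above can be written as a small check that a program might run before building the instruction. The struct demo_register and the function name are hypothetical stand-ins for illustration; SuperCL keeps its registers internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a register: a 64-bit integer or a 64-bit real. */
struct demo_register {
    bool is_real;
    union { long long i; double d; } v;
};

/* True when an argument of `size` bytes may be set from this register. */
bool demo_register_arg_size_ok(const struct demo_register *reg, size_t size)
{
    assert(reg != NULL);
    if (reg->is_real)
        return size == 4 || size == 8;          /* float or double only */
    return size == 1 || size == 2 || size == 4 || size == 8;
}
```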
SCL_RUN_KERNEL
Run an OpenCL kernel.
struct
{
cl_kernel kernel;
cl_device_id device;
int work_dim;
size_t global_work_offset[3];
size_t global_work_size[3];
size_t local_work_size[3];
} run_kernel;
run_kernel.kernel is the kernel to run. run_kernel.device is the device to run the kernel on: a
value of NULL implies the device associated with the queue; otherwise, in multi-node
platforms such as VCL(7), the device must be on the same node and same platform as the device of
the queue.
run_kernel.work_dim, run_kernel.global_work_offset,
run_kernel.global_work_size and run_kernel.local_work_size correspond to the same
arguments of clEnqueueNDRangeKernel().
SCL_COPY_BUFFER
Copy an OpenCL buffer, or a part thereof, to another OpenCL buffer.
struct
{
cl_mem from;
cl_mem to;
off_t from_offset;
off_t to_offset;
size_t count;
unsigned int flags;
} copy_buffer;
When no flags are set, copy_buffer.count bytes are copied from buffer copy_buffer.from at
offset copy_buffer.from_offset to buffer copy_buffer.to at offset
copy_buffer.to_offset.
When copy_buffer.flags include the flags: SCLF_FROM_OFFSET_IS_A_REGISTER_NUMBER,
SCLF_TO_OFFSET_IS_A_REGISTER_NUMBER and/or SCLF_COUNT_IS_A_REGISTER_NUMBER,
then the corresponding values in copy_buffer.from_offset, copy_buffer.to_offset and/or
copy_buffer.count are taken to be register numbers containing the corresponding integer values.
Further, if copy_buffer.flags also include the flags:
SCLF_FROM_OFFSET_REGISTER_IS_PRIVATE, SCLF_TO_OFFSET_REGISTER_IS_PRIVATE
and/or SCLF_COUNT_REGISTER_IS_PRIVATE, then the corresponding registers are taken to be private
registers.
SCL_COPY_IMAGE
Copy an OpenCL image, or a region thereof, to another OpenCL image.
struct
{
cl_mem from;
cl_mem to;
size_t src_origin[3];
size_t dst_origin[3];
size_t region[3];
} copy_image;
The given region (copy_image.region) is copied from the image copy_image.from at
copy_image.src_origin to the image copy_image.to at copy_image.dst_origin.
SCL_COPY_IMAGE_TO_BUFFER
Copy an OpenCL image, or a region thereof, to an OpenCL buffer.
struct
{
cl_mem image;
cl_mem buffer;
size_t image_origin[3];
size_t region[3];
off_t buffer_offset;
unsigned int flags;
} copy_image_to_buffer;
When no flags are set, the region copy_image_to_buffer.region is copied from the image
copy_image_to_buffer.image at copy_image_to_buffer.image_origin to the buffer
copy_image_to_buffer.buffer at copy_image_to_buffer.buffer_offset.
When copy_image_to_buffer.flags includes the flag
SCLF_TO_OFFSET_IS_A_REGISTER_NUMBER, then
copy_image_to_buffer.buffer_offset is taken to contain the number of an integer-register that
contains the target buffer’s offset. Further, if copy_image_to_buffer.flags also includes the flag
SCLF_TO_OFFSET_REGISTER_IS_PRIVATE, then that register is taken to be a private register.
SCL_COPY_BUFFER_TO_IMAGE
Copy an OpenCL buffer, or a part thereof, to an OpenCL image.
struct
{
cl_mem buffer;
cl_mem image;
off_t buffer_offset;
size_t image_origin[3];
size_t region[3];
unsigned int flags;
} copy_buffer_to_image;
When no flags are set, a section of the buffer copy_buffer_to_image.buffer, beginning at
copy_buffer_to_image.buffer_offset, is copied to the region
copy_buffer_to_image.region of the image copy_buffer_to_image.image at
copy_buffer_to_image.image_origin.
SCL_LOAD_REGISTER_FROM_BUFFER
Load an element of an OpenCL buffer into a register.
struct
{
int regno;
int flags;
cl_mem buffer;
size_t offset;
enum
{
LOAD_CHAR, LOAD_UCHAR, LOAD_SHORT, LOAD_USHORT, LOAD_INT,
LOAD_UINT, LOAD_LONG, LOAD_ULONG, LOAD_FLOAT, LOAD_DOUBLE
} register_type;
} reg_buffer;
An element from the buffer reg_buffer.buffer at offset reg_buffer.offset is loaded to the
register number reg_buffer.regno. The type of the element is determined by
reg_buffer.register_type and can be: char, unsigned char, short, unsigned short, int, unsigned int,
long, unsigned long, float or double.
If reg_buffer.flags includes the flag SCLF_REGISTER_IS_PRIVATE, then the register to be
loaded is a private register.
If reg_buffer.flags includes the flag SCLF_REGISTER_IS_INDIRECT, then
reg_buffer.regno is taken to be a register that contains the (integer) number of the register into which
to load the data.
If reg_buffer.flags also includes the flag SCLF_INDIRECT_REGISTER_IS_PRIVATE then the
register containing the register-number is private.
If reg_buffer.flags includes the flag SCLF_OFFSET_IS_A_REGISTER_NUMBER then
reg_buffer.offset is taken to be an (integer) register number of a register that contains the actual
(integer) offset.
If reg_buffer.flags also includes the flag SCLF_OFFSET_REGISTER_IS_PRIVATE, then that
register is private.
SCL_LOAD_BUFFER_FROM_REGISTER
Everything as in SCL_LOAD_REGISTER_FROM_BUFFER, except that the value of the register is stored in the
buffer element.
SCL_COPY_BUFFER_TO_HOST
Copy an OpenCL buffer, or a part thereof, to the host (application) memory.
struct
{
cl_mem buf;
size_t offset;
size_t count;
void *to;
size_t host_offset;
unsigned int flags;
} copy_buffer_to_host;
When no flags are set, copy_buffer_to_host.count bytes of the buffer
copy_buffer_to_host.buf, beginning at copy_buffer_to_host.offset, are copied to host
(application) memory address (copy_buffer_to_host.to +
copy_buffer_to_host.host_offset).
When copy_buffer_to_host.flags includes the flags
SCLF_FROM_OFFSET_IS_A_REGISTER_NUMBER,
SCLF_TO_OFFSET_IS_A_REGISTER_NUMBER and/or SCLF_COUNT_IS_A_REGISTER_NUMBER,
then the respective buffer-offset, host-offset and/or count are taken to be register numbers where the
corresponding registers contain the values of the buffer-offset, host-offset and/or the byte-count. Further,
when copy_buffer_to_host.flags also contains the flags
SCLF_FROM_OFFSET_REGISTER_IS_PRIVATE, SCLF_TO_OFFSET_REGISTER_IS_PRIVATE
and/or SCLF_COUNT_REGISTER_IS_PRIVATE, then the corresponding registers are taken to be private
registers.
This instruction may complete before the data actually arrives at the host-application. It is however
guaranteed that data from all SCL_COPY_BUFFER_TO_HOST and SCL_COPY_IMAGE_TO_HOST instructions
and signals from all SCL_SIGNAL_HOST instructions will arrive at the host-application in the same order
as completed (including by other threads).
SCL_COPY_IMAGE_TO_HOST
Copy an OpenCL image, or a region thereof, to the host (application) memory.
struct
{
cl_mem image;
size_t origin[3];
size_t region[3];
void *to;
} copy_image_to_host;
The region copy_image_to_host.region of the image copy_image_to_host.image,
beginning at copy_image_to_host.origin, is copied to the host (application) memory address
copy_image_to_host.to.
This instruction may complete before the data actually arrives at the host-application. It is however
guaranteed that data from all SCL_COPY_BUFFER_TO_HOST and SCL_COPY_IMAGE_TO_HOST instructions
and signals from all SCL_SIGNAL_HOST instructions will arrive at the host-application in the same order
as completed (including by other threads).
SCL_SIGNAL_HOST
Send a signal to the host application.
This instruction may complete before the signal actually arrives at the host-application. It is however
guaranteed that data from all SCL_COPY_BUFFER_TO_HOST and SCL_COPY_IMAGE_TO_HOST
instructions and signals from all SCL_SIGNAL_HOST instructions will arrive at the host-application in the same
order as completed (including by other threads).
SCL_FILENAME
Set a file-name for later use by the SCL_CHDIR, SCL_READFILE and SCL_WRITEFILE instructions.
struct
{
char cont;
char numeric;
char digits;
char reg_is_private;
int regno;
char fn[SCL_MAX_FILENAME_LEN];
} filename;
Each thread carries one file-name (initially the NULL string). When forking, this file-name is inherited by
the child-thread.
If filename.cont is 0, then a fresh file-name is started. If it is 1, then a string is appended to the
existing file-name.
If filename.numeric is 0, then the string is taken from filename.fn. If it is 1, then a numeric
string is built from the non-negative integer register, filename.regno (when
filename.reg_is_private is set, then filename.regno refers to a private register). Further, if
filename.digits is positive (1-127), then the numeric string will contain at least that number of digits,
0-padded to the left as necessary.
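The way a file-name is built up from several SCL_FILENAME instructions can be modelled on the host side. This is a sketch of the rules above, not SuperCL code: the function demo_filename_step and its parameters are hypothetical, with reg_value standing for the contents of the register named by filename.regno.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one SCL_FILENAME step applied to a thread's file-name.
 * Returns 0 on success, -1 on a negative register value, a negative digit
 * count, a missing string, or a result that does not fit in `cap` bytes. */
int demo_filename_step(char *name, size_t cap, int cont, int numeric,
                       int digits, long long reg_value, const char *fn)
{
    char piece[160];
    const char *src = fn;
    size_t used;

    assert(name != NULL && cap > 0);
    if (numeric) {
        if (reg_value < 0 || digits < 0)
            return -1;
        /* Zero-pad on the left to at least `digits` digits. */
        snprintf(piece, sizeof piece, "%0*lld", digits, reg_value);
        src = piece;
    } else if (src == NULL) {
        return -1;
    }
    if (!cont)
        name[0] = '\0';              /* start a fresh file-name */
    used = strlen(name);
    if (used + strlen(src) + 1 > cap)
        return -1;
    strcpy(name + used, src);        /* append to the existing file-name */
    return 0;
}
```

For example, a first step with cont 0 and the string "frame_", followed by a numeric step with cont 1, digits 4 and a register holding 7, yields "frame_0007".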
SCL_CHDIR
Set a current-directory for later use by the SCL_READFILE and SCL_WRITEFILE instructions.
This instruction has no parameters.
Each thread may carry a current-directory (initially none). When forking, this current-directory is inherited
by the child-thread.
The thread’s current directory is set by this instruction to the current thread’s file-name, which must there-
fore be previously set.
Unless the current thread’s file-name starts with a ’/’, the directory-name is interpreted relative to the former
current-directory, which must also be previously set.
Note that the current-directory refers to the directory itself, not to its name, so if that directory is later
moved, file input/output that is relative to the current-directory will occur in the moved directory.
SCL_READFILE
SCL_WRITEFILE
Read data from a file to an OpenCL buffer or write data from an OpenCL buffer to a file.
struct
{
cl_mem buffer;
off_t file_offset;
off_t buffer_offset;
size_t count;
unsigned int flags;
int mode;
} file_rw;
The name of the file to read/write should be set in advance using the SCL_FILENAME instruction. Unless
that file-name starts with a ’/’, it is interpreted relative to the thread’s current-directory, which must be pre-
viously set as well.
file_rw.buffer is the buffer to read into or write from. file_rw.file_offset is the file offset at which
to start reading or writing. file_rw.buffer_offset is the offset within the buffer at which to start.
file_rw.count is the number of bytes to read/write. file_rw.flags may contain the
following flags:
SCLF_FILE_OFFSET_IS_A_REGISTER_NUMBER
file_rw.file_offset contains a register number which contains the actual file offset.
SCLF_FILE_OFFSET_REGISTER_IS_PRIVATE
The register given with SCLF_FILE_OFFSET_IS_A_REGISTER_NUMBER is a private register.
SCLF_BUFFER_OFFSET_IS_A_REGISTER_NUMBER
file_rw.buffer_offset contains a register number which contains the actual buffer offset.
SCLF_BUFFER_OFFSET_REGISTER_IS_PRIVATE
The register given with SCLF_BUFFER_OFFSET_IS_A_REGISTER_NUMBER is a private register.
SCLF_COUNT_IS_A_REGISTER_NUMBER
file_rw.count contains a register number which contains the actual count.
SCLF_COUNT_REGISTER_IS_PRIVATE
The register given with SCLF_COUNT_IS_A_REGISTER_NUMBER is a private register.
SCLF_CREATE_FILE
(only with SCL_WRITEFILE) If the file to be written does not exist, create it with mode given in
file_rw.mode.
SCL_ARITHMETIC
Perform various register operations.
struct
{
union
{
long long i;
double d;
} val1, val2;
enum supercl_arithmetic op;
unsigned int flags;
long label;
} arithmetic;
The operation to perform depends on arithmetic.op.
The first set of operations have two operands (arithmetic.val1 and arithmetic.val2), where
the first operand, a register, is affected by the second operand. These are:
SCLF_OP1_INTEGER / SCLF_OP2_INTEGER
The corresponding operand is a constant 64-bit integer (with value in arithmetic.val1.i or
arithmetic.val2.i)
SCLF_OP1_REAL / SCLF_OP2_REAL
The corresponding operand is a constant 64-bit real number (with value in arithmetic.val1.d or
arithmetic.val2.d)
SCLF_OP1_SPECIAL_REGISTER / SCLF_OP2_SPECIAL_REGISTER
The corresponding operand is a special internal register. At this time only two such registers are avail-
able, both are 64-bit integers and both are read-only:
1. OPENCL_EVENT_COUNTER increments every time an OpenCL operation completes on the device
where the clSuper() is queued. This register cannot be relied upon to count the number of
OpenCL operations on the device because it may also increment on other events - but as its value can
only increase, it can be used to check whether any OpenCL function completed (including functions
initiated by the current or another SuperCL() instance), for example in order to pause until a value
of a memory-object has changed.
2. SUPERCL_NANOTIME provides the time in nano-seconds since an arbitrary point in the past (which
can be different for different devices).
SCLF_OP1_PRIVATE / SCLF_OP2_PRIVATE
The corresponding register operand is private.
SCLF_OP1_INDIRECT / SCLF_OP2_INDIRECT
The corresponding operand is a register containing the integer register-number on which to operate (this
makes it possible, for example, to create arrays of registers). The resulting register-number must be a
non-negative integer.
SCLF_OP1_INDIRECT_PRIVATE / SCLF_OP2_INDIRECT_PRIVATE
In combination with SCLF_OP1_INDIRECT / SCLF_OP2_INDIRECT, the corresponding register
that contains the number of the register-operand is private.
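Indirect addressing can be illustrated with a small model of register lookup. This is a simplified sketch: the names demo_register and demo_resolve_register are hypothetical, and the model uses a fixed-size register file, whereas SuperCL places no hard limit on the number of registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a register: a 64-bit integer or a 64-bit real. */
struct demo_register {
    bool is_real;
    union { long long i; double d; } v;
};

/* Resolve an operand's register number. With `indirect` set, regs[regno]
 * holds the number of the register to operate on. Returns -1 when that
 * number is real-valued, negative, or outside the demo register file. */
long long demo_resolve_register(const struct demo_register *regs, size_t nregs,
                                long long regno, bool indirect)
{
    assert(regs != NULL);
    if (regno < 0 || (size_t)regno >= nregs)
        return -1;
    if (!indirect)
        return regno;
    if (regs[regno].is_real)
        return -1;
    long long target = regs[regno].v.i;
    if (target < 0 || (size_t)target >= nregs)
        return -1;
    return target;
}
```

Incrementing the index register between operations then walks through consecutive registers, which is how an array of registers can be emulated.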
SCL_LABEL
A place to jump to.
long label;
No operation is involved: arithmetic jump operations can go here and new threads can start here.
SCL_FORK
Start a new thread.
struct fork
{
long label;
unsigned int flags;
} fork;
A new thread starts running from the label fork.label. By default, the label (fork.label) is
searched forward (wrapping back to the first instruction if the end-of-sequence is reached) - unless the flag
SCLF_JUMP_BACKWARD is set in fork.flags, causing a backward search.
If fork.flags includes SUPERCL_FORK_PRIVATE_REGISTERS, then the new thread acquires its
own set of private registers; otherwise the new thread shares its parent’s private registers (if neither the
parent thread nor any of its ancestors was created using SCL_FORK with the
SUPERCL_FORK_PRIVATE_REGISTERS flag set, then the "private" registers are the global registers).
SCL_JOIN
Join two threads.
The first thread that reaches any specific SCL_JOIN instruction waits there. The second
thread that arrives at that point exits and causes the first thread to continue.
If all remaining threads are waiting at an SCL_JOIN instruction, then clSuper() fails with the
CL_INVALID_PROGRAM_EXECUTABLE error.
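The pairing behaviour of SCL_JOIN can be modelled as a tiny state machine. This is a sketch under assumptions: the names are hypothetical, and the treatment of a third arriving thread (it waits, like the first) is a guess that the text above does not confirm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_join_action {
    DEMO_JOIN_WAIT,              /* first arrival: the thread waits here */
    DEMO_JOIN_EXIT_AND_RELEASE   /* second arrival: exits, waiter continues */
};

/* State of one SCL_JOIN instruction in the model. */
struct demo_join { bool occupied; };

/* Decide what happens to a thread that arrives at the join. */
enum demo_join_action demo_join_arrive(struct demo_join *j)
{
    assert(j != NULL);
    if (!j->occupied) {
        j->occupied = true;
        return DEMO_JOIN_WAIT;
    }
    j->occupied = false;
    return DEMO_JOIN_EXIT_AND_RELEASE;
}

/* Deadlock as described above: threads remain, and all of them are waiting. */
bool demo_deadlocked(size_t running, size_t waiting)
{
    return running == 0 && waiting > 0;
}
```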
SCL_END_OF_SEQUENCE
End of the sequence.
struct
{
long version;
int *faulty_instruction;
} end_of_sequence;
When all threads reach this instruction, the clSuper() instance is complete.
For the current release, end_of_sequence.version must be 0.
If end_of_sequence.faulty_instruction is not NULL and an error occurs, then the number of the
instruction at which the error occurred is stored in the integer pointed to by
end_of_sequence.faulty_instruction (at the time of calling clSuper). Instruction numbers
start at 0. A few general errors that are not related to a specific instruction may instead store the total
number of instructions. If no error occurs, the pointed-to integer is not modified.
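Since the integer is left untouched when no error occurs, a caller can store a sentinel in it before calling clSuper and classify the value afterwards. The sketch below shows one way to do that. The names and the sentinel value of -1 are hypothetical, and whether the "total number of instructions" counts the SCL_END_OF_SEQUENCE instruction is an assumption left to the caller.

```c
#include <assert.h>

#define DEMO_NO_FAULT (-1)   /* sentinel the caller stores before the call */

enum demo_fault_kind {
    DEMO_FAULT_NONE,            /* integer still holds the sentinel */
    DEMO_FAULT_GENERAL,         /* error not tied to a specific instruction */
    DEMO_FAULT_AT_INSTRUCTION   /* value is a 0-based instruction number */
};

/* Classify the value found in the integer that
 * end_of_sequence.faulty_instruction pointed to, for a sequence of
 * `total` instructions. */
enum demo_fault_kind demo_classify_fault(int faulty, int total)
{
    assert(total > 0);
    if (faulty == DEMO_NO_FAULT)
        return DEMO_FAULT_NONE;
    if (faulty == total)
        return DEMO_FAULT_GENERAL;
    return DEMO_FAULT_AT_INSTRUCTION;
}
```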
In order to prevent races and allow threads to perform complex arithmetic operations in an atomic fashion,
threads are guaranteed to continue to run uninterrupted by other threads so long as they:
1. Do not run OpenCL kernels.
2. Do not perform operations that involve OpenCL memory-objects.
3. Do not perform file I/O.
4. Do not jump backwards.
5. Do not fork backwards.
6. Do not arrive at SCL_JOIN.
ERROR CODES
Some errors are detected immediately when clSuper() is invoked, causing it to return an error. Among
these errors:
CL_INVALID_VALUE
The end_of_sequence.version is not zero.
CL_INVALID_DEVICE
A device for running a kernel is either not on the same node as the node associated with queue; not of
the same platform; or not in the same context.
CL_INVALID_KERNEL
A kernel mentioned in the sequence does not exist or does not belong to the same context.
CL_INVALID_PROGRAM
A kernel mentioned in the sequence is not built on any device of the node and platform associated
with queue.
CL_INVALID_MEM_OBJECT
A memory-object mentioned in the sequence does not exist or does not belong to the same context.
CL_INVALID_VALUE
sequence is NULL.
CL_INVALID_VALUE
Kernel-argument length > SCL_MAX_ARG_LEN, but the following instruction(s) do not match the
same SCL_KERNEL_ARG operation.
CL_INVALID_VALUE
Kernel-argument length is negative; zero for a regular argument; not sizeof(cl_mem) for a memory-
object argument; or not sizeof(cl_sampler) for a sampler argument.
CL_INVALID_VALUE
Memory-object argument is not a memory-object
CL_INVALID_VALUE
Kernel-argument from register is not 1, 2, 4 or 8 bytes long.
CL_INVALID_VALUE
Inappropriate flags for an instruction.
CL_INVALID_VALUE
Negative or extremely large offset or region.
CL_INVALID_VALUE
Inappropriate signal number.
CL_INVALID_VALUE
Inappropriate register_type.
CL_INVALID_PROGRAM_EXECUTABLE
Jump/Fork to a non-existent label.
CL_INVALID_CONTEXT
Memory-object or sampler argument is not of the same context as queue.
CL_INVALID_ARG_INDEX
Invalid argument number.
CL_INVALID_SAMPLER
Sampler argument is not a sampler.
CL_INVALID_QUEUE
queue is not a valid command queue.
CL_INVALID_WORK_DIMENSION
Kernel’s work-dimension is not 1-3.
CL_INVALID_WORK_GROUP_SIZE
Negative or extremely large work-group size.
CL_INVALID_MEM_OBJECT
A memory-object mentioned in a copy operation is of inappropriate dimensions (a buffer when
expecting an image or an image when expecting a buffer).
CL_INVALID_VALUE
Attempt to construct a file-name from a NULL-string; or using a negative number of digits.
Other errors are detected only once the sequence starts. In addition to the usual errors that are reported by
OpenCL when functions fail, the following are also possible:
CL_INVALID_PROGRAM_EXECUTABLE
Lost connection with the remote node.
CL_INVALID_OPERATION
Arithmetic error: division by zero; square-root of a negative number; logarithm of a non-positive
number; modulus of a non-positive or non-integer value; raising a negative value to a non-integer power.
CL_INVALID_OPERATION
Negative or non-integer register number during an arithmetic operation.
CL_INVALID_VALUE
Attempt to load a 1 or 2 byte buffer element from a register that contains a real value.
CL_INVALID_VALUE
Register number derived from "offset" or "count" is negative or does not fit in a 31-bit integer.
CL_INVALID_VALUE
Indirect register in a copy operation contains a non-integer or a negative value.
CL_INVALID_VALUE
Bad; non-integer; or negative register used in constructing a file-name.
CL_INVALID_VALUE
File-name or former current-directory not defined when attempting to set current-directory.
CL_OUT_OF_RESOURCES
Failure to open the current-directory: possibly due to its non-existence; file-name or former
current-directory undefined; lack of access-permissions by user; or lack of administrative permission to perform
I/O altogether on the node where SuperCL is running.
CL_INVALID_VALUE
Invalid; non-integer; or negative-valued register used as file-offset, buffer-offset or count in file-I/O oper-
ations.
CL_OUT_OF_RESOURCES
Failure to read/write file, possibly due to the file’s non-existence; file-name or current-directory unde-
fined; reading a file that is too short; lack of access-permissions by user; or lack of administrative per-
mission to perform I/O altogether on the node where SuperCL is running.
CL_INVALID_VALUE
Count is negative when copying data to the host.
CL_INVALID_PROGRAM_EXECUTABLE
Deadlock - all threads stuck in SCL_JOIN.
The addition of the CL_MEM_FILE_HOST_PTR flag in cl_mem_flags allows for the use of temporary
memory-objects that are only used within a clSuper() instance, whose data never needs to be transferred
across the network from the host-application to the remote node.
EXAMPLES OF USE
1. Run a number of kernels in a row.
2. Run a number of kernels N times in a loop.
3. Run a number of kernels N times in a loop; in between, asynchronously report intermediate results to
the application (possibly sending it a signal to let it know that the data is ready).
4. Run an iterative kernel. The user of the interactive application may from time to time request to read a
particular section of the data or to pause or terminate the SuperCL instance (this can be done by writing
to a control buffer). If kernels run for a long time, then responding to user requests can be done in a dif-
ferent thread.
5. Run an iterative kernel on an image that is too large to fit on one node and must therefore be divided
among several SuperCL instances on several nodes. While the next iteration is running, the edges from
the previous iteration are sent asynchronously (by a different thread), through the application, to other
SuperCL instances on other nodes. Once edges arrive from other nodes, a different kernel can be used
to integrate the edges with the main buffer/image.
6. Run a GPU kernel, then check the accuracy of the result using a CPU kernel: iterate until sufficient
accuracy is achieved.
7. As above, but send intermediate results to the application (asynchronously). The application may
then report (asynchronously) about the progress of other SuperCL instances on other nodes, which
can affect whether to continue iterating and/or modify some parameters.
CURRENT STATUS
SuperCL is still evolving, so no binary backward-compatibility of future releases with the current release
should be assumed.
SEE ALSO
vcl(7).