Skip to content

Commit f5fb759

Browse files
authored
[SYCL] Extension spec for queue index (#7520)
Add a proposed extension specification that allows a queue to be tied to a "queue index".
1 parent 1b68ffd commit f5fb759

File tree

1 file changed

+214
-0
lines changed

1 file changed

+214
-0
lines changed
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,214 @@
1+
= sycl_ext_intel_queue_index
2+
3+
:source-highlighter: coderay
4+
:coderay-linenums-mode: table
5+
6+
// This section needs to be after the document title.
7+
:doctype: book
8+
:toc2:
9+
:toc: left
10+
:encoding: utf-8
11+
:lang: en
12+
:dpcpp: pass:[DPC++]
13+
14+
// Set the default source code type in this document to C++,
15+
// for syntax highlighting purposes. This is needed because
16+
// docbook uses c++ and html5 uses cpp.
17+
:language: {basebackend@docbook:c++:cpp}
18+
19+
20+
== Notice
21+
22+
[%hardbreaks]
23+
Copyright (C) 2022-2022 Intel Corporation. All rights reserved.
24+
25+
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
26+
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
27+
permission by Khronos.
28+
29+
30+
== Contact
31+
32+
To report problems with this extension, please open a new issue at:
33+
34+
https://github.com/intel/llvm/issues
35+
36+
37+
== Dependencies
38+
39+
This extension is written against the SYCL 2020 revision 6 specification. All
40+
references below to the "core SYCL specification" or to section numbers in the
41+
SYCL specification refer to that revision.
42+
43+
44+
== Status
45+
46+
This is a proposed extension specification, intended to gather community
47+
feedback. Interfaces defined in this specification may not be implemented yet
48+
or may be in a preliminary state. The specification itself may also change in
49+
incompatible ways before it is finalized. *Shipping software products should
50+
not rely on APIs defined in this specification.*
51+
52+
53+
== Overview
54+
55+
Backends such as Level Zero and OpenCL expose an "index" to a device's work
56+
submission queue, which allows the application to fine tune the way work is
57+
submitted to a device. This extension exposes that same concept to SYCL
58+
applications.
59+
60+
Most SYCL applications should not need to use this extension because the SYCL
61+
implementation automatically selects an efficient way to submit work to a
62+
device, including automatic selection of a queue index when necessary.
63+
Therefore, this extension is aimed at advanced users who understand the device
64+
hardware and think they can outperform the default implementation by specifying
65+
an explicit queue index.
66+
67+
Note that this extension can be supported on any backend, even if the backend
68+
has no notion of a "queue index". Backends that have no native support for a
69+
queue index can report that a device has only a single available queue index.
70+
Applications can then only request one possible queue index, and the backend
71+
can treat this as the default behavior (i.e. the backend can ignore the index).
72+
73+
74+
== Specification
75+
76+
=== Feature test macro
77+
78+
This extension provides a feature-test macro as described in the core SYCL
79+
specification. An implementation supporting this extension must predefine the
80+
macro `SYCL_EXT_INTEL_QUEUE_INDEX` to one of the values defined in the table
81+
below. Applications can test for the existence of this macro to determine if
82+
the implementation supports this feature, or applications can test the macro's
83+
value to determine which of the extension's features the implementation
84+
supports.
85+
86+
[%header,cols="1,5"]
87+
|===
88+
|Value
89+
|Description
90+
91+
|1
92+
|Initial version of this extension.
93+
|===
94+
95+
=== New device information descriptor
96+
97+
This extension adds the following new device information descriptor which
98+
allows the application to query the number of available queue indices for the
99+
device.
100+
101+
```
102+
namespace sycl::ext::intel::info::device {
103+
104+
struct max_compute_queue_indices;
105+
106+
} // namespace sycl::ext::intel::info::device
107+
```
108+
109+
The return type for this information descriptor is `int`, and the value is a
110+
positive integer telling the number queue indices that are available for the
111+
device. These indices are numbered sequentially starting at `0`.
112+
113+
=== New queue property
114+
115+
This extension adds the following new queue property which can be specified to
116+
the queue constructor via the `property_list` parameter.
117+
118+
```
119+
namespace sycl::ext::intel::property::queue {
120+
121+
class compute_index {
122+
public:
123+
compute_index(int idx);
124+
int get_index();
125+
};
126+
127+
} // namespace sycl::ext::intel::property::queue
128+
```
129+
130+
The `compute_index` property is a hint to the implementation which can affect
131+
work submission concurrency. When two queues for the same device have
132+
different queue indices, there is a greater chance that commands submitted to
133+
the two queues will be concurrently submitted to the device.
134+
135+
It is an error to specify a queue index that is out of range for the queue's
136+
device. The `queue` constructor throws an `exception` with `errc::invalid` if
137+
the index is less than `0` or if the index is greater than or equal to the
138+
value returned by `max_compute_queue_indices` for the queue's device.
139+
140+
The constructor and member functions of the `compute_index` property have the
141+
following semantics.
142+
143+
[%header,cols="1,3"]
144+
|===
145+
|Function
146+
|Description
147+
148+
|`compute_index(int idx)`
149+
|Constructs a property with the given queue index.
150+
151+
|`int get_index()`
152+
|Returns the queue index associated with the property.
153+
|===
154+
155+
156+
== Example usage
157+
158+
The following code snippet shows how to create a SYCL queue using a specific
159+
queue index.
160+
161+
```
162+
#include <sycl/sycl.hpp>
163+
164+
using sycl;
165+
using sycl::ext::intel;
166+
167+
void foo(device d) {
168+
int max_index = d.get_info<info::device::max_compute_queue_indices>();
169+
int index = /* choose value between 0 and max_index-1 */;
170+
queue q{d, property::queue::compute_index{index}};
171+
}
172+
```
173+
174+
175+
== Behavior on Intel GPU devices
176+
177+
:multi-CCS: https://github.com/intel/compute-runtime/blob/master/level_zero/doc/experimental_extensions/MULTI_CCS_MODES.md
178+
:sycl_ext_intel_cslice: https://github.com/intel/llvm/pull/7513
179+
180+
This non-normative section describes the behavior of the `compute_index`
181+
property for some specific Intel GPU devices when using {dpcpp}. These details
182+
are not part of the extension specification, and this behavior may not apply to
183+
other devices.
184+
185+
On many Intel devices, there is just one available queue index, and there is
186+
therefore no advantage to using the `compute_index` property. However, this
187+
property can sometimes be useful when running on Data Center GPU Flex series
188+
devices (aka ATS-M) or Data Center GPU Max series devices (aka PVC).
189+
190+
Some models of ATS-M support multiple queue indices with the semantics
191+
described in the sections above. When a single process submits kernels to
192+
different queue indices, there is a greater likelihood that the kernels will
193+
be submitted concurrently.
194+
195+
PVC also supports multiple queue indices on each tile, but these queue indices
196+
have a different semantic. In order to expose multiple queue indices on PVC,
197+
the device driver must be configured in {multi-CCS}[multi-CCS] mode. In this
198+
mode, the PVC root device still has just one queue index, however each "tile"
199+
has multiple queue indices. Therefore, the application must first create
200+
sub-devices to access each tile, and then the application can construct a queue
201+
on these sub-devices using the `compute_index` property.
202+
203+
The semantics of these PVC queue indices is different, though. On PVC, each
204+
queue index corresponds to a fixed subset of the execution units. Queues using
205+
different indices still have a greater likelihood of submitting kernels
206+
concurrently, but each kernel also runs on its own partition of the execution
207+
units. Therefore, the `compute_index` property is just an alternate way to
208+
run on a partition of the device, exactly the same as creating a "cslice"
209+
sub-device via the {sycl_ext_intel_cslice}[sycl_ext_intel_cslice] extension.
210+
211+
In both the ATS-M case and the PVC case, constructing a SYCL queue with
212+
`compute_index` causes the runtime to submit kernels exclusively to that index
213+
on the underlying Level Zero or OpenCL driver. Without this property, the
214+
runtime is free to distribute kernels across the available queue indices.

0 commit comments

Comments
 (0)