Releases: intel/llvm
Intel SYCL Compiler release 6.0.1
Dependencies included in the release
Components included in the release
- clang version 19.0.0
- SYCL runtime version 8.0.0 (as indicated by the predefined macros __LIBSYCL_MAJOR_VERSION, __LIBSYCL_MINOR_VERSION and __LIBSYCL_PATCH_VERSION)
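The runtime version macros can also be read from code. The following is a minimal sketch, not an official API: the helper name libsycl_version_string is hypothetical, and it assumes the macros become visible once sycl/sycl.hpp has been included. The header is included only when present, so the sketch also builds with a plain C++17 compiler, in which case it reports "unknown".

```cpp
#include <string>

// Pull in the SYCL headers only when they are available, so this sketch
// also compiles with a plain C++17 compiler.
#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#endif

// Hypothetical helper (not part of the SYCL runtime): formats the runtime
// version from the __LIBSYCL_* macros, or returns "unknown" if they are
// not defined in this translation unit.
std::string libsycl_version_string() {
#if defined(__LIBSYCL_MAJOR_VERSION) && defined(__LIBSYCL_MINOR_VERSION) && \
    defined(__LIBSYCL_PATCH_VERSION)
  return std::to_string(__LIBSYCL_MAJOR_VERSION) + "." +
         std::to_string(__LIBSYCL_MINOR_VERSION) + "." +
         std::to_string(__LIBSYCL_PATCH_VERSION);
#else
  return "unknown";
#endif
}
```

Calling libsycl_version_string() from main and printing the result is enough to confirm which runtime headers a build actually picked up.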
Compatibility with previous releases
This is a patch/bugfix release for v6.0.0 and it is compatible with it. For compatibility with other releases, please see the corresponding documentation for v6.0.0.
Compatibility with oneAPI
The Intel(R) oneAPI DPC++/C++ Compiler version 2025.0 release leverages the codebase from the sycl-rel-6_0_0 branch and is the closest oneAPI DPC++/C++ compiler release to this one (in terms of available features and bugfixes).
However, this does not guarantee any feature or bugfix parity between these two releases.
We would like to specifically note that bugfixes and patches applied to 2025.0.X oneAPI patch releases do not match those made for this release.
Validation & quality expectations
In general, the list of supported hardware and operating systems should match the one provided by the Intel(R) oneAPI DPC++/C++ Compiler for version 2025.0; see the corresponding system requirements.
However, we did not perform the same exhaustive testing of this open-source branch, so there could be some unique issues that are not present in Intel(R) oneAPI DPC++/C++ Compiler version 2025.0.
You can find full validation logs for the branch here, but a summary is also posted below.
End to end tests
The following hardware and software configurations were tested:
Driver versions are listed as reported by sycl::device::get_info<info::device::driver_version>().
- Windows
  - Intel(R) oneAPI Unified Runtime over Level-Zero on Intel(R) Iris(R) Xe Graphics 12.0.0
    - Driver version: 1.6.31896
  - Intel(R) OpenCL Graphics on Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO
    - Driver version: 32.0.101.6559
- Linux (Ubuntu 22.04)
  - Intel(R) OpenCL Graphics on Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO
    - Driver version: 25.09.32961.5
  - Intel(R) oneAPI Unified Runtime over Level-Zero on Intel(R) Iris(R) Xe Graphics 12.0.0
    - Driver version: 1.6.32961.500000
  - Intel(R) OpenCL on 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80 Ghz OpenCL 3.0
    - Driver version: 2024.18.10.0.08_160000
  - AMD HIP BACKEND on AMD Radeon PRO W6800
    - Driver version: HIP 60342.13
  - NVIDIA CUDA BACKEND on NVIDIA A10G
    - Driver version: CUDA 12.1
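To compare a local setup against the configurations above, the same driver version query can be run on every visible device. This is a minimal sketch under stated assumptions: list_driver_versions and EXAMPLE_HAS_SYCL are hypothetical names local to the example, and the function returns an empty list when the SYCL headers are not available at build time.

```cpp
#include <string>
#include <vector>

#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#define EXAMPLE_HAS_SYCL 1  // hypothetical macro, local to this sketch
#endif

// Hypothetical helper: returns one "device name: driver version" string per
// device visible to the SYCL runtime, or an empty list when built without
// the SYCL headers.
std::vector<std::string> list_driver_versions() {
  std::vector<std::string> lines;
#ifdef EXAMPLE_HAS_SYCL
  for (const sycl::device &dev : sycl::device::get_devices()) {
    lines.push_back(dev.get_info<sycl::info::device::name>() + ": " +
                    dev.get_info<sycl::info::device::driver_version>());
  }
#endif
  return lines;
}
```

Printing each returned string gives output that can be compared line by line with the driver versions listed for the tested configurations.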
SYCL CTS
- Linux (Ubuntu 22.04)
  - Intel(R) OpenCL on 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80 Ghz OpenCL 3.0
    - Driver version: 2024.18.10.0.08_160000
  - Intel(R) oneAPI Unified Runtime over Level-Zero on Intel(R) Iris(R) Xe Graphics 12.0.0
    - Driver version: 1.6.32961.500000
We use the latest available CTS in our validation, but sycl-rel-6.0.0 is already an old branch, and there have been CTS changes that cause some tests to fail.
Known failures are:
- The test_device CTS test fails because some OpenCL-specific device info queries do not throw exceptions on non-OpenCL backends. See KhronosGroup/SYCL-Docs#625
- The test_header CTS target cannot be compiled because the definition of SYCL_LANGUAGE_VERSION does not match the latest version of the SYCL 2020 specification. See KhronosGroup/SYCL-Docs#634
- The test_language CTS target cannot be compiled because the compiler does not abide by all the new rules for constant-evaluated expressions from the latest version of the SYCL 2020 specification. See KhronosGroup/SYCL-Docs#388
- The multi_ptr CTS target can be compiled, but the test reports a failure because some tests are disabled at compile time: the implementation of the decorated_generic_ptr and raw_generic_ptr aliases is missing. See KhronosGroup/SYCL-Docs#598
How to use
This release does not provide pre-built binaries of our SYCL compiler; it simply marks a known good commit on the corresponding release branch, which can be used to build the compiler and the runtime for your needs. To do so, follow the Get Started Guide.
Detailed changelog
For a more detailed changelog refer to Release notes Jul'24 in our release notes document.
Changes since v6.0.0
This patch release contains a fix for the "error: SYCL kernel cannot call a variadic function" issue mentioned in the Known issues section of the 6.0.0 release.
The issue should not be reproducible in most environments. Its reproducibility depends on implementation details of the specific C++ STL implementation installed on your system. However, we haven't yet seen a setup where it would still be reproducible.
The fix has the potential to break ABI, so to preserve compatibility it won't kick in on systems where it would cause an ABI break (see the note above about reproducibility). If you still encounter the issue and do not care about binary compatibility with previous releases, you can define the macro __SYCL_USE_PLAIN_ARRAY_AS_VEC_STORAGE=1 (before including sycl/sycl.hpp) to force-enable the fix.
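The ordering requirement for the macro can be sketched as follows. This is an illustration only: the helper plain_array_vec_storage_forced is hypothetical and exists just to show that the definition is visible in the translation unit, and the header is included only when it is available so the snippet also builds without a SYCL toolchain.

```cpp
// The define must appear before the first include of sycl/sycl.hpp in the
// translation unit so that the header sees it.
#define __SYCL_USE_PLAIN_ARRAY_AS_VEC_STORAGE 1

#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#endif

// Hypothetical helper, only for checking that the macro is defined to 1 here.
constexpr bool plain_array_vec_storage_forced() {
  return __SYCL_USE_PLAIN_ARRAY_AS_VEC_STORAGE == 1;
}
```

Passing the definition on the compiler command line (for example with a -D option) should have the same effect, since it also takes place before any header is processed; every translation unit that includes the SYCL headers would then need the same setting.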
oneAPI DPC++ Compiler dependencies
This release contains OpenCL RT for Intel CPU and FPGA emulator used for oneAPI DPC++ Compiler and runtime validation.
Please see the runtime installation guide here.
Intel SYCL Compiler release 6.0.0
Dependencies included in the release
Components included in the release
- clang version 19.0.0
- SYCL runtime version 8.0.0 (as indicated by the predefined macros __LIBSYCL_MAJOR_VERSION, __LIBSYCL_MINOR_VERSION and __LIBSYCL_PATCH_VERSION)
Compatibility with previous releases
This is the first formal release in this repo, so there are no other releases to be compatible with.
However, the repo has existed for a while, and this release is not compatible with builds produced from older commits, because ABI-breaking changes were made to the codebase just prior to taking the sycl-rel-6_0_0 branch.
Compatibility with oneAPI
The Intel(R) oneAPI DPC++/C++ Compiler version 2025.0 leverages the codebase from the sycl-rel-6_0_0 branch and is the closest oneAPI DPC++/C++ compiler release to this one (in terms of available features and bugfixes).
However, this does not guarantee any feature or bugfix parity between these two releases.
Validation & quality expectations
In general, the list of supported hardware and operating systems should match the one provided by the Intel(R) oneAPI DPC++/C++ Compiler for version 2025.0; see the corresponding system requirements.
However, we did not perform the same exhaustive testing of this open-source branch, so there could be some unique issues that are not present in Intel(R) oneAPI DPC++/C++ Compiler version 2025.0.
You can find full validation logs for the branch here, but a summary is also posted below.
End to end tests
The following hardware and software configurations were tested:
Driver versions are listed as reported by sycl::device::get_info<info::device::driver_version>().
- Windows
  - Intel(R) OpenCL Graphics on Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO
    - Driver version: 32.0.101.6129
  - Intel(R) oneAPI Unified Runtime over Level-Zero on Intel(R) Iris(R) Xe Graphics 12.0.0
    - Driver version: 1.5.31093
- Linux (Ubuntu 22.04)
  - Intel(R) OpenCL Graphics on Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO
    - Driver version: 24.52.32224.5
  - Intel(R) oneAPI Unified Runtime over Level-Zero on Intel(R) Iris(R) Xe Graphics 12.0.0
    - Driver version: 1.6.32224.500000
  - Intel(R) OpenCL on 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80 Ghz OpenCL 3.0
    - Driver version: 2024.18.10.0.08_160000
  - AMD HIP BACKEND on AMD Radeon RX 6700 XT gfx1031
    - Driver version: HIP 60342.13
  - NVIDIA CUDA BACKEND on NVIDIA A10G 8.6
    - Driver version: CUDA 12.1
SYCL CTS
- Linux (Ubuntu 22.04)
  - Intel(R) OpenCL on 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80 Ghz OpenCL 3.0
    - Driver version: 2024.18.10.0.08_160000
  - Intel(R) oneAPI Unified Runtime over Level-Zero on Intel(R) Iris(R) Xe Graphics 12.0.0
    - Driver version: 1.6.32224.500000
We use the latest available CTS in our validation, but sycl-rel-6.0.0 is already an old branch, and there have been CTS changes that cause some tests to fail.
Known failures are:
- The test_device CTS test fails because some OpenCL-specific device info queries do not throw exceptions on non-OpenCL backends. See KhronosGroup/SYCL-Docs#625
- The test_header CTS target cannot be compiled because the definition of SYCL_LANGUAGE_VERSION does not match the latest version of the SYCL 2020 specification. See KhronosGroup/SYCL-Docs#634
- The test_language CTS target cannot be compiled because the compiler does not abide by all the new rules for constant-evaluated expressions from the latest version of the SYCL 2020 specification. See KhronosGroup/SYCL-Docs#388
- The multi_ptr CTS target cannot be compiled because the implementation of the decorated_generic_ptr and raw_generic_ptr aliases is missing. See KhronosGroup/SYCL-Docs#598
How to use
This release does not provide pre-built binaries of our SYCL compiler; it simply marks a known good commit on the corresponding release branch, which can be used to build the compiler and the runtime for your needs. To do so, follow the Get Started Guide.
Detailed changelog
For a more detailed changelog refer to Release notes Jul'24 in our release notes document.
Known issues
This section describes additional issues which were reported that are not listed in the corresponding section of the detailed changelog.
error: SYCL kernel cannot call a variadic function
A compilation error like this is reported for applications that use sycl::vec::operator[] (either directly, or indirectly through built-ins like sycl::group_broadcast) when compiled on Windows in Debug mode using the clang.exe compiler driver.
There are a couple of workarounds available:
- switch to using the clang-cl.exe driver
- pass an extra flag during compilation: -Xsycl-target-frontend "-D_CONTAINER_DEBUG_LEVEL=0 -D_ITERATOR_DEBUG_LEVEL=0"
Note that if your environment and application do not match what is described above (sycl::vec::operator[] usage on Windows in Debug mode through the clang.exe compiler driver), then the error is expected to be legitimate, i.e. the device code contains some illegal construct.
oneAPI DPC++ Compiler dependencies
This release contains OpenCL RT for Intel CPU and FPGA emulator used for oneAPI DPC++ Compiler and runtime validation.
Please see the runtime installation guide here.
oneAPI DPC++ Compiler 2022-12
New features
SYCL Compiler
- Added support for per-object device code compilation under the option -fno-sycl-rdc. This improves compiler performance and reduces memory usage, but can only be used if there are no cross-object dependencies. [f884993]
- Added support for per-aspect device code split mode. [9a2c4fe]
- Extended support for the large GRF mode to non-ESIMD kernels. [9994934] [ab2a42c]
- Implemented the sycl_ext_intel_device_architecture extension. [0e32a28] [b59d93c] [5bd5c87] [e5de913]
- Implemented the sycl_ext_oneapi_kernel_properties experimental extension. [332e4ee] [27454de] [70ee3d5] [430c722]
- Added support for generic address space atomic built-ins to CUDA libclc. [d6a8fd1]
SYCL Library
- Implemented accessor member functions swap, byte_size, max_size and empty. [f1f907a]
- Implemented SYCL 2020 default accessor constructor. [04928f9]
- Implemented SYCL 2020 accessor iterators. [5b9fd3c] [c7b1a00]
- Changed value_type of read-only accessors to const in accordance with SYCL 2020. [227614c]
- Implemented SYCL 2020 multi_ptr and address_space_cast. [8700b76] [483984a] [4a9e9a0]
- Implemented SYCL 2020 has_extension free functions. [7f1a6ef]
- Implemented SYCL 2020 aspect_selector. [c0a4a56]
- Implemented new SYCL 2020 style FPGA selectors. [0417651]
- Implemented SYCL 2020 default async_handler behavior. [cd93d8f]
- Implemented SYCL 2020 is_compatible free function. [67f6bba]
- Implemented queue shortcut functions with placeholder accessors. [5ee066e]
- Added support for creating a kernel bundle with descendent devices of the passed context's members. [a782779]
- Implemented non-blocking destruction and deferred release of memory objects without attached host memory. [894ce25]
- Implemented the sycl_ext_oneapi_queue_priority extension. [cdb09dc]
- Implemented the sycl_ext_oneapi_user_defined_reductions extension. [8311d79]
- Implemented the sycl_ext_oneapi_queue_empty extension proposal. [c493295]
- Implemented the sycl_ext_oneapi_weak_object extension. [d948427] [9297f63]
- Implemented the sycl_ext_intel_cslice extension. The old behavior that exposed compute slices as sub-sub-devices is now deprecated. For compatibility purposes, it can be brought back via the SYCL_PI_LEVEL_ZERO_EXPOSE_CSLICE_IN_AFFINITY_PARTITIONING environment variable. [5995c618]
- Implemented the sycl_ext_intel_queue_index extension. [d2ec964] [7179e83]
- Implemented the sycl_ext_oneapi_memcpy2d extension. [516d411]
- Implemented device ID, memory clock rate and bus width information queries from the sycl_ext_intel_device_info extension. [1d99344] [4f7787c]
- Implemented ext::oneapi::experimental::radix_sorter from the sycl_ext_oneapi_group_sort extension proposal. [86ba180]
- Implemented a new unified interface for the sycl_ext_oneapi_matrix extension for CUDA. [166bbc3]
- Added support for sorting over sub-groups. [168767c]
- Added C++ API wrappers for the Intel math functions ceil, floor, rint, sqrt, rsqrt and trunc. [1b7582b]
- Implemented a SYCL device library for bfloat16 Intel math function utilities. [fc136d6]
- Added support for range reductions with any number of reduction variables. [572bc50]
- Added support for reductions with kernels accepting item. [5d5e9f4]
- Enabled sub-group masks for 64-bit subgroups. [10d50ed]
- Implemented the new non-experimental API for DPAS. [55bf1a0] [1e7a8ea]
- Added 8/16-bit type support to the lsc_block_load and lsc_block_store ESIMD API. [f9d8059]
- Implemented atomic operation support in the ESIMD emulator. [a6a0dea]
- Added various trivial utility functions for the half type. [b4ce7c0]
- Added type cast functions between half and float/integer types to libdevice. [599b1b9]
- Implemented the ONEAPI_DEVICE_SELECTOR environment variable that, in addition to supporting SYCL_DEVICE_FILTER syntax, allows exposing GPU sub-devices as SYCL root devices and supports negative filters. SYCL_DEVICE_FILTER is now deprecated. [28d0cd3] [b21e74e] [77b6f34] [6bd5f9c] [6aefd63]
- Added the SYCL_PI_LEVEL_ZERO_SINGLE_ROOT_DEVICE_BUFFER_MIGRATION environment variable. [bd03e0d]
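As one illustration of the accessor additions in this list, the sketch below queries empty and byte_size on an accessor created outside of a command group. It is written under assumptions rather than taken from the release: accessor_byte_size_demo and EXAMPLE_HAS_SYCL are hypothetical names, the accessor is constructed through class template argument deduction as a placeholder accessor, and the function returns 0 when the SYCL headers are not available.

```cpp
#include <cstddef>

#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#define EXAMPLE_HAS_SYCL 1  // hypothetical macro, local to this sketch
#endif

// Hypothetical helper: reports the size in bytes seen through an accessor
// over a buffer of 16 ints, or 0 when built without the SYCL headers.
std::size_t accessor_byte_size_demo() {
#ifdef EXAMPLE_HAS_SYCL
  sycl::buffer<int, 1> buf{sycl::range<1>{16}};
  // Placeholder accessor deduced from the buffer; no command group is needed
  // just to query its size.
  sycl::accessor acc{buf};
  return acc.empty() ? 0 : acc.byte_size();
#else
  return 0;
#endif
}
```

With a SYCL toolchain the expected result is 16 * sizeof(int), since byte_size reports the number of elements multiplied by the element size.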
Documentation
- Added the sycl_ext_oneapi_device_architecture extension specification. [7f2b17e]
- Added the sycl_ext_oneapi_memcpy2d extension specification. [296e9c3]
- Added the sycl_ext_oneapi_user_defined_reductions extension specification. [cd4fd8c]
- Added the sycl_ext_oneapi_weak_object extension specification. [d948427]
- Added the sycl_ext_oneapi_prod extension proposal. [ed7cb4b]
- Added the sycl_ext_codeplay_kernel_fusion extension proposal. [be3dfbd]
- Added the sycl_ext_intel_queue_index extension proposal. [f5fb759]
- Added the sycl_ext_intel_cslice extension proposal. [5777e1f]
- Added the sycl_ext_oneapi_group_sort extension update proposal that introduced sorting functions with fixed-size arrays. [c6d1caf]
- Added device ID, memory clock rate and bus width device information queries to the sycl_ext_intel_device_info extension. [1d99344] [4f7787c]
Improvements
SYCL Compiler
- Added the InferAddressSpaces pass to the SPIR/SPIR-V compilation pipeline, reducing the size of the generated device code. [a3ae0dd]
- Redesigned pointer handling so that it no longer decomposes kernel argument types containing pointers. [3916d3b] [d55e9c2] [9b02506]
- Kernel lambda operator is now always inlined in the device code entry point unless -O0 is used. [b91b732] [2359d94]
- Improved entry point handling in the sycl-post-link tool. [53d9c7b]
- The reqd_work_group_size attribute now works with 1, 2 or 3 operands. [4ff42c3]
- Enabled using the -fcf-protection option with -fsycl, which results in it being applied only to host code compilation and producing a warning. [b6f61f6]
- The Linux-based compiler driver on Windows now pulls in the sycld debug library when msvcrtd is specified as a dependent library. [ebf6c59]
- Added /Zc:__cplusplus as a default option during host compilation with MSVC. [e7ed860]
- Improved the ESIMDOptimizeVecArgCallConv optimization pass to cover more IR patterns. [4926454]
- Added support for more types in ESIMD lsc functions. [d9e40ec]
- Added error diagnostics for using sycl::ext::oneapi::experimental::annotated_arg/ptr as a nested type. [321c733]
- The status of bfloat16 support was changed from experimental to supported. [7b47ebb]
SYCL Library
- Updated online_compiler with Gen12 GPU support. [adfb1c1]
- get_kernel_bundle and has_kernel_bundle now check that the kernels are compatible with the devices. [91b1515]
- Waiting for an event associated with a kernel that uses a stream now also waits for the stream to be flushed. [1db0e81]
- Added the requested device type to the message of the exception thrown when no such devices are found. [6b83ad7]
- Optimized operator[] of host_accessor. [01e60f7]
- Improved reduction performance on discrete GPUs. [99bdc82]
- Added invoke_simd support for functions with void return type. [3fd0850]
- The Level Zero plugin now creates every event as host-visible by default. [f3d245d]
- Added Level Zero plugin support for global work sizes greater than UINT32_MAX as long as they are divisible by some legal work-group size and the resulting quotient does not exce...