cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication#
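NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:
$$D = Activation(\alpha \, op(A) \cdot op(B) + \beta \, op(C) + bias) \cdot scale$$
where $op(A)$/$op(B)$ refers to in-place operations such as transpose/non-transpose, and $\alpha$, $\beta$, $scale$ are scalars or vectors.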
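To make the semantics of the operation concrete, the following is a minimal CPU reference sketch, not the cuSPARSELt API. It assumes the operation has the form `D = Activation(alpha * op(A) * op(B) + beta * C + bias) * scale`, picks ReLU as the activation, uses scalar `alpha`, `beta`, and `scale`, applies one bias value per output row, and leaves `C` untransposed. The `Matrix` type and the `opAt` and `referenceMatmul` names are hypothetical helpers introduced only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major dense matrix; a hypothetical helper type for this sketch only.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Element (i, j) of op(M): M itself, or its transpose when `transpose` is set.
static float opAt(const Matrix& m, bool transpose, std::size_t i, std::size_t j) {
    return transpose ? m.at(j, i) : m.at(i, j);
}

// CPU reference for D = ReLU(alpha * op(A) * op(B) + beta * C + bias) * scale.
// Assumptions: scalar alpha/beta/scale, ReLU activation, bias[i] added to row i.
Matrix referenceMatmul(const Matrix& A, bool transA,
                       const Matrix& B, bool transB,
                       const Matrix& C, const std::vector<float>& bias,
                       float alpha, float beta, float scale) {
    const std::size_t m = transA ? A.cols : A.rows;
    const std::size_t k = transA ? A.rows : A.cols;
    const std::size_t n = transB ? B.rows : B.cols;
    assert(k == (transB ? B.cols : B.rows));  // inner dimensions must agree
    assert(C.rows == m && C.cols == n);
    assert(bias.size() == m);

    Matrix D(m, n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += opAt(A, transA, i, p) * opAt(B, transB, p, j);
            }
            const float v = alpha * acc + beta * C.at(i, j) + bias[i];
            D.at(i, j) = std::max(v, 0.0f) * scale;  // ReLU, then output scaling
        }
    }
    return D;
}
```

A reference like this is mainly useful for checking results from an accelerated path on small inputs; it ignores sparsity entirely and treats every operand as dense.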
The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
Download: developer.nvidia.com/cusparselt/downloads
Provide Feedback: Math-Libs-Feedback@nvidia.com
Examples: cuSPARSELt Example 1, cuSPARSELt Example 2
Blog posts:
Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt
Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines
Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture
Key Features#
NVIDIA Sparse MMA tensor core support
Mixed-precision computation support:
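| Input A/B | Input C | Output D | Compute | Block scaled | Supported SM arch |
|---|---|---|---|---|---|
| FP32 | FP32 | FP32 | FP32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 12.0 |
| BF16 | BF16 | BF16 | FP32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 12.0 |
| FP16 | FP16 | FP16 | FP32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 12.0 |
| FP16 | FP16 | FP16 | FP16 | No | 9.0 |
| INT8 | INT8 | INT8 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 12.0 |
| INT8 | INT32 | INT32 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 12.0 |
| INT8 | FP16 | FP16 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 12.0 |
| INT8 | BF16 | BF16 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 12.0 |
| E4M3 | FP16 | E4M3 | FP32 | No | 9.0, 10.0, 12.0 |
| E4M3 | BF16 | E4M3 | FP32 | No | 9.0, 10.0, 12.0 |
| E4M3 | FP16 | FP16 | FP32 | No | 9.0, 10.0, 12.0 |
| E4M3 | BF16 | BF16 | FP32 | No | 9.0, 10.0, 12.0 |
| E4M3 | FP32 | FP32 | FP32 | No | 9.0, 10.0, 12.0 |
| E5M2 | FP16 | E5M2 | FP32 | No | 9.0, 10.0, 12.0 |
| E5M2 | BF16 | E5M2 | FP32 | No | 9.0, 10.0, 12.0 |
| E5M2 | FP16 | FP16 | FP32 | No | 9.0, 10.0, 12.0 |
| E5M2 | BF16 | BF16 | FP32 | No | 9.0, 10.0, 12.0 |
| E5M2 | FP32 | FP32 | FP32 | No | 9.0, 10.0, 12.0 |
| E4M3 | FP16 | E4M3 | FP32 | A/B/D_OUT_SCALE = VEC64_UE8M0, D_SCALE = 32F | 10.0, 12.0 |
| E4M3 | BF16 | E4M3 | FP32 | A/B/D_OUT_SCALE = VEC64_UE8M0, D_SCALE = 32F | 10.0, 12.0 |
| E4M3 | FP16 | FP16 | FP32 | A/B_SCALE = VEC64_UE8M0 | 10.0, 12.0 |
| E4M3 | BF16 | BF16 | FP32 | A/B_SCALE = VEC64_UE8M0 | 10.0, 12.0 |
| E4M3 | FP32 | FP32 | FP32 | A/B_SCALE = VEC64_UE8M0 | 10.0, 12.0 |
| E2M1 | FP16 | E2M1 | FP32 | A/B/D_SCALE = VEC32_UE4M3, D_SCALE = 32F | 10.0, 12.0 |
| E2M1 | BF16 | E2M1 | FP32 | A/B/D_SCALE = VEC32_UE4M3, D_SCALE = 32F | 10.0, 12.0 |
| E2M1 | FP16 | FP16 | FP32 | A/B_SCALE = VEC32_UE4M3 | 10.0, 12.0 |
| E2M1 | BF16 | BF16 | FP32 | A/B_SCALE = VEC32_UE4M3 | 10.0, 12.0 |
| E2M1 | FP32 | FP32 | FP32 | A/B_SCALE = VEC32_UE4M3 | 10.0, 12.0 |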
Matrix pruning and compression functionalities
Activation functions, bias vector, and output scaling
Batched computation (multiple matrices in a single run)
GEMM Split-K mode
Auto-tuning functionality (see cusparseLtMatmulSearch())
NVTX ranging and Logging functionalities
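The pruning and compression features listed above relate to the 2:4 structured sparsity pattern of NVIDIA Sparse Tensor Cores, in which at most two of every four consecutive elements are nonzero. The sketch below illustrates the idea on the CPU for a single row. It is not the library's implementation: the magnitude-based selection rule, the `Compressed` layout, and the `pruneAndCompress2to4` name are assumptions made for illustration, and the compressed format the library actually uses is not described here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical compressed form: two kept values per group of four,
// plus the position (0..3) of each kept value inside its group.
struct Compressed {
    std::vector<float> values;
    std::vector<std::uint8_t> index;
};

// Prune `row` in place to a 2:4 pattern by keeping the two largest-magnitude
// elements of each group of four (an assumed selection rule), and return the
// kept values together with their in-group positions.
Compressed pruneAndCompress2to4(std::vector<float>& row) {
    assert(row.size() % 4 == 0);  // the row must split evenly into groups of four
    Compressed out;
    for (std::size_t g = 0; g < row.size(); g += 4) {
        // Track the positions of the largest and second-largest magnitudes.
        std::size_t first = 0, second = 1;
        if (std::fabs(row[g + 1]) > std::fabs(row[g + 0])) std::swap(first, second);
        for (std::size_t p = 2; p < 4; ++p) {
            const float mag = std::fabs(row[g + p]);
            if (mag > std::fabs(row[g + first])) {
                second = first;
                first = p;
            } else if (mag > std::fabs(row[g + second])) {
                second = p;
            }
        }
        // Emit kept elements in ascending position order; zero the rest.
        const std::size_t lo = std::min(first, second);
        const std::size_t hi = std::max(first, second);
        for (std::size_t p = 0; p < 4; ++p) {
            if (p != lo && p != hi) row[g + p] = 0.0f;
        }
        out.values.push_back(row[g + lo]);
        out.values.push_back(row[g + hi]);
        out.index.push_back(static_cast<std::uint8_t>(lo));
        out.index.push_back(static_cast<std::uint8_t>(hi));
    }
    return out;
}
```

For a row of length 8, the result holds 4 values and 4 positions, which is where the storage saving of the compressed form comes from: half the values plus a small amount of index metadata.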
Support#
Supported SM Architectures: SM 8.0, SM 8.6, SM 8.7, SM 8.9, SM 9.0, SM 10.0, SM 12.0
Supported CPU architectures and operating systems:
| OS | CPU archs |
|---|---|
| | |
| | |