cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication#
NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:
D = Activation(alpha · op(A) · op(B) + beta · op(C) + bias) · scale

where op() refers to in-place operations such as transpose/non-transpose, and alpha, beta, bias, and scale are scalars or vectors.
The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
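The operation above can be sketched in plain Python as a conceptual reference (this is not the library API; ReLU stands in for the configurable activation, and the scalar `alpha`, `beta`, `bias`, and `scale` arguments are illustrative):

```python
# Conceptual reference for D = Activation(alpha * op(A) * op(B) + beta * op(C) + bias) * scale.
# Pure Python on small dense matrices; the real library operates on a structured-sparse A
# on the GPU, and op() would apply transpose/non-transpose.

def matmul(A, B):
    """Dense matrix product of row-major nested lists."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def spmm_epilogue(A, B, C, alpha=1.0, beta=0.0, bias=0.0, scale=1.0,
                  activation=lambda x: max(x, 0.0)):  # ReLU as an example activation
    AB = matmul(A, B)
    return [[activation(alpha * AB[i][j] + beta * C[i][j] + bias) * scale
             for j in range(len(AB[0]))]
            for i in range(len(AB))]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
C = [[0.0, 0.0], [0.0, -10.0]]
print(spmm_epilogue(A, B, C, alpha=1.0, beta=1.0))  # ReLU clamps the negative entry
```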
Download: developer.nvidia.com/cusparselt/downloads
Provide Feedback: Math-Libs-Feedback@nvidia.com
Examples: cuSPARSELt Example 1, cuSPARSELt Example 2
Blog posts:
Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt
Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines
Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture
Key Features#
NVIDIA Sparse MMA tensor core support
Mixed-precision computation support:

| Input A/B | Input C | Output D | Compute | Support arch |
|---|---|---|---|---|
| FP32 | FP32 | FP32 | FP32 | SM 8.0, 8.6, 8.7, 9.0 |
| BF16 | BF16 | BF16 | FP32 | SM 8.0, 8.6, 8.7, 9.0 |
| FP16 | FP16 | FP16 | FP32 | SM 8.0, 8.6, 8.7, 9.0 |
| FP16 | FP16 | FP16 | FP16 | SM 9.0 |
| INT8 | INT8 | INT8 | INT32 | SM 8.0, 8.6, 8.7, 9.0 |
| INT8 | INT32 | INT32 | INT32 | SM 8.0, 8.6, 8.7, 9.0 |
| INT8 | FP16 | FP16 | INT32 | SM 8.0, 8.6, 8.7, 9.0 |
| INT8 | BF16 | BF16 | INT32 | SM 8.0, 8.6, 8.7, 9.0 |
| E4M3 | FP16 | E4M3 | FP32 | SM 9.0 |
| E4M3 | BF16 | E4M3 | FP32 | SM 9.0 |
| E4M3 | FP16 | FP16 | FP32 | SM 9.0 |
| E4M3 | BF16 | BF16 | FP32 | SM 9.0 |
| E4M3 | FP32 | FP32 | FP32 | SM 9.0 |
| E5M2 | FP16 | E5M2 | FP32 | SM 9.0 |
| E5M2 | BF16 | E5M2 | FP32 | SM 9.0 |
| E5M2 | FP16 | FP16 | FP32 | SM 9.0 |
| E5M2 | BF16 | BF16 | FP32 | SM 9.0 |
| E5M2 | FP32 | FP32 | FP32 | SM 9.0 |
Matrix pruning and compression functionalities
Activation functions, bias vector, and output scaling
Batched computation (multiple matrices in a single run)
GEMM Split-K mode
Auto-tuning functionality (see cusparseLtMatmulSearch())
NVTX ranging and logging functionalities
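As an illustration of what pruning and compression do, here is a conceptual Python sketch of the 2:4 structured-sparsity format used by the Sparse MMA tensor cores (this is not the `cusparseLtSpMMAPrune`/`cusparseLtSpMMACompress` API, and the real metadata is packed as 2-bit indices rather than Python lists): each group of four consecutive values keeps its two largest-magnitude entries, and the compressed form stores only the kept values plus their in-group positions.

```python
def prune_2_4(row):
    """Zero out the two smallest-magnitude values in every group of four."""
    assert len(row) % 4 == 0
    out = list(row)
    for g in range(0, len(row), 4):
        # Indices of the two largest-magnitude entries in this group of four.
        keep = sorted(range(g, g + 4), key=lambda i: abs(row[i]), reverse=True)[:2]
        for i in range(g, g + 4):
            if i not in keep:
                out[i] = 0.0
    return out

def compress_2_4(pruned):
    """Keep only nonzero values plus per-group positional metadata (assumes no kept zeros)."""
    values, metadata = [], []
    for g in range(0, len(pruned), 4):
        idx = [i - g for i in range(g, g + 4) if pruned[i] != 0.0]
        values.extend(pruned[i] for i in range(g, g + 4) if pruned[i] != 0.0)
        metadata.append(idx)  # positions of the kept values within the group
    return values, metadata

row = [0.5, -2.0, 0.1, 3.0, 1.0, 0.0, -4.0, 0.2]
pruned = prune_2_4(row)
print(pruned)                # [0.0, -2.0, 0.0, 3.0, 1.0, 0.0, -4.0, 0.0]
print(compress_2_4(pruned))  # ([-2.0, 3.0, 1.0, -4.0], [[1, 3], [0, 2]])
```

The compressed matrix is roughly half the size of the dense one, which is where the memory and bandwidth savings of structured sparsity come from.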
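Split-K mode is likewise easy to picture with a hypothetical pure-Python sketch (the idea, not the library's GPU implementation): the K dimension of the product is split into slices, each slice yields a partial product, and the partials are summed in a final reduction, which exposes extra parallelism when M and N are small relative to K.

```python
def gemm(A, B):
    """Plain dense matrix product of row-major nested lists."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def gemm_split_k(A, B, splits):
    """Compute A @ B as a sum of partial products over K slices."""
    k, n, m = len(B), len(A), len(B[0])
    D = [[0.0] * m for _ in range(n)]
    step = -(-k // splits)  # ceil division: slice width along K
    for s in range(0, k, step):
        A_slice = [r[s:s + step] for r in A]  # n x step
        B_slice = B[s:s + step]               # step x m
        partial = gemm(A_slice, B_slice)      # would run concurrently on the GPU
        for i in range(n):                    # final reduction over partials
            for j in range(m):
                D[i][j] += partial[i][j]
    return D

A = [[1.0, 2.0, 3.0, 4.0]]
B = [[1.0], [1.0], [1.0], [1.0]]
print(gemm_split_k(A, B, splits=2))  # same result as a single gemm: [[10.0]]
```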
Support#
Supported SM Architectures: SM 8.0, SM 8.6, SM 8.7, SM 8.9, SM 9.0
Supported CPU architectures and operating systems: