Add mixed precision (FP16) support for ROCm #7663

jglaser · 2023-06-25T04:23:15Z

This PR adds mixed precision support for matrix-matrix multiplies on ROCm, which hipBLAS exposes through hipblasGemmEx. On MI200 GPUs, this uses the MFMA (matrix fused multiply-add) instructions, which are intended as analogues of the NVIDIA tensor cores.

For cupy.float16 arrays, this enables the hipBLAS primitive, which was previously disabled: fp16 values were cast to single precision instead, with a warning. I am not entirely sure whether the reason was a lack of hardware and/or API support in earlier versions of the AMD libraries, so a version check for hipBLAS might be required. It appears, though, that support has been stable since ROCm 4.2.0 (https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/CHANGELOG.md).
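As a rough illustration of the user-visible behavior, here is a minimal sketch of a float16 matmul checked against a float32 reference computed on the host. The array shapes and seed are arbitrary choices for illustration, and the NumPy fallback exists only so the snippet runs on a machine without a GPU; it is not part of this PR.

```python
import numpy as np

try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is present
except Exception:
    xp = np  # assumption: fall back to NumPy so the sketch still runs

# Arbitrary illustrative inputs, generated on the host in half precision.
rng = np.random.default_rng(0)
a_host = rng.standard_normal((64, 32)).astype(np.float16)
b_host = rng.standard_normal((32, 48)).astype(np.float16)

a = xp.asarray(a_host)
b = xp.asarray(b_host)

# With this PR, float16 inputs on ROCm are expected to reach the
# hipBLAS GEMM path instead of being cast to single precision first.
c = xp.matmul(a, b)

# Compare against a float32 reference computed with NumPy.
reference = a_host.astype(np.float32) @ b_host.astype(np.float32)
c_host = np.asarray(c.get() if hasattr(c, "get") else c)
max_abs_err = float(np.max(np.abs(c_host.astype(np.float32) - reference)))

print(c_host.dtype, c_host.shape, max_abs_err)
```

The result stays in float16, so some rounding error relative to the float32 reference is expected; the sketch only reports it and does not claim a specific bound.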

To verify that the code now uses rocblas_gemm_ex under the hood, enable rocBLAS runtime tracing when running the unit test, like this:

ROCBLAS_LAYER=1 pytest tests/cupy_tests/math_tests/test_matmul.py -s -k TestMatmulOut
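If the trace output is captured into a file, the check can be automated. The following is a minimal sketch that assumes the trace was saved as plain text (for example by redirecting the output of the command above) and that each traced call appears on its own line containing the function name; the helper names and the file path argument are hypothetical.

```python
from pathlib import Path


def count_calls(trace_text: str, symbol: str = "rocblas_gemm_ex") -> int:
    """Count trace lines that mention `symbol` (plain substring match)."""
    return sum(1 for line in trace_text.splitlines() if symbol in line)


def gemm_ex_was_called(trace_path: str) -> bool:
    """Return True if the saved trace file mentions rocblas_gemm_ex."""
    # errors="replace" keeps the scan going if the log has odd bytes.
    text = Path(trace_path).read_text(errors="replace")
    return count_calls(text) > 0
```

A substring match is deliberately loose: it avoids depending on the exact trace line layout, which may differ between rocBLAS versions.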

takagi · 2023-06-26T05:21:42Z

/test mini

takagi · 2023-06-26T05:26:29Z

Thanks, @jglaser! Your change looks good to me. Could you fix just the static checks?

takagi · 2023-06-26T05:39:48Z

ROCBLAS_LAYER=1 pytest tests/cupy_tests/math_tests/test_matmul.py -s -k TestMatmulOut

I also confirmed locally that hipblasGemmEx was called.

takagi · 2023-07-07T06:43:21Z

/test mini

jglaser · 2023-07-09T20:13:33Z

Thanks, @takagi, for moving this along. It looks like two CUDA-related tests are still failing; however, I can't see the test results without logging in.

takagi

LGTM!

takagi · 2023-07-10T06:06:54Z

CI failures are not related. Our CI requires logging in for security reasons.

takagi · 2023-07-10T06:07:09Z

Thanks, @jglaser!

Add mixed precision (FP16) support for ROCm

199b922

takagi self-assigned this Jun 26, 2023

takagi added the cat:enhancement (Improvements to existing features) and prio:medium labels Jun 26, 2023

takagi added this to the v13.0.0b1 milestone Jun 26, 2023

Static check

f11b555

takagi force-pushed the rocm_mfma branch from 82c2b0a to f11b555 Compare July 7, 2023 04:30

takagi approved these changes Jul 10, 2023


takagi merged commit a017c31 into cupy:main Jul 10, 2023
