Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mixed precision (FP16) support for ROCm #7663

Merged
merged 2 commits into from
Jul 10, 2023
Merged

Conversation

jglaser
Copy link
Contributor

@jglaser jglaser commented Jun 25, 2023

This PR adds mixed precision support for matrix-matrix multiplies on ROCm, which is available in hipBLAS through hipblasGemmEx. On MI200 GPUs, this utilizes the MFMA (matrix fused-multiply add) instructions, which are intended as analogues of the NVIDIA tensor cores.

For cupy.float16 arrays, we allow the use of the hipBLAS primitive, which was previously disabled, and fp16 values were casted to single precision, albeit issuing a warning. I am not entirely sure if the reason for this was the lack of hardware and/or API support in earlier versions of the AMD libraries, therefore a version check for hipBLAS might be required. It appears though that support has been stable since ROCm 4.2.0 (https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/CHANGELOG.md).

To verify that the code now uses rocblas_gemm_ex under the hood, enable rocBLAS runtime tracing with the unit test like this

ROCBLAS_LAYER=1 pytest tests/cupy_tests/math_tests/test_matmul.py -s -k TestMatmulOut

@takagi takagi self-assigned this Jun 26, 2023
@takagi takagi added cat:enhancement Improvements to existing features prio:medium labels Jun 26, 2023
@takagi
Copy link
Member

takagi commented Jun 26, 2023

/test mini

@takagi
Copy link
Member

takagi commented Jun 26, 2023

Thanks, @jglaser! Your change looks good to me. Can I ask you to fix just static-checks?

@takagi takagi added this to the v13.0.0b1 milestone Jun 26, 2023
@takagi
Copy link
Member

takagi commented Jun 26, 2023

ROCBLAS_LAYER=1 pytest tests/cupy_tests/math_tests/test_matmul.py -s -k TestMatmulOut

I also checked hipblasGemmEx was called locally.

@takagi
Copy link
Member

takagi commented Jul 7, 2023

/test mini

@jglaser
Copy link
Contributor Author

jglaser commented Jul 9, 2023

Thanks @takagi for moving this along. Looks like two CUDA related tests are still failing, however, I can't see the test results without logging in

Copy link
Member

@takagi takagi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@takagi
Copy link
Member

takagi commented Jul 10, 2023

CI failures are not related. Our CI requires logging in for security reasons.

@takagi takagi merged commit a017c31 into cupy:main Jul 10, 2023
@takagi
Copy link
Member

takagi commented Jul 10, 2023

Thanks, @jglaser!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat:enhancement Improvements to existing features prio:medium
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants