math.nextafter for cuda #9541
base: main
Conversation
Caveat: This PR intentionally implements support only for FP32 and FP64, not FP16. I was unable to figure out how to do that correctly, and I do not require it for my use case at the moment.
@esc It appears I forgot the release notes - please re-run CI.
Ping @gmarkall
```python
binarys_fastmath = {}
binarys_fastmath['powf'] = 'fast_powf'
binarys_fastmath['nextafterf'] = 'fast_nextafterf'
```
Looks like there is a fast version of nextafter for 64-bit operands too: https://docs.nvidia.com/cuda/libdevice-users-guide/__nv_nextafter.html#__nv_nextafter - I'm not 100% sure this will work (it has been a while since I thought about these implementations), but maybe:
```suggestion
binarys_fastmath['nextafterf'] = 'fast_nextafterf'
binarys_fastmath['nextafter'] = 'fast_nextafter'
```
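For comparison while weighing this, the precise libdevice bindings can already be called explicitly from a kernel via `numba.cuda.libdevice`; a minimal sketch of the precise variants only - whether the fast variants exist in libdevice is exactly the open question above:

```python
# Minimal sketch: calling the precise libdevice bindings directly.
# libdevice.nextafterf maps to __nv_nextafterf (32-bit) and
# libdevice.nextafter to __nv_nextafter (64-bit); no fastmath
# substitution is involved here.
from numba import cuda
from numba.cuda import libdevice

@cuda.jit("void(float32[::1], float32, float32)")
def next_toward(out, x, y):
    out[0] = libdevice.nextafterf(x, y)
```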
Additionally, if there are fastmath implementations of these functions, they should also be tested in numba/cuda/tests/cudapy/test_fastmath.py - you should be able to create a test by following the pattern used in the tests for other functions. If you're having trouble working out what patterns to check for, let me know and I'll see if I can help work out something appropriate.
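If it helps as a starting point, a rough sketch of what such a test might look like. This is a sketch only: the `__nv_fast_nextafterf` symbol is assumed from the mapping in this PR (not verified against libdevice), and it checks the generated LLVM IR rather than the PTX, since nextafter has no approx PTX instruction to pattern-match the way the existing fastmath tests do:

```python
# Rough sketch of a possible test, loosely following the structure of
# numba/cuda/tests/cudapy/test_fastmath.py. The existing tests assert on
# PTX instruction patterns; nextafter compiles to a libdevice call with
# no approx instruction, so this checks the LLVM IR for the __nv_ symbol
# instead. __nv_fast_nextafterf is an assumption from this PR's mapping.
import math
from numba import cuda

def kernel(r, x, y):
    r[0] = math.nextafter(x, y)

sig = "void(float32[::1], float32, float32)"
fast = cuda.jit(sig, fastmath=True)(kernel)
precise = cuda.jit(sig)(kernel)

fast_ir = next(iter(fast.inspect_llvm().values()))
precise_ir = next(iter(precise.inspect_llvm().values()))

assert "__nv_nextafterf" in precise_ir    # precise libdevice call
assert "__nv_fast_nextafterf" in fast_ir  # fast variant under fastmath
```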
Thanks for the PR! On the whole this looks good; I just have a comment on the diff about the fast implementation and testing it.
I think omitting float16 support for now is fine - I didn't see an implementation in the cuda_fp16.{h,hpp} headers. I did see something that may be applicable in libcu++, but the route to using it (or whether it is applicable at all) is not obvious, so I'm happy with leaving it for now.
This pull request is marked as stale as it has had no activity in the past 3 months. Please respond to this comment if you're still interested in working on this. Many thanks!
Ping - still alive, will post a fix soon.
Thanks @s-m-e! Continued development of the CUDA target is proceeding in https://github.com/nvidia/numba-cuda, as described in https://numba.discourse.group/t/rfc-moving-the-cuda-target-to-a-new-package-maintained-by-nvidia/2628 - would it be possible to open this PR in that repository, please? If that's a bit of a hassle for you, let me know and I can open a similar PR there. I'm not sure I'd be able to give you a way to push to the branch, but it would at least make it easy for you to have a starting point over there.
This PR adds support for `math.nextafter` for CUDA. Partially fixes #9435 (no support for `numpy.nextafter`); related to #9424 and #9438.
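For illustration, a minimal usage sketch of what this enables (array contents and launch configuration are arbitrary):

```python
# Minimal usage sketch: math.nextafter inside a CUDA kernel (FP32/FP64
# only, per this PR). Data and launch configuration are illustrative.
import math
import numpy as np
from numba import cuda

@cuda.jit
def step_toward(out, x, y):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = math.nextafter(x[i], y[i])

x = np.array([1.0, 0.0, -1.0])
y = np.array([2.0, -1.0, 0.0])
out = np.empty_like(x)
step_toward[1, 32](out, x, y)
print(out[0])  # smallest float64 strictly greater than 1.0
```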