
CUDA: Add overloads generated by specialization to the current dispatcher. #9106

Open · wants to merge 3 commits into main
Conversation

gmarkall (Member):

Adding overloads to the current dispatcher when specializing avoids recreating the overload for subsequent calls to the unspecialized dispatcher that would have used it.

This also has the side effect of making overloads available after a call to `ForAll()` on a dispatcher.

This provides an alternative to the implementation in #9057, which also aimed to make overloads available after a `ForAll()` call, but did so by changing the way `ForAll()` worked, potentially increasing latency for calls on specialized dispatchers (for which there are some use cases in cuDF).
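As a rough illustration of the intended behaviour, the following is a minimal sketch (assuming a CUDA-capable environment; the kernel and variable names are illustrative, not taken from this PR):

    import numpy as np
    from numba import cuda

    @cuda.jit  # no signature, so this creates an unspecialized dispatcher
    def add_one(x):
        i = cuda.grid(1)
        if i < x.size:
            x[i] += 1

    arg = np.zeros(16, dtype=np.float32)

    # specialize() eagerly compiles an overload for the argument types and,
    # with this PR applied, also registers it on the unspecialized dispatcher:
    specialized = add_one.specialize(arg)
    assert len(add_one.overloads) == 1

    # Subsequent calls through the unspecialized dispatcher with the same
    # argument types reuse that overload instead of recompiling. Because
    # ForAll() specializes internally, overloads are likewise visible on
    # the dispatcher after a ForAll() launch:
    add_one.forall(arg.size)(arg)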

Whilst looking into this, I also noticed that all the CUDA dispatcher specialize tests worked in an "unexpected" way and didn't provide a good pattern for the test added in this PR to follow, so they are all fixed up in this PR too.

Dispatcher specialization takes actual arguments, not types. So although these tests did technically test the specialization mechanism, the way in which they were doing it was extremely weird, because a kernel would never actually be launched with a type as an argument.
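To make that distinction concrete, here is a hypothetical sketch (the kernel and array names are illustrative): `specialize()` expects example values, just as a kernel launch would receive, not Numba type objects.

    import numpy as np
    from numba import cuda

    @cuda.jit
    def copy_first(r, x):
        # trivial kernel; only thread 0's work matters here
        r[0] = x[0]

    r = np.zeros(4, dtype=np.float32)
    x = np.ones(4, dtype=np.float32)

    # Intended usage: pass actual arguments, as a launch would.
    specialized = copy_first.specialize(r, x)

    # The old tests instead passed Numba types as if they were values:
    #     copy_first.specialize(float32[:], float32[:])
    # That did exercise the specialization machinery, but no real kernel
    # launch ever receives a type object as an argument.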
gmarkall added the CUDA (CUDA related issue/PR) and 2 - In Progress labels on Jul 28, 2023
gmarkall (Member, Author):

gpuci run tests

gmarkall (Member, Author):

gpuci run tests

Matt711 (Contributor) commented Jul 31, 2023:

The PR still works with my cuDF example for numba-inspector.

stuartarchibald (Contributor) left a comment:

Thanks for the patch; a couple of minor suggestions re: variable naming, else looks good.

Comment on lines 132 to 134:

Suggested change:

    -    f_f32a_f32a = f.specialize(f_arg, f_arg)
    -    self.assertEqual(len(f.overloads), 1)
    -    self.assertIs(f_f32a_f32a.overloads[f_arg_ty, f_arg_ty],
    +    f_f32f_f32f = f.specialize(f_arg, f_arg)
    +    self.assertEqual(len(f.overloads), 1)
    +    self.assertIs(f_f32f_f32f.overloads[f_arg_ty, f_arg_ty],

suggest naming scheme consistency with F-ordered input.

Comment on the 'F' order specialization block:

    f_f32a_f32a = f.specialize(float32[:], float32[:])
    # 'F' order specialization
    f_arg = np.zeros((2, 2), order='F')
    f_f32a_f32a = f.specialize(f_arg, f_arg)

Suggested change:

    -    f_f32a_f32a = f.specialize(f_arg, f_arg)
    +    f_f32f_f32f = f.specialize(f_arg, f_arg)

suggestion to match ordering F.
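As background for the naming suggestion, a short sketch of how Numba types the two array orders (the printed representations are what numba.typeof reports; the dtype is made explicit here, unlike in the quoted test):

    import numpy as np
    from numba import typeof

    c_arg = np.zeros((2, 2), dtype=np.float32)             # C-ordered (default)
    f_arg = np.zeros((2, 2), dtype=np.float32, order='F')  # F-ordered

    print(typeof(c_arg))  # array(float32, 2d, C)
    print(typeof(f_arg))  # array(float32, 2d, F)

    # Specializing with f_arg keys the overload on the F-ordered array type,
    # hence the suggestion to name the specialized dispatcher f_f32f_f32f.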

stuartarchibald added the 4 - Waiting on author label (waiting for author to respond to review) and removed the 3 - Ready for Review label on Sep 12, 2023
gmarkall modified the milestones: 0.59.0-rc1 → 0.60.0-rc1 on Dec 5, 2023
github-actions bot commented Mar 5, 2024:

This pull request is marked as stale as it has had no activity in the past 3 months. Please respond to this comment if you're still interested in working on this. Many thanks!

github-actions bot added the stale label (marker label for stale issues) on Mar 5, 2024
gmarkall (Member, Author) commented Mar 5, 2024:

Still planning to finish this when I get a moment.

github-actions bot removed the stale label on Mar 6, 2024
gmarkall modified the milestones: 0.60.0-rc1 → 0.61.0-rc1 on Apr 9, 2024
github-actions bot commented Jul 9, 2024:

This pull request is marked as stale as it has had no activity in the past 3 months. Please respond to this comment if you're still interested in working on this. Many thanks!

github-actions bot added the stale label on Jul 9, 2024
github-actions bot added the abandoned - stale label (PRs automatically closed due to stale status) on Jul 16, 2024
github-actions bot closed this on Jul 16, 2024
gmarkall (Member, Author):

Missed this due to being on PTO; going to reopen it to keep it relevant (and get round to it one day).

gmarkall reopened this on Jul 16, 2024
gmarkall removed the abandoned - stale label on Jul 16, 2024
gmarkall removed the stale label on Jul 16, 2024
gmarkall modified the milestones: 0.61.0-rc1 → nvidia-cuda-next on Oct 1, 2024
Labels: 4 - Waiting on author (waiting for author to respond to review), CUDA (CUDA related issue/PR)
Projects: None yet
4 participants