CUDA: Add overloads generated by specialization to the current dispatcher. #9106
base: main
Conversation
Dispatcher specialization takes actual arguments, not types. So although these tests did technically exercise the specialization mechanism, the way they did so was very odd: a kernel would never actually be launched with a type as an argument.
Adding overloads to the current dispatcher when specializing saves recreating the overload on subsequent calls to the unspecialized dispatcher that would have used it. This also has the side effect of making overloads available after a call to `ForAll()` on a dispatcher.
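For illustration, a minimal sketch of the behaviour described above (the kernel name `add_one` is hypothetical, and a CUDA-capable device is assumed). Note that `specialize()` is passed concrete arguments, not type signatures such as `float32[:]`:

```python
import numpy as np
from numba import cuda

@cuda.jit
def add_one(x):
    i = cuda.grid(1)
    if i < x.size:
        x[i] += 1

arg = np.zeros(16, dtype=np.float32)

# specialize() takes actual arguments; their Numba types key the overload.
specialized = add_one.specialize(arg)

# With this PR, the compiled overload is also registered on the original
# dispatcher, so a later launch of `add_one` with matching argument types
# reuses it rather than recompiling.
add_one[1, 16](arg)
```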
gpuci run tests

gpuci run tests
The PR still works with my cuDF example.
Thanks for the patch; a couple of minor suggestions re: variable naming, otherwise looks good.
f_f32a_f32a = f.specialize(f_arg, f_arg)
self.assertEqual(len(f.overloads), 1)
self.assertIs(f_f32a_f32a.overloads[f_arg_ty, f_arg_ty],
Suggested change:
f_f32f_f32f = f.specialize(f_arg, f_arg)
self.assertEqual(len(f.overloads), 1)
self.assertIs(f_f32f_f32f.overloads[f_arg_ty, f_arg_ty],
Suggest naming scheme consistency with the F-ordered input.
f_f32a_f32a = f.specialize(float32[:], float32[:])
# 'F' order specialization
f_arg = np.zeros((2, 2), order='F')
f_f32a_f32a = f.specialize(f_arg, f_arg)
Suggested change:
f_f32f_f32f = f.specialize(f_arg, f_arg)
Suggestion to match the 'F' ordering.
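For context on why the naming matters: array layout is part of the Numba type, so C- and F-ordered arguments produce distinct overload keys. A hedged sketch (the kernel `f` is hypothetical, and a CUDA-capable device is assumed):

```python
import numpy as np
from numba import cuda

@cuda.jit
def f(x, y):
    # The body is irrelevant here; only the argument types matter.
    pass

c_arg = np.zeros((2, 2), dtype=np.float32, order='C')
f_arg = np.zeros((2, 2), dtype=np.float32, order='F')

f_c = f.specialize(c_arg, c_arg)  # typed as array(float32, 2d, C)
f_f = f.specialize(f_arg, f_arg)  # typed as array(float32, 2d, F)

# Different type keys give different specializations, so both can be
# held by the dispatcher at once, hence the f_f32f_f32f naming suggestion.
assert f_c is not f_f
```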
This pull request is marked as stale as it has had no activity in the past 3 months. Please respond to this comment if you're still interested in working on this. Many thanks!

Still planning to finish this when I get a moment.

This pull request is marked as stale as it has had no activity in the past 3 months. Please respond to this comment if you're still interested in working on this. Many thanks!
Missed this due to being on PTO; reopening it to keep it relevant (to get round to it one day).
This provides an alternative to the implementation in #9057, which also aimed to make overloads available after a `ForAll()` call, but did so by changing the way `ForAll()` worked, potentially increasing latency for calls on specialized dispatchers (of which there are some use cases in cuDF).

Whilst looking into this, I also noticed that all the CUDA dispatcher specialize tests worked in an "unexpected" way and didn't provide a good pattern to follow for the test added in this PR, so they are all fixed up in this PR too.