-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add np.in1d, np.isin, np.setxor1d, np.setdiff1d, extend np.intersect1d. #9338
Conversation
…ionally, extend np.intersect1d to support the first optional argument (assume_unique).
I’ve just now seen #4815, possibly some of the code could be ported across from there too. |
…gorithm for np.in1d with short second inputs.
I'm not sure why those tests are failing all of a sudden - it seems to be at the Edit: nvm, spotted #9342. |
@jaredjeya feel free to ignore the |
Now it’s been merged, how do I get the failed tests to run again? Is there anything I need to do to keep the new code up-to-date with the main branch? (Sorry, this is my first ever PR to a project in active development!) |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
@jaredjeya Many thanks for the PR! I'm tentatively moving it to the 0.60 milestone - at the moment we're working on the 0.59 release, and the 0.60 release will be focused around supporting NumPy 2.0, so it may be some time before it's possible to review this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @jaredjeya. Code seems fine as it is almost a 1:1 port of what NumPy does. Just have some minor comments that should be easy to address.
@jaredjeya, could you include the tests from NumPy for the set of functions added in this PR? |
…owed from numpy's own testing. Also fixed some bugs in the tests (including that lists were never tested).
So what I've done here is add the numpy test cases to the tests I'd already written, rather than writing new tests entirely. Also fixed that I'd been casting all the inputs to arrays before testing them, so I wasn't testing the functionality with lists. Otherwise I also added a check to catch at compile time if someone calls these functions with incorrect literal values for |
I forgot to add the attribution for test_setops_manyways, which should read:
Since it's just a comment is there a way to add that in without re-running all the tests? |
numba/np/arraymath.py
Outdated
@@ -4982,20 4981,23 @@ def np_in1d_impl(ar1, ar2, assume_unique=False, invert=False, kind="sort"): | |||
def jit_np_isin(element, test_elements, assume_unique=False, invert=False, | |||
kind="sort"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value for kind
should be None
, not "sort"
.
The following should work on your branch:
import numpy as np
from numba import njit
@njit
def foo():
return np.isin(3, 4, kind=None)
print(foo())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I've done it like I did is because I didn't implement the numpy kind="table"
method. (The output is equivalent, but I suppose the performance is higher in certain cases with the table method (I haven't tested)). When kind=None
, the numpy code will use the table method if certain conditions are met, otherwise the sorting method, so we don't support that either -- strictly, we only support kind="sort"
. So I thought np.isin(3, 4, kind=None)
should fail.
If you think it should work I'm happy to change it, or otherwise we just say we don't support the "kind" argument and leave it at that. Or, I could even implement the "table" method, it isn't that much more complicated than the sort method, and then we fully support the "kind" argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I think it is best to match as close as possible the NumPy signature but keep a check internally for kind == "sort"
.
I decided to add some tests to make sure the correct errors result when nonsense inputs are supplied, including e.g. literals like This is the full type-checking code within the if kind is not None:
if (isinstance(kind, types.StringLiteral)
and kind.literal_value != "sort"):
raise NumbaValueError('isin: Only kind="sort" is supported')
elif not isinstance(kind, (types.UnicodeType, str, types.NoneType)):
raise TypingError('isin: Argument "kind" must be a string or None') this is the code in the actual implementation, if kind is not None and kind != "sort":
raise ValueError('isin: Only kind="sort" is supported') and this is the testing code: @njit()
def table_isin(a, b):
return np.isin(a, b, kind="table")
with self.assertRaises(NumbaValueError):
table_isin(a, b) I also tried What am I doing wrong? Other functions e.g. |
Nevermind, I sorted it |
numba/np/arraymath.py
Outdated
if kind is not None and kind != "sort": | ||
raise ValueError('isin: Only kind="sort" is supported') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If kind=None
, NumPy will use table
by default, which is not supported in this PR. Change the clause to something like this:
if kind is not None and kind != "sort": | |
raise ValueError('isin: Only kind="sort" is supported') | |
if kind != "sort": | |
raise ValueError('isin: Only kind="sort" is supported') |
This will force a user to spell the kind="sort"
keyword argument and not assume Numba supports the table
option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If kind=None, NumPy will use table by default, which is not supported in this PR.
Yes, that is why I didn't implement kind=None in the first place. I'll change it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function signature needs to be with kind=None
as it is what NumPy specifies. But let's keep a check to only run the function if the user specifies kind="sort"
.
Let me know if you have any questions regarding this.
numba/np/arraymath.py
Outdated
if not (kind is None or isinstance(kind, types.NoneType)): | ||
kindval = getattr(kind, "literal_value", kind) | ||
if not isinstance(kindval, | ||
(types.UnicodeType, str, types.StringLiteral)): | ||
raise TypingError('in1d: Argument "kind" must be a string or None') | ||
elif kindval != "sort": | ||
raise NumbaValueError('in1d: Only kind="sort" is supported') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine, but we can keep only the ValueError
to simplify the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I should get rid of this whole section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I've reduced it to just,
if not isinstance(kind, (types.UnicodeType, str, types.StringLiteral)):
raise TypingError('in1d: Only kind="sort" is supported')
which makes sure kind=None doesn't make it past compilation. It seemed to be causing some type inference issues if I didn't have that.
@guilhermeleobas Tests seem to be failing now because According to the release notes, pre-1.24, the kind="sort" behaviour was the default. Newer versions automatically pick the optimal algorithm. It might not be too hard to implement kind="table", but it would take me a while as I'm quite busy atm. Otherwise we can drop the keyword entirely and say we're only replicating pre-1.24 behaviour? The documentation suggests using a version gate but I don't know what that means in this context. |
I think we can have an if-else statement to support the kind argument if NumPy version if greater or equal 1.24: if numpy_version < (1, 24):
@overload(np.isin)
def jit_np_isin(element, test_elements, assume_unique=False, invert=False):
....
else:
@overload(np.isin)
def jit_np_isin(element, test_elements, assume_unique=False, invert=False,
kind=None):
... You may have to update the tests as well to only pass kind="sort" if NumPy >= 1.24
That's also an option. Choose the one that is easiest for you to implement. |
This pull request is marked as stale as it has had no activity in the past 3 months. Please respond to this comment if you're still interested in working on this. Many thanks! |
I do have this on my todo list, I've just been somewhat swamped recently. I do plan to set aside some time to hopefully get this over the line soon. Hopefully nothing major needed to merge this with subsequent changes to numba. |
…t of pre-1.24 Numpy.
I decided it's simpler and less confusing to the end user (especially working across multiple versions) if we just specify that "kind" isn't supported, and that behaviour matches that of pre-1.24. Unfortunately it looks like the build is failing due to changes to put in the new typing system - I'm not really sure what to do here? Edit: nevermind, managed to sort it out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. I'm happy with the current changes. Thanks for your effort in this PR.
Thank you for the patch. |
Adds support for the functions
np.in1d
,np.isin
,np.setxor1d
, andnp.setdiff1d
, all including optional argumentassume_unique
as well asinvert
fornp.in1d
andnp.isin
.Extends
np.intersect1d
to support the first optional argument (assume_unique).Resolves #4677. Improves on #4074.
I have written unit tests for all the new functions and functionality, and applied flake8 to my code.
The only question I have is how to handle the optional argument
kind
innp.in1d
: I've implemented thekind="sort"
algorithm, but this is not the default behaviour ofnumpy
(which picks an algorithm based on performance heuristics), so I've made it explicit that onlykind="sort"
is supported.