Add np.in1d, np.isin, np.setxor1d, np.setdiff1d, extend np.intersect1d. #9338

jaredjeya · 2023-12-07T18:11:09Z

Adds support for the functions np.in1d, np.isin, np.setxor1d, and np.setdiff1d, all including optional argument assume_unique as well as invert for np.in1d and np.isin.
Extends np.intersect1d to support the first optional argument (assume_unique).
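
For reference, the default (assume_unique=False) results of the three set functions named above can be sketched in plain Python. This is a minimal illustration of the expected output only, not the Numba or NumPy implementation; the `*_ref` function names are made up for this sketch.

```python
def setdiff1d_ref(ar1, ar2):
    """Sorted unique values that are in ar1 but not in ar2."""
    return sorted(set(ar1) - set(ar2))


def setxor1d_ref(ar1, ar2):
    """Sorted unique values that are in exactly one of the inputs."""
    return sorted(set(ar1) ^ set(ar2))


def intersect1d_ref(ar1, ar2):
    """Sorted unique values that are in both inputs."""
    return sorted(set(ar1) & set(ar2))


print(setdiff1d_ref([3, 1, 2, 3, 4], [2, 4]))    # [1, 3]
print(setxor1d_ref([1, 2, 3, 2], [3, 4]))        # [1, 2, 4]
print(intersect1d_ref([1, 3, 3, 4], [3, 1, 1]))  # [1, 3]
```

Passing assume_unique=True tells the real functions that the inputs already contain no duplicates, so the deduplication step can be skipped.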

Resolves #4677. Improves on #4074.

I have written unit tests for all the new functions and functionality, and applied flake8 to my code.

The only question I have is how to handle the optional argument kind in np.in1d: I've implemented the kind="sort" algorithm, but this is not the default behaviour of numpy (which picks an algorithm based on performance heuristics), so I've made it explicit that only kind="sort" is supported.
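
The idea behind a sort-based membership test can be sketched in plain Python as follows. This is a rough illustration loosely modelled on the kind="sort" approach (deduplicate, merge, sort, compare neighbours), not the code in this PR; `in1d_sort` is a hypothetical name.

```python
def in1d_sort(ar1, ar2, invert=False):
    """Pure-Python sketch of a sort-based membership test."""
    # Deduplicate both inputs so each value occurs at most once per side.
    uniq1 = sorted(set(ar1))
    uniq2 = sorted(set(ar2))

    # Tag each value with its origin (0 = ar1, 1 = ar2) and sort the merged
    # list. For equal values, the ar1 copy sorts directly before the ar2 copy.
    tagged = sorted([(v, 0) for v in uniq1] + [(v, 1) for v in uniq2])

    # A value from ar1 is present in ar2 exactly when its right-hand
    # neighbour in the sorted list carries the same value.
    present = {}
    for i, (value, origin) in enumerate(tagged):
        if origin == 0:
            nxt = tagged[i + 1] if i + 1 < len(tagged) else None
            present[value] = nxt is not None and nxt[0] == value

    # Map the per-value answers back onto the original order of ar1.
    return [present[x] != invert for x in ar1]


print(in1d_sort([0, 1, 2, 5, 0], [0, 2]))               # [True, False, True, False, True]
print(in1d_sort([0, 1, 2, 5, 0], [0, 2], invert=True))  # [False, True, False, True, False]
```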

jaredjeya · 2023-12-08T00:33:20Z

I’ve just now seen #4815; possibly some of the code could be ported across from there too.

jaredjeya · 2023-12-08T13:42:48Z

I'm not sure why those tests are failing all of a sudden - it seems to be at the flake8 step? But the errors flake8 is highlighting are unrelated to code I've modified and it didn't fail in the earlier test.

Edit: nvm, spotted #9342.

stuartarchibald · 2023-12-08T13:44:55Z

@jaredjeya feel free to ignore the flake8 failures until #9342 is merged later today (a new version of flake8 was released and it picked up a couple of new issues). Thanks.

jaredjeya · 2023-12-10T18:29:23Z

> @jaredjeya feel free to ignore the flake8 failures until #9342 is merged later today (a new version of flake8 was released and it picked up a couple of new issues). Thanks.

Now it’s been merged, how do I get the failed tests to run again? Is there anything I need to do to keep the new code up-to-date with the main branch? (Sorry, this is my first ever PR to a project in active development!)

gmarkall · 2023-12-11T11:02:08Z

/azp run

azure-pipelines · 2023-12-11T11:02:18Z

Azure Pipelines successfully started running 1 pipeline(s).

gmarkall · 2023-12-12T16:17:52Z

@jaredjeya Many thanks for the PR! I'm tentatively moving it to the 0.60 milestone - at the moment we're working on the 0.59 release, and the 0.60 release will be focused around supporting NumPy 2.0, so it may be some time before it's possible to review this PR.

guilhermeleobas

Thanks, @jaredjeya. Code seems fine as it is almost a 1:1 port of what NumPy does. Just have some minor comments that should be easy to address.

guilhermeleobas · 2024-01-15T15:22:20Z

@jaredjeya, could you include the tests from NumPy for the set of functions added in this PR?
https://github.com/numpy/numpy/blob/b0371ef240560e78b651a5d7c9407ae3212a3d56/numpy/lib/tests/test_arraysetops.py#L15

jaredjeya · 2024-01-16T17:45:00Z

So what I've done here is add the numpy test cases to the tests I'd already written, rather than writing new tests entirely. Also fixed that I'd been casting all the inputs to arrays before testing them, so I wasn't testing the functionality with lists.

Otherwise I also added a check to catch at compile time if someone calls these functions with incorrect literal values for kind, but I don't enforce that it's a compile time constant. I also changed some of the error messages for readability and consistency.

jaredjeya · 2024-01-16T17:59:29Z

I forgot to add the attribution for test_setops_manyways, which should read:

# https://github.com/numpy/numpy/blob/b0371ef240560e78b651a5d7c9407ae3212a3d56/numpy/lib/tests/test_arraysetops.py#L588 # noqa: E501

Since it's just a comment is there a way to add that in without re-running all the tests?

guilhermeleobas · 2024-01-19T16:58:05Z

numba/np/arraymath.py

@@ -4982,20 +4981,23 @@ def np_in1d_impl(ar1, ar2, assume_unique=False, invert=False, kind="sort"):
 def jit_np_isin(element, test_elements, assume_unique=False, invert=False,
                 kind="sort"):

The default value for kind should be None, not "sort".

The following should work on your branch:

import numpy as np
from numba import njit

@njit
def foo():
    return np.isin(3, 4, kind=None)

print(foo())

jaredjeya · 2024-01-19

The reason I did it this way is that I didn't implement the numpy kind="table" method. (The output is equivalent, but I suppose the performance is higher in certain cases with the table method; I haven't tested.) When kind=None, the numpy code will use the table method if certain conditions are met, otherwise the sorting method, so we don't support that either: strictly, we only support kind="sort". So I thought np.isin(3, 4, kind=None) should fail.

If you think it should work I'm happy to change it, or otherwise we just say we don't support the "kind" argument and leave it at that. Or, I could even implement the "table" method, it isn't that much more complicated than the sort method, and then we fully support the "kind" argument.

guilhermeleobas · 2024-01-19

Okay. I think it is best to match the NumPy signature as closely as possible but keep a check internally for kind == "sort".

jaredjeya · 2024-01-22T16:23:55Z

I decided to add some tests to make sure the correct errors result when nonsense inputs are supplied, including e.g. literals like kind="foo". However I can't for the life of me figure out how to get numba to actually compile these as literals.

This is the full type-checking code within the @overload function:

if kind is not None:
    if (isinstance(kind, types.StringLiteral)
            and kind.literal_value != "sort"):
        raise NumbaValueError('isin: Only kind="sort" is supported')
    elif not isinstance(kind, (types.UnicodeType, str, types.NoneType)):
        raise TypingError('isin: Argument "kind" must be a string or None')

this is the code in the actual implementation,

if kind is not None and kind != "sort":
    raise ValueError('isin: Only kind="sort" is supported')

and this is the testing code:

@njit()
def table_isin(a, b):
    return np.isin(a, b, kind="table")
with self.assertRaises(NumbaValueError):
    table_isin(a, b)

I also tried kind=literal("table") and kind=literally("table"). But the code always raises a ValueError at runtime rather than a NumbaValueError at compilation time, and if I insert debug print statements, I see repr(kind) == unicode_type and type(kind) == <class 'numba.core.types.misc.UnicodeType'>.

What am I doing wrong? Other functions e.g. np.searchsorted use identical patterns in their tests.

jaredjeya · 2024-01-22T17:06:44Z

Never mind, I sorted it.

guilhermeleobas · 2024-01-24T16:29:53Z

numba/np/arraymath.py

+    if kind is not None and kind != "sort":
+        raise ValueError('isin: Only kind="sort" is supported')

If kind=None, NumPy will use table by default, which is not supported in this PR. Change the clause to something like this:

Suggested change

-    if kind is not None and kind != "sort":
-        raise ValueError('isin: Only kind="sort" is supported')
+    if kind != "sort":
+        raise ValueError('isin: Only kind="sort" is supported')

This will force a user to spell the kind="sort" keyword argument and not assume Numba supports the table option.

jaredjeya · 2024-01-30

> If kind=None, NumPy will use table by default, which is not supported in this PR.

Yes, that is why I didn't implement kind=None in the first place. I'll change it back.

guilhermeleobas · 2024-02-02

The function signature needs to use kind=None, as that is what NumPy specifies. But let's keep a check so that the function only runs if the user specifies kind="sort".

Let me know if you have any questions regarding this.
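
The behaviour requested in this review thread (a NumPy-compatible signature with kind=None as the default, but only an explicit kind="sort" accepted) can be sketched in plain Python. This is a minimal sketch and not the Numba overload; `isin_sort_only` is a hypothetical name and the membership test is a stand-in for the real algorithm.

```python
def isin_sort_only(element, test_elements, assume_unique=False,
                   invert=False, kind=None):
    """Sketch: NumPy-like signature that only accepts kind="sort"."""
    # kind defaults to None to mirror the NumPy signature, but the default is
    # rejected so callers cannot assume the "table" method is available.
    if kind != "sort":
        raise ValueError('isin: Only kind="sort" is supported')
    # assume_unique is accepted for signature compatibility; this stand-in
    # implementation does not need it.
    lookup = set(test_elements)
    return [(x in lookup) != invert for x in element]


print(isin_sort_only([1, 2, 3], [2], kind="sort"))  # [False, True, False]

try:
    isin_sort_only([1, 2, 3], [2])  # kind left at its default of None
except ValueError as exc:
    print(exc)  # isin: Only kind="sort" is supported
```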

guilhermeleobas · 2024-01-24T16:30:43Z

numba/np/arraymath.py

+    if not (kind is None or isinstance(kind, types.NoneType)):
+        kindval = getattr(kind, "literal_value", kind)
+        if not isinstance(kindval,
+                          (types.UnicodeType, str, types.StringLiteral)):
+            raise TypingError('in1d: Argument "kind" must be a string or None')
+        elif kindval != "sort":
+            raise NumbaValueError('in1d: Only kind="sort" is supported')

That's fine, but we can keep only the ValueError to simplify the codebase.

jaredjeya · 2024-02-13

To be clear, I should get rid of this whole section?

jaredjeya · 2024-02-13

Actually, I've reduced it to just

if not isinstance(kind, (types.UnicodeType, str, types.StringLiteral)):
    raise TypingError('in1d: Only kind="sort" is supported')

which makes sure kind=None doesn't make it past compilation. It seemed to be causing some type inference issues if I didn't have that.

jaredjeya · 2024-02-14T16:30:30Z

@guilhermeleobas Tests seem to be failing now because numpy < 1.24 doesn't support the kind keyword, and because we now force this to be included explicitly, the numpy and numba behaviours diverge. What do you suggest?

According to the release notes, pre-1.24, the kind="sort" behaviour was the default. Newer versions automatically pick the optimal algorithm.

It might not be too hard to implement kind="table", but it would take me a while as I'm quite busy atm. Otherwise we can drop the keyword entirely and say we're only replicating pre-1.24 behaviour? The documentation suggests using a version gate but I don't know what that means in this context.

guilhermeleobas · 2024-02-14T18:09:51Z

> According to the release notes, pre-1.24, the kind="sort" behaviour was the default. Newer versions automatically pick the optimal algorithm.

I think we can have an if-else statement to support the kind argument if the NumPy version is greater than or equal to 1.24:

if numpy_version < (1, 24):
    @overload(np.isin)
    def jit_np_isin(element, test_elements, assume_unique=False, invert=False):
        ...

else:
    @overload(np.isin)
    def jit_np_isin(element, test_elements, assume_unique=False, invert=False,
                    kind=None):
        ...

You may have to update the tests as well to only pass kind="sort" if NumPy >= 1.24.

> Otherwise we can drop the keyword entirely and say we're only replicating pre-1.24 behaviour?

That's also an option. Choose the one that is easiest for you to implement.
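
The version-gate idea from the snippet above can be tried out without Numba. The following is a plain-Python sketch under stated assumptions: `make_isin` is a hypothetical helper, the version is passed in as a tuple instead of being read from NumPy, and the membership test is a stand-in for the real implementation.

```python
import inspect


def make_isin(numpy_version):
    """Return an isin-like function whose signature depends on the version."""
    if numpy_version < (1, 24):
        # Older NumPy has no `kind` keyword, so it is not exposed here.
        def isin(element, test_elements, assume_unique=False, invert=False):
            lookup = set(test_elements)
            return [(x in lookup) != invert for x in element]
    else:
        # Newer NumPy has `kind`; only an explicit "sort" is accepted.
        def isin(element, test_elements, assume_unique=False, invert=False,
                 kind=None):
            if kind != "sort":
                raise ValueError('isin: Only kind="sort" is supported')
            lookup = set(test_elements)
            return [(x in lookup) != invert for x in element]
    return isin


old = make_isin((1, 23))
new = make_isin((1, 24))
print("kind" in inspect.signature(old).parameters)  # False
print("kind" in inspect.signature(new).parameters)  # True
```

Comparing version tuples rather than strings matters here: (1, 9) < (1, 24) is True, whereas the string comparison "1.9" < "1.24" is False.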

github-actions · 2024-07-09T01:49:42Z

This pull request is marked as stale as it has had no activity in the past 3 months. Please respond to this comment if you're still interested in working on this. Many thanks!

jaredjeya · 2024-07-10T16:07:21Z

> This pull request is marked as stale as it has had no activity in the past 3 months. Please respond to this comment if you're still interested in working on this. Many thanks!

I do have this on my todo list, I've just been somewhat swamped recently. I plan to set aside some time to get this over the line soon. Hopefully nothing major is needed to merge this with subsequent changes to numba.

jaredjeya · 2024-08-27T17:11:28Z

I decided it's simpler and less confusing to the end user (especially working across multiple versions) if we just specify that "kind" isn't supported, and that behaviour matches that of pre-1.24.

Unfortunately it looks like the build is failing due to the changes made to introduce the new typing system. I'm not really sure what to do here?

Edit: never mind, managed to sort it out.

guilhermeleobas

Sounds good to me. I'm happy with the current changes. Thanks for your effort in this PR.

esc · 2024-09-24T08:22:51Z

Thank you for the patch.

synapticarbors and others added 5 commits December 7, 2023 12:46

Add support for np.setxor1d

ed7bd17

np.setxor1d clean up flake8 errors

8d75fea

Implement the functions np.in1d, np.setxor1d, and np.setdiff1d. Additionally, extend np.intersect1d to support the first optional argument (assume_unique).

f74b246

Towncrier release note.

3b0bb2b

Fixed accidentally removing last line of np_trim_zeros during rebase.

1015c03

jaredjeya mentioned this pull request Dec 8, 2023

Add support for numpy.isin #4815

Closed

4 tasks

jaredjeya marked this pull request as draft December 8, 2023 12:10

Adds np.isin support and associated tests. Implements numpy's fast algorithm for np.in1d with short second inputs.

b4b4a1e

jaredjeya marked this pull request as ready for review December 8, 2023 13:39

jaredjeya changed the title ~~Add np.in1d, np.setxor1d, np.setdiff1d, extend np.intersect1d.~~ Add np.in1d, np.isin, np.setxor1d, np.setdiff1d, extend np.intersect1d. Dec 8, 2023

gmarkall added the 2 - In Progress and 3 - Ready for Review labels and removed the 2 - In Progress label Dec 11, 2023

gmarkall added this to the 0.60.0-rc1 milestone Dec 12, 2023

guilhermeleobas reviewed Jan 12, 2024

guilhermeleobas added the 4 - Waiting on author (waiting for author to respond to review) and Effort - medium (medium size effort needed) labels and removed the 3 - Ready for Review label Jan 12, 2024

Implementing reviewer comments, including adding extensive tests borrowed from numpy's own testing. Also fixed some bugs in the tests (including that lists were never tested).

5a16a93

jaredjeya requested a review from guilhermeleobas January 16, 2024 22:20

guilhermeleobas requested changes Jan 19, 2024

Support kind="None", test for correct compilation errors.

3725c80

jaredjeya requested a review from guilhermeleobas January 22, 2024 23:12

guilhermeleobas requested changes Jan 24, 2024

Remove support for kind="None", only support explicit kind="sort".

b99853a

gmarkall modified the milestones: 0.60.0-rc1, 0.61.0-rc1 Apr 9, 2024

github-actions bot added the stale label (marker label for stale issues) Jul 9, 2024

guilhermeleobas added the 2 - In Progress label and removed the stale label Jul 10, 2024

Remove support for "kind" keyword, explain that behaviour matches that of pre-1.24 Numpy.

4337466

jaredjeya added 2 commits August 27, 2024 18:25

Try and merge with upstream changes

ddc2d28

Merge remote-tracking branch 'upstream/main' into add-setops1d

5d5b5b5

guilhermeleobas approved these changes Aug 27, 2024

guilhermeleobas added the 5 - Ready to merge label (review and testing done, is ready to merge) and removed the 2 - In Progress and 4 - Waiting on author labels Aug 27, 2024

esc merged commit 301ba23 into numba:main Sep 24, 2024
20 of 22 checks passed
