Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow numba functions within zip files to be cached #9630

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

max-sixty
Copy link

Numba doesn't currently allow .zip archives to contain numba functions when caching is enabled. This is all but required when using Spark.

This change enables that, by recognizing a zip file and using a user-wide cache directory for the cache. It closes (most of?) #4908 (comment)

I'm the maintainer of numbagg, but haven't contributed directly to numba, so please let me know if the contribution needs any work, happy to make changes.

FWIW — hoping to be helpful — I found it not super easy to use the repo — I notice it doesn't use pytest / code formatters / etc. That does make contributions a bit harder for newcomers, though I'm sure it's for good reason...

@gmarkall gmarkall added 2 - In Progress caching Issue involving caching labels Jun 27, 2024
@gmarkall
Copy link
Member

Thanks for the PR! I'm just marking it as "2 - In progress" while we wait for CI.

FWIW — hoping to be helpful — I found it not super easy to use the repo — I notice it doesn't use pytest / code formatters / etc. That does make contributions a bit harder for newcomers, though I'm sure it's for good reason...

Thanks for the comments, I do think they're helpful. It doesn't use pytest because there's a lot of homegrown addons to unittest that have been added over the years, and porting them over to pytest wasn't trivial (there was an attempt some years ago but I believe it stalled).

WRT to code formatting, we only check for flake8 compliance (with a large number of files excluded as they were historically never checked and they never got fixed up yet). I think there's also more that could be done to make things a smoother experience here, but it's been difficult to move forward with relative to other items.

@gmarkall
Copy link
Member

From a quick skim it looks like the tests aren't platform-agnostic WRT the way paths are formed (the root, and separators) - I haven't looked deeper, but perhaps pathlib.Path could be used to make the form of the tests platform-independent?

@max-sixty
Copy link
Author

From a quick skim it looks like the tests aren't platform-agnostic WRT the way paths are formed (the root, and separators) - I haven't looked deeper, but perhaps pathlib.Path could be used to make the form of the tests platform-independent?

Thanks, fixed

@max-sixty
Copy link
Author

It doesn't use pytest because there's a lot of homegrown addons to unittest that have been added over the years, and porting them over to pytest wasn't trivial (there was an attempt some years ago but I believe it stalled).

Yes totally re homegrown addons...

I would say that pytest is much much nicer, both from the perspective of maintainers being able to design better test suites, and the perspective of contributors who are running tests — for example I couldn't seem to enter pdb on a test failure with the current setup? Transitioning probably isn't trivial but also probably not herculean — pytest can collect & run standard unittests, and running two test commands for a while could be a way to get moving.

WRT to code formatting, we only check for flake8 compliance (with a large number of files excluded as they were historically never checked and they never got fixed up yet). I think there's also more that could be done to make things a smoother experience here, but it's been difficult to move forward with relative to other items.

OK, you might want to check out ruff for linting; it fixes most failures itself. For formatting: we moved to black in a big bang in xarray, was very successful. The one-off merge conflicts are a bit annoying, but tractable ("merge up to the formatting commit, format with black, merge in the formatting commit").


The pytest transition probably requires some expertise but if you wanted a hand with linting & formatting, happy to help.

@max-sixty
Copy link
Author

This is ready for review!

@gmarkall
Copy link
Member

gmarkall commented Jul 2, 2024

This is ready for review!

Oops, sorry, I forgot to check back when I was waiting for CI the other day.

@gmarkall gmarkall added this to the 0.61.0-rc1 milestone Jul 2, 2024
@max-sixty
Copy link
Author

Hi team — the gentlest of pings here. I know things can get backlogged. I do think this is a fairly small unambiguous improvement, with some decent comments and a great test. If there's anything I can do to reduce the reviewing burden, please lmk.

@guilhermeleobas
Copy link
Contributor

Sorry! I'll review it today

@guilhermeleobas
Copy link
Contributor

Sorry, I wrongly pushed to your branch an empty commit

@max-sixty
Copy link
Author

No prob re empty commit, push away.

Docs error seems unrelated though?

numba/_helperlib.c:22:44: error: ‘I’ undeclared (first use in this function)
   22 |     #define _complex_float_ctor(r, i) (r   I * i)
      |                                            ^
numba/_helperlib.c:147:12: note: in expansion of macro ‘_complex_float_ctor’
  147 |     *out = _complex_float_ctor((float) _out.real, (float) _out.imag);
      |            ^~~~~~~~~~~~~~~~~~~
numba/_helperlib.c:22:44: note: each undeclared identifier is reported only once for each function it appears in
   22 |     #define _complex_float_ctor(r, i) (r   I * i)
      |                                            ^
numba/_helperlib.c:147:12: note: in expansion of macro ‘_complex_float_ctor’
  147 |     *out = _complex_float_ctor((float) _out.real, (float) _out.imag);
      |            ^~~~~~~~~~~~~~~~~~~
error: command '/usr/bin/gcc' failed with exit code 1

@guilhermeleobas
Copy link
Contributor

Docs error seems unrelated though?

Yes! I guess something must have changed on conda-forge.

@stuartarchibald
Copy link
Contributor

I think that fixing the docs issue requires merging numba:main into this branch so that #9675 is picked up.

numba/tests/test_caching.py Outdated Show resolved Hide resolved
numba/tests/test_caching.py Outdated Show resolved Hide resolved
numba/tests/test_caching.py Outdated Show resolved Hide resolved
numba/tests/test_caching.py Outdated Show resolved Hide resolved
numba/tests/test_caching.py Outdated Show resolved Hide resolved
numba/tests/test_caching.py Outdated Show resolved Hide resolved
numba/core/caching.py Outdated Show resolved Hide resolved
numba/core/caching.py Outdated Show resolved Hide resolved
@max-sixty
Copy link
Author

(I moved the dev tools discussions to other issues. This PR is ready for a second review — thanks!)

Copy link
Contributor

@guilhermeleobas guilhermeleobas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me. Just wait for @sklam review and we can merge this one. Thanks for your contribution.

numba/tests/test_caching.py Outdated Show resolved Hide resolved
@guilhermeleobas
Copy link
Contributor

guilhermeleobas commented Aug 20, 2024

@sklam, are you ok with the current state or would like to review this one as well? If not, we can merge it.

@guilhermeleobas guilhermeleobas added 4 - Waiting on second reviewer Patch needs a second reviewer. and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Aug 27, 2024
@sklam sklam added 5 - Ready to merge Review and testing done, is ready to merge 4 - Waiting on author Waiting for author to respond to review and removed 4 - Waiting on second reviewer Patch needs a second reviewer. 3 - Ready for Review 5 - Ready to merge Review and testing done, is ready to merge labels Aug 27, 2024
@sklam
Copy link
Member

sklam commented Aug 27, 2024

@max-sixty, can you add a news fragment for this PR? It will then be part of the release note. See https://github.com/numba/numba/tree/main/docs/upcoming_changes for README and examples.

@max-sixty
Copy link
Author

Great, added!

@esc
Copy link
Member

esc commented Aug 29, 2024

/azp run

Copy link
Contributor

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@esc esc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some capitalization fixes.

docs/upcoming_changes/9630.improvement.rst Outdated Show resolved Hide resolved
docs/upcoming_changes/9630.improvement.rst Outdated Show resolved Hide resolved
docs/upcoming_changes/9630.improvement.rst Outdated Show resolved Hide resolved
docs/upcoming_changes/9630.improvement.rst Outdated Show resolved Hide resolved
docs/upcoming_changes/9630.improvement.rst Outdated Show resolved Hide resolved
max-sixty and others added 5 commits August 29, 2024 09:02
Co-authored-by: Emergency Self-Construct <[email protected]>
Co-authored-by: Emergency Self-Construct <[email protected]>
Co-authored-by: Emergency Self-Construct <[email protected]>
Co-authored-by: Emergency Self-Construct <[email protected]>
Co-authored-by: Emergency Self-Construct <[email protected]>
@max-sixty
Copy link
Author

Done!

Copy link
Member

@esc esc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@max-sixty
Copy link
Author

No urgency but it states still Waiting on Author — I think it's now ready for you to merge — lmk if that's not correct

@esc esc added 5 - Ready to merge Review and testing done, is ready to merge and removed 4 - Waiting on author Waiting for author to respond to review labels Sep 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to merge Review and testing done, is ready to merge caching Issue involving caching
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants