huggingface-cli scan-cache doesn't capture cached datasets #2218
Comments
The offending code can be found here, where the default cache location is sourced from the environment variable HF_HUB_CACHE. I say 'offending code', but that's just the original commit of that code. It was how it was designed at the time, I suppose, but I imagine it was later decided to have a shared blob download location to allow for datasets that have shared files? I'm guessing...
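For anyone following along, here is a minimal sketch of how a default hub cache path of this kind can be resolved from the environment. Only HF_HUB_CACHE comes from the comment above; the HF_HOME fallback and the helper name resolve_hub_cache are assumptions made for illustration, and the real library may consult additional variables that this sketch ignores.

```python
import os
from pathlib import Path


def resolve_hub_cache(env=None):
    """Hypothetical helper: guess the hub cache directory from env vars.

    Not the library's implementation, only an illustration of the
    "env var with a default" pattern described above.
    """
    env = os.environ if env is None else env

    # An explicit HF_HUB_CACHE wins over everything else.
    explicit = env.get("HF_HUB_CACHE")
    if explicit:
        return Path(explicit).expanduser()

    # Assumption: HF_HOME, when set, relocates the whole cache tree.
    hf_home = env.get("HF_HOME")
    base = Path(hf_home).expanduser() if hf_home else Path.home() / ".cache" / "huggingface"

    # Default layout matches $HOME/.cache/huggingface/hub from the issue.
    return base / "hub"


if __name__ == "__main__":
    print(resolve_hub_cache())
```

Passing a plain dict instead of os.environ makes it easy to check what a given combination of variables would resolve to without touching the real environment.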
Thanks for pointing that out @sealad886! The
I've recently noticed that I'm unable to use What seems to be happening is the following:
Is there a simple workaround in the setting of the env vars so that one can use
Hi @lewtun thanks for the feedback. This is something specific to
that only concerns the
Describe the bug
The cache location of datasets varies depending on how you download them from Hugging Face:
In this case, the default location (I'll use macOS since that's what I have, but I'm assuming some level of overall consistency here) is:
$HOME/.cache/huggingface/hub/
In the above example, the directory created is datasets--wikimedia--wikisource, such that:
In this case, the default location is no longer controlled by the environment variable HF_HUB_CACHE. The naming convention is also slightly different. The default location is:
$HOME/.cache/huggingface/datasets
and the data structure is:
Using huggingface-cli scan-cache, a user is unable to access the (actually useful) second cache location. I say "actually useful" because to date I haven't been able to figure out how to easily get a dataset cached with the CLI to be used in any models in code.
Other issues that may or may not need separate tickets
huggingface-cli delete-cache.
Reproduction
Well...use the code and examples above.
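To make the mismatch easier to see on a given machine, a rough sketch like the following lists what sits in each of the two default locations described above. The function name list_cached_datasets and the hf_home parameter are made up for illustration, the sketch ignores any environment variable overrides, and it does not try to interpret the second directory's naming convention; it only reports the sub-directories it finds.

```python
from pathlib import Path


def list_cached_datasets(hf_home=None):
    """Hypothetical helper: list dataset-looking entries in both caches.

    Assumes the default layout from this issue:
      <base>/hub/datasets--<org>--<name>   (hub-style cache)
      <base>/datasets/<entry>              (second cache location)
    """
    base = Path(hf_home) if hf_home else Path.home() / ".cache" / "huggingface"
    hub_dir = base / "hub"
    datasets_dir = base / "datasets"

    found = {"hub": [], "datasets": []}

    # Hub-style cache: dataset repos are prefixed with "datasets--".
    if hub_dir.is_dir():
        found["hub"] = sorted(
            p.name for p in hub_dir.glob("datasets--*") if p.is_dir()
        )

    # Second location: report every sub-directory without interpreting it.
    if datasets_dir.is_dir():
        found["datasets"] = sorted(
            p.name for p in datasets_dir.iterdir() if p.is_dir()
        )

    return found


if __name__ == "__main__":
    for location, names in list_cached_datasets().items():
        print(location, names)
```

Comparing this output with what huggingface-cli scan-cache reports should show which entries the scan is missing.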
Logs
No response
System info