Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return the File instance from Entry::unpack #203

Merged
merged 1 commit into from
May 17, 2019

Conversation

rbtcollins
Copy link
Contributor

This permits optimisation of unpacking on Windows - which is perhaps
overly special cased to include in tar itself? Doing this for rustup
which already doesn't use the Archive wrapper :/.

With this patch, I reduced rustup's rust-docs install from 21/22s to
11s on my bench machine - details at
rust-lang/rustup#1850

I haven't filled in all the possibilities for the Unpacked enum, but
happy to do so if you'd like that done - it seemed like it wouldn't
be useful at this stage as there are no live objects to return.

Follow on patch from #202 - again, text conflicts or I would have made them standalone.

Copy link
Owner

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me and looks like some good wins!

This is technically a breaking change but that's ok. I'm probably gonna try to release this as a patch version and pray it doesn't break anything and if that doesn't work out a new major version could be published.

I expected some further speedups on behalf of rustup from removing extra syscalls on the extracted files (using the returned handle instead of reopening), but I was surprised to see that the main purpose was actually just dropping files! Isn't this a case where if extraction happened in parallel on many threads it'd also solve the issue?

src/entry.rs Outdated Show resolved Hide resolved
src/entry.rs Outdated Show resolved Hide resolved
src/entry.rs Outdated Show resolved Hide resolved
src/entry.rs Outdated Show resolved Hide resolved
@rbtcollins
Copy link
Contributor Author

Seems reasonable to me and looks like some good wins!

Cool. I have some further things I'm poking at - such as being able to set the content indexing disabled flag - this would need to let folk hook into the open() call more deeply, or would need a more complex way of telling unpack what to do; if you have API suggestions around this that would be great.

This is technically a breaking change but that's ok. I'm probably gonna try to release this as a patch version and pray it doesn't break anything and if that doesn't work out a new major version could be published.

FWIW rustup didn't fail to build with this change added to it - the existing signature io::Result<()> is already must-consume, with the value ignored; as you say "technically breaking" - I too hope no fallout happens.

I expected some further speedups on behalf of rustup from removing extra syscalls on the extracted files (using the returned handle instead of reopening), but I was surprised to see that the main purpose was actually just dropping files! Isn't this a case where if extraction happened in parallel on many threads it'd also solve the issue?

Yes, if extraction was happening in parallel on many threads, CloseHandle would be per-thread as a consequence. However handling of dependencies in tar files becomes more complex with parallel extraction (e.g. extract foo/ then foo/bar ). This by comparison is very simple to reason about (see the patch in rustup).

Another thing we can parallelise for more speed is the decompression of the tar content; and then also another person is interested in extracting while downloading from the net for even better wall clock times.

@rbtcollins rbtcollins force-pushed the return-file branch 3 times, most recently from 049e3a3 to ac582f5 Compare May 16, 2019 22:56
src/entry.rs Outdated Show resolved Hide resolved
This permits optimisation of unpacking on Windows - which is perhaps
overly special cased to include in tar itself? Doing this for rustup
which already doesn't use the Archive wrapper :/.

With this patch, I reduced rustup's rust-docs install from 21/22s to
11s on my bench machine - details at
rust-lang/rustup#1850

I haven't filled in all the possibilities for the Unpacked enum, but
happy to do so if you'd like that done - it seemed like it wouldn't
be useful at this stage as there are no live objects to return.
@alexcrichton alexcrichton merged commit 7098b0a into alexcrichton:master May 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants