Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zlib::GzipReader#readpartial until eof? causes EOFError #56

Closed
martinemde opened this issue Aug 16, 2023 · 4 comments · Fixed by #72
Closed

Zlib::GzipReader#readpartial until eof? causes EOFError #56

martinemde opened this issue Aug 16, 2023 · 4 comments · Fixed by #72

Comments

@martinemde
Copy link
Contributor

martinemde commented Aug 16, 2023

I'm using the data.tar.gz from ruby-progressbar-1.13.0.gem, which seems to trigger this error. If I take exactly the same data.tar.gz, gunzip, and re-gzip it, then this bug doesn't happen. However, the gem unpacks fine, it's just readpartial that breaks.

$ gem fetch ruby-progressbar -v 1.13.0
$ tar zxf ruby-progressbar-1.13.0.gem data.tar.gz
irb> io = File.open('data.tar.gz')
=> #<File:data.tar.gz>
irb> io.size
=> 10250
irb> Zlib::GzipReader.wrap(io) { |gzio| gzio.readpartial(16_384) until gzio.eof? } # rubygems does this, but with #read
Traceback (most recent call last):
        7: from /usr/bin/irb:23:in `<main>'
        6: from /usr/bin/irb:23:in `load'
        5: from /Library/Ruby/Gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `<top (required)>'
        4: from (irb):9
        3: from (irb):9:in `wrap'
        2: from (irb):9:in `block in irb_binding'
        1: from (irb):9:in `readpartial'
EOFError (end of file reached)

I have done some digging to verify I didn't cause this. It seems to require that you use this data.tar.gz that has a conflict with how zlib uses readpartial. Other data.tar.gz files work, and this same data.tar file re-gzipped works.

I think the EOF is raised here because it reaches the end of the file during gzfile_read_more(gz, outbuf); and then checks if it's at the end before finalizing the unzip. https://github.com/ruby/zlib/blob/master/ext/zlib/zlib.c#L2933

This does seem like it could be fixed. Using gzio.read(16_384) instead of gzio.readpartial works.

Edit2: The error is real but my understanding of it and why it happened was wrong. Leaving this open with changed title until the fix is merged.

This bug blocks rubygems from using readpartial to read gems.

@martinemde martinemde changed the title Zlib::GzipReader#readpartial seems to always raise EOFError Zlib::GzipReader#readpartial raises EOFError on certain valid-ish gz files Aug 16, 2023
martinemde added a commit to rubygems/rubygems that referenced this issue Aug 16, 2023
@martinemde
Copy link
Contributor Author

I'm running into a similar issue on jruby. It seems that readpartial on a gzipreader wrapped io does not always read as many bytes as you ask it to. This can cause the io.pos to be shifted, starting at the wrong position to line up with the tar header. This happens only if I use readpartial to move through the gzio, but not if I use gzio.read.

@martinemde martinemde changed the title Zlib::GzipReader#readpartial raises EOFError on certain valid-ish gz files Zlib::GzipReader#readpartial reads the wrong number of bytes Sep 14, 2023
@segiddins
Copy link
Contributor

I think that readpartial isnt guaranteed to read exactly len bytes.

From the IO docs: Reads up to maxlen bytes from the stream

also:

When blocked, the method waits for either more data or EOF on the stream:

If more data is read, the method returns the data.

If EOF is reached, the method raises [EOFError](dfile:///Users/segiddins/Library/Application Support/Dash/DocSets/Ruby_3/Ruby.docset/Contents/Resources/Documents/EOFError.html).

When not blocked, the method responds immediately:

Returns data from the buffer if there is any.

Otherwise returns data from the stream if there is any.

Otherwise raises [EOFError](dfile:///Users/segiddins/Library/Application Support/Dash/DocSets/Ruby_3/Ruby.docset/Contents/Resources/Documents/EOFError.html) if the stream has reached EOF.

So it seems like an EOFError should only be raise if no data was read, which I think is the opposite of the implementation today.

I think I have a fix though, and cutting down on the number of calls needed in a readpartial until eof? loop also seems like a good thing?

segiddins added a commit to segiddins/zlib that referenced this issue Sep 19, 2023
@martinemde martinemde changed the title Zlib::GzipReader#readpartial reads the wrong number of bytes Zlib::GzipReader#readpartial until eof? causes EOFError Dec 16, 2023
@segiddins
Copy link
Contributor

I spent more time looking into this.

I ran gunzip -c test/zlib/fixtures/ruby-progressbar-1.13.0.gem.data.tar.gz | gzip -9 - >test/zlib/fixtures/ruby-progressbar-1.13.0.rezip.gem.data.tar.gz, then diffed the hexdumps of the two files:

❯ diff -U5 --label orig <(hexdump -C test/zlib/fixtures/ruby-progressbar-1.13.0.gem.data.tar.gz) --label new <(hexdump -C test/zlib/fixtures/ruby-progressbar-1.13.0.rezip.gem.data.tar.gz)
--- orig
    new
@@ -1,7  1,7 @@
-00000000  1f 8b 08 00 c7 e8 02 64  02 03 ec 3d 7d 7f da 38  |.......d...=}..8|
 00000000  1f 8b 08 00 47 b5 83 65  02 03 ed 3d 7d 7f da 38  |....G..e...=}..8|
 00000010  d2 fb 37 9f 42 4d 9e 2e  90 25 06 f2 d6 3d b6 34  |..7.BM...%...=.4|
 00000020  4b 12 da 72 9b 84 fc 80  6c af 4f 36 3f d7 80 49  |K..r....l.O6?..I|
 00000030  7c 35 36 67 9b a4 d9 a6  cf 67 7f 66 f4 62 4b 7e  ||56g.....g.f.bK~|
 00000040  83 74 d3 f6 ae 07 db 4d  82 ad 19 8d 46 d2 68 66  |.t.....M....F.hf|
 00000050  34 1a 1d 77 0e db a7 fd  b6 16 7c 08 7e f8 52 9f  |4..w......|.~.R.|
@@ -636,8  636,8 @@
 000027a0  4c 00 36 75 c7 a5 ed bd  1a cf af 99 72 09 5a ac  |L.6u........r.Z.|
 000027b0  ec 5e 8d fb 22 97 b8 d7  6c e9 d5 75 65 48 7d 4f  |.^.."...l..ueH}O|
 000027c0  eb ff 8d e9 61 bc ec 63  69 00 0b f7 7f f7 b6 e2  |....a..ci.......|
 000027d0  eb ff 4e 7d 95 ff e5 2b  fa ff e4 c5 9d 90 df db  |..N}... ........|
 000027e0  bd 3e 0b ce 2a d6 b5 fa  b6 56 0b b7 59 56 53 7d  |.>..*....V..YVS}|
-000027f0  f5 59 7d 56 9f d5 e7 7b  f9 fc 3f 00 00 00 ff ff  |.Y}V...{..?.....|
-00002800  03 00 19 b6 3b 4c 00 ea  00 00                    |....;L....|
-0000280a
 000027f0  f5 59 7d 56 9f d5 e7 7b  f9 fc 3f 19 b6 3b 4c 00  |.Y}V...{..?..;L.|
 00002800  ea 00 00                                          |...|
 00002803

What's interesting here is, other than one byte at the start of the deflate stream, the only diff is at the end.

So then I ran https://github.com/madler/infgen, and diffed that:

❯ diff -U5 --label orig <(~/Development/github.com/madler/infgen/a.out test/zlib/fixtures/ruby-progressbar-1.13.0.gem.data.tar.gz) --label new <(~/Development/github.com/madler/infgen/a.out test/zlib/fixtures/ruby-progressbar-1.13.0.rezip.gem.data.tar.gz)
--- orig
    new
@@ -1,9  1,10 @@
 ! infgen 3.2 output
 !
 gzip
 !
 last
 dynamic
 litlen 0 10
 litlen 10 7
 litlen 32 5
 litlen 33 9
@@ -4275,16  4276,9 @@
 match 258 1
 match 258 1
 match 258 1
 match 258 1
 match 200 1
-end
-!
-stored
-end
-!
-last
-fixed
 end
 !
 crc
 length

and this shows us what's happening here.

The original file has 3 different deflate blocks in the stream: the main block, followed by an empty block, followed by a final empty block.

Because our Zlib implementation keeps reading until the output buffer has something in it, it will read through those final 2 empty blocks, try to read again, and then EOFError because finally the stream is (actually) at its end.

I don't know what the fix here should be, other than changing gzfile_read_more to break unconditionally (even when ZSTREAM_BUF_FILLED(&gz->z) == 0), and removing the ZSTREAM_BUF_FILLED check from readpartial as well.

@segiddins
Copy link
Contributor

Minimized test case:

#!/usr/bin/env ruby -w

require 'stringio'
require 'zlib'

contents = "a" * 2030
zd = Zlib::Deflate.new(Zlib::NO_COMPRESSION)
header = "\x1f\x8b\x08\x00\xc7\xe8\x02\x64\x02\x03".b
s = "".b
s << zd.deflate(contents)
s << zd.flush(Zlib::SYNC_FLUSH)
s << zd.finish
zd.close

s = header  << s[2..-5] << [Zlib.crc32(contents)].pack('l<*') << [contents.size].pack('l<*')

read = "".b
Zlib::GzipReader.wrap(StringIO.new(s)) do |gzio| 
  until gzio.eof?
    part = gzio.readpartial(1024) # RAISES
    read << part
  end
end
pp read == contents

martinemde added a commit to martinemde/zlib that referenced this issue Dec 22, 2023
Only consider it eof if we read ahead and something fills the buf.
If not, we may only have empty blocks and the footer.

Fixes ruby#56
segiddins added a commit to segiddins/zlib that referenced this issue Dec 22, 2023
…he stream is not yet finished

See code comments for details

Fixes ruby#56
segiddins added a commit to segiddins/zlib that referenced this issue Dec 22, 2023
…he stream is not yet finished

See code comments for details

Fixes ruby#56
segiddins added a commit to segiddins/zlib that referenced this issue Dec 22, 2023
…he stream is not yet finished

See code comments for details

Fixes ruby#56
segiddins added a commit to segiddins/zlib that referenced this issue Dec 22, 2023
…he stream is not yet finished

See code comments for details

Fixes ruby#56
segiddins added a commit to segiddins/zlib that referenced this issue Jan 16, 2024
…he stream is not yet finished

See code comments for details

Fixes ruby#56
martinemde added a commit to martinemde/zlib that referenced this issue Jan 22, 2024
Only consider it eof if we read ahead and something fills the buf.
If not, we may only have empty blocks and the footer.

Fixes ruby#56
@nobu nobu closed this as completed in #72 Feb 22, 2024
matzbot pushed a commit to ruby/ruby that referenced this issue Feb 22, 2024
Only consider it eof if we read ahead and something fills the buf.
If not, we may only have empty blocks and the footer.

Fixes ruby/zlib#56

ruby/zlib@437bea8003
k0kubun added a commit to ruby/ruby that referenced this issue May 29, 2024
	[ruby/zlib] In Zlib::GzipReader#eof? check if we're actually at eof

	Only consider it eof if we read ahead and something fills the buf.
	If not, we may only have empty blocks and the footer.

	Fixes ruby/zlib#56

	ruby/zlib@437bea8003
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants