`ReadableFileHandleProtocol.readToEnd` can fail to read the complete contents of file sizes less than the single shot read limit #2769

rpecka · 2024-07-06T10:33:08Z

Resolved an issue where ReadableFileHandleProtocol.readToEnd could fail to read the contents of files smaller than the single shot read limit (64 MiB).

Motivation:

If readToEnd detects that the file in question is smaller than the single shot read limit, then it will read the file using a single call to readChunk. However, readChunk is not guaranteed to return the entire requested chunk. If it returns fewer bytes than requested, readToEnd returns only the result of that first read and does not issue any follow-up reads.
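The short-read problem described above can be sketched without NIOFileSystem at all. Everything in this block is a hypothetical stand-in (`ShortReadingSource`, `maxBytesPerRead`, `readFully`); it only illustrates why a single read is not enough and what a follow-up loop looks like.

```swift
// Hypothetical stand-in for a file handle whose reads may come back short.
struct ShortReadingSource {
    let contents: [UInt8]
    let maxBytesPerRead: Int  // simulates a read returning fewer bytes than requested

    // Returns up to `length` bytes starting at `offset`; an empty array means EOF.
    func readChunk(fromAbsoluteOffset offset: Int, length: Int) -> [UInt8] {
        guard offset < contents.count else { return [] }
        let count = min(length, maxBytesPerRead, contents.count - offset)
        return Array(contents[offset..<(offset + count)])
    }
}

// Keeps reading until the requested number of bytes has arrived or EOF is hit.
func readFully(_ source: ShortReadingSource, fromAbsoluteOffset start: Int, length: Int) -> [UInt8] {
    var result: [UInt8] = []
    result.reserveCapacity(length)
    while result.count < length {
        let chunk = source.readChunk(
            fromAbsoluteOffset: start + result.count,
            length: length - result.count
        )
        if chunk.isEmpty { break }  // EOF: nothing more to read
        result.append(contentsOf: chunk)
    }
    return result
}

let source = ShortReadingSource(contents: (0..<200).map { UInt8($0) }, maxBytesPerRead: 64)
let single = source.readChunk(fromAbsoluteOffset: 0, length: 200)  // short read
let full = readFully(source, fromAbsoluteOffset: 0, length: 200)   // complete contents
```

In this sketch the single call yields only the first 64 bytes, while the loop returns all 200.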

Modifications:

I separated this into two sections (two commits) because I found another issue that I had to resolve in order to fix the chunking problem.

First Commit

This is what is required to fix the missing chunk reads, but it causes testReadFileAsChunks to fail. handle.readChunks(in: ..., chunkLength: .bytes(128)) moves the file access position to the end, so the subsequent handle.readToEnd(maximumSizeAllowed: .bytes(1024 * 1024)) reads zero bytes because the file has already been fully read. We then hit a precondition failure when we run contents.moveReaderIndex(forwardBy: 100), because we're trying to move the reader index forward by 100 in a buffer of length zero.

The problem is that when we initialize a FileChunks object, if the range is set to 0..<Int.max, we use the .entireFile chunk range. This causes BufferedStream to use a ProducerState with a nil range, which means that no seeking is done when reading chunks. It looks like this behavior is intended for the case where we want to read an unseekable file, but it's being inadvertently triggered when we request a chunked read of a whole file.

TLDR: If we do any chunked read of a file, then try to do a chunked read of the entire file, the second read will begin where the first one left off instead of moving the pointer to the beginning of the file, despite the caller requesting a range starting at index zero.
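A minimal sketch of that behavior, using a hypothetical in-memory `FakeHandle` rather than the real file handle types: when no start offset is applied, the second chunked read starts where the first one stopped.

```swift
// Hypothetical in-memory stand-in for a seekable file handle with one shared offset.
final class FakeHandle {
    let contents: [UInt8]
    var position = 0

    init(contents: [UInt8]) { self.contents = contents }

    // Reads chunks of `chunkLength` bytes. When `seekToStart` is nil, no seek
    // happens and reading continues from wherever the last read stopped.
    func readChunks(seekToStart: Int?, chunkLength: Int) -> [[UInt8]] {
        if let start = seekToStart { position = start }
        var chunks: [[UInt8]] = []
        while position < contents.count {
            let end = min(position + chunkLength, contents.count)
            chunks.append(Array(contents[position..<end]))
            position = end
        }
        return chunks
    }
}

let handle = FakeHandle(contents: [UInt8](repeating: 1, count: 300))
let first = handle.readChunks(seekToStart: nil, chunkLength: 128)   // reads the whole file
let second = handle.readChunks(seekToStart: nil, chunkLength: 128)  // no seek: starts at EOF
let third = handle.readChunks(seekToStart: 0, chunkLength: 128)     // explicit seek to zero
```

Here `second` comes back empty even though the caller wanted the whole file, which mirrors the situation in testReadFileAsChunks.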

Second Commit

- Rewrite ChunkRange to have two modes:
  - current: reads from whatever the underlying file handle's offset currently is.
  - specified: reads from the specified range.
- Allow the range to be unspecified when calling ReadableFileHandleProtocol.readChunks. This will trigger the use of ChunkRange.current.
- Use the nil range argument from ReadableFileHandleProtocol.readToEnd when reading an unseekable file.
- testWriteAndReadUnseekableFile: I think that this test was incorrect and there's no reason that we should not be able to read the contents of a fifo that we just wrote to.
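The two-mode range could look roughly like the following. This is a hypothetical sketch and not the actual NIOFileSystem type: the initializer taking an optional range and the `seekOffset` property are assumptions made for illustration.

```swift
// Hypothetical sketch of a two-mode chunk range; not the actual NIOFileSystem type.
enum ChunkRange: Equatable {
    case current                  // read from the handle's current offset
    case specified(Range<Int64>)  // read exactly this byte range

    // An absent range (nil) maps to `.current`; anything else is explicit.
    init(_ range: Range<Int64>?) {
        if let range = range {
            self = .specified(range)
        } else {
            self = .current
        }
    }

    // The offset to seek to before reading, or nil when no seek should happen.
    var seekOffset: Int64? {
        switch self {
        case .current: return nil
        case .specified(let range): return range.lowerBound
        }
    }
}

let wholeFile = ChunkRange(0..<Int64.max)  // explicit: must start at offset zero
let fromCurrent = ChunkRange(nil)          // unspecified: keep the current offset
```

The point of the split is that a range of 0..<Int64.max stays an explicit request to start at zero, and only an absent range means "do not seek".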

General Comment

I think part of the reason this is happening is that the readToEnd function is a bit counterintuitive: it has a default argument of 0 for fromAbsoluteOffset. When it's called using the default, it's not clear to the caller that it's going to go back to offset zero before reading (if the file is not a fifo). Maybe this should be changed to a nil default?
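To make the nil-default idea concrete, here is a small sketch using a hypothetical in-memory `SketchHandle`. It is not the NIOFileSystem API, only an illustration of how an optional offset would separate "seek first" from "continue from here".

```swift
// Hypothetical in-memory handle; the parameter name mirrors the discussion only.
final class SketchHandle {
    let contents: [UInt8]
    var position = 0

    init(contents: [UInt8]) { self.contents = contents }

    // nil means "continue from the current position"; a value means "seek there first".
    func readToEnd(fromAbsoluteOffset offset: Int? = nil) -> [UInt8] {
        if let offset = offset { position = min(offset, contents.count) }
        let result = Array(contents[position...])
        position = contents.count
        return result
    }
}

let sketch = SketchHandle(contents: [1, 2, 3, 4])
let all = sketch.readToEnd(fromAbsoluteOffset: 0)    // seeks to zero, reads everything
let rest = sketch.readToEnd()                        // already at the end, reads nothing
let again = sketch.readToEnd(fromAbsoluteOffset: 2)  // seeks explicitly, reads the tail
```

With this shape, a call without arguments does what a reader would guess from the call site, and seeking only happens when the caller asks for it.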

Result:

readToEnd should now return the full file contents when the file size is lower than the single shot read limit but readChunk does not return the entire requested chunk.

rpecka · 2024-07-06T10:54:57Z

PRBs are failing due to the change to make the range argument in readChunks optional. One option would be to add a new function to ReadableFileHandleProtocol with the optional range. IIRC this is a binary-compatible but source-breaking change -- not sure what the objectives are for this project.

Before we look into that tho, I think this would be a good opportunity to discuss how we could improve the API here if breaking changes were on the table.

To illustrate the API problem I was describing in the PR description:

// read to the end of a file using an unbounded chunk range
for try await chunk in handle.readChunks(in: ..., chunkLength: .bytes(128)) {
    bytes.writeImmutableBuffer(chunk)
}

// then, call `readToEnd` on the same file
var contents = try await handle.readToEnd(maximumSizeAllowed: .bytes(1024 * 1024))

If someone unfamiliar with the project read this, I think it would be reasonable for them to think that contents will be an empty buffer since the first call already read the file to completion. In reality, there is a hidden default argument so the true call is:

var contents = try await handle.readToEnd(fromAbsoluteOffset: 0, maximumSizeAllowed: .bytes(1024 * 1024))

So the value of bytes will be the same as the value of contents.
Keep in mind that this is the behavior after this patch. The current behavior would have readToEnd read from the current offset and return a zero-length buffer. I would argue that this is bad, because someone could explicitly specify fromAbsoluteOffset: 0 and we would still read from the current offset rather than seeking to zero.

This is then made even more confusing by the behavior if the file is a fifo, since in that case, an offset of zero means that we should begin reading from the current position (since seeking is impossible).

glbrntt · 2024-07-08T09:08:54Z

Thanks for opening this PR @rpecka!

First of all I'd like it if we could separate this into two separate PRs: the issue with reading chunks is very different to the issue of potentially reading short so these should be addressed separately.

W.r.t. the issue with reading chunks, I don't think the user should be passing in an optional range here. Instead I think we should detect whether the file being read is a FIFO and then call the appropriate read function. This can be coupled with some validation against the existing ChunkRange (i.e. if it's a FIFO then we should only allow the entire file to be read or any other range starting at zero). We should add some additional documentation to explain the limitations for FIFOs though.
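One way to read this suggestion is sketched below. `FileKind`, `UnsupportedRangeError` and `startOffset(for:fileKind:)` are hypothetical names invented for the example; how the file type is actually detected is left out.

```swift
// Hypothetical types for illustration only.
enum FileKind { case regular, fifo }

struct UnsupportedRangeError: Error {
    let lowerBound: Int64
}

// Returns the offset to seek to, or nil when the file is unseekable and the
// read has to start from the current position.
func startOffset(for range: Range<Int64>, fileKind: FileKind) throws -> Int64? {
    switch fileKind {
    case .regular:
        return range.lowerBound
    case .fifo:
        // A FIFO cannot seek, so only ranges starting at zero are accepted.
        guard range.lowerBound == 0 else {
            throw UnsupportedRangeError(lowerBound: range.lowerBound)
        }
        return nil
    }
}
```

Under this sketch the caller keeps passing a plain range, and the FIFO limitation surfaces as a validation error instead of an optional parameter.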

glbrntt · 2024-07-08T09:10:09Z

FWIW: some of the PRBs are failing because you're using syntax which isn't available in older Swift versions (5.8) which we still support.

rpecka · 2024-07-09T06:54:56Z

@glbrntt thanks for the feedback. I wrote a PR for the file pointer issue here: #2772

glbrntt · 2024-07-09T13:19:24Z

Sources/NIOFileSystem/FileHandleProtocol.swift

+            let chunkLength: ByteCount = if !forceChunkedRead, readSize <= singleShotReadLimit {
+                .bytes(Int64(readSize))


We still support Swift 5.8, so you can't use this syntax (if expressions were added in Swift 5.9). You'll need to declare let chunkLength: ByteCount and then assign to it.
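A sketch of the declare-then-assign form follows. The `ByteCount` struct here is a hypothetical stand-in so the snippet compiles on its own, and the `fallback` parameter is an assumption: the else branch of the original expression is not shown in the review.

```swift
// Hypothetical stand-in so the snippet compiles without NIOFileSystem.
struct ByteCount: Equatable {
    var value: Int64
    static func bytes(_ count: Int64) -> ByteCount { ByteCount(value: count) }
}

func resolveChunkLength(
    readSize: Int,
    singleShotReadLimit: Int,
    forceChunkedRead: Bool,
    fallback: ByteCount  // assumption: stands in for the else branch not shown above
) -> ByteCount {
    // Declare first, then assign exactly once on every path; valid in Swift 5.8.
    let chunkLength: ByteCount
    if !forceChunkedRead, readSize <= singleShotReadLimit {
        chunkLength = .bytes(Int64(readSize))
    } else {
        chunkLength = fallback
    }
    return chunkLength
}
```

The compiler still checks that `chunkLength` is assigned exactly once before use, so this keeps the immutability of the original expression.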

rpecka added 2 commits July 6, 2024 03:03

- 5de0623: Use chunked file read even when the read size is less than the single shot read limit since readChunk is not guaranteed to read all the requested bytes
- ecbb478: Rewrite ChunkRange to use either a specified offset or the current offset
