Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReadableFileHandleProtocol.readToEnd can fail to read the complete contents of file sizes less than the single shot read limit #2769

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

rpecka
Copy link

@rpecka rpecka commented Jul 6, 2024

Resolved an issue where ReadableFileHandleProtocol.readToEnd could fail to read the contents of files smaller than the single shot read limit (64 MiB).

Motivation:

If readToEnd detects that the file in question is smaller than the single shot read limit, then it will read the file using a single call to readChunk, however, there isn't a guarantee that readChunk will return the entire requested chunk. If this happens, then readToEnd only returns the result of the first read and does not execute any followup reads.

Modifications:

I separated this into two sections (two commits) because I found another issue that I had to resolve in order to fix the chunking problem.

First Commit

This is what is required to fix the missing chunk reads, but it causes testReadFileAsChunks to fail because handle.readChunks(in: ..., chunkLength: .bytes(128)) moves the file access position to the end, which means that the subsequent handle.readToEnd(maximumSizeAllowed: .bytes(1024 * 1024)) reads zero bytes since the file is fully read, so we get a precondition failure when we run contents.moveReaderIndex(forwardBy: 100) because we're trying to move the reader index to 100 for a byte array of length zero.

The problem is that when we initialize a FileChunks object, if the range is set to 0..<Int.max, we use the .entireFile chunk range. This causes BufferedStream to use a ProducerState with a nil range, which means that no seeking is done when reading chunks. It looks like this behavior is intended for the case where we want to read an unseekable file, but it's being inadvertently triggered when we request a chunked read of a whole file.

TLDR: If we do any chunked read of a file, then try to do a chunked read of the entire file, the second read will begin where the first one left off instead of moving the pointer to the beginning of the file, despite the caller requesting a range starting at index zero.

Second Commit

  • Rewrite ChunkRange to have two modes:
    • current: reads from whatever the underlying file handle's offset currently is.
    • specified: reads from the specified range.
  • Allow the range to be unspecified when calling ReadableFileHandleProtocol.readChunks. This will trigger the use of ChunkRange.current.
  • Use the nil range argument from ReadableFileHandleProtocol.readToEnd when reading an unseekable file.
  • testWriteAndReadUnseekableFile: I think that this test was incorrect and there's no reason that we should not be able to read the contents of a fifo that we just wrote to.

General Comment

Part of the reason I think this is happening is because the readToEnd function is a bit counter intuitive in that it has a default parameter of 0 for fromAbsoluteOffset. When it's called using the default, it's not clear to the caller that it's going to go back to offset zero before reading (if the file is not a fifo). Maybe this should be changed to a nil default?

Result:

readToEnd should now return the full file contents when the file size is lower than the single shot read limit but readChunk does not return the entire requested chunk.

@rpecka
Copy link
Author

rpecka commented Jul 6, 2024

PRBs are failing due to the change to make the range argument in readChunks optional. One option would be to add a new function to the ReadableFileHandle protocol with the optional range. IIRC this is a binary-compatible but source breaking change -- not sure what the objectives are for this project.

Before we look into that tho, I think this would be a good opportunity to discuss how we could improve the API here if breaking changes were on the table.

To illustrate the API problem I was describing in the PR description:

// read to the end of a file using an unbounded chunk range
for try await chunk in handle.readChunks(in: ..., chunkLength: .bytes(128)) {
    bytes.writeImmutableBuffer(chunk)
}

// then, call `readToEnd` on the same file
var contents = try await handle.readToEnd(maximumSizeAllowed: .bytes(1024 * 1024))

If someone unfamiliar with the project read this, I think it would be reasonable for them to think that contents will be an empty buffer since the first call already read the file to completion. In reality, there is a hidden default argument so the true call is:

var contents = try await handle.readToEnd(fromAbsoluteOffset: 0, maximumSizeAllowed: .bytes(1024 * 1024))

So the value of bytes will be the same as the value of contents.
Keep in mind that this is the behavior after this patch. The current behavior would have readToEnd read from the current offset and return a zero length buffer, but I would argue that this is bad because someone could explicitly specify fromAbsoluteOffset: 0 and we would still read from the current offset rather than seeking to zero.

This is then made even more confusing by the behavior if the file is a fifo, since in that case, an offset of zero means that we should begin reading from the current position (since seeking is impossible).

@glbrntt
Copy link
Contributor

glbrntt commented Jul 8, 2024

Thanks for opening this PR @rpecka!

First of all I'd like it if we could separate this into two separate PRs: the issue with reading chunks is very different to the issue of potentially reading short so these should be addressed separately.

W.r.t. the issue with reading chunks, I don't think the user should be passing in an optional range here. Instead I think we should detect whether the file being read is a FIFO and then call the appropriate read function. This can be coupled with some validation against the existing ChunkRange (i.e. if it's a FIFO then we should only allow the entire file to be read or any other range starting at zero). We should add some additional documentation to explain the limitations for FIFOs though.

@glbrntt
Copy link
Contributor

glbrntt commented Jul 8, 2024

FWIW: some of the PRBs are failing because you're using syntax which isn't available in older Swift versions (5.8) which we still support.

@rpecka
Copy link
Author

rpecka commented Jul 9, 2024

@glbrntt thanks for the feedback. I wrote a PR for the file pointer issue here: #2772

Comment on lines 379 to 380
let chunkLength: ByteCount = if !forceChunkedRead, readSize <= singleShotReadLimit {
.bytes(Int64(readSize))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still support Swift 5.8 so you can't use this syntax, you'll need to declare let chunkLength: ByteCount and then assign to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants