Made chunk reading explicit when using read or pread #2772

rpecka · 2024-07-09T06:54:17Z

Resolved an issue where reading a file in chunks using an unbounded range would read from the current file pointer even for regular files.

Motivation:

When a FileChunks object is initialized, it checks whether the range is 0..<Int.max. If it is, then instead of reading at a series of offsets with the pread syscall, it calls the read syscall repeatedly. This is fine the first time a file is read this way, but if we read this way twice, the second call is affected by the file pointer side effect of the first call.

For example:

// Read the file the first time. This will repeatedly call `read` until EOF.
var firstRead = ByteBuffer()
for try await chunk in handle.readChunks(in: 0..<Int.max, chunkLength: .bytes(128)) {
  XCTAssertLessThanOrEqual(chunk.readableBytes, 128)
  firstRead.writeImmutableBuffer(chunk)
}
// Read the file again using `read` until EOF. This will read zero bytes since the previous call moved the file pointer to the end of the file without resetting it.
var secondRead = ByteBuffer()
for try await chunk in handle.readChunks(in: 0..<Int.max, chunkLength: .bytes(128)) {
  XCTAssertLessThanOrEqual(chunk.readableBytes, 128)
  secondRead.writeImmutableBuffer(chunk)
}

The main issue is that the read syscall affects the file pointer while the pread syscall does not.
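To make the difference concrete outside of NIO, here is a minimal sketch that calls the POSIX functions directly. The function name `demonstrateReadVersusPread` and the temporary path under `/tmp` are made up for illustration and are not part of this PR.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Hypothetical helper: writes 12 bytes to a temporary file, then reads it
/// twice with `read` and twice with `pread`, returning the byte counts.
func demonstrateReadVersusPread() -> (firstRead: Int, secondRead: Int, firstPread: Int, secondPread: Int) {
    // Assumes /tmp exists and is writable.
    var template = Array("/tmp/read-vs-pread-XXXXXX".utf8CString)
    let fd = mkstemp(&template)
    precondition(fd >= 0, "mkstemp failed")
    defer {
        close(fd)
        unlink(template)
    }

    let payload = Array("hello, world".utf8)  // 12 bytes
    let written = payload.withUnsafeBytes { write(fd, $0.baseAddress, $0.count) }
    precondition(written == payload.count, "short write")

    // Rewind so that the first `read` starts at the beginning of the file.
    lseek(fd, 0, SEEK_SET)

    var buffer = [UInt8](repeating: 0, count: 64)

    // `read` advances the file pointer: the second call starts at EOF.
    let firstRead = read(fd, &buffer, buffer.count)
    let secondRead = read(fd, &buffer, buffer.count)

    // `pread` takes an explicit offset and leaves the file pointer alone,
    // so reading from offset 0 gives the same result every time.
    let firstPread = pread(fd, &buffer, buffer.count, 0)
    let secondPread = pread(fd, &buffer, buffer.count, 0)

    return (firstRead, secondRead, firstPread, secondPread)
}
```

The second `read` returns 0 because the first one left the file pointer at the end of the file, which is the same effect as the empty `secondRead` buffer in the example above.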

Modifications:

Add a readChunksFromFilePointer to ReadableFileHandleProtocol to explicitly read from the current file pointer instead of relying on the magic 0..<Int.max range.
Use the new function when reading from an unseekable file in .readToEnd.
Rename the ChunkRange cases to make what they are doing clearer.
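As a rough model of the intent behind these changes (not the actual FileChunks implementation), the sketch below dispatches on a two-case range: one case reads from the current file pointer with `read`, the other reads an explicit byte range with `pread`. The case names mirror the ones in this PR's diff; the free function `readChunks(from:in:chunkLength:)`, the raw file descriptor, and the synchronous array result are simplifications assumed for illustration.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Simplified stand-in for the renamed cases; not the NIOFileSystem type.
enum ChunkRange {
    case filePointerToEnd
    case range(Range<Int64>)
}

/// Hypothetical synchronous model of chunked reading from a descriptor.
/// Read errors are treated like EOF to keep the sketch short.
func readChunks(from fd: Int32, in chunkRange: ChunkRange, chunkLength: Int) -> [[UInt8]] {
    precondition(chunkLength > 0, "chunkLength must be positive")
    var chunks: [[UInt8]] = []
    var buffer = [UInt8](repeating: 0, count: chunkLength)

    switch chunkRange {
    case .filePointerToEnd:
        // Sequential reads: each call moves the file pointer forward.
        while true {
            let count = read(fd, &buffer, chunkLength)
            if count <= 0 { break }
            chunks.append(Array(buffer[0..<count]))
        }
    case .range(let range):
        // Positional reads: the offset is tracked here, not in the kernel.
        var offset = range.lowerBound
        while offset < range.upperBound {
            let wanted = Int(min(Int64(chunkLength), range.upperBound - offset))
            let count = pread(fd, &buffer, wanted, off_t(offset))
            if count <= 0 { break }
            chunks.append(Array(buffer[0..<count]))
            offset += Int64(count)
        }
    }
    return chunks
}
```

With this split, passing `.range(0..<Int64.max)` repeatedly yields the same chunks each time, while `.filePointerToEnd` yields whatever is left after the current file pointer.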

Result:

Reading 0..<Int.max over and over again will have the same result each time.

glbrntt · 2024-07-09T13:12:23Z

Sources/NIOFileSystem/FileChunks.swift

+        case filePointerToEnd
+        case range(Range<Int64>)


I don't think there's any reason to change these names

Isn't entireFile misleading because it reads from the current file pointer, not from the beginning of the file?

glbrntt · 2024-07-09T13:14:44Z

Sources/NIOFileSystem/FileHandleProtocol.swift

+    /// Returns an asynchronous sequence of chunks read from the file starting from the current file pointer.
+    ///
+    /// - Parameters:
+    ///   - size: The maximum length of the chunk to read as a ``ByteCount``.
+    /// - Returns: A sequence of chunks read from the file.
+    func readChunksFromFilePointer(chunkLength size: ByteCount) -> FileChunks


This isn't quite what I had in mind. Rather than adding new API I think we should use the type of the file to determine how to do the read inside FileChunks. Once we know the type of the file we can determine whether the range passed in is acceptable and then call the appropriate read function.

This would make the checks that happen inside of .readToEnd redundant. Should we keep those or remove them?

This also means that calling readToEnd will stat the file twice.

I’m trying out your recommended solution and it also causes problems: if we check the file type in the FileChunks initializer, the initializer has to be async throws. But the readChunks function from ReadableFileHandleProtocol is neither async nor throws, so that would be an API change.
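For reference, the file-type check discussed in this thread could look roughly like the sketch below when written directly against POSIX `fstat`. The `ReadStrategy` enum and `readStrategy(forDescriptor:)` are hypothetical names, and the call is synchronous and blocking here, which is exactly the part that does not fit a non-async, non-throwing initializer in NIOFileSystem.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Hypothetical: which syscall family to use for chunked reads.
enum ReadStrategy: Equatable {
    case positional  // regular file: use `pread` with explicit offsets
    case sequential  // pipe, socket, etc.: use `read` from the file pointer
}

/// Hypothetical helper: picks a strategy from the descriptor's file type.
/// Returns nil if `fstat` fails (for example, on an invalid descriptor).
func readStrategy(forDescriptor fd: Int32) -> ReadStrategy? {
    var info = stat()
    guard fstat(fd, &info) == 0 else { return nil }
    // S_ISREG is a C macro and is not imported into Swift, so mask manually.
    let isRegularFile = (info.st_mode & S_IFMT) == S_IFREG
    return isRegularFile ? .positional : .sequential
}
```

Because the check costs a stat call, doing it in both readToEnd and the chunk reader is what leads to the file being stat'd twice, as noted above.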

rpecka mentioned this pull request Jul 9, 2024

ReadableFileHandleProtocol.readToEnd can fail to read the complete contents of file sizes less than the single shot read limit #2769

Open

Made chunk reading explicit when using read or pread

7baadcc

rpecka force-pushed the file-pointer-offset-bug branch from 30b9cec to 7baadcc Compare July 9, 2024 06:56

glbrntt requested changes Jul 9, 2024
