Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle bind failures when running NIO in a tight sandbox #2616

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

dkz2
Copy link
Contributor

@dkz2 dkz2 commented Dec 25, 2023

Addressed AcceptBackoffHandlerTest assertion failure when running NIO in a tight sandbox with bind disallowed.

Motivation:

When running SwiftNIO tests in a restricted environment, an assertion error was encountered when bind and other network related operations were disallowed. The tests were expected to fail in such environment, but hitting an assertion was not expected. This change aims to handle such failure gracefully. Notably, the test in AcceptBackoffHandlerTest are all failing as expected, but it was also hitting an assertion, which was not the intended behavior. This PR closes #2465.

Modifications:

  • Removed XCTAssertNoThrow in setupChannel to allow bind failures to propagate upwards.
  • Added a call to close0 to release resources and ensure proper channel state management when bind fails.
  • Created a new test in testBindFailureClosesChannel in SALChannelTests to simulate and test the behavior when a bind operation fails.
  • Modified assertBind in SyscallAbstractionLayer to also account for when a bind operation could fails as expected.

Result:

After these changes, all tests in AcceptBackoffHandlerTest will fail gracefully instead of hitting an assertion if bind fails.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is not related to the PR right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct not related.

Comment on lines 374 to 376

/// An attempt to bind a `Channel` failed.
case bindFailed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly this is a breaking change but I don't think we need this case at all.

self.updateCachedAddressesFromSocket(updateRemote: false)
try self.socket.listen(backlog: backlog)
} catch {
promise?.fail(ChannelError.bindFailed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely still fail the promise with the error produced by the socket.bind class. Otherwise it is hard to tell what went wrong. Could you explain why this approach works around the assert? I would like to understand how we actually ended up in the assert in the first place since something must have caused the call to readable and we just have to avoid that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after re thinking my approach the way I set up my sandbox to test this was not correct, thus nor is this solution. In the actual testChannelInactiveCancelScheduled I set up a class MockServerSocket: ServerSocket {} that essentially always simulate bind failure. However I did this by overriding bind but by doing that I am not hitting all of NIOs binding process.

Now I am overriding the standard bind sys call thats called by NIO by creating a bind function that always returns EPERM

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen) {
    int (*original_bind)(int, const struct sockaddr *, socklen_t);
    original_bind = dlsym(RTLD_NEXT, "bind");
    errno = EPERM;
    return -1;
}

compile it down to a shared object and preload it and run the test. Going to use this approach to create a new solution here.

In the actual testChannelInactiveCancelScheduled we call readable since we want to test whether scheduled reads are closed when the channel is closed. The issue here is readable checks if the lifecycleManager is active, however at this stage it's only transitioned from fresh to preRegistered thus triggering the assert.

@@ -263,6 263,7 @@ final class ServerSocketChannel: BaseSocketChannel<ServerSocket> {
// It's important to call the methods before we actually notify the original promise for ordering reasons.
self.becomeActive0(promise: promise)
}.whenFailure{ error in
self.close0(error: error, mode: .all, promise: nil)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon bind failing, the lifecycleManager seems to remain in the preRegistered state. thus we are transiting to the closed state here.

@@ -884,7 884,18 @@ extension SALTest {
}
}
}


func assertBindFailure(expectedAddress: SocketAddress, file: StaticString = #filePath, line: UInt = #line) throws {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we fold this into the existing assertBind call, with an optional errorReturn parameter that defaults to nil? That's a little closer to how we typically structure things in the SAL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

import NIOEmbedded
import NIOHTTP1

private class MessageEndHandler<Head: Equatable, Body: Equatable>: ChannelInboundHandler {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably you don't actually want to add this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed!

Copy link
Member

@FranzBusch FranzBusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I would like @Lukasa to have one final looks as well though

@@ -322,6 322,48 @@ final class SALChannelTest: XCTestCase, SALTest {
}
}.salWait())
}

func testBindFailureClosesChannel() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test isn't specific enough: it continues to pass even with the new close0 line removed. It looks like this is because the failed bind promise is caught by the ClientBootstrap in the connect flow and triggers a close0.

The same code appears to be present in the server bootstrap. Can we try to nail down where the bind issue came from in your original reproducer? It doesn't currently look like bind0 needs to close where it does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I will be taking a closer look at this over the weekend!

Copy link
Contributor

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BaseSocketChannel: assert(self.lifecycleManager.isActive) hit when running NIO tests in sandbox
3 participants