Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: cleanup killing of microVMs during teardown #4660

Merged
merged 16 commits into from
Jul 8, 2024

Conversation

roypat
Copy link
Contributor

@roypat roypat commented Jul 1, 2024

Changes

Sort out some spurious warnings about "No such process" when killing Firecracker during test teardown. This is done in two steps:

  • Turn the warning into a hard error (e.g. test failure)
  • Analyze each failure, and mark the ones that are expected (for example because we expect the Firecracker process to exit during test execution, and thus cannot kill it again during teardown).

Reason

This allows us to catch issues such as #4662

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this
    PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet
    contribution quality standards.

  • This functionality cannot be added in rust-vmm.

Copy link

codecov bot commented Jul 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.11%. Comparing base (e439ef8) to head (853ab6a).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4660    /-   ##
=======================================
  Coverage   82.11%   82.11%           
=======================================
  Files         255      255           
  Lines       31261    31261           
=======================================
  Hits        25670    25670           
  Misses       5591     5591           
Flag Coverage Δ
4.14-c5n.metal 79.60% <ø> (-0.01%) ⬇️
4.14-m5n.metal 79.59% <ø> (-0.01%) ⬇️
4.14-m6a.metal 78.96% <ø> ( 0.13%) ⬆️
4.14-m6g.metal 76.63% <ø> (ø)
4.14-m6i.metal ?
4.14-m7g.metal 76.63% <ø> (ø)
5.10-c5n.metal 82.12% <ø> (ø)
5.10-m5n.metal 82.11% <ø> ( <0.01%) ⬆️
5.10-m6a.metal 81.54% <ø> ( 0.11%) ⬆️
5.10-m6g.metal 79.41% <ø> (ø)
5.10-m6i.metal 82.10% <ø> (ø)
5.10-m7g.metal 79.41% <ø> (ø)
6.1-c5n.metal 82.12% <ø> (-0.01%) ⬇️
6.1-m5n.metal 82.11% <ø> (ø)
6.1-m6a.metal 81.55% <ø> ( 0.13%) ⬆️
6.1-m6g.metal 79.40% <ø> (-0.01%) ⬇️
6.1-m6i.metal 82.10% <ø> (ø)
6.1-m7g.metal 79.41% <ø> ( <0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

If no Firecracker process with `self.firecracker_pid` is found, this can
have two reasons:
- Firecracker died unexpectedly. In this case, we want to log a test
  failure
- Firecracker exited expectedly (say, in a negative test for CLI
  arguments). In this case, we want to be explicit about that in the
  test (by setting `self._killed` to `True`, or similar).

Thus, just silently logging an error message in this scenario is
undesirable, as it hides genuine errors. Instead, log the firecracker
process log_data and re-raise the exception. Follow up commits will fix
up the second case above.

Signed-off-by: Patrick Roy <[email protected]>
@roypat roypat self-assigned this Jul 2, 2024
@roypat roypat marked this pull request as ready for review July 2, 2024 15:47
@roypat roypat added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Jul 2, 2024
@roypat roypat requested review from zulinx86 and pb8o July 2, 2024 15:50
tests/framework/microvm.py Show resolved Hide resolved
tests/framework/microvm.py Outdated Show resolved Hide resolved
tests/framework/microvm.py Outdated Show resolved Hide resolved
tests/framework/microvm.py Show resolved Hide resolved
There's library methods for killing processes in python, so let's just
use those instead of shelling out.

Also move the `os.kill` call into the try-except block, to get logging
in case Firecracker unexpected died before we could kill it.

The only change in behavior is that we no longer silently ignore the
case where we fail to kill Firecracker because it is already dead.

Signed-off-by: Patrick Roy <[email protected]>
The comment about "it being too late to assert" was outdated, as it only
applied back when `kill` was called inside of destructors, and before we
configured pytest to stop ignoring exceptions in destructors.

Nowadays, we call `kill` inside of test teardown, were it is very
possible indeed to raise exceptions. So do that.

Signed-off-by: Patrick Roy <[email protected]>
- utils.check_output already asserts `rc == 0`
- `output.find("firecracker") == -1` is an unidiomatic version of
  "firecracker" not in output.

Signed-off-by: Patrick Roy <[email protected]>
The open-coded subprocess.Popen sequence in Microvm.spawn was equivalent
to `utils.check_output(..., shell=False)`, so just use that.

Signed-off-by: Patrick Roy <[email protected]>
Call this method on microvms that have met an untimely demise during
test execution (for example, because we are specifically testing that
firecracker exits correctly after receiving a signal).

The function raises an exception if the Firecracker process is not
actually dead, to avoid leaking processes.

Signed-off-by: Patrick Roy <[email protected]>
@roypat roypat force-pushed the kill-cleanup branch 3 times, most recently from cff853b to 799b4d8 Compare July 3, 2024 12:56
zulinx86
zulinx86 previously approved these changes Jul 4, 2024
tests/integration_tests/functional/test_error_code.py Outdated Show resolved Hide resolved
This is so that we do not try to kill them again in `Microvm.kill`,
which would result in a test failure.

Also remove various variants of "wait for Firecracker process to go
away", as `mark_killed` has a `wait_for_termination` inside of it.

Signed-off-by: Patrick Roy <[email protected]>
Use a library function instead of calling out to a shell.

Signed-off-by: Patrick Roy <[email protected]>
In `test_signals.py::test_sigxfsz_handler`, we seem to need to wait on
average 6 seconds. Prior to unifying all the "wait for FC to die" tests,
this one just had an endless loop, and the retry parameters passed to
`wait_process_termination` which it is using now instead do not provide
sufficient waiting. So simply increase the waiting in
`wait_process_termination` to 10s.

Signed-off-by: Patrick Roy <[email protected]>
Document how we use `depends_on` to make sure that a failure of the
Style step does not prevent all the other integration tests from
running.

Signed-off-by: Patrick Roy <[email protected]>
All tests that formerly set this property now use `mark_killed`. The
only other location that set this value was `kill()` itself, to avoid
errors when kill() ends up being called twice. However, this is already
managed by the `Microvm._killed` attribute, so this guarding is not
necessary. Thus, remove the `expect_kill_by_signal`, as it is useless.

Signed-off-by: Patrick Roy <[email protected]>
A PID is an integer, so return it as one. Having it be a string means it
cannot be directly passed to things such as `os.kill`.

Signed-off-by: Patrick Roy <[email protected]>
Explicitly call out that the whole "grep for jailer_id" dance is a
regression test.

Signed-off-by: Patrick Roy <[email protected]>
Use os.kill in test_serial_io instead of using a shell to invoke `kill`.

Signed-off-by: Patrick Roy <[email protected]>
@roypat roypat requested review from zulinx86 and pb8o July 4, 2024 15:22
@roypat roypat merged commit 4100110 into firecracker-microvm:main Jul 8, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting review Indicates that a pull request is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants