Don't abort CI checks on first error #12206

alice-i-cecile · 2024-02-29T18:07:45Z

What problem does this solve or what need does it fill?

When working on complex PRs like #12163, it's frustrating to fix all of the errors reported by CI, only to be met with another round of failures in a follow-up crate.

What solution would you like?

Pass the --keep-going flag to CI. See rust-lang/cargo#10496 for the tracking issue. Apparently this is now stable as of Rust 1.74: https://doc.rust-lang.org/cargo/commands/cargo-build.html#option-cargo-build---keep-going

It looks like for cargo test, we also need to pass in --no-fail-fast: https://doc.rust-lang.org/cargo/commands/cargo-test.html#options
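As a rough illustration of the flag mapping proposed above, here is a minimal sketch in Rust. The helper name `cargo_args` and its shape are assumptions made for this example and are not taken from Bevy's `ci` tool; only the `--keep-going` and `--no-fail-fast` flags themselves come from the Cargo documentation linked above.

```rust
/// Hypothetical helper (not from Bevy's `ci` tool): builds the argument list
/// for a cargo subcommand, adding the "don't stop early" flag that applies.
fn cargo_args<'a>(subcommand: &'a str, keep_going: bool) -> Vec<&'a str> {
    let mut args = vec![subcommand];
    if keep_going {
        match subcommand {
            // `cargo test` uses its own flag to keep running after a failing test binary.
            "test" => args.push("--no-fail-fast"),
            // These build-style commands accept `--keep-going`.
            "build" | "check" | "doc" => args.push("--keep-going"),
            // Other subcommands are left untouched in this sketch.
            _ => {}
        }
    }
    args
}
```

The resulting list could then be handed to `std::process::Command::new("cargo").args(...)`; whether the real script builds its commands this way is not something this thread confirms.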

What alternative(s) have you considered?

Always run CI repeatedly locally.


mockersf · 2024-03-04T10:39:12Z

The goal of CI is to fail fast in case of errors, and to avoid doing extra work when possible.

Changing that would take more than just passing --keep-going everywhere: we would also need to reduce dependencies between jobs and stop cancelling everything when the first job fails. Failing fast is the default everywhere, so changing it would require adding more configuration.

alice-i-cecile · 2024-03-04T17:16:58Z

That makes a lot of sense: I didn't think about dependent workflows. As a reviewer, I would still prefer it if all errors were caught, but enabling this option locally will be tremendously useful as an author.

# Objective

- Make running CI locally relatively less painful by allowing continuation after failure
- Fix #12206

## Solution

Previously, `ci` would halt after encountering a failure (or shortly thereafter). For instance, if you ran `cargo run -p ci` and a single test within a CI check failed somewhere in the middle, you would never even reach many of the other tests within the check and certainly none of the CI checks that came afterward. This was annoying; if I am fixing issues with CI tests, I want to just see all of the issues up-front instead of having to rerun CI every time I fix something to see the next error.

Furthermore, it is not infrequent (because of subtle configuration/system differences) to encounter spurious CI failures locally; previously, when these occurred, they would make the process of running CI largely pointless, since you would have to push your branch in order to see your actual CI failures from the automated testing service.

Now, when running `ci` locally, we have a couple of new tools to ameliorate these and provide a smoother experience:

- Firstly, there is a new optional flag `--keep-going` that can be passed while calling `ci` (e.g. `cargo run -p ci -- doc --keep-going`). It has the following effects:
  - It causes the `--keep-going` flag to be passed to the script's `cargo doc` and `cargo check` invocations, so that they do not stop when they encounter an error in a single module; instead, they keep going (!) and report the errors that follow.
  - It causes the `--no-fail-fast` flag to be passed to the script's `cargo test` invocations, to similar effect.
  - Finally, it causes `ci` itself not to abort after a single check fails; instead, it will run every check that was invoked.

  Thus, for instance, `cargo run -p ci -- --keep-going` will run every CI check even if it encounters intermediate failures, and every such check will itself be run to completion.
- Secondly, we now allow multiple ordinary arguments to be passed to `ci`. For instance, `cargo run -p ci -- doc test` now executes both the 'doc' checks and the 'test' checks. This allows the caller to run the tests they care about with fewer invocations of `ci`.

As of this PR, automated testing will remain unchanged.

---

## Changelog

- tools/ci/src/main.rs refactored into staging and execution steps, since the previous control flow did not naturally support continuing after failure.
- Added "--keep-going" flag to `ci`.
- Added support for invoking `ci` with multiple arguments.

---

## Discussion

### Design considerations

I had originally split this into separate flags that controlled:

1. whether `--keep-going`/`--no-fail-fast` would be passed to the constituent commands
2. whether `ci` would continue after a component test failed

However, I decided to merge these two behaviors, since I think that, if you're in the business of seeing all the errors, you'll probably want to actually see all of the errors.

One slightly annoying thing about the new behavior with `--keep-going`, however, is that you will sometimes find yourself wishing that the script would pause or something, since it tends to fill the screen with a lot of junk. I have found that sending stdout to /dev/null helps quite a bit, but I don't think `cargo fmt` or `cargo clippy` actually write to stderr, so you need to be cognizant of that (and perhaps invoke the lints separately).

~~Next, I'll mention that I feel somewhat strongly that the current behavior of `ci` for automated testing should remain the same, since its job is more like detecting that errors exist rather than finding all of them.~~ (I was convinced this could have value.)

Orthogonally, there is the question of whether or not we might want to make this (or something similar) actually the default behavior and make the automated test script invoke some optional flags (it doesn't have to type with its hands, after all). I'm not against that, but I don't really want to rock the boat much more with this PR, since anyone who looks at the diff might already be a little incensed.
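To make the "staging and execution" split from the changelog more concrete, here is a minimal sketch of how a runner can continue after a failure. It is written under assumptions: the `Step` type, `execute`, and `demo_steps` are hypothetical names invented for this example, and the closures stand in for real cargo invocations. It is not the code from tools/ci/src/main.rs.

```rust
/// One staged step: a label plus the work to run.
/// Hypothetical type for illustration only.
struct Step {
    name: &'static str,
    run: Box<dyn Fn() -> Result<(), String>>,
}

/// Execution phase: run the staged steps in order and collect failures.
/// When `keep_going` is false, stop at the first failure (the old behavior).
fn execute(steps: &[Step], keep_going: bool) -> Vec<String> {
    let mut failures = Vec::new();
    for step in steps {
        if let Err(msg) = (step.run)() {
            failures.push(format!("{}: {}", step.name, msg));
            if !keep_going {
                break;
            }
        }
    }
    failures
}

/// Staging phase for the demo: three fake checks, two of which fail.
fn demo_steps() -> Vec<Step> {
    vec![
        Step { name: "fmt", run: Box::new(|| Err("formatting differs".to_string())) },
        Step { name: "doc", run: Box::new(|| Ok(())) },
        Step { name: "test", run: Box::new(|| Err("1 test failed".to_string())) },
    ]
}
```

With `keep_going` set to false, `execute(&demo_steps(), false)` reports only the `fmt` failure; with it set to true, the `test` failure is reported as well, because the loop no longer breaks early. Building the full list of steps first is what makes this possible, since the decision to continue is made in one place rather than at each call site.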


alice-i-cecile added the D-Trivial, A-Build-System, and C-Usability labels Feb 29, 2024

alice-i-cecile changed the title ~~Add keep-going flag to CI to catch more errors per run~~ Don't abort CI checks on first error Feb 29, 2024

mweatherley mentioned this issue Mar 1, 2024

Quality-of-life updates for running CI locally #12242

Merged

mockersf added the X-Controversial label Mar 4, 2024

alice-i-cecile removed the D-Trivial label Mar 4, 2024

alice-i-cecile closed this as not planned Mar 4, 2024

mockersf closed this as completed in #12242 Mar 4, 2024
