
Provide scheduling hints? #7437

Open
SimonSapin opened this issue Sep 26, 2019 · 11 comments
Labels
A-build-execution Area: anything dealing with executing the compiler Performance Gotta go fast! S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.

Comments

@SimonSapin
Contributor

When more crates are ready to start compiling than there is available parallelism, how does Cargo pick which ones to start first? Are there ways to influence this scheduling? (For example, does the order of declarations of dependencies in a given Cargo.toml file matter? Or, what effect would it have to add an otherwise unnecessary edge to the dependency graph?) Should we add a new mechanism to influence it?

Here is the output of cargo build -Z timings for Servo (with "Min unit time" set to 10 seconds):

[interactive timings graph of the Servo build; image not preserved]

If we start from the end of the graph, (part of) a critical path is very apparent: the final executable only starts to build after the script crate has finished, and the script crate in turn only starts after the build script for mozjs_sys has finished.

Additionally, CPU utilization drops while script is compiling because Cargo runs out of other tasks to do and rustc only has limited intra-crate parallelism. (I expected codegen-units to provide some more parallelism during codegen, but that’s a separate issue.)

It seems that if we could start mozjs_sys and script earlier, the total time could be significantly reduced. Specifically, cargo build -p mozjs_sys && cargo build -p script && cargo build might lead to better scheduling. A way to achieve that scheduling with more parallelism could be to assign priorities: mozjs_sys and its transitive dependencies have priority 2 (more urgent), script and those of its dependencies that don’t already have a priority have priority 1, and everything else has priority 0.
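
A rough sketch of that priority mechanism (purely illustrative Rust, not Cargo’s internals; the graph representation and function names are made up for this example): propagate each hinted crate’s priority to its transitive dependencies, then start the highest-priority ready unit whenever a jobserver token frees up.

```rust
use std::collections::HashMap;

/// Toy dependency graph: crate name -> direct dependencies.
type Graph<'a> = HashMap<&'a str, Vec<&'a str>>;

/// Give each hinted crate's priority to the crate itself and all of its
/// transitive dependencies, keeping the highest priority seen per crate.
fn propagate_priorities<'a>(
    graph: &Graph<'a>,
    hints: &[(&'a str, u32)],
) -> HashMap<&'a str, u32> {
    let mut priority: HashMap<&'a str, u32> = HashMap::new();
    for &(root, p) in hints {
        let mut stack = vec![root];
        while let Some(krate) = stack.pop() {
            let entry = priority.entry(krate).or_insert(0);
            if *entry >= p {
                continue; // this subtree already got an equal or higher hint
            }
            *entry = p;
            stack.extend(graph.get(krate).into_iter().flatten());
        }
    }
    priority
}

/// Among units whose dependencies have all finished, start the most urgent.
fn pick_next<'a>(ready: &[&'a str], priority: &HashMap<&'a str, u32>) -> Option<&'a str> {
    ready
        .iter()
        .copied()
        .max_by_key(|krate| priority.get(krate).copied().unwrap_or(0))
}
```

With hints like `[("mozjs_sys", 2), ("script", 1)]`, everything under mozjs_sys outranks everything under script, which in turn outranks the rest.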

This exact mechanism, with numeric priority levels, is probably not the UX we want for Cargo. But how does it sound to add some way to influence the scheduling? Maybe call it “hints” to avoid setting the current algorithm in stone.

CC #5125, which is on the more general topic of scheduling.

@SimonSapin
Contributor Author

I expected codegen-units to provide some more parallelism during codegen, but that’s a separate issue

(Looks like codegen-units defaults to 2, not ncpu.)

@alexcrichton
Member

Scheduling has come up from time to time but generally hasn't received a ton of love, unfortunately! Some history:

The tl;dr is that Cargo has only limited judgement when scheduling things, and it only comes into play when Cargo has multiple candidates to schedule for an available jobserver token. The "waiting" red line in your graph, when it's nonzero, marks the only points where Cargo actually has a scheduling decision to make.

When Cargo has a scheduling decision to make, it currently sorts the list of candidates to be scheduled by the number of crates that transitively depend on each one. The thinking is that this prioritizes the crates that will hopefully unlock the most parallelism later in the graph. The previous behavior, after the "first fix", was sorting based on depth in the graph.
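
As an illustrative sketch of that heuristic (not the actual Cargo code; the graph type here is invented for the example), the sort key is the number of transitive dependents:

```rust
use std::collections::{HashMap, HashSet};

/// Toy graph: crate name -> direct dependencies.
type Graph<'a> = HashMap<&'a str, Vec<&'a str>>;

/// For every crate, count how many crates transitively depend on it.
fn transitive_dependents<'a>(graph: &Graph<'a>) -> HashMap<&'a str, usize> {
    let mut dependents: HashMap<&'a str, HashSet<&'a str>> = HashMap::new();
    for (&krate, deps) in graph {
        // Everything reachable below `krate` gains it as a dependent.
        let mut stack: Vec<&str> = deps.clone();
        let mut seen = HashSet::new();
        while let Some(dep) = stack.pop() {
            if seen.insert(dep) {
                dependents.entry(dep).or_default().insert(krate);
                stack.extend(graph.get(dep).into_iter().flatten());
            }
        }
    }
    dependents.into_iter().map(|(k, v)| (k, v.len())).collect()
}

/// Sort ready-to-build candidates so the crate that unlocks the most
/// downstream work comes first.
fn sort_candidates<'a>(candidates: &mut [&'a str], graph: &Graph<'a>) {
    let counts = transitive_dependents(graph);
    candidates.sort_by_key(|c| std::cmp::Reverse(counts.get(c).copied().unwrap_or(0)));
}
```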

That being said, I definitely agree we should have some way of providing scheduling hints. For example, Servo should be able to codify "when you compile the script crate, it's like compiling 15 crates serially" or something like that, and Cargo could use that to prioritize scheduling of script's transitive dependencies.

Zooming out on your graph, though: do you know whether Cargo could actually have scheduled better here? Did dependencies of script get held back by accident?

@ehuss
Contributor

ehuss commented Sep 26, 2019

(Looks like codegen-units defaults to 2, not ncpu.)

I'm pretty sure it is 16?

@Eh2406
Contributor

Eh2406 commented Sep 26, 2019

If you zoom out though on your graph, do you know if Cargo could have actually scheduled better here? Did dependencies of script get held back by accident?

At a meetup last night we realized that this can be answered with a small addition to the data stored in -Ztimings (cc #7396 (comment)). I think it will be very motivating to "know" how much performance we are leaving on the table.

@SimonSapin
Contributor Author

(Looks like codegen-units defaults to 2, not ncpu.)

I'm pretty sure it is 16?

Oh, interesting. Any guess why this machine’s 4 cores / 8 threads are not saturated during codegen for the script crate in the timings above, then?

If you zoom out though on your graph, do you know if Cargo could have actually scheduled better here? Did dependencies of script get held back by accident?

If I run cargo clean and then cargo build -p script, the total time is 319 seconds, whereas in the full build graph the script crate finishes after second 560. So as far as I can tell that’s ~240 seconds’ worth of work that is not a dependency of script, and that could (with different scheduling) be done in parallel with it.

@ehuss ehuss added A-build-execution Area: anything dealing with executing the compiler Performance Gotta go fast! labels Oct 10, 2019
@JoshMcguigan
Contributor

Is it practical to use the JSON output from cargo build -Ztimings=json (or something similar) as a kind of profile-guided optimization to determine the optimal build order automatically?

In any case, I've submitted #8908, which attempts to take a small step toward allowing some weighting to be considered when determining build order. This should be useful either for letting a user provide scheduling hints as requested in this issue, or for implementing a system that uses actual timing data from a previous build.

bors added a commit that referenced this issue Nov 30, 2020
update dependency queue to consider cost for each node

In support of #7437, this updates the dependency queue implementation to consider a non-fixed cost for each node. The behavior of this implementation should match the previous implementation if all units are assigned the same cost by the code which calls the `queue` method (which is what we do for now).

In the future I can think of at least two ways these costs could be used:

1. Use some known constant value by default (say 100 as I've done in this PR), and allow the user to provide hints for certain units to influence compilation order (as requested in #7437).
2. Take timing data from a previous compilation (perhaps the json output of `cargo build -Ztimings=json`) and use the actual time required to compile a given unit (in milliseconds) as its cost. Any units not included in the provided timing data could perhaps have their cost set to the median cost of all crates included in the timing data.
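
A sketch of what option 2 might look like (the shape of the timing data is assumed here for illustration; this is not `-Ztimings`' actual schema, and how the JSON gets parsed is out of scope):

```rust
use std::collections::HashMap;

/// Assign each unit a cost from a previous build's per-unit compile
/// times (in ms); units with no recorded time get the median cost.
fn costs_from_timings(
    units: &[&str],
    timings_ms: &HashMap<String, u64>,
) -> HashMap<String, u64> {
    let mut observed: Vec<u64> = timings_ms.values().copied().collect();
    observed.sort_unstable();
    let fallback = if observed.is_empty() {
        100 // the constant default cost used in this PR
    } else {
        observed[observed.len() / 2] // median of observed times
    };
    units
        .iter()
        .map(|&u| (u.to_string(), timings_ms.get(u).copied().unwrap_or(fallback)))
        .collect()
}
```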
@luser
Contributor

luser commented Feb 9, 2021

Is it practical to use the json output from cargo build -Ztimings=json (or something similar) as a kind of profile guided optimization to determine the optimal build order automatically?

That's #7396.

@epage
Contributor

epage commented Nov 2, 2023

#11032 looked at adding more heuristics to improve build order, but they were a mixed bag. Something that came up in that discussion was a (presumed) perma-unstable way of configuring the priority of every package, to explore what heuristics could be used. #7396 could then build on this to create a feedback loop for your own system to improve.

@epage epage added the S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. label Nov 2, 2023
@ariasanovsky

Now that nightly supports parallel front-end compilation, I propose we develop an algorithm $A = A_\alpha$ with an adjustable configuration $\alpha$ and follow a data-driven approach to optimizing $\alpha$ for different build environments. Computing a schedule guess should itself run in $o(\text{compilation time})$, so that it reduces overall compilation time rather than adding to it.

Notes:

  1. Job-shop scheduling is famously NP-hard.
  • Despite this, a great deal of operations-research effort has gone into solvers and heuristics that efficiently find $\varepsilon$-good-enough solutions in practical problem contexts.
  • We can improve overall compilation time by leveraging decades of existing research.
  2. Job scheduling is harder when processing times are unmeasured or stochastic.
  • To optimize compile times for all users, we need to account for the diversity of hardware specifications and other factors, controlling for bias.
  3. Heuristics will work on my machine but not yours.
  • Hard-coding bias terms for famous crates may work for a large number of users in the short term, but creates unmeasured friction for others.
  • Hard-coded values need to be benchmarked periodically and updated via a reproducible set of techniques that can be adjusted empirically over time.
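
To make the adjustable configuration $\alpha$ concrete, here is one purely hypothetical shape it could take (none of these names are Cargo internals): score each ready unit as a weighted sum of cheap per-unit features, and fit the weight vector $\alpha$ offline against recorded builds.

```rust
/// Features a scheduler could observe per unit before choosing what to
/// start next (all hypothetical).
struct UnitFeatures {
    estimated_cost_ms: f64,     // e.g. from a previous build's timings
    transitive_dependents: f64, // how much downstream work this unlocks
    critical_path_ms: f64,      // longest chain of crates depending on this unit
}

/// The adjustable configuration alpha: one weight per feature.
struct Alpha {
    w_cost: f64,
    w_dependents: f64,
    w_critical_path: f64,
}

/// A_alpha: the ready unit with the highest score is started first.
/// Evaluating this is cheap, so picking a schedule stays well within
/// o(compilation time); the data-driven part is fitting alpha.
fn score(alpha: &Alpha, f: &UnitFeatures) -> f64 {
    alpha.w_cost * f.estimated_cost_ms
        + alpha.w_dependents * f.transitive_dependents
        + alpha.w_critical_path * f.critical_path_ms
}
```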

@weihanglo
Member

FWIW, not exactly the same issue, but #12912 may share some ideas regarding parallelism.

@weihanglo
Member

Similarly, protobuf generation (via prost) is usually assigned a lower priority but takes a relatively long time to build; there is a case of this in risingwave.

Since we have integrated SQLite into Cargo, it is a good time to evaluate how to persist timings data as a first step.
