
Provide scheduling hints? #7437

Open
SimonSapin opened this issue Sep 26, 2019 · 11 comments
Labels
A-build-execution Area: anything dealing with executing the compiler Performance Gotta go fast! S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.

Comments

@SimonSapin
Contributor

When more crates are ready to start compiling than there is available parallelism, how does Cargo pick which ones to start first? Are there ways to influence this scheduling? (For example, does the order of declarations of dependencies in a given Cargo.toml file matter? Or, what effect would it have to add an otherwise unnecessary edge to the dependency graph?) Should we add a new mechanism to influence it?

Here is the output of cargo build -Z timings for Servo (with "Min unit time" set to 10 seconds):

[interactive timings graph of the Servo build; image not preserved]

If we start from the end of the graph, (part of) a critical path is very apparent: the final executable only starts to build after the script crate has finished, and the script crate in turn only starts after the build script for mozjs_sys has finished.

Additionally, CPU utilization drops while script is compiling because Cargo runs out of other tasks to do and rustc only has limited intra-crate parallelism. (I expected codegen-units to provide some more parallelism during codegen, but that’s a separate issue.)

It seems that if we could start mozjs_sys and script earlier, the total time could be significantly reduced. Specifically, cargo build -p mozjs_sys && cargo build -p script && cargo build might lead to better scheduling. A way to achieve that scheduling with more parallelism could be to assign priorities: mozjs_sys and its transitive dependencies have priority 2 (more urgent), script and those of its dependencies that don’t already have a priority have priority 1, and everything else has priority 0.
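
A rough sketch of that priority mechanism (purely illustrative Rust, not Cargo’s internals; the graph representation and function names are made up for this example): propagate each hinted crate’s priority to its transitive dependencies, then start the highest-priority ready unit whenever a jobserver token frees up.

```rust
use std::collections::HashMap;

/// Toy dependency graph: crate name -> direct dependencies.
type Graph<'a> = HashMap<&'a str, Vec<&'a str>>;

/// Give each hinted crate's priority to the crate itself and all of its
/// transitive dependencies, keeping the highest priority seen per crate.
fn propagate_priorities<'a>(
    graph: &Graph<'a>,
    hints: &[(&'a str, u32)],
) -> HashMap<&'a str, u32> {
    let mut priority: HashMap<&'a str, u32> = HashMap::new();
    for &(root, p) in hints {
        let mut stack = vec![root];
        while let Some(krate) = stack.pop() {
            let entry = priority.entry(krate).or_insert(0);
            if *entry >= p {
                continue; // this subtree already got an equal or higher hint
            }
            *entry = p;
            stack.extend(graph.get(krate).into_iter().flatten());
        }
    }
    priority
}

/// Among units whose dependencies have all finished, start the most urgent.
fn pick_next<'a>(ready: &[&'a str], priority: &HashMap<&'a str, u32>) -> Option<&'a str> {
    ready
        .iter()
        .copied()
        .max_by_key(|krate| priority.get(krate).copied().unwrap_or(0))
}
```

With hints like `[("mozjs_sys", 2), ("script", 1)]`, everything under mozjs_sys outranks everything under script, which in turn outranks the rest.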

This exact mechanism, with numeric priority levels, is probably not the UX we want for Cargo. But how does it sound to add some way to influence the scheduling? Maybe call it “hints” to avoid setting the current algorithm in stone.

CC #5125, which is on the more general topic of scheduling.

@SimonSapin
Contributor Author

I expected codegen-units to provide some more parallelism during codegen, but that’s a separate issue

(Looks like codegen-units defaults to 2, not ncpu.)

@alexcrichton
Member

Scheduling has come up from time to time but generally hasn't received a ton of love, unfortunately! Some history:

The tl;dr is that Cargo has only limited judgement when scheduling things, and it only comes into play when Cargo has multiple candidates to schedule for an available jobserver token. The "waiting" red line in your graph, when it's nonzero, marks the only points where Cargo actually has a scheduling decision to make.

When Cargo has a scheduling decision to make, it currently sorts the list of candidates to be scheduled by the number of crates that transitively depend on each one. The thinking is that this prioritizes the crates that will hopefully unlock the most parallelism later in the graph. The previous behavior, after the "first fix", was sorting based on depth in the graph.
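
As an illustrative sketch of that heuristic (not the actual Cargo code; the graph type here is invented for the example), the sort key is the number of transitive dependents:

```rust
use std::collections::{HashMap, HashSet};

/// Toy graph: crate name -> direct dependencies.
type Graph<'a> = HashMap<&'a str, Vec<&'a str>>;

/// For every crate, count how many crates transitively depend on it.
fn transitive_dependents<'a>(graph: &Graph<'a>) -> HashMap<&'a str, usize> {
    let mut dependents: HashMap<&'a str, HashSet<&'a str>> = HashMap::new();
    for (&krate, deps) in graph {
        // Everything reachable below `krate` gains it as a dependent.
        let mut stack: Vec<&str> = deps.clone();
        let mut seen = HashSet::new();
        while let Some(dep) = stack.pop() {
            if seen.insert(dep) {
                dependents.entry(dep).or_default().insert(krate);
                stack.extend(graph.get(dep).into_iter().flatten());
            }
        }
    }
    dependents.into_iter().map(|(k, v)| (k, v.len())).collect()
}

/// Sort ready-to-build candidates so the crate that unlocks the most
/// downstream work comes first.
fn sort_candidates<'a>(candidates: &mut [&'a str], graph: &Graph<'a>) {
    let counts = transitive_dependents(graph);
    candidates.sort_by_key(|c| std::cmp::Reverse(counts.get(c).copied().unwrap_or(0)));
}
```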

That being said, I definitely agree we should have some way of providing scheduling hints. For example, Servo should be able to codify "when you compile the script crate, it's like compiling 15 crates serially" or something like that, and Cargo could use that to prioritize scheduling of script's transitive dependencies.

Zooming out on your graph, though: do you know whether Cargo could actually have scheduled better here? Did dependencies of script get held back by accident?

@ehuss
Contributor

ehuss commented Sep 26, 2019

(Looks like codegen-units defaults to 2, not ncpu.)

I'm pretty sure it is 16?

@Eh2406
Contributor

Eh2406 commented Sep 26, 2019

If you zoom out though on your graph, do you know if Cargo could have actually scheduled better here? Did dependencies of script get held back by accident?

At a meetup last night we realized that this can be answered with a small addition to the data stored in -Ztimings (cc #7396 (comment)). I think it will be very motivating to "know" how much performance we are leaving on the table.

@SimonSapin
Contributor Author

(Looks like codegen-units defaults to 2, not ncpu.)

I'm pretty sure it is 16?

Oh, interesting. Any guess why this machine’s 4 cores / 8 threads are not saturated during codegen for the script crate in the timings above, then?

If you zoom out though on your graph, do you know if Cargo could have actually scheduled better here? Did dependencies of script get held back by accident?

If I run cargo clean and then cargo build -p script, the total time is 319 seconds, whereas in the full build graph the script crate finishes after second 560. So as far as I can tell that’s ~240 seconds’ worth of work that is not a dependency of script, and that could (with different scheduling) be done in parallel with it.

@ehuss ehuss added A-build-execution Area: anything dealing with executing the compiler Performance Gotta go fast! labels Oct 10, 2019
@JoshMcguigan
Contributor

Is it practical to use the JSON output from cargo build -Ztimings=json (or something similar) as a kind of profile-guided optimization to determine the optimal build order automatically?

In any case, I've submitted #8908, which attempts to take a small step toward allowing some weighting to be considered when determining build order. This should be useful either for letting a user provide scheduling hints as requested in this issue, or for implementing a system that uses actual timing data from a previous build.

bors added a commit that referenced this issue Nov 30, 2020
update dependency queue to consider cost for each node

In support of #7437, this updates the dependency queue implementation to consider a non-fixed cost for each node. The behavior of this implementation should match the previous implementation if all units are assigned the same cost by the code which calls the `queue` method (which is what we do for now).

In the future I can think of at least two ways these costs could be used:

1. Use some known constant value by default (say 100 as I've done in this PR), and allow the user to provide hints for certain units to influence compilation order (as requested in #7437).
2. Take timing data from a previous compilation (perhaps the json output of `cargo build -Ztimings=json`) and use the actual time required to compile a given unit (in milliseconds) as its cost. Any units not included in the provided timing data could perhaps have their cost set to the median cost of all crates included in the timing data.
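
A sketch of what option 2 might look like (the shape of the timing data is assumed here for illustration; this is not `-Ztimings`' actual schema, and how the JSON gets parsed is out of scope):

```rust
use std::collections::HashMap;

/// Assign each unit a cost from a previous build's per-unit compile
/// times (in ms); units with no recorded time get the median cost.
fn costs_from_timings(
    units: &[&str],
    timings_ms: &HashMap<String, u64>,
) -> HashMap<String, u64> {
    let mut observed: Vec<u64> = timings_ms.values().copied().collect();
    observed.sort_unstable();
    let fallback = if observed.is_empty() {
        100 // the constant default cost used in this PR
    } else {
        observed[observed.len() / 2] // median of observed times
    };
    units
        .iter()
        .map(|&u| (u.to_string(), timings_ms.get(u).copied().unwrap_or(fallback)))
        .collect()
}
```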
@luser
Contributor

luser commented Feb 9, 2021

Is it practical to use the json output from cargo build -Ztimings=json (or something similar) as a kind of profile guided optimization to determine the optimal build order automatically?

That's #7396.

@epage
Contributor

epage commented Nov 2, 2023

#11032 looked at adding more heuristics to improve build order, but they were a mixed bag. Something that came up in that discussion was a (presumed) perma-unstable way of configuring the priority of every package, to explore what heuristics could be used. #7396 could then build on this to create a feedback loop for your own system to improve.

@epage epage added the S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. label Nov 2, 2023
@ariasanovsky

Now that nightly supports parallel front-end compilation, I propose we develop an algorithm $A = A_\alpha$ with an adjustable configuration $\alpha$ and follow a data-driven approach to optimizing $\alpha$ for different build environments. Computing a schedule guess should itself run in $o(\text{compilation time})$, so that it reduces overall compilation time rather than adding to it.

Notes:

  1. Job-shop scheduling is famously NP-hard.
  • Despite this, a great deal of operations-research effort has gone into solvers and heuristics that efficiently find $\varepsilon$-good-enough solutions in practical problem contexts.
  • We can improve overall compilation time by leveraging decades of existing research.
  2. Job scheduling is harder when processing times are unmeasured or stochastic.
  • To optimize compile times for all users, we need to account for the diversity of hardware specifications and other factors, controlling for bias.
  3. Heuristics will work on my machine but not yours.
  • Hard-coding bias terms for famous crates may work for a large number of users in the short term, but creates unmeasured friction for others.
  • Hard-coded values need to be benchmarked periodically and updated via a reproducible set of techniques that can be adjusted empirically over time.
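
To make the adjustable configuration $\alpha$ concrete, here is one purely hypothetical shape it could take (none of these names are Cargo internals): score each ready unit as a weighted sum of cheap per-unit features, and fit the weight vector $\alpha$ offline against recorded builds.

```rust
/// Features a scheduler could observe per unit before choosing what to
/// start next (all hypothetical).
struct UnitFeatures {
    estimated_cost_ms: f64,     // e.g. from a previous build's timings
    transitive_dependents: f64, // how much downstream work this unlocks
    critical_path_ms: f64,      // longest chain of crates depending on this unit
}

/// The adjustable configuration alpha: one weight per feature.
struct Alpha {
    w_cost: f64,
    w_dependents: f64,
    w_critical_path: f64,
}

/// A_alpha: the ready unit with the highest score is started first.
/// Evaluating this is cheap, so picking a schedule stays well within
/// o(compilation time); the data-driven part is fitting alpha.
fn score(alpha: &Alpha, f: &UnitFeatures) -> f64 {
    alpha.w_cost * f.estimated_cost_ms
        + alpha.w_dependents * f.transitive_dependents
        + alpha.w_critical_path * f.critical_path_ms
}
```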

@weihanglo
Member

FWIW, not exactly the same issue, but #12912 may share some ideas regarding parallelism.

@weihanglo
Member

Similarly, protobuf generation (via prost) is usually assigned a lower priority but takes a relatively long time to build; there is a case of this in risingwave.

Since we have integrated SQLite into Cargo, it is a good time to evaluate how to persist timings data as a first step.
