Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize librustc_driver.so with BOLT #116352

Merged
merged 5 commits into from
Oct 14, 2023
Merged

Conversation

Kobzol
Copy link
Contributor

@Kobzol Kobzol commented Oct 2, 2023

This PR optimizes librustc_driver.so on 64-bit Linux CI with BOLT.

Code

One thing that's not clear yet to me how to resolve is how to best pass a linker flag that we need for BOLT (the second commit). It is currently passed unconditionally, which is not a good idea. We somehow have to:

  1. Only pass it when we actually plan to use BOLT. How to best do that? config.toml entry? Environment variable? CLI flag for bootstrap? BOLT optimization is done by opt-dist, therefore bootstrap doesn't know about it by default.
  2. Only pass it to librustc_driver.so (see performance below).

Some discussion of this flag already happened on Zulip.

Performance

Latest perf. results can be found here. Note that instruction counts are not very interesting here, there are only regressions on hello world programs. Probably caused by a larger C libstd (?).

Summary:

  • ✔️ -1.8% mean improvement in cycle counts across many primary benchmarks.
  • ✔️ -1.8% mean Max-RSS improvement.
  • ✖️ 34 MiB ( 48%) artifact size regression of librustc_driver.so.
    • This is caused by building librustc_driver.so with relocations (which are required for BOLT). Hopefully, it will be fixed in the future with BOLT improvements, but now trying to reduce this size increase is tricky.
    • Note that the size of this file was recently reduced in Build rustc with a single CGU on x64 Linux #115554 by pretty much the same amount (33 MiB). So the size after this PR is basically the same as it was for the last ~year.
  • ✖️ 1.4 MiB ( 53%) artifact size regression of rustc.
    • This is annoying and pretty much unnecessary. It is caused by the way relocations are currently applied in this PR, because they are applied both to librustc_driver.so (where they are needed) and for rustc (where they aren't needed), since both are built with a single cargo invocation. We might need e.g. some tricks in the bootstrap rustc shim to only apply the relocation flag for the shared library and not for rustc.

CI time

CI (try build) got slower by ~5 minutes, which is fine, IMO. It can be further reduced by running LLVM and librustc_driver BOLT profile gathering at the same time (now they are gathered separately for LLVM and librustc_driver).

r? @Mark-Simulacrum

Also CC @onur-ozkan, primarily for the bootstrap linker flag issue.

@rustbot rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Oct 2, 2023
@@ -904,6 904,11 @@ impl Step for Rustc {
cargo.arg("-p").arg(krate);
}

if compiler.stage == 1 {
// Relocations are required for BOLT to work.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would've expected this if to gate on the usage of BOLT somehow -- do we have a config flag for that? I think just a generic config flag is fine ("enable-bolt-settings") is ok, we don't need it to actually do optimizations. Maybe even "opt-dist" is the right name for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I have discussed in the PR description and on Zulip. @onur-ozkan has expressed a desire to avoid a BOLT-specific flag, although it seems like the most straightforward solution to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have problem with doing this with using flags/config.toml. My point was more about its behaviour.

Add something like --bolt-flags, which would set the flag only for the cases where it's needed (and thus tying its behavior to opt-dist)

Like here, "which would set the flag only for the cases where it's needed" this seems very limited experience to me.

Copy link
Contributor Author

@Kobzol Kobzol Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the interaction of the flag seems very specific. However, I feel like that's also true in general of several other similar flags. For example, the PGO generate/use flags are also only performed in stage 2, and basically I don't think that they are very useful outside of opt-dist.

The fact that we pass a specific linker flag is tied to BOLT - that won't really change. The fact that we only pass it to librustc_driver is tied to the way the compiler is distributed, and unless that changes, it also don't make sense to do it in any other way. And performing BOLT on earlier stage than 2 also doesn't make sense. So from this point of view, I think that the behavior of the flag is not that limited, because it's pretty much the only way we could do it that currently makes sense.

bootstrap and opt-dist are intrinsically intertwined, for better or worse, so I don't think that this flag will change that or create some unique, "anti-systemic" solution :D

@Mark-Simulacrum
Copy link
Member

One thing that's not clear yet to me how to resolve is how to best pass a linker flag that we need for BOLT (the second commit). It is currently passed unconditionally, which is not a good idea. We somehow have to: [...]

We might need e.g. some tricks in the bootstrap rustc shim to only apply the relocation flag for the shared library and not for rustc.

Given the desire to pass it only to one specific invocation of rustc, my inclination is to use an environment variable set by opt-dist and read by the custom rustc wrapper (src/bootstrap/bin/rustc.rs).

@bors
Copy link
Contributor

bors commented Oct 8, 2023

☔ The latest upstream changes (presumably #114623) made this pull request unmergeable. Please resolve the merge conflicts.

@rustbot
Copy link
Collaborator

rustbot commented Oct 9, 2023

This PR changes src/bootstrap/config.rs. If appropriate, please also update CONFIG_CHANGE_HISTORY in src/bootstrap/lib.rs and change-id in config.example.toml.

@Kobzol
Copy link
Contributor Author

Kobzol commented Oct 9, 2023

I added the --enable-bolt-settings flag and an env. var passed to the rustc shim.

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 9, 2023
@bors
Copy link
Contributor

bors commented Oct 9, 2023

⌛ Trying commit 9a0e90f with merge cdee08f...

bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 9, 2023
Optimize `librustc_driver.so` with BOLT

This PR optimizes `librustc_driver.so` on 64-bit Linux CI with BOLT.

### Code
One thing that's not clear yet to me how to resolve is how to best pass a linker flag that we need for BOLT (the second commit). It is currently passed unconditionally, which is not a good idea. We somehow have to:
1) Only pass it when we actually plan to use BOLT. How to best do that? `config.toml` entry? Environment variable? CLI flag for bootstrap? BOLT optimization is done by `opt-dist`, therefore bootstrap doesn't know about it by default.
2) Only pass it to `librustc_driver.so` (see performance below).

Some discussion of this flag already happened on [Zulip](https://rust-lang.zulipchat.com/#narrow/stream/326414-t-infra.2Fbootstrap/topic/Adding.20a.20one-off.20linker.20flag).

### Performance
Latest perf. results can be found [here](rust-lang#102487 (comment)). Note that instruction counts are not very interesting here, there are only regressions on hello world programs. Probably caused by a larger C   libstd (?).

Summary:
- ✔️ `-1.8%` mean improvement in cycle counts across many primary benchmarks.
- ✔️ `-1.8%` mean Max-RSS improvement.
- ✖️ 34 MiB ( 48%) artifact size regression of `librustc_driver.so`.
  - This is caused by building `librustc_driver.so` with relocations (which are required for BOLT). Hopefully, it will be [fixed](https://discourse.llvm.org/t/bolt-rfc-a-new-mode-to-rewrite-entire-binary/68674) in the future with BOLT improvements, but now trying to reduce this size increase is [tricky](rust-lang#114649).
  - Note that the size of this file was recently reduced in rust-lang#115554 by pretty much the same amount (33 MiB). So the size after this PR is basically the same as it was for the last ~year.
- ✖️ 1.4 MiB ( 53%) artifact size regression of `rustc`.
  - This is annoying and pretty much unnecessary. It is caused by the way relocations are currently applied in this PR, because they are applied both to `librustc_driver.so` (where they are needed) and for `rustc` (where they aren't needed), since both are built with a single cargo invocation. We might need e.g. some tricks in the bootstrap `rustc` shim to only apply the relocation flag for the shared library and not for `rustc`.

### CI time
CI (try build) got slower by ~5 minutes, which is fine, IMO. It can be further reduced by running LLVM and `librustc_driver` BOLT profile gathering at the same time (now they are gathered separately for LLVM and `librustc_driver`).

r? `@Mark-Simulacrum`

Also CC `@onur-ozkan,` primarily for the bootstrap linker flag issue.
@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Oct 9, 2023

💔 Test failed - checks-actions

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 9, 2023
@Kobzol
Copy link
Contributor Author

Kobzol commented Oct 11, 2023

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Oct 11, 2023

⌛ Trying commit 482a820 with merge dbae2a1...

bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 11, 2023
Optimize `librustc_driver.so` with BOLT

This PR optimizes `librustc_driver.so` on 64-bit Linux CI with BOLT.

### Code
One thing that's not clear yet to me how to resolve is how to best pass a linker flag that we need for BOLT (the second commit). It is currently passed unconditionally, which is not a good idea. We somehow have to:
1) Only pass it when we actually plan to use BOLT. How to best do that? `config.toml` entry? Environment variable? CLI flag for bootstrap? BOLT optimization is done by `opt-dist`, therefore bootstrap doesn't know about it by default.
2) Only pass it to `librustc_driver.so` (see performance below).

Some discussion of this flag already happened on [Zulip](https://rust-lang.zulipchat.com/#narrow/stream/326414-t-infra.2Fbootstrap/topic/Adding.20a.20one-off.20linker.20flag).

### Performance
Latest perf. results can be found [here](rust-lang#102487 (comment)). Note that instruction counts are not very interesting here, there are only regressions on hello world programs. Probably caused by a larger C   libstd (?).

Summary:
- ✔️ `-1.8%` mean improvement in cycle counts across many primary benchmarks.
- ✔️ `-1.8%` mean Max-RSS improvement.
- ✖️ 34 MiB ( 48%) artifact size regression of `librustc_driver.so`.
  - This is caused by building `librustc_driver.so` with relocations (which are required for BOLT). Hopefully, it will be [fixed](https://discourse.llvm.org/t/bolt-rfc-a-new-mode-to-rewrite-entire-binary/68674) in the future with BOLT improvements, but now trying to reduce this size increase is [tricky](rust-lang#114649).
  - Note that the size of this file was recently reduced in rust-lang#115554 by pretty much the same amount (33 MiB). So the size after this PR is basically the same as it was for the last ~year.
- ✖️ 1.4 MiB ( 53%) artifact size regression of `rustc`.
  - This is annoying and pretty much unnecessary. It is caused by the way relocations are currently applied in this PR, because they are applied both to `librustc_driver.so` (where they are needed) and for `rustc` (where they aren't needed), since both are built with a single cargo invocation. We might need e.g. some tricks in the bootstrap `rustc` shim to only apply the relocation flag for the shared library and not for `rustc`.

### CI time
CI (try build) got slower by ~5 minutes, which is fine, IMO. It can be further reduced by running LLVM and `librustc_driver` BOLT profile gathering at the same time (now they are gathered separately for LLVM and `librustc_driver`).

r? `@Mark-Simulacrum`

Also CC `@onur-ozkan,` primarily for the bootstrap linker flag issue.
@bors
Copy link
Contributor

bors commented Oct 11, 2023

☀️ Try build successful - checks-actions
Build commit: dbae2a1 (dbae2a14a1354fb5b3c695e22afc5692610395d0)

@rust-timer

This comment has been minimized.

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 14, 2023
@bors bors merged commit c543b6f into rust-lang:master Oct 14, 2023
12 checks passed
@rustbot rustbot added this to the 1.75.0 milestone Oct 14, 2023
@Kobzol Kobzol deleted the rustc-driver-bolt branch October 14, 2023 23:42
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (c543b6f): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.3% [0.2%, 5.7%] 10
Regressions ❌
(secondary)
1.9% [0.3%, 5.0%] 60
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.3% [-0.3%, -0.3%] 4
All ❌✅ (primary) 2.3% [0.2%, 5.7%] 10

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.2% [0.6%, 1.6%] 9
Regressions ❌
(secondary)
3.4% [1.7%, 5.4%] 15
Improvements ✅
(primary)
-2.6% [-5.0%, -0.6%] 66
Improvements ✅
(secondary)
-2.9% [-9.0%, -0.6%] 108
All ❌✅ (primary) -2.2% [-5.0%, 1.6%] 75

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.0% [-3.1%, -0.9%] 70
Improvements ✅
(secondary)
-2.5% [-4.7%, -1.9%] 18
All ❌✅ (primary) -2.0% [-3.1%, -0.9%] 70

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 628.052s -> 623.517s (-0.72%)
Artifact size: 271.29 MiB -> 305.48 MiB (12.60%)

@Kobzol
Copy link
Contributor Author

Kobzol commented Oct 15, 2023

The interesting metrics here are cycle counts (-2% mean improvement amongst primary benchmarks) and Max-RSS (-2% mean improvement amongst primary benchmarks). Bootstrap time has also been reduced by ~5s (usually 2-3s tends to be noise for bootstrap timings, so 5s seems like a statistically significant improvement to me).

The instruction count regressions stem happens mostly for "hello-world-like" programs that contain tiny amounts of code. It's basically a fixed startup cost regression that happens because the compiler library is now ~30 MiB larger. The size of librustc_driver.so has been recently reduced by ~30 MiB by switching to a single CGU, so this mostly cancels out. It's unfortunate that we lost the size improvements, but I think that it's worth the 2% cycle wins.

@rustbot label: perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Oct 15, 2023
bors-ferrocene bot added a commit to ferrocene/ferrocene that referenced this pull request Oct 17, 2023
54: Pull upstream master 2023 10 17 r=pietroalbini a=Veykril

* rust-lang/rust#116196
* rust-lang/rust#116824
* rust-lang/rust#116822
* rust-lang/rust#116477
* rust-lang/rust#116826
* rust-lang/rust#116820
  * rust-lang/rust#116811
  * rust-lang/rust#116808
  * rust-lang/rust#116805
  * rust-lang/rust#116800
  * rust-lang/rust#116798
  * rust-lang/rust#116754
* rust-lang/rust#114370
* rust-lang/rust#116804
  * rust-lang/rust#116802
  * rust-lang/rust#116790
  * rust-lang/rust#116786
  * rust-lang/rust#116709
  * rust-lang/rust#116430
  * rust-lang/rust#116257
  * rust-lang/rust#114157
* rust-lang/rust#116731
* rust-lang/rust#116550
* rust-lang/rust#114330
* rust-lang/rust#116724
* rust-lang/rust#116782
  * rust-lang/rust#116776
  * rust-lang/rust#115955
  * rust-lang/rust#115196
* rust-lang/rust#116775
* rust-lang/rust#114589
* rust-lang/rust#113747
* rust-lang/rust#116772
  * rust-lang/rust#116771
  * rust-lang/rust#116760
  * rust-lang/rust#116755
  * rust-lang/rust#116732
  * rust-lang/rust#116522
  * rust-lang/rust#116341
  * rust-lang/rust#116172
* rust-lang/rust#110604
* rust-lang/rust#110729
* rust-lang/rust#116527
* rust-lang/rust#116688
* rust-lang/rust#116757
  * rust-lang/rust#116753
  * rust-lang/rust#116748
  * rust-lang/rust#116741
  * rust-lang/rust#116863
* rust-lang/rust#116691
* rust-lang/rust#116643
* rust-lang/rust#116683
* rust-lang/rust#116635
* rust-lang/rust#115515
* rust-lang/rust#116742
  * rust-lang/rust#116661
  * rust-lang/rust#116576
  * rust-lang/rust#116540
* rust-lang/rust#116352
* rust-lang/rust#116737
  * rust-lang/rust#116730
  * rust-lang/rust#116723
  * rust-lang/rust#116715
  * rust-lang/rust#116603
  * rust-lang/rust#116591
  * rust-lang/rust#115439
* rust-lang/rust#116264
* rust-lang/rust#116727
  * rust-lang/rust#116704
  * rust-lang/rust#116696
  * rust-lang/rust#116695
  * rust-lang/rust#116644
  * rust-lang/rust#116630
* rust-lang/rust#116728
  * rust-lang/rust#116689
  * rust-lang/rust#116679
  * rust-lang/rust#116618
  * rust-lang/rust#116577
  * rust-lang/rust#115653
* rust-lang/rust#116702
* rust-lang/rust#116015
* rust-lang/rust#115822
* rust-lang/rust#116407
* rust-lang/rust#115719
* rust-lang/rust#115524
* rust-lang/rust#116705
* rust-lang/rust#116645
* rust-lang/rust#116233
* rust-lang/rust#115108
* rust-lang/rust#116670
* rust-lang/rust#116676
* rust-lang/rust#116666



Co-authored-by: Benoît du Garreau <[email protected]>
Co-authored-by: Colin Finck <[email protected]>
Co-authored-by: Ian Jackson <[email protected]>
Co-authored-by: Joshua Liebow-Feeser <[email protected]>
Co-authored-by: León Orell Valerian Liehr <[email protected]>
Co-authored-by: Trevor Gross <[email protected]>
Co-authored-by: Evan Merlock <[email protected]>
Co-authored-by: joboet <[email protected]>
Co-authored-by: Ralf Jung <[email protected]>
Co-authored-by: DaniPopes <57450786 [email protected]>
Co-authored-by: Mark Rousskov <[email protected]>
Co-authored-by: onur-ozkan <[email protected]>
Co-authored-by: Nicholas Nethercote <[email protected]>
Co-authored-by: The 8472 <[email protected]>
Co-authored-by: Samuel Thibault <[email protected]>
Co-authored-by: reez12g <[email protected]>
Co-authored-by: Jakub Beránek <[email protected]>
if let Some("rustc_driver") = crate_name {
cmd.arg("-Clink-args=-Wl,-q");
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added after the debug printing, so the extra flags won't show up when the command is printed. Was that deliberate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not, good catch, thanks! #123189

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Mar 31, 2024
Log BOLT args in bootstrap `rustc` shim

Before, the BOLT argument would not be logged, because it was only added after the logging has happened.

Found by `@RalfJung` [here](rust-lang#116352 (comment)).
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Mar 31, 2024
Rollup merge of rust-lang#123189 - Kobzol:rustc-shim-log, r=onur-ozkan

Log BOLT args in bootstrap `rustc` shim

Before, the BOLT argument would not be logged, because it was only added after the logging has happened.

Found by `@RalfJung` [here](rust-lang#116352 (comment)).
github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Apr 3, 2024
Log BOLT args in bootstrap `rustc` shim

Before, the BOLT argument would not be logged, because it was only added after the logging has happened.

Found by `@RalfJung` [here](rust-lang/rust#116352 (comment)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants