PGO optimized binnary #6943

FilipAndersson245 · 2022-10-29T14:44:02Z

Describe the solution you'd like

I would suggest we explore if it's possible to integrate Profile-Guided-Optimization (PGO).
PGO builds the binary twice, the first time with intrinsics, and runs it on some set of example input to gather statistics on how the functions are called. this data is then used to optimize a second build enabling future performance, as it can how the program is executed during runtime.
things we need to consider if this is to be implemented:

Collect a representative of data, representing common use of the shell, I believe this should not be too difficult, by running some general set of benchmarks / common commands.
Setup a CI infrastructure to do this for our 3week releases.runs

Describe alternatives you've considered

No response

Additional context and details

https://github.com/Kobzol/cargo-pgo

fdncred · 2022-10-29T15:39:37Z

How do you do a PGO build with cargo? I'd like to try it.

EDIT: I see this now. Looks like a pain. I wonder if it would be worth it?
https://doc.rust-lang.org/rustc/profile-guided-optimization.html

FilipAndersson245 · 2022-10-29T15:41:01Z

How do you do a PGO build with cargo? I'd like to try it.

https://github.com/Kobzol/cargo-pgo
I managed to run PGO locally using this cargo extension. the extension manages alot of the more anoying parts.

EDIT: after installing it it seem you can do the following

cargo pgo build
cargo pgo run # Gather statistics
cargo pgo optimize # Create the new optimized binary.

fdncred · 2022-10-29T16:19:25Z

Ugh - I get this on Mac.

> cargo pgo run 
[2022-10-29T16:18:11Z INFO  cargo_pgo::pgo::instrument] PGO profile directory will be cleared.
[2022-10-29T16:18:11Z INFO  cargo_pgo::pgo::instrument] PGO profiles will be stored into /Users/fdncred/src/forks/nushell/target/pgo-profiles.
    Finished release [optimized] target(s) in 0.29s
[2022-10-29T16:18:11Z INFO  cargo_pgo::pgo::instrument] PGO instrumentation build finished successfully.
     Running `target/aarch64-apple-darwin/release/nu`
Error:
  × Nushell launched as a REPL, but STDIN is not a TTY; either launch in a
  │ valid terminal or provide arguments to invoke a script!

Cargo finished with an error (1)

FilipAndersson245 · 2022-10-29T16:21:30Z

I ran nushell with nu -c 'source ...' and i belive it worked

rgwood · 2022-10-29T17:48:50Z

I don't think this is going to be worth the effort or the impact to compile times. But if someone wants to give it a try and report back, feel free.

FilipAndersson245 · 2022-10-31T16:18:53Z

I managed to run it locally on some new data, and the runtime performance improvements are quite low, it gives a small to medium improvement to binary size, but have a large impact on compile time, as it has to compile multiple release builds.

rgwood · 2023-03-01T06:53:13Z

Thanks for confirming that. Closing this as it sounds like the tradeoff is not worth it at this point in time.

zamazan4ik · 2023-06-14T21:13:51Z

I managed to run it locally on some new data, and the runtime performance improvements are quite low, it gives a small to medium improvement to binary size, but have a large impact on compile time, as it has to compile multiple release builds.

@FilipAndersson245 Could you share please how did you perform the training (profiling) and benchmarking phases? On which scenario, how long, etc. And on these scenarios - what exactly numbers did you get? At least just for history :)

Thanks in advance!

FilipAndersson245 · 2023-06-14T21:16:08Z

@zamazan4ik scary, I was just looking into PGO again in nushell, I will give some info tomorow, have to sleep now.

zamazan4ik · 2023-06-14T22:17:55Z

I performed some benchmarks. Since I have no understanding about Nushell internals I used what I found - benches. So I trained and evaluated PGO for Nushell on the available benches (cargo pgo bench cargo pgo optimize bench).

My setup is Apple Macbook Pro M1 14. I tested the following configurations:

Default Release build (size-optimized)
PGO for size-optimized release build
Release with opt-level = 3. Everything else is the same
(just for history) Instrumentation mode so you could estimate how instrumentation affects the benchmarks performance

Results are here: https://pastebin.com/KkuhyQ2F

At least according to these benchmarks - PGO helps a lot with performance. How does it influence Nu shell performance in general - I do not know. But my guess (or my hope) is if you have these benchmarks - they are important for you.

Regarding multi-stage builds... You could at least leave PGO scripts in the directory and describe them somewhere in the documentation PGO results. And then people/maintainers could choose on their own - do they want to use PGO in their use-case or not.

More about multiple other PGO applications (and other PGO ways beyond usual PGO like CSPGO (Context-Sensitive PGO) or BOLT you could find here: https://github.com/ZaMaZaN4iK/awesome-pgo

FilipAndersson245 · 2023-06-15T04:18:04Z

I was unable to find the specific script I tested on but I will add my take to do PGO
I think we need to use as real as possible data to ensure the optimization capture real performance improvements, it is also misleading to run PGO on the existing benchmarks as it would make the result biased towards the benchmark, skewing the results.

I just now did a quick test comparing the standard nu with nu-pgo where i only do a run of the binnary,

cargo pgo run --  -- "-c 'exit'"
cargo pgo optimize

Running just this gives a faster starup time

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./nu -c 'exit'`	17.6 ± 1.2	16.4	29.9	1.15 ± 0.12
`./nu-pgo -c 'exit'`	15.3 ± 1.1	14.1	24.7	1.00

comparing it with gradient_benchmark_no_check it is less impressive

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./nu -c 'source gradient_benchmark_no_check.nu; exit'`	462.1 ± 20.4	442.3	524.6	1.03 ± 0.06
`./nu-pgo -c 'source gradient_benchmark_no_check.nu; exit'`	447.6 ± 16.4	424.0	476.0	1.00

As pgo have the potential to even slow down the runtime if the data trained on don`t represent actually use cases.
I will try to compile down some script using community code to see if this can be improved, but the initial impression of just using this seems decent to good.

edit: Both those where also run using codegen-units=1 but this would probably not affect the relative difference between the two.

FilipAndersson245 added the enhancement New feature or request label Oct 29, 2022

rgwood closed this as not planned Won't fix, can't repro, duplicate, stale Mar 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PGO optimized binnary #6943

PGO optimized binnary #6943

FilipAndersson245 commented Oct 29, 2022 •

edited

Loading

fdncred commented Oct 29, 2022 •

edited

Loading

FilipAndersson245 commented Oct 29, 2022 •

edited

Loading

fdncred commented Oct 29, 2022

FilipAndersson245 commented Oct 29, 2022 •

edited

Loading

rgwood commented Oct 29, 2022

FilipAndersson245 commented Oct 31, 2022

rgwood commented Mar 1, 2023

zamazan4ik commented Jun 14, 2023 •

edited

Loading

FilipAndersson245 commented Jun 14, 2023

zamazan4ik commented Jun 14, 2023 •

edited

Loading

FilipAndersson245 commented Jun 15, 2023 •

edited

Loading

PGO optimized binnary #6943

PGO optimized binnary #6943

Comments

FilipAndersson245 commented Oct 29, 2022 • edited Loading

Related problem

Describe the solution you'd like

Describe alternatives you've considered

Additional context and details

fdncred commented Oct 29, 2022 • edited Loading

FilipAndersson245 commented Oct 29, 2022 • edited Loading

fdncred commented Oct 29, 2022

FilipAndersson245 commented Oct 29, 2022 • edited Loading

rgwood commented Oct 29, 2022

FilipAndersson245 commented Oct 31, 2022

rgwood commented Mar 1, 2023

zamazan4ik commented Jun 14, 2023 • edited Loading

FilipAndersson245 commented Jun 14, 2023

zamazan4ik commented Jun 14, 2023 • edited Loading

FilipAndersson245 commented Jun 15, 2023 • edited Loading

FilipAndersson245 commented Oct 29, 2022 •

edited

Loading

fdncred commented Oct 29, 2022 •

edited

Loading

FilipAndersson245 commented Oct 29, 2022 •

edited

Loading

FilipAndersson245 commented Oct 29, 2022 •

edited

Loading

zamazan4ik commented Jun 14, 2023 •

edited

Loading

zamazan4ik commented Jun 14, 2023 •

edited

Loading

FilipAndersson245 commented Jun 15, 2023 •

edited

Loading