PGO optimized binary #6943
Comments
How do you do a PGO build with cargo? I'd like to try it. EDIT: I see this now. Looks like a pain. I wonder if it would be worth it? |
https://github.com/Kobzol/cargo-pgo EDIT: After installing it, it seems you can do the following:
|
Ugh - I get this on Mac.
|
I ran Nushell with nu -c 'source ...' and I believe it worked. |
I don't think this is going to be worth the effort or the impact to compile times. But if someone wants to give it a try and report back, feel free. |
I managed to run it locally on some new data. The runtime performance improvements are quite low. It gives a small to medium improvement in binary size, but has a large impact on compile time, as it has to compile multiple release builds. |
Thanks for confirming that. Closing this as it sounds like the tradeoff is not worth it at this point in time. |
@FilipAndersson245 Could you please share how you performed the training (profiling) and benchmarking phases? On which scenarios, for how long, etc. And on these scenarios, what exact numbers did you get? At least just for history :) Thanks in advance! |
@zamazan4ik Scary, I was just looking into PGO again in Nushell. I will give some info tomorrow, have to sleep now. |
I performed some benchmarks. Since I have no understanding of Nushell internals, I used what I found: the benches. So I trained and evaluated PGO for Nushell on the available benches ( My setup is an Apple MacBook Pro M1 14. I tested the following configurations:
Results are here: https://pastebin.com/KkuhyQ2F At least according to these benchmarks, PGO helps a lot with performance. How it influences Nushell performance in general, I do not know. But my guess (or my hope) is that if you have these benchmarks, they are important to you. Regarding multi-stage builds: you could at least leave the PGO scripts in the repository and describe them, along with the PGO results, somewhere in the documentation. Then people/maintainers could choose on their own whether they want to use PGO in their use case. More about other PGO applications, and about approaches beyond the usual PGO such as CSPGO (Context-Sensitive PGO) or BOLT, can be found here: https://github.com/ZaMaZaN4iK/awesome-pgo |
I was unable to find the specific script I tested with, but I will add my take on doing PGO. I just now did a quick test comparing the standard
Running just this gives a faster startup time
Comparing it with gradient_benchmark_no_check, it is less impressive
PGO has the potential to even slow down the runtime if the training data doesn't represent actual use cases. Edit: Both of those were also run using |
Related problem
We should always strive to improve performance, as low latency and predictability improve the user experience.
Describe the solution you'd like
I would suggest we explore whether it's possible to integrate Profile-Guided Optimization (PGO).
PGO builds the binary twice. The first build is instrumented and is run on a set of example inputs to gather statistics on how its functions are called. This data is then used to optimize a second build, improving performance because the compiler can see how the program actually executes at runtime.
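The two-build cycle described above can be sketched with the PGO flags documented for rustc (`-Cprofile-generate` and `-Cprofile-use`). This is only a minimal sketch, not the procedure anyone in this thread actually used: the profile directory, the `training.nu` workload, and the `./target/release/nu` path are assumptions for illustration, and `llvm-profdata` is assumed to be on `PATH`. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of a manual PGO cycle using rustc's -Cprofile-generate / -Cprofile-use.
# PGO_DIR, WORKLOAD and the binary path are assumed placeholders, not project settings.
set -eu

PGO_DIR="${PGO_DIR:-/tmp/pgo-data}"   # where raw profiles are written (assumed path)
WORKLOAD="${WORKLOAD:-training.nu}"   # hypothetical script representing real usage
DRY_RUN="${DRY_RUN:-1}"               # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

pgo_cycle() {
    # 1. Build with instrumentation so the binary records execution counts.
    run env RUSTFLAGS="-Cprofile-generate=$PGO_DIR" cargo build --release
    # 2. Exercise the instrumented binary on a representative workload.
    run ./target/release/nu "$WORKLOAD"
    # 3. Merge the raw .profraw files into a single profile.
    run llvm-profdata merge -o "$PGO_DIR/merged.profdata" "$PGO_DIR"
    # 4. Rebuild, letting the compiler optimize using the merged profile.
    run env RUSTFLAGS="-Cprofile-use=$PGO_DIR/merged.profdata" cargo build --release
}

pgo_cycle
```

Setting `DRY_RUN=0` would run the steps for real. Steps 1 and 4 are both full release builds, which is where the extra compile time comes from, and the quality of step 2's workload determines whether the profile reflects real usage.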
Things we need to consider if this is to be implemented:
Describe alternatives you've considered
No response
Additional context and details
https://github.com/Kobzol/cargo-pgo