Profile-Guided Optimization (PGO) benchmark report #6

zamazan4ik · 2024-08-05T07:43:57Z

Hi!

I was interested in optimizing the library's performance even further. I have evaluated Profile-Guided Optimization (PGO) on many projects; all the results are available at https://github.com/zamazan4ik/awesome-pgo . Since this compiler optimization works well in many places, especially in parsers, I decided to apply it to this project. Here are my benchmark results.

Test environment

Fedora 40
Linux kernel 6.9.12
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler - Rustc 1.79
pulldown-latex version: main branch on commit 62cbc23a48ab3828dfd70e177227e7d3c03bb042
Turbo Boost disabled

Benchmark

For benchmarking, I use the project's built-in benchmarks. For PGO, I use the cargo-pgo tool. I got the Release results with the taskset -c 0 cargo bench command. The PGO training phase is done with taskset -c 0 cargo pgo bench, and the PGO optimization phase with taskset -c 0 cargo pgo optimize bench.

taskset -c 0 is used for reducing the OS scheduler's influence on the results. All measurements are done on the same machine, with the same background "noise" (as much as I can guarantee).
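
For anyone who wants to repeat the measurements, the three commands above can be put into a small script. This is only a sketch: it assumes cargo-pgo is already installed (cargo install cargo-pgo) together with the llvm-tools-preview rustup component, and the DRY_RUN variable and the run and pgo_workflow helpers are hypothetical names introduced here for illustration, not part of cargo-pgo.

```shell
#!/bin/sh
# Sketch of the three-step measurement workflow described above.
# Assumption: cargo-pgo and the llvm-tools-preview component are installed.
# DRY_RUN is a hypothetical helper variable: it defaults to 1, so the
# commands are only printed. Set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Pin every step to CPU core 0 to reduce the OS scheduler's influence.
PIN="taskset -c 0"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

pgo_workflow() {
    run $PIN cargo bench               # 1. baseline Release results
    run $PIN cargo pgo bench           # 2. instrumented run that collects profiles
    run $PIN cargo pgo optimize bench  # 3. PGO-optimized build, measured again
}

pgo_workflow
```

Run as-is, the script only prints the three commands in order. Running it with DRY_RUN=0 inside the crate's directory executes them, so the Release and PGO-optimized numbers come from the same pinned core.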

Results

I got the following results:

Release: https://gist.github.com/zamazan4ik/9180993e98d13046a4bcf798e554d0bc
PGO optimized compared to Release: https://gist.github.com/zamazan4ik/aacc0ec8cb158a6a078c4ea1d32327e8
(just for reference) PGO instrumented compared to Release: https://gist.github.com/zamazan4ik/56f459df333322e9180e827699c98eb5

According to the results, PGO measurably improves the library's performance in many cases.

Further steps

I can suggest the following action points:

Perform more PGO benchmarks with other datasets (if you are interested enough). If they show improvements, add a note to the documentation (the README file, I guess?) about possible improvements in the library's performance with PGO.
You can probably get some insights into how the code can be optimized further from the changes that the compiler makes with PGO. This can be done by comparing flamegraphs before and after applying PGO, or by checking the assembly/LLVM IR differences before and after PGO. However, this job can be boring and time-consuming, and the compiler already does all of this work automatically when PGO is used.

I would be happy to answer your questions about PGO.

P.S. I created the issue just because Discussions are disabled for the repository. It's just a benchmark report, not a bug or something like that.


carloskiki · 2024-08-06T01:34:38Z

Wow, thank you so much! I had never heard of PGO before, apart from the usual "always optimize based on profiling." I did not know there were tools that analyze the binary and optimize it "automatically". This is super nice! I will keep this open for now. I think it's a great thing, but as of now the crate is still too much in development to start optimizing it.

As an aside, I have opened the discussion tab for the repo :)
