
Continuous benchmarking #80

Open
bartvanerp opened this issue Feb 28, 2023 · 10 comments
Labels: enhancement (New feature or request), Performance (Improve code speed)

Comments

@bartvanerp (Member) commented Feb 28, 2023

In the future it would be good to have some kind of benchmarking system in our CI, so that we become aware of how changes in our code impact performance. An example of such a system is provided by FluxBench.jl and its corresponding website.

bartvanerp added the enhancement (New feature or request) and Performance (Improve code speed) labels on Feb 28, 2023
@bvdmitri (Member) commented Oct 5, 2023

Before we commence the actual benchmarking process, it's crucial to conduct preliminary research to determine what tasks we can and should perform ourselves and what components we can potentially leverage from other packages or libraries. Here are the key aspects we need to investigate:

  • Benchmarking Methodology:
    • Research the available benchmarking tools, such as BenchmarkTools.jl and PkgBenchmark.jl, and determine which one is most suitable for our needs. Are there others?
  • Benchmark Targets:
    • Define which specific aspects we need to benchmark. For example, the set of benchmarks will clearly differ between RxInfer and ExponentialFamily. Clearly outline the performance metrics or characteristics that need to be measured.
  • Reporting and Visualization:
    • Explore methods for reporting and visualizing benchmark results. Should we use graphical representations, tables, or a combination of both? What libraries can we use for that?
  • Results Storage:
    • Determine where to store the benchmark results to ensure easy access and future analysis.
  • Benchmark Execution:
    • Investigate the feasibility of executing benchmarks using our GitHub runner. Assess the complexity of the setup process and determine whether it is straightforward to configure.
  • As much research as possible is appreciated.

This task has been added to the milestone for tracking and prioritization.

bvdmitri added this to the "RxInfer update Nov 28th" milestone on Nov 14, 2023
@bvdmitri (Member) commented:

@bartvanerp
This task has been added to the milestone for tracking and prioritization.

@bartvanerp (Member, Author) commented:

Just did some very extensive research:

Benchmarking Methodology:
I think PkgBenchmark.jl is the best option for creating the benchmark suite. I played around with this for RxSDE.jl a bit and really liked it. This package, however, only measures execution speed, but I think that metric is a good one to start with. Other metrics, once relevant and implemented, would likely require custom tests anyway.
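For reference, here is a minimal sketch of what such a suite could look like, following the PkgBenchmark.jl convention of a `benchmark/benchmarks.jl` file that defines a top-level `SUITE`. The Cholesky targets and matrix sizes are purely illustrative, not an actual proposal for the suite:

```julia
# benchmark/benchmarks.jl -- minimal PkgBenchmark-style suite (illustrative targets)
using BenchmarkTools
using LinearAlgebra

const SUITE = BenchmarkGroup()

SUITE["cholesky"] = BenchmarkGroup()
for n in (10, 100)
    # Symmetric positive-definite test matrix of size n×n
    A = collect(Hermitian(rand(n, n))) + n * I
    SUITE["cholesky"]["dense $(n)x$(n)"] = @benchmarkable cholesky($A)
end
```

Running `PkgBenchmark.benchmarkpkg` on the package then executes this suite and returns the results, including the time and memory estimates that BenchmarkTools collects.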

Benchmark Targets:
Let's start off with execution speed as the performance metric. Later on we can extend it if we have other relevant metrics and appropriate tests. For me this is beyond the scope of this PR.

Reporting and Visualization:
PkgBenchmark.jl automatically generates a report (with differences) between two different commits. There also exists BenchmarkCI.jl to run this on GitHub, but I don't think this will give us reliable performance metrics. FluxBench.jl depends on both, but is likely very much tailored towards Flux.jl, so I am not sure whether this is desirable. For now I propose to just generate the report and include it manually in the PR, which will be required before the PR can be approved. Other reporting/visualization tools would be nice, but we will probably have to implement those ourselves.
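For reference, a minimal sketch of how such a comparison report could be generated locally and pasted into the PR; the package name and baseline ref here are placeholders:

```julia
using PkgBenchmark

# Compare the current state of the package against a baseline ref (here "main")
# and export a markdown report that can be copied into the PR description.
results = judge("FastCholesky", "main")
export_markdown("benchmark_report.md", results)
```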

Results Storage:
If we manually copy them into the PR, then they are saved there. Ideally we want something similar to Codecov, which simply runs on the PR and shows the difference report.

Benchmark Execution:
I think this post is a nice example of running the benchmarks through CI on a GitHub-hosted runner: https://labs.quansight.org/blog/2021/08/github-actions-benchmarks. It is just not very stable. Furthermore, it will burn through our GitHub minutes. We could hook up a Raspberry Pi (which is not fast, but perhaps that is actually a good thing, as we are targeting these devices) as a custom runner: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners.

@bvdmitri @albertpod let me know what you think.

@bartvanerp (Member, Author) commented:

Aside from the real performance benchmarks, we can also already start building a test suite for allocations using AllocCheck.jl, https://github.com/JuliaLang/AllocCheck.jl.
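For reference, a minimal sketch of what such an allocation test could look like; `scale_inplace!` is just a hypothetical example of an in-place function we expect to be allocation-free:

```julia
using AllocCheck, Test

# Hypothetical in-place kernel that we expect to be allocation-free.
function scale_inplace!(y::Vector{Float64}, a::Float64)
    @inbounds for i in eachindex(y)
        y[i] *= a
    end
    return y
end

@testset "allocation-free kernels" begin
    # check_allocs statically analyzes the compiled code for possible allocation
    # sites given these argument types; an empty result means none were found.
    @test isempty(check_allocs(scale_inplace!, (Vector{Float64}, Float64)))
end
```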

@bartvanerp (Member, Author) commented:

Today we discussed the issue together with @albertpod and @bvdmitri. We agree on the following plan:

All of our packages will need to be extended with a benchmark suite containing performance and memory (allocation) benchmarks. Alternative metrics can be added later once we have developed suitable methods for testing them. @bartvanerp will make a start with this for the FastCholesky.jl package to experiment with it.

Starting in January we will extend the benchmark suites to our other packages and will divide tasks.

For now we will ask everyone to run the benchmarks locally when filing a PR. The benchmarking diff/results will need to be uploaded with the PR. Future work will be to automate this using a custom GitHub runner (our Raspberry Pi), and to visualize results online.

@bartvanerp (Member, Author) commented:

Made a start with the benchmarks for FastCholesky.jl at ReactiveBayes/FastCholesky.jl#8.

There is one point which I need to adjust in my above statements: let's skip the extra allocation benchmarks, as these are automatically included in PkgBenchmark.jl.

@bartvanerp (Member, Author) commented Nov 28, 2023

Coming back to the memory benchmarking: I think it will still be good to create tests for in-place functions, which we assume to be non-allocating, to check whether they are still non-allocating. Kind of like a test which checks `@allocated foo() == 0`. The AllocCheck package currently does not support this, but the TestNoAllocations package does. Nonetheless, AllocCheck has some PRs which will include this behaviour and supersede TestNoAllocations: JuliaLang/AllocCheck.jl#59, JuliaLang/AllocCheck.jl#55
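For reference, a minimal sketch of such a runtime check using only the Test standard library; `add_inplace!` is a hypothetical non-allocating function:

```julia
using Test

# Hypothetical in-place function that we assume to be non-allocating.
add_inplace!(y, x) = (y .+= x; y)

@testset "in-place functions do not allocate" begin
    y, x = zeros(100), ones(100)
    add_inplace!(y, x)                        # warm up so compilation is not counted
    @test (@allocated add_inplace!(y, x)) == 0
end
```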

@bvdmitri (Member) commented:

I used AllocCheck here. The limitation is that it checks allocations statically, which limits its application to very small, type-stable functions. Still useful, though.
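For illustration, a sketch of the `@check_allocs` form from the same package, which works best on exactly that kind of small, type-stable kernel; the `axpy_inplace!` function is hypothetical:

```julia
using AllocCheck

# @check_allocs wraps the method so that calling it throws an AllocCheckFailure
# whenever the static analysis finds a possible allocation for the given argument
# types. Because the check is static, type-unstable code tends to be flagged even
# if a particular call would never allocate at runtime.
@check_allocs function axpy_inplace!(y::Vector{Float64}, a::Float64, x::Vector{Float64})
    @inbounds for i in eachindex(y)
        y[i] += a * x[i]
    end
    return y
end

axpy_inplace!(zeros(3), 2.0, ones(3))  # errors if any allocation is possible
```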

albertpod assigned bartvanerp and unassigned bartvanerp on Jan 11, 2024
wouterwln modified the milestones: "RxInfer update Nov 28th" → "RxInfer 3.0.0 release" on Mar 15, 2024
@wouterwln (Member) commented:

Let's make sure we have benchmark suite boilerplate set up before the 3.0.0 release, so that we can track performance from 3.0.0 onwards.

@wouterwln (Member) commented:

I'm moving this to 3.1.0 now, but I suggest we use https://github.com/MilesCranmer/AirspeedVelocity.jl for this. It works with the existing ecosystem of BenchmarkTools and PkgBenchmark. Let's investigate whether we can benchmark RMP and GraphPPL behaviour with this as well.

wouterwln modified the milestones: "RxInfer 3.0.0 release" → "RxInfer 3.1.0 release" on Apr 12, 2024