Profile-Guided Optimization (PGO) on Memcached #1054

zamazan4ik · 2023-07-02T16:17:08Z

Hi!

I tested Profile-Guided Optimization (PGO) on Memcached and want to share my results.

Test environment

Fedora 38
Linux kernel 6.3.7
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Clang 16 (from the Fedora repositories). I use Clang just because I prefer LLVM-based tooling
Memcached version: the latest master branch at the time of testing (commit efee763c93249358ea5b3b42c7fd4e57e2599c30)

Tested configurations

I have tested the following Memcached configurations (with corresponding CFLAGS and LDFLAGS):

Release: CC=clang CFLAGS="-O3" ./configure
Release with PGO: CC=clang CFLAGS="-O3 -fprofile-instr-use=memcached.profdata" ./configure

For PGO, I use Clang's instrumentation-based options -fprofile-instr-generate and -fprofile-instr-use: build an instrumented memcached, run memtier_benchmark against it, collect the profile data, then rebuild memcached with the collected profile.

Benchmark

I run memtier_benchmark with taskset -c 1-4 memtier_benchmark --ratio 0:1 -t 4 -c 30 -n 200000 --distinct-client-seed -d 256 --key-maximum 1000000 --hide-histogram --pipeline 30 -p 21789 -P memcache_text for both the instrumentation and benchmarking phases. memcached is started with taskset -c 0 memcached -p 21789 -t 1.
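For anyone who wants to reproduce this, the instrument, train, rebuild cycle can be sketched roughly as the script below. It is not the exact script behind the numbers in this issue: the DRY_RUN switch, the function names, the memcached.profraw file name, and the one-second startup wait are assumptions made for illustration. It also assumes that memcached exits cleanly on SIGTERM, so that the profiling runtime can write the raw profile on exit.

```shell
#!/bin/sh
# Sketch of a Clang instrumentation-PGO cycle for memcached.
# Run from the root of a memcached checkout. By default the commands are
# only printed; set DRY_RUN=0 to actually execute them.
PORT=21789
PROFRAW="$PWD/memcached.profraw"    # assumed name for the raw profile
PROFDATA="$PWD/memcached.profdata"  # merged profile used by the final build

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Configure and build memcached with -O3 plus the extra flag given in $1.
build() {
    run env CC=clang CFLAGS="-O3 $1" ./configure &&
    run make clean &&
    run make
}

# Start the instrumented server, drive it with memtier_benchmark, then stop
# the server so the raw profile is flushed to $PROFRAW.
train() {
    if [ "${DRY_RUN:-1}" != 1 ]; then
        LLVM_PROFILE_FILE="$PROFRAW" taskset -c 0 ./memcached -p "$PORT" -t 1 &
        server_pid=$!
        sleep 1   # crude wait for the listener to come up
    fi
    run taskset -c 1-4 memtier_benchmark --ratio 0:1 -t 4 -c 30 -n 200000 \
        --distinct-client-seed -d 256 --key-maximum 1000000 --hide-histogram \
        --pipeline 30 -p "$PORT" -P memcache_text
    status=$?
    if [ "${DRY_RUN:-1}" != 1 ]; then
        kill -TERM "$server_pid"
        wait "$server_pid" || true
    fi
    return "$status"
}

pgo_cycle() {
    build "-fprofile-instr-generate" &&
    train &&
    run llvm-profdata merge -output="$PROFDATA" "$PROFRAW" &&
    build "-fprofile-instr-use=$PROFDATA"
}

pgo_cycle
```

Running it as is only lists the eight commands; DRY_RUN=0 sh pgo.sh (pgo.sh being whatever name the script is saved under) would execute them.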

Results

Here are the benchmark results for the different Memcached builds. All builds were benchmarked multiple times on the same machine with the same Memcached settings. The results are shown in memtier_benchmark's output format. I rechecked them, and they are consistent between runs.

-O3

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        25641.23          ---          ---         0.42276         0.41500         0.82300         0.88700      7384.01
Gets       256409.43       233.66    256175.78         0.42171         0.41500         0.82300         0.88700      6547.85
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     282050.66       233.66    256175.78         0.42180         0.41500         0.82300         0.88700     13931.86

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        26243.54          ---          ---         0.41591         0.41500         0.81500         0.83900      7557.46
Gets       262432.51       239.30    262193.21         0.41474         0.41500         0.81500         0.83900      6701.70
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     288676.05       239.30    262193.21         0.41485         0.41500         0.81500         0.83900     14259.16

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        26421.20          ---          ---         0.41497         0.41500         0.81500         0.86300      7608.63
Gets       264209.12       240.98    263968.14         0.41378         0.40700         0.80700         0.86300      6747.09
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     290630.32       240.98    263968.14         0.41389         0.40700         0.80700         0.86300     14355.71

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        26630.51          ---          ---         0.41111         0.40700         0.80700         0.83900      7668.90
Gets       266302.20       242.84    266059.36         0.41034         0.40700         0.80700         0.83900      6800.52
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     292932.71       242.84    266059.36         0.41041         0.40700         0.80700         0.83900     14469.43

-O3 PGO

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        27202.89          ---          ---         0.40124         0.39900         0.79100         0.81500      7833.73
Gets       272025.88       248.30    271777.58         0.40043         0.39900         0.79100         0.81500      6946.76
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     299228.77       248.30    271777.58         0.40051         0.39900         0.79100         0.81500     14780.49

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        27445.08          ---          ---         0.39962         0.39900         0.78300         0.81500      7903.48
Gets       274447.81       250.38    274197.43         0.39888         0.39900         0.78300         0.81500      7008.57
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     301892.89       250.38    274197.43         0.39895         0.39900         0.78300         0.81500     14912.05

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        27204.11          ---          ---         0.40191         0.39900         0.78300         0.81500      7834.08
Gets       272038.14       247.41    271790.73         0.40070         0.39900         0.78300         0.81500      6946.82
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     299242.26       247.41    271790.73         0.40081         0.39900         0.78300         0.81500     14780.90

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        27439.44          ---          ---         0.40177         0.39900         0.78300         0.81500      7901.85
Gets       274391.37       251.01    274140.36         0.40058         0.39900         0.78300         0.81500      7007.32
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     301830.80       251.01    274140.36         0.40069         0.39900         0.78300         0.81500     14909.17

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        27415.53          ---          ---         0.40157         0.39900         0.79100         0.81500      7894.97
Gets       274152.28       250.20    273902.08         0.40053         0.39900         0.79100         0.81500      7001.05
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     301567.81       250.20    273902.08         0.40063         0.39900         0.79100         0.81500     14896.01
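One way to put a single number on these runs is to average the Totals Ops/sec values of each build and compare the means. The sketch below does that with awk; the values are copied from the Totals rows above, while the base/pgo labels and the summarize function name are only illustrative.

```shell
#!/bin/sh
# Average the "Totals" Ops/sec of each build and report the relative gain.
summarize() {
    awk '
        $1 == "base" { b += $2; nb++ }   # -O3 runs
        $1 == "pgo"  { p += $2; np++ }   # -O3 PGO runs
        END {
            mb = b / nb
            mp = p / np
            printf "-O3 mean: %.0f ops/sec\n", mb
            printf "-O3 PGO mean: %.0f ops/sec\n", mp
            printf "PGO gain: %.1f%%\n", (mp / mb - 1) * 100
        }
    ' <<'EOF'
base 282050.66
base 288676.05
base 290630.32
base 292932.71
pgo 299228.77
pgo 301892.89
pgo 299242.26
pgo 301830.80
pgo 301567.81
EOF
}

summarize
```

On these numbers the means are about 288572 and 300753 ops/sec, a throughput gain of roughly 4.2% for the PGO build.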

I did not test (or profile) other memtier_benchmark workloads, since I am not very familiar with the tool. The results may be better or worse with other workloads. BOLT (llvm-bolt) might yield even more performance, but I have not tested it either.

You can find more PGO results for other projects (e.g. Redis) here.

dormando · 2023-07-02T16:39:57Z

Kinda nifty, thanks!

zamazan4ik · 2023-08-28T21:19:08Z

@dormando What do you think about adding information about PGO to the Memcached documentation? That way, users and maintainers could optimize Memcached for their own workloads.

Here are some examples from other projects:

ClickHouse: https://clickhouse.com/docs/en/operations/optimizing-performance/profile-guided-optimization
Databend: https://databend.rs/doc/contributing/pgo
Vector: https://vector.dev/docs/administration/tuning/pgo/
Nebula: https://docs.nebula-graph.io/3.5.0/8.service-tuning/enable_autofdo_for_nebulagraph/
GCC: Official docs, section "Building with profile feedback" (even AutoFDO build is supported)
Clang:
- https://llvm.org/docs/HowToBuildWithPGO.html
- https://llvm.org/docs/AdvancedBuilds.html
