Profile-Guided Optimization (PGO) on Memcached #1054

Open
zamazan4ik opened this issue Jul 2, 2023 · 2 comments

zamazan4ik commented Jul 2, 2023

Hi!

I tested Profile-Guided Optimization (PGO) on Memcached and want to share my results.

Test environment

  • Fedora 38
  • Linux kernel 6.3.7
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • Samsung 980 Pro SSD, 2 TiB
  • Clang 16 (from the Fedora repositories). I use Clang simply because I prefer LLVM-based tooling
  • Memcached version: latest master at the time of testing (commit efee763c93249358ea5b3b42c7fd4e57e2599c30)

Tested configurations

I have tested the following Memcached configurations (with corresponding CFLAGS and LDFLAGS):

  • Release: CC=clang CFLAGS="-O3" ./configure
  • Release with PGO: CC=clang CFLAGS="-O3 -fprofile-instr-use=memcached.profdata" ./configure

For PGO I use Clang's instrumentation-based options, -fprofile-instr-generate and -fprofile-instr-use. The workflow is: build an instrumented memcached, run memtier_benchmark against the instrumented memcached, collect the profile data, then rebuild memcached with the collected profile.
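
The cycle described above can be sketched as three shell functions. This is only a sketch under assumptions that are not in the original post: it runs from the memcached source tree after the configure script has been generated, llvm-profdata is on PATH, the instrumentation flag is also passed through LDFLAGS so the profile runtime gets linked, and memcached exits cleanly on SIGINT so the raw profile is flushed. The PROFILE_DIR location is hypothetical.

```shell
# Hypothetical locations for the raw and merged profiles.
PROFILE_DIR="${PROFILE_DIR:-$PWD/pgo-profiles}"
PROFDATA="${PROFDATA:-$PWD/memcached.profdata}"

# Step 1: build an instrumented memcached.
build_instrumented() {
    if [ -f Makefile ]; then make clean; fi
    CC=clang CFLAGS="-O3 -fprofile-instr-generate" \
        LDFLAGS="-fprofile-instr-generate" ./configure &&
        make -j"$(nproc)"
}

# Step 2: run the instrumented server under load, then merge the raw profile.
# "$@" is the load-generator command (e.g. the memtier_benchmark invocation).
collect_profile() {
    mkdir -p "$PROFILE_DIR" || return 1
    rm -f "$PROFILE_DIR"/*.profraw
    # %p expands to the process ID, so each server run gets its own file.
    LLVM_PROFILE_FILE="$PROFILE_DIR/memcached-%p.profraw" \
        taskset -c 0 ./memcached -p 21789 -t 1 &
    local server_pid=$!
    sleep 1                    # crude wait for the listener to come up
    "$@"
    kill -INT "$server_pid"    # assumption: clean exit flushes the .profraw
    wait "$server_pid" || true
    llvm-profdata merge -output="$PROFDATA" "$PROFILE_DIR"/*.profraw
}

# Step 3: rebuild memcached using the merged profile.
build_optimized() {
    if [ -f Makefile ]; then make clean; fi
    CC=clang CFLAGS="-O3 -fprofile-instr-use=$PROFDATA" ./configure &&
        make -j"$(nproc)"
}
```

Calling build_instrumented, then collect_profile followed by the benchmark command, then build_optimized reproduces the three phases in order. The functions only define the steps; nothing is built until they are called.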

Benchmark

I use memtier_benchmark for both the instrumentation and benchmarking phases, with this command:

    taskset -c 1-4 memtier_benchmark --ratio 0:1 -t 4 -c 30 -n 200000 --distinct-client-seed -d 256 --key-maximum 1000000 --hide-histogram --pipeline 30 -p 21789 -P memcache_text

memcached is started with:

    taskset -c 0 memcached -p 21789 -t 1

Results

Below are the benchmark results for each Memcached configuration. All configurations were benchmarked on the same machine, with the same Memcached settings, multiple times. The results are shown in memtier_benchmark output format. I rechecked: the results are consistent between runs.
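
Repeating a run and pulling the headline throughput out of each report can be scripted. The helpers below are a minimal sketch, not part of the original test setup: totals_ops and repeat_bench are hypothetical names, and the parsing assumes the "Totals" row layout shown in the tables in this post (Ops/sec in the second column).

```shell
# totals_ops: read memtier_benchmark output on stdin and print the Ops/sec
# value (second whitespace-separated column) of every "Totals" row.
totals_ops() {
    awk '$1 == "Totals" { print $2 }'
}

# repeat_bench N CMD...: run CMD N times, printing one Totals Ops/sec per run.
repeat_bench() {
    local runs=$1 i
    shift
    for i in $(seq 1 "$runs"); do
        # Progress output on stderr is discarded; only the report is parsed.
        "$@" 2>/dev/null | totals_ops
    done
}
```

For example, `repeat_bench 5` followed by the full memtier_benchmark command would print five throughput figures, one per line, which makes run-to-run consistency easy to eyeball.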

-O3
ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        25641.23          ---          ---         0.42276         0.41500         0.82300         0.88700      7384.01
Gets       256409.43       233.66    256175.78         0.42171         0.41500         0.82300         0.88700      6547.85
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     282050.66       233.66    256175.78         0.42180         0.41500         0.82300         0.88700     13931.86

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        26243.54          ---          ---         0.41591         0.41500         0.81500         0.83900      7557.46
Gets       262432.51       239.30    262193.21         0.41474         0.41500         0.81500         0.83900      6701.70
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     288676.05       239.30    262193.21         0.41485         0.41500         0.81500         0.83900     14259.16

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        26421.20          ---          ---         0.41497         0.41500         0.81500         0.86300      7608.63
Gets       264209.12       240.98    263968.14         0.41378         0.40700         0.80700         0.86300      6747.09
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     290630.32       240.98    263968.14         0.41389         0.40700         0.80700         0.86300     14355.71

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        26630.51          ---          ---         0.41111         0.40700         0.80700         0.83900      7668.90
Gets       266302.20       242.84    266059.36         0.41034         0.40700         0.80700         0.83900      6800.52
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     292932.71       242.84    266059.36         0.41041         0.40700         0.80700         0.83900     14469.43

-O3 PGO
ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        27202.89          ---          ---         0.40124         0.39900         0.79100         0.81500      7833.73
Gets       272025.88       248.30    271777.58         0.40043         0.39900         0.79100         0.81500      6946.76
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     299228.77       248.30    271777.58         0.40051         0.39900         0.79100         0.81500     14780.49

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        27445.08          ---          ---         0.39962         0.39900         0.78300         0.81500      7903.48
Gets       274447.81       250.38    274197.43         0.39888         0.39900         0.78300         0.81500      7008.57
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     301892.89       250.38    274197.43         0.39895         0.39900         0.78300         0.81500     14912.05

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        27204.11          ---          ---         0.40191         0.39900         0.78300         0.81500      7834.08
Gets       272038.14       247.41    271790.73         0.40070         0.39900         0.78300         0.81500      6946.82
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     299242.26       247.41    271790.73         0.40081         0.39900         0.78300         0.81500     14780.90

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        27439.44          ---          ---         0.40177         0.39900         0.78300         0.81500      7901.85
Gets       274391.37       251.01    274140.36         0.40058         0.39900         0.78300         0.81500      7007.32
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     301830.80       251.01    274140.36         0.40069         0.39900         0.78300         0.81500     14909.17

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        27415.53          ---          ---         0.40157         0.39900         0.79100         0.81500      7894.97
Gets       274152.28       250.20    273902.08         0.40053         0.39900         0.79100         0.81500      7001.05
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     301567.81       250.20    273902.08         0.40063         0.39900         0.79100         0.81500     14896.01
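
To put a single number on the difference, the Totals Ops/sec values from the runs above can be averaged per configuration and compared. The snippet below is a small sketch of that arithmetic; the helper names mean and gain_percent are made up for illustration, and the input values are copied from the tables.

```shell
# Totals Ops/sec copied from the tables above.
o3="282050.66 288676.05 290630.32 292932.71"
pgo="299228.77 301892.89 299242.26 301830.80 301567.81"

# mean: average of the numbers passed as arguments, one decimal place.
mean() {
    printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.1f\n", s / NR }'
}

# gain_percent BASE NEW: relative change of NEW over BASE, in percent.
gain_percent() {
    awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (n / b - 1) * 100 }'
}

# The lists are deliberately unquoted so each value becomes one argument.
o3_mean=$(mean $o3)
pgo_mean=$(mean $pgo)
echo "-O3 mean:     $o3_mean ops/sec"
echo "-O3 PGO mean: $pgo_mean ops/sec"
echo "PGO gain:     $(gain_percent "$o3_mean" "$pgo_mean")%"
```

With these inputs the means come out to roughly 288.6k and 300.8k ops/sec, a throughput gain of about 4% for the PGO build on this workload.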

I didn't test or profile other memtier_benchmark workloads, since I am not very familiar with the tool; the results may be better or worse with them. BOLT (llvm-bolt) might deliver even more performance, but I didn't test that either.

You can find more PGO results for other projects (e.g. Redis) here.

dormando (Member) commented Jul 2, 2023

Kinda nifty, thanks!

zamazan4ik (Author) commented

@dormando What do you think about adding information about PGO to the Memcached documentation? That way, users and maintainers could optimize Memcached for their own workloads.

Here are some examples from other projects:
