I tested Profile-Guided Optimization (PGO) on Memcached and want to share my results.
Test environment
Fedora 38
Linux kernel 6.3.7
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Clang 16 (from the Fedora repositories). I use Clang simply because I prefer LLVM-based tooling.
Memcached version: the latest master branch at the time of testing (commit efee763c93249358ea5b3b42c7fd4e57e2599c30)
Tested configurations
I have tested the following Memcached configurations (with corresponding CFLAGS and LDFLAGS):
Release: CC=clang CFLAGS="-O3" ./configure
Release with PGO: CC=clang CFLAGS="-O3 -fprofile-instr-use=memcached.profdata" ./configure
As the PGO technique, I use Clang's -fprofile-instr-generate/-fprofile-instr-use options: build an instrumented memcached, run memtier_benchmark against the instrumented memcached, collect the profile data, then rebuild memcached with the collected data.
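The instrument, train and rebuild cycle described above can be sketched as a small script. This is a minimal sketch under assumptions, not the exact commands used for the reported runs: the LDFLAGS value for the instrumented build, the LLVM_PROFILE_FILE name memcached.profraw, the one-second startup delay, and stopping memcached with SIGTERM so that the profile is written on exit are all assumptions. The `run` and `pgo_steps` helpers are hypothetical names introduced for illustration. By default the script only prints each step.

```shell
#!/bin/sh
# Sketch of a Clang instrumentation-based PGO cycle for memcached.
# DRY_RUN=1 (default) only prints each step; set DRY_RUN=0 inside a
# memcached source checkout to actually execute the steps.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command line, or execute it via sh -c.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

pgo_steps() {
    # 1. Instrumented build (passing the flag in LDFLAGS too is an assumption).
    run 'CC=clang CFLAGS="-O3 -fprofile-instr-generate" LDFLAGS="-fprofile-instr-generate" ./configure && make'

    # 2. Training run: start the instrumented server, apply the workload,
    #    then stop the server so the raw profile is flushed on exit (assumed).
    run 'LLVM_PROFILE_FILE=memcached.profraw taskset -c 0 ./memcached -p 21789 -t 1 & pid=$!; sleep 1; taskset -c 1-4 memtier_benchmark --ratio 0:1 -t 4 -c 30 -n 200000 --distinct-client-seed -d 256 --key-maximum 1000000 --hide-histogram --pipeline 30 -p 21789 -P memcache_text; kill -TERM "$pid"; wait "$pid"'

    # 3. Convert the raw profile into the indexed format Clang consumes.
    run 'llvm-profdata merge -output=memcached.profdata memcached.profraw'

    # 4. Optimized rebuild that uses the collected profile.
    run 'make clean && CC=clang CFLAGS="-O3 -fprofile-instr-use=memcached.profdata" ./configure && make'
}

pgo_steps
```

Running the script as-is lists the four steps for review. The training workload in step 2 should match the production workload as closely as possible, since the optimizer specializes the binary for the code paths seen during training.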
Benchmark
For both the instrumentation and benchmarking phases, I run memtier_benchmark as taskset -c 1-4 memtier_benchmark --ratio 0:1 -t 4 -c 30 -n 200000 --distinct-client-seed -d 256 --key-maximum 1000000 --hide-histogram --pipeline 30 -p 21789 -P memcache_text. memcached is started with the command taskset -c 0 memcached -p 21789 -t 1.
Results
Here are the results of benchmarking the different Memcached configurations. All configurations were benchmarked on the same machine, with the same Memcached settings, multiple times. The results are shown in memtier_benchmark format. I have rechecked: the results are consistent between runs.
I didn't test (or profile) other memtier_benchmark workloads, since I am not very familiar with the tool; for some of them the results may be better or worse. BOLT (llvm-bolt) may help to achieve even more performance, but I didn't test it either.
More PGO results for other projects (e.g. Redis) can be found here.
@dormando What do you think about adding information about PGO to the Memcached documentation, so that users and maintainers can optimize Memcached for their own workloads?