Increase default buffer sizes for storage uploads and downloads. #2657
I am starting to get data around this. It seems that the buffer size is not that important for downloads beyond about 1-2MiB, but it remains important for uploads even up to 4MiB. The buffer size is very important for the CPU overhead of uploads, but not so much for downloads. Also, the downloads are constrained by CPU, while the uploads are constrained by the roundtrip time to upload each chunk (recall that we are using resumable uploads). A larger analysis is here: throughput-vs-cpu-analysis.pdf. These are preliminary results, but I wanted to save them somewhere.
@dopiera I think this would be a good thing for you to tackle next. It is obvious we need larger buffer sizes (see the graphs above), but there are a couple of questions to answer:
For "ideal throughput" (large objects > 32 MiB) all these questions can be answered (I think) by running the throughput_vs_cpu benchmark with the right parameters, capturing the results and then printing the pretty graphs. For "ideal latency" (small objects < 4MiB, maybe up to 32MiB) we need a new benchmark. I think I can upload the Python code to generate the graphs in to this bug. |
TL;DR: the buffer size should be set to 8MiB for uploads and 1.5MiB for downloads. Methodology:
The tests were run in two configurations: a regional one (results in regional.csv) and a remote one (results in remote.csv). I made the results available by pushing them to my repo: dopiera@9c1161c. There is also a helper script for plotting them. For every observation, I'm leaving a command showing the data. Observations:
$ for crc in True False ; do for md5 in True False ; do \
echo crc=$crc md5=$md5; \
python plot2.py \
regional.csv \
"op==\"DOWNLOAD\" and crc==$crc and md5==$md5 and file_sz > 256 and chunk_sz > 16" \
chunk_sz,file_sz,bw ; \
done ; done
crc=True md5=True
Samples=221 EX=353.15, SD=46.17
gnuplot> ^D
crc=True md5=False
Samples=239 EX=353.64, SD=42.39
gnuplot> ^D
crc=False md5=True
Samples=210 EX=355.64, SD=41.25
gnuplot> ^D
crc=False md5=False
Samples=264 EX=352.39, SD=45.10
gnuplot> ^D

The CPU use actually never reaches 1.0 with both CRC and MD5 on:

$ python plot2.py \
regional.csv \
"op==\"DOWNLOAD\" and crc==True and md5==True" \
chunk_sz,file_sz,cpu_use

A far-fetched conclusion is that we don't need to make downloading asynchronous to get better performance.
$ python plot2.py \
regional.csv \
"op==\"DOWNLOAD\" and crc==True and md5==True" \
chunk_sz,file_sz,bw
$ python plot2.py \
regional.csv \
op==\"DOWNLOAD\" and crc==True and md5==True and file_sz > 400 and chunk_sz > 1 and chunk_sz < 1.5" \
chunk_sz,bw
Samples=39 EX=326.42, SD=41.47
gnuplot> ^D
$ python plot2.py \
regional.csv \
"op==\"DOWNLOAD\" and crc==True and md5==True and file_sz > 400 and chunk_sz > 1.5 and
chunk_sz < 2" \
chunk_sz,bw
Samples=37 EX=353.70, SD=44.13
gnuplot> ^D
$ python plot2.py \
regional.csv \
"op==\"DOWNLOAD\" and crc==True and md5==True and file_sz > 400 and chunk_sz > 16 " \
chunk_sz,bw
Samples=153 EX=358.02, SD=43.69
gnuplot> ^D
$ python plot2.py \
regional.csv \
"op==\"DOWNLOAD\" and crc==True and md5==True and file_sz > 400 and chunk_sz < 16" \
chunk_sz,bw
Samples=463 EX=350.55, SD=44.80
gnuplot>
$ python plot2.py \
remote.csv \
"op==\"DOWNLOAD\" and crc==True and md5==True " \
chunk_sz,file_sz,bw

A closer look at large files and limited chunk sizes shows that 1.5MiB is a good choice for the remote case too:

$ python plot2.py \
remote.csv \
"op==\"DOWNLOAD\" and crc==True and md5==True and file_sz > 400 and chunk_sz < 16" \
chunk_sz,bw
$ for crc in True False ; do for md5 in True False ; do \
echo crc=$crc md5=$md5; \
python plot2.py regional.csv "op==\"UPLOAD\" and crc==$crc and md5==$md5 and file_sz > 200" \
chunk_sz,file_sz,bw; \
done ; done
crc=True md5=True
Samples=1085 EX=48.31, SD=7.10
gnuplot> ^D
crc=True md5=False
Samples=1148 EX=48.34, SD=7.12
gnuplot> ^D
crc=False md5=True
Samples=1109 EX=48.17, SD=7.23
gnuplot> ^D
crc=False md5=False
Samples=1131 EX=48.32, SD=6.69
gnuplot> ^D
$ python plot2.py \
regional.csv \
"op==\"UPLOAD\" and crc==True and md5==True" \
chunk_sz,file_sz,bw
$ python plot2.py \
regional.csv \
"op==\"UPLOAD\" and crc==True and md5==True and file_sz > 200 and chunk_sz >16" \
chunk_sz,file_sz,bw
Samples=261 EX=50.75, SD=3.12
$ python plot2.py \
regional.csv \
"op==\"UPLOAD\" and crc==True and md5==True and file_sz > 200 and chunk_sz <16" \
chunk_sz,bw
$ python plot2.py \
remote.csv \
"op==\"UPLOAD\" and crc==True and md5==True" \
chunk_sz,file_sz,bw
$ python plot2.py \
remote.csv "op==\"UPLOAD\" and crc==True and md5==True and file_sz > 200 " \
chunk_sz,bw
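Based on these numbers, here is a minimal sketch of how a caller could apply the recommended sizes today; it assumes the installed version of google-cloud-cpp exposes the ClientOptions::SetUploadBufferSize() and SetDownloadBufferSize() setters (otherwise the values would have to change in the library defaults instead):

```cpp
// Minimal sketch (not the library defaults): configure a storage client with
// the buffer sizes recommended above. SetUploadBufferSize() and
// SetDownloadBufferSize() are assumed to exist in the installed version.
#include "google/cloud/storage/client.h"

#include <iostream>
#include <utility>

namespace gcs = google::cloud::storage;

int main() {
  auto options = gcs::ClientOptions::CreateDefaultClientOptions();
  if (!options) {
    std::cerr << "Cannot create default client options: " << options.status()
              << "\n";
    return 1;
  }
  options->SetUploadBufferSize(8 * 1024 * 1024);          // 8 MiB
  options->SetDownloadBufferSize((3 * 1024 * 1024) / 2);  // 1.5 MiB
  gcs::Client client(*std::move(options));
  // ... use `client` for uploads and downloads as usual ...
  return 0;
}
```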
Opening the stream within a region takes 55ms and closing it takes 110ms (165ms total); against a remote bucket the overhead is larger. Assuming 50MiB/s and 30MiB/s throughputs, these numbers mean that during that extra time 8.2MiB or 19.5MiB could be sent, respectively (the 19.5MiB figure implies roughly 650ms of overhead in the remote case). This is the reason why I think files up to at least 20MiB (instead of the current 5MiB) should be sent in a non-resumable fashion.
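As a sketch of what that could look like from the caller's side, assuming ClientOptions::SetMaximumSimpleUploadSize() is available in the installed version and that Client::UploadFile() consults it when choosing between a simple and a resumable upload:

```cpp
// Hedged sketch: raise the threshold below which uploads are sent as a single
// request instead of a resumable upload, avoiding the extra round trips needed
// to open and finalize the resumable session. 20 MiB follows the reasoning
// above; it is not the library's current default.
#include "google/cloud/storage/client.h"

#include <iostream>
#include <utility>

namespace gcs = google::cloud::storage;

int main() {
  auto options = gcs::ClientOptions::CreateDefaultClientOptions();
  if (!options) {
    std::cerr << "Cannot create default client options: " << options.status()
              << "\n";
    return 1;
  }
  options->SetMaximumSimpleUploadSize(20 * 1024 * 1024);  // 20 MiB
  gcs::Client client(*std::move(options));
  // Client::UploadFile() should now use a single-request upload for files at
  // or below the threshold, and a resumable upload above it.
  return 0;
}
```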
This fixes googleapis#2657. The experiments which back these numbers are described in googleapis#2657.
This is an effect of experiments in googleapis#2657.
This is an effect of experiments in #2657.
The sizes are too small for good performance: they are currently set to 256KiB, and I get better performance from GCE with 1MiB or bigger. My benchmarks show that for my workstation at work something in the 32MiB range is better.
I think we should increase these values. As to what the ideal number is, there probably isn't one (see #726). But until then, we should pick something that performs close to optimally for GCE against a regional bucket (in the same region as the GCE instance).
We should also write a benchmark similar to gsutil perfdiag, so we can compare the two.
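To make that concrete, here is a rough sketch of the kind of measurement such a benchmark could take: timing a single download through the client library so the number can be compared with gsutil perfdiag. The bucket and object names are placeholders.

```cpp
// Rough sketch of a perfdiag-style measurement: time one download and report
// the effective bandwidth. Bucket and object names below are placeholders.
#include "google/cloud/storage/client.h"

#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

namespace gcs = google::cloud::storage;

int main() {
  auto options = gcs::ClientOptions::CreateDefaultClientOptions();
  if (!options) {
    std::cerr << "Cannot create default client options: " << options.status()
              << "\n";
    return 1;
  }
  gcs::Client client(*std::move(options));

  std::string const bucket_name = "my-benchmark-bucket";  // placeholder
  std::string const object_name = "my-benchmark-object";  // placeholder

  auto const start = std::chrono::steady_clock::now();
  auto reader = client.ReadObject(bucket_name, object_name);
  std::vector<char> buffer(1024 * 1024);  // drain the stream in 1 MiB chunks
  std::int64_t total_bytes = 0;
  do {
    reader.read(buffer.data(), buffer.size());
    total_bytes += reader.gcount();
  } while (reader);
  auto const elapsed = std::chrono::steady_clock::now() - start;

  if (!reader.status().ok()) {
    std::cerr << "Download failed: " << reader.status() << "\n";
    return 1;
  }
  auto const seconds =
      std::chrono::duration_cast<std::chrono::duration<double>>(elapsed)
          .count();
  std::cout << "Downloaded " << total_bytes << " bytes in " << seconds
            << "s (" << (total_bytes / (1024.0 * 1024.0)) / seconds
            << " MiB/s)\n";
  return 0;
}
```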