Increase default buffer sizes for storage uploads and downloads. #2657

Closed
coryan opened this issue May 14, 2019 · 4 comments · Fixed by #2945


coryan commented May 14, 2019

The sizes are too small for good performance: they are currently set to 256KiB, and I get better performance from GCE with 1MiB or larger. My benchmarks show that for my workstation at work something in the 32MiB range is better.

I think we should increase these values. As for the ideal number, there probably isn't one (see #726). Until then, we should pick something that performs about optimally for GCE against a regional bucket (in the same region as the GCE instance).

We should also write a benchmark similar to gsutil perfdiag, so we can compare the two.
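
For context, the buffer sizes are already configurable per client, so callers can experiment before any defaults change. A minimal sketch, assuming the ClientOptions::SetUploadBufferSize() and ClientOptions::SetDownloadBufferSize() setters (verify the exact names against the installed headers); the values are placeholders:

#include "google/cloud/storage/client.h"
#include <iostream>

namespace gcs = google::cloud::storage;

int main() {
  // Start from the default options (credentials, endpoint, etc.).
  auto options = gcs::ClientOptions::CreateDefaultClientOptions();
  if (!options) {
    std::cerr << "Cannot create default client options: " << options.status() << "\n";
    return 1;
  }
  // Override the 256KiB defaults discussed above; larger buffers mean fewer
  // round trips per resumable-upload chunk and fewer reads per download.
  options->SetUploadBufferSize(8 * 1024 * 1024);    // placeholder: 8MiB
  options->SetDownloadBufferSize(2 * 1024 * 1024);  // placeholder: 2MiB
  gcs::Client client(*options);
  // ... use `client` as usual ...
}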

coryan added the api: storage and type: feature request labels May 14, 2019

coryan commented May 27, 2019

I am starting to get data around this. It seems that the buffer size is not that important for downloads after about 1-2MiB, but it remains important for uploads even up to 4MiB:

[plot: throughput vs. buffer size for uploads and downloads]

The buffer size is very important for the CPU overhead for uploads, but not so much for downloads:

[plot: CPU overhead vs. buffer size for uploads and downloads]

Also the downloads are constrained by CPU, while the uploads are constrained by the roundtrip time to upload each chunk (recall that we are using resumable uploads):

[plot: throughput vs. CPU utilization for uploads and downloads]

A larger analysis is here:

throughput-vs-cpu-analysis.pdf

These are preliminary results, but I wanted to save them somewhere.


coryan commented Jun 19, 2019

@dopiera I think this would be a good thing for you to tackle next. It is obvious we need larger buffer sizes (see the graphs above), but there are a couple of questions to answer:

  • How much larger do they need to be?
  • What is the size at which you start getting diminishing returns? (this is related to the previous question).
  • Is that size the same for regional buckets in the same region as the GCE instance vs. buckets in a different region?

For "ideal throughput" (large objects > 32 MiB) all these questions can be answered (I think) by running the throughput_vs_cpu benchmark with the right parameters, capturing the results and then printing the pretty graphs.

For "ideal latency" (small objects < 4MiB, maybe up to 32MiB) we need a new benchmark.

I think I can upload the Python code to generate the graphs to this bug.
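
For the "ideal latency" side, a first cut could simply time InsertObject()/ReadObject() round trips for a handful of small object sizes. This is a rough sketch only, not the benchmark that eventually landed in the repo; the bucket argument, sizes, and sample handling are made up for illustration:

#include "google/cloud/storage/client.h"
#include <chrono>
#include <iostream>
#include <iterator>
#include <string>

namespace gcs = google::cloud::storage;

int main(int argc, char* argv[]) {
  if (argc != 2) {
    std::cerr << "Usage: latency_benchmark <bucket-name>\n";
    return 1;
  }
  std::string const bucket_name = argv[1];

  auto client = gcs::Client::CreateDefaultClient();
  if (!client) {
    std::cerr << "Cannot create client: " << client.status() << "\n";
    return 1;
  }

  // Small objects, where per-request latency dominates over raw bandwidth.
  for (auto kib : {256, 1024, 4096}) {
    auto const object_name = "latency-test-" + std::to_string(kib) + "k";
    std::string const payload(kib * 1024, 'x');

    auto start = std::chrono::steady_clock::now();
    auto metadata = client->InsertObject(bucket_name, object_name, payload);
    auto upload_time = std::chrono::steady_clock::now() - start;
    if (!metadata) continue;  // skip failed samples

    start = std::chrono::steady_clock::now();
    auto stream = client->ReadObject(bucket_name, object_name);
    std::string contents(std::istreambuf_iterator<char>{stream}, {});
    auto download_time = std::chrono::steady_clock::now() - start;

    using std::chrono::duration_cast;
    using std::chrono::milliseconds;
    std::cout << kib << "KiB"
              << " bytes=" << contents.size()
              << " upload_ms=" << duration_cast<milliseconds>(upload_time).count()
              << " download_ms=" << duration_cast<milliseconds>(download_time).count()
              << "\n";
    (void)client->DeleteObject(bucket_name, object_name);
  }
  return 0;
}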

coryan added this to High priority in Customer Issues Jul 17, 2019

dopiera commented Aug 7, 2019

TL;DR: the buffer size should be set to 8MiB for uploads and 1.5MiB for downloads.

Methodology:
I've modified the throughput_vs_cpu benchmark for these measurements.

The tests were run on an n1-standard-4 machine. The benchmark used 2 threads to make sure that there is enough CPU for every thread and that hyperthreading doesn't kick in (if there actually is any; cpuinfo doesn't suggest it).

I've run the test against a regional bucket (europe-west3) and a remote one (europe-west3 -> us-east1), collecting a couple of thousand samples for each.

I made the results available by pushing them to my repo: dopiera@9c1161c. There is also a helper script for plotting them. For every observation I include a command showing the data.

Observations:

  • CPU is not a bottleneck for either uploads or downloads.
    This shows that even for the largest chunk and file sizes, bandwidth is approximately the same whether MD5 and CRC are on or off:
$ for crc in True False ; do for md5 in True False ; do \
    echo crc=$crc md5=$md5; \
    python plot2.py \
        regional.csv \
        "op==\"DOWNLOAD\" and crc==$crc and md5==$md5  and file_sz > 256 and chunk_sz > 16" \
        chunk_sz,file_sz,bw ; \
    done ; done
crc=True md5=True
Samples=221 EX=353.15, SD=46.17
gnuplot> ^D
crc=True md5=False
Samples=239 EX=353.64, SD=42.39
gnuplot> ^D
crc=False md5=True
Samples=210 EX=355.64, SD=41.25
gnuplot> ^D
crc=False md5=False
Samples=264 EX=352.39, SD=45.10
gnuplot> ^D

The CPU usage actually never reaches 1.0 even with both CRC and MD5 on:

$ python plot2.py \
  regional.csv \
  "op==\"DOWNLOAD\" and crc==True and md5==True" \
  chunk_sz,file_sz,cpu_use

[plot: cpu_use]

A far-fetched conclusion is that we don't need to make downloading asynchronous to get better performance.

  • file size has an effect on the download bandwidth, but anything larger than 400MiB is mostly the same; it also doesn't make much sense to analyze chunk sizes larger than 16MiB for downloads (you'd have to run gnuplot and rotate the graph to actually see it):
$ python plot2.py \
    regional.csv \
    "op==\"DOWNLOAD\" and crc==True and md5==True" \
    chunk_sz,file_sz,bw

[plot: large_chunk_dl]

  • for file sizes larger than 400MiB, chunk sizes between 1.5MiB and 2MiB yield average bandwidth similar to chunk sizes larger than 16MiB; the average between 1MiB and 1.5MiB is significantly slower, so 1.5MiB is a good value; a look at the plot seems to confirm it:
$ python plot2.py \
    regional.csv \
    "op==\"DOWNLOAD\" and crc==True and md5==True and file_sz > 400 and chunk_sz > 1 and chunk_sz < 1.5" \
    chunk_sz,bw
Samples=39 EX=326.42, SD=41.47
gnuplot> ^D
$ python plot2.py \
    regional.csv \
    "op==\"DOWNLOAD\" and crc==True and md5==True and file_sz > 400 and chunk_sz > 1.5 and chunk_sz < 2" \
    chunk_sz,bw
Samples=37 EX=353.70, SD=44.13
gnuplot> ^D
$ python plot2.py \
    regional.csv \
    "op==\"DOWNLOAD\" and crc==True and md5==True and file_sz > 400 and chunk_sz > 16 "  \
    chunk_sz,bw
Samples=153 EX=358.02, SD=43.69
gnuplot> ^D
$ python plot2.py \
    regional.csv \
    "op==\"DOWNLOAD\" and crc==True and md5==True and file_sz > 400 and chunk_sz < 16" \
    chunk_sz,bw
Samples=463 EX=350.55, SD=44.80
gnuplot> 

[plot: upload_bw]

  • for remote locations, the file size matters more for bandwidth and the peak is at lower chunk sizes:
$ python plot2.py \
    remote.csv \
    "op==\"DOWNLOAD\" and crc==True and md5==True " \
    chunk_sz,file_sz,bw 

[plot: remote_dl_bw]

A closer look at large files and limited chunk sizes shows that 1.5MiB is a good choice for the remote case too:

$ python plot2.py \
    remote.csv \
    "op==\"DOWNLOAD\" and crc==True and md5==True and file_sz > 400 and chunk_sz < 16" \
    chunk_sz,bw

[plot: remote_dl_bw_zoom]

  • for uploads, CPU is not an issue either:
$ for crc in True False ; do for md5 in True False ; do \
    echo crc=$crc md5=$md5; \
    python plot2.py regional.csv "op==\"UPLOAD\" and crc==$crc and md5==$md5  and file_sz > 200" \
    chunk_sz,file_sz,bw; \
    done ; done
crc=True md5=True
Samples=1085 EX=48.31, SD=7.10
gnuplot> ^D
crc=True md5=False
Samples=1148 EX=48.34, SD=7.12
gnuplot> ^D
crc=False md5=True
Samples=1109 EX=48.17, SD=7.23
gnuplot> ^D
crc=False md5=False
Samples=1131 EX=48.32, SD=6.69
gnuplot> ^D
  • for uploads, file sizes don't play that big of a role, and sizes larger than 200MiB don't seem to play a role at all:
$ python plot2.py \
    regional.csv \
    "op==\"UPLOAD\" and crc==True and md5==True" \
    chunk_sz,file_sz,bw

[plot: uploads]

  • the average upload bandwidth for chunk sizes larger than 16MiB and file sizes larger than 200MiB is around 50MiB/s; this chunk and file size range is already the plateau:
$ python plot2.py \
    regional.csv \
    "op==\"UPLOAD\" and crc==True and md5==True and file_sz > 200 and chunk_sz >16"  \
    chunk_sz,file_sz,bw
Samples=261 EX=50.75, SD=3.12

[plot: upload_max]

  • the relationship between chunk size and bandwidth seems to have a local maximum every 2MiB, with 8MiB being the first peak to reach 50MiB/s (the maximum we can reach), so 8MiB is a good value:
$ python plot2.py \
    regional.csv \
    "op==\"UPLOAD\" and crc==True and md5==True and file_sz > 200 and chunk_sz < 16" \
    chunk_sz,bw

[plot: upload_chunk_vs_bw]

  • file sizes larger than 200MiB don't matter for uploads to a remote location either:
$ python plot2.py \
    remote.csv \
    "op==\"UPLOAD\" and crc==True and md5==True" \
    chunk_sz,file_sz,bw

[plot: uploads_remote]

  • for remote uploads, the benefits of increasing the buffer go all the way up to 32MiB, but 8MiB seems to achieve around 60% of the maximum possible, so 8MiB is still the way to go IMO (see the sketch below):
$ python plot2.py \
    remote.csv "op==\"UPLOAD\" and crc==True and md5==True and file_sz > 200 " \
    chunk_sz,bw

[plot: upload_chunk_vs_bw_remote]
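
If these conclusions get adopted as the library defaults, the change essentially reduces to two constants. A hypothetical sketch only; the names and the location of the real defaults in the library will likely differ:

#include <cstddef>

// Hypothetical constants illustrating the recommendation above; the actual
// defaults live wherever the library defines its ClientOptions defaults.
constexpr std::size_t kDefaultUploadBufferSize = 8 * 1024 * 1024;        // 8MiB
constexpr std::size_t kDefaultDownloadBufferSize = 3 * 1024 * 1024 / 2;  // 1.5MiB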


dopiera commented Aug 7, 2019

Opening the stream within a region takes 55ms and closing it takes 110ms (165ms total). From europe-west3 to us-east1 these numbers are 257ms and 392ms (649ms total), respectively.

Assuming 50MiB/s and 30MiB/s throughputs, these numbers mean that during that extra time 8.2MiB or 19.5MiB could be sent, respectively. This is why I think files up to at least 20MiB (instead of the current 5MiB) should be sent in a non-resumable fashion.
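
For reference, a caller can already raise this threshold without waiting for a new default. A minimal sketch, assuming ClientOptions exposes a set_maximum_simple_upload_size() setter and that Client::UploadFile() consults it (verify both against the installed version); the 20MiB value comes from the estimate above:

#include "google/cloud/storage/client.h"
#include <iostream>

namespace gcs = google::cloud::storage;

int main() {
  auto options = gcs::ClientOptions::CreateDefaultClientOptions();
  if (!options) {
    std::cerr << "Cannot create default client options: " << options.status() << "\n";
    return 1;
  }
  // The extra open/close round trips of a resumable upload cost roughly the
  // time needed to send 8-20MiB, so send anything up to ~20MiB in a single
  // non-resumable request.
  options->set_maximum_simple_upload_size(20 * 1024 * 1024);
  gcs::Client client(*options);
  // ... UploadFile() should now pick a simple upload for files below 20MiB ...
}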

dopiera added a commit to dopiera/google-cloud-cpp that referenced this issue Aug 7, 2019
This fixes googleapis#2657.

Experiments, which back these numbers are described in googleapis#2657.
dopiera added a commit to dopiera/google-cloud-cpp that referenced this issue Aug 7, 2019
Customer Issues automation moved this from High priority to Closed Aug 7, 2019
coryan pushed a commit that referenced this issue Aug 7, 2019
coryan pushed a commit that referenced this issue Aug 7, 2019