crypto/cipher: HTTPS throughput is poor #11929
Comments
/cc @agl
@jacobsa crypto/aes already uses AES-NI when possible for the AES block calculations (I believe that's the I can't locate any previous discussion of GCM-specific hardware acceleration in issues or on the mailing lists.
Thanks to the inspiration from @cespare I wrote an implementation based on this Intel whitepaper that's about twice as fast. We could probably do better by incorporating the techniques for parallelizing blocks from that paper, but it's a start. I'll clean it up so that it works on other architectures and older CPUs, then send a CL.
CL https://golang.org/cl/13020 mentions this issue.
How does this play with https://groups.google.com/forum/#!msg/golang-codereviews/m5QTnSUZU6c/Jc5yaMyF2_QJ ?
Note that we do have https://go-review.googlesource.com/#/c/10484/ pending for 1.6.
@agl: Awesome, this is where I was hoping to go eventually anyway. I only wish I had known about this before I spent a day on my CL. :-(
This essentially exercises GCS throughput. The current results are pretty crappy, probably in large part due to Go's poor AES-GCM performance: golang/go#11929
For #83.
This doubles the throughput of AES-GCM encryption and decryption, mostly due to being much faster to perform the GF(2^128) multiplication.
The implementation of the multiplication step is based on the gfmul function from this Intel whitepaper: https://software.intel.com/sites/default/files/managed/72/cc/clmul-wp-rev-2.02-2014-04-20.pdf
I suspect we could do better by using the techniques in that paper for the overall AES-GCM process, but this gives a nice win already.
For golang#11929.
Before:
BenchmarkAESGCMSeal1K-12 200000 8449 ns/op 121.19 MB/s
BenchmarkAESGCMOpen1K-12 200000 8461 ns/op 121.01 MB/s
After:
BenchmarkAESGCMSeal1K-12 300000 4028 ns/op 254.18 MB/s
BenchmarkAESGCMOpen1K-12 300000 4047 ns/op 252.97 MB/s
Change-Id: I819339be142e67d3482b832ff012afe036e96222
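For readers unfamiliar with the GF(2^128) multiplication this commit speeds up, here is a minimal portable sketch of the operation itself. It follows the bit-by-bit method described in NIST SP 800-38D and is not the carry-less-multiply code from the CL or the whitepaper; the names fieldElement and gcmMul are made up for illustration.

```go
package main

import "fmt"

// fieldElement is a 128-bit GCM block (hypothetical type for this sketch).
// hi holds the first 8 bytes in big-endian order, lo the last 8.
// Bit 0 of the block is the most significant bit of hi.
type fieldElement struct{ hi, lo uint64 }

// gcmMul multiplies x and y in GF(2^128) one bit at a time.
// It is a slow reference version, useful only to show what the
// hardware-accelerated code has to compute.
func gcmMul(x, y fieldElement) fieldElement {
	var z fieldElement
	v := y
	for i := uint(0); i < 128; i++ {
		// Select bit i of x, counting from the leftmost bit.
		var bit uint64
		if i < 64 {
			bit = (x.hi >> (63 - i)) & 1
		} else {
			bit = (x.lo >> (127 - i)) & 1
		}
		if bit == 1 {
			z.hi ^= v.hi
			z.lo ^= v.lo
		}
		// Shift v right by one bit. If a set bit falls off the end,
		// reduce by the GCM polynomial (0xe1 followed by 120 zero bits).
		carry := v.lo & 1
		v.lo = v.lo>>1 | v.hi<<63
		v.hi >>= 1
		if carry == 1 {
			v.hi ^= 0xe100000000000000
		}
	}
	return z
}

func main() {
	// In GCM's bit order the multiplicative identity has only bit 0 set.
	one := fieldElement{hi: 1 << 63}
	a := fieldElement{hi: 0x0123456789abcdef, lo: 0xfedcba9876543210}
	b := fieldElement{hi: 0x0f1e2d3c4b5a6978, lo: 0x8796a5b4c3d2e1f0}

	fmt.Println("a * 1 == a:", gcmMul(a, one) == a)
	fmt.Println("a * b == b * a:", gcmMul(a, b) == gcmMul(b, a))
}
```

The loop runs 128 dependent iterations per block, which gives a rough sense of why replacing it with a few carry-less multiply instructions helps so much.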
The flush performance is a bit shitty, probably because of golang/go#11929.
% go install -v && gcsfuse --temp-dir /mnt/ssd0 jacobsa-standard-asia ~/mp
% go build ./benchmarks/write_to_gcs && ./write_to_gcs --dir ~/mp
Wrote 1.00 GiB in 3.58963678s (285.27 MiB/s)
Flushed 1.00 GiB in 28.135228987s (36.40 MiB/s)
For the record, that CL was merged as eedaf9a. That appears to help a lot; not sure if it's good enough to call the issue fixed or not.
@jacobsa, can you update this bug with a status update of where we're at, numbers-wise?
For the record, here's what I did to set up a VM and the object:
Curl's throughput varies a decent amount, but this run showing 245 MiB/CPU-second seems representative:
Go 1.5.1 looks like this:
And at tip:
So looking pretty comparable with curl. Nice work.
I should have translated the output to be comparable with curl. Go 1.5.1 does 80 MiB/CPU-second, and tip does 248 MiB/CPU-second. So about as fast as curl now.
Can this be done for ARM and ARM64?
@Gaillard, search for (or create) an ARM-specific bug. This one is closed and we don't reuse issues.
will do thanks
Here is a program that downloads the first 1 GiB from a large object full of zeroes stored in Google Cloud Storage, and reports its throughput. It also dumps a CPU profile to /tmp/cpu.pprof.
Running on an n1-standard-4 Google Cloud Engine VM in us-central1-a on Sandy Bridge CPUs looks like this:
In other words, it takes 100% CPU to do 78 MiB/s; CPU appears to be the bottleneck here. Compare that to curl:
Here it appears we've maxed out the network bandwidth available, using 64% CPU. The throughput in terms of walltime is 3x as high. Throughput in terms of bytes per CPU second is about 5x as high.
The CPU profile for the Go program looks like this:
I know that it's possible to do some AES cipher work in hardware on modern Intel CPUs, and presumably that's what curl is doing via OpenSSL. Has this been considered for crypto/cipher in Go?
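The original program is not reproduced on this page. As a minimal sketch of the same measurement idea, the code below reads a fixed number of bytes from an io.Reader, reports wall-clock throughput, and writes a CPU profile to /tmp/cpu.pprof. The zeroReader type is a stand-in assumption so the example runs without network access; in the real test the reader would be the body of an HTTPS response from Google Cloud Storage. The names measure and run are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"runtime/pprof"
	"time"
)

// zeroReader is a stand-in for the HTTPS response body: it yields an
// endless stream of zero bytes. (Assumption for this sketch only.)
type zeroReader struct{}

func (zeroReader) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = 0
	}
	return len(p), nil
}

// measure reads up to limit bytes from r, discarding them, and returns
// the number of bytes read and the wall-clock time taken.
func measure(r io.Reader, limit int64) (int64, time.Duration, error) {
	start := time.Now()
	n, err := io.CopyN(io.Discard, r, limit)
	return n, time.Since(start), err
}

func run() error {
	// Dump a CPU profile covering the whole read.
	f, err := os.Create("/tmp/cpu.pprof")
	if err != nil {
		return err
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	defer pprof.StopCPUProfile()

	const limit = 1 << 30 // first 1 GiB
	n, elapsed, err := measure(zeroReader{}, limit)
	if err != nil {
		return err
	}
	mib := float64(n) / (1 << 20)
	fmt.Printf("Read %.2f MiB in %v (%.2f MiB/s)\n", mib, elapsed, mib/elapsed.Seconds())
	return nil
}

func main() {
	if err := run(); err != nil {
		log.Fatal(err)
	}
}
```

The resulting profile can be inspected with go tool pprof; with a TLS connection as the reader, the time spent under crypto/cipher would show up there.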