crypto: understand performance differences compared to BoringSSL #21525

Open
rsc opened this issue Aug 18, 2017 · 7 comments

@rsc

rsc commented Aug 18, 2017

I ran all the crypto benchmarks with standard Go crypto and with BoringCrypto. Results below.

In general there is about a 200ns overhead to calling into BoringCrypto via cgo for a particular call. So for example aes.BenchmarkEncrypt (testing encryption of a single 16-byte block) went from 13ns to 209ns, or +1500%. That we can't do much about except hope that bulk operations call into cgo once instead of once per 16 bytes.

But there are also some mysteries or things to consider fixing. I've put this in milestone Go 1.10 because some of them may be bugs in the Go distribution that we should at least understand. Once we know that the problems are all on the dev.boringcrypto side, we can switch the milestone to Unreleased.

crypto/aes

  • AESCFBEncrypt1K, AESCFBDecrypt1K, AESOFB1K are much slower because there is no bulk CFB operation (no cipher.cfbAble interface). Should there be? Probably.
  • AESCTR1K looks like it is not using the ctrAble implementation that boring.aesCipher is offering.
  • AESCBCEncrypt1K looks like it is not using the cbcEncAble implementation that boring.aesCipher is offering.
  • AESCBCDecrypt1K looks like it is not using the cbcDecAble implementation that boring.aesCipher is offering.
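The ctrAble / cbcEncAble / cbcDecAble mechanism mentioned in these bullets is an optional-interface check: the generic mode constructor type-asserts the block for a bulk method and falls back to block-at-a-time code if the assertion fails. Below is a minimal sketch of that pattern using stand-in types. None of these types are the real crypto/cipher or boring definitions, and the wrapper case is only one hypothetical way a fast path can be missed, not a diagnosis of this issue.

```go
package main

import "fmt"

// Block is a stand-in for cipher.Block, reduced to what the sketch needs.
type Block interface {
	BlockSize() int
}

// ctrAble is a stand-in for the optional bulk-CTR interface.
// The string return value just labels which path was taken.
type ctrAble interface {
	NewCTR(iv []byte) string
}

// plainBlock offers no bulk implementation.
type plainBlock struct{}

func (plainBlock) BlockSize() int { return 16 }

// bulkBlock additionally offers a bulk CTR implementation.
type bulkBlock struct{ plainBlock }

func (bulkBlock) NewCTR(iv []byte) string { return "bulk" }

// wrapped holds a Block in a named field, so the inner NewCTR method
// is NOT part of wrapped's method set.
type wrapped struct{ b Block }

func (w wrapped) BlockSize() int { return w.b.BlockSize() }

// newCTR picks the bulk path only if the dynamic type satisfies ctrAble.
func newCTR(b Block, iv []byte) string {
	if c, ok := b.(ctrAble); ok {
		return c.NewCTR(iv)
	}
	return "generic"
}

func main() {
	iv := make([]byte, 16)
	fmt.Println(newCTR(plainBlock{}, iv))         // generic
	fmt.Println(newCTR(bulkBlock{}, iv))          // bulk
	fmt.Println(newCTR(wrapped{bulkBlock{}}, iv)) // generic
}
```

When the generic path is taken, each 16-byte block becomes its own call into the underlying cipher, which is where a per-call cgo cost would add up.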

crypto/ecdsa

  • Why does SignP256 take an extra 9µs in BoringCrypto? That's too big to be cgo. Should the signature conversion be improved?
  • SignP384 drops from 5.54ms to 0.85ms, indicating that the Go implementation has 6X room for improvement.
  • Even after the 6X, I don't understand why P384 is so much slower than P256.
  • KeyGeneration is 10X slower in BoringCrypto than in Go. We should make sure the Go version is not missing something important.

crypto/hmac

  • How is it that HMACSHA256_1K takes the same amount of time as HMACSHA256_32 in BoringCrypto?
  • For that matter, how is it that, in BoringCrypto, HMACSHA256_1K takes 2µs but crypto/sha256's Hash1K takes 4µs?
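One way to poke at the HMAC question outside `go test` is a small standalone program built on `testing.Benchmark`. The sketch below is not the crypto/hmac benchmark itself; the 32-byte zero key, the message sizes, and the helper name `benchHMAC` are assumptions chosen to resemble the two cases above.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
	"testing"
)

// benchHMAC times HMAC-SHA256 over a message of the given size,
// reusing one hash.Hash across iterations via Reset.
func benchHMAC(size int) testing.BenchmarkResult {
	key := make([]byte, 32) // assumed key size for this sketch
	msg := make([]byte, size)
	return testing.Benchmark(func(b *testing.B) {
		h := hmac.New(sha256.New, key)
		b.SetBytes(int64(size))
		for i := 0; i < b.N; i++ {
			h.Write(msg)
			h.Sum(nil)
			h.Reset()
		}
	})
}

func main() {
	for _, size := range []int{32, 1024} {
		r := benchHMAC(size)
		fmt.Printf("HMAC-SHA256 %4d bytes: %d ns/op\n", size, r.NsPerOp())
	}
}
```

Building it once normally and once with BoringCrypto enabled would show whether the 32-byte and 1K cases really cost the same there; if they do, a fixed per-operation cost is probably dominating the hashing work.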

crypto/rsa

  • Why is RSA2048Sign 3X faster in BoringCrypto?
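The "6X", "10X", and "3X" figures in the ecdsa and rsa bullets are rounded. A small sketch that recomputes them from the time/op values in the benchmark results; the helper name `ratio` is illustrative only.

```go
package main

import "fmt"

// ratio returns how many times larger a is than b.
func ratio(a, b float64) float64 { return a / b }

func main() {
	// Figures copied from the benchmark results in this report.
	fmt.Printf("SignP384 Go/Boring:      %.1fx\n", ratio(5.54, 0.85))  // ms; prints 6.5x
	fmt.Printf("KeyGeneration Boring/Go: %.1fx\n", ratio(200.3, 21.9)) // µs; prints 9.1x
	fmt.Printf("RSA2048Sign Go/Boring:   %.1fx\n", ratio(2.90, 1.08))  // ms; prints 2.7x
}
```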

Benchmark results (also at https://perf.golang.org/search?q=upload:20170818.4):

name                              old time/op    new time/op    delta
pkg:crypto/aes goos:linux goarch:amd64
Encrypt-4                           13.3ns ± 2%   208.6ns ± 5%  +1473.15%  (p=0.008 n=5+5)
Decrypt-4                           13.2ns ± 1%   255.0ns ± 0%  +1828.90%  (p=0.016 n=5+4)
Expand-4                            75.8ns ± 0%    76.4ns ± 1%       ~     (p=0.056 n=5+5)
pkg:crypto/cipher goos:linux goarch:amd64
AESGCMSeal1K-4                       341ns ± 1%     503ns ± 0%    +47.71%  (p=0.008 n=5+5)
AESGCMOpen1K-4                       321ns ± 0%     496ns ± 1%    +54.68%  (p=0.008 n=5+5)
AESGCMSeal8K-4                      2.04µs ± 0%    2.21µs ± 1%     +8.27%  (p=0.008 n=5+5)
AESGCMOpen8K-4                      1.97µs ± 1%    2.18µs ± 0%    +10.84%  (p=0.008 n=5+5)
AESCFBEncrypt1K-4                   2.37µs ± 0%   14.48µs ± 0%   +512.17%  (p=0.008 n=5+5)
AESCFBDecrypt1K-4                   2.27µs ± 1%   14.48µs ± 1%   +538.94%  (p=0.008 n=5+5)
AESOFB1K-4                          1.46µs ± 1%   13.76µs ± 1%   +844.07%  (p=0.008 n=5+5)
AESCTR1K-4                          1.66µs ± 1%    8.99µs ± 0%   +442.57%  (p=0.008 n=5+5)
AESCBCEncrypt1K-4                   2.27µs ± 1%    8.13µs ± 0%   +257.59%  (p=0.008 n=5+5)
AESCBCDecrypt1K-4                   1.67µs ± 2%   11.65µs ± 2%   +598.98%  (p=0.008 n=5+5)
pkg:crypto/des goos:linux goarch:amd64
Encrypt-4                            162ns ± 3%     159ns ± 1%       ~     (p=0.222 n=5+5)
Decrypt-4                            157ns ± 1%     158ns ± 2%       ~     (p=0.722 n=5+5)
TDESEncrypt-4                        380ns ± 0%     381ns ± 1%       ~     (p=0.857 n=5+5)
TDESDecrypt-4                        386ns ± 0%     386ns ± 0%       ~     (p=1.000 n=5+5)
pkg:crypto/ecdsa goos:linux goarch:amd64
SignP256-4                          36.2µs ± 2%    45.3µs ± 1%    +24.91%  (p=0.008 n=5+5)
SignP384-4                          5.54ms ± 0%    0.85ms ± 1%    -84.63%  (p=0.008 n=5+5)
VerifyP256-4                         104µs ± 1%     102µs ± 0%     -1.29%  (p=0.016 n=5+4)
KeyGeneration-4                     21.9µs ± 1%   200.3µs ± 0%   +815.39%  (p=0.008 n=5+5)
pkg:crypto/elliptic goos:linux goarch:amd64
BaseMult-4                           979µs ± 3%     954µs ± 1%     -2.62%  (p=0.008 n=5+5)
BaseMultP256-4                      19.8µs ± 0%    19.7µs ± 1%       ~     (p=0.151 n=5+5)
ScalarMultP256-4                    77.3µs ± 0%    76.7µs ± 0%     -0.70%  (p=0.008 n=5+5)
pkg:crypto/hmac goos:linux goarch:amd64
HMACSHA256_1K-4                     4.46µs ± 0%    1.92µs ± 0%    -56.91%  (p=0.008 n=5+5)
HMACSHA256_32-4                     1.16µs ± 0%    1.94µs ± 2%    +66.17%  (p=0.008 n=5+5)
pkg:crypto/md5 goos:linux goarch:amd64
Hash8Bytes-4                         184ns ± 0%     183ns ± 0%     -0.54%  (p=0.029 n=4+4)
Hash1K-4                            2.01µs ± 0%    2.01µs ± 1%       ~     (p=0.087 n=5+5)
Hash8K-4                            14.8µs ± 1%    14.8µs ± 0%       ~     (p=0.651 n=5+5)
Hash8BytesUnaligned-4                184ns ± 0%     183ns ± 0%     -0.89%  (p=0.000 n=5+4)
Hash1KUnaligned-4                   2.01µs ± 0%    2.00µs ± 0%     -0.41%  (p=0.040 n=5+5)
Hash8KUnaligned-4                   14.9µs ± 1%    15.2µs ± 3%       ~     (p=0.690 n=5+5)
pkg:crypto/rand goos:linux goarch:amd64
Prime-4                              146ms ±57%     143ms ± 8%       ~     (p=0.548 n=5+5)
pkg:crypto/rc4 goos:linux goarch:amd64
RC4_128-4                            313ns ± 3%     287ns ± 2%     -8.19%  (p=0.008 n=5+5)
RC4_1K-4                            2.72µs ± 2%    2.70µs ± 1%       ~     (p=0.151 n=5+5)
RC4_8K-4                            21.8µs ± 2%    22.0µs ± 2%       ~     (p=0.056 n=5+5)
pkg:crypto/rsa goos:linux goarch:amd64
RSA2048Sign-4                       2.90ms ± 1%    1.08ms ± 1%    -62.64%  (p=0.008 n=5+5)
pkg:crypto/sha1 goos:linux goarch:amd64
Hash8Bytes-4                         215ns ± 0%     562ns ± 1%   +161.58%  (p=0.016 n=4+5)
Hash320Bytes-4                       821ns ± 1%    1096ns ± 0%    +33.45%  (p=0.008 n=5+5)
Hash1K-4                            1.64µs ± 0%    2.26µs ± 0%    +37.21%  (p=0.008 n=5+5)
Hash8K-4                            10.6µs ± 0%    12.5µs ± 0%    +18.04%  (p=0.008 n=5+5)
pkg:crypto/sha256 goos:linux goarch:amd64
Hash8Bytes-4                         310ns ± 1%     679ns ± 1%   +118.96%  (p=0.008 n=5+5)
Hash1K-4                            3.61µs ± 0%    3.99µs ± 0%    +10.50%  (p=0.008 n=5+5)
Hash8K-4                            26.8µs ± 3%    26.6µs ± 1%       ~     (p=0.548 n=5+5)
pkg:crypto/sha512 goos:linux goarch:amd64
Hash8Bytes-4                         419ns ± 1%     805ns ± 1%    +91.94%  (p=0.008 n=5+5)
Hash1K-4                            2.67µs ± 1%    3.13µs ± 1%    +17.25%  (p=0.008 n=5+5)
Hash8K-4                            18.0µs ± 0%    18.8µs ± 1%     +4.07%  (p=0.008 n=5+5)
pkg:crypto/tls goos:linux goarch:amd64
Throughput/MaxPacket/1MB-4          4.02ms ± 1%    3.48ms ± 1%    -13.48%  (p=0.008 n=5+5)
Throughput/MaxPacket/2MB-4          6.11ms ± 2%    5.66ms ± 1%     -7.46%  (p=0.008 n=5+5)
Throughput/MaxPacket/4MB-4          10.3ms ± 1%    10.0ms ± 1%     -3.65%  (p=0.008 n=5+5)
Throughput/MaxPacket/8MB-4          18.6ms ± 1%    18.5ms ± 0%       ~     (p=0.151 n=5+5)
Throughput/MaxPacket/16MB-4         35.1ms ± 1%    35.5ms ± 1%       ~     (p=0.222 n=5+5)
Throughput/MaxPacket/32MB-4         68.0ms ± 1%    69.9ms ± 2%     +2.67%  (p=0.008 n=5+5)
Throughput/MaxPacket/64MB-4          133ms ± 1%     137ms ± 0%     +2.90%  (p=0.008 n=5+5)
Throughput/DynamicPacket/1MB-4      4.11ms ± 1%    3.55ms ± 2%    -13.55%  (p=0.008 n=5+5)
Throughput/DynamicPacket/2MB-4      6.32ms ± 4%    5.70ms ± 2%     -9.80%  (p=0.008 n=5+5)
Throughput/DynamicPacket/4MB-4      10.5ms ± 1%    10.1ms ± 1%     -3.51%  (p=0.008 n=5+5)
Throughput/DynamicPacket/8MB-4      18.7ms ± 1%    18.6ms ± 0%       ~     (p=0.222 n=5+5)
Throughput/DynamicPacket/16MB-4     35.3ms ± 1%    35.7ms ± 1%     +1.18%  (p=0.032 n=5+5)
Throughput/DynamicPacket/32MB-4     67.9ms ± 0%    69.6ms ± 1%     +2.44%  (p=0.008 n=5+5)
Throughput/DynamicPacket/64MB-4      134ms ± 0%     137ms ± 1%     +2.21%  (p=0.016 n=4+5)
Latency/MaxPacket/200kbps-4          699ms ± 1%     697ms ± 0%       ~     (p=0.151 n=5+5)
Latency/MaxPacket/500kbps-4          286ms ± 0%     283ms ± 0%     -0.84%  (p=0.008 n=5+5)
Latency/MaxPacket/1000kbps-4         147ms ± 0%     145ms ± 0%     -1.62%  (p=0.008 n=5+5)
Latency/MaxPacket/2000kbps-4        77.9ms ± 1%    74.6ms ± 2%     -4.22%  (p=0.008 n=5+5)
Latency/MaxPacket/5000kbps-4        35.5ms ± 0%    33.2ms ± 4%     -6.46%  (p=0.008 n=5+5)
Latency/DynamicPacket/200kbps-4      139ms ± 3%     138ms ± 0%       ~     (p=0.151 n=5+5)
Latency/DynamicPacket/500kbps-4     60.7ms ± 1%    58.9ms ± 1%     -2.97%  (p=0.008 n=5+5)
Latency/DynamicPacket/1000kbps-4    34.0ms ± 2%    32.1ms ± 1%     -5.51%  (p=0.008 n=5+5)
Latency/DynamicPacket/2000kbps-4    20.0ms ± 1%    18.1ms ± 3%     -9.52%  (p=0.008 n=5+5)
Latency/DynamicPacket/5000kbps-4    10.2ms ± 4%    10.1ms ± 8%       ~     (p=1.000 n=5+5)
@rsc rsc added this to the Go1.10 milestone Aug 18, 2017
@valyala

valyala commented Aug 27, 2017

Cc'ing @agl

@aead

aead commented Sep 1, 2017

The other points may require a closer look - very nice to bring this up 👍

@vielmetti

This issue should have a "Performance" label.

@odeke-em

odeke-em commented Mar 5, 2018

/cc-ing @FiloSottile too

@daixiang0

@rsc is there any update on this? Where can I see the latest go-BoringSSL benchmark?

@or-shachar

@rsc - Looks like you created some kind of automated benchmarking setup.
I wonder what the results would be today.

Any chance you could share the scripts so we can repeat the test?

PS: I know that support for boringssl is not promised but there is a community that is willing to invest the time to make sure it's 🌷

Thanks!

@or-shachar

So I accidentally discovered that I can easily run benchmarks on the built-in packages by running go test -bench=. crypto/...

I've created a script:

# This is for Go 1.19+, which supports BoringCrypto by simply setting an environment variable.
go version
# Installs benchstat into the Go bin directory, which must be on PATH.
go install golang.org/x/perf/cmd/benchstat@latest

echo "=========================="
echo "BENCHMARK WITHOUT BORINGSSL"
echo "=========================="
# -run='^$' skips the unit tests so that only the benchmarks run.
go test -run='^$' -bench=. crypto/... -count 5 | tee old.txt

echo "=========================="
echo "BENCHMARK WITH BORINGSSL"
echo "=========================="
GOEXPERIMENT=boringcrypto go test -run='^$' -bench=. crypto/... -count 5 | tee new.txt

# Compare the two runs; old.txt is the baseline.
benchstat old.txt new.txt | tee benchmark_comparison.txt

If you can confirm that this script is actually what I need to run, I'm more than happy to share the results.
