Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/aes: ghash is 4X too slow on amd64 #21501

Closed
rsc opened this issue Aug 17, 2017 · 3 comments
Closed

crypto/aes: ghash is 4X too slow on amd64 #21501

rsc opened this issue Aug 17, 2017 · 3 comments
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Aug 17, 2017

Diff for crypto/aes adding a new benchmark BenchmarkAESGCMSign8K, which is GMAC-like:

diff --git a/src/crypto/cipher/benchmark_test.go b/src/crypto/cipher/benchmark_test.go
index 93c40d0f46..83fede242f 100644
--- a/src/crypto/cipher/benchmark_test.go
+++ b/src/crypto/cipher/benchmark_test.go
@@ -26,6 +26,22 @@ func benchmarkAESGCMSeal(b *testing.B, buf []byte) {
 	}
 }
 
+func benchmarkAESGCMSign(b *testing.B, buf []byte) {
+	b.SetBytes(int64(len(buf)))
+
+	var key [16]byte
+	var nonce [12]byte
+	aes, _ := aes.NewCipher(key[:])
+	aesgcm, _ := cipher.NewGCM(aes)
+	var out []byte
+
+	b.ResetTimer()
+	for i := 0; i < b.N; i++ {
+		out = append(out[:0], buf...)
+		out = aesgcm.Seal(out, nonce[:], nil, buf)
+	}
+}
+
 func benchmarkAESGCMOpen(b *testing.B, buf []byte) {
 	b.SetBytes(int64(len(buf)))
 
@@ -58,6 +74,10 @@ func BenchmarkAESGCMSeal8K(b *testing.B) {
 	benchmarkAESGCMSeal(b, make([]byte, 8*1024))
 }
 
+func BenchmarkAESGCMSign8K(b *testing.B) {
+	benchmarkAESGCMSign(b, make([]byte, 8*1024))
+}
+
 func BenchmarkAESGCMOpen8K(b *testing.B) {
 	benchmarkAESGCMOpen(b, make([]byte, 8*1024))
 }

On my Core i5 Skylake laptop using amd64:

With the standard Go crypto routines, Seal8K runs at 4.0 GB/s, while Sign8K runs at 1.3 GB/s.

With BoringCrypto, Seal8K runs at 3.7 GB/s, while Sign8K runs at 6.0 GB/s. The 4.0 vs 3.7 for Seal8K could be cgo overhead (or not), but the 1.3 vs 6.0 for Sign8K is clearly more than that. Profiling shows that the Go code spends its time in gcmAesData processing 16 bytes at a time, while the BoringCrypto code spends its time in gcm_ghash_avx processing 256 bytes at a time, apparently at a 4X speed win.

Go should take the 4X speed win too.

/cc @agl @vkrasnov

@rsc rsc added this to the Go1.10 milestone Aug 17, 2017
@vkrasnov
Copy link
Contributor

Optimizing GHASH is a trivial change to the code, I just didn't know there was a use case for GHASH without GCM. It really was optimized for TLS with 13byte additional authenticated data.
I can easily write a fix if there is demand.

@rsc
Copy link
Contributor Author

rsc commented Aug 18, 2017

I think there is demand. This came up in an application using AES-GMAC which was surprised it was slower than AES-GCM.

@vkrasnov
Copy link
Contributor

@agl agl closed this as completed Aug 18, 2017
@golang golang locked and limited conversation to collaborators Aug 18, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants