crypto/aes: add dedicated asm version of AES, AES-GCM for arm64 #18498

Closed
matt2909 opened this issue Jan 3, 2017 · 14 comments

Comments

@matt2909
Contributor

matt2909 commented Jan 3, 2017

Add a dedicated asm version of AES, AES-GCM for arm64 - utilizing ARMv8-A crypto extension when available.

It should be noted that an asm accelerated version of this algorithm, utilizing AES-NI when available, exists for amd64.
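For context, callers do not choose the assembly path themselves: they go through the ordinary `crypto/aes` and `crypto/cipher` API, and the standard library selects an accelerated implementation where one exists. A minimal sketch of that caller-side usage follows; the `roundTrip` helper is a hypothetical name introduced only for illustration.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// roundTrip is a hypothetical helper: it seals plaintext with AES-GCM under
// key and immediately opens the result again, returning the recovered bytes.
func roundTrip(key, plaintext, aad []byte) ([]byte, error) {
	blk, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(blk)
	if err != nil {
		return nil, err
	}
	// A nonce must never repeat under the same key; a random one is used here.
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	sealed := aead.Seal(nil, nonce, plaintext, aad)
	return aead.Open(nil, nonce, sealed, aad)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	out, err := roundTrip(key, []byte("hello arm64"), []byte("header"))
	fmt.Println(string(out), err)
}
```

The same program runs unchanged on amd64 and arm64; only the speed of `Seal` and `Open` depends on whether a dedicated implementation is available.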

@matt2909
Contributor Author

matt2909 commented Jan 3, 2017

A partial implementation seems to have been developed under changelist:
https://go-review.googlesource.com/#/c/32579/

@yonderblue

Any chance to be targeted for 1.9?

@bradfitz bradfitz modified the milestones: Go1.9, Unplanned Feb 23, 2017
@bradfitz
Contributor

@cherrymui, is this something you could review?

@cherrymui
Member

I am not familiar with the algorithm. I may be able to review in terms of whether the assembly version and the Go version are equivalent. Probably not so fast though.

@vielmetti

There's related work here: https://github.com/minio/sha256-simd/ and an open issue there for upstream support: minio/sha256-simd#7

@matt2909
Contributor Author

matt2909 commented Mar 27, 2017 via email

@ALTree
Member

ALTree commented Oct 25, 2017

A partial fix here: https://go-review.googlesource.com/c/go/+/64490

@gopherbot

Change https://golang.org/cl/64490 mentions this issue: crypto/aes: optimize arm64 AES implementation

@gopherbot

Change https://golang.org/cl/77810 mentions this issue: crypto/aes: implement AES-GCM mode

@gopherbot

Change https://golang.org/cl/102460 mentions this issue: crypto/aes: implement AES-GCM mode(interleave of CTR and GHASH ) for arm64

@gopherbot

Change https://golang.org/cl/107298 mentions this issue: crypto/aes: implement AES-GCM AEAD for arm64

@vielmetti

On arm64, Packet Type 2A / c1.large.arm Cavium ThunderX:

ed@ed-2a-bcc-llvm:~$ go test crypto/cipher -bench GCM

goos: linux
goarch: arm64
pkg: crypto/cipher
BenchmarkAESGCMSeal1K-96           20000             68788 ns/op          14.89 MB/s
BenchmarkAESGCMOpen1K-96           20000             68697 ns/op          14.91 MB/s
BenchmarkAESGCMSign8K-96           10000            182114 ns/op          44.98 MB/s
BenchmarkAESGCMSeal8K-96            3000            536359 ns/op          15.27 MB/s
BenchmarkAESGCMOpen8K-96            3000            537432 ns/op          15.24 MB/s
PASS
ok      crypto/cipher   9.404s
ed@ed-2a-bcc-llvm:~$ 
ed@ed-2a-bcc-llvm:~$ go version
go version go1.10.2 linux/arm64
ed@ed-2a-bcc-llvm:~$ ~/go/bin/go1.11beta1 test crypto/cipher -bench GCM
goos: linux
goarch: arm64
pkg: crypto/cipher
BenchmarkAESGCMSeal1K-96           50000             37520 ns/op          27.29 MB/s
BenchmarkAESGCMOpen1K-96           50000             37550 ns/op          27.27 MB/s
BenchmarkAESGCMSign8K-96           10000            172278 ns/op          47.55 MB/s
BenchmarkAESGCMSeal8K-96            5000            289794 ns/op          28.27 MB/s
BenchmarkAESGCMOpen8K-96            5000            288511 ns/op          28.39 MB/s
PASS
ok      crypto/cipher   9.274s

1.11beta1 is substantially faster than 1.10.2: Seal and Open reach about 1.8x the throughput, while Sign8K is only about 5% faster.
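The MB/s column in the runs above can be re-derived from the ns/op column. The sketch below assumes the 1K benchmarks process 1024 bytes per operation (which is consistent with the reported figures) and uses decimal megabytes; `mbPerSec` and `speedup` are hypothetical helper names.

```go
package main

import "fmt"

// mbPerSec converts bytes processed per operation and nanoseconds per
// operation into decimal megabytes per second (10^6 bytes).
// bytes/ns is GB/s, so multiplying by 1e3 yields MB/s.
func mbPerSec(bytesPerOp int, nsPerOp float64) float64 {
	return float64(bytesPerOp) / nsPerOp * 1e3
}

// speedup reports how many times faster the new timing is than the old one.
func speedup(oldNs, newNs float64) float64 {
	return oldNs / newNs
}

func main() {
	// ns/op values copied from the AESGCMSeal1K lines above.
	const oldNs, newNs = 68788.0, 37520.0
	fmt.Printf("go1.10.2:  %.2f MB/s\n", mbPerSec(1024, oldNs))
	fmt.Printf("1.11beta1: %.2f MB/s\n", mbPerSec(1024, newNs))
	fmt.Printf("speedup:   %.2fx\n", speedup(oldNs, newNs))
}
```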

@jared2501

So excited to see this merge! https://go-review.googlesource.com/c/go/+/107298

gopherbot pushed a commit that referenced this issue Jul 20, 2018
Use the dedicated AES* and PMULL* instructions to accelerate AES-GCM

name              old time/op    new time/op      delta
AESGCMSeal1K-46     12.1µs ± 0%       0.9µs ± 0%    -92.66%  (p=0.000 n=9+10)
AESGCMOpen1K-46     12.1µs ± 0%       0.9µs ± 0%    -92.43%  (p=0.000 n=10+10)
AESGCMSign8K-46     58.6µs ± 0%       2.1µs ± 0%    -96.41%  (p=0.000 n=9+8)
AESGCMSeal8K-46     92.8µs ± 0%       5.7µs ± 0%    -93.86%  (p=0.000 n=9+9)
AESGCMOpen8K-46     92.9µs ± 0%       5.7µs ± 0%    -93.84%  (p=0.000 n=8+9)

name              old speed      new speed        delta
AESGCMSeal1K-46   84.7MB/s ± 0%  1153.4MB/s ± 0%  +1262.21%  (p=0.000 n=9+10)
AESGCMOpen1K-46   84.4MB/s ± 0%  1115.2MB/s ± 0%  +1220.53%  (p=0.000 n=10+10)
AESGCMSign8K-46    140MB/s ± 0%    3894MB/s ± 0%  +2687.50%  (p=0.000 n=9+10)
AESGCMSeal8K-46   88.2MB/s ± 0%  1437.5MB/s ± 0%  +1529.30%  (p=0.000 n=9+9)
AESGCMOpen8K-46   88.2MB/s ± 0%  1430.5MB/s ± 0%  +1522.01%  (p=0.000 n=8+9)

This change mirrors the current amd64 implementation, and provides optimal performance
on a range of arm64 processors including Centriq 2400 and Apple A12. By and large it is
implicitly tested by the robustness of the already existing amd64 implementation.

The implementation interleaves GHASH with CTR mode to achieve the highest possible
throughput. It also aggregates GHASH with a factor of 8 to decrease the cost of the
reduction step.

Even though there is a significant amount of assembly, the code reuses the Go
code for the amd64 implementation, so there is little additional Go code.

Since AES-GCM is critical for the performance of all web servers, this change is
required to level the playing field for arm64 CPUs, where amd64 currently enjoys an
unfair advantage.

Ideally both amd64 and arm64 codepaths could be replaced by hypothetical AES and
CLMUL intrinsics, with a few additional vector instructions.

Fixes #18498
Fixes #19840

Change-Id: Icc57b868cd1f67ac695c1ac163a8e215f74c7910
Reviewed-on: https://go-review.googlesource.com/107298
Run-TryBot: Vlad Krasnov <vlad@cloudflare.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
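
The factor-of-8 GHASH aggregation mentioned in the commit message rests on an algebraic identity: eight serial steps Y = (Y xor X) * H can be rewritten as (Y xor X1)*H^8 xor X2*H^7 xor ... xor X8*H, so the eight products are independent of each other. The sketch below only checks that identity with a slow bit-by-bit GF(2^128) multiply in the style of NIST SP 800-38D; it is not the assembly from the change, it reduces after every multiply rather than once per group, and all names and test values in it are illustrative assumptions.

```go
package main

import "fmt"

// block is a 128-bit GHASH element; hi holds the first 8 bytes, big-endian.
type block struct{ hi, lo uint64 }

func xor(a, b block) block { return block{a.hi ^ b.hi, a.lo ^ b.lo} }

// mul multiplies x and y in GF(2^128) with GCM's bit ordering, one bit at a
// time. It is a reference method only, far slower than PMULL-based code.
func mul(x, y block) block {
	var z block
	v := y
	for i := uint(0); i < 128; i++ {
		var bit uint64
		if i < 64 {
			bit = (x.hi >> (63 - i)) & 1
		} else {
			bit = (x.lo >> (127 - i)) & 1
		}
		if bit == 1 {
			z = xor(z, v)
		}
		carry := v.lo & 1
		v.lo = v.lo>>1 | v.hi<<63
		v.hi >>= 1
		if carry == 1 {
			v.hi ^= 0xe100000000000000 // reduction polynomial R
		}
	}
	return z
}

// ghashSerial applies Y = (Y xor X) * H once per block.
func ghashSerial(h block, xs []block) block {
	var y block
	for _, x := range xs {
		y = mul(xor(y, x), h)
	}
	return y
}

// ghashAggregated8 consumes 8 blocks per step using precomputed powers of H.
// len(xs) must be a multiple of 8 in this sketch.
func ghashAggregated8(h block, xs []block) block {
	var pow [8]block // pow[i] = H^(i+1)
	pow[0] = h
	for i := 1; i < 8; i++ {
		pow[i] = mul(pow[i-1], h)
	}
	var y block
	for i := 0; i+8 <= len(xs); i += 8 {
		acc := mul(xor(y, xs[i]), pow[7]) // (Y xor X1) * H^8
		for j := 1; j < 8; j++ {
			acc = xor(acc, mul(xs[i+j], pow[7-j])) // X(j+1) * H^(8-j)
		}
		y = acc
	}
	return y
}

func main() {
	h := block{0x66e94bd4ef8a2c3b, 0x884cfa59ca342b2e} // arbitrary test value
	xs := make([]block, 16)
	for i := range xs {
		xs[i] = block{uint64(i+1) * 0x9e3779b97f4a7c15, uint64(i+7) * 0xc2b2ae3d27d4eb4f}
	}
	fmt.Println(ghashSerial(h, xs) == ghashAggregated8(h, xs))
}
```

In an optimized implementation the benefit comes from summing the unreduced products and performing the modular reduction once per group of eight blocks, which this reference version does not attempt.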
@jared2501

Thanks so much for the hard work on this!

FiloSottile pushed a commit to FiloSottile/go that referenced this issue Oct 12, 2018
@golang golang locked and limited conversation to collaborators Jul 20, 2019