x/crypto/chacha20poly1305: linux/arm64 Go 1.9 performance is 3X slower than OpenSSL #22809

Closed
williamweixiao opened this issue Nov 19, 2017 · 17 comments
Labels: FrozenDueToAge, help wanted, NeedsFix, Performance

@williamweixiao
Member

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.9.2 linux/arm64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

GOARCH="arm64"
GOBIN=""
GOEXE=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GORACE=""
GOROOT="/usr/lib/go-1.6"
GOTOOLDIR="/usr/lib/go-1.6/pkg/tool/linux_arm64"
GO15VENDOREXPERIMENT="1"
CC="gcc"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"

What did you do?

go test vendor/golang_org/x/crypto/chacha20poly1305 -bench .

What did you expect to see?

Performance can be on par with OpenSSL (https://blog.cloudflare.com/content/images/2017/11/sym_key_1_core.png)

What did you see instead?

3X slower than OpenSSL (https://blog.cloudflare.com/content/images/2017/11/go_sym_key_1_core.png)

@titanous added the help wanted, NeedsFix, and Performance labels Nov 21, 2017
@titanous added this to the Unplanned milestone Nov 21, 2017
@dmitshur changed the title from "crypto/chacha20poly1305: linux/arm64 Go 1.9 performance is 3X slower than OpenSSL" to "x/crypto/chacha20poly1305: linux/arm64 Go 1.9 performance is 3X slower than OpenSSL" Mar 20, 2018
@gopherbot

Change https://golang.org/cl/105895 mentions this issue: crypto/poly1305: arm64 implementation using multiword arithmetic

@gopherbot

Change https://golang.org/cl/105896 mentions this issue: crypto/poly1305: arm64 implementation using multiword arithmetic

@gopherbot

Change https://golang.org/cl/107628 mentions this issue: internal/chacha20: add arm64 SIMD implementation

@vielmetti

Substantial perf improvements on Cavium ThunderX going from go1.10.2 to go1.11beta1, but not 3x faster.

ed@ed-2a-bcc-llvm:~$ go version
go version go1.10.2 linux/arm64
ed@ed-2a-bcc-llvm:~$ go test vendor/golang_org/x/crypto/chacha20poly1305 -bench .
goos: linux
goarch: arm64
pkg: vendor/golang_org/x/crypto/chacha20poly1305
BenchmarkChacha20Poly1305Open_64-96               500000              3047 ns/op          21.00 MB/s
BenchmarkChacha20Poly1305Seal_64-96               500000              2920 ns/op          21.91 MB/s
BenchmarkChacha20Poly1305Open_1350-96              50000             30990 ns/op          43.56 MB/s
BenchmarkChacha20Poly1305Seal_1350-96              50000             30890 ns/op          43.70 MB/s
BenchmarkChacha20Poly1305Open_8K-96                10000            173794 ns/op          47.14 MB/s
BenchmarkChacha20Poly1305Seal_8K-96                10000            173907 ns/op          47.11 MB/s
PASS
ok      vendor/golang_org/x/crypto/chacha20poly1305     10.538s
ed@ed-2a-bcc-llvm:~$ 
ed@ed-2a-bcc-llvm:~$ ~/go/bin/go1.11beta1 test vendor/golang_org/x/crypto/chacha20poly1305 -bench .
goos: linux
goarch: arm64
pkg: vendor/golang_org/x/crypto/chacha20poly1305
BenchmarkChacha20Poly1305Open_64-96              1000000              2249 ns/op          28.45 MB/s
BenchmarkChacha20Poly1305Seal_64-96              1000000              2245 ns/op          28.50 MB/s
BenchmarkChacha20Poly1305Open_1350-96             100000             19541 ns/op          69.08 MB/s
BenchmarkChacha20Poly1305Seal_1350-96             100000             19439 ns/op          69.45 MB/s
BenchmarkChacha20Poly1305Open_8K-96                10000            105547 ns/op          77.61 MB/s
BenchmarkChacha20Poly1305Seal_8K-96                10000            105938 ns/op          77.33 MB/s
PASS
ok      vendor/golang_org/x/crypto/chacha20poly1305     11.173s
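
To put a number on "not 3x faster", the throughput ratios can be worked out from the two runs above. This is only a minimal sketch: the `speedup` helper and the row names are made up for illustration, and the MB/s figures are copied from the Open benchmarks in the output above.

```go
package main

import "fmt"

// speedup returns how many times higher the new throughput is than the old.
// (Hypothetical helper, not part of any Go package.)
func speedup(oldMBps, newMBps float64) float64 {
	return newMBps / oldMBps
}

func main() {
	// Open throughput in MB/s: go1.10.2 (before) vs go1.11beta1 (after).
	rows := []struct {
		name          string
		before, after float64
	}{
		{"Open_64", 21.00, 28.45},
		{"Open_1350", 43.56, 69.08},
		{"Open_8K", 47.14, 77.61},
	}
	for _, r := range rows {
		fmt.Printf("%-10s %.2fx\n", r.name, speedup(r.before, r.after))
	}
}
```

On these figures the gain is roughly 1.35x for 64-byte messages and about 1.6x for the larger sizes.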

@mengzhuo
Contributor

@vielmetti internal/chacha20 won't be merged into go1.11 since it's frozen.

@vielmetti

Thanks @mengzhuo . Can we get this onto the go1.12 roster then? It's currently marked as "Unplanned".

@ianlancetaylor changed the milestone from Unplanned to Go1.12 Jun 29, 2018
@ianlancetaylor
Contributor

I changed the milestone, but note that that doesn't cause the work to be done. This is an open source project so the best way to get something done is to volunteer to do it. Thanks.

@vielmetti

Noted at https://go-review.googlesource.com/c/crypto/+/107628

"If you prioritize arm64 chacha and arm64 poly, it will see production use super soon after."

It appears from comments on that patch that the coding work has largely been done but there are constraints on the availability of reviewers for arm64 assembly.

@vielmetti

Who is reviewing arm64 assembly these days, @ianlancetaylor , and can they use qualified help? I'm happy to help recruit qualified reviewers from the arm64 Go community if I know the qualifications.

@ianlancetaylor
Contributor

At present arm64 assembly is typically reviewed by the tireless @cherrymui . I'm sure @benshi001 would also have good input.

@zx2c4
Contributor

zx2c4 commented Jan 3, 2019

@FiloSottile This was marked for 1.12. Things still on target for that?

@vielmetti

This issue is marked currently as "help wanted". What is the nature of the help desired?

@ianlancetaylor
Contributor

@vielmetti Figuring out how to make the code run faster.

One of the meanings of the "help wanted" label is "we would like this to happen but nobody is working on it."

@zx2c4
Contributor

zx2c4 commented Feb 6, 2019

"we would like this to happen but nobody is working on it."

Pretty sure somebody was working on it, but then the CL just didn't get much of a review. Until today, that is.

@ianlancetaylor
Contributor

@zx2c4 The "help wanted" label was added before any of the CLs were sent. But I should have been clearer in my response; my apologies.

gopherbot pushed a commit to golang/crypto that referenced this issue Feb 11, 2019
Inspired by Vectorization of ChaCha Stream Cipher
https://eprint.iacr.org/2013/759.pdf

name            old time/op    new time/op    delta
ChaCha20/32        690ns ± 0%     872ns ± 0%   +26.38%  (p=0.000 n=10+10)
ChaCha20/63        750ns ± 0%     987ns ± 0%   +31.53%  (p=0.000 n=10+10)
ChaCha20/64        674ns ± 0%     879ns ± 0%   +30.42%  (p=0.000 n=8+10)
ChaCha20/256      2.28µs ± 0%    0.82µs ± 0%   -64.13%  (p=0.000 n=10+10)
ChaCha20/1024     8.64µs ± 0%    2.92µs ± 0%   -66.15%  (p=0.000 n=9+9)
ChaCha20/1350     11.9µs ± 0%     4.5µs ± 0%   -62.51%  (p=0.000 n=10+8)
ChaCha20/65536     554µs ± 0%     181µs ± 0%   -67.33%  (p=0.000 n=10+10)

name            old speed      new speed      delta
ChaCha20/32     46.3MB/s ± 0%  36.7MB/s ± 0%   -20.87%  (p=0.000 n=10+9)
ChaCha20/63     83.9MB/s ± 0%  63.8MB/s ± 0%   -23.97%  (p=0.000 n=10+10)
ChaCha20/64     94.9MB/s ± 0%  72.8MB/s ± 0%   -23.31%  (p=0.000 n=10+10)
ChaCha20/256     112MB/s ± 0%   312MB/s ± 0%  +178.74%  (p=0.000 n=10+10)
ChaCha20/1024    119MB/s ± 0%   350MB/s ± 0%  +195.31%  (p=0.000 n=10+9)
ChaCha20/1350    114MB/s ± 0%   303MB/s ± 0%  +166.73%  (p=0.000 n=8+8)
ChaCha20/65536   118MB/s ± 0%   362MB/s ± 0%  +206.12%  (p=0.000 n=10+10)

Updates golang/go#22809
Change-Id: I487487faa2ae4ff29de6fd8eb1317740c2939c10
Reviewed-on: https://go-review.googlesource.com/c/107628
Reviewed-by: Filippo Valsorda <filippo@golang.org>
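
For readers unfamiliar with this table layout, the delta column is the relative change from old to new: a positive time/op delta means the new code is slower, a negative one means it is faster. A minimal sketch of that arithmetic follows, assuming a hypothetical `deltaPercent` helper and using the time/op values from the table above. Because the printed values are rounded, recomputed deltas for other rows may differ slightly from the listed ones.

```go
package main

import "fmt"

// deltaPercent returns the relative change from before to after, in percent.
// (Hypothetical helper for illustration; not a benchstat API.)
func deltaPercent(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// time/op in nanoseconds, copied from the commit message table.
	fmt.Printf("ChaCha20/32  %+.2f%%\n", deltaPercent(690, 872))
	fmt.Printf("ChaCha20/64  %+.2f%%\n", deltaPercent(674, 879))
}
```

Read this way, the table shows inputs of 64 bytes or fewer getting slower per operation while inputs of 256 bytes and up get substantially faster.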
@andybons changed the milestone from Go1.12 to Go1.13 Feb 12, 2019
@andybons removed this from the Go1.13 milestone Jul 8, 2019
@andybons added this to the Go1.14 milestone Jul 8, 2019
@rsc changed the milestone from Go1.14 to Backlog Oct 9, 2019
bored-engineer pushed a commit to bored-engineer/ssh that referenced this issue Oct 13, 2019
@rfjakob

rfjakob commented Apr 11, 2020

Looks like this has been done via https://github.com/golang/crypto/blob/master/chacha20/chacha_arm64.s

@mengzhuo
Contributor

mengzhuo commented Dec 16, 2020

Could someone close this issue? Apparently it's done.

@golang locked and limited conversation to collaborators Dec 16, 2021