Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/internal/nistec: remove ppc64le assembly #52424

Open
FiloSottile opened this issue Apr 19, 2022 · 3 comments
Open

crypto/internal/nistec: remove ppc64le assembly #52424

FiloSottile opened this issue Apr 19, 2022 · 3 comments
Assignees
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@FiloSottile
Copy link
Contributor

In #52182 (comment), @laboger reports that the fiat-crypto (#40171) code with @pmur's compiler improvements (https://go.dev/cl/393656) is within range of the assembly performance!

This is extremely impressive considering the fiat-crypto code also uses safer but slower complete formulas and a somewhat naive 4-bit scalar multiplication window.

ScalarBaseMult/P256                    237µs ± 0%      52µs ± 0%   -78.22%  (p=1.000 n=1+1)
ScalarMult/P256                        239µs ± 0%     213µs ± 0%   -10.95%  (p=1.000 n=1+1)

The ScalarBaseMult benchmark is still significantly slower, because the assembly uses a large precomputed table, while the fiat-crypto code just runs ScalarMult. This is very much fixable.

I will land the ScalarBaseMult optimization in the fiat-crypto code, and then we can remove the ppc64le assembly entirely!

@FiloSottile FiloSottile self-assigned this Apr 19, 2022
@seankhliao seankhliao added the NeedsFix The path to resolution is known, but the work has not been done. label Apr 22, 2022
@FiloSottile
Copy link
Contributor Author

https://go.dev/cl/404174 is the promised ScalarBaseMult optimization, so it's possible that the assembly is now slower than the fiat-crypto code!

@gopherbot
Copy link

Change https://go.dev/cl/404174 mentions this issue: crypto/elliptic: precompute ScalarBaseMult doublings

gopherbot pushed a commit that referenced this issue May 5, 2022
name                    old time/op    new time/op    delta
pkg:crypto/ecdsa goos:darwin goarch:amd64
Sign/P224-16               250µs ± 2%      91µs ± 2%  -63.42%  (p=0.000 n=10+9)
Sign/P384-16               955µs ± 3%     311µs ± 2%  -67.48%  (p=0.000 n=10+10)
Sign/P521-16              2.74ms ± 2%    0.82ms ± 2%  -69.95%  (p=0.000 n=10+10)
Verify/P224-16             440µs ± 3%     282µs ± 5%  -35.94%  (p=0.000 n=9+10)
Verify/P384-16            1.72ms ± 2%    1.07ms ± 1%  -38.02%  (p=0.000 n=10+9)
Verify/P521-16            5.10ms ± 2%    3.18ms ± 3%  -37.70%  (p=0.000 n=10+10)
GenerateKey/P224-16        225µs ± 3%      67µs ± 4%  -70.42%  (p=0.000 n=9+10)
GenerateKey/P384-16        881µs ± 1%     241µs ± 2%  -72.67%  (p=0.000 n=10+10)
GenerateKey/P521-16       2.62ms ± 3%    0.69ms ± 3%  -73.78%  (p=0.000 n=10+9)
pkg:crypto/elliptic/internal/nistec goos:darwin goarch:amd64
ScalarMult/P224-16         219µs ± 4%     209µs ± 3%   -4.57%  (p=0.003 n=10+10)
ScalarMult/P384-16         838µs ± 2%     823µs ± 1%   -1.72%  (p=0.004 n=10+9)
ScalarMult/P521-16        2.48ms ± 2%    2.45ms ± 2%     ~     (p=0.052 n=10+10)
ScalarBaseMult/P224-16     214µs ± 4%      54µs ± 4%  -74.88%  (p=0.000 n=10+10)
ScalarBaseMult/P384-16     828µs ± 2%     196µs ± 3%  -76.38%  (p=0.000 n=10+10)
ScalarBaseMult/P521-16    2.50ms ± 3%    0.55ms ± 2%  -77.96%  (p=0.000 n=10+10)

Updates #52424
For #52182

Change-Id: I2be3c2b8cdeead512063ef489e43805f4ee71d0f
Reviewed-on: https://go-review.googlesource.com/c/go/+/404174
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Fernando Lobato Meeser <felobato@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
@laboger
Copy link
Contributor

laboger commented May 11, 2022

Here are comparisons using noasm vs. asm using latest:

crypto/internal/nistec:
ScalarMult/P256         153µs ± 0%     145µs ± 0%   -5.84%  (p=1.000 n=1+1)
ScalarBaseMult/P256    45.2µs ± 0%    23.5µs ± 0%  -48.14%  (p=1.000 n=1+1)

crypto/elltipic:
ScalarBaseMult/P256                   52.3µs ± 0%    38.3µs ± 0%  -26.78%  (p=1.000 n=1+1)
ScalarMult/P256                        161µs ± 0%     160µs ± 0%   -1.02%  (p=1.000 n=1+1)

crypto/ecdsa:
Sign/P256           96.2µs ± 0%    87.8µs ± 0%   -8.71%  (p=1.000 n=1+1)
Verify/P256          212µs ± 0%     196µs ± 0%   -7.43%  (p=1.000 n=1+1)
GenerateKey/P256    53.8µs ± 0%    40.0µs ± 0%  -25.63%  (p=1.000 n=1+1)

No meaningful difference in the crypto/tls benchmarks.
Looks like the assembler version is still significantly faster than the native Go version for some.

@seankhliao seankhliao added this to the Go1.20 milestone Aug 20, 2022
@gopherbot gopherbot modified the milestones: Go1.20, Go1.21 Feb 1, 2023
@gopherbot gopherbot modified the milestones: Go1.21, Go1.22 Aug 8, 2023
@gopherbot gopherbot modified the milestones: Go1.22, Go1.23 Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

4 participants