-
Notifications
You must be signed in to change notification settings - Fork 18k
crypto/elliptic: p224 could be faster #48914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks. Do you want to send in your changes? It's not necessary to open an issue for a change like this; it's OK to just send the change. If you've already sent the change, make sure the description says "Fixes #48914". Thanks. |
Hi Ian, Yes. I tried creating a branch, but Gerritt gives me an error regarding create permissions. Where can I send the change ? |
You create the branch in your repo and open a pr imho. Gerrit will import the pr for review. |
Hi @andig, I followed the contribution instructions before I saw your suggestions. Here's the id of the codereview: :$ git-codereview mail |
Change https://golang.org/cl/355349 mentions this issue: |
@ianlancetaylor CL has been updated ;) |
The dedicated P-224 code was removed in 93bab8a in favor of fiat-crypto generated code. |
What version of Go are you using (
go version
)?1.18
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (
go env
)?Linux 4.19.0-17-cloud-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64 GNU/Linux
Intel, AVX-512 compatible
What did you do?
I ran some cryptotoken code with a linux performance profiler. The p224 code showed up as being a high CPU user.
I noticed that there are some tight loops in the p224.go code that should be unrolled. Other code in this repo is written in assembly, and it makes sense to have the best possible performance here.
Here are the before / after results for benchstat. You can see the improvement in the "ScalarBaseMult/p224-16' and 'ScalarMult/p224-16' lines below:
:$ benchstat 10tests-output.txt 10tests-faster-output.txt
name old time/op new time/op delta
ScalarBaseMult/P256-16 15.9µs ± 0% 15.6µs ± 0% -1.95% (p=0.000 n=9+9)
ScalarBaseMult/P224-16 768µs ± 1% 537µs ± 0% -30.02% (p=0.000 n=10+9)
ScalarBaseMult/P384-16 3.90ms ± 3% 3.82ms ± 3% -2.23% (p=0.023 n=10+10)
ScalarBaseMult/P521-16 1.81ms ± 1% 1.76ms ± 0% -2.54% (p=0.000 n=10+9)
ScalarMult/P256-16 64.2µs ± 0% 62.6µs ± 0% -2.49% (p=0.000 n=8+10)
ScalarMult/P224-16 771µs ± 1% 538µs ± 0% -30.28% (p=0.000 n=10+9)
ScalarMult/P384-16 3.90ms ± 3% 3.80ms ± 3% -2.49% (p=0.011 n=10+10)
ScalarMult/P521-16 1.80ms ± 2% 1.76ms ± 0% -2.30% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
ScalarBaseMult/P256-16 256B ± 0% 256B ± 0% ~ (all equal)
ScalarBaseMult/P224-16 192B ± 0% 192B ± 0% ~ (all equal)
ScalarBaseMult/P384-16 1.74MB ± 2% 1.75MB ± 3% ~ (p=0.393 n=10+10)
ScalarBaseMult/P521-16 776B ± 0% 776B ± 0% ~ (all equal)
ScalarMult/P256-16 256B ± 0% 256B ± 0% ~ (all equal)
ScalarMult/P224-16 256B ± 0% 256B ± 0% ~ (all equal)
ScalarMult/P384-16 1.73MB ± 4% 1.74MB ± 2% ~ (p=0.356 n=10+9)
ScalarMult/P521-16 775B ± 0% 775B ± 0% ~ (all equal)
name old allocs/op new allocs/op delta
ScalarBaseMult/P256-16 5.00 ± 0% 5.00 ± 0% ~ (all equal)
ScalarBaseMult/P224-16 4.00 ± 0% 4.00 ± 0% ~ (all equal)
ScalarBaseMult/P384-16 14.3k ± 2% 14.5k ± 3% ~ (p=0.481 n=10+10)
ScalarBaseMult/P521-16 10.0 ± 0% 10.0 ± 0% ~ (all equal)
ScalarMult/P256-16 5.00 ± 0% 5.00 ± 0% ~ (all equal)
ScalarMult/P224-16 6.00 ± 0% 6.00 ± 0% ~ (all equal)
ScalarMult/P384-16 14.3k ± 4% 14.3k ± 2% ~ (p=0.345 n=10+9)
ScalarMult/P521-16 10.0 ± 0% 10.0 ± 0% ~ (all equal)
The text was updated successfully, but these errors were encountered: