
crypto/rsa: linux/arm64 Go 1.9 performance is +10X slower than OpenSSL #22807

Open
Tracked by #57752
williamweixiao opened this issue Nov 19, 2017 · 2 comments
Labels: help wanted, NeedsFix, Performance
Milestone: Unplanned


@williamweixiao
Member

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.9.2 linux/arm64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

GOARCH="arm64"
GOBIN=""
GOEXE=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GORACE=""
GOROOT="/usr/lib/go-1.6"
GOTOOLDIR="/usr/lib/go-1.6/pkg/tool/linux_arm64"
GO15VENDOREXPERIMENT="1"
CC="gcc"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"

What did you do?

go test crypto/rsa -bench .
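The command above runs the package's built-in benchmarks. As a hedged sketch of roughly what the signing benchmark measures, the standalone program below times RSA-2048 PKCS #1 v1.5 signing of a SHA-256 digest using testing.Benchmark, outside of go test. It is not the crypto/rsa benchmark source; the message, the SHA-256 choice, and the helper name benchmarkSign2048 are illustrative assumptions.

```go
package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"
	"testing"
)

// benchmarkSign2048 (hypothetical helper) times RSA-2048 PKCS #1 v1.5
// signing of a fixed SHA-256 digest.
func benchmarkSign2048() (testing.BenchmarkResult, error) {
	// Key generation is slow, so do it once, outside the timed loop.
	priv, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return testing.BenchmarkResult{}, err
	}
	digest := sha256.Sum256([]byte("hypothetical message"))

	var signErr error
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			// Each iteration performs one private-key (sign) operation.
			if _, err := rsa.SignPKCS1v15(rand.Reader, priv, crypto.SHA256, digest[:]); err != nil {
				signErr = err
				return
			}
		}
	})
	return res, signErr
}

func main() {
	res, err := benchmarkSign2048()
	if err != nil {
		fmt.Println("sign failed:", err)
		return
	}
	// NsPerOp is in the same unit as the ns/op column of go test -bench.
	fmt.Printf("RSA-2048 sign: %d ns/op (%d iterations)\n", res.NsPerOp(), res.N)
}
```

Because key generation sits outside the timed closure, the reported ns/op reflects only the signing loop, which is the private-key operation that dominates the slow numbers reported in this issue.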

What did you expect to see?

Performance on par with OpenSSL (https://blog.cloudflare.com/content/images/2017/11/pub_key_1_core-2.png)

What did you see instead?

Performance more than 10x slower than OpenSSL (https://blog.cloudflare.com/content/images/2017/11/go_pub_key_1_core.png)

@titanous added the help wanted, NeedsFix, and Performance labels Nov 21, 2017
@titanous titanous added this to the Unplanned milestone Nov 21, 2017
@vielmetti

Go 1.11beta1 is substantially faster than Go 1.10.2 on this test on Cavium ThunderX / Packet c1.large.arm ("Type 2A"): roughly 6.5x faster for 2048-bit decrypt and sign, and about 4.7x faster for 3-prime decrypt.

ed@ed-2a-bcc-llvm:~$ go version
go version go1.10.2 linux/arm64
ed@ed-2a-bcc-llvm:~$ go test crypto/rsa -bench .
goos: linux
goarch: arm64
pkg: crypto/rsa
BenchmarkRSA2048Decrypt-96                    20          74651551 ns/op
BenchmarkRSA2048Sign-96                       20          77650290 ns/op
Benchmark3PrimeRSA2048Decrypt-96              50          35958813 ns/op
PASS
ok      crypto/rsa      8.809s
ed@ed-2a-bcc-llvm:~$ ~/go/bin/go1.11beta1 test crypto/rsa -bench .
goos: linux
goarch: arm64
pkg: crypto/rsa
BenchmarkRSA2048Decrypt-96                   100          11466566 ns/op
BenchmarkRSA2048Sign-96                      100          11855513 ns/op
Benchmark3PrimeRSA2048Decrypt-96             200           7684199 ns/op
PASS
ok      crypto/rsa      6.584s
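The speedups quoted above are simply the ratio of old to new ns/op. A minimal sketch of that arithmetic, using the numbers copied from the two runs (the helper name speedup is illustrative):

```go
package main

import "fmt"

// speedup returns how many times faster the new measurement is than the
// old one, given two ns/op values from go test -bench output.
func speedup(oldNsPerOp, newNsPerOp float64) float64 {
	return oldNsPerOp / newNsPerOp
}

func main() {
	// ns/op values from the Go 1.10.2 (old) and Go 1.11beta1 (new) runs.
	results := []struct {
		name         string
		oldNs, newNs float64
	}{
		{"RSA2048Decrypt", 74651551, 11466566},
		{"RSA2048Sign", 77650290, 11855513},
		{"3PrimeRSA2048Decrypt", 35958813, 7684199},
	}
	for _, r := range results {
		fmt.Printf("%-22s %.2fx faster\n", r.name, speedup(r.oldNs, r.newNs))
	}
}
```

This prints speedups of about 6.51x, 6.55x and 4.68x respectively.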

@bobby-stripe

bobby-stripe commented Mar 6, 2023

Some updated numbers on a third-generation AWS Graviton (c7g) host:

$ go version
go version devel go1.21-b94dc384ca Sat Mar 4 00:00:01 2023 +0000 linux/arm64
$ go test crypto/rsa -bench .
goos: linux
goarch: arm64
pkg: crypto/rsa
BenchmarkDecryptPKCS1v15/2048-32         	     597	   2000184 ns/op
BenchmarkDecryptPKCS1v15/3072-32         	     200	   5976582 ns/op
BenchmarkDecryptPKCS1v15/4096-32         	      88	  13397414 ns/op
BenchmarkEncryptPKCS1v15/2048-32         	    6457	    185655 ns/op
BenchmarkDecryptOAEP/2048-32             	     603	   1990501 ns/op
BenchmarkEncryptOAEP/2048-32             	    6457	    185121 ns/op
BenchmarkSignPKCS1v15/2048-32            	     583	   2048907 ns/op
BenchmarkVerifyPKCS1v15/2048-32          	    6528	    183649 ns/op
BenchmarkSignPSS/2048-32                 	     583	   2052886 ns/op
BenchmarkVerifyPSS/2048-32               	    6442	    185743 ns/op
PASS
ok  	crypto/rsa	14.990s
$ cat /proc/cpuinfo | head -n 9
processor	: 0
BogoMIPS	: 2100.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x1
CPU part	: 0xd40
CPU revision	: 1

And on an M1 Max (run with -cpu 1):

$ go version
go version devel go1.21-b94dc384ca Sat Mar 4 00:00:01 2023 +0000 darwin/arm64
$ go test crypto/rsa -bench . -cpu 1
goos: darwin
goarch: arm64
pkg: crypto/rsa
BenchmarkDecryptPKCS1v15/2048         	    1040	   1217645 ns/op
BenchmarkDecryptPKCS1v15/3072         	     303	   3562839 ns/op
BenchmarkDecryptPKCS1v15/4096         	     148	   8073468 ns/op
BenchmarkEncryptPKCS1v15/2048         	    8928	    130840 ns/op
BenchmarkDecryptOAEP/2048             	    1023	   1146886 ns/op
BenchmarkEncryptOAEP/2048             	    8979	    131854 ns/op
BenchmarkSignPKCS1v15/2048            	     994	   1194395 ns/op
BenchmarkVerifyPKCS1v15/2048          	    9250	    131157 ns/op
BenchmarkSignPSS/2048                 	     997	   1199584 ns/op
BenchmarkVerifyPSS/2048               	    9013	    131653 ns/op
PASS
ok  	crypto/rsa	15.288s

AWS c6i.8xlarge (Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz, "old") compared to c7g.8xlarge ("new"):

name                     old time/op  new time/op  delta
DecryptPKCS1v15/2048-32  1.52ms ± 0%  2.00ms ± 0%  +31.41%  (p=0.008 n=5+5)
DecryptPKCS1v15/3072-32  4.56ms ± 1%  5.98ms ± 0%  +31.06%  (p=0.008 n=5+5)
DecryptPKCS1v15/4096-32  10.2ms ± 0%  13.4ms ± 0%  +31.67%  (p=0.008 n=5+5)
EncryptPKCS1v15/2048-32   180µs ± 0%   185µs ± 0%   +3.09%  (p=0.008 n=5+5)
DecryptOAEP/2048-32      1.54ms ± 0%  1.99ms ± 0%  +28.88%  (p=0.008 n=5+5)
EncryptOAEP/2048-32       183µs ± 1%   185µs ± 0%   +1.29%  (p=0.008 n=5+5)
SignPKCS1v15/2048-32     1.58ms ± 0%  2.05ms ± 0%  +29.66%  (p=0.008 n=5+5)
VerifyPKCS1v15/2048-32    179µs ± 1%   184µs ± 0%   +2.56%  (p=0.008 n=5+5)
SignPSS/2048-32          1.59ms ± 1%  2.05ms ± 0%  +29.24%  (p=0.008 n=5+5)
VerifyPSS/2048-32         182µs ± 1%   186µs ± 0%   +2.06%  (p=0.008 n=5+5)
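The delta column appears to be the relative change of the new mean time/op from the old one, in percent, so positive values mean the c7g is slower. A minimal sketch of that formula (percentDelta is an illustrative name, not a benchstat API), applied to the rounded DecryptPKCS1v15/2048 values from the table:

```go
package main

import "fmt"

// percentDelta returns the relative change from oldTime to newTime in
// percent. A positive result means newTime is larger, i.e. slower.
func percentDelta(oldTime, newTime float64) float64 {
	return (newTime - oldTime) / oldTime * 100
}

func main() {
	// Rounded DecryptPKCS1v15/2048 time/op in ms: c6i (old) vs c7g (new).
	fmt.Printf("%+.2f%%\n", percentDelta(1.52, 2.00))
}
```

This prints +31.58% rather than the table's +31.41% because the table's 1.52ms and 2.00ms are rounded display values, while the delta is computed from the unrounded sample means.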

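This gap (about 30% for private-key operations and 1-3% for public-key operations) is actually slightly smaller than the one seen with Ubuntu Focal's OpenSSL 1.1.1f on the same host types (Graviton 37% slower than Intel). However, on the c7g Graviton 3 hosts, openssl speed rsa2048 reports 2048-bit RSA as about 2x as fast as the Go benchmarks above for private-key operations (0.97 ms vs. 2.05 ms per sign), and roughly 7-8x as fast for public-key operations (24 µs vs. 184 µs per verify):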

$ openssl speed rsa2048
Doing 2048 bits private rsa's for 10s: 10322 2048 bits private RSA's in 10.00s
Doing 2048 bits public rsa's for 10s: 419431 2048 bits public RSA's in 9.98s
OpenSSL 1.1.1f  31 Mar 2020
built on: Mon Feb  6 17:57:17 2023 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-0kQqA1/openssl-1.1.1f=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_TLS_SECURITY_LEVEL=2 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.000969s 0.000024s   1032.2  42027.2
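openssl speed reports throughput (operations per second), while go test -bench reports latency (ns/op). A minimal sketch of converting between the two to compare like with like, assuming both figures reflect sequential, one-operation-at-a-time timings on the same c7g host (nsPerOpFromRate is an illustrative name):

```go
package main

import "fmt"

// nsPerOpFromRate converts an operations-per-second figure into
// nanoseconds per operation, the unit used by go test -bench.
func nsPerOpFromRate(opsPerSec float64) float64 {
	return 1e9 / opsPerSec
}

func main() {
	// sign/s and verify/s from the openssl speed rsa2048 output above.
	opensslSign := nsPerOpFromRate(1032.2)
	opensslVerify := nsPerOpFromRate(42027.2)

	// Go ns/op on the c7g host (SignPKCS1v15/2048, VerifyPKCS1v15/2048).
	goSign, goVerify := 2048907.0, 183649.0

	fmt.Printf("sign:   OpenSSL %.0f ns/op, Go %.1fx slower\n", opensslSign, goSign/opensslSign)
	fmt.Printf("verify: OpenSSL %.0f ns/op, Go %.1fx slower\n", opensslVerify, goVerify/opensslVerify)
}
```

With these inputs Go comes out roughly 2.1x slower for signing and about 7.7x slower for verification.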

Finally, standard Go ("old") vs. GOEXPERIMENT=boringcrypto ("new") on an AWS c7g (third-generation Graviton):

name                     old time/op  new time/op  delta
DecryptPKCS1v15/2048-32  2.00ms ± 0%  0.91ms ± 0%  -54.59%  (p=0.008 n=5+5)
DecryptPKCS1v15/3072-32  5.98ms ± 0%  2.71ms ± 0%  -54.62%  (p=0.008 n=5+5)
DecryptPKCS1v15/4096-32  13.4ms ± 0%   6.1ms ± 0%  -54.84%  (p=0.008 n=5+5)
EncryptPKCS1v15/2048-32   185µs ± 0%     8µs ± 0%  -95.80%  (p=0.008 n=5+5)
DecryptOAEP/2048-32      1.99ms ± 0%  0.91ms ± 0%  -54.16%  (p=0.008 n=5+5)
EncryptOAEP/2048-32       185µs ± 0%    12µs ± 0%  -93.47%  (p=0.008 n=5+5)
SignPKCS1v15/2048-32     2.05ms ± 0%  0.91ms ± 0%  -55.72%  (p=0.008 n=5+5)
VerifyPKCS1v15/2048-32    184µs ± 0%     7µs ± 0%  -96.45%  (p=0.008 n=5+5)
SignPSS/2048-32          2.05ms ± 0%  0.91ms ± 0%  -55.67%  (p=0.008 n=5+5)
VerifyPSS/2048-32         186µs ± 0%     7µs ± 0%  -96.16%  (p=0.008 n=5+5)

(The boringcrypto sign times, about 0.91 ms/op, roughly match the 0.97 ms rsa2048 sign time reported by OpenSSL above.)


4 participants