Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/rsa: severe VerifyPKCS1v15 performance regression in Go 1.20, Go 1.21 [freeze exception] #63516

Closed
Tracked by #57752
nkatsaros opened this issue Oct 12, 2023 · 11 comments
Labels
NeedsFix The path to resolution is known, but the work has not been done. Performance Security
Milestone

Comments

@nkatsaros
Copy link
Contributor

nkatsaros commented Oct 12, 2023

What version of Go are you using (go version)?

$ go version
go version go1.21.3 linux/amd64

Does this issue reproduce with the latest release?

Yes, with the latest releases of Go 1.20 and Go 1.21.

What operating system and processor architecture are you using (go env)?

go env Output
$ ./go1.21.3.linux-amd64/bin/go env
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/user/.cache/go-build'
GOENV='/home/user/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/user/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/user/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/user/gorsabench/go1.21.3.linux-amd64'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/user/gorsabench/go1.21.3.linux-amd64/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.3'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1639202501=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Some of our services started seeing severe performance degradation after upgrading to Go 1.20. Applications which verify many RS256 signed JWT tokens saw dramatic increases to CPU usage and profiles indicated that rsa.VerifyPKCS1v15 was the problematic function call.

I copied the BenchmarkVerifyPKCS1v15 benchmark to Go 1.19 and ran benchmarks on Go 1.19.13, Go 1.20.10, Go 1.21.3 and Go 1.21.3+boringcrypto. I've filtered the benchmarks to only include the VerifyPKCS1v15 benchmark since that was the easiest to copy to Go 1.19, however, anything that calls rsa.go's encrypt() is affected.

% benchstat go1.19.13.linux-amd64.txt go1.20.10.linux-amd64.txt go1.21.3.linux-amd64.txt go1.21.3.linux-amd64-boringcrypto.txt
goos: linux
goarch: amd64
pkg: crypto/rsa
cpu: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
                       │ go1.19.13.linux-amd64.txt │         go1.20.10.linux-amd64.txt         │         go1.21.3.linux-amd64.txt         │ go1.21.3.linux-amd64-boringcrypto.txt │
                       │          sec/op           │     sec/op      vs base                   │    sec/op      vs base                   │    sec/op     vs base                 │
VerifyPKCS1v15/2048-4                  8.053µ ± 8%   216.550µ ±  7%  +2589.05% (p=0.002 n=6)     121.072µ ± 1%  +1403.43% (p=0.002 n=6)     5.664µ ± 14%  -29.67% (p=0.002 n=6)

I've been following these performance regressions (see #59442) and assumed they were fixed in Go 1.21. Go 1.21 mostly fixed the performance of RSA Private Key operations.

Our solution to retain performance, avoid massively scaling and remain on a recent version of Go is to compile with GOEXPERIMENT=boringcrypto which does not seem like a reasonable long term solution.

The encrypt function contains a comment regarding a possible optimization using a sync.Once.

func encrypt(pub *PublicKey, plaintext []byte) ([]byte, error) {
	boring.Unreachable()

	// Most of the CPU time for encryption and verification is spent in this
	// NewModulusFromBig call, because PublicKey doesn't have a Precomputed
	// field. If performance becomes an issue, consider placing a private
	// sync.Once on PublicKey to compute this.
	N, err := bigmod.NewModulusFromBig(pub.N)
	if err != nil {
		return nil, err
	}
	m, err := bigmod.NewNat().SetBytes(plaintext, N)
	if err != nil {
		return nil, err
	}
	e := uint(pub.E)

	return bigmod.NewNat().ExpShort(m, e, N).Bytes(N), nil
}

I have a modification to my local Go 1.21 tree that contains this fix and caches bigmod.NewModulusFromBig(pub.N). It brings performance closer to Go 1.19 (~67% worse on my M1 MacBook Pro). I do not think a patch containing this fix would be accepted. Adding a sync.Once to rsa.PublicKey introduces a few issues:

  • rsa.PublicKey is no longer asn1.Marshalable. This expectation is documented in a comment within TestMarshalPKCS1PublicKey
    // It's never been documented that asn1.Marshal/Unmarshal on rsa.PublicKey works,
    // but it does, and we know of code that depends on it.
    // Lock that in, even though we'd prefer that people use MarshalPKCS1PublicKey and ParsePKCS1PublicKey.
    
  • It's a weird use of sync.Once which I don't commonly see in the stdlib.

I think a proper change would require exposing some kind of Precompute() functionality on rsa.PublicKey but I think that would still break the asn1.Marshal expectation.

What did you expect to see?

Similar performance to Go 1.19.

What did you see instead?

VerifyPKCS1v15 operations are ~25x slower in Go 1.20 and ~14x slower in Go 1.21.

@bcmills
Copy link
Contributor

bcmills commented Oct 12, 2023

See also #57752.

@tareksha
Copy link

Confirming the issue here too. In our system we have a component that executes rsa operations at high rates and the critical part that calls rsa.VerifyPKCS1v15 is approximately twice slower when upgrading from 1.19 to 1.21. We had to revert back to 1.19.
Previously we had to revert from go 1.20 due to the same reason.

@dmitshur
Copy link
Contributor

CC @golang/security.

@dmitshur dmitshur added this to the Backlog milestone Oct 13, 2023
@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 13, 2023
@gopherbot
Copy link

Change https://go.dev/cl/552895 mentions this issue: crypto/rsa: use E = 65537 in benchmarks

@gopherbot
Copy link

Change https://go.dev/cl/552935 mentions this issue: crypto/rsa,crypto/internal/bigmod: improve verify/encrypt performance

@FiloSottile
Copy link
Contributor

Turns out we were benchmarking the wrong thing: all the benchmark keys had E = 3, which hid the algorithmic slowdown on the exponent size. See https://go.dev/cl/552895. Apologies, I am really not happy this happened, and that it took me so long to catch it.

I have a CL out that makes public key operations faster than they were in Go 1.19, closing the loop on all the performance regressions.

I'd appreciate if folks could test this with their workloads.

gotip download 552935
gotip build

(This will get you something not too dissimilar from Go 1.22rc1, with the CL patched on top. I wouldn't run the whole fleet on it, but it should be stable enough for a lab.)

@mateusz834
Copy link
Member

So it turns out that the go1.19 -> go.1.20 (real-world) Verify operations were only slower by x4, not by x20 as noted in the release notes, right?

https://go.dev/doc/go1.20#cryptorsapkgcryptorsa
https://go.dev/cl/452095

Encryption operations are approximately 20x slower than before

@tareksha
Copy link

hi @FiloSottile is this fix being downported to 1.21?

@FiloSottile FiloSottile changed the title crypto/rsa: severe VerifyPKCS1v15 performance regression in Go 1.20, Go 1.21 crypto/rsa: severe VerifyPKCS1v15 performance regression in Go 1.20, Go 1.21 [freeze exception] Jan 9, 2024
@FiloSottile
Copy link
Contributor

@golang/release, @rsc suggested getting this freeze exception'd rather than backported.

@dmitshur dmitshur modified the milestones: Backlog, Go1.22 Jan 9, 2024
@dmitshur
Copy link
Contributor

dmitshur commented Jan 11, 2024

We've discussed and agreed to approve this freeze exception for Go 1.22. If this change turns out not to go as anticipated, please update the issue and follow up accordingly. Thanks.

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jan 11, 2024
gopherbot pushed a commit that referenced this issue Jan 12, 2024
Now, this is embarrassing. For the whole Go 1.20 and Go 1.21 cycles, we
based RSA public key operation (verification and decryption) benchmarks
on the keys in rsa_test.go, which had E = 3. Most keys in use, including
all those generated by GenerateKey, have E = 65537. This significantly
skewed even relative benchmarks, because the new constant-time
algorithms would incur a larger slowdown for larger exponents.

I noticed this only because I got a production profile for an
application that does a lot of RSA verifications, saw ExpShort show up,
made ExpShort faster, and the crypto/rsa profiles didn't move.

We were measuring the wrong thing, and the slowdown was worse than we
thought. My apologies.

(If E had not been parametrized, it would have avoided issues like this
one, too. Grumble. https://words.filippo.io/parameters/#fn9)

goos: darwin
goarch: arm64
pkg: crypto/rsa
                       │ g35222eeb78 │                 new                 │
                       │   sec/op    │   sec/op     vs base                │
DecryptPKCS1v15/2048-8   1.414m ± 2%   1.417m ± 1%        ~ (p=0.971 n=10)
DecryptPKCS1v15/3072-8   4.107m ± 0%   4.160m ± 1%   +1.29% (p=0.000 n=10)
DecryptPKCS1v15/4096-8   9.363m ± 1%   9.305m ± 1%        ~ (p=0.143 n=10)
EncryptPKCS1v15/2048-8   162.8µ ± 2%   212.1µ ± 0%  +30.34% (p=0.000 n=10)
DecryptOAEP/2048-8       1.460m ± 4%   1.413m ± 1%        ~ (p=0.105 n=10)
EncryptOAEP/2048-8       161.7µ ± 0%   213.4µ ± 0%  +31.99% (p=0.000 n=10)
SignPKCS1v15/2048-8      1.419m ± 1%   1.476m ± 1%   +4.05% (p=0.000 n=10)
VerifyPKCS1v15/2048-8    160.6µ ± 0%   212.6µ ± 3%  +32.38% (p=0.000 n=10)
SignPSS/2048-8           1.419m ± 0%   1.477m ± 2%   +4.07% (p=0.000 n=10)
VerifyPSS/2048-8         163.9µ ± 8%   212.3µ ± 0%  +29.50% (p=0.000 n=10)
geomean                  802.5µ        899.1µ       +12.04%

Updates #63516

Change-Id: Iab4a0684d8101ae07dac8462908d8058fe5e9f3d
Reviewed-on: https://go-review.googlesource.com/c/go/+/552895
Reviewed-by: Roland Shoemaker <roland@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
@FiloSottile
Copy link
Contributor

Submitted to master.

ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
Now, this is embarrassing. For the whole Go 1.20 and Go 1.21 cycles, we
based RSA public key operation (verification and decryption) benchmarks
on the keys in rsa_test.go, which had E = 3. Most keys in use, including
all those generated by GenerateKey, have E = 65537. This significantly
skewed even relative benchmarks, because the new constant-time
algorithms would incur a larger slowdown for larger exponents.

I noticed this only because I got a production profile for an
application that does a lot of RSA verifications, saw ExpShort show up,
made ExpShort faster, and the crypto/rsa profiles didn't move.

We were measuring the wrong thing, and the slowdown was worse than we
thought. My apologies.

(If E had not been parametrized, it would have avoided issues like this
one, too. Grumble. https://words.filippo.io/parameters/#fn9)

goos: darwin
goarch: arm64
pkg: crypto/rsa
                       │ g35222eeb78 │                 new                 │
                       │   sec/op    │   sec/op     vs base                │
DecryptPKCS1v15/2048-8   1.414m ± 2%   1.417m ± 1%        ~ (p=0.971 n=10)
DecryptPKCS1v15/3072-8   4.107m ± 0%   4.160m ± 1%   +1.29% (p=0.000 n=10)
DecryptPKCS1v15/4096-8   9.363m ± 1%   9.305m ± 1%        ~ (p=0.143 n=10)
EncryptPKCS1v15/2048-8   162.8µ ± 2%   212.1µ ± 0%  +30.34% (p=0.000 n=10)
DecryptOAEP/2048-8       1.460m ± 4%   1.413m ± 1%        ~ (p=0.105 n=10)
EncryptOAEP/2048-8       161.7µ ± 0%   213.4µ ± 0%  +31.99% (p=0.000 n=10)
SignPKCS1v15/2048-8      1.419m ± 1%   1.476m ± 1%   +4.05% (p=0.000 n=10)
VerifyPKCS1v15/2048-8    160.6µ ± 0%   212.6µ ± 3%  +32.38% (p=0.000 n=10)
SignPSS/2048-8           1.419m ± 0%   1.477m ± 2%   +4.07% (p=0.000 n=10)
VerifyPSS/2048-8         163.9µ ± 8%   212.3µ ± 0%  +29.50% (p=0.000 n=10)
geomean                  802.5µ        899.1µ       +12.04%

Updates golang#63516

Change-Id: Iab4a0684d8101ae07dac8462908d8058fe5e9f3d
Reviewed-on: https://go-review.googlesource.com/c/go/+/552895
Reviewed-by: Roland Shoemaker <roland@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
Most libraries don't consider N secret, but it's arguably useful for
privacy applications. However, E should generally be fixed, and there is
a lot of performance to be gained by using variable-time exponentiation.

The threshold trick is from BoringSSL.

goos: linux
goarch: amd64
pkg: crypto/rsa
cpu: Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz
                       │     old      │                 new                 │
                       │    sec/op    │   sec/op     vs base                │
DecryptPKCS1v15/2048-4    1.398m ± 0%   1.396m ± 4%        ~ (p=0.853 n=10)
DecryptPKCS1v15/3072-4    3.640m ± 0%   3.652m ± 1%        ~ (p=0.063 n=10)
DecryptPKCS1v15/4096-4    7.756m ± 0%   7.764m ± 0%        ~ (p=0.853 n=10)
EncryptPKCS1v15/2048-4   175.50µ ± 0%   39.37µ ± 0%  -77.57% (p=0.000 n=10)
DecryptOAEP/2048-4        1.375m ± 0%   1.371m ± 1%        ~ (p=0.089 n=10)
EncryptOAEP/2048-4       177.64µ ± 0%   41.17µ ± 1%  -76.82% (p=0.000 n=10)
SignPKCS1v15/2048-4       1.419m ± 0%   1.393m ± 1%   -1.84% (p=0.000 n=10)
VerifyPKCS1v15/2048-4    173.70µ ± 1%   38.28µ ± 2%  -77.96% (p=0.000 n=10)
SignPSS/2048-4            1.437m ± 1%   1.413m ± 0%   -1.64% (p=0.000 n=10)
VerifyPSS/2048-4         176.83µ ± 1%   43.08µ ± 5%  -75.64% (p=0.000 n=10)

This finally makes everything in crypto/rsa faster than it was in Go 1.19.

goos: linux
goarch: amd64
pkg: crypto/rsa
cpu: Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz
                       │ go1.19.txt  │              go1.20.txt               │              go1.21.txt               │               new.txt               │
                       │   sec/op    │    sec/op     vs base                 │    sec/op     vs base                 │   sec/op     vs base                │
DecryptPKCS1v15/2048-4   1.458m ± 0%    1.597m ± 1%    +9.50% (p=0.000 n=10)    1.395m ± 1%    -4.30% (p=0.000 n=10)   1.396m ± 4%   -4.25% (p=0.002 n=10)
DecryptPKCS1v15/3072-4   4.023m ± 1%    5.332m ± 1%   +32.53% (p=0.000 n=10)    3.649m ± 1%    -9.30% (p=0.000 n=10)   3.652m ± 1%   -9.23% (p=0.000 n=10)
DecryptPKCS1v15/4096-4   8.710m ± 1%   11.937m ± 1%   +37.05% (p=0.000 n=10)    7.564m ± 1%   -13.16% (p=0.000 n=10)   7.764m ± 0%  -10.86% (p=0.000 n=10)
EncryptPKCS1v15/2048-4   51.79µ ± 0%   267.68µ ± 0%  +416.90% (p=0.000 n=10)   176.42µ ± 0%  +240.67% (p=0.000 n=10)   39.37µ ± 0%  -23.98% (p=0.000 n=10)
DecryptOAEP/2048-4       1.461m ± 0%    1.613m ± 1%   +10.37% (p=0.000 n=10)    1.415m ± 0%    -3.13% (p=0.000 n=10)   1.371m ± 1%   -6.18% (p=0.000 n=10)
EncryptOAEP/2048-4       54.24µ ± 0%   269.19µ ± 0%  +396.28% (p=0.000 n=10)   177.31µ ± 0%  +226.89% (p=0.000 n=10)   41.17µ ± 1%  -24.10% (p=0.000 n=10)
SignPKCS1v15/2048-4      1.510m ± 0%    1.705m ± 0%   +12.93% (p=0.000 n=10)    1.423m ± 1%    -5.78% (p=0.000 n=10)   1.393m ± 1%   -7.76% (p=0.000 n=10)
VerifyPKCS1v15/2048-4    50.87µ ± 0%   266.41µ ± 1%  +423.71% (p=0.000 n=10)   174.38µ ± 0%  +242.79% (p=0.000 n=10)   38.28µ ± 2%  -24.75% (p=0.000 n=10)
SignPSS/2048-4           1.513m ± 1%    1.709m ± 0%   +12.97% (p=0.000 n=10)    1.461m ± 0%    -3.42% (p=0.000 n=10)   1.413m ± 0%   -6.58% (p=0.000 n=10)
VerifyPSS/2048-4         53.45µ ± 1%   268.56µ ± 0%  +402.48% (p=0.000 n=10)   177.29µ ± 0%  +231.72% (p=0.000 n=10)   43.08µ ± 5%  -19.39% (p=0.000 n=10)
geomean                  514.6µ         1.094m       +112.65%                   801.6µ        +55.77%                  442.1µ       -14.08%

Fixes golang#63516

Change-Id: If40e596a2e4b3ab7a202ff34591cf9cffecfcc1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/552935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
psFried added a commit to estuary/data-plane-gateway that referenced this issue Apr 12, 2024
We're hoping that this will help address the high CPU usage we've observed
in production when there's a lot of new connections being created. See:
golang/go#63516 (comment)
psFried added a commit to estuary/data-plane-gateway that referenced this issue Apr 12, 2024
We're hoping that this will help address the high CPU usage we've observed
in production when there's a lot of new connections being created. See:
golang/go#63516 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done. Performance Security
Projects
None yet
Development

No branches or pull requests

7 participants