Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/rsa: Some severe performance regressions in Go 1.20 #59442

Closed
Tracked by #57752
sywhang opened this issue Apr 4, 2023 · 19 comments
Closed
Tracked by #57752

crypto/rsa: Some severe performance regressions in Go 1.20 #59442

sywhang opened this issue Apr 4, 2023 · 19 comments
Assignees
Labels
NeedsFix The path to resolution is known, but the work has not been done. Performance release-blocker
Milestone

Comments

@sywhang
Copy link
Contributor

sywhang commented Apr 4, 2023

What version of Go are you using (go version)?

$ go version
1.20.3

Does this issue reproduce with the latest release?

Yes (only on latest release).

What operating system and processor architecture are you using (go env)?

linux/amd64

go env Output
$ go env
~ » go env                                                                                                                  sungyoon@sungyoon
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/user/.cache/go-build"
GOENV="/home/user/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/user/go-repos/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/user/go-repos:/opt/go/path:/home/user/go-code"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/opt/go/root"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/opt/go/root/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.2"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2856124056=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Some services at Uber started seeing severe performance degradation after upgrading to Go 1.20.

Profiles revealed crypto/rsa related stacks showing up everywhere.

Here is a repro benchmark that shows around ~60% regression compared to Go 1.19:

package cryptosign

import (
        "crypto"
        "crypto/rand"
        "crypto/rsa"
        "crypto/sha256"
)

func Sign(key any, msg []byte) (sig []byte, err error) {
        k, _ := key.(*rsa.PrivateKey)
        h := sha256.New()
        h.Write(msg)
        return rsa.SignPKCS1v15(rand.Reader, k, crypto.SHA256, h.Sum(nil))
}

func BenchmarkSign(b *testing.B) {
        msg := []byte("secret text")
        rsaKey, _ := rsa.GenerateKey(rand.Reader, 2048)

        b.ResetTimer()

        for i := 0; i < b.N; i++ {
                Sign(rsaKey, msg)
        }
}
benchstat before.txt after.txt                                                                                                                                                                                
goos: linux
goarch: amd64
pkg: github.com/sywhang/issues/cryptosign
cpu: AMD EPYC 7B13
        │ before.txt  │              after.txt              │
        │   sec/op    │   sec/op     vs base                │
Sign-96   1.246m ± 2%   2.009m ± 7%  +61.23% (p=0.000 n=10)

What did you expect to see?

I am aware of the new crypto/rsa changes that were introduced in Go 1.20 that involves removing big.Int to bigmod changes, which could be related to the regression. (#56980).

This benchmark was created based on profile from a single service that reported this issue internally, and there may be more paths in crypto/rsa that has similar issues. Will update as we find more such paths if we find any.

What did you see instead?

60% regression as noted above.

@ianlancetaylor
Copy link
Contributor

CC @golang/security

@ianlancetaylor ianlancetaylor added this to the Go1.21 milestone Apr 4, 2023
@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels Apr 4, 2023
@ianlancetaylor
Copy link
Contributor

CC @golang/runtime

@ianlancetaylor ianlancetaylor added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 4, 2023
@FiloSottile
Copy link
Contributor

On the amd64 platforms I was able to benchmark the slowdown of RSA-2048 was 20% as noted in the release notes. Looks like AMD EPYC is suffering more than the Intel CPUs I tested. We have work planned as part of #57752 that should bring the performance of Go 1.21 almost in line with Go 1.19 (if not better).

@ianlancetaylor
Copy link
Contributor

@FiloSottile Should we mark #57752 as a 1.21 release blocker?

@rsc
Copy link
Contributor

rsc commented Apr 5, 2023

Do we know what's different about the AMD EPYCs that accounts for 60% slowdown instead of 20% slowdown?
Which part of #57752 is specifically going to help the AMD EPYCs?

@FiloSottile
Copy link
Contributor

Do we know what's different about the AMD EPYCs that accounts for 60% slowdown instead of 20% slowdown? Which part of #57752 is specifically going to help the AMD EPYCs?

Honestly, I don't know what AMD EPYCs are doing differently, but https://go.dev/cl/471259 is switching the hot loop back to the same assembly that math/big was using, so I am reasonably confident it will revert the slowdown.

Filed #59463 specifically to track that change, added it to #57752. Let's keep this open to benchmark the new code on EPYCs?

@sywhang
Copy link
Contributor Author

sywhang commented Apr 10, 2023

Can confirm https://go-review.googlesource.com/c/go/+/471259/ reduces part of the overhead from switch to bigmod.
Compared with Go 1.20:

goos: linux
goarch: amd64
pkg: github.com/sywhang/issues/cryptosign
cpu: AMD EPYC 7B13
        │  after.txt  │              after2.txt              │
        │   sec/op    │    sec/op     vs base                │
Sign-96   2.009m ± 7%   1.473m ± 15%  -26.68% (p=0.000 n=10)

Compared with Go 1.19:

goos: linux
goarch: amd64
pkg: github.com/sywhang/issues/cryptosign
cpu: AMD EPYC 7B13
        │ before.txt  │              after2.txt              │
        │   sec/op    │    sec/op     vs base                │
Sign-96   1.246m ± 2%   1.473m ± 15%  +18.21% (p=0.000 n=10)

@evanj
Copy link
Contributor

evanj commented Apr 13, 2023

Is there any chance that this change to crypto/rsa might be considered for a backport? A 60% performance degredation is pretty bad for some applications, and the August release of 1.21 is 4 months away still. We could do our own backport, but maintaining our own build of the Go tools is a fair amount of work.

@ianlancetaylor
Copy link
Contributor

@evanj We'll have to see a completed fix, and give it plenty of time for testing, before we are able to consider a backport. While performance is important, it would of course be a bad idea to backport a risky fix to security code.

@evanj
Copy link
Contributor

evanj commented Apr 13, 2023

Looks like GOEXPERIMENT=boringcrypto might be a good workaround for some applications. It seems likely we will test this on one of the applications at Datadog that is running into this at least. (Speaking of experimental changes to security code :) )

@FiloSottile
Copy link
Contributor

Thank you for the benchmark @sywhang. Still surprising there would be a +18% difference when the code spend most of its time in the same assembly core now, and on Intel I see +5%. I'll rent an AMD EPYC to get some profiles.

Indeed, this is a bigger change than can be backported, at least quickly. In the future, this is the kind of feedback that is very useful during the release candidate phase: we had a rollback ready in case issues like this arose before Go 1.20 was released.

@sywhang
Copy link
Contributor Author

sywhang commented Apr 13, 2023

@FiloSottile I'd be happy to share the profiles with you, if that helps.

Certainly, I agree with you this regression would've been much better to catch during rc phase. The issue though is that not many service owners want to deploy rc builds to prod or or even staging. My team doesn't own any services that have meaningful traffic, or even one that exercises the code path. We only heard about the regression from service owners after we upgraded the whole company to the new version after several steps of verification.

@evanj
Copy link
Contributor

evanj commented Apr 14, 2023

You make a great point that it would be better if we could test release candidates in production. I would like to do this, but prioritizing that work has been hard, as you might be able to guess by the fact that we are just migrating to 1.20 now, after 1.20.3 has been released!

FWIW I'm not sure our slowdown is specifically EPYC related. I'll try to extract a reproduction case from our workload and share some details once we have more fully investigated it.

@evanj
Copy link
Contributor

evanj commented Apr 18, 2023

Ugh well we tracked down one of our problems: Some code was inadvertently calling json.Marshall/json.Unmarshal on rsa.PrivateKey; the new private fields in rsa.PrecomputedValues don't get round-tripped, which cause this code to execute a really slow RSA path. The "quick fix" is to call rsa.PrivateKey.Precompute() after Unmarshal, but the real fix is that code should probably not be calling json.Marshal on rsa.PrivateKey at all ...

@FiloSottile
Copy link
Contributor

Oh! That's a very interesting observation @evanj. I wonder if we should implement a custom json.Marshaler to marshal the private fields, or even just an Unmarshaler that runs precomputation (although that might be too slow to do in the unmarshaling path).

Would you mind opening a separate issue about marhsal/unmarshal round-tripping?

@rsc
Copy link
Contributor

rsc commented Apr 18, 2023

On the general topic of marshal/unmarshal changes, we had a long conversation to decide the policy on #10275, specifically #10275 (comment).

This suggests that the rule is: if there's an existing, reasonable marshaling, we can't change it. If the marshaling is useless, then we can correct it as a kind of bug fix.

I would amend that to say that if we're already generating a JSON struct, then we can change it in backwards compatible ways, as long as it really is backwards compatible for both new-marshal/old-unmarshal and old-marshal/new-unmarshal.

It sounds like we broke the rule to some extent with the RSA changes, and we should look into some kind of fix, perhaps calling Precompute during Unmarshal.

@evanj
Copy link
Contributor

evanj commented Apr 18, 2023

I created #59695 for the json.Unmarshal issue. Thank you!

@FiloSottile
Copy link
Contributor

I reproduced a 70% slowdown on 3rd gen EPYCs, and although I haven't diagnosed the cause, the new strategy and assembly in CL 471259 regains all that and then some.

goos: linux
goarch: amd64
pkg: crypto/rsa
cpu: AMD EPYC 7R13 Processor
                       │ go1.19a.txt │               go1.20a.txt               │                newa.txt                 │
                       │   sec/op    │    sec/op      vs base                  │    sec/op      vs base                  │
DecryptPKCS1v15/2048-4   970.0µ ± 0%    1667.6µ ± 0%    +71.92% (p=0.000 n=10)     951.6µ ± 0%     -1.90% (p=0.002 n=10)
DecryptPKCS1v15/3072-4   2.949m ± 0%     5.124m ± 0%    +73.75% (p=0.000 n=10)     2.675m ± 0%     -9.29% (p=0.000 n=10)
DecryptPKCS1v15/4096-4   6.350m ± 0%    11.660m ± 0%    +83.62% (p=0.000 n=10)     5.746m ± 0%     -9.51% (p=0.000 n=10)
EncryptPKCS1v15/2048-4   6.605µ ± 1%   183.807µ ± 0%  +2683.05% (p=0.000 n=10)   123.720µ ± 0%  +1773.27% (p=0.000 n=10)
DecryptOAEP/2048-4       973.8µ ± 0%    1670.8µ ± 0%    +71.57% (p=0.000 n=10)     951.8µ ± 0%     -2.27% (p=0.002 n=10)
EncryptOAEP/2048-4       8.444µ ± 1%   185.889µ ± 0%  +2101.56% (p=0.000 n=10)   124.142µ ± 0%  +1370.27% (p=0.000 n=10)
SignPKCS1v15/2048-4      976.8µ ± 0%    1725.5µ ± 0%    +76.65% (p=0.000 n=10)     979.6µ ± 0%     +0.28% (p=0.000 n=10)
VerifyPKCS1v15/2048-4    5.713µ ± 0%   182.983µ ± 0%  +3103.19% (p=0.000 n=10)   122.737µ ± 0%  +2048.56% (p=0.000 n=10)
SignPSS/2048-4           980.3µ ± 0%    1729.5µ ± 0%    +76.42% (p=0.000 n=10)     985.7µ ± 3%     +0.55% (p=0.000 n=10)
VerifyPSS/2048-4         8.168µ ± 1%   185.312µ ± 0%  +2168.76% (p=0.000 n=10)   123.772µ ± 0%  +1415.33% (p=0.000 n=10)

@gopherbot
Copy link

Change https://go.dev/cl/471259 mentions this issue: crypto/internal/bigmod: switch to saturated limbs

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels May 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done. Performance release-blocker
Projects
None yet
Development

No branches or pull requests

9 participants