Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/rsa: TLS requests are 2-3x slower in go1.20 with race detection enabled #56980

Open
Tracked by #57752
liggitt opened this issue Nov 29, 2022 · 7 comments
Open
Tracked by #57752
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@liggitt
Copy link
Contributor

liggitt commented Nov 29, 2022

What version of Go are you using (go version)?

$ go version

go version devel go1.20-bb0d8297d7 Mon Nov 28 19:38:37 2022 +0000 linux/amd64
and
go version devel go1.20-bb0d8297d7 Mon Nov 28 19:38:37 2022 +0000 darwin/arm64

Does this issue reproduce with the latest release?

No, introduced in go1.20 dev branch in https://go-review.googlesource.com/c/go/+/326012

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/usr/local/google/home/liggitt/.cache/go-build"
GOENV="/usr/local/google/home/liggitt/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/usr/local/google/home/liggitt/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/usr/local/google/home/liggitt/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/google/home/liggitt/git/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/google/home/liggitt/git/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="devel go1.20-bb0d8297d7 Mon Nov 28 19:38:37 2022 +0000"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1952584495=/tmp/go-build -gno-record-gcc-switches"

and

GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/liggitt/Library/Caches/go-build"
GOENV="/Users/liggitt/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/liggitt/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/liggitt/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/liggitt/git/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/liggitt/git/go/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="devel go1.20-bb0d8297d7 Mon Nov 28 19:38:37 2022 +0000"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/37/ns7gt60104scfm9fvg02p1jh00kjgb/T/go-build3981291129=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Ran project unit tests making TLS requests with race detection enabled (xref kubernetes/kubernetes#114145 (comment))

Minimal reproducer:

package main

import (
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"
	"time"
)

func TestRequest(t *testing.T) {
	transport, req := setup(t)
	start := time.Now()
	do(t, transport, req)
	t.Log(time.Since(start))
}

func BenchmarkRequest(b *testing.B) {
	transport, req := setup(b)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		do(b, transport, req)
	}
}

func setup(t testing.TB) (*http.Transport, *http.Request) {
	data := []byte(strings.Repeat("a", 1024*1024)) // 1kb of data
	server := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) { w.Write(data) }))
	t.Cleanup(server.Close)

	transport := server.Client().Transport.(*http.Transport).Clone()
	transport.DisableKeepAlives = true // dial every time

	req, err := http.NewRequest("GET", server.URL, nil)
	if err != nil {
		t.Fatal(err)
	}

	return transport, req
}

func do(t testing.TB, transport *http.Transport, req *http.Request) {
	resp, err := transport.RoundTrip(req)
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != 200 {
		t.Fatal("unexpected status")
	}
}

run with

$go1.19.3 test main_test.go -race -bench . -count=20 > 1.19.3-race.txt
$go1.19.3 test main_test.go -bench . -count=20 > 1.19.3-norace.txt
$gotip test main_test.go -race -bench . -count=20 > head-race.txt
$gotip test main_test.go -bench . -count=20 > head-norace.txt
benchstat 1.19.3-race.txt head-race.txt
benchstat 1.19.3-norace.txt head-norace.txt

What did you expect to see?

TLS request times are roughly equivalent to go1.19 regardless of race detection.

What did you see instead?

to make the same request as with go1.19, go1.20 takes ~2x longer on linux/amd64, ~3x longer on darwin/arm64, and even longer on linux/ppc64le (I'm working on getting numbers with the minimal reproducer above)

linux/amd64:

name        old time/op  new time/op   delta
Request-24  8.92ms ± 3%  16.82ms ± 1%  +88.45%  (p=0.000 n=20+20)

darwin/arm64

name        old time/op  new time/op   delta
Request-10  9.32ms ± 5%  30.26ms ± 1%  +224.58%  (p=0.000 n=19+18)

This is with a simple request that only has the client verifying the server cert. When more is involved (mutual TLS or an RSA JWT verification), the slowdown is even more pronounced (3x - 7x slowdowns observed in kubernetes/kubernetes#114145).

Without race detection, the slowdown due to https://go-review.googlesource.com/c/go/+/326012 was not as drastic, but was still milliseconds longer per request:

linux/amd64:

name        old time/op  new time/op  delta
Request-24  3.18ms ± 3%  4.02ms ± 3%  +26.58%  (p=0.000 n=19+19)

darwin/arm64

name        old time/op  new time/op  delta
Request-10  5.68ms ± 8%  7.72ms ±31%  +35.90%  (p=0.000 n=20+20)

cc @FiloSottile

@liggitt liggitt changed the title crypto/rsa: TLS requests are 2x slower in go1.20 with race detection enabled crypto/rsa: TLS requests are 2-3x slower in go1.20 with race detection enabled Nov 29, 2022
@mkumatag
Copy link

cc @laboger @archanaravindar

@FiloSottile
Copy link
Contributor

Thank you for the details! If I understand the benchmark correctly, both the server and the client side of the handshake contribute to the benchmarked time, so this is observing an RSA-2048 sign+verify cycle. In that case +26% on amd64 and +35% on arm64 is more or less in line with the benchmarks we had.

I had not benchmarked with the race detector enabled, and it's interesting the performance profile is so different. I can look into it, but it would help to understand if race detector-enabled performance is a goal. The Go 1.20 changes are a tradeoff that knowingly sacrifice a bit of performance for a lot of security by cutting a large chunk of attack surface (math/big).

In the meantime, if you just want your tests to be faster (assuming that's where the race detector matters), you might consider using a smaller (insecure) RSA key, or checking if ECDSA keys perform better.

https://go-review.git.corp.google.com/c/go/+/326012

(Please use https://go.dev/cl/ links, these links don't work outside Google. There's an internal Chrome extension you can use to easily copy a shortened version of the link to the current page. Thank you!)

@liggitt
Copy link
Contributor Author

liggitt commented Nov 29, 2022

In that case +26% on amd64 and +35% on arm64 is more or less in line with the benchmarks we had.

It wasn't clear to me that the benchmarks in http://go.dev/cl/326012 represented known multi-millisecond slowdowns for requests (where both the client and server performance is impacted in serial). The CL also alluded to possible optimizations ("this is without any assembly at all, and that further improvements are likely possible"), and I was surprised to see it merge without attempting to preserve status quo for performance. Are any of those improvements planned before go1.20 freezes?

it would help to understand if race detector-enabled performance is a goal

It's not specifically a goal, but the change turned several of our tests that were previously solid green (taking ~30ms to make a request with a 100ms timeout) into flaking or solidly failing (taking 100-200ms+ to make the same request) on some architectures.

As we are working to be able to update Kubernetes release branches to new go minor versions with minimal code changes, knowing this is in the pipeline and anticipating chasing down new test flakes and backporting test changes to release branches as a result of a go version bump is a red flag.

@FiloSottile
Copy link
Contributor

In that case +26% on amd64 and +35% on arm64 is more or less in line with the benchmarks we had.

It wasn't clear to me that the benchmarks in http://go.dev/cl/326012 represented known multi-millisecond slowdowns for requests (where both the client and server performance is impacted in serial). The CL also alluded to possible optimizations ("this is without any assembly at all, and that further improvements are likely possible"), and I was surprised to see it merge without attempting to preserve status quo for performance. Are any of those improvements planned before go1.20 freezes?

Go 1.20 already froze. We landed https://go.dev/cl/445019, https://go.dev/cl/445020, https://go.dev/cl/445021, and https://go.dev/cl/452095 to reduce the performance gap. https://go.dev/cl/452095 has the final benchmarks on amd64 compared to Go 1.19, which amount to 500µs for sign+verify on the Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz I ran them on, which is more or less in line with the 840µs you're seeing. We do have extra optimization work planned for Go 1.21, and the new backend can in theory become faster than the old one with the same amount of assembly.

When you say multi-millisecond slowdowns, I assume you mean with the race detector enabled. That's interesting and worth looking into (it might be an issue in the race detector or in the interaction, maybe?) but the question is whether race detector-enabled performance is a blocker worth reverting the security improvement over.

@randall77
Copy link
Contributor

I suspect that the greater slowdowns with the race detector just mirror the fact that the new implementation is more Go and less assembly. The race detector instrumentation only gets inserted in Go code.

I don't think race detector performance should be a release blocker.

@seankhliao seankhliao added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Nov 29, 2022
@liggitt
Copy link
Contributor Author

liggitt commented Nov 29, 2022

When you say multi-millisecond slowdowns, I assume you mean with the race detector enabled

No, without race detection, a dial+handshake+request took at least a millisecond longer on the machines I tested, and in some architectures, multiple milliseconds longer. That reflects real runtime impact, which I was surprised by.

I don't think race detector performance should be a release blocker.

I tend to agree (pain of backporting deflaking test changes notwithstanding), but I was surprised by the degree of the performance drop, and wanted to surface what we saw in case it indicated a bigger problem.

@Rajalakshmi-Girish
Copy link

and even longer on linux/ppc64le (I'm working on getting numbers with the minimal reproducer above)

@liggitt
linux/ppc64le:

name       old time/op  new time/op   delta
Request-8  33.3ms ±17%  136.4ms ± 5%  +309.28%  (p=0.000 n=18+16)

Have run your Minimal reproducer in description.

[root@raji-workspace ~]# go1.19.3 version
go version go1.19.3 linux/ppc64le
[root@raji-workspace ~]# gotip version
go version devel go1.20-78472603c6 Tue Nov 29 22:59:16 2022 +0000 linux/ppc64le
[root@raji-workspace ~]# go1.19.3 test main_test.go -race -bench . -count=20 > 1.19.3-race.txt
[root@raji-workspace ~]# go1.19.3 test main_test.go -bench . -count=20 > 1.19.3-norace.txt
[root@raji-workspace ~]# gotip test main_test.go -race -bench . -count=20 > head-race.txt
[root@raji-workspace ~]# gotip test main_test.go -bench . -count=20 > head-norace.txt
[root@raji-workspace ~]#  benchstat 1.19.3-race.txt head-race.txt
name       old time/op  new time/op   delta
Request-8  33.3ms ±17%  136.4ms ± 5%  +309.28%  (p=0.000 n=18+16)
[root@raji-workspace ~]#

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

6 participants