Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: TestRequestLimit/h2 becomes significantly more expensive and slower after x/net@v0.23.0 #66668

Closed
dmitshur opened this issue Apr 3, 2024 · 15 comments
Labels
FixPending Issues that have a fix which has not yet been reviewed or submitted. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@dmitshur
Copy link
Contributor

dmitshur commented Apr 3, 2024

#!watchflakes
post <-  pkg == "net/http" && test ~ `TestRequestLimit` && status == "ABORT"

I've initially observed this on various builders, most often linux-386 ones, but it seems to reproduce on darwin/arm64 too.

At tip without x/net@v0.23.0 (e.g., commit 61a3ee5), TestRequestLimit/h2 consistently passes in under a second:

$ go test -count=1 net/http -v -run=^TestRequestLimit$
=== RUN   TestRequestLimit
=== RUN   TestRequestLimit/h1
=== RUN   TestRequestLimit/h2
--- PASS: TestRequestLimit (0.14s)
    --- PASS: TestRequestLimit/h1 (0.06s)
    --- PASS: TestRequestLimit/h2 (0.08s)
PASS
ok  	net/http	0.491s
$ go test -count=1 net/http -run=^TestRequestLimit$
ok  	net/http	0.442s
$ go test -count=1 net/http -run=^TestRequestLimit$
ok  	net/http	0.395s
$ go test -count=1 net/http -run=^TestRequestLimit$
ok  	net/http	0.425s

When x/net@v0.23.0 is pulled in (e.g., CL 576295), it takes anywhere between 15 seconds and 200+ seconds:

$ go test -count=1 net/http -v -run=^TestRequestLimit$
=== RUN   TestRequestLimit
=== RUN   TestRequestLimit/h1
=== RUN   TestRequestLimit/h2
--- PASS: TestRequestLimit (14.94s)
    --- PASS: TestRequestLimit/h1 (0.06s)
    --- PASS: TestRequestLimit/h2 (14.88s)
PASS
ok  	net/http	15.327s
$ go test -count=1 net/http -v -run=^TestRequestLimit$
=== RUN   TestRequestLimit
=== RUN   TestRequestLimit/h1
=== RUN   TestRequestLimit/h2
--- PASS: TestRequestLimit (266.59s)
    --- PASS: TestRequestLimit/h1 (0.06s)
    --- PASS: TestRequestLimit/h2 (266.52s)
PASS
ok  	net/http	266.886s
$ go test -count=1 net/http -v -run=^TestRequestLimit$
=== RUN   TestRequestLimit
=== RUN   TestRequestLimit/h1
=== RUN   TestRequestLimit/h2
--- PASS: TestRequestLimit (205.00s)
    --- PASS: TestRequestLimit/h1 (0.05s)
    --- PASS: TestRequestLimit/h2 (204.95s)
PASS
ok  	net/http	205.313s

I'm seeing around 125% CPU usage while this one test runs. This seems to cause it to exceed the 180 second timeout on some builders, especially when running all other tests and not just this one.

CC @neild.

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 3, 2024
@dmitshur dmitshur added this to the Go1.23 milestone Apr 3, 2024
@neild
Copy link
Contributor

neild commented Apr 5, 2024

The root cause of this is an unfortunate side-effect of the fix for #65051. It's arguably a bug in that fix.

When the HTTP/2 server rejects a connection due to a bad request, it sends a GOAWAY frame and closes the connection. The GOAWAY includes the ID of the most recent request processed by the server. The problem is that the GOAWAY sent by the server in this case includes the ID of the previous request, not the one which caused us to close the connection.

When the HTTP/2 client gets a GOAWAY with an error for a connection where it has sent only one request, it should assume that there's something wrong with that request and not retry it, even if the GOAWAY doesn't indicate that this request was seen by the server. It does not. This is #60636.

The interaction of these two behaviors means that in TestRequestLimit, requests get sent, rejected, and retried repeatedly.

One half of the fix will be to make the GOAWAY sent when rejecting a request correctly indicate that the server saw that request. The other will be to finally fix #60636, if I can figure out a good way to do so. (It's tricky, because the retry behavior is spread between net/http and x/net/http, and the interactions between the two sides are complicated.)

@gopherbot
Copy link

Change https://go.dev/cl/576756 mentions this issue: http2: send correct LastStreamID in stream-caused GOAWAY

@MadhavJivrajani
Copy link

Thank you for the RCA and the potential fix @neil, I really appreciate the detailed write-up! ♥️

@MadhavJivrajani
Copy link

@neild @dmitshur 👋🏼

Hey folks, thank you for tracking this. I'm trying to understand the process better here, and apologies if I've missed reading about this somewhere, would the bug fix in https://go.dev/cl/576756 warrant another patch release for it? Or would this then be included whenever the next one happens to be planned in the future? Thanks!

gopherbot pushed a commit to golang/net that referenced this issue Apr 5, 2024
When closing a connection because a stream contained a request we
didn't like (for example, because the request headers exceed
the maximum we will accept), set the LastStreamID in the GOAWAY
frame to include the offending stream. This informs the client
that retrying the request is unlikely to succeed, and avoids
retry loops.

This change requires passing the stream ID of the offending
stream from Framer.ReadFrame up to the caller. The most sensible
way to do this would probably be in the error. However,
ReadFrame currently returns a defined error type for
connection-ending errors (ConnectionError), and that type is a
uint32 with no place to put the stream ID. Rather than changing
the returned errors, ReadFrame now returns an error along with
a non-nil Frame containing the stream ID, when a stream is
responsible for a connection-ending error.

For golang/go#66668

Change-Id: Iba07ccbd70ab4939aa56903605474d01703ac6e4
Reviewed-on: https://go-review.googlesource.com/c/net/+/576756
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@gopherbot
Copy link

Change https://go.dev/cl/576895 mentions this issue: http2: don't retry the first request on a connection on GOAWAY error

@neild
Copy link
Contributor

neild commented Apr 5, 2024

@gopherbot please open backport issues. This is a bug/deficiency in a previous cherrypick, so we should backport the fix as well.

@gopherbot
Copy link

Backport issue(s) opened: #66697 (for 1.21), #66698 (for 1.22).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@neild
Copy link
Contributor

neild commented Apr 5, 2024

@MadhavJivrajani I don't believe this warrants an off-schedule minor release, but I do think we should backport https://go.dev/cl/576756 into the next minor releases.

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. FixPending Issues that have a fix which has not yet been reviewed or submitted. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Apr 5, 2024
@gopherbot
Copy link

Change https://go.dev/cl/576679 mentions this issue: all: update vendored golang.org/x/net

gopherbot pushed a commit to golang/net that referenced this issue Apr 5, 2024
When a server sends a GOAWAY frame, it indicates the ID of the
last stream it processed. We use this to mark any requests after
that stream as being safe to retry on a new connection.

Change this to not retry the first request on a connection if we
get a GOAWAY with an error, even if the GOAWAY has a stream ID
of 0 indicating that it didn't process that request.
If we're getting an error as the first result on a new connection,
then there's either something wrong with the server or something
wrong with our request; either way, retrying isn't likely to be
productive and may be unsafe.

This matches the behavior of the HTTP/1 client, which
also avoids retrying the first request on a new connection.

For golang/go#66668
Fixes golang/go#60636

Change-Id: I90ea7cfce2974dd413f7cd8b78541678850376a5
Reviewed-on: https://go-review.googlesource.com/c/net/+/576895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
@gopherbot

This comment was marked as resolved.

gopherbot pushed a commit that referenced this issue Apr 8, 2024
Pull in CL 576895:

	ec05fdcd http2: don't retry the first request on a connection on GOAWAY error

For #66668.
Fixes #60636.

Change-Id: I9903607e3d432a5db0325da82eb7f4b378fbddde
Reviewed-on: https://go-review.googlesource.com/c/go/+/576976
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@gopherbot
Copy link

Change https://go.dev/cl/578336 mentions this issue: [internal-branch.go1.21-vendor] http2: send correct LastStreamID in stream-caused GOAWAY

@gopherbot
Copy link

Change https://go.dev/cl/578338 mentions this issue: [internal-branch.go1.22-vendor] http2: send correct LastStreamID in stream-caused GOAWAY

gopherbot pushed a commit to golang/net that referenced this issue Apr 12, 2024
…tream-caused GOAWAY

When closing a connection because a stream contained a request we
didn't like (for example, because the request headers exceed
the maximum we will accept), set the LastStreamID in the GOAWAY
frame to include the offending stream. This informs the client
that retrying the request is unlikely to succeed, and avoids
retry loops.

This change requires passing the stream ID of the offending
stream from Framer.ReadFrame up to the caller. The most sensible
way to do this would probably be in the error. However,
ReadFrame currently returns a defined error type for
connection-ending errors (ConnectionError), and that type is a
uint32 with no place to put the stream ID. Rather than changing
the returned errors, ReadFrame now returns an error along with
a non-nil Frame containing the stream ID, when a stream is
responsible for a connection-ending error.

Merge conflicts were avoided by cherry-picking CL 576235 (test deflake)
prior to this, and then by squashing CL 576175 (typo fix) into this CL.

For golang/go#66668.
For golang/go#66697.

Change-Id: Iba07ccbd70ab4939aa56903605474d01703ac6e4
Reviewed-on: https://go-review.googlesource.com/c/net/+/576756
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-on: https://go-review.googlesource.com/c/net/+/578336
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
gopherbot pushed a commit to golang/net that referenced this issue Apr 12, 2024
…tream-caused GOAWAY

When closing a connection because a stream contained a request we
didn't like (for example, because the request headers exceed
the maximum we will accept), set the LastStreamID in the GOAWAY
frame to include the offending stream. This informs the client
that retrying the request is unlikely to succeed, and avoids
retry loops.

This change requires passing the stream ID of the offending
stream from Framer.ReadFrame up to the caller. The most sensible
way to do this would probably be in the error. However,
ReadFrame currently returns a defined error type for
connection-ending errors (ConnectionError), and that type is a
uint32 with no place to put the stream ID. Rather than changing
the returned errors, ReadFrame now returns an error along with
a non-nil Frame containing the stream ID, when a stream is
responsible for a connection-ending error.

Merge conflicts were avoided by cherry-picking CL 576235 (test deflake)
prior to this, and then by squashing CL 576175 (typo fix) into this CL.

For golang/go#66668.
For golang/go#66698.

Change-Id: Iba07ccbd70ab4939aa56903605474d01703ac6e4
Reviewed-on: https://go-review.googlesource.com/c/net/+/576756
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-on: https://go-review.googlesource.com/c/net/+/578338
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
@gopherbot
Copy link

Change https://go.dev/cl/578357 mentions this issue: [release-branch.go1.21] net/http: update bundled golang.org/x/net/http2

@gopherbot
Copy link

Change https://go.dev/cl/578358 mentions this issue: [release-branch.go1.22] net/http: update bundled golang.org/x/net/http2

gopherbot pushed a commit that referenced this issue Apr 12, 2024
Pull in CL 578338:

	db050b07 http2: send correct LastStreamID in stream-caused GOAWAY

For #66668.
Fixes #66698.

Change-Id: Ie7cbc44cd559eb8bc34f6c4ad4ead678ec2f55ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/578358
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Apr 12, 2024
Pull in CL 578336:

	ef58d90f http2: send correct LastStreamID in stream-caused GOAWAY

For #66668.
Fixes #66697.

Change-Id: I91fc8a67f21fadcb1801ff29d5e2b0453db89617
Reviewed-on: https://go-review.googlesource.com/c/go/+/578357
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@gopherbot
Copy link

Found new dashboard test flakes for:

#!watchflakes
post <-  pkg == "net/http" && test ~ `TestRequestLimit` && status == "ABORT"
2024-04-03 15:10 go1.21-linux-386 release-branch.go1.21@ae591334 net/http.TestRequestLimit/h2 [ABORT] (log)
=== RUN   TestRequestLimit/h2
=== PAUSE TestRequestLimit/h2
=== CONT  TestRequestLimit/h2
2024-04-03 15:10 go1.21-linux-amd64-boringcrypto release-branch.go1.21@ae591334 net/http.TestRequestLimit/h2 [ABORT] (log)
=== RUN   TestRequestLimit/h2
=== PAUSE TestRequestLimit/h2
=== CONT  TestRequestLimit/h2
2024-04-03 15:10 go1.21-linux-amd64-clang15 release-branch.go1.21@ae591334 net/http.TestRequestLimit/h1 [ABORT] (log)
=== RUN   TestRequestLimit/h1
=== PAUSE TestRequestLimit/h1
=== CONT  TestRequestLimit/h1
2024-04-03 15:10 go1.21-linux-amd64-noopt release-branch.go1.21@ae591334 net/http.TestRequestLimit/h2 [ABORT] (log)
=== RUN   TestRequestLimit/h2
=== PAUSE TestRequestLimit/h2
=== CONT  TestRequestLimit/h2
2024-04-03 15:10 go1.22-linux-386-softfloat release-branch.go1.22@e55d7cf8 net/http.TestRequestLimit/h2 [ABORT] (log)
=== RUN   TestRequestLimit/h2
=== PAUSE TestRequestLimit/h2
=== CONT  TestRequestLimit/h2
2024-04-03 15:35 go1.21-linux-amd64-boringcrypto release-branch.go1.21@d8392e69 net/http.TestRequestLimit/h1 [ABORT] (log)
=== RUN   TestRequestLimit/h1
=== PAUSE TestRequestLimit/h1
=== CONT  TestRequestLimit/h1
2024-04-03 15:35 go1.21-linux-amd64-goamd64v3 release-branch.go1.21@d8392e69 net/http.TestRequestLimit/h1 [ABORT] (log)
=== RUN   TestRequestLimit/h1
=== PAUSE TestRequestLimit/h1
=== CONT  TestRequestLimit/h1
2024-04-03 15:35 go1.22-linux-386-softfloat release-branch.go1.22@dddf0ae4 net/http.TestRequestLimit/h2 [ABORT] (log)
=== RUN   TestRequestLimit/h2
=== PAUSE TestRequestLimit/h2
=== CONT  TestRequestLimit/h2
2024-04-03 18:00 go1.22-linux-386-softfloat release-branch.go1.22@a65a2bbd net/http.TestRequestLimit/h2 [ABORT] (log)
=== RUN   TestRequestLimit/h2
=== PAUSE TestRequestLimit/h2
=== CONT  TestRequestLimit/h2

watchflakes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FixPending Issues that have a fix which has not yet been reviewed or submitted. NeedsFix The path to resolution is known, but the work has not been done.
Projects
Status: No status
Development

No branches or pull requests

4 participants