Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: TestTransportAndServerSharedBodyRace failures with Proxy copy error #58398

Closed
gopherbot opened this issue Feb 7, 2023 · 5 comments
Closed
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@gopherbot
Copy link

gopherbot commented Feb 7, 2023

#!watchflakes
post <- pkg == "net/http" && test == "TestTransportAndServerSharedBodyRace" && `Proxy copy error:` && date < "2023-10-01"

Issue created automatically to collect these failures.

Example (log):

Proxy copy error: EOF
--- FAIL: TestTransportAndServerSharedBodyRace (0.01s)
    --- FAIL: TestTransportAndServerSharedBodyRace/h2 (0.02s)
        serve_test.go:3969: Proxy copy error: EOF
        clientserver_test.go:210: server log: http: TLS handshake error from 127.0.0.1:51608: read tcp 127.0.0.1:36197->127.0.0.1:51608: use of closed network connection

watchflakes

@gopherbot gopherbot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 7, 2023
@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && test == "TestTransportAndServerSharedBodyRace"
2023-02-07 22:22 linux-amd64-clang go@9565d990 net/http.TestTransportAndServerSharedBodyRace (log)
Proxy copy error: EOF
--- FAIL: TestTransportAndServerSharedBodyRace (0.01s)
    --- FAIL: TestTransportAndServerSharedBodyRace/h2 (0.02s)
        serve_test.go:3969: Proxy copy error: EOF
        clientserver_test.go:210: server log: http: TLS handshake error from 127.0.0.1:51608: read tcp 127.0.0.1:36197->127.0.0.1:51608: use of closed network connection

watchflakes

@bcmills bcmills changed the title net/http: TestTransportAndServerSharedBodyRace failures net/http: TestTransportAndServerSharedBodyRace failures with Proxy copy error Feb 8, 2023
@bcmills
Copy link
Contributor

bcmills commented Feb 8, 2023

(attn @neild)

#49336 is for the same test, but has a different (possibly unrelated?) failure mode.

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && test == "TestTransportAndServerSharedBodyRace" && `Proxy copy error:`
2023-03-08 21:46 linux-amd64 go@4df95d51 net/http.TestTransportAndServerSharedBodyRace (log)
Proxy copy error: EOF
--- FAIL: TestTransportAndServerSharedBodyRace (0.00s)
    --- FAIL: TestTransportAndServerSharedBodyRace/h2 (0.03s)
        serve_test.go:3973: Proxy copy error: EOF

watchflakes

@bcmills bcmills added this to the Go1.22 milestone Sep 11, 2023
@bcmills bcmills added the Testing An issue that has been verified to require only test changes, not just a test failure. label Sep 11, 2023
@bcmills bcmills assigned bcmills and unassigned bcmills Sep 11, 2023
@bcmills bcmills removed the Testing An issue that has been verified to require only test changes, not just a test failure. label Sep 11, 2023
@gopherbot
Copy link
Author

Change https://go.dev/cl/527196 mentions this issue: net/http: scale rstAvoidanceDelay to reduce test flakiness

gopherbot pushed a commit that referenced this issue Sep 13, 2023
As far as I can tell, some flakiness is unavoidable in tests
that race a large client request write against a server's response
when the server doesn't read the full request.
It does not appear to be possible to simultaneously ensure that
well-behaved clients see EOF instead of ECONNRESET and also prevent
misbehaving clients from consuming arbitrary server resources.
(See RFC 7230 §6.6 for more detail.)

Since there doesn't appear to be a way to cleanly eliminate
this source of flakiness, we can instead work around it:
we can allow the test to adjust the hard-coded delay if it
sees a plausibly-related failure, so that the test can retry
with a longer delay.

As a nice side benefit, this also allows the tests to run more quickly
in the typical case: since the test will retry in case of spurious
failures, we can start with an aggressively short delay, and only back
off to a longer one if it is really needed on the specific machine
running the test.

Fixes #57084.
Fixes #51104.
For #58398.

Change-Id: Ia4050679f0777e5eeba7670307a77d93cfce856f
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-race,gotip-linux-amd64-race,gotip-windows-amd64-race
Reviewed-on: https://go-review.googlesource.com/c/go/+/527196
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Damien Neil <dneil@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
@bcmills
Copy link
Contributor

bcmills commented Nov 9, 2023

I believe this is fixed as of CL 527196.

@bcmills bcmills closed this as completed Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Status: Done
Development

No branches or pull requests

2 participants