Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: apparent deadlock in TestTransportMaxConnsPerHost #45570

Closed
bcmills opened this issue Apr 14, 2021 · 4 comments
Closed

net/http: apparent deadlock in TestTransportMaxConnsPerHost #45570

bcmills opened this issue Apr 14, 2021 · 4 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Apr 14, 2021

2021-04-14T03:15:34-e7ab1a5/windows-amd64-2008

The test timed out with many copies of this goroutine stack:

goroutine 2201 [select, 5 minutes]:
net/http.(*Transport).getConn(0xc0000637c0, 0xc0003bb540, 0x0, 0xc000382300, 0x4, 0xc00038f4d0, 0xf, 0x0, 0x0, 0x0, ...)
	C:/workdir/go/src/net/http/transport.go:1370 +0x66f
net/http.(*Transport).roundTrip(0xc0000637c0, 0xc0000ebf00, 0x0, 0x0, 0x0)
	C:/workdir/go/src/net/http/transport.go:579 +0x9b0
net/http.(*Transport).RoundTrip(0xc0000637c0, 0xc0000ebf00, 0xc0000637c0, 0x0, 0x0)
	C:/workdir/go/src/net/http/roundtrip.go:18 +0x3c
net/http.send(0xc0000ebf00, 0x9bdda0, 0xc0000637c0, 0x0, 0x0, 0x0, 0xc00038a1c0, 0x0, 0x1, 0x0)
	C:/workdir/go/src/net/http/client.go:251 +0x714
net/http.(*Client).send(0xc0002b9020, 0xc0000ebf00, 0x0, 0x0, 0x0, 0xc00038a1c0, 0x0, 0x1, 0x20)
	C:/workdir/go/src/net/http/client.go:175 +0xcf
net/http.(*Client).do(0xc0002b9020, 0xc0000ebf00, 0x0, 0x0, 0x0)
	C:/workdir/go/src/net/http/client.go:723 +0xb2e
net/http.(*Client).Do(...)
	C:/workdir/go/src/net/http/client.go:591
net/http_test.TestTransportMaxConnsPerHost.func2.2()
	C:/workdir/go/src/net/http/transport_test.go:652 +0x425
net/http_test.TestTransportMaxConnsPerHost.func2.3(0xc000018300, 0xc0002b90e0)
	C:/workdir/go/src/net/http/transport_test.go:668 +0x58
created by net/http_test.TestTransportMaxConnsPerHost.func2
	C:/workdir/go/src/net/http/transport_test.go:666 +0x386

This is the first such failure in the logs since #34941 was fixed in 2019, so tentatively marking as release-blocker for Go 1.17 — it may be a regression.

CC @bradfitz @fraenkel @neild @empijei

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 14, 2021
@bcmills bcmills added this to the Go1.17 milestone Apr 14, 2021
@neild neild self-assigned this Apr 14, 2021
@fraenkel
Copy link
Contributor

It is a valid issue. I will send in a fix which will probably cause some other failure if it happens again.
The current issue is that we expect 1 connection and the channel is sized to 1. If have an additional dial, we hang.

@gopherbot
Copy link

Change https://golang.org/cl/310213 mentions this issue: http: allow multiple dials in TestTransportMaxConnsPerHost

@neild
Copy link
Contributor

neild commented Apr 15, 2021

I have not been able to reproduce this failure after leaving a windows-amd64-2008 builder running the test on repeat for an hour. Whatever happened seems to be rare.

The problem seems to be an unexpected additional Dial call. My best guess is that the test's connection dropped for some unknown reason, resulting in a redial. The test hangs and times out in this situation, rather than reporting an error. CL 310213 fixes the hang.

I don't think this is a release blocker unless it crops up again.

gopherbot pushed a commit that referenced this issue Apr 15, 2021
If there is more than the expected single dial, the channel will block.
Allow at least one connection per client, and do the expected cleanup.

Updates #45570

Change-Id: Iaecd45298a7d7c591b7d7b1be13cea6e4a1e2e85
Reviewed-on: https://go-review.googlesource.com/c/go/+/310213
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
Trust: Damien Neil <dneil@google.com>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
@neild
Copy link
Contributor

neild commented Apr 15, 2021

I'm going to tentatively close this. CL 310213 should fix the test to not panic on failure, and I can't reproduce that failure at all. We can revisit if it flakes again.

@neild neild closed this as completed Apr 15, 2021
@golang golang locked and limited conversation to collaborators Apr 15, 2022
@rsc rsc unassigned neild Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants