Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: deadlock in TestMissingStatusNoPanic on release-branch.go1.15 branch #45358

Closed
cagedmantis opened this issue Apr 2, 2021 · 16 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@cagedmantis
Copy link
Contributor

2505e26/windows-amd64-longtest
1035bb9/windows-amd64-longtest
b35c38a/windows-amd64-longtest
636a6d14/windows-amd64-longtest

May be related to ##42942

CC CC @alexbrainman @fraenkel @neild @bradfitz @golang/release

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 2, 2021
@fraenkel
Copy link
Contributor

fraenkel commented Apr 3, 2021

@cagedmantis I think this is the wrong test. While the error occurs on TestCancelRquestWhenSharing after a few seconds, the moire obvious one is TestMissingStatusNoPanice which has been stuck for 14m.

goroutine 17971 [chan receive, 14 minutes]:
net/http_test.TestMissingStatusNoPanic(0xc00060e000)
	C:/workdir/go/src/net/http/transport_test.go:5320 +0x450

@bcmills
Copy link
Contributor

bcmills commented May 13, 2021

@gopherbot
Copy link

Change https://golang.org/cl/319275 mentions this issue: net/http: prevent infinite wait during TestMissingStatusNoPanic

@bcmills
Copy link
Contributor

bcmills commented May 13, 2021

@bcmills bcmills changed the title net/http: TestCancelRequestWhenSharingConnection is flaky on release-branch.go1.15 branch net/http: deadlock in TestMissingStatusNoPanic on release-branch.go1.15 branch May 13, 2021
@bcmills bcmills added this to the Go1.15.13 milestone May 13, 2021
@fraenkel
Copy link
Contributor

All of the failures are due to TestMissingStatusNoPanic so far. My fix above will just expose the real issue which we currently aren't seeing.

gopherbot pushed a commit that referenced this issue May 14, 2021
If the client request never makes it to the server, the outstanding
accept is never broken. Change the test to always close the listening
socket when the client request completes.

Updates #45358

Change-Id: I744a91dfa11704e7e528163d7669c394e90456dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/319275
Trust: Heschi Kreinick <heschi@google.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
@gopherbot
Copy link

Change https://golang.org/cl/320189 mentions this issue: [release-branch.go1.15] net/http: prevent infinite wait during TestMissingStatusNoPanic

@gopherbot
Copy link

Change https://golang.org/cl/320190 mentions this issue: [release-branch.go1.16] net/http: prevent infinite wait during TestMissingStatusNoPanic

@fraenkel
Copy link
Contributor

On the plus side, we now see the actual issue(s) in Go 1.15 failures.

@fraenkel
Copy link
Contributor

I see at least one of the issues and may be the actual problem....
Notice its all using the same port.

TestTransportPersistConnReadLoopEOF is failing but not sending a bogus conn to the connc channel.

@fraenkel
Copy link
Contributor

The above is incorrect, the testcase is fine.
The error being reported happens when the system is underload and usually out of resources. One can tweak the TCP settings to reduce the failure.

There is also this https://go.googlesource.com/benchmarks/+/refs/heads/master/http/http.go#42
I guess something got fixed/improved in 1.16+.

@gopherbot
Copy link

Closed by merging e3c9537 to release-branch.go1.15.

gopherbot pushed a commit that referenced this issue Jun 2, 2021
…ssingStatusNoPanic

If the client request never makes it to the server, the outstanding
accept is never broken. Change the test to always close the listening
socket when the client request completes.

Updates #45358

Change-Id: I744a91dfa11704e7e528163d7669c394e90456dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/319275
Trust: Heschi Kreinick <heschi@google.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
(cherry picked from commit c0a7ecf)
Reviewed-on: https://go-review.googlesource.com/c/go/+/320189
Reviewed-by: Damien Neil <dneil@google.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Run-TryBot: Damien Neil <dneil@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
gopherbot pushed a commit that referenced this issue Jun 2, 2021
…ssingStatusNoPanic

If the client request never makes it to the server, the outstanding
accept is never broken. Change the test to always close the listening
socket when the client request completes.

Updates #45358

Change-Id: I744a91dfa11704e7e528163d7669c394e90456dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/319275
Trust: Heschi Kreinick <heschi@google.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
(cherry picked from commit c0a7ecf)
Reviewed-on: https://go-review.googlesource.com/c/go/+/320190
Reviewed-by: Damien Neil <dneil@google.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
@dmitshur

This comment has been minimized.

@dmitshur dmitshur reopened this Jun 2, 2021
@dmitshur

This comment has been minimized.

@dmitshur dmitshur reopened this Jun 2, 2021
@neild
Copy link
Contributor

neild commented Jun 2, 2021

The root cause of the failures is port exhaustion, which hasn't been addressed, but the deadlock in TestMissingStatusNoPanic should be fixed.

I haven't been able to reproduce the port exhaustion failures manually after running go test net a number of times on a windows-amd64-longtest builder today. Possibly I'm not taking the right steps to reproduce the problem, or possibly whatever was causing that builder to run out of ports has been addressed. I'm going to tentatively assume the latter unless these failures show up again.

@fraenkel
Copy link
Contributor

fraenkel commented Jun 2, 2021

I think it might be the combination of things. One test that didn't finish, holding up others than haven't done their cleanup.
It was a bit weird that it was only occurring in one release.

@dmitshur dmitshur modified the milestones: Go1.15.14, Go1.15.15 Jul 12, 2021
@dmitshur dmitshur added Testing An issue that has been verified to require only test changes, not just a test failure. NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Aug 4, 2021
@dmitshur
Copy link
Contributor

dmitshur commented Aug 4, 2021

The root cause of the failures is port exhaustion, which hasn't been addressed, but the deadlock in TestMissingStatusNoPanic should be fixed.

The port exhaustion issue is now also fixed (by CL 339829). The windows-amd64-longtest builder is looking healthy on release-branch.go1.15, and I don't think anything more is left here, so closing.

It was a bit weird that it was only occurring in one release.

I think it was an unfortunate coincidence to do with runtime scheduling. On 1.15 the TestCancelRequestWhenSharingConnection test was managing to reach the 32k ephemeral port limit during its 10 seconds, but on 1.16+ it was only using up around 30k~, leaving just barely enough room for the rest of net/http tests to succeed. Rerunning the same test in quick succession would fail on 1.16+ as well.

@dmitshur dmitshur closed this as completed Aug 4, 2021
@golang golang locked and limited conversation to collaborators Aug 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

7 participants