x/net/netutil: TestLimitListenerSaturation killed with SIGQUIT after apparent deadlock #61811

Open
gopherbot opened this issue Aug 7, 2023 · 4 comments
Labels: NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one), OS-NetBSD

@gopherbot
gopherbot commented Aug 7, 2023

#!watchflakes
post <- pkg == "golang.org/x/net/netutil" && test == "TestLimitListenerSaturation" && `SIGQUIT: quit` && `^golang\.org/x/net/netutil\.\(\*limitListener\)\.Accept` && goos == "netbsd"

Issue created automatically to collect these failures.

Example (log):

SIGQUIT: quit
PC=0x7be5c m=7 sigcode=0

r0      0x4
r1      0x0
r2      0x0
r3      0x0
r4      0x400003d748
r5      0x0
r6      0x5
...
	/var/gobuilder/buildlet/go/src/runtime/proc.go:404
runtime.semacquire1(0x40002060b8, 0x78?, 0x1, 0x0, 0x28?)
	/var/gobuilder/buildlet/go/src/runtime/sema.go:160 +0x208 fp=0x400004cdf0 sp=0x400004cda0 pc=0x5bdd8
sync.runtime_Semacquire(0x400011e340?)
	/var/gobuilder/buildlet/go/src/runtime/sema.go:62 +0x2c fp=0x400004ce30 sp=0x400004cdf0 pc=0x772bc
sync.(*WaitGroup).Wait(0x40002060b0)
	/var/gobuilder/buildlet/go/src/sync/waitgroup.go:116 +0x74 fp=0x400004ce50 sp=0x400004ce30 pc=0x82114
golang.org/x/net/netutil.TestLimitListenerSaturation(0x4000224000)
	/var/gobuilder/buildlet/gopath/src/golang.org/x/net/netutil/listen_test.go:203 +0x230 fp=0x400004cf60 sp=0x400004ce50 pc=0x137840
testing.tRunner(0x4000224000, 0x1926b0)

watchflakes

@gopherbot added the NeedsInvestigation label Aug 7, 2023
@gopherbot
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "golang.org/x/net/netutil" && test == "TestLimitListenerSaturation"
2023-08-04 22:30 netbsd-arm64-bsiegert net@c8c0290b go@02366863 x/net/netutil.TestLimitListenerSaturation (log)
SIGQUIT: quit
PC=0x7be5c m=7 sigcode=0

r0      0x4
r1      0x0
r2      0x0
r3      0x0
r4      0x400003d748
r5      0x0
r6      0x5
...
	/var/gobuilder/buildlet/go/src/runtime/proc.go:404
runtime.semacquire1(0x40002060b8, 0x78?, 0x1, 0x0, 0x28?)
	/var/gobuilder/buildlet/go/src/runtime/sema.go:160 +0x208 fp=0x400004cdf0 sp=0x400004cda0 pc=0x5bdd8
sync.runtime_Semacquire(0x400011e340?)
	/var/gobuilder/buildlet/go/src/runtime/sema.go:62 +0x2c fp=0x400004ce30 sp=0x400004cdf0 pc=0x772bc
sync.(*WaitGroup).Wait(0x40002060b0)
	/var/gobuilder/buildlet/go/src/sync/waitgroup.go:116 +0x74 fp=0x400004ce50 sp=0x400004ce30 pc=0x82114
golang.org/x/net/netutil.TestLimitListenerSaturation(0x4000224000)
	/var/gobuilder/buildlet/gopath/src/golang.org/x/net/netutil/listen_test.go:203 +0x230 fp=0x400004cf60 sp=0x400004ce50 pc=0x137840
testing.tRunner(0x4000224000, 0x1926b0)

watchflakes

@gopherbot added this to the Unreleased milestone Aug 7, 2023
@bcmills
Contributor

bcmills commented Aug 7, 2023

@golang/netbsd (and CC @golang/runtime), the first puzzle here is why this test had to be killed with SIGQUIT instead of panicking with "test timed out". go test gave it a 10-minute delay after the 100-minute timeout expired:
https://cs.opensource.google/go/go/+/refs/heads/master:src/cmd/go/internal/test/test.go;l=801-829;drc=460dc37c885b83a27d589befe3f52097fe3363b0

@bcmills
Contributor

bcmills commented Aug 7, 2023

The second puzzle is why the test seems to have deadlocked. It has a call to Accept pending, which seems to imply one of three possibilities:

  • Maybe a net.Dial failed spuriously, causing the number of accepted connections to never reach saturation.
    • The test does bake in an assumption that the kernel is able to queue at least attemptsPerWave connections, so maybe that assumption sometimes doesn't hold on NetBSD?
  • Maybe all of the net.Dial calls succeeded, but for some reason the kernel dropped enough of the connections that the limiter failed to saturate.
  • Maybe the time.AfterFunc that closes the saturated channel somehow failed to schedule after the timeout expired.

@bcmills
Contributor

bcmills commented Aug 7, 2023

With only one sample (N=1) it's hard to be sure, but given the number of other flaky failure modes on NetBSD I suspect that this is a NetBSD-specific failure mode.

@bcmills changed the title from "x/net/netutil: TestLimitListenerSaturation failures" to "x/net/netutil: TestLimitListenerSaturation killed with SIGQUIT after apparent deadlock" Aug 7, 2023