Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: TestListenCloseListen failures #65175

Closed
gopherbot opened this issue Jan 19, 2024 · 7 comments
Closed

net: TestListenCloseListen failures #65175

gopherbot opened this issue Jan 19, 2024 · 7 comments
Assignees
Labels
FixPending Issues that have a fix which has not yet been reviewed or submitted. NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@gopherbot
Copy link

#!watchflakes
default <- pkg == "net" && test == "TestListenCloseListen"

Issue created automatically to collect these failures.

Example (log):

=== RUN   TestListenCloseListen
    net_test.go:315: failed on try 1/10: listen tcp 127.0.0.1:48574: bind: address already in use
--- FAIL: TestListenCloseListen (0.00s)

watchflakes

@gopherbot gopherbot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 19, 2024
@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "net" && test == "TestListenCloseListen"
2024-01-08 18:26 gotip-solaris-amd64 go@75984918 net.TestListenCloseListen (log)
=== RUN   TestListenCloseListen
    net_test.go:315: failed on try 1/10: listen tcp 127.0.0.1:48574: bind: address already in use
--- FAIL: TestListenCloseListen (0.00s)

watchflakes

@bcmills
Copy link
Contributor

bcmills commented Jan 19, 2024

This test is inherently racy:
https://cs.opensource.google/go/go/+/master:src/net/net_test.go;l=301-302;drc=a81507868344dccebef13c6d8d890633e59a93e3

Depending on the other tests running on the machine, it may have already reused the port for some other listener, perhaps even in some other test process.

Perhaps we could keep the port from being reused by leaving an open connection on it, but I wonder if that would also prevent the Listen on that port from reopening it. 🤔

(CC @ianlancetaylor @neild)

@bcmills bcmills added this to the Backlog milestone Jan 19, 2024
@bcmills bcmills added the Testing An issue that has been verified to require only test changes, not just a test failure. label Jan 19, 2024
@bcmills bcmills self-assigned this Jan 20, 2024
@bcmills bcmills added NeedsFix The path to resolution is known, but the work has not been done. FixPending Issues that have a fix which has not yet been reviewed or submitted. labels Jan 20, 2024
@bcmills bcmills modified the milestones: Backlog, Go1.23 Jan 20, 2024
@gopherbot gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 20, 2024
@gopherbot
Copy link
Author

Change https://go.dev/cl/557177 mentions this issue: net: attempt to deflake TestListenCloseListen

@gopherbot gopherbot reopened this Jan 22, 2024
@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "net" && test == "TestListenCloseListen"
2024-01-22 16:54 linux-arm64-race go@4ca1caf4 net.TestListenCloseListen (log)
2024/01/22 17:06:59 killing splice client after 5 second shutdown timeout
==================
WARNING: DATA RACE
Write at 0x00c00039e770 by goroutine 817:
  net.TestListenCloseListen()
      /tmp/workdir/go/src/net/net_test.go:357 +0x4a0
  testing.tRunner()
      /tmp/workdir/go/src/testing/testing.go:1689 +0x180
  testing.(*T).Run.gowrap1()
      /tmp/workdir/go/src/testing/testing.go:1742 +0x40
...
Goroutine 2226 (finished) created at:
  net.TestListenCloseListen()
      /tmp/workdir/go/src/net/net_test.go:307 +0x21c
  testing.tRunner()
      /tmp/workdir/go/src/testing/testing.go:1689 +0x180
  testing.(*T).Run.gowrap1()
      /tmp/workdir/go/src/testing/testing.go:1742 +0x40
==================
--- FAIL: TestListenCloseListen (0.00s)
    testing.go:1398: race detected during execution of test

watchflakes

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "net" && test == "TestListenCloseListen"
2024-01-22 16:30 windows-amd64-race go@41c05ea4 net.TestListenCloseListen (log)
==================
WARNING: DATA RACE
Write at 0x00c0000248c0 by goroutine 409:
  net.TestListenCloseListen()
      C:/workdir/go/src/net/net_test.go:357 +0x644
  testing.tRunner()
      C:/workdir/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      C:/workdir/go/src/testing/testing.go:1742 +0x44

...
Goroutine 1510 (finished) created at:
  net.TestListenCloseListen()
      C:/workdir/go/src/net/net_test.go:307 +0x29e
  testing.tRunner()
      C:/workdir/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      C:/workdir/go/src/testing/testing.go:1742 +0x44
==================
--- FAIL: TestListenCloseListen (0.00s)
    testing.go:1398: race detected during execution of test
2024-01-22 16:50 windows-amd64-race go@846bb475 net.TestListenCloseListen (log)
==================
WARNING: DATA RACE
Write at 0x00c000598b40 by goroutine 408:
  net.TestListenCloseListen()
      C:/workdir/go/src/net/net_test.go:357 +0x644
  testing.tRunner()
      C:/workdir/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      C:/workdir/go/src/testing/testing.go:1742 +0x44

...
Goroutine 1539 (finished) created at:
  net.TestListenCloseListen()
      C:/workdir/go/src/net/net_test.go:307 +0x29e
  testing.tRunner()
      C:/workdir/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      C:/workdir/go/src/testing/testing.go:1742 +0x44
==================
--- FAIL: TestListenCloseListen (0.00s)
    testing.go:1398: race detected during execution of test
2024-01-22 17:00 linux-amd64-race go@558919b4 net.TestListenCloseListen (log)
2024/01/22 17:12:04 killing splice client after 5 second shutdown timeout
==================
WARNING: DATA RACE
Write at 0x00c000694200 by goroutine 864:
  net.TestListenCloseListen()
      /workdir/go/src/net/net_test.go:357 +0x644
  testing.tRunner()
      /workdir/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /workdir/go/src/testing/testing.go:1742 +0x44
...
Goroutine 2285 (finished) created at:
  net.TestListenCloseListen()
      /workdir/go/src/net/net_test.go:307 +0x29e
  testing.tRunner()
      /workdir/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /workdir/go/src/testing/testing.go:1742 +0x44
==================
--- FAIL: TestListenCloseListen (0.01s)
    testing.go:1398: race detected during execution of test

watchflakes

@gopherbot
Copy link
Author

Change https://go.dev/cl/557536 mentions this issue: net: delete TestListenCloseListen

gopherbot pushed a commit that referenced this issue Jan 22, 2024
In CL 557177, I attempted to fix a logical race in this test (#65175).
However, I introduced a data race in the process (#65209).

The race was reported on the windows-amd64-race builder. When I tried
to reproduce it on linux/amd64, I added a time.Sleep in the Accept
loop. However, that Sleep causes the test to fail outright with
EADDRINUSE, which suggests that my earlier guess about the open Conn
preventing reuse of the port was, in fact, incorrect.

On some platforms we could instead use SO_REUSEPORT and avoid closing
the first Listener entirely, but that wouldn't be even remotely in the
spirit of the original test.

Since I don't see a way to preserve the test in a way that is not
inherently flaky / racy, I suggest that we just delete it. It was
originally added as a regression test for a bug in the nacl port,
which no longer exists anyway. (Some of that code may live on in the
wasm port, but it doesn't seem worth maintaining a flaky
port-independent test to maintain a regression test for a bug specific
to secondary platforms.)

Fixes #65209.
Updates #65175.

Change-Id: I32f9da779d24f2e133571f0971ec460cebe7820a
Cq-Include-Trybots: luci.golang.try:gotip-windows-amd64-race
Reviewed-on: https://go-review.googlesource.com/c/go/+/557536
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
Also make it flakier in longtest mode by burning through more
ephemeral ports. (Burning through the ports raised the failure rate
for me locally enough to reliably reproduce the failure in golang#65175 with
-count=10.)

Fixes golang#65175 (I hope).

Change-Id: I5f5b68b6bf6a6aa92e66f0288078817041656a3e
Reviewed-on: https://go-review.googlesource.com/c/go/+/557177
Reviewed-by: Damien Neil <dneil@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
In CL 557177, I attempted to fix a logical race in this test (golang#65175).
However, I introduced a data race in the process (golang#65209).

The race was reported on the windows-amd64-race builder. When I tried
to reproduce it on linux/amd64, I added a time.Sleep in the Accept
loop. However, that Sleep causes the test to fail outright with
EADDRINUSE, which suggests that my earlier guess about the open Conn
preventing reuse of the port was, in fact, incorrect.

On some platforms we could instead use SO_REUSEPORT and avoid closing
the first Listener entirely, but that wouldn't be even remotely in the
spirit of the original test.

Since I don't see a way to preserve the test in a way that is not
inherently flaky / racy, I suggest that we just delete it. It was
originally added as a regression test for a bug in the nacl port,
which no longer exists anyway. (Some of that code may live on in the
wasm port, but it doesn't seem worth maintaining a flaky
port-independent test to maintain a regression test for a bug specific
to secondary platforms.)

Fixes golang#65209.
Updates golang#65175.

Change-Id: I32f9da779d24f2e133571f0971ec460cebe7820a
Cq-Include-Trybots: luci.golang.try:gotip-windows-amd64-race
Reviewed-on: https://go-review.googlesource.com/c/go/+/557536
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
@bcmills
Copy link
Contributor

bcmills commented Mar 11, 2024

Fixed (by deleting) in https://go.dev/cl/557536.

@bcmills bcmills closed this as completed Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FixPending Issues that have a fix which has not yet been reviewed or submitted. NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
Status: Done
Development

No branches or pull requests

2 participants