Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: TestDialParallel failures due to longer-than-expected delay on Windows #52173

Closed
bcmills opened this issue Apr 5, 2022 · 9 comments
Closed
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 release-blocker
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Apr 5, 2022

--- FAIL: TestDialParallel (6.16s)
    dial_test.go:173: dialClosedPort: measured delay 1.0028044s
    dial_test.go:313: #5: got 1.2757236s; want <= 1.20336528s
FAIL
FAIL	net	55.514s

greplogs --dashboard -md -l -e 'FAIL: TestDialParallel .*(?:\n .*)*got .*; want <= .*'

2022-04-05T18:01:26-9e16cc1/windows-amd64-longtest

Note that the test currently uses a hard-coded heuristic for the timing bounds:
https://cs.opensource.google/go/go/+/master:src/net/dial_test.go;l=302-309;drc=da7891f6f36c48f2931ed916ed305330c06f9bd7

(See previously #35616; CC @ianlancetaylor @neild.)

@bcmills
Copy link
Contributor Author

bcmills commented Apr 5, 2022

Since windows/amd64 is a first-class port, this is a release-blocker for Go 1.19.

(It may be possible to resolve by making the timing bounds even more lax, or by somehow eliminating the timing bounds entirely if the test is meaningful without them.)

@bcmills bcmills added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels Apr 5, 2022
@bcmills bcmills added this to the Go1.19 milestone Apr 5, 2022
@bcmills bcmills changed the title net: TestDialParallel failures due to net: TestDialParallel failures due to longer-than-expected delay on Windows Apr 6, 2022
@prattmic
Copy link
Member

prattmic commented May 20, 2022

FWIW, I've been attempting to reproduce a different Windows issue, but have managed to reproduce this twice thus far in ~1hr of testing with 25 windows-amd64-longtest gomotes running all.bat using https://github.com/mknyszek/goswarm/.

Edit: I got 58(!) cases of this over the weekend.

@gopherbot
Copy link

Change https://go.dev/cl/408354 mentions this issue: net: add even more timing slop for TestDialParallel

@prattmic prattmic reopened this Jun 2, 2022
@prattmic
Copy link
Member

prattmic commented Jun 2, 2022

I am still reproducing this after https://go.dev/cl/408354, though at a much lower rate. In the past ~24hr of 25 windows-amd64-longtest gomotes running all.bat I have reproduced this once.

--- FAIL: TestDialParallel (6.45s)
    dial_test.go:173: dialClosedPort: measured delay 1.0830134s
    dial_test.go:332: #4 (cancel): got 104.4913ms; want <= 100ms
FAIL
FAIL    net     42.024s

@heschi
Copy link
Contributor

heschi commented Jun 6, 2022

@prattmic is this OK after beta 1?

@prattmic prattmic added the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label Jun 6, 2022
@prattmic
Copy link
Member

prattmic commented Jun 6, 2022

@neild will know better, but I imagine so, it is quite rare now.

@neild
Copy link
Contributor

neild commented Jun 6, 2022

Yes, this is okay after beta1.

The test is irredeemably flaky. I think it needs to be rewritten.

@gopherbot
Copy link

Change https://go.dev/cl/410754 mentions this issue: net: use synthetic network in TestDialParallel

@gopherbot
Copy link

Change https://go.dev/cl/410957 mentions this issue: net: fix testHookDialTCP race

gopherbot pushed a commit that referenced this issue Jun 8, 2022
CL 410754 introduces a race accessing the global testHookDialTCP hook.
Avoiding this race is difficult, since Dial can return while
goroutines it starts are still running. Add a version of this
hook to sysDialer, so it can be set on a per-test basis.

(Perhaps other uses of this hook should be moved to use the
sysDialer-local hook, but this change fixes the immediate data race.)

For #52173.

Change-Id: I8fb9be13957e91f92919cae7be213c38ad2af75a
Reviewed-on: https://go-review.googlesource.com/c/go/+/410957
Run-TryBot: Damien Neil <dneil@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@rsc rsc unassigned neild Jun 22, 2022
@golang golang locked and limited conversation to collaborators Jun 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 release-blocker
Projects
None yet
Development

No branches or pull requests

5 participants