New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net: TestDialTimeout/1m/5s failures on Windows LUCI builders #62377
Comments
Just kidding, looks like you landed a fix already. Duplicate of #62359. I'll leave this open and see if the issue still shows up on LUCI. Fully one step behind this afternoon. 😅 |
What I landed earlier was an arm64-specific skip, not a proper fix. 🙃 Good to know that this isn't architecture-specific, at least. |
Change https://go.dev/cl/524595 mentions this issue: |
The LUCI builders seem to show that the failure mode in #62359 is not specific to windows/arm64, but it occurs to me that we ought to be able to eventually retry by making the timeout so short that the remote end can't possibly respond in time (discounting the possibility that the kernel itself might short-circuit the loopback address). For #62377. Updates #62359. Updates #56876. Change-Id: I1fb5fa4f2a5d2cfe35465f34248ed9a035f91f4f Reviewed-on: https://go-review.googlesource.com/c/go/+/524595 Reviewed-by: Damien Neil <dneil@google.com> Auto-Submit: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Bryan Mills <bcmills@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Change https://go.dev/cl/524936 mentions this issue: |
The time granularity on windows is large enough that setting even an implausibly small timeout still gives ConnectEx enough time to succeed before the timeout expires. That causes TestDialTimeout to sometimes flake, because it expects to be able to provoke a timeout using some nonzero duration. This change takes a two-pronged approach to address the problem: 1. We can set a deadline on the FD more aggressively. (If the Context has already expired, or the deadline is already known, we can go ahead and set it on the fd without waiting for a background goroutine to get around to it.) 2. We can reintroduce a test hook to ensure that Dial takes a measurable amount of time before it completes, so that setting an implausibly short deadline sets that deadline in the past instead of the future. Together, these reduce the flake rate on a windows-amd64-longtest gomote from around 1-in-10 to less than 1-in-2000. For #62377. Change-Id: I03975c32f61fffa9f6f84efb3c474a01ac5a0d1e Reviewed-on: https://go-review.googlesource.com/c/go/+/524936 Run-TryBot: Bryan Mills <bcmills@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Damien Neil <dneil@google.com> Auto-Submit: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
I believe this is fixed by https://go.dev/cl/524936. @mknyszek, can you confirm? |
I don't see any new failures on LUCI: https://ci.chromium.org/ui/test/golang/net.TestDialTimeout%2F1ms%2F5s. I think we're good here, thanks Bryan! |
The text was updated successfully, but these errors were encountered: