Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: frequent failures in TestSyscallN on windows-amd64-2012 since 2021-08-19 #48012

Closed
bcmills opened this issue Aug 27, 2021 · 6 comments
Closed
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows release-blocker Soon This needs to be done soon. (regressions, serious bugs, outages)
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Aug 27, 2021

$ greplogs --dashboard -md -e FAIL: TestSyscallN -l

2021-08-27T12:24:13-acdea4f/windows-amd64-2012
2021-08-27T01:42:38-d7e2e2e/windows-amd64-2012
2021-08-27T00:36:54-62f88b6/windows-amd64-2012
2021-08-27T00:36:45-e7eee5e/windows-amd64-2012
2021-08-26T22:37:54-967a801/windows-amd64-2012
2021-08-26T19:34:58-eb6a07f/windows-amd64-2012
2021-08-26T19:07:27-1f8d456/windows-amd64-2012
2021-08-26T18:26:42-5e6a7e9/windows-amd64-2012
2021-08-26T12:19:23-770df2e/windows-amd64-2012
2021-08-25T23:44:02-4f26202/windows-amd64-2012
2021-08-25T23:43:58-0ac64f6/windows-amd64-2012
2021-08-25T23:43:53-4068fb6/windows-amd64-2012
2021-08-25T19:45:20-6cf1d5d/windows-amd64-2012
2021-08-25T19:29:15-5baf60d/windows-amd64-2012
2021-08-25T19:06:16-3d66767/windows-amd64-2012
2021-08-25T18:27:41-d2f002c/windows-amd64-2012
2021-08-25T03:13:06-e1fcf88/windows-amd64-2012
2021-08-25T03:12:43-d37b8de/windows-amd64-2012
2021-08-25T01:57:42-de1c934/windows-amd64-2012
2021-08-24T18:30:13-5b64381/windows-amd64-2012
2021-08-24T16:36:55-b1cdf86/windows-amd64-2012
2021-08-24T15:55:38-d70c69d/windows-amd64-2012
2021-08-24T00:01:29-8eeb1bf/windows-amd64-2012
2021-08-23T19:46:36-6b9e3f8/windows-amd64-2012
2021-08-23T15:27:42-f1d8ea1/windows-amd64-2012
2021-08-23T13:09:11-7a6d64f/windows-amd64-2012
2021-08-23T13:07:07-baf2866/windows-amd64-2012
2021-08-23T11:23:58-c1a1478/windows-amd64-2012
2021-08-22T21:43:43-1958582/windows-amd64-2012
2021-08-22T13:14:19-bd68459/windows-amd64-2012
2021-08-22T13:11:50-96d816c/windows-amd64-2012
2021-08-21T11:23:14-8fff20f/windows-amd64-2012
2021-08-21T00:55:22-e17439e/windows-amd64-2012
2021-08-21T00:24:02-c991278/windows-amd64-2012
2021-08-20T19:44:02-ab9aaf4/windows-amd64-2012
2021-08-20T18:45:25-0f25251/windows-amd64-2012
2021-08-20T18:44:58-4d00fcb/windows-amd64-2012
2021-08-19T20:50:13-65074a4/windows-amd64-2012

--- FAIL: TestSyscallN (0.00s)
    --- FAIL: TestSyscallN/arg-25 (0.91s)
        testing.go:967: TempDir RemoveAll cleanup: remove C:\Users\gopher\AppData\Local\Temp\1\TestSyscallNarg-252181793634\001\mydll.dll: Access is denied.
FAIL
FAIL	runtime	55.584s

This appears to be a very recent regression, so it should be bisected and either rolled back or fixed. (CC @golang/release).

The failure mode closely resembles #45672, and probably shares the same underlying cause as #25965 (CC @bufflig).

@bcmills bcmills added OS-Windows NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker Soon This needs to be done soon. (regressions, serious bugs, outages) labels Aug 27, 2021
@bcmills bcmills added this to the Go1.18 milestone Aug 27, 2021
@bcmills
Copy link
Contributor Author

bcmills commented Aug 27, 2021

In the similar issue #19491, we fixed the test flakiness by adding a retry loop around the RemoveAll call (via cmd/go/internal/robustio.RemoveAll).

As noted in #45672 (comment):

I'm not sure whether it would be better to change [the specific test] to use robustio.RemoveAll explicitly, or to change the testing package to retry RemoveAll on Windows in case of Access is denied errors, or perhaps to change something even lower in the stack on Windows (perhaps os.RemoveAll itself?) to make it more robust.

@changkun
Copy link
Member

changkun commented Aug 27, 2021

I suspect the problem might be because of a large number of parallel use of t.TempDir.

In the CL, the test added a t.Parallel() and speed up the execution time of the entire test, suggested by @ianlancetaylor .

A simple fix on the test itself may drop the t.Parallel()

@gopherbot
Copy link

Change https://golang.org/cl/345670 mentions this issue: runtime: avoid run TestSyscallN in parallel

@cagedmantis
Copy link
Contributor

@bufflig would you be able to take a look since this is labelled with soon.

@heschi
Copy link
Contributor

heschi commented Sep 29, 2021

Seems like we should submit the workaround CL https://golang.org/cl/345670 to resolve this?

@gopherbot
Copy link

Change https://golang.org/cl/371296 mentions this issue: testing: retry spurious errors from RemoveAll for temp directories

gopherbot pushed a commit that referenced this issue Dec 14, 2021
This works around what appears to be either a kernel bug or a Go
runtime or syscall bug affecting certain Windows versions
(possibly all pre-2016?).

The retry loop is a simplified version of the one used in
cmd/go/internal/robustio. We use the same 2-second arbitrary timeout
as was used in that package, since it seems to be reliable in practice
on the affected builders. (If it proves to be too short, we can
lengthen it, within reason, in a followup CL.)

Since this puts a higher-level workaround in place, we can also revert
the lower-level workaround added to a specific test in CL 345670.

This addresses the specific occurrences of the bug for users of
(*testing.T).TempDir, but does not fix the underlying bug for Go users
outside the "testing" package (which remains open as #25965).

Fixes #50051
Updates #48012
Updates #25965

Change-Id: I35be7125f32f05c8350787f5ca9a22974b8d0770
Reviewed-on: https://go-review.googlesource.com/c/go/+/371296
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Patrik Nyblom <pnyb@google.com>
Trust: Patrik Nyblom <pnyb@google.com>
Run-TryBot: Patrik Nyblom <pnyb@google.com>
@golang golang locked and limited conversation to collaborators Dec 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows release-blocker Soon This needs to be done soon. (regressions, serious bugs, outages)
Projects
None yet
Development

No branches or pull requests

5 participants