runtime: failures in TestVectoredHandlerDontCrashOnLibrary on windows-amd64-longtest #49959
Comments
Yea, looks related. And as with that one, probably a pain to reproduce, but I'll try.
Hasn't occurred since 2021-11-24. I also tried to reproduce it on a gomote but couldn't.
The interesting part is that the binary runs successfully with no error; it just doesn't print anything. This may be different from #49681. Maybe we need to flush stdout explicitly.
This hasn't happened again. Unfortunately, it was rare enough that this doesn't tell us much.
Still no more occurrences. I'm going to try stress testing this.
It may be necessary to run the full runtime test, but I'll try just the failing test first.
I was able to reproduce this at commit 9e7600d, the last commit at which we saw a dashboard failure. But it takes a long time: over the course of about 20 hours with 8 gomotes, I got 4 failures out of 17,040 runs (plus 2,813 buildlet flakes :( ).
I've added a bunch of trace prints to the test program and will see if that sheds any light on this.
It seems I can only create 3 windows-amd64-longtest builders at a time, though I'm not sure where that limit is coming from. I'm switching to windows-amd64-2016, which is the same image, just on a smaller VM. Hopefully I can still reproduce there. (Edited to clarify: on the builders, the test only runs with longtest, so we wouldn't expect to see it on windows-amd64-2016, but since I'm running the test manually, that doesn't matter.)
This will probably surprise no one, but it seems adding tracing made this no longer reproducible. I've done 21,665 runs with no failures. Now I'm not sure what to do.
Given @cherrymui's insight in #49959 (comment), and combining that with C best practices (https://wiki.sei.cmu.edu/confluence/display/c/FIO23-C.+Do+not+exit+with+unflushed+data+in+stdout+or+stderr), I would say maybe we should add an explicit fflush. If we see any more of these failures (on our own builders or reported by users) after that, we can always either reopen this issue or open a followup issue to investigate further.
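For reference, here is a minimal sketch of the pattern FIO23-C recommends; the program and its output message are made up for illustration and are not the actual test binary used by the runtime test:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Print the marker that a parent test process would look for.
       (The message here is hypothetical, purely for illustration.) */
    printf("ready\n");

    /* C does not guarantee that stdout is flushed when the process
       exits, so flush explicitly before returning (per FIO23-C).
       Without this, a parent that captures our output can, very
       rarely, see a 0 exit status with no output at all. */
    if (fflush(stdout) != 0) {
        perror("fflush");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```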
I'll try the fflush. I tried reproducing it a different way that didn't cause nearly as many problems with the builders. Over the course of 20 hours I got 3 failures out of 231,000 iterations of the test.
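(For context: 3 failures in 231,000 iterations works out to a failure rate of roughly 0.0013%, the baseline figure quoted in the fix below.)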
Looks like Cherry was right. 0 failures after 250,000 iterations with the fflush. |
Change https://golang.org/cl/380494 mentions this issue:
Very, very rarely TestVectoredHandlerDontCrashOnLibrary fails because the C subprocess exits with a 0 status code and no output. This appears to happen because C does not actually guarantee that stdout will be flushed on exit and somehow, very rarely, it is not flushed. Add explicit fflushes to fix this.

This reduces the failure rate of TestVectoredHandlerDontCrashOnLibrary from 0.0013% to 0% in 250,000 iterations.

Fixes golang#49959.

Change-Id: I892cf49a165ac91134c5da37588a2ab11e1f3f8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/380494
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
greplogs --dashboard -md -l -e 'FAIL: TestVectoredHandlerDontCrashOnLibrary'
2021-11-24T20:56:40-9e7600d/windows-amd64-longtest
2021-11-12T23:26:33-3a4b950/windows-amd64-longtest
2021-11-09T20:10:50-5430203/windows-amd64-longtest
This appears to be a regression during the Go 1.18 cycle; it may be related in some way to #49681.
(CC @bufflig @alexbrainman)