runtime: frequent SIGSEGV in clock_gettime_trampoline on openbsd-386 #49532
This looks like a fairly old bug, but the failure rate varies. It seems to have gone from “a couple per week” to “several per day” around 2021-11-04.
That line intentionally faults when the clock_gettime syscall fails. Why would the syscall fail? And why is it nondeterministic?
Our investigation leads us to believe this is okay to leave until after Beta1, because it has been happening for some time, seems to be confined to this platform, and this isn't a first-class port.
Just a shot in the dark, but does anyone know whether OpenBSD's libc requires any particular alignment of SP? runtime·clock_gettime_trampoline doesn't do any alignment, so I wonder if that is tripping up a check in libc or the kernel.
I thought about that too, but I don't know. We also don't align SP in the other syscall trampolines, yet the failure always happens here. Also, if that were the cause, I'd expect it to fail much more deterministically.
I would have to double-check, but I do not believe there are alignment requirements beyond those normally required by the hardware architecture itself (it is also trivial to write test code to verify this). I suspect the underlying issue is some form of memory corruption, given that we're seeing other failures, such as faulting on unlock: https://build.golang.org/log/3e594f9d7a015c79ab62629337b14a2e21a86e9d Additionally, the openbsd-386-68 builder had been passing fairly consistently during the development cycle, and things seem to have turned red around early November (although in limited testing I've not been able to trigger failures myself).
I also clicked through the source in libc and the kernel and did not notice anything particularly noteworthy: no explicit alignment checks, and no odd error paths beyond the ones @4a6f656c mentioned. The only thing of note is that on 386, libc may really need to make a system call; I didn't see the fast path being set up for 386 the way it is for amd64, though I may well have missed it.
Ah, no, you're correct; I neglected to recall that
It seems the failure only occurs on the openbsd-386-68 builder, not on the -70 or -70-n1 builders.
It is also interesting that all failures seem to come from the cmd/dist binary and nothing else.
Change https://golang.org/cl/368334 mentions this issue:
For #49532.

Change-Id: I5afc64c987f0519903128550a7dac3a0f5e592cf
Reviewed-on: https://go-review.googlesource.com/c/go/+/368334
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
CL 368334, which I just submitted, will change the message printed during this failure to a "clock_gettime failed" fatal error. New combined greplogs (not yet tested, because there haven't been any new failures):
It's possible I made this worse. I haven't seen any clock_gettime failures since my CL went in, but we did get a new "stopm holding p" failure, which we haven't seen before and which unfortunately has no useful backtrace.
It seems this hasn't occurred since @aclements's CL.
For better or for worse, this is no longer happening, so closing.
greplogs --dashboard -md -l -e \(\?ms\)\\Aopenbsd-.\*clock_gettime_trampoline
2021-11-11T16:17:21-666fc17/openbsd-386-68
2021-11-11T04:02:33-3949faf/openbsd-386-68
2021-11-10T21:53:03-229b909/openbsd-386-68
2021-11-10T17:15:54-8a3be15/openbsd-386-68
2021-11-10T05:08:25-17980df/openbsd-386-68
2021-11-10T02:26:41-02d7eab/openbsd-386-68
2021-11-10T00:45:37-ec86bb5/openbsd-386-68
2021-11-09T21:58:03-805b4d5/openbsd-386-68
2021-11-09T00:08:09-bee0c73/openbsd-386-68
2021-11-08T17:46:34-2e210b4/openbsd-386-68
2021-11-06T19:41:14-cfb3dc7/openbsd-386-68
2021-11-06T16:43:43-3544082/openbsd-386-68
2021-11-05T22:55:56-e83a204/openbsd-386-68
2021-11-05T21:34:10-bb53fd7/openbsd-386-68
2021-11-05T21:28:34-90462df/openbsd-386-68
2021-11-05T21:27:34-7aed6dd/openbsd-386-68
2021-11-05T21:27:19-58ec925/openbsd-386-68
2021-11-05T00:52:04-1c4cfd8/openbsd-386-68
2021-11-04T23:56:29-0e5f287/openbsd-386-68
2021-11-04T21:41:49-156abe5/openbsd-386-68
2021-11-04T20:01:10-9b2dd1f/openbsd-386-68
2021-11-04T20:00:54-961aab2/openbsd-386-68
2021-10-06T22:28:59-d477ef3-b18ba59/openbsd-386-64
2021-09-30T20:30:12-1c35f2a-c035d82/openbsd-386-64
2021-09-29T20:06:10-1c35f2a-5930cff/openbsd-386-64
2021-09-27T22:22:35-ba6b94c-cd4d592/openbsd-386-64
2021-09-21T13:18:09-fe076c8-39e08c6/openbsd-386-64
2021-09-09T00:10:46-076821b-e30a090/openbsd-386-64
2021-08-30T02:40:46-3e0d083-56c3856/openbsd-386-64
2021-08-16T12:54:44-a55d515-c88e3ff/openbsd-386-64
2021-07-14T17:25:06-5061c41-60ddf42/openbsd-386-64
2021-06-21T20:53:11-d25f906-761edf7/openbsd-386-64
2021-06-08T20:19:02-689f4c7/openbsd-386-68
2021-06-08T20:19:02-689f4c7/openbsd-amd64-64
2021-06-03T16:41:39-4abb1e2-e0d029f/openbsd-386-64
2021-06-02T21:39:28-384c392-dd7ba3b/openbsd-386-64
2021-05-12T20:59:48-8287d5d-6db7480/openbsd-386-64
2021-05-10T23:42:56-79d39ff-5c48951/openbsd-386-64
2021-05-08T17:03:18-f05e912-b38b1b2/openbsd-386-64
2021-05-07T02:17:32-c0140e8-d2b0311/openbsd-386-64
2021-05-05T21:37:16-1949673-cf73f1a/openbsd-386-64
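The greplogs expression above, once its shell escaping is removed, is (?ms)\Aopenbsd-.*clock_gettime_trampoline. The following Go sketch uses only the standard regexp package to show how that pattern behaves: the s flag lets "." match newlines, so ".*" can span the whole log, and \A anchors at the very start of the text. It assumes, as the \A anchor suggests, that each log being searched begins with the builder name; the sample logs and the matchesFailure helper are hypothetical and not part of greplogs.

```go
package main

import (
	"fmt"
	"regexp"
)

// failurePattern is the greplogs -e expression with its shell escaping removed.
// (?s) lets "." match newlines, so ".*" can span many log lines;
// \A matches only at the beginning of the whole text, even with (?m) set.
var failurePattern = regexp.MustCompile(`(?ms)\Aopenbsd-.*clock_gettime_trampoline`)

// matchesFailure is a hypothetical helper: it reports whether a log
// (assumed to start with the builder name) would be selected by the pattern.
func matchesFailure(log string) bool {
	return failurePattern.MatchString(log)
}

func main() {
	// Hypothetical sample logs, not taken from the real dashboard.
	logs := []string{
		"openbsd-386-68\n...\nruntime.clock_gettime_trampoline()\n",
		"linux-386\n...\nruntime.clock_gettime_trampoline()\n",
		"openbsd-386-68\n...\nALL TESTS PASSED\n",
	}
	for _, l := range logs {
		fmt.Println(matchesFailure(l)) // prints true, then false, then false
	}
}
```

The \A anchor is what restricts matches to OpenBSD builders: without it, any log that mentioned "openbsd-" somewhere before a clock_gettime_trampoline frame would also be selected.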