runtime: bad backtraces in gdb on s390x #24385
Comments
I think this is ok.
Hmm. I can say that gdb doesn’t complain about the equivalent bt call on linux/amd64. And something broke more seriously with the should-be-unrelated CL 99078. (I spent quite a while poking at it, to no avail. Then I noticed this complaint on master and I guess I hoped it was related.)
My hunch is that we should be arranging for the saved LR of
Okay, it's pretty clear that we're zeroing the saved LR. That's pretty simple and it's the same on most arches. My hunch now is that the .debug_frame information is wrong. In particular, since the compiler thinks of This appears to be the case for .debug_frame on all LR machines. I'm not sure why it only appears to affect s390x. Maybe it affects others, too, and we just haven't noticed.
Are you saying that we should let .debug_frame think
I may have seen that on other LR architectures, and thought that was ok.
That's one possibility. A totally different approach would be to rearrange the way we enter goroutines. We could, for example, actually enter at goexit and have it actually call the goroutine entry point, rather than hand-constructing a frame that makes it look like goexit called the entry point. I wouldn't do this just to fix up the debug info, but it may simplify things overall and have the side-effect of making the debug info right.
I believe so, though I haven't specifically checked.
Having thought about this more, perhaps the tricky part is that we don't know how big goexit's frame actually is. Logically, goexit's frame includes the arguments to the goroutine entry function, but those aren't fixed size and there's no way to represent variable-sized arguments in Go's calling convention. I tried setting things up so we entered at goexit and it called the entry function, but ran head-first into this problem. And it would be a problem for any sort of generic DWARF unwinding. This works in Go's own tracebacks only because we stop when we see goexit, so it doesn't matter that we can't walk over its frame. So, maybe we need to write out a special FDE for goexit that explicitly sets the LR register to 0 to stop the debugger, similar to how we explicitly stop internal tracebacks. (Now I'm curious why this seems to work on x86 machines...)
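To make the difference between the two stopping rules concrete, here is a toy model in Go. It is only a sketch: `funcAt`, `savedPC`, `traceback`, the addresses, and the `main.leaf`/`main.worker` names are all made up for illustration and are not the runtime's data structures or gdb's logic. It contrasts an unwinder that stops as soon as it sees `runtime.goexit` with a generic one that keeps following saved return PCs until one no longer maps to a function.

```go
package main

import "fmt"

// funcAt maps a return PC to the function containing it. A real unwinder
// would consult the symbol table or .debug_frame; these addresses are made up.
var funcAt = map[uintptr]string{
	0x1000: "main.worker",
	0x2000: "runtime.goexit",
}

// savedPC is a hypothetical record of the return PC saved in each function's
// frame. The entry for runtime.goexit is 0, modelling the zeroed saved LR.
var savedPC = map[string]uintptr{
	"main.leaf":      0x1000,
	"main.worker":    0x2000,
	"runtime.goexit": 0,
}

// traceback walks outward from start. With stopAtGoexit set, it returns as
// soon as it reaches runtime.goexit and never reads that frame's saved PC.
// Without it, the walk continues until a saved PC maps to no known function,
// and that unresolved PC is reported as a final "??" entry.
func traceback(start string, stopAtGoexit bool) []string {
	var out []string
	fn := start
	for {
		out = append(out, fn)
		if stopAtGoexit && fn == "runtime.goexit" {
			return out
		}
		pc := savedPC[fn]
		next, ok := funcAt[pc]
		if !ok {
			return append(out, fmt.Sprintf("?? (pc=%#x)", pc))
		}
		fn = next
	}
}

func main() {
	fmt.Println(traceback("main.leaf", true))  // stops at the goexit sentinel
	fmt.Println(traceback("main.leaf", false)) // walks past goexit to PC 0
}
```

In this model the generic walk only terminates because the saved PC happens to be 0 and resolves to nothing; if the unwinder computed goexit's caller from wrong frame information instead, it would keep going, which is roughly the failure mode being discussed here.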
On AMD64:
On AMD64, even Apparently returnPC=0 doesn't stop gdb from tracing back, but once it gets to PC=0 it cannot proceed.
On LR machines, I think making
If we do this with DWARF then I think the normal way to mark entry points is to insert a
Thanks @mundaym. This sounds like the right thing to do.
I'm kicking the milestone forward to 1.12.
Change https://golang.org/cl/169726 mentions this issue:
Reproduce:
Though the test passes, observe this line:
So something is unhealthy about s390x's backtrace of non-running goroutines.
This ended up being surfaced more visibly in build failures after CL 99078, e.g. https://build.golang.org/log/17e01b4804101ee50ca882c4520a3337818e42da. The CL was then reverted to fix the s390x build. Once this issue is fixed, let's re-apply CL 99078.
I don't plan to look into this further. I'm hoping someone with more s390x expertise can take over from here.
cc @mundaym @aclements