runtime: morestack on gsignal signal: trace/breakpoint trap due to g0 stack misattribution #43853

Closed
prattmic opened this issue Jan 22, 2021 · 4 comments
Labels: FrozenDueToAge, NeedsFix

Comments

@prattmic
Member

In rare cases on linux/amd64 race builds we've seen crashes that look like:

fatal: morestack on gsignal

signal: trace/breakpoint trap (core dumped)

The root cause is signal delivery on a sigaltstack allocated very close to the g0 stack. When cgo is enabled, the g0 stack is created on the C side, so mstart only estimates its bounds. This is a rough estimate, and the computed g0 stack.lo may actually lie below the real end of the g0 stack.
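A minimal Go sketch of that estimate, under assumptions based on the Go 1.16-era runtime rather than anything stated in this issue: the C side passes only the thread's stack size, mstart takes roughly its own SP as stack.hi, and it sets stack.lo = stack.hi - size + 1024. The name estimateG0Bounds is hypothetical, and the addresses assume a 64-bit uintptr. Plugging in the captured g0 stack.hi and an assumed 8 MiB thread stack reproduces the captured g0 bounds, since their width (0x7ffc00 bytes) is exactly 8 MiB minus 1024.

```go
package main

import "fmt"

// fudge is the 1024-byte adjustment mstart applies to the estimated stack.lo.
const fudge = 1024

// estimateG0Bounds is a hypothetical model of mstart's cgo-path estimate:
// the C side supplies only the stack size, and mstart treats (roughly)
// its current SP as the top of the stack.
func estimateG0Bounds(sp, size uintptr) (lo, hi uintptr) {
	hi = sp
	lo = hi - size + fudge
	return lo, hi
}

func main() {
	// Captured g0 stack.hi from the crash, plus an assumed 8 MiB stack size.
	lo, hi := estimateG0Bounds(0x7f99849fdad8, 8<<20)
	fmt.Printf("[%#x, %#x)\n", lo, hi) // [0x7f99841fded8, 0x7f99849fdad8)
}
```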

On signal delivery, adjustSignalStack may then incorrectly conclude that the signal was delivered on the g0 stack. Since the overlap is likely to be very close to g0 stack.lo, functions in the signal handling path have a high probability of "running out of stack space" and calling morestack. Boom.
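To see why sitting close to stack.lo is fatal, here is a simplified Go model of the prologue check that decides whether a function calls morestack. The real check differs in detail, needsMoreStack is a hypothetical name, and the 928-byte guard is an assumed value for illustration only. With the captured SP only 1104 bytes above the misattributed stack.lo, any frame larger than 1104 - 928 = 176 bytes trips the check.

```go
package main

import "fmt"

// needsMoreStack is a simplified model of a Go function prologue: it
// reports whether a frame of the given size would push SP below lo+guard.
func needsMoreStack(sp, lo, guard, frame uintptr) bool {
	return sp-frame < lo+guard
}

func main() {
	sp := uintptr(0x7f99841fe328) // SP on sigtrampgo entry
	lo := uintptr(0x7f99841fded8) // estimated (misattributed) g0 stack.lo
	const guard = 928             // assumed guard size, for illustration only

	for _, frame := range []uintptr{64, 176, 256, 1024} {
		fmt.Println(frame, needsMoreStack(sp, lo, guard, frame))
	}
}
```

This should print false for the 64- and 176-byte frames and true for the 256- and 1024-byte frames.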

Here's one example of overlap I captured:

Our SP on sigtrampgo entry: 0x7f99841fe328
sigaltstack from sigcontext: [0x7f99841ef000, 0x7f99841ff000)
g0 stack from gp.m.g0.stack: [0x7f99841fded8, 0x7f99849fdad8)
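A short Go check of the captured numbers, treating each stack as a half-open range [lo, hi). The stackRange type and its helpers are hypothetical. It shows that the SP lies in both the sigaltstack and the estimated g0 stack, that the two ranges overlap by 4392 bytes, and that the SP is only 1104 bytes above the estimated g0 stack.lo.

```go
package main

import "fmt"

// stackRange is a half-open address range [lo, hi).
type stackRange struct{ lo, hi uintptr }

func (r stackRange) contains(sp uintptr) bool { return sp >= r.lo && sp < r.hi }

// overlap returns the number of bytes shared by a and b (0 if disjoint).
func overlap(a, b stackRange) uintptr {
	lo, hi := a.lo, a.hi
	if b.lo > lo {
		lo = b.lo
	}
	if b.hi < hi {
		hi = b.hi
	}
	if hi <= lo {
		return 0
	}
	return hi - lo
}

func main() {
	sp := uintptr(0x7f99841fe328)
	alt := stackRange{0x7f99841ef000, 0x7f99841ff000}
	g0 := stackRange{0x7f99841fded8, 0x7f99849fdad8}

	fmt.Println(alt.contains(sp), g0.contains(sp)) // true true
	fmt.Println(overlap(alt, g0))                  // 4392
	fmt.Println(sp - g0.lo)                        // 1104
}
```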

mstart contains a fudge factor of 1024 bytes to try to account for this inaccuracy, but checking against pthread_attr_getstack indicates that the SP in mstart is actually 9616 bytes below the top of the stack. (That may be off by one page (4096 bytes); I need to double-check. Either way, it is more than 1024 bytes.)
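The size of the misestimate follows from simple arithmetic. Assuming the size mstart uses equals the real stack size, write top for the real top of the stack and d for the distance from top down to mstart's SP. The estimate is then lo = (top - d) - size + 1024 = (top - size) - (d - 1024), which is d - 1024 bytes below the real bottom. A tiny Go sketch (loError is a hypothetical helper):

```go
package main

import "fmt"

// loError returns how many bytes below the real stack bottom the estimated
// stack.lo lands, given the distance from the real top down to mstart's SP
// and the fudge factor. It assumes mstart's size equals the real stack size.
func loError(distFromTop, fudge int64) int64 {
	return distFromTop - fudge
}

func main() {
	fmt.Println(loError(9616, 1024))      // 8592
	fmt.Println(loError(9616-4096, 1024)) // 4496, if the measurement is a page off
}
```

Either way, the estimated stack.lo lands several KiB below the real stack, leaving room for an adjacent mapping such as a sigaltstack to fall inside the estimated g0 range.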

cc @cherrymui @aclements

@prattmic added the NeedsFix label Jan 22, 2021
@prattmic
Member Author

pthread_attr_getstack can provide accurate stack bounds. However, I'm not convinced we can use it portably: for example, glibc's implementation looks like it always succeeds, but NetBSD's appears able to return NULL for the stack address.
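One way to act on this would be to prefer pthread_attr_getstack when it gives a usable answer and keep the SP-based estimate otherwise. The Go sketch below is a hypothetical policy, not what the runtime does: pickG0Bounds and its parameters are invented for illustration, and the inputs stand in for values a cgo helper would obtain (on glibc, the attributes of an already-running thread come from the non-portable pthread_getattr_np). It relies on POSIX defining the reported stack address as the lowest addressable byte of the stack.

```go
package main

import "fmt"

// bounds is a half-open stack range [lo, hi).
type bounds struct{ lo, hi uintptr }

// pickG0Bounds is a hypothetical policy. addr and size are what
// pthread_attr_getstack reported, and rc is its return value (0 on success).
// The exact range is used only when the call succeeded with a non-NULL
// address; otherwise the SP-based estimate is kept.
func pickG0Bounds(addr, size uintptr, rc int, estimate bounds) bounds {
	if rc != 0 || addr == 0 || size == 0 {
		return estimate
	}
	// POSIX: addr is the lowest addressable byte of the stack.
	return bounds{lo: addr, hi: addr + size}
}

func main() {
	// Hypothetical inputs, for illustration only.
	est := bounds{lo: 0x7400, hi: 0x9000}
	exact := pickG0Bounds(0x1000, 0x8000, 0, est)
	fallback := pickG0Bounds(0, 0x8000, 0, est) // NULL address, as NetBSD may report
	fmt.Printf("[%#x, %#x)\n", exact.lo, exact.hi)
	fmt.Printf("[%#x, %#x)\n", fallback.lo, fallback.hi)
}
```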

@gopherbot

Change https://golang.org/cl/285772 mentions this issue: runtime: check for g0 stack last in signal handler
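The CL title suggests the fix is about ordering: consider the g0 stack only after ruling out the signal stacks. A minimal Go sketch of that idea, assuming the pre-fix order was gsignal, then g0, then the current sigaltstack. The real adjustSignalStack queries sigaltstack(2) rather than taking a range, and the function names here are hypothetical. With the captured values, the old order picks g0 while the g0-last order picks the sigaltstack.

```go
package main

import "fmt"

// stackRange is a half-open address range [lo, hi).
type stackRange struct{ lo, hi uintptr }

func (r stackRange) contains(sp uintptr) bool { return sp >= r.lo && sp < r.hi }

// classifyG0First models the assumed old order: gsignal, g0, sigaltstack.
func classifyG0First(sp uintptr, gsignal, g0, alt stackRange) string {
	switch {
	case gsignal.contains(sp):
		return "gsignal"
	case g0.contains(sp):
		return "g0"
	case alt.contains(sp):
		return "sigaltstack"
	}
	return "unknown"
}

// classifyG0Last models the order suggested by the CL title: g0 is
// considered only when SP is on neither signal stack.
func classifyG0Last(sp uintptr, gsignal, g0, alt stackRange) string {
	switch {
	case gsignal.contains(sp):
		return "gsignal"
	case alt.contains(sp):
		return "sigaltstack"
	case g0.contains(sp):
		return "g0"
	}
	return "unknown"
}

func main() {
	sp := uintptr(0x7f99841fe328)
	gsignal := stackRange{} // assume SP is not on Go's own gsignal stack
	alt := stackRange{0x7f99841ef000, 0x7f99841ff000}
	g0 := stackRange{0x7f99841fded8, 0x7f99849fdad8}

	fmt.Println(classifyG0First(sp, gsignal, g0, alt)) // g0
	fmt.Println(classifyG0Last(sp, gsignal, g0, alt))  // sigaltstack
}
```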

@cherrymui
Member

#26061 is related.

@prattmic
Member Author

Shall we do a backport for this? It certainly affects 1.15, and I believe 1.14 as well.

@golang golang locked and limited conversation to collaborators Jan 22, 2022