runtime: "goroutine stack exceeds 250000000-byte limit" on linux-arm #35470

Closed
bcmills opened this issue Nov 8, 2019 · 10 comments
Labels
FrozenDueToAge, NeedsInvestigation, release-blocker
Comments

@bcmills
Contributor

bcmills commented Nov 8, 2019

We're seeing stack overflows this week in various tests on the linux-arm builder.

This may be related to #35349, but the stack traces on linux-arm are more diverse.

CC @ianlancetaylor @aclements @mknyszek @cherrymui

2019-11-08T19:24:30-e6c12c3/linux-arm
2019-11-07T19:20:35-ceca99b/linux-arm
2019-11-07T18:39:03-05aa4a7/linux-arm
2019-11-07T16:13:31-0bf2eb5/linux-arm
2019-11-05T20:56:05-81559af/linux-arm
2019-11-05T17:19:16-1b3a1db/linux-arm

@bcmills bcmills added the NeedsInvestigation and release-blocker labels Nov 8, 2019
@bcmills bcmills added this to the Go1.14 milestone Nov 8, 2019
@ianlancetaylor
Contributor

These stacks are not large at all, so the problem is not actual stack overflow, but a false report of stack overflow.

@cherrymui
Member

I can imagine one possibility. Suppose there are both a synchronous preemption request (made by clobbering the stack guard) and an asynchronous one (made by signal). A goroutine in a function prologue first sees the clobbered stack guard, so it will call morestack. If the signal lands after the CMP instruction but before the call to morestack, the goroutine is asynchronously preempted and enters the scheduler. When it is resumed, the scheduler clears the preemption request and unclobbers the stack guard. But the resumed goroutine will still call morestack, as it has already passed the CMP instruction. Since there is no longer a preemption request, morestack doubles the stack unnecessarily. If this happens multiple times, the stack may grow too big even though only a small amount is actually used.
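
To get a feel for how many times this would have to happen, here is a rough model of the repeated doubling. It is only a sketch, not runtime code: `spuriousDoublings` is a hypothetical helper, the 2048-byte starting size is an assumption about the initial goroutine stack, and it assumes the overflow is reported when the doubled size would exceed the limit. It also ignores real stack growth and any stack shrinking.

```go
package main

import "fmt"

const (
	maxStackSize = 250000000 // byte limit quoted in the error message
	minStackSize = 2048      // assumed initial goroutine stack size
)

// spuriousDoublings counts the morestack calls needed, starting from a
// stack of start bytes and doubling on every call, until a call would
// produce a stack larger than limit. The returned count includes the
// final call that trips the limit.
func spuriousDoublings(start, limit int) int {
	n := 0
	for size := start; ; size *= 2 {
		n++
		if size*2 > limit {
			return n
		}
	}
}

func main() {
	fmt.Println(spuriousDoublings(minStackSize, maxStackSize)) // prints 17
}
```

Under these assumptions, 16 unnecessary doublings take the stack from 2 KiB to 128 MiB, and the 17th request fails.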

I made it print the current stack bounds in the stack-too-large error message, and the stack is indeed quite large, with only a small amount used:

runtime: goroutine stack exceeds 250000000-byte limit
	sp=0x1847dad8 stack=[0x10480000, 0x18480000]
fatal error: stack overflow

In theory this can happen on other platforms. Not sure why this is only seen on the ARM builder.
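
The numbers in that message can be checked by hand. The sketch below uses a hypothetical helper, `stackUsage`, and assumes the stack spans `[lo, hi)` and grows down from `hi` toward `sp`.

```go
package main

import "fmt"

// stackUsage returns the total size of a stack spanning [lo, hi) and the
// number of bytes in use, assuming the stack grows down from hi to sp.
func stackUsage(lo, hi, sp uint64) (size, used uint64) {
	return hi - lo, hi - sp
}

func main() {
	const limit = 250000000 // byte limit quoted in the error message

	// Bounds and SP copied from the error message above.
	size, used := stackUsage(0x10480000, 0x18480000, 0x1847dad8)
	fmt.Printf("allocated: %d bytes (%d MiB)\n", size, size>>20)
	fmt.Printf("in use:    %d bytes\n", used)
	fmt.Printf("doubling to %d bytes exceeds the limit: %v\n", size*2, size*2 > limit)
}
```

The stack is 128 MiB with 9512 bytes in use, and doubling it once more gives 268435456 bytes, which is over the 250000000-byte limit.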

@cherrymui
Member

Maybe we want to disable async preemption in the function prologue between the CMP instruction and the call to morestack? As it will call morestack, it will be preempted anyway.

@randall77
Contributor

@cherrymui That sounds like a good idea.

I think we might have to start at the load of the stack guard, as the CMP result is predestined at that point. But then maybe we need to only prevent async preemption if that loaded value is in fact the preempted guard.

Tricky.

@gopherbot

Change https://golang.org/cl/207350 mentions this issue: cmd/internal/obj: mark split-stack prologue nonpreemptible

@gopherbot

Change https://golang.org/cl/207351 mentions this issue: runtime: print more information on stack overflow

@gopherbot

Change https://golang.org/cl/207349 mentions this issue: cmd/internal/obj, runtime: use register map to mark unsafe points

@ianlancetaylor
Contributor

See also #35784.

gopherbot pushed a commit that referenced this issue Nov 27, 2019
Currently we use stack map index -2 to mark unsafe points, i.e.
PC ranges that are not safe for async preemption. This has a
problem: it cannot mark CALL instructions, because for stack scan
a valid stack map index is needed.

This CL switches to use register map index for marking unsafe
points instead, which does not conflict with stack scan and can
be applied to CALL instructions. This is necessary as the next CL
will mark the call to morestack nonpreemptible.

For #35470.

Change-Id: I357bf26c996e1fee1e7eebe4e6bb07d62930d3f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/207349
Reviewed-by: David Chase <drchase@google.com>
gopherbot pushed a commit that referenced this issue Nov 27, 2019
Print the current SP and (old) stack bounds when the stack grows
too large. This helps to identify the problem: whether a large
stack is used, or something else goes wrong.

For #35470.

Change-Id: I34a4064d5c7280978391d835e171b90d06f87222
Reviewed-on: https://go-review.googlesource.com/c/go/+/207351
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
@aclements
Member

Should we also close #35784?

@cherrymui
Member

Yeah, I think we can close that.

@golang golang locked and limited conversation to collaborators Dec 1, 2020