Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/internal/obj: rework stack unwind metadata on LR machines #53609

Open
cherrymui opened this issue Jun 29, 2022 · 3 comments
Open

cmd/internal/obj: rework stack unwind metadata on LR machines #53609

cherrymui opened this issue Jun 29, 2022 · 3 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@cherrymui
Copy link
Member

On LR machines, our function prologue generally looks like (stack bounds check and frame pointer handling omitted): store the LR at -frame_size(SP), then decrement SP to SP-frame_size (if they cannot be done on the same instruction, either due to the lack of the instruction on the architecture or the stack frame too large).

If we decrement the SP first, if a signal arrives immediately after the instruction before saving the LR, the runtime will see the junk value at the LR slot and will fail to unwind the stack. So we store the LR first then decrement the SP.

This generally works. But in some cases if the signal stack is not set (e.g. #53374 , also iOS doesn't support sigaltstack), the signal will arrive on the current stack and the kernel will push the signal frame immediately below the SP, clobbering our saved LR. In those cases we have to store the LR again after decrementing the SP. This makes our function prologue a bit inefficient.

We can avoid the problem if we expand the stack unwind metadata to express "the SP has decremented but the return address is still in the LR register". Currently, the metadata is just about SP delta and the runtime always assumes the return address can be found at 0(SP) (for non-leaf functions). With the expanded metadata, we can decrement SP and then store the LR.

This also makes our prologue more similar to C functions.

I plan to do it in Go 1.20.

@cherrymui cherrymui self-assigned this Jun 29, 2022
@cherrymui cherrymui added the NeedsFix The path to resolution is known, but the work has not been done. label Jun 29, 2022
@cherrymui cherrymui added this to the Go1.20 milestone Jun 29, 2022
@gopherbot
Copy link

Change https://go.dev/cl/412474 mentions this issue: cmd/internal/obj/arm64: save LR and SP in one instruction for small frames

gopherbot pushed a commit that referenced this issue Jun 29, 2022
…rames

When we create a thread with signals blocked. But glibc's
pthread_sigmask doesn't really allow us to block SIGSETXID. So we
may get a signal early on before the signal stack is set. If we
get a signal on the current stack, it will clobber anything below
the SP. This CL makes it to save LR and decrement SP in a single
MOVD.W instruction for small frames, so we don't write below the
SP.

We used to use a single MOVD.W instruction before CL 379075.
CL 379075 changed to use an STP instruction to save the LR and FP,
then decrementing the SP. This CL changes it back, just this part
(epilogues and large frame prologues are unchanged). For small
frames, it is the same number of instructions either way.

This decreases the size of a "small" frame from 0x1f0 to 0xf0.
For frame sizes in between, it could benefit from using an
STP instruction instead of using the prologue for the "large"
frame case. We don't bother it for now as this is a stop-gap
solution anyway.

This only addresses the issue with small frames. Luckily, all
functions from thread entry to setting up the signal stack have
samll frames.

Other possible ideas:
- Expand the unwind info metadata, separate SP delta and the
  location of the return address, so we can express "SP is
  decremented but the return address is in the LR register". Then
  we can always create the frame first then write the LR, without
  writing anything below the SP (except the frame pointer at SP-8,
  which is minor because it doesn't really affect program
  execution).
- Set up the signal stack immediately in mstart in assembly.

For Go 1.19 we do this simple fix. We plan to do the metadata fix
in Go 1.20 ( #53609 ).

Other LR architectures are addressed in CL 413428.

Fix #53374.

Change-Id: I9d6582ab14ccb06ac61ad43852943d9555e22ae5
Reviewed-on: https://go-review.googlesource.com/c/go/+/412474
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Eric Fang <eric.fang@arm.com>
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
jproberts pushed a commit to jproberts/go that referenced this issue Aug 10, 2022
…rames

When we create a thread with signals blocked. But glibc's
pthread_sigmask doesn't really allow us to block SIGSETXID. So we
may get a signal early on before the signal stack is set. If we
get a signal on the current stack, it will clobber anything below
the SP. This CL makes it to save LR and decrement SP in a single
MOVD.W instruction for small frames, so we don't write below the
SP.

We used to use a single MOVD.W instruction before CL 379075.
CL 379075 changed to use an STP instruction to save the LR and FP,
then decrementing the SP. This CL changes it back, just this part
(epilogues and large frame prologues are unchanged). For small
frames, it is the same number of instructions either way.

This decreases the size of a "small" frame from 0x1f0 to 0xf0.
For frame sizes in between, it could benefit from using an
STP instruction instead of using the prologue for the "large"
frame case. We don't bother it for now as this is a stop-gap
solution anyway.

This only addresses the issue with small frames. Luckily, all
functions from thread entry to setting up the signal stack have
samll frames.

Other possible ideas:
- Expand the unwind info metadata, separate SP delta and the
  location of the return address, so we can express "SP is
  decremented but the return address is in the LR register". Then
  we can always create the frame first then write the LR, without
  writing anything below the SP (except the frame pointer at SP-8,
  which is minor because it doesn't really affect program
  execution).
- Set up the signal stack immediately in mstart in assembly.

For Go 1.19 we do this simple fix. We plan to do the metadata fix
in Go 1.20 ( golang#53609 ).

Other LR architectures are addressed in CL 413428.

Fix golang#53374.

Change-Id: I9d6582ab14ccb06ac61ad43852943d9555e22ae5
Reviewed-on: https://go-review.googlesource.com/c/go/+/412474
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Eric Fang <eric.fang@arm.com>
@gopherbot
Copy link

Change https://go.dev/cl/424254 mentions this issue: runtime: drop function context from traceback

gopherbot pushed a commit that referenced this issue Sep 2, 2022
Currently, gentraceback tracks the closure context of the outermost
frame. This used to be important for "unstarted" calls to reflect
function stubs, where "unstarted" calls are either deferred functions
or the entry-point of a goroutine that hasn't run. Because reflect
function stubs have a dynamic argument map, we have to reach into
their closure context to fetch to map, and how to do this differs
depending on whether the function has started. This was discovered in
issue #25897.

However, as part of the register ABI, "go" and "defer" were made much
simpler, and any "go" or "defer" of a function that takes arguments or
returns results gets wrapped in a closure that provides those
arguments (and/or discards the results). Hence, we'll see that closure
instead of a direct call to a reflect stub, and can get its static
argument map without any trouble.

The one case where we may still see an unstarted reflect stub is if
the function takes no arguments and has no results, in which case the
compiler can optimize away the wrapper closure. But in this case we
know the argument map is empty: the compiler can apply this
optimization precisely because the target function has no argument
frame.

As a result, we no longer need to track the closure context during
traceback, so this CL drops all of that mechanism.

We still have to be careful about the unstarted case because we can't
reach into the function's locals frame to pull out its context
(because it has no locals frame). We double-check that in this case
we're at the function entry.

I would prefer to do this with some in-code PCDATA annotations of
where to find the dynamic argument map, but that's a lot of mechanism
to introduce for just this. It might make sense to consider this along
with #53609.

Finally, we beef up the test for this so it more reliably forces the
runtime down this path. It's fundamentally probabilistic, but this
tweak makes it better. Scheduler testing hooks (#54475) would make it
possible to write a reliable test for this.

For #54466, but it's a nice clean-up all on its own.

Change-Id: I16e4f2364ba2ea4b1fec1e27f971b06756e7b09f
Reviewed-on: https://go-review.googlesource.com/c/go/+/424254
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
@cherrymui
Copy link
Member Author

Didn't get chance to do it in 1.20. Will do in 1.21.

@cherrymui cherrymui modified the milestones: Go1.20, Go1.21 Dec 5, 2022
@cherrymui cherrymui modified the milestones: Go1.21, Go1.22 Jun 1, 2023
@cherrymui cherrymui modified the milestones: Go1.22, Go1.23 Nov 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Projects
Development

No branches or pull requests

2 participants