Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile, runtime: add and use go:nosplitrec compilation directive #21314

Open
ianlancetaylor opened this issue Aug 4, 2017 · 34 comments
Open
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@ianlancetaylor
Copy link
Contributor

A key error leading to #21306 was a call from a nosplit function to a function not marked nosplit, permitting preemption at a point where it was not permitted.

Therefore, just as we have go:nowritebarrierrec meaning that such a function may not call any function that permits write barriers, we should have go:nosplitrec meaning that such a function may not call any function that permits preemption.

Since nosplit is now a misnomer--the stack no longer splits--we could also consider a name like go:nopreemptrec.

@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone Aug 4, 2017
@bradfitz bradfitz added the NeedsFix The path to resolution is known, but the work has not been done. label Aug 4, 2017
@cherrymui
Copy link
Member

cherrymui commented Aug 4, 2017

Shouldgo:nosplit actually mean recursively nosplit? I think usually we want it doesn't grow stack or be preempted in the whole duration of the function. When would we want a particular function to be nosplit but it can call other functions that are not nosplit?

@ianlancetaylor
Copy link
Contributor Author

A function like sigtrampgo is go:nosplit because it is called by the signal handler and g may be nil, but sigtrampgo sets up g and once that is done it is fine to call non-nosplit functions like sighandler.

Of course, we could change go:nosplit to default to being recursive and add a go:yessplit annotation, along the lines of go:yeswritebarrierrec.

@aclements
Copy link
Member

Since nosplit is now a misnomer--the stack no longer splits--we could also consider a name like go:nopreemptrec.

The stack no longer splits, but there are places where we use this to mean "no stack growth", not "no preemption". Both currently have the same consequence, but maybe now would be a good time to tease them apart?

Of course, we could change go:nosplit to default to being recursive and add a go:yessplit annotation, along the lines of go:yeswritebarrierrec.

I like this idea in general, but it may be a bit of a pain. Maybe it would make sense to combine making it recursive by default with renaming it?

@aclements
Copy link
Member

I've been looking through various uses of nosplit and there seem to be four distinct reasons for it. While we're pondering introducing a new annotation, I believe it's worth considering splitting out these different reasons, even if they have the same or similar consequences at the moment:

  • We're simply in a bad state for stack growth/scheduling. Ian's sigtrampgo example falls into this category (though I'm not convinced that's real; isn't it always called on the signal stack?). Also functions like mstart and needm, though again I think these may always be called on the g0 or gsignal stacks anyway. In this case, these functions may put the system into a state that's okay for growth/scheduling and then call regular functions, so this is where a non-recursive go:nosplit is still useful.

  • There are slots on the stack with unknown type, so we can't safely move or walk the stack. For example, everything related to entersyscall and exitsyscall, newdefer and deferreturn, as well as a lot of things related to the buffered write barrier fall in this category. This must be recursive and there's probably never any need to opt out of it. Interestingly, this is still about both stack moving and preemption because it's not safe to preempt and scan the stack. For this, I propose a new annotation called go:noscan (since both stack moving and preemption are about scanning) that applies recursively and prevents stack moving and preemption. In gc, this would drop the stack check prologue and could also affect loop preemption/etc. In toolchains that use segmented stacks like gccgo, I believe it is safe to create new stack segments in these situations, so it could block only preemption.

  • The function must not be preempted. For example, dolockOSThread and related functions muck with scheduler state and would break if their G moved. Similarly, there are functions like typedmemmove that must prevent preemption to block GC phase transitions. In these cases, it's actually okay to move the stack, but it's not okay to preempt them. I believe this should always apply recursively. For this, I propose a go:nosched annotation.

  • The function doesn't care, but it's a helper function to one of the above. I believe this is actually the vast majority of nosplit functions, which is unfortunate because there's rarely a clear reason why they're nosplit, making them very hard to maintain. For this, I propose that the compiler should automatically propagate the recursive annotations above as much as it can. The linker would still check everything in the end, so it would still be necessary to manually propagate across assembly and package boundaries, but this is quite rare.

@ianlancetaylor
Copy link
Contributor Author

sigtrampgo is always called on the gsignal stack, but it is called directly from the signal handler, so the global g will point to the goroutine that was running when the signal was delivered. The stack check prologue will compare the stack pointer on the gsignal stack to g's goroutine stack, and do something horrible. We need to disable the stack check prologue for that function.

@gopherbot
Copy link

Change https://golang.org/cl/79615 mentions this issue: runtime: document sigtrampgo better

gopherbot pushed a commit that referenced this issue Nov 23, 2017
Add an explanation of why sigtrampgo is nosplit.

Updates #21314.

Change-Id: I3f5909d2b2c180f9fa74d53df13e501826fd4316
Reviewed-on: https://go-review.googlesource.com/79615
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/79815 mentions this issue: runtime: fix final stack split in exitsyscall

gopherbot pushed a commit that referenced this issue Nov 24, 2017
exitsyscall should be recursively nosplit, but we don't have a way to
annotate that right now (see #21314). There's exactly one remaining
place where this is violated right now: exitsyscall -> casgstatus ->
print. The other prints in casgstatus are wrapped in systemstack
calls. This fixes the remaining print.

Updates #21431 (in theory could fix it, but that would just indicate
that we have a different G status-related crash and we've *never* seen
that failure on the dashboard.)

Change-Id: I9a5e8d942adce4a5c78cfc6b306ea5bda90dbd33
Reviewed-on: https://go-review.googlesource.com/79815
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@cherrymui cherrymui modified the milestones: Go1.10, Go1.11 Nov 30, 2017
@cherrymui
Copy link
Member

I assume the change won't happen in Go 1.10.
@aclements has a plan with a few new directives.

@aclements
Copy link
Member

Yeah, not happening for 1.10. I might try to prototype this before the tree opens, though.

Some thoughts from discussing these annotations with others:

  • It's possible we should rename go:nosplit to, say, go:nostackcheck. However, go:nosplit is publicly documented (go doc cmd/compile) and is used occasionally in code outside std. We could, of course, accept both and change it in the runtime, but it's not clear to me that's worthwhile.

  • It's possible we should have go:gcatomic instead of go:nosched. That's certainly more specific in many cases that need to block preemption in order to block GC phase transitions. On the other hand, it doesn't cover a handful of uses like dolockOSThread that specifically want to block G/M/P migration. Maybe those should just use go:nosplit or possibly acquirem/releasem. (/cc @RLH for thoughts)

There are also two more, more obscure uses of go:nosplit: 1) as an optimization in really hot functions, though I wasn't able to find any examples of this, and 2) on functions that call getcallerpc. The latter is for historical reasons, from when we actually split stacks and didn't want to sometimes get the PC of the "return from split" function. @ianlancetaylor, does gccgo still care about this? Even if it does, presumably the compiler should just infer this from the presence of a call to getcallerpc.

@ianlancetaylor
Copy link
Contributor Author

gccgo never required any special handling for getcallerpc. It is implemented by a compiler intrinsic that simply fetches the return address (the GCC intrinsic __builtin_return_address).

@cherrymui
Copy link
Member

I think @aclements's question is that we may want getcallerpc to return the PC of the actual caller, instead of __morestack (or whatever the trampoline is) if the function calling getcallerpc splits stack. Fetching the return address from the stack probably returns the latter.

@ianlancetaylor
Copy link
Contributor Author

@cherrymui Ah, got it, thanks. Yes, for that to work right, in gccgo, any function that calls getcallerpc ought to be marked nosplit. (Which is not actually true today.)

@aclements
Copy link
Member

Thanks. Could gccgo instead automatically mark any function that calls getcallerpc as nosplit? (Or getcallerpc could handle the trampoline return PC, like how it used to handle the stack barrier return PC.)

@ianlancetaylor
Copy link
Contributor Author

gccgo could certainly mark any function that calls getcallerpc as nosplit, although it would be risky as gccgo has no test that prevents stack overflow of nosplit functions; it just relies on gc's test for that plus the fact that gccgo has a larger red zone. Adding that test would be difficult as there is no good place to put it, since gccgo is just a frontend that has no knowledge of stack sizes, and it uses an unmodified system linker.

Having getcallerpc handle the trampoline return PC would drastically affect its efficiency. In gccgo the only way to do that is to, in effect, turn getcallerpc into runtime.Callers. That would make it necessary to go through the gccgo version of libgo and remove the calls to getcallerpc that are only there for the race detector. Which could of course be done.

@aclements
Copy link
Member

Having getcallerpc handle the trampoline return PC would drastically affect its efficiency. In gccgo the only way to do that is to, in effect, turn getcallerpc into runtime.Callers.

Oh dear. The stack trampoline doesn't save its own return PC anywhere convenient?

@ianlancetaylor
Copy link
Contributor Author

gccgo lives in a more complex world than gc. The stack splitting code is not part of the Go frontend, it's part of GCC itself and the GCC support libraries. The Go library has no access to the data structures used by the GCC support libraries.

It is also true that the stack trampoline doesn't save the return PC anywhere in particular. It just saves the stack pointer. Quoting gcc/libgcc/config/i386/morestack.S:

# We do a little dance so that the processor's call/return return
# address prediction works out.  The compiler arranges for the caller
# to look like this:
#   call __generic_morestack
#   ret
#  L:
#   // carry on with function
# After we allocate more stack, we call L, which is in our caller.
# When that returns (to the predicted instruction), we release the
# stack segment and reset the stack pointer.  We then return to the
# predicted instruction, namely the ret instruction immediately after
# the call to __generic_morestack.  That then returns to the caller of
# the original caller.

@gopherbot
Copy link

Change https://golang.org/cl/83015 mentions this issue: runtime: mark heapBits.bits nosplit

gopherbot pushed a commit that referenced this issue Dec 11, 2017
heapBits.bits is used during bulkBarrierPreWrite via
heapBits.isPointer, which means it must not be preempted. If it is
preempted, several bad things can happen:

1. This could allow a GC phase change, and the resulting shear between
the barriers and the memory writes could result in a lost pointer.

2. Since bulkBarrierPreWrite uses the P's local write barrier buffer,
if it also migrates to a different P, it could try to append to the
write barrier buffer concurrently with another write barrier. This can
result in the buffer's next pointer skipping over its end pointer,
which results in a buffer overflow that can corrupt arbitrary other
fields in the Ps (or anything in the heap, really, but it'll probably
crash from the corrupted P quickly).

Fix this by marking heapBits.bits go:nosplit. This would be the
perfect use for a recursive no-preempt annotation (#21314).

This doesn't actually affect any binaries because this function was
always inlined anyway. (I discovered it when I was modifying heapBits
and make h.bits() no longer inline, which led to rampant crashes from
problem 2 above.)

Updates #22987 and #22988 (but doesn't fix because it doesn't actually
change the generated code).

Change-Id: I60ebb928b1233b0613361ac3d0558d7b1cb65610
Reviewed-on: https://go-review.googlesource.com/83015
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@aclements
Copy link
Member

Another instance where this would have helped: https://go-review.googlesource.com/c/go/+/85880/1

@aclements
Copy link
Member

@ianlancetaylor, I'm sure it's more complicated than this, but reading over gcc's __morestack for x86-64, since the rsp from the old stack segment gets saved in rbp when __morestack calls L, could getcallerpc avoid all the complexity of actually walking the stack and work roughly like

x = get obvious caller pc (using rbp or frame size or whatever)
if x is not in __morestack
    return x
return *(saved rbp + 16)

I think the stack around the saved rbp looks like

f's return PC
morestack's return PC (to f)
morestack's saved rbp         <-- f's saved rbp

@ianlancetaylor
Copy link
Contributor Author

@aclements The whole world is not an x86. Also, libgcc is not part of the gofrontend. I'd rather avoid having gofrontend depend on implementation details of libgcc.

I don't think it matters, though. I looked through gccgo's version of the runtime package, and, remembering that gccgo does not support the race detector, there was only one meaningful call to getcallerpc, from newosproc. It would be easy to make just that function nosplit, or, since it is called only from compiler-generated code, change the way it is called.

@aclements
Copy link
Member

I'd rather avoid having gofrontend depend on implementation details of libgcc.

Sure. I hadn't thought that it would. I assumed (perhaps incorrectly) that getcallerpc would be lowered to __builtin_return_address and that __builtin_return_address would potentially have to deal with this issue even in non-Go code that used -fsplit-stack.

I don't think it matters, though. ... It would be easy to make just that function nosplit, or, since it is called only from compiler-generated code, change the way it is called.

Ah, great. That sounds like an easy solution. :)

@aclements
Copy link
Member

I've been experimenting with my proposed go:noscan annotation and I'm increasingly thinking that the recursive effect is a mistake, but it could have a local effect combined with recursive verification.

The problem with automatically propagating it recursively is it can easily worm its way into lots of surprising parts of the runtime. Most of its surprise reach is easy to fix, but fixing it requires noticing it, and if the propagation is automatic, it's hard to notice.

But I also want to fix the problem of nosplit being added to helpers and then being hard to remove because you don't know why it was added.

So my alternate proposal is: Add both a go:noscan annotation and a go:noscancalled annotation. Both have the effect of disabling stack scanning for only the annotated function (meaning no split check in gc). The linker checks that functions are go:noscancalled if and only if they are reachable from a go:noscan function. This way, the effect of these annotations is local and maintained directly in the code (and hence more predictable), but the verification is still recursive. By making the check strict, we avoid keeping around unneeded annotations (much like our strict import and variable use checks).

@ianlancetaylor
Copy link
Contributor Author

Are you suggesting any check for an unannotated function called by a go:noscan function? Or will that just be a potential bug?

@aclements
Copy link
Member

Are you suggesting any check for an unannotated function called by a go:noscan function?

I am. I'm suggesting that the linker should check that all of the functions called by a go:noscan function must be go:noscancalled (perhaps or go:nosplit to make the transition easier). I'm also suggesting that the linker check that no additional functions are marked go:noscancalled. This would mean there are no surprises in what gets its stack check prologue removed.

One potential downside is that different build configurations may have somewhat different sets of go:noscancalled functions. I'm not sure how to deal with that.

Another possibility would be propagate the go:noscan annotation automatically (no go:noscancalled annotation), but have diagnostic output to report exactly which functions were automatically marked. We could check that in so there's an obvious diff in that file when something changes.

@aclements
Copy link
Member

Alternatively, maybe doing this all purely statically is just the wrong approach. In most uses, it's fine if these helper functions grow the stack, so it's a little unfortunate that just because one of their callers is noscan, they lose the ability to grow that stack for all uses.

A possible hybrid static/dynamic approach would be for a noscan function to set a flag on entry (and clear it on exit) that inhibits stack growth even if the stack check fails and for the linker to check that the call tree does in fact fit in the nosplit space, so ignoring the stack check is okay.

This would have to be integrated with the linker's nosplit check because if a nosplit function calls a noscan function, the caller (and any further nosplit functions above it) have to count against the space. Things also get dicey with indirect calls, though that's already dicey (and as it turns out we don't make any problematic indirect calls in noscan call trees).

This approach wouldn't involve any recursive effect on generated code and would let us remove most of the nosplits from the runtime, while still letting us statically ensure the safety of this code.

@bcmills
Copy link
Contributor

bcmills commented Jan 16, 2018

In most uses, it's fine if these helper functions grow the stack, so it's a little unfortunate that just because one of their callers is noscan, they lose the ability to grow that stack for all uses.

Another way to think of this is that the functions must have a static upper bound on stack size, even if the caller does not guarantee that the available stack space exceeds that bound.

(Perhaps we could name that go:nonrecursive or go:stacklimit instead of go:nosplit?)

Are there cases where we care about a static upper bound but that bound is parametric (say, with the stack bound of a passed-in function)?

@aclements
Copy link
Member

Another way to think of this is that the functions must have a static upper bound on stack size, even if the caller does not guarantee that the available stack space exceeds that bound.

I think this is a valid generalization, but would be difficult to actually implement. Currently, splittable functions ensure there's enough available space for any chain of nosplit functions, and we can use that same global bound to ensure that noscan call trees always fit on the stack.

You could imagine that the prologue of a noscan function could grow the stack to whatever bound was required by its call tree (just use this size instead of its own frame size for the growth prologue). But there are some complications: 1. the prologue is written by the compiler, but the stack bound analysis needs to be done by the linker since it may cross assembly functions or package boundaries, and 2. for at least syscalls and and the write barrier, by the time we reach the noscan function we already can't grow the stack (for syscalls, because of the arguments to the syscall, and for the write barrier because its arguments are passed in registers).

Are there cases where we care about a static upper bound but that bound is parametric (say, with the stack bound of a passed-in function)?

Not for the functions I'm planning to mark noscan. They don't make any indirect function calls (at least, not without first switching to the system stack).

@eliasnaur
Copy link
Contributor

Two cases where I believe a go:nosplitrec or go:noscan would have helped:

CL 21314 would have missed a go:nosplit for the runtime.crash function if Austin had not caught my mistake in review.

The breaking of the nosplit chain CL 111258 is fixing. A go:noscan would presumably have caught the subtle issue that the closure in systemstack(func() { ... }) is not marked nosplit.

@aclements aclements modified the milestones: Go1.11, Unplanned Jun 19, 2018
@aclements
Copy link
Member

Another example: CL 131277 fixed a missing nosplit. This is a case of "bad state for stack growth/scheduling" since the runtime may not be initialized when entering this function via libpreinit.

@wsc1
Copy link

wsc1 commented Oct 16, 2018

Since nosplit is now a misnomer--the stack no longer splits--we could also consider a name like go:nopreemptrec.

The naming and related breakdown of functionality needs to take into account 3rd party user code, as go:nosplit is available to 3rd parties independent of relation to the runtime.

The main reason I can think of for 3rd party code is to make it easier to express scheduling control
points in the stack based/cooperative scheduling semantic. If that semantic is indeed going away,
then one way to give control back to the go user programmer would be to introduce compiler directive which is independent of runtime internals like whether the stack is split or whether the GC uses a write barrier or some other kind of hypothetical hardware assist mechanism.

I would propose to set aside a directive for the go user, perhaps go:noruntime or, if needed
also go:noruntimerec and go:runtimeresume which would have the semantics of go:nosplit* labels but would be classified as intended for consumers of the go runtime and not for implementors.

User directives would then need to be treated as such w.r.t. changes over time: they would be independent of runtime implementation and provide a means for a plain old Go programmer to turn on and off the runtime (and take associated responsibilities such as not calling package runtime functions and working with preallocated memory only, small local variables, ..., but I guess sys calls should be allowed in some form ).

This would enable clean real-time audio code which is user Go code, and may also enable a Go user to interface Go with various other operating system services which give guidelines that risk bad interactions with the runtime.

By classifying a separate directive to the Go user, it would also allow runtime and compiler development to proceed relatively independently of external consumers of go:nosplit and vice versa.

@gopherbot
Copy link

Change https://golang.org/cl/276173 mentions this issue: runtime: prevent stack growth after fork in runtime.sigfillset

gopherbot pushed a commit that referenced this issue Dec 9, 2020
This fixes the unexpected growth of stack in child process, which
is caused by stack checking code in runtime.sigfillset called from
runtime.sigset while clearing the signal handlers in child process.

The redundant stack checking code is generated due to missing
'//go:nosplit' directive that should be annotated for
runtime.sigfillset.

Fixes #43066
Updates #21314

Change-Id: I9483a962a4b0747074313991841e2440ee32198c
Reviewed-on: https://go-review.googlesource.com/c/go/+/276173
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@aclements
Copy link
Member

Another example where nosplitrec would have prevented bugs: CL 431919

@thanm
Copy link
Contributor

thanm commented Oct 10, 2022

Another place where it would have helped described in #56044.

@gopherbot
Copy link

Change https://go.dev/cl/441859 mentions this issue: runtime: mark arenaIdx.l1 and arenaIdx.l2 methods as nosplit

gopherbot pushed a commit that referenced this issue Oct 10, 2022
Mark the "l1" and "l2" methods on "arenaIdx" with //go:nosplit, since
these methods are called from a nosplit context (for example, from
"spanOf").

Fixes #56044.
Updates #21314.

Change-Id: I48c7aa756b59a13162c89ef21066f83371ae50f1
Reviewed-on: https://go-review.googlesource.com/c/go/+/441859
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
romaindoumenc pushed a commit to TroutSoftware/go that referenced this issue Nov 3, 2022
Mark the "l1" and "l2" methods on "arenaIdx" with //go:nosplit, since
these methods are called from a nosplit context (for example, from
"spanOf").

Fixes golang#56044.
Updates golang#21314.

Change-Id: I48c7aa756b59a13162c89ef21066f83371ae50f1
Reviewed-on: https://go-review.googlesource.com/c/go/+/441859
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

9 participants