Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: pointers passed to cgo escape to the heap #27538

Open
FiloSottile opened this issue Sep 6, 2018 · 12 comments
Open

cmd/compile: pointers passed to cgo escape to the heap #27538

FiloSottile opened this issue Sep 6, 2018 · 12 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@FiloSottile
Copy link
Contributor

In https://go-review.googlesource.com/c/go/+/133836 I work around an extra allocation by moving outLen (a simple size_t, passed by pointer to a C function) to the C stack. I initially thought escaping was unavoidable across the cgo boundary, but in fact cgo has strict rules about not retaining any pointers to Go memory, so I'm not sure why that variable needed to escape.

/cc @ianlancetaylor @dr2chase

@FiloSottile FiloSottile added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 6, 2018
@FiloSottile FiloSottile added this to the Unplanned milestone Sep 6, 2018
@ianlancetaylor
Copy link
Contributor

All Go pointers passed to cgo must escape, because cgo code may call back to Go code, and calling back to Go code may cause the stack to be copied to a new location. If the pointer did not escape, it might be invalidated during the stack copy, violating the promise that a pointer passed to C will not move.

@dr2chase
Copy link
Contributor

dr2chase commented Sep 6, 2018

It might be interesting to revisit this in the future, perhaps by running callbacks to Go on discontiguous stacks. Austin's latest plan for preemptible loops involves scanning the last frame conservatively, which means that the GC cannot necessarily relocate stacks without a goroutine's cooperation; instead, it might need to leave a "fix your stack" notification for the goroutine's next check, and it is done synchronously.

@bcmills
Copy link
Contributor

bcmills commented Sep 7, 2018

I talked with @aclements about this a year or two ago, and he suggesting using a fresh stack for each call from cgo back into Go. That way, we could guarantee that the Go stack calling C would not be relocated for the duration of the call, even if the topmost Go function call needs to expand its stack.

I think that's also what @dr2chase means by “running callbacks to Go on discontiguous stacks”, but I could be mistaken.

@aclements
Copy link
Member

A more efficient variation of using a fresh stack for every C->Go call is to keep using the same stack, but record a cut point at the C->Go switch SP. If the stack needs to grow, we keep the old stack up to the cut point, and only move what's after the cut point.

I mentioned this idea to Russ back when we had talked about it and he was strongly opposed to moving back to anything resembling split stacks because of how they affect debugging, profiling, performance, and overall complexity. But in this particular case, the stack is already weird because it jumps back and forth between the Go and C stacks. With my "cut point" idea, you also don't have the performance cliff problem that split stacks had. It's also really unfortunate that we're interfacing with a calling convention that depends heavily on out-parameters and yet we force all of those out-parameters to be allocated on the heap.

We also might want something like this for debugger call injection if I pull the universal liveness maps back out.

@bcmills
Copy link
Contributor

bcmills commented Sep 7, 2018

you also don't have the performance cliff problem that split stacks had.

With the cut-point approach, we could also mitigate the performance cliff by leaving some sort of “stack was moved to here” note for the goroutine to see when we return back from C. Then it could finish migrating to the new stack after the C call returns, and potentially avoid resizing again if it repeats that Go→C→Go chain.

@aclements
Copy link
Member

Interesting. Are you imagining there could still be a cliff if, say, you're already close to the stack bound when you do the Go -> C -> Go transition and you do that transition repeatedly? I could see that.

One possible problem with finishing the move after the return to Go is that this would require the new stack to be allocated with space for the old stack. If you did several transitions between Go and C without returning, you could wind up with several stacks and several of these pending moves. Maybe the asymptotics of that are okay since, even in the limit, it's at most double the space you need. (Obviously we'd need to orchestrate those pending moves correctly. Maybe on the first return you do all of the pending moves, or maybe you do the moves one at a time as you return and always move to the current stack.)

@bcmills
Copy link
Contributor

bcmills commented Sep 7, 2018

Are you imagining there could still be a cliff if, say, you're already close to the stack bound when you do the Go -> C -> Go transition and you do that transition repeatedly?

Yep, that's the one I would be concerned about.

One possible problem with finishing the move after the return to Go is that this would require the new stack to be allocated with space for the old stack.

That's true, but we could start using the new stack at its base: we know it will be completely empty again by the time we return to C (and back to Go), so we don't actually waste any of its space. (If the old stack is N bytes and the new stack uses M bytes before returning, we might want to resize again to make sure it's at least N+M bytes, but running that close to the wire seems like it will be infrequent anyway.)

@aclements
Copy link
Member

Hmm. So we'd still allocate the new stack to be 2x the old stack, but we could start at its base and have all of that space available. I like it. I think that approach would require that we move one stack at a time as we unwind transitions; otherwise, we may not have enough space on the new stack for the sum of all of the old stacks (not a problem, just an observation).

@aclements
Copy link
Member

@cherrymui made an excellent observation that any sort of stack splitting approach isn't enough: Go could pass a pointer to C, which could pass that pointer back to Go, and the Go callback could then leak it to the heap. We completely lose the escape flows when we enter C, despite the cgo pointer passing rules, so it's not only about moving stacks.

I'm not sure there's any good solution to this. We could introduce annotations on C functions to communicate promises about what happens with pointers (the C call is already unsafe anyway). We could do something where we allocate C-escaped objects in some special heap where we can detect leaks and otherwise free it on return from the allocating frame (this sounds really hard). @dr2chase had an interesting idea that if we were to do a Rust-specific FFI, that the Rust type system may help here.

@bcmills
Copy link
Contributor

bcmills commented Dec 4, 2019

Go could pass a pointer to C, which could pass that pointer back to Go, and the Go callback could then leak it to the heap.


According to https://golang.org/cmd/cgo/#hdr-Passing_pointers:

C code may not keep a copy of a Go pointer after the call returns.

To me, that implies that the C code also must not call a Go function that would keep a copy of the Go pointer. But perhaps I am reading it more strictly than intended.

Also:

A Go function called by C code may not return a Go pointer […]. A Go function called by C code may take a Go pointer as an argument, but it must preserve the property that the Go memory to which it points does not contain any Go pointers.

That restriction implies that a Go function called from C cannot allow variables to escape through function arguments or return values: the only possible escape of a function argument is through the heap, if that is even allowed.

@ianlancetaylor
Copy link
Contributor

@bcmills That wasn't the intent of the rules. The intent was that the system always have a fixed known set of Go pointers visible to C code, but there wasn't meant to be any restriction on what Go code could do with those pointers, even if the Go code in question is called by C code.

Of course, we could change the rules.

@ianlancetaylor
Copy link
Contributor

We could annotate a C function, in the cgo preamble, to say that it does not call back to Go. That would be easy to verify at run time.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants