runtime/cgo: immediately handoff P before returning to C host program #57103
Experimentally hand off the P before the reentersyscall that returns to the C host program, in order to improve cgo performance. Related issue: golang#57103 Change-Id: Ibbcda7bded04de20ee196f0b0dd1a7ed41765dfe
Change https://go.dev/cl/455418 mentions this issue:
cc @golang/runtime
When calling cgo, we don't do an immediate handoff, on the assumption that the C function will often return quickly, which makes it advantageous to keep the P fast path. For the inverse case, I can see intuitively how it is less obvious that the C code will call Go again very soon. Unfortunately these are all heuristics, so I don't know what the behavior tends to be in practice. It presumably varies widely from program to program.
What if we introduce a new compiler directive that allows programmers to provide hints to the runtime about which cgo functions are likely to be called again very soon or not?
Maybe I can try to work out a sample version in the next few days.
Or perhaps we can introduce a new environment variable, just like GOMAXPROCS:
Although this approach does not provide precise control, it has the advantage of simplicity. The implementation can be discussed further; there are plenty of ways to achieve it. For now, I think we should give programmers a way to decide whether to hand off the P or not. What do you think?
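To make the environment-variable idea concrete, here is a minimal user-space sketch of how such an on/off knob could be read. The variable name EXAMPLE_CGO_HANDOFF and the helper handoffEnabled are hypothetical names invented for this illustration; they are not the name proposed in the comment above and nothing like them exists in the Go runtime. The sketch assumes the knob defaults to off when unset or unparsable.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// handoffEnvVar is a hypothetical variable name used only in this sketch.
const handoffEnvVar = "EXAMPLE_CGO_HANDOFF"

// handoffEnabled reports whether the (hypothetical) knob asks for an
// immediate handoff. The lookup function is injected so the logic can be
// exercised without touching the real process environment.
func handoffEnabled(lookup func(string) (string, bool)) bool {
	v, ok := lookup(handoffEnvVar)
	if !ok {
		return false // unset: keep the current behavior
	}
	enabled, err := strconv.ParseBool(v)
	if err != nil {
		return false // unparsable value: fall back to the default
	}
	return enabled
}

func main() {
	fmt.Println("immediate handoff enabled:", handoffEnabled(os.LookupEnv))
}
```

A process-wide switch like this is coarse: it cannot distinguish one exported function from another, which is the lack of precise control mentioned above.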
In my opinion, handing off the P immediately is better in most cases, and it may deserve to be the default behavior. A P is an expensive resource and we'd better not waste it; a P that sits idle while waiting for the next call from C into Go is wasted.
If we think that most C functions return quickly, then it seems to me that handing off the P immediately is not better. Better to let the goroutine continue with its cached context. Handing off the P immediately is better if the C function takes a long time. So we have to make a judgement call. We've decided that we think that on average C functions tend to return quickly.
If our program is mainly driven by Go, then I do agree that C functions tend to return quickly and we should not hand off the P. But if the program is mainly driven by C, the Go part is a library embedded in a C host program (that's why Go provides build mode Another point is that it may be better to give users ways to tune their program according to their real situation. Just like what
Perhaps this could be detected (either for the call site, or the C function)? Default to not handing off, and switch to handing off immediately if some threshold is reached. The impact of handing off for short-lived calls is relatively large, so we really don't want to do it if it is unnecessary.
I think this is just an unintentional bug in our implementation. When a C program calls into Go, we have to acquire an M, which acquires a P. When we return to C, the standard
My apologies, I think I misunderstood the code earlier. I agree with @prattmic that this is a bug that we should fix.
OK, do we have any idea how to fix it? I can try to work it out in the related CL. Maybe we should take different actions according to the build mode in
It's not a question of the buildmode, it's a question of whether a Go function is returning to a C function that was not called by a Go function. I think that the
PTAL, I've moved the
It's worth noting that https://go.dev/cl/392854 is touching some of this code as well.
Yeah, we'd better not add it to I think it's better to check |
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
Recently we have been building our Go program as a dynamically linked library (.so) and running it inside a C host program using cgo, and we found that there is still room for optimization.
As shown in the following demo, when the number of Ps is limited, there is some delay between the cgo call returning and the background goroutine being scheduled.
run
The above demo indicates that there is a scheduling delay between the cgo call returning and the background goroutine being scheduled. After going through the runtime code, we found that when the cgo call returns, reentersyscall changes the P's status to _Psyscall and leaves it waiting until sysmon retakes it, which leads to suboptimal performance.
If we hand off the P immediately after the cgo call returns, as shown in the related PR, we observe much better cgo performance.
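The effect can be illustrated with a toy model. This is not runtime code and not the demo from this issue: the names pState, pInSyscall, pIdle and delayUntilScheduled are hypothetical, time is measured in abstract ticks, and the model assumes a monitor that wakes at a fixed period and only retakes a P it has already seen parked on an earlier wake-up.

```go
package main

import "fmt"

// pState is a simplified stand-in for a P's status in this toy model.
type pState int

const (
	pInSyscall pState = iota // P is still attached to the thread that returned to C
	pIdle                    // P has been released and can run other goroutines
)

// delayUntilScheduled returns the tick at which the only P becomes available
// to a runnable background goroutine, or -1 if that does not happen within
// maxTicks. Tick 0 is the moment the Go function returns to the C host.
func delayUntilScheduled(immediateHandoff bool, monitorPeriod, maxTicks int) int {
	state := pInSyscall
	if immediateHandoff {
		state = pIdle // the P is released before returning to C
	}
	seenInSyscall := false
	for t := 0; t <= maxTicks; t++ {
		// The monitor wakes at monitorPeriod, 2*monitorPeriod, and so on.
		if state == pInSyscall && t > 0 && t%monitorPeriod == 0 {
			if seenInSyscall {
				state = pIdle // second sighting: the monitor retakes the P
			} else {
				seenInSyscall = true // first sighting: only recorded
			}
		}
		if state == pIdle {
			return t
		}
	}
	return -1
}

func main() {
	fmt.Println("immediate handoff:", delayUntilScheduled(true, 20, 100)) // prints 0
	fmt.Println("wait for monitor:", delayUntilScheduled(false, 20, 100)) // prints 40
}
```

In this model the background goroutine waits for two monitor periods when the P is left parked, and does not wait at all when the P is handed off on return.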
Therefore, this issue and the related PR request that the runtime hand off the P immediately before cgo returns to the C host program, for better performance. However, how to determine whether it is returning to the C host program or is just a normal syscall (which should not hand off the P) is still an open question. A possible way is to add a compiler directive such as //go:handoffp on the exported Go function?
What did you expect to see?
The background goroutine should be scheduled as soon as possible.
What did you see instead?
There is some delay between the cgo call returning and the background goroutine being scheduled, leading to suboptimal performance.