runtime/cgo: immediately handoff P before returning to C host program #57103

antJack · 2022-12-06T09:06:23Z

What version of Go are you using (`go version`)?

$ go version
go version go1.19.3 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

go env Output

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/yongjie.yyj/.cache/go-build"
GOENV="/home/yongjie.yyj/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/yongjie.yyj/gopath/pkg/mod"
GONOPROXY="*.alipay-inc.com,*.alibaba-inc.com,*.alipay.com"
GONOSUMDB="*.alipay-inc.com,*.alibaba-inc.com,*.alipay.com"
GOOS="linux"
GOPATH="/home/yongjie.yyj/gopath"
GOPRIVATE="*.alipay-inc.com,*.alibaba-inc.com,*.alipay.com"
GOPROXY="https://goproxy.cn"
GOROOT="/home/yongjie.yyj/go1.19.3"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/yongjie.yyj/go1.19.3/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.19.3"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1750707572=/tmp/go-build -gno-record-gcc-switches"

What did you do?

We recently started building our Go program as a shared library (.so) and running it inside a C host program via cgo, and we found that there is still room for optimization.

As the following demo shows, when Ps are scarce there is a delay between the cgo call returning and the background goroutine being scheduled.

// foo.go
package main

import "C"
import (
	"fmt"
	"sync/atomic"
	"time"
)

var ch = make(chan int64, 1)
var t, n int64

func init() {
	go func() { // background goroutine
		for {
			start := <-ch
			now := time.Now().UnixNano()
			atomic.AddInt64(&t, now-start) // from cgo returns to background goroutine being scheduled
			atomic.AddInt64(&n, 1)
		}
	}()
}

//export foo
func foo() {
	ch <- time.Now().UnixNano()
	// cgo returns
}

//export report
func report() {
	fmt.Println(atomic.LoadInt64(&t) / atomic.LoadInt64(&n), "ns")
}

func main() {}

// main.cc
#include <unistd.h>

#include "libfoo.h"

int main() {
	for(int i = 0; i < 10000; i++) {
		usleep(1000); // do something...
		foo();        // cgo call
	}
	sleep(1);
	report();
}

run

> go build -buildmode=c-shared -o libfoo.so foo.go     # build the Go code as a shared library

> gcc main.cc -lfoo -L./ -lpthread -o main -g        # build the C host program

> GOMAXPROCS=1 LD_LIBRARY_PATH=./ ./main         # run the demo with a single P

The demo above shows a scheduling delay between the cgo call returning and the background goroutine being scheduled. After reading the runtime code, we found that when the cgo call returns, reentersyscall changes the P's status to _Psyscall and leaves it waiting until sysmon retakes it, which leads to suboptimal performance.

If we hand off the P immediately after the cgo call returns, as in the related PR, we observe much better cgo performance.

$ GOMAXPROCS=1 LD_LIBRARY_PATH=./ ./main-before
14214 ns

$ GOMAXPROCS=1 LD_LIBRARY_PATH=./ ./main-after
7163 ns

Therefore, this issue and the related PR propose that the runtime hand off the P immediately before a cgo call returns to the C host program, for better performance. However, how to tell a return to the C host program apart from a normal syscall (which should not hand off the P) is still an open question. One possible way is to add a compiler directive such as //go:handoffp on the exported Go function.

What did you expect to see?

The background goroutine should be scheduled as soon as possible.

What did you see instead?

There is some delay between the cgo call returning and the background goroutine being scheduled, leading to suboptimal performance.


experimentally handoff P before the reentersyscall returning to C host program in order to improve cgo performance. related issue: golang#57103 Change-Id: Ibbcda7bded04de20ee196f0b0dd1a7ed41765dfe

gopherbot · 2022-12-06T09:17:49Z

Change https://go.dev/cl/455418 mentions this issue: runtime/cgo: immediately handoff P before returning to C host program

prattmic · 2022-12-06T18:16:56Z

cc @golang/runtime

When calling cgo, we don't do immediate handoff under the assumption that often the C function will return quickly, making it advantageous to keep the P fast path.

For the inverse case, I can see intuitively how it is less obvious that the C code will call Go again very soon. Unfortunately these are all heuristics, so I don't know what the behavior tends to be in practice. It presumably varies widely from program to program.

antJack · 2022-12-07T02:29:46Z

What if we introduce a new compiler directive that allows programmers to provide hints to the runtime about which cgo functions are likely to be called again very soon or not?

//export foo
//go:handoffp      <- foo is *not* likely to be called soon, so the runtime can immediately release the P
func foo() {
    // xxx
}

//export bar
func bar() {     //  <- no hint, bar is likely to be called soon, so the P just stays on the fast path
    // xxx
}

Maybe I can try to work out a sample version in the next few days.

antJack · 2022-12-07T02:54:36Z

Or perhaps we can introduce a new env, just like GOMAXPROCS:

GOMAXPROCS=1 GOHANDOFFP=1 ./main

Although this approach does not provide precise control, it has the advantage of simplicity. The implementation can be discussed further; there are plenty of ways to achieve it. For now, I think we should give programmers a way to decide whether to hand off the P. What do you think?
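As a rough illustration of the environment-variable idea, the sketch below parses a switch named GOHANDOFFP. The variable exists only as a proposal in this thread, the helper name handoffFromEnv is made up for the example, and the real runtime reads its settings through its own internal code rather than the os package, so this is a user-level sketch and not how the runtime would do it.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// handoffFromEnv interprets the value of the hypothetical GOHANDOFFP
// variable. Unset or unparsable values fall back to false, i.e. the
// current behavior of keeping the P on the fast path.
func handoffFromEnv(value string) bool {
	enabled, err := strconv.ParseBool(value)
	if err != nil {
		return false
	}
	return enabled
}

func main() {
	// GOHANDOFFP is an assumption taken from the proposal, not a real knob.
	fmt.Println("immediate handoff:", handoffFromEnv(os.Getenv("GOHANDOFFP")))
}
```

Defaulting to false on a missing or malformed value keeps existing programs on today's behavior unless they opt in explicitly.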

doujiang24 · 2022-12-08T11:57:23Z

In my opinion, handing off the P immediately is better in most cases and may deserve to be the default behavior, since we cannot assume that C will call into Go again soon after the previous Go function returns to C.

P is an expensive resource and we'd better not waste it: a P is wasted while it waits for the next call from C into Go.

ianlancetaylor · 2022-12-09T04:16:08Z

If we think that most C functions return quickly, then it seems to me that handing off the P immediately is not better. Better to let the goroutine continue with its cached context.

Handing off the P immediately is better if the C function takes a long time.

So we have to make a judgement call. We've decided that we think that on average C functions tend to return quickly.

antJack · 2022-12-09T06:53:50Z

If our program is mainly driven by Go, then I agree that C functions tend to return quickly and we should not hand off the P. But if the program is mainly driven by C, and the Go part is a library embedded in a C host program (that is why Go provides the c-archive and c-shared build modes), then the question changes from "how quickly will the C function return" to "how quickly will the C host program enter the Go library again". The point is that different build modes may lead to different answers.

Another point is that maybe it's better to provide ways for users to tune their program according to their real situation. Just like what GOGC does, we can also let users determine whether the runtime should immediately handoff P or not. Otherwise, they can only passively accept the assumption that C functions tend to return quickly, no matter what they actually do.

mpx · 2022-12-11T23:57:43Z

Perhaps this could be detected (either for the callsite, or the C function)? Default to not handing off, and switch to handing off immediately if some threshold is reached.

The impact of handing off for short lived calls is relatively large - really don't want to do it if it is unnecessary.

prattmic · 2022-12-14T21:11:36Z

I think this is just an unintentional bug in our implementation. When a C program calls into Go, we have to acquire an M, which acquires a P. When we return to C, the standard entersyscall marks the P as _Psyscall and stores it in m.oldp. Then because we are returning to C entirely, we release the M back to the needm pool. It doesn't make much sense for a released M to still be referencing the old P. Sure, another call into Go could get the M back and then get the P back, but it might not even be the same thread.

ianlancetaylor · 2022-12-15T00:00:35Z

My apologies, I think I misunderstood the code earlier. I agree with @prattmic that this is a bug that we should fix.

antJack · 2022-12-15T03:25:11Z

OK, do we have any idea how to fix it? I can try to work it out in the related CL. Maybe we should take different actions according to the build mode in reentersyscall?

ianlancetaylor · 2022-12-16T03:44:58Z

It's not a question of the buildmode, it's a question of whether a Go function is returning to a C function that was not called by a Go function. I think that the dropm function can check m.oldp. If it is still in _Psyscall state, it can switch it to _Pidle state and call handoffp.

antJack · 2022-12-16T08:30:03Z

PTAL, I've moved the handoffp call into dropm in CL 455418.

ianlancetaylor · 2022-12-16T23:07:56Z

It's worth noting that https://go.dev/cl/392854 is touching some of this code as well.

doujiang24 · 2022-12-17T03:23:12Z

Yeah, we'd better not add it to dropm; dropm will be skipped entirely in CL 392854.

I think it's better to check m.isextra instead.

gopherbot added the compiler/runtime label Dec 6, 2022

antJack linked a pull request Dec 6, 2022 that will close this issue

runtime: immediately handoff P before returning to C host program #57104

Open

prattmic added the NeedsDecision label Dec 6, 2022

mknyszek added this to Go Compiler / Runtime Dec 7, 2022

prattmic added this to the Backlog milestone Dec 14, 2022

prattmic moved this to Todo in Go Compiler / Runtime Dec 14, 2022

prattmic self-assigned this Dec 14, 2022

prattmic added NeedsFix and removed NeedsDecision labels Dec 14, 2022

gabyhelp mentioned this issue Jul 25, 2024

runtime: severe performance drop for cgo calls in go1.22.5 #68587

Closed
