Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/cgo: immediately handoff P before returning to C host program #57103

Open
antJack opened this issue Dec 6, 2022 · 15 comments · May be fixed by #57104
Open

runtime/cgo: immediately handoff P before returning to C host program #57103

antJack opened this issue Dec 6, 2022 · 15 comments · May be fixed by #57104
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@antJack
Copy link

antJack commented Dec 6, 2022

What version of Go are you using (go version)?

$ go version
go version go1.19.3 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/yongjie.yyj/.cache/go-build"
GOENV="/home/yongjie.yyj/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/yongjie.yyj/gopath/pkg/mod"
GONOPROXY="*.alipay-inc.com,*.alibaba-inc.com,*.alipay.com"
GONOSUMDB="*.alipay-inc.com,*.alibaba-inc.com,*.alipay.com"
GOOS="linux"
GOPATH="/home/yongjie.yyj/gopath"
GOPRIVATE="*.alipay-inc.com,*.alibaba-inc.com,*.alipay.com"
GOPROXY="https://goproxy.cn"
GOROOT="/home/yongjie.yyj/go1.19.3"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/yongjie.yyj/go1.19.3/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.19.3"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1750707572=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Recently we're building our go program as dynamic linking lib(.so) and run it on a C-embedded program using cgo, and we found that there is still room for optimization.

As shown in the following demo, under the condition of limited P's resource, there's some delay between the cgo returns and the background goroutine being scheduled.

// foo.go
package main

import "C"
import (
	"fmt"
	"sync/atomic"
	"time"
)

var ch = make(chan int64, 1)
var t, n int64

func init() {
	go func() { // background goroutine
		for {
			start := <-ch
			now := time.Now().UnixNano()
			atomic.AddInt64(&t, now-start) // from cgo returns to background goroutine being scheduled
			atomic.AddInt64(&n, 1)
		}
	}()
}

//export foo
func foo() {
	ch <- time.Now().UnixNano() 
        // cgo returns
}

//export report
func report() {
	fmt.Println(atomic.LoadInt64(&t) / atomic.LoadInt64(&n), "ns")
}

func main() {}
// main.cc
#include <unistd.h>

#include "libfoo.h"

int main() {
	for(int i = 0; i < 10000; i++) {
		usleep(1000); // do somethings...
                foo();   // cgo call
	}
	sleep(1);
	report();
}

run

> go build -buildmode=c-shared -o libfoo.so foo.go     // build go as dynamic link lib

> gcc main.cc -lfoo -L./ -lpthread -o main -g        // build C host program

> GOMAXPROCS=1 LD_LIBRARY_PATH=./ ./main         // run demo under limited P's resource

The above demo indicates that there's some schedule delay between the cgo returns and the background goroutine being scheduled. After going through runtime code, we found that when the cgo returns, reentersyscall changes P's status to _Psyscall and left it waiting until sysmon retake, which leading to sub-optimized performance.

If we try to handoff p immediately after cgo returns, as shown in the related pr, we can observe much better cgo performance.

$ GOMAXPROCS=1 LD_LIBRARY_PATH=./ ./main-before
14214 ns

$ GOMAXPROCS=1 LD_LIBRARY_PATH=./ ./main-after
7163 ns

Therefore, this issue and the related pr request changes that the runtime could handoff p immediately before cgo returns to the C host program for better performance. However, how to determine whether it's returning to C host program or it's just a normal syscall (should not handoff p) is still a question. A possible way is to add compiler directive such as //go:handoffp on the exported go function?

What did you expect to see?

the background goroutine should be scheduled as soon as possible

What did you see instead?

there is some delay between cgo return and the background goroutine being scheduled, leading to sub-optimized performance.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Dec 6, 2022
antJack added a commit to antJack/go that referenced this issue Dec 6, 2022
experimentally handoff P before the reentersyscall returning to C host
program in order to improve cgo performance.

releated issue: golang#57103

Change-Id: Ibbcda7bded04de20ee196f0b0dd1a7ed41765dfe
@gopherbot
Copy link

Change https://go.dev/cl/455418 mentions this issue: runtime/cgo: immediately handoff P before returning to C host program

@prattmic prattmic added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Dec 6, 2022
@prattmic
Copy link
Member

prattmic commented Dec 6, 2022

cc @golang/runtime

When calling cgo, we don't do immediate handoff under the assumption that often the C function will return quickly, making it advantageous to keep the P fast path.

For the inverse case, I can see intuitively how it is less obvious that the C code will call Go again very soon. Unfortunately these are all heuristics, so I don't know what behavior tends to be in practice. It presumably varies widely from program to program.

@antJack
Copy link
Author

antJack commented Dec 7, 2022

What if we introduce a new compiler directive that allows programmers to provide hints to the runtime about which cgo functions are likely to be called again very soon or not?

//export foo
//go:handoffp      <- foo is *not* likely to be called soon, so the runtime can immediately release the P
func foo() {
    // xxx
}

//export bar
func bar() {     //  <- no hint, bar is likely to be called soon, just let P waiting on the fast path
    // xxx
}

Maybe I can try to work out a sample version in the few next days.

@antJack
Copy link
Author

antJack commented Dec 7, 2022

Or perhaps we can introduce a new env, just like GOMAXPROCS:

GOMAXPROCS=1 GOHANDOFFP=1 ./main

Although this approach does not provide precise control, it has the advantage of simplicity. The implementation can be further discussed, there is plenty of ways to achieve it. But for now, I think we can provide opportunity that allows programmers to decide whether to handoff P or not. What do you think?

@doujiang24
Copy link
Contributor

In my opinion, I think handoff P immediately is better in most cases. It may deserve the default behavior.
Since we can not assume there will be another C call Go soon, after the previous Go function returns to C.

P is an expensive resource, we'd better not waste it, it's wasting P while it's waiting for another C call Go.

@ianlancetaylor
Copy link
Contributor

If we think that most C functions return quickly, then it seems to me that handing off the P immediately is not better. Better to let the goroutine continue with its cached context.

Handing off the P immediately is better if the C function takes a long time.

So we have to make a judgement call. We've decided that we think that on average C functions tend to return quickly.

@antJack
Copy link
Author

antJack commented Dec 9, 2022

If our program is mainly driven by Go, then I do agree that C functions tend to return quickly and we should not handoff the P. But if the program is mainly driven by C, the Go part is a library that runs embeddedly on a C host program (that's why Go provides build mode c-archive/c-shared), then the question may change from "how quickly the C function will return" to "how quickly the C host program will enter the Go lib again". The point is that different build modes may lead to different answers.

Another point is that maybe it's better to provide ways for users to tune their program according to their real situation. Just like what GOGC does, we can also let users determine whether the runtime should immediately handoff P or not. Otherwise, they can only passively accept the assumption that C functions tend to return quickly, no matter what they actually do.

@mpx
Copy link
Contributor

mpx commented Dec 11, 2022

Perhaps this could be detected (either for the callsite, or the C function)? Default to not handing off, and switch to handing off immediately if some threshold is reached.

The impact of handing off for short lived calls is relatively large - really don't want to do it if it is unnecessary.

@prattmic prattmic added this to the Backlog milestone Dec 14, 2022
@prattmic prattmic self-assigned this Dec 14, 2022
@prattmic
Copy link
Member

I think this is just an unintentional bug in our implementation. When a C program calls into Go, we have to acquire an M, which acquires a P. When we return to C, the standard entersyscall marks the P as _Psyscall and stores it in m.oldp. Then because we are returning to C entirely, we release the M back to the needm pool. It doesn't make much sense for a released M to still be referencing the old P. Sure, another call into Go could get the M back and then get the P back, but it might not even be the same thread.

@prattmic prattmic added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Dec 14, 2022
@ianlancetaylor
Copy link
Contributor

My apologies, I think I misunderstood the code earlier. I agree with @prattmic that this is a bug that we should fix.

@antJack
Copy link
Author

antJack commented Dec 15, 2022

ok, do we have any idea on how to fix it? I can try to work it out in on the related cl. Maybe we should take different actions according to build-mode in reentersyscall?

@ianlancetaylor
Copy link
Contributor

It's not a question of the buildmode, it's a question of whether a Go function is returning to a C function that was not called by a Go function. I think that the dropm function can check m.oldp. If it is still in _Psyscall state, it can switch it to _Pidle state and call handoffp.

@antJack
Copy link
Author

antJack commented Dec 16, 2022

PTAL, I've moved the handoffp into the dropm in CL455418

@ianlancetaylor
Copy link
Contributor

It's worth noting that https://go.dev/cl/392854 is touching some of this code as well.

@doujiang24
Copy link
Contributor

Yeah, we'd better not add it to dropm, dropm will be skipped totally in CL 392854

I think it's better to check m.isextra instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Projects
Development

Successfully merging a pull request may close this issue.

6 participants