runtime: CGO code EINTR errors at random points on AIX #50521

Closed
endurox-dev opened this issue Jan 9, 2022 · 3 comments

Comments

@endurox-dev

endurox-dev commented Jan 9, 2022

What version of Go are you using (go version)?

$ go version
go version go1.16.9 aix/ppc64

Does this issue reproduce with the latest release?

1.16.9 is the latest release from IBM.

What operating system and processor architecture are you using (go env)?

go env Output
 go env
GO111MODULE=""
GOARCH="ppc64"
GOBIN=""
GOCACHE="/home/user1/.cache/go-build"
GOENV="/home/user1/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="ppc64"
GOHOSTOS="aix"
GOINSECURE=""
GOMODCACHE="/home/user1/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="aix"
GOPATH="/home/user1/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/opt/freeware/lib/golang"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/opt/freeware/lib/golang/pkg/tool/aix_ppc64"
GOVCS=""
GOVERSION="go1.16.9"
GCCGO="gccgo"
GOPPC64="power8"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -maix64 -pthread -mcmodel=large -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build940967260=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Mavimax is working on Enduro/X middleware, which has some nice bindings to Golang. Enduro/X also uses Golang for network connectivity, etc. The core of Enduro/X is written in C and follows strict Unix semantics, as the middleware is used in mission-critical projects (real-time banking transactions, etc.). When integrating with C, the C code is called from Go code. Recently we have seen that our unit tests that use the Golang bindings fail randomly on the AIX operating system.

When we investigated this more deeply, we found that we randomly receive EINTR errors. One could say that you should add retry loops around system calls to deal with EINTR (and this is not only a loop to restart the system call: timeouts must also be recalculated to reflect the time already spent, etc., so it is additional work). But I don't think that is the correct answer: first, a lot of existing code may rely on signals and errors having a meaning, and second, some proprietary libraries may be in use that cannot be modified. I believe Golang should not impose behavior where the program's C threads see side effects from other threads.

We have created a small program to illustrate this issue:

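package main

/*
#include <errno.h>
#include <stdio.h>
#include <time.h>
#cgo LDFLAGS:

// simple C code: sleep for 1 microsecond and report if nanosleep was interrupted
void do_some_c_call(void) {
	struct timespec timeout;
	timeout.tv_sec = 0;
	timeout.tv_nsec = 1000;

	if (0 != nanosleep(&timeout, &timeout) && errno == EINTR) {
		printf("INTERRUPTED\n");
	}
}
*/
import "C"

import (
	"sync"
)

func main() {
	var wg sync.WaitGroup

	// Start many goroutines that call into C in a tight loop.
	// The loops never exit, so the program runs until it is killed.
	for i := 0; i < 100000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				C.do_some_c_call()
			}
		}()
	}
	wg.Wait()
}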

After running this on AIX for a few minutes, we get:

$ go run test.go 
INTERRUPTED
INTERRUPTED
INTERRUPTED
INTERRUPTED
INTERRUPTED
INTERRUPTED

On Linux, I was unable to reproduce this output.

We ran the same test with GODEBUG="asyncpreemptoff=1" set, and then the error does not appear on AIX:

$ GODEBUG="asyncpreemptoff=1" go run test.go

(ran for several minutes).
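As a hedged sketch (not something proposed in this thread), a program that relies on C code which does not tolerate EINTR could at least check at startup that it was launched with this setting. The helper name `asyncPreemptDisabled` is hypothetical; it assumes GODEBUG is the usual comma-separated list of `name=value` pairs and that `asyncpreemptoff` appears at most once. Because the runtime reads GODEBUG when the program starts, the variable has to be set in the environment beforehand, as in the command above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// asyncPreemptDisabled is a hypothetical helper: it reports whether a GODEBUG
// value contains asyncpreemptoff=1. It assumes the setting appears at most once.
func asyncPreemptDisabled(godebug string) bool {
	for _, kv := range strings.Split(godebug, ",") {
		if strings.TrimSpace(kv) == "asyncpreemptoff=1" {
			return true
		}
	}
	return false
}

func main() {
	// Only a warning: this check does not change runtime behavior by itself.
	if !asyncPreemptDisabled(os.Getenv("GODEBUG")) {
		fmt.Fprintln(os.Stderr, "warning: started without GODEBUG=asyncpreemptoff=1; C calls may see EINTR")
	}
}
```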

What did you expect to see?

I expect that in a cgo environment, Unix system calls do not fail with EINTR at random points.
In the above example, "INTERRUPTED" should never be printed.

What did you see instead?

Unix system calls randomly fail with errno set to EINTR.
In the above example, "INTERRUPTED" was printed.

I see that there was a similar case in your unit tests:

But what I am reporting here is already a production-grade defect.

@ianlancetaylor
Contributor

See this note from the 1.14 release notes:

A consequence of the implementation of preemption is that on Unix systems, including Linux and macOS systems, programs built with Go 1.14 will receive more signals than programs built with earlier releases. This means that programs that use packages like syscall or golang.org/x/sys/unix will see more slow system calls fail with EINTR errors. Those programs will have to handle those errors in some way, most likely looping to try the system call again. For more information about this see man 7 signal for Linux systems or similar documentation for other systems.

See also asyncpreemptoff at https://pkg.go.dev/runtime for a way to disable these signals for specific Go programs.

Closing because this is working as expected.

@spuhpointer

spuhpointer commented Jan 10, 2022

Some thoughts:

  1. I googled around, and I see that several projects broke with the introduction of Go 1.14. Probably more software will break, as not all projects are using recent compilers. Could the Go compiler detect that a program links with cgo and enable asyncpreemptoff by default?

  2. What do you think: if, at the Go->C entry, the C code blocked SIGURG via pthread_sigmask() for that thread and unblocked it when returning to Go (or when Go receives a call from C), would that break the Go runtime or would it work OK? If it would work, maybe Go could do this automatically?

@ianlancetaylor
Contributor

Could the Go compiler detect that a program links with cgo and enable asyncpreemptoff by default?

Thanks for the suggestion, but I don't think that would be appropriate in general. There are many Go programs that call into C code for a few specific purposes, and we don't want those programs to lose the advantages of preemption. The problem only arises with Go programs that call into C code that does I/O. That is a less common case, as Go is perfectly able to do I/O itself.

What do you think: if, at the Go->C entry, the C code blocked SIGURG via pthread_sigmask() for that thread and unblocked it when returning to Go (or when Go receives a call from C), would that break the Go runtime or would it work OK? If it would work, maybe Go could do this automatically?

As far as I know that wouldn't help. It is already the case that the Go runtime does not send a signal to a thread running C code. The system must be returning EINTR from the system call even though the signal was not directed to the thread making the system call.

@golang golang locked and limited conversation to collaborators Jan 10, 2023