Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: EINTR on darwin from os.Process.Wait #36644

Closed
crawshaw opened this issue Jan 19, 2020 · 16 comments
Closed

os: EINTR on darwin from os.Process.Wait #36644

crawshaw opened this issue Jan 19, 2020 · 16 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@crawshaw
Copy link
Member

Background: our Go code is built with -buildmode=c-archive and linked into a macOS app built by xcode. We are not installing any signal handlers, but it is possible some Apple library is doing so behind our backs. We are using the os/exec package to periodically execute netstat -an.

Occasionally, the os/exec.Cmd.Wait method is returning EINTR. This EINTR comes directly from os.Process.Wait. Unfortunately the os/exec package does not expect EINTR, and so it treats it as if the fork process has exited and cleans up its state, so that Cmd.Wait cannot be called again.

The darwin man pages wait(2) and sigaction(2) suggest this should only happen if a signal handler does not have the SA_RESTART flag set. I have not yet determined if Apple is setting another handler for us, or if this is something unusual like Issue #11180.

Even if this is a core Apple library setting a handler (we use no third-party code), it may be worth handling EINTR for darwin in os.Process.Wait, to make the object useful in Go code linked with -buildmode=c-archive.

@ianlancetaylor
Copy link
Contributor

Perhaps we do have to change the os package for this, but I'd like to clearly understand the root cause first. We have not historically tried to handle all possible EINTR errors due to a non-Go signal handler that does not use SA_RESTART.

@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 21, 2020
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Jan 21, 2020
@crawshaw
Copy link
Member Author

I did a little investigating. I was unable to replicate this running a normal binary, but it happens reliably when run under lldb:

$ cat eintr.go 
package main

import "os/exec"

import "C"

func main() {}

//export Netstat
func Netstat() {
	cmd := exec.Command("netstat", "-na")
	if _, err := cmd.Output(); err != nil {
		panic(err)
	}
}
$ cat eintr.c 

extern void Netstat();

int main(void) {
	Netstat();
	return 0;
}
$ go build -buildmode=c-archive eintr.go && cc eintr.c eintr.a && lldb -o run ./a.out
(lldb) target create "./a.out"
Current executable set to './a.out' (x86_64).
(lldb) run
panic: wait: interrupted system call

goroutine 17 [running, locked to thread]:
main.Netstat()
	/Users/crawshaw/eintr.go:13 +0xae
main._cgoexpwrap_078bcfb61532_Netstat()
	_cgo_gotypes.go:45 +0x20
Process 41628 exited with status = 2 (0x00000002) 

Process 41628 launched: '/Users/crawshaw/a.out' (x86_64)
(lldb) 

This is the typical way xcode runs development binaries, which is why we see it so frequently.

@ianlancetaylor
Copy link
Contributor

What happens with gdb?

@crawshaw
Copy link
Member Author

(It is surprisingly hard to install gdb on macOS 10.15 with codesigning.)

Results are similar:

~ $ gdb ./a.out 
GNU gdb (GDB) 8.3.1
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin19.2.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./a.out...
Loading Go Runtime support.
(gdb) start
Temporary breakpoint 1 at 0x1000843b0: file <autogenerated>, line 1.
Starting program: /Users/crawshaw/a.out 
[New Thread 0x1803 of process 75954]
[New Thread 0x1b03 of process 75954]
[New Thread 0x2603 of process 75954]
warning: unhandled dyld version (16)
[New Thread 0x180f of process 75954]
[New Thread 0x1c03 of process 75954]
[New Thread 0x1d03 of process 75954]
[New Thread 0x2303 of process 75954]
[New Thread 0x2403 of process 75954]
[New Thread 0x2503 of process 75954]
panic: wait: interrupted system call

goroutine 17 [running, locked to thread]:
main.Netstat()
	/Users/crawshaw/eintr.go:13 +0xae
main._cgoexpwrap_078bcfb61532_Netstat()
	_cgo_gotypes.go:45 +0x20
[Inferior 1 (process 75954) exited with code 02]
(gdb) 

@ianlancetaylor
Copy link
Contributor

Thanks. When I look at that gdb output the first thing that strikes me is: what signal is it? gdb normally reports every signal that a program receives (I don't know about lldb). Why is that not being reported here? When you run info handle, which signals are marked as No under Print?

@crawshaw
Copy link
Member Author

Not printed, according to info handle: SIGALRM, SIGURG, SIGCHLD, SIGIO, SIGVTALRM, SIGPROF, SIGWINCH, SIGPOLL, SIGWAITING, SIGLWP, SIGPRIO, SIGCANCEL, SIGLIBRT.

I tried setting handle SIGCHLD print and got:

(gdb) handle SIGCHLD print
Signal        Stop	Print	Pass to program	Description
SIGCHLD       No	Yes	Yes		Child status changed
(gdb) run
Starting program: /Users/crawshaw/a.out 
[New Thread 0x2703 of process 77262]
[New Thread 0x2503 of process 77262]
[New Thread 0x1a03 of process 77262]
warning: unhandled dyld version (16)
[New Thread 0x1b03 of process 77262]
[New Thread 0x1c03 of process 77262]
[New Thread 0x1d03 of process 77262]
[New Thread 0x2303 of process 77262]
[New Thread 0x2403 of process 77262]
[New Thread 0x270f of process 77262]

Thread 3 received signal SIGCHLD, Child status changed.
panic: wait: interrupted system call

goroutine 17 [running, locked to thread]:
main.Netstat()
	/Users/crawshaw/eintr.go:13 +0xae
main._cgoexpwrap_078bcfb61532_Netstat()
	_cgo_gotypes.go:45 +0x20
[Inferior 1 (process 77262) exited with code 02]
(gdb) 

(The debugger freezes on run if I attempt to set all signals to print.)

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Jan 21, 2020

Thanks. So it sounds like something is installing a SIGCHLD signal handler without setting SA_RESTART. Any idea what that is?

I'm trying to drill down on this because if that is what is happening, and we can't avoid it, then we need to add an EINTR loop to a dozen or more places in the standard library. You are just seeing it in wait because that is where your program is spending its time. We will still see the problem with lower frequency in read, write, sendmsg, etc.

@crawshaw
Copy link
Member Author

I only see this behavior under lldb and gdb.

Googling around, here is a reference claiming that gdb installs a SIGCHLD handler on macOS: https://jmmv.dev/2008/10/boostprocess-and-sigchld.html (I notice Julio Merino is a Google employee, maybe jmmv@ will know more?)

Grepping through the GDB sources I don't see any macos-specific code for this, but I am not familiar with the codebase.

@ianlancetaylor
Copy link
Contributor

I wonder if this is a side effect of using ptrace.

What happens if your Go c-archive does

    os.Notify(make(chan syscall.Signal, 1), syscall.SIGCHLD)

?

@crawshaw
Copy link
Member Author

crawshaw commented Jan 21, 2020

Calling signal.Notify captures the signal and lets the program exit cleanly.

@networkimprov
Copy link

cc @jmmv since @crawshaw meant to

@ianlancetaylor
Copy link
Contributor

At least there is a workaround.

I don't know what else we can do short of adding EINTR loops throughout.

@bcmills
Copy link
Contributor

bcmills commented Jan 22, 2020

I don't know what else we can do short of adding EINTR loops throughout.

That is #20400. There, you said (#20400 (comment)):

Unfortunately I agree with you that checking for EINTR in the standard library is the only realistic approach.

@ianlancetaylor
Copy link
Contributor

The change can be simultaneously the only realistic approach and yet be undesirable.

@gopherbot
Copy link

Change https://golang.org/cl/232862 mentions this issue: internal/poll, os: loop on EINTR

@networkimprov
Copy link

I posted a 1.14 backport request for this: #39026

tmm1 pushed a commit to fancybits/go that referenced this issue Jul 8, 2020
Historically we've assumed that we can install all signal handlers
with the SA_RESTART flag set, and let the system restart slow functions
if a signal is received. Therefore, we don't have to worry about EINTR.

This is only partially true, and we've added EINTR checks already for
connect, and open/read on Darwin, and sendfile on Solaris.

Other cases have turned up in golang#36644, golang#38033, and golang#38836.

Also, golang#20400 points out that when Go code is included in a C program,
the C program may install its own signal handlers without SA_RESTART.
In that case, Go code will see EINTR no matter what it does.

So, go ahead and check for EINTR. We don't check in the syscall package;
people using syscalls directly may want to check for EINTR themselves.
But we do check for EINTR in the higher level APIs in os and net,
and retry the system call if we see it.

This change looks safe, but of course we may be missing some cases
where we need to check for EINTR. As such cases turn up, we can add
tests to runtime/testdata/testprogcgo/eintr.go, and fix the code.
If there are any such cases, their handling after this change will be
no worse than it is today.

For golang#22838
Fixes golang#20400
Fixes golang#36644
Fixes golang#38033
Fixes golang#38836

Change-Id: I7e46ca8cafed0429c7a2386cc9edc9d9d47a6896
Reviewed-on: https://go-review.googlesource.com/c/go/+/232862
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
(cherry picked from commit 8c1db77)
@golang golang locked and limited conversation to collaborators May 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants