Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: (*Cmd).Wait hangs on macOS when setctty is used after commit b15c399a3 #61779

Open
FiloSottile opened this issue Aug 5, 2023 · 7 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Darwin
Milestone

Comments

@FiloSottile
Copy link
Contributor

A test that involves setting the command's TTY with the code below, and then calling (*Cmd).Wait, started timing out on macOS in Go 1.20.

func SetCtty(cmd *exec.Cmd, tty *os.File) {
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Setctty: true,
		Setsid:  true,
		Ctty:    3,
	}
	cmd.ExtraFiles = []*os.File{tty}
}

I bisected it down to b15c399.

commit b15c399a36a38509ae56dd69670974566f7b0d52
Author: Yuval Pavel Zholkover <paulzhol@gmail.com>
Date:   Sat Jul 30 20:41:58 2022 +0300

    os: only add file descriptors which are set to non-blocking mode to the netpoller

    Either ones where kind == kindNonBlock or those we've successfully called syscall.SetNonblock() on.
    Restore blocking behavior if we detect an error registering with the netpoller and our flow was
    successful in setting the inital syscall.SetNonblock().

    Update #54100

    Change-Id: I08934e4107c7fb36c15a7ca23ac880490b4df235
    Reviewed-on: https://go-review.googlesource.com/c/go/+/420334
    TryBot-Result: Gopher Robot <gobot@golang.org>
    Reviewed-by: Dmitri Goutnik <dgoutnik@gmail.com>
    Reviewed-by: Ian Lance Taylor <iant@google.com>
    Run-TryBot: Yuval Pavel Zholkover <paulzhol@gmail.com>
    Reviewed-by: Than McIntosh <thanm@google.com>
    Auto-Submit: Ian Lance Taylor <iant@golang.org>

 src/os/file_unix.go | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

You can see the failing tests in the CI of rogpeppe/go-internal#172 (https://github.com/rogpeppe/go-internal/actions/runs/5005273594?pr=172) or in age's tests (https://github.com/FiloSottile/age/actions/runs/5633140339/job/15261730363).

/cc @paulzhol @ianlancetaylor @dmgk #54100

@ianlancetaylor
Copy link
Contributor

I'm sure I'm doing something dumb but I can't find the sources for the TestScripts/pty test. I cloned https://github.com/FiloSottile/go-internal but I don't see that test.

If the cause is https://go.dev/cl/420334 then my first guess would be that Darwin kqueue does not permit adding a pty. But I don't understand what the pty is being used for. It looks like the subcommand stdin, stdout, and stderr will all be pipes.

@mvdan
Copy link
Member

mvdan commented Aug 6, 2023

@ianlancetaylor that testdata file is added in the PR https://github.com/rogpeppe/go-internal/pull/172/files, so you can find it at the branch https://github.com/FiloSottile/go-internal/tree/filippo/pty.

@FiloSottile
Copy link
Contributor Author

FiloSottile commented Aug 6, 2023 via email

FiloSottile added a commit to FiloSottile/go-internal that referenced this issue Aug 6, 2023
mvdan pushed a commit to rogpeppe/go-internal that referenced this issue Aug 7, 2023
@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 7, 2023
@mknyszek mknyszek added this to the Backlog milestone Aug 7, 2023
@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Aug 9, 2023

I don't fully understand what is happening.

It's clear that macOS does not permit opening /dev/tty and passing the resulting descriptor to kevent.

Before CL 420334 we would first try to add the descriptor to kqueue, and if that succeeded put the descriptor into non-blocking mode. After that CL we would first put the descriptor into non-blocking mode, and then try to add it to kqueue. We made that change because of #54100, in which we could sometimes add a descriptor to kqueue and then fail to make it non-blocking, which is bad. So now we first set the descriptor to non-blocking, and then try to add it to kqueue. If we fail to add it to kqueue, we put the descriptor back into blocking mode.

That is the sequence that happens for macOS with /dev/tty. It seems fine. But for some reason that I don't understand, opening /dev/tty, setting it to non-blocking mode, then setting it back to blocking mode, causes an inexplicable problem: the program hangs when calling _exit. Unfortunately I can't run dtruss on the system I am using, so I don't know precisely which system call is hanging. I don't know whether this is a libsystem bug or a kernel bug.

It is also possible that the problem arises because we put /dev/ptmx or the /dev/ttysNNN device into non-blocking mode.

I have not been able to write my own test case for this, although I can see that the test described above fails. I'm not sure what is different, other than that I am using the internal/testpty package from the standard library.

I have a simple fix that fixes this issue. Maybe we should just go with it. I'm not sure.

@gopherbot
Copy link

Change https://go.dev/cl/517555 mentions this issue: os: don't add character devices to kqueue on darwin

@paulzhol
Copy link
Member

paulzhol commented Aug 9, 2023

I don't have access to the hardware, but darwin has a test suite which sets the returned openpty() fd's to O_NONBLOCK, adds them to various kqueues and does read/write calls on them.

I'd love to see some system call traces showing whether the pair of opened ptys in the parent fail to be added to kqueue or it is the opening of /dev/tty in the child (which could be not having a controlling terminal anyway) which fails.

FreeBSD man page on tty devices has this section regarding TIOCNOTTY/TIOCSCTTY and /dev/tty:

The current system does not allocate a controlling terminal
to a process on an open() call: there is a specific ioctl
called TIOCSCTTY to make a terminal the controlling terminal.
In addition, a program can fork() and call the setsid() sys-
tem call which will place the process into its own session -
which has the effect of disassociating it from the control-
ling terminal. This is the new and preferred method for pro-
grams to lose their controlling terminal.

However, environmental restrictions may prohibit the process
from being able to fork() and call the setsid() system call
to disassociate it from the controlling terminal. In this
case, it must use TIOCNOTTY.

Maybe the child OSX process needs to set TIOCSCTTY on /dev/tty as well?

This does not explain the blocking of _exit of course.

@bcmills
Copy link
Contributor

bcmills commented Nov 3, 2023

The next time someone sees a failure mode matching this issue, could you paste a goroutine dump from the timed-out process directly into the GitHub issue? The CI links from the original post seem to have expired.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Darwin
Projects
None yet
Development

No branches or pull requests

7 participants