Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/exec: s390x flake #19243

Closed
bradfitz opened this issue Feb 22, 2017 · 3 comments
Closed

os/exec: s390x flake #19243

bradfitz opened this issue Feb 22, 2017 · 3 comments

Comments

@bradfitz
Copy link
Contributor

https://build.golang.org/log/085751bd5790fd794a1a711f0f15bdefa146443a

--- FAIL: TestPipeLookPathLeak (0.97s)
	exec_test.go:314: leaked 4 fds; want ~0; have:
		COMMAND     PID USER   FD      TYPE DEVICE SIZE/OFF    NODE NAME
		exec.test 46001 root  cwd       DIR   94,1     4096 1312440 /data/golang/workdir/go/src/os/exec
		exec.test 46001 root  rtd       DIR   94,6     4096       2 /
		exec.test 46001 root  txt       REG   94,6  5990480  296171 /tmp/go-build658367580/os/exec/_test/exec.test
		exec.test 46001 root  mem       REG   0,18   217032   11606 /run/nscd/passwd
		exec.test 46001 root  mem       REG   94,6  1821624  269949 /lib64/libc-2.19.so
		exec.test 46001 root  mem       REG   94,6   137096  262296 /lib64/libpthread-2.19.so
		exec.test 46001 root  mem       REG   94,6   161664  262260 /lib64/ld-2.19.so
		exec.test 46001 root    0r      CHR    1,3      0t0    1028 /dev/null
		exec.test 46001 root    1w     FIFO    0,9      0t0 1093550 pipe
		exec.test 46001 root    2w     FIFO    0,9      0t0 1093550 pipe
		exec.test 46001 root    3r      CHR    1,3      0t0    1028 /dev/null
		exec.test 46001 root    4u     0000   0,10        0    6484 anon_inode
		exec.test 46001 root    5r     FIFO    0,9      0t0 1096158 pipe
		exec.test 46001 root    6w     FIFO    0,9      0t0 1096158 pipe
		exec.test 46001 root    7r     FIFO    0,9      0t0 1096159 pipe
		exec.test 46001 root    8w     FIFO    0,9      0t0 1096159 pipe
		exec.test 46001 root    9w  unknown                         /proc/46001/fd/9 (readlink: No such file or directory)
		
		originally:
		COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
		exec.test 46001 root  cwd    DIR   94,1     4096 1312440 /data/golang/workdir/go/src/os/exec
		exec.test 46001 root  rtd    DIR   94,6     4096       2 /
		exec.test 46001 root  txt    REG   94,6  5990480  296171 /tmp/go-build658367580/os/exec/_test/exec.test
		exec.test 46001 root  mem    REG   0,18   217032   11606 /run/nscd/passwd
		exec.test 46001 root  mem    REG   94,6  1821624  269949 /lib64/libc-2.19.so
		exec.test 46001 root  mem    REG   94,6   137096  262296 /lib64/libpthread-2.19.so
		exec.test 46001 root  mem    REG   94,6   161664  262260 /lib64/ld-2.19.so
		exec.test 46001 root    0r   CHR    1,3      0t0    1028 /dev/null
		exec.test 46001 root    1w  FIFO    0,9      0t0 1093550 pipe
		exec.test 46001 root    2w  FIFO    0,9      0t0 1093550 pipe
		exec.test 46001 root    4u  0000   0,10        0    6484 anon_inode
		exec.test 46001 root    5r  FIFO    0,9      0t0 1093604 pipe
		exec.test 46001 root    7r  FIFO    0,9      0t0 1093605 pipe
FAIL
FAIL	os/exec	1.645s
ok  	os/signal	4.618s

/cc @mundaym @ianlancetaylor

@bradfitz bradfitz added this to the Go1.9Maybe milestone Feb 22, 2017
@mundaym
Copy link
Member

mundaym commented Feb 22, 2017

Running locally I occasionally hit the maximum 3 tries in this test (though I haven't seen it fail). Putting it up to 5 might make this a bit more robust.

I tried experimenting with removing the exec.Command("something-that-does-not-exist-binary") and I still occasionally see fdGrowth hit 4. I'm wondering if the exec.Command("lsof", "-b", "-n", "-p", strconv.Itoa(os.Getpid())).Output() call is allocating pipes and lsof is sometimes seeing them. This might also explain another observation I made which is that fdGrowth is sometimes negative.

@mundaym
Copy link
Member

mundaym commented Feb 23, 2017

I played around with strace and I think what is happening is:

  1. Parent process creates 4 pipes, with 2 FDs each: stdin, stdout, stderr & error status
  2. Parent process starts child process (which inherits the FDs of the pipes)
  3. Parent process closes its copies of the FDs it created for the child process
  4. Child process starts, overwrites (dup2) its stdin, stdout and stderr with the pipe FDs
  5. Child process executes the command (in this case lsof)

This looks like a race condition, there is nothing preventing step 3 being delayed until after step 5 has started. So lsof can see the fds the parent process created for its process.

Some possible solutions:

a. Accept a difference of up to 4 (maybe 5) fds. Probably accompanied with an increase in the number of failed exec calls (the cleanup of which this test is trying to check) from 4 to 6 or more so that a leak of 1 pipe is definitely detected.
b. Add some sort of synchronisation between the parent and child processes.

I suspect that either of these fixes will remove the need for retries.

Does this sound reasonable?

@gopherbot
Copy link

CL https://golang.org/cl/37343 mentions this issue.

@golang golang locked and limited conversation to collaborators Feb 24, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants