os/exec: TestExtraFiles closes random fds #5073

bradfitz · 2013-03-18T14:45:31Z

The new pollster should probably handle out of fds more gracefully.

Fixing an unrelated fd leak, I wrote a test to intentionally run out of file descriptors.

With the new scheduler, you get pages of this ~forever, EBADF from epoll_wait, I believe:

epollwait failed with epollwait failed with 99

epollwait failed with 9epollwait failed with 9
epollwait failed with
9
epollwait failed with epollwait failed with 9
9
epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 9
9
epollwait failed with 9epollwait failed with 9

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with 9epollwait failed with 9
epollwait failed with
epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with 9epollwait failed with
epollwait failed with 9
9
epollwait failed with 9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with
epollwait failed with 9
9
epollwait failed with epollwait failed with 99

epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 99
epollwait failed with
epollwait failed with 9
9epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 99
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 99

epollwait failed with epollwait failed with 99
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 99
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 99

epollwait failed with epollwait failed with 99
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with 9

dvyukov · 2013-03-18T15:09:54Z

Comment 1:

Please share the test.
I am curious bad fd is passed to epoll and not rejected earlier...

bradfitz · 2013-03-18T15:13:55Z

Comment 2:

The test is in https://golang.org/cl/7799046/
Make sure that fix is not in your tree (if I submit it soon) or sync to 43eb97ed849a,
and then patch in that test  to os/exec/exec_test.go (after TestPipes), making the loop
count much more than 4.  Say, 50:
// issue #5071
func TestPipeLookPathLeak(t *testing.T) {
        fd0 := numOpenFDS(t)
        for i := 0; i < 50; i++ {
                cmd := Command("something-that-does-not-exist-binary")
            cmd.StdoutPipe()
                cmd.StderrPipe()
            cmd.StdinPipe()
                if err := cmd.Run(); err == nil {
                        t.Fatal("unexpected success")
            }
        }
        fdGrowth := numOpenFDS(t) - fd0
    if fdGrowth > 2 {
                t.Errorf("leaked %d fds; want ~0", fdGrowth)
        }
}
func numOpenFDS(t *testing.T) int {
        lsof, err := Command("lsof", "-n", "-p", strconv.Itoa(os.Getpid())).Output()
        if err != nil {
                t.Skip("skipping test; error finding or running lsof")
                return 0
        }
    return bytes.Count(lsof, []byte("\n"))
}
...
And then run "go test -v os/exec" (run all tests, not just the leaky one, because a
future one must use epoll_wait)

dvyukov · 2013-03-22T08:26:35Z

Comment 3:

Can not reproduce.
I apply the following patch:
--- a/src/pkg/os/exec/exec.go   Thu Mar 21 22:37:13 2013 -0700
+++ b/src/pkg/os/exec/exec.go   Fri Mar 22 12:22:54 2013 +0400
@@ -235,8 +235,8 @@
 // Start starts the specified command but does not wait for it to complete.
 func (c *Cmd) Start() error {
    if c.err != nil {
-       c.closeDescriptors(c.closeAfterStart)
-       c.closeDescriptors(c.closeAfterWait)
+       // c.closeDescriptors(c.closeAfterStart)
+       // c.closeDescriptors(c.closeAfterWait)
        return c.err
    }
    if c.Process != nil {
diff -r 9ca85035f95a src/pkg/os/exec/exec_test.go
--- a/src/pkg/os/exec/exec_test.go  Thu Mar 21 22:37:13 2013 -0700
+++ b/src/pkg/os/exec/exec_test.go  Fri Mar 22 12:22:54 2013 +0400
@@ -154,7 +154,7 @@
 // issue #5071
 func TestPipeLookPathLeak(t *testing.T) {
    fd0 := numOpenFDS(t)
-   for i := 0; i < 4; i++ {
+   for i := 0; i < 100; i++ {
        cmd := Command("something-that-does-not-exist-binary")
        cmd.StdoutPipe()
        cmd.StderrPipe()
And then:
$ go test -cpu=1,2,4,8,16,32,1,2,4,8,16,32 os/exec
but it only says:
--- FAIL: TestPipeLookPathLeak (0.04 seconds)
    exec_test.go:168: leaked 6 fds; want ~0
--- FAIL: TestPipeLookPathLeak-2 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-4 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-8 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-16 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-32 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-2 (0.14 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-4 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-8 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-32 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0

bradfitz · 2013-03-22T15:35:05Z

Comment 4:

I can still reproduce.  You just need to set the number high enough for the ulimit.
ante:src $ hg diff
diff -r ccdacf6833dc src/pkg/os/exec/exec.go
--- a/src/pkg/os/exec/exec.go   Sat Mar 23 02:17:01 2013 +1100
+++ b/src/pkg/os/exec/exec.go   Fri Mar 22 08:32:50 2013 -0700
@@ -235,8 +235,8 @@
 // Start starts the specified command but does not wait for it to complete.
 func (c *Cmd) Start() error {
    if c.err != nil {
-       c.closeDescriptors(c.closeAfterStart)
-       c.closeDescriptors(c.closeAfterWait)
+       //c.closeDescriptors(c.closeAfterStart)
+       //c.closeDescriptors(c.closeAfterWait)
        return c.err
    }
    if c.Process != nil {
diff -r ccdacf6833dc src/pkg/os/exec/exec_test.go
--- a/src/pkg/os/exec/exec_test.go  Sat Mar 23 02:17:01 2013 +1100
+++ b/src/pkg/os/exec/exec_test.go  Fri Mar 22 08:32:50 2013 -0700
@@ -154,7 +154,7 @@
 // issue #5071
 func TestPipeLookPathLeak(t *testing.T) {
    fd0 := numOpenFDS(t)
-   for i := 0; i < 4; i++ {
+   for i := 0; i < 2000; i++ {
        cmd := Command("something-that-does-not-exist-binary")
        cmd.StdoutPipe()
        cmd.StderrPipe()
ante:src $ hg id
ccdacf6833dc+ tip
$ go test -v os/exec
Interestingly, I see three outcomes:
1) sometimes it passes.
2) sometimes it fails with two goroutines racing on stdout with garbled output (like in
my original report above), and 
3) sometimes it's only one goroutine looping the errors:
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9

dvyukov · 2013-03-25T17:11:34Z

Comment 5:

Hit deadlock in this test:
https://golang.org/issue/5123

dvyukov · 2013-03-25T18:08:00Z

Comment 6:

I see in strace that somebody actually closes epoll fd.

dvyukov · 2013-03-26T07:07:52Z

Comment 7:

It is simply this bad guy:
--- FAIL: TestPipeLookPathLeak (0.21 seconds)
    exec_test.go:168: leaked 12 fds; want ~0
=== RUN TestExtraFiles
--- PASS: TestExtraFiles (0.02 seconds)
    exec_test.go:209: Something already leaked - closed fd 3
    exec_test.go:209: Something already leaked - closed fd 4
    exec_test.go:209: Something already leaked - closed fd 5
    exec_test.go:209: Something already leaked - closed fd 6
    exec_test.go:209: Something already leaked - closed fd 7
    exec_test.go:209: Something already leaked - closed fd 8
It's not particularly cool to close random fds.
I think the solution is to limit the error message storm in such cases and move this
test into a separate process.

robpike · 2013-05-18T15:04:13Z

Comment 8:

Labels changed: added go1.2maybe, removed go1.1maybe.

dvyukov · 2013-07-30T17:02:03Z

Comment 9:

The message storm was fixed by:
changeset:   16430:163d528e7436
user:        Dmitriy Vyukov <dvyukov@google.com>
date:        Tue Mar 26 20:25:43 2013 +0400
summary:     runtime: does not report duplicate errors in netpoll
I don't know what to do with these strange tests that close random fds.

Owner changed to ---.

rsc · 2013-07-30T22:41:24Z

Comment 10:

Labels changed: added feature.

dvyukov · 2013-08-10T10:51:05Z

Comment 11:

Labels changed: removed feature.

rsc · 2013-09-10T23:11:21Z

Comment 12:

Labels changed: added testing.

rsc · 2013-10-02T00:30:56Z

Comment 13:

Labels changed: added go1.3, removed go1.2maybe.

rsc · 2013-12-04T01:30:05Z

Comment 14:

Labels changed: added release-go1.3.

rsc · 2013-12-04T01:30:26Z

Comment 15:

Labels changed: removed go1.3.

rsc · 2013-12-04T01:48:38Z

Comment 16:

Labels changed: added repo-main.

rsc · 2014-05-11T18:30:20Z

Comment 17:

Labels changed: added release-none, suggested, removed release-go1.3.

odeke-em · 2019-10-15T06:27:32Z

Hey @bradfitz, are you still able to reproduce this issue? It has been 6 years and this
issue hasn't surfaced. I just ran the repro that you had suggested in #5073 (comment) and I get

    exec_test.go:360: using lsof because: open /proc/self/fd: no such file or directory
    exec_test.go:403: number of open file descriptors changed: got 46, want 10

which is unrelated to this bug/report expected, as the number of file descriptors left open after that commented patch increases.

bradfitz · 2019-10-16T15:55:21Z

Let's just close this. It's super old & so much has changed.

We can reopen it if we ever see it again.

bradfitz added accepted Testing An issue that has been verified to require only test changes, not just a test failure. Suggested Issues that may be good for new contributors looking for work to do. labels May 11, 2014

rsc added this to the Unplanned milestone Apr 10, 2015

rsc removed release-none labels Apr 10, 2015

bradfitz closed this as completed Oct 16, 2019

golang locked and limited conversation to collaborators Oct 15, 2020

gopherbot added the FrozenDueToAge label Oct 15, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

os/exec: TestExtraFiles closes random fds #5073

os/exec: TestExtraFiles closes random fds #5073

bradfitz commented Mar 18, 2013

dvyukov commented Mar 18, 2013

bradfitz commented Mar 18, 2013

dvyukov commented Mar 22, 2013

bradfitz commented Mar 22, 2013

dvyukov commented Mar 25, 2013

dvyukov commented Mar 25, 2013

dvyukov commented Mar 26, 2013

robpike commented May 18, 2013

dvyukov commented Jul 30, 2013

rsc commented Jul 30, 2013

dvyukov commented Aug 10, 2013

rsc commented Sep 10, 2013

rsc commented Oct 2, 2013

rsc commented Dec 4, 2013

rsc commented Dec 4, 2013

rsc commented Dec 4, 2013

rsc commented May 11, 2014

odeke-em commented Oct 15, 2019

bradfitz commented Oct 16, 2019

os/exec: TestExtraFiles closes random fds #5073

os/exec: TestExtraFiles closes random fds #5073

Comments

bradfitz commented Mar 18, 2013

dvyukov commented Mar 18, 2013

bradfitz commented Mar 18, 2013

dvyukov commented Mar 22, 2013

bradfitz commented Mar 22, 2013

dvyukov commented Mar 25, 2013

dvyukov commented Mar 25, 2013

dvyukov commented Mar 26, 2013

robpike commented May 18, 2013

dvyukov commented Jul 30, 2013

rsc commented Jul 30, 2013

dvyukov commented Aug 10, 2013

rsc commented Sep 10, 2013

rsc commented Oct 2, 2013

rsc commented Dec 4, 2013

rsc commented Dec 4, 2013

rsc commented Dec 4, 2013

rsc commented May 11, 2014

odeke-em commented Oct 15, 2019

bradfitz commented Oct 16, 2019