Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/exec: TestExtraFiles closes random fds #5073

Closed
bradfitz opened this issue Mar 18, 2013 · 19 comments
Closed

os/exec: TestExtraFiles closes random fds #5073

bradfitz opened this issue Mar 18, 2013 · 19 comments
Labels
FrozenDueToAge Suggested Issues that may be good for new contributors looking for work to do. Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@bradfitz
Copy link
Contributor

The new pollster should probably handle out of fds more gracefully.

Fixing an unrelated fd leak, I wrote a test to intentionally run out of file descriptors.

With the new scheduler, you get pages of this ~forever, EBADF from epoll_wait, I believe:

epollwait failed with epollwait failed with 99

epollwait failed with 9epollwait failed with 9
epollwait failed with
9
epollwait failed with epollwait failed with 9
9
epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 9
9
epollwait failed with 9epollwait failed with 9

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with 9epollwait failed with 9
epollwait failed with
epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with 9epollwait failed with
epollwait failed with 9
9
epollwait failed with 9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with
epollwait failed with 9
9
epollwait failed with epollwait failed with 99

epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 99
epollwait failed with
epollwait failed with 9
9epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 99
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 99

epollwait failed with epollwait failed with 99
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 99
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 99

epollwait failed with epollwait failed with 99
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
@dvyukov
Copy link
Member

dvyukov commented Mar 18, 2013

Comment 1:

Please share the test.
I am curious bad fd is passed to epoll and not rejected earlier...

@bradfitz
Copy link
Contributor Author

Comment 2:

The test is in https://golang.org/cl/7799046/
Make sure that fix is not in your tree (if I submit it soon) or sync to 43eb97ed849a,
and then patch in that test  to os/exec/exec_test.go (after TestPipes), making the loop
count much more than 4.  Say, 50:
// issue #5071
func TestPipeLookPathLeak(t *testing.T) {
        fd0 := numOpenFDS(t)
        for i := 0; i < 50; i++ {
                cmd := Command("something-that-does-not-exist-binary")
            cmd.StdoutPipe()
                cmd.StderrPipe()
            cmd.StdinPipe()
                if err := cmd.Run(); err == nil {
                        t.Fatal("unexpected success")
            }
        }
        fdGrowth := numOpenFDS(t) - fd0
    if fdGrowth > 2 {
                t.Errorf("leaked %d fds; want ~0", fdGrowth)
        }
}
func numOpenFDS(t *testing.T) int {
        lsof, err := Command("lsof", "-n", "-p", strconv.Itoa(os.Getpid())).Output()
        if err != nil {
                t.Skip("skipping test; error finding or running lsof")
                return 0
        }
    return bytes.Count(lsof, []byte("\n"))
}
...
And then run "go test -v os/exec" (run all tests, not just the leaky one, because a
future one must use epoll_wait)

@dvyukov
Copy link
Member

dvyukov commented Mar 22, 2013

Comment 3:

Can not reproduce.
I apply the following patch:
--- a/src/pkg/os/exec/exec.go   Thu Mar 21 22:37:13 2013 -0700
+++ b/src/pkg/os/exec/exec.go   Fri Mar 22 12:22:54 2013 +0400
@@ -235,8 +235,8 @@
 // Start starts the specified command but does not wait for it to complete.
 func (c *Cmd) Start() error {
    if c.err != nil {
-       c.closeDescriptors(c.closeAfterStart)
-       c.closeDescriptors(c.closeAfterWait)
+       // c.closeDescriptors(c.closeAfterStart)
+       // c.closeDescriptors(c.closeAfterWait)
        return c.err
    }
    if c.Process != nil {
diff -r 9ca85035f95a src/pkg/os/exec/exec_test.go
--- a/src/pkg/os/exec/exec_test.go  Thu Mar 21 22:37:13 2013 -0700
+++ b/src/pkg/os/exec/exec_test.go  Fri Mar 22 12:22:54 2013 +0400
@@ -154,7 +154,7 @@
 // issue #5071
 func TestPipeLookPathLeak(t *testing.T) {
    fd0 := numOpenFDS(t)
-   for i := 0; i < 4; i++ {
+   for i := 0; i < 100; i++ {
        cmd := Command("something-that-does-not-exist-binary")
        cmd.StdoutPipe()
        cmd.StderrPipe()
And then:
$ go test -cpu=1,2,4,8,16,32,1,2,4,8,16,32 os/exec
but it only says:
--- FAIL: TestPipeLookPathLeak (0.04 seconds)
    exec_test.go:168: leaked 6 fds; want ~0
--- FAIL: TestPipeLookPathLeak-2 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-4 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-8 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-16 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-32 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-2 (0.14 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-4 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-8 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-32 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0

@bradfitz
Copy link
Contributor Author

Comment 4:

I can still reproduce.  You just need to set the number high enough for the ulimit.
ante:src $ hg diff
diff -r ccdacf6833dc src/pkg/os/exec/exec.go
--- a/src/pkg/os/exec/exec.go   Sat Mar 23 02:17:01 2013 +1100
+++ b/src/pkg/os/exec/exec.go   Fri Mar 22 08:32:50 2013 -0700
@@ -235,8 +235,8 @@
 // Start starts the specified command but does not wait for it to complete.
 func (c *Cmd) Start() error {
    if c.err != nil {
-       c.closeDescriptors(c.closeAfterStart)
-       c.closeDescriptors(c.closeAfterWait)
+       //c.closeDescriptors(c.closeAfterStart)
+       //c.closeDescriptors(c.closeAfterWait)
        return c.err
    }
    if c.Process != nil {
diff -r ccdacf6833dc src/pkg/os/exec/exec_test.go
--- a/src/pkg/os/exec/exec_test.go  Sat Mar 23 02:17:01 2013 +1100
+++ b/src/pkg/os/exec/exec_test.go  Fri Mar 22 08:32:50 2013 -0700
@@ -154,7 +154,7 @@
 // issue #5071
 func TestPipeLookPathLeak(t *testing.T) {
    fd0 := numOpenFDS(t)
-   for i := 0; i < 4; i++ {
+   for i := 0; i < 2000; i++ {
        cmd := Command("something-that-does-not-exist-binary")
        cmd.StdoutPipe()
        cmd.StderrPipe()
ante:src $ hg id
ccdacf6833dc+ tip
$ go test -v os/exec
Interestingly, I see three outcomes:
1) sometimes it passes.
2) sometimes it fails with two goroutines racing on stdout with garbled output (like in
my original report above), and 
3) sometimes it's only one goroutine looping the errors:
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9

@dvyukov
Copy link
Member

dvyukov commented Mar 25, 2013

Comment 5:

Hit deadlock in this test:
https://golang.org/issue/5123

@dvyukov
Copy link
Member

dvyukov commented Mar 25, 2013

Comment 6:

I see in strace that somebody actually closes epoll fd.

@dvyukov
Copy link
Member

dvyukov commented Mar 26, 2013

Comment 7:

It is simply this bad guy:
--- FAIL: TestPipeLookPathLeak (0.21 seconds)
    exec_test.go:168: leaked 12 fds; want ~0
=== RUN TestExtraFiles
--- PASS: TestExtraFiles (0.02 seconds)
    exec_test.go:209: Something already leaked - closed fd 3
    exec_test.go:209: Something already leaked - closed fd 4
    exec_test.go:209: Something already leaked - closed fd 5
    exec_test.go:209: Something already leaked - closed fd 6
    exec_test.go:209: Something already leaked - closed fd 7
    exec_test.go:209: Something already leaked - closed fd 8
It's not particularly cool to close random fds.
I think the solution is to limit the error message storm in such cases and move this
test into a separate process.

@robpike
Copy link
Contributor

robpike commented May 18, 2013

Comment 8:

Labels changed: added go1.2maybe, removed go1.1maybe.

@dvyukov
Copy link
Member

dvyukov commented Jul 30, 2013

Comment 9:

The message storm was fixed by:
changeset:   16430:163d528e7436
user:        Dmitriy Vyukov <dvyukov@google.com>
date:        Tue Mar 26 20:25:43 2013 +0400
summary:     runtime: does not report duplicate errors in netpoll
I don't know what to do with these strange tests that close random fds.

Owner changed to ---.

@rsc
Copy link
Contributor

rsc commented Jul 30, 2013

Comment 10:

Labels changed: added feature.

@dvyukov
Copy link
Member

dvyukov commented Aug 10, 2013

Comment 11:

Labels changed: removed feature.

@rsc
Copy link
Contributor

rsc commented Sep 10, 2013

Comment 12:

Labels changed: added testing.

@rsc
Copy link
Contributor

rsc commented Oct 2, 2013

Comment 13:

Labels changed: added go1.3, removed go1.2maybe.

@rsc
Copy link
Contributor

rsc commented Dec 4, 2013

Comment 14:

Labels changed: added release-go1.3.

@rsc
Copy link
Contributor

rsc commented Dec 4, 2013

Comment 15:

Labels changed: removed go1.3.

@rsc
Copy link
Contributor

rsc commented Dec 4, 2013

Comment 16:

Labels changed: added repo-main.

@rsc
Copy link
Contributor

rsc commented May 11, 2014

Comment 17:

Labels changed: added release-none, suggested, removed release-go1.3.

@bradfitz bradfitz added accepted Testing An issue that has been verified to require only test changes, not just a test failure. Suggested Issues that may be good for new contributors looking for work to do. labels May 11, 2014
@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@odeke-em
Copy link
Member

Hey @bradfitz, are you still able to reproduce this issue? It has been 6 years and this
issue hasn't surfaced. I just ran the repro that you had suggested in #5073 (comment) and I get

    exec_test.go:360: using lsof because: open /proc/self/fd: no such file or directory
    exec_test.go:403: number of open file descriptors changed: got 46, want 10

which is unrelated to this bug/report expected, as the number of file descriptors left open after that commented patch increases.

@bradfitz
Copy link
Contributor Author

Let's just close this. It's super old & so much has changed.

We can reopen it if we ever see it again.

@golang golang locked and limited conversation to collaborators Oct 15, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge Suggested Issues that may be good for new contributors looking for work to do. Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

6 participants