cmd/cgo: Mac OS X 10.6 can leak fds to child processes #2603
Comments
I can't reproduce this in a stand-alone test on either OS X 10.7 or 10.6. Here's a patch that tries to: http://golang.org/cl/5503063 Even with the case "darwin" part commented out, it won't fail. I added logging in crypto/tls/root_darwin.go to verify the C code was being run. |
Owner changed to builder@golang.org. |
I'll see about bumping it to 10.7. Owner changed to @adg. |
I looked into this. There are fd leaks even on 10.7 when using TLS or DNS from the Mac libraries. We should fix this more generally in exec. stackoverflow.com/questions/899038 explains how to find the largest in-use fd on a variety of systems (all different). Updating the builders to 10.7 will not solve this problem. Owner changed to @rsc. |
FYI, CentOS 5/RHEL 5 also suffer from a similar bug. Test log: http://pastebin.com/xF7cN5Xw This os/exec test fails every time I run it, so I think we should solve it generally in os/exec. |
On the systems that we care about, fixing this requires reading from /dev/fd or /proc/self/fd to find out about the highest fds. Or we could just close fds n through 100. Both are kind of kludgy, and we're really just working around bugs in other software that happens to be sharing the same address space. This can wait until after Go 1. Labels changed: added priority-later, removed priority-go1. Status changed to HelpWanted. |
I'd like to add that NetBSD suffers from this problem too, for what that's worth. Reading Russ' comment, I gather that we want to close any FDs that may be open above a threshold: we know the threshold, but we don't know the upper limit, although we can determine it in a possibly OS-dependent way. If that's the case, I'm willing to look into it, although I'll only be able to test it on a (very) limited number of platforms. As for Russ' comment, my take is that a test ought to warn us when behaviour diverges from the expected; it's a judgement call whether bad OS conditions should be treated as divergence from the Go specification. I guess there has to be reliable documentation to cover special cases like this one. Lucio. |
This is also the case on Red Hat RHEL 6. I had thought it was a product of our build environment, because the os/exec test fails consistently there. We are using koji as our build infrastructure. Here is the log output of the failed build: http://pastebin.com/ZuQ3Zx2m I have been unable to reproduce it on a RHEL6 workstation or virtual machine, only on this build server. |
Re comment #13: I can't remember whether RHEL 6 is one of those kernels where O_CLOEXEC isn't respected on one of the system calls. It might be, but the failure you pasted does look like it's your build system leaking fds: exec_test.go:158: Something already leaked - closed fd 3 exec_test.go:211: CombinedOutput: exit status 1; output "leaked parent file. fd = 15; want 12 ... exec.test 7948 mockbuild 15r REG 9,1 4220044 18579833 /tmp/go-build506154113/os/exec/_test/exec.test |
I wonder if https://golang.org/issue/5714 was the root cause? |
I don't think issue #5714 is the root cause. It's because libc (libSystem) creates its own fds for certain operations, but those fds aren't O_CLOEXEC. |
Any chance of this getting some attention? (Is anybody sure it still happens with 10.10?) |
It doesn't happen after 10.6. We can't fix this; it's a kernel bug in 10.6. |
And we're discussing elsewhere just dropping 10.6 support in Go 1.5, too. Especially since we can't run virtualized builders for it legally. |
@davecheney: Are you sure? Russ's comment on 2012-02-15 says that there are leaks even on 10.7, and bumping the version to 10.7 will not help. |
(Plus further comments that say this happens on other operating systems. Perhaps the issue should be re-titled to reduce confusion.) |
You are probably correct. All I know is that it doesn't happen on current |
Per #9511, we will not be making any further bug fixes specific to 10.6. |
@rsc: This is not specific to 10.6. See your comment from 2012-02-15 and the last few comments in this thread. |
@rsc: Ping. I think this thread should be re-opened; it is not only about OS X 10.6. Also, the link still remains in the public documentation for os/exec. |
This issue has gotten confusing. Russ said it fails with OS X 10.7, Dave Cheney says it works on current OS X. Various people have chimed in with issues on other systems, but there are no test cases. I think it would be better to open a new issue with an example program and clear information about how and where it fails. |
Agreed, I first came here looking for clarification myself. :-) At the minimum though, the documentation should probably be updated to not point at this confusing, closed bug. |