net: pipe2: too many open files on S390X #20703

Closed
cherrymui opened this issue Jun 16, 2017 · 9 comments

@cherrymui
Member

Since https://go-review.googlesource.com/c/45815/, the S390X builder has been failing with

pipe2: too many open files
FAIL	net	3.146s

https://build.golang.org/log/c50cf67a5e81277c308bae6d09f661dbb5ecdef6

cc @ianlancetaylor @mundaym

@ianlancetaylor
Contributor

Running using gomote, I see

--- FAIL: TestVariousDeadlines (1.13s)
...
	timeout_test.go:913: for 250ms run 3/3, server in 250.726305ms wrote 395146821: readfrom tcp4 127.0.0.1:35191->127.0.0.1:35775: write tcp4 127.0.0.1:35191->127.0.0.1:35775: write: connection reset by peer
	timeout_test.go:878: 500ms run 1/3
	timeout_test.go:882: dial tcp 127.0.0.1:35191: socket: too many open files
--- PASS: TestTCPSpuriousConnSetupCompletionWithCancel (1.18s)
=== RUN   ExampleIPv4
pipe2: too many open files
exit status 1

I think it must be one of the new tests (TestTCPSpuriousConnSetupCompletion, TestTCPSpuriousConnSetupCompletionWithCancel) that are using up the file descriptors, but at the moment I'm not seeing it.

@ianlancetaylor ianlancetaylor added this to the Go1.9 milestone Jun 16, 2017
@ianlancetaylor
Contributor

On the s390 gomote ulimit -n reports 1024, as compared to my laptop where it reports 32768. /proc/sys/fs/file-max is 808890 on s390 gomote, 1048576 on my laptop. But running the net test under strace -f doesn't use nearly that many file descriptors. The largest file descriptor number I see is 159.
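For comparing machines from inside a Go program rather than from the shell, the per-process limit can be read with `syscall.Getrlimit`. This is a minimal sketch for Unix-like systems; the helper name `openFileLimit` is made up for illustration. Note that Go 1.19 and later raise the soft limit to the hard limit at startup, so on a current toolchain the soft value printed here may be higher than what `ulimit -n` shows in the shell.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimit (hypothetical helper) returns the soft and hard
// RLIMIT_NOFILE values for the current process.
func openFileLimit() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	return uint64(lim.Cur), uint64(lim.Max), nil
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Printf("open file limit: soft=%d hard=%d\n", soft, hard)
}
```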

@ianlancetaylor
Contributor

OK, running strace -f on gomote s390 shows that the net test is indeed opening file descriptors up to 1023 before calls to socket start to fail with EMFILE.
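As a lighter-weight alternative to strace, a process can count its own descriptors by listing `/proc/self/fd`. The sketch below is Linux-only and is not part of the net tests; `openFDs` and `heldDescriptors` are hypothetical helpers that only show the technique.

```go
package main

import (
	"fmt"
	"os"
)

// openFDs returns the number of entries in /proc/self/fd (Linux only).
// The count includes the descriptor used to read the directory itself.
func openFDs() (int, error) {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

// heldDescriptors counts descriptors before and while n extra files are
// held open, then closes them again.
func heldDescriptors(n int) (before, after int, err error) {
	before, err = openFDs()
	if err != nil {
		return 0, 0, err
	}
	files := make([]*os.File, 0, n)
	defer func() {
		for _, f := range files {
			f.Close()
		}
	}()
	for i := 0; i < n; i++ {
		f, err := os.Open(os.DevNull)
		if err != nil {
			return 0, 0, err
		}
		files = append(files, f)
	}
	after, err = openFDs()
	return before, after, err
}

func main() {
	before, after, err := heldDescriptors(5)
	if err != nil {
		fmt.Println("could not count descriptors:", err)
		return
	}
	fmt.Printf("descriptors: %d before, %d while holding 5 files\n", before, after)
}
```

Calling something like `openFDs` before and after a suspect test gives a rough leak check, though the Go runtime may open a few descriptors of its own (for example for its poller), so small differences are not conclusive.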

@ianlancetaylor
Contributor

The pattern I'm seeing is

[pid  7669] socket(PF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP <unfinished ...>
...
[pid  7669] <... socket resumed> )      = 998
...
[pid  7669] setsockopt(998, SOL_SOCKET, SO_BROADCAST, [1], 4 <unfinished ...>
...
[pid  7669] <... setsockopt resumed> )  = 0
...
[pid  7669] connect(998, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16 <unfinished ...>
...
[pid  7669] <... connect resumed> )     = 0
...
[pid  7669] close(998 <unfinished ...>
...
[pid  7669] <... close resumed> )       = 0

with long sequences of other unrelated system calls in the ellipses. 8.8.8.8 is the only address listed in /etc/resolv.conf in the gomote. In other words, we are creating a socket for contacting a UDP DNS server, but then we are closing that socket without writing anything to it. While that socket is open we are opening many more sockets. This pattern repeats over and over.
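The socket/connect/close sequence in the trace can be reproduced from user code with a plain UDP dial. This is only a sketch of the system call pattern, not the resolver code in package net; `connectWithoutWriting` is a hypothetical name, and the address is the one from the trace. For UDP, connect only records the peer address, so no datagram is sent.

```go
package main

import (
	"fmt"
	"net"
)

// connectWithoutWriting opens a UDP socket connected to addr and closes it
// again without sending anything. The descriptor is held for as long as
// the function runs, even though no packet leaves the machine.
func connectWithoutWriting(addr string) (local string, err error) {
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	return conn.LocalAddr().String(), nil
}

func main() {
	local, err := connectWithoutWriting("8.8.8.8:53")
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	fmt.Println("socket was bound locally to", local)
}
```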

@ianlancetaylor
Contributor

I think I see the problem. TestTCPSpuriousConnSetupCompletionWithCancel works by starting a bunch of connections and canceling them. The connections use a DNS lookup, because connecting to a simple IP address did not recreate the problem the test was testing for. Canceling the connection does not cancel the DNS lookup, but it does cause the singleflight to forget what it was doing. So we wind up stacking up more and more DNS lookups, using up more and more file descriptors.

@gopherbot

CL https://golang.org/cl/45999 mentions this issue.

@gopherbot

Change https://golang.org/cl/100840 mentions this issue: net: don't let cancelation of a DNS lookup affect another lookup

gopherbot pushed a commit that referenced this issue Mar 16, 2018
Updates #8602
Updates #20703
Fixes #22724

Change-Id: I27b72311b2c66148c59977361bd3f5101e47b51d
Reviewed-on: https://go-review.googlesource.com/100840
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot

Change https://golang.org/cl/102787 mentions this issue: [release-branch.go1.10] net: don't let cancelation of a DNS lookup affect another lookup

@gopherbot

Change https://golang.org/cl/103215 mentions this issue: [release-branch.go1.9] net: don't let cancelation of a DNS lookup affect another lookup

gopherbot pushed a commit that referenced this issue Mar 29, 2018
…fect another lookup

Updates #8602
Updates #20703
Fixes #22724

Change-Id: I27b72311b2c66148c59977361bd3f5101e47b51d
Reviewed-on: https://go-review.googlesource.com/100840
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-on: https://go-review.googlesource.com/102787
Run-TryBot: Andrew Bonventre <andybons@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@golang golang locked and limited conversation to collaborators Mar 28, 2019