Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: benchmarks fail on sierra with 'too many open files' #18753

Open
josharian opened this issue Jan 22, 2017 · 16 comments
Open

net/http: benchmarks fail on sierra with 'too many open files' #18753

josharian opened this issue Jan 22, 2017 · 16 comments
Labels
help wanted NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@josharian
Copy link
Contributor

Sample run excerpt:

BenchmarkCookieString-8                      	 2000000	       900 ns/op	     384 B/op	       3 allocs/op
BenchmarkReadSetCookies-8                    	  500000	      3703 ns/op	     960 B/op	      17 allocs/op
BenchmarkReadCookies-8                       	  300000	      4948 ns/op	    2008 B/op	      16 allocs/op
BenchmarkHeaderWriteSubset-8                 	 3000000	       545 ns/op	       0 B/op	       0 allocs/op
BenchmarkReadRequestChrome-8                 	  500000	      3187 ns/op	 191.68 MB/s	    1809 B/op	      14 allocs/op
BenchmarkReadRequestCurl-8                   	 1000000	      1663 ns/op	  46.90 MB/s	     880 B/op	       9 allocs/op
BenchmarkReadRequestApachebench-8            	 1000000	      1670 ns/op	  49.10 MB/s	     883 B/op	       9 allocs/op
BenchmarkReadRequestSiege-8                  	 1000000	      2161 ns/op	  69.87 MB/s	     965 B/op	      11 allocs/op
BenchmarkReadRequestWrk-8                    	 1000000	      1682 ns/op	  23.78 MB/s	     832 B/op	       7 allocs/op
BenchmarkClientServer-8                      	   20000	     71011 ns/op	    4962 B/op	      67 allocs/op
BenchmarkClientServerParallel4-8             	  100000	     15276 ns/op	    4937 B/op	      64 allocs/op
BenchmarkClientServerParallel64-8            	2017/01/22 11:19:46 http: Accept error: accept tcp 127.0.0.1:60387: accept: too many open files; retrying in 5ms
2017/01/22 11:19:46 http: Accept error: accept tcp 127.0.0.1:60387: accept: too many open files; retrying in 5ms
2017/01/22 11:19:46 http: Accept error: accept tcp 127.0.0.1:60387: accept: too many open files; retrying in 10ms
2017/01/22 11:19:46 http: Accept error: accept tcp 127.0.0.1:60387: accept: too many open files; retrying in 20ms
2017/01/22 11:19:46 http: Accept error: accept tcp 127.0.0.1:60387: accept: too many open files; retrying in 40ms
2017/01/22 11:19:46 http: Accept error: accept tcp 127.0.0.1:60387: accept: too many open files; retrying in 80ms
2017/01/22 11:19:46 http: Accept error: accept tcp 127.0.0.1:60387: accept: too many open files; retrying in 5ms
   10000	    135870 ns/op	    4749 B/op	      63 allocs/op
--- BENCH: BenchmarkClientServerParallel64-8
	serve_test.go:4494: Get: Get http://127.0.0.1:60387: dial tcp 127.0.0.1:60387: socket: too many open files
	serve_test.go:4494: Get: Get http://127.0.0.1:60387: dial tcp 127.0.0.1:60387: socket: too many open files
	serve_test.go:4494: Get: Get http://127.0.0.1:60387: dial tcp 127.0.0.1:60387: socket: too many open files
	serve_test.go:4494: Get: Get http://127.0.0.1:60387: dial tcp 127.0.0.1:60387: socket: too many open files
	serve_test.go:4494: Get: Get http://127.0.0.1:60387: dial tcp 127.0.0.1:60387: socket: too many open files
	serve_test.go:4494: Get: Get http://127.0.0.1:60387: dial tcp 127.0.0.1:60387: socket: too many open files
	serve_test.go:4494: Get: Get http://127.0.0.1:60387: dial tcp 127.0.0.1:60387: socket: too many open files
	serve_test.go:4494: Get: Get http://127.0.0.1:60387: dial tcp 127.0.0.1:60387: socket: too many open files
	serve_test.go:4494: Get: Get http://127.0.0.1:60387: dial tcp 127.0.0.1:60387: socket: too many open files
	serve_test.go:4494: Get: Get http://127.0.0.1:60387: dial tcp 127.0.0.1:60387: socket: too many open files
	... [output truncated]
BenchmarkClientServerParallelTLS4-8          	  100000	     15757 ns/op	    4945 B/op	      64 allocs/op
BenchmarkClientServerParallelTLS64-8         	2017/01/22 11:19:49 http: Accept error: accept tcp 127.0.0.1:60976: accept: too many open files; retrying in 5ms
2017/01/22 11:19:49 http: Accept error: accept tcp 127.0.0.1:60976: accept: too many open files; retrying in 5ms
2017/01/22 11:19:49 http: Accept error: accept tcp 127.0.0.1:60976: accept: too many open files; retrying in 10ms
2017/01/22 11:19:49 http: Accept error: accept tcp 127.0.0.1:60976: accept: too many open files; retrying in 20ms
2017/01/22 11:19:49 http: Accept error: accept tcp 127.0.0.1:60976: accept: too many open files; retrying in 40ms
2017/01/22 11:19:49 http: Accept error: accept tcp 127.0.0.1:60976: accept: too many open files; retrying in 5ms
2017/01/22 11:19:50 http: Accept error: accept tcp 127.0.0.1:60976: accept: too many open files; retrying in 5ms
2017/01/22 11:19:50 http: Accept error: accept tcp 127.0.0.1:60976: accept: too many open files; retrying in 5ms
2017/01/22 11:19:50 http: Accept error: accept tcp 127.0.0.1:60976: accept: too many open files; retrying in 5ms
2017/01/22 11:19:50 http: Accept error: accept tcp 127.0.0.1:60976: accept: too many open files; retrying in 5ms
2017/01/22 11:19:50 http: Accept error: accept tcp 127.0.0.1:60976: accept: too many open files; retrying in 5ms
2017/01/22 11:19:50 http: Accept error: accept tcp 127.0.0.1:60976: accept: too many open files; retrying in 5ms
    5000	    475368 ns/op	    6175 B/op	      80 allocs/op
--- BENCH: BenchmarkClientServerParallelTLS64-8
	serve_test.go:4494: Get: Get https://127.0.0.1:60976: dial tcp 127.0.0.1:60976: socket: too many open files
	serve_test.go:4494: Get: Get https://127.0.0.1:60976: dial tcp 127.0.0.1:60976: socket: too many open files
	serve_test.go:4494: Get: Get https://127.0.0.1:60976: dial tcp 127.0.0.1:60976: socket: too many open files
	serve_test.go:4494: Get: Get https://127.0.0.1:60976: dial tcp 127.0.0.1:60976: socket: too many open files
	serve_test.go:4494: Get: Get https://127.0.0.1:60976: dial tcp 127.0.0.1:60976: socket: too many open files
	serve_test.go:4494: Get: Get https://127.0.0.1:60976: dial tcp 127.0.0.1:60976: socket: too many open files
	serve_test.go:4494: Get: Get https://127.0.0.1:60976: dial tcp 127.0.0.1:60976: socket: too many open files
	serve_test.go:4494: Get: Get https://127.0.0.1:60976: dial tcp 127.0.0.1:60976: socket: too many open files
	serve_test.go:4494: Get: Get https://127.0.0.1:60976: dial tcp 127.0.0.1:60976: socket: too many open files
	serve_test.go:4494: Get: Get https://127.0.0.1:60976: dial tcp 127.0.0.1:60976: socket: too many open files
	... [output truncated]
BenchmarkServer-8                            	   20000	     75523 ns/op	    2341 B/op	      21 allocs/op
BenchmarkClient-8                            	   20000	     72795 ns/op	    3351 B/op	      50 allocs/op
@josharian josharian added this to the Go1.9 milestone Jan 22, 2017
@bradfitz
Copy link
Contributor

What's your ulimit -a? Macs are always super low, IIRC.

@bradfitz bradfitz added NeedsFix The path to resolution is known, but the work has not been done. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jan 22, 2017
@josharian
Copy link
Contributor Author

Yeah:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 709
virtual memory          (kbytes, -v) unlimited

@alexellis
Copy link

alexellis commented Jan 27, 2017

@bradfitz I'm facing the same issue what's the fix for this? Can the open connections be throttled or queued or is it worth moving to a different programming language for high-load web servers?

Within an Alpine Docker container I get this (that's odd, because the limit looks similar to the Host MacOs limit, but Docker is running its own VM - Moby)

1	/usr/bin/fwatchdog	socket:[35439]
1	/usr/bin/fwatchdog	socket:[36754]
1	/usr/bin/fwatchdog	socket:[35445]
1	/usr/bin/fwatchdog	socket:[36763]
1	/usr/bin/fwatchdog	socket:[35473]
~ # lsof|grep /usr/bin|wc -l
251
~ # ulimit -a
-f: file size (blocks)             unlimited
-t: cpu time (seconds)             unlimited
-d: data seg size (kb)             unlimited
-s: stack size (kb)                8192
-c: core file size (blocks)        0
-m: resident set size (kb)         unlimited
-l: locked memory (kb)             82000
-p: processes                      unlimited
-n: file descriptors               1048576
-v: address space (kb)             unlimited
-w: locks                          unlimited
-e: scheduling priority            0
-r: real-time priority             0

@bradfitz
Copy link
Contributor

@alexellis, it's not clear you're facing the same problem. What did you do?

or is it worth moving to a different programming language for high-load web servers?

If your OS configuration only permits N file descriptors, the program language won't matter at all. You only get N regardless.

It's possible Docker or Moby are the ones hitting the limit and passing it back to you.

@alexellis
Copy link

I could be wrong - but it does give the same error as listed.

When I generate a high load on my Golang web server which uses net/http - I'm seeing hundreds of connections on lsof.

The example code is @ https://github.com/alexellis/faas/blob/labels_metrics/watchdog/main.go

As far as I can see I am closing what's being opened.

What I meant by a different language / HTTP server implementation was one that would control the in-flight connections. The ones I'm seeing I think have already "served" a connection but appear to have something open.

@bradfitz
Copy link
Contributor

@alexellis, can you file a separate bug? I'd like to keep this one about @josharian's very specific bug report.

Your case seems like it would involve enough debugging that I suspect it's going to drown out the original intent of this bug.

@pwaller
Copy link
Contributor

pwaller commented Jan 30, 2017

Just to add another data point, I am not an OSX user but have a contractor trying to run my code, which fails with "too many open files" during an accept in http.ListenAndServe. Apologies but I don't want to open the bug since it's not happening on my computer so I can't provide many details. It's a Yosimite system, and running go1.8rc2.

@pwaller
Copy link
Contributor

pwaller commented Feb 1, 2017

I retract the above, it was because of the fsnotify package creating lots of file descriptors and the limit being 256 with a low system-wide limit, too. Apologies for the noise.

@kennygrant
Copy link
Contributor

I've seen this too recently (same output as josharian above). Errors on all the BenchmarkClientServerParallel tests, most often on BenchmarkClientServerParallel64.

You can raise the file limit with ulimit (for current session), but then I see a different error:

Ada:http kenny$ ulimit -n 10000
Ada:http kenny$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 10000
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 709
virtual memory          (kbytes, -v) unlimited
Ada:http kenny$ go test -bench=.
goos: darwin
goarch: amd64
pkg: net/http
BenchmarkCookieString-4                      	 1000000	      1103 ns/op	     384 B/op	       3 allocs/op
BenchmarkReadSetCookies-4                    	  500000	      3599 ns/op	     688 B/op	      11 allocs/op
BenchmarkReadCookies-4                       	  300000	      4672 ns/op	    2008 B/op	      16 allocs/op
BenchmarkHeaderWriteSubset-4                 	 2000000	       701 ns/op	       0 B/op	       0 allocs/op
BenchmarkReadRequestChrome-4                 	  500000	      3823 ns/op	 159.82 MB/s	    1808 B/op	      14 allocs/op
BenchmarkReadRequestCurl-4                   	 1000000	      2102 ns/op	  37.10 MB/s	     880 B/op	       9 allocs/op
BenchmarkReadRequestApachebench-4            	 1000000	      2012 ns/op	  40.75 MB/s	     883 B/op	       9 allocs/op
BenchmarkReadRequestSiege-4                  	 1000000	      2701 ns/op	  55.89 MB/s	     965 B/op	      11 allocs/op
BenchmarkReadRequestWrk-4                    	 1000000	      1434 ns/op	  27.88 MB/s	     832 B/op	       7 allocs/op
BenchmarkClientServer-4                      	   20000	     86044 ns/op	    4852 B/op	      64 allocs/op
BenchmarkClientServerParallel4-4             	   50000	     51470 ns/op	    7672 B/op	      74 allocs/op
BenchmarkClientServerParallel64-4            	    3000	    928726 ns/op	   11285 B/op	      88 allocs/op
--- BENCH: BenchmarkClientServerParallel64-4
	serve_test.go:4493: Get: Get http://127.0.0.1:61917: read tcp 127.0.0.1:62040->127.0.0.1:61917: read: connection reset by peer
	serve_test.go:4493: Get: Get http://127.0.0.1:61917: read tcp 127.0.0.1:62039->127.0.0.1:61917: read: connection reset by peer
	serve_test.go:4493: Get: Get http://127.0.0.1:61917: read tcp 127.0.0.1:62038->127.0.0.1:61917: read: connection reset by peer
	serve_test.go:4493: Get: Get http://127.0.0.1:61917: read tcp 127.0.0.1:62037->127.0.0.1:61917: read: connection reset by peer
	serve_test.go:4493: Get: Get http://127.0.0.1:61917: read tcp 127.0.0.1:62036->127.0.0.1:61917: read: connection reset by peer
	serve_test.go:4493: Get: Get http://127.0.0.1:61917: read tcp 127.0.0.1:62035->127.0.0.1:61917: read: connection reset by peer
	serve_test.go:4493: Get: Get http://127.0.0.1:61917: read tcp 127.0.0.1:62031->127.0.0.1:61917: read: connection reset by peer
	serve_test.go:4493: Get: Get http://127.0.0.1:61917: read tcp 127.0.0.1:62034->127.0.0.1:61917: read: connection reset by peer
	serve_test.go:4493: Get: Get http://127.0.0.1:61917: read tcp 127.0.0.1:62029->127.0.0.1:61917: read: connection reset by peer
	serve_test.go:4493: Get: Get http://127.0.0.1:61917: read tcp 127.0.0.1:62033->127.0.0.1:61917: read: connection reset by peer
	... [output truncated]
BenchmarkClientServerParallelTLS4-4          	2017/03/17 09:03:38 http: TLS handshake error from 127.0.0.1:63863: read tcp 127.0.0.1:63405->127.0.0.1:63863: use of closed file or network connection
2017/03/17 09:03:39 http: TLS handshake error from 127.0.0.1:63354: read tcp 127.0.0.1:63865->127.0.0.1:63354: use of closed file or network connection
2017/03/17 09:03:39 http: TLS handshake error from 127.0.0.1:63360: read tcp 127.0.0.1:63865->127.0.0.1:63360: use of closed file or network connection
    5000	    273600 ns/op	   21305 B/op	     228 allocs/op
BenchmarkClientServerParallelTLS64-4         	    2000	    547983 ns/op	   48283 B/op	     488 allocs/op
--- BENCH: BenchmarkClientServerParallelTLS64-4
	serve_test.go:4493: Get: Get https://127.0.0.1:63468: read tcp 127.0.0.1:63513->127.0.0.1:63468: read: connection reset by peer
	serve_test.go:4493: Get: Get https://127.0.0.1:63468: read tcp 127.0.0.1:63516->127.0.0.1:63468: read: connection reset by peer
	serve_test.go:4493: Get: Get https://127.0.0.1:63468: read tcp 127.0.0.1:63515->127.0.0.1:63468: read: connection reset by peer
	serve_test.go:4493: Get: Get https://127.0.0.1:63468: read tcp 127.0.0.1:63517->127.0.0.1:63468: read: connection reset by peer
	serve_test.go:4493: Get: Get https://127.0.0.1:63468: read tcp 127.0.0.1:63518->127.0.0.1:63468: read: connection reset by peer
	serve_test.go:4493: Get: Get https://127.0.0.1:63468: read tcp 127.0.0.1:63520->127.0.0.1:63468: read: connection reset by peer
	serve_test.go:4493: Get: Get https://127.0.0.1:63468: read tcp 127.0.0.1:63519->127.0.0.1:63468: read: connection reset by peer
	serve_test.go:4493: Get: Get https://127.0.0.1:63468: read tcp 127.0.0.1:63521->127.0.0.1:63468: read: connection reset by peer
	serve_test.go:4493: Get: Get https://127.0.0.1:63468: read tcp 127.0.0.1:63522->127.0.0.1:63468: read: connection reset by peer
	serve_test.go:4493: Get: Get https://127.0.0.1:63468: read tcp 127.0.0.1:63524->127.0.0.1:63468: read: connection reset by peer
	... [output truncated]
BenchmarkServer-4                            	   10000	    117595 ns/op	    2238 B/op	      20 allocs/op
BenchmarkClient-4                            	   10000	    102093 ns/op	    3349 B/op	      48 allocs/op
BenchmarkServerFakeConnNoKeepAlive-4         	  100000	     18730 ns/op	    4524 B/op	      44 allocs/op
BenchmarkServerFakeConnWithKeepAlive-4       	  100000	     14655 ns/op	    2439 B/op	      22 allocs/op
BenchmarkServerFakeConnWithKeepAliveLite-4   	  200000	      7824 ns/op	    1300 B/op	      12 allocs/op
BenchmarkServerHandlerTypeLen-4              	  200000	      9385 ns/op	    2119 B/op	      20 allocs/op
BenchmarkServerHandlerNoLen-4                	  200000	      8380 ns/op	    2087 B/op	      17 allocs/op
BenchmarkServerHandlerNoType-4               	  200000	      8227 ns/op	    2087 B/op	      18 allocs/op
BenchmarkServerHandlerNoHeader-4             	  200000	      6155 ns/op	    1300 B/op	      12 allocs/op
BenchmarkServerHijack-4                      	  100000	     24303 ns/op	   15968 B/op	      42 allocs/op
BenchmarkCloseNotifier-4                     	    2000	    675642 ns/op	    3316 B/op	      54 allocs/op
PASS
ok  	net/http	82.847s

Running the tests with b.SetParallelism(1) in benchmarkClientServerParallel seems to fix it, so presumably it's some problem with running these tests in parallel.

@mahadevTW
Copy link

I see same issue on centos ,Ulimit open files is 10,000.
in my case:
Code looks like:
func someHandler(request, response) { for v := range obj.GetSomeChannel() { go v.requestSomeHttpServer() //v.request is again one more http call to. } }
and this someHandler is getting called multiple times because it serves user front service.

Now , v.requestSomeHttpServer is not able to serve because there are multiple request which hitting and it is throwing too many open files.

@bradfitz bradfitz added the Testing An issue that has been verified to require only test changes, not just a test failure. label May 24, 2017
@bradfitz bradfitz removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 27, 2017
@broady broady modified the milestones: Go1.9Maybe, Go1.9 Jul 17, 2017
@bradfitz bradfitz modified the milestones: Go1.9Maybe, Go1.10 Jul 20, 2017
@bradfitz
Copy link
Contributor

bradfitz commented Jan 4, 2018

I looked into this a few weeks back and it does appear that something weird is going on. There were many more connections than I expected from reading the benchmark.

But this needn't block Go 1.10, since benchmarks don't run by default, so punting.

@carusyte
Copy link

My web crawler program which was able to run smoothly now crashes because of too many open files error after upgrading to 1.10. OS type is Mac OS 10.13.6.

@davecheney
Copy link
Contributor

@carusyte I would encourage you to file a new bug with instructions for how to reproduce this error. Thank you.

@dxjones
Copy link

dxjones commented Nov 21, 2018

I am still seeing this error read: connection reset by peer when testing parallel HTTP 1.1 connections on localhost with Go 1.11.1 on both Mac OSX 10.13.6 and Linux Ubuntu 18.04

Has there been any progress on this issue in the past year?

Are there any practical workarounds?

@agnivade
Copy link
Contributor

This issue is specifically about "too many open files", not "connection reset by peer". I believe you already commented on #20960 about this. But if you feel your issue is separate, please feel free to file a new issue.

@JoshuaWise
Copy link

JoshuaWise commented Dec 6, 2018

I'm also getting this error, even with ulimit -n 1024. I'm just running a pretty simple benchmarking app handwritten in go.

const SAMPLE_SIZE = 128
const CONCURRENCY = 64

func benchmark(urls []string) []BenchmarkResult {
  channel := make(chan BenchmarkResult)
  results := make([]BenchmarkResult, 0)
  urlCount := len(urls)

  for needsMore(results) {
    actives := 0
    pending := SAMPLE_SIZE
    for pending > 0 {
      for actives < CONCURRENCY && actives < pending {
        // The request() function will pass a result to the channel after the http request returns.
        go request(urls[rand.Intn(urlCount)], channel)
        actives += 1
      }
      result := <-channel
      results = append(results, result)
      actives -= 1
      pending -= 1
    }
    /*
      Irrelevant math for benchmarking purposes here...
     */
  }

  return results
}

According to my code, there should never be more than 64 requests open at a time, so the ulimit of 1024 should be fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests