Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: connCount underflow panic with MaxConnsPerHost #38172

Closed
andybalholm opened this issue Mar 31, 2020 · 19 comments
Closed

net/http: connCount underflow panic with MaxConnsPerHost #38172

andybalholm opened this issue Mar 31, 2020 · 19 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@andybalholm
Copy link
Contributor

I'm splitting this off from #34941 because it seems to be a separate issue, although with the same symptom.

$ go version
go version go1.14 darwin/amd64

I compiled this binary on macOS, but I'm running it on Linux amd64.

Here is the stack trace:

panic: net/http: internal error: connCount underflow

goroutine 2114 [running]:
net/http.(*Transport).decConnsPerHost(0x1094ca0, 0x0, 0x0, 0xc001f110b5, 0x4, 0xc001a27320, 0x14, 0x0)
		/usr/local/go/src/net/http/transport.go:1391 +0x590
net/http.(*persistConn).closeLocked(0xc0000cd320, 0xb8a620, 0xc000096b60)
		/usr/local/go/src/net/http/transport.go:2584 +0xf5
net/http.(*persistConn).close(0xc0000cd320, 0xb8a620, 0xc000096b60)
		/usr/local/go/src/net/http/transport.go:2574 +0x83
net/http.(*persistConn).readLoop.func1(0xc0000cd320, 0xc0073b4d60)
		/usr/local/go/src/net/http/transport.go:1946 +0x41
net/http.(*persistConn).readLoop(0xc0000cd320)
		/usr/local/go/src/net/http/transport.go:2122 +0x12ae
created by net/http.(*Transport).dialConn
		/usr/local/go/src/net/http/transport.go:1647 +0xc56

The usage is in a forward proxy server (github.com/andybalholm/redwood). We have about 2000 users running through the server I have been watching these errors on the most closely. So the usage is heavy, and probably about as close to random as you are likely to find.

I thought that switching to two Transports (one for HTTP/1.1 and one for HTTP/2) instead of one http.Transport that was configured to automatically switch protocols would get rid of the panics. But it didn't seem to help. (So it seems that the panic doesn't depend on having HTTP/2 enabled on the Transport.)

I had Transport.MaxConnsPerHost set to 8. Setting it to 0 (unlimited) made the panics go away.

@andybalholm
Copy link
Contributor Author

I just noticed that the binary that stack trace is from was compiled with Go 1.14, not Go 1.14.1 like I had thought originally. But we got the same panics with a binary compiled on Linux with Go 1.14.1.

@andybalholm andybalholm changed the title net/http: connCount underflow panic with MaxConnsPerHost net/http: connCount underflow panic with MaxConnsPerHost Mar 31, 2020
@andybons andybons added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 1, 2020
@andybons andybons added this to the Unplanned milestone Apr 1, 2020
@andybons
Copy link
Member

andybons commented Apr 1, 2020

@bcmills
Copy link
Contributor

bcmills commented Apr 1, 2020

@andybalholm, could you try testing with a go build from head, or patching in CL 220905? I wonder if this is another symptom of the bug that was fixed there.

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 1, 2020
@bcmills bcmills modified the milestones: Unplanned, Backlog Apr 1, 2020
@andybalholm
Copy link
Contributor Author

Building with

go version devel +620208790e Wed Apr 1 19:32:57 2020 +0000 darwin/amd64

I still get the panic:

panic: net/http: internal error: connCount underflow

goroutine 67124 [running]:
net/http.(*Transport).decConnsPerHost(0x108df80, 0x0, 0x0, 0xc00b2800e4, 0x4, 0xc002061900, 0x17, 0x0)
	/usr/local/go/src/net/http/transport.go:1388 +0x590
net/http.(*persistConn).closeLocked(0xc00af806c0, 0xb847e0, 0xc00006cb60)
	/usr/local/go/src/net/http/transport.go:2586 +0xf5
net/http.(*persistConn).close(0xc00af806c0, 0xb847e0, 0xc00006cb60)
	/usr/local/go/src/net/http/transport.go:2576 +0x83
net/http.(*persistConn).closeConnIfStillIdle(0xc00af806c0)
	/usr/local/go/src/net/http/transport.go:1890 +0xcb
created by time.goFunc
	/usr/local/go/src/time/sleep.go:169 +0x44

@bcmills bcmills removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 1, 2020
@fraenkel
Copy link
Contributor

fraenkel commented Apr 2, 2020

@bcmills This is an http request issue not http2.

@bcmills
Copy link
Contributor

bcmills commented Apr 7, 2020

@andybalholm, could you share some more detail about your usage? (Any unusual Transport customizations, proxies, or anything like that?)

@rsc
Copy link
Contributor

rsc commented Apr 7, 2020

If you have a bunch of these panic traces, are there any that don't have that stack sequence (decConnsPerHost called from closeLocked called from close)?

@andybalholm
Copy link
Contributor Author

@bcmills Here is my transport:

var dialer = &net.Dialer{
	Timeout:   30 * time.Second,
	KeepAlive: 30 * time.Second,
	DualStack: true,
}

var httpTransport = &http.Transport{
	Proxy:                 http.ProxyFromEnvironment,
	DialContext:           dialer.DialContext,
	TLSHandshakeTimeout:   10 * time.Second,
	ExpectContinueTimeout: 1 * time.Second,
}

The environment doesn't specify a proxy. I call http2.ConfigureTransport(httpTransport) elsewhere in the code. (But not making that call doesn't fix the problem.)

@andybalholm
Copy link
Contributor Author

@rsc All the stack traces seem to have that sequence. And close is being called from either net/http.(*persistConn).readLoop.func1 or net/http.(*persistConn).closeConnIfStillIdle.

@fraenkel
Copy link
Contributor

fraenkel commented Apr 8, 2020

One thing I realize is that we dont check the return of removeIdleConnLocked during closeConnIfStillIdle before we blindly close the persistConn.

@fraenkel
Copy link
Contributor

fraenkel commented Apr 8, 2020

More interesting is that the order of pc.close() and removeIdleConn vary which means we could easily get into this situation.

@fraenkel
Copy link
Contributor

fraenkel commented Apr 9, 2020

@andybalholm I am confused. You mention that you set MaxConnsPerHost = 8, but the above does not show you setting it. Where/When did you do that.

@andybalholm
Copy link
Contributor Author

andybalholm commented Apr 9, 2020 via email

@bcmills
Copy link
Contributor

bcmills commented Apr 9, 2020

@andybalholm, do you set those after the Transport is already in use? I don't believe that's safe in general. (It wouldn't surprise me if it's even a data race.)

What happens if you run your program under the race detector?

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 9, 2020
@andybalholm
Copy link
Contributor Author

andybalholm commented Apr 9, 2020 via email

@fraenkel
Copy link
Contributor

fraenkel commented Apr 9, 2020

Can you run with -race? That would catch this pretty fast if it was done later.

@andybalholm
Copy link
Contributor Author

andybalholm commented Apr 10, 2020

I can't run it with -race on the server where I was getting the panics; it's too heavily loaded. But I tried it on a local server with just me proxied through it, and that was enough:

WARNING: DATA RACE
Read at 0x000001587540 by goroutine 87:
  net/http.(*Transport).onceSetNextProtoDefaults()
      /usr/local/go/src/net/http/transport.go:344 +0x61
  net/http.(*Transport).onceSetNextProtoDefaults-fm()
      /usr/local/go/src/net/http/transport.go:343 +0x41
  sync.(*Once).doSlow()
      /usr/local/go/src/sync/once.go:66 +0x103
  sync.(*Once).Do()
      /usr/local/go/src/sync/once.go:57 +0x68
  net/http.(*Transport).roundTrip()
      /usr/local/go/src/net/http/transport.go:485 +0x8b
  net/http.(*Transport).RoundTrip()
      /usr/local/go/src/net/http/roundtrip.go:17 +0x42
  main.proxyHandler.ServeHTTP()
      /root/src/redwood/proxy.go:328 +0x17fb
  main.(*proxyHandler).ServeHTTP()
      <autogenerated>:1 +0x114
  net/http.serverHandler.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2807 +0xce
  net/http.initALPNRequest.ServeHTTP()
      /usr/local/go/src/net/http/server.go:3381 +0xfc
  net/http.(*initALPNRequest).ServeHTTP()
      <autogenerated>:1 +0xa6
  net/http.Handler.ServeHTTP-fm()
      /usr/local/go/src/net/http/server.go:87 +0x64
  golang.org/x/net/http2.(*serverConn).runHandler()
      /root/go/pkg/mod/golang.org/x/net@v0.0.0-20190213061140-3a22650c66bd/http2/server.go:2095 +0xc0

Previous write at 0x000001587540 by main goroutine:
  golang.org/x/net/http2.configureTransport()
      /root/go/pkg/mod/golang.org/x/net@v0.0.0-20190213061140-3a22650c66bd/http2/transport.go:174 +0x420
  golang.org/x/net/http2.ConfigureTransport()
      /root/go/pkg/mod/golang.org/x/net@v0.0.0-20190213061140-3a22650c66bd/http2/transport.go:136 +0x7ca
  main.main()
      /root/src/redwood/redwood.go:88 +0x7ab

It's been several years since I tried running with -race. I didn't find anything then, and I figured only a heavily-loaded production server would have enough concurrency to expose races if there were any. But now I know that running with -race locally is worth doing.

Reordering the startup steps gets rid of the race.

@mrjrieke
Copy link

mrjrieke commented Aug 12, 2020

Running go 1.14.6 or 1.14.7

Since this is a different code path and happening during automatic cleanup, seems like this shouldn't happen...

panic: net/http: internal error: connCount underflow
goroutine 25 [running]:
net/http.(*Transport).decConnsPerHost(0xc0004ba280, 0x0, 0x0, 0x7fae7523e750, 0x5, 0xc0004af9e0, 0x18, 0x0)
/usr/local/go/src/net/http/transport.go:1391 +0x590
net/http.(*persistConn).closeLocked(0xc0000a4000, 0x132adcf00, 0xc000118c50)
/usr/local/go/src/net/http/transport.go:2584 +0xf5
net/http.(*persistConn).close(0xc0000a4000, 0x132adcf00, 0xc000118c50)
/usr/local/go/src/net/http/transport.go:2574 +0x83
net/http.(*persistConn).closeConnIfStillIdle(0xc0000a4000)
/usr/local/go/src/net/http/transport.go:1888 +0xcb
created by time.goFunc
/usr/local/go/src/time/sleep.go:168 +0x44

Should I open a new issue for this?

@networkimprov
Copy link

@mrjrieke please try your code with the race detector, as that proved to be the problem above. If that doesn't reveal a mistake, please file a new issue.

@golang golang locked and limited conversation to collaborators Aug 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

8 participants