
net/http: HTTP/2 with MaxConnsPerHost hangs or crashes #34941

Closed
rhysh opened this issue Oct 16, 2019 · 24 comments
Labels: FrozenDueToAge, NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.)

@rhysh (Contributor) commented Oct 16, 2019

What version of Go are you using (go version)?

$ go1.13 version
go version go1.13.1 darwin/amd64
$ go-tip version
go version devel +f6c624a22a Wed Oct 16 15:58:33 2019 +0000 darwin/amd64
$ go1.12 version
go version go1.12.10 darwin/amd64
$ go1.11 version
go version go1.11.13 darwin/amd64

Does this issue reproduce with the latest release?

Yes: The test hangs with Go 1.13. The test panics with tip.

What operating system and processor architecture are you using (go env)?

go env Output
$ go1.13 env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/rhys/Library/Caches/go-build"
GOENV="/Users/rhys/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY="*"
GONOSUMDB="*"
GOOS="darwin"
GOPATH="/Users/rhys/go"
GOPRIVATE="*"
GOPROXY="direct"
GOROOT="/Users/rhys/go/version/go1.13"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/Users/rhys/go/version/go1.13/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/49/zmds5zsn75z1283vtzxyfr5hj7yjq4/T/go-build216795429=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Using an http.Client with an http.Transport configured to use http/2 and with MaxConnsPerHost = 1, I started a large number of concurrent http/2 requests: more than the server would allow on a single TCP connection.

In the test environment, I did this by having the server delay its responses until I knew a large number of requests were either active on the connection or waiting for permission to use the connection, at which point I allowed the http.Handler to respond.

What did you expect to see?

I expected that every request I started would eventually be seen by the server, and that the client would receive the server's response.

What did you see instead?

In Go 1.11 where the default behavior is to use a single TCP connection for each http/2 authority, the requests queue up and once I allow the handler to return they are all processed. That's fine in this context.

In Go 1.12 where the http.Transport is correctly able to use more than one TCP connection for each http/2 authority (but doesn't yet know how to use MaxConnsPerHost), it passes all of the requests on to the server immediately. That's also fine in this context.

In Go 1.13, the server only sees the first 249 requests. The others appear stuck indefinitely. I see that as a bug.

In tip, the Transport panics. That's a second bug.


$ go1.13 test -v ./repro_test.go
=== RUN   TestMaxConns
2019/10/16 14:27:43 http: TLS handshake error from 127.0.0.1:50300: remote error: tls: bad certificate
--- FAIL: TestMaxConns (1.15s)
    repro_test.go:82: starts=249 finishes=0
    repro_test.go:82: starts=249 finishes=0
    repro_test.go:82: starts=249 finishes=0
    repro_test.go:82: starts=249 finishes=0
    repro_test.go:82: starts=249 finishes=0
    repro_test.go:82: starts=249 finishes=249
    repro_test.go:82: starts=249 finishes=249
    repro_test.go:82: starts=249 finishes=249
    repro_test.go:82: starts=249 finishes=249
    repro_test.go:82: starts=249 finishes=249
    repro_test.go:86: HTTP/2 requests started: 249 != 300
    repro_test.go:89: HTTP/2 requests completed: 249 != 300
FAIL
FAIL    command-line-arguments  2.287s
FAIL

$ go-tip test -v ./repro_test.go
=== RUN   TestMaxConns
2019/10/16 14:27:57 http: TLS handshake error from 127.0.0.1:50603: remote error: tls: bad certificate
panic: net/http: internal error: connCount underflow

goroutine 371 [running]:
net/http.(*Transport).decConnsPerHost(0xc0000d63c0, 0x0, 0x0, 0xc0000c6540, 0x5, 0xc0000160d0, 0xf, 0x0)
        /usr/local/go/src/net/http/transport.go:1318 +0x604
net/http.(*Transport).roundTrip(0xc0000d63c0, 0xc00024f200, 0x0, 0xc000626ac0, 0x100e518)
        /usr/local/go/src/net/http/transport.go:542 +0x762
net/http.(*Transport).RoundTrip(0xc0000d63c0, 0xc00024f200, 0xc0000d63c0, 0x0, 0x0)
        /usr/local/go/src/net/http/roundtrip.go:17 +0x35
net/http.send(0xc00024f200, 0x13bd4c0, 0xc0000d63c0, 0x0, 0x0, 0x0, 0xc000010338, 0x241ef00, 0x1, 0x0)
        /usr/local/go/src/net/http/client.go:252 +0x43e
net/http.(*Client).send(0xc000096e10, 0xc00024f200, 0x0, 0x0, 0x0, 0xc000010338, 0x0, 0x1, 0x20300000000000)
        /usr/local/go/src/net/http/client.go:176 +0xfa
net/http.(*Client).do(0xc000096e10, 0xc00024f200, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/client.go:692 +0x3c8
net/http.(*Client).Do(...)
        /usr/local/go/src/net/http/client.go:560
command-line-arguments.TestMaxConns.func2(0xc00040ca70, 0xc00009e2f8, 0xc00040e700, 0xc000590468, 0xc000086300)
        /tmp/repro_test.go:69 +0x1fb
created by command-line-arguments.TestMaxConns
        /tmp/repro_test.go:63 +0x451
FAIL    command-line-arguments  1.250s
FAIL

And for context, the passing results with Go 1.11 and Go 1.12:

$ go1.11 test -v ./repro_test.go
=== RUN   TestMaxConns
2019/10/16 14:27:50 http: TLS handshake error from 127.0.0.1:50307: remote error: tls: bad certificate
--- PASS: TestMaxConns (1.15s)
    repro_test.go:82: starts=155 finishes=0
    repro_test.go:82: starts=250 finishes=0
    repro_test.go:82: starts=250 finishes=0
    repro_test.go:82: starts=250 finishes=0
    repro_test.go:82: starts=250 finishes=0
    repro_test.go:82: starts=300 finishes=300
    repro_test.go:82: starts=300 finishes=300
    repro_test.go:82: starts=300 finishes=300
    repro_test.go:82: starts=300 finishes=300
    repro_test.go:82: starts=300 finishes=300
PASS
ok      command-line-arguments  2.281s
$ go1.12 test -v ./repro_test.go
=== RUN   TestMaxConns
2019/10/16 14:27:35 http: TLS handshake error from 127.0.0.1:49994: remote error: tls: bad certificate
--- PASS: TestMaxConns (1.17s)
    repro_test.go:82: starts=84 finishes=0
    repro_test.go:82: starts=174 finishes=0
    repro_test.go:82: starts=263 finishes=0
    repro_test.go:82: starts=300 finishes=0
    repro_test.go:82: starts=300 finishes=0
    repro_test.go:82: starts=300 finishes=300
    repro_test.go:82: starts=300 finishes=300
    repro_test.go:82: starts=300 finishes=300
    repro_test.go:82: starts=300 finishes=300
    repro_test.go:82: starts=300 finishes=300
PASS
ok      command-line-arguments  3.505s

package repro

import (
	"context"
	"crypto/tls"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"testing"
	"time"
)

func TestMaxConns(t *testing.T) {
	totalRequests := 300

	allow := make(chan struct{})
	var (
		starts   int64
		finishes int64
	)
	h := func(w http.ResponseWriter, r *http.Request) {
		if !r.ProtoAtLeast(2, 0) {
			t.Errorf("Request is not http/2: %q", r.Proto)
			return
		}
		atomic.AddInt64(&starts, 1)
		<-allow
	}

	s := httptest.NewUnstartedServer(http.HandlerFunc(h))
	s.TLS = &tls.Config{
		NextProtos: []string{"h2"},
	}
	s.StartTLS()
	defer s.Close()

	transport := s.Client().Transport.(*http.Transport)
	clientConfig := transport.TLSClientConfig
	transport.TLSClientConfig = nil

	// make a request to trigger HTTP/2 autoconfiguration
	resp, err := s.Client().Get(s.URL)
	if err == nil {
		resp.Body.Close()
	}
	// now allow the client to connect to the ad-hoc test server
	transport.TLSClientConfig.RootCAs = clientConfig.RootCAs

	transport.MaxConnsPerHost = 1

	ctx := context.Background()
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	var wg sync.WaitGroup
	for i := 0; i < totalRequests; i++ {
		req, err := http.NewRequest("GET", s.URL, nil)
		if err != nil {
			t.Fatalf("NewRequest: %s", err)
		}
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer atomic.AddInt64(&finishes, 1)
			ctx, cancel := context.WithCancel(ctx)
			defer cancel()
			req = req.WithContext(ctx)
			resp, err := s.Client().Do(req)
			if err != nil {
				return
			}
			resp.Body.Close()
		}()
	}

	for i := 0; i < 10; i++ {
		if i == 5 {
			close(allow)
		}
		time.Sleep(100 * time.Millisecond)
		t.Logf("starts=%d finishes=%d", atomic.LoadInt64(&starts), atomic.LoadInt64(&finishes))
	}

	if have, want := atomic.LoadInt64(&starts), int64(totalRequests); have != want {
		t.Errorf("HTTP/2 requests started: %d != %d", have, want)
	}
	if have, want := atomic.LoadInt64(&finishes), int64(totalRequests); have != want {
		t.Errorf("HTTP/2 requests completed: %d != %d", have, want)
	}
}
@julieqiu (Member):

/cc @bradfitz

@julieqiu julieqiu added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 18, 2019
@bradfitz (Contributor):
@fraenkel, another MaxConnsPerHost issue. (sorry)

(The other is #34978).

@bradfitz bradfitz added this to the Go1.14 milestone Oct 18, 2019
@fraenkel (Contributor):

As written, the test case has a data race. I changed it slightly:

  1. fixed the data race (transport.MaxConnsPerHost)
  2. used ForceAttemptHTTP2 to simplify the setup
  3. counted only successful finishes

The failure still occurs, though I can get a few successful runs, so there is clearly some bookkeeping issue.

The behavior is the same on 1.13.3 and tip (46aa835).

package issue34941

import (
	"context"
	"crypto/tls"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"testing"
	"time"
)

func TestMaxConns(t *testing.T) {
	totalRequests := 300

	allow := make(chan struct{})
	var (
		starts   int64
		finishes int64
	)
	h := func(w http.ResponseWriter, r *http.Request) {
		if !r.ProtoAtLeast(2, 0) {
			t.Errorf("Request is not http/2: %q", r.Proto)
			return
		}
		atomic.AddInt64(&starts, 1)
		<-allow
	}

	s := httptest.NewUnstartedServer(http.HandlerFunc(h))
	s.TLS = &tls.Config{
		NextProtos: []string{"h2"},
	}
	s.StartTLS()
	defer s.Close()

	transport := s.Client().Transport.(*http.Transport)
	// clientConfig := transport.TLSClientConfig
	// transport.TLSClientConfig = nil
	transport.MaxConnsPerHost = 1
	transport.ForceAttemptHTTP2 = true

	// make a request to trigger HTTP/2 autoconfiguration
	// resp, err := s.Client().Get(s.URL)
	// if err == nil {
	// 	resp.Body.Close()
	// }
	// now allow the client to connect to the ad-hoc test server
	// transport.TLSClientConfig.RootCAs = clientConfig.RootCAs

	ctx := context.Background()
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	var wg sync.WaitGroup
	for i := 0; i < totalRequests; i++ {
		req, err := http.NewRequest("GET", s.URL, nil)
		if err != nil {
			t.Fatalf("NewRequest: %s", err)
		}
		wg.Add(1)
		go func() {
			defer wg.Done()
			ctx, cancel := context.WithCancel(ctx)
			defer cancel()
			req = req.WithContext(ctx)
			resp, err := s.Client().Do(req)
			if err != nil {
				return
			}
			resp.Body.Close()
			atomic.AddInt64(&finishes, 1)
		}()
	}

	for i := 0; i < 10; i++ {
		if i == 5 {
			close(allow)
		}
		time.Sleep(100 * time.Millisecond)
		t.Logf("starts=%d finishes=%d", atomic.LoadInt64(&starts), atomic.LoadInt64(&finishes))
	}

	if have, want := atomic.LoadInt64(&starts), int64(totalRequests); have != want {
		t.Errorf("HTTP/2 requests started: %d != %d", have, want)
	}
	if have, want := atomic.LoadInt64(&finishes), int64(totalRequests); have != want {
		t.Errorf("HTTP/2 requests completed: %d != %d", have, want)
	}
}

@fraenkel (Contributor):

A fix is coming shortly. We cannot blindly decrement the connection count; we should decrement only when we have actually removed the idle connection.

@fraenkel (Contributor):

Bah. A simple tweak to our existing test for MaxConnsPerHost has uncovered yet another issue: just raise the loop to 300 and tip hits the same problem reported here. My fix resolves that, but now I see:

--- FAIL: TestTransportMaxConnsPerHost (0.06s)
    transport_test.go:658: round 1: too many dials (http2): 2 != 1
    transport_test.go:661: round 1: too many get connections (http2): 2 != 1
    transport_test.go:664: round 1: too many tls handshakes (http2): 2 != 1

@fraenkel (Contributor):

All I have determined at this point is that the http2 side starts sending back http.http2noCachedConnError after some time. The test passes only when that error does not occur, which is not very often.

@gopherbot:
Change https://golang.org/cl/202087 mentions this issue: net/http: only decrement connection count if we removed a connection

@fraenkel (Contributor):

Part of the issue is the connection coordination between http and http2. With 300 clients, the first 50+ compete to create the connection on the http side before the http2 side is aware of it. There is some glitch (still investigating): once the http2 side becomes aware, it actually causes a second connection to be created.
Changing the test case to first make a single client request greatly reduces the failure rate, but the problem still occurs. It shows that we always reuse the first connection but may dial and TLS-handshake a second one.
I don't think this will ever be perfect, but I would at least like to better understand what is causing the second connection. We may need to relax the test case slightly.

@fraenkel (Contributor):

The http2 server imposes a limit of 250 concurrent streams. Once we reach that number, which does happen when we stampede with 300 requests, there is a point where a new connection is created.
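That 250 ceiling is the default SETTINGS_MAX_CONCURRENT_STREAMS advertised by the golang.org/x/net/http2 server; a server can raise it explicitly. A configuration-only fragment (requires the golang.org/x/net/http2 module):

```go
srv := &http.Server{Addr: ":8443"}
// Raise the per-connection stream limit above the default of 250.
http2.ConfigureServer(srv, &http2.Server{
	MaxConcurrentStreams: 1000,
})
```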

@mchtech commented Nov 18, 2019

There is still a chance of crashing with "internal error: connCount underflow" on Go 1.13.4 when the client enables HTTP/2. The connCount underflow error did not appear with Go 1.13.3.

@fraenkel (Contributor):

Can you also provide the stack trace, and describe the scenario if it differs from the one reported? A new issue would be best.

@mchtech commented Nov 19, 2019

> (quoting @fraenkel's modified test case from above)

This test still fails on Go 1.13.4. The smaller the value of transport.MaxConnsPerHost, the more likely the test program is to crash with panic: net/http: internal error: connCount underflow.

panic: net/http: internal error: connCount underflow

goroutine 119 [running]:
net/http.(*Transport).decConnsPerHost(0xc0000e2780, 0x0, 0x0, 0xc0000b4780, 0x5, 0xc00020e010, 0xf, 0x0)
	/usr/local/opt/go/libexec/src/net/http/transport.go:1334 +0x604
net/http.(*Transport).roundTrip(0xc0000e2780, 0xc000346a00, 0x0, 0xc000185b08, 0x100e7b8)
	/usr/local/opt/go/libexec/src/net/http/transport.go:546 +0x77f
net/http.(*Transport).RoundTrip(0xc0000e2780, 0xc000346a00, 0xc0000e2780, 0x0, 0x0)
	/usr/local/opt/go/libexec/src/net/http/roundtrip.go:17 +0x35
net/http.send(0xc000346a00, 0x1593fe0, 0xc0000e2780, 0x0, 0x0, 0x0, 0xc0001be2b0, 0xc0001dbcd0, 0x1, 0x0)
	/usr/local/opt/go/libexec/src/net/http/client.go:250 +0x443
net/http.(*Client).send(0xc0000a18f0, 0xc000346a00, 0x0, 0x0, 0x0, 0xc0001be2b0, 0x0, 0x1, 0x10)
	/usr/local/opt/go/libexec/src/net/http/client.go:174 +0xfa
net/http.(*Client).do(0xc0000a18f0, 0xc000346a00, 0x0, 0x0, 0x0)
	/usr/local/opt/go/libexec/src/net/http/client.go:641 +0x3ce
net/http.(*Client).Do(...)
	/usr/local/opt/go/libexec/src/net/http/client.go:509
test.TestMaxConns.func2(0xc0000ab210, 0xc00009db00, 0xc0000a6358, 0xc0000a8480, 0xc0000aa848)
	/Volumes/data/code/go/src/test/main_test.go:68 +0x1ad
created by test.TestMaxConns
	/Volumes/data/code/go/src/test/main_test.go:63 +0x3e4

@fraenkel (Contributor):

The fix only went into 1.14. The powers that be would have to decide whether it gets backported.

@creker commented Jan 15, 2020

Why isn't this backported? Maybe we need to ping someone? It's clearly a bug and a regression in 1.13. Go 1.14 should be released fairly soon, but that's a major release with tons of changes (to the runtime in particular) that may come with its own bag of problems.

@networkimprov:
cc @toothrot @dmitshur re backport

@networkimprov:
@gopherbot please backport to 1.13

This is a regression, as documented in the issue comments.

@gopherbot:
Backport issue(s) opened: #36583 (for 1.13).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@michaeldorner commented Jan 19, 2020

I fear that this issue is not resolved completely:

In Go version go1.13.6 darwin/amd64 I still get

panic: net/http: internal error: connCount underflow

goroutine 247 [running]:
net/http.(*Transport).decConnsPerHost(0xc0000963c0, 0x0, 0x0, 0xc00038caf0, 0x5, 0xc0000f9660, 0x11, 0x0)
	/usr/local/go/src/net/http/transport.go:1334 +0x604
net/http.(*Transport).roundTrip(0xc0000963c0, 0xc000312e00, 0xc000374600, 0xc0001e040c, 0xc0001e0450)
	/usr/local/go/src/net/http/transport.go:546 +0x77f
net/http.(*Transport).RoundTrip(0xc0000963c0, 0xc000312e00, 0xc0000963c0, 0xbf8140581cdfd498, 0xe2c22493c)
	/usr/local/go/src/net/http/roundtrip.go:17 +0x35
net/http.send(0xc000312c00, 0x1356060, 0xc0000963c0, 0xbf8140581cdfd498, 0xe2c22493c, 0x15254e0, 0xc0000fc1f8, 0xbf8140581cdfd498, 0x1, 0x0)
	/usr/local/go/src/net/http/client.go:250 +0x443
net/http.(*Client).send(0xc000082c00, 0xc000312c00, 0xbf8140581cdfd498, 0xe2c22493c, 0x15254e0, 0xc0000fc1f8, 0x0, 0x1, 0x0)
	/usr/local/go/src/net/http/client.go:174 +0xfa
net/http.(*Client).do(0xc000082c00, 0xc000312c00, 0x0, 0x0, 0x0)
	/usr/local/go/src/net/http/client.go:641 +0x3ce
net/http.(*Client).Do(...)
	/usr/local/go/src/net/http/client.go:509
main.main.func2.1(0xc000082c00, 0xc0001c94c0, 0xc000312c00)
	/Users/michaeldorner/Code/Go/src/main.go:80 +0x4d
created by main.main.func2
	/Users/michaeldorner/Code/Go/src/main.go:79 +0x8c
exit status 2

Lines 1332 and 1333 of https://golang.org/src/net/http/transport.go note that this "Shouldn't happen, but if it does, the counting is buggy and could easily lead to a silent deadlock, so report the problem loudly", which I do hereby. :)

Please let me know if I should create a new issue.

@fraenkel (Contributor):

@michaeldorner This has not been backported to 1.13 yet.

@michaeldorner:
@fraenkel Oh sorry, my bad. Thanks for the fast reply.

@andybalholm (Contributor):
I'm still getting internal error: connCount underflow sometimes on 1.14.1.

@fraenkel (Contributor):
@andybalholm can you attach the stack trace?
Is there anything special in your usage?

@andybalholm (Contributor) commented Mar 31, 2020

Here is the stack trace:

panic: net/http: internal error: connCount underflow

goroutine 2114 [running]:
net/http.(*Transport).decConnsPerHost(0x1094ca0, 0x0, 0x0, 0xc001f110b5, 0x4, 0xc001a27320, 0x14, 0x0)
		/usr/local/go/src/net/http/transport.go:1391 +0x590
net/http.(*persistConn).closeLocked(0xc0000cd320, 0xb8a620, 0xc000096b60)
		/usr/local/go/src/net/http/transport.go:2584 +0xf5
net/http.(*persistConn).close(0xc0000cd320, 0xb8a620, 0xc000096b60)
		/usr/local/go/src/net/http/transport.go:2574 +0x83
net/http.(*persistConn).readLoop.func1(0xc0000cd320, 0xc0073b4d60)
		/usr/local/go/src/net/http/transport.go:1946 +0x41
net/http.(*persistConn).readLoop(0xc0000cd320)
		/usr/local/go/src/net/http/transport.go:2122 +0x12ae
created by net/http.(*Transport).dialConn
		/usr/local/go/src/net/http/transport.go:1647 +0xc56

The usage is in a forward proxy server (github.com/andybalholm/redwood). We have about 2000 users running through the server where I have been watching these errors most closely, so the usage is heavy and probably about as close to random as you are likely to find.

I thought that switching to two Transports (one for HTTP/1.1 and one for HTTP/2) instead of one http.Transport that was configured to automatically switch protocols would get rid of the panics. But it didn't seem to help. (So it seems that the panic doesn't depend on having HTTP/2 enabled on the Transport.)

I had Transport.MaxConnsPerHost set to 8. Setting it to 0 (unlimited) made the panics go away.

@fraenkel (Contributor):
@andybalholm Thanks for the stacktrace. Please open a separate issue to track this. It is a different path than the one fixed.
