Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: http2: no cached connection was available #22091

Closed
sjenning opened this issue Sep 29, 2017 · 12 comments
Closed

x/net/http2: http2: no cached connection was available #22091

sjenning opened this issue Sep 29, 2017 · 12 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@sjenning
Copy link

sjenning commented Sep 29, 2017

What version of Go are you using (go version)?

go version go1.8.3 linux/amd64

Does this issue reproduce with the latest release?

Unknown, checking now

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/sjennings/projects/go"
GORACE=""
GOROOT="/home/sjennings/.gvm/gos/go1.8.3"
GOTOOLDIR="/home/sjennings/.gvm/gos/go1.8.3/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build663365804=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

In Kubernetes 1.8, we are seeing $SUBJECT error message on pods that have both a readiness and liveness probe. In this case, the probes are nearly simultaneous (within microseconds). About 2-3 times per hour, there is a timing collision resulting in the failure of one of the two probes with http2: no cached connection was available

I confirmed the disabling http2 for the probe connections causes the issue not to occur.

I also confirmed that disabling one of the probes causes the issue not to occur.

This is not load related as the transport is only used for the probes and there are 2 probes every 5 seconds. It is a timing issue.

What did you expect to see?

The connection succeeds every time

What did you see instead?

http2: no cached connection was available

Additional References

#16582

kubernetes/kubernetes#49740

I'll try to write a tight recreator for this since it is difficult and time consuming to recreate on Kubernetes.

@smarterclayton @derekwaynecarr

@sjenning sjenning changed the title net/http: http2: no cached connection was available x/net/http2: http2: no cached connection was available Sep 29, 2017
@bradfitz
Copy link
Contributor

bradfitz commented Nov 2, 2017

/cc @tombergan

Tom, does this sound familiar? My memory's still fuzzy from my leave.

I see #16582 is marked fixed but later comments (#16582 (comment)) suggest it's still happening on Go 1.9.x.

@bradfitz bradfitz added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 2, 2017
@bradfitz bradfitz added this to the Go1.10 milestone Nov 2, 2017
@tombergan
Copy link
Contributor

@WillSewell
I am still receiving this error on Go 1.9 when performing a large number of concurrent requests (400). Looking at the http2 verbose logs, I can see a large number of http2: no cached connection was available, but only a small % of these bubble up to http.Client.Do.

Do you have a repro you can share? I cannot repro with https://play.golang.org/p/zKthX7Y9RW (from #16582).

I was looking through the source code to see why these were not being caught by the fix in 7a62274. The only other place I can see RoundTripper.RoundTrip being called where http2ErrNoCachedConn is not being handled is on this line: https://github.com/golang/go/blob/master/src/net/http/transport.go#L371. I believe the concrete implementation here is noDialH2RoundTripper since we are configuring the client with https://github.com/golang/net/blob/master/http2/transport.go#L125. So my suspicion is this is where the error is being returned from. Is there a reason there is no retrying performed after a error on https://github.com/golang/go/blob/master/src/net/http/transport.go#L371?

Perhaps I misunderstood... noDialH2RoundTripper cannot return ErrNoCachedConn. It translates that error to ErrSkipAltProtocol. Did you mean s/ErrNoCachedConn/ErrSkipAltProtocol/ ?

According to the comments, there is a known possible race with CloseIdleConnections (https://github.com/golang/net/blob/master/http2/client_conn_pool.go#L221). @WillSewell and @sjenning, can you verify that you're not calling CloseIdleConnections?

@sjenning
Copy link
Author

sjenning commented Nov 2, 2017

We are not calling CloseIdleConnections, at least not at the time the issue occurs.

FYI, we worked around this in kube by using seperate transports for the two probe types such that a timing collision doesn't occur kubernetes/kubernetes#53318

@tombergan
Copy link
Contributor

I wonder if this happens because the transport uses DisableKeepAlives=true?
https://github.com/kubernetes/kubernetes/blob/master/pkg/probe/http/http.go#L41

Looks like yes. I cannot repro with DisableKeepAlives=false, but I can repro with DisableKeepAlives=true. For the record:
https://play.golang.org/p/8-KRho9KPH

My guess: the second request selects the same connection as the first request, but its RoundTrip doesn't execute until after the first request completes. By that point, the connection has gone idle and shut down because keep alives are disabled. This could be a manifestation of https://github.com/golang/net/blob/master/http2/client_conn_pool.go#L221

// TODO: don't close a cc if it was just added to the pool
// milliseconds ago and has never been used. There's currently
// a small race window with the HTTP/1 Transport's integration
// where it can add an idle conn just before using it, and
// somebody else can concurrently call CloseIdleConns and

@sjenning, do you know why you're disabling keep alives? (Wild guess: that flag was added with HTTP/1.1 in mind. Since these are infrequent probes, there's no reason to keep the connection open, so keep alives were disabled.)

@WillSewell, are you also also disabling keep alives?

@sjenning
Copy link
Author

sjenning commented Nov 3, 2017

DisableKeepAlives=true was result of kubernetes/kubernetes#15733 which fixed kubernetes/kubernetes#15643

@WillSewell
Copy link

Do you have a repro you can share? I cannot repro with https://play.golang.org/p/zKthX7Y9RW (from #16582).

I also can't reproduce it when running your example. I'm afraid I don't think I have time right now to boil it down to a minimal test case (I just tried and failed).

According to the comments, there is a known possible race with CloseIdleConnections (https://github.com/golang/net/blob/master/http2/client_conn_pool.go#L221). @WillSewell and @sjenning, can you verify that you're not calling CloseIdleConnections?

I'm not.

@WillSewell, are you also also disabling keep alives?

I'm not, although I do get the error when running your example with keep alives disabled.

@someonegg
Copy link

someonegg commented Dec 1, 2017

If import "golang.org/x/net/http2" directly and use its ConfigureTransport function, the fix [7a62274] will not work.

Just like https://play.golang.org/p/8-KRho9KPH

The err is http2.ErrNoCachedConn instead of http2ErrNoCachedConn.

 func (pc *persistConn) shouldRetryRequest(req *Request, err error) bool {
 	if err == http2ErrNoCachedConn {
 		// Issue 16582: if the user started a bunch of
 		// requests at once, they can all pick the same conn
 		// and violate the server's max concurrent streams.
 		// Instead, match the HTTP/1 behavior for now and dial
 		// again to get a new TCP connection, rather than failing
 		// this request.
 		return true
 	}
 	...
}

@bradfitz bradfitz added the NeedsFix The path to resolution is known, but the work has not been done. label Jan 10, 2018
@gopherbot gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 10, 2018
@bradfitz
Copy link
Contributor

@tombergan, I'll take this.

@gopherbot
Copy link

Change https://golang.org/cl/87297 mentions this issue: http2: add internal function isNoCachedConnError to test for ErrNoCachedConn

@gopherbot
Copy link

Change https://golang.org/cl/87298 mentions this issue: net/http: fix "http2: no cached connection..." error with x/net/http2

@bradfitz
Copy link
Contributor

@someonegg, thanks!

gopherbot pushed a commit to golang/net that referenced this issue Jan 10, 2018
…hedConn

In a given program there may be two separate copies of
ErrNoCachedConn: the h2_bundle.go version in net/http, and the user's
golang.org/x/net/http2 version. We need to be able to detect either
in net/http.

This CL adds a function to report whether an error value represents
that type of error, and then a subsequent CL to net/http will use it
instead of ==.

Updates golang/go#22091

Change-Id: I86f1e20704eee29b8980707b700d7a290107dfd4
Reviewed-on: https://go-review.googlesource.com/87297
Reviewed-by: Tom Bergan <tombergan@google.com>
@xh3b4sd
Copy link

xh3b4sd commented Feb 2, 2018

For my interest, in which go version will this be fixed? How could I find out by myself?

c3mb0 pushed a commit to c3mb0/net that referenced this issue Apr 2, 2018
…hedConn

In a given program there may be two separate copies of
ErrNoCachedConn: the h2_bundle.go version in net/http, and the user's
golang.org/x/net/http2 version. We need to be able to detect either
in net/http.

This CL adds a function to report whether an error value represents
that type of error, and then a subsequent CL to net/http will use it
instead of ==.

Updates golang/go#22091

Change-Id: I86f1e20704eee29b8980707b700d7a290107dfd4
Reviewed-on: https://go-review.googlesource.com/87297
Reviewed-by: Tom Bergan <tombergan@google.com>
@golang golang locked and limited conversation to collaborators Feb 2, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

7 participants