Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: SetKeepAlivesEnabled(false) closes all HTTP/2 connections older than 5 seconds #36946

Closed
nwidger opened this issue Jan 31, 2020 · 15 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@nwidger
Copy link
Contributor

nwidger commented Jan 31, 2020

What version of Go are you using (go version)?

$ go version 1.13.7
go version go1.13.7 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/niels/.cache/go-build"
GOENV="/home/niels/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/niels/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/niels/sdk/go1.13.7"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/niels/sdk/go1.13.7/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build412221026=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Here is a play.golang.org link for a program which reproduces the issue:

https://play.golang.org/p/7mPbWhl7QeA

it may be necessary to save it to a main.go file and run it manually with

$ go run main.go

Here is an example output from running the program:

2020/01/31 15:04:28 server: listening on 127.0.0.1:8443
2020/01/31 15:04:30 client: request 1: sent
2020/01/31 15:04:30 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state new
2020/01/31 15:04:30 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state active
2020/01/31 15:04:30 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state idle
2020/01/31 15:04:30 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state active
2020/01/31 15:04:33 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state idle
2020/01/31 15:04:33 client: request 1: response: 200: received request 1 over http2
2020/01/31 15:04:33 client: request 2: sent
2020/01/31 15:04:33 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state active
2020/01/31 15:04:36 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state idle
2020/01/31 15:04:36 client: request 2: response: 200: received request 2 over http2
2020/01/31 15:04:36 client: request 3: sent
2020/01/31 15:04:36 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state active
2020/01/31 15:04:39 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state idle
2020/01/31 15:04:39 client: request 3: response: 200: received request 3 over http2
2020/01/31 15:04:39 client: request 4: sent
2020/01/31 15:04:39 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state active
2020/01/31 15:04:39 ========== disabling keep alives ==========
2020/01/31 15:04:39 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state idle
2020/01/31 15:04:39 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state closed
2020/01/31 15:04:39 Get https://127.0.0.1:8443/: unexpected EOF
exit status 1

The program starts up an HTTPS server and has a client send it requests in a loop. Importantly, both the client and server are configured to use HTTP/2 over TLS and the client reuses a single TCP connection for each request. The handler for the HTTPS server blocks for 3 seconds and then sends back its response. On the fourth request, once we can be sure that the client connection has been open for at least 5 seconds, SetKeepAlivesEnabled(false) is called on the server within the handler of the request, before the response has been written out.

What did you expect to see?

After the call to SetKeepAlivesEnabled, I expected the requests sent by the client to continue working as normal. The documentation for SetKeepAlivesEnabled claims only to disable HTTP keep-alives, however by looking at the source code one can see it also apparently calls a method called closeIdleConns. However, because we are calling SetKeepAlivesEnabled within the handler of a request that has not yet been responded to yet, clearly this connection is not idle. The documentation for StateIdle is defined to be:

    // StateIdle represents a connection that has finished
    // handling a request and is in the keep-alive state, waiting
    // for a new request. Connections transition from StateIdle
    // to either StateActive or StateClosed.

What did you see instead?

The connection is closed by SetKeepAlivesEnabled via closeIdleConns and the client request fails, coming back with an unexpected EOF error. I believe this is ultimately due to the fact that the HTTP/2 code keeps its own connection state bookkeeping which is not shared with the HTTP/1 server. Using the http.Server.ConnState hook and some sneaky use of the runtime package, the logs in the example output above show that, according to the HTTP/1 server, the connection never transitions out of the StateNew state while the HTTP/2 code transitions it between StateActive and StateIdle with each new request on the connection.

Add to that this heuristic within closeIdleConns:

		// Issue 22682: treat StateNew connections as if
		// they're idle if we haven't read the first request's
		// header in over 5 seconds.
		if st == StateNew && unixSec < time.Now().Unix()-5 {
			st = StateIdle
		}

and you get the outcome that any HTTP/2 connection that has been open for longer than 5 seconds will be closed by a call to SetKeepAlivesEnabled(false). This heuristic was apparently introduced by #22682.

You can run the example program with both the client and the server configured to use HTTP/1 by passing -http=false:

$ go run main.go -http2=false

which shows that the client requests continue working after the call to SetKeepAlivesEnabled(false). Here is an example output running with HTTP/1:

2020/01/31 15:10:23 server: listening on 127.0.0.1:8443
2020/01/31 15:10:25 client: request 1: sent
2020/01/31 15:10:25 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state new
2020/01/31 15:10:25 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state active
2020/01/31 15:10:28 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state idle
2020/01/31 15:10:28 client: request 1: response: 200: received request 1 over http1
2020/01/31 15:10:28 client: request 2: sent
2020/01/31 15:10:28 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state active
2020/01/31 15:10:31 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state idle
2020/01/31 15:10:31 client: request 2: response: 200: received request 2 over http1
2020/01/31 15:10:31 client: request 3: sent
2020/01/31 15:10:31 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state active
2020/01/31 15:10:34 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state idle
2020/01/31 15:10:34 client: request 3: response: 200: received request 3 over http1
2020/01/31 15:10:34 client: request 4: sent
2020/01/31 15:10:34 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state active
2020/01/31 15:10:34 ========== disabling keep alives ==========
2020/01/31 15:10:34 ==========  disabled keep alives ==========
2020/01/31 15:10:37 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state closed
2020/01/31 15:10:37 client: request 4: response: 200: received request 4 over http1
2020/01/31 15:10:37 client: request 5: sent
2020/01/31 15:10:37 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46960: state new
2020/01/31 15:10:37 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46960: state active
2020/01/31 15:10:40 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46960: state closed
2020/01/31 15:10:40 client: request 5: response: 200: received request 5 over http1
2020/01/31 15:10:40 client: request 6: sent
2020/01/31 15:10:40 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46962: state new
2020/01/31 15:10:40 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46962: state active
2020/01/31 15:10:43 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46962: state closed
2020/01/31 15:10:43 client: request 6: response: 200: received request 6 over http1
2020/01/31 15:10:43 client: request 7: sent
2020/01/31 15:10:43 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46966: state new
2020/01/31 15:10:43 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46966: state active
2020/01/31 15:10:46 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46966: state closed
2020/01/31 15:10:46 client: request 7: response: 200: received request 7 over http1
2020/01/31 15:10:46 client: request 8: sent
2020/01/31 15:10:46 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46970: state new
2020/01/31 15:10:46 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46970: state active
2020/01/31 15:10:49 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46970: state closed
2020/01/31 15:10:49 client: request 8: response: 200: received request 8 over http1

Please let me know if I can provide any additional information about the issue.

@fraenkel
Copy link
Contributor

fraenkel commented Feb 4, 2020

@bradfitz It seems the connection should be transitioned to StateHijacked. I haven't checked but I believe we are just collecting http/2 connections.

@nwidger
Copy link
Contributor Author

nwidger commented Feb 5, 2020

@fraenkel @bradfitz It looks like http.Server's graceful shutdown method Shutdown is also affected since it too calls closeIdleConns. Here's lightly modified version of the previous program which calls Shutdown instead of SetKeepAlivesEnabled within the handler:

https://play.golang.org/p/UtgaaHEJjYw

Here is example output running with HTTP/2, you can see that request 4 comes back with an unexpected EOF error since the connection is closed before a response is written:

2020/02/05 15:12:33 server: listening on 127.0.0.1:8443
2020/02/05 15:12:35 client: request 1: sent
2020/02/05 15:12:35 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state new
2020/02/05 15:12:35 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state active
2020/02/05 15:12:35 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state idle
2020/02/05 15:12:35 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state active
2020/02/05 15:12:38 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state idle
2020/02/05 15:12:38 client: request 1: response: 200: received request 1 over http2
2020/02/05 15:12:38 client: request 2: sent
2020/02/05 15:12:38 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state active
2020/02/05 15:12:41 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state idle
2020/02/05 15:12:41 client: request 2: response: 200: received request 2 over http2
2020/02/05 15:12:41 client: request 3: sent
2020/02/05 15:12:41 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state active
2020/02/05 15:12:44 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state idle
2020/02/05 15:12:44 client: request 3: response: 200: received request 3 over http2
2020/02/05 15:12:44 client: request 4: sent
2020/02/05 15:12:44 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state active
2020/02/05 15:12:44 ========== calling shutdown ==========
2020/02/05 15:12:44 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state idle
2020/02/05 15:12:44 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state closed
2020/02/05 15:12:44 client: request 4: error: Get https://127.0.0.1:8443/: unexpected EOF
2020/02/05 15:12:44 ==========  called shutdown ==========
2020/02/05 15:12:44 servertls error: http: Server closed
2020/02/05 15:12:44 server: no longer serving, waiting 5s seconds before exiting...
2020/02/05 15:12:49 done.

and with -http2=false to show the behavior with HTTP/1:

2020/02/05 15:11:27 server: listening on 127.0.0.1:8443
2020/02/05 15:11:29 client: request 1: sent
2020/02/05 15:11:29 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37814: state new
2020/02/05 15:11:29 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37814: state active
2020/02/05 15:11:32 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37814: state idle
2020/02/05 15:11:32 client: request 1: response: 200: received request 1 over http1
2020/02/05 15:11:32 client: request 2: sent
2020/02/05 15:11:32 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37814: state active
2020/02/05 15:11:35 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37814: state idle
2020/02/05 15:11:35 client: request 2: response: 200: received request 2 over http1
2020/02/05 15:11:35 client: request 3: sent
2020/02/05 15:11:35 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37814: state active
2020/02/05 15:11:38 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37814: state idle
2020/02/05 15:11:38 client: request 3: response: 200: received request 3 over http1
2020/02/05 15:11:38 client: request 4: sent
2020/02/05 15:11:38 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37814: state active
2020/02/05 15:11:38 ========== calling shutdown ==========
2020/02/05 15:11:38 servertls error: http: Server closed
2020/02/05 15:11:38 server: no longer serving, waiting 5s seconds before exiting...
2020/02/05 15:11:43 done.

@fraenkel
Copy link
Contributor

fraenkel commented Feb 6, 2020

@nwidger Its all related on how connection management works and when idle connections are closed.
There is an outstanding issue #26303, which I need more details on since its unclear to me why http2 is being managed this way.

@nwidger
Copy link
Contributor Author

nwidger commented Feb 6, 2020

@fraenkel Agreed, I too am very curious to learn more.

@cagedmantis cagedmantis added this to the Backlog milestone Feb 7, 2020
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 7, 2020
@cagedmantis
Copy link
Contributor

/cc @bradfitz @tombergan

@nwidger
Copy link
Contributor Author

nwidger commented Feb 13, 2020

@fraenkel Is there anything I can do to help make progress? I am very interested in addressing this issue as it is causing us upgrade issues in production. Right now my only fix is to disable HTTP/2.

@fraenkel
Copy link
Contributor

Not at this time.

@fraenkel
Copy link
Contributor

This should be fixed by #39776

@networkimprov
Copy link

In that case, should the CL reference this issue?

@nwidger
Copy link
Contributor Author

nwidger commented Jun 29, 2020

@fraenkel Thank you so much for looking into this issue and taking a stab at fixing it! I really appreciate it. I'm not sure the changes in https://golang.org/cl/240278 completely address the issue, however. I just reran my two reproduction programs from above using the CL. The first one (https://play.golang.org/p/7mPbWhl7QeA) is fixed but I think the second one (https://play.golang.org/p/UtgaaHEJjYw) still has issues:

$ bin/go version
go version devel +6f6d4046eb Sat Jun 27 13:31:34 2020 -0600 darwin/amd64
$ bin/go run main2.go
2020/06/29 09:42:16 server: listening on 127.0.0.1:8443
2020/06/29 09:42:18 client: request 1: sent
2020/06/29 09:42:18 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:52935: state new
2020/06/29 09:42:18 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:52935: state active
2020/06/29 09:42:18 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:52935: state idle
2020/06/29 09:42:18 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:52935: state active
2020/06/29 09:42:21 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:52935: state idle
2020/06/29 09:42:21 client: request 1: response: 200: received request 1 over http2
2020/06/29 09:42:21 client: request 2: sent
2020/06/29 09:42:21 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:52935: state active
2020/06/29 09:42:24 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:52935: state idle
2020/06/29 09:42:24 client: request 2: response: 200: received request 2 over http2
2020/06/29 09:42:24 client: request 3: sent
2020/06/29 09:42:24 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:52935: state active
2020/06/29 09:42:27 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:52935: state idle
2020/06/29 09:42:27 client: request 3: response: 200: received request 3 over http2
2020/06/29 09:42:27 client: request 4: sent
2020/06/29 09:42:27 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:52935: state active
2020/06/29 09:42:27 ========== calling shutdown ==========
2020/06/29 09:42:27 servertls error: http: Server closed
2020/06/29 09:42:27 server: no longer serving, waiting 5s seconds before exiting...
2020/06/29 09:42:32 done.

The client never gets a response for request 4 despite the fact that Shutdown is called within the response handler for the request. Maybe your CL was not meant to address this issue, or perhaps I'm mistaken in some way?

@fraenkel
Copy link
Contributor

Please use GODEBUG=http2debug=2. Depending on what LastStreamId is sent in the GoAway will decide whether its correct or not.
I would also recommend not piling a set of problems in one issue because it becomes hard to track the issues at hand.

@nwidger
Copy link
Contributor Author

nwidger commented Jun 29, 2020

@fraenkel Sure I can rerun with GODEBUG=httpdebug=2. Would you prefer that I add a comment with the output to #39776 or should I open a new issue?

@fraenkel
Copy link
Contributor

New issue please.

@nwidger
Copy link
Contributor Author

nwidger commented Jun 29, 2020

@fraenkel Done, #39919. Let me know if I can add any more information to the issue.

@gopherbot
Copy link

Change https://golang.org/cl/240278 mentions this issue: net/http: mark http/2 connections active

gopherbot pushed a commit that referenced this issue Sep 6, 2020
On Server.Shutdown, all idle connections are closed.
A caveat for new connections is that they are marked idle
after 5 seconds.
Previously new HTTP/2 connections were marked New, and after 5 seconds,
they would then become idle. With this change, we now mark HTTP/2
connections as Active to allow the proper shutdown sequence to occur.

Fixes #36946
Fixes #39776

Change-Id: I31efbf64b9a2850ca544da797f86d7e1b3378e8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/240278
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@golang golang locked and limited conversation to collaborators Jun 30, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants