x/net/http2: investigate Server speed regressions from more select cases #20302

Closed
bradfitz opened this issue May 9, 2017 · 23 comments

@bradfitz
Contributor

bradfitz commented May 9, 2017

Go bug for tracking kubernetes/kubernetes#45216

It appears that the new select cases in http2.serverConn.serve slowed things down.

But tip (~Go 1.9) has had a number of select improvements since Go 1.8, so the regression may not show up as much there. See the comments and numbers in the linked bug.

/cc @tombergan

@bradfitz bradfitz added this to the Go1.9 milestone May 9, 2017
@bradfitz bradfitz self-assigned this May 9, 2017
@gopherbot

CL https://golang.org/cl/43034 mentions this issue.

@tombergan
Contributor

From the profile linked in kubernetes/kubernetes#45216, this looks like lock contention.

runtime.selectgoImpl: 13.06s of 451.69s
runtime.selunlock: 0.58s of 179.29s
runtime.sellock: 1.74s of 251.83s

Compare this to the profile from Go 1.7:

runtime.selectgoImpl: 4.65s of 12.69s
runtime.selunlock: doesn't register
runtime.sellock: doesn't register
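
For reference, a comparable CPU profile can be collected from the package's own benchmarks with the standard go test / pprof tooling (the benchmark pattern and test binary name below are just the usual defaults, not what the Kubernetes team used):

$ go test -bench=Gets -cpuprofile=cpu.out
$ go tool pprof http2.test cpu.out
(pprof) top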

@josharian
Contributor

Pity--looks like mutex profiling doesn't include futex. I wonder how to detect which case is contended; if it is in fact lock contention, knowing which lock it is should give reproduction hints.

@bradfitz
Contributor Author

bradfitz commented May 9, 2017

/cc @mdempsky @randall77 @aclements for any runtime help.

This is a pretty serious speed regression for Kubernetes, but one we can't yet repro in a stand-alone benchmark. Any clues?

@josharian
Contributor

@bradfitz it looks like BenchmarkGets has no concurrency (at least according to the trace viewer), so the select statement isn't under any contention. So if the problem is lock contention, it won't show up. Maybe b.RunParallel to the rescue?

@josharian
Contributor

(I tried hacking it in myself but broke all sorts of things--obviously requires more domain expertise than I have to get it right.)

@tombergan
Contributor

Here is a version of BenchmarkServerGets that adds concurrency. I cannot reproduce the bug though. The benchmark has the same runtime before and after commenting out the select cases listed in this comment.

I wonder if there's something about their testbed setup that makes really long scheduling delays possible? Or maybe they're running in a VM that has a weird implementation of sys_futex?

@josharian
Contributor

@tombergan thanks. That benchmark patch doesn't apply cleanly for me (mail a CL next time?), but I applied it manually, and got:

BenchmarkServerGets-8   	--- FAIL: BenchmarkServerGets-8
	server_test.go:2989: Error: timeout waiting for frame
	server_test.go:235: Framer read log:
		2017-05-10 07:25:51.540231465 Framer 0xc4205082a0: read SETTINGS len=24, settings: MAX_FRAME_SIZE=1048576, MAX_CONCURRENT_STREAMS=101, MAX_HEADER_LIST_SIZE=1048896, INITIAL_WINDOW_SIZE=1048576
		2017-05-10 07:25:51.54027134 Framer 0xc4205082a0: read WINDOW_UPDATE len=4 (conn) incr=983041
		2017-05-10 07:25:51.540286228 Framer 0xc4205082a0: read SETTINGS flags=ACK len=0
		2017-05-10 07:25:51.540439446 Framer 0xc4205082a0: read HEADERS flags=END_HEADERS stream=1 len=49
		2017-05-10 07:25:51.540454635 Framer 0xc4205082a0: read DATA flags=END_STREAM stream=1 len=12 data="Hello, world"

and lots more errors.

Anyway, from visually inspecting the patch, it does set up a concurrent get, but not N concurrent gets, so it doesn't appear that it would really introduce contention. You want something (waves hands) like:

setUpServer()

b.RunParallel(func(pb *testing.PB) {
    for pb.Next() {
        get()
    }
})
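
Fleshed out, a benchmark of that shape might look like the sketch below. It uses a stock httptest server and http.Get as stand-ins for the http2 serverTester plumbing, so it only demonstrates the b.RunParallel structure, not the real BenchmarkServerGets:

package bench_test

import (
    "io"
    "net/http"
    "net/http/httptest"
    "testing"
)

// Sketch only: the b.RunParallel shape suggested above.
func BenchmarkParallelGets(b *testing.B) {
    ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        io.WriteString(w, "Hello, world")
    }))
    defer ts.Close()

    b.ResetTimer()
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            res, err := http.Get(ts.URL)
            if err != nil {
                b.Error(err)
                return
            }
            io.Copy(io.Discard, res.Body) // drain so the connection can be reused (io.Discard needs Go 1.16+)
            res.Body.Close()
        }
    })
}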

You can check for full concurrency using runtime tracing. Be sure to use a short benchtime to avoid overwhelming amounts of data.

$ go test -bench=Gets -benchtime=50ms -trace=trace.out
$ go tool trace trace.out

You should see all procs active the whole time, not just one or two. If it is fully parallel, and there are contention effects, you should also be able to see the effect when you mess with -cpu:

$ go test -bench=Gets -cpu=1,2,4,8,16,32,64,128

@tombergan
Contributor

tombergan commented May 10, 2017

Sorry the patch didn't apply cleanly, not sure why that happened.

HTTP/2 frames must be serialized onto a TCP connection. Whether or not the "gets" run concurrently on the client side is not relevant to server-side concurrency. Previously, the test forced serial execution on the server because it waited for a response immediately after sending each request. My patched test reads the responses in a background goroutine so the server can respond concurrently with the client sending new requests. The client still needs to send requests serially (the st.writeHeaders call) because, again, HTTP/2 frames are serialized.
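
Schematically, the patched client side has this shape (a sketch only; sendRequest and readResponse are hypothetical stand-ins for the serverTester frame helpers, not real API):

package sketch

import "testing"

// Requests are written serially, since HTTP/2 frames share one TCP
// connection, while responses are drained on a background goroutine so
// the server's handlers can overlap.
func runSerialRequestsConcurrentReads(b *testing.B, sendRequest func(streamID uint32), readResponse func()) {
    done := make(chan struct{})
    go func() {
        defer close(done)
        for i := 0; i < b.N; i++ {
            readResponse() // e.g. wait for HEADERS, then DATA with END_STREAM
        }
    }()
    for i := 0; i < b.N; i++ {
        sendRequest(uint32(2*i + 1)) // client-initiated streams use odd IDs
    }
    <-done
}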

The server will serialize response frames, but each HTTP request handler should run concurrently. That serialization is managed by the serverConn.serve loop, which is the method containing the large select statement. That select waits on new frames from the client, new write requests from HTTP handlers, completed net.Conn.Write calls (which happen in a background goroutine), timers, and other small things. Assuming lock contention is the problem, the contention is likely on the serverConn's wantWriteFrameCh, since that channel is written to by all request handler goroutines, while the other channels have only one writer.
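
The suspected mechanism is easy to demonstrate in isolation: selectgo locks every channel a select mentions, so a channel shared by thousands of goroutines becomes a lock hot spot even if its case never fires. A minimal, self-contained illustration (not the http2 code; the counts are arbitrary, with 5000 chosen to echo kubemark-5000):

package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// Each goroutine selects in a tight loop on its own always-ready channel
// plus one shared channel that never fires. Every select still locks the
// shared channel, so its lock is contended by all goroutines. Remove the
// "case <-shared" line and compare the elapsed time to see the difference.
func main() {
    const (
        workers    = 5000
        iterations = 2000
    )
    shared := make(chan struct{}) // stands in for a channel like gracefulShutdownCh
    var wg sync.WaitGroup

    start := time.Now()
    for w := 0; w < workers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            local := make(chan int, 1)
            local <- 0 // keep the local case always ready
            for i := 0; i < iterations; i++ {
                select {
                case v := <-local:
                    local <- v + 1
                case <-shared: // never ready, but still locked on every select
                    return
                }
            }
        }()
    }
    wg.Wait()
    fmt.Printf("GOMAXPROCS=%d elapsed=%v\n", runtime.GOMAXPROCS(0), time.Since(start))
}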

Thanks for the tips about runtime tracing. Hopefully I have time later today to try that out.

@josharian
Contributor

Cool. Let me know if you have any problems with tracing; happy to help. Feel free to also ping me on some form of chat if interactive discussion would be helpful. Brad can give you my contact info.

@bradfitz
Contributor Author

@tombergan, yes, that's the sort of benchmark I was imagining: one connection, bunch of concurrent requests streaming from long-running handlers doing Write+Flush for b.N iterations or something. That sounds like what they're hitting with their "Watch" APIs.
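
A handler of that kind would look roughly like the sketch below (generic, not the benchmark that was eventually written):

package sketch

import (
    "fmt"
    "net/http"
)

// Long-running streaming handler: every iteration does a Write followed by
// a Flush, so over HTTP/2 each iteration has to go through the
// serverConn.serve loop to get its DATA frame scheduled.
func streamingHandler(iterations int) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        flusher, ok := w.(http.Flusher)
        if !ok {
            http.Error(w, "streaming unsupported", http.StatusInternalServerError)
            return
        }
        for i := 0; i < iterations; i++ {
            fmt.Fprintf(w, "event %d\n", i)
            flusher.Flush()
        }
    })
}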

@gopherbot

CL https://golang.org/cl/43230 mentions this issue.

@tombergan
Contributor

For the diagnosis, see this comment onwards.

@josharian
Contributor

Depending on the fix, this might be for Go 1.8.2, since it causes serious problems and there isn't a clear workaround. Marking as Go 1.8.2, with the expectation that others may disagree and move it back.

@josharian josharian modified the milestones: Go1.8.2, Go1.9 May 11, 2017
@gopherbot

CL https://golang.org/cl/43455 mentions this issue.

@bradfitz
Contributor Author

I filed a more general bug at #20351

gopherbot pushed a commit to golang/net that referenced this issue May 13, 2017
gracefulShutdownCh is shared by all connections in a server. When a
server accumulates many connections (e.g., 5000 in the kubemark-5000
benchmark), we have 5000 serverConn.serve goroutines selecting on this
channel. This means 5000 goroutines hammer the channel's lock, which
causes severe lock contention.

The fix in this CL is to make a local proxy for gracefulShutdownCh in
each connection so that each connection selects on gracefulShutdownCh
at most once per connection rather than once per serverConn.serve loop
iteration.

This fix is intended to be backported quickly into Go 1.8.2. The
downside of this fix is 2KB extra stack usage per connection. A better
fix will be implemented in Go 1.9.

Unfortunately, I have been unable to reproduce this problem locally.
This fix was verified by the kubernetes team. See:
kubernetes/kubernetes#45216 (comment)

Updates golang/go#20302

Change-Id: I19ab19268a6ccab9b6e9dffa0cfbc89b8c7d0f19
Reviewed-on: https://go-review.googlesource.com/43455
Run-TryBot: Tom Bergan <tombergan@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
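
The shape of that fix, reduced to its essentials (a simplified sketch, not the actual x/net/http2 code):

package sketch

// awaitShutdown is a simplified version of the "local proxy channel"
// pattern described above: one extra goroutine per connection (the ~2KB
// stack cost mentioned in the commit) waits on the shared shutdown
// channel once and forwards the signal to a per-connection channel.
func awaitShutdown(shared, connClosed <-chan struct{}) <-chan struct{} {
    local := make(chan struct{})
    go func() {
        select {
        case <-shared:
            close(local)
        case <-connClosed: // connection went away first; stop waiting
        }
    }()
    return local
}

The hot serve loop then selects on the returned per-connection channel instead of the shared gracefulShutdownCh, so thousands of connections no longer touch the shared channel's lock on every loop iteration.
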
@gopherbot

CL https://golang.org/cl/43459 mentions this issue.

gopherbot pushed a commit to golang/net that referenced this issue May 15, 2017
This drops the number of select cases in serverConn.serve from 10 to
6. It was 7 in Go 1.7 and increased to 10 in Go 1.8.

* replace testing-only testHookCh with always-on grab-bag serveMsgCh
  to be used for both test purposes, and infrequently used message
  types

* remove the settingsTimer.C case for the initial settings timer
  and use serveMsgCh and time.AfterFunc instead

* ... and do the same for the idle timeout timer.

* ... and for the shutdown timer.

* remove wantStartPushCh and just send the *startPushRequest to
  serveMsgCh too

I could go further with this (and plan to, later), but these are the
safe and easy ones that don't require more work elsewhere.

The speedup grows with how often the request/response path goes through
the serverConn.serve loop (once per Request.Body.Read or
ResponseWriter.Flush):

  name            old time/op    new time/op  delta
  ServerGets-4    138µs ± 3%     134µs ± 2%   -2.54%   (p=0.002 n=10+10)
  ServerPosts-4   176µs ±27%     154µs ± 2%   -12.62%  (p=0.011 n=10+10)

Updates kubernetes/kubernetes#45216
Updates golang/go#20302

Change-Id: I18019554089d7e3d76355d7137b5957e9597e803
Reviewed-on: https://go-review.googlesource.com/43034
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Tom Bergan <tombergan@google.com>
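
The timer-related part of that change follows a simple pattern, sketched below (simplified, not the actual x/net/http2 code): instead of a dedicated select case per timer channel, fire the timer with time.AfterFunc and funnel the result through the one grab-bag serveMsgCh the loop already selects on. The timeout constant here is illustrative only.

package sketch

import "time"

const firstSettingsTimeout = 2 * time.Second // illustrative value

type settingsTimerFired struct{}

type conn struct {
    serveMsgCh  chan interface{} // grab-bag channel for infrequent messages
    doneServing chan struct{}
}

func (c *conn) sendServeMsg(msg interface{}) {
    select {
    case c.serveMsgCh <- msg:
    case <-c.doneServing: // serve loop already exited; don't block the timer goroutine
    }
}

func (c *conn) serve() {
    defer close(c.doneServing)

    // One time.AfterFunc per timer instead of one select case per timer.
    settingsTimer := time.AfterFunc(firstSettingsTimeout, func() {
        c.sendServeMsg(settingsTimerFired{})
    })
    defer settingsTimer.Stop()

    for {
        select {
        case msg := <-c.serveMsgCh:
            if _, ok := msg.(settingsTimerFired); ok {
                return // e.g. the client never completed the SETTINGS handshake
            }
            // ... handle other infrequent message types here ...
        // ... plus, in the real loop, the hot cases: frames read, frames to write, etc.
        }
    }
}
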
@bradfitz bradfitz modified the milestones: Go1.8.2, Go1.8.3 May 18, 2017
@bradfitz bradfitz removed this from the Go1.8.2 milestone May 18, 2017
gopherbot pushed a commit to golang/net that referenced this issue May 23, 2017
…cefulShutdownCh

(Same commit message as the gracefulShutdownCh fix above.)
(cherry picked from commit d3ede01)
Reviewed-on: https://go-review.googlesource.com/43459
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Chris Broadfoot <cbro@golang.org>
@gopherbot

CL https://golang.org/cl/43996 mentions this issue.

gopherbot pushed a commit that referenced this issue May 23, 2017
…tdownCh lock contention slowdown

This updates the bundled x/net/http2 repo to git rev 186fd3fc (from
the net repo's release-branch.go1.8) for:

    [release-branch.go1.8] http2: fix lock contention slowdown due to gracefulShutdownCh
    https://golang.org/cl/43459

Fixes #20302

Change-Id: Ia01a44c6749292de9c16ca330bdebe1e52458b18
Reviewed-on: https://go-review.googlesource.com/43996
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Chris Broadfoot <cbro@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
gopherbot pushed a commit that referenced this issue May 23, 2017
This will be used to allow http2 servers to register a shutdown function
so that net/http.Server.Shutdown will work when the http2 server is
configured via a manual call to http2.ConfigureServer. Currently, Shutdown
only works when the http2 server is configured automatically by the
net/http package.

Updates #20302
Updates #18471

Change-Id: Ifc2b5f3126126a106b49ea4a7e999279852b9cc9
Reviewed-on: https://go-review.googlesource.com/44003
Run-TryBot: Tom Bergan <tombergan@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
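
For context, the manual-configuration path this enables looks roughly like the sketch below (cert.pem/key.pem and the timings are placeholders; error handling is trimmed):

package main

import (
    "context"
    "log"
    "net/http"
    "time"

    "golang.org/x/net/http2"
)

// Sketch of the manual http2 setup described above: with the shutdown hook
// registered, net/http.Server.Shutdown also drains the HTTP/2 side.
func main() {
    srv := &http.Server{Addr: ":8443", Handler: http.DefaultServeMux}
    if err := http2.ConfigureServer(srv, &http2.Server{}); err != nil {
        log.Fatal(err)
    }

    go func() {
        if err := srv.ListenAndServeTLS("cert.pem", "key.pem"); err != http.ErrServerClosed {
            log.Fatal(err)
        }
    }()

    time.Sleep(5 * time.Second) // stand-in for waiting on a real shutdown signal

    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    if err := srv.Shutdown(ctx); err != nil {
        log.Fatal(err)
    }
}
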
@tombergan
Contributor

Reopening for the Go 1.9 fixes.

@tombergan tombergan reopened this May 23, 2017
@gopherbot

CL https://golang.org/cl/44003 mentions this issue.

@tombergan tombergan modified the milestones: Go1.9, Go1.8.3 May 23, 2017
gopherbot pushed a commit to golang/net that referenced this issue May 24, 2017
This is a better fix than https://golang.org/cl/43455. Instead of
creating a separate goroutine to wait for the global shutdown channel,
we reuse the new serveMsgCh, which was added in a prior CL.

We also use the new net/http.Server.RegisterOnShutdown method to
register a shutdown callback for each http2.Server.

Updates golang/go#20302
Updates golang/go#18471

Change-Id: Icf29d5e4f65b3779d1fb4ea92924e4fb6bdadb2a
Reviewed-on: https://go-review.googlesource.com/43230
Run-TryBot: Tom Bergan <tombergan@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@tombergan
Contributor

Still TODO is to bundle https://golang.org/cl/44003 into net/http.

@gopherbot

CL https://golang.org/cl/44006 mentions this issue.

@bradfitz
Contributor Author

@tombergan, I'm going to switch this back to Go 1.8.3 so it shows up in the GitHub search link from the Go 1.8.3 release notes.

/cc @broady

@bradfitz bradfitz modified the milestones: Go1.8.3, Go1.9 May 24, 2017
@golang golang locked and limited conversation to collaborators May 24, 2018