Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: ioutil.ReadAll(req.Body) gets stuck on HTTP/2 #40816

Open
jkowalski opened this issue Aug 15, 2020 · 7 comments
Open

x/net/http2: ioutil.ReadAll(req.Body) gets stuck on HTTP/2 #40816

jkowalski opened this issue Aug 15, 2020 · 7 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@jkowalski
Copy link

What version of Go are you using (go version)?

$ go version
go 1.14.7

Does this issue reproduce with the latest release?

tbd.

What operating system and processor architecture are you using (go env)?

Debian/Buster (server)
Linux Mint (client)

What did you do?

This was reported by users of Kopia (backup tool). I can't personally reproduce this, but two users have reported it independently.

There's a client and server (HTTP+TLS), both compiled using golang 1.14.7. The client sends large blobs (~20 MB) to the server using PUT method on 1-4 parallel goroutines.

Sometimes the server gets stuck handling requests for no apparent reason. We have captured stack traces when this happens, both client and server-side:

(server) kopia/kopia#538 (comment)
(client) kopia/kopia#538 (comment)

In the stack traces you see that the server is stuck on ioutil.ReadAll() when reading from the request body for 3 minutes:

goroutine 23781 [sync.Cond.Wait, 3 minutes]:
runtime.goparkunlock(...)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/runtime/proc.go:310
sync.runtime_notifyListWait(0xc000326528, 0x16)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/runtime/sema.go:513 +0xf8
sync.(*Cond).Wait(0xc000326518)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/sync/cond.go:56 +0x9d
net/http.(*http2pipe).Read(0xc000326510, 0xc005779b50, 0x7e2b0, 0x7e2b0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/h2_bundle.go:3515 +0x8f
net/http.(*http2requestBody).Read(0xc00117e7e0, 0xc005779b50, 0x7e2b0, 0x7e2b0, 0x12, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/h2_bundle.go:5876 +0xa0
bytes.(*Buffer).ReadFrom(0xc0001136c0, 0x7fb0a5e23020, 0xc00117e7e0, 0x7fb0a5e23020, 0x9, 0xf8bc01)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/bytes/buffer.go:204 +0xb1
io/ioutil.readAll(0x7fb0a5e23020, 0xc00117e7e0, 0x200, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/io/ioutil/ioutil.go:36 +0xe3
io/ioutil.ReadAll(...)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/io/ioutil/ioutil.go:45
github.com/kopia/kopia/internal/server.(*Server).handleContentPut(0xc0000cc4d0, 0x12e4380, 0xc00117e8a0, 0xc002a2b900, 0x7fb0a5c80658, 0x1030ea0, 0xf22842676f7cbc01)

The client:

goroutine 66172 [select]:
net/http.(*http2ClientConn).roundTrip(0xc000187980, 0xc004356700, 0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/h2_bundle.go:7596 +0x977
net/http.(*http2Transport).RoundTripOpt(0xc00069cb40, 0xc004356700, 0xf8ba00, 0xc0002d3f50, 0xc00432d740, 0x5)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/h2_bundle.go:6948 +0x16f
net/http.(*http2Transport).RoundTrip(...)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/h2_bundle.go:6909
net/http.http2noDialH2RoundTripper.RoundTrip(0xc00069cb40, 0xc004356700, 0x12cb520, 0xc00069cb40, 0x0)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/h2_bundle.go:9103 +0x3e
net/http.(*Transport).roundTrip(0x1b130a0, 0xc004356700, 0xc0043277f0, 0x1114ace, 0xc004368630)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/transport.go:515 +0xd94
net/http.(*Transport).RoundTrip(0x1b130a0, 0xc004356700, 0x13, 0xc00019f0a0, 0x14)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/roundtrip.go:17 +0x35
github.com/kopia/kopia/internal/apiclient.basicAuthTransport.RoundTrip(0x12c9e60, 0x1b130a0, 0xc00019f220, 0x13, 0xc00019f0a0, 0x14, 0xc004356700, 0x2a1e40e9e74a, 0x3b7b8839, 0x3b7b883900000008)
	/home/travis/gopath/src/github.com/kopia/kopia/internal/apiclient/apiclient.go:192 +0x75
github.com/kopia/kopia/internal/apiclient.loggingTransport.RoundTrip(0x12cac00, 0xc0002d3e00, 0xc004356700, 0xc0043277e0, 0xc0044712d0, 0x40cc48)
	/home/travis/gopath/src/github.com/kopia/kopia/internal/apiclient/apiclient.go:202 +0x7b
net/http.send(0xc004356700, 0x12cac20, 0xc00018a130, 0x0, 0x0, 0x0, 0xc004a3cbd8, 0x203001, 0x1, 0x0)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/client.go:252 +0x43e
net/http.(*Client).send(0xc0002d3e30, 0xc004356700, 0x0, 0x0, 0x0, 0xc004a3cbd8, 0x0, 0x1, 0xc004368660)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/client.go:176 +0xfa
net/http.(*Client).do(0xc0002d3e30, 0xc004356700, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/client.go:699 +0x44a
net/http.(*Client).Do(...)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/client.go:567
github.com/kopia/kopia/internal/apiclient.(*KopiaAPIClient).runRequest(0xc00001a800, 0x12e42c0, 0xc004496100, 0x110b701, 0x3, 0xc00432d740, 0x54, 0x0, 0x0, 0xf1a040, ...)
	/home/travis/gopath/src/github.com/kopia/kopia/internal/apiclient/apiclient.go:69 +0x135
github.com/kopia/kopia/internal/apiclient.(*KopiaAPIClient).Put(...)

Server-side code is here:
https://github.com/kopia/kopia/blob/9a6dea898b79a47cab16a681f1a3310a1a38b820/internal/server/api_content.go#L62

Client-side code is here:
https://github.com/kopia/kopia/blob/923c91b5a45f48a042569bc38d0de26d1caa7377/internal/apiclient/apiclient.go#L44

According to the bug report, disabling HTTP/2 using GODEBUG resolves the issue.

Full disclosure: I'm a Googler, but Kopia is my personal project and not affiliated with Google.

@jkowalski
Copy link
Author

jkowalski commented Aug 16, 2020

I did some more digging through the stack traces and the server is handling 5 requests at this time:

goroutine 23751 - holding rlock - inside ioutil.ReadAll()
goroutine 23781 - holding rlock - inside ioutil.ReadAll()

goroutine 23888 - waiting for rlock - did not do ioutil.ReadAll() yet
goroutine 23939 - waiting for rlock - did not do ioutil.ReadAll() yet
goroutine 24018 - waiting for rlock - did not do ioutil.ReadAll() yet

goroutine 34 - waiting for r/w lock

My understanding is that since RWMutex is writer-preferring, 34 will not acquire the r/w lock until 23751 and 23781 finish, and will also prevent 24018, 23939 and 23888 from acquiring the read lock.

Now I'm wondering whether having 3 requests that are not reading request bodies immediately (because they have not acquired the shared lock) could prevent the other 2 requests that already have the shared lock from making progress, in particular due to the fact that request bodies are rather large ~20MB in size each.

In other words - does HTTP/2 require all request handlers to read request bodies to ensure liveness? It appears that HTTP/1.1 server does not have this behavior (I'm guessing because there is no request pipelining).

@networkimprov
Copy link

cc @fraenkel

@jkowalski
Copy link
Author

FWIW - we changed the code in Kopia to read all request payload before acquiring shared lock and got confirmation that it fixed the issue.

If timely reading of request bodies is indeed required for liveness, perhaps it should be documented.

@dmitshur dmitshur added this to the Backlog milestone Aug 18, 2020
@dmitshur
Copy link
Contributor

/cc @bradfitz @tombergan per owners.

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 18, 2020
@oguzyildizz
Copy link

hi @jkowalski , were you able to find any workarounds for this problem? I'm having it pretty consistently too with http2

@networkimprov
Copy link

See his last comment above.

@oguzyildizz
Copy link

oh I misunderstood the problem then, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants