New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net/http: transports with MaxConnsPerHost != 0 can deadlock #32336
Comments
Here's another one from the https://build.golang.org/log/0329c5cb84d687a5adaca96483af1cda0dd8dd28
|
The fact that it deadlocked in |
This is slightly different than the original issue. The original issue was a counting problem only affecting http2. The two reported issues here occur on http (android) and http2.(amd64). |
The other identical behavior in both is that they have at least one (two for http2) goroutines processing a request. The processing is reading the header/headerFrame waiting on additional bytes but there is no client routine. |
Another one on |
This is a real bug in net/http. The implementation of Transport.MaxConnsPerHost is buggy and can deadlock. The saving grace appears to be that the field was added in Go 1.11 and almost no one uses it. When there is no limit, there is no bug. If I add this (perfectly valid) sleep, then the test basically always hangs:
The problem is that the other side of this channel is tryPutIdleConn, which is doing:
That is, tryPutIdleConn assumes that inability to send to waitingDialer means the dialer was satisfied some other way and no longer wants the connection. This is not true - maybe the dialer is just about to execute the select (perhaps temporarily stuck in an artificial sleep!). Then the inability to send just means the dialer is slow to select, not that it doesn't want the connection. Worse, there can be multiple waiting dialers with the channel. The fact that none of them are receiving this instant does not mean they won't try shortly. Deleting the channel from the map strands all those waiting dialers - any future tryPutIdleConn will not wake them either. This doesn't matter in the test since MaxConnsPerHost = 1. But in general it would. Since MaxConnsPerHost = 1, though, tryPutIdleConn puts the mistaken-for-unwanted conn in the t.idleConn map. The test runs with a mostly-zero client Transport, and in particular IdleConnTimeout = 0, so the conn is never going to fall out of the idle map, never going to get closed, and never going to decrement the counter and let any of the getConns try to do their own dials. Everything is just stuck. This code was always racy, back to 2013 at least, but it didn't matter because if you lost the race you just created a few extra persistConns. Really the accounting wasn't racy so much as sloppy or imprecise. But the addition of MaxConnsPerHost in CL 71272 (cae5c7f) (cc @meirf) assumes the accounting is precise and therefore hangs. I will look into what the best fix is tomorrow, but I wanted to write this down for now. |
@gopherbot please backport go1.11 go1.12 |
Backport issue(s) opened: #32822 (for 1.11), #32823 (for 1.12). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases. |
Any fix should probably try to address #29889 at the same time. |
Also #31982, which I've closed as a duplicate of this one (different failure but same root cause: MaxConnsPerHost is just fundamentally flawed). |
I can take a guess at this. The actual connection management for http/2 is done on the http/2 side. I have mentioned in a few PRs/issues that there are issues with the split between http and http2 since the connection may be gone on the http2 but the http side will still have a pconn in the map since there is no reverse path. In the end all http is doing is dialing and handing over a connection. |
@rsc To understand in detail the h1/h2 setup, I recommend watching this video https://youtu.be/FARQMJndUn0 by bradfitz starting from either of these points and continuing until the end of the video: I believe @fraenkel is correct. Thank you for investigating and fixing. |
Change https://golang.org/cl/184262 mentions this issue: |
A deadlock observed on the
android-386-emu
builder(https://build.golang.org/log/f34d8448b943252002ed53347d9ecc20d1c59d6f).
I don't see any
runnable
goroutines, and the only onerunning
is intime/sleep.go
.CC @bradfitz
The text was updated successfully, but these errors were encountered: