Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: TestTransportMaxConnsPerHost flakes #31784

Closed
bradfitz opened this issue May 1, 2019 · 14 comments
Closed

net/http: TestTransportMaxConnsPerHost flakes #31784

bradfitz opened this issue May 1, 2019 · 14 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@bradfitz
Copy link
Contributor

bradfitz commented May 1, 2019

I've seen this a few times now. @fraenkel, could you take a look?

https://storage.googleapis.com/go-build-log/b6c795d5/android-amd64-emu_95c3cb1f.log

--- FAIL: TestTransportMaxConnsPerHost (0.05s)
    transport_test.go:638: request failed: Get https://127.0.0.1:34180: write tcp 127.0.0.1:37680->127.0.0.1:34180: use of closed network connection
2019/05/01 16:13:25 http: TLS handshake error from 127.0.0.1:45068: read tcp 127.0.0.1:36221->127.0.0.1:45068: use of closed network connection
2019/05/01 16:13:25 http: TLS handshake error from 127.0.0.1:45070: read tcp 127.0.0.1:36221->127.0.0.1:45070: use of closed network connection
2019/05/01 16:13:25 http: TLS handshake error from 127.0.0.1:45074: read tcp 127.0.0.1:36221->127.0.0.1:45074: use of closed network connection
2019/05/01 16:13:25 http: TLS handshake error from 127.0.0.1:45066: read tcp 127.0.0.1:36221->127.0.0.1:45066: use of closed network connection
2019/05/01 16:13:25 http: TLS handshake error from 127.0.0.1:45072: read tcp 127.0.0.1:36221->127.0.0.1:45072: use of closed network connection
2019/05/01 16:13:25 http: TLS handshake error from 127.0.0.1:45069: read tcp 127.0.0.1:36221->127.0.0.1:45069: use of closed network connection
2019/05/01 16:13:25 http: TLS handshake error from 127.0.0.1:45071: read tcp 127.0.0.1:36221->127.0.0.1:45071: use of closed network connection
2019/05/01 16:13:25 http: TLS handshake error from 127.0.0.1:45073: read tcp 127.0.0.1:36221->127.0.0.1:45073: use of closed network connection
2019/05/01 16:13:25 http: TLS handshake error from 127.0.0.1:45067: write tcp 127.0.0.1:36221->127.0.0.1:45067: use of closed network connection
2019/05/01 16:13:25 http: TLS handshake error from 127.0.0.1:45080: write tcp 127.0.0.1:36221->127.0.0.1:45080: use of closed network connection
FAIL
exitcode=1go_android_exec: adb exec-out rm -rf /data/local/tmp/go_android_exec/http.test-20892
@bradfitz bradfitz added Testing An issue that has been verified to require only test changes, not just a test failure. NeedsFix The path to resolution is known, but the work has not been done. labels May 1, 2019
@bradfitz bradfitz added this to the Go1.13 milestone May 1, 2019
@bradfitz
Copy link
Contributor Author

bradfitz commented May 1, 2019

Oh, and ignore the http: TLS handshake error from spam. That's an unrelated issue that needs fixing. (Tests shouldn't spam)

The concerning part here is only:

--- FAIL: TestTransportMaxConnsPerHost (0.05s)
    transport_test.go:638: request failed: Get https://127.0.0.1:34180: write tcp 127.0.0.1:37680->127.0.0.1:34180: use of closed network connection

@fraenkel
Copy link
Contributor

fraenkel commented May 2, 2019

on the plus side, I can reproduce it

$ ~/workspace/go/bin/go test -run TestTransportMaxConnsPerHost -count=100
--- FAIL: TestTransportMaxConnsPerHost (0.01s)
    transport_test.go:638: request failed: Get https://127.0.0.1:39561: write tcp 127.0.0.1:36518->127.0.0.1:39561: use of closed network connection

@fraenkel
Copy link
Contributor

fraenkel commented May 2, 2019

another oddity that randomly appears when the other paases

$ ~/workspace/go/bin/go test -run TestTransportMaxConnsPerHost -count=100 -race
--- FAIL: TestTransportMaxConnsPerHost (0.03s)
    transport_test.go:662: Too many get connections (http2): 2
    transport_test.go:676: Too many get connections (http2): 3
--- FAIL: TestTransportMaxConnsPerHost (0.03s)
    transport_test.go:662: Too many get connections (http2): 2
    transport_test.go:676: Too many get connections (http2): 3
FAIL

@gopherbot
Copy link

Change https://golang.org/cl/175097 mentions this issue: net/http: skip flaky TestTransportMaxConnsPerHost for now

gopherbot pushed a commit that referenced this issue May 2, 2019
Updates #31784

Change-Id: Iee056c850c03939606b227a12715c76b0339d268
Reviewed-on: https://go-review.googlesource.com/c/go/+/175097
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@fraenkel
Copy link
Contributor

fraenkel commented May 3, 2019

@bradfitz I know how to make the test pass, but it is just papering over a potential issue.
The failing client is the 11th one. We have created a bit of a race between the current connection being closed and what connection the next client gets. It is quite a lot of code before MarkDead() is called in http2, and that window between Close() and MarkDead() allows any client to retrieve a connection that is closed.
The test case is simulating a network failure. I decided to try and make it cleaner by adding transport.CloseIdleConnections after the conn.Close(). Much to my surprise we end up in a hang at transport.go: line 1030.

@fraenkel
Copy link
Contributor

fraenkel commented May 3, 2019

I don't see how to fix the CloseIdleConnections which would have fixed the testcase.
The transport tracks host connection counts. When we close idle connections, the counts are not corrected for http2. We don't know the connect method key, nor do we know which connections are being closed on the http2 side.

@fraenkel
Copy link
Contributor

fraenkel commented May 3, 2019

I spoke too soon. We have the connection key and we can remove the host connection count. However, we really have no idea if the http2 connection is really closed or not. But we have marked the persistConn has broken regardless.

@fraenkel
Copy link
Contributor

fraenkel commented May 3, 2019

I have a fix coming. The definition of MaxConnPerHost once you CloseIdleConnections for http/2 is not always exact but it does attempt to do the right thing.

@gopherbot
Copy link

Change https://golang.org/cl/174950 mentions this issue: net/http: TestTransportMaxConnsPerHost flakes

@cuonglm
Copy link
Member

cuonglm commented May 11, 2019

@fraenkel
Copy link
Contributor

@cuonglm This is a different issue. Its the one I saw here #31784 (comment)

@cuonglm
Copy link
Member

cuonglm commented May 11, 2019

@fraenkel do we have an issue to track it already?

@fraenkel
Copy link
Contributor

Not that I know of.

@cuonglm
Copy link
Member

cuonglm commented May 11, 2019

@fraenkel I created one.

@golang golang locked and limited conversation to collaborators May 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

4 participants