Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: failure in TestTransportConnectionCloseOnRequest due to reused client port #52450

Closed
bcmills opened this issue Apr 20, 2022 · 7 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Apr 20, 2022

--- FAIL: TestTransportConnectionCloseOnRequest (0.00s)
    transport_test.go:286: error in connectionClose=true. unexpected bodiesDiffer=false; body1="127.0.0.1:17137"; body2="127.0.0.1:17137"
FAIL
FAIL	net/http	10.018s

greplogs --dashboard -md -l -e 'FAIL: TestTransportConnectionCloseOnRequest .*' --since=2021-07-01

2022-04-19T15:59:25-7a06243/openbsd-amd64-68
2021-12-21T18:43:00-9502339/openbsd-386-70-n1
2021-11-22T20:34:40-f13fcd9/openbsd-amd64-70

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 20, 2022
@bcmills bcmills added this to the Go1.19 milestone Apr 20, 2022
@bcmills
Copy link
Contributor Author

bcmills commented Apr 20, 2022

This error message is fairly inscrutable to me (compare https://go.dev/wiki/CodeReviewComments#useful-test-failures). I think it is asserting that two consecutive calls that set req.Close do not use the same port number for both calls. However, I do not immediately see why that would be the case.

Since the request sets Close: true, that should cause the server's transport to set closeAfterReply to true:
https://cs.opensource.google/go/go/+/master:src/net/http/server.go;l=1313-1315;drc=ac01de5446ec92544768dabee3b0d1faf5f596d9

That causes the server to close its write side of the connection immediately after it has written the response:
https://cs.opensource.google/go/go/+/master:src/net/http/server.go;l=1974-1979;drc=master

If I understand correctly, the fact that the server closes the connection may avoid the TCP TIME-WAIT delay on the client side, allowing the port to be reused immediately. If that is the case, then the test seems bogus: it is asserting that the port is not immediately reused, but not actually preventing that reuse.

@neild, @ianlancetaylor, @bradfitz: could one of you confirm my understanding? (If the above is correct, then this failure mode is not specific to OpenBSD and we should fix the test rather than skipping it.)

@bcmills bcmills changed the title net/http: cryptic failure in TestTransportConnectionCloseOnRequest on openbsd net/http: failure in TestTransportConnectionCloseOnRequest due to reused client port Apr 20, 2022
@bradfitz
Copy link
Contributor

That test is testing that http.Request.Close is behaving on outgoing requests.

When false, the underlying connection should be reused for the two HTTP requests and they should both have the same body (the remote ip:port the http.Handler saw).

When true, an underlying connection should never be reused. The server's behavior and TIME-WAIT are irrelevant: if we set "Connection: close" on our outbound request, the Transport should never add that TCP connection back to its free pool for reuse on the 2nd request.

So this error looks real to me. I don't think the test is bad.

@bradfitz
Copy link
Contributor

Note that this is a Transport test, in transport_test.go. The bug will almost certainly be in the client side probably in transport.go, not on the server side in server.go.

@bradfitz
Copy link
Contributor

And the Transport code looks like it's fine:

index f2d538b04a..e5ad44da3d 100644
--- a/src/net/http/transport.go
+++ b/src/net/http/transport.go
@@ -2136,6 +2136,9 @@ func (pc *persistConn) readLoop() {
                        // or we get an unexpected informational (1xx) response.
                        // StatusCode 100 is already handled above.
                        alive = false
+                       if rc.req.Close {
+                               panic("yup it was close")
+                       }
                }
 
                if !hasBody || bodyWritable {

=>

dev:http $ go test -v -run=TestTransportConnectionCloseOnRequest -race -failfast -count=1000
=== RUN   TestTransportConnectionCloseOnRequest
panic: yup it was close

goroutine 36 [running]:
net/http.(*persistConn).readLoop(0xc000388360)
        /home/bradfitz/go/src/net/http/transport.go:2140 +0x1818
created by net/http.(*Transport).dialConn
        /home/bradfitz/go/src/net/http/transport.go:1750 +0x2586
exit status 2
FAIL    net/http        0.025s

So it's definitely setting alive = false which will make it not return the connection to the pool. Yet it is somehow?

@bcmills
Copy link
Contributor Author

bcmills commented Apr 20, 2022

So it's definitely setting alive = false which will make it not return the connection to the pool. Yet it is somehow?

My hypothesis is that the test is creating two distinct TCP connections that happen to use the same outbound port. That's the connection to TIME-WAIT: it seems plausible for the second connection to have the same outbound port if (and only if):

  1. the first connection is closed before the second one is dialed, and
  2. the port from the first connection is not still in the TIME-WAIT state.

@bradfitz
Copy link
Contributor

Ah, okay. Yeah, I suppose that's possible. Low odds and kernels try to avoid it, which is why it probably shows up more on OpenBSD: different networking stack. Maybe they don't care/try there.

We could make the http.Handler's repsonse body include the %p of the net.TCPConn as well. I'll try that out.

@gopherbot
Copy link

Change https://go.dev/cl/401314 mentions this issue: net/http: deflake TestTransportConnectionCloseOnRequest

@golang golang locked and limited conversation to collaborators Apr 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Projects
None yet
Development

No branches or pull requests

3 participants