Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: TestTransportRemovesDeadIdleConnections failing on Plan 9 #15464

Closed
0intro opened this issue Apr 27, 2016 · 10 comments
Closed

net/http: TestTransportRemovesDeadIdleConnections failing on Plan 9 #15464

0intro opened this issue Apr 27, 2016 · 10 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. OS-Plan9
Milestone

Comments

@0intro
Copy link
Member

0intro commented Apr 27, 2016

TestTransportRemovesDeadIdleConnections (CL 22492) is failing on Plan 9.

--- FAIL: TestTransportRemovesDeadIdleConnections (3.12s)
    transport_test.go:480: Transport didn't notice idle connection's death.
        before: ["|http|127.0.0.1:38937"]
         after: ["|http|127.0.0.1:38937"]
http.test 3212: warning: process exceeds 100 file descriptors

See https://build.golang.org/log/669586c471ca9095d6b6f2d0237e975e10792022

@0intro 0intro self-assigned this Apr 27, 2016
@0intro 0intro added this to the Go1.7 milestone Apr 27, 2016
@0intro
Copy link
Member Author

0intro commented Apr 27, 2016

I've added some debug prints:

cpu% go test -v -run TestTransportRemovesDeadIdleConnections
=== RUN   TestTransportRemovesDeadIdleConnections
newFD /net/tcp/11 local 127.0.0.1:60699 remote <nil>
newFD /net/tcp/18 local 127.0.0.1:60699 remote 127.0.0.1:39558
newFD /net/tcp/12 local 127.0.0.1:39558 remote 127.0.0.1:60699
readLoop: Peek
Read: /net/tcp/12
Read: /net/tcp/18
Read: /net/tcp/18 -> 132
Read: /net/tcp/18
Read: /net/tcp/12 -> 132
readLoop: numExpectedResponses: 1
readLoop: Peek
Read: /net/tcp/12
closeConnChan
Close: /net/tcp/18
Hangup: write /net/tcp/11/listen: inappropriate use of fd
waitCondition: timed out
Close: /net/tcp/12
Hangup
Read: /net/tcp/12 -> EOF
Read: /net/tcp/18 -> read /net/tcp/18/data: Hangup
readLoop: numExpectedResponses: 0
readLoop: returned
removeIdleConn
Close: /net/tcp/18
--- FAIL: TestTransportRemovesDeadIdleConnections (3.37s)
    transport_test.go:480: Transport didn't notice idle connection's death.
        before: ["|http|127.0.0.1:60699"]
         after: ["|http|127.0.0.1:60699"]
FAIL
exit status: 'http.test 740549: 1'
FAIL    net/http    3.425s

The server listens on /net/tcp/11. The client opens a connection on /net/tcp/12 which is accepted by the server on /net/tcp/18.

In readLoop, on the first iteration (numExpectedResponses=1), the client calls Peek(1) and reads 132 bytes. Then, the server closes the connection on his side. On the second iteration (numExpectedResponses=0), the client calls Peek(1), but the Read never returns. Consenquently, we never reach the "numExpectedResponses == 0" test to leave readLoop.

This happens because server closes the connection on his side (/net/tcp/18), but the connection is still open on client's side (/net/tcp/12).

This issue can be reproduced independently of Go, like this:

Listen on port 8080:

% aux/listen1 -v tcp!127.1!8080 cat

Connect to port 8080:

% netdial tcp!127.1!8080

Figure out what is the connection file on the server side:

% for(i in /net/tcp/*){if(grep 8080 $i/local >/dev/null >[2=1] && grep ::!0 $i/remote >/dev/null >[2=1]){echo $i}}
/net/tcp/11

Figure out what is the connection file on the client side:

% for(i in /net/tcp/*){if(grep 8080 $i/remote >/dev/null >[2=1]){echo $i}}
/net/tcp/12

Close the connection from the server side:

% echo hangup >/net/tcp/11/ctl

The connection on the client side is still on "established" state:

% cat /net/tcp/12/status
Established qin 0 qout 0 rq 0.0 srtt 0 mdev 0 sst 1048560 cwin 32688 swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10 timer.count 10 rerecv 0 katimer.start 2400 katimer.count 1016

Closing the connection with "hangup" is supposed to send a RST to the other side. I'm not sure why the connection remains in "established" state. I'll investigate.

FYI @bradfitz

@0intro
Copy link
Member Author

0intro commented Apr 27, 2016

I noticed a mistake in my previous test.

This is the listener file:

%  for(i in /net/tcp/*){if(grep 8080 $i/local >/dev/null >[2=1] && ! grep ::!0 $i/remote >/dev/null >[2=1]){echo $i}}
/net/tcp/11

This is the connection file on the server side:

%  for(i in /net/tcp/*){if(grep 8080 $i/local >/dev/null >[2=1] && grep ::!0 $i/remote >/dev/null >[2=1]){echo $i}}
/net/tcp/17

Close the connection from the server side:

% echo hangup >/net/tcp/17/ctl

Closing the connection from the server side sends RST to the client, which closes the connection on his side, as expected:

% cat /net/tcp/12/status
Closed qin 0 qout 0 rq 0.0 srtt 0 mdev 0 sst 1048560 cwin 4380 swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10 timer.count 10 rerecv 0 katimer.start 2400 katimer.count 2154

I'm still not sure why it doesn't work in Go, since the read syscall should return -1 in this case, with errstr set to "Hangup".

@0intro
Copy link
Member Author

0intro commented Apr 27, 2016

The issue is that in the "accept" case, netFD.ctl represents listener's /net/tcp/*/listen file, while the "hangup" string should be written to connection's /net/tcp/*/ctl file.

The "hangup" string is not handled on the listen file. That's why you get the "inappropriate use of fd" error in this case.

Writing the "hangup" string to the ctl string solves the issue here, but it wouldn't be a good idea to do it, since the connection would be closed before the client have received all the data.

@bradfitz
Copy link
Contributor

Is this a dup of #11476, etc? I feel like a number of the failing net and net/http tests were because net.TCPConn.Read would never wake up when somebody else closes the TCP connection. If you fix this, I imagine you can remove the t.Skip calls from a bunch of tests.

@0intro
Copy link
Member Author

0intro commented Apr 27, 2016

Yes, but I don't have a proper fix for this issue yet. I propose that we skip this test until we figure out the right way to fix this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/22513 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 28, 2016
Updates #15464.

Change-Id: If3221034bb10751c6fcf1fbeba401a879c18079f
Reviewed-on: https://go-review.googlesource.com/22513
Run-TryBot: David du Colombier <0intro@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@bradfitz bradfitz modified the milestones: Go1.8Maybe, Go1.7 May 19, 2016
@bradfitz
Copy link
Contributor

Not a Go 1.7 blocker, since Plan 9 isn't a first class port. I'd consider a CL for for 1.7, though, depending on what the fix ends up looking like.

@quentinmit quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 10, 2016
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31270 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31271 mentions this issue.

gopherbot pushed a commit that referenced this issue Oct 18, 2016
Previously, we used to write the "hangup" message to
the TCP connection control file to be able to close
a connection, while waking up the readers.

The "hangup" message closes the TCP connection with a
RST message. This is a problem when closing a connection
consecutively to a write, because the reader may not have
time to acknowledge the message before the connection is
closed, resulting in loss of data.

We use a "close" message, newly implemented in the Plan 9
kernel to be able to close a TCP connection gracefully with a FIN.

Updates #15464.

Change-Id: I2050cc72fdf7a350bc6c9128bae7d14af11e599c
Reviewed-on: https://go-review.googlesource.com/31271
Run-TryBot: David du Colombier <0intro@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
gopherbot pushed a commit to golang/build that referenced this issue Oct 18, 2016
This version adds a TCP "close" message:

    http://9legacy.org/9legacy/patch/9-tcp-close.diff

Updates golang/go#15464

Change-Id: I4866858338db4b408fd180b5d89560aa86049adc
Reviewed-on: https://go-review.googlesource.com/31327
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31393 mentions this issue.

@golang golang locked and limited conversation to collaborators Oct 18, 2017
@rsc rsc unassigned 0intro Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. OS-Plan9
Projects
None yet
Development

No branches or pull requests

4 participants