Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/rpc/jsonrpc: ServeConn doesn't detect network client hangup #2427

Closed
bytbox opened this issue Nov 7, 2011 · 10 comments
Closed

net/rpc/jsonrpc: ServeConn doesn't detect network client hangup #2427

bytbox opened this issue Nov 7, 2011 · 10 comments
Milestone

Comments

@bytbox
Copy link
Contributor

bytbox commented Nov 7, 2011

issue #1828 is unfixed at tip. (And, obviously, no test cases exist for testing this
behavior.) Details and patched attachments copied below


What is the expected output?
A single "connection reset" log message:
2011/05/14 16:36:34 rpc: rpc: server cannot decode request: read tcp [::1]:52123:
connection reset by peer

What do you see instead?
The above log message is repeated ad infinitum.

Which compiler are you using (5g, 6g, 8g, gccgo)?
6g

Which operating system are you using?
Ubuntu 11.04

Which revision are you using?  (hg identify)
5d2ce135523+ (release-branch.r57) release/release.r57.1
I did check top of tree, though.


Please provide any additional information below.
The underlying problem is that rpc.*Server·ServeCodec serves until it gets an os.EOF or
io.ErrUnexpectedEOF, but json.*Decoder·readValue gets a net.OpError, so it just loops
forever, trying and failing to read from the same connection.

Two possible fixes:
- rpc.*Server·ServeCodec stops serving for all error types instead of just os.EOF and
io.ErrUnexpectedEOF
- jsonrpc.*serverCodec·ReadRequestHeader, which calls into the json decoder, masks all
non-eof errors as io.ErrUnexpectedEOF

I'm not sure of the ramifications for the first, so I tried the second, and it appears
to work. Here's a patch that does it:
--- a/src/pkg/rpc/jsonrpc/server.go     Wed May 04 00:17:17 2011 -0400
+++ b/src/pkg/rpc/jsonrpc/server.go     Sat May 14 17:34:33 2011 -0700
@@ -67,6 +67,9 @@
 func (c *serverCodec) ReadRequestHeader(r *rpc.Request) os.Error {
        c.req.reset()
        if err := c.dec.Decode(&c.req); err != nil {
+               if err != os.EOF {
+                       return io.ErrUnexpectedEOF
+               }
                return err
        }
        r.ServiceMethod = c.req.Method

Attachments:

  1. main.go (754 bytes)
  2. jsonrpc.py (370 bytes)
@bytbox
Copy link
Contributor Author

bytbox commented Nov 8, 2011

Comment 1:

Correction: use the attached (modified) jsonrpc.py, which closes the connection at the
end. Expected output: none. Actual output, usually none, but sometimes "rpc: rpc: server
cannot decode request: read tcp [::1]:52123: connection reset by peer". Less often seen
is "rpc: writing response: write tcp [::1]:58961: broken pipe"

Attachments:

  1. jsonrpc.py (382 bytes)

@rsc
Copy link
Contributor

rsc commented Nov 9, 2011

Comment 2:

Status changed to WaitingForReply.

@rsc
Copy link
Contributor

rsc commented Nov 9, 2011

Comment 3:

I think this is working fine at tip.
I added some prints to ServeCodec
diff -r 6b38b2ad3d41 src/pkg/net/rpc/server.go
--- a/src/pkg/net/rpc/server.go Wed Nov 09 16:01:55 2011 -0500
+++ b/src/pkg/net/rpc/server.go Wed Nov 09 16:07:19 2011 -0500
@@ -391,10 +391,13 @@
 // ServeCodec is like ServeConn but uses the specified codec to
 // decode requests and encode responses.
 func (server *Server) ServeCodec(codec ServerCodec) {
+println("serving", codec)
    sending := new(sync.Mutex)
    for {
+println("read...", codec)
        service, mtype, req, argv, replyv, keepReading, err := server.readRequest(codec)
        if err != nil {
+println("read request", codec, err.Error())
            if err != io.EOF {
                log.Println("rpc:", err)
            }
@@ -410,6 +413,7 @@
        }
        go service.call(server, sending, mtype, req, argv, replyv, codec)
    }
+println("close", codec)
    codec.Close()
 }
 
When I run your program and jsonrpc.py against it, it prints:
$ python2.6 jsonrpc.py
{"params": [{"First": 99}], "method": "RpcOps.AddOne", "id": "1"}
serving (0xf84001efc0,0xf840051000)
read... (0xf84001efc0,0xf840051000)
read... (0xf84001efc0,0xf840051000)
read request (0xf84001efc0,0xf840051000) EOF
close (0xf84001efc0,0xf840051000)
$ 
You say that you are using release.r57, and it is definitely
not fixed there.  You also say that you checked tip. 
What does checked mean?  Did you run the code or just look
for your fix?  Your fix is strange - it turns any non-EOF
error into ErrUnexpectedEOF, losing information - and we 
solved the problem by adding the additional keepReading bool
return value instead.
Can you please try running your program at tip?
I also tried running your jsonrpc.py with the sendall
argument changed to 'This is not JSON!'.  It prints a
single error about not being able to decode and then
closes the connection gracefully.
Russ

@bytbox
Copy link
Contributor Author

bytbox commented Nov 9, 2011

Comment 4:

Sorry - I did not modify the details from issue #1828, so the release information was
wrong. I am indeed running at tip - attached is a main.go fixed again for the latest
changes, and my system information is below. Sorry for not filing properly.
Again, it works fine most (=80%) of the time, but it prints inconsistent error messages
the other 20%. Some experimentation shows that it's more likely to go the error route
when the system is under moderate load.
What operating system are you running?
# Archlinux.
sh-4.2$ uname -a
Linux jagadai 3.1.0-ARCH-09552-g82f614f #47 SMP PREEMPT Sat Nov 5 22:50:26 EDT 2011
x86_64 Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz GenuineIntel GNU/Linux
Which compiler are you using?
sh-4.2$ 6g -V
6g version weekly.2011-11-09 10399+
What operating system are you running.
sh-4.2$ hg id
b23afd541c82+ tip

Attachments:

  1. main.go (723 bytes)

@rsc
Copy link
Contributor

rsc commented Dec 9, 2011

Comment 5:

Labels changed: added priority-later.

@rsc
Copy link
Contributor

rsc commented Dec 12, 2011

Comment 6:

Labels changed: added priority-go1.

@robpike
Copy link
Contributor

robpike commented Jan 13, 2012

Comment 7:

Owner changed to builder@golang.org.

@rsc
Copy link
Contributor

rsc commented Jan 24, 2012

Comment 8:

Status changed to Accepted.

@rsc
Copy link
Contributor

rsc commented Jan 30, 2012

Comment 9:

Labels changed: added go1-must.

@rsc
Copy link
Contributor

rsc commented Feb 12, 2012

Comment 10:

I looked into this.  If I start your server and run the python script 100 times, then I
do in fact get the occasional error you reported, but that is not surprising: the python
client has hung up unexpectedly, and that is being reported.  Exactly whether the error
gets to the server in time depends on TCP connection timing, I suspect.
Regardless of whether the error gets printed, the connection close does seem to be
processed correctly.  After running a few hundred runs of the python script, I ran
killall -abrt a.out and there were no unexpected goroutines, just main (listening) and
the epoll goroutine.
So I think this is working correctly.

Status changed to WorkingAsIntended.

@rsc rsc added this to the Go1 milestone Apr 10, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants