Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: writing to TCP socket returns incorrect number of bytes upon timeout on Windows #24727

Open
piotrnar opened this issue Apr 6, 2018 · 7 comments
Labels
help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Milestone

Comments

@piotrnar
Copy link

piotrnar commented Apr 6, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.10.1 windows/amd64

Does this issue reproduce with the latest release?

Yes - it seems to have been there for ages.

What operating system and processor architecture are you using (go env)?

I use Windows 10 64 bit.

What did you do?

The easiest way to see and reproduce the problem is by using the proof-of-concept tools.

First the server part - you can download it from here: https://pastebin.com/NUdxuMDw
Start the server on any PC that has TCP port 11111 open and accessible.
OS of the server machine does not matter.

Now download the actual test client: https://pastebin.com/SFQyyHfb
Run the test client on Windows.
You just need to specify one parameter on the command line: IP or host name of your running server (the one with open TCP port 11111).

The test is supposed to send 1MB of data to the server.
But you will see that it actually sends more....

The reason for it, is that whenever the Write() function returns a timeout, it always returns n=0.
Which seems to be wrong, as you can sometimes see some data going through.

This problem does not occur when you run the same code on Linux (there you can actually see n>0 being returned).
I have not checked other platforms.

What did you expect to see?

Always 1MB of data received at the server side.

What did you see instead?

More than 1MB of data received at the server side.

@piotrnar piotrnar changed the title net: Write to socket returns incorrect number of bytes written upon timeout (Window only) net: writing to TCP socket returns incorrect number of bytes upon timeout (Window only) Apr 6, 2018
@ALTree ALTree added OS-Windows NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Apr 6, 2018
@ALTree ALTree added this to the Go1.11 milestone Apr 6, 2018
@odeke-em odeke-em changed the title net: writing to TCP socket returns incorrect number of bytes upon timeout (Window only) net: writing to TCP socket returns incorrect number of bytes upon timeout on Windows Apr 24, 2018
@odeke-em
Copy link
Member

Thank you for filling this issue @piotrnar!

Huh, that's an interesting issue, I don't have access to a Windows computer to help test/repro this but I can perhaps help by combining most of your pastebin'd code into a self contained test that anyone can now run during investigations, and if a bug is found, it could be used as a test case to ensure no future regressions:

package main

import (
	"bytes"
	"io"
	"net"
	"testing"
	"time"
)

func TestIssue24727(t *testing.T) {
	ln, err := net.Listen("tcp4", ":0")
	if err != nil {
		t.Fatalf("Failed to bind to an available address: %v", err)
	}
	defer ln.Close()

	type result struct {
		n   int64
		err error
	}

	resultsChan := make(chan *result)
	go func() {
		defer close(resultsChan)

		var buf [0x100000]byte
		rConn, rErr := ln.Accept()
		if rErr != nil {
			t.Fatalf("Failed to accept a connection: %v", rErr)
		}

		var n int64
		var in int
		var err error
		defer func() {
			resultsChan <- &result{n: n, err: err}
		}()

		for {
			in, err = rConn.Read(buf[:])
			n += int64(in)
			if err == nil {
				continue
			}

			ne, ok := err.(net.Error)
			if ok && ne.Timeout() {
				err = nil
			} else {
				return
			}
		}
	}()

	addr := ln.Addr().String()
	wConn, err := net.Dial("tcp4", addr)
	if err != nil {
		t.Fatalf("Failed to dial to address (%s): %v", addr, err)
	}

	bufSize := 4096
	aSlice := bytes.Repeat([]byte("a"), bufSize)
	var wn int64

	runClientWrite := func() {
		defer wConn.Close()

		for i := 0; i < 1<<20; i += bufSize {
			wConn.SetWriteDeadline(time.Now().Add(1 * time.Millisecond))
			n, err := wConn.Write(aSlice)
			if err != nil {
				if nerr, ok := err.(net.Error); ok && nerr.Timeout() {
					err = nil
					if n != 0 {
						t.Errorf("n %d on timeout err: %v", n, err)
					} else {
						t.Errorf("Error: %v", err)
						return
					}
				}
			}
			wn += int64(n)
		}
	}
	runClientWrite()

	res := <-resultsChan
	t.Logf("ServerReadCount %d ClientWriteCount %d", res.n, wn)
	if g, w := res.n, wn; g != w {
		t.Errorf("Byte counts: serverReadCount %d clientWriteCount %d", g, w)
	}
	if err := res.err; err != nil && err != io.EOF {
		t.Fatalf("Unexpected error from server: %+v res: %#v", err, res)
	}
}

and I'll also page some net experts @mikioh @bradfitz @ianlancetaylor

@alexbrainman
Copy link
Member

@odeke-em your test outputs

C:\>u:\test
--- FAIL: TestIssue24727 (0.05s)
        main_test.go:68: Error: <nil>
        main_test.go:78: ServerReadCount 749568 ClientWriteCount 749568
FAIL

C:\>

on my Windows computer.

Alex

@odeke-em
Copy link
Member

Thank you @alexbrainman for running it!

I just noticed that I had prematurely added the else clause in if n != 0 yet it should instead have been the else clause for if nerr, ok := e.(net.Error); ok && nerr.Timeout() { i.e.

--- old.go	2018-04-25 00:15:22.000000000 -0700
+++ fixed.go	2018-04-25 00:15:06.000000000 -0700
@@ -73,11 +73,12 @@
 				if nerr, ok := err.(net.Error); ok && nerr.Timeout() {
 					err = nil
 					if n != 0 {
-						t.Errorf("n %d on timeout err: %v", n, err)
-					} else {
-						t.Errorf("Error: %v", err)
+						t.Logf("n %d", n)
 						return
 					}
+				} else {
+					t.Errorf("Error: %v", err)
+					return
 				}
 			}
 			wn += int64(n)

Sorry for that!

Could you please try again with this corrected version?

package main

import (
	"bytes"
	"io"
	"net"
	"testing"
	"time"
)

func TestIssue24727(t *testing.T) {
	ln, err := net.Listen("tcp4", ":0")
	if err != nil {
		t.Fatalf("Failed to bind to an available address: %v", err)
	}
	defer ln.Close()

	type result struct {
		n   int64
		err error
	}

	resultsChan := make(chan *result)
	go func() {
		defer close(resultsChan)

		var buf [0x100000]byte
		rConn, rErr := ln.Accept()
		if rErr != nil {
			t.Fatalf("Failed to accept a connection: %v", rErr)
		}

		var n int64
		var in int
		var err error
		defer func() {
			resultsChan <- &result{n: n, err: err}
		}()

		for {
			in, err = rConn.Read(buf[:])
			n += int64(in)
			if err == nil {
				continue
			}

			ne, ok := err.(net.Error)
			if ok && ne.Timeout() {
				err = nil
			} else {
				return
			}
		}
	}()

	addr := ln.Addr().String()
	wConn, err := net.Dial("tcp4", addr)
	if err != nil {
		t.Fatalf("Failed to dial to address (%s): %v", addr, err)
	}

	bufSize := 4096
	aSlice := bytes.Repeat([]byte("a"), bufSize)
	var wn int64

	runClientWrite := func() {
		defer wConn.Close()

		for i := 0; i < 1<<20; i += bufSize {
			wConn.SetWriteDeadline(time.Now().Add(1 * time.Millisecond))
			n, err := wConn.Write(aSlice)
			if err != nil {
				if nerr, ok := err.(net.Error); ok && nerr.Timeout() {
					err = nil
					if n != 0 {
						t.Logf("n %d", n)
						return
					}
				} else {
					t.Errorf("Error: %v", err)
					return
				}
			}
			wn += int64(n)
		}
	}
	runClientWrite()

	res := <-resultsChan
	t.Logf("ServerReadCount %d ClientWriteCount %d", res.n, wn)
	if g, w := res.n, wn; g != w {
		t.Errorf("Byte counts: serverReadCount %d clientWriteCount %d", g, w)
	}
	if err := res.err; err != nil && err != io.EOF {
		t.Fatalf("Unexpected error from server: %+v res: %#v", err, res)
	}
}

@alexbrainman
Copy link
Member

Could you please try again with this corrected version?

C:\>u:\test
PASS

C:\>

Alex

@piotrnar
Copy link
Author

If you want to have it work (fail) on localhost connection, you have to add delay (time.Sleep) before in, err = rConn.Read(buf[:]) so the socket's buffer gets a chance to become full.

10ms should be enough, but better go for time.Sleep(100 * time.Millisecond)

@piotrnar
Copy link
Author

The code like this:

package main

import (
	"bytes"
	"io"
	"net"
	"testing"
	"time"
)

func TestIssue24727(t *testing.T) {
	ln, err := net.Listen("tcp4", ":0")
	if err != nil {
		t.Fatalf("Failed to bind to an available address: %v", err)
	}
	defer ln.Close()

	type result struct {
		n   int64
		err error
	}

	resultsChan := make(chan *result)
	go func() {
		defer close(resultsChan)

		var buf [0x100000]byte
		rConn, rErr := ln.Accept()
		if rErr != nil {
			t.Fatalf("Failed to accept a connection: %v", rErr)
		}

		var n int64
		var in int
		var err error
		defer func() {
			resultsChan <- &result{n: n, err: err}
		}()

		for {
			time.Sleep(100 * time.Millisecond)
			in, err = rConn.Read(buf[:])
			n += int64(in)
			if err == nil {
				continue
			}

			ne, ok := err.(net.Error)
			if ok && ne.Timeout() {
				err = nil
			} else {
				return
			}
		}
	}()

	addr := ln.Addr().String()
	wConn, err := net.Dial("tcp4", addr)
	if err != nil {
		t.Fatalf("Failed to dial to address (%s): %v", addr, err)
	}

	bufSize := 4096
	aSlice := bytes.Repeat([]byte("a"), bufSize)
	var wn int64

	runClientWrite := func() {
		defer wConn.Close()

		for i := 0; i < 1<<20; i += bufSize {
			wConn.SetWriteDeadline(time.Now().Add(1 * time.Millisecond))
			n, err := wConn.Write(aSlice)
			if err != nil {
				if nerr, ok := err.(net.Error); ok && nerr.Timeout() {
					err = nil
					if n != 0 {
						t.Logf("n %d", n)
						return
					}
				} else {
					t.Errorf("Error: %v", err)
					return
				}
			}
			wn += int64(n)
		}
	}
	runClientWrite()

	res := <-resultsChan
	t.Logf("ServerReadCount %d ClientWriteCount %d", res.n, wn)
	if g, w := res.n, wn; g != w {
		t.Errorf("Byte counts: serverReadCount %d clientWriteCount %d", g, w)
	}
	if err := res.err; err != nil && err != io.EOF {
		t.Fatalf("Unexpected error from server: %+v res: %#v", err, res)
	}
}

Windows:

> go test
--- FAIL: TestIssue24727 (0.31s)
        tcp_test.go:91: ServerReadCount 1048576 ClientWriteCount 696320
        tcp_test.go:93: Byte counts: serverReadCount 1048576 clientWriteCount 696320
FAIL
exit status 1
FAIL    _/E_/nettest    0.329s

Linux:

> go test
PASS
ok      _/mnt/e/nettest 0.312s

@piotrnar
Copy link
Author

However...

I'm not quite sure if this is a good idea to test this on localhost, because when I try it on Linux the socket write function never returns a timeout.

It does return timeouts on Windows, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Projects
None yet
Development

No branches or pull requests

6 participants