Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: tcpproxy test hangs under Go tip (due to splice), works on Go 1.10 #25985

Closed
bradfitz opened this issue Jun 20, 2018 · 6 comments
Closed
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Milestone

Comments

@bradfitz
Copy link
Contributor

https://github.com/google/tcpproxy at git rev dbc151467 has test hangs at Go tip (rev f3f7bd5) but works fine with Go 1.10.

Sometimes it even passes at Go tip, but with -count=20 I've never seen it pass more than a couple times before hanging:

dev:tcpproxy $ GOROOT=$HOME/go1.10 go test -v -run=PROXYOut -count=20 github.com/google/tcpproxy
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
PASS
ok      github.com/google/tcpproxy      0.016s

But with Go tip:

dev:tcpproxy $ GOROOT=$HOME/go go test -v -run=PROXYOut -count=20
=== RUN   TestProxyPROXYOut
--- PASS: TestProxyPROXYOut (0.00s)
=== RUN   TestProxyPROXYOut
....
(hangs)

If I send it SIGQUIT:

^\SIGQUIT: quit
PC=0x45dba1 m=0 sigcode=128

goroutine 0 [idle]:
runtime.futex(0x916360, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7ffc41051530, 0x40bf92, ...)
        /home/bradfitz/go/src/runtime/sys_linux_amd64.s:531 +0x21
runtime.futexsleep(0x916360, 0x0, 0xffffffffffffffff)
        /home/bradfitz/go/src/runtime/os_linux.go:46 +0x4b
runtime.notesleep(0x916360)
        /home/bradfitz/go/src/runtime/lock_futex.go:151 +0xa2
runtime.stopm()
        /home/bradfitz/go/src/runtime/proc.go:2013 +0xe3
runtime.findrunnable(0xc00002cf00, 0x0)
        /home/bradfitz/go/src/runtime/proc.go:2484 +0x4dc
runtime.schedule()
        /home/bradfitz/go/src/runtime/proc.go:2610 +0x13a
runtime.park_m(0xc000001980)
        /home/bradfitz/go/src/runtime/proc.go:2673 +0xae
runtime.mcall(0x0)
        /home/bradfitz/go/src/runtime/asm_amd64.s:299 +0x5b

goroutine 1 [chan receive]:
testing.(*T).Run(0xc000142300, 0x6f75a1, 0x11, 0x708718, 0x477401)
        /home/bradfitz/go/src/testing/testing.go:879 +0x37a
testing.runTests.func1(0xc000142200)
        /home/bradfitz/go/src/testing/testing.go:1119 +0x78
testing.tRunner(0xc000142200, 0xc0000afe08)
        /home/bradfitz/go/src/testing/testing.go:827 +0xbf
testing.runTests(0xc00012c260, 0x910280, 0x9, 0x9, 0x40d22f)
        /home/bradfitz/go/src/testing/testing.go:1117 +0x2a2
testing.(*M).Run(0xc000130000, 0x0)
        /home/bradfitz/go/src/testing/testing.go:1034 +0x165
main.main()
        _testmain.go:58 +0x13d

goroutine 13 [IO wait]:
internal/poll.runtime_pollWait(0x7f47b1a60af0, 0x72, 0xc000059b88)
        /home/bradfitz/go/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc000186018, 0x72, 0xffffffffffffff00, 0x73fbe0, 0x8de450)
        /home/bradfitz/go/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc000186018, 0xc00018c000, 0x5d1, 0x5d1)
        /home/bradfitz/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc000186000, 0xc00018c02f, 0x5d1, 0x5d1, 0x0, 0x0, 0x0)
        /home/bradfitz/go/src/internal/poll/fd_unix.go:163 +0x179
net.(*netFD).Read(0xc000186000, 0xc00018c02f, 0x5d1, 0x5d1, 0x3, 0x0, 0x0)
        /home/bradfitz/go/src/net/fd_unix.go:203 +0x4f
net.(*conn).Read(0xc000180008, 0xc00018c02f, 0x5d1, 0x5d1, 0x0, 0x0, 0x0)
        /home/bradfitz/go/src/net/net.go:176 +0x68
bytes.(*Buffer).ReadFrom(0xc000188000, 0x73f040, 0xc000180008, 0xc000059d60, 0x59c0b8, 0xc000186000)
        /home/bradfitz/go/src/bytes/buffer.go:206 +0xb4
io/ioutil.readAll(0x73f040, 0xc000180008, 0x200, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/bradfitz/go/src/io/ioutil/ioutil.go:36 +0xb5
io/ioutil.ReadAll(0x73f040, 0xc000180008, 0xc000180008, 0x73f040, 0xc000180008, 0x0, 0x0)
        /home/bradfitz/go/src/io/ioutil/ioutil.go:45 +0x3e
github.com/google/tcpproxy.TestProxyPROXYOut(0xc000142300)
        /home/bradfitz/src/github.com/google/tcpproxy/tcpproxy_test.go:316 +0x529
testing.tRunner(0xc000142300, 0x708718)
        /home/bradfitz/go/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
        /home/bradfitz/go/src/testing/testing.go:878 +0x353

goroutine 15 [chan receive]:
github.com/google/tcpproxy.(*Proxy).awaitFirstError(0xc000010a00, 0xc0000704e0)
        /home/bradfitz/src/github.com/google/tcpproxy/tcpproxy.go:210 +0x3d
created by github.com/google/tcpproxy.(*Proxy).Start
        /home/bradfitz/src/github.com/google/tcpproxy/tcpproxy.go:205 +0x3e7

goroutine 14 [IO wait]:
internal/poll.runtime_pollWait(0x7f47b1a60f00, 0x72, 0x0)
        /home/bradfitz/go/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc000130398, 0x72, 0xc000148000, 0x0, 0x0)
        /home/bradfitz/go/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc000130398, 0xffffffffffffff00, 0x0, 0x0)
        /home/bradfitz/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc000130380, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/bradfitz/go/src/internal/poll/fd_unix.go:378 +0x1a0
net.(*netFD).accept(0xc000130380, 0xc00000e080, 0x0, 0x0)
        /home/bradfitz/go/src/net/fd_unix.go:239 +0x42
net.(*TCPListener).accept(0xc00000e070, 0xc000048748, 0xc000048750, 0x38)
        /home/bradfitz/go/src/net/tcpsock_posix.go:139 +0x2e
net.(*TCPListener).Accept(0xc00000e070, 0x7086d8, 0xc000010a00, 0x742ae0, 0xc00000e080)
        /home/bradfitz/go/src/net/tcpsock.go:260 +0x47
github.com/google/tcpproxy.(*Proxy).serveListener(0xc000010a00, 0xc0000704e0, 0x7416a0, 0xc00000e070, 0xc00005cf30, 0x1, 0x1)
        /home/bradfitz/src/github.com/google/tcpproxy/tcpproxy.go:216 +0x8d
created by github.com/google/tcpproxy.(*Proxy).Start
        /home/bradfitz/src/github.com/google/tcpproxy/tcpproxy.go:203 +0x20f

goroutine 16 [semacquire]:
internal/poll.runtime_Semacquire(0xc0001305a8)
        /home/bradfitz/go/src/runtime/sema.go:61 +0x39
internal/poll.(*FD).Close(0xc000130580, 0xc000130580, 0x0)
        /home/bradfitz/go/src/internal/poll/fd_unix.go:106 +0xde
net.(*netFD).Close(0xc000130580, 0xc000156000, 0x742af8)
        /home/bradfitz/go/src/net/fd_unix.go:185 +0x4f
net.(*conn).Close(0xc00000e080, 0x0, 0x0)
        /home/bradfitz/go/src/net/net.go:200 +0x4a
github.com/google/tcpproxy.(*DialProxy).HandleConn(0xc000010a40, 0x742ae0, 0xc00000e080)
        /home/bradfitz/src/github.com/google/tcpproxy/tcpproxy.go:388 +0x405
github.com/google/tcpproxy.(*Proxy).serveConn(0xc000010a00, 0x742ae0, 0xc00000e080, 0xc00005cf30, 0x1, 0x1, 0x1)
        /home/bradfitz/src/github.com/google/tcpproxy/tcpproxy.go:239 +0x2df
created by github.com/google/tcpproxy.(*Proxy).serveListener
        /home/bradfitz/src/github.com/google/tcpproxy/tcpproxy.go:221 +0x76

goroutine 35 [IO wait]:
internal/poll.runtime_pollWait(0x7f47b1a60d60, 0x72, 0x6)
        /home/bradfitz/go/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc000130698, 0x72, 0xffffffffffffff00, 0x73fbe0, 0x8de450)
        /home/bradfitz/go/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc000130698, 0x0, 0x400000, 0x2)
        /home/bradfitz/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.spliceDrain(0xd, 0xc000130680, 0x400000, 0x0, 0x0, 0x0)
        /home/bradfitz/go/src/internal/poll/splice_linux.go:92 +0x99
internal/poll.Splice(0xc000130580, 0xc000130680, 0x4000000000000000, 0x0, 0xc000049500, 0x0, 0x0, 0x0, 0x0)
        /home/bradfitz/go/src/internal/poll/splice_linux.go:57 +0x215
net.splice(0xc000130580, 0x73f040, 0xc00000e088, 0xc000156000, 0x4, 0xc000049648, 0x406c7c)
        /home/bradfitz/go/src/net/splice_linux.go:30 +0xa8
net.(*TCPConn).readFrom(0xc00000e080, 0x73f040, 0xc00000e088, 0xc000049680, 0x40b63d, 0x6a84a0)
        /home/bradfitz/go/src/net/tcpsock_posix.go:48 +0x46
net.(*TCPConn).ReadFrom(0xc00000e080, 0x73f040, 0xc00000e088, 0x7f47b1a61100, 0xc00000e080, 0x456e01)
        /home/bradfitz/go/src/net/tcpsock.go:103 +0x5d
io.copyBuffer(0x73f060, 0xc00000e080, 0x73f040, 0xc00000e088, 0x0, 0x0, 0x0, 0x1, 0x1000000000001, 0x1)
        /home/bradfitz/go/src/io/io.go:388 +0x303
io.Copy(0x73f060, 0xc00000e080, 0x73f040, 0xc00000e088, 0xc000088720, 0x0, 0x1)
        /home/bradfitz/go/src/io/io.go:364 +0x5a
github.com/google/tcpproxy.proxyCopy(0xc000070600, 0x73f060, 0xc00000e080, 0x73f040, 0xc00000e088)
        /home/bradfitz/src/github.com/google/tcpproxy/tcpproxy.go:428 +0x49
created by github.com/google/tcpproxy.(*DialProxy).HandleConn
        /home/bradfitz/src/github.com/google/tcpproxy/tcpproxy.go:385 +0x338

rax    0xca
rbx    0x916220
rcx    0x45dba3
rdx    0x0
rdi    0x916360
rsi    0x80
rbp    0x7ffc410514f8
rsp    0x7ffc410514b0
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0x8
r13    0x33
r14    0x73a9a0
r15    0x0
rip    0x45dba1
rflags 0x286
cs     0x33
fs     0x0
gs     0x0
exit status 2
FAIL    github.com/google/tcpproxy      18.931s

If I disable the splice support (f2316c2) in Go 1.11 (with just an added return statement at the top of splice_linux.go's func splice), then it works again.

/cc @acln0 @danderson

@bradfitz bradfitz added NeedsFix The path to resolution is known, but the work has not been done. release-blocker labels Jun 20, 2018
@bradfitz bradfitz added this to the Go1.11 milestone Jun 20, 2018
@bradfitz
Copy link
Contributor Author

This Close blocking is suspicious:

goroutine 18 [semacquire]:
internal/poll.runtime_Semacquire(0xc00014e028)
        /home/bradfitz/go/src/runtime/sema.go:61 +0x39
internal/poll.(*FD).Close(0xc00014e000, 0xc00014e000, 0x0)
        /home/bradfitz/go/src/internal/poll/fd_unix.go:106 +0xde
net.(*netFD).Close(0xc00014e000, 0xc000156000, 0x742dd8)
        /home/bradfitz/go/src/net/fd_unix.go:185 +0x4f
net.(*conn).Close(0xc000152000, 0x0, 0x0)
        /home/bradfitz/go/src/net/net.go:200 +0x4a
github.com/google/tcpproxy.(*DialProxy).HandleConn(0xc000010940, 0x742dc0, 0xc000152000)
        /home/bradfitz/src/github.com/google/tcpproxy/tcpproxy.go:399 +0x696
github.com/google/tcpproxy.(*Proxy).serveConn(0xc000010900, 0x742dc0, 0xc000152000, 0xc00005cd90, 0x1, 0x1, 0x5b2ac90b)
        /home/bradfitz/src/github.com/google/tcpproxy/tcpproxy.go:239 +0x2df
created by github.com/google/tcpproxy.(*Proxy).serveListener
        /home/bradfitz/src/github.com/google/tcpproxy/tcpproxy.go:221 +0x76

bradfitz added a commit to inetaf/tcpproxy that referenced this issue Jun 20, 2018
@acln0
Copy link
Contributor

acln0 commented Jun 20, 2018

Investigating.

@bradfitz
Copy link
Contributor Author

Note that we just worked around it with inetaf/tcpproxy@7f81f77 so if you try to reproduce, use the commit just before that.

@acln0
Copy link
Contributor

acln0 commented Jun 20, 2018

I can reproduce it. It hangs in exactly the same way on my machine.

@acln0
Copy link
Contributor

acln0 commented Jun 20, 2018

It seems like moving this defer to the top of HandleConn fixes the hang in the test, but it is still unclear why the hang was happening in the first place. Perhaps this is a clue. Still investigating.

EDIT: the following test, adapted from the https://github.com/google/tcpproxy code, seems to reproduce the issue reliably from within package net.

func TestSpliceIssue25985(t *testing.T) {
	front, err := newLocalListener("tcp")
	if err != nil {
		t.Fatal(err)
	}
	defer front.Close()
	back, err := newLocalListener("tcp")
	if err != nil {
		t.Fatal(err)
	}
	defer back.Close()

	proxy := func() error {
		src, err := front.Accept()
		if err != nil {
			return err
		}
		dst, err := Dial("tcp", back.Addr().String())
		if err != nil {
			return err
		}
		defer dst.Close()
		defer src.Close()
		errc := make(chan error, 1)
		go func() {
			_, err := io.Copy(src, dst)
			errc <- err
		}()
		go func() {
			_, err := io.Copy(dst, src)
			errc <- err
		}()
		return <-errc
	}

	proxyErrc := make(chan error, 1)
	go func() { proxyErrc <- proxy() }()

	toFront, err := Dial("tcp", front.Addr().String())
	if err != nil {
		t.Fatal(err)
	}

	io.WriteString(toFront, "foo")
	toFront.Close()

	fromProxy, err := back.Accept()
	if err != nil {
		t.Fatal(err)
	}
	defer fromProxy.Close()

	_, err = ioutil.ReadAll(fromProxy)
	if err != nil {
		t.Fatal(err)
	}

	<-proxyErrc
}

This test seems to reinforce the aforementioned observation about changing the order of the deferred Close calls. Furthermore, it exhibits very similar behavior to the original reported issue, including the blocking Close.

EDIT 2: I believe I've figured it out. I think it has to do with the way poll.Splice holds locks. Still not 100% sure the solution is the correct one, but I'll send a CL shortly. Sorry for the trouble.

@gopherbot
Copy link

Change https://golang.org/cl/120317 mentions this issue: internal/poll: use more fine-grained locking in Splice

@golang golang locked and limited conversation to collaborators Jun 21, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Projects
None yet
Development

No branches or pull requests

4 participants