net: TestDialerDualStack fails when net.inet.tcp.blackhole=2 #12052
Comments
Does the failing test pass if you run it directly without a timeout?
Yes, it does, with a few warnings that I think are clang-specific:
~/golang/src$ go test -test.short net
net
/usr/local/go/src/net/cgo_unix.go:53:31: warning: unknown attribute 'gcc_struct' ignored [-Wattributes]
net
cc: warning: argument unused during compilation: '-pthread'
Just to be sure, I redid it with the correct CC and GOROOT_BOOTSTRAP variables, and then everything seems fine:
~/golang/src$ CC=clang GOROOT_BOOTSTRAP=/usr/local/go go test -test.short net
Marking for 1.6 to make sure the timeouts are long enough.
The set of net package test cases in short mode usually takes 2-4s, 10-20s on poor processors. @aperum can you show us the hardware profile (and load average) of your node under the test?
Hardware is an i7 with 32GB RAM:
$ sysctl hw.model
This is a snapshot while building go:
vmstat 20
The 100% idle lines are actually from while the net tests are running. I also redid the test and made sure nothing is blocked by the firewall. The process hanging around until the timeout hits seems to be this:
root 71283 0,0 0,0 46212 15896 11 S+ 1:44pm 0:00,33 /tmp/go-build754929443/net/http/_test/http.test -test.short=true -test.timeout=3m0s
Attaching truss to this process shows many repeats of the following:
$ truss -f -s 255 -p 71283 -d
Thanks. Can you try the following and let us know which one stalls for fault isolation.
and
I used the freshly compiled go tool instead of the bootstrap version for these tests:
~/golang/src/net$ GOROOT=/root/golang ~/golang/bin/go test -c
~/golang/src/net/http$ GOROOT=/root/golang ~/golang/bin/go test -c
There are a lot of http tests taking >600 secs. But I think this is a local problem, as I can't reproduce them on another machine running the exact same OS version. So the only common failure is the TestDialerDualStack test from the net test suite. A tcpdump of the very long-running http tests reveals countless packets 8128 bytes in length filled with 'a's. But as said, on another machine the http test suite runs fine without a failure...
I uploaded the full transcript of the net and http tests, sans interface addresses, here:
Debugging this a bit further, I extracted the dialClosedPort function from dial_test.go into a small standalone program. Using this and tcpdump, I saw that no RST packets were being sent.
$ sysctl net.inet.tcp.blackhole
This setting usually prevents our SYN-flooded servers from responding with a RST flood, which is a good thing. Sadly it is a global option and not per interface, so it also applies to the loopback device. Turning this setting off fixes the original problem. It seems this test was introduced with 1.5, so it never blew up before. I still get the http test errors, but as I only get them on one of two machines I'll consider them a local problem for now.
@aperum, thanks for the investigation. Please open a separate issue for the net/http failures if you consistently see that odd (memory corruption?) behavior. /CC @pmarks-net
In theory, dialClosedPort could detect this configuration and skip all relevant tests, but this is probably working as intended: if you blackhole localhost, you're gonna have a bad time.
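As a rough illustration of what such detection might look like, the sketch below shells out to `sysctl -n net.inet.tcp.blackhole` and treats any non-zero value as "closed-port dials will not be refused". This is only an assumption about one possible approach, not something the net package does; the function names parseBlackhole and tcpBlackholed are hypothetical, and on systems without that sysctl the command simply fails and the state is reported as unknown.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// parseBlackhole (hypothetical helper) converts the output of
// `sysctl -n net.inet.tcp.blackhole` into an integer.
func parseBlackhole(out string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(out))
}

// tcpBlackholed (hypothetical helper) reports whether the FreeBSD TCP
// blackhole setting is enabled. Values 1 and 2 both drop SYNs sent to
// closed ports without answering with a RST.
func tcpBlackholed() (bool, error) {
	out, err := exec.Command("sysctl", "-n", "net.inet.tcp.blackhole").Output()
	if err != nil {
		return false, err // sysctl missing or OID unknown on this OS
	}
	v, err := parseBlackhole(string(out))
	if err != nil {
		return false, err
	}
	return v > 0, nil
}

func main() {
	on, err := tcpBlackholed()
	if err != nil {
		fmt.Println("could not read net.inet.tcp.blackhole:", err)
		return
	}
	fmt.Println("TCP blackhole enabled:", on)
}
```

A test helper could call something like tcpBlackholed and skip when it returns true, at the cost of adding an OS-specific check to otherwise portable tests.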
It's quite common to use this option on hosts facing the internet, but I agree it is a pity that this option also applies to loopback devices. But as you said, this is more of an OS issue than a Go issue, so I'll close this report. Thanks for the debug help!
$ pkg version -e go
go-1.4.2,1
This is the stock ports version used to bootstrap the build.
$ uname -m
amd64
$ freebsd-version -ku
10.1-RELEASE-p16
10.1-RELEASE-p16
With a freshly cloned ~/golang:
~/golang$ git checkout go1.5rc1
~/golang/src$ CC=clang GOROOT_BOOTSTRAP=/usr/local/go ./all.bash
...
ok mime 0.011s
ok mime/multipart 0.123s
ok mime/quotedprintable 0.342s
panic: test timed out after 3m0s
goroutine 166 [running]:
testing.startAlarm.func1()
/home/aperum/golang/src/testing/testing.go:703 +0x132
created by time.goFunc
/home/aperum/golang/src/time/sleep.go:129 +0x3a
goroutine 1 [chan receive]:
testing.RunTests(0x7986c8, 0x88e4a0, 0x9c, 0x9c, 0x51af01)
/home/aperum/golang/src/testing/testing.go:562 +0x8ad
testing.(*M).Run(0xc820053ef8, 0x80109e658)
/home/aperum/golang/src/testing/testing.go:494 +0x70
net.TestMain(0xc820053ef8)
/home/aperum/golang/src/net/main_test.go:50 +0x2b
main.main()
net/_test/_testmain.go:410 +0x113
goroutine 17 [syscall, locked to thread]:
runtime.goexit()
/home/aperum/golang/src/runtime/asm_amd64.s:1696 +0x1
goroutine 185 [IO wait]:
net.runtime_pollWait(0x80109f7b0, 0x77, 0x4502d0)
/home/aperum/golang/src/runtime/netpoll.go:157 +0x60
net.(*pollDesc).Wait(0xc820107f00, 0x77, 0x0, 0x0)
/home/aperum/golang/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitWrite(0xc820107f00, 0x0, 0x0)
/home/aperum/golang/src/net/fd_poll_runtime.go:82 +0x36
net.(*netFD).connect(0xc820107ea0, 0x0, 0x0, 0x80109e728, 0xc82011c240, 0xecd555c6f, 0xe36d93, 0x890fa0, 0x0, 0x0)
/home/aperum/golang/src/net/fd_unix.go:114 +0x1f6
net.(*netFD).dial(0xc820107ea0, 0x80109f928, 0x0, 0x80109f928, 0xc820117470, 0xecd555c6f, 0xe36d93, 0x890fa0, 0x0, 0x0)
/home/aperum/golang/src/net/sock_posix.go:137 +0x351
net.socket(0x720be0, 0x3, 0x2, 0x1, 0x0, 0xc820117400, 0x80109f928, 0x0, 0x80109f928, 0xc820117470, ...)
/home/aperum/golang/src/net/sock_posix.go:89 +0x411
net.internetSocket(0x720be0, 0x3, 0x80109f928, 0x0, 0x80109f928, 0xc820117470, 0xecd555c6f, 0xc800e36d93, 0x890fa0, 0x1, ...)
/home/aperum/golang/src/net/ipsock_posix.go:160 +0x141
net.dialTCP(0x720be0, 0x3, 0x0, 0xc820117470, 0xecd555c6f, 0xc800e36d93, 0x890fa0, 0x2, 0x0, 0x0)
/home/aperum/golang/src/net/tcpsock_posix.go:171 +0x11e
net.dialSingle(0xc8201f4800, 0x80109e590, 0xc820117470, 0xecd555c6f, 0xe36d93, 0x890fa0, 0x0, 0x0, 0x0, 0x0)
/home/aperum/golang/src/net/dial.go:364 +0x3f5
net.dialSerial.func1(0xecd555c6f, 0xe36d93, 0x890fa0, 0x0, 0x0, 0x0, 0x0)
/home/aperum/golang/src/net/dial.go:336 +0x75
net.dial(0x720be0, 0x3, 0x80109e590, 0xc820117470, 0xc8201dba40, 0xecd555c6f, 0xe36d93, 0x890fa0, 0x0, 0x0, ...)
/home/aperum/golang/src/net/fd_unix.go:40 +0x60
net.dialSerial(0xc8201f4800, 0xc82011c220, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/aperum/golang/src/net/dial.go:338 +0x760
net.(*Dialer).Dial(0xc8201dbf08, 0x720be0, 0x3, 0xc820011c30, 0xf, 0x0, 0x0, 0x0, 0x0)
/home/aperum/golang/src/net/dial.go:232 +0x50f
net.TestDialerDualStack(0xc82016e2d0)
/home/aperum/golang/src/net/dial_test.go:662 +0x660
testing.tRunner(0xc82016e2d0, 0x88e638)
/home/aperum/golang/src/testing/testing.go:456 +0x98
created by testing.RunTests
/home/aperum/golang/src/testing/testing.go:561 +0x86d
goroutine 195 [IO wait]:
net.runtime_pollWait(0x80109f1b0, 0x72, 0xc8200920b0)
/home/aperum/golang/src/runtime/netpoll.go:157 +0x60
net.(*pollDesc).Wait(0xc8200e5790, 0x72, 0x0, 0x0)
/home/aperum/golang/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc8200e5790, 0x0, 0x0)
/home/aperum/golang/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).accept(0xc8200e5730, 0x0, 0x80105a028, 0xc8201141c0)
/home/aperum/golang/src/net/fd_unix.go:408 +0x27c
net.(*TCPListener).AcceptTCP(0xc8200e0018, 0x722d80, 0x0, 0x0)
/home/aperum/golang/src/net/tcpsock_posix.go:254 +0x4d
net.(*TCPListener).Accept(0xc8200e0018, 0x0, 0x0, 0x0, 0x0)
/home/aperum/golang/src/net/tcpsock_posix.go:264 +0x3d
net.TestDialerDualStack.func2(0xc8200e5650, 0x80109f8f0, 0xc8200e0018)
/home/aperum/golang/src/net/dial_test.go:638 +0x27
net.(*dualStackServer).buildup.func1(0x7989e8, 0xc8200e5650, 0x1)
/home/aperum/golang/src/net/mockserver_test.go:138 +0x77
created by net.(*dualStackServer).buildup
/home/aperum/golang/src/net/mockserver_test.go:140 +0x73
FAIL net 180.011s
ok net/http 6.438s
ok net/http/cgi 0.641s
...
I can reproduce this also using release-branch.go1.5 and on two different machines.