net: TestVariousDeadlines is flaky #19519

Closed
clausecker opened this issue Mar 12, 2017 · 16 comments
Labels: FrozenDueToAge, NeedsInvestigation, OS-Darwin, OS-FreeBSD, Testing

Comments

@clausecker

What version of Go are you using (go version)?

go version go1.8 freebsd/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="freebsd"
GOOS="freebsd"
GOPATH="/home/fuz/src/go"
GORACE=""
GOROOT="/home/fuz/go"
GOTOOLDIR="/home/fuz/go/pkg/tool/freebsd_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build535249256=/tmp/go-build -gno-record-gcc-switches"
CXX="clang++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-I/home/fuz/include"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-L/home/fuz/lib"

uname -a output:

FreeBSD miso 11.0-RELEASE-p2 FreeBSD 11.0-RELEASE-p2 #0: Mon Oct 24 06:55:27 UTC 2016     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

ifconfig output:

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
	ether 34:e6:d7:60:b2:c5
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128 
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2 
	inet 127.0.0.1 netmask 0xff000000 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	groups: lo 
wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 1c:65:9d:0d:70:31
	inet 10.53.43.185 netmask 0xffff0000 broadcast 10.53.255.255 
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: IEEE 802.11 Wireless Ethernet MCS mode 11na
	status: associated
	ssid clt channel 100 (5500 MHz 11a ht/40+) bssid fc:5b:39:f6:fa:e9
	regdomain 106 indoor ecm authmode WPA2/802.11i privacy ON
	deftxkey UNDEF AES-CCM 2:128-bit AES-CCM 3:128-bit txpower 30 bmiss 7
	mcastrate 6 mgmtrate 6 scanvalid 60 ampdulimit 64k ampdudensity 8
	shortgi wme burst roaming MANUAL bintval 102
	groups: wlan 

sysctl hw.model output:

hw.model: Intel(R) Core(TM) i7-4910MQ CPU @ 2.90GHz

What did you do?

I tried to update my Go 1.6 installation to Go 1.8 by removing the old installation, downloading the source code from here, and running all.bash.

What did you expect to see?

A successful build followed by a successful pass through the test suite.

What did you see instead?

A successful build followed by TestVariousDeadlines failing:

--- FAIL: TestVariousDeadlines (8.01s)
	timeout_test.go:877: 1ns run 1/1
	timeout_test.go:902: for 1ns run 1/1, good client timeout after 53.221µs, reading 0 bytes
	timeout_test.go:916: for 1ns run 1/1, timeout waiting for server to finish writing
FAIL
FAIL	net	9.515s
@bradfitz added the OS-FreeBSD and Testing labels Mar 21, 2017
@bradfitz added this to the Go1.9 milestone Mar 21, 2017
@josharian
Contributor

Not just freebsd. Just failed locally on darwin/amd64:

--- FAIL: TestVariousDeadlines (5.03s)
	timeout_test.go:878: 1ns run 1/1
	timeout_test.go:903: for 1ns run 1/1, good client timeout after 38.928µs, reading 0 bytes
	timeout_test.go:913: for 1ns run 1/1, server in 177.208µs wrote 32768: readfrom tcp4 127.0.0.1:59054->127.0.0.1:59055: write tcp4 127.0.0.1:59054->127.0.0.1:59055: write: broken pipe
	timeout_test.go:878: 2ns run 1/1
	timeout_test.go:903: for 2ns run 1/1, good client timeout after 4.776µs, reading 0 bytes
	timeout_test.go:913: for 2ns run 1/1, server in 89.252µs wrote 32768: readfrom tcp4 127.0.0.1:59054->127.0.0.1:59056: write tcp4 127.0.0.1:59054->127.0.0.1:59056: write: broken pipe
	timeout_test.go:878: 5ns run 1/1
	timeout_test.go:903: for 5ns run 1/1, good client timeout after 8.099µs, reading 0 bytes
	timeout_test.go:917: for 5ns run 1/1, timeout waiting for server to finish writing
FAIL
FAIL	net	6.371s

@josharian changed the title from "net: TestVariousDeadlines fails on FreeBSD" to "net: TestVariousDeadlines is flaky" Apr 13, 2017
@broady modified the milestones: Go1.9Maybe, Go1.9 Jul 17, 2017
@bradfitz modified the milestones: Go1.9Maybe, Go1.10 Jul 20, 2017
@ianlancetaylor added the NeedsInvestigation label Jan 3, 2018
@ianlancetaylor modified the milestones: Go1.10, Unplanned Jan 3, 2018
@josharian modified the milestones: Unplanned, Go1.11 Mar 30, 2018
@josharian
Contributor

This continues to be one of the more common flakes on the dashboard. Moving the milestone back to 1.11.

@mvdan
Member

mvdan commented Apr 7, 2018

Any hints on how to reproduce this issue? I just failed to reproduce it on my linux/amd64 laptop, reaching 5000 successful runs with:

stress -p 256 ./net.test -test.run TestVariousDeadlines$ -test.cpu 10

It might be useful to start pasting builder failure logs here to see if there is a pattern; for example, this might show up only on BSDs.

@josharian
Contributor

@bcmills
Contributor

bcmills commented Apr 26, 2018

Here's another repro in the builders (plan9-386):
https://build.golang.org/log/2f56b2bc471c2d8d68f862238d750ffcfc88c34d

@gopherbot modified the milestones: Go1.11, Unplanned May 23, 2018
@millerresearch
Contributor

The plan9_386 failure is very frequent but slightly different: it's always "timeout (5s) waiting for client to timeout (...) reading", whereas the FreeBSD and Darwin failures reported above are "timeout waiting for server to finish writing".

On plan9_386, I have never observed the failure on real hardware or on qemu running on an otherwise idle server. This weekend I managed to reproduce it (very intermittently) by simulating a busy server: running qemu in parallel with many CPU-bound "nice --10" processes. My hypothesis is that it's the measurement of time that's flawed, not the network implementation.

@fuzxxl and @josharian, are your observed failures on real hardware or on virtual machines?

@clausecker
Author

@millerresearch The failure was observed on real hardware (a Dell Precision M4800 running FreeBSD 11.0, I think). I am currently unable to reproduce it with Go 1.10 on the same machine, which now runs FreeBSD 11.2.

@bcmills
Contributor

bcmills commented Dec 12, 2018

@bcmills
Contributor

bcmills commented Mar 13, 2019

linux-s390x-ibm: https://build.golang.org/log/c82cf7bb1f5da349bf7b69f741d30eb89b13d143

--- FAIL: TestVariousDeadlines (0.00s)
    --- FAIL: TestVariousDeadlines/750ns-1 (0.00s)
        timeout_test.go:445: client Copy = 8192, read |0: not pollable; want timeout

@bcmills
Contributor

bcmills commented May 30, 2019

@katiehockman
Contributor

This is flaking for darwin-amd64-nocgo as well:
https://build.golang.org/log/671f5a8fc07f8d33a418dd2966c660389e4e230f

/cc @mikioh

@bcmills
Contributor

bcmills commented Jun 26, 2019

The linux-s390x-ibm failure mode in #19519 (comment) is different.

The darwin-amd64-nocgo failure mode seems to be the same as for FreeBSD. IIRC, macOS is also derived from BSD; perhaps the net package (or the test) relies on assumptions about system calls that don't hold for the BSD flavors of those syscalls?

@gopherbot

Change https://golang.org/cl/184137 mentions this issue: net: deflake TestVariousDeadlines

@gopherbot

Change https://golang.org/cl/184157 mentions this issue: crypto/tls: deflake localPipe in tests

gopherbot pushed a commit that referenced this issue Jun 29, 2019
The localPipe implementation assumes that every successful net.Dial
results in exactly one successful listener.Accept. I don't believe this
is guaranteed by essentially any operating system. For this test, we're
seeing flakes on dragonfly (#29583).

But see also #19519, flakes due to the same assumption on FreeBSD
and macOS in package net's own tests.

This CL rewrites localPipe to try a few times to get a matching pair
of connections on the dial and accept side.

Fixes #29583.

Change-Id: Idb045b18c404eae457f091df20456c5ae879a291
Reviewed-on: https://go-review.googlesource.com/c/go/+/184157
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
@golang locked and limited conversation to collaborators Jun 27, 2020