Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: TestTCPReadWriteAllocs flakiness #8859

Closed
mikioh opened this issue Oct 3, 2014 · 12 comments
Closed

net: TestTCPReadWriteAllocs flakiness #8859

mikioh opened this issue Oct 3, 2014 · 12 comments
Milestone

Comments

@mikioh
Copy link
Contributor

mikioh commented Oct 3, 2014

i now see the following on darwin w/ tip:

go test -run=TestTCPReadWriteMallocs -cpu=1,2,4,8,16,32
--- FAIL: TestTCPReadWriteMallocs (0.02s)
    tcp_test.go:531: Got 1 allocs, want 0
--- FAIL: TestTCPReadWriteMallocs-2 (0.02s)
    tcp_test.go:531: Got 1 allocs, want 0
--- FAIL: TestTCPReadWriteMallocs-4 (0.02s)
    tcp_test.go:531: Got 1 allocs, want 0
--- FAIL: TestTCPReadWriteMallocs-8 (0.02s)
    tcp_test.go:531: Got 1 allocs, want 0
--- FAIL: TestTCPReadWriteMallocs-16 (0.02s)
    tcp_test.go:531: Got 1 allocs, want 0
--- FAIL: TestTCPReadWriteMallocs-32 (0.02s)
    tcp_test.go:531: Got 1 allocs, want 0
FAIL

not tested on all platforms yet, but freebsd and linux work fine.

% hg id
920cde0a8b2d+ tip

% uname -a
Darwin airborne.local 13.4.0 Darwin Kernel Version 13.4.0: Sun Aug 17 19:50:11 PDT 2014;
root:xnu-2422.115.4~1/RELEASE_X86_64 x86_64
@mikioh
Copy link
Contributor Author

mikioh commented Oct 3, 2014

Comment 1:

Labels changed: added repo-main.

@ianlancetaylor
Copy link
Contributor

Comment 2:

Labels changed: added release-go1.5.

@mikioh mikioh added new labels Oct 3, 2014
@bradfitz bradfitz modified the milestone: Go1.5 Dec 16, 2014
@mikioh
Copy link
Contributor Author

mikioh commented Feb 24, 2015

Closed by 77a2113.

@mikioh mikioh closed this as completed Feb 24, 2015
@josharian
Copy link
Contributor

This still breaks on my OS X machine.

It is an intermittent failure, but it happens enough to be frustrating. Anecdotally, it happens more when running all.bash or the full net tests than just the lone test. I am in Portugal at the moment with flaky internet connections; it is possible that the failures are correlated to internet problems, although I have been unable to pin it down. I have also been unable to reproduce this with the test converted to a benchmark (I was hoping to use -memprofile to point to the source of the allocations).

I'm running 10.10.2 (14C109) with the pprof-enabling kernel patch.

If you have suggestions for other things to try to diagnose this, I'm happy to help.

@josharian josharian reopened this Feb 26, 2015
@mikioh mikioh changed the title net: TestTCPReadWriteMallocs flakiness on darwin net: TestTCPReadWriteAllocs flakiness on darwin Feb 26, 2015
@mikioh
Copy link
Contributor Author

mikioh commented Feb 26, 2015

with the pprof-enabling kernel patch.

ugh... this one? https://github.com/rsc/pprof_mac_fix

@josharian
Copy link
Contributor

Yes, that one. Can't live without it, I'm afraid.

@mikioh mikioh changed the title net: TestTCPReadWriteAllocs flakiness on darwin net: TestTCPReadWriteAllocs flakiness on darwin with pprof_mac_fix Feb 26, 2015
@josharian
Copy link
Contributor

This is driving me crazy. Sending a CL to disable for now.

josharian added a commit that referenced this issue Feb 28, 2015
Having this test fail, as it does reliably for me,
makes working frustrating. Disable it for now,
until we can diagnose the issue.

Update issue #8859.

Change-Id: I9dda30d60793e7a51f48f445c78ccb158068cc25
Reviewed-on: https://go-review.googlesource.com/6381
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@bradfitz
Copy link
Contributor

bradfitz commented Mar 1, 2015

I'm also going to disable Dragonfly. Probably related.

bradfitz added a commit that referenced this issue Mar 1, 2015
Update #8859
Update #10042

Change-Id: Idc7eadb447b73563ce9085e50c2042652442c2d9
Reviewed-on: https://go-review.googlesource.com/6412
Reviewed-by: Minux Ma <minux@golang.org>
@josharian
Copy link
Contributor

I managed to get a stack trace, and I think I have a diagnosis.

Under some network conditions (I don't understand which), the SYS_READ syscall in syscall.read fills the buffer and sets e1 to 35 (EAGAIN). 35 is converted to an error a few lines down, which allocates. The error is passed up to io.ReadFull via io.ReadAtLeast. Both calls explicitly document that they return a nil error iff the buffer is not full. In this case, the buffer gets filled on the first request, so the error from syscall.read gets swallowed. As a result, the t.Fatal is not triggered, so we never actually saw the EAGAIN in the test results.

It's not obvious to me what the right fix is. Perhaps we should ignore EAGAIN in syscall.read if we have a full buffer? I also don't know whether this a complete diagnosis.

Incidentally, this has nothing to do with the kernel patch. The patch is extremely narrow in scope and concerned with the handling of profiling signals, which are not even in use when this test fails.

@bradfitz
Copy link
Contributor

bradfitz commented Mar 4, 2015

@mikioh mikioh changed the title net: TestTCPReadWriteAllocs flakiness on darwin with pprof_mac_fix net: TestTCPReadWriteAllocs flakiness Mar 5, 2015
@mikioh
Copy link
Contributor Author

mikioh commented Mar 5, 2015

@josharian,

Thanks for the investigation. I now understand that some additional stuff (e.g., pprof_mac_fix) or environment condition to the kernel can easily shake scheduling for uio_vectors and make EAGAIN notifications. That means that assuming netpoll hotpaths sail on calm sea during test is pretty naive. Perhaps permitting netpoll hotpaths allocate a few stuff in TestReadWriteAllocs might make sense.

@mikioh
Copy link
Contributor Author

mikioh commented Mar 5, 2015

permitting...

either applying change 6701 or permitting...

bradfitz added a commit that referenced this issue Mar 27, 2015
Update #8859

Change-Id: I5b0005b308e83954a495f06d27b7d8d30e813820
Reviewed-on: https://go-review.googlesource.com/8193
Reviewed-by: Ian Lance Taylor <iant@golang.org>
bradfitz added a commit that referenced this issue Mar 27, 2015
The previously-submitted https://go-review.googlesource.com/#/c/6701
didn't include dragonfly, freebsd, nacl, netbsd, openbsd, or solaris.
(or things like darwin/arm or ppc64 or arm64)

So do them all.

Note I had to copy the function into tables_nacl.go. I found that
preferable to creating a new file just to have suitable build
tags. It's likely this function will be mirrored to plan9 and windows
later too, each of the 4 with their own policy of which error values
are common.

The corresponding x/sys CL for this CL is https://golang.org/cl/8190
but it excludes nacl (not in x/sys) and solaris (already broken).

Update Issue #8859

Change-Id: I91902615692b29b69c905edd9e126a26337294f6
Reviewed-on: https://go-review.googlesource.com/8192
Reviewed-by: Rob Pike <r@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@golang golang locked and limited conversation to collaborators Jun 25, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants