net: dial tests are flaky on BSD #15157

Closed
bradfitz opened this issue Apr 6, 2016 · 16 comments
Labels
FrozenDueToAge
help wanted
NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.)
Testing (An issue that has been verified to require only test changes, not just a test failure.)
@bradfitz
Contributor

bradfitz commented Apr 6, 2016

These have been flaking a lot on OpenBSD ...

https://storage.googleapis.com/go-build-log/f99ca413/openbsd-amd64-gce58_83506189.log

--- FAIL: TestDialTimeoutFDLeak (1.89s)
    dial_test.go:164: got 96; want >= 100
--- FAIL: TestDialerDualStackFDLeak (0.80s)
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
--- FAIL: TestReadUnixgramWithUnnamedSocket (0.19s)
    unixsock_test.go:60: read unixgram /tmp/go-nettest180765192: i/o timeout
--- FAIL: TestDialTimeoutMaxDuration (0.40s)
    timeout_test.go:140: #0: Dial didn't return in an expected time
FAIL
FAIL    net 12.082s

/cc @mikioh

@bradfitz bradfitz added this to the Unplanned milestone Apr 6, 2016
gopherbot pushed a commit that referenced this issue Apr 6, 2016
Flaky tests are a distraction and cover up real problems.

File bugs instead and mark them as flaky.

This moves the net/http flaky test flagging mechanism to internal/testenv.

Updates #15156
Updates #15157
Updates #15158

Change-Id: I0e561cd2a09c0dec369cd4ed93bc5a2b40233dfe
Reviewed-on: https://go-review.googlesource.com/21614
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
@mikioh mikioh added the Testing label May 11, 2016
@bradfitz bradfitz changed the title from "net: dial tests are flaky on OpenBSD" to "net: dial tests are flaky on BSD" May 11, 2016
@bradfitz bradfitz modified the milestones: Go1.7Maybe, Unplanned May 11, 2016
@bradfitz
Contributor Author

Also on FreeBSD:

https://build.golang.org/log/73446e673c8f780dbfbc11aaab2b4f8e4daefb68

ok      mime/quotedprintable    0.156s
--- FAIL: TestDialerDualStackFDLeak (0.58s)
    dial_test.go:172: dial tcp: i/o timeout
    dial_test.go:172: dial tcp 127.0.0.1:59846: i/o timeout
    dial_test.go:172: dial tcp 127.0.0.1:59846: i/o timeout
    dial_test.go:172: dial tcp 127.0.0.1:59846: i/o timeout
    dial_test.go:172: dial tcp 127.0.0.1:59846: i/o timeout
    dial_test.go:172: dial tcp 127.0.0.1:59846: i/o timeout
    dial_test.go:172: dial tcp 127.0.0.1:59846: i/o timeout
FAIL
FAIL    net 2.256s
2016/05/11 17:45:36 Failed: exit status 1

@mikioh, @mdempsky, can one of you investigate?

@mdempsky
Member

Yeah, I'll look into the OpenBSD failures today.

@gopherbot

CL https://golang.org/cl/23244 mentions this issue.

gopherbot pushed a commit that referenced this issue May 19, 2016
Fixes #14717.
Updates #15157.

Change-Id: I7238b4fe39f3670c2dfe09b3a3df51a982f261ed
Reviewed-on: https://go-review.googlesource.com/23244
Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@adg adg modified the milestones: Go1.8, Go1.7Maybe Jul 18, 2016
@quentinmit quentinmit added the NeedsInvestigation label Oct 7, 2016
@rsc rsc modified the milestones: Go1.9, Go1.8 Nov 11, 2016
@gopherbot

CL https://golang.org/cl/34656 mentions this issue.

gopherbot pushed a commit that referenced this issue Dec 20, 2016
Updates #15157

Change-Id: Id280705f4382c3b2323f0eed786a400a184614de
Reviewed-on: https://go-review.googlesource.com/34656
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
@josharian
Contributor

Just saw what looked like a flake on a linux/amd64 race trybot run:

https://storage.googleapis.com/go-build-log/01e274f4/linux-amd64-race_4322dd7e.log

@mundaym
Member

mundaym commented Feb 8, 2017

Another possible flake on a linux/amd64 race trybot run:

https://storage.googleapis.com/go-build-log/f6b2f823/linux-amd64-race_be3b785d.log

@mdempsky
Member

mdempsky commented Feb 17, 2017

I spent a little time looking into the TestDialerDualStackFDLeak flake on OpenBSD last night.

I was able to repro the issue under ktrace, and Go doesn't appear to be doing anything obviously wrong. We make a non-blocking connect call, it returns EINPROGRESS, we register it with kqueue, and then the kevent syscall blocks for 6 seconds before the kernel reports that the connect failed.

The failure is always around 6 seconds, so I suspect it's something TCP timeout related.

Next I tried to repro the issue while running tcpdump on lo0, hoping it would hint at which part of the stack might be failing, but no success yet.

@bradfitz
Contributor Author

@mdempsky, thanks for investigating.

If we can't make progress on this, though, the openbsd builders are doing more harm than good with flaky tests.

I think it might be time to slap on a bunch of testenv.SkipFlaky(t, 15157) to all these tests on OpenBSD.

@mdempsky
Member

@bradfitz For net flakes on OpenBSD, I'm inclined to agree.

@mdempsky
Member

Doesn't seem to be limited to BSD: https://storage.googleapis.com/go-build-log/fa1eb023/linux-amd64-race_e3f8fc0c.log

--- FAIL: TestDialTimeoutFDLeak (0.59s)
	dial_test.go:136: got 99; want >= 100

@gopherbot

CL https://golang.org/cl/40498 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 12, 2017
It's flaky and distracting.

I'm not sure what it's testing, either. It hasn't saved us before.

Somebody can resurrect it if they have time.

Updates #15157

Change-Id: I27bbfe51e09b6259bba0f73d60d03a4d38711951
Reviewed-on: https://go-review.googlesource.com/40498
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
@broady broady modified the milestones: Go1.9Maybe, Go1.9 Jul 17, 2017
@bradfitz bradfitz modified the milestones: Go1.9Maybe, Go1.10 Jul 20, 2017
@ianlancetaylor ianlancetaylor modified the milestones: Go1.10, Unplanned Jan 3, 2018
@bcmills
Contributor

bcmills commented Aug 12, 2019

On darwin-386-10_14 (https://build.golang.org/log/6e8422bf8109adb9f942363402c03bd942fe0):

--- FAIL: TestDialTimeoutMaxDuration (0.36s)
    timeout_test.go:133: #0: Dial didn't return in an expected time
FAIL
FAIL	net	11.653s

@bcmills
Contributor

bcmills commented Dec 8, 2020

Filed the darwin failures separately as #43069.

@bcmills
Contributor

bcmills commented Dec 8, 2020

I don't see any recent failures for TestDialTimeoutFDLeak or TestReadUnixgramWithUnnamedSocket.

@bcmills
Contributor

bcmills commented Dec 8, 2020

The i/o timeout failure mode of TestDialerDualStackFDLeak now only occurs on Solaris (#43070).

@bcmills
Contributor

bcmills commented Dec 8, 2020

The remaining TestDialerDualStackFDLeak failures on BSD have a different failure mode from the one reported here, filed as #43071.

I'm going to close out this meta-issue, since there are now separate issues for the individual observed failure modes.

@bcmills bcmills closed this as completed Dec 8, 2020
@golang golang locked and limited conversation to collaborators Dec 8, 2021