Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: Test{Read,Write}TimeoutFluctuation failures on BSD variants #50725

Closed
bcmills opened this issue Jan 20, 2022 · 11 comments
Closed

os: Test{Read,Write}TimeoutFluctuation failures on BSD variants #50725

bcmills opened this issue Jan 20, 2022 · 11 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Jan 20, 2022

--- FAIL: TestReadTimeoutFluctuation (1.04s)
    timeout_test.go:272: Read took over 1s; expected 0.1s
--- FAIL: TestWriteTimeoutFluctuation (1.06s)
    timeout_test.go:324: Write took over 1s; expected 0.1s
FAIL
FAIL	os	1.975s

greplogs --dashboard -md -l -e '(?ms)FAIL: Test(?:Read|Write)TimeoutFluctuation.*FAIL\s+os\s' --since=2021-01-01

2022-01-18T23:59:40-50869f3/dragonfly-amd64
2021-11-18T06:05:29-f6647f2/freebsd-arm-paulzhol
2021-11-03T00:07:03-d6f7203/dragonfly-amd64
2021-09-24T14:52:47-217507e/netbsd-arm-bsiegert
2021-09-21T20:28:50-2fc7df9/dragonfly-amd64
2021-04-16T01:54:27-f08c552/dragonfly-amd64-5_8

This may or may not be related in some way to #50189.

The most frequent recent failures are on dragonfly-amd64 (CC @tuxillo). A good first step might be to change the test to panic instead of calling t.Errorf or t.Fatalf: a panic would at least dump any goroutines that are stuck so that we can see which goroutines and/or system calls are still in flight at the time of failure.

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 20, 2022
@bcmills bcmills added this to the Go1.19 milestone Jan 20, 2022
@bcmills
Copy link
Contributor Author

bcmills commented May 9, 2022

greplogs -l -e '(?ms)FAIL: Test(?:Read|Write)TimeoutFluctuation.*FAIL\s+os\s' --since=2022-01-19
2022-05-06T17:41:30-091e913/dragonfly-amd64

@tuxillo
Copy link
Contributor

tuxillo commented May 9, 2022

Any action to be taken on my side?

@bcmills
Copy link
Contributor Author

bcmills commented May 9, 2022

I still think changing the test to panic would be a good first step — it would be nice to know what the test is blocked on when it fails!

@tuxillo
Copy link
Contributor

tuxillo commented May 9, 2022

Okay, let me know if we need to check anything OS specific on our side :)

@ianlancetaylor
Copy link
Contributor

Rolling forward to 1.20. Please comment if you disagree. Thanks.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.19, Go1.20 Jun 24, 2022
@bcmills
Copy link
Contributor Author

bcmills commented Jun 29, 2022

Another failure on https://storage.googleapis.com/go-build-log/6db3b12e/openbsd-amd64-68_317a0f16.log (an openbsd-amd64-68 TryBot)

If we're not going to prioritize a fix for these tests, they should be skipped to reduce noise when testing unrelated changes.

@rsc
Copy link
Contributor

rsc commented Jun 29, 2022

An 0.9s delay during all.bash on a slow machine (running many different test binaries at once, which can be causing long rescheduling delays) does not seem out of the ordinary to me. I think this test is too picky.

@ianlancetaylor
Copy link
Contributor

These timeout fluctuation tests were added (by me) in https://go.dev/cl/71770. They were directly copied from the net package. Since then, the net package versions of those tests have been updated, by @bcmills, in https://go.dev/cl/365334, for #36108.

The first step here might be to copy CL 365334 into the os package.

@bcmills bcmills added the Testing An issue that has been verified to require only test changes, not just a test failure. label Jun 29, 2022
@bcmills
Copy link
Contributor Author

bcmills commented Jun 29, 2022

(Note that that fix also needed a followup in CL 372215.)

@ianlancetaylor
Copy link
Contributor

I will prepare a CL.

@gopherbot
Copy link

Change https://go.dev/cl/415234 mentions this issue: os: simplify deadline fluctuation tests

jproberts pushed a commit to jproberts/go that referenced this issue Aug 10, 2022
This applies the net package CL 365334, CL 366176, CL 372215 to the os
package.

CL 365334:

    These tests were checking for fairly narrow timing windows, but were
    running in parallel and heavily dependent on timer and goroutine
    scheduling. This change eliminates unnecessary goroutines, runs the
    tests sequentially (dramatically shortening the timeouts to reduce the
    penalty of doing so), and uses timestamp comparison instead of
    background timers to hopefully gain some robustness from monotonic
    timestamps.

    Many of the other tests from this package would benefit from similar
    simplifications, which we can apply if and when we notice flaky
    failures or want to improve the latency of running the test.

CL 366176:

    It appears that at least the OpenBSD kernel gets sloppier the longer
    the timeout we give it, up to an observed overhead of around 25%.
    Let's give it a little more than that (33%) in the comparison, and
    also increase the growth curve to match the actual observed times
    instead of exponential initial growth.

CL 372215:

    Decrease the slop everywhere else, since NetBSD and OpenBSD seem to be
    the only ones that miss by that much.

For golang#36108
For golang#50189
Fixes golang#50725 (we hope)

Change-Id: I0854d27af67ca9fcf0f9d9e4ff67acff4c2effc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/415234
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
@golang golang locked and limited conversation to collaborators Jun 30, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
Development

No branches or pull requests

5 participants