Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/sys/unix: TestEventPortErrors failures #58259

Closed
gopherbot opened this issue Feb 2, 2023 · 11 comments
Closed

x/sys/unix: TestEventPortErrors failures #58259

gopherbot opened this issue Feb 2, 2023 · 11 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-illumos OS-Solaris
Milestone

Comments

@gopherbot
Copy link

#!watchflakes
post <- pkg == "golang.org/x/sys/unix" && test == "TestEventPortErrors"

Issue created automatically to collect these failures.

Example (log):

--- FAIL: TestEventPortErrors (0.00s)
    syscall_solaris_test.go:186: unexpected lack of timeout

watchflakes

@gopherbot gopherbot added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. compiler/runtime Issues related to the Go compiler and/or runtime. labels Feb 2, 2023
@gopherbot gopherbot added this to the Unreleased milestone Feb 2, 2023
@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "golang.org/x/sys/unix" && test == "TestEventPortErrors"
2023-01-31 16:01 illumos-amd64 sys@e7d7f631 go@4fe46cee x/sys/unix.TestEventPortErrors (log)
--- FAIL: TestEventPortErrors (0.00s)
    syscall_solaris_test.go:186: unexpected lack of timeout

watchflakes

@bcmills
Copy link
Contributor

bcmills commented Feb 2, 2023

This test is Solaris-specific (attn @golang/solaris @golang/illumos), and was added in CL 324630 and adjusted in CL 422338 (attn @nshalman @tklauser @ianlancetaylor).

This test isn't in line with https://go.dev/wiki/CodeReviewComments#useful-test-failures: it doesn't log what actually happened. In particular it would be useful to know what the actual error returned by port.GetOne was, not just the fact that it wasn't ETIME. So probably a good starting point would be to fix up the test to log more information on failure.

@gopherbot
Copy link
Author

Change https://go.dev/cl/465015 mentions this issue: unix: improve flaky solaris test logging

@bcmills
Copy link
Contributor

bcmills commented Feb 2, 2023

See previously #45643.

@jclulow
Copy link
Contributor

jclulow commented Feb 2, 2023

It would really help us if we could get the OS-illumos label put on these sorts of issues when they apply to illumos as well as Solaris, FWIW.

@bcmills
Copy link
Contributor

bcmills commented Feb 2, 2023

Good point — my apologies for not thinking to add it!

gopherbot pushed a commit to golang/sys that referenced this issue Feb 2, 2023
relates to golang/go#58259

Change-Id: I47bbf4afb59b6e98571297ba53d7f9370712e030
Reviewed-on: https://go-review.googlesource.com/c/sys/+/465015
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@nshalman
Copy link

nshalman commented Feb 2, 2023

Per the discussion in https://go.dev/cl/465015 I think the next step is to make the tests handle the syscalls returning EINTR more gracefully. The occurrence reported by the bot isn't the only place where it could happen.

Rambling to myself:
The tricky part is that I don't have any clever ideas of how to intentionally exercise those code paths in the tests...
Perhaps there's a clever way with dtrace?
The important thing would be to document how to trigger the EINTR to make sure that the tests when running in CI will do all the right things, not to actually trigger EINTR in every CI run (I think....)

EDIT to add:
Probably don't need dtrace. Better is to define a maximum number of times we'll tolerate being interrupted during such tests, wrap the call in a for loop, loop around on EINTR, break out on the expected value (whether nil or ETIME or whatever) and fail the test if we exit the loop with EINTR again. The happy path will enter the loop once, the flaky path will probably only loop once or twice, and the failure path can be exercised by temporarily adding a line explicitly setting err to EINTR.

That code might get a little verbose, so it might make sense to write one wrapper each for Get and GetOne.

@nshalman
Copy link

nshalman commented Feb 3, 2023

My current sketch (needs some light commenting, and to be used in all the tests currently calling GetOne...):

diff --git a/unix/syscall_solaris_test.go b/unix/syscall_solaris_test.go
index 6c2b906..d79e9fb 100644
--- a/unix/syscall_solaris_test.go
+++ b/unix/syscall_solaris_test.go
@@ -18,6 +18,24 @@ import (
        "golang.org/x/sys/unix"
 )

+var MAX_EINTR = 10
+
+func getOneRetry(t *testing.T, p *unix.EventPort, timeout *unix.Timespec) (e *unix.PortEvent, err error) {
+       t.Helper()
+       for i := 1; i < MAX_EINTR; i++ {
+               e, err = p.GetOne(timeout)
+               // to exercise the MAX_EINTR uncomment the following line
+               //err = unix.EINTR
+               if err != unix.EINTR {
+                       break
+               }
+       }
+       if err != nil {
+               return nil, err
+       }
+       return e, err
+}
+
 func TestStatvfs(t *testing.T) {
        if err := unix.Statvfs("", nil); err == nil {
                t.Fatal(`Statvfs("") expected failure`)
@@ -181,7 +199,7 @@ func TestEventPortErrors(t *testing.T) {
        }
        timeout := new(unix.Timespec)
        timeout.Nsec = 1
-       _, err = port.GetOne(timeout)
+       _, err = getOneRetry(t, port, timeout)
        if err != unix.ETIME {
                // See https://go.dev/issue/58259
                // Perhaps we sometimes get EINTR ???

@gopherbot
Copy link
Author

Change https://go.dev/cl/465055 mentions this issue: unix: make solaris syscall tests less flaky

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "golang.org/x/sys/unix" && test == "TestEventPortErrors"
2023-02-24 13:20 illumos-amd64 sys@92c4c39f go@f76fc28c x/sys/unix.TestEventPortErrors (log)
--- FAIL: TestEventPortErrors (0.00s)
    syscall_solaris_test.go:188: port.GetOne(&{0 1}) returned error interrupted system call, want timer expired

watchflakes

@nshalman
Copy link

nshalman commented Mar 8, 2023

Looks like the improved error message works. Good reminder for me to finish up https://go.dev/cl/465055

@golang golang locked and limited conversation to collaborators Mar 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-illumos OS-Solaris
Projects
Status: Done
Development

No branches or pull requests

4 participants