all: occasional "resource temporarily unavailable" flakes on linux-s390x builder #32328

Open
bcmills opened this issue May 30, 2019 · 9 comments
Labels
arch-s390x: Issues solely affecting the s390x architecture.
NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Testing: An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@bcmills
Contributor

bcmills commented May 30, 2019

It's not clear to me whether this is related to CL 177599 and #32205.

From https://build.golang.org/log/8842ba4fe354ba0d2a48ea5918280b3a2a202dcb:

##### ../misc/cgo/errors
removing /data/golang/workdir/tmp/TestPointerChecks471752625
--- FAIL: TestPointerChecks (0.98s)
    --- FAIL: TestPointerChecks/exportok (0.00s)
        ptr_test.go:596: 
        ptr_test.go:597: failed unexpectedly: fork/exec /data/golang/workdir/tmp/TestPointerChecks471752625/src/ptrtest/ptrtest.exe: resource temporarily unavailable
FAIL

CC @ianlancetaylor @rsc

@bcmills bcmills added the Testing and NeedsInvestigation labels May 30, 2019
@bcmills bcmills added this to the Go1.13 milestone May 30, 2019
@bcmills
Contributor Author

bcmills commented May 30, 2019

Here's a seemingly-related failure in the runtime test on the same builder:

--- FAIL: TestPanicTraceback (0.00s)
    crash_test.go:67: starting testprog PanicTraceback: fork/exec /data/golang/workdir/tmp/go-build726730629/testprog.exe: resource temporarily unavailable
FAIL

(https://build.golang.org/log/e518ab1802ac73754f4f3a51d7fb3cba86868b4e)

So perhaps it's not specific to the misc/cgo/errors test.

@bcmills bcmills changed the title from 'misc/cgo/errors: TestPointerChecks flake on linux-s390x' to 'all: occasional "resource temporarily unavailable" flakes on linux-s390x builder' May 30, 2019
@bcmills
Contributor Author

bcmills commented May 30, 2019

CC @mundaym

@gopherbot

Change https://golang.org/cl/179603 mentions this issue: misc/cgo/errors: limit number of parallel executions
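
For readers unfamiliar with the technique named in that CL title, here is a minimal sketch of capping how many subprocess-launching tasks run at once, using a buffered channel as a counting semaphore. It is not the contents of CL 179603; `runLimited` and the no-op tasks are hypothetical names chosen for illustration, and each task stands in for something like an `exec.Command(...).Run()` call.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runLimited runs every task, but never more than limit at the same time.
// It returns the highest number of tasks observed running concurrently.
// (Hypothetical helper, not taken from the Go tree.)
func runLimited(limit int, tasks []func()) int {
	sem := make(chan struct{}, limit) // counting semaphore with `limit` slots
	var wg sync.WaitGroup
	var running, peak int32

	for _, task := range tasks {
		sem <- struct{}{} // blocks while `limit` tasks are already in flight
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot after the task finishes

			// Record the peak concurrency for demonstration purposes.
			n := atomic.AddInt32(&running, 1)
			for {
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			task()
			atomic.AddInt32(&running, -1)
		}(task)
	}
	wg.Wait()
	return int(atomic.LoadInt32(&peak))
}

func main() {
	tasks := make([]func(), 20)
	for i := range tasks {
		tasks[i] = func() {} // stand-in for launching a test binary
	}
	fmt.Println("peak concurrency:", runLimited(4, tasks))
}
```

A cap like this only bounds the tasks started by one test; it does not help if other processes in the same cgroup use up the task budget.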

@bcmills
Contributor Author

bcmills commented Jun 14, 2019

I haven't seen misc/cgo/errors flake again, but this still occurs sporadically in other tests:
https://build.golang.org/log/75ddd2b8e6749643c9150bf8846ed69f5afdcddf
https://build.golang.org/log/7028d294ba985b9bd6e5cb2024af1fa2a07f7b37
https://build.golang.org/log/2cc3638b3235b7a7b63a474df5991ec468dc404d

I think this may need a deeper fix in our fork/exec wrapper.

@bcmills bcmills reopened this Jun 14, 2019
@mundaym
Member

mundaym commented Jun 14, 2019

I've done a bit of digging (systemd isn't something I'm very familiar with) and I think this might be due to the default systemd TasksMax setting in SLES 12. The default is only 512 tasks, and that budget covers stage0, the buildlet, and everything they spawn...

I don't know whether the fork/exec wrapper can do much when it hits a limit like this. I don't think retrying will necessarily resolve the situation.

I've added the following lines to the buildlet service:

TasksMax=65536
LimitNOFILE=65536
LimitNPROC=65536

Hopefully this will make the s390x builder less flaky in future...

@mundaym
Member

mundaym commented Jun 14, 2019

Does anyone know if there is a way to check the current cgroup's resource limits? Maybe we can get the buildlet to print some of them.
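
One possible way to do this from Go on Linux, sketched under some assumptions: systemd's TasksMax is enforced through the cgroup pids controller, so the effective value should be readable from that cgroup's `pids.max` file, and the process's rlimits appear in `/proc/self/limits`. The code assumes cgroups are mounted at the conventional `/sys/fs/cgroup` location; `pidsMaxPath` is a hypothetical helper name, and none of this is existing buildlet code.

```go
package main

import (
	"fmt"
	"os"
	"path"
	"strings"
)

// pidsMaxPath takes the contents of /proc/<pid>/cgroup and returns the likely
// location of the pids.max file for that process, or "" if none is found.
// Assumes the conventional /sys/fs/cgroup mount points.
func pidsMaxPath(procCgroup string) string {
	v2 := ""
	for _, line := range strings.Split(procCgroup, "\n") {
		// Each line has the form "hierarchy-ID:controller-list:cgroup-path".
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			continue
		}
		for _, c := range strings.Split(parts[1], ",") {
			if c == "pids" { // cgroup v1 pids controller
				return path.Join("/sys/fs/cgroup/pids", parts[2], "pids.max")
			}
		}
		if parts[0] == "0" && parts[1] == "" { // cgroup v2 unified hierarchy
			v2 = path.Join("/sys/fs/cgroup", parts[2], "pids.max")
		}
	}
	return v2
}

func main() {
	// Task limit of the current cgroup, if it can be located.
	if data, err := os.ReadFile("/proc/self/cgroup"); err == nil {
		if p := pidsMaxPath(string(data)); p != "" {
			if v, err := os.ReadFile(p); err == nil {
				fmt.Printf("%s: %s\n", p, strings.TrimSpace(string(v)))
			}
		}
	}
	// Per-process rlimits, corresponding to LimitNPROC and LimitNOFILE.
	if data, err := os.ReadFile("/proc/self/limits"); err == nil {
		for _, line := range strings.Split(string(data), "\n") {
			if strings.HasPrefix(line, "Max processes") ||
				strings.HasPrefix(line, "Max open files") {
				fmt.Println(line)
			}
		}
	}
}
```

On systems where these files are missing or unreadable the program prints nothing, so a buildlet could log the output at startup without treating its absence as an error.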

@mundaym
Member

mundaym commented Jun 14, 2019

@bradfitz and @dmitshur: any thoughts on the systemd task settings we should be using for buildlets?

@bradfitz
Contributor

> any thoughts on the systemd task settings we should be using for buildlets?

For Go stuff I've always just used the defaults. But maybe the defaults have changed, or your distro has lower limits, or s390x ends up creating more threads for some reason?

> Does anyone know if there is a way to check the current cgroup's resource limits? Maybe we can get the buildlet to print some of them.

I don't. But that's a good idea.

@mundaym
Member

mundaym commented Jun 14, 2019

> But maybe the defaults have changed or your distro has lower limits or s390x ends up creating more threads for some reason?

I think it's a distro defaults thing. Ubuntu 18.04 defaults to 4915 tasks, which leaves a lot more headroom.

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@mundaym mundaym added the arch-s390x label Dec 2, 2020
6 participants