Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/api: tests timing out on plan9-arm builder #38537

Closed
millerresearch opened this issue Apr 20, 2020 · 7 comments
Closed

cmd/api: tests timing out on plan9-arm builder #38537

millerresearch opened this issue Apr 20, 2020 · 7 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Plan9 Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@millerresearch
Copy link
Contributor

CL 224619 slowed the cmd/api tests on plan9_arm from best-case about 1 minute to best-case about 3 minutes, and worst-case timing out after 13 minutes, for example here.

The CL added code to the cmd/api test to "warm up the import cache" by starting 31 go list -deps -json std commands in parallel. Each of these 31 commands walks the source and package trees doing a 'stat' (twice) on each of 2562 files, and opens and reads every source file at least twice (once partially, once fully). This is quite hard work on a diskless Raspberry Pi 3 with 1GB of RAM. In the instances where the test times out, it appears the OS has started swapping (to the same file server which holds the source tree -- swapping to SDcard is not really practical).

The dist test command doesn't run cmd/api itself on Plan 9 platforms, because it takes too long. Can I suggest skipping the cmd/api test on plan9_arm for the same reason?

Alternatively, perhaps a radical idea: instead of forking off separate go list commands, burdening the OS with 31 independent garbage-collected address spaces, and producing json text to be re-read and parsed, would it be feasible to do the tree walks internally as goroutines?

@bcmills
Copy link
Contributor

bcmills commented Apr 20, 2020

This is quite hard work on a diskless Raspberry Pi 3 with 1GB of RAM.

The other 32-bit ARM builders have very generous GO_TEST_TIMEOUT_SCALE settings:
https://github.com/golang/build/blob/1f68cb08aa220ffd9307944f095298f748f77597/dashboard/builders.go#L1935-L1953
https://github.com/golang/build/blob/1f68cb08aa220ffd9307944f095298f748f77597/dashboard/builders.go#L1965-L1974

Perhaps the same should be set for the plan9-arm builder.

(Or, isn't there currently a RPi available with 4GB of RAM? Perhaps a builder upgrade is in order.)

That said, the full GOROOT/src tree is currently only ~103 MiB. That should fit easily within 1GiB, so perhaps this is just a matter of needing to restrict the number of concurrent go list calls. I'll see if that can be done without significantly impacting latency.

@andybons andybons added OS-Plan9 Testing An issue that has been verified to require only test changes, not just a test failure. labels Apr 20, 2020
@andybons andybons added this to the Unplanned milestone Apr 20, 2020
@andybons andybons added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 20, 2020
@gopherbot
Copy link

Change https://golang.org/cl/228998 mentions this issue: cmd/api: limit concurrent 'go list' calls to GOMAXPROCS

gopherbot pushed a commit that referenced this issue Apr 20, 2020
Each invocation of 'go list' may consume a significant quantity of
system resources, including buffers for reading files and RAM for the
runtime's memory footprint.
Very small builders may even hit swap as a result of that load,
further exacerbating resource contention.

To avoid overloading small builders, restrict 'go list' calls to
runtime.GOMAXPROCS as it is set at the first call to loadImports.

This also somewhat improves running time even on larger machines: on
my workstation, this change reduces the wall time for 'go test
cmd/api' by around 100ms.

Updates #38537

Change-Id: I968e0f961a8f1d84c27e1ab8b621b9670dcfd448
Reviewed-on: https://go-review.googlesource.com/c/go/+/228998
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@bcmills
Copy link
Contributor

bcmills commented Apr 21, 2020

Hmm, looks like that wasn't enough:
2020-04-21T15:37:29-4f27e1d/plan9-arm

Any idea what the actual value of runtime.GOMAXPROCS(0) ends up being on that builder? (If it's an RPi 3, I would expect GOMAXPROCS=4?)

It might be interesting to try disabling swap entirely on the builder — an explicit failure seems much more useful than an unexpectedly slow test run.

@bcmills bcmills added the Builders x/build issues (builders, bots, dashboards) label Apr 21, 2020
@bcmills bcmills changed the title cmd/api: tests timing out on plan9_arm cmd/api: tests timing out on plan9-arm builder Apr 21, 2020
@millerresearch
Copy link
Contributor Author

I'm pretty sure GOMAXPROCS will be 4, but I'll check. Without swap enabled, processes will eventually start getting killed from lack of physical memory, but it may not be reflected in the log. There are other tests which can cause swapping, too.

@millerresearch
Copy link
Contributor Author

Yes, GOMAXPROCS is 4 on the plan9_arm builders.

@millerresearch
Copy link
Contributor Author

I've increased GO_TEST_TIMEOUT_SCALE to 6, and haven't seen any timeouts since.

I'll mark this closed, and see if I can tweak the plan9_arm builders for more performance. The whole test suite is currently taking about 90 minutes.

@millerresearch
Copy link
Contributor Author

(Or, isn't there currently a RPi available with 4GB of RAM? Perhaps a builder upgrade is in order.)

I've been experimenting with a couple of Raspberry Pi 4 boards, with 2GB and 4GB respectively, as plan9_arm builders. Even on those, cmd/api test can take 5+ minutes.

The standard Plan 9 file servers don't have a directory cache like linux, or an inode cache (the "inode" information is in the directory entry, since there are no hard links). Walking a deep tree with multiple stat and open calls on each file is expensive. The Plan 9 idiom for doing this would include a chdir into each directory up and down the tree, which avoids the long pathname lookups. But this wouldn't be compatible with the go semantics of current working directory shared across all goroutines.

The cost of tree walks is the sole reason that the overall test run on plan9_arm takes more than an hour. Most of the time is spent in the the checkNotStale tree walk at the beginning of each group of tests in dist test, which I suspect is redundant (see issue #24300).

@golang golang locked and limited conversation to collaborators Apr 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Plan9 Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

4 participants