cmd/go: frequent test timeouts on dragonfly-amd64 builder #45215

Closed
bcmills opened this issue Mar 24, 2021 · 10 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Dragonfly
Milestone

Comments

@bcmills
Contributor

bcmills commented Mar 24, 2021

2021-03-24T03:16:14-4357f71/dragonfly-amd64
2021-03-23T23:09:33-87a3ac5/dragonfly-amd64
2021-03-23T01:21:24-d25476e/dragonfly-amd64
2021-03-19T17:31:59-4829031/dragonfly-amd64
2021-03-19T16:11:47-a937729/dragonfly-amd64
2021-03-18T21:27:21-d3ab6b5/dragonfly-amd64
2021-03-18T21:22:04-bdbba22/dragonfly-amd64
2021-03-18T17:14:39-f47fab9/dragonfly-amd64
2021-03-18T16:51:27-eaa1dde/dragonfly-amd64
2021-03-18T16:51:24-af4388a/dragonfly-amd64
2021-03-18T04:17:00-c2d6251/dragonfly-amd64

The timeouts don't seem to occur in any one particular test case, and no individual test case seems to be running for an inordinately long time. I think this builder is just slow, but I'm filing this issue in case there is a specific performance problem in cmd/go or the test itself that we can address.

I wonder if something about the dragonfly kernel or the builder configuration in particular makes the I/O syscalls used by cmd/go slower than on other platforms.

CC @tuxillo @jayconrod @matloob

@bcmills bcmills added OS-Dragonfly NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Mar 24, 2021
@bcmills bcmills added this to the Backlog milestone Mar 24, 2021
@tuxillo
Contributor

tuxillo commented Mar 26, 2021

It's probably the setup. Each builder runs in a VM, but the virtual disks are stored on hard disks, not SSDs. We also have to clean up the filesystem more often than in a standard setup, where cleanup occurs once a day. If these VMs are causing too much trouble, I could probably move them to another server. Let me know!

@dmitshur
Contributor

Based on that information about the builder, it sounds like it's expected to need more time to complete tests successfully, so increasing the test timeout (#45216) has a good chance of resolving this issue.

tuxillo added a commit to tuxillo/build that referenced this issue Apr 30, 2021
This sets the GO_TEST_TIMEOUT_SCALE=2 for dragonfly builders in order
to try to solve intermittent test failures.

For golang/go#45215.
For golang/go#34034.
Fixes golang/go#45216.
@gopherbot

Change https://golang.org/cl/315329 mentions this issue: dashboard: increase GO_TEST_TIMEOUT_SCALE for dragonfly builders

gopherbot pushed a commit to golang/build that referenced this issue Apr 30, 2021
This sets the GO_TEST_TIMEOUT_SCALE=2 for dragonfly builders in order
to try to solve intermittent test failures.

For golang/go#45215.
For golang/go#34034.
Fixes golang/go#45216.

Change-Id: I81dae533c137ff1903bf6b1a52cf9752e8921b7c
GitHub-Last-Rev: 9b4fb80
GitHub-Pull-Request: #35
Reviewed-on: https://go-review.googlesource.com/c/build/+/315329
Run-TryBot: Carlos Amedee <carlos@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Trust: Carlos Amedee <carlos@golang.org>
@tuxillo
Contributor

tuxillo commented Jun 13, 2021

This one can be closed?

@bcmills
Contributor Author

bcmills commented Jun 22, 2021

This builder is still running extremely slowly, and I suspect (but haven't confirmed) that a lot of the problem comes down to disk performance.

We may need to increase the timeout scale factor further, but I wonder if there are other issues at play. For example, is it possible that the builder is memory-starved? Maybe we need to artificially lower GOMAXPROCS to avoid hitting swap while running the tests...

@tuxillo
Contributor

tuxillo commented Jun 22, 2021

We're going to try to get make.bash working so that the builders can run in GCE; see #23060.

That might help us a bit with the builder slowness (hopefully).

@bcmills
Contributor Author

bcmills commented Jun 28, 2022

It's been about a year since the last update on this issue, and we're now running into acute timeout issues on the dragonfly builder (#53577).

Is there still work in progress to migrate this builder to GCE? That seems like it would go a long way toward making test failures on dragonfly easier to debug, and with a scalable GCE image it would also be much easier to add a dragonfly SlowBot to specific changes to detect regressions upfront.

@tuxillo
Contributor

tuxillo commented Jun 28, 2022

> It's been about a year since the last update on this issue, and we're now running into acute timeout issues on the dragonfly builder (#53577).
>
> Is there still work in progress to migrate this builder to GCE? That seems like it would go a long way toward making test failures on dragonfly easier to debug, and with a scalable GCE image it would also be much easier to add a dragonfly SlowBot to specific changes to detect regressions upfront.

No progress in the GCE builder, but if you tell me you really need it I'll prioritize it then.

@gopherbot

Change https://go.dev/cl/419084 mentions this issue: dashboard: use dragonfly on GCE for dragonfly-amd64 builds

@golang golang locked and limited conversation to collaborators Aug 2, 2023