x/build: shard and scale the longtest SlowBots #37439

Closed
bcmills opened this issue Feb 25, 2020 · 4 comments
Labels
Builders x/build issues (builders, bots, dashboards) FeatureRequest FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. ToolSpeed
@bcmills
Contributor

bcmills commented Feb 25, 2020

I missed a Windows test failure in CL 220645 because I forgot to run it against the windows-amd64-longtest SlowBot. I forgot to run it against that SlowBot because I'm not in the habit of doing so.

I'm not in the habit of running that SlowBot because it is currently much too slow. To pick some relevant runs:

  • The first run on CL 220717 started at 5:07 PM and completed at 5:31 PM (24 minutes).
  • The second run on that CL started at 5:51 PM and completed at 6:33 PM (42 minutes).
  • The run on CL 220722 started at 5:48 PM and completed at 6:28 PM (40 minutes).

In contrast, a regular TryBot typically caps out around 10 minutes (#32632), and we consider runs that take longer than 20 minutes to be unacceptably slow (#36629, #36482).

Since there is nothing particularly special about the hardware needed to run the longtest builds (they're just large VMs), I think we should adjust the builder configuration to run the -longtest SlowBots with 4 or more shards each. That way, the end-to-end latency impact of adding one of these bots to a CL will be minimal, and we will not only have less of a disincentive to using them, but also have much faster feedback in order to inform revert-or-fix decisions when one breaks.

CC @golang/osp-team

@bcmills bcmills added Builders x/build issues (builders, bots, dashboards) ToolSpeed FeatureRequest labels Feb 25, 2020
@gopherbot gopherbot added this to the Unreleased milestone Feb 25, 2020
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 28, 2020
@cagedmantis
Contributor

/cc @toothrot @dmitshur

@dmitshur dmitshur self-assigned this Nov 6, 2020
@gopherbot

Change https://golang.org/cl/268037 mentions this issue: dashboard: try to speed up pre-submit longtest builders

gopherbot pushed a commit to golang/build that referenced this issue Nov 7, 2020
The longtest builders are currently primarily post-submit builders,
where it's okay for them to be as slow as they need to be in order
to provide additional test coverage. In this context, whether they
take 40 minutes or 50 makes little difference.

The longtest builders are also sometimes requested via SlowBots for
changes that are riskier than usual, or otherwise desire additional
coverage beyond the normal TryBots. They're also always enabled for
CLs to release branches. In such contexts, speeding up SlowBot runs
from 40 minutes to 20 or less would be appreciated and in turn help
people use longtest SlowBots more frequently.

Longtest builders are already configured to use sharded tests.
Configure them to use additional helpers to speed up test execution.
Try out 3, 5, and 9 helpers to see how much it helps before settling.

For golang/go#37439.

Change-Id: I425bc0257b7a54bb32c0eb1719fea7ba3f4fd461
Reviewed-on: https://go-review.googlesource.com/c/build/+/268037
Trust: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
@dmitshur
Contributor

dmitshur commented Nov 7, 2020

I sent https://golang.org/cl/268037 for this.

I had a chance to try out all 3 values for additional TryBot helpers (3, 5, and 9) in at least one CL and the times so far were:

windows-amd64-longtest (+3 helpers) | linux-386-longtest (+5 helpers) | linux-amd64-longtest (+9 helpers)
12 min 21 sec | 12 min 6 sec | 10 min 15 sec
10 min 9 sec

It seems that even just 3 additional helpers go a long way toward speeding up the longtest SlowBot. I'll collect more timing data next.

(It's not a completely fair comparison since the builders are different, but it's enough to get some idea.)

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Dec 21, 2020
@gopherbot

Change https://golang.org/cl/279513 mentions this issue: dashboard: pick 4 TryBot helpers for -longtest SlowBots

@golang golang locked and limited conversation to collaborators Dec 23, 2021