x/build: improve LUCI builder test sharding strategy for the main Go repository #65814
Labels
Builders
x/build issues (builders, bots, dashboards)
NeedsDecision
Feedback is required from experts, contributors, and/or the community before a change can be made.
ToolSpeed
The current test sharding strategy for the LUCI builders is to generate 4 test shards and distribute the `go tool dist test -list` tests across them via a hash of their names. This strategy means the test execution order and grouping are deterministic, which is useful for reproducibility, but it doesn't take into account how long the tests take to run.
As a result, we've observed differences in test shard run times of up to 2x (longest vs. shortest). The worst cases tend to be on builders where certain tests are disproportionately slower, either because of the build mode or the platform. (The race mode builders and Windows builders are hit particularly hard.)
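For illustration, here is a minimal sketch of name-hash sharding as described above. It is not the builders' actual code: the FNV-1a hash, the function names `shardOf` and `assignByHash`, and the sample test names are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardOf maps a test name to a shard index in [0, numShards) by hashing it.
// FNV-1a is an assumption for illustration; the issue does not say which
// hash the builders actually use.
func shardOf(name string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(name)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(numShards))
}

// assignByHash groups test names into numShards buckets. The result depends
// only on the names, so it is deterministic, but it ignores test run times.
func assignByHash(names []string, numShards int) [][]string {
	shards := make([][]string, numShards)
	for _, n := range names {
		i := shardOf(n, numShards)
		shards[i] = append(shards[i], n)
	}
	return shards
}

func main() {
	// Hypothetical names, loosely in the style of `go tool dist test -list`.
	names := []string{"go_test:fmt", "go_test:net/http", "go_test:runtime", "go_test:os", "api"}
	for i, s := range assignByHash(names, 4) {
		fmt.Printf("shard %d: %v\n", i, s)
	}
}
```

Because the assignment looks only at names, one shard can end up with several slow tests while another gets only fast ones, which is the kind of imbalance reported here.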
There are a few things we can do to fix this. The easiest one is to just find a hash that distributes the tests more evenly. This seems fragile at first, but the `go tool dist test -list` names change very infrequently, so this might work well. Another is to weight the tests according to historical run times on a particular builder, and bucket them according to some load balancing scheme. (Probably not on every run; we still value determinism, so the weights will probably be hard-coded and updated only occasionally.)
We can probably get back a few minutes of build latency for presubmit runs this way.
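One possible shape for the weighted option is a greedy "longest test first" assignment over a hard-coded weight table. The sketch below is only an assumption about what "some load balancing scheme" could look like; the function name `assignByWeight`, the test names, and the weights are hypothetical, not measured data.

```go
package main

import (
	"fmt"
	"sort"
)

// assignByWeight places each test on the currently least-loaded shard,
// visiting tests from heaviest to lightest (greedy longest-processing-time).
// Ties are broken by name and by lowest shard index, so a fixed weight table
// always yields the same assignment.
func assignByWeight(weights map[string]float64, numShards int) ([][]string, []float64) {
	names := make([]string, 0, len(weights))
	for n := range weights {
		names = append(names, n)
	}
	sort.Slice(names, func(i, j int) bool {
		wi, wj := weights[names[i]], weights[names[j]]
		if wi != wj {
			return wi > wj
		}
		return names[i] < names[j]
	})

	shards := make([][]string, numShards)
	loads := make([]float64, numShards)
	for _, n := range names {
		least := 0
		for i := 1; i < numShards; i++ {
			if loads[i] < loads[least] {
				least = i
			}
		}
		shards[least] = append(shards[least], n)
		loads[least] += weights[n]
	}
	return shards, loads
}

func main() {
	// Hypothetical hard-coded weights (e.g. minutes) for one builder.
	weights := map[string]float64{
		"go_test:runtime":  10,
		"go_test:net/http": 8,
		"go_test:os":       3,
		"go_test:fmt":      3,
		"api":              2,
	}
	shards, loads := assignByWeight(weights, 2)
	for i := range shards {
		fmt.Printf("shard %d (load %.0f): %v\n", i, loads[i], shards[i])
	}
}
```

Since the weights are a fixed table rather than live measurements, the grouping stays reproducible between runs and only changes when the table is updated.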