Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/build: add "slowbots" support #34501

Closed
bradfitz opened this issue Sep 24, 2019 · 32 comments
Closed

x/build: add "slowbots" support #34501

bradfitz opened this issue Sep 24, 2019 · 32 comments
Labels
Builders x/build issues (builders, bots, dashboards) FeatureRequest FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@bradfitz
Copy link
Contributor

We aim to make TryBots be only 5 minutes, but sometimes people want long, deeper tests over all architectures.

We've heard from the compiler/runtime team (@martisch, et al) that a more broad TryBot mode would be useful.

I thought we had a bug for this already but can't find it. (Maybe it was a comment on another bug)

The plan I thought writing about elsewhere was that when clicking "+1 TryBot", you could write some magic syntax in the comment field on Gerrit to specify which extra bots run. Some other tool can help you construct that magic comment & do the Gerrit comment for you, like git-codereview does (when you specify git-codereview mail -trybot)

@gopherbot gopherbot added this to the Unreleased milestone Sep 24, 2019
@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Sep 24, 2019
@bradfitz
Copy link
Contributor Author

Related:

@martisch
Copy link
Contributor

Related:

@bcmills
Copy link
Contributor

bcmills commented Sep 24, 2019

@bcmills
Copy link
Contributor

bcmills commented Sep 24, 2019

Heh, @bradfitz used the term “SlowBots” in #29239 (comment) too.

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 26, 2019
@eliasnaur
Copy link
Contributor

This would be fantastic for testing changes on the slow and difficult to set up mobile devices. It would have helped avoid the 4 part fix https://go-review.googlesource.com/c/go/+/193847/

@andybons
Copy link
Member

andybons commented Oct 1, 2019

Maybe we fold this into an umbrella issue for a Commit Queue that would cover a lot of the issues above?

@bradfitz
Copy link
Contributor Author

bradfitz commented Oct 1, 2019

Let's not. They're different enough and I was actually working on this one last week.

@gopherbot
Copy link

Change https://golang.org/cl/201204 mentions this issue: maintner/maintnerd: return TRY= comment directives in Run-TryBot votes

gopherbot pushed a commit to golang/build that referenced this issue Oct 16, 2019
For slowbots support.

Updates golang/go#34501

Change-Id: I6f8a8b187c3ce4531498f106d62e29e1d1f421d8
Reviewed-on: https://go-review.googlesource.com/c/build/+/201204
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/201338 mentions this issue: cmd/coordinator: add SlowBots support, opt-in to different slow trybots

@bradfitz
Copy link
Contributor Author

It's alive! (on https://go-review.googlesource.com/c/go/+/171822/22#message-1627ac4750a897104245e2dc87cc20f838c0b057)

Screen Shot 2019-10-15 at 11 08 22 PM

/cc @ianlancetaylor

@aclements
Copy link
Member

This is really helpful!

How do we know what tokens we can list in "TRY"? Is there an "all" alias?

@bradfitz
Copy link
Contributor Author

The details of the TRY= directive syntax is mostly undefined/undesigned at this point, so opinions welcome. I went with some quick implementation for now just so I could implement something.

The tokens can currently be any builder name, or a handful of defined terms that map to the best builder for that term. They currently always add to the default trybot step.

The CL's dashboard/builders.go defines the well-known aliases at the top, but that's subject to change.

I'm somewhat reluctant to add "all", as that's pretty large and overkill, but maybe. I'd at least want to add some "mostly-all" or "each" (one of each GOOS GOARCH) too, which would be different in that "all" would include really everything, like, linux-386-387 and linux-386-nocgo and weirdo things, but "each" wouldn't? I'm not sure.

@aclements
Copy link
Member

It would be great to be able to specify GOOSes and GOARCHes and get a reasonable set of builders for those (perhaps all of the "first-class ports" that match). It's generally pretty clear what GOOSes and GOARCHes you touched, and harder to manually map those to a useful set of builders.

One problem I've run into is that specifying a broken builder appears to leave to trybots stuck for a really long time. E.g., on CL 201402 I listed netbsd-arm-bsiegert, and now that I look at the dashboard, that builder clearly just isn't up. But the trybot run has been dutifully waiting an hour and twenty minutes at this point to get a spot on that builder, which means I don't get the trybot report back on the CL. I know these are slowbots, but maybe it should give up after, say, an hour?

@agnivade
Copy link
Contributor

This is fantastic ! Yes, a few handful of predefined terms which map to different levels of builder coverage would be very helpful.

@aclements
Copy link
Member

I've been using this a bunch on my non-cooperative preemption changes since they touch a lot of architecture code and it's been incredibly helpful.

A few things that have caught me by surprise (these aren't necessarily wrong, but should probably be noted in any documentation):

  1. TRY= must be specified in the same comment that sets Run-TryBot+1.
  2. If there's already a Run-TryBot+1 comment for the latest PS, adding a TRY comment, even with Run-TryBot+1 in the same comment, won't start the slowbots. I think the only way to get out of this state is to upload a new PS or clear TryBot-Result (which requires special bits). This one is probably more of a bug than a feature.
  3. If there's a TRY comment on an older PS, setting Run-TryBot+1 on a new PS without specifying TRY again will run whatever the last TRY set specified was. This is probably the right behavior, but surprised me because of surprise 1. My mental model had been "whatever TRY is specified on the Run-TryBot+1 that triggered the trybots".

@bradfitz
Copy link
Contributor Author

I got sidetracked by other stuff, but I just made a minor update tonight:

  • For the TRY= terms, you can now use any GOARCH, GOOS, or GOOS-GOARCH value, without having to know the exact builder name, or the best builder name. It now has a mapping of what the best one is. A GOARCH by itself means linux in general. (amd64p32 might mean nacl on release branches, but I forgot to include it and my tests don't catch it because it runs go tool dist list with my tip Go)
  • It will report a bit more on what it did. It's still terrible, but it's something. There's another bug open for making TryBot result pages permanent after they complete.

gopherbot pushed a commit to golang/build that referenced this issue Oct 18, 2019
This parses TRY= comments to opt-in to slower/difference trybots.

This needs some docs/UI work yet.

Updates golang/go#34501

Change-Id: I13a835520d7ac341bc86139b0a2118235bc83e60
Reviewed-on: https://go-review.googlesource.com/c/build/+/201338
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
@bradfitz
Copy link
Contributor Author

I wrote the start of some docs:
https://github.com/golang/go/wiki/SlowBots

@bradfitz
Copy link
Contributor Author

Spit-balling some additional TRY= token matching ideas: (the existing ones would continue to work)

  • TRY=all-os (at least one of each GOOS, auto pick best GOARCH for each)
  • TRY=all-arch (at least one of each GOARCH, auto pick best GOOS for each)
  • TRY=all-os-arch and/or TRY=all-ports (at least one of each GOOS/GOARCH combo, auto pick best specific builder config for each)
  • TRY=all-os-arch-variants (literally everything. But spelled annoyingly long to make not reach for it too easily which I fear they would if it were called all)
  • TRY=windows-*-* all variants of Windows
  • TRY=dragonfly-amd64-* all variants of DragonFly (including bare dragonfly-amd64; it's not technically a glob match)

Thoughts?

@bcmills
Copy link
Contributor

bcmills commented Oct 21, 2019

I like the idea of biasing toward fewer builders. (Better than my suggestion to use build constraints, at any rate.)

Maybe wildcards consistently instead of all?

  • TRY=*-386 (one of each GOOS for GOARCH=386)
  • TRY=linux-* (one of each GOARCH for GOOS=linux)
  • TRY=dragonfly-amd64-* (all configurations of the named GOOS/GOARCH)
  • TRY=*-*-race (all race builders)
  • TRY=*-*-* (literally everything; not as annoyingly long but that Shift key is a bit of a reach)

@bradfitz
Copy link
Contributor Author

bradfitz commented Oct 21, 2019

@bcmills, that's exactly what I started with but I had some hesitation that it was too cryptic, especially with the one you didn't include:

  • TRY=*-* (one of each port; a GOOS/GOARCH combo)

Unlike TRY=*-*-*, I expect *-* to be somewhat common and reasonable, so I was afraid it was too ugly and would confuse people and all-ports would be more readable

But if you're cool with it, I'm fine with it.

I'm fine with *-*-race too but that's somewhat of a special case. That's the only well-known/magic third component name.

@bcmills
Copy link
Contributor

bcmills commented Oct 21, 2019

That's the only well-known/magic third component name.

At least, until we address #29641 and #26529. 😉

@bcmills
Copy link
Contributor

bcmills commented Oct 21, 2019

I expect *-* to be somewhat common and reasonable, so I was afraid it was too ugly and would confuse people and all-ports would be more readable

I suspect that *-* will be too expensive to use much in practice. (If we used it consistently, it would likely saturate the reverse builders for a lot of platforms that don't actually need that much bot coverage.)

So a bit of ugliness there may be a feature.

@bradfitz
Copy link
Contributor Author

@bcmills, oh, the other reservation I had about using the * globby patterns is how do you say "one of each GOOS, regardless of GOARCH?" (e.g. we only have aix for ppc64, not amd64, and nacl for amd64p32/386/arm)

TRY=* doesn't sound right. And using a placeholder like _ for "any" like TRY=*-_ is cryptic.

@gopherbot
Copy link

Change https://golang.org/cl/203421 mentions this issue: dashboard: add SlowBot alias for freebsd-arm64-dmgk

gopherbot pushed a commit to golang/build that referenced this issue Oct 25, 2019
This should fix TestSlowBotAliases on tip.

Updates golang/go#24715
Updates golang/go#34501

Change-Id: I80f99a5c24ef0860717605816a753fe6625aff17
Reviewed-on: https://go-review.googlesource.com/c/build/+/203421
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
codebien pushed a commit to codebien/build that referenced this issue Nov 13, 2019
For slowbots support.

Updates golang/go#34501

Change-Id: I6f8a8b187c3ce4531498f106d62e29e1d1f421d8
Reviewed-on: https://go-review.googlesource.com/c/build/+/201204
Reviewed-by: Ian Lance Taylor <iant@golang.org>
codebien pushed a commit to codebien/build that referenced this issue Nov 13, 2019
This parses TRY= comments to opt-in to slower/difference trybots.

This needs some docs/UI work yet.

Updates golang/go#34501

Change-Id: I13a835520d7ac341bc86139b0a2118235bc83e60
Reviewed-on: https://go-review.googlesource.com/c/build/+/201338
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
codebien pushed a commit to codebien/build that referenced this issue Nov 13, 2019
This should fix TestSlowBotAliases on tip.

Updates golang/go#24715
Updates golang/go#34501

Change-Id: I80f99a5c24ef0860717605816a753fe6625aff17
Reviewed-on: https://go-review.googlesource.com/c/build/+/203421
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
@bradfitz
Copy link
Contributor Author

Talking to myself,

TRY=* doesn't sound right. And using a placeholder like _ for "any" like TRY=*-_ is cryptic.

I suppose ? is a better globby pattern instead of _.

So TRY=*-? would mean all OSes on any (the best relevant) architecture.

Too cryptic?

TRY=geese would be cuter.

@gopherbot
Copy link

Change https://golang.org/cl/227778 mentions this issue: cmd/coordinator: consider explicitly requested builders as SlowBots

@dmitshur
Copy link
Contributor

dmitshur commented Apr 9, 2020

In order to resolve #38279 and make SlowBots more predictable, I am considering changing them slightly so that whenever a builder is explicitly requested via the TRY= syntax, it is always considered a SlowBot.

Previously, if a builder was explicitly requested via TRY= but also was a part of the default trybot set for a given CL, then it wasn't considered to be a SlowBot. So it was possible to write TRY=linux-amd64 and see "TryBots are starting ..." instead of "SlowBots are starting ...".

I've sent CL 227778 for review that implements this change. Please let me know if there's any feedback.

@gopherbot
Copy link

Change https://golang.org/cl/227780 mentions this issue: cmd/coordinator: consider explicitly requested builders as SlowBots

gopherbot pushed a commit to golang/build that referenced this issue Apr 9, 2020
Previously, a builder was considered to be "SlowBot" whenever it was
requested via the TRY= syntax, and it wasn't already a part of the
default set of trybot builders.

This change makes it so that a builder is considered a SlowBot if
it was explicitly requested via the TRY= syntax, even if it was
already going to run anyway.

The goal behind this change is to improve the experience of using
SlowBots, by making them more useful and predictable.

The UI currently reports that "SlowBots are starting ..." whenever
there is a non-zero amount of SlowBots. So if a user requested
TRY=linux-amd64, they'd likely see "TryBots are starting ...",
which can be misunderstood to mean that SlowBots are not working.

Additionally, if a user requested three builders via TRY= syntax,
but one of them happened to be a part of the default try set, then
the message at the end would be "Extra slowbot builds that ran:"
including only 2 builders. At that point, it's hard to tell whether
they got the TRY= syntax wrong for one of the builders, or if it was
omitted because it's always run anyway.

A future usability improvement in golang.org/issue/38279 is to make
SlowBots not skip any tests that normal trybots may skip for faster
results. This change is in preparation for that.

For golang/go#38279.
For golang/go#34501.

Change-Id: Idb7f1bcb1694fe72f85237976995a94f273c7c16
Reviewed-on: https://go-review.googlesource.com/c/build/+/227778
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/227779 mentions this issue: cmd/coordinator, dashboard: don't skip cmd/dist tests for SlowBots

gopherbot pushed a commit to golang/build that referenced this issue Apr 9, 2020
When a user explicitly requests a SlowBot builder via the TRY= syntax,
it is a signal that they're willing to wait longer for test results.
It's more useful to not skip any tests and give the user confidence
that the SlowBot run will catch all issues that would otherwise be
caught only by post-submit builders after the CL is submitted.

This is done by considering a TryBot that isn't an explicit SlowBot
to be a "normal TryBot", and the ShouldRunDistTest policy function
is provided a parameter specifying whether it's a normal TryBot
(non-SlowBot) run.

Tests are only skipped during normal TryBot runs, not SlowBot runs.

Fixes golang/go#38279.
For golang/go#34501.

Change-Id: Ia11fd51c71f9de1089b6b6f4c027014893f224b9
Reviewed-on: https://go-review.googlesource.com/c/build/+/227779
Reviewed-by: Alexander Rakoczy <alex@golang.org>
@randall77
Copy link
Contributor

Slowbots seem to be stickier than I imagined. See CL 238077. I requested a slowbot early on in development. I'd now like to run the normal trybot set, but it seems to still be picking up my original slowbot request, even though that request is now several patch sets in the past.

@dmitshur
Copy link
Contributor

@randall77 That is intended behavior, documented at https://golang.org/wiki/SlowBots:

When running TryBots again later, the most recent TRY= comment on the current patchset is used. To turn it off set TRY= with an empty string after the equals sign. If the current patchset doesn't have a TRY= comment, the most recent non-empty TRY= comment is used.

If you already know that and would like to suggest a change, please open a separate issue about it.

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Aug 18, 2020
@dmitshur
Copy link
Contributor

I'm going to close this high level tracking issue as done.

By now, SlowBots are widely available and very useful for development purposes. We intend to continue to support them (or an alternative mechanism that provides equivalent functionality) as long as they're deemed useful, and we'll continue to improve them over time together with the rest of the build infrastructure. For all remaining bugs and usability improvements related to SlowBots, please open new issues with "x/build/cmd/coordinator:" prefix and SlowBots in the title.

Thank you @bradfitz and everyone else who contributed to making SlowBots a reality!

@golang golang locked and limited conversation to collaborators Aug 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Builders x/build issues (builders, bots, dashboards) FeatureRequest FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

10 participants