x/build: darwin-amd64 trybot waiting for 84+ minutes #23856

Closed
mdempsky opened this issue Feb 15, 2018 · 6 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge

@mdempsky
Member

https://farmer.golang.org/try?commit=99b72794

darwin-amd64-10_11 running 84.4 min

darwin-amd64-10_11 rev 99b72794 (trybot set for Ic3a9667); running; (nil *buildlet.Client), 1h24m22.84109737s ago
  2018-02-15T20:33:47Z checking_for_snapshot 
  2018-02-15T20:34:11Z finish_checking_for_snapshot after 176.3ms
  2018-02-15T20:34:11Z get_buildlet 
  2018-02-15T20:34:11Z wait_static_builder host-darwin-10_11
  2018-02-15T20:34:11Z waiting_machine_in_use 
 +2775.3s (now)

Is this expected? Are the darwin trybots just this heavily overloaded at the moment?

/cc @bradfitz

@gopherbot gopherbot added this to the Unreleased milestone Feb 15, 2018
@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Feb 15, 2018
@mdempsky
Member Author

/cc @andybons

@bradfitz
Contributor

Expected. @aclements just committed a bazillion CLs, and we only have 20 Mac VMs.

[screenshot: 2018-02-15 at 1:45:36 pm]

We have an open bug for a scheduler (#19178) to smartly assign buildlets to builds, with priorities, and that's up soon on my list, as I'm ramping back up to work.

@mdempsky
Member Author

I see. I saw the build dashboard was pretty busy looking, but it seemed odd darwin-amd64 (which I thought is usually on the fast side of things) was the only hanging trybot.

Is darwin-amd64 the only trybot that can't scale with demand?

@bradfitz
Contributor

> Is darwin-amd64 the only trybot that can't scale with demand?

Currently, yes. There was also linux-arm in the trybot set, which had a fixed number (50), but those were disabled due to #22748 (first) and #22749 (most recently). I could probably re-enable them and wait until their networking sucks again, or redesign things to not depend on the network as much, which is unfortunate.

We could buy more Macs, or hope that #19178 and #23858 are sufficient.

@aclements
Member

Sorry :(

I'm not sure this is the only problem, though. When I was trybot'ing those changes earlier today the dashboard was pretty quiet and I don't think any trybots other than mine were running, but it still took over two hours for all of the darwin-amd64 trybots to finish. It's only 16 CLs, so shouldn't 20 Mac VMs be enough to handle this plus a bit?

@bradfitz
Contributor

@aclements, well, each CL consumes 3 Mac VMs for sharding (or up to 4 for Trybots). And some of the VMs are currently statically partitioned into distinct roles (some for macOS 10.8, some for macOS 10.12 Sierra, etc).

We expect only 15 VMs for macOS 10.11, which is what we run for TryBots.

But, ah --- only 7 are connected at the moment, which I see via https://farmer.golang.org/status/reverse.json and the front page of https://farmer.golang.org/

We don't alert on that, of course. #22603 and #21315 and #15760 track that.

@andybons and I need to come up with a plan, now that the go-cloud team has stopped working on it.
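The numbers in this comment can be turned into a back-of-the-envelope capacity estimate. The Go sketch below assumes a deliberately simplified model that the thread does not confirm: every CL holds all of its VMs for the whole run, VMs are never shared between CLs, and every run takes the same time. The function names `concurrentCLs` and `rounds` are invented for the example.

```go
package main

import "fmt"

// concurrentCLs returns how many CLs can run trybots at the same time
// when vms VMs are connected and each CL needs vmsPerCL of them.
// Assumes (simplification) that a CL needs all its VMs at once.
func concurrentCLs(vms, vmsPerCL int) int {
	return vms / vmsPerCL
}

// rounds returns how many back-to-back batches are needed to run cls CLs
// when at most perRound fit at once (ceiling division).
// It returns -1 if nothing can run at all.
func rounds(cls, perRound int) int {
	if perRound <= 0 {
		return -1
	}
	return (cls + perRound - 1) / perRound
}

func main() {
	const cls = 16     // CLs mentioned in the thread
	const vmsPerCL = 4 // upper bound per CL for trybots, per the comment above

	// 15 is the expected macOS 10.11 pool, 7 is what was actually connected.
	for _, vms := range []int{15, 7} {
		c := concurrentCLs(vms, vmsPerCL)
		fmt.Printf("%d VMs: %d CL(s) at once, %d rounds for %d CLs\n",
			vms, c, rounds(cls, c), cls)
	}
	// prints:
	// 15 VMs: 3 CL(s) at once, 6 rounds for 16 CLs
	// 7 VMs: 1 CL(s) at once, 16 rounds for 16 CLs
}
```

Under these assumptions, dropping from 15 to 7 connected VMs takes the pool from three CLs at a time to one, which would be consistent with 16 CLs taking hours even on an otherwise quiet dashboard.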
