x/build: darwin-amd64 trybot waiting for 84+ minutes #23856
Comments
/cc @andybons
Expected. @aclements just committed a bazillion CLs, and we only have 20 Mac VMs. We have an open bug for a scheduler (#19178) to smartly assign buildlets to builds, with priorities, and that's up soon on my list, as I'm ramping back up to work.
I see. I saw the build dashboard was pretty busy looking, but it seemed odd darwin-amd64 (which I thought is usually on the fast side of things) was the only hanging trybot. Is darwin-amd64 the only trybot that can't scale with demand?
Currently, yes. There was also linux-arm in the trybot set, which had a fixed number (50), but those were disabled due to #22748 (first) and #22749 (most recently). I could probably re-enable them and wait until their networking sucks again, or redesign things to not depend on the network as much, which is unfortunate. We could buy more Macs, or hope that #19178 and #23858 are sufficient.
Sorry :( I'm not sure this is the only problem, though. When I was trybot'ing those changes earlier today the dashboard was pretty quiet and I don't think any trybots other than mine were running, but it still took over two hours for all of the darwin-amd64 trybots to finish. It's only 16 CLs, so shouldn't 20 Mac VMs be enough to handle this plus a bit?
@aclements, well, each CL consumes 3 Mac VMs for sharding (or up to 4 for trybots). And some of the VMs are currently statically partitioned into distinct roles (some for macOS 10.8, some for macOS 10.12 Sierra, etc). We expect only 15 VMs for macOS 10.11, which is what we run for trybots. But, ah, only 7 are connected at the moment, which I see via https://farmer.golang.org/status/reverse.json and the front page of https://farmer.golang.org/. We don't alert on that, of course. #22603 and #21315 and #15760 track that. @andybons and I need to come up with a plan, now that the go-cloud team has stopped working on it.
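To make the capacity arithmetic above concrete, here is a rough sketch in Go. It is only an illustration, not how the coordinator actually schedules: the helper names `concurrentRuns` and `waves` are hypothetical, and it assumes each CL triggers one trybot run that holds all of its VMs (up to 4) for the whole run, with no sharing between runs.

```go
package main

import "fmt"

// concurrentRuns reports how many trybot runs can hold their VMs at the
// same time. Hypothetical helper: assumes VMs are never shared between runs.
func concurrentRuns(connectedVMs, vmsPerRun int) int {
	if connectedVMs <= 0 || vmsPerRun <= 0 {
		return 0
	}
	return connectedVMs / vmsPerRun
}

// waves reports how many back-to-back batches are needed to get through
// the pending runs. It returns -1 when no run can ever start.
func waves(pending, concurrent int) int {
	if pending <= 0 {
		return 0
	}
	if concurrent <= 0 {
		return -1
	}
	return (pending + concurrent - 1) / concurrent // ceiling division
}

func main() {
	const pendingCLs = 16 // the batch of CLs mentioned in the thread
	const vmsPerRun = 4   // upper bound per trybot run, per the comment above

	// 20 total Mac VMs, 15 expected for macOS 10.11, 7 actually connected.
	for _, vms := range []int{20, 15, 7} {
		c := concurrentRuns(vms, vmsPerRun)
		fmt.Printf("%2d VMs: %d concurrent runs, %d waves for %d CLs\n",
			vms, c, waves(pendingCLs, c), pendingCLs)
	}
}
```

Under those assumptions, 7 connected VMs fit only one 4-VM run at a time, so 16 CLs queue up one after another, whereas 15 VMs would fit three runs at once.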
https://farmer.golang.org/try?commit=99b72794
darwin-amd64-10_11 running 84.4 min
Is this expected? Are the darwin trybots just this heavily overloaded at the moment?
/cc @bradfitz