x/build/cmd/makemac: Mac VMs are failing to start due to "Invalid configuration" #23859

Closed
bradfitz opened this issue Feb 15, 2018 · 7 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge
@bradfitz
Contributor

bradfitz commented Feb 15, 2018

Half of our Mac VMs aren't running.

On our Mac VMware cluster, on the Linux VM that runs x/build/cmd/makemac in a systemd unit:

# journalctl -f -u makemac
...
Feb 15 14:19:06 godns makemac[24540]: $ govc device.usb.add -vm mac_10_11_host10a
Feb 15 14:19:07 godns makemac[24540]: $ govc vm.disk.attach -vm mac_10_11_host10a -link=true -persist=false -ds=Pure1-1 -disk osx_11_frozen/osx_11_frozen.vmdk
Feb 15 14:19:07 godns makemac[24540]: $ govc vm.destroy mac_10_11_host10a
Feb 15 14:19:08 godns makemac[24540]: 2018/02/15 14:19:08 Error creating 10.11: govc vm.disk.attach ...: exit status 1, govc: Invalid configuration for device '0'.
Feb 15 14:19:13 godns makemac[24540]: 2018/02/15 14:19:13 Have capacity for 8 more Mac VMs; creating requested 10.10 ...
Feb 15 14:19:14 godns makemac[24540]: $ govc vm.create -m 4096 -c 6 -on=false -net dvPortGroup-Private -g darwin14_64Guest -ds BOOT_8 mac_10_10_host08a
Feb 15 14:19:16 godns makemac[24540]: $ govc vm.change -e smc.present=TRUE -e ich7m.present=TRUE -e firmware=efi -e guestinfo.key-darwin-amd64-10_10=xx -e guestinfo.name=mac_10_10_host08a -vm mac_10_10_host08a
Feb 15 14:19:17 godns makemac[24540]: $ govc device.usb.add -vm mac_10_10_host08a
Feb 15 14:19:18 godns makemac[24540]: $ govc vm.disk.attach -vm mac_10_10_host08a -link=true -persist=false -ds=Pure1-1 -disk osx_10_frozen/osx_10_frozen.vmdk
Feb 15 14:19:18 godns makemac[24540]: $ govc vm.destroy mac_10_10_host08a
Feb 15 14:19:18 godns makemac[24540]: 2018/02/15 14:19:18 Error creating 10.10: govc vm.disk.attach ...: exit status 1, govc: Invalid configuration for device '0'.
...

Notice all the "govc: Invalid configuration for device '0'." errors.
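
For triage, here is a minimal sketch (not part of makemac) that tallies these failures per macOS version from the journal. The function name summarize_makemac_errors is made up for illustration, and it assumes the error lines keep the exact format shown in the log above.

```shell
# Hypothetical helper: reads journal lines on stdin and prints
# "<version> <count>" for every "Error creating <version>: ..." line
# that mentions "Invalid configuration for device".
summarize_makemac_errors() {
  sed -n 's/.*Error creating \([0-9.]*\): .*Invalid configuration for device.*/\1/p' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}
```

It could be fed from the same unit the log above came from, for example `journalctl -u makemac --since today | summarize_makemac_errors`.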

Why did this start failing? This has been running unmodified for about 18 months.

Investigate.

/cc @andybons @mdempsky @aclements

@gopherbot gopherbot added this to the Unreleased milestone Feb 15, 2018
@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Feb 15, 2018
@bradfitz bradfitz self-assigned this Feb 15, 2018
@bradfitz
Contributor Author

Original bug report I should've used was #23856.

@gopherbot

Change https://golang.org/cl/94601 mentions this issue: dashboard: disable Mac trybots for now

gopherbot pushed a commit to golang/build that referenced this issue Feb 15, 2018
Updates golang/go#23859

Change-Id: Ic7833420b5538f59314afefefb431bfd14355ece
Reviewed-on: https://go-review.googlesource.com/94601
Reviewed-by: Andrew Bonventre <andybons@golang.org>
@gopherbot

Change https://golang.org/cl/95735 mentions this issue: dashboard: adjust how many Mac VMs we expect

gopherbot pushed a commit to golang/build that referenced this issue Feb 21, 2018
Updates golang/go#23859

Change-Id: I7e0fed5b17430669a726a8a7bbf8d3efc190034c
Reviewed-on: https://go-review.googlesource.com/95735
Reviewed-by: Andrew Bonventre <andybons@golang.org>
@bradfitz
Contributor Author

Logged in and poked around. It seems our vSphere/vCenter/vWhatever crapped itself and ran out of disk space for something and then went downhill fast into a weird state.

The MacStadium folk are cleaning it up.

@bradfitz
Contributor Author

MacStadium said they fixed something, but I still see 5 alerts.

But upon poking around more, I found that 4 of our 10 physical nodes had lost their connections to the shared NFS datastore. I had to manually remount those:

[Screenshot, 2018-02-21 11:23:30 AM]

No clue why they became unmounted or why manual action was required to repair it.
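
For anyone hitting the same thing, here is a rough sketch of checking the NFS mounts from the command line. This is not what was done above (the remount was done by hand); it assumes govc is already configured for the cluster via its usual environment variables, and the function name and host names are placeholders.

```shell
# Assumption: govc is on PATH; GOVC can be overridden (e.g. for a dry run).
GOVC="${GOVC:-govc}"

# Hypothetical helper: for each ESXi host name given as an argument,
# print a header and run "esxcli storage nfs list" on that host through
# govc, so a host that lost the shared NFS datastore stands out.
check_nfs_mounts() {
  for host in "$@"; do
    printf '== %s\n' "$host"
    "$GOVC" host.esxcli -host "$host" storage nfs list
  done
}
```

Usage would look like `check_nfs_mounts host-a host-b`, with the real host names substituted in.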

But it all seems to be working again, even with VMware still alerting about stuff:

[Screenshot, 2018-02-21 11:27:40 AM]

I'm following up with MacStadium about that. (https://portal.macstadium.com/tickets/47331)

/cc @andybons

@bradfitz
Contributor Author

And I see all 20 back up & connected.

I'll re-enable trybots.

@ianlancetaylor
Contributor

Seems like this issue is fixed, so closing.

@golang golang locked and limited conversation to collaborators Nov 29, 2019