x/build: migrate off Kubernetes buildlets back to VMs? #25108
/cc @bcmills
+1 for this. The flakiness of the builders under load was palpable and painful during the 1.10.1 release.
Just started on this. So far so good. I created a VM running the cos-stable image (and the normal ...) and then the buildlet came right up, and pretty quickly (didn't measure). The "privileged: true" part was a test; I think we'll be able to run a few more tests than we did before with it. And I think the "tty: true" part is unnecessary. I thought we needed it for something, but I think I'm misremembering.
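For context, konlet reads a pod-like container declaration from the VM's `gce-container-declaration` metadata key. A sketch along the lines described above — the image name, project, and mount paths are placeholders, not the actual builder config:

```yaml
# Hypothetical konlet declaration; image and paths are illustrative only.
spec:
  containers:
    - name: buildlet
      image: gcr.io/example-project/linux-x86-example:latest
      securityContext:
        privileged: true   # the "test" mentioned above
      tty: true            # probably unnecessary, per the comment
      volumeMounts:
        - name: workdir
          mountPath: /workdir
  volumes:
    - name: workdir
      emptyDir:
        medium: Memory     # a tmpfs mount
  restartPolicy: Always
```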
Change https://golang.org/cl/111267 mentions this issue:
Note: after this is live, we'll probably want to remove the
Change https://golang.org/cl/111639 mentions this issue:
Once containers run on COS instead of Kubernetes, one name (Kube*) is wrong and the other (GCE) is ambiguous. So rename them now to be more specific. No behavior changes. Just renaming in this step, to reduce size of next CL.

Updates golang/go#25108
Change-Id: Ib09eb682ef74acbbf6ed50b46074f834ef5e0c0b
Reviewed-on: https://go-review.googlesource.com/111639
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Change https://golang.org/cl/111640 mentions this issue:
Updates golang/go#25108
Change-Id: I5a82a4b26407158cf24d770a887759f8335d6441
Reviewed-on: https://go-review.googlesource.com/111640
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Change https://golang.org/cl/112715 mentions this issue:
…exec

Google's Container-Optimized Linux's konlet container start-up program creates any requested tmpfs mounts as noexec. That doesn't work for doing builds in, so remount it as executable. This is required to run builds on COS instead of GKE.

Updates golang/go#25108
Change-Id: I9b719caf9180a03bafefa5b3b4b47ee43b9e5c1c
Reviewed-on: https://go-review.googlesource.com/112715
Reviewed-by: Andrew Bonventre <andybons@golang.org>
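The remount described in that CL amounts to a single mount(2) call with MS_REMOUNT. A minimal Go sketch of the idea — not the actual CL, and the directory name is a placeholder:

```go
package main

import (
	"fmt"
	"syscall"
)

// remountExec remounts an existing tmpfs mount without the noexec
// flag, so binaries written into it can be executed. MS_REMOUNT
// changes the flags of an existing mount in place; any flag not
// listed here (such as MS_NOEXEC) is dropped by the remount.
func remountExec(dir string) error {
	return syscall.Mount("tmpfs", dir, "tmpfs",
		syscall.MS_REMOUNT|syscall.MS_NOSUID|syscall.MS_NODEV, "")
}

func main() {
	// "/workdir" is illustrative; konlet creates the real tmpfs mounts.
	if err := remountExec("/workdir"); err != nil {
		fmt.Println("remount failed:", err)
	}
}
```

Requires root (CAP_SYS_ADMIN) and Linux, which is why konlet's default noexec mounts have to be fixed up from inside the privileged container.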
Change https://golang.org/cl/112735 mentions this issue:
The nacl image hadn't been updated in 2+ years and it needed to be updated as part of rolling out the new COS-based builders. But no released version works for us yet; we were getting the same errors as in golang/go#23836 ("Signal 11 from untrusted code"). We were getting lucky that it was working with an ancient (pepper_34?) version, but I was unable to get those working again either. Rolling forward is better anyway, as we haven't had a Dockerfile reflecting reality for this builder for 2+ years.

This is the same version used in playground in CL 101735, which said:

> playground: update NaCl to trunk.544461
>
> This pulls in https://crrev.com/c/962675, which fixes the
> underlying issue of NaCl mishandling signals during a SIGSEGV.

Updates golang/go#23836
Updates golang/go#25108
Change-Id: I187042af71a1249e84ce2070aa8039a88d2c02c2
Reviewed-on: https://go-review.googlesource.com/112735
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Andrew Bonventre <andybons@golang.org>
We've had a lot of flakiness with our Kubernetes-based buildlets. This seems to happen most during periods of high load, suggesting that we're still hitting isolation problems with Kubernetes, despite our various pod configuration knobs (requested CPU/mem + limit CPU/mem) being pretty paranoid and reviewed by various Kubernetes people.
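For readers unfamiliar with those knobs: they are the per-container resource requests and limits in the pod spec. A representative fragment — the values and names here are illustrative, not the real buildlet configuration:

```yaml
# Illustrative pod fragment; not the actual buildlet pod config.
apiVersion: v1
kind: Pod
metadata:
  name: buildlet-example
spec:
  containers:
    - name: buildlet
      image: gcr.io/example-project/linux-x86-example:latest
      resources:
        requests:      # guaranteed share, used for scheduling
          cpu: "2"
          memory: 4Gi
        limits:        # hard cap; setting limits equal to requests
          cpu: "2"     # puts the pod in the "Guaranteed" QoS class
          memory: 4Gi
```

Even with requests equal to limits, pods on the same node still share kernel resources (page cache, disk and network bandwidth), which is one plausible source of the load-correlated flakiness.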
Further, Kubernetes seems to get into a state where things are bad, and then a cluster upgrade or nuke+recreate makes things good again... for a bit.
I think it might be time to stop using Kubernetes for our buildlets (we'll still use it for all our misc services) and switch back to using VMs.
In the past, our Linux VMs were tedious because we had to prepare VM images for each config. We did this using a "docker2boot" tool I wrote that converted a container image (built from a Dockerfile) into a bootable GCE VM. But the whole process & testing was still slow & painful to iterate on.
When we moved to Kubernetes, we moved to more vanilla Dockerfiles with pushes & pulls to gcr.io. This was much less painful.
I don't propose we move back to custom VM images. I don't want to use docker2boot again (as cool of a hack as it was).
Instead, I think we should use GCE's container OS image (https://cloud.google.com/container-optimized-os/docs/) and use our existing buildlet containers.
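Concretely, booting a buildlet container this way can be a single instance creation against the COS image family, passing the container declaration as metadata. A sketch — the instance name, project, zone, and file name are placeholders:

```
# Hypothetical invocation; names are illustrative only.
gcloud compute instances create buildlet-linux-example \
  --project example-project \
  --zone us-central1-f \
  --image-family cos-stable --image-project cos-cloud \
  --metadata-from-file gce-container-declaration=konlet.yaml
```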
The pros of moving to VMs:
The cons of moving to VMs:
I think this is worth trying. We can make it a flag and be able to revert easily. We won't delete the Kubernetes-based building code, in case we want to switch back to it in the future.
/cc @andybons @FiloSottile