x/build: run an Oracle Solaris builder #15072

Closed
bradfitz opened this issue Apr 2, 2016 · 48 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge

@bradfitz
Contributor

bradfitz commented Apr 2, 2016

@jtsylve made the mistake of volunteering to run a new builder in #14957 (comment)

This bug is about that process.

See https://github.com/golang/go/wiki/DashboardBuilders

@bradfitz bradfitz added the Builders x/build issues (builders, bots, dashboards) label Apr 2, 2016
@bradfitz
Contributor Author

bradfitz commented Apr 2, 2016

(I hit shift-enter or something and posted before done typing)

You basically have two options:

  1. Run the old builder in a loop. It doesn't support trybots and doesn't have high network requirements, but it works for build.golang.org at least, and we'll catch mistakes retroactively.

  2. Use the new-style builder. It should run on a fast network connection and ideally be very elastic (run in the cloud somewhere) or have a lot of capacity so it doesn't become the slowest trybot builder.

Does Sun Solaris run on GCE or EC2?

@binarycrusader
Contributor

As far as I know, Oracle Solaris does not currently run on GCE or EC2, because it lacks support for the virtnet, virtio, and similar drivers.

Aram and I are working on an alternative that we hope to provide more information about soon.

@bradfitz bradfitz added this to the Unreleased milestone Apr 7, 2016
@DemiMarie

What about running Solaris in a VM inside Linux?

@bradfitz
Contributor Author

bradfitz commented Apr 7, 2016

@drbo, which VMs does Solaris run in? Ideally free VMs, since I don't want to deal with paperwork. (I'm fine with paying for stuff, but not if it involves time.)

@jtsylve
Contributor

jtsylve commented Apr 8, 2016

I can confirm it works on VMware, but that's of course not free.

@bradfitz
Contributor Author

bradfitz commented Apr 8, 2016

Actually, VMware may work for us. We're running some Mac builders on MacStadium.com, and they support VMware as an option.

@jtsylve
Contributor

jtsylve commented Apr 8, 2016

I'll try getting the builder working locally in a VMware instance first, just to make sure there won't be issues.

@jtsylve
Contributor

jtsylve commented Apr 8, 2016

Brad, can you send me whatever hash I need to get this working?

@binarycrusader
Contributor

Solaris should work reliably in VMware and VirtualBox at the least. I have heard it works in QEMU as well. Thanks @jtsylve for your help.

@bradfitz
Contributor Author

bradfitz commented Apr 8, 2016

@jtsylve, email me: my username at golang.org.

@bradfitz
Contributor Author

bradfitz commented Apr 8, 2016

VirtualBox headless would be ideal. Then we could run it on GCE on Linux.

gopherbot pushed a commit to golang/build that referenced this issue Apr 8, 2016
Updates golang/go#15072

Change-Id: I4e02556429ac65d521e6d01687232c1412d078fb
Reviewed-on: https://go-review.googlesource.com/21766
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@jtsylve
Contributor

jtsylve commented Apr 8, 2016

@bradfitz I'm getting the following error:

solaris-amd64-oraclejtsylve at 6c5352f181846b73d532c039df3017befe657d6a

:: Running /tmp/workdir/go/src/make.bash with args ["/tmp/workdir/go/src/make.bash"] and env ["HZ=100" "LC_MONETARY=" "SHELL=/usr/bin/bash" "TERM=sun-color" "LC_NUMERIC=" "OLDPWD=/go1.4/src" "LC_ALL=" "PAGER=/usr/bin/less -ins" "MAIL=/var/mail/root" "PATH=/go1.6/bin:/root/go/bin:/go1.6/bin:/go1.4/bin:/usr/bin:/usr/sbin" "LC_MESSAGES=" "LC_COLLATE=" "PWD=/root" "LANG=en_US.UTF-8" "TZ=localtime" "SHLVL=1" "HOME=/root" "GOROOT=/go1.6" "LOGNAME=root" "LC_CTYPE=" "GOPATH=/root/go" "LC_TIME=" "_=/root/go/bin/buildlet" "GOROOT_BOOTSTRAP=/usr/local/go-bootstrap" "WORKDIR=/tmp/workdir" "GO_BUILDER_NAME=solaris-amd64-oraclejtsylve" "GOBIN="] in dir /tmp/workdir/go/src

##### Building Go bootstrap tool.
cmd/dist
ERROR: Cannot find /usr/local/go-bootstrap/bin/go.
Set $GOROOT_BOOTSTRAP to a working Go tree >= Go 1.4.

/usr/local/go-bootstrap/bin/go is there and is executable.

This seems similar to the error that one of the live solaris builders started getting today.

https://build.golang.org/log/975e3884c0d1b523155ff108d86755988c153619

@jtsylve
Contributor

jtsylve commented Apr 8, 2016

When I run the bootstrap version of go, it bails with the error "go: cannot find GOROOT directory: /home/bradfitz/go-solaris-amd64-bootstrap". It looks like a path from your machine is hardcoded in there.

@bradfitz
Contributor Author

bradfitz commented Apr 8, 2016

Why are you running the bootstrap version of Go? It's for the builders to use, and the builders set the proper environment variables.

@jtsylve
Contributor

jtsylve commented Apr 8, 2016

I'm not trying to run the bootstrap version of Go. The builders are failing with it, so to debug I tried to execute it and got that error. I also noticed that the public solaris builder started to fail with the same error today (presumably when you updated that bootstrap tar).

If I copy over my go build to the bootstrap directory, everything works as expected (fails due to issue #14957)

@bradfitz
Contributor Author

bradfitz commented Apr 8, 2016

Are you sure? Where is the builder failing with it? Got a failure log URL linked off http://go-dashboard-dev.appspot.com/ somewhere?

Ignore the failing public solaris builder. That's @zombiezen redoing things. He just doesn't have the bootstrap installed yet. I don't see any cannot find GOROOT directory error in either the public or your builders.

@jtsylve
Contributor

jtsylve commented Apr 8, 2016

https://go-dashboard-dev.appspot.com/log/b5dfb88fb2f96ef180c6ce4ff784029bd7b2bb32

That says it can't find the bootstrap binary (although it's there, but fails with the above error if I try to run it). If I replace the bootstrap binaries from the tar with my own go runtime in the same path, it seems to work as expected.

@bradfitz
Contributor Author

bradfitz commented Apr 8, 2016

I don't see why your binaries vs the bootstrap ones would matter. I'll wait until @zombiezen has the SmartOS one running again before we try to debug the Oracle one more.

@jtsylve
Contributor

jtsylve commented Apr 8, 2016

I figured it out. It was a permissions issue: untarring those files leaves them owned by user 1000 instead of root.

@jtsylve
Contributor

jtsylve commented Apr 9, 2016

OK, I've got a script to install the buildlet on a fresh Oracle Solaris build as a service that automatically restarts. Everything seems to be working well. I think I'm just waiting on the new stage0 binary to simplify things.

@bradfitz
Contributor Author

bradfitz commented Apr 9, 2016

@jtsylve, and can that then run in VirtualBox on Linux on GCE?

@jtsylve
Contributor

jtsylve commented Apr 9, 2016

I'll do some testing to find out. I'm looking at some of the BSD env scripts as a reference, and I'm just not sure how the auto install will work since the Solaris ISOs aren't on a public mirror.

Right now it should work on the VMware setup like the OS X builders.

@bradfitz
Contributor Author

bradfitz commented Apr 9, 2016

I don't even know what the licensing of Oracle Solaris is. If it's commercial, maybe it's easier to just run it on EC2 instead:

Then, like Amazon does with Windows, we just let Amazon take care of the licensing costs and we only pay Amazon.

@jtsylve
Contributor

jtsylve commented Apr 9, 2016

The license is open (http://www.oracle.com/technetwork/licenses/solaris-cluster-express-license-167852.html), but you need to agree and sign into a free oracle account to download it.

Docker is coming to solaris as well, so that will be nice: https://www.oracle.com/corporate/pressrelease/docker-gets-in-the-zone-with-oracle-solaris-073015.html

@binarycrusader
Contributor

@bradfitz Solaris is not supported on EC2 as far as I know (not directly). You'd have to run it inside VirtualBox or something else, and I'm uncertain whether that is possible if the EC2 instances are themselves virtualized.

@jtsylve standard disclaimer: I am not a lawyer, and this is not legal advice, and I am not a spokesperson for my employer.

With that said, the short version is that you may use the version downloaded from the Oracle Technology Network with this primary limitation (from the link that @jtsylve provided, which is the correct one):

... limited License to use the Programs only for the purpose of developing, testing, prototyping and demonstrating your applications, and not for any other purpose.

However, please be aware that (generally speaking) no security fixes or other updates are provided without a support contract, so any running instance of that version should be secured accordingly with that in mind. I hope that more favorable terms are available in the future for development/testing licenses.

@binarycrusader
Contributor

@bradfitz My understanding is that it does, since we've asked about it before, but I can understand the desire for "official" sanction; I will email them today and hopefully hear back early this coming week.

@bradfitz
Contributor Author

bradfitz commented Apr 9, 2016

Actually, it'd be less work for me and licensing stress if Oracle could just run the Go builders themselves.

Once you get it working reliably on our staging build coordinator I can give you a token to get work and show up on our production build coordinator. Then you can run as many VMs as you'd like, but I don't have to own them.

@jtsylve
Contributor

jtsylve commented Apr 9, 2016

I uploaded the build script for comment: https://go-review.googlesource.com/#/c/21791

@gopherbot

CL https://golang.org/cl/21791 mentions this issue.

@jtsylve
Contributor

jtsylve commented Apr 11, 2016

Now that #14957 is fixed, the test builder is reporting successful builds on Oracle Solaris.

@jtsylve
Contributor

jtsylve commented Apr 12, 2016

I just ran into an issue with the Oracle Solaris builder that I could use some advice on from one of the Solaris gurus. My internet connection at home dropped, so the buildlet was not able to connect to the coordinator. After a few attempts, the service dropped into maintenance mode with this in the log: "Restarting too quickly, changing state to maintenance". How can I configure the service not to do this?

I've got the following in the manifest, because I thought it would allow something like 10,000 restarts per second before failing, but that doesn't seem to be the case:

<property_group name="startd" type="framework">
  <propval name='critical_failure_count' type='integer' value='10000'/>
  <propval name='critical_failure_period' type='integer' value='1'/>
</property_group>

@binarycrusader
Contributor

@jtsylve I would need to see the output from "svcs -xv" before I can provide a recommendation. But generally, for altering the timeout, you don't need to alter the manifest, you can set properties using an SMF profile.

@jtsylve
Contributor

jtsylve commented Apr 12, 2016

I fixed the issue. The buildlet restarts too often for startd's liking. It puts it in maintenance mode after 5 restarts. The critical_failure_count and critical_failure_period seem like additions to SmartOS and not Oracle Solaris. I solved the problem by just running the buildlet in a loop in the startup script. (See: https://golang.org/cl/21791)

@bradfitz Can you check on the status of the builder? I think I may have disabled it somehow. The buildlet is connected to the coordinator, but is no longer getting new assignments.

The last log mentions this:

2016-04-12T02:10:41Z finish_start_tests after 54.654536667s; err=Buildlet http://solaris reverse peer solaris/174.70.120.206:61429 for modes [solaris-amd64-oraclejtsylve] failed heartbeat after 10.000822451s; marking dead; err=timeout waiting for headers; solaris: [test:3_5]
2016-04-12T02:10:41Z main_buildlet_broken solaris

@binarycrusader
Contributor

@jtsylve how is the service failing? Historically, SMF had a hard limit of three failures to start a service before it goes into maintenance.

If the buildlet service doesn't start reliably, then a better answer is probably something similar to what you did.

@binarycrusader
Contributor

@bradfitz wrote:

@binarycrusader, do you know if we could get somebody from Oracle to clarify whether running a continuous build for developing Go counts as "developing, testing, prototyping and demonstrating your applications"?

I have confirmed with corporate that this is an allowed use case.

@jtsylve
Contributor

jtsylve commented Apr 16, 2016

So where do we go from here? The script I provided in the attached CL seems to do all the heavy lifting for setup and works very well. I've got the test server running in a VM on my laptop, and even with the frequent service outages (due to being on a laptop), it seems to consistently come back online and catch up on work, so I'd say it's reasonably stable.

@bradfitz
Contributor Author

@jtsylve, can you move it to EC2 or GCE inside VirtualBox now? I can then get you some production keys.

@zombiezen
Contributor

After #9515, there is now a README with a set of instructions inside x/build/env/solaris-amd64/joyent, along with an SMF service definition that runs the buildlet. This should make it straightforward to run on another Solaris machine.

@bradfitz
Contributor Author

@jtsylve, you still interested in this? It looks like https://go-review.googlesource.com/#/c/21791/ could be substantially simpler with the work @zombiezen did.

Let us know if we can help.

@jtsylve
Contributor

jtsylve commented Jun 30, 2016

I haven't gotten a chance to sit down and try to learn how to use GCE, and it's unclear to me how to proceed with that, really. The script itself works, but it's possible that it could be made simpler.

@bradfitz
Contributor Author

This has nothing to do with GCE.

@jtsylve
Contributor

jtsylve commented Jun 30, 2016

OK, then I'm confused. Your last instruction was to try to move the server to GCE under VirtualBox. That's the setup I haven't been able to figure out yet. What are the next steps?

@bradfitz
Contributor Author

I'm confused too. I'll talk to Ross and figure out where we're at with the SmartOS builders and get back to you. I've restored your CL in the meantime. It might be the answer, or part of it.

gopherbot pushed a commit to golang/build that referenced this issue Jun 30, 2016
This script installs the buildlet as a service.  It does not yet
use the stage0 binary.

Updates golang/go#15072
Updates golang/go#9515

Change-Id: I1566a821cbc26b9007d5ceba20020c2efa37f038
Reviewed-on: https://go-review.googlesource.com/21791
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@bradfitz bradfitz changed the title from "x/build: run a Sun Solaris builder?" to "x/build: run an Oracle Solaris builder" Jun 27, 2017
@gopherbot

CL https://golang.org/cl/46724 mentions this issue.

gopherbot pushed a commit to golang/build that referenced this issue Jun 27, 2017
Updates golang/go#15072

Change-Id: I56767a421428418add66aa9b50f0baf9aa202538
Reviewed-on: https://go-review.googlesource.com/46724
Reviewed-by: Andrew Bonventre <andybons@google.com>
Reviewed-by: Shawn Walker-Salas <shawn.walker@oracle.com>
@gopherbot

CL https://golang.org/cl/46831 mentions this issue.

@bradfitz
Contributor Author

Deployed to farmer.golang.org.

@binarycrusader
Contributor

binarycrusader commented Jun 27, 2017

Looks like the latest version of Go has some regressions compared to 1.7, causing some unexpected test failures for the builder; I'll file issues and investigate.

@golang golang locked and limited conversation to collaborators Jun 27, 2018
@rsc rsc unassigned jtsylve Jun 23, 2022

6 participants