x/build: add LUCI openbsd-ppc64 builder #63480

Open
n2vi opened this issue Oct 10, 2023 · 23 comments
Assignees
Labels
Builders x/build issues (builders, bots, dashboards) NeedsFix The path to resolution is known, but the work has not been done. new-builder
Milestone

Comments

@n2vi

n2vi commented Oct 10, 2023

Following the instructions at Dashboard builders:

hostname openbsd-ppc64-n2vi

CSR is attached after renaming since Github doesn't seem to allow attaching with the name openbsd-ppc64-n2vi.csr you asked for.

@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Oct 10, 2023
@gopherbot gopherbot added this to the Unreleased milestone Oct 10, 2023
@dmitshur dmitshur self-assigned this Oct 11, 2023
@gopherbot

Change https://go.dev/cl/534976 mentions this issue: main.star: add openbsd-ppc64, linux-riscv64, freebsd-riscv64 builders

@dmitshur
Contributor

dmitshur commented Oct 12, 2023

Thanks. Here's the resulting certificate: openbsd-ppc64-n2vi-1697128325.cert.txt.

I've mailed CLs to define your new builder in LUCI and will comment once that's done.

@n2vi
Author

n2vi commented Oct 12, 2023

Thank you; I confirm that using the cert I get a plausible looking luci_machine_tokend/token.json.
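A "plausible looking" token file can be checked a little more mechanically. The following is a minimal sketch, not part of luci_machine_tokend or bootstrapswarm: it only verifies that the file parses as a non-empty JSON object and prints the field names, never the values. The helper name `topLevelKeys` is hypothetical, and no assumption is made about which fields the real token file contains.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"
)

// topLevelKeys parses data as a JSON object and returns its top-level
// field names in sorted order. It deliberately assumes nothing about
// the token schema; it only checks that the file is a non-empty object.
func topLevelKeys(data []byte) ([]string, error) {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(data, &obj); err != nil {
		return nil, fmt.Errorf("not a JSON object: %w", err)
	}
	if len(obj) == 0 {
		return nil, fmt.Errorf("JSON object has no fields")
	}
	keys := make([]string, 0, len(obj))
	for k := range obj {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: checktoken /path/to/token.json")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		os.Exit(1)
	}
	keys, err := topLevelKeys(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "token file looks wrong:", err)
		os.Exit(1)
	}
	// Print field names only, so the credential itself never reaches the terminal.
	fmt.Println("fields present:", keys)
}
```

Printing only the field names keeps the credential out of terminal scrollback and logs, which matters if the output is pasted into an issue like this one.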

gopherbot pushed a commit to golang/build that referenced this issue Oct 12, 2023
Since the list of BUILDER_TYPES is nearly sorted, keep that up,
and sort (using 'Sort Lines' in $EDITOR) two of Linux run mods.

For golang/go#63480.
For golang/go#63481.
For golang/go#63482.

Change-Id: Icef633ab7a0d53b5807c2ab4a076d74c291dc0ea
Reviewed-on: https://go-review.googlesource.com/c/build/+/534976
TryBot-Bypass: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
@dmitshur
Contributor

Glad to hear!

Step 3 is complete (CL 534976 is submitted), so you should be able to proceed with the next steps. Please feel free to comment here if you see something unexpected, or run into a problem that the documentation doesn't cover. Thanks.

@dmitshur dmitshur assigned n2vi and unassigned dmitshur Oct 12, 2023
@n2vi
Author

n2vi commented Oct 13, 2023

I have not read the code yet to diagnose this; leaving assigned to me.

2023/10/13 18:29:39 Bootstrapping the swarming bot with certificate authentication
2023/10/13 18:29:39 retrieving the luci-machine-token from the token file
2023/10/13 18:29:39 Downloading the swarming bot
2023/10/13 18:29:39 Starting the swarming bot /home/swarming/.swarming/swarming_bot.zip
72354 2023-10-13 18:29:47.331 E: ts_mon monitoring is disabled because the endpoint provided is invalid or not supported:
72354 2023-10-13 18:29:48.890 E: Request to https://chromium-swarm.appspot.com/swarming/api/v1/bot/handshake failed with HTTP status code 403: 403 Client Error: Forbidden for url: https://chromium-swarm.appspot.com/swarming/api/v1/bot/handshake
72354 2023-10-13 18:29:48.891 E: Failed to contact for handshake, retrying in 0 sec...

@n2vi
Author

n2vi commented Oct 13, 2023

I don't see anything in the code or logs here that helps me diagnose this. It just looks like the server didn't like the token.json that had been refreshed just a minute before.

Maybe someone there can check server-side luci logs? Unable to reassign to dmitshur; hope someone there sees this.

@dmitshur
Contributor

Thanks for the update.

I recall there was a similar-looking error in #61666 (comment). We'll take a look.

@n2vi
Author

n2vi commented Oct 13, 2023

In case it helps... I set both -token-file-path on the bootstrapswarm command line and also LUCI_MACHINE_TOKEN in the environment. The logs don't indicate any trouble reading the token.json file, though they're not very explicit.

I appreciate that there have been serious security flaws in the past from too-detailed error messages. But I'd venture that it is safe for luci to say more than "403".

I recognize I'm a guinea pig for the Go LUCI stuff, so happy to give you a login on t.n2vi.com if you would find it easier to debug directly or hop on a video call with screen sharing.

Finally, I recognize I'm a newcomer to Go Builders. So it could well be user error here.

@dmitshur
Contributor

Thanks for your patience as we work through this and smooth out the builder onboarding process.

> I set both -token-file-path on the bootstrapswarm command line and also LUCI_MACHINE_TOKEN in the environment.

To confirm, are both of them set to the same value, which is the file path location of the token.json file? If you don't mind experimenting on your side, you can check if anything is different if you leave LUCI_MACHINE_TOKEN unset and instead rely on the default location for your OS (/var/lib/luci_machine_tokend/token.json I believe).

We'll keep looking into this on our side. Though next week we might be somewhat occupied by a team event, so please expect some delays. Thanks again.
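The lookup order suggested above (use LUCI_MACHINE_TOKEN if it is set, otherwise fall back to a default location) can be sketched as follows. This is an illustration only, not bootstrapswarm's actual code: `resolveTokenPath` is a hypothetical helper, and the default path is the one quoted above with an "I believe", so treat it as an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// defaultTokenPath is the default location mentioned in this thread.
// It is an assumption taken from the discussion, not a confirmed constant.
const defaultTokenPath = "/var/lib/luci_machine_tokend/token.json"

// resolveTokenPath is a hypothetical helper: it prefers the
// LUCI_MACHINE_TOKEN environment variable and falls back to the default.
// getenv is injected so the lookup can be exercised without touching
// the real environment.
func resolveTokenPath(getenv func(string) string) string {
	if p := getenv("LUCI_MACHINE_TOKEN"); p != "" {
		return p
	}
	return defaultTokenPath
}

func main() {
	p := resolveTokenPath(os.Getenv)
	fmt.Println("token file:", p)
	// Report a missing or unreadable file early, before starting the bot.
	if _, err := os.Stat(p); err != nil {
		fmt.Println("warning:", err)
	}
}
```

Running this with the variable unset and then set to the custom path shows quickly which file a process started from that environment would look for.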

@n2vi
Author

n2vi commented Oct 14, 2023

Yes, both are set to the same value /home/luci/luci_machine_tokend/token.json. (My OS doesn't have /var/lib and anyway not a fan of leaving cleartext credentials in obscure corners of the filesystem.)

This morning I've retried the same invocation of bootstrapswarm as before and don't get the 403 Client Error. So maybe there was just a transient issue.

Happy to set this effort on the shelf for a week or two; enjoy the team event!

@cagedmantis cagedmantis added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 16, 2023
@dmitshur
Contributor

CC @golang/release.

@n2vi
Author

n2vi commented Nov 3, 2023

Over the last week I tried swarm a few more times with no problems, so whatever issue I saw before indeed seems transient. I never saw swarm do any actual work, presumably because some server-side table is still pointing to my machine as in the old-builder state rather than new-builder. Fine by me.

I'll have limited ability to work on it from November 8 - 20, but happy to work on it during the next few days if you're waiting on me.

@dmitshur dmitshur assigned dmitshur and unassigned dr2chase Nov 28, 2023
@dmitshur
Contributor

dmitshur commented Dec 7, 2023

The builder is currently in a "Quarantined—Had 6 consecutive BOT_DIED tasks" state. @n2vi Can you please restart the swarming bot on your side and see if that's enough to get it out of that state?

We've applied changes on our side (e.g., CL 546715) that should help avoid this repeating, but it's possible more work will be needed. Let's see what happens after you restart it next time. Thanks.

@dmitshur dmitshur assigned n2vi and unassigned dmitshur Dec 7, 2023
@n2vi
Author

n2vi commented Dec 7, 2023

Restarted.

There is a message from swarming_bot's urllib3 saying that it only supports OpenSSL, but this is compiled with LibreSSL. From the GitHub issue cited, I think this is merely a warning, but if you're otherwise stuck I can investigate deeper.

Or if there is something else I can do to help, just ask.

FWIW, on my other openbsd-ppc64 machine, I've succeeded in compiling go1.22-devel from tip without change. Both machines are running what OpenBSD calls -current, i.e. compiled from source as of a few days ago.

@dmitshur
Contributor

dmitshur commented Dec 7, 2023

Thanks. I saw it came back up in idle state. I gave it some work, and I see it failed with:

Could not resolve version infra/tools/cipd/openbsd-ppc64:git_revision:ec494f363fdfd8cdd5926baad4508d562b7353d4: no such package: infra/tools/cipd/openbsd-ppc64

That's useful information and on us to fix. Specifically, we need to set things up for CIPD packages to be built for the openbsd/ppc64 platform. (That will involve something similar to crrev.com/c/5086069, with the caveat that it can't be done using e.g. Go 1.21.0, since it doesn't support openbsd/ppc64 yet, whereas Go 1.22.0 will work.)

We'll update this issue once that's done.

@n2vi
Author

n2vi commented Feb 26, 2024

Any news on this?

t.n2vi.net aka host-openbsd-ppc64-n2vi had been getting kernel panics from the gopher buildlet stream so, amidst power outages and other troubles, I brought software up to date in an effort to debug. Go 1.22 is running fine here.

Currently the buildlet fails to compile and while I work on that I thought I'd check in parallel on LUCI progress. If we can just cut over to the new system maybe I don't need to debug the old one?

@dmitshur
Contributor

Thanks for checking in.

I've looked into this and, as things stand now, some additional time is expected after the public Go 1.22.0 release before it's available for the CIPD package building pipeline to use. This delay might decrease in the future, and we might be able to work around it, but not yet.

We'll update this issue after the CIPD packages are ready. Thanks.

@4a6f656c
Contributor

@dmitshur any update on this given the hard deadline for May 17th?

@cagedmantis
Contributor

@n2vi Can you try again?

@n2vi
Author

n2vi commented Apr 29, 2024

I'm traveling, but will try remotely soon.

Just to confirm: you'd like both the old builder and the new swarming running in parallel for now, correct?

@n2vi
Author

n2vi commented Apr 30, 2024

Killed off the old processes running as swarm, rebuilt golang.org/x/build/cmd/bootstrapswarm@latest, and tried to restart but got an error message about flag -token-file-path provided but not defined.

In the morning I'll start digging into what has changed in the API since October.

@dmitshur
Contributor

As long as you still have LUCI_MACHINE_TOKEN set as needed in the environment, you should be able to drop the -token-file-path flag. It used to be required to set both to the same value, and the flag was removed in CL 548955 in favor of the env var.
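The "provided but not defined" wording is what Go's standard `flag` package reports when a command line names a flag that the program no longer registers. The sketch below reproduces that with a throwaway flag set; it is not bootstrapswarm's code, and `parseArgs` is a hypothetical helper used only for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseArgs parses args with a fresh flag set that defines only the
// string flags named in defined. It returns the parse error, if any.
func parseArgs(defined []string, args []string) error {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // suppress the usage text; we only want the error
	for _, name := range defined {
		fs.String(name, "", "illustrative flag")
	}
	return fs.Parse(args)
}

func main() {
	args := []string{"-token-file-path", "/home/luci/luci_machine_tokend/token.json"}

	// A binary that still registers the flag accepts the arguments.
	fmt.Println("flag defined:", parseArgs([]string{"token-file-path"}, args))

	// A binary built after the flag was removed rejects the same arguments.
	fmt.Println("flag removed:", parseArgs(nil, args))
}
```

So an unchanged start script combined with a freshly rebuilt binary is enough to trigger the error; dropping the flag from the invocation resolves it.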

@n2vi
Author

n2vi commented Apr 30, 2024

Thanks. Restarted without error messages.

Projects
Status: Planned
Development

No branches or pull requests

6 participants