5 of 5 linux/ppc64 builders are missing #41742

Closed
dmitshur opened this issue Oct 1, 2020 · 7 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@dmitshur
Contributor

dmitshur commented Oct 1, 2020

From https://farmer.golang.org/#health:

image

/cc @andybons @golang/osp-team

@dmitshur dmitshur added Builders x/build issues (builders, bots, dashboards) NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Oct 1, 2020
@dmitshur dmitshur added this to the Unreleased milestone Oct 1, 2020
@laboger
Contributor

laboger commented Oct 2, 2020

I am trying to contact OSU for information on this.

@ceseo
Contributor

ceseo commented Oct 2, 2020

I see the system is responding to ping:

PING 140.211.169.164 (140.211.169.164): 56 data bytes
64 bytes from 140.211.169.164: icmp_seq=0 ttl=48 time=245.184 ms
64 bytes from 140.211.169.164: icmp_seq=1 ttl=48 time=256.996 ms
64 bytes from 140.211.169.164: icmp_seq=2 ttl=48 time=302.820 ms

Maybe it's just a matter of logging in and restarting the builder?

@laboger
Contributor

laboger commented Oct 2, 2020

OSU ticket #31316 was created for this. They say the machine is up and reachable over SSH. There were some issues yesterday that may have caused some instances to be rebooted. As Carlos noted, the ppc64 builder might just need to be restarted. I don't have a key to get on those machines.

@ramereth

ramereth commented Oct 2, 2020

@bradfitz please let me know if I can add anyone's key to the machine so that we can get it going again.

@cagedmantis
Contributor

@ramereth Thanks for your work on this. We are able to log in.

@ramereth

ramereth commented Oct 2, 2020

The service seems to be running; however, it's erroring out with the following:

Oct 02 14:30:25 go-be-xenial-3 rundockerbuildlet[2975]: 2020/10/02 14:30:25 Creating go-be-%d01 ...
Oct 02 14:30:25 go-be-xenial-3 rundockerbuildlet[2975]: 2020/10/02 14:30:25 Error creating go-be-%d01: exit status 125,
Oct 02 14:30:25 go-be-xenial-3 rundockerbuildlet[2975]: See 'docker run --help'.
Oct 02 14:30:25 go-be-xenial-3 rundockerbuildlet[2975]: 2020/10/02 14:30:25 Creating go-be-%d02 ...
Oct 02 14:30:25 go-be-xenial-3 rundockerbuildlet[2975]: 2020/10/02 14:30:25 Error creating go-be-%d02: exit status 125,
Oct 02 14:30:25 go-be-xenial-3 rundockerbuildlet[2975]: See 'docker run --help'.
Oct 02 14:30:25 go-be-xenial-3 rundockerbuildlet[2975]: 2020/10/02 14:30:25 Creating go-be-%d03 ...
Oct 02 14:30:25 go-be-xenial-3 rundockerbuildlet[2975]: 2020/10/02 14:30:25 Error creating go-be-%d03: exit status 125,
Oct 02 14:30:25 go-be-xenial-3 rundockerbuildlet[2975]: See 'docker run --help'.
Oct 02 14:30:25 go-be-xenial-3 rundockerbuildlet[2975]: 2020/10/02 14:30:25 Creating go-be-%d04 ...
Oct 02 14:30:25 go-be-xenial-3 rundockerbuildlet[2975]: 2020/10/02 14:30:25 Error creating go-be-%d04: exit status 125,
Oct 02 14:30:25 go-be-xenial-3 rundockerbuildlet[2975]: See 'docker run --help'.
Oct 02 14:30:26 go-be-xenial-3 rundockerbuildlet[2975]: 2020/10/02 14:30:26 Creating go-be-%d05 ...
Oct 02 14:30:26 go-be-xenial-3 rundockerbuildlet[2975]: 2020/10/02 14:30:26 Error creating go-be-%d05: exit status 125,
Oct 02 14:30:26 go-be-xenial-3 rundockerbuildlet[2975]: See 'docker run --help'.
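
To summarize a log like the one above, a small script can pull out each failing container name and its exit status. This is a minimal sketch: the regular expression is an assumption fitted to the lines quoted here, and `failed_containers` is a hypothetical helper, not part of rundockerbuildlet. For reference, Docker documents exit status 125 from `docker run` as an error in Docker itself rather than in the contained command.

```python
import re

# Assumption: pattern fitted to the "Error creating <name>: exit status <n>" lines
# quoted in this thread; other rundockerbuildlet messages may look different.
ERROR_RE = re.compile(r"Error creating (?P<name>\S+): exit status (?P<code>\d+)")


def failed_containers(log_text):
    """Return {container_name: exit_status} for every 'Error creating' line."""
    failures = {}
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            failures[match.group("name")] = int(match.group("code"))
    return failures


# Three lines copied from the journal output above.
SAMPLE = """\
Oct 02 14:30:25 go-be-xenial-3 rundockerbuildlet[2975]: 2020/10/02 14:30:25 Creating go-be-%d01 ...
Oct 02 14:30:25 go-be-xenial-3 rundockerbuildlet[2975]: 2020/10/02 14:30:25 Error creating go-be-%d01: exit status 125,
Oct 02 14:30:25 go-be-xenial-3 rundockerbuildlet[2975]: See 'docker run --help'.
"""

if __name__ == "__main__":
    for name, code in failed_containers(SAMPLE).items():
        print(f"{name}: exit status {code}")
```

Run against the full excerpt, this would list all five names, each containing a literal `%d`.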

Here's what the systemd unit for that service looks like:

[Unit]
Description=Run Buildlets in Docker
After=network.target

[Install]
WantedBy=network-online.target

[Service]
Type=simple
# The (-n * -cpu) values must currently be <= number of host cores.
# The host has 10 cores, so the -n=5 (five containers) * -cpu=2 (two CPUs per container) == 10.
# -memory=3.9g doesn't work with crun; TODO: tiborvass is investigating
ExecStart=/usr/local/bin/rundockerbuildlet -basename=go-be-%d -image=golang/builder -n=5 -cpu=2 -memory= --env=host-linux-ppc64-osu
Restart=always
RestartSec=2
StartLimitInterval=0
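
The comment in the unit says that `-n` times `-cpu` must not exceed the number of host cores. A quick way to check an `ExecStart` line against that rule is sketched below. The helpers `parse_flags` and `cpu_budget_ok` are hypothetical names introduced for illustration, and the flag parsing assumes every flag is written as `-key=value`, as in the unit above.

```python
import shlex


def parse_flags(exec_start):
    """Split an ExecStart command into {flag: value}, assuming -key=value syntax."""
    flags = {}
    for token in shlex.split(exec_start)[1:]:  # skip the binary path
        key, _, value = token.lstrip("-").partition("=")
        flags[key] = value
    return flags


def cpu_budget_ok(flags, host_cores):
    """Check the rule from the unit's comment: n * cpu <= host cores."""
    return int(flags["n"]) * int(flags["cpu"]) <= host_cores


# ExecStart value copied from the unit above.
EXEC_START = (
    "/usr/local/bin/rundockerbuildlet -basename=go-be-%d -image=golang/builder "
    "-n=5 -cpu=2 -memory= --env=host-linux-ppc64-osu"
)

if __name__ == "__main__":
    flags = parse_flags(EXEC_START)
    # The unit's comment states the host has 10 cores.
    print(flags["basename"], cpu_budget_ok(flags, host_cores=10))
```

With the values in the unit, 5 containers at 2 CPUs each exactly fill the 10 cores, so the CPU budget is not the problem here.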

@cagedmantis
Contributor

@ramereth I was able to get it back up and running by changing ExecStart to ExecStart=/usr/local/bin/rundockerbuildlet -basename=ppc64_ -image=golang/builder -n=5 -cpu=2 -memory= --env=host-linux-ppc64-osu. Thanks again for your help.
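
The thread does not spell out why the new basename helps. One plausible reading, offered here as an assumption, is that the literal `%d` seen in the logged names (for example `go-be-%d01`) is not allowed in a Docker container name, so every `docker run` was rejected. The sketch below checks generated names against the pattern Docker is commonly reported to enforce. Both the pattern and the name format (basename plus a two-digit index, inferred from the log) are assumptions, and `container_names` and `invalid_names` are hypothetical helpers.

```python
import re

# Assumption: the container-name rule Docker reports in its "Invalid container name"
# error. Verify against the Docker version in use.
DOCKER_NAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_.-]+$")


def container_names(basename, count):
    """Build names as basename + two-digit index (format inferred from the log)."""
    return [f"{basename}{i:02d}" for i in range(1, count + 1)]


def invalid_names(basename, count):
    """Return the generated names that the assumed Docker pattern would reject."""
    return [n for n in container_names(basename, count) if not DOCKER_NAME_RE.match(n)]


if __name__ == "__main__":
    for basename in ("go-be-%d", "ppc64_"):
        bad = invalid_names(basename, 5)
        print(f"{basename!r}: {len(bad)} invalid name(s)")
```

Under these assumptions, all five names built from `go-be-%d` fail the check, while the names built from `ppc64_` pass.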

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Oct 19, 2020
@golang golang locked and limited conversation to collaborators Oct 19, 2021
6 participants