x/build: frequent failures on freebsd-arm-paulzhol builder with "signal: killed" since 2021-12-23 #50540

Closed
bcmills opened this issue Jan 10, 2022 · 7 comments
Labels
arch-arm Issues solely affecting the 32-bit arm architecture. Builders x/build issues (builders, bots, dashboards) FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-FreeBSD

@bcmills
Contributor

bcmills commented Jan 10, 2022

The freebsd-arm-paulzhol builder has failed a substantial fraction of builds with signal: killed since late December, and also seems to have a lot of missing runs over that interval (perhaps the buildlet itself is also being killed?).

greplogs --dashboard -md -l -e '(?ms)\Afreebsd-arm-paulzhol .* signal: killed\n' --since=2021-01-01

2022-01-08T00:24:25-90860e0/freebsd-arm-paulzhol
2022-01-08T00:24:25-5cfca57-90860e0/freebsd-arm-paulzhol
2022-01-08T00:24:25-1d35b9e-90860e0/freebsd-arm-paulzhol
2022-01-07T22:46:47-7f3eb61/freebsd-arm-paulzhol
2022-01-07T22:46:47-5cfca57-7f3eb61/freebsd-arm-paulzhol
2022-01-07T22:46:47-1d35b9e-7f3eb61/freebsd-arm-paulzhol
2022-01-07T22:40:23-c74be77/freebsd-arm-paulzhol
2022-01-07T22:40:23-5cfca57-c74be77/freebsd-arm-paulzhol
2022-01-07T21:15:14-be26ca9/freebsd-arm-paulzhol
2022-01-07T21:15:14-1d35b9e-be26ca9/freebsd-arm-paulzhol
2022-01-07T19:22:09-8b8fc08-f1596d7/freebsd-arm-paulzhol
2022-01-07T18:40:16-f1596d7/freebsd-arm-paulzhol
2022-01-07T18:40:16-1d35b9e-f1596d7/freebsd-arm-paulzhol
2022-01-07T18:20:24-5b0dc2d-ade5488/freebsd-arm-paulzhol
2022-01-07T06:34:04-5b0dc2d-11b28e7/freebsd-arm-paulzhol
2022-01-07T06:34:04-1d35b9e-11b28e7/freebsd-arm-paulzhol
2022-01-07T06:34:04-11b28e7/freebsd-arm-paulzhol
2022-01-07T02:37:20-40afced/freebsd-arm-paulzhol
2022-01-07T02:37:20-1d35b9e-40afced/freebsd-arm-paulzhol
2022-01-07T02:32:39-5b0dc2d-2bb7f6b/freebsd-arm-paulzhol
2022-01-07T02:32:39-2bb7f6b/freebsd-arm-paulzhol
2022-01-07T02:32:39-1d35b9e-2bb7f6b/freebsd-arm-paulzhol
2022-01-07T02:32:03-ab4556a/freebsd-arm-paulzhol
2022-01-07T02:32:03-5b0dc2d-ab4556a/freebsd-arm-paulzhol
2022-01-07T01:36:17-c1e7c51/freebsd-arm-paulzhol
2022-01-07T01:36:17-1d35b9e-c1e7c51/freebsd-arm-paulzhol
2022-01-07T00:15:59-07525e1/freebsd-arm-paulzhol
2022-01-07T00:02:57-c295137/freebsd-arm-paulzhol
2022-01-06T22:12:11-b9cae6f/freebsd-arm-paulzhol
2022-01-06T22:12:11-5b0dc2d-b9cae6f/freebsd-arm-paulzhol
2022-01-06T19:42:27-5b0dc2d-10f1ed1/freebsd-arm-paulzhol
2022-01-06T19:42:27-1d35b9e-10f1ed1/freebsd-arm-paulzhol
2022-01-06T19:42:27-10f1ed1/freebsd-arm-paulzhol
2021-12-29T04:10:07-1d35b9e-91e7821/freebsd-arm-paulzhol
2021-12-23T17:27:50-ed766b6/freebsd-arm-paulzhol
2021-10-02T16:05:55-a7fe161/freebsd-arm-paulzhol
2021-10-02T00:31:26-64da5e0/freebsd-arm-paulzhol
2021-09-17T08:20:48-6602c86/freebsd-arm-paulzhol
2021-08-02T17:18:57-8a7ee4c/freebsd-arm-paulzhol
2021-02-11T18:02:48-864d4f1/freebsd-arm-paulzhol
2021-01-21T19:15:21-3c2f11b/freebsd-arm-paulzhol
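
The pattern passed to `greplogs -e` above can be tried in isolation. The following is a minimal sketch, assuming each build log begins with the builder name followed by a space; the sample log strings and the `isKilledLog` helper are hypothetical and are not part of greplogs itself.

```go
package main

import (
	"fmt"
	"regexp"
)

// killedRE is the same pattern given to greplogs -e above.
// (?s) lets ".*" span newlines; \A anchors the match to the start of the log.
var killedRE = regexp.MustCompile(`(?ms)\Afreebsd-arm-paulzhol .* signal: killed\n`)

// isKilledLog reports whether a whole build log matches the pattern.
// This helper is illustrative only.
func isKilledLog(log string) bool {
	return killedRE.MatchString(log)
}

func main() {
	// Hypothetical log texts; the header line format is an assumption.
	killed := "freebsd-arm-paulzhol at 90860e0 (hypothetical header)\n" +
		"Building Go toolchain3 using go_bootstrap and Go toolchain2.\n" +
		"go build cmd/compile/internal/ssa: /tmp/workdir-host-freebsd-arm-paulzhol/go/pkg/tool/freebsd_arm/compile: signal: killed\n"
	passed := "freebsd-arm-paulzhol at 90860e0 (hypothetical header)\nALL TESTS PASSED\n"
	other := "hypothetical-other-builder at 90860e0\nfoo: signal: killed\n"

	fmt.Println(isKilledLog(killed), isKilledLog(passed), isKilledLog(other))
}
```

Because of the `\A` anchor, a log from a different builder does not match even if it also contains `signal: killed`.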

Attn @paulzhol; CC @golang/release.

@bcmills bcmills added arch-arm Issues solely affecting the 32-bit arm architecture. Builders x/build issues (builders, bots, dashboards) OS-FreeBSD NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Soon This needs to be done soon. (regressions, serious bugs, outages) labels Jan 10, 2022
@bcmills bcmills added this to the Backlog milestone Jan 10, 2022
@paulzhol
Member

The most recent ones which look like:

Building Go cmd/dist using /usr/home/paulzhol/go1.4. (go1.4-bootstrap-20170531 freebsd/arm)
Building Go toolchain1 using /usr/home/paulzhol/go1.4.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
go build cmd/compile/internal/ssa: /tmp/workdir-host-freebsd-arm-paulzhol/go/pkg/tool/freebsd_arm/compile: signal: killed
go tool dist: FAILED: /tmp/workdir-host-freebsd-arm-paulzhol/go/pkg/tool/freebsd_arm/go_bootstrap install -gcflags=all= -ldflags=all= -a -i cmd/asm cmd/cgo cmd/compile cmd/link: exit status 1

are due to running out of memory during bootstrap. This also sometimes happened mid-test, for example:
https://build.golang.org/log/04777783e4d4c1b95e6e9868f3d61e8bfaa076fe

They were due to the loss of network connectivity to a block device used as swap; I've restored it. But it still seems 1.18 is a much heavier memory user.

There used to be a way to control the bots by canceling and triggering a run anew. I think I've been locked out of that for several years. And besides, it can never keep up with all the builds when it takes more than 3 hours per run.

@bcmills
Contributor Author

bcmills commented Jan 11, 2022

it still seems 1.18 is a much heavier memory user.

That could be due to #44167 — including non-heap sources of GC work in the pacing decisions can cause less frequent collection when those sources are a significant fraction of memory usage.
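
To illustrate the effect, here is a simplified sketch of the heap goal before and after that change, assuming the model described in the Go GC guide (goal = live heap + (live heap + GC roots) × GOGC/100 as of Go 1.18, versus scaling only the live heap before). The real pacer has more inputs; the function names and the sizes below are hypothetical.

```go
package main

import "fmt"

// heapGoalPre118 models the older behaviour: only the live heap is scaled by GOGC.
// Sizes are in MiB; this is a simplified model, not the runtime's actual pacer.
func heapGoalPre118(liveHeap, gogc uint64) uint64 {
	return liveHeap + liveHeap*gogc/100
}

// heapGoal118 models the Go 1.18 behaviour: GC roots (goroutine stacks,
// globals) are added to the live heap before scaling by GOGC.
func heapGoal118(liveHeap, roots, gogc uint64) uint64 {
	return liveHeap + (liveHeap+roots)*gogc/100
}

func main() {
	// Hypothetical numbers: 100 MiB live heap, 20 MiB of roots, GOGC=100.
	const liveHeap, roots, gogc = 100, 20, 100

	fmt.Println("pre-1.18 goal (MiB):", heapGoalPre118(liveHeap, gogc))
	fmt.Println("1.18 goal (MiB):    ", heapGoal118(liveHeap, roots, gogc))
}
```

In this model the goal grows by the size of the roots scaled by GOGC, so a process with large non-heap GC work collects less often and peaks higher at the same GOGC setting.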

There used to be a way to control the bots by canceling and triggering a run anew. I think I've been locked out of that for several years.

The command that does that is golang.org/x/build/cmd/retrybuilds, although I'm not sure what permissions that requires. (Someone from @golang/release could probably fill in more detail.)

And besides it can never keep up with all the builds when it takes more than 3 [hours] per run.

I think that's fine in general, as long as the builder makes regular forward progress? (It seems ok to be missing test runs in the middle of a burst of changes as long as we occasionally get an up-to-date run at the end of the burst.)

@dmitshur
Contributor

I understand the Soon label was added to look into the sudden increase in failing builds starting in late December, and it seems it was determined to be related to an increase in memory use in Go 1.18. (If that's not working as expected, it probably needs a separate issue.) I'll remove the Soon label, since there doesn't appear to be a clear time-sensitive action that must be taken on the order of days or hours here.

There used to be a way to control the bots by canceling and triggering a run anew. I think I've been locked out of that for several years.

As Bryan mentioned, retrybuilds is that command. If you are the builder owner, I believe you should be able to use the builder key to clear out failures on that particular builder. It's expected there may be some changes happening as part of #47521, but please file an issue for that command if you can't use it on a builder you own.


It seems the current state of the builder is that it is missing. From https://farmer.golang.org/#pools:

host-freebsd-arm-paulzhol: 0/0 (1 missing)

@dmitshur dmitshur removed the Soon This needs to be done soon. (regressions, serious bugs, outages) label Jan 28, 2022
@paulzhol
Member

2022-04-27T00:09 is the most recent. I'm not sure I can do anything about it.
RAM is very constrained; I already have a swap device mounted over iSCSI.
tmpfs is used as the filesystem for the build, which should be swapped out under memory pressure.

The path forward for FreeBSD on ARMv7 is probably running it virtualized under ARM64 anyway.

@bcmills
Contributor Author

bcmills commented May 2, 2022

This is still happening frequently this week. I'm going to drop freebsd-arm-paulzhol from builder triage until it can be addressed.

@paulzhol
Member

I think I've found the root cause. I've managed to reproduce it by running all.bash on a rpi3b (more cores, less RAM, slower network):

[Image: IMG_6113]

The iscsid daemon (iSCSI initiator control plane) plus the dhclient daemon were swapped out due to high memory pressure. Later the DHCP lease can't be renewed and iSCSI can't re-establish the connection to the block device used for swap.

I've switched the following sysctls:

vm.swap_enabled=0
vm.pfault_oom_attempts=-1

The first one is meant to prevent whole-process swapping (separate from paging in FreeBSD) of runnable but inactive processes like iscsid.
The second is to prevent a process from getting OOM-killed if paging cannot reclaim RAM fast enough (this was observed in several runs where dist got killed during bootstrap).
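
One way to confirm that both settings are in effect is to compare the output of `sysctl vm.swap_enabled vm.pfault_oom_attempts` against the intended values. The following is a rough sketch, assuming the FreeBSD `name: value` output format; the helper names and the sample output are hypothetical, and the program parses a hard-coded string rather than running `sysctl` itself.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// wantSysctls holds the two values switched above.
var wantSysctls = map[string]string{
	"vm.swap_enabled":        "0",
	"vm.pfault_oom_attempts": "-1",
}

// parseSysctl parses sysctl output of the form "name: value", one per line.
// Lines without a colon are skipped.
func parseSysctl(out string) map[string]string {
	got := make(map[string]string)
	for _, line := range strings.Split(out, "\n") {
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue
		}
		got[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
	}
	return got
}

// mismatches returns, in sorted order, the names whose value is missing
// or differs from the wanted one.
func mismatches(want, got map[string]string) []string {
	var bad []string
	for name, w := range want {
		if g, ok := got[name]; !ok || g != w {
			bad = append(bad, name)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	// Hypothetical output, as if only the first sysctl had been changed.
	sample := "vm.swap_enabled: 0\nvm.pfault_oom_attempts: 3\n"
	fmt.Println("not yet applied:", mismatches(wantSysctls, parseSysctl(sample)))
}
```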

Additionally, I've moved /tmp to be mounted on a regular UFS filesystem on an iSCSI block device, instead of tmpfs. This prevents heavy thrashing, as the majority of the swap space was being used to hold the Go sources and build artifacts.
