
x/build: spacemonkey arm5 builders broken, clean tmp dirs automatically #28041

Closed
bradfitz opened this issue Oct 5, 2018 · 16 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.

Comments

@bradfitz
Contributor

bradfitz commented Oct 5, 2018

Looks like the spacemonkey builders were broken by https://go-review.googlesource.com/c/go/+/139418 (os: add UserHomeDir) because they're running without $HOME set and are using an old stage0 from before we started setting it automatically in https://go-review.googlesource.com/30599?

I could update cmd/buildlet instead, though. It'd be slightly redundant with stage0, but wouldn't require changes on the arm5 hosts.

/cc @zeebo

@gopherbot gopherbot added this to the Unreleased milestone Oct 5, 2018
@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Oct 5, 2018
@gopherbot

Change https://golang.org/cl/140177 mentions this issue: cmd/buildlet: set USER and HOME if unset on the arm5 builder

@bradfitz
Contributor Author

bradfitz commented Oct 5, 2018

Looks like the arm5 builders aren't reloading the buildlet per run like they're supposed to.

I don't know how they're configured.

@zeebo?

@bradfitz bradfitz reopened this Oct 5, 2018
@zeebo
Contributor

zeebo commented Oct 6, 2018

I have since moved on from there. I’ll try to get in contact with whoever owns these now and point them here.

I don’t know how long that will take or if they will care, so any temporary measures to fix this are fine by me.

@onionjake

I can look into this. I haven't connected to them before so it might take a bit for me to sort that out.

@onionjake

@bradfitz Looks like the process is failing because the builders run as the user builder, whose home is /home/builder, and it gets access denied trying to write to /root. Would you prefer that I set $USER and $HOME properly so you don't need the workaround at all?

@bradfitz
Contributor Author

Which process is failing?

Actually, let's back up. Can you describe how those machines are configured? Is there a systemd unit? What's it run, or what's its complete definition? If not systemd, what runs the loop that connects to the coordinator?

I can then document all this in our repo.

@onionjake

It uses daemontools. I added the exports for USER and HOME just now.

root@go-builder-3:~# cat /etc/service/stage0/run 
#!/bin/bash

set -e

SCRATCH=~builder/stage0scratch
export TMPDIR=${SCRATCH}/tmp
export WORKDIR=${SCRATCH}/workdir

mkdir -p ${TMPDIR}
mkdir -p ${WORKDIR}
chown -R builder:builder ${SCRATCH}
cd ${SCRATCH}

export GO_BUILDER_ENV=linux-arm-arm5spacemonkey
export GO_TEST_TIMEOUT_SCALE=5
export USER=builder
export HOME=/home/builder
exec bash -c "setuidgid builder /usr/local/bin/stage0 2>&1 | logger"

The builder was looking for the build key in /root/ and got permission denied. I think the builder is running again now that I've set USER and HOME correctly.

@bradfitz
Contributor Author

I see 2 connected now at https://farmer.golang.org/ ...

host-linux-arm5spacemonkey: 2/2 (1 missing)

There used to be 3, though. Is one still coming online?

@bradfitz
Contributor Author

Never mind, the third just showed up.

@onionjake

onionjake commented Oct 12, 2018

Looks like two different failures now?

# _/home/builder/stage0scratch/workdir/go/misc/cgo/test.test
/home/builder/stage0scratch/workdir/go/pkg/tool/linux_arm/link: flushing /home/builder/stage0scratch/tmp/go-link-851625167/go.o: write /home/builder/stage0scratch/tmp/go-link-851625167/go.o: no space left on device
FAIL	_/home/builder/stage0scratch/workdir/go/misc/cgo/test [build failed]
##### ../misc/cgo/testplugin
PASS
something
# command-line-arguments
/home/builder/stage0scratch/workdir/go/pkg/tool/linux_arm/link: running arm-linux-gnueabi-gcc failed: exit status 1
collect2: error: ld returned 1 exit status

2018/10/11 23:21:42 Failed: exit status 2
2018/10/11 23:21:47 FAILED

I will double-check the disk space.

@onionjake

It looks like a lot of stuff is left around in tmp. Should that be cleaned between runs?

root@go-builder-2:/home/builder/stage0scratch# du -hs workdir/ tmp
619M    workdir/
1.2G    tmp

@bradfitz
Copy link
Contributor Author

@dmitshur, can you investigate the tempdir situation and figure out who's responsible for cleaning it but isn't?

I suspect what should happen (but likely isn't happening) is that the buildlet figures out the temp dir it's been given (using https://golang.org/pkg/os/#TempDir), makes a subdirectory under that, and sets the appropriate environment variable(s) for all child processes (in handleExec). Then on graceful shutdown (handleHalt) it can clean up its own directories (as best it can), and after an ungraceful shutdown it can instead nuke that subdirectory, if it exists, on the next start-up.

So os.TempDir returns (likely) /tmp, and then on start-up we nuke /tmp/buildlet, and then all child processes run with, say, TMP=/tmp/buildlet and on shutdown we nuke /tmp/buildlet.

I'm not sure what the story is now.

@bradfitz bradfitz added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 12, 2018
@bradfitz bradfitz changed the title x/build: spacemonkey arm5 builders broken x/build: spacemonkey arm5 builders broken, clean tmp dirs automatically Oct 12, 2018
@onionjake

I will stop the process, clean the dirs, and start it again to see if we can get some passing and make sure there are not other issues.

@onionjake

With the space cleaned it looks like all the builds are consistently failing with:

##### API check
Error running API checker: exit status 1
...
exit status 1
2018/10/12 16:39:17 Failed: exit status 1
2018/10/12 16:39:23 FAILED

I will look into whether something else might be causing this issue.

@gopherbot

Change https://golang.org/cl/144637 mentions this issue: cmd/buildlet: set up & clean TMPDIR and GOCACHE for child processes

@bradfitz
Contributor Author

Okay, the new arm5 buildlet binary is pushed.

I'll watch it for any new problems.

It'll keep itself cleaned for new stuff, but you might have to clean legacy messes.


5 participants