x/build: linux-arm builder can't finish an all.bash run when test sharding isn't used #40872

Closed
randall77 opened this issue Aug 18, 2020 · 13 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@randall77
Contributor

When I run a trybot on linux/arm, I get "out of memory" or "no space left on device" errors.

ok  	reflect	3.259s
ok  	regexp	0.949s
ok  	regexp/syntax	4.910s
# cmd/compile/internal/ssa
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x29ca71, 0x16)
	/workdir/go/src/runtime/panic.go:1116 +0x5c
runtime.sysMap(0x11400000, 0x3000000, 0x42bc70)
	/workdir/go/src/runtime/mem_linux.go:169 +0xa8
runtime.(*linearAlloc).alloc(0x41d280, 0x3000000, 0x400000, 0x42bc70, 0x0)
	/workdir/go/src/runtime/malloc.go:1447 +0x94
ok  	cmd/addr2line	28.924s
ok  	cmd/api	113.949s
ok  	cmd/asm/internal/asm	33.778s
ok  	cmd/asm/internal/lex	0.233s
# cmd/fix.test
panic: no space left on device

goroutine 1 [running]:
cmd/link/internal/ld.Main(0x3f33c0, 0x4, 0x8, 0x1, 0xd, 0xe, 0x0, 0x0, 0x285316, 0x12, ...)
	/workdir/go/src/cmd/link/internal/ld/main.go:319 +0x1c08
main.main()
	/workdir/go/src/cmd/link/main.go:68 +0x12c
/workdir/go/pkg/tool/linux_arm/vet: fork/exec /workdir/go/pkg/tool/linux_arm/vet: cannot allocate memory
/workdir/go/pkg/tool/linux_arm/vet: fork/exec /workdir/go/pkg/tool/linux_arm/vet: cannot allocate memory
/workdir/go/pkg/tool/linux_arm/vet: fork/exec /workdir/go/pkg/tool/linux_arm/vet: cannot allocate memory

Is there anything we can do to fix this? Can we get more memory and/or disk space on these builders?

I could work on making cmd/compile/internal/ssa tests take less memory, perhaps. Not sure how much we could save.

@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Aug 18, 2020
@gopherbot gopherbot added this to the Unreleased milestone Aug 18, 2020
@randall77
Contributor Author

How do the trybots succeed? Is it because they shard out the tests to multiple machines?
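
For readers unfamiliar with the term: sharding here means splitting the test list across several machines, so that no single machine has to build and run every package. Below is a minimal sketch of the idea, assuming a plain round-robin split. The helper name `shard` and the package list are illustrative only, and this is not the coordinator's actual scheduling code.

```go
package main

import "fmt"

// shard returns the tests assigned to machine i out of n machines,
// using a simple round-robin split. Hypothetical helper for illustration.
func shard(tests []string, i, n int) []string {
	var out []string
	for j, t := range tests {
		if j%n == i {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	// Package names taken from the log above; the machine count is assumed.
	tests := []string{"reflect", "regexp", "regexp/syntax", "cmd/compile/internal/ssa", "cmd/fix"}
	for i := 0; i < 3; i++ {
		fmt.Println(i, shard(tests, i, 3))
	}
}
```

With a split like this, each machine only links and runs its own subset, which would plausibly keep peak memory and temporary disk use lower than a full unsharded all.bash run.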

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 18, 2020
@dmitshur
Contributor

dmitshur commented Aug 18, 2020

Can you please include links to CLs where you've seen this? That way we'll have more information (e.g., which commit was being tested exactly, etc.). How often is this happening? Did it start recently?

The linux-arm builder is defined at https://github.com/golang/build/blob/148ff27ab5b70970002d390c9e1da4b861f6da9f/dashboard/builders.go#L1736-L1756. They run on Scaleway (also see here), so adjusting resources will be limited to what's available there (we might already be maxed out, but I need to look again to be more confident).

I see that linux-arm trybots are currently disabled because of other issues:

tryBot:            nil, // Issue 22748, Issue 22749

Is this issue about that builder when requested via SlowBots or something else?

/cc @cagedmantis @toothrot @andybons per builder owners.

@dmitshur dmitshur added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Aug 18, 2020
@randall77
Contributor Author

This happens when using gomote to run all.bash manually:

gomote create linux-arm
gomote push user-khr-linux-arm-0
gomote run user-khr-linux-arm-0 go/src/all.bash

Sorry, I guess I'm using the term "trybot" to mean both the thing that tests CLs as well as manual gomotes. I mean the latter (except in the context of my second comment).

@dmitshur dmitshur removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Aug 18, 2020
@dmitshur dmitshur changed the title x/build: linux-arm trybots can't finish a run x/build: linux-arm builder can't finish an all.bash run when test sharding isn't used Aug 18, 2020
@dmitshur
Contributor

dmitshur commented Aug 18, 2020

@cagedmantis Do you expect #36841 will be able to help with this (by enabling a linux-arm builder with bigger limits)?

@cagedmantis
Contributor

@dmitshur Yes, I'm actively working on the linux-arm-aws builder with more resources. I will assign myself to this issue.

@cagedmantis cagedmantis self-assigned this Aug 18, 2020
@dmitshur
Contributor

Oh, I believe this is the same issue as #35628. /cc @cherrymui I'll close it in favor of that one, and move your assignment @cagedmantis if you don't mind.

@cherrymui
Member

This is not exactly the same. #35628 is about trybots; this one is about "when test sharding isn't used", e.g. manual gomote runs. The trybot one has weird STALE errors, whereas this one is OOMing or running out of disk space.

About disk space, if I remember correctly, last time I looked, the machine actually has reasonably sizable disk space, but we're running on a very small partition.

@dmitshur
Contributor

We can re-open this if it'd be helpful to confirm this issue is fixed when #35628 is fixed, but as I understand, this builder is broken in all contexts other than as a post-submit builder (on build.golang.org).

@dmitshur dmitshur reopened this Aug 19, 2020
@gopherbot

Change https://golang.org/cl/249420 mentions this issue: cmd/coordinator: warn about known linux-arm SlowBot issue

gopherbot pushed a commit to golang/build that referenced this issue Aug 20, 2020
The current linux-arm builder is known to have trouble when used as
a SlowBot. Start warning about it when the builder is requested via
the TRY= SlowBot UI.

I've considered also removing or disabling the "arm" SlowBot alias,
but that would make it easier to miss that there's an issue, since
SlowBots don't warn about unknown builders:

	If you specify an unknown TRY= token, it'll just ignore it
	and won't report an error.

We can consider making further changes as this situation evolves.
The goal here is to start notifying about a known problem sooner.

For golang/go#35628.
For golang/go#40872.

Change-Id: Ibc1205720c44ec4823c632c04fc2f887368258c1
Reviewed-on: https://go-review.googlesource.com/c/build/+/249420
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
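
The behavior described in this commit message can be sketched roughly as follows: tokens with a known issue produce a warning, while unknown tokens are silently ignored. This is an illustration under stated assumptions, not the coordinator's real code. The names `knownIssues` and `warnings`, the comma-separated `TRY=` format, and the message wording are all hypothetical; only the "arm" alias and issue number 35628 come from the thread.

```go
package main

import (
	"fmt"
	"strings"
)

// knownIssues maps a SlowBot token to a tracking issue number.
// Illustrative data only; the real builder configuration lives elsewhere.
var knownIssues = map[string]int{
	"arm":       35628,
	"linux-arm": 35628,
}

// warnings returns one message per requested token that has a known issue.
// Tokens that are not in the map are skipped without any error, mirroring
// the "unknown TRY= tokens are ignored" behavior quoted above.
func warnings(tryLine string) []string {
	var msgs []string
	spec := strings.TrimPrefix(tryLine, "TRY=")
	for _, tok := range strings.Split(spec, ",") {
		tok = strings.TrimSpace(tok)
		if issue, ok := knownIssues[tok]; ok {
			msgs = append(msgs, fmt.Sprintf("%s: known issue golang.org/issue/%d", tok, issue))
		}
	}
	return msgs
}

func main() {
	for _, m := range warnings("TRY=arm,bogus") {
		fmt.Println(m)
	}
}
```

Keeping the alias and attaching a warning to it, rather than deleting it, means a request for the affected builder still produces visible feedback instead of being dropped silently.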
@gopherbot

Change https://golang.org/cl/270517 mentions this issue: dashboard: remove known issue label from linux-arm-aws builder

gopherbot pushed a commit to golang/build that referenced this issue Nov 16, 2020
The linux-arm-aws builder was initially labeled with a known issue
because it was experimental. The builder has been tested and is no
longer considered experimental.

Fixes golang/go#41867
Updates golang/go#40872
Updates golang/go#35628

Change-Id: I61f43f2c2651c26d3f5d4db01b779686ddb6a92b
Reviewed-on: https://go-review.googlesource.com/c/build/+/270517
Trust: Carlos Amedee <carlos@golang.org>
Run-TryBot: Carlos Amedee <carlos@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
@erikwilson

I also ran into a similar "panic: no space left on device" error building Go on an RPi4.
In my case the /tmp partition is only 64MB, and it looks like the Go build needs slightly more.
It might be nice to put a check in the build script, or to have a better error message naming the specific location, since those files are cleaned up afterwards.
As a workaround, I was able to export GOTMPDIR=/dev/shm and the build succeeded.
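
A pre-build check along these lines could look like the sketch below. It is a rough, Linux-oriented illustration, not part of the Go build scripts. The helper names `freeBytes` and `pickTmpDir` are hypothetical, and the 128 MiB threshold is an assumed value, not a measured requirement.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// freeBytes reports the space available to unprivileged users on the
// filesystem containing dir. It relies on syscall.Statfs (Unix only).
func freeBytes(dir string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(dir, &st); err != nil {
		return 0, err
	}
	return uint64(st.Bavail) * uint64(st.Bsize), nil
}

// pickTmpDir returns the first candidate directory that has at least
// need bytes free according to the supplied free function.
func pickTmpDir(candidates []string, need uint64, free func(string) (uint64, error)) (string, bool) {
	for _, dir := range candidates {
		if n, err := free(dir); err == nil && n >= need {
			return dir, true
		}
	}
	return "", false
}

func main() {
	const need = 128 << 20 // assumed threshold, not a measured requirement
	dir, ok := pickTmpDir([]string{os.TempDir(), "/dev/shm"}, need, freeBytes)
	if !ok {
		fmt.Fprintln(os.Stderr, "no temporary directory with enough free space")
		os.Exit(1)
	}
	// Print a line that can be evaluated by the shell before running the build.
	fmt.Printf("export GOTMPDIR=%s\n", dir)
}
```

Passing the free-space lookup as a function keeps the selection logic testable without touching a real filesystem.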

@cagedmantis
Contributor

Instead of dedicating more time to the linux-arm builder, which is hosted on Scaleway, I think it may be best to replace the current linux-arm builder with the one hosted on AWS. The new builders have additional resources, which should resolve all of these issues. Please comment if you disagree with this plan.

@gopherbot

Change https://golang.org/cl/303230 mentions this issue: dashboard: add linux-arm-aws to trybots

@golang golang locked and limited conversation to collaborators Apr 2, 2022