cmd/compile: rangegen.go test killed on ppc64 #65725
Comments
I bumped it later that day to 10G, and set up a 5G swap file on the ppc64le builder. Hopefully that is enough to keep things running, and hopefully faster. |
Thanks for the update. For what it's worth, I got this failure on https://go.dev/cl/564137 at around Feb 14, 6pm EST. It sounds like that may be after your latest change? (This could be a bug in that WIP CL, but it seems unlikely that such a bug would fail just this one test.) |
Looking at the VM, it did trigger an OOM on a container. The syslog claims 7.5G of that is "file" usage. Is there any way to make LUCI more conservative with its disk usage? This seems like a pretty big jump from the old CI, which also ran entirely on a tmpfs. |
That's a good question. I wonder how much disk this test used on the old infra. I know this test does generate an absolutely massive source file. cc @golang/release |
Found new dashboard test flakes for:
2024-02-15 14:35 gotip-linux-ppc64-power10 go@cfe7f21d cmd/internal/testdir.Test/rangegen.go (log)
|
I didn't realize the ppc64 LUCI builder needed a bump too. It is now bumped to 10G. If this doesn't crash again in the next week, this issue can be closed. |
Poking around at the idle ppc64le builders, there is a folder "/home/swarming/.cache/gopls" which is consuming between 2.5G and 3.3G on each instance. That's a problem. Is it possible to update LUCI to clean up caches at the end of a test run? |
I recall @adonovan used a GOPLSCACHE mechanism to help with that in the previous infrastructure (CL 494297). I'm not seeing that in the LUCI infrastructure—perhaps it needs to be ported over. There's some relevant discussion in this thread. CC @mknyszek. |
For reference, this is what has accrued on linux-ppc64le-power8--05 since the last container reboot:
|
Yes, that was an effective fix for the problems of this kind we saw in the older builders. It should be as simple as setting GOPLSCACHE to a temp directory for the entire run. |
Found new dashboard test flakes for:
2024-02-16 14:59 gotip-linux-ppc64-power10 go@5258d4ed cmd/internal/testdir.Test/rangegen.go (log)
|
Found new dashboard test flakes for:
2024-02-16 15:12 gotip-linux-ppc64le go@3b515812 cmd/internal/testdir.Test/rangegen.go (log)
2024-02-16 15:51 go1.22-linux-ppc64le release-branch.go1.22@d6a27193 cmd/internal/testdir.Test/rangegen.go (log)
2024-02-16 16:53 gotip-linux-ppc64le go@7f799f33 cmd/internal/testdir.Test/rangegen.go (log)
2024-02-16 18:13 gotip-linux-ppc64le go@a0226c56 cmd/internal/testdir.Test/rangegen.go (log)
2024-02-16 20:25 gotip-linux-ppc64-power10 go@cdd0ddaf cmd/internal/testdir.Test/rangegen.go (log)
2024-02-17 00:13 gotip-linux-ppc64-power10 go@e41fabd6 cmd/internal/testdir.Test/rangegen.go (log)
2024-02-17 00:13 gotip-linux-ppc64le go@e41fabd6 cmd/internal/testdir.Test/rangegen.go (log)
2024-02-19 08:55 gotip-linux-ppc64-power10 go@5c92f43c cmd/internal/testdir.Test/rangegen.go (log)
2024-02-19 20:44 gotip-linux-ppc64le go@0882ca7a cmd/internal/testdir.Test/rangegen.go (log)
2024-02-20 14:56 gotip-linux-ppc64-power10 go@098a87fb cmd/internal/testdir.Test/rangegen.go (log)
2024-02-20 17:57 gotip-linux-ppc64le go@c1828fbc cmd/internal/testdir.Test/rangegen.go (log)
2024-02-20 18:06 gotip-linux-ppc64-power10 go@67361bf8 cmd/internal/testdir.Test/rangegen.go (log)
2024-02-20 18:06 gotip-linux-ppc64le go@67361bf8 cmd/internal/testdir.Test/rangegen.go (log)
|
Found new dashboard test flakes for:
2024-02-20 20:44 gotip-linux-ppc64le go@4ce008d7 cmd/internal/testdir.Test/rangegen.go (log)
2024-02-20 21:02 gotip-linux-ppc64le go@de65aa41 cmd/internal/testdir.Test/rangegen.go (log)
|
I've set |
@adonovan @pmur Can you clarify what you mean by "a temp directory"? Is the |
Yes, a temp directory that lasts for a complete run of tests at a single CL is ideal. Thanks. |
Sent crrev.com/c/5314212. I created an explicit subdirectory in the workdir for it which will have the same effect, and I thought it might be a bit clearer to have it next to the |
The change landed, so expect this to roll out over the next half hour or so. |
I confirmed that the environment variable is now set in new builds to a directory that will definitely get wiped on each run. Hopefully this should be resolved. Closing optimistically. |
Found new dashboard test flakes for:
2024-02-21 17:22 gotip-linux-ppc64-power10 go@cd170327 cmd/internal/testdir.Test/rangegen.go (log)
|
Found new dashboard test flakes for:
2024-04-09 04:07 gotip-linux-ppc64_power8 go@9f3f4c64 cmd/internal/testdir.Test/rangegen.go (log)
|
I set up some of the new builders to use 8G instead of 10G memory limits, which can OOM rangegen tests. That has since been resolved for the last week or so. |
Test/rangegen.go is occasionally getting killed on ppc64 builders, e.g., https://ci.chromium.org/ui/p/golang/builders/ci/gotip-linux-ppc64le/b8756201558164082785/test-results?sortby=&groupby=
From the history, this seems to be a recent regression: https://ci.chromium.org/ui/test/golang/cmd%2Finternal%2Ftestdir.Test%2Frangegen.go?q=V%3Ago_branch%3Dmaster+V%3Agoos%3Dlinux+V%3Ahost_goos%3Dlinux+
Note that this test is very large and has caused OOMs before (#64789).
cc @golang/ppc64