runtime: android/arm and android/arm64 builders hang in bootstrap #35554
Comments
Hmmm. I tested that CL on android/arm64 manually and with trybots, and both worked fine. Not sure why it has started failing now... |
Note the android/arm* builders have been timing out for a week: |
Yeah, I submitted that CL roughly a week ago, but I tested it before that, and it worked fine. I also tested the corresponding ARM32 CL (https://go-review.googlesource.com/c/go/+/202338) and it worked fine. And the android/arm builder showed ok after that CL was submitted. |
FWIW, darwin/arm64 can also hang. I killed this one with SIGQUIT: https://build.golang.org/log/294e3f1975bd1cd17bd05e8cf4c0f4ec83dd3a00 |
@eliasnaur Could you test whether disabling async preemption fixes the timeout? |
There's also the env variable |
Thank you. Setting |
Thanks. Could you get a stack trace with GOTRACEBACK=crash set? |
On the android/arm builder, do we build an ARM64 toolchain first? (It looks that way from the log.) Is this the tip version of Go? On https://build.golang.org/?page=3&branch=master , there are some ok results on android/arm after the ARM64 preemption CL was submitted. Also, on either builder, in the build log it says
but also
I'm not sure I understand how the build process works on the android builder. |
There are currently two slightly different builders because I'm reconstructing them to be built from a setup script. You may disregard android/arm for now, it doesn't work on the new builder types yet, and is a cross compile from arm64. The android/arm64 builds are native. The old builder uses a prebuilt toolchain I made (go-android-arm64-bootstrap). The new builder uses Go from the |
Thanks @eliasnaur |
For some reason I'm now failing to reproduce the hangs :( Another crash appeared, but it seemed unrelated:
I'll file a separate issue. |
This looks like #35538 |
Good to see it doesn't hang, at least not always. Thanks! |
Got one with GOTRACEBACK=crash on android/arm: https://build.golang.org/log/aa5855e5d65bbe8361fa9b79d53c1f9be22f1804
|
And another android/arm on the old builder, but without GOTRACEBACK=crash:
|
Another from the old darwin/arm64 builder (no GOTRACEBACK=crash): |
Finally, a GOTRACEBACK=crash enabled log from the new android/arm64 builder: https://build.golang.org/log/d1bc80f88a28797a24773d9a98525e9ce8db9aa2 EDIT: and another from android/arm: https://build.golang.org/log/71a889d265d2fd092aa0bcb5cb6109eb02405764 |
Change https://golang.org/cl/206959 mentions this issue: |
Thanks for the stack traces. These are to some degree similar to the ones I saw when I debugged #35473. If so, it indicates that a signal is received when the G is bad. Could you try https://go-review.googlesource.com/c/go/+/206959 ? This doesn't fix anything, but it will make it crash instead of hanging if my assumption is right. |
Thank you, Cherry. With your CL, I got a "bad g in signal handler" for an android/arm build:
And a similar one for android/arm64:
No stack traces though, but perhaps you expected that? |
Thank you. This confirms my assumption, although I don't yet know what causes the bad g. I'll keep looking. I also want to note that it seems all the hangs happen in go_bootstrap. If I understand correctly, on Android we default to PIE, which defaults to external linking, which brings in cgo. So all binaries are effectively cgo binaries. But go_bootstrap is the only exception, due to
It attempted to bring in cgo but failed. A non-cgo externally linked PIE may still work, but it is also an unusual configuration. Cgo affects whether we use TLS to save the G. |
When we receive a signal, if G is nil we call badsignal, which calls needm. When cgo is not used, there is no extra M, so needm will just hang. In this situation, even GOTRACEBACK=crash cannot get a stack trace, as we're in the signal handler and cannot receive another signal (SIGQUIT). Instead, just crash.

For #35554.
Updates #34391.

Change-Id: I061ac43fc0ac480435c050083096d126b149d21f
Reviewed-on: https://go-review.googlesource.com/c/go/+/206959
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
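The decision described in that commit message can be sketched as a small toy model. This is not the runtime's code: the `g` type, the `outcome` values, and the `onSignal` function are hypothetical names made up for illustration, and the real handler lives in the runtime's signal code. The sketch only encodes the cases stated above: a nil G leads to badsignal and needm, needm needs an extra M that exists only with cgo, and the CL replaces the hang with a crash.

```go
package main

import "fmt"

// g stands in for the runtime's goroutine descriptor (hypothetical).
type g struct{ id int }

// outcome names what the modeled signal handler ends up doing.
type outcome string

const (
	handleNormally outcome = "handle the signal on the current g"
	borrowExtraM   outcome = "badsignal: needm borrows an extra M"
	hangInNeedm    outcome = "badsignal: needm waits forever, no extra M"
	crash          outcome = "crash: bad g in signal handler"
)

// onSignal models the cases from the commit message.
// cur is the G found when the signal arrives (nil means "bad g"),
// cgo says whether the binary uses cgo, and crashOnBadG selects
// the behavior the CL introduces for the non-cgo case.
func onSignal(cur *g, cgo, crashOnBadG bool) outcome {
	if cur != nil {
		return handleNormally
	}
	if cgo {
		// With cgo there are extra Ms for needm to pick up.
		return borrowExtraM
	}
	if crashOnBadG {
		return crash
	}
	return hangInNeedm
}

func main() {
	fmt.Println(onSignal(&g{id: 1}, false, true))
	fmt.Println(onSignal(nil, true, true))
	fmt.Println(onSignal(nil, false, false)) // the go_bootstrap hang
	fmt.Println(onSignal(nil, false, true))  // after the CL
}
```

The third call corresponds to the go_bootstrap situation in this issue: no cgo, no G, and therefore a silent hang that even SIGQUIT cannot interrupt.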
That's #31343. Do you have an idea of the scope to implement it? I don't mind working on it this cycle if it helps async preemption. |
Change https://golang.org/cl/207299 mentions this issue: |
Thanks, @eliasnaur . Does android require buildmode=pie? Does buildmode=exe work? The go_bootstrap program is only used for building the toolchain and the actual go command; it is deleted afterwards. If buildmode=exe works, we could just use that. It won't affect any user program. If not, could we try internal linking? Internal linking PIE works well on Linux/ARM64. It may work just fine on Android as well. (It doesn't work on ARM32 though. But you bootstrap from ARM64 anyway.) |
Android refuses to run non-PIE programs. See https://go-review.googlesource.com/c/go/+/170943. Fortunately it seems internal linking of non-cgo PIE programs works on android/arm64. |
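For anyone wanting to check how a particular binary was built, here is a minimal sketch that prints the build mode and cgo setting embedded in the running program. It assumes Go 1.18 or later, where `runtime/debug.ReadBuildInfo` exposes build settings; that API did not exist when this issue was filed. The `buildSettings` helper is a made-up name, and the link mode (internal vs. external) is not among the keys it reads.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// buildSettings returns the embedded build settings that matter for
// this discussion. The map is empty if the binary carries no build
// info or the toolchain did not record these keys.
func buildSettings() map[string]string {
	out := map[string]string{}
	info, ok := debug.ReadBuildInfo()
	if !ok {
		return out
	}
	for _, s := range info.Settings {
		switch s.Key {
		case "-buildmode", "CGO_ENABLED", "GOOS", "GOARCH":
			out[s.Key] = s.Value
		}
	}
	return out
}

func main() {
	for k, v := range buildSettings() {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```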
Change https://golang.org/cl/207445 mentions this issue: |
Thank you very much, Cherry! I'll keep working on CL 207299 because it's a gain independent of this issue. |
Thanks @eliasnaur . The builders are happy now. |
The android arm builders have been hanging during bootstrap. I used to think it was caused by a flaky network, because of #35553. Example:
https://farmer.golang.org/temporarylogs?name=android-arm-corellium&rev=99957b6930c76b683dbca1ff4bcdd56e59b1e035&st=0xc009856f20
Killing the build with SIGQUIT gives various stack traces:
https://build.golang.org/log/e41631f86f761d215a8a9282ab67af9dbd6397d5
https://build.golang.org/log/f878ba2305112c955d167419607afb76f8ffa78f
https://build.golang.org/log/49023d89fec5e6e62cdda9cd37bdf1d1ea9962e3
I bisected the arm64 hangs to start at https://golang.org/cl/203461, which is the enablement of arm64 preemption.
CC @cherrymui.
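For context on what the bisected CL enables, here is a minimal sketch of the behavior asynchronous preemption provides. It assumes Go 1.14 or later on a platform where async preemption is supported and where `atomic.LoadInt32` compiles without a function call (for example amd64 or arm64); `spinUntilStopped` is a made-up name. With a single P, the calling goroutine can only wake up and stop the spinner if the runtime interrupts the call-free loop, which it does by sending the thread a signal. That signal delivery is the path on which this issue's hang occurs.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// spinUntilStopped runs a goroutine that spins in a loop with no
// function calls while only one P is available, then stops it from
// the calling goroutine. It returns the number of loop iterations.
// If the spinner could not be preempted, this function would never
// return, because the caller would never get the P back.
func spinUntilStopped() uint64 {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var stop int32
	var n uint64
	done := make(chan struct{})

	go func() {
		defer close(done)
		// No call inside the loop, so no cooperative preemption point.
		for atomic.LoadInt32(&stop) == 0 {
			n++
		}
	}()

	// Parking here hands the only P to the spinner.
	time.Sleep(10 * time.Millisecond)
	// Reaching this line means the spinner was preempted.
	atomic.StoreInt32(&stop, 1)
	<-done
	return n
}

func main() {
	fmt.Println("spinner iterations before it was stopped:", spinUntilStopped())
}
```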