x/build,cmd/compile: frequent "signal killed" building cmd/compile/internal/ssa on android-arm-corellium builder since 2021-12-07 #50084

Closed
bcmills opened this issue Dec 9, 2021 · 12 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge mobile Android, iOS, and x/mobile NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Android release-blocker
Milestone

Comments

@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Dec 9, 2021
@gopherbot gopherbot added this to the Unreleased milestone Dec 9, 2021
@bcmills
Contributor Author

bcmills commented Dec 9, 2021

Release-blocker via #11811 (CC @golang/release).

@changkun and/or @steeve: could you investigate the source of these failures?

@bcmills bcmills added mobile Android, iOS, and x/mobile okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 OS-Android release-blocker labels Dec 9, 2021
@bcmills bcmills modified the milestones: Unreleased, Go1.18 Dec 9, 2021
@changkun
Member

Just a first impression: so far I have only worked on the iOS builders, not the Android ones, and @steeve probably hasn't touched them for years either. If the failures only started within the past three days, they are likely caused by newly landed commits.

@toothrot toothrot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 10, 2021
@cherrymui cherrymui removed the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label Dec 14, 2021
@aclements
Member

It seems possible that the cmd/compile/internal/ssa package just got big enough that the build is now getting OOM-killed. The next step here may be to reproduce and look at the syslog.

@bcmills bcmills changed the title x/build,cmd/compile: frequent "signal killed" on android-arm-corellium builder since 2021-12-07 x/build,cmd/compile: frequent "signal killed" building cmd/compile/internal/ssa on android-arm-corellium builder since 2021-12-07 Jan 19, 2022
@bcmills
Contributor Author

bcmills commented Jan 19, 2022

Two more, also building cmd/compile/internal/ssa:

greplogs --dashboard -md -l -e '(?ms)\Aandroid-arm.*/compile: signal: killed' --since=2022-01-04

2022-01-18T23:59:40-50869f3/android-arm-corellium
2022-01-18T21:43:02-cf5d73e/android-arm-corellium
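
For readers unfamiliar with the pattern passed to `-e` above: `(?ms)` lets `.` span newlines, and `\A` anchors the match to the very start of the log, so a log matches only if it begins with `android-arm` and later contains `/compile: signal: killed`. A small sketch of that behavior using Go's `regexp` package; the sample log text and the `matchesKilledCompile` helper are made up for illustration and are not real builder output or part of greplogs.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as in the greplogs invocation above.
var killedCompile = regexp.MustCompile(`(?ms)\Aandroid-arm.*/compile: signal: killed`)

// matchesKilledCompile is a hypothetical helper: it reports whether a whole
// build log matches the pattern.
func matchesKilledCompile(log string) bool {
	return killedCompile.MatchString(log)
}

// Synthetic log, invented for this example only.
const sampleLog = "android-arm-corellium (synthetic header)\n" +
	"ok  \texample/pkg\t0.1s\n" +
	"/tmp/go/pkg/tool/compile: signal: killed\n"

func main() {
	fmt.Println(matchesKilledCompile(sampleLog))
	// A log from another builder does not match, because \A pins
	// "android-arm" to the first bytes of the log.
	fmt.Println(matchesKilledCompile("linux-amd64\n/tmp/go/pkg/tool/compile: signal: killed\n"))
}
```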

@gopherbot

Change https://golang.org/cl/381514 mentions this issue: dashboard: set GOMAXPROCS=1 for android corellium builder
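
For context on what that setting does: the Go runtime reads `GOMAXPROCS` from the environment at startup, and the go command's `-p` build parallelism defaults to GOMAXPROCS, so exporting `GOMAXPROCS=1` should reduce how much compilation runs concurrently and therefore the peak memory use. The sketch below only shows passing the variable to a child process. It is not the code from the CL; `runWithGOMAXPROCS` is a hypothetical helper, and `sh` stands in for the real build command.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// runWithGOMAXPROCS runs the named command with GOMAXPROCS=n in its
// environment and returns its trimmed combined output.
// Hypothetical helper for illustration.
func runWithGOMAXPROCS(n int, name string, args ...string) (string, error) {
	cmd := exec.Command(name, args...)
	// When Env contains duplicate keys, os/exec uses the last value,
	// so this overrides any inherited GOMAXPROCS.
	cmd.Env = append(os.Environ(), fmt.Sprintf("GOMAXPROCS=%d", n))
	out, err := cmd.CombinedOutput()
	return strings.TrimSpace(string(out)), err
}

func main() {
	// Stand-in child: just echo the variable it received.
	out, err := runWithGOMAXPROCS(1, "sh", "-c", "echo $GOMAXPROCS")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("child saw GOMAXPROCS =", out)
}
```

The trade-off is the one discussed in this thread: builds become slower, in exchange for a smaller memory footprint on a constrained device.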

@changkun
Member

I hope we can revert the change. I have been in contact with Corellium since before Christmas. From what I have experienced, there are multiple issues with their environment (including this one) that need to be fixed:

  1. The devices may be turned off automatically for an unknown reason (their response: under investigation)
  2. The devices may be wiped automatically for an unknown reason (same response)
  3. The VPN connection may fail to be established due to a certificate error

My impression is that their environment monitor may trigger some action (for example, on OOM), after which the entire device pool is rescheduled or taken out of the running state. That would explain why devices frequently go missing from the farmer even though nobody on our side has done anything.

@ianlancetaylor
Contributor

@changkun Sorry, which change do you think we should revert?

@changkun
Member

changkun commented Feb 1, 2022

> @changkun Sorry, which change do you think we should revert?

I'm not sure what else I could be referring to; it looks like there is only one change related to this thread. My point was that, based on my recent experience maintaining these builders, the problem could be on the Corellium side. If so, we may not need to set GOMAXPROCS=1 once they have confirmed and fixed it.

@aclements
Member

We'd be happy to be able to revert and run with more parallelism, but at least to me it's not clear how this is related to the Corellium issues you mentioned. From the build logs, this is not a device disappearing or a VPN issue. The compiler is getting a signal and that's successfully being reported up through all.bash and terminating the build. Maybe I'm not understanding what you're saying?

@ianlancetaylor
Contributor

@changkun I'm sorry for being slow to understand. When I ask a specific question like "what change do you think we should revert" it really helps me a lot if you can simply provide a CL number or a git revision or something. Please feel free to also explain why it is a stupid question, but please also answer the question. Many thanks.

I'm going to guess that you mean https://go.dev/cl/381514? I don't know if that change is helping but I don't understand why reverting it would help either.

My apologies if I'm missing the point.
