
runtime: fatal error: throwOnGCWork (and general builder instability) #29124

Closed
eliasnaur opened this issue Dec 6, 2018 · 5 comments
Labels: FrozenDueToAge, NeedsInvestigation, release-blocker
Milestone: Go1.12

Comments

@eliasnaur
Contributor

Seen on several builders:

linux/amd64: https://build.golang.org/log/2ce33c8fa9981d37f3d5a81e384285c0be6df37a
android/arm: https://build.golang.org/log/34a763d60d9908f39856dd7380948081bef8fff9 (although it looks like the error is from the host running vet)

I noticed the errors because the Android builder has started to time out a lot:
https://build.golang.org/log/8b9d9127e958601e4a0a3aa71243463d814bd92e
https://build.golang.org/log/8aa5d8614243f1aea5354ba3598e66143de3249c
https://build.golang.org/log/80435e9f9573e9eec360dcb0a69772ca4ccf991a
https://build.golang.org/log/24e04fb21e1383cd002c3a575398f38cf15c7532
For this run I had to manually send SIGQUIT to a vet process that had been hung for hours:
https://build.golang.org/log/994e945a2a0f4e4ac2efdbabf9c7fedcf402553e

Timeouts also happen on several builders unrelated to mobile:
linux/amd64: https://build.golang.org/log/f1d308e0e68a514f98c242fce22f83b790cc8304
linux/amd64: https://build.golang.org/log/bebe0b5a38b0218a6814e60769645435e0a48a90
darwin/amd64: https://build.golang.org/log/75959bdf3602be17a3247ce5f1735a323392794d

@ianlancetaylor
Contributor

CC @aclements

@ianlancetaylor added the NeedsInvestigation and release-blocker labels Dec 6, 2018
@ianlancetaylor added this to the Go1.12 milestone Dec 6, 2018
@aclements
Member

> linux/amd64: https://build.golang.org/log/2ce33c8fa9981d37f3d5a81e384285c0be6df37a
> android/arm: https://build.golang.org/log/34a763d60d9908f39856dd7380948081bef8fff9

Dup of #27993. (Definitely a release-blocker. I'm actively working on this one.)

> https://build.golang.org/log/8b9d9127e958601e4a0a3aa71243463d814bd92e
> https://build.golang.org/log/8aa5d8614243f1aea5354ba3598e66143de3249c
> https://build.golang.org/log/80435e9f9573e9eec360dcb0a69772ca4ccf991a
> https://build.golang.org/log/24e04fb21e1383cd002c3a575398f38cf15c7532

Dup of #25519. Only affects debug call injection, so not high-priority. This isn't technically Android-specific, though it seems to happen a lot there, so maybe we should just disable that test on Android.

> https://build.golang.org/log/994e945a2a0f4e4ac2efdbabf9c7fedcf402553e

Stuck doing a syscall.Open?

@eliasnaur
Contributor Author

> https://build.golang.org/log/994e945a2a0f4e4ac2efdbabf9c7fedcf402553e
>
> Stuck doing a syscall.Open?

I believe the vet process was using ~100% CPU when I sent it SIGQUIT, so it was stuck in an infinite loop of some kind.

@gopherbot

Change https://golang.org/cl/154112 mentions this issue: runtime: fix hangs in TestDebugCall*

gopherbot pushed a commit that referenced this issue on Dec 17, 2018:
This fixes a few different issues that led to hangs and general
flakiness in the TestDebugCall* tests.

1. This fixes missing wake-ups in two error paths of the SIGTRAP
   signal handler. If the goroutine was in an unknown state, or if
   there was an unknown debug call status, we currently don't wake the
   injection coordinator. These are terminal states, so this resulted
   in a hang.

2. This adds a retry if the target goroutine is in a transient state
   that prevents us from injecting a call. The most common failure
   mode here is that the target goroutine is in _Grunnable, but this
   was previously masked because it deadlocked the test.

3. Related to 2, this switches the "ready" signal from the target
   goroutine from a blocking channel send to a non-blocking channel
   send. This makes it much less likely that we'll catch this
   goroutine while it's in the runtime performing that send.

4. This increases GOMAXPROCS from 2 to 8 during these tests. With the
   current setting of 2, we can have at most the non-preemptible
   goroutine we're injecting a call in to and the goroutine that's
   trying to make it exit. If anything else comes along, it can
   deadlock. One particular case I observed was in TestDebugCallGC,
   where runtime.GC() returns before the forEachP that prepares
   sweeping on all goroutines has finished. When this happens, the
   forEachP blocks on the non-preemptible loop, which means we now
   have at least three goroutines that need to run.

Fixes #25519.

Updates #29124.

Change-Id: I7bc41dc0b865b7d0bb379cb654f9a1218bc37428
Reviewed-on: https://go-review.googlesource.com/c/154112
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
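To make items 2–4 of the commit message above more concrete, here is a minimal Go sketch of the three patterns: raising GOMAXPROCS for the duration of the test, a non-blocking "ready" send from the target goroutine, and retrying injection while the target is in a transient state. It is only an illustration under assumed names: injectCall and errTransientState are hypothetical stand-ins for the runtime's internal debug-call machinery, not real APIs.

```go
// Hypothetical sketch of the patterns in items 2–4 of the commit message.
// injectCall and errTransientState are stand-in names, not the real runtime code.
package main

import (
	"errors"
	"fmt"
	"runtime"
	"time"
)

// errTransientState stands in for the case in item 2 where the target
// goroutine is in a state (e.g. _Grunnable) that prevents call injection.
var errTransientState = errors.New("target goroutine in transient state")

// injectCall is a placeholder for the real call-injection mechanism; here it
// simply fails with a transient error for the first few attempts.
func injectCall(attempt int) error {
	if attempt < 3 {
		return errTransientState
	}
	return nil
}

func main() {
	// Item 4: raise GOMAXPROCS so the injector, the target goroutine, and any
	// incidental runtime work (such as a forEachP from a concurrent GC) can
	// all make progress at the same time.
	old := runtime.GOMAXPROCS(8)
	defer runtime.GOMAXPROCS(old)

	ready := make(chan struct{}, 1)

	// Target goroutine: signal readiness, then spin in a tight loop standing
	// in for the test's non-preemptible target.
	go func() {
		// Item 3: a non-blocking send on a buffered channel, so the goroutine
		// is unlikely to be caught inside the runtime's channel-send path
		// when the call is injected.
		select {
		case ready <- struct{}{}:
		default:
		}
		for {
		}
	}()

	<-ready

	// Item 2: retry while the target is in a transient state instead of
	// giving up (which is what previously deadlocked the test).
	for attempt := 0; ; attempt++ {
		err := injectCall(attempt)
		if err == nil {
			break
		}
		if errors.Is(err, errTransientState) {
			time.Sleep(time.Millisecond)
			continue
		}
		panic(err)
	}
	fmt.Println("call injection succeeded after retries")
}
```

The buffered channel combined with select/default means the target goroutine never blocks inside the runtime's channel-send code while signalling readiness, which is exactly the window item 3 is trying to avoid.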
@aclements
Member

Closing as a dup of #27993 and #25519.
