runtime: hangs in TestGdbAutotmpTypes with various GOMAXPROCS on ppc64le; setting GOEXPERIMENT=nopacerredesign avoids the hangs #49852
Comments
This is likely a test that assumes a GC won't run, but then it does and breaks the test. We've run into and fixed many such tests at this point in the release cycle. I can look into it, but my guess is this is a test-only failure with pretty low severity. |
Also, since it hangs it should be straightforward to send it a |
I used GOMAXPROCS=32. It does not hang for GOMAXPROCS<=16. |
Ugh. It looks like it's blocking on a child process that is the actual culprit. |
I can't trivially reproduce this myself because we're missing This may be why we haven't seen this failure on the builders. I have a CL out to roll back the heap minimum, https://golang.org/cl/368137, could you try that out and let me know if it helps? Given the program that's executing I'm very surprised that the GC is executing at all (I mean it must be if the pacer changes are to blame in any way), but maybe there's more to this than meets the eye. My next steps to getting to the bottom of this are to introduce a line like |
I took a look to understand why the child is deadlocking. The gdb process is waiting (in the poll syscall) for the parent to read its stdout pipe. It looks like the goroutine responsible for reading the pipe is halted for GC. |
This runtime test no longer hangs with CL 368137 under the same conditions as reported above. |
Alrighty, good to know. We should still understand why the test is failing. Either a GC really isn't allowed here so we should figure out how best to disable a GC in the child, or we fix the test to tolerate additional GC cycles. |
I suspect I am missing some details, but how do we generically avoid deadlocks like these when using (*exec.Cmd).CombinedOutput? If I understand it correctly, this method implicitly creates a goroutine to slurp up stdout/stderr from the child via a pipe, and then parks itself in a blocking wait syscall. If a GC halts the slurping goroutine, and then the child blocks waiting on the other end of the pipe to slurp, and then the GC is waiting to halt the goroutine stuck in the wait syscall, we are deadlocked. How do we avoid this? |
Goroutines that block in syscalls can be scanned by the GC without issue, generally speaking. The GC grabs the goroutine by setting its Gscan bit in its status, and upon exiting the syscall the goroutine tries to reacquire its own status, spinning until the GC is done with it. There have been some subtle deadlocks around this code before, but if this was a problem with even a small fraction of |
This is effectively mitigated for this release since https://golang.org/cl/368137 went in, so I don't think it should be marked a release blocker anymore. We've basically confirmed that it's not a new bug in the release because we know the difference is that a GC was happening during this test somewhere where it wasn't happening before (which in theory should not change anything -- the rest of the GC has had very few changes this cycle). We should still keep this open, and I intend to continue investigating, but this is going to be lower priority for me. If you want to reproduce now, you'll have to also specify |
Taking another look at this one, I see different behavior now. The gdb command starts, hits the breakpoint I wonder if gdb is halting the other threads while executing the step, and we happen to get preempted with a GC operation. |
Change https://golang.org/cl/370775 mentions this issue: |
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Does not fail on go 1.17
What operating system and processor architecture are you using (go env)?
Different ppc64le systems; it fails on power8 ubuntu, power9 rhel8, and a power10 (debian I think).
Behavior described below is the same on all systems.
What did you do?
We have seen intermittent hangs in the runtime tests on various machines. When tracking it down I found it was hanging in TestGdbAutotmpTypes.
On my power9 RH8 I can make it hang consistently with this:
go test -c
export GOMAXPROCS=64
./runtime.test -test.run=TestGdbAutotmpTypes -test.cpu=1 -test.count=10
If I set GOEXPERIMENT=nopacerredesign it doesn't hang.
It doesn't consistently hang if count=1.
It doesn't seem to hang with GOMAXPROCS=2, but I didn't try many experiments.
It does not hang on go 1.17.
What did you expect to see?
PASS
What did you see instead?
Test hangs, and viewing the processes shows it is trying to run TestGdbAutotmpTypes.