cmd/api: occasional hangs on 'go list' in NewWalker without useful output #45884

bcmills · 2021-04-30T15:53:03Z

2021-04-23T20:57:54-691e1b8/linux-mips64le-mengzhuo (12 minutes on go list)
2021-04-23T13:48:10-bedfeed/linux-mips-rtrk (12 minutes on go list)
2021-04-21T20:24:34-2550563/linux-mips64le-mengzhuo (12 minutes on go list)

This is probably a bug in one or more of the go command, the os package, or the Linux platform on MIPS.

However, this failure output isn't giving us much useful information. NewWalker should accept a context.Context. We could then derive that context from the test's deadline and, when it expires, send SIGQUIT to go list so that it dumps its goroutine stacks, which would give us much more useful output on failure.


bcmills · 2021-05-05T01:13:15Z

Also seen on a linux-arm-aws TryBot:
https://storage.googleapis.com/go-build-log/9041f75f/linux-arm-aws_6baaa067.log

bcmills · 2021-05-05T01:14:37Z

I suspect that this is the same deadlock observed in #45916 (CC @prattmic @mknyszek).

bcmills · 2021-05-05T01:17:45Z

2021-05-03T01:35:44-2c9f5a1/linux-mips64le-mengzhuo
2021-04-30T19:38:25-d19eece/aix-ppc64
2021-04-28T18:03:21-90614ff/openbsd-amd64-64
2021-04-23T20:57:54-691e1b8/linux-mips64le-mengzhuo
2021-04-23T13:48:10-bedfeed/linux-mips-rtrk
2021-04-21T20:24:34-2550563/linux-mips64le-mengzhuo
2021-04-21T04:26:11-69c94ad/openbsd-arm64-jsing

bcmills · 2021-05-05T01:18:31Z

Marking as release-blocker for Go 1.17, because this is looking like a regression introduced sometime in late April.

bcmills · 2021-05-05T01:20:05Z

(It's not obvious to me whether the regression is in the runtime or in cmd/go, but from the odd stack traces in #45916 I initially suspect the runtime.)

prattmic · 2021-05-05T15:37:50Z

The earliest failure here (by filename timestamp) is 2021-04-21T04:26:11-69c94ad/openbsd-arm64-jsing. That ran at 69c94ad, which is before 7e97e4e and ecfce58, two of my top contenders as potentially problematic. So if these issues are all related, those may not be involved.

toothrot · 2021-05-06T18:11:28Z

/cc @dmitshur

gopherbot · 2021-05-10T21:00:01Z

Change https://golang.org/cl/318569 mentions this issue: runtime: hold sched.lock across atomic pidleget/pidleput

prattmic · 2021-05-10T21:09:35Z

It is hard to be certain without more complete stacks, but I think http://golang.org/cl/318569 will fix this. That is, I think #45975, #45916, #45885, and #45884 all have the same cause.

As a cleanup, golang.org/cl/307914 unintentionally caused the idle GC work recheck to drop sched.lock between acquiring a P and committing to keep it (once a worker G was found).

This is unsafe, as releasing a P requires extra checks once sched.lock is taken (such as for runSafePointFn). Since checkIdleGCNoP does not perform these extra checks, we can now race with other users.

In the case of #45975, we may hang with this sequence:

1. M1: checkIdleGCNoP takes sched.lock, gets P1, releases sched.lock.
2. M2: forEachP takes sched.lock, iterates over sched.pidle without finding P1, releases sched.lock.
3. M1: checkIdleGCNoP puts P1 back in sched.pidle.
4. M2: forEachP waits forever for P1 to run the safePointFn.

Change back to the old behavior of releasing sched.lock only after we are certain we will keep the P. Thus if we put it back, its removal from sched.pidle was never visible.

Fixes #45975
For #45916
For #45885
For #45884

Change-Id: I191a1800923b206ccaf96bdcdd0bfdad17b532e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/318569
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
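The hang sequence in the CL description can be replayed deterministically in a toy model. This sketch is hypothetical: toySched, scanIdle, and missedIdleP are illustrative names, not runtime functions, and the real forEachP, checkIdleGCNoP, and idle-P list handling do far more. It only shows why releasing the lock between taking a P off the idle list and putting it back makes the temporary removal visible to a concurrent scanner, while holding the lock across both steps does not.

```go
package main

import (
	"fmt"
	"sync"
)

// toySched is a hypothetical, heavily simplified stand-in for the runtime's
// scheduler state: a set of idle P ids guarded by a lock (loosely modeled on
// sched.lock and sched.pidle). It is NOT the runtime's code.
type toySched struct {
	mu    sync.Mutex
	pidle map[int]bool
}

// scanIdle models the part of forEachP that, holding the lock, records which
// Ps are idle (and so can have the safe-point function run on their behalf).
// Any P not recorded here is expected to run the function itself.
func (s *toySched) scanIdle() map[int]bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	seen := make(map[int]bool, len(s.pidle))
	for id := range s.pidle {
		seen[id] = true
	}
	return seen
}

// missedIdleP replays, in a fixed sequential order, M1 taking P1 off the idle
// list and putting it back while M2 runs scanIdle. It reports whether P1 ends
// up idle but absent from M2's scan, i.e. forEachP would wait forever.
func missedIdleP(holdLockAcrossGetPut bool) bool {
	s := &toySched{pidle: map[int]bool{1: true}}
	var seen map[int]bool
	if holdLockAcrossGetPut {
		// Fixed ordering: get and put share one critical section, so a scan
		// before or after it always finds P1 on the idle list.
		s.mu.Lock()
		delete(s.pidle, 1) // M1 gets P1...
		s.pidle[1] = true  // ...and, finding no work, puts it back.
		s.mu.Unlock()
		seen = s.scanIdle() // M2 scans afterwards.
	} else {
		s.mu.Lock()
		delete(s.pidle, 1)  // Step 1: M1 gets P1...
		s.mu.Unlock()       // ...and releases the lock.
		seen = s.scanIdle() // Step 2: M2 scans; P1 is not on the list.
		s.mu.Lock()
		s.pidle[1] = true // Step 3: M1 puts P1 back.
		s.mu.Unlock()
	}
	// Step 4: an idle P missing from the scan will never run safePointFn.
	for id := range s.pidle {
		if !seen[id] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("lock dropped between get and put, P missed:", missedIdleP(false))
	fmt.Println("lock held across get and put, P missed:", missedIdleP(true))
}
```

The mutex does no real work in this single-goroutine replay; it marks where the critical sections begin and end so the two orderings can be compared side by side.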

prattmic · 2021-05-11T14:42:50Z

I think this bug should now be fixed, but I can't be sure. Please note here if you see more occurrences.

heschi · 2021-05-20T18:28:18Z

The last 30 commits, roughly corresponding to the last week, show no signs of this problem. Closing.

bcmills added the Testing and NeedsFix labels Apr 30, 2021

bcmills added this to the Backlog milestone Apr 30, 2021

bcmills mentioned this issue Apr 30, 2021: cmd/compile: apparent deadlocks on builders #45885 (closed)

bcmills changed the title from "cmd/api: occasional hangs on 'go list' in NewWalker on MIPS without useful output" to "cmd/api: occasional hangs on 'go list' in NewWalker without useful output" May 5, 2021

bcmills added the release-blocker and NeedsInvestigation labels May 5, 2021

bcmills modified the milestones: Backlog, Go1.17 May 5, 2021

gopherbot removed the NeedsFix label May 5, 2021

bcmills mentioned this issue May 5, 2021: runtime,sync/atomic: deadlocks in sync/atomic_test.TestValueSwapConcurrent #45975 (closed)

prattmic mentioned this issue May 10, 2021: x/build: Builders takes too long to complete tests #45916 (closed)

prattmic added the WaitingForInfo label May 11, 2021

heschi closed this as completed May 20, 2021

golang locked and limited conversation to collaborators May 20, 2022

gopherbot added the FrozenDueToAge label May 20, 2022
