x/build: Builders take too long to complete tests #45916
Comments
Can you send a …
w.r.t. the line number on goroutine 22, here's the objdump from vet built at 95c5f4d:
Just looking at this, the line numbers seem reasonable (the …
Actually, …
Change https://golang.org/cl/316469 mentions this issue: …
Another one.
Was this issue first seen 2 days ago? If this is a runtime issue, http://golang.org/cl/310850 is a good candidate for potentially missing work. Ideally we could get a core file from one of these runs (@mknyszek perhaps golang.org/cl/316469 should use 'crash' after all?). @mengzhuo If you see this again, I wonder if you could use …
If we're testing through dist, we're testing the implementation of Go, so we're interested in any package failing with potential runtime issues. In these cases, we'd like to have as much relevant detail as possible, but currently runtime stack frames and goroutines are suppressed due to the default GOTRACEBACK setting.

So, try to set GOTRACEBACK to system if it's unset. Check if it's unset first so we don't override the user asking for a lower or higher level.

This change was brought up in the context of #45916, since there's an apparent deadlock (or something!) in the runtime that appears when running other code, but it's difficult to see exactly where it's blocked. However, this change is very generally useful.

This change also runs scripted tests with GOTRACEBACK=system, upgrading from GOTRACEBACK=all. Often, script tests can trigger failures deep in the runtime in interesting ways because they start many individual Go processes, so being able to identify points of interest in the runtime is quite useful.

For #45916.

Change-Id: I3d50658d0d0090fb4c9182b87200d266c7f8f915
Reviewed-on: https://go-review.googlesource.com/c/go/+/316469
Trust: Michael Knyszek <mknyszek@google.com>
Trust: Bryan C. Mills <bcmills@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
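For context, here is a minimal sketch of the behavior that commit message describes: upgrade GOTRACEBACK to "system" only when the user hasn't already chosen a level. This is not the actual cmd/dist code; the helper name `setDefaultTraceback` and the way the environment is threaded through are illustrative assumptions.

```go
// Illustrative sketch only: setDefaultTraceback is a made-up name, not the
// real cmd/dist helper. It shows the "set GOTRACEBACK=system if unset"
// behavior described in CL 316469's commit message.
package main

import (
	"fmt"
	"os"
)

// setDefaultTraceback appends GOTRACEBACK=system to the environment passed to
// test subprocesses, but only if the user hasn't already chosen a level.
func setDefaultTraceback(env []string) []string {
	if _, ok := os.LookupEnv("GOTRACEBACK"); ok {
		// The user explicitly asked for a level (lower or higher); respect it.
		return env
	}
	// Unset: default to "system" so runtime frames and system goroutines
	// appear in failure tracebacks.
	return append(env, "GOTRACEBACK=system")
}

func main() {
	env := setDefaultTraceback(os.Environ())
	fmt.Printf("passing %d variables to test subprocesses\n", len(env))
}
```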
Change https://golang.org/cl/318569 mentions this issue: …
It is hard to be certain without more complete stacks, but I think http://golang.org/cl/318569 will fix this, i.e., that #45975, #45916, #45885, and #45884 all have the same cause.
As a cleanup, golang.org/cl/307914 unintentionally caused the idle GC work recheck to drop sched.lock between acquiring a P and committing to keep it (once a worker G was found). This is unsafe, as releasing a P requires extra checks once sched.lock is taken (such as for runSafePointFn). Since checkIdleGCNoP does not perform these extra checks, we can now race with other users.

In the case of #45975, we may hang with this sequence:

1. M1: checkIdleGCNoP takes sched.lock, gets P1, releases sched.lock.
2. M2: forEachP takes sched.lock, iterates over sched.pidle without finding P1, releases sched.lock.
3. M1: checkIdleGCNoP puts P1 back in sched.pidle.
4. M2: forEachP waits forever for P1 to run the safePointFn.

Change back to the old behavior of releasing sched.lock only after we are certain we will keep the P. Thus if we put it back its removal from sched.pidle was never visible.

Fixes #45975
For #45916
For #45885
For #45884

Change-Id: I191a1800923b206ccaf96bdcdd0bfdad17b532e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/318569
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
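To make the locking discipline in that fix concrete, here is a deliberately simplified model. It is not the real runtime code: `idlePool`, `tryGetWorker`, and `haveWork` are invented names standing in for sched, checkIdleGCNoP, and the worker lookup. The point is that the lock is held from the moment the P leaves the idle list until we commit to keeping it, so putting it back is never observable to a forEachP-style iterator; under the buggy ordering, the lock was released before the decision, opening the window in steps 1-4 above.

```go
// Simplified model of the fixed locking discipline, not actual runtime code.
package main

import "sync"

type p struct{ id int }

type idlePool struct {
	mu   sync.Mutex // plays the role of sched.lock
	idle []*p       // plays the role of sched.pidle
}

// tryGetWorker models the fixed checkIdleGCNoP: the lock is held across the
// whole get-and-commit sequence, so returning the P is never visible to a
// concurrent iterator holding the same lock.
func (s *idlePool) tryGetWorker(haveWork func() bool) *p {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.idle) == 0 {
		return nil
	}
	pp := s.idle[len(s.idle)-1]
	s.idle = s.idle[:len(s.idle)-1]
	if !haveWork() {
		// No worker to run: put the P back. Because the lock was never
		// released, no forEach-style walker saw the P missing from the list.
		s.idle = append(s.idle, pp)
		return nil
	}
	return pp
}

func main() {
	pool := &idlePool{idle: []*p{{id: 1}}}
	_ = pool.tryGetWorker(func() bool { return false })
}
```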
I think this bug should be fixed, but I can't be sure. Please note here if you see more occurrences.
Take a look at https://build.golang.org now: some builders have been running for more than 20000 seconds (over 5 h).
But when I logged into the machine, it had been hung for about 5 hours.
https://farmer.golang.org/temporarylogs?name=linux-mips64le-mengzhuo&rev=95c5f4da80960d0e3511d39c9a9db7280099a37e&st=0xc0167edba0&subName=net&subRev=89ef3d95e781148a0951956029c92a211477f7f9
@bcmills