runtime: TestReadMetrics failures #60607
Comments
Found new dashboard test flakes for:
2023-06-05 16:41 windows-arm64-11 go@587c1c19 runtime.TestReadMetrics (log)
|
I've been having the issue a lot, and I think it's caused by CL 497315 |
This is a test-only issue. I believe there are a few legitimate ways this can happen. I'll fix it. |
On second thought, I don't see how this could happen. The test is fairly strong in terms of its consistency. (World is stopped, caches and stats are flushed.) There should be no skew that makes this possible. I can't reproduce on linux/amd64, so the fact that this happened on windows/arm64 is interesting. Maybe there's something odd happening on weak memory architectures. |
@RuinanSun Do you happen to have more details as to where you encountered this error and what you did to reproduce it? |
Ah, nope. Sorry, I had this in my head before, but when I came back to this I forgot. 😅 The issue is that the GC can double-count objects as live if two GC workers race to mark the same object. This is intentional and inconsequential; it would make the GC significantly slower for one worker to have to take ownership of marking an object. Marking is idempotent by design. The race is rare, and the rest of the runtime is already robust to it. This is indeed a test-only issue. That being said, I'm not sure what to change the test to. That will take some thought. |
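To illustrate the double-counting described above, here is a minimal sketch, not the runtime's actual marking code. The `object` and `worker` types and the `serialMark`/`raceMark` helpers are hypothetical names invented for this example. It replays, deterministically and on a single goroutine, the interleaving in which two workers both observe an object as unmarked before either sets the mark bit: the mark bit ends up correct (marking is idempotent), but the object's size is added to the live total twice.

```go
package main

import "fmt"

// object is a hypothetical heap object with a mark bit and a size.
type object struct {
	marked bool
	size   uint64
}

// worker models one GC worker's marking step, split into the two halves
// that can interleave with another worker. This is an illustration only.
type worker struct {
	sawUnmarked bool
}

// check records whether the object looked unmarked to this worker.
func (w *worker) check(o *object) { w.sawUnmarked = !o.marked }

// commit sets the mark bit and, if the earlier check saw the object as
// unmarked, adds the object's size to the live-byte total.
func (w *worker) commit(o *object, live *uint64) {
	if w.sawUnmarked {
		o.marked = true // setting an already-set bit is harmless
		*live += o.size // but the accounting is not idempotent
	}
}

// serialMark replays the common case: worker a finishes before b starts.
func serialMark(o *object) (live uint64) {
	var a, b worker
	a.check(o)
	a.commit(o, &live)
	b.check(o)
	b.commit(o, &live)
	return live
}

// raceMark replays the rare case: both workers check before either commits.
func raceMark(o *object) (live uint64) {
	var a, b worker
	a.check(o)
	b.check(o)
	a.commit(o, &live)
	b.commit(o, &live)
	return live
}

func main() {
	fmt.Println("serial live bytes:", serialMark(&object{size: 64}))
	fmt.Println("raced live bytes: ", raceMark(&object{size: 64}))
}
```

In the raced replay the object is still marked exactly once, which is why the rest of the runtime is unaffected; only a test that expects the live-object totals to match exactly would notice.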
Change https://go.dev/cl/501858 mentions this issue: |
Found new dashboard test flakes for:
2023-10-31 20:47 linux-amd64-longtest go@b11defea runtime.TestReadMetrics (log)
|
Found new dashboard test flakes for:
2023-10-31 20:38 darwin-amd64-longtest go@66b8107a runtime.TestReadMetrics (log)
|
Found new dashboard test flakes for:
2023-10-31 20:50 darwin-amd64-longtest go@d2f3a68b runtime.TestReadMetrics (log)
|
Found new dashboard test flakes for:
2023-11-01 17:49 linux-amd64-longtest go@b7a695bd runtime.TestReadMetrics (log)
|
I suspect a recent change got some accounting wrong. I'll try and bisect. |
I believe I've bisected it down to e293c4b. |
Reproducer:
|
I have a suspicion this is a latent issue that's just cropping up now. My best guess is that stack movement happens between |
Oof, yeah, this is going to be very difficult to avoid. We can't stay on the system stack for I'm inclined to just disable the test for |
Found new dashboard test flakes for:
2023-11-01 17:49 darwin-amd64-longtest go@b7a695bd runtime.TestReadMetrics (log)
|
Found new dashboard test flakes for:
2023-11-02 03:57 linux-amd64-longtest go@e5ef4846 runtime.TestReadMetrics (log)
2023-11-02 04:17 linux-amd64-longtest go@11677d98 runtime.TestReadMetrics (log)
|
I had a realization as to how we could work around the constraints of |
Change https://go.dev/cl/539117 mentions this issue: |
Found new dashboard test flakes for:
2023-11-02 08:05 linux-amd64-longtest go@4e896d17 runtime.TestReadMetrics (log)
|
Found new dashboard test flakes for:
2023-11-02 19:15 darwin-amd64-longtest go@2ffe600d runtime.TestReadMetrics (log)
|
Found new dashboard test flakes for:
2023-11-02 19:37 darwin-amd64-longtest go@f31a030e runtime.TestReadMetrics (log)
|
Found new dashboard test flakes for:
2023-11-03 15:06 linux-amd64-longtest go@1764da77 runtime.TestReadMetrics (log)
|
Found new dashboard test flakes for:
2023-11-03 15:20 windows-amd64-longtest go@e2d9574b runtime.TestReadMetrics (log)
|
Change https://go.dev/cl/539695 mentions this issue: |
ReadMetricsSlow was updated to call the core of readMetrics on the systemstack to prevent issues with stat skew if the stack gets moved between readmemstats_m and readMetrics. However, readMetrics calls into the map implementation, which has race instrumentation. The system stack typically has no racectx set, resulting in crashes.

Donate racectx to g0 like the tracer does, so that these accesses don't crash.

For #60607.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-race
Change-Id: Ic0251af2d9b60361f071fe97084508223109480c
Reviewed-on: https://go-review.googlesource.com/c/go/+/539695
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
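For readers outside the runtime, the kind of comparison the test makes can be approximated with public APIs only. This is a rough sketch, not the test itself: it assumes that `/memory/classes/heap/objects:bytes` from `runtime/metrics` is the counterpart of `MemStats.HeapAlloc`, and the `heapObjectBytes` helper is a name made up for this example. Because the two reads below are separate calls with the world running in between, the values may legitimately differ here; the runtime-internal `ReadMetricsSlow` helper exists precisely to take both readings without such skew.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// heapObjectBytes reads the bytes occupied by heap objects through the
// runtime/metrics API. The bool is false if the metric is unsupported
// or does not have the expected uint64 kind.
func heapObjectBytes() (uint64, bool) {
	s := []metrics.Sample{{Name: "/memory/classes/heap/objects:bytes"}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return s[0].Value.Uint64(), true
}

func main() {
	// First reading: the older MemStats API.
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	// Second reading: runtime/metrics. Allocation may happen between the
	// two calls, so an exact match is not guaranteed in user code.
	b, ok := heapObjectBytes()
	if !ok {
		fmt.Println("metric not available in this Go version")
		return
	}
	fmt.Printf("MemStats.HeapAlloc=%d metrics=%d\n", ms.HeapAlloc, b)
}
```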
Found new dashboard test flakes for:
2023-11-03 16:11 linux-amd64-race go@6a32ecc0 runtime.TestReadMetrics (log)
|
Found new dashboard test flakes for:
2023-11-03 16:11 freebsd-amd64-race go@6a32ecc0 runtime.TestReadMetrics (log)
2023-11-03 16:11 linux-arm64-race go@6a32ecc0 runtime.TestReadMetrics (log)
|
Found new dashboard test flakes for:
2023-11-03 16:11 darwin-amd64-race go@6a32ecc0 runtime.TestReadMetrics (log)
2023-11-03 16:11 linux-amd64-longtest-race go@6a32ecc0 runtime.TestReadMetrics (log)
2023-11-03 16:11 linux-s390x-ibm-race go@6a32ecc0 runtime.TestReadMetrics (log)
2023-11-03 16:11 windows-amd64-race go@6a32ecc0 runtime.TestReadMetrics (log)
|
These are race mode failures from before https://go.dev/cl/539695. I believe these are all the failures, so they should stop trickling in now. |
Issue created automatically to collect these failures.
Example (log):
— watchflakes