New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime/metrics: time.Sleep contributes to "user" rather than "idle" cpu class #60276
Comments
That's very odd. I would expect the M to be blocked in the netpoller, after releasing its P. Once the P is idle, all the CPU time should be going to idle time. Specifically, down here: https://cs.opensource.google/go/go/+/master:src/runtime/proc.go;l=3105?q=findRunnable&ss=go%2Fgo. I suspect whatever the problem is, it'll be easy to fix. |
Found the issue; we're using a stale "now" value for Ms that block in netpoll.
Ugh, this probably has implications for the GC CPU limiter. |
Change https://go.dev/cl/496183 mentions this issue: |
We may want to backport this? The fix is very small and we have a test. |
Change https://go.dev/cl/487215 mentions this issue: |
Change https://go.dev/cl/496117 mentions this issue: |
This was an oversight, which might cause accounted-for idle time to be lost. Noticed this while working on #60276. Change-Id: Ic743785d6dc82555e660f2c9b6aaa9dedef56ed8 Reviewed-on: https://go-review.googlesource.com/c/go/+/496117 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
Currently the CPU stats are only updated once every mark termination, but for writing robust tests, it's often useful to force this update. Refactor the CPU stats accumulation out of gcMarkTermination and into its own function. This is also a step toward real-time CPU stats. While we're here, fix some incorrect documentation about dedicated GC CPU time. For #59749. For #60276. Change-Id: I8c1a9aca45fcce6ce7999702ae4e082853a69711 Reviewed-on: https://go-review.googlesource.com/c/go/+/487215 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
Change https://go.dev/cl/497876 mentions this issue: |
This test is fundamentally flaky because of a mismatch between how internal idle time is calculated and how the test expects it to be calculated. It's unclear how to resolve this mismatch, given that it's perfectly valid for a goroutine to remain asleep while background goroutines (e.g. the scavenger) run. In practice, we might be able to set some generous lower-bound, but until we can confirm that on the affected platforms, skip the test as flaky unconditionally. For #60276. For #60376. Change-Id: Iffd5c4be10cf8ae8a6c285b61fcc9173235fbb2a Reviewed-on: https://go-review.googlesource.com/c/go/+/497876 Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
Change https://go.dev/cl/498274 mentions this issue: |
This test was introduced as a regression test for #60276. However, it was quite flaky on a number of different platforms because there are myriad ways the runtime can eat into time one might expect is completely idle. This change re-enables the test, but makes it much more resilient. Because the issue we're testing for is persistent, we now require 10 consecutive failures to count. Any single success counts as a test success. This change also makes the test's idle time bound more lenient, allowing for a little bit of time to be eaten up. The regression we're testing for results in nearly zero idle time being accounted for. If this is still not good enough to eliminate flakes, this test should just be deleted. For #60276. Fixes #60376. Change-Id: Icd81f0c9970821b7f386f6d27c8a566fee4d0ff7 Reviewed-on: https://go-review.googlesource.com/c/go/+/498274 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes. New in Go 1.20 (with the introduction of
/cpu/classes/user:cpu-seconds
), still present in tip.What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
I inspected
/cpu/classes/user:cpu-seconds
and/cpu/classes/idle:cpu-seconds
from the runtime/metrics package before and after callingtime.Sleep
.Note that CPU time accounting isn't available on the playground, so the problem doesn't reproduce there.
What did you expect to see?
I expected time during which all user goroutines are asleep to count as "idle" time.
What did you see instead?
The full duration of the
time.Sleep
call is counted towards/cpu/classes/user:cpu-seconds
. Maybe the P that is responsible for tracking the timer isn't considered to be "fully asleep"?I see there's a chance of some overestimation:
But there's also a chance of overestimation in the "idle" class. And during the sleep, no user or runtime code should be "executing".
CC @golang/runtime @mknyszek
The text was updated successfully, but these errors were encountered: