-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime/pprof: TestCPUProfileMultithreadMagnitude rusage too high on linux-arm64-aws #50097
Comments
The number of samples in the serial portion (cpuHog1) are spot-on: 10ms per sample, 100ms wall clock, 10 samples. The number of samples in the parallel portion (cpuHog3) are very close to the goal: 10ms per sample, 100ms wall clock, GOMAXPROCS=4 (this builder runs on a m6g.xlarge, 4 cores and no hyperthreading), 41 samples vs an expected 40. The failure is in the comparison of the serial portion. The profile reports 100ms, but rusage(2) reports 184ms. So this looks like another source of "extra/background/non-application CPU time that the runtime expends". (Earlier flakes were from GC.) Of the 59 total samples, taking out the 10+41 that are in the CPU hog functions the test expects, the remaining 8 are all in Of those, five point at the PC right after the epollwait syscall: go/src/runtime/sys_linux_arm64.s Line 731 in 766f89b
Two are at atomics: Line 5749 in 766f89b
Line 5818 in 766f89b
One is at a function call boundary, either before or after calling Line 19 in 766f89b
|
Change https://golang.org/cl/372800 mentions this issue: |
…tely Currently TestCPUProfileMultithreadMagnitude runs two CPU consumption functions in a single profile and then analyzes the results as separate subtests. This works fine, but when debugging failures it makes manual analysis of the profile dump a bit annoying. Refactor the test to collect separate profiles for each subtest for easier future analysis. For #50097. For #50232. Change-Id: Ia1c8bb86aaaf652e64c5e660dcc2da47d2194c2b Reviewed-on: https://go-review.googlesource.com/c/go/+/372800 Trust: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Rhys Hiltner <rhys@justin.tv> Reviewed-by: Bryan Mills <bcmills@google.com>
Was the change above intended to fix the test, or just improve debuggability? This is marked as a release blocker. |
Just improve debuggability. Fixing this still needs more investigation, which the CL helps with. |
Given that the rate of builder runs has slowed down for the code freeze, and given that these failures appear to be detecting a mostly-successful-but-incomplete fix, would it make sense to add a |
I ran this test for several hours on a I think that adding a temporary skip would be OK given that as @rhysh's mentions, this appears to be unexpected background CPU, rather than an issue with profiling itself. That said, I don't have any particular reason to believe this issue would be limited to arm64, so I'm not sure what constraints to put on the skip. @rhysh right now the test compares time in |
Any updates on this, @prattmic? Checking in as it's a release-blocker for Go 1.18. |
Change https://golang.org/cl/379535 mentions this issue: |
…tely Currently TestCPUProfileMultithreadMagnitude runs two CPU consumption functions in a single profile and then analyzes the results as separate subtests. This works fine, but when debugging failures it makes manual analysis of the profile dump a bit annoying. Refactor the test to collect separate profiles for each subtest for easier future analysis. For golang#50097. For golang#50232. Change-Id: Ia1c8bb86aaaf652e64c5e660dcc2da47d2194c2b Reviewed-on: https://go-review.googlesource.com/c/go/+/372800 Trust: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Rhys Hiltner <rhys@justin.tv> Reviewed-by: Bryan Mills <bcmills@google.com>
…readMagnitude TestCPUProfileMultithreadMagnitude compares pprof results vs OS rusage to verify that pprof is capturing all CPU usage. Presently it compares the sum of cpuHog1 samples vs rusage. However, background usage from the scheduler, GC, etc can cause additional CPU usage causing test failures if rusage is too far off from the cpuHog1 samples. That said, this test doesn't actually need to care about cpuHog1 samples. It simply cares that pprof samples match rusage, not what the breakdown of usage was. As a result, we can compare the sum of _all_ pprof samples vs rusage, which should implicitly include any background CPU usage. Fixes golang#50097. Change-Id: I649a18de5b3dcf58b62be5962fa508d14cd4dc79 Reviewed-on: https://go-review.googlesource.com/c/go/+/379535 Trust: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
Seen on a
linux-arm64-aws
TryBot for unrelated CL 370666 (https://storage.googleapis.com/go-build-log/a297f68d/linux-arm64-aws_04471a78.log):Marking as release-blocker for Go 1.18 because this test is new (as of CL 324129).
(CC @prattmic @rhysh)
The text was updated successfully, but these errors were encountered: