Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/pprof: TestCPUProfileMultithreadMagnitude rusage too low #50232

Closed
prattmic opened this issue Dec 16, 2021 · 10 comments
Closed

runtime/pprof: TestCPUProfileMultithreadMagnitude rusage too low #50232

prattmic opened this issue Dec 16, 2021 · 10 comments
Assignees
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Linux release-blocker
Milestone

Comments

@prattmic
Copy link
Member

2021-12-06T19:10:14-7a84066/linux-s390x-ibm

--- FAIL: TestCPUProfileMultithreadMagnitude (0.83s)
    pprof_test.go:127: Running on Linux 4.18.0
    pprof_test.go:526: total 125 CPU profile samples collected:
        5: 0x1739ec (runtime/pprof.cpuHog0:61 runtime/pprof.cpuHog1:55) 0x173939 (runtime/pprof.cpuHogger:41) 0x174ea5 (runtime/pprof.TestCPUProfileMultithreadMagnitude.func2.1:176) 0x18893d (runtime/pprof.diffCPUTimeRUsage:27) 0x17614b (runtime/pprof.diffCPUTime:485) 0x174a9b (runtime/pprof.TestCPUProfileMultithreadMagnitude.func2:174) 0x175f2b (runtime/pprof.testCPUProfile:453) 0x174537 (runtime/pprof.TestCPUProfileMultithreadMagnitude:173) 0xf6661 (testing.tRunner:1410) labels: map[]
        
        2: 0x84c00 (runtime.asyncPreempt:7) 0x1739eb (runtime/pprof.cpuHog0:60 runtime/pprof.cpuHog1:55) 0x173939 (runtime/pprof.cpuHogger:41) 0x174ea5 (runtime/pprof.TestCPUProfileMultithreadMagnitude.func2.1:176) 0x18893d (runtime/pprof.diffCPUTimeRUsage:27) 0x17614b (runtime/pprof.diffCPUTime:485) 0x174a9b (runtime/pprof.TestCPUProfileMultithreadMagnitude.func2:174) 0x175f2b (runtime/pprof.testCPUProfile:453) 0x174537 (runtime/pprof.TestCPUProfileMultithreadMagnitude:173) 0xf6661 (testing.tRunner:1410) labels: map[]
        
        4: 0x1739fa (runtime/pprof.cpuHog0:64 runtime/pprof.cpuHog1:55) 0x173939 (runtime/pprof.cpuHogger:41) 0x174ea5 (runtime/pprof.TestCPUProfileMultithreadMagnitude.func2.1:176) 0x18893d (runtime/pprof.diffCPUTimeRUsage:27) 0x17614b (runtime/pprof.diffCPUTime:485) 0x174a9b (runtime/pprof.TestCPUProfileMultithreadMagnitude.func2:174) 0x175f2b (runtime/pprof.testCPUProfile:453) 0x174537 (runtime/pprof.TestCPUProfileMultithreadMagnitude:173) 0xf6661 (testing.tRunner:1410) labels: map[]
        
        5: 0x84c00 (runtime.asyncPreempt:7) 0x1739f9 (runtime/pprof.cpuHog0:62 runtime/pprof.cpuHog1:55) 0x173939 (runtime/pprof.cpuHogger:41) 0x174ea5 (runtime/pprof.TestCPUProfileMultithreadMagnitude.func2.1:176) 0x18893d (runtime/pprof.diffCPUTimeRUsage:27) 0x17614b (runtime/pprof.diffCPUTime:485) 0x174a9b (runtime/pprof.TestCPUProfileMultithreadMagnitude.func2:174) 0x175f2b (runtime/pprof.testCPUProfile:453) 0x174537 (runtime/pprof.TestCPUProfileMultithreadMagnitude:173) 0xf6661 (testing.tRunner:1410) labels: map[]
        
        34: 0x173a7a (runtime/pprof.cpuHog0:64 runtime/pprof.cpuHog3:83) 0x173939 (runtime/pprof.cpuHogger:41) 0x174d4b (runtime/pprof.TestCPUProfileMultithreadMagnitude.func2.2.1:188) labels: map[]
        
        43: 0x173a6c (runtime/pprof.cpuHog0:61 runtime/pprof.cpuHog3:83) 0x173939 (runtime/pprof.cpuHogger:41) 0x174d4b (runtime/pprof.TestCPUProfileMultithreadMagnitude.func2.2.1:188) labels: map[]
        
        14: 0x84c00 (runtime.asyncPreempt:7) 0x173a6b (runtime/pprof.cpuHog0:60 runtime/pprof.cpuHog3:83) 0x173939 (runtime/pprof.cpuHogger:41) 0x174d4b (runtime/pprof.TestCPUProfileMultithreadMagnitude.func2.2.1:188) labels: map[]
        
        1: 0x173a80 (runtime/pprof.cpuHog0:64 runtime/pprof.cpuHog3:83) 0x173939 (runtime/pprof.cpuHogger:41) 0x174d4b (runtime/pprof.TestCPUProfileMultithreadMagnitude.func2.2.1:188) labels: map[]
        
        13: 0x84c00 (runtime.asyncPreempt:7) 0x173a79 (runtime/pprof.cpuHog0:62 runtime/pprof.cpuHog3:83) 0x173939 (runtime/pprof.cpuHogger:41) 0x174d4b (runtime/pprof.TestCPUProfileMultithreadMagnitude.func2.2.1:188) labels: map[]
        
        2: 0x173a5e (runtime/pprof.cpuHog0:60 runtime/pprof.cpuHog3:83) 0x173939 (runtime/pprof.cpuHogger:41) 0x174d4b (runtime/pprof.TestCPUProfileMultithreadMagnitude.func2.2.1:188) labels: map[]
        
        1: 0x173a68 (runtime/pprof.cpuHog0:60 runtime/pprof.cpuHog3:83) 0x173939 (runtime/pprof.cpuHogger:41) 0x174d4b (runtime/pprof.TestCPUProfileMultithreadMagnitude.func2.2.1:188) labels: map[]
        
        1: 0x84c00 (runtime.asyncPreempt:7) 0x173a7f (runtime/pprof.cpuHog0:64 runtime/pprof.cpuHog3:83) 0x173939 (runtime/pprof.cpuHogger:41) 0x174d4b (runtime/pprof.TestCPUProfileMultithreadMagnitude.func2.2.1:188) labels: map[]
        
    pprof_test.go:597: runtime/pprof.cpuHog1: 16
    pprof_test.go:597: runtime/pprof.cpuHog3: 109
    --- FAIL: TestCPUProfileMultithreadMagnitude/serial_execution_OS_vs_pprof (0.00s)
        pprof_test.go:214: compare 86.706ms vs 160ms
        pprof_test.go:226: CPU usage reports are too different (limit -40.0%, got -45.8%)
    pprof_test.go:130: Failure of this test may indicate that your system suffers from a known Linux kernel bug fixed on newer kernels. See https://golang.org/issue/49065.
FAIL
FAIL	runtime/pprof	7.818s

This failure is the opposite of #50097. Here, the rusage is lower than the pprof samples, which is rather surprising. It also seems like we have way too many cpuHog1 samples. I'd expect 10, but see 16.

cc @rhysh @bcmills

@prattmic prattmic added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels Dec 16, 2021
@gopherbot
Copy link

Change https://golang.org/cl/372800 mentions this issue: runtime/pprof: run TestCPUProfileMultithreadMagnitude subtests separately

@rhysh
Copy link
Contributor

rhysh commented Dec 16, 2021

The CPU profiler asks for SIGPROF signals at 100Hz: https://github.com/golang/go/blob/go1.17.5/src/runtime/pprof/pprof.go#L771

The runtime allows goroutines to run for 10ms before preempting them (via SIGURG): https://github.com/golang/go/blob/go1.17.5/src/runtime/proc.go#L5476

Are we seeing this genre of failure—a small fraction of times the CPU profiler runs include many samples pointing to code in the scheduler—because those intervals are in sync?

@prattmic
Copy link
Member Author

@rhysh did you intend to put that comment on #50097? The profile here has no samples in the scheduler.

@rhysh
Copy link
Contributor

rhysh commented Dec 16, 2021

I meant it for both. This has samples "in" asyncPreempt.

@prattmic
Copy link
Member Author

Ah true, good point.

gopherbot pushed a commit that referenced this issue Jan 10, 2022
…tely

Currently TestCPUProfileMultithreadMagnitude runs two CPU consumption
functions in a single profile and then analyzes the results as separate
subtests.

This works fine, but when debugging failures it makes manual analysis of
the profile dump a bit annoying.

Refactor the test to collect separate profiles for each subtest for
easier future analysis.

For #50097.
For #50232.

Change-Id: Ia1c8bb86aaaf652e64c5e660dcc2da47d2194c2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/372800
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: Bryan Mills <bcmills@google.com>
@prattmic prattmic self-assigned this Feb 17, 2022
@prattmic prattmic added this to the Go1.19 milestone Feb 18, 2022
@heschi
Copy link
Contributor

heschi commented Mar 16, 2022

ping @prattmic @laboger -- this needs a skip or a fix.

@bcmills
Copy link
Contributor

bcmills commented Mar 17, 2022

This appears not to be specific to s390x. It has now also reproduced on linux/amd64 and linux/arm64, which are both first-class ports.

greplogs --dashboard -md -l -e 'CPU usage reports are too different' --since=2022-03-01

2022-03-16T20:10:07-91631bc/linux-amd64-wsl
2022-03-16T17:41:38-c0158b6/linux-arm64-aws

@bcmills bcmills changed the title runtime/pprof: TestCPUProfileMultithreadMagnitude rusage too low on linux-s390x-ibm runtime/pprof: TestCPUProfileMultithreadMagnitude rusage too low Mar 17, 2022
@rhysh
Copy link
Contributor

rhysh commented Mar 18, 2022

I'm working on this. I plan to convert it back to the usual profiling test structure, so testCPUProfile can retry with a longer duration on failures. This will also re-enable testCPUProfile's normal behavior of skipping on QEMU platforms (and OSes where profiling is "broken", though this test is Linux-only).

The early versions of TestCPUProfileMultithreadMagnitude ran the serial and parallel versions as part of the same profile, which got in the way of using testCPUProfile's retry feature.

I'm also going to restructure the rusage reporting so the test can log the user and system time separately. I expect the SIGPROF deliveries to match the sum of those, but there may be a helpful clue in that breakdown when the test fails.

@gopherbot
Copy link

Change https://go.dev/cl/393934 mentions this issue: runtime/pprof: rerun magnitude test on failure

gopherbot pushed a commit that referenced this issue Mar 22, 2022
Restructure TestCPUProfileMultithreadMagnitude so it will run again with
a longer duration on failure. Log the split between the user vs system
CPU time that rusage reports.

For #50232

Change-Id: Ice5b38ee7594dbee1eaa5686d32b968c306e3e85
Reviewed-on: https://go-review.googlesource.com/c/go/+/393934
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
@bcmills
Copy link
Contributor

bcmills commented Mar 30, 2022

So far, so good. (Is there anything still planned for this issue, or can it be closed?)

greplogs --dashboard -md -l -e 'FAIL: TestCPUProfileMultithreadMagnitude' --since=2022-03-23

jproberts pushed a commit to jproberts/go that referenced this issue Jun 21, 2022
…tely

Currently TestCPUProfileMultithreadMagnitude runs two CPU consumption
functions in a single profile and then analyzes the results as separate
subtests.

This works fine, but when debugging failures it makes manual analysis of
the profile dump a bit annoying.

Refactor the test to collect separate profiles for each subtest for
easier future analysis.

For golang#50097.
For golang#50232.

Change-Id: Ia1c8bb86aaaf652e64c5e660dcc2da47d2194c2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/372800
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: Bryan Mills <bcmills@google.com>
@rsc rsc unassigned rhysh and prattmic Jun 22, 2022
@prattmic prattmic self-assigned this Jun 24, 2022
@golang golang locked and limited conversation to collaborators Jun 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Linux release-blocker
Projects
None yet
Development

No branches or pull requests

5 participants