
runtime/pprof: TestCPUProfileMultithreadMagnitude failure due to usage too high on linux-arm-aws #53785

Open
bcmills opened this issue Jul 11, 2022 · 6 comments

bcmills (Contributor) commented Jul 11, 2022

2022-07-06T19:34:57-2f43de6/linux-arm-aws:

--- FAIL: TestCPUProfileMultithreadMagnitude (0.42s)
    pprof_test.go:123: Running on Linux 4.19.0
    --- FAIL: TestCPUProfileMultithreadMagnitude/serial (0.20s)
        pprof_test.go:189: Running with 1 workers
        pprof_test.go:524: total 9 CPU profile samples collected:
            3: 0x15609c (runtime/pprof.cpuHog0:61 runtime/pprof.cpuHog1:55) 0x155fe3 (runtime/pprof.cpuHogger:41) 0x157207 (runtime/pprof.TestCPUProfileMultithreadMagnitude.func3.1.1.1:202) labels: map[]
            
            6: 0x1560a8 (runtime/pprof.cpuHog0:64 runtime/pprof.cpuHog1:55) 0x155fe3 (runtime/pprof.cpuHogger:41) 0x157207 (runtime/pprof.TestCPUProfileMultithreadMagnitude.func3.1.1.1:202) labels: map[]
            
        pprof_test.go:595: runtime/pprof.cpuHog1: 9
        pprof_test.go:226: compare 154.991ms vs 90ms
        pprof_test.go:228: compare got CPU usage reports are too different (limit -40.0%, got -41.9%) want nil
    pprof_test.go:126: Failure of this test may indicate that your system suffers from a known Linux kernel bug fixed on newer kernels. See https://golang.org/issue/49065.
FAIL
FAIL	runtime/pprof	7.677s
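(For reference, the profile side of that comparison is presumably the 9 samples at the default 100 Hz sampling rate, i.e. about 90 ms, and the -41.9% figure is the relative shortfall against the 154.991 ms of CPU time reported by the OS. A rough back-of-the-envelope check of the numbers in the log, not the test's actual comparison code:)

```go
// Rough sanity check of the figures in the log above; an illustration only,
// not the test's actual comparison logic.
package main

import "fmt"

func main() {
	osCPU := 154.991    // CPU time reported by the OS, in ms ("compare 154.991ms vs 90ms")
	profCPU := 9 * 10.0 // 9 profile samples at the default 100 Hz rate ≈ 90 ms
	// Prints "relative difference: -41.9%", matching the failure message.
	fmt.Printf("relative difference: %.1f%%\n", (profCPU-osCPU)/osCPU*100)
}
```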

greplogs -l -e 'FAIL: TestCPUProfileMultithreadMagnitude' --since=2022-03-23
2022-07-06T19:34:57-2f43de6/linux-arm-aws

See previously #50097 (attn @prattmic; CC @golang/runtime).

@gopherbot added the compiler/runtime label Jul 11, 2022
@bcmills added the NeedsInvestigation label Jul 11, 2022
@bcmills added this to the Backlog milestone Jul 11, 2022
prattmic (Member) commented:

I believe this is a case of #49065. That bug is not x86-specific; however, I missed checking whether our ARM builders had updated kernels.

prattmic (Member) commented Jul 11, 2022

@golang/release I don't quite follow what is going on in https://cs.opensource.google/go/x/build/+/master:env/linux-arm64/aws/, but I get the sense that if we regenerate the AWS image, it will pick up a newer Debian base image, which (presumably) has a newer kernel package that includes the fix for this issue. Does that sound correct?

bcmills (Contributor, Author) commented Jul 11, 2022

Oh! Maybe we just need to remove or widen the GOARCH condition at https://cs.opensource.google/go/go/+/master:src/runtime/pprof/pprof_test.go;l=133;drc=6ec46f470797ad816c3a5b20eece0995f13d2bc4?
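(For illustration only, a sketch of the kind of OS/GOARCH gate being discussed here; this is not the actual code at that line of pprof_test.go, and the specific GOARCH values are assumptions:)

```go
// Hypothetical sketch, not the real pprof_test.go check: a gate of this shape
// could be widened to include arm/arm64, or the GOARCH part dropped entirely.
package pprof_test

import "runtime"

// kernelBugDiagnosticApplies reports whether the golang.org/issue/49065
// diagnostic should be considered on this platform.
func kernelBugDiagnosticApplies() bool {
	return runtime.GOOS == "linux" &&
		(runtime.GOARCH == "amd64" || runtime.GOARCH == "386")
}
```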

prattmic (Member) commented:

Oops, I didn't look closely enough:

  • This bug affected only Linux 5.9–5.16, but this VM is on 4.19.
  • This bug depended on CONFIG_POSIX_CPU_TIMERS_TASK_WORK, which was enabled only on x86 until 5.16 (IIRC), when it was added to arm64.

So this is likely a different issue after all.
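(As a side note, a quick way to see what kernel a builder is actually running; this is a sketch assuming golang.org/x/sys/unix is available, not the test's own version-detection code:)

```go
// Sketch: print the running kernel release, similar in spirit to the
// "Running on Linux 4.19.0" line in the test log above. Whether
// CONFIG_POSIX_CPU_TIMERS_TASK_WORK is set would additionally have to be
// checked against the kernel config (e.g. /boot/config-$(uname -r)).
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	var u unix.Utsname
	if err := unix.Uname(&u); err != nil {
		panic(err)
	}
	fmt.Println("kernel release:", unix.ByteSliceToString(u.Release[:]))
}
```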

dmitshur (Contributor) commented:

@prattmic What you said about the x/build/env/linux-arm64/aws/ directory sounds plausible to me, but it also seems possible that the Docker image reuses the host's kernel; if so, VMImage may be where an update would need to happen to pick up a newer kernel version. Someone else may know more.

rhysh (Contributor) commented Jul 22, 2022

This failure in 2022-07-06T19:34:57-2f43de6/linux-arm-aws is on release-branch.go1.18.

I think this is the same sort of "short test duration means small sample size means a moderate chance of failure when we get unlucky" issue as we saw in #50232. I fixed that in https://go.dev/cl/393934, "runtime/pprof: rerun magnitude test on failure", but that fix isn't backported to Go 1.18.
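(The shape of that fix is a bounded rerun of the noisy comparison; a minimal sketch of the idea, not the actual CL 393934 code:)

```go
// Minimal sketch of the "rerun on failure" idea; not the actual CL 393934
// implementation. A statistically noisy check is retried a few times and the
// test only fails if every attempt fails.
package pprof_test

import "testing"

func retryNoisyCheck(t *testing.T, attempts int, check func() error) {
	t.Helper()
	var err error
	for i := 0; i < attempts; i++ {
		if err = check(); err == nil {
			return
		}
		t.Logf("attempt %d/%d failed: %v", i+1, attempts, err)
	}
	t.Errorf("all %d attempts failed, last error: %v", attempts, err)
}
```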

Should we backport that fix to Go 1.18, or live with the noise until it's EOL?
